A smooth and seamless user experience is a critical factor for the success of any user-facing application. Developers often strive to minimize application latencies to improve the user experience. Usually, the root cause of these latencies is data access delays.
Developers can significantly reduce the delays by caching the data, which leads to faster load times and happier users. Web scraping is no exception – large-scale projects can also be considerably sped up.
But what is caching exactly, and how can it be implemented? This article will discuss caching, its purpose and uses, and how to run a Python script to supercharge your web scraping.
Caching is a mechanism for improving the performance of any application. In a technical sense, caching is storing the data in a cache and retrieving it later. But wait, what exactly is a cache?
A cache is a fast storage space (usually temporary) where frequently accessed data is kept to significantly speed up the system's performance and decrease the access times. For example, a computer's cache is a small but fast memory chip (usually an SRAM) between the CPU and the main memory chip (usually a DRAM).
The CPU first checks the cache when it needs to access the data. If it's in the cache, a cache hit occurs, and the data is thereby read from the cache instead of a relatively slower main memory. It results in reduced access times and enhanced performance.
Caching can improve the performance of applications and systems in several ways. Here are the primary reasons to use caching:
A cache's main objective is to speed up access to frequently used data. Caching accomplishes this by keeping frequently used data in a temporary storage area that's easier to access than the original data source. Caching can significantly speed up an application's or system's overall performance by decreasing access time, especially for functions that are called repeatedly with the same arguments.
Caching can also reduce the load on the system. This is achieved by reducing the number of requests made to the external data source (e.g., a database).
The frequently used data is kept in cache storage, allowing applications to read the cached value rather than repeatedly requesting the data source. This reduces the load on the external data source and improves the system's performance by avoiding redundant computations.
Caching allows users to get data more rapidly, supporting more natural interaction with the system or application. This is particularly important with real-time systems and web applications since users expect immediate responses. Therefore, caching can help improve the overall user experience of an application or a system.
Caching is a general concept and has several prominent use cases. You can apply it in any scenario where data access has some patterns, and you can predict what data will be demanded next. You can prefetch the demanded data in the cache store and improve application performance. This is also a common strategy in data science, where algorithms may access large datasets or perform expensive computations that can be avoided through caching.
We frequently need to access data from databases or external APIs in web applications. Caching can reduce the access time for databases or API requests and increase the performance of a web application.
In a web application, caching is used on both the client and the server side. On the client side, the web application stores static resources (e.g., images) and user preferences (e.g., theme settings) in the client's browser cache.
On the server side, in-memory caching (storing data in the system's memory) between the database and the web servers can significantly reduce the request load on the database, resulting in faster load times. This often involves cache functions that store the results of function calls to avoid repeating the same calculations.
For example, shopping websites display a list of products, and users move back and forth between multiple product pages. To prevent the application from querying the database for the product list every time a visitor opens the page, we can cache the list of products using in-memory caching. If a new value is added (e.g., a newly launched product), the cache can be invalidated or refreshed.
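As a rough sketch of this pattern (the fetch_products_from_db function and the 60-second freshness window are hypothetical stand-ins for a real database query):

import time

CACHE_TTL = 60  # seconds before the cached product list is considered stale
_cache = {"products": None, "fetched_at": 0.0}

def fetch_products_from_db():
    # Placeholder for a real (slow) database query
    return ["Product A", "Product B", "Product C"]

def get_products():
    # Serve the cached list if it's still fresh
    if _cache["products"] is not None and time.time() - _cache["fetched_at"] < CACHE_TTL:
        return _cache["products"]
    # Otherwise hit the database and refresh the cache
    _cache["products"] = fetch_products_from_db()
    _cache["fetched_at"] = time.time()
    return _cache["products"]

def invalidate_products_cache():
    # Call this after adding a new product so the next request refetches the list
    _cache["products"] = None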
Data science and machine learning applications frequently require large datasets. Prefetching subsets of the dataset in the cache will decrease the data access time and, eventually, reduce the training time of the model by avoiding expensive computations.
After completing the training, the machine learning models often learn weight vectors. You can cache these weight vectors and then quickly access them to predict any new unseen samples, which is the most frequent operation. Using cache functions in inference pipelines can eliminate redundant computations, particularly when handling function calls with the same arguments repeatedly.
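As a simplified sketch of the idea (the weights and the predict function below are made up for illustration), a cache decorator can memoize predictions for feature vectors that show up repeatedly:

from functools import lru_cache

@lru_cache(maxsize=1024)
def predict(features):
    # 'features' must be hashable (e.g., a tuple) for lru_cache to work;
    # replace this body with a real model's forward pass
    return sum(w * x for w, x in zip((0.4, 0.1, 0.5), features))

print(predict((1.0, 2.0, 3.0)))  # computed
print(predict((1.0, 2.0, 3.0)))  # served from the cache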
Central processing units use dedicated caches (e.g., L1, L2, and L3) to improve their operations. CPU prefetches the data based on spatial and temporal access patterns and thus saves several CPU cycles otherwise wasted on reading from the main memory (RAM).
Caching at the CPU level also mirrors how high-level programming languages use cache decorators to wrap functions, storing cached values for previously computed inputs. This is especially helpful when dealing with recursive calls, where the same calculations may be performed multiple times if not cached.
The effectiveness of these mechanisms depends on the appropriate configuration of cache size, ensuring a balance between memory usage and performance.
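In Python, a similar trade-off is exposed through the maxsize argument of functools.lru_cache. Here's a small sketch with a hypothetical expensive function:

from functools import lru_cache

@lru_cache(maxsize=128)  # bound the cache to balance memory usage and hit rate
def expensive(n):
    return n ** n  # stand-in for a costly computation

for n in (2, 3, 2, 2, 3):
    expensive(n)

print(expensive.cache_info())  # hits=3, misses=2, maxsize=128, currsize=2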
Different caching mechanisms can be devised based on specific spatial or temporal data access patterns.
Spatial caching is less popular in user-facing applications and more common in scientific or technical applications that deal with vast volumes of data and require high performance and computational power. It involves prefetching data that's spatially close (in the memory) to the data the application or system is currently processing. This idea exploits the fact that it's more likely that a program or an application user will next demand the data units that are spatially close to the current demand. Thereby, prefetching can save time and reduce memory usage.
Temporal caching, a more common strategy, involves retaining data in the cache based on its frequency of use. This approach is highly effective when dealing with function calls that repeatedly operate on the same arguments, as the cached value can be quickly retrieved without recomputation. Here are the most popular temporal caching strategies:
The FIFO caching approach operates on the principle that the first item added to the cache will be the first one to be removed. This method involves loading a predetermined number of items into the cache, and when the cache reaches total capacity, the oldest item is removed to accommodate a new one. The maximum cache size directly controls when eviction happens.
This approach is ideal for systems prioritizing the order of access, such as message processing or queue management systems, especially when cache functions are applied sequentially in predictable patterns.
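As a rough sketch (the four-item capacity is arbitrary), a FIFO cache can be built on collections.OrderedDict:

from collections import OrderedDict

class FIFOCache:
    def __init__(self, maxsize=4):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.maxsize:
            self._data.popitem(last=False)  # evict the item inserted first
        self._data[key] = value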
The LIFO (last in, first out) approach works by removing the most recently added item first. The cache is loaded with a set number of items, and when it's full, the item added last is evicted to make space for the newest one, while the oldest items remain in the cache.
This method is suitable for applications prioritizing recent data over older data, such as stack-based data structures or real-time streaming. When combined with a default value, this method can also ensure fallback behaviors when the required data is missing.
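A LIFO sketch looks almost the same as the FIFO one above – only the eviction end changes – and its get method can return a default value on a miss:

from collections import OrderedDict

class LIFOCache:
    def __init__(self, maxsize=4):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, default=None):
        return self._data.get(key, default)  # fall back to a default value on a miss

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.maxsize:
            self._data.popitem(last=True)  # evict the item inserted last
        self._data[key] = value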
LRU caching involves storing frequently used items while removing the least recently used ones to free up space. This method is particularly useful in scenarios with web applications or database systems, where more importance is placed on frequently accessed data than older data. In many implementations, an LRU strategy enforces a maximum size for the cache to control memory usage efficiently.
To understand this concept, let's imagine that we're hosting a movie website, and we need to cache movie information. Assume we have a cache of four units in size, and information for each movie takes one such unit. Therefore, we can cache information for four movies only. When the cache is full and a new movie is requested, the movie that hasn't been accessed for the longest time is removed, while recently accessed movies stay in the cache.
Now, suppose we have the following list of movies to host:
Movie A
Movie B
Movie C
Movie D
Movie E
Movie F
Let's say that the cache is first populated with these four movies, along with their request times:
(1:50) Movie B
(1:43) Movie C
(1:30) Movie A
(1:59) Movie F
Our cache is now full. If there's a request for a new movie (say Movie D at 2:30), we must remove one of the movies to make room for it. In the LRU caching strategy, the movie that was least recently watched is removed first. This means that Movie A will be replaced by Movie D with a new timestamp:
(1:50) Movie B
(1:43) Movie C
(2:30) Movie D
(1:59) Movie F
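To make the policy concrete, here's a minimal LRU cache sketch built on collections.OrderedDict; the movie labels mirror the example above, and reading an entry refreshes its position:

from collections import OrderedDict

class LRUCache:
    def __init__(self, maxsize=4):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        elif len(self._data) >= self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used item
        self._data[key] = value

cache = LRUCache()
for movie in ("A", "C", "B", "F"):  # inserted in order of their request times
    cache.put(movie, f"info about Movie {movie}")
cache.put("D", "info about Movie D")  # Movie A, the least recently used, is evicted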
MRU caching eliminates items based on their most recent use. This differs from LRU caching, which removes the least recently used items first.
To illustrate, in our movie hosting example, the movie with the newest time will be replaced. Let's reset the cache to the time it initially became full:
(1:50) Movie B
(1:43) Movie C
(1:30) Movie A
(1:59) Movie F
Our cache is now full. If there's a request for a new movie (Movie D), we must remove one of the movies to make room for it. Under the MRU strategy, the movie with the latest access time is replaced. This means Movie D at 2:30 will replace Movie F:
(1:50) Movie B
(1:43) Movie C
(1:30) Movie A
(2:30) Movie D
MRU is helpful in situations where the longer something goes without being used, the more probable it will be used again later.
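An MRU sketch differs from the LRU one above only in the eviction line (again built on collections.OrderedDict; the four-item capacity is arbitrary):

from collections import OrderedDict

class MRUCache:
    def __init__(self, maxsize=4):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        elif len(self._data) >= self.maxsize:
            self._data.popitem(last=True)  # evict the most recently used item
        self._data[key] = value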
The Least Frequently Used (LFU) policy removes the cache item that has been used the least number of times since it was first added. LFU, unlike LRU and MRU, doesn’t need to store the access times. It simply keeps track of the number of times an item has been accessed since its addition.
Let's use the movie example again. This time, we have maintained a count of how often a movie is watched:
(2) Movie B
(2) Movie C
(1) Movie A
(3) Movie F
Our cache is now full. If there's a request for a new movie (Movie D), we must remove one of the movies to make room for it. LFU replaces the movie with the lowest watch count:
(2) Movie B
(2) Movie C
(1) Movie D
(3) Movie F
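A bare-bones LFU sketch keeps a usage counter per entry and evicts the entry with the lowest count (the four-item capacity is arbitrary):

class LFUCache:
    def __init__(self, maxsize=4):
        self.maxsize = maxsize
        self._data = {}
        self._counts = {}

    def get(self, key):
        if key not in self._data:
            return None
        self._counts[key] += 1  # count every access
        return self._data[key]

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.maxsize:
            least_used = min(self._counts, key=self._counts.get)
            del self._data[least_used]  # evict the least frequently used item
            del self._counts[least_used]
        self._data[key] = value
        self._counts[key] = self._counts.get(key, 0) + 1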
There are different ways to implement caching in Python for different caching strategies. Here, we’ll see two methods of Python caching for a simple web scraping example. If you’re new to web scraping, take a look at our step-by-step Python web scraping guide.
We’ll use the requests library to make HTTP requests to a website. Install it with pip by entering the following command in your terminal:
python -m pip install requests
Other libraries we’ll use in this project, specifically time and functools, come natively with Python 3.11.2, so you don’t have to install them.
A decorator in Python is a function that accepts another function as an argument and outputs a new function. We can alter the behavior of the original function using a decorator without changing its source code.
One common use case for decorators is implementing caching: the decorator creates a dictionary that acts as the cache, stores the function's results in it, and reuses them on future calls.
Let’s start by creating a simple function that takes a URL as a function argument, requests that URL, and returns the response text:
import requests
def get_html_data(url):
    response = requests.get(url)
    return response.text
Now, let's move toward creating a memoized version of this function:
def memoize(func):
    cache = {}
    def wrapper(*args):
        if args in cache:
            return cache[args]
        else:
            result = func(*args)
            cache[args] = result
            return result
    return wrapper

@memoize
def get_html_data_cached(url):
    response = requests.get(url)
    return response.text
Here, we define a memoize decorator that creates a cache dictionary to hold the results of previous function calls. The wrapper function checks whether the current input arguments have been cached before and, if so, returns the cached result. If not, it calls the original function and caches the result before returning it.
By adding @memoize above the function definition, we can use the memoize decorator to enhance the get_html_data function. This generates a new memoized function that we’ve called get_html_data_cached. It only makes a single network request for a URL and then stores the response in the cache for further requests.
Let’s use the time module to compare the execution speeds of the get_html_data function and the memoized get_html_data_cached function:
import time
start_time = time.time()
get_html_data('https://books.toscrape.com/')
print('Time taken (normal function):', time.time() - start_time)
start_time = time.time()
get_html_data_cached('https://books.toscrape.com/')
print('Time taken (memoized function using manual decorator):', time.time() - start_time)
Here's what the complete code looks like:
# Import the required modules
import time
import requests
# Function to get the HTML Content
def get_html_data(url):
    response = requests.get(url)
    return response.text

# Memoize function to cache the data
def memoize(func):
    cache = {}

    # Inner wrapper function to store the data in the cache
    def wrapper(*args):
        if args in cache:
            return cache[args]
        else:
            result = func(*args)
            cache[args] = result
            return result
    return wrapper

# Memoized function to get the HTML Content
@memoize
def get_html_data_cached(url):
    response = requests.get(url)
    return response.text
# Get the time it took for a normal function
start_time = time.time()
get_html_data('https://books.toscrape.com/')
print('Time taken (normal function):', time.time() - start_time)
# Get the time it took for a memoized function (manual decorator)
start_time = time.time()
get_html_data_cached('https://books.toscrape.com/')
print('Time taken (memoized function using manual decorator):', time.time() - start_time)
And here's the output:
Notice the time difference between the two functions. Both take almost the same time on the first call – the benefit of caching only appears once the data is accessed again.
Since we're making only one request, the memoized function still has to fetch the data over the network the first time. Therefore, with our example, a significant difference in execution time isn't expected. However, if you increase the number of calls to these functions, the time difference will grow significantly (see the Performance comparison section below).
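To see the gap grow, you can wrap the calls in a loop. Here's a quick sketch reusing the functions defined above (the request count of 10 is arbitrary):

# Time 10 repeated requests with and without caching
start_time = time.time()
for _ in range(10):
    get_html_data('https://books.toscrape.com/')
print('10 calls, normal function:', time.time() - start_time)

start_time = time.time()
for _ in range(10):
    get_html_data_cached('https://books.toscrape.com/')
print('10 calls, memoized function:', time.time() - start_time)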
Another method to implement caching in Python is to use the built-in @lru_cache decorator from functools. This decorator implements a cache using the least recently used (LRU) caching strategy. The LRU cache has a fixed size, which means it'll discard the data that hasn't been used recently once the cache is full.
To use the @lru_cache decorator, we can create a new function for extracting HTML content and place the decorator directly above its definition. Make sure to import it from the functools module first:
from functools import lru_cache
@lru_cache(maxsize=None)
def get_html_data_lru(url):
    response = requests.get(url)
    return response.text
In the above example, the get_html_data_lru function is memoized using the @lru_cache decorator. With the maxsize option set to None, the cache can grow indefinitely.
To use the @lru_cache decorator, just add it above the get_html_data_lru function. Here’s the complete code sample:
# Import the required modules
from functools import lru_cache
import time
import requests
# Function to get the HTML Content
def get_html_data(url):
    response = requests.get(url)
    return response.text

# Memoized using LRU Cache
@lru_cache(maxsize=None)
def get_html_data_lru(url):
    response = requests.get(url)
    return response.text
# Get the time it took for a normal function
start_time = time.time()
get_html_data('https://books.toscrape.com/')
print('Time taken (normal function):', time.time() - start_time)
# Get the time it took for a memoized function (LRU cache)
start_time = time.time()
get_html_data_lru('https://books.toscrape.com/')
print('Time taken (memoized function with LRU cache):', time.time() - start_time)
This produced the following output:
The following table lists the execution times of all three functions for different numbers of requests:
No. of requests | Time taken by normal function | Time taken by memoized function (manual decorator) | Time taken by memoized function (lru_cache decorator)
--- | --- | --- | ---
1 | 2.1 seconds | 2.0 seconds | 1.7 seconds
10 | 17.3 seconds | 2.1 seconds | 1.8 seconds
20 | 32.2 seconds | 2.2 seconds | 2.1 seconds
30 | 57.3 seconds | 2.22 seconds | 2.12 seconds
As the number of requests to the functions increases, you can see a significant reduction in execution times using the caching strategy. The following comparison chart depicts these results:
The comparison results clearly show that using a caching strategy in your code can significantly improve overall performance and speed.
If you want to speed up your code, caching the data can help. There are many use cases where caching is a game changer, and web scraping is one of them. When dealing with large-scale projects, consider caching frequently used data in your code to massively speed up your data extraction efforts and improve overall performance.
In case you have questions about our products, feel free to contact our 24/7 support via live chat or email.
Caching and data replication are two different methods used to improve the performance and availability of data.
Caching involves storing frequently accessed data locally (often using a Python local cache) to serve future function calls quickly. For example, when a user makes a second call with the same input, the cached value is returned, reducing expensive function calls and redundant computations.
Data replication, however, involves duplicating data across multiple servers or locations to ensure high availability and fault tolerance. While caching boosts performance by minimizing delay, replication primarily ensures resilience and uptime.
Memoization is a specific form of caching, typically applied to recursive calls or repetitive function calls. It's often used in problems like calculating the Fibonacci sequence, where the same intermediate values are repeatedly recomputed.
Using a cache decorator like @lru_cache in Python can memoize a function, storing its cached value and automatically evicting the least recently used entries when the cache size is exceeded.
Caching, more broadly, refers to storing data for quicker access, whether it's function return values, API responses, or parsed XML data. If you're handling expensive function calls in your Python applications, both memoization and caching can dramatically improve performance.
Learn more by reading our related blog posts like How to Parse XML in Python or Python Requests Library: 2025 Guide to see how you can cache parsed results or HTTP responses.
Use caching when:
You call the same function with identical arguments multiple times.
You're dealing with expensive computations or I/O operations.
You want to improve performance in Python web scraping, API consumption, or data parsing.
Whether you're implementing a Python local cache, a decorator-based approach, or custom logic, caching helps you significantly speed up execution.
Absolutely. Calculating the Fibonacci sequence recursively is a classic example where caching helps. Without memoization, every call recalculates previous values, causing exponential growth in recursive calls.
With a cache decorator like @lru_cache, each new value is stored, and future function calls simply reuse cached values – massively improving speed.
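Here's a minimal sketch of this, using the built-in @lru_cache decorator:

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Without the cache, this recursion recomputes the same values exponentially often
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(100))  # returns instantly thanks to cached intermediate results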
When using the requests library, it's common to hit the same endpoints repeatedly. You can cache these responses using third-party tools like requests-cache, which stores responses in memory or on disk.
This kind of caching is extremely valuable when working with cURL with Python or when making frequent API calls using Python Requests.
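As a brief sketch (assuming requests-cache is installed via pip install requests-cache; the cache name and expiry below are arbitrary):

import requests_cache

# Store responses in a local SQLite file and reuse them for an hour
session = requests_cache.CachedSession('scraper_cache', expire_after=3600)

response = session.get('https://books.toscrape.com/')  # real network request
response = session.get('https://books.toscrape.com/')  # served from the cache
print(response.from_cache)  # True on the second call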
If you're rotating proxies to avoid getting blocked during high-volume scraping or API polling, you still want to avoid repeating the same calculations or requests. Caching responses while rotating proxies ensures that only new values trigger actual network activity, saving time and reducing server hits.
Combining proxy rotation logic in Python with tools like requests-cache or custom caching layers helps achieve this. Searching for terms like "cache Python" or "Python local cache" will surface libraries that support this kind of hybrid approach.
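A hedged sketch of the combination – the proxy endpoints below are placeholders, not real credentials:

import random
import requests_cache

PROXIES = [
    'http://user:pass@proxy1.example.com:8080',  # placeholder proxy endpoints
    'http://user:pass@proxy2.example.com:8080',
]

session = requests_cache.CachedSession('scraper_cache', expire_after=3600)

def fetch(url):
    proxy = random.choice(PROXIES)
    # Only cache misses trigger real network traffic through the chosen proxy
    return session.get(url, proxies={'http': proxy, 'https': proxy})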
About the author
Vytenis Kaubrė
Technical Content Researcher
Vytenis Kaubrė is a Technical Content Researcher at Oxylabs. Creative writing and a growing interest in technology fuel his daily work, where he researches and crafts technical content, all the while honing his skills in Python. Off duty, you may catch him working on personal projects, learning all things cybersecurity, or relaxing with a book.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.