A smooth and seamless user experience is a critical factor for the success of any user-facing application. Developers often strive to minimize application latencies to improve the user experience. Usually, the root cause of these latencies is data access delays.
Developers can significantly reduce the delays by caching the data, which leads to faster load times and happier users. Web scraping is no exception – large-scale projects can also be considerably sped up.
But what is caching exactly, and how can it be implemented? This article will discuss caching, its purpose and uses, and how to run a Python script to supercharge your web scraping.
Caching is a mechanism for improving the performance of any application. In a technical sense, caching is storing the data in a cache and retrieving it later. But wait, what exactly is a cache?
A cache is a fast storage space (usually temporary) where frequently accessed data is kept to significantly speed up the system's performance and decrease the access times. For example, a computer's cache is a small but fast memory chip (usually an SRAM) between the CPU and the main memory chip (usually a DRAM).
The CPU first checks the cache when it needs to access the data. If it's in the cache, a cache hit occurs, and the data is thereby read from the cache instead of a relatively slower main memory. It results in reduced access times and enhanced performance.
Caching can improve the performance of applications and systems in several ways. Here are the primary reasons to use caching:
A cache's main objective is to speed up access to frequently used data. Caching accomplishes this by keeping frequently used data in a temporary storage area that's easier to access than the original data source. Caching can significantly speed up an application's or system's overall performance by decreasing access time, especially for functions that are called repeatedly with the same arguments.
Caching can also reduce the load on the system. This is achieved by reducing the number of requests made to the external data source (e.g., a database).
The frequently used data is kept in cache storage, allowing applications to read the cached value rather than repeatedly requesting the data source. This reduces the load on the external data source and improves the system's performance by avoiding redundant computations.
Caching allows users to get data more rapidly, supporting more natural interaction with the system or application. This is particularly important with real-time systems and web applications since users expect immediate responses. Therefore, caching can help improve the overall user experience of an application or a system.
Caching is a general concept and has several prominent use cases. You can apply it in any scenario where data access has some patterns, and you can predict what data will be demanded next. You can prefetch the demanded data in the cache store and improve application performance. This is also a common strategy in data science, where algorithms may access large datasets or perform expensive computations that can be avoided through caching.
We frequently need to access data from databases or external APIs in web applications. Caching can reduce the access time for databases or API requests and increase the performance of a web application.
In a web application, caching is used on both the client and the server side. On the client side, the web application stores static resources (e.g., images) and user preferences (e.g., theme settings) in the client's browser cache.
On the server side, in-memory caching (storing data in the system's memory) between the database and the web servers can significantly reduce the request load on the database, resulting in faster load times. This often involves cache functions that store the results of function calls to avoid repeating the same calculations.
For example, shopping websites display a list of products, and users move back and forth between multiple product pages. To prevent the application from querying the database for the product list every time a visitor opens the page, we can cache the list of products using in-memory caching. If a new value is added (e.g., a newly launched product), the cache can be invalidated or refreshed.
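As a rough sketch of this pattern (the fetch_products_from_db function and the 60-second freshness window are hypothetical stand-ins for a real database query):

import time

CACHE_TTL = 60  # seconds before the cached product list is considered stale
_cache = {"products": None, "fetched_at": 0.0}

def fetch_products_from_db():
    # Placeholder for a real (slow) database query
    return ["Product A", "Product B", "Product C"]

def get_products():
    # Serve the cached list if it's still fresh
    if _cache["products"] is not None and time.time() - _cache["fetched_at"] < CACHE_TTL:
        return _cache["products"]
    # Otherwise hit the database and refresh the cache
    _cache["products"] = fetch_products_from_db()
    _cache["fetched_at"] = time.time()
    return _cache["products"]

def invalidate_products_cache():
    # Call this after adding a new product so the next request refetches the list
    _cache["products"] = None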
Data science and machine learning applications frequently require large datasets. Prefetching subsets of the dataset in the cache will decrease the data access time and, eventually, reduce the training time of the model by avoiding expensive computations.
After completing the training, the machine learning models often learn weight vectors. You can cache these weight vectors and then quickly access them to predict any new unseen samples, which is the most frequent operation. Using cache functions in inference pipelines can eliminate redundant computations, particularly when handling function calls with the same arguments repeatedly.
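As a simplified sketch of the idea (the weights and the predict function below are made up for illustration), a cache decorator can memoize predictions for feature vectors that show up repeatedly:

from functools import lru_cache

@lru_cache(maxsize=1024)
def predict(features):
    # 'features' must be hashable (e.g., a tuple) for lru_cache to work;
    # replace this body with a real model's forward pass
    return sum(w * x for w, x in zip((0.4, 0.1, 0.5), features))

print(predict((1.0, 2.0, 3.0)))  # computed
print(predict((1.0, 2.0, 3.0)))  # served from the cache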
Central processing units use dedicated caches (e.g., L1, L2, and L3) to improve their operations. CPU prefetches the data based on spatial and temporal access patterns and thus saves several CPU cycles otherwise wasted on reading from the main memory (RAM).
Caching at the CPU level also mirrors how high-level programming languages use cache decorators to wrap functions, storing cached values for previously computed inputs. This is especially helpful when dealing with recursive calls, where the same calculations may be performed multiple times if not cached.
The effectiveness of these mechanisms depends on the appropriate configuration of cache size, ensuring a balance between memory usage and performance.
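In Python, a similar trade-off is exposed through the maxsize argument of functools.lru_cache. Here's a small sketch with a hypothetical expensive function:

from functools import lru_cache

@lru_cache(maxsize=128)  # bound the cache to balance memory usage and hit rate
def expensive(n):
    return n ** n  # stand-in for a costly computation

for n in (2, 3, 2, 2, 3):
    expensive(n)

print(expensive.cache_info())  # hits=3, misses=2, maxsize=128, currsize=2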
Different caching mechanisms can be devised based on specific spatial or temporal data access patterns.
Spatial caching is less popular in user-facing applications and more common in scientific or technical applications that deal with vast volumes of data and require high performance and computational power. It involves prefetching data that's spatially close (in the memory) to the data the application or system is currently processing. This idea exploits the fact that it's more likely that a program or an application user will next demand the data units that are spatially close to the current demand. Thereby, prefetching can save time and reduce memory usage.
Temporal caching, a more common strategy, involves retaining data in the cache based on its frequency of use. This approach is highly effective when dealing with function calls that repeatedly operate on the same arguments, as the cached value can be quickly retrieved without recomputation. Here are the most popular temporal caching strategies:
The FIFO caching approach operates on the principle that the first item added to the cache will be the first one to be removed. This method involves loading a predetermined number of items into the cache, and when the cache reaches total capacity, the oldest item is removed to accommodate a new one. The maximum cache size directly controls when eviction happens.
This approach is ideal for systems prioritizing the order of access, such as message processing or queue management systems, especially when cache functions are applied sequentially in predictable patterns.
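As a rough sketch (the four-item capacity is arbitrary), a FIFO cache can be built on collections.OrderedDict:

from collections import OrderedDict

class FIFOCache:
    def __init__(self, maxsize=4):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.maxsize:
            self._data.popitem(last=False)  # evict the item inserted first
        self._data[key] = value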
The LIFO (last in, first out) approach works by removing the most recently added item first. The cache is loaded with a set number of items, and when it's full, the item added last is evicted to make space for the newest one, while the oldest items remain in the cache.
This method is suitable for applications prioritizing recent data over older data, such as stack-based data structures or real-time streaming. When combined with a default value, this method can also ensure fallback behaviors when the required data is missing.
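A LIFO sketch looks almost the same as the FIFO one above – only the eviction end changes – and its get method can return a default value on a miss:

from collections import OrderedDict

class LIFOCache:
    def __init__(self, maxsize=4):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, default=None):
        return self._data.get(key, default)  # fall back to a default value on a miss

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.maxsize:
            self._data.popitem(last=True)  # evict the item inserted last
        self._data[key] = value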
LRU caching involves storing frequently used items while removing the least recently used ones to free up space. This method is particularly useful in scenarios with web applications or database systems, where more importance is placed on frequently accessed data than older data. In many implementations, an LRU strategy enforces a maximum size for the cache to control memory usage efficiently.
To understand this concept, let's imagine that we're hosting a movie website, and we need to cache movie information. Assume we have a cache of four units in size, and information for each movie takes one such unit. Therefore, we can cache information for four movies only. When the cache is full and a new movie is requested, the movie that hasn't been accessed for the longest time is removed, while recently accessed movies stay in the cache.
Now, suppose we have the following list of movies to host:
Movie A
Movie B
Movie C
Movie D
Movie E
Movie F
Let's say that the cache is first populated with these four movies, along with their request times:
(1:50) Movie B
(1:43) Movie C
(1:30) Movie A
(1:59) Movie F
Our cache is now full. If there's a request for a new movie (say Movie D at 2:30), we must remove one of the movies to make room for it. In the LRU caching strategy, the movie that was least recently watched is removed first. This means that Movie A will be replaced by Movie D with a new timestamp:
(1:50) Movie B
(1:43) Movie C
(2:30) Movie D
(1:59) Movie F
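To make the policy concrete, here's a minimal LRU cache sketch built on collections.OrderedDict; the movie labels mirror the example above, and reading an entry refreshes its position:

from collections import OrderedDict

class LRUCache:
    def __init__(self, maxsize=4):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        elif len(self._data) >= self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used item
        self._data[key] = value

cache = LRUCache()
for movie in ("A", "C", "B", "F"):  # inserted in order of their request times
    cache.put(movie, f"info about Movie {movie}")
cache.put("D", "info about Movie D")  # Movie A, the least recently used, is evicted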
MRU caching eliminates items based on their most recent use. This differs from LRU caching, which removes the least recently used items first.
To illustrate, in our movie hosting example, the movie with the newest time will be replaced. Let's reset the cache to the time it initially became full:
(1:50) Movie B
(1:43) Movie C
(1:30) Movie A
(1:59) Movie F
Our cache is now full. If there's a request for a new movie (Movie D), we must remove one of the movies to make room for it. Under the MRU strategy, the movie with the latest access time is replaced. This means Movie D at 2:30 will replace Movie F:
(1:50) Movie B
(1:43) Movie C
(1:30) Movie A
(2:30) Movie D
MRU is helpful in situations where the longer something goes without being used, the more probable it will be used again later.
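An MRU sketch differs from the LRU one above only in the eviction line (again built on collections.OrderedDict; the four-item capacity is arbitrary):

from collections import OrderedDict

class MRUCache:
    def __init__(self, maxsize=4):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        elif len(self._data) >= self.maxsize:
            self._data.popitem(last=True)  # evict the most recently used item
        self._data[key] = value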
The Least Frequently Used (LFU) policy removes the cache item that has been used the least number of times since it was first added. LFU, unlike LRU and MRU, doesn’t need to store the access times. It simply keeps track of the number of times an item has been accessed since its addition.
Let's use the movie example again. This time, we have maintained a count of how often a movie is watched:
(2) Movie B
(2) Movie C
(1) Movie A
(3) Movie F
Our cache is now full. If there's a request for a new movie (Movie D), we must remove one of the movies to make room for it. LFU replaces the movie with the lowest watch count:
(2) Movie B
(2) Movie C
(1) Movie D
(3) Movie F
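A bare-bones LFU sketch keeps a usage counter per entry and evicts the entry with the lowest count (the four-item capacity is arbitrary):

class LFUCache:
    def __init__(self, maxsize=4):
        self.maxsize = maxsize
        self._data = {}
        self._counts = {}

    def get(self, key):
        if key not in self._data:
            return None
        self._counts[key] += 1  # count every access
        return self._data[key]

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.maxsize:
            least_used = min(self._counts, key=self._counts.get)
            del self._data[least_used]  # evict the least frequently used item
            del self._counts[least_used]
        self._data[key] = value
        self._counts[key] = self._counts.get(key, 0) + 1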
There are different ways to implement caching in Python for different caching strategies. Here, we’ll see two methods of Python caching for a simple web scraping example. If you’re new to web scraping, take a look at our step-by-step Python web scraping guide.
We’ll use the requests library to make HTTP requests to a website. Install it with pip by entering the following command in your terminal:
python -m pip install requests
Other libraries we’ll use in this project, specifically time and functools, come natively with Python 3.11.2, so you don’t have to install them.
A decorator in Python is a function that accepts another function as an argument and outputs a new function. We can alter the behavior of the original function using a decorator without changing its source code.
One common use case for decorators is implementing caching: the decorator creates a dictionary that acts as the cache, stores the function's results in it, and reuses them on future calls.
Let’s start by creating a simple function that takes a URL as a function argument, requests that URL, and returns the response text:
import requests
def get_html_data(url):
    response = requests.get(url)
    return response.text
Now, let's move toward creating a memoized version of this function:
def memoize(func):
    cache = {}
    def wrapper(*args):
        if args in cache:
            return cache[args]
        else:
            result = func(*args)
            cache[args] = result
            return result
    return wrapper

@memoize
def get_html_data_cached(url):
    response = requests.get(url)
    return response.text
Here, we define a memoize decorator that creates a cache dictionary to hold the results of previous function calls. The wrapper function checks whether the current input arguments have been cached before and, if so, returns the cached result. If not, it calls the original function and caches the result before returning it.
By adding @memoize above the function definition, we can use the memoize decorator to enhance the get_html_data function. This generates a new memoized function that we’ve called get_html_data_cached. It only makes a single network request for a URL and then stores the response in the cache for further requests.
Let’s use the time module to compare the execution speeds of the get_html_data function and the memoized get_html_data_cached function:
import time
start_time = time.time()
get_html_data('https://books.toscrape.com/')
print('Time taken (normal function):', time.time() - start_time)
start_time = time.time()
get_html_data_cached('https://books.toscrape.com/')
print('Time taken (memoized function using manual decorator):', time.time() - start_time)
Here's what the complete code looks like:
# Import the required modules
import time
import requests
# Function to get the HTML Content
def get_html_data(url):
    response = requests.get(url)
    return response.text

# Memoize function to cache the data
def memoize(func):
    cache = {}

    # Inner wrapper function to store the data in the cache
    def wrapper(*args):
        if args in cache:
            return cache[args]
        else:
            result = func(*args)
            cache[args] = result
            return result
    return wrapper

# Memoized function to get the HTML Content
@memoize
def get_html_data_cached(url):
    response = requests.get(url)
    return response.text
# Get the time it took for a normal function
start_time = time.time()
get_html_data('https://books.toscrape.com/')
print('Time taken (normal function):', time.time() - start_time)
# Get the time it took for a memoized function (manual decorator)
start_time = time.time()
get_html_data_cached('https://books.toscrape.com/')
print('Time taken (memoized function using manual decorator):', time.time() - start_time)
And here's the output:
Notice the time difference between the two functions. Both take almost the same time on the first call – the benefit of caching only appears once the data is accessed again.
Since we're making only one request, the memoized function still has to fetch the data over the network the first time. Therefore, with our example, a significant difference in execution time isn't expected. However, if you increase the number of calls to these functions, the time difference will grow significantly (see the Performance comparison section below).
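To see the gap grow, you can wrap the calls in a loop. Here's a quick sketch reusing the functions defined above (the request count of 10 is arbitrary):

# Time 10 repeated requests with and without caching
start_time = time.time()
for _ in range(10):
    get_html_data('https://books.toscrape.com/')
print('10 calls, normal function:', time.time() - start_time)

start_time = time.time()
for _ in range(10):
    get_html_data_cached('https://books.toscrape.com/')
print('10 calls, memoized function:', time.time() - start_time)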
Another method to implement caching in Python is to use the built-in @lru_cache decorator from functools. This decorator implements a cache using the least recently used (LRU) caching strategy. The LRU cache has a fixed size, which means it'll discard the data that hasn't been used recently once the cache is full.
To use the @lru_cache decorator, we can create a new function for extracting HTML content and place the decorator directly above its definition. Make sure to import it from the functools module first:
from functools import lru_cache
@lru_cache(maxsize=None)
def get_html_data_lru(url):
    response = requests.get(url)
    return response.text
In the above example, the get_html_data_lru function is memoized using the @lru_cache decorator. With the maxsize option set to None, the cache can grow indefinitely.
To use the @lru_cache decorator, just add it above the get_html_data_lru function. Here’s the complete code sample:
# Import the required modules
from functools import lru_cache
import time
import requests
# Function to get the HTML Content
def get_html_data(url):
    response = requests.get(url)
    return response.text

# Memoized using LRU Cache
@lru_cache(maxsize=None)
def get_html_data_lru(url):
    response = requests.get(url)
    return response.text
# Get the time it took for a normal function
start_time = time.time()
get_html_data('https://books.toscrape.com/')
print('Time taken (normal function):', time.time() - start_time)
# Get the time it took for a memoized function (LRU cache)
start_time = time.time()
get_html_data_lru('https://books.toscrape.com/')
print('Time taken (memoized function with LRU cache):', time.time() - start_time)
This produced the following output:
The following table lists the execution times of all three functions for different numbers of requests:
No. of requests | Time taken by normal function | Time taken by memoized function (manual decorator) | Time taken by memoized function (lru_cache decorator)
--- | --- | --- | ---
1 | 2.1 seconds | 2.0 seconds | 1.7 seconds
10 | 17.3 seconds | 2.1 seconds | 1.8 seconds
20 | 32.2 seconds | 2.2 seconds | 2.1 seconds
30 | 57.3 seconds | 2.22 seconds | 2.12 seconds
As the number of requests to the functions increases, you can see a significant reduction in execution times using the caching strategy. The following comparison chart depicts these results:
The comparison results clearly show that using a caching strategy in your code can significantly improve overall performance and speed.
If you want to speed up your code, caching the data can help. There are many use cases where caching is a game changer, and web scraping is one of them. When dealing with large-scale projects, consider caching frequently used data in your code to massively speed up your data extraction efforts and improve overall performance.
In case you have questions about our products, feel free to contact our 24/7 support via live chat or email.
Caching and data replication are two different methods used to improve the performance and availability of data.
Caching involves storing frequently accessed data locally (often using a Python local cache) to serve future function calls quickly. For example, when a user makes a second call with the same input, the cached value is returned, reducing expensive function calls and redundant computations.
Data replication, however, involves duplicating data across multiple servers or locations to ensure high availability and fault tolerance. While caching boosts performance by minimizing delay, replication primarily ensures resilience and uptime.
Memoization is a specific form of caching, typically applied to recursive calls or repetitive function calls. It's often used in problems like calculating the Fibonacci sequence, where the same intermediate values are repeatedly recomputed.
Using a cache decorator like @lru_cache in Python can memoize a function, storing its cached value and automatically evicting the least recently used entries when the cache size is exceeded.
Caching, more broadly, refers to storing data for quicker access, whether it's function return values, API responses, or parsed XML data. If you're handling expensive function calls in your Python applications, both memoization and caching can dramatically improve performance.
Learn more by reading our related blog posts like How to Parse XML in Python or Python Requests Library: 2025 Guide to see how you can cache parsed results or HTTP responses.
Use caching when:
You call the same function with identical arguments multiple times.
You're dealing with expensive computations or I/O operations.
You want to improve performance in Python web scraping, API consumption, or data parsing.
Whether you're implementing a Python local cache, a decorator-based approach, or custom logic, caching helps you significantly speed up execution.
Absolutely. Calculating the Fibonacci sequence recursively is a classic example where caching helps. Without memoization, every call recalculates previous values, causing exponential growth in recursive calls.
With a cache decorator like @lru_cache, each new value is stored, and future function calls simply reuse cached values – massively improving speed.
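Here's a minimal sketch of this, using the built-in @lru_cache decorator:

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Without the cache, this recursion recomputes the same values exponentially often
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(100))  # returns instantly thanks to cached intermediate results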
When using the requests library, it's common to hit the same endpoints repeatedly. You can cache these responses using third-party tools like requests-cache, which stores responses in memory or on disk.
This kind of caching is extremely valuable when working with cURL with Python or when making frequent API calls using Python Requests.
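As a brief sketch (assuming requests-cache is installed via pip install requests-cache; the cache name and expiry below are arbitrary):

import requests_cache

# Store responses in a local SQLite file and reuse them for an hour
session = requests_cache.CachedSession('scraper_cache', expire_after=3600)

response = session.get('https://books.toscrape.com/')  # real network request
response = session.get('https://books.toscrape.com/')  # served from the cache
print(response.from_cache)  # True on the second call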
If you're rotating proxies to avoid getting blocked during high-volume scraping or API polling, you still want to avoid repeating the same calculations or requests. Caching responses while rotating proxies ensures that only new values trigger actual network activity, saving time and reducing server hits.
Combining proxy rotation logic in Python with tools like requests-cache or custom caching layers helps achieve this. Searching for terms like "cache Python" or "Python local cache" will surface libraries that support this kind of hybrid approach.
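A hedged sketch of the combination – the proxy endpoints below are placeholders, not real credentials:

import random
import requests_cache

PROXIES = [
    'http://user:pass@proxy1.example.com:8080',  # placeholder proxy endpoints
    'http://user:pass@proxy2.example.com:8080',
]

session = requests_cache.CachedSession('scraper_cache', expire_after=3600)

def fetch(url):
    proxy = random.choice(PROXIES)
    # Only cache misses trigger real network traffic through the chosen proxy
    return session.get(url, proxies={'http': proxy, 'https': proxy})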
About the author
Vytenis Kaubrė
Technical Content Researcher
Vytenis Kaubrė is a Technical Content Researcher at Oxylabs. Creative writing and a growing interest in technology fuel his daily work, where he researches and crafts technical content, all the while honing his skills in Python. Off duty, you may catch him working on personal projects, learning all things cybersecurity, or relaxing with a book.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.