Overview of Google crawlers and fetchers (user agents)
Google uses crawlers and fetchers to perform actions for its products, either automatically or triggered by a user request. Crawler (sometimes also called a "robot" or "spider") is a generic term for any program that is used to automatically discover and scan websites by following links from one web page to another. Fetchers are programs, similar to wget, that typically make a single request on behalf of a user. Google's clients fall into three categories:
Common crawlers | The common crawlers used for Google's products (such as Googlebot). They always respect robots.txt rules for automatic crawls.
Special-case crawlers | Special-case crawlers are similar to common crawlers, but they're used by specific products where there's an agreement between the crawled site and the Google product about the crawl process. For example, AdsBot ignores the global robots.txt user agent (*) with the ad publisher's permission (see the robots.txt sketch after this table).
User-triggered fetchers | User-triggered fetchers are part of tools and product functions where the end user triggers a fetch. For example, Google Site Verifier acts on the request of a user.
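As an illustration of that difference, the robots.txt sketch below (the disallowed path is a placeholder) blocks the common crawlers through the global group, but has to name AdsBot explicitly because AdsBot ignores the global (*) group:

```
# Illustrative robots.txt; the /private/ path is a placeholder.

# Common crawlers such as Googlebot follow this global group.
User-agent: *
Disallow: /private/

# AdsBot ignores the global (*) group, so it must be addressed by name.
User-agent: AdsBot-Google
Disallow: /private/
```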
Technical properties of Google's crawlers and fetchers
Google's crawlers and fetchers are designed to run simultaneously on thousands of machines to improve performance and scale as the web grows. To optimize bandwidth usage, these clients are distributed across many datacenters around the world so they're located near the sites that they might access. Therefore, your logs may show visits from several IP addresses. Google egresses primarily from IP addresses in the United States. If Google detects that a site is blocking requests from the United States, it may attempt to crawl from IP addresses located in other countries.
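If you want to confirm that the IP addresses in your logs belong to Google's common crawlers, one option is to compare them against the list of IP ranges Google publishes. The sketch below assumes Python with the third-party requests library, and takes the googlebot.json URL from Google's crawler-verification documentation:

```python
# Rough sketch: check whether an IP seen in your access logs falls inside one of
# the CIDR ranges Google publishes for its common crawlers (googlebot.json).
import ipaddress
import requests

# URL as given in Google's crawler-verification documentation; treat it as an assumption.
GOOGLEBOT_RANGES_URL = "https://developers.google.com/search/apis/ipranges/googlebot.json"

def is_in_googlebot_ranges(ip: str) -> bool:
    prefixes = requests.get(GOOGLEBOT_RANGES_URL, timeout=10).json()["prefixes"]
    addr = ipaddress.ip_address(ip)
    return any(
        addr in ipaddress.ip_network(p.get("ipv4Prefix") or p.get("ipv6Prefix"))
        for p in prefixes
    )

print(is_in_googlebot_ranges("66.249.66.1"))  # sample address from the published ranges
```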
Google's crawlers and fetchers use HTTP/1.1 and, if supported by the site, HTTP/2. Crawling over HTTP/2 may save computing resources (for example, CPU, RAM) for your site and Googlebot, but there's no product-specific benefit to the site (for example, no ranking boost in Google Search). To opt out of crawling over HTTP/2, instruct the server that's hosting your site to respond with a 421 HTTP status code when Google attempts to access your site over HTTP/2. If that's not feasible, you can send a message to the Crawling team (however, this solution is temporary).
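As one sketch of that opt-out, the nginx fragment below returns 421 only when a request arrives over HTTP/2 from a Googlebot user agent; the variable names are arbitrary, and the match conditions are an assumption about which crawlers you want to opt out:

```nginx
# Sketch for nginx: answer Google's HTTP/2 requests with 421 so crawling
# falls back to HTTP/1.1. The map blocks belong in the http context.
map $http_user_agent $is_google_crawler {
    default        0;
    "~*Googlebot"  1;   # crude match on the Googlebot user agent string
}
map "$server_protocol:$is_google_crawler" $reject_h2_crawl {
    default        0;
    "HTTP/2.0:1"   1;   # request is over HTTP/2 *and* comes from Googlebot
}

server {
    # ... listen, ssl, and http2 directives ...

    if ($reject_h2_crawl) {
        return 421;     # 421 Misdirected Request
    }
}
```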
Google's crawlers and fetchers support the following content encodings (compressions): gzip, deflate, and Brotli (br). The content encodings supported by each Google user agent are advertised in the Accept-Encoding header of each request they make, for example: Accept-Encoding: gzip, deflate, br.
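To see which of those encodings your server actually returns, you can send the same Accept-Encoding header yourself; the sketch below assumes Python with the third-party requests library and a placeholder URL:

```python
# Minimal sketch: offer the encodings Google's clients advertise and print the
# encoding the server picks (from the Content-Encoding response header).
import requests

resp = requests.get(
    "https://example.com/",  # placeholder; use a page from your own site
    headers={"Accept-Encoding": "gzip, deflate, br"},
    stream=True,             # we only need the headers here
)
print(resp.status_code, resp.headers.get("Content-Encoding", "identity"))
resp.close()
```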
Our goal is to crawl as many pages from your site as we can on each visit without overwhelming your server. If your site is having trouble keeping up with Google's crawling requests, you can reduce the crawl rate. Note that sending an inappropriate HTTP response code to Google's crawlers may affect how your site appears in Google products.
Verifying Google's crawlers and fetchers
Google's crawlers identify themselves in three ways:
- The HTTP user-agent request header.
- The source IP address of the request.
- The reverse DNS hostname of the source IP.
Learn how to use these details to verify Google's crawlers and fetchers.
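As a rough illustration of that verification, the Python sketch below runs a reverse DNS lookup on a source IP and then confirms the result with a forward lookup; the sample IP is only an example, and the accepted hostname suffixes are the Google-operated domains listed in the verification documentation:

```python
# Sketch: verify a crawler IP with a reverse DNS lookup followed by a
# forward lookup that must resolve back to the same address.
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")

def is_google_client(ip: str) -> bool:
    try:
        host, _, _ = socket.gethostbyaddr(ip)       # reverse DNS of the source IP
    except socket.herror:
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    return ip in socket.gethostbyname_ex(host)[2]   # forward DNS must match the IP

print(is_google_client("66.249.66.1"))  # example address from a crawl log
```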