
What happens when a robots.txt file has a 5XX error code?

Documented evidence of a persistent 5xx error in a robots.txt file

Author: MJ Cachón and Carlos Sánchez
Topics: Crawling, Servers
Publication Date: 2025-11-07
Last Review: 2025-11-07

I know this is an unusual question, and in fact this is not a post about a common scenario.

However, it is a question that arose from an inconsistency I noticed in Google's documentation and from a post by Gary Illyes himself, a Google representative. His post said the following:

A robots.txt file that returns a 500/503 HTTP status code for an extended period of time will remove your site from search results, even if the rest of the site is accessible to Googlebot. The same applies to network timeouts. — Gary Illyes

Curiously, when I went to check the documentation, it said that when the robots.txt returned a 5XX error for a long time, Google treated it as if there were no crawl restrictions, and at the same time, two paragraphs later, it said that the entire site was considered inaccessible.

Both statements are clearly contradictory, and this calls for a deeper explanation.

Logic and experience told me that Gary Illyes was wrong, and there were even comments agreeing with me. But these things have to be proven with research, not with words.

So I turned to the person I know in the industry who most enjoys running experiments on Google: MJ Cachón, who listened to my proposal, agreed to collaborate with me on the research, and gave me access to the website https://rana.ninja/.

My premise was the following:

When the robots.txt returns a 5XX, Google uses its last cached version, which is kept for one month. After that month, the robots.txt is treated as a 4XX and it is understood that there are no crawl restrictions, as stated in one of the versions of the official documentation.

However, let's see exactly what happens. Please never run these experiments on production projects that cannot afford to be lost.
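If you want to reproduce the setup on a disposable site of your own, the usual way is a server rule (for example in nginx or .htaccess). The following Python sketch is only an illustrative stand-in, not how rana.ninja was actually configured: it answers /robots.txt with a 503 while every other path keeps returning a normal 200.

```python
# Minimal sketch of the experimental condition (assumption: illustrative only,
# NOT how rana.ninja was configured). /robots.txt answers 503, everything else 200.
from http.server import BaseHTTPRequestHandler, HTTPServer


class Robots503Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.split("?")[0] == "/robots.txt":
            # Persistent 5xx on robots.txt: the condition under test.
            self.send_response(503)
            self.send_header("Retry-After", "86400")
            self.end_headers()
        else:
            # The rest of the site stays reachable.
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.end_headers()
            self.wfile.write(b"<html><body>Regular page</body></html>")


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Robots503Handler).serve_forever()
```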

Strange documentation changes

Shortly after starting the research, Google changed the documentation.

Change in robots.txt documentation

Although Google did not announce them, and they still contradict the premise of Gary Illyes' post, these are the changes made to the official documentation about 5XX errors in robots.txt in December 2024.

| Situation | Before (17/12/2024) | Now (24/12/2024) |
| --- | --- | --- |
| 5xx or 429 error when requesting robots.txt | Google interprets it as if the entire site were temporarily blocked. | Google stops crawling the site for 12 hours, but keeps trying to obtain the robots.txt. |
| 30 days without being able to obtain robots.txt | Uses the last cached version. If there is no copy, assumes there are no restrictions. | Uses the last cached version. If there is no copy, keeps trying, with behavior depending on site availability. |
| After 30 days | Continues assuming no restrictions if there is no cached copy. | Google assumes there is no robots.txt and continues crawling normally, but keeps trying to obtain it. |

From my point of view, and from the information we have gathered, it does seem that the rest of the website is crawled from the beginning. As the documentation says, the rules in the last cached robots.txt are respected for one month, and after that there are no restrictions whatsoever.
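To make the updated behavior easier to follow, here is a rough sketch that restates the table above as a small decision function. This is only our reading of the documentation, not Google's actual logic; the 12-hour and 30-day thresholds come straight from the updated text.

```python
# Rough model of the documented (24/12/2024) behavior for a persistent 5xx
# on robots.txt. Not Google's real logic, just the table restated as code.
def documented_behavior(days_with_5xx: float, has_cached_copy: bool) -> str:
    if days_with_5xx <= 0.5:  # roughly the first 12 hours
        return "Crawling is paused; Google keeps retrying the robots.txt."
    if days_with_5xx <= 30:
        if has_cached_copy:
            return "The last cached robots.txt is used and its rules are respected."
        return "No cached copy: retries continue, behavior depends on site availability."
    return "Google assumes there is no robots.txt and crawls normally, still retrying."


for days, cached in [(0.2, True), (10, True), (45, True)]:
    print(f"{days:>4} days (cache={cached}): {documented_behavior(days, cached)}")
```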

Let's go to the documented evidence.

Investigation into the effects of a 5xx error on robots.txt

We start from the beginning.

25/02/2025 The test begins

Cachón sets her website's robots.txt to return a 5XX; the last crawl and record that Google has of her robots.txt is from 09/02/2025.

This is what the last version of Cachón's robots.txt cached by Google looked like:

User-agent: *
Disallow: guia-definitiva-de-screaming-frog-impulsa-el-seo-de-tu-web/
Disallow: guia-screaming/
Disallow: 13-consejos-screaming-frog-que-debes-saber/
Disallow: wp-content/themes/twentytwenty/assets/js/index.js?ver=2.0
Allow: /wp-content/uploads/*
Allow: /wp-content/*.js
Allow: /wp-content/*.css
Allow: /wp-includes/*.js
Allow: /wp-includes/*.css
Allow: /*.css$
Allow: /*.js$
Disallow: /cgi-bin
Disallow: /wp-content/plugins/
Allow: /wp-content/plugins/*.jpeg
Allow: /wp-content/plugins/*.png
Allow: /wp-content/plugins/*.js
Allow: /wp-content/plugins/*.css
Disallow: /wp-content/themes/
Allow: /wp-content/themes/*.js
Allow: /wp-content/themes/*.css
Allow: /wp-content/themes/*.woff2
Disallow: /wp-includes/
Disallow: /*/attachment/
Disallow: /tag/*/page/
Disallow: /tag/*/feed/
Disallow: /page/
Disallow: /comments/
Disallow: /xmlrpc.php
Disallow: /?attachment_id*
Disallow: *?
Disallow: ?s=
Disallow: /search
Disallow: /trackback
Disallow: /*trackback
Disallow: /*trackback*
Disallow: /*/trackback
Allow: /feed/$
Disallow: /feed/
Disallow: /comments/feed/
Disallow: */feed/$
Disallow: */*/feed/$
Disallow: */feed/rss/$
Disallow: */trackback/$
Disallow: */*/feed/$
Disallow: */*/feed/rss/$
Disallow: */*/trackback/$
Disallow: */*/*/feed/$
Disallow: /*/*/*/feed/rss/$
Disallow: /*/*/*/trackback/$
Sitemap: https://rana.ninja/sitemap_index.xml
Sitemap: https://rana.ninja/post-sitemap.xml
Sitemap: https://rana.ninja/page-sitemap.xml

So, as an experiment, we test the crawling of https://rana.ninja/page/, which should be blocked according to that last cached version, and in fact that is what happens, despite the 5XX on the robots.txt.
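You can approximate that check locally with Python's standard library. Keep in mind that urllib.robotparser does not fully implement Google's wildcard and $ semantics, so this sketch only feeds it the plain Disallow: /page/ rule from the cached file, which is enough to reproduce this particular block.

```python
# Test /page/ against the rule that blocks it in the cached robots.txt.
# Shortened on purpose: robotparser ignores Google-style wildcards, so we
# only keep the rule relevant to this URL.
from urllib.robotparser import RobotFileParser

cached_rules = """
User-agent: *
Disallow: /page/
"""

parser = RobotFileParser()
parser.parse(cached_rules.splitlines())

# Prints False: the URL is disallowed for Googlebot under the cached rules.
print(parser.can_fetch("Googlebot", "https://rana.ninja/page/"))
```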

Screenshot: robots.txt test showing https://rana.ninja/page/ as blocked

You can also verify that, even when testing in real time, the URL cannot be crawled despite the 5XX on the robots.txt.
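You can also document the situation from outside Search Console with a quick status check. The sketch below just logs the HTTP status of the robots.txt and the home page; run today, after the experiment has ended, the robots.txt will of course answer normally again.

```python
# Log the HTTP status of robots.txt vs a normal page to document the 5xx.
import urllib.error
import urllib.request
from datetime import datetime, timezone


def http_status(url: str) -> int:
    """Return the HTTP status code of a GET request, including error codes."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


for url in ("https://rana.ninja/robots.txt", "https://rana.ninja/"):
    print(datetime.now(timezone.utc).isoformat(), url, http_status(url))
```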

Conclusions

During the first days after leaving the robots.txt returning a 503, this is what happens:

12/03/2025 Review of the website status

Apparently, Google continues to respect the robots.txt directives.

The only noticeable difference after these days is that Google Search Console has removed the oldest cached version:

09/05/2025 Final conclusions

We wanted to check whether the disaster Gary Illyes predicted would hit the website, but after several months, what happens is the following:

The cached robots.txt history disappears:

The website content is totally crawlable and indexable:

The page https://rana.ninja/page/ doesn't appear as blocked by the robots.txt:

BUT, it is still not allowed to be indexed:

After a few months of the test, indexing drops dramatically:

Apparently, although the documentation says that the site is treated as if there were no robots.txt restrictions and that everything can be crawled, deindexing happens progressively while the 5xx error on the robots.txt persists.

And although there are URLs that continue to be shown in Google:

It's only a matter of time before this fatal deindexing spreads throughout the entire website:

Final conclusions

What we think may be more or less accurate. And although this case is somewhat marginal, these are the conclusions of the study when this happens:

Initial behavior (first month):

- Google keeps using the last cached version of the robots.txt and respects its rules, so URLs such as https://rana.ninja/page/ remain blocked despite the persistent 5XX.

Behavior after the first month:

- The cached robots.txt history disappears and the site is treated as having no crawl restrictions: the content is crawlable, but indexing starts to drop progressively.

Real final conclusion vs official documentation:

- The documentation is right that crawling continues as if there were no restrictions, but it does not mention the progressive deindexing we observed, which, given enough time, ends up spreading across the whole website.

Edit 26/05/2025: In the following image we can see that once the robots.txt was restored, the pages went back to being indexed normally.
