How to resolve Google "Indexed, though blocked by robots.txt"

Google isn't crawling your page, but it is indexing the URL. It isn't indexing the content of the page, just the URL itself, possibly along with anchor text of links that point to it. Google says:

A robotted page can still be indexed if linked to from from other sites While Google won't crawl or index the content blocked by robots.txt, we might still find and index a disallowed URL if it is linked from other places on the web. As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the page can still appear in Google search results. To properly prevent your URL from appearing in Google Search results, you should password-protect the files on your server or use the noindex meta tag or response header (or remove the page entirely).

The reason for this is that some important sites don't allow any crawling. One such site is (or was) the California DMV. It is important that users be able to search for the California DMV even if Google can't crawl the site. Google's Matt Cutts posted about this issue in 2006.

When Google indexes a page that is blocked by robots.txt it usually appears in the search results something like this (image source):

If you don't want the page indexed at all, you have to let Google crawl it and use the <meta name="robots" content="noindex"> tag. Keep in mind that if the page is blocked by robots.txt, Google will never be able to see that tag and the URL will still be indexed.

The other "experimental" option would be to use Noindex: rather than Disallow: in robots.txt. See How does “Noindex:” in robots.txt work? The only downside to this is that Google says they may stop supporting it at any point. Other search engines won't know what to do with that directive, so you would have to put it in a Google specific section of robots.txt. In 2019, Google announced that it no longer supports a noindex: directive in robots.txt.

How to resolve Google "Indexed, though blocked by robots.txt"

Tags:

Robots.Txt

Google Search Console

Related

Recent Posts