Drupal - Does Google crawl Drupal sites in maintenance mode?
When you put a Drupal site in maintenance mode, non-administrators see the standard maintenance mode page (assuming you clear caches after doing so). If you examine the response, you will see that it is sent with an HTTP status code of 503, which RFC 2616 defines as:
503 Service Unavailable
The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header. If no Retry-After is given, the client SHOULD handle the response as it would for a 500 response.
And from the Official Google Webmaster blog:
If my site is down for maintenance, how can I tell Googlebot to come back later rather than to index the "down for maintenance" page?
You should configure your server to return a status of 503 (network unavailable) rather than 200 (successful). That lets Googlebot know to try the pages again later.
So Drupal does the right thing: Googlebot is told to come back later, and Google will index your pages again once it receives a non-5xx status code.
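As a rough illustration of the crawler's side of this, the sketch below classifies a response by its status code and reads an optional Retry-After header. The function name `crawler_action` and its return values are made up for this example; it models a generic polite crawler and is not a description of how Googlebot is actually implemented.

```python
from typing import Mapping, Optional, Tuple


def crawler_action(status: int, headers: Mapping[str, str]) -> Tuple[str, Optional[int]]:
    """Return (action, retry_after_seconds) for a fetched page.

    Hypothetical model of a polite crawler, not Googlebot's real logic.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    if 500 <= status <= 599:
        # Retry-After may be a number of seconds or an HTTP date;
        # only the numeric form is handled in this sketch.
        raw = lowered.get("retry-after", "").strip()
        delay = int(raw) if raw.isdecimal() else None
        return ("retry-later", delay)

    if status == 200:
        return ("index", None)

    return ("skip", None)


# A site in maintenance mode answers 503, so nothing gets indexed.
print(crawler_action(503, {"Retry-After": "3600"}))  # ('retry-later', 3600)
print(crawler_action(200, {}))                       # ('index', None)
```

The point is only that a 503 tells the client to retry later, while a 200 on the maintenance page would risk that page being indexed instead.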
Unless you've done something custom to allow it, Google can't crawl your site in maintenance mode.
Because you need to be logged in to view the site, Googlebot will only see the designated maintenance page.
For additional guidelines from Google see the following links:
- Webmaster Guidelines: https://support.google.com/webmasters/answer/35769?hl=en
- SEO Guidelines (multi-page PDF): http://www.google.com/webmasters/docs/search-engine-optimization-starter-guide.pdf
- SEO Guidelines (single-page PDF): https://storage.googleapis.com/support-kms-prod/SNP_3027140_en_v0
Your concerns:
If you are not 100% sure that maintenance mode blocks everything you want blocked, and you are also very concerned that someone like Google might still access the site, then maintenance mode may be a bad choice for blocking your development site in the first place.
Recommendation:
Personally, I recommend simply adding a .htpasswd (HTTP Basic authentication) to your dev sites.
It is simple to automate, even inside Aegir deployments. It never gets in your way, because your browser and every command-line tool can supply the credentials for you. You can let other people in by giving them a login. It blocks the site completely from Google and other crawlers.
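To show why this does not get in the way of tooling, here is a minimal sketch of a script sending HTTP Basic credentials with Python's standard library. The URL, user name, and password are placeholders, and `build_request` is a helper invented for this example.

```python
import base64
import urllib.request


def build_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying HTTP Basic credentials."""
    # Basic auth is "user:password", base64-encoded, in the Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical dev URL and credentials, for illustration only.
req = build_request("https://dev.example.com/", "user", "pass")
print(req.get_header("Authorization"))  # Basic dXNlcjpwYXNz

# To actually fetch the page (needs a reachable server):
# with urllib.request.urlopen(req) as response:
#     print(response.status)
```

A crawler that has no credentials gets a 401 response instead of any page content, which is what keeps the dev site out of the index.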