Are web crawlers able to find a secondary robots.txt in a sub-directory?
No, web crawlers will not read or obey a robots.txt file in a subdirectory. As described on the quasi-official robotstxt.org site:
Where to put it
The short answer: in the top-level directory of your web server.
or on Google's help pages (emphasis mine):
A robots.txt file is a file *at the root of your site* that indicates those parts of your site you don’t want accessed by search engine crawlers.
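As a concrete illustration (using example.com as a placeholder domain), crawlers only ever request the root file; a robots.txt placed anywhere else is simply never fetched:

```
https://example.com/robots.txt           <- requested and obeyed by crawlers
https://example.com/private/robots.txt   <- never requested, has no effect
```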
In any case, using robots.txt to hide sensitive pages from search results is a bad idea, since search engines can still index pages disallowed in robots.txt if other pages link to them. Or, as described on the Google help page linked above:
You should not use robots.txt as a means to hide your web pages from Google Search results. This is because other pages might point to your page, and your page could get indexed that way, avoiding the robots.txt file.
So what should you do instead?
- You can let search engines crawl the pages (if they find them), but include a robots meta tag with the content noindex,nofollow. This will tell search engines not to index those pages even if they do find links to them, and not to follow any further links from those pages. (Of course, this will only work for HTML web pages.)
- For non-HTML resources, you can configure your web server (e.g. using an .htaccess file) to send the X-Robots-Tag HTTP header with the same content; see the sketch after this list.
- You can set up password authentication to protect the sensitive pages. Besides protecting the pages from unauthorized human visitors, it will also effectively keep web crawlers away.
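For the meta-tag option, the tag in each sensitive HTML page would be `<meta name="robots" content="noindex,nofollow">`. For the header option on an Apache server, a minimal .htaccess sketch (assuming mod_headers is enabled, and using PDFs as a stand-in for whatever non-HTML files you want kept out of the index) might look like this:

```
# Send the same directives as an HTTP header for non-HTML files (PDFs here)
<FilesMatch "\.pdf$">
    Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```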
Your robots.txt should be in the root directory and should not have any other name. According to the standard specification:
This file must be accessible via HTTP on the local URL "/robots.txt".
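So if the goal is to control crawling of a particular subdirectory, the rules for that directory belong in the single root robots.txt rather than in a second file inside the directory. A minimal sketch, with /private/ standing in for your actual path:

```
User-agent: *
Disallow: /private/
```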