How do random attackers discover websites to target?
Assume every malicious person already has the URL of your website. Actually, to be thorough, assume they have every URL on your website. Having a "hidden" or "private" website is no more a real security measure than having "hidden" directories. To assume otherwise is folly.
There are a variety of ways that people can discover a site:
- links might get placed on social networking sites (e.g. Twitter, Facebook)
- search engines like Shodan might find the content
- ...
So, as the other answers mention, it's safest to assume that the site is discoverable. One thing that hasn't been mentioned: if the site only needs to be accessed by people within your organisation, you could use .htaccess files to restrict access to specific source IP addresses (assuming your company has fixed external IP addresses, which most will).
There are a number of articles on the details of doing this; one example is here.
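As a rough illustration of source-IP allowlisting, here is the same idea sketched as Python WSGI middleware rather than as .htaccess rules (a swapped-in technique, not the Apache mechanism itself). The network ranges are hypothetical documentation addresses (RFC 5737) standing in for your organisation's real fixed external IPs, and the names `ALLOWED_NETWORKS`, `is_allowed` and `IPAllowlistMiddleware` are made up for this sketch. It denies by default, including malformed addresses.

```python
import ipaddress

# Hypothetical allowlist: replace with your organisation's real fixed external
# addresses. These are documentation ranges (RFC 5737), not real offices.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_allowed(client_ip, allowed=ALLOWED_NETWORKS):
    """Return True if client_ip falls inside one of the allowed networks."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Malformed or missing address: deny by default.
        return False
    # An IPv6 address is never "in" an IPv4 network, so it is denied here.
    return any(addr in net for net in allowed)


class IPAllowlistMiddleware:
    """WSGI middleware that answers 403 to clients outside the allowlist."""

    def __init__(self, app, allowed=ALLOWED_NETWORKS):
        self.app = app
        self.allowed = allowed

    def __call__(self, environ, start_response):
        # REMOTE_ADDR is the directly connected peer. Behind a reverse proxy it
        # is the proxy's address, so this check would need adjusting there.
        if not is_allowed(environ.get("REMOTE_ADDR", ""), self.allowed):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)
```

One design note: because `REMOTE_ADDR` is the direct peer, behind a load balancer every request appears to come from the balancer, and the check would have to trust a forwarded-address header set by that proxy alone. Enforcing the restriction in the web server or firewall, as suggested above, keeps it out of application code entirely.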
In addition to the ways you listed, here are a few more:
- IPv4 space is limited, so it is feasible to scan random (or sequential) IP addresses for web servers that are open and vulnerable. On its own this will not reveal every site on a name-based virtual host, because the server uses the HTTP Host header to decide which site to serve; a request to the bare IP address typically gets only the default site.
- Domain registration data is public. Attackers could feed a list of registered domains to a scanner, though the majority would be unused and parked.
- Online services like RobTex can be used to find lists of domains and hostnames that resolve to an IP address. They get this from regular DNS lookups, and the information could be used to find sites that are hosted at a particular provider, for example.
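To make the virtual-host point concrete, here is a minimal Python sketch, assuming a hypothetical server that hosts two name-based sites on one IP address and falls back to a default site for unrecognised Host values (roughly how common web servers behave, though the exact fallback depends on configuration). `VHOSTS`, `select_site`, `scan_by_ip` and `scan_by_names` are illustrative names, not a real API, and no network traffic is involved.

```python
# Hypothetical name-based virtual hosts sharing a single IP address.
VHOSTS = {
    "www.example.com": "public marketing site",
    "intranet.example.com": "internal staff site",
}
DEFAULT_SITE = "default page"  # served when the Host header matches nothing


def select_site(host_header, vhosts=VHOSTS, default=DEFAULT_SITE):
    """Pick a site from the Host header, as name-based virtual hosting does."""
    if not host_header:
        return default
    # Strip an optional ":port" suffix and normalise case, since hostnames are
    # case-insensitive. IPv6 literal hosts are not handled in this sketch.
    hostname = host_header.rsplit(":", 1)[0].lower()
    return vhosts.get(hostname, default)


def scan_by_ip(ip):
    # A scanner that knows only the IP address typically sends the address
    # itself as the Host header, so it lands on the default site.
    return select_site(ip)


def scan_by_names(hostnames):
    # A scanner fed hostnames (from registration data or a reverse-IP
    # service) sends each one as the Host header and reaches each site.
    return {name: select_site(name) for name in hostnames}
```

A scanner that only knows the address sees the default page, whereas one given the right hostnames reaches the internal site too. With a real HTTP client this corresponds to connecting to the IP address while setting the Host header to each candidate name.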