What causes the "This site may be hacked" or "The site ahead contains malware" messages?
I was about to answer this question, but while doing a bit of research I stumbled upon this excellent answer on Quora by Steve Gill, Co-Founder and ex-Chief Scientist of a premier cyber intelligence organization, and realised it answers the question better than I could.
I have worked with Google privately on some of their technology and was one of the few people they called for help when they themselves were hacked by Chinese attackers who stole their source code.
(See: "Google China hackers stole source code - researcher")
Google have quite a few tools in their arsenal, and more have likely been added since.
To name a few:
- They acquired and now fully host VirusTotal, one of the biggest AV engine aggregators in the world, which can simultaneously scan files with 40–50 antivirus engines looking for signs of malware.
- They have a network of independent proxies that load a webpage and run it inside a custom sandbox. Any unexpected system-level modifications detected inside that sandbox help automatically report unwanted malicious behavior.
- They natively perform malware detection on the sites they crawl, running them through simple algorithmic checks that look for well-known malicious third-party scripts and embedded code.
- They partner with third-party data providers that feed them potentially malicious seed data.
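To make the "simple algorithmic checks" idea more concrete, here is a minimal sketch of a signature-based scan over a page's HTML, using only Python's standard library. It assumes a hypothetical blocklist of script hosts (`KNOWN_BAD_SCRIPT_HOSTS`) and a hypothetical list of suspicious inline code snippets (`KNOWN_BAD_SNIPPETS`); the names, hosts, signatures, and the `scan_html` helper are illustrative only and do not reflect Google's actual implementation or data.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical blocklist of script hosts; a real scanner would use large, curated feeds.
KNOWN_BAD_SCRIPT_HOSTS = {"evil-cdn.example", "coinminer.example"}
# Hypothetical signatures for suspicious inline/embedded script code.
KNOWN_BAD_SNIPPETS = ("eval(unescape(", "document.write(unescape(")


class ScriptScanner(HTMLParser):
    """Records findings for <script> tags matching simple known-bad signatures."""

    def __init__(self):
        super().__init__()
        self.findings = []
        self._script_chunks = None  # None when not inside a <script> element

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        self._script_chunks = []
        # Check the host of an external script against the blocklist.
        src = dict(attrs).get("src")
        if src:
            host = urlparse(src).hostname or ""
            if host in KNOWN_BAD_SCRIPT_HOSTS:
                self.findings.append(f"external script from blocklisted host: {host}")

    def handle_data(self, data):
        # Buffer inline script text so a signature split across chunks is still found.
        if self._script_chunks is not None:
            self._script_chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._script_chunks is not None:
            body = "".join(self._script_chunks)
            for snippet in KNOWN_BAD_SNIPPETS:
                if snippet in body:
                    self.findings.append(f"inline script matches signature: {snippet}")
            self._script_chunks = None


def scan_html(html):
    """Return a list of human-readable findings for the given HTML page."""
    scanner = ScriptScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.findings
```

For example, `scan_html('<script src="https://evil-cdn.example/x.js"></script>')` would report the blocklisted host. Checks like this are cheap enough to run on every crawled page, which is presumably why they complement, rather than replace, heavier approaches such as sandboxed execution.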
It is Google’s job to present data of the highest quality for the browsing and searching experience. Helping prevent attacks against its users is a key part of that tenet.
Quite a few tools also help web administrators deal with such problems proactively, such as reporting on malicious content by AS (BGP Autonomous System-level reporting), or reporting directly to the administrator through tools such as Google Webmaster Tools.