How do you detect a VPN or Proxy connection?
List of Tor exit nodes is publicly available. You only want "exit nodes" and it's available as CSV. This should be 100% complete and accurate as it's generated directly from Tor directory.
A free list of open proxies is available from iblocklist.com. A free list that incorporates open proxies, Tor nodes and VPN endpoints from ip2location.com.
The last two have most likely limited coverage and accuracy, especially as it comes to VPN exit nodes - there's just too many of them. Some providers take another approach and consider all "hosted subnets" (subnets from which ISPs assign their clients IPs for hosted servers) as some kind of VPN or proxy, as end-users should be connecting from "consumer" subnets.
The simplest way to do this is to use an external service like an API to block VPN or proxy users.
MaxMind and GetIPIntel both offer it via API, you might want to give it a try. GetIPIntel provides free API service so I suggest you try that first.
For OpenVPN, someone used unique MSS values to identify VPN connections but the setup is complicated and it might be "patched" now.
The strategies you've mentioned in your edits don't seem like a very good idea because you'll run into many false positives. Sending out port scans whenever they connect to your service is going to take a lot of time and resources before you get the results.
Unfortunately, there's is no proper technical way to get the information you want. You might invent some tests, but those will have a very low correlation with the reality. So either you'll not catch those you want, or you'll have a larger number of false positives. Neither can be considered to make sense.
Generating any kind of traffic backwards from an Internet server in response to an incoming client (a port scan, or even a simple ping) is generally frowned upon. Or, in the case of a port scan, it may be even worse for you, eg when the client lives behind a central corporate firewall, the worst of which is when the client comes from behind the central government network firewall pool...
Frankly, IP-based bans (or actually, any kind of limiting focusing on people who do not exclusively possess their public IP address: proxy servers, VPNs, NAT devices, etc) have been unrealistic for a long time, and as the IPv4 pools have been getting depleted in many parts of the world, ISPs are putting more and more clients behind large NAT pools (it's this week's news in my country that the largest ISP, a subsidiary of Deutsche Telekom, has started handing out private IPv4 addresses as a standard way of business to its customers, and people have to ask the provider explicitly to get a public IP address), so there's even less and less point in doing so. If you want to ban clients, you should ban them based on identity (account), and not based on IP address.
At IPinfo we offer a privacy detection API, which will let you know if a connection is coming from a VPN, an anonymous proxy, a tor exit node, or a hosting provider (which could be used to tunnel traffic). Here's an example:
$ curl ipinfo.io/43.241.71.120/privacy?token=$TOKEN
{
"vpn": true,
"proxy": false,
"tor": false,
"hosting": true
}
If you wanted to block connections to your site from VPNs then you could make an API request to get this information, and reply with an error if it's detected as a VPN. In PHP that would look something like this:
$ip = $_SERVER['REMOTE_ADDR'];
$url = "http://ipinfo.io/{$ip}/privacy?token={$IPINFO_API_TOKEN}";
$details = json_decode(file_get_contents($url));
// Just block VPNs
if($details->vpn) {
return echo "VPN Access Blocked!";
}
// Or we could block all the other types of private / anonymous connections...
if($details->vpn || $details->proxy || $details->tor || $details->hosting) {
return echo "Access Blocked!";
}