How dangerous is it to allow arbitrary webhook urls to post to?
In addition to validating webhook URLs, implement rate limiting on your API endpoints, on your outgoing webhook calls, or both.
If you have a big/popular enough service you should have this even if you don't allow users to have custom callback URLs — eventually somebody will attempt to make a million requests against a resource-intensive (for you) API endpoint, and you really should have protection in place.
That is, you don't have to try to detect every "malicious" URL; just have logic in place that cuts off access when a single account makes X requests within Y minutes (add more complex logic as needed).
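To make the "X requests within Y minutes" idea concrete, here is a minimal in-memory sketch of a per-account sliding-window limiter. The class name, the `allow` method and the injectable `clock` are my own illustrative choices, not part of any library; a real service would more likely keep the counters in a shared store so that all API workers see the same state.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per account within window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # account_id -> timestamps of recent requests

    def allow(self, account_id):
        now = self.clock()
        hits = self._hits[account_id]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject without recording the attempt
        hits.append(now)
        return True
```

Something like `SlidingWindowLimiter(max_requests=100, window_seconds=600)` would then be consulted once per API call and once per outgoing webhook, keyed by account.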
——
Of course you should also:
- run webhooks asynchronously (in the background) so your API speed does not depend on a 3rd party.
- run basic checks that prevent infinite redirect loops, and (for example) blacklist your own domain (make sure to check the blacklist for each redirect target separately).
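The redirect check could look roughly like the sketch below: follow redirects by hand, cap the number of hops, and re-check the blacklist on every hop. `BLOCKED_HOSTS`, `MAX_REDIRECTS` and the `get_redirect` callable are hypothetical placeholders; in practice `get_redirect` would issue the request with automatic redirect following turned off in whatever HTTP client you use.

```python
from urllib.parse import urljoin, urlsplit

BLOCKED_HOSTS = {"example.com"}  # hypothetical: put your own domain(s) here
MAX_REDIRECTS = 5                # hypothetical cap on redirect hops


def host_is_blocked(url):
    host = (urlsplit(url).hostname or "").lower()
    # Match the blocked domain itself and any of its subdomains.
    return any(host == b or host.endswith("." + b) for b in BLOCKED_HOSTS)


def resolve_target(url, get_redirect):
    """Follow redirects manually, checking every hop.

    get_redirect(url) is an assumed callable: it returns the Location
    header value if the URL answers with a redirect, otherwise None.
    """
    seen = set()
    for _ in range(MAX_REDIRECTS + 1):
        if host_is_blocked(url):
            raise ValueError("blocked host: " + url)
        if url in seen:
            raise ValueError("redirect loop at: " + url)
        seen.add(url)
        location = get_redirect(url)
        if location is None:
            return url  # final, non-redirecting target
        url = urljoin(url, location)  # Location may be relative
    raise ValueError("too many redirects")
```

Checking only the URL the user submitted is not enough, because an innocent-looking host can redirect straight back to your own API; that is why the blacklist test sits inside the loop.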
These concerns are all very valid and we've had to consider them when writing the PubSubHubbub spec. We 'solved' most of them by having a verification step which involves executing the webhook and expecting a specific answer from it. Basically when a new hook is added, it's called with a GET request and an additional 'hub.challenge' param. The webhook then MUST respond with a 200 and echo the hub.challenge in the body.
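As a rough illustration of that verification step from the sender's side: generate a random challenge, append it as `hub.challenge`, and accept the hook only if the response is a 200 whose body echoes the challenge. The function names and the `http_get` callable (assumed to return a `(status_code, body_text)` pair) are illustrative only, and the spec defines further parameters that are left out here.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_verification_url(callback_url, challenge):
    # Keep any query parameters the callback URL already has.
    parts = urlsplit(callback_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("hub.challenge", challenge))
    return urlunsplit(parts._replace(query=urlencode(query)))


def verify_callback(callback_url, http_get):
    """Return True only if the endpoint echoes a fresh random challenge.

    http_get(url) is an assumed callable returning (status_code, body_text).
    """
    challenge = secrets.token_urlsafe(32)  # unguessable, so it cannot be pre-computed
    status, body = http_get(build_verification_url(callback_url, challenge))
    return status == 200 and body == challenge
```

An endpoint that was never written to be a webhook receiver (an internal admin page, somebody else's API) will not echo the challenge, so the hook is never activated for it.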
Generally, I would recommend looking at PubSubHubbub when implementing a webhooks "API", because that's what it is :) Initially it was designed for RSS/Atom but it now works with any type of resource, including JSON. It also provides mechanisms for secure delivery (using a secret and HMAC signatures).
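The secret-plus-HMAC idea can be sketched with nothing but the standard library. The `algorithm=hexdigest` header value follows the style of the `X-Hub-Signature` header as I remember it; the function names and the default of SHA-256 are my assumptions, so check the spec for the algorithms it actually permits.

```python
import hashlib
import hmac

ALLOWED_ALGORITHMS = {"sha1", "sha256", "sha384", "sha512"}  # assumed allow-list


def sign_payload(secret: bytes, body: bytes, algorithm: str = "sha256") -> str:
    """Sender side: compute the value to put in the signature header."""
    digest = hmac.new(secret, body, getattr(hashlib, algorithm)).hexdigest()
    return f"{algorithm}={digest}"


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Receiver side: recompute the HMAC over the raw body and compare."""
    algorithm, _, _ = header_value.partition("=")
    if algorithm not in ALLOWED_ALGORITHMS:
        return False
    expected = sign_payload(secret, body, algorithm)
    # Constant-time comparison; bytes avoid errors on non-ASCII header input.
    return hmac.compare_digest(expected.encode(), header_value.encode("utf-8"))
```

The signature has to be computed over the exact bytes that were sent, so the receiver should verify against the raw request body before parsing it.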
Bringing this down from comment because it's important.
A concern to address as well is internal targets. For example, an attacker could pass an address that is internal to your network that would normally be firewalled off and/or hidden behind NAT. The implications of this vary wildly based on your application.
Consider also the possibility of reaching infrastructure resources such as AWS S3 buckets, DynamoDB tables, or SQS queues. Depending on your configuration, you could be allowing arbitrary writes to these resources. It may even be possible for an attacker to read or write data in one of these services, and then have that data exfiltrated via a regenerative webhook (one that triggers another webhook which echoes the data) back to their own server!
This is something to consider when validating webhook URLs.
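A minimal sketch of such a check, assuming Python's standard `ipaddress` module is an acceptable judge of what counts as internal: resolve the host and reject the URL if any resulting address is private, loopback, link-local (which covers the 169.254.169.254 cloud metadata address) or otherwise non-public. The function names and the injectable `resolver` are illustrative, not an established API.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def ip_is_public(ip_text):
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return False  # unparseable address: refuse rather than guess
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) so the IPv4 rules apply.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return not (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_multicast or ip.is_reserved or ip.is_unspecified)


def resolve_host(host):
    """Return every address the host resolves to (performs a DNS lookup)."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def url_targets_public_host(url, resolver=resolve_host):
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addresses = resolver(parts.hostname)
    except OSError:
        return False  # resolution failed
    # Every resolved address must be public, not just the first one.
    return bool(addresses) and all(ip_is_public(a) for a in addresses)
```

One caveat: DNS can return a different answer between this check and the actual request, so ideally the request is sent to the address that was validated, and the check is repeated for every redirect target.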