Dogfooding our own rate-limited API
If this is causing you a problem, it will cause your putative ecosystem of developers a problem (e.g. when they try to develop an alternative UI). If you are really eating your own dog food, make the API (and the rate limiting) work for your application. Here are some suggestions:
Do not rate limit by IP address. Rather, rate limit by something associated with the user, e.g. their user ID. Apply the rate limit at the authentication stage.
Design your API so that users do not need to call it continuously (e.g. provide a list call that returns many results, rather than a repeated call that returns one item each time).
Design your web app with the same constraints you expect your developer ecosystem to have, i.e. ensure you can design it within reasonable throttling rates.
Ensure your back end is scalable (horizontally preferably) so you don't need to impose throttling at levels so low it actually causes a problem to a UI.
Ensure your throttling has the ability to cope with bursts, as well as limiting longer term abuse.
Ensure your throttling performs sensible actions tailored to the abuse you are seeking to remove. For instance, consider queuing or delaying mild abusers rather than refusing the connection. Most web front ends will only open four simultaneous connections. If you delay an attempt to open a fifth, you'll only hit the case where they are using a CLI at the same time as the web client (or two web clients). If you delay the n-th API call without a gap rather than failing it, the end user will see things slow down rather than break. If you combine this with only queuing N API calls at once, you will only hit people who are parallelising large numbers of API calls, which is probably not behaviour you want to support anyway - e.g. 100 simultaneous API calls then a gap for an hour is normally far worse than 100 sequential API calls over an hour.
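Several of the suggestions above (limit per user rather than per IP, allow bursts, delay rather than refuse) can be combined in a token bucket keyed by user ID. The following is only a minimal sketch under assumptions: the class name UserThrottle and the parameters rate, burst and max_queued are hypothetical, and a real deployment would need shared storage and locking rather than an in-process dictionary.

```python
import time
from typing import Dict, Optional, Tuple


class UserThrottle:
    """Per-user token bucket that delays mild overuse and refuses only heavy overuse.

    rate       -- sustained calls per second allowed for each user (hypothetical name)
    burst      -- calls a user may make back to back before any delay applies
    max_queued -- how many delayed calls a user may have pending before refusal
    """

    def __init__(self, rate: float, burst: int, max_queued: int) -> None:
        self.rate = rate
        self.burst = burst
        self.max_queued = max_queued
        # user_id -> (token balance, time of last update); a negative balance
        # means that many calls are currently queued behind the limit.
        self._state: Dict[str, Tuple[float, float]] = {}

    def check(self, user_id: str, now: Optional[float] = None) -> Optional[float]:
        """Return the number of seconds to delay this call, or None to refuse it."""
        now = time.monotonic() if now is None else now
        tokens, last = self._state.get(user_id, (float(self.burst), now))
        # Refill for the elapsed time, never beyond the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens - 1 < -self.max_queued:
            # Too many calls already queued: refuse without consuming a token.
            self._state[user_id] = (tokens, now)
            return None
        tokens -= 1
        self._state[user_id] = (tokens, now)
        # A non-negative balance runs immediately; otherwise wait out the deficit.
        return 0.0 if tokens >= 0 else -tokens / self.rate
```

With rate=1.0, burst=4 and max_queued=2, a user's first four calls run immediately, the fifth and sixth are delayed by one and two seconds, and only the seventh simultaneous call is refused. Another user is unaffected because the state is keyed by user ID.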
Did this not answer your question? Well, if you really need to do what you are asking, rate-limit at the authentication stage and apply a different rate limit based on the group your user fits into. Requests made with one particular set of credentials (the ones used by your devs and QA team) get a higher rate limit. But you can immediately see why this will inevitably lead to your ecosystem seeing issues that your dev and QA team do not see.
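As a rough illustration of a group-based limit chosen at the authentication stage, here is a small sketch. The group names and the numbers are made up for the example, and limit_for is a hypothetical helper rather than part of any framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limit:
    calls_per_hour: int


# Hypothetical groups and thresholds; pick values that suit your own service.
LIMITS_BY_GROUP = {
    "internal": Limit(calls_per_hour=100_000),  # dev and QA credentials
    "paid": Limit(calls_per_hour=1_000),
    "anonymous": Limit(calls_per_hour=60),
}


def limit_for(user_group: Optional[str]) -> Limit:
    """Look up the limit for an authenticated user's group.

    Unknown or missing groups fall back to the most restrictive tier.
    """
    return LIMITS_BY_GROUP.get(user_group or "anonymous", LIMITS_BY_GROUP["anonymous"])
```

The fallback to the strictest tier is a deliberate choice here: a misconfigured or unrecognised group should never end up with the internal limit.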
Since your own JavaScript client is accessing the API directly, anyone is going to be able to look at what it's doing and mimic it, including using the same API key. You can try to make it more difficult, for example by obfuscating your code or putting various hurdles in the way, but you and the person you're trying to restrain have fundamentally the same access. Instead of trying to create a difference in privileges, you'll need to construct a system where it's totally OK that the unofficial client uses all the access in its scope, but the system is arranged in such a way that official use across all clients is greater.
This is often done with per-user access tokens, as opposed to one token for the entire application. Each token's limit should be plenty for typical use of your API, but restrictive for someone trying to abuse it. For example, 100 calls per minute might be more than enough to support typical browsing, but if I want to scrape you, I can't do it effectively on that budget.
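A per-token budget like the 100 calls per minute mentioned above could be enforced with a sliding window of recent call times for each token. This is a sketch only: PerTokenLimiter and its parameters are hypothetical names, and the in-memory dictionary stands in for whatever shared store a real service would use.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class PerTokenLimiter:
    """Sliding-window limiter: at most max_calls per window_seconds for each token."""

    def __init__(self, max_calls: int = 100, window_seconds: float = 60.0) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        # access token -> timestamps of calls still inside the window
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, token: str, now: float) -> bool:
        """Record the call and return True if this token is still within budget."""
        calls = self._calls[token]
        # Forget calls that have aged out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True
```

Because each token has its own window, one user exhausting their budget has no effect on anyone else, which is the property that makes the unofficial client harmless at the level of a single account.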
There will always be an arms race - I can get around the limit by creating lots of bot user accounts. That, though, is a pretty solved problem if you just add a captcha to your signup flow, at a tiny bit of expense to the real human. When you get into these scenarios, everything's just a tradeoff between convenience and restriction. You'll never find something totally bulletproof, so focus on making it good enough and wait until someone exploits you to learn where the holes were.
Unfortunately, there is no perfect solution to this.
The general approach typically has three parts. First, provide a (spoofable) way for clients to identify themselves, e.g. an identifier, a version, and an API key. Second, have clients register information about themselves that can be used to limit access (e.g. the client is a server in a given IP address range, so only allow callers in that range; or the client is JavaScript, but delivered only to a specific category of browser, so only allow HTTP requests that specify certain user agent strings; etc.). Third, use machine learning/pattern recognition to detect anomalous usage that is likely to come from a spoofed client, and then reject traffic from these spoofed clients (or confirm with the legitimate client's owners that this usage is indeed not theirs, replace their spoofable credentials, and then disallow further traffic using the older spoofed credentials).
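The registration step (checking a request against what the client said about itself) might look roughly like this. It is a sketch under assumptions: ClientRegistration and request_matches_registration are hypothetical names, the user agent check is a simple prefix match, and the anomaly detection step is not covered at all.

```python
import ipaddress
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClientRegistration:
    """What a client registered about itself (hypothetical structure)."""
    api_key: str
    allowed_networks: Tuple[str, ...] = ()             # e.g. ("203.0.113.0/24",)
    allowed_user_agent_prefixes: Tuple[str, ...] = ()  # e.g. ("Mozilla/5.0",)


def request_matches_registration(
    reg: ClientRegistration, remote_ip: str, user_agent: str
) -> bool:
    """Return True if the request is consistent with the registered constraints.

    An empty constraint means "no restriction of that kind".
    """
    if reg.allowed_networks:
        ip = ipaddress.ip_address(remote_ip)
        if not any(ip in ipaddress.ip_network(net) for net in reg.allowed_networks):
            return False
    if reg.allowed_user_agent_prefixes:
        if not user_agent.startswith(reg.allowed_user_agent_prefixes):
            return False
    return True
```

Both values are trivially spoofable by a determined caller (the user agent especially), so a check like this only filters out casual reuse of the key; it does not replace the pattern detection described above.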
You can make it slightly more difficult to spoof by using multiple layers of key. For example, you give out a longer-lived credential that lives on a server (and that can only be used in a limited set of IP address ranges) to make an API call that records information about the client (e.g. the user agent) and returns a shorter-lived client-side key that is syndicated in JavaScript for use on the client for client-side API requests. This, too, is imperfect (a spoofer could issue the same server call to get the credential), but it will be more difficult if the returned API key is included in obfuscated (and frequently changing) JavaScript or HTML (which would make it difficult to reliably extract from the response). That also provides a way to more easily detect spoofing; the client-side key is now tied to a particular client (e.g. specific user agent, perhaps even a specific cookie jar) that makes reuse in another client easy to detect and the expiration also limits the duration in which the spoofed key may be reused.
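One way to sketch the short-lived client-side key is as an HMAC-signed value that embeds an expiry time and a hash of the user agent it was issued for. Everything named here is an assumption made for illustration: SERVER_SECRET stands in for the longer-lived server-side credential, the key format is invented, and issue_client_key / verify_client_key are hypothetical helpers.

```python
import hashlib
import hmac

# Stand-in for the longer-lived credential that never leaves the server.
SERVER_SECRET = b"hypothetical-long-lived-secret"


def _sign(payload: str) -> str:
    return hmac.new(SERVER_SECRET, payload.encode(), hashlib.sha256).hexdigest()


def _ua_hash(user_agent: str) -> str:
    return hashlib.sha256(user_agent.encode()).hexdigest()


def issue_client_key(user_agent: str, now: float, ttl_seconds: int = 300) -> str:
    """Issue a short-lived key bound to the user agent that requested it."""
    payload = f"{int(now) + ttl_seconds}:{_ua_hash(user_agent)}"
    return f"{payload}:{_sign(payload)}"


def verify_client_key(key: str, user_agent: str, now: float) -> bool:
    """Accept the key only if it is untampered, unexpired, and used by the same user agent."""
    try:
        expires_str, ua_hash, signature = key.split(":")
        expires = int(expires_str)
    except ValueError:
        return False  # wrong number of fields or a non-numeric expiry
    expected = _sign(f"{expires_str}:{ua_hash}")
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    if now >= expires:
        return False
    return hmac.compare_digest(ua_hash.encode(), _ua_hash(user_agent).encode())
```

A key issued for one user agent fails verification when replayed from another, and stops working after ttl_seconds. As noted above, a spoofer can still copy the user agent along with the key, so this narrows the window for reuse rather than closing it.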
Buy your product. Become a paid customer of yourself.
"Anonymous access to our API has a very low threshold for API calls per hour, whereas our paid customers are permitted upwards of 1000 calls per hour or more."
This also helps test the system from a customer's perspective.