Akka Http Performance tuning
Some disclaimers first: I haven't worked with wrk
tool before, so I might get something wrong. Here are assumptions I've made for this answer:
- Connections count is independent from threads count, i.e. if I specify
-t4 -c10000
it keeps 10000 connections, not 4 * 10000. - For every connection the behavior is as follows: it sends the request, receives the response completely, and immediately sends the next one, etc., until the time runs out.
Also I've run the server on the same machine as wrk, and my machine seems to be weaker than yours (I have only dual-core CPU), so I've reduced wrk's thread counts to 2, and connection count to 1000, to get decent results.
The Akka Http version I've used is the 10.0.1
, and wrk version is 4.0.2
.
Now to the answer. Let's look at the blocking code you have:
Future { // Blocking code
Thread.sleep(100)
"OK"
}
This means, every request will take at least 100 milliseconds. If you have 200 threads, and 1000 connections, the timeline will be as follows:
Msg: 0 200 400 600 800 1000 1200 2000
|--------|--------|--------|--------|--------|--------|---..---|---...
Ms: 0 100 200 300 400 500 600 1000
Where Msg
is amount of processed messages, Ms
is elapsed time in milliseconds.
This gives us 2000 messages processed per second, or ~60000 messages per 30 seconds, which mostly agrees to the test figures:
wrk -t2 -c1000 -d 30s --timeout 10s --latency http://localhost:8080/hello
Running 30s test @ http://localhost:8080/hello
2 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 412.30ms 126.87ms 631.78ms 82.89%
Req/Sec 0.95k 204.41 1.40k 75.73%
Latency Distribution
50% 455.18ms
75% 512.93ms
90% 517.72ms
99% 528.19ms
here: --> 56104 requests in 30.09s <--, 7.70MB read
Socket errors: connect 0, read 1349, write 14, timeout 0
Requests/sec: 1864.76
Transfer/sec: 262.23KB
It is also obvious that this number (2000 messages per second) is strictly bound by the threads count. E.g. if we would have 300 threads, we'd process 300 messages every 100 ms, so we'd have 3000 messages per second, if our system can handle so many threads. Let's see how we'll fare if we provide 1 thread per connection, i.e. 1000 threads in pool:
wrk -t2 -c1000 -d 30s --timeout 10s --latency http://localhost:8080/hello
Running 30s test @ http://localhost:8080/hello
2 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 107.08ms 16.86ms 582.44ms 97.24%
Req/Sec 3.80k 1.22k 5.05k 79.28%
Latency Distribution
50% 104.77ms
75% 106.74ms
90% 110.01ms
99% 155.24ms
223751 requests in 30.08s, 30.73MB read
Socket errors: connect 0, read 1149, write 1, timeout 0
Requests/sec: 7439.64
Transfer/sec: 1.02MB
As you can see, now one request takes almost exactly 100ms on average, i.e. the same amount we put into Thread.sleep
. It seems we can't get much faster than this! Now we're pretty much in standard situation of one thread per request
, which worked pretty well for many years until the asynchronous IO let servers scale up much higher.
For the sake of comparison, here's the fully non-blocking test results on my machine with default fork-join thread pool:
complete {
Future {
"OK"
}
}
====>
wrk -t2 -c1000 -d 30s --timeout 10s --latency http://localhost:8080/hello
Running 30s test @ http://localhost:8080/hello
2 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 15.50ms 14.35ms 468.11ms 93.43%
Req/Sec 22.00k 5.99k 34.67k 72.95%
Latency Distribution
50% 13.16ms
75% 18.77ms
90% 25.72ms
99% 66.65ms
1289402 requests in 30.02s, 177.07MB read
Socket errors: connect 0, read 1103, write 42, timeout 0
Requests/sec: 42946.15
Transfer/sec: 5.90MB
To summarize, if you use blocking operations, you need one thread per request to achieve the best throughput, so configure your thread pool accordingly. There are natural limits for how many threads your system can handle, and you might need to tune your OS for maximum threads count. For best throughput, avoid blocking operations.
Also don't confuse asynchronous operations with non-blocking ones. Your code with Future
and Thread.sleep
is a perfect example of asynchronous, but blocking operation. Lots of popular software operates in this mode (some legacy HTTP clients, Cassandra drivers, AWS Java SDKs, etc.). To fully reap the benefits of non-blocking HTTP server, you need to be non-blocking all the way down, not just asynchronous. It might not be always possible, but it's something to strive for.