Multiple unary RPC calls vs. long-running bidirectional streaming in gRPC?

Streams guarantee that messages are delivered in the order they were sent, which means that concurrent messages sharing a single stream are serialized and can become a bottleneck.

Google’s gRPC team advises against using streams in place of unary calls for performance reasons. There have been arguments that streams should, in theory, have lower overhead, but in practice that does not seem to hold.

At lower levels of concurrency, both approaches show comparable latencies. Under higher load, however, unary calls are significantly more performant.

There is no apparent reason to prefer streams over unary calls, given that streams come with additional problems such as:

  • Poor latency when there are concurrent requests
  • More complex implementation at the application level
  • Lack of load balancing: the client connects to one server and ignores any newly added servers
  • Poor resilience to network interruptions (even a small interruption in the TCP connection will fail the entire stream)

Some benchmarks here: https://nshnt.medium.com/using-grpc-streams-for-unary-calls-cd64a1638c8a
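
To make the concurrency point above concrete, here is a minimal Go sketch comparing the two patterns. It assumes a hypothetical Echo service with generated stubs (pb.NewEchoClient, UnaryEcho, BidiEcho); the names are illustrative and not taken from the linked benchmark. Concurrent unary calls can be issued in parallel and multiplexed over one HTTP/2 connection, while messages on a single stream are sent in order.

```go
package main

import (
	"context"
	"log"
	"sync"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	pb "example.com/echo/pb" // hypothetical generated package
)

func main() {
	conn, err := grpc.Dial("localhost:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()
	client := pb.NewEchoClient(conn)

	// Unary: each request is an independent RPC, so requests can be issued
	// concurrently and multiplexed over the same HTTP/2 connection.
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if _, err := client.UnaryEcho(context.Background(), &pb.EchoRequest{Message: "ping"}); err != nil {
				log.Printf("unary call %d failed: %v", i, err)
			}
		}(i)
	}
	wg.Wait()

	// Streaming: all messages share one stream, so sends are ordered and
	// effectively serialized; a slow message delays everything behind it.
	stream, err := client.BidiEcho(context.Background())
	if err != nil {
		log.Fatalf("open stream: %v", err)
	}
	go func() {
		// Drain server responses so flow control does not stall the sends.
		for {
			if _, err := stream.Recv(); err != nil {
				return
			}
		}
	}()
	for i := 0; i < 100; i++ {
		if err := stream.Send(&pb.EchoRequest{Message: "ping"}); err != nil {
			log.Printf("send %d failed: %v", i, err)
			break
		}
	}
	stream.CloseSend()
}
```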


The issue is that client-side streaming cannot ensure reliable delivery at the application level (i.e. if the stream closes partway through, how many of the messages that were sent were actually processed by the server?), and I can't afford that.

This implies you need a response. Even if the response is just an acknowledgement, it is still a response from gRPC's perspective.
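
One way to see the difference: with unary calls, the response itself is the per-message acknowledgement, while with client streaming the server typically answers only once, when the stream is closed. A minimal Go sketch, assuming hypothetical generated stubs (pb.RecordServiceClient with StoreRecord and StoreRecords methods):

```go
package recordstore

import (
	"context"

	pb "example.com/records/pb" // hypothetical generated package
)

// Unary: every message gets its own response, so each successful return is
// an application-level acknowledgement for exactly that message.
func storeUnary(ctx context.Context, client pb.RecordServiceClient, records []*pb.Record) error {
	for _, rec := range records {
		if _, err := client.StoreRecord(ctx, rec); err != nil {
			return err // we know exactly which record was not acknowledged
		}
	}
	return nil
}

// Client streaming: the server replies once, after CloseAndRecv. If the
// stream breaks midway, the client cannot tell how many of the messages it
// already sent were actually processed, unless it layers its own
// acknowledgement protocol on top of the stream.
func storeStreaming(ctx context.Context, client pb.RecordServiceClient, records []*pb.Record) error {
	stream, err := client.StoreRecords(ctx)
	if err != nil {
		return err
	}
	for _, rec := range records {
		if err := stream.Send(rec); err != nil {
			return err // stream broken; fate of earlier sends is unknown
		}
	}
	_, err = stream.CloseAndRecv()
	return err
}
```

If per-message delivery guarantees matter, the unary shape (or an explicit application-level ack on the stream) is what provides them.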

The general approach should be "use unary," unless streaming solves problems large enough to outweigh its complexity costs. I discussed this at CloudNativeCon NA 2018 (there is a link to the slides and to the video on YouTube).

For example, if you have multiple backends, each unary RPC may be sent to a different backend, which can cause high overhead as those backends synchronize with each other. A streaming RPC chooses a backend at the beginning and keeps using that same backend, so streaming might reduce the frequency of backend synchronization and allow higher performance in the service implementation. But streaming adds complexity when errors occur, and in this case it makes the RPCs long-lived, which are more complicated to load balance. So you need to weigh whether the added complexity from streaming/long-lived RPCs provides a large enough benefit to your application.
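
A rough Go sketch of that difference in backend selection, assuming the same hypothetical Echo stubs as above and gRPC's built-in round_robin policy (a real deployment's resolver and load-balancing setup may differ):

```go
package main

import (
	"context"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	pb "example.com/echo/pb" // hypothetical generated package
)

func main() {
	// With round_robin, the load balancer picks a backend per RPC.
	conn, err := grpc.Dial(
		"dns:///echo.internal:50051", // hypothetical target resolving to multiple backends
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig":[{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()
	client := pb.NewEchoClient(conn)

	// Unary: each call is a separate RPC, so calls spread across backends.
	for i := 0; i < 10; i++ {
		if _, err := client.UnaryEcho(context.Background(), &pb.EchoRequest{Message: "ping"}); err != nil {
			log.Printf("unary call %d failed: %v", i, err)
		}
	}

	// Streaming: the backend is chosen once, when the stream is opened.
	// Every Send below goes to that same backend for the life of the stream,
	// which is what lets the backend keep per-stream state cheaply, but also
	// what makes long-lived streams harder to load balance.
	stream, err := client.BidiEcho(context.Background())
	if err != nil {
		log.Fatalf("open stream: %v", err)
	}
	for i := 0; i < 10; i++ {
		// (Responses ignored for brevity; a real client would also Recv.)
		if err := stream.Send(&pb.EchoRequest{Message: "ping"}); err != nil {
			log.Printf("send %d failed: %v", i, err)
			break
		}
	}
	stream.CloseSend()
}
```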

We don't generally recommend using streaming RPCs for higher gRPC performance. It is true that sending a message on an existing stream is faster than starting a new unary RPC, but the improvement is fixed and comes at the cost of higher complexity. Instead, we recommend using streaming RPCs when they provide higher application-level (your code) performance or lower application-level complexity.