StackExchange redis client very slow compared to benchmark tests

My results from the code below:

Connecting to server...
Connected
PING (sync per op)
    1709ms for 1000000 ops on 50 threads took 1.709594 seconds
    585137 ops/s
SET (sync per op)
    759ms for 500000 ops on 50 threads took 0.7592914 seconds
    658761 ops/s
GET (sync per op)
    780ms for 500000 ops on 50 threads took 0.7806102 seconds
    641025 ops/s
PING (pipelined per thread)
    3751ms for 1000000 ops on 50 threads took 3.7510956 seconds
    266595 ops/s
SET (pipelined per thread)
    1781ms for 500000 ops on 50 threads took 1.7819831 seconds
    280741 ops/s
GET (pipelined per thread)
    1977ms for 500000 ops on 50 threads took 1.9772623 seconds
    252908 ops/s

===

Server configuration: make sure persistence (RDB snapshots and the AOF) is disabled, etc.
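For example (a minimal sketch, assuming you can edit redis.conf or issue CONFIG SET against a throwaway instance), turning off both persistence mechanisms for the duration of the run looks like:

save ""          # disable RDB snapshots
appendonly no    # disable the AOF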

The first thing you should do in a benchmark is: benchmark one thing. At the moment you're including a lot of serialization overhead, which won't help get a clear picture. Ideally, for a like-for-like benchmark, you should be using a 3-byte fixed payload, because that's what the benchmark you're comparing against uses:

3 bytes payload

Next, you'd need to look at parallelism:

50 parallel clients

It isn't clear whether your test is parallel, but if it isn't we should absolutely expect to see less raw throughput. Conveniently, SE.Redis is designed to be easy to parallelize: you can just spin up multiple threads talking to the same connection (this actually also has the advantage of avoiding packet fragmentation, as you can end up with multiple messages per packet, whereas a single-thread sync approach is guaranteed to use at most one message per packet).
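For reference, assuming the numbers being compared against come from redis-benchmark, a roughly like-for-like invocation would pin those same parameters explicitly (the -P flag is what makes redis-benchmark pipeline requests, which matters for the next point):

# 50 parallel clients, 3-byte payload, one request in flight per client
redis-benchmark -h 127.0.0.1 -t ping,set,get -n 1000000 -c 50 -d 3

# same, but pipelining 16 requests per client
redis-benchmark -h 127.0.0.1 -t ping,set,get -n 1000000 -c 50 -d 3 -P 16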

Finally, we need to understand what the listed benchmark is doing. Is it doing:

(send, receive) x n

or is it doing

send x n, receive separately until all n are received

? Both options are possible. Your sync API usage is the first one, but the second test is equally well-defined, and for all I know that's what it is measuring. There are two ways of simulating this second setup:

  • send the first (n-1) messages with the "fire and forget" flag, so you only actually wait for the last one (sketched below)
  • use the *Async API for all messages, and only Wait() or await the last Task (this is what the "pipelined per thread" runs in the benchmark below do)
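Here's a minimal sketch of the first option (the key, value and op count are illustrative, not part of the benchmark below): the first (n-1) commands are sent with the fire-and-forget flag, and only the final, normal call is waited on; since replies come back in send order on the connection, that one reply tells you the server has processed everything queued before it.

using StackExchange.Redis;

static class FireAndForgetSketch
{
    static void Main()
    {
        using (var muxer = ConnectionMultiplexer.Connect("127.0.0.1"))
        {
            var db = muxer.GetDatabase();
            RedisKey key = "some key";
            RedisValue value = new byte[3];
            const int n = 100000;

            // queue the first (n-1) SETs without waiting for any replies
            for (int i = 0; i < n - 1; i++)
            {
                db.StringSet(key, value, flags: CommandFlags.FireAndForget);
            }

            // the single synchronous call; waiting on its reply is the only round-trip paid for
            db.StringSet(key, value);
        }
    }
}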

Here's the benchmark I used to produce the numbers above; it shows both "sync per op" (via the sync API) and "pipelined per thread" (using the *Async API and waiting only for the last task per thread), both using 50 threads:

using StackExchange.Redis;
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

static class P
{
    static void Main()
    {
        Console.WriteLine("Connecting to server...");
        using (var muxer = ConnectionMultiplexer.Connect("127.0.0.1"))
        {
            Console.WriteLine("Connected");
            var db = muxer.GetDatabase();

            RedisKey key = "some key";
            byte[] payload = new byte[3];
            new Random(12345).NextBytes(payload);
            RedisValue value = payload;
            DoWork("PING (sync per op)", db, 1000000, 50, x => { x.Ping(); return null; });
            DoWork("SET (sync per op)", db, 500000, 50, x => { x.StringSet(key, value); return null; });
            DoWork("GET (sync per op)", db, 500000, 50, x => { x.StringGet(key); return null; });

            DoWork("PING (pipelined per thread)", db, 1000000, 50, x => x.PingAsync());
            DoWork("SET (pipelined per thread)", db, 500000, 50, x => x.StringSetAsync(key, value));
            DoWork("GET (pipelined per thread)", db, 500000, 50, x => x.StringGetAsync(key));
        }
    }
    static void DoWork(string action, IDatabase db, int count, int threads, Func<IDatabase, Task> op)
    {
        object startup = new object(), shutdown = new object();
        int activeThreads = 0, outstandingOps = count;
        Stopwatch sw = default(Stopwatch);
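        // Each worker rendezvouses at a startup barrier: the last thread to arrive starts
        // the stopwatch and releases the rest, so the timing covers only the actual ops.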
        var threadStart = new ThreadStart(() =>
        {
            lock(startup)
            {
                if(++activeThreads == threads)
                {
                    sw = Stopwatch.StartNew();
                    Monitor.PulseAll(startup);
                }
                else
                {
                    Monitor.Wait(startup);
                }
            }
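            // Claim ops from the shared counter. For the *Async tests each op returns a Task
            // and only the last one per thread is waited on (replies complete in send order);
            // the sync lambdas return null, so there is nothing extra to wait for.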
            Task final = null;
            while (Interlocked.Decrement(ref outstandingOps) >= 0)
            {
                final = op(db);
            }
            if (final != null) final.Wait();
            lock(shutdown)
            {
                if (--activeThreads == 0)
                {
                    sw.Stop();
                    Monitor.PulseAll(shutdown);
                }
            }
        });
        lock (shutdown)
        {
            for (int i = 0; i < threads; i++)
            {
                new Thread(threadStart).Start();
            }
            Monitor.Wait(shutdown);
            Console.WriteLine($@"{action}
    {sw.ElapsedMilliseconds}ms for {count} ops on {threads} threads took {sw.Elapsed.TotalSeconds} seconds
    {(count * 1000) / sw.ElapsedMilliseconds} ops/s");
        }
    }
}

===

You are fetching data in a synchronous way (50 clients in parallel, but each client's requests are made synchronously instead of asynchronously).

One option would be to use the async/await methods (StackExchange.Redis supports that).

If you need to get multiple keys at once (for example, to build a daily graph of visitors to your website, assuming you save a visitor counter under a per-day key), then you should try fetching data from Redis asynchronously using Redis pipelining; this should give you much better performance.
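For example, here's a rough sketch of that idea (the "visitors:yyyy-MM-dd" key naming and the 30-day window are assumptions for illustration): all the GETs are issued up front via StringGetAsync so they share round-trips, and the tasks are only awaited once everything has been sent.

using System;
using System.Linq;
using System.Threading.Tasks;
using StackExchange.Redis;

static class VisitorGraphSketch
{
    static async Task Main()
    {
        using (var muxer = await ConnectionMultiplexer.ConnectAsync("127.0.0.1"))
        {
            var db = muxer.GetDatabase();
            var today = DateTime.UtcNow.Date;

            // issue all 30 GETs before awaiting anything, so they are pipelined
            // onto the shared connection instead of paying 30 sequential round-trips
            var pending = Enumerable.Range(0, 30)
                .Select(i => db.StringGetAsync("visitors:" + today.AddDays(-i).ToString("yyyy-MM-dd")))
                .ToArray();

            RedisValue[] counts = await Task.WhenAll(pending);

            for (int i = 0; i < counts.Length; i++)
            {
                Console.WriteLine($"{today.AddDays(-i):yyyy-MM-dd}: {(counts[i].HasValue ? (long)counts[i] : 0)}");
            }
        }
    }
}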