Why does a server CPU perform tasks faster than a MacBook Pro CPU with the same benchmark score?
Benchmarks are vague handwaves at some very specific performance characteristics (peak instruction rate) and often do not take other factors in the system into account.
A non-exhaustive list of things that can make a big difference to programs but not peak instruction rates:
- Memory. Type, bandwidth, channels. These all make a difference in how fast data can get to the CPU for it to do work. Servers typically have more channels of RAM, higher capacities and much higher peak bandwidth figures than desktop or laptop CPUs. A high single-core instruction rate wins you nothing if you can't get data to the CPU fast enough to sustain it (the working-set sketch after this list illustrates the effect).
As a simple check I had a look and the 8180 Xeon (closest I could find) has 6 memory channels, while your laptop CPU would (hopefully) have 2 channels set up (or could have been poorly designed and only have one). The server has 3 times the memory bandwidth of your laptop. That will make a massive difference for memory-intensive tasks.
- Hard disk. Faster hard disks, SSDs and so on can make a big difference in getting data into memory for the CPU to work on. An SSD is orders of magnitude faster at seeking for small bits of data, and its bulk transfer rate is much higher too. NVMe is faster again. Servers often use RAID for redundancy, or it can be set up for raw speed. While both machines may have NVMe drives, a server farm may well have enterprise-class disks in a RAID 0 or 0+1 array and be faster than your single disk, which is particularly likely on shared machines where minimal impact across VMs is desirable.
- Thermal limiting. Benchmarks, especially on laptops and ultra-portable machines, often only last long enough to see the initial ramp-up of performance. Over time the thermal headroom is exhausted as the fans fail to keep up with heat output, and that initial turbo-boost speed drops down to the "normal" peak clock frequency. This can skew benchmark results and make a laptop look a lot better than it will perform under long-term loads. Servers tend to have over-specified (and loud) cooling systems to ensure sustained performance; laptops are designed for quiet home comfort and their fans are far less powerful. What you see in a benchmark may not have suffered the same thermal limiting as the machine in front of you: yours may not perform as well and may throttle sooner (the sustained-load probe after this list is one way to watch this happen).
- Bottlenecks. Servers have far more I/O than laptops: more PCIe lanes, more dedicated I/O ports and much higher bandwidth to peripherals, meaning more data in flight down uncontested paths. Multiple PCIe devices contending for time on a multiplexer connected to a 16-lane CPU will be slower than the same devices on a CPU with 40+ dedicated lanes.
- Cores. Having more cores makes a difference not only to the task you are running on one core, but also means tasks are not fighting each other for time. The tradeoff is that it is easier to hit memory bandwidth limits with more cores vying for bus time.
- Caches. Server CPUs tend to have much larger caches. While this is more of an optimisation, a larger cache means less time going out to memory and lets the CPU sit at its peak performance more often than a smaller cache would (the working-set sketch below shows the cliff once data spills out of cache). A single-core benchmark is probably small enough to fit in most cache sizes, and so tells you nothing about the rest of the system.
- Graphics. Related to PCIe/memory bus contention, your laptop will be doing graphics work, most likely with an iGPU. That means your system memory is being used (and memory bandwidth stolen) in order to drive a graphical display. The server likely has none of that, being a headless node in a compute cluster, and so carries far less graphical overhead.
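To make the memory and cache points concrete, here is a minimal sketch (the file name `sweep.c` and build line are my own; POSIX `clock_gettime` is assumed, so this is Linux/macOS-flavoured) that streams the same total amount of data over buffers of growing size. On most machines the reported throughput falls off a cliff once the working set outgrows the caches and every access has to go to RAM, which is exactly the effect bigger server caches and more memory channels soften:

```c
/* Working-set sweep: times a simple streaming sum over buffers of
 * increasing size. Throughput typically drops sharply once the buffer
 * no longer fits in the CPU caches and accesses spill out to RAM.
 * Build (hypothetical): cc -O2 sweep.c -o sweep
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    /* Sizes from 32 KiB (fits in L1) up to 256 MiB (far beyond L3). */
    for (size_t size = 32 * 1024; size <= 256 * 1024 * 1024; size *= 4) {
        size_t n = size / sizeof(long);
        long *buf = malloc(size);
        if (!buf) return 1;
        for (size_t i = 0; i < n; i++) buf[i] = (long)i;

        /* Touch every element repeatedly; total work stays constant. */
        size_t passes = (256 * 1024 * 1024) / size;
        volatile long sum = 0;          /* volatile keeps the loop alive */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t p = 0; p < passes; p++)
            for (size_t i = 0; i < n; i++) sum += buf[i];
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        double mb = (double)size * passes / (1024.0 * 1024.0);
        printf("%9zu KiB working set: %8.0f MB/s\n", size / 1024, mb / secs);
        free(buf);
    }
    return 0;
}
```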
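And for the thermal-limiting point, a rough probe in the same spirit (file name `sustain.c` is hypothetical; note a single-threaded load may understate throttling on a laptop, a version with one thread per core shows it much more clearly): it runs a fixed chunk of arithmetic over and over, and if the chunk times creep upward over a minute or two, the machine is losing its turbo boost to heat.

```c
/* Sustained-load probe: runs the same fixed chunk of arithmetic
 * repeatedly and prints how long each chunk takes. On a machine that
 * thermally throttles, chunk times creep up once turbo boost expires
 * and the fans fall behind. Build (hypothetical): cc -O2 sustain.c -o sustain
 */
#include <stdio.h>
#include <time.h>

int main(void) {
    volatile double x = 1.000000001;   /* volatile prevents constant folding */
    for (int chunk = 0; chunk < 120; chunk++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < 200 * 1000 * 1000; i++)
            x = x * 1.000000001 + 0.000000001;   /* fixed amount of work */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("chunk %3d: %.3f s\n", chunk, secs);
    }
    return 0;
}
```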
Consumer class CPUs are indeed powerful, but server class has far more logic, control and bandwidth to the wider system. Generally though, that is fine. We don't expect a 15 watt processor to perform the same as a 10x more expensive CPU with a 140 watt power budget. That extra power budget gives a lot more freedom.
If server CPUs had the same performance as a desktop or laptop CPU, then there wouldn't be a distinction between the two.
Just to nail the point home further: a similar single-core score only tells you that the cores are reasonably comparable under ideal conditions. They may be theoretically close in performance, but it tells you nothing about the wider system or what the CPU is capable of when tied to other components. Single-core speed is an artificially narrow focus on one small point in the system, far narrower than most normal uses of a system will ever be.
For more information on why one system is "better" than another you need to look at so-called "real world" benchmarks, which show (still artificial, but) more comparable system performance metrics and hopefully give some idea where bottlenecks might lie. Better yet, do the kind of test you did, which shows that for that workload a server-class system, with its underlying architecture and components, is much better.
Adding to Mokubai's excellent answer:
Instruction Set Extensions. Some extensions, such as AVX-512, are available in server processors (such as the SKX processor mentioned in the question) but not (or only later) in consumer processors. The Coffee Lake consumer CPU from the question, for example, does not support AVX-512. I don't think that compilers are too heavily affected by this, but if you were to execute certain numeric tasks, including scientific computation or machine learning, it could cause a real difference. A minimal detection sketch follows.
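As a sketch of how software ends up caring about this, GCC and Clang expose `__builtin_cpu_supports`, so a single binary can pick a wide-vector code path only where AVX-512 actually exists (the file name and messages here are illustrative only):

```c
/* Runtime check for AVX-512: __builtin_cpu_supports queries CPUID, so
 * the same binary can take a 512-bit path on a Skylake-SP Xeon and
 * fall back to AVX2 or scalar code on a Coffee Lake laptop part.
 * Build (hypothetical): cc -O2 dispatch.c -o dispatch
 */
#include <stdio.h>

int main(void) {
    if (__builtin_cpu_supports("avx512f"))
        printf("AVX-512F available: 512-bit vector path\n");
    else if (__builtin_cpu_supports("avx2"))
        printf("AVX2 only: 256-bit vector path\n");
    else
        printf("scalar fallback\n");
    return 0;
}
```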
Core interconnects. Not relevant for single-threaded tasks, but when multiple cores are used, the type of interconnect has an influence on the "speed" with which cores can talk to each other. While the consumer processor uses a ring interconnect, the server processor is the first to use a mesh interconnect.
Intel Xeon Platinum 8151 Specs From Intel Corporation
Intel i5-8259U Specs From Intel Corporation
- It appears the Xeon has 38.5 MB of L3 cache
- It appears the Intel Core i5-8259U has only 6 MB of Intel® Smart Cache
A processor cache is where a processor stores recently written or read values instead of relying on main system memory.
- Caches are designed in all sorts of shapes and sizes, but have several classic characteristics that make them easy to exploit. Caches typically have a low set associativity and make use of bank selectors.
Associative caches:
- Inside a typical processor cache, a given physical (or logical depending on the design) address has to map to a location within the cache. They typically work with units of memory known as cache lines, which range in size from small 16-byte lines to more typical 64- and even 128-byte lines.
- If two source addresses (or cache lines) map to the same cache address, one of them has to be evicted from the cache.
- The eviction means that the lost source address must be fetched from memory the next time it is used. In a fully associative cache (implemented as a content-addressable memory, or CAM), a source address can map anywhere inside the cache. This yields a high cache hit rate, as evictions occur less frequently.
- This type of cache is expensive (in terms of die space) and slower to implement: raising the latency of a cache hit is usually not worth the minor savings in cache miss penalties you would get otherwise... You can read more here
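A hedged illustration of the cache-line behaviour described above (the buffer size, stride range and file name are my own choices, and the exact numbers vary by machine): the loop below performs an identical number of reads at different strides, yet runs far slower at large strides, because each 64-byte line fetched serves fewer and fewer of the accesses and, at page-sized power-of-two strides, the reads keep landing in the same cache sets.

```c
/* Stride probe: performs the same number of memory reads at different
 * strides. Once the stride exceeds a cache line, most of each fetched
 * 64-byte line is wasted, and large power-of-two strides also pile up
 * in the same cache sets, so time rises even though the instruction
 * count is identical. Build (hypothetical): cc -O2 stride.c -o stride
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BUF (64 * 1024 * 1024)      /* 64 MiB, well past any L3 */
#define TOUCHES (4 * 1024 * 1024)   /* same number of reads per run */

int main(void) {
    char *buf = malloc(BUF);
    if (!buf) return 1;
    for (size_t i = 0; i < BUF; i++) buf[i] = (char)i;

    for (size_t stride = 16; stride <= 4096; stride *= 4) {
        volatile long sum = 0;      /* volatile keeps the reads alive */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        size_t idx = 0;
        for (size_t i = 0; i < TOUCHES; i++) {
            sum += buf[idx];
            idx += stride;
            if (idx >= BUF) idx -= BUF;   /* wrap around the buffer */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("stride %4zu B: %.3f s for %d reads\n", stride, secs, TOUCHES);
    }
    free(buf);
    return 0;
}
```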
DDR4 at a higher bus rate also helps increase speed. Not to mention that the Xeon has Transactional Synchronization Extensions (TSX) whereas the i5 does not; a minimal sketch of what using them looks like follows.
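For completeness, a minimal RTM sketch (file name and build line assumed; note that Intel has since disabled TSX via microcode on many later parts, so treat this as illustrative only): it checks CPUID for the RTM feature bit, attempts a hardware transaction, and falls back to a plain path if the transaction aborts or the feature is missing.

```c
/* Minimal TSX/RTM sketch: tries to run a critical section as a hardware
 * transaction and falls back to a plain path if the CPU lacks RTM or
 * the transaction aborts. Build (hypothetical): cc -O2 -mrtm tsx.c -o tsx
 */
#include <stdio.h>
#include <immintrin.h>
#include <cpuid.h>

static int cpu_has_rtm(void) {
    unsigned a, b, c, d;
    /* CPUID leaf 7, subleaf 0: RTM is EBX bit 11. */
    if (!__get_cpuid_count(7, 0, &a, &b, &c, &d)) return 0;
    return (b >> 11) & 1;
}

static long counter;

int main(void) {
    if (cpu_has_rtm()) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            counter++;          /* runs inside the hardware transaction */
            _xend();
            printf("transaction committed, counter=%ld\n", counter);
            return 0;
        }
        printf("transaction aborted (status 0x%x), using fallback\n", status);
    }
    counter++;                  /* plain non-transactional fallback path */
    printf("fallback path, counter=%ld\n", counter);
    return 0;
}
```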
They are not in the same class of processor, but hopefully the information above helps you, and the links from Intel Corporation back up the validity of my response.