How does my screen driver handle so much data?
Your calculations are correct in essence. For a 1440p60Hz signal, you have a data rate of 5.8Gbps once you allow for blanking time as well (non-visible pixel border in the image output).
For HDMI/DVI, 8b/10b encoding is used, which means that although you have, say, 24 bits of colour data per pixel, 30 bits are actually sent, because the data is encoded and protocol control words are added. No compression is done at all; the raw data is sent, so you need 7.25Gbps of link bandwidth.
Looking again at HDMI/DVI: it uses the "TMDS" signalling standard for data transfer. The HDMI V1.2 standard mandates a maximum of 4.9Gbps for a Single-Link (3 serial data lines + 1 clock line), or in the case of Dual-Link DVI a maximum of 9.8Gbps (6 serial data lines, I think). So there is more than sufficient bandwidth to do 1440p60 through a Dual-Link DVI, but not through an HDMI V1.2 link.
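The arithmetic behind those figures can be sketched as follows. The total frame size including blanking (2720 x 1481) is an assumption on my part, taken from reduced-blanking timings for 2560x1440; your monitor may use slightly different totals.

```python
# Rough bandwidth estimate for 2560x1440 at 60Hz, 24 bits per pixel.
# ASSUMPTION: total frame size incl. blanking is 2720 x 1481
# (reduced-blanking style timing); real monitors may differ.

VISIBLE_W, VISIBLE_H = 2560, 1440
TOTAL_W, TOTAL_H = 2720, 1481      # assumed, includes blanking
REFRESH_HZ = 60
BITS_PER_PIXEL = 24

def bits_per_second(width, height, refresh_hz, bits_per_pixel):
    """Raw bit rate for a frame of the given size."""
    return width * height * refresh_hz * bits_per_pixel

visible_bps = bits_per_second(VISIBLE_W, VISIBLE_H, REFRESH_HZ, BITS_PER_PIXEL)
with_blanking_bps = bits_per_second(TOTAL_W, TOTAL_H, REFRESH_HZ, BITS_PER_PIXEL)

# Every 8 data bits are sent as a 10-bit symbol on the wire.
on_the_wire_bps = with_blanking_bps * 10 // 8

print(f"visible pixels only: {visible_bps / 1e9:.2f} Gbps")
print(f"with blanking:       {with_blanking_bps / 1e9:.2f} Gbps")
print(f"after encoding:      {on_the_wire_bps / 1e9:.2f} Gbps")
```

With these assumed timings the three numbers come out at roughly 5.31, 5.80 and 7.25 Gbps.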
In the HDMI V1.3 standard (most devices actually skipped to V1.4a which is the same bandwidth as 1.3), the bandwidth was doubled to around 10Gbps which would support 1440p60, and is also enough bandwidth for UHD at 30Hz (2160p30).
DisplayPort, as another example, has 4 serial data streams, each capable (in V1.1) of 2.16Gbps per stream (accounting for encoding), so with a V1.1 link you could do 1440p60 easily with all 4 streams. They have also released a newer standard, V1.2, which doubles that to 4.32Gbps/stream, allowing for UHD @ 60Hz. There is a newer version still, which pushes it even further to about 6.5Gbps/stream.
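As a rough cross-check, the link figures quoted above can be compared against the 1440p60 requirement. This is only a sketch: the capacities are the rounded numbers from this answer, and the split into "raw" links (TMDS figures, which include encoding overhead) and "payload" links (DisplayPort figures, already net of encoding) is my reading of those numbers.

```python
# Which of the links mentioned above can carry 1440p60?
# Figures are the rounded ones quoted in the text, not spec-exact values.

REQUIRED_PAYLOAD_GBPS = 5.8    # pixel data incl. blanking
REQUIRED_ENCODED_GBPS = 7.25   # same data after 8b/10b encoding

# (name, capacity in Gbps, True if capacity is a raw on-the-wire rate)
LINKS = [
    ("HDMI 1.2 single link", 4.9, True),
    ("Dual-Link DVI", 9.8, True),
    ("HDMI 1.3/1.4", 10.0, True),             # "around 10Gbps"
    ("DisplayPort 1.1, 4 streams", 4 * 2.16, False),
    ("DisplayPort 1.2, 4 streams", 4 * 4.32, False),
]

def supports_1440p60(capacity_gbps, capacity_is_raw):
    """Compare like with like: raw capacity vs encoded requirement."""
    needed = REQUIRED_ENCODED_GBPS if capacity_is_raw else REQUIRED_PAYLOAD_GBPS
    return capacity_gbps >= needed

results = {name: supports_1440p60(cap, raw) for name, cap, raw in LINKS}

for name, ok in results.items():
    print(f"{name:28s} {'yes' if ok else 'no'}")
```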
Initially those figures sound huge, but actually not so much when you consider USB 3.0. That was released with a data rate of 5Gbps over just a single cable (actually two, one for TX, one for RX, but I digress). PCIe which is what your graphics card uses internally nowadays runs at up to 8Gbps through a single differential pair, so it is not all that surprising that external data interfaces are catching up.
But the question remains, how is it done? When you think about VGA, that is comprised of single wires for R, G, and B data which are sent in an analogue format. Analogue as we know is highly susceptible to noise, and the throughput of DAC/ADCs is also limited, so that massively limits what you can push through them (having said that you can barely do 1440p60Hz over VGA if you are lucky).
However, modern standards use digital signalling, which is much more immune to noise (you only need to distinguish high or low rather than every value in between), and which also removes the need for conversion between analogue and digital.
Furthermore, the move from single-ended to differential signalling helps significantly, because you are now comparing the value between two wires (+ve difference = 1, -ve difference = 0) rather than comparing a single wire with some threshold. This means that attenuation is less of an issue, because it affects both wires equally and attenuates down to the mid-point voltage - the "eye" (voltage difference) gets smaller, but you can still tell whether it is +ve or -ve even if it is only 100mV or less. With a single-ended signal, once it attenuates it might drop below your threshold and become indistinguishable, even if it still has 1V or more of amplitude.
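A toy model of that difference, as a sketch only: the voltage levels, the 1.65V threshold and the function names below are made up for illustration and do not correspond to any particular standard.

```python
# Toy comparison of single-ended vs differential reception.
# All voltages and names here are illustrative assumptions.

def drive_single_ended(bits, swing_v):
    """One wire: 0V for a 0, swing_v for a 1."""
    return [swing_v * b for b in bits]

def receive_single_ended(voltages, threshold_v):
    """Compare the wire against a fixed threshold."""
    return [1 if v > threshold_v else 0 for v in voltages]

def drive_differential(bits, common_mode_v, swing_v):
    """Two wires moving in opposite directions around a mid-point."""
    half = swing_v / 2
    pos = [common_mode_v + (half if b else -half) for b in bits]
    neg = [common_mode_v - (half if b else -half) for b in bits]
    return pos, neg

def receive_differential(pos, neg):
    """Only the sign of the difference matters."""
    return [1 if p - n > 0 else 0 for p, n in zip(pos, neg)]

bits = [1, 0, 1, 1, 0]

# Single-ended: a 3.3V signal attenuated to 1V never crosses a 1.65V threshold.
single = receive_single_ended(drive_single_ended(bits, swing_v=1.0), threshold_v=1.65)

# Differential: only 100mV of difference left, but the sign is still clear.
pos, neg = drive_differential(bits, common_mode_v=1.2, swing_v=0.1)
diff = receive_differential(pos, neg)

print("sent:        ", bits)
print("single-ended:", single)
print("differential:", diff)
```

In this model the single-ended receiver reads every bit as 0, while the differential receiver recovers the original pattern from a 100mV eye.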
By using a serial link over a parallel one, we also can go to faster data rates because skew ceases to be an issue. In a parallel bus, say 32bit wide, you need to perfectly match the length and propagation characteristics of 32 cables in order for the signals not to move out of phase from one another (skew). In a serial link you have only a single cable, so skew can't happen.
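To get a feel for why skew matters more as the bit rate goes up, here is a back-of-the-envelope sketch. The propagation speed of 2e8 m/s (about two thirds of the speed of light) is an assumed typical value for a cable, and the 1cm mismatch is just an example.

```python
# How much of one bit period does a small length mismatch eat up?
# ASSUMPTION: signals travel at about 2e8 m/s in the cable.

PROPAGATION_SPEED_M_PER_S = 2.0e8

def skew_seconds(length_mismatch_m):
    """Arrival-time difference caused by a length mismatch."""
    return length_mismatch_m / PROPAGATION_SPEED_M_PER_S

def skew_fraction_of_bit(length_mismatch_m, bit_rate_bps):
    """Skew expressed as a fraction of one bit period."""
    bit_time_s = 1.0 / bit_rate_bps
    return skew_seconds(length_mismatch_m) / bit_time_s

for rate in (100e6, 2.5e9):
    frac = skew_fraction_of_bit(0.01, rate)   # 1cm mismatch
    print(f"{rate / 1e9:.1f} Gbps: 1cm mismatch = {frac:.1%} of a bit period")
```

Under these assumptions, 1cm of mismatch is negligible at 100Mbps but already an eighth of a bit period at 2.5Gbps, which is why matching 32 parallel wires gets impractical at multi-gigabit rates.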
TL;DR The data is sent at the full bit-rate you calculated (several Gbps), with no compression. Modern signalling techniques of serialised digital links over differential pairs make this possible.
Modern computers are surprisingly fast. People will happily load up full HD 30fps videos without realising that that involves billions of arithmetic operations per second. Gamers tend to be slightly more aware of this; a GTX 1060 will give you 4.4 TFLOPS (trillion floating point operations per second).
Please explain if my calculations are wrong and how is this data transported from my graphics card to my screen?
How wide are buses between my graphics card and my screen?
Another answer has addressed the multi-gigabit nature of HDMI, DisplayPort etc.
Perhaps explain in a nutshell how a display does store pixels? Shift registers? Cache?
The display itself stores, in theory, no image data.
(Some displays, especially televisions, store a frame or two to apply image processing. This increases latency and is unpopular with gamers.)
The graphics subsystem of a computer stores pixels in ordinary DRAM. It doesn't usually redraw the whole thing from the processor every frame, but hands some of the functionality off to dedicated subsystems and a compositor. A compositor will allow e.g. each window on the desktop to be stored as a distinct set of pixels, which can then be moved, scrolled or zoomed by the dedicated hardware. This becomes quite obvious with scrolling on mobile devices - you can go a short way until you run out of "offscreen" pre-computed pixels and the software has to stop and render some more to the compositor's buffers.
Games are redrawn every frame, and there's plenty of literature on how a scene is built up. This is built up into a framebuffer on the graphics card which is then transmitted out while the next frame is drawn into a different buffer.
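The "draw into one buffer while the other is sent out" idea can be sketched in a few lines. This is a minimal illustration with made-up names (`DoubleBuffer`, `draw`, `swap`, `scan_out`), not how any real graphics driver is written.

```python
# Minimal double-buffering sketch. Names are hypothetical.

class DoubleBuffer:
    def __init__(self, width, height):
        # front: currently being sent to the display
        # back:  currently being drawn into
        self.front = [[0] * width for _ in range(height)]
        self.back = [[0] * width for _ in range(height)]

    def draw(self, colour):
        """Render the next frame into the back buffer only."""
        for row in self.back:
            for x in range(len(row)):
                row[x] = colour

    def swap(self):
        """Make the finished frame visible (typically done during blanking)."""
        self.front, self.back = self.back, self.front

    def scan_out(self):
        """Pixels in the order they would be transmitted to the display."""
        return [pixel for row in self.front for pixel in row]

fb = DoubleBuffer(2, 2)
fb.draw(7)
before_swap = fb.scan_out()   # display still shows the old frame
fb.swap()
after_swap = fb.scan_out()    # new frame is now being transmitted
print(before_swap, after_swap)
```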
Video decoding is usually given to dedicated hardware too, especially H.264.
The link between the display card and the LCD panel is carried over several high-speed differential pairs, usually called "lanes", using TMDS signaling. Typically four lanes are used, so one can say that the bus is 4 bits wide. For some more details there is a Stack Exchange answer.
Each LCD panel model is usually produced with several interface variants, so one needs to be careful and look at the suffixes when trying to replace a broken panel. The most modern digital link (HDMI 1.4) carries 10.2 Gbps in total, or 3.4 Gbps on each of its three data lanes. Your figure (663 MBps, about 5.3 Gbps) works out to roughly 1.3 Gbps per lane if spread over 4 lanes, which is not that much (for example, SATA3 runs at 6 Gbps).
ADDITION on LCD panels. The active-matrix LCD actually tries to store the frame image (pixel data) in capacitors associated with "Twisted Nematic Cells" (the ones that control film polarization). The problem is that the size of the analog storage caps must be a trade-off between storage time and the speed of pixel switching. So a cap can't be made large, it loses its stored potential fast, and it therefore requires periodic refresh. Each pixel cell is connected to data and address lines via a transistor (the "active" element), see this Tom's Hardware article. The LCD driver-controller multiplexes data and address lines in line-by-line fashion, thus maintaining the displayed image. The image itself is stored in a frame buffer (RAM) inside the graphics controller.