How does a GPU/CPU communicate with a standard display output? (HDMI/DVI/etc)
The image displayed on the monitor is stored in video RAM on the graphics card, in a structure called a framebuffer. The data in the framebuffer is generally 24-bit RGB color, so each pixel on the display has one byte for red, one for green, and one for blue, possibly with some extra padding bytes. The data in the video RAM can be generated by the GPU or by the CPU. The video RAM is continuously read out by a specialized DMA component on the video card and sent to the monitor. The signal output to the monitor is either analog (VGA), where the color components pass through digital-to-analog converters before leaving the card, or digital in the case of DVI, HDMI, or DisplayPort. The hardware responsible for this also generates the horizontal and vertical sync signals, as well as all of the appropriate delays, so that image data is only sent to the monitor when it is ready for it.
In DVI and HDMI, the stream of pixel color information is encoded, serialized, and sent to the monitor via TMDS (transition-minimized differential signaling). DisplayPort uses 8b/10b encoding. The encoding serves multiple purposes. First, TMDS minimizes signal transitions to reduce EMI. Second, both TMDS and 8b/10b are DC-balanced codes, so DC blocking capacitors can be used to eliminate issues with ground loops. Third, 8b/10b ensures a high enough transition density to enable clock recovery at the receiver, as DisplayPort does not distribute a separate clock.
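To make the TMDS step a bit more concrete, here is a rough Python sketch of how one 8-bit color value becomes a 10-bit symbol. It follows the two-stage scheme described in the DVI 1.0 specification as I understand it: the first stage XORs or XNORs neighboring bits to reduce transitions, and the second stage optionally inverts the result to keep the running disparity near zero. The function names `tmds_encode` and `tmds_decode` are my own, the control symbols sent during blanking are left out, and real hardware does this in logic at the pixel clock rate, so treat it as an illustration only.

```python
def tmds_encode(byte, cnt):
    """Encode one 8-bit value; cnt is the running disparity (ones minus zeros sent so far)."""
    d = [(byte >> i) & 1 for i in range(8)]          # d[0] is the least significant bit
    ones = sum(d)

    # Stage 1: transition minimization using XOR or XNOR chaining.
    use_xnor = ones > 4 or (ones == 4 and d[0] == 0)
    q_m = [d[0]]
    for i in range(1, 8):
        bit = q_m[i - 1] ^ d[i]
        q_m.append(bit ^ 1 if use_xnor else bit)
    q_m8 = 0 if use_xnor else 1                      # bit 8 records which operation was used

    # Stage 2: DC balancing by optionally inverting the 8 data bits.
    n1 = sum(q_m)
    n0 = 8 - n1
    if cnt == 0 or n1 == n0:
        invert = (q_m8 == 0)
        cnt += (n0 - n1) if invert else (n1 - n0)
    elif (cnt > 0 and n1 > n0) or (cnt < 0 and n0 > n1):
        invert = True
        cnt += 2 * q_m8 + (n0 - n1)
    else:
        invert = False
        cnt += -2 * (1 - q_m8) + (n1 - n0)

    data = [b ^ 1 for b in q_m] if invert else q_m
    bits = data + [q_m8, 1 if invert else 0]         # bit 9 records the inversion
    symbol = sum(b << i for i, b in enumerate(bits))
    return symbol, cnt


def tmds_decode(symbol):
    """Recover the original 8-bit value from a 10-bit data symbol."""
    bits = [(symbol >> i) & 1 for i in range(10)]
    q = bits[:8]
    if bits[9]:                                      # undo the DC-balance inversion
        q = [b ^ 1 for b in q]
    d = [q[0]]
    for i in range(1, 8):
        bit = q[i] ^ q[i - 1]
        d.append(bit if bits[8] else bit ^ 1)        # undo XOR (bit 8 = 1) or XNOR (bit 8 = 0)
    return sum(b << i for i, b in enumerate(d))


if __name__ == "__main__":
    disparity = 0
    for value in (0x00, 0xFF, 0x55, 0x10):
        symbol, disparity = tmds_encode(value, disparity)
        print(f"{value:#04x} -> {symbol:010b} (running disparity {disparity})")
```

The encoder has to carry `cnt` from one symbol to the next, which is why it is passed in and returned; the decoder needs no such state.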
For HDMI and DisplayPort, audio data is also sent to the graphics card for transmission to the monitor. This data is inserted into the pauses in the video data stream (the blanking intervals, when no pixel data is being sent). In this case, the video card presents itself to the operating system as an audio sink, and the audio data is transferred via DMA to the card for inclusion with the video data.
Now, you probably realize that for a 1920x1080 display with 4 bytes per pixel, you only need about 8 MB to store the image, but the video RAM in your computer is probably many times that size. This is because the video RAM is not only intended for storing the framebuffer. The video RAM is directly connected to the GPU, a special-purpose processor designed for efficient 3D rendering and video decoding. The GPU uses its direct access to the video RAM to speed up the rendering process. In fact, getting data from main memory into video memory is a bit of a bottleneck, as the PCI Express bus that connects the video card to the CPU and main memory is significantly slower than the connection between the GPU and the video RAM. Any software that does a lot of high-resolution 3D rendering has to copy all of the 3D scene data (primarily 3D meshes and texture data) into video RAM so the GPU can access it efficiently.
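The "about 8 MB" figure is just width times height times bytes per pixel. A quick sketch of the arithmetic (the helper name `framebuffer_bytes` is made up for this example):

```python
def framebuffer_bytes(width, height, bytes_per_pixel=4):
    """Size of one framebuffer, assuming tightly packed lines with no extra padding."""
    return width * height * bytes_per_pixel


size = framebuffer_bytes(1920, 1080)
print(f"{size} bytes = {size / 2**20:.2f} MiB")
```

Real drivers may pad each line to an alignment boundary and usually keep two or three such buffers for double or triple buffering, so the actual footprint is somewhat larger, but still small compared to the total video RAM.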
The various modern display outputs are essentially serial bitstreams. The bitrate is too high for a processor to generate in software (or, if it could keep up, it would take up too much of its processing time). Instead, a piece of memory is set aside to contain the bits of the image, and a dedicated piece of hardware reads the memory contents and streams them out. This hardware is similar to a DMA controller and is actually quite simple. It is only a small part of a modern GPU, which mostly concerns itself with creating that image in memory from higher-level GPU commands.
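To get a feel for how high that bitrate is, here is a back-of-the-envelope calculation for 1080p at 60 Hz over HDMI/DVI. The assumptions are mine, not from the text above: the standard 1080p60 timing of 2200 x 1125 total pixels per frame (active picture plus blanking), and TMDS sending 10 bits per pixel on each of three data channels.

```python
def pixel_clock_hz(h_total, v_total, refresh_hz):
    """Pixel clock = total pixels per frame (including blanking) times frames per second."""
    return h_total * v_total * refresh_hz


def tmds_bit_rate_per_channel(pixel_clock):
    """Each TMDS channel carries one 10-bit symbol per pixel clock."""
    return pixel_clock * 10


clock = pixel_clock_hz(2200, 1125, 60)            # assumed standard 1080p60 totals
per_channel = tmds_bit_rate_per_channel(clock)
print(f"pixel clock:      {clock / 1e6:.1f} MHz")
print(f"per TMDS channel: {per_channel / 1e9:.3f} Gbit/s")
print(f"three channels:   {3 * per_channel / 1e9:.3f} Gbit/s")
```

That is well over a gigabit per second on each pair, continuously, which is why nobody tries to bit-bang this from a CPU.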
The memory that contains the video image can be part of the main memory (cheap) or a dedicated memory that can be accessed simultaneously by the 'DMA' and the CPU and/or GPU. The 'DMA' must be configured with various parameters, for instance the width and height in pixels, the color depth, the start location in memory, etc.
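As a toy model of that 'DMA', the sketch below walks a block of memory line by line using exactly those kinds of parameters. Everything here is hypothetical (the `ScanoutConfig` fields, the `scan_out` function, the `stride` parameter for the distance between line starts); real scan-out engines are configured through hardware registers and also insert sync and blanking, which this ignores.

```python
from dataclasses import dataclass


@dataclass
class ScanoutConfig:
    base: int             # start location of the image in memory (byte offset)
    width: int            # visible pixels per line
    height: int           # visible lines
    bytes_per_pixel: int  # color depth in bytes
    stride: int           # bytes from the start of one line to the start of the next


def scan_out(memory, cfg):
    """Yield the raw pixel bytes of each line, top to bottom, as scan-out hardware would read them."""
    line_bytes = cfg.width * cfg.bytes_per_pixel
    for y in range(cfg.height):
        start = cfg.base + y * cfg.stride
        yield memory[start:start + line_bytes]


# Tiny example: a 2x3 image at offset 8, with each line padded out to 16 bytes.
memory = bytes(range(64))
cfg = ScanoutConfig(base=8, width=2, height=3, bytes_per_pixel=4, stride=16)
lines = list(scan_out(memory, cfg))
for y, line in enumerate(lines):
    print(y, line.hex())
```

Changing `base` is all it takes to show a different buffer, which is how page flipping can work without copying any pixels.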
On modern computers the GPU is a (very specialized) processor that rivals the speed of the main CPU (and exceeds it on its own turf). It does things like generating a pseudo-3D image from a bunch of 3D objects with textures and light sources. This can all be done inside the video memory, by the GPU. The CPU just delivers the objects, textures, and light sources.
Reading the video data from memory and shifting it out is a rather simple process, but it must be done quite fast, and all the time. Hence this task is well suited to dedicated hardware, and ill-suited to a CPU. AFAIK the last computers that had the CPU involved in generating the video signal were the Sinclair ZX80 and ZX81 (the later Spectrum already left this job to dedicated logic in its ULA chip). On those the CPU could do its own work only while no picture lines were being generated, roughly the vertical retrace time.