Where to start when considering making a GPU?
That's kinda like going to your college final exam for science class and having this as your question: "Describe the universe. Be brief, yet concise." There is no way to answer that one in any practical way, so I'll answer a different question.
What are the kinds of things I need to know before attempting to design a GPU?
In rough chronological order, they are:
- Either VHDL or Verilog.
- FPGAs (a useful area to play with writing digital logic).
- Basic data-path stuff, like FIFOs.
- Bus interfaces, like PCIe and DDR2/3 interfacing.
- Binary implementations of math functions, including floating point, etc.
- CPU design.
- Video interfacing standards.
- High-speed analog stuff (the analog side of high-speed digital).
- PLLs and other semi-advanced clocking stuff.
- PCB design of high-speed circuits.
- Low-voltage, high-current DC/DC converter design.
- Lots and lots of software stuff.
- And finally, ASIC or other custom chip design.
I will also venture to say that you won't be making this kind of thing out of TTL logic chips. I doubt you could get a reasonable DDR2/3 memory interface working with ordinary TTL chips. Using a big FPGA would be much easier (but still not easy).
Going up to step 6 (CPU design) will probably be "good enough to quench your intellectual thirst". That could also be done in a reasonable amount of time, about a year, which makes it a decent short-ish term goal.
EDIT: If all you want to do is spit out a video signal, then it's relatively easy. In essence, it is a chunk of memory that is shifted out to a display at 60-ish Hz. The devil's in the details, but here's a rough outline of how to do it:
Start with some dual-port RAM. It doesn't have to be true dual-port RAM, just some RAM that a CPU can read and write and that your video circuit can read. The size and speed of this RAM will depend on what kind of display you're driving. I personally would use DDR2 SDRAM connected to the memory interface of a Xilinx Spartan-6 FPGA. Xilinx's Memory Interface Generator (MIG) core makes it easy to turn this into a dual-port RAM.
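To put rough numbers on "size and speed", here is a back-of-the-envelope sketch. The three display modes and bit depths are arbitrary examples, not recommendations, and the bandwidth is an average over the whole frame; during the visible part of each line the actual pixel rate is higher, because nothing is read during blanking.

```python
# Rough frame-buffer sizing for a few example display modes.
# The modes and bits-per-pixel values below are illustrative assumptions.

def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed to hold one full frame."""
    return width * height * bits_per_pixel // 8

def readout_bytes_per_second(width: int, height: int, bits_per_pixel: int,
                             refresh_hz: int) -> int:
    """Average bandwidth the video side needs just to scan the frame out."""
    return framebuffer_bytes(width, height, bits_per_pixel) * refresh_hz

for w, h, bpp in [(160, 120, 8), (640, 480, 8), (1280, 1024, 16)]:
    size = framebuffer_bytes(w, h, bpp)
    bw = readout_bytes_per_second(w, h, bpp, 60)
    print(f"{w}x{h} @ {bpp} bpp: {size / 1024:.0f} KiB, "
          f"{bw / 1e6:.1f} MB/s at 60 Hz")
```

The smallest mode (about 19 KiB) is the kind of thing that might fit in on-chip FPGA RAM; the largest (2.5 MiB, about 157 MB/s just for scan-out) clearly calls for external memory.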
Next, design a circuit that controls how this RAM is read and spits the data out onto a simple bus. Normally you just read the RAM sequentially. The "simple bus" really is just that: some bits carrying the pixel value, and that's it. This circuit needs to do two more things: it has to go back to the beginning of RAM every video frame, and it has to "pause" the output during the horizontal and vertical retrace periods.
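The read-out circuit can be modeled in a few lines as a behavioral sketch (Python, not HDL). `FrameReader`, `display_enable`, and `frame_start` are hypothetical names; the two flags stand in for the signals the timing circuit described in the next step would supply.

```python
class FrameReader:
    """Behavioral model of the RAM read-out circuit: an address counter that
    steps sequentially through the frame buffer, holds still while the
    display is blanked, and returns to the start of RAM at each new frame."""

    def __init__(self, frame_words: int):
        self.frame_words = frame_words
        self.address = 0

    def tick(self, display_enable: bool, frame_start: bool):
        """One pixel clock. Returns the RAM address read, or None when paused."""
        if frame_start:
            self.address = 0              # go back to the beginning of RAM
        if not display_enable:
            return None                   # pause during horizontal/vertical retrace
        addr = self.address
        self.address = (self.address + 1) % self.frame_words
        return addr

# Tiny demo: 3 visible pixels + 2 blanked pixels per line, 2 lines per frame.
reader = FrameReader(frame_words=6)
trace = []
for frame in range(2):
    for line in range(2):
        for x in range(5):
            enable = x < 3
            first = line == 0 and x == 0
            trace.append(reader.tick(enable, first))
print(trace)
```

Each frame reads addresses 0 to 5 in order, with `None` marking the clocks where output is paused.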
Third, make a circuit that outputs the video control signals (HSync, VSync, etc.) and tells the previous circuit when to pause and restart. These circuits are actually fairly easy to design. Finding the appropriate video standard is harder, IMHO.
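A minimal behavioral sketch of that control circuit, again in Python rather than HDL: two counters plus comparisons. It assumes the commonly published 640x480 at 60 Hz VGA timing (25.175 MHz pixel clock, 800 pixel clocks per line, 525 lines per frame, both syncs active-low); other modes only change the numbers. `display_enable` and `frame_start` are hypothetical names for the pause and restart signals.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Timing:
    visible: int
    front_porch: int
    sync: int
    back_porch: int

    @property
    def total(self) -> int:
        return self.visible + self.front_porch + self.sync + self.back_porch

# Commonly published 640x480 @ 60 Hz VGA timing; both syncs are active-low.
H = Timing(visible=640, front_porch=16, sync=96, back_porch=48)  # 800 clocks per line
V = Timing(visible=480, front_porch=10, sync=2, back_porch=33)   # 525 lines per frame
PIXEL_CLOCK_HZ = 25.175e6

def in_sync(count: int, t: Timing) -> bool:
    """True while the counter is inside the sync pulse (after the front porch)."""
    start = t.visible + t.front_porch
    return start <= count < start + t.sync

def video_signals(x: int, y: int):
    """Outputs for horizontal counter x and vertical counter y."""
    hsync_n = not in_sync(x, H)                        # active-low HSync
    vsync_n = not in_sync(y, V)                        # active-low VSync
    display_enable = x < H.visible and y < V.visible   # False = pause the RAM reader
    frame_start = x == 0 and y == 0                    # True = rewind to start of RAM
    return hsync_n, vsync_n, display_enable, frame_start

def one_frame():
    """The two counters: x wraps every line, y wraps every frame."""
    for y in range(V.total):
        for x in range(H.total):
            yield x, y

print(f"line rate {PIXEL_CLOCK_HZ / H.total / 1e3:.2f} kHz, "
      f"frame rate {PIXEL_CLOCK_HZ / (H.total * V.total):.2f} Hz")
```

In hardware the same structure is two counters and a handful of comparators; the hard part, as noted, is getting the numbers right for your display.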
And finally, connect the control signals and the pixel data bus to "something". That could be a small color LCD. It could be a video DAC that outputs a VGA-compatible signal. There are also NTSC/PAL encoders that will take these signals. Etc.
If the resolution is really small, you might get away with using the FPGA's internal RAM instead of external DDR2 SDRAM. I should warn you that if you use DDR2 SDRAM, you'll probably need a FIFO and some other stuff, but that isn't terribly difficult either. In return, DDR2 SDRAM lets you support fairly high-resolution displays. You can also find FPGA development boards with integrated VGA DACs and other forms of video output.
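As a sketch of why the FIFO is needed: the memory side delivers data in bursts, while the display consumes exactly one word per pixel clock, and the FIFO absorbs the difference. This model uses the common read/write-pointer scheme with one extra pointer bit to tell full from empty. The depth, the 8-word burst size, and the assumption that a burst lands instantly are simplifications; a real DDR2 controller adds latency, which is why you'd keep more headroom than this.

```python
class Fifo:
    """Fixed-depth FIFO modeled on the usual hardware structure: a small RAM
    plus read and write pointers that count modulo 2*depth. The extra pointer
    bit distinguishes full from empty; the power-of-two depth mirrors the
    binary-counter pointers of the hardware version."""

    def __init__(self, depth: int):
        if depth <= 0 or depth & (depth - 1):
            raise ValueError("depth must be a power of two")
        self.depth = depth
        self.mem = [0] * depth
        self.wr = 0   # write pointer
        self.rd = 0   # read pointer

    def level(self) -> int:
        return (self.wr - self.rd) % (2 * self.depth)

    def empty(self) -> bool:
        return self.level() == 0

    def full(self) -> bool:
        return self.level() == self.depth

    def push(self, word: int) -> None:
        if self.full():
            raise OverflowError("write while full")
        self.mem[self.wr % self.depth] = word
        self.wr = (self.wr + 1) % (2 * self.depth)

    def pop(self) -> int:
        if self.empty():
            raise RuntimeError("read while empty: the display would glitch here")
        word = self.mem[self.rd % self.depth]
        self.rd = (self.rd + 1) % (2 * self.depth)
        return word

fifo = Fifo(depth=16)
next_word = 0
popped = []
for clock in range(40):
    # Memory side: request an 8-word burst whenever there is room for one.
    if fifo.depth - fifo.level() >= 8:
        for _ in range(8):
            fifo.push(next_word)
            next_word += 1
    # Video side: consume exactly one word every pixel clock.
    popped.append(fifo.pop())
print(popped == list(range(40)))
```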
Racing the Beam is a detailed look at the design and operation of the Atari VCS. It has a thorough treatment of the Television Interface Adapter (TIA).
The TIA is about the simplest practical GPU.
Understanding a small, but complete, working system can be a good way to learn a new subject.
Complete schematics are available, as is a technical manual.
If you just want to put some stuff on the screen, and think you might really, really enjoy wiring, you could aim for an early-1980s-style character graphics system. If you can hit the RS-170A timing, you might even be able to push the signal into a spare AV input on a 50" plasma television and go retro in a big way.
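For reference, RS-170A color timing is derived from the NTSC color subcarrier, nominally 315/88 MHz. The arithmetic below, done with exact fractions, shows where the familiar numbers come from. It covers only the frequencies, not porch widths or sync pulse shapes, which you'd still need from the standard itself.

```python
from fractions import Fraction

f_sc = Fraction(315, 88) * 10**6   # color subcarrier, Hz
f_line = f_sc * 2 / 455            # horizontal line rate
f_frame = f_line / 525             # 525 lines per interlaced frame
f_field = 2 * f_frame              # two fields per frame

print(f"subcarrier {float(f_sc) / 1e6:.6f} MHz")
print(f"line rate  {float(f_line):.2f} Hz")
print(f"field rate {float(f_field):.4f} Hz")
# Four times the subcarrier is the common 14.31818 MHz crystal frequency.
print(f"4 x f_sc   {float(4 * f_sc) / 1e6:.5f} MHz")
```

This prints roughly 3.579545 MHz, 15734.27 Hz, 59.9401 Hz, and 14.31818 MHz.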
Some early systems used their 8-bit CPUs to generate the display directly, examples being the 6507 in the Atari 2600 and the Z-80 in the Timex Sinclair ZX-81. You can even do the same sort of thing with modern microcontrollers. The advantage of this approach is that the hardware is simple; the catch is that the software generally has to be written in assembler, the timing is very exacting, and the results will be truly underwhelming. Arguably the 2600 employed extra hardware, but the TIA didn't have much of a FIFO, and the 6502 (well, 6507, really) had to feed bytes to it in real time. In this approach there is no standard video mode; every application that uses video has to be intimately interwoven with the job of keeping the pixels flowing.
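To get a feel for how exacting the 2600 approach is, here is the per-scanline budget worked out. It uses the widely documented NTSC figures: the 6507 runs at one third of the TIA's 3.579545 MHz color clock, and each line is 228 color clocks long. The store count assumes a 3-cycle zero-page `STA` per TIA register write and ignores everything else the program has to do on that line.

```python
COLOR_CLOCK_HZ = 3_579_545      # TIA color clock = NTSC color subcarrier (rounded)
COLOR_CLOCKS_PER_LINE = 228     # 68 of horizontal blank + 160 visible pixels
VISIBLE_PIXELS = 160

cpu_hz = COLOR_CLOCK_HZ / 3                          # the 6507 runs at 1/3 the color clock
cpu_cycles_per_line = COLOR_CLOCKS_PER_LINE // 3     # CPU cycles in one scan line
sta_zero_page_cycles = 3                             # cost of one TIA register store
max_stores_per_line = cpu_cycles_per_line // sta_zero_page_cycles

print(f"CPU clock: {cpu_hz / 1e6:.3f} MHz")
print(f"CPU cycles per scan line: {cpu_cycles_per_line}")
print(f"Pixels per CPU cycle: {COLOR_CLOCKS_PER_LINE // cpu_cycles_per_line}")
print(f"At most {max_stores_per_line} register stores per line, doing nothing else")
```

With only 76 cycles per line and 3 pixels going by every cycle, it is easy to see why the results tend to be underwhelming.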
If you really want to build something out of TTL, the next level of complexity would be a character-ROM-based text display. This lets you put any of, say, 256 characters at any of, for example, 40 columns by 25 row positions. There are a couple of ways to do this.
One way: do what the TRS-80 Model I did. A group of 74161 counters with an assortment of gates generated the video address; three 74157s multiplexed 12 bits of the CPU address with the video address to feed an address to a 2K static RAM. RAM data was buffered back to the CPU, but fed unbuffered as an address to the character-set ROM. There was no bus arbitration; if the CPU accessed video RAM, the video system got stepped on, resulting in the 'snow' effect. The muxed video address was combined with some lines from the counter section to round out the low addresses; the character ROM output was loaded into a 74166 shift register. The whole thing ran off divisions of a 14.31818 MHz crystal. In this approach you'd have exactly one video mode, implemented completely in hardware, like 40x25 or 64x16, and whatever character set you can put in the ROM.
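Here is a behavioral sketch of that counter, video RAM, character ROM, shift register chain. The 64x16 layout matches one of the modes mentioned; the 8x8 glyph cell, the single 'A' glyph, and the ROM addressing (character code times cell height, plus the scan line within the row) are assumptions for illustration, not the Model I's actual ROM format.

```python
COLS, ROWS = 64, 16        # one fixed text mode
CELL_W, CELL_H = 8, 8      # assumed glyph cell size (hypothetical)

# Hypothetical character ROM: char_rom[code * CELL_H + scanline] -> 8 dot bits.
char_rom = [0] * (256 * CELL_H)
char_rom[ord("A") * CELL_H:ord("A") * CELL_H + CELL_H] = [
    0b00011000, 0b00100100, 0b01000010, 0b01111110,
    0b01000010, 0b01000010, 0b01000010, 0b00000000,
]

# Video RAM holds one character code per screen position.
video_ram = [ord(" ")] * (COLS * ROWS)
video_ram[0] = ord("A")

def scanline_dots(y: int):
    """Yield the dots for raster line y, one character cell at a time."""
    row, scanline = divmod(y, CELL_H)                  # what the counters provide
    for col in range(COLS):
        code = video_ram[row * COLS + col]             # video address -> character code
        shift_reg = char_rom[code * CELL_H + scanline] # RAM data used as ROM address
        for _ in range(CELL_W):                        # parallel-in, serial-out, MSB first
            yield (shift_reg >> 7) & 1
            shift_reg = (shift_reg << 1) & 0xFF

for y in range(CELL_H):
    dots = list(scanline_dots(y))[:CELL_W]             # just the first character cell
    print("".join("#" if d else "." for d in dots))
```

Because the cell height is a power of two, `code * CELL_H + scanline` is the same as wiring the character code to the high ROM address lines and the scan-line counter bits to the low ones.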
Another way: dig up a so-called CRTC (CRT controller) chip such as the 6845. These combined most of the counter and glue logic and gave the processor a control-register interface, so you could reprogram some of the timing. Systems like this could be made somewhat more flexible; for example, you might get 40x25 and 80x25 out of the same hardware, under register control. If you get clever with the clock frequencies, you might be able to give your CPU free access to the video RAM during one half of the clock and the video address generator access during the other half, eliminating both the need for bus arbitration and the snow effect.
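The register-controlled flexibility and the half-clock trick can both be put into numbers. In this sketch, `CrtcRegs` is a hypothetical two-field stand-in, not the 6845's real register map, and the 14.31818 MHz master clock, 8-dot characters, and horizontal totals are assumed values. It shows that going from 40 to 80 columns can be a register write plus a clock-divider change at the same line rate, and that the RAM time slot each side gets halves in 80-column mode.

```python
from dataclasses import dataclass

@dataclass
class CrtcRegs:
    """Hypothetical CPU-writable timing registers, loosely in the spirit of a
    6845-style CRTC. Names and meanings are illustrative only."""
    h_total: int       # character times per scan line, including blanking
    h_displayed: int   # visible columns

DOT_CLOCK_HZ = 14_318_180   # assumed master clock (4x NTSC subcarrier)
DOTS_PER_CHAR = 8           # assumed glyph width

def timing(regs: CrtcRegs, dot_divider: int):
    """Return (line rate in Hz, ns per RAM slot if CPU and video split each character time)."""
    char_clock = DOT_CLOCK_HZ / dot_divider / DOTS_PER_CHAR
    line_rate = char_clock / regs.h_total
    half_char_ns = 1e9 / char_clock / 2   # one half for the CPU, one half for video
    return line_rate, half_char_ns

# Same hardware, two modes: only the register values and the divider change.
mode40 = CrtcRegs(h_total=57, h_displayed=40)
mode80 = CrtcRegs(h_total=114, h_displayed=80)

for name, regs, div in [("40x25", mode40, 2), ("80x25", mode80, 1)]:
    line_rate, slot_ns = timing(regs, div)
    print(f"{name}: line rate {line_rate / 1e3:.2f} kHz, "
          f"RAM slot per side {slot_ns:.0f} ns")
```

Under these assumptions both modes come out near 15.70 kHz, while the per-side RAM slot drops from about 559 ns to about 279 ns, so the 80-column mode needs RAM that is twice as fast.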
If you want real graphics modes, though, you'll quickly find that rolling your own is problematic. The original Apple II managed it, but that system had something like 110 MSI TTL chips in it, and even so there were some funny things to deal with, like a non-linear mapping of the video buffer to the display and extremely limited color palettes, to name two. And Woz is generally recognized as having had a clue. By the time the IIe came along, Apple was already putting the video system in a custom chip. The C-64, which came out at about the same time, owed its graphics capabilities to custom chips.
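One concrete example of that non-linear mapping is the Apple II hi-res screen, whose widely documented layout interleaves scan lines in memory. This function computes the start address of a line on hi-res page 1 (base 0x2000); treat it as a sketch of the standard formula, not anything taken from the original design documents.

```python
def apple2_hires_address(y: int, page_base: int = 0x2000) -> int:
    """Start address of scan line y (0-191) on Apple II hi-res page 1.
    Within a group of 8 lines, consecutive lines are 0x400 apart;
    successive groups of 8 step by 0x80; each third of the screen
    (64 lines) steps by 40 bytes."""
    if not 0 <= y < 192:
        raise ValueError("the hi-res screen has 192 lines")
    return page_base + (y & 7) * 0x400 + ((y >> 3) & 7) * 0x80 + (y >> 6) * 40

for y in (0, 1, 2, 8, 64, 191):
    print(y, hex(apple2_hires_address(y)))
```

Lines 0 and 1 sit 1 KiB apart in memory, while line 8 lands just 0x80 bytes after line 0.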
So... I'd say there are about two ways to do it. One way: get your bucket of old TTL out and aim for an 80x25 one-color text display. The other way: get yourself a good FPGA evaluation board, do the whole thing in VHDL, and start with an 80x25 text display.