How does a computer use just a few registers?
The other variables and the thread stacks are usually stored in protected memory space, and they are loaded into registers when needed.
You may want to check out the book The Elements of Computing Systems for a good understanding of how your computer's CPU works. The book is set up as a series of projects where you work up from a NAND gate to a CPU, assembler, simple compiler, and on to a small operating system. It's invaluable in understanding how all your computer's parts fit together.
Each time a thread (or a process) is swapped out, the operating system kernel saves all of its registers, typically by pushing them onto the stack, into a data structure usually called the Process Control Block (PCB). Then, when the thread/process is swapped back in, the saved register values are read from the PCB and restored to the registers.
x86 processors also have internal registers and a mapping table that together act as a kind of virtual register file (a technique usually called register renaming). This preserves the IA-32 instruction set architecture while giving designers more flexibility to build superscalar architectures and sophisticated instruction scheduling algorithms.
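To make the mapping-table idea a bit more concrete, here is a minimal sketch of register renaming. It is only an illustration, not how any particular x86 chip does it: the `RenameTable` class, the two-register setup and the pool of 8 physical registers are all made up, and a real processor would also return physical registers to the free list once instructions retire.

```python
class RenameTable:
    """Hypothetical rename table: architectural names -> physical register numbers."""

    def __init__(self, arch_regs, num_physical):
        # At start, each architectural register maps to its own physical register.
        self.mapping = {name: i for i, name in enumerate(arch_regs)}
        # Everything else is free to be handed out to new writes.
        self.free = list(range(len(arch_regs), num_physical))

    def rename(self, dest, sources):
        # Sources are read through the current mapping...
        src_phys = [self.mapping[s] for s in sources]
        # ...and every write gets a fresh physical register, so a later write
        # to the same architectural register does not have to wait on this one.
        new_phys = self.free.pop(0)
        self.mapping[dest] = new_phys
        return new_phys, src_phys


table = RenameTable(["eax", "ebx"], num_physical=8)

# eax = eax + ebx
first = table.rename("eax", ["eax", "ebx"])
# eax = ebx + ebx   (reuses the name eax, but gets a different physical register)
second = table.rename("eax", ["ebx", "ebx"])
```

Both instructions write "eax" as far as the program is concerned, but internally they land in different physical registers, which is what gives the scheduler room to reorder or overlap them.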
Also, instruction sets usually have load and store instructions, which are used together with pointers to memory to move data between registers and memory. This is where the term load-store machine comes from, i.e., a computer that doesn't have instructions that operate directly on memory.
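A toy interpreter shows the load-store idea with only a handful of registers. This is a sketch under made-up assumptions: the `LOAD`/`STORE`/`ADD` instruction names, the tuple encoding and the 4-register machine are invented for illustration and don't correspond to any real instruction set.

```python
def run(program, memory, num_regs=4):
    """Run a toy load-store program; returns the final register contents."""
    regs = [0] * num_regs
    for op, *args in program:
        if op == "LOAD":        # LOAD rd, addr    -> regs[rd] = memory[addr]
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "STORE":     # STORE rs, addr   -> memory[addr] = regs[rs]
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "ADD":       # ADD rd, ra, rb   -> registers only, no memory access
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs


# memory[2] = memory[0] + memory[1]
program = [
    ("LOAD", 0, 0),     # r0 = memory[0]
    ("LOAD", 1, 1),     # r1 = memory[1]
    ("ADD", 2, 0, 1),   # r2 = r0 + r1
    ("STORE", 2, 2),    # memory[2] = r2
]
memory = [5, 7, 0]
run(program, memory)
```

The data lives in `memory` the whole time; the registers are just a small scratch area that values pass through on their way to and from the arithmetic instruction.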
Some computers do have instructions that operate on memory; some are stack-based. It depends on the designers and the constraints being placed on the hardware.
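For the stack-based case, a rough sketch looks like this. The `PUSH`/`ADD`/`MUL` names and the tuple encoding are assumptions for the sake of the example, not a real machine's instruction set.

```python
def run_stack(program):
    """Run a toy stack-machine program; returns the final operand stack."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack


# (5 + 7) * 2
stack_result = run_stack([
    ("PUSH", 5),
    ("PUSH", 7),
    ("ADD",),
    ("PUSH", 2),
    ("MUL",),
])
```

Note that the arithmetic instructions name no registers at all; their operands are implicitly whatever is on top of the stack.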
Multi-threading itself doesn't affect the number of registers in use. When a thread is swapped out, it generally has its registers saved to memory and the next thread to run has those registers loaded up from its previous save.
An example is a system with a thread control block (TCB) structure. While the thread isn't running, this structure contains the saved instruction pointer, stack pointer, general purpose registers, floating point registers, thread statistics and so on. In short, everything needed to totally restore the thread to the state it was in when it was swapped out for another thread to run.
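Here is a small simulation of that save/restore cycle. It's only a sketch: the `CPU` and `TCB` classes, their field names and the four general purpose registers are invented for illustration, and a real kernel does this in assembly with far more state (floating point registers, statistics and so on).

```python
from dataclasses import dataclass, field


@dataclass
class CPU:
    """The one real set of registers, shared by all threads."""
    pc: int = 0                                              # instruction pointer
    sp: int = 0                                              # stack pointer
    gpr: list = field(default_factory=lambda: [0] * 4)       # general purpose registers


@dataclass
class TCB:
    """Hypothetical thread control block: a saved copy of the registers."""
    name: str
    pc: int = 0
    sp: int = 0
    gpr: list = field(default_factory=lambda: [0] * 4)


def save_context(cpu, tcb):
    tcb.pc, tcb.sp = cpu.pc, cpu.sp
    tcb.gpr = list(cpu.gpr)      # copy, so later CPU changes don't leak into the TCB


def restore_context(cpu, tcb):
    cpu.pc, cpu.sp = tcb.pc, tcb.sp
    cpu.gpr = list(tcb.gpr)


def context_switch(cpu, old, new):
    save_context(cpu, old)       # outgoing thread: registers -> memory
    restore_context(cpu, new)    # incoming thread: memory -> registers


cpu = CPU()
thread_a = TCB("A")
thread_b = TCB("B", pc=100, sp=0x8000)

# Thread A runs for a while and changes the registers.
cpu.pc, cpu.sp, cpu.gpr[0] = 42, 0x7000, 1234

context_switch(cpu, thread_a, thread_b)          # A out, B in
seen_by_b = (cpu.pc, cpu.sp, cpu.gpr[0])

cpu.gpr[0] = 9999                                # B overwrites the same register
context_switch(cpu, thread_b, thread_a)          # B out, A in
seen_by_a = (cpu.pc, cpu.sp, cpu.gpr[0])
```

Both threads use the same `gpr[0]`, yet neither sees the other's value, because each one's copy sits in its own TCB while it isn't running.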
And not everything that goes on in a computer is done in registers. Modern compilers can optimise code so that the data items used the most are kept in registers, but the vast majority of data is held in memory and only brought into registers when needed.
The best book I've ever read on the subject is Tanenbaum's "Structured Computer Organization" which examines computers in terms of layers, from the digital logic level up to the operating system level, with each level building on the previous.
Aside: my dream is to one day write a book just like this that covers everything, from the quark level up to Emacs :-)