Why are there no 256-bit or 512-bit microprocessors?
Think about it. What exactly do you envision a "256 bit" processor being? What defines the bit-ness of a processor in the first place?
I think if no further qualifications are made, the bit-ness of a processor refers to its ALU width. This is the width of the binary number that it can handle natively in a single operation. A "32 bit" processor can therefore operate directly on values up to 32 bits wide in single instructions. Your 256 bit processor would then contain a very large ALU capable of adding, subtracting, ORing, ANDing, etc., 256 bit numbers in single operations. Why do you want that? What problem makes the large and expensive ALU worth having and paying for, even in those cases where the processor is only counting 100 iterations of a loop and the like?
The point is, you have to pay for the wide ALU whether you then use it heavily or only for a small fraction of its capability. To justify a 256 bit ALU, you'd have to find an important enough problem that can really benefit from manipulating 256 bit words in single instructions. While you can probably contrive a few examples, there aren't enough such problems to make the manufacturers feel they will ever get a return on the significant investment required to produce such a chip. If there were niche but important (well-funded) problems that could really benefit from a wide ALU, we would see very expensive, highly targeted processors for those applications. Their price, however, would prevent wide usage outside the narrow applications they were designed for. For example, if 256 bits made certain cryptography applications possible for the military, specialized 256 bit processors costing hundreds to thousands of dollars each would probably emerge. You wouldn't put one of these in a toaster, a power supply, or even a car, though.
I should also be clear that a wide ALU doesn't just make the ALU itself more expensive; it makes other parts of the chip more expensive too. A 256 bit wide ALU means there also have to be 256 bit wide data paths, and that alone would take a lot of silicon area. That data has to come from somewhere and go somewhere, so there would need to be registers, cache, and other memory wide enough for the wide ALU to be used effectively.
Another point is that you can do any width arithmetic on any width processor. You can add one 32 bit memory word into another 32 bit memory word on a PIC 18 in 8 instructions, whereas you could do it on the same architecture scaled to 32 bits in only 2 instructions. The point is that a narrow ALU doesn't keep you from performing wide computations, only that the wide computations will take longer. It is therefore a question of speed, not capability. If you look at the spectrum of applications that need particular width numbers, you will find very, very few that require 256 bit words. The expense of accelerating just those few applications with hardware that won't help the others just isn't worth it, and doesn't make a good investment for product development.
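To make that concrete, here is a minimal C sketch (plain C rather than PIC assembly, and the function name is just for illustration) of adding two 256-bit numbers using only 64-bit operations, the same way an 8-bit PIC chains 8-bit adds with carry:

```c
#include <stdint.h>

/* Add two 256-bit numbers stored as four 64-bit limbs (least significant
 * limb first), using only the 64-bit operations a narrower ALU provides.
 * A hypothetical 256-bit ALU would do this in one instruction; here it
 * takes a short loop with explicit carry propagation. */
void add256(const uint64_t a[4], const uint64_t b[4], uint64_t sum[4])
{
    uint64_t carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t s = a[i] + b[i];
        uint64_t c1 = (s < a[i]);       /* carry out of a[i] + b[i] */
        sum[i] = s + carry;
        uint64_t c2 = (sum[i] < s);     /* carry out of adding the old carry */
        carry = c1 | c2;                /* at most one of these can be set */
    }
    /* Any carry out of the top limb is dropped, i.e. the result wraps mod 2^256. */
}
```

The narrow-ALU version is perfectly capable of producing the 256 bit result; it just spends a handful of instructions where a 256 bit ALU would spend one.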
Well, I don't know about 256 or 512 bit, but I've heard about a 1024 bit processor (I can't find it right now). The term is VLIW, for Very Long Instruction Word. So that's the width of the instruction bus, not the data bus. The advantage is that you can implement Instruction Level Parallelism (ILP) on a large scale.
My first encounter with ILP must have been 20 years ago with Motorola DSPs, which had instructions for performing a MAC (Multiply and ACcumulate) while moving data to and from memory, so that you could perform a new MAC on the next instruction, without wasting time between two MACs for moving data.
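To give a rough idea in plain C (this is not the DSP's instruction set, just an illustration of the kind of loop involved), a FIR filter's inner loop is exactly the pattern those DSPs collapse into one MAC-plus-data-move instruction per tap:

```c
#include <stddef.h>

/* FIR filter inner loop: on a DSP with a MAC instruction that also moves
 * the next operands from memory, each iteration maps to roughly one
 * instruction; on a plain CPU it is a multiply, an add, and two loads. */
long fir(const int *coeff, const int *sample, size_t taps)
{
    long acc = 0;
    for (size_t i = 0; i < taps; i++) {
        acc += (long)coeff[i] * sample[i];  /* multiply and accumulate */
    }
    return acc;
}
```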
Today there are also general-purpose controllers offering this option. VLIW applies it on a much larger scale.
Since the data bus doesn't need to be anywhere near as wide, a single very long instruction can pack several operations plus their constants. The reason the data bus doesn't follow the trend is that widening it is pretty useless: a 64-bit data register can already represent a 20 decimal digit number. When was the last time you needed 20 digits of accuracy? For most applications 10\$^{20}\$ = \$\infty\$.
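For reference, 2\$^{64}\$ = 18,446,744,073,709,551,616, which is about 1.8 \$\times\$ 10\$^{19}\$; that is where the 20 decimal digit figure for an unsigned 64-bit register comes from.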
Further reading
VLIW Architecture
"Bitness" of a microprocessor is usually defined in terms of size of the general purpose registers. The size determines how large numbers a processor can handle natively and how much memory it can access. 64bit numbers are enough for almost any algorithm and the amount of addressable memory (16 million terabytes) is enough for quite some time to come. There simply isn't any advantage to increasing the size of the general purpose registers. On the flip side, the area of arithmetic logic units (ALU) used to perform operations on the registers scales with the square of the amount of bits. A 256bit ALU would be 16x larger and significantly slower.
On the other hand, there is a point in widening the processor to make it possible to do many smaller operations at once. In fact, Intel's Sandy Bridge and Ivy Bridge processors do just that: they have 256-bit SIMD (AVX) registers and can do two arithmetic operations and one memory operation per cycle on them. So one could justify calling them 256-bit, or even 768-bit, processors, if one were a sneaky marketer wanting to bend regularly used terms.
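As a rough sketch of what those 256-bit SIMD registers are used for (assuming an x86 compiler with AVX support; the function and variable names here are made up for the example), each 256-bit register holds eight packed 32-bit floats, so one instruction performs eight additions:

```c
#include <immintrin.h>  /* AVX intrinsics; compile with -mavx */

/* Add two arrays of floats 8 at a time using 256-bit AVX registers.
 * Each _mm256_add_ps does eight 32-bit additions in one instruction;
 * the register is 256 bits wide, but it holds packed 32-bit values
 * rather than one 256-bit integer. Assumes n is a multiple of 8. */
void add_arrays(const float *a, const float *b, float *out, int n)
{
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(&a[i]);
        __m256 vb = _mm256_loadu_ps(&b[i]);
        _mm256_storeu_ps(&out[i], _mm256_add_ps(va, vb));
    }
}
```

Note that this is wide data parallelism, not a wide ALU: no single 256-bit integer is ever added.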