Microprocessors for multiple (~40) SPI devices?
Either use demultiplexers such as the 74HC138 for the slave select (it is a 3-to-8 decoder, so you would cascade several to reach ~40 lines), or use diode-ORs with a matrix select.
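As a rough sketch of the decoder approach, assume a two-tier arrangement where one 74HC138 enables one of five others and the low three address bits go to all five in parallel. The MCU then needs only six address lines for 40 selects. The names `cs_decode`, `cs_address_t` and `NUM_DEVICES` are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_DEVICES 40u   /* taken from the question; adjust to suit */

/* Where a given device sits in an assumed two-tier decoder tree. */
typedef struct {
    uint8_t bank;  /* which second-tier decoder gets enabled (0..4)  */
    uint8_t line;  /* which output on that decoder is driven low (0..7) */
} cs_address_t;

/* Split a device index into the upper 3 bits (bank) and lower 3 bits (line). */
static inline cs_address_t cs_decode(uint8_t device)
{
    assert(device < NUM_DEVICES);
    cs_address_t a;
    a.bank = (uint8_t)(device >> 3);
    a.line = (uint8_t)(device & 0x07u);
    return a;
}
```

You would write `bank` and `line` to six GPIO pins before each transfer, and use one of the first decoder's enable inputs to deselect everything between transfers.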
Or, if the protocol allows for it, you could daisy-chain all the buttons together (each one's data output feeding the next one's data input) and use one long SPI transfer for all of them.
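A minimal sketch of the chained approach, assuming the devices behave like shift registers, so the bytes clocked out first end up in the device furthest from the MCU. `BYTES_PER_DEVICE` and `chain_pack` are hypothetical; check what your parts actually expect:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BYTES_PER_DEVICE 2u   /* hypothetical frame size per device */

/*
 * Build one long transmit buffer from per-device payloads.
 * payload holds n_devices frames back to back, device 0 (nearest the MCU) first.
 * The last device's frame is placed first in tx because it has the
 * furthest to travel down the chain.
 * Returns the number of bytes to send in a single SPI transfer.
 */
static inline size_t chain_pack(const uint8_t *payload, size_t n_devices,
                                uint8_t *tx, size_t tx_size)
{
    size_t total = n_devices * BYTES_PER_DEVICE;
    assert(total <= tx_size);
    for (size_t i = 0; i < n_devices; i++) {
        size_t src = n_devices - 1u - i;   /* reverse the device order */
        memcpy(&tx[i * BYTES_PER_DEVICE],
               &payload[src * BYTES_PER_DEVICE],
               BYTES_PER_DEVICE);
    }
    return total;
}
```

The whole buffer then goes out under a single slave select, so you only need one select line, at the cost of always updating every device.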
One word of caution when you are using that many SPI devices on a single bus: input capacitance.
Every device adds its own input capacitance to the shared lines, so that number of devices will put a massive amount of capacitance on the bus. Unless you take precautions it will severely limit your maximum bus frequency, and thus the speed at which you can update display contents etc. (Basically the total input capacitance and the output impedance of the MCU's IO pins form an RC low-pass filter. It slows the edges, so at higher frequencies the square waves end up looking more like sine waves, which SPI doesn't like because it eats into the setup and hold timing.)
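To get a feel for the numbers, here is a back-of-envelope sketch that treats the bus as a single lumped RC. It uses the standard first-order results: 10-90% rise time of about 2.2·R·C and a corner frequency of 1/(2π·R·C). The function names are illustrative, and real values for pin capacitance and driver impedance have to come from your datasheets:

```c
#include <assert.h>

#define PI 3.14159265358979323846

/* Total load on one bus line: n device inputs plus wiring, in farads. */
static inline double bus_capacitance_f(unsigned n_devices, double c_in_f,
                                       double c_trace_f)
{
    return (double)n_devices * c_in_f + c_trace_f;
}

/* Approximate 10-90% rise time of a first-order RC, in seconds. */
static inline double rise_time_s(double r_out_ohm, double c_f)
{
    return 2.2 * r_out_ohm * c_f;
}

/* -3 dB corner frequency of the same RC, in hertz. */
static inline double cutoff_hz(double r_out_ohm, double c_f)
{
    return 1.0 / (2.0 * PI * r_out_ohm * c_f);
}
```

With purely illustrative figures of 40 inputs at 5 pF each, 50 pF of wiring and a 50 Ω driver, this gives about 250 pF, a rise time near 27.5 ns and a corner near 12.7 MHz. A usable SPI clock would need to sit well below that corner.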
If you are happy to use lower communication speeds then that's fine. Otherwise I would recommend splitting the bus into a number of smaller segments and buffering the SCK and MOSI signals for each segment, to keep the capacitance on each segment within reasonable levels. An alternative is to use a single high-current buffer to reduce the output impedance driving the SCK and MOSI lines.
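A quick way to size the segments, sketched with integer picofarads and made-up helper names. The capacitance budget per segment is something you pick from the edge rate you need, not a figure from any datasheet:

```c
#include <assert.h>

/* How many device inputs fit on one buffered segment within the budget. */
static inline unsigned devices_per_segment(unsigned budget_pf,
                                           unsigned trace_pf,
                                           unsigned c_in_pf)
{
    assert(c_in_pf > 0u && budget_pf > trace_pf);
    return (budget_pf - trace_pf) / c_in_pf;
}

/* Number of segments (one SCK and one MOSI buffer each), rounded up. */
static inline unsigned segments_needed(unsigned n_devices,
                                       unsigned per_segment)
{
    assert(per_segment > 0u);
    return (n_devices + per_segment - 1u) / per_segment;
}
```

For example, a 60 pF budget with 10 pF of wiring and 5 pF per input allows 10 devices per segment, so 40 devices need 4 segments. Bear in mind that the buffer inputs themselves still load the MCU pin, although four buffer inputs are a far lighter load than 40 device inputs.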