SRAM and Flip-Flops
You have to keep transistors and gates apart.
Four transistors is not bad for storing one bit of data. If you used a couple of gates you'd need at least 8 (a 2-input NAND gate consists of 4 transistors). An SRAM cell is basically two inverters connected back to back, so that each one keeps the level of the other alive. One inverter consists of 2 transistors, so that's 4 in total.
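To make the "couple of gates" option concrete, here is a minimal gate-level sketch. It assumes the two gates are cross-coupled NANDs forming an SR latch with active-low inputs, which the text above does not spell out; the function names `nand` and `sr_latch` are made up for this illustration.

```python
def nand(a: bool, b: bool) -> bool:
    """Ideal 2-input NAND gate (4 transistors in CMOS)."""
    return not (a and b)

def sr_latch(set_n: bool, reset_n: bool, q: bool, q_n: bool):
    """Cross-coupled NAND latch with active-low set/reset inputs.

    q and q_n are the previous outputs. The loop is evaluated a few
    times so the feedback has a chance to settle.
    """
    for _ in range(4):
        q = nand(set_n, q_n)
        q_n = nand(reset_n, q)
    return q, q_n

# Two NAND gates at 4 transistors each: 8 transistors for one bit.
TRANSISTORS = 2 * 4
```

Pulling `set_n` low stores a 1, pulling `reset_n` low stores a 0, and with both inputs high the latch simply keeps whatever it held before.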
Actually it's possible to use even less hardware to store a bit, and that's what DRAM does: it stores a bit as a voltage level in a capacitor. This means that you can get a lot more data in a square mm of DRAM than in an SRAM. Unfortunately the capacitor voltage leaks away, so the DRAM has to be refreshed periodically.
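The leak-and-refresh idea can be sketched with a toy model. Everything numeric here is an assumption chosen for illustration (a simple exponential decay, a 100 ms time constant, a read threshold at half the supply), not data for any real DRAM; `leak`, `read_bit` and `hold` are hypothetical helper names.

```python
import math

VDD = 1.0        # full-charge voltage, normalized
TAU_MS = 100.0   # assumed leakage time constant (illustrative only)

def leak(v: float, dt_ms: float) -> float:
    """Capacitor voltage after leaking for dt_ms (exponential decay)."""
    return v * math.exp(-dt_ms / TAU_MS)

def read_bit(v: float) -> bool:
    """Read the cell: anything above half the supply counts as a 1."""
    return v > VDD / 2

def hold(total_ms: float, refresh_ms=None) -> bool:
    """Store a 1, wait total_ms, and return the bit that is read back.

    If refresh_ms is given, the cell is read and rewritten to a full
    level every refresh_ms, which is what a refresh cycle does.
    """
    v = VDD
    t = 0.0
    if refresh_ms is not None:
        while t + refresh_ms <= total_ms:
            v = leak(v, refresh_ms)
            v = VDD if read_bit(v) else 0.0  # rewrite what was read
            t += refresh_ms
    v = leak(v, total_ms - t)
    return read_bit(v)
```

With these assumed numbers, `hold(200)` loses the stored 1, `hold(200, refresh_ms=64)` keeps it, and a refresh interval that is too long (for example 80 ms) loses it again, because the level has already dropped below the threshold by the time it is read.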
There are various ways of making a 1-bit memory cell. However, those implemented with active logic are all, one way or another, an amplifier with positive feedback. As you mentioned, this can be done with two transistors and some resistors:
Look at this carefully and you will see it has two stable states, either Q1 on or Q2 on. However, it also has a significant drawback, which is that it draws current continuously. The resistors can be made quite high, but there are still many many bits on a modern static RAM chip and the currents for each bit would add up.
The basic CMOS inverter doesn't draw current (except for small leakage) when solidly in either state. This is a simple two-FET circuit. A PFET can pull high and an NFET can pull low. The gates are tied together and the thresholds set so that only one of the two FETs will be on when the gates are fully high or fully low.
However, a single inverter has negative (inverting) gain, so it can't provide positive feedback on its own. That can be solved by using two inverters back to back: two inverters in a row give positive gain. If the two inverters are connected in a loop, then they have two stable states. One output will be high and the other low, but the circuit is stable in both the high-low and low-high states.
Since a CMOS inverter is just two FETs as described above, this memory cell is 4 FETs, with the big advantage that it doesn't take any current when not switching. As Steven said, four CMOS FETs per bit isn't really all that bad. Everything is a tradeoff.
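The two stable states of the inverter loop can be seen in a rough numerical sketch. This is not a transistor-level model: it assumes an idealized sigmoid transfer curve for each inverter and a first-order lag on each output, and the constants `GAIN` and the step size, as well as the names `inverter` and `settle`, are invented for the example.

```python
import math

VDD = 1.0    # supply voltage, normalized
GAIN = 20.0  # assumed steepness of the inverter transfer curve

def inverter(v_in: float) -> float:
    """Idealized inverter: high output for low input and vice versa."""
    return VDD / (1.0 + math.exp(GAIN * (v_in - VDD / 2)))

def settle(v1: float, v2: float, steps: int = 400, dt: float = 0.05):
    """Let two inverters connected in a loop relax from a starting point.

    v1 is the output of the inverter driven by v2, and v2 is the output
    of the inverter driven by v1. Each output moves toward its target
    with a first-order lag (Euler steps of size dt time constants).
    """
    for _ in range(steps):
        d1 = inverter(v2) - v1
        d2 = inverter(v1) - v2
        v1 += dt * d1
        v2 += dt * d2
    return v1, v2
```

Starting slightly off balance, for example `settle(0.55, 0.45)`, the loop drives itself to one node near VDD and the other near 0; swapping the starting values gives the mirror-image state. The positive feedback amplifies whatever small imbalance is present, which is the sense in which the loop stores a bit.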
CMOS NAND gates require 4 transistors (the minimum) for the 2-input gate.
You can go down to 2 in resistor-transistor logic:
For registers, there are many topologies but the simplest requires at least a ring with two inverters, thus 4 transistors plus the writing buffers, so about 8 transistors.
SRAM needs 4 transistors in the smallest, simplest design (resistor-transistor, but resistors are far bigger than transistors in MOS technology), or 6 for a full MOS cell. You can have 1-transistor DRAM though, using a capacitor to store the value; but that's again dynamic logic, and it's the highest integration possible.
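As a quick way to compare the options, the per-bit counts quoted in this thread can be put in a small table and scaled to an array size. This is only a back-of-the-envelope sketch: it counts storage cells alone and ignores decoders, sense amplifiers and other periphery, and the dictionary labels and the function name `cell_transistors` are made up here.

```python
# Transistors per stored bit, as quoted in the answers above.
TRANSISTORS_PER_BIT = {
    "two 2-input NAND gates": 8,
    "register (inverter ring + write buffers)": 8,  # "about 8"
    "6T full-MOS SRAM cell": 6,
    "4T SRAM cell with resistor loads": 4,
    "1T DRAM cell (plus capacitor)": 1,
}

def cell_transistors(bits: int, kind: str) -> int:
    """Transistors needed for the storage cells alone (no periphery)."""
    return bits * TRANSISTORS_PER_BIT[kind]

if __name__ == "__main__":
    bits = 1024 * 1024  # 1 Mibit array, chosen arbitrarily
    for kind in TRANSISTORS_PER_BIT:
        print(f"{kind:45s} {cell_transistors(bits, kind):>10d}")
```

The ratio between the rows is the point: for the same number of bits, a 6T SRAM array needs six times as many cell transistors as a 1T DRAM array, which is why DRAM wins on density.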