Why is this Verilog RAM modification better in terms of resource usage?
As Joshua says, something is clearly wrong here: the synthesis tool appears to have optimized away your memory.
From a quick read-up on the iCE40 block RAM, it seems to have a registered (synchronous) output, so making the read output combinational would force the tool to build the memory out of a large number of registers instead of a block RAM.
Speculating a bit here, but I wonder if $readmemh only works on memories that the synthesis tool was able to infer as block RAM, and not on "big hunks of registers".
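To illustrate the difference, here is a minimal sketch of a memory with a registered read, which is the form a synthesis tool can usually map onto a block RAM. The module name, port names and sizes are my own assumptions rather than anything from the question; 512 x 8 bits was chosen only because it fits in a single 4 kbit block.

```verilog
// Hypothetical example: names and sizes are illustrative, not from the question.
module ram_sync_read #(
    parameter ADDR_WIDTH = 9,   // 512 words
    parameter DATA_WIDTH = 8    // 512 x 8 = 4096 bits
) (
    input  wire                  clk,
    input  wire                  we,
    input  wire [ADDR_WIDTH-1:0] addr,
    input  wire [DATA_WIDTH-1:0] din,
    output reg  [DATA_WIDTH-1:0] dout
);
    // The storage array the tool will try to infer as a memory.
    reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;   // synchronous write
        dout <= mem[addr];      // registered (synchronous) read
    end
endmodule
```

Replacing the registered read with a continuous assignment such as `assign dout = mem[addr];` (and declaring dout as a wire) gives the combinational read discussed above, which a block RAM with a registered output cannot implement directly.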
Another possibility is that you have forgotten to hook up some of the inputs and/or outputs properly. With the aggressive optimization that synthesis tools do, you can't really measure resource usage without having a functional design.
Normally, when my design shows a sudden drop in resource usage, it means the tool has actually 'optimized' something away; I suspect the same has happened here.
Typical FPGA toolchains will cut everything away that does not directly or indirectly influence an output pin.
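As a small sketch of that pruning behaviour (the module and signal names are made up for illustration), consider two identical counters where only one reaches an output pin. A typical synthesis flow would be expected to keep the first and remove the second, so the reported resource usage reflects only one counter.

```verilog
// Hypothetical example: names are illustrative only.
module prune_demo (
    input  wire clk,
    output wire led
);
    reg [23:0] used_counter   = 0;  // drives an output pin, so it is kept
    reg [23:0] unused_counter = 0;  // influences no output, so it can be removed

    always @(posedge clk) begin
        used_counter   <= used_counter + 1;
        unused_counter <= unused_counter + 1;
    end

    // Only used_counter has a path to a pin.
    assign led = used_counter[23];
endmodule
```

The same thing happens to a memory whose read data never reaches a pin, which is why a design has to be fully connected before its resource numbers mean anything.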
A good way to check what Yosys is doing is the 'show' command, described in this application note: http://www.clifford.at/yosys/files/yosys_appnote_011_design_investigation.pdf