Preferred idiom for endianness-agnostic reads
After some research (and with the help of the terrific people in ##c on Freenode), I found that GCC 5.0 will implement optimizations for the kind of pattern described above. In fact, it compiles the C source listed in my question to exactly the assembly I listed below.
I haven't found similar information about Clang, so I filed a bug report. As of Clang 9.0, Clang recognises both the read and the write idiom and turns them into fast code.
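For readers who land here without the question in front of them, the kind of pattern under discussion looks roughly like the sketch below. This is an illustration, not the exact code from the question: `read_le32` and `write_le32` are hypothetical names, and the 32-bit little-endian layout is an assumption.

```c
#include <stdint.h>

/* Hypothetical helper: assemble a 32-bit value from four bytes stored in
   little-endian order. The result is the same on any host byte order,
   because only byte values and shifts are involved. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: store a 32-bit value as four little-endian bytes. */
static void write_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}
```

On a compiler that recognises the idiom, each function can compile down to a single load or store (plus a byte swap on big-endian targets). Inspecting the generated assembly, for example with `-O2 -S`, is the reliable way to confirm this for a given compiler version.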