Origin of the notation for statistical divergence
Kullback and Leibler did not originate the $D(P\|Q)$ notation. In their paper "On Information and Sufficiency", Ann. Math. Statist., 22(1):79-86, 1951, they use $$I_{1:2}(E)=\frac{1}{\mu_1(E)}\int_{E} \,d\mu_1(x) \log \frac{f_1(x)}{f_2(x)},$$ stated for a subset $E\subseteq S$ of the sample space $S.$ They attribute this notation to Halmos and Savage.
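To connect this with the modern symbol, here is a short sketch. It assumes that $\mu_1$ and $\mu_2$ are probability measures with densities $f_1$ and $f_2$ with respect to a common dominating measure; the labels $P$ and $Q$ are the modern relabelling, not symbols from the 1951 paper. Taking $E=S$ gives $\mu_1(S)=1$, so $$I_{1:2}(S)=\int_{S} \,d\mu_1(x) \log \frac{f_1(x)}{f_2(x)} = D(\mu_1\|\mu_2),$$ which is what is now written $D(P\|Q)$ with $P=\mu_1$ and $Q=\mu_2.$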
Shannon doesn't seem to use it either, as far as I can tell from a cursory look. Maybe a later information theorist adopted the notation: Cover? Wolfowitz? Wyner? Csiszár? Gallager is another candidate, but in his classic book the quantity only appears in a problem, for the discrete case, and without a symbol, just as a sum.
The two vertical bars may be there to stop people from reading it as a conditional distribution, which a single bar, as in $D(P|Q)$, might suggest.