Preserving mutual information after compressing states

Ahlswede and Gács showed the following 'strong data processing inequality' in the mid-1970s.

Suppose the channel $P_{U|V}$ does not permit zero-error communication, i.e., for every $v \neq v'$ there exists a $u$ with $P(u|v)P(u|v') > 0$. Then there exists an $\eta < 1$ such that for any Markov chain $W-V-U$, $$I(W;U) \le \eta I(W;V).$$

In our case, let $W$ be the quantised version of $Y$, $V=Y$ and $U=X$. If the channel $P_{X|Y}$ does not permit zero-error communication, then it follows that we must lose some information in the quantisation. So basically anything can serve as a counterexample (although it's non-trivial to see how).
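As a numerical sanity check (not a proof), here is a minimal sketch that brute-forces every binary quantisation of a three-level $Y$ and compares the retained information with $I(X;Y)$. The distribution `p_y`, the channel `p_x_given_y`, and the helper names are made up for illustration; the only property that matters is that every entry of the channel is positive, so zero-error communication is not possible.

```python
import math
from itertools import product

def mutual_information(joint):
    """I(A;B) in bits for a joint pmf given as rows joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    total = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                total += p * math.log2(p / (pa[a] * pb[b]))
    return total

def quantise(joint_yx, labels, levels):
    """Merge the rows of joint_yx (indexed by y) that share a label."""
    nx = len(joint_yx[0])
    merged = [[0.0] * nx for _ in range(levels)]
    for y, row in enumerate(joint_yx):
        for x, p in enumerate(row):
            merged[labels[y]][x] += p
    return merged

# Hypothetical example: Y uniform on {0, 1, 2}, binary X,
# and a channel P_{X|Y} with strictly positive entries.
p_y = [1 / 3, 1 / 3, 1 / 3]
p_x_given_y = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
joint_yx = [[p_y[y] * p for p in p_x_given_y[y]] for y in range(3)]

full_info = mutual_information(joint_yx)

# Try every map {0, 1, 2} -> {0, 1}, including the constant ones.
best_binary = max(
    mutual_information(quantise(joint_yx, labels, 2))
    for labels in product(range(2), repeat=3)
)

print(f"I(X;Y)              = {full_info:.4f} bits")
print(f"best 2-level I(X;W) = {best_binary:.4f} bits")
```

For this particular choice, the best binary quantiser retains strictly less than $I(X;Y)$. Swapping in other joint distributions is just a matter of changing `p_y` and `p_x_given_y`.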

Strictly speaking the inequality above is quite a big hammer - for instance, it shows that varying the distribution of $Y$ while keeping the channel $P_{X|Y}$ fixed cannot help.

See this recent survey due to Polyanskiy and Wu for an account of strong data processing inequalities. The inequality I cite is equation (21) there.


An alternative:

In a comment you asked how few levels $Y$ needs to be quantised to in order to retain most of the information about $X$. This paper studies a slightly different problem, which you may be interested in. In short: consider the set of random variables $X,Y$ whose joint distribution satisfies $I(X;Y) \ge \beta$. They study the minimax question of how much information an $M$-level quantiser, with $M<|\mathcal{Y}|$, can retain about $X$ in the worst case as one varies $P_{XY}$ subject to that constraint. In fact, with their converse results, you can find other counterexamples for your problem. For instance, they show that for binary $X$, there exists some joint distribution $P_{XY}$ such that any binary quantisation $Y_2$ of $Y$ must have $I(X;Y_2) \le 3\beta/\max( \log(1/\beta), 1).$ This means that there should exist distributions with mutual information $< 1/8$ for which you cannot retain sufficient mutual information in any $2$-level quantisation (possibly you can figure something out explicitly using the constructions in their upper bound).
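To see where the $1/8$ threshold comes from, here is a small sketch that evaluates the bound $3\beta/\max(\log(1/\beta), 1)$ for a few values of $\beta$. The base-2 logarithm is my assumption (it is the base for which the bound equals $\beta$ exactly at $\beta = 1/8$), and the function name is made up for illustration.

```python
import math

def binary_quantiser_bound(beta):
    """Evaluate 3*beta / max(log(1/beta), 1).

    Assumption: the logarithm is base 2, so information is in bits.
    """
    return 3 * beta / max(math.log2(1 / beta), 1)

for beta in (1 / 8, 1 / 16, 1 / 64, 1 / 1024):
    bound = binary_quantiser_bound(beta)
    # ratio < 1 means a binary quantiser must lose a constant fraction
    print(f"beta = {beta:.6f}  bound = {bound:.6f}  bound/beta = {bound / beta:.3f}")
```

Under this assumption the ratio bound/$\beta$ is $3/\log_2(1/\beta)$, which drops below $1$ as soon as $\beta < 1/8$ and keeps shrinking as $\beta$ decreases.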