Is "entanglement" unique to quantum systems?

You are correct that the description of a classical probability distribution of a joint system requires $kl$ parameters. There indeed is a difference between classical and quantum systems in this sense, but it is more subtle.

Every classical probability distribution can be described as a probabilistic mixture of deterministic states. For these deterministic states (extreme points of the space of probability densities), a joint system can be described by $k+l$ parameters.

Every quantum density matrix can be described as a probabilistic mixture of pure states. For these pure states (extreme points of the space of density matrices), the description requires $kl$ parameters.

Thus, classical probabilistic systems can be described in terms of probability distributions over more fundamental objects: deterministic states. These deterministic states only require $k+l$ parameters.

Quantum mixed states (the quantum analogue of probability distributions) can also be described in terms of probability distributions over more fundamental objects: pure states. However, these pure states now require $kl$ parameters.
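
To make the counting concrete, here is a minimal sketch in Python. The function names and the restriction to real amplitudes are my own simplifying assumptions, not part of the argument above. It checks whether a $k \times l$ table (joint probabilities or joint amplitudes) factorizes into a part for each subsystem, which is what lets you get away with $k+l$ numbers instead of $kl$.

```python
from itertools import product
from math import sqrt


def deterministic_joint_state(k, l, i, j):
    """One-hot k-by-l probability table for the deterministic joint state (i, j)."""
    return [[1.0 if (a, b) == (i, j) else 0.0 for b in range(l)] for a in range(k)]


def is_product(table, tol=1e-12):
    """True if the table is an outer product u[a] * v[b].

    A matrix has rank at most 1 exactly when every 2x2 minor vanishes,
    so we simply test all of them.
    """
    k, l = len(table), len(table[0])
    for a, c in product(range(k), repeat=2):
        for b, d in product(range(l), repeat=2):
            minor = table[a][b] * table[c][d] - table[a][d] * table[c][b]
            if abs(minor) > tol:
                return False
    return True


# Every classical extreme point factorizes: it is fixed by the pair (i, j).
all_deterministic_factorize = all(
    is_product(deterministic_joint_state(2, 3, i, j))
    for i in range(2)
    for j in range(3)
)

# Amplitudes of the two-qubit pure state (|00> + |11>) / sqrt(2),
# assumed real here for simplicity.
bell_amplitudes = [[1 / sqrt(2), 0.0], [0.0, 1 / sqrt(2)]]
bell_factorizes = is_product(bell_amplitudes)
```

Here `all_deterministic_factorize` is `True` and `bell_factorizes` is `False`. A classical distribution such as `[[0.5, 0.0], [0.0, 0.5]]` also fails the test, but it is a mixture of deterministic states that each pass it. The Bell state is itself an extreme point, so there is nothing simpler to decompose it into.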


Is "entanglement" unique to quantum systems?

Yes, unequivocally. As exemplified by Bell's inequality, there is a superluminal aspect to entanglement that creates experimental results that cannot be fully reproduced by any combination of purely classical mechanisms. The "hidden variable" models of quantum entanglement are just another way of posing the question: "Is entanglement actually ordinary conditional probability in which certain variables are inaccessible to any known form of experimentation?"

Ringing the Bell

Until John Bell developed his famous inequality, no one had a way to test that idea in the laboratory. Ironically, Bell -- who was a strong supporter of Einstein's views on quantum mechanics -- was rooting for a hidden variables result, even though his name is now almost universally associated with proving the opposite case to be true.

Bell's inequality made it possible to accumulate solid experimental evidence on whether or not hidden variables, and thus ordinary conditional probabilities, were sufficient to explain quantum behaviors. The experimental outcome, which by now is very solid indeed, is that no combination of hidden variables can produce the spectra of correlations seen with quantum entanglement. In particular, if you analyze one end of an entangled pair (e.g. entangled spin polarizations), the wave function representing the other entangled end of that pair is "instantly" updated with information about the range of possible options open to it when it is in turn analyzed. These updates are what allow the measured correlations to violate Bell's inequality.
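
For readers who want to see the numbers, here is a minimal sketch using the CHSH form of Bell's inequality, which is a specific variant I am choosing for illustration. It assumes a spin singlet, for which quantum mechanics predicts the correlation $E(a,b) = -\cos(a-b)$ between detectors set at angles $a$ and $b$. The function names and the particular angles are my own choices. The hidden variable side is handled by brute force: every deterministic assignment of $\pm 1$ outcomes to the four detector settings is tried, and any hidden variable model is just a probabilistic mixture of those.

```python
from itertools import product
from math import cos, pi, sqrt


def chsh(E, a0, a1, b0, b1):
    """CHSH combination S = E(a0,b0) + E(a0,b1) + E(a1,b0) - E(a1,b1)."""
    return E(a0, b0) + E(a0, b1) + E(a1, b0) - E(a1, b1)


def singlet_correlation(a, b):
    """Quantum prediction for a spin singlet measured at angles a and b."""
    return -cos(a - b)


def max_local_deterministic_chsh():
    """Largest |S| over all predetermined +/-1 answers (the 'hidden arrows')."""
    best = 0
    for A0, A1, B0, B1 in product((-1, 1), repeat=4):
        s = A0 * B0 + A0 * B1 + A1 * B0 - A1 * B1
        best = max(best, abs(s))
    return best


classical_bound = max_local_deterministic_chsh()
quantum_S = chsh(singlet_correlation, 0.0, pi / 2, pi / 4, -pi / 4)
```

`classical_bound` comes out as 2, while `abs(quantum_S)` is $2\sqrt{2} \approx 2.83$. Mixing the deterministic assignments with any probabilities cannot push the average past 2, which is why the quantum value rules out the whole family of hidden variable explanations at once.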

Fracturing "Now"

The nature of this entanglement-enabled "update" is quite curious.

In traditional conditional probabilities, an event that produces a correlated pair -- e.g. two arrows pointing in opposite directions on a dial to indicate the polarizations that would be detected with 100% certainty -- is an event that has already happened and cannot be erased. Consequently, no transfer of any kind of information is ever needed between the members of the pair; both simply contain "hidden arrows" that translate into real probability curves when analyzed using the local setting of some detector.

Although entanglement is usually explained in terms of instantaneous "resetting" of the remote member of the pair, there is actually a simpler and more self-consistent way to understand what is going on. The first and most critical point is this: A quantum entangled pair is by definition one that has left no information record anywhere in the universe on exactly how its original entangling event occurred. That's unavoidable for a quantum scenario, since the instant such information comes into existence, that aspect of the experiment becomes classical and no longer follows quantum rules.

Now think about that for a moment. I am being neither flippant nor metaphorical when I ask this question: If no record of how the original entangling event took place exists anywhere in the universe, has it really occurred yet? Causality cannot be affected in the past by what occurs to the system now, for the simple reason that by definition no conflicting history of the event exists anywhere else in the universe.

Quantum Cheats

Insisting that such unresolved quantum systems have well-defined pasts is a very Hamiltonian perspective, that is, one that insists that every component of the system have a well-defined "now" state. The Lagrangian quantum methods first proposed by Dirac, then ironically abandoned by him after Richard Feynman and Freeman Dyson took them up with a vengeance, are much more forgiving. They permit the final quantum resolution of events deep in the classical past to remain ragged and even chaotically unresolved at multiple levels of scale. For quantum events hidden away in quiet corners of the universe, some resolutions of "how did it happen" can in principle remain unresolved literally for eons of classical time. For a quantum system, the high priority that classical thinking places on immediate resolution simply does not matter. Such systems will instead remain superimposed and entangled for as long as needed, specifically until they are forced by some interaction with the information-rich classical universe to "explain" how they will ensure the absolute and universal conservation of parameters that include mass-energy, angular momentum (spin or polarization), charge, or any of the lesser known conservation rules. And then they cheat: They simply make up a history on the spot, one that always ensures that all of the various conservation rules really do get followed.

It is entirely self-consistent, then, to think of the act of detecting one member of an entangled pair not just as sending a "reset" instantaneously to its partner, but as deciding how the original entangling event took place. And even if you don't like the idea itself... well, it turns out that it's a great way to keep it clear in your head how the conditional probabilities of an entangled event will differ from those of an otherwise similar classical conditional probability event. The first detection of an entangled event pair decides what that original event looked like... and the results of the other member of the pair must then work with that "new" past. Incidentally, I should note that when entangled particles have space-like separation, there is also a nice and necessary symmetry by which either event can be viewed as being the one that "sets" the original event. The detection spectra work out to be the same under either interpretation. (However, if you want to have fun looking for possible oddities, both theoretically and experimentally, that's a good area to explore.)

Are conditional probabilities just a kind of "classical entanglement"?

No, because classical conditional probabilities do not include any kind of information transfer between the two entities.

Classical Emulation of Quantum Entanglement

However, if you are dead set on doing it, you can create a very slow and cumbersome classical analog to the conditional probabilities that are characteristic of quantum entanglement. The main thing you need to do is create your own fully classical "hidden channel" for resetting the other member of the pair after a detection takes place. That update channel must be kept protected and hidden, and the remote member of the pair cannot be allowed to be inspected or updated until the update arrives. Needless to say, the result can be almost unfathomably slow if you try to do very much of this, an inverse reflection of the speed of quantum computers (and also precisely why Feynman first proposed using quantum computers to study such systems, long before @PeterShor electrified everyone by showing that such computers could do far more than just simulations of quantum events).
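
Here is a minimal sketch of what such a hidden channel emulation might look like for a single spin singlet pair. Everything in it (the function names, the detector angles, the seed, the trial count) is a hypothetical choice of mine for illustration. The first detector picks a random $\pm 1$ outcome. The "channel" is simply the fact that the second detector is handed the first detector's angle and outcome before it is allowed to answer, and then returns the opposite result with probability $\cos^2((a-b)/2)$, which reproduces the singlet correlation $-\cos(a-b)$.

```python
import random
from math import cos, pi


def emulated_pair(a, b, rng):
    """Emulate one singlet detection using a classical hidden channel."""
    # First detector: the outcome is a fair coin flip.
    x = rng.choice((-1, 1))
    # Hidden channel: (a, x) is forwarded to the second detector, which
    # must wait for this message before it is allowed to produce a result.
    p_opposite = cos((a - b) / 2) ** 2
    y = -x if rng.random() < p_opposite else x
    return x, y


def estimate_correlation(a, b, rng, trials):
    """Average of x*y over many emulated pairs; approaches -cos(a - b)."""
    total = 0
    for _ in range(trials):
        x, y = emulated_pair(a, b, rng)
        total += x * y
    return total / trials


def emulated_chsh(seed=0, trials=20000):
    """CHSH combination estimated from the emulation (illustrative angles)."""
    rng = random.Random(seed)
    a0, a1, b0, b1 = 0.0, pi / 2, pi / 4, -pi / 4
    return (
        estimate_correlation(a0, b0, rng, trials)
        + estimate_correlation(a0, b1, rng, trials)
        + estimate_correlation(a1, b0, rng, trials)
        - estimate_correlation(a1, b1, rng, trials)
    )


S = emulated_chsh()
```

The magnitude of `S` lands well above the classical limit of 2 and close to $2\sqrt{2}$, but only because `emulated_pair` lets the second detector see the first detector's setting. Take that message away and the emulation falls back to ordinary conditional probabilities. In a real setup the message would have to travel at or below light speed, which is exactly where the slowness comes from.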

Deep Patterns

One last tangent that I just have to mention: I find it absolutely fascinating that the hidden channels that I just described for simulating quantum entanglement correspond remarkably closely to the concept of atomic transactions in relational databases. As with simulating quantum events, these ACID constraints result in astronomical slow-downs if applied to networks that cover large areas. Such traditional databases thus correspond surprisingly well to classical attempts to emulate quantum systems and, as with such simulations, work well only when physically localized. Conversely, newer highly distributed databases have BASE features that give up immediate coherency in favor of locality of processing. They correspond quite well to classical systems.

There appear to be deep information patterns that transcend many levels not just of physics, but of how technology itself is forced to evolve to reach new levels of capability. Quite fascinating, that.