Entanglement and coherence
Okay, this is getting even more into depth, which is great stuff! I heartily recommend anyone who is this dedicated take a few courses on the subject, if you haven't already.
The state matrix formulation of Quantum Mechanics
Here's the most basic formulation of quantum mechanics which adequately shows all of these properties, called the density-matrix or state-matrix formulation. Take a wavefunction $|\psi\rangle$ and identify the state-matrix $\rho = |\psi\rangle\langle\psi|$ with this state. The state matrix has all of the same information as the wavefunction (up to an unobservable global phase) but, by the product rule applied to $|\psi\rangle\langle\psi|$ together with the Schrödinger equation, it evolves according to $$i\hbar ~\frac {\partial\rho}{\partial t} = \hat H \rho - \rho \hat H.$$
As always, we predict expectation values of experiments by associating to their numerical parameters a Hermitian operator $\hat A.$ Now, instead of calculating this as the usual $\langle A \rangle = \langle\psi|\hat A|\psi\rangle$ we insert the identity, written in some orthonormal basis as $I = \sum_i |i\rangle\langle i|$, into the middle of this expression: $$\langle A \rangle = \sum_i \langle\psi|\hat A|i\rangle\langle i|\psi\rangle = \sum_i \langle i|\psi\rangle\langle\psi|\hat A|i\rangle = \sum_i \langle i|\rho ~ \hat A|i\rangle = \operatorname{Tr} \rho \hat A.$$All expectation values are therefore traces of these matrix products. We can also insert a further identity between $\rho$ and $\hat A$, to find $\langle i | \rho | j\rangle = \rho_{ij},\;\langle j | \hat A | i\rangle = A_{ji},$ and so we have a matrix expression $\langle A \rangle = \sum_{ij} \rho_{ij} A_{ji},$ if you like. Any discrete orthonormal basis of the Hilbert space will work, even if it has no particular tie to our Hamiltonian.
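As a quick numerical sanity check of the trace formula, here is a minimal NumPy sketch. The amplitudes and the Hermitian matrix `A` are made-up values chosen only for illustration; they are not taken from the discussion above.

```python
import numpy as np

# Hypothetical amplitudes for a normalized qubit state a|0> + b|1>.
a = 0.6
b = 0.8 * np.exp(0.3j)
psi = np.array([a, b], dtype=complex)

# State matrix rho = |psi><psi|.
rho = np.outer(psi, psi.conj())

# Any Hermitian matrix will do; this one is invented for the check.
A = np.array([[1.0, 2.0 - 1.0j],
              [2.0 + 1.0j, -0.5]])

via_wavefunction = np.vdot(psi, A @ psi)       # <psi|A|psi>
via_trace = np.trace(rho @ A)                  # Tr(rho A)
via_index_sum = np.einsum("ij,ji->", rho, A)   # sum_ij rho_ij A_ji

# All three routes should agree up to floating-point error.
print(np.allclose([via_trace, via_index_sum], via_wavefunction))
```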
How to generate an effective substate matrix
Now suppose we have an observable which only impacts one subsystem of the whole system. Here we simply convert the basis to one that spans both subsystems, $|i, j\rangle$, and our observable has the form $\hat A \otimes I$ in terms of its effect on the respective systems. Our expression for the expected value is therefore: $$\operatorname{Tr} \rho (\hat A \otimes I) = \sum_{ij} \langle i,j|\rho (\hat A \otimes I) |i, j\rangle$$Inserting another identity $I = \sum_{mn} |m,n\rangle\langle m,n|$ we can look carefully at the second factor:$$\langle A\rangle = \sum_{ij~mn} \langle i,j|\rho |m, n\rangle\langle m,n|(\hat A \otimes I) |i, j\rangle = \sum_{ij~mn} \langle i,j|\rho |m, n \rangle A_{mi} \delta_{nj}.$$We therefore find that there is an expression for something which acts precisely as an effective substate-matrix $\tilde \rho$ for the subsystem: it reproduces all of the expectation values above for any operator which acts only on the subsystem. That substate matrix is:$$\tilde\rho_{ij} = \sum_{n} \langle i,n| \rho |j, n\rangle,$$whence $\langle A \rangle = \sum_{ij} \tilde\rho_{ij} A_{ji}.$
We call the process which generates the substate matrix "tracing out" the rest of the superstate, because it has the same structure as a partial trace.
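The "tracing out" recipe can be sketched in a few lines of NumPy. The helper name `trace_out_second`, the random test state and the observable `A` are all assumptions made for this illustration. The code also assumes the usual Kronecker-product ordering of the joint basis $|i, n\rangle$.

```python
import numpy as np

def trace_out_second(rho, dim_a, dim_b):
    """Return rho_tilde[i, j] = sum_n <i,n| rho |j,n> (hypothetical helper)."""
    # Split each composite index (i, n) so the second subsystem can be summed.
    rho4 = rho.reshape(dim_a, dim_b, dim_a, dim_b)
    return np.einsum("injn->ij", rho4)

# A random normalized two-qubit pure state, used only as test data.
rng = np.random.default_rng(0)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())

# A made-up Hermitian observable acting on the first qubit only.
A = np.array([[1.0, 2.0 - 1.0j],
              [2.0 + 1.0j, -0.5]])

full = np.trace(rho @ np.kron(A, np.eye(2)))   # Tr rho (A x I)
rho_tilde = trace_out_second(rho, 2, 2)
reduced = np.trace(rho_tilde @ A)              # Tr rho_tilde A

print(np.isclose(full, reduced))
```

The reshape-then-`einsum` form is just one convenient way to write the sum over $n$; an explicit loop over $n$ would give the same matrix.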
The difference between superposition and entanglement
Let us calculate the state matrix for $a |0\rangle + b |1\rangle$. This is very simple: it is $$\rho = aa^* |0\rangle\langle 0| + a b^* |0\rangle\langle 1| + b a^* |1\rangle\langle 0| + b b^* |1\rangle\langle 1|,$$ or, written as a bona-fide matrix, $$\rho = \begin{bmatrix} a a^* & a b^* \\ b a^* & b b^*\end{bmatrix}.$$
Now let us entangle it with another system. We will use the CNOT operation to entangle it with a second qubit prepared in the fixed state $|0\rangle$, generating $a |00\rangle + b |11\rangle.$ When we apply the above recipe to this system we find ourselves looking at a completely different density matrix: $$\tilde\rho = \begin{bmatrix} a a^* & 0 \\ 0 & b b^*\end{bmatrix}.$$ Now let me explain why I couldn't use wavefunctions to get this result: this state matrix cannot be expressed as a wavefunction, unless either $a = 0$ or $b = 0.$ The previous matrix $\rho$ is actually as general as a single-particle wavefunction can be, and it has off-diagonal terms. This one does not, precisely because there is no way that the "tracing out" step can convert a $|00\rangle\langle 11|$ term to anything "internal" to the substate matrix. It lives outside of the substate matrix and can only be measured by measuring both parts of the global state and comparing them!
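Here is a small NumPy sketch of this step, with made-up amplitudes $a = 0.6$ and $b = 0.8i$. It applies CNOT (control on the first qubit, Kronecker ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$) and then traces out the second qubit.

```python
import numpy as np

# Hypothetical amplitudes, chosen so that |a|^2 + |b|^2 = 1.
a, b = 0.6, 0.8j
qubit = np.array([a, b], dtype=complex)
zero = np.array([1, 0], dtype=complex)

# CNOT with the first qubit as control: |10> <-> |11>.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

joint = CNOT @ np.kron(qubit, zero)            # a|00> + b|11>
rho_joint = np.outer(joint, joint.conj())

# Trace out the second qubit: rho_tilde[i, j] = sum_n <i,n|rho|j,n>.
rho_tilde = np.einsum("injn->ij", rho_joint.reshape(2, 2, 2, 2))

# State matrix of the qubit before entangling, for comparison.
rho_before = np.outer(qubit, qubit.conj())

print(np.round(rho_before, 3))   # has off-diagonal terms a b* and b a*
print(np.round(rho_tilde, 3))    # off-diagonal terms are gone
```

The quantity $\operatorname{Tr}\tilde\rho^2$ is a convenient test here: it equals 1 only when a state matrix can be written as a single wavefunction.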
The double-slit observable
The simplest observable is $\hat A_1 = |1\rangle\langle 1|$, measuring the probability that a qubit is in state $|1\rangle.$ Now suppose that we don't do this directly, but first evolve the state with a unitary matrix. This will correspond to a photon going through a slit corresponding to the qubit and then traveling to a photomultiplier tube at position $y$, which will "click" (transition from $|0\rangle$ to $|1\rangle$) with amplitude $f_0(y)$ or $f_1(y)$ when only slit 0 or only slit 1 is open. So the unitary transformation is, for some $\alpha_{0,1}$ that don't matter, $$|0\rangle \mapsto \alpha_0(y) |0\rangle + f_0(y) |1\rangle\\ |1\rangle \mapsto \alpha_1(y) |0\rangle + f_1(y) |1\rangle.$$We measure the resulting qubit, which results in $$\langle A \rangle = \operatorname{Tr} (U\rho U^\dagger~ \hat A_1) = \operatorname{Tr} (\rho~U^\dagger \hat A_1 U).$$The matrix $U^\dagger \hat A_1 U$ is therefore $$\begin{bmatrix}\alpha_0^*&f_0^*\\\alpha_1^*&f_1^*\end{bmatrix} \begin{bmatrix}0&0\\0&1\end{bmatrix} \begin{bmatrix}\alpha_0&\alpha_1\\f_0 & f_1\end{bmatrix} = \begin{bmatrix}f_0 f_0^*& f_1 f_0^*\\f_0 f_1^* & f_1 f_1^*\end{bmatrix}. $$ This is our double-slit observable matrix.
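To check the matrix product above numerically, here is a sketch with a toy $U(y)$. The specific amplitudes (equal-magnitude phases with a made-up wavenumber `q`) and the choice of $\alpha_{0,1}$ are assumptions, picked only so that `U` is unitary in this two-level model.

```python
import numpy as np

def propagation_unitary(y, q=3.0):
    """Toy U(y) whose columns are the images of |0> and |1> (hypothetical model)."""
    f0 = np.exp(1j * q * y) / np.sqrt(2)
    f1 = np.exp(-1j * q * y) / np.sqrt(2)
    alpha0, alpha1 = f0, -f1   # chosen only to make the rows orthonormal
    U = np.array([[alpha0, alpha1],
                  [f0, f1]])
    return U, f0, f1

A1 = np.array([[0, 0],
               [0, 1]], dtype=complex)          # |1><1|

U, f0, f1 = propagation_unitary(0.4)
M = U.conj().T @ A1 @ U                          # U^dagger A_1 U

# The closed form quoted in the text; the alphas drop out entirely.
expected = np.array([[f0 * np.conj(f0), f1 * np.conj(f0)],
                     [f0 * np.conj(f1), f1 * np.conj(f1)]])

print(np.allclose(U @ U.conj().T, np.eye(2)), np.allclose(M, expected))
```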
From this you have enough to calculate the two cases, which are $$\begin{align} \operatorname{Tr} (\rho \hat A) =& a a^* f_0 f_0^* + a b^* f_0 f_1^* + a^* b f_0^* f_1 + b b^* f_1 f_1^* = |a f_0(y) + b f_1(y)|^2\\ \operatorname{Tr} (\tilde \rho \hat A) =& a a^* f_0 f_0^* + b b^* f_1 f_1^* = |a f_0(y)|^2 + |b f_1(y)|^2.\end{align}$$In fact, in general the latter state matrix, with no off-diagonal terms, behaves like a classical probabilistic mixture of classical bits $0$ and $1.$ That is a very general result from the linearity of trace; in general if $\rho = \sum_i p_i \rho_i$ then $\operatorname{Tr}(\rho \hat A) = \sum_i p_i \operatorname{Tr}(\rho_i \hat A)$, so the system behaves like a classical-probability-mixture of the different constituent $\rho_i$. (Caution: this decomposition is generally not unique. If you work it out, $\rho = \frac 12 |0\rangle\langle 0| + \frac 12 |1\rangle\langle 1|$ is actually the same as $\rho = \frac 12 |+\rangle\langle +| + \frac 12 |-\rangle\langle -|.$ I am telling you this because I have heard people who do not know this argue that this explains how Quantum Mechanics "chooses" a basis for its decoherence, hence why the world looks classical rather than quantum at a macro-scale... it doesn't really resolve that problem at all!)
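The two detection probabilities can be compared numerically. This sketch assumes equal amplitudes $a = b = 1/\sqrt 2$ and toy slit amplitudes $f_{0,1}(y) = e^{\pm i q y}/\sqrt 2$ with a made-up wavenumber `q`; none of these values come from the text. It also checks the caution about non-unique decompositions.

```python
import numpy as np

def slit_observable(f0, f1):
    """The double-slit observable matrix U^dagger A_1 U in closed form."""
    return np.array([[f0 * np.conj(f0), f1 * np.conj(f0)],
                     [f0 * np.conj(f1), f1 * np.conj(f1)]])

a = b = 1 / np.sqrt(2)                      # assumed amplitudes
rho = np.outer([a, b], np.conj([a, b]))     # superposed qubit
rho_tilde = np.diag([abs(a) ** 2, abs(b) ** 2])   # entangled qubit, traced out

q = 3.0                                     # hypothetical wavenumber
ys = np.linspace(-1.0, 1.0, 9)
coherent, incoherent = [], []
for y in ys:
    f0 = np.exp(1j * q * y) / np.sqrt(2)
    f1 = np.exp(-1j * q * y) / np.sqrt(2)
    M = slit_observable(f0, f1)
    coherent.append(np.trace(rho @ M).real)         # |a f0 + b f1|^2
    incoherent.append(np.trace(rho_tilde @ M).real) # |a f0|^2 + |b f1|^2

# Non-uniqueness of the mixture: the {0, 1} and {+, -} mixtures coincide.
ket0, ket1 = np.eye(2)
plus, minus = (ket0 + ket1) / np.sqrt(2), (ket0 - ket1) / np.sqrt(2)
mix_01 = 0.5 * np.outer(ket0, ket0) + 0.5 * np.outer(ket1, ket1)
mix_pm = 0.5 * np.outer(plus, plus) + 0.5 * np.outer(minus, minus)

print(np.round(coherent, 3))     # oscillates with y (fringes)
print(np.round(incoherent, 3))   # flat in y (no fringes)
print(np.allclose(mix_01, mix_pm))
```

With these particular toy amplitudes the coherent case reduces to $\cos^2(qy)$ and the incoherent case to the constant $1/2$, which makes the loss of fringes easy to see.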
So that is how to easily understand entanglement as destroying coherence: the more you're entangled, the more the orthogonality of the other system kills your off-diagonal terms, and the more your substate looks like a classical probability mixture, transferring the cool quantum effects to the system-as-a-whole.
I am posting these notes following a request for further information regarding this question. Should not affect the OP's choice of answer.
Notes added in proof:
On the meaning of quantum coherence:
Quantum coherence is a direct extension of the classical concept of wave coherence. Two classical waves are said to be coherent if they can produce a well-defined interference pattern. In order for this to happen, for instance with electromagnetic waves, the two waves must have the same frequency and a constant phase difference, such that when they add/superpose/overlap the resulting wave pattern remains well-defined. This is how coherent sources were first defined in optics.
In contrast, incoherent optical sources, even if monochromatic, produce an ensemble, or statistical superposition, of light waves with random relative phases (and polarizations, to be precise), which do not/cannot interfere with each other. To get an interference pattern one must first isolate a single coherent component and use it to set up coherent sources, such as the two slits in the famous double-slit example.
When electron interference patterns were first detected, it made sense to interpret them in the same terms as optical interference, and the concept of coherence transferred automatically to superpositions of wave functions and quantum states in general. So did the concept of incoherent statistical ensemble.
So, in general a coherent quantum state means a coherent superposition that can produce interference patterns (there is also a more specific notion of "coherent states", as in those of the harmonic oscillator; please do not confuse the two concepts). For this to happen it must be a pure state $|\psi\rangle$. If such a $|\psi\rangle$ is expressed as a superposition of two other states, say $|\psi\rangle \sim |0\rangle + |a|e^{i\theta}|1\rangle$, then it implies a well-defined relative phase (or phase difference) $\theta$ between states $|0\rangle$ and $|1\rangle$, even if the superposition amplitude $|a|e^{i\theta}$ changes in time. See some good explanations along these lines in answers to this related question.
On the other hand, the concept of incoherent superposition evolved into that of mixed state, described no longer by a state vector $|\psi\rangle$, but by a positive semidefinite state operator $\rho$ of unit trace. A mixed quantum state $\rho$ is understood in two distinct ways that are equivalent as long as the overall dynamics remains linear (yes, nonlinear dynamics would distinguish between the two):
1) Following the optics analogy: as an incoherent superposition of coherent states, or in quantum theory terms, as a statistical mixture of pure states. That is,
$$
\rho = \sum_{k}{p_k |\psi_k\rangle\langle\psi_k|}
$$
where $p_k$ is the probability of pure state $|\psi_k\rangle$, $0\le p_k\le 1$, and the states $|\psi_k\rangle$ need not be mutually orthogonal (in which case they are not the eigenstates of $\rho$, those are different and always exist!). This sort of statistical mixture is equivalent to a physical ensemble of identical quantum systems (copies), each in some pure state $|\psi_k\rangle$. In this case $p_k$ represents the frequency of copies in the respective $|\psi_k\rangle$.
2) As the reduced state of a subsystem of a larger quantum system that is overall in a pure state. This definition gives an intrinsic quantum meaning to mixed states, and relies in turn on the concept of entanglement.
Formally, a joint pure state of two systems $A$ and $B$ is entangled if it is not a direct product of "local" pure states, that is, $|\psi_{AB}\rangle \neq |\psi_A\rangle\otimes|\psi_B\rangle$. Conversely, if $A$ and $B$ are in a joint pure state, then they are disentangled if and only if each of them is in a pure state and $|\psi_{AB}\rangle = |\psi_A\rangle\otimes|\psi_B\rangle$. The latter is called a separable pure state.
The operational meaning of a separable pure state $|\psi_{AB}\rangle$ is that measurements of any two "local" observables $O_A$ and $O_B$ are statistically uncorrelated, in the sense that the average of a product $O_A O_B = O_A\otimes O_B$ equals the product of the averages, $$ \langle \psi_{AB}| O_A\otimes O_B |\psi_{AB} \rangle = \langle \psi_{AB}| O_A |\psi_{AB} \rangle \langle \psi_{AB}| O_B |\psi_{AB} \rangle $$ or equivalently, that the statistical correlation of $O_A$ and $O_B$ is null, $$ \langle \psi_{AB}| O_A\otimes O_B |\psi_{AB} \rangle - \langle \psi_{AB}| O_A |\psi_{AB} \rangle \langle \psi_{AB}| O_B |\psi_{AB} \rangle = 0 $$
On entanglement and loss of coherence:
From the above it follows immediately that a joint pure state is entangled if and only if it produces non-vanishing correlations for at least one pair of "local" observables. In this case we know with certainty that neither $A$ nor $B$ can be in pure states, since otherwise the state would be separable!
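The correlation criterion is easy to try out on two qubits. In this sketch the local states, the choice of Pauli matrices as observables, and the helper names `expectation` and `correlation` are all assumptions made for illustration.

```python
import numpy as np

def expectation(psi, op):
    """<psi|op|psi> for a normalized state vector and a Hermitian op."""
    return np.vdot(psi, op @ psi).real

def correlation(psi, O_A, O_B):
    """<O_A x O_B> - <O_A><O_B> in the joint pure state psi."""
    I = np.eye(2)
    joint = expectation(psi, np.kron(O_A, O_B))
    local_a = expectation(psi, np.kron(O_A, I))
    local_b = expectation(psi, np.kron(I, O_B))
    return joint - local_a * local_b

Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# Separable state: a product of two made-up local pure states.
psi_A = np.array([0.6, 0.8j])
psi_B = np.array([1.0, 1.0]) / np.sqrt(2)
separable = np.kron(psi_A, psi_B)

# Entangled state: (|00> + |11>)/sqrt(2).
entangled = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

corr_sep_ZZ = correlation(separable, Z, Z)
corr_sep_ZX = correlation(separable, Z, X)
corr_ent_ZZ = correlation(entangled, Z, Z)
print(corr_sep_ZZ, corr_sep_ZX, corr_ent_ZZ)
```

Checking a couple of observable pairs does not prove separability, of course; it only illustrates that the product state gives vanishing correlations while the entangled state gives a non-vanishing one for at least one pair.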
But now we can also see an interesting relation between entanglement and coherence, which answers questions 1 & 2:
An entangled pure state is by all means a coherent state, generally a coherent superposition of separable pure states of two or more subsystems. Yet the individual subsystems can no longer be in coherent, pure states themselves. This is what Chris Drost pointed out when he wrote that entanglement is paradoxically responsible for loss of coherence. Coherence is necessarily lost within individual entangled subsystems because they cannot be in coherent states, but at the same time correlations between subsystems keep the total state coherent.
Things get somewhat more complicated as soon as we acknowledge that entangled states may also be mixed states themselves, but this is the general idea.
In order to give any simple example we need to complete the 2nd definition of a mixed state above and see what becomes of the "local", reduced state of an entangled subsystem. The following derivation hopefully emphasizes the connection to basic probability rules. Let the total entangled state be $|\psi_{AB} \rangle$, or equivalently $\rho_{AB} = |\psi_{AB} \rangle \langle \psi_{AB} |$, and let $O_A$ be any arbitrary observable of $A$, with eigenbasis $\{|j_A\rangle\}_j$ and corresponding eigenvalues $\omega_j$. Also let $\{|k_B\rangle\}_k$ be an arbitrary orthonormal basis set of $B$. The average of $O_A$ in state $|\psi_{AB} \rangle$ is $$ \langle \psi_{AB} | O_A |\psi_{AB}\rangle \equiv \langle \psi_{AB} | O_A\otimes I_B |\psi_{AB}\rangle = \sum_{j,k}{\langle \psi_{AB} | j_Ak_B\rangle \omega_j \langle j_Ak_B|\psi_{AB}\rangle} = \\ = \sum_j{\omega_j \sum_k{\langle \psi_{AB} | j_Ak_B\rangle \langle j_Ak_B|\psi_{AB}\rangle}} $$ The meaning of the last expression is quite transparent, since the sum over $k$ gives the total probability $p_j$ that subsystem $A$ is in state $|j_A\rangle$ while $B$ is in any of the basis states $|k_B\rangle$. Let us rewrite this probability slightly differently: $$ p_j = \sum_k{\langle \psi_{AB} | j_Ak_B\rangle \langle j_Ak_B|\psi_{AB}\rangle} = \sum_k{\langle j_Ak_B|\psi_{AB}\rangle\langle \psi_{AB} | j_Ak_B\rangle} = \\ = \langle j_A| \left[ \sum_k{\langle k_B|\psi_{AB}\rangle\langle \psi_{AB} | k_B\rangle}\right] |j_A\rangle $$ Notice that this time the expression in the square brackets is independent of the eigenbasis $\{|j_A\rangle\}_j$ and therefore of the choice of $O_A$. 
If we denote it as $$ \rho_A = \sum_k{\langle k_B|\psi_{AB}\rangle\langle \psi_{AB} | k_B\rangle} $$ we obtain that the total probability to have subsystem $A$ in any state $|j_A\rangle$ is given by $$ p_j = \langle j_A| \rho_A |j_A\rangle $$ and that the average of $O_A$ amounts to $$ \langle \psi_{AB} | O_A |\psi_{AB}\rangle = \sum_j{\omega_j \langle j_A| \rho_A |j_A\rangle} = \sum_{j}{\langle j_A|\left[\sum_{j'}{|j'_A\rangle\omega_{j'} \langle j'_A|}\right] \rho_A |j_A\rangle} = Tr_A(O_A\rho_A) $$ It can be easily verified that the entity $\rho_A$ is in fact a hermitian, positive semidefinite operator on the Hilbert space of $A$. In addition, since the $p_j$'s must sum up to $1$, $\sum_j{p_j} = \sum_j{\langle j_A| \rho_A |j_A\rangle} = 1$, we also have that $Tr_A\rho_A = 1$, a property that is again independent of the basis $\{|j_A\rangle\}_j$. In other words, $\rho_A$ is a density matrix that encapsulates all information about the statistics of subsystem $A$, regardless of the state of $B$. It is said that the information on $B$ is averaged out.
Furthermore, we can rewrite $\rho_A$ as $$ \rho_A = \sum_k{\langle k_B|\psi_{AB}\rangle\langle \psi_{AB} | k_B\rangle} = \sum_k{\langle k_B|\left[ |\psi_{AB}\rangle\langle \psi_{AB} |\right] | k_B\rangle} = \sum_k{\langle k_B|\rho_{AB} | k_B\rangle} $$ or $$ \rho_A = Tr_B\rho_{AB} = Tr_B\left(|\psi_{AB}\rangle \langle \psi_{AB}|\right) $$ The latter expression is the one we want to keep, since it can be shown that it is independent of the choice of basis $\{|k_B\rangle\}_k$.
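The basis independence of $Tr_B$ can be checked numerically. In this sketch the dimensions (a qubit $A$ and a three-level $B$), the random joint state, and the helper name `trace_over_B` are assumptions chosen for illustration; the rotated basis of $B$ is generated from a QR decomposition.

```python
import numpy as np

def trace_over_B(rho_AB, basis_B, dim_A):
    """rho_A = sum_k <k_B| rho_AB |k_B>, with the kets |k_B> as columns of basis_B."""
    rho_A = np.zeros((dim_A, dim_A), dtype=complex)
    for k in basis_B.T:
        # Matrix of I_A (x) <k_B|, mapping the joint space onto the space of A.
        bra_k = np.kron(np.eye(dim_A), k.conj()[None, :])
        rho_A += bra_k @ rho_AB @ bra_k.conj().T
    return rho_A

dim_A, dim_B = 2, 3
rng = np.random.default_rng(1)

# Random normalized joint pure state (test data only).
psi = rng.normal(size=dim_A * dim_B) + 1j * rng.normal(size=dim_A * dim_B)
psi /= np.linalg.norm(psi)
rho_AB = np.outer(psi, psi.conj())

# Two orthonormal bases of B: the standard one and a random unitary rotation.
standard = np.eye(dim_B, dtype=complex)
rotated, _ = np.linalg.qr(rng.normal(size=(dim_B, dim_B))
                          + 1j * rng.normal(size=(dim_B, dim_B)))

rho_A_std = trace_over_B(rho_AB, standard, dim_A)
rho_A_rot = trace_over_B(rho_AB, rotated, dim_A)

print(np.allclose(rho_A_std, rho_A_rot))
print(np.isclose(np.trace(rho_A_std).real, 1.0))
```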
The density matrix $\rho_A$ describes the reduced state of subsystem $A$. Similarly, the density matrix $\rho_B = Tr_A\rho_{AB} = Tr_A\left(|\psi_{AB}\rangle \langle \psi_{AB}|\right)$ describes the reduced state of subsystem $B$. Show as an exercise that the average of any observable $O_B$ of $B$ is given by $\langle \psi_{AB} | O_B |\psi_{AB}\rangle = Tr_B\left( O_B\rho_B\right)$ :)
The above is all that is needed for a basic understanding of various examples of coherence and entanglement. For instance:
Any pure state $|\psi_A\rangle = \alpha_0|0_A\rangle + \alpha_1|1_A\rangle$ of system $A$ is a coherent superposition showing interference between pure states $|0_A\rangle$ and $|1_A\rangle$.
Same goes for states $|\psi_B\rangle = \beta_0|0_B\rangle + \beta_1|1_B\rangle$ of $B$.
States $|\psi_A\rangle\otimes|\psi_B\rangle$, $|\psi_A\rangle\otimes|0_B\rangle$, etc, are separable pure states such that both $A$ and $B$ are each individually in coherent superpositions of pure states. Interference experiments on $A$ alone will show the same interference patterns as in the absence of $B$, and vice-versa.
Entangled states $|\psi_{AB}\rangle = \gamma_0|0_A0_B\rangle + \gamma_1|1_A1_B\rangle$ of the joint system $A$-$B$ are coherent with respect to joint pure (and separable) states $|0_A0_B\rangle$ and $|1_A1_B\rangle$. That is, a joint interference experiment on $A$ and $B$ produces an interference pattern. But now the "local" state of $A$ alone is described by the reduced density matrix $$ \rho_A = Tr_B\left(|\psi_{AB}\rangle \langle \psi_{AB}|\right) = Tr_B\left[ \left(\gamma_0|0_A0_B\rangle + \gamma_1|1_A1_B\rangle \right)\left(\gamma^*_0\langle 0_A0_B| + \gamma^*_1\langle 1_A1_B| \right)\right] =\\ = |\gamma_0|^2 |0_A\rangle \langle 0_A| + |\gamma_1|^2 |1_A\rangle \langle 1_A| $$ and it is an "incoherent" mixed state: it does not produce an interference pattern by itself ("locally"), or when the interference experiment erases all information on $B$. Notice that $\rho_A$ is the intrinsic reduced (local) mixed state of $A$ when the total entangled state is $|\psi_{AB}\rangle$. No additional measurement needs to be performed on either $A$ or $B$ to bring $A$ in state $\rho_A$. Check as an exercise that the same goes for $B$.
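The contrast between the separable and entangled examples above can be made concrete with the purity $Tr(\rho_A^2)$, which equals 1 only for a pure local state. The amplitudes below are made-up values, and `reduced_A` is a hypothetical helper written for two qubits.

```python
import numpy as np

def reduced_A(psi_AB):
    """rho_A = Tr_B |psi_AB><psi_AB| for a two-qubit state vector."""
    m = psi_AB.reshape(2, 2)        # m[i, k] = <i_A k_B | psi_AB>
    return m @ m.conj().T           # sum over k of m[i, k] m*[j, k]

# Assumed amplitudes (each pair is normalized).
alpha = np.array([0.6, 0.8j])                   # |psi_A>
beta = np.array([1.0, -1.0]) / np.sqrt(2)       # |psi_B>
gamma0, gamma1 = np.sqrt(0.3), np.sqrt(0.7)

product_state = np.kron(alpha, beta)            # |psi_A> (x) |psi_B>
entangled_state = np.array([gamma0, 0.0, 0.0, gamma1])   # g0|00> + g1|11>

rho_A_product = reduced_A(product_state)
rho_A_entangled = reduced_A(entangled_state)

purity_product = np.trace(rho_A_product @ rho_A_product).real
purity_entangled = np.trace(rho_A_entangled @ rho_A_entangled).real

print(np.round(rho_A_product, 3), purity_product)      # off-diagonals survive
print(np.round(rho_A_entangled, 3), purity_entangled)  # diagonal, purity < 1
```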
Finally, a very brief answer to question 3: Yes, decoherence understood as loss of coherent superposition involves entanglement and/or dissipative dynamics in the presence of another system (measurement apparatus, environment, etc). Sometimes, though, it may mean loss of phase coherence under internal interactions.
When I have a particle $A$ in a superposition state $\psi_A = a|0\rangle + b|1\rangle$ and entangle it to another system $B,$ in state $\psi_B,$ my first particle still remains in a superposition, and its measurement is still random, is it not?
When two particles are entangled then you simply do not have particle A in state A and particle B in state B. If the two particles had their own states then the joint state would be the product of the two states.
Go back and reread the first part, where the author talks about what it means to be entangled: when you are not entangled, the general state is the product of two single-particle states. Entangled states don't have that form (by definition). If you are rereading it, note that a superposition of the two eigenstates of spin 1/2 along one direction is simply an eigenstate of spin along a differently oriented direction. A superposition of single-particle states doesn't have to be any weirder than an eigenstate, so when the author says the single-particle superpositions are weird and nonclassical, this might not be the case. And the part about expectation values is wrong too: there are no functions of x left after you take an expectation value. But the rest, such as the definition of entanglement, seemed fine, though you seem not to have grasped it.
So why do we say that entanglement destroys coherence?
Don't focus on superposition. There is no physical meaning to being the result of a superposition: what you get after a superposition could be what someone else starts with to make superpositions, so it isn't the key to anything. It's real, but don't, for instance, think you can look at something and tell whether it was a superposition. A superposition is like a sum. You might look at 5 and say that it is 2+3 and so is a sum, but someone else can look at 5+7 and say that 5 is a term. Term ... sum. You can't necessarily tell.
Interference happens when you have two things overlap and not be orthogonal. It is possible for instance to entangle the spin and still get spatial interference as long as the spin dynamics don't couple to the spatial dynamics.
Entanglement can destroy the interference by making the two waves not overlap. I said you can get interference even if you entangle the spins. A way to lose the interference is to entangle going left for one particle with going left for the other particle.
You see the wave isn't a wave in space, people just fail to tell you that sometimes. When you have two (or more) particles the wave is in configuration space, which means you assign a complex number to a 6d space where the first three coordinates tell you where the first particle is and the next three tell you where the second is and so on. So knowing all the particles tells you the configuration and knowing the configuration tells you all the particles.
So when you entangle the positions of both particles, the wave is nonzero only for configurations where they are both left or both right. To get interference you need two waves to be evaluated at the same point. In the post you read it was written as x, but it should have been a point in 6d space like $(x_1,y_1,z_1,x_2,y_2,z_2).$ They don't interfere because at every $(x_1,y_1,z_1,x_2,y_2,z_2)$ the wave that went left still has the second particle on the left, and the one that went right still has the second particle on the right, so in the 6d space the $\left|00\right\rangle $ and the $\left|11\right\rangle $ parts of the wave simply don't overlap anywhere on the screen. In a sense it is just that the waves don't overlap.
It would be great if one could elaborately show this for the simplest entangled pairs!
It is 100% like if the left slit shot the beam upwards and the right slit shot it downwards. To the right and down you see a big spot and to the left and up you see a big spot, and there is no interference because the two paths didn't overlap.
It is lack of overlap that makes coherence irrelevant. And it seems deep only because you didn't get told all the details. Every alleged deep thing in quantum mechanics is just making a big deal about the words instead of looking at the details of the dynamics of the actual experimental setup.
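The role of overlap can be illustrated with a two-level stand-in for the second particle, which is an assumption of this sketch and not the full 6d configuration-space wave. The first particle's branches $|0\rangle$ and $|1\rangle$ are paired with second-particle states $|B_0\rangle$ and $|B_1\rangle$ whose overlap is $\langle B_0|B_1\rangle = \cos\theta$. The interference term seen on the first particle alone is then scaled by that overlap.

```python
import numpy as np

def reduced_first(psi):
    """State matrix of the first particle after tracing out the second."""
    m = psi.reshape(2, 2)           # m[i, k] = amplitude of |i> (x) |k>
    return m @ m.conj().T

def entangle_with_marker(a, b, theta):
    """a|0>|B0> + b|1>|B1> with <B0|B1> = cos(theta) (hypothetical toy model)."""
    e0, e1 = np.eye(2)
    B0 = np.array([1.0, 0.0])
    B1 = np.array([np.cos(theta), np.sin(theta)])
    return a * np.kron(e0, B0) + b * np.kron(e1, B1)

a = b = 1 / np.sqrt(2)              # assumed amplitudes

for theta in (0.0, np.pi / 3, np.pi / 2):
    rho = reduced_first(entangle_with_marker(a, b, theta))
    # The off-diagonal term, which carries the interference, is a b* cos(theta).
    print(round(float(np.cos(theta)), 3), np.round(rho[0, 1], 3))
```

When the two marker states coincide ($\theta = 0$) nothing is entangled and the full interference term survives; when they are orthogonal ($\theta = \pi/2$) the branches do not overlap at all and the term vanishes.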
Is the point maybe that if $B$ is measured first, only then $A$ loses its coherence?
The order of measurements on different particles does not change the frequency of the results you get.
does the converse mean that the concept of decoherence is tightly related to entanglement of a small system with its environment?
Yes. What you call measurement is the end result of a process of entangling the subject with the device and then the environment. Entanglement is natural.
Or in other words would decoherence happen at all without entanglement?
There is no "without entanglement": entanglement is a natural thing that happens all the time. There is no known way to not have it; I guess if you had no interactions you might be able to avoid it.