Intuition behind Transition Kernels (without use of Markov Chains)

I would recommend first studying basic discrete-time Markov chains without measure theory, as in my comment above.


To give a small amount of help in your current predicament: Basic discrete-time Markov chains $\{M(t)\}_{t=0}^{\infty}$ over a finite or countably infinite state space $S$ are defined by a transition probability matrix $P=(P_{ij})$ such that $$P_{ij}=P[M(t+1)=j|M(t)=i] \quad \mbox{for all $i,j\in S$.} $$
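For concreteness, here is a minimal Python sketch of the finite-state case; the 3-state matrix `P` below is made up purely for illustration and is not part of the discussion above:

```python
# Minimal sketch: a hypothetical 3-state discrete-time Markov chain.
import numpy as np

rng = np.random.default_rng(0)

# Rows index the current state i, columns the next state j; each row sums to 1.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])
assert np.allclose(P.sum(axis=1), 1.0)

def step(i):
    """Sample M(t+1) given M(t) = i, i.e. draw j with probability P[i, j]."""
    return int(rng.choice(len(P), p=P[i]))

# Simulate a short trajectory starting from state 0.
state = 0
path = [state]
for _ in range(10):
    state = step(state)
    path.append(state)
print(path)
```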

The crazy kernel stuff generalizes $P$ to uncountably infinite state spaces $S$. We have $$K(x,B)=P[M(t+1)\in B|M(t)=x] $$ for all $x\in S$ and all (measurable) subsets $B\subseteq S$. So, for each fixed current state $x$, the kernel $K(x,\cdot)$ defines a probability measure over the next state.

An example is the state space $S$ given by the unit interval $[0,1]$. Given $M(t)=x$, define the next state by $M(t+1)= (x + U_t) \bmod 1$, where $\{U_t\}_{t=0}^{\infty}$ are i.i.d. random variables uniformly distributed over $[0,1/3]$ and the $\bmod 1$ operation wraps the number back into the unit interval. So, for example, given $x=1/3$, the kernel $K(1/3,\cdot)$ formally defines the next state according to a uniform distribution over $[1/3, 2/3]$. So: \begin{align} K(1/3, [2/3,1]) &= 0\\ K(1/3, [1/3, 1]) &= 1\\ K(1/3, [1/3, 1/2]) &=1/2 \end{align} and so on. Similarly, you can check that $K(5/6, [0,1/2]) = 1/2$; here the wrap-around matters, since the next state is uniformly distributed over $[5/6,1)\cup[0,1/6]$.
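Here is a small Monte Carlo sketch in Python that simulates one step of this wrap-around chain and estimates the kernel probabilities above (the helper name `K_estimate` is just for illustration):

```python
# Estimate K(x, [a, b]) = P[(x + U) mod 1 in [a, b]] with U ~ Uniform[0, 1/3].
import numpy as np

rng = np.random.default_rng(0)

def K_estimate(x, a, b, n=1_000_000):
    """Monte Carlo estimate of K(x, [a, b]) by sampling next states (x + U) mod 1."""
    next_states = (x + rng.uniform(0, 1/3, size=n)) % 1.0
    return np.mean((next_states >= a) & (next_states <= b))

print(K_estimate(1/3, 2/3, 1))    # ~0
print(K_estimate(1/3, 1/3, 1))    # ~1
print(K_estimate(1/3, 1/3, 1/2))  # ~0.5
print(K_estimate(5/6, 0, 1/2))    # ~0.5
```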


Assuming the Markov states $\vec{M}(t)$ are vectors in $\mathbb{R}^n$, I personally would avoid the “kernel” stuff. I would just define the next-state transitions in terms of a conditional CDF $$F(\vec{y}|\vec{x}) = P[\vec{M}(t+1) \leq \vec{y} | \vec{M}(t)=\vec{x}]$$ defined for all vectors $\vec{x}, \vec{y} \in \mathbb{R}^n$, where the inequality $\vec{M}(t+1) \leq \vec{y}$ is interpreted entrywise. Conditional PDFs can then be defined as (mixed partial) derivatives of the above (when differentiable). That is equivalent to specifying a kernel, yet it seems simpler. But that is just me.
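For instance, specializing to the one-dimensional wrap-around example above with current state $x=1/3$ (so no wrap-around occurs), the next state is uniform over $[1/3,2/3]$ and the conditional CDF is $$F(y|1/3) = P[M(t+1)\leq y \, | \, M(t)=1/3] = \begin{cases} 0 & y < 1/3\\ 3\left(y-\frac{1}{3}\right) & 1/3 \leq y \leq 2/3\\ 1 & y > 2/3 \end{cases}$$ whose derivative is the uniform density $3$ on $[1/3,2/3]$, consistent with the kernel values above (e.g. $F(1/2|1/3) = 1/2 = K(1/3,[1/3,1/2])$).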