Conditional expectation as a Radon-Nikodym derivative.

Your intuition and formula make sense when each $E_i$ is an elementary event in ${\cal F}$, i.e. one that cannot be decomposed into two disjoint events, both of positive probability. If it can be so decomposed, there is a mismatch: in general $E(X|{\cal F})(\omega)$ is not constant over $\omega\in E_i$, while your formula is.

Suppose that $\Omega$ may be partitioned into a countable family of disjoint measurable events $E_i$, $i\geq 1$. It suffices to keep only the events of strictly positive probability, as they carry the total probability. The $\sigma$-algebra ${\cal F}$ generated by this partition consists precisely of all unions of elements of this family. A function measurable w.r.t. ${\cal F}$ is precisely a (countable) linear combination of the characteristic functions $\chi_{E_i}$ of our disjoint family of events. We may thus write: $$ E(X|{\cal F})(\omega) = \sum_j c_j \chi_{E_j}(\omega).$$ The constants may be computed from the fact that $\int_{E_i} E(X|{\cal F})\, dP = c_i P(E_i) = \int_{E_i} X\, dP$. We get: $$ E(X|{\cal F})(\omega) = \sum_j \chi_{E_j}(\omega)\, \frac{1}{P(E_j)} \int_{E_j} X\, dP, $$ corresponding to the formula you mentioned. By writing down the defining equation you see that this is indeed the Radon-Nikodym derivative of $\nu(E)=\int_E X\, dP$, $E\in {\cal F}$, with respect to $P_{|{\cal F}}$.
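The partition formula above is easy to check numerically. Here is a minimal sketch on a hypothetical finite sample space (the sizes, probabilities and the partition into three events are all assumptions for illustration): on each $E_j$ the conditional expectation is the $P$-average of $X$ over $E_j$, and integrating it over any $E_j$ reproduces $\nu(E_j)=\int_{E_j} X\, dP$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite Omega = {0,...,11} with a partition {E_0, E_1, E_2}
# generating the sub-sigma-algebra F (all choices here are illustrative).
n = 12
p = rng.dirichlet(np.ones(n))                   # probabilities P({omega})
X = rng.normal(size=n)                          # an integrable random variable
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])  # omega in E_{labels[omega]}

# c_j = (1 / P(E_j)) * integral_{E_j} X dP, constant on each E_j
cond_exp = np.empty(n)
for j in range(3):
    mask = labels == j
    cond_exp[mask] = (X[mask] * p[mask]).sum() / p[mask].sum()

# Defining (Radon-Nikodym) property: integral_{E_j} E(X|F) dP = integral_{E_j} X dP
for j in range(3):
    mask = labels == j
    assert np.isclose((cond_exp[mask] * p[mask]).sum(), (X[mask] * p[mask]).sum())

# In particular the tower property E[E(X|F)] = E[X] holds:
print(np.isclose((cond_exp * p).sum(), (X * p).sum()))  # True
```

Since every $E\in{\cal F}$ is a union of the $E_j$, checking the defining property on the three cells already checks it on all of ${\cal F}$.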

Conditional expectation, however, becomes less intuitive when ${\cal F}$ is no longer generated by a countable partition, although sometimes you may find a tweak to get around this. Example: Let $P$ be a probability measure on ${\Bbb R}$ having a density $f\in L^1({\Bbb R})$ with respect to Lebesgue measure, $dP(x) = f(x)\, dx$.

We will consider the sub-$\sigma$-algebra ${\cal F}$ of symmetric Borel sets: $A\in {\cal F}$ iff ($x\in A \Leftrightarrow -x\in A$).

A function measurable w.r.t. ${\cal F}$ is now any Borel function which is symmetric, i.e. $Y(x)=Y(-x)$ for all $x$. This time an elementary event is a symmetric pair $\{x,-x\}$, which has zero probability, and you cannot throw all of these away when calculating the conditional expectation. So, going back to the definition, given $X\in L^1(dP)$ you need to find a symmetric integrable function $Y$ so that for any measurable $I\subset (0,+\infty)$ you have: $$ \int_{I\cup (-I)} Y\; dP = \int_{I\cup (-I)} X \; dP. $$ Using that $Y$ is symmetric and a change of variables, this becomes: $$ \int_I Y(x) (f(x)+f(-x)) \; dx = \int_I (X(x) f(x) + X(-x) f(-x)) \; dx. $$ On the set $\Lambda = \{ x\in {\Bbb R} : f(x)+f(-x)>0 \}$, which has full probability, we may then solve this by defining: $$ Y(x) = \frac{X(x) f(x) + X(-x) f(-x) }{f(x) + f(-x) }, \quad x\in \Lambda. $$ On the complement $Y$ is not defined, but the complement has zero probability. $Y$ is then symmetric and has the same integral as $X$ over every symmetric event. Again, $Y$ is the Radon-Nikodym derivative of $\nu(E) = \int_E X \; dP$ with respect to $P(E)$, $E\in {\cal F}$.
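The formula for $Y$ can also be sanity-checked numerically. A minimal sketch, with assumed illustrative choices (a Gaussian density centered at $1$, so $f$ is not symmetric about $0$, and $X(x)=x$): we verify that $Y$ is symmetric and that it matches $X$ in $P$-integral over a symmetric event $I\cup(-I)$.

```python
import numpy as np
from scipy.integrate import quad

# Assumed for illustration: density of N(1,1) (asymmetric about 0) and X(x) = x.
f = lambda x: np.exp(-(x - 1.0) ** 2 / 2) / np.sqrt(2 * np.pi)
X = lambda x: x

def Y(x):
    # The symmetric representative derived above; f(x)+f(-x) > 0 everywhere here.
    return (X(x) * f(x) + X(-x) * f(-x)) / (f(x) + f(-x))

# Y is symmetric:
assert np.isclose(Y(0.7), Y(-0.7))

# Defining property on the symmetric event I ∪ (−I) with I = (0.5, 2):
lhs = quad(lambda x: Y(x) * f(x), 0.5, 2)[0] + quad(lambda x: Y(x) * f(x), -2, -0.5)[0]
rhs = quad(lambda x: X(x) * f(x), 0.5, 2)[0] + quad(lambda x: X(x) * f(x), -2, -0.5)[0]
print(np.isclose(lhs, rhs))  # True
```

The check works for any measurable $I\subset(0,\infty)$, since $Y(x)f(x)+Y(-x)f(-x) = Y(x)(f(x)+f(-x)) = X(x)f(x)+X(-x)f(-x)$ pointwise.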

Our luck here is that there is a simple symmetry, $x\mapsto -x$, describing the events in ${\cal F}$, and that the probability measure transforms nicely under it. In more general situations you may not be able to describe $E(X|{\cal F})$ explicitly in terms of the values of $X$, and you are stuck with just the defining properties of conditional expectation [which, on the other hand, may suffice for whatever computation you need to carry out].