Effective number of bits of 14-bit ADC
You've been bamboozled!
14-bit is marketing speak, and since the hardware really does hand you 14 bits per sample, they'll say you have nothing to complain about. Just above the ENOB entry, the datasheet gives the SINAD (Signal to Noise and Distortion) figure. That's 72 dB, and 1 bit corresponds to roughly 6 dB, so 72 dB works out to about 12 bits. The 2 lowest bits are noise.
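For anyone who wants the arithmetic spelled out, here is a minimal sketch. It uses the standard full-scale sine-wave relation ENOB = (SINAD - 1.76) / 6.02, which is slightly stricter than the 6 dB-per-bit rule of thumb. The function name is just illustrative, and the 72 dB figure is the one quoted above.

```python
def enob_from_sinad(sinad_db: float) -> float:
    """Effective number of bits from SINAD in dB (full-scale sine-wave convention)."""
    return (sinad_db - 1.76) / 6.02

# Rule of thumb: 72 dB / 6 dB per bit = 12 bits.
rule_of_thumb_bits = 72.0 / 6.0

# The standard formula gives slightly less than that.
print(round(enob_from_sinad(72.0), 2))  # 11.67
```

Either way, the answer lands around 12 bits rather than 14.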
It's possible to retrieve data which is lower than the noise floor, but it needs very good correlation, which means it has to be very predictable.
Suppose one wishes to measure a steady voltage as accurately as possible, using an ADC that will return an 8-bit value for each measurement. Suppose further that the ADC is specified so that a code of N will nominally be returned for voltages between (N-0.5)/100 and (N+0.5)/100 volts (so e.g. a code of 47 would nominally represent something between 0.465 and 0.475 volts). What should one wish to have the ADC output if fed a steady-state voltage of precisely 0.47183 volts?
If the ADC always outputs the value that represents the above-defined range in which the input falls (47 in this case), then no matter how many readings one takes, the value will appear to be 47. Resolving anything finer than that would be impossible.
Suppose instead that the ADC were constructed so that a random "dither" value linearly distributed from -0.5 to +0.5 LSB were added to each reading before converting it to an integer. Under that scenario, a voltage of 0.47183 volts would return a reading of 48 approximately 18.3% of the time, and a reading of 47 the other 81.7% of the time. If one computed the average of 10,000 readings, one should expect it to be approximately 47.183. Because of the randomness, it may be slightly higher or lower, but it should be pretty close. Note that if one takes enough readings, one may achieve an arbitrary level of expected precision, though each additional bit would require roughly quadrupling the number of readings, since the error of the average shrinks only as the square root of the number of readings.
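The averaging effect is easy to try out in simulation. The following is only a sketch of the idealized ADC described above, not a model of any real part: the 0.01 V step matches the example spec, and the function names and the seed are arbitrary choices of mine.

```python
import math
import random

LSB_VOLTS = 0.01  # one code step, per the example spec (code N covers (N +/- 0.5)/100 V)

def ideal_code(volts):
    """Non-dithering ADC: round the input to the nearest code."""
    return math.floor(volts / LSB_VOLTS + 0.5)

def dithered_code(volts, rng):
    """Add dither spread evenly over -0.5..+0.5 LSB, then round to the nearest code."""
    return math.floor(volts / LSB_VOLTS + rng.uniform(-0.5, 0.5) + 0.5)

def average_code(volts, readings, seed=1):
    """Average many dithered readings of a steady input."""
    rng = random.Random(seed)
    return sum(dithered_code(volts, rng) for _ in range(readings)) / readings

print(ideal_code(0.47183))            # always 47, however often it is read
print(average_code(0.47183, 10_000))  # should land close to 47.183
```

With 10,000 readings the standard error of the average is about 0.004 codes, so getting another bit of resolution out of it means taking roughly four times as many readings.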
Adding in precisely one LSB of linearly-distributed dithering would be a very nice behavior for an ADC. Unfortunately, implementing such behavior is not easy. If the dithering is not linearly distributed, or if its magnitude is not precisely one LSB, the amount of real precision one could get from averaging would be severely limited, no matter how many samples are used. If instead of adding one LSB of linearly-distributed randomness, one adds several LSBs' worth, achieving a given level of precision will require more readings than would be required using ideal one-LSB randomness, but the ultimate limit to the accuracy that can be achieved by taking an arbitrary number of readings will be far less sensitive to imperfections in the dithering source.
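To get a feel for that sensitivity, one can compute the long-run average an ADC would converge to for a given dither width, and see how far it sits from the true input. This is a rough numerical sketch under my own assumptions: the dither is evenly spread over the stated width, the widths 0.8 and 4.8 LSB are arbitrary stand-ins for a source that is 0.2 LSB short of its target, and the helper names are hypothetical.

```python
import math

def expected_code(x, width, steps=2000):
    """Long-run mean of round(x + d), with d spread evenly over +/- width/2.

    x and width are in LSB units; the mean is approximated on a midpoint grid.
    """
    total = 0
    for i in range(steps):
        d = (i + 0.5) / steps * width - width / 2
        total += math.floor(x + d + 0.5)
    return total / steps

def worst_bias(width):
    """Largest |expected_code(x) - x| over inputs spanning one LSB."""
    inputs = [47 + k / 50 for k in range(50)]
    return max(abs(expected_code(x, width) - x) for x in inputs)

for width in (1.0, 0.8, 4.8):
    print(f"dither width {width} LSB: worst-case bias {worst_bias(width):.4f} LSB")
```

The bias reported here is what remains after averaging infinitely many readings. It should come out near zero for the ideal 1.0 LSB case, and noticeably smaller for the 4.8 LSB source than for the 0.8 LSB one, even though both miss their target width by the same 0.2 LSB.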
Note that in some applications, it's best to use an ADC which does not dither its result. This is especially true in circumstances where one is more interested in observing changes in ADC values than in the precise values themselves. If quickly resolving the difference between a +3 unit/sample and a +5 unit/sample rate of increase is more important than knowing whether a steady-state voltage is precisely 13.2 or 13.4 units, a non-dithering ADC may be better than a dithering one. On the other hand, use of a dithering ADC may be helpful if one wants to measure things more precisely than a single reading would allow.
Bonus lesson: You really do get 14 bits of precision, but only 12 bits of accuracy.