The statistical nature of the 2nd Law of Thermodynamics
Giving a full answer to this one takes quite a bit of information, so I'll first give a few references and then summarise how they all fit in.
References
Relevant Physics SE Questions
Does the scientific community consider the Loschmidt paradox resolved? If so what is the resolution?
Theoretical proof forbidding Loschmidt reversal?
Perpetual motion machine of the second kind possible in nano technology?
Review Papers
There are three review papers describing the concepts I am about to talk about:
Sevick, E. M.; Prabhakar, R.; Williams, Stephen R.; Bernhardt, Debra Joy, "Fluctuation Theorems", Annual Rev. of Phys. Chem., 59, pp. 603-633 (this one is paywalled).
E. T. Jaynes, "Gibbs vs Boltzmann Entropies", Am. J. Phys. 33, number 5, pp 391-398, 1965 as well as many other of his works in this field
Charles Bennett, "The Thermodynamics of Computing: A Review", Int. J. Theoretical Physics, 21, 12, 1982
And a remarkable experiment that actually BUILDS AND TESTS the Maxwell Daemon.
- Shoichi Toyabe; Takahiro Sagawa; Masahito Ueda; Eiro Muneyuki; Masaki Sano (2010-09-29). "Information heat engine: converting information to energy by feedback control". Nature Physics 6 (12): 988–992. arXiv:1009.5287. Bibcode:2011NatPh...6..988T. doi:10.1038/nphys1821.
"We demonstrated that free energy is obtained by a feedback control using the information about the system; information is converted to free energy, as the first realization of Szilard-type Maxwell’s demon."
Now Your Question
Now to your question. You are quite right in your conclusion about the second law's statistical nature:
... But then someone imagines a box with a 10 particle gas, and finds that every now and then all particles are in the left. Conclusion, the 2nd law holds only in a statistical sense ...
and indeed various fluctuation theorems (see the "Fluctuation Theorem" Wikipedia page as well as the "Fluctuation Theorems" review paper I cited above) quantify the probability of observing deviations of a given "severity" from the second law. For the reason you clearly understand, the smaller the system, the less meaningful it becomes to describe it in terms of "macroscopic" properties such as temperature, pressure and so forth (indeed these quantities can be construed as parameters of a statistical population, and such parameters have less and less relevance for smaller and smaller samples drawn from that population).
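To put rough numbers on this (my own back-of-envelope figures together with the textbook form of the Evans-Searles fluctuation theorem, not taken from the review): for a gas of $N$ independent particles the chance of finding all of them in the left half of the box at a given instant is

$$P(\text{all } N \text{ in the left half}) = 2^{-N}, \qquad 2^{-10} \approx 10^{-3}, \qquad 2^{-10^{23}} \approx 10^{-3\times 10^{22}},$$

and the fluctuation theorem generalises this kind of estimate to entropy-consuming trajectories; in one common convention,

$$\frac{P(\bar{\Sigma}_t = A)}{P(\bar{\Sigma}_t = -A)} = e^{A\,t},$$

where $\bar{\Sigma}_t$ is the entropy production rate (in units of $k_B$) averaged over the observation time $t$. Second-law-violating fluctuations are thus exponentially suppressed as either the system size or the observation time grows.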
So I think the most meaningful version of the second law to address for this question is Carnot's classic macroscopic statement that it is "impossible to build a perpetual motion machine of the second kind". A particular property of such a perpetual motion machine is the periodicity of its interaction with its surroundings: it undergoes a periodic cycle, and when it comes back to its beginning point the machine, together with any internal records it keeps, is back in its beginning state, the only net change in the world being heat drawn from its surroundings and converted to work. So the impossibility of the perpetual motion machine of the second kind is about "not winning in the long term": you might convert small amounts of heat from a system at uniform thermodynamic temperature into useful work in the short term by dint of fluctuations, but in the long term you cannot. Ultimately this is an experimental fact, and it is thought to be owing to the boundary conditions of the universe.
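One compact, standard way of writing the "complete cycle" requirement (a textbook statement, not tied to any of the references above) is the Clausius inequality,

$$\oint \frac{\delta Q}{T} \le 0,$$

taken around any complete cycle of the machine. If the machine exchanges heat with only a single reservoir at temperature $T$, this forces the net heat absorbed over the cycle, and hence by the first law the net work done, to be non-positive, however favourable the short-term fluctuations may have been.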
The Szilard Engine and Maxwell Daemons: Information is Physical
Let's look at the Szilard engine and the Maxwell Daemon first: the latter was conceived by Maxwell to illustrate that the second law was "just statistical", and it does seem to thwart the second law, as does the Szilard engine. Both do win in the short term, but in the long term they do not. The full resolution of the problem is discussed in detail in Bennett's paper cited above, and the reason they do not win is Landauer's Principle: the idea that the merging of two computational paths, or the erasing of one bit of information, always costs useful work, an amount $k_B\,T\,\ln 2$, where $k_B$ is Boltzmann's constant and $T$ is the temperature of the system doing the computation.
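To put a number on that bound (my arithmetic, not from Bennett's paper): at room temperature,

$$k_B\,T\,\ln 2 \approx (1.381\times 10^{-23}\,\mathrm{J\,K^{-1}})\times(300\,\mathrm{K})\times 0.693 \approx 2.9\times 10^{-21}\,\mathrm{J} \approx 0.018\,\mathrm{eV}$$

per erased bit - tiny on everyday scales, but strictly nonzero, and it is precisely this nonzero cost that saves the second law.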
Bennett studied perfectly reversible mechanical gates ("billiard ball computers", due to Fredkin and Toffoli) whose state can be polled without the expenditure of energy, and used such mechanical gates to study the Szilard Engine as a thought experiment and to show that Landauer's Limit arises not from the cost of finding out a system's state (as Szilard had originally assumed) but from the need to continually "forget" former states of the engine.
Probing this idea more carefully, as also done in Bennett's paper: one can indeed build a Maxwell Daemon with simple finite state machines in the laboratory, as described in the Nature Physics paper I cited. As the Daemon converts heat to work, it must record a sequence of bits describing which side of the Daemon's door (or which side of the engine's piston, in the equivalent discussion of the Szilard engine) the molecules were on. For a finite memory machine, one eventually needs to erase the memory so that the machine can keep working.
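For the single-molecule Szilard engine the bookkeeping can be made explicit (this is the standard textbook calculation, not something specific to the Toyabe et al. experiment): once the Daemon knows which half of the box the molecule is in, it can let the molecule push a partition isothermally from volume $V/2$ to $V$, extracting

$$W = \int_{V/2}^{V} \frac{k_B\,T}{V'}\,\mathrm{d}V' = k_B\,T\,\ln 2$$

of work per recorded bit - exactly the Landauer cost of eventually erasing that bit, so over a genuinely complete cycle the Daemon at best breaks even.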
However, "information" ultimately is not abstract - it needs to be "written in some kind of ink" you might say - and that ink is the states of physical systems. The fundamental laws of physics are reversible, so that one can in principle compute any former state of a system from the full knowledge of any future state - no information gets lost. So, if the finite state machine's memory is erased, the information encoded that memory must show up, recorded somehow, as changes in the states of the physical system making up and surrounding the physical memory.
So now those physical states behave just like the computer memory: eventually those physical states can encode no more information, and the increased thermodynamic entropy of that physical system must be thrown out of the system, with the work expenditure required by the Second Law, before the Daemon can keep working. The need for this work is begotten of the need to erase information, and is the ultimate justification for Landauer's principle.
The Szilard Engine and Daemon "win" in the short term because they are not truly cyclic: they change the states of memory: the second law prevails when that memory is brought back to its beginning state too.
Another Illustration of Non-Cyclic Thwarting of the Second Law
Another illustration of the importance of true cycles in considering the second law is a "trick" whereby one can extract ALL of the enthalpy of a chemical reaction as useful work IF one has a sequence of cooler and cooler reservoirs that one can use as follows: (1) lower the reactants down to absolute zero temperature by drawing heat from the reactants into the reservoirs, (2) let the reaction go ahead at absolute zero, thus extracting all of the reactants' enthalpy as work, and then (3) use the sequence of reservoirs, in order of rising temperature, to bring the reaction products back to the beginning temperature. The point is that some of the heat drawn off during the cooling steps is now left in the cold reservoirs, so the system has not been taken through a complete cycle. One cannot do this indefinitely: the cool reservoirs eventually heat up if one repeats the procedure. You might "win" with small amounts of reactants, but you cannot keep winning because you are degrading the system: the work needed to restore the cold reservoirs to their beginning state is precisely the difference between the enthalpy of reaction and the free energy.
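In symbols (just the standard constant-temperature identities, with no reference to a specific reaction): the maximum work obtainable when the reaction is run in a genuinely cyclic fashion at temperature $T$ is set by the free energy, and

$$\Delta G = \Delta H - T\,\Delta S \qquad\Longrightarrow\qquad \Delta H - \Delta G = T\,\Delta S,$$

so the gap between "all of the enthalpy" and the free energy actually available over a complete cycle is $T\,\Delta S$, which is the work one must later pay to restore the cold reservoirs.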
"Proofs" of the Second Law
E. T. Jaynes tried to bring information theory rigorously to bear on thermodynamics and critically examined Boltzmann's concept of entropy. In particular, the Boltzmann "Stosszahlansatz" (assumption of molecular chaos) can often only be applied once, because subsequent changes to the system leave the states of the molecules of a gas correlated; this begets the difference between the Gibbs (informational) entropy and the Boltzmann ("experimental", i.e. defined only when you have big systems) entropy, the former being unchanged in things like irreversible volume changes, the latter always increasing. So, from an assumption of molecular chaos, one can prove once that the Boltzmann entropy must increase in an irreversible change. But the irreversible change, and the correlations between system constituents it begets, mean that one cannot apply the assumption of molecular chaos again and repeat the proof unless one can explain how the system gets back to a state where the states of all its constituent parts are uncorrelated. See the Jaynes papers in my references: Jaynes eventually argues that one needs to appeal to experiment to support the large scale second law of thermodynamics.
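For concreteness, the two entropies can be written as follows (standard definitions, not Jaynes's exact notation):

$$S_{\text{Gibbs}} = -k_B \int \rho \ln \rho \;\mathrm{d}\Gamma, \qquad S_{\text{Boltzmann}} = k_B \ln W,$$

where $\rho$ is the full $N$-particle phase-space probability density and $W$ is the phase-space volume (number of microstates) compatible with the observed macroscopic state. Liouville's theorem keeps $S_{\text{Gibbs}}$ constant under the exact, reversible microscopic dynamics, which is why only $S_{\text{Boltzmann}}$ - computed as if the molecular correlations were absent - can be seen to increase.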
So ultimately it would seem that the statement that the Boltzmann entropy of a system always increases in the long term can only be substantiated experimentally. The question of why the entropy of a system always increases, even though the underlying physical laws are just as valid with time running backwards, is called "Loschmidt's Paradox". There has been a great deal of work to understand this, and it is generally agreed that the answer has to do with the "boundary conditions" of the universe - roughly put, the universe was (as an observed fact) in an exquisitely low entropy state at the big bang, and so the overwhelmingly likeliest history is one where entropy rises with increasing time. But how and why that low entropy state arose is, as I understand it, one of the profound mysteries of modern physics. A good layperson's summary of why we have a second law of thermodynamics, of how entropy is to some extent a subjective concept, and a discussion of this profound mystery can be found in chapter 27 of Roger Penrose's "The Road to Reality". I would highly recommend you look at this reference.