How does light combine to make new colours?
Color perception is entirely a biological (and psychological) response. The combination of red and green light looks indistinguishable, to human eyes, from certain yellow wavelengths of light, but that is because human eyes have the specific types of color photoreceptors that they do. The same won't be true for other species.
A reasonable model for colour is that the eye takes the overlap of the wavelength spectrum of the incoming light against the response functions of the three types of photoreceptors, which look basically like this:
If the light has two sharp peaks on the green and the red, the output is that both the M and the L receptors are equally stimulated, so the brain interprets that as "well, the light must've been in the middle, then". But of course, if we had an extra receptor in the middle, we'd be able to tell the difference.
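As a rough illustration of this overlap model, here is a minimal Python sketch. The Gaussian cone curves, their peak wavelengths and their width are assumptions made only for the example (real cone response curves are not Gaussian), and the function names are hypothetical. It finds intensities for a green line and a red line that give the same M and L responses as a single yellow line:

```python
import math

# Assumed, illustrative cone model: each cone type is a Gaussian in wavelength.
# Peak positions and the common width are rough placeholders, not measured data.
CONE_PEAKS_NM = {"S": 445.0, "M": 540.0, "L": 565.0}
CONE_WIDTH_NM = 40.0


def sensitivity(cone, wavelength_nm):
    """Relative sensitivity of one cone type at a single wavelength."""
    peak = CONE_PEAKS_NM[cone]
    return math.exp(-0.5 * ((wavelength_nm - peak) / CONE_WIDTH_NM) ** 2)


def cone_responses(spectrum):
    """Overlap of a line spectrum {wavelength_nm: intensity} with each cone."""
    return {
        cone: sum(power * sensitivity(cone, wl) for wl, power in spectrum.items())
        for cone in CONE_PEAKS_NM
    }


def match_with_two_lines(target, wl_a, wl_b):
    """Intensities of two lines that reproduce the target's M and L responses."""
    # Solve the 2x2 linear system for the two intensities by Cramer's rule.
    m_a, m_b = sensitivity("M", wl_a), sensitivity("M", wl_b)
    l_a, l_b = sensitivity("L", wl_a), sensitivity("L", wl_b)
    det = m_a * l_b - m_b * l_a
    a = (target["M"] * l_b - m_b * target["L"]) / det
    b = (m_a * target["L"] - target["M"] * l_a) / det
    return {wl_a: a, wl_b: b}


yellow = cone_responses({580.0: 1.0})            # one "yellow" line
mix = match_with_two_lines(yellow, 540.0, 620.0)  # green line + red line
mixed = cone_responses(mix)

print("yellow line:", yellow)
print("green + red:", mixed)
```

In this toy model the M and L responses of the mix agree with those of the yellow line by construction, while the S responses are both small but not identical, so the match is only as good as the assumed curves allow.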
There are two more rather interesting points in your question:
every colour within the visible spectrum can somehow be created by "combining" the three in different intensities.
This is false. There is a sizable chunk of color space that's not available to RGB combinations. The basic tool to map this is called a chromaticity plot, which looks like this:
The pure-wavelength colours are on the curved outside edge, labelled by their wavelength in nanometers. The core standard for RGB-combination devices is to display the colours inside the triangle marked sRGB; depending on the device, its gamut may fall short of this triangle or go beyond it and cover a larger one (and if this larger triangle is big enough to cover, say, a good fraction of the Adobe RGB space, then that is typically prominently advertised), but it is still only a fraction of the total color space available to human vision.
(A cautionary note: if you're seeing chromaticity plots on a device with an RGB screen, then the colors outside your device's renderable space will not be rendered properly and they will seem flatter than the actual colors they represent. If you want the full difference, get a prism and a white-light source and form a full spectrum, and compare it to the edge of the diagram as displayed in your device.)
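To make the "inside the triangle" statement concrete, here is a small Python sketch that tests whether a chromaticity point lies inside the sRGB triangle. The primaries and white point are the standard sRGB values in CIE 1931 xy coordinates; the coordinates used for the 520 nm spectral colour are approximate, and the helper names are made up for this example:

```python
# CIE 1931 xy chromaticities of the sRGB primaries and the D65 white point.
SRGB_PRIMARIES = {"R": (0.64, 0.33), "G": (0.30, 0.60), "B": (0.15, 0.06)}
D65_WHITE = (0.3127, 0.3290)

# Approximate xy position of pure 520 nm light on the spectral edge.
SPECTRAL_520NM = (0.0743, 0.8338)


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); its sign says which side of o->a b is on."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(point, primaries=SRGB_PRIMARIES):
    """True if an xy chromaticity lies inside (or on) the primaries' triangle."""
    r, g, b = primaries["R"], primaries["G"], primaries["B"]
    signs = [_cross(r, g, point), _cross(g, b, point), _cross(b, r, point)]
    # Inside means the point is on the same side of all three edges.
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)


print(in_triangle(D65_WHITE))       # white point of the display
print(in_triangle(SPECTRAL_520NM))  # pure spectral green
```

Passing a different set of primaries (for example those of a wider-gamut display) gives the same test for a larger triangle.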
Is RGB the only such triplet?
No. There are plenty of possible number-triplet ways to encode color, known as color spaces, each with their own advantages and disadvantages. Some common alternatives to RGB are CMYK (cyan-magenta-yellow-black), HSV (hue-saturation-value) and HSL (hue-saturation-lightness), but there are also some more exotic choices like the CIE XYZ and LAB spaces. Depending on their ranges, they may be re-encodings of the RGB color space (or coincide with re-encodings of RGB on parts of their domains), but some color spaces use separate approaches to color perception (i.e. they may be additive, like RGB, subtractive, like CMYK, or a nonlinear re-encoding of color, like XYZ or HSV).
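As a small illustration of re-encoding the same colour as different number triplets, the sketch below uses Python's standard-library colorsys module for HSV and HLS. The CMYK conversion is a naive textbook formula written for this example (the function name is hypothetical); real printing workflows use device profiles instead:

```python
import colorsys


def rgb_to_cmyk(r, g, b):
    """Naive, device-independent RGB -> CMYK for components in [0, 1]."""
    k = 1.0 - max(r, g, b)
    if k == 1.0:  # pure black: avoid dividing by zero
        return (0.0, 0.0, 0.0, 1.0)
    return tuple((1.0 - ch - k) / (1.0 - k) for ch in (r, g, b)) + (k,)


yellow_rgb = (1.0, 1.0, 0.0)
yellow_hsv = colorsys.rgb_to_hsv(*yellow_rgb)  # hue is a fraction of a full turn
yellow_hls = colorsys.rgb_to_hls(*yellow_rgb)  # note the H, L, S ordering
yellow_cmyk = rgb_to_cmyk(*yellow_rgb)

print("HSV :", yellow_hsv)
print("HLS :", yellow_hls)
print("CMYK:", yellow_cmyk)
```

All three outputs describe the same RGB yellow; only the coordinates change.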
In the eye's retina there are three types of cones that act like the filters in the figure, spanning fairly wide frequency bands.
There you can see that pure yellow light will stimulate both "red" and "green" cones.
So, by getting light from nearby pixels of red and green, the retinal cones will respond the same way as they would to pure yellow, if the mix is right.
So it is very much a biological thing. Notice that a wavelength that stimulates a green cone will also stimulate at least one of the red and blue cones. We could thus imagine artificially stimulating only green cones (with electrodes) and might then see a so-called impossible color.
As for RGB alternatives: yes, there are other color spaces that can similarly be used to mix all the possible colors (as defined by the human retina).
Note that RGB screens typically cannot reproduce all colors. The image below shows the triangle of limitation on a typical screen. Professional screens tend to cover more, but rarely all colors.
The other answers by bernander and Emilio Pisanty already explain how eyes capture light and transform it into electrical impulses. There are a few more things to understand. My answer will focus mostly on question 1, as question 2 is already fully covered.
Light is a combination of multiple wavelengths
Any light is actually an electromagnetic wave (I'm oversimplifying here, but otherwise we'll not get anywhere). The trouble is that hardly any source of light produces just one wavelength (lasers do). So essentially light is a combination of many different wavelengths. To see this you need a prism, which splits the light beam into its separate wavelengths. This is essentially why we see a rainbow: water drops work as natural prisms, and sunlight is pretty much a combination of (almost) all visible wavelengths.
If you use more than one source of light, then at each wavelength you get the sum of the light arriving at that wavelength from each of the sources. In other words, imagine three lasers, red, green and blue, each producing exactly one wavelength. If we intersect their beams at one point and put a screen there, we get a single point lit with those three wavelengths at the same time. We will not see three colours there; it will be just one spot with one colour. What colour will it be? I'll get back to that later.
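The "sum at each wavelength" idea can be written down in a few lines of Python. This is only a sketch: the laser wavelengths and powers are made-up example values, and a spectrum is represented as a plain dictionary from wavelength (in nm) to relative power:

```python
def combine_spectra(*sources):
    """Add spectra given as {wavelength_nm: power} dicts, wavelength by wavelength."""
    total = {}
    for spectrum in sources:
        for wavelength_nm, power in spectrum.items():
            total[wavelength_nm] = total.get(wavelength_nm, 0.0) + power
    return total


# Hypothetical single-line sources (values chosen only for illustration).
red_laser = {650: 1.0}
green_laser = {532: 0.8}
blue_laser = {450: 0.6}
second_green_laser = {532: 0.2}

spot = combine_spectra(red_laser, green_laser, blue_laser, second_green_laser)
print(spot)
```

The spot on the screen carries one combined spectrum: three lines here, with the two green contributions added together.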
Eye receptors only capture whether there is light (and its strength)
This is tricky. There are basically 4 types of receptors in the eye's retina. One type (rods) responds to any visible(1) wavelength, and three types (cones) each respond to light within just part of the visible wavelength range. A cone reacts most strongly to light close to its optimum wavelength (which depends on the cone type: red, green or blue, as already explained by others), and the farther the wavelength is from this optimum, the weaker the reaction. I'll ignore the rods, since they are used mostly when there is not enough light for the three cone types to operate (that's why we see everything in shades of grey in very dim light).
A receptor cannot tell which wavelength it has captured. Whether it gets a weak beam at its optimal wavelength or a strong beam at the edge of what it can notice, a single receptor registers pretty much the same amount of light and produces the same impulse for the brain.
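A minimal Python sketch of this point, assuming a single hypothetical cone whose sensitivity is a Gaussian around an assumed optimum (both the shape and the numbers are illustrative, not measured): a weak beam at the optimum and a suitably stronger beam far from it produce exactly the same response.

```python
import math

PEAK_NM = 540.0   # assumed optimum wavelength of a hypothetical cone
WIDTH_NM = 40.0   # assumed width of its sensitivity curve


def sensitivity(wavelength_nm):
    """Relative sensitivity of the cone at one wavelength (1.0 at the optimum)."""
    return math.exp(-0.5 * ((wavelength_nm - PEAK_NM) / WIDTH_NM) ** 2)


def response(wavelength_nm, intensity):
    """The only thing the cone reports: intensity weighted by sensitivity."""
    return intensity * sensitivity(wavelength_nm)


def matching_intensity(wavelength_nm, reference_response):
    """Intensity needed at another wavelength to give the same response."""
    return reference_response / sensitivity(wavelength_nm)


weak_at_peak = response(540.0, 1.0)
needed = matching_intensity(620.0, weak_at_peak)
strong_off_peak = response(620.0, needed)

print(weak_at_peak, strong_off_peak, needed)
```

From the cone's output alone the two beams cannot be told apart; only comparing several cone types can separate wavelength from intensity.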
It's the brain that decides what to do with the information
This is the trickiest part. Very, VERY tricky. The thing is that the brain receives impulses from the different eye receptors and combines them. Based on what it has learned in the past (aka experience), it presents to your consciousness something known as a colour.
If you look at single-wavelength light, your cones react in a specific way. That way your mind can learn those colours (from a rainbow!). Now if a combination of many wavelengths produces a similar cone reaction, the mind cannot tell that there were multiple wavelengths, and it just shows you the colour it knows from the single-wavelength light that produces the same cone reaction. So if the combination of signals coming from the eye receptors shows there is some red and green light (i.e. those two types of cones produce a strong signal when exposed to the light) but not much blue, then your mind interprets this as something you know as yellow. Note that it doesn't matter whether the light was a single yellow-wavelength beam, one strong red wavelength combined with one strong green wavelength, or a combination of many wavelengths that made both the green and red cones react. Your mind has just 3 signals, and based on those it has to tell what colour it is.
So if you properly balance the three laser beams mentioned earlier you might end up with a white dot, but you may also end up with, e.g., a yellow dot. Or a brown dot. It all depends on how the cones react to each of the wavelengths used and how strong those reactions are.
And that's pretty much how RGB works
What is tricky here is that some combinations of wavelengths produce a combination of cone responses that differs from the response to any single-wavelength light. Your mind still has to interpret it somehow, so it presents it to you as something different from any colour that exists from the physical (single-wavelength) perspective. That is how we can see colours like brown or grey.
What about that experience
As already mentioned, the basic idea is that the colour you see is related to previous experience: if the cones' reaction to a combination of multiple wavelengths is similar to their reaction to a known single-wavelength colour, you will see that colour. If not, you will see something else (but again in a repeatable fashion(2); read on).
You can find several optical illusions related to colours or shades of grey. One famous recent example from the internet was a photo of a dress that some people interpreted as blue and black in strong light, while others saw it as white and gold in shade. If you go into the woods in very dim light, you'll see the leaves as faintly green even though your cones do not get enough light to work and all you actually register is that there is a bit of light (so some grey). Yet your mind knows that leaves should be green, so it sort of paints them for you. If you come back later in full light, you might find that some of those green leaves are actually red or yellow. Your mind tried its best to fill in the gap and used experience to add a colour. Things get even messier when the light is not white: the mind still uses experience and adapts to the light (to some extent), so green will still look green in the red light of a sunset.
So why does RGB work?
Simply speaking, the light from each of the light sources causes a specific (and to some extent predictable) reaction of the cones, as described above. Since the three sources together can produce most of the possible cone reactions, you can see most colours on a TV/monitor screen.
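On the screen side, the mixing itself is just adding light per channel. The sketch below is a deliberately simplified Python illustration using 8-bit RGB triplets; it ignores gamma encoding and the actual spectra of the display's sub-pixels, and the function name is made up for the example:

```python
def add_light(*colours):
    """Per-channel sum of 8-bit RGB triplets, clipped at the display maximum."""
    return tuple(min(255, sum(channel)) for channel in zip(*colours))


RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)

yellow = add_light(RED, GREEN)            # strong red + strong green, no blue
white = add_light(RED, GREEN, BLUE)       # all three sources at full strength
dim_orange = add_light((128, 0, 0), (0, 64, 0))  # weaker red, even weaker green

print(yellow, white, dim_orange)
```

Changing only the relative strengths of the three sources moves the result between white, yellow and darker orange-brown tones, which is the balancing act described for the three lasers.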
TL;DR
What you see is a combination of what light reaches your eyes, how the eyes turn it into electrical impulses that reach the brain, and how the brain interprets them based on earlier experience.
(1) We call it visible because our eye receptors are capable of noticing it, so it should maybe say "some wavelength range that we call visible". Again, there is a bit of simplification: cones might have slightly broader wavelength coverage than rods. This might also vary slightly between humans, but those differences can be disregarded. On the other hand, other species respond to different wavelength ranges; e.g. dogs have just two types of cones, so they essentially see fewer colours.
(2) It is also interpreted so that colours that produce only slightly different cone reactions seem quite similar (shades).