Why do we need random variables?
An honest answer should start with the fact that probabilists usually care more about the distributions of random variables than about the underlying probability spaces. Terry Tao has a blog post in which he argues that probabilistic concepts are those that are invariant under extending the underlying probability space. Many standard probability concepts, such as expectations and variances, depend only on the distributions of random variables, and in principle one could state the strong law of large numbers as a result about infinite product measures.
From a didactic point of view, though, starting with distributions is odd. If we are interested in the average height of the population of the Netherlands, we can start with the distribution of heights, but motivating the concept requires us to think of these as the heights of actual people, and making this formal requires us to reintroduce the sample space of people in the Netherlands.
When it comes to conditioning, we would have to introduce all the variables we might want to condition on through one huge joint distribution. In many applications, the joint distribution will be supported on the graph of a function, and we might as well treat this function as a random variable to begin with.
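The remark about a joint distribution supported on the graph of a function can be made concrete with a toy sketch (the function $f$ and the uniform distribution below are illustrative assumptions, not part of the answer): whenever $Y = f(X)$, every realization of the pair $(X, Y)$ lands on the graph of $f$.

```python
import random

# Illustrative sketch: if Y = f(X), the joint distribution of (X, Y)
# is supported on the graph of f.
f = lambda x: x ** 2                   # an arbitrary example function
graph = [(x, f(x)) for x in range(4)]  # the graph of f on {0, 1, 2, 3}

x = random.randrange(4)                # X uniform on {0, 1, 2, 3}
pair = (x, f(x))                       # the observed value of (X, Y)
assert pair in graph                   # every realization lies on the graph of f
```

So instead of carrying around a two-dimensional joint distribution, one may as well introduce $f(X)$ directly as a random variable on the same space as $X$.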
On a more advanced level, there are methods of proof that are based on an auxiliary underlying probability space. For example, Skorokhod's representation theorem allows us to study weak convergence, something we care about a lot when working with distributions, in terms of almost sure convergence on an auxiliary probability space.
An area well beyond basic probability in which the underlying probability space cannot be dispensed with is the theory of adapted stochastic processes in continuous time. The filtrations representing information are not captured by the distributions of the sample paths. There have been some attempts to define a distribution for adapted processes in a way that preserves the relevant information; the most convincing version can be found in the paper Adapted probability distributions by Hoover and Keisler (see also this book). The resulting notion is very involved and draws on ideas from model theory unfamiliar to most probabilists. In any case, it has not been widely adopted (no pun intended) in the probability literature.
Although in principle the sample space, with its $\sigma$-algebra and probability measure, comes first, things are not always so neat in real life. In applications it is often the random variables (some numerical quantities that you are interested in) that are most important, and the sample space is just scaffolding set up to support them. In fact, this is one of the main things that distinguishes probability theory from measure theory. There is a nice discussion of this in D.H. Fremlin, Measure Theory, Volume 2, Ch. 27.
One of your concerns is (let me quote from your question)
Often I read that there is the possibility of having a family $X_1,\dots,X_n$ of random variables on the same space. I know no example—and would be happy to discover—of a problem truly modelled by this, whereas in most examples that I read there is either a single random variable …
Here is what I do on the first day of my probability class.
The statistical experiment I describe is: go to the road outside the college building and consider the first car that passes from left to right after your arrival. As we do not know (and cannot predict) which car in the city might be there, it is a statistical experiment. The sample space is the set of all cars in your city (or in your country).
Questions:
How many people are in that car?
What is the amount of petrol in the fuel tank at that time?
How many kilometers had the car travelled that day before you noticed it?
What is the wavelength of the color of the car? (admittedly artificial)
All these are random variables on the same sample space.
The answer to question 1 might be useful to a person who sells eatables on the roadside (more passengers means more business).
The answer to question 2 might help decide whether it would be profitable to open a petrol-selling shop there.
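A minimal sketch of this setup (with made-up numbers): the sample space is a finite set of cars, each random variable is simply a function on that set, and a single random outcome determines the values of all the variables at once.

```python
import random

# Hypothetical sample space: a finite set of cars, each with some attributes.
sample_space = [
    {"id": "car1", "passengers": 1, "petrol_litres": 30.0, "km_today": 12.0},
    {"id": "car2", "passengers": 4, "petrol_litres": 5.5,  "km_today": 80.0},
    {"id": "car3", "passengers": 2, "petrol_litres": 18.0, "km_today": 3.5},
]

# Three random variables defined on the SAME sample space:
# each is just a function from outcomes (cars) to numbers.
def passengers(car):   return car["passengers"]
def petrol(car):       return car["petrol_litres"]
def km_travelled(car): return car["km_today"]

# The experiment: one car is observed at random (uniformly, for simplicity).
# One outcome simultaneously fixes the values of all three variables.
outcome = random.choice(sample_space)
values = (passengers(outcome), petrol(outcome), km_travelled(outcome))

# An expectation can be computed by averaging the random variable
# over the (uniform) sample space.
expected_passengers = sum(passengers(c) for c in sample_space) / len(sample_space)
```

The point of the sketch is that "a family of random variables on the same space" is nothing exotic: it just means several functions defined on one set of outcomes, all evaluated at the same observed outcome.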
I ask students to come up with examples of such statistical experiments instead of coin-tossing and dice-throwing ones.
I got this from a bright student:
Go to the library and observe the first book that is borrowed by a user that day. The sample space is the set of all books in the library.
Random variables: the number of pages of that book, the price of that book, and how many times it has been borrowed before.