Why does LIGO do blind data injections but not the LHC?
After they told me about their impressive "LHC Olympics", in which physicists (often hardcore theorists) were reverse engineering a particle physics model from raw (but fake) LHC data, I proposed the same idea in a circle of physicists at Harvard, including Nima Arkani-Hamed, sometime in 2005, and we worked on those LHC ideas in some detail. We were thinking how amusing it would be to inject some signs of extra dimensions and lots of other things. We also acknowledged the extra excitement that it could bring to the particle physics community.
The main reason why this "drill" probably isn't as important for the LHC as it was for LIGO is that particle physicists – experimenters and phenomenologists – are doing lots of similar exercises anyway, even if they're not told that "it is real (but fake) data from the LHC". Phenomenologists preemptively think about lots of "possible signals" and so on. They don't need extra "training" of this kind.
Moreover, LIGO detects boring noise almost all the time, so if some of this noise is overwritten, LIGO doesn't lose much valuable data. However, even though the LHC is expected to produce Standard-Model-like processes all the time, their structure is more complex than some nameless "noise". By overwriting the real data with something contaminated by a fake signal, one could genuinely spoil the data for many analyses. Real work by many people, work that takes a lot of time, could become useless, and that is too much to ask.
Here, the difference really is that LIGO was pretty sure that it wouldn't get any real signal around 2010. So the physicists in LIGO didn't have anything of the kind to work on, and in order not to lose their skills, a "drill" was a good idea. On the other hand, the LHC is analyzing real data from previously untested energies such as 13 TeV, and there is a significant probability that they discover something even without injections. So the injections are not needed – people work hard on interesting, structured data anyway.
A related difference is that the strength of the LIGO signal builds up quickly, during the roughly 0.2 seconds that the black hole merger took. On the other hand, the strength of an LHC signal builds up over a whole year or more. If all the interesting new-physics events at the LHC took place too quickly (within a day) and then disappeared, the experimenters would see that something is suspicious. The LHC would have to contaminate the data throughout the whole run, and it wouldn't be clear how strong the contamination per unit time of the drill should be. An LHC signal always gets stronger as one records more collisions – but a single event detected by LIGO can't be "strengthened" by such waiting. So the LIGO drill is a well-defined campaign that takes some finite time, while an LHC drill would be a campaign of "undetermined duration".
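Just to make the contrast quantitative with a back-of-the-envelope counting argument (my own illustration, with generic symbols rather than anything LHC-specific): if a new process has cross section $\sigma_S$ and the relevant background $\sigma_B$, then after integrated luminosity $L$ one expects $S = \sigma_S L$ signal and $B = \sigma_B L$ background events, so the naive significance

$$ Z \approx \frac{S}{\sqrt{B}} = \frac{\sigma_S L}{\sqrt{\sigma_B L}} = \frac{\sigma_S}{\sqrt{\sigma_B}}\,\sqrt{L} $$

only grows with the square root of the accumulated luminosity. An injected LHC signal would therefore have to be sprinkled in at a believable rate over a whole run, whereas a LIGO chirp delivers all of its signal-to-noise in a fraction of a second and waiting longer adds nothing.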
As CuriousOne basically said, but I will say it differently: there are also many more possible discoveries at the LHC. So inventing one particular "fake signal" could be very problematic – what is the best signal to inject? The LIGO case was very different. The fake 2010 signal was actually a black hole merger extremely similar to the actual 2015–2016 discovery. So there was basically "a single most likely first discovery" – a scenario as unique and specific as a fire in a skyscraper – and a particular drill for that scenario made some sense.
Let me first mention that the LHC is in a way a textbook experiment: you have very good control over the experimental conditions and you can repeat your experiment as often as you like. You have, in a way, full control over the signal. Results are reproducible in the sense that you can just redo the experiment. LIGO is "just" a detector: in particular, you have absolutely no control over the signal. This makes the two experiments very different, and what is interesting for one experiment might not be interesting for the other.
Here are a few reasons, as far as I can see, why this is not really feasible for the LHC:
LIGO depends on single events; the LHC does not. If the LHC finds something, it is always based on many rounds of data-taking and billions of collisions to get the required statistics. If LIGO finds something, it is based on one signal lasting a fraction of a second. That means that in order to fake an LHC signal, you have to manipulate the data for months, while in order to fake a LIGO signal, you need to manipulate maybe a second of the data set. In addition, if you manipulate months of data, there is a good chance you have also manipulated good data that would have led to a significant discovery.
The LHC signal is particles bumping together, with the products immediately detected by a vast number of very different detectors in two experiments (ATLAS and CMS). While this can be simulated using Monte Carlo, as pointed out by CuriousOne, it is still much easier for LIGO: LIGO is "just" a Michelson interferometer, so in order to fake a signal, you wiggle the mirrors, because that changes the path length of the laser, which is all you ever measure (this is described in your article; a toy sketch follows right after these points).
As CuriousOne said: the LHC detects a lot of stuff that is well known, but what we are really interested in is the stuff that we have no idea what it should look like (well, not quite: many people have many ideas, but nobody agrees, and even with those ideas it's not really clear what the exact signal will look like). In contrast, we know pretty well what we are looking for with LIGO.
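To make the "wiggle the mirrors" point above concrete, here is a toy sketch (my own illustration with invented numbers, not LIGO's actual hardware-injection machinery): because the detector output is essentially a single strain time series, an injection is just a known waveform added to that series, and a matched filter against the template recovers it exactly as it would recover a real event.

```python
import numpy as np

# Toy hardware injection (not LIGO's real pipeline): the detector output is one
# strain time series, so an injection is just a known waveform added to it.
rng = np.random.default_rng(0)
fs = 4096                         # samples per second (a typical order of magnitude)
t = np.arange(0, 1.0, 1.0 / fs)   # one second of fake data

noise = rng.normal(scale=1.0, size=t.size)       # stand-in for detector noise

# A crude "chirp": frequency sweeps up and the amplitude grows toward the merger.
f0, f1 = 35.0, 250.0
freq = f0 + (f1 - f0) * (t / t[-1]) ** 3
phase = 2.0 * np.pi * np.cumsum(freq) / fs
chirp = 0.5 * (t / t[-1]) ** 2 * np.sin(phase)

injected = noise + chirp          # what the analysts (and the drill) would see

# A matched filter against the known template pulls the injection out of the noise.
snr = injected @ chirp / np.sqrt(chirp @ chirp)
print(f"matched-filter SNR of the injected chirp: {snr:.1f}")
```

The point of the sketch is only that a single, simple data channel makes the injection and its later recovery conceptually trivial; nothing comparable exists for a compound collider detector.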
Fake event injection is only one of several schemes for "blind" analysis. Other blinding schemes involve manipulating some parameter of the data as shown to the analysis team by a reversible transformation of some kind, multiple independent analyses, and complete analysis dry runs on simulated data.
The thing to understand is what purposes are served by doing these things.
Fake event injection
It works best when the output of a detector is simple (in the case of LIGO it is basically a single time series for each of the interferometers) and the expected signal is reasonably well understood, and it is of most use when real events are rare. It serves to rehearse and test the process that will be used on the observation of a real event.
KamLAND got around one real event per day, so detections weren't very rare, but individual detections were noted by the shift crew in the early days of the experiment. By the time I joined the experiment, they had an 'online event detection' routine that triggered a couple of times a shift and served to keep you on your toes. This wasn't fake data, but a coarse filter. Nonetheless it meant that shift takers got to exercise their response to a data event regularly.
The nature of the data at a large compound detector such as those at the LHC is very different. For the processes of interest, signals are not discrete but are built up from a portfolio of events and always have a non-trivial background. Fake signals and their attendant backgrounds have to be generated by large-scale Monte Carlo simulations and joined into a fake data stream, then picked apart again to validate a proposed analysis, a process which goes on all the time, but off-line.
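As a cartoon of that "generate, join, pick apart" cycle (a deliberately tiny toy of my own, with made-up, loosely Higgs-like numbers, nothing like the real ATLAS/CMS tool chain), one can fake a falling invariant-mass spectrum, inject a bump, merge the two into one stream, and check that a fit recovers the injected peak:

```python
import numpy as np
from scipy.optimize import curve_fit

# Toy version of "generate fake signal + background, join them, pick them apart".
rng = np.random.default_rng(1)

# Fake "background": a falling invariant-mass spectrum; fake "signal": a bump.
background = rng.exponential(scale=50.0, size=200_000) + 60.0   # GeV-ish, made up
signal = rng.normal(loc=125.0, scale=2.0, size=600)             # a bump near 125
stream = rng.permutation(np.concatenate([background, signal]))  # joined data stream

# "Pick it apart": fit exponential + Gaussian to the binned spectrum.
counts, edges = np.histogram(stream, bins=120, range=(60.0, 180.0))
centers = 0.5 * (edges[:-1] + edges[1:])

def model(m, norm, slope, amp, mass, width):
    bkg = norm * np.exp(-(m - 60.0) / slope)
    sig = amp * np.exp(-0.5 * ((m - mass) / width) ** 2)
    return bkg + sig

p0 = [counts.max(), 50.0, 50.0, 125.0, 2.0]
popt, _ = curve_fit(model, centers, counts, p0=p0)
print(f"fitted bump: position ~ {popt[3]:.1f}, peak amplitude ~ {popt[2]:.0f}")
```

Even this toy needs a model for both the signal and the background shapes, which is part of why the collider version of the exercise is a large, permanent, off-line effort rather than a surprise drill.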
Reversible transformation of the data.
The main advantage of doing a "blind analysis" of this kind is that it prevents the analysis team from making decisions about how to set the cuts based on a bias (presumably unconscious, though it works against some malicious manipulation too) about how the results "should" come out.
The $G^0$ proton weak form-factor experiment at JLAB, for instance, used a multiplicative scale (stored off-line in a secure location and known only to a few senior members of the collaboration not involved in the analysis) applied to the instantaneous asymmetry. In this case the main reportable result of the experiment was going to be the size of this asymmetry, so the manipulation prevented the analysis from being optimized to get a preferred result.
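A minimal sketch of how such a blinding factor works in practice (my own toy numbers and variable names, not the actual $G^0$ code or values):

```python
import numpy as np

# Blinding by a hidden multiplicative scale: a secret factor is applied to the
# asymmetry the analysts see, so cuts cannot be tuned toward a preferred answer.
rng = np.random.default_rng(2)

true_asymmetry = -3.0e-6                                          # made-up value
measurements = rng.normal(true_asymmetry, 2.0e-5, size=100_000)   # noisy runs

SECRET_SCALE = rng.uniform(0.75, 1.25)   # kept off-line, known to a few people

blinded = SECRET_SCALE * measurements    # what the analysis team works with

# ... cuts, corrections, and systematic studies are developed on `blinded` only ...
blinded_result = blinded.mean()

# Unblinding happens exactly once, after the analysis is frozen.
final_result = blinded_result / SECRET_SCALE
print(f"blinded: {blinded_result:.2e}   unblinded: {final_result:.2e}")
```

Because the transformation is known and reversible, nothing is lost: the analysis is frozen on the blinded numbers, and the secret factor is divided out once at the very end.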
Multiple separate analyses
Here two or more teams work independently on the data from the ground up, and comparisons between their results are made only occasionally and in a public setting. The notion is that each team will have to grapple with the same problems and will, by virtue of doing so separately, sometimes solve them in different ways. If the results of the analysis are robust in the face of slightly different handling of the data, you can have more confidence in them; on the other hand, if the teams disagree, they are asked to act as advocates for their own point of view in the face of scrutiny from both the rest of the collaboration and the other teams until the differences in results are resolved. I've seen this used by design on $G^0$, KamLAND, and Double Chooz, and it arises naturally on almost any big project simply because the areas of interest of the various working groups overlap.
As I've noted before, CMS and ATLAS comprise a sort of super version of this process, where even the details of their detectors differ. That's why their combined Higgs discovery announcement was more convincing than a single announcement could have been with similar statistics.
Off-line Monte-Carlo challenge.
In this scheme the analysis team, or subsets thereof, are presented with a completely fake data stream that is constructed to have all the expected signals and backgrounds (and perhaps some 'special' data) and asked to tease apart the sizes of the various contributions. This is a dry run for the full analysis of the data, made against a working set that is completely understood by a part of the collaboration that is not on the analysis teams. I've seen this done on a large scale for Double Chooz and MicroBooNE.
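A stripped-down version of such a challenge might look like the following (my own toy, with invented templates and yields): a small group fixes secret component yields, builds a Poisson-fluctuated fake spectrum, and the analysis teams have to recover the yields from the summed stream.

```python
import numpy as np

# Toy Monte-Carlo challenge: component shapes are public, true yields are secret,
# and the teams must tease the contributions apart from the summed "data".
rng = np.random.default_rng(3)
bins = np.linspace(0.0, 10.0, 51)
centers = 0.5 * (bins[:-1] + bins[1:])

# Two "known" component templates (normalized shapes).
template_a = np.exp(-centers / 3.0)
template_a /= template_a.sum()
template_b = np.exp(-0.5 * ((centers - 6.0) / 1.0) ** 2)
template_b /= template_b.sum()

TRUE_YIELDS = np.array([40_000.0, 5_000.0])   # kept secret from the analysers
expected = TRUE_YIELDS[0] * template_a + TRUE_YIELDS[1] * template_b
challenge = rng.poisson(expected)             # the fake data handed to the teams

# The teams' job: recover the yields, here by a simple template least-squares fit.
design = np.column_stack([template_a, template_b])
fitted, *_ = np.linalg.lstsq(design, challenge.astype(float), rcond=None)
print("recovered yields:", fitted.round(0), " true yields:", TRUE_YIELDS)
```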