GHz rate single photon counting
A scheme used by several current particle physics detectors can almost certainly be made to work (though it generally involves custom high-speed electronics, which are pretty expensive; perhaps a small system can get away with just a good FPGA...).
The basic scheme is to continuously digitize the output of the primary detectors (PMTs or whatever) into a circular buffer of circular buffers. The two instances of this system I've worked with used ADC sample widths of $8$--$32\,\mathrm{ns}$, but there is nothing special about that: you could get down to around $1\,\mathrm{ns}$ pretty easily, and circa $0.1\,\mathrm{ns}$ should be possible.
At the electronics level the primary signal is pre-amplified (if needed or desired; often the primary detector gain is sufficient) and split to (at least) the trigger and the digital electronics.
The digital electronics are backed by $N$ circular buffers of $M$ samples each. Each buffer also maintains pointers to the start and end of the recently written data. At any given time the system is working on sample $m \in [0,M)$ of buffer $n \in [0,N)$; the sample is written and the working sample is advanced $m := (m + 1) \bmod M$. In the event that no trigger occurs the system is allowed to continuously overwrite "uninteresting" data as $m$ cycles through the whole range.
When a trigger occurs, the system advances the buffer $n := (n+1) \bmod N$ so that the most recent buffer(s) will not be overwritten.
The data acquisition system can then read out the latched buffers as time is available and reconstruct the "interesting" parts of the signal. (If you need to know what the "uninteresting" parts of the signal look like, you can always generate a false trigger to latch the "nothing"; this is called a "minimum bias" or "random" trigger, and you generally do need one.)
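A minimal software sketch of the bookkeeping described above, in Python purely for illustration (a real system does this in hardware or FPGA firmware). The class and method names are hypothetical. Two details are my own assumptions rather than part of the scheme as stated: a trigger is refused when no free buffer remains, and each latched buffer records how many samples were written since it became active, so that stale samples are dropped at readout.

```python
class MultiBuffer:
    """N circular buffers of M samples each (illustrative names only)."""

    def __init__(self, n_buffers, n_samples):
        self.N = n_buffers
        self.M = n_samples
        self.data = [[0] * n_samples for _ in range(n_buffers)]
        self.n = 0          # buffer currently being written
        self.m = 0          # next sample slot in that buffer
        self.filled = 0     # samples written since this buffer became active
        self.latched = []   # FIFO of (buffer, oldest slot, valid count)

    def write(self, sample):
        """Store one digitized sample and advance m := (m + 1) mod M."""
        self.data[self.n][self.m] = sample
        self.m = (self.m + 1) % self.M
        self.filled = min(self.filled + 1, self.M)

    def trigger(self):
        """Latch the current buffer and advance n := (n + 1) mod N.

        Returns False (dead time) if every other buffer still awaits readout.
        """
        if len(self.latched) == self.N - 1:
            return False
        self.latched.append((self.n, self.m, self.filled))
        self.n = (self.n + 1) % self.N
        self.m = 0
        self.filled = 0
        return True

    def read_out(self):
        """Return the oldest latched buffer in time order, or None."""
        if not self.latched:
            return None
        n, start, valid = self.latched.pop(0)
        buf = self.data[n]
        ordered = buf[start:] + buf[:start]   # oldest sample first
        return ordered[self.M - valid:]       # drop stale (unwritten) slots
```

Reading out a latched buffer frees it for reuse, so the readout rate only has to keep up with the average trigger rate, not the sample rate.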
The size of the individual buffers is chosen to ensure that the whole signal fits in a single latched window. The number of buffers you need depends on the expected rate and the readout latency. You need some scheme to deal with triggers that come so close together that the "next" buffer still contains stale data (only partially overwritten), and other issues that I'm sure you can see for yourself if you think about it.
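As a rough feel for how the number of buffers scales with rate and latency, here is a crude estimate in Python. It assumes triggers arrive as a Poisson process and simply asks how likely it is that $N$ or more triggers land within one readout latency; that is a simplification of mine (a real design would model the readout queue properly), and the function names are hypothetical.

```python
import math


def overflow_probability(trigger_rate_hz, readout_latency_s, n_buffers):
    """P(at least n_buffers triggers within one readout latency), Poisson."""
    mu = trigger_rate_hz * readout_latency_s   # mean triggers per latency
    cumulative = 0.0
    term = math.exp(-mu)                       # Poisson P(k = 0)
    for k in range(n_buffers):
        cumulative += term
        term *= mu / (k + 1)                   # P(k + 1) from P(k)
    return max(0.0, 1.0 - cumulative)


def min_buffers(trigger_rate_hz, readout_latency_s, max_loss):
    """Smallest buffer count whose overflow probability is below max_loss."""
    n = 1
    while overflow_probability(trigger_rate_hz, readout_latency_s, n) >= max_loss:
        n += 1
    return n
```

For example, `min_buffers(1e3, 1e-3, 1e-3)` asks how many buffers keep this estimated loss below one in a thousand at a 1 kHz trigger rate with 1 ms readout latency.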
This does not necessarily count photons; it allows you to approximately reconstruct the analog signal from the detector with a time granularity on the order of the sample width. So you can't necessarily tell the difference between, say, two green photons in close coincidence and one near-UV photon, but this is often good enough.
I suspect that high-speed capturing oscilloscopes do something similar internally.
As of February 2016 there are two ways to actually count photons at more than GHz rates that are affordable and technically sound. This reflects real advances in the technology, not over-specsmanship bluster. Hamamatsu makes a Hybrid tube, the R10467U-40, with 45% QE in the visible and the ability to count photons at multiple GHz rates. This has been accomplished for LIDAR using a doubled YAG laser at NPS. LIDAR signals are a combination of exponential decay and $1/R^2$ falloff, and the Hybrid tube is excellent for decreasing signals. This is similar to fluorescence decay but more difficult because of the large dynamic range and extremely varying signal. Hamamatsu has another tube that also allows sustained GHz counting and would work, but it is not as well adapted to this application.
The last page of the site linked below, under About Photon Counting, has literature that compares the strengths and weaknesses of various detectors. Manufacturers tend not to point out the weaknesses of their detectors, but this article does, based on 40 years of experience.
http://www.photoncounting.net/
A new digital-output interface for the Hybrid tube has been developed and tested recently. The photon counter in the link can do 250 ps bins by using all four channels, or one can pay 10x as much and get slightly better bin resolution. Two of the amplifiers in the link would allow 2 GHz photon counting.
Unlike electron multipliers, the Hybrid tube can recover from a large signal in a nanosecond with no decay tail. Measuring fluorescence lifetimes accurately is best served by collecting large numbers of photons rather than by measuring their arrival times to a few picoseconds, and 10 MHz photon counting would have difficulty doing this. The GHz photon counting technique is a vast improvement over the 20,000 rpm ultra-centrifuge light chopper technique I used to measure sub-nanosecond decay lifetimes with a PMT in the dark ages.
Total cost for a GHz system, assuming one already has a digital oscilloscope, should be less than four thousand dollars. Most companies no longer publish prices for their photon counters. Two companies that make ultra-fast systems are:
http://www.fastcomtec.com/products/ultra-fast-photon-counters.html
and
http://www.becker-hickl.de/
These photon counters tend to be expensive: 5,000 to 15,000 EUR. The excellent literature at Becker & Hickl describes measurements of the Hybrid tube and techniques for fluorescence decay.
There are now affordable digital logic chips available with 30 picosecond rise times and toggle rates over 10 GHz. It is time for an update to this question.