Could Google scan books without opening them?

The proposal probably refers to terahertz (THz) imaging. The THz band lies above microwaves and below infrared, at wavelengths between 100 µm and 1 mm; the radiation is often referred to as T-Rays.

Because of their relatively long wavelength, T-Rays penetrate well into non-conductive materials, and their photon energy is far, far below the ionization threshold. Thus they are safe to work with, and can be used to see inside of stuff.
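To put rough numbers on "far below the ionization threshold", here is a minimal sketch that converts the two band-edge wavelengths quoted above into frequency and photon energy. The function names are my own, and the 13.6 eV comparison value (the ionization energy of hydrogen) is just a convenient reference point, not something from the proposal.

```python
# Physical constants (exact SI values).
C = 299_792_458.0        # speed of light, m/s
H = 6.62607015e-34       # Planck constant, J*s
EV = 1.602176634e-19     # joules per electronvolt

# Reference point chosen for illustration: ionization energy of hydrogen.
HYDROGEN_IONIZATION_EV = 13.6


def frequency_thz(wavelength_m):
    """Frequency in THz for a given vacuum wavelength in metres (f = c / lambda)."""
    return C / wavelength_m / 1e12


def photon_energy_ev(wavelength_m):
    """Photon energy in eV for a given vacuum wavelength in metres (E = h c / lambda)."""
    return H * C / wavelength_m / EV


for label, wavelength_m in [("100 um", 100e-6), ("1 mm", 1e-3)]:
    f = frequency_thz(wavelength_m)
    e = photon_energy_ev(wavelength_m)
    ratio = HYDROGEN_IONIZATION_EV / e
    print(f"{label}: {f:.3f} THz, {e * 1000:.2f} meV, "
          f"about {ratio:.0f}x below 13.6 eV")
```

The band edges come out at roughly 3 THz and 0.3 THz, with photon energies of about 12 meV and 1.2 meV, so three to four orders of magnitude below a typical ionization energy.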

It's a relatively new field of application because generating THz radiation depends on laser techniques; the frequencies are too high for conventional electronics, but new techniques are always under development.

The review "Terahertz and Cultural Heritage Science: Examination of Art and Archaeology" goes into great detail, beginning with the analysis of hidden layers in art, where THz imaging is used to explore underneath the visible surface.

In section 4.1 the authors give examples of scanning through multiple pages at the same time:

4.1. Historical Documents

Often, historical documents on papyrus as well as on parchment and on paper cannot be read for a number of reasons. Sometimes the fragile sheets just cannot be separated because they are stuck together as a result of deterioration and damages. In other cases, the sheets have been reused as supports or covers for newer documents. There is considerable interest in reading this hidden information and a number of techniques have been tested to pursue this goal while preserving the documents. So far, X-ray computed tomography [66] has been the most successful method. Since THz-TD imaging was already evaluated for the inspection of postal envelopes [67], it was also tested with encouraging results for stacked papyrus layers written with carbon black ink [68]. Recently, even more successful results were obtained with a new and more sophisticated THz method called tomosynthesis [69] which was applied to image pencil writing on a stack of 50 paper sheets. THz tomography has been applied to resolve text on both sides of a single papyrus sheet [70,71]. THz was tested also for those cases where writing is obscured by stains and other inks in old parchment manuscripts [72] and it seems successful to characterize and evaluate conservation of iron gall inks [73] and parchment [74].

So with a properly calibrated T-Ray camera and tomographic software for image reconstruction, one can indeed scan through the pages of a book, top to bottom, without ever opening it, and without any hazard for the librarian, just so long as the book isn't too thick.
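The quoted section mentions THz-TD (time-domain) imaging. As a rough illustration of why a pulsed system can tell stacked pages apart, and not the actual method of any of the cited papers, the sketch below estimates how far apart in time the echoes from the front and back of one sheet arrive. The page thickness (100 µm) and the refractive index of paper at THz frequencies (1.5) are assumed round numbers, and the function names are hypothetical.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

# Assumed values for illustration only, not taken from the review.
PAGE_THICKNESS_M = 100e-6   # one sheet of paper, roughly 100 um
PAPER_INDEX = 1.5           # rough refractive index of paper in the THz band


def round_trip_delay_ps(thickness_m, refractive_index):
    """Extra delay (ps) of a pulse reflected from the back of a layer
    compared with one reflected from its front: dt = 2 * n * d / c."""
    return 2.0 * refractive_index * thickness_m / C * 1e12


def depth_from_delay_um(delay_ps, refractive_index):
    """Invert the relation above: layer depth (um) from a measured delay (ps)."""
    return delay_ps * 1e-12 * C / (2.0 * refractive_index) * 1e6


delay_ps = round_trip_delay_ps(PAGE_THICKNESS_M, PAPER_INDEX)
print(f"Echo spacing per page: {delay_ps:.2f} ps")
print(f"Depth recovered from that delay: "
      f"{depth_from_delay_um(delay_ps, PAPER_INDEX):.1f} um")
```

Under these assumptions the echoes from successive pages are spaced about a picosecond apart, so the system needs pulses and timing resolution on that scale. Each page also absorbs and scatters part of the signal, which is one reason the book can't be too thick.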

Update: Demonstration of reading through nine pages - MIT News: Judging a book through its cover.


Looking into this, the first thing that came up was a technology that uses infrared light to determine the curvature of a book's pages so the book can be scanned non-destructively. In normal scanning the pages need to be flat; before Google's idea this was only possible by pressing the pages under glass plates, which was inefficient, or by unbinding the book, which destroyed it. This was patented as patent 7508978. A nice, concise explanation of this is available here. A diagram of this is shown below:

[Image: Google book scanner]

Now, for the X-ray scanner that doesn't have to open the book: as far as I can tell, this sounds a whole lot like multispectral imaging (more about this here). I couldn't find any references to Google doing this, but it can be done using infrared light. This technology was used to read ancient documents where the writing appeared to be black ink on black paper, but infrared was able to distinguish the two (more about this here). It has also been used on paintings.

Another technology I found uses X-rays and was used to examine scraps of paper inside medieval bindings without taking off the binding (more about that here). This is called macro X-ray fluorescence spectrometry. More information about this is available here.

I'll be updating this as I find more information. Hope this helps!


Edit: I just found another site about this today that looks very promising. MIT has created a scanning system that can identify the letters on the top nine pages of a stack. There is a video on the page that explains this in great detail.