Improve TextRecognize[] on numbers
TextRecognize[] accepts an undocumented option, "SegmentationMode".
The allowed values are stored in an internal symbol:
?Image`ExternalOCRDump`$TextRecognizeSegmentationModes
{
{{3, "Fully automatic page segmentation, but no OSD. (Default)"}},
{{4, "Assume a single column of text of variable sizes"}},
{{6, "Assume a single uniform block of text"}},
{{7, "Treat the image as a single text line"}},
{{8, "Treat the image as a single word"}},
{{10, "Treat the image as a single character"}}
}
In this case we want mode 7, which treats the image as a single text line:
im = Import@"http://i.stack.imgur.com/cPRrY.png"
TextRecognize[im, "SegmentationMode" -> 7]
(* "3482671897" *)
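Because the option is undocumented, it may be worth checking how each listed mode behaves on a given image before settling on one. A minimal sketch, assuming your version still accepts the option; the mode numbers are hard-coded from the list above rather than read from the internal symbol:

```mathematica
im = Import["http://i.stack.imgur.com/cPRrY.png"];

(* Mode numbers copied from $TextRecognizeSegmentationModes above *)
modes = {3, 4, 6, 7, 8, 10};

(* "SegmentationMode" is undocumented: it may be ignored, or trigger a
   message, in versions other than the one this answer was written for *)
results = AssociationMap[TextRecognize[im, "SegmentationMode" -> #] &, modes]
```

Each key is a mode number and each value is what TextRecognize returned for it.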
And that's it.
If you'd rather not use undocumented options, note that TextRecognize[] works much better if you first normalize the spacing between characters:
im = Import@"http://i.stack.imgur.com/cPRrY.png"
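(* a: a single all-white column, as tall as the image *)
a = ConstantArray[1, Last@ImageDimensions@im];
(* Work on columns: collapse runs of white columns between characters to two *)
newImage = Image@Transpose[Transpose[ImageData@im] //.
    {x__, Longest[a..], y__} :> {x, a, a, y}]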
TextRecognize@newImage
(* "3482671897" *)
Note that we are replacing the variable-width vertical white strips (a minimum of 3 pixels wide) with a standard two-pixel separator. Painting the matched strips black instead (0 b) makes them visible:
Image@Transpose[Transpose[ ImageData@im] //.
{x__, b : Longest[a..], y__} :> {x, 0 b, y}]
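The same idea can also be written without ReplaceRepeated: split the columns into alternating runs of white and non-white columns, then rewrite every interior white run in a single pass. This is a minimal sketch; normalizeColumnGaps is a hypothetical helper name, and it assumes a binarized image whose white pixels are exactly 1. It sets every interior gap, however narrow, to exactly gap columns.

```mathematica
(* Hypothetical helper: set every run of all-white columns that lies between
   two non-white columns to exactly `gap` columns; leading and trailing
   margins are left alone. Assumes white pixels are exactly 1. *)
normalizeColumnGaps[data_?MatrixQ, gap_Integer?Positive] :=
 Module[{cols, blankQ, runs, n},
  cols = Transpose[data];                          (* one list per column *)
  blankQ = Function[col, AllTrue[col, # == 1 &]];  (* all-white column? *)
  runs = Split[cols, blankQ[#1] === blankQ[#2] &]; (* alternate white / ink runs *)
  n = Length[runs];
  runs = MapIndexed[
    If[blankQ[First[#1]] && 1 < First[#2] < n,
      ConstantArray[First[#1], gap],  (* interior white run -> gap columns *)
      #1] &,
    runs];
  Transpose[Join @@ runs]]

im = Import["http://i.stack.imgur.com/cPRrY.png"];
TextRecognize@Image@normalizeColumnGaps[ImageData@Binarize@im, 2]
```

Binarize is used so that background pixels are exactly 1 regardless of how the PNG was encoded.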
Update
As of version 12.1, TextRecognize@Import["http://i.stack.imgur.com/cPRrY.png"] works without any additional image manipulation or undocumented features.
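If the same code has to run on versions both before and after 12.1, one option is to branch on $VersionNumber: use the plain call where the update says it suffices, and fall back to the undocumented "SegmentationMode" -> 7 option otherwise. A minimal sketch; recognizeDigits is a hypothetical name, and the 12.1 threshold is taken from the update rather than independently verified.

```mathematica
(* Hypothetical wrapper: plain TextRecognize on 12.1+, otherwise the
   undocumented single-text-line segmentation mode *)
recognizeDigits[img_Image] :=
 If[$VersionNumber >= 12.1,
  TextRecognize[img],
  TextRecognize[img, "SegmentationMode" -> 7]]

recognizeDigits@Import["http://i.stack.imgur.com/cPRrY.png"]
```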
Previous answer
TextRecognize seems to be a work in progress; consider the following:
Rasterize[Graphics[Text[Style["3", 100]]]] // TextRecognize
Rasterize[Graphics[Text[Style["a", 100]]]] // TextRecognize
Rasterize[Graphics[Text[Style["123", 100]]]] // TextRecognize
Rasterize[Graphics[Text[Style["1234", 100]]]] // TextRecognize
Rasterize[Graphics[Text[Style["hello", 100]]]] // TextRecognize
Rasterize[Graphics[Text[Style["hello 3", 100]]]] // TextRecognize
yields the following output
{nothing here}
{nothing here}
{nothing here}
1234
hello
hello 3
For reasons that are entirely unclear, single characters are not recognized as text, and neither are short strings of digits such as "123". Oddly enough, short numbers are recognized when preceded by an actual word, which makes the following a terrible solution that nonetheless gives you the answer:
n = Import["http://i.stack.imgur.com/cPRrY.png"];
pretext = Rasterize["hello ", RasterSize -> 175, ImageSize -> 40];
combined = Rasterize@Row[{pretext, ImageResize[n, 1000]}];
t = TextRecognize@ImageResize[combined, Scaled[5]];
StringSplit@t
gives the output
{hello,3482671897}
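If the workaround has to be applied to several images, it can be wrapped in a function that also discards the dummy word. A minimal sketch; recognizeWithPrefix is a hypothetical name, and the raster sizes are simply the values used above, which may need tuning for other images.

```mathematica
(* Hypothetical wrapper around the "hello" workaround: prepend a rasterized
   dummy word, OCR the combined image, then discard the first recognized word *)
recognizeWithPrefix[img_Image, prefix_String : "hello"] :=
 Module[{pretext, combined, words},
  pretext = Rasterize[prefix <> " ", RasterSize -> 175, ImageSize -> 40];
  combined = Rasterize@Row[{pretext, ImageResize[img, 1000]}];
  words = StringSplit@TextRecognize@ImageResize[combined, Scaled[5]];
  (* everything after the dummy word, or "" if nothing beyond it was found *)
  If[Length[words] > 1, StringRiffle[Rest[words]], ""]]

recognizeWithPrefix@Import["http://i.stack.imgur.com/cPRrY.png"]
```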
Let's hope someone comes up with a better answer...
Mathematica version 11.1
This still fails sometimes, but it's better than nothing:
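(* Upscale 4x, split the image into per-character crops, recognize each
   crop with a character-level prior, and join the results *)
AlphaNumTextRecognize[img_Image] := StringJoin[
  Map[
   TextRecognize[#, RecognitionPrior -> "Character"] &,
   TextRecognize[ImageResize[img, 4 ImageDimensions[img]],
    "Character", "Image"]
   ]
  ]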
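When the function above gets a digit wrong, it can help to look at the individual character crops it iterates over, next to what each one was read as. A minimal sketch, assuming the same 11.1 behaviour of TextRecognize[image, "Character", "Image"] that the function relies on; the 4x upscaling factor is copied from it.

```mathematica
img = Import["http://i.stack.imgur.com/cPRrY.png"];

(* Per-character crops, produced the same way as in AlphaNumTextRecognize *)
chars = TextRecognize[ImageResize[img, 4 ImageDimensions[img]], "Character", "Image"];

(* Pair each crop with its single-character reading to spot split, merged,
   or misread digits *)
pairs = Transpose[{chars, TextRecognize[#, RecognitionPrior -> "Character"] & /@ chars}]
```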