Extract table from image
Here is one way:
data = ColorNegate@Import@"http://i.stack.imgur.com/NQr6I.png";
(* label the connected components once and reuse them for both measurements *)
comps = MorphologicalComponents[Sharpen[Dilation[Binarize@data, 1.5], 1]];
points = ComponentMeasurements[comps, "Centroid"][[All, 2]];
box = ComponentMeasurements[comps, "BoundingBox"][[All, 2]];
{posX, posY} = Mean /@ Split[#, Abs[#1 - #2] < 5 &] & /@ {Sort@points[[All, 1]], Sort@points[[All, 2]]}
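To see what the Split step does, here is a minimal sketch on made-up coordinates (the values and the name gridPositions are hypothetical, not taken from the image). Sorted values that lie within 5 of their neighbour are grouped together, and each group is replaced by its mean, which gives one grid position per row or column.

```mathematica
(* hypothetical centroid coordinates along one axis, in pixels *)
coords = {10, 12, 11, 50, 52, 90};

(* sort, group neighbours closer than 5, then average each group *)
gridPositions = Mean /@ Split[Sort[coords], Abs[#1 - #2] < 5 &]
(* {11, 51, 90} *)
```

The threshold 5 is the one used above; it presumably has to stay smaller than the spacing between neighbouring rows and columns of the table.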
This plot shows that the computed grid positions line up with the centroids:
ListPlot[points,PlotRange->All,GridLines->{posX,posY},PlotStyle->Red]
Now partition the image into one piece per component:
imagePartition = ParallelMap[ImageTrim[Binarize@data, #] &, box];
Here is a sample:
imagePartition[[;; 15]]
Now for the part that still needs improvement. Here is one attempt to recognize the numbers:
getNumber[img_]:=Module[{r,comp},
comp=ComponentMeasurements[img,{"PerimeterCount","Holes"}][[All,2]];
r=Which[
Length@#==2,-1
,#[[1,2]]==1,0
,#[[1,1]]<15,1
,True,2
]&[comp];
(*{r,comp,img}*)
r
]
Two components give -1, one hole gives 0, no hole with a perimeter count below 15 gives 1, and everything else gives 2.
Applying it to the partitioned image:
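The decision rule can be checked without any image by feeding it hand-written measurement lists. This is only a sketch: the function name classify and the {perimeter, holes} values below are invented for illustration, and real measurements will differ.

```mathematica
(* same rule as getNumber, but acting directly on a list of
   {perimeter count, holes} pairs, one pair per component *)
classify[comp_List] := Which[
  Length[comp] == 2, -1,   (* two components: minus sign plus digit *)
  comp[[1, 2]] == 1, 0,    (* one hole: the digit 0 *)
  comp[[1, 1]] < 15, 1,    (* short perimeter, no hole: the digit 1 *)
  True, 2]                 (* anything else *)

(* hypothetical measurements for "-1", "0", "1" and "2" *)
samples = {{{12, 0}, {20, 0}}, {{30, 1}}, {{12, 0}}, {{28, 0}}};
classify /@ samples
(* {-1, 0, 1, 2} *)
```

Note that the rule maps every two-component cell to -1, so it only works if no other negative number occurs in the table.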
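(* parenthesize the assignment so numberData holds the plain list, not the MatrixForm wrapper *)
(numberData = Partition[ParallelMap[getNumber, imagePartition], Length@posX]) // MatrixForm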
We get:
Grid[numberData,Spacings->0,Alignment-> NumberPoint,Dividers->LightGray,BaseStyle->{FontSize-> 11}]
Not perfect, but it can be a starting point. What remains is to improve getNumber.
Update
With some calibration in getNumber, and using Binarize instead of Sharpen, all cases are now recognized correctly.
TextRecognize works fine after some tweaks and error corrections:
x = Import["http://i.stack.imgur.com/NQr6I.png"];
res = TextRecognize[Binarize[ImageResize[x, Scaled[5]], 0.7],"SegmentationMode" -> 6];
m = ToExpression /@ StringSplit[#] & /@
StringSplit[
StringReplace[
res, {"O" | "D" | "U" -> "0", "~" | "\"" -> "-", "I" -> "1"}],
"\n"];
m // MatrixForm
I have used the undocumented option "SegmentationMode" -> 6.
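The error-correction step can be tried on its own with a made-up OCR string. In the sketch below, raw is a hypothetical TextRecognize result (not the actual output for this image) containing the kinds of misreadings handled above: letters in place of 0 and 1, and a tilde in place of a minus sign.

```mathematica
(* hypothetical OCR output: two rows of three entries each *)
raw = "O 1 ~1\nI 2 D";

(* map commonly misread characters back to digits and the minus sign *)
cleaned = StringReplace[raw,
   {"O" | "D" | "U" -> "0", "~" | "\"" -> "-", "I" -> "1"}];

(* split into rows, then into entries, and convert each entry to a number *)
parsed = Map[ToExpression, StringSplit[#]] & /@ StringSplit[cleaned, "\n"]
(* {{0, 1, -1}, {1, 2, 0}} *)
```

The replacement list is specific to the misreadings seen for this image; a different font or resolution would probably need a different list.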