Importing a grid of numbers from an image (sudoku like)
1 - Summary of a simple solution
In this particular case of digits there is a very simple solution based on a neural net (NN) trained on MNIST data. It takes just a few lines of code:
i=Import["https://i.stack.imgur.com/LC2c2.png"];
imageGRID = ImagePartition[i, Scaled[1/22]];
lenet = NetModel["LeNet Trained on MNIST Data"];
test[x_] := If[ImageDistance[imageGRID[[2, 2]], x] > 10, lenet[x], "-"]
Grid[imageGRID /. x_Image :> test[x] /. 7 -> 1, Frame -> All]
2 - How it works
Now let's go through it in detail. In the Wolfram NN repo there are 2 directly relevant NNs (as of today):
- LeNet Trained on MNIST Data
- CapsNet Trained on MNIST Data
I will go with the simplest, LeNet. Let's get it from the repo:
lenet = NetModel["LeNet Trained on MNIST Data"];
Next get this image:
i=Import["https://i.stack.imgur.com/LC2c2.png"];
Now partition it into a matrix of sub-images, one sub-image per digit. Your image has 22 boxes vertically and horizontally, so this is how you do it:
imageGRID = ImagePartition[i, Scaled[1/22]]
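A quick sanity check at this point, as a minimal sketch: it only inspects the objects defined above (i and imageGRID). If the first result is not {22, 22}, the image size is probably not an exact multiple of the box size and the tiles may be misaligned with the boxes.

```wolfram
(* Number of rows and columns of tiles; 22 x 22 is what the partition above is meant to give *)
Dimensions[imageGRID, 2]

(* Pixel size of the whole image and of one tile, to see whether the boxes divide evenly *)
ImageDimensions[i]
ImageDimensions[imageGRID[[1, 1]]]
```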
Now we can run LeNet to recognize the digits, but there are a couple of small problems here.
LeNet is not trained on blank images (images without digits); it always expects a digit. So if you feed it a blank, it will return whichever digit it considers closest. So we need a way to test for blanks. There are many ways, but let's just use this test, where imageGRID[[2, 2]] is a sample blank image and 10 is the distance above which a tile is treated as containing a digit:
test[x_] := If[ImageDistance[imageGRID[[2, 2]], x] > 10, lenet[x], "-"]
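The threshold of 10 was picked for this image. As a rough sketch of how one might choose it for another image, the distances of all tiles to the sample blank can be plotted; blanks should pile up near zero and digit tiles further out. The name distances is introduced here only for illustration.

```wolfram
(* Distance of every tile to the sample blank tile used in test *)
distances = ImageDistance[imageGRID[[2, 2]], #] & /@ Flatten[imageGRID];

(* Blank tiles should cluster near 0; a threshold belongs in the gap before the digit tiles *)
Histogram[distances, 50]
```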
Another problem: LeNet can get confused by some of the typed digits. Here it reads 1 as 7, due to the font used in your original image. This depends on the specific images and fonts and can be hot-fixed case by case, which is what the 7 -> 1 replacement does (it is only safe if the grid contains no genuine 7s). To avoid the hacks I use here, you can easily train your own LeNet on digits in your font. The docs have a lot of examples of this.
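To see how close the 1-versus-7 call is for a given tile, the class probabilities can be inspected. This is a small sketch: topGuesses is a name made up here, and it relies on the net's class decoder accepting the "TopProbabilities" property.

```wolfram
(* The three most probable digits for a tile, with their probabilities *)
topGuesses[x_Image] := lenet[x, {"TopProbabilities", 3}]

(* Keep only non-blank tiles, using the same distance test as above *)
digitTiles = Select[Flatten[imageGRID], ImageDistance[imageGRID[[2, 2]], #] > 10 &];

(* Inspect the first few tiles only, to keep the output short *)
topGuesses /@ Take[digitTiles, UpTo[5]]
```

If 1 and 7 show up with comparable probabilities on the tiles that contain a 1, that is a hint that retraining on the actual font is the more robust fix.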
So here is your final result:
Grid[imageGRID /. x_Image :> test[x] /. 7 -> 1, Frame -> All]
So simple with modern AI :-) You could also train a NN to take your original image grid and return a matrix of values directly. An image-to-image net architecture might be interesting to adapt for this, as a matrix is just another image; you can find such nets in the Wolfram NN repo.
Here is a semi-manual way to do it:
Import the image, cut it into a 48X48 array of small images, binarize them, and crop away the borders:
(* img is the imported image, e.g. obtained with Import *)
imageArray = img //
RightComposition[
ImagePartition[#, 40, 40] &
, Map[Binarize, #, {2}] &
, Map[ImageCrop[#, 38] &, #, {2}] &
];
(* a view of a piece of the array : *)
imageArray[[10 ;; 15, 5 ;; 10]] // Grid[#, Dividers -> All] &
Then group the small images with FindClusters[#, 5] (5 because we want 5 groups), remove exact duplicates (with Union), and look at the result:
imageArray //
RightComposition[
Flatten
, FindClusters[#, 5] &
, (Union /@ # &)
, Column[Row /@ #, Dividers -> All] &
]
There are no errors, so one can manually create the correspondences between the groups of images and the numbers:
rules = {1 -> "-", 2 -> 1, 3 -> 3, 4 -> 2, 5 -> 0}
The final result:
imageArray //
RightComposition[
ClusteringComponents[#, 5] &
, # /. rules &
, Grid
]
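One thing worth checking, sketched here under the assumption that the ClusteringComponents call works on imageArray exactly as in the code above: the rules were read off the FindClusters display, and as far as I know nothing guarantees that ClusteringComponents numbers its groups in the same order. Showing one representative tile per component index makes that easy to verify by eye. The names labels and representatives are introduced here for illustration.

```wolfram
(* Component index (1 to 5) assigned to each tile *)
labels = ClusteringComponents[imageArray, 5];

(* One example tile per index; Missing[] if an index ended up unused *)
representatives = Table[
   k -> First[Pick[Flatten[imageArray], Flatten[labels], k], Missing[]],
   {k, 5}];

(* Compare each index with the digit that rules assigns to it *)
Column[representatives]
```

If a representative does not match its entry in rules, adjust the rules to follow the component numbering instead.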