How to improve digit recognition of a model trained on MNIST?

So, you need a comprehensive approach, because every step of your processing cascade depends on the results of the previous one. Your algorithm has the following stages:

  1. Image preprocessing

As mentioned earlier, if you resize the image, you lose information about its aspect ratio. To get comparable results, you have to apply the same preprocessing to the digit images that was used during training.

A better way is to crop the image into fixed-size patches. In that variant you won't need contour finding and resizing of the digit image before the training process. Then you could make a small change to your cropping algorithm for better recognition: simply find the contour and place your digit, without resizing, at the center of the image frame used for recognition.
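As an illustration, here is a minimal sketch of that idea, assuming OpenCV 4.x and a single-channel binary input; the function and variable names are mine, not from the question:

```python
import cv2
import numpy as np

def center_digit(binary_img, frame_size=28):
    # bounding box of everything that was segmented (OpenCV 4.x signature)
    contours, _ = cv2.findContours(binary_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(np.vstack(contours))
    digit = binary_img[y:y + h, x:x + w]

    # paste the digit, unscaled, into the middle of an empty frame;
    # assumes the cropped digit already fits inside frame_size x frame_size
    frame = np.zeros((frame_size, frame_size), dtype=binary_img.dtype)
    top, left = (frame_size - h) // 2, (frame_size - w) // 2
    frame[top:top + h, left:left + w] = digit
    return frame
```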

You should also pay more attention to the binarization algorithm. I have studied the effect of binarization threshold values on learning error, and I can say it is a very significant factor. You may try other binarization algorithms to check this idea; for example, you may use this library for testing alternative binarization algorithms.
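If you only want to experiment quickly, OpenCV already ships several thresholding methods; the snippet below is a sketch of how you could compare them (the file name is a placeholder, and this is not the library mentioned above):

```python
import cv2

gray = cv2.imread("digit.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image

# fixed global threshold
_, fixed = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# Otsu picks the threshold automatically from the intensity histogram
_, otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# adaptive thresholding computes a local threshold per neighbourhood
adaptive = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)
```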

  2. Learning algorithm

To improve the quality of recognition, use cross-validation during the training process. This helps you avoid overfitting to your training data. For example, you may read this article, which explains how to use it with Keras.
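A minimal sketch of what such cross-validation could look like, assuming scikit-learn's KFold and a placeholder Keras model rather than your actual architecture:

```python
import numpy as np
from sklearn.model_selection import KFold
from tensorflow.keras import layers, models

def build_model():
    # placeholder architecture, not the one from the question
    model = models.Sequential([
        layers.Flatten(input_shape=(28, 28)),
        layers.Dense(128, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def cross_validate(x, y, n_splits=5):
    scores = []
    for train_idx, val_idx in KFold(n_splits=n_splits, shuffle=True).split(x):
        model = build_model()  # fresh weights for every fold
        model.fit(x[train_idx], y[train_idx], epochs=5, verbose=0)
        _, acc = model.evaluate(x[val_idx], y[val_idx], verbose=0)
        scores.append(acc)
    return np.mean(scores), np.std(scores)
```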

Sometimes a high accuracy value says nothing about the real recognition quality, because the trained ANN has not found the actual pattern in the training data. This may be connected with the training process or the input dataset, as explained above, or it may be caused by the choice of ANN architecture.

  3. ANN architecture

This is a big question: how do you define the best ANN architecture for the task? There is no universal way to do it, but there are a few ways to get closer to the ideal. For example, you could read this book; it helps you build a better intuition for your problem. You may also find here some heuristic formulas for choosing the number of hidden layers/units for your ANN, and here you will find a short overview of the topic.
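As an example of the kind of heuristic such resources give (this particular rule of thumb is only a rough upper bound meant to limit overfitting, and is not taken from the question or the links above):

```python
def max_hidden_units(n_samples, n_inputs, n_outputs, alpha=2):
    # alpha is an arbitrary scaling factor, usually quoted as 2..10;
    # the result is an upper bound, not a recommended size
    return n_samples // (alpha * (n_inputs + n_outputs))

# e.g. 60000 MNIST samples, 784 inputs, 10 outputs, alpha=2 -> about 37 units
```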

I hope this helps.


I believe that your problem is the dilation process. I understand that you wish to normalize image sizes, but you shouldn't break the proportions: resize along the axis that allows the largest rescale without letting the other dimension exceed the maximum size, and fill the rest of the image with the background color. It's not that "standard MNIST just hasn't seen the number one which looks like your test cases"; rather, you make your images look like different trained digits (the ones that are recognized).
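A minimal sketch of that resizing strategy, assuming a single-channel image and OpenCV (the function name and defaults are mine):

```python
import cv2
import numpy as np

def resize_keep_aspect(img, target=28, background=0):
    h, w = img.shape[:2]
    scale = target / max(h, w)               # largest rescale that still fits
    new_w = max(1, int(round(w * scale)))
    new_h = max(1, int(round(h * scale)))
    resized = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_AREA)

    # pad the remaining area with the background colour
    canvas = np.full((target, target), background, dtype=img.dtype)
    top, left = (target - new_h) // 2, (target - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas
```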

Overlap of the source and processed images

If you maintain the correct aspect ratio of your images (source and post-processed), you can see that you did not just resize the image but "distorted" it. This can be the result of either non-homogeneous dilation or incorrect resizing.


After some research and experiments, I came to the conclusion that the image preprocessing itself was not the problem (I did change some of the suggested parameters, e.g. dilation size and shape, but they were not crucial to the results). What did help, however, were the following two things:

  1. As @f4f noticed, I needed to collect my own dataset with real-world data. This already helped tremendously.

  2. I made important changes to my segmentation preprocessing. After getting individual contours, I first size-normalize the images to fit into a 20x20 pixel box (as they are in MNIST). After that I center the box in the middle of a 28x28 image using the center of mass (which for binary images is the mean pixel position across both dimensions). A sketch of these two steps follows the list.
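For reference, a minimal sketch of those two steps, assuming OpenCV and SciPy; the helper name and exact parameter choices are mine:

```python
import cv2
import numpy as np
from scipy import ndimage

def mnist_style_center(digit):
    # 1) fit the cropped digit into a 20x20 box, keeping its aspect ratio
    h, w = digit.shape
    scale = 20.0 / max(h, w)
    digit = cv2.resize(digit, (max(1, int(round(w * scale))),
                               max(1, int(round(h * scale)))))

    # 2) place it on a 28x28 canvas ...
    canvas = np.zeros((28, 28), dtype=digit.dtype)
    h, w = digit.shape
    top, left = (28 - h) // 2, (28 - w) // 2
    canvas[top:top + h, left:left + w] = digit

    # ... then shift it so the center of mass ends up in the middle
    cy, cx = ndimage.center_of_mass(canvas)
    M = np.float32([[1, 0, round(14 - cx)], [0, 1, round(14 - cy)]])
    return cv2.warpAffine(canvas, M, (28, 28))
```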

Of course, there are still difficult segmentation cases, such as overlapping or connected digits, but the above changes answered my initial question and improved my classification performance.


There are already some answers posted, but none of them addresses your actual question about image preprocessing.

For my part, I don't see any significant problems with your implementation as long as it's a study project; well done.

But there is one thing you may have missed. There are basic operations in mathematical morphology: erosion and dilation (which you use). And there are compound operations: various combinations of the basic ones (e.g. opening and closing). The Wikipedia article is not the best CV reference, but you may start with it to get the idea.

Usually it's better to use opening instead of erosion and closing instead of dilation, since in that case the original binary image changes much less (while the desired effect of cleaning sharp edges or filling gaps is still achieved). So in your case you should check closing (image dilation followed by erosion with the same kernel). An extra-small 8x8 image is greatly modified when you dilate even with a tiny kernel (a single pixel is a large fraction of such an image), which matters far less on larger images.
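A short sketch of the comparison with OpenCV (kernel size and file name are placeholders):

```python
import cv2
import numpy as np

binary = cv2.imread("digit.png", cv2.IMREAD_GRAYSCALE)  # hypothetical binary input
kernel = np.ones((3, 3), np.uint8)

dilated = cv2.dilate(binary, kernel)                        # thickens all strokes
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # fills gaps, keeps thickness
```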

To visualize the idea see the following pics (from OpenCV tutorials: 1, 2):

dilation: original symbol and dilated one

closing: original symbol and closed one

Hope it helps.