MNIST database explained

The MNIST database (Modified National Institute of Standards and Technology database[1]) is a large database of handwritten digits that is commonly used for training various image processing systems.[2] [3] The database is also widely used for training and testing in the field of machine learning.[4] [5] It was created by "re-mixing" the samples from NIST's original datasets.[6] The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments.[7] Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.[7]

The MNIST database contains 60,000 training images and 10,000 testing images.[8] Half of the training set and half of the test set were taken from NIST's training dataset, while the other half of the training set and the other half of the test set were taken from NIST's testing dataset.[9] The original creators of the database keep a list of some of the methods tested on it.[7] In their original paper, they use a support-vector machine to get an error rate of 0.8%.[10]

The original MNIST dataset contains at least 4 wrong labels.[11]

History

The set of images in the MNIST database was created in 1994. Previously, NIST released two datasets: Special Database 1 (NIST Test Data I, or SD-1) and Special Database 3 (or SD-3). They were released on two CD-ROMs.

SD-1 was the test set, and it contained 58,646 digit images written by 500 different writers, all of them high school students. Each image is accompanied by the identity of its writer. SD-3 was the training set, and it contained digits written by 2000 employees of the United States Census Bureau. SD-3 was much cleaner and easier to recognize than SD-1.[7] Machine learning systems trained and validated on SD-3 were found to suffer significant drops in performance on the SD-1 test set.[12]

The original dataset from NIST contained 128x128 binary images. Each image was size-normalized to fit in a 20x20 pixel box while preserving its aspect ratio, and anti-aliased to grayscale. It was then placed in a 28x28 image by translating it until the center of mass of its pixels was at the center of the image. The details of how the downsampling proceeded were reconstructed.
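The centering step described above can be sketched in a few lines of Python. This is only an illustration, not the original MNIST tooling: the function names `center_of_mass` and `center_by_mass` are hypothetical, the glyph is assumed to be already size-normalized, and the translation is rounded to whole pixels, which is an assumption because the text does not say how sub-pixel offsets were handled.

```python
def center_of_mass(img):
    """Return the intensity-weighted (row, column) center of a grayscale image."""
    total = sum(sum(row) for row in img)
    if total == 0:
        raise ValueError("image contains no ink")
    cy = sum(y * v for y, row in enumerate(img) for v in row) / total
    cx = sum(x * v for row in img for x, v in enumerate(row)) / total
    return cy, cx


def center_by_mass(glyph, size=28):
    """Copy a small glyph onto a size x size canvas so that its center of
    mass lands as close as possible to the canvas center."""
    cy, cx = center_of_mass(glyph)
    center = (size - 1) / 2
    # Whole-pixel translation (assumption: no sub-pixel interpolation)
    dy, dx = round(center - cy), round(center - cx)
    canvas = [[0] * size for _ in range(size)]
    for y, row in enumerate(glyph):
        for x, v in enumerate(row):
            ty, tx = y + dy, x + dx
            if 0 <= ty < size and 0 <= tx < size:
                canvas[ty][tx] = v
    return canvas


# A glyph whose ink sits in one corner of its 20x20 box
glyph = [[0] * 20 for _ in range(20)]
for y in range(2):
    for x in range(2):
        glyph[y][x] = 255
centered = center_by_mass(glyph)
print(center_of_mass(centered))
```

A glyph whose ink is concentrated in one corner of its 20x20 box ends up with that ink near the middle of the 28x28 canvas, which is why the bounding box of an MNIST digit is not always centered in the image.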

Both the training set and the testing set originally had 60k samples, but 50k of the testing set samples were discarded. These were restored to construct QMNIST, which has 60k images in the training set and 60k in the testing set.[13]

Extended MNIST (EMNIST) is a newer dataset developed and released by NIST to be the (final) successor to MNIST.[14] [15] MNIST included images only of handwritten digits. EMNIST includes all the images from NIST Special Database 19 (SD 19), which is a large database of 814,255 handwritten uppercase and lower case letters and digits.[16] [17] The images in EMNIST were converted into the same 28x28 pixel format, by the same process, as were the MNIST images. Accordingly, tools which work with the older, smaller, MNIST dataset will likely work unmodified with EMNIST.

Fashion MNIST was created in 2017 as a more challenging replacement for MNIST. The dataset consists of 70,000 28x28 grayscale images of fashion products from 10 categories.[18]

Performance

Some researchers have achieved "near-human performance" on the MNIST database, using a committee of neural networks; in the same paper, the authors achieve performance double that of humans on other recognition tasks.[19] The highest error rate listed[7] on the original website of the database is 12 percent, which is achieved using a simple linear classifier with no preprocessing.[10]
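As a rough illustration of what a committee of classifiers does, the sketch below averages the class probabilities produced by several models and picks the class with the highest mean. Averaging is an assumption made for this example; the text above does not say how the cited committee combined its members, and the function name `committee_predict` is hypothetical.

```python
def committee_predict(member_probs):
    """Combine per-model class probabilities by simple averaging.

    member_probs: one list of class probabilities per committee member,
    all of the same length. Returns (predicted_class, averaged_probs).
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    avg = [sum(p[k] for p in member_probs) / n_members for k in range(n_classes)]
    best = max(range(n_classes), key=lambda k: avg[k])
    return best, avg


# Three hypothetical members voting over three classes
label, avg = committee_predict([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
])
print(label)
```

In this toy input the first member prefers class 0, but the committee as a whole settles on class 1, which shows how averaging can override an individual member's mistake.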

In 2004, a best-case error rate of 0.42 percent was achieved on the database by researchers using a new classifier called the LIRA, which is a neural classifier with three neuron layers based on Rosenblatt's perceptron principles.[20]

Some researchers have tested artificial intelligence systems using the database put under random distortions. The systems in these cases are usually neural networks and the distortions used tend to be either affine distortions or elastic distortions.[7] Sometimes, these systems can be very successful; one such system achieved an error rate on the database of 0.39 percent.[21]
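An affine distortion of the kind mentioned above can be sketched with the standard library alone. The code below rotates, scales and shifts a square grayscale image using inverse mapping with nearest-neighbour sampling. The function names and the default distortion ranges are illustrative assumptions, not values taken from any of the cited systems, and elastic distortions are not covered.

```python
import math
import random


def affine_distort(img, angle_deg=0.0, scale=1.0, shift=(0.0, 0.0)):
    """Rotate and scale a square image about its center, then shift it.

    shift is (rows, columns). Output pixels whose source falls outside
    the image are set to 0.
    """
    n = len(img)
    c = (n - 1) / 2
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    dy, dx = shift
    out = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            # Inverse mapping: undo the shift, then the rotation and scaling
            ry, rx = y - c - dy, x - c - dx
            sx = (cos_a * rx + sin_a * ry) / scale + c
            sy = (-sin_a * rx + cos_a * ry) / scale + c
            iy, ix = round(sy), round(sx)
            if 0 <= iy < n and 0 <= ix < n:
                out[y][x] = img[iy][ix]
    return out


def random_distort(img, rng, max_angle=10.0, max_scale=0.1, max_shift=2.0):
    """Apply a randomly drawn affine distortion (ranges are arbitrary)."""
    return affine_distort(
        img,
        angle_deg=rng.uniform(-max_angle, max_angle),
        scale=1.0 + rng.uniform(-max_scale, max_scale),
        shift=(rng.uniform(-max_shift, max_shift),
               rng.uniform(-max_shift, max_shift)),
    )


# Distort a tiny test image with a fixed seed for repeatability
img = [[0] * 5 for _ in range(5)]
img[1][1] = 9
distorted = random_distort(img, random.Random(0))
```

Applying `random_distort` with a fresh random draw each time a training image is used is one way such distortions can enlarge the effective training set.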

In 2011, an error rate of 0.27 percent, improving on the previous best result, was reported by researchers using a similar system of neural networks.[22] In 2013, an approach based on regularization of neural networks using DropConnect was claimed to achieve a 0.21 percent error rate.[23] In 2016, the best performance of a single convolutional neural network was a 0.25 percent error rate.[24] As of August 2018, the best performance of a single convolutional neural network trained on MNIST training data using no data augmentation remained a 0.25 percent error rate.[24] [25] In addition, the Parallel Computing Center (Khmelnytskyi, Ukraine) obtained an ensemble of only 5 convolutional neural networks that performs on MNIST at a 0.21 percent error rate.[26] [27]
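Because the MNIST test set has 10,000 images, each error rate quoted in this section corresponds to a whole number of misclassified digits. The short sketch below makes that conversion explicit; the helper name `misclassified` is hypothetical.

```python
TEST_SET_SIZE = 10_000  # number of MNIST test images


def misclassified(error_rate_percent, n=TEST_SET_SIZE):
    """Convert an error rate in percent into a count of misclassified images."""
    return round(error_rate_percent / 100 * n)


for rate in (12, 0.42, 0.39, 0.27, 0.25, 0.21):
    print(f"{rate}% error = {misclassified(rate)} of {TEST_SET_SIZE} test images")
```

Seen this way, the step from 0.25 percent to 0.21 percent is a difference of four test images, so small gaps between top results should be read with some caution.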

Classifiers

This is a table of some of the machine learning methods used on the dataset and their error rates, by type of classifier:

| Type | Classifier | Distortion | Preprocessing | Error rate (%) |
|---|---|---|---|---|
| Neural network | Gradient Descent Tunneling | None | None | 0[28] |
| | | | Deskewing | 7.6 |
| | K-NN with rigid transformations | | | 0.96[29] |
| | K-NN with non-linear deformation (P2DHMDM) | | Shiftable edges | 0.52[30] |
| | | | Haar features | 0.87[31] |
| Non-linear classifier | 40 PCA + quadratic classifier | | | 3.3 |
| | | | Simple statistical pixel importance | 2.8[32] |
| Support-vector machine (SVM) | Virtual SVM, deg-9 poly, 2-pixel jittered | | Deskewing | 0.56[33] |
| | 2-layer 784-800-10 | | | 1.6[34] |
| | 2-layer 784-800-10 | Elastic distortions | | 0.7 |
| Deep neural network (DNN) | 6-layer 784-2500-2000-1500-1000-500-10 | Elastic distortions | | 0.35[35] |
| | 6-layer 784-40-80-500-1000-2000-10 | | | 0.31[36] |
| | 6-layer 784-50-100-500-1000-10-10 | | Expansion of the training data | 0.27[37] |
| | 13-layer 64-128(5x)-256(3x)-512-2048-256-256-10 | | | 0.25 |
| | Committee of 35 CNNs, 1-20-P-40-P-150-10 | Elastic distortions | Width normalizations | 0.23 |
| | | | Expansion of the training data | 0.21 |
| Convolutional neural network | Committee of 20 CNNs with Squeeze-and-Excitation Networks[38] | | Data augmentation | 0.17[39] |
| Convolutional neural network | Ensemble of 3 CNNs with varying kernel sizes | | Data augmentation consisting of rotation and translation | 0.09[40] |

Notes and References

  1. Web site: The MNIST Database of handwritten digits. Yann LeCun, Courant Institute, NYU Corinna Cortes, Google Labs, New York Christopher J.C. Burges, Microsoft Research, Redmond.
  2. Web site: Support vector machines speed pattern recognition - Vision Systems Design. Vision Systems Design. September 2004 . 17 August 2013.
  3. Web site: Gangaputra. Sachin. Handwritten digit database. 17 August 2013.
  4. Web site: Qiao. Yu. The MNIST Database of handwritten digits. 18 August 2013. 2007.
  5. Platt. John C.. Using analytic QP and sparseness to speed training of support vector machines. Advances in Neural Information Processing Systems. 1999. 557–563. 18 August 2013. https://web.archive.org/web/20160304083810/http://ar.newsmth.net/att/148aa490aed5b5/smo-nips.pdf. 4 March 2016. dead.
  6. Web site: NIST Special Database 19 - Handprinted Forms and Characters Database. Grother. Patrick J.. National Institute of Standards and Technology.
  7. Web site: LeCun. Yann. Cortes. Corinna. Burges. Christopher J.C.. The MNIST Handwritten Digit Database. Yann LeCun's Website yann.lecun.com. 30 April 2020.
  8. Kussul . Ernst . Baidyk . Tatiana . Improved method of handwritten digit recognition tested on MNIST database . Image and Vision Computing . 2004 . 22 . 12 . 971–981 . 10.1016/j.imavis.2004.03.008.
  9. Zhang . Bin . Srihari . Sargur N. . Fast k-Nearest Neighbor Classification Using Cluster-Based Trees . IEEE Transactions on Pattern Analysis and Machine Intelligence . 2004 . 26 . 4 . 525–528 . 20 April 2020 . 10.1109/TPAMI.2004.1265868 . 15382657 . 6883417.
  10. LeCun . Yann . Léon Bottou . Yoshua Bengio . Patrick Haffner . Gradient-Based Learning Applied to Document Recognition . Proceedings of the IEEE . 1998 . 86 . 11 . 2278–2324 . 18 August 2013 . 10.1109/5.726791. 14542261 .
  11. Muller . Nicolas M. . Markert . Karla . July 2019 . Identifying Mislabeled Instances in Classification Datasets . 2019 International Joint Conference on Neural Networks (IJCNN) . IEEE . 1–8 . 10.1109/IJCNN.2019.8851920 . 978-1-7281-1985-4. 1912.05283 .
  12. Book: Bottou . Léon . Cortes . Corinna . Denker . John S. . Drucker . Harris . Guyon . Isabelle . Jackel . L. D. . LeCun . Y. . Muller . U. A. . Sackinger . E. . Simard . P. . Vapnik . V. . Comparison of classifier methods: A case study in handwritten digit recognition . Proceedings of the 12th IAPR International Conference on Pattern Recognition (Cat. No.94CH3440-5) . 1994 . 2 . Jerusalem, Israel . 77–82 . 0-8186-6270-0 . 10.1109/ICPR.1994.576879.
  13. Yadav . Chhavi . Bottou . Leon . 2019 . Cold Case: The Lost MNIST Digits . Advances in Neural Information Processing Systems . 32. 1905.10498 . Article has a detailed history and a reconstruction of the discarded testing set. .
  14. Web site: NIST . 4 April 2017 . The EMNIST Dataset . 11 April 2022 . NIST.
  15. Web site: NIST . 27 August 2010 . NIST Special Database 19 . 11 April 2022 . NIST.
  16. Cohen . G. . Afshar . S. . Tapson . J. . van Schaik . A. . EMNIST: an extension of MNIST to handwritten letters . 2017 . 1702.05373 . cs.CV.
  17. Grother, Patrick J., and K. K. Hanaoka. "NIST special database 19." Handprinted forms and characters database, National Institute of Standards and Technology 10 (1995): 69.
  18. Xiao . Han . Rasul . Kashif . Vollgraf . Roland . Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms . 2017-09-15 . 1708.07747 . cs.LG.
  19. Book: Cireșan, Dan. http://repository.supsi.ch/5145/1/IDSIA-04-12.pdf. Multi-column deep neural networks for image classification. Ueli Meier. Jürgen Schmidhuber. 2012 IEEE Conference on Computer Vision and Pattern Recognition. 2012. 978-1-4673-1228-8. 3642–3649. 1202.2745. 10.1.1.300.3283. 10.1109/CVPR.2012.6248110. 2161592.
  20. Kussul. Ernst. Tatiana Baidyk. Improved method of handwritten digit recognition tested on MNIST database. Image and Vision Computing. 2004. 22. 12. 971–981. 10.1016/j.imavis.2004.03.008. 20 September 2013. https://web.archive.org/web/20130921060416/https://vlabdownload.googlecode.com/files/Image_VisionComputing.pdf. 21 September 2013. dead.
  21. Ranzato. Marc'Aurelio. Christopher Poultney . Sumit Chopra . Yann LeCun . Efficient Learning of Sparse Representations with an Energy-Based Model. Advances in Neural Information Processing Systems. 2006. 19. 1137–1144. 20 September 2013.
  22. Book: Ciresan, Dan Claudiu. Ueli Meier. Luca Maria Gambardella. Jürgen Schmidhuber. Convolutional neural network committees for handwritten character classification. 2011 International Conference on Document Analysis and Recognition (ICDAR). 2011. 1135–1139. 10.1109/ICDAR.2011.229. http://www.icdar2011.org/fileup/PDF/4520b135.pdf. 20 September 2013. 978-1-4577-1350-7. 10.1.1.465.2138. 10122297. https://web.archive.org/web/20160222152015/http://www.icdar2011.org/fileup/PDF/4520b135.pdf. 22 February 2016. dead.
  23. Wan. Li. Matthew Zeiler. Sixin Zhang. Yann LeCun. Rob Fergus. Regularization of Neural Network using DropConnect. International Conference on Machine Learning (ICML). 2013.
  24. Web site: SimpleNet. 2016. Lets Keep it simple, Using simple architectures to outperform deeper and more complex architectures. 3 December 2020. 1608.06037.
  25. Web site: SimpNet. Towards Principled Design of Deep Convolutional Networks: Introducing SimpNet. 3 December 2020. Github. 2018. 1802.06205.
  26. Web site: Romanuke. Vadim. Parallel Computing Center (Khmelnytskyi, Ukraine) represents an ensemble of 5 convolutional neural networks which performs on MNIST at 0.21 percent error rate.. 24 November 2016.
  27. Romanuke . Vadim . Training data expansion and boosting of convolutional neural networks for reducing the MNIST dataset error rate. Research Bulletin of NTUU "Kyiv Polytechnic Institute". 2016 . 6. 6 . 29–34. 10.20535/1810-0546.2016.6.84115. 24. free.
  28. Deng . Bo . Error-free Training for Artificial Neural Network . 2023-12-26 . cs.LG . 2312.16060.
  29. Lindblad. Joakim. Nataša Sladoje. Linear time distances between fuzzy sets with applications to pattern matching and classification. IEEE Transactions on Image Processing. January 2014. 23. 1. 126–136. 10.1109/TIP.2013.2286904. 24158476. 2014ITIP...23..126L . 1908950 .
  30. Keysers. Daniel. Thomas Deselaers . Christian Gollan . Hermann Ney . Deformation models for image recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. August 2007. 29. 8. 1422–1435. 10.1109/TPAMI.2007.1153. 17568145. 10.1.1.106.3963. 2528485.
  31. Book: Kégl, Balázs. Róbert Busa-Fekete. Proceedings of the 26th Annual International Conference on Machine Learning . Boosting products of base classifiers . 2009. 497–504. 10.1145/1553374.1553439 . 9781605585161 . 8460779 . https://users.lal.in2p3.fr/kegl/research/PDFs/keglBusafeRekete09.pdf. 27 August 2013.
  32. Web site: Mehrad Mahmoudian / MNIST with RandomForest.
  33. Decoste. Dennis. Schölkopf. Bernhard. 2002. Training Invariant Support Vector Machines. Machine Learning. 46. 161–190. 1–3. 10.1023/A:1012454411458. 703649027. 0885-6125. free.
  34. Book: Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis. Patrice Y. Simard. Dave Steinkraus. John C. Platt. 2003. http://research.microsoft.com/apps/pubs/?id=68920. Institute of Electrical and Electronics Engineers. 10.1109/ICDAR.2003.1227801. Proceedings of the Seventh International Conference on Document Analysis and Recognition . 1. 958. 978-0-7695-1960-9. 4659176.
  35. Ciresan. Claudiu Dan . Ueli Meier . Luca Maria Gambardella . Juergen Schmidhuber . Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition. Neural Computation. December 2010. 22. 12. 3207–3220 . 10.1162/NECO_a_00052. 20858131 . 1003.0358. 1918673.
  36. Web site: Romanuke. Vadim. The single convolutional neural network best performance in 18 epochs on the expanded training data at Parallel Computing Center, Khmelnytskyi, Ukraine. 16 November 2016.
  37. Web site: Romanuke. Vadim. Parallel Computing Center (Khmelnytskyi, Ukraine) gives a single convolutional neural network performing on MNIST at 0.27 percent error rate. 24 November 2016.
  38. Hu. Jie. Shen. Li. Albanie. Samuel. Sun. Gang. Wu. Enhua. Squeeze-and-Excitation Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2019. 42. 8. 2011–2023. 10.1109/TPAMI.2019.2913372. 31034408. 140309863. 1709.01507.
  39. Web site: GitHub - Matuzas77/MNIST-0.17: MNIST classifier with average 0.17% error. GitHub. 25 February 2020.
  40. An . Sanghyeon . Lee . Minjun . Park . Sanglee . Yang . Heerin . So . Jungmin . 2020-10-04 . An Ensemble of Simple Convolutional Neural Network Models for MNIST Digit Recognition . cs.CV . 2008.10400 .