A new traditional Mongolian word-level online handwriting dataset, MOLHW, is introduced. The dataset contains 164,631 handwritten word samples written by 200 writers and covers 40,605 common Mongolian words, which were selected from a large Mongolian corpus. Volunteers wrote each word in a dedicated mobile phone application, which recorded the coordinate points of the handwriting.
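To make the idea of a word stored as coordinate points concrete, the following is a minimal sketch of how one online handwriting sample might be held in memory and scale-normalized. The field names (`label`, `writer_id`, `strokes`), the grouping of points into strokes, and the `normalize` helper are all assumptions made for illustration; they do not describe the actual MOLHW file format or any preprocessing used by the authors.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) screen coordinate


@dataclass
class HandwritingSample:
    """Hypothetical container for one handwritten word (not the MOLHW schema)."""
    label: str                  # assumed: text label of the word
    writer_id: int              # assumed: identifier of the writer
    strokes: List[List[Point]]  # assumed: one point list per pen-down stroke


def normalize(sample: HandwritingSample) -> HandwritingSample:
    """Shift the trajectory to the origin and scale its longer side to 1.

    The aspect ratio is preserved, so tall, narrow words stay tall and narrow.
    """
    points = [p for stroke in sample.strokes for p in stroke]
    if not points:
        return sample
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    width = max(x for x, _ in points) - min_x
    height = max(y for _, y in points) - min_y
    extent = max(width, height) or 1.0  # avoid division by zero for a single dot
    scaled = [
        [((x - min_x) / extent, (y - min_y) / extent) for x, y in stroke]
        for stroke in sample.strokes
    ]
    return HandwritingSample(sample.label, sample.writer_id, scaled)
```

Normalizing by the longer side is one common way to make samples from phones with different screen sizes comparable; other choices, such as fixing the height only, would work equally well for a sketch like this.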
With the popularity of mobile phones and digital devices, more applications of handwriting recognition have emerged, such as the handwriting input method, signature recognition, and business card recognition. In the Inner Mongolia Autonomous Region, China, about 4 million people speak and write the traditional Mongolian language. However, owing to the lack of datasets, the development of Mongolian online handwriting recognition has been slow.
To our knowledge, MRG-OHMW is the first publicly available database for online handwritten Mongolian. It has two main shortcomings. First, its vocabulary covers only 946 Mongolian words, which is too small for Mongolian. Second, its handwriting trajectories were collected with an Anoto pen on paper, so they differ from trajectories written with a finger on a touch screen.
The MOLHW dataset is now freely available to researchers for various Mongolian online text-related applications, such as Mongolian online text recognition, handwritten text generation, writer identification and verification, and signature recognition. The main contributions of this paper can be summarized as follows:
- The creation of MOLHW, an open-vocabulary benchmark dataset of online handwritten Mongolian, which includes 164,631 samples written by 200 writers and covers 40,605 common Mongolian words.
- The development of tools, techniques, and procedures for Mongolian online text collection, verification, and transliteration.
- The development of a benchmark model for recognizing online handwritten Mongolian words, based on an encoder-decoder architecture.
- A comparison of the performance of different models on this dataset.
Like English, traditional Mongolian is written in a phonetic script, which has 35 letters. Unlike letters in the Latin alphabet, a Mongolian letter takes different shapes depending on its position and context within a word.
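The position dependence described above can be sketched in a few lines. The example below only assigns each letter of a word one of the positional form names (isolated, initial, medial, final); it deliberately does not model the context-dependent variants, and the function name `positional_forms` and the Latin transliteration used as input are assumptions for illustration rather than part of the dataset or its tooling.

```python
from typing import List, Tuple


def positional_forms(letters: List[str]) -> List[Tuple[str, str]]:
    """Pair each letter with the positional form implied by its index.

    Simplification: only position is considered. Real Mongolian shaping
    also depends on neighboring letters, which this sketch ignores.
    """
    n = len(letters)
    result = []
    for i, letter in enumerate(letters):
        if n == 1:
            form = "isolated"
        elif i == 0:
            form = "initial"
        elif i == n - 1:
            form = "final"
        else:
            form = "medial"
        result.append((letter, form))
    return result


# Illustrative input: a Latin transliteration standing in for Mongolian letters.
for letter, form in positional_forms(list("sain")):
    print(letter, form)
```

Because the same letter can map to several glyph shapes, a recognizer trained on such data has to associate visually different trajectories with one letter label, which is part of what makes a large vocabulary valuable.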
The MOLHW dataset is now publicly available at https://www.kaggle.com/fandaoerji/molhw-ooo to all researchers, and some examples of handwriting samples are shown in Fig. 6.
We show examples of different styles of writing in Fig. 7.