Please use this identifier to cite or link to this item: http://repository.iitr.ac.in/handle/123456789/15810
Title: Generation of synthetic training data for handwritten Indic script recognition
Authors: Gaur S.
Sonkar S.
Pratim Roy, Partha
Published in: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Abstract: This paper presents a novel approach to creating synthetic datasets for word recognition systems. Our purpose is to improve the performance of off-line handwritten text recognizers by providing them with additional synthetic training data. For many languages, the lack of a proper dataset makes it hard to train recognition systems. To address this problem, synthetic handwriting can be used to expand the existing training dataset. Any available digital text, such as that from online newspapers and similar sources, can be used to generate this synthetic data. The digital data is distorted in such a way that the underlying pattern is preserved, so that the word can still be identified by both machines and human readers. The resulting images can be used to train any classification system for handwriting recognition. This data can be used on its own to train the system or combined with natural handwritten data to augment the original dataset and improve the accuracy of the results. We experimented using only synthetic data and obtained high recognition accuracy in both character and word recognition. The data was tested on three Indian scripts for numerals (Hindi, Bengali and Telugu) and on one script (Hindi) for words; the results achieved are highly promising. © 2015 IEEE.
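Illustrative note: the abstract does not say which distortions the authors apply, so the sketch below is only an assumption about what a pattern-preserving distortion could look like. It slants a small binary glyph bitmap by a random shear, which moves every ink pixel without adding or removing any. The function names `shear_bitmap` and `synthesize_variants`, the `max_shear` parameter, and the list-of-lists bitmap representation are all hypothetical and are not taken from the paper.

```python
import random


def shear_bitmap(bitmap, shear, fill=0):
    """Slant a binary bitmap by shifting each row horizontally.

    Assumption for illustration only: the paper's actual distortion
    model is not described in the abstract.
    """
    height = len(bitmap)
    width = len(bitmap[0])
    max_shift = int(round(abs(shear) * (height - 1)))
    out_width = width + max_shift  # widen so no ink pixel is clipped
    out = []
    for y, row in enumerate(bitmap):
        # Positive shear: upper rows move further right (right-leaning slant).
        offset = int(round(abs(shear) * (height - 1 - y)))
        if shear < 0:
            # Negative shear: mirror the offsets so lower rows move right.
            offset = max_shift - offset
        new_row = [fill] * out_width
        for x, pixel in enumerate(row):
            new_row[x + offset] = pixel
        out.append(new_row)
    return out


def synthesize_variants(bitmap, count, max_shear=0.4, seed=None):
    """Produce `count` randomly slanted copies of one rendered glyph."""
    rng = random.Random(seed)
    return [
        shear_bitmap(bitmap, rng.uniform(-max_shear, max_shear))
        for _ in range(count)
    ]


if __name__ == "__main__":
    # A 3x3 vertical stroke standing in for a rendered character.
    glyph = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
    for variant in synthesize_variants(glyph, count=3, max_shear=1.0, seed=0):
        print(variant)
```

In a fuller pipeline one would presumably render text from a digital source with a font first and combine several distortions; a single shear is used here only to keep the example short.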
Citation: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, (2015), 491-495
URI: https://doi.org/10.1109/ICDAR.2015.7333810
http://repository.iitr.ac.in/handle/123456789/15810
Issue Date: 2015
Publisher: IEEE Computer Society
Keywords: Hidden Markov Models
Indic Text Recognition
Synthetic Data Generation
ISBN: 9.78148E+12
ISSN: 15205363
Author Scopus IDs: 57188730765
57207533098
56880478500
Author Affiliations: Gaur, S., Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, Uttarakhand, India
Sonkar, S., Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, Uttarakhand, India
Roy, P.P., Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, Uttarakhand, India
Appears in Collections: Conference Publications [CS]



Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.