Please use this identifier to cite or link to this item: http://repository.iitr.ac.in/handle/123456789/5640
Title: Cleaning of online Bangla free-form handwritten text
Authors: Bhattacharya N.
Pal U.
Roy P.P.
Published in: ACM Transactions on Asian and Low-Resource Language Information Processing
Abstract: In normal free-form handwritten text, repetition (repeated writing of the same stroke several times in the same place), over-writing, and crossing out are very common. In this article, we refer to these three types of writing as "noise." Cleaning such noisy text to extract the useful text is an important task for robust recognition. To the best of our knowledge, no work has been reported on cleaning such noise from online text in any script; hence, in this article, we propose an automatic text-cleaning approach for online handwriting recognition. First, crossing-out noise with straight strike-through lines is detected using the straightness criteria of online strokes. Next, regions containing repetition, over-writing, and other types of crossing out are located using the positional information of the overlapping strokes. Features such as stroke density and self-intersections of strokes are computed from the strokes of the located regions to predict the type of noise, and this type information is used for cleaning as follows. To clean crossing-outs, all strokes of the crossing-out region are removed. To clean repetition and over-writing, strokes written earlier are removed, keeping the latest strokes. Finally, delayed strokes are properly arranged and the word is passed to the online recognizer. Though recognition of free-form handwriting is quite difficult, in this attempt, we obtained up to 70.71% improvement in word-recognition accuracy after noise cleaning. © 2017 ACM.
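Illustrative note (not part of the published abstract): the abstract mentions a "straightness criteria" for detecting straight strike-through strokes but does not define it. The sketch below shows one common way such a criterion could be expressed, as the ratio of a stroke's end-to-end distance to its total path length. The function names, the point format (a list of (x, y) pen samples), and the 0.95 threshold are assumptions made for illustration, not the authors' method.

```python
import math

def straightness(stroke):
    """Ratio of end-to-end distance to total path length.

    `stroke` is a list of (x, y) pen samples (assumed format).
    Returns a value in [0, 1]; 1.0 means a perfectly straight stroke.
    """
    if len(stroke) < 2:
        return 0.0
    # Sum of distances between consecutive pen samples.
    path_length = sum(math.dist(p, q) for p, q in zip(stroke, stroke[1:]))
    if path_length == 0:
        return 0.0
    return math.dist(stroke[0], stroke[-1]) / path_length

def is_straight_strike_through(stroke, threshold=0.95):
    """Flag a stroke as a candidate straight strike-through line.

    The threshold is a hypothetical value chosen for this sketch.
    """
    return straightness(stroke) >= threshold

line = [(0, 0), (5, 0), (10, 0)]            # a straight horizontal stroke
zigzag = [(0, 0), (1, 1), (2, 0), (3, 1)]   # an ordinary wavy stroke
print(is_straight_strike_through(line), is_straight_strike_through(zigzag))
```

A real system would presumably combine such a measure with stroke length and position relative to the underlying text, since short straight strokes also occur as legitimate parts of characters.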
Citation: ACM Transactions on Asian and Low-Resource Language Information Processing (2017), 17(1): -
URI: https://doi.org/10.1145/3145538
http://repository.iitr.ac.in/handle/123456789/5640
Issue Date: 2017
Publisher: Association for Computing Machinery
Keywords: Bangla script
Free-form handwriting
Indic script
Lexicon driven approach
Strike-through text detection
ISSN: 2375-4699
Author Scopus IDs: 55604675700
57200742116
56880478500
Author Affiliations: Bhattacharya, N., Bose Institute, Centenary Building, P-1/12, C.I.T. Scheme VII-M, Kolkata, 700 054, India
Pal, U., CVPR Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata, 700 108, India
Roy, P.P., Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, Uttarakhand 247 667, India
Appears in Collections: Journal Publications [CS]



Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.