http://repository.iitr.ac.in/handle/123456789/5640
Title: | Cleaning of online Bangla free-form handwritten text |
Authors: | Bhattacharya, N.; Pal, U.; Roy, P.P. |
Published in: | ACM Transactions on Asian and Low-Resource Language Information Processing |
Abstract: | In normal free-form handwritten text, repetition (writing the same stroke several times in the same place), over-writing, and crossing out are very common. In this article, we refer to these three types of writing as "noise." Cleaning such noisy text to extract the useful text is an important task for robust recognition. To the best of our knowledge, no work has been reported on cleaning such noise from online text in any script; hence, in this article, we propose an automatic text-cleaning approach for online handwriting recognition. First, crossing-out noise consisting of straight strike-through lines is detected using the straightness criteria of online strokes. Next, regions containing repetition, over-writing, and other types of crossing out are located using the positional information of the overlapping strokes. Features such as stroke density and self-intersections of strokes are computed from the strokes of the located regions to predict the type of noise, and the predicted type determines how each region is cleaned. For crossing out, all strokes of the crossing-out region are removed. For repetition and over-writing, strokes written earlier are removed, keeping the latest strokes. Finally, delayed strokes are properly arranged and the word is passed to the online recognizer. Although recognition of free-form handwriting is quite difficult, we obtained up to 70.71% improvement in word-recognition accuracy after noise cleaning. © 2017 ACM. |
Citation: | ACM Transactions on Asian and Low-Resource Language Information Processing (2017), 17(1): - |
URI: | https://doi.org/10.1145/3145538 http://repository.iitr.ac.in/handle/123456789/5640 |
Issue Date: | 2017 |
Publisher: | Association for Computing Machinery |
Keywords: | Bangla script; Free-form handwriting; Indic script; Lexicon driven approach; Strike-through text detection |
ISSN: | 23754699 |
Author Scopus IDs: | 55604675700 57200742116 56880478500 |
Author Affiliations: | Bhattacharya, N., Bose Institute, Centenary Building, P-1/12, C.I.T. Scheme VII-M, Kolkata, 700 054, India; Pal, U., CVPR Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata, 700 108, India; Roy, P.P., Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, Uttarakhand 247 667, India |
Appears in Collections: | Journal Publications [CS] |
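Illustrative note: the abstract says straight strike-through lines are detected using "straightness criteria" of online strokes, but this record does not define those criteria. The sketch below shows one common way to measure straightness, assuming a stroke is a list of (x, y) pen samples and using the ratio of endpoint distance to traced path length. The function names and the 0.95 threshold are hypothetical and are not taken from the article.

```python
import math

def straightness(stroke):
    """Ratio of endpoint distance to traced path length, in [0, 1].

    `stroke` is assumed to be a list of (x, y) pen samples in writing order.
    A perfectly straight stroke scores 1.0; curved or zigzag strokes score lower.
    """
    # Sum the lengths of consecutive segments along the stroke.
    path = sum(math.dist(p, q) for p, q in zip(stroke, stroke[1:]))
    if path == 0:
        # Empty, single-point, or stationary strokes carry no direction.
        return 0.0
    return math.dist(stroke[0], stroke[-1]) / path

def is_strike_through_candidate(stroke, threshold=0.95):
    """Flag a stroke as a straight-line candidate (threshold is an assumed value)."""
    return straightness(stroke) >= threshold

line = [(0, 0), (1, 0), (2, 0)]
zigzag = [(0, 0), (1, 1), (2, 0), (3, 1)]
print(is_strike_through_candidate(line), is_strike_through_candidate(zigzag))
```

A practical detector would likely combine such a score with further checks, for example whether the stroke actually passes over previously written strokes, since ordinary straight strokes within characters would also score highly.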