Please use this identifier to cite or link to this item: http://repository.iitr.ac.in/handle/123456789/15660
Title: A two-stage approach for word spotting in graphical documents
Authors: Tarafdar A.
Pal U.
Roy P.P.
Ragot N.
Ramel J.-Y.
Published in: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Abstract: Multi-oriented characters, characters connected to graphical lines, and text or symbols intersecting graphical lines and curves are very common in graphical documents. As a result, word spotting in graphical documents remains a challenging task, which this paper addresses in part. The proposed approach proceeds in two stages. In the first stage, isolated components are recognized using rotation-invariant features and an SVM classifier. Characters that have a good recognition score and match a character in the query string are selected for initial spotting. Because of the structural complexity of graphical documents and the presence of touching components, some query characters may be missed during initial spotting. In that case, candidate regions where the missing characters may be located are defined from the position, size and orientation of the recognized characters in the input document image. In the second stage, the Scale Invariant Feature Transform (SIFT) is used to find the missing characters in the candidate regions. Finally, the spotting is validated using the position, size, orientation and inter-character gap information of the recognized components. Experimental results demonstrate that the method is efficient at locating a query word in multi-oriented and/or touching graphical documents. © 2013 IEEE.
Citation: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, (2013), 319-323. Washington, DC
URI: https://doi.org/10.1109/ICDAR.2013.71
http://repository.iitr.ac.in/handle/123456789/15660
Issue Date: 2013
Keywords: Document Image Analysis
Graphical documents
Information Retrieval
SIFT
Word Spotting
ISSN: 1520-5363
Author Scopus IDs: 36648040100
57200742116
56880478500
16053389600
8293131700
Author Affiliations: Tarafdar, A., CVPR Unit, Indian Statistical Institute, India
Pal, U., CVPR Unit, Indian Statistical Institute, India
Roy, P.P., CVPR Unit, Indian Statistical Institute, India
Ragot, N., Laboratoire d'Informatique, Université François Rabelais Tours, France
Ramel, J.-Y., Laboratoire d'Informatique, Université François Rabelais Tours, France
Corresponding Author: CVPR Unit, Indian Statistical Institute, India
Appears in Collections: Conference Publications [CS]
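
Illustrative sketch (not from the paper): the abstract says candidate regions for missing query characters are derived from the position, size and orientation of the characters already recognized. This record does not give the actual geometry, so the following is only one plausible reading. The `Char` fields, the function name `candidate_region`, the `gap_ratio` parameter and the 1.5 margin factor are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Char:
    """A recognized character (hypothetical structure, not from the paper)."""
    index: int    # position of the character in the query string
    x: float      # centre x in pixels
    y: float      # centre y in pixels
    size: float   # estimated character size in pixels
    angle: float  # text orientation in degrees


def candidate_region(found, missing_index, gap_ratio=0.2):
    """Guess a square region (cx, cy, side) for a missing query character.

    Assumption: characters of a word lie on a straight line, evenly spaced
    by the mean character size plus a proportional inter-character gap.
    """
    # Extrapolate from the recognized character closest in the string.
    nearest = min(found, key=lambda c: abs(c.index - missing_index))
    mean_size = sum(c.size for c in found) / len(found)
    pitch = mean_size * (1.0 + gap_ratio)  # centre-to-centre distance
    steps = missing_index - nearest.index  # signed number of character slots
    theta = math.radians(nearest.angle)
    cx = nearest.x + steps * pitch * math.cos(theta)
    cy = nearest.y + steps * pitch * math.sin(theta)
    # Enlarge the search window to tolerate spacing and size errors.
    return cx, cy, 1.5 * mean_size
```

A second-stage matcher such as SIFT would then be run only inside the returned window, which keeps the search local to where the word geometry predicts the missing character.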

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.