Please use this identifier to cite or link to this item: http://repository.iitr.ac.in/handle/123456789/15983
Title: Word retrieval in historical document using character-primitives
Authors: Roy, Partha Pratim
Ramel J.-Y.
Ragot N.
Published in: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Abstract: Word searching and indexing in historical document collections is a challenging problem because characters in these documents are often touching or broken due to degradation and ageing effects. For efficient searching in such historical documents, this paper presents a novel approach to word spotting using string matching of character primitives. We describe the text string as a sequence of primitives, each of which consists of a single character or a part of a character. Primitive segmentation is performed by analyzing text background information obtained with the water reservoir technique. Next, the primitives are clustered using template matching, and a codebook of representative primitives is built. Using this primitive codebook, the text information in the document images is encoded and stored. A query word is segmented into primitives and encoded as a string of representative primitives from the codebook. Finally, approximate string matching is applied to find similar words, and the matching similarity is used to rank the retrieved words. The proposed method is tested on historical books in the French alphabet, and we have obtained encouraging results from the experiment. © 2011 IEEE.
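The final two steps of the abstract (approximate string matching over codebook-encoded words, then ranking by similarity) can be illustrated with a minimal sketch. It assumes each word is already encoded as a list of integer codebook indices, and it uses uniform-cost Levenshtein distance as a stand-in, since the record does not state the paper's actual matching costs. The function names, the normalization, and the sample index are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Uniform-cost Levenshtein distance between two primitive-code strings.

    Assumption: insert, delete, and substitute all cost 1. The paper may
    weight operations differently (for example by primitive similarity).
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Map the distance to a score in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def rank_words(query: Sequence[int],
               index: Dict[str, Sequence[int]]) -> List[Tuple[str, float]]:
    """Score every indexed word against the query, best match first."""
    scored = [(word_id, similarity(query, codes))
              for word_id, codes in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical index: word identifiers mapped to codebook-index strings.
sample_index = {"w1": [3, 7, 7, 1], "w2": [3, 7, 1], "w3": [9, 9]}
ranking = rank_words([3, 7, 7, 1], sample_index)
```

A tolerant distance of this kind is what lets a query still match a word whose touching or broken characters were segmented into a slightly different primitive sequence.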
Citation: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, (2011), 678-682. Beijing
URI: https://doi.org/10.1109/ICDAR.2011.142
http://repository.iitr.ac.in/handle/123456789/15983
Issue Date: 2011
Keywords: Ageing effects
Approximate string matching
Background information
Codebooks
Document images
Historical documents
Query words
String matching
Text information
Text string
Water reservoir
Word retrieval
Word Spotting
History
Reservoirs (water)
Template matching
Image matching
ISBN: 9.78077E+12
ISSN: 1520-5363
Author Scopus IDs: 56880478500
8293131700
16053389600
Author Affiliations: Roy, P.P., Laboratoire d'Informatique, Université François Rabelais, Tours, France
Ramel, J.-Y., Laboratoire d'Informatique, Université François Rabelais, Tours, France
Ragot, N., Laboratoire d'Informatique, Université François Rabelais, Tours, France
Corresponding Author: Roy, P.P.; Laboratoire d'Informatique, Université François Rabelais, Tours, France; email: partha.roy@univ-tours.fr
Appears in Collections: Conference Publications [CS]

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.