http://repository.iitr.ac.in/handle/123456789/15983
Title: | Word retrieval in historical document using character-primitives |
Authors: | Pratim Roy, Partha; Ramel, J.-Y.; Ragot, N. |
Published in: | Proceedings of the International Conference on Document Analysis and Recognition, ICDAR |
Abstract: | Word searching and indexing in historical document collections is a challenging problem because characters in these documents are often touching or broken due to degradation and ageing effects. For efficient searching in such historical documents, this paper presents a novel approach to word spotting using string matching of character primitives. We describe a text string as a sequence of primitives, each consisting of a single character or a part of a character. Primitive segmentation is performed by analyzing text background information obtained with the water reservoir technique. Next, the primitives are clustered using template matching, and a codebook of representative primitives is built. Using this primitive codebook, the text information in the document images is encoded and stored. A query word is segmented into primitives and encoded as a string of representative primitives from the codebook. Finally, approximate string matching is applied to find similar words, and the matching similarity is used to rank the retrieved words. The proposed method is tested on historical books in the French alphabet, and encouraging results are obtained from the experiment. © 2011 IEEE. |
Citation: | Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, (2011), 678-682. Beijing |
URI: | https://doi.org/10.1109/ICDAR.2011.142 http://repository.iitr.ac.in/handle/123456789/15983 |
Issue Date: | 2011 |
Keywords: | Ageing effects; Approximate string matching; Background information; Codebooks; Document images; Historical documents; Query words; String matching; Text information; Text string; Water reservoir; Word retrieval; Word Spotting; History; Reservoirs (water); Template matching; Image matching |
ISBN: | 9.78077E+12 |
ISSN: | 1520-5363 |
Author Scopus IDs: | 56880478500 8293131700 16053389600 |
Author Affiliations: | Roy, P.P., Laboratoire d'Informatique, Université François Rabelais, Tours, France; Ramel, J.-Y., Laboratoire d'Informatique, Université François Rabelais, Tours, France; Ragot, N., Laboratoire d'Informatique, Université François Rabelais, Tours, France |
Corresponding Author: | Roy, P.P.; Laboratoire d'Informatique, Université François Rabelais, Tours, France; email: partha.roy@univ-tours.fr |
Appears in Collections: | Conference Publications [CS] |
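Illustrative note: the abstract's final retrieval step (encode each word as a string of codebook primitives, then rank candidates by approximate string matching) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the record does not say which approximate matching algorithm the paper uses, so plain Levenshtein edit distance is substituted here, and the names `edit_distance`, `similarity`, `rank_words`, and the integer codebook labels are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two sequences of codebook labels.

    Assumption: unit cost for insertion, deletion, and substitution.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i labels of a
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Map the distance to [0, 1]; 1.0 means identical label strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def rank_words(query: Sequence[int],
               index: Dict[str, Sequence[int]]) -> List[Tuple[str, float]]:
    """Score every indexed word against the query, best match first."""
    scored = [(word_id, similarity(query, codes))
              for word_id, codes in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical index: word identifier -> primitive labels from a codebook.
    index = {"w1": [3, 7, 7, 1], "w2": [3, 7, 1], "w3": [9, 9]}
    for word_id, score in rank_words([3, 7, 7, 1], index):
        print(word_id, round(score, 2))
```

Normalizing by the longer string keeps scores comparable across words of different lengths. A system working on degraded pages would more likely weight substitutions by the visual similarity of the two primitives, which this sketch does not attempt.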