Title: Multi-lingual date field extraction for automatic document retrieval by machine
Authors: Mandal R.
Roy P.P.
Pal U.
Blumenstein M.
Published in: Information Sciences
Abstract: Robotic intelligence has recently received significant attention in the research community. Such artificial intelligence can be applied to automatic document retrieval and interpretation by a robot in response to a query. To produce the desired feedback, the key information must be extracted from the document based on the query. For this purpose, in this paper we propose a system for automatic date field extraction from multi-lingual (English, Devnagari and Bangla scripts) handwritten documents. The date is a key piece of information, which can be used in various robotic applications such as date-wise document indexing/retrieval. In the proposed system, the script of the document is identified first; based on the identified script, word components of each text line are classified into month and non-month classes using word-level feature extraction and classification. Next, non-month words are segmented into individual components, and each component is labelled as one of the text, digit, punctuation or contraction categories. Subsequently, date patterns are searched for using the labelled components. Both numeric and semi-numeric regular expressions are used for date part extraction. Dynamic Time Warping (DTW) and profile feature-based approaches are used for classification of month/non-month words. Other date components such as numerals and punctuation marks are recognised using a gradient-based feature and a Support Vector Machine (SVM) classifier. The experiments are performed on English, Devnagari and Bangla document datasets, and the encouraging results indicate the effectiveness of the proposed system. © 2014 Elsevier Inc. All rights reserved.
Citation: Information Sciences (2015), 314: 277-292
Issue Date: 2015
Publisher: Elsevier Inc.
Keywords: Date spotting
Date-based indexing
Handwritten date extraction
Multi-lingual documents
Robot reading
Robot retrieval of document
ISSN: 0020-0255
Author Scopus IDs: 54410932900
Author Affiliations: Mandal, R., School of Information and Communication Technology, Griffith University, QLD, Australia
Roy, P.P., Dept. of Computer Science and Engineering, Indian Institute of Technology, Roorkee, India
Pal, U., Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India
Blumenstein, M., School of Information and Communication Technology, Griffith University, QLD, Australia
Corresponding Author: Mandal, R.; School of Information and Communication Technology, Griffith University, Australia
Appears in Collections: Journal Publications [CS]
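
Illustrative note: the abstract mentions searching for date patterns over labelled components using numeric and semi-numeric regular expressions. The record does not give the actual expressions, so the following is only a minimal sketch of the idea. The one-letter label codes, both patterns, and the function name `find_date_spans` are assumptions made for illustration and are not taken from the paper.

```python
import re

# Hypothetical one-letter codes for the component labels named in the abstract:
# 'M' = month word, 'D' = digit, 'P' = punctuation, 'C' = contraction, 'T' = other text.
NUMERIC = r"D{1,2}PD{1,2}PD{2,4}"        # assumed shape, e.g. 12/05/2014
SEMI_NUMERIC = r"D{1,2}C?P?MP?D{2,4}"    # assumed shape, e.g. 12th May, 2014
DATE_PATTERN = re.compile(f"{NUMERIC}|{SEMI_NUMERIC}")


def find_date_spans(labels):
    """Return (start, end) index pairs of component runs that look like dates.

    `labels` is a sequence of one-letter codes, one per recognised component,
    so indices in the encoded string map directly back to components.
    """
    encoded = "".join(labels)
    return [match.span() for match in DATE_PATTERN.finditer(encoded)]


# Components of "Dated 12th May 2014 at": text, 1, 2, "th", month word, 2, 0, 1, 4, text.
print(find_date_spans(["T", "D", "D", "C", "M", "D", "D", "D", "D", "T"]))
```

Encoding the labels as a string lets an ordinary regular-expression engine do the pattern search. A real system would also need to handle script-specific date layouts and recognition errors, which this sketch ignores.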

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.