Implicit segmentation of Kannada characters in offline handwriting recognition using hidden Markov models

16 Oct 2014 · Manasij Venkatesh, Vikas Majjagi, Deepu Vijayasenan

We describe a method for classifying handwritten Kannada characters using Hidden Markov Models (HMMs). Kannada script is agglutinative: simple shapes are concatenated horizontally to form a character. This results in a large number of characters, which makes classification difficult. Character segmentation plays a significant role in reducing the number of classes. Explicit segmentation techniques suffer when shapes overlap, which is common in handwritten text. We use HMMs to take advantage of the agglutinative nature of Kannada script, which allows us to perform implicit segmentation of characters along with recognition. All experiments are performed on the Chars74k dataset, which consists of 657 handwritten characters collected across multiple users. Gradient-based features are extracted from individual characters and used to train character HMMs. The use of an implicit segmentation technique at the character level resulted in an improvement of around 10%. The system also outperformed an existing system tested on the same dataset by around 16%. Analysis based on learning curves showed that increasing the training data could result in better accuracy. Accordingly, we collected additional data and obtained an improvement of 4% with 6 additional samples.
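The idea of implicit segmentation can be illustrated with a small sketch. The abstract does not specify the feature definition, HMM topology, or training procedure, so everything below is an assumption made for illustration: per-column histograms of gradient orientation stand in for the gradient-based features, each shape is a left-to-right HMM whose states emit unit-variance Gaussians, and the function names (`column_gradient_features`, `viterbi_left_to_right`, `implicit_boundary`) are hypothetical. Two shape models are concatenated into one character model, and the Viterbi alignment reveals where the first shape ends, without a separate segmentation step.

```python
import numpy as np


def column_gradient_features(img, n_bins=4):
    """One feature vector per image column: a magnitude-weighted
    histogram of gradient orientations (an assumed feature choice)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    feats = np.zeros((img.shape[1], n_bins))
    for b in range(n_bins):
        feats[:, b] = (mag * (bins == b)).sum(axis=0)
    return feats


def viterbi_left_to_right(frames, means, log_stay=np.log(0.5), log_move=np.log(0.5)):
    """Best state path through a left-to-right HMM that starts in the
    first state and ends in the last. Emissions are unit-variance
    Gaussians around `means` (log-likelihood up to a constant)."""
    n_frames, n_states = len(frames), len(means)
    ll = -0.5 * ((frames[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    score = np.full((n_frames, n_states), -np.inf)
    back = np.zeros((n_frames, n_states), dtype=int)
    score[0, 0] = ll[0, 0]
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1, s] + log_stay
            move = score[t - 1, s - 1] + log_move if s > 0 else -np.inf
            if move > stay:
                score[t, s], back[t, s] = move + ll[t, s], s - 1
            else:
                score[t, s], back[t, s] = stay + ll[t, s], s
    # Backtrack from the final state to recover the alignment.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return score[-1, -1], path[::-1]


def implicit_boundary(frames, shape_a_means, shape_b_means):
    """Concatenate two shape HMMs and return (score, first frame index
    assigned to the second shape). The boundary falls out of decoding."""
    means = np.vstack([shape_a_means, shape_b_means])
    score, path = viterbi_left_to_right(frames, means)
    boundary = next(t for t, s in enumerate(path) if s >= len(shape_a_means))
    return score, boundary
```

In a recognizer built this way, each candidate character would be scored by decoding against its own concatenated model and the highest-scoring one chosen; the state means here would come from training rather than being set by hand.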
