The Impact of Quantity of Training Data on Recognition of Eating Gestures
This paper considers the problem of recognizing eating gestures by tracking wrist motion. Eating gestures can vary widely in motion depending on the subject, utensil, and type of food or beverage being consumed. Previous works have shown viable proofs-of-concept of recognizing eating gestures in laboratory settings with small numbers of subjects and food types, but it is unclear how well these methods would work if tested on a larger population in natural settings. As more subjects, locations, and foods are tested, greater motion variability could cause a decrease in recognition accuracy. To explore this issue, this paper describes the collection and annotation of 51,614 eating gestures taken by 269 subjects eating a meal in a cafeteria. Experiments are described that explore the complexity of hidden Markov models (HMMs) and the amount of training data needed to adequately capture the motion variability across this large data set. Results found that HMMs needed a complexity of 13 states and 5 Gaussians to reach a plateau in accuracy, signifying that a minimum of 65 samples per gesture type is needed. Results also found that 500 training samples per gesture type were needed to identify the point of diminishing returns in recognition accuracy. Overall, the findings provide evidence that a data set of the size typically used to demonstrate a laboratory proof-of-concept may not be large enough to capture all the motion variability that could be expected in transitioning to deployment with a larger population. Our data set, which is 1-2 orders of magnitude larger than all data sets tested in previous works, is being made publicly available.
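One way to read the 65-sample figure is as the product of the number of HMM states and the number of Gaussians per state, so that every Gaussian component can be assigned at least one training sample. That reading is an assumption made here for illustration, and the function name below is hypothetical; the sketch only reproduces the arithmetic.

```python
def mixture_component_count(n_states: int, n_gaussians_per_state: int) -> int:
    """Total Gaussian components in an HMM with one mixture per state.

    Assumption: the minimum sample count equals this total, i.e. at
    least one training sample per Gaussian component.
    """
    return n_states * n_gaussians_per_state


# Complexity reported in the abstract: 13 states, 5 Gaussians.
components = mixture_component_count(13, 5)
print(components)
```

Under this reading, the 500 samples per gesture type at the point of diminishing returns is well above the minimum implied by model complexity alone.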