This paper presents two ways of dealing with scarce data in semantic decoding
using N-Best speech recognition hypotheses. First, we learn features using a
deep learning architecture in which the weights for the unknown and known
categories are jointly optimised.
Second, an unsupervised method is used to
further tune the weights. Sharing weights injects prior knowledge into the unknown
categories. The unsupervised tuning (i.e. risk minimisation) improves the
F-Measure when recognising nearly zero-shot data on the DSTC3 corpus. This
unsupervised method can be applied subject to two assumptions: the rank of the
class marginal is known, and the class-conditional scores of the
classifier follow a Gaussian distribution.
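For illustration, the sketch below shows one way such an unsupervised risk estimate can be formed under these two assumptions; it is not the authors' implementation, and the estimate_risk helper, the score-thresholding setup, and the minority-class convention are assumptions introduced here.

# A minimal sketch of unsupervised risk estimation, assuming Gaussian
# class-conditional scores and a known rank of the class marginal.
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

def estimate_risk(scores, threshold=0.0, positive_is_minority=True):
    """Estimate the 0-1 risk of a score-thresholding classifier
    from unlabelled decision scores."""
    scores = np.asarray(scores, dtype=float).reshape(-1, 1)

    # Fit a two-component Gaussian mixture to the unlabelled scores.
    gmm = GaussianMixture(n_components=2, random_state=0).fit(scores)
    priors = gmm.weights_
    means = gmm.means_.ravel()
    stds = np.sqrt(gmm.covariances_).ravel()

    # Use the known rank of the class marginal to label the components:
    # here the positive class is assumed to be the less frequent one.
    order = np.argsort(priors)  # ascending mixing proportions
    pos, neg = (order[0], order[1]) if positive_is_minority else (order[1], order[0])

    # Estimated risk = P(pos) * P(score < threshold | pos)
    #                + P(neg) * P(score >= threshold | neg).
    false_neg = priors[pos] * norm.cdf(threshold, means[pos], stds[pos])
    false_pos = priors[neg] * (1.0 - norm.cdf(threshold, means[neg], stds[neg]))
    return false_neg + false_pos

The classifier parameters (here reduced to a single decision threshold) would then be tuned to minimise this unsupervised risk estimate, without requiring labels for the scarce categories.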