User interactions with personal assistants like Alexa, Google Home and Siri are typically initiated by a wake term or wakeword.
In this way, $F$ serves as a feature extractor that maps the input to a high-level representation and adds systematic noise using dropout.
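As a minimal sketch of such an extractor (assuming PyTorch; the layer sizes and the class name are hypothetical, not taken from the source), dropout injects noise by randomly zeroing units of the intermediate representation during training:

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Hypothetical F: maps inputs to a high-level representation with dropout."""
    def __init__(self, in_dim=784, hidden_dim=256, feat_dim=64, p=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(p),  # adds noise by randomly zeroing hidden units
            nn.Linear(hidden_dim, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

F = FeatureExtractor()
F.train()                    # dropout is active only in training mode
z = F(torch.randn(8, 784))   # high-level representation, shape (8, 64)
```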
In this paper, we show that knowledge distillation can be used to encourage a model that generates claim-independent document encodings to mimic the behavior of a more complex model that generates claim-dependent encodings.
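A minimal sketch of that setup (assuming PyTorch; all modules, dimensions, and the MSE objective are illustrative assumptions, not the paper's actual architecture): a fixed claim-dependent teacher produces target encodings that a claim-independent student learns to reproduce.

```python
import torch
import torch.nn as nn

doc_dim, claim_dim, enc_dim = 300, 300, 128

student = nn.Linear(doc_dim, enc_dim)              # claim-independent encoder
teacher = nn.Linear(doc_dim + claim_dim, enc_dim)  # claim-dependent encoder

doc = torch.randn(16, doc_dim)
claim = torch.randn(16, claim_dim)

with torch.no_grad():                              # teacher is kept fixed
    target = teacher(torch.cat([doc, claim], dim=-1))

# distillation objective: student mimics the teacher's encodings
loss = nn.functional.mse_loss(student(doc), target)
loss.backward()
```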
Neural network models have achieved high accuracy on natural language inference (NLI) tasks.
Optimal selection of a subset of items from a given set is a hard problem that requires combinatorial optimization.
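As a toy illustration of why this is combinatorial, consider exact subset selection under a knapsack-style budget: exhaustive search must examine all $2^n$ subsets, which is exponential in the number of items (the item values, weights, and capacity below are hypothetical).

```python
from itertools import combinations

items = [("a", 10, 5), ("b", 7, 3), ("c", 6, 4), ("d", 4, 2)]  # (name, value, weight)
capacity = 7

best_value, best_subset = 0, ()
for r in range(len(items) + 1):          # enumerate all 2^n subsets
    for subset in combinations(items, r):
        weight = sum(w for _, _, w in subset)
        value = sum(v for _, v, _ in subset)
        if weight <= capacity and value > best_value:
            best_value, best_subset = value, subset

print(best_value, [name for name, _, _ in best_subset])  # 14 ['a', 'd']
```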
Language models (LMs) based on Long Short-Term Memory (LSTM) networks have shown good gains in many automatic speech recognition tasks.
The first sub-system is a recurrent neural network (RNN)-based acoustic auto-encoder trained to reconstruct the audio through a finite-dimensional representation.
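A minimal sketch of such a sub-system (assuming PyTorch, with a GRU as the recurrent unit; the feature and bottleneck dimensions are illustrative assumptions): an encoder RNN compresses the input frames into a fixed-size code, from which a decoder RNN reconstructs the sequence.

```python
import torch
import torch.nn as nn

class RNNAutoEncoder(nn.Module):
    """Hypothetical acoustic auto-encoder with a finite-dimensional bottleneck."""
    def __init__(self, feat_dim=40, bottleneck=64):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, bottleneck, batch_first=True)
        self.decoder = nn.GRU(bottleneck, bottleneck, batch_first=True)
        self.out = nn.Linear(bottleneck, feat_dim)

    def forward(self, x):                        # x: (batch, time, feat_dim)
        _, h = self.encoder(x)                   # h[-1]: fixed-size code
        z = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)  # code at every step
        y, _ = self.decoder(z)
        return self.out(y)                       # reconstruction of the input

model = RNNAutoEncoder()
audio_feats = torch.randn(8, 100, 40)            # e.g. log-mel frames
recon = model(audio_feats)
loss = nn.functional.mse_loss(recon, audio_feats)
```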
We present extensions of this decomposition to common regression and classification loss functions, and report a simulation-based analysis of the diversity term and the accuracy of the decomposition.