Distilling state-of-the-art transformer models into lightweight student models is an effective way to reduce computational cost at inference time.
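As a hedged illustration of the kind of distillation this line refers to (a Hinton-style teacher-student loss, not necessarily this paper's exact recipe; the function name and hyperparameters below are assumptions), the student is trained against temperature-softened teacher logits blended with the ordinary hard-label loss:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft teacher targets with hard ground-truth labels.

    student_logits, teacher_logits: (batch, num_classes)
    labels: (batch,) integer class indices
    """
    # Soft targets: KL divergence between temperature-scaled distributions.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: ordinary cross-entropy on the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

At inference only the small student runs, which is where the computational savings come from.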
We re-examine Routing Networks, an approach to multi-task learning that uses reinforcement learning to decide how parameters are shared, with the goal of maximizing knowledge transfer between related tasks while avoiding task interference.
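A minimal sketch of the routing idea, under assumptions (a single routed layer, linear candidate modules, and the `RoutedLayer`/`task_id` names are all illustrative, not the paper's code): a per-task policy samples which module processes the input, and the sampled action's log-probability is what a REINFORCE-style update would reinforce with a transfer-aware reward.

```python
import torch
import torch.nn as nn

class RoutedLayer(nn.Module):
    """One layer of a routing network: a per-task router (a categorical
    policy) picks one of several candidate modules; the choice is a
    discrete action trainable with REINFORCE."""

    def __init__(self, dim, num_modules, num_tasks):
        super().__init__()
        # Trailing underscore avoids shadowing nn.Module.modules().
        self.modules_ = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(num_modules))
        # Router: one categorical policy per task over candidate modules.
        self.router_logits = nn.Parameter(torch.zeros(num_tasks, num_modules))

    def forward(self, x, task_id):
        dist = torch.distributions.Categorical(
            logits=self.router_logits[task_id])
        action = dist.sample()            # which module this task uses
        log_prob = dist.log_prob(action)  # kept for the REINFORCE update
        return torch.relu(self.modules_[int(action)](x)), log_prob
```

Tasks that benefit from sharing converge on the same modules, while tasks that interfere are routed apart.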
Standard autoregressive language models perform only a polynomial amount of computation to produce the probability of the next symbol.
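To spell out the claim (a standard observation about this model class, not specific to this paper): an autoregressive model factorizes the sequence probability into next-symbol conditionals,

```latex
p(x_1, \dots, x_T) \;=\; \prod_{t=1}^{T} p(x_t \mid x_{<t}),
```

where each conditional is emitted by a single forward pass whose cost is polynomial in the prefix length, so distributions whose next-symbol probabilities require super-polynomial computation fall outside the class.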
We discuss convergence guarantees to local minima and explore the simple but critical role of the stable-manifold theorem in analyzing saddle-point avoidance.
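As a reference point, here is the standard strict-saddle result from this literature (in the style of Lee et al., 2016; stated as background, not as this paper's exact theorem). For gradient descent on an $L$-smooth $f$ with step size $\eta < 1/L$,

```latex
x_{k+1} \;=\; x_k - \eta\,\nabla f(x_k), \qquad
W^{s}(x^\ast) \;=\; \bigl\{\, x_0 : \lim_{k\to\infty} x_k = x^\ast \,\bigr\},
```

the stable-manifold theorem implies that for any strict saddle $x^\ast$ (i.e. $\lambda_{\min}(\nabla^2 f(x^\ast)) < 0$), the stable set $W^{s}(x^\ast)$ has Lebesgue measure zero, so gradient descent from a random initialization avoids strict saddles almost surely.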
This paper addresses the problem of community membership detection using only text features, in a scenario where the community is defined by a small number of positively labeled examples.
This paper addresses the problem of predicting the duration of unplanned power outages, using historical outage records to train a series of neural network predictors.
The architecture of the Match-Tensor model simultaneously accounts for both local relevance matching and global topicality signals, allowing a rich interplay between the two when computing the relevance of a document to a query.
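A hedged sketch of a match-tensor-style scorer (illustrative only; the layer sizes, names, and pooling choice are assumptions, not the published architecture): elementwise products of projected query and document term states form a 3-D tensor of local match signals, which a convolutional stage aggregates into a global relevance score.

```python
import torch
import torch.nn as nn

class MatchTensorSketch(nn.Module):
    """Local term-by-term matches stacked into a tensor, then pooled
    into a single query-document relevance score."""

    def __init__(self, dim=64, channels=16):
        super().__init__()
        self.proj_q = nn.Linear(dim, channels)
        self.proj_d = nn.Linear(dim, channels)
        self.conv = nn.Conv2d(channels, 8, kernel_size=3, padding=1)
        self.score = nn.Linear(8, 1)

    def forward(self, q, d):
        # q: (B, Lq, dim), d: (B, Ld, dim) contextual term embeddings
        tq = self.proj_q(q).unsqueeze(2)           # (B, Lq, 1, C)
        td = self.proj_d(d).unsqueeze(1)           # (B, 1, Ld, C)
        tensor = tq * td                           # (B, Lq, Ld, C) match tensor
        h = torch.relu(self.conv(tensor.permute(0, 3, 1, 2)))  # (B, 8, Lq, Ld)
        pooled = h.amax(dim=(2, 3))                # global max pooling
        return self.score(pooled).squeeze(-1)      # relevance per pair
```

The tensor preserves where in the document each query term matches (local relevance), while the pooled convolutional features summarize the document as a whole (global topicality).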
The brevity and unconventional spelling of social media messages pose a challenge to language identification.
The goal of this paper is to use multi-task learning to efficiently scale slot-filling models for natural language understanding to multiple target tasks or domains.
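One common way to realize this with multi-task learning, sketched under assumptions (the class, encoder choice, and task names are illustrative, not the paper's model): a shared encoder carries cross-domain knowledge, while each target task or domain adds only a small token-level classification head over its own slot label set.

```python
import torch
import torch.nn as nn

class MultiTaskSlotFiller(nn.Module):
    """Shared encoder with one lightweight slot-tagging head per task."""

    def __init__(self, vocab_size, num_labels_per_task, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.LSTM(dim, dim // 2, bidirectional=True,
                               batch_first=True)   # shared across all tasks
        self.heads = nn.ModuleDict({
            task: nn.Linear(dim, n)                # task-specific head
            for task, n in num_labels_per_task.items()
        })

    def forward(self, token_ids, task):
        h, _ = self.encoder(self.embed(token_ids))  # (B, L, dim)
        return self.heads[task](h)                  # per-token slot logits
```

Training typically alternates mini-batches across tasks, so adding a new domain costs one extra head rather than a full model.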
This paper addresses the question of how language use affects community reaction to comments in online discussion forums, and the relative importance of the message vs. the messenger.