RNN-Transducers (RNN-Ts) have gained widespread acceptance as an end-to-end model for speech-to-text conversion because of their high accuracy and streaming capabilities.
To the best of our knowledge, we are the first to attempt inference-time adaptation of Text-to-SQL models, and harness trainable structured similarity between subqueries.
Text-to-SQL parsers typically struggle with databases unseen at training time.
Despite the cross-lingual generalization demonstrated by pre-trained multilingual models, the translate-train paradigm of transferring English datasets across multiple languages remains a key mechanism for training task-specific multilingual models.
Online alignment in machine translation refers to the task of aligning a target word to a source word when the target sequence has only been partially decoded.
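As a minimal sketch of this task (a common hard-alignment baseline, not necessarily the paper's method), one can align the latest decoded target token to the source position receiving the highest attention weight; the attention values below are illustrative:

```python
import numpy as np

def online_align(attn_row: np.ndarray) -> int:
    """Return the source position with the highest attention weight for
    the most recently decoded target token (a simple hard-alignment rule)."""
    return int(np.argmax(attn_row))

# Illustrative attention weights over four source tokens at one
# decoding step of a partially decoded target sequence.
attn = np.array([0.05, 0.70, 0.20, 0.05])
print(online_align(attn))  # -> 1
```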
This results in improved zero-shot transfer from related high-resource languages (HRLs) to low-resource languages (LRLs) without reducing HRL representation or accuracy.
RNN-Transducer (RNN-T) models have become synonymous with streaming end-to-end ASR systems.
Long-range forecasts are the starting point of many decision support systems that need to draw inference from high-level aggregate patterns on forecasted values.
In several real-world applications, machine learning models are deployed to make predictions on data whose distribution changes gradually over time, leading to a drift between the training and test distributions.
Our goal is to evaluate the accuracy of a black-box classification model, not as a single aggregate on a given test data distribution, but as a surface over a large number of combinations of attributes characterizing multiple test data distributions.
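To make the notion of an accuracy surface concrete, here is a brute-force sketch over a hypothetical test frame (the attribute names `gender` and `age_group` are made up; the paper presumably estimates this surface more cleverly than by exhaustive grouping):

```python
import pandas as pd

# Hypothetical test data: model predictions plus the attributes
# ("gender", "age_group") that characterize sub-distributions.
df = pd.DataFrame({
    "gender":    ["M", "M", "F", "F", "F", "M"],
    "age_group": ["young", "old", "young", "old", "young", "old"],
    "label":     [1, 0, 1, 1, 0, 1],
    "pred":      [1, 0, 0, 1, 0, 0],
})
df["correct"] = (df["label"] == df["pred"]).astype(float)

# Accuracy as a surface over attribute combinations, rather than a
# single aggregate over the whole test set.
surface = df.groupby(["gender", "age_group"])["correct"].mean()
print(surface)
```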
RelateLM uses transliteration to convert the unseen script of limited LRL text into the script of a Related Prominent Language (RPL) (Hindi in our case).
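As an illustration of the transliteration step (using the open-source indic-transliteration package as one option; RelateLM's exact pipeline may differ):

```python
# pip install indic-transliteration
from indic_transliteration import sanscript
from indic_transliteration.sanscript import transliterate

# Map Oriya-script text into Devanagari so that a Hindi (RPL)
# tokenizer and model can consume the LRL text.
oriya_text = "ଭାରତ"  # "Bharat" written in the Oriya script
print(transliterate(oriya_text, sanscript.ORIYA, sanscript.DEVANAGARI))
# -> भारत
```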
Machine translation of user-generated code-mixed inputs to English is of crucial importance in applications like web search and targeted advertising.
We propose DIAL, a scalable active learning approach that jointly learns embeddings to maximize recall for blocking and accuracy for matching blocked pairs.
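The blocking half of this idea can be sketched as nearest-neighbour retrieval in the learned embedding space (the embeddings below are random stand-ins; DIAL additionally learns them jointly with the matcher inside an active-learning loop):

```python
import numpy as np

def block_top_k(emb_a: np.ndarray, emb_b: np.ndarray, k: int = 2) -> np.ndarray:
    """Blocking step: for every record in table A, retrieve the k most
    similar records in table B under cosine similarity of the learned
    embeddings; only these candidate pairs are passed to the matcher."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sims = a @ b.T
    return np.argsort(-sims, axis=1)[:, :k]

# Random stand-ins for the jointly learned record embeddings.
A, B = np.random.randn(5, 16), np.random.randn(8, 16)
print(block_top_k(A, B))  # row i: candidate B-indices for A-record i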
We consider the task of personalizing ASR models while being constrained by a fixed budget on recording speaker-specific utterances.
Missing values are commonplace in decision support platforms that aggregate data over long time stretches from disparate sources, and reliable data analytics calls for careful handling of missing data.
In recent years, marked temporal point processes (MTPPs) have emerged as a powerful modeling machinery to characterize asynchronous events in a wide variety of applications.
We evaluate named entity representations of BERT-based NLP models by investigating their robustness to replacements from the same typed class in the input.
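A minimal sketch of such a typed-replacement probe (the entity pool and names are hypothetical; a real evaluation would sample from a large typed gazetteer):

```python
import random

# Hypothetical pool of same-typed (PERSON) entities.
PERSON_POOL = ["Alice Brown", "Rahul Mehta", "Wei Zhang"]

def typed_replacements(sentence: str, entity: str, pool=PERSON_POOL, n: int = 2):
    """Create perturbed inputs by swapping `entity` for other entities of
    the same type; a robust model should behave consistently on them."""
    alternatives = [e for e in pool if e != entity]
    return [sentence.replace(entity, random.choice(alternatives)) for _ in range(n)]

print(typed_replacements("John Smith flew to Paris on Monday.", "John Smith"))
```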
Accordingly, we propose a novel coupling of an open-source accent-tuned local model with the black-box service, where the output from the service guides frame-level inference in the local model.
Empirical evaluation on five different tasks shows that (1) our algorithm is more accurate than several existing methods of learning from a mix of clean and noisy supervision, and (2) the coupled rule-exemplar supervision is effective in denoising rules.
The domain-specific components are discarded after training and only the common component is retained.
We present a Parallel Iterative Edit (PIE) model for the problem of local sequence transduction arising in tasks like grammatical error correction (GEC).
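The core mechanic is that the model predicts an edit per source token and applies all edits in parallel, optionally refining over several rounds. A sketch of the edit-application step (the KEEP / DELETE / REPLACE_w / APPEND_w tag set here is a simplification of PIE's):

```python
def apply_edits(tokens, edits):
    """Apply one round of per-token edits in parallel."""
    out = []
    for tok, edit in zip(tokens, edits):
        if edit == "KEEP":
            out.append(tok)
        elif edit == "DELETE":
            continue
        elif edit.startswith("REPLACE_"):
            out.append(edit[len("REPLACE_"):])
        elif edit.startswith("APPEND_"):
            out.extend([tok, edit[len("APPEND_"):]])
    return out

src   = ["He", "go", "to", "school", "yesterday"]
edits = ["KEEP", "REPLACE_went", "KEEP", "KEEP", "KEEP"]
print(apply_edits(src, edits))  # ['He', 'went', 'to', 'school', 'yesterday']
```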
We present ARU, an Adaptive Recurrent Unit for streaming adaptation of deep globally trained time-series forecasting models.
Given a small corpus $\mathcal D_T$ pertaining to a limited set of focused topics, our goal is to train embeddings that accurately capture the sense of words in the topic in spite of the limited size of $\mathcal D_T$.
We study the calibration of several state-of-the-art neural machine translation (NMT) systems built on attention-based encoder-decoder models.
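For reference, calibration is typically quantified with the standard expected calibration error (ECE); the per-token confidences below are illustrative:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """Standard ECE: bin predictions by confidence and accumulate the
    weighted gap between mean confidence and empirical accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
    return ece

# Illustrative per-token confidences and correctness flags.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 1, 0, 0]))
```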
In this paper we show that a simple beam approximation of the joint distribution between attention and output is an easy, accurate, and efficient attention mechanism for sequence-to-sequence learning.
We present CROSSGRAD, a method to use multi-domain training data to learn a classifier that generalizes to new domains.
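A sketch of the cross-gradient training step (assuming PyTorch; `label_net`, `domain_net`, `eps`, and `alpha` are illustrative names and values): each classifier is also trained on inputs perturbed along the other classifier's input gradient, so the label classifier sees domain-guided augmentations and vice versa.

```python
import torch
import torch.nn.functional as F

def crossgrad_step(label_net, domain_net, x, y, d, eps=1.0, alpha=0.5):
    """One cross-gradient training step: perturb inputs along each
    classifier's input gradient and train the other classifier on the
    perturbed copy as domain-guided / label-guided augmentation."""
    x = x.clone().requires_grad_(True)
    loss_l = F.cross_entropy(label_net(x), y)    # label loss
    loss_d = F.cross_entropy(domain_net(x), d)   # domain loss
    grad_l, = torch.autograd.grad(loss_l, x, retain_graph=True)
    grad_d, = torch.autograd.grad(loss_d, x, retain_graph=True)
    x_dl = (x + eps * grad_d).detach()  # domain-guided perturbation
    x_ld = (x + eps * grad_l).detach()  # label-guided perturbation
    loss = (1 - alpha) * (loss_l + loss_d) \
         + alpha * (F.cross_entropy(label_net(x_dl), y)
                    + F.cross_entropy(domain_net(x_ld), d))
    return loss  # caller backprops and steps both networks' optimizers

# Tiny usage sketch with linear classifiers on 8-dim inputs.
label_net, domain_net = torch.nn.Linear(8, 3), torch.nn.Linear(8, 4)
x = torch.randn(16, 8)
y, d = torch.randint(0, 3, (16,)), torch.randint(0, 4, (16,))
crossgrad_step(label_net, domain_net, x, y, d).backward()
```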
Accurate demand forecasts can help online retail organizations better plan their supply-chain processes.
This is owing to the severe mismatch between the distributions of such entities on the web and in the relatively small training data.