no code implementations • EACL (BEA) 2021 • Simon Flachs, Felix Stahlberg, Shankar Kumar
We investigate how best to take advantage of existing data sources for improving GEC systems for languages with limited quantities of high quality training data.
no code implementations • 13 Nov 2024 • Felix Stahlberg, Jared Lichtarge, Shankar Kumar
We propose a novel parameter-efficient training (PET) method for large language models that adapts models to downstream tasks by optimizing a small subset of the existing model parameters.
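As a rough illustration of the idea (the snippet does not say which subset the paper selects), the following PyTorch sketch freezes all weights and re-enables gradients only for bias terms, a common subset-tuning baseline:

```python
# Minimal sketch of parameter-efficient tuning by training only a chosen
# subset of existing weights (illustrative; bias-only tuning is used here
# as a stand-in for the paper's unspecified selection criterion).
import torch
from torch import nn

model = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)

# Freeze everything, then re-enable gradients for a small subset.
for name, p in model.named_parameters():
    p.requires_grad = "bias" in name

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

x = torch.randn(8, 16, 64)          # (batch, seq, d_model)
loss = model(x).pow(2).mean()       # stand-in task loss
loss.backward()
optimizer.step()
```

Only the small trainable subset is passed to the optimizer, so per-task storage and optimizer state stay small.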
no code implementations • 24 Sep 2024 • Leonid Velikovich, Christopher Li, Diamantino Caseiro, Shankar Kumar, Pat Rondon, Kandarp Joshi, Xavier Velez
For end-to-end Automatic Speech Recognition (ASR) models, recognizing personal or rare phrases can be difficult.
no code implementations • 20 Oct 2023 • Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Ke Wu
One challenge in speech translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations.
no code implementations • 4 Oct 2023 • Jared Lichtarge, Ehsan Amid, Shankar Kumar, Tien-Ju Yang, Rohan Anil, Rajiv Mathews
Federated Averaging, and many federated learning algorithm variants which build upon it, have a limitation: all clients must share the same model architecture.
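A minimal NumPy sketch of Federated Averaging makes the limitation concrete: the element-wise average is only defined when every client's parameter vector has the same shape, i.e. the same architecture:

```python
# Minimal sketch of Federated Averaging over client weight vectors
# (illustrative). The weighted average only makes sense if every client
# sends parameters of identical shape -- the limitation described above.
import numpy as np

def fedavg(client_weights, client_sizes):
    """Data-size-weighted average of client parameter vectors."""
    total = sum(client_sizes)
    stacked = np.stack(client_weights)       # (clients, params); shapes must match
    coeffs = np.array(client_sizes) / total  # per-client weights
    return coeffs @ stacked                  # weighted sum over clients

clients = [np.random.randn(10) for _ in range(3)]  # all length 10
new_global = fedavg(clients, client_sizes=[100, 50, 150])
```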
no code implementations • 22 Aug 2023 • Yun Zhu, Yinxiao Liu, Felix Stahlberg, Shankar Kumar, Yu-Hui Chen, Liangchen Luo, Lei Shu, Renjie Liu, Jindong Chen, Lei Meng
Large Language Models (LLMs) have demonstrated impressive capabilities for text rewriting.
no code implementations • 28 May 2023 • W. Ronny Huang, Hao Zhang, Shankar Kumar, Shuo-Yiin Chang, Tara N. Sainath
We address this limitation by distilling punctuation knowledge from a bidirectional teacher language model (LM) trained on written, punctuated text.
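A hedged sketch of the general distillation recipe (the paper's exact teacher/student setup is not described in this snippet): train the student toward the teacher's soft distribution over punctuation labels at each token:

```python
# Generic knowledge-distillation loss for per-token punctuation labels
# (illustrative; class set and temperature are invented for the example).
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence on temperature-softened distributions."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

# 4 tokens, 3 punctuation classes: {none, comma, period}
student = torch.randn(4, 3, requires_grad=True)
teacher = torch.randn(4, 3)
loss = distill_loss(student, teacher)
loss.backward()
```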
3 code implementations • 12 Apr 2023 • CJ Carey, Travis Dick, Alessandro Epasto, Adel Javanmard, Josh Karlin, Shankar Kumar, Andres Munoz Medina, Vahab Mirrokni, Gabriel Henrique Nunes, Sergei Vassilvitskii, Peilin Zhong
In this work, we present a new theoretical framework to measure re-identification risk in such user representations.
no code implementations • 19 Dec 2022 • Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Axel H. Ng
A challenge in spoken language translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations.
no code implementations • 8 Nov 2022 • Felix Stahlberg, Aashish Kumar, Chris Alberti, Shankar Kumar
We report on novel investigations into training models that make sentences concise.
no code implementations • 10 Sep 2022 • Jared Lichtarge, Chris Alberti, Shankar Kumar
For T5, we show that learning hyper-parameters during pretraining can improve performance across downstream NLU tasks.
no code implementations • NAACL (ACL) 2022 • Eric Malmi, Yue Dong, Jonathan Mallinson, Aleksandr Chuklin, Jakub Adamek, Daniil Mirylenka, Felix Stahlberg, Sebastian Krause, Shankar Kumar, Aliaksei Severyn
Text-editing models have recently become a prominent alternative to seq2seq models for monolingual text-generation tasks such as grammatical error correction, simplification, and style transfer.
no code implementations • NAACL 2022 • Felix Stahlberg, Shankar Kumar
The softmax layer in neural machine translation is designed to model the distribution over mutually exclusive tokens.
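A small sketch contrasts the two regimes (the sigmoid alternative is shown only as one possible relaxation, not necessarily the paper's method): softmax normalizes across the vocabulary so tokens compete, while independent sigmoids can score several tokens highly at once:

```python
# Softmax couples all tokens into one mutually exclusive distribution;
# independent sigmoids score each token on its own (illustrative).
import torch

logits = torch.tensor([2.0, 1.0, 0.5, -1.0])   # one score per vocab token

softmax_probs = torch.softmax(logits, dim=-1)  # sums to 1: tokens compete
sigmoid_scores = torch.sigmoid(logits)         # each in (0,1) independently

print(softmax_probs.sum())   # tensor(1.)
print(sigmoid_scores.sum())  # generally != 1
```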
no code implementations • ACL 2022 • Felix Stahlberg, Ilia Kulikov, Shankar Kumar
In many natural language processing (NLP) tasks the same input (e.g., source sentence) can have multiple possible outputs (e.g., translations).
no code implementations • FL4NLP (ACL) 2022 • Jae Hun Ro, Theresa Breiner, Lara McConnaughey, Mingqing Chen, Ananda Theertha Suresh, Shankar Kumar, Rajiv Mathews
Most studies in cross-device federated learning focus on small models, due to the server-client communication and on-device computation bottlenecks.
no code implementations • 9 Mar 2022 • W. Ronny Huang, Cal Peyser, Tara N. Sainath, Ruoming Pang, Trevor Strohman, Shankar Kumar
We down-select a large corpus of web search queries by a factor of 53 and achieve better LM perplexities than without down-selection.
no code implementations • 16 Feb 2022 • Hao Zhang, You-Chi Cheng, Shankar Kumar, W. Ronny Huang, Mingqing Chen, Rajiv Mathews
Capitalization normalization (truecasing) is the task of restoring the correct case (uppercase or lowercase) of noisy text.
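For illustration only (not the paper's model), a frequency-based baseline truecaser picks each word's most common surface form from a cased corpus:

```python
# Toy frequency-based truecaser: restore each word's most frequent
# casing seen in cased training text (corpus here is invented).
from collections import Counter, defaultdict

train = "I met Smith in Paris . Smith was late .".split()

counts = defaultdict(Counter)
for w in train:
    counts[w.lower()][w] += 1

def truecase(tokens):
    return [counts[t.lower()].most_common(1)[0][0] if t.lower() in counts
            else t for t in tokens]

print(truecase("i saw smith in paris".split()))
# ['I', 'saw', 'Smith', 'in', 'Paris']
```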
no code implementations • 1 Feb 2022 • Jae Hun Ro, Felix Stahlberg, Ke Wu, Shankar Kumar
Text normalization, or the process of transforming text into a consistent, canonical form, is crucial for speech applications such as text-to-speech synthesis (TTS).
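A toy rule-based normalizer illustrates the task (the paper studies learned models; the rules and token lists below are invented for the example):

```python
# Toy rule-based verbalizer for TTS-style text normalization
# (illustrative only; real systems need far broader coverage).
import re

NUM = {"1": "one", "2": "two", "3": "three", "10": "ten"}
ABBR = {"dr.": "doctor", "st.": "street"}

def normalize(text):
    out = []
    for t in text.lower().split():
        if t in ABBR:
            out.append(ABBR[t])
        elif re.fullmatch(r"\d+", t) and t in NUM:
            out.append(NUM[t])
        else:
            out.append(t)
    return " ".join(out)

print(normalize("Dr. Smith lives at 10 Main St."))
# 'doctor smith lives at ten main street'
```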
no code implementations • 26 Aug 2021 • Hao Zhang, You-Chi Cheng, Shankar Kumar, Mingqing Chen, Rajiv Mathews
Truecasing is the task of restoring the correct case (uppercase or lowercase) of noisy text produced either by automatic systems, such as speech recognition or machine translation, or by humans.
1 code implementation • EACL (BEA) 2021 • Felix Stahlberg, Shankar Kumar
Synthetic data generation is widely known to boost the accuracy of neural grammatical error correction (GEC) systems, but existing methods often lack diversity or are too simplistic to generate the broad range of grammatical errors made by human writers.
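A sketch of the kind of simple corruption recipe such methods rely on (the specific operations below are invented for illustration): randomly perturb clean sentences to produce (noisy source, clean target) training pairs:

```python
# Simplistic synthetic-data recipe for GEC: randomly corrupt clean text
# to create (source, target) pairs. Illustrative of the baseline style
# of data generation; the paper's approach is richer.
import random

random.seed(0)

def corrupt(tokens, p=0.15):
    out = []
    for t in tokens:
        r = random.random()
        if r < p / 3:
            continue                  # deletion error
        elif r < 2 * p / 3:
            out.extend([t, t])        # duplication error
        elif r < p:
            out.append(t.swapcase())  # casing error
        else:
            out.append(t)
    return out

clean = "She went to the store yesterday".split()
print(corrupt(clean), "->", clean)
```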
no code implementations • 9 Apr 2021 • W. Ronny Huang, Tara N. Sainath, Cal Peyser, Shankar Kumar, David Rybach, Trevor Strohman
We introduce Lookup-Table Language Models (LookupLM), a method for scaling up the size of RNN language models with only a constant increase in floating point operations, by increasing the expressivity of the embedding table.
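A rough sketch of the lookup idea (the hash-based keying below is an assumption for illustration): embedding the trailing n-gram is a single table lookup, so growing the table adds capacity without adding per-step floating point operations:

```python
# Sketch of n-gram lookup embeddings: one O(1) table lookup per step,
# regardless of table size (hashing scheme is invented for the example).
import torch
from torch import nn

class NgramLookupEmbedding(nn.Module):
    def __init__(self, table_size, dim, n=2):
        super().__init__()
        self.table = nn.Embedding(table_size, dim)
        self.table_size = table_size
        self.n = n

    def forward(self, token_ids):
        # Hash the trailing n-gram of token ids to one table row.
        key = sum(int(t) * 31 ** i for i, t in enumerate(token_ids[-self.n:]))
        return self.table(torch.tensor(key % self.table_size))

emb = NgramLookupEmbedding(table_size=100_000, dim=64)
vec = emb([17, 42, 305])   # single lookup, independent of table size
```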
1 code implementation • EMNLP 2020 • Felix Stahlberg, Shankar Kumar
For text normalization, sentence fusion, and grammatical error correction, our approach improves explainability by associating each edit operation with a human-readable tag.
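A minimal sketch of tagged span-level edits (the tag names and edit format here are invented for illustration): each operation carries a human-readable tag explaining why the span changes:

```python
# Apply span-level edits of the form (tag, start, end, replacement);
# the tag (e.g. 'VERB_FORM') documents the reason for the change.
def apply_edits(tokens, edits):
    out, cursor = [], 0
    for tag, start, end, repl in sorted(edits, key=lambda e: e[1]):
        out.extend(tokens[cursor:start])
        out.extend(repl)
        cursor = end
    out.extend(tokens[cursor:])
    return out

src = "he go to school yesterday".split()
edits = [("VERB_FORM", 1, 2, ["went"])]
print(" ".join(apply_edits(src, edits)))  # 'he went to school yesterday'
```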
no code implementations • 24 Aug 2020 • Cal Peyser, Sepand Mavandadi, Tara N. Sainath, James Apfel, Ruoming Pang, Shankar Kumar
End-to-end (E2E) automatic speech recognition (ASR) systems lack the distinct language model (LM) component that characterizes traditional speech systems.
no code implementations • 7 Aug 2020 • Jared Lichtarge, Chris Alberti, Shankar Kumar
Recent progress in the task of Grammatical Error Correction (GEC) has been driven by addressing data sparsity, both through new methods for generating large and noisy pretraining data and through the publication of small and higher-quality finetuning data in the BEA-2019 shared task.
5 code implementations • 7 Feb 2020 • Qian Zhang, Han Lu, Hasim Sak, Anshuman Tripathi, Erik McDermott, Stephen Koo, Shankar Kumar
We present results on the LibriSpeech dataset showing that limiting the left context for self-attention in the Transformer layers makes decoding computationally tractable for streaming, with only a slight degradation in accuracy.
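A small sketch of the masking idea: query position i attends only to positions i-w through i, so per-step attention cost is bounded by the window size rather than the full history:

```python
# Self-attention mask with a limited left context window (illustrative):
# True entries mark key positions each query position may attend to.
import torch

def limited_left_context_mask(seq_len, window):
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    return (j <= i) & (j >= i - window)

print(limited_left_context_mask(5, window=2).int())
# Each row has at most window + 1 ones: itself plus 2 frames of left context.
```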
no code implementations • NAACL 2019 • Jared Lichtarge, Chris Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar, Simon Tong
We provide systematic analysis that compares the two approaches to data generation and highlights the effectiveness of ensembling.
no code implementations • 7 Mar 2019 • Antonios Anastasopoulos, Shankar Kumar, Hank Liao
We report analysis that provides insights into why our multimodal language model improves upon a standard RNN language model.
2 code implementations • 21 Feb 2019 • Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon
Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus on sequence-to-sequence models.
no code implementations • 31 Oct 2018 • Jared Lichtarge, Christopher Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar
We describe an approach to Grammatical Error Correction (GEC) that is effective at making use of models trained on large amounts of weakly supervised bitext.
no code implementations • 5 Dec 2017 • Tara N. Sainath, Rohit Prabhavalkar, Shankar Kumar, Seungji Lee, Anjuli Kannan, David Rybach, Vlad Schogol, Patrick Nguyen, Bo Li, Yonghui Wu, Zhifeng Chen, Chung-Cheng Chiu
However, there has been little prior work comparing phoneme-based and grapheme-based sub-word units in the end-to-end modeling framework, to determine whether the gains from such approaches are primarily due to the new probabilistic model or to the joint learning of the various components with grapheme-based units.
no code implementations • 15 Nov 2017 • Shankar Kumar, Michael Nirschl, Daniel Holtmann-Rice, Hank Liao, Ananda Theertha Suresh, Felix Yu
Recurrent neural network (RNN) language models (LMs) and Long Short-Term Memory (LSTM) LMs, a variant of RNN LMs, have been shown to outperform traditional n-gram LMs on speech recognition tasks.
no code implementations • 23 Jun 2016 • Babak Damavandi, Shankar Kumar, Noam Shazeer, Antoine Bruguier
The model is trained using noise contrastive estimation (NCE), an approach that transforms the estimation problem of neural networks into one of binary classification between data samples and noise samples.
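A hedged sketch of the NCE objective (simplified, with illustrative notation): classify observed words against k samples from a noise distribution, reducing density estimation to binary classification:

```python
# Simplified NCE loss: logistic classification of data words vs. k noise
# samples, using logits s(w) - log(k * q(w)) for noise distribution q.
import math
import torch
import torch.nn.functional as F

def nce_loss(data_scores, noise_scores, log_q_data, log_q_noise, k):
    data_logits = data_scores - (math.log(k) + log_q_data)
    noise_logits = noise_scores - (math.log(k) + log_q_noise)
    pos = F.binary_cross_entropy_with_logits(
        data_logits, torch.ones_like(data_logits))
    neg = F.binary_cross_entropy_with_logits(
        noise_logits, torch.zeros_like(noise_logits))
    return pos + k * neg  # 'neg' is a mean over k samples; k * mean = sum

# One observed word (model score, log q) against k=4 noise samples.
loss = nce_loss(torch.tensor([1.3]), torch.randn(4),
                torch.tensor([-6.0]), torch.randn(4) - 6.0, k=4)
```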
no code implementations • HLT 2015 • Manaal Faruqui, Shankar Kumar
Open domain relation extraction systems identify relation and argument phrases in a sentence without relying on any underlying schema.