Search Results for author: Ciprian Chelba

Found 19 papers, 2 papers with code

Towards Computationally Verifiable Semantic Grounding for Language Models

no code implementations • 16 Nov 2022 • Chris Alberti, Kuzman Ganchev, Michael Collins, Sebastian Gehrmann, Ciprian Chelba

Compared to a baseline that generates text using greedy search, we demonstrate two techniques that improve the fluency and semantic accuracy of the generated text: The first technique samples multiple candidate text sequences from which the semantic parser chooses.

Language Modelling
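
A minimal sketch of the sample-and-rerank idea described in the excerpt above, assuming a hypothetical LM sampler `sample_candidates` and semantic parser `parse_to_mr`; the exact-match scoring is an illustrative simplification, not the paper's implementation.

```python
# Sketch: sample several candidate texts and let a semantic parser pick the one
# whose parse matches the input meaning representation (MR).
# `sample_candidates` and `parse_to_mr` are hypothetical stand-ins.

def rerank_by_semantic_accuracy(source_mr, sample_candidates, parse_to_mr, num_samples=8):
    """Sample candidate texts and keep the one whose parse matches the input MR."""
    candidates = sample_candidates(source_mr, num_samples)  # list of generated strings
    best_text, best_score = None, -1.0
    for text in candidates:
        predicted_mr = parse_to_mr(text)                    # parse candidate back to an MR
        score = 1.0 if predicted_mr == source_mr else 0.0   # exact-match semantic accuracy (simplistic)
        if score > best_score:
            best_text, best_score = text, score
    return best_text
```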

Data Troubles in Sentence Level Confidence Estimation for Machine Translation

no code implementations • 26 Oct 2020 • Ciprian Chelba, Junpei Zhou, Yuezhang Li, Hideto Kazawa, Jeff Klingner, Mengmeng Niu

For an English-Spanish translation model operating at $SACC = 0.89$ according to a non-expert annotator pool, we can derive a confidence estimate that labels 0.5-0.6 of the $good$ translations in an "in-domain" test set with 0.95 Precision.

Machine Translation, Sentence, +1
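
The claim above can be read as a precision/coverage trade-off. Below is a minimal sketch, on placeholder scores and labels rather than the paper's data, of choosing a confidence threshold that labels translations as good at a target precision and reporting the fraction of truly good translations it covers; it does not reproduce the paper's estimator.

```python
# Sketch: sweep a confidence threshold and report the best coverage of truly good
# translations achievable while keeping precision at or above the target.

def coverage_at_precision(scores, is_good, target_precision=0.95):
    pairs = sorted(zip(scores, is_good), reverse=True)  # highest confidence first
    accepted_good = accepted_total = 0
    total_good = sum(is_good)
    best_coverage = 0.0
    for score, good in pairs:
        accepted_total += 1
        accepted_good += int(good)
        precision = accepted_good / accepted_total
        if precision >= target_precision:
            best_coverage = accepted_good / total_good
    return best_coverage  # e.g. roughly 0.5-0.6 in the paper's in-domain setting
```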

Multi-Stage Influence Function

no code implementations • NeurIPS 2020 • Hongge Chen, Si Si, Yang Li, Ciprian Chelba, Sanjiv Kumar, Duane Boning, Cho-Jui Hsieh

With this score, we can identify the pretraining examples in the pretraining task that contribute most to a prediction in the finetuning task.

Transfer Learning
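
A rough sketch of ranking pretraining examples by a first-order influence-style score, assuming hypothetical callables that return flattened gradients of the pretraining loss and of the finetuning test loss with respect to the shared parameters. True (multi-stage) influence functions also involve inverse-Hessian terms, which are omitted here.

```python
import numpy as np

# Sketch: crude first-order stand-in for an influence score.
# `grad_pretrain_loss` and `grad_finetune_test_loss` are hypothetical callables
# returning flattened gradients w.r.t. the shared (pretrained) parameters.

def approx_influence(pretrain_example, test_example,
                     grad_pretrain_loss, grad_finetune_test_loss):
    g_train = grad_pretrain_loss(pretrain_example)   # gradient of pretraining loss
    g_test = grad_finetune_test_loss(test_example)   # gradient of finetuning test loss
    return float(np.dot(g_train, g_test))            # large positive => example helps the prediction

def most_influential(pretrain_set, test_example,
                     grad_pretrain_loss, grad_finetune_test_loss, k=10):
    scores = [(approx_influence(x, test_example, grad_pretrain_loss, grad_finetune_test_loss), i)
              for i, x in enumerate(pretrain_set)]
    return sorted(scores, reverse=True)[:k]          # top-k (score, index) pairs
```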

Practical Perspectives on Quality Estimation for Machine Translation

no code implementations • 2 May 2020 • Junpei Zhou, Ciprian Chelba, Yuezhang Li

Sentence level quality estimation (QE) for machine translation (MT) attempts to predict the translation edit rate (TER) cost of post-editing work required to correct MT output.

Binary Classification, General Classification, +4
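
For reference, a minimal sketch of a simplified TER-style cost, word-level edit distance divided by reference length, which is the kind of target a sentence-level QE model is trained to predict; real TER also allows phrase shifts, which are not modeled here.

```python
# Sketch: simplified translation edit rate = word-level edit distance / reference length.

def simple_ter(hypothesis, reference):
    hyp, ref = hypothesis.split(), reference.split()
    m, n = len(hyp), len(ref)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[m][n] / max(n, 1)
```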

Faster Transformer Decoding: N-gram Masked Self-Attention

no code implementations • 14 Jan 2020 • Ciprian Chelba, Mia Chen, Ankur Bapna, Noam Shazeer

Motivated by the fact that most of the information relevant to the prediction of target tokens is drawn from the source sentence $S=s_1, \ldots, s_S$, we propose truncating the target-side window used for computing self-attention by making an $N$-gram assumption.

Sentence
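
A minimal sketch of the masking idea: a causal self-attention mask truncated so that each target position attends only to the last N positions, including itself. This only illustrates the mask, not the paper's Transformer implementation.

```python
import numpy as np

# Sketch: causal self-attention mask truncated to an N-gram window.
# True = attention allowed.

def ngram_causal_mask(seq_len, n):
    """Return a (seq_len, seq_len) boolean mask limiting attention to the last n positions."""
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    return (j <= i) & (j > i - n)

print(ngram_causal_mask(5, 3).astype(int))
```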

Tagged Back-Translation

no code implementations • WS 2019 • Isaac Caswell, Ciprian Chelba, David Grangier

Recent work in Neural Machine Translation (NMT) has shown significant quality gains from noised-beam decoding during back-translation, a method to generate synthetic parallel data.

Machine Translation, NMT, +1
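
A minimal sketch of the tagging idea: synthetic back-translated source sentences get a reserved token prepended so the NMT model can distinguish them from genuine bitext. The tag string used here is illustrative, not necessarily the paper's.

```python
# Sketch: prepend a reserved tag token to back-translated (synthetic) source sentences.

BT_TAG = "<BT>"  # illustrative reserved token

def tag_back_translated(source_sentences):
    return [f"{BT_TAG} {s}" for s in source_sentences]

synthetic = tag_back_translated(["el gato duerme", "hola mundo"])
# ['<BT> el gato duerme', '<BT> hola mundo']
```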

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

2 code implementations • 21 Feb 2019 • Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon

Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus on sequence-to-sequence models.

Sequence-To-Sequence, Speech Recognition

Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection

no code implementations • WS 2018 • Wei Wang, Taro Watanabe, Macduff Hughes, Tetsuji Nakagawa, Ciprian Chelba

Measuring the domain relevance of data and identifying or selecting well-fit domain data for machine translation (MT) is a well-studied topic, but denoising has received much less attention.

Denoising, Machine Translation, +2
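
One plausible reading of online data selection for denoising, sketched below: score each noisy sentence pair by how much more a trusted-data model prefers it over a model trained on the noisy pool, and keep the top fraction of each batch. The scoring callables are hypothetical, and the paper's exact criterion and schedule are not reproduced here.

```python
# Sketch: keep the sentence pairs that look most "trusted-like" according to the
# difference in log-probability between a trusted-data model and a noisy-data model.
# `logp_trusted` and `logp_noisy` are hypothetical scoring callables.

def select_batch(pairs, logp_trusted, logp_noisy, keep_fraction=0.5):
    scored = [(logp_trusted(src, tgt) - logp_noisy(src, tgt), (src, tgt))
              for src, tgt in pairs]
    scored.sort(reverse=True)                        # highest trusted-vs-noisy margin first
    keep = max(1, int(len(scored) * keep_fraction))
    return [pair for _, pair in scored[:keep]]
```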

GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking

no code implementations • NeurIPS 2018 • Patrick H. Chen, Si Si, Yang Li, Ciprian Chelba, Cho-Jui Hsieh

Model compression is essential for serving large deep neural nets on devices with limited resources or applications that require real-time responses.

Language Modelling, Model Compression, +1
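
A minimal sketch of block-wise low-rank approximation with NumPy: rows of an embedding (or softmax) matrix are grouped, for example by word frequency, and each block gets its own truncated SVD so frequent words can keep a higher rank than rare ones. The grouping and ranks below are illustrative, not the paper's settings.

```python
import numpy as np

# Sketch: per-block truncated SVD of an embedding/softmax matrix.

def blockwise_low_rank(weight, row_blocks, ranks):
    """weight: (V, d) matrix; row_blocks: list of row-index arrays; ranks: rank per block."""
    approx = np.zeros_like(weight)
    for rows, r in zip(row_blocks, ranks):
        block = weight[rows]
        u, s, vt = np.linalg.svd(block, full_matrices=False)
        approx[rows] = (u[:, :r] * s[:r]) @ vt[:r]   # rank-r reconstruction of this block
    return approx

V, d = 1000, 64
W = np.random.randn(V, d)
blocks = [np.arange(0, 100), np.arange(100, 1000)]   # "frequent" vs "rare" rows (illustrative)
W_hat = blockwise_low_rank(W, blocks, ranks=[32, 8])
```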

N-gram Language Modeling using Recurrent Neural Network Estimation

no code implementations • 31 Mar 2017 • Ciprian Chelba, Mohammad Norouzi, Samy Bengio

Experiments on a small corpus (UPenn Treebank, one million words of training data and 10k vocabulary) have found the LSTM cell with dropout to be the best model for encoding the $n$-gram state when compared with feed-forward and vanilla RNN models.

Language Modelling, Sentence
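
A minimal sketch of the n-gram assumption applied to a recurrent LM: the model conditions only on the last n-1 tokens rather than the full history, so the RNN/LSTM effectively encodes an n-gram state. `score_next_token` is a hypothetical wrapper around such a model; the paper's training setup is not shown.

```python
import math

# Sketch: sentence log-probability under the n-gram truncation, i.e.
# sum over t of log P(w_t | last n-1 tokens). `score_next_token(context, word)`
# is a hypothetical callable returning a probability from an LSTM LM.

def sentence_logprob_ngram(tokens, score_next_token, n=5):
    total = 0.0
    for t, w in enumerate(tokens):
        context = tokens[max(0, t - n + 1):t]   # truncate history to the n-gram context
        total += math.log(score_next_token(context, w))
    return total
```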

Sparse Non-negative Matrix Language Modeling

no code implementations • TACL 2016 • Joris Pelemans, Noam Shazeer, Ciprian Chelba

We evaluate SNM language models on two corpora: the One Billion Word Benchmark and a subset of the LDC English Gigaword corpus.

Automatic Speech Recognition (ASR), Language Modelling, +1

Multinomial Loss on Held-out Data for the Sparse Non-negative Matrix Language Model

no code implementations • 5 Nov 2015 • Ciprian Chelba, Fernando Pereira

In experiments on the One Billion Word language modeling benchmark, we are able to slightly improve on our previous results, which use a different loss function and employ leave-one-out training on a subset of the main training set.

Language Modelling

Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation

no code implementations • 3 Dec 2014 • Noam Shazeer, Joris Pelemans, Ciprian Chelba

We present a novel family of language model (LM) estimation techniques named Sparse Non-negative Matrix (SNM) estimation.

Language Modelling
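
A minimal sketch of the SNM idea: a sparse matrix of non-negative weights indexed by (context feature, word), where the probability of a word given a history is the sum of the weights of the history's active features, normalized over the vocabulary. The skip-gram featurization below is a simplified guess for illustration, not the paper's feature set.

```python
# Sketch: sparse non-negative matrix probability estimation with toy skip-gram features.

def skip_features(history, max_span=3):
    """Active context features: recent-word suffixes plus one simple skip feature."""
    feats = []
    for k in range(1, min(max_span, len(history)) + 1):
        feats.append(("suffix", tuple(history[-k:])))
    if len(history) >= 2:
        feats.append(("skip", history[-2]))   # skip the immediately preceding word
    return feats

def snm_prob(history, word, weights, vocab):
    """weights: dict mapping (feature, word) -> non-negative weight."""
    feats = skip_features(history)
    score = sum(weights.get((f, word), 0.0) for f in feats)
    norm = sum(sum(weights.get((f, v), 0.0) for f in feats) for v in vocab)
    return score / norm if norm > 0 else 1.0 / len(vocab)   # uniform fallback
```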
