3 code implementations • 11 Dec 2013 • Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, Tony Robinson
We propose a new benchmark corpus to be used for measuring progress in statistical language modeling.
Ranked #22 on Language Modelling on One Billion Word
2 code implementations • 21 Feb 2019 • Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon
Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus on sequence-to-sequence models.
no code implementations • 31 Mar 2017 • Ciprian Chelba, Mohammad Norouzi, Samy Bengio
Experiments on a small corpus (UPenn Treebank, one million words of training data and 10k vocabulary) have found the LSTM cell with dropout to be the best model for encoding the $n$-gram state when compared with feed-forward and vanilla RNN models.
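For illustration, here is a minimal sketch of the setup being compared: an LSTM that is fed only the previous $n-1$ tokens (the $n$-gram state) and predicts the next token. Names, sizes, and the dropout placement below are illustrative assumptions, not taken from the paper.

# Hypothetical sketch: estimating n-gram probabilities with an LSTM that only
# sees the previous n-1 tokens; all hyperparameters here are illustrative.
import torch
import torch.nn as nn

class NGramLSTM(nn.Module):
    def __init__(self, vocab_size=10000, order=5, embed_dim=128, hidden_dim=256, dropout=0.5):
        super().__init__()
        self.order = order                        # n in the n-gram assumption
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.drop = nn.Dropout(dropout)           # dropout on the LSTM output
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context):
        # context: (batch, order - 1) token ids -- the truncated n-gram history
        h, _ = self.lstm(self.embed(context))
        logits = self.proj(self.drop(h[:, -1]))   # predict the next token from the last state
        return logits.log_softmax(dim=-1)

model = NGramLSTM()
history = torch.randint(0, 10000, (2, 4))         # two 4-token histories (order - 1 = 4)
log_probs = model(history)                        # (2, vocab_size) next-token log-probabilities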
no code implementations • 5 Nov 2015 • Ciprian Chelba, Fernando Pereira
In experiments on the One Billion Word language modeling benchmark, we are able to slightly improve on our previous results, which use a different loss function and employ leave-one-out training on a subset of the main training set.
no code implementations • 3 Dec 2014 • Noam Shazeer, Joris Pelemans, Ciprian Chelba
We present a novel family of language model (LM) estimation techniques named Sparse Non-negative Matrix (SNM) estimation.
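As a rough illustration of the SNM idea (the features, toy values, and function names below are hypothetical, not the paper's estimator): the probability of the next word is proportional to a sum of non-negative matrix entries indexed by the skip-/$n$-gram features that fire in the history.

# Minimal illustrative sketch of sum-and-normalize scoring over a sparse
# non-negative "matrix" M[feature][word]; in practice these entries are learned,
# here they are toy values chosen only to make the example run.
from collections import defaultdict

M = {
    ("bigram", "the"): {"cat": 0.6, "dog": 0.3, "car": 0.1},
    ("skip", "saw"):   {"cat": 0.5, "dog": 0.4, "car": 0.1},
}

def snm_prob(history_features, word, vocab):
    scores = defaultdict(float)
    for f in history_features:
        for w, val in M.get(f, {}).items():
            scores[w] += val                        # non-negative entries, so sums stay non-negative
    z = sum(scores[w] for w in vocab) or 1.0        # normalize over the vocabulary
    return scores[word] / z

vocab = ["cat", "dog", "car"]
features = [("bigram", "the"), ("skip", "saw")]     # features extracted from a history like "... saw the _"
print(snm_prob(features, "cat", vocab))             # ~0.55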
Ranked #23 on Language Modelling on One Billion Word
no code implementations • NeurIPS 2018 • Patrick H. Chen, Si Si, Yang Li, Ciprian Chelba, Cho-Jui Hsieh
Model compression is essential for serving large deep neural nets on devices with limited resources or applications that require real-time responses.
no code implementations • WS 2018 • Wei Wang, Taro Watanabe, Macduff Hughes, Tetsuji Nakagawa, Ciprian Chelba
Measuring domain relevance of data and identifying or selecting well-fit domain data for machine translation (MT) is a well-studied topic, but denoising has not yet received the same attention.
no code implementations • TACL 2016 • Joris Pelemans, Noam Shazeer, Ciprian Chelba
We evaluate SNM language models on two corpora: the One Billion Word Benchmark and a subset of the LDC English Gigaword corpus.
no code implementations • 3 Jun 2019 • Wei Wang, Isaac Caswell, Ciprian Chelba
Noise and domain are important aspects of data quality for neural machine translation.
no code implementations • WS 2019 • Isaac Caswell, Ciprian Chelba, David Grangier
Recent work in Neural Machine Translation (NMT) has shown significant quality gains from noised-beam decoding during back-translation, a method to generate synthetic parallel data.
no code implementations • ACL 2019 • Wei Wang, Isaac Caswell, Ciprian Chelba
Noise and domain are important aspects of data quality for neural machine translation.
no code implementations • 14 Jan 2020 • Ciprian Chelba, Mia Chen, Ankur Bapna, Noam Shazeer
Motivated by the fact that most of the information relevant to the prediction of target tokens is drawn from the source sentence $S=s_1, \ldots, s_S$, we propose truncating the target-side window used for computing self-attention by making an $N$-gram assumption.
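A hedged sketch of the resulting decoder-side attention pattern (shapes and function names are illustrative, not the paper's implementation): standard causal self-attention, except that each target position may attend only to itself and the previous $N-1$ target tokens.

# Illustrative sketch of causal self-attention truncated to an N-gram window.
import torch

def ngram_causal_mask(seq_len, n):
    i = torch.arange(seq_len).unsqueeze(1)    # query positions
    j = torch.arange(seq_len).unsqueeze(0)    # key positions
    # allow j <= i (causal) and i - j < n (within the N-gram window)
    return (j <= i) & (i - j < n)

def truncated_self_attention(q, k, v, n):
    # q, k, v: (batch, seq_len, dim)
    d = q.size(-1)
    scores = q @ k.transpose(-1, -2) / d ** 0.5
    mask = ngram_causal_mask(q.size(1), n).to(scores.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 7, 16)
out = truncated_self_attention(q, k, v, n=4)  # each token attends to at most 4 target tokens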
no code implementations • 2 May 2020 • Junpei Zhou, Ciprian Chelba, Yuezhang Li
Sentence level quality estimation (QE) for machine translation (MT) attempts to predict the translation edit rate (TER) cost of post-editing work required to correct MT output.
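For reference (this definition is standard and not specific to the paper), TER counts the edits needed to turn the MT output into the reference, normalized by reference length: $\mathrm{TER} = \#\text{edits} / \text{average}\ \#\text{reference words}$, where edits include insertions, deletions, substitutions, and phrase shifts.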
no code implementations • NeurIPS 2020 • Hongge Chen, Si Si, Yang Li, Ciprian Chelba, Sanjiv Kumar, Duane Boning, Cho-Jui Hsieh
With this score, we can identify the pretraining examples in the pretraining task that contribute most to a prediction in the finetuning task.
no code implementations • 26 Oct 2020 • Ciprian Chelba, Junpei Zhou, Yuezhang Li, Hideto Kazawa, Jeff Klingner, Mengmeng Niu
For an English-Spanish translation model operating at $SACC = 0.89$ according to a non-expert annotator pool, we can derive a confidence estimate that labels 0.5-0.6 of the $good$ translations in an "in-domain" test set with 0.95 Precision.
no code implementations • ICLR 2022 • Behrooz Ghorbani, Orhan Firat, Markus Freitag, Ankur Bapna, Maxim Krikun, Xavier Garcia, Ciprian Chelba, Colin Cherry
We show that cross-entropy loss as a function of model size follows a certain scaling law.
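As an illustration only (the paper derives the exact functional form, including separate encoder and decoder terms), scaling laws of this kind are commonly written as a power law in the parameter count $N$: $L(N) \approx \alpha N^{-p} + L_\infty$, where $L_\infty$ is the irreducible loss and $\alpha$, $p$ are fitted constants.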
no code implementations • 16 Nov 2022 • Chris Alberti, Kuzman Ganchev, Michael Collins, Sebastian Gehrmann, Ciprian Chelba
Compared to a baseline that generates text using greedy search, we demonstrate two techniques that improve the fluency and semantic accuracy of the generated text: the first technique samples multiple candidate text sequences from which the semantic parser chooses.
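A hedged sketch of that first technique as described in the snippet: draw several candidate sequences and keep the one the semantic parser scores highest. The function names and the toy scorer below are hypothetical stand-ins, not the paper's API.

# Sample-and-choose sketch with toy stand-ins so it runs end to end.
import random

def sample_and_choose(generate_fn, parser_score, num_samples=8):
    # Draw several candidates, then keep the one the semantic parser prefers.
    candidates = [generate_fn() for _ in range(num_samples)]
    return max(candidates, key=parser_score)

toy_outputs = ["flights from JFK", "flights to JFK", "JFK flights from"]
chosen = sample_and_choose(lambda: random.choice(toy_outputs),
                           parser_score=lambda s: -len(s))  # toy scorer: pretend shorter parses better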
no code implementations • 31 Mar 2023 • Rami Botros, Rohit Prabhavalkar, Johan Schalkwyk, Ciprian Chelba, Tara N. Sainath, Françoise Beaufays
Overall, they present a modular, powerful and cheap alternative to the standard encoder output, as well as the N-best hypotheses.