no code implementations • 17 Feb 2023 • Ke Hu, Tara N. Sainath, Bo Li, Nan Du, Yanping Huang, Andrew M. Dai, Yu Zhang, Rodrigo Cabrera, Zhifeng Chen, Trevor Strohman
In this work, we propose to train a single multilingual language model (LM) for shallow fusion in multiple languages.
Automatic Speech Recognition (ASR) +2
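The shallow-fusion idea above can be sketched as follows. This is a toy illustration of the standard formulation, not code from the paper: during beam search, each candidate token's ASR score is combined with an external LM score as log p_asr + λ·log p_lm (the weight λ and the function name here are illustrative).

```python
import math

def shallow_fusion_score(asr_log_probs, lm_log_probs, lm_weight=0.3):
    """Combine per-token ASR and LM log-probabilities for candidate tokens.

    Shallow fusion rescores each candidate during beam search as
    log p_asr(y_t | x, y_<t) + lm_weight * log p_lm(y_t | y_<t).
    """
    return [a + lm_weight * l for a, l in zip(asr_log_probs, lm_log_probs)]

# Toy example: two candidate tokens the ASR model nearly ties on;
# the language model breaks the tie toward the second candidate.
asr = [math.log(0.50), math.log(0.48)]
lm = [math.log(0.10), math.log(0.60)]
fused = shallow_fusion_score(asr, lm, lm_weight=0.3)
best = max(range(2), key=lambda i: fused[i])
```

A single multilingual LM, as in the paper, would simply supply `lm_log_probs` for whichever language is being decoded, instead of requiring one fusion LM per language.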
no code implementations • 11 Oct 2022 • Ruibo Liu, Jason Wei, Shixiang Shane Gu, Te-Yen Wu, Soroush Vosoughi, Claire Cui, Denny Zhou, Andrew M. Dai
By training solely on written text, current language models (LMs) miss the grounded experience of humans in the real world -- their failure to relate language to the physical world causes knowledge to be misrepresented and obvious mistakes in their reasoning.
5 code implementations • Google Research 2022 • Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, Noah Fiedel
To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call the Pathways Language Model (PaLM).
Ranked #1 on Auto Debugging on Big-bench Lite
no code implementations • 14 Dec 2021 • BoWen Zhang, Jiahui Yu, Christopher Fifty, Wei Han, Andrew M. Dai, Ruoming Pang, Fei Sha
We term this approach Co-training Videos and Images for Action Recognition (CoVeR).
Ranked #6 on Action Classification on Moments in Time (using extra training data)
no code implementations • 13 Dec 2021 • Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, Barret Zoph, Liam Fedus, Maarten Bosma, Zongwei Zhou, Tao Wang, Yu Emma Wang, Kellie Webster, Marie Pellat, Kevin Robinson, Kathleen Meier-Hellstern, Toju Duke, Lucas Dixon, Kun Zhang, Quoc V Le, Yonghui Wu, Zhifeng Chen, Claire Cui
Scaling language models with more data, compute and parameters has driven significant progress in natural language processing.
Ranked #5 on Common Sense Reasoning on ARC (Easy) (using extra training data)
3 code implementations • ICLR 2022 • Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le
We show that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially improves zero-shot performance on unseen tasks.
Ranked #1 on Question Answering on StoryCloze
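The core data transformation behind instruction tuning can be sketched as below. The templates, field names, and label mapping are hypothetical, purely to illustrate turning a labeled example (here, entailment) into an instruction-following (input, target) pair:

```python
# Hypothetical instruction templates for an entailment task.
# Instruction tuning phrases each supervised example as a natural-language
# instruction, then finetunes the LM on many such tasks.
TEMPLATES = [
    "Premise: {premise}\nHypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis?",
    "{premise}\nBased on the paragraph above, "
    'can we conclude that "{hypothesis}"?',
]

def to_instruction_example(premise, hypothesis, label, template_id=0):
    """Render one labeled example as an (input, target) text pair."""
    prompt = TEMPLATES[template_id].format(premise=premise, hypothesis=hypothesis)
    target = ["yes", "no"][label]  # hypothetical label-to-text mapping
    return {"input": prompt, "target": target}

ex = to_instruction_example("A dog runs in the park.", "An animal is outside.", 0)
```

Varying the template per example (as the paper's task mixtures do) discourages the model from memorizing one surface form and encourages generalization to unseen instructions.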
1 code implementation • 17 Jul 2021 • Anand Avati, Martin Seneviratne, Emily Xue, Zhen Xu, Balaji Lakshminarayanan, Andrew M. Dai
Most ML approaches focus on generalization performance on unseen data that are similar to the training data (In-Distribution, or IND).
no code implementations • 3 Feb 2021 • Zhen Xu, David R. So, Andrew M. Dai
One important challenge of applying deep learning to electronic health records (EHR) is the complexity of their multimodal structure.
no code implementations • NeurIPS 2020 • Yuan Xue, Nan Du, Anne Mottram, Martin Seneviratne, Andrew M. Dai
The paradigm of 'pretraining' from a set of relevant auxiliary tasks and then 'finetuning' on a target task has been successfully applied in many different domains.
1 code implementation • 22 Oct 2020 • Murphy Yuezhen Niu, Andrew M. Dai, Li Li, Augustus Odena, Zhengli Zhao, Vadim Smelyanskyi, Hartmut Neven, Sergio Boixo
Given a quantum circuit, a quantum computer can sample the output distribution exponentially faster in the number of bits than classical computers.
1 code implementation • ICLR 2021 • Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew M. Dai, Dustin Tran
Recent approaches to efficiently ensemble neural networks have shown that strong robustness and uncertainty performance can be achieved with a negligible gain in parameters over the original network.
no code implementations • 10 Jul 2020 • Kamil Nar, Yuan Xue, Andrew M. Dai
When training the parameters of a linear dynamical model, the gradient descent algorithm is likely to fail to converge if the squared-error loss is used as the training loss function.
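A scalar toy case makes the failure mode above concrete (this sketch and its numbers are illustrative, not from the paper): for a model x_{t+1} = a·x_t with an unstable true system (|a| > 1), the gradient of the multi-step squared-error loss grows roughly exponentially in the rollout horizon, so gradient descent with any fixed step size eventually overshoots.

```python
def rollout_loss_grad(a_hat, a_true, x0, horizon):
    """Gradient of L(a) = sum_t (a^t x0 - a_true^t x0)^2 w.r.t. a,
    for the scalar linear dynamical model x_{t+1} = a * x_t."""
    grad = 0.0
    for t in range(1, horizon + 1):
        err = (a_hat ** t - a_true ** t) * x0
        grad += 2.0 * err * t * a_hat ** (t - 1) * x0  # chain rule on a^t
    return grad

# With an unstable system (a_true = 1.5), the gradient magnitude blows up
# as the horizon grows, even for a nearby parameter estimate a_hat = 1.6.
g_short = abs(rollout_loss_grad(1.6, 1.5, 1.0, horizon=5))
g_long = abs(rollout_loss_grad(1.6, 1.5, 1.0, horizon=20))
```

The a^(t-1) factor in each term is what makes the squared-error gradient scale exponentially with t whenever |a| > 1.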
2 code implementations • CVPR 2020 • Ruiqi Gao, Erik Nijkamp, Diederik P. Kingma, Zhen Xu, Andrew M. Dai, Ying Nian Wu
(2) The update of the flow model approximately minimizes the Jensen-Shannon divergence between the flow model and the data distribution.
no code implementations • 14 Nov 2019 • Kun Zhang, Yuan Xue, Gerardo Flores, Alvin Rajkomar, Claire Cui, Andrew M. Dai
Time series data are prevalent in electronic health records, mostly in the form of physiological parameters such as vital signs and lab tests.
no code implementations • 13 Nov 2019 • Stephen R. Pfohl, Andrew M. Dai, Katherine Heller
The use of collaborative and decentralized machine learning techniques such as federated learning has the potential to enable the development and deployment of clinical risk prediction models in low-resource settings without requiring that sensitive data be shared or stored in a central repository.
1 code implementation • 24 Oct 2019 • Cinjon Resnick, Abhinav Gupta, Jakob Foerster, Andrew M. Dai, Kyunghyun Cho
In this paper, we investigate the learning biases that affect the efficacy and compositionality of emergent languages.
no code implementations • 20 Sep 2019 • Zhen Xu, Andrew M. Dai, Jonas Kemp, Luke Metz
The learning rate is one of the most important hyper-parameters for model training and generalization.
1 code implementation • 6 Sep 2019 • Jonas Kemp, Alvin Rajkomar, Andrew M. Dai
Clinical notes in electronic health records contain highly heterogeneous writing styles, including non-standard terminology or abbreviations.
2 code implementations • 11 Jun 2019 • Edward Choi, Zhen Xu, Yujia Li, Michael W. Dusenberry, Gerardo Flores, Yuan Xue, Andrew M. Dai
A recent study showed that using the graphical structure underlying EHR data (e.g., the relationship between diagnoses and treatments) improves the performance of prediction tasks such as heart failure prediction.
1 code implementation • 10 Jun 2019 • Michael W. Dusenberry, Dustin Tran, Edward Choi, Jonas Kemp, Jeremy Nixon, Ghassen Jerfel, Katherine Heller, Andrew M. Dai
We further show that RNNs with only Bayesian embeddings can be a more efficient way to capture model uncertainty compared to ensembles, and we analyze how model uncertainty is impacted across individual input features and patient subgroups.
1 code implementation • Transactions of the Association of Computational Linguistics 2019 • Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc Le, Slav Petrov
The public release consists of 307,373 training examples with single annotations, 7,830 examples with 5-way annotations for development data, and a further 7,842 examples 5-way annotated sequestered as test data.
Ranked #7 on Question Answering on Natural Questions (long)
no code implementations • 17 May 2019 • Mia Xu Chen, Benjamin N Lee, Gagan Bansal, Yuan Cao, Shuyuan Zhang, Justin Lu, Jackie Tsay, Yinan Wang, Andrew M. Dai, Zhifeng Chen, Timothy Sohn, Yonghui Wu
In this paper, we present Smart Compose, a novel system for generating interactive, real-time suggestions in Gmail that assists users in writing emails by reducing repetitive typing.
11 code implementations • ICLR 2019 • Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck
This is impractical for long sequences such as musical compositions since their memory complexity for intermediate relative information is quadratic in the sequence length.
Ranked #3 on Music Modeling on JSB Chorales
no code implementations • 20 Aug 2018 • Samuel S. Schoenholz, Sean Hackett, Laura Deming, Eugene Melamud, Navdeep Jaitly, Fiona McAllister, Jonathon O'Brien, George Dahl, Bryson Bennett, Andrew M. Dai, Daphne Koller
As in many other scientific domains, we face a fundamental problem when using machine learning to identify proteins from mass spectrometry data: large ground truth datasets mapping inputs to correct outputs are extremely difficult to obtain.
no code implementations • WS 2018 • Bhuwan Dhingra, Christopher J. Shallue, Mohammad Norouzi, Andrew M. Dai, George E. Dahl
Ideally, we could incorporate our prior knowledge of this hierarchical structure into unsupervised learning algorithms that work on text data.
no code implementations • ICML 2018 • Trieu H. Trinh, Andrew M. Dai, Minh-Thang Luong, Quoc V. Le
Despite recent advances in training recurrent neural networks (RNNs), capturing long-term dependencies in sequences remains a fundamental challenge.
Ranked #10 on Sequential Image Classification on Sequential CIFAR-10
no code implementations • 24 Jan 2018 • Alvin Rajkomar, Eyal Oren, Kai Chen, Andrew M. Dai, Nissan Hajaj, Peter J. Liu, Xiaobing Liu, Mimi Sun, Patrik Sundberg, Hector Yee, Kun Zhang, Gavin E. Duggan, Gerardo Flores, Michaela Hardt, Jamie Irvine, Quoc Le, Kurt Litsch, Jake Marcus, Alexander Mossin, Justin Tansuwan, De Wang, James Wexler, Jimbo Wilson, Dana Ludwig, Samuel L. Volchenboum, Katherine Chou, Michael Pearson, Srinivasan Madabushi, Nigam H. Shah, Atul J. Butte, Michael Howell, Claire Cui, Greg Corrado, Jeff Dean
Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality.
no code implementations • 23 Jan 2018 • William Fedus, Ian Goodfellow, Andrew M. Dai
Additionally, these models are typically trained via maximum likelihood and teacher forcing.
no code implementations • ICLR 2018 • Wei Wei, Quoc V. Le, Andrew M. Dai, Li-Jia Li
One challenge in applying such techniques to building goal-oriented conversation models is that maximum likelihood-based models are not optimized toward accomplishing goals.
no code implementations • ICLR 2018 • William Fedus, Ian Goodfellow, Andrew M. Dai
Neural autoregressive and seq2seq models that generate text by sampling words sequentially, with each word conditioned on the previous word, are state-of-the-art for several machine translation and summarization benchmarks.
Ranked #4 on Multivariate Time Series Imputation on PEMS-SF
1 code implementation • ICLR 2018 • William Fedus, Mihaela Rosca, Balaji Lakshminarayanan, Andrew M. Dai, Shakir Mohamed, Ian Goodfellow
Unlike other generative models, the data distribution is learned via a game between a generator (the generative model) and a discriminator (a teacher providing training signal) that each minimize their own cost.
1 code implementation • 26 Mar 2017 • Melody Y. Guan, Varun Gulshan, Andrew M. Dai, Geoffrey E. Hinton
We also show that our method performs better than competing algorithms by Welinder and Perona (2010), and by Mnih and Hinton (2012).
4 code implementations • 25 May 2016 • Takeru Miyato, Andrew M. Dai, Ian Goodfellow
We extend adversarial and virtual adversarial training to the text domain by applying perturbations to the word embeddings in a recurrent neural network rather than to the original input itself.
Ranked #16 on Sentiment Analysis on IMDb
General Classification Semi-Supervised Text Classification +2
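The embedding-space perturbation described above can be illustrated with a toy linear model (a minimal sketch: in the paper the gradient comes from backpropagation through an LSTM text classifier, and the function and variable names here are assumptions):

```python
import numpy as np

def adversarial_embedding_perturbation(emb, w, b, y, epsilon=0.1):
    """FGSM-style perturbation applied to word embeddings, not raw text.

    For a toy logistic model p(y=1 | v) = sigmoid(w.v + b) over the
    averaged embedding v, the cross-entropy gradient w.r.t. the
    embeddings has a closed form; the adversarial perturbation is the
    gradient rescaled to L2 norm epsilon.
    """
    v = emb.mean(axis=0)                     # average the word embeddings
    p = 1.0 / (1.0 + np.exp(-(w @ v + b)))   # predicted P(y = 1)
    # d(cross-entropy)/dv = (p - y) * w; each of the n embeddings
    # contributes v through a mean, so each receives that gradient / n.
    g = np.tile((p - y) * w / len(emb), (len(emb), 1))
    return epsilon * g / np.linalg.norm(g)   # worst-case direction, norm eps

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))  # toy sentence: 5 tokens, 8-dim embeddings
w, b, y = rng.normal(size=8), 0.0, 1
r = adversarial_embedding_perturbation(emb, w, b, y)
```

Training then adds the loss at `emb + r` to the objective; perturbing embeddings rather than discrete tokens is what makes the adversarial direction well-defined for text.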
16 code implementations • CONLL 2016 • Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio
The standard recurrent neural network language model (RNNLM) generates sentences one word at a time and does not work from an explicit global sentence representation.
164 code implementations • NeurIPS 2015 • Andrew M. Dai, Quoc V. Le
In our experiments, we find that long short-term memory (LSTM) recurrent networks after being pretrained with the two approaches are more stable and generalize better.
5 code implementations • 29 Jul 2015 • Andrew M. Dai, Christopher Olah, Quoc V. Le
Paragraph Vectors has been recently proposed as an unsupervised method for learning distributed representations for pieces of text.
no code implementations • 17 Dec 2014 • Andrew M. Dai, Amos J. Storkey
However, until now, Hierarchical Dirichlet Process (HDP) mixtures have not seen significant use in supervised problems with grouped data since a straightforward application of the HDP on the grouped data results in learnt clusters that are not predictive of the responses.