no code implementations • EMNLP 2021 • Giulio Zhou, Jacob Devlin
Large-scale document retrieval systems often utilize two styles of neural network models which live at two different ends of the joint computation vs. accuracy spectrum.
no code implementations • 14 Nov 2022 • Zifeng Wang, Zizhao Zhang, Jacob Devlin, Chen-Yu Lee, Guolong Su, Hao Zhang, Jennifer Dy, Vincent Perot, Tomas Pfister
Zero-shot transfer learning for document understanding is a crucial yet under-investigated scenario to help reduce the high cost involved in annotating document entities.
no code implementations • 9 Nov 2022 • Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, Jeff Dean
We study the problem of efficient generative inference for Transformer models, in one of its most challenging settings: large deep models, with tight latency targets and long sequence lengths.
1 code implementation • 20 Oct 2022 • Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei
We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation).
Ranked #1 on Multi-task Language Understanding on BBH-nlp
Cross-Lingual Question Answering Multi-task Language Understanding +1
5 code implementations • Google Research 2022 • Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, Noah Fiedel
To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model PaLM.
Ranked #1 on Auto Debugging on Big-bench Lite
3 code implementations • ACL 2019 • Lajanugen Logeswaran, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Jacob Devlin, Honglak Lee
First, we show that strong reading comprehension models pre-trained on large unlabeled data can be used to generalize to unseen entities.
4 code implementations • ACL 2019 • Chris Alberti, Daniel Andor, Emily Pitler, Jacob Devlin, Michael Collins
We introduce a novel method of generating synthetic question answering corpora by combining models of question generation and answer extraction, and by filtering the results to ensure roundtrip consistency.
1 code implementation • Transactions of the Association of Computational Linguistics 2019 • Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc Le, Slav Petrov
The public release consists of 307, 373 training examples with single annotations, 7, 830 examples with 5-way annotations for development data, and a further 7, 842 examples 5-way annotated sequestered as test data.
Ranked #7 on Question Answering on Natural Questions (long)
no code implementations • ICLR 2019 • Ming-Wei Chang, Kristina Toutanova, Kenton Lee, Jacob Devlin
Hierarchical neural architectures are often used to capture long-distance dependencies and have been applied to many document-level tasks such as summarization, document segmentation, and sentiment analysis.
505 code implementations • NAACL 2019 • Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.
Ranked #1 on Question Answering on CoQA
no code implementations • ICLR 2018 • Rudy Bunel, Matthew Hausknecht, Jacob Devlin, Rishabh Singh, Pushmeet Kohli
Program synthesis is the task of automatically generating a program consistent with a specification.
no code implementations • NAACL 2018 • Jiatao Gu, Hany Hassan, Jacob Devlin, Victor O. K. Li
Our proposed approach utilizes a transfer-learning approach to share lexical and sentence level representations across multiple source languages into one target language.
no code implementations • ICLR 2018 • Jacob Devlin, Jonathan Uesato, Rishabh Singh, Pushmeet Kohli
We study the problem of semantic code repair, which can be broadly defined as automatically fixing non-syntactic bugs in source code.
no code implementations • NeurIPS 2017 • Jacob Devlin, Rudy Bunel, Rishabh Singh, Matthew Hausknecht, Pushmeet Kohli
In our first proposal, portfolio adaptation, a set of induction models is pretrained on a set of related tasks, and the best model is adapted towards the new task using transfer learning.
1 code implementation • EMNLP 2017 • Jacob Devlin
By combining these techniques, our best system achieves a very competitive accuracy of 38. 3 BLEU on WMT English-French NewsTest2014, while decoding at 100 words/sec on single-threaded CPU.
3 code implementations • ICML 2017 • Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, Pushmeet Kohli
Recently, two competing approaches for automatic program learning have received significant attention: (1) neural program synthesis, where a neural network is conditioned on input/output (I/O) examples and learns to generate a program, and (2) neural program induction, where a neural network generates new outputs directly using a latent program representation.
1 code implementation • NAACL 2016 • Ting-Hao, Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell
We introduce the first dataset for sequential vision-to-language, and explore how this data may be used for the task of visual storytelling.
2 code implementations • ACL 2016 • Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, Lucy Vanderwende
There has been an explosion of work in the vision & language community during the past few years from image captioning to video transcription, and answering questions about images.
no code implementations • 3 Nov 2015 • Junyoung Chung, Jacob Devlin, Hany Hassan Awadalla
In this paper, we explore different neural network architectures that can predict if a speaker of a given utterance is asking a question or making a statement.
no code implementations • EMNLP 2015 • Francis Ferraro, Nasrin Mostafazadeh, Ting-Hao, Huang, Lucy Vanderwende, Jacob Devlin, Michel Galley, Margaret Mitchell
Integrating vision and language has long been a dream in work on artificial intelligence (AI).
no code implementations • IJCNLP 2015 • Hendra Setiawan, Zhongqiang Huang, Jacob Devlin, Thomas Lamar, Rabih Zbib, Richard Schwartz, John Makhoul
We present a three-pronged approach to improving Statistical Machine Translation (SMT), building on recent success in the application of neural networks to SMT.
1 code implementation • 17 May 2015 • Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C. Lawrence Zitnick
We explore a variety of nearest neighbor baseline approaches for image captioning.
no code implementations • IJCNLP 2015 • Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, Margaret Mitchell
Two recent approaches have achieved state-of-the-art results in image captioning.