Search Results for author: Jeff Dean

Found 30 papers, 11 papers with code

Gemma: Open Models Based on Gemini Research and Technology

no code implementations 13 Mar 2024 Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, Charline Le Lan, Christopher A. Choquette-Choo, Clément Crepy, Daniel Cer, Daphne Ippolito, David Reid, Elena Buchatskaya, Eric Ni, Eric Noland, Geng Yan, George Tucker, George-Christian Muraru, Grigory Rozhdestvenskiy, Henryk Michalewski, Ian Tenney, Ivan Grishchenko, Jacob Austin, James Keeling, Jane Labanowski, Jean-Baptiste Lespiau, Jeff Stanway, Jenny Brennan, Jeremy Chen, Johan Ferret, Justin Chiu, Justin Mao-Jones, Katherine Lee, Kathy Yu, Katie Millican, Lars Lowe Sjoesund, Lisa Lee, Lucas Dixon, Machel Reid, Maciej Mikuła, Mateo Wirth, Michael Sharman, Nikolai Chinaev, Nithum Thain, Olivier Bachem, Oscar Chang, Oscar Wahltinez, Paige Bailey, Paul Michel, Petko Yotov, Rahma Chaabouni, Ramona Comanescu, Reena Jana, Rohan Anil, Ross Mcilroy, Ruibo Liu, Ryan Mullins, Samuel L Smith, Sebastian Borgeaud, Sertan Girgin, Sholto Douglas, Shree Pandya, Siamak Shakeri, Soham De, Ted Klimenko, Tom Hennigan, Vlad Feinberg, Wojciech Stokowiec, Yu-Hui Chen, Zafarali Ahmed, Zhitao Gong, Tris Warkentin, Ludovic Peran, Minh Giang, Clément Farabet, Oriol Vinyals, Jeff Dean, Koray Kavukcuoglu, Demis Hassabis, Zoubin Ghahramani, Douglas Eck, Joelle Barral, Fernando Pereira, Eli Collins, Armand Joulin, Noah Fiedel, Evan Senter, Alek Andreev, Kathleen Kenealy

This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models.

AI and the Opportunity for Shared Prosperity: Lessons from the History of Technology and the Economy

no code implementations 18 Jan 2024 Guy Ben-Ishai, Jeff Dean, James Manyika, Ruth Porat, Hal Varian, Kent Walker

We explore these questions by considering the recent history of technology and innovation as a guide for the likely impact of AI and what we must do to realize its economic potential to benefit society.

Brainformers: Trading Simplicity for Efficiency

no code implementations 29 May 2023 Yanqi Zhou, Nan Du, Yanping Huang, Daiyi Peng, Chang Lan, Da Huang, Siamak Shakeri, David So, Andrew Dai, Yifeng Lu, Zhifeng Chen, Quoc Le, Claire Cui, James Laudon, Jeff Dean

Using this insight, we develop a complex block, named Brainformer, that consists of a diverse set of layers such as sparsely gated feed-forward layers, dense feed-forward layers, attention layers, and various forms of layer normalization and activation functions.

Efficiently Scaling Transformer Inference

no code implementations 9 Nov 2022 Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, Jeff Dean

We study the problem of efficient generative inference for Transformer models, in one of its most challenging settings: large deep models, with tight latency targets and long sequence lengths.


A Review of Sparse Expert Models in Deep Learning

no code implementations 4 Sep 2022 William Fedus, Jeff Dean, Barret Zoph

Sparse expert models are a thirty-year-old concept re-emerging as a popular architecture in deep learning.

Speech Recognition

An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems

1 code implementation 25 May 2022 Andrea Gesmundo, Jeff Dean

Multitask learning assumes that models capable of learning from multiple tasks can achieve better quality and efficiency via knowledge transfer, a key feature of human learning.

Continual Learning Fine-Grained Image Classification +1

muNet: Evolving Pretrained Deep Neural Networks into Scalable Auto-tuning Multitask Systems

no code implementations 22 May 2022 Andrea Gesmundo, Jeff Dean

We propose a method that uses the layers of a pretrained deep neural network as building blocks to construct an ML system that can jointly solve an arbitrary number of tasks.

Image Classification Transfer Learning

ST-MoE: Designing Stable and Transferable Sparse Expert Models

2 code implementations 17 Feb 2022 Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, William Fedus

But advancing the state-of-the-art across a broad set of natural language tasks has been hindered by training instabilities and uncertain quality during fine-tuning.

Coreference Resolution Decoder +7

Carbon Emissions and Large Neural Network Training

no code implementations 21 Apr 2021 David Patterson, Joseph Gonzalez, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, Jeff Dean

To help reduce the carbon footprint of ML, we believe energy usage and CO2e should be a key metric in evaluating models, and we are collaborating with MLPerf developers to include energy usage during training and inference in this industry standard benchmark.

Neural Architecture Search Scheduling

Interlocking Backpropagation: Improving depthwise model-parallelism

1 code implementation 8 Oct 2020 Aidan N. Gomez, Oscar Key, Kuba Perlin, Stephen Gou, Nick Frosst, Jeff Dean, Yarin Gal

Motivated by poor resource utilisation in the global setting and poor task performance in the local setting, we introduce a class of intermediary strategies between local and global learning referred to as interlocking backpropagation.

Image Classification

Faster Discovery of Neural Architectures by Searching for Paths in a Large Model

no code implementations ICLR 2018 Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean

We propose Efficient Neural Architecture Search (ENAS), a faster and less expensive approach to automated model design than previous methods.

Neural Architecture Search

A Hierarchical Model for Device Placement

no code implementations ICLR 2018 Azalia Mirhoseini, Anna Goldie, Hieu Pham, Benoit Steiner, Quoc V. Le, Jeff Dean

We introduce a hierarchical model for efficient placement of computational graphs onto hardware devices, especially in heterogeneous environments with a mixture of CPUs, GPUs, and other computational devices.

Machine Translation Reinforcement Learning (RL) +1

Device Placement Optimization with Reinforcement Learning

1 code implementation ICML 2017 Azalia Mirhoseini, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, Jeff Dean

Key to our method is the use of a sequence-to-sequence model to predict which subsets of operations in a TensorFlow graph should run on which of the available devices.

Language Modelling Machine Translation +3

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

4 code implementations 23 Jan 2017 Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean

In this work, we address these challenges and finally realize the promise of conditional computation, achieving greater than 1000x improvements in model capacity with only minor losses in computational efficiency on modern GPU clusters.

Computational Efficiency Language Modelling +2

Distilling the Knowledge in a Neural Network

61 code implementations 9 Mar 2015 Geoffrey Hinton, Oriol Vinyals, Jeff Dean

A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions.

Ranked #1 on Knowledge Distillation on ImageNet (CRD training setting metric)

Knowledge Distillation

Using Web Co-occurrence Statistics for Improving Image Categorization

no code implementations 19 Dec 2013 Samy Bengio, Jeff Dean, Dumitru Erhan, Eugene Ie, Quoc Le, Andrew Rabinovich, Jonathon Shlens, Yoram Singer

Despite the simplicity of the resulting optimization problem, it is effective in improving both recognition and localization accuracy.

Common Sense Reasoning Image Categorization +1
