Search Results for author: Edward J. Hu

Found 7 papers, 7 papers with code

Amortizing intractable inference in large language models

1 code implementation • 6 Oct 2023 • Edward J. Hu, Moksh Jain, Eric Elmoznino, Younesse Kaddar, Guillaume Lajoie, Yoshua Bengio, Nikolay Malkin

Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions.

Bayesian Inference

GFlowNet-EM for learning compositional latent variable models

1 code implementation • 13 Feb 2023 • Edward J. Hu, Nikolay Malkin, Moksh Jain, Katie Everett, Alexandros Graikos, Yoshua Bengio

Latent variable models (LVMs) with discrete compositional latents are an important but challenging setting due to a combinatorially large number of possible configurations of the latents.

Variational Inference

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

3 code implementations • 7 Mar 2022 • Greg Yang, Edward J. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao

Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters.

1,206

GFlowNet Foundations

2 code implementations • 17 Nov 2021 • Yoshua Bengio, Salem Lahlou, Tristan Deleu, Edward J. Hu, Mo Tiwari, Emmanuel Bengio

Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function.

Active Learning

191

LoRA: Low-Rank Adaptation of Large Language Models

48 code implementations • ICLR 2022 • Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen

We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.

Language Modelling

28,985

Feature Learning in Infinite-Width Neural Networks

4 code implementations • 30 Nov 2020 • Greg Yang, Edward J. Hu

However, we show that the standard and NTK parametrizations of a neural network do not admit infinite-width limits that can learn features, which is crucial for pretraining and transfer learning such as with BERT.

Few-Shot Learning • Transfer Learning

1,206

Improved Image Wasserstein Attacks and Defenses

1 code implementation • 26 Apr 2020 • Edward J. Hu, Adith Swaminathan, Hadi Salman, Greg Yang

Robustness against image perturbations bounded by an $\ell_p$ ball has been well studied in recent literature.
