Search Results for author: Alex Alemi

Found 10 papers, 3 papers with code

Training LLMs over Neurally Compressed Text

no code implementations • 4 Apr 2024 • Brian Lester, Jaehoon Lee, Alex Alemi, Jeffrey Pennington, Adam Roberts, Jascha Sohl-Dickstein, Noah Constant

In this paper, we explore the idea of training large language models (LLMs) over highly compressed text.

Paper
Add Code

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

no code implementations • 11 Dec 2023 • Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, Kathleen Kenealy, Kevin Swersky, Kshiteej Mahajan, Laura Culp, Lechao Xiao, Maxwell L. Bileschi, Noah Constant, Roman Novak, Rosanne Liu, Tris Warkentin, Yundi Qian, Yamini Bansal, Ethan Dyer, Behnam Neyshabur, Jascha Sohl-Dickstein, Noah Fiedel

To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times.

Math

Paper
Add Code

Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?

no code implementations • 8 Nov 2023 • C. Daniel Freeman, Laura Culp, Aaron Parisi, Maxwell L Bileschi, Gamaleldin F Elsayed, Alex Rizkowsky, Isabelle Simpson, Alex Alemi, Azade Nova, Ben Adlam, Bernd Bohnet, Gaurav Mishra, Hanie Sedghi, Igor Mordatch, Izzeddin Gur, Jaehoon Lee, JD Co-Reyes, Jeffrey Pennington, Kelvin Xu, Kevin Swersky, Kshiteej Mahajan, Lechao Xiao, Rosanne Liu, Simon Kornblith, Noah Constant, Peter J. Liu, Roman Novak, Yundi Qian, Noah Fiedel, Jascha Sohl-Dickstein

We introduce and study the problem of adversarial arithmetic, which provides a simple yet challenging testbed for language model alignment.

Language Modelling

Paper
Add Code

Small-scale proxies for large-scale Transformer training instabilities

no code implementations • 25 Sep 2023 • Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, Alex Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith

In this work, we seek ways to reproduce and study training stability and instability at smaller scales.

Paper
Add Code

Dueling Decoders: Regularizing Variational Autoencoder Latent Spaces

no code implementations • 17 May 2019 • Bryan Seybold, Emily Fertig, Alex Alemi, Ian Fischer

Variational autoencoders learn unsupervised data representations, but these models frequently converge to minima that fail to preserve meaningful semantic information.

Paper
Add Code

An information-theoretic analysis of deep latent-variable models

no code implementations • ICLR 2018 • Alex Alemi, Ben Poole, Ian Fischer, Josh Dillon, Rif A. Saurus, Kevin Murphy

We present an information-theoretic framework for understanding trade-offs in unsupervised learning of deep latent-variables models using variational inference.

Variational Inference

Paper
Add Code

TensorFlow Distributions

9 code implementations • 28 Nov 2017 • Joshua V. Dillon, Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, Rif A. Saurous

The TensorFlow Distributions library implements a vision of probability theory adapted to the modern deep-learning paradigm of end-to-end differentiable computation.

Probabilistic Programming

4,133

Paper
Code

Watch Your Step: Learning Node Embeddings via Graph Attention

2 code implementations • NeurIPS 2018 • Sami Abu-El-Haija, Bryan Perozzi, Rami Al-Rfou, Alex Alemi

Graph embedding methods represent nodes in a continuous vector space, preserving information from the graph (e. g. by sampling random walks).

Ranked #69 on Node Classification on Citeseer

Graph Attention Graph Embedding +2

312

Paper
Code

Motion Prediction Under Multimodality with Conditional Stochastic Networks

no code implementations • 5 May 2017 • Katerina Fragkiadaki, Jonathan Huang, Alex Alemi, Sudheendra Vijayanarasimhan, Susanna Ricco, Rahul Sukthankar

In this work, we present stochastic neural network architectures that handle such multimodality through stochasticity: future trajectories of objects, body joints or frames are represented as deep, non-linear transformations of random (as opposed to deterministic) variables.

motion prediction Optical Flow Estimation +2

Paper
Add Code

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

86 code implementations • 23 Feb 2016 • Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi

Recently, the introduction of residual connections in conjunction with a more traditional architecture has yielded state-of-the-art performance in the 2015 ILSVRC challenge; its performance was similar to the latest generation Inception-v3 network.

Ranked #4 on Classification on InDL

Classification General Classification +1

76,614

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.