Search Results for author: Hattie Zhou

Found 7 papers, 4 papers with code

Vanishing Gradients in Reinforcement Finetuning of Language Models

1 code implementation • 31 Oct 2023 • Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Joshua Susskind, Etai Littwin

Pretrained language models are commonly aligned with human preferences and downstream tasks via reinforcement finetuning (RFT), which refers to maximizing a (possibly learned) reward function using policy gradient algorithms.
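The policy-gradient objective mentioned in the abstract can be illustrated with a toy REINFORCE update. This is a generic sketch, not the paper's setup: `reinforce_step` is an illustrative name, and a one-parameter Bernoulli policy stands in for a language model.

```python
import math
import random

def reinforce_step(theta, reward_fn, lr=0.1, n_samples=500, rng=None):
    """One REINFORCE update for a Bernoulli policy pi(a=1) = sigmoid(theta).

    Estimates grad = E[ R(a) * d/dtheta log pi(a) ] by sampling actions,
    then takes a gradient-ascent step to increase expected reward.
    """
    rng = rng or random.Random(0)
    p = 1.0 / (1.0 + math.exp(-theta))  # pi(a=1)
    grad = 0.0
    for _ in range(n_samples):
        a = 1 if rng.random() < p else 0
        # For a Bernoulli-sigmoid policy, d/dtheta log pi(a) = a - p.
        grad += reward_fn(a) * (a - p)
    grad /= n_samples
    return theta + lr * grad

# Toy reward: action 1 is rewarded, action 0 is not.
theta = 0.0
for _ in range(200):
    theta = reinforce_step(theta, lambda a: float(a))
prob_a1 = 1.0 / (1.0 + math.exp(-theta))
```

After training, the policy places most of its probability on the rewarded action, which is the maximization the abstract describes (with a learned reward model in place of the toy reward).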

What Algorithms can Transformers Learn? A Study in Length Generalization

no code implementations • 24 Oct 2023 • Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran

Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity.
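The parity task named in the abstract is trivial to state in code, which is what makes the failure striking: length generalization asks whether a model trained on short inputs still matches this ground truth on longer ones. A minimal reference implementation (the helper name is illustrative):

```python
def parity(bits):
    """Return 1 if the 0/1 list contains an odd number of ones, else 0."""
    return sum(bits) % 2

# Ground truth is length-independent; a model that length-generalizes
# should agree with it on inputs longer than any seen during training.
short_answer = parity([1, 0, 1])           # two ones -> even parity
long_answer = parity([1, 1, 1, 0, 1, 1])   # five ones -> odd parity
```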

Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok

no code implementations • 23 Jun 2023 • Pascal Jr. Tikeng Notsawo, Hattie Zhou, Mohammad Pezeshki, Irina Rish, Guillaume Dumas

In essence, by studying the learning curve of the first few epochs, we show that one can predict whether grokking will occur later on.

Memorization

Teaching Algorithmic Reasoning via In-context Learning

no code implementations • 15 Nov 2022 • Hattie Zhou, Azade Nova, Hugo Larochelle, Aaron Courville, Behnam Neyshabur, Hanie Sedghi

Large language models (LLMs) have shown increasing in-context learning capabilities through scaling up model and data size.

In-Context Learning

LCA: Loss Change Allocation for Neural Network Training

2 code implementations • NeurIPS 2019 • Janice Lan, Rosanne Liu, Hattie Zhou, Jason Yosinski

We propose a new window into training called Loss Change Allocation (LCA), in which credit for changes to the network loss is conservatively partitioned to the parameters.
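A hedged sketch of the allocation idea: credit each parameter with (gradient × its update), so the per-parameter credits approximately sum to the step's total loss change. This uses a first-order estimate on a toy quadratic loss; the paper's method is more careful, and all names here are illustrative.

```python
def loss(params):
    # Toy quadratic loss: L = sum(p^2).
    return sum(p * p for p in params)

def grad(params):
    return [2 * p for p in params]

def lca_step(params, lr=0.1):
    """One gradient-descent step plus a per-parameter loss-change credit.

    credit_i = grad_i * delta_theta_i, a first-order share of the loss
    change attributable to parameter i over this step.
    """
    g = grad(params)
    new_params = [p - lr * gi for p, gi in zip(params, g)]
    credits = [gi * (npi - p) for gi, p, npi in zip(g, params, new_params)]
    return new_params, credits

params = [1.0, -2.0, 0.5]
new_params, credits = lca_step(params)
total_change = loss(new_params) - loss(params)
# For this toy loss the first-order sum (~ -2.1) tracks the actual
# loss change (~ -1.89); every credit is negative, i.e. each parameter
# "helped" reduce the loss on this step.
```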

Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask

6 code implementations • NeurIPS 2019 • Hattie Zhou, Janice Lan, Rosanne Liu, Jason Yosinski

The recent "Lottery Ticket Hypothesis" paper by Frankle & Carbin showed that a simple approach to creating sparse networks (keeping the large weights) results in models that are trainable from scratch, but only when starting from the same initial weights.
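The "keeping the large weights" criterion from Frankle & Carbin can be sketched as a magnitude-based 0/1 mask; the function name is illustrative, and a flat list stands in for a real weight tensor.

```python
def magnitude_mask(weights, keep_fraction):
    """Return a 0/1 mask keeping the top keep_fraction of weights by |magnitude|."""
    k = max(1, int(len(weights) * keep_fraction))
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [1 if abs(w) >= threshold else 0 for w in weights]

weights = [0.9, -0.05, 0.4, -1.2, 0.01]
mask = magnitude_mask(weights, 0.4)               # keep the top 40% -> 2 weights
pruned = [w if m else 0.0 for w, m in zip(weights, mask)]
# mask   -> [1, 0, 0, 1, 0]
# pruned -> [0.9, 0.0, 0.0, -1.2, 0.0]
```

In the lottery-ticket procedure the surviving weights are then reset to their original initialization and retrained; the paper above dissects which parts of that recipe (zeroing, sign preservation, the mask itself) actually matter.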
