Search Results for author: Logan Smith

Found 3 papers, 3 papers with code

Eliciting Latent Predictions from Transformers with the Tuned Lens

2 code implementations • 14 Mar 2023 • Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt

We analyze transformers from the perspective of iterative inference, seeking to understand how model predictions are refined layer by layer.

Language Modelling



Researching Alignment Research: Unsupervised Analysis

1 code implementation • 6 Jun 2022 • Jan H. Kirchner, Logan Smith, Jacques Thibodeau, Kyle McDonell, Laria Reynolds

We looked at the subfields and identified the prominent researchers, recurring topics, and different modes of communication in each.


Optimal Policies Tend to Seek Power

1 code implementation • NeurIPS 2021 • Alexander Matt Turner, Logan Smith, Rohin Shah, Andrew Critch, Prasad Tadepalli

Some researchers speculate that intelligent reinforcement learning (RL) agents would be incentivized to seek resources and power in pursuit of their objectives.

Reinforcement Learning (RL)

