Search Results for author: Logan Smith

Found 3 papers, 3 papers with code

Eliciting Latent Predictions from Transformers with the Tuned Lens

2 code implementations • 14 Mar 2023 • Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt

We analyze transformers from the perspective of iterative inference, seeking to understand how model predictions are refined layer by layer.

Language Modelling

Researching Alignment Research: Unsupervised Analysis

1 code implementation • 6 Jun 2022 • Jan H. Kirchner, Logan Smith, Jacques Thibodeau, Kyle McDonell, Laria Reynolds

We looked at the subfields and identified the prominent researchers, recurring topics, and different modes of communication in each.

Optimal Policies Tend to Seek Power

1 code implementation • NeurIPS 2021 • Alexander Matt Turner, Logan Smith, Rohin Shah, Andrew Critch, Prasad Tadepalli

Some researchers speculate that intelligent reinforcement learning (RL) agents would be incentivized to seek resources and power in pursuit of their objectives.

Reinforcement Learning (RL)
