no code implementations • 18 Dec 2024 • Alex Tamkin, Miles McCain, Kunal Handa, Esin Durmus, Liane Lovitt, Ankur Rathi, Saffron Huang, Alfred Mountfield, Jerry Hong, Stuart Ritchie, Michael Stern, Brian Clarke, Landon Goldberg, Theodore R. Sumers, Jared Mueller, William McEachen, Wes Mitchell, Shan Carter, Jack Clark, Jared Kaplan, Deep Ganguli
How are AI assistants being used in the real world?
1 code implementation • 14 Jun 2024 • Carson Denison, Monte MacDiarmid, Fazl Barez, David Duvenaud, Shauna Kravec, Samuel Marks, Nicholas Schiefer, Ryan Soklaski, Alex Tamkin, Jared Kaplan, Buck Shlegeris, Samuel R. Bowman, Ethan Perez, Evan Hubinger
We construct a curriculum of increasingly sophisticated gameable environments and find that training on early-curriculum environments leads to more specification gaming on remaining environments.
1 code implementation • 12 Jun 2024 • Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, Deep Ganguli
There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of LM systems that affect them.
no code implementations • 8 Mar 2024 • Kunal Handa, Yarin Gal, Ellie Pavlick, Noah Goodman, Jacob Andreas, Alex Tamkin, Belinda Z. Li
We introduce OPEN (Optimal Preference Elicitation with Natural language) a framework that uses BOED to guide the choice of informative questions and an LM to extract features and translate abstract BOED queries into natural language questions.
no code implementations • 6 Dec 2023 • Alex Tamkin, Amanda Askell, Liane Lovitt, Esin Durmus, Nicholas Joseph, Shauna Kravec, Karina Nguyen, Jared Kaplan, Deep Ganguli
We present a method for proactively evaluating the potential discriminatory impact of LMs in a wide range of use cases, including hypothetical use cases where they have not yet been deployed.
1 code implementation • 26 Oct 2023 • Jan-Philipp Fränken, Sam Kwok, Peixuan Ye, Kanishk Gandhi, Dilip Arumugam, Jared Moore, Alex Tamkin, Tobias Gerstenberg, Noah D. Goodman
We explore the idea of aligning an AI assistant by inverting a model of users' (unknown) preferences from observed interactions.
1 code implementation • 26 Oct 2023 • Alex Tamkin, Mohammad Taufeeque, Noah D. Goodman
In this setting, our approach overcomes the superposition problem by assigning states to distinct codes, and we find that we can make the neural network behave as if it is in a different state by activating the code for that state.
1 code implementation • 17 Oct 2023 • Belinda Z. Li, Alex Tamkin, Noah Goodman, Jacob Andreas
Language models (LMs) can be directed to perform target tasks by using labeled examples or natural language prompts.
2 code implementations • 7 Aug 2023 • Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman
When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior?
1 code implementation • 28 Jun 2023 • Esin Durmus, Karina Nguyen, Thomas I. Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli
We first build a dataset, GlobalOpinionQA, comprised of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across different countries.
no code implementations • 5 Jun 2023 • Risto Uuk, Carlos Ignacio Gutierrez, Alex Tamkin
The European Union's Artificial Intelligence (AI) Act is set to be a landmark legal instrument for regulating AI technology.
1 code implementation • 17 Apr 2023 • Kathryn Wantlin, Chenwei Wu, Shih-Cheng Huang, Oishi Banerjee, Farah Dadabhoy, Veeral Vipin Mehta, Ryan Wonhee Han, Fang Cao, Raja R. Narayan, Errol Colak, Adewole Adamson, Laura Heacock, Geoffrey H. Tison, Alex Tamkin, Pranav Rajpurkar
Finally, we evaluate performance on out-of-distribution data collected at different hospitals than the training data, representing naturally-occurring distribution shifts that frequently degrade the performance of medical AI models.
1 code implementation • 11 Feb 2023 • Jasmine Bayrooti, Noah Goodman, Alex Tamkin
Contrastive learning methods have been applied to a range of domains and modalities by training models to identify similar "views" of data points.
no code implementations • 20 Dec 2022 • Alex Tamkin, Kunal Handa, Avash Shrestha, Noah Goodman
We investigate how both humans and models behave in the face of such task ambiguity by proposing AmbiBench, a new benchmark of six ambiguously-specified classification tasks.
1 code implementation • 18 Apr 2022 • Alex Tamkin, Dat Nguyen, Salil Deshpande, Jesse Mu, Noah Goodman
Models can fail in unpredictable ways during deployment due to task ambiguity, when multiple behaviors are consistent with the provided training data.
1 code implementation • 24 Feb 2022 • Zhengxuan Wu, Alex Tamkin, Isabel Papadimitriou
When we transfer a pretrained language model to a new language, there are many axes of variation that change at once.
no code implementations • 10 Dec 2021 • Ananya Karthik, Mike Wu, Noah Goodman, Alex Tamkin
Contrastive learning has made considerable progress in computer vision, outperforming supervised pretraining on a range of downstream datasets.
1 code implementation • 23 Nov 2021 • Alex Tamkin, Vincent Liu, Rongfei Lu, Daniel Fein, Colin Schultz, Noah Goodman
Self-supervised learning algorithms, including BERT and SimCLR, have enabled significant strides in fields like natural language processing, computer vision, and speech processing.
Ranked #1 on Self-Supervised Learning on DABS
no code implementations • 29 Sep 2021 • Alex Tamkin, Dat Nguyen, Salil Deshpande, Jesse Mu, Noah Goodman
An important barrier to the safe deployment of machine learning systems is the risk of \emph{task ambiguity}, where multiple behaviors are consistent with the provided examples.
1 code implementation • 23 Aug 2021 • Daniel Rothchild, Alex Tamkin, Julie Yu, Ujval Misra, Joseph Gonzalez
Methods for designing organic materials with desired properties have high potential impact across fields such as medicine, renewable energy, petrochemical engineering, and agriculture.
2 code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, aditi raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
AI is undergoing a paradigm shift with the rise of models (e. g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.
no code implementations • 4 Feb 2021 • Alex Tamkin, Miles Brundage, Jack Clark, Deep Ganguli
On October 14th, 2020, researchers from OpenAI, the Stanford Institute for Human-Centered Artificial Intelligence, and other universities convened to discuss open research questions surrounding GPT-3, the largest publicly-disclosed dense language model at the time.
1 code implementation • NeurIPS 2020 • Alex Tamkin, Dan Jurafsky, Noah Goodman
Language exhibits structure at different scales, ranging from subwords to words, sentences, paragraphs, and documents.
1 code implementation • ICLR 2021 • Alex Tamkin, Mike Wu, Noah Goodman
However, designing these views requires considerable trial and error by human experts, hindering widespread adoption of unsupervised representation learning methods across domains and modalities.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Alex Tamkin, Trisha Singh, Davide Giovanardi, Noah Goodman
How does language model pretraining help transfer learning?
no code implementations • 5 Nov 2019 • Ramtin Keramati, Christoph Dann, Alex Tamkin, Emma Brunskill
While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications.
no code implementations • NAACL 2019 • Ignacio Cases, Clemens Rosenbaum, Matthew Riemer, Atticus Geiger, Tim Klinger, Alex Tamkin, Olivia Li, S Agarwal, hini, Joshua D. Greene, Dan Jurafsky, Christopher Potts, Lauri Karttunen
The model jointly optimizes the parameters of the functions and the meta-learner{'}s policy for routing inputs through those functions.