no code implementations • 27 Feb 2025 • Karolina Stańczak, Nicholas Meade, Mehar Bhatia, Hattie Zhou, Konstantin Böttinger, Jeremy Barnes, Jason Stanley, Jessica Montgomery, Richard Zemel, Nicolas Papernot, Nicolas Chapados, Denis Therien, Timothy P. Lillicrap, Ana Marasović, Sylvie Delacroix, Gillian K. Hadfield, Siva Reddy
Recent progress in large language models (LLMs) has focused on producing responses that meet human expectations and align with shared values, a process known as alignment.
no code implementations • 17 Jan 2025 • Alan Chan, Kevin Wei, Sihao Huang, Nitarshan Rajkumar, Elija Perrier, Seth Lazar, Gillian K. Hadfield, Markus Anderljung
Given this motivation, we propose the concept of agent infrastructure: technical systems and shared protocols external to agents that are designed to mediate and influence their interactions with and impacts on their environments.
no code implementations • 3 Apr 2024 • Noam Kolt, Markus Anderljung, Joslyn Barnhart, Asher Brass, Kevin Esvelt, Gillian K. Hadfield, Lennart Heim, Mikel Rodriguez, Jonas B. Sandbrink, Thomas Woodside
Mitigating the risks from frontier AI systems requires up-to-date and reliable information about those systems.
no code implementations • 11 Apr 2023 • Gillian K. Hadfield, Jack Clark
Appropriately regulating artificial intelligence is an increasingly urgent policy challenge.
2 code implementations • 25 Jan 2020 • Raphael Köster, Dylan Hadfield-Menell, Gillian K. Hadfield, Joel Z. Leibo
How can societies learn to enforce and comply with social norms?
no code implementations • 3 Nov 2018 • Dylan Hadfield-Menell, McKane Andrus, Gillian K. Hadfield
It has become commonplace to assert that autonomous agents will have to be built to follow human rules of behavior: social norms and laws.