Search Results for author: Aidan O'Gara

Found 3 papers, 1 paper with code

AI Alignment: A Comprehensive Survey

no code implementations · 30 Oct 2023 · Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen Mcaleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao

The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks.

AI Deception: A Survey of Examples, Risks, and Potential Solutions

no code implementations · 28 Aug 2023 · Peter S. Park, Simon Goldstein, Aidan O'Gara, Michael Chen, Dan Hendrycks

This paper argues that a range of current AI systems have learned how to deceive humans.

Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models

1 code implementation · 5 Jul 2023 · Aidan O'Gara

We conduct experiments with agents controlled by GPT-3, GPT-3.5, and GPT-4 and find evidence of deception and lie detection capabilities.
