no code implementations • 22 Feb 2025 • Hantao Lou, Changye Li, Jiaming Ji, Yaodong Yang
In text-only LLMs, Sparse Autoencoders (SAEs) have gained attention for their ability to interpret latent representations.
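As background on the technique named here: a sparse autoencoder maps model activations to an overcomplete latent space with a ReLU encoder and reconstructs them with a linear decoder, trained with a reconstruction loss plus an L1 sparsity penalty. The following is a minimal NumPy sketch of that idea; all dimensions, weights, and names are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: latent is overcomplete relative to the model dimension.
d_model, d_latent = 16, 64
W_enc = rng.normal(0, 0.1, (d_model, d_latent))
b_enc = np.zeros(d_latent)
W_dec = rng.normal(0, 0.1, (d_latent, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x, l1_coef=1e-3):
    """Encode activations x, reconstruct them, and return the SAE loss."""
    z = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU yields non-negative, sparse codes
    x_hat = z @ W_dec + b_dec                # linear reconstruction of the input
    recon = np.mean((x - x_hat) ** 2)        # reconstruction error (MSE)
    sparsity = l1_coef * np.abs(z).mean()    # L1 penalty encourages sparse latents
    return x_hat, z, recon + sparsity

x = rng.normal(size=(8, d_model))            # a batch of fake activations
x_hat, z, loss = sae_forward(x)
```

In interpretability work, each latent dimension of `z` is inspected as a candidate human-interpretable feature of the model's internal representation.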
1 code implementation • 9 Jan 2025 • Hantao Lou, Jiaming Ji, Kaile Wang, Yaodong Yang
The rapid advancement of large language models (LLMs) has led to significant improvements in their capabilities, but also to increased concerns about their alignment with human values and intentions.
1 code implementation • 20 Dec 2024 • Jiaming Ji, Jiayi Zhou, Hantao Lou, Boyuan Chen, Donghai Hong, Xuyao Wang, Wenqi Chen, Kaile Wang, Rui Pan, Jiahao Li, Mohan Wang, Josef Dai, Tianyi Qiu, Hua Xu, Dong Li, WeiPeng Chen, Jun Song, Bo Zheng, Yaodong Yang
In this work, we make the first attempt to fine-tune all-modality models (i.e., models that accept input and produce output in any modality, also called any-to-any models) using human preference data across all modalities (including text, image, audio, and video), ensuring their behavior aligns with human intentions.
1 code implementation • 10 Jun 2024 • Jiaming Ji, Kaile Wang, Tianyi Qiu, Boyuan Chen, Jiayi Zhou, Changye Li, Hantao Lou, Josef Dai, Yunhuai Liu, Yaodong Yang
Empirically, we demonstrate the elasticity of post-alignment models, i.e., the tendency to revert to the behavior distribution formed during the pre-training phase upon further fine-tuning.
no code implementations • 4 Feb 2024 • Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Tianyi Qiu, Yaodong Yang
However, the tension between the complexity of current alignment methods and the need for rapid iteration in deployment scenarios necessitates the development of a model-agnostic alignment approach that can operate under these constraints.
no code implementations • 30 Oct 2023 • Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao
The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks.
1 code implementation • Cell Research 2022 • Hantao Lou, Jian-Qing Zheng, Xiaohang Leo Fang, Zhu Liang, Meihan Zhang, Yu Chen, Chunmei Wang, Xuetao Cao
The COVID-19 pandemic has been ongoing for nearly two and a half years, and new variants of concern (VOCs) of SARS-CoV-2 continue to emerge, urging the development of broadly neutralizing antibodies.