Search Results for author: Hantao Lou

Found 7 papers, 4 papers with code

SAE-V: Interpreting Multimodal Models for Enhanced Alignment

no code implementations · 22 Feb 2025 · Hantao Lou, Changye Li, Jiaming Ji, Yaodong Yang

In text-only LLMs, Sparse Autoencoders (SAEs) have gained attention for their ability to interpret latent representations.
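To make the idea concrete, here is a minimal sketch of how an SAE interprets a latent representation: an overcomplete ReLU encoder produces a sparse feature vector, and a linear decoder reconstructs the original activation. The shapes, initialization, and L1 coefficient are illustrative assumptions, not the SAE-V implementation from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_sae = 8, 32  # activation size; overcomplete SAE width (assumed values)
W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

def encode(x):
    # ReLU encoder: many features stay exactly zero, giving a sparse code
    return np.maximum(0.0, x @ W_enc + b_enc)

def decode(f):
    # Linear decoder reconstructs the original activation from the sparse code
    return f @ W_dec + b_dec

x = rng.normal(size=d_model)   # stand-in for one LLM activation vector
f = encode(x)                  # sparse, nonnegative feature activations
x_hat = decode(f)

# Typical training objective: reconstruction error plus an L1 sparsity penalty
loss = ((x - x_hat) ** 2).sum() + 0.01 * np.abs(f).sum()
```

The nonzero entries of `f` are the "features" one inspects when interpreting the model; the L1 term keeps them few.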

Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction

1 code implementation · 9 Jan 2025 · Hantao Lou, Jiaming Ji, Kaile Wang, Yaodong Yang

The rapid advancement of large language models (LLMs) has led to significant improvements in their capabilities, but also to increased concerns about their alignment with human values and intentions.

Tasks: Math, Sentence

Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback

1 code implementation · 20 Dec 2024 · Jiaming Ji, Jiayi Zhou, Hantao Lou, Boyuan Chen, Donghai Hong, Xuyao Wang, Wenqi Chen, Kaile Wang, Rui Pan, Jiahao Li, Mohan Wang, Josef Dai, Tianyi Qiu, Hua Xu, Dong Li, WeiPeng Chen, Jun Song, Bo Zheng, Yaodong Yang

In this work, we make the first attempt to fine-tune all-modality models (i.e., models whose inputs and outputs can be any modality, also called any-to-any models) using human preference data across all modalities (including text, image, audio, and video), ensuring their behavior aligns with human intentions.

Tasks: All, Instruction Following

Language Models Resist Alignment: Evidence From Data Compression

1 code implementation · 10 Jun 2024 · Jiaming Ji, Kaile Wang, Tianyi Qiu, Boyuan Chen, Jiayi Zhou, Changye Li, Hantao Lou, Josef Dai, Yunhuai Liu, Yaodong Yang

Empirically, we demonstrate the elasticity of post-alignment models, i.e., their tendency to revert, upon further fine-tuning, to the behavior distribution formed during the pre-training phase.

Tasks: Data Compression

Aligner: Efficient Alignment by Learning to Correct

no code implementations · 4 Feb 2024 · Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Tianyi Qiu, Yaodong Yang

However, the tension between the complexity of current alignment methods and the need for rapid iteration in deployment scenarios necessitates the development of a model-agnostic alignment approach that can operate under these constraints.

Tasks: Hallucination

AI Alignment: A Comprehensive Survey

no code implementations · 30 Oct 2023 · Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao

The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks.

Tasks: Survey
