1 code implementation • 16 Jan 2025 • Chaoqi Wang, Zhuokai Zhao, Yibo Jiang, Zhaorun Chen, Chen Zhu, Yuxin Chen, Jiayi Liu, Lizhu Zhang, Xiangjun Fan, Hao Ma, Sinong Wang
As a drop-in enhancement to the existing RLHF workflow, our causal reward modeling provides a practical way to improve the trustworthiness and fairness of LLM finetuning.
no code implementations • 28 Nov 2024 • Zijian Zhang, Kaiyuan Zheng, Zhaorun Chen, Joel Jang, Yi Li, Chaoqi Wang, Mingyu Ding, Dieter Fox, Huaxiu Yao
Notably, these constraints are flexible and can be customized to align the model with varying objectives, such as safety, efficiency, or task success.
1 code implementation • 21 Oct 2024 • Yun He, Di Jin, Chaoqi Wang, Chloe Bi, Karishma Mandyam, Hejia Zhang, Chen Zhu, Ning li, Tengyu Xu, Hongjiang Lv, Shruti Bhosale, Chenguang Zhu, Karthik Abinav Sankararaman, Eryk Helenowski, Melanie Kambadur, Aditya Tayade, Hao Ma, Han Fang, Sinong Wang
To address this gap, we introduce Multi-IF, a new benchmark designed to assess LLMs' proficiency in following multi-turn and multilingual instructions.
no code implementations • 16 Oct 2024 • Chaoqi Wang, Zhuokai Zhao, Chen Zhu, Karthik Abinav Sankararaman, Michal Valko, Xuefei Cao, Zhaorun Chen, Madian Khabsa, Yuxin Chen, Hao Ma, Sinong Wang
However, current post-training methods such as reinforcement learning from human feedback (RLHF) and direct alignment from preference methods (DAP) primarily utilize single-sample comparisons.
1 code implementation • 5 Jul 2024 • Zhaorun Chen, Yichao Du, Zichen Wen, Yiyang Zhou, Chenhang Cui, Zhenzhen Weng, Haoqin Tu, Chaoqi Wang, Zhengwei Tong, Qinglan Huang, Canyu Chen, Qinghao Ye, Zhihong Zhu, Yuqing Zhang, Jiawei Zhou, Zhuokai Zhao, Rafael Rafailov, Chelsea Finn, Huaxiu Yao
Compared with open-source VLMs, smaller-sized scoring models can provide better feedback regarding text-image alignment and image quality, while VLMs provide more accurate feedback regarding safety and generation bias due to their stronger reasoning capabilities.
1 code implementation • 28 Sep 2023 • Chaoqi Wang, Yibo Jiang, Chenghao Yang, Han Liu, Yuxin Chen
The increasing capabilities of large language models (LLMs) present opportunities for artificial general intelligence but concurrently amplify safety concerns, such as potential misuse of AI systems, necessitating effective AI alignment.
1 code implementation • 17 Jun 2023 • Xuefeng Liu, Takuma Yoneda, Chaoqi Wang, Matthew R. Walter, Yuxin Chen
We introduce MAPS and MAPS-SE, a class of policy improvement algorithms that perform imitation learning from multiple suboptimal oracles.
no code implementations • NeurIPS 2021 • Chaoqi Wang, Adish Singla, Yuxin Chen
Our focus is to design a teaching algorithm that can provide an informative sequence of contrastive examples to the learner to speed up the learning process.
1 code implementation • 6 Nov 2020 • Chaoqi Wang, Shengyang Sun, Roger Grosse
While uncertainty estimation is a well-studied topic in deep learning, most such work focuses on marginal uncertainty estimates, i.e., the predictive mean and variance at individual input locations.
3 code implementations • ICLR 2020 • Chaoqi Wang, Guodong Zhang, Roger Grosse
Overparameterization has been shown to benefit both the optimization and generalization of neural networks, but large networks are resource-hungry at both training and test time.
1 code implementation • 15 May 2019 • Chaoqi Wang, Roger Grosse, Sanja Fidler, Guodong Zhang
Reducing the test time resource requirements of a neural network while preserving test accuracy is crucial for running inference on resource-constrained devices.
no code implementations • ICLR 2019 • Guodong Zhang, Chaoqi Wang, Bowen Xu, Roger Grosse
Weight decay is one of the standard tricks in the neural network toolbox, but the reasons for its regularization effect are poorly understood, and recent results have cast doubt on the traditional interpretation in terms of $L_2$ regularization.
4 code implementations • ICML 2018 • Shengyang Sun, Guodong Zhang, Chaoqi Wang, Wenyuan Zeng, Jiaman Li, Roger Grosse
The NKN architecture is based on the composition rules for kernels, so that each unit of the network corresponds to a valid kernel.
no code implementations • 16 Nov 2017 • Deng Cai, Xiuye Gu, Chaoqi Wang
However, there are serious flaws in the evaluations of existing deep hashing papers: (1) The datasets they used are too small and simple to simulate the real CBIR situation.