Search Results for author: Chaoqi Wang

Found 14 papers, 9 papers with code

Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment

1 code implementation · 16 Jan 2025 · Chaoqi Wang, Zhuokai Zhao, Yibo Jiang, Zhaorun Chen, Chen Zhu, Yuxin Chen, Jiayi Liu, Lizhu Zhang, Xiangjun Fan, Hao Ma, Sinong Wang

As a drop-in enhancement to the existing RLHF workflow, our causal reward modeling provides a practical way to improve the trustworthiness and fairness of LLM finetuning.

Causal Inference · counterfactual +4

GRAPE: Generalizing Robot Policy via Preference Alignment

no code implementations · 28 Nov 2024 · Zijian Zhang, Kaiyuan Zheng, Zhaorun Chen, Joel Jang, Yi Li, Chaoqi Wang, Mingyu Ding, Dieter Fox, Huaxiu Yao

Notably, these constraints are flexible and can be customized to align the model with varying objectives, such as safety, efficiency, or task success.

Preference Optimization with Multi-Sample Comparisons

no code implementations · 16 Oct 2024 · Chaoqi Wang, Zhuokai Zhao, Chen Zhu, Karthik Abinav Sankararaman, Michal Valko, Xuefei Cao, Zhaorun Chen, Madian Khabsa, Yuxin Chen, Hao Ma, Sinong Wang

However, current post-training methods such as reinforcement learning from human feedback (RLHF) and direct alignment from preference methods (DAP) primarily utilize single-sample comparisons.

Diversity
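The single- vs multi-sample distinction above can be made concrete with a toy Bradley-Terry-style loss. This is an illustrative sketch, not the paper's actual objective: it simply contrasts comparing one preferred log-likelihood against one dispreferred log-likelihood with comparing the average log-likelihood of a *set* of preferred responses against a set of dispreferred ones, so that set-level properties (e.g. diversity) can enter the comparison.

```python
import math

def bt_single(logp_w, logp_l):
    """Single-sample preference loss: -log sigmoid(logp_w - logp_l)."""
    return -math.log(1.0 / (1.0 + math.exp(-(logp_w - logp_l))))

def bt_multi(logps_w, logps_l):
    """Multi-sample variant (illustrative): compare the mean
    log-likelihood of a preferred set against a dispreferred set."""
    margin = sum(logps_w) / len(logps_w) - sum(logps_l) / len(logps_l)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With singleton sets the two coincide; the multi-sample form only differs once each side contains several responses.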

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

1 code implementation · 5 Jul 2024 · Zhaorun Chen, Yichao Du, Zichen Wen, Yiyang Zhou, Chenhang Cui, Zhenzhen Weng, Haoqin Tu, Chaoqi Wang, Zhengwei Tong, Qinglan Huang, Canyu Chen, Qinghao Ye, Zhihong Zhu, Yuqing Zhang, Jiawei Zhou, Zhuokai Zhao, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

Compared with open-source VLMs, smaller-sized scoring models can provide better feedback regarding text-image alignment and image quality, while VLMs provide more accurate feedback regarding safety and generation bias due to their stronger reasoning capabilities.

Hallucination · Text-to-Image Generation

Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints

1 code implementation · 28 Sep 2023 · Chaoqi Wang, Yibo Jiang, Chenghao Yang, Han Liu, Yuxin Chen

The increasing capabilities of large language models (LLMs) raise opportunities for artificial general intelligence but concurrently amplify safety concerns, such as potential misuse of AI systems, necessitating effective AI alignment.
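For context on what is being generalized: the reverse-KL special case of this framework is standard DPO, whose loss is well known. The sketch below shows only that base case (the paper itself extends the constraint to a family of divergences beyond reverse KL); `beta` is the usual KL-constraint strength.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss (the reverse-KL special case):
    -log sigmoid(beta * (policy/ref log-ratio of the preferred
    response minus that of the dispreferred response))."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(sigmoid(margin))
```

Swapping reverse KL for another divergence changes the functional form of this loss; the reverse-KL case is shown because it is the standard, widely documented one.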

Active Policy Improvement from Multiple Black-box Oracles

1 code implementation · 17 Jun 2023 · Xuefeng Liu, Takuma Yoneda, Chaoqi Wang, Matthew R. Walter, Yuxin Chen

We introduce MAPS and MAPS-SE, a class of policy improvement algorithms that perform imitation learning from multiple suboptimal oracles.

Imitation Learning · Reinforcement Learning (RL)

Teaching an Active Learner with Contrastive Examples

no code implementations · NeurIPS 2021 · Chaoqi Wang, Adish Singla, Yuxin Chen

Our focus is to design a teaching algorithm that can provide an informative sequence of contrastive examples to the learner to speed up the learning process.

Active Learning

Beyond Marginal Uncertainty: How Accurately can Bayesian Regression Models Estimate Posterior Predictive Correlations?

1 code implementation · 6 Nov 2020 · Chaoqi Wang, Shengyang Sun, Roger Grosse

While uncertainty estimation is a well-studied topic in deep learning, most such work focuses on marginal uncertainty estimates, i.e., the predictive mean and variance at individual input locations.

Active Learning · Benchmarking +1
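The distinction between marginal uncertainty and predictive correlations is easy to see in a model where the posterior covariance is available in closed form. This is a generic Gaussian-process sketch (not the paper's benchmark): the off-diagonal entries of the posterior covariance over test points carry the predictive correlations that per-point variances alone cannot express.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def posterior_cov(x_train, x_test, noise=0.1):
    """GP posterior covariance over test points:
    K_** - K_*^T (K + noise*I)^{-1} K_*."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    Kss = rbf(x_test, x_test)
    return Kss - Ks.T @ np.linalg.solve(K, Ks)

x_tr = np.array([0.0, 1.0, 2.0])
x_te = np.array([0.5, 0.6])
S = posterior_cov(x_tr, x_te)
# Predictive correlation between the two nearby test points:
corr = S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])
```

For nearby inputs under a smooth kernel the predictive correlation is strongly positive, information that is invisible if one reports only the two marginal variances `S[0, 0]` and `S[1, 1]`.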

Picking Winning Tickets Before Training by Preserving Gradient Flow

3 code implementations · ICLR 2020 · Chaoqi Wang, Guodong Zhang, Roger Grosse

Overparameterization has been shown to benefit both the optimization and generalization of neural networks, but large networks are resource hungry at both training and test time.

Network Pruning
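The mechanics of picking a sparse subnetwork before training can be sketched in a few lines. This is a toy simplification: it scores weights at initialization with the first-order proxy `|weight * gradient|` and keeps the top fraction, whereas the paper's actual criterion (GraSP) scores weights by their effect on gradient flow via a Hessian-gradient product.

```python
import numpy as np

def prune_by_score(weights, scores, keep_frac):
    """One-shot pruning before training: keep the top `keep_frac`
    fraction of weights by score, zeroing out the rest."""
    flat = np.sort(scores.flatten())
    k = max(1, int(round(keep_frac * flat.size)))
    threshold = flat[-k]  # k-th largest score
    mask = (scores >= threshold).astype(weights.dtype)
    return weights * mask, mask

# Toy example with random "weights" and a stand-in "gradient" at init.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
g = rng.normal(size=(4, 4))
scores = np.abs(w * g)          # first-order saliency proxy
pruned, mask = prune_by_score(w, scores, keep_frac=0.5)
```

The mask is then held fixed while the surviving weights are trained normally, which is what makes the pruning "before training".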

EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis

1 code implementation · 15 May 2019 · Chaoqi Wang, Roger Grosse, Sanja Fidler, Guodong Zhang

Reducing the test time resource requirements of a neural network while preserving test accuracy is crucial for running inference on resource-constrained devices.

Network Pruning

Three Mechanisms of Weight Decay Regularization

no code implementations · ICLR 2019 · Guodong Zhang, Chaoqi Wang, Bowen Xu, Roger Grosse

Weight decay is one of the standard tricks in the neural network toolbox, but the reasons for its regularization effect are poorly understood, and recent results have cast doubt on the traditional interpretation in terms of $L_2$ regularization.
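One concrete reason the $L_2$ interpretation is doubted: the equivalence between L2 regularization and weight decay holds for plain SGD but breaks once per-parameter adaptive scaling (as in Adam) is introduced. The sketch below shows only the plain-SGD case where the two updates coincide; it is an illustration of the general point, not a reproduction of the paper's analysis.

```python
def sgd_l2_step(w, grad, lr, lam):
    """SGD with an L2 penalty: the penalty's gradient lam*w is added
    to the loss gradient before the update."""
    return w - lr * (grad + lam * w)

def sgd_weight_decay_step(w, grad, lr, lam):
    """SGD with decoupled weight decay: shrink w directly, separately
    from the loss gradient."""
    return w - lr * grad - lr * lam * w
```

With an adaptive optimizer the L2 gradient `lam*w` gets rescaled by the per-parameter preconditioner while decoupled decay does not, so the two updates diverge, which is part of why weight decay's regularization effect resists a single simple explanation.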

A Revisit on Deep Hashings for Large-scale Content Based Image Retrieval

no code implementations · 16 Nov 2017 · Deng Cai, Xiuye Gu, Chaoqi Wang

However, there are serious flaws in the evaluations of existing deep hashing papers: (1) The datasets they used are too small and simple to simulate the real CBIR situation.

Content-Based Image Retrieval · Deep Hashing
