Search Results for author: Weilin Liu

Found 3 papers, 1 paper with code

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

no code implementations • 16 Apr 2024 • Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu

However, in academic benchmarks, state-of-the-art results are often achieved via reward-free methods, such as Direct Preference Optimization (DPO).
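For quick reference (this is the standard DPO objective from Rafailov et al., 2023, not a formula taken from this listing), DPO trains the policy directly on preference pairs, with no separately learned reward model:

\[
  \mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
\]

where \(\pi_\theta\) is the policy being trained, \(\pi_{\mathrm{ref}}\) a frozen reference model, \((x, y_w, y_l)\) a prompt with a preferred and a dispreferred response, \(\sigma\) the logistic function, and \(\beta\) a temperature hyperparameter.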

Code Generation

Multi-Agent Vulnerability Discovery for Autonomous Driving with Hazard Arbitration Reward

no code implementations • 12 Dec 2021 • Weilin Liu, Ye Mu, Chao Yu, Xuefei Ning, Zhong Cao, Yi Wu, Shuang Liang, Huazhong Yang, Yu Wang

These scenarios correspond to vulnerabilities of the driving policies under test and are therefore useful for further improving those policies.

Autonomous Driving • Multi-agent Reinforcement Learning
