Search Results for author: Wei Fu

Found 11 papers, 4 papers with code

ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation

1 code implementation • 20 Jun 2024 • Zhiyu Mei, Wei Fu, Kaiwei Li, Guangju Wang, Huanchen Zhang, Yi Wu

Based on this formulation, ReaLHF employs a tailored search algorithm with a lightweight cost estimator to discover an efficient execution plan.

Language Modelling · Large Language Model

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

no code implementations • 16 Apr 2024 • Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu

However, in academic benchmarks, state-of-the-art results are often achieved via reward-free methods, such as Direct Preference Optimization (DPO).

Code Generation

SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores

2 code implementations • 29 Jun 2023 • Zhiyu Mei, Wei Fu, Jiaxuan Gao, Guangju Wang, Huanchen Zhang, Yi Wu

Following this abstraction, we develop a scalable, efficient, and extensible distributed RL system called ReaLlyScalableRL, which allows efficient and massively parallelized training and easy development of customized algorithms.

reinforcement-learning · Reinforcement Learning (RL)

Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning

no code implementations • 15 Jun 2022 • Wei Fu, Chao Yu, Zelai Xu, Jiaqi Yang, Yi Wu

Despite these advantages, we revisit the two principles and show that in certain scenarios, e.g., environments with a highly multi-modal reward landscape, value decomposition and parameter sharing can be problematic and lead to undesired outcomes.

Multi-agent Reinforcement Learning · reinforcement-learning · +2

Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization

no code implementations • ICLR 2022 • Zihan Zhou, Wei Fu, Bingliang Zhang, Yi Wu

We present Reward-Switching Policy Optimization (RSPO), a paradigm to discover diverse strategies in complex RL environments by iteratively finding novel policies that are both locally optimal and sufficiently different from existing ones.

Continuous Control · Diversity
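
The snippet above describes RSPO's core loop: repeatedly find a policy that is locally optimal, while switching to a novelty objective whenever a candidate gets too close to an already-discovered solution. A minimal sketch of that reward-switching idea on a toy discrete landscape (my paraphrase with made-up names, not the authors' algorithm or code):

```python
def rspo_toy(reward, n_actions, n_policies=2, min_dist=2):
    """Toy sketch of the reward-switching idea: repeatedly pick the best
    action under the task reward, but near already-discovered solutions
    switch to a novelty objective (distance from the archive), so each new
    'policy' is forced to differ from the previous ones."""
    archive = []

    def score(x):
        d = min((abs(x - b) for b in archive), default=min_dist)
        if d < min_dist:           # too close to an archived solution:
            return d - 1_000_000   # the novelty reward dominates the task reward
        return reward(x)

    for _ in range(n_policies):
        archive.append(max(range(n_actions), key=score))
    return archive

# A multi-modal landscape with peaks at action 7 (height 5) and action 2 (height 3).
landscape = [0, 1, 3, 1, 0, 0, 1, 5, 1, 0]
print(rspo_toy(landscape.__getitem__, 10))  # [7, 2]
```

The first pass finds the global peak; the second is repelled from its neighborhood and settles on the second mode, mirroring how RSPO discovers diverse strategies one after another.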

How to "DODGE" Complex Software Analytics?

no code implementations • 5 Feb 2019 • Amritanshu Agrawal, Wei Fu, Di Chen, Xipeng Shen, Tim Menzies

Machine learning techniques applied to software engineering tasks can be improved by hyperparameter optimization, i.e., automatic tools that find good settings for a learner's control parameters.

BIG-bench Machine Learning · Hyperparameter Optimization
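
As context for the snippet above, the simplest form of such an automatic tool is exhaustive search over a grid of candidate settings. A generic sketch (not the paper's DODGE method; the function names and toy objective are mine):

```python
from itertools import product

def grid_search(objective, grid):
    """Exhaustive hyperparameter search: evaluate every combination in
    `grid` (a dict of parameter name -> candidate values) and return the
    lowest-scoring setting. This is the brute-force baseline that smarter
    tuners in this literature try to improve on."""
    names = list(grid)
    best_cfg, best_score = None, float("inf")
    for values in product(*(grid[n] for n in names)):
        cfg = dict(zip(names, values))
        score = objective(cfg)  # e.g., cross-validation error of a learner
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy objective standing in for a learner's error as a function of its settings.
cfg, err = grid_search(lambda c: (c["depth"] - 4) ** 2 + c["lr"],
                       {"depth": [2, 4, 8], "lr": [0.1, 0.01]})
# cfg == {"depth": 4, "lr": 0.01}
```

The cost grows multiplicatively with each added parameter, which is why the paper asks how to "dodge" such expensive analytics.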

500+ Times Faster Than Deep Learning (A Case Study Exploring Faster Methods for Text Mining StackOverflow)

no code implementations • 14 Feb 2018 • Suvodeep Majumder, Nikhila Balaji, Katie Brey, Wei Fu, Tim Menzies

Deep learners use extensive computational power and can take a long time to train, making it difficult to widely validate, repeat, and improve their results.

Clustering

Revisiting Unsupervised Learning for Defect Prediction

1 code implementation • 1 Mar 2017 • Wei Fu, Tim Menzies

(1) There is much variability in the efficacy of the Yang et al. predictors, so even with their approach some supervised data is required to prune the weaker predictors away.

Easy over Hard: A Case Study on Deep Learning

1 code implementation • 1 Mar 2017 • Wei Fu, Tim Menzies

While deep learning is an exciting new technique, the benefits of this method need to be assessed with respect to its computational cost.

Why is Differential Evolution Better than Grid Search for Tuning Defect Predictors?

no code implementations • 8 Sep 2016 • Wei Fu, Vivek Nair, Tim Menzies

In software analytics, at least for defect prediction, several methods, such as grid search and differential evolution (DE), have been proposed to learn these parameters and have been shown to improve learners' performance scores.
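
For reference, the DE contrasted with grid search above is a population-based optimizer. A minimal sketch of the classic DE/rand/1/bin scheme on a toy objective (a generic illustration, not the paper's implementation; names and parameters are mine):

```python
import random

def de_tune(objective, bounds, pop_size=20, f=0.5, cr=0.9, generations=60, seed=1):
    """Minimal differential evolution (DE/rand/1/bin) over a box of
    hyperparameter ranges. `objective` is minimized; in the defect-prediction
    setting it would be the error of a learner trained with the candidate
    settings."""
    rng = random.Random(seed)
    dim = len(bounds)
    clip = lambda v, k: min(max(v, bounds[k][0]), bounds[k][1])
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutate: combine three other random population members.
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated dim
            trial = [
                clip(a[k] + f * (b[k] - c[k]), k)
                if (rng.random() < cr or k == j_rand) else pop[i][k]
                for k in range(dim)
            ]
            s = objective(trial)
            if s <= scores[i]:  # greedy selection keeps the better vector
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

# Toy stand-in for "tune two learner parameters": minimize a quadratic bowl.
params, loss = de_tune(lambda p: (p[0] - 3) ** 2 + (p[1] + 1) ** 2,
                       bounds=[(-10, 10), (-10, 10)])
```

Unlike grid search, DE evaluates a fixed budget of candidates regardless of how finely the parameter space is discretized, which is central to the paper's comparison.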

What is Wrong with Topic Modeling? (and How to Fix it Using Search-based Software Engineering)

no code implementations • 29 Aug 2016 • Amritanshu Agrawal, Wei Fu, Tim Menzies

When run on different datasets, LDA suffers from "order effects", i.e., different topics are generated if the order of the training data is shuffled.

General Classification
