no code implementations • 16 Apr 2024 • Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu
However, in academic benchmarks, state-of-the-art results are often achieved via reward-free methods, such as Direct Preference Optimization (DPO).
1 code implementation • 29 Jun 2023 • Zhiyu Mei, Wei Fu, Guangju Wang, Huanchen Zhang, Yi Wu
In a large-scale cluster, the novel architecture of SRL leads to up to 3.7x speedup compared to the design choices adopted by the existing libraries.
no code implementations • 15 Jun 2022 • Wei Fu, Chao Yu, Zelai Xu, Jiaqi Yang, Yi Wu
Despite all the advantages, we revisit these two principles and show that in certain scenarios, e.g., environments with a highly multi-modal reward landscape, value decomposition and parameter sharing can be problematic and lead to undesired outcomes.
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning, +2
no code implementations • ICLR 2022 • Zihan Zhou, Wei Fu, Bingliang Zhang, Yi Wu
We present Reward-Switching Policy Optimization (RSPO), a paradigm to discover diverse strategies in complex RL environments by iteratively finding novel policies that are both locally optimal and sufficiently different from existing ones.
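The iterative-discovery idea behind RSPO can be illustrated with a toy sketch (this is not the paper's algorithm, which operates on RL policies via reward switching; here a "policy" is just a scalar parameter, and the reward, hill-climbing optimizer, and diversity threshold are all hypothetical stand-ins):

```python
import math
import random

def local_opt(reward, x, step=0.01, iters=500):
    """Simple hill climbing to a local optimum of a 1-D reward."""
    for _ in range(iters):
        for nxt in (x - step, x + step):
            if reward(nxt) > reward(x):
                x = nxt
    return x

def discover_diverse(reward, n_policies=3, min_gap=0.5, seed=0):
    """Iteratively find local optima, keeping only those sufficiently
    far from every optimum already discovered."""
    rng = random.Random(seed)
    found = []
    while len(found) < n_policies:
        x = local_opt(reward, rng.uniform(0.0, 3.0))
        if all(abs(x - y) >= min_gap for y in found):
            found.append(x)
    return sorted(found)

# Multi-modal reward with peaks at x = 0.25 + k for integer k.
reward = lambda x: math.sin(2 * math.pi * x)
found = discover_diverse(reward)
print(found)
```

Each run of the loop returns a locally optimal parameter; candidates too close to earlier discoveries are rejected, so the three returned values sit on three distinct peaks.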
no code implementations • 5 Feb 2019 • Amritanshu Agrawal, Wei Fu, Di Chen, Xipeng Shen, Tim Menzies
Machine learning techniques applied to software engineering tasks can be improved by hyperparameter optimization, i.e., automatic tools that find good settings for a learner's control parameters.
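The simplest such tool is exhaustive grid search. A minimal sketch, assuming only a generic scoring function (the `toy_score` objective below is a hypothetical stand-in for a learner's cross-validated score):

```python
from itertools import product

def grid_search(score_fn, param_grid):
    """Evaluate every combination in param_grid and return the
    best-scoring setting (higher is better)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*param_grid.values()):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical stand-in for a learner's cross-validated score,
# maximized at max_depth=6, min_samples=4.
def toy_score(max_depth, min_samples):
    return -((max_depth - 6) ** 2) - abs(min_samples - 4)

best, score = grid_search(toy_score, {"max_depth": range(1, 11),
                                      "min_samples": range(1, 9)})
print(best, score)  # {'max_depth': 6, 'min_samples': 4} 0
```

Grid search scales poorly with the number of parameters, which is why smarter searchers (like the differential evolution discussed below) are often preferred.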
no code implementations • 14 Feb 2018 • Suvodeep Majumder, Nikhila Balaji, Katie Brey, Wei Fu, Tim Menzies
Deep learners utilize extensive computational power and can take a long time to train, making it difficult to widely validate, repeat, and improve their results.
1 code implementation • 1 Mar 2017 • Wei Fu, Tim Menzies
While deep learning is an exciting new technique, the benefits of this method need to be assessed with respect to its computational cost.
1 code implementation • 1 Mar 2017 • Wei Fu, Tim Menzies
(1) There is much variability in the efficacy of the Yang et al. predictors so even with their approach, some supervised data is required to prune weaker predictors away.
no code implementations • 8 Sep 2016 • Wei Fu, Vivek Nair, Tim Menzies
In software analytics, at least for defect prediction, several methods, such as grid search and differential evolution (DE), have been proposed to learn these parameters, and they have been shown to improve the performance scores of learners.
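Differential evolution in its standard DE/rand/1/bin form can be sketched in a few lines (this is a generic textbook variant, not the paper's exact configuration; the two-parameter objective is a hypothetical stand-in for a learner's error):

```python
import random

def differential_evolution(fitness, bounds, pop_size=20, F=0.8, CR=0.9,
                           iters=100, seed=0):
    """Minimize fitness over box-constrained parameters (DE/rand/1/bin)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [fitness(x) for x in pop]
    for _ in range(iters):
        for i in range(pop_size):
            # Mutate: combine three other random members.
            a, b, c = rng.sample([x for j, x in enumerate(pop) if j != i], 3)
            j_rand = rng.randrange(dim)  # ensure at least one mutated gene
            trial = [
                min(max(a[d] + F * (b[d] - c[d]), bounds[d][0]), bounds[d][1])
                if (rng.random() < CR or d == j_rand) else pop[i][d]
                for d in range(dim)
            ]
            # Select: keep the trial if it is no worse.
            s = fitness(trial)
            if s <= scores[i]:
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

# Hypothetical error surface over two hyperparameters, minimized at (0.3, 0.7).
best_hp, err = differential_evolution(
    lambda p: (p[0] - 0.3) ** 2 + (p[1] - 0.7) ** 2,
    [(0.0, 1.0), (0.0, 1.0)])
print(best_hp, err)
```

Compared to grid search, DE adapts its candidate distribution toward good regions, so it usually needs far fewer fitness evaluations for the same quality of result.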
no code implementations • 29 Aug 2016 • Amritanshu Agrawal, Wei Fu, Tim Menzies
When run on different datasets, LDA suffers from "order effects", i.e., different topics are generated if the order of the training data is shuffled.