no code implementations • 16 Nov 2023 • Andrew Zhao, Erle Zhu, Rui Lu, Matthieu Lin, Yong-Jin Liu, Gao Huang
Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark for model-free methods, recording an 86% IQM and a 16% Optimality Gap.
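As a rough illustration of the two metrics named above, the sketch below computes an interquartile mean and an optimality gap over a list of normalized scores, using the common definitions: the mean of the middle 50% of runs, and the average shortfall below a target score. The function names, the sample scores, and the target of 1.0 are assumptions made for illustration and are not taken from the paper's evaluation code.

```python
from typing import Sequence


def interquartile_mean(scores: Sequence[float]) -> float:
    """Mean of the middle 50% of scores (drops the bottom and top 25%)."""
    ordered = sorted(scores)
    n = len(ordered)
    k = int(0.25 * n)  # number of scores trimmed from each end
    middle = ordered[k:n - k]
    return sum(middle) / len(middle)


def optimality_gap(scores: Sequence[float], target: float = 1.0) -> float:
    """Average amount by which scores fall short of the target.

    Scores at or above the target contribute zero, so a lower gap is better.
    """
    return sum(max(target - s, 0.0) for s in scores) / len(scores)


# Hypothetical normalized scores, one per run (illustrative values only).
example_scores = [0.0, 0.5, 0.8, 0.9, 1.0, 1.2, 1.5, 2.0]
iqm = interquartile_mean(example_scores)
gap = optimality_gap(example_scores)
print(f"IQM = {iqm:.3f}, optimality gap = {gap:.3f}")
```

Because the IQM discards the extreme quartiles, it is less sensitive to outlier runs than a plain mean, while the optimality gap only penalizes runs that fall short of the target.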
no code implementations • 29 Oct 2023 • Nan He, Hanyu Lai, Chenyang Zhao, Zirui Cheng, Junting Pan, Ruoyu Qin, Ruofan Lu, Rui Lu, Yunchen Zhang, Gangming Zhao, Zhaohui Hou, Zhiyuan Huang, Shaoqing Lu, Ding Liang, Mingjie Zhan
Based on TeacherLM-7.1B, we augmented 58 NLP datasets and taught various student models with different parameters from OPT and BLOOM series in a multi-task setting.
1 code implementation • 19 Oct 2023 • Aohan Zeng, Mingdao Liu, Rui Lu, Bowen Wang, Xiao Liu, Yuxiao Dong, Jie Tang
Though many prompting methods have been proposed to complete particular agent tasks, there is a lack of research focusing on improving the agent capabilities of LLMs themselves without compromising their general abilities.
2 code implementations • NeurIPS 2023 • Yang Yue, Rui Lu, Bingyi Kang, Shiji Song, Gao Huang
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
1 code implementation • ICCV 2023 • Yulin Wang, Yang Yue, Rui Lu, Tianjiao Liu, Zhao Zhong, Shiji Song, Gao Huang
It is also effective for self-supervised learning (e.g., MAE).
no code implementations • 31 May 2022 • Rui Lu, Andrew Zhao, Simon S. Du, Gao Huang
While multitask representation learning has become a popular approach in reinforcement learning (RL) to boost sample efficiency, the theoretical understanding of why and how it works is still limited.
2 code implementations • CVPR 2022 • Xuran Pan, Chunjiang Ge, Rui Lu, Shiji Song, Guanfu Chen, Zeyi Huang, Gao Huang
In this paper, we show that there exists a strong underlying relation between them, in the sense that the bulk of computations of these two paradigms are in fact done with the same operation.
no code implementations • 15 Jun 2021 • Rui Lu, Gao Huang, Simon S. Du
We first discover a \emph{Least-Activated-Feature-Abundance} (LAFA) criterion, denoted as $\kappa$, with which we prove that a straightforward least-square algorithm learns a policy which is $\tilde{O}(H^2\sqrt{\frac{\mathcal{C}(\Phi)^2 \kappa d}{NT}+\frac{\kappa d}{n}})$ sub-optimal.
no code implementations • 20 Aug 2020 • Liyi Guo, Rui Lu, Haoqi Zhang, Junqi Jin, Zhenzhe Zheng, Fan Wu, Jin Li, Haiyang Xu, Han Li, Wenkai Lu, Jian Xu, Kun Gai
For e-commerce platforms such as Taobao and Amazon, advertisers play an important role in the entire digital ecosystem: their behaviors explicitly influence users' browsing and shopping experience; more importantly, advertisers' expenditure on advertising constitutes a primary source of platform revenue.
1 code implementation • ICCV 2019 • Rui Lu, Feng Xue, Menghan Zhou, Anlong Ming, Yu Zhou
On one hand, considering the relevance between edge and orientation, two sub-networks are designed to share the occlusion cue.
no code implementations • 21 Mar 2019 • Rui Lu, Menghan Zhou, Anlong Ming, Yu Zhou
Occlusion edge detection requires both accurate locations and context constraints of the contour.