no code implementations • 8 May 2025 • Zhaohan Feng, Ruiqi Xue, Lei Yuan, Yang Yu, Ning Ding, Meiqin Liu, Bingzhao Gao, Jian Sun, Gang Wang
Embodied artificial intelligence (Embodied AI) plays a pivotal role in the application of advanced technologies in the intelligent era, where AI systems are integrated with physical bodies that enable them to perceive, reason, and interact with their environments.
no code implementations • 30 Jan 2025 • Xufeng Cai, Ziwei Guan, Lei Yuan, Ali Selman Aydin, Tengyu Xu, Boying Liu, Wenbo Ren, Renkai Xiang, Songyi He, Haichuan Yang, Serena Li, Mingze Gao, Yue Weng, Ji Liu
Modern recommendation systems can be broadly divided into two key stages: the ranking stage, where the system predicts various user engagements (e.g., click-through rate, like rate, follow rate, watch time), and the value model stage, which aggregates these predictive scores through a function (e.g., a linear combination defined by a weight vector) to measure the value of each piece of content with a single numerical score.
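As a toy illustration of the value-model stage described above, the sketch below aggregates hypothetical engagement predictions with a hand-picked weight vector to produce one score per candidate; the engagement columns and weights are assumptions for the example, not values from the paper.

import numpy as np

# Hypothetical engagement predictions from the ranking stage for 3 candidate items.
# Column order (assumed for illustration): click-through rate, like rate, follow rate, watch time.
scores = np.array([
    [0.12, 0.05, 0.01, 35.0],
    [0.08, 0.09, 0.02, 20.0],
    [0.15, 0.02, 0.00, 55.0],
])

# Value model stage: a linear combination defined by a weight vector
# (the weights here are made up for the example).
weights = np.array([1.0, 0.8, 2.0, 0.01])

value = scores @ weights          # one numerical score per candidate
ranking = np.argsort(-value)      # order candidates by value, highest first
print(value, ranking)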
no code implementations • 19 Nov 2024 • Yongyan Wen, Siyuan Li, Rongchang Zuo, Lei Yuan, Hangyu Mao, Peng Liu
To achieve explainability, decision trees have emerged as a popular and promising alternative to neural networks.
no code implementations • 16 Nov 2024 • Feng Chen, Fuguang Han, Cong Guan, Lei Yuan, Zhilong Zhang, Yang Yu, Zongzhang Zhang
Given the inherent non-stationarity prevalent in real-world applications, continual Reinforcement Learning (RL) aims to equip the agent with the capability to address a series of sequentially presented decision-making tasks.
1 code implementation • 4 Jul 2024 • Yi-Chen Li, Fuxiang Zhang, Wenjie Qiu, Lei Yuan, Chengxing Jia, Zongzhang Zhang, Yang Yu, Bo An
Thanks to the residual Q-learning framework, we can recover the customized LLM from the pre-trained LLM and the residual Q-function, without access to the reward function $r_1$.
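As a hedged sketch of what the residual Q-learning composition can look like, assuming a maximum-entropy formulation with temperature $\alpha$ (the notation here is illustrative, not taken from the paper):

\[
\pi_{\text{custom}}(a \mid s) \;\propto\; \pi_{\text{pre}}(a \mid s)\,\exp\!\big(Q_{\text{res}}(s,a)/\alpha\big),
\]

where $\pi_{\text{pre}}$ is the pre-trained LLM and $Q_{\text{res}}$ is a residual Q-function trained only from the customization signal, so the original reward $r_1$ never needs to be recovered.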
1 code implementation • 12 Mar 2024 • Chengxing Jia, Fuxiang Zhang, Yi-Chen Li, Chen-Xiao Gao, Xu-Hui Liu, Lei Yuan, Zongzhang Zhang, Yang Yu
Specifically, the objective of adversarial data augmentation is not merely to generate data analogous to offline data distribution; instead, it aims to create adversarial examples designed to confound learned task representations and lead to incorrect task identification.
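To make the adversarial-augmentation objective concrete, here is a minimal sketch in that spirit: perturb offline transitions by gradient ascent on a task-identification loss so that the learned task representation is confounded. The network shapes, the PGD-style inner loop, and all hyperparameters are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn

# Hypothetical task-identification head: maps a transition (s, a, r, s') to task logits.
# Dimensions and the two-layer architecture are assumptions for illustration.
transition_dim, num_tasks = 12, 4
task_identifier = nn.Sequential(
    nn.Linear(transition_dim, 64), nn.ReLU(), nn.Linear(64, num_tasks)
)

def adversarial_augment(transitions, task_labels, epsilon=0.1, steps=5, lr=0.05):
    """Perturb offline transitions so the learned task identifier misclassifies them."""
    delta = torch.zeros_like(transitions, requires_grad=True)
    opt = torch.optim.SGD([delta], lr=lr)
    for _ in range(steps):
        logits = task_identifier(transitions + delta)
        # Ascend on the identification loss: the perturbed data should confound task identification.
        loss = -nn.functional.cross_entropy(logits, task_labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)  # keep perturbations small so data stays realistic
    return (transitions + delta).detach()

# Usage with dummy data.
batch = torch.randn(32, transition_dim)
labels = torch.randint(0, num_tasks, (32,))
adversarial_batch = adversarial_augment(batch, labels)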
1 code implementation • 17 Feb 2024 • Xinyu Zhang, Wenjie Qiu, Yi-Chen Li, Lei Yuan, Chengxing Jia, Zongzhang Zhang, Yang Yu
DORA incorporates an information bottleneck principle that maximizes mutual information between the dynamics encoding and the environmental data, while minimizing mutual information between the dynamics encoding and the actions of the behavior policy.
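A rough sketch of how such a two-sided mutual-information objective can be wired up, with the first MI term realized as a reconstruction-style lower bound and the second as an adversarially trained action predictor; the architectures, bounds, and hyperparameters are illustrative assumptions rather than DORA's actual design.

import torch
import torch.nn as nn

# Illustrative dimensions; everything here is an assumption for the sketch.
obs_dim, act_dim, z_dim = 8, 2, 16

encoder = nn.GRU(input_size=obs_dim + act_dim, hidden_size=z_dim, batch_first=True)
dynamics_head = nn.Linear(z_dim + obs_dim + act_dim, obs_dim)  # predicts next state from (z, s, a)
action_head = nn.Linear(z_dim + obs_dim, act_dim)              # tries to predict behavior action from (z, s)

enc_opt = torch.optim.Adam(list(encoder.parameters()) + list(dynamics_head.parameters()), lr=3e-4)
act_opt = torch.optim.Adam(action_head.parameters(), lr=3e-4)

def train_step(obs, act, next_obs, beta=0.1):
    """One update of an information-bottleneck-style objective (illustrative sketch)."""
    seq = torch.cat([obs, act], dim=-1)
    _, h = encoder(seq)
    z = h.squeeze(0)

    # 1) Maximize MI(z; environment data): a reconstruction-style lower bound,
    #    realized here as next-state prediction error.
    pred_next = dynamics_head(torch.cat([z, obs[:, -1], act[:, -1]], dim=-1))
    dyn_loss = ((pred_next - next_obs[:, -1]) ** 2).mean()

    # 2) Minimize MI(z; behavior-policy actions): fit an action predictor on z,
    #    then push the encoder to make that prediction hard (adversarial term).
    act_pred = action_head(torch.cat([z.detach(), obs[:, -1]], dim=-1))
    act_loss = ((act_pred - act[:, -1]) ** 2).mean()
    act_opt.zero_grad()
    act_loss.backward()
    act_opt.step()

    adv_pred = action_head(torch.cat([z, obs[:, -1]], dim=-1))
    adv_loss = -((adv_pred - act[:, -1]) ** 2).mean()  # encoder wants this error to be large

    enc_loss = dyn_loss + beta * adv_loss
    enc_opt.zero_grad()
    enc_loss.backward()
    enc_opt.step()
    return dyn_loss.item(), act_loss.item()

# Dummy usage: a batch of 32 trajectories of length 10.
obs = torch.randn(32, 10, obs_dim)
act = torch.randn(32, 10, act_dim)
next_obs = torch.randn(32, 10, obs_dim)
train_step(obs, act, next_obs)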
no code implementations • 1 Nov 2023 • Cong Guan, Lichao Zhang, Chunpeng Fan, Yichen Li, Feng Chen, Lihe Li, Yunjia Tian, Lei Yuan, Yang Yu
Developing intelligent agents capable of seamless coordination with humans is a critical step towards achieving artificial general intelligence.
1 code implementation • 10 May 2023 • Lei Yuan, Zi-Qian Zhang, Ke Xue, Hao Yin, Feng Chen, Cong Guan, Li-He Li, Chao Qian, Yang Yu
Concretely, to avoid the ego-system overfitting to a specific attacker, we maintain a set of attackers, which is optimized to guarantee both high attack quality and behavioral diversity among the attackers.
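A minimal sketch of maintaining such an attacker set by jointly scoring attack quality and behavioral diversity; the two scoring functions below are placeholders (in practice they would come from roll-outs against the ego-system), not the paper's actual criteria.

import numpy as np

rng = np.random.default_rng(0)

def attack_quality(attacker):
    """Placeholder: roll out the ego-system against this attacker and return how much
    it reduces the team's return (higher = stronger attack)."""
    return float(rng.normal())

def behavior_descriptor(attacker):
    """Placeholder: a feature vector summarizing the attacker's behavior (e.g., which
    agents or channels it tends to perturb)."""
    return rng.normal(size=4)

def select_population(candidates, k=5, diversity_weight=0.5):
    """Keep k attackers scoring well on a combination of attack quality and diversity."""
    qualities = np.array([attack_quality(a) for a in candidates])
    descs = np.stack([behavior_descriptor(a) for a in candidates])
    # Diversity of a candidate = mean distance of its behavior descriptor to all others.
    dists = np.linalg.norm(descs[:, None, :] - descs[None, :, :], axis=-1)
    diversity = dists.mean(axis=1)
    score = qualities + diversity_weight * diversity
    keep = np.argsort(-score)[:k]
    return [candidates[i] for i in keep]

# Dummy usage: candidate "attackers" are just ids here; in practice they are policies.
population = select_population(list(range(12)), k=5)
print(population)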
no code implementations • 9 May 2023 • Lei Yuan, Feng Chen, Zongzhang Zhang, Yang Yu
Specifically, we introduce a novel message-attacking approach that models the learning of the auxiliary attacker as a cooperative problem under a shared goal of minimizing the coordination ability of the ego system, under which every information channel may suffer from distinct message attacks.
no code implementations • 7 May 2023 • Lei Yuan, Lihe Li, Ziqian Zhang, Fuxiang Zhang, Cong Guan, Yang Yu
To tackle this issue, this paper proposes an approach called Multi-Agent Continual Coordination via Progressive Task Contextualization, dubbed MACPro.
no code implementations • 7 May 2023 • Lei Yuan, Tao Jiang, Lihe Li, Feng Chen, Zongzhang Zhang, Yang Yu
Many multi-agent scenarios require message sharing among agents to promote coordination, which makes the robustness of multi-agent communication critical when policies are deployed in environments with message perturbations.
no code implementations • 19 Feb 2023 • Cong Guan, Feng Chen, Lei Yuan, Zongzhang Zhang, Yang Yu
We also release the offline benchmarks built in this paper as a testbed for validating communication ability, in order to facilitate future research.
1 code implementation • 5 Jan 2023 • Shaowei Zhang, Jiahan Cao, Lei Yuan, Yang Yu, De-Chuan Zhan
In cooperative multi-agent reinforcement learning (CMARL), it is critical for agents to achieve a balance between self-exploration and team collaboration.
1 code implementation • 13 Oct 2022 • Ke Xue, Jiacheng Xu, Lei Yuan, Miqing Li, Chao Qian, Zongzhang Zhang, Yang Yu
MA-DAC formulates the dynamic configuration of a complex algorithm with multiple types of hyperparameters as a contextual multi-agent Markov decision process and solves it by a cooperative multi-agent RL (MARL) algorithm.
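A toy rendering of the contextual multi-agent formulation: one agent per hyperparameter type, a joint action setting all hyperparameters for the next step of the target algorithm, and a shared reward given by the performance improvement. The stand-in target algorithm and random agents below are illustrative assumptions, not MA-DAC itself.

import numpy as np

rng = np.random.default_rng(1)

# Toy target problem: "run" an algorithm for one step under the chosen hyperparameters
# and report a performance measure. Everything here is a stand-in for illustration.
def run_target_algorithm_step(state, hyperparams):
    # Pretend performance improves when hyperparameters are near a problem-dependent optimum.
    optimum = state["context"]
    gap = np.abs(np.array(hyperparams) - optimum).sum()
    new_perf = state["perf"] + max(0.0, 1.0 - gap) + 0.01 * rng.normal()
    return {"context": optimum, "perf": new_perf}

# Each hyperparameter type is controlled by one agent; here agents act randomly.
# In MA-DAC these would be cooperative MARL policies conditioned on the shared context.
num_agents, horizon = 3, 10
state = {"context": rng.uniform(0, 1, size=num_agents), "perf": 0.0}

for t in range(horizon):
    joint_action = [rng.uniform(0, 1) for _ in range(num_agents)]  # one value per hyperparameter type
    next_state = run_target_algorithm_step(state, joint_action)
    shared_reward = next_state["perf"] - state["perf"]             # all agents receive the same reward
    state = next_state
    print(f"step {t}: joint_action={np.round(joint_action, 2)}, reward={shared_reward:.3f}")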
1 code implementation • 9 Aug 2022 • Ke Xue, Yutong Wang, Cong Guan, Lei Yuan, Haobo Fu, Qiang Fu, Chao Qian, Yang Yu
Generating agents that can achieve zero-shot coordination (ZSC) with unseen partners is a new challenge in cooperative multi-agent reinforcement learning (MARL).
no code implementations • 1 Jun 2022 • Chengxing Jia, Hao Yin, Chenxiao Gao, Tian Xu, Lei Yuan, Zongzhang Zhang, Yang Yu
Model-based offline optimization with dynamics-aware policy provides a new perspective for policy learning and out-of-distribution generalization, where the learned policy could adapt to different dynamics enumerated at the training stage.
1 code implementation • 1 Jun 2022 • Yi Guo, Zhaocheng Liu, Jianchao Tan, Chao Liao, Sen Yang, Lei Yuan, Dongying Kong, Zhi Chen, Ji Liu
When training is finished, some gates are exactly zero while others are around one. This property is particularly favorable for the practical hot-start training common in industry, since removing the features corresponding to exact-zero gates does no damage to the model performance.
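The hot-start property mentioned above can be illustrated directly: once some gates are exactly zero, the corresponding features (and weight columns) can be removed without changing the model's outputs. The gate values below are hand-set for the demonstration rather than produced by the paper's training procedure.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Suppose training has finished and the learned gates are exactly zero for some
# features and one for the rest, as described above. These gate values are hand-set.
num_features = 8
gates = torch.tensor([1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0])

dense = nn.Linear(num_features, 4)
x = torch.randn(5, num_features)

# Full model: gate the inputs, then apply the dense layer.
full_out = dense(x * gates)

# Pruned model for hot-start: drop the exact-zero-gate features and the matching
# columns of the weight matrix, then feed only the surviving features.
keep = gates.nonzero().squeeze(-1)
pruned = nn.Linear(len(keep), 4)
with torch.no_grad():
    pruned.weight.copy_(dense.weight[:, keep] * gates[keep])
    pruned.bias.copy_(dense.bias)
pruned_out = pruned(x[:, keep])

# Outputs match exactly: removing exact-zero-gate features does not change predictions.
print(torch.allclose(full_out, pruned_out, atol=1e-6))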
1 code implementation • ACL 2022 • Renyu Zhu, Lei Yuan, Xiang Li, Ming Gao, Wenyuan Cai
In this paper, we consider human behaviors and propose the PGNN-EK model that consists of two main components.
no code implementations • 9 Mar 2022 • Rongjun Qin, Feng Chen, Tonghan Wang, Lei Yuan, Xiaoran Wu, Zongzhang Zhang, Chongjie Zhang, Yang Yu
We demonstrate that the task representation can capture the relationship among tasks, and can generalize to unseen tasks.
1 code implementation • 10 Nov 2021 • Xiangru Lian, Binhang Yuan, XueFeng Zhu, Yulong Wang, Yongjun He, Honghuan Wu, Lei Sun, Haodong Lyu, Chengjun Liu, Xing Dong, Yiqiao Liao, Mingnan Luo, Congfei Zhang, Jingru Xie, Haonan Li, Lei Chen, Renjie Huang, Jianying Lin, Chengchun Shu, Xuezhong Qiu, Zhishan Liu, Dongying Kong, Lei Yuan, Hai Yu, Sen Yang, Ce Zhang, Ji Liu
Specifically, to ensure both training efficiency and training accuracy, we design a novel hybrid training algorithm in which the embedding layer and the dense neural network are handled by different synchronization mechanisms; we then build a system called Persia (short for parallel recommendation training system with hybrid acceleration) to support this hybrid training algorithm.
no code implementations • 26 Sep 2021 • Jiahan Cao, Lei Yuan, Jianhao Wang, Shaowei Zhang, Chongjie Zhang, Yang Yu, De-Chuan Zhan
Through long-term observation, agents can build awareness of their teammates to alleviate the problem of partial observability.
no code implementations • 20 Aug 2021 • Weicong Ding, Hanlin Tang, Jingshuo Feng, Lei Yuan, Sen Yang, Guangxu Yang, Jie Zheng, Jing Wang, Qiang Su, Dong Zheng, Xuezhong Qiu, Yongqi Liu, Yuxuan Chen, Yang Liu, Chao Song, Dongying Kong, Kai Ren, Peng Jiang, Qiao Lian, Ji Liu
In this setting with multiple and constrained goals, this paper discovers that a probabilistic strategic parameter regime can achieve better value compared to the standard regime of finding a single deterministic parameter.
3 code implementations • ICLR 2021 • Daochen Zha, Wenye Ma, Lei Yuan, Xia Hu, Ji Liu
Unfortunately, methods based on intrinsic rewards often fall short in procedurally-generated environments, where a different environment is generated in each episode so that the agent is not likely to visit the same state more than once.
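A toy illustration of the failure mode described above, using the generic count-based bonus (not the paper's method): in a fixed environment the bonus decays as states recur, but in a procedurally-generated one raw states almost never repeat, so the bonus stays saturated and gives no useful exploration signal.

from collections import defaultdict
import random

random.seed(0)

# A classic count-based intrinsic bonus: reward = 1 / sqrt(visit count of the state).
visit_counts = defaultdict(int)

def intrinsic_reward(state):
    visit_counts[state] += 1
    return 1.0 / visit_counts[state] ** 0.5

# Fixed environment: the same states recur, so the bonus decays over time and
# exploration is directed toward genuinely novel states.
fixed_env_states = [random.randrange(10) for _ in range(1000)]
print(sum(intrinsic_reward(("fixed", s)) for s in fixed_env_states) / 1000)

# Procedurally-generated environment: each episode has a fresh layout, so raw states
# essentially never repeat and the bonus stays saturated near 1 everywhere.
visit_counts.clear()
procgen_states = [(episode, random.randrange(10)) for episode in range(1000)]
print(sum(intrinsic_reward(("procgen", s)) for s in procgen_states) / 1000)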
no code implementations • 23 Oct 2020 • Yunjie Zhang, Fei Tao, Xudong Liu, Runze Su, Xiaorong Mei, Weicong Ding, Zhichen Zhao, Lei Yuan, Ji Liu
In this paper, we propose a novel end-to-end self-organizing framework for user behavior prediction.
no code implementations • 14 Sep 2020 • Runze Su, Fei Tao, Xudong Liu, Hao-Ran Wei, Xiaorong Mei, Zhiyao Duan, Lei Yuan, Ji Liu, Yuying Xie
Applications of short-term user-generated video (UGV), such as Snapchat and YouTube short videos, have boomed recently, raising many multimodal machine learning tasks.
no code implementations • 17 Jul 2019 • Hanlin Tang, Xiangru Lian, Shuang Qiu, Lei Yuan, Ce Zhang, Tong Zhang, Ji Liu
Since decentralized training has been shown to be superior to traditional centralized training in communication-restricted scenarios, a natural question to ask is "how to apply the error-compensated technology to decentralized learning to further reduce the communication cost."
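As a hedged sketch of what "error-compensated" means in this context, the generic error-feedback pattern is shown below: the part of each gradient dropped by the compressor is stored and added back before the next compression, so nothing is lost permanently. This is the textbook pattern, not the specific decentralized algorithm proposed in the paper.

import numpy as np

rng = np.random.default_rng(0)

def top_k_compress(v, k):
    """Keep only the k largest-magnitude entries (a common gradient compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

# Error-compensated compression loop.
error = np.zeros(10)
for step in range(3):
    grad = rng.normal(size=10)
    corrected = grad + error                      # re-inject previously dropped information
    compressed = top_k_compress(corrected, k=2)   # what actually gets communicated
    error = corrected - compressed                # remember what was dropped this round
    print(f"step {step}: sent {np.count_nonzero(compressed)} of 10 entries, "
          f"residual norm {np.linalg.norm(error):.3f}")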
no code implementations • 30 Apr 2013 • Ji Liu, Lei Yuan, Jieping Ye
Specifically, we show that 1) in the noiseless case, if the condition number of $D$ is bounded and the measurement number satisfies $n\geq \Omega(s\log(p))$, where $s$ is the sparsity level, then the true solution can be recovered with high probability; and 2) in the noisy case, if the condition number of $D$ is bounded and the measurement number increases faster than $s\log(p)$, that is, $s\log(p)=o(n)$, then the estimation error converges to zero with probability 1 as $p$ and $s$ go to infinity.
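For context, a hedged restatement of the kind of estimator this line of work analyzes, assuming the standard analysis-$\ell_1$ (generalized Lasso) form; the exact formulation and constants studied in the paper may differ:

\[
\hat{x} \;=\; \arg\min_{x}\ \|Dx\|_1 \quad \text{s.t.}\quad \|Ax - y\|_2 \le \epsilon,
\]

where $A\in\mathbb{R}^{n\times p}$ is the measurement matrix, $y = Ax^* + \text{noise}$, and setting $\epsilon = 0$ gives the noiseless case referenced above.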
no code implementations • NeurIPS 2011 • Lei Yuan, Jun Liu, Jieping Ye
There have been several recent attempts to study a more general formulation, where groups of features are given, potentially with overlaps between the groups.
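Concretely, the overlapping-group formulation referred to here is typically written with a group-wise $\ell_2$ penalty; the notation below is the generic form, with the loss, weights, and groups left as assumptions:

\[
\min_{x}\ \ \ell(x) \;+\; \lambda \sum_{g=1}^{G} w_g \,\bigl\| x_{\mathcal{G}_g} \bigr\|_2,
\]

where $\mathcal{G}_1,\dots,\mathcal{G}_G \subseteq \{1,\dots,p\}$ are the given feature groups (which may overlap), $x_{\mathcal{G}_g}$ denotes the sub-vector of $x$ indexed by $\mathcal{G}_g$, and $w_g>0$ are group weights.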