no code implementations • 11 Dec 2024 • YuAn Liu, Le Tian, Xiao Zhou, Xinyu Gao, Kavio Yu, Yang Yu, Jie zhou
Due to the scarcity of open-source Chinese datasets for vision-language models, we collect numerous images from the Internet and annotate them using a combination of manual and automatic methods.
no code implementations • 29 Nov 2024 • Yi Liu, Yang Yu
This paper explores the problem of mediated communication enhanced by money-burning tactics for commitment power.
no code implementations • 22 Nov 2024 • Jiashuo Liang, Guancheng Li, Yang Yu
Large language models (LLMs) have been widely adopted in applications such as automated content generation and even critical decision-making systems.
no code implementations • 16 Nov 2024 • Feng Chen, Fuguang Han, Cong Guan, Lei Yuan, Zhilong Zhang, Yang Yu, Zongzhang Zhang
Given the inherent non-stationarity prevalent in real-world applications, continual Reinforcement Learning (RL) aims to equip the agent with the capability to address a series of sequentially presented decision-making tasks.
no code implementations • 8 Nov 2024 • Zhilong Zhang, Ruifeng Chen, Junyin Ye, Yihao Sun, Pengyuan Wang, JingCheng Pang, Kaiyuan Li, Tianshuo Liu, Haoxin Lin, Yang Yu, Zhi-Hua Zhou
Incorporating these two techniques, we present Whale-ST, a scalable spatial-temporal transformer-based world model with enhanced generalizability.
no code implementations • 22 Oct 2024 • Yang Yu, Yuezun Li, Xin Sun, Junyu Dong
Phytoplankton are a crucial component of aquatic ecosystems, and effective monitoring of them can provide valuable insights into ocean environments and ecosystem changes.
1 code implementation • 17 Oct 2024 • Caigao Jiang, Xiang Shu, Hong Qian, Xingyu Lu, Jun Zhou, Aimin Zhou, Yang Yu
Namely, the accuracy of most current LLM-based methods and the generality of optimization problem types that they can model are still limited.
no code implementations • 22 Aug 2024 • Fanxu Min, Qing Cai, Shaoxiang Guo, Yang Yu, Hao Fan, Junyu Dong
Current gait recognition research predominantly focuses on extracting appearance features effectively, but the performance is severely compromised by the vulnerability of silhouettes under unconstrained scenes.
1 code implementation • 3 Aug 2024 • Yang Yu, Chen Xu, Kai Wang
Adapter-based fine-tuning has been studied as a way to improve the performance of SAM on downstream tasks.
1 code implementation • 17 Jul 2024 • Xu-Hui Liu, Tian-Shuo Liu, Shengyi Jiang, Ruifeng Chen, Zhilong Zhang, Xinwei Chen, Yang Yu
Combining offline and online reinforcement learning (RL) techniques is indeed crucial for achieving efficient and safe learning where data acquisition is expensive.
no code implementations • 9 Jul 2024 • Yang Yu
We regard HU as an auxiliary task and incorporate it into the HSI SR process by exploring the constraints between LR and HR abundances.
1 code implementation • 5 Jul 2024 • Chen-Xiao Gao, Shengjun Fang, Chenjun Xiao, Yang Yu, Zongzhang Zhang
Offline preference-based reinforcement learning (RL), which focuses on optimizing policies using human preferences between pairs of trajectory segments selected from an offline dataset, has emerged as a practical avenue for RL applications.
1 code implementation • 4 Jul 2024 • Yi-Chen Li, Fuxiang Zhang, Wenjie Qiu, Lei Yuan, Chengxing Jia, Zongzhang Zhang, Yang Yu, Bo An
Thanks to the residual Q-learning framework, we can restore the customized LLM from the pre-trained LLM and the residual Q-function, without the reward function $r_1$.
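The restoration step described above can be sketched as follows. This is an illustrative guess at the mechanics, assuming the customized policy is recovered by shifting the pre-trained model's next-token logits by the residual Q-values and renormalizing; the function names and the scaling parameter `beta` are hypothetical, not from the paper.

```python
import numpy as np

def softmax(x):
    z = x - x.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def restore_customized_policy(pretrained_logits, residual_q, beta=1.0):
    """Hypothetical sketch: combine pre-trained next-token logits with a
    residual Q-function to obtain the customized policy, without ever
    evaluating the original reward function."""
    return softmax(pretrained_logits + residual_q / beta)

# Toy vocabulary of 4 tokens; the residual Q nudges probability to token 1.
logits = np.array([2.0, 1.0, 0.5, 0.0])
res_q = np.array([0.0, 1.5, 0.0, 0.0])
policy = restore_customized_policy(logits, res_q)
```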
1 code implementation • 4 Jul 2024 • Fuxiang Zhang, Junyou Li, Yi-Chen Li, Zongzhang Zhang, Yang Yu, Deheng Ye
In this paper, we introduce a framework that harnesses LLMs to extract background knowledge of an environment, which contains general understandings of the entire environment, making various downstream RL tasks benefit from one-time knowledge representation.
no code implementations • 29 Jun 2024 • Yang Yu, Qingxuan Lv, Yuezun Li, Zhiqiang Wei, Junyu Dong
Phytoplankton, a crucial component of aquatic ecosystems, requires efficient monitoring to understand marine ecological processes and environmental conditions.
no code implementations • 27 May 2024 • Chengxing Jia, Pengyuan Wang, Ziniu Li, Yi-Chen Li, Zhilong Zhang, Nan Tang, Yang Yu
In a similar vein, our proposed system, the BWArea model, conceptualizes language generation as a decision-making task.
no code implementations • 27 May 2024 • Haoxin Lin, Yu-Yan Xu, Yihao Sun, Zhilong Zhang, Yi-Chen Li, Chengxing Jia, Junyin Ye, Jiaji Zhang, Yang Yu
In the online setting, ADMPO-ON demonstrates improved sample efficiency compared to previous state-of-the-art methods.
1 code implementation • 24 May 2024 • Fan-Ming Luo, Zuolin Tu, Zefang Huang, Yang Yu
Recent progress has demonstrated that recurrent reinforcement learning (RL), which consists of a context encoder based on recurrent neural networks (RNNs) for unobservable state prediction and a multilayer perceptron (MLP) policy for decision making, can mitigate partial observability and serve as a robust baseline for POMDP tasks.
no code implementations • 14 Apr 2024 • Jing-Cheng Pang, Si-Hang Yang, Kaiyuan Li, Jiaji Zhang, Xiong-Hui Chen, Nan Tang, Yang Yu
Furthermore, KALM effectively enables the LLM to comprehend environmental dynamics, resulting in the generation of meaningful imaginary rollouts that reflect novel skills and demonstrate the seamless integration of large language models and reinforcement learning.
1 code implementation • 29 Mar 2024 • Bo wang, Jian Li, Yang Yu, Li Liu, Zhenping Sun, Dewen Hu
Considering the complementarity of scene flow estimation in the spatial domain's focusing capability and 3D object tracking in the temporal domain's coherence, this study aims to address a comprehensive new task that can simultaneously capture fine-grained and long-term 3D motion in an online manner: long-term scene flow estimation (LSFE).
1 code implementation • 12 Mar 2024 • Chengxing Jia, Fuxiang Zhang, Yi-Chen Li, Chen-Xiao Gao, Xu-Hui Liu, Lei Yuan, Zongzhang Zhang, Yang Yu
Specifically, the objective of adversarial data augmentation is not merely to generate data analogous to offline data distribution; instead, it aims to create adversarial examples designed to confound learned task representations and lead to incorrect task identification.
no code implementations • 17 Feb 2024 • Xinyu Zhang, Wenjie Qiu, Yi-Chen Li, Lei Yuan, Chengxing Jia, Zongzhang Zhang, Yang Yu
DORA incorporates an information bottleneck principle that maximizes mutual information between the dynamics encoding and the environmental data, while minimizing mutual information between the dynamics encoding and the actions of the behavior policy.
no code implementations • 7 Feb 2024 • Ruichu Cai, Siyang Huang, Jie Qiao, Wei Chen, Yan Zeng, Keli Zhang, Fuchun Sun, Yang Yu, Zhifeng Hao
As a key component to intuitive cognition and reasoning solutions in human intelligence, causal knowledge provides great potential for reinforcement learning (RL) agents' interpretability towards decision-making by helping reduce the searching space.
no code implementations • 6 Feb 2024 • Jing-Cheng Pang, Heng-Bo Fan, Pengyuan Wang, Jia-Hao Xiao, Nan Tang, Si-Hang Yang, Chengxing Jia, Sheng-Jun Huang, Yang Yu
The rise of large language models (LLMs) has revolutionized the way that we interact with artificial intelligence systems through natural language.
1 code implementation • 4 Feb 2024 • Lanqing Li, Hai Zhang, Xinyu Zhang, Shatong Zhu, Yang Yu, Junqiao Zhao, Pheng-Ann Heng
As demonstrations, we propose a supervised and a self-supervised implementation of $I(Z; M)$, and empirically show that the corresponding optimization algorithms exhibit remarkable generalization across a broad spectrum of RL benchmarks, context shift scenarios, data qualities and deep learning architectures.
no code implementations • 26 Jan 2024 • Zaixi Zhang, Qingyong Hu, Yang Yu, Weibo Gao, Qi Liu
However, existing methods have the following limitations: (1) The links between local subgraphs are missing in subgraph federated learning.
1 code implementation • 24 Jan 2024 • Zhi-Hao Tan, Jian-Dong Liu, Xiao-Dong Bi, Peng Tan, Qin-Cheng Zheng, Hai-Tian Liu, Yi Xie, Xiao-Chuan Zou, Yang Yu, Zhi-Hua Zhou
The learnware paradigm proposed by Zhou [2016] aims to enable users to reuse numerous existing well-trained models instead of building machine learning models from scratch, with the hope of solving new user tasks even beyond models' original purposes.
no code implementations • CVPR 2024 • Yang Yu, Erting Pan, Xinya Wang, Yuheng Wu, Xiaoguang Mei, Jiayi Ma
By integrating unmixing, this work maps unpaired HSI and RGB data to a low-dimensional abundance space, greatly alleviating the difficulty of generating high-dimensional samples.
1 code implementation • 26 Dec 2023 • Renzhe Zhou, Chen-Xiao Gao, Zongzhang Zhang, Yang Yu
GENTLE employs a Task Auto-Encoder (TAE), an encoder-decoder architecture that extracts the characteristics of the tasks.
1 code implementation • 17 Dec 2023 • Haoxin Lin, Hongqiu Wu, Jiaji Zhang, Yihao Sun, Junyin Ye, Yang Yu
Real-world decision-making problems are usually accompanied by delayed rewards, which affects the sample efficiency of Reinforcement Learning, especially in the extremely delayed case where the only feedback is the episodic reward obtained at the end of an episode.
1 code implementation • 17 Dec 2023 • Ziniu Li, Tian Xu, Yang Yu
These methods, either explicitly or implicitly, learn a reward model from preference data and differ in the data used for policy optimization to unlock the generalization ability of the reward model.
no code implementations • 1 Nov 2023 • Cong Guan, Lichao Zhang, Chunpeng Fan, Yichen Li, Feng Chen, Lihe Li, Yunjia Tian, Lei Yuan, Yang Yu
Developing intelligent agents capable of seamless coordination with humans is a critical step towards achieving artificial general intelligence.
2 code implementations • 16 Oct 2023 • Ziniu Li, Tian Xu, Yushun Zhang, Zhihang Lin, Yang Yu, Ruoyu Sun, Zhi-Quan Luo
ReMax can save about 46% of GPU memory compared with PPO when training a 7B model, and it enables training on A800-80GB GPUs without the memory-saving offloading technique that PPO requires.
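The memory saving comes from dropping the learned value network; as a minimal sketch of the idea (not the full ReMax algorithm), the reward of the greedy-decoded response serves as a per-prompt baseline, so the policy-gradient weight is simply the sampled reward minus the greedy reward:

```python
import numpy as np

def remax_advantages(sample_rewards, greedy_rewards):
    """Sketch of a ReMax-style advantage: the reward of a sampled response
    minus the reward of the greedy (argmax-decoded) response for the same
    prompt, used as a variance-reducing baseline in place of a value net."""
    return np.asarray(sample_rewards) - np.asarray(greedy_rewards)

# Three prompts; each pair is (sampled-response reward, greedy-response reward).
adv = remax_advantages([1.2, 0.3, 0.9], [0.8, 0.8, 0.8])
```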
1 code implementation • NeurIPS 2023 • Yang Yu, Qi Liu, Kai Zhang, Yuren Zhang, Chao Song, Min Hou, Yuqing Yuan, Zhihao Ye, Zaixi Zhang, Sanshi Lei Yu
Specifically, we adopt a multiple pairwise ranking loss which trains the user model to capture the similarity orders between the implicitly augmented view, the explicitly augmented view, and views from other users.
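A multiple pairwise ranking loss of the kind described above can be sketched as a chain of logistic losses over adjacent pairs in the desired similarity order; this is a generic illustration, and the paper's exact pairing scheme may differ:

```python
import numpy as np

def log_sigmoid(x):
    # numerically stable log(sigmoid(x))
    return -np.logaddexp(0.0, -x)

def multiple_pairwise_ranking_loss(scores):
    """Encourage a full similarity order s_0 > s_1 > ... > s_{n-1} by
    summing a pairwise logistic loss over adjacent score pairs."""
    scores = np.asarray(scores, dtype=float)
    return float(-sum(log_sigmoid(scores[i] - scores[i + 1])
                      for i in range(len(scores) - 1)))

# Desired order: implicitly augmented view > explicitly augmented view
# > views from other users.
loss_ordered = multiple_pairwise_ranking_loss([3.0, 2.0, 0.5])
loss_shuffled = multiple_pairwise_ranking_loss([0.5, 2.0, 3.0])
```

A correctly ordered score triple incurs a lower loss than a violated order, which is what drives the user model toward the intended similarity ranking.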
no code implementations • 9 Oct 2023 • Fan-Ming Luo, Tian Xu, Xingchen Cao, Yang Yu
MOREC learns a generalizable dynamics reward function from offline data, which is subsequently employed as a transition filter in any offline MBRL method: when generating transitions, the dynamics model generates a batch of transitions and selects the one with the highest dynamics reward value.
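The transition-filter step described above can be sketched as follows; the interface is illustrative (a candidate transition is a `(state, action, next_state)` tuple and the dynamics reward is any scoring callable), not the paper's actual API:

```python
import numpy as np

def filter_transitions(candidates, dynamics_reward_fn):
    """From a batch of model-generated candidate transitions, keep the one
    with the highest dynamics reward (sketch of the transition filter)."""
    rewards = np.array([dynamics_reward_fn(t) for t in candidates])
    return candidates[int(rewards.argmax())]

# Toy example: the dynamics reward scores how plausible next_state is
# (here, hypothetically, closeness to zero).
cands = [(0.0, 1.0, 2.0), (0.0, 1.0, 0.4), (0.0, 1.0, 1.1)]
best = filter_transitions(cands, lambda t: -abs(t[2]))
```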
no code implementations • 9 Oct 2023 • Xiong-Hui Chen, Junyin Ye, Hang Zhao, Yi-Chen Li, Haoran Shi, Yu-Yan Xu, Zhihao Ye, Si-Hang Yang, Anqi Huang, Kai Xu, Zongzhang Zhang, Yang Yu
In this work, we focus on imitator learning based on only one expert demonstration.
no code implementations • 21 Sep 2023 • Zhourui Guo, Meng Yao, Yang Yu, Qiyue Yin
We assume that the interaction can be modeled as a sequence of templated questions and answers, and that there is a large corpus of previous interactions available.
1 code implementation • 12 Sep 2023 • Chen-Xiao Gao, Chenyang Wu, Mingjun Cao, Rui Kong, Zongzhang Zhang, Yang Yu
Third, we train an Advantage-Conditioned Transformer (ACT) to generate actions conditioned on the estimated advantages.
1 code implementation • 6 Sep 2023 • Yu Chen, Tingxin Li, Huiming Liu, Yang Yu
Numerous companies have started offering services based on large language models (LLM), such as ChatGPT, which inevitably raises privacy concerns as users' prompts are exposed to the model provider.
no code implementations • 26 Aug 2023 • Jiajin Luo, Baojian Zhou, Yang Yu, Ping Zhang, Xiaohui Peng, Jianglei Ma, Peiying Zhu, Jianmin Lu, Wen Tong
In order to address the lack of applicable channel models for ISAC research and evaluation, we release Sensiverse, a dataset that can be used for ISAC research.
no code implementations • 17 Aug 2023 • Yang Yu, Han Chen
Structural Health Monitoring (SHM) plays an indispensable role in ensuring the longevity and safety of infrastructure.
no code implementations • 4 Aug 2023 • Han Chen, Yang Yu, Pengtao Li
Mechanical vibration signal denoising is a pivotal task in various industrial applications, including system health monitoring and failure prediction.
1 code implementation • 3 Aug 2023 • Guanzhou Ke, Yang Yu, Guoqing Chao, Xiaoli Wang, Chenyang Xu, Shengfeng He
In this paper, we propose a novel multi-view representation disentangling method that aims to go beyond inductive biases, ensuring both interpretability and generalizability of the resulting representations.
1 code implementation • 26 Jul 2023 • Tianyu Liu, Hao Zhao, Yang Yu, Guyue Zhou, Ming Liu
However, previous studies learned within a sequence of autonomous driving datasets, resulting in unsatisfactory blurring when rotating the car in the simulator.
2 code implementations • PMLR 2023 • Yihao Sun, Jiaji Zhang, Chengxing Jia, Haoxin Lin, Junyin Ye, Yang Yu
MOBILE conducts uncertainty quantification through the inconsistency of Bellman estimations under an ensemble of learned dynamics models, which can be a better approximator to the true Bellman error, and penalizes the Bellman estimation based on this uncertainty.
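The penalized Bellman estimate can be sketched as follows: the spread of Bellman targets across the ensemble serves as the uncertainty penalty. This is a simplified illustration of the idea, assuming a mean-minus-std penalty with coefficient `beta`, not the exact MOBILE update.

```python
import numpy as np

def penalized_bellman_target(reward, gamma, next_q_ensemble, beta):
    """Bellman target penalized by ensemble inconsistency: the standard
    deviation of Bellman estimates across learned models quantifies
    uncertainty and lowers the target accordingly."""
    targets = reward + gamma * np.asarray(next_q_ensemble, dtype=float)
    return float(targets.mean() - beta * targets.std())

# With an agreeing ensemble the penalty vanishes; disagreement lowers it.
y_agree = penalized_bellman_target(1.0, 0.99, [10.0, 10.0, 10.0], beta=0.5)
y_disagree = penalized_bellman_target(1.0, 0.99, [8.0, 10.0, 12.0], beta=0.5)
```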
no code implementations • 28 Jun 2023 • Ziqiao Meng, Peilin Zhao, Yang Yu, Irwin King
Reaction and retrosynthesis prediction are fundamental tasks in computational chemistry that have recently garnered attention from both the machine learning and drug discovery communities.
1 code implementation • 15 Jun 2023 • Bo wang, Yifan Zhang, Jian Li, Yang Yu, Zhenping Sun, Li Liu, Dewen Hu
The occlusion problem remains a crucial challenge in optical flow estimation (OFE).
no code implementations • 12 Jun 2023 • Yu Chen, Yang Yu, Rongrong Ni, Yao Zhao, Haoliang Li
Next, we design a phoneme-viseme awareness module for cross-modal feature fusion and representation alignment, so that the modality gap can be reduced and the intrinsic complementarity of the two modalities can be better explored.
2 code implementations • 11 Jun 2023 • Yuhang Ran, Yi-Chen Li, Fuxiang Zhang, Zongzhang Zhang, Yang Yu
A common category of existing offline RL works is policy regularization, which typically constrains the learned policy to the distribution or support of the behavior policy.
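One well-known instance of this policy-regularization category (TD3+BC, named here only as an example of the taxonomy, not as this paper's method) adds a behavior-cloning penalty to the policy objective:

```python
import numpy as np

def regularized_policy_loss(q_value, policy_action, behavior_action, alpha=2.5):
    """TD3+BC-style policy objective: maximize the Q-value while penalizing
    deviation of the policy's action from the dataset (behavior) action,
    keeping the learned policy close to the behavior policy's support."""
    diff = np.asarray(policy_action) - np.asarray(behavior_action)
    bc_penalty = float(np.sum(diff ** 2))
    return -q_value + alpha * bc_penalty

loss_close = regularized_policy_loss(5.0, [0.1, 0.0], [0.1, 0.0])
loss_far = regularized_policy_loss(5.0, [1.0, 1.0], [0.1, 0.0])
```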
1 code implementation • 11 Jun 2023 • Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo
Adversarial imitation learning (AIL), a subset of IL methods, is particularly promising, but its theoretical foundation in the presence of unknown transitions has yet to be fully developed.
no code implementations • 5 Jun 2023 • Ziqiao Meng, Peilin Zhao, Yang Yu, Irwin King
However, the current non-autoregressive decoder does not satisfy two essential rules of electron redistribution modeling simultaneously: the electron-counting rule and the symmetry rule.
no code implementations • 23 May 2023 • Jing-Cheng Pang, Pengyuan Wang, Kaiyuan Li, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang, Yang Yu
We demonstrate that SIRLC can be applied to various NLP tasks, such as reasoning problems, text generation, and machine translation.
1 code implementation • 10 May 2023 • Lei Yuan, Zi-Qian Zhang, Ke Xue, Hao Yin, Feng Chen, Cong Guan, Li-He Li, Chao Qian, Yang Yu
Concretely, to avoid the ego-system overfitting to a specific attacker, we maintain a set of attackers, which is optimized to guarantee the attackers high attacking quality and behavior diversity.
no code implementations • 9 May 2023 • Lei Yuan, Feng Chen, Zhongzhang Zhang, Yang Yu
In specific, we introduce a novel message-attacking approach that models the learning of the auxiliary attacker as a cooperative problem under a shared goal to minimize the coordination ability of the ego system, with which every information channel may suffer from distinct message attacks.
no code implementations • 7 May 2023 • Lei Yuan, Lihe Li, Ziqian Zhang, Fuxiang Zhang, Cong Guan, Yang Yu
Towards tackling the mentioned issue, this paper proposes an approach Multi-Agent Continual Coordination via Progressive Task Contextualization, dubbed MACPro.
no code implementations • 7 May 2023 • Lei Yuan, Tao Jiang, Lihe Li, Feng Chen, Zongzhang Zhang, Yang Yu
Many multi-agent scenarios require message sharing among agents to promote coordination, which makes the robustness of multi-agent communication essential when policies are deployed in a message-perturbation environment.
1 code implementation • 3 May 2023 • Xiong-Hui Chen, Bowei He, Yang Yu, Qingyang Li, Zhiwei Qin, Wenjie Shang, Jieping Ye, Chen Ma
However, building a user simulator with no reality gap, i.e., one that can predict users' feedback exactly, is unrealistic, because users' reaction patterns are complex and the historical logs for each user are limited, which might mislead the simulator-based recommendation policy.
1 code implementation • 21 Mar 2023 • Yang Yu, Danruo Deng, Furui Liu, Yueming Jin, Qi Dou, Guangyong Chen, Pheng-Ann Heng
Open-set semi-supervised learning (Open-set SSL) considers a more practical scenario, where unlabeled data and test data contain new categories (outliers) not observed in labeled data (inliers).
no code implementations • 9 Mar 2023 • Zhengmao Zhu, YuRen Liu, Honglong Tian, Yang Yu, Kun Zhang
Playing an important role in Model-Based Reinforcement Learning (MBRL), environment models aim to predict future states based on the past.
1 code implementation • 3 Mar 2023 • Xu-Hui Liu, Feng Xu, Xinyu Zhang, Tianyuan Liu, Shengyi Jiang, Ruifeng Chen, Zongzhang Zhang, Yang Yu
In this paper, we propose a novel active imitation learning framework based on a teacher-student interaction model, in which the teacher's goal is to identify the best teaching behavior and actively affect the student's learning process.
1 code implementation • 3 Mar 2023 • Danruo Deng, Guangyong Chen, Yang Yu, Furui Liu, Pheng-Ann Heng
To address this problem, we propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
no code implementations • 19 Feb 2023 • Cong Guan, Feng Chen, Lei Yuan, Zongzhang Zhang, Yang Yu
We also release the built offline benchmarks in this paper as a testbed for communication ability validation to facilitate further future research.
no code implementations • 18 Feb 2023 • Jing-Cheng Pang, Xin-Yu Yang, Si-Hang Yang, Yang Yu
To ease the learning burden of the policy, we investigate an inside-out scheme for natural language-conditioned RL by developing a task language (TL) that is task-related and unique.
1 code implementation • 27 Jan 2023 • Ziniu Li, Tian Xu, Yang Yu, Zhi-Quan Luo
This paper considers a situation where, besides the small amount of expert data, a supplementary dataset is available, which can be collected cheaply from sub-optimal policies.
1 code implementation • 5 Jan 2023 • Shaowei Zhang, Jiahan Cao, Lei Yuan, Yang Yu, De-Chuan Zhan
In cooperative multi-agent reinforcement learning (CMARL), it is critical for agents to achieve a balance between self-exploration and team collaboration.
1 code implementation • 28 Dec 2022 • Guanzhou Ke, Guoqing Chao, Xiaoli Wang, Chenyang Xu, Yongqi Zhu, Yang Yu
To this end, we utilize a deep fusion network to fuse view-specific representations into the view-common representation, extracting high-level semantics for obtaining robust representation.
1 code implementation • 11 Dec 2022 • Yang Yu, Qi Liu, Likang Wu, Runlong Yu, Sanshi Lei Yu, Zaixi Zhang
Experiments on two public datasets show that ClusterAttack can effectively degrade the performance of FedRec systems while circumventing many defense methods, and UNION can improve the resistance of the system against various untargeted attacks, including our ClusterAttack.
no code implementations • 8 Dec 2022 • Xingxing Zhang, Yiran Liu, Xun Wang, Pengcheng He, Yang Yu, Si-Qing Chen, Wayne Xiong, Furu Wei
The input and output of most text generation tasks can be transformed to two sequences of tokens and they can be modeled using sequence-to-sequence learning modeling tools such as Transformers.
Ranked #3 on Text Summarization on SAMSum
1 code implementation • 5 Dec 2022 • Hang Zhao, Zherong Pan, Yang Yu, Kai Xu
We study the problem of learning online packing skills for irregular 3D shapes, which is arguably the most challenging setting of bin packing problems.
no code implementations • 29 Nov 2022 • Runjia Li, Yang Yu, Charlie Haywood
In this paper, we address the problem of blind deblurring with high efficiency.
no code implementations • 14 Nov 2022 • Yiran Liu, Xiao Liu, Haotian Chen, Yang Yu
We use our theoretical framework to explain why the current debiasing methods cause performance degradation.
no code implementations • 6 Nov 2022 • Haotian Chen, Lingwei Zhang, Yiran Liu, Fanchao Chen, Yang Yu
To validate our theoretical analysis, we further propose another method using our proposed Causality-Aware Self-Attention Mechanism (CASAM) to guide the model to learn the underlying causality knowledge in legal texts.
2 code implementations • ACM Multimedia 2022 • Meiyu Liang, Junping Du, Xiaowen Cao, Yang Yu, Kangkang Lu, Zhe Xue, Min Zhang
Secondly, to further improve the learning of implicit cross-media semantic associations, a semantic label association graph is constructed, and a graph convolutional network is utilized to mine the implicit semantic structures, thus guiding the learning of discriminative features of different modalities.
no code implementations • 19 Oct 2022 • Yingchun Guo, Huan He, Ye Zhu, Yang Yu
Domain generalization person re-identification (DG Re-ID) aims to directly deploy a model trained on the source domain to the unseen target domain with good generalization, which is a challenging problem and has practical value in a real-world deployment.
1 code implementation • 13 Oct 2022 • Ke Xue, Jiacheng Xu, Lei Yuan, Miqing Li, Chao Qian, Zongzhang Zhang, Yang Yu
MA-DAC formulates the dynamic configuration of a complex algorithm with multiple types of hyperparameters as a contextual multi-agent Markov decision process and solves it by a cooperative multi-agent RL (MARL) algorithm.
no code implementations • 11 Oct 2022 • Zhengbang Zhu, Rongjun Qin, JunJie Huang, Xinyi Dai, Yang Yu, Yong Yu, Weinan Zhang
The increase in the measured performance, however, can have two possible attributions: a better understanding of user preferences, and a more proactive ability to utilize human bounded rationality to seduce user over-consumption.
no code implementations • 27 Sep 2022 • Jiahan Liu, Chaochao Yan, Yang Yu, Chan Lu, Junzhou Huang, Le Ou-Yang, Peilin Zhao
In this paper, we propose a novel end-to-end graph generation model for retrosynthesis prediction, which sequentially identifies the reaction center, generates the synthons, and adds motifs to the synthons to generate reactants.
Ranked #2 on Single-step retrosynthesis on USPTO-50k
2 code implementations • 23 Sep 2022 • Ruo-Ze Liu, Zhen-Jia Pang, Zhou-Yu Meng, Wenhai Wang, Yang Yu, Tong Lu
In this work, we investigate a set of RL techniques for the full-length game of StarCraft II.
no code implementations • 21 Sep 2022 • Hui Su, Xiao Zhou, Houjin Yu, Xiaoyu Shen, YuWen Chen, Zilin Zhu, Yang Yu, Jie zhou
Large Language Models pre-trained with self-supervised learning have demonstrated impressive zero-shot generalization capabilities on a wide spectrum of tasks.
1 code implementation • 16 Sep 2022 • Lanqing Li, Liang Zeng, Ziqi Gao, Shen Yuan, Yatao Bian, Bingzhe Wu, Hengtong Zhang, Yang Yu, Chan Lu, Zhipeng Zhou, Hongteng Xu, Jia Li, Peilin Zhao, Pheng-Ann Heng
The last decade has witnessed a prosperous development of computational methods and dataset curation for AI-aided drug discovery (AIDD).
1 code implementation • 12 Sep 2022 • Haoxin Lin, Yihao Sun, Jiaji Zhang, Yang Yu
The new model-based reinforcement learning algorithm MPPVE (Model-based Planning Policy Learning with Multi-step Plan Value Estimation) shows a better utilization of the learned model and achieves a better sample efficiency than state-of-the-art model-based RL approaches.
no code implementations • 31 Aug 2022 • Chao Chen, Dawei Wang, Feng Mao, Zongzhang Zhang, Yang Yu
Semi-supervised Anomaly Detection (AD) is a data mining task that aims at learning features from partially labeled datasets to help detect outliers.
1 code implementation • 26 Aug 2022 • Guanzhou Ke, Yongqi Zhu, Yang Yu
To this end, in this paper, we propose a hybrid contrastive fusion algorithm to extract robust view-common representations from unlabeled data.
no code implementations • 23 Aug 2022 • Xinbin Liang, Yaru Liu, Yang Yu, Kaixuan Liu, Yadong Liu, Zongtan Zhou
Significance: We improve the classification performance of 3 CNNs on 2 datasets by using TRM, indicating that it can mine EEG spatial topological information.
no code implementations • 19 Aug 2022 • Rong-Jun Qin, Fan-Ming Luo, Hong Qian, Yang Yu
This paper addresses policy learning in non-stationary environments and games with continuous actions.
1 code implementation • 9 Aug 2022 • Ke Xue, Yutong Wang, Cong Guan, Lei Yuan, Haobo Fu, Qiang Fu, Chao Qian, Yang Yu
Generating agents that can achieve zero-shot coordination (ZSC) with unseen partners is a new challenge in cooperative multi-agent reinforcement learning (MARL).
no code implementations • 3 Aug 2022 • Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo
Imitation learning learns a policy from expert trajectories.
no code implementations • 20 Jul 2022 • Yang Yu, Zixu Zhao, Yueming Jin, Guangyong Chen, Qi Dou, Pheng-Ann Heng
Concretely, for trusty representation learning, we propose to incorporate pseudo labels to instruct the pair selection, obtaining more reliable representation pairs for pixel contrast.
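The confidence-based pair selection described above can be sketched as keeping only pixels whose pseudo-label confidence exceeds a threshold; this is a generic illustration of reliability filtering, and the paper's actual selection criterion may be more involved:

```python
import numpy as np

def select_reliable_pixels(probs, threshold=0.9):
    """Keep only pixels whose pseudo-label confidence exceeds a threshold,
    so contrastive pairs are built from reliable predictions."""
    probs = np.asarray(probs)          # per-pixel class probabilities
    conf = probs.max(axis=-1)          # confidence = max class probability
    pseudo = probs.argmax(axis=-1)     # pseudo label = argmax class
    mask = conf > threshold
    return pseudo[mask], mask

probs = np.array([[0.95, 0.05],   # confident -> kept
                  [0.55, 0.45],   # ambiguous -> dropped
                  [0.02, 0.98]])  # confident -> kept
labels, mask = select_reliable_pixels(probs, threshold=0.9)
```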
no code implementations • 19 Jun 2022 • Fan-Ming Luo, Tian Xu, Hang Lai, Xiong-Hui Chen, Weinan Zhang, Yang Yu
In this survey, we take a review of MBRL with a focus on the recent progress in deep RL.
no code implementations • 4 Jun 2022 • Xue-Kun Jin, Xu-Hui Liu, Shengyi Jiang, Yang Yu
Value function estimation is an indispensable subroutine in reinforcement learning, which becomes more challenging in the offline setting.
no code implementations • 3 Jun 2022 • Zheng-Mao Zhu, Xiong-Hui Chen, Hong-Long Tian, Kun Zhang, Yang Yu
Model-based methods have recently shown promise for offline reinforcement learning (RL), aiming to learn good policies from historical data without interacting with the environment.
no code implementations • 1 Jun 2022 • Fan-Ming Luo, Xingchen Cao, Rong-Jun Qin, Yang Yu
In this work, we present a dynamics-agnostic discriminator-ensemble reward learning method (DARL) within the AIL framework, capable of learning both state-action and state-only reward functions.
no code implementations • 1 Jun 2022 • Chengxing Jia, Hao Yin, Chenxiao Gao, Tian Xu, Lei Yuan, Zongzhang Zhang, Yang Yu
Model-based offline optimization with dynamics-aware policy provides a new perspective for policy learning and out-of-distribution generalization, where the learned policy could adapt to different dynamics enumerated at the training stage.
1 code implementation • 29 Mar 2022 • Yueming Jin, Yang Yu, Cheng Chen, Zixu Zhao, Pheng-Ann Heng, Danail Stoyanov
Automatic surgical scene segmentation is fundamental for facilitating cognitive intelligence in the modern operating theatre.
no code implementations • 28 Mar 2022 • Yangyang Hu, Yang Yu
On a mathematical reasoning dataset, we adopt the recently proposed abductive learning framework, and propose the ABL-Sym algorithm that combines the Transformer neural models with a symbolic mathematics library.
no code implementations • 22 Mar 2022 • Ziniu Li, Tian Xu, Yang Yu
In particular, we demonstrate that the sample complexity of the target Q-learning algorithm in [Lee and He, 2020] is $\widetilde{\mathcal O}(|\mathcal S|^2|\mathcal A|^2 (1-\gamma)^{-5}\varepsilon^{-2})$.
no code implementations • 9 Mar 2022 • Rongjun Qin, Feng Chen, Tonghan Wang, Lei Yuan, Xiaoran Wu, Zongzhang Zhang, Chongjie Zhang, Yang Yu
We demonstrate that the task representation can capture the relationship among tasks, and can generalize to unseen tasks.
1 code implementation • 24 Feb 2022 • Quan Wang, Yang Yu, Jason Pelecanos, Yiling Huang, Ignacio Lopez Moreno
In this paper, we introduce a novel language identification system based on conformer layers.
no code implementations • 5 Feb 2022 • Ziniu Li, Tian Xu, Yang Yu, Zhi-Quan Luo
First, we show that ValueDice could reduce to BC under the offline setting.
no code implementations • 28 Dec 2021 • Qixin Zhang, Wenbing Ye, Zaiyi Chen, Haoyuan Hu, Enhong Chen, Yang Yu
As a result, only limited violations of constraints or pessimistic competitive bounds could be guaranteed.
1 code implementation • 20 Dec 2021 • Chaochao Yan, Peilin Zhao, Chan Lu, Yang Yu, Junzhou Huang
To overcome this limitation, we propose an innovative retrosynthesis prediction framework that can compose novel templates beyond training templates.
Ranked #3 on Single-step retrosynthesis on USPTO-50k
no code implementations • 8 Dec 2021 • Zhenxin Wu, Qingliang Chen, Yifeng Liu, Yinqi Zhang, Chengkai Zhu, Yang Yu
Finally, using the progressive training (P), the features extracted by the model in different stages can be fully utilized and fused with each other.
1 code implementation • 2 Dec 2021 • Yang Yu, Fangzhao Wu, Chuhan Wu, Jingwei Yi, Qi Liu
We further propose a two-stage knowledge distillation method to improve the efficiency of the large PLM-based news recommendation model while maintaining its performance.
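As background for the distillation step, the standard temperature-scaled KL objective looks like the following; this is a generic knowledge-distillation sketch, not the paper's specific two-stage recipe:

```python
import numpy as np

def softmax(x, t=1.0):
    z = x / t - (x / t).max()  # stable softmax with temperature t
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Temperature-scaled KL distillation: the student matches the
    teacher's softened output distribution; the T^2 factor keeps
    gradient magnitudes comparable across temperatures."""
    p = softmax(np.asarray(teacher_logits, dtype=float), temperature)
    q = softmax(np.asarray(student_logits, dtype=float), temperature)
    return float(np.sum(p * (np.log(p) - np.log(q)))) * temperature ** 2

perfect = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
off = distillation_loss([2.0, 0.5, -1.0], [0.0, 0.5, 2.0])
```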
1 code implementation • NeurIPS 2021 • Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Qin, Wenjie Shang, Jieping Ye
Current offline reinforcement learning methods commonly learn in the policy space constrained to in-support regions by the offline dataset, in order to ensure the robustness of the outcome policies.
1 code implementation • NeurIPS 2021 • Chenyang Wu, Guoyu Yang, Zongzhang Zhang, Yang Yu, Dong Li, Wulong Liu, Jianye Hao
A belief is a distribution of states representing state uncertainty.
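The sentence above is the textbook POMDP notion; for a discrete state space, the belief is maintained with a Bayes-filter update (a standard sketch illustrating the definition, not this paper's particular algorithm):

```python
import numpy as np

def belief_update(belief, transition, likelihood):
    """Discrete Bayes-filter belief update: push the belief through the
    transition matrix, weight by the observation likelihood, renormalize."""
    predicted = belief @ transition    # b'(s') = sum_s b(s) T(s, s')
    posterior = predicted * likelihood # weight by P(o | s')
    return posterior / posterior.sum()

b = np.array([0.5, 0.5])               # uniform prior over two states
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])             # row-stochastic transition matrix
obs_likelihood = np.array([0.7, 0.1])  # observation favors state 0
b_new = belief_update(b, T, obs_likelihood)
```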
1 code implementation • NeurIPS 2021 • Xiong-Hui Chen, Shengyi Jiang, Feng Xu, Zongzhang Zhang, Yang Yu
Experiments on MuJoCo and Hand Manipulation Suite tasks show that agents deployed with our method achieve performance similar to that in the source domain, while those deployed with previous methods designed for same-modal domain adaptation suffer a larger performance gap.
no code implementations • 24 Nov 2021 • Yang Li, Kang Li, Zhen Yang, Yang Yu, Runnan Xu, Miaosen Yang
In order to solve this model, this research combines the Jaya algorithm and the interior point method (IPM) into a hybrid analysis-heuristic solution method called Jaya-IPM, where the lower and upper levels are addressed by the IPM and the Jaya algorithm respectively, and the scheduling scheme is obtained via iterations between the two levels.
no code implementations • 20 Nov 2021 • Yang Hu, Zhui Zhu, Sirui Song, Xue Liu, Yang Yu
Experimental results in an exemplary environment show that our MARL approach is able to demonstrate the effectiveness and necessity of restrictions on individual liberty for collaborative supply of public goods.
1 code implementation • ICLR 2022 • Hang Zhao, Yang Yu, Kai Xu
PCT is a full-fledged description of the state and action space of bin packing which can support packing policy learning based on deep reinforcement learning (DRL).
no code implementations • 26 Sep 2021 • Jiahan Cao, Lei Yuan, Jianhao Wang, Shaowei Zhang, Chongjie Zhang, Yang Yu, De-Chuan Zhan
During long-time observations, agents can build awareness of teammates to alleviate the problem of partial observability.
no code implementations • 3 Sep 2021 • Chuhan Wu, Fangzhao Wu, Yang Yu, Tao Qi, Yongfeng Huang, Xing Xie
Two self-supervision tasks are incorporated in UserBERT for user model pre-training on unlabeled user behavior data to empower user modeling.
no code implementations • 16 Aug 2021 • Zhao-Hua Li, Yang Yu, Yingfeng Chen, Ke Chen, Zhipeng Hu, Changjie Fan
The empirical results show that the proposed method preserves a higher cumulative reward than behavior cloning and learns a policy more consistent with the original one.
1 code implementation • 12 Aug 2021 • Jiarui Fang, Zilin Zhu, Shenggui Li, Hui Su, Yang Yu, Jie zhou, Yang You
PatrickStar uses the CPU-GPU heterogeneous memory space to store the model data.
no code implementations • 16 Jul 2021 • Yongqing Gao, Guangda Huzhang, Weijie Shen, Yawen Liu, Wen-Ji Zhou, Qing Da, Yang Yu
Recent E-commerce applications benefit from the growth of deep learning techniques.
no code implementations • 19 Jun 2021 • Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo
For some MDPs, we show that vanilla AIL has a worse sample complexity than BC.
no code implementations • ACL 2021 • Tao Qi, Fangzhao Wu, Chuhan Wu, Peiru Yang, Yang Yu, Xing Xie, Yongfeng Huang
Instead of a single user embedding, in our method each user is represented in a hierarchical interest tree to better capture their diverse and multi-grained interest in news.
1 code implementation • ICLR 2022 • Tonghan Wang, Liang Zeng, Weijun Dong, Qianlan Yang, Yang Yu, Chongjie Zhang
Learning sparse coordination graphs adaptive to the coordination dynamics among agents is a long-standing problem in cooperative multi-agent learning.
1 code implementation • ICLR 2022 • Siyuan Li, Jin Zhang, Jianhao Wang, Yang Yu, Chongjie Zhang
Although GCHRL possesses superior exploration ability by decomposing tasks via subgoals, existing GCHRL methods struggle in temporally extended tasks with sparse external rewards, since the high-level policy learning relies on external rewards.
no code implementations • 18 May 2021 • Jing-Cheng Pang, Tian Xu, Shengyi Jiang, Yu-Ren Liu, Yang Yu
To tackle the issue of limited action execution in RL, this paper first formalizes the problem as a Sparse Action Markov Decision Process (SA-MDP), in which specific actions in the action space can only be executed for a limited time.
1 code implementation • NeurIPS 2021 • Xu-Hui Liu, Zhenghai Xue, Jing-Cheng Pang, Shengyi Jiang, Feng Xu, Yang Yu
In reinforcement learning, experience replay stores past samples for further reuse.
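A minimal sketch of the experience-replay mechanism the excerpt describes (a generic fixed-capacity buffer, not the paper's specific method):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of past transitions for off-policy reuse."""

    def __init__(self, capacity):
        # deque with maxlen evicts the oldest transition once full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform sampling; prioritized variants reweight this step
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=1000)
for t in range(5):
    buf.push(t, 0, 1.0, t + 1, False)
batch = buf.sample(3)
```

Reusing stored transitions in this way decorrelates updates and improves sample efficiency relative to purely on-policy learning.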
1 code implementation • 14 Apr 2021 • Ruo-Ze Liu, Wenhai Wang, Yanjie Shen, Zhiqi Li, Yang Yu, Tong Lu
StarCraft II (SC2) is a real-time strategy game in which players produce and control multiple units to fight against the opponent's units.
1 code implementation • 19 Feb 2021 • Yang Yu, Shih-Kang Chao, Guang Cheng
We propose a distributed bootstrap method for simultaneous inference on high-dimensional massive data that are stored and processed with many machines.
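As a toy illustration of the bootstrap idea underlying this line of work (an ordinary single-machine percentile bootstrap of the mean, not the paper's distributed simultaneous-inference procedure):

```python
import numpy as np

def bootstrap_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # resample with replacement and record the statistic each time
    means = np.array([
        rng.choice(data, size=n, replace=True).mean()
        for _ in range(n_resamples)
    ])
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return lo, hi

data = np.random.default_rng(1).normal(loc=5.0, scale=2.0, size=200)
lo, hi = bootstrap_ci(data)
```

The distributed setting replaces this naive resampling with schemes that avoid shipping raw data between machines, but the resampling-then-quantile logic is the same.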
no code implementations • 10 Feb 2021 • Hong Qian, Yang Yu
In this article, we summarize derivative-free reinforcement learning methods to date and organize them along several aspects, including parameter updating, model selection, exploration, and parallel/distributed computation.
no code implementations • 10 Feb 2021 • Peiyi Zhang, Xiaodong Jiang, Ginger M Holt, Nikolay Pavlovich Laptev, Caner Komurlu, Peng Gao, Yang Yu
Hyper-parameters of time series models play an important role in time series analysis.
no code implementations • Findings (EMNLP) 2021 • Chuhan Wu, Fangzhao Wu, Yang Yu, Tao Qi, Yongfeng Huang, Qi Liu
However, existing language models are pre-trained and distilled on general corpus like Wikipedia, which has some gaps with the news domain and may be suboptimal for news intelligence.
3 code implementations • 1 Feb 2021 • Rongjun Qin, Songyi Gao, Xingyuan Zhang, Zhen Xu, Shengkai Huang, Zewen Li, Weinan Zhang, Yang Yu
We evaluate existing offline RL algorithms on NeoRL and argue that the performance of a policy should also be compared with the deterministic version of the behavior policy, instead of the dataset reward.
no code implementations • 1 Feb 2021 • Yang Yu, Hai-Feng Wang, Wen-Yuan Cui, Lin-Lin Li, Chao Liu, Bo Zhang, Hao Tian, Zhen-Yan Huo, Jie Ju, Zhi-Cun Liu, Fang Wen, Shuai Feng
We present an analysis of the spatial density structure of the outer disk from 8-14 kpc using 13,534 LAMOST DR5 OB-type stars. We observe similar flaring on the north and south sides of the disk, implying that the flaring structure is symmetric about the Galactic plane, with a scale height that ranges from 0.14 to 0.5 kpc across Galactocentric distances.
Astrophysics of Galaxies
no code implementations • 27 Jan 2021 • Yang Yu, Shangce Gao, Yirui Wang, Jiujun Cheng, Yuki Todo
The proposed method, adaptive step length based on memory selection BSO (ASBSO), applies multiple step lengths to modify the generation of new solutions, thus supplying a flexible search adapted to the corresponding problems and convergence periods.
no code implementations • 1 Jan 2021 • Xiong-Hui Chen, Yang Yu, Qingyang Li, Zhiwei Tony Qin, Wenjie Shang, Yiping Meng, Jieping Ye
Instead of increasing the fidelity of models for policy learning, we handle the distortion issue via learning to adapt to diverse simulators generated by the offline dataset.
1 code implementation • 1 Jan 2021 • Xiong-Hui Chen, Shengyi Jiang, Feng Xu, Yang Yu
Domain adaptation is a promising direction for deploying RL agents in real-world applications, where vision-based robotics tasks constitute an important part.
no code implementations • 9 Dec 2020 • Yang Yu, Zhenhao Gu, Rong Tao, Jingtian Ge, Kenglun Chang
With the continuous development of machine learning technology, major e-commerce platforms have launched machine-learning-based recommendation systems to serve large numbers of customers with diverse needs more efficiently.
no code implementations • 3 Dec 2020 • Wei zhang, Murray Campbell, Yang Yu, Sadhana Kumaravel
Human judgments of word similarity have been a popular method of evaluating the quality of word embeddings.
no code implementations • NeurIPS 2020 • Shengyi Jiang, JingCheng Pang, Yang Yu
In this work, we investigate policy learning in the condition of a few expert demonstrations and a simulator with misspecified dynamics.
no code implementations • 24 Nov 2020 • Jing Yang, Chun Ouyang, Wil M. P. van der Aalst, Arthur H. M. ter Hofstede, Yang Yu
We demonstrate the feasibility of this framework by proposing an approach underpinned by the framework for organizational model discovery, and also conduct experiments on real-life event logs to discover and evaluate organizational models.
no code implementations • 22 Nov 2020 • Shenglan Liu, Yang Yu
As a widely used method in machine learning, principal component analysis (PCA) shows excellent properties for dimensionality reduction.
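The dimensionality-reduction step the excerpt refers to can be sketched in a few lines (a standard eigendecomposition-based PCA, not this paper's specific variant):

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components."""
    Xc = X - X.mean(axis=0)            # center the data
    cov = Xc.T @ Xc / (len(X) - 1)     # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    components = vecs[:, ::-1][:, :k]  # take the top-k directions
    return Xc @ components             # coordinates in the reduced space

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # 100 samples, 5 features
Z = pca(X, k=2)                # reduced to 2 dimensions
```

The projected coordinates are centered by construction, and the retained directions maximize the variance captured among all k-dimensional linear projections.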
1 code implementation • NeurIPS 2020 • Chaochao Yan, Qianggang Ding, Peilin Zhao, Shuangjia Zheng, Jinyu Yang, Yang Yu, Junzhou Huang
Retrosynthesis is the process of recursively decomposing target molecules into available building blocks.
no code implementations • 27 Oct 2020 • Yang Yu, Rongrong Ni, Yao Zhao
Recently, AI-manipulated face techniques have developed rapidly and constantly, which has raised new security issues in society.
no code implementations • NeurIPS 2020 • Tian Xu, Ziniu Li, Yang Yu
In this paper, we firstly analyze the value gap between the expert policy and imitated policies by two imitation methods, behavioral cloning and generative adversarial imitation.
no code implementations • Knowledge Based Systems 2020 • Chao Wu, Qingyu Xiong, Hualing Yi, Yang Yu, Qiwu Zhu, Min Gao, Jie Chen
In this paper, we propose a novel end-to-end multiple-element joint detection model (MEJD), which effectively extracts all (target, aspect, sentiment) triples from a sentence.
Aspect-Based Sentiment Analysis (ABSA) +2
no code implementations • 16 Oct 2020 • Xiao Liu, Jiajie Zhang, Siting Li, Zuotong Wu, Yang Yu
We discover that pixel normalization causes object entanglement by in-painting the area occupied by ablated objects.
no code implementations • 9 Oct 2020 • Jiarui Fang, Yang Yu, Chengduo Zhao, Jie zhou
This paper designed a transformer serving system called TurboTransformers, which consists of a computing runtime and a serving framework to solve the above challenges.
1 code implementation • 4 Aug 2020 • Sirui Song, Zefang Zong, Yong Li, Xue Liu, Yang Yu
Saving lives or the economy is a dilemma for epidemic control in most cities, while smart-tracing technology raises people's privacy concerns.
5 code implementations • ICLR 2021 • Jianhao Wang, Zhizhou Ren, Terry Liu, Yang Yu, Chongjie Zhang
This paper presents a novel MARL approach, called duPLEX dueling multi-agent Q-learning (QPLEX), which takes a duplex dueling network architecture to factorize the joint value function.
no code implementations • 29 Jun 2020 • Shenglan Liu, Yang Yu
Manifold Learning occupies a vital role in the field of nonlinear dimensionality reduction and its ideas also serve for other relevant methods.
no code implementations • LREC 2020 • Linrui Zhang, Hsin-Lun Huang, Yang Yu, Dan Moldovan
As opposed to the traditional machine learning models which require considerable effort in designing task specific features, our model can be well adapted to the proposed tasks with a very limited amount of fine-tuning, which significantly reduces the manual effort in feature engineering.
no code implementations • 16 Apr 2020 • Tianyu Liu, Qinghai Liao, Lu Gan, Fulong Ma, Jie Cheng, Xupeng Xie, Zhe Wang, Yingbing Chen, Yilong Zhu, Shuyang Zhang, Zhengyong Chen, Yang Liu, Meng Xie, Yang Yu, Zitong Guo, Guang Li, Peidong Yuan, Dong Han, Yuying Chen, Haoyang Ye, Jianhao Jiao, Peng Yun, Zhenhua Xu, Hengli Wang, Huaiyang Huang, Sukai Wang, Peide Cai, Yuxiang Sun, Yandong Liu, Lujia Wang, Ming Liu
Moreover, many countries have imposed tough lockdown measures during the pandemic to reduce virus transmission (e.g., closing retail and catering), which causes inconvenience in daily life.
no code implementations • 25 Mar 2020 • Guangda Huzhang, Zhen-Jia Pang, Yongqing Gao, Yawen Liu, Weijie Shen, Wen-Ji Zhou, Qing Da, An-Xiang Zeng, Han Yu, Yang Yu, Zhi-Hua Zhou
The framework consists of an evaluator that generalizes to evaluate recommendations involving the context, a generator that maximizes the evaluator score via reinforcement learning, and a discriminator that ensures the generalization of the evaluator.
1 code implementation • 1 Mar 2020 • Chao Wang, Ruo-Ze Liu, Han-Jia Ye, Yang Yu
We disclose that a classically fully trained feature extractor can leave little embedding space for unseen classes, which keeps the model from well-fitting the new classes.
no code implementations • ICML 2020 • Yang Yu, Shih-Kang Chao, Guang Cheng
In this paper, we propose a bootstrap method applied to massive data processed distributedly in a large number of machines.
no code implementations • 19 Feb 2020 • Chi-Hua Wang, Yang Yu, Botao Hao, Guang Cheng
In this paper, we propose a novel perturbation-based exploration method for bandit algorithms with bounded or unbounded rewards, called residual bootstrap exploration (ReBoot).
no code implementations • 6 Feb 2020 • Wen-Ji Zhou, Yang Yu
Hierarchical reinforcement learning (HRL) helps address large-scale and sparse reward issues in reinforcement learning.