Search Results for author: Yang Yu

Found 205 papers, 72 papers with code

POINTS1.5: Building a Vision-Language Model towards Real World Applications

no code implementations11 Dec 2024 YuAn Liu, Le Tian, Xiao Zhou, Xinyu Gao, Kavio Yu, Yang Yu, Jie zhou

Due to the scarcity of open-source Chinese datasets for vision-language models, we collect numerous images from the Internet and annotate them using a combination of manual and automatic methods.

Language Modeling Language Modelling +1

Money Burning Improves Mediated Communication

no code implementations29 Nov 2024 Yi Liu, Yang Yu

This paper explores the problem of mediated communication enhanced by money-burning tactics for commitment power.

Universal and Context-Independent Triggers for Precise Control of LLM Outputs

no code implementations22 Nov 2024 Jiashuo Liang, Guancheng Li, Yang Yu

Large language models (LLMs) have been widely adopted in applications such as automated content generation and even critical decision-making systems.

Decision Making

Stable Continual Reinforcement Learning via Diffusion-based Trajectory Replay

no code implementations16 Nov 2024 Feng Chen, Fuguang Han, Cong Guan, Lei Yuan, Zhilong Zhang, Yang Yu, Zongzhang Zhang

Given the inherent non-stationarity prevalent in real-world applications, continual Reinforcement Learning (RL) aims to equip the agent with the capability to address a series of sequentially presented decision-making tasks.

reinforcement-learning Reinforcement Learning +1

WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making

no code implementations8 Nov 2024 Zhilong Zhang, Ruifeng Chen, Junyin Ye, Yihao Sun, Pengyuan Wang, JingCheng Pang, Kaiyuan Li, Tianshuo Liu, Haoxin Lin, Yang Yu, Zhi-Hua Zhou

Incorporating these two techniques, we present Whale-ST, a scalable spatial-temporal transformer-based world model with enhanced generalizability.

Decision Making Video Generation

MPT: A Large-scale Multi-Phytoplankton Tracking Benchmark

no code implementations22 Oct 2024 Yang Yu, Yuezun Li, Xin Sun, Junyu Dong

Phytoplankton are a crucial component of aquatic ecosystems, and effective monitoring of them can provide valuable insights into ocean environments and ecosystem changes.

Multi-Object Tracking

LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch

1 code implementation17 Oct 2024 Caigao Jiang, Xiang Shu, Hong Qian, Xingyu Lu, Jun Zhou, Aimin Zhou, Yang Yu

Namely, the accuracy of most current LLM-based methods and the generality of optimization problem types that they can model are still limited.

Code Generation Combinatorial Optimization

ZipGait: Bridging Skeleton and Silhouette with Diffusion Model for Advancing Gait Recognition

no code implementations22 Aug 2024 Fanxu Min, Qing Cai, Shaoxiang Guo, Yang Yu, Hao Fan, Junyu Dong

Current gait recognition research predominantly focuses on extracting appearance features effectively, but the performance is severely compromised by the vulnerability of silhouettes under unconstrained scenes.

Gait Recognition

TS-SAM: Fine-Tuning Segment-Anything Model for Downstream Tasks

1 code implementation3 Aug 2024 Yang Yu, Chen Xu, Kai Wang

Adapter-based fine-tuning has been studied for improving the performance of SAM on downstream tasks.

Decoder parameter-efficient fine-tuning

Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning

1 code implementation17 Jul 2024 Xu-Hui Liu, Tian-Shuo Liu, Shengyi Jiang, Ruifeng Chen, Zhilong Zhang, Xinwei Chen, Yang Yu

Combining offline and online reinforcement learning (RL) techniques is indeed crucial for achieving efficient and safe learning where data acquisition is expensive.

reinforcement-learning Reinforcement Learning +1

UnmixingSR: Material-aware Network with Unsupervised Unmixing as Auxiliary Task for Hyperspectral Image Super-resolution

no code implementations9 Jul 2024 Yang Yu

We regard HU as an auxiliary task and incorporate it into the HSI SR process by exploring the constraints between LR and HR abundances.

Hyperspectral Image Super-Resolution Image Super-Resolution

Hindsight Preference Learning for Offline Preference-based Reinforcement Learning

1 code implementation5 Jul 2024 Chen-Xiao Gao, Shengjun Fang, Chenjun Xiao, Yang Yu, Zongzhang Zhang

Offline preference-based reinforcement learning (RL), which focuses on optimizing policies using human preferences between pairs of trajectory segments selected from an offline dataset, has emerged as a practical avenue for RL applications.

reinforcement-learning Reinforcement Learning +1

Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation

1 code implementation4 Jul 2024 Yi-Chen Li, Fuxiang Zhang, Wenjie Qiu, Lei Yuan, Chengxing Jia, Zongzhang Zhang, Yang Yu, Bo An

Thanks to the residual Q-learning framework, we can restore the customized LLM with the pre-trained LLM and the residual Q-function, without the reward function $r_1$.

Q-Learning reinforcement-learning +2
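The restoration above can be made concrete. A minimal sketch under the residual Q-learning framework (the temperature $\alpha$ and this exact composition are our notation, not quoted from the paper):

$$\pi_{\text{custom}}(a \mid s) \;\propto\; \pi_{\text{pre}}(a \mid s)\,\exp\!\left(Q_{\text{res}}(s, a)/\alpha\right)$$

Here the pre-trained LLM supplies $\pi_{\text{pre}}$ and the learned residual Q-function re-weights its token distribution, so the original reward $r_1$ never needs to be recovered.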

Improving Sample Efficiency of Reinforcement Learning with Background Knowledge from Large Language Models

1 code implementation4 Jul 2024 Fuxiang Zhang, Junyou Li, Yi-Chen Li, Zongzhang Zhang, Yang Yu, Deheng Ye

In this paper, we introduce a framework that harnesses LLMs to extract background knowledge of an environment, which contains general understandings of the entire environment, making various downstream RL tasks benefit from one-time knowledge representation.

Common Sense Reasoning Reinforcement Learning (RL)

PhyTracker: An Online Tracker for Phytoplankton

no code implementations29 Jun 2024 Yang Yu, Qingxuan Lv, Yuezun Li, Zhiqiang Wei, Junyu Dong

Phytoplankton, a crucial component of aquatic ecosystems, requires efficient monitoring to understand marine ecological processes and environmental conditions.

Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning

no code implementations27 May 2024 Haoxin Lin, Yu-Yan Xu, Yihao Sun, Zhilong Zhang, Yi-Chen Li, Chengxing Jia, Junyin Ye, Jiaji Zhang, Yang Yu

In the online setting, ADMPO-ON demonstrates improved sample efficiency compared to previous state-of-the-art methods.

Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate

1 code implementation24 May 2024 Fan-Ming Luo, Zuolin Tu, Zefang Huang, Yang Yu

Recent progress has demonstrated that recurrent reinforcement learning (RL), which consists of a context encoder based on recurrent neural networks (RNNs) for unobservable state prediction and a multilayer perceptron (MLP) policy for decision making, can mitigate partial observability and serve as a robust baseline for POMDP tasks.

Decision Making Reinforcement Learning (RL)

Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts

no code implementations14 Apr 2024 Jing-Cheng Pang, Si-Hang Yang, Kaiyuan Li, Jiaji Zhang, Xiong-Hui Chen, Nan Tang, Yang Yu

Furthermore, KALM effectively enables the LLM to comprehend environmental dynamics, resulting in the generation of meaningful imaginary rollouts that reflect novel skills and demonstrate the seamless integration of large language models and reinforcement learning.

Language Modeling Language Modelling +4

SceneTracker: Long-term Scene Flow Estimation Network

1 code implementation29 Mar 2024 Bo wang, Jian Li, Yang Yu, Li Liu, Zhenping Sun, Dewen Hu

Considering the complementarity of scene flow estimation in the spatial domain's focusing capability and 3D object tracking in the temporal domain's coherence, this study aims to address a comprehensive new task that can simultaneously capture fine-grained and long-term 3D motion in an online manner: long-term scene flow estimation (LSFE).

3D Object Tracking Object Tracking +1

Disentangling Policy from Offline Task Representation Learning via Adversarial Data Augmentation

1 code implementation12 Mar 2024 Chengxing Jia, Fuxiang Zhang, Yi-Chen Li, Chen-Xiao Gao, Xu-Hui Liu, Lei Yuan, Zongzhang Zhang, Yang Yu

Specifically, the objective of adversarial data augmentation is not merely to generate data analogous to offline data distribution; instead, it aims to create adversarial examples designed to confound learned task representations and lead to incorrect task identification.

Contrastive Learning Data Augmentation +3

Debiased Offline Representation Learning for Fast Online Adaptation in Non-stationary Dynamics

no code implementations17 Feb 2024 Xinyu Zhang, Wenjie Qiu, Yi-Chen Li, Lei Yuan, Chengxing Jia, Zongzhang Zhang, Yang Yu

DORA incorporates an information bottleneck principle that maximizes mutual information between the dynamics encoding and the environmental data, while minimizing mutual information between the dynamics encoding and the actions of the behavior policy.

Representation Learning
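Stated as an objective, the bottleneck above can be sketched as follows, with dynamics encoding $z_\phi$, environmental data $\tau$, behavior-policy actions $a$, and an assumed trade-off weight $\lambda$:

$$\max_{\phi}\; I(z_\phi;\, \tau) \;-\; \lambda\, I(z_\phi;\, a)$$

The first term keeps the encoding informative about the dynamics; the second debiases it against the behavior policy.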

Learning by Doing: An Online Causal Reinforcement Learning Framework with Causal-Aware Policy

no code implementations7 Feb 2024 Ruichu Cai, Siyang Huang, Jie Qiao, Wei Chen, Yan Zeng, Keli Zhang, Fuchun Sun, Yang Yu, Zhifeng Hao

As a key component of intuitive cognition and reasoning in human intelligence, causal knowledge offers great potential for improving the interpretability of reinforcement learning (RL) agents' decision-making by helping reduce the search space.

Decision Making Reinforcement Learning (RL)

Empowering Language Models with Active Inquiry for Deeper Understanding

no code implementations6 Feb 2024 Jing-Cheng Pang, Heng-Bo Fan, Pengyuan Wang, Jia-Hao Xiao, Nan Tang, Si-Hang Yang, Chengxing Jia, Sheng-Jun Huang, Yang Yu

The rise of large language models (LLMs) has revolutionized the way that we interact with artificial intelligence systems through natural language.

Active Learning Language Modeling +2

Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning

1 code implementation4 Feb 2024 Lanqing Li, Hai Zhang, Xinyu Zhang, Shatong Zhu, Yang Yu, Junqiao Zhao, Pheng-Ann Heng

As demonstrations, we propose a supervised and a self-supervised implementation of $I(Z; M)$, and empirically show that the corresponding optimization algorithms exhibit remarkable generalization across a broad spectrum of RL benchmarks, context shift scenarios, data qualities and deep learning architectures.

Meta Reinforcement Learning Offline RL +3
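For reference, one standard way to implement a supervised estimate of $I(Z; M)$ is the variational lower bound below (our illustration of a common choice, not necessarily the paper's exact formulation), where $q_\phi(m \mid z)$ is a task classifier and $H(M)$ is constant in $\phi$:

$$I(Z; M) \;\ge\; \mathbb{E}_{p(m, z)}\!\left[\log q_\phi(m \mid z)\right] \;+\; H(M)$$

Maximizing the bound reduces to training the classifier to recover the task identity $m$ from the context representation $z$.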

FedGT: Federated Node Classification with Scalable Graph Transformer

no code implementations26 Jan 2024 Zaixi Zhang, Qingyong Hu, Yang Yu, Weibo Gao, Qi Liu

However, existing methods have the following limitations: (1) The links between local subgraphs are missing in subgraph federated learning.

Classification Federated Learning +2

Beimingwu: A Learnware Dock System

1 code implementation24 Jan 2024 Zhi-Hao Tan, Jian-Dong Liu, Xiao-Dong Bi, Peng Tan, Qin-Cheng Zheng, Hai-Tian Liu, Yi Xie, Xiao-Chuan Zou, Yang Yu, Zhi-Hua Zhou

The learnware paradigm proposed by Zhou [2016] aims to enable users to reuse numerous existing well-trained models instead of building machine learning models from scratch, with the hope of solving new user tasks even beyond models' original purposes.

Unmixing Before Fusion: A Generalized Paradigm for Multi-Source-based Hyperspectral Image Synthesis

no code implementations CVPR 2024 Yang Yu, Erting Pan, Xinya Wang, Yuheng Wu, Xiaoguang Mei, Jiayi Ma

By integrating unmixing, this work maps unpaired HSI and RGB data to a low-dimensional abundance space, greatly alleviating the difficulty of generating high-dimensional samples.

Image Generation

Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations

1 code implementation26 Dec 2023 Renzhe Zhou, Chen-Xiao Gao, Zongzhang Zhang, Yang Yu

GENTLE employs a Task Auto-Encoder (TAE), an encoder-decoder architecture that extracts the characteristics of the tasks.

Contrastive Learning Decoder +5

Episodic Return Decomposition by Difference of Implicitly Assigned Sub-Trajectory Reward

1 code implementation17 Dec 2023 Haoxin Lin, Hongqiu Wu, Jiaji Zhang, Yihao Sun, Junyin Ye, Yang Yu

Real-world decision-making problems are usually accompanied by delayed rewards, which affects the sample efficiency of Reinforcement Learning, especially in the extremely delayed case where the only feedback is the episodic reward obtained at the end of an episode.

Decision Making
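One plausible reading of "difference of implicitly assigned sub-trajectory reward" (our paraphrase of the title, not a quote of the method): fit a return predictor $G_\theta$ on trajectory prefixes and credit each step with the increment it contributes,

$$\hat r_t \;=\; G_\theta(\tau_{0:t}) \;-\; G_\theta(\tau_{0:t-1}), \qquad G_\theta(\tau_{0:0}) := 0,$$

so the assigned rewards telescope to the predicted episodic return and can stand in for the delayed feedback in standard RL updates.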

Policy Optimization in RLHF: The Impact of Out-of-preference Data

1 code implementation17 Dec 2023 Ziniu Li, Tian Xu, Yang Yu

These methods, either explicitly or implicitly, learn a reward model from preference data and differ in the data used for policy optimization to unlock the generalization ability of the reward model.

Efficient Human-AI Coordination via Preparatory Language-based Convention

no code implementations1 Nov 2023 Cong Guan, Lichao Zhang, Chunpeng Fan, Yichen Li, Feng Chen, Lihe Li, Yunjia Tian, Lei Yuan, Yang Yu

Developing intelligent agents capable of seamless coordination with humans is a critical step towards achieving artificial general intelligence.

Language Modelling Large Language Model

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models

2 code implementations16 Oct 2023 Ziniu Li, Tian Xu, Yushun Zhang, Zhihang Lin, Yang Yu, Ruoyu Sun, Zhi-Quan Luo

ReMax can save about 46% of GPU memory compared with PPO when training a 7B model, and it enables training on A800-80GB GPUs without the memory-saving offloading technique that PPO needs.

General Reinforcement Learning reinforcement-learning
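ReMax's memory savings come from replacing PPO's learned value network with a baseline computed by greedy decoding. A minimal sketch of the resulting REINFORCE-style loss (function and argument names are our assumptions, not from the paper's code):

```python
import torch

def remax_loss(logp_sampled, reward_sampled, reward_greedy):
    """Sketch of a ReMax-style policy-gradient loss.

    logp_sampled:   (batch,) summed log-probability of each sampled response
    reward_sampled: (batch,) reward of each sampled response
    reward_greedy:  (batch,) reward of the greedy-decoded response for the
                    same prompt, used as a variance-reduction baseline
    """
    advantage = (reward_sampled - reward_greedy).detach()  # baseline-subtracted reward
    return -(advantage * logp_sampled).mean()              # REINFORCE objective
```

Because the baseline is just one extra greedy rollout per prompt, no value-network parameters or optimizer states have to be kept in GPU memory.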

AdaptSSR: Pre-training User Model with Augmentation-Adaptive Self-Supervised Ranking

1 code implementation NeurIPS 2023 Yang Yu, Qi Liu, Kai Zhang, Yuren Zhang, Chao Song, Min Hou, Yuqing Yuan, Zhihao Ye, Zaixi Zhang, Sanshi Lei Yu

Specifically, we adopt a multiple pairwise ranking loss which trains the user model to capture the similarity orders between the implicitly augmented view, the explicitly augmented view, and views from other users.

Contrastive Learning Data Augmentation
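A hedged sketch of the multiple pairwise ranking loss described above, enforcing the similarity order: implicitly augmented view >= explicitly augmented view >= views from other users (the margin and all names are illustrative, not taken from the paper):

```python
import torch.nn.functional as F

def multiple_pairwise_ranking_loss(sim_implicit, sim_explicit, sim_others,
                                   margin=0.1):
    """Each argument is a (batch,) tensor of similarities between the anchor
    user representation and one kind of view; two hinge terms enforce the
    similarity order pairwise."""
    loss_hi = F.relu(margin - (sim_implicit - sim_explicit))  # implicit above explicit
    loss_lo = F.relu(margin - (sim_explicit - sim_others))    # explicit above other users
    return (loss_hi + loss_lo).mean()
```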

Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning

no code implementations9 Oct 2023 Fan-Ming Luo, Tian Xu, Xingchen Cao, Yang Yu

MOREC learns a generalizable dynamics reward function from offline data, which is subsequently employed as a transition filter in any offline MBRL method: when generating transitions, the dynamics model generates a batch of transitions and selects the one with the highest dynamics reward value.

D4RL Model-based Reinforcement Learning +2
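The transition filter described above is simple to state in code. A minimal sketch (names are ours) of keeping the model-generated transitions that the learned dynamics reward scores highest:

```python
def filter_transitions(candidates, dynamics_reward, top_k=1):
    """candidates:      list of (state, action, reward, next_state) tuples
                        proposed by the learned dynamics model
    dynamics_reward: callable mapping a transition tuple to a scalar score
                     of how consistent it is with real-environment dynamics
    """
    ranked = sorted(candidates, key=dynamics_reward, reverse=True)
    return ranked[:top_k]  # MOREC keeps the highest-scoring transition
```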

Improve the efficiency of deep reinforcement learning through semantic exploration guided by natural language

no code implementations21 Sep 2023 Zhourui Guo, Meng Yao, Yang Yu, Qiyue Yin

We assume that the interaction can be modeled as a sequence of templated questions and answers, and that there is a large corpus of previous interactions available.

Deep Reinforcement Learning

ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning

1 code implementation12 Sep 2023 Chen-Xiao Gao, Chenyang Wu, Mingjun Cao, Rui Kong, Zongzhang Zhang, Yang Yu

Third, we train an Advantage-Conditioned Transformer (ACT) to generate actions conditioned on the estimated advantages.

Action Generation

Hide and Seek (HaS): A Lightweight Framework for Prompt Privacy Protection

1 code implementation6 Sep 2023 Yu Chen, Tingxin Li, Huiming Liu, Yang Yu

Numerous companies have started offering services based on large language models (LLMs), such as ChatGPT, which inevitably raises privacy concerns as users' prompts are exposed to the model provider.

Sensiverse: A dataset for ISAC study

no code implementations26 Aug 2023 Jiajin Luo, Baojian Zhou, Yang Yu, Ping Zhang, Xiaohui Peng, Jianglei Ma, Peiying Zhu, Jianmin Lu, Wen Tong

In order to address the lack of applicable channel models for ISAC research and evaluation, we release Sensiverse, a dataset that can be used for ISAC research.

Synergistic Signal Denoising for Multimodal Time Series of Structure Vibration

no code implementations17 Aug 2023 Yang Yu, Han Chen

Structural Health Monitoring (SHM) plays an indispensable role in ensuring the longevity and safety of infrastructure.

Denoising Structural Health Monitoring +1

Transformer-Based Denoising of Mechanical Vibration Signals

no code implementations4 Aug 2023 Han Chen, Yang Yu, Pengtao Li

Mechanical vibration signal denoising is a pivotal task in various industrial applications, including system health monitoring and failure prediction.

Denoising

Disentangling Multi-view Representations Beyond Inductive Bias

1 code implementation3 Aug 2023 Guanzhou Ke, Yang Yu, Guoqing Chao, Xiaoli Wang, Chenyang Xu, Shengfeng He

In this paper, we propose a novel multi-view representation disentangling method that aims to go beyond inductive biases, ensuring both interpretability and generalizability of the resulting representations.

Clustering Inductive Bias +2

Car-Studio: Learning Car Radiance Fields from Single-View and Endless In-the-wild Images

1 code implementation26 Jul 2023 Tianyu Liu, Hao Zhao, Yang Yu, Guyue Zhou, Ming Liu

However, previous studies learned from sequences in autonomous driving datasets, resulting in unsatisfactory blurring when rotating the car in the simulator.

Autonomous Driving

Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning

2 code implementations PMLR 2023 Yihao Sun, Jiaji Zhang, Chengxing Jia, Haoxin Lin, Junyin Ye, Yang Yu

MOBILE conducts uncertainty quantification through the inconsistency of Bellman estimations under an ensemble of learned dynamics models, which can be a better approximator to the true Bellman error, and penalizes the Bellman estimation based on this uncertainty.

D4RL Offline RL +4
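A hedged sketch of the uncertainty penalization described above (the penalty weight, the mean-minus-std form, and the shapes are our assumptions): each row of the inputs comes from one member of the dynamics-model ensemble, so the spread of Bellman targets across rows measures the Bellman inconsistency used as the penalty.

```python
import numpy as np

def penalized_bellman_target(rewards, next_values, gamma=0.99, beta=1.0):
    """rewards, next_values: arrays of shape (ensemble_size, batch), one row
    per learned dynamics model's imagined transition."""
    targets = rewards + gamma * next_values            # Bellman estimate per model
    inconsistency = targets.std(axis=0)                # disagreement across the ensemble
    return targets.mean(axis=0) - beta * inconsistency # pessimistic target
```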

A Unified View of Deep Learning for Reaction and Retrosynthesis Prediction: Current Status and Future Challenges

no code implementations28 Jun 2023 Ziqiao Meng, Peilin Zhao, Yang Yu, Irwin King

Reaction and retrosynthesis prediction are fundamental tasks in computational chemistry that have recently garnered attention from both the machine learning and drug discovery communities.

Computational chemistry Deep Learning +3

SplatFlow: Learning Multi-frame Optical Flow via Splatting

1 code implementation15 Jun 2023 Bo wang, Yifan Zhang, Jian Li, Yang Yu, Zhenping Sun, Li Liu, Dewen Hu

The occlusion problem remains a crucial challenge in optical flow estimation (OFE).

Optical Flow Estimation

NPVForensics: Jointing Non-critical Phonemes and Visemes for Deepfake Detection

no code implementations12 Jun 2023 Yu Chen, Yang Yu, Rongrong Ni, Yao Zhao, Haoliang Li

Next, we design a phoneme-viseme awareness module for cross-modal feature fusion and representation alignment, so that the modality gap can be reduced and the intrinsic complementarity of the two modalities can be better explored.

DeepFake Detection Face Swapping

Policy Regularization with Dataset Constraint for Offline Reinforcement Learning

2 code implementations11 Jun 2023 Yuhang Ran, Yi-Chen Li, Fuxiang Zhang, Zongzhang Zhang, Yang Yu

A common taxonomy of existing offline RL works is policy regularization, which typically constrains the learned policy by distribution or support of the behavior policy.

Offline RL reinforcement-learning +2

Provably Efficient Adversarial Imitation Learning with Unknown Transitions

1 code implementation11 Jun 2023 Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo

Adversarial imitation learning (AIL), a subset of IL methods, is particularly promising, but its theoretical foundation in the presence of unknown transitions has yet to be fully developed.

Imitation Learning

Doubly Stochastic Graph-based Non-autoregressive Reaction Prediction

no code implementations5 Jun 2023 Ziqiao Meng, Peilin Zhao, Yang Yu, Irwin King

However, the current non-autoregressive decoder does not satisfy two essential rules of electron redistribution modeling simultaneously: the electron-counting rule and the symmetry rule.

Decoder Drug Discovery

Language Model Self-improvement by Reinforcement Learning Contemplation

no code implementations23 May 2023 Jing-Cheng Pang, Pengyuan Wang, Kaiyuan Li, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang, Yang Yu

We demonstrate that SIRLC can be applied to various NLP tasks, such as reasoning problems, text generation, and machine translation.

Language Modeling Language Modelling +5

Robust multi-agent coordination via evolutionary generation of auxiliary adversarial attackers

1 code implementation10 May 2023 Lei Yuan, Zi-Qian Zhang, Ke Xue, Hao Yin, Feng Chen, Cong Guan, Li-He Li, Chao Qian, Yang Yu

Concretely, to avoid the ego-system overfitting to a specific attacker, we maintain a set of attackers, which is optimized to guarantee that the attackers have high attack quality and behavior diversity.

Diversity SMAC+

Communication-Robust Multi-Agent Learning by Adaptable Auxiliary Multi-Agent Adversary Generation

no code implementations9 May 2023 Lei Yuan, Feng Chen, Zhongzhang Zhang, Yang Yu

Specifically, we introduce a novel message-attacking approach that models the learning of the auxiliary attacker as a cooperative problem under a shared goal to minimize the coordination ability of the ego system, with which every information channel may suffer from distinct message attacks.

Multi-agent Reinforcement Learning

Multi-agent Continual Coordination via Progressive Task Contextualization

no code implementations7 May 2023 Lei Yuan, Lihe Li, Ziqian Zhang, Fuxiang Zhang, Cong Guan, Yang Yu

Towards tackling the mentioned issue, this paper proposes an approach Multi-Agent Continual Coordination via Progressive Task Contextualization, dubbed MACPro.

Continual Learning Multi-agent Reinforcement Learning

Robust Multi-agent Communication via Multi-view Message Certification

no code implementations7 May 2023 Lei Yuan, Tao Jiang, Lihe Li, Feng Chen, Zongzhang Zhang, Yang Yu

Many multi-agent scenarios require message sharing among agents to promote coordination, which raises the need for robust multi-agent communication when policies are deployed in a message-perturbation environment.

Sim2Rec: A Simulator-based Decision-making Approach to Optimize Real-World Long-term User Engagement in Sequential Recommender Systems

1 code implementation3 May 2023 Xiong-Hui Chen, Bowei He, Yang Yu, Qingyang Li, Zhiwei Qin, Wenjie Shang, Jieping Ye, Chen Ma

However, building a user simulator with no reality gap, i.e., one that can predict users' feedback exactly, is unrealistic, because users' reaction patterns are complex and the historical logs for each user are limited, which might mislead the simulator-based recommendation policy.

Decision Making Recommendation Systems +1

Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning

1 code implementation21 Mar 2023 Yang Yu, Danruo Deng, Furui Liu, Yueming Jin, Qi Dou, Guangyong Chen, Pheng-Ann Heng

Open-set semi-supervised learning (Open-set SSL) considers a more practical scenario, where unlabeled data and test data contain new categories (outliers) not observed in labeled data (inliers).

Outlier Detection

Beware of Instantaneous Dependence in Reinforcement Learning

no code implementations9 Mar 2023 Zhengmao Zhu, YuRen Liu, Honglong Tian, Yang Yu, Kun Zhang

Playing an important role in Model-Based Reinforcement Learning (MBRL), environment models aim to predict future states based on the past.

Model-based Reinforcement Learning reinforcement-learning +2

How To Guide Your Learner: Imitation Learning with Active Adaptive Expert Involvement

1 code implementation3 Mar 2023 Xu-Hui Liu, Feng Xu, Xinyu Zhang, Tianyuan Liu, Shengyi Jiang, Ruifeng Chen, Zongzhang Zhang, Yang Yu

In this paper, we propose a novel active imitation learning framework based on a teacher-student interaction model, in which the teacher's goal is to identify the best teaching behavior and actively affect the student's learning process.

Atari Games Imitation Learning

Uncertainty Estimation by Fisher Information-based Evidential Deep Learning

1 code implementation3 Mar 2023 Danruo Deng, Guangyong Chen, Yang Yu, Furui Liu, Pheng-Ann Heng

To address this problem, we propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).

Deep Learning Informativeness +1

Natural Language-conditioned Reinforcement Learning with Inside-out Task Language Development and Translation

no code implementations18 Feb 2023 Jing-Cheng Pang, Xin-Yu Yang, Si-Hang Yang, Yang Yu

To ease the learning burden of the policy, we investigate an inside-out scheme for natural language-conditioned RL by developing a task language (TL) that is task-related and unique.

Instruction Following Reinforcement Learning (RL)

Theoretical Analysis of Offline Imitation With Supplementary Dataset

1 code implementation27 Jan 2023 Ziniu Li, Tian Xu, Yang Yu, Zhi-Quan Luo

This paper considers a situation where, besides the small amount of expert data, a supplementary dataset is available, which can be collected cheaply from sub-optimal policies.

Imitation Learning

Self-Motivated Multi-Agent Exploration

1 code implementation5 Jan 2023 Shaowei Zhang, Jiahan Cao, Lei Yuan, Yang Yu, De-Chuan Zhan

In cooperative multi-agent reinforcement learning (CMARL), it is critical for agents to achieve a balance between self-exploration and team collaboration.

SMAC+ Starcraft +1

A Clustering-guided Contrastive Fusion for Multi-view Representation Learning

1 code implementation28 Dec 2022 Guanzhou Ke, Guoqing Chao, Xiaoli Wang, Chenyang Xu, Yongqi Zhu, Yang Yu

To this end, we utilize a deep fusion network to fuse view-specific representations into the view-common representation, extracting high-level semantics for obtaining robust representation.

Clustering MULTI-VIEW LEARNING +1

Untargeted Attack against Federated Recommendation Systems via Poisonous Item Embeddings and the Defense

1 code implementation11 Dec 2022 Yang Yu, Qi Liu, Likang Wu, Runlong Yu, Sanshi Lei Yu, Zaixi Zhang

Experiments on two public datasets show that ClusterAttack can effectively degrade the performance of FedRec systems while circumventing many defense methods, and UNION can improve the resistance of the system against various untargeted attacks, including our ClusterAttack.

Contrastive Learning Recommendation Systems

Momentum Calibration for Text Generation

no code implementations8 Dec 2022 Xingxing Zhang, Yiran Liu, Xun Wang, Pengcheng He, Yang Yu, Si-Qing Chen, Wayne Xiong, Furu Wei

The input and output of most text generation tasks can be transformed into two sequences of tokens, which can be modeled using sequence-to-sequence modeling tools such as Transformers.

Abstractive Text Summarization Text Generation

Learning Physically Realizable Skills for Online Packing of General 3D Shapes

1 code implementation5 Dec 2022 Hang Zhao, Zherong Pan, Yang Yu, Kai Xu

We study the problem of learning online packing skills for irregular 3D shapes, which is arguably the most challenging setting of bin packing problems.

3D geometry Action Generation +1

Real-time Blind Deblurring Based on Lightweight Deep-Wiener-Network

no code implementations29 Nov 2022 Runjia Li, Yang Yu, Charlie Haywood

In this paper, we address the problem of blind deblurring with high efficiency.

Deblurring

Does Debiasing Inevitably Degrade the Model Performance

no code implementations14 Nov 2022 Yiran Liu, Xiao Liu, Haotian Chen, Yang Yu

We use our theoretical framework to explain why the current debiasing methods cause performance degradation.

Knowledge is Power: Understanding Causality Makes Legal judgment Prediction Models More Generalizable and Robust

no code implementations6 Nov 2022 Haotian Chen, Lingwei Zhang, Yiran Liu, Fanchao Chen, Yang Yu

To validate our theoretical analysis, we further propose another method using our proposed Causality-Aware Self-Attention Mechanism (CASAM) to guide the model to learn the underlying causality knowledge in legal texts.

Open Information Extraction

Semantic Structure Enhanced Contrastive Adversarial Hash Network for Cross-media Representation Learning

2 code implementations ACM Multimedia 2022 Meiyu Liang, Junping Du, Xiaowen Cao, Yang Yu, Kangkang Lu, Zhe Xue, Min Zhang

Secondly, to further improve the learning of implicit cross-media semantic associations, a semantic label association graph is constructed, and a graph convolutional network is utilized to mine the implicit semantic structures, thus guiding the learning of discriminative features of different modalities.

Representation Learning

Domain generalization Person Re-identification on Attention-aware multi-operation strategery

no code implementations19 Oct 2022 Yingchun Guo, Huan He, Ye Zhu, Yang Yu

Domain generalization person re-identification (DG Re-ID) aims to directly deploy a model trained on the source domain to the unseen target domain with good generalization, which is a challenging problem and has practical value in a real-world deployment.

Domain Generalization Person Re-Identification

Multi-agent Dynamic Algorithm Configuration

1 code implementation13 Oct 2022 Ke Xue, Jiacheng Xu, Lei Yuan, Miqing Li, Chao Qian, Zongzhang Zhang, Yang Yu

MA-DAC formulates the dynamic configuration of a complex algorithm with multiple types of hyperparameters as a contextual multi-agent Markov decision process and solves it by a cooperative multi-agent RL (MARL) algorithm.

Multi-Armed Bandits Reinforcement Learning (RL)

Understanding or Manipulation: Rethinking Online Performance Gains of Modern Recommender Systems

no code implementations11 Oct 2022 Zhengbang Zhu, Rongjun Qin, JunJie Huang, Xinyi Dai, Yang Yu, Yong Yu, Weinan Zhang

The increase in the measured performance, however, can have two possible attributions: a better understanding of user preferences, and a more proactive ability to exploit human bounded rationality and lure users into over-consumption.

Benchmarking Sequential Recommendation

MARS: A Motif-based Autoregressive Model for Retrosynthesis Prediction

no code implementations27 Sep 2022 Jiahan Liu, Chaochao Yan, Yang Yu, Chan Lu, Junzhou Huang, Le Ou-Yang, Peilin Zhao

In this paper, we propose a novel end-to-end graph generation model for retrosynthesis prediction, which sequentially identifies the reaction center, generates the synthons, and adds motifs to the synthons to generate reactants.

Drug Discovery Graph Generation +2

WeLM: A Well-Read Pre-trained Language Model for Chinese

no code implementations21 Sep 2022 Hui Su, Xiao Zhou, Houjin Yu, Xiaoyu Shen, YuWen Chen, Zilin Zhu, Yang Yu, Jie zhou

Large Language Models pre-trained with self-supervised learning have demonstrated impressive zero-shot generalization capabilities on a wide spectrum of tasks.

Language Modeling Language Modelling +3

Model-based Reinforcement Learning with Multi-step Plan Value Estimation

1 code implementation12 Sep 2022 Haoxin Lin, Yihao Sun, Jiaji Zhang, Yang Yu

The new model-based reinforcement learning algorithm MPPVE (Model-based Planning Policy Learning with Multi-step Plan Value Estimation) shows a better utilization of the learned model and achieves a better sample efficiency than state-of-the-art model-based RL approaches.

Model-based Reinforcement Learning reinforcement-learning +2

Deep Anomaly Detection and Search via Reinforcement Learning

no code implementations31 Aug 2022 Chao Chen, Dawei Wang, Feng Mao, Zongzhang Zhang, Yang Yu

Semi-supervised Anomaly Detection (AD) is a data mining task that aims to learn features from partially labeled datasets to help detect outliers.

Ensemble Learning Partially Labeled Datasets +5

MORI-RAN: Multi-view Robust Representation Learning via Hybrid Contrastive Fusion

1 code implementation26 Aug 2022 Guanzhou Ke, Yongqi Zhu, Yang Yu

To this end, in this paper, we propose a hybrid contrastive fusion algorithm to extract robust view-common representations from unlabeled data.

Clustering Representation Learning +1

Convolutional Neural Networks with A Topographic Representation Module for EEG-Based Brain-Computer Interfaces

no code implementations23 Aug 2022 Xinbin Liang, Yaru Liu, Yang Yu, Kaixuan Liu, Yadong Liu, Zongtan Zhou

Significance: We improve the classification performance of 3 CNNs on 2 datasets by the use of TRM, indicating that it has the capability to mine the EEG spatial topological information.

Classification EEG

Heterogeneous Multi-agent Zero-Shot Coordination by Coevolution

1 code implementation9 Aug 2022 Ke Xue, Yutong Wang, Cong Guan, Lei Yuan, Haobo Fu, Qiang Fu, Chao Qian, Yang Yu

Generating agents that can achieve zero-shot coordination (ZSC) with unseen partners is a new challenge in cooperative multi-agent reinforcement learning (MARL).

Multi-agent Reinforcement Learning

Pseudo-label Guided Cross-video Pixel Contrast for Robotic Surgical Scene Segmentation with Limited Annotations

no code implementations20 Jul 2022 Yang Yu, Zixu Zhao, Yueming Jin, Guangyong Chen, Qi Dou, Pheng-Ann Heng

Concretely, for trustworthy representation learning, we propose to incorporate pseudo labels to guide the pair selection, obtaining more reliable representation pairs for pixel contrast.

Pseudo Label Representation Learning +2

Hybrid Value Estimation for Off-policy Evaluation and Offline Reinforcement Learning

no code implementations4 Jun 2022 Xue-Kun Jin, Xu-Hui Liu, Shengyi Jiang, Yang Yu

Value function estimation is an indispensable subroutine in reinforcement learning, which becomes more challenging in the offline setting.

Off-policy evaluation reinforcement-learning +1

Offline Reinforcement Learning with Causal Structured World Models

no code implementations3 Jun 2022 Zheng-Mao Zhu, Xiong-Hui Chen, Hong-Long Tian, Kun Zhang, Yang Yu

Model-based methods have recently shown promise for offline reinforcement learning (RL), aiming to learn good policies from historical data without interacting with the environment.

Model-based Reinforcement Learning Offline RL +3

Transferable Reward Learning by Dynamics-Agnostic Discriminator Ensemble

no code implementations1 Jun 2022 Fan-Ming Luo, Xingchen Cao, Rong-Jun Qin, Yang Yu

In this work, we present a dynamics-agnostic discriminator-ensemble reward learning method (DARL) within the AIL framework, capable of learning both state-action and state-only reward functions.

Imitation Learning

Model Generation with Provable Coverability for Offline Reinforcement Learning

no code implementations1 Jun 2022 Chengxing Jia, Hao Yin, Chenxiao Gao, Tian Xu, Lei Yuan, Zongzhang Zhang, Yang Yu

Model-based offline optimization with dynamics-aware policy provides a new perspective for policy learning and out-of-distribution generalization, where the learned policy could adapt to different dynamics enumerated at the training stage.

Offline RL Out-of-Distribution Generalization +3

Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation

1 code implementation29 Mar 2022 Yueming Jin, Yang Yu, Cheng Chen, Zixu Zhao, Pheng-Ann Heng, Danail Stoyanov

Automatic surgical scene segmentation is fundamental for facilitating cognitive intelligence in the modern operating theatre.

Contrastive Learning Relation +1

Enhancing Neural Mathematical Reasoning by Abductive Combination with Symbolic Library

no code implementations28 Mar 2022 Yangyang Hu, Yang Yu

On a mathematical reasoning dataset, we adopt the recently proposed abductive learning framework and propose the ABL-Sym algorithm, which combines Transformer neural models with a symbolic mathematics library.

Logical Reasoning Mathematical Reasoning +1

A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle

no code implementations22 Mar 2022 Ziniu Li, Tian Xu, Yang Yu

In particular, we demonstrate that the sample complexity of the target Q-learning algorithm in [Lee and He, 2020] is $\widetilde{\mathcal O}(|\mathcal S|^2|\mathcal A|^2 (1-\gamma)^{-5}\varepsilon^{-2})$.

Q-Learning

Multi-Agent Policy Transfer via Task Relationship Modeling

no code implementations9 Mar 2022 Rongjun Qin, Feng Chen, Tonghan Wang, Lei Yuan, Xiaoran Wu, Zongzhang Zhang, Chongjie Zhang, Yang Yu

We demonstrate that the task representation can capture the relationship among tasks, and can generalize to unseen tasks.

Transfer Learning

Rethinking ValueDice: Does It Really Improve Performance?

no code implementations5 Feb 2022 Ziniu Li, Tian Xu, Yang Yu, Zhi-Quan Luo

First, we show that ValueDice could reduce to BC under the offline setting.

Imitation Learning

Online Allocation Problem with Two-sided Resource Constraints

no code implementations28 Dec 2021 Qixin Zhang, Wenbing Ye, Zaiyi Chen, Haoyuan Hu, Enhong Chen, Yang Yu

As a result, only limited violations of constraints or pessimistic competitive bounds could be guaranteed.

Decision Making Fairness +1

RetroComposer: Composing Templates for Template-Based Retrosynthesis Prediction

1 code implementation20 Dec 2021 Chaochao Yan, Peilin Zhao, Chan Lu, Yang Yu, Junzhou Huang

To overcome this limitation, we propose an innovative retrosynthesis prediction framework that can compose novel templates beyond training templates.

Retrosynthesis Single-step retrosynthesis

Progressive Multi-stage Interactive Training in Mobile Network for Fine-grained Recognition

no code implementations8 Dec 2021 Zhenxin Wu, Qingliang Chen, Yifeng Liu, Yinqi Zhang, Chengkai Zhu, Yang Yu

Finally, with progressive training (P), the features extracted by the model at different stages can be fully utilized and fused with each other.

Fine-Grained Image Classification

Tiny-NewsRec: Effective and Efficient PLM-based News Recommendation

1 code implementation2 Dec 2021 Yang Yu, Fangzhao Wu, Chuhan Wu, Jingwei Yi, Qi Liu

We further propose a two-stage knowledge distillation method to improve the efficiency of the large PLM-based news recommendation model while maintaining its performance.

Knowledge Distillation Natural Language Understanding +1

Offline Model-based Adaptable Policy Learning

1 code implementation NeurIPS 2021 Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Qin, Wenjie Shang, Jieping Ye

Current offline reinforcement learning methods commonly learn in the policy space constrained to in-support regions by the offline dataset, in order to ensure the robustness of the outcome policies.

Decision Making reinforcement-learning +2

Cross-modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning

1 code implementation NeurIPS 2021 Xiong-Hui Chen, Shengyi Jiang, Feng Xu, Zongzhang Zhang, Yang Yu

Experiments on MuJoCo and Hand Manipulation Suite tasks show that agents deployed with our method achieve performance similar to what they had in the source domain, while those deployed with previous methods designed for same-modal domain adaptation suffer a larger performance gap.

Domain Adaptation reinforcement-learning +2

Stochastic optimal scheduling of demand response-enabled microgrids with renewable generations: An analytical-heuristic approach

no code implementations24 Nov 2021 Yang Li, Kang Li, Zhen Yang, Yang Yu, Runnan Xu, Miaosen Yang

In order to solve this model, this research combines the Jaya algorithm and the interior point method (IPM) to develop a hybrid analytical-heuristic solution method called Jaya-IPM, where the lower and upper levels are addressed by the IPM and Jaya, respectively, and the scheduling scheme is obtained via iterations between the two levels.

Scheduling

Calculus of Consent via MARL: Legitimating the Collaborative Governance Supplying Public Goods

no code implementations20 Nov 2021 Yang Hu, Zhui Zhu, Sirui Song, Xue Liu, Yang Yu

Experimental results in an exemplary environment show that our MARL approach is able to demonstrate the effectiveness and necessity of restrictions on individual liberty for collaborative supply of public goods.

Multi-agent Reinforcement Learning

Learning Efficient Online 3D Bin Packing on Packing Configuration Trees

1 code implementation ICLR 2022 Hang Zhao, Yang Yu, Kai Xu

PCT is a full-fledged description of the state and action space of bin packing which can support packing policy learning based on deep reinforcement learning (DRL).

3D Bin Packing Deep Reinforcement Learning

UserBERT: Contrastive User Model Pre-training

no code implementations3 Sep 2021 Chuhan Wu, Fangzhao Wu, Yang Yu, Tao Qi, Yongfeng Huang, Xing Xie

Two self-supervision tasks are incorporated in UserBERT for user model pre-training on unlabeled user behavior data to empower user modeling.

Neural-to-Tree Policy Distillation with Policy Improvement Criterion

no code implementations16 Aug 2021 Zhao-Hua Li, Yang Yu, Yingfeng Chen, Ke Chen, Zhipeng Hu, Changjie Fan

The empirical results show that the proposed method can preserve a higher cumulative reward than behavior cloning and learn a policy more consistent with the original one.

Decision Making Deep Reinforcement Learning +2

On Generalization of Adversarial Imitation Learning and Beyond

no code implementations19 Jun 2021 Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo

For some MDPs, we show that vanilla AIL has a worse sample complexity than BC.

Imitation Learning

HieRec: Hierarchical User Interest Modeling for Personalized News Recommendation

no code implementations ACL 2021 Tao Qi, Fangzhao Wu, Chuhan Wu, Peiru Yang, Yang Yu, Xing Xie, Yongfeng Huang

Instead of a single user embedding, in our method each user is represented in a hierarchical interest tree to better capture their diverse and multi-grained interest in news.

News Recommendation

Context-Aware Sparse Deep Coordination Graphs

1 code implementation ICLR 2022 Tonghan Wang, Liang Zeng, Weijun Dong, Qianlan Yang, Yang Yu, Chongjie Zhang

Learning sparse coordination graphs adaptive to the coordination dynamics among agents is a long-standing problem in cooperative multi-agent learning.

graph construction Graph Learning +2

Active Hierarchical Exploration with Stable Subgoal Representation Learning

1 code implementation ICLR 2022 Siyuan Li, Jin Zhang, Jianhao Wang, Yang Yu, Chongjie Zhang

Although GCHRL possesses superior exploration ability by decomposing tasks via subgoals, existing GCHRL methods struggle in temporally extended tasks with sparse external rewards, since the high-level policy learning relies on external rewards.

continuous-control Continuous Control +2

Reinforcement Learning With Sparse-Executing Actions via Sparsity Regularization

no code implementations18 May 2021 Jing-Cheng Pang, Tian Xu, Shengyi Jiang, Yu-Ren Liu, Yang Yu

To tackle the issue of limited action execution in RL, this paper first formalizes the problem as a Sparse Action Markov Decision Process (SA-MDP), in which specific actions in the action space can only be executed for a limited time.

Atari Games Autonomous Driving +5

An Introduction of mini-AlphaStar

1 code implementation14 Apr 2021 Ruo-Ze Liu, Wenhai Wang, Yanjie Shen, Zhiqi Li, Yang Yu, Tong Lu

StarCraft II (SC2) is a real-time strategy game in which players produce and control multiple units to fight against the opponent's units.

Starcraft Starcraft II

Distributed Bootstrap for Simultaneous Inference Under High Dimensionality

1 code implementation19 Feb 2021 Yang Yu, Shih-Kang Chao, Guang Cheng

We propose a distributed bootstrap method for simultaneous inference on high-dimensional massive data that are stored and processed with many machines.

Vocal Bursts Intensity Prediction

Derivative-Free Reinforcement Learning: A Review

no code implementations10 Feb 2021 Hong Qian, Yang Yu

In this article, we summarize methods of derivative-free reinforcement learning to date, and organize the methods in aspects including parameter updating, model selection, exploration, and parallel/distributed methods.

Model Selection reinforcement-learning +2

NewsBERT: Distilling Pre-trained Language Model for Intelligent News Application

no code implementations Findings (EMNLP) 2021 Chuhan Wu, Fangzhao Wu, Yang Yu, Tao Qi, Yongfeng Huang, Qi Liu

However, existing language models are pre-trained and distilled on general corpus like Wikipedia, which has some gaps with the news domain and may be suboptimal for news intelligence.

Knowledge Distillation Language Modeling +3

NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning

3 code implementations1 Feb 2021 Rongjun Qin, Songyi Gao, Xingyuan Zhang, Zhen Xu, Shengkai Huang, Zewen Li, Weinan Zhang, Yang Yu

We evaluate existing offline RL algorithms on NeoRL and argue that the performance of a policy should also be compared with the deterministic version of the behavior policy, instead of the dataset reward.

Offline RL reinforcement-learning +2

The Flare and Warp of the Young Stellar Disk traced with LAMOST DR5 OB-type stars

no code implementations1 Feb 2021 Yang Yu, Hai-Feng Wang, Wen-Yuan Cui, Lin-Lin Li, Chao Liu, Bo Zhang, Hao Tian, Zhen-Yan Huo, Jie Ju, Zhi-Cun Liu, Fang Wen, Shuai Feng

We present an analysis of the spatial density structure of the outer disk from 8-14 kpc using 13534 LAMOST DR5 OB-type stars, and we observe similar flaring on the north and south sides of the disk, implying that the flaring structure is symmetric about the Galactic plane; the scale height at different Galactocentric distances ranges from 0.14 to 0.5 kpc.

Astrophysics of Galaxies

ASBSO: An Improved Brain Storm Optimization With Flexible Search Length and Memory-Based Selection

no code implementations27 Jan 2021 Yang Yu, Shangce Gao, Yirui Wang, Jiujun Cheng, Yuki Todo

The proposed method, adaptive step length based on memory selection BSO (ASBSO), applies multiple step lengths to modify the generation process of new solutions, thus supplying a flexible search suited to the corresponding problems and convergence periods.

Offline Adaptive Policy Leaning in Real-World Sequential Recommendation Systems

no code implementations1 Jan 2021 Xiong-Hui Chen, Yang Yu, Qingyang Li, Zhiwei Tony Qin, Wenjie Shang, Yiping Meng, Jieping Ye

Instead of increasing the fidelity of models for policy learning, we handle the distortion issue via learning to adapt to diverse simulators generated by the offline dataset.

Sequential Recommendation

Cross-Modal Domain Adaptation for Reinforcement Learning

1 code implementation1 Jan 2021 Xiong-Hui Chen, Shengyi Jiang, Feng Xu, Yang Yu

Domain adaptation is a promising direction for deploying RL agents in real-world applications, where vision-based robotics tasks constitute an important part.

Domain Adaptation reinforcement-learning +2

Interactive Search Based on Deep Reinforcement Learning

no code implementations9 Dec 2020 Yang Yu, Zhenhao Gu, Rong Tao, Jingtian Ge, Kenglun Chang

With the continuous development of machine learning technology, major e-commerce platforms have launched recommendation systems based on it to serve a large number of customers with different needs more efficiently.

Clustering Decision Making +4

Offline Imitation Learning with a Misspecified Simulator

no code implementations NeurIPS 2020 Shengyi Jiang, JingCheng Pang, Yang Yu

In this work, we investigate policy learning in the condition of a few expert demonstrations and a simulator with misspecified dynamics.

Friction Imitation Learning

OrgMining 2.0: A Novel Framework for Organizational Model Mining from Event Logs

no code implementations24 Nov 2020 Jing Yang, Chun Ouyang, Wil M. P. van der Aalst, Arthur H. M. ter Hofstede, Yang Yu

We demonstrate the feasibility of this framework by proposing an approach underpinned by the framework for organizational model discovery, and also conduct experiments on real-life event logs to discover and evaluate organizational models.

Model Discovery

Angular Embedding: A New Angular Robust Principal Component Analysis

no code implementations22 Nov 2020 Shenglan Liu, Yang Yu

As a widely used method in machine learning, principal component analysis (PCA) shows excellent properties for dimensionality reduction.

Dimensionality Reduction

Mining Generalized Features for Detecting AI-Manipulated Fake Faces

no code implementations27 Oct 2020 Yang Yu, Rongrong Ni, Yao Zhao

Recently, AI-manipulated face techniques have developed rapidly and constantly, which has raised new security issues in society.

Error Bounds of Imitating Policies and Environments

no code implementations NeurIPS 2020 Tian Xu, Ziniu Li, Yang Yu

In this paper, we first analyze the value gap between the expert policy and imitated policies under two imitation methods, behavioral cloning and generative adversarial imitation.

Imitation Learning Model-based Reinforcement Learning +3

Multiple-element joint detection for Aspect-Based Sentiment Analysis

no code implementations Knowledge Based Systems 2020 Chao Wu, Qingyu Xiong, Hualing Yi, Yang Yu, Qiwu Zhu, Min Gao, Jie Chen

In this paper, we propose a novel end-to-end multiple-element joint detection model (MEJD), which effectively extracts all (target, aspect, sentiment) triples from a sentence.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +2

Difference-in-Differences: Bridging Normalization and Disentanglement in PG-GAN

no code implementations16 Oct 2020 Xiao Liu, Jiajie Zhang, Siting Li, Zuotong Wu, Yang Yu

We discover that pixel normalization causes object entanglement by in-painting the area occupied by ablated objects.

counterfactual Disentanglement +1

TurboTransformers: An Efficient GPU Serving System For Transformer Models

no code implementations9 Oct 2020 Jiarui Fang, Yang Yu, Chengduo Zhao, Jie zhou

This paper designs a transformer serving system called TurboTransformers, which consists of a computing runtime and a serving framework that address the above challenges.

Management

Reinforced Epidemic Control: Saving Both Lives and Economy

1 code implementation4 Aug 2020 Sirui Song, Zefang Zong, Yong Li, Xue Liu, Yang Yu

Saving lives or economy is a dilemma for epidemic control in most cities while smart-tracing technology raises people's privacy concerns.

Graph Neural Network reinforcement-learning +2

QPLEX: Duplex Dueling Multi-Agent Q-Learning

5 code implementations ICLR 2021 Jianhao Wang, Zhizhou Ren, Terry Liu, Yang Yu, Chongjie Zhang

This paper presents a novel MARL approach, called duPLEX dueling multi-agent Q-learning (QPLEX), which takes a duplex dueling network architecture to factorize the joint value function.

Decision Making Multi-agent Reinforcement Learning +3

Local Neighbor Propagation Embedding

no code implementations29 Jun 2020 Shenglan Liu, Yang Yu

Manifold learning plays a vital role in nonlinear dimensionality reduction, and its ideas also inform other related methods.

Dimensionality Reduction

Affect inTweets: A Transfer Learning Approach

no code implementations LREC 2020 Linrui Zhang, Hsin-Lun Huang, Yang Yu, Dan Moldovan

As opposed to the traditional machine learning models which require considerable effort in designing task specific features, our model can be well adapted to the proposed tasks with a very limited amount of fine-tuning, which significantly reduces the manual effort in feature engineering.

Feature Engineering Transfer Learning

AliExpress Learning-To-Rank: Maximizing Online Model Performance without Going Online

no code implementations25 Mar 2020 Guangda Huzhang, Zhen-Jia Pang, Yongqing Gao, Yawen Liu, Weijie Shen, Wen-Ji Zhou, Qing Da, An-Xiang Zeng, Han Yu, Yang Yu, Zhi-Hua Zhou

The framework consists of an evaluator that generalizes to evaluate recommendations in context, a generator that maximizes the evaluator score via reinforcement learning, and a discriminator that ensures the generalization of the evaluator.

Learning-To-Rank Reinforcement Learning

Novelty-Prepared Few-Shot Classification

1 code implementation1 Mar 2020 Chao Wang, Ruo-Ze Liu, Han-Jia Ye, Yang Yu

We disclose that a classically fully trained feature extractor can leave little embedding space for unseen classes, which keeps the model from well-fitting the new classes.

Classification General Classification

Simultaneous Inference for Massive Data: Distributed Bootstrap

no code implementations ICML 2020 Yang Yu, Shih-Kang Chao, Guang Cheng

In this paper, we propose a bootstrap method applied to massive data processed distributedly in a large number of machines.

Residual Bootstrap Exploration for Bandit Algorithms

no code implementations19 Feb 2020 Chi-Hua Wang, Yang Yu, Botao Hao, Guang Cheng

In this paper, we propose a novel perturbation-based exploration method in bandit algorithms with bounded or unbounded rewards, called residual bootstrap exploration (\texttt{ReBoot}).

Computational Efficiency Thompson Sampling

Temporal-adaptive Hierarchical Reinforcement Learning

no code implementations6 Feb 2020 Wen-Ji Zhou, Yang Yu

Hierarchical reinforcement learning (HRL) helps address large-scale and sparse reward issues in reinforcement learning.