1 code implementation • 8 Aug 2022 • Qianying Lin, Wen-Ji Zhou, Yanshi Wang, Qing Da, Qing-Guo Chen, Bing Wang
SAM supports efficient training and real-time inference for user behavior sequences with lengths on the scale of thousands.
no code implementations • 30 Dec 2021 • Chenlin Shen, Guangda Huzhang, YuHang Zhou, Chen Liang, Qing Da
Our algorithm can directly optimize the linear program in the primal space, and its solution can be applied through a simple stochastic strategy that achieves the optimized objective and satisfies the constraints in expectation.
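The snippet does not spell out the stochastic strategy. As a generic illustration, and not the authors' algorithm, a fractional LP solution can be executed by picking each option with probability equal to its fractional value. Any linear objective or constraint then holds in expectation. The allocation `x`, the per-option `cost`, and all function names below are hypothetical.

```python
import random

# Hypothetical fractional LP solution for one request: the probability of
# choosing each of three options. The leftover 0.2 mass means "choose nothing".
x = [0.5, 0.2, 0.1]
cost = [1.0, 3.0, 5.0]

def sample_action(x, rng):
    """Return index i with probability x[i], or None for the leftover mass."""
    u = rng.random()  # uniform in [0, 1)
    acc = 0.0
    for i, p in enumerate(x):
        acc += p
        if u < acc:
            return i
    return None

def expected_cost(x, cost):
    """Linear cost of the fractional solution (what the LP constrains)."""
    return sum(p * c for p, c in zip(x, cost))

def empirical_cost(x, cost, n, seed=0):
    """Average cost actually incurred when the solution is executed stochastically."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        i = sample_action(x, rng)
        if i is not None:
            total += cost[i]
    return total / n
```

Over many requests, `empirical_cost(x, cost, 200_000)` approaches `expected_cost(x, cost)` (1.6 here). This is why a budget constraint written on the fractional solution is met in expectation rather than on every single request.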
no code implementations • 29 Sep 2021 • Qianying Lin, Wen-Ji Zhou, Yanshi Wang, Qing Da, Qing-Guo Chen, Bing Wang
Extensive empirical studies show that our method outperforms various state-of-the-art sequential modeling methods on both public and industrial datasets for long sequential user behavior modeling.
no code implementations • 19 Jul 2021 • Xuesi Wang, Guangda Huzhang, Qianying Lin, Qing Da
Combining ideas from Bayesian Optimization and gradient descent, we solve the online contextual Black-Box Optimization task of finding the optimal weights for sub-models given a chosen RA model.
no code implementations • 16 Jul 2021 • Yongqing Gao, Guangda Huzhang, Weijie Shen, Yawen Liu, Wen-Ji Zhou, Qing Da, Yang Yu
Recent E-commerce applications benefit from the growth of deep learning techniques.
no code implementations • 23 Mar 2021 • Junmei Hao, JingCheng Shi, Qing Da, AnXiang Zeng, Yujie Dun, Xueming Qian, Qianying Lin
Each of a user's interests should be reasonably distinct from the others, so we introduce three strategies that act as a diversity-regularized separator to separate the multiple user interest vectors.
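The snippet does not name the three strategies. One common form of diversity regularizer, shown here purely as an assumed illustration, penalizes the average pairwise cosine similarity between a user's interest vectors. The function names are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def diversity_penalty(interests):
    """Mean pairwise cosine similarity among K >= 2 interest vectors.

    Adding lambda * diversity_penalty(...) to the training loss (lambda is an
    assumed weight) pushes the interest vectors apart: the penalty is 0 for
    mutually orthogonal vectors and 1 when all vectors point the same way.
    """
    pairs = list(combinations(interests, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```

For example, `diversity_penalty([[1, 0], [0, 1]])` is 0.0, while two parallel vectors such as `[[1, 0], [2, 0]]` give 1.0.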
no code implementations • 24 Nov 2020 • Yanshi Wang, Jie Zhang, Qing Da, AnXiang Zeng
In this paper, we propose a novel neural network framework ESDF to tackle the above three challenges simultaneously.
no code implementations • 25 Mar 2020 • Guangda Huzhang, Zhen-Jia Pang, Yongqing Gao, Yawen Liu, Weijie Shen, Wen-Ji Zhou, Qing Da, An-Xiang Zeng, Han Yu, Yang Yu, Zhi-Hua Zhou
The framework consists of three components: an evaluator that generalizes to evaluating recommendations in context, a generator that maximizes the evaluator's score through reinforcement learning, and a discriminator that ensures the evaluator generalizes.
no code implementations • 18 Nov 2018 • Feiyang Pan, Qingpeng Cai, An-Xiang Zeng, Chun-Xiang Pan, Qing Da, Hua-Lin He, Qing He, Pingzhong Tang
Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) have been successfully applied to complex decision-making problems such as Atari games.
no code implementations • 2 Jul 2018 • Hua-Lin He, Chun-Xiang Pan, Qing Da, An-Xiang Zeng
In a large E-commerce platform, all the participants compete for impressions under the allocation mechanism of the platform.
2 code implementations • 25 May 2018 • Jing-Cheng Shi, Yang Yu, Qing Da, Shi-Yong Chen, An-Xiang Zeng
Applying reinforcement learning in physical-world tasks is extremely challenging.
1 code implementation • 2 Mar 2018 • Yujing Hu, Qing Da, An-Xiang Zeng, Yang Yu, Yinghui Xu
To better utilize the correlation between different ranking steps, in this paper we propose using reinforcement learning (RL) to learn an optimal ranking policy that maximizes the expected accumulative reward in a search session.
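The objective in this snippet, the expected accumulative reward of a search session, corresponds to the standard RL return. The snippet does not specify the discount factor `gamma` or the reward values, so both are assumptions here, as are the function names. As a minimal sketch, the return from each ranking step can be computed backward over one session. Averaging the first-step returns of sessions sampled under a policy then gives a Monte Carlo estimate of that policy's expected return.

```python
def returns_to_go(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1} for each ranking step t of one session."""
    g = 0.0
    out = []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    out.reverse()
    return out

def mean_session_return(sessions, gamma):
    """Monte Carlo estimate of the expected return over sampled sessions."""
    firsts = [returns_to_go(s, gamma)[0] if s else 0.0 for s in sessions]
    return sum(firsts) / len(firsts)
```

For instance, a session whose only reward arrives at the third step, `returns_to_go([0, 0, 1], 0.5)`, gives `[0.25, 0.5, 1.0]`. Earlier ranking steps still receive credit for the later reward, which is the correlation between steps that a per-step objective would ignore.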