1 code implementation • 22 Jan 2025 • Chenjia Bai, Yang Zhang, Shuang Qiu, Qiaosheng Zhang, Kang Xu, Xuelong Li
Then, we reformulate our objective to direct preference optimization with an exploration term, where the UCB-term can be converted to a count-based exploration bonus.
no code implementations • 6 Jan 2025 • Xujin Li, Wei Wei, Shuang Qiu, Xinyi Zhang, Fu Li, Huiguang He
Specifically, we propose a prompt encoder based on the language-image pre-trained model to extract language-image features from task-specific prompts and stimulus images as prior knowledge for enhancing EEG decoding.
no code implementations • 9 Sep 2024 • Zhao Shan, Chenyou Fan, Shuang Qiu, Jiyuan Shi, Chenjia Bai
In this work, we propose a novel framework, Forward KL regularized Preference optimization for aligning Diffusion policies, to align the diffusion policy with preferences directly.
no code implementations • 24 Jul 2024 • Shuang Qiu, Dake Zhang, Rui Yang, Boxiang Lyu, Tong Zhang
This paper investigates multi-objective reinforcement learning (MORL), which focuses on learning Pareto optimal policies in the presence of multiple reward functions.
Multi-Objective Reinforcement Learning
reinforcement-learning
+1
no code implementations • 10 Jul 2024 • Dake Zhang, Boxiang Lyu, Shuang Qiu, Mladen Kolar, Tong Zhang
We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes.
no code implementations • 1 Jul 2024 • Changde Du, Kaicheng Fu, Bincheng Wen, Yi Sun, Jie Peng, Wei Wei, Ying Gao, Shengpei Wang, Chuncheng Zhang, Jinpeng Li, Shuang Qiu, Le Chang, Huiguang He
The conceptualization and categorization of natural objects in the human mind have long intrigued cognitive scientists and neuroscientists, offering crucial insights into human perception and cognition.
no code implementations • 5 Apr 2024 • Xize Liang, Chao Chen, Shuang Qiu, Jie Wang, Yue Wu, Zhihang Fu, Zhihao Shi, Feng Wu, Jieping Ye
Preference alignment is pivotal for empowering large language models (LLMs) to generate helpful and harmless responses.
1 code implementation • 28 Feb 2024 • Haoxiang Wang, Yong Lin, Wei Xiong, Rui Yang, Shizhe Diao, Shuang Qiu, Han Zhao, Tong Zhang
Additionally, DPA models user preferences as directions (i. e., unit vectors) in the reward space to achieve user-dependent preference control.
2 code implementations • 15 Feb 2024 • Rui Yang, Xiaoman Pan, Feng Luo, Shuang Qiu, Han Zhong, Dong Yu, Jianshu Chen
We consider the problem of multi-objective alignment of foundation models with human preferences, which is a critical step towards helpful and harmless AI systems.
1 code implementation • 12 Jan 2024 • Xujin Li, Wei Wei, Shuang Qiu, Huiguang He
The performance improvement of traditional decoding methods relies on a substantial amount of training data from new test subjects, which increases preparation time for BCI systems.
no code implementations • 13 Aug 2023 • Chen Wang, Zhongcai Pei, Shuang Qiu, Yachun Wang, Zhiyong Tang
Experiments on our dataset show that our method has a significant improvement over the previous best monocular vision method, with an intersection over union (IOU) increase of 3. 4 %, and the lightweight version has a fast detection speed and can meet the requirements of most real-time applications.
no code implementations • 28 May 2023 • Kang Xu, Chenjia Bai, Shuang Qiu, Haoran He, Bin Zhao, Zhen Wang, Wei Li, Xuelong Li
Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence.
no code implementations • 2 Dec 2022 • Chen Wang, Zhongcai Pei, Shuang Qiu, Zhiyong Tang
Specifically, we design a selective module, which can make the network learn the complementary relationship between the RGB map and the depth map and effectively combine the information from the RGB map and the depth map in different scenes.
1 code implementation • 29 Jul 2022 • Shuang Qiu, Lingxiao Wang, Chenjia Bai, Zhuoran Yang, Zhaoran Wang
Moreover, under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
no code implementations • 25 Jul 2022 • Shuang Qiu, Xiaohan Wei, Jieping Ye, Zhaoran Wang, Zhuoran Yang
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization in a non-stationary environment.
1 code implementation • 12 Jun 2022 • Lijie Xu, Shuang Qiu, Binhang Yuan, Jiawei Jiang, Cedric Renggli, Shaoduo Gan, Kaan Kara, Guoliang Li, Ji Liu, Wentao Wu, Jieping Ye, Ce Zhang
In this paper, we first conduct a systematic empirical study on existing data shuffling strategies, which reveals that all existing strategies have room for improvement -- they all suffer in terms of I/O performance or convergence rate.
no code implementations • 25 Feb 2022 • Shuang Qiu, Boxiang Lyu, Qinglin Meng, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan
Dynamic mechanism design studies how mechanism designers should allocate resources among agents in a time-varying environment.
no code implementations • 14 Jan 2022 • Chen Wang, Zhongcai Pei, Shuang Qiu, Zhiyong Tang
Staircases are some of the most common building structures in urban environments.
no code implementations • 27 Nov 2021 • Weizhong Zhang, Shuang Qiu
To the best of our knowledge, this is the first screening method which introduces the dual optimum estimation technique -- by carefully exploring and exploiting the strong convexity and the complex structure of the dual problem -- in static screening methods to dynamic screening.
no code implementations • 19 Oct 2021 • Shuang Qiu, Jieping Ye, Zhaoran Wang, Zhuoran Yang
Then, given any extrinsic reward, the agent computes the policy via a planning algorithm with offline data collected in the exploration phase.
1 code implementation • 19 Jul 2021 • Dawei Du, Longyin Wen, Pengfei Zhu, Heng Fan, QinGhua Hu, Haibin Ling, Mubarak Shah, Junwen Pan, Ali Al-Ali, Amr Mohamed, Bakour Imene, Bin Dong, Binyu Zhang, Bouchali Hadia Nesma, Chenfeng Xu, Chenzhen Duan, Ciro Castiello, Corrado Mencar, Dingkang Liang, Florian Krüger, Gennaro Vessio, Giovanna Castellano, Jieru Wang, Junyu Gao, Khalid Abualsaud, Laihui Ding, Lei Zhao, Marco Cianciotta, Muhammad Saqib, Noor Almaadeed, Omar Elharrouss, Pei Lyu, Qi Wang, Shidong Liu, Shuang Qiu, Siyang Pan, Somaya Al-Maadeed, Sultan Daud Khan, Tamer Khattab, Tao Han, Thomas Golda, Wei Xu, Xiang Bai, Xiaoqing Xu, Xuelong Li, Yanyun Zhao, Ye Tian, Yingnan Lin, Yongchao Xu, Yuehan Yao, Zhenyu Xu, Zhijian Zhao, Zhipeng Luo, Zhiwei Wei, Zhiyuan Zhao
Crowd counting on the drone platform is an interesting topic in computer vision, which brings new challenges such as small object inference, background clutter and wide viewpoint.
4 code implementations • CVPR 2021 • Zhengxia Zou, Tianyang Shi, Shuang Qiu, Yi Yuan, Zhenwei Shi
Different from previous image-to-image translation methods that formulate the translation as pixel-wise prediction, we deal with such an artistic creation process in a vectorized environment and produce a sequence of physically meaningful stroke parameters that can be further used for rendering.
no code implementations • 23 Aug 2020 • Shuang Qiu, Zhuoran Yang, Xiaohan Wei, Jieping Ye, Zhaoran Wang
Existing approaches for this problem are based on two-timescale or double-loop stochastic gradient algorithms, which may also require sampling large-batch data.
no code implementations • ACL 2020 • Jianxing Yu, Wei Liu, Shuang Qiu, Qinliang Su, Kai Wang, Xiaojun Quan, Jian Yin
Specifically, we first build a multi-hop generation model and guide it to satisfy the logical rationality by the reasoning chain extracted from a given text.
no code implementations • 22 Jun 2020 • Shuang Qiu, Xiaohan Wei, Mladen Kolar
We study online convex optimization with constraints consisting of multiple functional constraints and a relatively simple constraint set, such as a Euclidean ball.
1 code implementation • 12 May 2020 • Yu Wang, Rong Ge, Shuang Qiu
Unlike existing work in deep neural network (DNN) graphs optimization for inference performance, we explore DNN graph optimization for energy awareness and savings for power- and resource-constrained machine learning devices.
no code implementations • IEEE 2020 • Shuang Qiu, Yao Zhao, Jianbo Jiao, Yunchao Wei, Shikui Wei
To this end, we propose to train the referring image segmentation model in a generative adversarial fashion, which well addresses the distribution similarity problem.
no code implementations • NeurIPS 2020 • Shuang Qiu, Xiaohan Wei, Zhuoran Yang, Jieping Ye, Zhaoran Wang
In particular, we prove that the proposed algorithm achieves $\widetilde{\mathcal{O}}(L|\mathcal{S}|\sqrt{|\mathcal{A}|T})$ upper bounds of both the regret and the constraint violation, where $L$ is the length of each episode.
1 code implementation • 11 Oct 2019 • Chaoyang He, Conghui Tan, Hanlin Tang, Shuang Qiu, Ji Liu
However, in many social network scenarios, centralized federated learning is not applicable (e. g., a central agent or server connecting all users may not exist, or the communication cost to the central server is not affordable).
no code implementations • 25 Sep 2019 • Shupeng Gui, Xiangliang Zhang, Pan Zhong, Shuang Qiu, Mingrui Wu, Jieping Ye, Zhengdao Wang, Ji Liu
The key problem in graph node embedding lies in how to define the dependence to neighbors.
no code implementations • NeurIPS Workshop Deep_Invers 2019 • Shuang Qiu, Xiaohan Wei, Zhuoran Yang
In this paper, we consider a new framework for the one-bit sensing problem where the sparsity is implicitly enforced via mapping a low dimensional representation $x_0$ through a known $n$-layer ReLU generative network $G:\mathbb{R}^k\rightarrow\mathbb{R}^d$.
no code implementations • ICML 2020 • Shuang Qiu, Xiaohan Wei, Zhuoran Yang
Specifically, we consider a new framework for this problem where the sparsity is implicitly enforced via mapping a low dimensional representation $x_0 \in \mathbb{R}^k$ through a known $n$-layer ReLU generative network $G:\mathbb{R}^k\rightarrow\mathbb{R}^d$ such that $\theta_0 = G(x_0)$.
no code implementations • 17 Jul 2019 • Hanlin Tang, Xiangru Lian, Shuang Qiu, Lei Yuan, Ce Zhang, Tong Zhang, Ji Liu
Since the \emph{decentralized} training has been witnessed to be superior to the traditional \emph{centralized} training in the communication restricted scenario, therefore a natural question to ask is "how to apply the error-compensated technology to the decentralized learning to further reduce the communication cost."
no code implementations • 30 Jan 2019 • Ming Lin, Shuang Qiu, Jieping Ye, Xiaomin Song, Qi Qian, Liang Sun, Shenghuo Zhu, Rong Jin
This bound is sub-optimal comparing to the information theoretical lower bound $\mathcal{O}(kd)$.
no code implementations • 29 Jan 2019 • Yawei Zhao, Chen Yu, Peilin Zhao, Hanlin Tang, Shuang Qiu, Ji Liu
Decentralized Online Learning (online learning in decentralized networks) attracts more and more attention, since it is believed that Decentralized Online Learning can help the data providers cooperatively better solve their online problems without sharing their private data to a third party or other providers.
no code implementations • 8 Oct 2018 • Yawei Zhao, Shuang Qiu, Ji Liu
While the online gradient method has been shown to be optimal for the static regret metric, the optimal algorithm for the dynamic regret remains unknown.
no code implementations • 27 Sep 2018 • Shupeng Gui, Xiangliang Zhang, Shuang Qiu, Mingrui Wu, Jieping Ye, Ji Liu
Our method can 1) learn an arbitrary form of the representation function from the neighborhood, without losing any potential dependence structures, 2) automatically decide the significance of neighbors at different distances, and 3) be applicable to both homogeneous and heterogeneous graph embedding, which may contain multiple types of nodes.
no code implementations • 28 May 2018 • Shupeng Gui, Xiangliang Zhang, Shuang Qiu, Mingrui Wu, Jieping Ye, Ji Liu
Graph embedding is a central problem in social network analysis and many other applications, aiming to learn the vector representation for each node.
no code implementations • 17 Mar 2017 • Shuang Qiu, Tingjin Luo, Jieping Ye, Ming Lin
We study an extreme scenario in multi-label learning where each training instance is endowed with a single one-bit label out of multiple labels.
no code implementations • 2 Mar 2017 • Ming Lin, Shuang Qiu, Bin Hong, Jieping Ye
We show that the conventional gradient descent heuristic is biased by the skewness of the distribution therefore is no longer the best practice of learning the SLM.