Search Results for author: Shuang Qiu

Found 40 papers, 10 papers with code

Online Preference Alignment for Language Models via Count-based Exploration

1 code implementation 22 Jan 2025 Chenjia Bai, Yang Zhang, Shuang Qiu, Qiaosheng Zhang, Kang Xu, Xuelong Li

Then, we reformulate our objective to direct preference optimization with an exploration term, where the UCB-term can be converted to a count-based exploration bonus.

Instruction Following
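The count-based bonus described in the abstract above has a simple generic form: an optimism term that decays with the visitation count of a prompt-response pair. A minimal sketch, assuming a 1/sqrt(N+1) bonus with an illustrative coefficient (not necessarily the paper's exact objective):

```python
import math
from collections import Counter

def count_bonus(counts, key, coef=1.0):
    """UCB-style count-based exploration bonus: coef / sqrt(N(key) + 1).
    Largest for unseen keys; shrinks as they are revisited."""
    return coef / math.sqrt(counts[key] + 1)

counts = Counter()
b_unseen = count_bonus(counts, "prompt-response")  # 1 / sqrt(1) = 1.0
counts["prompt-response"] += 3                     # observe the pair 3 times
b_seen = count_bonus(counts, "prompt-response")    # 1 / sqrt(4) = 0.5
```

In the paper's setting such a bonus would be added to the preference-optimization objective as the exploration term; here it is shown in isolation.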

Integrating Language-Image Prior into EEG Decoding for Cross-Task Zero-Calibration RSVP-BCI

no code implementations 6 Jan 2025 Xujin Li, Wei Wei, Shuang Qiu, Xinyi Zhang, Fu Li, Huiguang He

Specifically, we propose a prompt encoder based on the language-image pre-trained model to extract language-image features from task-specific prompts and stimulus images as prior knowledge for enhancing EEG decoding.

Brain Computer Interface EEG +2

Forward KL Regularized Preference Optimization for Aligning Diffusion Policies

no code implementations 9 Sep 2024 Zhao Shan, Chenyou Fan, Shuang Qiu, Jiyuan Shi, Chenjia Bai

In this work, we propose a novel framework, Forward KL regularized Preference optimization for aligning Diffusion policies, to align the diffusion policy with preferences directly.

D4RL Decision Making +2

Traversing Pareto Optimal Policies: Provably Efficient Multi-Objective Reinforcement Learning

no code implementations 24 Jul 2024 Shuang Qiu, Dake Zhang, Rui Yang, Boxiang Lyu, Tong Zhang

This paper investigates multi-objective reinforcement learning (MORL), which focuses on learning Pareto optimal policies in the presence of multiple reward functions.

Multi-Objective Reinforcement Learning reinforcement-learning +1
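The Pareto optimality that the title above refers to can be made concrete with a dominance check over reward vectors; a small self-contained sketch (the example points are illustrative):

```python
def dominates(a, b):
    """a Pareto-dominates b: at least as good in every objective and
    strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

front = pareto_front([(1, 2), (2, 1), (0, 0)])  # (0, 0) is dominated by both others
```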

Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning

no code implementations 10 Jul 2024 Dake Zhang, Boxiang Lyu, Shuang Qiu, Mladen Kolar, Tong Zhang

We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes.

Decision Making Offline RL +3
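One widely used risk measure in risk-sensitive RL is conditional value-at-risk (CVaR), the mean of the worst alpha-fraction of outcomes. The excerpt does not say which risk measure this paper adopts, so the following is a generic illustration only:

```python
def cvar(returns, alpha):
    """Conditional value-at-risk: the average of the worst alpha-fraction
    of the returns (lower returns = worse outcomes)."""
    k = max(1, int(len(returns) * alpha))
    worst = sorted(returns)[:k]
    return sum(worst) / k

risk = cvar([1.0, 2.0, 3.0, 4.0], alpha=0.5)  # mean of the two worst returns: 1.5
```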

Human-like object concept representations emerge naturally in multimodal large language models

no code implementations 1 Jul 2024 Changde Du, Kaicheng Fu, Bincheng Wen, Yi Sun, Jie Peng, Wei Wei, Ying Gao, Shengpei Wang, Chuncheng Zhang, Jinpeng Li, Shuang Qiu, Le Chang, Huiguang He

The conceptualization and categorization of natural objects in the human mind have long intrigued cognitive scientists and neuroscientists, offering crucial insights into human perception and cognition.

Triplet

ROPO: Robust Preference Optimization for Large Language Models

no code implementations 5 Apr 2024 Xize Liang, Chao Chen, Shuang Qiu, Jie Wang, Yue Wu, Zhihang Fu, Zhihao Shi, Feng Wu, Jieping Ye

Preference alignment is pivotal for empowering large language models (LLMs) to generate helpful and harmless responses.

Text Generation

Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards

1 code implementation 28 Feb 2024 Haoxiang Wang, Yong Lin, Wei Xiong, Rui Yang, Shizhe Diao, Shuang Qiu, Han Zhao, Tong Zhang

Additionally, DPA models user preferences as directions (i.e., unit vectors) in the reward space to achieve user-dependent preference control.
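Modeling a preference as a unit vector suggests scalarizing a multi-objective reward by projecting it onto that direction. A minimal sketch of that idea (the dot-product scalarization and example objectives are assumptions, not the paper's exact training recipe):

```python
import math

def scalarize(reward_vec, direction):
    """Project a multi-objective reward onto a user-preference direction
    (normalized to a unit vector first)."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    return sum(r * u for r, u in zip(reward_vec, unit))

rewards = [2.0, 1.0]                          # e.g. (helpfulness, brevity) scores
only_first = scalarize(rewards, [1.0, 0.0])   # weights only the first objective
blended = scalarize(rewards, [3.0, 4.0])      # normalized to the unit vector (0.6, 0.8)
```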

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

2 code implementations 15 Feb 2024 Rui Yang, Xiaoman Pan, Feng Luo, Shuang Qiu, Han Zhong, Dong Yu, Jianshu Chen

We consider the problem of multi-objective alignment of foundation models with human preferences, which is a critical step towards helpful and harmless AI systems.

Reinforcement Learning (RL)

A Temporal-Spectral Fusion Transformer with Subject-Specific Adapter for Enhancing RSVP-BCI Decoding

1 code implementation 12 Jan 2024 Xujin Li, Wei Wei, Shuang Qiu, Huiguang He

The performance improvement of traditional decoding methods relies on a substantial amount of training data from new test subjects, which increases preparation time for BCI systems.

Brain Computer Interface EEG

StairNetV3: Depth-aware Stair Modeling using Deep Learning

no code implementations 13 Aug 2023 Chen Wang, Zhongcai Pei, Shuang Qiu, Yachun Wang, Zhiyong Tang

Experiments on our dataset show that our method significantly improves on the previous best monocular vision method, with an intersection-over-union (IoU) increase of 3.4%, and that the lightweight version has a fast detection speed that meets the requirements of most real-time applications.

Deep Learning Point cloud reconstruction

On the Value of Myopic Behavior in Policy Reuse

no code implementations 28 May 2023 Kang Xu, Chenjia Bai, Shuang Qiu, Haoran He, Bin Zhao, Zhen Wang, Wei Li, Xuelong Li

Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence.

RGB-D-based Stair Detection using Deep Learning for Autonomous Stair Climbing

no code implementations 2 Dec 2022 Chen Wang, Zhongcai Pei, Shuang Qiu, Zhiyong Tang

Specifically, we design a selective module, which can make the network learn the complementary relationship between the RGB map and the depth map and effectively combine the information from the RGB map and the depth map in different scenes.

Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning

1 code implementation 29 Jul 2022 Shuang Qiu, Lingxiao Wang, Chenjia Bai, Zhuoran Yang, Zhaoran Wang

Moreover, under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.

Contrastive Learning Deep Reinforcement Learning +4
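The contrastive loss referenced above is typically an InfoNCE-style objective: classify the positive pair against negatives by their similarities. A self-contained sketch (temperature and similarity inputs are illustrative; the paper couples such a loss with UCB-type online RL, which is not shown here):

```python
import math

def info_nce(sim_pos, sim_negs, temperature=1.0):
    """InfoNCE-style contrastive loss: negative log-softmax probability of
    the positive similarity among positive + negative similarities."""
    logits = [sim_pos] + list(sim_negs)
    m = max(l / temperature for l in logits)               # for numerical stability
    exps = [math.exp(l / temperature - m) for l in logits]
    return -math.log(exps[0] / sum(exps))

easy = info_nce(2.0, [0.0, 0.0])   # confident positive -> small loss
hard = info_nce(0.0, [0.0, 0.0])   # indistinguishable -> larger loss
```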

Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

no code implementations 25 Jul 2022 Shuang Qiu, Xiaohan Wei, Jieping Ye, Zhaoran Wang, Zhuoran Yang

Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization in a non-stationary environment.

Stochastic Gradient Descent without Full Data Shuffle

1 code implementation 12 Jun 2022 Lijie Xu, Shuang Qiu, Binhang Yuan, Jiawei Jiang, Cedric Renggli, Shaoduo Gan, Kaan Kara, Guoliang Li, Ji Liu, Wentao Wu, Jieping Ye, Ce Zhang

In this paper, we first conduct a systematic empirical study on existing data shuffling strategies, which reveals that all existing strategies have room for improvement -- they all suffer in terms of I/O performance or convergence rate.

Computational Efficiency
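The I/O-vs-convergence tension described above comes from a full random shuffle needing random disk access. A simplified two-level ("block") shuffle in the spirit of the strategies the paper studies: shuffle the order of contiguous blocks, then shuffle within each block (the block size and exact policy here are illustrative assumptions, not the paper's algorithm):

```python
import random

def block_shuffle(data, block_size, seed=0):
    """Shuffle block order, then shuffle within each block. Reads stay mostly
    sequential (one block at a time) while approximating a full shuffle."""
    rng = random.Random(seed)
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    rng.shuffle(blocks)                 # cheap: permute the order of blocks
    for block in blocks:
        rng.shuffle(block)              # local: permute within each block
    return [x for block in blocks for x in block]

epoch_order = block_shuffle(list(range(10)), block_size=5)
```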

Deep Leaning-Based Ultra-Fast Stair Detection

no code implementations 14 Jan 2022 Chen Wang, Zhongcai Pei, Shuang Qiu, Zhiyong Tang

Staircases are some of the most common building structures in urban environments.

Diversity Line Detection +3

Safe Screening for Sparse Conditional Random Fields

no code implementations 27 Nov 2021 Weizhong Zhang, Shuang Qiu

To the best of our knowledge, this is the first screening method to carry the dual-optimum estimation technique -- which carefully explores and exploits the strong convexity and complex structure of the dual problem -- from static screening methods over to dynamic screening.

Structured Prediction

On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game

no code implementations 19 Oct 2021 Shuang Qiu, Jieping Ye, Zhaoran Wang, Zhuoran Yang

Then, given any extrinsic reward, the agent computes the policy via a planning algorithm with offline data collected in the exploration phase.

Reinforcement Learning (RL)

Stylized Neural Painting

4 code implementations CVPR 2021 Zhengxia Zou, Tianyang Shi, Shuang Qiu, Yi Yuan, Zhenwei Shi

Different from previous image-to-image translation methods that formulate the translation as pixel-wise prediction, we deal with such an artistic creation process in a vectorized environment and produce a sequence of physically meaningful stroke parameters that can be further used for rendering.

Disentanglement Image-to-Image Translation +2

Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning

no code implementations 23 Aug 2020 Shuang Qiu, Zhuoran Yang, Xiaohan Wei, Jieping Ye, Zhaoran Wang

Existing approaches for this problem are based on two-timescale or double-loop stochastic gradient algorithms, which may also require sampling large-batch data.

Low-Resource Generation of Multi-hop Reasoning Questions

no code implementations ACL 2020 Jianxing Yu, Wei Liu, Shuang Qiu, Qinliang Su, Kai Wang, Xiaojun Quan, Jian Yin

Specifically, we first build a multi-hop generation model and guide it to satisfy the logical rationality by the reasoning chain extracted from a given text.

Machine Reading Comprehension

Gradient-Variation Bound for Online Convex Optimization with Constraints

no code implementations 22 Jun 2020 Shuang Qiu, Xiaohan Wei, Mladen Kolar

We study online convex optimization with constraints consisting of multiple functional constraints and a relatively simple constraint set, such as a Euclidean ball.

Energy-Aware DNN Graph Optimization

1 code implementation 12 May 2020 Yu Wang, Rong Ge, Shuang Qiu

Unlike existing work in deep neural network (DNN) graphs optimization for inference performance, we explore DNN graph optimization for energy awareness and savings for power- and resource-constrained machine learning devices.

Referring Image Segmentation by Generative Adversarial Learning

no code implementations IEEE 2020 Shuang Qiu, Yao Zhao, Jianbo Jiao, Yunchao Wei, Shikui Wei

To this end, we propose to train the referring image segmentation model in a generative adversarial fashion, which well addresses the distribution similarity problem.

Image Segmentation Referring Expression +4

Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss

no code implementations NeurIPS 2020 Shuang Qiu, Xiaohan Wei, Zhuoran Yang, Jieping Ye, Zhaoran Wang

In particular, we prove that the proposed algorithm achieves $\widetilde{\mathcal{O}}(L|\mathcal{S}|\sqrt{|\mathcal{A}|T})$ upper bounds of both the regret and the constraint violation, where $L$ is the length of each episode.

reinforcement-learning Reinforcement Learning +1

Central Server Free Federated Learning over Single-sided Trust Social Networks

1 code implementation 11 Oct 2019 Chaoyang He, Conghui Tan, Hanlin Tang, Shuang Qiu, Ji Liu

However, in many social network scenarios, centralized federated learning is not applicable (e.g., a central agent or server connecting all users may not exist, or the communication cost to the central server is not affordable).

Federated Learning

Robust One-Bit Recovery via ReLU Generative Networks: Improved Statistical Rate and Global Landscape Analysis

no code implementations NeurIPS Workshop Deep_Invers 2019 Shuang Qiu, Xiaohan Wei, Zhuoran Yang

In this paper, we consider a new framework for the one-bit sensing problem where the sparsity is implicitly enforced via mapping a low dimensional representation $x_0$ through a known $n$-layer ReLU generative network $G:\mathbb{R}^k\rightarrow\mathbb{R}^d$.

Robust One-Bit Recovery via ReLU Generative Networks: Near-Optimal Statistical Rate and Global Landscape Analysis

no code implementations ICML 2020 Shuang Qiu, Xiaohan Wei, Zhuoran Yang

Specifically, we consider a new framework for this problem where the sparsity is implicitly enforced via mapping a low dimensional representation $x_0 \in \mathbb{R}^k$ through a known $n$-layer ReLU generative network $G:\mathbb{R}^k\rightarrow\mathbb{R}^d$ such that $\theta_0 = G(x_0)$.
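The measurement model both of these abstracts describe can be written compactly; a standard one-bit formulation with a generative prior (the exact noise/sign-flip model is not given in the excerpts, so this is the noiseless base case):

```latex
y_i = \operatorname{sign}\big(\langle a_i, \theta_0 \rangle\big), \qquad
\theta_0 = G(x_0), \qquad G:\mathbb{R}^k \rightarrow \mathbb{R}^d,\ k \ll d,
```

where recovery means estimating $x_0$ (and hence $\theta_0$) from the binary observations $\{y_i\}$.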

DeepSqueeze: Decentralization Meets Error-Compensated Compression

no code implementations 17 Jul 2019 Hanlin Tang, Xiangru Lian, Shuang Qiu, Lei Yuan, Ce Zhang, Tong Zhang, Ji Liu

Since decentralized training has been shown to be superior to traditional centralized training in communication-restricted scenarios, a natural question to ask is how to apply error-compensated compression to decentralized learning to further reduce the communication cost.

Decentralized Online Learning: Take Benefits from Others' Data without Sharing Your Own to Track Global Trend

no code implementations 29 Jan 2019 Yawei Zhao, Chen Yu, Peilin Zhao, Hanlin Tang, Shuang Qiu, Ji Liu

Decentralized Online Learning (online learning in decentralized networks) has attracted increasing attention, since it is believed to help data providers cooperatively solve their online problems without sharing their private data with a third party or other providers.

Proximal Online Gradient is Optimum for Dynamic Regret

no code implementations 8 Oct 2018 Yawei Zhao, Shuang Qiu, Ji Liu

While the online gradient method has been shown to be optimal for the static regret metric, the optimal algorithm for the dynamic regret remains unknown.
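Dynamic regret measures an online learner against a time-varying comparator, and the online gradient step at its core is just x_{t+1} = x_t - lr * g_t(x_t). A minimal sketch tracking a drifting quadratic (this is the plain, non-proximal step; the paper's proximal variant additionally applies a proximal operator for a regularizer):

```python
def online_gradient(grads, x0, lr):
    """Online gradient descent over a sequence of per-round gradient oracles:
    x_{t+1} = x_t - lr * g_t(x_t). Returns the iterate path."""
    x = x0
    path = [x]
    for grad in grads:
        x = x - lr * grad(x)
        path.append(x)
    return path

# Per-round loss f_t(x) = (x - theta_t)^2 with a drifting minimizer theta_t.
thetas = [0.0, 1.0, 2.0]
grads = [(lambda x, th=th: 2.0 * (x - th)) for th in thetas]
path = online_gradient(grads, x0=0.0, lr=0.5)  # each step lands on the previous round's minimizer
```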

P^2IR: Universal Deep Node Representation via Partial Permutation Invariant Set Functions

no code implementations 27 Sep 2018 Shupeng Gui, Xiangliang Zhang, Shuang Qiu, Mingrui Wu, Jieping Ye, Ji Liu

Our method can 1) learn an arbitrary form of the representation function from the neighborhood, without losing any potential dependence structures, 2) automatically decide the significance of neighbors at different distances, and 3) be applicable to both homogeneous and heterogeneous graph embedding, which may contain multiple types of nodes.

Graph Embedding Representation Learning

GESF: A Universal Discriminative Mapping Mechanism for Graph Representation Learning

no code implementations 28 May 2018 Shupeng Gui, Xiangliang Zhang, Shuang Qiu, Mingrui Wu, Jieping Ye, Ji Liu

Graph embedding is a central problem in social network analysis and many other applications, aiming to learn the vector representation for each node.

Graph Embedding Graph Representation Learning

Nonconvex One-bit Single-label Multi-label Learning

no code implementations 17 Mar 2017 Shuang Qiu, Tingjin Luo, Jieping Ye, Ming Lin

We study an extreme scenario in multi-label learning where each training instance is endowed with a single one-bit label out of multiple labels.

Multi-Label Learning

The Second Order Linear Model

no code implementations 2 Mar 2017 Ming Lin, Shuang Qiu, Bin Hong, Jieping Ye

We show that the conventional gradient descent heuristic is biased by the skewness of the distribution therefore is no longer the best practice of learning the SLM.

Open-Ended Question Answering
