A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

12 Dec 2023

In this paper, we systematically study the challenges that remain in offline-to-online (O2O) RL and identify that the slow performance improvement and the instability of online finetuning stem from the inaccurate Q-value estimation inherited from offline pretraining.

Offline RL

Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning

24 Jul 2023

Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks due to its high sample efficiency.

Continuous Control, Model-based Reinforcement Learning

ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

29 Nov 2022

In the learning phase, each agent minimizes the TD error that is dependent on how the subsequent agents have reacted to their chosen action.

Decision Making, Q-Learning

Residual Relaxation for Multi-view Representation Learning

NeurIPS 2021

Multi-view methods learn representations by aligning multiple views of the same image and their performance largely depends on the choice of data augmentation.

Data Augmentation, Representation Learning

BN-NAS: Neural Architecture Search with Batch Normalization

ICCV 2021

We present BN-NAS, neural architecture search with Batch Normalization, to accelerate neural architecture search (NAS).

Neural Architecture Search

PSViT: Better Vision Transformer via Token Pooling and Attention Sharing

7 Aug 2021

Then, a compact set of the possible combinations of different token pooling and attention sharing mechanisms is constructed.

DAM: Discrepancy Alignment Metric for Face Recognition

ICCV 2021

To estimate the LID of each face image in the verification process, we propose two types of LID Estimation (LIDE) methods: reference-based and learning-based.

Face Recognition

Inception Convolution with Efficient Dilation Search

CVPR 2021

To develop a practical method for learning complex inception convolution based on the data, a simple but effective search algorithm, referred to as efficient dilation optimization (EDO), is developed.

Human Detection, Instance Segmentation

DETR for Crowd Pedestrian Detection

12 Dec 2020

Furthermore, the bipartite matching of ED harms training efficiency because of the large number of ground truths in crowd scenes.

Decoder, Pedestrian Detection

Adaptive Gradient Method with Resilience and Momentum

21 Oct 2020

Several variants of stochastic gradient descent (SGD) have been proposed to improve learning effectiveness and efficiency when training deep neural networks, among which some recent influential attempts adaptively control the parameter-wise learning rate (e.g., Adam and RMSProp).

Powering One-shot Topological NAS with Stabilized Share-parameter Proxy

ECCV 2020

Specifically, the difficulties of architecture search in such a complex space are eliminated by the proposed stabilized share-parameter proxy, which employs Stochastic Gradient Langevin Dynamics to enable fast shared-parameter sampling and thereby achieves stabilized measurement of architecture performance even in search spaces with complex topological structures.

Neural Architecture Search

Improving One-shot NAS by Suppressing the Posterior Fading

CVPR 2020

In this paper, we analyse existing weight sharing one-shot NAS approaches from a Bayesian point of view and identify the posterior fading problem, which compromises the effectiveness of shared weights.

Neural Architecture Search, object-detection

AM-LFS: AutoML for Loss Function Search

ICCV 2019

The key contribution of this work is the design of a search space that can guarantee generalization and transferability across different vision tasks by including a number of existing prevailing loss functions in a unified formulation.


