COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing

no code implementations • 13 Jun 2024 • Jiangshan Wang, Yue Ma, Jiayi Guo, Yicheng Xiao, Gao Huang, Xiu Li

Specifically, we propose an efficient sliding-window-based strategy to calculate the similarity among tokens in the diffusion features of source videos, identifying the tokens with high correspondence across frames.
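
The windowed search described above can be illustrated with a minimal sketch. It assumes each frame is a small grid of feature vectors and that correspondence is scored by cosine similarity within a local window; the function names, the `radius` parameter, and the choice of a single best match per token are illustrative assumptions, not details taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def window_correspondence(frame_a, frame_b, radius=1):
    """Match each token of frame_a to its most similar token in frame_b.

    frame_a, frame_b: H x W grids (lists of rows) of feature vectors.
    Only tokens inside a (2 * radius + 1)^2 window around the same grid
    position are compared, instead of all H * W tokens (hypothetical setup).
    Returns {(i, j): ((k, l), similarity)}.
    """
    h, w = len(frame_a), len(frame_a[0])
    matches = {}
    for i in range(h):
        for j in range(w):
            best_pos, best_sim = None, -2.0  # cosine is always >= -1
            for k in range(max(0, i - radius), min(h, i + radius + 1)):
                for l in range(max(0, j - radius), min(w, j + radius + 1)):
                    sim = cosine(frame_a[i][j], frame_b[k][l])
                    if sim > best_sim:
                        best_pos, best_sim = (k, l), sim
            matches[(i, j)] = (best_pos, best_sim)
    return matches
```

Restricting the comparison to a window keeps the cost per token proportional to the window size rather than to the full number of tokens in the next frame.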

World Models with Hints of Large Language Models for Goal Achieving

no code implementations • 11 Jun 2024 • Zeyuan Liu, Ziyu Huan, Xiyao Wang, Jiafei Lyu, Jian Tao, Xiu Li, Furong Huang, Huazhe Xu

By assigning higher intrinsic rewards to samples that align with the hints outlined by the language model during model rollouts, DLLM guides the agent toward meaningful and efficient exploration.

Decision Making Efficient Exploration +1

CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning

no code implementations • 11 Jun 2024 • Zeyuan Liu, Kai Yang, Xiu Li

Distribution shift is a major obstacle in offline reinforcement learning, which necessitates minimizing the discrepancy between the learned policy and the behavior policy to avoid overestimating rare or unseen actions.

D4RL Denoising +2

GrootVL: Tree Topology is All You Need in State Space Model

1 code implementation • 4 Jun 2024 • Yicheng Xiao, Lin Song, Shaoli Huang, Jiangshan Wang, Siyu Song, Yixiao Ge, Xiu Li, Ying Shan

The state space models, employing recursively propagated features, demonstrate strong representation capabilities comparable to Transformer models and superior efficiency.

Image Classification object-detection +1

UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment

1 code implementation • 3 Jun 2024 • Hantao Zhou, Longxiang Tang, Rui Yang, Guanyi Qin, Yan Zhang, Runze Hu, Xiu Li

Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) aim to simulate human subjective perception of image visual quality and aesthetic appeal.

Image Quality Assessment

SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation

no code implementations • 30 May 2024 • Junjie Zhang, Chenjia Bai, Haoran He, Wenke Xia, Zhigang Wang, Bin Zhao, Xiu Li, Xuelong Li

In this paper, we propose SAM-E, a novel architecture for robot manipulation by leveraging a vision-foundation model for generalizable scene understanding and sequence imitation for long-term action reasoning.

Instruction Following Robot Manipulation +1

REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment

1 code implementation • 28 May 2024 • Haonan Han, Rui Yang, Huan Liao, Jiankai Xing, Zunnan Xu, Xiaoming Yu, Junwei Zha, Xiu Li, Wanhua Li

Traditional image-to-3D models often struggle with scenes containing multiple objects due to biases and occlusion complexities.

Image to 3D Object +1

Cross-Domain Policy Adaptation by Capturing Representation Mismatch

1 code implementation • 24 May 2024 • Jiafei Lyu, Chenjia Bai, Jingwen Yang, Zongqing Lu, Xiu Li

We perform representation learning only in the target domain and measure the representation deviations on the transitions from the source domain, which we show can be a signal of dynamics mismatch.

Reinforcement Learning (RL) Representation Learning

Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration

no code implementations • 23 May 2024 • Yang Zhang, Shixin Yang, Chenjia Bai, Fei Wu, Xiu Li, Zhen Wang, Xuelong Li

In this paper, we propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans.


MMTryon: Multi-Modal Multi-Reference Control for High-Quality Fashion Generation

no code implementations • 1 May 2024 • Xujie Zhang, Ente Lin, Xiu Li, Yuxuan Luo, Michael Kampffmeyer, Xin Dong, Xiaodan Liang

Besides, to remove the segmentation dependency, MMTryon uses a parsing-free garment encoder and leverages a novel scalable data generation pipeline to convert existing VITON datasets to a form that allows MMTryon to be trained without requiring any explicit segmentation.

Segmentation Virtual Try-on

Contrastive Quantization based Semantic Code for Generative Recommendation

no code implementations • 23 Apr 2024 • Mengqun Jin, Zexuan Qiu, Jieming Zhu, Zhenhua Dong, Xiu Li

Finally, we train and test semantic code with generative retrieval on a sequential recommendation model.

Decoder Language Modelling +3

AV-GAN: Attention-Based Varifocal Generative Adversarial Network for Uneven Medical Image Translation

no code implementations • 16 Apr 2024 • Zexin Li, Yiyang Lin, Zijie Fang, Shuyan Li, Xiu Li

In this paper, we propose the Attention-Based Varifocal Generative Adversarial Network (AV-GAN), which solves multiple problems in pathologic image translation tasks, such as uneven translation difficulty in different regions, mutual interference of multiple resolution information, and nuclear deformation.

Generative Adversarial Network Translation

Video Object Segmentation with Dynamic Query Modulation

1 code implementation • 18 Mar 2024 • Hantao Zhou, Runze Hu, Xiu Li

Storing intermediate frame segmentations as memory for long-range context modeling, spatial-temporal memory-based methods have recently showcased impressive results in semi-supervised video object segmentation (SVOS).

Object Segmentation +3

GRA: Detecting Oriented Objects through Group-wise Rotating and Attention

no code implementations • 17 Mar 2024 • Jiangshan Wang, Yifan Pu, Yizeng Han, Jiayi Guo, Yiru Wang, Xiu Li, Gao Huang

GRA can adaptively capture fine-grained features of objects with diverse orientations, comprising two key components: Group-wise Rotating and Group-wise Attention.

Object object-detection +2

Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives

1 code implementation • CVPR 2024 • Ronghui Li, Yuxiang Zhang, Yachao Zhang, Hongwen Zhang, Jie Guo, Yan Zhang, Yebin Liu, Xiu Li

In contrast, the second stage is the local diffusion, which generates detailed motion sequences in parallel under the guidance of the dance primitives and choreographic rules.

Motion Synthesis

MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models

no code implementations • 14 Mar 2024 • Zunnan Xu, Yukang Lin, Haonan Han, Sicheng Yang, Ronghui Li, Yachao Zhang, Xiu Li

Gesture synthesis is a vital realm of human-computer interaction, with wide-ranging applications across various fields like film, robotics, and virtual reality.

Harmonious Group Choreography with Trajectory-Controllable Diffusion

no code implementations • 10 Mar 2024 • Yuqin Dai, Wanlu Zhu, Ronghui Li, Zeping Ren, Xiangzheng Zhou, Xiu Li, Jun Li, Jian Yang

Specifically, to tackle dancer collisions, we introduce a Dance-Beat Navigator capable of generating trajectories for multiple dancers based on the music, complemented by a Distance-Consistency loss to maintain appropriate spacing among trajectories within a reasonable threshold.

SEABO: A Simple Search-Based Method for Offline Imitation Learning

1 code implementation • 6 Feb 2024 • Jiafei Lyu, Xiaoteng Ma, Le Wan, Runze Liu, Xiu Li, Zongqing Lu

Offline reinforcement learning (RL) has attracted much attention due to its ability to learn from static offline datasets and eliminate the need to interact with the environment.

D4RL Imitation Learning +2

Understanding What Affects Generalization Gap in Visual Reinforcement Learning: Theory and Empirical Evidence

no code implementations • 5 Feb 2024 • Jiafei Lyu, Le Wan, Xiu Li, Zongqing Lu

Recently, many efforts have attempted to learn useful policies for continuous control in visual reinforcement learning (RL).

Continuous Control Learning Theory +1

BATON: Aligning Text-to-Audio Model with Human Preference Feedback

no code implementations • 1 Feb 2024 • Huan Liao, Haonan Han, Kai Yang, Tianjiao Du, Rui Yang, Zunnan Xu, Qinmei Xu, Jingquan Liu, Jiasheng Lu, Xiu Li

With the development of AI-Generated Content (AIGC), text-to-audio models are gaining widespread attention.

CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion

1 code implementation • 25 Jan 2024 • Nisha Huang, WeiMing Dong, Yuxin Zhang, Fan Tang, Ronghui Li, Chongyang Ma, Xiu Li, Changsheng Xu

Large-scale text-to-image generative models have made impressive strides, showcasing their ability to synthesize a vast array of high-quality images.

Image Generation Style Transfer

Exploration and Anti-Exploration with Distributional Random Network Distillation

2 code implementations • 18 Jan 2024 • Kai Yang, Jian Tao, Jiafei Lyu, Xiu Li

To address this issue, we introduce the Distributional RND (DRND), a derivative of the RND.


Exploring Multi-Modal Control in Music-Driven Dance Generation

no code implementations • 1 Jan 2024 • Ronghui Li, Yuqin Dai, Yachao Zhang, Jun Li, Jian Yang, Jie Guo, Xiu Li

Existing music-driven 3D dance generation methods mainly concentrate on high-quality dance generation, but lack sufficient control during the generation process.

1st Place Solution for 5th LSVOS Challenge: Referring Video Object Segmentation

1 code implementation • 1 Jan 2024 • Zhuoyan Luo, Yicheng Xiao, Yong Liu, Yitong Wang, Yansong Tang, Xiu Li, Yujiu Yang

Recent transformer-based models have dominated the Referring Video Object Segmentation (RVOS) task due to their superior performance.

Object Referring Video Object Segmentation +3

Chain of Generation: Multi-Modal Gesture Synthesis via Cascaded Conditional Control

no code implementations • 26 Dec 2023 • Zunnan Xu, Yachao Zhang, Sicheng Yang, Ronghui Li, Xiu Li

We introduce a novel method that separates priors from speech and employs multimodal priors as constraints for generating gestures.

Gesture Generation

Realistic Human Motion Generation with Cross-Diffusion Models

no code implementations • 18 Dec 2023 • Zeping Ren, Shaoli Huang, Xiu Li

Our method integrates 3D and 2D information using a shared transformer network within the training of the diffusion model, unifying motion noise into a single feature space.

Semi-supervised Semantic Segmentation Meets Masked Modeling: Fine-grained Locality Learning Matters in Consistency Regularization

no code implementations • 14 Dec 2023 • Wentao Pan, Zhe Xu, Jiangpeng Yan, Zihan Wu, Raymond Kai-yu Tong, Xiu Li, Jianhua Yao

Semi-supervised semantic segmentation aims to utilize limited labeled images and abundant unlabeled images to achieve label-efficient learning, wherein the weak-to-strong consistency regularization framework, popularized by FixMatch, is widely used as a benchmark scheme.

Image Classification Pseudo Label +2

MagicStick: Controllable Video Editing via Control Handle Transformations

1 code implementation • 5 Dec 2023 • Yue Ma, Xiaodong Cun, Yingqing He, Chenyang Qi, Xintao Wang, Ying Shan, Xiu Li, Qifeng Chen

Though succinct, our method is the first to demonstrate video property editing using a pre-trained text-to-image model.

Video Editing Video Generation

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

1 code implementation • CVPR 2024 • Kai Yang, Jian Tao, Jiafei Lyu, Chunjiang Ge, Jiaxin Chen, Qimai Li, Weihan Shen, Xiaolong Zhu, Xiu Li

The direct preference optimization (DPO) method, effective in fine-tuning large language models, eliminates the necessity for a reward model.
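
For orientation, the standard DPO objective for a single preference pair can be written in a few lines. This sketch shows the usual language-model form, in which the loss is the negative log-sigmoid of the scaled difference between the policy's and the reference model's log-probability ratios; it is not the diffusion-model variant this paper develops, and the function name and default `beta` are assumptions made for illustration.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_w / logp_l: policy log-probabilities of the preferred / rejected sample.
    ref_logp_w / ref_logp_l: the same quantities under the frozen reference model.
    Returns -log(sigmoid(beta * margin)); no reward model is involved.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(m)) = log(1 + exp(-m)), evaluated in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference model the margin is zero and the loss is log 2; the loss falls as the policy raises the preferred sample's likelihood relative to the rejected one.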


Replay-enhanced Continual Reinforcement Learning

no code implementations • 20 Nov 2023 • Tiantian Zhang, Kevin Zehua Shen, Zichuan Lin, Bo Yuan, Xueqian Wang, Xiu Li, Deheng Ye

On the other hand, offline learning on replayed tasks while learning a new task may induce a distributional shift between the dataset and the learned policy on old tasks, resulting in forgetting.

Continual Learning reinforcement-learning

Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model

1 code implementation • 20 Nov 2023 • Chunming He, Chengyu Fang, Yulun Zhang, Tian Ye, Kai Li, Longxiang Tang, Zhenhua Guo, Xiu Li, Sina Farsiu

These priors are subsequently utilized by RGformer to guide the decomposition of image features into their respective reflectance and illumination components.

Image Restoration

The primacy bias in Model-based RL

no code implementations • 23 Oct 2023 • Zhongjian Qiao, Jiafei Lyu, Xiu Li

The primacy bias in deep reinforcement learning (DRL), which refers to the agent's tendency to overfit early data and lose the ability to learn from new data, can significantly decrease the performance of DRL algorithms.

Continuous Control Model-based Reinforcement Learning +1

Consistent123: One Image to Highly Consistent 3D Asset Using Case-Aware Diffusion Priors

no code implementations • 29 Sep 2023 • Yukang Lin, Haonan Han, Chaoqun Gong, Zunnan Xu, Yachao Zhang, Xiu Li

However, due to utilizing the case-agnostic rigid strategy, their generalization ability to arbitrary cases and the 3D consistency of reconstruction are still poor.

Image to 3D

UniHead: Unifying Multi-Perception for Detection Heads

1 code implementation • 23 Sep 2023 • Hantao Zhou, Rui Yang, Yachao Zhang, Haoran Duan, Yawen Huang, Runze Hu, Xiu Li, Yefeng Zheng

The detection head constitutes a pivotal component within object detectors, tasked with executing both classification and localization functions.

Time-aligned Exposure-enhanced Model for Click-Through Rate Prediction

no code implementations • 19 Aug 2023 • Hengyu Zhang, Chang Meng, Wei Guo, Huifeng Guo, Jieming Zhu, Guangpeng Zhao, Ruiming Tang, Xiu Li

Click-Through Rate (CTR) prediction, crucial in applications like recommender systems and online advertising, involves ranking items based on the likelihood of user clicks.

Click-Through Rate Prediction Recommendation Systems

Parallel Knowledge Enhancement based Framework for Multi-behavior Recommendation

1 code implementation • 9 Aug 2023 • Chang Meng, Chenhao Zhai, Yu Yang, Hengyu Zhang, Xiu Li

In the fusion step, advanced neural networks are used to model the hierarchical correlations between user behaviors.

Multi-Task Learning

Strategic Preys Make Acute Predators: Enhancing Camouflaged Object Detectors by Generating Camouflaged Objects

1 code implementation • 6 Aug 2023 • Chunming He, Kai Li, Yachao Zhang, Yulun Zhang, Zhenhua Guo, Xiu Li, Martin Danelljan, Fisher Yu

On the prey side, we propose an adversarial training framework, Camouflageator, which introduces an auxiliary generator to generate more camouflaged objects that are harder for a COD method to detect.

object-detection Object Detection

Consistency Regularization for Generalizable Source-free Domain Adaptation

no code implementations • 3 Aug 2023 • Longxiang Tang, Kai Li, Chunming He, Yulun Zhang, Xiu Li

In this paper, we propose a consistency regularization framework to develop a more generalizable SFDA method, which simultaneously boosts model performance on both target training and testing datasets.

Pseudo Label Source-Free Domain Adaptation

HQG-Net: Unpaired Medical Image Enhancement with High-Quality Guidance

no code implementations • 15 Jul 2023 • Chunming He, Kai Li, Guoxia Xu, Jiangpeng Yan, Longxiang Tang, Yulun Zhang, Xiu Li, YaoWei Wang

Specifically, we extract features from an HQ image and explicitly insert the features, which are expected to encode HQ cues, into the enhancement network to guide the LQ enhancement with the variational normalization module.

Image Enhancement Medical Image Enhancement

PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation

no code implementations • 6 Jun 2023 • Runze Liu, Yali Du, Fengshuo Bai, Jiafei Lyu, Xiu Li

In this paper, we propose Zero-shot Cross-task Preference Alignment and Robust Reward Learning (PEARL), which learns policies from cross-task preference transfer without any human labels of the target task.

Offline RL Reinforcement Learning (RL)

Normalization Enhances Generalization in Visual Reinforcement Learning

no code implementations • 1 Jun 2023 • Lu Li, Jiafei Lyu, Guozheng Ma, Zilin Wang, Zhenjie Yang, Xiu Li, Zhiheng Li

Though normalization techniques have demonstrated huge success in supervised and unsupervised learning, their applications in visual RL are still scarce.

reinforcement-learning Reinforcement Learning (RL)

Off-Policy RL Algorithms Can be Sample-Efficient for Continuous Control via Sample Multiple Reuse

no code implementations • 29 May 2023 • Jiafei Lyu, Le Wan, Zongqing Lu, Xiu Li

Empirical results show that SMR significantly boosts the sample efficiency of the base methods across most of the evaluated tasks without any hyperparameter tuning or additional tricks.
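
As the title suggests, the core idea of Sample Multiple Reuse is to apply more than one update to each sampled batch. A minimal sketch of such a loop is shown below; `sample_batch`, `update`, and the `reuse` count are hypothetical placeholders for whatever replay buffer and base algorithm are in use, and the sketch makes no claim about the paper's exact update rule.

```python
def train_with_smr(sample_batch, update, num_steps, reuse=4):
    """Training loop that reuses every sampled batch `reuse` times.

    sample_batch: callable returning a batch of transitions (hypothetical).
    update: callable performing one gradient step of the base method on a batch.
    Returns the total number of update calls performed.
    """
    total_updates = 0
    for _ in range(num_steps):
        batch = sample_batch()       # one draw from the replay buffer
        for _ in range(reuse):       # several updates on the same fixed batch
            update(batch)
            total_updates += 1
    return total_updates
```

With `reuse=1` the loop reduces to the ordinary one-update-per-batch schedule of the base method.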

Continuous Control Q-Learning +1

SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation

1 code implementation • NeurIPS 2023 • Zhuoyan Luo, Yicheng Xiao, Yong Liu, Shuyan Li, Yitong Wang, Yansong Tang, Xiu Li, Yujiu Yang

To address this issue, we propose Semantic-assisted Object Cluster (SOC), which aggregates video content and textual guidance for unified temporal modeling and cross-modal alignment.

Object Referring Expression Segmentation +4

Weakly-Supervised Concealed Object Segmentation with SAM-based Pseudo Labeling and Multi-scale Feature Grouping

no code implementations • NeurIPS 2023 • Chunming He, Kai Li, Yachao Zhang, Guoxia Xu, Longxiang Tang, Yulun Zhang, Zhenhua Guo, Xiu Li

It remains a challenging task since (1) it is hard to distinguish concealed objects from the background due to the intrinsic similarity and (2) the sparsely-annotated training data only provide weak supervision for model learning.

Segmentation Semantic Segmentation

Towards Realizing the Value of Labeled Target Samples: a Two-Stage Approach for Semi-Supervised Domain Adaptation

no code implementations • 21 Apr 2023 • Mengqun Jin, Kai Li, Shuyan Li, Chunming He, Xiu Li

We further propose a consistency learning based mean teacher model to effectively adapt the learned UDA model using labeled and unlabeled target samples.

Semi-supervised Domain Adaptation Unsupervised Domain Adaptation

Data-Efficient Image Quality Assessment with Attention-Panel Decoder

1 code implementation • 11 Apr 2023 • Guanyi Qin, Runze Hu, Yutao Liu, Xiawu Zheng, Haotian Liu, Xiu Li, Yan Zhang

Blind Image Quality Assessment (BIQA) is a fundamental task in computer vision, which however remains unresolved due to the complex distortion conditions and diversified image contents.

Blind Image Quality Assessment Decoder

Uncertainty-driven Trajectory Truncation for Data Augmentation in Offline Reinforcement Learning

1 code implementation • 10 Apr 2023 • Junjie Zhang, Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Jun Yang, Le Wan, Xiu Li

To empirically show the advantages of TATU, we first combine it with two classical model-based offline RL algorithms, MOPO and COMBO.

D4RL Data Augmentation +3

BoxSnake: Polygonal Instance Segmentation with Box Supervision

1 code implementation • ICCV 2023 • Rui Yang, Lin Song, Yixiao Ge, Xiu Li

Box-supervised instance segmentation has gained much attention as it requires only simple box annotations instead of costly mask or polygon annotations.

Box-supervised Instance Segmentation Segmentation +1

SSGD: A smartphone screen glass dataset for defect detection

1 code implementation • 12 Mar 2023 • Haonan Han, Rui Yang, Shuyan Li, Runze Hu, Xiu Li

Interactive devices with touch screens have become commonly used in various aspects of daily life, which raises the demand for high production quality of touch screen glass.

Defect Detection object-detection +1

Compressed Interaction Graph based Framework for Multi-behavior Recommendation

1 code implementation • 4 Mar 2023 • Wei Guo, Chang Meng, Enming Yuan, ZhiCheng He, Huifeng Guo, Yingxue Zhang, Bo Chen, Yaochen Hu, Ruiming Tang, Xiu Li, Rui Zhang

However, it is challenging to explore multi-behavior data due to the unbalanced data distribution and sparse target behavior, which lead to the inadequate modeling of high-order relations when treating multi-behavior data "as features" and gradient conflict in multitask learning when treating multi-behavior data "as labels".

Multi-Task Learning

SemanticAC: Semantics-Assisted Framework for Audio Classification

no code implementations • 12 Feb 2023 • Yicheng Xiao, Yue Ma, Shuyan Li, Hantao Zhou, Ran Liao, Xiu Li

In this paper, we propose SemanticAC, a semantics-assisted framework for Audio Classification to better leverage the semantic information.

Audio Classification Language Modelling

Model-based Transfer Learning for Automatic Optical Inspection based on domain discrepancy

1 code implementation • 14 Jan 2023 • Erik Isai Valle Salgado, Haoxin Yan, Yue Hong, Peiyuan Zhu, Shidong Zhu, Chengwei Liao, Yanxiang Wen, Xiu Li, Xiang Qian, Xiaohao Wang, Xinghui Li

However, related research enhanced the network models by applying TL without considering the domain similarity among datasets or the long-tailedness of the source dataset, and mainly used linear transformations to mitigate the lack of samples.

Data Augmentation Transfer Learning

Adversarial Alignment for Source Free Object Detection

no code implementations • 11 Jan 2023 • Qiaosong Chu, Shuyan Li, Guangyi Chen, Kai Li, Xiu Li

Source-free object detection (SFOD) aims to transfer a detector pre-trained on a label-rich source domain to an unlabeled target domain without seeing source data.

Object object-detection +1

Emergent collective intelligence from massive-agent cooperation and competition

1 code implementation • 4 Jan 2023 • HanMo Chen, Stone Tao, Jiaxin Chen, Weihan Shen, Xihui Li, Chenghui Yu, Sikai Cheng, Xiaolong Zhu, Xiu Li

Since these learned group strategies arise from individual decisions without an explicit coordination mechanism, we claim that artificial collective intelligence emerges from massive-agent cooperation and competition.

reinforcement-learning Reinforcement Learning (RL)

Degradation-Resistant Unfolding Network for Heterogeneous Image Fusion

no code implementations • ICCV 2023 • Chunming He, Kai Li, Guoxia Xu, Yulun Zhang, Runze Hu, Zhenhua Guo, Xiu Li

Heterogeneous image fusion (HIF) techniques aim to enhance image quality by merging complementary information from images captured by different sensors.

Camouflaged Object Detection With Feature Decomposition and Edge Reconstruction

no code implementations • CVPR 2023 • Chunming He, Kai Li, Yachao Zhang, Longxiang Tang, Yulun Zhang, Zhenhua Guo, Xiu Li

COD is a challenging task due to the intrinsic similarity of camouflaged objects with the background, as well as their ambiguous boundaries.

object-detection Object Detection

FLAG3D: A 3D Fitness Activity Dataset with Language Instruction

1 code implementation • CVPR 2023 • Yansong Tang, Jinpeng Liu, Aoyang Liu, Bin Yang, Wenxun Dai, Yongming Rao, Jiwen Lu, Jie Zhou, Xiu Li

With its continuously growing popularity around the world, fitness activity analytics has become an emerging research topic in computer vision.

Action Generation Action Recognition +2

FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation

1 code implementation • ICCV 2023 • Ronghui Li, Junfan Zhao, Yachao Zhang, Mingyang Su, Zeping Ren, Han Zhang, Yansong Tang, Xiu Li

To address these problems, we propose FineDance, which contains 14.6 hours of music-dance paired data, with fine-grained hand motions, fine-grained genres (22 dance genres), and accurate posture.

Motion Synthesis Retrieval

Human-machine Interactive Tissue Prototype Learning for Label-efficient Histopathology Image Segmentation

1 code implementation • 26 Nov 2022 • Wentao Pan, Jiangpeng Yan, Hanbo Chen, Jiawei Yang, Zhe Xu, Xiu Li, Jianhua Yao

Then, the encoder is used to map the images into the embedding space and generate pixel-level pseudo tissue masks by querying the tissue prototype dictionary.

Contrastive Learning Image Segmentation +5

Disentangling Past-Future Modeling in Sequential Recommendation via Dual Networks

1 code implementation • 26 Oct 2022 • Hengyu Zhang, Enming Yuan, Wei Guo, ZhiCheng He, Jiarui Qin, Huifeng Guo, Bo Chen, Xiu Li, Ruiming Tang

Sequential recommendation (SR) plays an important role in personalized recommender systems because it captures dynamic and diverse preferences from users' real-time increasing behaviors.

Disentanglement Information Retrieval +1

State Advantage Weighting for Offline RL

no code implementations • 9 Oct 2022 • Jiafei Lyu, Aicheng Gong, Le Wan, Zongqing Lu, Xiu Li

We present state advantage weighting for offline reinforcement learning (RL).

D4RL Offline RL +2

Estimating Neural Reflectance Field from Radiance Field using Tree Structures

no code implementations • 9 Oct 2022 • Xiu Li, Xiao Li, Yan Lu

A high-quality NeRF decomposition relies on good geometry information extraction as well as good prior terms to properly resolve ambiguities between different components.

Dynamics-Adaptive Continual Reinforcement Learning via Progressive Contextualization

no code implementations • 1 Sep 2022 • Tiantian Zhang, Zichuan Lin, Yuxing Wang, Deheng Ye, Qiang Fu, Wei Yang, Xueqian Wang, Bin Liang, Bo Yuan, Xiu Li

A key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the RL agent's behavior as the environment changes over its lifetime, while minimizing the catastrophic forgetting of the learned information.

Bayesian Inference Knowledge Distillation +3

A Medical Semantic-Assisted Transformer for Radiographic Report Generation

no code implementations • 22 Aug 2022 • Zhanyu Wang, Mingkang Tang, Lei Wang, Xiu Li, Luping Zhou

Automated radiographic report generation is a challenging cross-domain task that aims to automatically generate accurate and semantically coherent reports to describe medical images.

Image Captioning Medical Report Generation

Neural Capture of Animatable 3D Human from Monocular Video

no code implementations • 18 Aug 2022 • Gusi Te, Xiu Li, Xiao Li, Jinglu Wang, Wei Hu, Yan Lu

We present a novel paradigm of building an animatable 3D human representation from a monocular video input, such that it can be rendered in any unseen poses and views.

Coarse-to-Fine Knowledge-Enhanced Multi-Interest Learning Framework for Multi-Behavior Recommendation

no code implementations • 3 Aug 2022 • Chang Meng, Ziqi Zhao, Wei Guo, Yingxue Zhang, Haolun Wu, Chen Gao, Dong Li, Xiu Li, Ruiming Tang

More specifically, we propose a novel Coarse-to-fine Knowledge-enhanced Multi-interest Learning (CKML) framework to learn shared and behavior-specific interests for different behaviors.

Towards Better Dermoscopic Image Feature Representation Learning for Melanoma Classification

1 code implementation • 15 Jul 2022 • Chenghui Yu, Mingkang Tang, ShengGe Yang, Mingqing Wang, Zhe Xu, Jiangpeng Yan, HanMo Chen, Yu Yang, Xiao-jun Zeng, Xiu Li

Deep learning-based melanoma classification with dermoscopic images has recently shown great potential in automatic early-stage melanoma diagnosis.

Data Augmentation Denoising +2

Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination

1 code implementation • 16 Jun 2022 • Jiafei Lyu, Xiu Li, Zongqing Lu

Model-based RL methods offer a richer dataset and benefit generalization by generating imaginary trajectories with either a trained forward or reverse dynamics model.

D4RL Offline RL +1

Seeking Common Ground While Reserving Differences: Multiple Anatomy Collaborative Framework for Undersampled MRI Reconstruction

no code implementations • 15 Jun 2022 • Jiangpeng Yan, Chenghui Yu, Hanbo Chen, Zhe Xu, Junzhou Huang, Xiu Li, Jianhua Yao

Four different implementations of anatomy-specific learners are presented and explored on top of our framework in two MRI reconstruction networks.

Anatomy De-aliasing +1

Mildly Conservative Q-Learning for Offline Reinforcement Learning

3 code implementations • 9 Jun 2022 • Jiafei Lyu, Xiaoteng Ma, Xiu Li, Zongqing Lu

The distribution shift between the learned policy and the behavior policy makes it necessary for the value function to stay conservative such that out-of-distribution (OOD) actions will not be severely overestimated.

D4RL Q-Learning +2

UniInst: Unique Representation for End-to-End Instance Segmentation

1 code implementation • 25 May 2022 • Yimin Ou, Rui Yang, Lufan Ma, Yong Liu, Jiangpeng Yan, Shang Xu, Chengjie Wang, Xiu Li

Existing instance segmentation methods have achieved impressive performance but still suffer from a common dilemma: redundant representations (e.g., multiple boxes, grids, and anchor points) are inferred for one instance, which leads to multiple duplicated predictions.

Instance Segmentation Re-Ranking +2

ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer

2 code implementations • 21 Mar 2022 • Rui Yang, Hailong Ma, Jie Wu, Yansong Tang, Xuefeng Xiao, Min Zheng, Xiu Li

The vanilla self-attention mechanism inherently relies on pre-defined and steadfast computational dimensions.

Rethinking Goal-conditioned Supervised Learning and Its Connection to Offline RL

1 code implementation • ICLR 2022 • Rui Yang, Yiming Lu, Wenzhe Li, Hao Sun, Meng Fang, Yali Du, Xiu Li, Lei Han, Chongjie Zhang

In this paper, we revisit the theoretical property of GCSL -- optimizing a lower bound of the goal reaching objective, and extend GCSL as a novel offline goal-conditioned RL algorithm.

Offline RL Reinforcement Learning (RL) +1

Hybrid intelligence for dynamic job-shop scheduling with deep reinforcement learning and attention mechanism

1 code implementation • 3 Jan 2022 • Yunhui Zeng, Zijun Liao, Yuanzhi Dai, Rong Wang, Xiu Li, Bo Yuan

The dynamic job-shop scheduling problem (DJSP) is a class of scheduling tasks that specifically consider the inherent uncertainties such as changing order requirements and possible machine breakdown in realistic smart manufacturing settings.

Graph Representation Learning Job Shop Scheduling +2

Value Activation for Bias Alleviation: Generalized-activated Deep Double Deterministic Policy Gradients

1 code implementation • 21 Dec 2021 • Jiafei Lyu, Yu Yang, Jiangpeng Yan, Xiu Li

It is vital to accurately estimate the value function in Deep Reinforcement Learning (DRL) such that the agent could execute proper actions instead of suboptimal ones.

Continuous Control

Implicit Feature Refinement for Instance Segmentation

1 code implementation • 9 Dec 2021 • Lufan Ma, Tiancai Wang, Bin Dong, Jiangpeng Yan, Xiu Li, Xiangyu Zhang

Our IFR enjoys several advantages: 1) it simulates an infinite-depth refinement network while requiring only the parameters of a single residual block; 2) it produces high-level equilibrium instance features with a global receptive field; 3) it serves as a plug-and-play general module that is easily extended to most object recognition frameworks.

Instance Segmentation Object Recognition +3

Double-Uncertainty Guided Spatial and Temporal Consistency Regularization Weighting for Learning-based Abdominal Registration

no code implementations • 6 Jul 2021 • Zhe Xu, Jie Luo, Donghuan Lu, Jiangpeng Yan, Sarah Frisken, Jayender Jagadeesan, William Wells III, Xiu Li, Yefeng Zheng, Raymond Tong

Such convention has two limitations: (i) Besides the laborious grid search for the optimal fixed weight, the regularization strength of a specific image pair should be associated with the content of the images, thus the "one value fits all" training scheme is not ideal; (ii) Only spatially regularizing the transformation may neglect some informative clues related to the ill-posedness.

Image Registration

MHER: Model-based Hindsight Experience Replay

no code implementations • 1 Jul 2021 • Rui Yang, Meng Fang, Lei Han, Yali Du, Feng Luo, Xiu Li

Replacing original goals with virtual goals generated from interaction with a trained dynamics model leads to a novel relabeling method, model-based relabeling (MBR).
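
A rough sketch of this relabeling step, under stated assumptions: the learned dynamics model, the goal-conditioned policy, and the mapping from a state to its achieved goal are passed in as plain callables, and the virtual goal is taken to be the goal achieved after a short imagined rollout. The function name, the dictionary keys, and the `horizon` parameter are hypothetical and chosen only for illustration.

```python
def model_based_relabel(transition, dynamics, policy, achieved_goal, horizon=3):
    """Replace a transition's goal with a virtual goal from a model rollout.

    transition: dict with keys 'state', 'action', 'next_state', 'goal'.
    dynamics(state, action) -> next state predicted by the trained model.
    policy(state, goal) -> action of the goal-conditioned policy.
    achieved_goal(state) -> goal representation reached in that state.
    Returns a relabeled copy; the input transition is left unchanged.
    """
    state = transition["next_state"]
    goal = transition["goal"]
    for _ in range(horizon):
        # Imagined interaction: no real environment steps are taken.
        action = policy(state, goal)
        state = dynamics(state, action)
    relabeled = dict(transition)
    relabeled["goal"] = achieved_goal(state)
    return relabeled
```

Because the virtual goal comes from imagined states rather than from the recorded trajectory, the relabeled goals are not limited to those actually visited in the replay buffer.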

Multi-Goal Reinforcement Learning reinforcement-learning +1

Self-Supervised Video Hashing via Bidirectional Transformers

1 code implementation • CVPR 2021 • Shuyan Li, Xiu Li, Jiwen Lu, Jie Zhou

Most existing unsupervised video hashing methods are built on unidirectional models with less reliable training objectives, which underuse the correlations among frames and the similarity structure between videos.

Decoder Retrieval +1

A Self-Boosting Framework for Automated Radiographic Report Generation

no code implementations CVPR 2021 Zhanyu Wang, Luping Zhou, Lei Wang, Xiu Li

On one hand, the image-text matching branch helps to learn highly text-correlated visual features for the report generation branch to output high quality reports.

Image Captioning Image-text matching +3

A Coarse-to-Fine Instance Segmentation Network with Learning Boundary Representation

no code implementations18 Jun 2021 Feng Luo, Bin-Bin Gao, Jiangpeng Yan, Xiu Li

Experiments also show that our proposed method achieves competitive performance compared to existing boundary-based methods with a lightweight design and a simple pipeline.

Distance regression Instance Segmentation +2

Efficient Continuous Control with Double Actors and Regularized Critics

1 code implementation6 Jun 2021 Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Xiu Li

First, we uncover and demonstrate the bias alleviation property of double actors by building double actors upon single critic and double critics to handle overestimation bias in DDPG and underestimation bias in TD3 respectively.

Continuous Control Reinforcement Learning (RL)

Noisy Labels are Treasure: Mean-Teacher-Assisted Confident Learning for Hepatic Vessel Segmentation

1 code implementation3 Jun 2021 Zhe Xu, Donghuan Lu, Yixin Wang, Jie Luo, Jayender Jagadeesan, Kai Ma, Yefeng Zheng, Xiu Li

Manually segmenting the hepatic vessels from Computed Tomography (CT) is far more expertise-demanding and laborious than other structures due to the low-contrast and complex morphology of vessels, resulting in the extreme lack of high-quality labeled data.

Reward function shape exploration in adversarial imitation learning: an empirical study

no code implementations14 Apr 2021 Yawei Wang, Xiu Li

To ensure our results' reliability, we conduct the experiments on a series of Mujoco and Box2D continuous control tasks based on four different AILs.

Continuous Control Imitation Learning

Universal and Flexible Optical Aberration Correction Using Deep-Prior Based Deconvolution

1 code implementation ICCV 2021 Xiu Li, Jinli Suo, Weihang Zhang, Xin Yuan, Qionghai Dai

High quality imaging usually requires bulky and expensive lenses to compensate geometric and chromatic aberrations.

Frequency-Aware Spatiotemporal Transformers for Video Inpainting Detection

no code implementations ICCV 2021 Bingyao Yu, Wanhua Li, Xiu Li, Jiwen Lu, Jie Zhou

In this paper, we propose a Frequency-Aware Spatiotemporal Transformer (FAST) for video inpainting detection, which aims to simultaneously mine the traces of video inpainting from spatial, temporal, and frequency domains.

Decoder Video Inpainting

Unimodal Cyclic Regularization for Training Multimodal Image Registration Networks

no code implementations12 Nov 2020 Zhe Xu, Jiangpeng Yan, Jie Luo, William Wells, Xiu Li, Jayender Jagadeesan

The loss function of an unsupervised multimodal image registration framework has two terms, i.e., a metric for similarity measure and regularization.

Image Registration

Unsupervised Multimodal Image Registration with Adaptative Gradient Guidance

no code implementations12 Nov 2020 Zhe Xu, Jiangpeng Yan, Jie Luo, Xiu Li, Jayender Jagadeesan

Multimodal image registration (MIR) is a fundamental procedure in many image-guided therapies.

Image Registration

F3RNet: Full-Resolution Residual Registration Network for Deformable Image Registration

no code implementations15 Sep 2020 Zhe Xu, Jie Luo, Jiangpeng Yan, Xiu Li, Jagadeesan Jayender

In this paper, we propose a novel unsupervised registration network, namely the Full-Resolution Residual Registration Network (F3RNet), for deformable registration of severely deformed organs.

Image Registration

Adversarial Uni- and Multi-modal Stream Networks for Multimodal Image Registration

no code implementations6 Jul 2020 Zhe Xu, Jie Luo, Jiangpeng Yan, Ritvik Pulya, Xiu Li, William Wells III, Jayender Jagadeesan

Deformable image registration between Computed Tomography (CT) images and Magnetic Resonance (MR) imaging is essential for many image-guided therapies.

Computed Tomography (CT) Image Registration +2

Disentangled Non-Local Neural Networks

5 code implementations ECCV 2020 Minghao Yin, Zhuliang Yao, Yue Cao, Xiu Li, Zheng Zhang, Stephen Lin, Han Hu

This paper first studies the non-local block in depth, where we find that its attention computation can be split into two terms, a whitened pairwise term accounting for the relationship between two pixels and a unary term representing the saliency of every pixel.

Action Recognition object-detection +2

Wasserstein Distance guided Adversarial Imitation Learning with Reward Shape Exploration

1 code implementation5 Jun 2020 Ming Zhang, Yawei Wang, Xiaoteng Ma, Li Xia, Jun Yang, Zhiheng Li, Xiu Li

The generative adversarial imitation learning (GAIL) has provided an adversarial learning framework for imitating expert policy from demonstrations in high-dimensional continuous tasks.

Continuous Control Imitation Learning

Neural Architecture Search for Compressed Sensing Magnetic Resonance Image Reconstruction

1 code implementation22 Feb 2020 Jiangpeng Yan, Shuo Chen, Yongbing Zhang, Xiu Li

Our proposed method can reach a better trade-off between computation cost and reconstruction performance for MR reconstruction problem with good generalizability and offer insights to design neural networks for other medical image applications.

Image Reconstruction Neural Architecture Search +1

PgNN: Physics-guided Neural Network for Fourier Ptychographic Microscopy

no code implementations19 Sep 2019 Yongbing Zhang, Yangzhe Liu, Xiu Li, Shaowei Jiang, Krishna Dixit, Xinfeng Zhang, Xiangyang Ji

Since the optimal parameters of the PgNN can be derived by minimizing the difference between the model-generated images and real captured angle-varied images corresponding to the same scene, the proposed PgNN can get rid of the problem of massive training data as in traditional supervised methods.

On the Mathematical Understanding of ResNet with Feynman Path Integral

no code implementations16 Apr 2019 Minghao Yin, Xiu Li, Yongbing Zhang, Shiqi Wang

In this paper, we aim to understand Residual Network (ResNet) in a scientifically sound way by providing a bridge between ResNet and Feynman path integral.

Capture Dense: Markerless Motion Capture Meets Dense Pose Estimation

no code implementations5 Dec 2018 Xiu Li, Yebin Liu, Hanbyul Joo, Qionghai Dai, Yaser Sheikh

Specifically, we first introduce a novel markerless motion capture method that can take advantage of dense parsing capability provided by the dense pose detector.

Human Parsing Markerless Motion Capture +1

Structure from Recurrent Motion: From Rigidity to Recurrency

no code implementations CVPR 2018 Xiu Li, Hongdong Li, Hanbyul Joo, Yebin Liu, Yaser Sheikh

This paper proposes a new method for Non-Rigid Structure-from-Motion (NRSfM) from a long monocular video sequence observing a non-rigid object performing recurrent and possibly repetitive dynamic action.


Joint Training of Cascaded CNN for Face Detection

no code implementations CVPR 2016 Hongwei Qin, Junjie Yan, Xiu Li, Xiaolin Hu

Cascade has been widely used in face detection, where classifier with low computation cost can be firstly used to shrink most of the background while keeping the recall.

Face Detection Region Proposal

