Search Results for author: Xiaoqin Zhang

Found 34 papers, 8 papers with code

CorrMAE: Pre-training Correspondence Transformers with Masked Autoencoder

no code implementations 9 Jun 2024 Tangfei Liao, Xiaoqin Zhang, Guobao Xiao, Min Li, Tao Wang, Mang Ye

To tackle these challenges, we propose a pre-training method to acquire a generic inliers-consistent representation by reconstructing masked correspondences, providing a strong initial representation for downstream tasks.

Representation Learning
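The masked-reconstruction pre-training that CorrMAE builds on can be illustrated with a minimal generic sketch (not the authors' implementation): randomly mask a fraction of input tokens and score a reconstruction only on the masked positions, as in MAE-style training. The zero prediction standing in for a decoder is purely illustrative.

```python
import numpy as np

def mask_tokens(tokens, mask_ratio, rng):
    """Randomly mask a fraction of token rows; return the visible
    tokens, the masked indices, and a boolean mask."""
    n = tokens.shape[0]
    n_masked = int(n * mask_ratio)
    perm = rng.permutation(n)
    mask = np.zeros(n, dtype=bool)
    mask[perm[:n_masked]] = True
    return tokens[~mask], perm[:n_masked], mask

def reconstruction_loss(pred, target, mask):
    """MSE computed only on the masked positions."""
    diff = (pred - target)[mask]
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))  # 16 correspondence tokens, dim 8
visible, masked_idx, mask = mask_tokens(tokens, 0.75, rng)
# A real model would encode `visible` and decode predictions for all
# tokens; a zero prediction stands in for the untrained decoder here.
pred = np.zeros_like(tokens)
loss = reconstruction_loss(pred, tokens, mask)
```

With a 0.75 mask ratio only 4 of the 16 tokens stay visible, so the encoder must learn a representation that supports reconstructing the other 12.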

One-shot Training for Video Object Segmentation

no code implementations 22 May 2024 Baiyu Chen, Sixian Chan, Xiaoqin Zhang

To address these issues, we propose, for the first time, a general one-shot training framework for VOS, requiring only a single labeled frame per training video and applicable to a majority of state-of-the-art VOS networks.

Object Semantic Segmentation +2

MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders

no code implementations 13 May 2024 Xueying Jiang, Sheng Jin, Xiaoqin Zhang, Ling Shao, Shijian Lu

With the proposed object occlusion and completion, MonoMAE learns enriched 3D representations that achieve superior monocular 3D detection performance qualitatively and quantitatively for both occluded and non-occluded objects.

Monocular 3D Object Detection Object +1

An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training

1 code implementation 18 Apr 2024 Jin Gao, Shubo Lin, Shaoru Wang, Yutong Kou, Zeming Li, Liang Li, Congxuan Zhang, Xiaoqin Zhang, Yizheng Wang, Weiming Hu

In this paper, we question if the \textit{extremely simple} lightweight ViTs' fine-tuning performance can also benefit from this pre-training paradigm, which is considerably less studied yet in contrast to the well-established lightweight architecture design methodology.

Contrastive Learning Image Classification +2

Masked AutoDecoder is Effective Multi-Task Vision Generalist

1 code implementation CVPR 2024 Han Qiu, Jiaxing Huang, Peng Gao, Lewei Lu, Xiaoqin Zhang, Shijian Lu

Inspired by the success of general-purpose models in NLP, recent studies attempt to unify different vision tasks in the same sequence format and employ autoregressive Transformers for sequence prediction.

Weakly Supervised Monocular 3D Detection with a Single-View Image

no code implementations CVPR 2024 Xueying Jiang, Sheng Jin, Lewei Lu, Xiaoqin Zhang, Shijian Lu

We propose SKD-WM3D, a weakly supervised monocular 3D detection framework that exploits depth information to achieve M3D with a single-view image exclusively without any 3D annotations or other training data.

Object Localization Self-Knowledge Distillation +1

Conditional Tuning Network for Few-Shot Adaptation of Segment Anything Model

no code implementations 6 Feb 2024 Aoran Xiao, Weihao Xuan, Heli Qi, Yun Xing, Ruijie Ren, Xiaoqin Zhang, Ling Shao, Shijian Lu

CAT-SAM freezes the entire SAM and adapts its mask decoder and image encoder simultaneously with a small number of learnable parameters.

Decoder Image Segmentation +1

VSFormer: Visual-Spatial Fusion Transformer for Correspondence Pruning

1 code implementation 14 Dec 2023 Tangfei Liao, Xiaoqin Zhang, Li Zhao, Tao Wang, Guobao Xiao

Then, we model these visual cues and correspondences by a joint visual-spatial fusion module, simultaneously embedding visual cues into correspondences for pruning.

Handling The Non-Smooth Challenge in Tensor SVD: A Multi-Objective Tensor Recovery Framework

1 code implementation 23 Nov 2023 Jingjing Zheng, Wanglong Lu, Wenzhe Wang, Yankai Cao, Xiaoqin Zhang, Xianta Jiang

We develop a new optimization algorithm named the Alternating Proximal Multiplier Method (APMM) to iteratively solve the proposed tensor completion model.

Tensor Decomposition
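The t-SVD machinery underlying this line of tensor-recovery work can be sketched generically (this is not the APMM algorithm itself): FFT a 3-way tensor along its third mode, take a matrix SVD of each frontal slice in the Fourier domain, and transform back. The mean of the slice-wise nuclear norms is the tensor nuclear norm commonly used as a convex surrogate for tubal rank.

```python
import numpy as np

def t_svd_reconstruct(T):
    """Reconstruct a 3-way tensor from its t-SVD and return the
    tensor nuclear norm (mean of slice-wise nuclear norms in the
    Fourier domain)."""
    Tf = np.fft.fft(T, axis=2)
    n3 = T.shape[2]
    Rf = np.zeros_like(Tf)
    tnn = 0.0
    for k in range(n3):
        U, s, Vh = np.linalg.svd(Tf[:, :, k], full_matrices=False)
        Rf[:, :, k] = (U * s) @ Vh  # exact per-slice reconstruction
        tnn += s.sum()
    return np.real(np.fft.ifft(Rf, axis=2)), tnn / n3

rng = np.random.default_rng(0)
T = rng.normal(size=(4, 5, 3))
recon, tnn = t_svd_reconstruct(T)
```

Recovery methods truncate or shrink the singular values `s` per slice rather than keeping them exactly, trading reconstruction fidelity for low tubal rank.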

Adversarial Attacks on Video Object Segmentation with Hard Region Discovery

no code implementations 25 Sep 2023 Ping Li, Yu Zhang, Li Yuan, Jian Zhao, Xianghua Xu, Xiaoqin Zhang

Particularly, the gradients from the segmentation model are exploited to discover the easily confused region, in which it is difficult to identify the pixel-wise objects from the background in a frame.

Autonomous Driving Object +5

Pose-Free Neural Radiance Fields via Implicit Pose Regularization

no code implementations ICCV 2023 Jiahui Zhang, Fangneng Zhan, Yingchen Yu, Kunhao Liu, Rongliang Wu, Xiaoqin Zhang, Ling Shao, Shijian Lu

However, as the pose estimator is trained with only rendered images, the pose estimation is usually biased or inaccurate for real images due to the domain gap between real images and rendered images, leading to poor robustness for the pose estimation of real images and further local minima in joint optimization.

Novel View Synthesis Pose Estimation

A Survey of Label-Efficient Deep Learning for 3D Point Clouds

1 code implementation 31 May 2023 Aoran Xiao, Xiaoqin Zhang, Ling Shao, Shijian Lu

We address three critical questions in this emerging research field: i) the importance and urgency of label-efficient learning in point cloud processing, ii) the subfields it encompasses, and iii) the progress achieved in this area.

Data Augmentation Efficient Exploration +2

A Novel Tensor Factorization-Based Method with Robustness to Inaccurate Rank Estimation

no code implementations 19 May 2023 Jingjing Zheng, Wenzhe Wang, Xiaoqin Zhang, Xianta Jiang

This study aims to solve the over-reliance on the rank estimation strategy in the standard tensor factorization-based tensor recovery and the problem of a large computational cost in the standard t-SVD-based tensor recovery.

Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations

no code implementations 18 Apr 2023 Rongliang Wu, Yingchen Yu, Fangneng Zhan, Jiahui Zhang, Xiaoqin Zhang, Shijian Lu

To accommodate fair variation of plausible facial animations for the same audio, we design a transformer-based probabilistic mapping network that can model the variational facial animation distribution conditioned upon the input audio and autoregressively convert the audio signals into a facial animation sequence.

Talking Face Generation

Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors

no code implementations CVPR 2023 Gongjie Zhang, Zhipeng Luo, Zichen Tian, Jingyi Zhang, Xiaoqin Zhang, Shijian Lu

Multi-scale features have been proven highly effective for object detection but often come with huge and even prohibitive extra computation costs, especially for the recent Transformer-based detectors.

Decoder Object +2

Latent Multi-Relation Reasoning for GAN-Prior based Image Super-Resolution

no code implementations 4 Aug 2022 Jiahui Zhang, Fangneng Zhan, Yingchen Yu, Rongliang Wu, Xiaoqin Zhang, Shijian Lu

In addition, stochastic noises fed to the generator are employed for unconditional detail generation, which tends to produce unfaithful details that compromise the fidelity of the generated SR image.

Attribute Code Generation +3

VMRF: View Matching Neural Radiance Fields

no code implementations 6 Jul 2022 Jiahui Zhang, Fangneng Zhan, Rongliang Wu, Yingchen Yu, Wenqing Zhang, Bai Song, Xiaoqin Zhang, Shijian Lu

With the feature transport plan as the guidance, a novel pose calibration technique is designed which rectifies the initially randomized camera poses by predicting relative pose transformations between the pair of rendered and real images.

Novel View Synthesis

A Closer Look at Self-Supervised Lightweight Vision Transformers

2 code implementations 28 May 2022 Shaoru Wang, Jin Gao, Zeming Li, Xiaoqin Zhang, Weiming Hu

We also point out some defects of such pre-training, e. g., failing to benefit from large-scale pre-training data and showing inferior performance on data-insufficient downstream tasks.

Contrastive Learning Image Classification +1

DcnnGrasp: Towards Accurate Grasp Pattern Recognition with Adaptive Regularizer Learning

no code implementations 11 May 2022 Xiaoqin Zhang, Ziwei Huang, Jingjing Zheng, Shuo Wang, Xianta Jiang

The task of grasp pattern recognition aims to derive the applicable grasp types of an object according to the visual information.

Object

Infrared and Visible Image Fusion via Interactive Compensatory Attention Adversarial Learning

1 code implementation 29 Mar 2022 Zhishe Wang, Wenyu Shao, Yanlin Chen, Jiawei Xu, Xiaoqin Zhang

The existing generative adversarial fusion methods generally concatenate source images and extract local features through convolution operation, without considering their global characteristics, which tends to produce an unbalanced result and is biased towards the infrared image or visible image.

Decoder Infrared And Visible Image Fusion

Unsupervised Point Cloud Representation Learning with Deep Neural Networks: A Survey

1 code implementation 28 Feb 2022 Aoran Xiao, Jiaxing Huang, Dayan Guan, Xiaoqin Zhang, Shijian Lu, Ling Shao

The convergence of point cloud and DNNs has led to many deep point cloud models, largely trained under the supervision of large-scale and densely-labelled point cloud data.

Autonomous Driving Representation Learning

Semantics-Guided Contrastive Network for Zero-Shot Object Detection

no code implementations 4 Sep 2021 Caixia Yan, Xiaojun Chang, Minnan Luo, Huan Liu, Xiaoqin Zhang, Qinghua Zheng

To address these issues, we develop a novel Semantics-Guided Contrastive Network for ZSD, named ContrastZSD, a detection framework that first brings contrastive learning mechanism into the realm of zero-shot detection.

Contrastive Learning Generalized Zero-Shot Object Detection +3

DA-DETR: Domain Adaptive Detection Transformer with Information Fusion

no code implementations CVPR 2023 Jingyi Zhang, Jiaxing Huang, Zhipeng Luo, Gongjie Zhang, Xiaoqin Zhang, Shijian Lu

DA-DETR introduces a novel CNN-Transformer Blender (CTBlender) that fuses the CNN features and Transformer features ingeniously for effective feature alignment and knowledge transfer across domains.

Domain Adaptation Object +3

Referring Segmentation in Images and Videos with Cross-Modal Self-Attention Network

no code implementations 9 Feb 2021 Linwei Ye, Mrigank Rochan, Zhi Liu, Xiaoqin Zhang, Yang Wang

In this paper, we propose a cross-modal self-attention (CMSA) module to utilize fine details of individual words and the input image or video, which effectively captures the long-range dependencies between linguistic and visual features.

Ranked #5 on Referring Expression Segmentation on J-HMDB (Precision@0.9 metric)

Referring Expression Referring Expression Segmentation +3
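The cross-modal attention idea behind CMSA can be shown with a minimal scaled dot-product sketch (a generic illustration, not the paper's module): visual features attend over word features, so every spatial location aggregates linguistic context.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(visual, linguistic):
    """Visual features (queries) attend over word features
    (keys/values), capturing long-range vision-language dependencies."""
    d = visual.shape[-1]
    scores = visual @ linguistic.T / np.sqrt(d)  # (n_pixels, n_words)
    weights = softmax(scores, axis=-1)
    return weights @ linguistic, weights          # attended: (n_pixels, d)

rng = np.random.default_rng(0)
visual = rng.normal(size=(6, 8))      # 6 spatial locations, dim 8
linguistic = rng.normal(size=(4, 8))  # 4 word embeddings, dim 8
attended, weights = cross_modal_attention(visual, linguistic)
```

Each row of `weights` is a distribution over words, which is what lets the attended feature at a pixel reflect the fine details of individual words in the referring expression.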

Self-Weighted Robust LDA for Multiclass Classification with Edge Classes

no code implementations 24 Sep 2020 Caixia Yan, Xiaojun Chang, Minnan Luo, Qinghua Zheng, Xiaoqin Zhang, Zhihui Li, Feiping Nie

In this regard, a novel self-weighted robust LDA with l21-norm based pairwise between-class distance criterion, called SWRLDA, is proposed for multi-class classification especially with edge classes.

Classification Computational Efficiency +2

Pretrain Soft Q-Learning with Imperfect Demonstrations

no code implementations 9 May 2019 Xiaoqin Zhang, Yunfei Li, Huimin Ma, Xiong Luo

Pretraining reinforcement learning methods with demonstrations has been an important concept in the study of reinforcement learning since a large amount of computing power is spent on online simulations with existing reinforcement learning algorithms.

Q-Learning reinforcement-learning +1
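The general pretraining-from-demonstrations idea can be sketched with plain tabular Q-learning (the paper uses soft Q-learning; this simplified tabular variant is only meant to show the warm-start principle): replay demonstration transitions with standard Q-updates before any online interaction, so the agent starts with an informed value estimate.

```python
from collections import defaultdict

def pretrain_q_from_demos(demos, alpha=0.5, gamma=0.9, epochs=20):
    """Warm-start a tabular Q-function by replaying demonstration
    transitions (s, a, r, s') with Q-learning updates offline."""
    Q = defaultdict(float)
    actions = {a for _, a, _, _ in demos}
    for _ in range(epochs):
        for s, a, r, s_next in demos:
            target = r + gamma * max(
                (Q[(s_next, b)] for b in actions), default=0.0)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

# Toy two-state chain: action 1 in state 0 reaches the rewarding state 1.
demos = [(0, 1, 0.0, 1), (1, 1, 1.0, 1)]
Q = pretrain_q_from_demos(demos)
```

After pretraining, value propagates backward through the demonstrated trajectory, giving online learning a head start without spending simulation steps on random exploration.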

Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations

no code implementations 31 Jan 2018 Xiaoqin Zhang, Huimin Ma

We apply our method to two of the typical actor-critic reinforcement learning algorithms, DDPG and ACER, and demonstrate with experiments that our method not only outperforms the RL algorithms without pretraining process, but also is more simulation efficient.

reinforcement-learning Reinforcement Learning (RL)

Constructive neural network learning

no code implementations 30 Apr 2016 Shaobo Lin, Jinshan Zeng, Xiaoqin Zhang

In this paper, we aim at developing scalable neural network-type learning systems.

Local Subspace Collaborative Tracking

no code implementations ICCV 2015 Lin Ma, Xiaoqin Zhang, Weiming Hu, Junliang Xing, Jiwen Lu, Jie zhou

To address this, this paper presents a local subspace collaborative tracking method for robust visual tracking, where multiple linear and nonlinear subspaces are learned to better model the nonlinear relationship of object appearances.

Object Object Tracking +1

Multiple Object Tracking: A Literature Review

no code implementations 26 Sep 2014 Wenhan Luo, Junliang Xing, Anton Milan, Xiaoqin Zhang, Wei Liu, Tae-Kyun Kim

We inspect the recent advances in various aspects and propose some interesting directions for future research.

Multiple Object Tracking Object

Simultaneous Rectification and Alignment via Robust Recovery of Low-rank Tensors

no code implementations NeurIPS 2013 Xiaoqin Zhang, Di Wang, Zhengyuan Zhou, Yi Ma

In this context, the state-of-the-art algorithms "RASL" and "TILT" can be viewed as two special cases of our work, and yet each only performs part of the function of our method.

Computational Efficiency
