Search Results for author: Changxin Gao

Found 70 papers, 46 papers with code

DFIMat: Decoupled Flexible Interactive Matting in Multi-Person Scenarios

1 code implementation 13 Oct 2024 Siyi Jiao, Wenzheng Zeng, Changxin Gao, Nong Sang

(2) Existing works are limited to a single type of user input, which is ineffective for intention understanding and also inefficient for user operation.

Image Matting Synthetic Data Generation

Replace Anyone in Videos

no code implementations 30 Sep 2024 Xiang Wang, Changxin Gao, Yuehuan Wang, Nong Sang

Recent advancements in controllable human-centric video generation, particularly with the rise of diffusion models, have demonstrated considerable progress.

Video Generation Video Inpainting

Cross-video Identity Correlating for Person Re-identification Pre-training

1 code implementation 27 Sep 2024 Jialong Zuo, Ying Nie, Hanyu Zhou, Huaxin Zhang, Haoyu Wang, Tianyu Guo, Nong Sang, Changxin Gao

For example, compared with the previous state of the art (ISR), CION with the same ResNet50-IBN achieves higher mAP of 93.3% and 74.3% on Market1501 and MSMT17, while only utilizing 8% of the training samples.

Denoising Person Re-Identification

Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM

1 code implementation 18 Jun 2024 Huaxin Zhang, Xiaohao Xu, Xiang Wang, Jialong Zuo, Chuchu Han, Xiaonan Huang, Changxin Gao, Yuehuan Wang, Nong Sang

We train a lightweight temporal sampler to select frames with high anomaly response and fine-tune a multimodal large language model (LLM) to generate explanatory content.

Anomaly Detection Anomaly Localization +4

Open-Vocabulary Semantic Segmentation with Image Embedding Balancing

1 code implementation CVPR 2024 Xiangheng Shan, Dongyue Wu, Guilin Zhu, Yuanjie Shao, Nong Sang, Changxin Gao

To learn a consistent semantic structure from CLIP, the SSC Loss aligns the inter-class affinity in the image feature space with that in the text feature space of CLIP, thereby improving the generalization ability of our model.

Decoder Open Vocabulary Semantic Segmentation +2

UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation

no code implementations 3 Jun 2024 Xiang Wang, Shiwei Zhang, Changxin Gao, Jiayu Wang, Xiaoqiang Zhou, Yingya Zhang, Luxin Yan, Nong Sang

First, to reduce the optimization difficulty and ensure temporal coherence, we map the reference image along with the posture guidance and noise video into a common feature space by incorporating a unified video diffusion model.

Image Animation Video Generation

Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos

no code implementations 26 Apr 2024 Zhengze Xu, Mengting Chen, Zhao Wang, Linyu Xing, Zhonghua Zhai, Nong Sang, Jinsong Lan, Shuai Xiao, Changxin Gao

To generate coherent motions, we first leverage the Kalman filter to construct smooth crops in the focus tunnel and inject the position embedding of the tunnel into attention layers to improve the continuity of the generated videos.

Virtual Try-on
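
As a rough illustration of the Kalman-filter smoothing mentioned in the Tunnel Try-on summary above, the sketch below filters a noisy 1-D track of crop-center coordinates. It is a minimal scalar filter under an assumed constant-position model. The function name `kalman_smooth`, the noise variances, and the sample track are hypothetical and are not taken from the paper, whose actual state model may differ.

```python
def kalman_smooth(measurements, process_var=1e-2, meas_var=1.0):
    """Scalar Kalman filter with a constant-position model (illustrative assumption).

    measurements: noisy 1-D coordinates, e.g. per-frame crop centers.
    process_var:  assumed variance of the true position drift per frame.
    meas_var:     assumed variance of the measurement noise.
    """
    x = float(measurements[0])  # state estimate, initialised from the first frame
    p = 1.0                     # variance of the state estimate
    smoothed = [x]
    for z in measurements[1:]:
        p = p + process_var     # predict: position unchanged, uncertainty grows
        k = p / (p + meas_var)  # Kalman gain
        x = x + k * (z - x)     # update: move toward the new measurement
        p = (1.0 - k) * p       # uncertainty shrinks after the update
        smoothed.append(x)
    return smoothed


# Hypothetical jittery crop-center x-coordinates over six frames.
raw_centers = [100.0, 104.0, 98.0, 103.0, 97.0, 102.0]
smooth_centers = kalman_smooth(raw_centers)
```

A larger `meas_var` relative to `process_var` trusts the measurements less and yields a smoother track, at the cost of reacting more slowly to real motion.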

REPAIR: Rank Correlation and Noisy Pair Half-replacing with Memory for Noisy Correspondence

no code implementations 13 Mar 2024 Ruochen Zheng, Jiahao Hong, Changxin Gao, Nong Sang

Unfortunately, obtaining precise annotations in the multimodal field is expensive, which has prompted some methods to tackle the mismatched data pair issue in cross-modal matching contexts, termed noisy correspondence.

Cross-modal retrieval with noisy correspondence

GlanceVAD: Exploring Glance Supervision for Label-efficient Video Anomaly Detection

1 code implementation 10 Mar 2024 Huaxin Zhang, Xiang Wang, Xiaohao Xu, Xiaonan Huang, Chuchu Han, Yuehuan Wang, Changxin Gao, Shanjun Zhang, Nong Sang

In recent years, video anomaly detection has been extensively investigated in both unsupervised and weakly supervised settings to alleviate costly temporal labeling.

Anomaly Detection Video Anomaly Detection

Few-Shot Learning for Annotation-Efficient Nucleus Instance Segmentation

no code implementations 26 Feb 2024 Yu Ming, Zihao Wu, Jie Yang, Danyi Li, Yuan Gao, Changxin Gao, Gui-Song Xia, Yuanqing Li, Li Liang, Jin-Gang Yu

In this paper, we propose to formulate annotation-efficient nucleus instance segmentation from the perspective of few-shot learning (FSL).

Few-shot Instance Segmentation Few-Shot Learning +4

VS: Reconstructing Clothed 3D Human from Single Image via Vertex Shift

1 code implementation CVPR 2024 Leyuan Liu, Yuhan Li, Yunqi Gao, Changxin Gao, Yuanyuan Liu, Jingying Chen

However, current implicit function-based methods inevitably produce artifacts, while existing deformation methods struggle to reconstruct high-fidelity humans wearing loose clothing.

SCTNet: Single-Branch CNN with Transformer Semantic Information for Real-Time Segmentation

1 code implementation 28 Dec 2023 Zhengze Xu, Dongyue Wu, Changqian Yu, Xiangxiang Chu, Nong Sang, Changxin Gao

Recent real-time semantic segmentation methods usually adopt an additional semantic branch to pursue rich long-range context.

Real-Time Semantic Segmentation

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

1 code implementation CVPR 2024 Xiang Wang, Shiwei Zhang, Hangjie Yuan, Zhiwu Qing, Biao Gong, Yingya Zhang, Yujun Shen, Changxin Gao, Nong Sang

Following such a pipeline, we study the effect of doubling the scale of the training set (i.e., video-only WebVid10M) with some randomly collected text-free videos and are encouraged to observe the performance improvement (FID from 9.67 to 8.19 and FVD from 484 to 441), demonstrating the scalability of our approach.

Text-to-Image Generation Text-to-Video Generation +2

VideoLCM: Video Latent Consistency Model

2 code implementations 14 Dec 2023 Xiang Wang, Shiwei Zhang, Han Zhang, Yu Liu, Yingya Zhang, Changxin Gao, Nong Sang

Consistency models have demonstrated powerful capability in efficient image generation and allowed synthesis within a few sampling steps, alleviating the high computational cost in diffusion models.

Computational Efficiency Image Generation +1

Few-shot Action Recognition with Captioning Foundation Models

no code implementations 16 Oct 2023 Xiang Wang, Shiwei Zhang, Hangjie Yuan, Yingya Zhang, Changxin Gao, Deli Zhao, Nong Sang

In this paper, we develop an effective plug-and-play framework called CapFSAR to exploit the knowledge of multimodal models without manually annotating text.

Few-Shot action recognition Few Shot Action Recognition

Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning

1 code implementation ICCV 2023 Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yingya Zhang, Changxin Gao, Deli Zhao, Nong Sang

When pre-training on the large-scale Kinetics-710, we achieve 89.7% on Kinetics-400 with a frozen ViT-L model, which verifies the scalability of DiST.

Transfer Learning Video Recognition

Adaptive Semantic Consistency for Cross-domain Few-shot Classification

1 code implementation 1 Aug 2023 Hengchu Lu, Yuanjie Shao, Xiang Wang, Changxin Gao

In this way, the proposed ASC enables explicit transfer of source domain knowledge to prevent the model from overfitting the target domain.

Classification Cross-Domain Few-Shot

Conditional Boundary Loss for Semantic Segmentation

1 code implementation IEEE Transactions on Image Processing 2023 Dongyue Wu, Zilin Guo, Aoyan Li, Changqian Yu, Nong Sang, Changxin Gao

We conduct extensive experiments on ADE20K, Cityscapes, and Pascal Context, and the results show that applying the CBL to various popular segmentation networks can significantly improve the mIoU and boundary F-score performance.

Segmentation Semantic Segmentation

MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot Action Recognition

1 code implementation CVPR 2023 Xiang Wang, Shiwei Zhang, Zhiwu Qing, Changxin Gao, Yingya Zhang, Deli Zhao, Nong Sang

To address these issues, we develop a Motion-augmented Long-short Contrastive Learning (MoLo) method that contains two crucial components, including a long-short contrastive objective and a motion autodecoder.

Contrastive Learning Few-Shot action recognition +1

CLIP-guided Prototype Modulating for Few-shot Action Recognition

1 code implementation 6 Mar 2023 Xiang Wang, Shiwei Zhang, Jun Cen, Changxin Gao, Yingya Zhang, Deli Zhao, Nong Sang

Learning from large-scale contrastive language-image pre-training like CLIP has shown remarkable success in a wide range of downstream tasks recently, but it is still under-explored on the challenging few-shot action recognition (FSAR) task.

Few-Shot action recognition Few Shot Action Recognition

Semantic Segmentation via Pixel-to-Center Similarity Calculation

no code implementations 12 Jan 2023 Dongyue Wu, Zilin Guo, Aoyan Li, Changqian Yu, Changxin Gao, Nong Sang

Under this novel view, we propose a Class Center Similarity layer (CCS layer) to address the above-mentioned challenges by generating adaptive class centers conditioned on different scenes and supervising the similarities between class centers.

Segmentation Semantic Segmentation

Cross Attention Based Style Distribution for Controllable Person Image Synthesis

1 code implementation 1 Aug 2022 Xinyue Zhou, Mingyu Yin, Xinyuan Chen, Li Sun, Changxin Gao, Qingli Li

In this paper, we propose a cross attention based style distribution module that computes between the source semantic styles and target pose for pose transfer.

Pose Transfer Virtual Try-on

MAR: Masked Autoencoders for Efficient Action Recognition

1 code implementation 24 Jul 2022 Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Xiang Wang, Yuehuan Wang, Yiliang Lv, Changxin Gao, Nong Sang

Inspired by this, we propose Masked Action Recognition (MAR), which reduces the redundant computation by discarding a proportion of patches and operating only on a part of the videos.

Action Classification Action Recognition +1

Context-aware Proposal Network for Temporal Action Detection

no code implementations 18 Jun 2022 Xiang Wang, Huaxin Zhang, Shiwei Zhang, Changxin Gao, Yuanjie Shao, Nong Sang

This technical report presents our first-place winning solution for the temporal action detection task in the CVPR 2022 ActivityNet Challenge.

Action Classification Action Detection

Hybrid Relation Guided Set Matching for Few-shot Action Recognition

1 code implementation CVPR 2022 Xiang Wang, Shiwei Zhang, Zhiwu Qing, Mingqian Tang, Zhengrong Zuo, Changxin Gao, Rong Jin, Nong Sang

To overcome the two limitations, we propose a novel Hybrid Relation guided Set Matching (HyRSM) approach that incorporates two key components: hybrid relation module and set matching metric.

Few Shot Action Recognition Relation +1

Style Transformer for Image Inversion and Editing

1 code implementation CVPR 2022 Xueqi Hu, Qiusheng Huang, Zhengyi Shi, Siyuan Li, Changxin Gao, Li Sun, Qingli Li

Existing GAN inversion methods fail to provide latent codes for reliable reconstruction and flexible editing simultaneously.

Attribute Image-to-Image Translation

Multi-Centroid Representation Network for Domain Adaptive Person Re-ID

no code implementations 22 Dec 2021 Yuhang Wu, Tengteng Huang, Haotian Yao, Chi Zhang, Yuanjie Shao, Chuchu Han, Changxin Gao, Nong Sang

First, we present a Domain-Specific Contrastive Learning (DSCL) mechanism to fully explore intra-domain information by comparing samples only from the same domain.

Contrastive Learning Domain Adaptive Person Re-Identification +2

Modality-Aware Triplet Hard Mining for Zero-shot Sketch-Based Image Retrieval

1 code implementation 15 Dec 2021 Zongheng Huang, Yifan Sun, Chuchu Han, Changxin Gao, Nong Sang

By combining two fundamental learning approaches in DML, e.g., classification training and pairwise training, we set up a strong baseline for ZS-SBIR.

Metric Learning Retrieval +2

Unsupervised Low-Light Image Enhancement via Histogram Equalization Prior

1 code implementation 3 Dec 2021 Feng Zhang, Yuanjie Shao, Yishi Sun, Kai Zhu, Changxin Gao, Nong Sang

We introduce a Noise Disentanglement Module (NDM) to disentangle the noise and content in the reflectance maps with the reliable aid of unpaired clean images.

Disentanglement Image Restoration +1

Attribute-specific Control Units in StyleGAN for Fine-grained Image Manipulation

1 code implementation 25 Nov 2021 Rui Wang, Jian Chen, Gang Yu, Li Sun, Changqian Yu, Changxin Gao, Nong Sang

Image manipulation with StyleGAN has attracted increasing attention in recent years. Recent works have achieved tremendous success in analyzing several semantic latent spaces to edit the attributes of the generated images. However, due to the limited semantic and spatial manipulation precision in these latent spaces, existing approaches fall short in fine-grained StyleGAN image manipulation, i.e., local attribute translation. To address this issue, we discover attribute-specific control units, which consist of multiple channels of feature maps and modulation styles.

Attribute Image Manipulation

CondNet: Conditional Classifier for Scene Segmentation

2 code implementations 21 Sep 2021 Changqian Yu, Yuanjie Shao, Changxin Gao, Nong Sang

The last layer of an FCN is typically a global classifier (1x1 convolution) that assigns each pixel a semantic label.

Scene Segmentation Segmentation
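
To make the CondNet summary above concrete: a 1x1 convolution applies the same linear classifier to the feature vector at every pixel, which is why it acts as a "global" classifier. The sketch below shows this in plain Python. The function name `conv1x1_classify`, the toy feature map, and the weights are hypothetical illustrations, not code from the paper.

```python
def conv1x1_classify(features, weights, biases):
    """Apply a 1x1 convolution followed by argmax to label every pixel.

    features: H x W x C nested lists (one C-dim feature vector per pixel).
    weights:  K x C nested lists (one weight vector per class).
    biases:   K per-class biases.
    Returns an H x W map of class indices.
    """
    label_map = []
    for row in features:
        label_row = []
        for pixel in row:
            # The same K x C weights are shared by all pixels: this is the 1x1 conv.
            scores = [
                sum(w * x for w, x in zip(class_weights, pixel)) + bias
                for class_weights, bias in zip(weights, biases)
            ]
            label_row.append(max(range(len(scores)), key=scores.__getitem__))
        label_map.append(label_row)
    return label_map


# Toy 2 x 2 feature map with C = 2 channels and K = 2 classes (made-up values).
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[0.5, 0.2], [0.1, 0.9]]]
weights = [[1.0, -1.0], [-1.0, 1.0]]
biases = [0.0, 0.0]
label_map = conv1x1_classify(features, weights, biases)
```

Because the weights are identical for every pixel and every image, such a classifier cannot adapt to the content of a particular input, which is the limitation a conditional classifier is meant to address.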

ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning

1 code implementation 24 Aug 2021 Zhiwu Qing, Ziyuan Huang, Shiwei Zhang, Mingqian Tang, Changxin Gao, Marcelo H. Ang Jr, Rong Jin, Nong Sang

The visualizations show that ParamCrop adaptively controls the center distance and the IoU between two augmented views, and the learned change in the disparity along the training process is beneficial to learning a strong representation.

Contrastive Learning
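
The two quantities named in the ParamCrop summary above, center distance and IoU between two augmented views, can be computed as in the following sketch. It assumes 2-D axis-aligned crops given as `(x1, y1, x2, y2)` tuples. ParamCrop itself crops cubic (spatio-temporal) regions, so this is a simplified illustration, and the function names and sample boxes are hypothetical.

```python
import math


def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if boxes are disjoint
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def center_distance(a, b):
    """Euclidean distance between the centers of two boxes."""
    ax, ay = (a[0] + a[2]) / 2.0, (a[1] + a[3]) / 2.0
    bx, by = (b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0
    return math.hypot(bx - ax, by - ay)


# Two hypothetical crops of the same frame.
view_a = (0.0, 0.0, 2.0, 2.0)
view_b = (1.0, 1.0, 3.0, 3.0)
iou = box_iou(view_a, view_b)
dist = center_distance(view_a, view_b)
```

A low IoU together with a large center distance means the two views share little content, which makes the contrastive task harder.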

OadTR: Online Action Detection with Transformers

1 code implementation ICCV 2021 Xiang Wang, Shiwei Zhang, Zhiwu Qing, Yuanjie Shao, Zhengrong Zuo, Changxin Gao, Nong Sang

Most recent approaches for online action detection tend to apply Recurrent Neural Networks (RNNs) to capture long-range temporal structure.

Decoder Online Action Detection

A Stronger Baseline for Ego-Centric Action Detection

1 code implementation 13 Jun 2021 Zhiwu Qing, Ziyuan Huang, Xiang Wang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Changxin Gao, Marcelo H. Ang Jr, Nong Sang

This technical report analyzes an egocentric video action detection method we used in the 2021 EPIC-KITCHENS-100 competition hosted at the CVPR 2021 Workshop.

Action Detection

Lite-HRNet: A Lightweight High-Resolution Network

15 code implementations CVPR 2021 Changqian Yu, Bin Xiao, Changxin Gao, Lu Yuan, Lei Zhang, Nong Sang, Jingdong Wang

We introduce a lightweight unit, conditional channel weighting, to replace costly pointwise (1x1) convolutions in shuffle blocks.

Ranked #33 on Pose Estimation on COCO test-dev (using extra training data)

Pose Estimation Real-Time Semantic Segmentation +1

Temporal Context Aggregation Network for Temporal Action Proposal Refinement

1 code implementation CVPR 2021 Zhiwu Qing, Haisheng Su, Weihao Gan, Dongliang Wang, Wei Wu, Xiang Wang, Yu Qiao, Junjie Yan, Changxin Gao, Nong Sang

In this paper, we propose Temporal Context Aggregation Network (TCANet) to generate high-quality action proposals through "local and global" temporal context aggregation and complementary as well as progressive boundary refinement.

Action Detection Retrieval +2

Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search

no code implementations 22 Feb 2021 Chuchu Han, Zhedong Zheng, Changxin Gao, Nong Sang, Yi Yang

Specifically, to reconcile the conflicts of multiple objectives, we simplify the standard tightly coupled pipelines and establish a deeply decoupled multi-task learning framework.

Metric Learning Multi-Task Learning +2

Weakly Supervised Text-Based Person Re-Identification

1 code implementation ICCV 2021 Shizhen Zhao, Changxin Gao, Yuanjie Shao, Wei-Shi Zheng, Nong Sang

Specifically, to alleviate the intra-class variations, a clustering method is utilized to generate pseudo labels for both visual and textual instances.

Clustering Person Re-Identification +1

Representative Graph Neural Network

no code implementations ECCV 2020 Changqian Yu, Yifan Liu, Changxin Gao, Chunhua Shen, Nong Sang

In this paper, we present a Representative Graph (RepGraph) layer to dynamically sample a few representative features, which dramatically reduces redundancy.

Graph Neural Network object-detection +2

Multi-Level Temporal Pyramid Network for Action Detection

no code implementations 7 Aug 2020 Xiang Wang, Changxin Gao, Shiwei Zhang, Nong Sang

By this means, the proposed MLTPN can learn rich and discriminative features for different action instances with different durations.

Action Detection

Temporal Fusion Network for Temporal Action Localization: Submission to ActivityNet Challenge 2020 (Task E)

no code implementations 13 Jun 2020 Zhiwu Qing, Xiang Wang, Yongpeng Sang, Changxin Gao, Shiwei Zhang, Nong Sang

This technical report analyzes a temporal action localization method we used in the HACS competition, which is hosted in the ActivityNet Challenge 2020. The goal of our task is to locate the start time and end time of the action in the untrimmed video and to predict the action category. Firstly, we utilize the video-level feature information to train multiple video-level action classification models.

Action Classification Temporal Action Localization

Relevant Region Prediction for Crowd Counting

no code implementations 20 May 2020 Xinya Chen, Yanrui Bin, Changxin Gao, Nong Sang, Hao Tang

The module builds a fully connected directed graph between the regions of different density, where each node (region) is represented by a weighted globally pooled feature, and a GCN is learned to map this region graph to a set of relation-aware region representations.

Crowd Counting Relation

Domain Adaptation for Image Dehazing

1 code implementation CVPR 2020 Yuanjie Shao, Lerenhan Li, Wenqi Ren, Changxin Gao, Nong Sang

By training the image translation and dehazing networks in an end-to-end manner, we can obtain better results for both image translation and dehazing.

Domain Adaptation Image Dehazing +1

BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation

7 code implementations 5 Apr 2020 Changqian Yu, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, Nong Sang

We propose to treat these spatial details and categorical semantics separately to achieve high accuracy and high efficiency for real-time semantic segmentation.

Real-Time Semantic Segmentation Segmentation

Context Prior for Scene Segmentation

2 code implementations CVPR 2020 Changqian Yu, Jingbo Wang, Changxin Gao, Gang Yu, Chunhua Shen, Nong Sang

Given an input image and corresponding ground truth, Affinity Loss constructs an ideal affinity map to supervise the learning of Context Prior.

Scene Segmentation Scene Understanding +1

FGN: Fully Guided Network for Few-Shot Instance Segmentation

no code implementations CVPR 2020 Zhibo Fan, Jin-Gang Yu, Zhihao Liang, Jiarong Ou, Changxin Gao, Gui-Song Xia, Yuanqing Li

Few-shot instance segmentation (FSIS) conjoins the few-shot learning paradigm with general instance segmentation, which provides a possible way of tackling instance segmentation when abundant labeled training data is lacking.

Few-shot Instance Segmentation Few-Shot Learning +3

GTNet: Generative Transfer Network for Zero-Shot Object Detection

1 code implementation 19 Jan 2020 Shizhen Zhao, Changxin Gao, Yuanjie Shao, Lerenhan Li, Changqian Yu, Zhong Ji, Nong Sang

FFU and BFU add the IoU variance to the results of CFU, yielding class-specific foreground and background features, respectively.

Generative Adversarial Network Object +3

Re-ID Driven Localization Refinement for Person Search

no code implementations ICCV 2019 Chuchu Han, Jiacheng Ye, Yunshan Zhong, Xin Tan, Chi Zhang, Changxin Gao, Nong Sang

The state-of-the-art methods train the detector individually, and the detected bounding boxes may be sub-optimal for the following re-ID task.

Person Re-Identification Person Search

Learning a Discriminative Feature Network for Semantic Segmentation

3 code implementations CVPR 2018 Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang

Most existing methods for semantic segmentation still suffer from two challenges: intra-class inconsistency and inter-class indistinction.

Semantic Segmentation Thermal Image Segmentation

Learning a Discriminative Prior for Blind Image Deblurring

no code implementations CVPR 2018 Lerenhan Li, Jinshan Pan, Wei-Sheng Lai, Changxin Gao, Nong Sang, Ming-Hsuan Yang

We present an effective blind image deblurring method based on a data-driven discriminative prior. Our work is motivated by the fact that a good image prior should favor clear images over blurred images. In this work, we formulate the image prior as a binary classifier which can be achieved by a deep convolutional neural network (CNN). The learned prior is able to distinguish whether an input image is clear or not. Embedded into the maximum a posteriori (MAP) framework, it helps blind deblurring in various scenarios, including natural, face, text, and low-illumination images. However, it is difficult to optimize the deblurring method with the learned image prior as it involves a non-linear CNN. Therefore, we develop an efficient numerical approach based on the half-quadratic splitting method and gradient descent algorithm to solve the proposed model. Furthermore, the proposed model can be easily extended to non-uniform deblurring. Both qualitative and quantitative experimental results show that our method performs favorably against state-of-the-art algorithms as well as domain-specific image deblurring approaches.

Blind Image Deblurring Image Deblurring

Exemplar-based Linear Discriminant Analysis for Robust Object Tracking

no code implementations 24 Feb 2014 Changxin Gao, Feifei Chen, Jin-Gang Yu, Rui Huang, Nong Sang

However, the task in tracking is to search for a specific object, rather than an object category as in detection.

Object Object Tracking
