Transformer based Pluralistic Image Completion with Reduced Information Loss

1 code implementation • 31 Mar 2024 • Qiankun Liu, Yuqi Jiang, Zhentao Tan, Dongdong Chen, Ying Fu, Qi Chu, Gang Hua, Nenghai Yu

The indices of quantized pixels are used as tokens for the inputs and prediction targets of the transformer.

Image Inpainting, Quantization
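As a rough illustration of using quantized-pixel indices as tokens, the sketch below maps each RGB pixel to the index of its nearest colour in a small palette. The palette, the squared-L2 distance, and the helper names `nearest_index` and `pixels_to_tokens` are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def nearest_index(pixel: Pixel, palette: Sequence[Pixel]) -> int:
    """Return the index of the palette colour closest to `pixel` (squared L2)."""
    return min(
        range(len(palette)),
        key=lambda i: sum((p - q) ** 2 for p, q in zip(pixel, palette[i])),
    )


def pixels_to_tokens(pixels: Sequence[Pixel], palette: Sequence[Pixel]) -> List[int]:
    """Quantize every pixel and return the palette indices as a token sequence."""
    return [nearest_index(px, palette) for px in pixels]


# Hypothetical 3-colour palette: black, white, red.
palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]
pixels = [(10, 5, 0), (250, 250, 240), (200, 30, 20), (0, 0, 0)]

tokens = pixels_to_tokens(pixels, palette)
print(tokens)  # [0, 1, 2, 0]
```

A sequence model could then consume and predict these integer tokens in place of raw pixel values.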

Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval

no code implementations • 27 Mar 2024 • Shengjie Ma, Chong Chen, Qi Chu, Jiaxin Mao

Nonetheless, the method of employing a general large language model for reliable relevance judgments in legal case retrieval is yet to be thoroughly explored.

Language Modelling, Large Language Model

MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection

1 code implementation • 4 Mar 2024 • Tianxiang Chen, Zhentao Tan, Tao Gong, Qi Chu, Yue Wu, Bin Liu, Jieping Ye, Nenghai Yu

Inspired by Mamba, a recent basic model with linear complexity for long-distance modeling, we explore the potential of this state space model for the ISTD task in terms of effectiveness and efficiency.


Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues

no code implementations • 4 Feb 2024 • Tianxiang Chen, Zhentao Tan, Tao Gong, Qi Chu, Yue Wu, Bin Liu, Le Lu, Jieping Ye, Nenghai Yu

This bidirectional interaction narrows the modality imbalance, facilitating more effective learning of integrated audio-visual representations.

Representation Learning

TCI-Former: Thermal Conduction-Inspired Transformer for Infrared Small Target Detection

no code implementations • 3 Feb 2024 • Tianxiang Chen, Zhentao Tan, Qi Chu, Yue Wu, Bin Liu, Nenghai Yu

We abstract this process as the directional movement of feature map pixels to target areas through convolution, pooling and interactions with surrounding pixels, which is analogous to the movement of thermal particles constrained by surrounding variables and particles.

Towards More Unified In-context Visual Understanding

no code implementations • 5 Dec 2023 • Dianmo Sheng, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Tao Gong, Bin Liu, Shengwei Xu, Nenghai Yu

Thanks to this design, the model is capable of handling in-context vision understanding tasks with multimodal output in a unified pipeline. Experimental results demonstrate that our model achieves competitive performance compared with specialized models and previous ICL baselines.

Image Captioning, In-Context Learning

GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection

1 code implementation • 24 Oct 2023 • Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang

It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of the end-to-end model learning.

Monocular 3D Object Detection, Object Detection

Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding

no code implementations • 22 Sep 2023 • Jiazhen Wang, Bin Liu, Changtao Miao, Zhiwei Zhao, Wanyi Zhuang, Qi Chu, Nenghai Yu

Existing methods for multi-modal manipulation detection and grounding primarily focus on fusing vision-language features to make predictions, while overlooking the importance of modality-specific features, leading to sub-optimal results.

MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators

no code implementations • 19 Jun 2023 • Yaqi Zhang, Di Huang, Bin Liu, Shixiang Tang, Yan Lu, Lu Chen, Lei Bai, Qi Chu, Nenghai Yu, Wanli Ouyang

Generating realistic human motion from given action descriptions has advanced significantly, driven by the emerging demand for digital humans.

EVOPOSE: A Recursive Transformer For 3D Human Pose Estimation With Kinematic Structure Priors

no code implementations • 16 Jun 2023 • Yaqi Zhang, Yan Lu, Bin Liu, Zhiwei Zhao, Qi Chu, Nenghai Yu

Transformers are popular in recent 3D human pose estimation, where they use long-term modeling to lift 2D keypoints into 3D space.

3D Human Pose Estimation

Exploring the Application of Large-scale Pre-trained Models on Adverse Weather Removal

no code implementations • 15 Jun 2023 • Zhentao Tan, Yue Wu, Qiankun Liu, Qi Chu, Le Lu, Jieping Ye, Nenghai Yu

Inspired by the many successful applications of large-scale pre-trained models (e.g., CLIP), in this paper we explore their potential benefits for this task from two aspects, spatial feature representation learning and semantic information embedding: 1) for spatial feature representation learning, we design a Spatially-Adaptive Residual (SAR) Encoder to extract degraded areas adaptively.

Image Restoration, Representation Learning

HQ-50K: A Large-scale, High-quality Dataset for Image Restoration

1 code implementation • 8 Jun 2023 • Qinhong Yang, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Lu Yuan, Gang Hua, Nenghai Yu

This paper introduces a new large-scale image restoration dataset, called HQ-50K, which contains 50,000 high-quality images with rich texture details and semantic diversity.

Denoising, Image Restoration

Multi-spectral Class Center Network for Face Manipulation Detection and Localization

1 code implementation • 18 May 2023 • Changtao Miao, Qi Chu, Zhentao Tan, Zhenchao Jin, Wanyi Zhuang, Yue Wu, Bin Liu, Honggang Hu, Nenghai Yu

Next, a novel Multi-Spectral Class Center Network (MSCCNet) is proposed for face manipulation detection and localization.

Face Swapping

X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion

1 code implementation • 7 Dec 2022 • Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu

We demonstrate for the first time that using a text2image model to generate images, or a zero-shot recognition model to filter noisily crawled images, for different object categories is a feasible way to make Copy-Paste truly scalable.

Data Augmentation, Instance Segmentation

Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification

no code implementations • 1 Aug 2022 • Xulin Li, Yan Lu, Bin Liu, Yating Liu, Guojun Yin, Qi Chu, Jinyang Huang, Feng Zhu, Rui Zhao, Nenghai Yu

However, we find that existing graph-based methods for the visible-infrared person re-identification task (VI-ReID) generalize poorly because of two issues: 1) the train-test modality balance gap, which is a property of the VI-ReID task.

Counterfactual, Person Re-Identification

Towards Intrinsic Common Discriminative Features Learning for Face Forgery Detection using Adversarial Learning

no code implementations • 8 Jul 2022 • Wanyi Zhuang, Qi Chu, Haojie Yuan, Changtao Miao, Bin Liu, Nenghai Yu

Existing face forgery detection methods usually treat the task as a binary classification problem and adopt deep convolutional neural networks to learn discriminative features.

Binary Classification, Classification

Online Multi-Object Tracking with Unsupervised Re-Identification Learning and Occlusion Estimation

no code implementations • 4 Jan 2022 • Qiankun Liu, Dongdong Chen, Qi Chu, Lu Yuan, Bin Liu, Lei Zhang, Nenghai Yu

In addition, this practice of re-identification still cannot track highly occluded objects when they are missed by the detector.

Ranked #7 on Multi-Object Tracking on MOT16 (using extra training data)

Multi-Object Tracking, Object

Unsupervised Finetuning

no code implementations • 18 Oct 2021 • Suichan Li, Dongdong Chen, Yinpeng Chen, Lu Yuan, Lei Zhang, Qi Chu, Bin Liu, Nenghai Yu

This problem is more challenging than its supervised counterpart, because the low data density of the small-scale target data is ill-suited to unsupervised learning, which damages the pretrained representation and yields a poor representation in the target domain.

Temporal RoI Align for Video Object Recognition

1 code implementation • 8 Sep 2021 • Tao Gong, Kai Chen, Xinjiang Wang, Qi Chu, Feng Zhu, Dahua Lin, Nenghai Yu, Huamin Feng

In this work, considering that the features of the same object instance are highly similar across the frames of a video, a novel Temporal RoI Align operator is proposed to extract features from other frames' feature maps for current-frame proposals by utilizing feature similarity.

Instance Segmentation, Object
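The general idea of pulling features from another frame by similarity can be sketched as follows. This is a simplified illustration only, not the Temporal RoI Align operator itself: the dot-product similarity, the top-k selection, the plain averaging, and the helper names `dot`, `most_similar` and `mean_feature` are all assumptions made for this example.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product, used here as a stand-in similarity measure."""
    return sum(x * y for x, y in zip(a, b))


def most_similar(query: Vector, support: Sequence[Vector], k: int) -> List[Vector]:
    """Return the k support-frame features most similar to the query feature."""
    return sorted(support, key=lambda f: dot(query, f), reverse=True)[:k]


def mean_feature(features: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length feature vectors."""
    return [sum(column) / len(features) for column in zip(*features)]


# One feature from the current-frame proposal and three from another frame.
query = [1.0, 0.0]
support_frame = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

selected = most_similar(query, support_frame, k=2)
aggregated = mean_feature(selected)
print(aggregated)  # [0.75, 0.25]
```

The aggregated vector could then be combined with the current-frame feature; how that combination is done is left open here.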

ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation

1 code implementation • ICCV 2021 • Zhenchao Jin, Bin Liu, Qi Chu, Nenghai Yu

Third, we compute the similarities between each pixel representation and the image-level and semantic-level contextual information, respectively.

Image Segmentation, Semantic Segmentation
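To make the similarity step concrete, here is a minimal sketch that scores each pixel representation against an image-level context vector and a semantic-level context vector. Cosine similarity, the 2-D toy vectors, and the names `cosine`, `image_ctx` and `semantic_ctx` are assumptions for illustration; the abstract snippet does not say which similarity measure is used.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy pixel representations and hypothetical context vectors.
pixel_reprs = [[1.0, 0.0], [0.0, 2.0]]
image_ctx = [1.0, 0.0]      # stands in for image-level context
semantic_ctx = [0.0, 1.0]   # stands in for semantic-level context

# One (image-level, semantic-level) similarity pair per pixel.
similarities: List[Tuple[float, float]] = [
    (cosine(p, image_ctx), cosine(p, semantic_ctx)) for p in pixel_reprs
]
print(similarities)  # [(1.0, 0.0), (0.0, 1.0)]
```

Such per-pixel scores could serve as weights when mixing the two kinds of context into each pixel representation.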

Mining Contextual Information Beyond Image for Semantic Segmentation

1 code implementation • ICCV 2021 • Zhenchao Jin, Tao Gong, Dongdong Yu, Qi Chu, Jian Wang, Changhu Wang, Jie Shao

To address this, this paper proposes to mine the contextual information beyond individual images to further augment the pixel representations.

Image Segmentation, Segmentation

Abnormal Behavior Detection Based on Target Analysis

no code implementations • 29 Jul 2021 • Luchuan Song, Bin Liu, Huihui Zhu, Qi Chu, Nenghai Yu

To this end, we propose a multivariate fusion method that analyzes each target through three branches: object, action and motion.


Cascaded Residual Density Network for Crowd Counting

no code implementations • 29 Jul 2021 • Kun Zhao, Luchuan Song, Bin Liu, Qi Chu, Nenghai Yu

Crowd counting is a challenging task due to issues such as scale variation and perspective variation in real crowd scenes.

Crowd Counting

Improve Unsupervised Pretraining for Few-label Transfer

no code implementations • ICCV 2021 • Suichan Li, Dongdong Chen, Yinpeng Chen, Lu Yuan, Lei Zhang, Qi Chu, Bin Liu, Nenghai Yu

Unsupervised pretraining has achieved great success, and many recent works have shown that it can achieve comparable or even slightly better transfer performance than supervised pretraining on downstream target datasets.

Clustering, Contrastive Learning

Towards Generalizable and Robust Face Manipulation Detection via Bag-of-local-feature

no code implementations • 14 Mar 2021 • Changtao Miao, Qi Chu, Weihai Li, Tao Gong, Wanyi Zhuang, Nenghai Yu

Over the past several years, face manipulation detection has attracted considerable attention and made remarkable progress in response to the malicious abuse of facial manipulation technology.

Diverse Semantic Image Synthesis via Probability Distribution Modeling

1 code implementation • CVPR 2021 • Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, Bin Liu, Gang Hua, Nenghai Yu

In this paper, we propose a novel diverse semantic image synthesis framework from the perspective of semantic class distributions, which naturally supports diverse generation at semantic or even instance level.

Image-to-Image Translation

Are Fewer Labels Possible for Few-shot Learning?

no code implementations • 10 Dec 2020 • Suichan Li, Dongdong Chen, Yinpeng Chen, Lu Yuan, Lei Zhang, Qi Chu, Nenghai Yu

We conduct experiments on 10 different few-shot target datasets, and our average few-shot performance outperforms both vanilla inductive unsupervised transfer and supervised transfer by a large margin.

Clustering, Few-Shot Learning

Efficient Semantic Image Synthesis via Class-Adaptive Normalization

1 code implementation • 8 Dec 2020 • Zhentao Tan, Dongdong Chen, Qi Chu, Menglei Chai, Jing Liao, Mingming He, Lu Yuan, Gang Hua, Nenghai Yu

Spatially-adaptive normalization (SPADE) has recently been remarkably successful in conditional semantic image synthesis (Park et al., 2019). It modulates the normalized activation with spatially-varying transformations learned from semantic layouts, to prevent the semantic information from being washed away.

Image Generation

MichiGAN: Multi-Input-Conditioned Hair Image Generation for Portrait Editing

1 code implementation • 30 Oct 2020 • Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, Lu Yuan, Sergey Tulyakov, Nenghai Yu

In this paper, we present MichiGAN (Multi-Input-Conditioned Hair Image GAN), a novel conditional image generation method for interactive portrait hair manipulation.

Conditional Image Generation

Rethinking Spatially-Adaptive Normalization

no code implementations • 6 Apr 2020 • Zhentao Tan, Dongdong Chen, Qi Chu, Menglei Chai, Jing Liao, Mingming He, Lu Yuan, Nenghai Yu

Despite its impressive performance, a more thorough understanding of where its advantages actually come from is still needed, to help reduce the significant computation and parameter overheads introduced by these new structures.

Image Generation

Density-Aware Graph for Deep Semi-Supervised Visual Recognition

no code implementations • CVPR 2020 • Suichan Li, Bin Liu, Dong-Dong Chen, Qi Chu, Lu Yuan, Nenghai Yu

Motivated by these limitations, this paper proposes to solve the SSL problem by building a novel density-aware graph, based on which the neighborhood information can be easily leveraged and the feature learning and label propagation can also be trained in an end-to-end way.

Pseudo Label

Cross-modality Person re-identification with Shared-Specific Feature Transfer

no code implementations • CVPR 2020 • Yan Lu, Yue Wu, Bin Liu, Tianzhu Zhang, Baopu Li, Qi Chu, Nenghai Yu

In this paper, we tackle the above limitation by proposing a novel cross-modality shared-specific feature transfer algorithm (termed cm-SSFT) to explore the potential of both the modality-shared information and the modality-specific characteristics to boost the re-identification performance.

Cross-Modality Person Re-identification, Person Re-Identification

