no code implementations • 27 May 2025 • Chenghao Yang, Yinbo Luo, Zhoufutu Wen, Qi Chu, Tao Gong, Longxiang Liu, Kaiyuan Zhang, Jianpeng Jiao, Ge Zhang, Wenhao Huang, Nenghai Yu
Extensive experiments on MARS-Bench also reveal that closed-source LLMs significantly outperform open-source alternatives, explicit reasoning significantly boosts LLMs' robustness in handling long complex dialogue sessions, and LLMs indeed face significant challenges when handling motivation transfer and sophisticated cross-turn dependencies.
no code implementations • 26 May 2025 • Kaichao Jiang, He Wang, Xiaoshuai Hao, Xiulong Yang, Ajian Liu, Qi Chu, Yunfeng Diao
To address this question, we systematically analyze the energy distribution differences of clean, adversarial, and generated samples across various JEM variants and adversarially trained models.
no code implementations • 19 May 2025 • Ajian Liu, Haocheng Yuan, Xiao Guo, Hui Ma, Wanyi Zhuang, Changtao Miao, Yan Hong, Chuanbiao Song, Jun Lan, Qi Chu, Tao Gong, Yanyan Liang, Weiqiang Wang, Jun Wan, Xiaoming Liu, Zhen Lei
The lack of a Unified Face Attack Detection model to handle both types of attacks is mainly due to two factors.
no code implementations • 26 Mar 2025 • Xinghao Wang, Tao Gong, Qi Chu, Bin Liu, Nenghai Yu
However, existing weakly supervised image manipulation methods overlook the importance of edge information for accurate localization, leading to suboptimal localization performance.
no code implementations • CVPR 2025 • Dianmo Sheng, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Tao Gong, Bin Liu, Jing Han, Wenbin Tu, Shengwei Xu, Nenghai Yu
In response, we propose UNICL-SAM, to better model the example distribution and extract robust representations to help in-context segmentation.
no code implementations • 30 Dec 2024 • Yi Zhang, Weize Gao, Changtao Miao, Man Luo, Jianshu Li, Wenzhong Deng, Zhe Li, Bingyu Hu, Weibin Yao, Yunfeng Diao, Wenbo Zhou, Tao Gong, Qi Chu
In this paper, we present the solutions from the top 3 teams of the two tracks, to boost the research work in the field of image and audio-video forgery detection.
1 code implementation • 3 Oct 2024 • Tianxiang Chen, Zhentao Tan, Tao Gong, Yue Wu, Qi Chu, Bin Liu, Jieping Ye, Nenghai Yu
We observe performance dips in question-answering benchmarks after the removal or expansion of the shallow layers, and the degradation shrinks as the layer gets deeper, indicating that the shallow layers hold the key to knowledge injection.
no code implementations • 5 Aug 2024 • Changtao Miao, Qi Chu, Tao Gong, Zhentao Tan, Zhenchao Jin, Wanyi Zhuang, Man Luo, Honggang Hu, Nenghai Yu
The FUP integrates detection and localization tasks using a token learning strategy and multiple forgery-aware transformers, which facilitates the use of classification information to enhance localization capability.
1 code implementation • 31 Mar 2024 • Qiankun Liu, Yuqi Jiang, Zhentao Tan, Dongdong Chen, Ying Fu, Qi Chu, Gang Hua, Nenghai Yu
The indices of quantized pixels are used as tokens for the inputs and prediction targets of the transformer.
no code implementations • 27 Mar 2024 • Shengjie Ma, Chong Chen, Qi Chu, Jiaxin Mao
Nonetheless, the method of employing a general large language model for reliable relevance judgments in legal case retrieval is yet to be thoroughly explored.
1 code implementation • 4 Mar 2024 • Tianxiang Chen, Zi Ye, Zhentao Tan, Tao Gong, Yue Wu, Qi Chu, Bin Liu, Nenghai Yu, Jieping Ye
By aggregating the visual word and visual sentence features, our MiM-ISTD can effectively explore both global and local information.
no code implementations • 4 Feb 2024 • Tianxiang Chen, Zhentao Tan, Tao Gong, Qi Chu, Yue Wu, Bin Liu, Le Lu, Jieping Ye, Nenghai Yu
This bidirectional interaction narrows the modality imbalance, facilitating more effective learning of integrated audio-visual representations.
no code implementations • 3 Feb 2024 • Tianxiang Chen, Zhentao Tan, Qi Chu, Yue Wu, Bin Liu, Nenghai Yu
We abstract this process as the directional movement of feature map pixels to target areas through convolution, pooling and interactions with surrounding pixels, which is analogous to the movement of thermal particles constrained by surrounding variables and particles.
no code implementations • CVPR 2024 • Dianmo Sheng, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Tao Gong, Bin Liu, Shengwei Xu, Nenghai Yu
Thanks to this design, the model is capable of handling in-context vision understanding tasks with multimodal output in a unified pipeline. Experimental results demonstrate that our model achieves competitive performance compared with specialized models and previous ICL baselines.
1 code implementation • 24 Oct 2023 • Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang
It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of the end-to-end model learning.
no code implementations • 22 Sep 2023 • Jiazhen Wang, Bin Liu, Changtao Miao, Zhiwei Zhao, Wanyi Zhuang, Qi Chu, Nenghai Yu
Existing methods for multi-modal manipulation detection and grounding primarily focus on fusing vision-language features to make predictions, while overlooking the importance of modality-specific features, leading to sub-optimal results.
no code implementations • 19 Jun 2023 • Yaqi Zhang, Di Huang, Bin Liu, Shixiang Tang, Yan Lu, Lu Chen, Lei Bai, Qi Chu, Nenghai Yu, Wanli Ouyang
Generating realistic human motion from given action descriptions has advanced significantly, driven by the emerging demand for digital humans.
no code implementations • 16 Jun 2023 • Yaqi Zhang, Yan Lu, Bin Liu, Zhiwei Zhao, Qi Chu, Nenghai Yu
Transformers are popular in recent 3D human pose estimation, where they use long-term modeling to lift 2D keypoints into 3D space.
Ranked #101 on 3D Human Pose Estimation on Human3.6M
no code implementations • 15 Jun 2023 • Zhentao Tan, Yue Wu, Qiankun Liu, Qi Chu, Le Lu, Jieping Ye, Nenghai Yu
Inspired by the various successful applications of large-scale pre-trained models (e.g., CLIP), in this paper we explore their potential benefits for this task through both spatial feature representation learning and semantic information embedding: 1) for spatial feature representation learning, we design a Spatially-Adaptive Residual (SAR) Encoder to extract degraded areas adaptively.
1 code implementation • 8 Jun 2023 • Qinhong Yang, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Lu Yuan, Gang Hua, Nenghai Yu
This paper introduces a new large-scale image restoration dataset, called HQ-50K, which contains 50,000 high-quality images with rich texture details and semantic diversity.
1 code implementation • 18 May 2023 • Changtao Miao, Qi Chu, Zhentao Tan, Zhenchao Jin, Tao Gong, Wanyi Zhuang, Yue Wu, Bin Liu, Honggang Hu, Nenghai Yu
To this end, a novel Multi-Spectral Class Center Network (MSCCNet) is proposed for face manipulation detection and localization.
no code implementations • 10 May 2023 • Xulin Li, Yan Lu, Bin Liu, Yuenan Hou, Yating Liu, Qi Chu, Wanli Ouyang, Nenghai Yu
Clothes-invariant feature extraction is critical to clothes-changing person re-identification (CC-ReID).
2 code implementations • 7 Dec 2022 • Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu
We demonstrate for the first time that using a text2image model to generate images or a zero-shot recognition model to filter noisily crawled images for different object categories is a feasible way to make Copy-Paste truly scalable.
Ranked #8 on Instance Segmentation on LVIS v1.0 val
no code implementations • 23 Oct 2022 • Wanyi Zhuang, Qi Chu, Zhentao Tan, Qiankun Liu, Haojie Yuan, Changtao Miao, Zixiang Luo, Nenghai Yu
UPCL is designed for learning the consistency-related representation with progressively optimized pseudo annotations.
no code implementations • 1 Aug 2022 • Xulin Li, Yan Lu, Bin Liu, Yating Liu, Guojun Yin, Qi Chu, Jinyang Huang, Feng Zhu, Rui Zhao, Nenghai Yu
However, we find that existing graph-based methods for the visible-infrared person re-identification (VI-ReID) task generalize poorly because of two issues: 1) a train-test modality balance gap, which is a property of the VI-ReID task.
no code implementations • 8 Jul 2022 • Wanyi Zhuang, Qi Chu, Haojie Yuan, Changtao Miao, Bin Liu, Nenghai Yu
Existing face forgery detection methods usually treat face forgery detection as a binary classification problem and adopt deep convolution neural networks to learn discriminative features.
1 code implementation • CVPR 2022 • Qiankun Liu, Zhentao Tan, Dongdong Chen, Qi Chu, Xiyang Dai, Yinpeng Chen, Mengchen Liu, Lu Yuan, Nenghai Yu
The indices of quantized pixels are used as tokens for the inputs and prediction targets of the transformer.
Ranked #6 on Seeing Beyond the Visible on KITTI360-EX
no code implementations • 4 Jan 2022 • Qiankun Liu, Dongdong Chen, Qi Chu, Lu Yuan, Bin Liu, Lei Zhang, Nenghai Yu
In addition, this practice of re-identification still cannot track highly occluded objects when they are missed by the detector.
Ranked #7 on Multi-Object Tracking on MOT16 (using extra training data)
no code implementations • 18 Oct 2021 • Suichan Li, Dongdong Chen, Yinpeng Chen, Lu Yuan, Lei Zhang, Qi Chu, Bin Liu, Nenghai Yu
This problem is more challenging than the supervised counterpart, as the low data density in the small-scale target data is not friendly for unsupervised learning, leading to the damage of the pretrained representation and poor representation in the target domain.
1 code implementation • 8 Sep 2021 • Tao Gong, Kai Chen, Xinjiang Wang, Qi Chu, Feng Zhu, Dahua Lin, Nenghai Yu, Huamin Feng
In this work, considering that the features of the same object instance are highly similar across frames in a video, a novel Temporal RoI Align operator is proposed to extract features from other frames' feature maps for current-frame proposals by utilizing feature similarity.
Ranked #1 on Video Instance Segmentation on YouTube-VIS
1 code implementation • ICCV 2021 • Zhenchao Jin, Bin Liu, Qi Chu, Nenghai Yu
Third, we compute the similarities between each pixel representation and the image-level contextual information and the semantic-level contextual information, respectively.
1 code implementation • ICCV 2021 • Zhenchao Jin, Tao Gong, Dongdong Yu, Qi Chu, Jian Wang, Changhu Wang, Jie Shao
To address this, this paper proposes to mine the contextual information beyond individual images to further augment the pixel representations.
no code implementations • 29 Jul 2021 • Luchuan Song, Bin Liu, Huihui Zhu, Qi Chu, Nenghai Yu
To this end, we propose a multivariate fusion method that analyzes each target through three branches: object, action and motion.
no code implementations • 29 Jul 2021 • Kun Zhao, Luchuan Song, Bin Liu, Qi Chu, Nenghai Yu
Crowd counting is a challenging task due to issues such as scale variation and perspective variation in real crowd scenes.
1 code implementation • ICCV 2021 • Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Junjie Yan, Wanli Ouyang
In this paper, we propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both inference and training stages.
3D Object Detection From Monocular Images
Depth Estimation
no code implementations • ICCV 2021 • Suichan Li, Dongdong Chen, Yinpeng Chen, Lu Yuan, Lei Zhang, Qi Chu, Bin Liu, Nenghai Yu
Unsupervised pretraining has achieved great success and many recent works have shown unsupervised pretraining can achieve comparable or even slightly better transfer performance than supervised pretraining on downstream target datasets.
no code implementations • 14 Mar 2021 • Changtao Miao, Qi Chu, Weihai Li, Tao Gong, Wanyi Zhuang, Nenghai Yu
Over the past several years, in order to solve the problem of malicious abuse of facial manipulation technology, face manipulation detection technology has obtained considerable attention and achieved remarkable progress.
1 code implementation • CVPR 2021 • Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, Bin Liu, Gang Hua, Nenghai Yu
In this paper, we propose a novel diverse semantic image synthesis framework from the perspective of semantic class distributions, which naturally supports diverse generation at semantic or even instance level.
Ranked #1 on Image-to-Image Translation on Deep-Fashion
no code implementations • 8 Feb 2021 • Ryan Magee, Deep Chatterjee, Leo P. Singer, Surabhi Sachdev, Manoj Kovalam, Geoffrey Mo, Stuart Anderson, Patrick Brady, Patrick Brockill, Kipp Cannon, Tito Dal Canton, Qi Chu, Patrick Clearwater, Alex Codoreanu, Marco Drago, Patrick Godwin, Shaon Ghosh, Giuseppe Greco, Chad Hanna, Shasvath J. Kapadia, Erik Katsavounidis, Victor Oloworaran, Alexander E. Pace, Fiona Panther, Anwarul Patwary, Roberto De Pietri, Brandon Piotrzkowski, Tanner Prestegard, Luca Rei, Anala K. Sreekumar, Marek J. Szczepańczyk, Vinaya Valsan, Aaron Viets, Madeline Wade, Linqing Wen, John Zweizig
We present results from an end-to-end mock data challenge that detects binary neutron star mergers and alerts partner facilities before merger.
High Energy Astrophysical Phenomena
no code implementations • 10 Dec 2020 • Suichan Li, Dongdong Chen, Yinpeng Chen, Lu Yuan, Lei Zhang, Qi Chu, Nenghai Yu
We conduct experiments on 10 different few-shot target datasets, and our average few-shot performance outperforms both vanilla inductive unsupervised transfer and supervised transfer by a large margin.
1 code implementation • 8 Dec 2020 • Zhentao Tan, Dongdong Chen, Qi Chu, Menglei Chai, Jing Liao, Mingming He, Lu Yuan, Gang Hua, Nenghai Yu
Spatially-adaptive normalization (SPADE) has recently been remarkably successful in conditional semantic image synthesis \cite{park2019semantic}; it modulates the normalized activations with spatially-varying transformations learned from semantic layouts, to prevent the semantic information from being washed away.
1 code implementation • 30 Oct 2020 • Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, Lu Yuan, Sergey Tulyakov, Nenghai Yu
In this paper, we present MichiGAN (Multi-Input-Conditioned Hair Image GAN), a novel conditional image generation method for interactive portrait hair manipulation.
no code implementations • 6 Apr 2020 • Zhentao Tan, Dongdong Chen, Qi Chu, Menglei Chai, Jing Liao, Mingming He, Lu Yuan, Nenghai Yu
Despite its impressive performance, a more thorough understanding of the true advantages inside the box is still highly demanded, to help reduce the significant computation and parameter overheads introduced by these new structures.
no code implementations • CVPR 2020 • Suichan Li, Bin Liu, Dong-Dong Chen, Qi Chu, Lu Yuan, Nenghai Yu
Motivated by these limitations, this paper proposes to solve the SSL problem by building a novel density-aware graph, based on which the neighborhood information can be easily leveraged and the feature learning and label propagation can also be trained in an end-to-end way.
no code implementations • CVPR 2020 • Yan Lu, Yue Wu, Bin Liu, Tianzhu Zhang, Baopu Li, Qi Chu, Nenghai Yu
In this paper, we tackle the above limitation by proposing a novel cross-modality shared-specific feature transfer algorithm (termed cm-SSFT) to explore the potential of both the modality-shared information and the modality-specific characteristics to boost the re-identification performance.
Cross-Modality Person Re-identification
Person Re-Identification
no code implementations • ICCV 2017 • Qi Chu, Wanli Ouyang, Hongsheng Li, Xiaogang Wang, Bin Liu, Nenghai Yu
The visibility map of the target is learned and used for inferring the spatial attention map.