1 code implementation • 29 May 2025 • Liyun Zhu, Qixiang Chen, Xi Shen, Xiaodong Cun
Video Anomaly Understanding (VAU) is essential for applications such as smart cities, security surveillance, and disaster alert systems, yet remains challenging due to its demand for fine-grained spatio-temporal perception and robust reasoning under ambiguity.
no code implementations • 18 Mar 2025 • Julian Gamboa, Xi Shen, Tabassom Hamidfar, Shamima Mitu, Selim M. Shahriar
Opto-electronic joint transform correlators (OJTCs) use a focal plane array (FPA) to detect the joint power spectrum (JPS) of two input images, projecting it onto a spatial light modulator (SLM) to be optically Fourier transformed.
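As a rough illustration of the pipeline described above, here is a minimal numerical sketch of joint transform correlation, assuming two same-sized grayscale NumPy arrays; it only mimics the optical FPA/SLM stages digitally, and the function name is illustrative rather than the authors' implementation.

```python
# Minimal digital sketch of a joint transform correlation (JTC).
import numpy as np

def joint_transform_correlation(ref: np.ndarray, query: np.ndarray) -> np.ndarray:
    h, w = ref.shape
    # Place the two images side by side in a common input plane,
    # as the optical JTC does with its input SLM.
    plane = np.zeros((h, 2 * w))
    plane[:, :w] = ref
    plane[:, w:] = query
    # First (optical) Fourier transform; the FPA records only the
    # intensity, i.e. the joint power spectrum (JPS).
    jps = np.abs(np.fft.fft2(plane)) ** 2
    # Projecting the JPS onto an SLM and Fourier transforming it again
    # yields the correlation plane, with cross-correlation peaks offset
    # by the separation between the two inputs.
    corr = np.abs(np.fft.fft2(jps))
    return np.fft.fftshift(corr)
```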
no code implementations • 18 Mar 2025 • Xi Shen, Julian Gamboa, Tabassom Hamidfar, Shamima Mitu, Selim M. Shahriar
The Polar Mellin Transform (PMT) is a well-known technique that converts images into shift-, scale-, and rotation-invariant signatures for object detection using opto-electronic correlators.
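The signature construction can be sketched digitally: take the Fourier magnitude (for shift invariance) and resample it on a log-polar grid, so scaling and rotation of the input become shifts of the signature. The sampling resolution and helper name below are assumptions, not the authors' implementation.

```python
# Rough digital sketch of a Fourier-based polar-Mellin-style signature.
import numpy as np

def pmt_signature(img: np.ndarray, n_r: int = 64, n_theta: int = 64) -> np.ndarray:
    # Fourier magnitude removes translation dependence.
    mag = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    h, w = mag.shape
    cy, cx = h / 2.0, w / 2.0
    # Sample the magnitude on a log-polar grid: scaling becomes a shift
    # along the log-radius axis, rotation a shift along the angle axis.
    r_max = min(cy, cx)
    log_r = np.exp(np.linspace(0.0, np.log(r_max), n_r))
    theta = np.linspace(0.0, np.pi, n_theta, endpoint=False)  # half-plane (magnitude is symmetric)
    rr, tt = np.meshgrid(log_r, theta, indexing="ij")
    ys = np.clip((cy + rr * np.sin(tt)).astype(int), 0, h - 1)
    xs = np.clip((cx + rr * np.cos(tt)).astype(int), 0, w - 1)
    return mag[ys, xs]
```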
1 code implementation • 10 Mar 2025 • Haiyang Xie, Xi Shen, Shihua Huang, Qirui Wang, Zheng Wang
Most visual models are designed for sRGB images, yet RAW data offers significant advantages for object detection by preserving sensor information before ISP processing.
1 code implementation • CVPR 2025 • Shihua Huang, Zhichao Lu, Xiaodong Cun, Yongjun Yu, Xiao Zhou, Xi Shen
We introduce DEIM, an innovative and efficient training framework designed to accelerate convergence in real-time object detection with Transformer-based architectures (DETR).
no code implementations • 5 Oct 2024 • Weihuang Liu, Xi Shen, Chi-Man Pun, Xiaodong Cun
During test-time training, the predicted mask from the localization head is used for the classification head to update the image encoder for better adaptation.
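A hypothetical PyTorch-style sketch of the adaptation step described above: the localization head's predicted mask provides a training signal for the classification head, whose loss updates the image encoder. Turning the mask into an image-level pseudo-label by max-pooling, and all module names, are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn.functional as F

def test_time_adapt(encoder, loc_head, cls_head, image, steps: int = 5, lr: float = 1e-4):
    optimizer = torch.optim.Adam(encoder.parameters(), lr=lr)  # only the encoder is updated
    for _ in range(steps):
        feats = encoder(image)
        mask = loc_head(feats).sigmoid()                        # predicted mask (B, 1, H, W)
        pseudo_label = (mask.flatten(1).amax(dim=1) > 0.5).float().detach()
        cls_logits = cls_head(feats).squeeze(1)                 # image-level prediction (B,)
        loss = F.binary_cross_entropy_with_logits(cls_logits, pseudo_label)
        optimizer.zero_grad()
        loss.backward()                                         # gradient flows into the encoder
        optimizer.step()
    with torch.no_grad():
        return cls_head(encoder(image))                         # classify after adaptation
```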
2 code implementations • 13 Sep 2024 • Yuting Li, Dexiong Chen, Tinglong Tang, Xi Shen
To address this limitation, we introduce a data-efficient ViT method that uses only the encoder of the standard transformer.
Ranked #1 on Handwritten Text Recognition on LAM (line-level)
1 code implementation • 12 Sep 2024 • Longfei Liu, Wen Guo, Shihua Huang, Cheng Li, Xi Shen
In this study, we introduce COCO-FP, a new evaluation dataset derived from the ImageNet-1K dataset, designed to address this issue.
no code implementations • 15 Jun 2024 • Ying Fu, Yu Li, ShaoDi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu, Yunkang Zhang, Siyuan Jiang, Xiaoqiang Lu, Licheng Jiao, Fang Liu, Xu Liu, Lingling Li, Wenping Ma, Shuyuan Yang, Haiyang Xie, Jian Zhao, Shihua Huang, Peng Cheng, Xi Shen, Zheng Wang, Shuai An, Caizhi Zhu, Xuelong Li, Tao Zhang, Liang Li, Yu Liu, Chenggang Yan, Gengchen Zhang, Linyan Jiang, Bingyi Song, Zhuoyu An, Haibo Lei, Qing Luo, Jie Song, YuAn Liu, Haoyuan Zhang, Lingfeng Wang, Wei Chen, Aling Luo, Cheng Li, Jun Cao, Shu Chen, Zifei Dou, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Xuejian Gou, Qinliang Wang, Yang Liu, Shizhan Zhao, Yanzhao Zhang, Libo Yan, Yuwei Guo, Guoxin Li, Qiong Gao, Chenyue Che, Long Sun, Xiang Chen, Hao Li, Jinshan Pan, Chuanlong Xie, Hongming Chen, Mingrui Li, Tianchen Deng, Jingwei Huang, Yufeng Li, Fei Wan, Bingxin Xu, Jian Cheng, Hongzhe Liu, Cheng Xu, Yuxiang Zou, Weiguo Pan, Songyin Dai, Sen Jia, Junpei Zhang, Puhua Chen, Qihang Li
The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies.
1 code implementation • CVPR 2024 • Weihuang Liu, Xi Shen, Haolun Li, Xiuli Bi, Bo Liu, Chi-Man Pun, Xiaodong Cun
In this work, we introduce a test-time training (TTT) strategy to address the problem.
1 code implementation • CVPR 2024 • Yuting Li, Yingyi Chen, Xuanlong Yu, Dexiong Chen, Xi Shen
In this paper, we revisit techniques for uncertainty estimation within deep neural networks and consolidate a suite of techniques to enhance their reliability.
Ranked #1 on Learning with noisy labels on ANIMAL
no code implementations • 11 Dec 2023 • Julian Gamboa, Xi Shen, Tabassom Hamidfar, Selim M. Shahriar
Space situational awareness demands efficient monitoring of terrestrial sites and celestial bodies, necessitating advanced target recognition systems.
1 code implementation • ICCV 2023 • YiHao Zhi, Xiaodong Cun, Xuelin Chen, Xi Shen, Wen Guo, Shaoli Huang, Shenghua Gao
While previous methods are able to generate speech rhythm-synchronized gestures, the semantic context of the speech is generally lacking in the gesticulations.
2 code implementations • 29 May 2023 • Weihuang Liu, Xi Shen, Chi-Man Pun, Xiaodong Cun
We take inspiration from the pre-training and prompt-tuning protocols widely used in NLP and propose a new visual prompting model, named Explicit Visual Prompting (EVP).
Ranked #3 on Salient Object Detection on DUT-OMRON
1 code implementation • CVPR 2023 • Weihuang Liu, Xi Shen, Chi-Man Pun, Xiaodong Cun
Different from previous visual prompting, which is typically a dataset-level implicit embedding, our key insight is to enforce the tunable parameters to focus on the explicit visual content of each individual image, i.e., the features from the frozen patch embeddings and the input's high-frequency components (a brief sketch of the latter follows this entry).
Ranked #4 on Salient Object Detection on HKU-IS
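The "high-frequency components" mentioned above can be sketched with a simple FFT mask; the cutoff ratio and function name are assumptions for illustration, not the EVP implementation.

```python
import numpy as np

def high_frequency_component(img: np.ndarray, keep_low_ratio: float = 0.25) -> np.ndarray:
    # img: (H, W) grayscale array. Zero out the low-frequency band around the
    # spectrum centre and transform back, keeping only high frequencies.
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    ch, cw = h // 2, w // 2
    rh, rw = int(h * keep_low_ratio / 2), int(w * keep_low_ratio / 2)
    f[ch - rh:ch + rh, cw - rw:cw + rw] = 0.0   # suppress low frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))
```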
1 code implementation • 15 Jan 2023 • Jianrong Zhang, Yangsong Zhang, Xiaodong Cun, Shaoli Huang, Yong Zhang, Hongwei Zhao, Hongtao Lu, Xi Shen
Additionally, we conduct analyses on HumanML3D and observe that the dataset size is a limitation of our approach.
Ranked #2 on Motion Synthesis on Motion-X
no code implementations • CVPR 2023 • Jianrong Zhang, Yangsong Zhang, Xiaodong Cun, Yong Zhang, Hongwei Zhao, Hongtao Lu, Xi Shen, Ying Shan
Additionally, we conduct analyses on HumanML3D and observe that the dataset size is a limitation of our approach.
2 code implementations • CVPR 2023 • Wenxuan Zhang, Xiaodong Cun, Xuan Wang, Yong Zhang, Xi Shen, Yu Guo, Ying Shan, Fei Wang
We present SadTalker, which generates 3D motion coefficients (head pose, expression) of the 3DMM from audio and implicitly modulates a novel 3D-aware face renderer for talking head generation.
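A hypothetical PyTorch skeleton of the pipeline described above (audio features to 3DMM expression and pose coefficients, which then drive a 3D-aware face renderer). Module names and dimensions are illustrative assumptions, not SadTalker's code.

```python
import torch
import torch.nn as nn

class AudioToCoefficients(nn.Module):
    def __init__(self, audio_dim: int = 80, hidden: int = 256,
                 n_exp: int = 64, n_pose: int = 6):
        super().__init__()
        self.encoder = nn.GRU(audio_dim, hidden, batch_first=True)
        self.exp_head = nn.Linear(hidden, n_exp)    # 3DMM expression coefficients
        self.pose_head = nn.Linear(hidden, n_pose)  # head pose (rotation + translation)

    def forward(self, mel: torch.Tensor):
        # mel: (batch, time, audio_dim) audio features
        h, _ = self.encoder(mel)
        return self.exp_head(h), self.pose_head(h)

# The predicted per-frame coefficients would then be fed to a 3D-aware face
# renderer to synthesize the talking-head video frames.
```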
no code implementations • 1 Sep 2022 • Yangtao Wang, Xi Shen, Yuan Yuan, Yuming Du, Maomao Li, Shell Xu Hu, James L Crowley, Dominique Vaufreydaz
This method also achieves competitive results for unsupervised video object segmentation on the DAVIS, SegTrack-v2, and FBMS datasets.
Ranked #5 on Unsupervised Instance Segmentation on COCO val2017
1 code implementation • 25 Jul 2022 • Yingyi Chen, Xi Shen, Yahui Liu, Qinghua Tao, Johan A. K. Suykens
In this paper, we explore solving jigsaw puzzles as a self-supervised auxiliary loss in ViT for image classification, a model named Jigsaw-ViT (a brief sketch follows this entry).
Ranked #1 on Learning with noisy labels on ANIMAL
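A schematic sketch of the idea described above: the usual classification loss plus an auxiliary loss that shuffles input patches and predicts each patch's original position. The callables `vit_cls_logits`, `vit_patch_tokens`, and `pos_head`, as well as the loss weight, are illustrative assumptions, not the paper's interface.

```python
import torch
import torch.nn.functional as F

def jigsaw_vit_loss(vit_cls_logits, vit_patch_tokens, pos_head,
                    images, shuffled_images, perm, labels, alpha: float = 0.1):
    # Standard supervised classification branch on the unshuffled image.
    cls_loss = F.cross_entropy(vit_cls_logits(images), labels)

    # Jigsaw branch: the ViT encodes the patch-shuffled image and pos_head
    # predicts, for every patch token, its original grid position.
    tokens = vit_patch_tokens(shuffled_images)       # (B, N, D)
    B, N, _ = tokens.shape
    pos_logits = pos_head(tokens)                    # (B, N, N)
    target = perm.unsqueeze(0).expand(B, N)          # perm[i] = original index of patch i
    jigsaw_loss = F.cross_entropy(pos_logits.reshape(B * N, N),
                                  target.reshape(B * N))
    return cls_loss + alpha * jigsaw_loss
```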
1 code implementation • 4 Jul 2022 • Wen Guo, Yuming Du, Xi Shen, Vincent Lepetit, Xavier Alameda-Pineda, Francesc Moreno-Noguer
This paper tackles the problem of human motion prediction, which consists of forecasting future body poses from historically observed sequences.
1 code implementation • 27 Jun 2022 • Yingyi Chen, Shell Xu Hu, Xi Shen, Chunrong Ai, Johan A. K. Suykens
This decomposition provides three insights: (i) it shows that over-fitting is indeed an issue for learning with noisy labels; (ii) through an information bottleneck formulation, it explains why the proposed feature compression helps in combating label noise; (iii) it gives explanations on the performance boost brought by incorporating compression regularization into Co-teaching.
Ranked #11 on Image Classification on Clothing1M (using extra training data)
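The following is a schematic sketch of one Co-teaching step with an added compression-style regularizer, to illustrate the combination the entry above refers to. The small-loss selection is standard Co-teaching; `compression_penalty` is a generic placeholder (an L1-style term), not the paper's specific regularizer, and each network is assumed to return (features, logits).

```python
import torch
import torch.nn.functional as F

def co_teaching_step(net_a, net_b, images, noisy_labels, keep_ratio: float, beta: float = 1e-4):
    feats_a, logits_a = net_a(images)
    feats_b, logits_b = net_b(images)

    # Each network selects the small-loss samples it believes are clean ...
    loss_a = F.cross_entropy(logits_a, noisy_labels, reduction="none")
    loss_b = F.cross_entropy(logits_b, noisy_labels, reduction="none")
    k = max(1, int(keep_ratio * images.size(0)))
    idx_a = torch.topk(-loss_a, k).indices     # small-loss samples chosen by A
    idx_b = torch.topk(-loss_b, k).indices     # small-loss samples chosen by B

    # ... and teaches its peer on them, with a feature-compression penalty.
    def compression_penalty(feats):
        return feats.abs().mean()              # placeholder sparsity-style term

    loss_for_a = loss_a[idx_b].mean() + beta * compression_penalty(feats_a[idx_b])
    loss_for_b = loss_b[idx_a].mean() + beta * compression_penalty(feats_b[idx_a])
    return loss_for_a, loss_for_b
```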
1 code implementation • CVPR 2022 • Yangtao Wang, Xi Shen, Shell Hu, Yuan Yuan, James Crowley, Dominique Vaufreydaz
For unsupervised saliency detection, we improve IoU by 4.9%, 5.2%, and 12.9% on ECSSD, DUTS, and DUT-OMRON respectively, compared to the previous state of the art.
Ranked #1 on Weakly-Supervised Object Localization on CUB
no code implementations • NeurIPS 2021 • Xi Shen, Yang Xiao, Shell Hu, Othman Sbai, Mathieu Aubry
In the problems of image retrieval and few-shot classification, the mainstream approaches focus on learning a better feature representation.
1 code implementation • 29 Oct 2021 • Xi Shen, Alexei A. Efros, Armand Joulin, Mathieu Aubry
The goal of this work is to efficiently identify visually similar patterns in images, e.g., identifying an artwork detail copied between an engraving and an oil painting, or recognizing parts of a night-time photograph visible in its daytime counterpart.
no code implementations • 18 Aug 2021 • Ryad Kaoua, Xi Shen, Alexandra Durr, Stavros Lazaris, David Picard, Mathieu Aubry
For a historian, the first step in studying their evolution in a corpus of similar manuscripts is to identify which ones correspond to each other.
1 code implementation • 28 Apr 2021 • Yingyi Chen, Xi Shen, Shell Xu Hu, Johan A. K. Suykens
On Clothing1M, our approach obtains 74.9% accuracy, which is slightly better than that of DivideMix.
Ranked #13 on Image Classification on Clothing1M (using extra training data)
2 code implementations • ICLR 2020 • Shell Xu Hu, Pablo G. Moreno, Yang Xiao, Xi Shen, Guillaume Obozinski, Neil D. Lawrence, Andreas Damianou
The evidence lower bound of the marginal log-likelihood of empirical Bayes decomposes as a sum of local KL divergences between the variational posterior and the true posterior on the query set of each task.
Ranked #13 on Few-Shot Image Classification on CIFAR-FS 5-way (1-shot)
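One way to write the standard identity behind the statement above, in generic notation (tasks t = 1..T, task parameters w_t, variational posteriors q_t, model parameters theta); the exact conditioning used in the paper may differ.

```latex
% Generic sketch: the gap between the marginal log-likelihood and the ELBO is
% a sum of per-task KL divergences between the variational posterior q_t and
% the true posterior over the task parameters w_t given that task's query set.
\log p\big(\{y_t\} \mid \{x_t\}; \theta\big) \;-\; \mathrm{ELBO}(q, \theta)
  \;=\; \sum_{t=1}^{T} \mathrm{KL}\!\left( q_t(w_t) \,\middle\|\, p(w_t \mid x_t, y_t; \theta) \right)
```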
2 code implementations • ECCV 2020 • Xi Shen, François Darmon, Alexei A. Efros, Mathieu Aubry
Coarse alignment is performed using RANSAC on off-the-shelf deep features.
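A minimal sketch of this coarse-alignment step: match deep features between two images and fit a homography with RANSAC. Feature extraction is abstracted away (`feats1`, `pts1`, `feats2`, `pts2` are assumed to be off-the-shelf descriptors and their pixel locations); OpenCV provides the robust fit.

```python
import cv2
import numpy as np

def coarse_align(feats1, pts1, feats2, pts2, ratio: float = 0.8):
    # Nearest-neighbour matching of the deep descriptors with a ratio test.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(feats1.astype(np.float32), feats2.astype(np.float32), k=2)
    good = [m for m, n in matches if m.distance < ratio * n.distance]
    src = np.float32([pts1[m.queryIdx] for m in good]).reshape(-1, 1, 2)
    dst = np.float32([pts2[m.trainIdx] for m in good]).reshape(-1, 1, 2)
    # RANSAC discards outlier correspondences and returns a coarse homography.
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=5.0)
    return H, inliers
```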
1 code implementation • 27 Aug 2019 • Xi Shen, Ilaria Pastrolin, Oumayma Bounou, Spyros Gidaris, Marc Smith, Olivier Poncet, Mathieu Aubry
Historical watermark recognition is a highly practical, yet unsolved challenge for archivists and historians.
no code implementations • ICLR 2019 • Yuan Yuan, Yueming Lyu, Xi Shen, Ivor W. Tsang, Dit-yan Yeung
The MAAN employs a novel marginalized average aggregation (MAA) module and learns a set of latent discriminative probabilities in an end-to-end fashion.
Ranked #13 on Weakly Supervised Action Localization on ActivityNet-1.3 (mAP@0.5 metric)
1 code implementation • CVPR 2019 • Xi Shen, Alexei A. Efros, Mathieu Aubry
Our goal in this paper is to discover near duplicate patterns in large collections of artworks.