Search Results for author: Xi Shen

Found 32 papers, 22 papers with code

VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning

1 code implementation 29 May 2025 Liyun Zhu, Qixiang Chen, Xi Shen, Xiaodong Cun

Video Anomaly Understanding (VAU) is essential for applications such as smart cities, security surveillance, and disaster alert systems, yet remains challenging due to its demand for fine-grained spatio-temporal perception and robust reasoning under ambiguity.

Anomaly Detection Descriptive +2

Debiased Opto-electronic Joint Transform Correlator for Enhanced Real-Time Pattern Recognition

no code implementations 18 Mar 2025 Julian Gamboa, Xi Shen, Tabassom Hamidfar, Shamima Mitu, Selim M. Shahriar

Opto-electronic joint transform correlators (OJTCs) use a focal plane array (FPA) to detect the joint power spectrum (JPS) of two input images, projecting it onto a spatial light modulator (SLM) to be optically Fourier transformed.

Shift, Scale and Rotation Invariant Multiple Object Detection using Balanced Joint Transform Correlator

no code implementations 18 Mar 2025 Xi Shen, Julian Gamboa, Tabassom Hamidfar, Shamima Mitu, Selim M. Shahriar

The Polar Mellin Transform (PMT) is a well-known technique that converts images into shift, scale and rotation invariant signatures for object detection using opto-electronic correlators.

object-detection Object Detection

SimROD: A Simple Baseline for Raw Object Detection with Global and Local Enhancements

1 code implementation 10 Mar 2025 Haiyang Xie, Xi Shen, Shihua Huang, Qirui Wang, Zheng Wang

Most visual models are designed for sRGB images, yet RAW data offers significant advantages for object detection by preserving sensor information before ISP processing.

Object object-detection +1

DEIM: DETR with Improved Matching for Fast Convergence

1 code implementation CVPR 2025 Shihua Huang, Zhichao Lu, Xiaodong Cun, Yongjun Yu, Xiao Zhou, Xi Shen

We introduce DEIM, an innovative and efficient training framework designed to accelerate convergence in real-time object detection with Transformer-based architectures (DETR).

Data Augmentation object-detection +1

ForgeryTTT: Zero-Shot Image Manipulation Localization with Test-Time Training

no code implementations 5 Oct 2024 Weihuang Liu, Xi Shen, Chi-Man Pun, Xiaodong Cun

During test-time training, the predicted mask from the localization head is used for the classification head to update the image encoder for better adaptation.

Image Manipulation Image Manipulation Localization

HTR-VT: Handwritten Text Recognition with Vision Transformer

2 code implementations 13 Sep 2024 Yuting Li, Dexiong Chen, Tinglong Tang, Xi Shen

To address this limitation, we introduce a data-efficient ViT method that uses only the encoder of the standard transformer.

Handwritten Text Recognition HTR

From COCO to COCO-FP: A Deep Dive into Background False Positives for COCO Detectors

1 code implementation 12 Sep 2024 Longfei Liu, Wen Guo, Shihua Huang, Cheng Li, Xi Shen

In this study, we introduce COCO-FP, a new evaluation dataset derived from the ImageNet-1K dataset, designed to address this issue.

Object

SURE: SUrvey REcipes for building reliable and robust deep networks

1 code implementation CVPR 2024 Yuting Li, Yingyi Chen, Xuanlong Yu, Dexiong Chen, Xi Shen

In this paper, we revisit techniques for uncertainty estimation within deep neural networks and consolidate a suite of techniques to enhance their reliability.

image-classification Learning with noisy labels +2

High-speed Opto-electronic Pre-processing of Polar Mellin Transform for Shift, Scale and Rotation Invariant Image Recognition at Record-Breaking Speeds

no code implementations 11 Dec 2023 Julian Gamboa, Xi Shen, Tabassom Hamidfar, Selim M. Shahriar

Space situational awareness demands efficient monitoring of terrestrial sites and celestial bodies, necessitating advanced target recognition systems.

LivelySpeaker: Towards Semantic-Aware Co-Speech Gesture Generation

1 code implementation ICCV 2023 YiHao Zhi, Xiaodong Cun, Xuelin Chen, Xi Shen, Wen Guo, Shaoli Huang, Shenghua Gao

While previous methods are able to generate speech rhythm-synchronized gestures, the semantic context of the speech is generally lacking in the gesticulations.

Gesture Generation Rhythm

Explicit Visual Prompting for Universal Foreground Segmentations

2 code implementations 29 May 2023 Weihuang Liu, Xi Shen, Chi-Man Pun, Xiaodong Cun

We take inspiration from the widely-used pre-training and then prompt tuning protocols in NLP and propose a new visual prompting model, named Explicit Visual Prompting (EVP).

Camouflaged Object Segmentation Defocus Blur Detection +6

Explicit Visual Prompting for Low-Level Structure Segmentations

1 code implementation CVPR 2023 Weihuang Liu, Xi Shen, Chi-Man Pun, Xiaodong Cun

Different from the previous visual prompting which is typically a dataset-level implicit embedding, our key insight is to enforce the tunable parameters focusing on the explicit visual content from each individual image, i.e., the features from frozen patch embeddings and the input's high-frequency components.

Camouflaged Object Segmentation Defocus Blur Detection +5

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

2 code implementations CVPR 2023 Wenxuan Zhang, Xiaodong Cun, Xuan Wang, Yong Zhang, Xi Shen, Yu Guo, Ying Shan, Fei Wang

We present SadTalker, which generates 3D motion coefficients (head pose, expression) of the 3DMM from audio and implicitly modulates a novel 3D-aware face render for talking head generation.

Image Animation Talking Head Generation

Jigsaw-ViT: Learning Jigsaw Puzzles in Vision Transformer

1 code implementation 25 Jul 2022 Yingyi Chen, Xi Shen, Yahui Liu, Qinghua Tao, Johan A. K. Suykens

In this paper, we explore solving jigsaw puzzle as a self-supervised auxiliary loss in ViT for image classification, named Jigsaw-ViT.

Classification Domain Generalization +3

Compressing Features for Learning with Noisy Labels

1 code implementation 27 Jun 2022 Yingyi Chen, Shell Xu Hu, Xi Shen, Chunrong Ai, Johan A. K. Suykens

This decomposition provides three insights: (i) it shows that over-fitting is indeed an issue for learning with noisy labels; (ii) through an information bottleneck formulation, it explains why the proposed feature compression helps in combating label noise; (iii) it gives explanations on the performance boost brought by incorporating compression regularization into Co-teaching.

Ranked #11 on Image Classification on Clothing1M (using extra training data)

Feature Compression Feature Importance +2

Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut

1 code implementation CVPR 2022 Yangtao Wang, Xi Shen, Shell Hu, Yuan Yuan, James Crowley, Dominique Vaufreydaz

For unsupervised saliency detection, we improve IoU by 4.9%, 5.2%, 12.9% on ECSSD, DUTS, DUT-OMRON, respectively, compared to previous state of the art.

Object object-detection +5

Re-ranking for image retrieval and transductive few-shot classification

no code implementations NeurIPS 2021 Xi Shen, Yang Xiao, Shell Hu, Othman Sbai, Mathieu Aubry

In the problems of image retrieval and few-shot classification, the mainstream approaches focus on learning a better feature representation.

Classification Few-Shot Learning +3

Learning Co-segmentation by Segment Swapping for Retrieval and Discovery

1 code implementation 29 Oct 2021 Xi Shen, Alexei A. Efros, Armand Joulin, Mathieu Aubry

The goal of this work is to efficiently identify visually similar patterns in images, e.g., identifying an artwork detail copied between an engraving and an oil painting, or recognizing parts of a night-time photograph visible in its daytime counterpart.

Graph Clustering Object Discovery +3

Image Collation: Matching illustrations in manuscripts

no code implementations 18 Aug 2021 Ryad Kaoua, Xi Shen, Alexandra Durr, Stavros Lazaris, David Picard, Mathieu Aubry

For an historian, the first step in studying their evolution in a corpus of similar manuscripts is to identify which ones correspond to each other.

Boosting Co-teaching with Compression Regularization for Label Noise

1 code implementation 28 Apr 2021 Yingyi Chen, Xi Shen, Shell Xu Hu, Johan A. K. Suykens

On Clothing1M, our approach obtains 74.9% accuracy, which is slightly better than that of DivideMix.

Ranked #13 on Image Classification on Clothing1M (using extra training data)

Data Compression image-classification +2

Empirical Bayes Transductive Meta-Learning with Synthetic Gradients

2 code implementations ICLR 2020 Shell Xu Hu, Pablo G. Moreno, Yang Xiao, Xi Shen, Guillaume Obozinski, Neil D. Lawrence, Andreas Damianou

The evidence lower bound of the marginal log-likelihood of empirical Bayes decomposes as a sum of local KL divergences between the variational posterior and the true posterior on the query set of each task.

Few-Shot Image Classification Meta-Learning +3
