Search Results for author: Xian Zhong

Found 20 papers, 9 papers with code

PAD: Phase-Amplitude Decoupling Fusion for Multi-Modal Land Cover Classification

1 code implementation 27 Apr 2025 Huiling Zheng, Xian Zhong, Bin Liu, Yi Xiao, Bihan Wen, Xiaofeng Li

The fusion of Synthetic Aperture Radar (SAR) and RGB imagery for land cover classification remains challenging due to modality heterogeneity and the underutilization of spectral complementarity.

Land Cover Classification

QIRL: Boosting Visual Question Answering via Optimized Question-Image Relation Learning

no code implementations 4 Apr 2025 Quanxing Xu, Ling Zhou, Xian Zhong, Feifei Zhang, Rubing Huang, Chia-Wen Lin

Furthermore, to validate our concept of reducing output errors through filtering unrelated question-image inputs, we propose a specialized metric to evaluate the performance of the ISI module.

Data Augmentation, Image Generation, +4

Anomize: Better Open Vocabulary Video Anomaly Detection

no code implementations 23 Mar 2025 Fei Li, Wenxuan Liu, Jingjing Chen, Ruixu Zhang, Yuran Wang, Xian Zhong, Zheng Wang

Open Vocabulary Video Anomaly Detection (OVVAD) seeks to detect and classify both base and novel anomalies.

Anomaly Detection, Video Anomaly Detection

STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks

no code implementations 4 Mar 2025 Tianqing Zhang, Kairong Yu, Xian Zhong, Hongwei Wang, Qi Xu, Qiang Zhang

The framework demonstrates exceptional performance across diverse datasets and exhibits strong generalization capabilities.

LLM Knows Geometry Better than Algebra: Numerical Understanding of LLM-Based Agents in A Trading Arena

1 code implementation 25 Feb 2025 Tianmi Ma, Jiawei Du, Wenxin Huang, Wenjie Wang, Liang Xie, Xian Zhong, Joey Tianyi Zhou

Recent advancements in large language models (LLMs) have significantly improved performance in natural language processing tasks.

FocalCount: Towards Class-Count Imbalance in Class-Agnostic Counting

no code implementations 15 Feb 2025 Huilin Zhu, Jingling Yuan, Zhengwei Yang, Yu Guo, Xian Zhong, Shengfeng He

In class-agnostic object counting, the goal is to estimate the total number of object instances in an image without distinguishing between specific categories.

Object, Object Counting

See What You Seek: Semantic Contextual Integration for Cloth-Changing Person Re-Identification

no code implementations 2 Dec 2024 Xiyu Han, Xian Zhong, Wenxin Huang, Xuemei Jia, Wenxuan Liu, Xiaohan Yu, Alex ChiChung Kot

In this paper, we propose a novel prompt learning framework, Semantic Contextual Integration (SCI), for CC-ReID, which leverages the visual-text representation capabilities of CLIP to minimize the impact of clothing changes and enhance ID-relevant features.

Cloth-Changing Person Re-Identification, Prompt Learning

OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions

1 code implementation 24 Nov 2024 Guanyu Zhou, Wenxuan Liu, Wenxin Huang, Xuemei Jia, Xian Zhong, Chia-Wen Lin

We anticipate that the challenges posed by OccludeNet will stimulate further exploration of causal relations in occlusion scenarios and encourage a reevaluation of class correlations, ultimately promoting sustainable performance improvements.

Action Classification, Action Recognition, +7

Towards Low-latency Event-based Visual Recognition with Hybrid Step-wise Distillation Spiking Neural Networks

1 code implementation 19 Sep 2024 Xian Zhong, Shengwang Hu, Wenxuan Liu, Wenxin Huang, Jianhao Ding, Zhaofei Yu, Tiejun Huang

In this paper, we propose the Hybrid Step-wise Distillation (HSD) method, tailored for neuromorphic datasets, to mitigate the notable decline in performance at lower time steps.

Knowledge Distillation

DenseTrack: Drone-based Crowd Tracking via Density-aware Motion-appearance Synergy

no code implementations 24 Jul 2024 Yi Lei, Huilin Zhu, Jingling Yuan, Guangli Xiang, Xian Zhong, Shengfeng He

Drone-based crowd tracking faces difficulties in accurately identifying and monitoring objects from an aerial perspective, largely due to their small size and close proximity to each other, which complicates both localization and tracking.

Crowd Counting, Language Modeling, +2

Zero-shot Object Counting with Good Exemplars

1 code implementation 6 Jul 2024 Huilin Zhu, Jingling Yuan, Zhengwei Yang, Yu Guo, Zheng Wang, Xian Zhong, Shengfeng He

Zero-shot object counting (ZOC) aims to enumerate objects in images using only the names of object classes during testing, without the need for manual annotations.

Contrastive Learning, Object, +2

Good Is Bad: Causality Inspired Cloth-Debiasing for Cloth-Changing Person Re-Identification

1 code implementation CVPR 2023 Zhengwei Yang, Meng Lin, Xian Zhong, Yu Wu, Zheng Wang

Entangled representations of clothing and identity (ID)-intrinsic clues are potentially concomitant in conventional person Re-IDentification (ReID).

Cloth-Changing Person Re-Identification

Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning

1 code implementation 28 Nov 2022 Xian Zhong, Zipeng Li, Shuqin Chen, Kui Jiang, Chen Chen, Mang Ye

In this paper, we introduce a novel Refined Semantic enhancement method towards Frequency Diffusion (RSFD), a captioning model that constantly perceives the linguistic representation of the infrequent tokens.

FAD, Video Captioning

Visual-aware Attention Dual-stream Decoder for Video Captioning

no code implementations 16 Oct 2021 Zhixin Sun, Xian Zhong, Shuqin Chen, Lin Li, Luo Zhong

Video captioning is a challenging task that captures different visual parts and describes them in sentences, as it requires both visual and linguistic coherence.

Decoder, Video Captioning, +1

Image-to-Video Person Re-Identification by Reusing Cross-modal Embeddings

no code implementations 4 Oct 2018 Zhongwei Xie, Lin Li, Xian Zhong, Luo Zhong

In this paper, we propose an end-to-end neural network framework for image-to-video person re-identification by leveraging cross-modal embeddings learned from extra information. Concretely, cross-modal embeddings from image captioning and video captioning models are reused to help project the learned features into a coordinated space, where similarity can be directly computed.

Image Captioning, Image-To-Video Person Re-Identification, +2
