Search Results for author: YaoWei Wang

Found 140 papers, 86 papers with code

RadioDUN: A Physics-Inspired Deep Unfolding Network for Radio Map Estimation

no code implementations10 Jun 2025 Taiqin Chen, Zikun Zhou, Zheng Fang, Wenzhen Zou, Kanjun Liu, Ke Chen, Yongbing Zhang, YaoWei Wang

Inspired by the shadowing factor in the physical propagation model, we integrate obstacle-related factors to model the obstacle-induced stochastic signal decay.
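
For context, the shadowing factor mentioned above comes from the standard log-distance path-loss model; the display below restates that textbook model (a generic reference, not the paper's exact formulation), with $d_0$, $n$, and $\sigma$ as placeholder parameters:

$$PL(d) = PL(d_0) + 10\,n\,\log_{10}\!\left(\frac{d}{d_0}\right) + X_\sigma,\qquad X_\sigma \sim \mathcal{N}(0,\sigma^2),$$

where $PL(d_0)$ is the path loss at reference distance $d_0$, $n$ is the path-loss exponent, and the log-normal shadowing term $X_\sigma$ captures the obstacle-induced stochastic decay.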

Compressive Sensing

Adversarial Semantic and Label Perturbation Attack for Pedestrian Attribute Recognition

2 code implementations29 May 2025 Weizhe Kong, Xiao Wang, Ruichong Gao, Chenglong Li, Yu Zhang, Xing Yang, YaoWei Wang, Jin Tang

To bridge this gap, this paper proposes the first adversarial attack and defense framework for pedestrian attribute recognition.

Adversarial Attack Attribute +1

Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors

no code implementations21 May 2025 Hao Fang, Jiawei Kong, Tianqu Zhuang, Yixiang Qiu, Kuofeng Gao, Bin Chen, Shu-Tao Xia, YaoWei Wang, Min Zhang

By subtracting the machine-like patterns from the human-like distribution during the decoding process, CoPA is able to produce sentences that are less discernible by text detectors.
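
The subtraction described above is in the spirit of contrastive decoding. Below is a minimal sketch assuming a Hugging Face causal language model; the checkpoint, the two prompts, and the weight alpha are illustrative assumptions, not CoPA's actual prompts or hyperparameters.

```python
# Hedged sketch of contrastive decoding: down-weight tokens favored by a
# "machine-like" prompt relative to a "human-like" prompt at every step.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

human_prompt = "Rewrite in a casual, human voice: The results were good."
machine_prompt = "Rewrite in a formal, generic voice: The results were good."
alpha = 0.5  # strength of the machine-like penalty (assumed hyperparameter)

human_ids = tok(human_prompt, return_tensors="pt").input_ids
machine_ids = tok(machine_prompt, return_tensors="pt").input_ids

generated = []
for _ in range(30):  # generate up to 30 new tokens greedily
    with torch.no_grad():
        human_logits = model(human_ids).logits[:, -1, :]
        machine_logits = model(machine_ids).logits[:, -1, :]
    # Contrast the two next-token distributions in log space.
    scores = torch.log_softmax(human_logits, dim=-1) \
        - alpha * torch.log_softmax(machine_logits, dim=-1)
    next_id = scores.argmax(dim=-1, keepdim=True)
    generated.append(next_id.item())
    # Append the chosen token to both contexts so they stay in sync.
    human_ids = torch.cat([human_ids, next_id], dim=-1)
    machine_ids = torch.cat([machine_ids, next_id], dim=-1)

print(tok.decode(generated))
```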

Language Modeling Language Modelling

High Quality Underwater Image Compression with Adaptive Correction and Codebook-based Augmentation

no code implementations15 May 2025 Yimin Zhou, Yichong Xia, Sicheng Pan, Bin Chen, Baoyi An, Haoqian Wang, Zhi Wang, YaoWei Wang, Zikun Zhou

With the increasing exploration and exploitation of the underwater world, underwater images have become a critical medium for human interaction with marine environments, driving extensive research into their efficient transmission and storage.

Image Compression

Towards Facial Image Compression with Consistency Preserving Diffusion Prior

no code implementations9 May 2025 Yimin Zhou, Yichong Xia, Bin Chen, Baoyi An, Haoqian Wang, Zhi Wang, YaoWei Wang, Zikun Zhou

Simply adapting diffusion-based compression methods to facial compression tasks results in reconstructed images that perform poorly in downstream applications due to insufficient preservation of high-frequency information.

Image Compression

CountDiffusion: Text-to-Image Synthesis with Training-Free Counting-Guidance Diffusion

no code implementations7 May 2025 Yanyu Li, Pencheng Wan, Liang Han, YaoWei Wang, Liqiang Nie, Min Zhang

Stable Diffusion has advanced text-to-image synthesis, but training models to generate images with accurate object quantity is still difficult due to the high computational cost and the challenge of teaching models the abstract concept of quantity.

Denoising Image Generation +1

VideoVista-CulturalLingo: 360$^\circ$ Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension

1 code implementation23 Apr 2025 Xinyu Chen, Yunxin Li, Haoyuan Shi, Baotian Hu, Wenhan Luo, YaoWei Wang, Min Zhang

Assessing the video comprehension capabilities of multimodal AI systems can effectively measure their understanding and reasoning abilities.

Harmony: A Unified Framework for Modality Incremental Learning

no code implementations17 Apr 2025 Yaguang Song, Xiaoshan Yang, Dongmei Jiang, YaoWei Wang, Changsheng Xu

To address this task, we propose a novel framework named Harmony, designed to achieve modal alignment and knowledge retention, enabling the model to reduce the modal discrepancy and learn from a sequence of distinct modalities, ultimately completing tasks across multiple modalities within a unified framework.

Incremental Learning

Learning Compatible Multi-Prize Subnetworks for Asymmetric Retrieval

1 code implementation CVPR 2025 Yushuai Sun, Zikun Zhou, Dongmei Jiang, YaoWei Wang, Jun Yu, Guangming Lu, Wenjie Pei

For example, when introducing a new platform into the retrieval systems, developers have to train an additional model at an appropriate capacity that is compatible with existing models via backward-compatible learning.

Retrieval

A Unified Agentic Framework for Evaluating Conditional Image Generation

1 code implementation9 Apr 2025 Jifang Wang, Xue Yang, Longyue Wang, Zhenran Xu, Yiyu Wang, YaoWei Wang, Weihua Luo, Kaifu Zhang, Baotian Hu, Min Zhang

This paper introduces CIGEval, a unified agentic framework for comprehensive evaluation of conditional image generation tasks.

Conditional Image Generation

AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing

1 code implementation CVPR 2025 Niu Lian, Jun Li, Jinpeng Wang, Ruisheng Luo, YaoWei Wang, Shu-Tao Xia, Bin Chen

To address this limitation, we propose a new framework, termed AutoSSVH, that employs adversarial frame sampling with hash-based contrastive learning.

Contrastive Learning Retrieval

Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model

no code implementations13 Mar 2025 Qiyuan Deng, Xuefeng Bai, Kehai Chen, YaoWei Wang, Liqiang Nie, Min Zhang

Reinforcement Learning (RL) algorithms for safety alignment of Large Language Models (LLMs), such as Direct Preference Optimization (DPO), encounter the challenge of distribution shift.

Language Modeling Language Modelling +4

Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding

no code implementations12 Mar 2025 Haoyu Zhang, Qiaohui Chu, Meng Liu, Yunxiao Wang, Bin Wen, Fan Yang, Tingting Gao, Di Zhang, YaoWei Wang, Liqiang Nie

To address these challenges, we propose learning the mapping between exocentric and egocentric domains, leveraging the extensive exocentric knowledge within existing MLLMs to enhance egocentric video understanding.

Instruction Following Video Understanding

DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering

1 code implementation CVPR 2025 Jingzhou Luo, Yang Liu, Weixing Chen, Zhen Li, YaoWei Wang, Guanbin Li, Liang Lin

In this paper, we propose a Dual-vision Scene Perception Network (DSPNet), to comprehensively integrate multi-view and point cloud features to improve robustness in 3D QA.

3D Question Answering (3D-QA) Question Answering

OpenEarthSensing: Large-Scale Fine-Grained Benchmark for Open-World Remote Sensing

no code implementations28 Feb 2025 Xiang Xiang, Zhuo Xu, Yao Deng, Qinhao Zhou, Yifan Liang, Ke Chen, Qingfang Zheng, YaoWei Wang, Xilin Chen, Wen Gao

In open-world remote sensing, deployed models must continuously adapt to a steady influx of new data, which often exhibits various shifts compared to what the model encountered during the training phase.

Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents

1 code implementation27 Feb 2025 Zhenyu Liu, Yunxin Li, Baotian Hu, Wenhan Luo, YaoWei Wang, Min Zhang

Specifically, our approach consists of 1) an image information quantification method via visual agents collaboration to select images with rich visual information, and 2) a visual-centric instruction quality assessment method to select high-quality instruction data related to high-quality images.

Image Quality Assessment

SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition

1 code implementation23 Feb 2025 Feng Lu, Tong Jin, Xiangyuan Lan, Lijun Zhang, Yunpeng Liu, YaoWei Wang, Chun Yuan

In our previous work, we proposed a novel method to realize seamless adaptation of foundation models to VPR (SelaVPR).

Deep Hashing Re-Ranking +1

Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis

1 code implementation18 Feb 2025 Jiaqi Zhao, Ming Wang, Miao Zhang, Yuzhang Shang, Xuebo Liu, YaoWei Wang, Min Zhang, Liqiang Nie

Then, we conduct extensive experiments with the baseline within each class, covering models of various sizes (7B-70B), bitwidths, training levels (LLaMA1/2/3/3.1), architectures (Mixtral, DeepSeekMoE, and Mamba), and modalities (LLaVA1.5 and VILA1.5) on a wide range of evaluation metrics. Through comparative analysis of the results, we summarize the superiority of each PTQ strategy and the model size-bitwidth trade-off with respect to performance.

Benchmarking Mamba +1

PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models

1 code implementation18 Feb 2025 Jiaqi Zhao, Miao Zhang, Ming Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, YaoWei Wang, Min Zhang

To explore the real limit of PTQ, we propose an extremely low-bit PTQ method called PTQ1.61, which enables weight quantization to 1.61-bit for the first time.

Binarization Quantization

EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition

1 code implementation13 Feb 2025 Xiao Wang, Jingtao Jiang, Dong Li, Futian Wang, Lin Zhu, YaoWei Wang, Yongyong Tian, Jin Tang

Mainstream Scene Text Recognition (STR) algorithms are developed based on RGB cameras which are sensitive to challenging factors such as low illumination, motion blur, and cluttered backgrounds.

Large Language Model Scene Text Recognition

Pilot: Building the Federated Multimodal Instruction Tuning Framework

no code implementations23 Jan 2025 Baochen Xiong, Xiaoshan Yang, Yaguang Song, YaoWei Wang, Changsheng Xu

In this paper, we explore a novel federated multimodal instruction tuning task (FedMIT), which is significant for collaboratively fine-tuning MLLMs on different types of multimodal instruction data on distributed devices.

General Knowledge

Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation

1 code implementation7 Jan 2025 Xiao Wang, Fuling Wang, Haowen Wang, Bo Jiang, Chuanfu Li, YaoWei Wang, Yonghong Tian, Jin Tang

X-ray image based medical report generation has achieved significant progress in recent years with the help of large language models; however, these models have not fully exploited the effective information in visual image regions, resulting in reports that are linguistically sound but insufficient in describing key diseases.

Language Modeling Language Modelling +2

Building Vision Models upon Heat Conduction

no code implementations CVPR 2025 Zhaozhi Wang, Yue Liu, Yunjie Tian, Yunfan Liu, YaoWei Wang, Qixiang Ye

Visual representation models leveraging attention mechanisms are challenged by significant computational overhead, particularly when pursuing large receptive fields.

NN-Former: Rethinking Graph Structure in Neural Architecture Representation

1 code implementation CVPR 2025 Ruihan Xu, Haokui Zhang, YaoWei Wang, Wei Zeng, Shiliang Zhang

Our approach consistently achieves promising performance in both accuracy and latency prediction, providing valuable insights for learning Directed Acyclic Graph (DAG) topology.

Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues

1 code implementation CVPR 2025 Sihong Huang, Jiaxin Wu, XiaoYong Wei, Yi Cai, Dongmei Jiang, YaoWei Wang

However, existing visual-to-visual and visual-to-textual Ego-Exo video alignment methods struggle with the problem that there could be non-visual overlap for the same activity.

Action Recognition Scene Recognition +1

Video Language Model Pretraining with Spatio-temporal Masking

no code implementations CVPR 2025 Yue Wu, Zhaobo Qi, Junshu Sun, YaoWei Wang, Qingming Huang, Shuhui Wang

The development of self-supervised video-language models based on mask learning has significantly advanced downstream video tasks.

Decoder Language Modeling +2

VELoRA: A Low-Rank Adaptation Approach for Efficient RGB-Event based Recognition

1 code implementation28 Dec 2024 Lan Chen, Haoxiang Yang, Pengpeng Shao, Haoyu Song, Xiao Wang, Zhicheng Zhao, YaoWei Wang, Yonghong Tian

Inspired by the successful application of large models, introducing such models can also be considered as a way to further enhance the performance of multi-modal tasks.

parameter-efficient fine-tuning

Towards Visual Grounding: A Survey

4 code implementations28 Dec 2024 Linhui Xiao, Xiaoshan Yang, Xiangyuan Lan, YaoWei Wang, Changsheng Xu

Finally, we outline the challenges confronting visual grounding and propose valuable directions for future research, which may serve as inspiration for subsequent researchers.

Phrase Grounding Referring Expression +3

Efficient Dataset Distillation via Diffusion-Driven Patch Selection for Improved Generalization

no code implementations13 Dec 2024 Xinhao Zhong, Shuoyang Sun, Xulin Gu, Zhaoyang Xu, YaoWei Wang, Jianlong Wu, Bin Chen

Dataset distillation offers an efficient way to reduce memory and computational costs by optimizing a smaller dataset with performance comparable to the full-scale original.

Dataset Distillation

Towards Long Video Understanding via Fine-detailed Video Story Generation

no code implementations9 Dec 2024 Zeng You, Zhiquan Wen, Yaofo Chen, Xin Li, Runhao Zeng, YaoWei Wang, Mingkui Tan

To avoid interference from redundant information in videos, we introduce a Semantic Redundancy Reduction mechanism that removes redundancy at both the visual and textual levels.

Story Generation Video Understanding

Do We Need to Design Specific Diffusion Models for Different Tasks? Try ONE-PIC

no code implementations7 Dec 2024 Ming Tao, Bing-Kun Bao, YaoWei Wang, Changsheng Xu

However, unlike Large Language Models (LLMs) that can learn multiple tasks in a single model based on instructed data, diffusion models always require additional branches, task-specific training strategies, and losses for effective adaptation to different downstream tasks.

CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs

no code implementations19 Nov 2024 Zhehan Kan, Ce Zhang, Zihan Liao, Yapeng Tian, Wenming Yang, Junyuan Xiao, Xu Li, Dongmei Jiang, YaoWei Wang, Qingmin Liao

Large Vision-Language Model (LVLM) systems have demonstrated impressive vision-language reasoning capabilities but suffer from pervasive and severe hallucination issues, posing significant risks in critical domains such as healthcare and autonomous systems.

Hallucination Language Modeling +3

OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling

2 code implementations10 Oct 2024 Linhui Xiao, Xiaoshan Yang, Fang Peng, YaoWei Wang, Changsheng Xu

Simultaneously, the current mask visual language modeling (MVLM) fails to capture the nuanced referential relationship between image-text in referring tasks.

Language Modeling Language Modelling

EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment

no code implementations8 Oct 2024 Yifei Xing, Xiangyuan Lan, Ruiping Wang, Dongmei Jiang, Wenjun Huang, Qingfang Zheng, YaoWei Wang

In this work, we propose Empowering Multi-modal Mamba with Structural and Hierarchical Alignment (EMMA), which enables the MLLM to extract fine-grained visual information.

cross-modal alignment Hallucination +1

Discriminative Spatial-Semantic VOS Solution: 1st Place Solution for 6th LSVOS

no code implementations29 Aug 2024 Deshui Miao, Yameng Gu, Xin Li, Zhenyu He, YaoWei Wang, Ming-Hsuan Yang

Video object segmentation (VOS) is a crucial task in computer vision, but current VOS methods struggle with complex scenes and prolonged object motions.

Object Object Recognition +3

Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm

1 code implementation20 Aug 2024 Xiao Wang, Yao Rong, Fuling Wang, Jianing Li, Lin Zhu, Bo Jiang, YaoWei Wang

Based on this dataset and several other large-scale datasets, we propose a novel baseline method that fully leverages the Mamba model's ability to integrate temporal information of CNN features, resulting in improved sign language translation outcomes.

Mamba Sign Language Translation +1

Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation

1 code implementation12 Jul 2024 Zhilin Zhu, Xiaopeng Hong, Zhiheng Ma, Weijun Zhuang, Yaohui Ma, Yong Dai, YaoWei Wang

In this paper, we systematically analyze the challenges of this task: online environment, unsupervised nature, and the risks of error accumulation and catastrophic forgetting under continual domain shifts.

Test-time Adaptation

OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

1 code implementation10 Jul 2024 Hao Wang, Pengzhen Ren, Zequn Jie, Xiao Dong, Chengjian Feng, Yinlong Qian, Lin Ma, Dongmei Jiang, YaoWei Wang, Xiangyuan Lan, Xiaodan Liang

To address these challenges, we propose a novel unified open-vocabulary detection method called OV-DINO, which is pre-trained on diverse large-scale datasets with language-aware selective fusion in a unified framework.

Ranked #5 on Zero-Shot Object Detection on MSCOCO (AP metric, using extra training data)

Zero-Shot Object Detection

Learning Spatial-Semantic Features for Robust Video Object Segmentation

no code implementations10 Jul 2024 Xin Li, Deshui Miao, Zhenyu He, YaoWei Wang, Huchuan Lu, Ming-Hsuan Yang

Tracking and segmenting multiple similar objects with complex or separate parts in long-term videos is inherently challenging due to the ambiguity of target parts and identity confusion caused by occlusion, background clutter, and long-term variations.

Object Semantic Segmentation +2

Retain, Blend, and Exchange: A Quality-aware Spatial-Stereo Fusion Approach for Event Stream Recognition

1 code implementation27 Jun 2024 Lan Chen, Dong Li, Xiao Wang, Pengpeng Shao, Wei zhang, YaoWei Wang, Yonghong Tian, Jin Tang

In this paper, we propose a novel dual-stream framework for event stream-based pattern recognition via differentiated fusion, termed EFV++.

Graph Neural Network

1st Place Solution for MOSE Track in CVPR 2024 PVUW Workshop: Complex Video Object Segmentation

no code implementations7 Jun 2024 Deshui Miao, Xin Li, Zhenyu He, YaoWei Wang, Ming-Hsuan Yang

In this challenge, we propose a semantic embedding video object segmentation model and use the salient features of objects as query representations.

Object Segmentation +3

vHeat: Building Vision Models upon Heat Conduction

1 code implementation26 May 2024 Zhaozhi Wang, Yue Liu, Yunfan Liu, Hongtian Yu, YaoWei Wang, Qixiang Ye, Yunjie Tian

A fundamental problem in learning robust and expressive visual representations lies in efficiently estimating the spatial relationships of visual semantics throughout the entire image.

Computational Efficiency

LG-VQ: Language-Guided Codebook Learning

no code implementations23 May 2024 Guotao Liang, Baoquan Zhang, YaoWei Wang, Xutao Li, Yunming Ye, Huaibin Wang, Chuyao Luo, Kola Ye, linfeng Luo

Vector quantization (VQ) is a key technique in high-resolution and high-fidelity image synthesis, which aims to learn a codebook to encode an image with a sequence of discrete codes and then generate an image in an auto-regressive manner.
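
The codebook lookup described above is the standard vector-quantization step (LG-VQ's language-guided training is built on top of it). A minimal sketch, with illustrative codebook size and feature dimensions:

```python
# Nearest-codebook-entry lookup: each continuous feature vector becomes a discrete code.
import torch

codebook = torch.randn(512, 64)          # 512 learnable codes, 64-dim each (illustrative)
features = torch.randn(16, 64)           # 16 feature vectors from an image encoder

dists = torch.cdist(features, codebook)  # (16, 512) pairwise Euclidean distances
codes = dists.argmin(dim=-1)             # discrete code index per feature vector
quantized = codebook[codes]              # quantized features passed on to the decoder

print(codes.shape, quantized.shape)      # torch.Size([16]) torch.Size([16, 64])
```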

Image Captioning Image Generation +1

Prompt Customization for Continual Learning

1 code implementation28 Apr 2024 Yong Dai, Xiaopeng Hong, Yabin Wang, Zhiheng Ma, Dongmei Jiang, YaoWei Wang

In contrast to conventional methods that employ hard prompt selection, PGM assigns different coefficients to prompts from a fixed-size pool of prompts and generates tailored prompts.
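
A rough sketch of the coefficient-weighted prompt generation described above, assuming a learnable prompt pool and a simple linear coefficient predictor; the pool size, prompt length, and predictor are illustrative assumptions rather than the paper's exact PGM design.

```python
import torch
import torch.nn as nn

class PromptGenerator(nn.Module):
    def __init__(self, pool_size: int = 10, prompt_len: int = 5, dim: int = 768):
        super().__init__()
        # Fixed-size pool of learnable prompts.
        self.pool = nn.Parameter(torch.randn(pool_size, prompt_len, dim) * 0.02)
        # Predicts one coefficient per pooled prompt from an input feature.
        self.coeff_net = nn.Linear(dim, pool_size)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (batch, dim); coefficients are soft weights, not a hard selection.
        coeff = torch.softmax(self.coeff_net(feat), dim=-1)       # (batch, pool_size)
        return torch.einsum("bp,pld->bld", coeff, self.pool)      # (batch, prompt_len, dim)

gen = PromptGenerator()
prompts = gen(torch.randn(4, 768))
print(prompts.shape)  # torch.Size([4, 5, 768])
```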

Continual Learning Incremental Learning

Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition

3 code implementations27 Apr 2024 Xiao Wang, Qian Zhu, Jiandong Jin, Jun Zhu, Futian Wang, Bo Jiang, YaoWei Wang, Yonghong Tian

Specifically, we formulate the video-based PAR as a vision-language fusion problem and adopt a pre-trained foundation model CLIP to extract the visual features.
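
A minimal sketch of the CLIP-based visual feature extraction step, using the Hugging Face CLIP interface; the checkpoint, frame sampling, and temporal pooling are assumptions for illustration and omit the paper's spatio-temporal side tuning.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Stand-in for sampled video frames; real code would decode frames from a clip.
frames = [Image.new("RGB", (224, 224)) for _ in range(8)]

inputs = processor(images=frames, return_tensors="pt")
with torch.no_grad():
    frame_features = model.get_image_features(**inputs)  # (num_frames, feature_dim)

# Simple temporal mean pooling into a single clip-level representation.
clip_feature = frame_features.mean(dim=0)
print(frame_features.shape, clip_feature.shape)
```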

Attribute Pedestrian Attribute Recognition +1

Motion-aware Latent Diffusion Models for Video Frame Interpolation

no code implementations21 Apr 2024 Zhilin Huang, Yijie Yu, Ling Yang, Chujun Qin, Bing Zheng, Xiawu Zheng, Zikun Zhou, YaoWei Wang, Wenming Yang

With the advancement of AIGC, video frame interpolation (VFI) has become a crucial component in existing video generation frameworks, attracting widespread research interest.

Motion Estimation Video Frame Interpolation +1

HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding

1 code implementation20 Apr 2024 Linhui Xiao, Xiaoshan Yang, Fang Peng, YaoWei Wang, Changsheng Xu

The cross-modal bridge can address the inconsistency between visual features and those required for grounding, and establish a connection between multi-level visual and text features.

cross-modal alignment Visual Grounding

State Space Model for New-Generation Network Alternative to Transformers: A Survey

1 code implementation15 Apr 2024 Xiao Wang, Shiao Wang, Yuhe Ding, Yuehang Li, Wentao Wu, Yao Rong, Weizhe Kong, Ju Huang, Shihao Li, Haoxiang Yang, Ziwen Wang, Bo Jiang, Chenglong Li, YaoWei Wang, Yonghong Tian, Jin Tang

In this paper, we give the first comprehensive review of these works and also provide experimental comparisons and analysis to better demonstrate the features and advantages of SSM.

StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion

1 code implementation9 Apr 2024 Ming Tao, Bing-Kun Bao, Hao Tang, YaoWei Wang, Changsheng Xu

3) The story visualization and continuation models are trained and inferred independently, which is not user-friendly.

Image Generation Story Visualization

RTracker: Recoverable Tracking via PN Tree Structured Memory

1 code implementation CVPR 2024 Yuqing Huang, Xin Li, Zikun Zhou, YaoWei Wang, Zhenyu He, Ming-Hsuan Yang

Upon the PN tree memory, we develop corresponding walking rules for determining the state of the target and define a set of control flows to unite the tracker and the detector in different tracking scenarios.

Visual Object Tracking Visual Tracking

Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance

1 code implementation8 Mar 2024 Liting Lin, Heng Fan, Zhipeng Zhang, YaoWei Wang, Yong Xu, Haibin Ling

The essence of our work lies in adapting LoRA, a technique that fine-tunes a small subset of model parameters without adding inference latency, to the domain of visual tracking.
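
A minimal LoRA-style linear layer in the spirit of the technique named above; the dimensions, rank, and scaling are illustrative, and this is a generic sketch rather than the paper's actual tracker code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        # Frozen pretrained weight (loaded from the backbone in practice).
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        # Trainable low-rank update: W + (alpha / rank) * B @ A.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = x @ self.weight.T
        update = (x @ self.lora_A.T) @ self.lora_B.T
        return base + self.scaling * update

    def merged_weight(self) -> torch.Tensor:
        # After training, the low-rank update folds into the frozen weight,
        # so inference adds no extra latency.
        return self.weight + self.scaling * (self.lora_B @ self.lora_A)

layer = LoRALinear(768, 768)
print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768])
```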

parameter-efficient fine-tuning Visual Object Tracking +1

Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization

no code implementations CVPR 2024 Deng Li, Aming Wu, YaoWei Wang, Yahong Han

In this paper, we propose a dynamic object-centric perception network based on prompt learning, aiming to adapt to the variations in image complexity.

Domain Generalization image-classification +5

Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation

1 code implementation27 Feb 2024 Yaofo Chen, Shuaicheng Niu, YaoWei Wang, Shoukai Xu, Hengjie Song, Mingkui Tan

Moreover, with the increasing data collected at the edge, this paradigm also fails to further adapt the cloud model for better performance.

Semi-supervised Counting via Pixel-by-pixel Density Distribution Modelling

1 code implementation23 Feb 2024 Hui Lin, Zhiheng Ma, Rongrong Ji, YaoWei Wang, Zhou Su, Xiaopeng Hong, Deyu Meng

This paper focuses on semi-supervised crowd counting, where only a small portion of the training data are labeled.

Crowd Counting Decoder +1

Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition

1 code implementation22 Feb 2024 Feng Lu, Lijun Zhang, Xiangyuan Lan, Shuting Dong, YaoWei Wang, Chun Yuan

Experimental results show that our method outperforms the state-of-the-art methods with less training data and training time, and uses about only 3% retrieval runtime of the two-stage VPR methods with RANSAC-based spatial verification.

Re-Ranking Visual Place Recognition

MB-RACS: Measurement-Bounds-based Rate-Adaptive Image Compressed Sensing Network

no code implementations19 Jan 2024 Yujun Huang, Bin Chen, Naiqi Li, Baoyi An, Shu-Tao Xia, YaoWei Wang

In this paper, we propose a Measurement-Bounds-based Rate-Adaptive Image Compressed Sensing Network (MB-RACS) framework, which aims to adaptively determine the sampling rate for each image block in accordance with traditional measurement bounds theory.
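
As background on the measurement-bounds theory referenced above: in classical compressed sensing, a k-sparse signal of dimension n can be stably recovered from random (e.g., Gaussian) measurements once the measurement count m satisfies roughly

$$m \;\gtrsim\; C\,k\,\log\!\left(\frac{n}{k}\right)$$

for a constant $C$. This is the textbook bound, not the paper's block-adaptive rule, but it suggests why blocks with higher estimated sparsity levels would be assigned higher sampling rates.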

compressed sensing Image Compressed Sensing

VMamba: Visual State Space Model

13 code implementations18 Jan 2024 Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, YaoWei Wang, Qixiang Ye, Jianbin Jiao, Yunfan Liu

At the core of VMamba is a stack of Visual State-Space (VSS) blocks with the 2D Selective Scan (SS2D) module.

Computational Efficiency Language Modeling +4

Modality-Collaborative Test-Time Adaptation for Action Recognition

no code implementations CVPR 2024 Baochen Xiong, Xiaoshan Yang, Yaguang Song, YaoWei Wang, Changsheng Xu

Existing image-based TTA methods cannot be directly applied to this task because videos have domain shifts in both the multimodal and temporal dimensions, which brings difficulties to adaptation.

Action Recognition Test-time Adaptation +1

FFCA-Net: Stereo Image Compression via Fast Cascade Alignment of Side Information

no code implementations28 Dec 2023 Yichong Xia, Yujun Huang, Bin Chen, Haoqian Wang, YaoWei Wang

To address this limitation, we propose a Feature-based Fast Cascade Alignment network (FFCA-Net) to fully leverage the side information on the decoder.

Data Compression Decoder +2

Regressor-Segmenter Mutual Prompt Learning for Crowd Counting

no code implementations CVPR 2024 Mingyue Guo, Li Yuan, Zhaoyi Yan, Binghui Chen, YaoWei Wang, Qixiang Ye

In this study, we propose mutual prompt learning (mPrompt), which leverages a regressor and a segmenter as guidance for each other, solving bias and inaccuracy caused by annotation variance while distinguishing foreground from background.

Crowd Counting Prompt Learning

Recognizing Conditional Causal Relationships about Emotions and Their Corresponding Conditions

no code implementations28 Nov 2023 Xinhong Chen, Zongxi Li, YaoWei Wang, Haoran Xie, JianPing Wang, Qing Li

To highlight the context in such special causal relationships, we propose a new task to determine whether or not an input pair of emotion and cause has a valid causal relationship under different contexts and extract the specific context clauses that participate in the causal relationship.

valid

Uncovering Hidden Connections: Iterative Search and Reasoning for Video-grounded Dialog

2 code implementations11 Oct 2023 Haoyu Zhang, Meng Liu, YaoWei Wang, Da Cao, Weili Guan, Liqiang Nie

In response to these challenges, we present an iterative search and reasoning framework, which consists of a textual encoder, a visual encoder, and a generator.

Question Answering Response Generation +1

Learning Mask-aware CLIP Representations for Zero-Shot Segmentation

2 code implementations NeurIPS 2023 Siyu Jiao, Yunchao Wei, YaoWei Wang, Yao Zhao, Humphrey Shi

However, in the paper, we reveal that CLIP is insensitive to different mask proposals and tends to produce similar predictions for various mask proposals of the same image.

Open Vocabulary Semantic Segmentation Zero Shot Segmentation

MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning

1 code implementation25 Aug 2023 Bang Yang, Fenglin Liu, Xian Wu, YaoWei Wang, Xu sun, Yuexian Zou

To deal with the label shortage problem, we present a simple yet effective zero-shot approach MultiCapCLIP that can generate visual captions for different scenarios and languages without any labeled vision-caption pairs of downstream datasets.

Image Captioning Video Captioning

CiteTracker: Correlating Image and Text for Visual Tracking

1 code implementation ICCV 2023 Xin Li, Yuqing Huang, Zhenyu He, YaoWei Wang, Huchuan Lu, Ming-Hsuan Yang

Existing visual tracking methods typically take an image patch as the reference of the target to perform tracking.

Attribute Descriptive +2

MixBCT: Towards Self-Adapting Backward-Compatible Training

1 code implementation14 Aug 2023 Yu Liang, Yufeng Zhang, Shiliang Zhang, YaoWei Wang, Sheng Xiao, Rong Xiao, Xiaoyu Wang

Instance-based methods like L2 regression take into account the distribution of old features but impose strong constraints on the performance of the new model itself.

Face Recognition Image Retrieval +1

Benign Shortcut for Debiasing: Fair Visual Recognition via Intervention with Shortcut Features

no code implementations13 Aug 2023 Yi Zhang, Jitao Sang, Junyang Wang, Dongmei Jiang, YaoWei Wang

To this end, we propose Shortcut Debiasing, which first transfers the target task's learning of bias attributes from bias features to shortcut features, and then employs causal intervention to eliminate shortcut features during inference.

Fairness

Strip-MLP: Efficient Token Interaction for Vision MLP

1 code implementation ICCV 2023 Guiping Cao, Shengda Luo, Wenjian Huang, Xiangyuan Lan, Dongmei Jiang, YaoWei Wang, JianGuo Zhang

Finally, based on the Strip MLP layer, we propose a novel Local Strip Mixing Module (LSMM) to boost the token interaction power in the local region.

HQG-Net: Unpaired Medical Image Enhancement with High-Quality Guidance

no code implementations15 Jul 2023 Chunming He, Kai Li, Guoxia Xu, Jiangpeng Yan, Longxiang Tang, Yulun Zhang, Xiu Li, YaoWei Wang

Specifically, we extract features from an HQ image and explicitly insert the features, which are expected to encode HQ cues, into the enhancement network to guide the LQ enhancement with the variational normalization module.

Image Enhancement Medical Image Enhancement

Improving Deep Representation Learning via Auxiliary Learnable Target Coding

1 code implementation30 May 2023 KangJun Liu, Ke Chen, Kui Jia, YaoWei Wang

Deep representation learning is a subfield of machine learning that focuses on learning meaningful and useful representations of data through deep neural networks.

Representation Learning Retrieval +1

ShuffleMix: Improving Representations via Channel-Wise Shuffle of Interpolated Hidden States

1 code implementation30 May 2023 KangJun Liu, Ke Chen, Lihua Guo, YaoWei Wang, Kui Jia

Inspired by good robustness of alternative dropout strategies against over-fitting on limited patterns of training samples, this paper introduces a novel concept of ShuffleMix -- Shuffle of Mixed hidden features, which can be interpreted as a kind of dropout operation in feature space.

Benchmarking Data Augmentation +1

Manifold-Aware Self-Training for Unsupervised Domain Adaptation on Regressing 6D Object Pose

1 code implementation18 May 2023 Yichen Zhang, Jiehong Lin, Ke Chen, Zelin Xu, YaoWei Wang, Kui Jia

Domain gap between synthetic and real data in visual regression (e.g., 6D pose estimation) is bridged in this paper via global feature alignment and local refinement on the coarse classification of discretized anchor classes in target space, which imposes a piece-wise target manifold regularization into domain-invariant representation learning.

6D Pose Estimation regression +2

CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding

3 code implementations15 May 2023 Linhui Xiao, Xiaoshan Yang, Fang Peng, Ming Yan, YaoWei Wang, Changsheng Xu

In order to utilize vision and language pre-trained models to address the grounding problem, and reasonably take advantage of pseudo-labels, we propose CLIP-VG, a novel method that can conduct self-paced curriculum adapting of CLIP with pseudo-language labels.

Diversity Transfer Learning +1

Towards Efficient Task-Driven Model Reprogramming with Foundation Models

no code implementations5 Apr 2023 Shoukai Xu, Jiangchao Yao, Ran Luo, Shuhai Zhang, Zihao Lian, Mingkui Tan, Bo Han, YaoWei Wang

Moreover, the data used for pretraining foundation models are usually invisible and very different from the target data of downstream tasks.

Knowledge Distillation Transfer Learning

Reliability-Hierarchical Memory Network for Scribble-Supervised Video Object Segmentation

1 code implementation25 Mar 2023 Zikun Zhou, Kaige Mao, Wenjie Pei, Hongpeng Wang, YaoWei Wang, Zhenyu He

To be specific, RHMNet first only uses the memory in the high-reliability level to locate the region with high reliability belonging to the target, which is highly similar to the initial target scribble.

Semantic Segmentation Video Object Segmentation +1

ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation

2 code implementations11 Mar 2023 Bang Yang, Fenglin Liu, Yuexian Zou, Xian Wu, YaoWei Wang, David A. Clifton

We present the results of extensive experiments on twelve NLG tasks, showing that, without using any labeled downstream pairs for training, ZeroNLG generates high-quality and believable outputs and significantly outperforms existing zero-shot methods.

Image Captioning Image to text +6

Unsupervised Domain Adaptation via Distilled Discriminative Clustering

1 code implementation23 Feb 2023 Hui Tang, YaoWei Wang, Kui Jia

Differently, motivated by the fundamental assumption for domain adaptability, we re-cast the domain adaptation problem as discriminative clustering of target data, given strong privileged information provided by the closely related, labeled source data.

 Ranked #1 on Unsupervised Domain Adaptation on VisDA2017 (Average Accuracy metric)

Clustering Unsupervised Domain Adaptation +1

Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey

1 code implementation20 Feb 2023 Xiao Wang, Guangyao Chen, Guangwu Qian, Pengcheng Gao, Xiao-Yong Wei, YaoWei Wang, Yonghong Tian, Wen Gao

We also give visualization and analysis of the model parameters and results on representative downstream tasks.

Survey

DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition

1 code implementation3 Feb 2023 Jiayu Jiao, Yu-Ming Tang, Kun-Yu Lin, Yipeng Gao, Jinhua Ma, YaoWei Wang, Wei-Shi Zheng

In this work, we explore effective Vision Transformers to pursue a preferable trade-off between the computational complexity and size of the attended receptive field.

Instance Segmentation object-detection +2

CIGAR: Cross-Modality Graph Reasoning for Domain Adaptive Object Detection

no code implementations CVPR 2023 Yabo Liu, Jinghua Wang, Chao Huang, YaoWei Wang, Yong Xu

To overcome these problems, we propose a cross-modality graph reasoning adaptation (CIGAR) method to take advantage of both visual and linguistic knowledge.

Graph Matching object-detection +1

AsyFOD: An Asymmetric Adaptation Paradigm for Few-Shot Domain Adaptive Object Detection

1 code implementation CVPR 2023 Yipeng Gao, Kun-Yu Lin, Junkai Yan, YaoWei Wang, Wei-Shi Zheng

Critically, in FSDAOD, the data-scarcity in the target domain leads to an extreme data imbalance between the source and target domains, which potentially causes over-adaptation in traditional feature alignment.

object-detection Object Detection

Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples

1 code implementation CVPR 2023 Jiaming Zhang, Xingjun Ma, Qi Yi, Jitao Sang, Yu-Gang Jiang, YaoWei Wang, Changsheng Xu

Furthermore, we propose to leverage Vision-and-Language Pre-trained Models (VLPMs) like CLIP as the surrogate model to improve the transferability of the crafted UCs to diverse domains.

Data Poisoning

Universal Object Detection with Large Vision Model

1 code implementation19 Dec 2022 Feng Lin, Wenze Hu, YaoWei Wang, Yonghong Tian, Guangming Lu, Fanglin Chen, Yong Xu, Xiaoyu Wang

In this study, our focus is on a specific challenge: the large-scale, multi-domain universal object detection problem, which contributes to the broader goal of achieving a universal vision system.

model Object +2

Isolation and Impartial Aggregation: A Paradigm of Incremental Learning without Interference

1 code implementation29 Nov 2022 Yabin Wang, Zhiheng Ma, Zhiwu Huang, YaoWei Wang, Zhou Su, Xiaopeng Hong

To avoid obvious stage learning bottlenecks, we propose a brand-new stage-isolation based incremental learning framework, which leverages a series of stage-isolated classifiers to perform the learning task of each stage without the interference of others.

Continual Learning Incremental Learning

SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification

1 code implementation28 Nov 2022 Fang Peng, Xiaoshan Yang, Linhui Xiao, YaoWei Wang, Changsheng Xu

Although significant progress has been made in few-shot learning, most of existing few-shot image classification methods require supervised pre-training on a large amount of samples of base classes, which limits their generalization ability in real world application.

Few-Shot Image Classification Few-Shot Learning +3

Revisiting Color-Event based Tracking: A Unified Network, Dataset, and Metric

2 code implementations20 Nov 2022 Chuanming Tang, Xiao Wang, Ju Huang, Bo Jiang, Lin Zhu, Jianlin Zhang, YaoWei Wang, Yonghong Tian

In this paper, we propose a single-stage backbone network for Color-Event Unified Tracking (CEUTrack), which achieves the above functions simultaneously.

Object Localization Object Tracking

HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors

3 code implementations17 Nov 2022 Xiao Wang, Zongzhen Wu, Bo Jiang, Zhimin Bao, Lin Zhu, Guoqi Li, YaoWei Wang, Yonghong Tian

Mainstream human activity recognition (HAR) algorithms are developed based on RGB cameras, which suffer from issues such as illumination variation, fast motion, privacy concerns, and large energy consumption.

Activity Prediction Human Activity Recognition +1

Spikformer: When Spiking Neural Network Meets Transformer

2 code implementations29 Sep 2022 Zhaokun Zhou, Yuesheng Zhu, Chao He, YaoWei Wang, Shuicheng Yan, Yonghong Tian, Li Yuan

Spikformer (66.3M parameters), with a size comparable to SEW-ResNet-152 (60.2M, 69.26%), can achieve 74.81% top-1 accuracy on ImageNet using 4 time steps, which is the state of the art among directly trained SNN models.

image-classification Image Classification

Learned Distributed Image Compression with Multi-Scale Patch Matching in Feature Domain

no code implementations6 Sep 2022 Yujun Huang, Bin Chen, Shiyu Qin, Jiawei Li, YaoWei Wang, Tao Dai, Shu-Tao Xia

Specifically, MSFDPM consists of a side information feature extractor, a multi-scale feature domain patch matching module, and a multi-scale feature fusion network.

Decoder Image Compression +1

Identifying the kind behind SMILES—anatomical therapeutic chemical classification using structure-only representations

1 code implementation Briefings in Bioinformatics 2022 Yi Cao, Zhen-Qun Yang, Xu-Lu Zhang, Wenqi Fan, YaoWei Wang, Jiajun Shen, Dong-Qing Wei, Qing Li, Xiao-Yong Wei

The model offers better explainability in the sense that it consists of a straightforward tokenization that extracts and embeds statistically and physicochemically meaningful tokens, and a deep network backed by a set of pyramid kernels to capture multi-resolution chemical structural characteristics.

Drug ATC Classification Molecular Property Prediction

DAS: Densely-Anchored Sampling for Deep Metric Learning

1 code implementation30 Jul 2022 Lizhao Liu, Shangxin Huang, Zhuangwei Zhuang, Ran Yang, Mingkui Tan, YaoWei Wang

To this end, we propose a Densely-Anchored Sampling (DAS) scheme that considers the embedding with corresponding data point as "anchor" and exploits the anchor's nearby embedding space to densely produce embeddings without data points.

Face Recognition Image Retrieval +2

Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval

no code implementations17 Jun 2022 Xiao Dong, Xunlin Zhan, Yunchao Wei, XiaoYong Wei, YaoWei Wang, Minlong Lu, Xiaochun Cao, Xiaodan Liang

Our goal in this research is to study a more realistic environment in which we can conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.

Retrieval

Prompt-based Learning for Unpaired Image Captioning

no code implementations26 May 2022 Peipei Zhu, Xiao Wang, Lin Zhu, Zhenglong Sun, Weishi Zheng, YaoWei Wang, Changwen Chen

Inspired by the success of Vision-Language Pre-Trained Models (VL-PTMs) in this research, we attempt to infer the cross-domain cue information about a given image from the large VL-PTMs for the UIC task.

Image Captioning Image-text Retrieval +3

Global-Supervised Contrastive Loss and View-Aware-Based Post-Processing for Vehicle Re-Identification

no code implementations17 Apr 2022 Zhijun Hu, Yong Xu, Jie Wen, Xianjing Cheng, Zaijun Zhang, Lilei Sun, YaoWei Wang

The proposed VABPP method is the first to use a view-aware-based approach as a post-processing step in the field of vehicle re-identification.

Attribute Vehicle Re-Identification

Fine-Grained Object Classification via Self-Supervised Pose Alignment

2 code implementations CVPR 2022 Xuhui Yang, YaoWei Wang, Ke Chen, Yong Xu, Yonghong Tian

Semantic patterns of fine-grained objects are determined by subtle appearance difference of local parts, which thus inspires a number of part-based methods.

Classification Object +1

Boost Test-Time Performance with Closed-Loop Inference

no code implementations21 Mar 2022 Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Guanghui Xu, Haokun Li, Peilin Zhao, Junzhou Huang, YaoWei Wang, Mingkui Tan

Motivated by this, we propose to predict those hard-classified test samples in a looped manner to boost the model performance.

Auxiliary Learning

Mixed-Precision Neural Network Quantization via Learned Layer-wise Importance

1 code implementation16 Mar 2022 Chen Tang, Kai Ouyang, Zhi Wang, Yifei Zhu, YaoWei Wang, Wen Ji, Wenwu Zhu

For example, MPQ search on ResNet18 with our indicators takes only 0.06 s, which improves time efficiency exponentially compared to iterative search methods.

Quantization

Peng Cheng Object Detection Benchmark for Smart City

no code implementations11 Mar 2022 YaoWei Wang, Zhouxin Yang, Rui Liu, Deng Li, Yuandu Lai, Leyuan Fang, Yahong Han

Considering the diversity and complexity of scenes in intelligent city governance, we build a large-scale object detection benchmark for the smart city.

Diversity Object +2

Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition

no code implementations7 Mar 2022 Peipei Zhu, Xiao Wang, Yong Luo, Zhenglong Sun, Wei-Shi Zheng, YaoWei Wang, Changwen Chen

The image-level labels are utilized to train a weakly-supervised object recognition model to extract object information (e.g., instance) in an image, and the extracted instances are adopted to infer the relationships among different objects based on an enhanced graph neural network (GNN).

Graph Neural Network Image Captioning +3

Boosting Crowd Counting via Multifaceted Attention

1 code implementation CVPR 2022 Hui Lin, Zhiheng Ma, Rongrong Ji, YaoWei Wang, Xiaopeng Hong

Secondly, we design the Local Attention Regularization to supervise the training of LRA by minimizing the deviation among the attention for different feature locations.

Crowd Counting

Conceptor Learning for Class Activation Mapping

no code implementations21 Jan 2022 Guangwu Qian, Zhen-Qun Yang, Xu-Lu Zhang, YaoWei Wang, Qing Li, Xiao-Yong Wei

Class Activation Mapping (CAM) has been widely adopted to generate saliency maps which provides visual explanations for deep neural networks (DNNs).

Relation

Towards End-to-End Image Compression and Analysis with Transformers

1 code implementation17 Dec 2021 Yuanchao Bai, Xu Yang, Xianming Liu, Junjun Jiang, YaoWei Wang, Xiangyang Ji, Wen Gao

Meanwhile, we propose a feature aggregation module to fuse the compressed features with the selected intermediate features of the Transformer, and feed the aggregated features to a deconvolutional neural network for image reconstruction.

Classification image-classification +4

Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection

no code implementations16 Dec 2021 Rui Liu, Yahong Han, YaoWei Wang, Qi Tian

In the second stage, augmented source and target data with pseudo labels are adopted to perform the self-training for prediction consistency.

Object object-detection +1

Learning to Share in Multi-Agent Reinforcement Learning

2 code implementations16 Dec 2021 Yuxuan Yi, Ge Li, YaoWei Wang, Zongqing Lu

Inspired by the fact that sharing plays a key role in human's learning of cooperation, we propose LToS, a hierarchically decentralized MARL framework that enables agents to learn to dynamically share reward with neighbors so as to encourage agents to cooperate on the global objective through collectives.

Multi-agent Reinforcement Learning reinforcement-learning +2

An Informative Tracking Benchmark

1 code implementation13 Dec 2021 Xin Li, Qiao Liu, Wenjie Pei, Qiuhong Shen, YaoWei Wang, Huchuan Lu, Ming-Hsuan Yang

Along with the rapid progress of visual tracking, existing benchmarks become less informative due to redundancy of samples and weak discrimination between current trackers, making evaluations on all datasets extremely time-consuming.

Diversity Visual Tracking

Optimized Separable Convolution: Yet Another Efficient Convolution Operator

no code implementations29 Sep 2021 Tao Wei, Yonghong Tian, YaoWei Wang, Yun Liang, Chang Wen Chen

In this research, we propose a novel and principled operator called optimized separable convolution, which, by optimally designing the internal number of groups and kernel sizes of general separable convolutions, can achieve a complexity of $O(C^{\frac{3}{2}}K)$.
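
For context, per output location with $C$ input/output channels and a $K \times K$ kernel, the textbook complexities of standard and depthwise-separable convolutions sit alongside the paper's claimed bound as follows (the first two are standard figures; the third is the paper's result):

$$\text{standard: } O(C^{2}K^{2}), \qquad \text{depthwise separable: } O(CK^{2} + C^{2}), \qquad \text{optimized separable (claimed): } O(C^{\frac{3}{2}}K).$$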

M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining

no code implementations CVPR 2022 Xiao Dong, Xunlin Zhan, Yangxin Wu, Yunchao Wei, Michael C. Kampffmeyer, XiaoYong Wei, Minlong Lu, YaoWei Wang, Xiaodan Liang

Despite the potential of multi-modal pre-training to learn highly discriminative feature representations from complementary data modalities, current progress is being slowed by the lack of large-scale modality-diverse datasets.

Contrastive Learning

VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows

2 code implementations11 Aug 2021 Xiao Wang, Jianing Li, Lin Zhu, Zhipeng Zhang, Zhe Chen, Xin Li, YaoWei Wang, Yonghong Tian, Feng Wu

Different from visible cameras which record intensity images frame by frame, the biologically inspired event camera produces a stream of asynchronous and sparse events with much lower latency.

Object Tracking

MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking

2 code implementations22 Jul 2021 Xiao Wang, Xiujun Shu, Shiliang Zhang, Bo Jiang, YaoWei Wang, Yonghong Tian, Feng Wu

The visible and thermal filters will be used to conduct a dynamic convolutional operation on their corresponding input feature maps respectively.

Rgb-T Tracking

Direct Measure Matching for Crowd Counting

no code implementations4 Jul 2021 Hui Lin, Xiaopeng Hong, Zhiheng Ma, Xing Wei, Yunfeng Qiu, YaoWei Wang, Yihong Gong

Second, we derive a semi-balanced form of Sinkhorn divergence, based on which a Sinkhorn counting loss is designed for measure matching.
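
For reference, the standard (balanced) Sinkhorn divergence that the semi-balanced form builds on is, for measures $\alpha, \beta$, cost $C$, and regularization $\varepsilon$:

$$\mathrm{OT}_\varepsilon(\alpha,\beta)=\min_{\pi\in\Pi(\alpha,\beta)}\langle \pi, C\rangle+\varepsilon\,\mathrm{KL}(\pi\,\|\,\alpha\otimes\beta),\qquad S_\varepsilon(\alpha,\beta)=\mathrm{OT}_\varepsilon(\alpha,\beta)-\tfrac{1}{2}\mathrm{OT}_\varepsilon(\alpha,\alpha)-\tfrac{1}{2}\mathrm{OT}_\varepsilon(\beta,\beta);$$

the paper's semi-balanced variant relaxes the marginal constraints on one side, and its exact form is not reproduced here.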

Crowd Counting

Self-Supervised Tracking via Target-Aware Data Synthesis

no code implementations21 Jun 2021 Xin Li, Wenjie Pei, YaoWei Wang, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang

While deep-learning based tracking methods have achieved substantial progress, they entail large-scale and high-quality annotated data for sufficient training.

Representation Learning Self-Supervised Learning +1

Learning Scalable $\ell_\infty$-Constrained Near-Lossless Image Compression via Joint Lossy Image and Residual Compression

no code implementations CVPR 2021 Yuanchao Bai, Xianming Liu, WangMeng Zuo, YaoWei Wang, Xiangyang Ji

To achieve scalable compression with the error bound larger than zero, we derive the probability model of the quantized residual by quantizing the learned probability model of the original residual, instead of training multiple networks.
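
The step of deriving the quantized residual's probability model from the learned model can be sketched as summing probability mass over quantization bins; the uniform bin width $2\tau+1$ and the toy Laplacian-like PMF below are assumptions for illustration, not the paper's learned model.

```python
import numpy as np

def quantize_pmf(pmf: np.ndarray, values: np.ndarray, tau: int):
    """Aggregate a learned PMF over residual values into a PMF over quantized residuals."""
    bin_width = 2 * tau + 1
    bin_ids = np.round(values / bin_width).astype(int)   # bin index of each residual value
    bins = np.unique(bin_ids)
    quant_values = bins * bin_width                      # reconstruction level per bin
    quant_pmf = np.array([pmf[bin_ids == b].sum() for b in bins])
    return quant_values, quant_pmf

values = np.arange(-8, 9)                    # possible residual values (illustrative range)
pmf = np.exp(-np.abs(values) / 2.0)          # stand-in for a learned Laplacian-like PMF
pmf /= pmf.sum()

qv, qp = quantize_pmf(pmf, values, tau=1)
print(qv, qp.sum())                          # probability mass is preserved (sums to 1)
```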

Image Compression

Tracking by Joint Local and Global Search: A Target-aware Attention based Approach

1 code implementation9 Jun 2021 Xiao Wang, Jin Tang, Bin Luo, YaoWei Wang, Yonghong Tian, Feng Wu

In this paper, we propose a novel and general target-aware attention mechanism (termed TANet) and integrate it with tracking-by-detection framework to conduct joint local and global search for robust tracking.

Decoder Object +1

Anomaly Detection with Prototype-Guided Discriminative Latent Embeddings

no code implementations30 Apr 2021 Yuandu Lai, Yahong Han, YaoWei Wang

Recent efforts towards video anomaly detection (VAD) try to learn a deep autoencoder to describe normal event patterns with small reconstruction errors.

Anomaly Detection Optical Flow Estimation +1

AAformer: Auto-Aligned Transformer for Person Re-Identification

no code implementations2 Apr 2021 Kuan Zhu, Haiyun Guo, Shiliang Zhang, YaoWei Wang, Jing Liu, Jinqiao Wang, Ming Tang

In this article, we introduce an alignment scheme in transformer architecture for the first time and propose the auto-aligned transformer (AAformer) to automatically locate both the human parts and nonhuman ones at patch level.

Human Parsing Image Classification +3

Learning Scalable $\ell_\infty$-constrained Near-lossless Image Compression via Joint Lossy Image and Residual Compression

no code implementations31 Mar 2021 Yuanchao Bai, Xianming Liu, WangMeng Zuo, YaoWei Wang, Xiangyang Ji

To achieve scalable compression with the error bound larger than zero, we derive the probability model of the quantized residual by quantizing the learned probability model of the original residual, instead of training multiple networks.

Image Compression

Dynamic Attention guided Multi-Trajectory Analysis for Single Object Tracking

1 code implementation30 Mar 2021 Xiao Wang, Zhe Chen, Jin Tang, Bin Luo, YaoWei Wang, Yonghong Tian, Feng Wu

In this paper, we propose to introduce more dynamics by devising a dynamic attention-guided multi-trajectory tracking strategy.

Object Tracking

Classification of Single-View Object Point Clouds

no code implementations18 Dec 2020 Zelin Xu, Ke Chen, KangJun Liu, Changxing Ding, YaoWei Wang, Kui Jia

By adapting the existing ModelNet40 and ScanNet datasets to the single-view, partial setting, experimental results verify the necessity of object pose estimation and the superiority of our PAPNet over existing classifiers.

3D Object Classification 6D Pose Estimation using RGB +6

Modular Graph Attention Network for Complex Visual Relational Reasoning

no code implementations22 Nov 2020 Yihan Zheng, Zhiquan Wen, Mingkui Tan, Runhao Zeng, Qi Chen, YaoWei Wang, Qi Wu

Moreover, to capture the complex logic in a query, we construct a relational graph to represent the visual objects and their relationships, and propose a multi-step reasoning method to progressively understand the complex logic.

Graph Attention Question Answering +5
