Search Results for author: Junyu Gao

Found 48 papers, 24 papers with code

From Captions to Rewards (CAREVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models

no code implementations 8 Mar 2025 Muzhi Dai, Jiashuo Sun, Zhiyuan Zhao, Shixuan Liu, Rui Li, Junyu Gao, Xuelong Li

Aligning large vision-language models (LVLMs) with human preferences is challenging due to the scarcity of fine-grained, high-quality, and multimodal preference data without human annotations.

Image Captioning Language Modeling +2

A Benchmark for Multi-Lingual Vision-Language Learning in Remote Sensing Image Captioning

no code implementations 6 Mar 2025 Qing Zhou, Tao Yang, Junyu Gao, Weiping Ni, Junzheng Wu, Qi Wang

Remote Sensing Image Captioning (RSIC) is a cross-modal field bridging vision and language, aimed at automatically generating natural language descriptions of features and scenes in remote sensing imagery.

Descriptive Image Captioning

FGAseg: Fine-Grained Pixel-Text Alignment for Open-Vocabulary Semantic Segmentation

1 code implementation 1 Jan 2025 Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li

The core of FGAseg is a Pixel-Level Alignment module that employs a cross-modal attention mechanism and a text-pixel alignment loss to refine the coarse-grained alignment from CLIP, achieving finer-grained pixel-text semantic alignment.

Open Vocabulary Semantic Segmentation Open-Vocabulary Semantic Segmentation +1

SignEye: Traffic Sign Interpretation from Vehicle First-Person View

no code implementations 18 Nov 2024 Chuang Yang, Xu Han, Tao Han, Yuejiao Su, Junyu Gao, Hongyuan Zhang, Yi Wang, Lap-Pui Chau

Meanwhile, we develop a traffic guidance assistant (TGA) scenario application to re-explore the role of traffic signs in ADS as a complement to popular autonomous technologies (such as obstacle perception).

Autonomous Driving

Real-Time Text Detection with Similar Mask in Traffic, Industrial, and Natural Scenes

1 code implementation 5 Nov 2024 Xu Han, Junyu Gao, Chuang Yang, Yuan Yuan, Qi Wang

In addition, to validate the scene robustness of the SM-Net, we conduct experiments on traffic, industrial, and natural scene datasets.

Text Detection

Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models

1 code implementation 11 Oct 2024 Mengyuan Chen, Junyu Gao, Changsheng Xu

A straightforward pipeline for zero-shot out-of-distribution (OOD) detection involves selecting potential OOD labels from an extensive semantic pool and then leveraging a pre-trained vision-language model to perform classification on both in-distribution (ID) and OOD labels.

Out of Distribution (OOD) Detection

Revisiting Essential and Nonessential Settings of Evidential Deep Learning

1 code implementation 1 Oct 2024 Mengyuan Chen, Junyu Gao, Changsheng Xu

Evidential Deep Learning (EDL) is an emerging method for uncertainty estimation that provides reliable predictive uncertainty in a single forward pass, attracting significant attention.

Common Sense Reasoning Deep Learning +1

Focus Entirety and Perceive Environment for Arbitrary-Shaped Text Detection

no code implementations 25 Sep 2024 Xu Han, Junyu Gao, Chuang Yang, Yuan Yuan, Qi Wang

The latter extracts region-level information and encourages the model to focus on the distribution of positive samples in the vicinity of a pixel, which perceives environment information.

Text Detection

A Comprehensive Survey on Evidential Deep Learning and Its Applications

1 code implementation 7 Sep 2024 Junyu Gao, Mengyuan Chen, Liangyu Xiang, Changsheng Xu

To address this challenge, a novel paradigm called Evidential Deep Learning (EDL) has emerged, providing reliable uncertainty estimation with minimal additional computation in a single forward pass.

Autonomous Driving Deep Learning +2

A Training-Free Framework for Video License Plate Tracking and Recognition with Only One-Shot

1 code implementation 11 Aug 2024 Haoxuan Ding, Qi Wang, Junyu Gao, Qiang Li

We propose OneShotLP, a training-free framework for video-based license plate detection and recognition, leveraging these advanced models.

License Plate Detection License Plate Recognition +3

Text-only Synthesis for Image Captioning

no code implementations 28 May 2024 Qing Zhou, Junlin Huang, Qiang Li, Junyu Gao, Qi Wang

In this paper, we propose Text-only Synthesis for Image Captioning (ToCa), which further advances this relaxation with less human labor and less computing time.

Image Captioning Language Modelling +2

U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation

1 code implementation 24 May 2024 Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li

To address this issue, we leverage the inherent capabilities of the model itself to discover the optimal equilibrium in multimodal fusion and introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.

Segmentation Semantic Segmentation

Embedding Generalized Semantic Knowledge into Few-Shot Remote Sensing Segmentation

no code implementations 22 May 2024 Yuyu Jia, Wei Huang, Junyu Gao, Qi Wang, Qiang Li

Few-shot segmentation (FSS) for remote sensing (RS) imagery leverages supporting information from limited annotated samples to achieve query segmentation of novel classes.

Segmentation

Like Humans to Few-Shot Learning through Knowledge Permeation of Vision and Text

no code implementations 21 May 2024 Yuyu Jia, Qing Zhou, Wei Huang, Junyu Gao, Qi Wang

Few-shot learning aims to generalize the recognizer from seen categories to an entirely novel scenario.

Few-Shot Learning Specificity

Dynamic Proxy Domain Generalizes the Crowd Localization by Better Binary Segmentation

1 code implementation 22 Apr 2024 Junyu Gao, Da Zhang, Xuelong Li

Then, based on the theory, we design a DPD algorithm composed of a training paradigm and a proxy domain generator to enhance the domain generalization of the confidence-threshold learner.

Binary Classification Domain Generalization

NWPU-MOC: A Benchmark for Fine-grained Multi-category Object Counting in Aerial Images

1 code implementation 19 Jan 2024 Junyu Gao, Liangliang Zhao, Xuelong Li

Considering the absence of a dataset for this task, a large-scale dataset (NWPU-MOC) is collected, consisting of 3,416 scenes with a resolution of 1024 $\times$ 1024 pixels, annotated with 14 fine-grained object categories.

Object Object Counting

SamLP: A Customized Segment Anything Model for License Plate Detection

1 code implementation 12 Jan 2024 Haoxuan Ding, Junyu Gao, Yuan Yuan, Qi Wang

Meanwhile, the proposed SamLP shows strong few-shot and zero-shot learning ability, which demonstrates the potential of transferring vision foundation models.

License Plate Detection Zero-Shot Learning

Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation

1 code implementation 22 Nov 2023 Junyu Gao, Xuan Yao, Changsheng Xu

Such agents are typically required to execute user instructions in an online manner, leading us to explore the use of unlabeled test samples for effective online model adaptation.

Navigate Test-time Adaptation +1

Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding

1 code implementation 6 Nov 2023 Shengkai Sun, Daizong Liu, Jianfeng Dong, Xiaoye Qu, Junyu Gao, Xun Yang, Xun Wang, Meng Wang

In this manner, our framework is able to learn the unified representations of uni-modal or multi-modal skeleton input, which is flexible to different kinds of modality input for robust action understanding in practical cases.

Action Understanding Representation Learning +1

Learning Transferable Conceptual Prototypes for Interpretable Unsupervised Domain Adaptation

no code implementations 12 Oct 2023 Junyu Gao, Xinhong Ma, Changsheng Xu

Despite the great progress of unsupervised domain adaptation (UDA) with deep neural networks, current UDA models are opaque and cannot provide promising explanations, limiting their applications in scenarios that require safe and controllable model decisions.

Decision Making Pseudo Label +2

Multimodal Imbalance-Aware Gradient Modulation for Weakly-supervised Audio-Visual Video Parsing

no code implementations 5 Jul 2023 Jie Fu, Junyu Gao, Changsheng Xu

In this paper, to balance the feature learning processes of different modalities, a dynamic gradient modulation (DGM) mechanism is explored, where a novel and effective metric function is designed to measure the imbalanced feature learning between audio and visual modalities.

Imbalanced Aircraft Data Anomaly Detection

no code implementations 17 May 2023 Hao Yang, Junyu Gao, Yuan Yuan, Xuelong Li

Anomaly detection in temporal data from sensors under aviation scenarios is a practical but challenging task: 1) it is difficult to extract temporally correlated contextual information from long temporal data; 2) anomalous data are rare in time series, causing a normal/abnormal imbalance that makes the detector's classification degenerate or even fail.

Anomaly Detection Time Series

Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception

1 code implementation CVPR 2023 Junyu Gao, Mengyuan Chen, Changsheng Xu

We argue that, for an event residing in one modality, the modality itself should provide ample presence evidence of this event, while the other complementary modality is encouraged to afford the absence evidence as a reference signal.

Cascade Evidential Learning for Open-World Weakly-Supervised Temporal Action Localization

no code implementations CVPR 2023 Mengyuan Chen, Junyu Gao, Changsheng Xu

Aiming to recognize and localize action instances with only video-level labels during training, Weakly-supervised Temporal Action Localization (WTAL) has achieved significant progress in recent years.

Open Set Learning Weakly-supervised Temporal Action Localization +1

Counting Like Human: Anthropoid Crowd Counting on Modeling the Similarity of Objects

no code implementations 2 Dec 2022 Qi Wang, Juncheng Wang, Junyu Gao, Yuan Yuan, Xuelong Li

Mainstream crowd counting methods regress a density map and integrate it to obtain counting results.

Crowd Counting

MAFNet: A Multi-Attention Fusion Network for RGB-T Crowd Counting

no code implementations 14 Aug 2022 PengYu Chen, Junyu Gao, Yuan Yuan, Qi Wang

RGB-Thermal (RGB-T) crowd counting is a challenging task, which uses thermal images as complementary information to RGB images to deal with the decreased performance of unimodal RGB-based methods in scenes with low-illumination or similar backgrounds.

Crowd Counting

Crowd Localization from Gaussian Mixture Scoped Knowledge and Scoped Teacher

no code implementations 12 Jun 2022 Juncheng Wang, Junyu Gao, Yuan Yuan, Qi Wang

Intrinsic scale shift is one of the most essential issues in crowd localization because it is ubiquitous in crowd scenes and makes the scale distribution chaotic.

Learning Muti-expert Distribution Calibration for Long-tailed Video Classification

no code implementations 22 May 2022 Yufan Hu, Junyu Gao, Changsheng Xu

Most existing state-of-the-art video classification methods assume that the training data obey a uniform distribution.

Classification Image Classification +1

Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding

1 code implementation 4 Apr 2022 Ziyue Wu, Junyu Gao, Shucheng Huang, Changsheng Xu

Then, a commonsense-aware interaction module is designed to obtain bridged visual and text features by utilizing the learned commonsense concepts.

cross-modal alignment Natural Language Queries

Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization

1 code implementation CVPR 2022 Junyu Gao, Mengyuan Chen, Changsheng Xu

We target the task of weakly-supervised action localization (WSAL), where only video-level action labels are available during model training.

Classification Contrastive Learning +4

DR.VIC: Decomposition and Reasoning for Video Individual Counting

2 code implementations CVPR 2022 Tao Han, Lei Bai, Junyu Gao, Qi Wang, Wanli Ouyang

Instead of relying on the Multiple Object Tracking (MOT) techniques, we propose to solve the problem by decomposing all pedestrians into the initial pedestrians who existed in the first frame and the new pedestrians with separate identities in each following frame.

Crowd Counting Density Estimation +2

Weakly-Supervised Video Object Grounding via Causal Intervention

no code implementations 1 Dec 2021 Wei Wang, Junyu Gao, Changsheng Xu

With this in mind, we design a unified causal framework to learn the deconfounded object-relevant association for more accurate and robust video object grounding.

Contrastive Learning Object +1

LDC-Net: A Unified Framework for Localization, Detection and Counting in Dense Crowds

no code implementations 10 Oct 2021 Qi Wang, Tao Han, Junyu Gao, Yuan Yuan, Xuelong Li

The rapid development in visual crowd analysis shows a trend to count people by positioning or even detecting, rather than simply summing a density map.

Visual Crowd Analysis

Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification

no code implementations 12 Sep 2021 Qi Wang, Sikai Bai, Junyu Gao, Yuan Yuan, Xuelong Li

In addition, due to domain gaps between different datasets, the performance is dramatically decreased when re-ID models pre-trained on label-rich datasets (source domain) are directly applied to other unlabeled datasets (target domain).

Person Re-Identification Unsupervised Domain Adaptation

Congested Crowd Instance Localization with Dilated Convolutional Swin Transformer

1 code implementation 2 Aug 2021 Junyu Gao, Maoguo Gong, Xuelong Li

To this end, we propose a Dilated Convolutional Swin Transformer (DCST) for congested crowd scenes.

Crowd Counting Representation Learning

Video Crowd Localization with Multi-focus Gaussian Neighborhood Attention and a Large-Scale Benchmark

1 code implementation 19 Jul 2021 Haopeng Li, Lingbo Liu, Kunlin Yang, Shinan Liu, Junyu Gao, Bin Zhao, Rui Zhang, Jun Hou

Video crowd localization is a crucial yet challenging task, which aims to estimate exact locations of human heads in the given crowded videos.

Health Status Prediction with Local-Global Heterogeneous Behavior Graph

no code implementations 23 Mar 2021 Xuan Ma, Xiaoshan Yang, Junyu Gao, Changsheng Xu

However, these data streams are multi-source and heterogeneous, containing complex temporal structures with local contextual and global temporal aspects, which makes the feature learning and data joint utilization challenging.

Management

Fast Video Moment Retrieval

no code implementations ICCV 2021 Junyu Gao, Changsheng Xu

To tackle this issue, we replace the cross-modal interaction module with a cross-modal common space, in which moment-query alignment is learned and efficient moment search can be performed.

Moment Retrieval Retrieval +1

Active Universal Domain Adaptation

no code implementations ICCV 2021 Xinhong Ma, Junyu Gao, Changsheng Xu

This paper proposes a new paradigm for unsupervised domain adaptation, termed Active Universal Domain Adaptation (AUDA), which removes all label set assumptions and aims not only to recognize target samples from source classes but also to infer those from target-private classes by using active learning to annotate a small budget of target data.

Active Learning Universal Domain Adaptation +1

Learning Independent Instance Maps for Crowd Localization

1 code implementation 8 Dec 2020 Junyu Gao, Tao Han, Qi Wang, Yuan Yuan, Xuelong Li

Furthermore, to improve the segmentation quality for different density regions, we present a differentiable Binarization Module (BM) to output structured instance maps.

Binarization Segmentation

Unsupervised Semantic Aggregation and Deformable Template Matching for Semi-Supervised Learning

1 code implementation NeurIPS 2020 Tao Han, Junyu Gao, Yuan Yuan, Qi Wang

In this paper, we combine both to propose an Unsupervised Semantic Aggregation and Deformable Template Matching (USADTM) framework for SSL, which strives to improve the classification performance with few labeled data and thereby reduce the cost of data annotation.

Template Matching Triplet
