Search Results for author: Shaoxiang Chen

Found 16 papers, 4 papers with code

Hierarchical Visual-Textual Graph for Temporal Activity Localization via Language

1 code implementation • ECCV 2020 • Shaoxiang Chen, Yu-Gang Jiang

Temporal Activity Localization via Language (TALL) in video is a recently proposed challenging vision task, and tackling it requires fine-grained understanding of the video content; however, this is overlooked by most existing works.

Sentence


Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models

1 code implementation • 12 Mar 2024 • Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang

This adaptation leads to convenient development of such LMMs with minimal modifications; however, it overlooks the intrinsic characteristics of diverse visual tasks and hinders the learning of perception capabilities.

Concept Alignment, Language Modelling


LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs

no code implementations • 29 Jan 2024 • Shaoxiang Chen, Zequn Jie, Lin Ma

To address this issue, we propose to apply an efficient Mixture of Experts (MoE) design, which is a sparse Mixture of LoRA Experts (MoLE) for instruction finetuning MLLMs.

Language Modelling, Large Language Model


Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning

no code implementations • 13 Dec 2023 • Yang Jiao, Zequn Jie, Shaoxiang Chen, Lechao Cheng, Jingjing Chen, Lin Ma, Yu-Gang Jiang

The camera-based bird's-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.

3D Object Detection, Autonomous Driving, +3


Prompting Large Language Models to Reformulate Queries for Moment Localization

no code implementations • 6 Jun 2023 • Wenfeng Yan, Shaoxiang Chen, Zuxuan Wu, Yu-Gang Jiang

The task of moment localization is to localize a temporal moment in an untrimmed video for a given natural language query.

Moment Queries, Natural Language Queries


MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection

1 code implementation • CVPR 2023 • Yang Jiao, Zequn Jie, Shaoxiang Chen, Jingjing Chen, Lin Ma, Yu-Gang Jiang

Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images (referred to as seeds) into 3D space, and then incorporate 2D semantics via cross-modal interaction or fusion techniques.

3D Object Detection, Autonomous Driving, +1


MT-Net Submission to the Waymo 3D Detection Leaderboard

no code implementations • 11 Jul 2022 • Shaoxiang Chen, Zequn Jie, Xiaolin Wei, Lin Ma

In this technical report, we introduce our submission to the Waymo 3D Detection leaderboard.

3D Object Detection


MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes

1 code implementation • 10 Mar 2022 • Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang

3D dense captioning is a recently-proposed novel task, where point clouds contain more geometric information than the 2D counterpart.

3D Dense Captioning, Dense Captioning, +3


Self-supervised Learning for Semi-supervised Temporal Language Grounding

no code implementations • 23 Sep 2021 • Fan Luo, Shaoxiang Chen, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

Given a text description, Temporal Language Grounding (TLG) aims to localize temporal boundaries of the segments that contain the specified semantics in an untrimmed video.

Contrastive Learning, Pseudo Label, +2


FT-TDR: Frequency-guided Transformer and Top-Down Refinement Network for Blind Face Inpainting

no code implementations • 10 Aug 2021 • Junke Wang, Shaoxiang Chen, Zuxuan Wu, Yu-Gang Jiang

Blind face inpainting refers to the task of reconstructing visual contents without explicitly indicating the corrupted regions in a face image.

Facial Inpainting


Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning

no code implementations • CVPR 2021 • Shaoxiang Chen, Yu-Gang Jiang

Dense Event Captioning (DEC) aims to jointly localize and describe multiple events of interest in untrimmed videos, which is an advancement of the conventional video captioning task (generating a single sentence description for a trimmed video).

Sentence, Video Captioning


Motion Guided Region Message Passing for Video Captioning

no code implementations • ICCV 2021 • Shaoxiang Chen, Yu-Gang Jiang

In this paper, we aim to design a spatial information extraction and aggregation method for video captioning without the need for external object detectors.

Video Captioning


Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos

no code implementations • ECCV 2020 • Shaoxiang Chen, Wenhao Jiang, Wei Liu, Yu-Gang Jiang

Inspired by the fact that there exist cross-modal interactions in the human brain, we propose a novel method for learning pairwise modality interactions in order to better exploit complementary information for each pair of modalities in videos and thus improve performances on both tasks.

Sentence


Black-box Adversarial Attacks on Video Recognition Models

no code implementations • 10 Apr 2019 • Linxi Jiang, Xingjun Ma, Shaoxiang Chen, James Bailey, Yu-Gang Jiang

Using three benchmark video datasets, we demonstrate that V-BAD can craft both untargeted and targeted attacks to fool two state-of-the-art deep video recognition models.

Video Recognition


Non-local NetVLAD Encoding for Video Classification

no code implementations • 29 Sep 2018 • Yongyi Tang, Xing Zhang, Jingwen Wang, Shaoxiang Chen, Lin Ma, Yu-Gang Jiang

This paper describes our solution for the 2nd YouTube-8M video understanding challenge organized by Google AI.

Classification, General Classification, +3


Aggregating Frame-level Features for Large-Scale Video Classification

no code implementations • 4 Jul 2017 • Shaoxiang Chen, Xi Wang, Yongyi Tang, Xinpeng Chen, Zuxuan Wu, Yu-Gang Jiang

This paper introduces the system we developed for the Google Cloud & YouTube-8M Video Understanding Challenge, which can be considered as a multi-label classification problem defined on top of the large scale YouTube-8M Dataset.

Classification, General Classification, +3

