1 code implementation • ECCV 2020 • Mingfei Han, Yali Wang, Xiaojun Chang, Yu Qiao
Recent studies have shown that aggregating contextual information from proposals in different frames can clearly enhance the performance of video object detection.
Ranked #11 on Video Object Detection on ImageNet VID
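A minimal sketch of the general idea behind cross-frame proposal aggregation, not the paper's implementation: proposal features from the current frame cross-attend to proposals pooled from neighbouring support frames. All module names, dimensions, and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ProposalContextAggregator(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # Cross-attention: target-frame proposals attend to proposals
        # gathered from neighbouring support frames.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, target_props, support_props):
        # target_props:  (B, N, dim) proposal features of the current frame
        # support_props: (B, M, dim) proposal features from other frames
        ctx, _ = self.attn(target_props, support_props, support_props)
        return self.norm(target_props + ctx)  # residual enhancement

agg = ProposalContextAggregator()
tgt = torch.randn(2, 100, 256)   # 100 proposals in the current frame
sup = torch.randn(2, 300, 256)   # proposals pooled from 3 support frames
enhanced = agg(tgt, sup)         # (2, 100, 256)
```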
1 code implementation • 4 Apr 2024 • Yuetian Weng, Mingfei Han, Haoyu He, Xiaojun Chang, Bohan Zhuang
In this way, we encode video representations that incorporate both local and global information, enabling the LLM to generate comprehensive responses for long-term videos.
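A hedged sketch of the local-plus-global encoding idea described above: per-segment (local) tokens and a video-level (global) token are concatenated and projected into the LLM's embedding space. The function name, segment count, and projector are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn

def encode_video_tokens(frame_feats, num_segments=8, llm_dim=4096):
    # frame_feats: (T, D) per-frame features from a frozen visual backbone
    segs = frame_feats.chunk(num_segments, dim=0)
    local_tokens = torch.stack([s.mean(0) for s in segs])    # (num_segments, D)
    global_token = frame_feats.mean(0, keepdim=True)         # (1, D)
    tokens = torch.cat([local_tokens, global_token], dim=0)  # (num_segments+1, D)
    proj = nn.Linear(frame_feats.shape[1], llm_dim)          # hypothetical projector
    return proj(tokens)                                      # visual tokens for the LLM

feats = torch.randn(64, 768)       # 64 frames of CLIP-like features (assumed)
visual_tokens = encode_video_tokens(feats)
print(visual_tokens.shape)         # torch.Size([9, 4096])
```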
1 code implementation • 21 Dec 2023 • Mingfei Han, Linjie Yang, Xiaojie Jin, Jiashi Feng, Xiaojun Chang, Heng Wang
While existing datasets mainly comprise landscape-mode videos, our paper seeks to introduce portrait-mode videos to the research community and highlight the unique challenges associated with this video format.
1 code implementation • 16 Dec 2023 • Mingfei Han, Linjie Yang, Xiaojun Chang, Heng Wang
A human needs to capture both the event in every shot and associate the shots together to understand the story behind the video.
Ranked #1 on video narration captioning on Shot2Story20K
no code implementations • 4 Dec 2023 • Chengyou Jia, Minnan Luo, Xiaojun Chang, Zhuohang Dang, Mingfei Han, Mengmeng Wang, Guang Dai, Sizhe Dang, Jingdong Wang
To realize this, we innovatively blend video models with Large Language Models (LLMs) to devise Action-conditioned Prompts.
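An illustrative sketch of the action-conditioned prompt idea: prompts built around action phrases (hard-coded below; in the paper they are devised with an LLM) are scored against a video embedding CLIP-style. The text encoder and all names here are stand-in assumptions, not the authors' pipeline.

```python
import torch
import torch.nn.functional as F

class DummyTextEncoder:
    """Stand-in for a CLIP-style text encoder (an assumption, not the paper's model)."""
    def __call__(self, prompts):
        g = torch.Generator().manual_seed(0)
        return torch.randn(len(prompts), 512, generator=g)

def classify_with_action_prompts(video_feat, text_encoder, actions):
    # actions: LLM-generated action descriptions (hard-coded here for illustration)
    prompts = [f"a video of {a}" for a in actions]
    text_feats = text_encoder(prompts)                        # (num_actions, D)
    sims = F.cosine_similarity(video_feat[None], text_feats)  # (num_actions,)
    return sims.softmax(dim=-1)                               # class probabilities

probs = classify_with_action_prompts(
    torch.randn(512), DummyTextEncoder(),
    ["a person swinging a tennis racket", "a person riding a bicycle"],
)
print(probs)   # probabilities over the two hypothetical actions
```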
1 code implementation • NeurIPS 2023 • Yuetian Weng, Mingfei Han, Haoyu He, Mingjie Li, Lina Yao, Xiaojun Chang, Bohan Zhuang
By reusing predictions from key frames, we circumvent the need to process a large volume of video frames individually with resource-intensive segmentors, alleviating temporal redundancy and significantly reducing computational costs.
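A minimal sketch of the key-frame reuse pattern: run the expensive segmentor only on key frames and carry its prediction to the frames in between. Real systems typically refine the reused prediction (e.g. with motion cues); this skeleton, with illustrative names throughout, only shows the compute-saving control flow.

```python
import torch

def segment_video(frames, segmentor, key_stride=5):
    # frames: list of (3, H, W) tensors; segmentor: heavy per-frame model
    preds, last_pred = [], None
    for i, frame in enumerate(frames):
        if i % key_stride == 0:                 # key frame: full inference
            last_pred = segmentor(frame[None])[0]
        preds.append(last_pred)                 # non-key frames reuse the prediction
    return preds

# Toy usage with a hypothetical 'segmentor' producing 21-class logits:
segmentor = lambda x: torch.zeros(x.shape[0], 21, *x.shape[-2:])
frames = [torch.randn(3, 64, 64) for _ in range(12)]
masks = segment_video(frames, segmentor)
print(len(masks), masks[0].shape)   # 12 torch.Size([21, 64, 64])
```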
no code implementations • ICCV 2023 • Mingfei Han, Yali Wang, Zhihui Li, Lina Yao, Xiaojun Chang, Yu Qiao
To tackle this problem, we propose a concise Hybrid Temporal-scale Multimodal Learning (HTML) framework that effectively aligns linguistic and visual features to discover core object semantics in the video, learning multimodal interaction hierarchically across different temporal scales.
Ranked #6 on Referring Video Object Segmentation on Refer-YouTube-VOS (using extra training data)
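A hedged sketch of multi-temporal-scale language-video interaction, loosely mirroring the hierarchical alignment described above: the text query cross-attends to video features pooled at several temporal scales. All module names, scales, and sizes are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MultiScaleInteraction(nn.Module):
    def __init__(self, dim=256, scales=(1, 2, 4), heads=8):
        super().__init__()
        self.scales = scales
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True) for _ in scales
        )

    def forward(self, text, video):
        # text: (B, L, dim) word features; video: (B, T, dim) frame features
        out = text
        for s, attn in zip(self.scales, self.attn):
            # Temporal average pooling with stride s gives a coarser video scale.
            v = nn.functional.avg_pool1d(video.transpose(1, 2), s, s).transpose(1, 2)
            ctx, _ = attn(out, v, v)
            out = out + ctx                 # accumulate context from each scale
        return out

m = MultiScaleInteraction()
fused = m(torch.randn(2, 10, 256), torch.randn(2, 32, 256))  # (2, 10, 256)
```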
no code implementations • 21 Jul 2022 • Yuetian Weng, Zizheng Pan, Mingfei Han, Xiaojun Chang, Bohan Zhuang
The task of action detection aims to infer both the action category and the start and end moments of each action instance in a long, untrimmed video.
2 code implementations • 16 Jul 2022 • Mingjie Li, Rui Liu, Guangsi Shi, Mingfei Han, Changling Li, Lina Yao, Xiaojun Chang, Ling Chen
To further enhance forecasting accuracy, we introduce a memory-driven decoder.
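A toy sketch of what a memory-driven decoder can look like: at each forecast step, the decoder state soft-reads a learned memory bank of prototype patterns before emitting the next prediction. This illustrates the general mechanism only; the architecture and parameters are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class MemoryDecoder(nn.Module):
    def __init__(self, dim=64, memory_slots=32, horizon=12):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(memory_slots, dim))  # learned bank
        self.gru = nn.GRUCell(dim, dim)
        self.out = nn.Linear(dim, 1)
        self.horizon = horizon

    def forward(self, h):
        # h: (B, dim) encoder summary of the observed sequence
        preds = []
        for _ in range(self.horizon):
            # Soft-read the memory: similarity-weighted sum of the slots.
            w = torch.softmax(h @ self.memory.t(), dim=-1)  # (B, slots)
            read = w @ self.memory                          # (B, dim)
            h = self.gru(read, h)
            preds.append(self.out(h))
        return torch.cat(preds, dim=-1)                     # (B, horizon)

dec = MemoryDecoder()
forecast = dec(torch.randn(4, 64))   # (4, 12) future steps
```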
no code implementations • CVPR 2022 • Mingfei Han, David Junhao Zhang, Yali Wang, Rui Yan, Lina Yao, Xiaojun Chang, Yu Qiao
Learning spatial-temporal relation among multiple actors is crucial for group activity recognition.