Search Results for author: Mingfei Han

Found 10 papers, 6 papers with code

Mining Inter-Video Proposal Relations for Video Object Detection

1 code implementation · ECCV 2020 · Mingfei Han, Yali Wang, Xiaojun Chang, Yu Qiao

Recent studies have shown that aggregating contextual information from proposals in different frames can clearly enhance the performance of video object detection.

Object · Object Detection · +3 more

LongVLM: Efficient Long Video Understanding via Large Language Models

1 code implementation · 4 Apr 2024 · Yuetian Weng, Mingfei Han, Haoyu He, Xiaojun Chang, Bohan Zhuang

In this way, we encode video representations that incorporate both local and global information, enabling the LLM to generate comprehensive responses for long-term videos.

Question Answering · Video Question Answering · +1 more

Video Recognition in Portrait Mode

1 code implementation · 21 Dec 2023 · Mingfei Han, Linjie Yang, Xiaojie Jin, Jiashi Feng, Xiaojun Chang, Heng Wang

While existing datasets mainly comprise landscape mode videos, our paper seeks to introduce portrait mode videos to the research community and highlight the unique challenges associated with this video format.

Data Augmentation · Video Recognition

Mask Propagation for Efficient Video Semantic Segmentation

1 code implementation · NeurIPS 2023 · Yuetian Weng, Mingfei Han, Haoyu He, Mingjie Li, Lina Yao, Xiaojun Chang, Bohan Zhuang

By reusing predictions from key frames, we circumvent the need to process a large volume of video frames individually with resource-intensive segmentors, alleviating temporal redundancy and significantly reducing computational costs.

Semantic Segmentation · Video Semantic Segmentation

HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation

No code implementations · ICCV 2023 · Mingfei Han, Yali Wang, Zhihui Li, Lina Yao, Xiaojun Chang, Yu Qiao

To tackle this problem, we propose a concise Hybrid Temporal-scale Multimodal Learning (HTML) framework, which can effectively align lingual and visual features to discover core object semantics in the video, by learning multimodal interaction hierarchically from different temporal scales.

Ranked #6 on Referring Video Object Segmentation on Refer-YouTube-VOS (using extra training data)

Object · Referring Video Object Segmentation · +2 more

An Efficient Spatio-Temporal Pyramid Transformer for Action Detection

No code implementations · 21 Jul 2022 · Yuetian Weng, Zizheng Pan, Mingfei Han, Xiaojun Chang, Bohan Zhuang

The task of action detection aims at inferring both the action category and the start and end moments of each action instance in a long, untrimmed video.

Action Detection · Video Understanding
