Search Results for author: Yizhuo Li

Found 16 papers, 12 papers with code

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

1 code implementation • 28 Nov 2023 • Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Yi Liu, Zun Wang, Jilan Xu, Guo Chen, Ping Luo, LiMin Wang, Yu Qiao

With the rapid development of Multi-modal Large Language Models (MLLMs), a number of diagnostic benchmarks have recently emerged to evaluate the comprehension capabilities of these models.

Ranked #1 on Zero-Shot Video Question Answer on STAR Benchmark

Fairness Multiple-choice +8

2,663

Harvest Video Foundation Models via Efficient Post-Pretraining

1 code implementation • 30 Oct 2023 • Yizhuo Li, Kunchang Li, Yinan He, Yi Wang, Yali Wang, LiMin Wang, Yu Qiao, Ping Luo

Building video-language foundation models is costly and difficult due to the redundant nature of video data and the lack of high-quality video-language datasets.

Question Answering Text Retrieval +2

917

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

1 code implementation • 13 Jul 2023 • Yi Wang, Yinan He, Yizhuo Li, Kunchang Li, Jiashuo Yu, Xin Ma, Xinhao Li, Guo Chen, Xinyuan Chen, Yaohui Wang, Conghui He, Ping Luo, Ziwei Liu, Yali Wang, LiMin Wang, Yu Qiao

Specifically, we utilize a multi-scale approach to generate video-related descriptions.

Action Recognition Contrastive Learning +7

917

VideoChat: Chat-Centric Video Understanding

1 code implementation • 10 May 2023 • Kunchang Li, Yinan He, Yi Wang, Yizhuo Li, Wenhai Wang, Ping Luo, Yali Wang, LiMin Wang, Yu Qiao

In this paper, we initiate an attempt of developing an end-to-end chat-centric video understanding system, coined as VideoChat.

Ranked #6 on Video Question Answering on MVBench

Video-based Generative Performance Benchmarking (Consistency) Video-based Generative Performance Benchmarking (Contextual Understanding) +5

2,663

Unmasked Teacher: Towards Training-Efficient Video Foundation Models

1 code implementation • ICCV 2023 • Kunchang Li, Yali Wang, Yizhuo Li, Yi Wang, Yinan He, LiMin Wang, Yu Qiao

Previous VFMs rely on Image Foundation Models (IFMs), which face challenges in transferring to the video domain.

Ranked #1 on Video Retrieval on SSv2-template retrieval (using extra training data)

Action Classification Action Recognition +5

242

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding

no code implementations • ICCV 2023 • Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, LiMin Wang, Yu Qiao

The prolific performances of Vision Transformers (ViTs) in image tasks have prompted research into adapting the image ViTs for video tasks.

Video Understanding

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

1 code implementation • 6 Dec 2022 • Yi Wang, Kunchang Li, Yizhuo Li, Yinan He, Bingkun Huang, Zhiyu Zhao, Hongjie Zhang, Jilan Xu, Yi Liu, Zun Wang, Sen Xing, Guo Chen, Junting Pan, Jiashuo Yu, Yali Wang, LiMin Wang, Yu Qiao

Specifically, InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives, and selectively coordinates video representations of these two complementary frameworks in a learnable manner to boost various video applications.

Ranked #1 on Action Recognition on Something-Something V1 (using extra training data)

Action Classification Contrastive Learning +8

917

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer

3 code implementations • 17 Nov 2022 • Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, LiMin Wang, Yu Qiao

UniFormer has successfully alleviated this issue, by unifying convolution and self-attention as a relation aggregator in the transformer format.

Video Understanding

3,888

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges

2 code implementations • 17 Nov 2022 • Guo Chen, Sen Xing, Zhe Chen, Yi Wang, Kunchang Li, Yizhuo Li, Yi Liu, Jiahao Wang, Yin-Dong Zheng, Bingkun Huang, Zhiyu Zhao, Junting Pan, Yifei HUANG, Zun Wang, Jiashuo Yu, Yinan He, Hongjie Zhang, Tong Lu, Yali Wang, LiMin Wang, Yu Qiao

In this report, we present our champion solutions to five tracks at Ego4D challenge.

Ranked #1 on State Change Object Detection on Ego4D

Future Hand Prediction Moment Queries +7

HAKE: A Knowledge Engine Foundation for Human Activity Understanding

3 code implementations • 14 Feb 2022 • Yong-Lu Li, Xinpeng Liu, Xiaoqian Wu, Yizhuo Li, Zuoyu Qiu, Liang Xu, Yue Xu, Hao-Shu Fang, Cewu Lu

Human activity understanding is of widespread interest in artificial intelligence and spans diverse applications like health care and behavior analysis.

Action Recognition Human-Object Interaction Detection +2

217

An Improved Reinforcement Learning Model Based on Sentiment Analysis

no code implementations • 19 Nov 2021 • Yizhuo Li, Peng Zhou, Fangyi Li, Xiao Yang

The authors combined the deep Q network in reinforcement learning with the sentiment quantitative indicator ARBR to build a high-frequency stock trading model for the share market.

reinforcement-learning Reinforcement Learning (RL) +1

Test-Time Personalization with a Transformer for Human Pose Estimation

no code implementations • NeurIPS 2021 • Yizhuo Li, Miao Hao, Zonglin Di, Nitesh B. Gundavarapu, Xiaolong Wang

During test time, we personalize and adapt our model by fine-tuning with the self-supervised objective.

Pose Estimation

PGT: A Progressive Method for Training Models on Long Videos

1 code implementation • CVPR 2021 • Bo Pang, Gao Peng, Yizhuo Li, Cewu Lu

This progressive training (PGT) method is able to train long videos end-to-end with limited resources and ensures the effective transmission of information.

TDAF: Top-Down Attention Framework for Vision Tasks

no code implementations • 14 Dec 2020 • Bo Pang, Yizhuo Li, Jiefeng Li, Muchen Li, Hanwen Cao, Cewu Lu

Such spatial and attention features are nested deeply; therefore, the proposed framework works in a mixed top-down and bottom-up manner.

Action Recognition object-detection +2

HOI Analysis: Integrating and Decomposing Human-Object Interaction

2 code implementations • NeurIPS 2020 • Yong-Lu Li, Xinpeng Liu, Xiaoqian Wu, Yizhuo Li, Cewu Lu

Meanwhile, isolated human and object can also be integrated into coherent HOI again.

Ranked #20 on Human-Object Interaction Detection on V-COCO

Human-Object Interaction Detection Object

214

TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model

1 code implementation • CVPR 2020 • Bo Pang, Yizhuo Li, Yifan Zhang, Muchen Li, Cewu Lu

As deep learning brings excellent performances to object detection algorithms, Tracking by Detection (TBD) has become the mainstream tracking framework.

Multi-Object Tracking Object +2

137
