Search Results for author: Yizhuo Li

Found 16 papers, 12 papers with code

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

1 code implementation 28 Nov 2023 Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Yi Liu, Zun Wang, Jilan Xu, Guo Chen, Ping Luo, LiMin Wang, Yu Qiao

With the rapid development of Multi-modal Large Language Models (MLLMs), a number of diagnostic benchmarks have recently emerged to evaluate the comprehension capabilities of these models.

Fairness Multiple-choice +8

Harvest Video Foundation Models via Efficient Post-Pretraining

1 code implementation 30 Oct 2023 Yizhuo Li, Kunchang Li, Yinan He, Yi Wang, Yali Wang, LiMin Wang, Yu Qiao, Ping Luo

Building video-language foundation models is costly and difficult due to the redundant nature of video data and the lack of high-quality video-language datasets.

Question Answering Text Retrieval +2

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding

no code implementations ICCV 2023 Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, LiMin Wang, Yu Qiao

The strong performance of Vision Transformers (ViTs) on image tasks has prompted research into adapting image ViTs for video tasks.

Video Understanding

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

1 code implementation 6 Dec 2022 Yi Wang, Kunchang Li, Yizhuo Li, Yinan He, Bingkun Huang, Zhiyu Zhao, Hongjie Zhang, Jilan Xu, Yi Liu, Zun Wang, Sen Xing, Guo Chen, Junting Pan, Jiashuo Yu, Yali Wang, LiMin Wang, Yu Qiao

Specifically, InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives, and selectively coordinates video representations of these two complementary frameworks in a learnable manner to boost various video applications.

 Ranked #1 on Action Recognition on Something-Something V1 (using extra training data)

Action Classification Contrastive Learning +8
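
The snippet above says the representations from the two pretraining frameworks are coordinated "in a learnable manner". A minimal sketch of one way such a coordination could look, using stand-in encoder modules and a single learnable mixing weight (a hypothetical illustration, not the authors' implementation):

```python
import torch
import torch.nn as nn

class CoordinatedVideoModel(nn.Module):
    """Sketch: fuse features from a masked-video-modeling encoder and a
    video-language contrastive encoder with a learnable weight (hypothetical)."""

    def __init__(self, mvm_encoder: nn.Module, contrastive_encoder: nn.Module):
        super().__init__()
        self.mvm_encoder = mvm_encoder                  # generative branch
        self.contrastive_encoder = contrastive_encoder  # discriminative branch
        self.alpha = nn.Parameter(torch.zeros(1))       # learnable coordination weight

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        f_gen = self.mvm_encoder(video)                 # (B, D)
        f_dis = self.contrastive_encoder(video)         # (B, D)
        w = torch.sigmoid(self.alpha)                   # keep the mixing weight in (0, 1)
        return w * f_gen + (1.0 - w) * f_dis            # fused representation for downstream tasks

# Usage with stand-in encoders (flattened frames -> linear projection)
dim, in_features = 128, 8 * 3 * 32 * 32
enc_gen = nn.Sequential(nn.Flatten(), nn.Linear(in_features, dim))
enc_dis = nn.Sequential(nn.Flatten(), nn.Linear(in_features, dim))
model = CoordinatedVideoModel(enc_gen, enc_dis)
print(model(torch.randn(2, 8, 3, 32, 32)).shape)        # torch.Size([2, 128])
```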

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer

3 code implementations 17 Nov 2022 Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, LiMin Wang, Yu Qiao

UniFormer has successfully alleviated this issue by unifying convolution and self-attention as a relation aggregator in the transformer format.

Video Understanding
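
For context, the "relation aggregator" idea, modeling local token relations with convolution and global relations with self-attention inside a transformer-style block, can be sketched roughly as follows; the module names and shapes are illustrative assumptions, not the paper's code:

```python
import torch
import torch.nn as nn

class RelationAggregatorBlock(nn.Module):
    """Sketch of a transformer-style block whose token-relation step is either
    a local (depthwise conv) or a global (self-attention) aggregator (illustrative)."""

    def __init__(self, dim: int, local: bool, num_heads: int = 4):
        super().__init__()
        self.local = local
        self.norm1 = nn.LayerNorm(dim)
        if local:
            # local relation aggregator: depthwise 1D convolution over the token sequence
            self.aggregate = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)
        else:
            # global relation aggregator: multi-head self-attention
            self.aggregate = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, N, dim) tokens
        h = self.norm1(x)
        if self.local:
            h = self.aggregate(h.transpose(1, 2)).transpose(1, 2)  # conv expects (B, dim, N)
        else:
            h, _ = self.aggregate(h, h, h, need_weights=False)
        x = x + h                                   # residual, as in a standard ViT block
        return x + self.mlp(self.norm2(x))

# Usage: a shallow local block followed by a deeper global block
tokens = torch.randn(2, 16, 64)
x = RelationAggregatorBlock(64, local=True)(tokens)
x = RelationAggregatorBlock(64, local=False)(x)
print(x.shape)  # torch.Size([2, 16, 64])
```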

HAKE: A Knowledge Engine Foundation for Human Activity Understanding

3 code implementations 14 Feb 2022 Yong-Lu Li, Xinpeng Liu, Xiaoqian Wu, Yizhuo Li, Zuoyu Qiu, Liang Xu, Yue Xu, Hao-Shu Fang, Cewu Lu

Human activity understanding is of widespread interest in artificial intelligence and spans diverse applications like health care and behavior analysis.

Action Recognition Human-Object Interaction Detection +2

An Improved Reinforcement Learning Model Based on Sentiment Analysis

no code implementations 19 Nov 2021 Yizhuo Li, Peng Zhou, Fangyi Li, Xiao Yang

The authors combine a deep Q-network from reinforcement learning with the quantitative sentiment indicator ARBR to build a high-frequency stock trading model.

Reinforcement Learning (RL) +1
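
The ARBR indicator mentioned above is a classic pair of market-sentiment indicators computed from daily OHLC prices. Below is a rough sketch of computing AR/BR and appending them to a state vector that a DQN-style trading agent could consume; the window length and state layout are assumptions, not the paper's specification:

```python
import numpy as np

def ar_br(high, low, open_, close, window=26):
    """AR/BR sentiment indicators over a trailing window (standard definitions assumed:
    AR = sum(High - Open) / sum(Open - Low) * 100,
    BR = sum(High - PrevClose) / sum(PrevClose - Low) * 100)."""
    high, low, open_, close = map(np.asarray, (high, low, open_, close))
    h, l, o = high[-window:], low[-window:], open_[-window:]
    pc = close[-window - 1:-1]                       # previous close for each day in the window
    ar = (h - o).sum() / max((o - l).sum(), 1e-8) * 100.0
    br = np.clip(h - pc, 0, None).sum() / max(np.clip(pc - l, 0, None).sum(), 1e-8) * 100.0
    return ar, br

def build_state(prices, position, window=26):
    """Hypothetical DQN state: recent normalized closes + AR/BR + current position."""
    high, low, open_, close = prices                 # each a 1-D array of daily values
    ar, br = ar_br(high, low, open_, close, window)
    recent = close[-window:] / close[-window]        # normalize to the window's first close
    return np.concatenate([recent, [ar / 100.0, br / 100.0, position]])

# Example with synthetic prices
rng = np.random.default_rng(0)
close = 100 + np.cumsum(rng.normal(0, 1, 60))
open_ = close + rng.normal(0, 0.5, 60)
high = np.maximum(open_, close) + 0.5
low = np.minimum(open_, close) - 0.5
state = build_state((high, low, open_, close), position=0.0)
print(state.shape)                                   # (29,) -> 26 closes + AR + BR + position
```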

PGT: A Progressive Method for Training Models on Long Videos

1 code implementation CVPR 2021 Bo Pang, Gao Peng, Yizhuo Li, Cewu Lu

This progressive training (PGT) method can train models on long videos end-to-end with limited resources while ensuring the effective transmission of information.
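
The core idea described above, processing a long video as an ordered sequence of shorter clips and updating progressively so that memory stays bounded while temporal information is carried forward, could be sketched like this; the state-carrying module and loss are placeholders, not the paper's implementation:

```python
import torch
import torch.nn as nn

class ClipEncoder(nn.Module):
    """Stand-in clip model: average frames over time, project, and mix with the
    carried state via a GRU cell (placeholder for a real video backbone)."""

    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, hid_dim)
        self.cell = nn.GRUCell(hid_dim, hid_dim)

    def forward(self, clip, state):
        # clip: (B, T_clip, C, H, W) -> flatten pixels, average over time -> (B, in_dim)
        x = self.proj(clip.flatten(2).mean(dim=1))
        state = self.cell(x, state)                  # carry temporal context to the next clip
        return state, state

def progressive_train_step(model, head, video, label, clip_len, optimizer, loss_fn):
    """Sketch: split a long video into clips, run them in order, carry a detached
    state between clips, and update after every clip so memory stays bounded."""
    state, total_loss = None, 0.0
    for start in range(0, video.shape[1], clip_len):     # video: (B, T, C, H, W)
        feats, state = model(video[:, start:start + clip_len], state)
        loss = loss_fn(head(feats), label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        state = state.detach()                       # stop gradients crossing clip boundaries
        total_loss += loss.item()
    return total_loss

# Usage with a tiny synthetic "long" video of 32 frames, processed 8 frames at a time
video = torch.randn(2, 32, 3, 16, 16)
model, head = ClipEncoder(3 * 16 * 16, 64), nn.Linear(64, 10)
opt = torch.optim.SGD(list(model.parameters()) + list(head.parameters()), lr=0.01)
print(progressive_train_step(model, head, video, torch.tensor([1, 3]), 8, opt, nn.CrossEntropyLoss()))
```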

TDAF: Top-Down Attention Framework for Vision Tasks

no code implementations 14 Dec 2020 Bo Pang, Yizhuo Li, Jiefeng Li, Muchen Li, Hanwen Cao, Cewu Lu

Such spatial and attention features are deeply nested; therefore, the proposed framework works in a mixed top-down and bottom-up manner.

Action Recognition object-detection +2
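
As a rough illustration of the "mixed top-down and bottom-up" idea, a deeper (semantic) feature can produce an attention map that re-weights a shallower (spatial) feature; everything below is a generic sketch under that assumption, not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownAttention(nn.Module):
    """Generic sketch: bottom-up features flow low -> high, then a top-down pass
    uses the high-level feature to generate an attention map over the low-level one."""

    def __init__(self, channels: int = 32):
        super().__init__()
        self.low = nn.Conv2d(3, channels, 3, padding=1)                      # bottom-up stage 1
        self.high = nn.Conv2d(channels, channels, 3, stride=2, padding=1)    # bottom-up stage 2
        self.to_attn = nn.Conv2d(channels, 1, 1)                             # top-down attention head

    def forward(self, x):
        low = F.relu(self.low(x))                                            # (B, C, H, W)
        high = F.relu(self.high(low))                                        # (B, C, H/2, W/2)
        attn = torch.sigmoid(self.to_attn(high))                             # (B, 1, H/2, W/2)
        attn = F.interpolate(attn, size=low.shape[-2:], mode="bilinear", align_corners=False)
        return low * attn, high                                              # re-weighted low-level feature

out_low, out_high = TopDownAttention()(torch.randn(2, 3, 32, 32))
print(out_low.shape, out_high.shape)   # torch.Size([2, 32, 32, 32]) torch.Size([2, 32, 16, 16])
```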

TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model

1 code implementation CVPR 2020 Bo Pang, Yizhuo Li, Yifan Zhang, Muchen Li, Cewu Lu

As deep learning brings excellent performance to object detection algorithms, Tracking by Detection (TBD) has become the mainstream tracking framework.

Multi-Object Tracking Object +2
