Search Results for author: Weiyun Wang

Found 13 papers, 12 papers with code

Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance

1 code implementation21 Oct 2024 Zhangwei Gao, Zhe Chen, Erfei Cui, Yiming Ren, Weiyun Wang, Jinguo Zhu, Hao Tian, Shenglong Ye, Junjun He, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Jifeng Dai, Wenhai Wang

Multimodal large language models (MLLMs) have demonstrated impressive performance in vision-language tasks across a broad spectrum of domains.

Autonomous Driving

Needle In A Multimodal Haystack

1 code implementation11 Jun 2024 Weiyun Wang, Shuibo Zhang, Yiming Ren, Yuchen Duan, Tiantong Li, Shuo Liu, Mengkang Hu, Zhe Chen, Kaipeng Zhang, Lewei Lu, Xizhou Zhu, Ping Luo, Yu Qiao, Jifeng Dai, Wenqi Shao, Wenhai Wang

In this work, we present Needle In A Multimodal Haystack (MM-NIAH), the first benchmark specifically designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.

Retrieval

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

1 code implementation4 Mar 2024 Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang

Our evaluations demonstrate that VRWKV surpasses ViT's performance in image classification and has significantly faster speeds and lower memory usage processing high-resolution inputs.

Image Classification

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

1 code implementation29 Feb 2024 Weiyun Wang, Yiming Ren, Haowen Luo, Tiantong Li, Chenxiang Yan, Zhe Chen, Wenhai Wang, Qingyun Li, Lewei Lu, Xizhou Zhu, Yu Qiao, Jifeng Dai

In addition, we design a new benchmark, termed Circular-based Relation Probing Evaluation (CRPE) for comprehensively evaluating the relation comprehension capabilities of MLLMs.

Hallucination Object Localization +3

InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

2 code implementations9 May 2023 Zhaoyang Liu, Yinan He, Wenhai Wang, Weiyun Wang, Yi Wang, Shoufa Chen, Qinglong Zhang, Zeqiang Lai, Yang Yang, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, LiMin Wang, Ping Luo, Jifeng Dai, Yu Qiao

Different from existing interactive systems that rely on pure language, by incorporating pointing instructions, the proposed iGPT significantly improves the efficiency of communication between users and chatbots, as well as the accuracy of chatbots in vision-centric tasks, especially in complicated visual scenarios where the number of objects is greater than 2.

Language Modelling

Demystify Transformers & Convolutions in Modern Image Deep Networks

1 code implementation10 Nov 2022 Xiaowei Hu, Min Shi, Weiyun Wang, Sitong Wu, Linjie Xing, Wenhai Wang, Xizhou Zhu, Lewei Lu, Jie zhou, Xiaogang Wang, Yu Qiao, Jifeng Dai

Our experiments on various tasks and an analysis of inductive bias show a significant performance boost due to advanced network-level and block-level designs, but performance differences persist among different STMs.

Image Deep Networks Spatial Token Mixer

Cannot find the paper you are looking for? You can Submit a new open access paper.