Search Results for author: Bohan Zhai

Found 9 papers, 5 papers with code

COCO is "ALL" You Need for Visual Instruction Fine-tuning

no code implementations · 17 Jan 2024 · Xiaotian Han, Yiqi Wang, Bohan Zhai, Quanzeng You, Hongxia Yang

We argue that datasets with diverse and high-quality detailed instruction-following annotations are essential and sufficient for MLLM instruction fine-tuning (IFT).

Image Captioning · Instruction Following · +1

Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

no code implementations · 10 Jan 2024 · Yiqi Wang, Wentao Chen, Xiaotian Han, Xudong Lin, Haiteng Zhao, Yongfei Liu, Bohan Zhai, Jianbo Yuan, Quanzeng You, Hongxia Yang

In this survey, we comprehensively review the existing evaluation protocols of multimodal reasoning, categorize and illustrate the frontiers of MLLMs, introduce recent trends in applications of MLLMs on reasoning-intensive tasks, and finally discuss current practices and future directions.

Multimodal Reasoning

InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models

no code implementations · 20 Nov 2023 · Xiaotian Han, Quanzeng You, Yongfei Liu, Wentao Chen, Huangjie Zheng, Khalil Mrini, Xudong Lin, Yiqi Wang, Bohan Zhai, Jianbo Yuan, Heng Wang, Hongxia Yang

To mitigate this issue, we manually curate a benchmark dataset specifically designed for MLLMs, with a focus on complex reasoning tasks.

HallE-Control: Controlling Object Hallucination in Large Multimodal Models

2 code implementations · 3 Oct 2023 · Bohan Zhai, Shijia Yang, Chenfeng Xu, Sheng Shen, Kurt Keutzer, Chunyuan Li, Manling Li

Current Large Multimodal Models (LMMs) have achieved remarkable progress, yet significant uncertainty remains about their ability to accurately apprehend visual details, that is, whether they can perform detailed captioning.

Attribute · Hallucination · +2

Multitask Vision-Language Prompt Tuning

1 code implementation · 21 Nov 2022 · Sheng Shen, Shijia Yang, Tianjun Zhang, Bohan Zhai, Joseph E. Gonzalez, Kurt Keutzer, Trevor Darrell

Specifically, (i) we demonstrate the effectiveness of learning a single transferable prompt from multiple source tasks to initialize the prompt for each target task; (ii) we show that many target tasks can benefit from sharing prompt vectors with one another and can thus be learned jointly via multitask prompt tuning.

Visual Prompt Tuning
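
The two-stage idea described in the Multitask Vision-Language Prompt Tuning abstract (learn one shared prompt across source tasks, then use it to initialize each target task's prompt) can be illustrated with a toy sketch. This is not the paper's actual implementation: the least-squares "task losses", prompt dimensions, and function names below are hypothetical stand-ins for real vision-language training objectives.

```python
import numpy as np

PROMPT_LEN, DIM = 4, 8  # toy prompt shape: 4 soft tokens, 8-dim embeddings

def train_shared_prompt(source_tasks, steps=100, lr=0.1):
    """Stage (i): learn a single prompt on several source tasks.

    Each source task is represented by a toy 'optimal prompt'; the loss
    for a task is the squared distance to that prompt. Real training
    would backpropagate a task loss through a frozen vision-language model.
    """
    prompt = np.zeros((PROMPT_LEN, DIM))
    for _ in range(steps):
        # averaged gradient of sum_t ||prompt - t||^2 over source tasks
        grad = sum(2.0 * (prompt - t) for t in source_tasks) / len(source_tasks)
        prompt -= lr * grad
    return prompt

rng = np.random.default_rng(0)
source_tasks = [rng.normal(size=(PROMPT_LEN, DIM)) for _ in range(3)]

shared = train_shared_prompt(source_tasks)

# Stage (ii): each target task's prompt starts from the shared prompt and
# would then be fine-tuned (jointly, with prompt sharing) on its own data;
# the fine-tuning step is omitted in this sketch.
target_prompts = {name: shared.copy() for name in ["task_a", "task_b"]}
```

With these quadratic toy losses the shared prompt converges to the mean of the source-task optima, which is why a single prompt can be a reasonable starting point for every target task.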

SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis

1 code implementation · 16 Jan 2020 · Bohan Zhai, Tianren Gao, Flora Xue, Daniel Rothchild, Bichen Wu, Joseph E. Gonzalez, Kurt Keutzer

Automatic speech synthesis is a challenging task that is becoming increasingly important as edge devices begin to interact with users through speech.

Sound · Audio and Speech Processing
