Search Results for author: Peter Grasch

Found 8 papers, 1 paper with code

FastVLM: Efficient Vision Encoding for Vision Language Models

no code implementations · 17 Dec 2024 · Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li, Cem Koc, Nate True, Albert Antony, Gokul Santhanam, James Gabriel, Peter Grasch, Oncel Tuzel, Hadi Pouransari

At different operational resolutions, the vision encoder of a VLM can be optimized along two axes: reducing encoding latency and minimizing the number of visual tokens passed to the LLM, thereby lowering overall latency.

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

1 code implementation · 3 Oct 2024 · Zhengfeng Lai, Vasileios Saveris, Chen Chen, Hong-You Chen, Haotian Zhang, BoWen Zhang, Juan Lao Tebar, Wenze Hu, Zhe Gan, Peter Grasch, Meng Cao, Yinfei Yang

Our findings reveal that a hybrid approach that keeps both synthetic captions and AltTexts can outperform the use of synthetic captions alone, improving both alignment and performance, with each model demonstrating preferences for particular caption formats.

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

no code implementations · 1 Jul 2024 · Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier, Peter Grasch, Yinfei Yang, Zhe Gan

We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions.


Model Stability with Continuous Data Updates

no code implementations · 14 Jan 2022 · Huiting Liu, Avinesh P. V. S., Siddharth Patwardhan, Peter Grasch, Sachin Agarwal

For this study, we propose a methodology for the assessment of model stability (which we refer to as jitter) under various experimental conditions.
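One plausible way to quantify such stability, sketched below under assumptions: the listing does not give the paper's exact metric, so here "jitter" is illustrated as the fraction of examples whose predicted labels change between two retrained model runs. The function name and definition are hypothetical.

```python
# Hedged sketch: jitter measured as the disagreement rate between the
# predictions of two retrained models on the same evaluation set.
# This is an illustrative definition, not the paper's exact methodology.

def jitter(preds_a, preds_b):
    """Fraction of examples on which two model runs disagree."""
    assert len(preds_a) == len(preds_b), "runs must score the same examples"
    disagreements = sum(a != b for a, b in zip(preds_a, preds_b))
    return disagreements / len(preds_a)

# Two runs disagreeing on 1 of 4 examples would yield a jitter of 0.25.
print(jitter([1, 0, 1, 1], [1, 0, 0, 1]))
```

Averaging this disagreement rate over several independent retraining runs would give a single stability score per experimental condition.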

