1 Image, 2×2 Stitching

6 papers with code • 0 benchmarks • 0 datasets

This task has no description.

Most implemented papers

Visual Instruction Tuning

haotian-liu/LLaVA NeurIPS 2023

Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field.

InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

salesforce/lavis NeurIPS 2023

Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence.

CogVLM: Visual Expert for Pretrained Language Models

thudm/cogvlm 6 Nov 2023

We introduce CogVLM, a powerful open-source visual language foundation model.

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

x-plug/mplug-owl CVPR 2024

Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks.

Gemini: A Family of Highly Capable Multimodal Models

valdecy/pybibx The Keyword 2023

This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding.

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

dlvuldet/primevul 8 Mar 2024

In this report, we introduce the Gemini 1.5 family of models, the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio.