1 Image, 2*2 Stitching
6 papers with code • 0 benchmarks • 0 datasets
Benchmarks
These leaderboards are used to track progress in 1 Image, 2*2 Stitching
Most implemented papers
Visual Instruction Tuning
Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field.
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence.
CogVLM: Visual Expert for Pretrained Language Models
We introduce CogVLM, a powerful open-source visual language foundation model.
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks.
Gemini: A Family of Highly Capable Multimodal Models
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding.
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio.