TinyLLaVA: A Framework of Small-scale Large Multimodal Models

dlcv-buaa/tinyllavabench 22 Feb 2024

We present the TinyLLaVA framework that provides a unified perspective in designing and analyzing the small-scale Large Multimodal Models (LMMs).

Visual Question Answering

Repetition Improves Language Model Embeddings

jakespringer/echo-embeddings 23 Feb 2024

In this work, we address an architectural limitation of autoregressive models: token embeddings cannot contain information from tokens that appear later in the input.

Language Modelling

Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition

louisYen/Gen4Gen 23 Feb 2024

First, current personalization techniques fail to reliably extend to multiple concepts -- we hypothesize this to be due to the mismatch between complex scenes and simple text descriptions in the pre-training dataset (e.g., LAION).

Image Generation

Large Language Models for Data Annotation: A Survey

zhen-tan-dmml/llm4annotation 21 Feb 2024

Furthermore, the paper includes an in-depth taxonomy of methodologies employing LLMs for data annotation, a comprehensive review of learning strategies for models incorporating LLM-generated annotations, and a detailed discussion on primary challenges and limitations associated with using LLMs for data annotation.

YOLO-World: Real-Time Open-Vocabulary Object Detection

ailab-cvc/yolo-world 30 Jan 2024

The You Only Look Once (YOLO) series of detectors has established itself as an efficient and practical tool.

Instance Segmentation, Language Modelling (+4 more)

Towards Building Multilingual Language Model for Medicine

magic-ai4med/mmedlm 21 Feb 2024

In this paper, we aim to develop an open-source, multilingual language model for medicine that benefits a wider, linguistically diverse audience from different regions.

Language Modelling, Question Answering

Differential Diffusion: Giving Each Pixel Its Strength

exx8/differential-diffusion 1 Jun 2023

With the rise of diffusion models, image editing via textual instructions has become ubiquitous.

Text-based Image Editing

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

nvlabs/t-stitch 21 Feb 2024

Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model.

Image Generation

OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

OS-Copilot/FRIDAY 12 Feb 2024

Autonomous interaction with the computer has been a longstanding challenge with great potential, and the recent proliferation of large language models (LLMs) has markedly accelerated progress in building digital agents.

FiT: Flexible Vision Transformer for Diffusion Model

whlzy/fit 19 Feb 2024

Nature is infinitely resolution-free.

Image Cropping

