Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

stanford-oval/storm 22 Feb 2024

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages.

Retrieval

3,004
9.48 stars / hour

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

dvlab-research/minigemini 27 Mar 2024

We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation.

Image Comprehension, Visual Dialog, +1

1,687
4.50 stars / hour

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

openbmb/omnilmm 18 Mar 2024

To address the challenges, we present LLaVA-UHD, a large multimodal model that can efficiently perceive images in any aspect ratio and high resolution.

1,140
3.10 stars / hour

Magic Clothing: Controllable Garment-Driven Image Synthesis

shinechen1024/magicclothing 15 Apr 2024

We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for an unexplored garment-driven image synthesis task.

Image Generation

658
2.73 stars / hour

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

tencentarc/instantmesh 10 Apr 2024

We present InstantMesh, a feed-forward framework for instant 3D mesh generation from a single image, featuring state-of-the-art generation quality and significant training scalability.

Image to 3D

698
2.52 stars / hour

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

pku-yuangroup/magictime 7 Apr 2024

Recent advances in Text-to-Video generation (T2V) have achieved remarkable success in synthesizing high-quality general videos from textual descriptions.

Text-to-Video Generation, Video Generation

950
2.11 stars / hour

Probing the 3D Awareness of Visual Foundation Models

mbanani/probe3d 12 Apr 2024

Given that such models can classify, delineate, and localize objects in 2D, we ask whether they also represent their 3D structure.

160
2.07 stars / hour

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

FoundationVision/VAR 3 Apr 2024

We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction".

Image Generation, Language Modelling, +2

2,436
2.01 stars / hour

State Space Model for New-Generation Network Alternative to Transformers: A Survey

event-ahu/mamba_state_space_model_paper_list 15 Apr 2024

In this paper, we give the first comprehensive review of these works and also provide experimental comparisons and analysis to better demonstrate the features and advantages of SSM.

236
1.94 stars / hour

MyGO: Discrete Modality Information as Fine-Grained Tokens for Multi-modal Knowledge Graph Completion

zjukg/mygo 15 Apr 2024

To overcome their inherent incompleteness, multi-modal knowledge graph completion (MMKGC) aims to discover unobserved knowledge from given MMKGs, leveraging both structural information from the triples and multi-modal information of the entities.

Contrastive Learning, Descriptive, +3

110
1.61 stars / hour