WorldGPT: Empowering LLM as Multimodal World Model

dcdmllm/worldgpt 28 Apr 2024

As for evaluation, we build WorldNet, a multimodal state transition prediction benchmark encompassing varied real-life scenarios.

Language Modelling Large Language Model

67
0.38 stars / hour

QLoRA: Efficient Finetuning of Quantized LLMs

internlm/xtuner NeurIPS 2023

Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU.

Chatbot Instruction Following +2

2,480
0.37 stars / hour

MicroDreamer: Zero-shot 3D Generation in ~20 Seconds by Score-based Iterative Reconstruction

ml-gsai/microdreamer 30 Apr 2024

In this paper, we introduce score-based iterative reconstruction (SIR), an efficient and general algorithm for 3D generation with a multi-view score-based diffusion model.

3D Generation 3D Reconstruction

61
0.33 stars / hour

OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning

nvlabs/omnidrive 2 May 2024

We further propose OmniDrive-nuScenes, a new visual question-answering dataset challenging the true 3D situational awareness of a model with comprehensive visual question-answering (VQA) tasks, including scene description, traffic regulation, 3D grounding, counterfactual reasoning, decision making and planning.

Autonomous Driving counterfactual +4

54
0.33 stars / hour

Make Your LLM Fully Utilize the Context

hsiehjackson/ruler 25 Apr 2024

While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, a problem known as the lost-in-the-middle challenge.

4k Information Retrieval +1

108
0.30 stars / hour

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

McGill-NLP/webllama 8 Feb 2024

We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue fashion.

Conversational Web Navigation Text Generation +1

1,081
0.30 stars / hour

ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations

haosulab/ManiSkill 30 Jul 2021

Here we propose SAPIEN Manipulation Skill Benchmark (ManiSkill) to benchmark manipulation skills over diverse objects in a full-physics simulator.

470
0.28 stars / hour

StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control

ironjr/streammultidiffusion 14 Mar 2024

The enormous success of diffusion models in text-to-image synthesis has made them promising candidates for the next generation of end-user applications for image generation and editing.

Text-to-Image Generation

445
0.28 stars / hour

SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

ailab-cvc/seed-x 22 Apr 2024

We hope that our work will inspire future research into what can be achieved by versatile multimodal foundation models in real-world applications.

Image Generation

212
0.27 stars / hour