Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

yangling0818/buffer-of-thought-llm 6 Jun 2024

We introduce Buffer of Thoughts (BoT), a novel and versatile thought-augmented reasoning approach for enhancing accuracy, efficiency and robustness of large language models (LLMs).

Arithmetic Reasoning, Code Generation +2

Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning

agent-husky/husky-v1 10 Jun 2024

Despite using 7B models, Husky matches or even exceeds frontier LMs such as GPT-4 on these tasks, showcasing the efficacy of our holistic approach in addressing complex reasoning problems.

Multi-hop Question Answering, Question Answering

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

damo-nlp-sg/videollama2 11 Jun 2024

In this paper, we present VideoLLaMA 2, a set of Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video- and audio-oriented tasks.

Multiple-choice Question Answering +3

Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image

AiuniAI/Unique3D 30 May 2024

In this work, we introduce Unique3D, a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images, featuring state-of-the-art generation fidelity and strong generalizability.

Image to 3D

Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis

srameo/le3d 10 Jun 2024

Volumetric rendering-based methods, like NeRF, excel in HDR view synthesis from RAW images, especially for nighttime scenes.

2k Novel View Synthesis +1

AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

czg1225/asyncdiff 11 Jun 2024

To address this, we introduce AsyncDiff, a universal and plug-and-play acceleration scheme that enables model parallelism across multiple devices.


Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

alpha-vllm/lumina-t2x 9 May 2024

Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details.

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

BytedanceSpeech/seed-tts-eval 4 Jun 2024

Seed-TTS offers superior controllability over various speech attributes such as emotion and is capable of generating highly expressive and diverse speech for speakers in the wild.

In-Context Learning, Language Modelling

Mathematical Supplement for the $\texttt{gsplat}$ Library

nerfstudio-project/gsplat 4 Dec 2023

This report provides the mathematical details of the gsplat library, a modular toolbox for efficient differentiable Gaussian splatting, as proposed by Kerbl et al.

Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions

sterzhang/image-textualization 11 Jun 2024

Image description datasets play a crucial role in the advancement of various applications such as image understanding, text-to-image generation, and text-image retrieval.

Hallucination, Image Retrieval +1

