A Single Transformer for Scalable Vision-Language Modeling

yangyi-chen/solo 8 Jul 2024

We present SOLO, a single transformer for Scalable visiOn-Language mOdeling.

Language Modelling, Mathematical Reasoning

Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

voidism/lookback-lens 9 Jul 2024

We find that a linear classifier based on these lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model.


GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting

xinjie-q/gaussianimage 13 Mar 2024

In response, we propose a groundbreaking paradigm of image representation and compression by 2D Gaussian Splatting, named GaussianImage.


Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image

AiuniAI/Unique3D 30 May 2024

In this work, we introduce Unique3D, a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images, featuring state-of-the-art generation fidelity and strong generalizability.

Image to 3D, Single-View 3D Reconstruction

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

buaacyw/meshanything 14 Jun 2024

Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement.


Pre-training Small Base LMs with Fewer Tokens

Lightning-AI/lit-gpt 12 Apr 2024

Here we show that smaller LMs trained using some of the layers of GPT-2-medium (355M) and GPT-2-large (770M) can effectively match the validation loss of their bigger counterparts when those counterparts are trained from scratch for the same number of training steps on the OpenWebText dataset with 9B tokens.

Language Modelling

Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews

bandas-center/atrain 18 Oct 2023

If an entry-level graphics card is available, transcription time drops to 20% of the audio duration.

Speaker Recognition

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

thudm/chatglm4 18 Jun 2024

We introduce ChatGLM, an evolving family of large language models that we have been developing over time.

GSM8K, Instruction Following

WayveScenes101: A Dataset and Benchmark for Novel View Synthesis in Autonomous Driving

wayveai/wayve_scenes 11 Jul 2024

We present WayveScenes101, a dataset designed to help the community advance the state of the art in novel view synthesis that focuses on challenging driving scenes containing many dynamic and deformable elements with changing geometry and texture.

Autonomous Driving, Benchmarking

