Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

foundationvision/llamagen 10 Jun 2024

(3) A text-conditional image generation model with 775M parameters, trained in two stages on LAION-COCO and high-aesthetic-quality images, demonstrating competitive visual quality and text alignment.

Conditional Image Generation

Scalable MatMul-free Language Modeling

ridgerchu/matmulfreellm 4 Jun 2024

Our experiments show that our proposed MatMul-free models achieve performance on par with state-of-the-art Transformers that require far more memory during inference at a scale up to at least 2.7B parameters.

Language Modelling

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models

verazuo/jailbreak_llms 7 Aug 2023

We hope that our study can facilitate the research community and LLM vendors in promoting safer and regulated LLMs.

Community Detection

Matching Anything by Segmenting Anything

siyuanliii/masa CVPR 2024

The robust association of the same objects across video frames in complex scenes is crucial for many applications, especially Multiple Object Tracking (MOT).

Domain Generalization Multiple Object Tracking +2

X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Molecular Design

ericlbuehler/ 11 Feb 2024

Starting with a set of pre-trained LoRA adapters, our gating strategy uses the hidden states to dynamically mix adapted layers, allowing the resulting X-LoRA model to draw upon different capabilities and create never-before-used deep layer-wise combinations to solve tasks.

graph construction Knowledge Graphs +2

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

yangling0818/buffer-of-thought-llm 6 Jun 2024

We introduce Buffer of Thoughts (BoT), a novel and versatile thought-augmented reasoning approach for enhancing accuracy, efficiency and robustness of large language models (LLMs).

Arithmetic Reasoning Code Generation +2

Blind Image Restoration via Fast Diffusion Inversion

hamadichihaoui/BIRD 29 May 2024

This is ultimately equivalent to casting the IR task as an optimization problem in the space of the input noise.

Image Restoration

StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning

ictnlp/streamspeech 5 Jun 2024

Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a. streaming speech translation) outputs target speech while receiving streaming speech inputs, which is critical for real-time communication.

Ranked #1 on de-en on CVSS

Automatic Speech Recognition (ASR) de-en +11

Vision-LSTM: xLSTM as Generic Vision Backbone

NX-AI/vision-lstm 6 Jun 2024

Transformers are widely used as generic backbones in computer vision, despite being initially introduced for natural language processing.

Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis

srameo/le3d 10 Jun 2024

Volumetric rendering based methods, like NeRF, excel in HDR view synthesis from RAW images, especially for nighttime scenes.

2k Novel View Synthesis +1

