Efficient Streaming Language Models with Attention Sinks

mit-han-lab/streaming-llm 29 Sep 2023

In this paper, we first demonstrate that the emergence of attention sink is due to the strong attention scores towards initial tokens as a "sink" even if they are not semantically important.

Language Modelling

3,463
12.08 stars / hour

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

dreamgaussian/dreamgaussian 28 Sep 2023

In contrast to the occupancy pruning used in Neural Radiance Fields, we demonstrate that the progressive densification of 3D Gaussians converges significantly faster for 3D generative tasks.

1,663
3.46 stars / hour

Break-A-Scene: Extracting Multiple Concepts from a Single Image

google/break-a-scene 25 May 2023

Text-to-image model personalization aims to introduce a user-provided concept to the model, allowing its synthesis in diverse contexts.

Complex Scene Breaking and Synthesis

333
0.81 stars / hour

Demystifying CLIP Data

facebookresearch/metaclip 28 Sep 2023

We believe that the main ingredient to the success of CLIP is its data and not the model architecture or pre-training objective.

280
0.78 stars / hour

Text-to-3D using Gaussian Splatting

gsgen3d/gsgen 28 Sep 2023

In this stage, we increase the number of Gaussians by compactness-based densification to enhance continuity and improve fidelity.

Text to 3D

295
0.78 stars / hour

ProPainter: Improving Propagation and Transformer for Video Inpainting

sczhou/propainter ICCV 2023

We also propose a mask-guided sparse video Transformer, which achieves high efficiency by discarding unnecessary and redundant tokens.

Optical Flow Estimation, Video Inpainting

2,633
0.77 stars / hour

Representation Engineering: A Top-Down Approach to AI Transparency

andyzoujm/representation-engineering 2 Oct 2023

In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience.

63
0.73 stars / hour

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models

interactivenlp-team/rolellm-public 1 Oct 2023

The advent of Large Language Models (LLMs) has paved the way for complex tasks such as role-playing, which enhances user interactions by enabling models to imitate various characters.

Benchmarking

56
0.64 stars / hour

Bayesian Flow Networks

nnaisense/bayesian-flow-networks 14 Aug 2023

Notably, the network inputs for discrete data lie on the probability simplex, and are therefore natively differentiable, paving the way for gradient-based sample guidance and few-step generation in discrete domains such as language modelling.

Bayesian Inference, Data Compression (+2 more)

39
0.63 stars / hour

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

microsoft/tora 29 Sep 2023

Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics.

Arithmetic Reasoning, Imitation Learning (+2 more)

73
0.58 stars / hour