CV-VAE: A Compatible Video VAE for Latent Generative Video Models

ailab-cvc/cv-vae 30 May 2024

Moreover, since current diffusion-based approaches are often built on pre-trained text-to-image (T2I) models, training a video VAE directly, without considering compatibility with existing T2I models, will create a latent space gap between them; bridging that gap requires substantial computational resources, even when the T2I models are used as initialization.

Quantization

40
0.47 stars / hour

RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance

feifeiobama/RectifID 23 May 2024

Our study shows that based on a recent rectified flow framework, the major limitation of vanilla classifier guidance in requiring a special classifier can be resolved with a simple fixed-point solution, allowing flexible personalization with off-the-shelf image discriminators.

Image Generation

69
0.44 stars / hour

Self-Exploring Language Models: Active Preference Elicitation for Online Alignment

shenao-zhang/selm 29 May 2024

Preference optimization, particularly through Reinforcement Learning from Human Feedback (RLHF), has achieved significant success in aligning Large Language Models (LLMs) to adhere to human intentions.

Instruction Following

40
0.39 stars / hour

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

hustvl/dig 28 May 2024

In this paper, we aim to leverage the long-sequence modeling capability of Gated Linear Attention (GLA) Transformers, expanding their applicability to diffusion models.

73
0.16 stars / hour
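
The DiG snippet names gated linear attention without describing it. Below is a minimal single-head sketch of its recurrent form. It assumes a per-key-dimension sigmoid gate that decays a matrix-valued state, which is the general GLA recipe, not DiG's code. `gated_linear_attention` and `gate_logits` are hypothetical names, and normalization and multi-head projections are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_linear_attention(q, k, v, gate_logits):
    """Recurrent gated linear attention (hypothetical sketch, single head).

    q, k, gate_logits: (T, d_k); v: (T, d_v).
    State update: S_t = diag(alpha_t) @ S_{t-1} + outer(k_t, v_t)
    Output:       o_t = q_t @ S_t
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))            # fixed-size state, independent of T
    out = np.empty((T, d_v))
    for t in range(T):
        alpha = sigmoid(gate_logits[t])  # data-dependent decay in (0, 1) per key dim
        S = alpha[:, None] * S + np.outer(k[t], v[t])
        out[t] = q[t] @ S
    return out
```

Because `S` stays a `d_k` by `d_v` matrix, memory per step does not grow with sequence length, whereas softmax attention costs grow quadratically in `T`. That constant-size state is what makes this family of layers attractive for long token sequences.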

KAN: Kolmogorov-Arnold Networks

Blealtan/efficient-kan 30 Apr 2024

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs).

3,091
0.37 stars / hour
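
To illustrate the KAN idea, here is a minimal sketch of one layer computing `y_j = sum_i phi_ji(x_i)`. Unlike an MLP, which applies fixed activations at nodes with learnable weights on edges, each edge here carries its own learnable univariate function and nodes only sum. One substitution is flagged plainly: the edge functions are combinations of fixed Gaussian radial basis functions rather than the B-splines used by KAN and efficient-kan. `RBFKANLayer` and its parameters are hypothetical, and training is not shown.

```python
import numpy as np

class RBFKANLayer:
    """Hypothetical KAN-style layer: y_j = sum_i phi_ji(x_i).

    Each edge function phi_ji is a learnable weighted sum of fixed Gaussian
    radial basis functions (a stand-in for KAN's B-splines).
    """

    def __init__(self, in_dim, out_dim, num_basis=8, x_min=-2.0, x_max=2.0, seed=0):
        rng = np.random.default_rng(seed)
        self.centers = np.linspace(x_min, x_max, num_basis)       # (K,)
        self.width = (x_max - x_min) / (num_basis - 1)
        # One learnable coefficient vector per edge i -> j: (out_dim, in_dim, K)
        self.coef = rng.normal(scale=0.1, size=(out_dim, in_dim, num_basis))

    def basis(self, x):
        # x: (batch, in_dim) -> basis activations (batch, in_dim, K)
        return np.exp(-((x[..., None] - self.centers) / self.width) ** 2)

    def edge_functions(self, x):
        # phi_ji(x_i) for every edge: (batch, out_dim, in_dim)
        return np.einsum("bik,oik->boi", self.basis(x), self.coef)

    def __call__(self, x):
        # Nodes only sum their incoming edge outputs: (batch, out_dim)
        return self.edge_functions(x).sum(axis=-1)
```

A single layer is additively separable in its inputs, so `f(a, b) + f(a', b') == f(a, b') + f(a', b)` holds for each output. Stacking layers, for example `RBFKANLayer(3, 1)(RBFKANLayer(2, 3)(x))`, composes such sums into richer functions.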

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

huang-yh/gaussianformer 27 May 2024

To address this, we propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians where each Gaussian represents a flexible region of interest and its semantic features.

3D Semantic Occupancy Prediction, Autonomous Driving

85
0.37 stars / hour
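
The object-centric idea can be illustrated by turning a set of semantic Gaussians back into per-voxel predictions. This is a hypothetical sketch, not GaussianFormer's implementation. It assumes each Gaussian carries a mean, an axis-aligned scale (no rotation), and a vector of class scores, and that a voxel's prediction is the Gaussian-weighted sum of those scores. `splat_to_voxels` is an illustrative name.

```python
import numpy as np

def splat_to_voxels(points, means, scales, semantics):
    """Hypothetical Gaussian-to-voxel aggregation (dense, for clarity).

    points:    (P, 3) voxel-center coordinates to query
    means:     (G, 3) Gaussian centers
    scales:    (G, 3) per-axis standard deviations (axis-aligned assumption)
    semantics: (G, C) per-Gaussian class scores
    returns:   (P, C) scores per voxel, sum_g w_g(p) * semantics_g
    """
    # Mahalanobis-style offsets for a diagonal covariance: (P, G, 3)
    diff = (points[:, None, :] - means[None, :, :]) / scales[None, :, :]
    weights = np.exp(-0.5 * np.sum(diff ** 2, axis=-1))   # (P, G)
    return weights @ semantics                             # (P, C)
```

For a full occupancy grid, `points` would hold the flattened voxel centers and the predicted label per voxel would be the `argmax` over the last axis. This sketch sums over every Gaussian for every voxel; a practical version would only aggregate Gaussians near each voxel, which is where a sparse representation saves memory compared with dense voxel features.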

SimPO: Simple Preference Optimization with a Reference-Free Reward

princeton-nlp/simpo 23 May 2024

Our top-performing model, built on Llama3-8B-Instruct, achieves a remarkable 44.7 length-controlled win rate on AlpacaEval 2, surpassing Claude 3 Opus on the leaderboard, and a 33.8 win rate on Arena-Hard, making it the strongest 8B open-source model.

Instruction Following

336
0.36 stars / hour

Neighborhood-Enhanced Supervised Contrastive Learning for Collaborative Filtering

PeiJieSun/NESCL 18 Feb 2024

Using the graph-based collaborative filtering model as our backbone and following the same data augmentation methods as the existing contrastive learning model SGL, we effectively enhance the performance of the recommendation model.

Collaborative Filtering, Contrastive Learning

36
0.35 stars / hour

Poseidon: Efficient Foundation Models for PDEs

camlab-ethz/poseidon 29 May 2024

Moreover, Poseidon scales with respect to model and data size, both for pretraining and for downstream tasks.

24
0.35 stars / hour

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

kongds/mora 20 May 2024

Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models.

Continual Pretraining, Mathematical Reasoning

235
0.34 stars / hour
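
The MoRA snippet takes low-rank adaptation (LoRA) as its starting point. Here is a minimal sketch of a LoRA-wrapped linear layer: the pre-trained weight `W` stays frozen and only a rank-`r` update `(alpha / r) * B @ A` is trained, with `A` Gaussian-initialized and `B` zero-initialized so the layer starts out identical to the base model. This shows plain LoRA, not MoRA's method. `LoRALinear` and its defaults are hypothetical.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                    # frozen, (out, in)
        self.A = rng.normal(scale=0.01, size=(rank, in_dim))   # trainable down-projection
        self.B = np.zeros((out_dim, rank))                      # trainable up-projection, zero-init
        self.scaling = alpha / rank

    def delta(self):
        # The weight update; its rank can never exceed `rank`
        return self.scaling * (self.B @ self.A)

    def __call__(self, x):
        # x: (batch, in) -> (batch, out); low-rank path avoids forming delta()
        return x @ self.weight.T + self.scaling * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        # Fold the update into W for inference with no extra latency
        return self.weight + self.delta()
```

Only `r * (d_in + d_out)` parameters are trained instead of `d_in * d_out`. The rank cap visible in `delta()` is the constraint that MoRA's title contrasts with "high-rank updating".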