Trending Research

CV-VAE: A Compatible Video VAE for Latent Generative Video Models

ailab-cvc/cv-vae • 30 May 2024

Moreover, since current diffusion-based approaches are often implemented using pre-trained text-to-image (T2I) models, directly training a video VAE without considering the compatibility with existing T2I models will result in a latent space gap between them, which will take huge computational resources for training to bridge the gap even with the T2I models as initialization.

Quantization

106

0.47 stars / hour

RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance

feifeiobama/RectifID • 23 May 2024

Our study shows that based on a recent rectified flow framework, the major limitation of vanilla classifier guidance in requiring a special classifier can be resolved with a simple fixed-point solution, allowing flexible personalization with off-the-shelf image discriminators.

Image Generation

0.44 stars / hour

Self-Exploring Language Models: Active Preference Elicitation for Online Alignment

shenao-zhang/selm • 29 May 2024

Preference optimization, particularly through Reinforcement Learning from Human Feedback (RLHF), has achieved significant success in aligning Large Language Models (LLMs) to adhere to human intentions.

Instruction Following

0.39 stars / hour

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

hustvl/dig • 28 May 2024

In this paper, we aim to leverage the long sequence modeling capability of Gated Linear Attention (GLA) Transformers, expanding its applicability to diffusion models.

0.16 stars / hour

KAN: Kolmogorov-Arnold Networks

Blealtan/efficient-kan • 30 Apr 2024

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs).

3,174

0.37 stars / hour

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

huang-yh/gaussianformer • 27 May 2024

To address this, we propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians where each Gaussian represents a flexible region of interest and its semantic features.

3D Semantic Occupancy Prediction, Autonomous Driving, +1

100

0.37 stars / hour

SimPO: Simple Preference Optimization with a Reference-Free Reward

princeton-nlp/simpo • 23 May 2024

Our top-performing model, built on Llama3-8B-Instruct, achieves a remarkable 44.7 length-controlled win rate on AlpacaEval 2 -- surpassing Claude 3 Opus on the leaderboard, and a 33.8 win rate on Arena-Hard -- making it the strongest 8B open-source model.

Instruction Following

379

0.36 stars / hour

Neighborhood-Enhanced Supervised Contrastive Learning for Collaborative Filtering

PeiJieSun/NESCL • 18 Feb 2024

Using the graph-based collaborative filtering model as our backbone and following the same data augmentation methods as the existing contrastive learning model SGL, we effectively enhance the performance of the recommendation model.

Ranked #1 on Recommendation Systems on Yelp2018