Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients

VITA-Group/Q-GaLore 11 Jul 2024

To address these limitations, we introduce Q-GaLore, a novel approach that substantially reduces memory usage by combining quantization and low-rank projection, surpassing the benefits of GaLore.

Quantization

101
0.88 stars / hour

Video Diffusion Alignment via Reward Gradients

mihirp1998/vader 11 Jul 2024

We show that backpropagating gradients from these reward models to a video diffusion model can allow for compute- and sample-efficient alignment of the video diffusion model.

97
0.82 stars / hour

GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

chenyirui/gim 24 Jun 2024

The extraordinary ability of generative models emerges as a new trend in image editing and generating realistic images, posing a serious threat to the trustworthiness of multimedia data and driving the research of image manipulation detection and localization (IMDL).

Image Manipulation, Image Manipulation Detection

132
0.68 stars / hour

Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews

bandas-center/atrain 18 Oct 2023

If an entry-level graphics card is available, the transcription time drops to 20% of the audio duration.

Speaker Recognition

245
0.68 stars / hour

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

facebookresearch/mobilellm 22 Feb 2024

The resultant models, denoted as MobileLLM-LS, demonstrate a further accuracy enhancement of 0.7%/0.8% over MobileLLM 125M/350M.

737
0.67 stars / hour

A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights

wentaol86/awesome-human-body-video-generation 11 Jul 2024

The goal of this survey is to offer the research community a clear and holistic view of the advancements in human video generation, highlighting the milestones achieved and the challenges that lie ahead.

Video Generation

120
0.67 stars / hour

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

microsoft/MInference 2 Jul 2024

With the pattern and sparse indices, we perform efficient sparse attention calculations via our optimized GPU kernels to significantly reduce the latency in the pre-filling stage of long-context LLMs.

Language Modelling, Large Language Model

519
0.66 stars / hour

CorrNet3D: Unsupervised End-to-end Learning of Dense Correspondence for 3D Point Clouds

ZENGYIMING-EAMON/CorrNet3D CVPR 2021

The symmetric deformer, with an additional regularized loss, transforms the two permuted point clouds to each other to drive the unsupervised learning of the correspondence.

Ranked #6 on 3D Dense Shape Correspondence on SHREC'19 (using extra training data)

3D Dense Shape Correspondence

134
0.49 stars / hour

MAVIS: Mathematical Visual Instruction Tuning

zrrskywalker/mavis 11 Jul 2024

We identify three key areas within MLLMs that need to be improved: visual encoding of math diagrams, diagram-language alignment, and mathematical reasoning skills.

Contrastive Learning, Language Modelling, +3

64
0.48 stars / hour