Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients

VITA-Group/Q-GaLore 11 Jul 2024

To address these limitations, we introduce Q-GaLore, a novel approach that substantially reduces memory usage by combining quantization and low-rank projection, surpassing the benefits of GaLore.


Video Diffusion Alignment via Reward Gradients

mihirp1998/vader 11 Jul 2024

We show that backpropagating gradients from these reward models to a video diffusion model can allow for compute- and sample-efficient alignment of the video diffusion model.

GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

chenyirui/gim 24 Jun 2024

The extraordinary ability of generative models emerges as a new trend in image editing and generating realistic images, posing a serious threat to the trustworthiness of multimedia data and driving the research of image manipulation detection and localization (IMDL).

Image Manipulation, Image Manipulation Detection

Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews

bandas-center/atrain 18 Oct 2023

If an entry-level graphics card is available, the transcription time drops to 20% of the audio duration.

Speaker Recognition

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

facebookresearch/mobilellm 22 Feb 2024

The resultant models, denoted as MobileLLM-LS, demonstrate a further accuracy enhancement of 0.7%/0.8% over MobileLLM 125M/350M.

A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights

wentaol86/awesome-human-body-video-generation 11 Jul 2024

The goal of this survey is to offer the research community a clear and holistic view of the advancements in human video generation, highlighting the milestones achieved and the challenges that lie ahead.

Video Generation

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

microsoft/MInference 2 Jul 2024

With the pattern and sparse indices, we perform efficient sparse attention calculations via our optimized GPU kernels to significantly reduce the latency in the pre-filling stage of long-context LLMs.

Language Modelling, Large Language Model

CorrNet3D: Unsupervised End-to-end Learning of Dense Correspondence for 3D Point Clouds


The symmetric deformer, with an additional regularized loss, transforms the two permuted point clouds to each other to drive the unsupervised learning of the correspondence.

3D Dense Shape Correspondence

MAVIS: Mathematical Visual Instruction Tuning

zrrskywalker/mavis 11 Jul 2024

We identify three key areas within MLLMs that need to be improved: visual encoding of math diagrams, diagram-language alignment, and mathematical reasoning skills.

Contrastive Learning, Language Modelling, +3

