Trending Research

NOLA: Compressing LoRA using Linear Combination of Random Basis

UCDvision/NOLA • 4 Oct 2023

Low-rank adaptation methods such as LoRA can reduce the number of parameters needed to fine-tune an LLM by several orders of magnitude.

0.26 stars / hour

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models

jishengpeng/TextrolSpeech • 28 Aug 2023

The dataset comprises 236,220 pairs of style prompts, written as natural text descriptions covering five style factors, and their corresponding speech samples.

Language Modelling

0.26 stars / hour

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

JackAILab/ConsistentID • 25 Apr 2024

ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions and the overall facial context to enhance precision in facial details, and an ID-preservation network optimized through the facial attention localization strategy, aimed at preserving ID consistency in facial regions.

434

0.26 stars / hour

SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap

soccernet/sn-gamestate • 17 Apr 2024

This tracking and identification process is crucial for reconstructing the game state, defined by the athletes' positions and identities on a 2D top view of the pitch (i.e., a minimap).

Ranked #1 on Game State Reconstruction on SoccerNet-GSR

Camera Calibration • Game State Reconstruction

134

0.25 stars / hour

ReFT: Representation Finetuning for Language Models

stanfordnlp/pyreft • 4 Apr 2024

LoReFT is a drop-in replacement for existing PEFTs and learns interventions that are 10x-50x more parameter-efficient than prior state-of-the-art PEFTs.

Arithmetic Reasoning

695

0.25 stars / hour

FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent

dcharatan/flowmap • 23 Apr 2024

This paper introduces FlowMap, an end-to-end differentiable method that solves for precise camera poses, camera intrinsics, and per-frame dense depth of a video sequence.

Novel View Synthesis • Optical Flow Estimation

691

0.23 stars / hour

Neuro-GPT: Towards A Foundation Model for EEG

wenhui0206/neurogpt • 7 Nov 2023

To handle the scarcity and heterogeneity of electroencephalography (EEG) data for Brain-Computer Interface (BCI) tasks, and to harness the power of large publicly available data sets, we propose Neuro-GPT, a foundation model consisting of an EEG encoder and a GPT model.

EEG Motor Imagery

0.23 stars / hour

GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting

ku-cvlab/gaussiantalker • 24 Apr 2024

A key insight is to encode the 3D Gaussian attributes into a shared implicit feature representation, where it is merged with audio features to manipulate each Gaussian attribute.

Attribute

0.21 stars / hour

Hallucination of Multimodal Large Language Models: A Survey

showlab/awesome-mllm-hallucination • 29 Apr 2024

By drawing the granular classification and landscapes of hallucination causes, evaluation benchmarks, and mitigation methods, this survey aims to deepen the understanding of hallucinations in MLLMs and inspire further advancements in the field.

Hallucination

150

0.21 stars / hour

MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors

tangyuan96/minigpt-3d • 2 May 2024

Notably, MiniGPT-3D gains an 8.12 increase on GPT-4 evaluation score for the challenging object captioning task compared to ShapeLLM-13B, while the latter costs 160 total GPU-hours on 8 A800.

Ranked #1 on 3D Object Captioning on Objaverse

3D Object Captioning • Generative 3D Object Classification

0.21 stars / hour
