NOLA: Compressing LoRA using Linear Combination of Random Basis

UCDvision/NOLA 4 Oct 2023

These methods can reduce the number of parameters needed to fine-tune an LLM by several orders of magnitude.

37
0.26 stars / hour

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models

jishengpeng/TextrolSpeech 28 Aug 2023

The dataset comprises 236,220 pairs of style prompts in natural text descriptions with five style factors and corresponding speech samples.

Language Modelling

82
0.26 stars / hour

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

JackAILab/ConsistentID 25 Apr 2024

ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions and the overall facial context to enhance precision in facial details, and an ID-preservation network optimized through the facial attention localization strategy, aimed at preserving ID consistency in facial regions.

434
0.26 stars / hour

SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap

soccernet/sn-gamestate 17 Apr 2024

This tracking and identification process is crucial for reconstructing the game state, defined by the athletes' positions and identities on a 2D top-view of the pitch (i.e., a minimap).

Camera Calibration, Game State Reconstruction

134
0.25 stars / hour

ReFT: Representation Finetuning for Language Models

stanfordnlp/pyreft 4 Apr 2024

LoReFT is a drop-in replacement for existing PEFTs and learns interventions that are 10x-50x more parameter-efficient than prior state-of-the-art PEFTs.

Arithmetic Reasoning

695
0.25 stars / hour

FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent

dcharatan/flowmap 23 Apr 2024

This paper introduces FlowMap, an end-to-end differentiable method that solves for precise camera poses, camera intrinsics, and per-frame dense depth of a video sequence.

Novel View Synthesis, Optical Flow Estimation

691
0.23 stars / hour

Neuro-GPT: Towards A Foundation Model for EEG

wenhui0206/neurogpt 7 Nov 2023

To handle the scarcity and heterogeneity of electroencephalography (EEG) data for Brain-Computer Interface (BCI) tasks, and to harness the power of large publicly available data sets, we propose Neuro-GPT, a foundation model consisting of an EEG encoder and a GPT model.

EEG, Motor Imagery

52
0.23 stars / hour

GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting

ku-cvlab/gaussiantalker 24 Apr 2024

A key insight is to encode the 3D Gaussian attributes into a shared implicit feature representation, where it is merged with audio features to manipulate each Gaussian attribute.

Attribute

97
0.21 stars / hour

Hallucination of Multimodal Large Language Models: A Survey

showlab/awesome-mllm-hallucination 29 Apr 2024

By drawing the granular classification and landscapes of hallucination causes, evaluation benchmarks, and mitigation methods, this survey aims to deepen the understanding of hallucinations in MLLMs and inspire further advancements in the field.

Hallucination

150
0.21 stars / hour

MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors

tangyuan96/minigpt-3d 2 May 2024

Notably, MiniGPT-3D gains an 8.12 increase on GPT-4 evaluation score for the challenging object captioning task compared to ShapeLLM-13B, while the latter costs 160 total GPU-hours on 8 A800.

3D Object Captioning, Generative 3D Object Classification

20
0.21 stars / hour