MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages

hlt-mt/mosel 1 Oct 2024

The rise of foundation models (FMs), coupled with regulatory efforts addressing their risks and impacts, has sparked significant interest in open-source models.

Automatic Speech Recognition speech-recognition +1

124
0.57 stars / hour

AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation

MCG-NJU/AWT 5 Jul 2024

Pre-trained vision-language models (VLMs) have shown impressive results in various visual classification tasks.

Action Recognition Few-Shot Image Classification +3

53
0.44 stars / hour

Text2SQL is Not Enough: Unifying AI and Databases with TAG

tag-research/tag-bench 27 Aug 2024

Such systems would allow users to leverage the powerful reasoning and knowledge capabilities of language models (LMs) alongside the scalable computational power of data management systems.

RAG Text-To-SQL +1

531
0.41 stars / hour

Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation

hqhqaq/mip-adapter 26 Sep 2024

Personalized text-to-image generation methods, which have garnered wide research interest, can generate customized images based on reference images.

Object Personalized Image Generation +1

61
0.39 stars / hour

PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs

chenmnz/prefixquant 7 Oct 2024

Specifically, PrefixQuant identifies high-frequency outlier tokens and prefixes them in the KV cache, preventing the generation of outlier tokens during inference and simplifying quantization.

Common Sense Reasoning Quantization

19
0.38 stars / hour

How to Train Long-Context Language Models (Effectively)

princeton-nlp/prolong 3 Oct 2024

We study continued training and supervised fine-tuning (SFT) of a language model (LM) to make effective use of long-context information.

58
0.34 stars / hour

FAST-LIVO2: Fast, Direct LiDAR-Inertial-Visual Odometry

hku-mars/fast-livo2 26 Aug 2024

The fusion of visual and LiDAR measurements is based on a single unified voxel map, where the LiDAR module constructs the geometric structure for registering new LiDAR scans and the visual module attaches image patches to the LiDAR points.

Visual Odometry

956
0.33 stars / hour

CAR: Controllable Autoregressive Modeling for Visual Generation

miracledance/car 7 Oct 2024

To the best of our knowledge, we are the first to propose a control framework for pre-trained autoregressive visual generation models.

14
0.33 stars / hour

Unraveling Cross-Modality Knowledge Conflict in Large Vision-Language Models

luka-group/vlm-knowledge-conflict 4 Oct 2024

Specifically, using LLaVA-34B, our proposed dynamic contrastive decoding improves average accuracy by 2.24%.

21
0.33 stars / hour

Breaking reCAPTCHAv2

aplesner/Breaking-reCAPTCHAv2 13 Sep 2024

Our work examines the efficacy of employing advanced machine learning methods to solve captchas from Google's reCAPTCHAv2 system.

Image Segmentation Semantic Segmentation

205
0.33 stars / hour