LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

mcgill-nlp/llm2vec 9 Apr 2024

We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB).

Contrastive Learning

340
0.60 stars / hour

LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition

Recognito-Vision/Face-SDK-Linux-Demos 13 Mar 2024

This enables our method, namely LAndmark-based Facial Self-supervised learning (LAFS), to learn key representations that are more critical for face recognition.

Face Recognition, Self-Supervised Learning

200
0.58 stars / hour

Multi-domain Learning for Updating Face Anti-spoofing Models

Recognito-Vision/Linux-FaceRecognition-FaceLivenessDetection 23 Aug 2022

In this work, we study multi-domain learning for face anti-spoofing (MD-FAS), where a pre-trained FAS model needs to be updated to perform equally well on both source and target domains while only using target domain data for updating.

Face Anti-Spoofing

199
0.57 stars / hour

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

dvlab-research/minigemini 27 Mar 2024

We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation.

Image Comprehension, Visual Dialog +1

2,874
0.57 stars / hour

The Unreasonable Ineffectiveness of the Deeper Layers

arcee-ai/PruneMe 26 Mar 2024

We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed.

Quantization, Question Answering

65
0.56 stars / hour

UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

opendatalab/unimernet 23 Apr 2024

This paper presents the UniMER dataset to provide the first study on Mathematical Expression Recognition (MER) towards complex real-world scenarios.

Image Augmentation

17
0.54 stars / hour

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

alibaba-damo-academy/FunASR 23 Dec 2023

To the best of our knowledge, emotion2vec is the first universal representation model in various emotion-related tasks, filling a gap in the field.

Self-Supervised Learning, Sentiment Analysis +1

3,284
0.54 stars / hour

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

templex98/mova 19 Apr 2024

Although some large-scale pretrained vision encoders, such as those in CLIP and DINOv2, have brought promising performance, we found that there is still no single vision encoder that can dominate various image content understanding; e.g., the CLIP vision encoder leads to outstanding results on general image understanding but poor performance on document or chart content.

Language Modelling, Large Language Model

48
0.53 stars / hour

STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases

snap-stanford/stark 19 Apr 2024

Answering real-world user queries, such as product search, often requires accurate retrieval of information from semi-structured knowledge bases or databases that involve a blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information.

Benchmarking, Retrieval

37
0.51 stars / hour

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

liming-ai/ControlNet_Plus_Plus 11 Apr 2024

To this end, we propose ControlNet++, a novel approach that improves controllable generation by explicitly optimizing pixel-level cycle consistency between generated images and conditional controls.

SSIM

157
0.50 stars / hour