A Survey on Spoken Language Understanding: Recent Advances and New Frontiers

yizhen20133868/Awesome-SLU-Survey 4 Mar 2021

Spoken Language Understanding (SLU) aims to extract the semantic frame of user queries and is a core component of task-oriented dialog systems.

Spoken Language Understanding

952
0.33 stars / hour

VideoMamba: State Space Model for Efficient Video Understanding

opengvlab/videomamba 11 Mar 2024

Addressing the dual challenges of local redundancy and global dependencies in video understanding, this work adapts Mamba to the video domain.

Video Understanding

449
0.32 stars / hour

RewardBench: Evaluating Reward Models for Language Modeling

allenai/reward-bench 20 Mar 2024

In this paper, we present RewardBench, a benchmark dataset and code-base for evaluation, to enhance scientific understanding of reward models.

Instruction Following, Language Modelling

133
0.31 stars / hour

ORPO: Monolithic Preference Optimization without Reference Model

xfactlab/orpo 12 Mar 2024

While recent preference alignment algorithms for language models have demonstrated promising results, supervised fine-tuning (SFT) remains imperative for achieving successful convergence.

124
0.31 stars / hour

PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers

linweizhedragon/retrieval-augmented-visual-question-answering 13 Feb 2024

Large Multimodal Models (LMMs) excel in natural language and visual understanding but are challenged by exacting tasks such as Knowledge-based Visual Question Answering (KB-VQA), which involve retrieving relevant information from document collections to shape answers to questions.

Ranked #1 on Retrieval on InfoSeek (using extra training data)

Question Answering, Retrieval, +1

88
0.30 stars / hour

AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation

c-yn/adair 21 Mar 2024

Our approach is motivated by the observation that different degradation types impact the image content on different frequency subbands, thereby requiring different treatments for each restoration task.

Deblurring, Denoising, +3

40
0.29 stars / hour

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

x-plug/mplug-docowl 19 Mar 2024

In this work, we emphasize the importance of structure information in Visual Document Understanding and propose the Unified Structure Learning to boost the performance of MLLMs.

Document Understanding, Optical Character Recognition (OCR)

377
0.29 stars / hour

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

openai/grok 6 Jan 2022

In this paper we propose to study generalization of neural networks on small algorithmically generated datasets.

Memorization

3,875
0.28 stars / hour

VMamba: Visual State Space Model

mzeromiko/vmamba 18 Jan 2024

Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) stand as the two most popular foundation models for visual representation learning.

Computational Efficiency, Representation Learning

1,093
0.27 stars / hour