Multimodal Intent Recognition

10 papers with code • 3 benchmarks • 3 datasets

Intent recognition on multimodal content.

Image source: MIntRec: A New Dataset for Multimodal Intent Recognition

Benchmarks

These leaderboards track progress in Multimodal Intent Recognition.

Dataset	Best Model
MIntRec	Human
PhotoChat	PaCE
MMDialog	PaCE

Libraries

Use these libraries to find Multimodal Intent Recognition models and implementations.

Library	Papers	Stars
huggingface/transformers	4	125,725
Tencent/TurboTransformers	2	1,442
google/seqio	2	530
thu-keg/omnievent	2	317

See all 8 libraries.

Datasets

Most implemented papers

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

google-research/bert • NAACL 2019

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

huggingface/transformers • arXiv 2019

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

google-research/ALBERT • ICLR 2020

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks.

ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

dandelin/vilt • 5 Feb 2021

Vision-and-Language Pre-training (VLP) has improved performance on various joint vision-and-language downstream tasks.

MIntRec: A New Dataset for Multimodal Intent Recognition

thuiar/mintrec • 9 Sep 2022

This paper introduces MIntRec, a novel dataset for multimodal intent recognition.

MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation

victorsungo/mmdialog • 10 Nov 2022

MMDialog is the largest multi-modal conversation dataset by number of dialogues, by a factor of 88.

Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment

alibabaresearch/damo-convai • 19 May 2023

In this paper, we propose Speech-text dialog Pre-training for spoken dialog understanding with ExpliCiT cRoss-Modal Alignment (SPECTRA), which is the first-ever speech-text dialog pre-training model.

PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts

AlibabaResearch/DAMO-ConvAI • 24 May 2023

PaCE combines several fundamental experts to accommodate multiple dialogue-related tasks, and it can be pre-trained using limited dialogue data and extensive non-dialogue multi-modal data.

Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition

thuiar/TCL-MAP • 22 Dec 2023

To establish an optimal multimodal semantic environment for the text modality, we develop a modality-aware prompting module (MAP), which aligns and fuses features from the text, video and audio modalities using similarity-based modality alignment and a cross-modality attention mechanism.

MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations

thuiar/mintrec2.0 • 16 Mar 2024

We believe that MIntRec2.0 will serve as a valuable resource, providing a pioneering foundation for research in human-machine conversational interactions, and significantly facilitating related applications.
