Multimodal Intent Recognition

8 papers with code • 3 benchmarks • 3 datasets

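Multimodal intent recognition is the task of identifying the intent behind content that combines multiple modalities, such as text, video, and audio.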

Image source: MIntRec: A New Dataset for Multimodal Intent Recognition


Most implemented papers

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

google-research/bert NAACL 2019

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

google-research/ALBERT ICLR 2020

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

huggingface/transformers arXiv 2019

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).

ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

dandelin/vilt 5 Feb 2021

Vision-and-Language Pre-training (VLP) has improved performance on various joint vision-and-language downstream tasks.

MIntRec: A New Dataset for Multimodal Intent Recognition

thuiar/mintrec 9 Sep 2022

This paper introduces a novel dataset for multimodal intent recognition (MIntRec).

MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation

victorsungo/mmdialog 10 Nov 2022

MMDialog is the largest multi-modal conversation dataset by number of dialogues, by a factor of 88x.

Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment

AlibabaResearch/DAMO-ConvAI 19 May 2023

In this paper, we propose Speech-text dialog Pre-training for spoken dialog understanding with ExpliCiT cRoss-Modal Alignment (SPECTRA), which is the first-ever speech-text dialog pre-training model.

PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts

AlibabaResearch/DAMO-ConvAI 24 May 2023

PaCE uses a combination of several fundamental experts to accommodate multiple dialogue-related tasks and can be pre-trained using limited dialogue data and extensive non-dialogue multi-modal data.