Multi-modal Classification
9 papers with code • 2 benchmarks • 3 datasets
Most implemented papers
What Makes Training Multi-Modal Classification Networks Hard?
Consider end-to-end training of a multi-modal vs. a single-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its single-modal counterpart.
Image and Encoded Text Fusion for Multi-Modal Classification
Standard Convolutional Neural Networks (CNNs) are then employed to learn feature representations of the resulting fused images for the classification task.
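The fusion idea above can be sketched at the data level: one simple way (an assumption for illustration, not necessarily the paper's exact encoding) is to pack the text bytes into a 2D grid and stack it as an extra channel next to the RGB image, so a standard CNN can consume both modalities as one tensor.

```python
import numpy as np

def encode_text_as_channel(text, size=32):
    """Hypothetical sketch: pack UTF-8 text bytes into a size x size grid."""
    buf = np.zeros(size * size, dtype=np.uint8)
    data = text.encode("utf-8")[: size * size]  # truncate to fit the grid
    buf[: len(data)] = list(data)
    return buf.reshape(size, size)

def fuse(image, text, size=32):
    """Stack the encoded-text grid as a fourth channel alongside RGB."""
    channel = encode_text_as_channel(text, size)
    return np.concatenate([image, channel[..., None]], axis=-1)

img = np.zeros((32, 32, 3), dtype=np.uint8)   # dummy RGB image
fused = fuse(img, "a product title")
# fused has shape (32, 32, 4): three image channels plus one text channel
```

The fused tensor can then be fed to any off-the-shelf image classifier without architectural changes, which is the appeal of encoding text into the image domain.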
Look, Read and Enrich. Learning from Scientific Figures and their Captions
Compared to natural images, understanding scientific figures is particularly hard for machines.
Multi-modal Sarcasm Detection and Humor Classification in Code-mixed Conversations
In this work, we make two major contributions considering the above limitations: (1) we develop a Hindi-English code-mixed dataset, MaSaC, for multi-modal sarcasm detection and humor classification in conversational dialog, which to our knowledge is the first dataset of its kind; (2) we propose MSH-COMICS, a novel attention-rich neural architecture for utterance classification.
Multimodal Dynamics: Dynamical Fusion for Trustworthy Multimodal Classification
To the best of our knowledge, this is the first work to jointly model both feature and modality variation for different samples to provide trustworthy fusion in multi-modal classification.
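The general idea behind trustworthy dynamical fusion (a minimal sketch of sample-adaptive weighting, not the paper's exact method) is to weight each modality per sample by how confident its prediction is, e.g. via the negative entropy of its class distribution:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dynamic_fusion(logits_per_modality):
    """Fuse per-modality predictions, weighted by per-sample confidence.

    Confidence is measured here as negative entropy of each modality's
    softmax distribution (an illustrative choice, not the paper's exact one).
    """
    confidences = []
    for logits in logits_per_modality:
        p = softmax(logits)
        entropy = -(p * np.log(p + 1e-12)).sum()
        confidences.append(-entropy)  # lower entropy -> higher confidence
    weights = softmax(np.array(confidences))
    fused = sum(w * softmax(l) for w, l in zip(weights, logits_per_modality))
    return fused, weights

# A sharply peaked modality should dominate a near-uniform one:
fused, weights = dynamic_fusion([np.array([5.0, 0.0, 0.0]),
                                 np.array([0.1, 0.0, 0.0])])
```

Because the weights are recomputed per sample, an uninformative modality (e.g. a blurry image) is down-weighted only for the samples where it is actually unreliable.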
On Modality Bias Recognition and Reduction
From the results on four datasets across the above three tasks, our method yields remarkable performance improvements compared with the baselines, demonstrating its superiority in reducing the modality bias problem.
UAVM: Towards Unifying Audio and Visual Models
Conventional audio-visual models have independent audio and video branches.
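A unified model instead shares most parameters across modalities. As a minimal sketch (hypothetical shapes and layers, not UAVM's actual architecture): small modality-specific front-ends project audio and video features into one shared space, and a single shared head classifies both.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # width of the shared embedding space

# Modality-specific front-ends (hypothetical input dims for illustration).
W_audio = rng.normal(size=(128, D))   # e.g. spectrogram features -> shared space
W_video = rng.normal(size=(512, D))   # e.g. frame features -> shared space
W_shared = rng.normal(size=(D, 10))   # one shared classifier head, 10 classes

def classify(features, W_modality):
    shared = np.tanh(features @ W_modality)  # embed into the shared space
    return shared @ W_shared                 # same head, regardless of modality

audio_logits = classify(rng.normal(size=128), W_audio)
video_logits = classify(rng.normal(size=512), W_video)
```

Only the thin projections differ per modality; everything after them is shared, which is the opposite of maintaining two independent branches.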
Contrastive Audio-Visual Masked Autoencoder
In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single modality to audio-visual multi-modalities.
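The MAE-style pretraining step underlying this line of work can be sketched as follows (a generic random-masking sketch over token sequences, with dummy data; the paper's actual masking and reconstruction details differ): the encoder only sees a small random subset of audio and visual patches, and the rest must be reconstructed.

```python
import numpy as np

def random_mask(tokens, mask_ratio=0.75, rng=None):
    """Keep a random subset of tokens, as in MAE-style pretraining (sketch)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n = tokens.shape[0]
    n_keep = max(1, int(n * (1 - mask_ratio)))
    keep = np.sort(rng.permutation(n)[:n_keep])  # indices of visible tokens
    return tokens[keep], keep

audio_tokens = np.arange(64, dtype=float).reshape(64, 1)  # 64 dummy patches
visible, idx = random_mask(audio_tokens)
# at a 75% mask ratio, 16 of the 64 tokens remain visible
```

Extending this to audio-visual multi-modality amounts to masking both token streams and adding a contrastive objective across the two encoders' outputs.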
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
In the fashion domain, there exists a variety of vision-and-language (V+L) tasks, including cross-modal retrieval, text-guided image retrieval, multi-modal classification, and image captioning.