Multi-modal Classification

9 papers with code • 2 benchmarks • 3 datasets

Multi-modal classification assigns a label to inputs that combine several modalities, such as image and text or audio and video, by learning to fuse the per-modality signals.

Most implemented papers

What Makes Training Multi-Modal Classification Networks Hard?

facebookresearch/R2Plus1D CVPR 2020

Consider end-to-end training of a multi-modal vs. a single-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its single-modal counterpart.
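The paper's remedy, Gradient-Blending, weights per-modality and fused losses according to their overfitting behavior. Below is a minimal late-fusion sketch with auxiliary heads in that spirit; the branch sizes and the fixed blending weights are illustrative assumptions, not the paper's estimated values.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Two-branch late-fusion network with per-modality auxiliary heads."""
    def __init__(self, dim_a=128, dim_b=128, num_classes=10):
        super().__init__()
        self.branch_a = nn.Sequential(nn.Linear(dim_a, 64), nn.ReLU())
        self.branch_b = nn.Sequential(nn.Linear(dim_b, 64), nn.ReLU())
        self.head_a = nn.Linear(64, num_classes)       # modality-A-only head
        self.head_b = nn.Linear(64, num_classes)       # modality-B-only head
        self.head_joint = nn.Linear(128, num_classes)  # fused head

    def forward(self, xa, xb):
        fa, fb = self.branch_a(xa), self.branch_b(xb)
        fused = torch.cat([fa, fb], dim=-1)
        return self.head_a(fa), self.head_b(fb), self.head_joint(fused)

# Gradient-Blending-style objective: a weighted sum of per-modality and
# joint losses. The fixed weights here are placeholders for the weights
# the paper estimates from each head's overfitting behavior.
def blended_loss(logits_a, logits_b, logits_joint, y,
                 w_a=0.3, w_b=0.3, w_joint=0.4):
    ce = nn.functional.cross_entropy
    return (w_a * ce(logits_a, y) + w_b * ce(logits_b, y)
            + w_joint * ce(logits_joint, y))
```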

Image and Encoded Text Fusion for Multi-Modal Classification

artelab/Multi-modal-classification 3 Oct 2018

To learn feature representations of resulting images, standard Convolutional Neural Networks (CNNs) are employed for the classification task.
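The core trick is to render the text as pixels and merge it with the image, so that a single CNN sees both modalities at once. The sketch below assumes a simple byte-to-intensity encoding and a stacking layout; the paper's actual encoding scheme may differ.

```python
import torch

def encode_text_as_pixels(text, height=8, width=224):
    """Map a UTF-8 byte sequence onto a small pixel strip.

    Illustrative stand-in for the paper's text-encoding scheme: bytes
    become normalized intensities, padded or truncated to a fixed strip.
    """
    data = list(text.encode("utf-8"))[: height * width]
    data += [0] * (height * width - len(data))
    strip = torch.tensor(data, dtype=torch.float32).view(1, height, width) / 255.0
    return strip.expand(3, height, width)  # replicate across RGB channels

def fuse(image, text):
    # image: (3, 224, 224). The text strip is stacked below the image,
    # so one standard CNN (with its input size adjusted) can classify
    # the fused picture end to end.
    return torch.cat([image, encode_text_as_pixels(text)], dim=1)  # (3, 232, 224)
```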

Look, Read and Enrich. Learning from Scientific Figures and their Captions

hybridNLP/look_read_and_enrich 19 Sep 2019

Compared to natural images, understanding scientific figures is particularly hard for machines.

Multi-modal Sarcasm Detection and Humor Classification in Code-mixed Conversations

LCS2-IIITD/MSH-COMICS 20 May 2021

In this work, we make two major contributions addressing the above limitations: (1) we develop a Hindi-English code-mixed dataset, MaSaC, for multi-modal sarcasm detection and humor classification in conversational dialogue, to our knowledge the first dataset of its kind; (2) we propose MSH-COMICS, a novel attention-rich neural architecture for utterance classification.
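MSH-COMICS applies attention at several granularities; as a generic illustration of attention-based utterance classification (not the paper's architecture), a recurrent encoder with attention pooling looks like this:

```python
import torch
import torch.nn as nn

class AttentiveUtteranceClassifier(nn.Module):
    """Attention pooling over utterance tokens, a common pattern for
    dialogue classification; a generic sketch, not MSH-COMICS itself."""
    def __init__(self, vocab_size, dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * dim, 1)
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, token_ids):                       # (batch, seq_len)
        h, _ = self.encoder(self.embed(token_ids))      # (batch, seq, 2*dim)
        weights = torch.softmax(self.attn(h), dim=1)    # per-token attention
        pooled = (weights * h).sum(dim=1)               # weighted sum over tokens
        return self.head(pooled)
```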

Multimodal Dynamics: Dynamical Fusion for Trustworthy Multimodal Classification

tencentailabhealthcare/mmdynamics CVPR 2022

To the best of our knowledge, this is the first work to jointly model both feature and modality variation for different samples to provide trustworthy fusion in multi-modal classification.
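The underlying idea is that both individual features and whole modalities vary in informativeness from sample to sample. A rough sketch of such sample-adaptive fusion follows; the gating and scoring layers are assumptions standing in for the paper's MM-Dynamics formulation.

```python
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    """Sample-adaptive fusion: sigmoid-gate the features inside each
    modality, then softmax-weight whole modalities per sample."""
    def __init__(self, dims, num_classes):
        super().__init__()
        self.feature_gates = nn.ModuleList([nn.Linear(d, d) for d in dims])
        self.modality_scores = nn.ModuleList([nn.Linear(d, 1) for d in dims])
        self.classifier = nn.Linear(sum(dims), num_classes)

    def forward(self, xs):  # xs: list of (batch, dim_m) tensors, one per modality
        gated = [torch.sigmoid(g(x)) * x for g, x in zip(self.feature_gates, xs)]
        scores = torch.cat([s(g) for s, g in zip(self.modality_scores, gated)], dim=-1)
        weights = torch.softmax(scores, dim=-1)  # (batch, num_modalities)
        fused = torch.cat([weights[:, i:i + 1] * g for i, g in enumerate(gated)],
                          dim=-1)
        return self.classifier(fused)
```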

On Modality Bias Recognition and Reduction

guoyang9/AdaVQA 25 Feb 2022

On four datasets covering the above three tasks, our method yields remarkable performance improvements over the baselines, demonstrating its superiority in reducing the modality bias problem.

UAVM: Towards Unifying Audio and Visual Models

YuanGongND/uavm 29 Jul 2022

Conventional audio-visual models have independent audio and video branches.
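UAVM instead routes both modalities through one shared backbone. The sketch below illustrates that pattern with modality-specific embedding layers in front of a shared Transformer; the input dimensions, depth, and mean-pooling are assumptions, not the released configuration.

```python
import torch.nn as nn

class UnifiedAVClassifier(nn.Module):
    """Modality-specific front-ends feeding one shared Transformer."""
    def __init__(self, dim=256, num_classes=10):
        super().__init__()
        self.audio_embed = nn.Linear(128, dim)  # e.g. mel-filterbank frames
        self.video_embed = nn.Linear(512, dim)  # e.g. per-frame CNN features
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.shared = nn.TransformerEncoder(layer, num_layers=4)  # shared weights
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x, modality):  # x: (batch, time, feat_dim)
        tokens = self.audio_embed(x) if modality == "audio" else self.video_embed(x)
        return self.head(self.shared(tokens).mean(dim=1))  # mean-pool over time
```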

Contrastive Audio-Visual Masked Autoencoder

yuangongnd/cav-mae 2 Oct 2022

In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single modality to audio-visual multi-modalities.
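Alongside masked reconstruction, CAV-MAE pairs audio and video clips contrastively. The sketch below shows only that contrastive half, as a symmetric InfoNCE loss over paired clip embeddings; the temperature value is illustrative.

```python
import torch
import torch.nn.functional as F

def av_contrastive_loss(audio_emb, video_emb, temperature=0.07):
    """Symmetric InfoNCE between paired audio and video clip embeddings.

    Sketch of the contrastive half of the objective; the masked
    reconstruction branch is omitted.
    """
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    logits = a @ v.t() / temperature                       # (batch, batch)
    targets = torch.arange(a.size(0), device=a.device)     # diagonal = positives
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```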

FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks

brandonhanx/fame-vil CVPR 2023

In the fashion domain, there exists a variety of vision-and-language (V+L) tasks, including cross-modal retrieval, text-guided image retrieval, multi-modal classification, and image captioning.
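A generic sketch of the multi-tasking pattern this describes: one shared vision-and-language encoder with lightweight task-specific heads. FAME-ViL's actual adapter-based parameter sharing is not reproduced, and the head dimensions (e.g. the 30522-token vocabulary) are placeholder assumptions.

```python
import torch.nn as nn

class MultiTaskVLModel(nn.Module):
    """Shared V+L encoder with one lightweight head per fashion task."""
    def __init__(self, encoder, dim=512):
        super().__init__()
        self.encoder = encoder  # any fused vision-language encoder
        self.heads = nn.ModuleDict({
            "retrieval": nn.Linear(dim, dim),        # embedding for matching
            "classification": nn.Linear(dim, 10),    # e.g. category labels
            "captioning": nn.Linear(dim, 30522),     # vocabulary logits
        })

    def forward(self, image, text, task):
        feat = self.encoder(image, text)  # (batch, dim) fused feature
        return self.heads[task](feat)
```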