Multimodal Deep Learning

Contrastive Language-Image Pre-training for the Italian Language

clip-italian/clip-italian 19 Aug 2021

CLIP (Contrastive Language-Image Pre-training) is a very recent multi-modal model that jointly learns representations of images and texts.

Classification Image Retrieval +2

Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis

declare-lab/multimodal-deep-learning 28 Jul 2021

Multimodal sentiment analysis aims to extract and integrate semantic information collected from multiple modalities to recognize the expressed emotions and sentiment in multimodal data.

Multimodal Deep Learning Multimodal Sentiment Analysis

ShapeWorld - A new test methodology for multimodal language understanding

AlexKuhnle/ShapeWorld 14 Apr 2017

We introduce a novel framework for evaluating multimodal deep learning models with respect to their language understanding and generalization abilities.

Multimodal Deep Learning Visual Question Answering

Image Search With Text Feedback by Visiolinguistic Attention Learning

yanbeic/VAL CVPR 2020

In this work, we tackle image search with text feedback using a novel Visiolinguistic Attention Learning (VAL) framework.

Deep Attention Multimodal Deep Learning +1

Distilling Audio-Visual Knowledge by Compositional Contrastive Learning

yanbeic/CCL CVPR 2021

Having access to multi-modal cues (e.g., vision and audio) allows some cognitive tasks to be done faster than when learning from a single modality.

Audio Tagging audio-visual learning +5

Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion

shamanez/Self-Supervised-Embedding-Fusion-Transformer 27 Oct 2020

Emotion recognition is a challenging research area given its complex nature: humans express emotional cues across various modalities such as language, facial expressions, and speech.

Multimodal Deep Learning Multimodal Emotion Recognition +3

Multimodal deep networks for text and image-based document classification

Quicksign/ocrized-text-dataset 15 Jul 2019

Classification of document images is a critical step for archival of old manuscripts, online subscription and administrative procedures.

Classification Document Classification +4

More Diverse Means Better: Multimodal Deep Learning Meets Remote Sensing Imagery Classification

danfenghong/IEEE_TGRS_MDL-RS 12 Aug 2020

In particular, we also investigate a special case of multi-modality learning (MML), cross-modality learning (CML), which is common in remote sensing (RS) image classification applications.

Classification General Classification +2