ShapeWorld - A new test methodology for multimodal language understanding

We introduce a novel framework for evaluating multimodal deep learning models with respect to their language understanding and generalization abilities.

Multimodal deep networks for text and image-based document classification

Classification of document images is a critical step for archival of old manuscripts, online subscription and administrative procedures.

aiMotive Dataset: A Multimodal Dataset for Robust Autonomous Driving with Long-Range Perception

The dataset consists of 176 scenes with synchronized and calibrated LiDAR, camera, and radar sensors covering a 360-degree field of view.

Multimodal Deep Learning for Robust RGB-D Object Recognition

Robust object recognition is a crucial ingredient of many, if not all, real-world robotics applications.

Are These Birds Similar: Learning Branched Networks for Fine-grained Representations

In recent years, natural language descriptions have been used to obtain information on discriminative parts of the object.

Supervised Video Summarization via Multiple Feature Sets with Parallel Attention

The proposed architecture utilizes an attention mechanism before fusing motion features and features representing the (static) visual content, i.e., derived from an image classification model.

A multimodal deep learning framework for scalable content based visual media retrieval

We propose a novel, efficient, modular, and scalable framework for content-based visual media retrieval that leverages deep learning and works for both images and videos. We also introduce an efficient comparison and filtering metric for retrieval.

Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis

Multimodal sentiment analysis aims to extract and integrate semantic information collected from multiple modalities to recognize the expressed emotions and sentiment in multimodal data.

XFlow: Cross-modal Deep Neural Networks for Audiovisual Classification

Our work improves on existing multimodal deep learning algorithms in two essential ways: (1) it presents a novel method for performing cross-modality (before features are learned from individual modalities) and (2) it extends the previously proposed cross-connections, which only transfer information between streams that process compatible data.

Learn to Combine Modalities in Multimodal Deep Learning

Combining complementary information from multiple modalities is intuitively appealing for improving the performance of learning-based approaches.