Cross-Modal Retrieval

16 papers with code · Miscellaneous
Subtask of Multi-Modal

Greatest papers with code

VSE++: Improving Visual-Semantic Embeddings with Hard Negatives

18 Jul 2017 fartashf/vsepp

We present a new technique for learning visual-semantic embeddings for cross-modal retrieval.

CROSS-MODAL RETRIEVAL · IMAGE RETRIEVAL · STRUCTURED PREDICTION
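
The "hard negatives" in the VSE++ title refer to penalizing only the most confusing non-matching item for each image-caption pair. A minimal sketch of such a hardest-negative hinge loss follows. It assumes cosine similarity, a margin of 0.2, and a sum over the batch; the function names are hypothetical and this is not the code from fartashf/vsepp.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hardest_negative_loss(image_embs, caption_embs, margin=0.2):
    """Hinge loss that uses only the hardest negative in the batch.

    Assumption: image_embs[i] and caption_embs[i] form a matching pair,
    and every other pairing in the batch counts as a negative.
    """
    n = len(image_embs)
    # sim[i][j] = similarity between image i and caption j
    sim = [[cosine(image_embs[i], caption_embs[j]) for j in range(n)]
           for i in range(n)]
    total = 0.0
    for i in range(n):
        positive = sim[i][i]
        # Most similar wrong caption for image i
        hard_caption = max((sim[i][j] for j in range(n) if j != i),
                           default=float("-inf"))
        # Most similar wrong image for caption i
        hard_image = max((sim[j][i] for j in range(n) if j != i),
                         default=float("-inf"))
        total += max(0.0, margin + hard_caption - positive)
        total += max(0.0, margin + hard_image - positive)
    return total
```

With well-separated embeddings the loss is zero, because every positive pair already beats its hardest negative by more than the margin. Swapping the captions in a two-item batch makes every negative outscore its positive, and the loss grows accordingly.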

Dual-Path Convolutional Image-Text Embedding with Instance Loss

15 Nov 2017 layumi/Image-Text-Embedding

In this paper, we propose a new system to discriminatively embed the image and text to a shared visual-textual space.

CONTENT-BASED IMAGE RETRIEVAL · CROSS-MODAL RETRIEVAL · PERSON RETRIEVAL · TEXT-IMAGE RETRIEVAL

Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval

CVPR 2019 yalesong/pvse

In this work, we introduce Polysemous Instance Embedding Networks (PIE-Nets) that compute multiple and diverse representations of an instance by combining global context with locally-guided features via multi-head self-attention and residual learning.

CROSS-MODAL RETRIEVAL · MULTIPLE INSTANCE LEARNING

Learning Cross-Modal Embeddings with Adversarial Networks for Cooking Recipes and Food Images

CVPR 2019 hwang1996/ACME

Food computing is playing an increasingly important role in human daily life, and has found tremendous applications in guiding human behavior towards smart food consumption and a healthy lifestyle.

CROSS-MODAL RETRIEVAL

Content-Based Video-Music Retrieval Using Soft Intra-Modal Structure Constraint

22 Apr 2017 csehong/VM-NET

Up to now, only limited research has been conducted on cross-modal retrieval of suitable music for a specified video or vice versa.

CROSS-MODAL RETRIEVAL

Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings

30 Apr 2018 Cadene/recipe1m.bootstrap.pytorch

Designing powerful tools that support cooking activities has rapidly gained popularity due to the massive amounts of available data, as well as recent advances in machine learning that are capable of analyzing them.

CROSS-MODAL RETRIEVAL

Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions

23 Jun 2016 AlexMoreo/tensorflow-Text2Vis

We choose to implement the actual search process as a similarity search in a visual feature space, by learning to translate a textual query into a visual representation.

CROSS-MODAL INFORMATION RETRIEVAL · CROSS-MODAL RETRIEVAL · IMAGE RETRIEVAL
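
The search process described in this entry, translating a textual query into a visual representation and then running a similarity search in the visual feature space, can be sketched roughly as below. Everything here is an assumption for illustration: `text_to_visual` is a hypothetical stand-in that uses a fixed linear map in place of the learned translation, and ranking by cosine similarity is one plausible choice, not necessarily the one used in AlexMoreo/tensorflow-Text2Vis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def text_to_visual(text_features, weights):
    """Hypothetical stand-in for the learned text-to-visual translation.

    Applies a plain linear map; each row of `weights` produces one
    dimension of the visual representation.
    """
    return [sum(w * x for w, x in zip(row, text_features)) for row in weights]


def search(query_visual, image_index, top_k=2):
    """Rank indexed images by similarity to the translated query.

    `image_index` maps an image id to its visual feature vector.
    """
    scored = [(cosine(query_visual, feats), image_id)
              for image_id, feats in image_index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [image_id for _, image_id in scored[:top_k]]
```

The point of this design is that the image index never needs to know about text: only the query is mapped across modalities, so the index can be built once from visual features alone.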

See, Hear, and Read: Deep Aligned Representations

3 Jun 2017 jingliao132/CrossModalRetrieval

We capitalize on large amounts of readily-available, synchronous data to learn deep discriminative representations shared across three major natural modalities: vision, sound and language.

CROSS-MODAL RETRIEVAL · REPRESENTATION LEARNING

Show, Translate and Tell

14 Mar 2019 peri044/STT

Humans have an incredible ability to process and understand information from multiple sources such as images, video, text, and speech.

CROSS-MODAL RETRIEVAL · IMAGE CAPTIONING