
327 papers with code • 1 benchmark • 1 dataset


Most implemented papers

Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books

soskek/homemade_bookcorpus ICCV 2015

Books are a rich source of both fine-grained information (what a character, an object or a scene looks like) and high-level semantics (what someone is thinking and feeling, and how these states evolve through a story).

Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text

TejInaco/multimodalML EMNLP 2016

This paper investigates how linguistic knowledge mined from large text corpora can aid the generation of natural language descriptions of videos.

A Hierarchical Approach for Generating Descriptive Image Paragraphs

chenxinpeng/im2p CVPR 2017

Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail.

PL-SLAM: a Stereo SLAM System through the Combination of Points and Line Segments

rubengooj/pl-slam 26 May 2017

This paper proposes PL-SLAM, a stereo visual SLAM system that combines both points and line segments to work robustly in a wider variety of scenarios, particularly in those where point features are scarce or not well-distributed in the image.

CLEVRER: CoLlision Events for Video REpresentation and Reasoning

chuangg/CLEVRER ICLR 2020

While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations.

Uninformed Students: Student-Teacher Anomaly Detection with Discriminative Latent Embeddings

denguir/student-teacher-anomaly-detection CVPR 2020

Our experiments demonstrate improvements over state-of-the-art methods on a number of real-world datasets, including the recently introduced MVTec Anomaly Detection dataset that was specifically designed to benchmark anomaly segmentation algorithms.

Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search

galatolofederico/clip-glass 2 Feb 2021

In this research work we present CLIP-GLaSS, a novel zero-shot framework to generate an image (or a caption) corresponding to a given caption (or image).

Music transcription modelling and composition using deep learning

IraKorshunova/folk-rnn 29 Apr 2016

We apply deep learning methods, specifically long short-term memory (LSTM) networks, to music transcription modelling and composition.

Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions

AlexMoreo/tensorflow-Tex2Vis 23 Jun 2016

We choose to implement the actual search process as a similarity search in a visual feature space, by learning to translate a textual query into a visual representation.
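
The similarity-search step described above can be illustrated with a minimal sketch. It assumes the textual query has already been translated into a vector in the visual feature space; the learned translation model is not shown. The names `cosine_similarity`, `rank_images`, `image_feats` and `query_vec`, the toy vectors, and the choice of cosine similarity as the metric are all assumptions made for illustration, not details taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(query_vec, image_feats):
    """Return image ids ordered from most to least similar to the query vector."""
    scored = [
        (cosine_similarity(query_vec, feat), image_id)
        for image_id, feat in image_feats.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored]


# Hypothetical toy features: each image is a point in a 3-dimensional visual space.
image_feats = {
    "beach": [0.9, 0.1, 0.0],
    "forest": [0.1, 0.9, 0.2],
    "city": [0.0, 0.2, 0.9],
}

# Placeholder for the output of a text-to-visual translation model (assumed, not shown).
query_vec = [0.8, 0.2, 0.1]

ranking = rank_images(query_vec, image_feats)
print(ranking)
```

In a real system the feature vectors would come from a visual encoder and the ranking would typically run over an indexed collection rather than a dictionary scan.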

A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering

teganmaharaj/movieFIB CVPR 2017

In addition to presenting statistics and a description of the dataset, we perform a detailed analysis of 5 different models' predictions, and compare these with human performance.