Descriptive
327 papers with code • 1 benchmark • 1 dataset
Most implemented papers
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
Books are a rich source of both fine-grained information (how a character, an object, or a scene looks) and high-level semantics (what someone is thinking and feeling, and how these states evolve through a story).
Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text
This paper investigates how linguistic knowledge mined from large text corpora can aid the generation of natural language descriptions of videos.
A Hierarchical Approach for Generating Descriptive Image Paragraphs
Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail.
PL-SLAM: a Stereo SLAM System through the Combination of Points and Line Segments
This paper proposes PL-SLAM, a stereo visual SLAM system that combines both points and line segments to work robustly in a wider variety of scenarios, particularly in those where point features are scarce or not well-distributed in the image.
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations.
Uninformed Students: Student-Teacher Anomaly Detection with Discriminative Latent Embeddings
Our experiments demonstrate improvements over state-of-the-art methods on a number of real-world datasets, including the recently introduced MVTec Anomaly Detection dataset that was specifically designed to benchmark anomaly segmentation algorithms.
Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search
In this research work we present CLIP-GLaSS, a novel zero-shot framework to generate an image (or a caption) corresponding to a given caption (or image).
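The core idea of such zero-shot generation is to search a pretrained generator's latent space for a point whose output best matches the caption under an image-text similarity score. Below is a minimal, self-contained sketch of that loop using hill climbing; `generate` and `clip_score` are hypothetical stand-ins (in CLIP-GLaSS they would be a pretrained GAN and the CLIP similarity, and the search is more sophisticated than plain hill climbing).

```python
import random

# Hypothetical stand-ins: a real system would use a pretrained
# generator (e.g. a GAN) and the CLIP image-text similarity score.
def generate(latent):
    return latent  # a real generator maps latent vector -> image

def clip_score(image, caption):
    # toy score for illustration: closeness to a fixed target vector
    target = [0.5] * len(image)
    return -sum((a - b) ** 2 for a, b in zip(image, target))

def latent_search(caption, dim=8, iters=200, sigma=0.1, seed=0):
    """Hill-climb over the latent space, maximizing the similarity score."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(dim)]
    best = clip_score(generate(z), caption)
    for _ in range(iters):
        cand = [x + rng.gauss(0, sigma) for x in z]  # perturb the latent
        s = clip_score(generate(cand), caption)
        if s > best:  # keep the mutation only if the score improves
            z, best = cand, s
    return z, best
```

The same loop runs in the opposite direction for captioning: search over caption candidates instead of latents, scoring each against the fixed input image.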
Music transcription modelling and composition using deep learning
We apply deep learning methods, specifically long short-term memory (LSTM) networks, to music transcription modelling and composition.
Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions
We choose to implement the actual search process as a similarity search in a visual feature space, by learning to translate a textual query into a visual representation.
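At retrieval time this amounts to projecting the text query into the visual feature space and ranking indexed images by similarity. A minimal sketch, with `text_to_visual` as a hypothetical stand-in for the learned text-to-visual projection (the paper trains a network for this mapping):

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical stand-in: a real system would use a learned network
# that regresses a visual feature vector from the textual query.
def text_to_visual(query, dim=4):
    vec = [0.0] * dim
    for i, ch in enumerate(query):
        vec[i % dim] += ord(ch) / 1000.0
    return vec

def search(query, image_features, top_k=3):
    """Rank indexed image features by similarity to the projected query."""
    dim = len(next(iter(image_features.values())))
    q = text_to_visual(query, dim=dim)
    ranked = sorted(image_features.items(),
                    key=lambda kv: cosine(q, kv[1]), reverse=True)
    return ranked[:top_k]
```

With the projection learned, retrieval reduces to a standard nearest-neighbor search over precomputed image features, so no captions need to be stored at index time.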
A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering
In addition to presenting statistics and a description of the dataset, we perform a detailed analysis of 5 different models' predictions, and compare these with human performance.