Associating Neural Word Embeddings With Deep Image Representations Using Fisher Vectors

CVPR 2015 · Benjamin Klein, Guy Lev, Gil Sadeh, Lior Wolf

In recent years, the problem of associating a sentence with an image has gained a lot of attention. This work continues that line and improves performance on both image annotation and sentence-based image search. We use the Fisher Vector as a sentence representation, pooling the word2vec embeddings of the words in the sentence. The Fisher Vector is typically computed as the gradient of the descriptors' log-likelihood with respect to the parameters of a Gaussian Mixture Model (GMM). In this work we present two other mixture models and derive their Expectation-Maximization updates and Fisher Vector expressions. The first is a Laplacian Mixture Model (LMM), based on the Laplacian distribution. The second is a Hybrid Gaussian-Laplacian Mixture Model (HGLMM), whose components are weighted geometric means of Gaussian and Laplacian distributions. Finally, using the new Fisher Vectors derived from HGLMMs to represent sentences, we achieve state-of-the-art results on both image annotation and sentence-based image search on four benchmarks: Pascal1K, Flickr8K, Flickr30K, and COCO.
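As a reading aid, below is a minimal sketch of the baseline the abstract starts from: Fisher Vector pooling of per-word embeddings under a diagonal-covariance GMM. This is not the authors' code; the function name, the component count, the embedding dimension, and the use of scikit-learn are illustrative assumptions, and the gradient formulas are the standard mean/variance FV expressions with power and L2 normalization. The paper's contribution (the HGLMM-derived Fisher Vector) would replace the Gaussian component density with a weighted geometric mean of Gaussian and Laplacian densities.

```python
# A minimal sketch of GMM-based Fisher Vector pooling for sentence
# representation, assuming scikit-learn and standard FV formulas.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """Encode a set of descriptors (e.g. the word2vec vectors of one
    sentence's words) as a Fisher Vector under a fitted diagonal GMM."""
    T = descriptors.shape[0]
    gamma = gmm.predict_proba(descriptors)            # (T, K) posteriors
    mu, var, w = gmm.means_, gmm.covariances_, gmm.weights_
    sigma = np.sqrt(var)                              # (K, D) std devs

    # Normalized deviation of each descriptor from each component mean.
    diff = (descriptors[:, None, :] - mu[None, :, :]) / sigma[None, :, :]  # (T, K, D)

    # Gradients of the log-likelihood w.r.t. means and std devs.
    g_mu = np.einsum('tk,tkd->kd', gamma, diff) / (T * np.sqrt(w)[:, None])
    g_sigma = np.einsum('tk,tkd->kd', gamma, diff ** 2 - 1.0) / (T * np.sqrt(2 * w)[:, None])

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    # Power normalization followed by L2 normalization, as is standard.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)

# Illustrative usage: fit the GMM on word vectors from a training corpus,
# then encode each sentence (here: 7 words, 50-dim embeddings) as one
# fixed-length vector of size 2 * K * D.
rng = np.random.default_rng(0)
gmm = GaussianMixture(n_components=16, covariance_type='diag',
                      random_state=0).fit(rng.standard_normal((1000, 50)))
sentence_vec = fisher_vector(rng.standard_normal((7, 50)), gmm)
```

The resulting fixed-length vector is what gets matched against the image representation; the design choice the paper explores is which mixture density (Gaussian, Laplacian, or the hybrid) the gradients are taken against.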


Results from the Paper

| Task | Dataset | Model | Metric Name | Metric Value | Rank |
|---|---|---|---|---|---|
| Video Retrieval | YouCook2 | HGLMM FV CCA | text-to-video Median Rank | 75 | #9 |
| Video Retrieval | YouCook2 | HGLMM FV CCA | text-to-video R@1 | 4.6 | #14 |
| Video Retrieval | YouCook2 | HGLMM FV CCA | text-to-video R@10 | 21.6 | #15 |
| Video Retrieval | YouCook2 | HGLMM FV CCA | text-to-video R@5 | 14.3 | #13 |
