We propose an algorithm to denoise speech recorded with a single microphone in the presence of non-stationary, dynamic noise.
Although the matrix determined by the output weights depends on a set of known speakers, we use only the input vectors during inference.
Specifically, we train a deep learning image tagging and retrieval system on large-scale user-generated content (UGC) using sampling methods and joint optimization of word embeddings.
We consider the visual sentiment task of mapping an image to an adjective-noun pair (ANP), such as "cute baby".
We present the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M), the largest public multimedia collection that has ever been released.
Multimedia; Computers and Society; H.3.7