Search Results for author: Yale Song

Found 37 papers, 14 papers with code

Neural-Sim: Learning to Generate Training Data with NeRF

1 code implementation 22 Jul 2022 Yunhao Ge, Harkirat Behl, Jiashu Xu, Suriya Gunasekar, Neel Joshi, Yale Song, Xin Wang, Laurent Itti, Vibhav Vineet

However, existing approaches either require human experts to manually tune each scene property or use automatic methods that provide little to no control; this requires rendering large amounts of random data variations, which is slow and is often suboptimal for the target domain.

Object Detection

Visual Attention Emerges from Recurrent Sparse Reconstruction

1 code implementation 23 Apr 2022 Baifeng Shi, Yale Song, Neel Joshi, Trevor Darrell, Xin Wang

We present VARS, Visual Attention from Recurrent Sparse reconstruction, a new attention formulation built on two prominent features of the human visual attention mechanism: recurrency and sparsity.

Robust Contrastive Learning against Noisy Views

1 code implementation CVPR 2022 Ching-Yao Chuang, R Devon Hjelm, Xin Wang, Vibhav Vineet, Neel Joshi, Antonio Torralba, Stefanie Jegelka, Yale Song

Contrastive learning relies on an assumption that positive pairs contain related views, e.g., patches of an image or co-occurring multimodal signals of a video, that share certain underlying information about an instance.

Contrastive Learning

Contrastive Learning of Global and Local Video Representations

no code implementations NeurIPS 2021 Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song

In this work, we propose to learn video representations that generalize to both the tasks which require global semantic information (e.g., classification) and the tasks that require local fine-grained spatio-temporal information (e.g., localization).

Classification Contrastive Learning +3

Contrastive Learning of Global-Local Video Representations

1 code implementation 7 Apr 2021 Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song

In this work, we propose to learn video representations that generalize to both the tasks which require global semantic information (e.g., classification) and the tasks that require local fine-grained spatio-temporal information (e.g., localization).

Classification Contrastive Learning +5

DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents

no code implementations 28 Jan 2021 Tsu-Jui Fu, William Yang Wang, Daniel McDuff, Yale Song

Creating presentation materials requires complex multimodal reasoning skills to summarize key concepts and arrange them in a logical and visually pleasing manner.

Document Summarization

ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning

1 code implementation ICCV 2021 Sangho Lee, Jiwan Chung, Youngjae Yu, Gunhee Kim, Thomas Breuel, Gal Chechik, Yale Song

We demonstrate that our approach finds videos with high audio-visual correspondence and show that self-supervised models trained on our data achieve competitive performances compared to models trained on existing manually curated datasets.

Representation Learning

Self-Supervised Learning of Compressed Video Representations

no code implementations ICLR 2021 Youngjae Yu, Sangho Lee, Gunhee Kim, Yale Song

We show that our approach achieves competitive performance on self-supervised learning of video representations with a considerable improvement in speed compared to the traditional methods.

Self-Supervised Learning

Parameter Efficient Multimodal Transformers for Video Representation Learning

no code implementations ICLR 2021 Sangho Lee, Youngjae Yu, Gunhee Kim, Thomas Breuel, Jan Kautz, Yale Song

The recent success of Transformers in the language domain has motivated adapting it to a multimodal setting, where a new visual model is trained in tandem with an already pretrained language model.

Language Modelling Representation Learning

Learning to Transfer Visual Effects from Videos to Images

no code implementations 3 Dec 2020 Christopher Thomas, Yale Song, Adriana Kovashka

We study the problem of animating images by transferring spatio-temporal visual effects (such as melting) from a collection of videos.

Optical Flow Estimation

Active Contrastive Learning of Audio-Visual Video Representations

1 code implementation ICLR 2021 Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song

Contrastive learning has been shown to produce generalizable representations of audio and visual data by maximizing the lower bound on the mutual information (MI) between different views of an instance.

Contrastive Learning Representation Learning +1

Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency

no code implementations 25 Oct 2019 Matt Whitehill, Shuang Ma, Daniel McDuff, Yale Song

We use this method to transfer emotion from a dataset containing four emotions to a dataset with only a single emotion.

Emotion Classification Style Transfer

Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck

1 code implementation ICCV 2019 Shuang Ma, Daniel McDuff, Yale Song

We propose a multimodal information bottleneck approach that learns the correspondence between modalities from unpaired data (image and speech) by leveraging the shared modality (text).

Image Generation Speech Synthesis

Image to Video Domain Adaptation Using Web Supervision

no code implementations 5 Aug 2019 Andrew Kae, Yale Song

Training deep neural networks typically requires large amounts of labeled data which may be scarce or expensive to obtain for a particular target domain.

Domain Adaptation

M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention

no code implementations 9 Jul 2019 Shuang Ma, Daniel McDuff, Yale Song

Generative adversarial networks have led to significant advances in cross-modal/domain translation.

Dialogue Generation Image Captioning +5

Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval

1 code implementation CVPR 2019 Yale Song, Mohammad Soleymani

In this work, we introduce Polysemous Instance Embedding Networks (PIE-Nets) that compute multiple and diverse representations of an instance by combining global context with locally-guided features via multi-head self-attention and residual learning.

Cross-Modal Retrieval Multiple Instance Learning +1

Characterizing Bias in Classifiers using Generative Models

1 code implementation NeurIPS 2019 Daniel McDuff, Shuang Ma, Yale Song, Ashish Kapoor

Models that are learned from real-world data are often biased because the data used to train them is biased.

Image Classification

Neural TTS Stylization with Adversarial and Collaborative Games

no code implementations ICLR 2019 Shuang Ma, Daniel McDuff, Yale Song

The synthesized audio waveform is expected to contain the verbal content of x_txt and the auditory style of x_aud.

Disentanglement Style Transfer

Video Prediction with Appearance and Motion Conditions

no code implementations ICML 2018 Yunseok Jang, Gunhee Kim, Yale Song

Video prediction aims to generate realistic future frames by learning dynamic visual patterns.

Video Prediction

Cross-Modal Retrieval with Implicit Concept Association

no code implementations 12 Apr 2018 Yale Song, Mohammad Soleymani

Traditional cross-modal retrieval assumes explicit association of concepts across modalities, where there is no ambiguity in how the concepts are linked to each other, e.g., when we do the image search with a query "dogs", we expect to see dog images.

Cross-Modal Retrieval Image Retrieval +1

Image2GIF: Generating Cinemagraphs using Recurrent Deep Q-Networks

no code implementations 27 Jan 2018 Yipin Zhou, Yale Song, Tamara L. Berg

Given a still photograph, one can imagine how dynamic objects might move against a static background.

Improving Pairwise Ranking for Multi-label Image Classification

4 code implementations CVPR 2017 Yuncheng Li, Yale Song, Jiebo Luo

Pairwise ranking, in particular, has been successful in multi-label image classification, achieving state-of-the-art results on various benchmarks.

Classification General Classification +2

Learning from Noisy Labels with Distillation

no code implementations ICCV 2017 Yuncheng Li, Jianchao Yang, Yale Song, Liangliang Cao, Jiebo Luo, Li-Jia Li

The ability to learn from noisy labels is very useful in many visual recognition tasks, as vast amounts of data with noisy labels are relatively easy to obtain.

Real-Time Video Highlights for Yahoo Esports

no code implementations 27 Nov 2016 Yale Song

We present a technique for detecting highlights from live streaming videos of esports game matches.

Dota 2 League of Legends +1

To Click or Not To Click: Automatic Selection of Beautiful Thumbnails from Videos

2 code implementations 6 Sep 2016 Yale Song, Miriam Redi, Jordi Vallmitjana, Alejandro Jaimes

Our system selects attractive thumbnails by analyzing various visual quality and aesthetic metrics of video frames, and performs a clustering analysis to determine the relevance to video content, thus making the resulting thumbnails more representative of the video.

Multimedia

Video2GIF: Automatic Generation of Animated GIFs from Video

1 code implementation CVPR 2016 Michael Gygli, Yale Song, Liangliang Cao

We introduce the novel problem of automatically generating animated GIFs from video.

Balancing Appearance and Context in Sketch Interpretation

no code implementations 25 Apr 2016 Yale Song, Randall Davis, Kaichen Ma, Dana L. Penny

We describe a sketch interpretation system that detects and classifies clock numerals created by subjects taking the Clock Drawing Test, a clinical tool widely used to screen for cognitive impairments (e.g., dementia).

TGIF: A New Dataset and Benchmark on Animated GIF Description

1 code implementation CVPR 2016 Yuncheng Li, Yale Song, Liangliang Cao, Joel Tetreault, Larry Goldberg, Alejandro Jaimes, Jiebo Luo

The motivation for this work is to develop a testbed for image sequence description systems, where the task is to generate natural language descriptions for animated GIFs or video clips.

Image Captioning Machine Translation +3

Video Co-Summarization: Video Summarization by Visual Co-Occurrence

no code implementations CVPR 2015 Wen-Sheng Chu, Yale Song, Alejandro Jaimes

We present video co-summarization, a novel perspective to video summarization that exploits visual co-occurrence across multiple videos.

Video Summarization

TVSum: Summarizing Web Videos Using Titles

no code implementations CVPR 2015 Yale Song, Jordi Vallmitjana, Amanda Stent, Alejandro Jaimes

We observe that a video title is often carefully chosen to be maximally descriptive of its main topic, and hence images related to the title can serve as a proxy for important visual concepts of the main topic.

Image Retrieval Unsupervised Video Summarization

Action Recognition by Hierarchical Sequence Summarization

no code implementations CVPR 2013 Yale Song, Louis-Philippe Morency, Randall Davis

We develop an efficient learning method to train our model and show that its complexity grows sublinearly with the size of the hierarchy.

Action Recognition
