Search Results for author: Kristen Grauman

Found 139 papers, 39 papers with code

Proposal-based Video Completion

no code implementations ECCV 2020 Yuan-Ting Hu, Heng Wang, Nicolas Ballas, Kristen Grauman, Alexander G. Schwing

Video inpainting is an important technique for a wide variety of applications from video content editing to video restoration.

Image Inpainting object-detection +4

Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos

no code implementations 11 Mar 2024 Mi Luo, Zihui Xue, Alex Dimakis, Kristen Grauman

We investigate exocentric-to-egocentric cross-view translation, which aims to generate a first-person (egocentric) view of an actor based on a video recording that captures the actor from a third-person (exocentric) perspective.

Hallucination Translation

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

no code implementations 30 Nov 2023 Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei HUANG, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge.

Video Understanding

Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos

no code implementations 10 Jul 2023 Sagnik Majumder, Ziad Al-Halah, Kristen Grauman

We propose a self-supervised method for learning representations based on spatial audio-visual correspondences in egocentric videos.

Audio Denoising Denoising

SpotEM: Efficient Video Search for Episodic Memory

no code implementations 28 Jun 2023 Santhosh Kumar Ramakrishnan, Ziad Al-Halah, Kristen Grauman

The goal in episodic memory (EM) is to search a long egocentric video to answer a natural language query (e.g., "where did I leave my purse?").

Natural Language Queries

Egocentric Video Task Translation @ Ego4D Challenge 2022

no code implementations 3 Feb 2023 Zihui Xue, Yale Song, Kristen Grauman, Lorenzo Torresani

With no modification to the baseline architectures, our proposed approach achieves competitive performance on two Ego4D challenges, ranking 1st in the Talking-to-Me challenge and 3rd in the PNR keyframe localization challenge.

Translation

Novel-View Acoustic Synthesis

no code implementations CVPR 2023 Changan Chen, Alexander Richard, Roman Shapovalov, Vamsi Krishna Ithapu, Natalia Neverova, Kristen Grauman, Andrea Vedaldi

We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint?

Neural Rendering Novel View Synthesis

HierVL: Learning Hierarchical Video-Language Embeddings

no code implementations CVPR 2023 Kumar Ashutosh, Rohit Girdhar, Lorenzo Torresani, Kristen Grauman

Video-language embeddings are a promising avenue for injecting semantics into visual representations, but existing methods capture only short-term associations between seconds-long video clips and their accompanying text.

Action Classification Action Recognition +3

What You Say Is What You Show: Visual Narration Detection in Instructional Videos

no code implementations5 Jan 2023 Kumar Ashutosh, Rohit Girdhar, Lorenzo Torresani, Kristen Grauman

Narrated "how-to" videos have emerged as a promising data source for a wide range of learning problems, from learning visual representations to training robot policies.

Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations

no code implementations CVPR 2023 Sagnik Majumder, Hao Jiang, Pierre Moulon, Ethan Henderson, Paul Calamia, Kristen Grauman, Vamsi Krishna Ithapu

Can conversational videos captured from multiple egocentric viewpoints reveal the map of a scene in a cost-efficient way?

NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory

1 code implementation CVPR 2023 Santhosh Kumar Ramakrishnan, Ziad Al-Halah, Kristen Grauman

Searching long egocentric videos with natural language queries (NLQ) has compelling applications in augmented reality and robotics, where a fluid index into everything that a person (agent) has seen before could augment human memory and surface relevant information on demand.

Data Augmentation Natural Language Queries

Egocentric Video Task Translation

no code implementations CVPR 2023 Zihui Xue, Yale Song, Kristen Grauman, Lorenzo Torresani

Different video understanding tasks are typically treated in isolation, and even with distinct types of curated data (e.g., classifying sports in one dataset, tracking animals in another).

Multi-Task Learning Translation +1

Few-Shot Audio-Visual Learning of Environment Acoustics

no code implementations 8 Jun 2022 Sagnik Majumder, Changan Chen, Ziad Al-Halah, Kristen Grauman

Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener, with implications for various applications in AR, VR, and robotics.

audio-visual learning Room Impulse Response (RIR)
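Concretely, once an RIR has been estimated, rendering a sound "in" that environment amounts to convolving the dry source signal with the response. A minimal sketch with NumPy (the two-echo RIR below is a synthetic toy for illustration, not a prediction from the paper's model):

```python
import numpy as np

def apply_rir(dry: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Render a dry (anechoic) signal as heard in a room by convolving with its RIR."""
    return np.convolve(dry, rir)

# Toy RIR at 16 kHz: direct path plus two decaying reflections (purely illustrative).
sr = 16000
rir = np.zeros(sr // 4)
rir[0] = 1.0       # direct sound
rir[800] = 0.5     # early reflection after 50 ms
rir[3200] = 0.25   # later reflection after 200 ms

dry = np.random.randn(sr)   # 1 s of "dry" audio
wet = apply_rir(dry, rir)
print(wet.shape)            # full convolution: len(dry) + len(rir) - 1 samples
```

In practice one would use an FFT-based convolution for long responses; the plain `np.convolve` above keeps the example self-contained.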

Visual Acoustic Matching

no code implementations CVPR 2022 Changan Chen, Ruohan Gao, Paul Calamia, Kristen Grauman

We introduce the visual acoustic matching task, in which an audio clip is transformed to sound like it was recorded in a target environment.

Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation

no code implementations CVPR 2022 Ziad Al-Halah, Santhosh K. Ramakrishnan, Kristen Grauman

In reinforcement learning for visual navigation, it is common to develop a model for each new task, and train that model from scratch with task-specific interactions in 3D environments.

Transfer Learning Visual Navigation

Active Audio-Visual Separation of Dynamic Sound Sources

1 code implementation 2 Feb 2022 Sagnik Majumder, Kristen Grauman

We explore active audio-visual separation for dynamic sound sources, where an embodied agent moves intelligently in a 3D environment to continuously isolate the time-varying audio stream being emitted by an object of interest.

DexVIP: Learning Dexterous Grasping with Human Hand Pose Priors from Video

no code implementations 1 Feb 2022 Priyanka Mandikal, Kristen Grauman

Dexterous multi-fingered robotic hands have a formidable action space, yet their morphological similarity to the human hand holds immense potential to accelerate robot learning.

Human-Object Interaction Detection Robotic Grasping

PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning

no code implementations CVPR 2022 Santhosh Kumar Ramakrishnan, Devendra Singh Chaplot, Ziad Al-Halah, Jitendra Malik, Kristen Grauman

We propose Potential functions for ObjectGoal Navigation with Interaction-free learning (PONI), a modular approach that disentangles the skills of "where to look?"

Navigate

Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video

no code implementations 21 Nov 2021 Rishabh Garg, Ruohan Gao, Kristen Grauman

Binaural audio provides human listeners with an immersive spatial sound experience, but most existing videos lack binaural audio recordings.

Multi-Task Learning Room Impulse Response (RIR)

Shaping embodied agent behavior with activity-context priors from egocentric video

no code implementations NeurIPS 2021 Tushar Nagarajan, Kristen Grauman

For a given object, an activity-context prior represents the set of other compatible objects that are required for activities to succeed (e.g., a knife and cutting board brought together with a tomato are conducive to cutting).
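In its simplest form, such a prior can be approximated by co-occurrence statistics over video clips; a toy sketch (the object sets and target object are illustrative, not the paper's learned prior):

```python
from collections import Counter

def activity_context_prior(clips, target):
    """Estimate, from clips (each a set of visible objects), how often each
    other object co-occurs with the target object: a toy co-occurrence prior."""
    counts, n = Counter(), 0
    for objs in clips:
        if target in objs:
            n += 1
            counts.update(o for o in objs if o != target)
    return {o: c / n for o, c in counts.items()} if n else {}

clips = [{"knife", "tomato", "cutting board"},
         {"knife", "bread"},
         {"cup", "kettle"}]
print(activity_context_prior(clips, "knife"))
```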

Ego4D: Around the World in 3,000 Hours of Egocentric Video

5 code implementations CVPR 2022 Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

Shapes as Product Differentiation: Neural Network Embedding in the Analysis of Markets for Fonts

1 code implementation 6 Jul 2021 Sukjin Han, Eric H. Schulman, Kristen Grauman, Santhosh Ramakrishnan

We then study the causal effects of a merger on the merging firm's creative decisions using the constructed measures in a synthetic control method.

Network Embedding

Learning Audio-Visual Dereverberation

1 code implementation 14 Jun 2021 Changan Chen, Wei Sun, David Harwath, Kristen Grauman

We introduce Visually-Informed Dereverberation of Audio (VIDA), an end-to-end approach that learns to remove reverberation based on both the observed monaural sound and visual scene.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Anticipative Video Transformer

1 code implementation ICCV 2021 Rohit Girdhar, Kristen Grauman

We propose Anticipative Video Transformer (AVT), an end-to-end attention-based video modeling architecture that attends to the previously observed video in order to anticipate future actions.

Ranked #2 on Action Anticipation on EPIC-KITCHENS-100 (test) (using extra training data)

Action Anticipation

Egocentric Activity Recognition and Localization on a 3D Map

no code implementations 20 May 2021 Miao Liu, Lingni Ma, Kiran Somasundaram, Yin Li, Kristen Grauman, James M. Rehg, Chao Li

Given a video captured from a first person perspective and the environment context of where the video is recorded, can we recognize what the person is doing and identify where the action occurs in the 3D space?

Action Localization Action Recognition +2

Move2Hear: Active Audio-Visual Source Separation

no code implementations ICCV 2021 Sagnik Majumder, Ziad Al-Halah, Kristen Grauman

We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest in its environment.

Audio Source Separation Object

Multiview Pseudo-Labeling for Semi-supervised Learning from Video

no code implementations ICCV 2021 Bo Xiong, Haoqi Fan, Kristen Grauman, Christoph Feichtenhofer

We present a multiview pseudo-labeling approach to video learning, a novel framework that uses complementary views in the form of appearance and motion information for semi-supervised learning in video.

Representation Learning Video Recognition

Environment Predictive Coding for Embodied Agents

no code implementations 3 Feb 2021 Santhosh K. Ramakrishnan, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman

We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents.

Self-Supervised Learning

From Culture to Clothing: Discovering the World Events Behind A Century of Fashion Images

no code implementations ICCV 2021 Wei-Lin Hsiao, Kristen Grauman

Fashion is intertwined with external cultural factors, but identifying these links remains a manual process limited to only the most salient phenomena.

Cultural Vocal Bursts Intensity Prediction

VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency

1 code implementation CVPR 2021 Ruohan Gao, Kristen Grauman

Given a video, the goal is to extract the speech associated with a face in spite of simultaneous background sounds and/or other human speakers.

Speech Separation

Semantic Audio-Visual Navigation

no code implementations CVPR 2021 Changan Chen, Ziad Al-Halah, Kristen Grauman

We propose a transformer-based model to tackle this new semantic AudioGoal task, incorporating an inferred goal descriptor that captures both spatial and semantic properties of the target.

Position Visual Navigation

Discovering Underground Maps from Fashion

no code implementations 4 Dec 2020 Utkarsh Mall, Kavita Bala, Tamara Berg, Kristen Grauman

The fashion sense (meaning the clothing styles people wear) in a geographical region can reveal information about that region.

Modeling Fashion Influence from Photos

no code implementations 17 Nov 2020 Ziad Al-Halah, Kristen Grauman

The discovered influence relationships reveal how both cities and brands exert and receive fashion influence for an array of visual styles inferred from the images.

Learning Dexterous Grasping with Object-Centric Visual Affordances

1 code implementation 3 Sep 2020 Priyanka Mandikal, Kristen Grauman

Our key idea is to embed an object-centric visual affordance model within a deep reinforcement learning loop to learn grasping policies that favor the same object regions favored by people.

Object Robotic Grasping

Learning Affordance Landscapes for Interaction Exploration in 3D Environments

1 code implementation NeurIPS 2020 Tushar Nagarajan, Kristen Grauman

We introduce a reinforcement learning approach for exploration for interaction, whereby an embodied agent autonomously discovers the affordance landscape of a new unmapped 3D environment (such as an unfamiliar kitchen).

Occupancy Anticipation for Efficient Exploration and Navigation

1 code implementation ECCV 2020 Santhosh K. Ramakrishnan, Ziad Al-Halah, Kristen Grauman

State-of-the-art navigation methods leverage a spatial memory to generalize to new environments, but their occupancy maps are limited to capturing the geometric structures directly observed by the agent.

Decision Making Efficient Exploration +1

Learning to Set Waypoints for Audio-Visual Navigation

1 code implementation ICLR 2021 Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh Kumar Ramakrishnan, Kristen Grauman

In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source (e.g., a phone ringing in another room).

Visual Navigation

Learning Patterns of Tourist Movement and Photography from Geotagged Photos at Archaeological Heritage Sites in Cuzco, Peru

no code implementations 29 Jun 2020 Nicole D. Payntar, Wei-Lin Hsiao, R. Alan Covey, Kristen Grauman

The popularity of media sharing platforms in recent decades has provided an abundance of open source data that remains underutilized by heritage scholars.

Cultural Vocal Bursts Intensity Prediction

VisualEchoes: Spatial Image Representation Learning through Echolocation

no code implementations ECCV 2020 Ruohan Gao, Changan Chen, Ziad Al-Halah, Carl Schissler, Kristen Grauman

Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans have the remarkable ability to perform echolocation: a biological sonar used to perceive spatial layout and locate objects in the world.

Monocular Depth Estimation Representation Learning +2

From Paris to Berlin: Discovering Fashion Style Influences Around the World

1 code implementation CVPR 2020 Ziad Al-Halah, Kristen Grauman

The evolution of clothing styles and their migration across the world is intriguing, yet difficult to describe quantitatively.

EGO-TOPO: Environment Affordances from Egocentric Video

1 code implementation CVPR 2020 Tushar Nagarajan, Yanghao Li, Christoph Feichtenhofer, Kristen Grauman

We introduce a model for environment affordances that is learned directly from egocentric video.

An Exploration of Embodied Visual Exploration

1 code implementation 7 Jan 2020 Santhosh K. Ramakrishnan, Dinesh Jayaraman, Kristen Grauman

Embodied computer vision considers perception for robots in novel, unstructured environments.

Benchmarking

SoundSpaces: Audio-Visual Navigation in 3D Environments

2 code implementations ECCV 2020 Changan Chen, Unnat Jain, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman

Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf: restricted solely to their visual perception of the environment.

Navigate Visual Navigation

ViBE: Dressing for Diverse Body Shapes

no code implementations CVPR 2020 Wei-Lin Hsiao, Kristen Grauman

Body shape plays an important role in determining what garments will best suit a given person, yet today's clothing recommendation methods take a "one shape fits all" approach.

Emergence of Exploratory Look-Around Behaviors through Active Observation Completion

1 code implementation Science Robotics 2019 Santhosh K. Ramakrishnan, Dinesh Jayaraman, Kristen Grauman

Standard computer vision systems assume access to intelligently captured inputs (e.g., photos from a human photographer), yet autonomously capturing good observations is a major challenge in itself.

Active Observation Completion

Grounded Human-Object Interaction Hotspots from Video (Extended Abstract)

no code implementations 3 Jun 2019 Tushar Nagarajan, Christoph Feichtenhofer, Kristen Grauman

Learning how to interact with objects is an important step towards embodied visual intelligence, but existing techniques suffer from heavy supervision or sensing requirements.

Human-Object Interaction Detection Object +1

Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback

3 code implementations CVPR 2021 Hui Wu, Yupeng Gao, Xiaoxiao Guo, Ziad Al-Halah, Steven Rennie, Kristen Grauman, Rogerio Feris

We provide a detailed analysis of the characteristics of the Fashion IQ data, and present a transformer-based user simulator and interactive image retriever that can seamlessly integrate visual attributes with image features, user feedback, and dialog history, leading to improved performance over the state of the art in dialog-based image retrieval.

Attribute Image Retrieval +1

Predicting How to Distribute Work Between Algorithms and Humans to Segment an Image Batch

no code implementations 30 Apr 2019 Danna Gurari, Yinan Zhao, Suyog Dutt Jain, Margrit Betke, Kristen Grauman

We propose a resource allocation framework for predicting how best to allocate a fixed budget of human annotation effort in order to collect higher quality segmentations for a given batch of images and automated methods.

Semantic Segmentation
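A minimal instantiation of this kind of budget allocation, assuming a predicted per-image quality score for the automatic method is available (the paper's predictor is learned; the scores below are placeholders), sends the images where the algorithm is expected to do worst to human annotators:

```python
def allocate_budget(pred_quality, budget):
    """Given per-image predicted quality of the automatic segmenter (0..1)
    and a budget of human annotations, return the indices of images to send
    to humans: those where the algorithm is predicted to do worst."""
    order = sorted(range(len(pred_quality)), key=lambda i: pred_quality[i])
    return sorted(order[:budget])

# Predicted IoU of the automatic method on five images (illustrative numbers).
scores = [0.9, 0.3, 0.75, 0.2, 0.6]
print(allocate_budget(scores, budget=2))  # → [1, 3]
```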

You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions

1 code implementation CVPR 2020 Evonne Ng, Donglai Xiang, Hanbyul Joo, Kristen Grauman

The body pose of a person wearing a camera is of great interest for applications in augmented reality, healthcare, and robotics, yet much of the person's body is out of view for a typical wearable camera.

Pose Estimation

Fashion++: Minimal Edits for Outfit Improvement

no code implementations ICCV 2019 Wei-Lin Hsiao, Isay Katsman, Chao-yuan Wu, Devi Parikh, Kristen Grauman

We introduce Fashion++, an approach that proposes minimal adjustments to a full-body clothing outfit that will have maximal impact on its fashionability.

Image Generation

Co-Separating Sounds of Visual Objects

3 code implementations ICCV 2019 Ruohan Gao, Kristen Grauman

Learning how objects sound from video is challenging, since they often heavily overlap in a single audio channel.

Audio Denoising Audio Source Separation +1

Next-Active-Object prediction from Egocentric Videos

no code implementations 10 Apr 2019 Antonino Furnari, Sebastiano Battiato, Kristen Grauman, Giovanni Maria Farinella

Although First Person Vision systems can sense the environment from the user's perspective, they are generally unable to predict the user's intentions and goals.

Object

Less is More: Learning Highlight Detection from Video Duration

no code implementations CVPR 2019 Bo Xiong, Yannis Kalantidis, Deepti Ghadiyaram, Kristen Grauman

Highlight detection has the potential to significantly ease video browsing, but existing methods often suffer from expensive supervision requirements, where human viewers must manually identify highlights in training videos.

Highlight Detection

Extreme Relative Pose Estimation for RGB-D Scans via Scene Completion

1 code implementation CVPR 2019 Zhenpei Yang, Jeffrey Z. Pan, Linjie Luo, Xiaowei Zhou, Kristen Grauman, Qi-Xing Huang

In particular, instead of only performing scene completion from each individual scan, our approach alternates between relative pose estimation and scene completion.

Pose Estimation

2.5D Visual Sound

2 code implementations CVPR 2019 Ruohan Gao, Kristen Grauman

We devise a deep convolutional neural network that learns to decode the monaural (single-channel) soundtrack into its binaural counterpart by injecting visual information about object and scene configurations.
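Mono-to-binaural models of this kind are commonly trained to predict the difference between the two channels, since the mono input approximates their mixture; a hedged sketch of just the reconstruction step (the predicted difference below is a placeholder array, not the network's output):

```python
import numpy as np

def binauralize(mono: np.ndarray, pred_diff: np.ndarray):
    """Recover left/right channels from a mono mixture and a predicted channel
    difference, under the convention mono ≈ (L + R) / 2, diff ≈ (L - R) / 2."""
    left = mono + pred_diff
    right = mono - pred_diff
    return left, right

mono = np.array([0.5, 0.0, -0.5])
pred_diff = np.array([0.1, 0.0, -0.1])   # placeholder for a model prediction
left, right = binauralize(mono, pred_diff)
print((left + right) / 2)                # averaging recovers the mono mixture
```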

Grounded Human-Object Interaction Hotspots from Video

1 code implementation ICCV 2019 Tushar Nagarajan, Christoph Feichtenhofer, Kristen Grauman

Learning how to interact with objects is an important step towards embodied visual intelligence, but existing techniques suffer from heavy supervision or sensing requirements.

Human-Object Interaction Detection Object +3

Kernel Transformer Networks for Compact Spherical Convolution

no code implementations CVPR 2019 Yu-Chuan Su, Kristen Grauman

KTNs efficiently transfer convolution kernels from perspective images to the equirectangular projection of 360° images.

SpotTune: Transfer Learning through Adaptive Fine-tuning

3 code implementations CVPR 2019 Yunhui Guo, Honghui Shi, Abhishek Kumar, Kristen Grauman, Tajana Rosing, Rogerio Feris

Transfer learning, which allows a source task to affect the inductive bias of the target task, is widely used in computer vision.

Inductive Bias Transfer Learning

Retrospective Encoders for Video Summarization

no code implementations ECCV 2018 Ke Zhang, Kristen Grauman, Fei Sha

The key idea is to complement the discriminative losses with another loss which measures if the predicted summary preserves the same information as in the original video.

Metric Learning Video Summarization
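The combined objective described above can be written generically as a discriminative loss plus an information-preservation penalty between embeddings of the summary and of the full video (the embeddings and weight below are illustrative placeholders, not the paper's trained encoders):

```python
import numpy as np

def summary_loss(disc_loss, video_emb, summary_emb, weight=1.0):
    """Total loss = discriminative loss + penalty when the summary's embedding
    drifts from the full video's embedding (information preservation)."""
    preserve = np.sum((video_emb - summary_emb) ** 2)
    return disc_loss + weight * preserve

video_emb = np.array([1.0, 0.0, 2.0])
summary_emb = np.array([1.0, 0.5, 2.0])
print(summary_loss(0.3, video_emb, summary_emb))  # → 0.55
```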

Sidekick Policy Learning for Active Visual Exploration

no code implementations ECCV 2018 Santhosh K. Ramakrishnan, Kristen Grauman

We consider an active visual exploration scenario, where an agent must intelligently select its camera motions to efficiently reconstruct the full environment from only a limited set of narrow field-of-view glimpses.

Learning Compressible 360° Video Isomers

no code implementations CVPR 2018 Yu-Chuan Su, Kristen Grauman

Standard video encoders developed for conventional narrow field-of-view video are widely applied to 360° video as well, with reasonable results.

Learning to Separate Object Sounds by Watching Unlabeled Video

2 code implementations ECCV 2018 Ruohan Gao, Rogerio Feris, Kristen Grauman

Our work is the first to learn audio source separation from large-scale "in the wild" videos containing multiple audio sources per video.

Audio Denoising Audio Source Separation +2

Compare and Contrast: Learning Prominent Visual Differences

no code implementations CVPR 2018 Steven Chen, Kristen Grauman

We collect instance-level annotations of most noticeable differences, and build a model trained on relative attribute features that predicts prominent differences for unseen pairs.

Attribute Image Classification

Snap Angle Prediction for 360° Panoramas

no code implementations 31 Mar 2018 Bo Xiong, Kristen Grauman

360° panoramas are a rich medium, yet notoriously difficult to visualize in the 2D image plane.

reinforcement-learning Reinforcement Learning (RL)

Attributes as Operators: Factorizing Unseen Attribute-Object Compositions

1 code implementation ECCV 2018 Tushar Nagarajan, Kristen Grauman

In addition, we show that not only can our model recognize unseen compositions robustly in an open-world setting, it can also generalize to compositions where objects themselves were unseen during training.

Attribute Compositional Zero-Shot Learning +2

VizWiz Grand Challenge: Answering Visual Questions from Blind People

1 code implementation CVPR 2018 Danna Gurari, Qing Li, Abigale J. Stangl, Anhong Guo, Chi Lin, Kristen Grauman, Jiebo Luo, Jeffrey P. Bigham

The study of algorithms to automatically answer visual questions currently is motivated by visual question answering (VQA) datasets constructed in artificial VQA settings.

Question Answering Visual Question Answering

Learning Compressible 360° Video Isomers

no code implementations 12 Dec 2017 Yu-Chuan Su, Kristen Grauman

Standard video encoders developed for conventional narrow field-of-view video are widely applied to 360° video as well, with reasonable results.

Im2Flow: Motion Hallucination from Static Images for Action Recognition

4 code implementations CVPR 2018 Ruohan Gao, Bo Xiong, Kristen Grauman

Second, we show the power of hallucinated flow for recognition, successfully transferring the learned motion into a standard two-stream network for activity recognition.

Action Recognition Hallucination +2

Creating Capsule Wardrobes from Fashion Images

no code implementations CVPR 2018 Wei-Lin Hsiao, Kristen Grauman

To permit efficient subset selection over the space of all outfit combinations, we develop submodular objective functions capturing the key ingredients of visual compatibility, versatility, and user-specific preference.
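Monotone submodular objectives under a cardinality constraint are typically maximized greedily, with a (1 - 1/e) approximation guarantee; a generic sketch, using a simple style-coverage function as a stand-in for the paper's compatibility, versatility, and preference terms:

```python
def greedy_max(items, k, objective):
    """Greedily pick k items maximizing a monotone submodular objective,
    adding at each step the item with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        best = max((x for x in items if x not in chosen),
                   key=lambda x: objective(chosen + [x]) - objective(chosen))
        chosen.append(best)
    return chosen

# Coverage objective: how many distinct "styles" a wardrobe subset covers
# (the garment-to-style sets are illustrative, not from the paper).
styles = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1, 4}}
coverage = lambda subset: len(set().union(*(styles[x] for x in subset)) if subset else set())
print(greedy_max(list(styles), 2, coverage))
```

Set coverage is submodular, so the greedy choice here inherits the usual approximation guarantee; the paper's actual objectives combine several such terms.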

BlockDrop: Dynamic Inference Paths in Residual Networks

1 code implementation CVPR 2018 Zuxuan Wu, Tushar Nagarajan, Abhishek Kumar, Steven Rennie, Larry S. Davis, Kristen Grauman, Rogerio Feris

Very deep convolutional neural networks offer excellent recognition results, yet their computational expense limits their impact for many real-world applications.

ShapeCodes: Self-Supervised Feature Learning by Lifting Views to Viewgrids

no code implementations ECCV 2018 Dinesh Jayaraman, Ruohan Gao, Kristen Grauman

We introduce an unsupervised feature learning approach that embeds 3D shape information into a single-view image representation.

Object Object Recognition

Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks

2 code implementations CVPR 2018 Dinesh Jayaraman, Kristen Grauman

It is common to implicitly assume access to intelligently captured inputs (e.g., photos from a human photographer), yet autonomously capturing good observations is itself a major challenge.

Learning Spherical Convolution for Fast Features from 360° Imagery

no code implementations NeurIPS 2017 Yu-Chuan Su, Kristen Grauman

While 360° cameras offer tremendous new possibilities in vision, graphics, and augmented reality, the spherical images they produce make core feature extraction non-trivial.

Learning the Latent "Look": Unsupervised Discovery of a Style-Coherent Embedding from Fashion Images

1 code implementation ICCV 2017 Wei-Lin Hsiao, Kristen Grauman

Given a collection of unlabeled fashion images, our approach mines for the latent styles, then summarizes outfits by how they mix those styles.

Topic Models

Predicting Foreground Object Ambiguity and Efficiently Crowdsourcing the Segmentation(s)

no code implementations 30 Apr 2017 Danna Gurari, Kun He, Bo Xiong, Jianming Zhang, Mehrnoosh Sameki, Suyog Dutt Jain, Stan Sclaroff, Margrit Betke, Kristen Grauman

We propose the ambiguity problem for the foreground object segmentation task and motivate the importance of estimating and accounting for this ambiguity when designing vision systems.

Object Semantic Segmentation +1

Making 360° Video Watchable in 2D: Learning Videography for Click Free Viewing

no code implementations 1 Mar 2017 Yu-Chuan Su, Kristen Grauman

360° video requires human viewers to actively control "where" to look while watching the video.

Navigate

Pixel Objectness

no code implementations 19 Jan 2017 Suyog Dutt Jain, Bo Xiong, Kristen Grauman

We propose an end-to-end learning framework for generating foreground object segmentations.

Foreground Segmentation Image Retargeting +5

Semantic Jitter: Dense Supervision for Visual Comparisons via Synthetic Images

no code implementations ICCV 2017 Aron Yu, Kristen Grauman

Distinguishing subtle differences in attributes is valuable, yet learning to make visual comparisons remains non-trivial.

Attribute Image Generation +1

Pano2Vid: Automatic Cinematography for Watching 360° Videos

no code implementations 7 Dec 2016 Yu-Chuan Su, Dinesh Jayaraman, Kristen Grauman

AutoCam leverages NFOV web video to discriminatively identify space-time "glimpses" of interest at each time instant, and then uses dynamic programming to select optimal human-like camera trajectories.
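The dynamic-programming selection can be sketched Viterbi-style: at each time step, pick the viewing angle that maximizes accumulated interest minus a smoothness penalty for large camera moves (the scores and penalty below are illustrative, not AutoCam's learned values):

```python
def best_trajectory(scores, move_cost):
    """scores[t][a]: interestingness of viewing angle a at time t.
    move_cost(p, a): penalty for switching from angle p to angle a.
    Returns the angle sequence maximizing total score minus transition cost."""
    T, A = len(scores), len(scores[0])
    dp = [scores[0][:]]            # dp[t][a]: best value of a trajectory ending at angle a
    back = [[0] * A]
    for t in range(1, T):
        row, brow = [], []
        for a in range(A):
            prev = max(range(A), key=lambda p: dp[t - 1][p] - move_cost(p, a))
            row.append(dp[t - 1][prev] - move_cost(prev, a) + scores[t][a])
            brow.append(prev)
        dp.append(row)
        back.append(brow)
    a = max(range(A), key=lambda x: dp[-1][x])
    path = [a]
    for t in range(T - 1, 0, -1):  # backtrack through the stored pointers
        a = back[t][a]
        path.append(a)
    return path[::-1]

scores = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]   # 3 time steps, 3 candidate angles
print(best_trajectory(scores, lambda p, a: abs(p - a) * 0.5))
```

With a small penalty the camera follows the per-step maxima; raising the penalty trades interestingness for smoother, more human-like motion.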

On-Demand Learning for Deep Image Restoration

1 code implementation ICCV 2017 Ruohan Gao, Kristen Grauman

While machine learning approaches to image restoration offer great promise, current methods risk training models fixated on performing well only for image corruption of a particular level of difficulty, such as a certain level of noise or blur.

Deblurring Image Deblurring +3
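An on-demand curriculum of this flavor can be approximated by allocating each training batch across difficulty levels in proportion to current error, so harder corruption levels receive more samples as long as they remain hard (an illustrative allocation rule, not necessarily the paper's exact scheme):

```python
def allocate_samples(errors, batch_size):
    """Distribute a training batch across difficulty levels in proportion to
    current error, giving any rounding remainder to the hardest level."""
    total = sum(errors)
    alloc = [int(batch_size * e / total) for e in errors]
    alloc[max(range(len(errors)), key=lambda i: errors[i])] += batch_size - sum(alloc)
    return alloc

# Current validation error at four corruption levels (illustrative integers).
print(allocate_samples([1, 2, 3, 4], batch_size=100))  # → [10, 20, 30, 40]
```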

Object-Centric Representation Learning from Unlabeled Videos

no code implementations 1 Dec 2016 Ruohan Gao, Dinesh Jayaraman, Kristen Grauman

Compared to existing temporal coherence methods, our idea has the advantage of lightweight preprocessing of the unlabeled video (no tracking required) while still being able to extract object-level regions from which to learn invariances.

Image Classification Object +2

Crowdsourcing in Computer Vision

no code implementations 7 Nov 2016 Adriana Kovashka, Olga Russakovsky, Li Fei-Fei, Kristen Grauman

Computer vision systems require large amounts of manually annotated data to properly learn challenging visual concepts.

Object Recognition

Visual Question: Predicting If a Crowd Will Agree on the Answer

no code implementations 29 Aug 2016 Danna Gurari, Kristen Grauman

Visual question answering (VQA) systems are emerging from a desire to empower users to ask any natural language question about visual content and receive a valid answer in response.

Question Answering +1

Efficient Activity Detection in Untrimmed Video with Max-Subgraph Search

no code implementations 11 Jul 2016 Chao-Yeh Chen, Kristen Grauman

We show that this detection strategy permits an efficient branch-and-cut solution for the best-scoring---and possibly non-cubically shaped---portion of the video for a given activity classifier.

Action Detection Activity Detection

Click Carving: Segmenting Objects in Video with Point Clicks

no code implementations 5 Jul 2016 Suyog Dutt Jain, Kristen Grauman

We present a novel form of interactive video object segmentation where a few clicks by the user helps the system produce a full spatio-temporal segmentation of the object of interest.

Interactive Video Object Segmentation Object +3

Active Image Segmentation Propagation

no code implementations CVPR 2016 Suyog Dutt Jain, Kristen Grauman

We propose a semi-automatic method to obtain foreground object masks for a large set of related images.

Image Segmentation Object Discovery +2

Pull the Plug? Predicting If Computers or Humans Should Segment Images

no code implementations CVPR 2016 Danna Gurari, Suyog Jain, Margrit Betke, Kristen Grauman

We propose a resource allocation framework for predicting how best to allocate a fixed budget of human annotation effort in order to collect higher quality segmentations for a given batch of images and automated methods.

Semantic Segmentation

Video Summarization with Long Short-term Memory

1 code implementation 26 May 2016 Ke Zhang, Wei-Lun Chao, Fei Sha, Kristen Grauman

We propose a novel supervised learning technique for summarizing videos by automatically selecting keyframes or key subshots.

Domain Adaptation Structured Prediction +1

Look-ahead before you leap: end-to-end active recognition by forecasting the effect of motion

no code implementations 30 Apr 2016 Dinesh Jayaraman, Kristen Grauman

To verify this hypothesis, we attempt to induce this capacity in our active recognition pipeline, by simultaneously learning to forecast the effects of the agent's motions on its internal representation of the environment conditional on all past views.

Subjects and Their Objects: Localizing Interactees for a Person-Centric View of Importance

no code implementations 17 Apr 2016 Chao-Yeh Chen, Kristen Grauman

We propose to predict the "interactee" in novel images---that is, to localize the object of a person's action.

Image Retargeting Object +3

Detangling People: Individuating Multiple Close People and Their Body Parts via Region Assembly

no code implementations CVPR 2017 Hao Jiang, Kristen Grauman

In addition, we demonstrate its impact on a proxemics recognition task, which demands a precise representation of "whose body part is where" in crowded images.

Human Detection Semantic Segmentation

Detecting Engagement in Egocentric Video

no code implementations 4 Apr 2016 Yu-Chuan Su, Kristen Grauman

In a wearable camera video, we see what the camera wearer sees.

Video Summarization

Leaving Some Stones Unturned: Dynamic Feature Prioritization for Activity Detection in Streaming Video

no code implementations 1 Apr 2016 Yu-Chuan Su, Kristen Grauman

Current approaches for activity recognition often ignore constraints on computational resources: 1) they rely on extensive feature computation to obtain rich descriptors on all frames, and 2) they assume batch-mode access to the entire test video at once.

Action Detection Activity Detection +2

Summary Transfer: Exemplar-based Subset Selection for Video Summarization

no code implementations CVPR 2016 Ke Zhang, Wei-Lun Chao, Fei Sha, Kristen Grauman

Video summarization has unprecedented importance to help us digest, browse, and search today's ever-growing video collections.

Video Summarization

Just Noticeable Differences in Visual Attributes

no code implementations ICCV 2015 Aron Yu, Kristen Grauman

We develop a Bayesian local learning strategy to infer when images are indistinguishable for a given attribute.

Attribute

Slow and steady feature analysis: higher order temporal coherence in video

no code implementations CVPR 2016 Dinesh Jayaraman, Kristen Grauman

While this standard approach captures the fact that high-level visual signals change slowly over time, it fails to capture *how* the visual content changes.

Action Recognition Temporal Action Localization
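The abstract snippet above contrasts first-order temporal coherence (features change slowly) with higher-order coherence (features change *steadily*, i.e. their rate of change is itself consistent). A minimal NumPy sketch of the two penalties on a feature sequence, using second differences for steadiness; this is purely illustrative and not the paper's contrastive training objective.

```python
import numpy as np

def slowness_loss(z):
    """First-order coherence: features of adjacent frames should be similar.
    z has shape (T, D): one feature vector per frame."""
    d1 = z[1:] - z[:-1]
    return float(np.mean(np.sum(d1 ** 2, axis=1)))

def steadiness_loss(z):
    """Second-order coherence: penalize the second difference
    z[t-1] - 2*z[t] + z[t+1], so feature *velocity* stays consistent."""
    d2 = z[2:] - 2 * z[1:-1] + z[:-2]
    return float(np.mean(np.sum(d2 ** 2, axis=1)))
```

A feature trajectory moving at constant velocity incurs zero steadiness loss even though its slowness loss is nonzero, which is exactly the distinction the entry draws.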

Predicting Important Objects for Egocentric Video Summarization

no code implementations 18 May 2015 Yong Jae Lee, Kristen Grauman

Our results on two egocentric video datasets show the method's promise relative to existing techniques for saliency and summarization.

Event Detection Video Summarization

WhittleSearch: Interactive Image Search with Relative Attribute Feedback

no code implementations 15 May 2015 Adriana Kovashka, Devi Parikh, Kristen Grauman

We propose a novel mode of feedback for image search, where a user describes which properties of exemplar images should be adjusted in order to more closely match his/her mental model of the image sought.

Attribute Image Retrieval

Learning image representations tied to ego-motion

1 code implementation ICCV 2015 Dinesh Jayaraman, Kristen Grauman

Understanding how images of objects and scenes behave in response to specific ego-motions is a crucial aspect of proper visual development, yet existing visual learning methods are conspicuously disconnected from the physical source of their images.

Autonomous Driving Scene Recognition
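This entry's premise is that features should respond *predictably* to ego-motion: for a discrete motion g (say, "turn left"), features after the motion should be a learned transformation of features before it, z_after ≈ M_g z_before. A hedged sketch fitting such a linear map in closed form; the paper learns the map jointly with the features themselves, which is omitted here.

```python
import numpy as np

def fit_motion_map(Z_before, Z_after):
    """Least-squares fit of M minimizing ||Z_after - Z_before @ M.T||^2.
    Rows of Z_before/Z_after are feature vectors before/after one ego-motion."""
    M_T, *_ = np.linalg.lstsq(Z_before, Z_after, rcond=None)
    return M_T.T

def equivariance_error(M, Z_before, Z_after):
    """Mean squared error of predicting post-motion features."""
    pred = Z_before @ M.T
    return float(np.mean((pred - Z_after) ** 2))
```

If the features really are equivariant, a single M per motion class drives this error toward zero; a low error is one way to check that a representation is "tied to ego-motion".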

Zero-shot recognition with unreliable attributes

no code implementations NeurIPS 2014 Dinesh Jayaraman, Kristen Grauman

In principle, zero-shot learning makes it possible to train an object recognition model simply by specifying the category's attributes.

Attribute Object Recognition +1

Predicting Useful Neighborhoods for Lazy Local Learning

no code implementations NeurIPS 2014 Aron Yu, Kristen Grauman

Lazy local learning methods train a classifier "on the fly" at test time, using only a subset of the training instances that are most relevant to the novel test example.

General Classification Image Classification +1

Large-Margin Determinantal Point Processes

no code implementations 6 Nov 2014 Boqing Gong, Wei-Lun Chao, Kristen Grauman, Fei Sha

Extensive empirical studies validate our contributions, including applications on challenging document and video summarization, where flexibility in modeling the kernel matrix and balancing different errors is indispensable.

Point Processes Video Summarization

Zero Shot Recognition with Unreliable Attributes

no code implementations 15 Sep 2014 Dinesh Jayaraman, Kristen Grauman

In principle, zero-shot learning makes it possible to train a recognition model simply by specifying the category's attributes.

Attribute Zero-Shot Learning

Inferring Unseen Views of People

no code implementations CVPR 2014 Chao-Yeh Chen, Kristen Grauman

We pose unseen view synthesis as a probabilistic tensor completion problem.

Fine-Grained Visual Comparisons with Local Learning

no code implementations CVPR 2014 Aron Yu, Kristen Grauman

Given two images, we want to predict which exhibits a particular visual attribute more than the other---even when the two images are quite similar.

Attribute

Beyond Comparing Image Pairs: Setwise Active Learning for Relative Attributes

no code implementations CVPR 2014 Lucy Liang, Kristen Grauman

It is useful to automatically compare images based on their visual properties---to predict which image is brighter, more feminine, more blurry, etc.

Active Learning Attribute

Decorrelating Semantic Visual Attributes by Resisting the Urge to Share

no code implementations CVPR 2014 Dinesh Jayaraman, Fei Sha, Kristen Grauman

Existing methods to learn visual attributes are prone to learning the wrong thing---namely, properties that are correlated with the attribute of interest among training samples.

Attribute Multi-Task Learning

Inferring Analogous Attributes

no code implementations CVPR 2014 Chao-Yeh Chen, Kristen Grauman

The appearance of an attribute can vary considerably from class to class (e.g., a "fluffy" dog vs. a "fluffy" towel), making standard class-independent attribute models break down.

Attribute Transfer Learning

Reshaping Visual Datasets for Domain Adaptation

no code implementations NeurIPS 2013 Boqing Gong, Kristen Grauman, Fei Sha

By maximum distinctiveness, we require the underlying distributions of the identified domains to be different from each other; by maximum learnability, we ensure that a strong discriminative model can be learned from the domain.

Domain Adaptation Human Activity Recognition +1

Story-Driven Summarization for Egocentric Video

no code implementations CVPR 2013 Zheng Lu, Kristen Grauman

We present a video summarization approach that discovers the story of an egocentric video.

Video Summarization

Deformable Spatial Pyramid Matching for Fast Dense Correspondences

no code implementations CVPR 2013 Jaechul Kim, Ce Liu, Fei Sha, Kristen Grauman

We introduce a fast deformable spatial pyramid (DSP) matching algorithm for computing dense pixel correspondences.

Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots

no code implementations CVPR 2013 Chao-Yeh Chen, Kristen Grauman

We propose an approach to learn action categories from static images that leverages prior observations of generic human motion to augment its training process.

Semantic Kernel Forests from Multiple Taxonomies

no code implementations NeurIPS 2012 Sung Ju Hwang, Kristen Grauman, Fei Sha

When learning features for complex visual recognition problems, labeled image exemplars alone can be insufficient.

Object Object Recognition

Learning a Tree of Metrics with Disjoint Visual Features

no code implementations NeurIPS 2011 Kristen Grauman, Fei Sha, Sung Ju Hwang

Given a hierarchical taxonomy that captures semantic similarity between the objects, we learn a corresponding tree of metrics (ToM).

Attribute Metric Learning +2

Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning

no code implementations NeurIPS 2010 Prateek Jain, Sudheendra Vijayanarasimhan, Kristen Grauman

Our first approach maps the data to two-bit binary keys that are locality-sensitive for the angle between the hyperplane normal and a database point.

Active Learning
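The snippet above describes the core trick concretely: database points get two-bit keys from two independent random projections, while a hyperplane query hashes its normal with the second projection sign-flipped, so points *near* the hyperplane collide with the query more often than points far from it. A toy Monte-Carlo sketch in that spirit; the bit conventions and parameters here only loosely follow the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_hash(dim):
    """Draw one two-bit hash: projections u, v shared by points and queries."""
    u, v = rng.standard_normal(dim), rng.standard_normal(dim)
    def hash_point(x):
        return (np.dot(u, x) >= 0, np.dot(v, x) >= 0)
    def hash_query(w):                    # hyperplane given by its normal w
        return (np.dot(u, w) >= 0, np.dot(v, -w) >= 0)   # second bit flipped
    return hash_point, hash_query

def collision_rate(w, x, trials=2000):
    """Empirical probability that point x lands in the query bucket of the
    hyperplane with normal w, over freshly drawn hash functions."""
    hits = 0
    for _ in range(trials):
        hash_point, hash_query = make_hash(len(w))
        hits += hash_point(x) == hash_query(w)
    return hits / trials
```

For a point on the hyperplane (orthogonal to the normal) the collision rate approaches 1/4, while a point parallel to the normal almost never collides, which is what makes the scheme usable for near-to-hyperplane search in margin-based active learning.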

Multi-Level Active Prediction of Useful Image Annotations for Recognition

no code implementations NeurIPS 2008 Sudheendra Vijayanarasimhan, Kristen Grauman

We introduce a framework for actively learning visual categories from a mixture of weakly and strongly labeled image examples.

Online Metric Learning and Fast Similarity Search

no code implementations NeurIPS 2008 Prateek Jain, Brian Kulis, Inderjit S. Dhillon, Kristen Grauman

Metric learning algorithms can provide useful distance functions for a variety of domains, and recent work has shown good accuracy for problems where the learner can access all distance constraints at once.

Metric Learning
