no code implementations • 22 Feb 2022 • Harish Haresamudram, Irfan Essa, Thomas Plötz
As such, self-supervision, i.e., the paradigm of 'pretrain-then-finetune', has the potential to become a strong alternative to the predominant end-to-end training approaches, not to mention the classic activity recognition chain with hand-crafted features of sensor data.
no code implementations • 11 Feb 2022 • Karan Samel, Zelin Zhao, Binghong Chen, Shuang Li, Dharmashankar Subramanian, Irfan Essa, Le Song
Events across a timeline are a common data representation, seen in different temporal modalities.
no code implementations • 9 Dec 2021 • Xiang Kong, Lu Jiang, Huiwen Chang, Han Zhang, Yuan Hao, Haifeng Gong, Irfan Essa
Our results demonstrate two key advances to the state-of-the-art layout transformer models.
no code implementations • 20 Nov 2021 • Apoorva Beedu, Zhile Ren, Varun Agrawal, Irfan Essa
We introduce a simple yet effective algorithm that uses convolutional neural networks to directly estimate object poses from videos.
no code implementations • ICLR 2022 • Chengzhi Mao, Lu Jiang, Mostafa Dehghani, Carl Vondrick, Rahul Sukthankar, Irfan Essa
Vision Transformer (ViT) is emerging as the state-of-the-art architecture for image recognition.
Ranked #1 on Domain Generalization on Stylized-ImageNet
no code implementations • 29 Sep 2021 • Karan Samel, Zelin Zhao, Binghong Chen, Shuang Li, Dharmashankar Subramanian, Irfan Essa, Le Song
Events across a timeline are a common data representation, seen in different temporal modalities.
no code implementations • 28 Jun 2021 • AJ Piergiovanni, Anelia Angelova, Michael S. Ryoo, Irfan Essa
In this paper, we address the problem of automatically discovering atomic actions in an unsupervised manner from instructional videos.
no code implementations • 7 Jun 2021 • AJ Piergiovanni, Anelia Angelova, Michael S. Ryoo, Irfan Essa
In this paper, we address the problem of automatically discovering atomic actions in an unsupervised manner from instructional videos, which are rarely annotated with atomic actions.
no code implementations • 14 May 2021 • Nathan Frey, Peggy Chi, Weilong Yang, Irfan Essa
We propose an automatic approach that extracts editing styles in a source video and applies the edits to matched footage for video creation.
no code implementations • 29 Mar 2021 • Dan Scarafoni, Irfan Essa, Thomas Ploetz
Action prediction focuses on anticipating actions before they happen.
no code implementations • 11 Dec 2020 • Erik Wijmans, Irfan Essa, Dhruv Batra
PointGoal navigation has seen significant recent interest and progress, spurred on by the Habitat platform and associated challenge.
no code implementations • 9 Dec 2020 • Harish Haresamudram, Irfan Essa, Thomas Ploetz
Our work focuses on effective use of small amounts of labeled data and the opportunistic exploitation of unlabeled data that are straightforward to collect in mobile and ubiquitous computing scenarios.
1 code implementation • 2 Oct 2020 • Vincent Cartillier, Zhile Ren, Neha Jain, Stefan Lee, Irfan Essa, Dhruv Batra
We study the task of semantic mapping: specifically, an embodied agent (a robot or an egocentric AI assistant) is given a tour of a new environment and asked to build an allocentric top-down semantic map ("what is where?").
1 code implementation • 11 Aug 2020 • Tianhao Zhang, Hung-Yu Tseng, Lu Jiang, Weilong Yang, Honglak Lee, Irfan Essa
In recent years, text-guided image manipulation has gained increasing attention in the multimedia and computer vision community.
no code implementations • 12 Mar 2020 • Erik Wijmans, Julian Straub, Dhruv Batra, Irfan Essa, Judy Hoffman, Ari Morcos
Recent advances in deep reinforcement learning require a large amount of training data and generally result in representations that are often overspecialized to the target task.
no code implementations • ICLR 2020 • Erik Wijmans, Julian Straub, Irfan Essa, Dhruv Batra, Judy Hoffman, Ari Morcos
Surprisingly, we find that slight differences in task have no measurable effect on the visual representation for both SqueezeNet and ResNet architectures.
no code implementations • ECCV 2020 • Hsin-Ying Lee, Lu Jiang, Irfan Essa, Phuong B Le, Haifeng Gong, Ming-Hsuan Yang, Weilong Yang
The first module predicts a graph with complete relations from a graph with user-specified relations.
6 code implementations • ICLR 2020 • Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra
We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience): over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs.
Ranked #1 on PointGoal Navigation on Gibson PointGoal Navigation
no code implementations • 9 Jul 2019 • K. Niranjan Kumar, Irfan Essa, Sehoon Ha, C. Karen Liu
Using our method, we train a robotic arm to estimate the mass distribution of an object with moving parts (e.g., an articulated rigid body system) by pushing it on a surface with unknown friction properties.
no code implementations • 3 Jul 2019 • Aneeq Zia, Liheng Guo, Linlin Zhou, Irfan Essa, Anthony Jarc
Conclusions: We demonstrate that metrics-based evaluation of surgical activity recognition models is a viable approach to determine when models can be used to quantify surgical efficiencies.
1 code implementation • 16 Jun 2019 • Steven Hickson, Karthik Raveendran, Alireza Fathi, Kevin Murphy, Irfan Essa
We propose 4 insights that help to significantly improve the performance of deep learning models that predict surface normals and semantic labels from a single RGB image.
Ranked #1 on Surface Normals Estimation on ScanNetV2
no code implementations • CVPR 2019 • Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra
To help bridge the gap between internet vision-style problems and the goal of vision for embodied perception, we instantiate a large-scale navigation task, Embodied Question Answering [1], in photo-realistic environments (Matterport 3D).
2 code implementations • 25 Jan 2019 • Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Anoop Cherian, Irfan Essa, Dhruv Batra, Tim K. Marks, Chiori Hori, Peter Anderson, Stefan Lee, Devi Parikh
We introduce the task of scene-aware dialog.
no code implementations • 11 Sep 2018 • Jonathan C Balloch, Varun Agrawal, Irfan Essa, Sonia Chernova
We show that pretraining real-time segmentation architectures with synthetic segmentation data instead of ImageNet improves fine-tuning performance by reducing the bias learned in pretraining and, as a result, closing the transfer gap.
no code implementations • 22 Aug 2018 • Unaiza Ahsan, Rishi Madhok, Irfan Essa
We propose a self-supervised learning method to jointly reason about spatial and temporal context for video recognition.
2 code implementations • 21 Jun 2018 • Chiori Hori, Huda Alamri, Jue Wang, Gordon Wichern, Takaaki Hori, Anoop Cherian, Tim K. Marks, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Irfan Essa, Dhruv Batra, Devi Parikh
We introduce a new dataset of dialogs about videos of human behaviors.
no code implementations • 1 Jun 2018 • Aneeq Zia, Andrew Hung, Irfan Essa, Anthony Jarc
Adverse surgical outcomes are costly to patients and hospitals.
4 code implementations • 1 Jun 2018 • Huda Alamri, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Jue Wang, Irfan Essa, Dhruv Batra, Devi Parikh, Anoop Cherian, Tim K. Marks, Chiori Hori
Scene-aware dialog systems will be able to have conversations with users about the objects and events around them.
1 code implementation • 26 Jan 2018 • Steven Hickson, Anelia Angelova, Irfan Essa, Rahul Sukthankar
We consider the problem of retrieving objects from image data and learning to classify them into meaningful semantic categories with minimal supervision.
1 code implementation • CVPR 2014 • Steven Hickson, Stan Birchfield, Irfan Essa, Henrik Christensen
We present an efficient and scalable algorithm for segmenting 3D RGBD point clouds by combining depth, color, and temporal information using a multistage, hierarchical graph-based approach.
1 code implementation • 23 Jan 2018 • Daniel Castro, Steven Hickson, Patsorn Sangkloy, Bhavishya Mittal, Sean Dai, James Hays, Irfan Essa
We present a comparison of numerous state-of-the-art techniques on our dataset using three different representations (video, optical flow and multi-person pose data) in order to analyze these approaches.
no code implementations • 22 Jan 2018 • Unaiza Ahsan, Chen Sun, Irfan Essa
We propose an action recognition framework using Generative Adversarial Networks.
no code implementations • 22 Dec 2017 • Aneeq Zia, Irfan Essa
In this paper, we explore the usage of different holistic features for automated skill assessment using only robot kinematic data and propose a weighted feature fusion technique for improving score prediction performance.
6 code implementations • 11 Sep 2017 • Amirreza Shaban, Shray Bansal, Zhen Liu, Irfan Essa, Byron Boots
Low-shot learning methods for image classification support learning from sparse data.
1 code implementation • 2 Aug 2017 • Steven Hickson, Irfan Essa, Henrik Christensen
Most of the approaches for indoor RGBD semantic labeling focus on using pixels or superpixels to train a classifier.
no code implementations • 22 Jul 2017 • Steven Hickson, Nick Dufour, Avneesh Sud, Vivek Kwatra, Irfan Essa
One of the main challenges of social interaction in virtual reality settings is that head-mounted displays occlude a large portion of the face, blocking facial expressions and thereby restricting social engagement cues among users.
no code implementations • 24 Feb 2017 • Aneeq Zia, Yachna Sharma, Vinay Bettadapura, Eric L. Sarin, Irfan Essa
Methods: We conduct the largest study, to the best of our knowledge, for basic surgical skills assessment on a dataset that contained video and accelerometer data for suturing and knot-tying tasks.
no code implementations • 17 Jan 2017 • Unaiza Ahsan, Chen Sun, James Hays, Irfan Essa
We propose to leverage concept-level representations for complex event recognition in photographs given limited training examples.
no code implementations • 18 Jan 2016 • Vinay Bettadapura, Daniel Castro, Irfan Essa
We present an approach for identifying picturesque highlights from large amounts of egocentric video data.
no code implementations • 25 Oct 2015 • S. Hussain Raza, Omar Javed, Aveek Das, Harpreet Sawhney, Hui Cheng, Irfan Essa
We propose to learn and infer depth in videos from appearance, motion, occlusion boundaries, and geometric context of the scene.
no code implementations • CVPR 2013 • S. Hussain Raza, Matthias Grundmann, Irfan Essa
We present a novel algorithm for estimating the broad 3D geometric structure of outdoor video scenes.
no code implementations • 25 Oct 2015 • S. Hussain Raza, Ahmad Humayun, Matthias Grundmann, David Anderson, Irfan Essa
Our proposed framework provides an efficient approach for finding temporally consistent occlusion boundaries in video by utilizing causality, redundancy in videos, and semantic layout of the scene.
no code implementations • 7 Oct 2015 • Vinay Bettadapura, Edison Thomaz, Aman Parnami, Gregory Abowd, Irfan Essa
The pervasiveness of mobile cameras has resulted in a dramatic increase in food photos, which are pictures reflecting what people eat.
no code implementations • 7 Oct 2015 • Vinay Bettadapura, Irfan Essa, Caroline Pantofaru
We present a technique that uses images, videos and sensor data taken from first-person point-of-view devices to perform egocentric field-of-view (FOV) localization.
no code implementations • CVPR 2013 • Vinay Bettadapura, Grant Schindler, Thomaz Plotz, Irfan Essa
We present data-driven techniques to augment Bag of Words (BoW) models, which allow for more robust modeling and recognition of complex long-term activities, especially when the structure and topology of the activities are not known a priori.
no code implementations • 6 Oct 2015 • Daniel Castro, Steven Hickson, Vinay Bettadapura, Edison Thomaz, Gregory Abowd, Henrik Christensen, Irfan Essa
We collected a dataset of 40,103 egocentric images over a 6-month period with 19 activity classes and demonstrate the benefit of state-of-the-art deep learning techniques for learning and predicting daily activities.
no code implementations • 8 Feb 2012 • Seungyeon Kim, Fuxin Li, Guy Lebanon, Irfan Essa
Sentiment analysis predicts the presence of positive or negative emotions in a text document.