Search Results for author: Irfan Essa

Found 47 papers, 12 papers with code

Assessing the State of Self-Supervised Human Activity Recognition using Wearables

no code implementations 22 Feb 2022 Harish Haresamudram, Irfan Essa, Thomas Plötz

As such, self-supervision, i.e., the paradigm of 'pretrain-then-finetune', has the potential to become a strong alternative to the predominant end-to-end training approaches, let alone the classic activity recognition chain with hand-crafted features of sensor data.

Activity Recognition, Domain Adaptation, +1

Learning Temporal Rules from Noisy Timeseries Data

no code implementations 11 Feb 2022 Karan Samel, Zelin Zhao, Binghong Chen, Shuang Li, Dharmashankar Subramanian, Irfan Essa, Le Song

Events across a timeline are a common data representation, seen in different temporal modalities.

BLT: Bidirectional Layout Transformer for Controllable Layout Generation

no code implementations 9 Dec 2021 Xiang Kong, Lu Jiang, Huiwen Chang, Han Zhang, Yuan Hao, Haifeng Gong, Irfan Essa

Our results demonstrate two key advances to the state-of-the-art layout transformer models.

VideoPose: Estimating 6D object pose from videos

no code implementations 20 Nov 2021 Apoorva Beedu, Zhile Ren, Varun Agrawal, Irfan Essa

We introduce a simple yet effective algorithm that uses convolutional neural networks to directly estimate object poses from videos.

Frame, Pose Estimation

Neural Temporal Logic Programming

no code implementations 29 Sep 2021 Karan Samel, Zelin Zhao, Binghong Chen, Shuang Li, Dharmashankar Subramanian, Irfan Essa, Le Song

Events across a timeline are a common data representation, seen in different temporal modalities.

Unsupervised Discovery of Actions in Instructional Videos

no code implementations 28 Jun 2021 AJ Piergiovanni, Anelia Angelova, Michael S. Ryoo, Irfan Essa

In this paper we address the problem of automatically discovering atomic actions in an unsupervised manner from instructional videos.

Unsupervised Action Segmentation for Instructional Videos

no code implementations 7 Jun 2021 AJ Piergiovanni, Anelia Angelova, Michael S. Ryoo, Irfan Essa

In this paper we address the problem of automatically discovering atomic actions in an unsupervised manner from instructional videos, which are rarely annotated with atomic actions.

Action Segmentation

Automatic Non-Linear Video Editing Transfer

no code implementations 14 May 2021 Nathan Frey, Peggy Chi, Weilong Yang, Irfan Essa

We propose an automatic approach that extracts editing styles in a source video and applies the edits to matched footage for video creation.

How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget

no code implementations 11 Dec 2020 Erik Wijmans, Irfan Essa, Dhruv Batra

PointGoal navigation has seen significant recent interest and progress, spurred on by the Habitat platform and associated challenge.

PointGoal Navigation

Contrastive Predictive Coding for Human Activity Recognition

no code implementations 9 Dec 2020 Harish Haresamudram, Irfan Essa, Thomas Ploetz

Our work focuses on effective use of small amounts of labeled data and the opportunistic exploitation of unlabeled data that are straightforward to collect in mobile and ubiquitous computing scenarios.

Activity Recognition

Semantic MapNet: Building Allocentric Semantic Maps and Representations from Egocentric Views

1 code implementation 2 Oct 2020 Vincent Cartillier, Zhile Ren, Neha Jain, Stefan Lee, Irfan Essa, Dhruv Batra

We study the task of semantic mapping - specifically, an embodied agent (a robot or an egocentric AI assistant) is given a tour of a new environment and asked to build an allocentric top-down semantic map ("what is where?")

Representation Learning

Text as Neural Operator: Image Manipulation by Text Instruction

1 code implementation 11 Aug 2020 Tianhao Zhang, Hung-Yu Tseng, Lu Jiang, Weilong Yang, Honglak Lee, Irfan Essa

In recent years, text-guided image manipulation has gained increasing attention in the multimedia and computer vision community.

Conditional Image Generation, Image Captioning, +1

Analyzing Visual Representations in Embodied Navigation Tasks

no code implementations 12 Mar 2020 Erik Wijmans, Julian Straub, Dhruv Batra, Irfan Essa, Judy Hoffman, Ari Morcos

Recent advances in deep reinforcement learning require a large amount of training data and generally result in representations that are often overspecialized to the target task.

reinforcement-learning

Insights on Visual Representations for Embodied Navigation Tasks

no code implementations ICLR 2020 Erik Wijmans, Julian Straub, Irfan Essa, Dhruv Batra, Judy Hoffman, Ari Morcos

Surprisingly, we find that slight differences in task have no measurable effect on the visual representation for both SqueezeNet and ResNet architectures.

DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames

6 code implementations ICLR 2020 Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra

We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience) -- over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs.

Autonomous Navigation, PointGoal Navigation, +1

Estimating Mass Distribution of Articulated Objects using Non-prehensile Manipulation

no code implementations 9 Jul 2019 K. Niranjan Kumar, Irfan Essa, Sehoon Ha, C. Karen Liu

Using our method, we train a robotic arm to estimate the mass distribution of an object with moving parts (e.g., an articulated rigid body system) by pushing it on a surface with unknown friction properties.

Novel evaluation of surgical activity recognition models using task-based efficiency metrics

no code implementations 3 Jul 2019 Aneeq Zia, Liheng Guo, Linlin Zhou, Irfan Essa, Anthony Jarc

Conclusions: We demonstrate that metrics-based evaluation of surgical activity recognition models is a viable approach to determine when models can be used to quantify surgical efficiencies.

Activity Recognition

Floors are Flat: Leveraging Semantics for Real-Time Surface Normal Prediction

1 code implementation 16 Jun 2019 Steven Hickson, Karthik Raveendran, Alireza Fathi, Kevin Murphy, Irfan Essa

We propose 4 insights that help to significantly improve the performance of deep learning models that predict surface normals and semantic labels from a single RGB image.

Semantic Segmentation, Surface Normals Estimation

Embodied Question Answering in Photorealistic Environments with Point Cloud Perception

no code implementations CVPR 2019 Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra

To help bridge the gap between internet vision-style problems and the goal of vision for embodied perception, we instantiate a large-scale navigation task -- Embodied Question Answering [1] in photo-realistic environments (Matterport 3D).

Embodied Question Answering, Question Answering

Unbiasing Semantic Segmentation For Robot Perception using Synthetic Data Feature Transfer

no code implementations 11 Sep 2018 Jonathan C Balloch, Varun Agrawal, Irfan Essa, Sonia Chernova

We show that pretraining real-time segmentation architectures with synthetic segmentation data instead of ImageNet improves fine-tuning performance by reducing the bias learned in pretraining and closing the transfer gap as a result.

Semantic Segmentation

Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition

no code implementations 22 Aug 2018 Unaiza Ahsan, Rishi Madhok, Irfan Essa

We propose a self-supervised learning method to jointly reason about spatial and temporal context for video recognition.

Action Recognition, Frame, +4

Object category learning and retrieval with weak supervision

1 code implementation 26 Jan 2018 Steven Hickson, Anelia Angelova, Irfan Essa, Rahul Sukthankar

We consider the problem of retrieving objects from image data and learning to classify them into meaningful semantic categories with minimal supervision.

Deep Clustering

Efficient Hierarchical Graph-Based Segmentation of RGBD Videos

1 code implementation CVPR 2014 Steven Hickson, Stan Birchfield, Irfan Essa, Henrik Christensen

We present an efficient and scalable algorithm for segmenting 3D RGBD point clouds by combining depth, color, and temporal information using a multistage, hierarchical graph-based approach.

Graph Matching, Video Segmentation

Let's Dance: Learning From Online Dance Videos

1 code implementation 23 Jan 2018 Daniel Castro, Steven Hickson, Patsorn Sangkloy, Bhavishya Mittal, Sean Dai, James Hays, Irfan Essa

We present a comparison of numerous state-of-the-art techniques on our dataset using three different representations (video, optical flow and multi-person pose data) in order to analyze these approaches.

Action Recognition, Frame, +1

Automated Surgical Skill Assessment in RMIS Training

no code implementations 22 Dec 2017 Aneeq Zia, Irfan Essa

In this paper, we explore the usage of different holistic features for automated skill assessment using only robot kinematic data and propose a weighted feature fusion technique for improving score prediction performance.

General Classification

Semantic Instance Labeling Leveraging Hierarchical Segmentation

1 code implementation 2 Aug 2017 Steven Hickson, Irfan Essa, Henrik Christensen

Most of the approaches for indoor RGBD semantic labeling focus on using pixels or superpixels to train a classifier.

Superpixels

Eyemotion: Classifying facial expressions in VR using eye-tracking cameras

no code implementations 22 Jul 2017 Steven Hickson, Nick Dufour, Avneesh Sud, Vivek Kwatra, Irfan Essa

One of the main challenges of social interaction in virtual reality settings is that head-mounted displays occlude a large portion of the face, blocking facial expressions and thereby restricting social engagement cues among users.

Video and Accelerometer-Based Motion Analysis for Automated Surgical Skills Assessment

no code implementations 24 Feb 2017 Aneeq Zia, Yachna Sharma, Vinay Bettadapura, Eric L. Sarin, Irfan Essa

Methods: We conduct the largest study, to the best of our knowledge, for basic surgical skills assessment on a dataset that contained video and accelerometer data for suturing and knot-tying tasks.

Skills Assessment, Time Series

Complex Event Recognition from Images with Few Training Examples

no code implementations 17 Jan 2017 Unaiza Ahsan, Chen Sun, James Hays, Irfan Essa

We propose to leverage concept-level representations for complex event recognition in photographs given limited training examples.

Discovering Picturesque Highlights from Egocentric Vacation Videos

no code implementations 18 Jan 2016 Vinay Bettadapura, Daniel Castro, Irfan Essa

We present an approach for identifying picturesque highlights from large amounts of egocentric video data.

Highlight Detection

Depth Extraction from Videos Using Geometric Context and Occlusion Boundaries

no code implementations 25 Oct 2015 S. Hussain Raza, Omar Javed, Aveek Das, Harpreet Sawhney, Hui Cheng, Irfan Essa

We propose to learn and infer depth in videos from appearance, motion, occlusion boundaries, and geometric context of the scene.

Depth Estimation, Pose Estimation

Finding Temporally Consistent Occlusion Boundaries in Videos using Geometric Context

no code implementations 25 Oct 2015 S. Hussain Raza, Ahmad Humayun, Matthias Grundmann, David Anderson, Irfan Essa

Our proposed framework provides an efficient approach for finding temporally consistent occlusion boundaries in video by utilizing causality, redundancy in videos, and semantic layout of the scene.

Leveraging Context to Support Automated Food Recognition in Restaurants

no code implementations 7 Oct 2015 Vinay Bettadapura, Edison Thomaz, Aman Parnami, Gregory Abowd, Irfan Essa

The pervasiveness of mobile cameras has resulted in a dramatic increase in food photos, which are pictures reflecting what people eat.

Food Recognition

Egocentric Field-of-View Localization Using First-Person Point-of-View Devices

no code implementations 7 Oct 2015 Vinay Bettadapura, Irfan Essa, Caroline Pantofaru

We present a technique that uses images, videos and sensor data taken from first-person point-of-view devices to perform egocentric field-of-view (FOV) localization.

Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition

no code implementations CVPR 2013 Vinay Bettadapura, Grant Schindler, Thomas Ploetz, Irfan Essa

We present data-driven techniques to augment Bag of Words (BoW) models, which allow for more robust modeling and recognition of complex long-term activities, especially when the structure and topology of the activities are not known a priori.

Activity Recognition

Predicting Daily Activities From Egocentric Images Using Deep Learning

no code implementations 6 Oct 2015 Daniel Castro, Steven Hickson, Vinay Bettadapura, Edison Thomaz, Gregory Abowd, Henrik Christensen, Irfan Essa

We collected a dataset of 40,103 egocentric images over a 6-month period with 19 activity classes and demonstrate the benefit of state-of-the-art deep learning techniques for learning and predicting daily activities.

Classification, General Classification

Beyond Sentiment: The Manifold of Human Emotions

no code implementations 8 Feb 2012 Seungyeon Kim, Fuxin Li, Guy Lebanon, Irfan Essa

Sentiment analysis predicts the presence of positive or negative emotions in a text document.

Sentiment Analysis
