no code implementations • ICLR 2019 • Vikas Dhiman, Shurjo Banerjee, Jeffrey M. Siskind, Jason J. Corso
Multi-goal reinforcement learning (MGRL) addresses tasks where the desired goal state can change for every trial.
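To make the multi-goal setting concrete, here is a minimal sketch of goal-conditioned tabular Q-learning on a toy 1-D chain, where the goal state changes every episode. All names and the environment are illustrative, not the paper's method.

```python
import random

# Toy multi-goal setting: a 1-D chain of N states; the agent moves left or
# right, and the goal state is re-sampled at the start of each trial.
# Q-values are conditioned on the goal as well as the state.
N, ACTIONS = 5, (-1, +1)
ALPHA, GAMMA, EPISODES = 0.5, 0.9, 500

Q = {}  # Q[(state, goal, action)]

def q(s, g, a):
    return Q.get((s, g, a), 0.0)

random.seed(0)
for _ in range(EPISODES):
    goal = random.randrange(N)          # a new goal each trial
    s = random.randrange(N)
    for _ in range(20):
        # epsilon-greedy action selection (epsilon = 0.2)
        a = max(ACTIONS, key=lambda a: q(s, goal, a)) if random.random() > 0.2 \
            else random.choice(ACTIONS)
        s2 = min(N - 1, max(0, s + a))  # clip to the chain's ends
        r = 1.0 if s2 == goal else 0.0
        target = r + (0.0 if s2 == goal
                      else GAMMA * max(q(s2, goal, b) for b in ACTIONS))
        Q[(s, goal, a)] = q(s, goal, a) + ALPHA * (target - q(s, goal, a))
        if s2 == goal:
            break
        s = s2
```

After training, the greedy policy should step toward whichever goal is active, e.g. preferring `+1` in state 3 when the goal is 4.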
1 code implementation • 12 Jul 2022 • Nathan Louis, Tylan N. Templin, Travis D. Eliason, Daniel P. Nicolella, Jason J. Corso
Analyzing sports performance or preventing injuries requires capturing ground reaction forces (GRFs) exerted by the human body during certain movements.
no code implementations • 14 Apr 2022 • Madan Ravi Ganesh, Salimeh Yasaei Sekeh, Jason J. Corso
Raw deep neural network (DNN) performance is not enough; in real-world settings, computational load, training efficiency, and adversarial security are just as important, if not more so.
no code implementations • 19 Oct 2021 • Stephan J. Lemmer, Jason J. Corso
Many AI systems integrate sensor inputs, world knowledge, and human-provided information to perform inference.
1 code implementation • CVPR 2022 • Ryan Szeto, Jason J. Corso
Quantitative evaluation has increased dramatically among recent video inpainting work, but the video and mask content used to gauge performance has received relatively little attention.
1 code implementation • CVPR 2021 • Brent A. Griffin, Jason J. Corso
This paper addresses the problem of learning to estimate the depth of detected objects given some measurement of camera motion (e.g., from robot kinematics or vehicle odometry).
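As a back-of-the-envelope illustration of the underlying geometry (not the paper's learned method): with a known lateral camera translation b from odometry and the resulting pixel disparity d of a tracked object, a pinhole camera of focal length f gives depth Z = f * b / d. The numbers below are invented for illustration.

```python
def depth_from_motion(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth (metres) of a static point observed before and after a lateral move."""
    if disparity_px <= 0:
        raise ValueError("object must shift in the image for depth to be observable")
    return focal_px * baseline_m / disparity_px

# A camera with f = 700 px translates 0.2 m sideways; the object shifts 14 px.
z = depth_from_motion(700.0, 0.2, 14.0)   # -> 10.0 m
```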
1 code implementation • 12 Jan 2021 • Nathan Louis, Luowei Zhou, Steven J. Yule, Roger D. Dias, Milisa Manojlovich, Francis D. Pagani, Donald S. Likosky, Jason J. Corso
Additionally, we collect the first dataset, Surgical Hands, that provides multi-instance articulated hand pose annotations for in-vivo videos.
1 code implementation • 8 Nov 2020 • Kyle Min, Jason J. Corso
In addition, we model the distribution of gaze fixations using a variational method.
Ranked #1 on Egocentric Activity Recognition on EGTEA
no code implementations • 23 Oct 2020 • Shurjo Banerjee, Jesse Thomason, Jason J. Corso
In each trial, the pair first cooperates to localize the robot on a global map visible to the Commander, then the Driver follows Commander instructions to move the robot to a sequence of target objects.
1 code implementation • ICCV 2021 • Stephan J. Lemmer, Jason J. Corso
Many vision tasks use secondary information at inference time -- a seed -- to assist a computer vision model in solving a problem.
no code implementations • 29 Jul 2020 • Duygu Sarikaya, Jason J. Corso, Khurshid A. Guru
We propose an architecture using multimodal convolutional neural networks for fast detection and localization of tools in RAS videos.
1 code implementation • ECCV 2020 • Kyle Min, Jason J. Corso
Two triplets of the feature space are considered in our approach: one triplet is used to learn discriminative features for each activity class, and the other is used to distinguish features where no activity occurs (i.e., background features) from activity-related features for each video.
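For reference, a minimal sketch of the standard triplet margin loss that such two-triplet formulations build on (this is the generic loss, not the paper's exact objective): pull an anchor toward a positive feature and push it away from a negative one.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss with Euclidean distances."""
    d = lambda u, v: math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return max(0.0, d(anchor, positive) - d(anchor, negative) + margin)

# Anchor close to the positive and far from the negative: loss reaches zero.
loss = triplet_loss([0.0, 0.0], [0.1, 0.0], [3.0, 0.0])
```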
2 code implementations • ECCV 2020 • Brent A. Griffin, Jason J. Corso
Video object segmentation, i.e., the separation of a target object from background in video, has made significant progress on real and challenging videos in recent years.
no code implementations • 22 Jun 2020 • Madan Ravi Ganesh, Dawsin Blanchard, Jason J. Corso, Salimeh Yasaei Sekeh
Finally, we define a novel sensitivity criterion for filters that measures the strength of their contributions to the succeeding layer and highlights critical filters that need to be completely protected from pruning.
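To illustrate the general shape of such filter-level criteria, here is a hedged sketch using a generic L1-magnitude score as a stand-in (the paper's criterion instead measures a filter's contribution to the succeeding layer): rank filters, then protect the strongest from pruning.

```python
def rank_filters(filters):
    """Return filter indices sorted by ascending L1 norm (prune-first order)."""
    scores = [sum(abs(w) for w in f) for f in filters]
    return sorted(range(len(filters)), key=lambda i: scores[i])

def prune_mask(filters, keep_protected=1, prune_frac=0.5):
    """Mark filters for pruning, always protecting the top-`keep_protected`."""
    order = rank_filters(filters)
    protected = set(order[-keep_protected:])     # critical filters, never pruned
    n_prune = int(len(filters) * prune_frac)
    pruned = [i for i in order if i not in protected][:n_prune]
    return [i in pruned for i in range(len(filters))]

# Four toy filters, flattened to weight lists; True = pruned.
conv_filters = [[0.1, -0.1], [2.0, 1.5], [0.3, 0.2], [0.05, 0.0]]
mask = prune_mask(conv_filters)
```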
1 code implementation • CVPR 2020 • Mohamed El Banani, Jason J. Corso, David F. Fouhey
Our key insight is that although we do not have an explicit 3D model or a predefined canonical pose, we can still learn to estimate the object's shape in the viewer's frame and then use an image to provide our reference model or canonical pose.
no code implementations • 18 Mar 2020 • Madan Ravi Ganesh, Jason J. Corso, Salimeh Yasaei Sekeh
Most approaches to deep neural network compression via pruning either evaluate a filter's importance using its weights or optimize an alternative objective function with sparsity constraints.
no code implementations • 13 Jan 2020 • Madan Ravi Ganesh, Jason J. Corso
In this work, we propose Learning with Incremental Labels and Adaptive Compensation (LILAC), a two-phase method that incrementally increases the number of unique output labels rather than the difficulty of samples while consistently using the entire dataset throughout training.
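A minimal sketch of the incremental-label idea follows; the placeholder-class scheme and names here are illustrative simplifications, not LILAC's exact recipe. The whole dataset is used in every phase, but labels outside the currently revealed set are collapsed into a single placeholder class.

```python
UNKNOWN = -1  # placeholder class for not-yet-revealed labels

def relabel(labels, num_revealed):
    """Keep labels < num_revealed; map the rest to a placeholder class."""
    return [y if y < num_revealed else UNKNOWN for y in labels]

dataset_labels = [0, 3, 1, 2, 3, 0]
phase1 = relabel(dataset_labels, 2)   # only classes {0, 1} revealed
phase2 = relabel(dataset_labels, 4)   # all classes revealed
```

Note that every phase sees all samples; only the label space grows.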
no code implementations • 10 Dec 2019 • Ryan Szeto, Mostafa El-Khamy, Jungwon Lee, Jason J. Corso
To combine the benefits of image and video models, we propose an image-to-video model transfer method called Hyperconsistency (HyperCon) that transforms any well-trained image model into a temporally consistent video model without fine-tuning.
no code implementations • 2 Oct 2019 • Salimeh Yasaei Sekeh, Madan Ravi Ganesh, Shurjo Banerjee, Jason J. Corso, Alfred O. Hero
In this work, we first argue that OSFS's main assumption, that data from all samples is available at runtime, is unrealistic, and we introduce a new setting in which features and samples are streamed concurrently, called OSFS with Streaming Samples (OSFS-SS).
3 code implementations • 24 Sep 2019 • Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, Jianfeng Gao
The model is unified in that (1) it can be fine-tuned for either vision-language generation (e.g., image captioning) or understanding (e.g., visual question answering) tasks, and (2) it uses a shared multi-layer transformer network for both encoding and decoding, which differs from many existing methods where the encoder and decoder are implemented using separate models.
Ranked #1 on Image Captioning on Flickr30k Captions test
1 code implementation • ICCV 2019 • Kyle Min, Jason J. Corso
It consists of two building blocks: first, the encoder network extracts low-resolution spatiotemporal features from an input clip of several consecutive frames, and then the following prediction network decodes the encoded features spatially while aggregating all the temporal information.
Ranked #3 on Video Saliency Detection on DHF1K
3 code implementations • CVPR 2019 • Hao Tang, Dan Xu, Nicu Sebe, Yanzhi Wang, Jason J. Corso, Yan Yan
In this paper, we propose a novel approach named Multi-Channel Attention SelectionGAN (SelectionGAN) that makes it possible to generate images of natural scenes in arbitrary viewpoints, based on an image of the scene and a novel semantic map.
Bird View Synthesis • Cross-View Image-to-Image Translation +1
1 code implementation • CVPR 2019 • Brent A. Griffin, Jason J. Corso
Semi-supervised video object segmentation has made significant progress on real and challenging videos in recent years.
Semantic Segmentation • Semi-Supervised Video Object Segmentation +1
1 code implementation • 20 Mar 2019 • Brent Griffin, Victoria Florence, Jason J. Corso
To be useful in everyday environments, robots must be able to identify and locate unstructured, real-world objects.
1 code implementation • 28 Jan 2019 • Hao Tang, Xinya Chen, Wei Wang, Dan Xu, Jason J. Corso, Nicu Sebe, Yan Yan
To this end, we propose a novel Attribute-Guided Sketch Generative Adversarial Network (ASGAN), an end-to-end framework containing two pairs of generators and discriminators: one pair generates faces with attributes, while the other performs image-to-sketch translation.
2 code implementations • CVPR 2019 • Luowei Zhou, Yannis Kalantidis, Xinlei Chen, Jason J. Corso, Marcus Rohrbach
Our dataset, ActivityNet-Entities, augments the challenging ActivityNet Captions dataset with 158k bounding box annotations, each grounding a noun phrase.
no code implementations • 13 Dec 2018 • Hao Huang, Luowei Zhou, Wei zhang, Jason J. Corso, Chenliang Xu
Video action recognition, a critical problem in video understanding, has been gaining increasing attention.
2 code implementations • 19 Nov 2018 • Brent A. Griffin, Jason J. Corso
We investigate the problem of strictly unsupervised video object segmentation, i.e., the separation of a primary object from background in video without a user-provided object mask or any training on an annotated dataset.
Semantic Segmentation • Unsupervised Video Object Segmentation +1
no code implementations • 25 Sep 2018 • Vikas Dhiman, Shurjo Banerjee, Jeffrey M. Siskind, Jason J. Corso
We do this by adapting the Floyd-Warshall algorithm for RL and call the adaptation Floyd-Warshall RL (FWRL).
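For context, here is the classical Floyd-Warshall all-pairs shortest-path recurrence that the paper adapts to RL, shown in its textbook graph form (not the RL variant): dist[i][j] is relaxed through every intermediate node k.

```python
INF = float("inf")

def floyd_warshall(dist):
    """All-pairs shortest paths on an adjacency matrix (INF = no edge)."""
    n = len(dist)
    d = [row[:] for row in dist]          # copy so the input is not mutated
    for k in range(n):                    # allow k as an intermediate node
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

graph = [
    [0,   3, INF],
    [INF, 0,   1],
    [1, INF,   0],
]
shortest = floyd_warshall(graph)   # shortest[0][2] == 4 via 0 -> 1 -> 2
```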
no code implementations • 8 May 2018 • Luowei Zhou, Nathan Louis, Jason J. Corso
A naive extension of this approach to the video domain is to treat the entire segment as a bag of spatial object proposals.
no code implementations • 2 May 2018 • Duygu Sarikaya, Khurshid A. Guru, Jason J. Corso
Our experimental results show that our approach is superior to an architecture that classifies the gestures and surgical tasks separately, using visual cues and motion cues respectively.
1 code implementation • 16 Apr 2018 • Eric Hofesmann, Madan Ravi Ganesh, Jason J. Corso
We present M-PACT to overcome these issues: it removes the need to develop boilerplate code, allowing users to quickly prototype action classification models while leveraging existing state-of-the-art (SOTA) models available in the platform.
1 code implementation • CVPR 2018 • Luowei Zhou, Yingbo Zhou, Jason J. Corso, Richard Socher, Caiming Xiong
To address this problem, we propose an end-to-end transformer model for dense video captioning.
Ranked #7 on Video Captioning on YouCook2
no code implementations • 29 Mar 2018 • Abhishek Venkataraman, Brent Griffin, Jason J. Corso
SPARE is an extendable, open-source dataset providing equivalent simulated and physical instances of articulated objects (kinematic chains), giving the greater research community a training and evaluation tool for methods that generate kinematic descriptions of articulated objects.
no code implementations • 21 Mar 2018 • Madan Ravi Ganesh, Eric Hofesmann, Byungsu Min, Nadha Gafoor, Jason J. Corso
We explore the erratic behavior caused by this phenomenon in state-of-the-art deep network-based methods for action recognition, in terms of maximum performance and stability of recognition accuracy across a range of input video speeds.
1 code implementation • 20 Mar 2018 • Ximeng Sun, Ryan Szeto, Jason J. Corso
We propose the first deep learning solution to video frame inpainting, a challenging instance of the general video inpainting problem with applications in video editing, manipulation, and forensics.
1 code implementation • 25 Feb 2018 • Ryan Szeto, Simon Stent, German Ros, Jason J. Corso
We present a parameterized synthetic dataset called Moving Symbols to support the objective study of video prediction networks.
1 code implementation • 7 Feb 2018 • Vikas Dhiman, Shurjo Banerjee, Brent Griffin, Jeffrey M. Siskind, Jason J. Corso
However, when trained and tested on different sets of maps, the algorithm fails to transfer the ability to gather and exploit map-information to unseen maps.
1 code implementation • 5 Feb 2018 • Mohamed El Banani, Jason J. Corso
We address this question by formulating it as an Adviser Problem: can we learn a mapping from the input to a specific question to ask the human to maximize the expected positive impact to the overall task?
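A hedged sketch of the "which question should we ask?" decision: choose the query whose expected net benefit to the task is largest. The candidate questions and their outcome estimates below are invented for illustration, not the paper's learned mapping.

```python
def best_question(questions, p_helpful, gain_if_helpful, cost):
    """Pick the question index maximizing expected net benefit of asking."""
    def expected_net(qi):
        return p_helpful[qi] * gain_if_helpful[qi] - cost[qi]
    return max(range(len(questions)), key=expected_net)

questions = ["is the object occluded?", "which keypoint is visible?"]
idx = best_question(questions,
                    p_helpful=[0.9, 0.4],       # chance the answer helps
                    gain_if_helpful=[0.2, 0.8], # task improvement if it does
                    cost=[0.05, 0.05])          # cost of asking the human
```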
no code implementations • 15 Jan 2018 • Theodore S. Nowak, Jason J. Corso
As such, while certain structures have been found to work better than others, the significance of a model's unique structure, or the importance of a given layer, and how these translate to overall accuracy, remains unclear.
no code implementations • 12 Jan 2018 • Sajan Patel, Brent Griffin, Kristofer Kusano, Jason J. Corso
To demonstrate our approach, we validate our model using authentic interstate highway driving to predict the future lane change maneuvers of other vehicles neighboring our autonomous vehicle.
no code implementations • ICLR 2018 • Shurjo Banerjee, Vikas Dhiman, Brent Griffin, Jason J. Corso
As the title of the paper by Mirowski et al. (2016) suggests, one might assume that DRL-based algorithms are able to “learn to navigate” and are thus ready to replace classical mapping and path-planning algorithms, at least in simulated environments.
no code implementations • CVPR 2017 • Yan Yan, Chenliang Xu, Dawen Cai, Jason J. Corso
However, current methods for detailed understanding of actor and action have significant limitations: they require large amounts of finely labeled data, and they fail to capture any internal relationship among actors and actions.
no code implementations • 27 Apr 2017 • Chenliang Xu, Caiming Xiong, Jason J. Corso
Despite the rapid progress, existing works on action understanding focus strictly on one type of action agent, which we call the actor (a human adult), ignoring the diversity of actions performed by other actors.
1 code implementation • 18 Apr 2017 • Brent A. Griffin, Jason J. Corso
Focusing on the problem of strictly unsupervised video object segmentation, we devise a method called supervoxel gerrymandering that links masks of foregroundness and backgroundness via local and non-local consensus measures.
no code implementations • ICCV 2017 • Ryan Szeto, Jason J. Corso
We motivate and address a human-in-the-loop variant of the monocular viewpoint estimation task in which the location and class of one semantic object keypoint is available at test time.
1 code implementation • 28 Mar 2017 • Luowei Zhou, Chenliang Xu, Jason J. Corso
To answer this question, we introduce the problem of procedure segmentation: segmenting a video procedure into category-independent procedure segments.
no code implementations • 14 Dec 2016 • Parker Koch, Jason J. Corso
On the other hand, dictionary learning does not scale to the size of problems that CNNs can handle, despite being very effective at low-level vision tasks such as denoising and inpainting.
1 code implementation • 15 Jun 2016 • Luowei Zhou, Chenliang Xu, Parker Koch, Jason J. Corso
Attention mechanisms have attracted considerable interest in image captioning due to their strong performance.
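A minimal sketch of the attention weighting at the heart of such captioning models (generic dot-product attention over image regions, not the paper's specific mechanism): scores become a softmax distribution used to form a weighted context vector.

```python
import math

def attend(query, regions):
    """Dot-product attention: softmax over region scores, weighted context."""
    scores = [sum(q * r for q, r in zip(query, reg)) for reg in regions]
    m = max(scores)                                # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    weights = [e / sum(exps) for e in exps]
    context = [sum(w * reg[i] for w, reg in zip(weights, regions))
               for i in range(len(regions[0]))]
    return weights, context

# Query aligned with the first region gets the larger weight.
w, ctx = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```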
no code implementations • CVPR 2016 • Vikas Dhiman, Quoc-Huy Tran, Jason J. Corso, Manmohan Chandraker
We present a physically interpretable, continuous 3D model for handling occlusions with applications to road scene understanding.
no code implementations • 12 Apr 2016 • Suren Kumar, Vikas Dhiman, Madan Ravi Ganesh, Jason J. Corso
We propose an online spatiotemporal articulation model estimation framework that estimates both articulated structure as well as a temporal prediction model solely using passive observations.
no code implementations • 30 Dec 2015 • Chenliang Xu, Jason J. Corso
Supervoxel segmentation has strong potential to be incorporated into early video analysis as superpixel segmentation has in image analysis.
no code implementations • CVPR 2016 • Chenliang Xu, Jason J. Corso
Actor-action semantic segmentation made an important step toward advanced video understanding problems: what action is happening; who is performing the action; and where is the action in space-time.
no code implementations • ICCV 2015 • Wei Chen, Jason J. Corso
This paper hence seeks to understand the spatiotemporal properties of intentional movement and how to capture such intentional movement without relying on challenging human detection and tracking.
no code implementations • CVPR 2015 • Jiasen Lu, Ran Xu, Jason J. Corso
Detailed analysis of human action, such as action classification, detection and localization has received increasing attention from the community; datasets like JHMDB have made it plausible to conduct studies analyzing the impact that such deeper information has on the greater action understanding problem.
no code implementations • CVPR 2015 • Chenliang Xu, Shao-Hang Hsieh, Caiming Xiong, Jason J. Corso
To our knowledge, there is no work on simultaneously inferring actors and actions in video, let alone a dataset with which to experiment.
no code implementations • 21 Oct 2014 • Ran Xu, Gang Chen, Caiming Xiong, Wei Chen, Jason J. Corso
The focus of the action understanding literature has predominantly been classification; however, many applications, such as mobile robotics and video search, demand richer action understanding, with solutions to classification, localization and detection.
no code implementations • CVPR 2014 • Wei Chen, Caiming Xiong, Ran Xu, Jason J. Corso
Action analysis in images and video has attracted increasing attention in computer vision.
no code implementations • 23 Feb 2014 • David M. Johnson, Caiming Xiong, Jason J. Corso
By introducing randomness during hierarchy training and combining the output of many of the resulting semi-random weak hierarchy metrics, we can obtain a powerful and robust nonlinear metric model.
no code implementations • 7 Feb 2014 • Caiming Xiong, David Johnson, Jason J. Corso
Semi-supervised clustering seeks to augment traditional clustering methods by incorporating side information provided via human expertise in order to increase the semantic meaningfulness of the resulting clusters.
no code implementations • 13 Nov 2013 • Chenliang Xu, Richard F. Doell, Stephen José Hanson, Catherine Hanson, Jason J. Corso
In this paper, we conduct a systematic study of how well the actor and action semantics are retained in video supervoxel segmentation.
no code implementations • CVPR 2013 • Pradipto Das, Chenliang Xu, Richard F. Doell, Jason J. Corso
The problem of describing images through natural language has gained importance in the computer vision community.
no code implementations • 17 Aug 2011 • Yingjie Miao, Jason J. Corso
We propose a new feature extraction method based on two dynamical systems induced by intensity landscape: the negative gradient system and the Hamiltonian system.
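A toy sketch of the negative gradient system idea: treat image intensity as a landscape and follow the steepest descent direction until a local minimum, a candidate feature location. The 4-neighbour discrete descent below is a simplification for illustration, not the paper's continuous dynamical system.

```python
def descend(intensity, start):
    """Follow the steepest-descent neighbour until no neighbour is lower."""
    r, c = start
    rows, cols = len(intensity), len(intensity[0])
    while True:
        nbrs = [(r + dr, c + dc)
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols]
        best = min(nbrs, key=lambda p: intensity[p[0]][p[1]])
        if intensity[best[0]][best[1]] >= intensity[r][c]:
            return (r, c)          # local minimum reached
        r, c = best

# A small intensity landscape with its basin minimum at (2, 2).
I = [
    [9, 8, 7],
    [8, 6, 4],
    [7, 4, 1],
]
sink = descend(I, (0, 0))
```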