Search Results for author: Jason J. Corso

Found 63 papers, 26 papers with code

Unified Vision-Language Pre-Training for Image Captioning and VQA

3 code implementations 24 Sep 2019 Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, Jianfeng Gao

The model is unified in that (1) it can be fine-tuned for either vision-language generation (e.g., image captioning) or understanding (e.g., visual question answering) tasks, and (2) it uses a shared multi-layer transformer network for both encoding and decoding, which differs from many existing methods where the encoder and decoder are implemented using separate models.

Image Captioning Question Answering +2

Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation

3 code implementations CVPR 2019 Hao Tang, Dan Xu, Nicu Sebe, Yanzhi Wang, Jason J. Corso, Yan Yan

In this paper, we propose a novel approach named Multi-Channel Attention SelectionGAN (SelectionGAN) that makes it possible to generate images of natural scenes in arbitrary viewpoints, based on an image of the scene and a novel semantic map.

Bird View Synthesis Cross-View Image-to-Image Translation +1

Grounded Video Description

2 code implementations CVPR 2019 Luowei Zhou, Yannis Kalantidis, Xinlei Chen, Jason J. Corso, Marcus Rohrbach

Our dataset, ActivityNet-Entities, augments the challenging ActivityNet Captions dataset with 158k bounding box annotations, each grounding a noun phrase.

Sentence Video Description

Learning Object Depth from Camera Motion and Video Object Segmentation

2 code implementations ECCV 2020 Brent A. Griffin, Jason J. Corso

Video object segmentation, i.e., the separation of a target object from background in video, has made significant progress on real and challenging videos in recent years.

Object Segmentation +3

Depth from Camera Motion and Object Detection

1 code implementation CVPR 2021 Brent A. Griffin, Jason J. Corso

This paper addresses the problem of learning to estimate the depth of detected objects given some measurement of camera motion (e.g., from robot kinematics or vehicle odometry).

Object object-detection +1

M-PACT: An Open Source Platform for Repeatable Activity Classification Research

1 code implementation 16 Apr 2018 Eric Hofesmann, Madan Ravi Ganesh, Jason J. Corso

We present M-PACT to overcome existing issues by removing the need to develop boilerplate code, which allows users to quickly prototype action classification models while leveraging existing state-of-the-art (SOTA) models available in the platform.

Action Classification Activity Recognition +2

TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection

1 code implementation ICCV 2019 Kyle Min, Jason J. Corso

It consists of two building blocks: first, the encoder network extracts low-resolution spatiotemporal features from an input clip of several consecutive frames, and then the following prediction network decodes the encoded features spatially while aggregating all the temporal information.

Video Saliency Detection

Watch What You Just Said: Image Captioning with Text-Conditional Attention

1 code implementation 15 Jun 2016 Luowei Zhou, Chenliang Xu, Parker Koch, Jason J. Corso

Attention mechanisms have attracted considerable interest in image captioning due to their strong performance.

Image Captioning Language Modelling

Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization

1 code implementation ECCV 2020 Kyle Min, Jason J. Corso

Two triplets of the feature space are considered in our approach: one triplet is used to learn discriminative features for each activity class, and the other one is used to distinguish the features where no activity occurs (i.e., background features) from activity-related features for each video.

Metric Learning Weakly Supervised Action Localization +1

Towards Automatic Learning of Procedures from Web Instructional Videos

1 code implementation 28 Mar 2017 Luowei Zhou, Chenliang Xu, Jason J. Corso

To answer this question, we introduce the problem of procedure segmentation: segmenting a video procedure into category-independent procedure segments.

Dense Video Captioning Procedure Learning +1

A Temporally-Aware Interpolation Network for Video Frame Inpainting

1 code implementation 20 Mar 2018 Ximeng Sun, Ryan Szeto, Jason J. Corso

We propose the first deep learning solution to video frame inpainting, a challenging instance of the general video inpainting problem with applications in video editing, manipulation, and forensics.

Video Editing Video Inpainting +1

The DEVIL is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting

1 code implementation CVPR 2022 Ryan Szeto, Jason J. Corso

Quantitative evaluation has increased dramatically among recent video inpainting work, but the video and mask content used to gauge performance has received relatively little attention.

Attribute Video Inpainting

A Dataset To Evaluate The Representations Learned By Video Prediction Models

1 code implementation 25 Feb 2018 Ryan Szeto, Simon Stent, German Ros, Jason J. Corso

We present a parameterized synthetic dataset called Moving Symbols to support the objective study of video prediction networks.

Video Prediction

Novel Object Viewpoint Estimation through Reconstruction Alignment

1 code implementation CVPR 2020 Mohamed El Banani, Jason J. Corso, David F. Fouhey

Our key insight is that although we do not have an explicit 3D model or a predefined canonical pose, we can still learn to estimate the object's shape in the viewer's frame and then use an image to provide our reference model or canonical pose.

Image-to-Image Translation Object +1

Tukey-Inspired Video Object Segmentation

2 code implementations 19 Nov 2018 Brent A. Griffin, Jason J. Corso

We investigate the problem of strictly unsupervised video object segmentation, i.e., the separation of a primary object from background in video without a user-provided object mask or any training on an annotated dataset.

Object Segmentation +3

Video Object Segmentation-based Visual Servo Control and Object Depth Estimation on a Mobile Robot Platform

1 code implementation 20 Mar 2019 Brent Griffin, Victoria Florence, Jason J. Corso

To be useful in everyday environments, robots must be able to identify and locate unstructured, real-world objects.

Robotics

Attribute-Guided Sketch Generation

1 code implementation 28 Jan 2019 Hao Tang, Xinya Chen, Wei Wang, Dan Xu, Jason J. Corso, Nicu Sebe, Yan Yan

To this end, we propose a novel Attribute-Guided Sketch Generative Adversarial Network (ASGAN) which is an end-to-end framework and contains two pairs of generators and discriminators, one of which is used to generate faces with attributes while the other one is employed for image-to-sketch translation.

Attribute Generative Adversarial Network +1

Learning to Estimate External Forces of Human Motion in Video

1 code implementation 12 Jul 2022 Nathan Louis, Tylan N. Templin, Travis D. Eliason, Daniel P. Nicolella, Jason J. Corso

Analyzing sports performance or preventing injuries requires capturing ground reaction forces (GRFs) exerted by the human body during certain movements.

3D Human Pose Estimation Multi-Task Learning

A Critical Investigation of Deep Reinforcement Learning for Navigation

1 code implementation 7 Feb 2018 Vikas Dhiman, Shurjo Banerjee, Brent Griffin, Jeffrey M. Siskind, Jason J. Corso

However, when trained and tested on different sets of maps, the algorithm fails to transfer the ability to gather and exploit map-information to unseen maps.

Navigate reinforcement-learning +1

Temporally Guided Articulated Hand Pose Tracking in Surgical Videos

1 code implementation 12 Jan 2021 Nathan Louis, Luowei Zhou, Steven J. Yule, Roger D. Dias, Milisa Manojlovich, Francis D. Pagani, Donald S. Likosky, Jason J. Corso

Additionally, we collect the first dataset, Surgical Hands, that provides multi-instance articulated hand pose annotations for in-vivo videos.

Action Recognition Hand Pose Estimation +4

Video Object Segmentation using Supervoxel-Based Gerrymandering

1 code implementation 18 Apr 2017 Brent A. Griffin, Jason J. Corso

Focusing on the problem of strictly unsupervised video object segmentation, we devise a method called supervoxel gerrymandering that links masks of foregroundness and backgroundness via local and non-local consensus measures.

Object Semantic Segmentation +4

Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction

no code implementations 8 May 2018 Luowei Zhou, Nathan Louis, Jason J. Corso

A naive extension of this approach to the video domain is to treat the entire segment as a bag of spatial object proposals.

Descriptive Multiple Instance Learning +2

Joint Surgical Gesture and Task Classification with Multi-Task and Multimodal Learning

no code implementations 2 May 2018 Duygu Sarikaya, Khurshid A. Guru, Jason J. Corso

Our experimental results show that our approach is superior to an architecture that classifies the gestures and surgical tasks separately on visual cues and motion cues, respectively.

General Classification Multi-Task Learning

Learning Kinematic Descriptions using SPARE: Simulated and Physical ARticulated Extendable dataset

no code implementations 29 Mar 2018 Abhishek Venkataraman, Brent Griffin, Jason J. Corso

SPARE is an extendable open-source dataset providing equivalent simulated and physical instances of articulated objects (kinematic chains), giving the greater research community a training and evaluation tool for methods that generate kinematic descriptions of articulated objects.

T-RECS: Training for Rate-Invariant Embeddings by Controlling Speed for Action Recognition

no code implementations 21 Mar 2018 Madan Ravi Ganesh, Eric Hofesmann, Byungsu Min, Nadha Gafoor, Jason J. Corso

We explore the erratic behavior caused by this phenomenon on state-of-the-art deep network-based methods for action recognition in terms of maximum performance and stability in recognition accuracy across a range of input video speeds.

Action Recognition Temporal Action Localization

Deep Net Triage: Analyzing the Importance of Network Layers via Structural Compression

no code implementations 15 Jan 2018 Theodore S. Nowak, Jason J. Corso

As such, while certain structures have been found to work better than others, the significance of a model's unique structure, or the importance of a given layer, and how these translate to overall accuracy, remains unclear.

Knowledge Distillation

Adviser Networks: Learning What Question to Ask for Human-In-The-Loop Viewpoint Estimation

1 code implementation 5 Feb 2018 Mohamed El Banani, Jason J. Corso

We address this question by formulating it as an Adviser Problem: can we learn a mapping from the input to a specific question to ask the human to maximize the expected positive impact to the overall task?

Viewpoint Estimation

Predicting Future Lane Changes of Other Highway Vehicles using RNN-based Deep Models

no code implementations 12 Jan 2018 Sajan Patel, Brent Griffin, Kristofer Kusano, Jason J. Corso

To demonstrate our approach, we validate our model using authentic interstate highway driving to predict the future lane change maneuvers of other vehicles neighboring our autonomous vehicle.

Autonomous Vehicles Trajectory Prediction

Click Here: Human-Localized Keypoints as Guidance for Viewpoint Estimation

no code implementations ICCV 2017 Ryan Szeto, Jason J. Corso

We motivate and address a human-in-the-loop variant of the monocular viewpoint estimation task in which the location and class of one semantic object keypoint is available at test time.

Viewpoint Estimation

Action Understanding with Multiple Classes of Actors

no code implementations 27 Apr 2017 Chenliang Xu, Caiming Xiong, Jason J. Corso

Despite the rapid progress, existing works on action understanding focus strictly on one type of action agent, which we call actor (a human adult), ignoring the diversity of actions performed by other actors.

Action Recognition Action Segmentation +3

Sparse Factorization Layers for Neural Networks with Limited Supervision

no code implementations 14 Dec 2016 Parker Koch, Jason J. Corso

On the other hand, dictionary learning does not scale to the size of problems that CNNs can handle, despite being very effective at low-level vision tasks such as denoising and inpainting.

Denoising Dictionary Learning

Spatiotemporal Articulated Models for Dynamic SLAM

no code implementations 12 Apr 2016 Suren Kumar, Vikas Dhiman, Madan Ravi Ganesh, Jason J. Corso

We propose an online spatiotemporal articulation model estimation framework that estimates both articulated structure as well as a temporal prediction model solely using passive observations.

Simultaneous Localization and Mapping

LIBSVX: A Supervoxel Library and Benchmark for Early Video Processing

no code implementations 30 Dec 2015 Chenliang Xu, Jason J. Corso

Supervoxel segmentation has strong potential to be incorporated into early video analysis as superpixel segmentation has in image analysis.

Boundary Detection Segmentation +1

Actor-Action Semantic Segmentation with Grouping Process Models

no code implementations CVPR 2016 Chenliang Xu, Jason J. Corso

Actor-action semantic segmentation made an important step toward advanced video understanding problems: what action is happening; who is performing the action; and where is the action in space-time.

Semantic Segmentation Video Understanding

Compositional Structure Learning for Action Understanding

no code implementations 21 Oct 2014 Ran Xu, Gang Chen, Caiming Xiong, Wei Chen, Jason J. Corso

The focus of the action understanding literature has predominantly been classification; however, many applications, such as mobile robotics and video search, demand richer action understanding, with solutions to classification, localization and detection.

Action Detection Action Understanding +1

Semi-Supervised Nonlinear Distance Metric Learning via Forests of Max-Margin Cluster Hierarchies

no code implementations 23 Feb 2014 David M. Johnson, Caiming Xiong, Jason J. Corso

By introducing randomness during hierarchy training and combining the output of many of the resulting semi-random weak hierarchy metrics, we can obtain a powerful and robust nonlinear metric model.

Clustering Image Retrieval +2

Active Clustering with Model-Based Uncertainty Reduction

no code implementations 7 Feb 2014 Caiming Xiong, David Johnson, Jason J. Corso

Semi-supervised clustering seeks to augment traditional clustering methods by incorporating side information provided via human expertise in order to increase the semantic meaningfulness of the resulting clusters.

Clustering

A Study of Actor and Action Semantic Retention in Video Supervoxel Segmentation

no code implementations 13 Nov 2013 Chenliang Xu, Richard F. Doell, Stephen José Hanson, Catherine Hanson, Jason J. Corso

In this paper, we conduct a systematic study of how well the actor and action semantics are retained in video supervoxel segmentation.

object-detection Object Detection +1

Hamiltonian Streamline Guided Feature Extraction with Applications to Face Detection

no code implementations 17 Aug 2011 Yingjie Miao, Jason J. Corso

We propose a new feature extraction method based on two dynamical systems induced by intensity landscape: the negative gradient system and the Hamiltonian system.

Face Detection object-detection +1

Do Deep Reinforcement Learning Algorithms really Learn to Navigate?

no code implementations ICLR 2018 Shurjo Banerjee, Vikas Dhiman, Brent Griffin, Jason J. Corso

As the title of the paper by Mirowski et al. (2016) suggests, one might assume that DRL-based algorithms are able to “learn to navigate” and are thus ready to replace classical mapping and path-planning algorithms, at least in simulated environments.

Navigate reinforcement-learning +1

Can Humans Fly? Action Understanding With Multiple Classes of Actors

no code implementations CVPR 2015 Chenliang Xu, Shao-Hang Hsieh, Caiming Xiong, Jason J. Corso

There is no work we know of on simultaneously inferring actors and actions in the video, not to mention a dataset to experiment with.

Action Recognition Action Understanding +2

Human Action Segmentation With Hierarchical Supervoxel Consistency

no code implementations CVPR 2015 Jiasen Lu, Ran Xu, Jason J. Corso

Detailed analysis of human action, such as action classification, detection and localization has received increasing attention from the community; datasets like JHMDB have made it plausible to conduct studies analyzing the impact that such deeper information has on the greater action understanding problem.

Action Classification Action Segmentation +3

A Continuous Occlusion Model for Road Scene Understanding

no code implementations CVPR 2016 Vikas Dhiman, Quoc-Huy Tran, Jason J. Corso, Manmohan Chandraker

We present a physically interpretable, continuous 3D model for handling occlusions with applications to road scene understanding.

Motion Segmentation object-detection +3

Weakly Supervised Actor-Action Segmentation via Robust Multi-Task Ranking

no code implementations CVPR 2017 Yan Yan, Chenliang Xu, Dawen Cai, Jason J. Corso

However, current methods for detailed understanding of actor and action have significant limitations: they require large amounts of finely labeled data, and they fail to capture any internal relationship among actors and actions.

Action Classification Action Segmentation +2

Action Detection by Implicit Intentional Motion Clustering

no code implementations ICCV 2015 Wei Chen, Jason J. Corso

This paper hence seeks to understand the spatiotemporal properties of intentional movement and how to capture such intentional movement without relying on challenging human detection and tracking.

Action Detection Action Recognition +5

A Geometric Approach to Online Streaming Feature Selection

no code implementations 2 Oct 2019 Salimeh Yasaei Sekeh, Madan Ravi Ganesh, Shurjo Banerjee, Jason J. Corso, Alfred O. Hero

In this work, firstly, we assert that OSFS's main assumption of having data from all the samples available at runtime is unrealistic and introduce a new setting where features and samples are streamed concurrently called OSFS with Streaming Samples (OSFS-SS).

feature selection

HyperCon: Image-To-Video Model Transfer for Video-To-Video Translation Tasks

no code implementations 10 Dec 2019 Ryan Szeto, Mostafa El-Khamy, Jungwon Lee, Jason J. Corso

To combine the benefits of image and video models, we propose an image-to-video model transfer method called Hyperconsistency (HyperCon) that transforms any well-trained image model into a temporally consistent video model without fine-tuning.

Image-to-Image Translation Style Transfer +4

Rethinking Curriculum Learning with Incremental Labels and Adaptive Compensation

no code implementations 13 Jan 2020 Madan Ravi Ganesh, Jason J. Corso

In this work, we propose Learning with Incremental Labels and Adaptive Compensation (LILAC), a two-phase method that incrementally increases the number of unique output labels rather than the difficulty of samples while consistently using the entire dataset throughout training.

Data Augmentation Pseudo Label

MINT: Deep Network Compression via Mutual Information-based Neuron Trimming

no code implementations 18 Mar 2020 Madan Ravi Ganesh, Jason J. Corso, Salimeh Yasaei Sekeh

Most approaches to deep neural network compression via pruning either evaluate a filter's importance using its weights or optimize an alternative objective function with sparsity constraints.

Neural Network Compression

Slimming Neural Networks using Adaptive Connectivity Scores

no code implementations 22 Jun 2020 Madan Ravi Ganesh, Dawsin Blanchard, Jason J. Corso, Salimeh Yasaei Sekeh

Finally, we define a novel sensitivity criterion for filters that measures the strength of their contributions to the succeeding layer and highlights critical filters that need to be completely protected from pruning.

Ground-truth or DAER: Selective Re-query of Secondary Information

1 code implementation ICCV 2021 Stephan J. Lemmer, Jason J. Corso

Many vision tasks use secondary information at inference time (a seed) to assist a computer vision model in solving a problem.

Scene Classification Semantic Segmentation +4

The RobotSlang Benchmark: Dialog-guided Robot Localization and Navigation

no code implementations 23 Oct 2020 Shurjo Banerjee, Jesse Thomason, Jason J. Corso

In each trial, the pair first cooperates to localize the robot on a global map visible to the Commander, then the Driver follows Commander instructions to move the robot to a sequence of target objects.

Navigate Simultaneous Localization and Mapping

Evaluating and Improving Interactions with Hazy Oracles

no code implementations 19 Oct 2021 Stephan J. Lemmer, Jason J. Corso

Many AI systems integrate sensor inputs, world knowledge, and human-provided information to perform inference.

Referring Expression Referring Expression Comprehension +4

Q-TART: Quickly Training for Adversarial Robustness and in-Transferability

no code implementations 14 Apr 2022 Madan Ravi Ganesh, Salimeh Yasaei Sekeh, Jason J. Corso

Raw deep neural network (DNN) performance is not enough; in real-world settings, computational load, training efficiency and adversarial security are just as important, or even more so.

Adversarial Robustness
