Search Results for author: Anelia Angelova

Found 46 papers, 17 papers with code

AssembleNet++: Assembling Modality Representations via Attention Connections - Supplementary Material -

no code implementations ECCV 2020 Michael S. Ryoo, AJ Piergiovanni, Juhana Kangaspunta, Anelia Angelova

We create a family of powerful video models which are able to: (i) learn interactions between semantic object information and raw appearance and motion features, and (ii) deploy attention in order to better learn the importance of features at each convolutional block of the network.

Activity Recognition

Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering

no code implementations 2 May 2022 AJ Piergiovanni, Wei Li, Weicheng Kuo, Mohammad Saffar, Fred Bertsch, Anelia Angelova

We present Answer-Me, a task-aware multi-task framework which unifies a variety of question answering tasks, such as visual question answering, visual entailment, and visual reasoning.

Image Captioning Question Answering +3

FindIt: Generalized Localization with Natural Language Queries

no code implementations 31 Mar 2022 Weicheng Kuo, Fred Bertsch, Wei Li, AJ Piergiovanni, Mohammad Saffar, Anelia Angelova

We propose FindIt, a simple and versatile framework that unifies a variety of visual grounding and localization tasks including referring expression comprehension, text-based localization, and object detection.

Object Detection Referring Expression +2

TokenLearner: Adaptive Space-Time Tokenization for Videos

no code implementations NeurIPS 2021 Michael Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova

In this paper, we introduce a novel visual representation learning which relies on a handful of adaptively learned tokens, and which is applicable to both image and video understanding tasks.

Representation Learning Video Recognition +1

4D-Net for Learned Multi-Modal Alignment

no code implementations ICCV 2021 AJ Piergiovanni, Vincent Casser, Michael S. Ryoo, Anelia Angelova

We present 4D-Net, a 3D object detection approach, which utilizes 3D Point Cloud and RGB sensing information, both in time.

3D Object Detection

Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image

no code implementations ICCV 2021 Weicheng Kuo, Anelia Angelova, Tsung-Yi Lin, Angela Dai

3D perception of object shapes from RGB image input is fundamental towards semantic scene understanding, grounding image-based perception in our spatially 3-dimensional real-world environments.

Scene Understanding

Learning Open-World Object Proposals without Learning to Classify

2 code implementations 15 Aug 2021 Dahun Kim, Tsung-Yi Lin, Anelia Angelova, In So Kweon, Weicheng Kuo

In this paper, we identify that the problem is that the binary classifiers in existing proposal methods tend to overfit to the training categories.

Object Discovery Object Localization +2

Unsupervised Discovery of Actions in Instructional Videos

no code implementations 28 Jun 2021 AJ Piergiovanni, Anelia Angelova, Michael S. Ryoo, Irfan Essa

In this paper, we address the problem of automatically discovering atomic actions in an unsupervised manner from instructional videos.

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

3 code implementations 21 Jun 2021 Michael S. Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova

In this paper, we introduce a novel visual representation learning which relies on a handful of adaptively learned tokens, and which is applicable to both image and video understanding tasks.

Action Classification Image Classification +3

Unsupervised Action Segmentation for Instructional Videos

no code implementations 7 Jun 2021 AJ Piergiovanni, Anelia Angelova, Michael S. Ryoo, Irfan Essa

In this paper, we address the problem of automatically discovering atomic actions in an unsupervised manner from instructional videos, which are rarely annotated with atomic actions.

Action Segmentation

SMURF: Self-Teaching Multi-Frame Unsupervised RAFT with Full-Image Warping

1 code implementation CVPR 2021 Austin Stone, Daniel Maurer, Alper Ayvaci, Anelia Angelova, Rico Jonschkowski

We present SMURF, a method for unsupervised learning of optical flow that improves state of the art on all benchmarks by 36% to 40% (over the prior best method UFlow) and even outperforms several supervised approaches such as PWC-Net and FlowNet2.

Frame Optical Flow Estimation

Adaptive Intermediate Representations for Video Understanding

no code implementations 14 Apr 2021 Juhana Kangaspunta, AJ Piergiovanni, Rico Jonschkowski, Michael Ryoo, Anelia Angelova

A common strategy for video understanding is to incorporate spatial and motion information by fusing features derived from RGB frames and optical flow.

Action Classification Optical Flow Estimation +2

Visionary: Vision architecture discovery for robot learning

no code implementations 26 Mar 2021 Iretiayo Akinola, Anelia Angelova, Yao Lu, Yevgen Chebotar, Dmitry Kalashnikov, Jacob Varley, Julian Ibarz, Michael S. Ryoo

We propose a vision-based architecture search algorithm for robot manipulation learning, which discovers interactions between low dimension action inputs and high dimensional visual inputs.

Neural Architecture Search

Unsupervised Monocular Depth Learning in Dynamic Scenes

4 code implementations 30 Oct 2020 Hanhan Li, Ariel Gordon, Hang Zhao, Vincent Casser, Anelia Angelova

We present a method for jointly training the estimation of depth, ego-motion, and a dense 3D translation field of objects relative to the scene, with monocular photometric consistency being the sole source of supervision.

Depth Estimation Translation

AssembleNet++: Assembling Modality Representations via Attention Connections

1 code implementation 18 Aug 2020 Michael S. Ryoo, AJ Piergiovanni, Juhana Kangaspunta, Anelia Angelova

We create a family of powerful video models which are able to: (i) learn interactions between semantic object information and raw appearance and motion features, and (ii) deploy attention in order to better learn the importance of features at each convolutional block of the network.

Action Classification Activity Recognition

Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve

no code implementations ECCV 2020 Wei-cheng Kuo, Anelia Angelova, Tsung-Yi Lin, Angela Dai

We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image by constructing a CAD-based representation of the objects and their poses.

Object Recognition

AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification

no code implementations ECCV 2020 Xiaofang Wang, Xuehan Xiong, Maxim Neumann, AJ Piergiovanni, Michael S. Ryoo, Anelia Angelova, Kris M. Kitani, Wei Hua

The discovered attention cells can be seamlessly inserted into existing backbone networks, e.g., I3D or S3D, and improve video classification accuracy by more than 2% on both Kinetics-600 and MiT datasets.

Classification General Classification +1

What Matters in Unsupervised Optical Flow

1 code implementation ECCV 2020 Rico Jonschkowski, Austin Stone, Jonathan T. Barron, Ariel Gordon, Kurt Konolige, Anelia Angelova

We systematically compare and analyze a set of key components in unsupervised optical flow to identify which photometric loss, occlusion handling, and smoothness regularization are most effective.

Occlusion Handling Optical Flow Estimation

Differentiable Mapping Networks: Learning Structured Map Representations for Sparse Visual Localization

no code implementations 19 May 2020 Peter Karkus, Anelia Angelova, Vincent Vanhoucke, Rico Jonschkowski

We address these tasks by combining spatial structure (differentiable mapping) and end-to-end learning in a novel neural network architecture: the Differentiable Mapping Network (DMN).

Visual Localization

Taskology: Utilizing Task Relations at Scale

no code implementations CVPR 2021 Yao Lu, Sören Pirk, Jan Dlabal, Anthony Brohan, Ankita Pasad, Zhao Chen, Vincent Casser, Anelia Angelova, Ariel Gordon

Many computer vision tasks address the problem of scene understanding and are naturally interrelated, e.g., object classification, detection, scene segmentation, and depth estimation.

Depth Estimation Motion Estimation +3

X-Ray: Mechanical Search for an Occluded Object by Minimizing Support of Learned Occupancy Distributions

no code implementations 20 Apr 2020 Michael Danielczuk, Anelia Angelova, Vincent Vanhoucke, Ken Goldberg

For applications in e-commerce, warehouses, healthcare, and home service, robots are often required to search through heaps of objects to grasp a specific target object.

Improving Semantic Segmentation through Spatio-Temporal Consistency Learned from Videos

no code implementations 11 Apr 2020 Ankita Pasad, Ariel Gordon, Tsung-Yi Lin, Anelia Angelova

We leverage unsupervised learning of depth, egomotion, and camera intrinsics to improve the performance of single-image semantic segmentation, by enforcing 3D-geometric and temporal consistency of segmentation masks across video frames.

Semantic Segmentation

SPIN: A High Speed, High Resolution Vision Dataset for Tracking and Action Recognition in Ping Pong

no code implementations 13 Dec 2019 Steven Schwarcz, Peng Xu, David D'Ambrosio, Juhana Kangaspunta, Anelia Angelova, Huong Phan, Navdeep Jaitly

The corpus consists of ping pong play with three main annotation streams that can be used to learn tracking and action recognition models -- tracking of the ping pong ball and poses of humans in the videos and the spin of the ball being hit by humans.

Action Recognition Pose Estimation

KeyPose: Multi-View 3D Labeling and Keypoint Estimation for Transparent Objects

2 code implementations CVPR 2020 Xingyu Liu, Rico Jonschkowski, Anelia Angelova, Kurt Konolige

We address two problems: first, we establish an easy method for capturing and labeling 3D keypoints on desktop objects with an RGB camera; and second, we develop a deep neural network, called KeyPose, that learns to accurately predict object poses using 3D keypoints, from stereo input, and works even for transparent objects.

3D Pose Estimation Transparent objects

Tiny Video Networks

2 code implementations 15 Oct 2019 AJ Piergiovanni, Anelia Angelova, Michael S. Ryoo

Video understanding is a challenging problem with great impact on the abilities of autonomous agents working in the real-world.

Video Understanding

Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras

4 code implementations ICCV 2019 Ariel Gordon, Hanhan Li, Rico Jonschkowski, Anelia Angelova

We present a novel method for simultaneous learning of depth, egomotion, object motion, and camera intrinsics from monocular videos, using only consistency across neighboring video frames as supervision signal.

Depth Estimation

ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors

1 code implementation ICCV 2019 Wei-cheng Kuo, Anelia Angelova, Jitendra Malik, Tsung-Yi Lin

However, it is difficult and costly to segment objects in novel categories because a large number of mask annotations is required.

Instance Segmentation Semantic Segmentation

Differentiable Grammars for Videos

no code implementations 1 Feb 2019 AJ Piergiovanni, Anelia Angelova, Michael S. Ryoo

This paper proposes a novel algorithm which learns a formal regular grammar from real-world continuous data, such as videos.

Future Segmentation Using 3D Structure

no code implementations 28 Nov 2018 Suhani Vora, Reza Mahjourian, Soeren Pirk, Anelia Angelova

Predicting the future to anticipate the outcome of events and actions is a critical attribute of autonomous agents, particularly for agents that must rely heavily on real-time visual data for decision making.

Decision Making Frame +1

Probabilistic Object Detection: Definition and Evaluation

1 code implementation 27 Nov 2018 David Hall, Feras Dayoub, John Skinner, Haoyang Zhang, Dimity Miller, Peter Corke, Gustavo Carneiro, Anelia Angelova, Niko Sünderhauf

We introduce Probabilistic Object Detection, the task of detecting objects in images and accurately quantifying the spatial and semantic uncertainties of the detections.

Object Detection

Object category learning and retrieval with weak supervision

1 code implementation 26 Jan 2018 Steven Hickson, Anelia Angelova, Irfan Essa, Rahul Sukthankar

We consider the problem of retrieving objects from image data and learning to classify them into meaningful semantic categories with minimal supervision.

Deep Clustering

Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs

1 code implementation ICML 2017 Michael Gygli, Mohammad Norouzi, Anelia Angelova

We approach structured output prediction by optimizing a deep value network (DVN) to precisely estimate the task loss on different output configurations for a given input.

General Classification Multi-Label Classification +1

Improved generator objectives for GANs

no code implementations 8 Dec 2016 Ben Poole, Alexander A. Alemi, Jascha Sohl-Dickstein, Anelia Angelova

We present a framework to understand GAN training as alternating density ratio estimation and approximate divergence minimization.

Density Ratio Estimation

Object Recognition from Short Videos for Robotic Perception

no code implementations 4 Sep 2015 Ivan Bogun, Anelia Angelova, Navdeep Jaitly

Videos, unlike still images, are temporally coherent, which makes the application of deep networks non-trivial.

Object Recognition

Efficient Object Detection and Segmentation for Fine-Grained Recognition

no code implementations CVPR 2013 Anelia Angelova, Shenghuo Zhu

The algorithm first detects low-level regions that could potentially belong to the object and then performs a full-object segmentation through propagation.

Object Detection Semantic Segmentation
