Finally, we illustrate how, by using SVG, one can benefit from datasets and advancements in other research fronts that also utilize the same input format.
We present a fast bottom-up method that jointly detects over 100 keypoints on humans or objects, also referred to as human/object pose estimation.
Ranked #1 on Car Pose Estimation on ApolloCar3D
The ability to predict unseen vehicles is critical for safety in autonomous driving.
An attack is a small yet carefully crafted perturbation designed to make predictors fail.
Human trajectory forecasting in crowds, at its core, is a sequence prediction problem with specific challenges of capturing inter-sequence dependencies (social interactions) and consequently predicting socially-compliant multimodal distributions.
On the other hand, recent works use data-driven approaches that can learn complex interactions from the data, leading to superior performance.
We present a generic neural network architecture that uses Composite Fields to detect and construct a spatio-temporal pose, which is a single, connected graph whose nodes are the semantic keypoints (e.g., a person's body joints) in multiple frames.
Ranked #4 on Multi-Person Pose Estimation on COCO
Learning socially-aware motion representations is at the core of recent advances in multi-agent problems, such as human motion forecasting and robot navigation in crowds.
Ranked #1 on Trajectory Forecasting on TrajNet++
By increasing the number of attributes jointly learned, we highlight an issue related to the scales of gradients, which arises in MTL with numerous tasks.
This work tries to solve this problem by jointly predicting the intention and visual states of pedestrians.
Monocular and stereo vision are cost-effective solutions for 3D human localization in the context of self-driving cars or social robots.
Scalable Vector Graphics (SVG) are ubiquitous in modern 2D interfaces due to their ability to scale to different resolutions.
Ranked #1 on Vector Graphics Animation on SVG-Icons8
In this work, we present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
Ranked #3 on Trajectory Prediction on TrajNet++
We propose a simple yet effective method for leveraging these image priors to improve semantic segmentation of images from sequential driving datasets.
We tackle the fundamentally ill-posed problem of 3D human localization from monocular RGB images.
We argue that such a loss function is not suited to the visual re-identification task and hence propose to model confidence in the representation learning framework.
We present an end-to-end deep Convolutional Neural Network, called Convolutional Relational Machine (CRM), that recognizes group activities by utilizing the information in spatial relations between individual persons in an image or video.
We propose a new bottom-up method for multi-person 2D human pose estimation that is particularly well suited for urban mobility such as self-driving cars and delivery robots.
Ranked #9 on Keypoint Detection on COCO test-dev
In discrete choice modeling (DCM), model misspecifications may lead to limited predictability and biased parameter estimates.
We propose to (i) rethink pairwise interactions with a self-attention mechanism, and (ii) jointly model Human-Robot as well as Human-Human interactions in the deep reinforcement learning framework.
Understanding human motion behavior is critical for autonomous moving platforms (like self-driving cars and social robots) if they are to navigate human-centric environments.
Ranked #12 on Trajectory Prediction on Stanford Drone
We exploit two sources of information: the past motion trajectory of the agent of interest and a wide top-view image of the navigation scene.
no code implementations • 1 Aug 2017 • Albert Haque, Michelle Guo, Alexandre Alahi, Serena Yeung, Zelun Luo, Alisha Rege, Jeffrey Jopling, Lance Downing, William Beninati, Amit Singh, Terry Platchek, Arnold Milstein, Li Fei-Fei
One in twenty-five patients admitted to a hospital will suffer from a hospital-acquired infection.
Physiological signals such as heart rate can provide valuable information about an individual's state and activity.
Recent progress in style transfer on images has focused on improving the quality of stylized images and speed of methods.
To address this challenge, we present a structure of Recurrent Neural Networks (RNN) that jointly reasons on multiple cues over a temporal window.
We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos.
We present a unified framework for understanding human social behaviors in raw image sequences.
Ranked #2 on Action Recognition on Volleyball
We present an attention-based model that reasons on human body shape and motion dynamics to identify individuals in the absence of RGB information, hence in the dark.
Different from the conventional LSTM, we share the information between multiple LSTMs through a new pooling layer.
Ranked #1 on Trajectory Prediction on Stanford Drone (ADE (8/12) @K=5 metric)
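The idea in the entry above, sharing information between several per-person LSTMs through a pooling layer, can be illustrated with a rough sketch. This is not the authors' implementation: the grid-based neighbourhood, the function name `social_pool`, and the parameters `cell_size` and `grid` are assumptions chosen for illustration. Each person's pooled vector sums the hidden states of neighbours that fall into cells of a small grid centred on that person.

```python
import numpy as np

def social_pool(positions, hidden, cell_size=1.0, grid=2):
    """Illustrative pooling of neighbours' hidden states (hypothetical helper).

    positions: (N, 2) array of x, y coordinates, one row per person.
    hidden:    (N, D) array of per-person LSTM hidden states.
    Returns an (N, grid * grid * D) array; row i holds, for every grid cell
    around person i, the sum of the hidden states of neighbours in that cell.
    """
    n, d = hidden.shape
    pooled = np.zeros((n, grid, grid, d))
    half = grid * cell_size / 2.0  # half-width of the square neighbourhood
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a person does not pool their own state
            dx, dy = positions[j] - positions[i]
            # Skip neighbours outside the square window around person i.
            if not (-half <= dx < half and -half <= dy < half):
                continue
            cx = int((dx + half) // cell_size)
            cy = int((dy + half) // cell_size)
            pooled[i, cx, cy] += hidden[j]
    return pooled.reshape(n, -1)
```

In a full model, the pooled vector would typically be fed, together with the person's own input, into that person's LSTM at the next time step; that wiring is omitted here.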
We consider image transformation problems, where an input image is transformed into an output image.
Ranked #4 on Nuclear Segmentation on Cell17
We propose a viewpoint invariant model for 3D human pose estimation from a single depth image.
Ranked #4 on Pose Estimation on ITOP top-view
Given a single frame of a video, humans can not only interpret the content of the scene but also forecast the near future.
We present an extensive evaluation where different methods for trajectory forecasting are evaluated and compared.
Online Multi-Object Tracking (MOT) has wide applications in time-critical video analysis scenarios, such as robot navigation and autonomous driving.
Ranked #17 on Multiple Object Tracking on KITTI Tracking test
Inspired by the recent success of RGB-D cameras, we propose the enrichment of RGB data with an additional "quasi-free" modality, namely, the wireless signal (e.g., Wi-Fi or Bluetooth) emitted by individuals' cell phones, referred to as RGB-W.