We compare our method, which we call PAAK, with prior approaches, including POSA, PROX ground truth, and a motion synthesis method, and highlight its benefits with a perceptual study.
We present 3MASSIV, a multilingual, multimodal, and multi-aspect, expertly annotated dataset of diverse short videos extracted from the short-video social media platform Moj.
To tackle the scarcity of publicly available datasets in the telemental health space, we release a new dataset, MEDICA, for mental health patient engagement detection.
Amodal recognition is the ability of a system to detect occluded objects.
A key aspect of driving a road vehicle is interacting with other road users, assessing their intentions, and making risk-aware tactical decisions.
We incorporate both of these trust metrics into an optimal cognitive reasoning scheme that decides when, and when not, to trust the given guidance.
Our task is to map gestures to novel emotion categories not encountered in training.
Additionally, we extract and compare affective cues corresponding to perceived emotion from the two modalities within a video to infer whether the input video is "real" or "fake".
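A minimal sketch of this cross-modal comparison idea follows, assuming two hypothetical modality-specific emotion recognizers that return probability distributions over the same perceived-emotion classes; the distance measure and threshold are illustrative, not the paper's exact formulation.

```python
# Minimal sketch of cross-modal affective-cue comparison (not the paper's exact method).
# The two distributions stand in for the outputs of hypothetical face- and
# speech-based emotion recognizers over the same set of perceived-emotion classes.
import numpy as np

def modality_disagreement(p_face: np.ndarray, p_speech: np.ndarray) -> float:
    """Total-variation distance between the two perceived-emotion distributions."""
    return 0.5 * np.abs(p_face - p_speech).sum()

def is_fake(p_face: np.ndarray, p_speech: np.ndarray, threshold: float = 0.4) -> bool:
    # A large mismatch between the affect conveyed by the face and by the voice
    # is treated as evidence that the video is manipulated.
    return modality_disagreement(p_face, p_speech) > threshold

# Example with made-up distributions over (happy, sad, angry, neutral):
p_face = np.array([0.70, 0.10, 0.05, 0.15])
p_speech = np.array([0.10, 0.55, 0.20, 0.15])
print(is_fake(p_face, p_speech))  # True: the modalities disagree strongly
```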
We report an AP of 65.83 across 4 categories on GroupWalk, which is also an improvement over prior methods.
Our approach predicts the perceived emotion of a pedestrian from their walking gait, which is then used for emotion-guided navigation that takes social and proxemic constraints into account.
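One simple way to turn such a prediction into a proxemic constraint is sketched below; the emotion-to-comfort-distance mapping and the cost function are placeholders for illustration, not the values or planner used in the paper.

```python
# Illustrative sketch: convert a predicted perceived emotion into a proxemic
# clearance penalty for the navigation planner. The comfort radii below are
# placeholders, not the paper's calibrated values.
import math

COMFORT_RADIUS_M = {"happy": 0.8, "neutral": 1.0, "sad": 1.2, "angry": 1.6}

def proxemic_cost(robot_xy, pedestrian_xy, emotion: str) -> float:
    """Penalty that grows as the robot intrudes on the pedestrian's comfort zone."""
    dist = math.dist(robot_xy, pedestrian_xy)
    radius = COMFORT_RADIUS_M.get(emotion, 1.0)
    return max(0.0, radius - dist) ** 2  # zero outside the comfort radius

print(proxemic_cost((0.0, 0.0), (0.0, 1.0), "angry"))  # penalized: inside 1.6 m
```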
We present a data-driven deep neural algorithm for detecting deceptive walking behavior using nonverbal cues like gaits and gestures.
In practice, our approach reduces the average prediction error by more than 54% over prior algorithms and achieves a weighted average accuracy of 91.2% for behavior prediction.
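For reference, a short sketch of one common reading of "weighted average accuracy" is given below, with per-class accuracy weighted by class frequency; the paper may define the weighting differently, and the numbers are purely illustrative.

```python
# Sketch of one common reading of "weighted average accuracy": per-class
# accuracy weighted by how often each class occurs in the test data.
# The class names and counts below are illustrative only.
def weighted_average_accuracy(per_class_acc: dict, class_counts: dict) -> float:
    total = sum(class_counts.values())
    return sum(per_class_acc[c] * class_counts[c] for c in per_class_acc) / total

acc = {"aggressive": 0.95, "conservative": 0.90, "neutral": 0.88}
counts = {"aggressive": 120, "conservative": 300, "neutral": 180}
print(round(weighted_average_accuracy(acc, counts), 3))  # 0.904
```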
For the annotated data, we also train a classifier to map the latent embeddings to emotion labels.
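A minimal sketch of such a classifier is shown below, using scikit-learn logistic regression over placeholder embeddings; the actual classifier and embedding dimensions used in the paper may differ.

```python
# Minimal sketch: fit a simple classifier from latent embeddings to emotion
# labels. Random data stands in for the learned embeddings; the paper's
# classifier may be a different model entirely.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 32))   # placeholder latent vectors
labels = rng.integers(0, 4, size=200)     # 4 emotion classes (placeholder)

clf = LogisticRegression(max_iter=1000).fit(embeddings, labels)
print(clf.predict(embeddings[:5]))        # map embeddings -> emotion labels
```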
Our approach combines cues from multiple co-occurring modalities (such as face, text, and speech) and is also more robust than other methods to sensor noise in any of the individual modalities.
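As an illustration of how down-weighting a noisy modality can provide that robustness, here is a confidence-weighted late-fusion sketch; it is one simple fusion strategy under assumed confidences, not the paper's actual fusion mechanism.

```python
# Illustrative confidence-weighted late fusion over face, text, and speech cues.
# Down-weighting a noisy modality is one simple way to stay robust to sensor
# noise; the paper's actual fusion mechanism may differ.
import numpy as np

def fuse(predictions: dict, confidences: dict) -> np.ndarray:
    """predictions: modality -> probability vector; confidences: modality -> scalar weight."""
    total = sum(confidences.values())
    fused = sum(confidences[m] * predictions[m] for m in predictions) / total
    return fused / fused.sum()

preds = {
    "face":   np.array([0.6, 0.3, 0.1]),
    "text":   np.array([0.5, 0.4, 0.1]),
    "speech": np.array([0.2, 0.3, 0.5]),  # noisy sensor
}
conf = {"face": 0.9, "text": 0.8, "speech": 0.2}  # low confidence in noisy speech
print(fuse(preds, conf))
```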
We use hundreds of annotated real-world gait videos and augment them with thousands of annotated synthetic gaits generated using a novel generative network called STEP-Gen, built on an ST-GCN-based Conditional Variational Autoencoder (CVAE).
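The following compact PyTorch sketch shows the conditional-VAE idea of generating gaits conditioned on an emotion label; the plain linear layers, dimensions, and sampling code are placeholders for illustration and do not reproduce STEP-Gen's actual ST-GCN architecture.

```python
# Compact sketch of a conditional VAE over flattened gait sequences,
# conditioned on an emotion label. STEP-Gen uses ST-GCN layers; the plain
# linear layers and dimensions below are illustrative placeholders.
import torch
import torch.nn as nn

class GaitCVAE(nn.Module):
    def __init__(self, gait_dim=16 * 25 * 3, n_emotions=4, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(gait_dim + n_emotions, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + n_emotions, 256), nn.ReLU(), nn.Linear(256, gait_dim)
        )

    def forward(self, gait, emotion_onehot):
        h = self.encoder(torch.cat([gait, emotion_onehot], dim=-1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        recon = self.decoder(torch.cat([z, emotion_onehot], dim=-1))
        return recon, mu, logvar

# Sampling a synthetic gait for a given emotion label:
model = GaitCVAE()
z = torch.randn(1, 32)
emotion = torch.nn.functional.one_hot(torch.tensor([2]), num_classes=4).float()
synthetic_gait = model.decoder(torch.cat([z, emotion], dim=-1))
print(synthetic_gait.shape)  # (1, 1200): flattened pose sequence
```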
RobustTP is an approach that first computes trajectories using a combination of a non-linear motion model and a deep learning-based instance segmentation algorithm.
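A toy sketch of that trajectory-building step is shown below: per-agent instance masks from the segmentation network yield centroids, which a motion model then extends. The constant-velocity extrapolation here is a placeholder for the non-linear motion model RobustTP actually uses.

```python
# Sketch: centroids of per-agent instance masks become trajectory points,
# which a motion model then extrapolates. The constant-velocity step is a
# stand-in for RobustTP's non-linear motion model.
import numpy as np

def mask_centroid(mask: np.ndarray) -> np.ndarray:
    ys, xs = np.nonzero(mask)
    return np.array([xs.mean(), ys.mean()])

def extend_trajectory(trajectory: list, mask: np.ndarray) -> np.ndarray:
    trajectory.append(mask_centroid(mask))          # new observation from the mask
    if len(trajectory) >= 2:
        velocity = trajectory[-1] - trajectory[-2]  # crude motion-model step
        return trajectory[-1] + velocity            # extrapolate one frame ahead
    return trajectory[-1]

mask = np.zeros((10, 10), dtype=bool)
mask[4:6, 4:6] = True
print(extend_trajectory([np.array([3.0, 3.0])], mask))
```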
We also investigate user perception in an AR setting and observe that an FVA yields a statistically significant improvement in perceived friendliness and social presence compared to an agent without friendliness modeling.
We present a real-time tracking algorithm, RoadTrack, to track heterogeneous road-agents in dense traffic videos.
We also present the EWalk (Emotion Walk) dataset, which consists of videos of walking individuals with gaits and labeled emotions.
We evaluate the performance of our prediction algorithm, TraPHic, on the standard datasets and also introduce a new dense, heterogeneous traffic dataset corresponding to urban Asian videos and agent trajectories.
We present a Pedestrian Dominance Model (PDM) to identify the dominance characteristics of pedestrians for robot navigation.
We also present a novel interactive multi-agent simulation algorithm to model entitative groups and conduct a VR user study to validate the socio-emotional predictive power of our algorithm.
Our planning system combines a POMDP algorithm with the pedestrian motion model and runs in near real time.
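A toy sketch of the core loop follows: a belief over each pedestrian's intention is updated with how well the motion model explains the observed step, and the action with the lowest expected cost under that belief is chosen. The intentions, likelihoods, and costs below are illustrative placeholders, not the paper's POMDP model.

```python
# Toy sketch of the planner's core idea: maintain a belief over a pedestrian's
# intention, update it with a motion-model likelihood, then pick the action
# with the lowest expected cost. All numbers are illustrative placeholders.
def update_belief(belief: dict, likelihood: dict) -> dict:
    posterior = {i: belief[i] * likelihood[i] for i in belief}
    norm = sum(posterior.values())
    return {i: p / norm for i, p in posterior.items()}

def best_action(belief: dict, cost: dict) -> str:
    # cost[action][intention] = expected cost of `action` if the pedestrian's
    # true intention is `intention`
    return min(cost, key=lambda a: sum(belief[i] * cost[a][i] for i in belief))

belief = {"crossing": 0.5, "waiting": 0.5}
belief = update_belief(belief, {"crossing": 0.8, "waiting": 0.3})  # motion-model fit
cost = {"brake": {"crossing": 1.0, "waiting": 3.0},
        "go":    {"crossing": 10.0, "waiting": 0.5}}
print(best_action(belief, cost))  # "brake" once "crossing" becomes likely
```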
We present a novel approach to automatically identify driver behaviors from vehicle trajectories and use them for safe navigation of autonomous vehicles.
We present a new method for training pedestrian detectors on an unannotated set of images.
We present a novel procedural framework to generate an arbitrary number of labeled crowd videos (LCrowdV).
We automatically compute the optimal parameters for each of these different models based on prior tracked data and use the best model as motion prior for our particle-filter based tracking algorithm.
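A sketch of the model-selection idea is given below: each candidate motion model is scored by its one-step prediction error on the already-tracked positions, and the best one propagates the particles. The two toy models stand in for the library of motion models in the paper, and the particle-filter step shown is only the prediction stage.

```python
# Sketch of choosing a motion prior from prior tracked data: score each
# candidate motion model by one-step prediction error on the observed track,
# then use the best one to propagate particles (prediction step only).
import numpy as np

def constant_velocity(track):   # predict the next point from the last step
    return track[-1] + (track[-1] - track[-2])

def constant_position(track):   # predict that the agent stays put
    return track[-1]

def pick_motion_model(track, models):
    def error(model):
        preds = [model(track[:t]) for t in range(2, len(track))]
        return np.mean([np.linalg.norm(p - track[t])
                        for t, p in zip(range(2, len(track)), preds)])
    return min(models, key=error)

track = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.2], [3.1, 0.3]])
best = pick_motion_model(track, [constant_velocity, constant_position])
particles = track[-1] + np.random.normal(scale=0.05, size=(100, 2))
particles = np.array([best(np.vstack([track[:-1], p])) for p in particles])
print(best.__name__, particles.shape)  # constant_velocity (100, 2)
```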