Crowd-Robot Interaction: Crowd-aware Robot Navigation with Attention-based Deep Reinforcement Learning

vita-epfl/DyNav 24 Sep 2018

We propose to (i) rethink pairwise interactions with a self-attention mechanism, and (ii) jointly model Human-Robot as well as Human-Human interactions in the deep reinforcement learning framework.

MT-VAE: Learning Motion Transformations to Generate Multimodal Human Dynamics

xcyan/eccv18_mtvae ECCV 2018

Our model jointly learns a feature embedding for motion modes (from which the motion sequence can be reconstructed) and a feature transformation that represents the transition from one motion mode to the next.

Action-Agnostic Human Pose Forecasting

eddyhkchiu/pose_forecast_wacv 23 Oct 2018

In this paper, we propose a new action-agnostic method for short- and long-term human pose forecasting.

Learning 3D Human Dynamics from Video

akanazawa/human_dynamics CVPR 2019

We present a framework that can learn a representation of 3D dynamics of humans from video via a simple but effective temporal encoding of image features.

Predicting 3D Human Dynamics from Video

jasonyzhang/phd ICCV 2019

In this work, we present perhaps the first approach for predicting a future 3D mesh model sequence of a person from past video input.

Contact and Human Dynamics from Monocular Video

davrempe/contact-human-dynamics ECCV 2020

Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors that violate physical constraints, such as feet penetrating the ground and bodies leaning at extreme angles.

Behavior-Driven Synthesis of Human Dynamics

CompVis/behavior-driven-video-synthesis CVPR 2021

Using this representation, we are able to change the behavior of a person depicted in an arbitrary posture, or to even directly transfer behavior observed in a given video sequence.

Towards Tokenized Human Dynamics Representation

likenneth/acton 22 Nov 2021

For human action understanding, a popular research direction is to analyze short video clips with unambiguous semantic content, such as jumping and drinking.