Geometric data acquired from real-world scenes, e.g., 2D depth images, 3D point clouds, and 4D dynamic point clouds, have found a wide range of applications, including immersive telepresence, autonomous driving, and surveillance.
To impute the missing values, state-of-the-art methods are built on Recurrent Neural Networks (RNNs), which process each time stamp sequentially and thus cannot directly model relationships between distant time stamps.
Understanding human activity based on sensor information is required in many applications and has been an active research area.
To jointly capture self-attention across multiple dimensions, including time, location, and the sensor measurements, while maintaining low computational complexity, we propose a novel approach called Cross-Dimensional Self-Attention (CDSA) that processes each dimension sequentially, yet in an order-independent manner.
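As a rough illustration of dimension-wise self-attention, the sketch below applies standard scaled dot-product self-attention along each axis of a (time, location, measurement) tensor in turn. This is a minimal sketch, not the authors' CDSA formulation: the function name, the identity query/key/value projections, and the toy tensor sizes are all assumptions introduced for illustration.

```python
import numpy as np

def self_attention(x, axis):
    """Scaled dot-product self-attention along one axis of a 3-D tensor.

    The chosen axis is moved to the front, the remaining axes are
    flattened into a feature dimension, attention is computed over the
    chosen axis, and the original shape is restored.
    """
    x = np.moveaxis(x, axis, 0)                   # (n, ...) with n = size of chosen axis
    n = x.shape[0]
    flat = x.reshape(n, -1)                       # (n, d)
    d = flat.shape[1]
    scores = flat @ flat.T / np.sqrt(d)           # (n, n) pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # row-wise softmax
    out = (weights @ flat).reshape(x.shape)
    return np.moveaxis(out, 0, axis)

# Toy multivariate time series: 5 time stamps, 3 locations, 4 measurements.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 3, 4))

# Apply attention over each dimension sequentially; the tensor shape is
# preserved after every pass.
for axis in range(3):
    x = self_attention(x, axis)
print(x.shape)  # (5, 3, 4)
```

Each per-axis pass attends only over that axis, so the cost grows with the sum, not the product, of the dimension sizes, which is the complexity benefit the abstract alludes to.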
We aim to tackle a novel task in action detection: Online Detection of Action Start (ODAS) in untrimmed, streaming videos.
We use a general feature-extraction operator to represent application-dependent features and propose a general reconstruction error to evaluate the quality of resampling.
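To make the idea of evaluating resampling quality concrete, the sketch below scores a resampled point cloud by the mean squared distance from each original point to its nearest retained sample. This is a simple nearest-neighbor stand-in, not the paper's feature-extraction operator or its exact reconstruction error; the function name and the toy data are assumptions.

```python
import numpy as np

def resampling_error(points, samples):
    """Mean squared distance from each original point to its nearest
    retained sample: a simple proxy for reconstruction error.
    points: (N, 3) original cloud; samples: (M, 3) resampled subset.
    """
    # Pairwise squared distances between originals and samples, (N, M).
    d2 = ((points[:, None, :] - samples[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).mean()

rng = np.random.default_rng(1)
cloud = rng.standard_normal((200, 3))

# Compare two resampling strategies on the same cloud.
uniform = cloud[rng.choice(200, size=50, replace=False)]  # uniform random subset
first50 = cloud[:50]                                      # arbitrary fixed subset
print(resampling_error(cloud, uniform), resampling_error(cloud, first50))
```

A lower score means the retained samples cover the original geometry better; swapping in an application-dependent feature operator before the distance computation would recover the spirit of the general formulation described above.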
In 3D image/video acquisition, different views are often captured with varying noise levels.