Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

The other contribution is our study of a series of good practices for learning ConvNets on video data with the help of the temporal segment network.

Moments in Time Dataset: one million videos for event understanding

We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds.

Interpretable 3D Human Action Analysis with Temporal Convolutional Networks

In this work, we propose to use a new class of models known as Temporal Convolutional Neural Networks (TCN) for 3D human action recognition.

Distilling Audio-Visual Knowledge by Compositional Contrastive Learning

Having access to multi-modal cues (e.g., vision and audio) allows some cognitive tasks to be performed faster than when learning from a single modality.

Cross-modal Learning by Hallucinating Missing Modalities in RGB-D Vision

We report state-of-the-art or comparable results for video action recognition on NTU RGB+D, the largest multimodal dataset available for this task, as well as on UWA3DII and Northwestern-UCLA.

EV-Action: Electromyography-Vision Multi-Modal Action Dataset

To make up for this, in this work we introduce EV-Action, a new large-scale dataset that consists of RGB, depth, electromyography (EMG), and two skeleton modalities.

Bayesian Hierarchical Dynamic Model for Human Action Recognition

Human action recognition remains a challenging task, partly due to large variations in how actions are executed.

