We show that the convolution-free VATT outperforms state-of-the-art ConvNet-based architectures on downstream tasks.
Ranked #1 on Action Classification on Moments in Time (using extra training data)
In particular, we explore how best to combine the modalities, such that fine-grained representations of the visual and audio modalities can be maintained, whilst also integrating text into a common embedding.
Ranked #1 on Self-Supervised Action Recognition on HMDB51
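The excerpt above describes projecting video, audio, and text into a common embedding space while keeping modality-specific representations. A minimal numpy sketch of that idea, assuming hypothetical per-modality feature sizes and a simple linear projection (the paper's actual projection heads and training objective are more involved):

```python
import numpy as np

rng = np.random.default_rng(0)

def project(features, weights):
    """Linearly project modality features into the shared space, then L2-normalize."""
    z = features @ weights
    return z / np.linalg.norm(z)

# Hypothetical dimensions: each modality keeps its own feature size,
# all mapped into a 64-d common embedding space.
d_common = 64
w_video = rng.normal(size=(512, d_common))
w_audio = rng.normal(size=(128, d_common))
w_text = rng.normal(size=(256, d_common))

video_emb = project(rng.normal(size=512), w_video)
audio_emb = project(rng.normal(size=128), w_audio)
text_emb = project(rng.normal(size=256), w_text)

# In the common space, cross-modal similarity reduces to a dot product
# of unit-norm vectors (cosine similarity).
sim_video_audio = float(video_emb @ audio_emb)
```

Because every embedding is unit-normalized, similarities between any pair of modalities are directly comparable in the shared space.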
We present SlowFast networks for video recognition.
Ranked #1 on Action Recognition In Videos on AVA v2.1
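SlowFast's core idea is two pathways operating at different frame rates: a Slow pathway over sparsely sampled frames and a Fast pathway over the full clip. A minimal sketch of that input sampling, assuming a clip stored as a frame-first array (the real model additionally varies channel capacity between pathways and fuses them laterally):

```python
import numpy as np

def slowfast_sample(frames, alpha=4):
    """Split one clip into the two pathway inputs.

    Fast pathway: all T frames (high temporal resolution).
    Slow pathway: every alpha-th frame (low frame rate).
    alpha is the speed ratio between the two pathways.
    """
    slow = frames[::alpha]
    fast = frames
    return slow, fast

# Example: a 32-frame clip of 8x8 RGB frames.
clip = np.zeros((32, 8, 8, 3))
slow, fast = slowfast_sample(clip, alpha=4)
```

With alpha=4, the Slow pathway sees 8 frames while the Fast pathway sees all 32.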
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories.
Ranked #1 on Zero-Shot Transfer Image Classification on aYahoo
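The contrast drawn above is with zero-shot transfer: instead of a fixed classifier head, classes are described by text, and the prediction is the class whose text embedding is closest to the image embedding. A minimal sketch of that scoring step, assuming the image and text embeddings are already computed (the encoders themselves are omitted):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs):
    """Return the index of the class prompt most similar to the image.

    image_emb: (d,) image embedding
    text_embs: (num_classes, d) one text embedding per class prompt
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img  # cosine similarity per class
    return int(np.argmax(sims))
```

Adding a new category requires only a new text prompt, not retraining a fixed output layer.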
Furthermore, building on temporal segment networks, we won the video classification track of the ActivityNet challenge 2016 among 24 teams, demonstrating the effectiveness of TSN and the proposed good practices.
Ranked #12 on Action Classification on Moments in Time (Top 5 Accuracy metric)
Our other contribution is a study of a series of good practices for learning ConvNets on video data within the temporal segment network framework.
Ranked #3 on Multimodal Activity Recognition on EV-Action
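TSN's central mechanism is sparse temporal sampling: the video is divided into segments, one snippet is scored per segment, and a segmental consensus (e.g. averaging) combines them. A minimal sketch of that consensus over per-frame class scores, assuming midpoint snippet selection for determinism (TSN samples snippets randomly during training):

```python
import numpy as np

def tsn_consensus(frame_scores, num_segments=3):
    """Segmental consensus over sparsely sampled snippets.

    frame_scores: (T, C) per-frame class scores for a video of T frames.
    Divides the video into num_segments equal segments, takes the
    midpoint frame of each as its snippet, and averages their scores.
    """
    num_frames = frame_scores.shape[0]
    bounds = np.linspace(0, num_frames, num_segments + 1).astype(int)
    snippets = [frame_scores[(lo + hi) // 2] for lo, hi in zip(bounds[:-1], bounds[1:])]
    return np.mean(snippets, axis=0)
```

Averaging across segments lets the prediction reflect the whole video while only a few frames are ever processed.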
We propose a simple, yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised video dataset.
Ranked #1 on Action Recognition In Videos on UCF101
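A 3D ConvNet extends 2D convolution with a temporal dimension, so each filter slides over time as well as space. A naive single-channel numpy sketch of that operation, for illustration only (real implementations are batched, multi-channel, and heavily optimized):

```python
import numpy as np

def conv3d(video, kernel, stride=1):
    """Valid 3D convolution of a (T, H, W) clip with a (kt, kh, kw) kernel."""
    T, H, W = video.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((
        (T - kt) // stride + 1,
        (H - kh) // stride + 1,
        (W - kw) // stride + 1,
    ))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                patch = video[
                    t * stride : t * stride + kt,
                    y * stride : y * stride + kh,
                    x * stride : x * stride + kw,
                ]
                out[t, y, x] = np.sum(patch * kernel)
    return out
```

Because the kernel spans several frames, each output value responds to motion patterns, not just per-frame appearance.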
Recent applications of Convolutional Neural Networks (ConvNets) for human action recognition in videos have proposed different solutions for incorporating the appearance and motion information.
Ranked #49 on Action Recognition on HMDB-51 (using extra training data)
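A common way to incorporate appearance and motion, as in two-stream architectures, is to run separate networks on RGB frames and optical flow and fuse their class scores late. A minimal sketch of that fusion step, assuming the per-stream scores are already computed:

```python
import numpy as np

def two_stream_fusion(spatial_scores, temporal_scores, w=0.5):
    """Late fusion of the appearance (RGB) and motion (optical-flow) streams.

    spatial_scores, temporal_scores: (C,) class scores from each stream.
    w weights the spatial stream; 1 - w weights the temporal stream.
    """
    return w * spatial_scores + (1 - w) * temporal_scores
```

The fusion weight is a design choice; some two-stream variants weight the motion stream more heavily because flow is often the stronger cue for actions.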
However, for action recognition in videos, the gains from deep convolutional networks are less evident.
Ranked #58 on Action Recognition on UCF101