In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters.
#2 best model for Atari Games on Atari 2600 Pong
The proposed model then warps the input frames, depth maps, and contextual features based on the optical flow and local interpolation kernels for synthesizing the output frame.
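The warping step can be illustrated with a minimal sketch. It assumes a single-channel frame stored as nested lists and a dense flow field, and it uses fixed bilinear interpolation in place of the learned local interpolation kernels the excerpt mentions. The function name `warp_frame` and the flow convention are hypothetical and not taken from the paper.

```python
import math


def warp_frame(frame, flow):
    """Backward-warp a grayscale frame with a per-pixel flow field.

    Assumed convention (not from the paper): flow[y][x] = (dx, dy), and the
    output pixel at (x, y) is sampled from the input at (x + dx, y + dy).
    Bilinear interpolation stands in for learned interpolation kernels.
    """
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Clamp the sampling position to the image bounds.
            sx = min(max(x + dx, 0.0), w - 1.0)
            sy = min(max(y + dy, 0.0), h - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            ax, ay = sx - x0, sy - y0
            # Blend the four neighbouring pixels.
            top = (1 - ax) * frame[y0][x0] + ax * frame[y0][x1]
            bottom = (1 - ax) * frame[y1][x0] + ax * frame[y1][x1]
            out[y][x] = (1 - ay) * top + ay * bottom
    return out
```

With a zero flow field the frame is returned unchanged. A constant flow of `(1, 0)` shifts content one pixel to the left and repeats the right border column.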
Deep residual nets are the foundation of our submissions to the ILSVRC & COCO 2015 competitions, where we also won 1st place on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
#8 best model for Semantic Segmentation on Cityscapes val
Plugged into the FCOS object detector, the SAG-Mask branch predicts a segmentation mask on each box with the spatial attention map that helps to focus on informative pixels and suppress noise.
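One generic way to build a spatial attention map is sketched below. The excerpt does not say how the SAG-Mask branch computes its map, so this is an assumption rather than the authors' method. It pools across channels at each pixel, squashes the result with a sigmoid, and rescales the features. The scalar weights `w_avg`, `w_max`, and `bias` are hypothetical stand-ins for a learned convolution.

```python
import math


def spatial_attention(features, w_avg=1.0, w_max=1.0, bias=0.0):
    """Apply a simple spatial attention map to a C x H x W feature tensor.

    features is a list of C channels, each an H x W nested list.
    Returns (attended_features, attention_map). The scalar weights are
    illustrative placeholders for learned parameters.
    """
    c = len(features)
    h, w = len(features[0]), len(features[0][0])
    attn = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [features[k][y][x] for k in range(c)]
            # Pool across channels, then map the score into (0, 1).
            score = w_avg * (sum(vals) / c) + w_max * max(vals) + bias
            attn[y][x] = 1.0 / (1.0 + math.exp(-score))
    # Scale every channel by the shared per-pixel attention weight.
    out = [[[features[k][y][x] * attn[y][x] for x in range(w)]
            for y in range(h)] for k in range(c)]
    return out, attn
```

Pixels with strong activations receive weights near 1 and are kept almost unchanged, while pixels with strongly negative pooled scores are pushed toward 0.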
Second, frame-based models perform quite well on action recognition; is pre-training for good image features sufficient or is pre-training for spatio-temporal features valuable for optimal transfer learning?
In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition.
#3 best model for Action Recognition In Videos on Sports-1M
In addition, we introduce a graph convolutional model that consistently matches or outperforms models using fixed molecular descriptors as well as previous graph neural architectures on both public and proprietary datasets.