We present MorphNet, an approach to automate the design of neural network structures.
The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels, with multiple labels per person occurring frequently.
In particular, annotation errors, dataset size, and level of challenge are addressed: new annotations for both datasets are created with particular attention to the reliability of the ground truth.
To address this limitation, we propose StarGAN, a novel and scalable approach that can perform image-to-image translations for multiple domains using only a single model.
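The single-model, multi-domain design rests on conditioning one generator on a target domain label, typically by broadcasting a one-hot label across the spatial dimensions and concatenating it with the image channels. A minimal NumPy sketch of that conditioning step (function and variable names here are illustrative, not taken from the StarGAN code):

```python
import numpy as np

def with_domain_label(image, domain_idx, num_domains):
    """Concatenate a spatially broadcast one-hot domain label to an image.

    image: float array of shape (C, H, W). domain_idx selects the target
    domain. Names are illustrative, not from the StarGAN implementation.
    """
    c, h, w = image.shape
    onehot = np.zeros((num_domains, h, w), dtype=image.dtype)
    onehot[domain_idx] = 1.0  # the label is repeated at every pixel
    return np.concatenate([image, onehot], axis=0)

# One generator input can now encode any of the 5 target domains.
x = np.random.rand(3, 8, 8).astype(np.float32)
g_input = with_domain_label(x, domain_idx=2, num_domains=5)
print(g_input.shape)  # (8, 8, 8): 3 image channels + 5 label channels
```

Because the domain is an input rather than baked into the weights, translating to a different domain only changes `domain_idx`, not the model.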
SOTA for Image-to-Image Translation on RaFD
We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs).
The improvements in recent CNN-based object detection works, from R-CNN and Fast/Faster R-CNN [10, 31] to the recent Mask R-CNN and RetinaNet, mainly come from novel network, framework, or loss designs.
In this paper, we explore the impact of global contextual information in semantic segmentation by introducing the Context Encoding Module, which captures the semantic context of scenes and selectively highlights class-dependent feature maps.
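The selective highlighting of class-dependent feature maps can be read as channel-wise gating driven by a global context descriptor. A simplified NumPy sketch under that reading (the paper's module uses a learned encoding layer with codewords; the global pooling and single fully connected layer below are stand-in assumptions):

```python
import numpy as np

def context_channel_scaling(features, w, b):
    """Re-weight feature-map channels using a global context descriptor.

    features: (C, H, W) feature maps; w: (C, C) weights and b: (C,) bias of
    a hypothetical fully connected layer. Simplified stand-in for a Context
    Encoding Module: pool a global descriptor, predict one scale per
    channel, and emphasize the channels relevant to the scene's classes.
    """
    context = features.mean(axis=(1, 2))               # global descriptor, (C,)
    scales = 1.0 / (1.0 + np.exp(-(w @ context + b)))  # sigmoid gates in (0, 1)
    return features * scales[:, None, None]            # channel-wise scaling

c, h, w_dim = 4, 6, 6
feats = np.random.rand(c, h, w_dim).astype(np.float32)
out = context_channel_scaling(feats,
                              np.eye(c, dtype=np.float32),
                              np.zeros(c, dtype=np.float32))
print(out.shape)  # (4, 6, 6): same spatial layout, rescaled channels
```

The gating leaves spatial structure untouched; only the relative strength of each channel changes, which is how scene-level context can suppress unlikely classes.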
SOTA for Semantic Segmentation on ADE20K
Although it has long been believed that modeling relations between objects would help object recognition, there has been little evidence that the idea works in the deep learning era.