Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems.
Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.
#2 best model for Multi-Human Parsing on MHP v1.0
Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
#13 best model for Object Detection on COCO
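To make the Focal Loss abstract above concrete, here is a minimal sketch of the binary focal loss as it is usually written, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), for a single prediction. The function names and the values α = 0.25, γ = 2 are illustrative assumptions, not taken from the text above; the point is only to show how the modulating factor (1 - p_t)^γ suppresses easy negatives relative to plain cross-entropy.

```python
import math


def cross_entropy(p: float, y: int) -> float:
    """Standard binary cross-entropy for a single prediction."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted probability of the positive (object) class, in (0, 1).
    y: ground-truth label, 1 for an object, 0 for background.
    alpha, gamma: illustrative defaults (assumed, tune per task).
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t re-weights positives versus negatives.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The modulating factor (1 - p_t)**gamma goes to 0 as p_t -> 1,
    # so confidently classified (easy) examples contribute almost nothing.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # An easy background anchor (p = 0.01) versus a hard one (p = 0.9).
    for p in (0.01, 0.9):
        ce, fl = cross_entropy(p, 0), focal_loss(p, 0)
        print(f"p={p}: CE={ce:.4f}  FL={fl:.6f}  FL/CE={fl / ce:.6f}")
```

Because FL/CE equals α_t (1 - p_t)^γ, the easy negative keeps only 0.75 × 0.01² = 7.5e-5 of its cross-entropy loss, while the hard negative keeps 0.75 × 0.9² = 0.6075 of it. Summed over many anchors, this is what stops the easy background examples from dominating the gradient.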
In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks. Given a trained CNN model, we propose an iterative two-step algorithm that prunes each layer effectively through LASSO-regression-based channel selection followed by least-squares reconstruction.
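The channel-selection step can be sketched as a LASSO problem over per-channel scaling coefficients β: minimize ½‖y − Σ_i β_i z_i‖² + λ‖β‖₁, where z_i is channel i's flattened contribution to the layer output and y is the original output. This is a simplified, hypothetical illustration: the inputs `Z` and `y` are assumed to be precomputed, the objective matches the LASSO form only up to a rescaling of λ, growing λ geometrically until at most `keep` channels survive is an assumed schedule, and the second step (least-squares reconstruction of the remaining weights) is not shown.

```python
from typing import List, Sequence, Tuple


def soft_threshold(x: float, lam: float) -> float:
    """Proximal operator of lam * |b|: shrinks x toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_coefficients(Z: Sequence[Sequence[float]], y: Sequence[float],
                       lam: float, iters: int = 100) -> List[float]:
    """Coordinate descent for 0.5 * ||y - sum_i beta_i * Z[i]||^2 + lam * ||beta||_1.

    Z[i] is the flattened contribution of input channel i to the layer output;
    y is the flattened original output the pruned layer should reproduce.
    """
    n, m = len(Z), len(y)
    beta = [0.0] * n
    residual = list(y)  # y - sum_i beta_i * Z[i], with beta = 0 initially
    for _ in range(iters):
        for j in range(n):
            zj = Z[j]
            norm = sum(v * v for v in zj)
            if norm == 0.0:
                continue  # a channel that contributes nothing stays at 0
            # Correlation of channel j with the residual that excludes channel j.
            rho = sum(zj[k] * (residual[k] + zj[k] * beta[j]) for k in range(m))
            new_b = soft_threshold(rho, lam) / norm
            delta = new_b - beta[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated coefficient.
                for k in range(m):
                    residual[k] -= zj[k] * delta
            beta[j] = new_b
    return beta


def select_channels(Z: Sequence[Sequence[float]], y: Sequence[float], keep: int,
                    lam: float = 1e-3, growth: float = 2.0) -> Tuple[List[int], List[float]]:
    """Raise lam until at most `keep` channels have a non-zero coefficient."""
    while True:
        beta = lasso_coefficients(Z, y, lam)
        kept = [i for i, b in enumerate(beta) if b != 0.0]
        if len(kept) <= keep:
            return kept, beta
        lam *= growth
```

Channels whose β is driven to exactly zero are the ones removed; a subsequent least-squares fit on the kept channels would then restore the layer's output as closely as possible.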
State-of-the-art object detectors rely heavily on off-the-shelf networks pre-trained on large-scale classification datasets such as ImageNet. This introduces a learning bias, because classification and detection differ in both their loss functions and their category distributions.
#3 best model for Object Detection on COCO
Our method develops a deep fully convolutional neural network that takes two input frames and estimates pairs of 1D kernels for all pixels simultaneously.
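The idea of estimating "pairs of 1D kernels" can be sketched for a single output pixel: each input frame contributes a local n × n patch, filtered by the 2D kernel formed as the outer product of a vertical and a horizontal 1D kernel, and the two filtered results are summed. In this sketch the kernels are hypothetical inputs; in the method described above they would be predicted per pixel by the network, which is not modeled here.

```python
from typing import Sequence

Patch = Sequence[Sequence[float]]


def apply_separable(patch: Patch, k_v: Sequence[float], k_h: Sequence[float]) -> float:
    """Apply the 2D kernel outer(k_v, k_h) to an n x n patch without forming it."""
    n = len(k_v)
    assert len(k_h) == n and len(patch) == n and all(len(row) == n for row in patch)
    # sum_{i,j} k_v[i] * k_h[j] * patch[i][j]: filter each row with k_h,
    # then combine the per-row results with k_v.
    return sum(k_v[i] * sum(k_h[j] * patch[i][j] for j in range(n)) for i in range(n))


def interpolate_pixel(p1: Patch, p2: Patch,
                      kv1: Sequence[float], kh1: Sequence[float],
                      kv2: Sequence[float], kh2: Sequence[float]) -> float:
    """One interpolated pixel: a separable kernel on each frame's patch, summed."""
    return apply_separable(p1, kv1, kh1) + apply_separable(p2, kv2, kh2)
```

The design choice this illustrates is parameter count: a full n × n kernel needs n² coefficients per pixel and frame, whereas a pair of 1D kernels needs only 2n. With kernels that put weight 0.5 on the centre of each patch, the result is simply the average of the two centre pixels; shifting the non-zero entry of a kernel samples a neighbouring pixel instead, which is how such kernels can encode motion.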
Detection accuracy suffers from degraded object appearance in videos, caused for example by motion blur, video defocus, or rare poses.
We propose a technique for producing "visual explanations" for decisions from a large class of CNN-based models, making them more transparent.
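The abstract does not spell out how the explanation is computed. The sketch below assumes the widely used Grad-CAM formulation: each feature map A^k of a chosen convolutional layer is weighted by α_k, the spatial average of the gradient of the class score y^c with respect to A^k, and the map is ReLU(Σ_k α_k A^k). The activations and gradients are assumed to come from a forward and backward pass that is not shown, and upsampling the map to the input resolution is omitted.

```python
from typing import List, Sequence

Map = Sequence[Sequence[float]]


def grad_cam(activations: Sequence[Map], gradients: Sequence[Map]) -> List[List[float]]:
    """Class-discriminative localization map from K feature maps and their gradients.

    activations[k][i][j] is A^k at (i, j); gradients[k][i][j] is dy^c / dA^k at (i, j).
    Both have shape [K][H][W].
    """
    K = len(activations)
    H, W = len(activations[0]), len(activations[0][0])
    # alpha_k: global-average-pooled gradient, i.e. how much channel k matters for class c.
    alphas = [sum(sum(row) for row in gradients[k]) / (H * W) for k in range(K)]
    # Weighted sum of feature maps; ReLU keeps only evidence in favour of class c.
    return [[max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(K)))
             for j in range(W)]
            for i in range(H)]
```

Channels whose gradients are negative on average push the map down, and the ReLU then hides regions that only argue against the class, which is why the resulting map highlights the evidence supporting the decision.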