On all these tasks, ActNN compresses the activation to 2 bits on average, with negligible accuracy loss.
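To make the 2-bit figure concrete, the following is a minimal sketch of low-bit activation quantization, not ActNN's actual implementation. It assumes a single group of activations, min-max scaling, and stochastic rounding; the names `quantize_2bit` and `dequantize_2bit` are hypothetical and chosen only for illustration.

```python
import random


def quantize_2bit(values, rng):
    """Map a group of floats to integer codes in {0, 1, 2, 3} (2 bits each).

    Assumption: one min-max range per group and stochastic rounding.
    This is an illustrative scheme, not the method from the paper.
    """
    lo, hi = min(values), max(values)
    # Three intervals span four representable levels; guard against a flat group.
    scale = (hi - lo) / 3 if hi > lo else 1.0
    codes = []
    for v in values:
        x = (v - lo) / scale              # position in [0, 3]
        floor = int(x)
        # Round up with probability equal to the fractional part,
        # so the quantized value is unbiased in expectation.
        code = floor + (1 if rng.random() < x - floor else 0)
        codes.append(min(code, 3))        # clamp against float round-off
    return codes, lo, scale


def dequantize_2bit(codes, lo, scale):
    """Reconstruct approximate floats from 2-bit codes."""
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    rng = random.Random(0)
    activations = [0.0, 0.1, 0.5, 0.9, 1.5, 3.0]
    codes, lo, scale = quantize_2bit(activations, rng)
    print(codes)
    print(dequantize_2bit(codes, lo, scale))
```

Each stored code needs 2 bits instead of 32, at the cost of a reconstruction error of at most one quantization step per value.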
Bird's-eye-view (BEV) is a powerful and widely adopted representation for road scenes that captures surrounding objects and their spatial locations, along with overall context in the scene.
Deploying deep learning models on embedded systems has been challenging due to limited computing resources.
In this work, we first investigate the overhead of the deformable convolution on embedded FPGA SoCs, and then show the accuracy-latency tradeoffs for a set of algorithm modifications including full versus depthwise, fixed-shape, and limited-range.
Given the variety of the visual world, there is no single true scale for recognition: objects may appear at drastically different sizes across the visual field.
Convolutions on monocular dash cam videos capture spatial invariances in the image plane but do not explicitly reason about distances and depth.
Adapting receptive fields by dynamic Gaussian structure further improves results, equaling the accuracy of free-form deformation while improving efficiency.
It has been challenging to analyze signals with mixed topologies (for example, a point cloud combined with a surface mesh).
The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform.
Ranked #10 on Multiple Object Tracking on KITTI Tracking test
While learning visuomotor skills in an end-to-end manner is appealing, deep neural networks are often uninterpretable and fail in surprising ways.
We present the 2017 Visual Domain Adaptation (VisDA) dataset and challenge, a large-scale testbed for unsupervised domain adaptation across visual domains.
On both datasets, we achieve better results than many state-of-the-art approaches, including a few using oracle (manually annotated) bounding boxes in the test images.
In this paper, we introduce the first domain adaptive semantic segmentation method, proposing an unsupervised adversarial approach to pixel prediction problems.
Ranked #2 on Image-to-Image Translation on SYNTHIA Fall-to-Winter
Fine-grained categorization, which aims to distinguish subordinate-level categories such as bird species or dog breeds, is an extremely challenging task.
Image semantic segmentation is the task of partitioning an image into several regions based on semantic concepts.
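One common way to frame this task is per-pixel classification: a model produces a score for every class at every pixel, and each pixel takes the highest-scoring class. The sketch below assumes that framing and uses a hypothetical `segment` helper on nested lists; it illustrates only the labeling step, not any particular model.

```python
def segment(scores):
    """Turn class scores into a label map.

    scores[c][y][x] is the (assumed) score of class c at pixel (y, x).
    Returns labels[y][x] = index of the highest-scoring class.
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    # Two classes on a 2x2 image; the values are made up for illustration.
    scores = [
        [[0.9, 0.2], [0.4, 0.1]],  # class 0
        [[0.1, 0.8], [0.6, 0.9]],  # class 1
    ]
    print(segment(scores))
```

Pixels that share a label form the semantic regions of the image.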