For autonomous driving, this means that large objects close to the sensors are easily visible, but far-away or small objects comprise only one or two measurements.
Ranked #8 on 3D Object Detection on nuScenes
Our world offers a never-ending stream of visual stimuli, yet today's vision systems only accurately recognize patterns within a few seconds.
Ranked #3 on Action Recognition on AVA v2.2
This assumption greatly simplifies the learning problem, factorizing the dynamics into a nonreactive world model and a low-dimensional and compact forward model of the ego-vehicle.
Ranked #5 on Autonomous Driving on CARLA Leaderboard
We develop a probabilistic interpretation of two-stage object detection.
Ranked #16 on Object Detection on COCO test-dev (using extra training data)
We use these recognition datasets to link up a source and target domain to transfer models between them in a task distillation framework.
Three-dimensional objects are commonly represented as 3D boxes in a point cloud.
Ranked #1 on 3D Object Detection on waymo pedestrian
Nowadays, tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection.
Ranked #2 on Multiple Object Tracking on KITTI Tracking test
We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).
Ranked #1 on Video Classification on Kinetics
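As a rough illustration of the grid-schedule idea described above, a coarse-to-fine training schedule trades spatial/temporal resolution against batch size so per-step compute stays roughly constant. The multipliers, phase splits, and function name below are invented for illustration; they are not the paper's exact schedule.

```python
def grid_schedule(num_epochs, base_batch=8, base_frames=32, base_crop=224):
    """Hypothetical coarse-to-fine grid schedule: early epochs use smaller
    clips with proportionally larger batches, the final phase restores full
    resolution at the base batch size."""
    # (fraction of epochs, batch multiplier, frame scale, crop scale)
    phases = [
        (0.5, 4, 0.5, 0.5),
        (0.3, 2, 0.5, 1.0),
        (0.2, 1, 1.0, 1.0),  # final phase: full resolution
    ]
    schedule = []
    for frac, bmul, fscale, cscale in phases:
        for _ in range(round(frac * num_epochs)):
            schedule.append({
                "batch_size": base_batch * bmul,
                "num_frames": int(base_frames * fscale),
                "crop_size": int(base_crop * cscale),
            })
    return schedule

sched = grid_schedule(10)
```

Because smaller clips cost less per sample, the larger batches in early phases let the model see many more examples per unit of compute before fine-tuning at full resolution.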
Convolutions on monocular dash cam videos capture spatial invariances in the image plane but do not explicitly reason about distances and depth.
With the advent of deep learning, object detection drifted from a bottom-up to a top-down recognition problem.
Ranked #80 on Object Detection on COCO minival
To understand the world, we humans constantly need to relate the present to the past, and put events in context.
Ranked #3 on Action Recognition on AVA v2.1
The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform.
Ranked #10 on Multiple Object Tracking on KITTI Tracking test
We propose to train a deep network directly on the compressed video.
Ranked #32 on Action Classification on Charades (using extra training data)
In addition, we show that a simple margin based loss is sufficient to outperform all other loss functions.
Ranked #4 on Image Retrieval on CARS196
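A minimal sketch of a margin-based pair loss of the kind referenced above: positive pairs are pulled inside a boundary and negative pairs pushed beyond it. The function name and the `beta`/`alpha` defaults are illustrative assumptions, not the paper's exact formulation.

```python
def margin_loss(dist, is_positive, beta=1.2, alpha=0.2):
    """Margin-based pair loss on an embedding distance `dist`:
    positives incur loss once dist > beta - alpha,
    negatives incur loss once dist < beta + alpha."""
    y = 1.0 if is_positive else -1.0
    return max(0.0, alpha + y * (dist - beta))
```

For example, a positive pair at distance 0.5 is already close enough and contributes no loss, while the same pair at distance 1.8 is penalized.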
Realistic image manipulation is challenging because it requires modifying the image appearance in a user-controlled way, while preserving the realism of the result.
The ability of the Generative Adversarial Networks (GANs) framework to learn generative models mapping from simple latent distributions to arbitrarily complex data distributions has been demonstrated empirically, with compelling results showing that the latent space of such generators captures semantic variation in the data distribution.
We use ground-truth synthetic-to-synthetic correspondences, provided by the rendering engine, to train a ConvNet to predict synthetic-to-real, real-to-real and real-to-synthetic correspondences that are cycle-consistent with the ground-truth.
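The cycle-consistency idea above can be sketched as a round-trip check: mapping a synthetic point into the real domain and back should recover the original point. The mappings here are hypothetical stand-ins for learned correspondence predictors, not the paper's networks.

```python
def cycle_error(point, syn_to_real, real_to_syn):
    """Round-trip (cycle) consistency error: map a synthetic-domain point
    to the real domain and back, then measure how far it drifted."""
    x, y = point
    rx, ry = syn_to_real(x, y)   # synthetic -> real correspondence
    bx, by = real_to_syn(rx, ry) # real -> synthetic correspondence
    return ((bx - x) ** 2 + (by - y) ** 2) ** 0.5
```

During training, this error (against ground-truth correspondences) can serve as a supervisory signal even when direct synthetic-to-real labels are unavailable.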
We present a regression framework which models the output distribution of neural networks.
Convolutional Neural Networks spread through computer vision like a wildfire, impacting almost all visual tasks imaginable.
We propose a data-driven approach for intrinsic image decomposition, which is the process of inferring the confounding factors of reflectance and shading in an image.
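The standard intrinsic image model underlying this decomposition treats each observed intensity as the pointwise product of reflectance and shading, I = R * S (often handled in log space, where the product becomes a sum). The helper below is a hypothetical sketch of that forward model, not the paper's method.

```python
def compose(reflectance, shading):
    """Intrinsic image forward model: observed intensity is the
    pointwise product of reflectance and shading, I = R * S."""
    return [r * s for r, s in zip(reflectance, shading)]
```

Decomposition is the inverse problem: given only I, recover a plausible R and S, which is ambiguous without priors or data-driven cues.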
We propose Constrained CNN (CCNN), a method that uses a novel loss function to optimize for any set of linear constraints on the output space (i.e., the predicted label distribution) of a CNN.
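A simplified stand-in for a constrained objective of this kind: linear constraints A @ p <= b on a predicted label distribution p can be encouraged with a hinge penalty on violations. This is an illustrative sketch, not CCNN's exact loss.

```python
def constraint_violation(p, A, b):
    """Squared-hinge penalty for linear constraints A @ p <= b on a
    predicted label distribution p (a list of per-class probabilities)."""
    penalty = 0.0
    for row, bound in zip(A, b):
        value = sum(a * pi for a, pi in zip(row, p))
        penalty += max(0.0, value - bound) ** 2
    return penalty
```

For instance, requiring that foreground classes (indices 1 and 2) cover at least 20% of the mass can be written as -(p1 + p2) <= -0.2; a prediction with only 10% foreground is then penalized, while one with 50% is not.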