Quantity over Quality: Training an AV Motion Planner with Large Scale Commodity Vision Data

With the Autonomous Vehicle (AV) industry shifting towards machine-learned approaches for motion planning, the performance of self-driving systems is starting to rely heavily on large quantities of expert driving demonstrations. However, collecting this demonstration data typically involves expensive HD sensor suites (LiDAR + RADAR + cameras), which quickly becomes financially infeasible at the scales required. This motivates the use of commodity sensors like cameras for data collection, which are an order of magnitude cheaper than HD sensor suites, but offer lower fidelity. Leveraging these sensors for training an AV motion planner opens a financially viable path to observe the `long tail' of driving events. As our main contribution we show it is possible to train a high-performance motion planner using commodity vision data which outperforms planners trained on HD-sensor data for a fraction of the cost. To the best of our knowledge, we are the first to demonstrate this using real-world data. We compare the performance of the autonomy system on these two different sensor configurations, and show that we can compensate for the lower sensor fidelity by means of increased quantity: a planner trained on 100h of commodity vision data outperforms the one with 25h of expensive HD data. We also share the engineering challenges we had to tackle to make this work.

Results in Papers With Code
(↓ scroll down to see all results)