BDD-A (Berkeley DeepDrive Attention)

Introduced by Xia et al. in Predicting Driver Attention in Critical Situations

Dataset Statistics: The statistics of our dataset are summarized and compared with the largest existing dataset, DR(eye)VE [1], in Table 1. Our dataset was collected using videos selected from BDD100K [30, 31], a publicly available, large-scale, crowd-sourced driving video dataset. BDD100K contains human-demonstrated dashboard videos and time-stamped sensor measurements collected during urban driving in various weather and lighting conditions. To efficiently collect attention data for critical driving situations, we specifically selected video clips that both included braking events and took place in busy areas (see supplementary materials for technical details). We then trimmed the videos to include 6.5 seconds before and 3.5 seconds after each braking event. Other driving actions, e.g., turning, lane switching, and accelerating, were also captured as a result. In total, 1,232 videos (3.5 hours) were collected following this procedure. Some example images from our dataset are shown in Fig. 6. The selected videos contain a large number of different road users. We detected the objects in our videos using YOLO [22]. On average, each video frame contained 4.4 cars and 0.3 pedestrians, several times more than in the DR(eye)VE dataset (Table 1).

Data Collection Procedure: For our eye-tracking experiment, we recruited 45 participants, each with more than one year of driving experience. The participants watched the selected driving videos in the lab while performing a driving-instructor task: they were asked to imagine that they were driving instructors sitting in the copilot seat and to press the space key whenever they felt it necessary to correct or warn the student driver of potential dangers. Their eye movements during the task were recorded at 1000 Hz with an EyeLink 1000 desktop-mounted infrared eye tracker, used in conjunction with the EyeLink Toolbox scripts [7] for MATLAB. Each participant completed the task for 200 driving videos, and each driving video was viewed by at least 4 participants. The gaze patterns of these independent participants were aggregated and smoothed to produce an attention map for each frame of the stimulus video (see Fig. 6 and supplementary materials for technical details). Psychological studies [19, 11] have shown that when multiple visual cues simultaneously demand attention, the order in which humans look at those cues is highly subjective. By aggregating the gazes of independent observers, we could therefore record multiple important visual cues in a single frame. In addition, it has been shown that human drivers look at buildings, trees, flowerbeds, and other unimportant objects with non-negligible frequency [1]. Presumably, these eye movements should be regarded as noise for driving-related machine learning purposes. By averaging the eye movements of independent observers, we were able to effectively wash out those sources of noise (see Fig. 2B).
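The per-frame attention maps are produced by pooling gaze from all observers of a frame and spatially smoothing the result. A minimal sketch of that aggregation step is given below; the frame size, Gaussian bandwidth, and input format are illustrative assumptions, not the exact parameters used for BDD-A.

```python
# Sketch: aggregate gaze points from several independent observers into a
# smoothed attention map for one video frame. Frame size and sigma are
# illustrative assumptions, not the dataset's exact parameters.
import numpy as np
from scipy.ndimage import gaussian_filter

def attention_map(gaze_points, frame_hw=(720, 1280), sigma_px=30.0):
    """gaze_points: iterable of (x, y) pixel fixation locations pooled over
    all observers who viewed this frame."""
    h, w = frame_hw
    fixations = np.zeros((h, w), dtype=np.float32)
    for x, y in gaze_points:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < w and 0 <= yi < h:
            fixations[yi, xi] += 1.0                        # pool gaze across observers
    smoothed = gaussian_filter(fixations, sigma=sigma_px)   # spatial smoothing
    total = smoothed.sum()
    return smoothed / total if total > 0 else smoothed      # normalize to a distribution
```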
Comparison with In-Car Attention Data: We also collected in-lab driver attention data using videos from the DR(eye)VE dataset, which allowed us to compare in-lab and in-car attention maps for the same videos. The DR(eye)VE videos we used were 200 randomly selected 10-second clips, half containing braking events and half without. We tested how well in-car and in-lab attention maps highlighted driving-relevant objects, again using YOLO [22] to detect the objects in the videos. We identified three object categories that are important for driving and had sufficient instances in the videos (car, pedestrian, and cyclist). For each category, we calculated the proportion of attended objects out of the total detected instances, for both in-lab and in-car attention maps (see supplementary materials for technical details; a schematic version of this computation is sketched below). The results showed that in-car attention maps highlighted significantly fewer driving-relevant objects than in-lab attention maps (see Fig. 2A). This difference may be due to the fact that eye movements collected from a single driver do not completely indicate all the objects that demand attention in a particular driving situation: one individual's eye movements are only an approximation of their attention [23], and humans can also track objects with covert attention without looking at them [6]. The difference may also reflect the distinction between first-person and third-person driver attention. It is possible that the human observers in our in-lab eye-tracking experiment also looked at objects that were not relevant for driving; we ran a human evaluation experiment to address this concern.

Human Evaluation: To verify that our in-lab driver attention maps highlight regions that should indeed demand drivers' attention, we conducted an online study in which humans compared in-lab and in-car driver attention maps. In each trial, participants watched one driving video clip three times: first without any overlay, and then two more times, in random order, with the in-lab and in-car attention maps overlaid, respectively. The participant was then asked to choose which heatmap-coded video was more similar to where a good driver would look. In total, we collected 736 trials from 32 online participants. Our in-lab attention maps were preferred more often than the in-car attention maps (71% versus 29% of all trials, statistically significant with p = 1×10^-29, see Table 2). Although this result does not show that in-lab driver attention maps are superior to in-car attention maps in general, it does show that the driver attention maps collected with our protocol represent where a good driver should look from a third-person perspective. In addition, we will show in the Experiments section that in-lab attention data collected with our protocol can be used to train a model that effectively predicts actual, in-car driver attention. This result shows that our dataset can also serve as a substitute for in-car driver attention data, especially in critical situations where in-car data collection is not practical. To summarize, compared with driver attention data collected in-car, our dataset has three clear advantages: it is multi-focus, contains little driving-irrelevant noise, and is efficiently tailored to critical driving situations.
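The attended-object comparison above reduces to a per-category ratio: an object counts as attended if the attention map places enough weight inside its detected bounding box. The sketch below illustrates one way to compute this; the decision rule (peak attention inside the box relative to the frame-wide peak) and the threshold value are assumptions for illustration, not the paper's exact criterion.

```python
# Sketch of the attended-object proportion metric described above.
# The "attended" rule and threshold are illustrative assumptions.
def attended_proportion(attention_map, detections, threshold=0.5):
    """attention_map: 2-D non-negative array (one frame's attention map).
    detections: list of (category, (x1, y1, x2, y2)) boxes, e.g. from YOLO.
    Returns {category: fraction of detected instances counted as attended}."""
    peak = float(attention_map.max())
    counts, attended = {}, {}
    h, w = attention_map.shape
    for category, (x1, y1, x2, y2) in detections:
        counts[category] = counts.get(category, 0) + 1
        x1, y1 = max(int(x1), 0), max(int(y1), 0)
        x2, y2 = min(int(x2), w), min(int(y2), h)
        region = attention_map[y1:y2, x1:x2]
        # Count the object as attended if the peak attention inside its box
        # reaches a fraction of the frame-wide peak (illustrative rule).
        if peak > 0 and region.size and region.max() >= threshold * peak:
            attended[category] = attended.get(category, 0) + 1
    return {c: attended.get(c, 0) / counts[c] for c in counts}
```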
