The Casual Conversations dataset is designed to help researchers evaluate the accuracy of their computer vision and audio models across a diverse set of ages, genders, apparent skin tones and ambient lighting conditions.
Casual Conversations is composed of over 45,000 videos from 3,011 participants and is intended for assessing the performance of already trained computer vision and audio models for the purposes permitted in the data user agreement. The videos feature paid individuals who agreed to participate in the project and explicitly provided their own age and gender labels. The videos were recorded in the U.S. with a diverse set of adults across various age, gender and apparent skin tone groups. A group of trained annotators labeled each participant's apparent skin tone using the Fitzpatrick scale and also annotated which videos were recorded in low ambient lighting conditions.
Source: Casual Conversations Dataset
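As an illustration of the kind of evaluation described above, the sketch below computes a trained model's accuracy separately for each value of one attribute (age group, gender, Fitzpatrick skin type, or low-light flag) and reports the largest gap between groups. It is a minimal sketch under stated assumptions: the `Prediction` record, its field names, the age-group strings and the sample records are all hypothetical and do not reflect the dataset's actual annotation format; it only assumes you already know, per video, whether the model's output was correct.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Prediction:
    # Hypothetical per-video record; field names are illustrative,
    # not the dataset's actual annotation schema.
    age_group: str     # hypothetical age bucket label
    gender: str        # self-provided gender label
    skin_type: int     # Fitzpatrick type, 1-6 (annotator-labeled)
    low_light: bool    # annotated as low ambient lighting
    correct: bool      # whether the already-trained model was right on this video


def accuracy_by(records, attribute):
    """Return {group value: accuracy} for a single attribute."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in records:
        key = getattr(r, attribute)
        totals[key] += 1
        hits[key] += int(r.correct)
    return {k: hits[k] / totals[k] for k in totals}


def max_gap(per_group):
    """Largest accuracy difference between any two groups."""
    values = list(per_group.values())
    return max(values) - min(values)


# Toy, made-up records purely to show usage.
records = [
    Prediction("18-30", "female", 2, False, True),
    Prediction("18-30", "male", 5, True, False),
    Prediction("46-85", "female", 5, False, True),
    Prediction("31-45", "male", 2, True, True),
]

by_light = accuracy_by(records, "low_light")
print(by_light)           # {False: 1.0, True: 0.5}
print(max_gap(by_light))  # 0.5
```

Reporting per-group accuracy (rather than a single overall number) is what surfaces disparities: on the toy records above, overall accuracy is 0.75, yet the low-light group scores only 0.5.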