BIOSCAN-5M

Introduced by Gharaee et al. in BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity

As part of an ongoing worldwide effort to comprehend and monitor insect biodiversity, we present the BIOSCAN-5M Insect dataset to the machine learning community. BIOSCAN-5M is a comprehensive dataset containing multi-modal information for over 5 million insect specimens, and it significantly expands existing image-based biological datasets by including taxonomic labels, raw nucleotide barcode sequences, assigned barcode index numbers, geographical information, and specimen size.

Every record has both image and DNA data. Each record of the BIOSCAN-5M dataset contains six primary attributes:

RGB image
DNA barcode sequence
Barcode Index Number (BIN)
Biological taxonomic classification
Geographical information
Specimen size
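
The six attributes above can be pictured as one record type. The following is a minimal sketch, not the dataset's actual schema: the class name `SpecimenRecord`, every field name, the chosen types, and the example values are hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SpecimenRecord:
    """Illustrative container for one specimen; all field names are hypothetical."""

    image_path: str              # RGB image, referenced here by a file path (assumption)
    dna_barcode: str             # raw nucleotide barcode sequence
    bin_id: Optional[str]        # Barcode Index Number (BIN); None if not available
    taxonomy: Dict[str, str]     # taxonomic rank -> label
    geography: Dict[str, str]    # geographical information as key -> value
    size: Optional[float]        # specimen size; unit and representation are assumptions

    def has_image_and_dna(self) -> bool:
        """Return True when both the image reference and the barcode are non-empty."""
        return bool(self.image_path) and bool(self.dna_barcode)


# Placeholder values for illustration only, not taken from the dataset.
record = SpecimenRecord(
    image_path="images/example.jpg",
    dna_barcode="ACTGACTG",
    bin_id=None,
    taxonomy={"order": "Diptera"},
    geography={"country": "Canada"},
    size=None,
)
```

Modelling the BIN and the size as optional is a design choice of this sketch; the description above only guarantees that every record has both image and DNA data.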

