INSTANCE – the Italian seismic dataset for machine learning

The Italian earthquake waveform data are collected here in a dataset suited for machine learning analysis (ML) applications. The dataset consists of nearly 1.2 million three-component (3C) waveform traces from about 50 000 earthquakes and more than 130 000 noise 3C waveform traces, for a total of about 43 000 h of data and an average of 21 3C traces provided per event. The earthquake list is based on the Italian Seismic Bulletin (, last access: 15 February 2020​​​​​​​) of the Istituto Nazionale di Geofisica e Vulcanologia between January 2005 and January 2020, and it includes events in the magnitude range between 0.0 and 6.5. The waveform data have been recorded primarily by the Italian National Seismic Network (network code IV) and include both weak- (HH, EH channels) and strong-motion (HN channels) recordings. All the waveform traces have a length of 120 s, are sampled at 100 Hz, and are provided both in counts and ground motion physical units after deconvolution of the instrument transfer functions. The waveform dataset is accompanied by metadata consisting of more than 100 parameters providing comprehensive information on the earthquake source, the recording stations, the trace features, and other derived quantities. This rich set of metadata allows the users to target the data selection for their own purposes. Much of these metadata can be used as labels in ML analysis or for other studies. The dataset, assembled in HDF5 format, is available at (Michelini et al., 2021).

PDF Abstract


Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.


No methods listed for this paper. Add relevant methods here