The Aurora-2 data are derived from the original TIDigits corpus (available from LDC), downsampled to 8 kHz. Various noise signals have been artificially added to the clean speech. The software tool used for filtering and noise adding is available in the download area; it can create distorted data at sampling rates of 8 or 16 kHz. The Aurora-2 recognition experiments use the HTK recognizer, available from Cambridge University. Scripts and configuration files are included on the Aurora-2 CDs distributed by ELRA/ELDA. A published paper describes the data creation and the recognition experiments in more detail.
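The idea of adding noise to clean speech at a chosen signal-to-noise ratio can be illustrated with a minimal Python sketch. This is not the distributed Aurora-2 tool: the function names `mean_power` and `add_noise_at_snr` are hypothetical, the SNR is assumed to be a plain ratio of mean sample power, and no filtering is applied, whereas the actual tool also filters the signals and may measure levels differently.

```python
import math
import random


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def add_noise_at_snr(speech, noise, snr_db, rng=None):
    """Mix a random noise segment into speech at the requested SNR (in dB).

    Assumption: SNR is the ratio of mean speech power to mean noise power,
    computed over the whole utterance, with no filtering applied.
    """
    rng = rng or random.Random()
    if not speech:
        raise ValueError("speech must not be empty")
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")

    # Pick a random noise segment with the same length as the utterance.
    start = rng.randrange(len(noise) - len(speech) + 1)
    segment = noise[start:start + len(speech)]

    speech_power = mean_power(speech)
    noise_power = mean_power(segment)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both have non-zero power")

    # Choose the gain so that
    # speech_power / (gain**2 * noise_power) == 10 ** (snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, segment)]
```

Calling `add_noise_at_snr(speech, noise, 10.0)` with two lists of float samples returns a noisy utterance of the same length as `speech`. Passing a seeded `random.Random` as `rng` makes the choice of noise segment reproducible.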

The experiments distributed on the CDs use acoustic features produced by a cepstral analysis scheme that has been standardized by ETSI. We refer to this feature extraction scheme as the first standard. The advanced front-end was later standardized as a second standard. A set of scripts for running the Aurora-2 experiment with the advanced front-end is provided in the download area. A report with more details about the set-up and the recognition results obtained is also available.

