nEMO Dataset | Papers With Code


## Overview

nEMO is a simulated dataset of emotional speech in the Polish language. The corpus contains over 3 hours of samples recorded with the participation of nine actors portraying six emotional states: anger, fear, happiness, sadness, surprise, and a neutral state. The text material used was carefully selected to represent the phonetics of the Polish language. The corpus is available for free under the Creative Commons license (CC BY-NC-SA 4.0).

The dataset is available on [Hugging Face](https://huggingface.co/datasets/amu-cai/nEMO) and [GitHub](https://github.com/amu-cai/nEMO).

## Data Fields

- `file_id` - filename, i.e. `{speaker_id}_{emotion}_{sentence_id}`,

- `audio` (audio) - dictionary containing the audio array, path, and sampling rate (available when accessed via the `datasets` library),

- `emotion` - label corresponding to emotional state,

- `raw_text` - original (orthographic) transcription of the audio,

- `normalized_text` - normalized transcription of the audio,

- `speaker_id` - id of speaker,

- `gender` - gender of the speaker,

- `age` - age of the speaker.
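
Since `file_id` follows the `{speaker_id}_{emotion}_{sentence_id}` pattern, its parts can be recovered from the filename alone. The snippet below is a minimal sketch, not part of the dataset tooling: `parse_file_id` and `FileId` are hypothetical helpers, the example identifier is made up, and the code assumes that none of the three parts contains an underscore.

```python
from typing import NamedTuple


class FileId(NamedTuple):
    speaker_id: str
    emotion: str
    sentence_id: str


def parse_file_id(file_id: str) -> FileId:
    """Split '{speaker_id}_{emotion}_{sentence_id}' into its three parts."""
    # Assumption: none of the three parts contains an underscore.
    parts = file_id.split("_")
    if len(parts) != 3:
        raise ValueError(f"unexpected file_id format: {file_id!r}")
    return FileId(*parts)


# "SP0_anger_12" is a made-up identifier used only for illustration.
parsed = parse_file_id("SP0_anger_12")
print(parsed.speaker_id, parsed.emotion, parsed.sentence_id)
```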

## Usage

The nEMO dataset can be loaded and processed using the `datasets` library:

```python
from datasets import load_dataset

nemo = load_dataset("amu-cai/nEMO", split="train")
```

To work with the nEMO dataset on GitHub, you may clone the repository and access the files directly within the `samples` folder. Corresponding metadata can be found in the `data.tsv` file.
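
For the GitHub copy, the metadata file can be read with Python's standard `csv` module. This is a sketch under assumptions: `read_metadata` is a hypothetical helper, the file is assumed to start with a header row, and the two-column excerpt is made up to show the call rather than taken from `data.tsv`.

```python
import csv
from typing import Iterable


def read_metadata(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse tab-separated metadata lines into one dict per recording."""
    # Assumption: the first line is a header row naming the columns.
    return list(csv.DictReader(lines, delimiter="\t"))


# Possible use after cloning the repository (the local path is an assumption):
# with open("nEMO/data.tsv", encoding="utf-8", newline="") as f:
#     rows = read_metadata(f)

# Made-up excerpt, used here only to show the call.
example = ["file_id\temotion\n", "SP0_anger_12\tanger\n"]
rows = read_metadata(example)
print(rows[0]["emotion"])
```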

The nEMO dataset is provided as a whole, without predefined training and test splits. This gives researchers and developers the flexibility to create their own splits based on their specific needs.
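
One possible way to build a split is to hold out whole speakers, so that no voice appears in both training and test data. The sketch below is only an illustration of that idea: `speaker_independent_split` is a hypothetical helper, the toy speaker ids are made up, and the choice of held-out speakers is left to the user.

```python
from typing import Sequence


def speaker_independent_split(
    speaker_ids: Sequence[str], test_speakers: set[str]
) -> tuple[list[int], list[int]]:
    """Return (train_indices, test_indices) with no speaker shared between them."""
    train_idx: list[int] = []
    test_idx: list[int] = []
    for i, speaker in enumerate(speaker_ids):
        # Every recording of a held-out speaker goes to the test side.
        (test_idx if speaker in test_speakers else train_idx).append(i)
    return train_idx, test_idx


# Made-up speaker ids, used here only to show the call.
toy_speakers = ["A", "A", "B", "C", "B"]
train_idx, test_idx = speaker_independent_split(toy_speakers, {"B"})
print(train_idx, test_idx)

# With the dataset loaded as `nemo`, the indices could be applied as:
# train_idx, test_idx = speaker_independent_split(nemo["speaker_id"], {"<held-out id>"})
# train_set, test_set = nemo.select(train_idx), nemo.select(test_idx)
```

For a plain random split, the `datasets` library also provides `Dataset.train_test_split`.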

## Supported Tasks

- **Audio classification:** This dataset was mainly created for the task of speech emotion recognition. Each recording is labeled with one of six emotional states (anger, fear, happiness, sadness, surprised, and neutral). Each sample is also labeled with speaker id and speaker gender, so the dataset can be used for other audio classification tasks as well.
- **Automatic Speech Recognition:** The dataset includes orthographic and normalized transcriptions for each audio recording, making it a useful resource for automatic speech recognition (ASR) tasks. The sentences were carefully selected to cover a wide range of phonemes in the Polish language.
- **Text-to-Speech:** The dataset contains emotional audio recordings with transcriptions, which can be valuable for developing TTS systems that produce emotionally expressive speech.

## Additional Information

### Licensing Information

The dataset is available under the Creative Commons license (CC BY-NC-SA 4.0).

### Citation Information

You can access the nEMO paper at [arXiv](https://arxiv.org/abs/2404.06292). Please cite the paper when referencing the nEMO dataset as:

```
@misc{christop2024nemo,
    title={nEMO: Dataset of Emotional Speech in Polish}, 
    author={Iwona Christop},
    year={2024},
    eprint={2404.06292},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

### Contributions

Thanks to [@iwonachristop](https://github.com/iwona-christop) for adding this dataset.

