The MSP-Podcast corpus contains speech segments from podcast recordings, perceptually annotated through crowdsourcing. Collection of this corpus is an ongoing process. Most segments in a typical podcast are neutral, so we use machine learning models trained on the available data to retrieve candidate segments, which are then emotionally annotated via crowdsourcing. This approach lets us spend annotation resources on speech segments that are likely to convey emotion.
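The retrieval step described above can be sketched as filtering segments by a trained model's emotion score. The scorer below is a hypothetical stand-in for a real classifier's predicted probability of emotional content; the threshold is an illustrative assumption, not a value from the corpus description.

```python
# Sketch: keep only segments a (hypothetical) trained model rates as
# likely emotional, so crowdsourced annotation targets non-neutral speech.

def retrieve_candidates(segments, emotion_score, threshold=0.7):
    """Return segments whose predicted emotional probability exceeds
    `threshold`, sorted most-confident first."""
    scored = [(emotion_score(seg), seg) for seg in segments]
    kept = [(s, seg) for s, seg in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [seg for _, seg in kept]

# Toy usage with a dummy scorer; a real pipeline would compute
# acoustic features and apply the trained classifier instead.
segments = ["seg_a", "seg_b", "seg_c"]
dummy_scores = {"seg_a": 0.9, "seg_b": 0.2, "seg_c": 0.75}
candidates = retrieve_candidates(segments, dummy_scores.get)
```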
3 PAPERS • 4 BENCHMARKS
…The segments are of varying length, between 3 and 10 seconds long, and in each clip the only visible face in the video and audible sound in the soundtrack belong to a single speaking person. In total, the dataset contains roughly 4700 hours of video segments with approximately 150,000 distinct speakers, spanning a wide variety of people, languages and face poses.
35 PAPERS • NO BENCHMARKS YET
…respiratory flow ranging from 180 to 240 L/min. Each audio recording was sampled at 8 kHz as a mono-channel WAV file at 8-bit depth. The audio recordings were segmented; the resulting segments (of non-mixed states) were of variable length and, for some methods, were further split into fixed-length frames for feature extraction. The constructed database consists of 193 drug actuation segments, 319 inhalation segments, 620 exhalation segments and 505 noise segments, ready for audio sound recognition using different sets of features.
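The fixed-length framing step mentioned above can be sketched as follows. The 25 ms frame and 10 ms hop are illustrative choices for an 8 kHz signal, not values taken from the dataset description.

```python
# Minimal sketch of splitting a variable-length segment into
# fixed-length frames for feature extraction.

def frame_signal(samples, frame_len, hop_len):
    """Split `samples` into overlapping fixed-length frames; a trailing
    remainder shorter than `frame_len` is dropped."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames

# Example: 1 second of 8 kHz mono audio, 25 ms frames, 10 ms hop.
sr = 8000
signal = [0.0] * sr  # placeholder for real 8-bit WAV samples
frames = frame_signal(signal,
                      frame_len=sr * 25 // 1000,   # 200 samples
                      hop_len=sr * 10 // 1000)     # 80 samples
```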
1 PAPER • NO BENCHMARKS YET
…Each segment is annotated for the presence of 11 emotions (angry, neutral, fear, happy, sad, disappointed, bored, disgusted, excited, surprised and other)
6 PAPERS • 1 BENCHMARK
…lyrics encode an important part of the semantics of a song; the authors focus on describing the methods they propose to extract relevant information from lyrics, such as structure segmentation. This information can be exploited by music search engines and music professionals (e.g. journalists, radio presenters) to better handle large collections of lyrics, enabling intelligent browsing, categorization and segmentation
0 PAPERS • NO BENCHMARKS YET
…Each segment is annotated for the presence of 9 emotions (angry, excited, fear, sad, surprised, frustrated, happy, disappointed and neutral) as well as valence, arousal and dominance.
636 PAPERS • 3 BENCHMARKS
We present YTSeg, a topically and structurally diverse benchmark for the text segmentation task based on YouTube transcriptions.
1 PAPER • 2 BENCHMARKS
…The dataset contains 6,892 segment-level summarization instances for training and performance evaluation.
7 PAPERS • NO BENCHMARKS YET
…Segments of each song are annotated as “voice” (sung or spoken) or “no-voice”. The songs constitute a total of about 6 hours of music.
3 PAPERS • NO BENCHMARKS YET
…This dense visual grounding takes the form of a mouse trace segment per word and is unique to our data.
54 PAPERS • 5 BENCHMARKS
…To ease automatic speech segmentation, we carried out the recordings in an anechoic room with walls covered in sound-absorbing materials.
5 PAPERS • 1 BENCHMARK
…This (FS-02) edition of the FEARLESS STEPS Challenge includes the following 6 tasks, among them: TASK 1: Speech Activity Detection (SAD); TASK 2: Speaker Identification (using Speaker Segments); …; ASR Track 2: ASR using Diarized Segments (ASR_track2)
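To illustrate what the SAD task above asks systems to do, here is a toy energy-based speech activity detector. This is not the challenge's baseline; the frame size and energy threshold are arbitrary assumptions for the sketch.

```python
# Toy frame-level speech activity detection: label a frame as
# speech-like (1) if its mean squared energy exceeds a threshold,
# otherwise non-speech (0).

def detect_speech(samples, frame_len=160, threshold=0.01):
    """Return one 0/1 label per non-overlapping frame of `samples`."""
    labels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        labels.append(1 if energy > threshold else 0)
    return labels

# Near-silence followed by a louder burst: only the burst is flagged.
quiet = [0.001] * 160
loud = [0.5] * 160
labels = detect_speech(quiet + loud)
```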