The MSP-Podcast corpus contains speech segments from podcast recordings, perceptually annotated through crowdsourcing. Collection of this corpus is an ongoing process. Most segments in a typical podcast are neutral, so we use machine learning models trained on the available data to retrieve candidate segments, which are then emotionally annotated via crowdsourcing. This approach lets us spend annotation resources on speech segments that are likely to convey emotion.
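The retrieval step described above can be sketched as filtering segments by a trained model's emotion score. The scorer below is a hypothetical stand-in for a real classifier's predicted probability of emotional content; the threshold is an illustrative assumption, not a value from the corpus description.

```python
# Sketch: keep only segments a (hypothetical) trained model rates as
# likely emotional, so crowdsourced annotation targets non-neutral speech.

def retrieve_candidates(segments, emotion_score, threshold=0.7):
    """Return segments whose predicted emotional probability exceeds
    `threshold`, sorted most-confident first."""
    scored = [(emotion_score(seg), seg) for seg in segments]
    kept = [(s, seg) for s, seg in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [seg for _, seg in kept]

# Toy usage with a dummy scorer; a real pipeline would compute
# acoustic features and apply the trained classifier instead.
segments = ["seg_a", "seg_b", "seg_c"]
dummy_scores = {"seg_a": 0.9, "seg_b": 0.2, "seg_c": 0.75}
candidates = retrieve_candidates(segments, dummy_scores.get)
```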
3 PAPERS • 4 BENCHMARKS
…The segments are of varying length, between 3 and 10 seconds long, and in each clip the only visible face in the video and audible sound in the soundtrack belong to a single speaking person. In total, the dataset contains roughly 4700 hours of video segments with approximately 150,000 distinct speakers, spanning a wide variety of people, languages and face poses.
35 PAPERS • NO BENCHMARKS YET
…respiratory flow ranging from 180 to 240 L/min. Each audio recording was sampled at 8 kHz as a mono-channel WAV file at 8-bit depth. The audio recordings were segmented; the resulting segments (of non-mixed states) were of variable length and, for some methods, were further split into fixed-length frames for feature extraction. The constructed database consists of 193 drug actuation segments, 319 inhalation segments, 620 exhalation segments and 505 noise segments, ready for audio sound recognition using different sets of features.
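The fixed-length framing step mentioned above can be sketched as follows. The 25 ms frame and 10 ms hop are illustrative choices for an 8 kHz signal, not values taken from the dataset description.

```python
# Minimal sketch of splitting a variable-length segment into
# fixed-length frames for feature extraction.

def frame_signal(samples, frame_len, hop_len):
    """Split `samples` into overlapping fixed-length frames; a trailing
    remainder shorter than `frame_len` is dropped."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames

# Example: 1 second of 8 kHz mono audio, 25 ms frames, 10 ms hop.
sr = 8000
signal = [0.0] * sr  # placeholder for real 8-bit WAV samples
frames = frame_signal(signal,
                      frame_len=sr * 25 // 1000,   # 200 samples
                      hop_len=sr * 10 // 1000)     # 80 samples
```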
1 PAPER • NO BENCHMARKS YET
…Each segment is annotated for the presence of 11 emotions (angry, neutral, fear, happy, sad, disappointed, bored, disgusted, excited, surprised and other)
6 PAPERS • 1 BENCHMARK
…lyrics encode an important part of the semantics of a song; the authors focus on describing the methods they propose to extract relevant information from lyrics, such as structure segmentation. This information can be exploited by music search engines and music professionals (e.g. journalists, radio presenters) to better handle large collections of lyrics, enabling intelligent browsing, categorization and segmentation
0 PAPERS • NO BENCHMARKS YET
…Each segment is annotated for the presence of 9 emotions (angry, excited, fear, sad, surprised, frustrated, happy, disappointed and neutral) as well as valence, arousal and dominance.
636 PAPERS • 3 BENCHMARKS
We present YTSeg, a topically and structurally diverse benchmark for the text segmentation task based on YouTube transcriptions.
1 PAPER • 2 BENCHMARKS
…The dataset contains 6,892 segment-level summarization instances for training and performance evaluation.
7 PAPERS • NO BENCHMARKS YET
…Segments of each song are annotated as “voice” (sung or spoken) or “no-voice”. The songs constitute a total of about 6 hours of music.
3 PAPERS • NO BENCHMARKS YET
…This dense visual grounding takes the form of a mouse trace segment per word and is unique to our data.
54 PAPERS • 5 BENCHMARKS
…To ease automatic speech segmentation, we carried out the recordings in an anechoic room with walls covered in sound-absorbing materials.
5 PAPERS • 1 BENCHMARK
…This (FS-02) edition of the FEARLESS STEPS Challenge includes the following 6 tasks, among them: TASK 1: Speech Activity Detection (SAD); TASK 2: Speaker Identification (using Speaker Segments); …; ASR Track 2: ASR using Diarized Segments (ASR_track2)
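To illustrate what the SAD task above asks systems to do, here is a toy energy-based speech activity detector. This is not the challenge's baseline; the frame size and energy threshold are arbitrary assumptions for the sketch.

```python
# Toy frame-level speech activity detection: label a frame as
# speech-like (1) if its mean squared energy exceeds a threshold,
# otherwise non-speech (0).

def detect_speech(samples, frame_len=160, threshold=0.01):
    """Return one 0/1 label per non-overlapping frame of `samples`."""
    labels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        labels.append(1 if energy > threshold else 0)
    return labels

# Near-silence followed by a louder burst: only the burst is flagged.
quiet = [0.001] * 160
loud = [0.5] * 160
labels = detect_speech(quiet + loud)
```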