6 dataset results for Data Augmentation AND German

Europarl-ST is a multilingual Spoken Language Translation corpus containing paired audio-text samples for SLT from and into 9 European languages, for a total of 72 different translation directions. This corpus has been compiled using the debates held in the European Parliament in the period between 2008 and 2012.

55 PAPERS • NO BENCHMARKS YET

MuST-C

MuST-C currently represents the largest publicly available multilingual corpus (one-to-many) for speech translation. It covers eight language directions, from English to German, Spanish, French, Italian, Dutch, Portuguese, Romanian and Russian. The corpus consists of audio, transcriptions and translations of English TED talks, and it comes with a predefined training, validation and test split.

193 PAPERS • 2 BENCHMARKS

Patzig

Patzig contains handwritten texts written in modern German. Train sample consists of 485 lines, validation - 38 lines and test -118 lines.

3 PAPERS • NO BENCHMARKS YET

Schiller (Shiller)

Schiller contains handwritten texts written in modern German. Train sample consists of 244 lines, validation - 21 lines and test - 63 lines.

3 PAPERS • NO BENCHMARKS YET

Schwerin

Schwerin contains handwritten texts written in medieval German. Train sample consists of 793 lines, validation - 68 lines and test - 196 lines.

3 PAPERS • NO BENCHMARKS YET

Konzil (Konzilsprotokolle_C)

Konzil dataset was created by specialists of the University of Greifswald. It contains manuscripts written in modern German. Train sample consists of 353 lines, validation - 29 lines and test - 87 lines.

3 PAPERS • NO BENCHMARKS YET

Datasets

6 dataset results for Data Augmentation AND German