OpenSubtitles

Introduced by Lison et al. in OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles

OpenSubtitles is a collection of multilingual parallel corpora. The dataset is compiled from a large database of movie and TV subtitles and includes a total of 1689 bitexts spanning 2.6 billion sentences across 60 languages.
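To put the bitext count in context, the figures above can be compared with the number of language pairs that 60 languages allow. This is a minimal sketch, assuming each bitext corresponds to one unordered language pair (an assumption, not something the description states); the function names are hypothetical and used only for illustration.

```python
from math import comb

# Figures taken from the dataset description.
NUM_LANGUAGES = 60
NUM_BITEXTS = 1689


def max_language_pairs(num_languages: int) -> int:
    """Return how many unordered pairs can be formed from num_languages."""
    return comb(num_languages, 2)


def pair_coverage(num_bitexts: int, num_languages: int) -> float:
    """Return the fraction of possible pairs covered by num_bitexts.

    Assumes one bitext per unordered language pair (illustrative assumption).
    """
    return num_bitexts / max_language_pairs(num_languages)


if __name__ == "__main__":
    print(max_language_pairs(NUM_LANGUAGES))  # 1770
    print(f"{pair_coverage(NUM_BITEXTS, NUM_LANGUAGES):.1%}")  # 95.4%
```

Under that assumption, 1689 bitexts would cover most, but not all, of the 1770 possible pairs.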

Benchmarks


Task: Language Identification
Dataset Variant: OpenSubtitles
Best Model: Apple bi-LSTM

Dataset Loaders


  • huggingface/datasets (13,381 stars)
  • facebookresearch/ParlAI (8,827 stars)

Tasks


  • Domain Adaptation
  • Machine Translation
  • Dialogue Generation
  • Language Identification

Similar Datasets


  • Allegro Reviews
  • FinnSentiment
  • FinChat
  • DiaBLa

License


  • Unknown

Languages


  • English
  • French
  • Spanish
  • German
  • Italian
  • Chinese
  • Bengali
  • Japanese
  • Russian
  • Portuguese
  • Afrikaans
  • Albanian
  • Arabic
  • Armenian
  • Basque
  • Breton
  • Bulgarian
  • Catalan
  • Croatian
  • Czech
  • Danish
  • Dutch
  • Estonian
  • Finnish
  • Galician
  • Hebrew
  • Hindi
  • Hungarian
  • Icelandic
  • Indonesian
  • Korean
  • Latvian
  • Lithuanian
  • Norwegian
  • Persian
  • Polish
  • Romanian
  • Serbian
  • Slovak
  • Slovenian
  • Swedish
  • Tagalog
  • Tamil
  • Telugu
  • Thai
  • Turkish
  • Ukrainian
  • Urdu
  • Vietnamese
  • Greek
  • Bosnian
  • Esperanto
  • Georgian
  • Malayalam
  • Macedonian
  • Sinhala
  • Malay (individual language)
Contact us at hello@paperswithcode.com. Papers With Code is a free resource with all data licensed under CC-BY-SA.