4 dataset results for Domain Adaptation AND Texts AND Chinese

ASPEC (Asian Scientific Paper Excerpt Corpus)

ASPEC, Asian Scientific Paper Excerpt Corpus, is constructed by the Japan Science and Technology Agency (JST) in collaboration with the National Institute of Information and Communications Technology (NICT). It consists of a Japanese-English paper abstract corpus of 3M parallel sentences (ASPEC-JE) and a Japanese-Chinese paper excerpt corpus of 680K parallel sentences (ASPEC-JC). This corpus is one of the achievements of the Japanese-Chinese machine translation project which was run in Japan from 2006 to 2010.

85 PAPERS • NO BENCHMARKS YET

KdConv (Knowledge-driven Conversation)

KdConv is a Chinese multi-domain Knowledge-driven Conversation dataset, grounding the topics in multi-turn conversations to knowledge graphs. KdConv contains 4.5K conversations from three domains (film, music, and travel), and 86K utterances with an average turn number of 19.0. These conversations contain in-depth discussions on related topics and natural transition between multiple topics, while the corpus can also used for exploration of transfer learning and domain adaptation.

21 PAPERS • NO BENCHMARKS YET

MSDA (Multi-source domain adaptation dataset for text recognition)

5 domains: synthetic domain, document domain, street view domain, handwritten domain, and car license domain over five million images

2 PAPERS • 2 BENCHMARKS

XL-R2R

XL-R2R (Cross-lingual Room-to-Room)

The XL-R2R dataset is built upon the R2R dataset and extends it with Chinese instructions. XL-R2R preserves the same splits as in R2R and thus consists of train, val-seen, and val-unseen splits with both English and Chinese instructions, and test split with English instructions only.

2 PAPERS • NO BENCHMARKS YET

Datasets

4 dataset results for Domain Adaptation AND Texts AND Chinese