KorSTS

Introduced by Ham et al. in KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding

KorSTS is a dataset for semantic textual similarity (STS) in Korean. It was constructed by automatically translating the STS-B dataset. To ensure translation quality, two professional translators with at least seven years of experience, specializing in academic papers and books as well as business contracts, each post-edited half of the dataset and then cross-checked the other's translation. KorSTS comprises 5,749 training examples translated automatically and 2,879 evaluation examples translated manually.

Source: https://github.com/kakaobrain/KorNLUDatasets
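The following is a minimal sketch of how a KorSTS split might be read into sentence pairs with gold similarity scores. It assumes that the files in the source repository are tab-separated with a header row containing `sentence1`, `sentence2` and `score` columns. That assumption is not confirmed by this page and should be checked against the released files. The names `STSPair`, `read_korsts` and `load_korsts`, and the path `KorSTS/sts-train.tsv`, are hypothetical.

```python
import csv
from typing import Iterable, List, NamedTuple


class STSPair(NamedTuple):
    sentence1: str
    sentence2: str
    score: float  # gold similarity score (STS-B-style scores range from 0 to 5)


def read_korsts(lines: Iterable[str]) -> List[STSPair]:
    """Parse KorSTS-style TSV lines into sentence pairs.

    Assumption: the first line is a header naming the columns
    'sentence1', 'sentence2' and 'score'; other columns are ignored.
    """
    # QUOTE_NONE keeps quote characters inside sentences verbatim
    # instead of treating them as CSV field delimiters.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs: List[STSPair] = []
    for row in reader:
        s1, s2, score = row.get("sentence1"), row.get("sentence2"), row.get("score")
        # Skip malformed rows that are missing any of the required fields.
        if s1 is None or s2 is None or score in (None, ""):
            continue
        pairs.append(STSPair(s1, s2, float(score)))
    return pairs


def load_korsts(path: str) -> List[STSPair]:
    """Read one split from a local file (path is hypothetical)."""
    with open(path, encoding="utf-8", newline="") as f:
        return read_korsts(f)
```

For example, `train = load_korsts("KorSTS/sts-train.tsv")` would load the training split from a local clone. Comparing `len(train)` against the split sizes stated above is a quick sanity check that the format assumption holds. Malformed rows are skipped rather than raising an error, so a mismatch there is worth investigating.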
