54 dataset results for Korean

533 parallel examples sampled from TACRED, translated into Russian and Korean (plus 3 additional examples in Russian), accompanied by translations of a list of trigger words collected for the different relations.

1 PAPER • NO BENCHMARKS YET

WEATHub

WEATHub is a dataset covering 24 languages. It contains words organized into groups of (target1, target2, attribute1, attribute2) to measure the association target1:target2 :: attribute1:attribute2. For example, target1 can be insects and target2 can be flowers, and the test measures whether insects or flowers are more strongly associated with pleasant or unpleasant words. In our paper, these word associations are quantified with the WEAT metric, which calculates an effect size (Cohen's d) and also provides a p-value to measure the statistical significance of the results. We apply these tests to word embeddings from language models to understand biased associations in language models across different languages.
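The effect size and p-value described above can be sketched as follows. This is a minimal illustration that assumes the standard WEAT formulation (cosine similarity, a pooled sample standard deviation, and a one-sided permutation test); it is not necessarily the exact procedure used in the WEATHub paper. The function names and the toy 2-dimensional vectors are hypothetical stand-ins for real word embeddings.

```python
import numpy as np


def cosine_matrix(U, V):
    """Pairwise cosine similarities between the rows of U and the rows of V."""
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return U @ V.T


def association(W, A, B):
    """s(w, A, B): mean similarity to attribute set A minus mean similarity to B."""
    return cosine_matrix(W, A).mean(axis=1) - cosine_matrix(W, B).mean(axis=1)


def weat_effect_size(X, Y, A, B):
    """Cohen's-d-style effect size for targets X, Y and attributes A, B."""
    s_x = association(X, A, B)
    s_y = association(Y, A, B)
    pooled = np.concatenate([s_x, s_y])
    # Assumption: sample standard deviation (ddof=1) over all target words.
    return (s_x.mean() - s_y.mean()) / pooled.std(ddof=1)


def weat_p_value(X, Y, A, B, n_perm=1000, seed=0):
    """One-sided permutation p-value, estimated from random re-partitions of X and Y."""
    rng = np.random.default_rng(seed)
    s = association(np.vstack([X, Y]), A, B)
    n_x = len(X)
    observed = s[:n_x].sum() - s[n_x:].sum()
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(s))
        stat = s[perm[:n_x]].sum() - s[perm[n_x:]].sum()
        exceed += stat > observed
    return exceed / n_perm


# Toy vectors (hypothetical): X leans toward A, Y leans toward B.
A = np.array([[1.0, 0.0], [0.9, 0.1]])   # e.g. "pleasant" attribute words
B = np.array([[0.0, 1.0], [0.1, 0.9]])   # e.g. "unpleasant" attribute words
X = np.array([[1.0, 0.1], [0.8, 0.2]])   # e.g. "flowers" target words
Y = np.array([[0.1, 1.0], [0.2, 0.8]])   # e.g. "insects" target words

d = weat_effect_size(X, Y, A, B)
p = weat_p_value(X, Y, A, B)
print(f"effect size: {d:.3f}, p-value: {p:.3f}")
```

A positive effect size indicates that the first target set is more strongly associated with the first attribute set than the second target set is. With word lists as small as this toy example, the permutation test has very few distinct partitions, so the p-value is only meaningful for realistically sized lists.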

1 PAPER • NO BENCHMARKS YET

WiTA (Writing in The Air)

WiTA (Writing in The Air) is a dataset for the challenging writing in the air (WiTA) task -- an elaborate task bridging vision and NLP. The dataset consists of five sub-datasets in two languages (Korean and English) and amounts to 209,926 video instances from 122 participants. Finger movement for WiTA is captured with RGB cameras to ensure wide accessibility and cost-efficiency.

1 PAPER • NO BENCHMARKS YET

Deeply Korean read speech

The Deeply Korean read speech corpus contains recordings of pairs of Korean speakers reading a script with 3 distinct text sentiments (negative, neutral, positive) in 3 distinct voice sentiments (negative, neutral, positive). The recordings took place in 3 types of places with differing levels of reverberation: an anechoic chamber, a studio apartment, and a dance studio. To examine the effect of the device and of the microphone's distance from the source, every experiment was recorded at 3 distinct distances with 2 types of smartphone, an iPhone X and a Galaxy S7.

0 PAPER • NO BENCHMARKS YET

Deeply Parent-Child vocal interaction

Deeply Parent-Child Vocal Interaction contains recordings of interactions between 24 parent-child pairs (48 speakers in total), such as reading fairy tales, singing children’s songs, conversing, and others. The recordings took place in 3 types of places with differing levels of reverberation: an anechoic chamber, a studio apartment, and a dance studio. To examine the effect of the device and of the microphone's distance from the source, every experiment was recorded at 3 distinct distances with 2 types of smartphone, an iPhone X and a Galaxy S7.

0 PAPER • NO BENCHMARKS YET

NSMC (Naver Sentiment Movie Corpus)

This is a movie review dataset in the Korean language. Reviews were scraped from Naver Movies.

0 PAPER • 1 BENCHMARK
