6 dataset results for Sentiment Analysis AND Chinese

Chinese social media suicide risk and cognitive distortions classification


1 PAPER • NO BENCHMARKS YET

CH-SIMS

CH-SIMS is a Chinese single- and multimodal sentiment analysis dataset containing 2,281 refined in-the-wild video segments with both multimodal and independent unimodal annotations. It allows researchers to study the interaction between modalities or to use the independent unimodal annotations for unimodal sentiment analysis.

17 PAPERS • 1 BENCHMARK

GeoCoV19

GeoCoV19 is a large-scale Twitter dataset containing more than 524 million multilingual tweets. Around 378K of these tweets are geotagged, and 5.4 million carry Place information. The annotations identify toponyms in the user location field and in the tweet content and resolve them to geolocations at the country, state, or city level. In total, 297 million tweets are annotated with a geolocation derived from the user location field and 452 million with one derived from the tweet content.

3 PAPERS • NO BENCHMARKS YET

LSICC

LSICC (Large Scale Informal Chinese Corpus)

Large Scale Informal Chinese Corpus (LSICC) is a large-scale corpus of informal Chinese. It contains around 37 million book reviews and 50 thousand netizens' comments on news articles.

1 PAPER • NO BENCHMARKS YET

NAIST COVID

NAIST COVID is a multilingual dataset of social media posts related to COVID-19, consisting of microblogs in English and Japanese from Twitter and those in Chinese from Weibo. The data cover microblogs from January 20, 2020, to March 24, 2020.

1 PAPER • NO BENCHMARKS YET

WikiSem500

The WikiSem500 dataset contains around 500 per-language cluster groups for English, Spanish, German, Chinese, and Japanese (a total of 13,314 test cases).

4 PAPERS • NO BENCHMARKS YET
