TWEETSUM: Event oriented Social Summarization Dataset

COLING 2020  ·  Ruifang He, Liangliang Zhao, Huanyu Liu ·

With social media becoming popular, a vast of short and noisy messages are produced by millions of users when a hot event happens. Developing social summarization systems becomes more and more critical for people to quickly grasp core and essential information. However, the publicly available and high-quality large scale social summarization dataset is rare. Constructing such corpus is not easy and very expensive since short texts have very complex social characteristics. In this paper, we construct TWEETSUM, a new event-oriented dataset for social summarization. The original data is collected from twitter and contains 12 real world hot events with a total of 44,034 tweets and 11,240 users. Each event has four expert summaries, and we also have the annotation quality evaluation. In addition, we collect additional social signals (i.e. user relations, hashtags and user profiles) and further establish user relation network for each event. Besides the detailed dataset description, we show the performance of several typical extractive summarization methods on TWEETSUM to establish baselines. For further researches, we will release this dataset to the public.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here