Clustering tweets usingWikipedia concepts

Two challenging issues are notable in tweet clustering. Firstly, the sparse data problem is serious since no tweet can be longer than 140 characters. Secondly, synonymy and polysemy are rather common because users intend to present a unique meaning with a great number of manners in tweets. Enlightened by the recent research which indicates Wikipedia is promising in representing text, we exploit Wikipedia concepts in representing tweets with concept vectors. We address the polysemy issue with a Bayesian model, and the synonymy issue by exploiting the Wikipedia redirections. To further alleviate the sparse data problem, we further make use of three types of out-links in Wikipedia. Evaluation on a twitter dataset shows that the concept model outperforms the traditional VSM model in tweet clustering.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here