WeiboCluster: An Event-Oriented Sina Weibo Dataset with Estimating Credit
Springer International Publishing AG, part of Springer Nature 2018. The earliest and most famous micro-blogging platform is Twitter, which was created in 2006. In China, however, Sina Weibo, a latecomer, has grown larger than Twitter and plays a vital role in social media. With eight times more users than Twitter, Sina Weibo faces more severe rumor problems. In recent years, deep learning has been applied to natural language processing (NLP). For example, a contextual LSTM model, a kind of recurrent neural network, was employed to solve large-scale NLP tasks. NLP technology aims to extract latent information from text, which makes it well suited to rumor detection. Neural networks, however, depend on a data set for training. Unfortunately, there is no suitable Sina Weibo data set for NLP. To solve this problem, this paper proposes a process for collecting micro-blog source data for rumor detection. The process is event-oriented and introduces the concept of credit (or confidence) into the final dataset, which makes the dataset distinctive and useful.