CPB: a classification-based approach for burst time prediction in cascades
- Additional Document Info
- View All
© 2015, Springer-Verlag London. Studying the bursty nature of cascades in social media is practically important in many real applications such as product sales prediction, disaster relief, and stock market prediction. Although both the cascade size prediction and the burst patterns of the cascades have been extensively studied, how to predict when a burst will come remains an open problem. It is challenging for traditional time-series-based models such as regression models to address this task directly. Firstly, times-series-based prediction models focus on predicting the future values based on previously observed ones. It is hard to apply them to predict the time of a bursts with the “quick rise-and-fall” pattern. Secondly, besides the cascade popularity, a lot of other side information like user profile and social relation are available in social media. Although the potential utility of such information can be high, it is also hard for time-series-based models to capture and integrate these rich information with diverse formats seamlessly. This paper proposes a classification-based approach for burst time prediction by exploiting rich knowledge in information diffusion. Particularly, we first propose a time-window-based transformation to predict in which time window the burst will appear. By dividing the time spans of all the cascades into the same number of time windows K, the cascades with diverse time spans can thus be handled uniformly. To exploit the rich and heterogenous information in social media, we next propose a scale-independent feature extraction framework to model the heterogenous knowledge in a scale-independent manner. Systematical evaluations are conducted on the Sina Weibo reposting dataset and MemeTracker dataset. Besides the superior performance of the proposed approach, we also observe that: (1) surprisingly, social/structure knowledge is more indicative of the bursts than the cascade popularity information, especially for the bursts occurring in a farther future. (2) Larger cascades are harder to predict as the spreading process of the cascades with higher popularity is usually more diverse and fluctuant. (3) The proposed approach is robust in the sense that the result is not much sensitive to the popularity of the training cascades.
author list (cited authors)
Wang, S., Yan, Z., Hu, X., Yu, P. S., Li, Z., & Wang, B.