Sunday, January 13, 2013

Data Mining of the Concept “End of the World” in Twitter Microblogs (Part II)


I would like to present new results from my study of the concept "end of the world." I analyzed tweets containing the keywords "end" and "world." Previous results are shown here. Looking at the dynamics of frequent keyword sets, one can single out the following groups according to when their frequency reaches its maximum:
1) frequent sets with the maximum 7-8 days before the event on December 21
2) frequent sets with the maximum 1-3 days before December 21
3) frequent sets with the maximum exactly on December 21
4) frequent sets with the maximum after December 21
5) frequent sets which change periodically
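
The grouping above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study: the function names, the dictionary of daily counts, and the sample numbers are all hypothetical, and the sketch covers groups 1-4 only (detecting periodic sets, group 5, would need a separate check).

```python
from datetime import date

# Date of the anticipated event discussed in the post.
EVENT_DAY = date(2012, 12, 21)


def peak_offset(daily_counts, event_day=EVENT_DAY):
    """Days between the peak day and the event (negative means before)."""
    # On ties, max() returns the first peak day in the dict's insertion order.
    peak_day = max(daily_counts, key=daily_counts.get)
    return (peak_day - event_day).days


def classify_peak(daily_counts, event_day=EVENT_DAY):
    """Assign a frequent set to one of the peak-timing groups."""
    offset = peak_offset(daily_counts, event_day)
    if -8 <= offset <= -7:
        return "7-8 days before"
    if -3 <= offset <= -1:
        return "1-3 days before"
    if offset == 0:
        return "on the event day"
    if offset > 0:
        return "after the event"
    return "other"


# Hypothetical daily counts for one frequent keyword set.
counts = {
    date(2012, 12, 13): 40,
    date(2012, 12, 14): 95,
    date(2012, 12, 20): 60,
    date(2012, 12, 21): 70,
}
print(classify_peak(counts))
```

With these made-up counts the maximum falls on December 14, seven days before the event, so the set lands in the first group.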

Here are some examples of such frequent sets: 

Frequent sets with the maximum 7-8 days before the event on December 21





Frequent sets with the maximum 1-3 days before December 21



Support and confidence for examples of association rules
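
For readers unfamiliar with these two measures, here is a minimal sketch of how support and confidence are usually computed. It treats each tweet as a set of keywords. The helper names and the sample tweets are hypothetical and are not taken from the data set studied here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


# Hypothetical tweets, already reduced to keyword sets.
tweets = [
    {"end", "world", "mayan"},
    {"end", "world", "nasa"},
    {"end", "world", "mayan", "calendar"},
    {"end", "world"},
]

print(support(tweets, {"mayan"}))
print(confidence(tweets, {"mayan"}, {"calendar"}))
```

In this toy sample "mayan" occurs in two of four tweets, and one of those two also contains "calendar", so both the support of {"mayan"} and the confidence of the rule mayan -> calendar come out at 0.5.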



Keywords with the maximum exactly on December 21


Frequent sets with the maximum after December 21




Frequent sets which change periodically



The most interesting frequent sets are clearly those with the maximum before the anticipated event. Such keyword sets can be considered predictive markers. The shape of their maximum indicates how socially significant the discussion of the anticipated event may become. For example, guided by such prognostic markers, a government could respond where necessary with follow-up actions such as public explanations, as NASA did, since it was known beforehand that there were no scientific grounds for anxiety. Nevertheless, for part of the population the topic of doomsday on December 21, 2012 caused serious concern.

This information process is somewhat similar to a viral epidemic. It is a new characteristic property of social networks; without it, the scale of discussion of unscientific topics would be much smaller. In my further research I plan to explore methods for the automated detection of prognostic markers. I see great potential in genetic algorithms, which may give a polynomial speedup for this class of problems.
You are welcome to leave comments, so I can see whether it makes sense to continue such studies.