Tuesday, December 18, 2012

Investigation of the Concept "End of the World" in Blogs Using Data Mining Methods

The end of the world, which is supposedly to take place on December 21, 2012, according to the Mayan calendar, is being discussed intensively across the information space. There are no grounds to expect natural disasters on that date; even NASA has published a clarification on the matter (http://www.nasa.gov/topics/earth/features/2012.html). However, the stream of information associated with the "end of the world" concept may itself have an impact both on individuals and on society as a whole.
Certain combinations of semantic concepts, periodically repeated by different information sources, can act as information viruses: they can affect people's mentality and behavior and give rise to certain trends in society.
Therefore, in my opinion, developing methods for detecting trends in the information stream is very promising. I have downloaded Twitter messages with the keywords "End of the world" and tried to analyze them using data mining methods, including semantic field theory, frequent sets, associative rules, opinion mining, and Galois lattices. My previous investigations of Twitter blog messages can be found here.
This approach makes it possible to identify the semantic network of concepts associated with the subject of the analysis and to construct the corresponding semantic rules. By analyzing the dynamics of the characteristics of semantic rules and the relevant frequent sets, one can find connections with various important indicators.
The tweets matching the keywords "world end", "#endoftheworld", "December 21", "Dec 21", "#December 21", and "#end of the world" were loaded into separate files.
The analysis was conducted in the following sequence. First, the stop words and the words with the highest and lowest frequencies were removed from the loaded arrays. Then the frequent sets of words with a given level of support were found. Based on the array of frequent sets, the associative rules were constructed, and I analyzed the daily dynamics of their support and confidence. Next, a semantic field of words was formed that reflects the semantic frame of the analyzed domain. Using this semantic field, I constructed Galois lattices, which represent the semantic relations between the analyzed concepts. On each resulting lattice the ideals and filters were marked; they reflect how a semantic concept is formed from other concepts, together with the support values of those concepts.
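The first steps of this sequence (stop-word removal, frequent sets, associative rules with support and confidence) can be sketched roughly as follows. This is only a minimal illustration and not the code used for the results in this post: the stop-word list, the sample tweets, and the function names `tokenize`, `frequent_itemsets`, and `association_rules` are assumptions, the removal of the highest- and lowest-frequency words is omitted, and a brute-force count stands in for a proper Apriori-style search.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "is", "on", "of", "a", "to", "will", "we", "all"}

def tokenize(tweet, stop_words=STOP_WORDS):
    """Lowercase a tweet, strip punctuation, and drop stop words."""
    words = (w.strip(".,!?\"'#:;()") for w in tweet.lower().split())
    return {w for w in words if w and w not in stop_words}

def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for word sets of up to max_size words.

    Support is the share of tweets that contain every word of the set.
    Brute-force counting; fine for a sketch, slow on large data.
    """
    n = len(transactions)
    counts = Counter()
    for words in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(words), k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

def association_rules(itemsets, min_confidence):
    """Build rules {a} => {b} from frequent pairs.

    confidence({a} => {b}) = support({a, b}) / support({a}).
    Returns a list of (a, b, support, confidence) tuples.
    """
    rules = []
    for itemset, support in itemsets.items():
        if len(itemset) != 2:
            continue
        for a in itemset:
            (b,) = itemset - {a}
            confidence = support / itemsets[frozenset([a])]
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence))
    return rules

# Made-up sample tweets, not taken from the analyzed data.
tweets = [
    "We will all die on Friday",
    "Mayan calendar ends Friday",
    "die friday hahaha",
    "mayan calendar is stupid",
]
transactions = [tokenize(t) for t in tweets]
for a, b, s, c in association_rules(frequent_itemsets(transactions, 0.5), 0.9):
    print(f"{{{a}}} => {{{b}}}  support={s:.2%}  confidence={c:.2%}")
```

Running the same functions on the tweets of each day separately would give the daily dynamics of support and confidence for a chosen rule.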
Using semantic fields of words with positive and negative sentiment, I calculated the daily dynamics of frequent sets with positive and negative sentiment.
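A simplified version of this daily calculation might look like the sketch below. It only counts, per day, the share of tweets containing at least one word from a positive or a negative field, which is a cruder measure than frequent sets. The two word fields, their sentiment labels, and the function name `daily_sentiment_share` are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sentiment fields; the assignment of words is an assumption.
POSITIVE = {"hahaha", "party", "fun"}
NEGATIVE = {"die", "dead", "dying", "stupid"}

def daily_sentiment_share(dated_tweets, positive=POSITIVE, negative=NEGATIVE):
    """Map each day to (positive share, negative share).

    dated_tweets is an iterable of (day, set_of_words) pairs. A tweet counts
    as positive (negative) if it contains at least one positive (negative)
    word, so one tweet may count toward both shares.
    """
    totals = defaultdict(int)
    pos = defaultdict(int)
    neg = defaultdict(int)
    for day, words in dated_tweets:
        totals[day] += 1
        if words & positive:
            pos[day] += 1
        if words & negative:
            neg[day] += 1
    return {d: (pos[d] / totals[d], neg[d] / totals[d]) for d in sorted(totals)}

# Made-up sample data.
sample = [
    (date(2012, 12, 17), {"die", "friday"}),
    (date(2012, 12, 17), {"hahaha", "world"}),
    (date(2012, 12, 18), {"stupid", "hahaha"}),
]
for day, (p, n) in daily_sentiment_share(sample).items():
    print(day, f"positive={p:.2f}", f"negative={n:.2f}")
```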
I present only the first results here, as the investigation is still in progress.

Some associative rules with support and confidence for the channel #endoftheworld:


Semantic field for message filtering:
 {21st, die, apocalypse, dying, dead, predict, prediction, calendar,
stupid, zombies, world, friday, mayan, hahaha}

Associative rules (rule, support, confidence):

{die}        => {friday}   6.79%   66.04%
{apocalypse} => {mayan}    0.76%   38.09%
{apocalypse} => {friday}   0.62%   30.95%
{21st}       => {world}    5.35%   40.57%
{dying}      => {friday}   1.09%   41.81%
{dead}       => {friday}   0.9%    50.0%
{predict}    => {world}    1.0%    75.0%
{calendar}   => {mayan}    3.1%    60.18%
{stupid}     => {friday}   0.9%    46.34%
{zombies}    => {friday}   0.52%   52.38%

Ideals and filters of the Galois lattice for different concepts (channel #endoftheworld):

Ideal & Filter for concept {world, die, 21st}

Ideal & Filter for concept {world, die, friday}

Ideal & Filter for concept {world, apocalypse, 21st}

Ideal & Filter for concept {die, 21st, hahaha}
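To make the notions of ideal and filter more concrete, here is a small sketch of how the concepts of a Galois lattice can be computed from tweets and how the ideal and filter of one concept can be read off. It assumes the usual formal concept analysis setting (tweets as objects, words of the semantic field as attributes) and the usual ordering, in which a concept with more words is more specific and lies lower in the lattice; the lattices in this post may have been built with a different convention. The function names and the sample data are hypothetical.

```python
def concept_intents(transactions, attributes):
    """Return all closed word sets (concept intents) of the lattice.

    These are all intersections of the tweets' word sets, restricted to the
    semantic field, plus the full field itself (the bottom concept).
    """
    field = frozenset(attributes)
    intents = {field}
    for words in transactions:
        row = frozenset(words) & field
        intents |= {row & intent for intent in intents}
    return intents

def support(intent, transactions):
    """Share of tweets that contain every word of the intent."""
    return sum(1 for words in transactions if intent <= words) / len(transactions)

def principal_filter(concept, intents):
    """Concepts above `concept`: more general ones, with fewer words."""
    return {i for i in intents if i <= concept}

def principal_ideal(concept, intents):
    """Concepts below `concept`: more specific ones, with more words."""
    return {i for i in intents if i >= concept}

# Made-up sample: four tweets already reduced to words of the semantic field.
field = {"world", "die", "21st", "friday"}
tweets = [
    {"world", "die", "21st"},
    {"world", "die", "friday"},
    {"world", "21st"},
    {"die", "friday"},
]
intents = concept_intents(tweets, field)
concept = frozenset({"world", "die"})
for name, group in (("filter", principal_filter(concept, intents)),
                    ("ideal", principal_ideal(concept, intents))):
    for intent in sorted(group, key=lambda s: (len(s), sorted(s))):
        print(name, sorted(intent), f"support={support(intent, tweets):.2f}")
```

In this sketch the filter lists the more general concepts from which the chosen concept is built up, and the ideal lists the more specific concepts that refine it, each with its support.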


The first results for the tweets with the keywords {"end", "world"}:

Examples of associative rules with support and confidence:

die        => 21st   0.0018    0.11
worry      => 21st   0.00337   0.31
apocalypse => 21st   0.00043   0.06

The dynamics of support for the associative rule {end, world, mayan} => {friday}




Later on I will describe the results in detail. I also intend to develop software that analyzes information trends in a given subject area using semantic field theory, frequent sets, associative rules, opinion mining, and Galois lattices.
I think it is time to study a new information object, which could be called an "informational mental virus": one that can produce latent associative rules in the subconscious and thereby influence human behavior.
