The analysis of modern
social networks is widely used in many business areas, such as marketing,
forecasting, and financial and stock markets. Marketing, predictive analytics
and risk management are parts of business intelligence (BI). BI provides
reporting and analysis that help make business decisions and show what
happened and why. We would like to
consider the possibility of using data mining methods for
unstructured data streams in BI
solutions. Such data can be gathered from different sources, e.g. social
network streams, specialized forums, RSS channels, etc. In particular, we study how
this type of analysis can be applied to predictive analytics and risk management.
Let us consider the basics of these
areas. Predictive analytics
is an area of data mining that deals with extracting information from data and
using it to predict trends and behavior patterns. Predictive models analyze
past performance to assess how likely a customer is to exhibit a specific
behavior, in order to improve marketing effectiveness. Given the number of
competing services available, businesses need to focus their efforts on maintaining
continuous consumer satisfaction and rewarding consumer loyalty. It is therefore
important to analyze users' opinions, which can be retrieved from users' messages
in social networks. Predictive analytics can also predict such customer behavior, so
that the company can take proper actions to increase customer activity. Apart from identifying prospects,
predictive analytics can also help to identify the most effective combination
of product versions, marketing material, communication channels and timing that
should be used to target a given consumer. Predictive analytics and the mining of social network streams can also be
used to identify high-risk fraud candidates in business or the public sector. Another area where social network
streams can be applied is risk management, which deals with the identification, assessment, and prioritization of risks.
Social network streams make it possible to reveal quantitative characteristics
of the background factors for the processes under analysis. Monitoring the indicators of social network streams
makes it possible to control the probability and/or impact of unfortunate events or to
maximize the realization of opportunities. Missing knowledge can be retrieved
from the semi-structured data of message streams in social networks. This additional
knowledge can also help to optimize the analyzed processes and minimize overall
risk. Text
stream mining enables us to reveal the dynamics of different risk sources by
analyzing quantitative indicators retrieved from social network data. One of the
important factors in risk management is users' opinion about some entity, e.g. a
process, a service, etc. Such opinions can be retrieved by applying sentiment mining
to the informational streams of social networks. Modern business intelligence systems
widely use analytical methods for unstructured and semi-structured data
gathered from different sources.
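As a minimal illustration of how sentiment mining can turn a message stream into a quantitative opinion indicator, the sketch below scores each message with a tiny hand-made lexicon and averages the scores per day. The lexicon, the tokenization rule and the names `message_score` and `daily_opinion` are hypothetical; this is only an illustration and not necessarily the sentiment method behind the results shown here. A real system would use a curated lexicon or a trained classifier.

```python
import re
from statistics import mean

# Hypothetical toy lexicon (an assumption of this sketch): word -> polarity weight.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}


def message_score(text):
    """Sum the lexicon weights of the lowercase word tokens of one message."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)


def daily_opinion(messages_by_day):
    """Map each day (in sorted order) to the mean sentiment score of its messages."""
    return {
        day: mean(message_score(text) for text in texts)
        for day, texts in sorted(messages_by_day.items())
        if texts
    }


if __name__ == "__main__":
    # Hypothetical messages grouped by day.
    stream = {
        "2013-05-01": ["I love the new camera, great!", "Terrible battery, bad support"],
        "2013-05-02": ["Good update", "I hate the price"],
    }
    print(daily_opinion(stream))  # prints {'2013-05-01': 0.5, '2013-05-02': -0.5}
```

The resulting day-to-score mapping is an ordinary time series, so it can be compared with other indicators in the same way as frequent-set statistics.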
We would like to show the possibility of analyzing
economic and financial indicators using textual data streams and the
informational streams of social networks, special-purpose forums, and RSS
channels. Consider
the data mining of social network streams. To receive information streams, we
used the Twitter API and special Python software for web scraping of special-purpose
forums. The theoretical basis for the analysis
was semantic field theory, formal concept analysis, the
theory of frequent sets and association rules, and sentiment mining methods. For
predictive analytics we used ARIMA and VAR models. The Granger causality test was
used to detect causal relations (in the Granger sense) between time series.
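To make the frequent-set terminology concrete: the support of a keyword itemset is the fraction of messages containing all of its keywords, and the confidence of a rule A → B is support(A ∪ B) / support(A). The sketch below computes both for one time window by brute-force enumeration of small itemsets (not the Apriori algorithm, and not necessarily the implementation behind our results); the function names and the sample keyword lists are hypothetical. Applying these functions to each day's messages yields time series of support and confidence.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(messages, min_support, max_size=2):
    """Return {itemset: support} for keyword itemsets of up to max_size words
    whose support (share of messages containing every word) >= min_support.
    Brute-force enumeration: adequate for short messages and small max_size."""
    counts = Counter()
    for words in messages:
        unique = sorted(set(words))
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    n = len(messages)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def count_containing(messages, itemset):
    """Number of messages that contain every keyword of the itemset."""
    items = set(itemset)
    return sum(1 for words in messages if items <= set(words))


def support(messages, itemset):
    """Fraction of messages containing the whole itemset (0.0 for no messages)."""
    return count_containing(messages, itemset) / len(messages) if messages else 0.0


def confidence(messages, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: support(A ∪ B) / support(A)."""
    n_a = count_containing(messages, antecedent)
    if n_a == 0:
        return 0.0
    return count_containing(messages, set(antecedent) | set(consequent)) / n_a


if __name__ == "__main__":
    # Hypothetical keyword lists extracted from tweets, grouped by day.
    by_day = {
        "day1": [["apple", "iphone"], ["apple", "price"], ["iphone", "camera"]],
        "day2": [["apple", "iphone"], ["apple", "iphone", "price"]],
    }
    for day, msgs in by_day.items():
        print(day, support(msgs, ["apple", "iphone"]), confidence(msgs, ["apple"], ["iphone"]))
```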
As a result of the data mining of text messages, we obtain time
series of various quantitative characteristics of blog messages, e.g. the support
and confidence of association rules. The next step is to find
correlations between the time series produced by social network
data mining and the time series that represent real stock markets. At this
step, we need to find those time series of social media trends that not only
correlate with stock market series but also have predictive potential. Data
visualization and infographics, on the basis of which an expert makes a decision,
are very important for decision-making in risk management. That is why
we attached great importance to various methods of representing our results.
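The search for social media series that correlate with stock market series can start with a lagged Pearson correlation: pair the social series at day t with the market series at day t + lag and correlate them. A high correlation at a positive lag hints that the social series leads the market, but it does not by itself establish the predictive potential required above. The sketch is a plain-Python illustration; the function names, the sample series and the convention that a positive lag means "social leads market" are assumptions of this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a constant series
    return sxy / math.sqrt(sxx * syy)


def lagged_correlations(social, market, max_lag):
    """r[lag] = corr(social[t], market[t + lag]) for lag = 0..max_lag.
    A positive lag pairs today's social value with a later market value."""
    result = {}
    for lag in range(max_lag + 1):
        x = social[: len(social) - lag]
        y = market[lag:]
        n = min(len(x), len(y))
        result[lag] = pearson(x[:n], y[:n])
    return result


if __name__ == "__main__":
    # Hypothetical series: the market series repeats the social one two days later.
    social = [1, 3, 2, 5, 4, 6, 8, 7]
    market = [0, 0] + social[:-2]
    for lag, r in lagged_correlations(social, market, 3).items():
        print(lag, round(r, 3))
```

In practice, trending series often correlate spuriously, so differenced or detrended series are usually safer inputs for this kind of screening.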
As our previous studies show, it is very important to detect and remove anomalous communities that were dynamically formed in tweet streams. We also showed that it is important to
single out the tweets of competent users and main influencers. They can be found
using different methods of graph theory.
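One graph-theoretic way to single out influencers, offered here as an illustration rather than as the method of our studies, is PageRank on a directed graph in which an edge u → v means that user u retweets or mentions user v. The sketch below is a plain power iteration; the edge semantics, the damping factor 0.85 and the sample user names are assumptions.

```python
def pagerank(edges, damping=0.85, max_iter=100, tol=1e-10):
    """PageRank scores for a directed graph given as (source, target) pairs.
    Nodes without outgoing edges spread their rank uniformly over all nodes,
    so the returned scores sum to 1."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in out_links[u]:
                new_rank[v] += damping * rank[u] / len(out_links[u])
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical retweet/mention edges: (who retweets, whom they retweet).
    edges = [("ann", "hub"), ("bob", "hub"), ("cat", "hub"), ("cat", "bob"), ("hub", "ann")]
    scores = pagerank(edges)
    for user, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{user}: {score:.3f}")
```

Users with the highest scores are candidate influencers; their tweets can then be weighted separately or inspected by an expert.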
As an example, consider the dynamics of popularity of some cosmetic brands, based on the downloaded tweet streams. Fig. 1-4 show the results obtained. We used
various graph types to visualize our results. Such
graphs may be used in business intelligence dashboards. They may also provide additional business information for experts
in marketing, predictive analytics, and risk management.
Fig.1
Fig.2
Fig.3
Fig.4
Let us consider the dynamics of the chosen brands, based on the analysis of messages from economic forums. These messages were downloaded from the forums using the corresponding Python software. Fig. 5-7 show the obtained results in different graphical presentations.
Fig.5
Fig.6
Fig.7
Now we
consider the dynamics of quantitative characteristics of one company, based on
the analysis of downloaded tweet streams. We chose Apple as an
example. Fig. 8-9 show
the graphs with the dynamics of keyword frequent itemsets and the dynamics of users' opinion. These results reflect the dynamics of the popularity of Apple products and users' opinion towards them.
Fig.8
Fig.9
Our next step is to consider whether it is possible to predict Apple stock prices on the basis of the obtained time series of keyword frequent itemsets. In our previous studies, we conducted the Granger test for the time series of frequent itemsets and Apple stock prices. This test showed that the time series of frequent
itemsets of the analyzed tweet stream Granger-causes the peculiarities of the
stock price dynamics. We use the VAR model to analyze the possibility of predicting stock
prices. This model takes into account both the dynamics of stock prices and the
dynamics of some chosen frequent itemsets. Fig. 10-12 show the calculation results
with different sets of frequent itemsets. The bold
points are the predicted values that were calculated on the basis of previous
historical data. Fig. 10-11 show the predictions three days ahead, and Fig. 12 shows the prediction one day
ahead. The confidence interval is marked in grey.
Fig.10
Fig.11
Fig.12
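For readers unfamiliar with the Granger test, the sketch below shows its core: y is regressed on its own p lags (restricted model) and on its own lags plus p lags of x (unrestricted model), and the F statistic measures how much the lags of x reduce the residual sum of squares. It is a plain-Python illustration with hypothetical function names and synthetic data, not the code behind our results. A p-value can be obtained from the F distribution with (df_num, df_den) degrees of freedom, e.g. with `scipy.stats.f.sf`, and statsmodels provides a ready-made `grangercausalitytests` function. "Granger causality" means predictive precedence, not causation in the everyday sense.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= factor * m[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def ols_rss(rows, target):
    """Fit target ~ rows by ordinary least squares; return the residual sum of squares."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2 for r, y in zip(rows, target))


def granger_f(x, y, p):
    """F statistic for H0: p lags of x add no predictive information about y
    beyond p lags of y. Returns (F, df_num, df_den)."""
    restricted, unrestricted, target = [], [], []
    for t in range(p, len(y)):
        y_lags = [y[t - i] for i in range(1, p + 1)]
        x_lags = [x[t - i] for i in range(1, p + 1)]
        restricted.append([1.0] + y_lags)
        unrestricted.append([1.0] + y_lags + x_lags)
        target.append(y[t])
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    df_num, df_den = p, len(target) - 2 * p - 1
    f_stat = ((rss_r - rss_u) / df_num) / (rss_u / df_den)
    return f_stat, df_num, df_den


if __name__ == "__main__":
    random.seed(0)
    # Synthetic data: y follows x with a one-step delay plus noise.
    x = [random.gauss(0, 1) for _ in range(300)]
    y = [0.0] + [0.8 * x[t - 1] + 0.3 * random.gauss(0, 1) for t in range(1, 300)]
    print("x -> y:", granger_f(x, y, 2))  # large F: lags of x help predict y
    print("y -> x:", granger_f(y, x, 2))  # small F: lags of y do not help predict x
```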
In the VAR model, we included the time series of keyword frequent itemsets and of users'
opinions. The
obtained data show that on some analyzed intervals the VAR model proved to be
effective in the predictive analytics approach to stock market forecasting. In our further studies we are going to concentrate on algorithms
for effectively selecting the sets of frequent-itemset time series, in order
to reduce the confidence interval and make more accurate predictions for
longer time periods.
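The VAR model can be illustrated with its simplest form, VAR(1), in which every variable is regressed on the previous values of all variables: Y_t = c + A Y_{t-1} + e_t. The sketch below fits each equation by ordinary least squares and produces iterated multi-step forecasts, like the three-days-ahead predictions described above. It is a plain-Python illustration on synthetic data; the lag order 1, the variable roles (a frequent-itemset support series and a price change series) and the function names are assumptions. It omits confidence intervals; for real use, the `VAR` class in statsmodels (`statsmodels.tsa.api.VAR`) handles estimation, lag-order selection and forecast intervals.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy; a is not modified
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= factor * m[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def fit_var1(series):
    """Fit Y_t = c + A Y_{t-1} + e_t by OLS, one equation per variable.
    series: list of observations, each a list of k variable values.
    Returns (c, A): k intercepts and a k x k coefficient matrix (list of rows)."""
    k = len(series[0])
    rows = [[1.0] + list(prev) for prev in series[:-1]]  # regressors: 1, Y_{t-1}
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k + 1)] for i in range(k + 1)]
    c, a = [], []
    for v in range(k):
        target = [obs[v] for obs in series[1:]]
        xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k + 1)]
        beta = solve(xtx, xty)
        c.append(beta[0])
        a.append(beta[1:])
    return c, a


def forecast_var1(c, a, last, steps):
    """Iterate the fitted model forward from the last observation for `steps` steps."""
    current, path = list(last), []
    for _ in range(steps):
        current = [c[i] + sum(a[i][j] * current[j] for j in range(len(current)))
                   for i in range(len(current))]
        path.append(current)
    return path


if __name__ == "__main__":
    random.seed(1)
    # Synthetic system: variable 0 stands in for a frequent-itemset support series,
    # variable 1 for daily price changes that partly follow variable 0.
    c_true, a_true = [0.1, 0.0], [[0.5, 0.0], [0.4, 0.3]]
    obs = [[0.2, 0.0]]
    for _ in range(5000):
        prev = obs[-1]
        obs.append([c_true[i] + sum(a_true[i][j] * prev[j] for j in range(2))
                    + random.gauss(0, 0.1) for i in range(2)])
    c_hat, a_hat = fit_var1(obs)
    print("estimated A:", [[round(v, 2) for v in row] for row in a_hat])
    print("3-step forecast:", forecast_var1(c_hat, a_hat, obs[-1], 3))
```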
Our previous similar investigations can be found at:
Granger Causality Test for Frequent Itemsets of Keywords in Financial Tweets
Forecasting of the winners and favorites of Eurovision Song Contest 2013
We also
list selected scientific e-prints
in which we described the theoretical grounds of the social network mining methods
used in our studies:
B. Pavlyshenko
Tweets Miner for Stock Market Analysis
In this paper, we present a
software package for the data mining of Twitter microblogs for use in
stock market analysis. The package is written in R using the
appropriate R packages. We considered the model of tweets and then compared
stock market charts with frequent sets of keywords in Twitter microblog
messages.
B. Pavlyshenko
Can Twitter Predict Royal Baby's Name?
We analyze the
possible correlation between the public opinion of Twitter users and the
decision-making of persons who are influential in society. In our study, we
use methods of quantitative natural language processing, the theory of
frequent sets, and algorithms for the visual display of users' communities. It
was revealed that the structure of dynamically formed users' communities
participating in the discussion is determined by only a few leaders who
significantly influence the viewpoints of other users.
B. Pavlyshenko
Forecasting of Events by Tweet Data Mining
This paper describes the analysis
of quantitative characteristics of frequent sets and association rules in the
posts of Twitter microblogs related to discussions of different events. For the
analysis, we used the theory of frequent sets and association rules and the theory of
formal concept analysis. We revealed the frequent sets and association rules
that characterize the semantic relations between the concepts of the analyzed
subjects. The support of some frequent sets reaches its global maximum some time
before the expected event. Such frequent sets may be
considered as predictive markers that characterize the significance of expected
events for blogosphere users. We showed that the time dynamics of confidence in
some revealed association rules can also have predictive characteristics.
Exceeding a certain threshold may signal a corresponding reaction in
society within the time interval between the maximum and the probable occurrence of
the event. In this paper, we considered two types of events: the Olympic tennis
tournament final in London, 2012, and the Eurovision 2013 contest, whose winner we predicted.
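The two predictive markers described in this abstract, the global maximum of a frequent set's support and the first day a rule's confidence exceeds a threshold, are simple to extract from a daily series. The sketch below assumes each series is a chronologically ordered list of (day, value) pairs; the function names, the threshold value and the data are hypothetical, and it does not reproduce the cited paper's procedure.

```python
def global_max_day(series):
    """Return the (day, value) pair with the largest value; ties keep the earliest day."""
    return max(series, key=lambda item: item[1])


def first_crossing(series, threshold):
    """Return the first day whose value exceeds the threshold, or None if none does."""
    for day, value in series:
        if value > threshold:
            return day
    return None


if __name__ == "__main__":
    # Hypothetical daily support of one frequent set and confidence of one rule.
    support_series = [("05-10", 0.02), ("05-11", 0.05), ("05-12", 0.09), ("05-13", 0.04)]
    confidence_series = [("05-10", 0.20), ("05-11", 0.35), ("05-12", 0.60), ("05-13", 0.55)]
    print("support peak:", global_max_day(support_series))
    print("confidence first above 0.5:", first_crossing(confidence_series, 0.5))
```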