Wednesday, May 8, 2013

Granger Causality Test for Frequent Itemsets of Keywords in Financial Tweets

Following the well-known work of Bollen, Mao, and Zeng (Bollen, Johan, Huina Mao, and Xiaojun Zeng. "Twitter mood predicts the stock market." Journal of Computational Science 2.1 (2011): 1-8.), we collected tweets from Twitter accounts that cover financial news. We downloaded the tweets of the following users:
"CNNMoney", "TheStreet", "FoxBusiness", "SeekingAlpha", "WallstCS", "themotleyfool", "MarketWatch", "CNBC", "ReutersBiz", "WSJ", "YahooFinance", "MicroFundy", "chartly", "MarketBeat", "ReutersTV", "BloombergTV", "profitly", "myrollingstocks", "BloombergNews", "Stockstobuy", "tradespoon", "stockr", "stocktwits", "FinancialBrand", "Option_Trading", "EconomicTimes".

The analysis covers a period of 100 days. We chose the frequent itemset of keywords {apple, stock}. The following figure shows how the frequency of this itemset and the AAPL stock price change over time.
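To make the notion of itemset frequency concrete, here is a minimal Python sketch of one way to compute it. It assumes that the frequency is the number of tweets per day that contain every keyword of the itemset. The function name `itemset_daily_counts`, the tokenization rule, and the sample tweets are illustrative assumptions, not the code used for this analysis.

```python
from collections import Counter
from datetime import date
import re


def itemset_daily_counts(tweets, itemset):
    """Count, per day, the tweets whose text contains every keyword in `itemset`.

    `tweets` is an iterable of (day, text) pairs (a hypothetical input format).
    """
    wanted = {word.lower() for word in itemset}
    counts = Counter()
    for day, text in tweets:
        # Crude tokenization: lowercase runs of letters, digits, and apostrophes.
        tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
        if wanted <= tokens:  # every keyword of the itemset occurs in the tweet
            counts[day] += 1
    return counts


# Made-up sample tweets, for illustration only.
tweets = [
    (date(2013, 5, 1), "Apple stock slides after earnings"),
    (date(2013, 5, 1), "Why Apple is still a buy"),
    (date(2013, 5, 2), "Is Apple stock cheap? Analysts weigh in"),
    (date(2013, 5, 2), "APPLE STOCK rebounds"),
]

counts = itemset_daily_counts(tweets, {"apple", "stock"})
for day in sorted(counts):
    print(day, counts[day])
```

The resulting per-day counts form the time series that is later compared with the stock price.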
The graph also shows moving averages with averaging windows of 10 and 20 days. We ran Granger causality tests to determine the direction of causation between the dynamics of the frequent itemset and the stock price. In the first test, the null hypothesis is that the dynamics of the frequent itemset {apple, stock} do not Granger-cause the AAPL stock price; in the second test, the null hypothesis is that the AAPL stock price does not Granger-cause the dynamics of the frequent itemset {apple, stock}. In the output, V2 denotes the itemset frequency and V3 the AAPL stock price. The calculations were performed using R packages. We obtained the following results:
test 1
Granger causality test
Model 1: V3 ~ Lags(V3, 1:1) + Lags(V2, 1:1)
Model 2: V3 ~ Lags(V3, 1:1)
  Res.Df Df     F   Pr(>F)  
1     87                    
2     88 -1 10.05 0.002103 **
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

test 2
Granger causality test
Model 1: V2 ~ Lags(V2, 1:1) + Lags(V3, 1:1)
Model 2: V2 ~ Lags(V2, 1:1)
  Res.Df Df      F Pr(>F)
1     87                
2     88 -1 0.3261 0.5694

The p-value in the first test is 0.002103, well below the standard significance level of 0.05, so we reject the null hypothesis. The p-value in the second test is 0.5694, well above 0.05, so we cannot reject the null hypothesis. This means that the dynamics of the frequent itemset of keywords {apple, stock} in the analyzed tweets Granger-cause the dynamics of the Apple stock price: past itemset frequencies help predict the price, while we find no evidence of an effect in the opposite direction.
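The comparison of Model 1 and Model 2 in the output above is an F-test between two nested regressions. The following Python sketch reproduces that calculation for lag 1 using only the standard library. It is an illustration on synthetic data, not the R code used for the results above; the helper names `lstsq`, `rss`, and `granger_f` and the simulated series are assumptions.

```python
import random


def lstsq(X, y):
    """OLS coefficients via the normal equations and Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    M = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]


def rss(X, y):
    """Residual sum of squares of the OLS fit of y on the columns of X."""
    beta = lstsq(X, y)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def granger_f(cause, effect, lag=1):
    """F statistic for H0: lags of `cause` do not help predict `effect`."""
    n = len(effect)
    y = effect[lag:]
    own = [[effect[t - j] for j in range(1, lag + 1)] for t in range(lag, n)]
    other = [[cause[t - j] for j in range(1, lag + 1)] for t in range(lag, n)]
    restricted = [[1.0] + o for o in own]              # Model 2: own lags only
    full = [[1.0] + o + c for o, c in zip(own, other)]  # Model 1: own + other lags
    rss_r, rss_u = rss(restricted, y), rss(full, y)
    df_resid = len(y) - len(full[0])
    f_stat = ((rss_r - rss_u) / lag) / (rss_u / df_resid)
    return f_stat, lag, df_resid


# Synthetic series in which x drives y with a one-step delay (illustration only).
# 91 points give 87 residual degrees of freedom at lag 1, as in the output above.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(91)]
y = [0.0]
for t in range(1, 91):
    y.append(0.5 * y[-1] + 0.8 * x[t - 1] + random.gauss(0, 0.5))

f_xy, df1, df2 = granger_f(x, y)   # does x help predict y?
f_yx, _, _ = granger_f(y, x)       # does y help predict x?
print(f_xy, f_yx, df1, df2)
```

To turn the statistic into a p-value, it can be compared with an F distribution with (df1, df2) degrees of freedom, for example with `scipy.stats.f.sf(f_xy, df1, df2)` if SciPy is available.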
Given the causality found between the frequent itemset of keywords and the stock price, one can predict stock prices using a multivariate vector autoregressive (VAR) model. The following figure shows a forecast for AAPL based on a VAR model that uses both the historical Apple stock price dynamics and the dynamics of the frequent itemset of keywords {apple, stock}.
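As a rough illustration of the idea, the sketch below fits a first-order VAR to two series by ordinary least squares, one equation per variable, and iterates the fitted equations to produce a forecast. The lag order of 1, the helper names `lstsq`, `fit_var1`, and `forecast_var1`, and the synthetic `freq` and `price` series are assumptions made for this example; the post does not state the settings of the model behind the figure.

```python
import random


def lstsq(X, y):
    """OLS coefficients via the normal equations and Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    M = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]


def fit_var1(series):
    """Fit z_t = c + A z_{t-1} + e_t; returns one row [c_i, a_i1, a_i2, ...] per variable."""
    n = len(series[0])
    X = [[1.0] + [s[t - 1] for s in series] for t in range(1, n)]
    return [lstsq(X, s[1:]) for s in series]


def forecast_var1(coefs, last, steps):
    """Iterate the fitted equations forward from the last observed values."""
    path, z = [], list(last)
    for _ in range(steps):
        z = [row[0] + sum(a * v for a, v in zip(row[1:], z)) for row in coefs]
        path.append(z)
    return path


# Synthetic stand-ins for itemset frequency and stock price (illustration only).
random.seed(1)
freq, price = [0.0], [0.0]
for _ in range(99):
    f_prev, p_prev = freq[-1], price[-1]
    freq.append(0.3 * f_prev + random.gauss(0, 1))
    price.append(0.5 * p_prev + 0.8 * f_prev + random.gauss(0, 0.5))

coefs = fit_var1([freq, price])
path = forecast_var1(coefs, [freq[-1], price[-1]], steps=5)
for step, (f_hat, p_hat) in enumerate(path, start=1):
    print(step, f_hat, p_hat)
```

In `coefs[1]`, the second entry is the estimated effect of the previous itemset frequency on the price, which is the term that the first Granger test above found to be significant.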

Our program, Tweets Miner for Stock Market, is described in our previous blog post.
