As I noted in my last article, interest in sentiment analysis has been on the rise since 2008 (Google Trends). To further verify this trend and its relationship with the stock market, I ran the following test using Google Correlate (you feed in keywords or topics, and it tells you which search terms are most closely correlated with your data). I wanted to see which terms are associated with ‘stock returns’. In the list of terms correlated with ‘stock returns’, ‘opinions’ ranks highest, with a correlation coefficient of about 0.91 (the list is generated from the weekly trend of each term over time). The following graph is the linear regression plot for the two terms, and it shows a clear linear relationship between them.
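To make the two numbers above concrete, here is a minimal sketch of how a correlation coefficient and a regression line can be computed from two weekly series. The weekly values are made up for illustration, and the function names are my own; this is not how Google Correlate works internally, only the textbook calculation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ols_line(xs, ys):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# Hypothetical weekly search-interest values, for illustration only.
stock_returns_interest = [10, 12, 15, 14, 18, 21]
opinions_interest = [20, 25, 29, 30, 35, 43]

r = pearson_r(stock_returns_interest, opinions_interest)
slope, intercept = ols_line(stock_returns_interest, opinions_interest)
print(f"r = {r:.2f}, fitted line: y = {slope:.2f} * x + {intercept:.2f}")
```

A coefficient close to 1 means the two series move together week by week; it says nothing yet about which one drives the other.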
This test further supports a correlation between social sentiment and stock prices/returns. The next question is: how does social sentiment (opinion) affect the stock market?
Here we perform two more tests to try to answer that question and to check how certain financial events (which directly affect social sentiment) influence a stock’s returns. The most common such event may be the post-earnings-announcement drift: the tendency of a stock’s cumulative abnormal returns to drift for several weeks following a positive earnings announcement. There are a number of hypotheses for this phenomenon, and the most widely accepted explanation is investors’ underreaction to earnings announcements. As a result, a company’s stock price moves in the direction of the earnings surprise. Furthermore, the movement does not only happen immediately after the earnings surprise; the price continues to drift in the direction of the surprise for several days afterwards. There are opportunities to profit from this drift, and there are many other events whose impact on stock prices you can study and test, in general or for specific stocks, such as share buybacks, stock splits, or board change announcements.
The data we use here comes from Accern (I will explain it later), which analyzes 20m+ blogs and news articles available on the web and summarizes them into two sentiment measurements:
Article Sentiment (-1 to 1): this metric is the sentiment score of an article that is relevant to a company.
Impact Score (1 to 100): this metric estimates whether the article will have a greater-than-1% impact on the stock on the same trading day.
We quantify the language used in financial news stories in an effort to describe and predict stock returns. Analyzing the complete set of events allows us to identify common patterns in firm responses and market reactions to events. You can find more details on the method I am using in this post. First, we start with events that generated negative sentiment and test their influence on stock prices and abnormal returns.
We create filters to narrow down the stocks based on their sentiment and impact scores. We keep only the stocks with a sentiment score lower than -0.5 (and higher than -0.99) and a corresponding impact score higher than 85. I rank the final dataset by impact score in ascending order. The picture below shows only part of the result.
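The filter described above can be sketched in a few lines of plain Python. The record layout and the field names (`ticker`, `article_sentiment`, `impact_score`) are assumptions made for this illustration; the actual Accern columns may be named differently.

```python
def filter_negative_events(records, sent_low=-0.99, sent_high=-0.5, min_impact=85):
    """Keep records with sent_low < sentiment < sent_high and impact > min_impact,
    sorted by impact score in ascending order."""
    kept = [
        rec for rec in records
        if sent_low < rec["article_sentiment"] < sent_high
        and rec["impact_score"] > min_impact
    ]
    return sorted(kept, key=lambda rec: rec["impact_score"])


# Hypothetical rows, for illustration only.
sample = [
    {"ticker": "AAA", "article_sentiment": -0.7, "impact_score": 95},
    {"ticker": "BBB", "article_sentiment": -0.3, "impact_score": 99},  # not negative enough
    {"ticker": "CCC", "article_sentiment": -0.9, "impact_score": 88},
    {"ticker": "DDD", "article_sentiment": -1.0, "impact_score": 97},  # too extreme
    {"ticker": "EEE", "article_sentiment": -0.6, "impact_score": 60},  # impact too low
]

for rec in filter_negative_events(sample):
    print(rec["ticker"], rec["article_sentiment"], rec["impact_score"])
```

For the positive-news study the same function could be reused with mirrored sentiment bounds.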
Here we calculate the abnormal returns (relative to the market, i.e. S&P 500 returns) from t-20 to t+20, where t is our ‘asof_date’. Negative events/news seem to be incorporated into stock prices gradually: abnormal returns start decreasing 20 days prior to the event date, and the slope gets steeper as we approach the event date, which is day 0 here (abnormal returns drop by around 2.5% on average). After day 0 there is a slight reversal in stock prices, which suggests that the previously priced-in negative information was an overreaction to the actual news.
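As a rough sketch of the calculation, assume the simplest market-adjusted model, where the abnormal return on a day is the stock’s return minus the market’s return on the same day. The study’s actual implementation may differ, and the function names and input layout here are my own.

```python
def abnormal_returns(stock_rets, market_rets, event_idx, window=20):
    """Daily abnormal returns (stock minus market) from t-window to t+window."""
    start, end = event_idx - window, event_idx + window
    if start < 0 or end >= len(stock_rets) or len(stock_rets) != len(market_rets):
        raise ValueError("not enough aligned data around the event date")
    return [stock_rets[i] - market_rets[i] for i in range(start, end + 1)]


def cumulative(values):
    """Running sum, i.e. cumulative abnormal return (CAR) along the window."""
    total, path = 0.0, []
    for value in values:
        total += value
        path.append(total)
    return path


def average_car(events, window=20):
    """Average the CAR path across events.

    `events` is a list of (stock_rets, market_rets, event_idx) tuples.
    """
    paths = [cumulative(abnormal_returns(s, m, i, window)) for s, m, i in events]
    return [sum(p[k] for p in paths) / len(paths) for k in range(2 * window + 1)]


# Two tiny made-up events with a 1-day window, for illustration only.
events = [
    ([0.01, 0.02, -0.03, 0.01], [0.0, 0.01, 0.0, 0.0], 1),
    ([0.0, 0.0, 0.01], [0.0, 0.0, 0.0], 1),
]
print(average_car(events, window=1))
```

With the real data, the middle element of the averaged path corresponds to day 0, and the shape of the path before and after it is what the graph above shows.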
However, given the large number of stocks included, and since negative-sentiment articles vary in degree, the result here may not efficiently capture the linguistic information or reflect market movement. To check the result, we also calculate the standard deviation of the abnormal returns. You can read the last graph together with the standard deviation graph. Not surprisingly, the standard deviation is generally high; this is common in event studies. The filter we added at the beginning to exclude articles at the extreme end of the sentiment scale (scores at or below -0.99) was also meant to remove extreme events and lower the standard deviation.
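The standard deviation check can be sketched as follows: for each day relative to the event, take the values of all events on that day and compute their mean and sample standard deviation. The input layout (one list of abnormal returns per event, all of the same length) is an assumption for this example.

```python
from statistics import mean, stdev


def dispersion_by_day(paths):
    """Return a (mean, standard deviation) pair for each relative day.

    `paths` holds one equal-length list per event; at least two events are
    needed for a sample standard deviation.
    """
    if len(paths) < 2:
        raise ValueError("need at least two events")
    return [(mean(day), stdev(day)) for day in zip(*paths)]


# Made-up abnormal-return paths for three events over three days.
paths = [
    [0.00, -0.01, -0.03],
    [0.00, 0.01, -0.01],
    [0.00, -0.03, 0.01],
]
for offset, (avg, sd) in enumerate(dispersion_by_day(paths), start=-1):
    print(f"day {offset:+d}: mean={avg:+.4f}, std={sd:.4f}")
```

A standard deviation that is large relative to the mean, as in this toy data, is a sign that the average path alone should be read with caution.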
The high standard deviation implies that further investigation is needed. One possible approach is to group the stocks by sector, index, or some other combination of factors, in order to narrow down the stocks and get a more accurate result.
Using the same approach, I also study positive news and its impact on stock returns. I filter the stocks by their sentiment and impact scores, and order them by date (from the most recent to the oldest).
It seems that positive news may not be incorporated into stock prices efficiently. Stock returns show a sharp increase only around 3 days prior to the publishing date. This upward momentum in abnormal returns lasts from a few days before day 0 to 20 days after day 0, and the slope seems to be constant throughout this period. However, the volatility of these abnormal returns is still high in general.
Positive-sentiment articles are associated with an upward drift of about 1% in the stock price, and about 0.3% prior to the release date.
These event studies give us a general feel for how certain events may influence stock returns, so that you can build your own strategy to exploit the potential profits suggested by the study. In both cases, stock prices start moving in the direction of the sentiment before day 0, a phenomenon caused by asymmetric information among investors.
Based on the results of these studies, people tend to overreact to negative news more than to positive news. After positive news is released, the trend tends to continue for a while. This is the opposite of how stock prices move on negative news (prices reverse after the news is actually released).
Moreover, we can also run the test on a single stock. Here I use AA (Alcoa Inc) as an example. We can check its article sentiment and impact score over any time frame. I plotted the sentiment score together with the impact score from 2015-01-01 to 2016-01-21 on the same graph (multiplying article sentiment by 100 so that both are on a comparable scale).
The graph below is the cumulative article sentiment over this time period.
To better check whether sentiment has any correlation with stock returns, I also plotted the stock’s daily returns and its sentiment score over time on the same graph. We can see that a high sentiment score usually coincides with a spike in stock returns, at the same time or with only a small delay. Conversely, a negative sentiment score tends to come with a negative spike in stock returns.
As I mentioned before, the sentiment data I am using here comes from Accern (you can access their database directly on Quandl). The database is updated daily and contains a sentiment score (based on the sentiment of articles and news written about the company in the last day; the higher the score, the more positive the outlook) and an impact score (the probability that the stock price will change by more than 1%, measured as (close price - open price) / open price, on the next trading day).
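The price change that the impact score refers to is a simple open-to-close return. A minimal sketch, assuming that "change by more than 1%" means a move in either direction (my reading, not something the data provider states here):

```python
def intraday_return(open_price, close_price):
    """Open-to-close return: (close - open) / open."""
    return (close_price - open_price) / open_price


def moved_more_than(open_price, close_price, threshold=0.01):
    """True if the absolute open-to-close move exceeds the threshold (1% by default)."""
    return abs(intraday_return(open_price, close_price)) > threshold


# Hypothetical prices, for illustration only.
print(intraday_return(100.0, 102.0))   # a 2% gain
print(moved_more_than(100.0, 100.5))   # half a percent: below the threshold
print(moved_more_than(100.0, 98.0))    # a 2% drop: above the threshold
```

Counting how often `moved_more_than` is true after high-impact articles is one way to sanity-check the impact score against realized prices.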
There are multiple sentiment data sources to choose from, such as PsychSignal, which employed several people with a psychology background to create an engine that can track 12 different emotions, including anger, sadness, and love. Clients receive two scores: bullishness and bearishness. Defining and tracking multiple emotions might be helpful, since ‘emotional responses are a significant factor in the real-time processing of financial risks, even among the most rational investors in the economy’, and behavioral finance maintains that investors are often guided by irrational feelings such as overconfidence, overreaction, herd mentality, loss aversion, fear, greed, or simply optimism and pessimism.
Next, I will show an example of using this database in a trading algorithm. I used CustomFactors in the trading algorithm of a previous strategy (you can check this link for the algo).
First of all, I define three CustomFactors:
- Impact score: the previous day’s impact score;
- Sentiment score: a class that calculates the moving average of the sentiment score over n days;
- Average dollar volume: the 20-day average of the product of close price and trading volume.
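The arithmetic behind these three factors can be sketched in plain Python. This is not the Quantopian CustomFactor code from the algorithm, only the underlying calculations; the function names are hypothetical, and each input list is assumed to be ordered from oldest to newest with yesterday as its last element.

```python
def latest_impact(impact_history):
    """Previous day's impact score: the last value in the trailing history."""
    return impact_history[-1]


def moving_average(values, n):
    """Simple moving average of the last n values."""
    if len(values) < n:
        raise ValueError("not enough history for the requested window")
    return sum(values[-n:]) / n


def average_dollar_volume(closes, volumes, n=20):
    """Average of close * volume over the last n days."""
    if len(closes) != len(volumes) or len(closes) < n:
        raise ValueError("need n aligned closes and volumes")
    products = [c * v for c, v in zip(closes[-n:], volumes[-n:])]
    return sum(products) / n


# Made-up history for one stock, for illustration only.
sentiment = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
impact = [70, 82, 91]
closes = [10.0, 20.0]
volumes = [100, 200]

print(latest_impact(impact))
print(moving_average(sentiment, 5))
print(average_dollar_volume(closes, volumes, n=2))
```

In a pipeline-style framework, each of these would typically be computed for every stock on every day over a trailing window.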
From the screenshot below, you can see that I set the stop loss for each individual stock to 99%. I calculate the average sentiment score over the previous 5 days and add this factor to my pipeline. I filter down the stocks with two conditions: daily trading volume > 10⁷ and 200-day moving average > 200. I also create the ‘top_impact’ variable, which keeps only the stocks with the top 100 impact scores; the average sentiment factor is ranked within this group to generate a new factor called ‘sentiment_rank’. The leverage on each side is -/+0.5 (1 in total).
Then I narrow down the stocks in my portfolio further by keeping only those with an impact score higher than 85, and rank them by the ‘sentiment_rank’ factor in descending order (from highest to lowest). I add the top 5 stocks to my long basket and the bottom 5 to my short basket.
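The selection step can be sketched like this. The input layout and the equal weighting within each basket are assumptions for illustration; the algorithm itself may size positions differently.

```python
def build_baskets(factors, min_impact=85, basket_size=5, side_leverage=0.5):
    """Return target weights: +side_leverage split across the top-ranked stocks,
    -side_leverage split across the bottom-ranked ones.

    `factors` maps a symbol to {"impact": ..., "sentiment": ...}.
    """
    eligible = [
        (symbol, values["sentiment"])
        for symbol, values in factors.items()
        if values["impact"] > min_impact
    ]
    ranked = sorted(eligible, key=lambda pair: pair[1], reverse=True)
    longs = [symbol for symbol, _ in ranked[:basket_size]]
    # Avoid putting the same stock on both sides when few stocks are eligible.
    shorts = [symbol for symbol, _ in ranked[-basket_size:] if symbol not in longs]

    weights = {}
    for symbol in longs:
        weights[symbol] = side_leverage / len(longs)
    for symbol in shorts:
        weights[symbol] = -side_leverage / len(shorts)
    return weights


# Hypothetical factor values, for illustration only.
factors = {
    "A": {"impact": 90, "sentiment": 0.8},
    "B": {"impact": 95, "sentiment": -0.6},
    "C": {"impact": 70, "sentiment": 0.9},  # dropped: impact too low
    "D": {"impact": 88, "sentiment": 0.1},
}
print(build_baskets(factors, basket_size=1))
```

With a full universe and `basket_size=5`, each long gets a weight of 0.1 and each short -0.1, which adds up to a gross leverage of 1.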
The graph below shows the backtest result on the sample data from the Quantopian Accern database; my testing time frame is 09-01-2012 to 01-19-2016. You can also check the video for more details of this backtest.
As you can see, the total return is 104.9% over this period, and the β is 0.28, which implies low exposure to market risk. The Sharpe ratio is 2.10, which means the strategy generates relatively high risk-adjusted returns. The backtest also shows low volatility and a low maximum drawdown (MaxDD), both of which are characteristics we want from the strategy.
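For readers who want to reproduce these statistics from a series of daily returns, here is a minimal sketch using the common textbook definitions. The 252-day annualization and the zero risk-free rate are assumptions, and the backtester may compute its figures slightly differently.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods=252):
    """Annualized Sharpe ratio from daily returns."""
    excess = [r - risk_free_daily for r in daily_returns]
    return mean(excess) / stdev(excess) * sqrt(periods)


def beta(strategy_returns, market_returns):
    """Covariance of strategy and market returns divided by market variance."""
    mean_s, mean_m = mean(strategy_returns), mean(market_returns)
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(strategy_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    return cov / var


def max_drawdown(daily_returns):
    """Largest peak-to-trough decline of the compounded equity curve (<= 0)."""
    equity = peak = 1.0
    worst = 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


# Made-up daily returns, for illustration only.
strategy = [0.004, -0.002, 0.003, 0.001, -0.001]
market = [0.010, -0.008, 0.006, 0.002, -0.004]
print(sharpe_ratio(strategy), beta(strategy, market), max_drawdown(strategy))
```

A beta well below 1 together with a high Sharpe ratio is what you would hope to see from a long/short strategy with balanced exposure on each side.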
I have also uploaded part of the trading log below.
Here is the backtest analysis of this strategy. You can check my previous article for a detailed explanation of each picture in this analysis. We can see that the average 6-month rolling Sharpe ratio is about 2.0. The rolling Fama-French single-factor betas for SMB, HML, and UMD (UMD, up-minus-down, is the return of the momentum-factor-mimicking portfolio) show that our strategy had high positive SMB and momentum exposure (meaning the strategy is betting that small stocks will outperform large-cap stocks, and it tends to go long stocks that have recently gone up and short those that have gone down) and high negative HML exposure in the middle of the backtesting period. These exposures then tend to move back toward 0 from 2015 onward. The average gross leverage of the backtest is around 1.2, and the long exposure is generally higher than the short exposure over time.
I also split the backtesting period into two parts, in-sample and out-of-sample (OOS), with the breakpoint at 2015-01-01. You can check their performance separately in the picture below.
From the Bayesian analysis below (for uncertainty quantification), we can tell that the distribution of OOS mean returns lies to the left of the in-sample one, and so does the OOS annual volatility. In the difference-of-volatility graph on the right, the right gray line is below 0; hence we can conclude that, with probability greater than 97.5%, the OOS annual volatility is lower than the in-sample volatility.
I hope you enjoyed the content. Please feel free to leave comments or share your ideas with me here. Thank you!