With the advance of natural language processing and understanding, we can deploy more factors in our prediction models, such as the news sentiment score that I will introduce here. First, a little background on sentiment analysis based on my research.
Today we can use advanced technology to analyze the textual information in news items and assign a ‘sentiment score’ to each article that may affect a stock’s price. Aggregating these sentiment scores across multiple news items or posts over a given timeframe has been found to be predictive of a stock’s future returns. According to Google Trends, interest in the term ‘sentiment analysis’ (the use of natural language processing (NLP), text analysis, and computational linguistics to identify and extract subjective information from source material) has increased dramatically over the past 5 years, as you can see from the graph below:
To use sentiment analysis to predict stock market movements, we can treat the sentiment information as a directional signal that tells us whether to go long or short a given stock within a portfolio. Many quantitative hedge funds have incorporated sentiment analysis into their trading strategies.
The reasons social sentiment has become increasingly popular for predicting stock market movements can be summarized as follows:
First, it is time-consuming for humans to read articles and interpret their attitude toward a given matter. Second, the traditional efficient market hypothesis asserts that financial markets are ‘informationally efficient’, meaning that current stock prices already reflect all known information and all past events. Therefore, investors should not be able to find arbitrage opportunities that earn excess profits if their trading strategies are based on known information. However, encouraging results from approaches such as sentiment analysis suggest that the efficient market hypothesis holds only under certain conditions.
Besides, because news is unpredictable, stock prices should follow a random walk and cannot be predicted with more than 50% accuracy. This is known as the Random Walk Theory (the theory that stock price changes have the same distribution and are independent of each other, so the past movement or trend of a stock price or market cannot be used to predict its future movement). However, recent research suggests that although news may be unpredictable, very early indicators can be extracted from online social media to predict changes in various economic and commercial indicators. Behavioral finance has also provided further evidence that financial decisions are significantly driven by emotion and mood.
Finally, short-term traders have traditionally relied on only two factors to predict returns: price and volume. Sentiment adds another short-term factor that gives investors a new source of insight.
Public sentiment may be correlated with stock prices: large-scale text sentiment analysis has revealed correlations between company-related news and a company’s trading volume and financial returns. Even so, we have to keep in mind that this type of linguistic analytics should be used to filter out the noise on the Internet and focus only on the most relevant, impactful, market-moving news, so that we get a broader view of a company, a sector, or the entire market. Some high-frequency trading (HFT) engines have successfully incorporated social sentiment as an essential element of their trading algorithms using their own mood-tracking tools, while others consume sentiment data feeds built from automated analysis of news and electronic media, along with market data and news from external libraries.
Here is an example of how to combine sentiment analysis with a trading algorithm.
The first graph here shows the backtesting outcome of this strategy. You can also check the video for more details.
As usual, let’s look at the metrics above the plot. The total return is 122.6% over a one-year period, which is attractive for investors. Alpha (α) is 1.12, which is very high, and beta (β) is 0.32, which is relatively low, as we expected. The Sharpe ratio for this period is 3.75, which implies substantial returns to compensate investors for the risk they bear. Volatility is 0.32, modest relative to the size of the return, and the maximum drawdown (MaxDD) of 14.29% is acceptable compared with the 122.6% total return. Overall, these measurements are quite impressive.
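To make these metrics concrete, here is a minimal sketch of how they are commonly computed from daily returns. It assumes 252 trading days per year and a zero risk-free rate, and its alpha is a simplified Jensen’s alpha; the backtesting platform’s exact conventions may differ, so this is not guaranteed to reproduce the figures quoted above. The function names and the toy return series are hypothetical, for illustration only.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumption: 252 trading days per year for annualizing


def total_return(daily_returns):
    """Compound daily simple returns into one total return for the period."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def annualized_volatility(daily_returns):
    """Sample standard deviation of daily returns, scaled by sqrt(252)."""
    return stdev(daily_returns) * math.sqrt(TRADING_DAYS)


def sharpe_ratio(daily_returns, risk_free_daily=0.0):
    """Annualized Sharpe ratio, assuming a constant daily risk-free rate (0 by default)."""
    excess = [r - risk_free_daily for r in daily_returns]
    return mean(excess) / stdev(excess) * math.sqrt(TRADING_DAYS)


def max_drawdown(daily_returns):
    """Largest peak-to-trough fall of the compounded equity curve, as a positive fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def beta(strategy, benchmark):
    """Covariance of strategy and benchmark returns divided by benchmark variance."""
    ms, mb = mean(strategy), mean(benchmark)
    cov = sum((s - ms) * (b - mb) for s, b in zip(strategy, benchmark))
    var = sum((b - mb) ** 2 for b in benchmark)
    return cov / var  # the (n - 1) normalizers cancel out


def annualized_alpha(strategy, benchmark):
    """Simplified Jensen's alpha with a zero risk-free rate, annualized by 252 days."""
    return (mean(strategy) - beta(strategy, benchmark) * mean(benchmark)) * TRADING_DAYS


# Toy daily returns for illustration only (not the backtest's data).
strategy = [0.012, -0.004, 0.009, 0.015, -0.007]
benchmark = [0.005, -0.003, 0.004, 0.006, -0.002]
print(f"total return: {total_return(strategy):.2%}")
print(f"volatility:   {annualized_volatility(strategy):.2f}")
print(f"Sharpe:       {sharpe_ratio(strategy):.2f}")
print(f"max drawdown: {max_drawdown(strategy):.2%}")
print(f"beta:         {beta(strategy, benchmark):.2f}")
print(f"alpha:        {annualized_alpha(strategy, benchmark):.2f}")
```

Annualizing by 252 days is why a Sharpe ratio or alpha computed over a short window can look very large; the metrics are most meaningful over a full year of returns.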
Next, let’s look at the code of the sentiment-based strategy. We import an external CSV file (I have uploaded a picture of part of the file below). As you can see, the file contains the sentiment scores of 4 securities (‘AAPL’, ‘FB’, ‘SPY’, ‘TSLA’) during 2013. I use the scores in the file as ‘bullish’ (sentiment score = 1) or ‘bearish’ (sentiment score = -1) signals to allocate the stocks into long and short baskets. The strategy uses these signals to enter positions at the market open and holds each position until the sentiment changes. The sentiment that determines the trade (short, long, or none) applies to a single trading day.
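A minimal sketch of loading the scores and mapping them to signals, using only the standard library. The column layout (a `date` column followed by one column per ticker) and the sample rows are hypothetical, since the file itself is only shown as a picture.

```python
import csv
import io

# Hypothetical sample in a wide layout (one row per day, one column per ticker).
# The real file's column names and values are not reproduced here.
SAMPLE_CSV = """date,AAPL,FB,SPY,TSLA
2013-01-02,1,-1,0,1
2013-01-03,1,1,-1,0
"""

SIGNAL_NAMES = {1: "bullish", -1: "bearish", 0: "neutral"}


def load_sentiment(text):
    """Return a list of (date, {ticker: score}) tuples in file order."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        day = record.pop("date")
        scores = {ticker: int(value) for ticker, value in record.items()}
        for ticker, score in scores.items():
            if score not in SIGNAL_NAMES:
                raise ValueError(f"unexpected score {score} for {ticker} on {day}")
        rows.append((day, scores))
    return rows


rows = load_sentiment(SAMPLE_CSV)
for day, scores in rows:
    print(day, {ticker: SIGNAL_NAMES[s] for ticker, s in scores.items()})
```

Validating that every score is -1, 0, or 1 at load time catches a malformed file before it can silently turn into unintended trades.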
The algorithm built for backtesting includes commission ($0.0075 per share, with a $1.00 minimum per trade) and slippage (often described as the ‘price impact’ of a trade: your own order moves the market, so buy orders push prices up and sell orders push them down). Including these costs makes the backtest result a more realistic approximation of live trading results.
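The cost model can be sketched as follows. The $0.0075 per-share fee with a $1.00 minimum per trade follows my reading of the figures above. The quadratic volume-share slippage is an assumed model (similar in spirit to volume-share slippage in zipline), because the exact slippage model used in the backtest is not specified here. Function names and default parameters are hypothetical.

```python
def per_share_commission(shares, cost_per_share=0.0075, min_trade_cost=1.0):
    """Commission for one fill: a per-share fee with a minimum charge per trade."""
    return max(abs(shares) * cost_per_share, min_trade_cost)


def slipped_fill_price(price, shares, bar_volume, price_impact=0.1):
    """Quadratic volume-share slippage (an assumed model, not a documented one here).

    The impact grows with the square of our share of the bar's volume. Buys
    (shares > 0) fill above the quoted price; sells (shares < 0) fill below it.
    """
    if bar_volume <= 0:
        raise ValueError("bar_volume must be positive")
    volume_share = abs(shares) / bar_volume
    impact = price * price_impact * volume_share ** 2
    return price + impact if shares > 0 else price - impact


# A 250-share buy at $100 in a minute bar that traded 10,000 shares.
print(per_share_commission(250))
print(slipped_fill_price(100.0, 250, 10_000))
```

The minimum charge matters most for small orders: with these parameters, any fill under about 134 shares pays the flat $1.00.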
As the picture below shows, the algorithm sets up two baskets, bulls and bears. Whenever a security’s sentiment score equals 1 and we do not currently hold it, we add it to the bulls basket; whenever the score equals -1, we add it to the bears basket. A score of 0 means we neither buy nor sell that security. The leverage for each basket is set to 0.5, so total leverage is 1 (no leverage), and I record the leverage throughout the whole backtest to track how it changes.
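The basket logic can be sketched in plain Python. Two assumptions are mine: a score of 0 leaves a security in whichever basket it is already in (no trade), and each basket’s 0.5 leverage is split equally among its members. The function names are hypothetical, not a backtesting platform’s API.

```python
LEVERAGE_PER_BASKET = 0.5  # 0.5 long + 0.5 short = 1.0 gross leverage


def update_baskets(bulls, bears, scores):
    """Move each ticker between baskets according to today's score.

    score 1 -> bulls, score -1 -> bears, score 0 -> keep current membership
    (assumption: a neutral score means no trade, so existing positions are held).
    """
    for ticker, score in scores.items():
        if score == 1:
            bears.discard(ticker)
            bulls.add(ticker)  # sets ignore tickers we already hold
        elif score == -1:
            bulls.discard(ticker)
            bears.add(ticker)
    return bulls, bears


def target_weights(bulls, bears):
    """Equal-weight each basket so longs sum to +0.5 and shorts to -0.5."""
    weights = {}
    for ticker in bulls:
        weights[ticker] = LEVERAGE_PER_BASKET / len(bulls)
    for ticker in bears:
        weights[ticker] = -LEVERAGE_PER_BASKET / len(bears)
    return weights


def gross_leverage(weights):
    """Sum of absolute weights, the quantity recorded over the backtest."""
    return sum(abs(w) for w in weights.values())


bulls, bears = update_baskets(set(), set(), {"AAPL": 1, "FB": -1, "SPY": 0, "TSLA": 1})
weights = target_weights(bulls, bears)
print(weights, gross_leverage(weights))
```

If one basket is empty, gross leverage drops to 0.5, which is one reason long and short exposure need not be equal over time.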
Furthermore, we run the backtest in minute mode instead of daily mode, and we set up a function that cancels all open orders at the end of each day and prints details of any long and short positions held overnight (you can check the details in the video I uploaded).
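The end-of-day housekeeping can be sketched as below. The `Order` class and the `end_of_day` function are hypothetical stand-ins; a real backtesting platform provides its own order objects and cancellation calls.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical order record; real platforms expose their own order objects."""
    ticker: str
    shares: int
    status: str = "open"


def end_of_day(open_orders, positions):
    """Cancel every still-open order, then report positions carried overnight."""
    cancelled = []
    for order in open_orders:
        if order.status == "open":
            order.status = "cancelled"
            cancelled.append(order.ticker)
    longs = {t: n for t, n in positions.items() if n > 0}
    shorts = {t: n for t, n in positions.items() if n < 0}
    print(f"cancelled orders: {cancelled}")
    print(f"overnight longs:  {longs}")
    print(f"overnight shorts: {shorts}")
    return cancelled, longs, shorts


orders = [Order("AAPL", 100), Order("FB", -50, status="filled")]
cancelled, longs, shorts = end_of_day(orders, {"AAPL": 100, "FB": -40, "SPY": 0})
```

Cancelling unfilled minute-mode orders at the close keeps a stale order from filling the next morning against a sentiment signal that may have changed.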
As the screenshot below shows, the metric is calculated from the number of bullish and bearish tweets about a security on a given day. If bullish tweets outnumber bearish tweets, we record a score of 1 for that security; if bearish tweets outnumber bullish ones, we record -1. You can find the source of the file here.
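The scoring rule can be written as a small function. Mapping ties (equal counts, including days with no tweets) to 0 is my assumption, consistent with the 0 scores that appear in the file; the tweet counts are made up.

```python
def daily_score(bullish_tweets, bearish_tweets):
    """Map one day's tweet counts for a security to a score in {-1, 0, 1}.

    More bullish than bearish tweets -> 1, more bearish -> -1,
    equal counts -> 0 (assumed tie rule).
    """
    if bullish_tweets > bearish_tweets:
        return 1
    if bullish_tweets < bearish_tweets:
        return -1
    return 0


# Hypothetical (bullish, bearish) tweet counts for one ticker over three days.
counts = [(12, 5), (3, 9), (4, 4)]
print([daily_score(bull, bear) for bull, bear in counts])
```

Note that the rule only looks at which side is larger, so a 12-to-5 day and a 500-to-499 day produce the same signal.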
I also plot the cumulative sentiment scores of these 4 securities on one graph. The mechanism is simple: if the sentiment score equals 1, I add 1 to the y variable (the cumulative score), and if it equals -1, I add -1. The x variable (time) increases by 1 for each row as I loop through the file, which has 314 rows in total, so x ends at 314.
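A sketch of this loop in plain Python. It assumes each row holds a score for every ticker (a wide layout, which is an assumption about the file), and the function name and sample rows are hypothetical.

```python
def cumulative_scores(rows):
    """Return (xs, ys): xs counts rows, ys maps each ticker to its running score.

    rows: one {ticker: score} dict per CSV row, in file order. Every row is
    assumed to contain every ticker, so each y-series has the same length as xs.
    """
    xs = []
    ys = {}
    x = 0
    for scores in rows:
        x += 1  # one step along the x-axis per row of the file
        xs.append(x)
        for ticker, score in scores.items():
            series = ys.setdefault(ticker, [])
            previous = series[-1] if series else 0
            series.append(previous + score)  # +1 for bullish, -1 for bearish, 0 unchanged
    return xs, ys


rows = [{"AAPL": 1, "SPY": -1}, {"AAPL": 1, "SPY": 0}, {"AAPL": -1, "SPY": -1}]
xs, ys = cumulative_scores(rows)
print(xs, ys)
```

With matplotlib installed, each series can then be drawn on one chart with `plt.plot(xs, ys[ticker], label=ticker)` for every ticker, followed by `plt.legend()`.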
As the graph clearly shows, the cumulative sentiment scores of AAPL, FB, and TSLA generally move upward over time, while the cumulative score of SPY (the S&P 500 ETF) generally moves downward. In short, the tweets collected over that period show generally positive sentiment about AAPL, FB, and TSLA and slightly negative sentiment about the S&P 500. Part of the code I wrote for plotting the graph is shown below.
Here are some interesting graphs from the analysis of this backtest. Unsurprisingly, our portfolio’s long exposure is generally much higher than its short exposure, since the sentiment trends of most securities in the portfolio are upward. We can also see how the allocation of each security in the portfolio changes over time. The allocations of SPY and FB are quite dynamic (they keep moving back and forth). The allocation of AAPL, however, stays high throughout and rises even further from July 2013. Comparing its allocation with the cumulative sentiment score graph above, AAPL’s sentiment score is high in general and its upward trend steepens toward the end of the period, which explains its high and consistent allocation in the portfolio.
The graph below records the daily turnover of our portfolio. Portfolio turnover measures how much of the portfolio is traded on a given day, roughly the value traded divided by the portfolio’s value; lower turnover means the strategy trades less and pays less in transaction costs. The average daily turnover is around 0.2, which is low. The daily trading volume (when a stock’s average daily trading volume is high, it can be traded easily and has high liquidity, so average daily trading volume can affect the security’s price) fluctuates around 1,000 shares over this time frame.
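Here is a minimal sketch of daily turnover under one common convention: the absolute dollar value traded in a day divided by that day’s portfolio value. Platforms differ (some halve the traded value or divide by average gross exposure), so this illustrates the idea rather than the exact metric in the graph; the function name and inputs are hypothetical.

```python
def daily_turnover(fill_values_per_day, portfolio_values):
    """Turnover per day = sum of |dollar value of each fill| / portfolio value.

    fill_values_per_day: for each day, a list of signed fill values (price * shares).
    portfolio_values: the portfolio value for each corresponding day.
    """
    turnover = []
    for fills, value in zip(fill_values_per_day, portfolio_values):
        traded = sum(abs(v) for v in fills)  # buys and sells both count
        turnover.append(traded / value)
    return turnover


# Day 1: buy $5,000 and sell $3,000 in a $40,000 portfolio; day 2: no trades.
turnover = daily_turnover([[5000.0, -3000.0], []], [40000.0, 41000.0])
print(turnover, sum(turnover) / len(turnover))
```

Under this convention a turnover of 0.2 means that, on an average day, trades worth about a fifth of the portfolio’s value are executed.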
Besides calculating sentiment scores directly, there are several related ways to develop a trading strategy. Research has found a correlation between changes in large-scale information-gathering behavior on Wikipedia and market participants’ trading decisions. The researchers compared changes in Wikipedia usage with subsequent stock market movements in historical data by implementing a hypothetical investment strategy that uses data on either Wikipedia page views or Wikipedia page edits to trade the Dow Jones Industrial Average (DJIA). I will briefly show you the result in the graph below; you can check this link for the full details.
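To illustrate the kind of rule involved, here is a sketch of the weekly strategy as I understand it from the published descriptions: if attention in week t-1 (page views or edits) is above its average over the preceding `window` weeks, sell the DJIA at the close of the first trading day of week t and buy it back one week later; otherwise buy and then sell one week later. Treating a zero change as a buy, the default 3-week window, and measuring returns as log returns are assumptions of this sketch, and the function name is hypothetical.

```python
import math


def weekly_strategy_returns(volumes, prices, window=3):
    """Weekly log returns of an attention-based long/short rule on an index.

    volumes[t]: attention in week t (e.g. page views).
    prices[t]: index close on the first trading day of week t.
    For week t, compare volumes[t-1] with the mean of the `window` weeks before it.
    Rising attention -> short from prices[t] to prices[t+1];
    otherwise -> long from prices[t] to prices[t+1].
    """
    returns = []
    for t in range(window + 1, len(prices) - 1):
        baseline = sum(volumes[t - 1 - window:t - 1]) / window
        delta = volumes[t - 1] - baseline
        log_move = math.log(prices[t + 1]) - math.log(prices[t])
        returns.append(-log_move if delta > 0 else log_move)
    return returns


# Made-up series: attention jumps in week 3, then the index rises 10% in week 4.
rising = weekly_strategy_returns([10, 10, 10, 20, 10, 10], [100, 100, 100, 100, 110, 121])
falling = weekly_strategy_returns([10, 10, 10, 5, 10, 10], [100, 100, 100, 100, 110, 121])
print(rising, falling, sum(rising))
```

Summing the weekly log returns gives the strategy’s cumulative log return, which is the quantity such studies typically plot over time.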
You can see that the returns of the Wikipedia article-view strategies over the period are significantly higher than the returns of the random strategies. There is also an algorithm related to this strategy that you may be interested in (you can check the link here).
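The comparison with random strategies can be sketched as follows: each random strategy flips a fair coin every week to decide whether to go long or short, and a signal-driven strategy is judged by how many of these random outcomes it beats. This is a simplified baseline of my own, not necessarily how the published study constructed its random strategies; the function names are hypothetical.

```python
import math
import random


def random_strategy_returns(prices, n_strategies=1000, seed=0):
    """Cumulative log returns of strategies that go long or short at random each week.

    prices[t] is the index level at each weekly decision point. A fixed seed
    makes the baseline reproducible.
    """
    rng = random.Random(seed)
    moves = [math.log(prices[t + 1]) - math.log(prices[t]) for t in range(len(prices) - 1)]
    results = []
    for _ in range(n_strategies):
        # Each week: keep the move (long) or flip its sign (short) with probability 1/2.
        results.append(sum(m if rng.random() < 0.5 else -m for m in moves))
    return results


def fraction_beaten(signal_return, random_returns):
    """Share of random strategies whose cumulative return is below the signal's."""
    return sum(r < signal_return for r in random_returns) / len(random_returns)


baseline = random_strategy_returns([100, 110, 105, 112, 108], n_strategies=500, seed=1)
print(fraction_beaten(0.1, baseline))
```

A signal that beats, say, 95% of the random strategies is harder to dismiss as luck than one that merely ends the period with a positive return.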
Moreover, studies have also demonstrated that Google Trends data contain signals that can be exploited in a trading algorithm. You can check this article for more details about the research and the link here for the corresponding algorithm. Here is one of the interesting results they obtained:
In their investigation, they analyzed the performance of a set of 98 Google search terms, including terms related to the concept of stock markets, and implemented a hypothetical investment strategy based on search volume data. Profits can only be made if at least some future changes in stock prices are correctly anticipated. The graph above shows the cumulative returns of the 98 investment strategies based on search volumes, using search requests from users located in the U.S. on the left side and global search volume on the right side. The two shades of blue show positive returns and the two shades of red show negative returns. The corresponding search terms are listed on the right side of each graph.
Sentiment-based algorithms rely heavily on the sentiment analysis results, which we either generate with our own models or import from external sources. To build our own sentiment analysis model, we could consider starting with the following steps:
I will show you more examples of sentiment analysis and trading algorithms in my next article. Please feel free to leave your comments and share your ideas here with me. I am looking forward to hearing from you. Hope you enjoy the content. Thank you!