In my last article, I posted some pictures and tables showing how to evaluate an algorithm's performance and risk characteristics. In this article, I will go through each picture and explain it in a bit more detail. I will also run more tests on my portfolio related to this topic. Hopefully, these tests and explanations will give you a comprehensive view of how to evaluate and understand an algorithm.
It is helpful to run the backtest over a long time period, since we only have access to historical data. Here is the updated backtesting result of my strategy (posted in my last article), which runs from 2010-12-31 to 2015-12-31 (5 years in total).
You can see that the returns trend upward in general over the last five years. From the metrics, we can see that the total returns are almost twice the benchmark's returns. The alpha is 0.31 and the beta is -0.33, which means the strategy actually tends to move in the opposite direction of the market over time. The Sharpe ratio is high enough to suggest the result is significant to some extent: for every unit of risk we take, we get 1.64 units of return back. The volatility is 0.17, which is relatively low for this long a time frame. But the MaxDD is still fairly high, which I will explain in detail later.
From the graph below, we can see that the cumulative returns are generally higher than the benchmark's (S&P 500) over the last 5 years. We can also see that the earnings from 2015 increased dramatically, so we could perform a detailed analysis of our strategy for 2015 to figure out the reason(s) behind it, which may also help us make the strategy more resilient to a dynamic market.
We intend to create a portfolio that not only generates high returns but also has a low correlation to the market, which implies that the portfolio is hedged. Therefore, we are looking for a low strategy β, ideally around 0. Here we use a long/short technique to keep our rolling portfolio β low. The rolling β of this strategy is relatively stable and only approaches -1 over a short period between 2012 and 2013.
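The rolling β behind a plot like this can be computed directly from the two return series. Below is a minimal sketch using pandas, with synthetic data standing in for the strategy and benchmark returns (the function name and the 126-day window are my own choices, roughly 6 months of trading days):

```python
import numpy as np
import pandas as pd

def rolling_beta(strategy_returns, benchmark_returns, window=126):
    """Rolling beta of the strategy vs. the benchmark over a trailing window."""
    # Beta is the rolling covariance with the benchmark divided by
    # the benchmark's rolling variance.
    cov = strategy_returns.rolling(window).cov(benchmark_returns)
    var = benchmark_returns.rolling(window).var()
    return cov / var

# Synthetic example: a strategy built as -0.3 * market + noise
# should show a rolling beta hovering near -0.3.
rng = np.random.default_rng(0)
market = pd.Series(rng.normal(0.0005, 0.01, 500))
strat = -0.3 * market + pd.Series(rng.normal(0.0, 0.002, 500))

beta = rolling_beta(strat, market)
```

Plotting `beta` over the backtest dates would reproduce the rolling β chart above.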
As for the rolling Sharpe ratio over 6 months, it means we take data from the last 6 months to calculate the risk-adjusted returns for the next period. The blue dotted line shows the long-term rolling average Sharpe ratio, which is around 1.2. From the plot below, we can see how the rolling Sharpe ratio deviates around this average line. What we care about is consistency: how well the last 6 months of data predict the Sharpe ratio for the next 6 months.
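A 6-month rolling Sharpe ratio can be sketched the same way; the window length, annualisation factor, and synthetic return series below are all my own illustrative assumptions:

```python
import numpy as np
import pandas as pd

def rolling_sharpe(returns, window=126, periods_per_year=252):
    """Annualised Sharpe ratio over a trailing window (~6 months of daily data)."""
    mean = returns.rolling(window).mean()
    std = returns.rolling(window).std()
    return np.sqrt(periods_per_year) * mean / std

# Synthetic daily returns (~10 years) with a true annualised Sharpe near 1.3.
rng = np.random.default_rng(1)
daily = pd.Series(rng.normal(0.0008, 0.01, 2520))

rs = rolling_sharpe(daily)
long_term_avg = rs.mean()  # analogous to the blue dotted average line
```

Note how noisy the rolling estimate is even when the underlying process is stationary; that noise is exactly why the article looks at its consistency around the long-term average rather than any single value.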
I briefly explained the graph below in my last article. We are looking at 3 common risk factors and how the portfolio's returns spread among them. For example, if your strategy is highly correlated with SMB, your strategy has higher exposure to small-cap stocks than to large-cap stocks. In other words, your strategy is effectively betting that small-cap stocks will outperform large-cap stocks. Here, our portfolio seems to have high risk exposure to the momentum factor, which means our strategy tends to long stocks that have recently gone up and short stocks that have recently gone down, or the reverse.
We are looking for a portfolio with relatively high α. (Alpha is often considered the active return on an investment; we can interpret it as the residual performance left over after accounting for risk, since evaluating the return of an investment without regard to the risk taken offers very little insight into how a security or portfolio has really performed.) A portfolio with high α requires low exposure to each individual factor, so it is better for these factor betas to move around 0. If they are too high or too low, we would assume that the person who built the algorithm may not have found a good way to control those factors, and as a result the portfolio may suffer unintended exposure over time. Ultimately, we need to make sure that our strategy longs and shorts similarly sized baskets of stocks on both sides of each factor.
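Factor exposures like these are typically estimated by regressing the strategy's returns on the factor return series. Here is a minimal sketch with ordinary least squares on synthetic data; the factor names follow the Fama-French/momentum convention, but the numbers and the deliberate 0.8 momentum loading are fabricated for illustration:

```python
import numpy as np

# Synthetic daily factor returns: market (MKT), size (SMB), momentum (UMD).
rng = np.random.default_rng(2)
n = 1000
mkt = rng.normal(0.0004, 0.01, n)
smb = rng.normal(0.0001, 0.005, n)
umd = rng.normal(0.0002, 0.006, n)

# A strategy constructed to load heavily on momentum (true beta 0.8).
strat = 0.8 * umd + 0.05 * mkt + rng.normal(0.0, 0.002, n)

# OLS: strat = alpha + b_mkt*MKT + b_smb*SMB + b_umd*UMD + noise
X = np.column_stack([np.ones(n), mkt, smb, umd])
coefs, *_ = np.linalg.lstsq(X, strat, rcond=None)
alpha, beta_mkt, beta_smb, beta_umd = coefs
```

The regression recovers the large momentum beta and the near-zero SMB beta, which is the pattern the plot above reveals for my portfolio.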
From the screenshot below, we can identify the long and short concentration of our portfolio. The % of each stock represents that stock's weight within the portfolio: the higher the position concentration, the higher your portfolio's exposure to it. The maximum concentration on my portfolio's long side is 2.9% ('NCT') and -2.6% ('BTU') on the short side, which are appropriate amounts, since a low concentration percentage implies low exposure of the portfolio to any single stock.
The graph below shows information about the worst drawdown periods. The darker the color, the worse the drawdown, and the width of the colored area indicates the duration of the drawdown. We can identify the pattern more directly from the underwater plot beneath it. When we consider the influence of the MaxDD, we can think of it this way: it represents how much your portfolio could lose from a peak to a trough at once. (If you are going to apply leverage to your portfolio to make the returns more impressive, then you need to be prepared for the max drawdown to increase accordingly, and to ask whether you can accept that amount of loss.)
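The max drawdown statistic itself is straightforward to compute from a return series: track the running peak of the cumulative return curve and take the worst decline from it. A minimal sketch (the function name is mine; the tiny return series is contrived so the answer is easy to check by hand):

```python
import pandas as pd

def max_drawdown(returns):
    """Largest peak-to-trough decline of the cumulative return curve."""
    wealth = (1 + returns).cumprod()       # cumulative growth of $1
    running_peak = wealth.cummax()         # highest point reached so far
    drawdown = wealth / running_peak - 1.0 # the "underwater" series
    return drawdown.min()

# Rise 10%, fall 20%, partially recover: MaxDD is the -20% drop from the peak.
r = pd.Series([0.10, -0.20, 0.05])
mdd = max_drawdown(r)
```

The intermediate `drawdown` series is exactly what the underwater plot visualises.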
The following plots give you a sense of the consistency of the returns the strategy generates over calendar periods. The first one is the heat map, in which darker green means higher returns in a given month and darker red means a larger loss in a given month. From the next plot, we can tell that the portfolio performed well in 2011 and 2015 on an annualized basis. As for the last one, we can see that the most frequent returns over the 5 years are around 0 and the average monthly return is around 1%.
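The numbers behind a monthly-returns heat map come from compounding daily returns within each calendar month and arranging them in a year-by-month table. A sketch with synthetic data (the dates and return statistics are invented; only the aggregation logic matters):

```python
import numpy as np
import pandas as pd

# ~2 years of synthetic daily returns on business days.
rng = np.random.default_rng(3)
idx = pd.date_range("2011-01-03", periods=504, freq="B")
daily = pd.Series(rng.normal(0.0005, 0.01, len(idx)), index=idx)

# Compound daily returns within each calendar month.
monthly = daily.groupby(daily.index.to_period("M")).apply(lambda r: (1 + r).prod() - 1)

# Pivot into the year-by-month layout used by the heat map.
table = pd.DataFrame({
    "year": monthly.index.year,
    "month": monthly.index.month,
    "ret": monthly.values,
}).pivot(index="year", columns="month", values="ret")
```

Feeding `table` into a heat-map plotting routine reproduces the calendar view.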
The stress event plots tell us how our portfolio performed compared with SPY (the S&P 500 benchmark) during special events or periods, such as the US downgrade/European debt crisis. We are looking at how our portfolio's returns, represented by the green line on each graph, react to these events.
One thing to look for in the gross leverage and long/short exposure plots below is whether any intentional risk management measures were taken for our portfolio. If these series wander all over the place instead of staying stable, or if they follow a trend over time, we should consider correcting our strategy so as to keep the portfolio on track and keep these measurements stable over time.
As I mentioned before, the returns and drawdowns of our strategy are amplified by applying leverage to the portfolio. In order to check the original performance of our portfolio without leverage, it is better to set the leverage equal to 1 for testing purposes (which means no leverage).
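The amplification effect is easy to demonstrate with a crude approximation in which levered daily returns are simply the unlevered returns scaled by the leverage factor (this ignores financing costs and rebalancing, so it is only an illustration, not how a broker actually computes it):

```python
import pandas as pd

def max_drawdown(returns):
    """Largest peak-to-trough decline of the cumulative return curve."""
    wealth = (1 + returns).cumprod()
    return (wealth / wealth.cummax() - 1.0).min()

base = pd.Series([0.02, -0.05, 0.01, -0.03, 0.04])  # leverage = 1
levered = 2.0 * base  # crude 2x leverage approximation

mdd_base = max_drawdown(base)
mdd_levered = max_drawdown(levered)
```

The levered drawdown is roughly twice as deep as the unlevered one, which is why comparing the leverage-1 backtest against the levered one isolates the strategy's intrinsic performance.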
From the two graphs below, you can see that the MaxDD and volatility are much smaller, but so are the Sharpe ratio and total returns. The β of each of the Fama-French factors is smaller too, which means the risk exposure coming from those factors shrinks at the same time.
We also want to know about the performance of the strategy not only over the in-sample backtesting period but also over an out-of-sample time frame. It is currently not possible for us to access returns from a live-traded algorithm in the research environment, but you can manually set a breakpoint that separates the backtesting period into in-sample and out-of-sample parts and measure performance separately in each. It is very easy to overfit an algorithm so that it only works well within a specific time frame, so comparing in-sample and out-of-sample (OOS) data is necessary to evaluate the strategy. The out-of-sample period here is from 2014-01-01 to 2015-12-31.
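Mechanically, the breakpoint just partitions one returns series into two. A sketch with synthetic data and the same 2014-01-01 split (the helper function is mine; in the pyfolio tear sheets used on Quantopian the equivalent is the `live_start_date` argument):

```python
import numpy as np
import pandas as pd

def split_returns(returns, live_start_date):
    """Split a dated return series into in-sample and out-of-sample pieces."""
    in_sample = returns.loc[returns.index < live_start_date]
    out_of_sample = returns.loc[returns.index >= live_start_date]
    return in_sample, out_of_sample

# Synthetic daily returns over the article's full backtest window.
rng = np.random.default_rng(4)
idx = pd.date_range("2011-01-03", "2015-12-31", freq="B")
returns = pd.Series(rng.normal(0.0005, 0.01, len(idx)), index=idx)

insample, oos = split_returns(returns, pd.Timestamp("2014-01-01"))
```

Any statistic from the earlier sections (Sharpe, MaxDD, volatility) can then be computed on `insample` and `oos` separately and compared.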
Unlike the whole-sample cumulative returns plot, this one has a cone that gives us an indication of how our algorithm performs OOS. The size of the cone represents the range in which the OOS returns would be expected to fall during the out-of-sample period (2014-01-01 to 2015-12-31).
The next three distribution plots compare the in-sample and OOS return distributions. The first one standardizes both distributions to have the same mean of 0 and standard deviation of 1; the other two plots relax this standardization.
What's more, we can also apply Bayesian statistical tests to our strategy for uncertainty quantification (they try to determine how likely certain outcomes are when some aspects of the system are not exactly known). All the values that you saw above, like the Sharpe ratio, are just single numbers, and these estimates are noisy since they are computed over a limited number of data points. With the help of Bayesian statistics, we can work with a probability distribution that assigns degrees of belief to all possible parameter values over a certain time period.
The first Bayesian cone graph is similar to the cone plot you saw before, but it takes uncertainty into account (i.e., a short backtest will result in a wider cone), and it does not assume normality of returns; instead it uses a Student's t-distribution with heavier tails.
The second row compares the mean returns of the in-sample backtest and the OOS (forward) period. As you can see, the mean return here is not a single number but a distribution, which gives us an indication of how certain we can be in our estimates. That is why the green distribution on top of the blue one is much wider: it represents our increased uncertainty due to having less OOS data. We can then calculate the difference between these two distributions; the outcome is on the right side. The two vertical grey lines denote the 2.5% and 97.5% percentiles. Intuitively, if the right grey line is below 0, you can say that with probability > 97.5% the OOS returns are below what the backtest suggests. Here our difference distribution seems roughly centered around 0, so the returns of the two periods are statistically similar to each other. (The model here is called BEST.)
The next couple of rows follow the same pattern but estimate the annual volatility, the Sharpe ratio, and their respective differences. We can tell from the graphs that our strategy's OOS volatility seems smaller than the in-sample one, and the Sharpe ratio seems slightly higher than the in-sample one.
The 5th row shows the effect size, the difference of the means normalised by the standard deviation, which is intended to give you a general sense of how far apart the two distributions are. Intuitively, even if the means are significantly different, the result may not be very meaningful if the standard deviations are huge, since the difference would then amount to only a tiny separation between the two return distributions.
The 6th row shows the predicted returns based on the backtest for tomorrow and for 5 days from now. The blue line indicates the probability of losing more than 5% of your portfolio value and can be interpreted as a Bayesian VaR estimate. (VaR is the loss of portfolio value that would be reached or exceeded with a given probability α over a certain time horizon; it is usually described by 3 variables: the amount of potential loss, the probability of that loss, and the time frame. Bayesian VaR additionally takes into account parameter uncertainty and the non-linear relationship between ordinary and logarithmic returns.)
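For comparison, here is the plain (non-Bayesian) empirical version of VaR, which simply reads a quantile off the historical return distribution. The synthetic returns and the 5% level are illustrative assumptions; the Bayesian variant in the tear sheet replaces the single historical sample with posterior predictive draws:

```python
import numpy as np

# Synthetic daily returns standing in for a backtest history.
rng = np.random.default_rng(5)
daily = rng.normal(0.0005, 0.01, 10_000)

# Empirical 95% VaR: the daily loss exceeded with 5% probability.
alpha = 0.05
var_95 = -np.percentile(daily, 100 * alpha)
```

For these parameters the 95% VaR comes out around 1.6% of portfolio value per day, i.e. on roughly 1 day in 20 the portfolio loses more than that.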
Lastly, there is a Bayesian estimate of annual α and β. In addition to the uncertainty estimate, this model assumes returns to be t-distributed, which leads to more robust estimates than a standard linear regression would give. The default benchmark is the S&P 500.
We can also pose our own questions, such as the probability that the Sharpe ratio is larger than 0, by checking the percentage of posterior samples of the Sharpe ratio that are > 0:
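Once you have posterior samples, answering such a question is one line: count the fraction of samples satisfying the condition. In this sketch the posterior draws are faked with a normal distribution for illustration; in practice they would come from the Bayesian model's MCMC trace:

```python
import numpy as np

# Hypothetical posterior samples of the Sharpe ratio (fabricated stand-in
# for an MCMC trace, centered at 1.2 with posterior std 0.6).
rng = np.random.default_rng(6)
sharpe_samples = rng.normal(1.2, 0.6, 5000)

# P(Sharpe > 0) = fraction of posterior samples above zero.
p_sharpe_positive = (sharpe_samples > 0).mean()
```

The same pattern answers any question of this form, e.g. `(sharpe_samples > 1).mean()` for the probability that the Sharpe ratio exceeds 1.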
The next picture is a flow chart that starts from the inputs, such as USEquityPricing.close, then goes through each variable I defined in my algorithm, showing how I used those variables to build my ranking scheme and filters to narrow down the stocks and allocate them into my long and short baskets respectively. The chart can help you better understand the logic behind your algorithm and find criteria you may have missed.
Here is one more function I would like to introduce: we can also extract the securities information that the pipeline would generate in before_trading_start on a given day, using the previous day's data, through the steps as follows.
Please feel free to share your ideas or comments with me here. I hope you enjoyed the content. Thank you!