I showed several strategy backtesting examples in my previous article. Those examples relied on existing strategies that may already have been proven effective to some degree. To develop your own customized investment strategy, however, it is critical to work through the whole development workflow.
First of all, let us look at the basic workflow for developing a strategy:
The example here is a demo taken from ‘Developing high-frequency equities trading models’ on Quantopian.
The purpose of this test is to show evidence that there are opportunities to generate α in the high-frequency environment of the US equity market.
Here we use Principal Component Analysis (PCA) on log returns as a basis for short-term valuation and for predicting market movements.
The test targets are a number of liquid ETFs that span the market. We pull the prices of those ETFs from 2015-11-25 to 2015-12-10; the window is short because the data is sampled every minute rather than at the usual daily frequency.
From this step, we get both the logarithms of prices and the log returns. As I mentioned before, we work with returns rather than prices because returns are normalized: all variables are measured in a comparable metric, which lets us evaluate analytic relationships among two or more variables even though they originate from price series of unequal values.
Log returns bring further benefits. The first is log-normality (https://en.wikipedia.org/wiki/Log-normal_distribution): if we assume that prices are log-normally distributed, then the log return is conveniently normally distributed. The second is approximate raw-log equality: when returns are very small, log returns are close in value to raw returns. Additionally, log returns are time-additive, so the compounding return (the running return of a sequence of trades over time), a statistic frequently calculated from such a sequence, is simply the sum of the individual log returns.
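To make the two properties concrete, here is a minimal sketch with NumPy. The prices are made-up numbers, not the ETF data used in this article.

```python
import numpy as np

# Hypothetical minute-bar prices for one ETF (made-up numbers for illustration).
prices = np.array([100.00, 100.05, 99.98, 100.10, 100.07])

log_prices = np.log(prices)
log_returns = np.diff(log_prices)            # r_t = log(P_t) - log(P_{t-1})
raw_returns = prices[1:] / prices[:-1] - 1

# Time-additivity: summing log returns gives the log of the total growth factor.
total_log_return = log_returns.sum()
compounded = np.exp(total_log_return) - 1    # same as prices[-1] / prices[0] - 1

# Approximate raw-log equality: for small returns the two measures nearly coincide.
max_gap = np.abs(log_returns - raw_returns).max()
```

At minute frequency the returns are tiny, so the gap between raw and log returns is negligible, while the additivity makes cumulative returns a simple running sum.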
The graph below plots the normalized log prices of all of those ETFs (dividing one log-normal random variable by another again yields a log-normal distribution).
The next piece of code shows how to get the PCA spreads. PCA helps us identify patterns in data based on the correlation between features, keeping only the most significant singular vectors and projecting the data onto a lower-dimensional space; in stock market trading it is usually used to de-noise signals. The spreads are calculated by subtracting the model's estimated returns from the log returns that we computed before. If the spread > 0, the estimated returns are smaller than the log returns and we should SHORT; if the spread < 0, we should LONG.
Here n_components is the number of principal components that we keep as risk factors. We regress returns against these factors, assuming that they explain the majority of risk in the market. According to the definition, ‘if n_components == ‘mle’, Minka’s MLE is used to guess the dimension; if 0 < n_components < 1, select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components’.
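The following is a rough sketch of the spread calculation, not the original notebook's code. It assumes synthetic returns and uses a plain NumPy SVD in place of scikit-learn's PCA so that it is self-contained; the names `k`, `factors` and `estimated` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for minute log returns: 1000 bars x 6 ETFs driven by
# 2 common factors plus idiosyncratic noise (all numbers are made up).
T, N, k = 1000, 6, 2
common = rng.normal(0, 1e-3, size=(T, k))
loadings = rng.normal(1, 0.3, size=(k, N))
log_returns = common @ loadings + rng.normal(0, 2e-4, size=(T, N))

# PCA via SVD of the demeaned returns; keep the first k principal components.
mean = log_returns.mean(axis=0)
centered = log_returns - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
factors = centered @ vt[:k].T                 # (T, k) factor return series

# Regress every ETF's returns on the factors (with an intercept).
X = np.column_stack([np.ones(T), factors])
beta, *_ = np.linalg.lstsq(X, log_returns, rcond=None)
estimated = X @ beta

# Spread = actual log return minus model estimate; cumulate it over time.
spreads = log_returns - estimated
cum_spreads = spreads.cumsum(axis=0)
```

Because the regression includes an intercept, each spread series has zero mean in sample, which is one reason the cumulative spreads hover around 0 when the model is fitted on the whole sample.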
The graph below shows the spreads calculated in the last step. The returns here are the cumulative spreads over time, and they appear to hover around 0, which suggests that mean reversion could be exploited. To show this trait more clearly, we plot histograms of those returns below.
Here is the histogram plot, along with a normality test based on the Jarque-Bera statistic (if the p-value is too small, we should reject the null hypothesis that the data is normally distributed). The p-value here is 0.0, so we reject normality for the cumulative returns. This is no surprise, since financial data is rarely normally distributed, and it means that large reversions can be captured in the tails.
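For readers who want to see what the test computes, here is a small hand-rolled sketch of the Jarque-Bera statistic on synthetic samples. SciPy provides `scipy.stats.jarque_bera` for real use; the helper below is only illustrative and relies on the asymptotic chi-square distribution with 2 degrees of freedom.

```python
import math
import numpy as np

def jarque_bera(x):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    s2 = np.mean(d**2)
    skew = np.mean(d**3) / s2**1.5
    kurt = np.mean(d**4) / s2**2
    jb = n / 6.0 * (skew**2 + (kurt - 3.0)**2 / 4.0)
    # Under the null, JB ~ chi-square with 2 degrees of freedom,
    # whose survival function is exactly exp(-x / 2).
    return jb, math.exp(-jb / 2.0)

rng = np.random.default_rng(1)
normal_sample = rng.normal(size=5000)
fat_tailed = rng.standard_t(df=3, size=5000)   # heavy tails, made-up stand-in

jb_norm, p_norm = jarque_bera(normal_sample)
jb_fat, p_fat = jarque_bera(fat_tailed)
```

With heavy tails the statistic becomes so large that the p-value is numerically indistinguishable from zero, which is how a reported p-value of 0.0 should be read.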
From this step, we use the model we built before to make predictions. First, we need to extract the signs (long or short) from the model. The code below does this by defining a variable called signs; its output is shown below (the screenshot contains only part of the result): 1 implies a LONG signal, and -1 implies a SHORT signal.
After obtaining these signals, we add them to our prediction model to calculate the cumulative returns we would earn by trading on the signs. As the picture below shows, the return series trend clearly upward in general. By applying the signals, we have, in theory, transformed a batch of ETFs into post-trade equity curves, and the curves go up and to the right.
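A minimal sketch of this step follows, using tiny made-up arrays. The one-bar shift between signal and return is my assumption about the alignment, not necessarily the convention of the original notebook.

```python
import numpy as np

# Hypothetical spreads and log returns for 2 ETFs over 5 bars (made-up values).
spreads = np.array([[ 0.002, -0.001],
                    [-0.001,  0.003],
                    [ 0.000, -0.002],
                    [ 0.004,  0.001],
                    [-0.003, -0.001]])
log_returns = np.array([[ 0.001, -0.002],
                        [-0.002,  0.001],
                        [ 0.001, -0.003],
                        [ 0.000,  0.002],
                        [-0.004, -0.001]])

# Positive spread -> SHORT (-1); negative spread -> LONG (+1); zero -> flat (0).
signs = -np.sign(spreads)

# A signal seen at bar t can only earn the return of bar t+1.
strategy_returns = signs[:-1] * log_returns[1:]
equity_curves = strategy_returns.cumsum(axis=0)          # one curve per ETF
portfolio_curve = strategy_returns.sum(axis=1).cumsum()  # summed across ETFs
```

Summing across the columns before cumulating gives the single portfolio line, while `equity_curves` keeps one post-trade curve per instrument.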
In the graph below, we sum the cumulative returns of each ETF; the return prediction line shows the theoretical total return of this strategy.
So far, so good. The next steps test the strategy’s resilience and endurance against different factors, so that you can get a general idea of how your strategy will perform in different situations.
The first thing to understand is that, when we constructed the strategy above, we actually fell into look-ahead bias. In short, this bias occurs when your backtest uses tomorrow’s prices to determine today’s trading signals, that is, when it uses future information to make a ‘prediction’ at the current time. The bias only affects a backtesting program, not a live trading program, and that difference also points to a way of avoiding it. Above, we used the whole sample to fit the model. To examine the estimates out of sample, we now fit on a 500-bar rolling window (500 one-minute bars, since the data is minute-level) and add the result to our data frame. This generates a new signal series from out-of-sample data; we then apply the signals to our returns and plot the returns prediction graph.
We can still see that most of the returns trend upwards.
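One way to sketch the rolling, out-of-sample version is shown below. It is an illustration under assumptions: the data is random noise, the window is shortened to 100 bars to keep the example fast, and the helper name `rolling_oos_spreads` is hypothetical.

```python
import numpy as np

def rolling_oos_spreads(log_returns, window, k):
    """Out-of-sample PCA spreads: fit on the trailing `window` bars only,
    then score the current bar, so no future data leaks into the signal."""
    T, N = log_returns.shape
    spreads = np.full((T, N), np.nan)
    for t in range(window, T):
        train = log_returns[t - window:t]          # bars strictly before t
        mean = train.mean(axis=0)
        _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
        components = vt[:k]                        # (k, N) loadings
        x = log_returns[t] - mean
        estimate = mean + (x @ components.T) @ components
        spreads[t] = log_returns[t] - estimate
    return spreads

rng = np.random.default_rng(2)
log_returns = rng.normal(0, 1e-3, size=(300, 4))   # made-up returns
spreads = rolling_oos_spreads(log_returns, window=100, k=2)
signs = -np.sign(spreads)      # NaN rows (the warm-up period) carry no signal
```

A quick sanity check for look-ahead bias is to alter the data after some bar and confirm that every spread before that bar is unchanged.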
Based on the outcome of the last step, we can still assume that this strategy may work and is worth testing further. Next, we test the cumulative summed returns of the funds over time against various holding periods. This step also gives us the daily turnover (turnover: the number of shares traded in a period as a percentage of the total shares in a portfolio or on an exchange).
As we can see, the longer the holding period, the smaller the returns over time, which implies that in this case the returns decay quickly as the holding period grows. We vary the holding period in one-minute steps, for a total of 21 intervals tested.
We have tested the relationship between the holding period and the return series. What about the relationship between waiting time and returns? In other words, what happens if we wait longer to place the order instead of executing immediately?
Similar to the last step, we set up 21 different ‘delay periods’ ranging from 0 minutes to 20 minutes. The graph clearly shows that the effect of execution delay on returns is almost immediate. We can therefore conclude that the returns of the funds depend heavily on being able to capture the price and execute at that price without any delay. If we miss the current price, the return series deteriorates quickly and severely into a highly undesirable range.
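The holding-period test can be sketched as follows. Everything here is an assumption for illustration: the returns are a synthetic mean-reverting series, the signal simply fades the last move, and the holding periods run from 1 to 21 bars.

```python
import numpy as np

def held_positions(signs, hold):
    """Refresh the position every `hold` bars and carry it in between."""
    idx = (np.arange(len(signs)) // hold) * hold
    return signs[idx]

def total_return(signs, log_returns, hold):
    """Summed log return when positions are only refreshed every `hold` bars."""
    pos = held_positions(signs, hold)
    return float((pos[:-1] * log_returns[1:]).sum())   # bar t's position earns bar t+1

# Synthetic mean-reverting returns (an MA(1) process with made-up parameters).
rng = np.random.default_rng(3)
shocks = rng.normal(0, 1e-3, size=5001)
log_returns = shocks[1:] - 0.5 * shocks[:-1]
signs = -np.sign(log_returns)        # stand-in signal: fade the last move

returns_by_hold = {h: total_return(signs, log_returns, h) for h in range(1, 22)}
```

In this toy series the edge only exists on the bar right after the signal, so stale positions add noise without adding return, which mirrors the fast decay described above.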
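A comparable sketch for execution delay, again on a synthetic mean-reverting series with a made-up signal. The helper `delayed_return` is hypothetical and only shifts the signal by the chosen number of bars.

```python
import numpy as np

def delayed_return(signs, log_returns, delay):
    """Summed log return when a signal from bar t is executed `delay` bars late."""
    lag = delay + 1          # even with no delay, bar t's signal earns bar t+1
    return float((signs[:-lag] * log_returns[lag:]).sum())

# Synthetic mean-reverting returns (an MA(1) process with made-up parameters).
rng = np.random.default_rng(4)
shocks = rng.normal(0, 1e-3, size=5001)
log_returns = shocks[1:] - 0.5 * shocks[:-1]
signs = -np.sign(log_returns)        # stand-in signal: fade the last move

# 21 delay periods, from 0 to 20 bars.
returns_by_delay = {d: delayed_return(signs, log_returns, d) for d in range(21)}
```

Plotting `returns_by_delay` against the delay would give the kind of curve discussed above, with the edge concentrated at zero delay in this toy example.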
Based on the two stress tests above (a stress test is a simulation technique used on asset and liability portfolios to determine their reactions to different financial situations), we have a general idea of how the return series changes with different holding periods and execution delays. Next, we want to know more about the effect of slippage and turnover on our portfolio, so we perform a Slippage/Turnover Analysis on the strategy through the following steps. Slippage is the difference between the expected price of a trade and the price at which the trade actually executes; it often occurs during periods of higher volatility, when market orders are used, and when large orders are executed without enough interest at the desired price level to maintain the expected price.
The graph below shows the relationship between the Sharpe Ratio of the portfolio and the holding period (assuming no slippage). We can see that as the holding period gets longer, the Sharpe Ratio goes down.
We will use different levels of slippage, measured in basis points (bps), to examine the performance of the strategy through its Sharpe Ratio. This matters because it occasionally happens that only part of your order fills at the current price and the rest fills at a less desirable price. For example, assume at some point that we miss the current price by 0.01% (1 bp), and so on.
The graph below shows the trend of the Sharpe Ratio across different levels of slippage in bps; the color of each line represents a different holding period. From the trends in the plot, we can conclude that as the slippage rate increases, the strategy makes less money. One interesting thing is that some lines intersect, for example between 50bps and 60bps, which implies that it is no longer desirable to trade at the 1-minute frequency and that it in turn becomes more desirable to trade at a 2- or 3-minute frequency.
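To illustrate how slippage feeds into the Sharpe Ratio, here is a simplified sketch. The cost model (a fixed number of bps charged on every unit of position change), the zero risk-free rate, and the annualization factor are all assumptions of mine, and the data is synthetic.

```python
import numpy as np

MINUTES_PER_YEAR = 252 * 390   # assumed: 252 trading days x 390 one-minute bars

def net_returns(positions, log_returns, slippage_bps):
    """Per-bar strategy returns after charging slippage on every position change."""
    gross = positions[:-1] * log_returns[1:]
    traded = np.abs(np.diff(positions, prepend=0.0))[:-1]   # size of each trade
    return gross - traded * slippage_bps * 1e-4             # 1 bp = 0.01%

def sharpe_ratio(returns, periods_per_year=MINUTES_PER_YEAR):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    return float(returns.mean() / returns.std(ddof=1) * np.sqrt(periods_per_year))

# Synthetic mean-reverting returns (an MA(1) process with made-up parameters).
rng = np.random.default_rng(5)
shocks = rng.normal(0, 1e-3, size=5001)
log_returns = shocks[1:] - 0.5 * shocks[:-1]
positions = -np.sign(log_returns)

sharpe_by_bps = {bps: sharpe_ratio(net_returns(positions, log_returns, bps))
                 for bps in range(0, 6)}
```

Because this toy strategy flips its position on most bars, even a few bps of slippage per trade is enough to erase the gross edge, which is the same sensitivity the plot illustrates.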
All of the tests above are preparation for the real backtest. We can also import the backtest results and conduct further analysis in the research environment. I won’t upload the backtest code here; if you are interested, you can check the link here. But here are some graphs that show the outcome of this strategy’s backtest (the backtest timeframe is from 2015-12-15 to 2015-12-31, and we test it on minute data).
This strategy assumes no slippage and no transaction costs. We can test its sensitivity to commission costs and slippage here.
The picture below shows the backtest’s average daily turnover, which is 198. This implies that trading this strategy would be very expensive in practice.
This graph shows the trend of the Sharpe Ratio against the bps of slippage. We can see that the Sharpe Ratio drops fast as soon as we add any slippage (this high-frequency trading strategy is very sensitive to slippage). The Sharpe Ratio then recovers gradually: once you have no money left, your volatility becomes very low, which pushes the Sharpe Ratio back up.
We can test the sensitivity of the strategy to commission costs by plotting the average P&L against different commission costs. The average daily P&L crosses the 0 line very quickly, at a commission of around $0.0001 per share. Thus, this high-frequency trading strategy quickly stops making money once the commission cost rises above that level.
You may also be interested in the outcome of this backtest. I have included a screenshot of it below:
This article introduces some of the details involved in developing and testing a strategy. Using a high-frequency trading strategy as the example amplifies the effect of costs and slippage on the strategy. However, all of these tests rest on certain assumptions and may not reflect reality very well; we can use them as tools for judging a strategy’s feasibility. As with the 7 steps for developing your own trading strategy that I introduced at the beginning, we should keep researching and checking our strategies by repeating these 7 steps, so that they keep up with potential opportunities and risks.
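The break-even arithmetic behind such a plot is simple enough to sketch. The daily figures below are made up for illustration and are not taken from the backtest; both helper functions are hypothetical.

```python
import numpy as np

def average_daily_pnl(gross_pnl, shares_traded, commission_per_share):
    """Mean daily P&L after a flat per-share commission."""
    gross_pnl = np.asarray(gross_pnl, dtype=float)
    shares_traded = np.asarray(shares_traded, dtype=float)
    return float((gross_pnl - shares_traded * commission_per_share).mean())

def break_even_commission(gross_pnl, shares_traded):
    """Per-share commission at which the average daily P&L hits zero."""
    return float(np.sum(gross_pnl) / np.sum(shares_traded))

# Hypothetical three-day record (made-up numbers, not the backtest's).
gross_pnl = [150.0, 90.0, 120.0]              # dollars per day before costs
shares_traded = [900_000, 700_000, 800_000]   # shares per day

break_even = break_even_commission(gross_pnl, shares_traded)
pnl_curve = {c: average_daily_pnl(gross_pnl, shares_traded, c)
             for c in (0.0, 0.0001, 0.0002)}
```

The more shares a strategy trades per dollar of gross profit, the lower its break-even commission, which is why a high-turnover strategy crosses the zero line at such a small per-share cost.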
Please feel free to leave your comments or share your ideas here. Hope you enjoy the content. Thank you!