When I first read about the z-score system some years ago, I knew it was a fluke. Descriptive statistics cannot easily be converted into edges because of non-stationarity. More importantly, a quant should always be suspicious and strive to reject systems. This is the only way to survive in a market where edges are rare and hard to find.
If you are a technical trader, one of your main goals should be to disprove apparent edges, not to prove them. When you always strive to prove that ideas are good, you risk ending up fooled by randomness. Instead, your goal should be to disprove ideas, constantly and relentlessly. This way you risk rejecting a few good ideas, but you reduce the risk of being fooled by randomness. Which do you prefer: getting fooled by randomness, or missing an edge or two?
The other day a trader posted a link to work involving the z-score contrarian system. That brought back memories of endless backtesting with descriptive statistics many years ago, which finally led me to conclude that no robust edges can be based on them, because the mean, the standard deviation, and the higher moments are all non-stationary. I also remembered how I decided that the contrarian z-score system was not only a fluke but also dangerous to trade.
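The rolling z-score that such systems build on can be sketched as follows. This is only a minimal illustration, not the system from the reference: the function name `rolling_zscore`, the window length, the use of the population standard deviation, and the made-up price list are all assumptions.

```python
from statistics import mean, pstdev

def rolling_zscore(prices, window):
    """z-score of each price against its trailing window (current price included).

    Hypothetical helper for illustration; returns None until a full window
    is available or when the window has zero dispersion.
    """
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
            continue
        sample = prices[i + 1 - window : i + 1]
        mu = mean(sample)
        sigma = pstdev(sample)  # population std dev: an assumption
        out.append(None if sigma == 0 else (prices[i] - mu) / sigma)
    return out

# Made-up prices with a level shift halfway through (two "regimes").
prices = [100, 101, 99, 100, 102, 110, 111, 109, 110, 112]
z = rolling_zscore(prices, window=5)
print(z[4], z[9])
```

In this toy series the prices 102 and 112 receive essentially the same z-score, because the mean is re-estimated inside each window. A fixed z-score threshold therefore refers to a different price level, and under a different standard deviation to a different move size, every time the underlying statistics drift.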
Back in 2010, based on the details in the reference given at the end of this article, I coded the z-score system, backtested it on SPY data, and found that my results agreed with those in the reference. Then I backtested the system not on the suggested portfolio but on GLD, an ETF that was hot at the time because of the strong uptrend in gold. To my surprise, or perhaps not, the backtest on GLD with data from inception on 11/08/2004 to 11/30/2010 revealed a disastrous system with a compound annual return of -10.3% and a falling equity line, as shown on the chart below:
The blue line is the buy and hold equity for a $10,000 starting value. In the backtest, position size was calculated based on a constant value of $10,000, and no commission or slippage was included; including them would of course make the results worse.
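For readers who want to reproduce the bookkeeping, here is a rough sketch of constant-dollar position sizing and of the compound annual return calculation. The function names, the use of fractional shares, and the 365.25-day year are assumptions made for illustration; the backtest itself was run in a different program and may differ in these details.

```python
from datetime import date

def trade_pnl(entry_price, exit_price, capital_per_trade=10_000.0):
    """Profit of one long trade sized at a constant dollar amount.

    Assumes fractional shares and no commission or slippage.
    """
    shares = capital_per_trade / entry_price
    return shares * (exit_price - entry_price)

def compound_annual_return(start_equity, end_equity, start, end):
    """Annualized growth rate between two dates (365.25-day year assumed)."""
    years = (end - start).days / 365.25
    return (end_equity / start_equity) ** (1.0 / years) - 1.0

# Backtest period quoted in the text.
start, end = date(2004, 11, 8), date(2010, 11, 30)
years = (end - start).days / 365.25

# Ending equity implied by a -10.3% compound annual return on $10,000.
implied_end_equity = 10_000.0 * (1.0 - 0.103) ** years
print(round(implied_end_equity, 2))
```

With constant-dollar sizing, each trade risks the same nominal amount regardless of the current equity, so losses do not shrink the size of later positions.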
That was enough for me. If the z-score system could not profit from a market where even uninformed and random traders could make money by buying and holding for a few days, then it was a bad one. To the credit of the technical analyst in the reference below, it was acknowledged that the system had certain issues. I agree with the analysis and with the comments, which I find good. I disagree with what appears to be an attempt to prove that a system is profitable by selecting a portfolio of a few ETFs where it performed well. This is survivorship bias and should not be part of any analysis. Quants should strive to demolish apparent edges. Only then can they survive the noise.
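The danger of picking a portfolio after the fact can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not an analysis of the actual system: it generates per-trade returns with zero expected value for a set of hypothetical instruments and then compares the average result across all of them with the average of the best few.

```python
import random

def simulate_zero_edge_returns(n_instruments, n_trades, seed=0):
    """Total return per instrument when every trade has zero expected value.

    Per-trade returns are drawn from a normal distribution with mean 0 and
    a 1% standard deviation; both choices are arbitrary assumptions.
    """
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 0.01) for _ in range(n_trades))
        for _ in range(n_instruments)
    ]

def mean_of_best(returns, k):
    """Average return of the k best-performing instruments."""
    best = sorted(returns, reverse=True)[:k]
    return sum(best) / k

returns = simulate_zero_edge_returns(n_instruments=50, n_trades=200, seed=0)
overall = sum(returns) / len(returns)
cherry_picked = mean_of_best(returns, k=5)
print(overall, cherry_picked)
```

Even though no instrument has an edge by construction, the hand-picked subset looks better than the full set. A test on a market that was not chosen for its good results, as with GLD here, is what exposes the difference.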
The z-score system could not make a single dollar over the course of six years in a very dynamic market, GLD. I thus rejected it. That decision paid off, because those who continued using it to trade SPY did not do well after 2013, when the statistics changed and the standard deviation decreased, as shown on the chart below:
From 01/02/2013 to 10/17/2014, the net return of SPY before dividend adjustments was about 29.4%, but the z-score system generated a return of only 9.6%. It thus underperformed buy and hold by a margin wide enough to indicate that the trades were random and the profit was due to luck alone. This can be shown mathematically, but I do not wish to take more of my readers' valuable time because the main points of this post have already been made.
Disclosure: no relevant positions.
Charting program: Amibroker