Break ranks to differentiate skill
A RECENT, OFT-CITED STUDY found that consultants are actually worse at picking managers than do-it-yourself investors. Bergstresser, Chalmers and Tufano, professors at Harvard Business School and the University of Oregon, documented that “financial intermediaries do a lousy job of allocating client assets to mutual funds”. Similarly, the press frequently observes that the average fund-of-hedge-funds consistently underperforms the average hedge fund, and that the underperformance isn’t due solely to fees. Simply stated, outside observers find professionals haven’t delivered on their promise of finding skilful managers. The profession should heed that failure and take steps to change what’s clearly been a losing game.
When data contradicts theory, there’s excitement about the potential to improve the theory. In this case it’s traditional benchmark theory that needs improvement. The data shows indices and peer groups haven’t succeeded in differentiating between winners and losers, and we show why in this article. But we don’t stop there: the literature is rife with documentation of the deficiencies of those benchmarks. This article describes how accurate benchmarks can be constructed from indices and how peer group biases can be overcome. Accurate benchmarking entails a lot of work, but it’s well worth the effort. If the benchmark is wrong, all of the analytics are wrong ‒ so losers are hired and winners are fired. It’s time to break away from this loser’s game.
A benchmark establishes a goal for the investment manager. A reasonable goal is to earn a return that exceeds a low-cost, passive implementation of the manager’s investment approach, because the investor always has the choice of active or passive management. It’s important to recognise the distinction between indices and benchmarks. Indices are barometers of price changes in segments of the market. Benchmarks are passive alternatives to active management. Historically, common practice has been to use indices as benchmarks, but returns-based style analysis (RBSA) has shown that most managers are best benchmarked as blends of styles that may not be apparent in any single index.
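To make the mechanics concrete, the following is a minimal sketch of the optimisation behind RBSA: find the non-negative style weights, summing to one, that minimise the variance of the fund’s return over the blended style benchmark. The function name and the use of Python’s scipy are illustrative assumptions, not any particular vendor’s implementation.

```python
import numpy as np
from scipy.optimize import minimize

def style_weights(fund_returns, index_returns):
    """Sharpe-style RBSA: non-negative style weights summing to one
    that minimise the variance of the fund's return over the blend."""
    fund = np.asarray(fund_returns, dtype=float)
    styles = np.asarray(index_returns, dtype=float)  # shape: (periods, n_styles)
    n = styles.shape[1]

    def tracking_variance(w):
        return np.var(fund - styles @ w)

    result = minimize(
        tracking_variance,
        x0=np.full(n, 1.0 / n),                          # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,                         # long-only style weights
        constraints=[{"type": "eq",
                      "fun": lambda w: w.sum() - 1.0}],  # fully invested
    )
    return result.x
```

The fitted weights define the passive alternative; the manager’s skill is then measured against that blend rather than against a single published index.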
The user of RBSA must trust the “black box” ‒ because the regression can’t explain why that particular style blend is the best solution. In his article that introduced RBSA, Nobel laureate William Sharpe set forth recommendations for the style indices used in RBSA, known as the “style palette”: “It’s desirable that the selected asset classes be:
• Mutually exclusive (no class should overlap with another).
• Exhaustive (all securities should fit in the set of asset classes).
• Investable (it should be possible to replicate the return of each class at relatively low cost).
• Macro-consistent (the performance of the entire set should be replicable with some combination of asset classes).”
The mutually exclusive criterion addresses a statistical problem called multicollinearity, and the other criteria provide solid regressors for the style match. Because the commonly used style palettes fail to meet those criteria, the results can’t be relied upon. In other words, the way we typically use this excellent tool is flawed. Using indices that don’t meet Sharpe’s criteria is like using low-octane fuel in a high-performance car.
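Multicollinearity in a palette can be checked directly. The sketch below, an illustrative diagnostic rather than a prescribed test, computes the condition number of the palette’s return matrix; a common rule of thumb treats values above roughly 30 as a warning that overlapping indices are making the fitted style weights unstable.

```python
import numpy as np

def palette_condition_number(index_returns):
    """Condition number of the standardised style-palette return matrix.
    Overlapping (highly correlated) indices drive this number up, which
    makes the RBSA weights unstable and hard to interpret."""
    X = np.asarray(index_returns, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # put indices on equal footing
    s = np.linalg.svd(X, compute_uv=False)    # singular values of the matrix
    return s.max() / s.min()
```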
Though custom benchmarks developed through RBSA are more accurate than off-the-shelf indices, statisticians estimate it takes decades to develop confidence in a manager’s success at beating the benchmark, even a customised one. That’s because when custom benchmarks are used, our assessments of manager skill are conducted across time. An alternative is to perform that test in the cross-section of other active managers, which is the role of peer group comparisons.
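The “decades” estimate follows from simple arithmetic on the t-statistic of excess return: with T years of data, t ≈ IR × √T, where IR is the manager’s information ratio. The snippet below is a standard back-of-envelope calculation, not a figure from any particular study, and it simply inverts that relationship.

```python
def years_to_significance(information_ratio, t_stat=2.0):
    """Years of track record needed before excess return over the
    benchmark is statistically significant: t = IR * sqrt(T), so
    T = (t / IR) ** 2.  A t-stat of 2 is roughly 95% confidence."""
    return (t_stat / information_ratio) ** 2

print(years_to_significance(0.5))  # a good manager (IR = 0.5) needs 16 years
```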
Peer groups place performance into perspective by “ranking” it against similar portfolios. Accordingly, performance for even a short period can be adjudged significant if it ranks near the top of the distribution. When traditional peer groups are used, “manager skill” is tested by comparing performance with that of a group of portfolios presumably managed in a manner similar to the portfolio being evaluated, so the hypothesis is tested relative to the stock picks of similar professionals. That makes sense ‒ except that someone has to define “similar” and then collect data on the funds that fit that particular definition of similar.
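For illustration, a peer ranking is just a percentile within the collected sample; the hypothetical helper below shows how completely the result depends on which funds the provider happened to include in that sample.

```python
import numpy as np

def peer_percentile(fund_return, peer_returns):
    """Rank within the peer universe, using the usual reporting
    convention that 1 is best and 100 is worst."""
    peers = np.asarray(peer_returns, dtype=float)
    share_beaten = np.mean(peers < fund_return)       # fraction of peers outperformed
    return max(1, round((1.0 - share_beaten) * 100))  # 1 = top of the distribution
```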
Each peer group provider has its own definitions and its own collection of funds, so each provider has a different sample for the same investment mandate. “Large cap growth” is one set of funds in one provider’s peer group and another set of funds in the next provider’s peer group. Those sampling idiosyncrasies are the source of well-documented peer group biases, including composition, classification and survivor biases. For a detailed discussion of those biases, see Surz.
Because of those biases, peer group comparisons are more likely to mislead than to inform, and therefore they should be avoided. Given the common use of peer group comparisons, however, the practical course is not to abandon the cross-sectional test but to construct peer groups that are free of these biases.