Warning: long, complicated post here. You might want to skip this one if you don’t like maths.
Since the dawn of sport, man has been thinking of ways to rate its participants.
In the olden days, people would simply rate by win-loss records. But that made no judgement on the strength of the opposition. Being 5-0 against the worst five teams in the competition is arguably less impressive than being 3-2 against the best five teams.1
Attempts were made to change this. Dick Dunkel2 Sr created the Dunkel Index for college football in 1929, and his ratings are still going today, now run by Dick Dunkel Jr, Bob Dunkel, and John Duck, a newspaper editor. (According to their current rankings, Ohio State is on top.)
However, things were by no means perfect. Then…came Arpad Elo.
Élő Árpád Imre, who later became Arpad Emerich Elo, was born in 1903 in the village of Egyházaskesző3, in the Austro-Hungarian Empire. In 1913, his parents moved to the United States, taking him with them. He was a professor of physics at Marquette University in Milwaukee, but also a chess master and eight-time Wisconsin state champion.
Elo thought about methods of ranking chess players, and came up with something which was an improvement on anything known before. From 1950 to 1960, the United States Chess Federation used the Harkness System, developed by Kenneth Harkness. Elo improved on that, and presented his Elo rating system to the USCF in 1960, which they adopted. FIDE, the World Chess Federation4, agreed to use Elo’s system in 1970, and he remained in charge until the mid-1980s, when they assigned the task to other people.5 He died in 1992, his place in sports statistical legend assured.
Statisticy Bit: How Do Elo Ratings Work?
This bit is quite relevant, but you don’t really have to pay much attention. Trust me, there isn’t going to be a test on this.
I’ll be using Metin Bektas’ blog post from 2013 on how to calculate it.6 As an example, we’ll come up with a hypothetical game of chess between Bobby Fischer7 and Garry Kasparov.8
Fischer had an all time peak Elo rating of 2785 (April 1972), while Kasparov had one of 2851 (July 1999).9 These are the values we will use.
The first value we need is the transformed rating, which is just to simplify the further calculations. This is worked out as such:
[transformed rating] = 10^([original rating]/400)
Fischer now gets 9172759.3539, while Kasparov has a value of 13412199.3541.
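If you’d rather check that at a keyboard than on a calculator, here is a minimal Python sketch of the transformation. The function name transformed_rating is my own label for this step, not something taken from Bektas’ post.

```python
def transformed_rating(rating: float) -> float:
    """Return 10 raised to the power (rating / 400)."""
    return 10 ** (rating / 400)


# Peak ratings quoted in the post.
fischer = transformed_rating(2785)   # April 1972
kasparov = transformed_rating(2851)  # July 1999

print(f"Fischer:  {fischer:.4f}")
print(f"Kasparov: {kasparov:.4f}")
```

Both printed values should land within a whisker of the figures quoted above. The numbers are huge, but only the ratio between them matters for what comes next.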
The next step is to calculate the expected score10, which is worked out as:
[expected score] = [transformed rating]/([transformed rating]+[opponent’s transformed rating])
Fischer has 0.40614461476, which, by the miracle of Elo, is also his chance of winning. By necessity, Kasparov’s is 1 minus that, which is 0.59385538523.
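The expected score step can be sketched in Python like so. The function name expected_score and its argument names are my own choices for illustration.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Expected score for a player, given both players' ratings."""
    q_self = 10 ** (rating / 400)          # transformed rating
    q_opp = 10 ** (opponent_rating / 400)  # opponent's transformed rating
    return q_self / (q_self + q_opp)


fischer_expected = expected_score(2785, 2851)
kasparov_expected = expected_score(2851, 2785)

print(round(fischer_expected, 4))   # 0.4061
print(round(kasparov_expected, 4))  # 0.5939
```

The two expected scores always add up to 1, and only the gap between the ratings matters: a 1500 team facing a 1566 team gets the same expected score as Fischer does here.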
Now, we’ll watch the match. Going by a Fischer victory,11 he gets one point and Kasparov gets none.
The final formula is this:
[new rating] = [old rating]+K*([actual score]-[expected score]).
What’s K, you ask? That’s the K-factor, which sets how far a single result can move a rating. Getting this right is important. If K is too low, then the rankings will be too stable, and won’t change frequently enough. If K is too high, then values will be jumping around all over the place, and it will be impossible to get a read on things. FIDE have thought about this, and have a range of values. I’m using a K of 20, simply because that was the value used in FiveThirtyEight’s basketball analysis, and cricket is like basketball: a winning streak generally indicates skill.12
Anyway, using a K factor of 20, Fischer gets a new rating of 2797, while Kasparov’s is now 2839. This is a drop of 12, which can be significant.13
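Putting the whole thing together, here is a rough Python sketch of the update for the hypothetical Fischer win. The function names and the default k=20 are my own choices for illustration. The loop at the end just shows how the size of the swing scales with K.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Expected score from the two transformed ratings."""
    q_self = 10 ** (rating / 400)
    q_opp = 10 ** (opponent_rating / 400)
    return q_self / (q_self + q_opp)


def updated_rating(old: float, actual: float, expected: float, k: float = 20) -> float:
    """[new rating] = [old rating] + K * ([actual score] - [expected score])."""
    return old + k * (actual - expected)


fischer_old, kasparov_old = 2785, 2851
fischer_exp = expected_score(fischer_old, kasparov_old)
kasparov_exp = 1 - fischer_exp

# Fischer wins: he scores 1, Kasparov scores 0.
fischer_new = updated_rating(fischer_old, 1, fischer_exp)
kasparov_new = updated_rating(kasparov_old, 0, kasparov_exp)
print(round(fischer_new), round(kasparov_new))  # 2797 2839

# The same upset under a few different K-factors.
for k in (10, 20, 40):
    gain = k * (1 - fischer_exp)
    print(f"K={k}: Fischer gains {gain:.1f}")
```

With these numbers, the swing comes out at roughly 6, 12, and 24 points for K of 10, 20, and 40. Whatever Fischer gains, Kasparov loses, so the total number of rating points in the system stays the same.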
Elo is used in many sports and games, including chess14, soccer, American football, basketball, baseball, Scrabble, and snooker, and now it’s being used for cricket.
I chose the Big Bash League to Elo rank, for a few reasons:
- It’s in Australia.
- It is very well documented.
- There haven’t been many games.
So, giving every team an Elo rating of 1500 to start with, setting a K factor of 20, and running through all of the games, I ended up with this:
There were a few things to note:
- The formula for adjusting ratings ahead of each new season was ([end of season score]*0.75)+(1500*0.25), another leaf out of Nate Silver’s book.
- The best BBL team in history was the Perth Scorchers after winning their second final in a row in 2014-15.
- The worst team were the Sydney Thunder after they had lost twenty in a row, just before they beat the Melbourne Renegades.
- Matches that went to a Super Over were counted as ties, and in no-result games both teams kept the Elo rating they had before the game.
- The Hobart Hurricanes are mediocre.
- Brisbane are the only really bad team at the moment.
- The Perth Scorchers are just as good as you think: they haven’t been below 1500 since 2012-13.
- The Melbourne Stars should really have won a title.
- The Sydney Thunder, despite winning the title, still have a pretty shoddy rating.
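For the curious, the bookkeeping described above (everyone starts on 1500, K of 20, Super Overs as ties, no results ignored, and the 75/25 reset between seasons) could be sketched in Python roughly like this. It is a minimal illustration under my own assumptions, not the spreadsheet itself: the function names, the result labels, and the two placeholder teams are all made up for the example.

```python
START = 1500       # every team's initial rating
K = 20             # K-factor used in the post
CARRY_OVER = 0.75  # share of a rating kept between seasons


def expected_score(rating: float, opponent_rating: float) -> float:
    q_self = 10 ** (rating / 400)
    q_opp = 10 ** (opponent_rating / 400)
    return q_self / (q_self + q_opp)


def play_game(ratings: dict, home: str, away: str, result: str) -> None:
    """Update ratings in place.

    result is a hypothetical label: 'home', 'away', 'tie' (Super Over),
    or 'no_result' (ratings left untouched).
    """
    if result == "no_result":
        return
    score_home = {"home": 1.0, "away": 0.0, "tie": 0.5}[result]
    exp_home = expected_score(ratings[home], ratings[away])
    change = K * (score_home - exp_home)
    ratings[home] += change
    # The away side's scores mirror the home side's, so its change is the negative.
    ratings[away] -= change


def new_season(ratings: dict) -> None:
    """Pull every rating a quarter of the way back towards 1500."""
    for team in ratings:
        ratings[team] = ratings[team] * CARRY_OVER + START * (1 - CARRY_OVER)


ratings = {"Team A": START, "Team B": START}  # placeholder teams
play_game(ratings, "Team A", "Team B", "home")
print(ratings)  # {'Team A': 1510.0, 'Team B': 1490.0}

new_season(ratings)
print(ratings)  # {'Team A': 1507.5, 'Team B': 1492.5}
```

Note that a tie only leaves ratings untouched when the two teams are level going in. If the favourite ties, it scores 0.5 against an expected score above 0.5, so it drops a little and the underdog gains the same amount.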
And here are a few more little tidbits of note:
Post-game ratings of 1550+ by team
- 19: Perth Scorchers
- 8: Melbourne Stars
- 4: Sydney Sixers
- 1: Adelaide Strikers
- 0: Brisbane Heat, Hobart Hurricanes, Melbourne Renegades, Sydney Thunder
Post-game ratings of 1450- by team
- 26: Sydney Thunder
- 7: Brisbane Heat
- 0: Adelaide Strikers, Hobart Hurricanes, Melbourne Renegades, Melbourne Stars, Perth Scorchers, Sydney Sixers
Rounds being the highest rated team15
- 21: Perth Scorchers
- 11: Melbourne Stars
- 5: Sydney Sixers
- 2: Hobart Hurricanes, Melbourne Renegades
- 1: Brisbane Heat
- 0: Adelaide Strikers, Sydney Thunder
Rounds being the lowest rated team
- 31: Sydney Thunder
- 9: Brisbane Heat
- 1: Melbourne Renegades
- 0: Adelaide Strikers, Hobart Hurricanes, Melbourne Stars, Perth Scorchers, Sydney Sixers
You can play with the spreadsheet here. Please tell me what you find, and I’ll be updating it when the season rolls around, so long as someone reminds me.
1Unless, of course, there aren’t many teams in the competition. In the Big Bash League, taking the 2015-16 final table as a rating, it would be more impressive beating the Thunder, the Renegades, Brisbane, Hobart, and the Sixers than it would be beating three of Adelaide, the Stars, Perth, the Thunder, and the Renegades. However, if we go to the AFL, beating three of Sydney, Geelong, Hawthorn, GWS, and Adelaide is better than beating all of Carlton, Gold Coast, Fremantle, Brisbane, and Essendon.
2This is hilarious for some reason.
3I have no clue how to say this.
4Fédération Internationale des Échecs, for you French speakers out there.
5FIDE also added new “Qualification for Rating” rules to its handbook awarding arbitrary ratings (typically in the 2200 range, which is the low end for a chess master) for players who scored at least 50 percent in the games played at selected events, such as named Chess Olympiads. Elo and others objected to these new rules as arbitrary and politically driven. (https://en.wikipedia.org/wiki/Arpad_Elo)
6If you Google it, it’s the second result.
7Bobby Fischer (1943-2008) was an American considered one of the greatest of all time. Aged 20, he won the US Championship with a perfect score, and was the first FIDE number one16 under Elo’s rating system, one of only seven people to hold this title. He came back from 2-0 down after 2 games to beat Soviet Boris Spassky 12 1/2 to 8 1/2 in the 1972 World Championships. However, he was also a bit nuts. He accused the Soviets of collusion in the 1962 Candidates tournament in Curaçao, and argued with Spassky over the location of the 1972 World Championship (he wanted Belgrade, Yugoslavia; Spassky wanted Reykjavík, Iceland). When it was played in Reykjavík, Fischer asked for all cameras to be removed after Game 1. When this request was denied, he forfeited Game 2 and nearly left Iceland. Three years later, he resigned his world championship over FIDE not accepting his demands, and pretty much disappeared off the face of the earth. In 1982, he was arrested, supposedly because he matched the description of someone who had committed a bank robbery nearby. He wrote a pamphlet about the experience, claiming he had been framed up and set up. In 1992, he challenged Spassky to a rematch in Yugoslavia, violating Executive Order 12810.17 Fischer spat on a printed copy of the Order, leading to an arrest warrant being issued for him. He lived in Hungary, the Philippines, and Japan from 1992 to 2004, making anti-Semitic comments and defending the 9/11 terrorist attacks, before being granted Icelandic citizenship in 2005. He died in 2008, and there are debates on whether he had schizophrenia, Asperger’s Disorder, or (most likely) paranoid personality disorder.18
8Garry Kasparov (born 1963) is a Russian who became the youngest ever undisputed World Champion in 1985, beating his countryman Anatoly Karpov. However, he got into disputes with FIDE, and ended up founding his own organisation. In 1997, he lost to Deep Blue, becoming the first Number 1 to lose to a computer. Since retiring in 2005, he has written plenty of books, mostly about chess, and is a strong critic of Vladimir Putin and Donald Trump.
9Both of these were, at the time, the world’s highest ever. Fischer’s was beaten by Kasparov, whereas Kasparov’s was beaten by Magnus Carlsen, the current world number one.
10Scores here are on a scale where a win is worth 1, a loss 0, and a tie 0.5, so the expected score is a number somewhere between 0 and 1. There are ways of adapting for points scored in a game, but that’s outside the scope of this, and wouldn’t work that well for cricket anyway. If you really want to know, see here.
11Fischer never played Kasparov; in fact, the two never met, something Kasparov regrets. However, contemporary reports say that Fischer was near unstoppable and had no obvious weaknesses.
12This should be true of every sport, but FiveThirtyEight had this to say in the how it works section for the NBA analysis:
One way to interpret this is that NBA data is subject to relatively little randomness. This makes it different from sports like baseball and hockey, whose game-by-game results are pretty noisy; in those sports, your default assumption should be that a winning or losing streak is mostly luck. That isn’t so true for basketball. Streaks may reflect true, if perhaps temporary, changes in team quality.