By my standards, this is quite a long piece. Before I get into the graphs and information I have prepared about the league I first want to discuss a couple of areas that irritate/interest me generally in football analysis.
Regression to the mean
I’ve spent the last couple of days digging into trends in the Premier League over the past 13 seasons. Part of the inspiration comes from what I believe to be overly liberal use of the phrase ‘regression to the mean’ amongst sports analysts, which has become a bit of a bugbear for me. I find it troublesome because a team’s ‘mean’ results, shot ratios or goal difference are dynamic in the short term and depend (to varying extents) on tactics, coaching, injuries and transfers, to name a few – all of which can change on a monthly or seasonal basis.
The example that we often see used is Newcastle 2011/12 vs Newcastle 2012/13 and then potentially Newcastle 2013/14 which could serve up another in a long line of ‘regression to the mean’ articles explaining why their 2012/13 results were so bad that obviously 2013/14 was going to be an improvement. But is this actually regression to the mean? Not really. I’m touchy about this in part because I’m a Newcastle fan – but in any case mean regression for an individual team might not ever happen and is of little benefit to the everyday football analyst, who must take a shorter term view (even if short term can be defined as anything up to 5 years or so). Sorry Newcastle fans, but the Toon really could perform worse next season.
Ranting aside, I am being a little pedantic here, because I believe wholeheartedly in the random/luck element of football, which can have a drastic effect on a team’s season over 38 games. But there’s a lot of noise in football data that lends itself to spurious predictions – so I’d prefer we say things like ‘Newcastle had an unsustainably high goals-to-shots ratio in 2011/12’ or that their goal difference was unusually low for a team finishing 5th that season. The hockey statistic PDO seems to have a cult following, and although I find it a fairly useful tool for indicating the average team’s luck in football, I’ve yet to be convinced of its usefulness in analysing any particular team over multiple seasons. Why? Because the average team doesn’t really exist. Don’t get me wrong, I love using and calculating averages – but one of their main benefits is to serve analysts as a benchmark against which we can consider individual cases or develop strategies for improvement.
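For anyone unfamiliar with PDO, here is a rough sketch of how I understand it is usually adapted to football: shooting percentage plus save percentage, both measured on shots on target and scaled by 1000. The function name and the single-team figures are invented for illustration; the league totals are the approximate EPL season figures of 1000 goals from 4700 shots on target.

```python
def pdo(goals_for, sot_for, goals_against, sot_against):
    """Shooting % plus save %, scaled so the league as a whole comes out at 1000."""
    shooting_pct = goals_for / sot_for
    save_pct = 1 - goals_against / sot_against
    return 1000 * (shooting_pct + save_pct)


# League as a whole: every goal scored is also a goal conceded, so PDO is 1000.
league_pdo = pdo(1000, 4700, 1000, 4700)

# Hypothetical team (invented numbers) that finishes well and concedes little.
team_pdo = pdo(goals_for=56, sot_for=170, goals_against=45, sot_against=180)

print(f"league PDO: {league_pdo:.0f}")
print(f"team PDO:   {team_pdo:.0f}")
```

A team figure above 1000 is usually read as running hot, but the number alone can’t say whether that is luck or skill.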
Football by its nature is viewed in the short term and the league is not a closed system – teams are promoted and relegated every year, and there’s a fine line between luck and skill for the majority of teams in the league. However, when we group teams together patterns can emerge.
Macro vs Micro analysis
I’m borrowing the terms macro/micro from economics, or if you like top-down vs bottom-up from portfolio analysis. Macro/top-down analysis considers the economy, market or league as a whole in order to inform strategy, whilst micro/bottom-up analysis considers the underlying agents involved – in the case of football, players and teams – in order to form expectations.
The kind of analysis that lends itself to discussion of regression to the mean is macro analysis, because it deals with the aggregation of many underlying factors into a league table at the end of a season, which for the EPL is the culmination of 380 games, about 8800 shots, 4700 shots on target and 1000 goals. These are sample sizes you can do something more serious with; the problem is that it can be difficult to relate long-term aggregated relationships to single players or teams.
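To put those season totals into perspective, here is a quick sketch that turns them into rates and into the slice a single team gets to work with. It uses only the approximate figures above; the 20-team league size is my addition (it is what produces 380 games), and the variable names are just for illustration.

```python
# Approximate EPL season totals quoted above.
games, shots, shots_on_target, goals = 380, 8800, 4700, 1000
teams = 20  # assumption: 20 teams playing each other home and away = 380 games

goals_per_game = goals / games
shots_per_goal = shots / goals
sot_conversion = goals / shots_on_target

# What an average single team contributes to the league-wide sample.
shots_per_team_season = shots / teams
goals_per_team_season = goals / teams

print(f"goals per game:            {goals_per_game:.2f}")
print(f"shots needed per goal:     {shots_per_goal:.1f}")
print(f"shots on target converted: {sot_conversion:.1%}")
print(f"average team per season:   {shots_per_team_season:.0f} shots, "
      f"{goals_per_team_season:.0f} goals")
```

An average team’s season is a twentieth of the league sample, which is part of why single-team conclusions are so much shakier.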
The football manager, on the other hand, has only 38 games to work with and wishes to increase his team’s share of the 1000 goals as much as possible. The football gambler wants to know who will win the next game or score the next goal (we have moved into micro analysis). Micro analysis dominates football research because of our obsession with improving our understanding of the game on the field. The football fan/pundit generally has a relatively short memory and enjoys obsessing about what will happen or has happened in every game. The analyst exhaustively reviews video footage and thousands of actions collected by data companies in order to gain a competitive advantage over rivals. Micro analysis is absolutely vital for football clubs that want to improve, but the problem is that its samples are much smaller and noisier – so it can take considerable time and effort to prove an advantage gained from this type of research. That is time a football manager might not want to take, because he could be out of a job within 10 games.
What’s my point in all this? Well, I suppose I want to make clear that micro analysis will always lead the way for both innovation and spurious claims in sports, but we need macro analysis to keep us in check and to help identify how much of an achievement or failure can be apportioned to luck or skill.
The Top 7
Are the best teams getting better? Is the battle for 4th more challenging than ever before?
Below are 2 graphs: the first shows the points and goal difference of the top 4 and the second shows places 4-7 since the 2000/01 season. The goal difference is shown on a secondary axis on the right hand side.
These graphs don’t really suggest any structural changes in the points/goal difference of the top 7 over the past 13 years (although granted, it’s hard to identify structural shifts over 13 years). We can see the mad 2004/05 season, in which Everton snatched 4th place with a goal difference of -1. Otherwise, there is a hint that the 7th best team has improved since 2008/09 as the goal difference of that team has been at a minimum of +5 over the past 5 seasons. This may be a result of Everton managing to consolidate their position in the top 7 and Spurs improving to achieve 4th or 5th in each of the last 4 seasons. In my head I’m counting Liverpool as part of the top 7 because they have been there in every season but 1 over the past 13 years.
Over the past 13 years, the champions have on average won with ~87pts, scoring 80 goals and conceding 29.
4th place requires something close to 69pts, 65 goals for and 39 against.
To finish 17th and avoid relegation, a team should aim for at least 38pts on average, scoring about 40 goals and conceding about 60.
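As a quick sanity check, those three benchmarks can be turned into goal difference and points per game. This is only a sketch using the approximate averages above; the dictionary layout and names are mine.

```python
GAMES = 38  # games per team in an EPL season

# Approximate 13-season averages quoted above: points, goals for, goals against.
benchmarks = {
    "champions": {"pts": 87, "gf": 80, "ga": 29},
    "4th": {"pts": 69, "gf": 65, "ga": 39},
    "17th": {"pts": 38, "gf": 40, "ga": 60},
}

summary = {}
for place, b in benchmarks.items():
    summary[place] = {
        "goal_diff": b["gf"] - b["ga"],
        "pts_per_game": round(b["pts"] / GAMES, 2),
    }
    print(f"{place:>9}: GD {summary[place]['goal_diff']:+d}, "
          f"{summary[place]['pts_per_game']:.2f} pts per game")
```

So survival works out at roughly a point a game with a goal difference of about -20, while the title takes well over two points a game.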
The Table in Charts
Over the past 13 years, the top 7 have between them taken about 50% of the total points on offer, scoring about 45% of the goals. We can see that from around positions 7-8 the graph flattens out, dipping again significantly at 19-20. The difference between the team in 8th and the team in 18th is really not that big at all.
Structurally, I don’t expect this to change much in the long term – the top 7 will likely continue to dominate the league; the group from 8-18 will be quite changeable, with luck playing a big part in where each team finishes; and the bottom team is more likely to be one of the newly promoted sides whose skill level isn’t up to scratch. That isn’t to say that a team can’t drop out of the top 7 and be replaced by a higher performing team from the lower group from time to time.
In the past 13 seasons, on average 1.15 newly promoted teams have been relegated per season: occasionally 0 teams, usually 1 team and sometimes 2 of them. On top of that, a team in just its second season in the top flight, which perhaps benefited from a little luck in its first season, has been relegated 0.54 times per season on average. Together that’s about 1.7 teams per season – this year potentially 1-2 teams from Cardiff, Hull, Crystal Palace, West Ham and Southampton.
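The arithmetic behind that figure, as a small sketch. The two averages are the ones quoted above; the per-team relegation rate additionally assumes three promoted teams each season, as is the case in the Premier League. I haven’t attempted the same per-team rate for second-season sides, because the number of them varies from year to year.

```python
# Average relegations per season over the 13 seasons, as quoted above.
promoted_relegated = 1.15        # teams in their first season
second_season_relegated = 0.54   # teams in their second season

combined = promoted_relegated + second_season_relegated

# Assumption: three teams are promoted every season.
promoted_per_season = 3
promoted_relegation_rate = promoted_relegated / promoted_per_season

print(f"recent arrivals relegated per season: {combined:.2f}")
print(f"share of promoted teams going straight back down: "
      f"{promoted_relegation_rate:.0%}")
```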
I think the graphs below give some insight into the potential for luck in the league for the middlish teams.
What about ratios?
In terms of goals conceded, it’s a different picture – only the top 3 really seem to have a knack for preventing goals, whilst positions 14 and below stand out for their aptitude to ship goals.
The pictures suggest that scoring goals is relatively more important to teams at the top of the table, whilst stopping goals is more important to teams in the bottom half.
For example, the team in 18th scores 39 goals on average whilst the team in 8th scores ~49, a difference of 10. Conversely, the team in 18th concedes about 64 goals per season vs 46 conceded by the team in 8th: a difference of 18. A team expecting a relegation battle is therefore better off investing in its defence to reduce the number of goals it concedes – as this is the ‘easier’ area in which to make an improvement.
What about the team in 8th that wants to challenge for 4th? Well, 4th place generally scores 65 goals and concedes 39, +16 and -7 respectively compared to the team in 8th. In this case, improving goalscoring should arguably be the priority.
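The two comparisons above boil down to the same calculation, sketched here with the approximate averages for 4th, 8th and 18th. The function and variable names are hypothetical, and ‘bigger gap’ is only a crude proxy for where the easier improvement lies.

```python
# Approximate average (goals scored, goals conceded) by finishing position.
averages = {4: (65, 39), 8: (49, 46), 18: (39, 64)}


def gaps(better, worse):
    """Goals separating two positions, split into attack and defence."""
    better_for, better_against = averages[better]
    worse_for, worse_against = averages[worse]
    return {
        "attack": better_for - worse_for,           # extra goals scored
        "defence": worse_against - better_against,  # fewer goals conceded
    }


for better, worse in [(8, 18), (4, 8)]:
    gap = gaps(better, worse)
    bigger = max(gap, key=gap.get)
    print(f"{worse}th chasing {better}th: attack gap {gap['attack']}, "
          f"defence gap {gap['defence']} -> bigger gap is in {bigger}")
```

For 18th chasing 8th the bigger gap is in defence (18 vs 10), and for 8th chasing 4th it is in attack (16 vs 7).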
To summarise, there is a place for both macro and micro analysis in football analytics, despite the clear focus on micro factors by analysts in general. Macro analysis cannot be discounted: it can serve as a helpful guide to direct micro analysis and provide greater certainty that the research we are producing is truly worthwhile.