Monthly Archives: July 2013

Random League Generator

Here I have attached a downloadable Excel file to share a simple league table generator I put together recently. Click the link below to open it:

Random League Generator

I did a quick test of downloading the file from this site and my PC tried to open it as an old-school .xls file, so you might want to save the file locally and open it separately if you're having trouble. I haven't tidied, protected or hidden much information in it, so consider it yours to use and browse as you wish.

This mini-exercise is intended to illustrate the randomness of results in football in general. Having just finished reading Chris Anderson and David Sally's The Numbers Game, in which randomness and luck are discussed at great length, I thought I'd take a look at the possible results and league tables we might expect if the league were truly random.

In the first tab, 'Random League Table', I have created results and tables for 10 seasons of 20 hypothetical teams A-T, each of whom has roughly a 47% chance of winning a home game, a 26% chance of a draw and a 27% chance of an away win.
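For anyone who would rather poke at the logic outside Excel, here is a minimal Python sketch of the same idea. The probabilities match the spreadsheet, but everything else (the function name, the printed table) is purely illustrative rather than a copy of how the workbook is built:

```python
import random
from collections import defaultdict
from itertools import permutations

# Per-match probabilities from the home side's point of view, as in the first tab
P_HOME_WIN, P_DRAW = 0.47, 0.26   # the away win takes the remaining 0.27

def simulate_season(teams):
    """Play every home/away pairing once and return total points per team."""
    points = defaultdict(int)
    for home, away in permutations(teams, 2):   # 20 teams -> 380 fixtures
        r = random.random()
        if r < P_HOME_WIN:
            points[home] += 3
        elif r < P_HOME_WIN + P_DRAW:
            points[home] += 1
            points[away] += 1
        else:
            points[away] += 3
    return points

teams = [chr(ord("A") + i) for i in range(20)]   # hypothetical teams A-T
table = sorted(simulate_season(teams).items(), key=lambda kv: kv[1], reverse=True)
for pos, (team, pts) in enumerate(table, start=1):
    print(f"{pos:>2}. {team}  {pts} pts")
```

Running it a few times gives a feel for how different a 'league table' can look purely by chance.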

In the second tab I have used real team names and input a strength rating [1-10] to calculate what the league table might look like if team abilities are distributed in a certain way. It also serves as a helpful (albeit basic) model to show that, if we accept the notion of randomness, an unexpected team CAN win the league from time to time, or be relegated, simply because of the sample size of a 380-game season. If you want, you can change the strength ratings (although you'll have to pick numbers for each team between 0-10) and then click the orange button to refresh the results (make sure you enable macros in Excel).
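The workbook's exact formula for turning strength ratings into match probabilities isn't reproduced here; the function below is just one plausible (and entirely assumed) way of tilting the baseline 47/26/27 split by the difference in ratings, which could then be plugged into the season simulator sketched above:

```python
def match_probabilities(home_strength, away_strength, swing=0.03):
    """Tilt the flat 47/26/27 split by the difference in strength ratings (0-10).
    The formula is illustrative only, not the one used in the workbook."""
    diff = home_strength - away_strength              # -10 .. +10
    p_home = min(max(0.47 + swing * diff, 0.10), 0.70)
    p_draw = 0.26
    p_away = 1.0 - p_home - p_draw
    return p_home, p_draw, p_away

print(match_probabilities(9, 3))   # strong home side: home win becomes more likely
```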

Enjoy!

EPL Debated in the House of Lords

Today I had the fascinating experience of visiting the House of Lords at Westminster to listen in on a debate moved by Lord Bates on the 'Contributions of the English Premier League Football to the United Kingdom'.

Sceptics and socialists among you may wonder what the hell football has to do with the House of Lords and what possible relevance it might have to grassroots football. Fair enough. The Lords give consideration to public policy and have a responsibility to hold the government to account – when I joined the chamber this morning the state of A&E healthcare provision was under scrutiny – so depending on your point of view you may not see the need for any form of observation or intervention by politicians in sport.

But the debate undeniably highlighted the popularity of the sport across all areas of society. Indeed, by discussing the Premier League in a place traditionally associated with aristocracy and rule by hereditary peers, the venue itself provided a striking contrast to the classic image of football I grew up with – of a game associated with 'the working classes'. The Lords chamber is also a particularly striking contrast to the 2. Bundesliga match I attended last weekend at the Millerntor stadium between St Pauli and 1860 Munich!

It is useful to be reminded how important football has now become to both the privileged and underprivileged, and how inclusive it has become (or at least has tried to become).

Many speakers in the debate made note of the controversial origins of the English Premier League and its breakaway from the Football Association and Football League in 1992. It remains a very clear marker for the modernisation of the top-flight game in England and a shift in focus towards the commercialisation of the game – with lucrative broadcasting deal after lucrative broadcasting deal ever since.

The high revenues enjoyed by the Premier League are well-known and well-reported. In 1991/92 the collective revenue of the top division in England was £170 million. In 2013/14, in consideration of the latest lucrative broadcasting deals, the seasonal revenue is projected to reach £3.08 billion. Attendances and stadia occupancy rates have also increased significantly, although it is also often pointed out that the German Bundesliga leads the way in that respect.

The “buoyant incomes have been re-invested: in stadium facilities, in playing squads and training standards, in wider communities and in grassroots football” (House of Lords Library Note, p3). Football has become an important economic agent, providing jobs and attracting investment across the country. An example used in the debate was Swansea, which, according to a study by the Welsh Economy Research Unit, benefits from £8.13 million per season of spending by visitors to Swansea matches – creating jobs and even increasing the number of applicants to its university.

Lord Wei provided some detail on a recent study showing that a large proportion of Chinese people associate the city of Manchester with football, which very helpfully enables it to be recognised by foreign investors and aids the UK in competing for investment in the Greater Manchester area – not only in tourism but also in the development of significant business centres.

And yet, in the minds of some, the association of football with business is impure. But alongside the scarier prospect of selfish profiteers in football come societal benefits such as the opening up of football to more diverse groups of players and supporters. Football grounds are now much more welcoming to families, women and ethnic minorities than ever before, and the heavy involvement of football clubs in promoting this change is arguably one positive consequence of the commercial incentive to do so.

On the other hand, in the EPL we are in danger of driving the cost of tickets and television subscriptions beyond the reach of the average fan – a point noted by, among others, Lord Ouseley. Eye-watering price levels have already been reached, and that is one area where politicians may see the need to intervene in the governance of a game which is in danger of marginalising its poorer fans and becoming increasingly elitist.

The topic of governance at football clubs was widely mentioned, with particular reference to the recent declines of Portsmouth and Blackburn and concerns that the objectives of owners are not always aligned with the opinions and desires of the fans who invest considerable energy in support of their club. This was yet another area where profiteering by owners – or perhaps the recklessness of their spending – was highlighted as a threat to the Premier League's future and one which politicians are becoming ever more watchful of.

The lack of a more diverse and representative mix both in the boardrooms at football clubs and at the top of the managerial tree was noted as another failing of the game in its current state – as was Tanni Grey-Thompson's recent damning indictment of the poor standard of accommodation for wheelchair users in sport in the UK.

So politics and football: should it mix? In my opinion, to some degree it has to. Football’s grip on its millions of fans in the UK, and billions globally, needs a certain amount of regulation and monitoring considering how much power the leagues and clubs at the highest level have in the way that football is run. With great power comes great responsibility, and it is clear that in some cases the desires of owners are at odds with the desires of supporters – not to mention the ongoing arguments over how the globalisation of the EPL may be negatively affecting the results of the England team. It is hard to imagine how, without the intervention of regulators, the prospects of British academy players or ticket prices can be changed in favour of the local stakeholders in football clubs. For that reason alone I think football can always benefit from discussion and accountability to the community as a whole – be it local, national or global.

The Moonwalking Bear

I couldn't resist making the title of this post 'the moonwalking bear'. Many of you will have already seen the video I linked above; I think I saw it twice this year, in separate presentations at the Sports Analytics Innovation Summit and the Science + Football Conference, but the point it makes is worth repeating from time to time.

Being the pedant that I am, I’d like to mention that the bear hardly moonwalks – it’s more like an awkward jig through the teams – but that argument is for another day and another place where I can air my opinions on moonwalking bears to my heart’s content.

And now I'll come clean: this post isn't actually about a moonwalking bear. Time to click the 'back' button on your browser if you're expecting to see more people dancing whilst dressed up as animals. A tiger doing the macarena perhaps? Sorry.

So this post is all about what we don’t see, what we don’t record, and what we don’t tend to look for.

The Numbers Game summarises the problem in sports analysis very well in Chapter 4: Light & Dark – so read it if you haven't yet; there's quite a lot of rhetoric, but the central points are great. To quote directly from the book (p132): "defensive actions that can be measured – tackles, clearances, duels – have the feel of one-offs, preventative actions, rather than things that can produce something positive. Ball events are tracked, but things that happen off the ball are ignored. It is far harder to tune in to excellent marking, cutting of passing channels and wonderful positioning."

Understanding how to credit and apportion defensive work is a troublesome issue in team sport analysis, and many professionals are actively working to understand and model this in order to improve our knowledge of what makes a successful defender. How do you count actions that do not occur?

Today I visited the team at Kickdex, and I was suitably impressed by their approach to football analytics – indeed, what I'm now calling 'the moonwalking bear problem' was something we discussed and something they are actively working to solve. I'm a sucker for the application of any form of science in sport, but in my humble opinion Kickdex are looking at the technical challenges posed by football in a way that few other people I have met in the industry are. Slightly off topic, but network theory analysis is another area we discussed – I have since been helpfully informed that it is a fairly well-covered field in the application of science to sport – and I nevertheless feel it is important to highlight a couple of excellent studies on the subject (one of which was co-written by a founder at Kickdex):

A network theory analysis of football strategies – Javier Lopez Pena & Hugo Touchette

Quantifying the Performance of Individual Players in a Team Activity – Duch, Waitzman & Amaral

Thanks to Paul Power (@counterattack9) for the second link.
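For a flavour of what a network-theory approach can look like in practice, here is a tiny sketch using networkx: build a directed passing graph from (hypothetical) pass counts between players and rank players by a weighted centrality measure. This is in the spirit of the papers above rather than a reproduction of their methods:

```python
import networkx as nx

# Hypothetical completed-pass counts: (passer, receiver, number of passes)
passes = [
    ("GK", "CB", 20), ("CB", "CM", 35), ("FB", "CM", 30),
    ("CM", "AM", 40), ("AM", "FW", 25), ("CM", "FW", 10),
    ("FW", "AM", 15), ("AM", "CM", 30),
]

G = nx.DiGraph()
for passer, receiver, count in passes:
    G.add_edge(passer, receiver, weight=count)

# A PageRank-style centrality: which players does the ball tend to flow through?
centrality = nx.pagerank(G, weight="weight")
for player, score in sorted(centrality.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{player:>3}: {score:.3f}")
```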

If Opta stats don't give us direct information about the moonwalking bears in football, then how can we measure their effect? There is certainly some thinking to be done around this, and probably a lot of inferences need to be made from the data we do have. I liken this to a kind of dark matter problem that you might need a theoretical astrophysicist to investigate! Incidentally, the co-author of the first paper linked above is a theoretical physicist called Hugo Touchette. And the director of research hired at Liverpool FC last summer (Ian Graham) has a doctorate in the subject. Clearly, even these fields are attracting the interest of football analytics, and vice versa.

I'd personally love access to off-the-ball player position data (and a long, long time to look over it), but that's something that I'm reliably informed only Prozone and the clubs who install the camera systems have access to in the EPL. Clubs almost always go down the road of keeping information proprietary, which is understandable, but it's unclear whether they have the right expertise to benefit from the information (perhaps Liverpool / Ian Graham excepted). So we need to find a workable solution for understanding defence – in much the same way that our current array of stats has proved effective at shedding light on attacking performance.

On a separate occasion I have also had confirmation that clubs do indeed tend to have more difficulty in obtaining reliable statistics for central defenders and defensive midfielders, an area in which I have also seen odd results from my own player analysis modelling – e.g. in the 2011/12 season, in some of my earliest analyses, I curiously rated Clint Hill as one of the best defenders in the league (fortunately my model has changed a fair amount since then).

So the race to find good data for defenders continues and the significance of inaction on the field continues to confound us. The analytics community has taken great leaps forward for attackers but we are still some way short of tailoring this analysis to defenders. Meaningful information in this area is in high demand!

The English (Clubs) Are Coming!

No analysis here, just some general information and a colouring-in exercise…

Preseason tours to far-away countries are now a common occurrence for Premier League clubs due to high overseas demand for the product that is the Premier League. The product is now well-established in Asia, and many teams are also trying to promote market growth in North America and Africa.

9 out of 20 teams in the League next season are staying put in Europe for their preseason, whilst newly promoted Cardiff and Crystal Palace have no plans to travel out of the UK.

Germany, USA and Hong Kong are among the most popular destinations but even Costa Rica and the Bahamas are on the itineraries this summer.

Anyway here’s a crude colour-filled map of the world illustrating the geographical areas that the current 20 teams have targeted in the alternative race to be the most profitable football club. I’ve arbitrarily split the countries that more than 1 team is going to – not by city or island in case you’re wondering.

[Map: EPL Preseason Tours 2013, countries visited by EPL teams]

2013 pre-season overseas countries visited per team, according to http://www.premierleague.com:

Arsenal: Indonesia, Vietnam, Japan, Finland

Aston Villa: Germany, Ireland

Cardiff: N/A

Chelsea: Thailand, Indonesia, Malaysia, USA

Crystal Palace: N/A

Everton: USA, Austria

Fulham: Costa Rica, Germany

Hull City: Portugal

Liverpool: Indonesia, Australia, Thailand

Man City: South Africa, Hong Kong, Germany, Finland

Man Utd: Thailand, Australia, Japan, Hong Kong, Sweden

Newcastle Utd: Portugal

Norwich City: USA, Portugal

Southampton: Spain, Austria

Stoke City: USA

Sunderland: Hong Kong

Swansea City: Netherlands

Tottenham Hotspur: Bahamas, Hong Kong, Monaco

West Brom: Germany, Hungary

West Ham Utd: Ireland, Germany

Using Past Performance to Predict Future Success

The title of this post is a little misleading. I've taken an oft-used disclaimer, 'past performance is no guarantee of future success', which is a legally required warning throughout the investment industry (and beyond) to remind customers that if an investment product has performed well before, it doesn't necessarily follow that it will continue to perform well.

Nevertheless, the majority of our decision-making and forecasting does rely on historical data or experience, which means that in reality past performance is almost always a significant factor in prediction. How much importance you place on the historical record, and how you combine it with information from [ideally] a wide range of reliable sources, is up to you, the analyst.

Today I’ve used data from the last 13 seasons (you may remember some of the charts I put together in this post) to give a sort of calculated guideline to next season’s final Premier League table.

First up, an introductory graph:

[Chart: year-on-year change in points per team]

The graph shows, for teams who were in the Premier League for consecutive seasons between 2008/09 and 2012/13, the change in total points from the previous season: 63 data points, for those wondering. Teams, it seems, generally don't improve or decline by much more than +/- 15pts from year to year, and about 80% of the time stay within 10pts of their total from the previous season.
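The underlying calculation is simple enough to reproduce. Given points totals per team per season, take the difference for teams that appear in consecutive seasons and check how many of the changes fall within +/- 10 points; here is a quick pandas sketch with made-up rows in place of the real totals:

```python
import pandas as pd

# Invented example rows; the real data is total points per team per season
df = pd.DataFrame({
    "team":   ["Everton", "Everton", "Stoke", "Stoke"],
    "season": [2011, 2012, 2011, 2012],
    "points": [56, 63, 45, 42],
})

df = df.sort_values(["team", "season"])
# Difference from the previous season; NaN where a team has no prior season in the data
df["change"] = df.groupby("team")["points"].diff()

changes = df["change"].dropna()
within_10 = (changes.abs() <= 10).mean()
print(f"{len(changes)} year-on-year changes, {within_10:.0%} of them within 10 points")
```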

Of course there are exceptions, like Liverpool's 2009/10 season, where they dropped 23pts and 5 places from the season before, or Newcastle's season just finished: -24pts and -11 places. The biggest improvements came from Newcastle again (this time in 2011/12) with +19pts and +7 places, and Spurs in 2009/10: +19pts, +4 places.

[Chart: year-on-year change in goal difference per team]

Goal difference tells a similar story; this time the 80% range is about +/- 20 goals.

There's a hint of negative skew in both charts, but I doubt I have enough data to determine whether that's significant. I'd suggest it's probably something to do with the top 7 teams' domination of the league – arguably it is easier for the rest of the league to drop points against the top teams, and for the gap between 8th and below to widen, than it is for 8th and below to make gains.

But what can we do with these historical results that might be worthwhile? Well, here are some images from my spreadsheets to try and give an idea!

[Table: league position history]

Here I've reformatted some information on the past position of each team in the Premier League since 2000/01. No Cardiff here, but every other team in the league next season has played a part for at least 1 season over the past 13. Having said that, I don't realistically think that Crystal Palace's solitary 04/05 season will actually be a good predictor of their season to come – then again, 18th place… hmm.

I’ve put some averages in there for all you regression to the mean addicts and some adjusted average & standard deviation figures. Standard deviation isn’t relevant for most teams because you need lots of data points for it to be worthwhile but I’ve put it in there anyway! Based on this I’ve calculated an expected league position range in the far right column. The ranges are quite wide for the mid-table and below teams, frankly because historically there has been a small spread of points between 8th and 18th.
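The expected-range column is essentially just summary statistics on each team's historical finishes. A rough sketch of that kind of calculation is below; the finishing positions are invented and the +/- one standard deviation band is my guess at the sort of range used, not necessarily the exact adjustment applied in the spreadsheet:

```python
import statistics

# Invented finishing positions for one team since 2000/01
finishes = [7, 11, 8, 9, 17, 6, 11, 8, 14, 9, 10, 8, 12]

mean = statistics.mean(finishes)
sd = statistics.stdev(finishes)            # needs at least two seasons of data
low = max(1, round(mean - sd))
high = min(20, round(mean + sd))
print(f"average finish {mean:.1f}, expected range roughly {low} to {high}")
```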

I’ve done the same kind of thing below for points and goal difference expected ranges:

[Tables: league points history and league goal difference history]

Now, of course, how important a factor you think history is in future results will determine how interested you are in the expected ranges in these tables. You may wish to make certain additional adjustments if, for example, you think that West Ham or Southampton are being undervalued by the historical record – or perhaps that Stoke are being overvalued now that Pulis is no longer their manager.

However, I'd say it's a reasonable base set of information to which, as analysts, we can apply Bayesian inference as new information arrives: injuries to star players in certain teams, new signings that raise expectations of a team's outlook, the impact of a new manager on a team's results, and so on.

The teams that this analysis considers most prone to relegation (judging by the points table) – in addition to the newly promoted teams – are Sunderland, Aston Villa, Southampton and West Ham.

Team Strengths, Weaknesses & Dependencies: 2012/13

Team Radar Score Charts

[Radar charts: team strengths, parts 1-3]

The radar charts above show the rebased average impact per game for the outfield players in every Premier League team, grouped by position (D, DM, M, AM, FW).

As a sentence, that opening won't actually mean anything to anybody except me, so an explanation is in order. To create these charts I took a number of steps with the intention of 'rebasing' the performance scores that my EPL model (TPOEM) produced for last season. More frequent readers of my articles will remember that players like Bale, Suarez and Cazorla completely dominated the TPOEM stats, and similarly some teams, which I describe as 'busy', fared much better than others – without necessarily correlating to their league position after 38 games. For example, TPOEM liked Arsenal, Liverpool and Spurs a lot – in part due to the overwhelming scores of the star players noted above, but also because they all engage in a lot of activity on the pitch, which is considered more important by the stats TPOEM uses (things like shots on target, accurate passes in the final third and so on). This isn't exactly fair, because a lot happens on the pitch that fairly simple stats like these don't capture. BUT, on the other hand, the benefit of using them is that I don't have to watch 380 games a season to compile a body of evidence, they measure performance more objectively than video, and they complement other analysis. And, of course, this type of data analysis can be done in seconds rather than hours, days or weeks. As long as we keep in mind the limitations of what we are looking at, we can reduce our chances of being blinded by the results.

For this set of charts, I removed the concept of a team's total performance score and set it to 1 in every match. Then I redistributed each player's effect on goals scored and goals conceded (in both cases a higher score is better) so that they add up to 1 for every team. The result above is a picture that tries to explain, more generally, in which positions each team's most important players play. They're also radar charts, and so my memory of playing the PlayStation game FIFA (a few years ago now) gave this exercise a pleasing sense of nostalgia!
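Mechanically, the rebasing is just a normalisation: divide each player's raw score by the team total so that the contributions sum to 1. A minimal sketch with invented scores (TPOEM's real inputs and scale aren't shown here):

```python
# Invented raw per-match impact scores for one team's outfield players
raw_scores = {"CB1": 0.12, "CB2": 0.10, "CM": 0.25, "AM": 0.35, "FW": 0.45}

team_total = sum(raw_scores.values())                 # whatever the team's raw total was
rebased = {player: score / team_total for player, score in raw_scores.items()}

for player, share in rebased.items():
    print(f"{player}: {share:.1%} of the team's rebased performance")
# The shares now sum to 1 for the team, regardless of the raw team total
```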

The charts could suggest in which positions teams are stronger/weaker, or perhaps the playing styles of each team. Liverpool and Aston Villa, for example, are dominated by the FW position (mainly Suarez, Sturridge and Benteke) – this shows their dependency on that area of the pitch for their performances last season. Everton, on the other hand, have impressive M and AM dominance – suggesting a dependency on a strong midfield over anything else. When looking at these graphs, just please remember that a proportional score of 10% for Arsenal’s defence will not equal 10% for Stoke’s defence, because each team has been rebased from a better or worse score than 1.

Player Charts

The players are shown below. Here we get a visual representation of how TPOEM distributed scores and involvement per player per game, this time broken down by impact on goals for and goals against [again, higher scores for both are better/more involved]. This time we can see by eye which players are more or less important to each team, and we also get a feel for the concentration of scores in particular areas. Disclaimer: you don't see how much each player played – for example, Manchester United's Buttner enjoyed some excellent scores for the 382 minutes he played, so if you misread the graph he looks to be one of the best defenders United had last season – but really what this is saying is that he was heavily involved when he did play. For that reason it may have been useful to include total minutes/appearances, but it's too late for that – use your head!

And as far as GKs go, don’t take too much notice of their results. The sad truth is I haven’t incorporated a decent method of scoring GK performance yet.

Does it surprise you that no team was more reliant on one player than Spurs were on Gareth Bale? Probably not. How about Matthew Upson’s impact for Stoke? Yes, that one is odd, but only because Upson played 1 match for Stoke last season and TPOEM gave him a good score for it (i.e. ignore it! The same goes for Victor Moses in Wigan’s chart and Dembele for Fulham!). Yes, I should know better than to include the small samples.

[Rebased player charts, one per team: Arsenal, Aston Villa, Chelsea, Everton, Fulham, Liverpool, Man City, Man Utd, Newcastle, Norwich, QPR, Reading, Southampton, Stoke, Sunderland, Swansea, Spurs, West Brom, West Ham, Wigan]

Conceding Shots – A brief expansion

This brief post builds slightly on the excellent work published by @Colinttrainor on his site statsbettor.

I hope he won’t mind my use of the data to try and highlight a few additional points of discussion…

I took his image, re-ordered it by actual goals conceded, and added rankings for his defined prime and secondary shot volumes, to review how well they explain the actual number of goals conceded by each team last season:

[Image: Shots against expansion 1.2]

Spurs actually rank #1 in terms of the volume of shots in prime and secondary positions allowed but 8th in terms of goals conceded. West Ham, on the other hand, allowed the second-worst number of shots in prime and secondary positions but managed to finish mid table in terms of goals conceded. But is that luck or skill?

Defensive pressure on the shooting player, goal-keeping ability or plain old luck could feasibly be reasons for this.

At the bottom of the above table, I input an assumed average expected goals per shot to try to attribute how much each shot allowed might be worth to the opposing team – this wasn't scientifically calculated, just a ballpark set of figures to infer the goals we might anticipate each team would concede if the season were replayed (several times). I'm suggesting that a shot in the prime area is worth something like 0.25 goals, a shot in the secondary area 0.1 goals, and then 0.05 and 0.04 for the marginal and poor areas respectively.
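In code, that expected-goals-conceded figure is just a weighted sum of the shots allowed in each zone. The zone values below are the ballpark ones from the paragraph above, while the shot counts are invented purely for illustration:

```python
# Assumed value (in goals) of a shot conceded from each zone, as in the post
SHOT_VALUES = {"prime": 0.25, "secondary": 0.10, "marginal": 0.05, "poor": 0.04}

def expected_goals_conceded(shots_allowed):
    """shots_allowed: dict of zone -> number of shots conceded over the season."""
    return sum(SHOT_VALUES[zone] * count for zone, count in shots_allowed.items())

# Invented season totals for one team
example = {"prime": 120, "secondary": 150, "marginal": 180, "poor": 90}
print(f"Expected goals conceded: {expected_goals_conceded(example):.1f}")
```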

The result is below:

[Image: Shots against expansion 2.4]

Arsenal, West Brom and QPR are closest to the expected goals tally, suggesting perhaps that luck didn’t play much of a part in their defensive results last season.

And a simple indication of probability in the far-right column suggests that, assuming skill plays no part in under- or over-performing against the expected result (I think it certainly does to some extent), Wigan, Southampton and Newcastle were the most unlucky in terms of goals conceded, whilst West Ham, Stoke (who else?) and Sunderland were the luckiest defensively in the league last season.

Of the teams in the top half, Liverpool and Spurs both conceded more goals than expected (+5 and +10 respectively) which suggests a need for improvement if we cannot apportion their underperformance to luck.

**NB @colinttrainor's work strips out headed shots, but my additions are based on all goals conceded. A relatively small but not insignificant note to the above.**