Tag Archives: Football

England’s Turning Point? (an ode to Rooney’s goal in Brazil)

Hindsight is a wonderful thing. I actually wrote the below post a week or 2 after England’s miserable exit from the World Cup but didn’t publish it because I sensed the derision you [the reader] would have towards my ‘silver lining’ attitude to England’s poor results in Brazil.

I am now releasing it because a) England are playing this week, b) England impressed in last month’s win over Switzerland, and c) I was reminded of it (just a little bit) by this goal scored by the Ipswich U14 team. England clearly weren’t good enough in Brazil, but my memory remains that they gave the impression of a decent team whilst in possession of the ball. Results notwithstanding, this represented a sizeable improvement on the style of play seen in the previous 10 years (or so).

Rooney’s inconsequential goal against Uruguay was one of the best I’ve seen from the Three Lions in a game of importance.

England fans have been treated to a few goals of individual brilliance over the past 20 years: Gazza at Euro 96, Owen at France 98, Joe Cole at Germany 06 and Beckham’s last minute free-kick against Greece to qualify for the 2002 World Cup. But Rooney’s first World Cup goal was, for once, an excellent team goal, which sets it apart from the rest – and perhaps that is why I enjoyed it so much.

The goal against Uruguay isn’t quite forgotten, inasmuch as it was only a couple of weeks ago, and English readers will likely remember Rooney sliding the ball into the net from Johnson’s cross.

But no match reports, highlight reels or analysis I can find seem to appreciate quite how the attack swept from one corner of the pitch to the other. 26 seconds from start to finish involving 7 different players.

The deluge of doom and gloom that Suárez’s freak winner brought on totally overshadowed what was overall a reasonable performance, and an excellent goal. The (UK) pre-game betting market had the odds pretty close between the teams which in itself suggests that Uruguay were favourites but on balance I still think England can count themselves ‘unlucky’ to have lost the match.

All the highlights of the goal appear to begin when Sturridge collects the ball. Admittedly, his improvisation under pressure from 2 players is the most elegant part of the move but we need to rewind 15 seconds to see where the play began.


You can watch it again in full here: select the analysis section and navigate to 6:52. And mute the miserable commentary from Dixon. It’s documented as Rooney’s first goal at a World Cup, but little else. I wonder how the goal would have been received if Argentina, Brazil or Germany had scored it? I admit that other teams DO score goals like this, but England? Really?

Embellished text commentary:

  1. Suárez takes the ball on the turn from Cáceres’ throw-in, only for Jagielka to steal in near the England corner flag and advance with the ball, laying it on to Lallana who had doubled up on Suárez. Meanwhile, Suárez hopelessly slumps onto his back in hope of a free-kick from the referee’s assistant (time 0-3s)
  2. Lallana takes a touch and lays the ball short to Rooney deep on the left flank, who, under pressure from the retreating Cáceres, nudges the ball back towards his own goal and then stretches to thread a pass to Gerrard through the legs of the onrushing González. Rooney, having fallen upon passing to Gerrard, picks himself up and begins his run towards goal (time 3-6s)
  3. Gerrard collects the ball mid-way between the penalty area and the halfway line, switching the ball to Johnson on the right (time 6-10s)
  4. Johnson stops the ball and then pushes it further forward and wide to Henderson and then runs inside him. Henderson, receiving the ball just inside the Uruguayan half under pressure from Cavani, takes the ball further wide and then passes forward to Sturridge (time 10-16s)
  5. Sturridge, with his back to goal, drags the ball inside taking it out of Cavani’s reach, then turns outside from the challenge of Pereira, leaving Pereira on the ground. With Johnson now ahead of him on the right, Sturridge plays a nicely weighted pass encouraging Johnson to change direction and move towards the goal (time 16-21s)
  6. Sturridge’s pass also tempts Godín wide and too close to the advancing Johnson and he is also left on the ground as Johnson controls and pushes the ball in one movement directly into the penalty area (time 21-24s)
  7. Now at the final line of defence, Johnson is weakly challenged by Lodeiro as he crosses the ball along the ground into the 6-yard box (time 24-25s)
  8. Rooney ghosts in behind Cáceres to pass the ball into the net (time 25-26s)
  9. Henderson and Johnson celebrate with a front-on-knee-slide-hug in the penalty area that isn’t weird at all

Sturridge took 6 touches; everyone else took a maximum of 2, controlling the ball and moving it on.

This insignificant goal remains at the very least a small endorsement of the potential that Hodgson’s England team had at the tournament, and above all how England’s style of play (at least in possession of the ball) has improved since 2012.


World Cup 2014 Data

It’s been a while since the last post I made on this blog, but it’s World Cup season so I had to contribute something.

I’ve collected some player/team data from the FIFA website which anyone can download to look for interesting stuff. I’ve only put basic data in there, nothing too technical, but there is a collection of passing and tracking stats and a handful of other categories for every game so far: World_Cup_2014_group_stage < Click to download data in .xlsx format

If you use this data and see any problems with it let me know. For the USA-Ghana match, a handful of the stats didn’t seem to be published in the usual format so that one is incomplete. I also noticed that the high-intensity distance covered stats for the same game looked strange (probably incorrect) – use with caution.

Here is a small selection of charts based on the published dataset…




(*NB I removed values for USA-Ghana in the above chart)




Srna and Di Maria pop up a couple of times with top speeds clocked over 31km/h. Aurier was observed at the fastest speed of 33.52km/h in the Ivory Coast-Colombia game.


For total distance covered Bradley makes 3 appearances in the top 20 for his efforts in all 3 games.

EPL Transfer Heatstroke

I’m adding to the long list of articles reviewing the transfer dealings that have taken place this summer. I don’t have a lot to add to some excellent pieces I read this week, so I’ll provide some links to them first and some charts after.

This one, from @mixedknuts on Stats Bomb is an entertaining and well-reasoned leveller discussing Everton’s reportedly expensive loan signings – reminding us that spending big sums isn’t really a good strategy for teams that are subject to heavy budget constraints.

This, from @altmandaniel throws a cold glass of water in the dreaming fan’s face to say that spending on attack isn’t everything – despite almost every team (except Cardiff?) having generally focused on goalscoring additions to their squads in the transfer window.

And this, from @TheM_L_G on Grantland looks at Spurs’ squad evolution over recent years in light of their recent headline-grabbing dealings.

There are no doubt several other pieces of a similar ilk that I haven’t read, but these are nice articles because I think they combine fairly well to give us a straightforward education in transfer spending and strategy – particularly useful to refer to when we observe last-ditch transfer deadline madness where longer-term strategy appears to fly out of the window in favour of impulse.

Anyway, on to the information I put together.

I’ve consolidated some data on transfer spending in the EPL over time and taken a fresh look at spending vs points/goal difference. More than anything I suppose I wanted to put some things into context.

I took everything below from transfermarkt.co.uk if you’re wondering/shocked by the numbers:

[Chart: EPL Rev vs Expenditure]

Current season net spend is indeed bigger than any previous season (shown by the bubble sizes above), and we haven’t had the January transfer window yet – this shows how much the EPL has been looking abroad for talent this year and indicates the reluctance of EPL teams to sell to rivals (Suarez, Rooney and Baines are a case in point).

[Chart: Net Spend Per Season Big 5]

I suppose this graph shows again how much the EPL does like to subsidise its rival leagues! However, Spain actually eclipsed the EPL net spend once in the last 5yrs (09/10) all thanks to Ronaldo.

As an aside, I don’t buy the English talent drain argument when it comes to the national team but this does indicate how much the EPL loves an import.

[Charts: Net Spend vs Change in Pts EPL; Net Spend vs Change in GD EPL]

For the 2 charts above, I took the net spend of every club and compared their change in points total, and change in goal difference, against the previous season. 187 results since 2002/03 (excluding newly promoted teams). Others before me have done this and I have done it again – concluding that there’s basically no relationship here between net spend and a team’s resulting points growth.
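For anyone who wants to repeat this kind of check, here is a minimal sketch of measuring the strength of the relationship with a Pearson correlation. The rows in `club_seasons` are made-up illustrative numbers, not the transfermarkt data behind the charts, and the function name is just an assumption for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Made-up illustrative rows: (net spend in £m, change in points vs previous season).
club_seasons = [(25.0, -4), (-10.0, 6), (40.0, 3), (5.0, -8), (-20.0, 1), (15.0, 2)]
net_spend = [row[0] for row in club_seasons]
points_change = [row[1] for row in club_seasons]

r = pearson_r(net_spend, points_change)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")
```

A value of r close to zero is what "basically no relationship" looks like in numbers; swapping `points_change` for a change in goal difference gives the second chart's version of the same check.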

There are winners and losers yes, and certain examples buck the trend, but in general spending more money won’t buy you happiness (at least immediately anyway)!

[Table: Success Stories]

We do have some success stories here, although how much of the change in points totals can be attributed only to transfer dealings is quite contestable. Above I listed teams whose net spend made them £20m+ better off and yet still increased their points total on the year before. Take a bow West Ham? Oh, wait, the following year their points total was down to 35 so perhaps not! Arsenal feature twice thanks to Wengerball.

[Table: Disaster Stories]

And now here are the teams whose net outlay was £20m+ but they still managed to end up more than 10pts worse off than the season before.

The West Ham example above suggests we could put an extra season lag on the changes and see what happens to the relationship – in many of the examples I just published, the teams may well have traded heavily in January thus making the opportunity for dealings to affect performance slightly lower in that season. But adding a season lag doesn’t make a difference, I checked. Take my word for it.

So what am I saying here? I’m reiterating many before me but spending big doesn’t get you big improvement fast. Regime changes, as happened at Chelsea and Manchester City, are sort of an exception, but they have enjoyed consistent high spending over numerous years to bring success. I wouldn’t even say that they’re better at picking star players, rather they have a better chance of buying star players simply because they are buying more of them. It’s high time the average EPL club wised up and changed their transfer approach because they’re really not that good at spending money.

Random League Generator

Here I have attached a downloadable Excel file to share a simple league table generator I put together recently. Click the link below to open it:

Random League Generator

I performed a quick test of downloading the file from this site and my PC tried to open it as an old-school .xls file so you might want to save the file down and open it separately if you’re having trouble. I haven’t tidied, protected or hidden much information in it so consider it yours to use and browse as you wish.

This mini-exercise is intended to provide an illustration of the randomness of results in football in general. Having just finished reading Chris Anderson and David Sally’s The Numbers Game, in which randomness and luck are discussed at great length, I thought I’d take a look at the possible results and league tables we might expect if the league was truly random.

In the first tab ‘Random League Table’ I have created results and tables for 10 seasons for 20 hypothetical teams A-T, all of whom have about a 47% chance of winning a home game, 26% chance of a draw and 27% chance of an away win.
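For readers without Excel, the same idea can be sketched in a few lines of Python. This is not a port of the spreadsheet: it only assumes the 47/26/27 split quoted above, a double round-robin of 380 games and 3pts for a win, and the function name and seed are made up for the example.

```python
import random
from itertools import permutations

# Outcome probabilities quoted in the post for every fixture.
HOME_WIN, DRAW, AWAY_WIN = 0.47, 0.26, 0.27


def simulate_season(teams, rng):
    """Play every home/away pairing once and return the table, best first."""
    points = {team: 0 for team in teams}
    # permutations(teams, 2) yields each ordered (home, away) pair: 20 * 19 = 380 games.
    for home, away in permutations(teams, 2):
        roll = rng.random()
        if roll < HOME_WIN:
            points[home] += 3
        elif roll < HOME_WIN + DRAW:
            points[home] += 1
            points[away] += 1
        else:
            points[away] += 3
    return sorted(points.items(), key=lambda item: item[1], reverse=True)


teams = [chr(ord("A") + i) for i in range(20)]  # hypothetical teams A to T
rng = random.Random(2013)  # fixed seed so a run can be repeated

for season in range(1, 11):
    table = simulate_season(teams, rng)
    champion, champion_points = table[0]
    bottom, bottom_points = table[-1]
    print(f"Season {season}: {champion} wins with {champion_points}pts, "
          f"{bottom} finishes last with {bottom_points}pts")
```

Even though every team is identical by construction, each simulated season still produces a champion and a bottom club, which is the point of the exercise.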

In the second tab I have used real team names and input a strength rating [1-10] to calculate what the league table might look like if the team abilities are distributed in a certain way, and it also serves as a helpful (albeit basic) model to show that sometimes, if we accept the notion of randomness, an unexpected team CAN win the league from time to time or be relegated simply due to the sampling size of a 380 game season. If you want you can change the strength ratings (although you’ll have to pick numbers for each team between 0-10) and then click the orange button to refresh the results (make sure you enable macros in Excel).


TPOEM 3pm Predictions


Judging by the above, TPOEM may well be overestimating the probability of draws in general. The difference to Skybet odds suggests below par value in betting on Arsenal, Villa, Everton, Liverpool or Southampton (although might be slightly better with other bookies e.g. Betfair).

Staying true to the value-principles of the model, I’ll back draws for Reading-Liverpool and Arsenal-Norwich even though neither result represents the most probable outcome.
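To show why a less probable result can still be the value pick, here is a minimal sketch of the usual comparison between a model probability and a bookmaker price. It assumes decimal odds, and the probabilities and prices are hypothetical rather than TPOEM output or actual Skybet quotes; the function names are invented for the example.

```python
def implied_probability(decimal_odds):
    """Probability implied by a decimal price (this includes the bookmaker's margin)."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return 1.0 / decimal_odds


def expected_profit(model_probability, decimal_odds):
    """Expected profit per unit staked, if the model probability is correct."""
    return model_probability * decimal_odds - 1.0


def is_value_bet(model_probability, decimal_odds):
    """A bet offers value when the model rates it more likely than the price implies."""
    return expected_profit(model_probability, decimal_odds) > 0


# Hypothetical match: (outcome, model probability, decimal odds on offer).
market = [("home win", 0.22, 5.00), ("draw", 0.26, 4.50), ("away win", 0.52, 1.70)]

for outcome, probability, odds in market:
    print(f"{outcome}: implied {implied_probability(odds):.2f}, "
          f"model {probability:.2f}, value bet: {is_value_bet(probability, odds)}")
```

In this made-up market the away win is the most probable result but is priced too short, whereas the draw is only rated at 26% and still comes out as the value bet.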

In the next week or 2 I’ll post some more information on how TPOEM works, why I’m doing it and what I use it for. I actually use the model primarily for player appraisal, judging ex-post player performance (i.e. past results) rather than just to try and beat the bookies. Nevertheless, the prediction side is an interesting and amusing exercise (at least for me anyway!) – however this is still in its infancy and I am tweaking it every week at the moment.

The basics of TPOEM have striking similarities to Neil Charles’ model, which you can follow here: http://www.wallpaperingfog.co.uk/2013/04/football-model-under-hood.html

I also use Excel and VBA (more than I ought to, it certainly slows me down) and EPL Index player data to arrive at game expectations, but we are coming out with very different results – that just goes to show how important the modellers’ input weightings and choice of variables are to any model.

Martin Eastwood’s EI index is another worth checking out, although on the face of it it does appear to be more of a top-down approach to predicting: http://pena.lt/y/2013/04/12/ei-match-predictions-for-the-english-premier-league-6/

Top-down weighted models (focusing on macro- rather than micro-level data) are likely to be used more heavily by bookmakers which suggests that (assuming I’m right about the construction of the EI Index) Eastwood’s model may have a better chance of successfully predicting results than either TPOEM or Charles’ model. But unless any one of the models is spectacularly bad, we’ll need a lot more information before that becomes clear. Healthy competition in any case!

The Science + Football Conference: Day 2

The last in this series of reviews covers the second part of the Science + Football conference.

On day 2 of the conference I kept a low profile and stuck to my seat in the presentation theatre for most of the day. It was another day of sessions from a wide variety of speakers (as by now I had become accustomed to) including psychologists, sports scientists, statisticians, scouts and a panel session including former England manager Steve McClaren.

Dr Misia Gervis, who I noted in my earlier review of the Sports Analytics Innovation Summit, gave a presentation which really struck a chord with the post I wrote on Saturday evening. A senior lecturer in sports psychology at Brunel University, Gervis’s talk discussed positive psychology and how it can be applied at football clubs. She is actively involved in efforts to bring psychology into football clubs so that it can be used to benefit players and performance. Actually, in a follow-up to my earlier post, I had already been advised to look into the work of Jacques Crevoisier whose work with the development of psychometric tests for Liverpool and Arsenal has been well-documented (although I didn’t know of him before this tip). Gervis discussed resilience: “the ability to take hard knocks, to weather the storm and to value oneself no matter what happens” – this is affected by fear of failure, perfectionism, injury and criticism with a further impact on emotional control and decision-making. She highlighted the importance of using the concept of ‘signature strengths’ with players, where their best attributes are identified and developed to help create the right conditions for them to flourish.

We were also treated to a couple of lectures about fitness planning and training regimes by Dr Peter Krustrup of the University of Exeter and Matthew Cook, head of sports science for the MCFC academy. Both discussed how optimal fitness training for footballers involves training sessions which mimic the movements and levels of activity in a match. Krustrup included work from one of his studies, showing how yo-yo training (high intensity intermittent exercise) performance was a better indicator of match fitness than VO2 max testing – although there is a correlation for footballers. Cook explained that for academy prospects at Man City, they go so far as to look at the biological age of players vs maturation levels to try to ensure that developing players are not discriminated against in comparison to faster-growing players.

That last point links in nicely with Blake Wooster’s presentation. Wooster, business development director at Prozone, described his role as a kind of coaching scientist. His views represented the future of analysis in sport when he said that clubs should “use analytics to drive and not just inform decision-making”. Wooster’s session tied in with Rasmus Ankersen’s presentation from the day before (and to a lesser degree Cook’s reference to youth maturation) as he discussed the relative age effect in youth team football. He showed how different youth age groups are concentrated towards players born in the months directly following the cut-off point because the oldest boys are likely to be the most developed e.g. where the cut-off is 31 December, players selected in a football team are most likely to be born in January and February. He went on to describe the current Belgium national team, which has an incredibly strong first 11 at the moment, and how in recent years they overhauled their age groups to include 2 separate teams – one ‘A’ team and a development team called “the futures”. Wooster also gave an example of how Prozone calculate expected pass-success rates vs actual success rates to analyse youth players and potentially identify undervalued talent – this for me was a very satisfying use of stats to aid player appraisal. Wooster, however, did admit in the later panel session that analysis is still in an embryonic stage and that the term ‘moneyball’ in football is not particularly useful in selling analytics to clubs.
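The relative age effect is easy to check for any squad list. The sketch below is not Prozone's method: it simply counts players by quarter of the selection year, given a cut-off month, and the birth months are hypothetical.

```python
from collections import Counter


def months_after_cutoff(birth_month, cutoff_month=12):
    """1 for the first month after the cut-off, up to 12 for the cut-off month itself."""
    return (birth_month - cutoff_month - 1) % 12 + 1


def quarter_counts(birth_months, cutoff_month=12):
    """Count players in each quarter of the selection year (Q1 = oldest in the age group)."""
    counts = Counter({quarter: 0 for quarter in (1, 2, 3, 4)})
    for month in birth_months:
        quarter = (months_after_cutoff(month, cutoff_month) - 1) // 3 + 1
        counts[quarter] += 1
    return dict(counts)


# Hypothetical youth squad: birth months of 16 players, cut-off 31 December.
squad_birth_months = [1, 1, 2, 2, 2, 3, 4, 4, 5, 6, 7, 8, 9, 1, 2, 11]
print(quarter_counts(squad_birth_months))
```

With no relative age effect each quarter would hold roughly a quarter of the squad; a heavy skew towards Q1 is the pattern described in the presentation. Passing `cutoff_month=8` handles a 31 August cut-off instead.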

For what it’s worth, the presentation that I thought was the most interesting and well-measured throughout all of the 30-odd sessions I saw over the 4 day period was from Liverpool’s Director of Research Dr Ian Graham. Graham joined the club in the summer of 2012, following 7 years with a football analysis company. His presentation, entitled “The trouble with statistics” included the right balance of caution, care and logical proofs in answering a simple question: are clean sheets more important than scoring goals?

Graham’s regression analysis showed that one extra goal scored for a team is worth 1.02pts on average, whereas one extra clean sheet is worth an additional 2.99pts. From that piece of information alone I suppose one could be forgiven (if you want to be kind) for thinking that clean sheets are indeed more important than goals scored. But the R-squared of goals scored vs points is 77% whereas for clean sheets vs points it is 65%. The relationship with clean sheets is weaker because of volume – clean sheets are a limited resource whereas goals scored are unlimited in a match. In order to improve from an average level of goals scored (50 per season) to the top quartile you would need to score about 10 more on average (+20%). However in order to go from 11 clean sheets (average) to the top quartile level of 14 per season you need to improve by +27%. Hence we might say it is ‘easier’ to score more goals than to improve clean sheets. Having shown this, Graham explained that the FA really was a pioneer in football in 1981 when it became the first association to introduce 3pts for a win in order to incentivise attacking football (not that it had a major long-term effect). He also discussed the path of strategies for teams at different levels – showing that clean sheets are still relatively more important for below average sides who are less likely to outscore a top team and will have a better chance of success if they restrict their opponents from scoring.
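The two percentages come straight from the season averages quoted above, and the arithmetic can be checked in a couple of lines. The function name is an assumption for the example; the inputs are the figures from the presentation as I noted them.

```python
def relative_increase(current, target):
    """Improvement needed to get from current to target, as a fraction of current."""
    return (target - current) / current


# Average side to top quartile, using the per-season figures from the talk.
goals_needed = relative_increase(50, 60)         # 10 extra goals on top of 50
clean_sheets_needed = relative_increase(11, 14)  # 3 extra clean sheets on top of 11

print(f"Goals scored: +{goals_needed:.0%}")
print(f"Clean sheets: +{clean_sheets_needed:.0%}")
```

The same jump in league standing asks for a proportionally bigger improvement in clean sheets than in goals, which is the sense in which scoring more is the ‘easier’ route.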

The last session I attended was the coaching panel with Steve McClaren, Paul Holder (FA national coach) and Scott Miller (first-team fitness coach at Fulham). As I noted in my previous post, McClaren began by talking about how great it was to be able to support an exhibition of innovation in football, before saying that “all I want from you science people is fitness and injury stats”! He insisted that management needs to be allowed to work on instinct – which just goes to show the reality of the challenge that analytics has to overcome if it is ever to become fully integrated at football clubs. He gave some useful insight into his knowledge of the differences in coaching between the Netherlands and England – in the Netherlands it seems that football-related training and fitness training with a ball are given more of an emphasis. McClaren used his experiences from Twente and Wolfsburg to argue that game intelligence in England needs to improve, giving an example of a young player at Twente who, when he was asked his opinion on team tactics for the upcoming game, gave such a full account of player positioning and where to concentrate attack/defence that he showed a good enough understanding to be one of the coaches.

One of McClaren’s final points was that “coaches go into a comfort zone where they don’t seek to learn more. More coaches should get out of their comfort zone and try to learn new skills and gain knowledge and experience”. Finally a positive from him that could be taken for analytics, although unfortunately he wasn’t talking about the use of performance data by coaches!

Final note

I have been quite prolific over the past 7 days in terms of writing and reviewing the conferences and this is for 2 main reasons. Firstly, I have been inspired with ideas and enthusiasm after attending the conferences – for anyone with a serious interest in sports analysis I would definitely recommend getting a ticket for either (or both) next year. Secondly, the readership of my blog has increased well beyond usual levels since I started it about 6 months ago so thank you to everyone who has taken an interest and in particular retweeted/shared the link of my blog to followers, colleagues and friends. My enthusiasm in posting the reviews quickly has meant it has all been a little unfiltered but I have tried my best to keep them as informative as possible!


The Science + Football Conference: Day 1

Having never attended this conference before (this was its 3rd year) I didn’t quite know what to expect – that led me to think that the organisers could perhaps make an improvement to the way they advertise and describe what the event actually is! Fortunately the conference name ‘Science + Football’ gives a pretty clear indication of the theme of the 2 days.

The event was held at the Soccerdome in North Greenwich, London with 4 main zones: coaching area, boot room, interactive arena and presentation theatre. Although I tried to see a bit of everything, my interest was mainly in the performance analysis sessions/lectures which took place in the presentation theatre – and luckily for me that was inside rather than outside in the freezing cold! I was also a bit more selective in what I saw. But I am happy to say that the standard of presentations and speakers was very high – with, amongst many other great speakers, sports science gurus from Manchester United and Manchester City, nutritionists from Arsenal and Bolton, representatives from Prozone, Liverpool FC and even a panel session with former England manager Steve McClaren.

First up on day 1 was a minute’s silence in memory of influential performance director Nick Broad, followed by a coaching session with Tony Strudwick. Strudwick’s session, where he put some youth players through their paces, was incredibly different from his session at SALDN – indicating the contrast between the 2 conferences. The coaching sessions offered good insight into coaching/training tips (less relevant for me) and they are useful in showing how skills are being developed by particular drills. The analyst in me, and this would no doubt irritate the coaches massively, would want to know the relative benefit of each drill and how they are selected – I’m living in a bit of a dreamworld there!

Something that became apparent to me over the course of the conference is how much more of an emphasis there is on coaching 1st, fitness and injury prevention 2nd, then probably psychology and performance data analysis tied in 3rd place. There is still a palpable divide between traditional thinkers (in general driven by coaches) and the appliers of science who seek to optimise the traditional approach. A quote from Steve McClaren confirmed the scale of the divide for me during his opening speech in the Sunday panel session when he stated: “All I want from you science people is fitness and injury stats”. That is the most striking quote I lifted from his talk in which he said many other useful and interesting things about coaching in the modern game but it really does hammer home the reality of how far away Moneyball-type methods are from being integrated into coaching. I will discuss McClaren’s thoughts (or indeed my interpretation of them) in the second part of my review of the conference.

Companies like Opta and Prozone are straddling the gap between conventional wisdom and the analytical approach. They are certainly selling the concept of analytics for football, with varying success, but it is clear from the difference in language used by performance data analysts and, say, sports scientists with a focus on fitness, that the performance data analysts need to heavily soften their findings and vocabulary in order to sell analytical ideas to coaches and teams.

Sometimes the analysis itself is to blame and sometimes the delivery of it misses the mark. That is particularly troublesome for sport when an analyst can easily draw spurious conclusions that undermine his research even before he undertakes the difficult task of translating analysis into something useable by coaches. Jim Hicks, head of coach education at the PFA, gave an encouraging session discussing the use of Prozone data in the Premier League. He looked at:

  1. The location of shots that result in goals
  2. The optimal passing locations that result in assists
  3. The number of touches taken by goalscorers

For me, the conclusions of this analysis could have done with a little more reasoning. I understand that Jim Hicks was using the analysis in order to frame a youth coaching session (which I didn’t watch – so I might well have missed some extra depth) and in that sense the analysis is perhaps more effective if it is short, simple and to the point. But with regard to the position of shots that result in goals it uncovered nothing beyond what players and coaches should already know – that more goals are scored in central areas close to the goal than wide areas or outside the box. We can probably take some value in teaching players who consistently shoot from long range or poor angles to stop doing it so frequently. Perhaps that is a small step forward, but do we need stats to tell us? The next revelation was that the best zones for creating an assist are again central but this time just outside the area. Not exactly rocket science: I can imagine that it would be quite patronising for a coach to be told by an analyst that if they get their players to have more possession in and around the penalty area – particularly in central positions – they will score more goals! The third point was more interesting: the highest proportion of goals scored in the Premier League are those where the scorer has only taken one touch. Even this is heavily influenced by tap-ins but I would suggest that superior technical ability will still enable certain strikers to score more if they are able to make a higher proportion of 1 or 2 touch goals – assuming the volume of goals under pressure is high enough.

Nevertheless, the questions asked of data and indeed the analysis itself have to go much further to be significant. How do we develop strategies to get players into the right positions on the field (with the ball) so that they penetrate the danger zones? Simply encouraging players to get on the ball near the penalty spot in order to score more goals is based on a brief observation of where goals are scored from, with not enough attention paid to the phases of play that most often lead to this kind of opportunity.

Garry Gelade, who has collaborated with Chelsea and more recently PSG in player recruitment analysis, proved that in some cases there is indeed more than meets the eye in terms of analytics at top level clubs. His presentation discussed a valuation of goals scored across the top European leagues. For his research, he cross-referenced a high volume of matches in the top leagues plus games between them at Champions League and Europa League level to infer rankings (a bit like the UEFA coefficient) for defence and attacking ability. From this he was able to make a judgement on the average level of defensive and offensive strength in each league and also how many goals a striker could be expected to score in the Premier League (on average) if they have scored say 15 in a season in the Eredivisie. That kind of a question, although subject to the same old criticism of the use of a mean value where the variation in individual cases can be large, still serves as a great example of the use of analytics to hone player recruitment strategy. It also gives a nod and a wink to the kind of research that does happen at top football clubs – despite the veils of secrecy in place. How much it is applied is another question; however, as outside observers we can all recognise the probable application of a player recruitment strategy at Newcastle United today – where they are clearly uncovering value in Ligue 1 in particular. There is little question in my mind that this is the result of an analysis-informed approach.

Rasmus Ankersen, apart from trying to sell his book (The Gold Mine Effect, sounds quite interesting actually!) told us about the pitfalls of talent identification and scouting. His prime example was Simon Kjaer, who even at 15 years old was not recognised for his talent and high potential by any of his coaches at the time (including Ankersen himself). Ankersen discussed many hidden factors that can be missed when we look purely at qualifications or conventional wisdom in evaluating talent. He used many examples, including one from Jamaican national team sprint coach Stephen Francis who identified 2 sprinters, 1 who runs 100m in 10.2secs and another who can run 10.6secs. Which sprinter would you prefer to train? Of course, it should depend on the circumstances in getting those times: if the 10.2s runner trains in world class facilities to strict regimes then perhaps his potential is much lower than a 10.6s runner who has developed his own style with undisciplined training. Francis would likely select the 10.6s runner who is slower on paper but has the potential – the runner in this anecdote being former world record holder Asafa Powell. The main theme of Ankersen’s work looks at the environmental, geographical and unique factors that breed success in particular ‘gold mines’ throughout the world, like the Jamaican sprint team and Ethiopian long-distance runners, many of whom come from the same small town of Bekoji. I guess I’ll have to buy the book to find out if Ankersen offers further guidance on how to spot these hidden talents!

All in all it was a very enlightening day 1 of the conference with both encouragement and discouragement for the use of analytics in sport in equal measure. Obviously I am a little behind in posting this considering the conference came and went at the weekend – but for all those interested I expect to have another review for day 2 written up in the next couple of days.

Model pitfalls and further discussion of TPOEM

Since my previous post introducing a new model for football analysis, TPOEM, I have developed and integrated some significant improvements to it.

Firstly, the speed with which I can give predictions based on team starting line-ups is much better (less manual input, more automation), so last Saturday I was able to tweet about the model’s predictions well before the 3pm kick-offs began.

Secondly, I have added a manager/leadership factor into the analysis which is dynamic and unique to each team. This adjustment is intended to ‘smooth’ the team level aggregate scores that TPOEM calculates, where the model would not otherwise capture a persistent difference between a team’s results and their underlying scores. This offsets (albeit not completely) the difference between the model’s league table and the actual league table. Why does that difference arise? Well, the basic underlying reason is the same as why a shots on goal league table does not reflect the real league table. I attribute this to a kind of quality factor that I am not picking up in the statistics I use: quality in terms of shooting can relate to the position on the pitch of a shot, whether defenders pressured the attacker and how much of a contribution the assist added to a goal scored. This quality factor will also incorporate a team’s record at home or away. For reference, the model currently seems to think that Stoke and Norwich are doing markedly better in the league than their underlying scores suggest, whilst Wigan, Southampton and QPR are all doing worse in the league than TPOEM suggests they should be doing. That might be due to luck, team playing style, management, player leadership, quality or all of the above. The model should now be slightly better at accounting for that.

Predicting part 2

So the first week of predicting using TPOEM brought me a net profit, although my biggest win was West Ham’s away win vs Stoke – and I’ve already explained that the model was distinctly anti-Stoke before the most recent update!

Again, as ever, I am seeking value, so even if TPOEM puts the probability of an outcome (win/draw/loss) at only about 40%, if the bookmakers’ odds imply a probability of 35% then I consider it an attractive bet. As it stands I haven’t been that selective about what I bet on: in fact so far I’ve been betting on every match that I ran the model for, even though in many cases the model didn’t really suggest any particular value vs the bookies.
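To make the idea of ‘value’ concrete, here is a minimal sketch in Python of the comparison described above. The function names are my own for illustration and are not part of TPOEM, and the calculation ignores the bookmaker’s margin, so treat it as a rough guide rather than a staking rule.

```python
from fractions import Fraction


def implied_probability(fractional_odds: str) -> float:
    """Convert fractional odds such as '1/1' (evens) into the probability they imply."""
    odds = Fraction(fractional_odds)
    return float(1 / (1 + odds))


def edge(model_prob: float, bookmaker_prob: float) -> float:
    """Difference between the model's probability and the bookmaker's implied one."""
    return model_prob - bookmaker_prob


def expected_return(model_prob: float, bookmaker_prob: float) -> float:
    """Expected profit per unit staked, assuming the model's probability is correct."""
    decimal_odds = 1 / bookmaker_prob  # total payout per unit staked
    return model_prob * decimal_odds - 1


# The example from the text: the model says 40%, the quoted odds imply 35%.
print(f"edge: {edge(0.40, 0.35):+.2%}")
print(f"expected return per unit staked: {expected_return(0.40, 0.35):+.2%}")
```

A positive edge only means the bet looks attractive if the model is right; a 40% outcome still loses more often than it wins.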

The result this week, from 5 games, was another net profit, this time +26% return (it was +56% last time). But that came from 2 wins, 1 void, 2 lost bets, so in a sense the net result was neutral.  I profited overall because I weighted my bets towards the most attractive in terms of value – the biggest win being a draw-no-bet backing Everton at home to Man City. The model really liked Everton’s chances mostly because Kompany, Aguero and Yaya Touré were all missing for Man City.

I also backed draw-no-bets for Liverpool, Villa and Stoke: lost, won, void respectively. And lastly I went with a draw for Swansea-Arsenal (lost) but in retrospect I shouldn’t have bothered with that bet because the model gave no conclusive direction for the game and the odds weren’t good either.

As I reformat the model’s data and find a better way of communicating its predictions/results I will publish more information on the blog, as I recognise I have kept most of the details pretty close to my chest so far. When I’m at my desk for the 3pm kick-offs I will also tweet about the model’s predictions, so if you’re interested look out for that. But if you bet then you are doing so at your own risk!!!

Introducing TPOEM

I must say I sometimes get irritated by the overuse of acronyms in today’s world but this time I’ve created my own. TPOEM rather unimaginatively stands for The Power Of Eleven Model, which I have been developing over the past few weeks.

TPOEM is the culmination of fairly light research into simple OPTA-derived football statistics that I have been analysing over the past 6 months or so. Having only really put the information together over the past week or so, it is a bit foolhardy to discuss TPOEM in any detail right now – but I have already begun using it to objectively rate player/team performance and even test its efficacy at predicting match results.

I will give some detail into how the model works. The first point of note is that it is a bottom-up system.  That means that it primarily analyses player data first and team data second. There are many reasons I wanted to approach the analysis in this way:

  • A focus on player statistics gives an objective view of a player’s importance to a team, and can help indicate which players contributed most/least to a team’s performance
  • Player statistics like goals scored and assists are readily available and easily compared between players at different clubs
  • TPOEM can potentially capture information that is useful to understanding team playing styles
  • TPOEM can potentially be used to give a prediction of a match result based on the team starting line-ups, which will give a clearer expectation of a result if key players from either team are missing

Although TPOEM is derived from fairly simple statistics, the most recent iteration incorporates 36 statistics including stats from goals scored and shots on target to tackles and ground duels. I have weighted the utility of each action and applied success rates where available to give a rating in simplified categories:

  • Defending/Ball winning
  • Passing/Ball retention
  • Attacking
  • Discipline
  • Involvement
  • Goalkeeping

Of course the overall scores are adjusted so that the most frequent actions (passing, touches, etc) do not grossly outweigh the less frequent, but arguably more important, actions such as shots on target and goals scored. At the same time, I tried to maintain some care over the relevance of goals as a statistic – of course goals win games, but why should TPOEM rate attackers more highly than defenders because they score more often? Strikers often take all the plaudits for scoring goals but since most goals are scored inside the box I have tried not to unduly credit a goal scored – in many instances it is easier to score a goal than miss. I took a similar view of assists, seeking not to overly ramp-up a player’s score simply because he completed a pass (however important it was). I have to stress that it still wasn’t quite a finger in the air approach to rating – I have reviewed correlations to team performance at various layers with the aim of giving my weightings a scientific basis.
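As a rough illustration of the kind of weighting described above, here is a minimal Python sketch. Everything in it is hypothetical: the action names, the weights, the league-average counts and the choice to scale each action by its league-average frequency are placeholders for illustration, not TPOEM’s actual statistics or algorithm.

```python
# Hypothetical table: action -> (category, weight, league-average attempts per match).
# None of these numbers come from TPOEM; they only illustrate the mechanics.
ACTIONS = {
    "pass": ("Passing/Ball retention", 1.0, 40.0),
    "tackle": ("Defending/Ball winning", 1.5, 2.0),
    "shot": ("Attacking", 2.0, 1.0),
}


def rate_player(stats: dict) -> dict:
    """Turn raw (attempts, successes) counts into a score per category.

    Each action contributes weight * (attempts / league average) * success rate,
    so a high-volume action such as passing cannot swamp a rarer one by count alone.
    """
    scores = {}
    for action, (attempts, successes) in stats.items():
        category, weight, league_avg = ACTIONS[action]
        if attempts == 0:
            continue  # no attempts means no success rate to apply
        relative_volume = attempts / league_avg
        success_rate = successes / attempts
        contribution = weight * relative_volume * success_rate
        scores[category] = scores.get(category, 0.0) + contribution
    return scores


# A made-up match: 50 passes (40 completed), 4 tackles (3 won), 2 shots (1 on target).
example = {"pass": (50, 40), "tackle": (4, 3), "shot": (2, 1)}
for category, score in rate_player(example).items():
    print(f"{category}: {score:.2f}")
```

Dividing by the league average is just one way to stop frequent actions dominating; the weights themselves are where the correlation work mentioned above would come in.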

I have now tinkered with the algorithms enough times to realise that although TPOEM in one sense gives an objective rating of player performance, in another sense it remains a reflection of its creator’s biases and research. This is a limitation of any model, which can only be improved by testing and further research.

What about results? Well I will keep publishing information over the coming weeks as I look to find suitable ways of presenting TPOEM’s output.

For now, I have run the model on the first 271 games of the Premier League season (i.e. before the kick-offs on 2 March), and I can announce the players it credits with the most man of the match performances so far this season:

Player MoM awards
Santiago Cazorla 13
Gareth Bale 10
Adel Taarabt 8
Eden Hazard 8
Leighton Baines 7
Luis Suárez 7
David Silva 6
Dimitar Berbatov 6
Juan Mata 6
Marouane Fellaini 6

This highlights the importance, according to TPOEM, of Santiago Cazorla to Arsenal’s season in terms of match-winning performances. Both Manchester sides and Arsenal lead the team man of the match awards with 22 apiece, the difference being that there is a much larger spread of players who have put in top performances for United and City in the league.


Those readers who follow me on twitter will have noticed that TPOEM liked the value of the chances of a home win for Everton and draws for Swansea vs Newcastle and Manchester United vs Norwich. Please note that this isn’t a direct match result prediction for the above – TPOEM actually had all 3 as odds-on for home wins, but the probability of a draw when compared to quoted bookmakers odds before 3pm seemed attractive at the time.

The main problem I had was in finding an efficient way to input all the line-ups in time for kick-off!

As it was, I completed my efforts and placed bets on all the 3pm kick-offs by 3.25pm – something I will have to work on going forward.

In addition to the above bets, of which only Everton’s home win against Reading paid off, I bet on a draw for Sunderland-Fulham (profit), an away win for West Ham (profit) and a win for QPR. 2 of these bets were actually placed live, with the scores at 0-0, whilst QPR were already 1-0 up at Southampton when I took the gamble of backing them to win. According to TPOEM, Chelsea were massive favourites at home to West Brom so I decided not to bother with a gamble on that game.

Most pleasing was West Ham’s away win at Stoke – a game which I am sure could just as easily have gone either way. When I ran the line-ups through TPOEM West Ham had actually already made 2 early substitutions so I incorporated those new players into the line-up. The model indicated about a 30% chance of West Ham winning, which was attractive enough when compared to quoted odds of about 9/4. Fortunately for the early prospects of TPOEM they duly achieved an unlikely result at the Britannia.

I will continue to test TPOEM’s predictive efficacy vs bookmaker odds but for any followers of the blog, please note that I am seeking value not outright wins. Even if Manchester United are heavy favourites to win at home, as they were at the weekend, I may suggest another outcome if the odds are attractive enough depending on what my early-stage model tells me!

Defence Against Dribbling: 2012-13

This is a follow-on from my previous post on the top dribbling teams and players this season in the Premier League.  Since I took the time to prepare the data to review the best dribblers with the ball at their feet I thought why not flip the information to review the opposition as well?

This is not a completely straightforward exercise because the information I have available does not identify the opposing player(s) involved when a dribble is attempted – and even if a defender does not actively tackle his opponent, his position may force the attacker into losing the ball (eg. by running out of play or into another defender). I can’t provide much insight on these ‘micro’ events on the field.

But, we can do a similar team analysis as produced in the last post to consider which teams seem to invite dribbling against them, and also which teams are particularly adept at making opposition dribblers lose the ball.

Dribbling Against Frequency

The above data is ordered by total dribbles against. We can immediately see that teams tend to dribble more often against Sunderland and Norwich (382 and 380 attempted dribbles against respectively) and least frequently against Everton (267 dribbles against). With all teams having played 23 games at the time of writing, this difference of about 5 dribbles against per game perhaps isn’t terribly telling but may give an indication of team tactics without the ball. QPR and Reading, both in the relegation zone, sit at opposite ends of this table. Everton and Spurs, meanwhile, are at the low end, allowing the fewest dribbles against, whilst Arsenal are 4th highest, despite league positions of 5th, 4th and 6th respectively. I’m intrigued by this, and admittedly I haven’t given it a lot of thought before just typing away now, but at a guess team pressing plays a part here – Moyes’ Everton in particular have a reputation for pressing across the pitch whilst AVB’s Spurs are beginning to develop a reputation for pressing high up the field. Maybe that has a bearing?

Of course, there is more to defending than pressing or allowing attackers space to dribble. Indeed, even with tight marking some players will seek to dribble to gain a yard on their opponent on the turn.
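For anyone who wants to check the per-game arithmetic, here is a short Python sketch using only the three totals quoted above (the variable names are mine):

```python
GAMES_PLAYED = 23  # every team had played 23 games at the time of writing

# Attempted dribbles against, as quoted in the text.
dribbles_against = {"Sunderland": 382, "Norwich": 380, "Everton": 267}

# Convert season totals into dribbles against per game.
per_game = {team: total / GAMES_PLAYED for team, total in dribbles_against.items()}

for team, rate in per_game.items():
    print(f"{team}: {rate:.1f} dribbles against per game")

# Gap between the most and least dribbled-against teams.
gap = per_game["Sunderland"] - per_game["Everton"]
print(f"Sunderland vs Everton: {gap:.1f} more dribbles against per game")
```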

Here’s a bar graph comparing total dribbles against (as above) but now vs total shots allowed:

Dribbling Shots Against Frequency


I’m not particularly fond of information for the sake of it and I’ll admit that the above graph is a bit of a jumble, but we do get a clearer picture of Arsenal and Reading in particular who allow disproportionately fewer (and more, respectively) shots against when compared to their dribbling against stats – perhaps adding some depth to our understanding of how effective their defensive style has been.

What about the success rates of opponents dribbling against?

Dribbles Against - Success Rates

The above graph tells us how successful the opposition has been at dribbling against the team in question. So, although Everton allow the fewest number of total dribbles against, they have the second highest success rate against – i.e. 53% of dribbles against them are successful. Wigan seem to put up the least fight when it comes to dribbling against, with opponents having 54% success on average. But most teams are spread a few points either side of 50%, with a few outliers: Manchester City, Spurs and Southampton – all of whom are no pushovers when it comes to dribbling against. City in particular have an exceptional 37% (now I’m starting to wonder why I didn’t turn the success rate on its head for intuitive purposes!). To translate that, around 1 in 3 attempted dribbles against City are successful, which is the lowest rate in the league this year and highlights their strength across the whole pitch.

Dribbles Against Per Game

This last chart is ordered by the number of successful dribbles against per game, so Everton move up a few places and Manchester City take up the position on the far right – allowing on average just under 5 successful dribbles against per game. Manchester City incidentally have conceded the fewest goals so far this year.
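The per-game figure in this chart is simply total dribbles against, multiplied by the opposition success rate and divided by games played. A small Python sketch using Everton’s figures quoted earlier in this post (267 dribbles against, 53% success, 23 games) is below; the function name is mine, and because the success rate is a rounded percentage the result is only approximate.

```python
def successful_dribbles_per_game(total_against: int, success_rate: float, games: int) -> float:
    """Estimate successful dribbles conceded per game from season totals."""
    return total_against * success_rate / games


# Everton: 267 dribbles against, 53% of them successful, 23 games played.
everton = successful_dribbles_per_game(267, 0.53, 23)
print(f"Everton: roughly {everton:.1f} successful dribbles against per game")
```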

What can we learn from this? Obviously there is more to a game of football than how often you allow the opposition to dribble successfully against you but this is still an interesting objective view of the differing defensive styles/abilities of teams in the Premier League and can no doubt be used in tandem with analysis of other defensive measures to improve our understanding of what tactics have been most effective.