Monthly Archives: March 2013

The Science + Football Conference: Day 2

Last up in this series of reviews is the second part of the Science + Football conference.

On day 2 of the conference I kept a low profile and stuck to my seat in the presentation theatre for most of the day. It was another day of sessions from a wide variety of speakers (as by now I had become accustomed to) including psychologists, sports scientists, statisticians, scouts and a panel session including former England manager Steve McClaren.

Dr Misia Gervis, who I noted in my earlier review of the Sports Analytics Innovation Summit, gave a presentation which really struck a chord with the post I wrote on Saturday evening. A senior lecturer in sports psychology at Brunel University, Gervis’s talk discussed positive psychology and how it can be applied at football clubs. She is actively involved in efforts to bring psychology into football clubs so that it can be used to benefit players and performance. Actually, in a follow-up to my earlier post, I had already been advised to look into the work of Jacques Crevoisier whose work with the development of psychometric tests for Liverpool and Arsenal has been well-documented (although I didn’t know of him before this tip). Gervis discussed resilience: “the ability to take hard knocks, to weather the storm and to value oneself no matter what happens” – this is affected by fear of failure, perfectionism, injury and criticism with a further impact on emotional control and decision-making. She highlighted the importance of using the concept of ‘signature strengths’ with players, where their best attributes are identified and developed to help create the right conditions for them to flourish.

We were also treated to a couple of lectures about fitness planning and training regimes by Dr Peter Krustrup of the University of Exeter and Matthew Cook, head of sports science for the MCFC academy. Both discussed how optimal fitness training for footballers involves training sessions which mimic the movements and levels of activity in a match. Krustrup included work from one of his studies, showing how yo-yo training (high intensity intermittent exercise) performance was a better indicator of match fitness than VO2 max testing – although there is a correlation for footballers. Cook explained that for academy prospects at Man City, they go so far as to look at the biological age of players vs maturation levels to try to ensure that developing players are not discriminated against in comparison to faster-growing players.

That last point links in nicely with Blake Wooster’s presentation. Wooster, business development director at Prozone, described his role as a kind of coaching scientist. His views represented the future of analysis in sport when he said that clubs should “use analytics to drive and not just inform decision-making”. Wooster’s session tied in with Rasmus Ankersen’s presentation from the day before (and to a lesser degree Cook’s reference to youth maturation) as he discussed the relative age effect in youth team football. He showed how different youth age groups are concentrated towards players born in the months directly following the cut-off point because the oldest boys are likely to be the most developed e.g. where the cut-off is 31 December, players selected in a football team are most likely to be born in January and February. He went on to describe the current Belgium national team, which has an incredibly strong first 11 at the moment, and how in recent years they overhauled their age groups to include 2 separate teams – one ‘A’ team and a development team called “the futures”. Wooster also gave an example of how Prozone calculate expected pass-success rates vs actual success rates to analyse youth players and potentially identify undervalued talent – this for me was a very satisfying use of stats to aid player appraisal. Wooster, however, did admit in the later panel session that analysis is still in an embryonic stage and that the term ‘moneyball’ in football is not particularly useful in selling analytics to clubs.
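As a rough illustration of the expected vs actual pass-success idea, here is a minimal Python sketch. How Prozone actually calculate expected success was not described in the talk, so everything here is an assumption: the `Pass` record, the `pass_rating` helper and the sample probabilities are all hypothetical, and the per-pass probability is assumed to come from some separate model of pass difficulty.

```python
from dataclasses import dataclass


@dataclass
class Pass:
    # Hypothetical: probability that an average player completes this pass,
    # as estimated by some separate difficulty model (not shown here).
    expected_success: float
    # Whether the pass was actually completed.
    completed: bool


def pass_rating(passes):
    """Return (expected rate, actual rate, actual minus expected)."""
    if not passes:
        raise ValueError("need at least one pass")
    n = len(passes)
    expected = sum(p.expected_success for p in passes) / n
    actual = sum(1 for p in passes if p.completed) / n
    return expected, actual, actual - expected


# Made-up passes for one youth player, for illustration only.
sample = [
    Pass(0.9, True),   # easy pass, completed
    Pass(0.6, True),
    Pass(0.4, False),  # difficult pass, missed
    Pass(0.3, True),   # difficult pass, completed
]
expected, actual, difference = pass_rating(sample)
print(f"expected {expected:.2f}, actual {actual:.2f}, difference {difference:+.2f}")
```

A positive difference would suggest a player completing more passes than the difficulty of those passes predicts, which is the kind of signal that might flag undervalued talent. With only a handful of passes the figure is mostly noise, so in practice it would need a large sample.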

For what it’s worth, the presentation that I thought was the most interesting and well-measured throughout all of the 30-odd sessions I saw over the 4-day period was from Liverpool’s Director of Research Dr Ian Graham. Graham joined the club in the summer of 2012, following 7 years with a football analysis company. His presentation, entitled “The trouble with statistics”, included the right balance of caution, care and logical proofs in answering a simple question: are clean sheets more important than scoring goals?

Graham’s regression analysis showed that one extra goal scored for a team is worth 1.02pts on average, whereas one extra clean sheet is worth an additional 2.99pts. From that piece of information alone I suppose one could be forgiven (if you want to be kind) for thinking that clean sheets are indeed more important than goals scored. But the R-squared of goals scored vs points is 77% whereas for clean sheets vs points it is 65%. The relationship with clean sheets is weaker because of volume – clean sheets are a limited resource whereas goals scored are unlimited in a match. In order to improve from an average level of goals scored (50 per season) to the top quartile you would need to score about 10 more on average (+20%). However, in order to go from 11 clean sheets (average) to the top quartile level of 14 per season you need to improve by +27%. Hence we might say it is ‘easier’ to score more goals than to improve clean sheets. Having shown this, Graham explained that the FA really was a pioneer in football in 1980 when it became the first association to introduce 3pts for a win in order to incentivise attacking football (not that it had a major long-term effect). He also discussed the path of strategies for teams at different levels – showing that clean sheets are still relatively more important for below average sides who are less likely to outscore a top team and will have a better chance of success if they restrict their opponents from scoring.
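The quartile arithmetic above can be checked with a few lines of Python. This is only a sketch using the figures quoted from the talk. Multiplying each per-unit points value by the size of the improvement is my own naive assumption (it treats the two regression coefficients as if they could simply be scaled up), not something Graham presented.

```python
# Figures as quoted in the summary of the talk above.
AVG_GOALS = 50                  # average goals scored per season
TOP_QUARTILE_GOALS = 60         # about 10 more than average
AVG_CLEAN_SHEETS = 11           # average clean sheets per season
TOP_QUARTILE_CLEAN_SHEETS = 14
PTS_PER_GOAL = 1.02             # regression value of one extra goal
PTS_PER_CLEAN_SHEET = 2.99      # regression value of one extra clean sheet


def relative_improvement(start, end):
    """Fractional increase needed to move from start to end."""
    return (end - start) / start


goal_uplift = relative_improvement(AVG_GOALS, TOP_QUARTILE_GOALS)
clean_sheet_uplift = relative_improvement(AVG_CLEAN_SHEETS, TOP_QUARTILE_CLEAN_SHEETS)

# Naive assumption: scale each coefficient by the size of the improvement.
points_from_goals = (TOP_QUARTILE_GOALS - AVG_GOALS) * PTS_PER_GOAL
points_from_clean_sheets = (
    TOP_QUARTILE_CLEAN_SHEETS - AVG_CLEAN_SHEETS
) * PTS_PER_CLEAN_SHEET

print(f"goals: +{goal_uplift:.0%} for roughly {points_from_goals:.1f} pts")
print(f"clean sheets: +{clean_sheet_uplift:.0%} for roughly {points_from_clean_sheets:.1f} pts")
```

On these numbers the smaller relative improvement in goals (+20%) is worth slightly more points than the larger relative improvement in clean sheets (+27%), which is consistent with the argument that a single clean sheet looks more valuable in isolation but is harder to add.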

The last session I attended was the coaching panel with Steve McClaren, Paul Holder (FA national coach) and Scott Miller (first-team fitness coach at Fulham). As I noted in my previous post, McClaren began by talking about how great it was to be able to support an exhibition of innovation in football, before saying that “all I want from you science people is fitness and injury stats”! He insisted that management needs to be allowed to work on instinct – which just goes to show the reality of the challenge that analytics has to overcome if it is ever to become fully integrated at football clubs. He gave some useful insight into his knowledge of the differences in coaching between the Netherlands and England – in the Netherlands it seems that football-related training and fitness training with a ball are given more of an emphasis. McClaren used his experiences from Twente and Wolfsburg to argue that game intelligence in England needs to improve, giving an example of a young player at Twente who, when asked his opinion on team tactics for the upcoming game, gave such a full account of player positioning and where to concentrate attack/defence that his understanding was good enough for him to be one of the coaches.

One of McClaren’s final points was that “coaches go into a comfort zone where they don’t seek to learn more. More coaches should get out of their comfort zone and try to learn new skills and gain knowledge and experience”. Finally a positive from him that could be taken for analytics, although unfortunately he wasn’t talking about the use of performance data by coaches!

Final note

I have been quite prolific over the past 7 days in terms of writing and reviewing the conferences and this is for 2 main reasons. Firstly, I have been inspired with ideas and enthusiasm after attending the conferences – for anyone with a serious interest in sports analysis I would definitely recommend getting a ticket for either (or both) next year. Secondly, the readership of my blog has increased well beyond usual levels since I started it about 6 months ago so thank you to everyone who has taken an interest and in particular retweeted/shared the link of my blog to followers, colleagues and friends. My enthusiasm in posting the reviews quickly has meant it has all been a little unfiltered but I have tried my best to keep them as informative as possible!



The Science + Football Conference: Day 1

Having never attended this conference before (this was its 3rd year) I didn’t quite know what to expect – that led me to think that the organisers could perhaps make an improvement to the way they advertise and describe what the event actually is! Fortunately the conference name ‘Science + Football’ gives a pretty clear indication of the theme of the 2 days.

The event was held at the Soccerdome in North Greenwich, London with 4 main zones: coaching area, boot room, interactive arena and presentation theatre. Although I tried to see a bit of everything, my interest was mainly in the performance analysis sessions/lectures which took place in the presentation theatre – and luckily for me that was inside rather than outside in the freezing cold! I was also a bit more selective in what I saw. But I am happy to say that the presentations and speakers were of a very high standard – with, amongst many other great speakers, sports science gurus from Manchester United and Manchester City, nutritionists from Arsenal and Bolton, representatives from Prozone, Liverpool FC and even a panel session with former England manager Steve McClaren.

First up on day 1 was a minute’s silence in memory of influential performance director Nick Broad, followed by a coaching session with Tony Strudwick. Strudwick’s session, where he put some youth players through their paces, was incredibly different from his session at SALDN – indicating the contrast between the 2 conferences. The coaching sessions offered good insight into coaching/training tips (less relevant for me) and they are useful in showing how skills are being developed by particular drills. The analyst in me, and this would no doubt irritate the coaches massively, would want to know the relative benefit of each drill and how they are selected – I’m living in a bit of a dreamworld there!

Something that became apparent to me over the course of the conference is how much more of an emphasis there is on coaching 1st, fitness and injury prevention 2nd, then probably psychology and performance data analysis tied in 3rd place. There is still a palpable divide between traditional thinkers (in general driven by coaches) and the appliers of science who seek to optimise the traditional approach. A quote from Steve McClaren confirmed the scale of the divide for me during his opening speech in the Sunday panel session when he stated: “All I want from you science people is fitness and injury stats”. That is the most striking quote I lifted from his talk in which he said many other useful and interesting things about coaching in the modern game but it really does hammer home the reality of how far away Moneyball-type methods are from being integrated into coaching. I will discuss McClaren’s thoughts (or indeed my interpretation of them) in the second part of my review of the conference.

Companies like Opta and Prozone are straddling the gap between conventional wisdom and the analytical approach. They are certainly selling the concept of analytics for football, with varying success, but it is clear from the difference in language used by performance data analysts and, say, sports scientists with a focus on fitness, that the performance data analysts need to heavily soften their findings and vocabulary in order to sell analytical ideas to coaches and teams.

Sometimes the analysis itself is to blame and sometimes the delivery of it misses the mark. That is particularly troublesome for sport when an analyst can easily draw spurious conclusions that undermine his research even before he undertakes the difficult task of translating analysis into something useable by coaches. Jim Hicks, head of coach education at the PFA, gave an encouraging session discussing the use of Prozone data in the Premier League. He looked at:

  1. The location of shots that result in goals
  2. The optimal passing locations that result in assists
  3. The number of touches taken by goalscorers

For me, the conclusions of this analysis could have done with a little more reasoning. I understand that Jim Hicks was using the analysis in order to frame a youth coaching session (which I didn’t watch – so I might well have missed some extra depth) and in that sense the analysis is perhaps more effective if it is short, simple and to the point. But with regards to the position of shots that result in goals it uncovered nothing beyond what players and coaches should already know – that more goals are scored in central areas close to the goal than wide areas or outside the box. We can probably take some value in teaching players who consistently shoot from long range or poor angles to stop doing it so frequently. Perhaps that is a small step forward, but do we need stats to tell us? The next revelation was that the best zones for creating an assist are again central but this time just outside the area. Not exactly rocket science: I can imagine that it would be quite patronising for a coach to be told by an analyst that if they get their players to have more possession in and around the penalty area – particularly in central positions – they will score more goals! The third point was more interesting: the highest proportion of goals scored in the Premier League are those where the scorer has only taken one touch. Even this is heavily influenced by tap-ins but I would suggest that superior technical ability will still enable certain strikers to score more if they are able to make a higher proportion of 1 or 2 touch goals – assuming the volume of goals under pressure is high enough.

Nevertheless, the questions asked of data and indeed the analysis itself have to go much further to be significant. How do we develop strategies to get players into the right positions on the field (with the ball) so that they penetrate the danger zones? Simply encouraging players to get on the ball near the penalty spot in order to score more goals is based on a brief observation of where goals are scored from, with not enough attention paid to the phases of play that most often lead to this kind of opportunity.

Garry Gelade, who has collaborated with Chelsea and more recently PSG in player recruitment analysis, proved that in some cases there is indeed more than meets the eye in terms of analytics at top level clubs. His presentation discussed a valuation of goals scored across the top European leagues. For his research, he cross-referenced a high volume of matches in the top leagues plus games between them at Champions League and Europa League level to infer rankings (a bit like the UEFA coefficient) for defence and attacking ability. From this he was able to make a judgement on the average level of defensive and offensive strength in each league and also how many goals a striker could be expected to score in the Premier League (on average) if they have scored say 15 in a season in the Eredivisie. That kind of a question, although subject to the same old criticism of the use of a mean value where the variation in individual cases can be large, still serves as a great example of the use of analytics to hone player recruitment strategy. It also gives a nod and a wink to the kind of research that does happen at top football clubs – despite the veils of secrecy in place. How much it is applied is another question. However, as outside observers we can all recognise the probable application of a player recruitment strategy at Newcastle United today – where they are clearly uncovering value in Ligue 1 in particular. There is little question in my mind that this is the result of an analysis-informed approach.

Rasmus Ankersen, apart from trying to sell his book (The Gold Mine Effect, sounds quite interesting actually!) told us about the pitfalls of talent identification and scouting. His prime example was Simon Kjaer, who even at 15 years old was not recognised for his talent and high potential by any of his coaches at the time (including Ankersen himself). Ankersen discussed many hidden factors that can be missed when we look purely at qualifications or conventional wisdom in evaluating talent. He used many examples, including one from Jamaican national team sprint coach Stephen Francis who identified 2 sprinters, 1 who runs 100m in 10.2secs and another who can run 10.6secs. Which sprinter would you prefer to train? Of course, it should depend on the circumstances in getting those times: if the 10.2s runner trains in world class facilities to strict regimes then perhaps his potential is much lower than a 10.6s runner who has developed his own style with indisciplined training. Francis would likely select the 10.6s runner who is slower on paper but has the potential – the runner in this anecdote being former world record holder Asafa Powell. The main theme of Ankersen’s work looks at the environmental, geographical and unique factors that breed success in particular ‘gold mines’ throughout the world, like the Jamaican sprint team and Ethiopian long distance runners, many of whom come from the same small town of Bekoji. I guess I’ll have to buy the book to find out if Ankersen offers further guidance on how to spot these hidden talents!

All in all it was a very enlightening day 1 of the conference with both encouragement and discouragement for the use of analytics in sport in equal measure. Obviously I am a little behind in posting this considering the conference came and went at the weekend – but for all those interested I expect to have another review for day 2 written up in the next couple of days.

A little more from the Sports Analytics Innovation Summit

A topic discussed by Simon Hartley from consulting firm Be World Class is the last area I want to write about in some detail as part of my summaries on SALDN. Hartley talked about ‘controlling the shift in psychological momentum’ in sport and it is an area I found quite important that otherwise wasn’t really discussed at the conference.

Psychology at this event completely took a back seat – and that has been hammered home to me by hearing a presentation today at the Science + Football conference by psychologist Dr Misia Gervis. We probably all remember former England manager Glenn Hoddle’s attitude to the importance of psychology amongst players – he famously hired faith healer Eileen Drewery in the 90s (not exactly the kind of psychological support I would recommend though!). What about the state of psychology in sport today? Truth is I don’t really know but to me it seems to be another area that is kept in the shadows with the focus purely on pre-game mental preparation. Whilst reading this anyone with an interest or education in psychology may well notice a lack of substance and experience in the arguments that follow, but please bear with me!

Hartley used in his presentation the example of Newcastle vs Arsenal in February 2011: a game famous for the incredible comeback of Newcastle to draw the game 4-4 having been 0-4 behind after 26mins. The supporting evidence he used was quite patchy and I didn’t necessarily agree with all of it – for example he mainly drew upon passing volume stats to identify that the momentum shift in the game in Newcastle’s favour actually changed midway through the first half rather than when Diaby was sent off on 50mins with the score at 0-4. But, putting those gripes to one side, there is little doubt in my mind that momentum and psychology did play a big part in the comeback at whatever point of the game it changed. Examples of large swings in performance are well-known, from a tennis player who fights back to win from 2 sets down to Olympian Ben Ainslie who famously trailed his rivals in the London 2012 Olympics but ‘got angry’ and came back to win. The best example (of many) to occur in the past 12 months or so is perhaps Europe’s Ryder Cup victory last year when coming into the last day of play Europe were all but dead and buried, and yet they came back to record the biggest ever comeback win in Ryder Cup history.

What I find most intriguing about the Ryder Cup example is the seemingly contagious effect of a small change in fortune which spreads through both teams and somehow improves one set of players’ decision-making whilst at the same time stifling the other team. Pope & Schweitzer’s 2011 study into golf showed that a golfer putting for par is more likely to be successful than when putting for birdie even if the putt for birdie is of equal difficulty – a fantastic example of the effect of loss aversion on performance. The best players in individual sports have a propensity to reset their emotions and play each point/shot without changing their natural game or being unduly affected by a mistake. Could this expand to a kind of ‘herd’ mentality across a football team that is affected by confidence in the same way as financial markets? That is perhaps where leadership and experience are important in a team.

In my opinion there is certainly enough here to justify more research into the psychology of players in a starting line-up to ensure that a team’s overall ‘mental durability’ is sufficient to withstand knocks (or lifts) to confidence. In terms of player recruitment in football, some form of this is likely to be done already to some degree, e.g. background checks on a player’s personality and whether or not they will cope in a pressured environment are already carried out. But how can we improve that process? And should we integrate more regular psychological testing/screening into our current squad? Can ‘mental strength’ be accurately measured? Can it be taught, developed or improved? Does it change with age? If a player has a few experiences of winning from a losing position or vice-versa does it affect their future behaviour? These questions aren’t new, and I can imagine a body of data is needed to reach conclusions of any use but surely there is some value in investigation and a certain degree of experimentation. Perhaps it already exists and it’s just that I haven’t seen it!

For me, the home advantage ‘phenomenon’ in sport is an incredible opportunity for psychology to try and shed some further light in a quantifiable way. I am certainly no expert on the topic, but as an analyst I see more potential in reallocating resources towards this field – much in the same way that I am enthusiastic for more investment in pure performance analytics.

Day 2: Sports Analytics Innovation Conference

Although day 2 of the conference had a slightly lower attendance, perhaps due to the lack of a presentation from a representative of a professional football club, it was every bit as interesting as day 1.

Having listened to all the speakers I’m in a better position to give a general review on my thoughts on the conference and the state of analytics on the basis of what I saw. Today we heard from top speakers from both codes of Rugby, UK Sport, the England and Wales Cricket Board, British gymnastics, the NFL and a couple of start-up companies that offer services to professional sporting organisations. We did indeed have representation of football analytics in the guise of Professor Chris Anderson, whose presentation I will discuss later on in the post, and a panel discussion including participants from the Times, Man City, AC Milan and the BOA.

The first speaker of the day, Scott Drawer from UK Sport, gave a roots and branches discussion of performance analysis and its use at elite level. He made a useful point about how sporting organisations protect their innovations by keeping them secret which in his opinion isn’t necessarily the right strategy: “it’s not what you do but how you do it”. It’s quite easy to nod and agree with this line of thinking – in terms of sharing knowledge and information – as in many cases we might think that it is preferable to level a playing field in order to identify what teams operate best with the same knowledge. But I’m not personally convinced by this perspective.

My background in business and economics reminds me that companies in some industries require incentives in order to spend heavily on research and development. Take the pharmaceutical industry as an example; for all the investment required in research, development, testing and advertising new treatments – the grand proportion of which fail long before they are allowed into consumption – the pharmaceutical company is awarded patents that protect their rights to manufacture and distribute new medicine before the competition can sell the same formula. The patent enables the innovator to gain a reward in profits for the large expenses undertaken to bring a new medicine to market and in turn the consumer benefits from better treatment.

Now to baseball, and every sport analyst’s favourite example: Moneyball. If Billy Beane’s ‘new’ methods were widely distributed as he first started pursuing the Jamesian strategy, would he have bothered to think and work outside the box? Moneyball the book had so much early popularity that the strategies and metrics used by Oakland were identified and copied very quickly by their rivals – how much longer would the approach have worked if Moneyball had never been published? No doubt by 2003 some of the secrets were being found out anyway, and simply by the departures of JP Ricciardi and Paul DePodesta more of the insider knowledge would have diffused to other teams.

Back to the conference and another example: Alicia Rankin, Data & Insights Manager for the NFL, gave a presentation with one of the heaviest focuses on team business strategy – specifically discussing the role of NFL stadiums in spectator satisfaction. She provided examples of different innovations in stadium design which consumers responded well to – in this case, sharing organisational tips of how to effectively improve infrastructure for the benefit of the customer. For this situation, sharing innovations like instant replays played for the crowd, better parking design and wi-fi infrastructure are all much easier to sell in terms of the benefits to pretty much all stakeholders – from the fan to the teams themselves and also the league. Rankin also played a video with Pink Floyd’s ‘Wish you were here’ as the soundtrack which is always going to go down well!

Bill Gerrard, current technical director for Saracens RFC who recently collaborated with Billy Beane at MLS team San Jose Earthquakes, emphasised the need for analytical ideas to be accepted from the top-down – if the leadership doesn’t buy into the notion of analytics in their sport then the process is doomed to fail. Professor Chris Anderson took this one step further by telling the forward-thinkers in the room to fight hard for their work to be used or otherwise “you might as well work in insurance!”. At least Saracens certainly seem to be instilling an holistic approach to analysis as Gerrard explained the utility of an open-plan coaching office environment and an array of coaches with diverse backgrounds including medicine and law.

I found Prof Anderson’s presentation about football analytics quite refreshing. His opening statement that “football analytics has stalled” raised a few eyebrows in the room. His argument was interesting. I am sure that the representatives from football clubs who were present would have disagreed with some of his opinions, but I get the impression that he is close to the truth. I think football could well be in a kind of post-Moneyball period of scepticism (which in reality could lead to the same mistakes in football as pre-Moneyball) where it has become all too easy to discredit analysis. Why? Because analysis of a game with 22 agents moving independently over 90mins is very complex when compared to the relatively simple baseball actions of pitching and hitting (yes I know there is more to baseball than that but you get my point). For me that’s too easy to use as an excuse – a defeatist attitude that won’t drive progress, and there is perhaps too much of a fixation on Taleb’s black swan! Football analysts could turn out to be feeding out misinformation in order to maintain the secrecy of their technical methods, but in general everyone would probably agree that there is the need for a mixture of traditional coaching and analytics. But how do we find the right mix of analytics? My impression is that the football clubs remain a little too conservative.

A few conversations I had with representatives from gambling firms confirmed my surprise that none of the speakers seemed to spend too much time or effort involved in player data analysis or prediction. Just because it’s difficult doesn’t mean we should stop trying – as Michael Bourne of the ECB noted in a quote from George E.P. Box in his presentation today, “All models are wrong, but some models are useful”. Prof Anderson’s rallying cry to drive innovation in football analysis was nice to hear considering the vast resources available to football clubs – indeed many of the sports and analysts covered by the conference have to work on much smaller budgets with much less support in terms of data availability and yet in many cases they are making a significantly bigger impact.

Day 1: Sports Analytics Innovation Conference

This was my first taste of a professional sports conference, focusing on ‘analytics’ in sport.

Analytics as a term has elbowed its way into my vocabulary without me ever really thinking about it – why not just stick with analysis? Looking analytics up online, the simplest definition is ‘the science of analysis’. But after today I think I will adopt Professor Steve Haake’s distinction of analysis as covering the investigation/interrogation of smaller data sets, whilst analytics applies at a much larger scale: to the analysis of terabytes or even petabytes of data.

Today’s conference, at the Oval in south London, was well-organised and packed full of insight and experience from a large variety of experts belonging to top sporting teams and institutions.

The subjects varied from sports medicine and performance innovation at Olympic level to Red Bull’s highly contrarian approach in training their athletes and even a discussion on the growing importance of social media in sport. We were also treated to a presentation by a performance scientist [Dr Harvey Galvin] on his work on hypoxic training applied to professional tennis players for the LTA.

The general theme of the day which was evident throughout most of the thirteen 30min presentations referred to performance analysis in sport as a ‘journey’ with general agreement about the widespread changes and innovations over the past 5-10 years that have taken place in terms of the integration of ‘big data’ into sporting institutions.

There were also many different approaches to coaching discussed – perhaps the extremes were highlighted by Red Bull’s Darren Roberts who championed putting the emphasis on the athlete first and foremost and letting the individual drive their training requirements. His statement that “we don’t look at risk” was one of the more provocative sound bites of the day! On the other side of the coin came Chelsea’s Head of Academy Performance Systems – Ben Smith – who detailed Chelsea’s extensive monitoring of academy prospects involving in some cases daily reports and several different methods to give and receive feedback between player and coach. Checks, controls and reviews that the Red Bull team would probably have baulked at! Clearly different sports and individuals at different stages of their careers need different training approaches.

The general consensus was that the data analysis currently used isn’t by any means perfect and is difficult to apply with confidence in the workplace. In most sports there is clearly a strong requirement for cost/benefit analysis with regard to research and implementation, which hinders some progress in innovation (and at the same time avoids other projects being given too much time and resource). All too often the problem seems to be that there are so many variables involved in every test, experiment or model that we can rarely be confident of correctly attributing effects to their possible causes. For my money Dr Marco Cardinale, outgoing head of sports science for the BOA, gave the best explanation of how this ‘noise’ can only be overcome by evidence-based coaching. He provided a great analogy of the pharmacology model: ‘what is the smallest dosage of medicine to give the biggest possible effect’? This philosophy with regard to the aggregation of marginal gains has clearly stood the BOA in great stead recently and on the basis of his clear and richly informative presentation Dr Cardinale will be a real loss to the British team.

What I found notable was a marked reluctance from the speakers involved in team sports (rugby and football) to speak about prediction or forecasting. On a public level, I suppose that talking over-confidently about prediction and modelling is potential career suicide, at least if you don’t state probabilistic outcomes (and even then you can end up with egg on your face). Name checked at various points throughout the day (and I expect more of the same tomorrow) were Moneyball/Michael Lewis, Nate Silver and Nassim Nicholas Taleb with his black swan theory – the cult celebrities of the data analysis movement over the past 5-10 years or so.

The highlights for me actually came in the last 2 presentations of the day: Prof Steve Haake from the Centre for Sports Engineering Research (CSER) and Tony Strudwick of Manchester United. What set these presentations apart was the speakers’ willingness to provide detail and graphical information from their work and the practical initiatives they have been working on over the past few years.

Haake gave a really interesting view of his research into measured performance for different track and field events since records began. Did you know, for example, that the global men’s 100m best times follow a general improving trend which was interrupted by WW1, WW2 and then in the 1970s when electronic timing was introduced and athletes ‘lost’ 0.4secs due to the removal of human timing (affected by reaction times)? He showed off some of the performance analysis innovations created for different Olympic sports and how the CSER team have collaborated with coaches and athletes in order to produce the best products/services possible for the greatest possible effect.

Strudwick impressed me with his detailed discussion of sports science and its application at the top level for Manchester United, particularly in injury prevention. The reported link of metabolic power training loads to injuries and the spread of injuries through the playing year was interesting. In addition, the disclosure of Man Utd’s use of Matlab and advanced statistical analysis was beyond my expectations. Their partnership with Liverpool John Moores University on scientific research also shows forward-thinking of a kind one could be forgiven for not expecting behind the scenes at a club whose hierarchy and decision-making are so often reported alongside one name: Sir Alex Ferguson.

Next up is of course day 2 of SALDN. I’m following that up with the Science + Football conference at the weekend but I’ll post some updates on my thoughts from those experiences as soon as I can.

Model pitfalls and further discussion of TPOEM

Since my previous post introducing a new model for football analysis, TPOEM, I have developed and integrated some significant improvements to it.

Firstly, the speed with which I can give predictions based on team starting line-ups (involving less manual input, more automation) is much better, so last Saturday I was able to tweet about the model’s predictions well before the 3pm kick-offs began.

Secondly, I have added a manager/leadership factor into the analysis which is dynamic and unique to each team. This adjustment is intended to ‘smooth’ the team level aggregate scores that TPOEM calculates, where the model would not otherwise capture a persistent difference between a team’s results and their underlying scores. This offsets (albeit not completely) the difference between the model’s league table and the actual league table. Why does that happen? Well, the basic underlying reason is the same as why a shots on goal league table does not reflect the real league table. I attribute this to a kind of quality factor that I am not picking up in the statistics I use: quality in terms of shooting can relate to the position on the pitch of a shot, whether defenders pressured the attacker and how much of a contribution the assist added to a goal scored. This quality factor will also incorporate a team’s record at home or away. For reference, the model currently seems to think that Stoke and Norwich are outperforming by a particularly wide margin whilst Wigan, Southampton and QPR are all doing worse in the league than TPOEM suggests they should be doing. That might be due to luck, team playing style, management, player leadership, quality or all of the above. The model should now be slightly better at accounting for that.

Predicting part 2

So the first week of predicting using TPOEM brought me a net profit, although my biggest win was West Ham’s away win vs Stoke – and I’ve already explained that the model was distinctly anti-Stoke before the most recent update!

Again, as ever, I am seeking value, so even if TPOEM suggests a probability of only about 40% for an outcome (win/draw/loss), if the bookmakers quote odds implying 35% then I consider it an attractive bet. As it stands I haven’t been that selective about what I bet on: in fact so far I’ve been betting on every match that I ran the model for even though in many cases the model didn’t really suggest any particular value vs bookies.
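To make the idea of ‘value’ concrete, here is a minimal sketch of the comparison described above. It assumes the bookmaker’s price is expressed as decimal odds and ignores the bookmaker’s margin; the function names are my own illustration and not part of TPOEM.

```python
def implied_probability(decimal_odds: float) -> float:
    """Probability implied by decimal odds, ignoring the bookmaker's margin."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return 1.0 / decimal_odds


def expected_return(model_prob: float, decimal_odds: float) -> float:
    """Expected profit per unit staked, if the model's probability is right."""
    return model_prob * decimal_odds - 1.0


def is_value_bet(model_prob: float, decimal_odds: float) -> bool:
    """A bet has value when the model rates the outcome above the implied price."""
    return model_prob > implied_probability(decimal_odds)


# The example from the text: the model says 40%, the price implies 35%.
odds = 1 / 0.35
print(is_value_bet(0.40, odds), round(expected_return(0.40, odds), 3))
```

The expected return is only as good as the model’s probability, so a positive figure here is a reason to look closer rather than a guarantee of profit.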

The result this week, from 5 games, was another net profit, this time +26% return (it was +56% last time). But that came from 2 wins, 1 void, 2 lost bets, so in a sense the net result was neutral.  I profited overall because I weighted my bets towards the most attractive in terms of value – the biggest win being a draw-no-bet backing Everton at home to Man City. The model really liked Everton’s chances mostly because Kompany, Aguero and Yaya Touré were all missing for Man City.

I also backed draw-no-bets for Liverpool, Villa and Stoke: lost, won, void respectively. And lastly I went with a draw for Swansea-Arsenal (lost) but in retrospect I shouldn’t have bothered with that bet because the model gave no conclusive direction for the game and the odds weren’t good either.

As I reformat the model’s data and find a better way of communicating its predictions/results I will publish more information on the blog, as I recognise I have kept most of the details pretty close to home so far. When I’m at my desk for the 3pm kick-offs I will also tweet about the model’s predictions, so if you’re interested look out for that. But if you bet then you are doing so at your own risk!!!

Introducing TPOEM

I must say I sometimes get irritated by the overuse of acronyms in today’s world but this time I’ve created my own. TPOEM rather unimaginatively stands for The Power Of Eleven Model, which I have been developing over the past few weeks.

TPOEM is the culmination of fairly light research into simple OPTA-derived football statistics that I have been analysing over the past 6 months or so. Having only really put the information together over the past week or so, it is a bit foolhardy to discuss TPOEM in any detail right now – but I have already begun using it to objectively rate player/team performance and even test its efficacy at predicting match results.

I will give some detail on how the model works. The first point of note is that it is a bottom-up system. That means that it analyses player data first and team data second. There are many reasons I wanted to approach the analysis in this way:

  • A focus on player statistics gives an objective view of a player’s importance to a team, and can help indicate which players contributed most/least to a team’s performance
  • Player statistics like goals scored and assists are readily available and easily compared between players at different clubs
  • TPOEM can potentially capture information that is useful to understanding team playing styles
  • TPOEM can potentially be used to give a prediction of a match result based on the team starting line-ups, which will give a clearer expectation of a result if key players from either team are missing

Although TPOEM is derived from fairly simple statistics, the most recent iteration incorporates 36 statistics including stats from goals scored and shots on target to tackles and ground duels. I have weighted the utility of each action and applied success rates where available to give a rating in simplified categories:

  • Defending/Ball winning
  • Passing/Ball retention
  • Attacking
  • Discipline
  • Involvement
  • Goalkeeping

Of course the overall scores are adjusted so that the most frequent actions (passing, touches, etc) do not grossly outweigh the less frequent, but arguably more important, actions such as shots on target and goals scored. At the same time, I tried to maintain some care over the relevance of goals as a statistic – of course goals win games, but why should TPOEM rate attackers more highly than defenders because they score more often? Strikers often take all the plaudits for scoring goals but since most goals are scored inside the box I have tried not to unduly credit a goal scored – in many instances it is easier to score a goal than miss. I took a similar view of assists, seeking not to overly ramp-up a player’s score simply because he completed a pass (however important it was). I have to stress that it still wasn’t quite a finger in the air approach to rating – I have reviewed correlations to team performance at various layers with the aim of giving my weightings a scientific basis.
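As a rough illustration of the weighting idea described above, the sketch below scores a handful of actions into categories. Everything specific in it is an assumption for the example: the four actions, the weights, the ‘typical count per match’ figures and the choice to scale by those typical counts are all hypothetical and are not TPOEM’s actual statistics or algorithm.

```python
from collections import defaultdict

# Hypothetical values for illustration only; not TPOEM's real weights.
# action: (category, weight, typical successful count per match)
ACTIONS = {
    "passes": ("Passing/Ball retention", 1.0, 40.0),
    "tackles": ("Defending/Ball winning", 1.0, 3.0),
    "shots_on_target": ("Attacking", 2.0, 1.0),
    "goals": ("Attacking", 2.0, 0.5),
}


def rate_player(stats):
    """Score a player by category.

    stats maps action -> (successes, attempts); attempts is None when no
    success rate is available for that action (e.g. goals).
    """
    scores = defaultdict(float)
    for action, (successes, attempts) in stats.items():
        category, weight, typical = ACTIONS[action]
        # Apply a success rate only where one is available.
        rate = successes / attempts if attempts else 1.0
        # Scaling by the typical count stops high-volume actions such as
        # passing from swamping rarer ones such as shots on target.
        scores[category] += weight * (successes / typical) * rate
    return dict(scores)


example = {
    "passes": (40, 50),
    "tackles": (3, 4),
    "shots_on_target": (2, 4),
    "goals": (1, None),
}
print(rate_player(example))
```

Keeping the goal weight modest in a table like this is one way to avoid over-crediting strikers, in the spirit of the reasoning above.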

I have now tinkered with the algorithms enough times to realise that although TPOEM in one sense gives an objective rating of player performance, in another sense it remains a reflection of its creator’s biases and research. This is a limitation of any model, which can only be improved by testing and further research.

What about results? Well I will keep publishing information over the coming weeks as I look to find suitable ways of presenting TPOEM’s output.

For now, I have run the model on the first 271 games of the Premier League season (i.e. before the kick-offs on 2 March), and I can announce its candidates for the most man of the match performances so far this season:

Player MoM awards
Santiago Cazorla 13
Gareth Bale 10
Adel Taarabt 8
Eden Hazard 8
Leighton Baines 7
Luis Suárez 7
David Silva 6
Dimitar Berbatov 6
Juan Mata 6
Marouane Fellaini 6

This highlights the importance, according to TPOEM, of Santiago Cazorla to Arsenal’s season in terms of match-winning performances. Both Manchester sides and Arsenal lead the team man of the match awards with 22 apiece, the difference being that there is a much larger spread of players who have put in top performances for United and City in the league.


Those readers who follow me on Twitter will have noticed that TPOEM liked the value of the chances of a home win for Everton and draws for Swansea vs Newcastle and Manchester United vs Norwich. Please note that this isn’t a direct match result prediction for the above – TPOEM actually had all 3 as odds-on for home wins, but the probability of a draw when compared to quoted bookmakers’ odds before 3pm seemed attractive at the time.

The main problem I had was in finding an efficient way to input all the line-ups in time for kick-off!

As it was, I completed my efforts and placed bets on all the 3pm kick-offs by 3.25pm – something I will have to work on going forward.

In addition to the above bets, of which only Everton’s home win against Reading paid off, I bet on a draw for Sunderland-Fulham (profit), an away win for West Ham (profit) and a win for QPR. 2 of these bets were actually placed live, with the scores at 0-0, whilst QPR were already 1-0 up at Southampton when I took the gamble of backing them to win. According to TPOEM, Chelsea were massive favourites at home to West Brom so I decided not to bother with a gamble on that game.

Most pleasing was the away win of West Ham at Stoke – a game which I am sure could just as easily have gone either way. When I ran the line-ups through TPOEM West Ham had actually already made 2 early substitutions so I incorporated those new players into the line-up. The model indicated about a 30% chance of West Ham winning which was attractive enough when compared to quoted odds of about 9/4. Fortunately for the early prospects of TPOEM they duly achieved an unlikely result at the Britannia.
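For anyone unfamiliar with fractional odds, this short sketch shows how a price such as 9/4 converts into a break-even probability. The helper name is my own and the calculation ignores the bookmaker’s margin. Since both the 30% and the 9/4 above are approximate, the margin in this case was thin: at exactly 9/4 the break-even probability is 4/13, or roughly 30.8%.

```python
def break_even_probability(fractional_odds: str) -> float:
    """Win probability at which a bet at the given fractional odds breaks even.

    Odds of "a/b" pay a units of profit for every b units staked.
    """
    profit, stake = (int(part) for part in fractional_odds.split("/"))
    if profit <= 0 or stake <= 0:
        raise ValueError("both parts of the odds must be positive")
    return stake / (profit + stake)


print(round(break_even_probability("9/4"), 4))
```

A model probability above that figure suggests value; a probability below it suggests the bookmaker’s price is not generous enough.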

I will continue to test TPOEM’s predictive efficacy vs bookmaker odds but for any followers of the blog, please note that I am seeking value not outright wins. Even if Manchester United are heavy favourites to win at home, as they were at the weekend, I may suggest another outcome if the odds are attractive enough depending on what my early-stage model tells me!