Form versus fixtures is a classic debate, but one that is extremely hard to quantify and is generally left to pure theory.
With a history of pre-match goalscorer odds, plus the actual xG and goals for the 2018/19 Premier League season, at my disposal, this is a debate I can examine numerically.
I will refer to the goals forecast by odds as ‘fG’, post-match expected goals as ‘xG’ and actual goals as ‘G’ throughout this post.
The first step is determining what counts as form in this analysis. I’ll be looking at 2 types of form:
- Raw Form: When a player is firing in lots of goals relative to what was expected beforehand (G/fG)
- Underlying Form: When a player's underlying xG is high relative to what was expected beforehand (xG/fG)
With the database for the 2018/19 season this is not so hard: I just need to select a GW range and collect each player's G, xG and fG.
For example, in GW1-5 Mohamed Salah scored 2 goals, had an xG of 3.89 and was expected from odds to score 3.43 goals in 426 minutes of football. According to the ‘Raw Form’ measure he looks out of sorts (2G/3.43fG = 58.3%), while his ‘Underlying Form’ is positive (3.89xG/3.43fG = 113%).
*xG is taken from Understat & scaled for seasonal bias
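The two ratios are simple enough to sketch in a few lines of Python. The class and method names below are made up for illustration; only the Salah figures come from the post.

```python
from dataclasses import dataclass


@dataclass
class PlayerWindow:
    """One player's totals over a chosen gameweek range (names are illustrative)."""
    name: str
    goals: float  # G: actual goals
    xg: float     # xG: post-match expected goals
    fg: float     # fG: goals forecast from pre-match odds

    def raw_form(self) -> float:
        # Raw Form = G / fG
        return self.goals / self.fg

    def underlying_form(self) -> float:
        # Underlying Form = xG / fG
        return self.xg / self.fg


# Salah's GW1-5 figures as quoted in the post.
salah = PlayerWindow("Mohamed Salah", goals=2, xg=3.89, fg=3.43)
print(f"Raw Form: {salah.raw_form():.1%}")                # Raw Form: 58.3%
print(f"Underlying Form: {salah.underlying_form():.1%}")  # Underlying Form: 113.4%
```

A ratio above 100% means the player beat the pre-match expectation on that measure, and below 100% means he fell short.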
*Implied goalscorer chance from odds (fG) has bookmaker margins removed and is scaled by minutes played
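The post does not spell out how the odds are turned into fG, so the following is only one plausible sketch, not the method actually used. It assumes decimal anytime-goalscorer odds, a flat proportional bookmaker margin (the `margin` parameter is hypothetical), a Poisson model to turn the chance of scoring at least once into an expected goal count, and linear scaling by minutes played out of 90.

```python
import math


def forecast_goals(decimal_odds: float, margin: float, minutes: float,
                   full_match: float = 90.0) -> float:
    """Rough fG for one match under the assumptions stated above."""
    implied = 1.0 / decimal_odds     # raw implied probability, margin included
    fair = implied / (1.0 + margin)  # ASSUMPTION: flat proportional margin
    if not 0.0 <= fair < 1.0:
        raise ValueError("fair probability must be in [0, 1)")
    # ASSUMPTION: Poisson scoring, so P(at least one goal) = 1 - exp(-rate).
    rate = -math.log(1.0 - fair)
    # ASSUMPTION: expected goals scale linearly with minutes played.
    return rate * minutes / full_match


# Evens (2.0) with no margin over a full 90 minutes gives a rate of ln(2).
full_game = forecast_goals(2.0, margin=0.0, minutes=90)
half_game = forecast_goals(2.0, margin=0.0, minutes=45)
print(round(full_game, 3), round(half_game, 3))
```

Summing these per-match values over a gameweek range would give the fG total for that window.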
For the sake of this analysis, only MDs & FWs who played at least 300 minutes in both the initial 5GW period and the following 5GWs are considered. A player must have played enough minutes for a meaningful level of form to be determined in both time periods, and DFs are excluded because their occasional goals would massively distort the data.
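As a sketch, the inclusion rule boils down to a single check per player. The function name and the sample rows are invented for illustration; only the 300-minute threshold and the MD/FW restriction come from the post.

```python
MIN_MINUTES = 300
ELIGIBLE_POSITIONS = {"MD", "FW"}


def is_eligible(position: str, minutes_initial: int, minutes_following: int) -> bool:
    """True if the player is an MD/FW with enough minutes in both 5GW periods."""
    return (position in ELIGIBLE_POSITIONS
            and minutes_initial >= MIN_MINUTES
            and minutes_following >= MIN_MINUTES)


# Made-up rows: (name, position, minutes in initial 5GWs, minutes in following 5GWs).
rows = [
    ("Player A", "FW", 450, 410),  # kept
    ("Player B", "MD", 450, 120),  # dropped: too few minutes in the second period
    ("Player C", "DF", 450, 450),  # dropped: defender
]
kept = [name for name, pos, m1, m2 in rows if is_eligible(pos, m1, m2)]
print(kept)  # ['Player A']
```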
In the ‘Raw Form’ plot below, the x-axis indicates the player's form between GW1-5 and the y-axis indicates the same player's form over GW6-10. Each dot represents a player. We can see that a player scoring more goals than expected in GW1-5 didn’t give a particularly useful impression of how they went on to perform in GW6-10.
Using the same samples but switching to xG data rather than actual goals produced the ‘Underlying Form’ graph displayed below. Again, nothing useful comes out of it for this time period, though it has to be said it’s a very small sample size.
I decided to repeat the above for different time periods and ended up with the following datasets:
- Pair A: Initial Form = GW1-5, Comparison Form = GW6-10 (shown above)
- Pair B: Initial Form = GW11-15, Comparison Form = GW16-20
- Pair C: Initial Form = GW21-25, Comparison Form = GW26-30
This effectively tripled the data available to me and I was able to combine the 3 data pairs into one larger set, as what I’m really interested in is: based on how far beyond expectations a player performed over the last 5GWs, what can be expected over the next 5GWs? i.e. does form continue from one period into another?
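Pooling the pairs can be sketched as follows. The layout of `form_by_window` and the example ratios are assumptions made for illustration; only the gameweek windows come from the post.

```python
# GW windows from the post: (initial form window, comparison form window).
WINDOW_PAIRS = [
    ((1, 5), (6, 10)),     # Pair A
    ((11, 15), (16, 20)),  # Pair B
    ((21, 25), (26, 30)),  # Pair C
]


def pool_pairs(form_by_window):
    """Return one list of (initial_form, comparison_form) points across all pairs.

    form_by_window maps a (first_gw, last_gw) tuple to {player_name: form_ratio}.
    Only players present in both windows of a pair contribute a point.
    """
    points = []
    for initial, comparison in WINDOW_PAIRS:
        first = form_by_window.get(initial, {})
        second = form_by_window.get(comparison, {})
        for player in sorted(first.keys() & second.keys()):
            points.append((first[player], second[player]))
    return points


# Made-up ratios purely for illustration.
example = {
    (1, 5): {"Player X": 1.20, "Player Y": 0.80},
    (6, 10): {"Player X": 0.95},
    (11, 15): {"Player Y": 1.10},
    (16, 20): {"Player Y": 1.05},
}
pooled = pool_pairs(example)
print(pooled)  # [(1.2, 0.95), (1.1, 1.05)]
```

The same player can appear up to three times in the pooled set, once per pair, which is what makes the combined sample larger than any single pair.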
Again I show the Raw Form measure for this bigger sample, and again it indicates that there is nothing useful to be found. This means that a player scoring more than usual recently is not a particularly good reason to buy him in FPL.
Repeating this for the underlying xG data shows something more valuable for those interested in form and xG data. There appears to be a relationship, though it needs to be highlighted that it is quite a weak one. This suggests that underlying data is a superior measure of actual form.
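One common way to put a number on a relationship like this is the Pearson correlation coefficient between past and future form ratios. The post does not say which statistic was used, so treat this as an illustrative sketch; the sample values are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Invented form ratios: past 5GWs (x) against the following 5GWs (y).
past_form = [0.6, 0.9, 1.1, 1.4, 1.8]
future_form = [1.0, 0.8, 1.2, 0.9, 1.3]
print(round(pearson(past_form, future_form), 2))
```

Values near 0 mean past form says little about future form, while values near 1 mean it carries over almost perfectly.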
Here is a summary of the data for players who were flagged as high form (a ratio better than 1.0, i.e. G/fG for Raw and xG/fG for Underlying) in both the Raw and Underlying systems. One note to consider is that in the ‘Future’ 5GW samples there was a disproportionate number of goals across all players [G: 181, fG: 159.18, xG: 159.02], including out-of-form players. This means the ‘G’ readings in the tables below are inflated by ~13.7%.
From the form perspective of this analysis, the takeaways are:
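The ~13.7% figure follows directly from the totals quoted above. The `deflate` helper is a hypothetical way of correcting a ‘G’ reading for that inflation, not something taken from the post.

```python
# Totals across all players in the 'Future' 5GW samples, as quoted in the post.
future_goals = 181
future_fg = 159.18

# Goals scored relative to goals forecast, expressed as an excess over 1.0.
inflation = future_goals / future_fg - 1
print(f"{inflation:.1%}")  # 13.7%


def deflate(goals: float) -> float:
    """ASSUMPTION: scale a raw goal count down by the sample-wide inflation."""
    return goals / (1 + inflation)
```

Applied to the sample total itself, `deflate(181)` returns 159.18, i.e. the goals line back up with the forecast.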
- Past 5GW Underlying Form displayed a weak correlation to future Underlying Form & displayed a ~5% boost over typically expected performance levels
- Past 5GW ‘Raw Form’ displayed no correlation to future ‘Raw Form’. This suggests that looking at performance levels & data is likely better than focusing on results
Using Fixtures & Odds to Predict Performance
Form is not the only way to predict future performance and the other side of the debate needs to be considered. Is match difficulty more predictive of player performance, and if so by how much?
This was a much simpler study and used the same 5GW datasets as the form analysis (MDs & FWs with 300+ mins in GW1-5, GW6-10, GW11-15, GW16-20, GW21-25 & GW26-30).
The measure of match difficulty is taken as the forecast goals from bookmaker odds. Bookmakers are not reactionary when it comes to form, so their forecasts are more of an indicator of match difficulty.
The graph below shows that if a player had matches that he was expected to score in (x-axis) his goalscoring (y-axis) tended to be somewhat predictable across the samples.
Again, repeating the analysis with xG data gives the most concrete answer. Match difficulty readings, such as those from odds, are highly predictive, far more so than form. The higher correlation between pre-match odds and post-match xG, compared to odds and actual goals, also highlights how valuable both odds and xG data sources are.
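To compare how well fG explains xG versus actual goals, one option is to fit a straight line to each and compare the R² values. This is a sketch of that idea rather than the calculation behind the graphs, and the numbers below are invented, not the post's data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Invented 5GW totals for four hypothetical players.
fg = [1.0, 2.0, 3.0, 4.0]     # forecast goals from odds
xg = [1.1, 1.9, 3.2, 3.8]     # post-match expected goals
goals = [0.0, 3.0, 2.0, 5.0]  # actual goals

for label, ys in (("xG", xg), ("G", goals)):
    slope, intercept, r2 = fit_line(fg, ys)
    print(f"fG -> {label}: slope={slope:.2f}, R^2={r2:.2f}")
```

A higher R² for xG than for G would match the pattern described above, since actual goals carry more finishing luck than xG does.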
Form is incredibly difficult to read into and often luck/random happenings are mistaken for meaningful form indicators.
The data weakly supports the idea that form in the underlying data carries over and is predictable, so it is something to keep in mind but should be weighed very lightly.
Fixture/situation difficulty is several times more predictive, and for FPL purposes fixtures should certainly be considered before form, even though it’s hard to resist picking players you see scoring goals on Match of the Day.