Why you should never analyze anything in small samples except for maybe if you are a bookmaker trying to conceal that your odds are sharp as fuck

If one thing is for sure is that we are in the middle of a mind-blowing season with plenty of profitable betting opportunities ahead.

This is the final conclusion of “Serie A surprises in the 2015/2016 season”, an article posted on Pinnacle.com[1].

In it, the author looks for surprises in the first half of the season using the Brier Score, a statistical method for evaluating predictions. With this blog post I want to show that the “surprises” the article identifies weren’t surprising at all, but a result of small sample sizes.

So, first we need to know what the Brier Score is and how it works. The Brier Score allows us to objectively assess the quality of probabilistic predictions. It basically measures the gap between what was predicted and what actually happened and condenses it into a single number. For a match with three possible outcomes it ranges from 0 (a certain prediction that came true) to 2 (a certain prediction that didn’t). The lower the Brier Score, the smaller the error in your prediction model.

Pinnacle even offers an explanation of how they calculate the Brier Score for soccer games:

For example in Juventus’ first home match versus Udinese on 23rd August, Pinnacle odds implied a 75.0% win for Juventus and a 17.2% chance of draw. Hence Udinese’s actual win was very unlikely at 8.1%. … The sum of the square difference for this match is 0.750² + 0.172² + (1 - 0.081)² = 1.436, this being the Brier Score.

Coincidentally, my 82-year-old grandma from Udine also had a prediction for that game: a 99% chance of an away win and a 1% chance of a draw. Her Brier Score for that match was 0.0002[2].
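To make the arithmetic concrete, here is a minimal Python sketch of the per-match calculation as described in the quote. The function name `brier_score` and the dictionary layout are my own choices for illustration, not anything taken from Pinnacle.

```python
def brier_score(forecast, outcome):
    """Brier Score for a single match.

    forecast: dict mapping "home", "draw", "away" to predicted probabilities
    outcome:  the result that actually happened
    """
    # Squared gap between each predicted probability and what happened
    # (1 for the result that occurred, 0 for the other two).
    return sum(
        (p - (1.0 if result == outcome else 0.0)) ** 2
        for result, p in forecast.items()
    )


# Probabilities as quoted for Juventus vs Udinese (Udinese won away)
pinnacle = {"home": 0.750, "draw": 0.172, "away": 0.081}
grandma = {"home": 0.00, "draw": 0.01, "away": 0.99}

pinnacle_score = brier_score(pinnacle, "away")
grandma_score = brier_score(grandma, "away")

print(f"Pinnacle: {pinnacle_score:.4f}")
print(f"Grandma:  {grandma_score:.4f}")
```

Pinnacle’s number comes out at roughly 1.4366, which matches the quoted 1.436 apart from the last digit (the article presumably cut it off rather than rounding).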

As you might have already guessed, my grandma is not clairvoyant, and the Brier Score only becomes meaningful after evaluating a lot of games. This is because match predictions are expressed as probabilities, while the outcome of a single match is all-or-nothing (expressed as a “yes, Udinese won” and “no, Juventus did not win”), so one lucky guess can beat a sensible forecast.
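One way to see why a single match says so little is to ask what each forecast would score on average if the same match were replayed over and over. The sketch below does that under a loud assumption: that Pinnacle’s quoted probabilities (rescaled to sum to exactly 1) are the true ones. The helper `expected_brier` is a hypothetical name of mine.

```python
def brier_score(forecast, outcome):
    """Brier Score of one forecast for one realised outcome."""
    return sum(
        (p - (1.0 if result == outcome else 0.0)) ** 2
        for result, p in forecast.items()
    )


def expected_brier(forecast, true_probs):
    """Long-run average score: each outcome weighted by how often it occurs."""
    return sum(
        p_true * brier_score(forecast, outcome)
        for outcome, p_true in true_probs.items()
    )


# ASSUMPTION: the quoted market probabilities are treated as the truth.
raw = {"home": 0.750, "draw": 0.172, "away": 0.081}
total = sum(raw.values())
true_probs = {result: p / total for result, p in raw.items()}

grandma = {"home": 0.00, "draw": 0.01, "away": 0.99}

market_long_run = expected_brier(true_probs, true_probs)
grandma_long_run = expected_brier(grandma, true_probs)

print(f"Market, long run:  {market_long_run:.3f}")
print(f"Grandma, long run: {grandma_long_run:.3f}")
```

Under that assumption grandma wins the one match that happened to be played and loses badly over the long run, which is the whole point of waiting for more games.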

But how many games should we wait until the Brier Score becomes meaningful? Pinnacle waited with their article until half the season was over, which would have been a good time to calculate the Brier Score for all games and see how accurate their predictions were (their Brier Score was 0.57 by the way, which is amazing, but we will get to that at the end).

Instead, the centerpiece of the article is this table, where the games are divided by team and by home and away results.

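[Image: the Brier Score table from the Pinnacle article, listing each team with Total, Playing Home and Playing Away columns]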

This means that the sample size shrank from over 160 games to under 20 games for the Total column and under 10 games for each Playing Home and Playing Away column, which makes the Brier Score meaningless.

To demonstrate this, I replicated the table by simulating the results as if the Pinnacle odds[3] were a perfect representation of the real odds of each game. This means that if Pinnacle said there was a 42% chance for Team A to win, Team A was given a 42% chance of victory in the simulation, and likewise for the draw and the loss. Since I used “perfect odds” for the following tables, there is by definition no surprise in them and certainly no profitable betting opportunity.
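The idea of the simulation can be sketched in a few lines of Python. This is not the code behind the tables, and the probabilities in `fixtures` are made up for illustration rather than real Pinnacle odds. It draws every result from the very probabilities used as the forecast, then compares how much the average Brier Score wobbles over 9 games and over 160 games.

```python
import random
import statistics


def brier_score(probs, outcome_index):
    """Brier Score for one match; probs is (home, draw, away)."""
    return sum(
        (p - (1.0 if i == outcome_index else 0.0)) ** 2
        for i, p in enumerate(probs)
    )


def simulate_average_brier(matches, rng):
    """Draw each result from its own 'perfect' odds and average the scores."""
    scores = []
    for probs in matches:
        outcome = rng.choices(range(3), weights=probs)[0]
        scores.append(brier_score(probs, outcome))
    return statistics.mean(scores)


# HYPOTHETICAL (home, draw, away) probabilities, invented for this sketch
fixtures = [
    (0.45, 0.28, 0.27),
    (0.60, 0.24, 0.16),
    (0.35, 0.30, 0.35),
    (0.52, 0.26, 0.22),
    (0.25, 0.27, 0.48),
]

rng = random.Random(2016)  # fixed seed so the run is repeatable


def spread(n_matches, n_runs=1000):
    """Standard deviation of the average Brier Score across simulated runs."""
    matches = [fixtures[i % len(fixtures)] for i in range(n_matches)]
    averages = [simulate_average_brier(matches, rng) for _ in range(n_runs)]
    return statistics.pstdev(averages)


small = spread(9)    # roughly one team's home games in half a season
large = spread(160)  # roughly all games in half a season

print(f"Spread over 9 games:   {small:.3f}")
print(f"Spread over 160 games: {large:.3f}")
```

Even though the forecast is perfect by construction, the 9-game averages scatter far more widely than the 160-game averages, so a single team’s home or away score can look “surprising” by chance alone.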

[Images: six simulated versions of the table (brier1 to brier6)]

The scores differ a lot from table to table. This is not because, for example, Sampdoria performed as expected in one timeline and below market expectations in another. In every simulated table, every team performed exactly as it was expected to. The sample size just isn’t large enough to show it yet.

A high Brier Score, usually indicating that the underlying model didn’t capture reality well, is in this case no indication of an inefficient betting market.

The overall Brier Score of the perfect model varied between 0.56 and 0.61. Pinnacle’s actual Brier Score for the first half of the season was 0.57.

With this in mind:

If one thing is for sure is that we are in the middle of a mind-blowing season with plenty of profitable betting opportunities ahead.

Okay, if you say so.

 

 

[1] https://www.pinnacle.com/en/betting-articles/soccer/serie-a-surprises-based-on-brier-score-method

[2] To be honest with you, I don’t have a grandma living in Udine.

[3] I used the midweek Pinnacle odds available on www.footballdata.co.uk/

 
