RPS 16/17

This table shows the Ranked Probability Scores (RPS) of the following public prediction models for gameweeks (GW) 22-38 (170 games) of the Premier League:

Betbot, bing, Euro Club Index, fivethirtyeight and my own model.

The betting market is represented by the ‘Betbrain model’, which consists of the highest available odds on the Friday before the matches.

Also included are a combination of all six public models (their mean) and a simple model that is explained here.
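
As a rough illustration of the combined model, averaging could look like the sketch below. It assumes each model gives its forecast as three probabilities in the order home win, draw, away win. The function name and the numbers are made up for the example and are not taken from the table.

```python
def average_forecast(forecasts):
    """Average several models' forecasts for one match, outcome by outcome.

    forecasts: list of probability lists, each ordered (home, draw, away).
    """
    n_models = len(forecasts)
    n_outcomes = len(forecasts[0])
    return [sum(f[i] for f in forecasts) / n_models for i in range(n_outcomes)]


# Two hypothetical models forecasting the same match
combined = average_forecast([[0.5, 0.3, 0.2], [0.3, 0.3, 0.4]])
print(combined)
```

Since every input forecast sums to 1, the averaged forecast does too.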


The Ranked Probability Score is a statistical method to evaluate the quality of predictions with outcomes that have an order to them. This makes it a good tool to measure the quality of football predictions, as football matches have ranked outcomes (home win > draw > away win).

The RPS measures how far the predicted probabilities were from what actually happened. This means that a lower RPS is better, as it more or less represents the ‘error’ in the predictions.

You can read more about the RPS in Constantinou and Fenton’s paper ‘Solving the problem of inadequate scoring rules for assessing probabilistic football forecast models’.
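
For a single match, the RPS can be sketched in a few lines of Python. This is a minimal example using the standard definition (cumulative forecast minus cumulative outcome, squared and averaged over the first two outcome categories), not the code behind the table. The function name and the example probabilities are made up for illustration.

```python
def rps(probs, outcome):
    """Ranked Probability Score for one match.

    probs:   forecast probabilities ordered (home win, draw, away win)
    outcome: index of the actual result (0 = home, 1 = draw, 2 = away)
    """
    cum_forecast = 0.0
    cum_outcome = 0.0
    total = 0.0
    # The last cumulative term is always 1 - 1 = 0, so it is skipped.
    for i in range(len(probs) - 1):
        cum_forecast += probs[i]
        cum_outcome += 1.0 if i == outcome else 0.0
        total += (cum_forecast - cum_outcome) ** 2
    return total / (len(probs) - 1)


forecast = [0.5, 0.3, 0.2]  # hypothetical forecast
print(rps(forecast, 0))     # home win: small error
print(rps(forecast, 2))     # away win: large error
```

Because the score works on cumulative probabilities, a forecast that favoured the home side is punished more for an away win than for a draw, which is exactly why the order of the outcomes matters.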


ROI stands for return on investment. This column shows how much each of the models would have made betting against the highest available odds that were used for the Betbrain model.

To make the ROI of each of the models comparable, this formula was used to decide how much to bet on each outcome:

(expected profit)/(odds-1) = stake

This is not meant to show how much each of the models is expected to win over the long run. It is actually meant to show the opposite: a positive ROI does not mean that a prediction method was more accurate than the betting market.
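
The stake formula above can be sketched as follows. The sketch rests on a few assumptions that are not spelled out in the text: odds are decimal odds, ‘expected profit’ means the expected profit per unit staked (probability times odds, minus 1), and no bet is placed when that value is not positive. Under these assumptions the formula has the same form as the Kelly criterion, so the result can be read as a fraction of the bankroll. The function name and numbers are made up for illustration.

```python
def stake(prob, odds):
    """Stake for one outcome: (expected profit) / (odds - 1).

    prob: the model's probability for the outcome
    odds: decimal odds offered for that outcome (assumed convention)
    """
    expected_profit = prob * odds - 1.0  # per unit staked (assumption)
    if odds <= 1.0 or expected_profit <= 0.0:
        return 0.0  # assumption: skip bets without a positive expectation
    return expected_profit / (odds - 1.0)


print(stake(0.5, 2.2))  # model sees value in the odds, so it bets
print(stake(0.4, 2.0))  # no value, so no bet
```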

Perfect and z-score columns

These two columns can show us a bit more about how to interpret the RPS for each model. I’ve simulated all matches a couple of thousand times based on the probabilities of each model to calculate the average RPS for every model, assuming its probabilities were perfect.

So, if the model perfectly captured reality, we would expect an RPS similar to the value in the ‘perfect’ column.

The z-score shows how many standard deviations the actual RPS strayed from the mean (average) of the simulated predictions.


This can help us identify overconfident and lucky prediction models, which we could not identify with their actual RPS alone.

A very low (negative) z-score indicates that the model was overconfident. A high (positive) z-score indicates that even if the predictions were perfect, the model still got very lucky with the actual outcomes of the games.
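
The simulation behind these two columns could look roughly like the sketch below. It is not the code used for the table. It assumes that match outcomes are drawn independently from the model's own probabilities, and it assumes the sign convention z = (simulated mean - actual RPS) / standard deviation, chosen so that a negative z-score means the actual RPS was worse than the simulations suggest, in line with the reading above. Function names, the seed and the example forecasts are made up for illustration.

```python
import random
import statistics


def rps(probs, outcome):
    """Ranked Probability Score for one match, outcomes ordered home/draw/away."""
    cum_forecast = cum_outcome = total = 0.0
    for i in range(len(probs) - 1):
        cum_forecast += probs[i]
        cum_outcome += 1.0 if i == outcome else 0.0
        total += (cum_forecast - cum_outcome) ** 2
    return total / (len(probs) - 1)


def mean_rps(forecasts, outcomes):
    """Average RPS over a set of matches."""
    return sum(rps(p, o) for p, o in zip(forecasts, outcomes)) / len(forecasts)


def perfect_and_z(forecasts, outcomes, n_sims=2000, seed=0):
    """Return (actual RPS, 'perfect' RPS, z-score) for one model."""
    rng = random.Random(seed)
    simulated = []
    for _ in range(n_sims):
        # Draw each result from the model's own probabilities,
        # i.e. pretend the model describes reality perfectly.
        drawn = [rng.choices(range(len(p)), weights=p)[0] for p in forecasts]
        simulated.append(mean_rps(forecasts, drawn))
    perfect = statistics.mean(simulated)
    spread = statistics.stdev(simulated)
    actual = mean_rps(forecasts, outcomes)
    # Assumed sign convention: negative z = actual RPS worse than simulated.
    z = (perfect - actual) / spread
    return actual, perfect, z


# Hypothetical model: same forecast for 50 matches that all end in away wins
forecasts = [[0.5, 0.3, 0.2]] * 50
actual, perfect, z = perfect_and_z(forecasts, [2] * 50)
print(actual, perfect, z)
```

In this made-up example the model backed the home side every time and every match ended in an away win, so the actual RPS is far above the ‘perfect’ value and the z-score comes out strongly negative.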

If you have any further questions about the table or the explanations, just let me know.