Tuesday, August 5, 2008

Predicting football matches

The background is out of the way and the tools have been selected, so now it's time to get our hands dirty with the small details. Whichever model we choose, it will need as input the factors that may affect our predictions. These have to be chosen wisely; otherwise we risk living up to the old motto of "Garbage in, garbage out".

Our approach is based on the general football concepts of "class" and "form". It is often said that form is temporary while class is permanent, so we will pick variables that model these two ideas. To account for the class of each team, we start with the most obvious variable of all... the teams actually playing! In other words, when Man Utd play Hull City at home, we expect the probability of a home win to be much higher than when Man Utd play Arsenal. Past performance is a pretty good place to begin, and it turns out that decent (but not necessarily profitable) models can be built using the "class" approach. As for how to rate each team, there are both ad-hoc methods and optimization techniques. This is not a statistics lecture, so I'll leave it there for now...
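To give a flavour of the ad-hoc end of the spectrum, here is a minimal sketch (not necessarily the rating we use) that rates each team by its average league points per game over past results. The result format `(home_team, away_team, home_goals, away_goals)`, the function name `points_per_game` and the toy fixtures are all hypothetical.

```python
from collections import defaultdict

# Hypothetical result format: (home_team, away_team, home_goals, away_goals)
Result = tuple[str, str, int, int]


def points_per_game(results: list[Result]) -> dict[str, float]:
    """Ad-hoc 'class' rating: average league points per match (3 win, 1 draw, 0 loss)."""
    points: dict[str, int] = defaultdict(int)
    played: dict[str, int] = defaultdict(int)
    for home, away, home_goals, away_goals in results:
        played[home] += 1
        played[away] += 1
        if home_goals > away_goals:
            points[home] += 3
        elif home_goals < away_goals:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    return {team: points[team] / played[team] for team in played}


# Made-up results purely for illustration.
toy = [("Team A", "Team C", 4, 0), ("Team B", "Team A", 1, 1), ("Team C", "Team B", 1, 1)]
print(points_per_game(toy))  # {'Team A': 2.0, 'Team C': 0.5, 'Team B': 1.0}
```

Points per game ignores home advantage and the strength of the opposition; an optimization-based rating could try to fit both at once.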

The form of each team is something more transient. Many punters who build football prediction systems judge a team's form by its recent matches. This has its advantages: a team going through a good patch is more likely than not to have a good game. On the other hand, something that is often overlooked is the standard of the opponents against which form is being evaluated, compared with the standard of the opponent in the next match. Three straight wins against sides at the bottom of the table say less about a match against a top team than the same run against strong opposition would. Maybe there is scope for improvement here.
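A minimal sketch of one way to take the opposition into account, assuming each recent result is stored as `(points earned, opponent's class rating)` with ratings on a points-per-game scale. Scaling each result by the opponent's rating relative to the league average is just one hypothetical weighting, and `naive_form`, `adjusted_form` and the numbers are made up for illustration.

```python
# Each recent match: (points earned, opponent's class rating). Hypothetical format.
Recent = list[tuple[int, float]]


def naive_form(recent: Recent) -> float:
    """Average points over recent matches, ignoring who the opponents were."""
    if not recent:
        return 0.0
    return sum(pts for pts, _ in recent) / len(recent)


def adjusted_form(recent: Recent, league_avg: float) -> float:
    """Average points, each scaled by opponent rating / league-average rating.

    A win over a side rated above average counts for more than 3 points,
    a win over a weak side for less (hypothetical weighting scheme).
    """
    if not recent:
        return 0.0
    return sum(pts * opp / league_avg for pts, opp in recent) / len(recent)


# Made-up runs: three wins over weak sides vs. win-draw-win against strong ones.
beat_weak_sides = [(3, 0.75), (3, 0.75), (3, 0.75)]
tough_run = [(3, 2.25), (1, 2.25), (3, 2.25)]
print(round(naive_form(beat_weak_sides), 2), round(naive_form(tough_run), 2))  # 3.0 2.33
print(adjusted_form(beat_weak_sides, 1.5), adjusted_form(tough_run, 1.5))  # 1.5 3.5
```

The naive measure prefers the first run; the adjusted one prefers the second. Comparing the adjusted figure with the next opponent's rating would then address the other half of the question.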

A team can only score after it has created some chances. Sure, some teams are more lethal and take the few chances that come along, but usually the more (and better) chances a team creates, the more goals it will score. Therefore, we decided to use variables that describe the attacking capability of each team and its propensity to create chances. Unfortunately, we don't yet have a measure to classify chances as easy or difficult, although this "problem" also leaves room for improvement.
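One hypothetical way to put numbers on this is to split attacking output into how many chances a team creates per match and how often it converts them, treating every chance as equal since we have no measure of chance quality. The function `attacking_profile`, the per-match lists and the figures below are illustrative assumptions, not our actual variables.

```python
def attacking_profile(chances: list[int], goals: list[int]) -> tuple[float, float]:
    """Return (chances created per match, share of chances converted into goals).

    chances[i] and goals[i] are a team's chances created and goals scored in match i.
    All chances are weighted equally because chance quality is not measured.
    """
    if not chances or len(chances) != len(goals):
        raise ValueError("need non-empty per-match lists of equal length")
    total_chances = sum(chances)
    per_match = total_chances / len(chances)
    conversion = sum(goals) / total_chances if total_chances else 0.0
    return per_match, conversion


# Made-up numbers for three matches.
per_match, conversion = attacking_profile([12, 8, 10], [2, 1, 0])
# If the team creates 14 chances next time, its usual conversion rate suggests:
projected = 14 * conversion
print(per_match, round(conversion, 3), round(projected, 2))  # 10.0 0.1 1.4
```

Keeping the two factors separate means a measure of chance quality could later be slotted into the conversion part without touching the chance-creation part.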

Again, the small print will be left out of the blog for now, but if anyone out there is reading this and has any comments or views, please feel free to post them. Who knows, maybe a fruitful discussion will bring more of the details out into the open.
