Saturday, July 26, 2008

The project

Today's post is somewhat different. It's the beginning of a long but hopefully successful journey. You see, one of my reasons for having a blog was to get a chance to combine a few of my interests, namely football, betting and statistics. Of course, I could do this without putting my thoughts on the blog, but having them here puts some sort of pressure on me to carry on, even if the going gets tough! And who knows, if it proves to be profitable, somebody else may benefit too!

As mentioned before, this is simply the beginning, an introduction if I may. The idea is to develop a football match prediction model which estimates, as accurately as possible, the probability of a Home win, a Draw and an Away win (HDA from now on) for a given match. These estimated probabilities would then be used to decide whether the odds offered by a bookmaker provide value, that is, whether they pay more than the estimated probability warrants. But I am getting ahead of myself here. Let's take things from the beginning.
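To make "value" a bit more concrete, here is a minimal sketch in Python. It assumes decimal odds and takes the model's probability at face value; the function names and all the numbers are made up for illustration and are not from any real bookmaker or fitted model.

```python
def implied_probability(decimal_odds: float) -> float:
    """Probability implied by decimal odds, ignoring the bookmaker's margin."""
    return 1.0 / decimal_odds


def expected_profit(prob: float, decimal_odds: float) -> float:
    """Expected profit per unit staked if `prob` were the true probability."""
    return prob * decimal_odds - 1.0


def has_value(prob: float, decimal_odds: float) -> bool:
    """A bet offers value when its expected profit is positive."""
    return expected_profit(prob, decimal_odds) > 0


# Hypothetical model estimates and bookmaker odds for one match.
model_probs = {"H": 0.50, "D": 0.27, "A": 0.23}
odds = {"H": 2.20, "D": 3.30, "A": 3.40}

value_bets = [o for o in "HDA" if has_value(model_probs[o], odds[o])]
print(value_bets)
```

With these made-up numbers only the Home win qualifies, since 0.50 x 2.20 is greater than 1 while the other two products fall below 1. Of course, the check is only as good as the probability estimates that go into it.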

In strict statistical terms, modeling the outcome of a football match is a classic case for multinomial logistic regression, the extension of logistic regression to more than two outcomes. The "event" (that's the match) can take any of three "values" (H, D or A), each with some unknown probability which we are trying to estimate. These probabilities will depend on factors such as the teams playing, home advantage, and possibly other variables such as the form of each team. It is our mission to find the model which mimics the real results as closely as possible.
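As a rough illustration of how such a model turns match factors into three probabilities, here is a small Python sketch of the multinomial logistic (softmax) form. The feature name `strength_diff` and every coefficient below are hypothetical placeholders, not fitted values; in practice they would be estimated from past results.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hda_probabilities(features, coefs):
    """Multinomial logit: one linear score per outcome, then softmax.

    Outcomes missing from `coefs` get a score of zero, so they act as
    the reference category (here the Away win).
    """
    scores = []
    for outcome in ("H", "D", "A"):
        w = coefs.get(outcome, {})
        score = w.get("intercept", 0.0)
        score += sum(w.get(name, 0.0) * value for name, value in features.items())
        scores.append(score)
    return dict(zip("HDA", softmax(scores)))


# Hypothetical coefficients; the H intercept plays the role of home advantage.
coefs = {
    "H": {"intercept": 0.4, "strength_diff": 0.8},
    "D": {"intercept": 0.1, "strength_diff": 0.3},
}
probs = hda_probabilities({"strength_diff": 0.5}, coefs)
print(probs)
```

Whatever the coefficients turn out to be, the three probabilities always add up to 1, which is exactly what we need for H, D and A.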

One could also suggest that since the outcome of a match depends on the number of goals scored by each team, a better idea would be to model the match score itself. Such an analysis would provide probability estimates for every single possible result, e.g. 0-0, 0-1, 1-0, 2-1, or even 3-5. By adding up the probabilities of all the scores in which the home team scores more goals than, the same number of goals as, or fewer goals than the away team, one would be able to infer the estimated probability of H, D or A.
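Here is a minimal Python sketch of that adding-up step. It assumes, purely for illustration, that each team's goals follow independent Poisson distributions; this is a simplification and not necessarily the score model that will be chosen. The goal rates 1.6 and 1.1 are made-up numbers, and the score grid is cut off at 10 goals per team and rescaled.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def score_matrix(home_rate: float, away_rate: float, max_goals: int = 10):
    """matrix[h][a] = probability of the score h-a, assuming independence."""
    return [
        [poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
         for a in range(max_goals + 1)]
        for h in range(max_goals + 1)
    ]


def hda_from_scores(matrix):
    """Add up score probabilities into Home win, Draw and Away win."""
    home = draw = away = 0.0
    for h, row in enumerate(matrix):
        for a, p in enumerate(row):
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    total = home + draw + away  # slightly below 1 because the grid is truncated
    return {"H": home / total, "D": draw / total, "A": away / total}


# Hypothetical expected goals for the home and away team.
matrix = score_matrix(1.6, 1.1)
probs = hda_from_scores(matrix)
print(probs)
```

The same adding-up works for any score model: swap in a different way of filling the matrix and `hda_from_scores` stays unchanged.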

So to recap, on the one hand we have some sort of logistic regression model of the match outcome, while on the other we have a (bivariate) probability distribution model of the match score. There are pros and cons in either case, which brings me to George Box's "All models are wrong, but some are useful". Only time will tell whether the chosen approach is useful.

1 comment:

Unknown said...

Hi buddy, you may want to refer to Dixon & Coles (1997) for your research.