Thursday, July 31, 2008

Choosing the right tools

Having established two types of model that could be used for this project, we now turn our attention to choosing one of them. The fundamental difference between the two is that in the logistic regression model of the match outcome, a 5-0 thrashing is treated (for modeling purposes) the same as an ugly 1-0 win, while an entertaining 3-3 draw does not differ in any way from a boring 0-0 stalemate. Alternatively, modeling the whole probability distribution of match scores (i.e. assigning a probability estimate to each and every scoreline) allows for these differences, at the potential cost of a higher estimation burden (possibly accompanied by an increased estimation error).

In effect we have a simple model and a more complex one. We don't know much about their forecasting ability yet, although one could argue that sufficient backtesting should clarify whether one is significantly better than the other. However, one consideration that should be taken into account is that modelling the whole distribution of goals (i.e. using the complex model) would also highlight cases where other types of betting (rather than 1X2 betting) could be profitable, such as the "Number of Goals" market, Asian handicap, or even correct score betting.

As stated above, the chosen model tries to mimic the behaviour of two variables, namely the goals scored by the home team and those scored by the away team. In the univariate case, goals in a match can be seen as events taking place at an unknown rate, assumed to be constant throughout a time period (in the football scenario, 90 minutes). This is called a "Poisson process". The probability distribution of goals, that is the probability of the number of goals scored by the end of the match being 0, 1, 2, ..., is entirely dependent on the unknown scoring rate. The bivariate case is simply an extension of this scenario, with the two teams having possibly differing scoring rates. But since they are playing each other, they interact, so their scoring rates are probably not independent and some sort of dependence structure is needed.
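To make the univariate case concrete, here is a minimal sketch of the Poisson probabilities for a single team's goals. The scoring rate of 1.5 goals per match is purely a made-up figure for illustration, and the function name is my own; nothing here reflects the rates or the dependence structure of the actual model.

```python
import math


def poisson_pmf(goals: int, rate: float) -> float:
    """Probability of exactly `goals` goals when the scoring rate per match is `rate`."""
    return math.exp(-rate) * rate**goals / math.factorial(goals)


# Hypothetical scoring rate (goals per 90 minutes), chosen only for illustration.
rate = 1.5

for goals in range(6):
    print(f"P({goals} goals) = {poisson_pmf(goals, rate):.3f}")
```

Changing the rate is the only way to change the shape of the distribution, which is what "entirely dependent on the unknown scoring rate" means in practice.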

Anyone reading this blog will excuse me for not divulging further details for now. When adjustments are needed, more information on the type of model used may be given out but for now, allow me to keep some things to myself... Until then we shall be focusing on what may or may not be useful in predicting football scores.

Saturday, July 26, 2008

The project

Today's post is somewhat different. It's the beginning of a long but hopefully successful journey. You see, one of my reasons for having a blog was to have a chance to combine a few of my interests, namely football, betting and statistics. Of course, I could do this without putting my thoughts on the blog, but having them here would put some sort of pressure on me to carry on, even if the going gets tough! And who knows, if it proves to be profitable, somebody else may benefit too!

As mentioned before, this is simply the beginning, an introduction if I may. The idea is to develop a football match prediction model which estimates the probability of a Home win, a Draw and an Away win (HDA from now on) as accurately as possible. These estimated probabilities would be used to decide whether the odds offered by a bookmaker provide value. But I am getting ahead of myself here. Let's take things from the beginning.
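As a rough illustration of what "value" means here: with decimal odds, a bet has positive expected profit when the estimated probability multiplied by the odds exceeds 1. The sketch below assumes decimal odds and uses made-up numbers; the function names are mine, not part of any model described on this blog.

```python
def expected_profit_per_unit(probability: float, decimal_odds: float) -> float:
    """Expected profit for a 1-unit stake, given an estimated win probability."""
    return probability * decimal_odds - 1.0


def is_value_bet(probability: float, decimal_odds: float) -> bool:
    """A bet offers value when its expected profit is positive."""
    return expected_profit_per_unit(probability, decimal_odds) > 0


# Hypothetical example: the model says 50% for a home win, the bookmaker offers 2.10.
print(is_value_bet(0.50, 2.10))  # odds are longer than the fair price of 2.00
print(is_value_bet(0.50, 1.90))  # odds are shorter than the fair price of 2.00
```

Of course, the whole exercise is only as good as the probability estimate that goes in.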

In strict statistical terms, modeling the outcome of a football match is a classic example of (multinomial) logistic regression. The "event" (that's the match) can take any of three "values" (H, D or A) with some unknown probability which we are trying to estimate. These probabilities will depend on factors such as the teams playing, home advantage, and possibly other variables such as the form of each team. It is our mission to find the model which mimics the real results as closely as possible.
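One way to picture a three-outcome logistic model is the multinomial (softmax) form, where each outcome gets a score built from the match factors and the scores are turned into probabilities. The sketch below is only an illustration: the single `strength_gap` feature and all the coefficient values are invented, and the away win is arbitrarily taken as the baseline outcome.

```python
import math


def softmax(scores):
    """Convert a list of scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hda_probabilities(features, coefficients):
    """Return H, D, A probabilities; the away win is the baseline with score 0."""
    scores = []
    for outcome in ("H", "D"):
        weights = coefficients[outcome]
        score = weights["intercept"]
        score += sum(weights[name] * value for name, value in features.items())
        scores.append(score)
    scores.append(0.0)  # baseline outcome (A)
    return dict(zip(("H", "D", "A"), softmax(scores)))


# Hypothetical coefficients and feature value, for illustration only.
coefficients = {
    "H": {"intercept": 0.4, "strength_gap": 0.8},
    "D": {"intercept": 0.1, "strength_gap": 0.3},
}
features = {"strength_gap": 0.5}

probs = hda_probabilities(features, coefficients)
print({outcome: round(p, 3) for outcome, p in probs.items()})
```

In a real application the coefficients would be estimated from past results rather than typed in by hand.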

One could also suggest that since the outcome of a match depends on the number of goals scored by each team, a better idea would be to model the match score itself. Such an analysis would provide probability estimates for every single possible result i.e. 0-0, 0-1, 1-0, 2-1, or even 3-5. By adding the appropriate probabilities one would be able to infer the estimated probability of H, D or A.
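The "adding the appropriate probabilities" step can be sketched as follows. To keep things simple, this example assumes that the two teams' goals are independent Poisson variables, which is a simplification and not necessarily how the final model will work; the scoring rates are also invented. Scorelines are capped at 10 goals per team, and the totals are rescaled to account for the tiny probability left outside the grid.

```python
import math


def poisson_pmf(goals, rate):
    """Probability of exactly `goals` goals for a given scoring rate."""
    return math.exp(-rate) * rate**goals / math.factorial(goals)


def score_grid(home_rate, away_rate, max_goals=10):
    """grid[h][a] = probability of the scoreline h-a, ASSUMING independent goals."""
    return [
        [poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
         for a in range(max_goals + 1)]
        for h in range(max_goals + 1)
    ]


def hda_from_grid(grid):
    """Add up scoreline probabilities into Home win, Draw and Away win."""
    home = draw = away = 0.0
    for h, row in enumerate(grid):
        for a, p in enumerate(row):
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    total = home + draw + away  # slightly below 1 because the grid is truncated
    return home / total, draw / total, away / total


# Hypothetical scoring rates, for illustration only.
home, draw, away = hda_from_grid(score_grid(home_rate=1.6, away_rate=1.1))
print(f"H={home:.3f}  D={draw:.3f}  A={away:.3f}")
```

The same grid could also be summed in other ways, for example over all scorelines with more than two goals in total.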

So to recap, on the one hand we have some sort of logistic regression model of the match outcome, while on the other we have a (bivariate) probability distribution model of the match score. There are pros and cons in either case, which probably brings me to George Box's "All models are wrong, but some are useful". Only time will tell whether the chosen approach is useful.

Thursday, July 24, 2008

The contract

I know, I know... three weeks without a post and one could easily think that this blog is over before it even began! Had I signed a contract though, the frequency of the updates would have definitely been inserted as a clause and I would have to respect it... or would I?

Contracts are the flavour of this year's summer. And it's not just the Cristiano Ronaldo saga (I was thinking of including a link but where do I begin? Maybe this is the most appropriate) which is still going on with as many twists and turns as the player's moves on the football pitch. Gareth Barry has also been involved in a tug of war between Aston Villa and Liverpool regarding his registration. So should players be forced to honour the contracts that they have already signed or should they be allowed to blackmail their clubs, forcing them to accept a transfer?

Let's take Ronaldo's example: he plays for Manchester United but he more or less says that he would like to go to Real Madrid. Of course, he is under contract with the Red Devils, a contract that he was expected to see out unless both his team and himself agreed upon its cancellation. Real Madrid play the game very carefully: they know that approaching the player without Man Utd's agreement would probably land them some sort of fine by FIFA (although it is Real Madrid that we are talking about...). At the same time, they are making noises about wanting Ronaldo dressed in the white shirt of Madrid.

Ronaldo himself is also in a delicate position. He knows what he wants (the whole world probably does) but is also aware that he may well still be at Man Utd come September 1st. He does not want to burn any bridges ... yet! So he sits still. Officially demanding a transfer would probably cost him (in money terms), whereas should Real launch a bid that is accepted, he would benefit more.

But what of the contract which is already under way? Manchester United obviously do not want to sell him, and they may even try to force him to stay by flatly rejecting any approach. And then what? You end up with a disgruntled player who is not producing to the best of his ability or, even worse, who is forced to watch the game from the stands (as Ferguson threatened). Furthermore, the dressing room morale may take a hit and as a result the team's form may plunge, especially when we are talking about such an important player.

So who is right in all of this? In my opinion, contracts are there to be respected. If a player is not happy, he should have thought twice before agreeing to what was offered. If he desperately wants to move, then he should find a club that will adequately compensate the team which holds his registration. "Adequately" here means at a price which represents the value of the player to his current team (and of course that value may be different to the value of the player to his prospective team), however that is defined. Now whether the valuation is representative or not, that's another post!

Tuesday, July 1, 2008

May the best team win!

Euro 2008 is officially over with Spain lifting the trophy after 44 long years. The general consensus is that one of the best teams, if not the best, has won it, even though many of the pundits had their doubts. For me they were the most complete team, with an athletic although sometimes suspect defense, a fluid midfield and a lethal attack. Their only negative was the "underachievers" tag which is always attributed to Spain, but hopefully this will now change.

The final itself was not an especially great match, but you couldn't help but admire Spain's fluidity. Xavi was justly named player of the tournament, as it was his movement and passing which made the team tick. But not naming the rest of that midfield would be a disservice to players such as Senna, who provided a strong shield to the back four, Silva and Iniesta, who created chances for themselves or their teammates, and Fabregas, who was always dangerous in the final third. Special mention also goes to Xabi Alonso, who was simply in imperious form when given the chance, especially against Greece.

Betting-wise, my bets proved to be profitable. The fact that two of my four picks (Germany, Spain, Italy and Croatia) made it to the final meant that a cover bet was not necessary. A total outlay of 14.22 units on these four picks, with a return of 4.17 x 6.00 = 25.02 units for the winner, means a yield of 75.9% which is not bad for a start. [On a side note, I am planning to keep track of these tips and who knows, maybe sometime they will prove profitable for other people too!]
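For anyone who wants to check the arithmetic, the yield is simply the profit divided by the total amount staked. A quick sketch using the figures above (the function name is my own):

```python
def betting_yield(total_staked: float, total_returned: float) -> float:
    """Profit as a fraction of the total amount staked."""
    return (total_returned - total_staked) / total_staked


total_staked = 14.22          # units staked across the four picks
total_returned = 4.17 * 6.00  # stake on the winner times its decimal odds

print(f"Return: {total_returned:.2f} units")
print(f"Yield:  {betting_yield(total_staked, total_returned):.1%}")
```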

Now that Euro 2008 has finished and the transfer window is officially open, teams will turn to improving their squads. Scolari has been busy at Chelsea, signing Deco for an estimated £8m, while Arsenal and Liverpool will soon be sealing the signings of Samir Nasri and Andrea Dossena respectively. Further comment on the transfer dealings will be coming soon. Until then, I am off to enjoy the Euro 2008 winnings .... not!