Best Predictor of Wins? A (simple) Statistical Analysis



cobber66
07-23-2010, 10:29 PM
NFL people have always talked about how "Offense wins games, but defense wins championships", but is that really the case? I decided to take a look at the numbers from last season (regular season only) and see if I could find the best predictors of the total number of wins. My guess was that the best three would be offensive yards per game, yards per game allowed, and turnover margin. I also looked at yards per game differential (YPG gained minus YPG allowed) to control for different styles of play. Looking at the plots, it appears that all four are good predictors.

I decided I would do a regression analysis on these data (basically, I'm just fitting the expected number of wins as a function of offensive and defensive YPG and turnover margin). What I found was that offense does indeed win games, and turnover margin is HUGE. Here's part of the summary (basically, a smaller p-value indicates stronger evidence that the variable really is a predictor of wins):

Coefficients:
            Estimate    p-value
Defyds     -0.014258   0.279215
Offyds      0.042564   0.000201
TOmargin    0.114381   0.021892

In statistics, we generally treat a p-value of less than 0.05 as evidence that a predictor is statistically significant. Using that criterion we see that offensive YPG and turnover margin are significant predictors of wins, while YPG allowed is not (its p-value is about 0.28). Given the current state of the NFL, with the emergence of big-time passing offenses, this shouldn't be that surprising. One other interesting thing to note: the estimate of 0.114 for turnover margin. Essentially what this means is that for any given YPG gained and allowed, a one-turnover increase in TO margin amounts to an expected 0.114 additional wins. Put another way, a change of about 9 turnovers either way in a team's season TO margin is worth about one win!
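As a quick check of that arithmetic, here is a minimal sketch in Python (the original analysis was run in R; the function name `units_per_win` is made up for illustration). It simply inverts each reported estimate to get how many units of that stat correspond to one expected win, with the other predictors held fixed.

```python
# Estimates copied from the regression summary above (first model).
coefficients = {
    "Defyds": -0.014258,   # yards per game allowed
    "Offyds": 0.042564,    # offensive yards per game
    "TOmargin": 0.114381,  # season turnover margin
}

def units_per_win(estimate):
    """Units of a predictor that correspond to one expected win, other predictors held fixed."""
    return 1.0 / abs(estimate)

for name, estimate in coefficients.items():
    print(f"{name}: about {units_per_win(estimate):.1f} units per win")
```

This gives roughly 8.7 turnovers per win, which is where the "about 9" figure comes from, and roughly 23.5 offensive yards per game per win. The Defyds number should be read with caution, since that estimate was not statistically significant.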

I decided to do one more analysis. This time instead of using offensive and defensive YPG as predictors, I decided to use YPG differential. This allows me to control for different styles of play (e.g. the Jets win games a lot differently than the Colts or Saints) as well as for different field and weather conditions. YPG differential only considers the difference in yardage totals for the two teams in every game. I fit a model to last year's NFL win totals based on YPG differential and TO margin, with the following results:

Coefficients:
            Estimate    p-value
ypgdiff     0.031407   0.000153
TOmargin    0.107999   0.048150

Looks like YPG differential is a very strong predictor of win totals! For any given season TO margin, we expect roughly one extra win for every 32-yard increase in YPG differential, and for a given YPG differential, again a 9-turnover change in TO margin is worth about one extra win. So there are the numbers: if you can outgain your opponent and win the turnover battle, you have a great chance of winning (but you already knew that!).
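The two rules of thumb from this second model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the constant and function names are hypothetical, and since the intercept is not shown in the summary above, the sketch computes changes in expected wins rather than win totals.

```python
# Estimates copied from the second model's summary above.
YPG_DIFF_COEF = 0.031407   # expected wins per extra yard of per-game yardage differential
TO_MARGIN_COEF = 0.107999  # expected wins per extra turnover of season TO margin

def expected_win_change(ypg_diff_change, to_margin_change):
    """Change in expected wins for given changes in YPG differential and TO margin."""
    return YPG_DIFF_COEF * ypg_diff_change + TO_MARGIN_COEF * to_margin_change

print(round(expected_win_change(32, 0), 2))  # 32 more yards per game of differential
print(round(expected_win_change(0, 9), 2))   # 9 more turnovers of margin
```

Both calls come out close to one win, matching the figures quoted above. Because the model is linear, the two effects simply add, so a team that improves on both counts would be expected to gain about two wins.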

Finally, a note about this analysis: do not be scared, it's really not that complicated. I did it all in about a half hour on my home computer, with some Excel and a free software package called R. I hope I was able to make it as simple as possible, but if you're confused by something just reply and I'll try to explain it further. And as always, remember correlation does not imply causation!

cobber66
07-23-2010, 10:32 PM
Sorry the plots are so small, it wouldn't let me upload a bigger image.

Colts01
07-23-2010, 10:39 PM
How about giving examples from individual teams, say the Colts for example, using this method? Also, if you ever get time, I think another great factor would be avg. field position. Interesting stuff!

Polishguy00
07-23-2010, 10:47 PM
I love this stuff. Thanks for sharing it.

damgenius
07-23-2010, 10:50 PM
The picture might work better in JPG format than BMP. On my screen it appears about 40 pixels across.

Andy Freeland
07-23-2010, 10:52 PM
Promoted to user article: http://footballproslive.com/content.php/253-Best-Predictor-of-Wins-A-%28simple%29-Statistical-Analysis

I'm going to take a hard look at this tomorrow when I have more time. Admittedly, looking it over quickly tonight I didn't fully (or even partially) understand it all.

Andy Freeland
07-23-2010, 10:55 PM
Cobber, if you send me a bigger image I should be able to post it with the article. You can send it to andy at footballproslive.com.

cobber66
07-23-2010, 11:11 PM
Cobber, if you send me a bigger image I should be able to post it with the article. You can send it to andy at footballproslive.com.

It's been sent!

cobber66
07-23-2010, 11:16 PM
How about giving examples from individual teams, say the Colts for example, using this method? Also, if you ever get time, I think another great factor would be avg. field position. Interesting stuff!

I think I could do some kind of analysis like this for individual teams within a season. For example, I could model the probability of winning for the Colts based on different factors, like the ones I already mentioned, or I could even take a look at using things like Peyton Manning passing yards, for example, and see how strong of a predictor of winning that is. In fact, I like this idea a lot. I'll see if I can do something like this in the next few days/weeks, but unfortunately my time is limited :(

williwonte
07-24-2010, 01:13 PM
Since the simplest model is the most elegant, your second is better. I also wonder if yards gained and allowed are truly uncorrelated, since both are time related. I hope you will use your spare time to fine-tune your analysis, rather than carousing and lesser activities.

theshow47
07-24-2010, 03:09 PM
I think I could do some kind of analysis like this for individual teams within a season. For example, I could model the probability of winning for the Colts based on different factors, like the ones I already mentioned, or I could even take a look at using things like Peyton Manning passing yards, for example, and see how strong of a predictor of winning that is. In fact, I like this idea a lot. I'll see if I can do something like this in the next few days/weeks, but unfortunately my time is limited :(

I like the idea of individual team analysis. I think passing yards per attempt would be a great stat for this analysis. From what I hear, a QB needs to have an average of 7.5 to have a passing game that keeps the defense honest.

To piggyback on Colts01's comment about starting field position, I think we should look at special teams in some way, like yard differential in the kicking game (kickoff yards + punt yards - return yards). The variability of the kicking game would also be interesting.

We are using these predictors retroactively; how do we utilize them to predict future events?

cobber66
07-24-2010, 03:40 PM
Since the simplest model is the most elegant, your second is better. I also wonder if yards gained and allowed are truly uncorrelated, since both are time related. I hope you will use your spare time to fine-tune your analysis, rather than carousing and lesser activities.

I found the correlation between yards gained and allowed to be about -0.45, so they are correlated (because of time, and a bad offense putting more pressure on a defense, and vice versa), though that generally isn't considered a high enough correlation to throw off the analysis. However, that is why I did the second analysis, using just YPG differential. I agree that it's a better analysis, but it's interesting to compare the relative importance of offense and defense.
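One way to put a number on "not high enough to throw off the analysis" is the variance inflation factor (VIF). The sketch below is an illustration only, not part of the original analysis: the function names are made up, and the VIF formula used here is the two-predictor special case, so it ignores turnover margin, which was also in the first model.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(r):
    """Variance inflation factor when a model has exactly two correlated predictors."""
    return 1.0 / (1.0 - r ** 2)

# Using the reported correlation of about -0.45 between yards gained and allowed.
print(round(vif_two_predictors(-0.45), 2))
```

With r = -0.45 the VIF is about 1.25, well below the 5 to 10 range that common rules of thumb treat as a problem. `pearson_r` shows how the correlation itself would be computed from two columns of team totals, which are not reproduced in this thread.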

And what do you mean by "carousing"???

cobber66
07-24-2010, 03:47 PM
We are using these predictors retroactively; how do we utilize them to predict future events?

Well, generally what we do is develop models to find the importance of certain factors (for teams in general or for individual teams) as far as it affects the results of games. Then what you do is try to objectively look at teams going into individual games (or whole seasons), and ask, does this team have the ability to do these things well? Or, maybe team A wins with a certain style (e.g. the 49ers like to establish the run game), and does that match up favorably with what their opponent does (or doesn't do) well? Sometimes you can find things in the numbers that are surprising, or that you wouldn't think about just seeing a team on the surface.

buzmeg
07-25-2010, 11:01 AM
We are using these predictors retroactively; how do we utilize them to predict future events?

It's literally impossible to create a stat-based model to predict the outcome of games. The reason is that there is no way to statistically allow for the "human factor(s)" involved in a team sport. The current crop of players are not robots.

cobber66
07-25-2010, 02:06 PM
It's literally impossible to create a stat-based model to predict the outcome of games. The reason is that there is no way to statistically allow for the "human factor(s)" involved in a team sport. The current crop of players are not robots.

I think what you mean by "human factor" is the randomness that is involved in everyday life. When people use statistical models we can only talk in terms of averages and/or probabilities. For example, accuscore.com projects "average scores" for every NFL game, based on thousands of simulations. Betting lines are set in a similar way, with "power rankings" put together by bookmakers. So we use these models to give a best estimate or figure out how likely something is to happen, but the reality is that over the course of just one single 60-minute football game, anything can (and eventually will) happen (see the Vikings-Saints NFC Championship game last year). It's the same idea as in the stock market, the insurance industry, hell, even the lottery. Give it enough time and even the most unlikely events WILL happen.
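The last point can be made concrete with a little probability. The sketch below is only an illustration: the 1% chance is a made-up number, the function name is hypothetical, and it assumes the trials are independent, which real games are not exactly.

```python
def prob_at_least_once(p_single, n_trials):
    """Probability that an event with per-trial chance p_single happens at least once in n independent trials."""
    return 1.0 - (1.0 - p_single) ** n_trials

# A hypothetical 1-in-100 event, watched over more and more games.
for n in (1, 16, 100, 500):
    print(n, round(prob_at_least_once(0.01, n), 3))
```

The chance of seeing the rare event at least once climbs steadily toward 1 as the number of games grows, even though the chance in any single game stays at 1%.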

Colts01
07-25-2010, 02:45 PM
It's literally impossible to create a stat-based model to predict the outcome of games. The reason is that there is no way to statistically allow for the "human factor(s)" involved in a team sport. The current crop of players are not robots.

All in good time, all in good time, heh

theshow47
07-25-2010, 03:23 PM
Well, generally what we do is develop models to find the importance of certain factors (for teams in general or for individual teams) as far as it affects the results of games. Then what you do is try to objectively look at teams going into individual games (or whole seasons), and ask, does this team have the ability to do these things well? Or, maybe team A wins with a certain style (e.g. the 49ers like to establish the run game), and does that match up favorably with what their opponent does (or doesn't do) well? Sometimes you can find things in the numbers that are surprising, or that you wouldn't think about just seeing a team on the surface.

I agree, but how are we able to compare different styles of play? The question I'm posing is: using yardage differential and turnover margin, how are we separating styles of play? Maybe we should add some different factors. Can time of possession predict style of play even though it has little to do with the win/loss results?

Refining your model to factor in style of play will be very interesting.

thephaze
07-25-2010, 03:35 PM
Nice data.

I'm glad you did all of those calculations so I didn't have to.

buzmeg
07-25-2010, 03:46 PM
I think what you mean by "human factor" is the randomness that is involved in everyday life. When people use statistical models we can only talk in terms of averages and/or probabilities.

Exactly my point.

theshow47
07-26-2010, 02:19 PM
Sorry, but I'm confused, buzmeg. When I talk about predicting outcomes I'm talking about a probability, not an exact science. I can't believe you even bothered to post a comment.