Since the beginning of the season I have been trying to analyze various factors determining wins/losses. Some of my key findings are as follows:

- First Downs are key offense indicators of touchdown projections
- The ability to throw a 40+ yard pass is another key determinant of touchdowns
- The net Q3-Q4 score has the maximum impact determining wins/losses
- Weather has no impact on a team's ability to win/lose
- Home team advantage is a myth and does not provide statistical relevance in determining wins/losses

## Football arbitrage...

Football is the fabric that binds America. A lot of folks, myself included, look forward to a Monday, Thursday or weekend game. Interestingly enough, seeing mathematical patterns in the game adds extra punch to the whole experience. We ridicule and condemn greed by punishing insider trading. Unfair advantages or arbitrage opportunities in sports are a shame. It is one thing to find vulnerabilities in an opponent's game. However, tampering with rules and laws is plain wrong at so many levels. Sports teams are icons of competition, and sports personnel are role models for kids. I am truly appalled and disturbed by the level of greed surpassing competition in the NFL. It reflects a double standard in calling ourselves a true "open market" with "perfect competition".

That being said, here are the analysis and projections for the Super Bowl.

## New Variables

For Super Bowl projections I consolidated team-versus-opponent factors into variables that determine wins/losses. Here are the **7 most impactful variables, which together seem to explain about 90% of the variation in wins/losses**.

- Net Touchdowns (Team's touchdowns per game - Opponents' touchdowns per game)
- Fourth down percent
- Net Yards (Team's yards per game - Opponents' yards per game)
- Third down percent
- Penalty Yards per penalty
- Possession percent
- Net Q3-Q4 points
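The three "net" variables in the list are simple team-minus-opponent differences. Here is a minimal sketch of how they could be derived from per-game season averages. The `TeamSeason` class and its field names are hypothetical, and reading "Net Q3-Q4 points" as second-half points scored minus second-half points allowed is my assumption, not something the data tables spell out.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    """Per-game season averages for one team (hypothetical field names)."""
    touchdowns: float        # team touchdowns per game
    opp_touchdowns: float    # opponents' touchdowns per game
    yards: float             # team yards per game
    opp_yards: float         # opponents' yards per game
    q3q4_points: float       # team points scored in Q3 + Q4 per game
    opp_q3q4_points: float   # opponents' points scored in Q3 + Q4 per game


def net_variables(team: TeamSeason) -> dict:
    """Return the three 'net' predictors as team value minus opponent value."""
    return {
        "net_touchdowns": team.touchdowns - team.opp_touchdowns,
        "net_yards": team.yards - team.opp_yards,
        "net_q3q4_points": team.q3q4_points - team.opp_q3q4_points,
    }


# Made-up numbers purely for illustration, not real team statistics.
example = TeamSeason(3.0, 2.0, 350.0, 320.0, 14.0, 10.0)
print(net_variables(example))
```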

I tried to run the analysis with several combinations of variables in order to maximize the explanation of variations in wins for a team. In other words, I tried to maximize "adjusted R Square" to the point of diminishing returns to arrive at these 7 factors.

## Analysis

Here is the regression.

The **Adjusted R-Square of 89.86% signifies** that about 90% of the variation in the probability of wins (I computed the win probability from the regular season data) is **explained by the variation in these 7 particular variables, and adding more variables (which I tried) did not cause this to increase**. The **high R-Square (92%) indicates** a **strong association between these 7 variables and the variation in the probability of wins**. Keep in mind that a regression fit shows association, not causation.
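For readers who want to see how R-Square and Adjusted R-Square relate, here is a small sketch of the standard adjustment formula. The sample size of 32 (one row per NFL team) is my assumption; the post does not state how many observations went into the regression.

```python
def adjusted_r_squared(r2: float, n_obs: int, n_predictors: int) -> float:
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# R-Square of 0.92 with 7 predictors; n_obs=32 is an assumed sample size.
print(round(adjusted_r_squared(0.92, n_obs=32, n_predictors=7), 4))  # 0.8967
```

With the rounded 92% as input this lands near the reported 89.86%, and the small gap is what you would expect from rounding R-Square. The penalty grows with every extra variable, which is why piling on predictors eventually stops helping.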

The ANOVA with a low Significance F reinforces the assessment. For the statistically inclined audience, the probability (9.53E-12) is the chance of seeing a fit this strong if none of the variables had any relationship to wins, so the Null Hypothesis can be rejected with high confidence. Therefore we can say these 7 variables explain 92% of the variability (R-Square) in the probability of a win, and that among the combinations I tried they gave the best Adjusted R-Square (90%).

The variables are sorted in descending order of p-value, which ranks their effect on the variation in the probability of a win.
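The F statistic behind "Significance F" can be recovered from R-Square alone. The sketch below uses the textbook formula for the overall regression F-test. As before, the sample size of 32 is an assumption on my part, and the code stops at the F statistic instead of computing the p-value, which would need an F-distribution routine from a statistics library.

```python
def overall_f_statistic(r2: float, n_obs: int, n_predictors: int) -> float:
    """F = (R^2 / k) / ((1 - R^2) / (n - k - 1)) for the overall regression."""
    dof_resid = n_obs - n_predictors - 1
    if dof_resid <= 0:
        raise ValueError("need more observations than predictors + 1")
    if r2 >= 1.0:
        raise ValueError("F is undefined for a perfect fit")
    return (r2 / n_predictors) / ((1.0 - r2) / dof_resid)


# R-Square of 0.92 with 7 predictors; n_obs=32 is an assumed sample size.
print(round(overall_f_statistic(0.92, n_obs=32, n_predictors=7), 2))  # 39.43
```

An F value this large on 7 and 24 degrees of freedom corresponds to a tiny p-value, which is the sense in which a low Significance F supports the model.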

## Data

Sorted in descending order of their probability of winning, the first 8-10 teams are the ones that made the playoffs.

Using the analysis, I tried to re-compute the probability of winning for the Patriots and Seahawks. To my surprise, the Patriots now seem to have a slight edge over the Seahawks, even though we know who would be the crowd's favorite...

## Super Bowl Projections

**The New England Patriots have a 78% probability to win, compared to 73% for the Seahawks.**

The small difference signifies the following:

- It will be a very competitive game
- New England may rely on Third and Fourth down conversions more
- Seattle's rushing game will be their formidable weapon
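Note that 78% and 73% are each team's own win probability, so they do not add up to 100% the way head-to-head odds would. One common way to turn two such numbers into a single matchup estimate is Bill James's log5 formula. This is a technique I am swapping in for illustration, not something the regression above produced.

```python
def log5(p_a: float, p_b: float) -> float:
    """Estimate P(A beats B) from each team's standalone win probability."""
    denominator = p_a + p_b - 2.0 * p_a * p_b
    if denominator == 0:
        raise ValueError("undefined when both probabilities are 0 or both are 1")
    return (p_a - p_a * p_b) / denominator


# Patriots 0.78 vs Seahawks 0.73, taken from the projection above.
print(round(log5(0.78, 0.73), 3))  # 0.567
```

Under that assumption the Patriots' edge works out to roughly 57-43, which fits the picture of a very competitive game.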

Let's see...
