Excel LADZ

Building an NFL Model - Part 1 | Predicting Game Outcomes

Added 2024-09-14 06:40:31 +0000 UTC

G’day, lads! In this article, I’ll explain the introduction to building the ‘Excel LADZ - NFL Model’. By the end, you’ll be able to understand the core principles behind the model, including how to simulate a single game and predict a winner using Attack and Defense Ratings. Parts 2 and 3 will cover simulating games to generate a distribution of outcomes, which can be used to price markets like the Moneyline, Spread, and Total Points, helping you identify positive expected value bets.

If you haven’t already, download the NFL Model for daily analysis and join the most active sports modelling community on YouTube!

Download the ‘Excel LADZ - 2024 NFL Model’: https://www.patreon.com/posts/excel-ladz-2024-111333965

Model Framework

Before building a model, I always start with a plan. For the NFL, my goal is to predict the probability of a team winning (Moneyline), covering a Spread, and hitting the Over/Under Points. To calculate these outcomes, a simulation model is needed.

Simply comparing the overall strength of two teams isn’t enough. For example, an ELO or SRS (Simple Rating System) model can help you to predict the likelihood of a team winning, but these methods fall short when you need to predict how many points will be scored in a game. Instead, we need a model that considers both a team’s offensive and defensive strengths.

Let’s look at an example: If two average teams are playing, and both teams have strong offenses but weak defenses, we would expect a high-scoring game, meaning the over/under points threshold should be set higher. On the other hand, if both teams are defensively strong but offensively weak, the expected point total would be much lower.

In an NFL game, there are several ways a team can score, but touchdowns are the most significant, worth 6 points. In this model, I’ll estimate a team’s Expected Touchdowns (xTouchdowns) by factoring in their offensive ability, adjusting for the opponent’s defense, and then scaling that by the league average. Other scoring methods (like field goals) will be simplified to keep the model efficient.

Because we are calculating probabilities for the Moneyline, Spread, and Over/Under, there isn’t one single formula to rely on. Instead, we need to simulate an entire game multiple times, and then repeat this process to create a distribution of outcomes. This will give us a more accurate picture of the likely results.

Importing the Data

Now that I have a plan, I can start building the model. I don’t want to manually copy and paste each team’s stats into the model every week and repeat the calculations. Instead, I’ll connect the spreadsheet to the internet so it can automatically update with the latest stats when I refresh it. In Excel, this is done using a Power Query. I’ll use a Power Query to connect the model to the official NFL Website and import the following team stats for the 2023 and 2024 seasons:

NFL Team Scoring: https://www.nfl.com/stats/team-stats/offense/scoring/2024/reg/all
NFL Team Rushing Defense: https://www.nfl.com/stats/team-stats/defense/rushing/2024/reg/all
NFL Team Passing Defense: https://www.nfl.com/stats/team-stats/defense/passing/2024/reg/all
NFL Team Field Goals Made: https://www.nfl.com/stats/team-stats/special-teams/scoring/2024/reg/all

Now, the model will have a separate worksheet for the 2023 and 2024 NFL season stats.

I’ll also create a worksheet that contains a Power Query and connects the spreadsheet to the NFL Official Standings for 2023 & 2024: https://www.nfl.com/standings/

Developing ATT & DEF Ratings

Now that we have the key stats in the model, we can use them to assess each team’s strength. Specifically, I want to evaluate how well each team scores rushing touchdowns, passing touchdowns and field goals, as well as how effectively they defend against rushing and passing touchdowns. First, I’ll organise the stats imported through the Power Query to make it easier to perform the offensive and defensive calculations.

Above, I’ve grabbed the 2023 stats for each team. This includes the number of games, average passing touchdowns for, average passing touchdowns conceded, average rushing touchdowns for, average rushing touchdowns conceded and field goals made (in order). I’ve also grabbed the average figure for each column in row 35. I’ll do the exact same thing to the right for the 2024 stats.

We now have data that’s clean enough to calculate the offensive (ATT) and defensive (DEF) ratings for each team. In this model, I’ve chosen to base the ratings on a trailing 12-game window. For Week 1, 100% of the data will come from the 2023 season, since there’s no data yet for 2024. By Week 4, 25% (3/12) of the data will be from the 2024 season, with the remaining 75% (9/12) still coming from 2023. This proportion gradually shifts until Week 13, when there’s enough data from the 2024 season to base the ratings entirely on the current year.

Let’s look at a team’s Passing Touchdowns ATT Rating. In 2023, the 49ers scored 33 passing touchdowns, averaging 1.94 per game (33 / 17). The league average team scored 23.56 passing touchdowns, or 1.39 per game (23.56 / 17). Therefore, the 49ers’ 2023 Passing TD ATT Rating is 1.94 / 1.39 = 1.40. This means the 49ers scored passing touchdowns 40% more often than the average team.

On the defensive side, the 49ers’ 2023 Passing TD DEF Rating was 0.85, which is relatively strong. For every passing touchdown allowed by the league average team, the 49ers allowed only 0.85. A high ATT Rating is good, but a high DEF Rating is bad.

After Week 1 of the 2024 NFL Season, I’ve filled out all the ATT & DEF Ratings along with each team’s average field goals made per game. Of course, these ratings are based on a trailing 12-game window.

Miscellaneous Stats

The ATT and DEF Ratings allow the model to calculate each team’s expected touchdowns and field goals in a matchup. However, there are still a few scoring methods we haven’t accounted for yet, including:

Fumble TD’s
Interception TD’s
Return/Other TD’s
Safeties
Extra Points (1 & 2 points)

These events are relatively rare, or the differences between teams are minimal. Therefore, I standardised these stats in the model, giving both teams in a matchup the same probability of scoring an interception touchdown, a safety, converting extra points after a touchdown, and so on.

To do this, I downloaded the ‘Scoring Offense’ stats from the 2023 NFL Season using the ‘Pro Football Reference’ website. Here’s a link to the table below: https://www.pro-football-reference.com/years/2023/#team_scoring

I’ve simply copied and pasted this table into the worksheet “Miscellaneous Stats”. Then, below the table, I’ve calculated the averages for all the mentioned scoring methods. Below this data, the Home Advantage has been set at 10%.

Simulating a Match

We now have all the stats needed to simulate an NFL game! The best part is that, with the LET function, we can do this with just one formula. Although it’s a bit lengthy, it’s very logical and efficient.

The LET function in Excel allows you to define variables within a formula, making complex calculations easier to manage by breaking them down into smaller steps. Instead of repeating the same part of a formula multiple times, you can “name” it once and use that name throughout the formula. With so many different variables we need to calculate, then simulate, the LET function makes things so much more efficient.

In full, below is what the final product looks like. Not only is this function able to simulate regular time of an NFL Match, it is also able to produce an Expected Score for each team. In this case, the output is the Expected Score for the Bills playing the Dolphins. I’ll go through and explain each part of the formula logically so that by the end, the flow of the function is clear.

Defining Home/Away Advantage

Within a LET function, you define a name, and then give its formula. For the Bills, playing as the away side, a multiple of 1 / SQRT(1.1) is defined as their “away_advantage”. Obviously, for the home side, this would be named “home_advantage” which would be defined as simply: SQRT(1.1). As a result, it needs to be changed to “home_advantage” when compiling the formula for the home side in cell I3, the Dolphins.

xTouchdowns

To calculate the away side’s xTouchdowns, the following formula is applied:

Where the ‘Other xTouchdowns’ consist of the sum of the average Fumble TD’s, Interception TD’s, Return TD’s and Other TD’s scored by a team per game.

Simulated Safeties

Our next name is “sim_safeties”, which, as made obvious by its name, simulates the safeties a team will score per match. By assuming the number of safeties scored in a match are binomially distributed, I can use the BINOM.INV function:

Simulated Touchdowns

To simulate the touchdowns in a match is easy now. Again, holding the assumption that the rate in which touchdowns are scored in a match can be accurately modelled by the binomial distribution, the following formula for “sim_touchdowns” can be used:

Simulated Extra Point Attempts

For every touchdown a team scores in a simulated match, they’ll have an opportunity to attempt an extra point or a two point conversion. As found earlier, the model is going to assume that with every touchdown there is a 90.2% chance of an extra point being attempted. That means, there is a 9.8% chance of an attempt of a two point conversion. To simulate the number of extra point attempts, based on the number of touchdowns scored in a match, the following formula for “sim_extra_point_attempt” must be used:

Simulated 2 Point Conversion Attempts

The only other option a team has, used 9.8% of the time, is the 2 point conversion. For “sim_2pa”, the formula is merely: sim_touchdowns - sim_extra_point_attempts

Simulating Extra Points Scored

We have the simulated number of extra points attempted. We must simulate the success of these kicks, assuming that each kick has a 95.89% chance of going through. Therefore, “sim_extra_points” has the formula:

Simulating 2 Point Conversions Scored

Similarly, “sim_2pm” has the formula:

Simulating Field Goals

Simulating the number of field goals a team converts is similar to that of touchdowns. First, the xField Goals is calculated; which is the team’s trailing average scaled by their away/home advantage. This is then plugged into the BINOM.INV function, to simulate a final number using the binomial distribution. The following formula holds for “sim_field_goals”:

Where the xField Goals for the away side is calculated by:

Simulated Score

By simulating every method of scoring in a game, we can calculate the total points a team will score. The formula for the simulated score, called “simulated_score”, is shown below:

Expected Score

The expected score for a team in a game is the sum of the expected values for each method of scoring. It represents what the average simulated score should be. The formula for the expected score, called “expected_score”, is below:

The Final Calculation

The last argument in any LET function is its final calculation, which can either be a name or a formula involving the name. In this case, I’ll display the expected score, labelled “expected_score”, for the away team, the Bills. When I copy the function down to cell I3 and change “away_advantage” to “home_advantage” throughout the formula, the Dolphins also receive an expected score.

According to the model, the Bills are expected to win by a close margin, around 27-26. However, if I change the final argument in the LET function to “simulated_score”, it will generate a simulated scoreline for the game. In my first simulation, the Dolphins won 14-7.

Where to Next?

A few more things are required to complete the model. Firstly, this simulated scoreline only accounts for regular time in a match. It is entirely possible that a simulated scoreline could end up as a tie; e.g. 14-14. As a result, a simplified version of overtime needs to be implemented into the model, to separate teams into a winner and loser in the event of a tie.

After the whole game is able to be simulated, this can be repeated thousands of times to get a distribution of results. Relying on the “law of large numbers”, the probabilities for the Moneyline, Spread and Total Points can be calculated. These changes will be outlined in the later articles on the NFL model.

Please let me know if you have any questions, lads! Drop a like and a comment down below. Of course, to access this model, I highly recommend joining the community and becoming an Official Excel LAD!