SamSuka
Excel LADZ
Excel LADZ

patreon


Building an NFL Model - Part 1 | Predicting Game Outcomes

G’day, lads! In this article, I’ll explain the introduction to building the ‘Excel LADZ - NFL Model’. By the end, you’ll be able to understand the core principles behind the model, including how to simulate a single game and predict a winner using Attack and Defense Ratings. Parts 2 and 3 will cover simulating games to generate a distribution of outcomes, which can be used to price markets like the Moneyline, Spread, and Total Points, helping you identify positive expected value bets.

If you haven’t already, download the NFL Model for daily analysis and join the most active sports modelling community on YouTube!

Download the ‘Excel LADZ - 2024 NFL Model’: https://www.patreon.com/posts/excel-ladz-2024-111333965

Model Framework

Before building a model, I always start with a plan. For the NFL, my goal is to predict the probability of a team winning (Moneyline), covering a Spread, and hitting the Over/Under Points. To calculate these outcomes, a simulation model is needed.

Simply comparing the overall strength of two teams isn’t enough. For example, an ELO or SRS (Simple Rating System) model can help you to predict the likelihood of a team winning, but these methods fall short when you need to predict how many points will be scored in a game. Instead, we need a model that considers both a team’s offensive and defensive strengths.

Let’s look at an example: If two average teams are playing, and both teams have strong offenses but weak defenses, we would expect a high-scoring game, meaning the over/under points threshold should be set higher. On the other hand, if both teams are defensively strong but offensively weak, the expected point total would be much lower.

In an NFL game, there are several ways a team can score, but touchdowns are the most significant, worth 6 points. In this model, I’ll estimate a team’s Expected Touchdowns (xTouchdowns) by factoring in their offensive ability, adjusting for the opponent’s defense, and then scaling that by the league average. Other scoring methods (like field goals) will be simplified to keep the model efficient.

Because we are calculating probabilities for the Moneyline, Spread, and Over/Under, there isn’t one single formula to rely on. Instead, we need to simulate an entire game multiple times, and then repeat this process to create a distribution of outcomes. This will give us a more accurate picture of the likely results.


Importing the Data

Now that I have a plan, I can start building the model. I don’t want to manually copy and paste each team’s stats into the model every week and repeat the calculations. Instead, I’ll connect the spreadsheet to the internet so it can automatically update with the latest stats when I refresh it. In Excel, this is done using a Power Query. I’ll use a Power Query to connect the model to the official NFL Website and import the following team stats for the 2023 and 2024 seasons:


Now, the model will have a separate worksheet for the 2023 and 2024 NFL season stats.


I’ll also create a worksheet that contains a Power Query and connects the spreadsheet to the NFL Official Standings for 2023 & 2024: https://www.nfl.com/standings/


Developing ATT & DEF Ratings

Now that we have the key stats in the model, we can use them to assess each team’s strength. Specifically, I want to evaluate how well each team scores rushing touchdowns, passing touchdowns and field goals, as well as how effectively they defend against rushing and passing touchdowns. First, I’ll organise the stats imported through the Power Query to make it easier to perform the offensive and defensive calculations.


Above, I’ve grabbed the 2023 stats for each team. This includes the number of games, average passing touchdowns for, average passing touchdowns conceded, average rushing touchdowns for, average rushing touchdowns conceded and field goals made (in order). I’ve also grabbed the average figure for each column in row 35. I’ll do the exact same thing to the right for the 2024 stats.

We now have data that’s clean enough to calculate the offensive (ATT) and defensive (DEF) ratings for each team. In this model, I’ve chosen to base the ratings on a trailing 12-game window. For Week 1, 100% of the data will come from the 2023 season, since there’s no data yet for 2024. By Week 4, 25% (3/12) of the data will be from the 2024 season, with the remaining 75% (9/12) still coming from 2023. This proportion gradually shifts until Week 13, when there’s enough data from the 2024 season to base the ratings entirely on the current year.

Let’s look at a team’s Passing Touchdowns ATT Rating. In 2023, the 49ers scored 33 passing touchdowns, averaging 1.94 per game (33 / 17). The league average team scored 23.56 passing touchdowns, or 1.39 per game (23.56 / 17). Therefore, the 49ers’ 2023 Passing TD ATT Rating is 1.94 / 1.39 = 1.40. This means the 49ers scored passing touchdowns 40% more often than the average team.

On the defensive side, the 49ers’ 2023 Passing TD DEF Rating was 0.85, which is relatively strong. For every passing touchdown allowed by the league average team, the 49ers allowed only 0.85. A high ATT Rating is good, but a high DEF Rating is bad.

After Week 1 of the 2024 NFL Season, I’ve filled out all the ATT & DEF Ratings along with each team’s average field goals made per game. Of course, these ratings are based on a trailing 12-game window.


Miscellaneous Stats

The ATT and DEF Ratings allow the model to calculate each team’s expected touchdowns and field goals in a matchup. However, there are still a few scoring methods we haven’t accounted for yet, including:

These events are relatively rare, or the differences between teams are minimal. Therefore, I standardised these stats in the model, giving both teams in a matchup the same probability of scoring an interception touchdown, a safety, converting extra points after a touchdown, and so on.

To do this, I downloaded the ‘Scoring Offense’ stats from the 2023 NFL Season using the ‘Pro Football Reference’ website. Here’s a link to the table below: https://www.pro-football-reference.com/years/2023/#team_scoring


I’ve simply copied and pasted this table into the worksheet “Miscellaneous Stats”. Then, below the table, I’ve calculated the averages for all the mentioned scoring methods. Below this data, the Home Advantage has been set at 10%.


Simulating a Match

We now have all the stats needed to simulate an NFL game! The best part is that, with the LET function, we can do this with just one formula. Although it’s a bit lengthy, it’s very logical and efficient.

The LET function in Excel allows you to define variables within a formula, making complex calculations easier to manage by breaking them down into smaller steps. Instead of repeating the same part of a formula multiple times, you can “name” it once and use that name throughout the formula. With so many different variables we need to calculate, then simulate, the LET function makes things so much more efficient.

In full, below is what the final product looks like. Not only is this function able to simulate regular time of an NFL Match, it is also able to produce an Expected Score for each team. In this case, the output is the Expected Score for the Bills playing the Dolphins. I’ll go through and explain each part of the formula logically so that by the end, the flow of the function is clear.

Within a LET function, you define a name, and then give its formula. For the Bills, playing as the away side, a multiple of 1 / SQRT(1.1) is defined as their “away_advantage”. Obviously, for the home side, this would be named “home_advantage” which would be defined as simply: SQRT(1.1). As a result, it needs to be changed to “home_advantage” when compiling the formula for the home side in cell I3, the Dolphins.

To calculate the away side’s xTouchdowns, the following formula is applied:

Where the ‘Other xTouchdowns’ consist of the sum of the average Fumble TD’s, Interception TD’s, Return TD’s and Other TD’s scored by a team per game.

Our next name is “sim_safeties”, which, as made obvious by its name, simulates the safeties a team will score per match. By assuming the number of safeties scored in a match are binomially distributed, I can use the BINOM.INV function:

To simulate the touchdowns in a match is easy now. Again, holding the assumption that the rate in which touchdowns are scored in a match can be accurately modelled by the binomial distribution, the following formula for “sim_touchdowns” can be used:

For every touchdown a team scores in a simulated match, they’ll have an opportunity to attempt an extra point or a two point conversion. As found earlier, the model is going to assume that with every touchdown there is a 90.2% chance of an extra point being attempted. That means, there is a 9.8% chance of an attempt of a two point conversion. To simulate the number of extra point attempts, based on the number of touchdowns scored in a match, the following formula for “sim_extra_point_attempt” must be used:

The only other option a team has, used 9.8% of the time, is the 2 point conversion. For “sim_2pa”, the formula is merely: sim_touchdowns - sim_extra_point_attempts


We have the simulated number of extra points attempted. We must simulate the success of these kicks, assuming that each kick has a 95.89% chance of going through. Therefore, “sim_extra_points” has the formula:

Similarly, “sim_2pm” has the formula:

Simulating the number of field goals a team converts is similar to that of touchdowns. First, the xField Goals is calculated; which is the team’s trailing average scaled by their away/home advantage. This is then plugged into the BINOM.INV function, to simulate a final number using the binomial distribution. The following formula holds for “sim_field_goals”:

Where the xField Goals for the away side is calculated by:

By simulating every method of scoring in a game, we can calculate the total points a team will score. The formula for the simulated score, called “simulated_score”, is shown below:

The expected score for a team in a game is the sum of the expected values for each method of scoring. It represents what the average simulated score should be. The formula for the expected score, called “expected_score”, is below:

The last argument in any LET function is its final calculation, which can either be a name or a formula involving the name. In this case, I’ll display the expected score, labelled “expected_score”, for the away team, the Bills. When I copy the function down to cell I3 and change “away_advantage” to “home_advantage” throughout the formula, the Dolphins also receive an expected score.

According to the model, the Bills are expected to win by a close margin, around 27-26. However, if I change the final argument in the LET function to “simulated_score”, it will generate a simulated scoreline for the game. In my first simulation, the Dolphins won 14-7.


Where to Next?

A few more things are required to complete the model. Firstly, this simulated scoreline only accounts for regular time in a match. It is entirely possible that a simulated scoreline could end up as a tie; e.g. 14-14. As a result, a simplified version of overtime needs to be implemented into the model, to separate teams into a winner and loser in the event of a tie.

After the whole game is able to be simulated, this can be repeated thousands of times to get a distribution of results. Relying on the “law of large numbers”, the probabilities for the Moneyline, Spread and Total Points can be calculated. These changes will be outlined in the later articles on the NFL model.

Please let me know if you have any questions, lads! Drop a like and a comment down below. Of course, to access this model, I highly recommend joining the community and becoming an Official Excel LAD!


Comments

I have been having trouble with data going threw quary then disappering and when you try to put in team everything says name? in every cell

jonathan

are you going to do a college basketball model love to have an accurate one

jonathan

What I'm saying is the 538 NFL Model is spot on. The actual score for the game is in the percentage that's giving to each team, once you choose them. You, half to handicap the numbers that are giving. Example tonight Monday night game. The eagles were giving a 77.83% chance to outright win the game. Atlanta was giving 22.17% chance at winning the game. So, this is how you handicap the eagles % goes as follow 8*3=2+4=6, These the only points eagles scored the entire first half of the game in actual they score TD kicked the extra point which gave eagles actual 7 points. Now u add 7+7=14, combining 14+6=20 that's how you handicap the eagle's percentage for the game for points. Atlanta you add 2+2=4, which this team score only 3, points the entire first half of the game. Now you take seventeen 1*7=7, you add 4+7=11. You, half to double the 11 because it's two halves in football because you can't account for when a team is actual going to score points in a game. So, before you choose a game you physical see a potential team's points before its ever played. It's the same way the college football Model just some silent differences. The college model at times giving you numbers to add are replace are subtract for total points in the percentages of each team.

Kerry Coleman

Hello, Kerry 👋 Apologies, I'm not quite picking up everything in your example. If you don't mind me asking, what's the significance of the 4 * 7 = 28 example? Are you saying the Delta makes it possible to predict the points of a match? Thanks.

Excel LADZ

maybe others understood what you put but i got lost on your example.

ic1uc1

Before you make any video, for NFL are college football models... I made discover tonight. I went thru all the last three weeks of college football and one week of NFL. The college model is 100% accurate just off by 1 are 2 points. The reason way I'm saying this I'll explain. The college model is excellent for as its creation. Model is spot on if you as a sports gambler don't understand how the college football scoring system is in American football is you'll lose ever time. Here is way the percentages for each team are the actual score for the game, also with the ELO Delta. The red minus and green plus are crucial to this. Reason being, is once the sportsbook create their lines for that individual line for the match-up. Check the ELO delta to check their line accuracy. Reason why is whoever the sportsbooks conform the favorite is to win the match-up. You as individual use the favorite percent the first two numbers are for the visiting team as their score, after the point whatever numbers are after the percent sign numbers for the home team. Here is the kicker in all of this. The numbers 3,6,9 are all multipliers. Where it's in front are behind a number. Now you use the ELO delt number(s). to add are subtract. The numbers that are giving in the from the favorite, as away and home here is an example. The Georgia vs Kentucky it was -47, in ELO Delt. If you time 4 * 7 equal 28. The serious why you times because the four and seven on opposite sides of the spectrum

Kerry Coleman

Hello, Kerry 👋 I'm going to release the NFL Model onto YouTube first, and then the College Football model is likely to be the next one 👍

Excel LADZ

Hello, lad! In cases where the probability of success is small and the number of trials is large (as in the case of the BINOM.INV formula), the binomial distribution approximates a Poisson distribution. While I couldn't automatically assume that all stats actually follow the Poisson distribution, historical data has proven it to be pretty accurate. There's an article online about how Tom Brady's career passing touchdowns are very well approximated by Poisson. I recommend having a read of that, lad 🔥

Excel LADZ

Absolutely 👍 You can message me through Patreon, or through Discord. If you haven't already, here's a helpful website on how to join the Discord group: https://support.patreon.com/hc/en-us/articles/212052266-Getting-Discord-access

Excel LADZ

G'day mate, please reach out to me with your request at: ladz.excel@gmail.com 😃

Excel LADZ

Is it safe to assume all stats follow poisson?

Vasilis

Thank you again for working on these models and giving detailed information on the processes! It is very helpful! I wanted to ask a couple questions but don’t know how to directly message you as it may be a little lengthy. Is there a way we can discuss? Thanks again, really looking forward to see how this works out!

Binoy Sheth

Please take the time to message me bro. I am working on a system but don’t have no where near as much knowledge as you do. I will pay a good price if you help me. My insta is lucian.lp or message me on here

Lucian Pop

are you still doing video for the college model

Kerry Coleman


More Creators