Excel LADZ - March Madness Model (2024)
Added 2024-03-19 11:05:04 +0000 UTCG’day lads,
In this explanation I’ll go through how I built the ‘Excel LADZ - March Madness Model’ using ELO Ratings.
ELO Ratings
In order to build a simulation model, you need to be able to define the strength of each team prior to the game starting.
Usually this would involve measuring each team’s Attacking (ATT) and Defending (DEF) strength so that an xPoints/xGoals figure can be calculated. However, this can be difficult when the schedule for each club is significantly different. For example, an average ATT Rating in the English Premier League is better than a great ATT Rating in the MLS.
While there is the option of adjusting each team’s ATT & DEF Rating using a Strength of Schedule measure, often this is very difficult to calculate accurately.
That’s why I like using ELO Ratings for knockout competitions. A team’s ELO Rating is an overall measure of their strength. From Wikipedia, “After every game, the winning player takes points from the losing one. The difference between the ratings of the winner and loser determines the total number of points gained or lost after a game.”
ELO Ratings are great for forecasting Win/Loss scenarios - there is a formula to forecast the Win % of each player. Considering the ELO Rating for Team A (Ra) & Team B (Rb), the formula for Team A’s Win Probability is:
Furthermore, ELO Ratings are often available publicly online for popular sports. In this model, I used the ELO Ratings from this website: https://www.warrennolan.com/basketball/2024/elo
I imported these ELO Ratings into Excel with a Power Query in the worksheet ‘ELO Ratings’. In order to refresh the ratings, all I’ve got to do is go into the ‘Data’ tab and select the icon ‘Refresh All’ - this will refresh the Power Query’s connection with the internet and update the ratings.
Tournament Simulation
The main goal of the model is to calculate each team’s probability of winning the whole tournament. In order to do that, a single simulation has to be run first. Then, the results of the tournament can be simulated over and over again (in this model 2,000 times) to get the probabilities.
In the ‘NCAA Bracket Simulation’ worksheet, each game in the First Round has been written out. Next to the match is the Win Probability for each side (using the ELO Win % Formula above). These figures are used to simulate a random decimal, which is then matched to the simulated ‘winner’ of the match that moves onto the next round. Consider the example below.
In cell E4, Connecticut is the winner in their simulated contest with Stetson. They moved onto the Second Round against FAU, who also had a ‘simulated win’ against Northwestern.
This process is done for every game, in every round, until a Tournament Winner is decided in cell N3.
Tournament Simulations & Analysis
Now that the tournament has been simulated once, I’ve used a Data Table to re-run this entire tournament 2,000 times. This has been done in the ‘Simulations’ worksheet.
I then analysed these simulations in the ‘Analysis’ worksheet. The final table gives the probability of each team being knocked out at each stage of the tournament.
Using the Model
The model is very easy to use! Consider the Connecticut vs Stetson game in the First Round.
As of right now, the game hasn’t been played yet. That’s why in cell E4 (see image above) the winning team that progresses to the Second Round has been simulated using a formula. Pretend Connecticut win this match, and we want to update the model to reflect this change in probabilities.
In this scenario, I would just write over the simulation formula in E4 with =B2 or the text ‘Connecticut’. This locks Connecticut into the Second Round, which the simulations and the ‘Analysis’ worksheet will reflect. On the other hand, it is now impossible for Stetson to progress past the First Round, as the model ‘forgets’ them.
Do this for every game once it has been played.
Note: Also, ELO Ratings will update in response to teams winning/losing. Remember to select ‘Refresh All’ so that these can be reflected in your model.
Thanks for reading lads! If you have any questions, don’t hesitate to ask in the comments or over Discord.
Comments
Love the model. I think for improvements youd have to have some type of recency bias weighing more recent games heavier than not - and to do this unfortunately you will need a strength of schedule modifier since the later games for NCAA are mostly within conference.
Ricky Mahan
2024-04-15 12:50:12 +0000 UTCThanks lad, it's done now! I'll add in an explanation article in the next hour or two explaining how the model works (and how to use it). The model in the post has been updated now (with 2,000 simulations), check it out 🔥
Excel LADZ
2024-03-20 22:48:37 +0000 UTCThis looks great ! But when will the “final” file be ready to go … with the tourney fully kicking off tomorrow?
Carlos Torres
2024-03-20 21:18:06 +0000 UTC