G’day, lads
In this article I’ll run through my process in using linear regression in Excel to predict corner kicks based on a team’s shot and possession data.
JOIN the best Excel Sports Modelling Community by becoming a member of the Excel LADZ Patreon! Receive downloadable access to every model (including the Excel LADZ Corner Model) as well as technical posts which will improve your understanding of sports modelling. You also have the ability to directly message me lads, so I can help you with any problems!
Excel LADZ Corner Model: https://www.patreon.com/posts/predicting-using-94108623
Linear Regression | Background
Linear regression is a method of modelling the relationship between two or more variables by fitting a linear equation to the observed data. In our scenario, the number of corners a team is expected to win is the “dependent variable”. The values used to predict them, shots and possession, are “predictors” or “independent variables”. Simply, linear regression analysis in Excel will come up with a simple equation to use the “predictors” (shots + possession) to come up with the dependent variable (expected number of corners).
This equation takes the form:
Data Collection
Firstly, Excel needs data to come up with an appropriate equation. To find out how a team’s possession and shots on goal influence the number of corners they’re awarded, we need historical data. Below is a link to stats from every game in the 2022-23 EPL season.
Kaggle 2022-23 EPL Dataset: https://www.kaggle.com/datasets/afnanurrahim/premier-league-2022-23?select=2023_home_teams_stats.csv
We can filter out all irrelevant data, so that we’re left with only corner, shot and possession stats for the home and away teams.
Data Analysis in Excel
To run the linear regression analysis in Excel, we need to go into Data > Data Analysis > Regression. We will do the formula for the Home Team first.
The ‘Data Analysis’ icon may not be enabled yet; especially if you haven’t done a linear regression analysis before. To do this, follow the instructions here: https://support.microsoft.com/en-au/office/load-the-analysis-toolpak-in-excel-6a63e598-cd6d-42e3-9317-6b40ba1a66b4
A ‘Regression’ pop-up window will appear. Firstly, we need to input a range for the Y and X Range. The Y Range is the “dependent variable”; what we are predicting. So, this will be the ‘Home Corner Kicks’ in Column B. The X Range include the “predictors”. Select Columns D and E together.
If you’ve selected the column header as part of your Y/X Ranges, then select the ‘Labels’ checkbox. For the Output, select ‘New Worksheet Ply:’ and name it “Home Regression”. This will create a new worksheet called “Home Regression” with all our analysis. Press OK. Repeat for the Away Regression worksheet, except the Y/X Ranges will have to be updated accordingly.
Interpretation of Results
In a simple Linear Regression Model, you should be very interested in two things: the Significance F and the Coefficients.
The Significance F in cell F12 of the Home/Away Regression worksheets is the probability that all the coefficients in the regression output are zero. In other words, it indicates whether the regression model provides a better equation than a model with no predictors at all! Obviously, we need this value/probability to be low. Anything below 0.05 is usually acceptable.
Next, the Coefficients in cells B17:B19 of the Home/Away Regression worksheets provide the equation for use to predict the Home/Away Corners in a match. Using the example in the video, the Intercept is B17, the Coefficient for Possession is B18, and the Coefficient for Shots is B19.
Prediction and Application
After obtaining the regression equation from Excel, you can predict the number of corners a team might win based on their average shots taken and possession percentages. Grab these stats for each team so far in the season.
We can use these stats as inputs in the equation for a team’s Expected Corners in a Match.
In this model, I made a slight change to the Possession input. Because a match will always have 100% possession shared between two teams, team’s with an equal amount of possession will be adjusted back to 50/50. This is using the formula:
Team Poss. / ( Team Poss. + Opp. Poss.)
Then, after creating the ‘Results’ worksheet, you can come up with a dashboard looking like this:
Thanks for reading lads, I highly recommend you join the Patreon today! If you have any more questions about the model or if you want further insight into how to build the odds page using the Poisson Distribution, join the community of lads!
david regnander
2024-04-11 16:18:07 +0000 UTCExcel LADZ
2024-04-10 19:47:31 +0000 UTCdavid regnander
2024-04-10 15:05:16 +0000 UTCExcel LADZ
2024-03-06 01:38:17 +0000 UTCOliver Asturiano
2024-03-04 15:37:38 +0000 UTCRob
2023-12-19 22:23:26 +0000 UTCExcel LADZ
2023-12-19 04:19:12 +0000 UTCRob
2023-12-19 04:11:26 +0000 UTCExcel LADZ
2023-12-14 22:13:58 +0000 UTCNick Segura
2023-12-14 12:52:20 +0000 UTCNick Segura
2023-12-14 12:52:10 +0000 UTC