SamSuka
Excel LADZ
Excel LADZ

patreon


Predicting Corners Using Shots + Possession | Linear Regression

G’day, lads

In this article I’ll run through my process in using linear regression in Excel to predict corner kicks based on a team’s shot and possession data.

JOIN the best Excel Sports Modelling Community by becoming a member of the Excel LADZ Patreon! Receive downloadable access to every model (including the Excel LADZ Corner Model) as well as technical posts which will improve your understanding of sports modelling. You also have the ability to directly message me lads, so I can help you with any problems!

Excel LADZ Corner Model: https://www.patreon.com/posts/predicting-using-94108623

Linear Regression | Background

Linear regression is a method of modelling the relationship between two or more variables by fitting a linear equation to the observed data. In our scenario, the number of corners a team is expected to win is the “dependent variable”. The values used to predict them, shots and possession, are “predictors” or “independent variables”. Simply, linear regression analysis in Excel will come up with a simple equation to use the “predictors” (shots + possession) to come up with the dependent variable (expected number of corners).

This equation takes the form:

Data Collection

Firstly, Excel needs data to come up with an appropriate equation. To find out how a team’s possession and shots on goal influence the number of corners they’re awarded, we need historical data. Below is a link to stats from every game in the 2022-23 EPL season.

Kaggle 2022-23 EPL Dataset: https://www.kaggle.com/datasets/afnanurrahim/premier-league-2022-23?select=2023_home_teams_stats.csv

We can filter out all irrelevant data, so that we’re left with only corner, shot and possession stats for the home and away teams.

Data Analysis in Excel

To run the linear regression analysis in Excel, we need to go into Data > Data Analysis > Regression. We will do the formula for the Home Team first.

The ‘Data Analysis’ icon may not be enabled yet; especially if you haven’t done a linear regression analysis before. To do this, follow the instructions here: https://support.microsoft.com/en-au/office/load-the-analysis-toolpak-in-excel-6a63e598-cd6d-42e3-9317-6b40ba1a66b4

A ‘Regression’ pop-up window will appear. Firstly, we need to input a range for the Y and X Range. The Y Range is the “dependent variable”; what we are predicting. So, this will be the ‘Home Corner Kicks’ in Column B. The X Range include the “predictors”. Select Columns D and E together.

If you’ve selected the column header as part of your Y/X Ranges, then select the ‘Labels’ checkbox. For the Output, select ‘New Worksheet Ply:’ and name it “Home Regression”. This will create a new worksheet called “Home Regression” with all our analysis. Press OK. Repeat for the Away Regression worksheet, except the Y/X Ranges will have to be updated accordingly.

Interpretation of Results

In a simple Linear Regression Model, you should be very interested in two things: the Significance F and the Coefficients.

The Significance F in cell F12 of the Home/Away Regression worksheets is the probability that all the coefficients in the regression output are zero. In other words, it indicates whether the regression model provides a better equation than a model with no predictors at all! Obviously, we need this value/probability to be low. Anything below 0.05 is usually acceptable.

Next, the Coefficients in cells B17:B19 of the Home/Away Regression worksheets provide the equation for use to predict the Home/Away Corners in a match. Using the example in the video, the Intercept is B17, the Coefficient for Possession is B18, and the Coefficient for Shots is B19.

Prediction and Application

After obtaining the regression equation from Excel, you can predict the number of corners a team might win based on their average shots taken and possession percentages. Grab these stats for each team so far in the season.

We can use these stats as inputs in the equation for a team’s Expected Corners in a Match.

In this model, I made a slight change to the Possession input. Because a match will always have 100% possession shared between two teams, team’s with an equal amount of possession will be adjusted back to 50/50. This is using the formula:

Team Poss. / ( Team Poss. + Opp. Poss.)

Then, after creating the ‘Results’ worksheet, you can come up with a dashboard looking like this:

Thanks for reading lads, I highly recommend you join the Patreon today! If you have any more questions about the model or if you want further insight into how to build the odds page using the Poisson Distribution, join the community of lads!

Predicting Corners Using Shots + Possession | Linear Regression

Comments

Was hoping for the shortcut , horrible at excel 😂

david regnander

Hi lad, to add in these stats you would have to run the regression analysis again. This is because the coefficients for the possession & shots variables will change when extra factors of the attacking rate & average throwing are added in 👍 Create extra columns within the model, then run the regression, and you will get a formula that works based on historical data 😃 I highly recommend joining, not only for the file, but for access to the sports modelling community on Discord 🔥

Excel LADZ

Hi mate, in theory, if i download this file, and i change the variables for say, attacking rate on sides + average throwins, this model would still work or no?

david regnander

Hi lad, for this to happen I'll need to narrow down a data source with automatic data (like you mentioned). In the video, I just used a dataset from the 2022-23 EPL Season. So, using the resulting equation on other league might be a bit statistically inappropriate (or just an oversimplification). If in the future I find a URL to bring in shooting & possession game data from a specific league (e.g. as chosen from a drop down list), I'll update the model 👍 In the meantime, if you follow the instructions in the video and article (found here: https://www.patreon.com/posts/predicting-using-94652720) then you make the model for any league you choose lad 😃

Excel LADZ

Hi, any plans to expand this to major european leagues like Spain la Liga, Germany Bundesliga, Italia Seria A, France Ligue 1, with automatic data like the NBA models?

Oliver Asturiano

i cannot for the life of me log in. i clicked the link and it still takes me to locked videos.

Rob

Hi Rob, it seems like you have two emails with Patreon. One paid account (where you can access the posts) and a free account (which is where you’re commenting from now). I sent you a message to your paid account now. Please reply back to this post if you believe you’ve made a mix up with the emails so I can help you fix it lad 👍

Excel LADZ

i paid the 23$ but cant seem to figure out how to log in.

Rob

Hi lad, this is definitely something I've been looking at making a video/article on 🔥 As you probably know, the modification would be a zero inflated Poisson Model, as studies have found that Poisson can slightly underestimate the probability of a low scoring game 👍

Excel LADZ

Thanks a lot

Nick Segura

Hi, this is great but could you show how to apply corrective factor to compensate for Pdist flaws?

Nick Segura


More Creators