Million and One Simulations of the NFL Playoffs
Goldman Sachs’ analysts received a lot of buzz when they simulated the 2018 FIFA World Cup one million times to predict the champion. I also had a simulation for the World Cup. While mine wasn’t perfect, it sure as hell wasn’t as horrible as Goldman’s models, which failed spectacularly. I guess forecasting events that they can’t manipulate or control is a little trickier than what they’re used to doing… Anyhow, I felt the need to one-up them here, so my simulation of the NFL playoffs will run 1,000,001 times!
(Just kidding, Goldman. I know you run the world, so I didn’t say anything bad about you or your forecasting skills…)
In all seriousness, predicting a sporting event is extremely difficult, and probabilities are an even harder concept for people to grasp. For example, Nate Silver (statistical genius) gave Hillary Clinton roughly a 70% chance to win the 2016 election and Trump roughly 30%. He was lambasted by multiple sources for how “wrong” he was. However, saying something has a 30% chance of happening doesn’t mean that there is NO CHANCE that it will happen. If the election could be rerun ten times, we would expect Clinton to win about seven of them and Trump about three. It just so happens that we live in the world where Trump won. In fact, Trump was given a higher probability of winning the election than any one team has of winning the Super Bowl. So let me preface my results with the clear understanding that these are only probabilities based on simulated events, and anything is possible in a single-elimination playoff.
Playoff Seeds are locked in
The NFL regular season is over, the playoffs are right around the corner, and the debate about who is going to win the Super Bowl is in full force. There are so many great storylines this year:
- Saints – Drew Brees has been having a spectacular MVP-worthy season and their defense has definitely stiffened up lately.
- Chiefs – Who would have thought that Patty Mahomes would come out of the gate flinging the ball around the way he has this season? No-look passes, left-handed throws, and 4th-down heaves that should be picked off, until a WR miraculously pops up out of nowhere to save his ass, get the first down, and help get the win.
- Rams – Gurley/Goff combo seems unstoppable and, unlike the Chiefs, the Rams have Aaron Donald and a respectable defense backing the offense up.
- Patriots – Tom Brady and Belichick might not like each other very much but they are still the Patriots and they know how to win this time of year.
- Chargers – Will Philip Rivers (my doppelgänger) finally get his ring? They have been quietly winning and winning and winning.
- Bears – The Firm’s sleeper preseason bet at 100-1 odds. Defense wins championships.
Most people living in the world today, not just football fans, have probably heard the ever-present buzzwords Machine Learning and Artificial Intelligence, or AI. Some folks out there even try to start companies with these buzzwords in their names simply to sound innovative or smart…(*cough* A.I. Sports *cough*)… The power of AI is beautiful, and when applied to sports it can also be extremely useful for betting or for discussing “most likely” outcomes.
I intend to walk you through a basic application of machine learning to estimate the probability each team has of winning its playoff games and eventually placing a big shiny Super Bowl ring on its players' fingers in February.
All my code is written in R. Short snippets are included here. The full code and data are linked on my GitHub for those of you who want to run the simulation on your own.
(you can fiddle around with the average probabilities in the team data file to get different results)
Goals:
- Predict the outcome of each individual playoff game.
- Determine which team has the highest probability of winning the Super Bowl.
- Determine the probability that each team wins its wild-card game, divisional game, conference championship, and finally the Super Bowl.
Data:
The data sets I use for my models are publicly available. I use derived statistics and custom metrics (my secret sauce that I am not willing to divulge) built from play-by-play game data covering the 2001-2018 seasons.
Pro Football Reference is a great source for data. They provide tables of aggregated statistics that are easy to scrape programmatically, as well as more granular play-by-play data.
Method:
I use an ensemble approach with 14 distinct predictive models built on various machine learning algorithms, some parametric, some non-parametric. I aggregate the resulting probabilities to reach my final probability that a team will win a given game. This season, the process has yielded over 66% accuracy against the Vegas spread and over 80% accuracy picking the winner straight up.
I applied my algorithms to every possible match-up and wrote a script simulating each game in the playoffs.
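To make the aggregation step concrete, here is a minimal sketch. The post does not say how the model probabilities are combined, so the unweighted mean is an assumption, and the model names and numbers are made up for illustration.

```r
# Hypothetical home-win probabilities from three models for one game.
# (The post uses 14 models; three made-up values keep the sketch short.)
model_probs <- c(logistic = 0.62, random_forest = 0.58, boosted_trees = 0.66)

# ASSUMPTION: aggregate with an unweighted mean. A weighted mean or a
# stacked meta-model would be equally plausible choices.
ensemble_prob <- mean(model_probs)

# Pick the straight-up winner by thresholding the ensemble at 50%.
predicted_winner <- if (ensemble_prob > 0.5) "home" else "away"

print(round(ensemble_prob, 2))
print(predicted_winner)
```

Averaging tends to smooth out the quirks of any single model, which is one common reason to prefer an ensemble over a lone classifier.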
The Code:
Load Libraries
Load Data
View Data
The “teams” dataframe has a list of every possible match-up, home and away, with the probability that the home team wins.
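For readers without the data file handy, a table of that shape might look roughly like the sketch below. The column names and every probability here are illustrative assumptions, not the actual contents of the file.

```r
# Hypothetical slice of the match-up table. Column names and probabilities
# are illustrative assumptions, not the author's actual data.
teams <- data.frame(
  home = c("KC", "KC", "NE", "NE"),
  away = c("NE", "LAC", "KC", "LAC"),
  home_win_prob = c(0.61, 0.64, 0.52, 0.57),
  stringsAsFactors = FALSE
)

# Each ordered pair appears once, so KC hosting NE and NE hosting KC
# are separate rows, each with its own home-win probability.
head(teams)
str(teams)
```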
Function to Simulate a Single Game
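A single-game simulator can be sketched in a few lines: look up the home team's win probability and compare it with a uniform random draw. The function name `simulate_game`, the column names, and the two-row table below are hypothetical stand-ins, not the code from the repo.

```r
# Hypothetical two-row match-up table (made-up probabilities).
teams <- data.frame(
  home = c("KC", "NE"),
  away = c("NE", "KC"),
  home_win_prob = c(0.61, 0.52),
  stringsAsFactors = FALSE
)

# Simulate one game: look up the home team's win probability, then draw a
# uniform random number. The home team wins when the draw falls below it.
simulate_game <- function(home, away, matchups = teams) {
  p <- matchups$home_win_prob[matchups$home == home & matchups$away == away]
  if (length(p) != 1) stop("expected exactly one row for this match-up")
  if (runif(1) < p) home else away
}

set.seed(2019)  # fixed seed so the sketch is reproducible
winners <- replicate(10000, simulate_game("KC", "NE"))
mean(winners == "KC")  # should land near the 0.61 input
```

Over many repetitions the share of home wins converges on the input probability, which is all a Monte Carlo simulation needs from this building block.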
The Set Up
Monte Carlo Simulation
This code simulates every game in the playoffs in the correct order and tracks the winner of each game. It runs 1,000,001 times, saving the results of each simulation.
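The overall loop can be sketched as follows. This is a simplified stand-in, not the script from the repo: the helper names are hypothetical, the win probabilities are flat placeholders (0.57 for any home team, 0.50 for the neutral-site Super Bowl) rather than model output, and it runs 10,000 times instead of 1,000,001 to stay quick. The seeds are the 2018 playoff seeds, and the bracket follows the rule that the top seed hosts the lowest remaining seed.

```r
set.seed(42)

# Seeds 1-6 in each conference for the 2018 season, in seed order.
afc <- c("KC", "NE", "HOU", "BAL", "LAC", "IND")
nfc <- c("NO", "LAR", "CHI", "DAL", "SEA", "PHI")

# Placeholder match-up table: every ordered (home, away) pair.
# ASSUMPTION: 0.57 for any same-conference home team and 0.50 for the
# neutral-site Super Bowl; swap in real model output here.
teams <- expand.grid(home = c(afc, nfc), away = c(afc, nfc),
                     stringsAsFactors = FALSE)
teams <- teams[teams$home != teams$away, ]
same_conf <- (teams$home %in% afc) == (teams$away %in% afc)
teams$home_win_prob <- ifelse(same_conf, 0.57, 0.50)

# One game: home team wins when a uniform draw falls below its probability.
play <- function(home, away) {
  p <- teams$home_win_prob[teams$home == home & teams$away == away]
  if (runif(1) < p) home else away
}

# One conference bracket; `seeds` is ordered best to worst, and the
# better seed always hosts.
play_conference <- function(seeds) {
  wc <- c(play(seeds[3], seeds[6]), play(seeds[4], seeds[5]))
  wc <- seeds[seeds %in% wc]  # re-sort wild-card survivors by seed
  # Seed 1 hosts the lowest remaining seed, seed 2 hosts the other.
  div <- c(play(seeds[1], wc[2]), play(seeds[2], wc[1]))
  div <- seeds[seeds %in% div]
  play(div[1], div[2])
}

# One full postseason: both conference brackets, then the Super Bowl.
play_postseason <- function() play(play_conference(afc), play_conference(nfc))

n_sims <- 10000  # the post runs 1,000,001; fewer keeps the sketch quick
champions <- replicate(n_sims, play_postseason())
round(sort(table(champions) / n_sims, decreasing = TRUE), 3)
```

Because every team gets the same placeholder probabilities, any gap between teams in this output comes purely from seeding and home field, not from team quality.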
Organize Results
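Turning raw simulation output into probabilities is mostly a matter of counting. The sketch below assumes one row per simulation and one column per round winner; that layout, the column names, and the five hand-written rows are placeholders for illustration, not real simulation output.

```r
# Toy results: one row per simulation, one column per round winner.
# These five rows are hand-written placeholders, not real simulation output.
results <- data.frame(
  afc_champ  = c("KC", "NE", "KC", "LAC", "KC"),
  nfc_champ  = c("NO", "NO", "LAR", "NO", "CHI"),
  super_bowl = c("KC", "NO", "LAR", "NO", "KC"),
  stringsAsFactors = FALSE
)

# Share of simulations in which each team won the Super Bowl.
sb_prob <- sort(prop.table(table(results$super_bowl)), decreasing = TRUE)

# Share of simulations in which each team reached the Super Bowl.
reached <- c(results$afc_champ, results$nfc_champ)
reach_prob <- sort(table(reached) / nrow(results), decreasing = TRUE)

print(sb_prob)
print(reach_prob)
```

The same counting pattern extends to the wild-card and divisional rounds by adding one column per game.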
Results
The Kansas City Chiefs have the highest probability to win the Super Bowl.
Again, this does not mean that they WILL win the Super Bowl or even make it to the Super Bowl. They have a 27% chance to win. That also means that they have a 73% chance of NOT winning the Super Bowl… Soooo it’s way more likely that they won’t win it all. However, they have the greatest chance out of all the teams.
One more thing to note from this simulation –
All else being equal, every team with a first-round bye has a much higher likelihood of winning the Super Bowl simply because it only needs to win 3 games, while everyone else has to win 4.
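A quick back-of-the-envelope check shows how much that one game matters. The coin-flip assumption below is mine, chosen purely to isolate the effect of the bye; real games are not 50/50.

```r
# ASSUMPTION: every game is an independent coin flip (p = 0.5), purely to
# isolate the effect of playing one fewer game.
p_win_game <- 0.5

bye_team <- p_win_game^3        # divisional, conference, Super Bowl
wild_card_team <- p_win_game^4  # one extra wild-card game first

bye_team                   # 0.125
wild_card_team             # 0.0625
bye_team / wild_card_team  # 2: the bye alone doubles the title odds
```

In practice the gap is wider still, since bye teams are usually stronger and also host their first game.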
Playoff Bracket
Here is another view of my model’s predicted probabilities:
Based on this view of the simulation, the Chiefs should be favored in all their playoff games.
I love the playoffs! Despite all the statistics and predictive modeling, anything can happen in a single-elimination tournament. Let’s see how this all plays out!