Introduction
The sports betting landscape has changed dramatically in recent years, partly because of the meteoric rise of fantasy sports markets, more mainstream acceptance of betting on sports, and recent changes in federal law. There is an opportunity to approach betting on sports the same way people “bet” on stock prices in the financial markets. Sportsbooks set their odds and lines based on their predictive models while also taking into account the expected public perception of each team. A sportsbook does not want to be overly exposed on any particular game. To mitigate that risk, it adjusts its lines slightly to entice bettors to take one side or the other, balancing out the money on each side. The gap between the sportsbook's game model and the general public's thinking creates an opportunity to gain an edge in the market. Raymond Sauer (1998) provided an in-depth analysis of the efficiency of the betting markets and discovered inconsistencies that are ripe for exploiting. If a firm could determine, by way of statistical modeling and probability theory, which games offer the greatest statistical edge in its favor, it could invest in those games systematically, lowering risk and increasing return on investment.
The four major sports markets in the United States are the National Football League (NFL), Major League Baseball (MLB), the National Basketball Association (NBA), and the National Hockey League (NHL). Each sport will need to be approached slightly differently because of the amount of underlying data feeding the predictive models. For example, an MLB team plays 162 regular season games while an NFL team plays only 16. Estimates will have higher variance when there is less data to train the models. Because the sports differ in number of games and in seasonality, a highly profitable sports investment firm should be able to invest in each of these markets.
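The effect of season length on variance can be illustrated with a short calculation. The sketch below assumes each game is an independent trial with a fixed win probability, which is a simplification, and computes the standard error of a team's observed win rate over one season. The function name and the 50% win probability are illustrative choices, not part of any cited model.

```python
import math


def win_rate_standard_error(true_win_prob: float, games: int) -> float:
    """Standard error of an observed win rate over `games` independent games."""
    return math.sqrt(true_win_prob * (1.0 - true_win_prob) / games)


# Regular season games per team, as quoted in the text.
SEASON_GAMES = {"NFL": 16, "MLB": 162}

for league, games in SEASON_GAMES.items():
    se = win_rate_standard_error(0.5, games)
    print(f"{league}: {games} games, standard error of win rate = {se:.3f}")
```

Under these assumptions, a single NFL season pins down a team's win rate roughly three times less precisely than a single MLB season.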
Method
Building a consistently profitable sports investment firm relies on several methodologies: data collection, feature engineering, variable importance rankings, modeling/simulation with back-testing, and money management. Each sport will follow a similar workflow, but the specific models and statistics will vary widely. Personnel who are intimately involved with each sport will also be needed to provide a “sanity check” for the models and to evaluate intangible factors and current news.
Data Collection
A web scraping script written in Python will be used to collect historical game statistics as well as the current line for each game. Miller (2015) suggests collecting offensive and defensive statistics for both the home and away teams to generate the most accurate predictions.
Feature Engineering
R will be used to manipulate the data and create custom metrics for players and teams. This will give the models the biggest possible advantage in the next stage. Most predictive models in sports use the basic metrics that are freely available online. Using deeper analysis and custom metrics to find more predictive variables and hidden interactions between features creates an edge over the bookmakers and the general public. Variable importance metrics will be needed to determine, for each sport, which variables have the greatest influence on the outcome.
Modeling
Several modeling approaches must be evaluated to determine which model works best for each particular sport. Loeffelholz and colleagues implemented four separate neural networks and combined them into an ensemble model, achieving an 83.33% accuracy rating on their test set of NBA games (Loeffelholz, Bednar, & Bauer, 2009). While that approach might work well for the NBA, it might not perform nearly as well in the NFL, where there is less training data to feed into the model. Balreira and colleagues achieved a 64% accuracy rating in the NFL with the use of a pseudo Markov Chain Monte Carlo simulation (Balreira, Miceli, & Tegtmeyer, 2013). Miller (2015) also used Monte Carlo simulations for the MLB, running one simulation for the expected score of the visiting team and a separate one for the expected score of the home team, then combining the results to output a predicted probability of winning the game. Each sport needs its own system and its own performance benchmarks to be successful.
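The idea of simulating each team's score separately and combining the results can be sketched as follows. This is a minimal illustration and not Miller's implementation: it assumes scores are independent Poisson draws, discards tied simulations, and uses made-up expected scores. All function names and parameters are hypothetical.

```python
import math
import random


def sample_poisson(rng: random.Random, mean: float) -> int:
    """Draw one Poisson variate with Knuth's method (suitable for small means)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_home_win_probability(
    home_mean: float, away_mean: float, n_sims: int = 20000, seed: int = 0
) -> float:
    """Estimate the home team's win probability from simulated scores.

    Assumption: each team's score is an independent Poisson draw around its
    expected score. Tied simulations are discarded, a crude stand-in for
    extra innings or overtime.
    """
    rng = random.Random(seed)
    home_wins = 0
    decided = 0
    for _ in range(n_sims):
        home_score = sample_poisson(rng, home_mean)
        away_score = sample_poisson(rng, away_mean)
        if home_score == away_score:
            continue
        decided += 1
        if home_score > away_score:
            home_wins += 1
    return home_wins / decided if decided else 0.5


# Hypothetical expected scores for one game.
probability = simulate_home_win_probability(home_mean=4.8, away_mean=4.2)
print(f"Estimated home win probability: {probability:.3f}")
```

In practice, the expected scores would come from the fitted models for each team, and the score distribution would be chosen to suit the sport.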
Back-Testing
All models will be back-tested on the last two full seasons of games to determine how they would have performed, taking into account the money management system for wager sizes. This method will determine, for each sport, the predicted probability threshold that generates the maximum rate of return over the season. A balance between win rate and the number of wagers placed will be taken into account to ensure that there are enough opportunities each week to maximize returns while minimizing exposure.
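The threshold search described above can be sketched as a simple loop over candidate thresholds. The example assumes flat stakes under eleven-for-ten pricing (risk $110 to win $100) and uses a small, invented history of predictions. The function names, the candidate grid, and the minimum wager count are hypothetical choices for illustration.

```python
from typing import List, Optional, Tuple

Pick = Tuple[float, bool]  # (predicted win probability, whether the pick won)


def backtest_flat_stake(
    history: List[Pick], threshold: float, to_win: float = 100.0, risk: float = 110.0
) -> Tuple[float, int]:
    """Return (profit, wagers placed) when betting every pick at or above the threshold."""
    profit = 0.0
    wagers = 0
    for predicted_prob, won in history:
        if predicted_prob < threshold:
            continue
        wagers += 1
        profit += to_win if won else -risk
    return profit, wagers


def best_threshold(
    history: List[Pick], candidates: List[float], min_wagers: int
) -> Optional[Tuple[float, float, int]]:
    """Pick the most profitable threshold that still yields at least `min_wagers` bets."""
    best = None
    for threshold in candidates:
        profit, wagers = backtest_flat_stake(history, threshold)
        if wagers < min_wagers:
            continue
        if best is None or profit > best[1]:
            best = (threshold, profit, wagers)
    return best


# Invented data for illustration only; a real back-test would use two full seasons.
HYPOTHETICAL_HISTORY = [
    (0.52, False), (0.55, True), (0.58, False), (0.61, True),
    (0.63, True), (0.66, False), (0.70, True), (0.74, True),
]
CANDIDATES = [0.50, 0.55, 0.60, 0.65, 0.70]

result = best_threshold(HYPOTHETICAL_HISTORY, CANDIDATES, min_wagers=3)
print(result)
```

The `min_wagers` floor reflects the balance between win rate and wager volume: a very high threshold may look profitable per bet but leave too few opportunities each week.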
Money Management
A 52.38% win rate is needed to break even in the sports investing market because of the cost associated with placing a wager, typically referred to as the vig or vigorish. Sportsbooks follow the “eleven for ten rule”: in order to win $100, a bettor has to risk losing $110 (Sauer, 1998). The extra $10 is what the sportsbook keeps for taking on the wager. Because of a lack of transparency, it has been difficult to find performance indicators from other sports investment firms to measure against. Based on expected value calculations, a sports investment firm should strive for a win rate closer to 56-60% in a given season in order to generate a consistent return on investment.
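The break-even figure follows directly from the eleven for ten rule: a bettor risking $110 to win $100 breaks even when the win rate equals 110 / (110 + 100), or about 52.4%. The sketch below works through that arithmetic and the expected value per wager at the target win rates. The function names are illustrative.

```python
def break_even_win_rate(risk: float, to_win: float) -> float:
    """Win rate at which expected profit is zero for a risk/to-win pair."""
    return risk / (risk + to_win)


def expected_value(win_prob: float, risk: float, to_win: float) -> float:
    """Expected profit of one wager: win `to_win` with win_prob, else lose `risk`."""
    return win_prob * to_win - (1.0 - win_prob) * risk


RISK, TO_WIN = 110.0, 100.0  # eleven for ten pricing

print(f"Break-even win rate: {break_even_win_rate(RISK, TO_WIN):.4%}")
for win_prob in (0.56, 0.60):
    ev = expected_value(win_prob, RISK, TO_WIN)
    print(f"Win rate {win_prob:.0%}: expected profit per wager = ${ev:.2f}")
```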
The model will produce a probability of winning for each game played that day. Management will be able to filter out the games with lower win probabilities based on the predefined thresholds determined by back-testing. They will be disciplined and only wager on games with a high likelihood of producing a win. The typical mistake most casual bettors make comes from poor money management and the impulse to make a bet. The firm will follow strict protocols that help it avoid these common pitfalls. For simplicity, imagine a firm that has $100,000 under management, sets a flat 2% wager amount ($2,000 to win, which means $2,200 at risk under the eleven for ten rule), and focuses solely on the NFL regular season, 17 weeks of games. Following the preset threshold, the firm would place wagers on 7 games each week, on average. A 60% win rate would produce an expected profit of $2,240 per week.
((-2200 * 0.4) + (2000 * 0.6)) * 7 = 2,240
Over the course of the 17-week season, the firm would generate a profit of $38,080, roughly a 38% return on investment. Juxtaposing the sports investment firm's ROI with that of a mutual fund or other investment vehicle over the same 17-week period would show remarkable differences. Predictive modeling geared towards the sports betting market will allow the firm to achieve substantial returns.
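The season projection can be reproduced with a few lines of arithmetic. The sketch below uses the figures from the example ($100,000 bankroll, $2,000 to win per wager with $2,200 at risk, 7 wagers per week, 17 weeks, 60% win rate). It computes an expected value only; actual results in any one season would vary around it. The function name is illustrative.

```python
def weekly_expected_profit(
    win_rate: float, to_win: float, risk: float, wagers_per_week: int
) -> float:
    """Expected weekly profit from flat-stake wagers at a given win rate."""
    per_wager = win_rate * to_win - (1.0 - win_rate) * risk
    return wagers_per_week * per_wager


BANKROLL = 100_000.0
TO_WIN = 0.02 * BANKROLL  # flat 2% wager amount: $2,000 to win
RISK = TO_WIN * 1.10      # eleven for ten: $2,200 at risk
WEEKS = 17

weekly = weekly_expected_profit(0.60, TO_WIN, RISK, wagers_per_week=7)
season = weekly * WEEKS
roi = season / BANKROLL

print(f"Expected weekly profit: ${weekly:,.0f}")
print(f"Expected season profit: ${season:,.0f}")
print(f"Expected return on bankroll: {roi:.1%}")
```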
References
Balreira, E. C., Miceli, B. K., & Tegtmeyer, T. (2013). An oracle method to predict NFL games. Journal of Quantitative Analysis in Sports, 10(2), 183-196.
Loeffelholz, B., Bednar, E., & Bauer, K. W. (2009). Predicting NBA games using neural networks. Journal of Quantitative Analysis in Sports, 5(1), 1-15.
Miller, T. W. (2015). Modeling techniques in predictive analytics with Python and R: A guide to data science. NJ: Pearson FT Press.
Sauer, R. D. (1998). The economics of wagering markets. Journal of Economic Literature, 36, 2021-2064.