In this article, I will be taking you through my College Football over/under betting model. This project was greatly inspired by Joey DiCresce’s (@joey_dicresce) excellent win totals model, so if you haven’t had the chance to check it out, certainly do so!
Making the Model
First and foremost, every model needs a name. This one is called REBEL, which stands for Regression (Based) Extreme (Gradient) Boosted Efficiency Learning. As the name suggests, this model is built using an extreme gradient boosting algorithm. It was trained on 4 years of data (2017-2020) and tested on 1 year (2021). While I am not going to go into exactly how REBEL works, I will highlight some of the key metrics used in building this model.
Selecting the right metrics can be tricky. While it is essential to feed your model enough information to account for as much variance as possible, you don’t want to give it too many miscellaneous variables that could throw off your results. With this model, I tried to focus on stats that most directly affect how many points get scored. For each statistic, I calculated the rolling average over each team’s last 5 games to get a baseline of how they were performing heading into the game, while leaving room for teams that get ‘hot’ or ‘cold’. These metrics are as follows: yards, touchdowns, penalty yards, return yards, kick yards, turnovers, takeaways, sacks, EPA, rushes, passes, avg. total points, and field goals made. In the plot below, we can see which stats were most useful for the model: each blue point shows a data value for a single feature and how strongly it affected the model prediction (SHAP value on the y-axis).
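As a rough illustration of the rolling-average idea described above, here is a minimal Python sketch. It is not REBEL's actual code: the function name, the sample yardage, and the choice to average over fewer than 5 games early in the season are all assumptions made for the example. The one thing it is careful about is excluding the current game, so each feature only uses information available before kickoff.

```python
from collections import deque


def rolling_pregame_average(values, window=5):
    """Average a stat over the previous `window` games, excluding the current one.

    Assumption: when fewer than `window` prior games exist, average whatever
    history is available; a game with no history gets None.
    """
    recent = deque(maxlen=window)  # holds at most the last `window` games
    averages = []
    for value in values:
        averages.append(sum(recent) / len(recent) if recent else None)
        recent.append(value)  # the current game only counts for later games
    return averages


# Hypothetical total yards for one team across seven games.
yards = [400, 350, 420, 310, 500, 450, 380]
pregame_yards = rolling_pregame_average(yards)
print(pregame_yards)
```

The same helper could be applied to each of the listed metrics in turn to build one pre-game feature per stat.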
Making the model is fun and all, but what really matters is how accurate it is and, more importantly, how much money it is going to make us. To beat the market we have to get over 52.4% of our bets correct. This applies to most sports betting markets, but for over/under game totals it is more straightforward than the rest. Since the majority of over/under betting lines (at least in CFB) are set at -110 (sometimes it can be more favorable), we risk $110 to win $100 on each bet. If we make 100 such bets and win 52.4% of them, we will win $5,240 and lose $5,236, gaining a marginal $4.00. Fortunately, the higher the percentage of bets you win, the wider your profit margin becomes.
The REBEL model exceeded my expectations and performed excellently on our testing data set. It correctly predicted 56.1% of its 663 bets, beating the 52.4% break-even mark by 3.7 percentage points! To break it down further, it hit 56.4% of its over bets and 56.1% of its under bets. To account for possible variation in my results, I constructed a 95% confidence interval to gauge the accuracy level of my model. This allows me to claim that I am 95% confident that my model is between 52.3% and 60% accurate. The lower end of that interval sits right around the break-even mark, which was great news to me, as it suggests the model should perform favorably most of the time.
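The break-even arithmetic can be checked with a few lines of Python. This is just a sketch of the standard American-odds math; the function names are made up for the example, and it assumes every bet risks $110 to win $100, as a -110 line implies.

```python
def break_even_win_rate(american_odds):
    """Win rate needed to break even at the given American odds."""
    if american_odds < 0:
        risk, to_win = -american_odds, 100  # e.g. -110: risk 110 to win 100
    else:
        risk, to_win = 100, american_odds   # e.g. +120: risk 100 to win 120
    return risk / (risk + to_win)


def net_profit(wins, losses, risk=110, to_win=100):
    """Net result when every bet risks `risk` dollars to win `to_win` dollars."""
    return wins * to_win - losses * risk


rate = break_even_win_rate(-110)
print(f"Break-even win rate at -110: {rate:.1%}")

# 100 bets at a 52.4% win rate, as in the example above.
profit = net_profit(wins=52.4, losses=47.6)
print(f"Net profit: ${profit:.2f}")
```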
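The article does not say how the confidence interval was computed. One common choice, assumed here, is the normal-approximation (Wald) interval for a proportion. Plugging in the reported 56.1% hit rate over 663 bets gives bounds close to the 52.3% to 60% range quoted above.

```python
import math


def wald_interval(win_rate, n_bets, z=1.96):
    """Normal-approximation confidence interval for a win rate.

    z=1.96 corresponds to a 95% interval. This method is an assumption;
    the original analysis may have used a different one.
    """
    standard_error = math.sqrt(win_rate * (1 - win_rate) / n_bets)
    margin = z * standard_error
    return win_rate - margin, win_rate + margin


low, high = wald_interval(0.561, 663)
print(f"95% CI: {low:.1%} to {high:.1%}")
```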
Below I plotted the accuracy of REBEL’s predictions; we can see the percent of correct bets for each xHit value (the expected percent chance for the over to hit) predicted by REBEL. To test the model, I took every game REBEL produced a prediction for (not every game, due to missing data). If it produced an xHit value of 0.5 or lower, I took the under, and if it produced an xHit value greater than 0.5, I took the over. As we can see, the model beat the market at every xHit level!
To my great delight, REBEL’s predictions follow a remarkably normal (bell-shaped) distribution. Most of its predictions fall at an xHit near 0.5, and predictions further from 0.5 are less common. Perfect!
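The betting rule and the per-xHit accuracy check described above can be sketched as follows. The sample games are invented, and grouping xHit values by rounding to one decimal place is an assumption; the article does not say how the plot's xHit levels were binned.

```python
from collections import defaultdict


def pick_side(x_hit):
    """Take the over when xHit is above 0.5, otherwise take the under."""
    return "over" if x_hit > 0.5 else "under"


def accuracy_by_xhit(games):
    """Share of correct bets per xHit level (xHit rounded to one decimal).

    `games` is an iterable of (x_hit, over_hit) pairs, where over_hit is
    True when the game's total actually went over the line.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for x_hit, over_hit in games:
        level = round(x_hit, 1)
        won = (pick_side(x_hit) == "over") == over_hit
        total[level] += 1
        correct[level] += int(won)
    return {level: correct[level] / total[level] for level in total}


# Hypothetical (xHit, did the over hit?) pairs.
sample_games = [
    (0.62, True), (0.58, False), (0.63, True),
    (0.41, False), (0.44, True), (0.38, False),
]
accuracy = accuracy_by_xhit(sample_games)
print(accuracy)
```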
It is also interesting to look at how successful our bets are for each over/under value (over/under values used for REBEL are consensus picks). Unsurprisingly, it is pretty random. While the over/under matters a lot when making the model, it isn’t the sole determinant of our bets, so it makes sense for this graph to look the way it does. In the second slide, we can see that the over/under value and REBEL’s xHit prediction aren’t strongly correlated (if anything, a negative correlation, which makes sense).
I know what you’re thinking. Yeah, it’s great that the model beat the market, but how much money is this thing making? Let’s take a look. To track our betting throughout the 2021 season, I gave us $1,000 to start (assuming I have deep pockets and can overbet to make every pick), took every bet REBEL produced, and bet $100 on each game (I set the odds at -110 for every game since the data didn’t provide them). The results are mouth-watering. We ended the season with over $5,000 in profit and never went negative!
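For readers who want to run this kind of tracking themselves, here is a minimal flat-stake bankroll sketch. It is not the code behind the chart: the win/loss sequence is invented, and it assumes a $100 stake at -110 pays roughly $90.91 on a win, which is one reading of the setup above (the exact staking convention is not spelled out).

```python
def simulate_bankroll(results, start=1000.0, stake=100.0, american_odds=-110):
    """Track the bankroll after each flat-stake bet.

    `results` is an iterable of booleans (True for a winning bet).
    """
    if american_odds < 0:
        win_amount = stake * 100 / -american_odds  # -110: $100 wins ~$90.91
    else:
        win_amount = stake * american_odds / 100
    bankroll = start
    history = []
    for won in results:
        bankroll += win_amount if won else -stake
        history.append(bankroll)
    return history


# Hypothetical sequence of bet outcomes.
outcomes = [True, True, False, True, False]
history = simulate_bankroll(outcomes)
print(f"Final bankroll: ${history[-1]:.2f}")
print(f"Lowest point:   ${min(history):.2f}")
```

Checking `min(history)` against the starting amount is a quick way to see whether the bankroll ever dipped below where it began.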
Results by Division
It is interesting to see how REBEL performed on each division in 2021. The success rates are fairly high for all, which is a good sign: it averaged a 59% success rate for Power-5 schools and a 56% success rate for non-Power-5 schools. One thing to note is how poorly REBEL performed on the CFP (College Football Playoff) teams last season (Alabama, Cincinnati, Georgia, and Michigan). It only beat the market with one team (Alabama), and it was correct on 50% or fewer of its bets for the other three. It will be interesting to see if that trend continues in 2022, or if that was just a random occurrence.
REBEL 2022 Predictions
The moment we’ve been waiting for: model predictions for week 1. Before you take a look, keep in mind that these are early predictions! The model did not bet on week 1 games in our testing set because it had no data to go off of that wasn’t used for training. We can only make these 2022 week 1 predictions using data from last season, so these numbers are subject to change. The most accurate (and strongest) bets produced by REBEL will come after week 1 of 2022. Nevertheless, below are REBEL’s early picks!
For those interested, here is an alternate graphic for every bet with the line and average point total between the two teams listed as well. Keep in mind a positive xHit value says to bet the over, while a negative xHit value says to bet the under. The lines are taken from Bovada (some predictions might be slightly different from those above, as lines may have shifted).
That’s all for this article! A lot of work went into putting this model together, so if you have any questions, please feel free to reach out to me on Twitter @analytacist! Stay posted throughout the season for weekly REBEL over/under picks.