Posts

Showing posts from 2024

Beat the Streak Day 17: Coordinated Pick Selection

Image
My current best pick selection model for BTS achieves roughly 78% accuracy.  Looking at the table from Day Six , we can see that with an optimal pick selection strategy, my odds of beating the streak are roughly 11,800 to 1, or roughly a 0.01% chance of winning.  In this blog post, I will explore methods for boosting this probability by coordinating picks across multiple accounts (e.g., friends, family, other BTS enthusiasts, etc.) To the non-mathematically inclined, one might think that with $k$ accounts our probability of beating the streak as a group would simply multiply by $k$.  This simple formula is not correct, however, although it is an upper bound.  If each account has a probability $p$ of beating the streak, and there are $k$ accounts, the probability that at least one account beats the streak would be $1 - (1 - p)^k$ if we (incorrectly)  assume independence between accounts .  For small $p$ and small $k$, this is pretty close to to the upper bound.  For example, with $p=0.0

Beat the Streak Day 16: Vegas Odds and Sports Betting

Image
I recently have been seeing many advertisements for sports betting platforms like DraftKings and Fanduel, and since those are somewhat related to beat the streak, I thought it would be interesting to look into these things a little more. In my search I came across a website that lists Vegas odds for various sports books on various bets.  Among those bets is one highly related to beat the streak: a bet that a given batter will "record a hit" in a given game.  This is exactly the essence of the beat the streak: identifying a batter most likely to record a hit.  In my effort to develop models for BTS, I have several approaches to estimate the probability that a player will record a hit in a given game.   So three natural questions arise: 1. Is Vegas good at BTS?  That is, is the implied probability of a hit given the Vegas odds a better estimate of the true probability than some of the models I've talked about in this blog? 2. Can the Vegas odds be a useful feature to impro

Beat the Streak Day Fifteen: A back-testing framework

Image
In this blog post, I will talk about a back-testing framework I developed to evaluate the quality of different Beat the Streak pick selection strategies on historical data, including the data sources I use, the evaluation metrics I look at, and some of the baseline models I've considered.  The source code for my back-testing framework is available at  https://github.com/ryan112358/beat-the-streak .  Everything is written in python, and the source code heavily relies on the pandas package for data processing. Data Sources I draw on data from multiple sources, most importantly is statcast data, which I obtain using  pybaseball .  This dataset contains information about every pitch, including the outcome (ball/strike/hit/etc) as well as other characteristics that have expanded over time (pitch type, velocity, spin, etc.).  From this pitch-level data, I derive at-bat level data and (batter, game)-level data.  This data includes many context features that can be useful for predicting th