Beat the Streak: Day 13
Recall from Beat the Streak: Day 12 that my grand vision requires building a collection of several models for different subproblems, which will all be combined into a model for the probability that a batter gets a hit in a given game. In this blog post, my aim is to tackle the first subproblem. Specifically, I'd like to build a model to predict the probability that a ball put into play results in a hit, given its launch angle, spray angle, launch speed, and any other relevant context (like the ballpark).

Note that Statcast data already has a column called "estimated_ba_using_speedangle". It only looks at launch angle and launch speed, and ignores spray angle. It therefore acts as a good baseline for this problem that we can hopefully improve upon. An even more naive baseline is to assume the probability of a hit is the same for every ball put into play, ignoring all other context. Evaluating these models gives a negative log likelihood of 0.409 and