Showing posts from May, 2023

Beat the Streak: Day 13

Recall from Beat the Streak: Day 12 , that my grand vision requires building a collection of several models for different sub problems, which will all be combined to get a model for the probability that a batter will get a hit in a given game.  In this blog post, my aim is to tackle the first subproblem.  Specifically, I'd like to build a model to predict the probability that a ball put into play results in a hit, given it's launch angle, spray angle, launch speed, and any other relevant context (like the ballpark).  Note that statcast data already has a column called "estimated_ba_from_speedangle".  This only looks at launch angle and launch velocity, and ignores spray angle.  It therefore acts as a good baseline for this problem that we can hopefully improve upon.  An even more naive baseline is to assume the probability of a hit is constant given it was put into play, ignoring all other context.  Evaluating these models gives a negative log likelihood of 0.409 and