Posts

Beat the Streak: Day Nine

In this blog post, I want to talk about why getting an 80% success rate in Beat the Streak is so challenging.  I believe I have identified a mathematical reason for this, which I am going to share in this blog post.  First, let's look at some simple statistics that hint that 80% success should not be out of reach.  The table below shows the percentage of games with a hit for the most successful batter in each season from 2011 to 2019.

Year  Batter            % Games with Hit
2011  Jacoby Ellsbury   0.821656
2012  Derek Jeter       0.812121
2013  Michael Cuddyer   0.807692
2014  Jose Altuve       0.803797
2015  Dee Gordon        0.800000
2016  Mookie Betts      0.807453
2017  Ender Inciarte    0.775641
2018  Jose Altuve       0.786207
2019  DJ LeMahieu       …

Beat the Streak: Day Eight

In this blog post, we will explore three factors that influence the probability of correctly selecting a player to get a hit on a given day.  These are:

1. Individual batter strength, as measured by the proportion of plate appearances that resulted in a hit.
2. Team offensive strength, as measured by the average number of plate appearances per game by the batting team.
3. The position in the batting order.

We plot the distribution of these statistics over (batter, year) pairs and (team, year) pairs.  The plots below reveal that the best batters get a hit in about 30% of plate appearances, and the strongest offensive teams average 39 plate appearances per game.  The tables below show the top-performing batters and teams:

Batter         Year  % PA with Hit
Josh Hamilton  2010  0.326
Trea Turner    2016  0.324
Jose Altuve    2014  0.319
…

Tail Bounds on the Sum of Half Normal Random Variables

Let $ x \sim \mathcal{N}(0, \sigma^2)^n $ be a vector of $ n $ i.i.d. normal random variables.  We wish to provide a high-probability tail bound on the $L_1$ norm of $x$, that is: $$ Pr[|| x ||_1 \geq a] \leq f(a) $$ Note that the term on the left is called a "tail probability" because it captures the probability mass in the tail of the distribution (i.e., values exceeding $a$).  This type of problem arises when we wish to claim that the probability of a given random variable being large is small.  The tail probability above is closely related to the CDF of the random variable $ || x ||_1 $: it is exactly $ 1 - Pr[|| x ||_1 \leq a] $.  Unfortunately, the CDF of this random variable does not appear to have a clean closed-form expression, and for that reason, we wish to find a reasonably clean and tight bound on it instead.  In this blog post, we will explore, utilize, and compare the Markov inequality, the one-sided Chebyshev inequality, and the Chernoff bound to bound the …
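As a point of reference (worked out here, not quoted from the post), the crudest of the three, Markov's inequality, already gives a usable bound: each $|x_i|$ is half-normal with mean $\sigma \sqrt{2/\pi}$, so

$$ \Pr\big[ \lVert x \rVert_1 \geq a \big] \;\leq\; \frac{\mathbb{E}\, \lVert x \rVert_1}{a} \;=\; \frac{n \sigma \sqrt{2/\pi}}{a}. $$

The Chebyshev and Chernoff bounds sharpen this by also using the variance and the moment generating function, respectively.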

Open Problem: Efficiently Evaluating Subset Sums

Suppose we are given a vector of $n$ items $ \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix}$ and we would like to perform a number of subset sums on these items.  Specifically, we would like to compute $ \sum_{i \in S} x_i $ for each subset $S$ in some collection of subsets of $ \{ 1, \dots, n \} $.  We would like to evaluate all of the summations using the minimum number of arithmetic operations (additions and subtractions). In this setting, addition and subtraction of two elements is an expensive operation (e.g., think of them as being 2D numpy arrays).  Therefore, we are willing to spend computational time and resources to find an efficient way to evaluate the subset sums.  Moreover, we may need to compute these subset sums many times, e.g., if they are computed as part of an inner loop of an iterative algorithm.  Therefore, the time spent finding the right way to evaluate the subset sums will ultimately pay for itself. A few examples with solutions to make this problem more concrete …
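To make the cost model concrete, here is a small illustration of my own (not one of the post's examples): evaluating the sums over $\{1,2,3\}$ and $\{1,2,4\}$ independently costs four additions, while sharing the common sub-sum $x_1 + x_2$ brings it down to three.

```python
import numpy as np

# Hypothetical items: additions are expensive because each x_i is a large array.
x1, x2, x3, x4 = (np.random.rand(1000, 1000) for _ in range(4))

# Naive evaluation: two additions per subset, four additions in total.
s123 = x1 + x2 + x3
s124 = x1 + x2 + x4

# Sharing the common sub-sum x1 + x2: three additions in total.
t = x1 + x2
s123 = t + x3
s124 = t + x4
```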

Removing redundant constraints from a linear system of equations

Suppose you are solving an optimization problem with linear equality constraints of the form $ Ax = b $. Several off-the-shelf optimizers, including those available in CVXOPT and scipy, will complain if $A$ is not full row rank.  That is, if $A$ contains redundant constraints, then optimization will likely fail.  So how can we easily deal with this issue?  The code below is something you can plug into your program for a quick fix to the issue.  Simply call A, b = reduce_row_echelon(A, b) to get an equivalent set of constraints with the redundancies removed.

```python
import numpy as np

# Consumes a *consistent* rectangular system of equations
# Produces a new compacted system of equations with linearly dependent rows removed
# Note: this algorithm computes the row echelon form of the augmented matrix
# This algorithm should be numerically robust
def reduce_row_echelon(B, y):
    A = np.concatenate([B, y[:, np.newaxis]], axis=1)
    m, n = A.shape
    c = 0
    for r in range(m):
        # find first non-zero column
        ...
```
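Since the excerpt cuts the function off, here is a self-contained alternative of my own (not the post's snippet) that removes redundant rows by selecting an independent subset with a column-pivoted QR decomposition of $A^T$:

```python
import numpy as np
from scipy.linalg import qr

def drop_redundant_rows(A, b, tol=1e-10):
    # The columns of A.T are the rows of A; pivoted QR orders them so that
    # the first `rank` pivots index a maximal linearly independent subset.
    _, R, piv = qr(A.T, pivoting=True)
    rank = int(np.sum(np.abs(np.diag(R)) > tol * abs(R[0, 0])))
    keep = np.sort(piv[:rank])
    return A[keep], b[keep]

# Example: the third constraint is the sum of the first two.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
A_red, b_red = drop_redundant_rows(A, b)
print(A_red.shape)  # (2, 3)
```

Note that, unlike a row-echelon reduction, this keeps a subset of the original rows rather than producing reduced combinations of them.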

log-dot-exp: generalizing the log-sum-exp trick

In scientific computing applications we often have to work with very small positive numbers.  With finite precision representation for numbers (like 32 or 64 bit floats), this can be a problem when precision matters even at the smallest scales.  In these cases, it is useful to represent numbers in log space and perform computations directly in that space.  Working in log space can make a numerically unstable algorithm stable.  Suppose for example that we want to know the sum of $n$ positive real numbers: $Z = \phi_1 + \dots + \phi_n$.  In log space, we would instead work with $\theta_i = \log{(\phi_i)}$.  To perform this summation in log space, we need to evaluate the following expression: $$ \log Z = \log \sum_{i} \exp{(\theta_i)} $$ When done naively, this computation can be numerically unstable, e.g., if $\theta_i$ is large then $\exp{(\theta_i)}$ may cause an overflow error.  The Log-Sum-Exp trick provides a way to do this computation in a numerically stable manner.  In particular, …
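For reference, here is a minimal version of the standard shift-by-the-max trick (the generalization the post develops is not shown here): subtracting $\max_i \theta_i$ before exponentiating keeps every term in a safe range, and the shift is added back outside the logarithm.

```python
import numpy as np

def log_sum_exp(theta):
    # log(sum_i exp(theta_i)) computed stably: the largest exponent becomes
    # exp(0) = 1, so the sum can neither overflow nor vanish entirely.
    m = np.max(theta)
    return m + np.log(np.sum(np.exp(theta - m)))

theta = np.array([-1000.0, -1001.0, -1002.0])
print(log_sum_exp(theta))             # approximately -999.59
print(np.log(np.sum(np.exp(theta))))  # the naive version underflows to -inf
```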

Symmetric functions and their extreme points

A symmetric function $f(x_1, \dots, x_n)$ is one that is invariant to permutations of $x$.  In the 2D case, this means $f(a,b) = f(b,a)$.  One basic example of a symmetric function is the area of a rectangle as a function of its side lengths, where $f(a,b) = a b$.  It is well known that to maximize the area of a rectangle under a constraint on the perimeter, you should set all side lengths equal, to form a square.  In this case, the maximum of this symmetric function occurs when $a=b$. While in general it is not always true that the extreme points of a symmetric function occur when $a=b$, there are special cases when it does hold.  For example, if $f(a,b)$ is a convex (concave) function, then a global minimum (maximum) will occur when $a=b$.  Of course, the function above is not concave, but this property still holds.  So the question arises: in what other situations does this nice property hold? In this blog post, I will state a general result that I found today regarding this …
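For the rectangle example, the claim follows from a one-line AM-GM argument (spelled out here for concreteness, not quoted from the post): with the perimeter $2a + 2b = P$ fixed,

$$ ab \;\leq\; \left( \frac{a+b}{2} \right)^2 = \frac{P^2}{16}, $$

with equality exactly when $a = b = P/4$.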

L2 projection onto the probability simplex with bound constraints

Projecting onto the probability simplex is a common problem that arises in frequency estimation and related tasks.  This problem can be solved efficiently with an $O(n \log{(n)})$ algorithm.  In this blog post, I consider the related problem of projecting onto the probability simplex with bound constraints.  Given a vector $ \mathbf{r} \in \mathbb{R}^n $, our goal is to find a vector $\mathbf{p}^*$ that solves the following optimization problem. $$ \begin{equation*} \begin{aligned} & \underset{\mathbf{p}}{\text{minimize}} & & \frac{1}{2} \lVert \mathbf{p} - \mathbf{r} \rVert_2^2 \\ & \text{subject to} & & \mathbf{1}^T \mathbf{p} = 1 \\ & & & \mathbf{a} \leq \mathbf{p} \leq \mathbf{b} \\ \end{aligned} \end{equation*} $$ This problem generalizes the standard probability simplex projection problem by introducing arbitrary bound constraints $ \mathbf{a} \leq \mathbf{p} \leq \mathbf{b}$. Here, $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n $.
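One way to see that the bounded variant stays tractable (a sketch of my own, not necessarily the algorithm derived in the post): the KKT conditions force the solution into the form $p_i = \mathrm{clip}(r_i - \tau, a_i, b_i)$ for a scalar $\tau$ chosen so the entries sum to one, and that scalar can be found by bisection.

```python
import numpy as np

def project_capped_simplex(r, a, b, tol=1e-12):
    # Assumes feasibility: a <= b elementwise and sum(a) <= 1 <= sum(b).
    def total(tau):
        return np.clip(r - tau, a, b).sum()  # non-increasing in tau
    lo, hi = np.min(r - b), np.max(r - a)    # total(lo) >= 1 >= total(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return np.clip(r - 0.5 * (lo + hi), a, b)

r = np.array([0.4, 0.3, 0.6])
a = np.zeros(3)
b = np.full(3, 0.5)
p = project_capped_simplex(r, a, b)
print(p, p.sum())  # bounds respected, entries sum to 1
```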

Beyond vectorization: writing efficient numpy code

Numpy is a great library for expressing computations over vectors, matrices, and multi-dimensional arrays in python due to its simplicity and efficiency. In a previous blog post, I discussed fairly well-known techniques for speeding up (and cleaning up) numpy code by avoiding loops and exploiting problem structure. I showed that when you use the suggestions in that post, you can speed up your numpy code by orders of magnitude. In this blog post, I'll show you how to get even more speed out of your numpy code on top of the improvements you get from vectorization. As a simple example to illustrate an inefficiency of numpy, consider computations of the form z = 0.2*x + 0.8*y where x and y are large numpy arrays. This computation forms the intermediate arrays 0.2*x and 0.8*y, so the memory overhead can be problematic and slow down the computation. On my laptop, it takes about 1 second for arrays of size 100 million:

```python
import numpy as np

def foo(x, y):
    return 0.2*x + 0.8*y
```
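One way to avoid those temporaries (my own sketch; not necessarily the remedy the post ends up recommending) is to preallocate buffers and use the out= argument of numpy's ufuncs, so that repeated calls do no fresh allocation:

```python
import numpy as np

def foo(x, y):
    return 0.2*x + 0.8*y                # allocates two temporaries plus the result

def foo_preallocated(x, y, out, scratch):
    np.multiply(x, 0.2, out=out)        # out = 0.2 * x
    np.multiply(y, 0.8, out=scratch)    # scratch = 0.8 * y
    np.add(out, scratch, out=out)       # out = 0.2*x + 0.8*y, no new arrays
    return out

n = 10_000_000
x, y = np.random.rand(n), np.random.rand(n)
out, scratch = np.empty(n), np.empty(n)
assert np.allclose(foo(x, y), foo_preallocated(x, y, out, scratch))
```

This matters most when the expression sits inside a loop, where the allocation and memory-bandwidth costs of the temporaries are paid on every iteration.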

Beat the Streak: Day Seven

In this blog post, I discuss questions of the form: “Does batter X perform better at home or away?  At day or night?  Against lefties or righties?  On Friday or Monday?”  What I found was a little bit surprising.  Take for example the batter Daniel Murphy.  When you look at his data from 2011-2017, you will see that he got a hit in 29.85% of 1424 plate appearances during day games, and he got a hit in 26.97% of 2673 plate appearances during night games.  This is a pretty meaningful difference, but is it statistically significant?  In other words, could this difference be explained purely by chance?  To answer this question, we can perform a chi-squared test under the null hypothesis that the true probabilities are the same.  When we do this, we get a chi-squared value of 3.35 and a corresponding p-value of 0.067.  Thus, we can reject the null hypothesis that the true underlying probabilities are the same at the 90% confidence level.  This is pretty convincing evidence that …
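For readers who want to reproduce this kind of test, the sketch below uses scipy's contingency-table test. The counts are reconstructed from the rounded percentages quoted above, so the resulting statistic will not match the post's 3.35 exactly.

```python
import numpy as np
from scipy.stats import chi2_contingency

day_pa, night_pa = 1424, 2673
day_hits = round(0.2985 * day_pa)       # reconstructed from the rounded 29.85%
night_hits = round(0.2697 * night_pa)   # reconstructed from the rounded 26.97%

table = np.array([[day_hits,   day_pa - day_hits],
                  [night_hits, night_pa - night_hits]])

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(chi2, p)  # compare against the post's reported 3.35 and 0.067
```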

Optimal Strategy for Farkle Dice

In this post, I will discuss my findings in terms of an optimal strategy for Farkle, which is a dice game of chance.  If you are unfamiliar with the game, I encourage you to skim the linked article so that you can better understand this blog post.  All of my findings are based on sound mathematical methods, and a computer program was used to determine the optimal strategy.  Before I begin, let me first state the values of different dice rolls I assumed while developing the strategy.

Each 1: 100 points
Each 5: 50 points
3 1's: 1000 points
3 2's: 200 points
...
3 6's: 600 points
4 1's: 2000 points
...
4 6's: 1200 points
5 1's: 3000 points
...
5 6's: 1800 points
Straight 1-6: 1500 points
3 pairs (e.g., 22 33 44): 750 points

Everything else is considered to be a Farkle (0 points). There are many variants of Farkle, but I like to play by this set of rules. The "game state" of a Farkle roll can roughly be characterized by 2 values: the current …
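To make the scoring table concrete, here is a simplified scorer of my own for a single six-die roll (it assigns points to the whole roll at once and ignores the choice of which dice to set aside, which the real strategy has to reason about):

```python
from collections import Counter

def simple_score(roll):
    """Score one six-die roll under the point values listed above (simplified)."""
    counts = Counter(roll)
    if sorted(roll) == [1, 2, 3, 4, 5, 6]:
        return 1500                              # straight 1-6
    if len(roll) == 6 and sorted(counts.values()) == [2, 2, 2]:
        return 750                               # three pairs
    score = 0
    for face, c in counts.items():
        if c >= 3:
            base = 1000 if face == 1 else 100 * face
            score += base * (c - 2)              # 3-of-a-kind = base, 4 = 2x, 5 = 3x
        elif face == 1:
            score += 100 * c                     # loose 1's
        elif face == 5:
            score += 50 * c                      # loose 5's
    return score                                 # 0 means the roll is a Farkle

print(simple_score([1, 1, 1, 5, 2, 3]))  # 1000 + 50 = 1050
print(simple_score([2, 3, 4, 6, 6, 3]))  # 0: a Farkle
```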

Writing Efficient Numpy Code

In this blog post, I am going to talk about writing efficient numpy code in python. I do a good amount of numerical linear algebra for my research and personal projects, and I typically code in python with numpy (and scipy) because it is very easy to use and it is also very efficient when used correctly. Consider the following problem, which will serve as a concrete task to use as an example throughout this post. Suppose we have a (column) vector $x$ of length $n = 2^k$ and we want to compute $H_k x$, where $H_k$ is a “hierarchical” matrix with branching factor $2$, defined by $$ \begin{align*} H_0 &= \begin{bmatrix} 1 \end{bmatrix} \\ H_{k+1} &= \begin{bmatrix} 1 & 1 \\ H_k & 0 \\ 0 & H_k \end{bmatrix} \end{align*} $$ where the top row is a vector of all ones and $0$ denotes a matrix of zeros having the same size as $H_k$. For example, $H_2$ is a $7 \times 4$ matrix that looks like this: $$ \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} $$
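To preview why exploiting this structure pays off (a sketch of my own, not necessarily the implementation the post builds up to), $H_k x$ can be computed directly from the recursive definition in $O(n \log n)$ time, without ever materializing the matrix:

```python
import numpy as np

def hierarchical_matvec(x):
    # Mirrors the recursion: the first output entry is the all-ones row (sum of x),
    # followed by H_{k-1} applied to each half of x.
    n = len(x)
    if n == 1:
        return x.copy()
    left, right = x[:n // 2], x[n // 2:]
    return np.concatenate([[x.sum()],
                           hierarchical_matvec(left),
                           hierarchical_matvec(right)])

x = np.arange(4.0)
print(hierarchical_matvec(x))  # H_2 @ [0, 1, 2, 3] = [6, 1, 0, 1, 5, 2, 3]
```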

Representing Graphical Model Factors in Numpy

I've been working a bit with graphical models lately for my research.  For a while I was using a library called pgmpy for its implementations of factor arithmetic and inference algorithms.  However, as my research is progressing I am needing more control than what pgmpy offers, so I decided to re-implement and extend the algorithms that I needed.  If you are in a similar situation and are finding yourself implementing your own graphical model algorithms, this post is for you. In the setting I am working in, I have a Markov random field model where there are only a small number of variables (no more than a few dozen, often much less).  However, each variable can take on a possibly large number of categorical values and the graph may contain relatively large cliques that make exact inference computationally challenging (although still possible). Suppose the model has $d$ variables and variable $i$ can take on $n_i$ possible values.  Usually, a factor defined over $k$ variables …
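To give a flavor of what such a re-implementation involves, here is a minimal sketch of a factor representation (my own illustration, not pgmpy's API and not necessarily the design the post settles on): each factor stores a list of variable names together with an ndarray that has one axis per variable, so that multiplying two factors reduces to numpy broadcasting.

```python
import numpy as np

class Factor:
    """A factor over named discrete variables: one ndarray axis per variable."""
    def __init__(self, variables, values):
        self.variables = list(variables)
        self.values = np.asarray(values)

    def _expand(self, joint):
        # Transpose the axes to follow the order of `joint`, then insert size-1
        # axes for variables this factor does not mention, so that broadcasting
        # lines everything up.
        present = [v for v in joint if v in self.variables]
        arr = np.transpose(self.values, [self.variables.index(v) for v in present])
        shape = [arr.shape[present.index(v)] if v in self.variables else 1 for v in joint]
        return arr.reshape(shape)

    def __mul__(self, other):
        joint = self.variables + [v for v in other.variables if v not in self.variables]
        return Factor(joint, self._expand(joint) * other._expand(joint))

# Multiplying a factor over (A, B) with a factor over (B, C) yields a factor
# over (A, B, C) whose table has shape (n_A, n_B, n_C).
f = Factor(['A', 'B'], np.arange(6.0).reshape(2, 3))
g = Factor(['B', 'C'], np.ones((3, 4)))
h = f * g
print(h.variables, h.values.shape)  # ['A', 'B', 'C'] (2, 3, 4)
```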