## Posts

### Symmetric functions and their extreme points

A symmetric function $f(x_1, \dots, x_n)$ is one that is invariant under permutations of its arguments.  In the 2D case, this means $f(a,b) = f(b,a)$.  One basic example of a symmetric function is the area of a rectangle as a function of its side lengths, where $f(a,b) = a b$.  It is well known that to maximize the area of a rectangle under a perimeter constraint, you should set all side lengths equal, forming a square.  In this case, the maximum of this symmetric function occurs when $a=b$.

While in general it is not always true that the extreme points of a symmetric function occur when $a=b$, there are special cases when it does hold.  For example, if $f(a,b)$ is a convex (concave) function, then a global minimum (maximum) will occur when $a=b$.  Of course, the area function above is neither convex nor concave, but this property still holds.  So the question arises: in what other situations does this nice property hold?
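The rectangle example is easy to check numerically: fix the half-perimeter $a + b = 2$ and scan the area over a grid of side lengths; the maximum should land on the square $a = b = 1$.

```python
import numpy as np

# Fix the half-perimeter a + b = 2 and scan areas a*b over a grid;
# the maximum area should occur at the square a = b = 1.
a = np.linspace(0.0, 2.0, 2001)   # grid includes a = 1.0
b = 2.0 - a
areas = a * b
best_a = a[np.argmax(areas)]      # side length with maximum area
```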

In this blog post, I will state a general result that I found today regarding this probl…

### L2 projection onto the probability simplex with bound constraints

Projecting onto the probability simplex is a common problem that arises in frequency estimation and related tasks.  This problem can be solved efficiently with an $O(n \log n)$ algorithm.  In this blog post I consider the related problem of projecting onto the probability simplex with bound constraints.  Given a vector $\mathbf{r} \in \mathbb{R}^n$, our goal is to find a vector $\mathbf{p}^*$ that solves the following optimization problem. \begin{equation*} \begin{aligned} & \underset{\mathbf{p}}{\text{minimize}} & & \frac{1}{2} \lVert \mathbf{p} - \mathbf{r} \rVert_2^2 \\ & \text{subject to} & & \mathbf{1}^T \mathbf{p} = 1 \\ & & & \mathbf{a} \leq \mathbf{p} \leq \mathbf{b} \\ \end{aligned} \end{equation*} This problem generalizes the standard probability simplex projection problem by introducing arbitrary bound constraints $\mathbf{a} \leq \mathbf{p} \leq \mathbf{b}$. Here, $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$. …
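The KKT conditions for this problem give $p_i = \mathrm{clip}(r_i - \tau, a_i, b_i)$ for a scalar multiplier $\tau$ chosen so the entries sum to 1.  The sketch below finds $\tau$ by bisection rather than the sorting-based method; it assumes the feasibility condition $\mathbf{1}^T \mathbf{a} \leq 1 \leq \mathbf{1}^T \mathbf{b}$.

```python
import numpy as np

def project(r, a, b, iters=60):
    """L2 projection of r onto {p : sum(p) = 1, a <= p <= b}.

    By the KKT conditions, p_i = clip(r_i - tau, a_i, b_i) for a
    scalar tau with sum(p) = 1.  sum(p) is nonincreasing in tau, so
    bisection works.  Assumes sum(a) <= 1 <= sum(b) (feasibility).
    """
    lo = np.min(r - b)   # tau this small clips every entry to b
    hi = np.max(r - a)   # tau this large clips every entry to a
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(r - tau, a, b).sum() > 1.0:
            lo = tau
        else:
            hi = tau
    return np.clip(r - 0.5 * (lo + hi), a, b)
```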

### Beyond vectorization: writing efficient numpy code

Numpy is a great library for expressing computations over vectors, matrices, and multi-dimensional arrays in python due to its simplicity and efficiency. In a previous blog post, I discussed fairly well-known techniques for speeding up (and cleaning up) numpy code by avoiding loops and exploiting problem structure. I showed that when you use the suggestions in that post, you can speed up your numpy code by orders of magnitude. In this blog post, I’ll show you how to get even more speed out of your numpy code on top of the improvements you get from vectorization. As a simple example to illustrate an inefficiency of numpy, consider computations of the form z = 0.2*x + 0.8*y where x and y are large numpy arrays. This computation will form intermediate arrays 0.2*x and 0.8*y, so the memory overhead can be problematic and slow down the computation. On my laptop, it takes about 1 second for arrays of size 100 million:
```python
import numpy as np

def foo(x, y):
    return 0.2*x + 0.8*y
```
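As one illustration of avoiding those temporaries (my own sketch, not necessarily the technique the post goes on to use): rewrite the expression as $0.2x + 0.8y = 0.2(x - y) + y$ and evaluate it with in-place ufuncs against a single output buffer.

```python
import numpy as np

def foo_inplace(x, y, out=None):
    """Compute 0.2*x + 0.8*y with no throwaway temporaries.

    Uses the identity 0.2*x + 0.8*y = 0.2*(x - y) + y so that
    every step can write into a single preallocated buffer via
    the ufunc `out=` argument and in-place operators.
    """
    if out is None:
        out = np.empty_like(x)
    np.subtract(x, y, out=out)  # out = x - y
    out *= 0.2                  # out = 0.2*(x - y)
    out += y                    # out = 0.2*x + 0.8*y
    return out
```

Passing a reusable `out` buffer also lets repeated calls avoid any allocation at all.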

### Beat the Streak: Day Seven

In this blog post, I discuss questions of the form: “Does batter X perform better at home or away? At day or night? Against lefties or righties? On Friday or Monday?” What I found was a little bit surprising.
Take for example the batter Daniel Murphy. When you look at his data from 2011 - 2017, you will see that he got a hit in 29.85% of 1424 plate appearances during day games and he got a hit in 26.97% of 2673 plate appearances during night games. This is a pretty meaningful difference, but is it statistically significant? In other words, could this difference be explained purely by chance? To answer this question, we can perform a chi squared test under the null hypothesis that the true probabilities are the same. When we do this we get a chi squared value of 3.35 and a corresponding p value of 0.067. Thus, we can reject the null hypothesis that the true underlying probabilities are the same at the 90% confidence level. This is pretty convincing evidence that the day/night split really matt…
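A test like this can be reproduced with scipy. The hit counts below are reconstructed from the rounded percentages quoted above, so the resulting statistic lands near, but need not exactly match, the post's 3.35.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Daniel Murphy's day/night splits, with hit counts reconstructed
# from the rounded rates in the post (~29.85% of 1424, ~26.97% of 2673).
day_hits, day_pa = 425, 1424
night_hits, night_pa = 721, 2673
table = np.array([[day_hits, day_pa - day_hits],
                  [night_hits, night_pa - night_hits]])

# Pearson chi-squared test of independence on the 2x2 table
# (no Yates continuity correction).
chi2, p, dof, _ = chi2_contingency(table, correction=False)
```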

### Optimal Strategy for Farkle Dice

In this post, I will discuss my findings on an optimal strategy for Farkle, which is a dice game of chance.  If you are unfamiliar with the game, I encourage you to skim the linked article so that you can better understand this blog post.  All of my findings are based on sound mathematical methods, and I used a computer program to determine the optimal strategy.  Before I begin, let me first state the value of different dice rolls I assumed while developing the strategy.

Each 1: 100 points
Each 5: 50 points
3 1's: 1000 points
3 2's: 200 points
...
3 6's: 600 points
4 1's: 2000 points
...
4 6's: 1200 points
5 1's: 3000 points
...
5 6's: 1800 points
Straight 1-6: 1500 points
3 Pairs (e.g., 22 33 44): 750 points
Everything else is considered to be a Farkle (0 points).

There are many variants to Farkle, but I like to play by this set of rules.
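As a sketch, the scoring table above can be implemented directly. This version ignores rarer cases not in the table (such as six of a kind), and the straight and three-pairs categories only apply to a full roll of six dice.

```python
from collections import Counter

def score_roll(dice):
    """Score a roll per the table above: straight, three pairs,
    n-of-a-kind (3/4/5 dice worth 1x/2x/3x the base value),
    plus loose 1s and 5s.  Anything else scores 0 (a Farkle)."""
    counts = Counter(dice)
    if len(dice) == 6:
        if len(counts) == 6:
            return 1500   # straight 1-6
        if len(counts) == 3 and all(c == 2 for c in counts.values()):
            return 750    # three pairs
    total = 0
    for face, c in counts.items():
        base = 1000 if face == 1 else 100 * face
        if c >= 3:
            total += base * (c - 2)   # 3 -> base, 4 -> 2x, 5 -> 3x
        elif face == 1:
            total += 100 * c          # each loose 1
        elif face == 5:
            total += 50 * c           # each loose 5
    return total
```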

The "game state" of a Farkle roll can roughly be characterized by 2 values: the current score accumulate…

### Writing Efficient Numpy Code

In this blog post, I am going to talk about writing efficient numpy code in python. I do a good amount of numerical linear algebra for my research and personal projects, and I typically code in python with numpy (and scipy) because it is very easy to use and it is also very efficient when used correctly.

### Modeling and Predicting the Outcome of a Sequence of Games

Here's something I thought of today that I think is pretty interesting. Consider the following toy problem:
Suppose Alice and Bob play a sequence of games, and the probability that Alice wins game $i$ is given by $p_i$. Further, assume $p_1 \sim U[0,1]$ is uniform and that $p_{i+1} \sim \text{Beta}(\alpha p_i + \beta, \alpha (1-p_i) + \beta)$ for known $(\alpha, \beta)$. We are unable to directly observe $p_i$, but we do observe the outcome of each game $o_i$ for $1 \leq i \leq k$. The task is to estimate $p_{k+1}$ given the evidence $o_1, \dots, o_k$.  The figure below shows how Alice's win probability might evolve over time.  This particular figure was generated with $\alpha = 1000$ and $\beta = 5$.
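For concreteness, here is a minimal simulation of the generative process, using the same $\alpha = 1000$, $\beta = 5$ as the figure (the seed and number of games are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, k = 1000.0, 5.0, 100

p = rng.uniform()                 # p_1 ~ U[0, 1]
probs, outcomes = [], []
for _ in range(k):
    probs.append(p)
    outcomes.append(int(rng.random() < p))   # o_i ~ Bernoulli(p_i)
    # p_{i+1} ~ Beta(alpha*p_i + beta, alpha*(1-p_i) + beta)
    p = rng.beta(alpha * p + beta, alpha * (1 - p) + beta)
```

With $\alpha$ this large relative to $\beta$, consecutive $p_i$ stay close together, which is what makes inference from the binary outcomes tractable in practice.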

We can model this situation with a modified Hidden Markov Model.  The graph below shows how the random variables are related in the form of a directed graphical model:

There are very efficient algorithms for doing inference and learning on Hidden Markov Models, but t…

### Efficiently Remove Duplicate Rows from a 2D Numpy Array

Suppose you have a 2d numpy array and you want to remove duplicate rows (or columns).  In this blog post, I'll show you a trick you can use to do this more efficiently than using np.unique(A, axis=0).  This algorithm has time complexity $O(\max(n \log{n}, n m))$ for an $n \times m$ matrix, and works almost surely.  By "almost surely" I mean that it is a randomized algorithm that works correctly with probability $1$.  To find the unique rows of a matrix $A$, the algorithm works by generating a random vector $x$ of real numbers, computing the dot product $y = A x$, then analyzing the unique elements of $y$.  The indices of the unique elements of $y$ are the same as the indices of the unique rows of $A$ with probability $1$.  Below is an example to demonstrate how it works:

```python
> import numpy as np
> A = np.random.randint(low=0, high=3, size=(10, 2))
> A
array([[2, 1],
       [1, 2],
       [0, 0],
       [2, 2],
       [1, 2],
       [0, 0],
       [0, 2],
       [2, 1],
       …
```
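A minimal implementation of the trick might look like the sketch below (the projection vector is seeded here only for reproducibility):

```python
import numpy as np

def unique_rows(A):
    """Unique rows of A via a random projection.

    Distinct rows give distinct dot products against a random real
    vector with probability 1, so np.unique on the 1-D projection
    y = A @ x recovers the unique row indices without comparing
    whole rows.  Rows are returned in first-occurrence order."""
    x = np.random.default_rng(0).random(A.shape[1])
    y = A @ x
    _, idx = np.unique(y, return_index=True)
    return A[np.sort(idx)]
```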

### Context-Dependent Pitch Prediction with Neural Networks

This semester I took the machine learning class at UMass, and for my final project I developed models for predicting characteristics of pitches based on the context that they were thrown in.  The context consists of relevant and available information known before the pitch is thrown, such as the identities of the batter and pitcher, the batter height and stance, the pitcher handedness, the current count, and many others.  In particular, I looked into modeling the distribution over pitch types and locations.  This problem is challenging primarily because for a particular context (a specific setting of the different features that make up the context), there is usually very little data available for pitches thrown in that context.  In general, for each context feature we include, we have to split up our data into $k$ groups, where $k$ is the number of values that the context feature can take on (for batter stance it is $k=2$, for count it is $k=12$, etc.).  Thus, developing models by part…

### An Interesting Game

Today I'm going to talk about an interesting game that I thought of, and in particular I'll show the probability that each player will win.  While the rules of the game are not fair, the skill levels of the players are not necessarily equal either, so I formally define what I mean by a fair game and find the conditions under which these factors balance out.  Without further ado, here is the problem statement:
Alice and Bob play a game consisting of an indeterminate number of rounds.  In each round, Alice and Bob compete to win the round, and thus score a point.  Alice is more skilled at this game and so she wins each round with probability $p > 0.5$ (independent of the score and other context), meaning Bob wins each round with probability $1 - p$.  The score starts at $0 - 0$ and Bob wins if he ever has more points than Alice, and Alice wins otherwise.  What is the probability that Bob wins this game?  For what value of $p$ is this game fair? While the game never ends unless Bo…
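The game is easy to sanity-check by simulation. Since play continues until Bob takes the lead, the sketch below caps the number of rounds and scores an unresolved game for Alice (a modeling shortcut: by then the score has drifted far in her favor, so the truncation bias is negligible).

```python
import random

random.seed(0)

def bob_wins(p, max_rounds=2_000):
    """Simulate one game: Bob wins if he is ever ahead on points.

    Alice wins each round with probability p > 0.5; the round cap
    approximates the (possibly infinite) game, crediting unresolved
    games to Alice."""
    lead = 0  # Alice's points minus Bob's
    for _ in range(max_rounds):
        lead += 1 if random.random() < p else -1
        if lead < 0:
            return True
    return False

p = 0.6
trials = 10_000
est = sum(bob_wins(p) for _ in range(trials)) / trials
```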