Optimal Strategy for Farkle Dice

In this post, I will discuss my findings in terms of an optimal strategy for Farkle, which is a dice game of chance.  If you are unfamiliar with the game, I encourage you to skim the linked article so that you can better understand this blog post.  All of my findings are based on sound mathematical methods and a computer program was used to determine the optimal strategy.  Before I begin, let me first state the value of different dice rolls I assumed while developing the strategy.

Each 1: 100 points
Each 5: 50 points
3 1's: 1000 points
3 2's: 200 points
3 6's: 600 points
4 1's: 2000 points
4 6's 1200 points
5 1's 3000 points
5 6's 1800 points
Straight 1-6 1500 points
3 Pairs (e.g., 22 33 44): 750 points
Everything else is considered to be a Farkle (0 points).

There are many variants to Farkle, but I like to play by this set of rules.

The "game state" of Farkle roll can roughly be characterized by 2 values: the current score accumulate…

Writing Efficient Numpy Code

In this blog post, I am going to talk about writing efficient numpy code in python. I do a good amount of numerical linear algebra for my research and personal projects, and I typically code in python with numpy (and spicy) because it is very easy to use and it is also very efficient when used correctly.

Consider the following problem, which will serve as a concrete task to use as an example throughout this post. Suppose we have a (column) vector $x$ of length $n = 2^k$ and we want to compute $H_k x$ where $H_k$ is a “hierarchical” matrix with branching factor $2$, defined by $$ \begin{align*} H_0 &= \begin{bmatrix} 1 \end{bmatrix} \\ H_{k+1} &= \begin{bmatrix} 1 & 1 \\ H_k & 0 \\ 0 & H_k \end{bmatrix} \end{align*} $$ Where the top row is a vector of all ones and 0 denotes a matrix of zeros having the same size as $H_k$. For example, $H_2$ is a $7 \times 4$ matrix that looks like this: $$ \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0…

Representing Graphical Model Factors in Numpy

I've been working a bit with graphical models lately for my research.  For a while I was using a library called pgmpy for their implementations of factor arithmetic and inference algorithms.  However, as my research is progressing I am needing more control than what pgmpy offers, so I decided to re-implement and extend the algorithms that I needed.  If you are in a similar situation and are finding yourself implementing your own graphical models algorithms, this post is for you.

In the setting I am working in, I have a Markov random field model where there are only a small number of variables (no more than a few dozen, often much less).  However, each variable can take on a possibly large number of categorical values and the graph may contain relatively large cliques that make exact inference computationally challenging (although still possible).

Suppose the model has $d$ variables and variable $i$ can take on $n_i$ possible values.  Usually a factor defined over $k$ variables can…

A Streamlined Workflow for Scientific Reproduciblility and Efficiency

As an undergraduate researcher, I was often unnecessarily bogged down by my poor workflow.  I consistently programmed "one off" scripts which sometimes turned into an important component of my research, and I was frequently frustrated by the way things were difficult to use.  My workflow improved a little when I got to grad school, but I finally decided to come up with a workflow that I could be happy with early this semester.  After collaborating with another student on a project and having a really hard time to integrate into his code, I decided I needed to come up with a good set of practices for my own sanity and for the sake of future collaborators as well.  While there is an initial upfront cost of using good programming principles, the long term benefits will usually outweigh the initial costs by a lot.  In this blog post, I'll summarize some key pieces to my new workflow.  If you are struggling with some of the problems that I mentioned then I hope you will find …

Kronecker Product for Matrices with Special Structure Using LinearOperators

I have been working with very large matrices for my research lately, and some of them are sparse, while others are dense but have some special structure that can be exploited to represent them efficiently.  I have also been constructing enormous matrices from these smaller building blocks using the Kronecker product.  Both numpy and scipy have functions built in to compute the Kronecker product between two matrices, but they need the input matrices to be explicitly represented (as a dense numpy array or a sparse scipy matrix), and so they are not compatible with the special-structure matrices that I've been dealing with.  I recently solved this problem, and that is the topic of this blog post.
An Example As an example to make things more concrete, consider the two matrices shown below:
$$ A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}  \qquad B = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 &…

Modeling and Predicting the Outcome of a Sequence of Games

Here's something I though of today that I think is pretty interesting. Consider the following toy problem:
Suppose Alice and Bob play a sequence of games, and the probability that Alice wins game $i$ is given by $p_i$. Further, assume $p_1 \sim U[0,1]$ is uniform and that $ p_{i+1} \sim Beta(\alpha p_i + \beta, \alpha (1-p_i) + \beta) $ for known $ (\alpha, \beta) $. We are unable to directly observe $p_i$, but we do observe the outcome of each game $ o_i $ for $ 1 \leq i \leq k $. The task is to estimate $ p_{k+1} $ given the evidence $ o_1, \dots, o_k $.   The figure below shows how Alice's win probability might evolve over time.  This particular figure was generated with $ \alpha = 1000 $ and $ \beta = 5 $.

We can model this situation with a modified Hidden Markov Model.  The graph below shows how the random variables are related in the form of a directed graphical model:

There are very efficient algorithms for doing inference and learning on Hidden Markov Models, but t…

Efficiently Remove Duplicate Rows from a 2D Numpy Array

Suppose you have a 2d numpy array and you want to remove duplicate rows (or columns).  In this blog post, I'll show you a trick you can use to do this more efficiently than using np.unique(A, axis=0).  This algorithm has time complexity $ O(\max(n \log{n}, n m)) $ for an $ n \times m $ matrix, and works almost surely.  By "almost surely" I mean that it is a randomized algorithm that works correctly with probability $1$.  To find the unique rows of a matrix $A$, the algorithm works by generating a random vector $x$ of real numbers, computing the dot product $ y = A x $, then analyzing the unique elements of $y$.  The indices of unique elements in $y$ is the same as the unique row indices in $A$ with probability $1$.  Below is an example to demonstrate how it works:

> import numpy as np > A = np.random.randint(low=0, high=3, size=(10, 2)) > A array([[2, 1], [1, 2], [0, 0], [2, 2], [1, 2], [0, 0], [0, 2], [2, 1], …

Context-Dependent Pitch Prediction with Neural Networks

This semester I took the machine learning class at UMass, and for my final project I developed models for predicting characteristics of pitches based on the context that they were thrown in.  The context consists of relevant and available information known before the pitch is thrown, such as the identities of the batter and pitcher, the batter height and stance, the pitcher handedness, the current count, and many others.  In particular, I looked into modeling the distribution over pitch types and locations.  This problem is challenging primarily because for a particular context (a specific setting of the different features that make up the context), there is usually very little data available for pitches thrown in that context.  In general, for each context feature we include, we have to split up our data into $k$, groups where $k$ is the number of values that the context feature can take on (for batter stance it is $k=2$, for count it is $k=12$, etc).  Thus, developing models by part…

An Interesting Game

Today I'm going to talk about an interesting game that I thought of, and in particular I'll show the probability that each player will win.  While the rules of the game are not fair, the skill levels of each player are not necessarily the same either, so I formally define what I mean and find the conditions under which these factors balance out.  Without further ado, here is the problem statement:
Alice and Bob play a game consisting of an indeterminate number of rounds.  In each round, Alice and Bob compete to win the round, and thus score a point.  Alice is more skilled at this game and so she wins each round with probability $ p > 0.5 $ (independent of the score and other context), meaning Bob wins each round with probability $ 1 - p $.  The score starts at $ 0 - 0 $ and Bob wins if he ever has more points than Alice, and Alice wins otherwise.  What is the probability that Bob wins this game?  For what value of $ p $ is this game fair? While the game never ends unless Bo…

Sorting as a Means of Shuffling in Racket

Racket is a derivative of Lisp, and it was also the language used in my first computer science class in college.  It is a functional language which encourages thinking in terms of composing simple functions to accomplish possibly complex goals.  When I was a lab assistant for the class, I recall having a conversation with the professor about how to shuffle a list of numbers in the language, and I came up with an interesting solution which I will talk about in this post.

In Racket, you can sort a list of anything by calling the sort function along with a list, and a function to compare any two items in the list.  An example use is (sort (list 4 1 3 2) <) which produces (list 1 2 3 4).  Since shuffling isn't natively supported in Racket and the simplest shuffling algorithms are not suitable for functional languages with lists instead of arrays, we have to be creative to achieve shuffling behavior.  My idea to shuffle a list was simple: replace the function to compare two items wi…

Fun Math Puzzles

A decent portion of my blog posts are about mathematical puzzles -- and in particular probability problems.  Unfortunately, since most of the traffic to this blog comes from google searches, those posts don't get nearly as many viewers as some of my other posts.   In order to increase viewership to my other content, I am creating this post to act as a central location where I list off and link to every challenge problem I have talked about on this blog.  If you are like me and like solving fun math problems, then you should check out the posts and try to solve the questions for yourself.  If you come up with a neat solution, feel free to share your solutions in the comments.  Here are the problems in the order that I blogged about them:
A Fun Puzzle from ProbabilityAnalyzing an Unfair GameCounting Special Non-Descending SequencesAnalyzing Randomly Descending SequencesRandom Walk on a Complete GraphPattern in Pascals TriangleMathematics and Darts!Triangles Containing the OriginInves…

Neural Networks Simplified

Artificial Neural Networks, and specifically deep neural networks have been gaining a lot of attention recently, and they are being used successfully across many disciplines. I first encountered neural networks a few years ago, but after hearing the term "backpropagation" and having no idea what it meant I feared that neural networks may be too complicated for me to understand. I made no attempts to truly understand them until recently, and have found that they are actually very easy to understand. I'm writing this blog post share my perspective on how to think about neural networks without jumping directly into the math.  This post is intended for beginners interested in machine learning using neural networks.  By the end of this blog post, you should be able to understand basics of neural networks enough to talk about them intelligently. And if you are so inclined, you can do a deeper dive into the mathematics that underlies the idea.

In this blog post I will talk a…

Active Learning and Thompson Sampling

Active learning is an interesting field in machine learning where the data is actively collected rather than passively observed.  That is, the learning algorithm has some influence over the data that is collected and used for learning.  The best example of this is Google AdWords, which uses active learning to serve ads to people in order to maximize click-through rate, profit, or some other objective function.  In this setting, the learning algorithm used by Google can choose which Ad to show me then it observes whether I clicked it or ignored it.  The learning algorithm will then update it's beliefs about the world based on this feedback so it can serve more relevant ads in the future.

Since the learning algorithm doesn't know which types of ads are most relevant a priori, it must explore the space of ads to determine which ones produce the highest click-through rate.  However, it also needs to exploit it's current knowledge to maximize click-through rate.  Thus, a good l…