### Modeling and Predicting the Outcome of a Sequence of Games

Here's something I though of today that I think is pretty interesting. Consider the following toy problem:
Suppose Alice and Bob play a sequence of games, and the probability that Alice wins game $i$ is given by $p_i$. Further, assume $p_1 \sim U[0,1]$ is uniform and that $p_{i+1} \sim Beta(\alpha p_i + \beta, \alpha (1-p_i) + \beta)$ for known $(\alpha, \beta)$. We are unable to directly observe $p_i$, but we do observe the outcome of each game $o_i$ for $1 \leq i \leq k$. The task is to estimate $p_{k+1}$ given the evidence $o_1, \dots, o_k$.
The figure below shows how Alice's win probability might evolve over time.  This particular figure was generated with $\alpha = 1000$ and $\beta = 5$.

We can model this situation with a modified Hidden Markov Model.  The graph below shows how the random variables are related in the form of a directed graphical model:

There are very efficient algorithms for doing inference and learning on Hidden Markov Models, but they usually assume that the state space of the hidden nodes is discrete.  The difference here is that the transition model $P(p_i \mid p_{i-1})$ is a continuous probability distribution, so the standard inference and learning algorithms don't apply.  We could of course discretize the transition model to make it discrete, but I am interesting in doing the mathematically correct thing and handle the continuous probability distribution directly.  I tried working out the mathematical details but it's deceptively hard.  If you have any insights, please share them in the comments below!  I have included some code at the end of this post for you to experiment with if you want to test out your ideas.

I am also interested in the more general problem where there are more than $2$ players, and we observe a sequence of matches $(x_i, y_i, o_i)$ where $x_i$ and $y_i$ are the players and $o_i$ is the outcome of the match.  To what degree can we predict the outcomes of future matches given this evidence?  With some modifications of the toy example, we can use this model to predict the outcomes of sporting events, which I think is pretty cool.  However, since even the simplified toy example seems challenging, it seems like a really hard thing to do.
import numpy as np

class Learner:
def __init__(self):
pass
def set_params(self, alpha, beta):
self.alpha = alpha
self.beta = beta
'''Update the internal model with the outcome of the current game
winner = True if Alice won, and False otherwise'''
def update(self, winner):
pass
'''Return the probability that Alice wins the next game'''
def predict(self):
pass

class Naive(Learner):
def __init__(self):
pass
def update(self, winner):
pass
def predict(self):
return 0.5

class WeightedAverage(Learner):
def __init__(self, history):
self.index = 0
self.history = history
# random vector of Trues/Falses
self.games = np.random.rand(history) < 0.5
def update(self, winner):
self.games[self.index] = winner
self.index = (self.index + 1) % self.history
def predict(self):
''' empirical average of most recent games, with regularization '''
return (1.0+np.sum(self.games)) / (2.0+self.history)

def evaluate(learner, n = 1000, alpha = 100, beta = 5):
learner.set_params(alpha, beta)
p = np.random.rand()
# burn in period, let the model learn something before testing it
for i in range(100):
winner = np.random.rand() < p
learner.update(winner)
p = np.random.beta(alpha*p + beta, alpha*(1-p) + beta)
sqerr = 0.0
loglike = 0.0
winner = np.random.rand() < p
for i in range(n):
learner.update(winner)
p = np.random.beta(alpha*p + beta, alpha*(1-p) + beta)
q = learner.predict()
winner = np.random.rand() < p
sqerr += (p-q)**2
loglike += winner*np.log(q) + (1-winner)*np.log(1-q)
print learner.__class__
print '  MSE: %.5f' % (sqerr/n)
print '  LogLoss: %.5f' % (-loglike/n)

if __name__ == '__main__':
evaluate(Naive())
evaluate(WeightedAverage(50))