Modeling and Predicting the Outcome of a Sequence of Games

Here's something I thought of today that I think is pretty interesting. Consider the following toy problem:
Suppose Alice and Bob play a sequence of games, and the probability that Alice wins game $i$ is given by $p_i$. Further, assume $p_1 \sim U[0,1]$ is uniform and that $ p_{i+1} \sim \mathrm{Beta}(\alpha p_i + \beta, \alpha (1-p_i) + \beta) $ for known $ (\alpha, \beta) $. We are unable to directly observe $p_i$, but we do observe the outcome of each game $ o_i $ for $ 1 \leq i \leq k $. The task is to estimate $ p_{k+1} $ given the evidence $ o_1, \dots, o_k $.
The figure below shows how Alice's win probability might evolve over time.  This particular figure was generated with $ \alpha = 1000 $ and $ \beta = 5 $.
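If you want to generate a trajectory like this yourself, sampling directly from the model definition is straightforward. The snippet below is my own reconstruction (the function name is just my choice), not necessarily how the figure was produced:

import numpy as np

def sample_trajectory(k, alpha=1000, beta=5):
    '''Sample p_1, ..., p_k: p_1 ~ U[0,1], then
    p_{i+1} ~ Beta(alpha*p_i + beta, alpha*(1-p_i) + beta).'''
    ps = [np.random.rand()]
    for _ in range(k - 1):
        p = ps[-1]
        ps.append(np.random.beta(alpha*p + beta, alpha*(1-p) + beta))
    return ps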

We can model this situation with a modified Hidden Markov Model.  The graph below shows how the random variables are related in the form of a directed graphical model:
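In equations, the graphical model says the joint distribution factorizes as

$$ P(p_1, \dots, p_{k+1}, o_1, \dots, o_k) = P(p_1) \prod_{i=2}^{k+1} P(p_i \mid p_{i-1}) \prod_{i=1}^{k} P(o_i \mid p_i), $$

where $ P(o_i \mid p_i) = p_i^{o_i} (1-p_i)^{1-o_i} $ if we code a win for Alice as $ o_i = 1 $.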

There are very efficient algorithms for doing inference and learning on Hidden Markov Models, but they usually assume that the state space of the hidden nodes is discrete.  The difference here is that the transition model $ P(p_i \mid p_{i-1}) $ is a continuous probability distribution, so the standard inference and learning algorithms don't apply directly.  We could of course discretize the transition model, but I am interested in doing the mathematically correct thing and handling the continuous probability distribution directly.  I tried working out the mathematical details, but it's deceptively hard.  If you have any insights, please share them in the comments below!  I have included some code at the end of this post for you to experiment with if you want to test out your ideas.
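For what it's worth, here is how far the exact approach gets before it stalls. Write $ f_k $ for the predictive density of $ p_k $ given $ o_1, \dots, o_{k-1} $, so $ f_1 = 1 $ on $ [0,1] $. Each observation triggers a Bayes update

$$ \pi_k(p) \propto p^{o_k} (1-p)^{1-o_k} f_k(p), $$

where $ \pi_k $ is the density of $ p_k $ given $ o_1, \dots, o_k $, and the transition model then pushes this forward:

$$ f_{k+1}(p') = \int_0^1 \mathrm{Beta}\big(p' ;\ \alpha p + \beta,\ \alpha (1-p) + \beta\big)\, \pi_k(p)\, dp. $$

The quantity we want is the predictive mean $ \hat{p}_{k+1} = \int_0^1 p' f_{k+1}(p')\, dp' $. The trouble is that integrating against a Beta kernel whose parameters depend on $ p $ doesn't seem to keep $ f $ in any simple parametric family, so the densities get messier with every step.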

I am also interested in the more general problem where there are more than two players, and we observe a sequence of matches $ (x_i, y_i, o_i) $ where $x_i$ and $y_i$ are the players and $o_i$ is the outcome of the match.  To what degree can we predict the outcomes of future matches given this evidence?  With some modifications of the toy example, we could use this model to predict the outcomes of sporting events, which I think is pretty cool.  However, since even the simplified toy example seems challenging, the general problem is probably quite hard.
import numpy as np

class Learner:
    def __init__(self):
        pass
    def set_params(self, alpha, beta):
        self.alpha = alpha
        self.beta = beta
    def update(self, winner):
        '''Update the internal model with the outcome of the current game.
        winner = True if Alice won, and False otherwise.'''
        pass
    def predict(self):
        '''Return the probability that Alice wins the next game.'''
        pass

class Naive(Learner):
    def __init__(self):
        pass  
    def update(self, winner):
        pass    
    def predict(self):
        return 0.5

class WeightedAverage(Learner):   
    def __init__(self, history):
        self.index = 0
        self.history = history
        # random vector of Trues/Falses
        self.games = np.random.rand(history) < 0.5
    def update(self, winner):
        self.games[self.index] = winner
        self.index = (self.index + 1) % self.history
    def predict(self):
        ''' empirical average of most recent games, with regularization '''
        return (1.0+np.sum(self.games)) / (2.0+self.history)

def evaluate(learner, n = 1000, alpha = 100, beta = 5):
    learner.set_params(alpha, beta)
    p = np.random.rand()
    # burn in period, let the model learn something before testing it
    for i in range(100):
        winner = np.random.rand() < p
        learner.update(winner)
        p = np.random.beta(alpha*p + beta, alpha*(1-p) + beta)
    sqerr = 0.0
    loglike = 0.0
    winner = np.random.rand() < p
    for i in range(n):
        learner.update(winner)
        p = np.random.beta(alpha*p + beta, alpha*(1-p) + beta)
        q = learner.predict()  # predicted probability that Alice wins the next game
        winner = np.random.rand() < p  # actual outcome of that game
        sqerr += (p-q)**2
        loglike += winner*np.log(q) + (1-winner)*np.log(1-q)
    print(learner.__class__.__name__)
    print('  MSE: %.5f' % (sqerr/n))
    print('  LogLoss: %.5f' % (-loglike/n))

if __name__ == '__main__':
    evaluate(Naive())
    evaluate(WeightedAverage(50))
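If you want a baseline that actually uses the transition model, here is a minimal sketch of a bootstrap particle filter that plugs into the Learner interface above.  It approximates the posterior over $p$ with samples rather than solving the filtering recursion exactly, so it sidesteps the analytical question instead of answering it; the class name and default particle count are my own choices.

class ParticleFilter(Learner):
    '''Represent the posterior over Alice's win probability
    with a cloud of samples (particles).'''
    def __init__(self, n_particles=1000):
        self.n = n_particles
        self.particles = np.random.rand(n_particles)  # p_1 ~ U[0,1]
    def update(self, winner):
        # Weight each particle by the likelihood of the observed outcome.
        w = self.particles if winner else 1.0 - self.particles
        w = w / np.sum(w)
        # Resample particles in proportion to their weights.
        idx = np.random.choice(self.n, size=self.n, p=w)
        resampled = self.particles[idx]
        # Propagate each particle through the transition model.
        self.particles = np.random.beta(
            self.alpha * resampled + self.beta,
            self.alpha * (1.0 - resampled) + self.beta)
    def predict(self):
        # Posterior mean of p for the next game.
        return float(np.mean(self.particles))

To try it, define the class above the __main__ block and add evaluate(ParticleFilter()) there to compare it against the other learners.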
