A place where share my thoughts and ideas with the world.

Random Walk on a Complete Graph

Get link

Facebook

Twitter

Pinterest

Email

Other Apps

This is the 4th math problem in a series of math problem blog posts that I am writing about. I thought of this problem as a challenging weekly problem for math club, and I will present 2 solutions. I came up with the first solution, and a friend of mine came up with the second solution -- a completely different approach that yielded the same answer.

You are randomly walking on a complete graph with \( 2 n \) vertices. You start at a random vertex and at each time step you transition to a different connected vertex. Let \( f(n) \) be the expected number of steps until you've visited exactly \( n \) distinct vertices. Find $$ \lim_{n \to \infty} \frac{f(n)}{n} $$

For those of you who are not familiar with graph theory, here is a complete graph with 6 vertices:

In the context of this problem, you start on one of the circles (it doesn't matter which due to symmetry) and at each time step, you cross one of the lines to a new circle. Henceforth, I will refer to the circles as vertices, and the lines as edges.

First Solution

This solution relies on an important result from probability theory. Namely, The Linearity of Expectation. The Linearity of Expectation says if you have random variables \( X_1, X_2, ..., X_n \), then
$$
E(X_1 + X_2 + \dots + X_n) = E(X_1) + E(X_2) + \dots + E(X_n)
$$ where \( E(Y) \) denotes the expected value of the random variable Y. Let \( X_1 \) be the expected number of steps to reach the first new vertex, \( X_2 \) be the expected number of steps to reach the second new vertex given that you've already reached the first vertex, etc. In general \( X_i \) is the expected number of steps to reach the \( i^{th} \) vertex given that you've already visited \( (i-1) \) distinct vertices. Thus, \( f(n) \) as defined in the problem statement is given by
$$
f(n) = \sum_{i=0}^n E(X_i)
$$ Here, \( X_i \) are geometrically distributed random variables where a success is achieved when a new vertex is reached. The probability of reaching a new vertex at any given time is \( p = \frac{2n-i}{2n} \), where the numerator is the number of unreached vertices and the denominator is the total number of vertices. Technically, the \( 2n \) should be a \( 2n-1 \) since you can't transition from a vertex to itself (the graph is <i>simple</i>), but since we are concerned with the limiting behavior of \( f(n) \), I will leave it in for simplicity. The expected value of a geometric random variable is \( E(Y) = \frac{1}{p} \), where \( p \) is the probability of success. Thus, \( E(X_i) = \frac{2 n}{2 n - i} \). Therefore, \( f(n) \) is given by
$$
f(n) = \sum_{i=0}^n \frac{2 n}{2 n - i} \\
f(n) = 2 n \sum_{i=0}^n \frac{1}{2 n - i}
$$ Let's change the indices of summation to simplify the expression. Notice that the summation is given by
$$
\frac{1}{2n} + \frac{1}{2n-1} + \dots + \frac{1}{n}
$$ Thus, we can write \( f(n) \) as
$$
f(n) = 2 n \sum_{i=n}^{2n} \frac{1}{i}
$$ Notice that the sum is related to the harmonic numbers, and we can easily express it in terms of the harmonic numbers.
$$
f(n) = 2 n \Big( \sum_{i=1}^{2n} \frac{1}{i} - \sum_{i=1}^{n-1} \frac{1}{i} \Big) = 2 n (H_{2n} - H_{n-1})
$$ In the limit, the harmonic numbers converge to the natural log function. Thus,
$$
\begin{split}
lim_{n \to \infty} \frac{f(n)}{n} &= lim_{n \to \infty} \frac{2 n (H_{2n} - H_{n-1})}{n}\\
&= lim_{n \to \infty} 2 (ln(2n) - ln(n-1))\\
&= \boxed { 2 ln(2) }
\end{split}
$$

Second Solution

This solution interprets the limiting behavior of \( f(n) \) in a continuous manner instead of solving for the finite discrete case then evaluating the limit. Let \( g(x) \) be the proportion of distinct vertices that we've visited after \( x \) steps. Here, a step is not interpreted in the same way as in the discrete case. Since we are dealing directly with the proportion of visited vertices, \( x \) denotes the ratio of steps (or visited vertices) to total vertices. In the most optimistic case, \( g(x) = x \), but in general \( g(x) < x \). Here is the visual interpretation of this method.

The probability of visiting a new vertex is equal to the proportion of white area to total area, or \( 1 - g(x) \). This interpretation can be thought of as a rate of change, and so we can setup the following differential equation:
$$
g'(x) = 1 - g(x) \\
g'(x) + g(x) - 1 = 0
$$ The general solution to this differential equation is \( g(x) = c e^{-x} + 1 \). We know that \( g(0) = 0 \) and \( g'(0) = 1 \) since at \( x = 0 \), we haven't visited any vertices yet so the probability of seeing a new vertex is \( 1 \). Solving for \( c \), we see that \( c = -1 \), so \( g(x) = 1 - e^{-x} \). We are interested in when half the vertices have been visited. Thus, we want to know the value of \( x \) when \( g(x) = 0.5 \).
$$
g(x) = 1 - e^{-x} = 0.5 \\
e^{-x} = 0.5 \\
x = ln(2)
$$ Thus, the answer to the original problem is
$$
lim_{n \to \infty} \frac{f(n)}{n} = \frac{ln(2)}{0.5} = \boxed { 2 ln(2) }
$$

Conclusion

This is one of the things I find fascinating about mathematics. You can take two seemingly unrelated approaches to get the same result. The first solution relied on ideas from probability theory, whereas the second approach utilized ideas from differential equations. This just goes to show you how closely related all the fields of math are.

Processors speeds are no longer growing at the rate we've grown accustomed to. Moore's law states that computers double in speed every 18 months. Moore's law is still alive and well, but we just need to find alternate ways to speed up computers. That's where multi-core programming comes into play. The idea behind it is that 2 heads are better than one and in the right situation 2 processors can work on the same problem to solve it twice as fast. Parallel programming is becoming increasingly important these days and there are many different flavors of it including distributed computing, shared memory computing, and GPU computing. Today I am going to focus on shared memory computing, or using multi-core computers rather than many single core computers. This blog post is intended to serve as a beginners guide to parallel programming with Java. It is recommended that the reader have some java experience and is familiar with some of the features of Java 8 (namely lambda expr

If you have taken an introductory computer science course, you've probably seen the binary search algorithm - an algorithm to efficiently find the index of an item in a sorted array, if it exists. You might not have heard of interpolation search, however. Interpolation search is an alternative to binary search that utilizes information about the underlying distribution of data to be searched. By using this additional information, interpolation search can be as fast as $O(log(log(n)))$, where $n$ is the size of the array. In this post I am going to talk about two implementations of interpolation search. First, I will discuss the algorithm as described on Wikepedia, then I will describe my own version of the algorithm that has a better worst case time complexity than the first and performs better in practice than the traditional binary search and interpolation search algorithms. The Basic Idea Interpolation search models how humans search a dictionary better than a binary search,

In order to maximize your probability of beating the streak, you should (1) predict the probability that a batter will get a hit in a given game given the game parameters and (2) determine if it's worthwhile to risk your current streak in order to possibly improve it by 1 or 2 games. In this blog post, I outline my solution to (2). In previous blog posts, I've hinted at what I do to solve (1) and will continue that discussion in a later blog post. Motivation When your current streak is short, the optimal strategy is to pick the best player every day, regardless of how likely their probability of getting a hit is (to a certain extent). However, as your streak grows, you have an important decision to consider: is it better to pick the best player today and possibly lose your current streak, or skip picking a player and instead maintain your current streak. Naturally, this decision should be guided by the players probability of getting a hit, as well as the distribution of the

## Comments

## Post a Comment