The Origin Story: An Academic Rivalry
The invention of Markov chains has a delightful backstory involving academic rivalry and literary analysis.
The Controversy
In early 20th-century Russia, mathematician Pavel Nekrasov claimed that the Law of Large Numbers (which says that averages converge to expected values as sample size grows) required events to be independent. This wasn’t just a mathematical claim—Nekrasov argued it had philosophical implications, suggesting that free will and independence were fundamental to probability theory.
Andrey Markov, a notoriously cantankerous mathematician at St. Petersburg University, found this argument absurd. He set out to prove Nekrasov wrong by showing that the Law of Large Numbers holds even when events are dependent on what came before.
The Literary Proof
In 1913, Markov did something remarkable: he analyzed Alexander Pushkin’s novel in verse Eugene Onegin (Евгений Онегин). He tallied roughly 20,000 letters from the text, classifying each as either a vowel or a consonant.
He showed that whether a letter is a vowel or consonant depends on the previous letter. Despite this dependence, the proportion of vowels still converges to a stable value. The Law of Large Numbers works perfectly well with dependent events.
This was the birth of Markov chains—sequences where the next state depends only on the current state, not the entire history. This property (often called “memorylessness” or the “Markov property”) became fundamental to probability theory.
The Personal Element
Markov wasn’t just proving a mathematical point—he was settling a personal and ideological score. He was politically liberal and opposed mystical interpretations of mathematics. Nekrasov’s claim that independence was somehow necessary for probabilistic laws to work smacked of metaphysical thinking that Markov despised.
By analyzing Eugene Onegin, Markov demonstrated that dependence is perfectly compatible with mathematical regularity. Probability theory doesn’t require philosophical assumptions about independence or free will. Even literary text follows mathematical patterns.
The irony is delicious. Markov used Russia’s greatest poet to demolish a philosophical argument about probability, and in doing so, created one of the most important tools in modern mathematics.
What Are Markov Processes?
A Markov process is a sequence of events where the probability of what happens next depends only on the current state, not on how you got there. Mathematically:
$$P(X_{n+1} = j \mid X_n = i, X_{n-1}, X_{n-2}, \ldots) = P(X_{n+1} = j \mid X_n = i)$$
The past is irrelevant—only the present matters.
Transition Probabilities and Matrices
In a Markov chain, you move between states with certain probabilities. If you’re in state \( i \), the probability of moving to state \( j \) is denoted \( p_{ij} \).
These transition probabilities can be arranged in a transition matrix \( P \) where entry \( (i,j) \) gives the probability of going from state \( i \) to state \( j \).
For example, a simple weather model with two states (Sunny, Rainy):
$$P = \begin{pmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{pmatrix}$$
This means:
- If today is Sunny: 80% chance tomorrow is Sunny, 20% chance Rainy
- If today is Rainy: 40% chance tomorrow is Sunny, 60% chance Rainy
Computing Future Probabilities
What’s the probability of being in state \( j \) after \( n \) steps, starting from state \( i \)? The answer is the \( (i,j) \) entry of \( P^n \) (the transition matrix raised to the \( n \)-th power). For example, what’s the probability it’s sunny in 3 days if it’s sunny today?
$$P^3 = P \times P \times P = \begin{pmatrix} 0.688 & 0.312 \\ 0.624 & 0.376 \end{pmatrix}$$
So there’s a 68.8% chance it’s sunny in 3 days given it’s sunny today. What about in 100 days? The problem is that computing \( P^{100} \) naively requires 99 matrix multiplications—tedious and error-prone. For larger systems with many states, this becomes computationally expensive.
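These small matrix powers are easy to check by hand or machine. A quick numerical verification (using numpy, not part of the original model description):

```python
import numpy as np

# Two-state weather chain: rows = today's state (Sunny, Rainy),
# columns = tomorrow's state.
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])

# Three-step transition probabilities
P3 = np.linalg.matrix_power(P, 3)
print(P3)
# Entry (Sunny, Sunny) is the probability it's sunny in 3 days given sun today.
```

Each row of \( P^3 \) still sums to 1, as any transition matrix must.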
Where Markov Processes Are Used
Today, Markov chains are everywhere in business, technology, and science:
- Web search: Google’s PageRank models surfing as a Markov chain where a page’s importance is the long-run visit probability
- Natural language: Voice assistants (Siri, Alexa) and predictive text use Hidden Markov Models
- Finance: Credit risk modeling, options pricing, regime-switching strategies
- Operations: Call center staffing, manufacturing capacity, network routing (queueing theory)
- Healthcare: Disease progression, hospital capacity planning, treatment modeling
- Supply chain: Inventory reordering, stockout prevention, disruption modeling
- AI: Reinforcement learning (AlphaGo, ChatGPT training) uses Markov Decision Processes
The unifying theme is systems where the future depends only on the present state, not the entire history.
Why Generating Functions Are Natural for Markov Chains
Here’s the deep connection that makes generating functions the obvious tool:
The Matrix Power Series Perspective
In a Markov chain, we need to compute \( P^n \) for various values of \( n \). Let’s write out this sequence:
- After 0 steps: \( P^0 = I \) (identity matrix—you’re still where you started)
- After 1 step: \( P^1 = P \)
- After 2 steps: \( P^2 \)
- After 3 steps: \( P^3 \)
- …
Now here’s the key insight: what if we encode this entire infinite sequence of matrices into a single object?
Let’s define the matrix generating function:
$$G(s) = I + Ps + P^2s^2 + P^3s^3 + \cdots = \sum_{n=0}^{\infty} P^n s^n$$
This is a matrix whose entries are generating functions!
The Beautiful Algebraic Structure
Notice something remarkable: $$G(s) = I + Ps + P^2s^2 + P^3s^3 + \cdots$$ $$G(s) = I + s(P + P^2s + P^3s^2 + \cdots)$$ $$G(s) = I + sP(I + Ps + P^2s^2 + \cdots)$$ $$G(s) = I + sP \cdot G(s)$$
Solving for \( G(s) \): $$G(s) - sPG(s) = I$$ $$(I - sP)G(s) = I$$ $$G(s) = (I - sP)^{-1}$$
This is stunning: the infinite sequence of matrix powers \( P, P^2, P^3, \ldots \) collapses into the simple closed form \( (I - sP)^{-1} \)!
This is the matrix analogue of the geometric series formula: $$\frac{1}{1-x} = 1 + x + x^2 + x^3 + \cdots$$
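This identity is easy to verify numerically: for a stochastic matrix and any \( |s| < 1 \), a truncated sum of \( P^n s^n \) should match \( (I - sP)^{-1} \). A minimal sketch, reusing the weather chain from above:

```python
import numpy as np

P = np.array([[0.8, 0.2],
              [0.4, 0.6]])
s = 0.5  # any |s| < 1 keeps the series convergent for a stochastic matrix

# Closed form: G(s) = (I - sP)^(-1)
G_closed = np.linalg.inv(np.eye(2) - s * P)

# Direct series: sum of P^n s^n, truncated once the terms are negligible
G_series = sum(np.linalg.matrix_power(P, n) * s**n for n in range(200))

print(np.max(np.abs(G_closed - G_series)))  # essentially zero
```

One matrix inversion replaces the entire infinite sum.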
Connection to Individual States
Since \( P \) is row-stochastic (entry \( (i,j) \) is the probability of going from \( i \) to \( j \)), probability distributions are row vectors. Let \( \mathbf{p}^{(n)} \) be the distribution at time \( n \). Then: $$\mathbf{p}^{(n)} = \mathbf{p}^{(0)} P^n$$
The generating function for the probability of being in state \( j \) is: $$g_j(s) = \sum_{n=0}^{\infty} p_j^{(n)} s^n$$
This is exactly the \( j \)-th component of the row vector: $$\mathbf{g}(s) = \sum_{n=0}^{\infty} \mathbf{p}^{(0)} P^n s^n = \mathbf{p}^{(0)} (I - sP)^{-1}$$
Why This Is Natural
Computing \( P^{100} \) directly requires multiplying the matrix by itself 99 times—computationally expensive and error-prone.
But with generating functions:
- Form \( I - sP \) (simple subtraction)
- Invert this matrix (one-time cost)
- Extract the coefficient of \( s^{100} \) (using techniques like partial fractions)
Even better, we can often get closed forms without computing individual coefficients at all!
The Natural Fit
So generating functions aren’t just a clever trick—they’re the natural algebraic object for studying Markov chains because:
- They encode the entire time evolution in one object
- They transform iterated matrix multiplication into a single matrix inversion
- They often yield closed forms that would be impossible to see otherwise
- They unify matrix powers, first passage times, and reward computations under one algebraic toolbox
This is why, when you see a Markov chain problem, reaching for generating functions should feel as natural as reaching for derivatives when you see rates of change.
The Challenge: Problems That Seem Intractable
While Markov chains are conceptually simple, answering basic questions about them can be extremely difficult. Computing probabilities over time requires raising transition matrices to high powers or summing over exponentially many paths.
Let’s see how generating functions—the same technique from the Fibonacci article—solve these problems elegantly.
Four Worked Examples
We’ll solve four real business problems step-by-step, showing the probability model, the state diagram, and how generating functions provide closed-form solutions. All four follow the same general method. First, model your states and transition probabilities, and define the quantity you want as a sequence over states or steps. Then find a recurrence relation for that sequence. The key step is to encode the sequence in a generating function and reinterpret the recurrence as an equation in that generating function. Solving this equation yields a closed form for the quantity you were after.
This can sound a bit abstract, but let’s explore this more concretely in our examples.
Example 1: Call Center Queue Management
You run a call center. Customers arrive randomly and wait in a queue. Currently there are 5 people waiting. If the queue hits 20, customer service degrades and people hang up. You need to decide whether to bring on more staff. Therefore, you need to know:
How long on average until the queue reaches a critical length of 20 people?
Let’s model our problem. First we define the Markov Process. The states are number of people in queue (0, 1, 2, …, 20, …). The transitions are a bit more complicated. Let’s assume that the customers arrive at rate \( \lambda \) = 10 per hour and that you have \( c = 3 \) operators, each serving at rate \( \mu = 4 \) per hour. Then the probability to go from state \( n \) to state \( n+1 \) (queue grows) is \( p = \frac{\lambda}{\lambda + c\mu} = \frac{10}{10+12} = \frac{10}{22} \), and the probability to go to state \( n-1 \) (queue shrinks) is \( q = \frac{c\mu}{\lambda + c\mu} = \frac{12}{22} \).
This is captured in the state transition diagram below: arrivals push the queue up with probability \( p \), service completions pull it down with probability \( q \).
graph LR
0((0)) -->|p| 1((1))
1 -->|q| 0
1 -->|p| 2((2))
2 -->|q| 1
2 -->|p| 3((3))
3 -->|q| 2
3 -->|p| 4((4))
4 -->|q| 3
4 -->|p| 5((5))
5 -->|q| 4
5 -->|p| 6((6))
6 -->|q| 5
6 -->|p| mid((...))
mid -->|q| 6
mid -->|p| 19((19))
19 -->|q| mid
19 -->|p| 20((20))
20 -->|q| 19
Now, the sequence we’re trying to find is the expected time (in hours) to reach state 20 starting from state \( n \), which we will call \( T_n \). In our initial question we start from \( n=5 \).
From state \( n \geq 1 \), the next event occurs after \( \frac{1}{\lambda + c\mu} \) hours on average; it is an arrival (move to \( n+1 \)) with probability \( p \) or a service completion (move to \( n-1 \)) with probability \( q \). Conditioning on that first event gives the recurrence:
$$T_n = \frac{1}{\lambda + c\mu} + p \cdot T_{n+1} + q \cdot T_{n-1}$$
With boundary conditions \( T_{20} = 0 \) (already there) and \( T_0 = \frac{1}{\lambda} + T_1 \) (from an empty queue, only arrivals occur).
Let’s now introduce our generating function and define $$G(s) = \sum_{n=0}^{19} T_n s^n$$
This is a finite sum since we stop when the queue is 20. Multiply the recurrence by \( s^n \) and sum over \( n \):
$$\sum_{n} T_n s^n = \frac{1}{\lambda + c\mu} \sum_{n} s^n + p \sum_{n} T_{n+1} s^n + q \sum_{n} T_{n-1} s^n$$
After algebraic manipulation (shifting indices, applying boundary conditions), we get a functional equation that can be solved.
$$T_n = \frac{1}{c\mu - \lambda} \left[ \left(\frac{q}{p}\right)^{n+1} \frac{(q/p)^{20-n} - 1}{q/p - 1} - (20-n) \right] \quad \text{when } \lambda < c\mu$$
For our numbers (\( \lambda = 10 \), \( c\mu = 12 \), so \( q/p = 1.2 \), \( n=5 \)): $$T_5 = \frac{1}{12-10} \left[ 1.2^{6} \cdot \frac{1.2^{15} - 1}{1.2 - 1} - 15 \right] \approx 100 \text{ hours}$$
So now we can answer our initial question. With three operators the queue is stable, and it takes about 100 hours on average before it reaches the critical length. If we reduce staff to 2 (\( c\mu = 8 < \lambda = 10 \)), the queue becomes unstable—it drifts upward at a net rate of \( \lambda - c\mu = 2 \) people per hour and hits 20 in roughly \( (20-5)/2 = 7.5 \) hours. The formula tells managers exactly the staffing threshold needed for stability: \( c\mu > \lambda \).
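The closed form can be cross-checked by solving the hitting-time equations directly as a linear system. A small Python sketch (using numpy; the model and rates are as above, with the queue truncated at 20):

```python
import numpy as np

lam, c, mu = 10.0, 3, 4.0            # arrivals/hr, operators, service rate/hr
Lam = lam + c * mu                   # total event rate
p, q = lam / Lam, (c * mu) / Lam     # queue grows / shrinks per event
N = 20                               # critical queue length

# Hitting times T_0..T_20 solve a tridiagonal linear system:
#   T_n = 1/Lam + p*T_{n+1} + q*T_{n-1}   for 1 <= n <= 19
#   T_0 = 1/lam + T_1 (empty queue: only arrivals),  T_20 = 0
A = np.zeros((N + 1, N + 1))
b = np.zeros(N + 1)
A[0, 0], A[0, 1], b[0] = 1.0, -1.0, 1.0 / lam
for n in range(1, N):
    A[n, n - 1], A[n, n], A[n, n + 1] = -q, 1.0, -p
    b[n] = 1.0 / Lam
A[N, N] = 1.0
T = np.linalg.solve(A, b)

# Closed form with r = q/p = c*mu/lam
r = q / p
T5_closed = (r**6 * (r**(N - 5) - 1) / (r - 1) - (N - 5)) / (c * mu - lam)
print(T[5], T5_closed)  # both ≈ 100 hours
```

The linear solve and the generating-function closed form agree.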
Example 2: Trading Firm Risk (Gambler’s Ruin)
A trading firm has $10M in capital. Each trading day, they either make $1M (52% of days) or lose $1M (48% of days). Even with a statistical edge, there’s risk of ruin. The firm needs to know if their capital buffer is adequate. The management wants to know:
What’s the probability of bankruptcy before reaching the target of $20M?
Here the states are the capital in millions (0, 1, 2, …, 10, …, 20). The transitions are straightforward:
- From state \( k \): move to \( k+1 \) with probability \( p = 0.52 \) (win)
- From state \( k \): move to \( k-1 \) with probability \( q = 0.48 \) (lose)
We say that states 0 and 20 are absorbing: once the chain reaches bankruptcy or the target, it stays there.
The state transitions are represented visually below.
graph LR
0[0<br/>RUIN] -.-> 1
1((1)) -->|p| 2((2))
1 -->|q| 0
2 -->|q| 1
2 -->|p| midL((...))
midL -->|q| 2
midL -->|p| 10((10<br/>current))
10 -->|q| midL
10 -->|p| midR((...))
midR -->|q| 10
midR -->|p| 19((19))
19 -->|q| midR
19 -->|p| 20[20<br/>TARGET]
style 0 fill:#f99
style 20 fill:#9f9
linkStyle 0 stroke-width:0,opacity:0;
The sequence we’re trying to find is the probability of ruin starting from capital \( k \), which we will call \( R_k \).
The recurrence relation is also straightforward: $$R_k = p \cdot R_{k+1} + q \cdot R_{k-1}$$
With \( 0 \leq k \leq 20 \), \( R_0 = 1 \) (already ruined), \( R_{20} = 0 \) (safe, reached target).
Now let’s bring in the generating function. Extend the recurrence to define \( R_k \) for all \( k \geq 0 \) and set \[ F(z) = \sum_{k \geq 0} R_k z^k. \] Multiply the recurrence \( R_k = p R_{k+1} + q R_{k-1} \) by \( z^k \) and sum over \( k \geq 1 \). Shifting indices gives \[ F(z) - R_0 = \frac{p}{z}\left(F(z) - R_0 - R_1 z\right) + q z F(z). \] Multiplying through by \( z \) and collecting terms, \[ (z - p - q z^2)\, F(z) = R_0 (z - p) - p R_1 z. \] The denominator factors neatly (using \( p + q = 1 \)) as \[ z - p - q z^2 = -q(z-1)\left(z - \tfrac{p}{q}\right), \] so \( F(z) \) has simple poles at \( z=1 \) and \( z=p/q \). We can therefore write a partial fraction decomposition \[ F(z) = \frac{A}{z-1} + \frac{B}{z-p/q}. \] Expanding each term as a geometric series, \[ \frac{1}{z-1} = -\sum_{k\ge 0} z^k, \qquad \frac{1}{z-p/q} = -\frac{q}{p} \sum_{k\ge 0} \left(\frac{q}{p} z\right)^k, \] reveals that the coefficients themselves must be a linear combination of \( 1 \) and \( (q/p)^k \): \[ R_k = C_1 + C_2 \left(\frac{q}{p}\right)^k. \]
You can also use the so-called characteristic equation to reach this result, which is a kind of shortcut.¹
From \( R_0 = 1 \) and \( R_{20} = 0 \),
$$ \begin{cases} C_1 + C_2 = 1 \\ C_1 + C_2(q/p)^{20} = 0 \end{cases} $$
Solving this system gives the familiar closed form $$R_k = \frac{(q/p)^k - (q/p)^{20}}{1 - (q/p)^{20}}$$
So let’s compute for our firm. With \( p = 0.52 \), \( q = 0.48 \), starting from \( k = 10 \) we have \( q/p = 0.48/0.52 \approx 0.923 \) and \( R_{10} = \frac{0.923^{10} - 0.923^{20}}{1 - 0.923^{20}} \approx \frac{0.449 - 0.202}{1 - 0.202} = \frac{0.247}{0.798} \approx 0.310 \)
There is a 31% chance of bankruptcy before reaching $20M! Even with a 4% edge (52% vs 48%), nearly a 1 in 3 chance of ruin. The formula shows ruin probability depends exponentially on the ratio \( q/p \). If the edge shrinks to 51% vs 49%, ruin probability jumps to about 40%. Capital buffers need to be much larger than intuition suggests.
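The closed form is easy to sanity-check against a brute-force simulation. A short sketch (plain Python, fixed seed for reproducibility):

```python
import random

p, q, N, k0 = 0.52, 0.48, 20, 10

# Closed form from the generating-function / characteristic-root derivation
rho = q / p
ruin_closed = (rho**k0 - rho**N) / (1 - rho**N)
print(ruin_closed)  # ≈ 0.31

# Monte Carlo sanity check: simulate the capital random walk to absorption
random.seed(1)
trials, ruined = 50_000, 0
for _ in range(trials):
    k = k0
    while 0 < k < N:
        k += 1 if random.random() < p else -1
    ruined += (k == 0)
print(ruined / trials)  # close to the closed form
```

The simulated ruin frequency lands within sampling error of the formula.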
Bonus: How Long Does It Take?
The same generating-function machinery yields the closed form (for \(p \ne q\), with target \(N = 20\)): $$T_k = \frac{k}{q-p} - \frac{N}{q-p} \cdot \frac{1 - (q/p)^k}{1 - (q/p)^{N}}.$$ Plugging in the trading desk’s numbers gives $$T_{10} = \frac{10}{0.48 - 0.52} - \frac{20}{0.48 - 0.52} \cdot \frac{1 - (0.48/0.52)^{10}}{1 - (0.48/0.52)^{20}} \approx -250 + 500 \times 0.690 \approx 95 \text{ trading days}.$$
Even with the edge, it takes about 95 trading days—four to five months—on average before the firm either doubles its capital or goes bust.
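The duration formula can likewise be verified by solving the expected-duration equations \( T_k = 1 + p\,T_{k+1} + q\,T_{k-1} \) with \( T_0 = T_{20} = 0 \) as a linear system (a numpy sketch):

```python
import numpy as np

p, q, N = 0.52, 0.48, 20

# Expected days to absorption: T_k = 1 + p*T_{k+1} + q*T_{k-1}, T_0 = T_N = 0
A = np.zeros((N + 1, N + 1))
b = np.zeros(N + 1)
A[0, 0] = A[N, N] = 1.0          # absorbing boundaries
for k in range(1, N):
    A[k, k - 1], A[k, k], A[k, k + 1] = -q, 1.0, -p
    b[k] = 1.0
D = np.linalg.solve(A, b)

# Closed form for p != q
rho, k = q / p, 10
D10_closed = k / (q - p) - (N / (q - p)) * (1 - rho**k) / (1 - rho**N)
print(D[10], D10_closed)  # both ≈ 95 trading days
```

Both routes give the same expected duration.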
Example 3: Subscription Service Lifetime Value
You run a SaaS company. Customers pay $100/month. Each month, 10% churn (cancel), but 30% of churned customers eventually resubscribe. This “customer lifetime value” (LTV) determines how much you can spend on acquisition. If LTV equals $1,500, you can profitably spend up to $1,500 to acquire each customer. The question is:
What’s the expected total revenue from a new customer?
We have three states:
- Active (A): Paying subscriber, generates $100/month
- Churned (C): Cancelled, generates $0, but might return
- Gone (G): Permanently lost
And the following transitions per month:
- Active → Active: 0.9 (stay subscribed)
- Active → Churned: 0.1 (cancel)
- Churned → Active: 0.3 (resubscribe)
- Churned → Gone: 0.7 (permanently leave)
- Gone → Gone: 1.0 (absorbing state)
This is captured in the following state transition diagram.
graph LR
A[Active<br/>$100/mo] -->|0.9| A
A -->|0.1| C[Churned<br/>$0]
C -->|0.3| A
C -->|0.7| G[Gone<br/>$0]
G -->|1.0| G
style A fill:#9f9
style C fill:#ff9
style G fill:#ccc
The quantity we want is the expected lifetime revenue starting from the active state, which we can compute by tracking the probability that a customer is active in month \( n \) and multiplying by the monthly subscription price.
Let \(P_n\) and \(C_n\) be the probabilities of being Active or Churned in month \(n\), with \(P_0 = 1\) and \(C_0 = 0\). The transitions give the coupled recurrences \[P_{n+1} = 0.9\,P_n + 0.3\,C_n, \qquad C_{n+1} = 0.1\,P_n,\] which combine into a single second-order recurrence: \[P_{n+1} = 0.9\,P_n + 0.03\,P_{n-1} \qquad (n \ge 1).\] Introduce the ordinary generating function \(A(z) = \sum_{n \ge 0} P_n z^n\). Multiplying the recurrence by \(z^{n+1}\) and summing (using \(P_0 = 1\), \(P_1 = 0.9\)) gives \[A(z) = \frac{1}{1 - 0.9z - 0.03z^2}.\]
Let’s convert back to dollars. The expected number of active months is the sum of the coefficients, \[\sum_{n \ge 0} P_n = A(1) = \frac{1}{1 - 0.9 - 0.03} = \frac{1}{0.07} \approx 14.3.\] Multiplying by the $100 earned in each active month yields $$LTV = 100 \times \frac{1}{0.07} \approx 1{,}429.$$
Business Insight:
- LTV = $1,429 for an active customer
- A churned customer is still worth $429 (because they might return!)
- Win-back campaigns targeting churned users should spend up to $429
- Without resubscription (if churned meant gone forever), LTV would be only $1,000.
- The 30% resubscribe rate increases LTV by 43%
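These values also drop out of the fundamental matrix \( N = (I - Q)^{-1} \) of absorbing-chain theory, where \( Q \) is the transient part of the transition matrix—a compact cross-check in Python (using numpy):

```python
import numpy as np

# Transient part of the transition matrix (states: Active, Churned);
# the absorbing Gone state is dropped.
Q = np.array([[0.9, 0.1],
              [0.3, 0.0]])

# Fundamental matrix: entry (i, j) = expected visits to state j starting in i
Nmat = np.linalg.inv(np.eye(2) - Q)

monthly = 100.0
ltv_active = monthly * Nmat[0, 0]   # expected active months from Active
ltv_churned = monthly * Nmat[1, 0]  # expected active months from Churned
print(ltv_active, ltv_churned)      # ≈ 1428.57 and ≈ 428.57
```

The churned-customer figure is exactly the $429 win-back budget quoted above.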
Example 4: Inventory Management with Variable Demand
You run a retail store. Daily demand varies: some days are busy (100 units sold), other days are slow (40 units sold). Crucially, demand is autocorrelated: if today is busy, tomorrow is likely busy too. You start with 200 units in stock. Stockouts lose sales and frustrate customers. But holding too much inventory ties up capital. You need to balance these costs. The question is:
What’s the probability of running out of stock within the next week?
The Markov chain has 2 states for demand:
- High (H): 100 units/day demand
- Low (L): 40 units/day demand
Transitions per day:
- High → High: 0.7 (busy days cluster)
- High → Low: 0.3
- Low → High: 0.4
- Low → Low: 0.6 (slow days cluster)
State Diagram:
graph LR
H[High<br/>100 units/day] -->|0.7| H
H -->|0.3| L[Low<br/>40 units/day]
L -->|0.4| H
L -->|0.6| L
style H fill:#f99
style L fill:#9cf
Inventory equation: \( I_{n+1} = I_n - D_{n+1} \), where \( D_{n+1} \in \{40, 100\} \) according to the demand state on day \( n+1 \).
Step 1: Write the Evolution
Let \( p_H(i, n) \) = probability of having \( i \) units on day \( n \) in High demand state.
Each day, the new day’s demand state is drawn first, and then that day’s demand is subtracted from inventory:
- Tomorrow is High (prob 0.7 from High, 0.4 from Low) → sell 100 units
- Tomorrow is Low (prob 0.3 from High, 0.6 from Low) → sell 40 units
So to hold \( i \) units in the High state tomorrow, you must hold \( i + 100 \) units today, in either state.
This gives: $$p_H(i, n+1) = 0.7 \cdot p_H(i+100, n) + 0.4 \cdot p_L(i+100, n)$$ $$p_L(i, n+1) = 0.3 \cdot p_H(i+40, n) + 0.6 \cdot p_L(i+40, n)$$
Step 2: Generating Functions Simplify This
Define: \( G_H(s, n) = \sum_{i} p_H(i,n) s^i \)
The evolution equations become: $$G_H(s, n+1) = s^{-100} [0.7 G_H(s,n) + 0.4 G_L(s,n)]$$ $$G_L(s, n+1) = s^{-40} [0.3 G_H(s,n) + 0.6 G_L(s,n)]$$
(Multiplying by \( s^{-100} \) shifts all probabilities down by 100 inventory units.)
Starting from \(G_H(s,0)=s^{200}\) and \(G_L(s,0)=0\), the previous recursion pushes the generating functions forward one day at a time. Powers of \(s\) at or below zero correspond to inventories that have already hit zero or below. Let \([f]_{>0}\) denote the projection that discards those non-positive powers. The probability of still having stock after \(n\) days is then
$$\Pr(I_n>0) = \left([G_H(s,n)+G_L(s,n)]_{>0}\right) \big|_{s=1}$$
and the stockout probability is the complement. Carrying out the algebra (equivalently, at each step keeping only the positive powers of \(s\)) gives the exact figures below (rounded to three decimals).
| Days elapsed | \(Pr(I_n>0)\) | Stockout probability | Dominant remaining states |
|---|---|---|---|
| 1 | 1.000 | 0.000 | H: 100 units (0.70), L: 160 units (0.30) |
| 2 | 0.510 | 0.490 | H: 60 (0.12), L: 60 (0.21), 120 (0.18) |
| 3 | 0.342 | 0.658 | H: 20 (0.072), L: 20 (0.162), 80 (0.108) |
| 4 | 0.0648 | 0.9352 | L: 40 (0.0648) |
By the fourth day only the all-low sequence keeps the inventory positive; on day five every path has exhausted the original 200 units.
Business Insight:
- Within two days there is already a 49% chance of a stockout; by the fourth day the probability climbs to 93.5%.
- The only survival path over four days is four consecutive low-demand days. The Markov structure tells you the exact weight of that path (6.48%).
- Generating functions make it straightforward to produce these exact risk numbers, which are far sharper than a crude average-demand estimate \(200/74 \approx 2.7 \) days.
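The whole table can be reproduced in a few lines by propagating the generating functions as `{(state, inventory): probability}` dictionaries and applying the \([\,\cdot\,]_{>0}\) projection at each step (a sketch, assuming the transition-then-demand convention used above):

```python
# Regime transition probabilities and per-day demand by regime
P = {('H', 'H'): 0.7, ('H', 'L'): 0.3, ('L', 'H'): 0.4, ('L', 'L'): 0.6}
demand = {'H': 100, 'L': 40}

state = {('H', 200): 1.0}  # day 0: High regime, 200 units in stock
for day in range(1, 5):
    nxt = {}
    for (regime, inv), prob in state.items():
        for new_regime in ('H', 'L'):
            new_inv = inv - demand[new_regime]  # tomorrow's regime drives demand
            if new_inv > 0:                     # the [.]_{>0} projection
                key = (new_regime, new_inv)
                nxt[key] = nxt.get(key, 0.0) + prob * P[(regime, new_regime)]
    state = nxt
    print(day, round(sum(state.values()), 4), state)
```

The printed survival probabilities (1.000, 0.510, 0.342, 0.0648) match the table, and after day 4 the only surviving path is the all-low sequence with 40 units left.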
The Common Pattern
All four examples follow the same blueprint:
- Model the system: Identify states and transition probabilities
- Write a recurrence: Express the quantity we want (probability, expected value, etc.) recursively
- Define a generating function: Encode the sequence into a power series
- Solve algebraically: Convert the recurrence into a functional equation and solve
- Extract the answer: Read off coefficients or evaluate derivatives
This is exactly what we did with Fibonacci in the previous article! The difference is that now we’re solving real business problems:
- Call center staffing decisions
- Trading firm risk management
- Customer acquisition budgets
- Inventory reorder policies
Historical Note
Abraham de Moivre used these techniques in 1730 to solve the gambler’s ruin problem—decades before Markov was even born. Laplace systematized generating functions for probability in his 1812 Théorie Analytique des Probabilités.
The connection to Markov chains came later, but the mathematical machinery was ready and waiting. It’s a beautiful example of how mathematical tools developed in one context (pure combinatorics) find unexpected applications elsewhere (stochastic processes).
Conclusion
From Markov’s literary analysis of Pushkin to Google’s PageRank algorithm, Markov chains have evolved from an academic curiosity into an essential tool for modern business and technology.
What makes generating functions so powerful is that they transform the problem:
- From computing infinitely many matrix powers to inverting a single matrix
- From tracking probabilities at each time step to manipulating algebraic expressions
- From discrete combinatorics to continuous analysis
The next time you encounter a system where “the future depends only on the present”—whether it’s customer behavior, inventory dynamics, or financial risk—remember that generating functions provide the natural mathematical framework for understanding its evolution over time.
As Herbert Wilf wrote: “A generating function is a clothesline on which we hang up a sequence of numbers for display.” For Markov chains, that clothesline reveals the entire future trajectory in a single elegant expression.
Further Reading
- William Feller, An Introduction to Probability Theory and Its Applications, Vol. 1 (1968) - Classic text with extensive coverage of generating functions for probability
- Sheldon Ross, Introduction to Probability Models - Accessible treatment of Markov chains and generating functions
- Herbert S. Wilf, generatingfunctionology (1994) - Delightful introduction to generating functions, freely available online
- E. Seneta, “Markov and the Birth of Chain Dependence Theory” (1996) - Historical account of Markov’s work and the Nekrasov controversy
The characteristic-equation method assumes a linear recurrence with constant coefficients (\(a_0 R_k + a_1 R_{k-1} + \cdots + a_m R_{k-m} = 0\)). Trying a trial solution \(R_k = r^k\) leads to the polynomial \(a_0 r^m + a_1 r^{m-1} + \cdots + a_m = 0\); factoring it reveals the exponent bases. In the gambler’s-ruin recurrence the roots are \(1\) and \(q/p\), so \(R_k = C_1 + C_2 (q/p)^k\). Imposing \(R_0 = 1\) and \(R_{20} = 0\) gives the same constants as the generating-function derivation, confirming \(R_k = \frac{(q/p)^k - (q/p)^{20}}{1 - (q/p)^{20}}\). ↩︎