The Origin Story: An Academic Rivalry
The invention of Markov chains has a delightful backstory involving academic rivalry and literary analysis.
The Controversy
In early 20th-century Russia, mathematician Pavel Nekrasov claimed that the Law of Large Numbers (which says that averages converge to expected values as sample size grows) required events to be independent. This wasn’t just a mathematical claim—Nekrasov argued it had philosophical implications, suggesting that free will and independence were fundamental to probability theory.
Andrey Markov, a notoriously cantankerous mathematician at St. Petersburg University, found this argument absurd. He set out to prove Nekrasov wrong by showing that the Law of Large Numbers holds even when events are dependent on what came before.
The Literary Proof
In 1913, Markov did something remarkable: he analyzed Alexander Pushkin’s novel in verse Eugene Onegin (Евгений Онегин). He tallied roughly 20,000 letters from the text, classifying each as either a vowel or a consonant.
He showed that whether a letter is a vowel or consonant depends on the previous letter. Despite this dependence, the proportion of vowels still converges to a stable value. The Law of Large Numbers works perfectly well with dependent events.
This was the birth of Markov chains—sequences where the next state depends only on the current state, not the entire history. This property (often called “memorylessness” or the “Markov property”) became fundamental to probability theory.
The Personal Element
Markov wasn’t just proving a mathematical point—he was settling a personal and ideological score. He was politically liberal and opposed mystical interpretations of mathematics. Nekrasov’s claim that independence was somehow necessary for probabilistic laws to work smacked of metaphysical thinking that Markov despised.
By analyzing Eugene Onegin, Markov demonstrated that dependence is perfectly compatible with mathematical regularity. Probability theory doesn’t require philosophical assumptions about independence or free will. Even literary text follows mathematical patterns.
The irony is delicious. Markov used Russia’s greatest poet to demolish a philosophical argument about probability, and in doing so, created one of the most important tools in modern mathematics.
What Are Markov Processes?
A Markov process is a sequence of events where the probability of what happens next depends only on the current state, not on how you got there. Mathematically:
$$P(X_{n+1} = j \mid X_n = i, X_{n-1}, X_{n-2}, \ldots) = P(X_{n+1} = j \mid X_n = i)$$
The past is irrelevant—only the present matters.
Transition Probabilities and Matrices
In a Markov chain, you move between states with certain probabilities. If you’re in state \( i \), the probability of moving to state \( j \) is denoted \( p_{ij} \).
These transition probabilities can be arranged in a transition matrix \( P \) where entry \( (i,j) \) gives the probability of going from state \( i \) to state \( j \).
For example, a simple weather model with two states (Sunny, Rainy):
$$P = \begin{pmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{pmatrix}$$
This means:
- If today is Sunny: 80% chance tomorrow is Sunny, 20% chance Rainy
- If today is Rainy: 40% chance tomorrow is Sunny, 60% chance Rainy
Computing Future Probabilities
What’s the probability of being in state \( j \) after \( n \) steps, starting from state \( i \)? The answer is the \( (i,j) \) entry of \( P^n \) (the transition matrix raised to the \( n \)-th power). For example, what’s the probability it’s sunny in 3 days if it’s sunny today?
$$P^3 = P \times P \times P = \begin{pmatrix} 0.688 & 0.312 \\ 0.624 & 0.376 \end{pmatrix}$$
So there’s a 68.8% chance it’s sunny in 3 days given it’s sunny today. What about in 100 days? The problem is that computing \( P^{100} \) naively requires 99 matrix multiplications—tedious and error-prone. For larger systems with many states, this becomes computationally expensive.
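These small matrix powers are easy to check by hand or machine. A quick numerical verification (using numpy, not part of the original model description):

```python
import numpy as np

# Two-state weather chain: rows = today's state (Sunny, Rainy),
# columns = tomorrow's state.
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])

# Three-step transition probabilities
P3 = np.linalg.matrix_power(P, 3)
print(P3)
# Entry (Sunny, Sunny) is the probability it's sunny in 3 days given sun today.
```

Each row of \( P^3 \) still sums to 1, as any transition matrix must.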
Where Markov Processes Are Used
Today, Markov chains are everywhere in business, technology, and science:
- Web search: Google’s PageRank models surfing as a Markov chain where a page’s importance is the long-run visit probability
- Natural language: Voice assistants (Siri, Alexa) and predictive text use Hidden Markov Models
- Finance: Credit risk modeling, options pricing, regime-switching strategies
- Operations: Call center staffing, manufacturing capacity, network routing (queueing theory)
- Healthcare: Disease progression, hospital capacity planning, treatment modeling
- Supply chain: Inventory reordering, stockout prevention, disruption modeling
- AI: Reinforcement learning (AlphaGo, ChatGPT training) uses Markov Decision Processes
The unifying theme is systems where the future depends only on the present state, not the entire history.
Why Generating Functions Are Natural for Markov Chains
Here’s the deep connection that makes generating functions the obvious tool:
The Matrix Power Series Perspective
In a Markov chain, we need to compute \( P^n \) for various values of \( n \). Let’s write out this sequence:
- After 0 steps: \( P^0 = I \) (identity matrix—you’re still where you started)
- After 1 step: \( P^1 = P \)
- After 2 steps: \( P^2 \)
- After 3 steps: \( P^3 \)
- …
Now here’s the key insight: what if we encode this entire infinite sequence of matrices into a single object?
Let’s define the matrix generating function:
$$G(s) = I + Ps + P^2s^2 + P^3s^3 + \cdots = \sum_{n=0}^{\infty} P^n s^n$$
This is a matrix whose entries are generating functions!
The Beautiful Algebraic Structure
Notice something remarkable: $$G(s) = I + Ps + P^2s^2 + P^3s^3 + \cdots$$ $$G(s) = I + s(P + P^2s + P^3s^2 + \cdots)$$ $$G(s) = I + sP(I + Ps + P^2s^2 + \cdots)$$ $$G(s) = I + sP \cdot G(s)$$
Solving for \( G(s) \): $$G(s) - sPG(s) = I$$ $$(I - sP)G(s) = I$$ $$G(s) = (I - sP)^{-1}$$
This is stunning: the infinite sequence of matrix powers \( P, P^2, P^3, \ldots \) collapses into the simple closed form \( (I - sP)^{-1} \)!
This is the matrix analogue of the geometric series formula: $$\frac{1}{1-x} = 1 + x + x^2 + x^3 + \cdots$$
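This identity is easy to verify numerically: for a stochastic matrix and any \( |s| < 1 \), a truncated sum of \( P^n s^n \) should match \( (I - sP)^{-1} \). A minimal sketch, reusing the weather chain from above:

```python
import numpy as np

P = np.array([[0.8, 0.2],
              [0.4, 0.6]])
s = 0.5  # any |s| < 1 keeps the series convergent for a stochastic matrix

# Closed form: G(s) = (I - sP)^(-1)
G_closed = np.linalg.inv(np.eye(2) - s * P)

# Direct series: sum of P^n s^n, truncated once the terms are negligible
G_series = sum(np.linalg.matrix_power(P, n) * s**n for n in range(200))

print(np.max(np.abs(G_closed - G_series)))  # essentially zero
```

One matrix inversion replaces the entire infinite sum.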
Connection to Individual States
Since \( P \) is row-stochastic (entry \( (i,j) \) is the probability of going from \( i \) to \( j \)), probability distributions are row vectors. Let \( \mathbf{p}^{(n)} \) be the distribution at time \( n \). Then: $$\mathbf{p}^{(n)} = \mathbf{p}^{(0)} P^n$$
The generating function for the probability of being in state \( j \) is: $$g_j(s) = \sum_{n=0}^{\infty} p_j^{(n)} s^n$$
This is exactly the \( j \)-th component of the row vector: $$\mathbf{g}(s) = \sum_{n=0}^{\infty} \mathbf{p}^{(0)} P^n s^n = \mathbf{p}^{(0)} (I - sP)^{-1}$$
Why This Is Natural
Computing \( P^{100} \) directly requires multiplying the matrix by itself 99 times—computationally expensive and error-prone.
But with generating functions:
- Form \( I - sP \) (simple subtraction)
- Invert this matrix (one-time cost)
- Extract the coefficient of \( s^{100} \) (using techniques like partial fractions)
Even better, we can often get closed forms without computing individual coefficients at all!
The Natural Fit
So generating functions aren’t just a clever trick—they’re the natural algebraic object for studying Markov chains because:
- They encode the entire time evolution in one object
- They transform iterated matrix multiplication into a single matrix inversion
- They often yield closed forms that would be impossible to see otherwise
- They unify matrix powers, first passage times, and reward computations under one algebraic toolbox
This is why, when you see a Markov chain problem, reaching for generating functions should feel as natural as reaching for derivatives when you see rates of change.
The Challenge: Problems That Seem Intractable
While Markov chains are conceptually simple, answering basic questions about them can be extremely difficult. Computing probabilities over time requires raising transition matrices to high powers or summing over exponentially many paths.
Let’s see how generating functions—the same technique from the Fibonacci article—solve these problems elegantly.
Four Worked Examples
We’ll solve four real business problems step-by-step, showing the probability model, the state diagram, and how generating functions provide closed-form solutions. All four follow the same general method. First, model your states and transition probabilities, and define the quantity you want as a sequence over states or steps. Then find a recurrence relation for that sequence. The key step is to encode the sequence in a generating function and reinterpret the recurrence as an equation in that generating function. Solving this equation yields a closed form for the quantity you were after.
This can sound a bit abstract, but let’s explore this more concretely in our examples.
Example 1: Call Center Queue Management
You run a call center. Customers arrive randomly and wait in a queue. Currently there are 5 people waiting. If the queue hits 20, customer service degrades and people hang up. You need to decide whether to bring on more staff. Therefore, you need to know:
How long on average until the queue reaches a critical length of 20 people?
Let’s model our problem. First we define the Markov Process. The states are number of people in queue (0, 1, 2, …, 20, …). The transitions are a bit more complicated. Let’s assume that the customers arrive at rate \( \lambda \) = 10 per hour and that you have \( c = 3 \) operators, each serving at rate \( \mu = 4 \) per hour. Then the probability to go from state \( n \) to state \( n+1 \) (queue grows) is \( p = \frac{\lambda}{\lambda + c\mu} = \frac{10}{10+12} = \frac{10}{22} \), and the probability to go to state \( n-1 \) (queue shrinks) is \( q = \frac{c\mu}{\lambda + c\mu} = \frac{12}{22} \).
This is captured in the state transition diagram below: arrivals push the queue up with probability \( p \), service completions pull it down with probability \( q \).
graph LR
0((0)) -->|p| 1((1))
1 -->|q| 0
1 -->|p| 2((2))
2 -->|q| 1
2 -->|p| 3((3))
3 -->|q| 2
3 -->|p| 4((4))
4 -->|q| 3
4 -->|p| 5((5))
5 -->|q| 4
5 -->|p| 6((6))
6 -->|q| 5
6 -->|p| mid((...))
mid -->|q| 6
mid -->|p| 19((19))
19 -->|q| mid
19 -->|p| 20((20))
20 -->|q| 19
Now, the sequence we’re trying to find is the expected time (in hours) to reach state 20 starting from state \( n \), which we will call \( T_n \). In our initial question we start from \( n=5 \).
From state \( n \geq 1 \), the next event occurs after \( \frac{1}{\lambda + c\mu} \) hours on average; it is an arrival (move to \( n+1 \)) with probability \( p \) or a service completion (move to \( n-1 \)) with probability \( q \). Conditioning on that first event gives the recurrence:
$$T_n = \frac{1}{\lambda + c\mu} + p \cdot T_{n+1} + q \cdot T_{n-1}$$
With boundary conditions \( T_{20} = 0 \) (already there) and \( T_0 = \frac{1}{\lambda} + T_1 \) (from an empty queue, only arrivals occur).
Let’s now introduce our generating function and define $$G(s) = \sum_{n=0}^{19} T_n s^n$$
This is a finite sum since we stop when the queue is 20. Multiply the recurrence by \( s^n \) and sum over \( n \):
$$\sum_{n} T_n s^n = \frac{1}{\lambda + c\mu} \sum_{n} s^n + p \sum_{n} T_{n+1} s^n + q \sum_{n} T_{n-1} s^n$$
After algebraic manipulation (shifting indices, applying boundary conditions), we get a functional equation that can be solved.
$$T_n = \frac{1}{c\mu - \lambda} \left[ \left(\frac{q}{p}\right)^{n+1} \frac{(q/p)^{20-n} - 1}{q/p - 1} - (20-n) \right] \quad \text{when } \lambda < c\mu$$
For our numbers (\( \lambda = 10 \), \( c\mu = 12 \), so \( q/p = 1.2 \), \( n=5 \)): $$T_5 = \frac{1}{12-10} \left[ 1.2^{6} \cdot \frac{1.2^{15} - 1}{1.2 - 1} - 15 \right] \approx 100 \text{ hours}$$
So now we can answer our initial question. With three operators the queue is stable, and it takes about 100 hours on average before it reaches the critical length. If we reduce staff to 2 (\( c\mu = 8 < \lambda = 10 \)), the queue becomes unstable—it drifts upward at a net rate of \( \lambda - c\mu = 2 \) people per hour and hits 20 in roughly \( (20-5)/2 = 7.5 \) hours. The formula tells managers exactly the staffing threshold needed for stability: \( c\mu > \lambda \).
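The closed form can be cross-checked by solving the hitting-time equations directly as a linear system. A small Python sketch (using numpy; the model and rates are as above, with the queue truncated at 20):

```python
import numpy as np

lam, c, mu = 10.0, 3, 4.0            # arrivals/hr, operators, service rate/hr
Lam = lam + c * mu                   # total event rate
p, q = lam / Lam, (c * mu) / Lam     # queue grows / shrinks per event
N = 20                               # critical queue length

# Hitting times T_0..T_20 solve a tridiagonal linear system:
#   T_n = 1/Lam + p*T_{n+1} + q*T_{n-1}   for 1 <= n <= 19
#   T_0 = 1/lam + T_1 (empty queue: only arrivals),  T_20 = 0
A = np.zeros((N + 1, N + 1))
b = np.zeros(N + 1)
A[0, 0], A[0, 1], b[0] = 1.0, -1.0, 1.0 / lam
for n in range(1, N):
    A[n, n - 1], A[n, n], A[n, n + 1] = -q, 1.0, -p
    b[n] = 1.0 / Lam
A[N, N] = 1.0
T = np.linalg.solve(A, b)

# Closed form with r = q/p = c*mu/lam
r = q / p
T5_closed = (r**6 * (r**(N - 5) - 1) / (r - 1) - (N - 5)) / (c * mu - lam)
print(T[5], T5_closed)  # both ≈ 100 hours
```

The linear solve and the generating-function closed form agree.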
Example 2: Trading Firm Risk (Gambler’s Ruin)
A trading firm has $10M in capital. Each trading day, they either make $1M (52% of days) or lose $1M (48% of days). Even with a statistical edge, there’s risk of ruin. The firm needs to know if their capital buffer is adequate. The management wants to know:
What’s the probability of bankruptcy before reaching the target of $20M?
Here the states are the capital in millions (0, 1, 2, …, 10, …, 20). The transitions are straightforward:
- From state \( k \): move to \( k+1 \) with probability \( p = 0.52 \) (win)
- From state \( k \): move to \( k-1 \) with probability \( q = 0.48 \) (lose)
We say that states 0 and 20 are absorbing: once the chain reaches bankruptcy or the target, it stays there.
The state transitions are represented visually below.
graph LR
0[0<br/>RUIN] -.-> 1
1((1)) -->|p| 2((2))
1 -->|q| 0
2 -->|q| 1
2 -->|p| midL((...))
midL -->|q| 2
midL -->|p| 10((10<br/>current))
10 -->|q| midL
10 -->|p| midR((...))
midR -->|q| 10
midR -->|p| 19((19))
19 -->|q| midR
19 -->|p| 20[20<br/>TARGET]
style 0 fill:#f99
style 20 fill:#9f9
linkStyle 0 stroke-width:0,opacity:0;
The sequence we’re trying to find is the probability of ruin starting from capital \( k \), which we will call \( R_k \).
The recurrence relation is also straightforward: $$R_k = p \cdot R_{k+1} + q \cdot R_{k-1}$$
With \( 0 \leq k \leq 20 \), \( R_0 = 1 \) (already ruined), \( R_{20} = 0 \) (safe, reached target).
Now let’s bring in the generating function. Extend the recurrence to define \( R_k \) for all \( k \geq 0 \) and set \[ F(z) = \sum_{k \geq 0} R_k z^k. \] Multiply the recurrence \( R_k = p R_{k+1} + q R_{k-1} \) by \( z^k \) and sum over \( k \geq 1 \). Shifting indices gives \[ F(z) - R_0 = \frac{p}{z}\left(F(z) - R_0 - R_1 z\right) + q z F(z). \] Multiplying through by \( z \) and collecting terms, \[ (z - p - q z^2)\, F(z) = R_0 (z - p) - p R_1 z. \] The denominator factors neatly (using \( p + q = 1 \)) as \[ z - p - q z^2 = -q(z-1)\left(z - \tfrac{p}{q}\right), \] so \( F(z) \) has simple poles at \( z=1 \) and \( z=p/q \). We can therefore write a partial fraction decomposition \[ F(z) = \frac{A}{z-1} + \frac{B}{z-p/q}. \] Expanding each term as a geometric series, \[ \frac{1}{z-1} = -\sum_{k\ge 0} z^k, \qquad \frac{1}{z-p/q} = -\frac{q}{p} \sum_{k\ge 0} \left(\frac{q}{p} z\right)^k, \] reveals that the coefficients themselves must be a linear combination of \( 1 \) and \( (q/p)^k \): \[ R_k = C_1 + C_2 \left(\frac{q}{p}\right)^k. \]
You can also use the so-called characteristic equation to reach this result, which is a kind of shortcut.¹
From \( R_0 = 1 \) and \( R_{20} = 0 \),
$$ \begin{cases} C_1 + C_2 = 1 \\ C_1 + C_2(q/p)^{20} = 0 \end{cases} $$
Solving this system gives the familiar closed form $$R_k = \frac{(q/p)^k - (q/p)^{20}}{1 - (q/p)^{20}}$$
So let’s compute for our firm. With \( p = 0.52 \), \( q = 0.48 \), starting from \( k = 10 \) we have \( q/p = 0.48/0.52 \approx 0.923 \) and \( R_{10} = \frac{0.923^{10} - 0.923^{20}}{1 - 0.923^{20}} \approx \frac{0.449 - 0.202}{1 - 0.202} = \frac{0.247}{0.798} \approx 0.310 \)
There is a 31% chance of bankruptcy before reaching $20M! Even with a 4% edge (52% vs 48%), nearly a 1 in 3 chance of ruin. The formula shows ruin probability depends exponentially on the ratio \( q/p \). If the edge shrinks to 51% vs 49%, ruin probability jumps to about 40%. Capital buffers need to be much larger than intuition suggests.
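The closed form is easy to sanity-check against a brute-force simulation. A short sketch (plain Python, fixed seed for reproducibility):

```python
import random

p, q, N, k0 = 0.52, 0.48, 20, 10

# Closed form from the generating-function / characteristic-root derivation
rho = q / p
ruin_closed = (rho**k0 - rho**N) / (1 - rho**N)
print(ruin_closed)  # ≈ 0.31

# Monte Carlo sanity check: simulate the capital random walk to absorption
random.seed(1)
trials, ruined = 50_000, 0
for _ in range(trials):
    k = k0
    while 0 < k < N:
        k += 1 if random.random() < p else -1
    ruined += (k == 0)
print(ruined / trials)  # close to the closed form
```

The simulated ruin frequency lands within sampling error of the formula.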
Bonus: How Long Does It Take?
The same generating-function machinery yields the closed form (for \(p \ne q\), with target \(N = 20\)): $$T_k = \frac{k}{q-p} - \frac{N}{q-p} \cdot \frac{1 - (q/p)^k}{1 - (q/p)^{N}}.$$ Plugging in the trading desk’s numbers gives $$T_{10} = \frac{10}{0.48 - 0.52} - \frac{20}{0.48 - 0.52} \cdot \frac{1 - (0.48/0.52)^{10}}{1 - (0.48/0.52)^{20}} \approx -250 + 500 \times 0.690 \approx 95 \text{ trading days}.$$
Even with the edge, it takes about 95 trading days—four to five months—on average before the firm either doubles its capital or goes bust.
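The duration formula can likewise be verified by solving the expected-duration equations \( T_k = 1 + p\,T_{k+1} + q\,T_{k-1} \) with \( T_0 = T_{20} = 0 \) as a linear system (a numpy sketch):

```python
import numpy as np

p, q, N = 0.52, 0.48, 20

# Expected days to absorption: T_k = 1 + p*T_{k+1} + q*T_{k-1}, T_0 = T_N = 0
A = np.zeros((N + 1, N + 1))
b = np.zeros(N + 1)
A[0, 0] = A[N, N] = 1.0          # absorbing boundaries
for k in range(1, N):
    A[k, k - 1], A[k, k], A[k, k + 1] = -q, 1.0, -p
    b[k] = 1.0
D = np.linalg.solve(A, b)

# Closed form for p != q
rho, k = q / p, 10
D10_closed = k / (q - p) - (N / (q - p)) * (1 - rho**k) / (1 - rho**N)
print(D[10], D10_closed)  # both ≈ 95 trading days
```

Both routes give the same expected duration.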
Example 3: Subscription Service Lifetime Value
You run a SaaS company. Customers pay $100/month. Each month, 10% churn (cancel), but 30% of churned customers eventually resubscribe. This “customer lifetime value” (LTV) determines how much you can spend on acquisition. If LTV equals $1,500, you can profitably spend up to $1,500 to acquire each customer. The question is:
What’s the expected total revenue from a new customer?
We have three states:
- Active (A): Paying subscriber, generates $100/month
- Churned (C): Cancelled, generates $0, but might return
- Gone (G): Permanently lost
And the following transitions per month:
- Active → Active: 0.9 (stay subscribed)
- Active → Churned: 0.1 (cancel)
- Churned → Active: 0.3 (resubscribe)
- Churned → Gone: 0.7 (permanently leave)
- Gone → Gone: 1.0 (absorbing state)
This is captured in the following state transition diagram.
graph LR
A[Active<br/>$100/mo] -->|0.9| A
A -->|0.1| C[Churned<br/>$0]
C -->|0.3| A
C -->|0.7| G[Gone<br/>$0]
G -->|1.0| G
style A fill:#9f9
style C fill:#ff9
style G fill:#ccc
The quantity we want is the expected lifetime revenue starting from the active state, which we can compute by tracking the probability that a customer is active in month \( n \) and multiplying by the monthly subscription price.
Let \(P_n\) and \(C_n\) be the probabilities of being Active or Churned in month \(n\), with \(P_0 = 1\) and \(C_0 = 0\). The transitions give the coupled recurrences \[P_{n+1} = 0.9\,P_n + 0.3\,C_n, \qquad C_{n+1} = 0.1\,P_n,\] which combine into a single second-order recurrence: \[P_{n+1} = 0.9\,P_n + 0.03\,P_{n-1} \qquad (n \ge 1).\] Introduce the ordinary generating function \(A(z) = \sum_{n \ge 0} P_n z^n\). Multiplying the recurrence by \(z^{n+1}\) and summing (using \(P_0 = 1\), \(P_1 = 0.9\)) gives \[A(z) = \frac{1}{1 - 0.9z - 0.03z^2}.\]
Let’s convert back to dollars. The expected number of active months is the sum of the coefficients, \[\sum_{n \ge 0} P_n = A(1) = \frac{1}{1 - 0.9 - 0.03} = \frac{1}{0.07} \approx 14.3.\] Multiplying by the $100 earned in each active month yields $$LTV = 100 \times \frac{1}{0.07} \approx 1{,}429.$$
Business Insight:
- LTV = $1,429 for an active customer
- A churned customer is still worth $429 (because they might return!)
- Win-back campaigns targeting churned users should spend up to $429
- Without resubscription (if churned meant gone forever), LTV would be only $1,000.
- The 30% resubscribe rate increases LTV by 43%
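These values also drop out of the fundamental matrix \( N = (I - Q)^{-1} \) of absorbing-chain theory, where \( Q \) is the transient part of the transition matrix—a compact cross-check in Python (using numpy):

```python
import numpy as np

# Transient part of the transition matrix (states: Active, Churned);
# the absorbing Gone state is dropped.
Q = np.array([[0.9, 0.1],
              [0.3, 0.0]])

# Fundamental matrix: entry (i, j) = expected visits to state j starting in i
Nmat = np.linalg.inv(np.eye(2) - Q)

monthly = 100.0
ltv_active = monthly * Nmat[0, 0]   # expected active months from Active
ltv_churned = monthly * Nmat[1, 0]  # expected active months from Churned
print(ltv_active, ltv_churned)      # ≈ 1428.57 and ≈ 428.57
```

The churned-customer figure is exactly the $429 win-back budget quoted above.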
Example 4: Inventory Management with Variable Demand
You run a retail store. Daily demand varies: some days are busy (100 units sold), other days are slow (40 units sold). Crucially, demand is autocorrelated: if today is busy, tomorrow is likely busy too. You start with 200 units in stock. Stockouts lose sales and frustrate customers. But holding too much inventory ties up capital. You need to balance these costs. The question is:
What’s the probability of running out of stock within the next week?
The Markov chain has 2 states for demand:
- High (H): 100 units/day demand
- Low (L): 40 units/day demand
Transitions per day:
- High → High: 0.7 (busy days cluster)
- High → Low: 0.3
- Low → High: 0.4
- Low → Low: 0.6 (slow days cluster)
State Diagram:
graph LR
H[High<br/>100 units/day] -->|0.7| H
H -->|0.3| L[Low<br/>40 units/day]
L -->|0.4| H
L -->|0.6| L
style H fill:#f99
style L fill:#9cf
Inventory equation: \( I_{n+1} = I_n - D_{n+1} \), where \( D_{n+1} \in \{40, 100\} \) according to the demand state on day \( n+1 \).
Step 1: Write the Evolution
Let \( p_H(i, n) \) = probability of having \( i \) units on day \( n \) in High demand state.
Each day, the new day’s demand state is drawn first, and then that day’s demand is subtracted from inventory:
- Tomorrow is High (prob 0.7 from High, 0.4 from Low) → sell 100 units
- Tomorrow is Low (prob 0.3 from High, 0.6 from Low) → sell 40 units
So to hold \( i \) units in the High state tomorrow, you must hold \( i + 100 \) units today, in either state.
This gives: $$p_H(i, n+1) = 0.7 \cdot p_H(i+100, n) + 0.4 \cdot p_L(i+100, n)$$ $$p_L(i, n+1) = 0.3 \cdot p_H(i+40, n) + 0.6 \cdot p_L(i+40, n)$$
Step 2: Generating Functions Simplify This
Define: \( G_H(s, n) = \sum_{i} p_H(i,n) s^i \)
The evolution equations become: $$G_H(s, n+1) = s^{-100} [0.7 G_H(s,n) + 0.4 G_L(s,n)]$$ $$G_L(s, n+1) = s^{-40} [0.3 G_H(s,n) + 0.6 G_L(s,n)]$$
(Multiplying by \( s^{-100} \) shifts all probabilities down by 100 inventory units.)
Starting from \(G_H(s,0)=s^{200}\) and \(G_L(s,0)=0\), the previous recursion pushes the generating functions forward one day at a time. Powers of \(s\) at or below zero correspond to inventories that have already hit zero or below. Let \([f]_{>0}\) denote the projection that discards those non-positive powers. The probability of still having stock after \(n\) days is then
$$\Pr(I_n>0) = \left([G_H(s,n)+G_L(s,n)]_{>0}\right) \big|_{s=1}$$
and the stockout probability is the complement. Carrying out the algebra (equivalently, at each step keeping only the positive powers of \(s\)) gives the exact figures below (rounded to three decimals).
| Days elapsed | \(Pr(I_n>0)\) | Stockout probability | Dominant remaining states |
|---|---|---|---|
| 1 | 1.000 | 0.000 | H: 100 units (0.70), L: 160 units (0.30) |
| 2 | 0.510 | 0.490 | H: 60 (0.12), L: 60 (0.21), 120 (0.18) |
| 3 | 0.342 | 0.658 | H: 20 (0.072), L: 20 (0.162), 80 (0.108) |
| 4 | 0.0648 | 0.9352 | L: 40 (0.0648) |
By the fourth day only the all-low sequence keeps the inventory positive; on day five every path has exhausted the original 200 units.
Business Insight:
- Within two days there is already a 49% chance of a stockout; by the fourth day the probability climbs to 93.5%.
- The only survival path over four days is four consecutive low-demand days. The Markov structure tells you the exact weight of that path (6.48%).
- Generating functions make it straightforward to produce these exact risk numbers, which are far sharper than a crude average-demand estimate \(200/74 \approx 2.7 \) days.
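The whole table can be reproduced in a few lines by propagating the generating functions as `{(state, inventory): probability}` dictionaries and applying the \([\,\cdot\,]_{>0}\) projection at each step (a sketch, assuming the transition-then-demand convention used above):

```python
# Regime transition probabilities and per-day demand by regime
P = {('H', 'H'): 0.7, ('H', 'L'): 0.3, ('L', 'H'): 0.4, ('L', 'L'): 0.6}
demand = {'H': 100, 'L': 40}

state = {('H', 200): 1.0}  # day 0: High regime, 200 units in stock
for day in range(1, 5):
    nxt = {}
    for (regime, inv), prob in state.items():
        for new_regime in ('H', 'L'):
            new_inv = inv - demand[new_regime]  # tomorrow's regime drives demand
            if new_inv > 0:                     # the [.]_{>0} projection
                key = (new_regime, new_inv)
                nxt[key] = nxt.get(key, 0.0) + prob * P[(regime, new_regime)]
    state = nxt
    print(day, round(sum(state.values()), 4), state)
```

The printed survival probabilities (1.000, 0.510, 0.342, 0.0648) match the table, and after day 4 the only surviving path is the all-low sequence with 40 units left.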
The Common Pattern
All four examples follow the same blueprint:
- Model the system: Identify states and transition probabilities
- Write a recurrence: Express the quantity we want (probability, expected value, etc.) recursively
- Define a generating function: Encode the sequence into a power series
- Solve algebraically: Convert the recurrence into a functional equation and solve
- Extract the answer: Read off coefficients or evaluate derivatives
This is exactly what we did with Fibonacci in the previous article! The difference is that now we’re solving real business problems:
- Call center staffing decisions
- Trading firm risk management
- Customer acquisition budgets
- Inventory reorder policies
Historical Note
Abraham de Moivre used these techniques in 1730 to solve the gambler’s ruin problem—decades before Markov was even born. Laplace systematized generating functions for probability in his 1812 Théorie Analytique des Probabilités.
The connection to Markov chains came later, but the mathematical machinery was ready and waiting. It’s a beautiful example of how mathematical tools developed in one context (pure combinatorics) find unexpected applications elsewhere (stochastic processes).
Conclusion
From Markov’s literary analysis of Pushkin to Google’s PageRank algorithm, Markov chains have evolved from an academic curiosity into an essential tool for modern business and technology.
What makes generating functions so powerful is that they transform the problem:
- From computing infinitely many matrix powers to inverting a single matrix
- From tracking probabilities at each time step to manipulating algebraic expressions
- From discrete combinatorics to continuous analysis
The next time you encounter a system where “the future depends only on the present”—whether it’s customer behavior, inventory dynamics, or financial risk—remember that generating functions provide the natural mathematical framework for understanding its evolution over time.
As Herbert Wilf wrote: “A generating function is a clothesline on which we hang up a sequence of numbers for display.” For Markov chains, that clothesline reveals the entire future trajectory in a single elegant expression.
Further Reading
- William Feller, An Introduction to Probability Theory and Its Applications, Vol. 1 (1968) - Classic text with extensive coverage of generating functions for probability
- Sheldon Ross, Introduction to Probability Models - Accessible treatment of Markov chains and generating functions
- Herbert S. Wilf, generatingfunctionology (1994) - Delightful introduction to generating functions, freely available online
- E. Seneta, “Markov and the Birth of Chain Dependence Theory” (1996) - Historical account of Markov’s work and the Nekrasov controversy
The characteristic-equation method assumes a linear recurrence with constant coefficients (\(a_0 R_k + a_1 R_{k-1} + \cdots + a_m R_{k-m} = 0\)). Trying a trial solution \(R_k = r^k\) leads to the polynomial \(a_0 r^m + a_1 r^{m-1} + \cdots + a_m = 0\); factoring it reveals the exponent bases. In the gambler’s-ruin recurrence the roots are \(1\) and \(q/p\), so \(R_k = C_1 + C_2 (q/p)^k\). Imposing \(R_0 = 1\) and \(R_{20} = 0\) gives the same constants as the generating-function derivation, confirming \(R_k = \frac{(q/p)^k - (q/p)^{20}}{1 - (q/p)^{20}}\). ↩︎