The Puzzle
Years ago, I watched a Stargate episode where an advanced alien civilization was dying out. They had accumulated so many genetic defects that they could no longer reproduce successfully. The premise was scientifically questionable, but it planted a disturbing thought about whether humanity itself faces an inevitable extinction.
Consider the following reasoning. Suppose each woman has some probability p strictly less than 1 of giving birth. No matter how close p is to 1, there remains a non-zero chance that every woman in a generation fails to reproduce. Given enough time, this catastrophic event becomes not just possible but seemingly inevitable.
A familiar rule of thumb says that if an event has positive probability, it will eventually occur if you wait long enough. Doesn’t this mean extinction is certain?
This article explores whether this reasoning is sound or flawed. The answer involves a beautiful piece of mathematics called branching processes and reveals a surprising asymmetry. We can be certain of doom under some conditions, but we can never be certain of survival.
The Victorian Question of Aristocratic Extinction
To analyze this rigorously, we need to build a mathematical model of population dynamics. The framework we will use has a fascinating origin story that begins in Victorian England with a question about the persistence of noble family names.
The Galton-Watson Story
In 1873, Francis Galton, a polymath statistician and half-cousin of Charles Darwin, posed a problem in the Educational Times. He was concerned with a question that troubled the British aristocracy of his era. Why were so many distinguished family names disappearing despite the families’ wealth and prominence?
The mechanics seemed straightforward. A family name passes from father to son. If a man has no sons, only daughters, the family name dies with him in that patrilineal line. Even wealthy aristocratic families with many children might, by chance, produce only daughters in a generation, ending the surname forever.
Galton wanted to know the mathematical probability that a family name would eventually go extinct. This was not merely an academic exercise. The British peerage was genuinely concerned about the survival of their lineages. Galton formulated the question precisely. Given that each man produces sons according to some probability distribution, what is the chance that the male line eventually dies out?
Reverend Henry William Watson, a clergyman and mathematician at Berkswell, provided the first mathematical treatment of Galton’s problem. Together, they published “On the Probability of the Extinction of Families” in 1874. Their work established what we now call the Galton-Watson branching process.
Ironically, Galton himself had no children and his family name went extinct with him. The mathematical framework he created to study extinction has survived and flourished far beyond its original aristocratic context, finding applications in genetics, epidemiology, nuclear physics, and, as we shall see, the fundamental question of human survival.
The Mathematical Framework
In this model, time proceeds in discrete generations numbered 0, 1, 2, 3, and so on. Each individual in generation n independently produces offspring according to the same probability distribution. The number of offspring produced by any individual is a random variable with distribution given by probabilities \(p_0, p_1, p_2, \ldots\), where \(p_k\) represents the probability of having exactly k children. These probabilities must sum to 1, that is, \(\sum_{k=0}^{\infty} p_k = 1\). After producing offspring, the parent is no longer counted in the population, representing either death or retirement from reproduction.
The expected number of offspring per individual is given by $$\mu = \sum_{k=0}^{\infty} k \cdot p_k = E[\text{number of offspring}]$$
This parameter \(\mu\) will determine the fate of the population.
Population Size Evolution
Let \(Z_n\) denote the population size in generation n. Starting with \(Z_0 = 1\), representing a single ancestor, the population evolves as follows. The size \(Z_1\) equals the number of offspring of the initial individual. The size \(Z_2\) equals the sum of offspring produced by each individual in generation 1. This process continues recursively through all generations.
Extinction occurs when \(Z_n = 0\) for some n. Once the population hits zero, it stays zero forever since you cannot recover from extinction.
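To make the recursion concrete, here is a minimal simulation sketch in Python. The function name `simulate_lineage`, the safety caps `max_generations` and `max_population`, and the example offspring distribution are illustrative assumptions, not part of the model itself.

```python
import random

def simulate_lineage(offspring_probs, max_generations=50, max_population=10_000, rng=None):
    """Simulate Z_0, Z_1, ... for one lineage started by a single ancestor.

    offspring_probs[k] is p_k, the probability of exactly k children.
    The caps only stop the loop early; they are not part of the model.
    """
    rng = rng or random.Random()
    counts = range(len(offspring_probs))
    sizes = [1]  # Z_0 = 1
    for _ in range(max_generations):
        current = sizes[-1]
        if current == 0 or current > max_population:
            break  # extinct, or too large to keep simulating
        # Each individual draws its number of children independently.
        children = rng.choices(counts, weights=offspring_probs, k=current)
        sizes.append(sum(children))
    return sizes

# Hypothetical offspring distribution: p_0 = 0.25, p_1 = 0.25, p_2 = 0.5
print(simulate_lineage([0.25, 0.25, 0.5], rng=random.Random(1)))
```

Running it with different seeds shows both behaviours: some lineages hit zero within a few generations and stay there, others grow until the cap stops the loop.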
A Simple Example Using the Geometric Distribution
Let us make this concrete with the simplest non-trivial example. Consider the geometric distribution where $$p_k = (1-r) r^k \quad \text{for } k = 0, 1, 2, \ldots$$ with parameter r belonging to the interval \((0,1)\).
The expected number of offspring is $$\mu = \sum_{k=0}^{\infty} k(1-r)r^k = \frac{r}{1-r}.$$
This gives us three distinct regimes. If \(r < 1/2\), then \(\mu < 1\), a subcritical regime where the population tends to shrink. If \(r = 1/2\), then \(\mu = 1\), the critical case where the population stays roughly constant. If \(r > 1/2\), then \(\mu > 1\), a supercritical regime where the population tends to grow.
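For readers who want the intermediate step, the closed form for the mean follows from differentiating the geometric series \(\sum_{k \geq 0} r^k = 1/(1-r)\) term by term:

$$\sum_{k=0}^{\infty} k r^k = r \frac{d}{dr} \sum_{k=0}^{\infty} r^k = r \frac{d}{dr} \frac{1}{1-r} = \frac{r}{(1-r)^2}$$

$$\mu = (1-r) \cdot \frac{r}{(1-r)^2} = \frac{r}{1-r}$$

Setting \(\mu = 1\) gives \(r = 1 - r\), that is \(r = 1/2\), which is where the three regimes split.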
The question we wish to answer is what the probability of eventual extinction is in each of these cases.
Generating Functions as the Natural Tool
To answer this question, we introduce the probability generating function of the offspring distribution, defined as
$$G(s) = \sum_{k=0}^{\infty} p_k s^k = E[s^X]$$
where X represents the number of offspring.
If you have read the previous articles on Fibonacci numbers or Markov chains, you have already seen how generating functions transform recursive problems into algebraic equations. The same technique applies here, but with an even more elegant twist.
Why Generating Functions Work
The probability generating function encodes the entire offspring distribution in a single function. More importantly, it has remarkable compositional properties that perfectly match the recursive structure of branching processes.
The function \(G\) satisfies several key properties. First, \(G(1) = \sum_{k=0}^{\infty} p_k = 1\) since probabilities sum to 1. Second, the derivative at 1 gives the mean, that is, \(G'(1) = \sum_{k=0}^{\infty} k p_k = \mu\). Third, the second derivative \(G''(1) = \sum_{k=0}^{\infty} k(k-1) p_k = E[X(X-1)]\) provides variance information, since \(\operatorname{Var}(X) = G''(1) + G'(1) - G'(1)^2\). Fourth, \(G(s) \geq 0\) for all \(s\) in the interval \([0,1]\). Finally, \(G\) is convex on \([0,1]\), because it is a power series with non-negative coefficients and so \(G''(s) \geq 0\) there.
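These properties are easy to check numerically for a finite offspring distribution. The sketch below is only an illustration: the helper names `pgf`, `pgf_mean`, and `pgf_variance` and the distribution `[0.25, 0.25, 0.5]` are assumptions chosen for the example.

```python
def pgf(probs, s):
    """Evaluate G(s) = sum_k p_k s^k for a finite offspring distribution."""
    return sum(p * s**k for k, p in enumerate(probs))

def pgf_mean(probs):
    """G'(1) = sum_k k p_k, the mean number of offspring."""
    return sum(k * p for k, p in enumerate(probs))

def pgf_variance(probs):
    """Variance recovered as G''(1) + G'(1) - G'(1)^2."""
    second_factorial_moment = sum(k * (k - 1) * p for k, p in enumerate(probs))  # G''(1)
    mu = pgf_mean(probs)
    return second_factorial_moment + mu - mu**2

probs = [0.25, 0.25, 0.5]   # hypothetical p_0, p_1, p_2
print(pgf(probs, 1.0))      # G(1) -> 1.0
print(pgf_mean(probs))      # mu -> 1.25
print(pgf_variance(probs))  # -> 0.6875
```

The variance agrees with the direct computation \(E[X^2] - \mu^2 = 2.25 - 1.5625\).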
The Composition Property
The magic happens when we look at generation n. If \(G_n(s)\) denotes the probability generating function of the population size in generation n, then we have the remarkable relation
$$G_{n+1}(s) = G(G_n(s)).$$
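This follows from the independence structure of the model. Each individual in generation n produces offspring independently. The total population in generation \(n+1\) is therefore a sum of \(Z_n\) independent copies of the offspring count \(X\). The probability generating function of a sum of a fixed number \(m\) of independent copies is the product \(G(s)^m\), and averaging over the random number of terms \(Z_n\) gives \(G_{n+1}(s) = E[G(s)^{Z_n}] = G_n(G(s))\). Both \(G_n(G(s))\) and \(G(G_n(s))\) are the \((n+1)\)-fold composition of \(G\) with itself, so the two forms agree.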
Starting from \(G_0(s) = s\), representing one ancestor, we get \(G_1(s) = G(s)\), then \(G_2(s) = G(G(s))\), then \(G_3(s) = G(G(G(s)))\), and generally \(G_n(s)\) is the n-fold composition of \(G\) with itself.
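The composition rule can be checked by brute force for a small case. The sketch below computes the exact distribution of \(Z_2\) by convolution and compares its generating function with \(G(G(s))\). The helper names and the offspring distribution are illustrative assumptions.

```python
def pgf(probs, s):
    """Evaluate sum_k probs[k] * s^k."""
    return sum(p * s**k for k, p in enumerate(probs))

def convolve(a, b):
    """Distribution of the sum of two independent counts with distributions a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def next_generation(current, offspring):
    """Exact distribution of Z_{n+1}, given the distribution of Z_n."""
    result = []
    m_fold = [1.0]  # total offspring of m = 0 parents: certainly zero
    for prob_m in current:  # prob_m = Pr(Z_n = m)
        result.extend([0.0] * (len(m_fold) - len(result)))
        for k, x in enumerate(m_fold):
            result[k] += prob_m * x
        m_fold = convolve(m_fold, offspring)  # now describes m + 1 parents
    return result

offspring = [0.25, 0.25, 0.5]               # hypothetical p_0, p_1, p_2
z2 = next_generation(offspring, offspring)  # Z_1 has the offspring distribution
s = 0.3
print(pgf(z2, s), pgf(offspring, pgf(offspring, s)))  # the two values should agree
```

The direct convolution grows quickly with n, which is one reason the generating function route is preferred.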
The Extinction Probability
Let \(q\) denote the probability of eventual extinction, defined as $$q = \Pr(\text{population eventually dies out}) = \Pr(\exists n \text{ such that } Z_n = 0)$$
We claim that the extinction probability is \(q = \lim_{n \to \infty} G_n(0)\).
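The reason is that \(G_n(0) = \Pr(Z_n = 0)\), the probability that generation n has size 0. Because zero is absorbing, this is exactly the probability that extinction has occurred by generation n. These events are nested and increasing in n, so as \(n\) tends to infinity their probabilities converge to the probability that extinction ever occurs.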
The Fixed Point Equation
Here comes the key insight. Since \(G_{n+1}(0) = G(G_n(0))\) and \(G\) is continuous on \([0,1]\), taking limits as \(n \to \infty\) gives
$$q = \lim_{n \to \infty} G_{n+1}(0) = G\left(\lim_{n \to \infty} G_n(0)\right) = G(q)$$
Therefore the extinction probability is a fixed point of the generating function, satisfying the equation
$$q = G(q)$$
This is the same pattern we encountered when analyzing Markov chains, where the generating function converted infinite sums into tractable algebraic equations. Here, an infinite sequence of generations collapses into a single fixed point equation.
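Because \(q\) is the limit of \(G_n(0)\), it can be approximated by iterating \(q \leftarrow G(q)\) starting from 0. The following is a minimal sketch of that idea. The function name `extinction_probability`, the tolerance, the iteration cap, and both example distributions are assumptions made for illustration.

```python
def pgf(probs, s):
    """Evaluate G(s) = sum_k p_k s^k for a finite offspring distribution."""
    return sum(p * s**k for k, p in enumerate(probs))

def extinction_probability(probs, tol=1e-12, max_iter=100_000):
    """Approximate q by iterating q <- G(q) from q = 0, i.e. following G_n(0)."""
    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(probs, q)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q  # tolerance not reached; convergence is slow when mu is close to 1

print(extinction_probability([0.25, 0.25, 0.5]))  # hypothetical case with mu = 1.25
print(extinction_probability([0.5, 0.3, 0.2]))    # hypothetical case with mu = 0.7
```

The first call should return a value close to 0.5 and the second a value close to 1. Starting from 0 matters: the iterates increase toward the first fixed point they meet, and never jump past it.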
The Extinction Theorem
Now comes the beautiful result that resolves our paradox.
Theorem (Galton-Watson Extinction). Exclude the degenerate case \(p_1 = 1\), in which every individual has exactly one child and the population stays at size 1 forever. Then the extinction probability \(q\) satisfies three properties. First, \(q\) is the smallest non-negative solution to the equation \(q = G(q)\). Second, if \(\mu \leq 1\), then \(q = 1\), meaning extinction is certain. Third, if \(\mu > 1\), then \(q < 1\), meaning survival is possible.
Proof
We prove this theorem in two parts, first establishing that \(q\) is the smallest solution, then proving the critical dichotomy.
Part 1. The extinction probability is the smallest fixed point.
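We know that \(q = G(q)\). Let \(q^*\) be any non-negative solution of \(s = G(s)\). We show that \(q \leq q^*\). If \(q^* > 1\) there is nothing to prove, since \(q \leq 1\), so assume \(0 \leq q^* \leq 1\).
Starting from \(G_0(s) = s\), we have \(G_1(0) = G(0) = p_0\). The function \(G\) is increasing on \([0,1]\) since \(G'(s) = \sum_{k} k p_k s^{k-1} \geq 0\). Because \(0 \leq q^*\), this gives \(G_1(0) = G(0) \leq G(q^*) = q^*\).
By induction, if \(G_n(0) \leq q^*\), then \(G_{n+1}(0) = G(G_n(0)) \leq G(q^*) = q^*\). The same monotonicity shows that the sequence \(G_n(0)\) is increasing. Since it converges to \(q\), we conclude \(q \leq q^*\), so any other non-negative fixed point must be at least as large as \(q\).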
Part 2. The critical dichotomy.
Consider the equation \(q = G(q)\) graphically. We are looking for intersections of the curve \(y = G(s)\) with the line \(y = s\).
The function \(G\) has several key properties. We have \(G(0) = p_0 \geq 0\) and \(G(1) = 1\). The function is strictly increasing, since \(G'(s) > 0\) for \(s\) in the interval \((0,1)\) whenever \(p_0 < 1\). The function is convex since \(G''(s) \geq 0\) for \(s\) in \([0,1]\). Finally, the derivative at 1 equals the mean, \(G'(1) = \mu\).
Case 1. When \(\mu \leq 1\).
Since \(G\) is convex and \(G'(1) = \mu \leq 1\), the curve \(y = G(s)\) lies on or above the line \(y = s\) for all \(s\) in the interval \([0,1)\).
To see this, suppose for contradiction that there exists some \(s_0\) in the interval \([0,1)\) where \(G(s_0) < s_0\). Then by the mean value theorem, there would exist some \(c\) in the interval \((s_0, 1)\) where $$G'(c) = \frac{G(1) - G(s_0)}{1 - s_0} > \frac{1 - s_0}{1 - s_0} = 1$$
But \(G\) is convex, so \(G'\) is non-decreasing and \(G'(c) \leq G'(1) = \mu \leq 1\), giving us a contradiction.
Therefore \(G(s) \geq s\) for all \(s\) in \([0,1)\). Equality at some \(s_0 < 1\) would force the convex function \(G\) to coincide with the diagonal on \([s_0, 1]\), and hence everywhere, which happens only in the degenerate case \(p_1 = 1\). Excluding that case, \(G(s) > s\) on \([0,1)\), so \(q = 1\) is the only solution to \(q = G(q)\) in \([0,1]\).
Case 2. When \(\mu > 1\).
Now \(G'(1) = \mu > 1\). Since \(G\) is convex with \(G(0) = p_0 > 0\) (assuming extinction is possible in one generation) and \(G(1) = 1\), there must be a point where the curve crosses the diagonal.
More precisely, we have \(G(0) - 0 = p_0 > 0\), so the curve starts above the diagonal. The slope of \(G\) at \(s = 1\) is \(\mu > 1\), which is steeper than the diagonal, so \(G(s) < s\) for \(s\) slightly below 1. By the intermediate value theorem, \(G\) crosses from above the diagonal to below it at some point \(q^*\) in the interval \((0,1)\), and convexity guarantees that this crossing is unique.
This \(q^*\) is the smallest fixed point, so \(q = q^* < 1\).
There is one special case worth noting. If \(p_0 = 0\), then \(G(0) = 0\), and we have \(q = 0\), meaning extinction is impossible since everyone has at least one child.
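The graphical argument suggests a second way to locate the crossing numerically: bisect on the sign of \(G(s) - s\) between 0 and a point just below 1. This is only a sketch under the stated assumption \(\mu > 1\). The function name `crossing_by_bisection`, the offset `eps`, and the example distributions are hypothetical choices, and the routine will wrongly refuse cases where \(\mu\) exceeds 1 by less than roughly `eps`.

```python
def pgf(probs, s):
    """Evaluate G(s) = sum_k p_k s^k for a finite offspring distribution."""
    return sum(p * s**k for k, p in enumerate(probs))

def crossing_by_bisection(probs, tol=1e-12, eps=1e-6):
    """Locate the crossing of y = G(s) with y = s in (0, 1), assuming mu > 1."""
    def gap(s):
        return pgf(probs, s) - s  # positive where the curve is above the diagonal

    lo, hi = 0.0, 1.0 - eps
    if gap(lo) <= 0.0:
        return 0.0  # p_0 = 0, so the curve starts on the diagonal and q = 0
    if gap(hi) >= 0.0:
        raise ValueError("G(s) - s is not negative just below 1; is mu > 1?")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid   # still above the diagonal: crossing lies to the right
        else:
            hi = mid   # at or below the diagonal: crossing lies to the left
    return 0.5 * (lo + hi)

print(crossing_by_bisection([0.25, 0.25, 0.5]))  # hypothetical case with mu = 1.25
```

For this example the fixed point equation is \(2q^2 - 3q + 1 = 0\), with roots \(1/2\) and \(1\), so the result should be close to 0.5.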
Geometric Example Revisited
For the geometric distribution with \(p_k = (1-r)r^k\), we have $$G(s) = \sum_{k=0}^{\infty} (1-r)r^k s^k = \frac{1-r}{1-rs}$$
This is a geometric series, the same type we encountered when deriving the Fibonacci generating function. The fixed point equation \(q = G(q)\) becomes $$q = \frac{1-r}{1-rq}$$ which simplifies to $$q(1-rq) = 1-r$$ $$q - rq^2 = 1-r$$ $$rq^2 - q + (1-r) = 0$$
Using the quadratic formula, we obtain $$q = \frac{1 \pm \sqrt{1 - 4r(1-r)}}{2r} = \frac{1 \pm \sqrt{(2r-1)^2}}{2r} = \frac{1 \pm |2r-1|}{2r}$$
When \(r < 1/2\), we have \(2r - 1 < 0\), so \(|2r-1| = 1-2r\). This gives two solutions, $$q = \frac{1 + (1-2r)}{2r} = \frac{2-2r}{2r} = \frac{1-r}{r} > 1 \quad \text{or} \quad q = \frac{1 - (1-2r)}{2r} = \frac{2r}{2r} = 1$$
The smallest solution in the interval \([0,1]\) is \(q = 1\), meaning certain extinction. This is consistent with \(\mu = r/(1-r) < 1\).
When \(r > 1/2\), we have \(2r - 1 > 0\), so \(|2r-1| = 2r-1\). This gives two solutions, $$q = \frac{1 + (2r-1)}{2r} = \frac{2r}{2r} = 1 \quad \text{or} \quad q = \frac{1 - (2r-1)}{2r} = \frac{2-2r}{2r} = \frac{1-r}{r}$$
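The smallest solution in \([0,1]\) is \(q = (1-r)/r < 1\), meaning survival is possible. This is consistent with \(\mu = r/(1-r) > 1\). In the critical case \(r = 1/2\), the two roots coincide at \(q = 1\), so extinction is again certain.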
For example, if \(r = 0.6\), then \(\mu = 1.5\) and \(q = 0.4/0.6 = 2/3\). There is a one-third chance the population survives forever.
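The value \(2/3\) can be sanity-checked by simulation. The sketch below is a rough Monte Carlo estimate, not a proof. The function names, the population cap of 200 used as a stand-in for "survives forever", the trial count, and the seed are all assumptions made for illustration.

```python
import random

def geometric_offspring(r, rng):
    """Draw k with probability (1 - r) * r**k."""
    k = 0
    while rng.random() < r:
        k += 1
    return k

def lineage_dies_out(r, rng, cap=200, max_generations=1000):
    """Return True if a lineage started by one ancestor reaches size zero.

    A population above `cap` is treated as surviving. This cutoff is an
    approximation: from 200 individuals the chance of dying out is q**200.
    """
    population = 1
    for _ in range(max_generations):
        if population == 0:
            return True
        if population > cap:
            return False
        population = sum(geometric_offspring(r, rng) for _ in range(population))
    return population == 0

def estimate_extinction(r, trials=4000, seed=0):
    """Fraction of simulated lineages that die out."""
    rng = random.Random(seed)
    deaths = sum(lineage_dies_out(r, rng) for _ in range(trials))
    return deaths / trials

print(estimate_extinction(0.6))  # should land near (1 - r) / r = 2/3
```

With a few thousand trials the estimate typically falls within a couple of percentage points of the exact value.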
Resolution of the Paradox
Now we can answer the original question about whether humanity is doomed.
The Flaw in the Intuition
The original argument claimed that if each woman has probability p strictly less than 1 of reproducing, then extinction must eventually occur. The error lies in confusing two distinct quantities. The first quantity is p, the probability that a woman has at least one child. The second quantity is \(\mu\), the expected number of children per woman. These are not the same.
Consider a concrete example. Suppose 20% of women have 0 children, so \(p_0 = 0.2\). Suppose 30% have 1 child, so \(p_1 = 0.3\). Suppose 30% have 2 children, so \(p_2 = 0.3\). Suppose 20% have 3 children, so \(p_3 = 0.2\).
Then we have \(p = 1 - p_0 = 0.8 < 1\), meaning any individual woman might not reproduce. However, the expected number of children is $$\mu = 0 \cdot 0.2 + 1 \cdot 0.3 + 2 \cdot 0.3 + 3 \cdot 0.2 = 1.5 > 1$$
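Despite \(p < 1\), we have \(\mu > 1\), so extinction is not certain. The chance that an entire generation of size \(Z_n\) leaves no children is \(p_0^{Z_n}\), which shrinks geometrically as the population grows. The per-generation risk is therefore not a fixed positive number, and the “wait long enough” argument fails. The extinction probability is the smallest non-negative solution to \(q = G(q)\), which will be less than 1.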
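For this particular example the fixed point equation can be solved by hand, which gives a concrete number to attach to the claim. With the four probabilities listed above, the generating function is

$$G(s) = 0.2 + 0.3s + 0.3s^2 + 0.2s^3$$

Setting \(q = G(q)\) and multiplying through by 10 gives a cubic that factors, because \(q = 1\) is always a root:

$$2q^3 + 3q^2 - 7q + 2 = (q - 1)(2q^2 + 5q - 2) = 0$$

The quadratic factor has roots \(q = (-5 \pm \sqrt{41})/4\), and the only one in \([0,1)\) is

$$q = \frac{\sqrt{41} - 5}{4} \approx 0.351$$

So a lineage started by one individual with this offspring distribution dies out with probability about 0.35 and survives forever with probability about 0.65.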
What the Model Reveals
The Galton-Watson framework rests on a critical assumption that \(\mu\) remains constant over time. This assumption is simultaneously the model’s greatest limitation and its deepest insight.
Real human populations violate this assumption entirely. When the population shrinks, reproduction rates often increase due to social pressure and more resources per capita. When the population grows too large, rates decrease due to resource constraints and societal changes. Cultural and technological innovations continuously change the effective value of \(\mu\). Humanity actively adapts to keep \(\mu\) above 1.
Yet the constant-\(\mu\) assumption reveals something profound about the mathematics of survival in random systems. The theorem tells us there exists a fundamental asymmetry in stochastic processes.
When \(\mu \leq 1\), extinction occurs with probability 1. We can be certain doom awaits. This is “almost sure” convergence, the strongest form of probabilistic certainty. The pessimist facing a system with \(\mu \leq 1\) can declare with mathematical confidence that failure is inevitable.
When \(\mu > 1\), survival occurs with probability \(1 - q > 0\). As long as \(p_0 > 0\), we also have \(q > 0\), so we can never be certain of survival. We can only say that there is hope. Even with \(\mu = 2\), meaning each individual leaves two offspring on average, there remains some positive probability of extinction. The optimist, even when facing favorable conditions, can never claim certainty, only possibility.
This asymmetry between certain doom and uncertain hope is not an artifact of the model. It is intrinsic to randomness itself. A deterministic system with growth rate greater than 1 survives forever. A stochastic system with \(\mu > 1\) might survive, but there is always a chance of extinction. This fragility cannot be eliminated. It is built into the mathematics of randomness.
The model also reveals why maintenance is insufficient for survival. A system with \(\mu = 1\), meaning it exactly replaces itself on average, is certain to fail eventually. The randomness inherent in the process ensures that fluctuations will eventually drive the population to extinction. To have any chance of long-term survival, a stochastic system needs \(\mu > 1\), some built-in growth or slack to buffer against random downturns.
But \(\mu\) cannot remain constant forever in a finite world. Sustainable systems must maintain \(\mu > 1\) when the population is low to avoid extinction, and they must adapt when approaching capacity to avoid collapse from overuse. This requires continuous active intervention, adjusting strategies based on the current state. You cannot coast. Maintaining survival requires ongoing effort to keep \(\mu > 1\). The best we can do is keep \(\mu\) as large as feasible to reduce \(q\), adapt \(\mu\) to changing conditions so it does not drop to or below 1, and diversify our approach since the multi-population version of the model has better survival chances.
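One way to make the diversification point concrete, under the added assumption (not spelled out above) that \(k\) lineages evolve independently and share the same extinction probability \(q\): total extinction requires every lineage to die out, so

$$\Pr(\text{all } k \text{ lineages die out}) = q^k$$

With \(q = 2/3\) from the geometric example, five independent lineages give \(q^5 = 32/243 \approx 0.13\) and ten give \(q^{10} \approx 0.017\). The same formula describes a single population started from \(Z_0 = k\) founders instead of one.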
Without adaptation, even seemingly healthy populations with \(\mu\) slightly above 1 face significant extinction risk over long timescales. The model therefore tells us that if \(\mu\) remains constant and \(\mu \leq 1\), extinction is certain. But in reality, \(\mu\) is not constant.
Conclusion
So is humanity doomed? The answer is neither a simple yes nor no. The mathematics tells us that doom is certain only if we allow \(\mu\) to remain at or below 1. Survival becomes possible when \(\mu > 1\), though never guaranteed.
The profound insight is not about finding certainty. Living in a random world means that no amount of preparation can guarantee survival forever. But it also means that as long as \(\mu > 1\), a chance of survival always remains, however small.
The way out of doom is to maintain the conditions for hope. This requires keeping \(\mu > 1\), adapting when necessary, and accepting that survival demands continuous effort rather than passive faith. The choice between certain extinction and uncertain survival is ours to make through our actions.
As Galton discovered while studying aristocratic surnames, the mathematics of branching processes applies equally to family names and to species. His own name went extinct, but the framework he created survives. The lesson is clear. Optimism can never be certain, but pessimism, once justified by the mathematics, becomes inevitable unless we change the system.
Further Reading
Harris, T. E., The Theory of Branching Processes (1963) provides a classic rigorous treatment.
Athreya, K. B. & Ney, P. E., Branching Processes (1972) offers a comprehensive modern text.
Galton, F. & Watson, H. W., “On the Probability of the Extinction of Families” (1874) is the original paper.
Wilf, H. S., generatingfunctionology (Free PDF) contains an excellent chapter on probability generating functions.