Introduction
If you look at all the numbers from to , you’d notice that of them contain the digit - specifically, only the number contains that digit.
Well, if you look at all the numbers from to , how many contain the digit ? Here we need to work a bit. Obviously, all numbers with a in the ones place contain , so that’s numbers already:
Also, all numbers that have in their tens place contain the digit, which gives us more numbers, but we already counted , so we only have numbers to account for here. This gives us a total of numbers that contain , or of the numbers.
What’s interesting is that the higher up you go, the higher the percentage becomes, until you get to the infinite case in which you get that of numbers contain .
This is the reasoning I saw in pop media, explained by “science people”. And while it may look like that, we know in mathematics not to trust what patterns or sequences look like at first glance, but to rigorously show that the infinite case behaves as claimed.
You might have an intuition that this statement is correct, but my goal with this article is to prove (or disprove) it formally, while guiding you (the reader) along.
How Do We Formally Prove This?
From here on out we’ll use tools from Calculus I. If you haven’t studied limits, you can skip to the conclusion for the answer.
To formally prove a statement, we need to formally state it. Here’s my interpretation of the statement, as inferred from the videos (and my “proof”) above.
Theorem:Let be a sequence such that is the number of integers containing as a digit in the set
Then:
Hopefully you’re convinced this captures the intended claim. If you think there’s a better interpretation, feel free to reach out - I’m happy to revise.
The Proof
Let be the sequence counting how many numbers in contain the digit . Let .
We’ll show that converges; then all of its subsequences converge to the same limit. Thus, we can look at the subsequence and evaluate that limit.
To show that converges, we’ll show it’s bounded above and monotone increasing.
is Bounded by
Note:This might feel obvious, but we’ll include it for completeness.
Assume, for contradiction, that is not bounded above by . Then there exists with :
meaning there are more numbers containing the digit in than there are numbers in the set (since ). Contradiction. Therefore, .
is Monotone Increasing
Let . Since , the number of integers containing in (i.e., ) is greater than or equal to that in . If not, some element of would be missing from , contradicting .
Converges to
We know is monotone increasing and bounded above, so it converges. To find its limit, consider the subsequence . Define
To count how many numbers from to contain the digit , consider the set of -digit strings
where is the digit in the place. Clearly .
Note:In our setup, the all-zeros string is treated as the representative for and plays the same role here.
When we see “at least once” in combinatorics, it’s natural to use complements. The number of strings with no at all is . Therefore, the number of strings (hence numbers) containing at least one is .
Check against our earlier examples:
- For to (): , matching the example.
- For to (): , also matching.
Thus , and
Hence , and therefore .
The Conclusion
From the above, the limit is . That is, as we consider larger and larger initial segments ( to , etc.), the proportion of numbers containing the digit tends to .
This phenomenon doesn’t rely on specifically; the same argument works for any fixed digit. So, in that asymptotic sense, “all numbers contain that digit”.
But of course, not every individual number contains every digit - for example, doesn’t contain . That’s the paradoxical charm of infinities: the set of counterexamples is infinite, yet vanishingly rare in the limiting proportion. If you pick a large number uniformly at random from a vast range, it’s very unlikely to be missing a given digit.
Extra
To sanity-check the result, here’s a small Python script sampler.py you can download and run here:
from random import randint
n = 100
k = 10_000
count = sum(len(set(str(randint(0, 10**n - 1)))) < 10 for _ in range(k))
print(f"\t{count} / {k} => {count / k * 100}%")
Running it ( refers to the range and to the number of samples) yields:
❯ python3 sampler.py
9 / 10000 => 0.09%
❯ python3 sampler.py
0 / 10000 => 0.0%
❯ python3 sampler.py
2 / 10000 => 0.02%
Very small percentages, which gives us some numerical validations.