#### Supplement to Chance versus Randomness

## C. Proofs of Selected Theorems

*Proof of Theorem
1.*
This follows from Church’s observation by a result of Doob (1936) or a
similar result of Wald (1938). Doob’s result is that the measure of the
set of infinite sequences which can result from the application of a
admissible place selection \(\phi\) on a member of some subset
\(S\) of the Cantor space is the same as the measure of \(S\)
(i.e., \(\mu(\phi^{-1}(S)) = \mu(S))\) (see also Feller 1950: 203). Since the
measure of the set of Borel *abnormal* sequences \(A\) is
zero, Doob’s theorem shows that the measure of
\(\phi^{-1}(A)\) is zero—i.e., that
the set of sequences such that an admissible place selection gives a
biased subsequence has measure zero. Since for von Mises only a
non-random sequence has biased admissible subsequences, the set of
non-random sequences is the set of all sequences such that for some
place selection, the result is a biased subsequence, so is the union of
\(\phi^{-1}(A)\) for all \(\phi\).
If there are only countably many \(\phi\) (as there are on
Church’s conception of place selections), since the measure of the
union of countably many sets is less than or equal to the sum of their
measures, the measure of the non-random sequences is zero, and the von
Mises-random sequences therefore exist and form a measure one subset of
all infinite binary sequences. See van Lambalgen (1987a:
§2.5.2).

*Proof of Corollary
1.*
(This proof is due to Chris Porter.) To begin, suppose
that the place selection \(f_{\sigma}(x_1\ldots x_{i-1}) = 1\)
iff there exists a \(j \lt i\) such that \(\sigma = x_j \ldots
x_i\). That is, on input some initial segment of sequence \(x\) of
length \(i-1\), this place selection relative to \(\sigma\) selects
the \(i\)-th digit of \(x\) iff \(\sigma\) is a suffix of the
input. It is easy to see that, for each finite binary sequence
\(\sigma , f_{\sigma}\) will be effectively computable, so it is a
plausible assumption that \(\{f_{\sigma}: \sigma \in X^{\lt
\omega}\}\) is a subset of the set of admissible place selections. It
can then be shown (by induction over the length of \(\sigma)\) that
any sequence which satisfies the property of large numbers, and all of
its admissible subsequences do too, will be Borel normal.

- The base case, for the empty string \(\varnothing\), is guaranteed by the requirement that the sequence \(x\) satisfies the property of large numbers, as every outcome place is preceded by an initial subsequence which has \(\varnothing\) as a suffix.
- Next suppose that every string of length \(n\) appears in \(x\) with limit frequency \(2^{-n}\). Then for each \(\sigma\) of length \(n\), \(f_{\sigma}\) will select from \(x\) a subsequence that satisfies the property of large numbers, as roughly half the time, when \(\sigma\) appears in \(x\), it will be followed by a 0, and the other half of the time it will be followed by a 1. Since this holds for any \(\sigma\) of length \(n\), it follows that \(\sigma 0\) and \(\sigma 1\), the possible strings of length \(n+1\), all occur with limit frequency \(\frac{1}{2} \cdot 2^{-n} = 2^{-(n+1)}\), as required.

*finite automata*(finite state machines) to characterise admissible place selections, a result of Agafonov (1968) shows that the class of resulting sequences is exactly the class of normal sequences. (Of course, since not every normal sequence is random—as the result of Ville, Theorem 2, shows—that class of sequences isn’t a good candidate for the class of random sequences.)

*Proof of Theorem
3.*
Every rejected sequence is rejected at some level. So every rejected
sequence has some initial sequence which is rejected at that level. To
be rejected at a level \(2^{-m}\) is to fall into a
set of sequences \(U_m\) which has measure less than the
size of the level, by definition. A test is just a collection of such
sets \(U_m\); it is clear that if a sequence is rejected
at a level, it is rejected at every smaller level, so
\(U_m \supseteq U_{m+1}\). It is
effectively determinable whether a given sequence is rejected; since
the set of pairs \(\langle m,n\rangle\) is effectively
enumerable (using Cantor’s pairing function), if a sequence should be
rejected, there is an effective procedure which establishes that. These
two results show that the set of rejected sequences is effective
measure zero: each test is a sequence of effective open sets of
sequences. To be rejected is to be a member of some such set. So the
set of all rejected sequences is in the intersection of all the test
sets \(U_m\). By construction, the rejected sequences,
by a particular test, are thus of effective measure zero.

Martin-Löf then shows that the set of all tests is itself effectively enumerable. Using the fact that a test is an enumerable set of sets, and Martin-Löf explicitly constructs a function that assigns a test to a given triple of numbers (which, in effect, encodes the test), and shows that the set \(T\) of such triples is effectively enumerable. The universal test \(U\) is obtained by fixing on some constant, \(c\), for each test \(V\), and letting the universal test be obtained from the set \(T\), using the triple which is the level of significance, the length of the sequence, and the constant (which may as well be the Gödel number of the test \(V)\). The construction ensures that for any test \(V, V_{m+c} \subseteq U_m\); that is, the universal test is one that rejects any sequence at a given level iff there is some test that rejects it at a related level (which level depending on the test). The details of the construction (Martin-Löf 1966: 606, 609–10) show that the universal test also rejects only an effective measure zero set of sequences, so the complement of the rejected sequences—the set of ML-random sequences—is effective measure one by means of a single universal property.

*Proof of Theorem
4.*
The proof is a fairly immediate corollary of the existence of a
enumeration of the partial recursive functions. Each function
\(g\) has an associated code number (Gödel number)
\(\gamma\). Let \(\alpha^{[\gamma]}\)
represent the string consisting of a block of \(\gamma\)
instances of the digit \(\alpha\). We now choose an \(f\)
that, on input \(1^{[\gamma]}0\delta\),
produces \(g(\delta)\). Such functions exist, because of
the existence of universal Turing machines (let \(\gamma\) be the
Gödel number of \(g\) under some standard coding, then a
universal Turing machine given this input may emulate the operation of
\(g\) on input \(\delta)\). It is obvious that they are
near-superior to any \(g\), for
\(c_f (\delta) = c_g (\delta) + (\gamma +1)\). Any universal function
will be able to emulate any other, which
also shows that they are all complexity equivalent to one another.

*Proof of Theorem
5.*
Suppose there was an algorithm \(h\) that on input \(i\)
yields a random sequence \(\varrho\) such that
\(\lvert\varrho\rvert = i\). \(\varrho\) is random,
so \(C(\varrho) \approx i\); yet \(h\) is
near-inferior to some universal function, so that \(C(\varrho) \le C_h
(\varrho) + k\), for some \(k\) depending on \(h\). Let \(\lvert\varrho\rvert\)
increase, so \(k\) becomes negligible with respect to \(i\); then \(i
\le C_h (\varrho) + k \approx C_h (\varrho)\). But \(C_h (\varrho)
\approx \log_2 i\) (because that is the length of the binary
representation of \(i)\); it follows that \(i \lesssim \log_2 i\),
contradiction. So there can be no such algorithm \(h\) (van Lambalgen
1995: 11).

*Proof of Theorem
6.*
We show that, for fixed \(k\). If \(\mu\) is sufficiently
long then there is an initial segment \(\sigma\) of \(\mu\)
such that \(C(\sigma) \lt \lvert\sigma\rvert - k\).

Suppose that we have some sequence \(\mu\), with initial segment \(\nu\), such that \(\nu\) is the \(n\)-th string in the length-lexicographic ordering of the finite binary strings. Let \(\varrho\) be the next \(n\) digits of \(\mu\), and let \(\sigma = \nu\unicode{x2040}\varrho\). We can generate \(\sigma\) from \(\varrho\) alone, because the length of \(\varrho\) will give us \(\nu\) using the standard ordering via some constant decoder. So we know that \(C(\sigma) \approx C(\varrho) \le \lvert\varrho\rvert + c\). We also know, however, that \(\lvert\sigma\rvert = \lvert\nu\rvert + \lvert\varrho\rvert\). If \(\nu\) is chosen to be longer than the sum of the constants (i.e., \(\lvert\nu\rvert \gt c + k)\), it’s obvious that \(C(\sigma) \lt \lvert\sigma\rvert - k\).

*Proof of Theorem
7.*
*Only if:* The set \(U_k\) consists of those sequences which
have finite initial subsequences which are compressible by \(k\)
bits. That is, \(U_k = \cup_{\sigma \in \overline{R}} N(\sigma)\),
where \(\sigma \in \overline{R}\) iff \(K(\sigma) \le \lvert\sigma\rvert -
k\). The set of \(k\)-compressible strings is recursively enumerable;
on input \(n \in \mathbb{N}\), take the universal prefix-free
decompression function \(u\), and check if \(\lvert u(n)\rvert - \lvert n\rvert \lt k\). So
each \(U_k\) is effective open; indeed, this algorithm shows the
\(U_k\) to be uniformly effective open (supplement
C).

Recall that as a proportion of all finite strings, there are at most \(1/2^k\) compressible strings (§2.2.1). Since there are fewer prefix-free \(k\)-compressible sequences than plain \(k\)-compressible sequences, the upper bound still holds for prefix-free compressibility. As this holds independently of the length of sequences, we can use it to establish a measure for \(U_k : \mu(U_k) = 2^{-k}\). So \(\mu(U_k) \le 2^{-k}\). So \(\cap_{k}U_{k}\) is effective measure zero, and is a Martin-Löf test. Suppose \(x\) is ML-random; thus \(x \not\in \cap_{k}U_{k}\). Therefore there is a \(k\) such that \(x\) has no initial sequence which is compressible by \(k\) bits, and is therefore prefix-free Kolmogorov random.

*If:* The ‘if’ direction is a little trickier, and
depends in part on the details of how the sequential significance
tests for ML-randomness are defined. For details, see Li and
Vitányi (2008: 221–3) (as well as Chaitin 1987). The
general idea is to show that for a sequence \(\omega\) which is not
ML-random, and fails some sequential test, then there is a way of
constructing a Turing machine that will compress initial subsequences
of \(\omega\) arbitrarily much as the length of the initial subsequence
increases (i.e., the difference between \(n\) and the prefix-free
complexity of the length \(n\) initial subsequence of \(\omega\) is
positively unbounded).