I intend this as a quick note on “probabilistic computability” of real numbers, since it has recently come up in my research. I won’t fuss too much about the motivation. One can imagine cases where deterministic algorithms fail due to an inability to handle rare pathological cases. For example, we might have some family of unknown lines (perhaps we wish to approximate them) and require a point not on any of those lines. If we simply pick a point at random, it will fall outside of these lines with probability \(1\). Randomness may also be introduced for efficiency. For example, if we have a product of two large primes and we require an integer coprime to it, we will almost always save a lot of computational resource by just picking an integer at random rather than actually attempting the prime decomposition (I might suggest buying an excess of lottery tickets if you manage to randomly pull an integer multiple of \(2^{2147483647} - 1\)), perhaps with an error check of some kind to catch the rare case that we hit a bogey. One can then run the program several times to gain more confidence in the result. This is most feasible when the output is expected to be a Yes or No: if a program can be abstractly determined to output the correct result 90% of the time and we get “Yes” on 87 of 100 runs, we may be fairly confident that the answer should be “Yes”.
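The coprime trick can be sketched in a few lines of Python (a toy illustration; the function name and the `max_tries` safeguard are my own, not part of any standard library):

```python
import math
import random

def random_coprime(n, max_tries=10_000):
    """Find an integer coprime to n by rejection sampling.

    We never factor n: the gcd computation is the cheap "error
    check" catching the rare case that we hit a bogey.
    """
    for _ in range(max_tries):
        candidate = random.randrange(2, n)
        if math.gcd(candidate, n) == 1:
            return candidate
    raise RuntimeError("astronomically unlucky (or n has few coprimes)")
```

For \(n = pq\) with large primes \(p\) and \(q\), a uniformly random candidate fails the check with probability only \(1/p + 1/q - 1/(pq)\), so the expected number of tries is barely above one.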
We may ask whether algorithms using randomness can do more than deterministic ones. The answer is no, and I will now discuss this. I will provide very little motivation, but hopefully it will be of interest.
Computability of Real Numbers
I will take for granted the concept of a Turing machine. We pick a convenient enumeration \((x_n)\) of the rationals; many explicit enumerations are easily written down. I will call a real number \(x \in \mathbb R\) computable if there exists a Turing machine that accepts \(j\) and outputs a natural number \(k_j\) such that \(|x_{k_j} - x| < 2^{-j}\). That is, there is a Turing machine which produces arbitrarily good rational approximations to \(x\), with known error bounds. Such real numbers are necessarily described by a finite string, despite possibly having an infinite decimal expansion. We call the sequence \(y_j = x_{k_j}\) thus produced a Cauchy name for \(x\). Similarly, I will call a real number \(x \in \mathbb R\) limit computable if there exists a Turing machine that accepts \(j\) and outputs a natural number \(k_j\) such that the sequence \(y_j = x_{k_j}\) satisfies \(y_j \to x\).
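As a concrete illustration, here is a sketch of the approximations underlying a Cauchy name for \(\sqrt 2\) in Python. I suppress the enumeration \((x_n)\) and return the rational approximation itself rather than its index:

```python
from fractions import Fraction

def cauchy_name_sqrt2(j):
    """Return a rational y_j with |y_j - sqrt(2)| < 2**-j, by
    bisection on y*y - 2 over [1, 2] in exact rational arithmetic."""
    lo, hi = Fraction(1), Fraction(2)
    while hi - lo >= Fraction(1, 2**j):
        mid = (lo + hi) / 2
        if mid * mid <= 2:
            lo = mid
        else:
            hi = mid
    return lo  # sqrt(2) lies in [lo, hi], an interval shorter than 2**-j
```

Exact rational arithmetic via `fractions` keeps the guarantee \(|y_j - \sqrt 2| < 2^{-j}\) honest: no floating-point error sneaks into the error bound.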
To show that these two notions of computability are not equivalent, we introduce Chaitin’s constant. Let \(K\) be the halting problem: the indices of Turing machines \(e\) that halt when given their own source code \(e\) as an input. We then define Chaitin’s constant as \(C = \sum_{j \in K} 2^{-j}\).
Theorem: Chaitin’s constant is limit computable but not computable.
Chaitin’s constant is limit computable: We construct a Turing machine which, given \(n\), runs each of the first \(n\) Turing machines \(\phi_e(e)\) for \(n\) steps. Let \(K_n\) be the indices of those machines that have halted in this time. Then \(y_n = \sum_{j \in K_n} 2^{-j}\) is a computable sequence. We have that \(K_n\) increases to \(K\) and hence \(y_n \to C\). In some sense, \(C\) is computable from below, but we cannot get error bounds from above.
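The true halting times are of course not computable, so the following Python sketch is only a toy illustration: I invent a small “universe” of five machines with made-up halting times and compute the lower approximations \(y_n\):

```python
from fractions import Fraction

# Toy stand-in: genuine halting behaviour is not computable, so for
# illustration we fix an imaginary "universe" of machines, machine j
# halting at step HALT_TIME[j] (None meaning it never halts).
HALT_TIME = {1: 3, 2: None, 3: 7, 4: 1, 5: None}

def y(n):
    """Lower approximation: run machines 1..n for n steps each and
    sum 2**-j over those that have halted in this time."""
    return sum(Fraction(1, 2**j)
               for j in range(1, n + 1)
               if HALT_TIME.get(j) is not None and HALT_TIME[j] <= n)
```

Each \(y_n\) is computable, the sequence is non-decreasing, and it converges to this universe’s \(C\); but at no finite stage do we get an error bound from above, which is exactly the gap between limit computability and computability.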
Chaitin’s constant is not computable: Suppose towards a contradiction that \(C\) is computable, and let \((C_m)\) be a Cauchy name for \(C\). It is known that \(C\) is irrational, so \(C\) never equals a dyadic rational \(k 2^{-n}\), and we can therefore decide any comparison \(C < k 2^{-n}\) or \(C > k 2^{-n}\): run through the Cauchy name until the interval \((C_m - 2^{-m}, C_m + 2^{-m})\) excludes \(k 2^{-n}\), which must eventually happen. In particular, we can compute the binary digits of \(C\). But from the first \(n\) binary digits of \(C\) we can determine which of the first \(n\) machines halt: run all machines in parallel until the lower approximations \(y_s\) from the previous proof exceed the truncation of \(C\) to \(n\) digits; a machine \(j \le n\) that has not halted by this stage never will, since its halting would push \(C\) up by \(2^{-j} \ge 2^{-n}\), past what the first \(n\) digits allow. We could therefore solve the halting problem. This is a contradiction.
Clearly, all rational numbers are computable, and there are only countably many computable numbers, since each is specified by the finite source code of some Turing machine.
A question I had was whether adding randomness changes anything. First, I’ll properly introduce the concept of a probabilistic Turing machine.
Oracle Turing Machines
We start with the concept of an oracle Turing machine. We imagine a situation where we are given a black box that can answer some computational problem, possibly one we could not compute ourselves. Say, a binary string whose \(n\)th bit tells us whether the \(n\)th Turing machine halts given its own source code as an input. We can then ask which non-computable problems become computable given this black box. This gives a notion in which some problems are “more non-computable” than others.
So, formally, we have a Turing machine with an additional read-only tape that has a (finite or infinite) binary string printed on it. Finite strings will be followed by infinitely many blank symbols. At any stage, the machine can choose to read any bit of the tape. The machine can then factor this into its calculation of what to do next. In the case of a finite string, the machine should stall when it tries to read a bit that is blank.
Writing \(\omega\) for the natural numbers, we denote the space of \(\{0, 1\}\)-sequences by \(2^\omega\) and the space of \(\{0, 1\}\) finite strings by \(2^{<\omega}\). The set of allowed oracle tapes is then \(2^\omega \cup 2^{<\omega}\). For \(\tau \in 2^{<\omega}\) and \(\sigma \in 2^\omega\), we write \(\tau \prec \sigma\) if \(\tau\) is an initial segment of \(\sigma\): for \(0 \le i < |\tau|\) we have \(\tau_i = \sigma_i\).
Note that if the oracle tape is computable (that is, there is a Turing machine that accepts \(n\) and produces its \(n\)th bit), we can simulate the action of the oracle Turing machine on an “oracleless” Turing machine. We do this simply by baking the computable tape into the source code of the oracleless Turing machine, using the fact that the composition of two computable functions is computable.
Perhaps surprisingly, for any fixed oracle tape, computable or not, we can in some sense simulate the oracle Turing machine on an oracleless one.
The Use Principle: Consider a Turing machine \(T\) with oracle tape \(\sigma\). Suppose that it halts on input \(n\). Then there is a finite string \(\tau \prec \sigma\) such that \(T\) still halts on input \(n\) when given \(\tau\) as an oracle tape.
Explanation: \(T\) will halt after a finite number of steps \(s\), and so will have read only finitely many bits of the oracle tape; let \(u\) be the largest position read. The action of the Turing machine would have been the same if the bits beyond position \(u\) were flipped or blank. Hence, if we were to load \(T\) with the finite string \(\sigma_{< u + 1}\) (the first \(u + 1\) bits of \(\sigma\)), \(T\) would still halt on input \(n\) and produce the same output. In the same vein, if \(\tau\) is a finite string such that \(T\) halts when given \(\tau\) as an oracle tape, then it will also halt, with the same output, given any finite or infinite string extending \(\tau\). This fact will be indispensable.
We can then understand the action of the oracle Turing machine by trying each finite string in turn. The introduction of randomness will then be completely natural.
Probabilistic Turing Machines
We will use the framework of an oracle Turing machine to implement randomness. We want a probabilistic algorithm to be able to do a coin toss at any step to decide what it does. We therefore imagine a “random” oracle tape where every bit has an equal probability of being \(0\) or \(1\). We will say that such a probabilistic algorithm solves a particular computational problem if it produces the correct answer for “most” (relative to our probability measure) possible oracle tapes.
We now make this precise. Give \(2 = \{0, 1\}\) the Bernoulli \(1/2\)-distribution. We then consider the induced product measure space \((2^\omega, \Sigma, \Pr)\). That is, for any bits \(\tau_1, \tau_2, \ldots, \tau_n \in \{0, 1\}\) and distinct natural numbers \(k_1, \ldots, k_n\) we have:
\[\Pr \left((\sigma_n) \in 2^\omega : \sigma_{k_i} = \tau_i, \, \forall 1 \le i \le n\right) = \prod_{i = 1}^n \Pr \left((\sigma_n) \in 2^\omega : \sigma_{k_i} = \tau_i\right) = 2^{-n}\]
We also give \(2\) and \(\omega\) the discrete topology and give \(2^\omega\) the product topology. With our notation, the basic open neighborhoods in \(2^\omega\) are the sets \(\{\sigma \in 2^\omega : \tau \prec \sigma\}\) for finite strings \(\tau\).
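The events we will care about below are finite unions of basic cylinders \(\{\sigma \in 2^\omega : \tau \prec \sigma\}\), and their probabilities are exactly computable: discard any string that extends another in the list, and the remaining cylinders are pairwise disjoint, each of measure \(2^{-|\tau|}\). A short Python sketch (my own helper, not from any library, using exact arithmetic):

```python
from fractions import Fraction

def cylinder_union_measure(strings):
    """Pr of the set of infinite binary sequences extending at least
    one of the given finite 0/1 strings.

    A string extending another contributes nothing new, so keep only
    the prefix-minimal strings; their cylinders are pairwise disjoint
    and the cylinder above tau has measure 2**-len(tau).
    """
    minimal = {t for t in strings
               if not any(s != t and t.startswith(s) for s in strings)}
    return sum(Fraction(1, 2**len(t)) for t in minimal)
```

This is what makes checks like \(\Pr(\cdot) > 1/2\) effective later on: they reduce to comparisons of integers.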
We now enumerate the oracle Turing machines \(\phi_1^\bullet, \phi_2^\bullet, \ldots\), where \(\phi_e^\sigma(n)\) denotes the \(e\)th oracle Turing machine loaded with oracle tape \(\sigma\) and input \(n\). For fixed \(e\), we want to understand \((\sigma, n) \mapsto \phi_e^\sigma(n)\) as a random variable. For this, we adjoin an isolated point \(\mathbf{NH}\) to \(\mathbb N\) and understand \(\phi_e^\sigma(n) = \mathbf{NH}\) if \(\phi_e^\sigma(n)\) fails to halt. With that, we have the following theorem.
Theorem: The map \((\sigma, n) \mapsto \phi_e^\sigma(n)\) is measurable.
This will be a direct consequence of the use principle.
Proof: Since \(\mathbb N\) is countable and the union of countably many measurable sets is measurable, it suffices to fix \(n\). We prove that \(\sigma \mapsto \phi_e^\sigma(n)\) is actually locally constant (hence continuous) at any point \(\sigma\) for which \(\phi_e^\sigma(n) \downarrow\). By the use principle, since \(\phi_e^\sigma(n) \downarrow\), there is a finite string \(\tau \prec \sigma\) such that whenever \(\tau \prec \sigma^\ast\), we have \(\phi_e^{\sigma^\ast}(n) \downarrow\) with the same output as \(\phi_e^\sigma(n)\). Hence \(\phi_e^\bullet(n)\) is constant on the open neighborhood \(\{\sigma^\ast : \tau \prec \sigma^\ast\}\) of \(\sigma\).
By the above, the preimage of \(\mathbb N\) is open (hence measurable), so the preimage of \(\{\mathbf{NH}\}\), being its complement, is closed (hence measurable). We conclude that the map \(\sigma \mapsto \phi_e^\sigma(n)\) is measurable.
With randomness now fully understood, we take the random partial function \(\phi_e^\sigma(\bullet)\) (with \(\sigma\) understood as a random variable with the above distribution) to be our “probabilistic Turing machine”. Note that \(n \mapsto \phi_e^\sigma(n)\) need not be computable for fixed \(\sigma\): while for each \(n\) with \(\phi_e^\sigma(n) \downarrow\) there is a finite string \(\tau_n \prec \sigma\) such that \(\phi_e^{\tau_n}(n) \downarrow\) (hence the computation of each individual function value can be simulated on a Turing machine), the length of such strings may be unbounded in \(n\), and hence we may be tasked with reading arbitrarily many bits of \(\sigma\). We cannot do this uniformly if \(\sigma\) is not computable.
Solving problems on probabilistic Turing machines
We start with decision problems. That is, we have a set \(A \subseteq \mathbb N\) and wish to compute the indicator \(1_A\). We seek a probabilistic Turing machine that has over a \(1/2\) chance of producing \(1_A\). More precisely, we will require \(\Pr \left(\{\sigma \in 2^\omega : \forall n, \, \phi_e^\sigma(n) = 1_A(n)\}\right) > 1/2\). To simplify proofs we will also require that \(\Pr \left(\left\{\sigma \in 2^\omega : \phi_e^\sigma \text { is total}\right\}\right) = 1\). Let’s call \(A\) probabilistically computable if there exists a machine \(\phi_e^\bullet\) for which this holds.
Theorem: If \(A\) is probabilistically computable, then it is computable.
Proof: Let \(\phi_e^\bullet\) be a probabilistic Turing machine computing \(A\). From this we will craft a deterministic Turing machine \(\phi_f\) computing \(A\). First, take in an input of \(n\). We have:
\[\Pr \left(\{\sigma \in 2^\omega : \phi_e^\sigma(n) = 1_A(n)\}\right) > 1/2\]
Initialise lists \(\mathcal T_1 = \emptyset\) and \(\mathcal T_2 = \emptyset\). On step \(s\), consider the first \(s\) strings \(\tau_1, \ldots, \tau_s\) and run \(\phi_{e, s}^{\tau_j}(n)\) (that is, \(\phi_e^{\tau_j}(n)\) for \(s\) steps) for each \(1 \le j \le s\). If we get halting on a string \(\tau_j\), add it to \(\mathcal T_1\) if the output is \(1\) and to \(\mathcal T_2\) if the output is \(0\). Write \(\mathcal T_i^\ast\) for the set of infinite strings \(\sigma\) for which there exists \(\tau \in \mathcal T_i\) such that \(\tau \prec \sigma\). Keep checking whether \(\Pr(\mathcal T_1^\ast) > 1/2\) or \(\Pr(\mathcal T_2^\ast) > 1/2\): each \(\mathcal T_i^\ast\) is a finite union of cylinder sets, so its measure is a finite sum of dyadic rationals and the check comes down to comparisons of integers. In the former case, print \(1\); in the latter case, print \(0\).
We now ensure that this process works. The two things to check are that we will in finite time reach a state where \(\Pr(\mathcal T_1^\ast) > 1/2\) or \(\Pr(\mathcal T_2^\ast) > 1/2\), and that the output is correct.
First, if \(\phi_e^\sigma(n) \downarrow\) then, by the use principle, there exists \(i\) such that \(\tau_i \prec \sigma\) and \(\phi_e^{\tau_i}(n) \downarrow\) with the same output. We can therefore write:
\[\{\sigma \in 2^\omega : \phi_e^\sigma(n) = 1_A(n)\} = \bigcup_{m = 1}^\infty \{\sigma \in 2^\omega : \exists i \le m, \, \tau_i \prec \sigma, \, \phi_e^{\tau_i}(n) = 1_A(n)\}\]
Note that the sets on the RHS are increasing. By continuity of measure, there exists \(N\) such that:
\[\Pr (\{\sigma \in 2^\omega : \exists i \le N, \, \tau_i \prec \sigma, \, \phi_e^{\tau_i}(n) = 1_A(n)\}) > 1/2\]
Hence among the first \(N\) strings there will be \(\tau_{k_1}, \ldots, \tau_{k_m}\) such that \(\phi_e^{\tau_{k_j}}(n) \downarrow\) with output \(1_A(n)\), and the corresponding cylinder sets have total measure exceeding \(1/2\). Say \(\phi_e^{\tau_{k_j}}(n)\) halts on step \(s_j\). Then on step \(s = \max \{s_1, \ldots, s_m, k_1, \ldots, k_m\}\) (if the process has not already halted!), we will have added all of \(\tau_{k_1}, \ldots, \tau_{k_m}\) to either \(\mathcal T_1\) or \(\mathcal T_2\), tipping either \(\mathcal T_1^\ast\) or \(\mathcal T_2^\ast\) above a measure of \(1/2\), and hence our process terminates. Moreover, the majority side is the correct one: \(\mathcal T_1^\ast\) and \(\mathcal T_2^\ast\) are disjoint (an oracle tape cannot produce both outputs), and the strings voting for the wrong answer are initial segments only of oracle tapes on which \(\phi_e^\bullet\) errs, a set of measure below \(1/2\). Hence we produce the correct answer.
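To see the moving parts, here is a toy Python simulation of this majority vote. The probabilistic machine `phi` is a stand-in I invented: it reads the first two oracle bits to decide whether \(n\) is even, and answers wrongly exactly when both bits are \(0\), so it is correct with probability \(3/4 > 1/2\). A real machine would be run with a step bound \(\phi_{e, s}\); I elide this since the toy machine halts as soon as it can read two bits:

```python
from fractions import Fraction
from itertools import product

# Invented stand-in for phi_e^tau(n): decides "is n even?" from the
# first two oracle bits, answering wrongly exactly when both are 0.
def phi(tau, n):
    if len(tau) < 2:
        return None                      # stalls on a blank cell
    correct = 1 if n % 2 == 0 else 0
    return correct if tau[:2] != "00" else 1 - correct

def measure(strings):
    """Measure of the union of cylinders above the given strings."""
    minimal = {t for t in strings
               if not any(s != t and t.startswith(s) for s in strings)}
    return sum(Fraction(1, 2**len(t)) for t in minimal)

def majority_vote(n, max_len=10):
    """Deterministically decide n by dovetailing phi over finite
    strings until one output's cylinders carry measure above 1/2."""
    T1, T0 = [], []
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            tau = "".join(bits)
            out = phi(tau, n)
            if out == 1:
                T1.append(tau)
            elif out == 0:
                T0.append(tau)
        if measure(T1) > Fraction(1, 2):
            return 1
        if measure(T0) > Fraction(1, 2):
            return 0
    raise RuntimeError("no majority found")
```

For this toy machine the vote settles once the strings of length \(2\) are processed, since the three correct cylinders already carry measure \(3/4 > 1/2\).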
An almost identical proof establishes that probabilistically computable functions \(\mathbb N \to \mathbb N\) are computable. As promised in the title, we will prove this of real numbers as well. Call a real number \(x\) probabilistically computable if there is a probabilistic Turing machine \(\phi_e^\bullet\) that produces a Cauchy name for \(x\) with probability exceeding \(1/2\). Fix an enumeration \((q_n)\) of the rationals again. Then we require:
\[\Pr \left(\sigma \in 2^\omega : \forall n, \, |q_{\phi_e^\sigma(n)} - x| < 2^{-n}\right) > 1/2\]
We call \(x\) probabilistically limit computable if:
\[\Pr \left(\sigma \in 2^\omega : q_{\phi_e^\sigma(n)} \to x \text { as } n \to \infty\right) > 1/2\]
In prodding the probabilistic SCI (Solvability Complexity Index) hierarchy, I found myself asking whether probabilistic (limit) computability is equivalent to (limit) computability. At first I actually thought this might not be the case. It is conceivable that we could have a probabilistic Turing machine that outputs one of \(100\) completely distinct sequences, \(80\) of which converge to \(x\). We seemingly have no way of checking whether a given sequence converges to \(x\) without further knowledge about \(x\); we may not even be able to estimate \(|x_n - x|\).
ChatGPT seemed to believe they were equivalent though, suggesting the “majority vote” tactic shown above for decision problems. It was slightly vague, but the tactic does essentially work here.
Main Theorem 1: Probabilistically computable real numbers \(x\) are computable.
Proof: First suppose that \(x\) is irrational (in particular, non-dyadic); otherwise \(x\) is rational and immediately computable. Let \(\phi_e^\bullet\) be a probabilistic Turing machine computing \(x\). For brevity, write \(x_n^\sigma = q_{\phi_e^\sigma(n)}\). We craft a deterministic Cauchy name for \(x\). Accept an input \(n\) and consider the intervals \(I_k = (k 2^{-n}, (k + 1) 2^{-n})\). Since \(x\) is not dyadic, it is contained in some \(I_k\).
Now think (I suppose I shouldn’t say “initialise” since there are infinitely many of them) of lists \(\mathcal T_k = \emptyset\). At step \(s\), run \(\phi_{e, s}^{\tau_j}(i)\) for \(1 \le i, j \le s\) and, where this halts, check whether \((x_i^{\tau_j} - 2^{-i}, x_i^{\tau_j} + 2^{-i})\) is contained in \(I_k\) for some \(-s \le k \le s\). If such \(i, j, k\) are found, add \(\tau_j\) to \(\mathcal T_k\). Let \(\mathcal T_k^\ast = \{\sigma \in 2^\omega : \tau \prec \sigma, \, \tau \in \mathcal T_k\}\) as before. Keep up this process until \(\Pr(\mathcal T_k^\ast) > 1/2\) for some value of \(k\), say \(k_n\). Then output \(y_n = (k_n + 1/2) 2^{-n}\).
We prove that this process terminates and gives a deterministic Cauchy name. Fix \(n\) as an input. With the notation above, there is a unique \(k\) such that \(x \in I_k\), and then there exists \(i\) such that \((x - 2^{-i}, x + 2^{-i}) \subseteq I_k\). If \((x_m)\) is a Cauchy name for \(x\) (so that \(|x_{i + 1} - x| \le 2^{-(i + 1)}\)), then \((x_{i + 1} - 2^{-(i + 1)}, x_{i + 1} + 2^{-(i + 1)}) \subseteq (x - 2^{-i}, x + 2^{-i}) \subseteq I_k\).
Now, we note that:
\[\Pr \left(\{\sigma \in 2^\omega : |x_{i + 1}^\sigma - x| \le 2^{-(i + 1)}\}\right) > 1/2\]
and hence for some \(N\):
\[\Pr \left(\{\sigma \in 2^\omega : \exists j \le N, \, \tau_j \prec \sigma, \, |x_{i + 1}^{\tau_j} - x| \le 2^{-(i + 1)}\}\right) > 1/2\]
Again, take \(s_\ast\) large enough that \(\phi_{e, s_\ast}^{\tau_j}(i + 1) \downarrow\) for every \(j \le N\) with \(\phi_e^{\tau_j}(i + 1) \downarrow\). Then by stage \(s = \max \{s_\ast, N, i + 1\}\), we will have discovered every string among \(\tau_1, \ldots, \tau_N\) with \(|x_{i + 1}^{\tau_j} - x| \le 2^{-(i + 1)}\), and for each such string that the interval \((x_{i + 1}^{\tau_j} - 2^{-(i + 1)}, x_{i + 1}^{\tau_j} + 2^{-(i + 1)})\) is contained in \(I_k\). Hence at this stage (if not earlier) we will discover that \(\Pr(\mathcal T_k^\ast) > 1/2\) and will output \(y_n = (k + 1/2) 2^{-n}\). Note also that the process cannot terminate on a wrong \(k'\): if \(\tau \in \mathcal T_{k'}\) with \(x \notin I_{k'}\), then no Cauchy name for \(x\) extends \(\tau\), so \(\Pr(\mathcal T_{k'}^\ast) < 1/2\). Since the interval \(I_k = (k 2^{-n}, (k + 1) 2^{-n})\) contains both \(x\) and \(y_n\), we have \(|y_n - x| < 2^{-n}\). Hence \((y_n)\) is a deterministic Cauchy name for \(x\), establishing the computability of \(x\).
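One round of this interval vote can be sketched as follows. The helper and its `votes` argument are hypothetical packaging of my own (each entry records a halted string \(\tau\), its approximation \(x_i^\tau\), and the radius \(2^{-i}\)); real code would produce these entries by dovetailing as in the proof:

```python
from fractions import Fraction

def interval_vote(votes, n):
    """One round of the interval argument at precision n.

    `votes` is a list of (tau, x, r): the halted string tau produced
    approximation x with radius r, i.e. it believes the target lies in
    (x - r, x + r).  A string votes for I_k = (k 2^-n, (k+1) 2^-n)
    when its whole interval fits inside; once some I_k's voters carry
    measure above 1/2, output the midpoint (k + 1/2) 2^-n.
    """
    width = Fraction(1, 2**n)
    tally = {}
    for tau, x, r in votes:
        k = (x - r) // width             # candidate index (an int)
        if x + r <= (k + 1) * width:     # (x - r, x + r) inside I_k
            tally.setdefault(k, []).append(tau)
    for k, taus in tally.items():
        minimal = {t for t in taus
                   if not any(s != t and t.startswith(s) for s in taus)}
        if sum(Fraction(1, 2**len(t)) for t in minimal) > Fraction(1, 2):
            return (k + Fraction(1, 2)) * width
    return None  # no majority yet: keep dovetailing
```

Exact rationals again make both the containment checks and the measure comparison exact rather than approximate.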
Main Theorem 2: A probabilistically limit computable real number \(x\) is limit computable.
Proof: We assume the existence of an oracle Turing machine \(\phi_e^\bullet\) realising the probabilistic limit computability of \(x\) (requiring also, as with decision problems, that \(\phi_e^\sigma\) is total with probability \(1\)) and craft a deterministic \(\phi_f\) such that \(q_{\phi_f(n)} \to x\). Let:
\[p = \Pr \left(\{\sigma \in 2^\omega : q_{\phi_e^\sigma(n)} \to x \text { as } n \to \infty\}\right)\]
Accept an input \(n\). Dovetail the computations \(\phi_e^\tau(n)\) over finite strings \(\tau\) as before, and record in \(\mathcal T\) each \(\tau\) for which we get halting. Keep going until \(\Pr \left(\{\sigma \in 2^\omega : \tau \prec \sigma \text { for some } \tau \in \mathcal T\}\right) > 1 - 2^{-n}\). Denote the set \(\{\sigma \in 2^\omega : \tau \prec \sigma \text { for some } \tau \in \mathcal T\}\) thus obtained by \(\mathcal S_n\).
We now have finitely many rational numbers \(x_n^{\tau_{k_1}}, \ldots, x_n^{\tau_{k_j}}\), so only finitely many intervals \([k 2^{-n}, (k + 1) 2^{-n}]\) contain at least one of them. Let \(\mathcal T_k\) consist of those \(\tau_{k_i}\) such that \(x_n^{\tau_{k_i}} \in [k 2^{-n}, (k + 1) 2^{-n}]\). Pick \(r\) maximising \(\Pr \left(\{\sigma \in 2^\omega : \tau \prec \sigma, \, \tau \in \mathcal T_r \cup \mathcal T_{r + 1}\}\right)\), and take \(y_n = (r + 1) 2^{-n}\).
We now need to prove that \(y_n \to x\). Fix \(\epsilon > 0\), and fix \(N\) such that \(p - 2^{-(N - 1)} > 1/2\). We have:
\[\Pr \left((\exists N_\sigma) : |x_n^\sigma - x| < \epsilon/2 \text { for all } n \ge N_\sigma\right) \ge p > 1/2\]
We can write the left-hand side as:
\[\Pr \left(\bigcup_{N' = 1}^\infty \{|x_n^\sigma - x| < \epsilon/2 \text { for all } n \ge N'\}\right) \ge p > 1/2\]
The sets in this union are increasing, so by continuity of measure there exists a particular \(N' \ge N\) such that:
\[\Pr \left(|x_n^\sigma - x| < \epsilon/2 \text { for all } n \ge N'\right) \ge p - 2^{-N} > 1/2\]
Take \(N'\) larger if necessary to ensure that \(\epsilon/2 + 2^{-N'} < \epsilon\). Then for each \(n \ge N'\), we have:
\[\Pr \left(|x_n^\sigma - x| < \epsilon/2\right) \ge p - 2^{-N} > 1/2\]
Hence, intersecting with \(\mathcal S_n\), which has measure exceeding \(1 - 2^{-n} \ge 1 - 2^{-N}\) (and noting that, for \(\sigma \in \mathcal S_n\), \(x_n^\sigma = x_n^\tau\) for the recorded \(\tau \prec \sigma\), by the use principle):
\[\Pr \left(\{\sigma \in \mathcal S_n : |x_n^\sigma - x| < \epsilon/2\}\right) \ge p - 2^{-N} - 2^{-N} > 1/2\]
Since \(2^{-N'} < \epsilon/2\), all of these \(x_n^\sigma\) are contained in one of two adjacent intervals, with indices \(k_0\) and \(k_0 + 1\) say. Hence the maximising index \(r\) satisfies \(\Pr \left(\{\sigma \in 2^\omega : \tau \prec \sigma, \, \tau \in \mathcal T_r \cup \mathcal T_{r + 1}\}\right) > 1/2\) with \(k_0 - 1 \le r \le k_0 + 1\), so the chosen pair contains some \(x_n^\sigma\) with \(|x_n^\sigma - x| < \epsilon/2\), and \(|y_n - x_n^\sigma| \le 2^{-n} \le 2^{-N'}\). Then \(|y_n - x| \le \epsilon/2 + 2^{-N'} < \epsilon\). Since \(n \ge N'\) was arbitrary, we have \(y_n \to x\). This is our desired sequence.