Provability and the Halting Problem

I’m currently drafting part of my first paper and got onto the topic of provability. I initially found this topic pretty confusing, but I think I have developed a competent enough understanding to put up a quick note here. Where I use the term Turing machine, there is no loss reading this as “computer program”.

Pedantry and Non-Standard Models

First, I’ll try to discharge the pedantry. In the language of the natural numbers, we have the constant symbols \(0\), \(1\) (the additive and multiplicative units), the binary operation symbols \(+\) and \(\times\) (addition and multiplication), the unary function symbol \(s\) (the “successor” function) and the binary relation \(<\). Hence in a strict technical sense, \(1 + 1 = 2\) is not a sentence in the language of arithmetic, since \(2\) is not a symbol in the language. Of course, this is readily resolved by writing \(1 + 1 = s(1)\) or \(1 + 1 = s(s(0))\) instead. This is to say, when we write \(5\) in a “formula in the language of arithmetic”, we will mean \(s(s(s(s(s(0)))))\), but this is a mouthful.

We introduce “Peano arithmetic” as the standard axioms of arithmetic. None of these will be a surprise, so we omit them – for example commutativity of addition/multiplication, associativity, and so on are all specified as axioms. The “standard model” of Peano arithmetic is the natural numbers \(0, 1, \ldots\) as intuitively understood. Of course, the intuitive notion of a “natural number” and of arithmetic is not immediately translated into formalism. ZFC proves the existence of a “smallest inductive set”, denoted \(\omega\), which is understood as playing the role of the natural numbers. Typically we will assume that ZFC is consistent, grab a model we like (whose arithmetic coincides with our intuitive notion), and take its \(\omega\) as our natural numbers. Hence “standardness” is a relative notion, dependent on our choice of “standard model”.

We now talk about “non-standard models”. After all, there is nothing to say that the natural numbers should be the only structure that satisfy the axioms of arithmetic. We can obtain such a model non-constructively using the compactness theorem. For brevity write the theory of Peano arithmetic as \(\mathbf{PA}\). We adjoin a constant symbol \(N\) to the language of arithmetic, and consider the set of axioms \(T = \{N > 0, N > s(0), N > s(s(0))\}\) and so on. If \(T_\ast \subseteq T\) is finite, then we can show that \(T_\ast \cup \mathbf{PA}\) is consistent: if \(s(s(\ldots(0)\ldots)\) is the largest lower bound given for \(N\) and there are \(n\) successor operations, we can interpret \(N\) as \(n + 1\) to show that the standard model \(\mathbf N\) is a model of \(T_\ast \cup \mathbf{PA}\). Since every finite subset of \(T \cup \mathbf{PA}\) is consistent, it follows from the compactness theorem that \(T \cup \mathbf{PA}\) is consistent. Hence there is a model \(\mathcal M\) of \(T \cup \mathbf{PA}\). We simply drop the constant symbol \(N\) from our language to obtain a model of \(\mathbf{PA}\). This model is clearly “non-standard”, since \(N\) is greater than any “standard” natural number. \(\mathcal M\) does however contain a “copy” of the standard natural numbers, as will every model of PA. You can read a bit about what non-standard models look like in Boolos’ Models of Arithmetic.

The final thing I will address is the ambiguity in even writing \(0\) or \(1\), or so on. There’s “the” zero, and “the” one corresponding to the standard model of arithmetic, but different models of PA will have their own objects which are internally recognised as \(0\) or \(1\) but may not be the “real world” \(0\) or \(1\). Throughout the next sections I will write \(0\) and \(1\) to mean the constant symbols in the language of arithmetic (and \(2\), \(3\) to mean \(s(1)\), \(s(s(1))\) and so on, where \(s\) is a function symbol), and \(\overline 0^M\) to denote \(0\) *interpreted* within the model \(M\) (that is, \(M\)’s \(0\)).

\(\Delta_0\) truth is determined

We can see “quantifier-free” facts regarding the natural numbers, for example \(1 + 1 = 2\), as fully determined and so “absolutely true” or “absolutely false”. These facts are true iff PA proves them. As an extension, we can easily understand facts that involve bounded quantifiers as “absolutely true” or “absolutely false”. Take the example the sentence “\(4\) is divisible by \(2\)”. We would write this out as the first order sentence \((\exists n)[4 = 2 n]\), which seems to involve an existential quantifier. However \(4 = 2 n\) for a natural number \(n\) implies that \(0 \le n \le 2\), so this sentence is logically equivalent to \((4 = 2 \times 0) \vee (4 = 2 \times 1) \vee (4 = 2 \times 2)\), which does not involve quantifiers.

This fact is “clear”, there are roughly well-understood procedures for verifying, say, \(3 + 3 = 6\) by hand. We can also prove this, and the proof essentially relies on the fact that any model of PA will contain a copy of the natural numbers. A proof can be found as Lemma 11.2 in Andrew Marks’ computability notes. The problem arises as almost a “glitch” when starting to think about quantifiers.

We need to make it clear that we cannot talk about non-standard natural numbers in the language of arithmetic or prove anything about them in PA. If we could prove the existence of such numbers within PA, all models would have them. Hence, it is impossible for \(\Delta_0\) formulae to involve non-standard numbers. However, if we introduce existential or universal quantifiers, such as in \((\exists x)(x + 5 = 7)\) or \((\forall x, y)(x + y = y + x)\), the quantifiers *do* go over the non-standard numbers as well (so addition with non-standard numbers will commute, obey the same ordering rules, and so on). Hence it could be that some formula \((\exists x) \phi(x)\) is false in the standard model, yet there is some non-standard model somewhere with some non-standard natural number \(N\) where \(\phi(x)\) is satisfied with the assignment \(x = N\). Such a sentence will be independent of PA since there are some models of PA in which it is true, and others in which it is false. We will see an explicit example of this in the next section.

\(\Sigma_1\) and \(\Pi_1\)

We call a formula of the form \((\exists x)\phi(x)\), where \(\phi(x)\) is a \(\Delta_0\) formula with free variable \(x\), a \(\Sigma_1\) formula. We call a formula of the form \((\forall x) \phi(x)\), a \(\Pi_1\) formula. If a \(\Sigma_1\) formula \((\exists x)\phi(x)\) is true in the standard model, then PA can prove this. If \((\exists x) \phi(x)\) is true in the standard model, then we have that \(\phi(x)\) is true in \(\mathbf N\) with an assignment \(x = N\). Hence \(\mathbf N\) verifies \(\phi(N)\). Hence PA proves \(\phi(N)\), and so \((\exists x) \phi(x)\). Hence, assuming appropriate soundness of PA which we have implicitly assumed throughout, truth in the standard model coincides with provability in PA.

The issue is more complicated when we look at \(\Pi_1\) formula. We call a theory \(T\) “computably enumerable” if there exists a Turing machine that given \(n\) will output a natural number which is the Godel number of the \(n\)th axiom of \(T\). The classic non-provability result is that a computably enumerable theory extending PA (including ZFC!) cannot prove its own consistency unless it is consistent. Of course, a theory cannot be trusted on matters of its own consistency anyway (as we just said, if a theory is inconsistent then it proves its own consistency), but this gives us a \(\Pi_1\) statement that is independent of PA and ZFC.

We can formulate \(\mathbf{Con}(T)\), where \(T\) is a computably enumerable theory extending PA, as an arithmetic statement. We note \(T\) is inconsistent if and only if it manages to prove \(0 = 1\). We can therefore ask whether there exists a natural number \(n\) which encodes a \(T\)-proof of \(0 = 1\). So if we denote by \(\phi(n)\) the claim “\(n\) encodes a proof of \(0 = 1\)”, then we can formulate \(\mathbf{Con}(T)\) as the \(\Pi_1\) statement \((\forall x)\neg \phi(x)\).

Let’s just take \(T\) to be ZFC now and assume that ZFC is consistent. Since \(\mathbf{Con}(\mathbf{ZFC})\) is not provable in \(\mathbf{ZFC}\), \(\mathbf{ZFC} + \neg \mathbf{Con}(\mathbf{ZFC})\) is somehow consistent, and so has a model \(\mathcal M\). This model somehow believes \(\mathbf{ZFC}\) is inconsistent, despite apparently being living proof that this is not true. But in this model, \((\exists x) \phi(x)\) is true. The witness \(n\) cannot be (the image of) a standard natural number: if \(\phi(n)\) were true then ZFC would be inconsistent, contrary to our assumption, and if it were false, then PA would prove \(\neg \phi(n)\), and we would have a contradiction to the truth of \(\phi(n)\) in the model \(\mathcal M\). Hence any model of \(\mathbf{ZFC} + \neg \mathbf{Con}(\mathbf{ZFC})\) will have non-standard natural numbers.

I mention this to ease into discussing the halting problem. The understanding that I have developed is that models with non-standard natural numbers have a skewed internal understanding of “finite” which does not agree with the standard one. Hence this model \(\mathcal M\) will believe that it has a finite number decoding to a valid proof of \(0 = 1\), this number will essentially be “infinite” and not correspond to a finite, genuine proof. I will make this interpretation clearer when we discuss the halting problem.

The Halting Problem

I suppose this bit is the whole point of this note, a correspondence between provability and computability and senses in which they are equivalent for computably enumerable theories.

It’s a classic theorem of Turing that the halting problem is not computable. That is, there is no Turing machine that accepts an index \(e\) of a Turing machine and determines whether it halts on input \(e\) (by some diagonalisation, this is equivalent to the question of whether a given program \(e\) halts on a given input \(x\)). We now interpret this in terms of non-provability.

In a certain way, determining whether \((\exists x) \phi(x)\) in the standard model is equivalent to the question of whether a certain Turing machine halts. Simply for each step \(s\), we check \(\phi(s)\), and halt if it is true. In the same way, determining whether \((\forall x) \phi(x)\) in the standard model is equivalent to the question of whether a certain Turing machine does not halt, we halt if we encounter an \(s\) for which \(\neg \phi(s)\).

I think the utility of this analogy is that it drives home how arithmetic fact can be understood as fully determined. A Turing machine either exhibits certain behaviour or it doesn’t, whether it halts on an input is an intrinsic nature of the source code. If a machine halts, we will be able to observe this by simply letting the machine run. However, if the machine does not halt we will never “see” it. If we call it quits after a day of runtime, it might have halted on the second. We would have to do some kind of logical analysis of the source code. Non-provability of halting just means that this analysis may be impossible in a given set of axioms, say PA or ZFC.

The halting problem essentially tells us that for at least one program (in fact, infinitely many), we have non-halting and we cannot determine this with any analysis within PA or ZFC. Suppose that whenever a Turing machine halted, ZFC could prove it. ZFC is computably enumerable, and so we can make a list of all logical sequences possible from axioms of ZFC. We can hence enumerate all proofs of true statements in ZFC. Now, for each Turing machine \(e\) on input \(e\), ZFC will either prove that it halts or prove that it doesn’t. We can therefore craft a Turing machine that given \(e\) will look for a proof that the machine with index \(e\) halts on input \(e\) and simultaneously a proof that it doesn’t. It will find one or the other. We can have the machine halt and print \(1\) if a proof in the affirmative is found, and \(0\) otherwise. This Turing machine will solve the halting problem, contradicting non-computability. Hence there is at least one machine whose halting behaviour is independent of ZFC. By repeating this argument with machines with index \(e > N\), we get this independence for infinitely many machines.

So let’s take a machine which does not halt, but this fact is independent of ZFC. What does this mean? Well, in this model \(M\), it is true that for some \(s \in {\mathbf N}^M\), the machine halts at stage \(s\). Clearly \(s\) cannot be a standard natural number, so must be non-standard. We can think of this as the machine halting after an apparently infinite length of time, which our model believes is finite because its understanding of “finite” is out of whack with that of the external world.

I would conclude this section with the following: \(\Sigma_1\) arithmetic problems are such that we can verify in finite time whether they are true (the associated Turing machine will halt), but necessarily cannot always verify in the negative when they are false (if false, the associated Turing machine will not halt and we may not be able to determine this). \(\Pi_1\) arithmetic problems are just the negation of this: if they are false we can verify this in finite time, but if they are true we cannot do so in general.

Thinking about spectra

We conclude with a brief discussion of spectra. We want to continue this message of “some things are true but unprovable”. But so far we’ve only discussed arithmetic problems and it may be initially unclear what “absolute truth” means outside of this context. The answer comes back to the determined nature of Turing machine halting.

We start by coding operators via a Turing machine. Let’s say we have a family of suitably computable (where the meaning of this term depends on the computational problem) operators \(\Omega = \{A_n : n \in \mathbf N\}\) with a collection of Turing machines \(\Lambda\) providing information about the operators in \(\Omega\). Fix some computable enumeration \(\{x_j : j \in \mathbf N\}\) of \(\mathbf Q + i \mathbf Q\). There then exists a “\(\Pi_3\)” tower of algorithms \(\langle \Gamma_{n_3, n_2, n_1}\rangle_{(n_3, n_2, n_1) \in \mathbb N^3}\) solving the spectral problem \((\Omega, \Lambda)\). We can then describe the set \(\{(n, j, k) : \sigma(A_n) \cap B_{2^{-j}}(x_k) = \emptyset\}\) from \(\Gamma_{n_3, n_2, n_1}(A_n)\) via arithmetic. In principle, we can then understand this set in PA and the standard model of natural numbers, even though writing “\(\sigma(A_n) \cap B_{2^{-j}}(x_k) = \emptyset\)” does not make strict technical sense outside ZFC. Technically, we should (and implicitly do) swap it out for the \(\Pi_3\) arithmetic formula to which it is equivalent in ZFC. We give some caveats below.

Of course, in doing this we put some amount of trust in the arithmetic soundness of ZFC. We have to trust that our construction (which would initially be in ZFC), makes sense when interpreted in the “real world”. Specifically we trust that all Turing machines that are believed to behave a certain way (for example, computing a certain real number), actually do behave this way in the real world. I’m not too concerned about this soundness assumption, because we implicitly make it throughout our discussion. If one is fond of physics, you can understand the “real world” as being the true nature of the physical system as predicted by the model.

While general spectral problems are \(\Pi_3\), many have more favourable classifications. If we can solve the spectral problem \((\Omega, \Lambda)\) with \(\Sigma_1\) classification (convergence from below) in the SCI hierarchy and if \(\sigma(A_n) \cap B_{2^{-j}}(x_k) \ne \emptyset\) is true, then we can use the \(\Sigma_1\) algorithm to verify this in finite time. The converse is also true: knowing which balls with rational centre/dyadic radius intersect the spectrum, we can reconstruct it from below. Similarly to the halting problem, we then have that if the spectral problem \((\Omega, \Lambda)\) is *not* \(\Sigma_1\), then there is an operator \(A_n\) for which the ball \(B_{2^{-j}}(x_k)\) intersects the spectrum (understanding this via Turing machines), yet this cannot be proven in ZFC. This extends to \(\Pi_1\): we have a \(\Pi_1\) algorithm iff we can verify \(\sigma(A_n) \cap B_{2^{-j}}(x_k) = \emptyset\) in finite time whenever it is true.

Unfortunately, most consequences I have formulated within this framework have not been terribly exciting to me. Most lower bounds in the “Markov model” come down to embedding a Turing machine problem (halting problem for showing \(\not \in \Sigma_1 \cup \Pi_1\), \(\mathbf{Tot}\) for showing \(\not \in \Delta_2\) and \(\mathbf{Cof}\) for showing \(\not \in \Delta_3\)) into an operator, so we can understand the non-provability results as a direct translation of the corresponding results for Turing machines. Hopefully we will find a case with deeper consequences.