Provability and the Halting Problem

I’ve written this post before but looking back I don’t think it was particularly good. Too many symbols and not enough explanation maybe. In this post I’ll expand on the sense in which a failure of computability implies a failure in provability.

Turing Machines

Turing machines are a way to treat computer programs as a mathematical object, named after Alan Turing. We imagine the machine as consisting of:

a two-sided-infinite tape \(M\). Initially, the tape will be written with finitely many (perhaps zero) consecutive \(1\)s, with blank symbols written either side;
a state register \(Q\) which determines the subroutine that the program is currently following. \(Q\) should consist of at least two states: the starting state \(q_1\) and the halting state \(q_0\). If the machine changes its state to \(q_0\), the machine does no further operations, we say that the machine has “halted”.

The machine is equipped with a read/write head which hovers over one cell at a time. At each stage, the machine reads the value of the cell it is currently hovering over, as well as the current state of the machine, and can decide to change the state of the machine, replace the value in that cell (either write it as \(1\) or change it to a blank symbol) or move the r/w head left/right across the tape. We may also leave the action of the head, given a particular state and symbol read, undefined. In this case, the machine will have entered a “stalling state” where no further operations will occur. We do not consider this to be a halting state, rather the machine has somehow entered an illegal configuration and cannot perform any further operations. For example, a divisor may have become zero.

We consider the action of the read/write head to be a function \(\delta : Q \times \{1, B\} \to Q \times \{1, B\} \times \{\text{left}, \text{right}\}\), where \(B\) is the blank symbol. We call this the “transition function” of the machine. We understand the “input” of the machine to be the number of consecutive \(1\) initially loaded onto the tape, (we do not distinguish cells) and the “output” (if the halting state \(q_1\) is reached) to be the number of \(1\)s on the tape after halting has been achieved. With this convention, we can associate a Turing machine with a particular type of partial function. Given a Turing machine \(M\), we define \(f_M(n)\) as the output of the machine \(M\) with input \(n\) if it halts, and leave \(f_M\) undefined otherwise. We write \(f_M \uparrow\) if \(M\) does not halt on input \(n\), and \(f_M \downarrow\) otherwise. We call partial functions that can be represented in this way computable.

Despite seeming very simple and primitive, the above model of computation represents any practically implementable computer program. This idea is called the Church-Turing thesis, a procedure can be simulated on a Turing machine if and only if it can, in principle, be carried out by hand by a human. We can therefore think of computable functions as functions which can be implemented as a Python function. Clearly we cannot hard code each function value, so we must have a procedure that accepts \(n\) applies some procedure to it, (which must take the form of a finite set of instructions, possibly with loops that allow for arbitrarily long runtime) which can theoretically be replicated by hand, to obtain \(f(n)\). If a function is not computable, there exists no such procedure. This does not immediately mean that we cannot calculate a particular function value \(f(n_0)\), it means that there is no procedure to calculate \(f(n)\) that is uniform in \(n\), somehow we must adapt our procedure for each \(n\). I say immediately, because actually the non-computability of a function does mean that certain function values are not proven by ZFC, despite being possibly entirely determined in the real world (more on this later).

The language and axioms of arithmetic

In this section I will do a kind of whistlestop tour of theories and models in logic. Essentially, a theory is a template for some structure in mathematics: for example, a group or a universe of set theory. A “language” \(\mathcal L\) consists of the following:

constant symbols which denote named objects in the theory;
function symbol which represent named functions in the theory;
relation symbols which represent named relations in theory.

For example, if we are to take the language of group theory, we have a constant symbol \(e\), which is named as the identity element, and a function symbol \(\circ\), which is named as the group operation. In the language of set theory, the only symbol we introduce is \(\in\), which is named as the membership relation.

We think of a theory \(T\) on a language \(\mathcal L\) as certain governing rules for the constant symbols, function symbols and relation symbols. In the theory of groups, we specify that:

\(e\) behaves as an identity, \(e \circ x = x \circ e\) for each \(x \in G\);
every element is invertible: for all \(x \in G\) there exists \(y \in G\) such that \(x \circ y = y \circ x = e\);
the \(\circ\) operation is associative: \(a \circ (b \circ c) = (a \circ b) \circ c\) for each \(a, b, c \in G\).

We then have a model of the theory of group theory: a set \(G\) and a binary operation \(\circ\) which satisfies all of these properties. Every statement that follows directly from the three axioms of group theory must hold for every group. On the other hand, if a group property does not follow from the group axioms (and first order logic), then there must be some group that does not satisfy this property. For example, the claim that for all \(x, y \in G\) we have \(x \circ y = y \circ x\) cannot be proven from the axioms of group theory, and there exist many such “abelian” groups. Equivalently a property (that can be formulated in terms of \(e\), \(\circ\) and first-order logic) that is true of all groups can be proven from finitely many steps from the group axioms and first order logic. This is Godel’s completeness theorem.

I introduced the theory of groups first because it is a good example of a theory where someone will have no preconception of what a model should look like. In the case of arithmetic, we have very strong preconceptions about what should and should not be the case. We often think of arithmetical truth as entirely determined, seeming to represent an actual cognitive process of counting. Given a sum or product, we can verify it by hand by counting on many fingers, there is no real room for any indeterminacy. Either there exists a natural number with a property and we can verify this, or there doesn’t. Whether we can prove that there does not exist a natural number with a certain property is an entirely separate question. And recalling Godel’s completeness theorem, if an arithmetic truth cannot be proven, then there are structures that satisfy the axioms of arithmetic where this arithmetic truth is false. We will explore this a bit more in the next section.

I’ll finish by formally introducing the language of arithmetic. We have the two constant symbols \(0\) and \(1\), which represent the additive and multiplicative identities respectively, the successor \(s\) (which outputs the “next” natural number and seems to force some kind of discreteness to the structure), the addition operation \(+\) and the multiplication operation \(\times\). The axioms of Peano Arithmetic are of no surprise and are hence omitted. It includes the principle of induction, that if a property in the language of natural numbers hold for \(0\), and whenever it holds for \(n\) it also holds for \(n + 1\), it must hold for all natural numbers.

Unprovable truths and non-standard models

We first clarify what we mean when we speak of a proof, with a view to enumerate them. Let \(T\) be a theory in a language \(\mathcal L\). By a \(T\)-proof of a \(\mathcal L\)-formula \(\phi_n\) we mean a finite sequence of formulas \(\phi_1, \phi_2, \ldots, \phi_n\) such that for each \(i\):

\(\phi_i\) is a logical axiom;
\(\phi_i\) is an axiom of \(T\);
there are \(j, k < i\) such that \(\phi_j\) is equal to the formula \(\phi_k \to \phi_i\) [modus ponens];
there is \(j < i\) such that \(\phi_i\) is equal to \(\forall x \phi_j\) [generalisation].

If a theory \(T\) proves a formula \(\phi\), we write \(T \vdash \phi\) and say that \(T\) proves \(\phi\). We call \(\phi\) a theorem of \(T\). Gödel’s completeness theorem says that for any given formula, \(T\) either proves \(\phi\) or there is a model of \(\phi\) in which \(\neg \phi\) is true. It may be that \(T\) proves neither \(\phi\) nor \(\neg \phi\), making \(\phi\) independent of \(T\) and meaning that there are models of \(T\) where \(\phi\) is true, and others in which it is false. For example, the continuum hypothesis is independent of ZFC: there are models in which CH is true (for example, the Von Neumann universe), and others in which it is false.

Surprisingly, this notion of proof is sufficient to encompass everything which we would like to consider a proof. Modus ponens simply allows us to follow implications, if we have proven that \(A\) implies \(B\) and we have proven \(A\), then we should have a proof of \(B\). Generalisation is perhaps a bit less clear. It allows us to carry out a proof of \(\phi(x)\) for an arbitrary variable \(x\), with the steps being valid for any \(x\), and deduce \((\forall x) \phi(x)\). It is notable here that if, in say \(\mathbf {PA}\), we have proofs of \(P(0)\), \(P(1)\), \(\ldots\), then we do not necessarily have a proof of \((\forall n) P(n)\). A proof of \((\forall n) P(n)\) must be finite so we cannot simply glue the previous proofs together, we must have a proof that proves \(P(n)\) for sufficiently large \(n\). We are then able to deal with finitely many exceptional cases individually if needed. We will return to this idea.

Now that we have a formal concept of “proof”, we can talk about enumerating them. Gödel numberings give a method to code a formula in a countable language as a natural number, and also translate natural numbers back into a formula by looking at prime decompositions. Though the numbers that will be produced and have to be factored will get extremely large very quickly, we are not particularly concerned with runtime and merely ask whether particular computations are possible. We treat this Gödel number as being as good as having the formula itself for this reason, because we know it can be translated in finite time. Note that formulas with different Godel numbers may be equivalent, we do not check for equivalence at any point.

With these numberings in hand we talk about computably enumerable theories. This is a theory where one is able to systematically list its axioms. Of course, there may be infinitely many of them, however there is a procedure which if carried out for sufficiently long will eventually spit out any given axiom in the theory. Formally, a computably enumerable theory is one for which there exists a Turing machine which accepts \(n\) and outputs the Godel numbering of the \(n\)th axiom of the theory. There may be repeats or axioms that are actually equivalent but nonetheless every axiom should appear in the list produced, and everything in the list produce should be an axiom of the theory. ZFC and Peano Arithmetic are both computably enumerable.

With a machine \(\phi_x\) witnessing the computable enumerability of a theory \(T\), we can finally talk about enumerating the theorems of \(T\). At each stage \(n\) we grab the first \(n\) axioms from the machine \(\phi_x\) and see what proofs of length at most \(n\), involving at most \(n\) variables we can form with them. Since there are only finitely many rules and finitely many variables they can be applied to, there are only finitely many possible candidates for a proof at each stage. Conversely, every proof involves only finitely many steps and finitely many variables, so will eventually be counted within this process. Hence, if we have a theorem \(\phi\) of \(T\), we will eventually discover that it is a theorem of \(T\). We do not, however, have a similar way to discover whether \(T\) does not prove \(\phi\), it will merely be the case that no proof of \(\phi\) will ever be discovered.

The halting problem

We now consider the halting problem. Informally we ask whether there is a Turing machine that can accept the source code of another Turing machine and determine the halting behaviour. We first show that there is no machine that accepts a Godel number \(e\) of a Turing machine and outputs \(1\) if the machine halts and \(0\) otherwise. That is, there is no machine that computes the characteristic function of the set \(K = \{e \in \mathbb N : \phi_e(e) \downarrow\}\).

Suppose that there existed such a machine. We could then define another Turing machine \(T\) (with associated partial function \(f\)) as follows: accept a natural number \(e\) and check whether \(e \in K\). If \(e \in K\), then output \(\phi_e(e) + 1\). Otherwise, output \(0\). However, this machine is then not represented by a partial computable function \(\phi_x\): if \(x \in K\) then \(f(x) = \phi_x(x) + 1 \ne \phi_x(x)\), and if \(x \not \in K\) we have \(f(x) = 0\) while \(\phi_x(x) \uparrow\). Hence \(f\) is not a partial computable function and the machine \(T\) cannot exist. This shows that \(K\) is not a computable set and the halting problem is non-computable.

We now think about implications to provability. Before we dig in, I will discuss the phrase “in the real world” as applied to Turing machines. In what follows we accept that the behaviour of a Turing machine is determinate. We can implement any Turing machine in a sensible programming language (e.g. Python), and it will either halt on a particular input or it won’t. This is entirely independent of any mathematical analysis we do to the machine and we are at the mercy of our axiomatic systems to be able to prove facts about it (indeed, we also have to trust it not to wrongly predict the behaviour of these machines).

If a machine halts, then we can prove this within any theory formalising arithmetic: whether the machine has halted by the \(s\)th step can be determined in finite time from the axioms of arithmetic simply by tracing the action of the machine. However if the machine doesn’t halt, there is no obvious way to prove this without an analysis of the machine’s source code, which our axiomatic system may not be strong enough to perform. Intuitively it would also mean that the source code has to be presented in a way that is convenient for such analysis (while a solution to the halting problem must work for source code for a particular procedure, of which there will be infinitely many).

Now that we have that the halting problem is not computable, we can go ahead and deduce that no computably enumerable theory extending PA (e.g. ZFC) is strong enough to prove the halting behaviour of all Turing machines. Let’s take ZFC for familiarity. We assume ZFC is consistent, otherwise is proves every statement in the language of set theory and all this fuss is for nothing. Let’s say that whenever a Turing machine \(e\) fails to halt on input \(e\), it proves this. [1] Then we can have a procedure that accepts a natural number \(e\) and does the following on the \(s\)th step:

Check the first \(s\) theorems of ZFC to check whether any of these theorems prove that \(\phi_e(e)\) halts. If one is found, print \(0\);
Simulates the Turing machine \(\phi_e(e)\) for the first \(s\) steps. If this simulated machine halts, print \(1\).

This machine will solve the halting problem, a contradiction. So there exists a machine that fails to halt in the real world (see a few paragraphs up), but ZFC is unable to prove this fact. Assuming the consistency of ZFC, we can easily write down such a machine. By Godel’s incompleteness theorem, ZFC does not prove its own consistency. Hence the arithmetic statement “there does not exist a Godel number \(e\) coding a proof of \(1 = 0\)” is not provable in ZFC. We can then construct a Turing machine which searches for a ZFC proof of \(1 = 0\), and since ZFC is consistent this cannot halt. Further, ZFC can prove that this machine fails to halt if and only if ZFC is consistent. Hence ZFC is unable to prove that this machine does not halt, given that ZFC is consistent.

The idea now is that we will “embed” the halting problem into a functional analysis problem.

Functional analysis?

We consider the separable Hilbert space \(\ell^2(\mathbb N)\) and let \(\langle e_n\rangle\) be its standard basis consisting of \(e_n = (0, 0, \ldots, \underbrace {1}_{n^{\text{th}} \text { place}}, 0, 0, \ldots)\). Let \(A\) be a bounded operator. For each \(n\), \(A e_n\) will be a square-summable sequence of complex numbers. We can line these up, as we do in finite dimensions, to form an infinite matrix. For example, the matrix associated with the left shift will be the infinite matrix:

\[\begin{pmatrix}0 & 0 & 0 & \ldots \\ 1 & 0 & 0 & \ldots \\ 0 & 1 & 0 & \ldots \\ 0 & 0 & 1 & \ldots \\ \vdots & \vdots & \vdots & \ldots\end{pmatrix}\]

We speak of the \((i, j)\)th “matrix element” just as we do for finite matrices, though I suppose we should start the indices at \(0\). We can see that right-multiplying the matrix by a vector \(x\) calculates \(A x\) just as in the finite dimensional case. We speak about matrices of the above type that are computable. To avoid discussion about computation of real numbers, let’s focus on matrices with elements in \(\mathbb Q + i\mathbb Q\), which can be coded as a \(4\)-tuple of integers. Let’s call an infinite matrix as formulated here, (weakly) computable if there is a Turing machine that accepts \((i, j)\) and gives a natural number that codes the \((i, j)\)th matrix element.

We now ask whether the problem of determining whether such an infinite matrix is invertible, is computable. Let \(\Lambda\) be the collection of all Turing machines which identify an infinite matrix in the above way. We ask whether there exists a machine \(T\) that accepts a natural number and outputs \(1\) if it codes an invertible infinite matrix, and \(0\) otherwise. Invertible here means that there exists a bounded linear operator \(B\) such \(A B = B A = I\), where \(I\) is the identity operator. A diagonal operator in \(\ell^2(\mathbb N)\) is invertible if and only if its sequence of diagonal entries is bounded away from zero (if \(A e_n = \lambda_n e_n\) for each \(n\) then, given \(\lambda_n \ne 0\) for each \(n\), we have \(A^{-1} e_n = \lambda_n^{-1} e_n\) for each \(n\)).

Suppose there exists such a machine \(T\). Associated with the machine \(\phi_e\), we form another machine \(T_e\) describing a diagonal matrix \(A_e\) as follows:

Simulate the Turing machine \(\phi_e(e)\);
At every stage \(s\) at which the Turing machine has not halted, write a \(1\);
Otherwise, write a \(0\).

If \(\phi_e(e)\) halts at any point, \(A_e\) will be a finite rank matrix which is not invertible (since it will eventually have all columns \(0\)). Otherwise, \(A_e\) will be equal to the identity matrix which is invertible. We can then form a solution to the halting problem as follows:

Accept \(e \in \mathbb N\);
Write down the machine \(T_e\) and compile its Godel number;
Input this Godel number into the machine \(T\);
Print its output.

Hence, such a machine \(T\) does not exist. Hence the problem of determining whether an infinite matrix is non-computable and there are infinite matrices whose invertibility is not provable in ZFC when described in a particular way. I should emphasise the latter. Take a Turing machine with index \(e\) which does not halt on input \(e\) yet ZFC doesn’t prove this. Then \(A_e\) will be the identity matrix, which by itself is clearly and provably invertible. Yet, to prove that \(A_e\) is invertible in ZFC would be to prove that \(\phi_e(e)\) halts. Hence rather than being an intrinsic property of the infinite matrix at hand, the failure of provability is due to how we have described the matrix. It goes without saying that some presentations of an object will be more useful for computing a given quantity than others, and I see this as an instance of that fact.

The ideal is that we could write down a matrix whose matrix elements are a fairly explicit function of \(i, j\) (e.g. polynomials, trig, etc.) and show that we cannot prove its invertibility or non-invertibility, but I’m not sure how to do that yet.

What does this failure of provability mean?

A fairly broad question, but I believe there is satisfying answer to the question “how does ZFC fail to prove the halting behaviour of certain Turing machines”. Of course, Godel’s incompleteness theorem gives this but perhaps does not let you appreciate what’s “going on”. The issue lies in non-standard models of arithmetic. Godel’s completeness theorem says that a statement in the language of arithmetic is provable in ZFC if and only if it is true in each model. That is, true in every universe of set theory satisfying the axioms of ZFC. Hence, if “there does not exist a Godel number coding a ZFC proof of \(1 = 0\)” is not provable in PA/ZFC/etc., then there is a set-theoretic universe that satisfies the axioms of ZFC yet identifies “there does not exist a Godel number coding a ZFC proof of \(1 = 0\)” as false by its own notion of truth.

This means that the model \(M\) will point to one of its natural numbers \(e \in \omega^M\) and say “this number codes a ZFC proof of \(1 = 0\)”. While the “standard” natural numbers \(0\), \(1\), \(\ldots\) (say, \(\omega\)) which are externally finite successors of \(0\) will embed order-isomorphically into \(\omega^M\), we do not eliminate the possibility that there are elements of \(\omega^M\) falling outside of this copy of the standard natural numbers. The image of the embedding \(\omega \to \omega^M\) is an initial segment, meaning that these non-standard numbers are greater than any standard natural number. Certainly, \(e\) cannot be (the image of) a standard natural number here: if ZFC is consistent then “\(e\) codes a ZFC proof of \(1 = 0\)” is false. Since this statement can be resolved in finitely many arithmetic operators (it is a “\(\Delta_0\)” sentence), it is therefore proven false in PA, and hence is false in \(M\). Hence \(e\) is a non-standard natural number.

The punchline is now that this model \(M\) has identified a natural number larger than any natural number in the “real world” as coding a ZFC proof of \(1 = 0\), hence it does not represent a “real” finite proof. If we are to return to the language of Turing machines, if ZFC fails to prove the non-halting of some Turing machine, then there is a model of ZFC in which the machine halts at a “non-standard step”. One might want to think of this as allowing the Turing machine to run for “infinitely long” before it halts, while the model identifies it as having run for finitely long.

This all comes down to a disagreement between the model \(M\)’s notion of finiteness and finiteness in the “real-world”. This disagreement is highlighted by the Lowenheim-Skolem paradox, which implies that if ZFC is consistent, then it has a countable model. Hence we could externally list its sets as \(x_1, x_2, \ldots\). This model may not be particularly accessible: the membership relation will not be a computable relation (we could ask whether there exists a Turing machine that accepts \(n, m\) and outputs \(1\) if \(x_n \in x_m\) and \(0\) otherwise: there does not [2]), but nonetheless challenges the idea that cardinality is absolute.

Footnotes

Note that this is a fairly strong assumption: while it is the case that whenever a machine halts, ZFC is able to prove this, it is not necessarily the case that Turing machines halt in the real world whenever ZFC proves that they do so. This would require some kind of arithmetic (specifically \(\Sigma_1\)) soundness of ZFC, which is something even stronger than mere consistency.
https://mathoverflow.net/questions/12426/is-there-a-computable-model-of-zfc