
1.5. Enter the computer

Thus far, formalized mathematics seems an unpromising idea. Bourbaki may be right that such formalization is simply impractical for people to do. In any case, as [naur-pvf] points out, formalization sometimes leads to more errors, not fewer, as the underlying intuition which provides a check on correctness begins to get lost --- so even one of the claimed merits of formalization is open to question. The problems of formalization have been well put by [constable-programs]:

It is thus possible to translate the proof of any mathematical theorem into a completely formal proof. However, the prospect of actually doing this is quite daunting because an informal proof of modest length will expand to a formal one of prodigious size and will require in its production extreme care and detailed knowledge of the more or less arbitrary conventions of the particular formalism. These tedious details will in sheer number dominate the interesting mathematical ideas which are the raison d'être of the proof.

But like the authors just quoted, we believe that the arrival of the computer changes the situation completely, for several reasons.

  1. As Naur and others such as [vangasteren-shape] point out, the less one relies on intuition and the more on rules of formal manipulation, the more important accuracy in those manipulations becomes. This is tedious for people, but checking conformance to formal rules is one of the things computers are very good at. Indeed, the Bourbaki claim that the transition to a completely formal text is routine, but too boring and tedious for people, seems almost an open invitation to give the task to computers.

  2. The computer need not simply check the correctness of formal arguments, but can try to help in their construction. We will discuss later in more detail just how this can be done, but at one extreme, it is sometimes possible for computers to find proofs of theorems completely automatically! Furthermore, the computer can provide interface assistance, helping to mitigate the austerity of formal deductive calculi and to organize the user's work.

  3. Among modern computer scientists, formal logic is quite widely known and substantially used; many regard formal logic as the 'applied mathematics' underlying computer science, just as real analysis etc. underlies physics and engineering. [dijkstra-invariance] has remarked that 'computing scientists may well have been the first to use the predicate calculus regularly [...] as far as the mathematical community is concerned George Boole has lived in vain'.

  4. Computer correctness is itself a topic of concern, especially since computers are used in safety-critical systems like fly-by-wire aircraft, antilock braking systems, nuclear reactor controllers and radiation therapy machines. One means of verifying a design is to prove that a mathematical model of the system satisfies some formal properties, designed to correspond to real-world requirements. Such proofs are much more technically intricate, albeit perhaps shallower, than typical mathematics proofs, and of necessity involve very complicated formal manipulations.

It isn't very surprising that the computer should make such a dramatic difference; after all, it has revolutionized work in many scientific disciplines. Previously unthinkable ideas like detailed simulations of weather systems have become possible; by comparison, formalizing mathematics in a deductive calculus demands fairly modest computer power.

[mackenzie-proof] traces the history of computerized theorem proving. The early experiments were mainly concerned with completely automatic proofs of theorems, and the work can broadly be divided into two tracks. Work in the Artificial Intelligence tradition, typified by [newell-simon], tried to emulate the way mathematicians actually think. Other, more successful work, exemplified by [wang-mm], simply relied on the speed of the computer, using exhaustive search for a proof in certain formal systems for (usually) first order logic. Of course, even here the search wasn't completely blind: at the very least the formal systems were chosen to limit the search space. For example, the pioneering work of [prawitz-pp] was based on Gentzen's cut-free sequent systems, in the guise of 'tableaux'.

The next step forward, discussed by [kanger], was to use free or metavariables, discovering the right instantiations gradually during search. Subsequent implementations of tableaux and similar calculi such as model elimination [loveland-me1] invariably used some such technique. These methods are 'global', in that the instantiations need to be applied throughout the partial proof tree. It's therefore not surprising that the modern incarnations rely heavily on Prolog technology; see the work of [stickel-pttp] and [beckert-posegga] for example.

Another stream of work approached the problem bottom-up rather than top-down. These 'local' methods sacrifice goal-directedness but make instantiation cheaper and, since all variables in intermediate results are effectively universal, allow lemmas to be garnered and then re-used with different instantiations. The 'inverse method' of [maslov-inverse] was presented in just this way, as a bottom-up approach to proof in cut-free sequent calculus. The most famous method, though, was resolution, invented by [robinson-resolution]. This used just a single rule of inference together with unification (introduced explicitly for the first time by Robinson), and its introduction caused a renaissance of interest in automated theorem proving. Otter, directly descended from this line of research, has recently achieved impressive results, settling open problems in mathematics or producing quite new theorems [anl-new-results].
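
To give a flavour of the machinery involved, here is a small sketch of syntactic first order unification in OCaml. The term datatype, the function names and the representation of substitutions as association lists are illustrative choices made here for brevity, not taken from any particular prover.

  (* Terms of first order logic: variables and function applications;
     constants are 0-ary functions. *)
  type term =
    | Var of string
    | Fn of string * term list

  (* Apply a substitution, represented as an association list from
     variable names to terms, resolving chains of bindings. *)
  let rec subst env tm =
    match tm with
    | Var x ->
        (match List.assoc_opt x env with
         | Some t -> subst env t
         | None -> tm)
    | Fn (f, args) -> Fn (f, List.map (subst env) args)

  (* Does variable x occur in tm once the substitution is applied? *)
  let rec occurs env x tm =
    match subst env tm with
    | Var y -> x = y
    | Fn (_, args) -> List.exists (occurs env x) args

  (* Extend env to a substitution under which s and t become equal;
     fail if this is impossible. *)
  let rec unify env (s, t) =
    match subst env s, subst env t with
    | Var x, Var y when x = y -> env
    | Var x, t | t, Var x ->
        if occurs env x t then failwith "unify: occurs check"
        else (x, t) :: env
    | Fn (f, fargs), Fn (g, gargs) ->
        if f = g && List.length fargs = List.length gargs
        then List.fold_left unify env (List.combine fargs gargs)
        else failwith "unify: clash"

  (* Example: unifying P(x, f(a)) with P(g(y), f(y)) yields a
     substitution sending x to g(a) and y to a. *)
  let () =
    let a = Fn ("a", []) in
    let env =
      unify [] (Fn ("P", [Var "x"; Fn ("f", [a])]),
                Fn ("P", [Fn ("g", [Var "y"]); Fn ("f", [Var "y"])])) in
    assert (subst env (Var "x") = Fn ("g", [a]) && subst env (Var "y") = a)

Resolution then combines two clauses containing complementary literals that can be unified; the point of the sketch is simply that unification is a purely mechanical operation which a program can perform far more reliably than a person.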

However, general first order provers tend to be inefficient in restricted areas compared to specialized algorithms. Numerous special algorithms which tackle certain areas of mathematics more efficiently have been devised, for example for tautology checking [bryant,stalmarck-patent], linear arithmetic [shostak-presburger] and equational reasoning [knuth-bendix], not to mention the panoply of techniques used by computer algebra systems. The NQTHM prover, developed by [boyer-acl], is a general theorem prover for quantifier-free arithmetic; since the logic is quite simple, powerful proof automation, notably for induction proofs, is feasible. Though the logic is restricted, it is amazingly expressive in the hands of a skilled practitioner, and NQTHM has been used in a wide variety of applications.

Nevertheless, the achievements of fully automatic provers soon began to reach a plateau, well below the ability to prove many substantial mathematical theorems unaided. The emphasis moved to the development of 'proof assistants', 'proof checkers' and 'interactive theorem provers', which are closer to our present interests. The idea is not to prove difficult theorems automatically, but simply to assist a human in constructing a formal proof, or at the very least to check its correctness. Several pioneering projects for the computer formalization of mathematics appeared in the 1970s. A notable example was the Automath project led by [debruijn-Automath]. In this project, a language for formal mathematics was designed, together with a computer system for checking the correctness of formal texts. Significant parts of mathematics were proof-checked; for example, [jutting-thesis] formalized the famous book by [landau] on the construction of the real number field. The history of the project and the lessons derived from it are detailed by [nederpelt-automath]. Though the project was important and influential, the Automath system is hardly used today.

One of Automath's difficulties was that it was ahead of its time technologically, always struggling to make the computer keep pace with the ideas. Perhaps another factor contributing to its decline was that it used some rather original notational and foundational ideas (the banishment of named bound variables and the exploitation of the Curry-Howard correspondence, for example). The Mizar project, which began a little later, is still quite vital today, and one of the reasons may be that, as its originator [trybulec-ilcc] explicitly stated, it attempted not to depart too radically from the usual accepted practices of mathematics. Though until recently it was not well known within the theorem proving community in the West, Mizar has been used to proof-check a very large body of mathematics, spanning pure set theory, algebra, analysis, topology, category theory and various unusual applications like mathematical puzzles and computer models. Selected Mizar articles are automatically abstracted and printed in human-readable form in the journal 'Formalized Mathematics'.

There are reasons for dissatisfaction with very low-level proof checking à la Automath, simply because of the tedium involved and the unreadability of the resulting proofs. But fully automatic theorem provers, though they work well in some situations, are often hard to use in practice. Usually they need to be led to the required result via a carefully graded series of lemmas, each of which they can prove automatically. Choosing this series often requires intimate knowledge of the system if one is not to lead it up a blind alley. And if one is interested in the proof, a completely automated prover may not be what is wanted.

These disadvantages of both approaches led Milner to develop Edinburgh LCF [gordon-lcfbook]. In LCF-like systems, the ML programming language is used to define data types representing logical entities such as types, terms and theorems. A number of ML functions are provided which produce theorems; these implement the primitive inference rules of the logic. The use of an abstract type of theorems, with these inference rules as its only constructors, ensures that theorems can only be produced this way. However, the user is free to write arbitrarily complex proof procedures which ultimately decompose to these primitive inferences. So in principle any automatic proof procedure can be translated into an LCF program, with the guarantee that even if the program has errors, no false 'theorems' will arise (the program may fail or produce a theorem other than the one intended, of course, but the latter is easy to guard against and both are rare). It seems in practice that this can be done reasonably efficiently; we discuss this issue in more detail later. HOL [gordon-holbook], which uses classical higher order logic rather than the LCF logic, takes this a step further by insisting that all theories be built up using special definitional extension mechanisms, which guarantee that consistency is preserved. Such an approach is consonant with the LCF philosophy, since it pushes the burden of consistency proofs back to the beginning, once and for all, so that all extensions, whether of the theory hierarchy or of the proof mechanisms, are correct by construction.
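
The essential mechanism is easy to illustrate. The following is a minimal sketch in OCaml (a descendant of the original ML) for a toy propositional logic; the formula type, the module name Thm and the two rules shown are simplifications invented for this illustration, not the actual LCF or HOL kernel.

  (* Formulas of a toy logic with implication as the only connective. *)
  type formula =
    | Atom of string
    | Imp of formula * formula

  (* The type of theorems is abstract: outside this module the only way
     to obtain a value of type Thm.thm is via the inference rules listed
     in the signature, so every theorem is correct by construction. *)
  module Thm : sig
    type thm
    val dest : thm -> formula list * formula   (* hypotheses and conclusion *)
    val assume : formula -> thm                (* ASSUME:  p |- p *)
    val mp : thm -> thm -> thm                 (* MP: from p ==> q and p, infer q *)
  end = struct
    (* Internally a theorem is just a list of hypotheses and a conclusion. *)
    type thm = formula list * formula

    let dest th = th

    let assume p = ([p], p)

    (* Union of hypothesis lists, avoiding duplicates. *)
    let union xs ys = xs @ List.filter (fun y -> not (List.mem y xs)) ys

    let mp (h1, c1) (h2, c2) =
      match c1 with
      | Imp (a, b) when a = c2 -> (union h1 h2, b)
      | _ -> failwith "mp: rule does not apply"
  end

  (* Derived proof procedures are ordinary ML functions; whatever they do
     internally, they can only return theorems by calling the primitive
     rules, so a bug may make them fail but cannot yield a false theorem. *)
  let () =
    let p = Atom "p" and q = Atom "q" in
    let th = Thm.mp (Thm.assume (Imp (p, q))) (Thm.assume p) in
    (* th is the theorem   p ==> q, p |- q   *)
    let hyps, concl = Thm.dest th in
    assert (concl = q && List.length hyps = 2)

The real systems follow the same pattern, with a richer term language and a larger (but still small and fixed) set of primitive rules; all the higher-level proof procedures are ordinary programs layered on top of this interface.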

Many of the systems described here were not primarily developed in order to formalize textbook mathematics. For example, the HOL system was designed expressly for the purpose of verifying computer hardware. As we said above, this is a promising new application area for theorem provers, and many, notably NQTHM and HOL, have been applied successfully to it. This is still mathematics of a sort, and often quite a bit of pure mathematics is needed in a supporting role --- see the work of [harrison-fmsd94] for example. But the theorems proved tend to be different: shallower but more technically complex. It may be that proof techniques which work well in pure mathematics are unsuitable for verification, and vice versa. The LCF style offers the most hope here, since its inherent programmability allows multiple proof styles to be based on the same logical core.

