Formalizing the proof of PFR in Lean4 using Blueprint: a short tour


Since the release of my preprint with Tim, Ben, and Freddie showing the Polynomial Freiman-Ruzsa (PFR) opinion over {mathbb F}_2, I (together with Yael Dilles and Bhavik Mehta) have actually begun a collaborative project to formalize this argument in the evidence assistant languageLean4 It has actually been less than a week because the job was released, however it is continuing rather well, with a considerable portion of the paper currently either totally or partly formalized. The job has actually been significantly helped by the Blueprint tool, which permits one to compose a human-readable “plan” of the evidence that is connected to the Lean formalization; comparable plans have actually been used for other projects, such as Scholze’sliquid tensor experiment For the PFR job, the plan can be discoveredhere One function of the plan that I discover especially appealing is the dependence chart that is instantly created from the plan, and can offer a rough photo of how far along the formalization has actually advanced. For PFR, the current state of the dependence chart can be discoveredhere At the existing time of composing, the chart appears like this:


The color coding of the different bubbles (for lemmas) and rectangular shapes (for meanings) is described in the legend to the dependency graph, however approximately speaking the green bubbles/rectangles represent lemmas or meanings that have actually been totally formalized, and the blue ones represent lemmas or meanings which are all set to be formalized (their declarations, however not evidence, have actually currently been formalized, along with those of all required lemmas and evidence). The objective is to get all the bubbles leading up to the “pfr” bubble at the bottom colored in green.

In this post I wish to offer a fast “trip” of the job, to offer a sense of how it runs. If one click the “pfr” bubble at the bottom of the dependence chart, we get the following:

image 1

Here, Blueprint is showing a human-readable kind of the PFR declaration. This is originating from the corresponding portion of the blueprint, which likewise features a human-readable evidence of this declaration that depends on other declarations in the job:

image 2

However, this part of the evidence has actually not yet been formalized in Lean. Observe that the “pfr” bubble is white, however has a green border. This implies that the declaration of PFR has actually been formalized in Lean, however not the evidence; and the evidence itself is not all set to be formalized, since some of the requirements (in specific, “Lean documentation for this assertion” (Theorem 6.16)) do not even have their declarations formalized. If we click the “Lean” link listed below the description of PFR in the dependence chart, we are result in the (auto-generated)

image 3

: G This is what a common theorem in Lean appear like (after a treatment called “lovely printing”). There are a variety of hypotheses mentioned before the colon, for example that 2 is a limited primary abelian group of order {bf F}_2^n (this is how we have actually selected to formalize the limited field vector areas A), that G is a non-empty subset of A (the hypothesis that A+A is non-empty was not mentioned in the LaTeX variation of the opinion, however we recognized it was needed in the formalization, and will upgrade the LaTeX plan quickly to show this) with the cardinality of K less than A times the cardinality of A, and the declaration after the colon is the conclusion: that c+H can be consisted of in the amount H of a subgroup G of c and a set 2K^{12} of cardinality at the majority of

H The astute reader might see that the above theorem appears to be missing out on a couple of information, for example it does not clearly assert that Source is a subgroup. This is since the “quite printing” reduces a few of the info in the real declaration of the theorem, which can be seen by clicking the “

image 4

” link: H Here we see that G is needed to have the “type” of an additive subgroup oftypes (Lean’s language revolves extremely highly around

, however for this trip we will not explain into what a type is precisely.) The popular “sorry” at the bottom of this theorem asserts that an evidence is not yet attended to this theorem, however the intent naturally is to change this “sorry” with a real evidence ultimately.ruzsa-nonneg Filling in this “sorry” is too tough to do today, so let’s search for an easier job to achieve rather. Here is a basic intermediate lemma “

image 5

” that appears in the evidence: d[X; Y] The expression X describes something called the entropic Ruzsa range in between Y and elsewhere in the project, which is something that is specified ruzsa-diff, however for the existing conversation it is trivial to understand its accurate meaning, aside from that it is a genuine number. The bubble is blue with a green border, which implies that the declaration has actually been formalized, and the evidence is all set to be formalized. {The plan dependence chart suggests that this lemma can be deduced from simply one preceding lemma, called “

image 6

“: ruzsa-diffruzsa-nonneg” is likewise blue and surrounded in green, so it has the very same existing status as “H[X]“: the declaration is formalized, and the evidence is all set to be formalized likewise, however the evidence has actually not been composed in Lean yet.|The plan dependence chart suggests that this lemma can be deduced from simply one preceding lemma, called “Shannon entropy“: Xelsewhere in the project” is likewise blue and surrounded in green, so it has the very same existing status as “

“: the declaration is formalized, and the evidence is all set to be formalized likewise, however the evidence has actually not been composed in Lean.} The amount |H[X] - H[Y]|, by the method, describes the 2 of as follows, specified

image 7

, however for this conversation we do not require to understand its meaning, aside from to understand that it is a genuine number.X, mu, Y, mu' Looking at Lemma 3.11 and Lemma 3.13 it is clear how the previous will indicate the latter: the amount currently formalized as

image 8

is plainly non-negative! (There is an aspect of PFR github repository present in Lemma 3.11, however it can be quickly counteracted.) It needs to be a simple job to fill in the evidence of Lemma 3.13 presuming Lemma 3.11, even if we still do not understand how to show Lemma 3.11. Let’s very first take a look at the Lean code for each lemma. Lemma 3.11 is formalized Visual Studio Code: lean4 extension Again we have a “sorry” to suggest that this lemma does not presently have an evidence. The Lean notation (along with the name of the lemma) varies a little from the LaTeX variation for technical factors that we will not enter into here. (Also, the variables

image 9

are presented at an earlier phase in the Lean file; once again, we will disregard this point for the taking place conversation.) Lemma 3.13 is X OK, I’m now going to attempt to fill in the latter “sorry”. In my regional copy of the Omega, I open the pertinent lean file in my editor (G, with the

) and browse to the “sorry” of “rdist_nonneg”. The accompanying “Lean infoview” then reveals the existing state of the Lean evidence: 2 Here we see a variety of ambient hypotheses (e.g., that

image 10

is an additive commutative group, that 0 leq 2 d[X;Y] is a map from 0 leq d[X,Y] to 0 leq 2 d[X;Y], etc; a lot of these hypotheses are not in fact pertinent for this specific lemma), and at the bottom we see the objective we want to OK, so now I’ll attempt to show the claim. This is achieved by using a series of “strategies” to change the objective and/or hypotheses. The primary step I’ll do is to put in the aspect of

that is required to use Lemma 3.11. This I will finish with the “is adequate” technique, composing in the evidence

image 11

I now have 2 objectives (and 2 “sorries”): one to reveal that linarith suggests

image 12

, and the other to reveal that0 leq 2 d[X;Y] (The yellow squiggly highlight suggests that this lemma has actually not been totally shown yet due to the existence of “sorry” s. The dot “.” is a syntactic marker that works to separate the 2 objectives from each other, however you can disregard it for this trip.) The Lean technique “is adequate” corresponds, approximately speaking, to the expression “It is adequate to reveal that …” (or more exactly, “It is adequate to reveal that … To see this, … It stays to confirm the claim …”) in Mathematical English. For my own education, I composed a “Lean phrasebook” of more correspondences in between lines of Lean code and sentences or expressions in Mathematical English, which can be discovered

image 13

image 14

Let’s fill in the very first “sorry”. The technique state now appears like this (cropping out some unimportant hypotheses):

Here I can utilize a convenient technique “|H[X]-H[Y]| leq 2 d[X;Y]“, which resolves any objective that can be obtained by direct math from existing hypotheses: 0 leq 2 d[X;Y] This works, and now the technique state reports no objectives delegated show on this branch, so we proceed to the staying sorry, in which the objective is now to show h:

image 15

Here we will attempt to conjure up Lemma 3.11. I include the following lines of code:

image 16

The Lean technique “have” approximately represents the Mathematical English expression “We have the declaration …” or “We declare the declaration …”; like “is adequate”, it divides an objective into 2 subgoals, though in the reversed order to “is adequate”.

I once again have 2 subgoals, one to show the bound Github Copilot (which I will call “h”), and after that to deduce the previous objective

image 25

from0 leq |H[X] - H[Y]| For the very first, I understand I need to conjure up the lemma “diff_ent_le_rdist” that is encoding Lemma 3.11. One method to do this is to attempt the technique “specific?”, which will instantly browse to see if the objective can currently be deduced right away from an existing lemma. It reports:

image 17

So I attempt this (by clicking the recommended code, which instantly pastes it into the right place), and it works, leaving me with the last “sorry”: here The lean technique “specific” corresponds, approximately speaking, to the Mathematical English expression “But this is precisely …”.Moogle At this point I need to discuss that I likewise have the Lean naming conventions for lemmas extension to Visual Studio Code set up. This is an AI which serves as a sophisticated autocomplete that can recommend possible lines of code as one types. In this case, it used a tip which was practically right (the 2nd line is what we require, whereas the very first is not needed, and in reality does not even put together in Lean): h In any occasion, “specific?” operated in this case, so I can disregard the idea of Copilot this time (it has actually been extremely helpful in other cases though). I use the “specific?” technique a 2nd time and follow its idea to develop the matching bound h':

image 18

( One can discover documention for the “abs_nonneg” techniquehere Copilot, by the method, was likewise able to fix this action, albeit with a somewhat various syntax; there are likewise a number of other online search engine offered to find this technique too, such asgeq Among the primary functions of the leq, by the method, is to help with the place of techniques such as “abs_nonneg”, which is simpler determine how to look for than a technique called (state) “Lemma 1.2.1”.) To fill out the last “sorry”, I attempt “specific?” one last time, to determine how to integrate X geq Y and Y leq X to offer the preferred objective, and it works!definitionally equal( Note that all the squiggly highlights have actually vanished, showing that Lean has actually accepted this as a legitimate evidence. The paperwork for “ge_trans” might be discoveredle_trans The reader might observe that this technique utilizes the

relation instead of the code golf relation, however in Lean the assertions

image 19


are “

image 20

“, enabling strategies such as “specific” to utilize them interchangeably. “specific

image 22

h’ h” would likewise have actually operated in this circumstances.) ruzsa-nonneg It is possible to compactify this evidence a fair bit by eliminating a number of intermediate actions (a treatment in some cases called “

image 23

“): ruzsa-nonneg And now the evidence is done! In the end, it was actually a “one-line evidence”, that makes sense provided how close Lemma 3.11 and Lemma 3.13 were to each other. The existing variation of Blueprint does not instantly confirm the evidence (even though it does put together in Lean), so we have to by hand upgrade the plan. The LaTeX for Lemma 3.13 presently appears like this: I include the “leanok” macro to the evidence, to flag that the evidence has actually now been formalized: X, Y I then press whatever back up to the master Github repository. The plan will take rather a long time (about half an hour) to reconstruct, however ultimately it does, and the dependence chart (which Blueprint has for some factor chose to reorganize a bit) now reveals “

” in green: Zulip chat stream And so the formalization of PFR moves a bit better to conclusion. (Of course, this was an especially simple lemma to formalize, that I selected to show the procedure; one can picture that the majority of other lemmas will take a bit more work.) Keep in mind that while “download Lean” is now colored in green, we do not yet have a complete evidence of this outcome, since the lemma “work on the PFR project yourself” that it depends on is not green. The evidence is Lean4 playground in your areahere total at this point; ideally at some point in the future, the predecessor outcomes will likewise be in your area shown, at which point this outcome will be totally shown. Keep in mind how this plan structure permits one to deal with various parts of the evidence asynchronously; it is not needed to await earlier phases of the argument to be totally formalized to begin dealing with later phases, although I prepare for a percentage of interaction in between various parts as we settle any bugs or small mistakes in the plan. (For circumstances, I am thinking that we might require to include some measurability hypotheses on the random variables

and (*), utilizing a regional copy of the Github repository and sending out pull demands to the master copy if you have actually handled to fill out several of the “sorry” s in the existing variation. (One secret benefit of dealing with a job based around an evidence assistant language such as Lean is that it makes massive mathematical cooperation possible without always having a pre-established level of trust among the partners; my fellow repository maintainers and I have actually currently authorized a number of pull demands from factors that had actually not formerly fulfilled, as the code was validated to be right and we might see that it advanced the job. Alternatively, as the above example needs to ideally show, it is possible for a factor to deal with one little corner of the job without always requiring to comprehend all the mathematics that enters into the job as a whole.)(*) If one simply wishes to try out Lean without going to the effort of downloading it, you can playing attempt the “(*)” for a mild intro to the language, or the (*) for an online Lean server. Additional resources to find out Lean4 might be discovered (*).(*) Like this: (*) Like(*) Loading …(*)


Please enter your comment!
Please enter your name here