Formalizing Program Analysis in Lean 4
Introduction
Most program analysis tools are built on informal reasoning: we believe our analysis is sound, we think the fixpoint converges, we hope the abstract interpretation over-approximates. What if we could prove it?
This post explores formalizing core program analysis concepts in Lean 4. The code below is type-checked at build time — try hovering over definitions and proof terms to see types and proof states.
Sign Analysis: A Classic Abstract Domain
Sign analysis tracks whether a variable is positive, negative, zero, or unknown. It’s one of the simplest abstract domains:
inductive Sign where
| bot -- unreachable
| neg -- definitely negative
| zero -- definitely zero
| pos -- definitely positive
| top -- unknown
deriving Repr, DecidableEq, Inhabited
We need a partial order on signs:
open Sign in
instance : LE Sign where
le a b := match a, b with
| bot, _ => True
| _, top => True
| neg, neg => True
| zero, zero => True
| pos, pos => True
| _, _ => False
Abstract Addition
When we add two signs, what sign is the result?
open Sign in
def addSign : Sign → Sign → Sign
| bot, _ => bot
| _, bot => bot
| zero, s => s
| s, zero => s
| pos, pos => pos
| neg, neg => neg
| _, _ => top
We can prove that adding zero is an identity:
open Sign in
theorem addSign_zero_left : ∀ s : Sign, addSign zero s = s := ⊢ ∀ (s : Sign), addSign zero s = s
s:Sign⊢ addSign zero s = s; ⊢ addSign zero bot = bot⊢ addSign zero neg = neg⊢ addSign zero zero = zero⊢ addSign zero pos = pos⊢ addSign zero top = top ⊢ addSign zero bot = bot⊢ addSign zero neg = neg⊢ addSign zero zero = zero⊢ addSign zero pos = pos⊢ addSign zero top = top All goals completed! 🐙
open Sign in
theorem addSign_zero_right : ∀ s : Sign, addSign s zero = s := ⊢ ∀ (s : Sign), addSign s zero = s
s:Sign⊢ addSign s zero = s; ⊢ addSign bot zero = bot⊢ addSign neg zero = neg⊢ addSign zero zero = zero⊢ addSign pos zero = pos⊢ addSign top zero = top ⊢ addSign bot zero = bot⊢ addSign neg zero = neg⊢ addSign zero zero = zero⊢ addSign pos zero = pos⊢ addSign top zero = top All goals completed! 🐙
And that the operation is commutative:
open Sign in
theorem addSign_comm : ∀ a b : Sign, addSign a b = addSign b a := ⊢ ∀ (a b : Sign), addSign a b = addSign b a
intro a a:Signb:Sign⊢ addSign a b = addSign b a; b:Sign⊢ addSign bot b = addSign b botb:Sign⊢ addSign neg b = addSign b negb:Sign⊢ addSign zero b = addSign b zerob:Sign⊢ addSign pos b = addSign b posb:Sign⊢ addSign top b = addSign b top b:Sign⊢ addSign bot b = addSign b botb:Sign⊢ addSign neg b = addSign b negb:Sign⊢ addSign zero b = addSign b zerob:Sign⊢ addSign pos b = addSign b posb:Sign⊢ addSign top b = addSign b top ⊢ addSign top bot = addSign bot top⊢ addSign top neg = addSign neg top⊢ addSign top zero = addSign zero top⊢ addSign top pos = addSign pos top⊢ addSign top top = addSign top top ⊢ addSign bot bot = addSign bot bot⊢ addSign bot neg = addSign neg bot⊢ addSign bot zero = addSign zero bot⊢ addSign bot pos = addSign pos bot⊢ addSign bot top = addSign top bot⊢ addSign neg bot = addSign bot neg⊢ addSign neg neg = addSign neg neg⊢ addSign neg zero = addSign zero neg⊢ addSign neg pos = addSign pos neg⊢ addSign neg top = addSign top neg⊢ addSign zero bot = addSign bot zero⊢ addSign zero neg = addSign neg zero⊢ addSign zero zero = addSign zero zero⊢ addSign zero pos = addSign pos zero⊢ addSign zero top = addSign top zero⊢ addSign pos bot = addSign bot pos⊢ addSign pos neg = addSign neg pos⊢ addSign pos zero = addSign zero pos⊢ addSign pos pos = addSign pos pos⊢ addSign pos top = addSign top pos⊢ addSign top bot = addSign bot top⊢ addSign top neg = addSign neg top⊢ addSign top zero = addSign zero top⊢ addSign top pos = addSign pos top⊢ addSign top top = addSign top top All goals completed! 🐙
Modeling Control Flow Graphs
Program analysis operates over control flow graphs:
structure CFG (n : Nat) where
edges : List (Fin n × Fin n)
entry : Fin n
def CFG.successors (cfg : CFG n) (node : Fin n) : List (Fin n) :=
cfg.edges.filterMap fun (src, dst) =>
if src == node then some dst else none
def CFG.predecessors (cfg : CFG n) (node : Fin n) : List (Fin n) :=
cfg.edges.filterMap fun (src, dst) =>
if dst == node then some src else none
Here is a diamond pattern — common in if-then-else:
def diamondCFG : CFG 4 :=
{ edges := [(⟨0, ⊢ 0 < 4 All goals completed! 🐙⟩, ⟨1, ⊢ 1 < 4 All goals completed! 🐙⟩),
(⟨0, ⊢ 0 < 4 All goals completed! 🐙⟩, ⟨2, ⊢ 2 < 4 All goals completed! 🐙⟩),
(⟨1, ⊢ 1 < 4 All goals completed! 🐙⟩, ⟨3, ⊢ 3 < 4 All goals completed! 🐙⟩),
(⟨2, ⊢ 2 < 4 All goals completed! 🐙⟩, ⟨3, ⊢ 3 < 4 All goals completed! 🐙⟩)]
entry := ⟨0, ⊢ 0 < 4 All goals completed! 🐙⟩ }
We can verify structural properties:
example : diamondCFG.successors ⟨0, ⊢ 0 < 4 All goals completed! 🐙⟩ =
[⟨1, ⊢ 1 < 4 All goals completed! 🐙⟩, ⟨2, ⊢ 2 < 4 All goals completed! 🐙⟩] := ⊢ diamondCFG.successors ⟨0, ⋯⟩ = [⟨1, ⋯⟩, ⟨2, ⋯⟩] All goals completed! 🐙
Fixpoint Computation
The heart of any dataflow analysis is computing a fixpoint:
def iterate [DecidableEq α] (f : α → α) (init : α) : Nat → α
| 0 => init
| fuel + 1 =>
let next := f init
if next == init then init
else iterate f next fuel
A simple test:
#eval iterate (· + 1) 0 5
We can prove that when the function is idempotent on the result, we have a fixpoint:
theorem iterate_fixed [DecidableEq α] (f : α → α) (x : α) (h : f x = x) :
iterate f x (n + 1) = x := α:Type u_1n:Natinst✝:DecidableEq αf:α → αx:αh:f x = x⊢ iterate f x (n + 1) = x
All goals completed! 🐙
Where This Goes
This is just the foundation. With these building blocks you can formalize interval analysis, points-to analysis, taint analysis, and full abstract interpretation soundness proofs. Lean’s dependent types let us carry proofs alongside the analysis — the analysis produces evidence that results are correct.
