Formalizing Program Analysis in Lean 4

Formal MethodsProgramming

Introduction

Most program analysis tools are built on informal reasoning: we believe our analysis is sound, we think the fixpoint converges, we hope the abstract interpretation over-approximates. What if we could prove it?

This post explores formalizing core program analysis concepts in Lean 4. The code below is type-checked at build time — try hovering over definitions and proof terms to see types and proof states.

Sign Analysis: A Classic Abstract Domain

Sign analysis tracks whether a variable is positive, negative, zero, or unknown. It’s one of the simplest abstract domains:

inductive Sign where | bot -- unreachable | neg -- definitely negative | zero -- definitely zero | pos -- definitely positive | top -- unknown deriving Repr, DecidableEq, Inhabited

We need a partial order on signs:

open Sign in instance : LE Sign where le a b := match a, b with | bot, _ => True | _, top => True | neg, neg => True | zero, zero => True | pos, pos => True | _, _ => False

Abstract Addition

When we add two signs, what sign is the result?

open Sign in def addSign : Sign Sign Sign | bot, _ => bot | _, bot => bot | zero, s => s | s, zero => s | pos, pos => pos | neg, neg => neg | _, _ => top

We can prove that adding zero is an identity:

open Sign in theorem addSign_zero_left : s : Sign, addSign zero s = s := (s : Sign), addSign zero s = s s:SignaddSign zero s = s; addSign zero bot = botaddSign zero neg = negaddSign zero zero = zeroaddSign zero pos = posaddSign zero top = top addSign zero bot = botaddSign zero neg = negaddSign zero zero = zeroaddSign zero pos = posaddSign zero top = top All goals completed! 🐙 open Sign in theorem addSign_zero_right : s : Sign, addSign s zero = s := (s : Sign), addSign s zero = s s:SignaddSign s zero = s; addSign bot zero = botaddSign neg zero = negaddSign zero zero = zeroaddSign pos zero = posaddSign top zero = top addSign bot zero = botaddSign neg zero = negaddSign zero zero = zeroaddSign pos zero = posaddSign top zero = top All goals completed! 🐙

And that the operation is commutative:

open Sign in theorem addSign_comm : a b : Sign, addSign a b = addSign b a := (a b : Sign), addSign a b = addSign b a intro a a:Signb:SignaddSign a b = addSign b a; b:SignaddSign bot b = addSign b botb:SignaddSign neg b = addSign b negb:SignaddSign zero b = addSign b zerob:SignaddSign pos b = addSign b posb:SignaddSign top b = addSign b top b:SignaddSign bot b = addSign b botb:SignaddSign neg b = addSign b negb:SignaddSign zero b = addSign b zerob:SignaddSign pos b = addSign b posb:SignaddSign top b = addSign b top addSign top bot = addSign bot topaddSign top neg = addSign neg topaddSign top zero = addSign zero topaddSign top pos = addSign pos topaddSign top top = addSign top top addSign bot bot = addSign bot botaddSign bot neg = addSign neg botaddSign bot zero = addSign zero botaddSign bot pos = addSign pos botaddSign bot top = addSign top botaddSign neg bot = addSign bot negaddSign neg neg = addSign neg negaddSign neg zero = addSign zero negaddSign neg pos = addSign pos negaddSign neg top = addSign top negaddSign zero bot = addSign bot zeroaddSign zero neg = addSign neg zeroaddSign zero zero = addSign zero zeroaddSign zero pos = addSign pos zeroaddSign zero top = addSign top zeroaddSign pos bot = addSign bot posaddSign pos neg = addSign neg posaddSign pos zero = addSign zero posaddSign pos pos = addSign pos posaddSign pos top = addSign top posaddSign top bot = addSign bot topaddSign top neg = addSign neg topaddSign top zero = addSign zero topaddSign top pos = addSign pos topaddSign top top = addSign top top All goals completed! 🐙

Modeling Control Flow Graphs

Program analysis operates over control flow graphs:

structure CFG (n : Nat) where edges : List (Fin n × Fin n) entry : Fin n def CFG.successors (cfg : CFG n) (node : Fin n) : List (Fin n) := cfg.edges.filterMap fun (src, dst) => if src == node then some dst else none def CFG.predecessors (cfg : CFG n) (node : Fin n) : List (Fin n) := cfg.edges.filterMap fun (src, dst) => if dst == node then some src else none

Here is a diamond pattern — common in if-then-else:

def diamondCFG : CFG 4 := { edges := [(0, 0 < 4 All goals completed! 🐙, 1, 1 < 4 All goals completed! 🐙), (0, 0 < 4 All goals completed! 🐙, 2, 2 < 4 All goals completed! 🐙), (1, 1 < 4 All goals completed! 🐙, 3, 3 < 4 All goals completed! 🐙), (2, 2 < 4 All goals completed! 🐙, 3, 3 < 4 All goals completed! 🐙)] entry := 0, 0 < 4 All goals completed! 🐙 }

We can verify structural properties:

example : diamondCFG.successors 0, 0 < 4 All goals completed! 🐙 = [1, 1 < 4 All goals completed! 🐙, 2, 2 < 4 All goals completed! 🐙] := diamondCFG.successors 0, = [1, , 2, ] All goals completed! 🐙

Fixpoint Computation

The heart of any dataflow analysis is computing a fixpoint:

def iterate [DecidableEq α] (f : α α) (init : α) : Nat α | 0 => init | fuel + 1 => let next := f init if next == init then init else iterate f next fuel

A simple test:

5#eval iterate (· + 1) 0 5

We can prove that when the function is idempotent on the result, we have a fixpoint:

theorem iterate_fixed [DecidableEq α] (f : α α) (x : α) (h : f x = x) : iterate f x (n + 1) = x := α:Type u_1n:Natinst✝:DecidableEq αf:α αx:αh:f x = xiterate f x (n + 1) = x All goals completed! 🐙

Where This Goes

This is just the foundation. With these building blocks you can formalize interval analysis, points-to analysis, taint analysis, and full abstract interpretation soundness proofs. Lean’s dependent types let us carry proofs alongside the analysis — the analysis produces evidence that results are correct.