Formalizing Program Analysis in Lean 4

Introduction

Most program analysis tools are built on informal reasoning: we believe our analysis is sound, we think the fixpoint converges, we hope the abstract interpretation over-approximates. What if we could prove it?

This post explores formalizing core program analysis concepts in Lean 4. All of the code below is type-checked by Lean when the post is built, so every definition and proof shown here has been accepted by the kernel.

Sign Analysis: A Classic Abstract Domain

Sign analysis tracks whether a variable is positive, negative, zero, or unknown. It’s one of the simplest abstract domains:

inductive Sign where
  | bot    -- unreachable
  | neg    -- definitely negative
  | zero   -- definitely zero
  | pos    -- definitely positive
  | top    -- unknown
  deriving Repr, DecidableEq, Inhabited

We need a partial order on signs:

open Sign in
instance : LE Sign where
  le a b := match a, b with
    | bot, _     => True
    | _, top     => True
    | neg, neg   => True
    | zero, zero => True
    | pos, pos   => True
    | _, _       => False
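
To see the order in action, here is a minimal sketch that assumes the Sign type and LE instance defined above. The lemma name Sign.le_refl is illustrative and not part of the original development. Each statement should unfold to True or False once the constructors are known, so the proofs are just case analysis:

```lean
-- Illustrative lemma (name is an assumption): every sign is below itself.
theorem Sign.le_refl (s : Sign) : s ≤ s := by
  cases s <;> exact True.intro

-- `bot` sits below everything, so this goal unfolds to `True`.
example : Sign.bot ≤ Sign.pos := True.intro

-- Distinct signs other than `bot` and `top` are incomparable,
-- so the hypothesis itself unfolds to `False`.
example : ¬ (Sign.pos ≤ Sign.neg) := fun h => h
```

Transitivity and antisymmetry can be checked the same way, by case analysis on all the signs involved.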

Abstract Addition

When we add two signs, what sign is the result?
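
One possible definition of addSign, sketched here under assumptions: bot is treated as absorbing (adding anything to an unreachable value stays unreachable), zero is the identity, equal signs are preserved, and mixed signs go to top. The case order is a guess chosen to be consistent with the lemmas proved in the rest of this section; it assumes the Sign type from above.

```lean
open Sign in
/-- Abstract addition on signs (a sketch; the exact cases are an assumption). -/
def addSign : Sign → Sign → Sign
  | bot, _    => bot   -- unreachable input gives unreachable output
  | _, bot    => bot
  | zero, s   => s     -- zero is a left identity
  | s, zero   => s     -- and a right identity
  | pos, pos  => pos   -- positive + positive is positive
  | neg, neg  => neg   -- negative + negative is negative
  | _, _      => top   -- mixed signs or top: result is unknown
```

With this definition every closed instance reduces by computation, which is why proofs by case analysis followed by rfl are enough.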

We can prove that adding zero is an identity:

open Sign in
theorem addSign_zero_left : ∀ s : Sign, addSign zero s = s := by
  intro s; cases s <;> rfl

open Sign in
theorem addSign_zero_right : ∀ s : Sign, addSign s zero = s := by
  intro s; cases s <;> rfl

And that the operation is commutative:

open Sign in
theorem addSign_comm : ∀ a b : Sign, addSign a b = addSign b a := by
  intro a b; cases a <;> cases b <;> rfl

Modeling Control Flow Graphs

Program analysis operates over control flow graphs:

structure CFG (n : Nat) where
  edges : List (Fin n × Fin n)
  entry : Fin n

def CFG.successors (cfg : CFG n) (node : Fin n) : List (Fin n) :=
  cfg.edges.filterMap fun (src, dst) =>
    if src == node then some dst else none

def CFG.predecessors (cfg : CFG n) (node : Fin n) : List (Fin n) :=
  cfg.edges.filterMap fun (src, dst) =>
    if dst == node then some src else none

Here is a diamond pattern — common in if-then-else:

def diamondCFG : CFG 4 := {
  edges := [(⟨0, by omega⟩, ⟨1, by omega⟩),
            (⟨0, by omega⟩, ⟨2, by omega⟩),
            (⟨1, by omega⟩, ⟨3, by omega⟩),
            (⟨2, by omega⟩, ⟨3, by omega⟩)]
  entry := ⟨0, by omega⟩
}

We can verify structural properties:

example : diamondCFG.successors ⟨0, by omega⟩ =
    [⟨1, by omega⟩, ⟨2, by omega⟩] := by
  native_decide
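
The same style of check should work for predecessors. As a small sketch reusing diamondCFG from above, the join node 3 is expected to have exactly the two branch nodes as predecessors, listed in the order their edges appear:

```lean
-- Edges (1, 3) and (2, 3) are the only ones ending at node 3.
example : diamondCFG.predecessors ⟨3, by omega⟩ =
    [⟨1, by omega⟩, ⟨2, by omega⟩] := by
  native_decide

-- The entry node has no incoming edges in the diamond.
example : diamondCFG.predecessors diamondCFG.entry = [] := by
  native_decide
```

Note that native_decide trusts the compiled evaluator in addition to the kernel; for graphs this small, decide may also work if a kernel-checked proof is preferred.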

Fixpoint Computation

The heart of any dataflow analysis is computing a fixpoint:

def iterate [DecidableEq α] (f : α → α) (init : α) : Nat → α
  | 0 => init
  | fuel + 1 =>
    let next := f init
    if next == init then init
    else iterate f next fuel

A simple test:

#eval iterate (· + 1) 0 5  -- 5
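
Because n + 1 never reaches a fixpoint, that call simply runs out of fuel and returns the last iterate. For contrast, here is a sketch with a step function that does converge; saturate is a hypothetical helper introduced only for this illustration:

```lean
-- Hypothetical step function: count up but saturate at 3, so 3 is a fixpoint.
def saturate (n : Nat) : Nat := min (n + 1) 3

-- 0 → 1 → 2 → 3, then `saturate 3 == 3` stops the loop with fuel to spare.
#eval iterate saturate 0 10  -- 3
```

The fuel argument only bounds the number of steps. When the function stabilizes early, as it does here, the remaining fuel is never used.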

We can prove that when x is already a fixed point of f (that is, f x = x), iterate returns x for any nonzero amount of fuel:

theorem iterate_fixed [DecidableEq α] (f : α → α) (x : α) (h : f x = x) :
    iterate f x (n + 1) = x := by
  simp [iterate, h]

Where This Goes

This is just the foundation. With these building blocks you can formalize interval analysis, points-to analysis, taint analysis, and full abstract interpretation soundness proofs. Lean’s dependent types let us carry proofs alongside the analysis — the analysis produces evidence that results are correct.