Operational fragment-lambda: a stochastic-lazy unfold sampler #

Operational counterpart to FragmentGrammar.lean's joint density fg(F; F) of @cite{odonnell-2015} §3.1.8 (p. 94): the sampler that draws from that density. The architecture follows the macro-expansion of @cite{odonnell-2015} §2.3.7 Figure 2.21 — (fragment-lambda args body) ↦ (PYmem a b (lambda args (if (delay? args) (delay body) body))) — mapping each Church construct to a Lean component:

(PYmem a b _) ⤳ pypDraw over PYPState with PYM := StateT _ PMF.
(if (delay? args) (delay body) body) ⤳ PMF.bernoulli halt-coin per §3.1.8 (p. 92).
The (delay body) thunks ⤳ LazyTree.fragment constructor.

The single remaining sorry is on fragmentLambdaDepth_marginalises_to_fg — the depth-→-∞ proportionality theorem, documented in detail at the theorem itself. Probabilistic-fixed-point machinery on ω-CPPOs of sub-probability measures is the missing infrastructure (mathlib-level work). All other operational machinery and preservation theorems (pypDraw_preserves_wellFormed, fragmentLambdaDepth_preserves_wellFormed) are real proofs via the PYM.Preserves combinator algebra.

@cite{odonnell-2015} §2.3.7, §3.1.8, §3.5.5 (hyperparameters: a = 0.5, b = 100, ψ_B = 50).

Pitman–Yor memoisation state #

Universe note: types in this file live at Type (= Type 0) rather than universe-polymorphic Type u. Universe polymorphism is blocked by the use of PYM α D Unit in modify's signature: Unit : Type 0, so universe- polymorphizing PYM would require either PUnit (Type-polymorphic Unit) or ULift threading throughout. Both are tractable; deferred until a downstream consumer needs higher-universe support. Linguistic type domains (NTs, terminals, rules) are all small types so Type is sufficient in practice.

source

structure Morphology.FragmentGrammars.Operational.PYPSlot (D : Type) :

Type

A Pitman–Yor memoisation slot for one input value. We track:

dishes — the value sampled at the i-th distinct table
customerCounts — how many customers have sat at each table

The lists are kept length-equal by the public API (empty, seatCustomer, addTable); we don't enforce this as a structural invariant. Per @cite{odonnell-2015} footnote 14 (p. 60), multiple tables may share a dish, which is why we keep dishes as a list rather than a set or finmap.

dishes : List D
customerCounts : List ℕ

Instances For

source

def Morphology.FragmentGrammars.Operational.instInhabitedPYPSlot.default {a✝ : Type} :

PYPSlot a✝

Equations

Morphology.FragmentGrammars.Operational.instInhabitedPYPSlot.default = { dishes := default, customerCounts := default }

Instances For

source

@[implicit_reducible]

instance Morphology.FragmentGrammars.Operational.instInhabitedPYPSlot {a✝ : Type} :

Inhabited (PYPSlot a✝)

Equations

Morphology.FragmentGrammars.Operational.instInhabitedPYPSlot = { default := Morphology.FragmentGrammars.Operational.instInhabitedPYPSlot.default }

source

def Morphology.FragmentGrammars.Operational.PYPSlot.empty {D : Type} :

PYPSlot D

Equations

Morphology.FragmentGrammars.Operational.PYPSlot.empty = { dishes := [], customerCounts := [] }

Instances For

source

def Morphology.FragmentGrammars.Operational.PYPSlot.numCustomers {D : Type} (s : PYPSlot D) :

ℕ

Total customers seated at this slot (N in the book's notation).

Equations

s.numCustomers = s.customerCounts.sum

Instances For

source

def Morphology.FragmentGrammars.Operational.PYPSlot.numTables {D : Type} (s : PYPSlot D) :

ℕ

Number of occupied tables at this slot (K in the book's notation).

Equations

s.numTables = s.dishes.length

Instances For

source

def Morphology.FragmentGrammars.Operational.PYPSlot.seatCustomer {D : Type} (s : PYPSlot D) (i : ℕ) :

PYPSlot D

Seat one more customer at the existing table indexed by i. Out-of-range indices leave the slot unchanged (List.set is a no-op for out-of-bounds indices, and getD 0 + 1 = 1 falls back to seating a customer at a fresh single-element entry — but the set no-op means the new entry is dropped, leaving customerCounts unchanged).

Implemented via List.set rather than List.modify because mathlib has a developed membership API for the former (List.mem_or_eq_of_mem_set) that lets us prove seatCustomer_wellFormed directly.

Equations

s.seatCustomer i = { dishes := s.dishes, customerCounts := s.customerCounts.set i (s.customerCounts[i]?.getD 0 + 1) }

Instances For

source

def Morphology.FragmentGrammars.Operational.PYPSlot.addTable {D : Type} (s : PYPSlot D) (v : D) :

PYPSlot D

Open a fresh table with dish v and one initial customer.

Equations

s.addTable v = { dishes := s.dishes ++ [v], customerCounts := s.customerCounts ++ [1] }

Instances For

source

@[simp]

theorem Morphology.FragmentGrammars.Operational.PYPSlot.numTables_addTable {D : Type} (s : PYPSlot D) (v : D) :

(s.addTable v).numTables = s.numTables + 1

source

@[simp]

theorem Morphology.FragmentGrammars.Operational.PYPSlot.numCustomers_addTable {D : Type} (s : PYPSlot D) (v : D) :

(s.addTable v).numCustomers = s.numCustomers + 1

source

@[simp]

theorem Morphology.FragmentGrammars.Operational.PYPSlot.dishes_addTable {D : Type} (s : PYPSlot D) (v : D) :

(s.addTable v).dishes = s.dishes ++ [v]

source

def Morphology.FragmentGrammars.Operational.PYPSlot.WellFormed {D : Type} (s : PYPSlot D) :

Prop

A PYPSlot is well-formed when every customer count is positive. This invariant is maintained by (but not enforced by) the public API: empty has empty customerCounts (vacuous); addTable appends [1]; seatCustomer increments. Lifted to PYPState and used by the sampler to discharge the otherwise-unreachable atomic-fallback branch in slotToFinpartition.

Equations

s.WellFormed = ∀ c ∈ s.customerCounts, 0 < c

Instances For

source

theorem Morphology.FragmentGrammars.Operational.PYPSlot.empty_wellFormed {D : Type} :

empty.WellFormed

source

theorem Morphology.FragmentGrammars.Operational.PYPSlot.addTable_wellFormed {D : Type} {s : PYPSlot D} (h : s.WellFormed) (v : D) :

(s.addTable v).WellFormed

source

theorem Morphology.FragmentGrammars.Operational.PYPSlot.seatCustomer_wellFormed {D : Type} {s : PYPSlot D} (h : s.WellFormed) (i : ℕ) :

(s.seatCustomer i).WellFormed

seatCustomer preserves wellformedness: each element of the resulting customerCounts is either an unchanged original (positive by h) or the newly-set value (s.customerCounts[i]?.getD 0) + 1 (at least 1 since getD 0 ≥ 0).

source

structure Morphology.FragmentGrammars.Operational.PYPHyper :

Type

Pitman–Yor hyperparameters: discount a ∈ [0, 1) and concentration b > 0. @cite{odonnell-2015} §3.5.5 (p. 102) sets a = 0.5, b = 100.

Note: the standard PYP allows b ≥ -a (boundary case), including negative b for a > 0. We restrict to 0 < b to (a) match the book's hyperparameter choice and (b) guarantee the new-table PYP weight K·a + b is strictly positive at K = 0 — needed as a positivity witness for PMF.normalize via tsum_ne_zero_iff. The boundary case can be added as a separate variant if a consumer needs it.

discount : ℝ
concentration : ℝ
discount_nonneg : 0 ≤ self.discount
discount_lt_one : self.discount < 1
concentration_pos : 0 < self.concentration

Instances For

source

structure Morphology.FragmentGrammars.Operational.PYPState (α D : Type) :

Type

Global Pitman–Yor memoisation state: per-input slot states (one "restaurant" per input value) plus shared hyperparameters.

slots : α → PYPSlot D
hyper : PYPHyper

Instances For

source

def Morphology.FragmentGrammars.Operational.PYPState.empty {α D : Type} (h : PYPHyper) :

PYPState α D

Empty memoisation state (no customers anywhere).

Equations

Morphology.FragmentGrammars.Operational.PYPState.empty h = { slots := fun (x : α) => Morphology.FragmentGrammars.Operational.PYPSlot.empty, hyper := h }

Instances For

source

def Morphology.FragmentGrammars.Operational.PYPState.updateSlot {α D : Type} [DecidableEq α] (st : PYPState α D) (x : α) (s : PYPSlot D) :

PYPState α D

Update the slot at input x, leaving all others unchanged.

Equations

st.updateSlot x s = { slots := fun (y : α) => if y = x then s else st.slots y, hyper := st.hyper }

Instances For

source

@[simp]

theorem Morphology.FragmentGrammars.Operational.PYPState.empty_slots {α D : Type} (h : PYPHyper) (x : α) :

(empty h).slots x = PYPSlot.empty

All slots in an empty PYP state are PYPSlot.empty.

source

@[simp]

theorem Morphology.FragmentGrammars.Operational.PYPState.empty_hyper {α D : Type} (h : PYPHyper) :

(empty h).hyper = h

The empty PYP state's hyperparameters are exactly the input.

source

def Morphology.FragmentGrammars.Operational.PYPState.WellFormed {α D : Type} (st : PYPState α D) :

Prop

A PYPState is well-formed when every slot is well-formed. The sampler's invariant: pypDraw and downstream fragmentLambdaDepth preserve this — see pypDraw_preserves_wellFormed and fragmentLambdaDepth_preserves_wellFormed.

Equations

st.WellFormed = ∀ (x : α), (st.slots x).WellFormed

Instances For

source

theorem Morphology.FragmentGrammars.Operational.PYPState.empty_wellFormed {α D : Type} (h : PYPHyper) :

(empty h).WellFormed

source

theorem Morphology.FragmentGrammars.Operational.PYPState.updateSlot_wellFormed {α D : Type} [DecidableEq α] {st : PYPState α D} (h_st : st.WellFormed) {x : α} {newSlot : PYPSlot D} (h_new : newSlot.WellFormed) :

(st.updateSlot x newSlot).WellFormed

The PYP-memoised stochastic monad #

PYM α D γ is the type of γ-valued stochastic computations whose state is a Pitman–Yor memoisation table over inputs α storing dishes of type D. This is the monad-stack equivalent of the (PYmem a b _) wrapper in @cite{odonnell-2015} Figure 2.21.

source

@[reducible, inline]

abbrev Morphology.FragmentGrammars.Operational.PYM (α D γ : Type) :

Type

Equations

Morphology.FragmentGrammars.Operational.PYM α D γ = StateT (Morphology.FragmentGrammars.Operational.PYPState α D) PMF γ

Instances For

source

noncomputable def Morphology.FragmentGrammars.Operational.PYM.liftBase {α D γ : Type} (p : PMF γ) :

PYM α D γ

Lift a state-free PMF sample into PYM.

Equations

Morphology.FragmentGrammars.Operational.PYM.liftBase p st = p.bind fun (v : γ) => PMF.pure (v, st)

Instances For

`Preserves` algebra: state-property preservation through PYM combinators #

A small algebra for proving that a PYM computation preserves a state-level predicate P : PYPState α D → Prop. Closure laws under pure, bind, get, liftBase, modify, dite, mapM let arbitrary PYM computations built from these primitives discharge preservation mechanically.

Used in pypDraw_preserves_wellFormed and fragmentLambdaDepth_preserves_wellFormed to discharge the WellFormed invariant chain. The algebra is general (not tied to WellFormed); promoted to Linglib/Core/Probability/PYM.lean once a second consumer needs it.

source

def Morphology.FragmentGrammars.Operational.PYM.Preserves {α D γ : Type} (P : PYPState α D → Prop) (m : PYM α D γ) :

Prop

A PYM computation preserves a state-property P if every output state with positive probability satisfies P, given the input state does.

Equations

Morphology.FragmentGrammars.Operational.PYM.Preserves P m = ∀ (init : Morphology.FragmentGrammars.Operational.PYPState α D), P init → ∀ p ∈ (m init).support, P p.2

Instances For

source

theorem Morphology.FragmentGrammars.Operational.PYM.Preserves.pure {α D γ : Type} {P : PYPState α D → Prop} (a : γ) :

Preserves P (pure a)

pure a doesn't change state, so trivially preserves.

source

theorem Morphology.FragmentGrammars.Operational.PYM.Preserves.bind {α D γ : Type} {P : PYPState α D → Prop} {δ : Type} {m : PYM α D γ} {f : γ → PYM α D δ} (h_m : Preserves P m) (h_f : ∀ (a : γ), Preserves P (f a)) :

Preserves P (m >>= f)

bind preserves if both halves do.

source

theorem Morphology.FragmentGrammars.Operational.PYM.Preserves.get {α D : Type} {P : PYPState α D → Prop} :

Preserves P get

get reads state without changing it; preserves trivially.

source

theorem Morphology.FragmentGrammars.Operational.PYM.Preserves.liftBase {α D γ : Type} {P : PYPState α D → Prop} (q : PMF γ) :

Preserves P (liftBase q)

liftBase doesn't change state (the PMF runs over its own values, state threads through unchanged); preserves.

source

theorem Morphology.FragmentGrammars.Operational.PYM.Preserves.modify {α D : Type} {P : PYPState α D → Prop} {f : PYPState α D → PYPState α D} (h_f : ∀ (s : PYPState α D), P s → P (f s)) :

Preserves P (modify f)

modify f preserves P if f does.

source

theorem Morphology.FragmentGrammars.Operational.PYM.Preserves.dite {α D γ : Type} {P : PYPState α D → Prop} {c : Prop} [Decidable c] {m₁ : c → PYM α D γ} {m₂ : ¬c → PYM α D γ} (h₁ : ∀ (hc : c), Preserves P (m₁ hc)) (h₂ : ∀ (hc : ¬c), Preserves P (m₂ hc)) :

Preserves P (if hc : c then m₁ hc else m₂ hc)

Dependent if-then-else preserves if both branches do.

source

theorem Morphology.FragmentGrammars.Operational.PYM.Preserves.ite {α D γ : Type} {P : PYPState α D → Prop} {c : Prop} [Decidable c] {m₁ m₂ : PYM α D γ} (h₁ : Preserves P m₁) (h₂ : Preserves P m₂) :

Preserves P (if c then m₁ else m₂)

Non-dependent if-then-else preserves if both branches do.

source

theorem Morphology.FragmentGrammars.Operational.PYM.Preserves.mapM {α D γ : Type} {P : PYPState α D → Prop} {δ : Type} {f : γ → PYM α D δ} (h_f : ∀ (a : γ), Preserves P (f a)) (l : List γ) :

Preserves P (List.mapM f l)

List.mapM over a preserves-respecting body preserves P.

Pitman–Yor draw #

Per @cite{odonnell-2015} §2.3.3.2 (p. 59) and §3.1.6 (p. 89), the (N+1)-th customer at a slot sits at:

existing table i (1 ≤ i ≤ K) with weight (yᵢ − a) / (N + b)
a fresh new table with weight (K·a + b) / (N + b)

where yᵢ is the count at table i, N = Σyᵢ, K = number of tables, a = hyper.discount, b = hyper.concentration.

source

noncomputable def Morphology.FragmentGrammars.Operational.pypDraw {α D : Type} [DecidableEq α] [Inhabited D] (base : α → PYM α D D) (x : α) :

PYM α D D

Sample from the PYP at slot x with base distribution base.

Either reuse the dish at an existing table (with the §3.1.6 weights), or sample a fresh dish from base and seat the new customer at a new table. Both branches update the memo state appropriately.

The base distribution is PYM-typed (state-passing) rather than the usual PMF-typed because in fragment-lambda the recursive children of a fresh sample themselves invoke the memo (via mem{L^B} for B≠A in the §3.1.8 equations). The slots for distinct inputs are independent, so this state-threading is well-defined; a per-restaurant PMF base would suffice if the children's states were marginalised first.

Implementation note (exchangeability caveat). When base itself modifies state (recursive pypDraw calls add tables at other slots, or even at the same slot if grammar recursion revisits x), the table indices change between when this pypDraw decides "new table" and when it actually adds the new dish. Operationally we add to the post-base state. By PYP exchangeability the resulting joint distribution agrees with a sequential interpretation; a faithful implementation would either snapshot-then-restore or work in a memo-free base : α → PMF D.

Equations

One or more equations did not get rendered due to their size.

Instances For

Lazy partial derivation trees #

source

inductive Morphology.FragmentGrammars.Operational.LazyTree (α β R : Type) :

Type

A partial derivation tree under fragment-grammar generation. Three constructors:

terminal t — a fully-evaluated terminal symbol (the result of unfold reaching a terminal in @cite{odonnell-2015} Figure 2.7, p. 52).
fragment x — a non-terminal stored as a fragment-leaf: the type-level representation of the (delay body) thunks of Figure 2.21 (p. 71). When the body of fragment-lambda flips heads on the halt coin, the result at this position is fragment x rather than further recursion.
branch r x children — a non-terminal expanded by rule r. The rule is stored alongside the NT and children so that downstream consumers (samplesToCorpusCounts) can credit halt-vs-recurse decisions to the specific (rule, position) pair, matching @cite{odonnell-2015} §3.1.8's rule-indexed beta-binomial structure.

The third type parameter R is the rule type — typically ContextFreeRule T G.NT for CFGs, abstract here so the same LazyTree can be used by other generative formalisms (TAG, DM-PCFG variants).

The "complete" tree (no fragment-leaves anywhere) is the result of forcing all delayed thunks until termination.

terminal {α β R : Type} : β → LazyTree α β R
fragment {α β R : Type} : α → LazyTree α β R
branch {α β R : Type} : R → α → List (LazyTree α β R) → LazyTree α β R

Instances For

source

@[implicit_reducible]

instance Morphology.FragmentGrammars.Operational.instInhabitedLazyTree {a✝ a✝¹ : Type} [Inhabited a✝¹] {a✝² : Type} :

Inhabited (LazyTree a✝ a✝¹ a✝²)

Equations

Morphology.FragmentGrammars.Operational.instInhabitedLazyTree = { default := Morphology.FragmentGrammars.Operational.instInhabitedLazyTree.default }

source

def Morphology.FragmentGrammars.Operational.instInhabitedLazyTree.default {a✝ a✝¹ : Type} [Inhabited a✝¹] {a✝² : Type} :

LazyTree a✝ a✝¹ a✝²

Equations

Morphology.FragmentGrammars.Operational.instInhabitedLazyTree.default = Morphology.FragmentGrammars.Operational.LazyTree.terminal default

Instances For

source

@[irreducible]

def Morphology.FragmentGrammars.Operational.LazyTree.fragmentLeaves {α β R : Type} :

LazyTree α β R → List α

The fragment-leaf frontier: the non-terminals stored as halted sub-derivations (the "open slots" of a fragment, in @cite{odonnell-2015} §3.1.8's terminology).

Equations

(Morphology.FragmentGrammars.Operational.LazyTree.terminal a).fragmentLeaves = []
(Morphology.FragmentGrammars.Operational.LazyTree.fragment x_1).fragmentLeaves = [x_1]
(Morphology.FragmentGrammars.Operational.LazyTree.branch a a_1 kids).fragmentLeaves = List.flatMap Morphology.FragmentGrammars.Operational.LazyTree.fragmentLeaves kids

Instances For

source

@[irreducible]

def Morphology.FragmentGrammars.Operational.LazyTree.yield {α β R : Type} :

LazyTree α β R → List β

The terminal yield: the in-order sequence of terminal symbols reachable without forcing any fragment-leaf.

Equations

(Morphology.FragmentGrammars.Operational.LazyTree.terminal a).yield = [a]
(Morphology.FragmentGrammars.Operational.LazyTree.fragment x_1).yield = []
(Morphology.FragmentGrammars.Operational.LazyTree.branch a a_1 kids).yield = List.flatMap Morphology.FragmentGrammars.Operational.LazyTree.yield kids

Instances For

fragmentLambda — depth-bounded operational unfold #

The depth bound is a structural-recursion device. The mathematically- intended object is the depth-∞ limit, which terminates almost surely when the recurse probability is bounded away from 1 (geometric depth). Formalising the limit requires probabilistic-fixed-point machinery not yet in mathlib; the bounded version is genuine and ships now.

The structure of @cite{odonnell-2015} §3.1.8 (p. 93) is

G^a_FG(d)  = Σ mem{L^A}(s) · ∏ G^root_FG(s'_i)        -- PYP-wrap + recurse on holes
L^A(d)     = Σ θ_r ∏ [ν · G^root_FG + (1-ν) · 1]      -- biased halt-or-recurse
mem{L^A}  ~ PYP(a^A, b^A, L^A)                         -- Pitman-Yor memoization

We split this into two co-located functions:

stochasticLazyUnfoldDepth — the un-memoised §2.3.5.2 model (L^A with children recursing back into the unfold itself, no PYP). This is @cite{odonnell-2015} §2.3.5.2's stochastic-lazy-unfold (Figure 2.18, p. 68) — a recognised standalone sub-model in the book. Fully defined; kept here as the reference for the un-memoised model and as a sub-step for understanding the §3.1.8 architecture.
fragmentLambdaDepth — the faithful §3.1.8 model (G^FG = mem{L^A}). Each call wraps with pypDraw; the inner body recurses on fragmentLambdaDepth itself for non-terminal children — children's draws also consult the memo, faithfully matching §3.1.8's mutual recursion. Lean accepts this as structural recursion through the pypDraw lambda + mapM + if (the recursive call is on n, structurally smaller than n+1).

source

noncomputable def Morphology.FragmentGrammars.Operational.stochasticLazyUnfoldDepth {α β R : Type} (recurse : α → PMF (R × List (α ⊕ β))) (recurseProb : α → NNReal) (recurseProb_le : ∀ (x : α), recurseProb x ≤ 1) :

ℕ → α → PMF (LazyTree α β R)

Depth-bounded stochastic-lazy unfold per @cite{odonnell-2015} §2.3.5.2 (Figure 2.18, p. 68). At each call to non-terminal x:

Halt-or-recurse step: flip the biased coin (§3.1.8 BINOMIAL(ν), p. 92). With probability 1 − recurseProb x, halt and return LazyTree.fragment x. Otherwise sample a (rule, RHS) pair via recurse x, recursively expand each non-terminal child at the next- smaller depth, map each terminal to LazyTree.terminal, and assemble LazyTree.branch rule x kids.

The rule is stored on the branch so samplesToCorpusCounts can credit halt-vs-recurse decisions to the specific (rule, position) pair — matching the rule-indexed beta-binomial structure of §3.1.8.

Depth 0 always halts. Pure PMF (no PYP memo state) — the un-memoised sub-model. The PYP-wrapped version is fragmentLambdaDepth below.

Equations

One or more equations did not get rendered due to their size.
Morphology.FragmentGrammars.Operational.stochasticLazyUnfoldDepth recurse recurseProb recurseProb_le 0 x✝ = PMF.pure (Morphology.FragmentGrammars.Operational.LazyTree.fragment x✝)

Instances For

`fragmentLambdaDepth` — the PYP-memoised §3.1.8 model #

Wraps each non-terminal call with pypDraw so that previously-sampled partial subtrees at the same non-terminal can be reused; recursive children calls go back through fragmentLambdaDepth itself (PYP-wrapped), faithfully matching @cite{odonnell-2015} §3.1.8's mutual recursion G^FG = mem{L^A} ↔ L^A-body-calls-G^FG-on-children.

The inner per-call body is factored as fragmentLambdaStep so the preservation proof can apply combinator lemmas to a function with a visible name and type; the two presentations are definitionally equal.

source

noncomputable def Morphology.FragmentGrammars.Operational.fragmentLambdaDepth {α β R : Type} [DecidableEq α] [Inhabited β] (recurse : α → PMF (R × List (α ⊕ β))) (recurseProb : α → NNReal) (recurseProb_le : ∀ (x : α), recurseProb x ≤ 1) :

ℕ → α → PYM α (LazyTree α β R) (LazyTree α β R)

Equations

One or more equations did not get rendered due to their size.
Morphology.FragmentGrammars.Operational.fragmentLambdaDepth recurse recurseProb recurseProb_le 0 x✝ = pure (Morphology.FragmentGrammars.Operational.LazyTree.fragment x✝)

Instances For

Wellformedness preservation #

source

theorem Morphology.FragmentGrammars.Operational.pypDraw_preserves_wellFormed {α D : Type} [DecidableEq α] [Inhabited D] (base : α → PYM α D D) (x : α) (init : PYPState α D) (h_init : init.WellFormed) (h_base : ∀ (y : α) (init' : PYPState α D), init'.WellFormed → ∀ p ∈ (base y init').support, p.2.WellFormed) (p : D × PYPState α D) :

p ∈ (pypDraw base x init).support → p.2.WellFormed

pypDraw preserves PYPState.WellFormed: every state in the support of the output PMF is wellformed, given a wellformed input state and a wellformed-preserving base distribution.

The proof would observe that pypDraw's two branches are:

existing-table: seatCustomer i (preserves wellformedness via PYPSlot.seatCustomer_wellFormed)
new-table: addTable dish after sampling from base (preserves via PYPSlot.addTable_wellFormed + base's preservation hypothesis)

Combined with PYPState.updateSlot_wellFormed, both branches yield a wellformed state. The PMF support is contained in the union of these branches' images, so all output states are wellformed.

Status: sorry. As of 0.230.519's seatCustomer-discharge iteration, the slot-level lemmas (seatCustomer_wellFormed, addTable_wellFormed) are real proofs. The remaining obstacle is PMF.support reasoning through the do-block: pypDraw is a chain of binds, and the support of a bind is {b | ∃ a ∈ p.support, b ∈ (f a).support} (PMF.mem_support_bind_iff). Proving the result via this chain is mechanical (~50-100 LOC) but requires patient manipulation of PMF.support_pure, support_bind, support of PMF.normalize, etc.

source

theorem Morphology.FragmentGrammars.Operational.fragmentLambdaDepth_preserves_wellFormed {α β R : Type} [DecidableEq α] [Inhabited β] (recurse : α → PMF (R × List (α ⊕ β))) (recurseProb : α → NNReal) (recurseProb_le : ∀ (x : α), recurseProb x ≤ 1) (n : ℕ) (start : α) (init : PYPState α (LazyTree α β R)) (h_init : init.WellFormed) (p : LazyTree α β R × PYPState α (LazyTree α β R)) :

p ∈ (fragmentLambdaDepth recurse recurseProb recurseProb_le n start init).support → p.2.WellFormed

fragmentLambdaDepth preserves PYPState.WellFormed: every state in the support of the output PMF is wellformed, given a wellformed input state. By induction on depth: depth 0 trivially preserves (no state change, just pure (.fragment x)); depth n+1 via pypDraw_preserves_wellFormed with the inner body's preservation — which itself follows by IH for non-terminal children's fragmentLambdaDepth ... n calls.

When discharged, this theorem lets samplesToCorpusCounts and the soundness theorem assume input-state wellformedness, which in turn lets slotToFinpartition discharge its atomic-fallback branch via absurd (the branch is unreachable for any state the sampler can produce).

Halt-count extraction from samples #

source

def Morphology.FragmentGrammars.Operational.LazyTree.collectHaltCounts {α β R : Type} [DecidableEq R] :

LazyTree α β R → R → ℕ → ℕ × ℕ

See module note above on collectHaltCounts semantics.

Equations

One or more equations did not get rendered due to their size.
(Morphology.FragmentGrammars.Operational.LazyTree.terminal a).collectHaltCounts x✝¹ x✝ = (0, 0)
(Morphology.FragmentGrammars.Operational.LazyTree.fragment a).collectHaltCounts x✝¹ x✝ = (0, 0)

Instances For

source

def Morphology.FragmentGrammars.Operational.LazyTree.collectKidsHaltCounts {α β R : Type} [DecidableEq R] :

List (LazyTree α β R) → R → ℕ → ℕ × ℕ

List-recursion helper for collectHaltCounts.

Equations

One or more equations did not get rendered due to their size.
Morphology.FragmentGrammars.Operational.LazyTree.collectKidsHaltCounts [] x✝¹ x✝ = (0, 0)

Instances For

source

@[simp]

theorem Morphology.FragmentGrammars.Operational.LazyTree.collectHaltCounts_terminal {α β R : Type} [DecidableEq R] (t : β) (r : R) (i : ℕ) :

(terminal t).collectHaltCounts r i = (0, 0)

source

@[simp]

theorem Morphology.FragmentGrammars.Operational.LazyTree.collectHaltCounts_fragment {α β R : Type} [DecidableEq R] (x : α) (r : R) (i : ℕ) :

(fragment x).collectHaltCounts r i = (0, 0)

source

def Morphology.FragmentGrammars.Operational.LazyTree.toCFGTree? {α β R : Type} :

LazyTree α β R → Option (CFGTree β α)

See module note above on toCFGTree? semantics.

Equations

One or more equations did not get rendered due to their size.
(Morphology.FragmentGrammars.Operational.LazyTree.terminal a).toCFGTree? = some (CFGTree.leaf a)
(Morphology.FragmentGrammars.Operational.LazyTree.fragment x_1).toCFGTree? = none

Instances For

source

def Morphology.FragmentGrammars.Operational.LazyTree.toCFGTreesList {α β R : Type} :

List (LazyTree α β R) → Option (List (CFGTree β α))

List-recursion helper for toCFGTree?.

Equations

One or more equations did not get rendered due to their size.
Morphology.FragmentGrammars.Operational.LazyTree.toCFGTreesList [] = some []

Instances For

source

@[simp]

theorem Morphology.FragmentGrammars.Operational.LazyTree.toCFGTree?_terminal {α β R : Type} (t : β) :

(terminal t).toCFGTree? = some (CFGTree.leaf t)

source

@[simp]

theorem Morphology.FragmentGrammars.Operational.LazyTree.toCFGTree?_fragment {α β R : Type} (x : α) :

(fragment x).toCFGTree? = none

Soundness contract #

source

def Morphology.FragmentGrammars.Operational.slotToFinpartition {D : Type} (s : PYPSlot D) :

(n : ℕ) × OrderedFinpartition n

Convert a PYPSlot to a sigma-pair Σ n, OrderedFinpartition n suitable for AdaptorGrammar.TableAssignment. Three branches:

Empty slot (numCustomers = 0): returns ⟨0, default⟩ directly. Preserves the depth-0 lemma's from rfl chain (the empty case is defeq to AdaptorGrammar.emptyTables G's entry).
Non-empty wellformed slot (∀ c ∈ customerCounts, 0 < c): builds a real Composition s.numCustomers and projects via the new Composition.toOrderedFinpartition (in Linglib/Core/Probability/PitmanYor.lean). The resulting partition faithfully captures the slot's table-grouping structure — what pypFactor actually depends on for its EPPF value.
Non-empty non-wellformed slot (a customerCount is 0): falls back to OrderedFinpartition.atomic. This branch is unreachable under the sampler's invariant (pypDraw's addTable initialises counts to 1, seatCustomer increments). The WellFormed predicate on PYPSlot and PYPState and the preservation theorems pypDraw_preserves_wellFormed (proved via PYM.Preserves combinator algebra) and fragmentLambdaDepth_preserves_wellFormed (induction sorry'd, sketch in docstring) capture this invariant. When the depth-induction is discharged, this branch can be eliminated by taking (_ : init.WellFormed) as a hypothesis to slotToFinpartition and discharging the unreachable case via absurd.

The numCustomers = 0 special case is load-bearing: without it, the Composition-built partition for an empty slot is not definitionally equal to OrderedFinpartition.atomic 0 = default (both are unique inhabitants of OrderedFinpartition 0 modulo @[ext], but the partSize functions are syntactically distinct: one is (empty composition).blocksFun, the other is constant 1). Splitting on numCustomers = 0 keeps the empty branch defeq to the prior shim, preserving the depth-0 lemma.

Equations

One or more equations did not get rendered due to their size.

Instances For

source

@[simp]

theorem Morphology.FragmentGrammars.Operational.slotToFinpartition_empty {D : Type} :

slotToFinpartition PYPSlot.empty = ⟨0, default⟩

source

@[reducible, inline]

abbrev Morphology.FragmentGrammars.Operational.CorpusCounts (T : Type) [DecidableEq T] (G : ContextFreeGrammar T) [DecidableEq G.NT] :

Type

Convenient abbreviation for the corpus-counts triple consumed by FragmentGrammar.corpusProbGivenStorage: derivation multiset + per-NT table assignment + per-(rule, position) halt counts.

Equations

One or more equations did not get rendered due to their size.

Instances For

source

noncomputable def Morphology.FragmentGrammars.Operational.samplesToCorpusCounts {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (tree : LazyTree G.NT T (ContextFreeRule T G.NT)) (finalState : PYPState G.NT (LazyTree G.NT T (ContextFreeRule T G.NT))) :

CorpusCounts T G

Extract corpus-counts triple (D, Y, Z) from a completed sample. Maps a LazyTree (with PYP final state) into CorpusCounts T G:

D : Multiset (CFGTree T G.NT) — the underlying derivation trees
Y : AdaptorGrammar.TableAssignment G — table-level reuse counts
Z : FragmentGrammar.HaltCounts G — recurse/halt counts per (rule, position)

Status:

Z is real (via LazyTree.collectHaltCounts).
Y is faithful for sampler-reachable slots (via slotToFinpartition using Composition.toOrderedFinpartition): for any slot the sampler can produce (where customerCounts are all positive — addTable initialises to 1, seatCustomer increments), the resulting OrderedFinpartition has the true partition structure, and pypFactor evaluated on it gives its EPPF value. The atomic-fallback branch in slotToFinpartition is only reached for non-wellformed slots (impossible by sampler invariant; future work proves it).
D is real (via LazyTree.toCFGTree?): if the tree projects to a complete CFGTree (no fragment-leaves), D is the singleton multiset of that derivation; otherwise (.fragment somewhere in the tree) D is empty — incomplete samples contribute no derivation.

Equations

One or more equations did not get rendered due to their size.

Instances For

source

@[simp]

theorem Morphology.FragmentGrammars.Operational.stochasticLazyUnfoldDepth_zero {α β R : Type} [DecidableEq α] (recurse : α → PMF (R × List (α ⊕ β))) (recurseProb : α → NNReal) (recurseProb_le : ∀ (x : α), recurseProb x ≤ 1) (start : α) :

stochasticLazyUnfoldDepth recurse recurseProb recurseProb_le 0 start = PMF.pure (LazyTree.fragment start)

Depth-0 base case for the un-memoised unfold: returns PMF.pure (.fragment start).

source

@[simp]

theorem Morphology.FragmentGrammars.Operational.fragmentLambdaDepth_zero {α β R : Type} [DecidableEq α] [Inhabited β] (recurse : α → PMF (R × List (α ⊕ β))) (recurseProb : α → NNReal) (recurseProb_le : ∀ (x : α), recurseProb x ≤ 1) (start : α) (st : PYPState α (LazyTree α β R)) :

fragmentLambdaDepth recurse recurseProb recurseProb_le 0 start st = PMF.pure (LazyTree.fragment start, st)

Depth-0 base case: the sampler at depth 0 is the trivial state-passing pure of LazyTree.fragment start, with no state changes.

This is genuinely operational content: it pins down what the sampler does at the depth-bound boundary, where every branch halts. Provable by rfl because pure in the PYM = StateT _ PMF monad reduces definitionally to fun st => PMF.pure (·, st).

source

theorem Morphology.FragmentGrammars.Operational.fragmentLambdaDepth_zero_marginalises {T : Type} [DecidableEq T] [Inhabited T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : FragmentGrammar G) (recurse : G.NT → PMF (ContextFreeRule T G.NT × List (G.NT ⊕ T))) (recurseProb : G.NT → NNReal) (recurseProb_le : ∀ (x : G.NT), recurseProb x ≤ 1) (hyper : PYPHyper) (start : G.NT) :

(fragmentLambdaDepth recurse recurseProb recurseProb_le 0 start (PYPState.empty hyper)) (LazyTree.fragment start, PYPState.empty hyper) = ENNReal.ofReal (M.corpusProbGivenStorage (samplesToCorpusCounts (LazyTree.fragment start) (PYPState.empty hyper)).1 (samplesToCorpusCounts (LazyTree.fragment start) (PYPState.empty hyper)).2.1 (samplesToCorpusCounts (LazyTree.fragment start) (PYPState.empty hyper)).2.2)

Depth-0 soundness corollary: at depth 0, for the trivial sample (LazyTree.fragment start, st), the PMF mass equals the §3.1.8 density at the empty corpus / empty tables / empty halt-counts triple — both equal to 1.

A .fragment leaf has no branches, so collectHaltCounts returns the zero pair for every (rule, position), making samplesToCorpusCounts.Z extensionally equal to emptyHaltCounts G. The depth-0 sample mass is 1 (PMF.pure), and the empty-corpus density is 1 by corpusProbGivenStorage_empty; equality holds.

This is a real, fully-proved slice of the soundness contract. The general soundness theorem below remains sorry-marked.

source

noncomputable def Morphology.FragmentGrammars.Operational.marginalAtCounts {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (samplerPMF : PMF (LazyTree G.NT T (ContextFreeRule T G.NT) × PYPState G.NT (LazyTree G.NT T (ContextFreeRule T G.NT)))) (target : CorpusCounts T G) :

ENNReal

The marginal mass the sampler puts on samples whose extracted corpus-counts equal target. Defined via PMF pushforward (PMF.map) along samplesToCorpusCounts, which avoids the function-equality DecidableEq issue that an explicit tsum + if-then-else formulation would face for (Multiset × TableAssignment × HaltCounts).

The pushforward is (samplerPMF.map extract) target = ∑' s, μ s · 1[extract s = target].

Equations

One or more equations did not get rendered due to their size.

Instances For

source

theorem Morphology.FragmentGrammars.Operational.fragmentLambdaDepth_marginalises_to_fg {T : Type} [DecidableEq T] [Inhabited T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : FragmentGrammar G) (recurse : G.NT → PMF (ContextFreeRule T G.NT × List (G.NT ⊕ T))) (recurseProb : G.NT → NNReal) (recurseProb_le : ∀ (x : G.NT), recurseProb x ≤ 1) (hyper : PYPHyper) (start : G.NT) (n : ℕ) (target target' : CorpusCounts T G) :

have sampler := fragmentLambdaDepth recurse recurseProb recurseProb_le n start (PYPState.empty hyper); marginalAtCounts sampler target * ENNReal.ofReal (M.corpusProbGivenStorage target'.1 target'.2.1 target'.2.2) = marginalAtCounts sampler target' * ENNReal.ofReal (M.corpusProbGivenStorage target.1 target.2.1 target.2.2)

Soundness contract (general — proportionality form). The marginal mass the sampler puts at corpus-counts triple (D, Y, Z) is proportional to the §3.1.8 density M.corpusProbGivenStorage D Y Z, with the same proportionality constant for all triples. We express this as a ratio identity (avoiding the partition function): for any two triples the cross- products of their marginals and densities agree. This is necessary and sufficient for the marginal to be a normalised version of the density.

Why ratios rather than equality with a normaliser. The natural formulation marginal D Y Z = ENNReal.ofReal (density D Y Z) / Z(M) requires defining Z(M) = ∑'_(D,Y,Z) density D Y Z — a sum over function spaces (TableAssignment, HaltCounts are function types). Convergence of this sum is a real-analysis problem (it sums beta/gamma integrals); the partition function is essentially the marginal likelihood of the fragment-grammar model, which is itself an open computational problem (@cite{odonnell-2015} §3.2 introduces an MH sampler precisely because this constant is intractable). The ratio formulation sidesteps Z(M) entirely and captures the proportionality content directly.

Why this still requires depth-→-∞. At any finite depth n, the sampler's marginals are truncated — supported only on samples that halt within n recursion steps. The §3.1.8 density is the limiting distribution; at finite depth the marginals only approximately match. The proof requires:

Showing (λ n, marginal at depth n) → (λ ε, density-up-to-ε) (the depth-truncated marginals converge to the true §3.1.8 marginals as n → ∞). Almost-sure halting from recurseProb x < 1 for all x gives geometric-tail bounds; pass to the limit via dominated convergence on PMF.
PYP exchangeability to handle the operational sampler's order-of- customer-arrival vs the §3.1.8 joint distribution's order-agnostic claim (see pypDraw's exchangeability caveat).
Identifying the limit's marginal at (D, Y, Z) with the §3.1.8 product formula — induction matching each PYP draw to its AG-factor contribution and each biased-coin flip to its beta-binomial-ratio contribution to M.fgFactor.

Step 1 needs probabilistic-fixed-point machinery for monotone PMF-valued recursions (Knaster–Tarski / Kleene fixed point on ω-CPPOs of sub- probability measures) absent from mathlib. The right formal home for this is mathlib's measure-theory or analysis libraries, not linglib. Steps 2 and 3 are mechanical once Step 1 is in place.

Documentation

Linglib.Theories.Morphology.FragmentGrammars.FragmentLambda

Operational fragment-lambda: a stochastic-lazy unfold sampler #

Pitman–Yor memoisation state #

The PYP-memoised stochastic monad #

`Preserves` algebra: state-property preservation through PYM combinators #

Pitman–Yor draw #

Lazy partial derivation trees #

fragmentLambda — depth-bounded operational unfold #

`fragmentLambdaDepth` — the PYP-memoised §3.1.8 model #

Wellformedness preservation #

Halt-count extraction from samples #

Soundness contract #

Operational fragment-lambda: a stochastic-lazy unfold sampler #

Pitman–Yor memoisation state #

The PYP-memoised stochastic monad #

Preserves algebra: state-property preservation through PYM combinators #

Pitman–Yor draw #

Lazy partial derivation trees #

fragmentLambda — depth-bounded operational unfold #

fragmentLambdaDepth — the PYP-memoised §3.1.8 model #

Wellformedness preservation #

Halt-count extraction from samples #

Soundness contract #

`Preserves` algebra: state-property preservation through PYM combinators #

`fragmentLambdaDepth` — the PYP-memoised §3.1.8 model #