Fragment grammars (FG) — the central proposal #
@cite{odonnell-2015}
The "inference-based" model of @cite{odonnell-2015} §2.4.4 / §3.1.8,
and the central theoretical proposal of the book. A FragmentGrammar
extends an AdaptorGrammar by adding a per-rule-RHS-position
beta-binomial halt prior: at each nonterminal slot in each rule's
right-hand side, a Beta-distributed weight ν controls the
probability of recursing (productively expanding the slot) vs.
halting (storing the slot as an open NT in a memoized fragment).
This generalizes both DMPCFG (ν → 1 everywhere: always recurse,
fragments are depth-one) and MAG (ν decided once per nonterminal,
not per slot: fragments are full subtrees). FG learns a separate
recursion probability per (rule, position), letting fragments be
arbitrary partial trees with selective open NT slots — the
distinguishing feature of the book's account of productivity-and-
reuse.
Per @cite{odonnell-2015} §3.1.8 the corpus probability factorizes as
fg(X, Y, Z; F) = ∏_{A ∈ V} [DMPCFG-factor on X^A] · [PYP-factor on Y^A]
· ∏_{r ∈ R_G} ∏_{B ∈ rhs(r)}
B(ψ_B + z_{r,B}^↻, ψ'_B + z_{r,B}^⊥) /
B(ψ_B, ψ'_B)
where Z = z_{r,B}^{↻/⊥} is the corpus-aggregate count of recurse/
halt decisions at each (rule, RHS-position) pair, and B(·,·) is
the Beta function (here written as a ratio of Γ-products to avoid
a dependency on Mathlib.Probability.Distributions.Beta).
Why corpus probability is (D, Y, Z) → ℝ #
Z is latent — the recurse/halt assignment per fragment is part of
the MAP analysis, not the observed corpus. Same situation as Y in
AdaptorGrammar. Marginalizing over (Y, Z) is the MH inference
target distribution of @cite{odonnell-2015} §3.2 — out of scope
per the Processing-scope rule.
What we inherit from AdaptorGrammar #
FragmentGrammar G extends AdaptorGrammar G, so the entire DMPCFG
- PYP infrastructure is inherited:
pseudo,pseudo_pos,lhsUrn,lhsCounts,lhsFactor,pyp,pypFactor,corpusProbGivenTables, all the positivity proofs. FG only adds the beta-binomial halt prior and the per-(rule, position) factor.
Main definitions #
FragmentGrammar G— extendsAdaptorGrammar Gwith per-(rule, RHS-position) beta-binomial pseudo-counts (haltPriorRecurse,haltPriorHalt).FragmentGrammar.HaltCounts— abbrev for the latent recurse/halt count assignment.FragmentGrammar.fgFactor— per-(rule, position) beta-binomial factor.FragmentGrammar.corpusProbGivenStorage— eq from §3.1.8, conditional on table assignmentYand halt countsZ.
§2.3.7 vs §3.1.8 — sampler vs distribution #
This file is the distribution side: corpusProbGivenStorage is the
density fg(F; F) of @cite{odonnell-2015} §3.1.8 (p. 94). The
sampler that draws from this density is in the sibling file
FragmentLambda.lean, scaffolding @cite{odonnell-2015} §2.3.7's
Church macro (fragment-lambda args body) (Figure 2.21, p. 71). The
two are linked by the soundness contract
fragmentLambdaDepth_marginalises_to_fg, which equates the sampler's
output marginal with the §3.1.8 density.
Per @cite{odonnell-2015} §3.1.8 (p. 92) the substrate here implements
the actual model used in the rest of the book — biased halt coin
BINOMIAL(ν) with ν itself drawn from a Beta prior — not the
fair-coin presentation of §2.3.6. The §2.3.7 fair-coin macro is
recovered as the haltPriorRecurse = haltPriorHalt = 1 special case
(Beta(1,1) = Uniform).
References #
- @cite{odonnell-2015} §2.4.4, §3.1.8.
A fragment grammar over G: an adaptor grammar augmented with a
per-(rule, RHS-position) beta-binomial halt prior. At each
nonterminal slot in each rule's right-hand side, the weight ν of
recursing-vs-halting is Beta-distributed with pseudo-counts
(haltPriorRecurse r i, haltPriorHalt r i).
Allowing per-slot halt decisions (rather than per-nonterminal as in MAG) lets fragments be arbitrary partial trees — the central theoretical commitment of @cite{odonnell-2015}.
- pseudo : ContextFreeRule T G.NT → ℝ
- pyp : G.NT → ProbabilityTheory.PitmanYor
- haltPriorRecurse : ContextFreeRule T G.NT → ℕ → ℝ
Beta pseudo-count for "recurse" outcome at (rule, RHS position).
- haltPriorHalt : ContextFreeRule T G.NT → ℕ → ℝ
Beta pseudo-count for "halt" outcome at (rule, RHS position).
Recurse pseudo-counts are strictly positive.
Halt pseudo-counts are strictly positive.
Instances For
A halt-count assignment gives, for each rule r and each
RHS-position i, the corpus-aggregate counts of "recurse" and
"halt" decisions at that slot. This is the latent variable Z in
@cite{odonnell-2015} §3.1.8.
Consistency between Z, Y (table assignment), and the observed
corpus D is the responsibility of the caller (or of the MAP
inference step we do not formalize).
Equations
- Morphology.FragmentGrammars.FragmentGrammar.HaltCounts G = (ContextFreeRule T G.NT → ℕ → ℕ × ℕ)
Instances For
Per-(rule, position) beta-binomial factor in the eq from §3.1.8.
For pseudo-counts α = haltPriorRecurse r i, β = haltPriorHalt r i,
and observed counts (zr, zh) = Z r i,
B(α + zr, β + zh) / B(α, β)
= Γ(α+zr)·Γ(β+zh)·Γ(α+β) / (Γ(α+β+zr+zh)·Γ(α)·Γ(β)) .
Inlined via Real.Gamma rather than mathlib's
ProbabilityTheory.beta to avoid pulling in
Mathlib.Probability.Distributions.Beta.
Equations
- One or more equations did not get rendered due to their size.
Instances For
FG corpus probability conditional on a table assignment Y and a
halt-count assignment Z. Per @cite{odonnell-2015} §3.1.8:
fg(X, Y, Z; F) = ag(X, Y; A) · ∏_r ∏_i fgFactor r i (Z r i)
The AG part is inherited from AdaptorGrammar.corpusProbGivenTables
(via extends); FG adds only the per-(rule, position) Beta-
binomial factors.
Equations
- M.corpusProbGivenStorage D Y Z = M.corpusProbGivenTables D Y * ∏ r ∈ G.rules, ∏ i ∈ Finset.range r.output.length, M.fgFactor r i (Z r i)
Instances For
The per-(rule, position) factor is strictly positive.
FG corpus probability is nonnegative on any storage assignment.
The "empty" halt-count assignment: zero recurse-decisions and
zero halt-decisions at every (rule, position) pair. The
corresponding fgFactor evaluates to 1 because the Beta-
binomial ratio is B(α, β) / B(α, β) = 1.
Equations
- Morphology.FragmentGrammars.FragmentGrammar.emptyHaltCounts G x✝¹ x✝ = (0, 0)
Instances For
The per-(rule, position) factor at zero counts is 1: the
Beta-binomial ratio degenerates to B(α, β) / B(α, β).
FG corpus probability is 1 on the empty corpus paired with
the empty table assignment and empty halt-count assignment.
Each fgFactor is 1, the AG factor is 1, so the overall
product is 1.
Bridge: FragmentGrammar → MultinomialPCFG via posterior MAP #
FG = AG + per-(rule, RHS-position) beta-binomial halt prior. Like
PYP in AG, the halt-prior factor is multiplicative in
corpusProbGivenStorage and does not enter per-LHS rule-weight
inference. So FG's posterior MAP rule weights factor through AG
which factors through DMPCFG.
The "tower equality" FG.posteriorMAP D = AG.posteriorMAP D = DMPCFG.posteriorMAP D is structural — every layer above DMPCFG in
the prior tower is purely additive on rule-weight inference. This
is the architectural payoff: the family-of-priors structure
(extends chain) is genuinely "richer hyperparameter sets," not
"different rule-weight semantics."
The Dirichlet posterior MAP collapse for FG. Same one-liner as
AG, factoring transitively through extends: FG → AG → DMPCFG.
The beta-binomial halt prior contributes a multiplicative factor
to corpus probability but does not alter rule-weight inference.
Equations
- M.posteriorMAP D = M.posteriorMAP D
Instances For
Coherence (one step). FG's posterior-MAP agrees with AG's
posterior-MAP. Holds by rfl — FG only adds halt priors on top
of AG; rule-weight inference is unchanged.
Coherence (two steps). FG's posterior-MAP agrees with the
underlying DMPCFG's posterior-MAP. The "prior tower" of length
three collapses to a single canonical MultinomialPCFG by rfl.
Conjugacy decomposition (mirror of AG / DMPCFG) #
FG's posterior delegates through AG which delegates through DMPCFG.
Halt-prior pseudo-counts are unchanged by corpus updates (the
beta-binomial layer doesn't see corpus rule counts). Mode delegates
to the underlying DMPCFG. The decomposition holds by rfl at all
three layers, witnessing the rfl-tower discovery (the prior tower's
extends-chain is structurally orthogonal to rule-weight inference).
FG's posterior update: bump the underlying AG/DMPCFG, leave halt-prior pseudo-counts unchanged.
Equations
- M.posterior D = { toAdaptorGrammar := M.posterior D, haltPriorRecurse := M.haltPriorRecurse, haltPriorHalt := M.haltPriorHalt, haltPriorRecurse_pos := ⋯, haltPriorHalt_pos := ⋯ }
Instances For
FG's mode in MultinomialPCFG-space: delegates through AG to
DMPCFG. Halt-prior pseudo-counts contribute no rule-weight mass.
Instances For
Empty corpus = identity update (delegated to AG.posterior_zero).
Incrementality at the FG layer.
The conjugacy decomposition at FG layer. posteriorMAP = mode ∘ posterior, holding by rfl because all FG operations
delegate through DMPCFG.