Documentation

Linglib.Theories.Morphology.FragmentGrammars.Comparisons

FG-family cross-model comparisons #

@cite{odonnell-2015}

Theorems that pit MultinomialPCFG, DMPCFG, and (eventually) MAG / FG against each other on shared mathematical content. Anchored on @cite{odonnell-2015} §3.1 ontology of priors over multinomial PCFGs.

Architecture #

These theorems live here, not in MultinomialPCFG.lean or DMPCFG.lean, because they reason about both substrates and would otherwise force one to import the other unnecessarily. Mirrors the mathlib pattern of */Comparison.lean files (e.g. Mathlib/Order/MinMax.lean patterns relating different order structures).

Main theorems #

theorem Morphology.FragmentGrammars.dmpcfg_corpus_strictly_couples {T : Type} [DecidableEq T] :
∃ (G : ContextFreeGrammar T) (x : DecidableEq G.NT) (x_1 : ∀ (a : G.NT), Nonempty (G.RulesWithLHS a)) (M : DMPCFG G) (D : Multiset (CFGTree T G.NT)), ENNReal.ofReal (M.corpusProb D) (M.posteriorMAP D).corpusProb D

The non-factorization counterexample. There exists a DMPCFG and a corpus whose DMPCFG corpus probability differs from the multinomial-PCFG corpus probability of the corresponding posterior MAP estimate.

This is the formal Lean witness of the "exchangeable but not independent" claim that distinguishes DMPCFG from MultinomialPCFG. The two assertions in the file docstrings of MultinomialPCFG.lean ("factorization is what distinguishes the multinomial-PCFG baseline from its richer-prior descendants") and DMPCFG.lean ("P(D | M) ≠ ∏_d P(d | M) — DMPCFG derivations are exchangeable but not independent") collapse onto this single existential.

Proof obligation (sorried for now). Take any 2-rule-per-LHS toy grammar; pick pseudo constant on those two rules; pick a corpus of two derivations exercising the two rules. The DMPCFG side produces a Pólya factor Γ(π_total)/Γ(π_total+2) · Γ(π_1+1)Γ(π_2+1) / Γ(π_1)Γ(π_2) which strictly differs from the per-derivation product (mapWeight r₁) · (mapWeight r₂) of MAP weights.

The infrastructure is in place: DMPCFG.posteriorMAP, MultinomialPCFG.corpusProb, and DMPCFG.corpusProb are all defined; what's missing is the numeric witness and the explicit Pólya-vs-product computation.