Multinomial probabilistic context-free grammar #
@cite{odonnell-2015}
The baseline stochastic CFG: for each nonterminal A, a multinomial
distribution over A's expansions; the probability of a derivation
tree factorizes as the product of the rule weights it uses, and the
probability of a corpus factorizes across derivations.
This construction predates @cite{odonnell-2015} by decades — see
Booth 1969, Booth & Thompson 1973, and Chi & Geman 1998 for the
canonical literature on PCFGs and per-LHS multinomial structure (these
are not yet in references.bib; cite-key additions deferred).
@cite{odonnell-2015} §3.1.2 gives the didactic treatment we follow
for notation and as the substrate for the §3.1.4–§3.1.8 family of
priors over multinomial PCFGs (DMPCFG, MAG, FG) that build on this
baseline.
Factorization (@cite{odonnell-2015} eq 3.2 + eq 3.5) #
Per @cite{odonnell-2015} eq 3.2, the per-derivation probability is the product of the rule weights it uses:
P(d | G) = ∏_{r ∈ d} θ_r (eq 3.2)
Per @cite{odonnell-2015} eq 3.5, the corpus probability collects these into a single product over rules:
P(D | G) = ∏_{A ∈ V_NT} ∏_{r ∈ R^A} (θ_r^A)^{x_r^A} (eq 3.5)
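The two forms are connected by collecting identical factors: grouping the eq 3.2 product, taken over every derivation in the corpus, by which rule fires — with x_r^A the number of times rule r with LHS A is used across D — yields eq 3.5:

```latex
P(D \mid G)
  = \prod_{d \in D} P(d \mid G)
  = \prod_{d \in D} \prod_{r \in d} \theta_r
  = \prod_{A \in V_{NT}} \prod_{r \in R^A} \left(\theta^A_r\right)^{x^A_r}
```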
derivProb formalizes eq 3.2; corpusProb is its natural lift to a
multiset of derivations. The count-form (eq 3.5) is provably
equivalent to the per-derivation form (corpusProb_eq_prod_pow_count,
deferred — needs derivRuleCount extracted from DMPCFG to a
shared substrate first).
The factorization across derivations (corpusProb_add) is what
distinguishes the multinomial-PCFG baseline from its richer-prior
descendants DMPCFG, MAG, FG, where exchangeable Pólya / PYP /
beta-binomial state couples derivations through shared corpus-aggregate
counts. DOP estimators (DOP1, ENDOP) also factorize via Goodman
1998/2003's PCFG reductions, despite being on the
"non-baseline" side of the substrate; see Comparisons.lean.
Architecture #
MultinomialPCFG G is a single point in weight space: for each LHS,
a PMF over its rule bucket. Mathlib's PMF discipline is genuine
here, not aspirational — normalization is part of what a probability
mass function is, so the previous ruleProb_nonneg /
ruleProb_normalized side conditions disappear and noncomputable is
forced only by PMF's ℝ≥0∞ carrier (not by our use of ℝ).
The forgetful projection to WeightedCFG G ℝ≥0∞
(Core/Computability/WeightedCFG.lean) is toWeightedCFG. The bridge
from richer-prior descendants is the function
DMPCFG.posteriorMAP : DMPCFG G → Multiset _ → MultinomialPCFG G
(DMPCFG.lean): collapse a Dirichlet prior, conditioned on a corpus,
into its MAP point estimate. DMPCFG does not extend
MultinomialPCFG; the two are conceptually distinct objects — a prior
over weight-points versus a single weight-point.
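In Lean signatures, the shapes described above look roughly as follows (a sketch only: the real declarations appear later in this file, the `weight_nonneg` discharge and the `sorry` bodies are illustrative, and the `Multiset _` argument is left elided as in the text):

```lean
-- Sketch of the architecture; illustrative signatures, not the real declarations.
structure MultinomialPCFG (G : CFG) [∀ a : G.NT, Nonempty (G.RulesWithLHS a)] where
  rulePMF : (a : G.NT) → PMF (G.RulesWithLHS a)  -- one PMF per LHS bucket

-- Forgetful projection into the unbundled weighted-CFG substrate;
-- nonnegativity is automatic from the ℝ≥0∞ carrier.
noncomputable def MultinomialPCFG.toWeightedCFG
    (W : MultinomialPCFG G) : WeightedCFG G ℝ≥0∞ :=
  { weight := W.ruleProb, weight_nonneg := fun _ => zero_le _ }

-- Bridge from the richer prior: collapse a corpus-conditioned Dirichlet
-- prior into its MAP point estimate (a single weight-point).
noncomputable def DMPCFG.posteriorMAP :
    DMPCFG G → Multiset _ → MultinomialPCFG G := sorry
```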
The structure requires [∀ a : G.NT, Nonempty (G.RulesWithLHS a)]:
PMFs over empty supports don't exist, so grammars with "useless"
nonterminals (no expansion) cannot carry a MultinomialPCFG. This
constraint is now structural rather than implicit (the previous
ruleProb_normalized field demanded sum = 1 for every a, which was
unsatisfiable when the LHS bucket was empty).
Main definitions #
- MultinomialPCFG G — per-LHS PMF over the LHS rule bucket.
- MultinomialPCFG.ruleProb — per-rule probability (PMF mass on the rule's LHS bucket when r ∈ G.rules, else 0).
- MultinomialPCFG.toWeightedCFG — forgetful projection to WeightedCFG G ℝ≥0∞.
- MultinomialPCFG.derivProb — per-derivation probability, recursive product of rule weights through the tree (eq 3.2).
- MultinomialPCFG.corpusProb — corpus probability as the product of per-derivation probabilities.
Main theorems #
- MultinomialPCFG.corpusProb_add — multiplicativity over disjoint corpora (the Lean content of the "derivations are independent" claim).
- MultinomialPCFG.corpusProb_zero — empty corpus has probability 1.
References #
- @cite{odonnell-2015} §3.1.2 (eq 3.2 + eq 3.5).
A multinomial PCFG over G: for each nonterminal a, a PMF over
the rules with LHS a. Per @cite{odonnell-2015} §3.1.2.
The structure carries normalization as a structural property via
mathlib's PMF (the partition function is bundled into what a PMF
is), eliminating the hand-rolled ruleProb_nonneg / ruleProb_normalized
side conditions that the previous shape carried.
Requires [∀ a, Nonempty (G.RulesWithLHS a)] because PMF.ofFintype
fails on empty supports — grammars with "useless nonterminals" (no
rules) cannot carry a MultinomialPCFG.
- rulePMF (a : G.NT) : PMF (G.RulesWithLHS a)
Per-LHS distribution over the rules sharing that LHS.
Per-rule probability under a multinomial PCFG: the PMF mass at the
rule's LHS bucket if r ∈ G.rules, else 0. The case split is on
membership in the bucket G.rules.filter (·.input = r.input), which
is equivalent to r ∈ G.rules (the LHS-equality side is trivially
true).
Returns ℝ≥0∞ rather than ℝ so that nonnegativity is structural
and the value composes with mathlib's PMF API directly. The ℝ
view is ENNReal.toReal ∘ ruleProb.
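A sketch of that case split (the `dite` carries the bucket-membership proof into the subtype; `G.Rule` and the `ruleProbReal` helper are assumed spellings, not declarations from this file):

```lean
-- Sketch: dite on bucket membership; the branch proof feeds the subtype.
noncomputable def ruleProb (W : MultinomialPCFG G) (r : G.Rule) : ℝ≥0∞ :=
  if h : r ∈ G.rules.filter (·.input = r.input) then
    W.rulePMF r.input ⟨r, h⟩  -- PMF mass at r within its LHS bucket
  else 0                      -- rules outside G.rules get weight 0

-- The ℝ view described above (hypothetical helper name):
noncomputable def ruleProbReal (W : MultinomialPCFG G) (r : G.Rule) : ℝ :=
  (W.ruleProb r).toReal
```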
The per-rule probability is 0 for rules outside the grammar.
The MultinomialPCFG only assigns mass to rules of G; anything
outside is implicitly weight 0.
The per-rule probability matches the PMF mass when r ∈ G.rules.
The bucket-membership proof is reconstructed from r ∈ G.rules
plus the trivial r.input = r.input.
Forgetful projection to the unbundled weighted-CFG substrate
(Core/Computability/WeightedCFG.lean). The weights live in ℝ≥0∞,
with nonnegativity automatic by typing.
This is the projection promised by the previous-version docstring
("the unbundled 'weighted CFG' is genuinely useful for theories
where weights are not yet normalized; will be introduced when the
first such consumer arrives") — DMPCFG is that consumer; this is
the projection from MultinomialPCFG into the new shared substrate.
Equations
- W.toWeightedCFG = { weight := W.ruleProb, weight_nonneg := ⋯ }
Per-derivation probability under a multinomial PCFG
(@cite{odonnell-2015} eq 3.2). Recurses on the tree structure:
each internal node contributes the weight of the rule it
instantiates, leaves contribute 1. Invalid rules (those not in
G.rules) contribute 0 via ruleProb's default.
Equations
- W.derivProb (CFGTree.leaf t) = 1
- W.derivProb (CFGTree.node nt cs) = W.ruleProb { input := nt, output := List.map CFGTree.rootSymbol cs } * W.derivProbList cs
Product of derivation probabilities over a list of subtrees.
Equations
- W.derivProbList [] = 1
- W.derivProbList (t :: ts) = W.derivProb t * W.derivProbList ts
Corpus probability of a multiset of derivations: the product of their
individual derivProb values. By construction this factorizes
multiplicatively over disjoint corpora — see corpusProb_add. The
count-form (@cite{odonnell-2015} eq 3.5) is mathematically equivalent
but deferred (corpusProb_eq_prod_pow_count) until derivRuleCount
is extracted from DMPCFG to a shared substrate file.
Equations
- W.corpusProb D = (Multiset.map W.derivProb D).prod
The empty corpus has probability 1 — the empty product.
Multiplicativity over disjoint corpora: the corpus probability of a
union of two sub-corpora is the product of their corpus
probabilities. This is the Lean content of the "derivations are
independent" claim that distinguishes multinomial PCFGs from their
richer-prior descendants (DMPCFG, MAG, FG) — see
Comparisons.lean for the formal contrast.
Trivially true here by Multiset.prod_add; the content is that
the analogous theorem for DMPCFG.corpusProb fails (because the
Pólya factor couples derivations through shared rule counts).
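Given the `corpusProb` equation above, the proof is the promised one-liner (a sketch; the `CFGTree G` spelling is an assumption, and the cited lemma names are from mathlib's `Multiset` API):

```lean
theorem corpusProb_add (W : MultinomialPCFG G) (D₁ D₂ : Multiset (CFGTree G)) :
    W.corpusProb (D₁ + D₂) = W.corpusProb D₁ * W.corpusProb D₂ := by
  -- map distributes over +, then the product of a sum of multisets splits.
  simp [corpusProb, Multiset.map_add, Multiset.prod_add]
```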
Concrete instances #
The uniform multinomial PCFG: each LHS bucket gets the
uniform distribution over its rules. The canonical baseline
instance — the one a maximum-entropy estimator returns at zero
data, useful as a reference point and as the Inhabited witness.
Maximum-entropy property: for each LHS, this distribution maximizes
PMF.entropy over the rule bucket.
Equations
- Morphology.FragmentGrammars.MultinomialPCFG.uniform = { rulePMF := fun (a : G.NT) => PMF.uniformOfFintype (G.RulesWithLHS a) }
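The maximum-entropy property claimed above can be stated as a comparison against an arbitrary weight-point at the same LHS (a hypothetical spelling — the theorem name and its proof are not part of this file):

```lean
-- Hypothetical statement of the maximum-entropy property; proof deferred.
theorem uniform_lhsEntropy_maximal (W : MultinomialPCFG G) (a : G.NT) :
    W.lhsEntropy a ≤ (MultinomialPCFG.uniform : MultinomialPCFG G).lhsEntropy a := by
  sorry
```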
Information-theoretic primitives (bridge to Entropy) #
The per-LHS PMFs let us inherit mathlib's PMF entropy / KL machinery
directly. These primitives are the integration bridge to processing-cost
theories (Theories/Processing/MemorySurprisal/): rule-weight entropy
gives the local uncertainty at each nonterminal expansion choice, and
KL between two MultinomialPCFGs measures how different their
rule-weight predictions are.
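A KL primitive is not defined in this section; a hypothetical per-LHS shape, assuming some PMF-level divergence `PMF.klDiv` were available (an assumed name, not confirmed mathlib API), would be:

```lean
-- Hypothetical: `PMF.klDiv` is an assumed primitive, not confirmed API.
noncomputable def lhsKL (W₁ W₂ : MultinomialPCFG G) (a : G.NT) : ℝ≥0∞ :=
  PMF.klDiv (W₁.rulePMF a) (W₂.rulePMF a)
```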
Per-LHS entropy: entropy of the PMF over the rules with the given
LHS. The local "uncertainty" of which expansion will be chosen for
nonterminal a. Inherited via PMF.entropy.
Equations
- W.lhsEntropy a = (W.rulePMF a).entropy
Entropy is nonnegative — a direct corollary of PMF.entropy_nonneg.
Count-form factorization (@cite{odonnell-2015} eq 3.5) #
The count-form factorization (@cite{odonnell-2015} eq 3.5).
For a corpus D of valid derivation trees, the corpus probability
collects rule contributions into a single product over the grammar's
rules raised to their corpus counts:
P(D | G) = ∏_{r ∈ G.rules} (W.ruleProb r) ^ (corpusRuleCount r D)
The validity precondition is essential: invalid trees use rules
outside G.rules, so the per-derivation product derivProb collapses
to 0 (via ruleProb's default), but the count-form product over
G.rules ignores those rules and would compute a nonzero value.
For valid trees the two coincide.
Status: stated, sorried. Proof requires (1) per-tree count-form
derivProb t = ∏_{r ∈ G.rules} ruleProb r ^ ruleCount r t for valid
t, by mutual induction on CFGTree/derivProb's mutual structure;
then (2) Multiset induction on D using corpusProb_add +
corpusRuleCount_add + Finset.prod_pow_add. The mutual induction
in (1) is the hard part — Lean's auto-generated derivProb.induct
needs careful handling. Architecture is in place; mechanical proof
deferred to a focused session.
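The stated (sorried) shape, per the plan above, is roughly the following (a sketch: `Valid`, `corpusRuleCount`, the `CFGTree G` spelling, and the Finset product over `G.rules` are assumed from the surrounding text, not confirmed declarations):

```lean
theorem corpusProb_eq_prod_pow_count (W : MultinomialPCFG G)
    (D : Multiset (CFGTree G)) (hD : ∀ t ∈ D, t.Valid) :
    W.corpusProb D =
      ∏ r ∈ G.rules, W.ruleProb r ^ corpusRuleCount r D := by
  sorry  -- (1) per-tree count-form by mutual induction; (2) Multiset induction.
```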