Adaptor grammars (Maximum a posteriori variant: MAG) #

@cite{odonnell-2015}

The "full-listing" model of @cite{odonnell-2015} §2.4.2 / §3.1.7: a Dirichlet–multinomial PCFG (rule weights with a conjugate Dirichlet prior) augmented with a per-LHS Pitman–Yor process (PYP) that memoizes subtree expansions. Each nonterminal A carries a PYP "restaurant" whose tables hold previously computed subderivations rooted at A. A new derivation either reuses a stored subtree (with probability proportional to that table's current occupancy, less the Pitman–Yor discount) or productively computes a fresh subtree from the underlying CFG.
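The reuse-or-compute choice can be illustrated with the standard Pitman–Yor seating rule. This is a minimal sketch, not the formalization itself: the function name `seating_probabilities` and the argument names are hypothetical, and it assumes the usual parameterization with discount α and concentration θ.

```python
from fractions import Fraction

def seating_probabilities(occupancy, alpha, theta):
    """Return (reuse_probs, fresh_prob) for the next use of a nonterminal.

    occupancy: current table sizes n_1..n_K (each >= 1).
    alpha: discount (0 <= alpha < 1); theta: concentration (theta > -alpha).
    These names are illustrative assumptions, not identifiers from the source.
    """
    alpha, theta = Fraction(alpha), Fraction(theta)
    if not occupancy:
        # The first customer always opens a new table.
        return [], Fraction(1)
    n, k = sum(occupancy), len(occupancy)
    denom = theta + n
    # Reuse table i with probability (n_i - alpha) / (theta + n).
    reuse = [(n_i - alpha) / denom for n_i in occupancy]
    # Compute a fresh subtree with probability (theta + k * alpha) / (theta + n).
    fresh = (theta + k * alpha) / denom
    return reuse, fresh

reuse, fresh = seating_probabilities([3, 1], Fraction(1, 2), 1)
print(reuse, fresh)
```

With α = 0 the discount vanishes and reuse is exactly proportional to table occupancy.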

Per @cite{odonnell-2015} §3.1.7 the corpus probability factorizes per-LHS as

ag(X, Y; A) = ∏_{A ∈ V} [DMPCFG-factor for A on X^A]
                       · [PYP-factor for A on Y^A]

where X^A is the rule-count vector for LHS A (the same data DMPCFG consumes) and Y^A is the table-occupancy assignment for A's restaurant — formally a set partition of [N^A] (the N^A uses of NT A in the corpus, partitioned by which table each use sat at).
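As a rough illustration of the per-LHS factorization, the sketch below multiplies a Dirichlet-multinomial factor by a PYP factor for each nonterminal. Everything here is an assumption made for illustration: the names `dm_factor` and `corpus_prob_given_tables`, the toy numbers, and the choice to score one ordered sequence of rule choices (Pólya urn form, no multinomial coefficient). The PYP factor is passed in as a precomputed number, and no consistency between rule counts and table data is enforced.

```python
from fractions import Fraction
from math import prod

def rising(x, m):
    """Rising factorial x (x + 1) ... (x + m - 1); the empty product is 1."""
    out = Fraction(1)
    for j in range(m):
        out *= x + j
    return out

def dm_factor(counts, pseudo):
    """Polya-urn probability of one ordered sequence with the given rule counts."""
    numerator = prod((rising(a, x) for x, a in zip(counts, pseudo)),
                     start=Fraction(1))
    return numerator / rising(sum(pseudo), sum(counts))

def corpus_prob_given_tables(per_lhs):
    """per_lhs maps each nonterminal to (rule counts, pseudo-counts, PYP factor)."""
    return prod((dm_factor(c, p) * y for c, p, y in per_lhs.values()),
                start=Fraction(1))

# Hypothetical two-nonterminal grammar; all numbers are made up.
toy = {
    "S": ([2, 1], [1, 1], Fraction(1, 8)),
    "NP": ([1, 0], [1, 2], Fraction(1)),
}
print(corpus_prob_given_tables(toy))
```

Because the product ranges over nonterminals independently, each factor can be checked in isolation.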

Why a set partition (not a multiset)? #

@cite{odonnell-2015} writes Y^A = ȳ^A as a "count vector of reused derivations stored on each table". Tables in O'Donnell's PYP are labeled (table 1, 2, ...): each table stores a specific subderivation, and different tables store different ones (line 91-92 of the book). The natural mathematical object is therefore "for each NT-A use in the corpus, which table did it go to?", i.e. a customer-to-table assignment function. Each set partition of [N^A] corresponds to exactly one labeled assignment under any canonical labeling convention.
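The passage from a labeled assignment to a set partition can be sketched as follows. The helper `to_set_partition` is hypothetical; it orders blocks by their greatest element as one possible canonical convention, and uses 0-based customer indices.

```python
def to_set_partition(assignment):
    """Group customer indices by table label, then order blocks by greatest element.

    assignment[i] is the (arbitrary) label of the table customer i sat at.
    """
    blocks = {}
    for customer, table in enumerate(assignment):
        blocks.setdefault(table, []).append(customer)
    return sorted((tuple(b) for b in blocks.values()), key=max)

# Two labelings of the same seating give the same set partition.
a = ["t1", "t2", "t1", "t3", "t2"]
b = ["x", "y", "x", "z", "y"]
print(to_set_partition(a))
print(to_set_partition(a) == to_set_partition(b))
```

The table labels disappear in the result; only which customers share a table survives.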

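The EPPF formula ([θ + α]_{K-1, α} · ∏_{i=1}^{K} [1 - α]_{n_i - 1, 1}) / [θ + 1]_{N - 1, 1} (@cite{pitman-2006} Thm 3.2 / @cite{odonnell-2015} eq from §3.1.7) is precisely the probability of one such specific set partition. Here K is the number of blocks, n_i is the size of block i, N = Σ n_i, and [x]_{m, a} = x(x + a)⋯(x + (m - 1)a) is the generalized rising factorial (the empty product is 1). By the EPPF's symmetry the value depends only on the multiset of block sizes, but the underlying random variable is the set partition itself.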
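For a concrete feel for the formula, here is a small exact-arithmetic sketch. The names `rising` and `eppf` are hypothetical, and the block sizes and parameters are arbitrary illustrative values.

```python
from fractions import Fraction

def rising(x, m, step=1):
    """[x]_{m,step} = x (x + step) ... (x + (m - 1) step); empty product is 1."""
    out = Fraction(1)
    for j in range(m):
        out *= x + j * step
    return out

def eppf(block_sizes, alpha, theta):
    """Probability of one specific set partition with the given block sizes."""
    alpha, theta = Fraction(alpha), Fraction(theta)
    k, n = len(block_sizes), sum(block_sizes)
    numerator = rising(theta + alpha, k - 1, alpha)
    for n_i in block_sizes:
        numerator *= rising(1 - alpha, n_i - 1)
    return numerator / rising(theta + 1, n - 1)

print(eppf([2, 1], Fraction(1, 2), 1))
print(eppf([1, 2], Fraction(1, 2), 1))
```

Permuting the block sizes leaves the value unchanged, which is the symmetry the text refers to.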

We model TableAssignment as G.NT → Σ n, OrderedFinpartition n, using mathlib's OrderedFinpartition n (the structure mathlib uses for Faà di Bruno; it represents a set partition of Fin n with a canonical labeling by greatest element).

OrderedFinpartition is chosen over Finpartition for the proof of sum_partitionProb_ord_eq_one: mathlib's OrderedFinpartition.extendEquiv gives the seating-plan bijection (c : OrderedFinpartition n) × Option (Fin c.length) ≃ OrderedFinpartition (n+1) out of the box, exactly matching Pitman's (α, θ) seating-plan recursion. Finpartition lacks this bijection lemma. Both types have the same number of inhabitants (one per set partition of [n]) and partitionProb depends only on the block-size multiset, so the choice is purely about which lemmas come for free.

pypFactor extracts the block-size multiset (via OrderedFinpartition.toNatPartition) to evaluate the EPPF.
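The seating-plan recursion behind the sum-to-one argument can be mimicked numerically. The sketch below is only an analogy to the Lean proof, with hypothetical names `extend` and `seating_plan`: each partition of n customers is extended by either opening a new block (the `None` case) or joining an existing block, and probabilities are multiplied along the way. Unlike mathlib's convention, blocks here are kept in order of creation rather than ordered by greatest element.

```python
from fractions import Fraction

def extend(partition, choice):
    """Seat the next customer: None opens a new block, an int joins that block."""
    n = sum(len(b) for b in partition)
    if choice is None:
        return partition + ((n,),)
    return tuple(b + (n,) if i == choice else b for i, b in enumerate(partition))

def seating_plan(n, alpha, theta):
    """Map each set partition of {0, ..., n-1} to its seating-plan probability."""
    alpha, theta = Fraction(alpha), Fraction(theta)
    dist = {(): Fraction(1)}
    for m in range(n):
        nxt = {}
        for part, p in dist.items():
            k = len(part)
            for choice in [None, *range(k)]:
                if m == 0:
                    w = Fraction(1)  # first customer always opens a table
                elif choice is None:
                    w = (theta + k * alpha) / (theta + m)
                else:
                    w = (len(part[choice]) - alpha) / (theta + m)
                nxt[extend(part, choice)] = p * w
        dist = nxt
    return dist

dist = seating_plan(4, Fraction(1, 2), 1)
print(len(dist), sum(dist.values()))
```

Each (partition, choice) pair yields a distinct partition of one more customer, so the number of keys follows the Bell numbers and the total mass stays at 1 at every step.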

Why corpus probability is `(D, Y) → ℝ`, not `D → ℝ` #

Y^A is latent — it is part of the MAP analysis, not part of the observed corpus. To reduce to D → ℝ we would have to marginalize over all possible Y, which is exactly the MH inference target distribution of @cite{odonnell-2015} §3.2.1 — out of scope per the "Processing scope" rule (we formalize target distributions, not inference algorithms). The honest signature is (D, Y) → ℝ: the closed-form probability given a particular table assignment.

The mathlib analog is ProbabilityTheory.Kernel: AG defines a kernel from corpora to table assignments, and the marginal distribution on corpora is kernel.bind against the prior. This is a perfectly well-defined object that we do not compute.

What we inherit from `DMPCFG` #

AdaptorGrammar G extends DMPCFG G, so pseudo, pseudo_pos, lhsUrn, lhsCounts, lhsFactor, lhsFactor_pos are all inherited. AG adds only the per-LHS Pitman–Yor process and the PYP factor.

Main definitions #

AdaptorGrammar G — extends DMPCFG G with per-LHS Pitman–Yor.
AdaptorGrammar.TableAssignment — abbrev for the latent table data: per LHS, a set partition of [N^A].
AdaptorGrammar.pypFactor — per-LHS Pitman–Yor partition probability (EPPF evaluated on the block-size multiset).
AdaptorGrammar.corpusProbGivenTables — eq from §3.1.7, conditional on a table assignment.

References #

@cite{odonnell-2015} §2.4.2, §3.1.7.
@cite{pitman-2006} §3.2 (EPPF and the (α, θ) seating plan).


Bridge: AdaptorGrammar → MultinomialPCFG via posterior MAP #

Conjugacy decomposition (mirror of DMPCFG) #
