Documentation

Linglib.Theories.Morphology.FragmentGrammars.DMPCFG

Dirichlet–multinomial PCFG (DMPCFG) #

@cite{odonnell-2015}

The "full-parsing" model of @cite{odonnell-2015} §2.4.1 / §3.1.4: a PCFG with a Dirichlet prior on the per-LHS rule-weight vector. Marginalizing out the rule weights yields the closed-form corpus probability of eq 3.9 — the Dirichlet–multinomial likelihood.

P(D | M) = ∏_{A ∈ V}
  [ Γ(Σ_i π_i^A) / Γ(Σ_i π_i^A + Σ_i x_i^A)
    · ∏_i Γ(π_i^A + x_i^A) / Γ(π_i^A) ]

where π_i^A are the Dirichlet hyperparameters (pseudo-counts) for the i-th rule with LHS A, and x_i^A is the count of that rule across the corpus D.

Why DMPCFG does not factorize #

The crucial difference from a multinomial PCFG (MultinomialPCFG) is that the same rule's count enters the same Pólya term across all derivations in the corpus. Two derivations using rule r are correlated through the shared x_r in the closed form. As a result, P(D | M) ≠ ∏_d P(d | M) — DMPCFG derivations are exchangeable but not independent. The corpus probability is a corpus-level function; there is no clean per-derivation factorization to expose.

Connection to PolyaUrn #

The per-LHS factor lhsFactor M D A IS the closed-form Pólya-urn per-sequence likelihood over rules with LHS A, using M.pseudo restricted to those rules. The Type-polymorphic PolyaUrn α from Linglib.Core.Probability.PolyaUrn is constructed per-LHS via lhsUrn; positivity of corpusProb is derived structurally from PolyaUrn.seqProb_pos. (We use seqProb, not the count-vector PMF polyaUrnPMFReal, because a corpus IS a specific labeled sequence of derivations, not a draw from the unlabeled-count distribution.)

Main definitions #

References #

structure Morphology.FragmentGrammars.DMPCFG {T : Type} [DecidableEq T] (G : ContextFreeGrammar T) [DecidableEq G.NT] :

A Dirichlet–multinomial PCFG over G: per-rule pseudo-counts that parametrize a Dirichlet prior on the per-LHS rule-weight vector.

Per @cite{odonnell-2015} §2.4.1 the conditional distribution over weights given a corpus is again Dirichlet (conjugacy). The "MAP weights" (CL convention) are taken to be the Dirichlet posterior mean (π + x) / Σ(π + x), proportional to corpus rule frequencies.

Terminology note: in strict Bayesian usage MAP = mode of posterior over θ, with formula (π - 1) / Σ(π - 1) requiring π > 1 everywhere. For the symmetric Dirichlet priors typical in CL (α ≤ 1) the strict mode is at the simplex boundary, so practitioners use the posterior mean (= posterior predictive probability) and call it "MAP" loosely. @cite{odonnell-2015} follows this convention; we do too. The strict mode is also formalized as posteriorMode; see that section for the ordering-equivalence theorem (mapWeight_lt_iff_posteriorMode_lt) that shows the loose convention is harmless for ordering claims.

  • pseudo : ContextFreeRule T G.NT

    Per-rule pseudo-count (Dirichlet hyperparameter).

  • pseudo_pos (r : ContextFreeRule T G.NT) : r G.rules0 < self.pseudo r

    Pseudo-counts are strictly positive on grammar rules.

Instances For
    theorem Morphology.FragmentGrammars.DMPCFG.ext {T : Type} {inst✝ : DecidableEq T} {G : ContextFreeGrammar T} {inst✝¹ : DecidableEq G.NT} {x y : DMPCFG G} (pseudo : x.pseudo = y.pseudo) :
    x = y
    theorem Morphology.FragmentGrammars.DMPCFG.ext_iff {T : Type} {inst✝ : DecidableEq T} {G : ContextFreeGrammar T} {inst✝¹ : DecidableEq G.NT} {x y : DMPCFG G} :
    x = y x.pseudo = y.pseudo
    noncomputable def Morphology.FragmentGrammars.DMPCFG.lhsUrn {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) (a : G.NT) :

    Per-LHS Pólya urn over the subtype of rules with LHS a, parameterized by M.pseudo restricted to those rules.

    Equations
    Instances For
      def Morphology.FragmentGrammars.DMPCFG.lhsCounts {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (a : G.NT) (D : Multiset (CFGTree T G.NT)) :
      G.RulesWithLHS a

      Per-LHS corpus rule-count vector as a function on the subtype.

      Equations
      Instances For
        noncomputable def Morphology.FragmentGrammars.DMPCFG.lhsFactor {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) (a : G.NT) (D : Multiset (CFGTree T G.NT)) :

        Per-LHS factor in the eq 3.9 product: the Pólya partition probability of the corpus rule-counts under the per-LHS pseudo-counts.

          Γ(Σ π_i^A) / Γ(Σ π_i^A + Σ x_i^A)  ·  ∏_i Γ(π_i^A + x_i^A) / Γ(π_i^A) .
        
        Equations
        Instances For
          noncomputable def Morphology.FragmentGrammars.DMPCFG.corpusProb {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) (D : Multiset (CFGTree T G.NT)) :

          DMPCFG corpus probability — eq 3.9 of @cite{odonnell-2015}. Product over LHSs that appear in the grammar of the per-LHS Pólya factor. LHSs with no rules in G contribute trivially (empty image term) so the product is over G.rules.image (·.input).

          Equations
          • M.corpusProb D = aFinset.image (fun (x : ContextFreeRule T G.NT) => x.input) G.rules, M.lhsFactor a D
          Instances For
            theorem Morphology.FragmentGrammars.DMPCFG.nonempty_rulesWithLHS_of_mem_image {T : Type} {G : ContextFreeGrammar T} [DecidableEq G.NT] {a : G.NT} (ha : a Finset.image (fun (x : ContextFreeRule T G.NT) => x.input) G.rules) :
            Nonempty (G.RulesWithLHS a)

            For LHSs that have at least one rule, the subtype is nonempty.

            theorem Morphology.FragmentGrammars.DMPCFG.lhsFactor_pos {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) {a : G.NT} (ha : a Finset.image (fun (x : ContextFreeRule T G.NT) => x.input) G.rules) (D : Multiset (CFGTree T G.NT)) :
            0 < M.lhsFactor a D

            The per-LHS factor is strictly positive when the LHS has rules. Direct corollary of PolyaUrn.seqProb_pos.

            theorem Morphology.FragmentGrammars.DMPCFG.corpusProb_nonneg {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) (D : Multiset (CFGTree T G.NT)) :

            DMPCFG corpus probabilities are nonnegative (in fact, positive).

            @[simp]
            theorem Morphology.FragmentGrammars.DMPCFG.lhsCounts_zero {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (a : G.NT) :
            lhsCounts a 0 = fun (x : G.RulesWithLHS a) => 0

            Per-LHS counts vanish on the empty corpus.

            @[simp]
            theorem Morphology.FragmentGrammars.DMPCFG.corpusProb_zero {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) :
            M.corpusProb 0 = 1

            The empty corpus has probability 1 under any DMPCFG: each per-LHS Pólya factor is seqProb on the all-zero count vector, which equals 1 by PolyaUrn.seqProb_zero.

            Posterior MAP weights #

            Per @cite{odonnell-2015} §2.4 (Dirichlet–multinomial conjugacy), the posterior over rule weights given a corpus is again Dirichlet, so per-rule posterior expected value (the MAP weight in the symmetric prior limit) is (π_r + x_r) / Σ_{r' with same LHS} (π_{r'} + x_{r'}). This is the productivity score @cite{odonnell-2015} Ch 7 (p. 268) identifies as DMPCFG's failure mode: when corpus counts dominate pseudo-counts, frequency wins over the prior.

            noncomputable def Morphology.FragmentGrammars.DMPCFG.mapWeight {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) (r : ContextFreeRule T G.NT) (D : Multiset (CFGTree T G.NT)) :

            Per-rule MAP weight: posterior expected value of the rule's Dirichlet weight given the corpus rule-counts. Equals (pseudo r + count r D) / Σ_{r' with same LHS} (pseudo r' + count r' D). Returns 0 by convention when the LHS has no rules in G (vacuous; in practice r ∈ G.rules makes the denominator positive — see mapWeight_denom_pos).

            Equations
            Instances For
              theorem Morphology.FragmentGrammars.DMPCFG.mapWeight_denom_pos {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) {a : G.NT} [hne : Nonempty (G.RulesWithLHS a)] (D : Multiset (CFGTree T G.NT)) :
              0 < r'G.rules with r'.input = a, (M.pseudo r' + (CFGTree.corpusRuleCount r' D))

              The denominator of mapWeight is positive whenever the LHS has at least one rule in G. Each summand pseudo r' + count r' D is at least pseudo r' > 0 by pseudo_pos.

              Takes the per-LHS nonemptiness as a Nonempty typeclass argument rather than an explicit existential — mathlib idiom for "this construction needs the bucket to be inhabited." Consumers either declare instance : Nonempty (RulesWithLHS …) := ⟨⟨r, _⟩⟩ once for their grammar+LHS pair, or haveI the witness locally.

              theorem Morphology.FragmentGrammars.DMPCFG.mapWeight_pos {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) {r : ContextFreeRule T G.NT} (hr : r G.rules) (D : Multiset (CFGTree T G.NT)) :
              0 < M.mapWeight r D

              mapWeight is strictly positive for any rule in G. The numerator is positive (pseudo_pos + nonneg cast) and the denominator is positive (witness: r itself is in its own LHS bucket). The Nonempty instance for RulesWithLHS r.input is synthesised internally from hr.

              theorem Morphology.FragmentGrammars.DMPCFG.mapWeight_nonneg {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) {r : ContextFreeRule T G.NT} (hr : r G.rules) (D : Multiset (CFGTree T G.NT)) :
              0 M.mapWeight r D

              Corollary: mapWeight is nonnegative for rules in G.

              @[simp]
              theorem Morphology.FragmentGrammars.DMPCFG.mapWeight_zero {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) (r : ContextFreeRule T G.NT) :
              M.mapWeight r 0 = M.pseudo r / r'G.rules with r'.input = r.input, M.pseudo r'

              Empty-corpus reduction: mapWeight collapses to the prior expected value pseudo r / Σ_{r' with same LHS} pseudo r'. The Dirichlet posterior with no data IS the prior.

              theorem Morphology.FragmentGrammars.DMPCFG.mapWeight_lt_mapWeight_iff_of_same_lhs {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) {r r' : ContextFreeRule T G.NT} (hr'_in : r' G.rules) (h_lhs : r.input = r'.input) {D : Multiset (CFGTree T G.NT)} :
              M.mapWeight r D < M.mapWeight r' D M.pseudo r + (CFGTree.corpusRuleCount r D) < M.pseudo r' + (CFGTree.corpusRuleCount r' D)

              Same-LHS comparison (iff form): with a shared LHS, mapWeight ordering is equivalent to pseudo + count ordering. The shared denominator cancels. This is the technical core of the @cite{odonnell-2015} Ch 7 DMPCFG critique: when corpus counts overcome pseudo-count differences, the more-frequent rule wins regardless of prior.

              theorem Morphology.FragmentGrammars.DMPCFG.mapWeight_lt_mapWeight_of_same_lhs {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) {r r' : ContextFreeRule T G.NT} (hr'_in : r' G.rules) (h_lhs : r.input = r'.input) {D : Multiset (CFGTree T G.NT)} (h_num : M.pseudo r + (CFGTree.corpusRuleCount r D) < M.pseudo r' + (CFGTree.corpusRuleCount r' D)) :
              M.mapWeight r D < M.mapWeight r' D

              Directional form of mapWeight_lt_mapWeight_iff_of_same_lhs for apply-style proofs.

              theorem Morphology.FragmentGrammars.DMPCFG.mapWeight_sum_eq_one_of_lhs {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) {a : G.NT} [Nonempty (G.RulesWithLHS a)] (D : Multiset (CFGTree T G.NT)) :
              rG.rules with r.input = a, M.mapWeight r D = 1

              The probability axiom: mapWeight over rules sharing an LHS sums to 1. Each summand has the same denominator (the LHS-sum), and the numerators sum to that same denominator, so the ratio sums to 1. This is what justifies calling mapWeight a weight rather than just a number — it is the per-LHS Dirichlet posterior mass function.

              PMF lift #

              The per-LHS mapWeights sum to 1 (mapWeight_sum_eq_one_of_lhs), so they form a probability mass function on the rules sharing that LHS. mapWeightPMF packages this as mathlib's PMF, connecting DMPCFG to the standard probability infrastructure: the resulting object inherits support, bind, map, etc. from PMF, and sum-to-1 is now a structural property (a field of PMF) rather than a follow-up theorem.

              noncomputable def Morphology.FragmentGrammars.DMPCFG.mapWeightPMF {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) {a : G.NT} [Nonempty (G.RulesWithLHS a)] (D : Multiset (CFGTree T G.NT)) :
              PMF (G.RulesWithLHS a)

              The per-LHS Dirichlet posterior MAP weights as a probability mass function on the rules sharing that LHS.

              Built via PMF.ofFintype over the subtype RulesWithLHS a, using mapWeight_sum_eq_one_of_lhs for the normalisation obligation. The PMF's value at ⟨r, hr⟩ is ENNReal.ofReal (M.mapWeight r D) — see mapWeightPMF_apply.

              Equations
              Instances For
                @[simp]
                theorem Morphology.FragmentGrammars.DMPCFG.mapWeightPMF_apply {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) {a : G.NT} [Nonempty (G.RulesWithLHS a)] (D : Multiset (CFGTree T G.NT)) (r : G.RulesWithLHS a) :
                (M.mapWeightPMF D) r = ENNReal.ofReal (M.mapWeight (↑r) D)

                The PMF value at a rule equals the mapWeight value, lifted to ℝ≥0∞. Bridges mapWeightPMF (the structural object) to mapWeight (the numeric accessor). Holds by rfl via PMF.ofFintype_apply.

                theorem Morphology.FragmentGrammars.DMPCFG.mapWeightPMF_lt_iff {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) {a : G.NT} [Nonempty (G.RulesWithLHS a)] (r₁ r₂ : G.RulesWithLHS a) (D : Multiset (CFGTree T G.NT)) :
                (M.mapWeightPMF D) r₁ < (M.mapWeightPMF D) r₂ M.pseudo r₁ + (CFGTree.corpusRuleCount (↑r₁) D) < M.pseudo r₂ + (CFGTree.corpusRuleCount (↑r₂) D)

                PMF comparison lemma: at a shared LHS, the per-LHS posterior PMF ordering reduces to pseudo + count ordering. Bundles mapWeightPMF_apply + mapWeight_lt_mapWeight_iff_of_same_lhs

                • ENNReal.ofReal_lt_ofReal_iff so consumers can rewrite straight from PMF mass to numerator arithmetic. The shared LHS and r₂.1 ∈ G.rules premises both come for free from the subtype proofs r₁.2, r₂.2.

                Strict Bayesian mode (the actual MAP) #

                mapWeight computes the Dirichlet posterior mean, which is what @cite{odonnell-2015} and CL practice generally call "MAP" (a loose convention). The strict Bayesian MAP — the single most likely value of the rule-weight vector under the posterior — is the mode of Dir(pseudo + count), with closed form (π_i - 1) / Σ(π_j - 1).

                The mode is well-defined only when pseudo + count > 1 for every rule in the LHS bucket. Otherwise the posterior density goes to ∞ at a simplex vertex (when some π_i + x_i < 1) or has flat edges (equality case). For symmetric Dirichlet priors with concentration α ≤ 1 (the CL norm), the strict mode lives at the boundary; this is exactly why CL uses the mean instead.

                When both estimators are defined, they agree on orderings: rule with higher pseudo + count has higher mean and higher mode. So the productivity-ordering theorems we prove for mapWeight apply to posteriorMode automatically (under its precondition). The mode adds genuinely new content the mean can't express: vertex behavior, asymptotic agreement (mode → mean as pseudo + count → ∞), and boundary collapse.

                noncomputable def Morphology.FragmentGrammars.DMPCFG.posteriorMode {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) (r : ContextFreeRule T G.NT) (D : Multiset (CFGTree T G.NT)) :

                The Dirichlet posterior mode of the rule weight: the single most likely value of θ_r under Dir(pseudo + count). Closed form (π_i - 1) / Σ(π_j - 1), valid in the simplex interior when pseudo r' + count r' D > 1 for every r' in the LHS bucket (otherwise the mode is at the boundary; see posteriorMode_well_defined).

                Returns 0/0 = 0 (Lean convention) outside the well-defined regime; theorems carry the well-definedness hypothesis.

                Equations
                Instances For
                  def Morphology.FragmentGrammars.DMPCFG.PosteriorModeWellDefined {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) (a : G.NT) (D : Multiset (CFGTree T G.NT)) :

                  The mode-well-definedness condition: every entry of the LHS bucket exceeds 1 in the posterior. Equivalent to "the Dirichlet posterior has its mode strictly in the simplex interior."

                  Equations
                  Instances For
                    theorem Morphology.FragmentGrammars.DMPCFG.posteriorMode_denom_pos {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) {a : G.NT} [hne : Nonempty (G.RulesWithLHS a)] (D : Multiset (CFGTree T G.NT)) (h : M.PosteriorModeWellDefined a D) :
                    0 < r'G.rules with r'.input = a, (M.pseudo r' + (CFGTree.corpusRuleCount r' D) - 1)

                    The mode denominator is positive when the well-definedness condition holds. Each summand exceeds 0 by hypothesis.

                    theorem Morphology.FragmentGrammars.DMPCFG.posteriorMode_pos {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) {r : ContextFreeRule T G.NT} (hr : r G.rules) (D : Multiset (CFGTree T G.NT)) (h : M.PosteriorModeWellDefined r.input D) :

                    posteriorMode is strictly positive on rules in G when the well-definedness condition holds. The numerator and denominator are both positive.

                    theorem Morphology.FragmentGrammars.DMPCFG.posteriorMode_lt_iff_of_same_lhs {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) {r r' : ContextFreeRule T G.NT} (hr'_in : r' G.rules) (h_lhs : r.input = r'.input) {D : Multiset (CFGTree T G.NT)} (h_wd : M.PosteriorModeWellDefined r'.input D) :

                    Same-LHS comparison (mode version). Mode ordering matches pseudo + count ordering, just like mapWeight. The shared denominator cancels. This is the key fact that makes CL's "MAP = mean" loose convention harmless for ordering claims: both estimators give the same productivity ranking.

                    theorem Morphology.FragmentGrammars.DMPCFG.mapWeight_lt_iff_posteriorMode_lt {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) {r r' : ContextFreeRule T G.NT} (hr'_in : r' G.rules) (h_lhs : r.input = r'.input) {D : Multiset (CFGTree T G.NT)} (h_wd : M.PosteriorModeWellDefined r'.input D) :
                    M.mapWeight r D < M.mapWeight r' D M.posteriorMode r D < M.posteriorMode r' D

                    Mean-mode ordering equivalence. Whenever both mapWeight and posteriorMode are well-defined for the same comparison, they agree on ordering of two rules sharing an LHS. This is the formal Lean witness of "the mean-vs-mode distinction is moot for productivity-ranking claims."

                    Bridge: DMPCFG → MultinomialPCFG via posterior MAP #

                    A DMPCFG is a prior over multinomial PCFGs (Dirichlet hyperparameters parameterize a distribution over per-LHS rule-weight vectors). The posterior given a corpus is again Dirichlet (conjugacy); the posterior MAP weights are a single multinomial PCFG.

                    This is the bridge: collapse a DMPCFG, conditioned on a corpus, into its MAP point estimate. At the empty corpus (D = 0), the result is the prior mean — see posteriorMAP_zero.

                    DMPCFG does not extends MultinomialPCFG: a prior is a different kind of object than a point in weight space. The two relate via this function, not via inheritance.

                    noncomputable def Morphology.FragmentGrammars.DMPCFG.posteriorMAP {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) [∀ (a : G.NT), Nonempty (G.RulesWithLHS a)] (D : Multiset (CFGTree T G.NT)) :

                    The Dirichlet posterior MAP collapse: given a corpus D, package DMPCFG's per-LHS posterior MAP weights as a MultinomialPCFG. Per-LHS PMFs come straight from mapWeightPMF; sum-to-1 is structural via PMF.

                    Requires [∀ a, Nonempty (G.RulesWithLHS a)] (carried by MultinomialPCFG itself): the construction can't deliver a multinomial PCFG over a grammar with empty LHS buckets.

                    Equations
                    Instances For
                      theorem Morphology.FragmentGrammars.DMPCFG.posteriorMAP_rulePMF {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) [∀ (a : G.NT), Nonempty (G.RulesWithLHS a)] (D : Multiset (CFGTree T G.NT)) (a : G.NT) :

                      The posterior MAP MultinomialPCFG's per-LHS PMF unfolds to DMPCFG's mapWeightPMF. Holds by rfl. Not @[simp] — Lean unfolds eagerly anyway (consumers like ODonnell2015's bridge demo use direct application without rw), and tagging simp risks loops in unfamiliar simp contexts.

                      Conjugacy decomposition: posterior + mode #

                      The posteriorMAP collapses two distinct Bayesian operations:

                      1. DMPCFG.posterior : DMPCFG G → Multiset _ → DMPCFG G — Dirichlet-conjugate update: bump pseudo-counts by the corpus rule counts. The posterior of a Dirichlet given multinomial data is again a Dirichlet (conjugacy), so this stays inside the DMPCFG type.
                      2. DMPCFG.mode : DMPCFG G → MultinomialPCFG G — the Dirichlet-mode projection (= posteriorMAP at the empty corpus).

                      The decomposition posteriorMAP = modeposterior is captured by posteriorMAP_eq_mode_posterior. Naming the two primitives separately makes the Bayesian structure visible in code and exposes incrementality of the update (posterior_add) as a stand-alone theorem rather than an unspoken consequence.

                      noncomputable def Morphology.FragmentGrammars.DMPCFG.posterior {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) (D : Multiset (CFGTree T G.NT)) :

                      The Dirichlet-conjugate posterior update: given a corpus, bump each rule's Dirichlet pseudo-count by that rule's corpus count. Stays inside the DMPCFG type — that's the point of conjugacy.

                      Equations
                      Instances For
                        @[simp]
                        theorem Morphology.FragmentGrammars.DMPCFG.posterior_zero {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) :
                        M.posterior 0 = M

                        Posterior at the empty corpus is the prior: no data, no update.

                        theorem Morphology.FragmentGrammars.DMPCFG.posterior_add {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) (D₁ D₂ : Multiset (CFGTree T G.NT)) :
                        M.posterior (D₁ + D₂) = (M.posterior D₁).posterior D₂

                        Incrementality. Updating with D₁ + D₂ equals updating sequentially with D₁ then D₂. Bayesian update is associative over disjoint data — the actual content of "the prior commutes with corpus aggregation." Follows from corpusRuleCount_add.

                        noncomputable def Morphology.FragmentGrammars.DMPCFG.mode {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) [∀ (a : G.NT), Nonempty (G.RulesWithLHS a)] :

                        Mode of the Dirichlet prior in MultinomialPCFG-space: the per-LHS-normalized pseudo-counts as a MultinomialPCFG. Equivalently: posteriorMAP at the empty corpus (no data).

                        Equations
                        Instances For
                          theorem Morphology.FragmentGrammars.DMPCFG.posteriorMAP_eq_mode_posterior {T : Type} [DecidableEq T] {G : ContextFreeGrammar T} [DecidableEq G.NT] (M : DMPCFG G) [∀ (a : G.NT), Nonempty (G.RulesWithLHS a)] (D : Multiset (CFGTree T G.NT)) :

                          The conjugacy decomposition. posteriorMAP factors as modeposterior: first update the prior with data (Dirichlet conjugacy), then take the mode (Dirichlet-to-MultinomialPCFG projection). The Bayesian-MAP estimator stops being a primitive and becomes a derived composition.