Documentation

Linglib.Phenomena.Modality.Studies.HerbstrittFranke2019PMF

@cite{herbstritt-franke-2019} on mathlib PMF — structural skeleton #

@cite{herbstritt-franke-2019}

Complex probability expressions & higher-order uncertainty: compositional threshold semantics + RSA over an urn scenario with N=10 balls. Cognition 186 (2019) 50–71.

This file formalises the structural skeleton of HF 2019 on mathlib's PMF, in the same key as Phenomena/Gradability/Studies/LassiterGoodman2017PMF.lean — honest about what the file does and does not capture.

Scope #

HF 2019's novel architectural contributions are:

  1. Hellinger-distance speaker utility (Eq. 16) instead of KL divergence. The literal listener P_LL(s|m) ∝ ⟦m⟧(s) · P_prior(s) puts zero mass outside the meaning extension; KL(P_rat.bel ‖ P_LL) = ∞ whenever the speaker's belief puts positive mass on a state outside the extension — so a strictly probabilistic speaker can never say "certainly RED" when she observes 9/10 red. Hellinger distance is uniformly bounded by 1, so the speaker can consider any message with bounded disutility. This is HF's central architectural claim about why divergence choice matters; formalised as klDiv_eq_top_when_zero_support and hellingerDist_le_one_finite in §6 below.
  2. Three-component joint posterior over (state, obs, access) (Eq. 18) with marginalisations to (state, obs) (Eq. 19) and (state, access) (Eq. 20). Direct application of Core/Probability/JointPosterior.lean's posterior_fst_apply / posterior_snd_apply substrate.
  3. Compositional threshold semantics for nested probability expressions (Eq. 22-23) — formalised against the theory-layer Theories/Semantics/Modality/EpistemicProbability.nestedThreshold.

What this file captures:

Not captured (paper-specific evaluative content, deferred):

Cross-framework positioning (linglib's "make incompatibilities visible") #

Per linglib's "no bridge files" discipline (CLAUDE.md), the framework-divergence material is anchored here (the chronologically later paper in the cluster) rather than in a dedicated comparison file. The §6 theorem makes the Hellinger-vs-KL architectural divergence visible at theorem level. The §8 disagreement theorem with LaBToM (labtom_likely_above_hf_probably_hdi) makes the empirical posterior-mean disagreement visible at theorem level.

§0. Domain types #

@[reducible, inline]

Urn state: number of red balls in the urn (0..10).

Equations
Instances For

    The proportion of red balls: s/10. First-order probability of drawing a red ball, and the measure function for simple expressions.

    Equations
    Instances For
      @[reducible, inline]

      Observation count (zero-padded for access < 10): observations with obs > access have probability 0 in the kernel.

      Equations
      Instances For

        §1. Belief formation via PMF.hypergeometric (Eq. 12) #

        The observation kernel is the hypergeometric distribution from Core/Probability/Hypergeometric.lean, embedded into the padded Obs := Fin 11 via PMF.map. The speaker's posterior belief is then just PMF.posterior of this kernel against the uniform world prior.

        noncomputable def HerbstrittFranke2019.PMF.obsKernel (access : ℕ) (h_a : access ≤ 10) (s : UrnState) :
        PMF Obs

        Observation kernel: P(obs | access, state). The hypergeometric PMF on Fin (access + 1) embedded into Fin 11 (zero on obs > access). Uses PMF.hypergeometric from Core/Probability.

        Equations
        Instances For
          noncomputable def HerbstrittFranke2019.PMF.speakerBelief (access : ℕ) (h_a : access ≤ 10) (obs : Obs) (h_marg : PMF.marginal (obsKernel access h_a) (PMF.uniformOfFintype UrnState) obs ≠ 0) :
          PMF UrnState

          Speaker's posterior belief P_rat.bel(·|o, a) (Eq. 12). With uniform world prior, this is PMF.posterior of the hypergeometric kernel.

          Equations
          Instances For

            ℚ-valued speaker belief for decide-style verification #

            Mirrors the existing file's speakerBeliefQ for finite-arithmetic checks. Bridge to the PMF version: both reduce to the same hypergeometric ratio.
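As a numeric sketch of the hypergeometric ratio both definitions reduce to (Python rather than Lean, so the arithmetic can be checked outside the repo; `hypergeo` and `speaker_belief` are illustrative names, not identifiers from this file):

```python
from fractions import Fraction
from math import comb

N = 10  # balls in the urn

def hypergeo(s: int, access: int, obs: int) -> Fraction:
    """Hypergeometric weight P(obs | access, state s): draw `access` balls
    without replacement from an urn with s red and N - s non-red balls."""
    if obs > access or obs > s or access - obs > N - s:
        return Fraction(0)
    return Fraction(comb(s, obs) * comb(N - s, access - obs), comb(N, access))

def speaker_belief(access: int, obs: int, s: int) -> Fraction:
    """Bayes' rule with a uniform world prior over s in {0..10}:
    the prior cancels, leaving the pure hypergeometric ratio."""
    z = sum(hypergeo(t, access, obs) for t in range(N + 1))
    return hypergeo(s, access, obs) / z
```

For example, after drawing 4 red in 4 draws the posterior concentrates on red-heavy urns: `speaker_belief(4, 4, 10)` comes out to `Fraction(5, 11)`.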

            def HerbstrittFranke2019.PMF.hypergeoQ (access obs : ℕ) (s : UrnState) :
            ℚ

            Hypergeometric weight at (N=10, K=s, n=access, k=obs) as ℚ.

            Equations
            Instances For
              def HerbstrittFranke2019.PMF.speakerBeliefQ (access obs : ℕ) (s : UrnState) :
              ℚ

              ℚ-valued speaker belief — Bayes' rule over the hypergeometric kernel with uniform world prior.

              Equations
              • One or more equations did not get rendered due to their size.
              Instances For

                §2. Simple expression semantics (Eq. 13-14) #

                Semantic threshold for "possibly" (posterior mean from Table 6).

                Equations
                Instances For

                  Semantic threshold for "probably" (posterior mean from Table 6).

                  Equations
                  Instances For

                    Upper bound of the 95% HDI for θ_probably (Table 6). Below LaBToM's likely_.θ = 0.70, witnessing posterior-mean divergence.

                    Equations
                    Instances For

                      Semantic threshold for "certainly" (posterior mean from Table 6).

                      Equations
                      Instances For

                        The inferred threshold ordering matches the theoretical prediction: certainly > probably > possibly.

                        The five simple expressions from Experiments 2 and 3.

                        Instances For
                          @[implicit_reducible]
                          Equations
                          Equations
                          • One or more equations did not get rendered due to their size.
                          Instances For
                            @[implicit_reducible]
                            Equations
                            • One or more equations did not get rendered due to their size.

                            Simple expression extensions are nested: certainly ⊂ probably ⊂ possibly.

                            §3. Compositional complex-expression semantics (Eq. 22-23) #

                            def HerbstrittFranke2019.PMF.posteriorProb (access obs : ℕ) (φ : UrnState → Bool) :
                            ℚ

                            Posterior probability that an urn-state proposition φ holds, given the speaker's observation. ℚ-valued for decidable verification.

                            posteriorProb(access, obs, φ) = Σ_{s ∈ ⟦φ⟧} P_rat.bel(s | obs, access)

                            Equations
                            Instances For

                              The three inner expressions from Experiment 3 (Eq. 22). Footnote 18: θ_probably from the simple model maps to θ_likely of the inner expressions in the complex model.

                              Instances For
                                @[implicit_reducible]
                                Equations
                                Equations
                                • One or more equations did not get rendered due to their size.
                                Instances For

                                  The four outer modifiers from Experiment 3.

                                  Instances For
                                    @[implicit_reducible]
                                    Equations
                                    Equations
                                    • One or more equations did not get rendered due to their size.
                                    Instances For
                                      def HerbstrittFranke2019.PMF.complexMeaning (outer : OuterMod) (inner : InnerExpr) (access obs : ℕ) :
                                      Bool

                                      Complex expression Y(X(RED)) (Eq. 23): ⟦Y(X(RED))⟧(⟨o, a⟩) ⟺ Σ_{s∈⟦X(RED)⟧} P_rat.bel(s|o, a) > θ_Y

                                      Equations
                                      Instances For
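A numeric sketch of Eq. 23's evaluation (Python, not the Lean definitions; function names are illustrative, and the two thresholds used in the usage note are the numeric values quoted elsewhere in this file):

```python
from fractions import Fraction
from math import comb

N = 10  # balls in the urn

def hypergeo(s, access, obs):
    # P(obs | access, state s): sampling without replacement
    if obs > access or obs > s or access - obs > N - s:
        return Fraction(0)
    return Fraction(comb(s, obs) * comb(N - s, access - obs), comb(N, access))

def belief(access, obs, s):
    # Bayes with uniform world prior (the prior cancels)
    z = sum(hypergeo(t, access, obs) for t in range(N + 1))
    return hypergeo(s, access, obs) / z

def inner_ext(theta_inner):
    # [X(RED)]: states whose proportion s/10 exceeds the inner threshold
    return [s for s in range(N + 1) if Fraction(s, N) > theta_inner]

def complex_meaning(theta_outer, theta_inner, access, obs):
    # Eq. 23: true iff sum_{s in [X(RED)]} P_bel(s | o, a) > theta_Y
    return sum(belief(access, obs, s) for s in inner_ext(theta_inner)) > theta_outer
```

With θ_might = 0.332 as outer and θ_likely ≈ θ_probably = 0.549 as inner, "might be likely(RED)" comes out true after seeing 4 red in 4 draws and false after seeing 0 red in 4 draws.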

                                        For HF's specific thresholds, strict > and non-strict ≥ give the same extension on UrnState — no proportion s/10 (s ∈ {0,...,10}) exactly equals any threshold. Justifies using > (paper Eq. 13) even when the theory-layer nestedThreshold uses ≥ (Fagin & Halpern).
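This tie-freeness can be checked directly for the two threshold values quoted numerically in this file (0.332 for might, 0.549 for probably); the same check pattern covers the remaining Table 6/9 thresholds:

```python
from fractions import Fraction

# The two HF threshold posterior means quoted in this file.
thetas = [Fraction(332, 1000), Fraction(549, 1000)]
props = [Fraction(s, 10) for s in range(11)]  # the eleven proportions s/10

for th in thetas:
    # No proportion hits the threshold exactly ...
    assert all(p != th for p in props)
    # ... so strict > and non-strict >= carve out the same extension.
    assert [p > th for p in props] == [p >= th for p in props]
```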

                                        §4. RSA model on PMF (Eq. 15-18) #

                                        The literal listener P_LL(s|m) ∝ ⟦m⟧(s) · P_prior(s) is the uniform distribution on the meaning extension (Eq. 15, with uniform prior). The speaker normalises softmax-Hellinger scores (Eq. 16-17). The pragmatic listener is PMF.posterior over the joint (state × obs × access) (Eq. 18).
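A numeric sketch of the Eq. 15-17 pipeline (Python; the extensions and the rationality parameter α below are made up for the example, not HF's fitted values):

```python
from math import exp, sqrt

def literal_listener(ext):
    # Eq. 15 with uniform prior: P_LL(. | m) is uniform on the meaning extension
    return {s: 1.0 / len(ext) for s in ext}

def hellinger(p, q):
    # H(P, Q) = sqrt(1 - sum sqrt(p_s * q_s)), bounded by 1
    support = set(p) | set(q)
    bc = sum(sqrt(p.get(s, 0.0) * q.get(s, 0.0)) for s in support)
    return sqrt(max(0.0, 1.0 - bc))

def speaker(belief, listeners, alpha=5.0):
    # Eq. 16-17: softmax over negative Hellinger distance to each message's P_LL
    scores = {m: exp(-alpha * hellinger(belief, ll)) for m, ll in listeners.items()}
    z = sum(scores.values())
    return {m: v / z for m, v in scores.items()}

# Illustrative extensions (NOT HF's fitted ones); speaker saw 9/10 red.
belief = {9: 1.0}
listeners = {
    "probably":  literal_listener(range(6, 11)),
    "certainly": literal_listener([10]),
}
probs = speaker(belief, listeners)
# Even though belief puts zero mass on s = 10, "certainly" gets positive
# probability: Hellinger disutility is bounded, where KL would be infinite.
```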

                                        noncomputable def HerbstrittFranke2019.PMF.literalListener (φ : UrnState → Bool) (h_nonempty : {x : UrnState | φ x = true}.Nonempty) :
                                        PMF UrnState

                                        Literal listener (Eq. 15) for a meaning with non-empty extension: uniform on the extension.

                                        Equations
                                        Instances For

                                          All five simple expressions have non-empty extensions under HF's inferred thresholds — every meaning admits a literal listener. Uses native_decide because Rat ordering is opaque to decide.

                                          §5. Joint posterior marginalisations (Eq. 19-20) #

                                          HF's Eq. 18 pragmatic listener is P_PL(s, o, a | m). Eq. 19 is the marginal P(s, o | m) (over access); Eq. 20 is P(s, a | m) (over obs). Both are direct corollaries of Core/Probability/JointPosterior.lean's marginalisation theorems applied to the joint posterior.

                                          The structural form here works for any speaker kernel S and joint prior over (state × access) — paper-specific instantiation is the softmax-Hellinger speaker, which lives in §6 below.
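The structural claim can be sketched numerically (Python; the toy prior and speaker below are placeholders for "any kernel S and joint prior", not HF's fitted softmax-Hellinger model):

```python
from fractions import Fraction

# Toy state and access spaces; uniform joint prior P(s, a).
states, accesses = range(3), range(2)
joint = {(s, a): Fraction(1, 6) for s in states for a in accesses}

def S(s, a, m):
    # Toy speaker kernel: per (state, access), probabilities over the
    # two messages sum to 1; more red seen -> more "certainly".
    p_certain = Fraction(s, 2)
    return p_certain if m == "certainly" else 1 - p_certain

def pragmatic_listener(m):
    # Eq. 18: P_PL(s, a | m) is proportional to S(m | s, a) * P(s, a)
    z = sum(joint[s, a] * S(s, a, m) for s in states for a in accesses)
    return {(s, a): joint[s, a] * S(s, a, m) / z
            for s in states for a in accesses}

def state_marginal(m, s):
    # Eq. 19/20 shape: marginalise the joint posterior over Access
    return sum(pragmatic_listener(m)[s, a] for a in accesses)
```

With these toy numbers, hearing "certainly" shifts the state marginal onto the red-heavy states, exactly the posterior_fst_apply decomposition below.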

                                          noncomputable def HerbstrittFranke2019.PMF.pragmaticListener {Access : Type u_1} (S : UrnState × Access → PMF SimpleExpr) (joint : PMF (UrnState × Access)) (m : SimpleExpr) (h_marg : PMF.marginal S joint m ≠ 0) :
                                          PMF (UrnState × Access)

                                          HF Eq. 18: pragmatic listener as joint posterior. P_PL(s, a | m) ∝ S(m | s, a) · P(s, a).

                                          Just PMF.posterior at α := UrnState × Access. The marginalisation theorems below are direct instantiations of posterior_fst_apply / posterior_snd_apply.

                                          Equations
                                          Instances For
                                            theorem HerbstrittFranke2019.PMF.statePosterior_apply {Access : Type u_1} [Fintype Access] (S : UrnState × Access → PMF SimpleExpr) (joint : PMF (UrnState × Access)) (m : SimpleExpr) (h_marg : PMF.marginal S joint m ≠ 0) (s : UrnState) :
                                            (pragmaticListener S joint m h_marg).fst s = (∑ a : Access, joint (s, a) * (S (s, a)) m) / PMF.marginal S joint m

                                            HF Eq. 19/20 (marginal over Access): per-state marginal posterior. P(s | m) = ∑_a P_PL(s, a | m) = (∑_a P(s, a) · S(m | s, a)) / Z

                                            theorem HerbstrittFranke2019.PMF.accessPosterior_apply {Access : Type u_1} [Fintype Access] [DecidableEq Access] (S : UrnState × Access → PMF SimpleExpr) (joint : PMF (UrnState × Access)) (m : SimpleExpr) (h_marg : PMF.marginal S joint m ≠ 0) (a : Access) :
                                            (pragmaticListener S joint m h_marg).snd a = (∑ s : UrnState, joint (s, a) * (S (s, a)) m) / PMF.marginal S joint m

                                            HF marginal over UrnState: per-access marginal posterior. Companion of statePosterior_apply for the Access component.

                                            theorem HerbstrittFranke2019.PMF.statePosterior_lt_iff {Access : Type u_1} [Fintype Access] (S : UrnState × Access → PMF SimpleExpr) (joint : PMF (UrnState × Access)) (m : SimpleExpr) (h_marg : PMF.marginal S joint m ≠ 0) (s₁ s₂ : UrnState) :
                                            (pragmaticListener S joint m h_marg).fst s₁ < (pragmaticListener S joint m h_marg).fst s₂ ↔ ∑ a : Access, joint (s₁, a) * (S (s₁, a)) m < ∑ a : Access, joint (s₂, a) * (S (s₂, a)) m

                                            Comparison decomposition for the per-state marginal posterior. L1 favours state s₂ over s₁ after m iff the conditional joint sums favour it. Direct corollary of posterior_fst_lt_iff.

                                            §6. Hellinger-vs-KL: the architectural divergence point #

                                            HF's central methodological choice (Eq. 16, footnote): KL divergence makes any message with [m] not covering speaker's belief support have infinite disutility, so the speaker can NEVER consider it. Hellinger distance is uniformly bounded by 1, so all messages have bounded disutility.

                                            The two theorems below — klDiv_eq_top_when_zero_support (KL is ⊤ the moment the listener's distribution has zero mass on a point of the speaker's belief support) and hellingerDist_le_one (Hellinger distance never exceeds 1 on PMFs) — witness this directly.

                                            Together they make the architectural divergence visible at theorem level: there exist (P, Q) pairs where KL-utility excludes Q as a message choice but Hellinger-utility admits it.

                                            The theorem klDiv_eq_top_iff_not_absolutelyContinuous from mathlib (via PMF.klDiv rfl-bridge) provides the "iff" form.

                                            theorem HerbstrittFranke2019.PMF.klDiv_eq_top_when_zero_support {α : Type u_1} [Fintype α] [MeasurableSpace α] [MeasurableSingletonClass α] (P Q : PMF α) (s₀ : α) (hP : P s₀ ≠ 0) (hQ : Q s₀ = 0) :
                                            P.klDiv Q = ⊤

                                            KL divergence is ⊤ when Q has zero mass on P's support. Direct from mathlib klDiv_of_not_ac: KL = ∞ when P.toMeasure is not absolutely continuous w.r.t. Q.toMeasure. Absolute continuity fails at s₀ because Q s₀ = 0 but P s₀ ≠ 0.

                                            theorem HerbstrittFranke2019.PMF.hellingerDistSq_le_one {α : Type u_1} [Fintype α] (P Q : PMF α) :
                                            P.hellingerDistSq Q ≤ 1

                                            Hellinger squared distance is ≤ 1 on PMFs: H²(P, Q) = 1 - BC(P, Q) where BC ∈ [0, 1] (Cauchy-Schwarz on √P, √Q, both with sum-to-1). Bounded above by 1 because BC ≥ 0.

                                            The lower bound 0 ≤ BC is immediate (each term √(P·Q) ≥ 0).

                                            theorem HerbstrittFranke2019.PMF.hellingerDist_le_one {α : Type u_1} [Fintype α] (P Q : PMF α) :
                                            P.hellingerDist Q ≤ 1

                                            Hellinger distance is ≤ 1 on PMFs: by H = √H² and H² ≤ 1.

                                            theorem HerbstrittFranke2019.PMF.hellinger_admits_what_KL_excludes :
                                            (PMF.pure (9 : UrnState)).klDiv (PMF.pure 10) = ⊤ ∧ (PMF.pure (9 : UrnState)).hellingerDist (PMF.pure 10) ≤ 1

                                            The architectural divergence (concrete): there exist PMFs P, Q on UrnState such that KL(P ‖ Q) = ∞ but HD(P, Q) ≤ 1.

                                            Witness: P = pure 9 (speaker's degenerate belief at 9/10 red), Q = pure 10 (literal listener for certainly — δ on s=10). Under KL: speaker can never say "certainly". Under Hellinger: speaker considers "certainly" with bounded disutility. This is HF's key architectural claim.
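A numeric check of the witness (Python; float arithmetic stands in for the ENNReal/Real-valued mathlib definitions):

```python
from math import inf, log, sqrt

def kl(p, q):
    # KL(P || Q) = sum p_i * log(p_i / q_i); infinite as soon as q_i = 0 < p_i
    if any(pi > 0 and qi == 0 for pi, qi in zip(p, q)):
        return inf
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hellinger(p, q):
    # H(P, Q) = sqrt(1 - BC), Bhattacharyya coefficient BC = sum sqrt(p_i * q_i)
    bc = sum(sqrt(pi * qi) for pi, qi in zip(p, q))
    return sqrt(max(0.0, 1.0 - bc))

# The file's witness on Fin 11: P = pure 9, Q = pure 10.
P = [1.0 if s == 9 else 0.0 for s in range(11)]
Q = [1.0 if s == 10 else 0.0 for s in range(11)]
```

Here `kl(P, Q)` is infinite while `hellinger(P, Q)` is exactly 1, the worst case of the bounded Hellinger disutility.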

                                            §7. Empirical data (Tables 6, 7, 9, 10) #

                                            Inferred semantic threshold parameters from the simple expression model (Experiment 1 data, Table 6). Posterior means with 95% HDIs.

                                             • mean : ℚ
                                             • hdi_lower : ℚ
                                             • hdi_upper : ℚ
                                            Instances For
                                              Equations
                                              • One or more equations did not get rendered due to their size.
                                              Instances For
                                                Equations
                                                Instances For
                                                  Equations
                                                  Instances For
                                                    Equations
                                                    Instances For

                                                      Model–data correlation result (Tables 7, 10).

                                                       • dimension : String
                                                       • r : ℚ
                                                       • hdi_lower : ℚ
                                                       • hdi_upper : ℚ
                                                      Instances For
                                                        Equations
                                                        • One or more equations did not get rendered due to their size.
                                                        Instances For
                                                          Equations
                                                          Instances For
                                                            Equations
                                                            Instances For
                                                              Equations
                                                              Instances For
                                                                Equations
                                                                Instances For

                                                                  All simple-model correlations are substantial (r > 0.5).

                                                                  Inferred outer-modifier thresholds from the complex expression model (Experiment 2, Table 9). Inner thresholds (θ_possible, θ_likely) overlap with Table 6 — cross-experiment stability.

                                                                  Equations
                                                                  Instances For
                                                                    Equations
                                                                    Instances For
                                                                      Equations
                                                                      Instances For
                                                                        Equations
                                                                        Instances For
                                                                          Equations
                                                                          Instances For
                                                                            Equations
                                                                            Instances For
                                                                              Equations
                                                                              Instances For
                                                                                Equations
                                                                                Instances For
                                                                                  Equations
                                                                                  Instances For

                                                                                    Cross-experiment threshold stability: the inner-expression likely/possible thresholds inferred jointly with Experiment 2 complex-expression data agree with the simple-expression Experiment 1 probably/possibly thresholds to within a few thousandths.

                                                                                    Footnote 18: "θ_probably from the simpler model should be mapped onto θ_likely in Table 9 because the latter represents the threshold of the inner expressions."

                                                                                    §8. Cross-paper engagement #

                                                                                    Disagreement with LaBToM @cite{ying-zhi-xuan-wong-mansinghka-tenenbaum-2025} #

                                                                                    HF and LaBToM both infer credence thresholds for English probability vocabulary by Bayesian fitting against experimental data, but pick different thresholds where their lexicons overlap. HF's posterior mean for probably is 0.549 with 95% HDI [0.500, 0.594]; LaBToM's point estimate for likely is 0.700. LaBToM's value lies above the upper bound of HF's HDI — the methodologies disagree at the 95%-credibility level, not just on the point estimate.

                                                                                     Candidate explanations: lexical (probably ≠ likely), task (production with urns vs Theory-of-Mind in a gridworld), or posterior uncertainty (LaBToM reports point values where HF reports intervals).

                                                                                     HF and LaBToM agree on certainly/certain to within 0.001 — well inside both HDIs; there is no theorem on that pair because the empirical signal isn't there.

                                                                                    LaBToM's likely_.θ = 0.70 lies above the upper bound of HF's 95% HDI for θ_probably ([0.500, 0.594]). The two parameter-fitted theories disagree at the 95%-credibility level.

                                                                                    Architectural contrast with @cite{goodman-stuhlmuller-2013} #

                                                                                    This model and G&S 2013 share the hypergeometric observation kernel (both use PMF.hypergeometric with N=10 and N=3 respectively) and the RSA architecture (literal listener, softmax speaker, posterior pragmatic listener). The only architectural difference is the speaker utility:

                                                                                     Component             | G&S 2013                    | HF 2019
                                                                                     State space           | Fin 4 (objects)             | Fin 11 (red balls)
                                                                                     Observation           | hypergeometric              | hypergeometric
                                                                                     Utility               | -KL(bel ‖ L0)               | -HD(bel, L0)
                                                                                     Admits "true enough"? | NO (KL = ∞ on zero-support) | YES (HD ≤ 1)

                                                                                    hellinger_admits_what_KL_excludes (§6) makes this difference visible: the same speaker belief pure 9 paired with the literal listener for certainly (pure 10) yields KL = ∞ but HD ≤ 1. Under G&S 2013's KL utility this speaker would be excluded from saying "certainly RED"; under HF's Hellinger utility she can. The empirical evidence (HF's production data, Section 5) shows speakers DO say "certainly" at 9/10.

                                                                                    The Bretagnolle–Huber inequality (PMF.two_hellingerDistSq_le_klDiv in Core/Probability/Entropy.lean) gives the formal direction 2 · H² ≤ KL on the finite side: where KL is finite, H² is too, and bounded. Hellinger is the strictly weaker (more permissive) divergence — exactly what HF wants for the speaker utility.
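A numeric spot-check of the 2·H² ≤ KL direction on full-support PMFs (Python; the three example pairs are arbitrary):

```python
from math import log, sqrt

def kl(p, q):
    # KL(P || Q) over a common full support
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hellinger_sq(p, q):
    # H^2(P, Q) = 1 - sum sqrt(p_i * q_i)
    return 1.0 - sum(sqrt(pi * qi) for pi, qi in zip(p, q))

# Where KL is finite, H^2 is finite too, and 2 * H^2 never exceeds KL.
pairs = [
    ([0.5, 0.5], [0.9, 0.1]),
    ([0.2, 0.3, 0.5], [0.4, 0.4, 0.2]),
    ([0.99, 0.01], [0.01, 0.99]),
]
for p, q in pairs:
    assert 2 * hellinger_sq(p, q) <= kl(p, q)
```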

                                                                                    The paper notes that the compositional model overpredicts the frequency of "might be possible" in production data (§6). The compositional semantics assigns "might be possible" a very weak truth condition (posterior probability of "possible" exceeds θ_might = 0.332), making it true in almost all conditions.

                                                                                    Explanation: participants may give "might be possible" a modal concord reading, collapsing the two modals to a single "possible". See Phenomena/Modality/ModalConcord for the general phenomenon. This is qualitative; not formalised here.