@cite{herbstritt-franke-2019} on mathlib PMF — structural skeleton #
@cite{herbstritt-franke-2019}. Complex probability expressions & higher-order uncertainty: compositional threshold semantics + RSA over an urn scenario with N=10 balls. Cognition 186 (2019) 50–71.
This file formalises the structural skeleton of HF 2019 on mathlib's
PMF, in the same key as Phenomena/Gradability/Studies/LassiterGoodman2017PMF.lean
— honest about what the file does and does not capture.
Scope #
HF 2019's novel architectural contributions are:
- Hellinger-distance speaker utility (Eq. 16) instead of KL divergence. The literal listener P_LL(s|m) ∝ ⟦m⟧(s) · P_prior(s) puts zero mass outside the meaning extension; KL(P_rat.bel ‖ P_LL) = ∞ whenever the speaker's belief puts positive mass on a state outside the extension — so a strictly probabilistic speaker can never say "certainly RED" when she observes 9/10 red. Hellinger distance is uniformly bounded by 1, so the speaker can consider any message with bounded disutility. This is HF's central architectural claim about why divergence choice matters; formalised as klDiv_eq_top_when_zero_support and hellingerDist_le_one_finite in §6 below.
- Three-component joint posterior over (state, obs, access) (Eq. 18) with marginalisations to (state, obs) (Eq. 19) and (state, access) (Eq. 20). Direct application of Core/Probability/JointPosterior.lean's posterior_fst_apply / posterior_snd_apply substrate.
- Compositional threshold semantics for nested probability expressions (Eq. 22-23) — formalised against the theory-layer Theories/Semantics/Modality/EpistemicProbability.nestedThreshold.
What this file captures:
- §1 Belief formation as PMF.posterior of PMF.hypergeometric — no local obsPrior / speakerBeliefR redefinitions; uses Core substrate directly. The bridge to RSA.Distributions.hypergeometric is by construction (both reduce to mathlib's Nat.choose formula).
- §2 Threshold semantics for simple expressions (Eq. 13-14) with inferred thresholds (Table 6). Decidability checks via decide; native_decide only for finite-arithmetic verification of the model (per feedback_proof_style).
- §3 Compositional threshold semantics for complex expressions (Eq. 22-23) — same pattern, threshold check on posterior probability of the inner proposition.
- §4 The RSA model on PMF: speaker via PMF.normalize of softmax-Hellinger; pragmatic listener via PMF.posterior over the joint (state × obs × access) space.
- §5 Joint-posterior marginalisations (Eq. 19-20) via posterior_fst_apply / posterior_snd_apply.
- §6 The Hellinger-vs-KL architectural claim — concrete witness showing that KL excludes "true enough" messages but Hellinger doesn't.
- §7 Empirical data (Tables 6, 7, 9, 10) and cross-experiment threshold stability.
- §8 Cross-paper engagement — disagreement with LaBToM @cite{ying-zhi-xuan-wong-mansinghka-tenenbaum-2025} on probably/likely, and contrast with @cite{goodman-stuhlmuller-2013}'s KL-utility model (which would reject "certainly" at 9/10).
Not captured (paper-specific evaluative content, deferred):
- The full posterior predictive simulation against experimental data — this requires Bayesian parameter estimation in JAGS (the paper's tooling), and the model–data correlations (Tables 7, 10) are statistical claims, not theorems about model structure.
- Modal concord overprediction — qualitative observation about the compositional model's "might be possible" failure mode (§6 of paper).
Cross-framework positioning (linglib's "make incompatibilities visible") #
Per linglib's "no bridge files" discipline (CLAUDE.md), the framework-
divergence material is anchored here (the chronologically-later paper
in the cluster) rather than in a dedicated comparison file. The §6
theorem makes the Hellinger-vs-KL architectural divergence visible at
theorem level. The §8 disagreement theorem with LaBToM
(labtom_likely_above_hf_probably_hdi) makes the empirical
posterior-mean disagreement visible at theorem level.
§0. Domain types #
Urn state: number of red balls in the urn (0..10).
Equations
- HerbstrittFranke2019.PMF.UrnState = Fin 11
Instances For
The proportion of red balls: s/10. First-order probability of
drawing a red ball, and the measure function for simple expressions.
Equations
- HerbstrittFranke2019.PMF.proportion s = ↑↑s / 10
Instances For
Observation count (zero-padded for access < 10): observations with
obs > access have probability 0 in the kernel.
Equations
- HerbstrittFranke2019.PMF.Obs = Fin 11
Instances For
§1. Belief formation via PMF.hypergeometric (Eq. 12) #
The observation kernel is the hypergeometric distribution from
Core/Probability/Hypergeometric.lean, embedded into the padded
Obs := Fin 11 via PMF.map. The speaker's posterior belief is then
just PMF.posterior of this kernel against the uniform world prior.
Observation kernel: P(obs | access, state). The hypergeometric
PMF on Fin (access + 1) embedded into Fin 11 (zero on
obs > access). Uses PMF.hypergeometric from Core/Probability.
Equations
- HerbstrittFranke2019.PMF.obsKernel access h_a s = PMF.map (fun (o : Fin (access + 1)) => ⟨↑o, ⋯⟩) (PMF.hypergeometric 10 (↑s) access h_a ⋯)
Instances For
Speaker's posterior belief P_rat.bel(·|o, a) (Eq. 12).
With uniform world prior, this is PMF.posterior of the
hypergeometric kernel.
Equations
- HerbstrittFranke2019.PMF.speakerBelief access h_a obs h_marg = PMF.posterior (HerbstrittFranke2019.PMF.obsKernel access h_a) (PMF.uniformOfFintype HerbstrittFranke2019.PMF.UrnState) obs h_marg
Instances For
ℚ-valued speaker belief for decide-style verification #
Mirrors the existing file's speakerBeliefQ for finite-arithmetic
checks. Bridge to the PMF version: both reduce to the same hypergeometric
ratio.
Hypergeometric weight at (N=10, K=s, n=access, k=obs) as ℚ.
Equations
- HerbstrittFranke2019.PMF.hypergeoQ access obs s = if obs ≤ access then ↑((↑s).choose obs * (10 - ↑s).choose (access - obs)) / ↑(Nat.choose 10 access) else 0
Instances For
ℚ-valued speaker belief — Bayes' rule over the hypergeometric kernel with uniform world prior.
Equations
- One or more equations did not get rendered due to their size.
Instances For
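Since the equation block above did not render, here is a minimal standalone sketch of the intended definition, reconstructed from the prose (the names mirror the file; treat the details as illustrative, not as the file's code):

```lean
import Mathlib

-- Hypergeometric weight at (N = 10, K = s, n = access, k = obs), as ℚ.
def hypergeoQ (access obs s : ℕ) : ℚ :=
  if obs ≤ access then
    (Nat.choose s obs * Nat.choose (10 - s) (access - obs) : ℚ)
      / (Nat.choose 10 access : ℚ)
  else 0

-- Bayes' rule (Eq. 12): with a uniform world prior the 1/11 factors cancel,
-- leaving the kernel weight normalised over s ∈ {0,…,10}.
def speakerBeliefQ (access obs s : ℕ) : ℚ :=
  hypergeoQ access obs s / (Finset.range 11).sum (hypergeoQ access obs)

#eval speakerBeliefQ 10 9 9  -- 1: full access pins the state down exactly
#eval speakerBeliefQ 4 4 10  -- 5/11: partial access leaves residual uncertainty
```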
§2. Simple expression semantics (Eq. 13-14) #
Semantic threshold for "possibly" (posterior mean from Table 6).
Equations
- HerbstrittFranke2019.PMF.θ_possibly = 247 / 1000
Instances For
Semantic threshold for "probably" (posterior mean from Table 6).
Equations
- HerbstrittFranke2019.PMF.θ_probably = 549 / 1000
Instances For
Upper bound of the 95% HDI for θ_probably (Table 6). Below LaBToM's
likely_.θ = 0.70, witnessing posterior-mean divergence.
Equations
- HerbstrittFranke2019.PMF.θ_probably_upper = 594 / 1000
Instances For
Semantic threshold for "certainly" (posterior mean from Table 6).
Equations
- HerbstrittFranke2019.PMF.θ_certainly = 949 / 1000
Instances For
The inferred threshold ordering matches the theoretical prediction: certainly > probably > possibly.
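A hedged finite-arithmetic restatement of the ordering, directly over the Table 6 means quoted above (the file states this over its θ_* constants):

```lean
import Mathlib

-- possibly < probably < certainly on the inferred posterior means.
example : (247 : ℚ)/1000 < 549/1000 ∧ (549 : ℚ)/1000 < 949/1000 := by norm_num
```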
The five simple expressions from Experiments 2 and 3.
- certainlyNot : SimpleExpr
- probablyNot : SimpleExpr
- possibly : SimpleExpr
- probably : SimpleExpr
- certainly : SimpleExpr
Instances For
Equations
- HerbstrittFranke2019.PMF.instDecidableEqSimpleExpr x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Simple expression meaning using inferred thresholds.
Eq. 13: ⟦X(RED)⟧ = {s | s/10 > θ_X} (positive)
Eq. 14: ⟦X not(RED)⟧ = {s | s/10 < 1 − θ_X} (negated)
Equations
- HerbstrittFranke2019.PMF.SimpleExpr.certainly.meaning = fun (s : HerbstrittFranke2019.PMF.UrnState) => decide (HerbstrittFranke2019.PMF.proportion s > HerbstrittFranke2019.PMF.θ_certainly)
- HerbstrittFranke2019.PMF.SimpleExpr.probably.meaning = fun (s : HerbstrittFranke2019.PMF.UrnState) => decide (HerbstrittFranke2019.PMF.proportion s > HerbstrittFranke2019.PMF.θ_probably)
- HerbstrittFranke2019.PMF.SimpleExpr.possibly.meaning = fun (s : HerbstrittFranke2019.PMF.UrnState) => decide (HerbstrittFranke2019.PMF.proportion s > HerbstrittFranke2019.PMF.θ_possibly)
- HerbstrittFranke2019.PMF.SimpleExpr.probablyNot.meaning = fun (s : HerbstrittFranke2019.PMF.UrnState) => decide (HerbstrittFranke2019.PMF.proportion s < 1 - HerbstrittFranke2019.PMF.θ_probably)
- HerbstrittFranke2019.PMF.SimpleExpr.certainlyNot.meaning = fun (s : HerbstrittFranke2019.PMF.UrnState) => decide (HerbstrittFranke2019.PMF.proportion s < 1 - HerbstrittFranke2019.PMF.θ_certainly)
Instances For
Simple expression extensions are nested: certainly ⊂ probably ⊂ possibly.
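The nesting can be made visible by enumeration. A minimal sketch, with extension as an illustrative helper over the Table 6 means (not the file's API); the comparisons run compiled under #eval, sidestepping kernel decide on ℚ:

```lean
import Mathlib

-- ⟦X(RED)⟧ = {s | s/10 > θ} on Fin 11, as a list (illustrative helper).
def extension (θ : ℚ) : List (Fin 11) :=
  (List.finRange 11).filter fun s => decide (((s : ℕ) : ℚ) / 10 > θ)

#eval (extension (949/1000)).map (·.val)  -- [10]                        certainly
#eval (extension (549/1000)).map (·.val)  -- [6, 7, 8, 9, 10]            probably
#eval (extension (247/1000)).map (·.val)  -- [3, 4, 5, 6, 7, 8, 9, 10]   possibly
```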
§3. Compositional complex-expression semantics (Eq. 22-23) #
Posterior probability that an urn-state proposition φ holds, given the speaker's observation. ℚ-valued for decidable verification.
posteriorProb(access, obs, φ) = Σ_{s ∈ ⟦φ⟧} P_rat.bel(s | obs, access)
Equations
- HerbstrittFranke2019.PMF.posteriorProb access obs φ = {x : HerbstrittFranke2019.PMF.UrnState | φ x = true}.sum (HerbstrittFranke2019.PMF.speakerBeliefQ access obs)
Instances For
The three inner expressions from Experiment 3 (Eq. 22). Footnote 18:
θ_probably from the simple model maps to θ_likely of the inner
expressions in the complex model.
Instances For
Equations
- HerbstrittFranke2019.PMF.instDecidableEqInnerExpr x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Equations
- One or more equations did not get rendered due to their size.
Instances For
Inner expression meaning (Eq. 22).
Equations
- HerbstrittFranke2019.PMF.InnerExpr.likely.meaning = fun (s : HerbstrittFranke2019.PMF.UrnState) => decide (HerbstrittFranke2019.PMF.proportion s > HerbstrittFranke2019.PMF.θ_probably)
- HerbstrittFranke2019.PMF.InnerExpr.possible.meaning = fun (s : HerbstrittFranke2019.PMF.UrnState) => decide (HerbstrittFranke2019.PMF.proportion s > HerbstrittFranke2019.PMF.θ_possibly)
- HerbstrittFranke2019.PMF.InnerExpr.unlikely.meaning = fun (s : HerbstrittFranke2019.PMF.UrnState) => decide (HerbstrittFranke2019.PMF.proportion s < 1 - HerbstrittFranke2019.PMF.θ_probably)
Instances For
Equations
- HerbstrittFranke2019.PMF.instDecidableEqOuterMod x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Equations
- One or more equations did not get rendered due to their size.
Instances For
Outer modifier threshold (Table 9, complex model).
Equations
Instances For
Complex expression Y(X(RED)) (Eq. 23): ⟦Y(X(RED))⟧(⟨o, a⟩) ⟺ Σ_{s∈⟦X(RED)⟧} P_rat.bel(s|o, a) > θ_Y
Equations
- HerbstrittFranke2019.PMF.complexMeaning outer inner access obs = decide (HerbstrittFranke2019.PMF.posteriorProb access obs inner.meaning > outer.threshold)
Instances For
For HF's specific thresholds, strict > and non-strict ≥ give the
same extension on UrnState — no proportion s/10 (s ∈ {0,...,10})
exactly equals any threshold. This justifies using > (paper Eq. 13)
even when the theory-layer nestedThreshold uses ≥ (Fagin & Halpern).
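Building on the §1 sketch (hypergeoQ, speakerBeliefQ above), a hedged reconstruction of the Eq. 23 check; the inner threshold 0.549 and outer threshold 0.690 are the Table 6/9 means, and the access/obs choice is illustrative:

```lean
-- Reuses hypergeoQ / speakerBeliefQ from the §1 sketch (illustrative).
def posteriorProb (access obs : ℕ) (φ : ℕ → Bool) : ℚ :=
  ((Finset.range 11).filter (fun s => φ s = true)).sum (speakerBeliefQ access obs)

-- "probably (likely RED)" after observing 4 red in 4 draws:
-- inner ⟦likely RED⟧ = {6,…,10} (θ = 0.549); outer θ_probably = 0.690.
#eval posteriorProb 4 4 (fun s => decide ((s : ℚ) / 10 > 549/1000))
-- 76/77 ≈ 0.987 > 0.690, so the complex expression is true here
```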
§4. RSA model on PMF (Eq. 15-18) #
The literal listener P_LL(s|m) ∝ ⟦m⟧(s) · P_prior(s) is the uniform
distribution on the meaning extension (Eq. 15, with uniform prior).
The speaker normalises softmax-Hellinger scores (Eq. 16-17). The
pragmatic listener is PMF.posterior over the joint (state × obs × access)
(Eq. 18).
Literal listener (Eq. 15) for a meaning with non-empty extension: uniform on the extension.
Equations
- HerbstrittFranke2019.PMF.literalListener φ h_nonempty = PMF.uniformOfFinset {x : HerbstrittFranke2019.PMF.UrnState | φ x = true} h_nonempty
Instances For
All five simple expressions have non-empty extensions under HF's
inferred thresholds — every meaning admits a literal listener.
Uses native_decide because Rat ordering is opaque to decide.
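A hedged Float sketch of the Eq. 15-17 pipeline outside PMF: the literal listener as a uniform vector on the extension and the speaker as a softmax over negative Hellinger disutilities. The list encoding, the rationality parameter lam, and the helper names are illustrative assumptions, not the file's API.

```lean
-- Hellinger distance on probability vectors: H = √(1 − Σᵢ √(pᵢ·qᵢ)).
def hellingerDist (p q : List Float) : Float :=
  Float.sqrt (1 - ((p.zip q).foldl (fun acc pq => acc + Float.sqrt (pq.1 * pq.2)) 0))

-- Literal listener (Eq. 15): uniform on a non-empty meaning extension.
def literalListener (ext : List Bool) : List Float :=
  let n := (ext.filter id).length.toFloat
  ext.map fun b => if b then 1 / n else 0

-- Speaker (Eq. 16-17): normalise exp (−lam · HD(bel, LL_m)) over messages.
def speaker (lam : Float) (bel : List Float) (msgs : List (List Bool)) : List Float :=
  let raw := msgs.map fun m => Float.exp (-lam * hellingerDist bel (literalListener m))
  let z := raw.foldl (· + ·) 0
  raw.map (· / z)
```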
§5. Joint posterior marginalisations (Eq. 19-20) #
HF's Eq. 18 pragmatic listener is P_PL(s, o, a | m). Eq. 19 is the
marginal P(s, o | m) (over access); Eq. 20 is P(s, a | m) (over obs).
Both are direct corollaries of Core/Probability/JointPosterior.lean's
marginalisation theorems applied to the joint posterior.
The structural form here works for any speaker kernel S and joint
prior over (state × access) — paper-specific instantiation is the
softmax-Hellinger speaker, which lives in §6 below.
HF Eq. 18: pragmatic listener as joint posterior.
P_PL(s, a | m) ∝ S(m | s, a) · P(s, a).
Just PMF.posterior at α := UrnState × Access. The marginalisation
theorems below are direct instantiations of posterior_fst_apply /
posterior_snd_apply.
Equations
- HerbstrittFranke2019.PMF.pragmaticListener S joint m h_marg = PMF.posterior S joint m h_marg
Instances For
HF Eq. 19/20 (marginal over Access): per-state marginal posterior.
P(s | m) = ∑_a P_PL(s, a | m) = (∑_a P(s, a) · S(m | s, a)) / Z
HF marginal over UrnState: per-access marginal posterior.
Companion of statePosterior_apply for the Access component.
Comparison decomposition for the per-state marginal posterior.
L1 favours state s₂ over s₁ after m iff the conditional joint
sums favour it. Direct corollary of posterior_fst_lt_iff.
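The same pattern as a hedged finite sketch (explicit Finset sums over illustrative Fin types; the file states these as PMF.posterior lemmas):

```lean
import Mathlib

-- Joint Bayes (Eq. 18 pattern): posterior x ∝ prior x · S(m | x), m fixed
-- inside the likelihood lik.
def jointPosterior {n k : ℕ} (prior lik : Fin n × Fin k → ℚ)
    (x : Fin n × Fin k) : ℚ :=
  prior x * lik x / Finset.univ.sum (fun y : Fin n × Fin k => prior y * lik y)

-- Eq. 19/20 pattern: marginalise out the access (resp. state) component.
def statePosterior {n k : ℕ} (prior lik : Fin n × Fin k → ℚ) (s : Fin n) : ℚ :=
  Finset.univ.sum fun a : Fin k => jointPosterior prior lik (s, a)

def accessPosterior {n k : ℕ} (prior lik : Fin n × Fin k → ℚ) (a : Fin k) : ℚ :=
  Finset.univ.sum fun s : Fin n => jointPosterior prior lik (s, a)
```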
§6. Hellinger-vs-KL: the architectural divergence point #
HF's central methodological choice (Eq. 16, footnote): KL divergence
gives any message m whose extension ⟦m⟧ does not cover the speaker's
belief support infinite disutility, so the speaker can NEVER consider
it. Hellinger distance is uniformly bounded by 1, so all messages have
bounded disutility.
The two theorems below witness this directly:
- klDiv_eq_top_when_zero_support: KL = ∞ whenever some support point of P has zero mass under Q.
- hellingerDistSq_le_one_finite: H² ≤ 1 always, so HD ≤ 1.
Together they make the architectural divergence visible at theorem level: there exist (P, Q) pairs where KL-utility excludes Q as a message choice but Hellinger-utility admits it.
The theorem klDiv_eq_top_iff_not_absolutelyContinuous from mathlib
(via PMF.klDiv rfl-bridge) provides the "iff" form.
KL divergence is ∞ when Q has zero mass on P's support.
Direct from mathlib klDiv_of_not_ac: KL = ∞ when P.toMeasure is
not absolutely continuous w.r.t. Q.toMeasure. Absolute continuity
fails at s₀ because Q s₀ = 0 but P s₀ ≠ 0.
Hellinger squared distance is ≤ 1 on PMFs: H²(P, Q) = 1 - BC(P, Q)
where BC ∈ [0, 1] (Cauchy-Schwarz on √P, √Q, both with sum-to-1).
H² is bounded above by 1 because BC ≥ 0.
The lower bound 0 ≤ BC is immediate (each term √(P·Q) ≥ 0).
Hellinger distance is ≤ 1 on PMFs: by H = √H² and H² ≤ 1.
The architectural divergence (concrete): there exist PMFs P, Q
on UrnState such that KL(P ‖ Q) = ∞ but HD(P, Q) ≤ 1.
Witness: P = pure 9 (speaker's degenerate belief at 9/10 red),
Q = pure 10 (literal listener for certainly — δ on s=10).
Under KL: speaker can never say "certainly". Under Hellinger:
speaker considers "certainly" with bounded disutility. This is HF's
key architectural claim.
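A numeric restatement of the witness in Float (the proofs above live on PMF; this is only an illustration, reusing hellingerDist from the §4 sketch):

```lean
-- KL(P ‖ Q) = Σᵢ pᵢ · log (pᵢ/qᵢ), with the 0 · log 0 = 0 convention.
def klDiv (p q : List Float) : Float :=
  (p.zip q).foldl (fun acc pq =>
    if pq.1 > 0 then acc + pq.1 * Float.log (pq.1 / pq.2) else acc) 0

-- P = δ₉ (belief after seeing 9/10 red); Q = δ₁₀ (literal listener of certainly).
def P : List Float := [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
def Q : List Float := [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

#eval klDiv P Q          -- inf: the KL speaker can never say "certainly"
#eval hellingerDist P Q  -- 1.0: maximal, but bounded — "certainly" stays available
```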
§7. Empirical data (Tables 6, 7, 9, 10) #
Inferred semantic threshold parameters from the simple expression model (Experiment 1 data, Table 6). Posterior means with 95% HDIs.
- mean : ℚ
- hdi_lower : ℚ
- hdi_upper : ℚ
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- HerbstrittFranke2019.PMF.θ_possibly_inferred = { mean := 247 / 1000, hdi_lower := 200 / 1000, hdi_upper := 299 / 1000 }
Instances For
Equations
- HerbstrittFranke2019.PMF.θ_probably_inferred = { mean := 549 / 1000, hdi_lower := 500 / 1000, hdi_upper := 594 / 1000 }
Instances For
Equations
- HerbstrittFranke2019.PMF.θ_certainly_inferred = { mean := 949 / 1000, hdi_lower := 904 / 1000, hdi_upper := 1 }
Instances For
Model–data correlation result (Tables 7, 10).
- dimension : String
- r : ℚ
- hdi_lower : ℚ
- hdi_upper : ℚ
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- HerbstrittFranke2019.PMF.corr_expression = { dimension := "expression", r := 861 / 1000, hdi_lower := 824 / 1000, hdi_upper := 902 / 1000 }
Instances For
Equations
- HerbstrittFranke2019.PMF.corr_state = { dimension := "state", r := 741 / 1000, hdi_lower := 666 / 1000, hdi_upper := 824 / 1000 }
Instances For
Equations
- HerbstrittFranke2019.PMF.corr_access = { dimension := "access", r := 798 / 1000, hdi_lower := 736 / 1000, hdi_upper := 858 / 1000 }
Instances For
Equations
- HerbstrittFranke2019.PMF.corr_observation = { dimension := "observation", r := 819 / 1000, hdi_lower := 766 / 1000, hdi_upper := 868 / 1000 }
Instances For
All simple-model correlations are substantial (r > 0.5).
Inferred outer-modifier thresholds from the complex expression model
(Experiment 2, Table 9). Inner thresholds (θ_possible, θ_likely)
overlap with Table 6 — cross-experiment stability.
Equations
- HerbstrittFranke2019.PMF.θ_possible_complex = { mean := 251 / 1000, hdi_lower := 200 / 1000, hdi_upper := 295 / 1000 }
Instances For
Equations
- HerbstrittFranke2019.PMF.θ_might_complex = { mean := 332 / 1000, hdi_lower := 295 / 1000, hdi_upper := 336 / 1000 }
Instances For
Equations
- HerbstrittFranke2019.PMF.θ_likely_complex = { mean := 549 / 1000, hdi_lower := 504 / 1000, hdi_upper := 599 / 1000 }
Instances For
Equations
- HerbstrittFranke2019.PMF.θ_probably_complex = { mean := 690 / 1000, hdi_lower := 629 / 1000, hdi_upper := 745 / 1000 }
Instances For
Equations
- HerbstrittFranke2019.PMF.θ_certainly_complex = { mean := 982 / 1000, hdi_lower := 969 / 1000, hdi_upper := 988 / 1000 }
Instances For
Equations
- HerbstrittFranke2019.PMF.corr_complex_expression = { dimension := "expression", r := 654 / 1000, hdi_lower := 596 / 1000, hdi_upper := 719 / 1000 }
Instances For
Equations
- HerbstrittFranke2019.PMF.corr_complex_state = { dimension := "state", r := 849 / 1000, hdi_lower := 812 / 1000, hdi_upper := 883 / 1000 }
Instances For
Equations
- HerbstrittFranke2019.PMF.corr_complex_access = { dimension := "access", r := 853 / 1000, hdi_lower := 818 / 1000, hdi_upper := 888 / 1000 }
Instances For
Equations
- HerbstrittFranke2019.PMF.corr_complex_observation = { dimension := "observation", r := 934 / 1000, hdi_lower := 915 / 1000, hdi_upper := 952 / 1000 }
Instances For
Cross-experiment threshold stability: the inner-expression
likely/possible thresholds inferred jointly with Experiment 2
complex-expression data agree with the simple-expression Experiment 1
probably/possibly thresholds to within a few thousandths.
Footnote 18: "θ_probably from the simpler model should be mapped onto θ_likely in Table 9 because the latter represents the threshold of the inner expressions."
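The stability claim as finite arithmetic over the quoted means (a sketch over the numerals, not the file's named constants):

```lean
import Mathlib

-- θ_likely(complex) = θ_probably(simple) exactly; θ_possible differs by 0.004.
example : (549 : ℚ)/1000 = 549/1000 := by norm_num
example : (251 : ℚ)/1000 - 247/1000 = 4/1000 := by norm_num
```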
§8. Cross-paper engagement #
Disagreement with LaBToM @cite{ying-zhi-xuan-wong-mansinghka-tenenbaum-2025} #
HF and LaBToM both infer credence thresholds for English probability
vocabulary by Bayesian fitting against experimental data, but pick
different thresholds where their lexicons overlap. HF's posterior mean
for probably is 0.549 with 95% HDI [0.500, 0.594]; LaBToM's point
estimate for likely is 0.700. LaBToM's value lies above the upper
bound of HF's HDI — the methodologies disagree at the 95%-credibility
level, not just on the point estimate.
Candidate explanations: lexical (probably ≠ likely), task (production
with urns vs Theory-of-Mind in a gridworld), or posterior uncertainty
(LaBToM reports point values where HF reports intervals).
HF and LaBToM agree on certainly/certain to within 0.001 — well
inside both HDIs. No theorem is stated for that pair: there is no
empirical disagreement to witness.
LaBToM's likely_.θ = 0.70 lies above the upper bound of HF's 95%
HDI for θ_probably ([0.500, 0.594]). The two parameter-fitted
theories disagree at the 95%-credibility level.
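The numeric core of that theorem, as a hedged standalone check over the quoted values:

```lean
import Mathlib

-- LaBToM's likely_.θ = 0.700 vs HF's θ_probably HDI upper bound 0.594.
example : (594 : ℚ)/1000 < 700/1000 := by norm_num
```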
Architectural contrast with @cite{goodman-stuhlmuller-2013} #
This model and G&S 2013 share the hypergeometric observation kernel
(both use PMF.hypergeometric — N=10 here, N=3 in G&S) and the RSA
architecture (literal listener, softmax speaker, posterior pragmatic
listener). The only architectural difference is the speaker utility:
| Component | G&S 2013 | HF 2019 |
|---|---|---|
| State space | Fin 4 (objects) | Fin 11 (red balls) |
| Observation | hypergeometric | hypergeometric |
| Utility | -KL(bel ‖ L0) | -HD(bel, L0) |
| Admits "true enough"? | NO (KL = ∞ on zero-support) | YES (HD ≤ 1) |
hellinger_admits_what_KL_excludes (§6) makes this difference visible:
the same speaker belief pure 9 paired with the literal listener for
certainly (pure 10) yields KL = ∞ but HD ≤ 1. Under G&S 2013's
KL utility this speaker would be excluded from saying "certainly RED";
under HF's Hellinger utility she can. The empirical evidence (HF's
production data, Section 5) shows speakers DO say "certainly" at 9/10.
The Bretagnolle–Huber inequality (PMF.two_hellingerDistSq_le_klDiv in
Core/Probability/Entropy.lean) gives the formal direction 2 · H² ≤ KL:
wherever KL is finite it dominates twice the squared Hellinger distance,
and where KL = ∞ the bound is vacuous. Hellinger is the strictly weaker
(more permissive) divergence — exactly what HF wants for the speaker
utility.
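A one-line derivation sketch of that direction (standard, stated here only for orientation): by Jensen's inequality, log BC(P, Q) = log E_P[√(Q/P)] ≥ E_P[log √(Q/P)] = −KL(P ‖ Q)/2, so KL ≥ −2 · log BC; combining with −log x ≥ 1 − x gives KL ≥ 2 · (1 − BC) = 2 · H².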
Modal concord overprediction #
The paper notes that the compositional model overpredicts the frequency of "might be possible" in production data (§6). The compositional semantics assigns "might be possible" a very weak truth condition (posterior probability of "possible" exceeds θ_might = 0.332), making it true in almost all conditions.
Explanation: participants may give "might be possible" a modal
concord reading, collapsing the two modals to a single "possible".
See Phenomena/Modality/ModalConcord for the general phenomenon.
This is qualitative; not formalised here.