[HF19]: complex probability expressions via RSA on PMF #
Compositional threshold semantics plus an RSA model over an urn scenario
(N = 10 balls), formalised on mathlib's PMF in the same key as
Studies/LassiterGoodman2017PMF.lean. Cognition 186 (2019) 50–71.
The architectural novelty is a Hellinger-distance speaker utility (Eq. 16) in
place of KL divergence. The literal listener P_LL(s|m) ∝ ⟦m⟧(s) · P_prior(s)
puts zero mass outside a meaning's extension, so KL(P_rat.bel ‖ P_LL) = ∞
whenever the speaker's belief has support outside it — a strictly probabilistic
speaker could never say "certainly RED" at 9/10. Hellinger distance is bounded
by 1, so the speaker can assert it with bounded disutility.
Main definitions #
obsKernel,speakerBelief— belief formation (Eq. 12) asPMF.posteriorof the hypergeometric observation kernel, withℚ-valued mirrorshypergeoQ/speakerBeliefQfor finite-arithmetic checks.SimpleExpr.meaning,complexMeaning— threshold semantics for simple (Eq. 13-14) and nested (Eq. 22-23) probability expressions.literalListener,pragmaticListener— the RSA literal listener (uniform on a meaning extension, Eq. 15) and the pragmatic listener as a jointPMF.posterior(Eq. 18).
Main results #
hellinger_admits_what_KL_excludes— on the witness pair(pure 9, pure 10),KL = ∞butHD ≤ 1: the formal content of the Hellinger-vs-KL divergence, viaklDiv_eq_top_when_zero_supportandhellingerDistSq_le_one.statePosterior_apply,accessPosterior_apply,statePosterior_lt_iff— marginalisations (Eq. 19-20) of the joint posterior, direct corollaries ofCore/Probability/JointPosterior.certainly_subset_probably,probably_subset_possibly— nesting of the inferred meaning extensions.
The cross-paper contrast with [GS13] makes the speaker-utility divergence (KL vs Hellinger) visible at theorem level. The paper's posterior-predictive evaluation (JAGS fits, the model–data correlations of Tables 6-10) is statistical, not structural, and is not formalised here.
Domain types #
Urn state: number of red balls in the urn (0..10).
Equations
- HerbstrittFranke2019.UrnState = Fin 11
Instances For
The proportion of red balls: s/10. First-order probability of
drawing a red ball, and the measure function for simple expressions.
Equations
- HerbstrittFranke2019.proportion s = ↑↑s / 10
Instances For
Observation count (zero-padded for access < 10): obs > access have
probability 0 in the kernel.
Equations
- HerbstrittFranke2019.Obs = Fin 11
Instances For
Belief formation via the hypergeometric kernel #
The observation kernel is the hypergeometric distribution from
Core/Probability/Hypergeometric.lean, embedded into the padded
Obs := Fin 11 via PMF.map. The speaker's posterior belief is then
just PMF.posterior of this kernel against the uniform world prior.
Observation kernel: P(obs | access, state). The hypergeometric
PMF on Fin (access + 1) embedded into Fin 11 (zero on
obs > access). Uses PMF.hypergeometric from Core/Probability.
Equations
- HerbstrittFranke2019.obsKernel access h_a s = PMF.map (fun (o : Fin (access + 1)) => ⟨↑o, ⋯⟩) (PMF.hypergeometric 10 (↑s) access h_a ⋯)
Instances For
Speaker's posterior belief P_rat.bel(·|o, a) (Eq. 12).
With uniform world prior, this is PMF.posterior of the
hypergeometric kernel.
Equations
- HerbstrittFranke2019.speakerBelief access h_a obs h_marg = PMF.posterior (HerbstrittFranke2019.obsKernel access h_a) (PMF.uniformOfFintype HerbstrittFranke2019.UrnState) obs h_marg
Instances For
ℚ-valued speaker belief for finite-arithmetic verification #
Mirrors the existing file's speakerBeliefQ for norm_num checks. Bridge
to the PMF version: both reduce to the same hypergeometric ratio.
Hypergeometric weight at (N=10, K=s, n=access, k=obs) as ℚ.
Equations
- HerbstrittFranke2019.hypergeoQ access obs s = if obs ≤ access then ↑((↑s).choose obs * (10 - ↑s).choose (access - obs)) / ↑(Nat.choose 10 access) else 0
Instances For
ℚ-valued speaker belief — Bayes' rule over the hypergeometric kernel with uniform world prior.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Simple expression semantics #
Semantic threshold for "possibly" (posterior mean from Table 6).
Equations
- HerbstrittFranke2019.θ_possibly = 247 / 1000
Instances For
Semantic threshold for "probably" (posterior mean from Table 6).
Equations
- HerbstrittFranke2019.θ_probably = 549 / 1000
Instances For
Semantic threshold for "certainly" (posterior mean from Table 6).
Equations
- HerbstrittFranke2019.θ_certainly = 949 / 1000
Instances For
The inferred threshold ordering matches the theoretical prediction: certainly > probably > possibly.
The five simple expressions from Experiments 2 and 3.
- certainlyNot : SimpleExpr
- probablyNot : SimpleExpr
- possibly : SimpleExpr
- probably : SimpleExpr
- certainly : SimpleExpr
Instances For
Equations
- HerbstrittFranke2019.instDecidableEqSimpleExpr x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Equations
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
Equations
- One or more equations did not get rendered due to their size.
Simple expression meaning using inferred thresholds. Eq. 13: ⟦X(RED)⟧ = {s | s/10 > θ_X} (positive) Eq. 14: ⟦X not(RED)⟧ = {s | s/10 < 1 − θ_X} (negated)
Equations
- HerbstrittFranke2019.SimpleExpr.certainly.meaning = fun (s : HerbstrittFranke2019.UrnState) => decide (HerbstrittFranke2019.proportion s > HerbstrittFranke2019.θ_certainly)
- HerbstrittFranke2019.SimpleExpr.probably.meaning = fun (s : HerbstrittFranke2019.UrnState) => decide (HerbstrittFranke2019.proportion s > HerbstrittFranke2019.θ_probably)
- HerbstrittFranke2019.SimpleExpr.possibly.meaning = fun (s : HerbstrittFranke2019.UrnState) => decide (HerbstrittFranke2019.proportion s > HerbstrittFranke2019.θ_possibly)
- HerbstrittFranke2019.SimpleExpr.probablyNot.meaning = fun (s : HerbstrittFranke2019.UrnState) => decide (HerbstrittFranke2019.proportion s < 1 - HerbstrittFranke2019.θ_probably)
- HerbstrittFranke2019.SimpleExpr.certainlyNot.meaning = fun (s : HerbstrittFranke2019.UrnState) => decide (HerbstrittFranke2019.proportion s < 1 - HerbstrittFranke2019.θ_certainly)
Instances For
Simple expression extensions are nested: certainly ⊂ probably ⊂ possibly.
Compositional complex-expression semantics #
Posterior probability that an urn-state proposition φ holds, given the speaker's observation. ℚ-valued for decidable verification.
posteriorProb(access, obs, φ) = Σ_{s ∈ ⟦φ⟧} P_rat.bel(s | obs, access)
Equations
- HerbstrittFranke2019.posteriorProb access obs φ = {x : HerbstrittFranke2019.UrnState | φ x = true}.sum (HerbstrittFranke2019.speakerBeliefQ access obs)
Instances For
The three inner expressions from Experiment 3 (Eq. 22). Footnote 18:
θ_probably from the simple model maps to θ_likely of the inner
expressions in the complex model.
Instances For
Equations
- HerbstrittFranke2019.instDecidableEqInnerExpr x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
Inner expression meaning (Eq. 22).
Equations
- HerbstrittFranke2019.InnerExpr.likely.meaning = fun (s : HerbstrittFranke2019.UrnState) => decide (HerbstrittFranke2019.proportion s > HerbstrittFranke2019.θ_probably)
- HerbstrittFranke2019.InnerExpr.possible.meaning = fun (s : HerbstrittFranke2019.UrnState) => decide (HerbstrittFranke2019.proportion s > HerbstrittFranke2019.θ_possibly)
- HerbstrittFranke2019.InnerExpr.unlikely.meaning = fun (s : HerbstrittFranke2019.UrnState) => decide (HerbstrittFranke2019.proportion s < 1 - HerbstrittFranke2019.θ_probably)
Instances For
Equations
- HerbstrittFranke2019.instDecidableEqOuterMod x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Equations
Equations
- One or more equations did not get rendered due to their size.
Instances For
Outer modifier threshold (Table 9, complex model).
Equations
Instances For
Complex expression Y(X(RED)) (Eq. 23): ⟦Y(X(RED))⟧(⟨o, a⟩) ⟺ Σ_{s∈⟦X(RED)⟧} P_rat.bel(s|o, a) > θ_Y
Equations
- HerbstrittFranke2019.complexMeaning outer inner access obs = decide (HerbstrittFranke2019.posteriorProb access obs inner.meaning > outer.threshold)
Instances For
For HF's specific thresholds, strict > and non-strict ≥ give the
same extension on UrnState — no proportion s/10 (s ∈ {0,...,10})
exactly equals any threshold. Justifies using > (paper Eq. 13) even
when the theory-layer nestedThreshold uses ≥ (Fagin & Halpern).
RSA model on PMF #
The literal listener P_LL(s|m) ∝ ⟦m⟧(s) · P_prior(s) is the uniform
distribution on the meaning extension (Eq. 15, with uniform prior). The
speaker normalises softmax-Hellinger scores (Eq. 16-17). The pragmatic
listener is PMF.posterior over the joint (state × obs × access) (Eq. 18).
Literal listener (Eq. 15) for a meaning with non-empty extension: uniform on the extension.
Equations
- HerbstrittFranke2019.literalListener φ h_nonempty = PMF.uniformOfFinset {x : HerbstrittFranke2019.UrnState | φ x = true} h_nonempty
Instances For
All five simple expressions have non-empty extensions under HF's inferred thresholds — every meaning admits a literal listener.
Joint posterior marginalisations #
The pragmatic listener is P_PL(s, o, a | m); its marginals P(s, o | m)
(over access) and P(s, a | m) (over obs) are direct corollaries of
Core/Probability/JointPosterior.lean's marginalisation theorems applied to
the joint posterior.
The structural form here works for any speaker kernel S and joint
prior over (state × access) — paper-specific instantiation is the
softmax-Hellinger speaker, which lives in the Hellinger-vs-KL section below.
HF Eq. 18: pragmatic listener as joint posterior.
P_PL(s, a | m) ∝ S(m | s, a) · P(s, a).
Just PMF.posterior at α := UrnState × Access. The marginalisation
theorems below are direct instantiations of posterior_fst_apply /
posterior_snd_apply.
Equations
- HerbstrittFranke2019.pragmaticListener S joint m h_marg = PMF.posterior S joint m h_marg
Instances For
HF Eq. 19/20 (marginal over Access): per-state marginal posterior.
P(s | m) = ∑_a P_PL(s, a | m) = (∑_a P(s, a) · S(m | s, a)) / Z
HF marginal over UrnState: per-access marginal posterior.
Companion of statePosterior_apply for the Access component.
Comparison decomposition for the per-state marginal posterior.
L1 favours state s₂ over s₁ after m iff the conditional joint
sums favour it. Direct corollary of posterior_fst_lt_iff.
Hellinger vs KL: the architectural divergence point #
HF's central methodological choice: KL divergence makes any message whose extension fails to cover the speaker's belief support have infinite disutility, so the speaker can never consider it. Hellinger distance is uniformly bounded by 1, so all messages have bounded disutility.
The two theorems below witness this directly:
klDiv_eq_top_when_zero_support: KL = ∞ whenever some support point ofPhas zero mass underQ.hellingerDistSq_le_one: H² ≤ 1 always, so HD ≤ 1.
Together they make the architectural divergence visible at theorem level:
there exist (P, Q) pairs where KL-utility excludes Q as a message choice
but Hellinger-utility admits it.
KL divergence is ∞ when Q has zero mass on P's support.
Direct from mathlib klDiv_of_not_ac: KL = ∞ when P.toMeasure is
not absolutely continuous w.r.t. Q.toMeasure. Absolute continuity
fails at s₀ because Q s₀ = 0 but P s₀ ≠ 0.
Hellinger squared distance is ≤ 1 on PMFs: H²(P, Q) = 1 - BC(P, Q)
where BC ∈ [0, 1] (Cauchy-Schwarz on √P, √Q, both with sum-to-1).
Bounded above by 1 because BC ≥ 0.
The lower bound 0 ≤ BC is immediate (each term √(P·Q) ≥ 0).
Hellinger distance is ≤ 1 on PMFs: by H = √H² and H² ≤ 1.
The architectural divergence (concrete): there exist PMFs P, Q
on UrnState such that KL(P ‖ Q) = ∞ but HD(P, Q) ≤ 1.
Witness: P = pure 9 (speaker's degenerate belief at 9/10 red),
Q = pure 10 (literal listener for certainly — δ on s=10).
Under KL: speaker can never say "certainly". Under Hellinger:
speaker considers "certainly" with bounded disutility. This is HF's
key architectural claim.
Architectural contrast with [GS13] #
This model and G&S 2013 share the hypergeometric observation kernel
(both use PMF.hypergeometric with N=10 and N=3 respectively) and the
RSA architecture (literal listener, softmax speaker, posterior pragmatic
listener). The only architectural difference is the speaker utility:
| Component | G&S 2013 | HF 2019 |
|---|---|---|
| State space | Fin 4 (objects) | Fin 11 (red balls) |
| Observation | hypergeometric | hypergeometric |
| Utility | -KL(bel ‖ L0) | -HD(bel, L0) |
| Admits "true enough"? | NO (KL = ∞ on zero-support) | YES (HD ≤ 1) |
hellinger_admits_what_KL_excludes makes this difference visible:
the same speaker belief pure 9 paired with the literal listener for
certainly (pure 10) yields KL = ∞ but HD ≤ 1. Under G&S 2013's
KL utility this speaker would be excluded from saying "certainly RED";
under HF's Hellinger utility she can. The empirical evidence (HF's
production data, Section 5) shows speakers DO say "certainly" at 9/10.
The Bretagnolle–Huber inequality (PMF.two_hellingerDistSq_le_klDiv in
Core/Probability/Entropy.lean) gives the formal direction 2 · H² ≤ KL
on the finite side: where KL is finite, H² is too, and bounded.
Hellinger is the strictly weaker (more permissive) divergence — exactly
what HF wants for the speaker utility.
Modal concord and might be possible #
The paper finds that the compositional model underpredicts how often
speakers choose "might be possible": although its truth condition is weak
(posterior probability of "possible" exceeds θ_might = 0.332), the pragmatic
speaker prefers logically stronger messages and so assigns the doubly-hedged
form a low choice rate — yet participants select it far more often than
predicted. A modal concord reading, collapsing the two modals to a single
"possible" [Zei07], would close the gap. Qualitative; not formalised
here.