Documentation

Linglib.Studies.HerbstrittFranke2019

[HF19]: complex probability expressions via RSA on PMF #

Compositional threshold semantics plus an RSA model over an urn scenario (N = 10 balls), formalised on mathlib's PMF in the same key as Studies/LassiterGoodman2017PMF.lean. Cognition 186 (2019) 50–71.

The architectural novelty is a Hellinger-distance speaker utility (Eq. 16) in place of KL divergence. The literal listener P_LL(s|m) ∝ ⟦m⟧(s) · P_prior(s) puts zero mass outside a meaning's extension, so KL(P_rat.bel ‖ P_LL) = ∞ whenever the speaker's belief has support outside it — a strictly probabilistic speaker could never say "certainly RED" at 9/10. Hellinger distance is bounded by 1, so the speaker can assert it with bounded disutility.

Main definitions #

Main results #

The cross-paper contrast with [GS13] makes the speaker-utility divergence (KL vs Hellinger) visible at theorem level. The paper's posterior-predictive evaluation (JAGS fits, the model–data correlations of Tables 6-10) is statistical, not structural, and is not formalised here.

Domain types #

@[reducible, inline]

Urn state: number of red balls in the urn (0..10).

Equations
Instances For

    The proportion of red balls: s/10. First-order probability of drawing a red ball, and the measure function for simple expressions.

    Equations
    Instances For
      @[reducible, inline]

      Observation count (zero-padded for access < 10): obs > access have probability 0 in the kernel.

      Equations
      Instances For

        Belief formation via the hypergeometric kernel #

        The observation kernel is the hypergeometric distribution from Core/Probability/Hypergeometric.lean, embedded into the padded Obs := Fin 11 via PMF.map. The speaker's posterior belief is then just PMF.posterior of this kernel against the uniform world prior.

        noncomputable def HerbstrittFranke2019.obsKernel (access : ) (h_a : access 10) (s : UrnState) :
        PMF Obs

        Observation kernel: P(obs | access, state). The hypergeometric PMF on Fin (access + 1) embedded into Fin 11 (zero on obs > access). Uses PMF.hypergeometric from Core/Probability.

        Equations
        Instances For
          noncomputable def HerbstrittFranke2019.speakerBelief (access : ) (h_a : access 10) (obs : Obs) (h_marg : PMF.marginal (obsKernel access h_a) (PMF.uniformOfFintype UrnState) obs 0) :

          Speaker's posterior belief P_rat.bel(·|o, a) (Eq. 12). With uniform world prior, this is PMF.posterior of the hypergeometric kernel.

          Equations
          Instances For

            -valued speaker belief for finite-arithmetic verification #

            Mirrors the existing file's speakerBeliefQ for norm_num checks. Bridge to the PMF version: both reduce to the same hypergeometric ratio.

            def HerbstrittFranke2019.hypergeoQ (access obs : ) (s : UrnState) :

            Hypergeometric weight at (N=10, K=s, n=access, k=obs) as ℚ.

            Equations
            Instances For
              def HerbstrittFranke2019.speakerBeliefQ (access obs : ) (s : UrnState) :

              ℚ-valued speaker belief — Bayes' rule over the hypergeometric kernel with uniform world prior.

              Equations
              • One or more equations did not get rendered due to their size.
              Instances For

                Simple expression semantics #

                Semantic threshold for "possibly" (posterior mean from Table 6).

                Equations
                Instances For

                  Semantic threshold for "probably" (posterior mean from Table 6).

                  Equations
                  Instances For

                    Semantic threshold for "certainly" (posterior mean from Table 6).

                    Equations
                    Instances For

                      The inferred threshold ordering matches the theoretical prediction: certainly > probably > possibly.

                      The five simple expressions from Experiments 2 and 3.

                      Instances For
                        @[implicit_reducible]
                        Equations
                        Equations
                        • One or more equations did not get rendered due to their size.
                        Instances For
                          @[implicit_reducible]
                          Equations
                          • One or more equations did not get rendered due to their size.

                          Simple expression extensions are nested: certainly ⊂ probably ⊂ possibly.

                          Compositional complex-expression semantics #

                          def HerbstrittFranke2019.posteriorProb (access obs : ) (φ : UrnStateBool) :

                          Posterior probability that an urn-state proposition φ holds, given the speaker's observation. ℚ-valued for decidable verification.

                          posteriorProb(access, obs, φ) = Σ_{s ∈ ⟦φ⟧} P_rat.bel(s | obs, access)

                          Equations
                          Instances For

                            The three inner expressions from Experiment 3 (Eq. 22). Footnote 18: θ_probably from the simple model maps to θ_likely of the inner expressions in the complex model.

                            Instances For
                              @[implicit_reducible]
                              Equations
                              Equations
                              • One or more equations did not get rendered due to their size.
                              Instances For

                                The four outer modifiers from Experiment 3.

                                Instances For
                                  @[implicit_reducible]
                                  Equations
                                  Equations
                                  • One or more equations did not get rendered due to their size.
                                  Instances For
                                    def HerbstrittFranke2019.complexMeaning (outer : OuterMod) (inner : InnerExpr) (access obs : ) :
                                    Bool

                                    Complex expression Y(X(RED)) (Eq. 23): ⟦Y(X(RED))⟧(⟨o, a⟩) ⟺ Σ_{s∈⟦X(RED)⟧} P_rat.bel(s|o, a) > θ_Y

                                    Equations
                                    Instances For

                                      For HF's specific thresholds, strict > and non-strict give the same extension on UrnState — no proportion s/10 (s ∈ {0,...,10}) exactly equals any threshold. Justifies using > (paper Eq. 13) even when the theory-layer nestedThreshold uses (Fagin & Halpern).

                                      RSA model on PMF #

                                      The literal listener P_LL(s|m) ∝ ⟦m⟧(s) · P_prior(s) is the uniform distribution on the meaning extension (Eq. 15, with uniform prior). The speaker normalises softmax-Hellinger scores (Eq. 16-17). The pragmatic listener is PMF.posterior over the joint (state × obs × access) (Eq. 18).

                                      noncomputable def HerbstrittFranke2019.literalListener (φ : UrnStateBool) (h_nonempty : {x : UrnState | φ x = true}.Nonempty) :

                                      Literal listener (Eq. 15) for a meaning with non-empty extension: uniform on the extension.

                                      Equations
                                      Instances For
                                        theorem HerbstrittFranke2019.simple_meaning_nonempty (u : SimpleExpr) :
                                        {x : UrnState | u.meaning x = true}.Nonempty

                                        All five simple expressions have non-empty extensions under HF's inferred thresholds — every meaning admits a literal listener.

                                        Joint posterior marginalisations #

                                        The pragmatic listener is P_PL(s, o, a | m); its marginals P(s, o | m) (over access) and P(s, a | m) (over obs) are direct corollaries of Core/Probability/JointPosterior.lean's marginalisation theorems applied to the joint posterior.

                                        The structural form here works for any speaker kernel S and joint prior over (state × access) — paper-specific instantiation is the softmax-Hellinger speaker, which lives in the Hellinger-vs-KL section below.

                                        noncomputable def HerbstrittFranke2019.pragmaticListener {Access : Type u_1} (S : UrnState × AccessPMF SimpleExpr) (joint : PMF (UrnState × Access)) (m : SimpleExpr) (h_marg : PMF.marginal S joint m 0) :
                                        PMF (UrnState × Access)

                                        HF Eq. 18: pragmatic listener as joint posterior. P_PL(s, a | m) ∝ S(m | s, a) · P(s, a).

                                        Just PMF.posterior at α := UrnState × Access. The marginalisation theorems below are direct instantiations of posterior_fst_apply / posterior_snd_apply.

                                        Equations
                                        Instances For
                                          theorem HerbstrittFranke2019.statePosterior_apply {Access : Type u_1} [Fintype Access] (S : UrnState × AccessPMF SimpleExpr) (joint : PMF (UrnState × Access)) (m : SimpleExpr) (h_marg : PMF.marginal S joint m 0) (s : UrnState) :
                                          (pragmaticListener S joint m h_marg).fst s = (∑ a : Access, joint (s, a) * (S (s, a)) m) / PMF.marginal S joint m

                                          HF Eq. 19/20 (marginal over Access): per-state marginal posterior. P(s | m) = ∑_a P_PL(s, a | m) = (∑_a P(s, a) · S(m | s, a)) / Z

                                          theorem HerbstrittFranke2019.accessPosterior_apply {Access : Type u_1} [Fintype Access] [DecidableEq Access] (S : UrnState × AccessPMF SimpleExpr) (joint : PMF (UrnState × Access)) (m : SimpleExpr) (h_marg : PMF.marginal S joint m 0) (a : Access) :
                                          (pragmaticListener S joint m h_marg).snd a = (∑ s : UrnState, joint (s, a) * (S (s, a)) m) / PMF.marginal S joint m

                                          HF marginal over UrnState: per-access marginal posterior. Companion of statePosterior_apply for the Access component.

                                          theorem HerbstrittFranke2019.statePosterior_lt_iff {Access : Type u_1} [Fintype Access] (S : UrnState × AccessPMF SimpleExpr) (joint : PMF (UrnState × Access)) (m : SimpleExpr) (h_marg : PMF.marginal S joint m 0) (s₁ s₂ : UrnState) :
                                          (pragmaticListener S joint m h_marg).fst s₁ < (pragmaticListener S joint m h_marg).fst s₂ a : Access, joint (s₁, a) * (S (s₁, a)) m < a : Access, joint (s₂, a) * (S (s₂, a)) m

                                          Comparison decomposition for the per-state marginal posterior. L1 favours state s₂ over s₁ after m iff the conditional joint sums favour it. Direct corollary of posterior_fst_lt_iff.

                                          Hellinger vs KL: the architectural divergence point #

                                          HF's central methodological choice: KL divergence makes any message whose extension fails to cover the speaker's belief support have infinite disutility, so the speaker can never consider it. Hellinger distance is uniformly bounded by 1, so all messages have bounded disutility.

                                          The two theorems below witness this directly:

                                          Together they make the architectural divergence visible at theorem level: there exist (P, Q) pairs where KL-utility excludes Q as a message choice but Hellinger-utility admits it.

                                          theorem HerbstrittFranke2019.klDiv_eq_top_when_zero_support {α : Type u_1} [Fintype α] [MeasurableSpace α] [MeasurableSingletonClass α] (P Q : PMF α) (s₀ : α) (hP : P s₀ 0) (hQ : Q s₀ = 0) :
                                          P.klDiv Q =

                                          KL divergence is when Q has zero mass on P's support. Direct from mathlib klDiv_of_not_ac: KL = ∞ when P.toMeasure is not absolutely continuous w.r.t. Q.toMeasure. Absolute continuity fails at s₀ because Q s₀ = 0 but P s₀ ≠ 0.

                                          theorem HerbstrittFranke2019.hellingerDistSq_le_one {α : Type u_1} [Fintype α] (P Q : PMF α) :

                                          Hellinger squared distance is ≤ 1 on PMFs: H²(P, Q) = 1 - BC(P, Q) where BC ∈ [0, 1] (Cauchy-Schwarz on √P, √Q, both with sum-to-1). Bounded above by 1 because BC ≥ 0.

                                          The lower bound 0 ≤ BC is immediate (each term √(P·Q) ≥ 0).

                                          theorem HerbstrittFranke2019.hellingerDist_le_one {α : Type u_1} [Fintype α] (P Q : PMF α) :

                                          Hellinger distance is ≤ 1 on PMFs: by H = √H² and H² ≤ 1.

                                          theorem HerbstrittFranke2019.hellinger_admits_what_KL_excludes :
                                          (PMF.pure 9, ).klDiv (PMF.pure 10, ) = (PMF.pure 9, ).hellingerDist (PMF.pure 10, ) 1

                                          The architectural divergence (concrete): there exist PMFs P, Q on UrnState such that KL(P ‖ Q) = ∞ but HD(P, Q) ≤ 1.

                                          Witness: P = pure 9 (speaker's degenerate belief at 9/10 red), Q = pure 10 (literal listener for certainly — δ on s=10). Under KL: speaker can never say "certainly". Under Hellinger: speaker considers "certainly" with bounded disutility. This is HF's key architectural claim.

                                          Architectural contrast with [GS13] #

                                          This model and G&S 2013 share the hypergeometric observation kernel (both use PMF.hypergeometric with N=10 and N=3 respectively) and the RSA architecture (literal listener, softmax speaker, posterior pragmatic listener). The only architectural difference is the speaker utility:

                                          ComponentG&S 2013HF 2019
                                          State spaceFin 4 (objects)Fin 11 (red balls)
                                          Observationhypergeometrichypergeometric
                                          Utility-KL(bel ‖ L0)-HD(bel, L0)
                                          Admits "true enough"?NO (KL = ∞ on zero-support)YES (HD ≤ 1)

                                          hellinger_admits_what_KL_excludes makes this difference visible: the same speaker belief pure 9 paired with the literal listener for certainly (pure 10) yields KL = ∞ but HD ≤ 1. Under G&S 2013's KL utility this speaker would be excluded from saying "certainly RED"; under HF's Hellinger utility she can. The empirical evidence (HF's production data, Section 5) shows speakers DO say "certainly" at 9/10.

                                          The Bretagnolle–Huber inequality (PMF.two_hellingerDistSq_le_klDiv in Core/Probability/Entropy.lean) gives the formal direction 2 · H² ≤ KL on the finite side: where KL is finite, H² is too, and bounded. Hellinger is the strictly weaker (more permissive) divergence — exactly what HF wants for the speaker utility.

                                          The paper finds that the compositional model underpredicts how often speakers choose "might be possible": although its truth condition is weak (posterior probability of "possible" exceeds θ_might = 0.332), the pragmatic speaker prefers logically stronger messages and so assigns the doubly-hedged form a low choice rate — yet participants select it far more often than predicted. A modal concord reading, collapsing the two modals to a single "possible" [Zei07], would close the gap. Qualitative; not formalised here.