Bayesian Posterior on PMF #
For a kernel κ : α → PMF β and prior μ : PMF α, the posterior at
observation b is the conditional distribution over α defined by
Bayes' rule:
posterior κ μ b a = μ a * κ a b / ∑' a', μ a' * κ a' b
This is the PMF-level analogue of ProbabilityTheory.posterior
(Mathlib.Probability.Kernel.Posterior). Mathlib's posterior is
measure-theoretic and requires [StandardBorelSpace] /
[IsFiniteKernel]; this version operates directly on countably-
supported distributions and avoids the measure-theoretic typeclass
burden, at the cost of requiring an explicit positivity hypothesis
on the marginal at b.
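As a minimal Lean sketch, the two declarations described below might be reconstructed as follows (primed names and both `sorry`-d side conditions are assumptions made for illustration; the file's actual definitions appear further down):

```lean
import Mathlib

open scoped ENNReal

variable {α β : Type*}

-- Joint marginal `P(b)`; definitionally `∑' a, μ a * κ a b` via `PMF.bind_apply`.
noncomputable abbrev marginal' (κ : α → PMF β) (μ : PMF α) (b : β) : ℝ≥0∞ :=
  (μ.bind κ) b

-- Posterior as normalisation of the unnormalised scores `μ a * κ a b`.
noncomputable def posterior' (κ : α → PMF β) (μ : PMF α) (b : β)
    (h : marginal' κ μ b ≠ 0) : PMF α :=
  PMF.normalize (fun a => μ a * κ a b)
    (by sorry)  -- total score is `marginal' κ μ b`, non-zero by `h`
    (by sorry)  -- total score is ≤ 1, hence ≠ ∞
```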
Main definitions #
- `PMF.marginal κ μ b` — joint marginal `P(b) = ∑' a, μ a * κ a b`. Defined as a `noncomputable abbrev` for `(μ.bind κ) b`; mathlib's `PMF.bind_apply` is `rfl`, so the two shapes are interchangeable. Use `marginal` when the formula is the natural reading; mathlib `bind_*` lemmas apply directly for the monadic shape.
- `PMF.posterior κ μ b h` — posterior PMF over `α` given kernel `κ`, prior `μ`, observation `b`, and proof `h : marginal κ μ b ≠ 0`.
Main statements #
- `posterior_apply` — explicit Bayes formula.
Inequality decomposition library #
For each PMF construction, parallel _lt_iff_lt and _le_iff_le lemmas
strip off the normalisation factor — the foundation for "L1/posterior
prefers a₂ over a₁" structural proofs. Every construction has both
strict and non-strict forms with parallel naming:
- `normalize_lt_iff_lt` / `normalize_le_iff_le` — generic `PMF.normalize`
- `reweight_lt_iff_lt` / `reweight_le_iff_le` — `reweight = normalize ∘ (· * w)`
- `posterior_lt_iff_score_lt` / `posterior_le_iff_score_le` — Bayesian posterior
Sum-over-prior monotonicity (no iff — pointwise ≤ doesn't reverse):
- `marginal_le_marginal` / `marginal_lt_marginal` — `marginal κ μ b` (≡ `(μ.bind κ) b`)
- `bind_le_bind` / `bind_lt_bind` — aliases exposing the monadic shape
Specialization for the common "uniform world prior" case:
- `posterior_lt_iff_kernel_lt_of_uniform` / `posterior_le_iff_kernel_le_of_uniform` — cancels both the marginal denominator AND the uniform prior factor in one move
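A usage sketch of the decomposition pattern, assuming this file's declarations are in scope (the exact implicit/explicit argument layout is an assumption):

```lean
import Mathlib

open scoped ENNReal

variable {α β : Type*} (κ : α → PMF β) (μ : PMF α) (b : β) (a₁ a₂ : α)

-- An "L1 prefers a₂ over a₁ after observing b" goal strips down to the
-- unnormalised scores; the shared marginal denominator cancels.
example (h : PMF.marginal κ μ b ≠ 0)
    (hscore : μ a₁ * κ a₁ b < μ a₂ * κ a₂ b) :
    PMF.posterior κ μ b h a₁ < PMF.posterior κ μ b h a₂ := by
  sorry -- would be closed by the `.mpr` direction of `posterior_lt_iff_score_lt`
```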
The bridge to ProbabilityTheory.posterior (via PMF.toMeasure /
Measure.toKernel) is left for a future file once a downstream
consumer needs the measure-theoretic identity.
A finite-type kernel-marginal at b is finite. Convenience composition of
PMF.apply_ne_top over a Fintype index — the natural hypothesis shape for
consumers building PMF.normalize from a kernel slice.
The marginal probability of observation b under the joint
distribution induced by kernel κ and prior μ:
P(b) = ∑' a, μ a * κ a b = (μ.bind κ) b.
An abbrev for the bind shape with reversed argument order. Definitionally
equal to (μ.bind κ) b via mathlib's PMF.bind_apply (rfl); all mathlib
bind_* lemmas apply directly.
Equations
- PMF.marginal κ μ b = (μ.bind κ) b
Instances For
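Because `PMF.bind_apply` is stated to be `rfl`, both shapes should be definitionally interchangeable. A sketch, assuming the `PMF.marginal` declaration above is in scope:

```lean
import Mathlib

open scoped ENNReal

variable {α β : Type*} (κ : α → PMF β) (μ : PMF α) (b : β)

example : PMF.marginal κ μ b = (μ.bind κ) b := rfl
example : PMF.marginal κ μ b = ∑' a, μ a * κ a b := rfl  -- `PMF.bind_apply` is `rfl`
```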
A single witness a with μ a ≠ 0 and κ a b ≠ 0 suffices to make the
marginal non-zero — the standard positivity discharge for PMF.posterior.
Kernel-slice analogue of marginal_ne_zero: a single witness a with
κ a b ≠ 0 makes the prior-free fan-out ∑' a', κ a' b non-zero. The
shape consumers need when normalising the speaker step in RSA — there is
no listener prior over α to multiply against.
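The witness-discharge shape described above, as a sketch (lemma name from the prose; its argument order is an assumption):

```lean
import Mathlib

open scoped ENNReal

variable {α β : Type*} (κ : α → PMF β) (μ : PMF α) (b : β)

-- One witness `a` with positive prior mass and positive kernel mass at `b`
-- discharges the positivity hypothesis that `PMF.posterior` requires.
example (a : α) (hμ : μ a ≠ 0) (hκ : κ a b ≠ 0) : PMF.marginal κ μ b ≠ 0 := by
  sorry -- the file's `marginal_ne_zero`, applied to the single witness
```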
PMF.bind over a Fintype is a Finset.sum: the convenience
tsum_fintype form of PMF.bind_apply. Saves the rw [PMF.bind_apply, tsum_fintype] ritual that arises in every PMF-bind consumer.
PMF.bind of pure collapses to direct application: when the prior
is concentrated on a₀, the bind picks out f a₀ b. The natural pointwise
specialization of PMF.pure_bind (which gives the PMF-level equality).
Closed-form value of PMF.normalize: when both the numerator f a
and the partition tsum f are known, the normalized PMF takes the explicit
ratio. The equality companion to the existing inequality family
(normalize_lt_of_apply_eq_of_sum_lt, normalize_le_of_apply_eq_and_sum_ge,
normalize_eq_of_apply_eq_and_sum_eq).
The hypothesis tsum f ≠ ∞ is implicit via hF from the PMF.normalize
constructor; the tsum f ≠ 0 hypothesis comes from h0.
PMF.normalize collapses to pure when only one element has positive
weight. Mirror of posterior_eq_pure_of_singleton_score_support for the
prior-free normalize constructor — useful for "deterministic kernel"
patterns (e.g., a hypergeometric kernel at full sample access).
Inequality decomposition for PMF.normalize: comparing two normalised
values reduces to comparing the raw scores — the shared (∑' x, f x)⁻¹ factor
cancels (it is positive and finite by the well-formedness hypotheses).
Foundation lemma for the structural-decomposition shell: every "speaker prefers
utterance u₂ over u₁ at world w" claim in RSA reduces to comparing unnormalised
scores via this lemma. The partition function depends on w but not on the
utterance being compared.
The ≤ companion of normalize_lt_iff_lt.
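Spelled out as a statement sketch, with hypotheses following mathlib's `PMF.normalize` constructor (the proof is exactly what `normalize_lt_iff_lt` provides):

```lean
import Mathlib

open scoped ENNReal

variable {α : Type*}

example (f : α → ℝ≥0∞) (h0 : ∑' x, f x ≠ 0) (hF : ∑' x, f x ≠ ∞) (a₁ a₂ : α) :
    PMF.normalize f h0 hF a₁ < PMF.normalize f h0 hF a₂ ↔ f a₁ < f a₂ := by
  sorry -- the shared `(∑' x, f x)⁻¹` factor is positive and finite, so it cancels
```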
Vacuous-zero cross-base inequality: when the LHS normalize base
vanishes at a (f a = 0) and the RHS does not (g a ≠ 0), the LHS
normalize value is 0 and the RHS is positive — so the inequality holds.
This is the workhorse for "speaker-at-w₁ assigns zero mass to utterance u
because u is inapplicable, while speaker-at-w₂ assigns positive mass" —
exactly the pattern that arises in unique-reference RSA findings (e.g.,
"green only applies to green_square, so L1(.green) prefers .green_square").
The two normalize bases f, g correspond to S1 at different worlds.
Cross-base equality: when two normalize bases agree at a AND have
the same total sum, the normalized values are equal at a.
This is the canonical discharge for "S1 prefers utterance u equally at
worlds w₁ and w₂ because the QUD partition makes the speaker insensitive
to the choice" — exactly the pattern that arises in equality findings
(e.g., ScontrasPearl surfAll, invHowMany, invAll cases).
Cross-base ≤ via partition monotonicity: when two normalize bases
agree at a (same numerator) and the LHS partition function dominates
(tsum f ≥ tsum g ⇒ LHS denominator larger ⇒ LHS quotient smaller), the
LHS normalized value is ≤ the RHS.
Useful for "S1 at world w₁ has same score for u as at w₂, but the
partition function at w₁ is larger because of an asymmetric distractor"
— canonical for ScontrasPearl invNone and similar rpow-monotonicity-
driven findings.
Strict cross-base via partition strict-monotonicity: when numerators
agree (f a = g a), the shared numerator is positive (g a ≠ 0, ≠ ⊤),
and the LHS partition strictly dominates (tsum g < tsum f), then
normalize f a < normalize g a.
Strict companion of normalize_le_of_apply_eq_and_sum_ge. The positivity
hypothesis on g a is required for the cancellation to be strict — if
g a = 0 both sides would equal 0 and the inequality wouldn't hold.
bindOnSupport collapses to a 2-element sum when the prior PMF's
support is contained in {a₁, a₂}. Each retained term p a_i * f a_i h b
uses the corresponding non-zero a_i ∈ p.support witness via dif_neg.
Useful for sparse PMF.bindOnSupport patterns (e.g., observation kernels
with 1-2 reachable obs per condition). Generalises trivially to k-element
support via repeated Finset.sum_insert.
Reweight a PMF by a non-negative weight function and renormalize.
The pointwise product p · w must have non-zero finite total mass —
the natural precondition for PMF.normalize.
This is the algebraic primitive that posterior and productOfExperts
both factor through: posterior takes w := κ · b (the kernel slice at
an observation), PoE takes w := q · (the second PMF).
Equations
- p.reweight w h_pos h_fin = PMF.normalize (fun (a : α) => p a * w a) h_pos h_fin
Instances For
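Unfolding `reweight` through `PMF.normalize`, the pointwise value should be the familiar ratio. A sketch, assuming the `PMF.reweight` declaration above (the corresponding lemma name in the file may differ):

```lean
import Mathlib

open scoped ENNReal

variable {α : Type*}

example (p : PMF α) (w : α → ℝ≥0∞)
    (h_pos : ∑' a, p a * w a ≠ 0) (h_fin : ∑' a, p a * w a ≠ ∞) (a : α) :
    p.reweight w h_pos h_fin a = p a * w a / ∑' a', p a' * w a' := by
  sorry -- unfolds via `PMF.normalize_apply`; in `ℝ≥0∞`, `x / y = x * y⁻¹`
```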
Inequality decomposition for PMF.reweight: the same denominator-
cancellation pattern, lifted via reweight = normalize ∘ (· * w). Comparing
two reweighted values reduces to comparing the unnormalised products.
The ≤ companion of reweight_lt_iff_lt.
Bayesian posterior: for an observation b, the conditional
distribution over α. Well-defined when the marginal at b is
non-zero. The ≠ ∞ hypothesis is supplied automatically (the marginal
is bounded above by 1).
Equations
- PMF.posterior κ μ b h = μ.reweight (fun (a : α) => (κ a) b) h ⋯
Instances For
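Putting the pieces together, the explicit Bayes formula (`posterior_apply`) should read as follows (a sketch; the argument layout is an assumption):

```lean
import Mathlib

open scoped ENNReal

variable {α β : Type*} (κ : α → PMF β) (μ : PMF α) (b : β)

example (h : PMF.marginal κ μ b ≠ 0) (a : α) :
    PMF.posterior κ μ b h a = μ a * κ a b / ∑' a', μ a' * κ a' b := by
  sorry -- the file's `posterior_apply`
```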
The posterior is supported on the intersection of the prior's support
and the kernel's support at the observed b.
Inequality decomposition: posterior comparison reduces to score comparison.
The key lemma for proving inequalities about Bayesian posteriors structurally —
two posteriors at observation b agree on which world has more probability iff
the unnormalized scores μ · κ agree.
This is the inequality-side counterpart to posterior_apply: the latter says
what the posterior value is; this says how to compare two posterior values
without computing the marginal denominator (it cancels).
Mathlib precedent: similar to Finset.sum_lt_sum_iff_of_le style — the shared
denominator/normalizer cancels in the comparison.
The ≤ companion of posterior_lt_iff_score_lt.
Set-version of posterior_lt_iff_score_lt: comparing the
outer-measure-of-Finset values of a posterior at two Finsets reduces to
comparing the corresponding conditional joint sums. The shared
(marginal κ μ b)⁻¹ factor cancels.
Foundation lemma for "L1 prefers worlds-in-A over worlds-in-B" in RSA
findings — e.g. (L1 u).toOuterMeasure {w | w.1 = .person} > (L1 u).toOuterMeasure {w | w.1 = .whale} reduces to comparing summed
conditional joint masses on the two cat-fibres.
posterior_lt_iff_score_lt is the singleton case (A = {a₁}, B = {a₂}).
The ≤ companion of posterior_toOuterMeasure_lt_iff_finset_score_lt.
Chained-posterior decomposition (PMF analogue of mathlib's
Mathlib.Probability.Kernel.Posterior.posterior_comp): comparing two
sequentially-updated posteriors posterior κ₂ (posterior κ₁ μ b₁) b₂ at
two different points reduces to comparing products of unnormalised scores
μ a · κ₁ a b₁ · κ₂ a b₂.
Both intermediate normalisations cancel: the outer marginal cancels via
posterior_lt_iff_score_lt; the inner marginal factors out as a shared
(marginal κ₁ μ b₁)⁻¹ on both sides of the inequality and cancels via
mul_lt_mul_iff_left. The result is "Bayes' rule for two conditionally-
independent observations sharing a prior".
Used in chained-RSA models like @cite{nouwen-2024}'s intersective
intensifier inference (paper Eq. 73): first update Π = stage-1 L1, then
second update against Π. Mathlib has the Kernel-categorical version
(posterior_comp); this is the discrete-PMF analogue.
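The chained-update comparison described above, as a statement sketch (hypothesis names and statement shape are assumptions reconstructed from the prose):

```lean
import Mathlib

open scoped ENNReal

variable {α β γ : Type*} (κ₁ : α → PMF β) (κ₂ : α → PMF γ) (μ : PMF α)

-- Two sequential updates compare via triple products of unnormalised scores;
-- both intermediate normalisers cancel.
example (b₁ : β) (b₂ : γ) (h₁ : PMF.marginal κ₁ μ b₁ ≠ 0)
    (h₂ : PMF.marginal κ₂ (PMF.posterior κ₁ μ b₁ h₁) b₂ ≠ 0) (a₁ a₂ : α) :
    PMF.posterior κ₂ (PMF.posterior κ₁ μ b₁ h₁) b₂ h₂ a₁
      < PMF.posterior κ₂ (PMF.posterior κ₁ μ b₁ h₁) b₂ h₂ a₂
    ↔ μ a₁ * κ₁ a₁ b₁ * κ₂ a₁ b₂ < μ a₂ * κ₁ a₂ b₁ * κ₂ a₂ b₂ := by
  sorry
```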
Marginal monotonicity (≤): if κ₁ a b ≤ κ₂ a b pointwise, then
marginal κ₁ μ b ≤ marginal κ₂ μ b. The prior μ is the same on both sides;
multiplying by it preserves the pointwise inequality, and tsum is monotone.
Foundation lemma for cross-utterance / cross-kernel marginal comparisons.
Marginal monotonicity (<): if κ₁ a b ≤ κ₂ a b pointwise and the
inequality is strict at some a₀ with positive prior mass, then
marginal κ₁ μ b < marginal κ₂ μ b strictly. Lifts ENNReal.tsum_lt_tsum
with the prior multiplier supplying both directions of the cancellation.
Use case: "speaker assigns higher probability to u at world w₀ (and no
less anywhere else) — therefore the marginal probability of utterance u
strictly increases."
Bind monotonicity (<) — alias of marginal_lt_marginal exposing the
mathlib monadic shape (μ.bind f) b for the goal. Definitionally identical
to marginal_lt_marginal after marginal was made an abbrev for the bind
form, but kept as a named alias for consumers that already reach for it.
Bind monotonicity (≤) — companion alias of marginal_le_marginal.
Posterior comparison under uniform prior: the workhorse for any RSA
model with a uniform world prior. The shared prior factor (card α)⁻¹ is
positive and finite, so it cancels — leaving a pure kernel comparison.
This is the right starting move for "L1 prefers world a₂ over world a₁
after observing b" claims when the prior is PMF.uniformOfFintype α.
The ≤ companion of posterior_lt_iff_kernel_lt_of_uniform. Required for
"negative" findings of the form ¬ L1 a₁ > L1 a₂ (canceled implicatures),
which reduce via not_lt to L1 a₁ ≤ L1 a₂.
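The uniform-prior cancellation as a statement sketch (the statement shape is an assumption; `PMF.uniformOfFintype` is mathlib's uniform distribution on a nonempty `Fintype`):

```lean
import Mathlib

open scoped ENNReal

variable {α β : Type*} [Fintype α] [Nonempty α] (κ : α → PMF β) (b : β) (a₁ a₂ : α)

-- With `μ = uniformOfFintype α`, the shared `(card α)⁻¹` prior factor is
-- positive and finite, so it cancels, leaving a pure kernel comparison.
example (h : PMF.marginal κ (PMF.uniformOfFintype α) b ≠ 0) :
    PMF.posterior κ (PMF.uniformOfFintype α) b h a₁
      < PMF.posterior κ (PMF.uniformOfFintype α) b h a₂
    ↔ κ a₁ b < κ a₂ b := by
  sorry
```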
Forward direction of posterior_cross_obs_lt_iff (convenience for the
common case where consumers have the cross-multiplied score inequality).
Outer-measure bounds #
PMF.toOuterMeasure is bounded by 1 on every set, with strict inequality
characterised by support membership. These lemmas package the
pos_iff_ne_zero + toOuterMeasure_apply_eq_zero_iff pattern that arises
whenever a posterior measure is shown to be intermediate (0 < · < 1) —
the structural form of "borderline" in probabilistic vagueness models.
PMF.toOuterMeasure of any set is at most 1. The total mass is 1,
and the indicator restriction is dominated pointwise by the PMF.
A Finset partial sum of a PMF is at most 1. Specialisation of
tsum_coe = 1 to a finite subset of the support — the discrete analogue
of "the integral of a probability density over any set is ≤ 1".
Finite-disjoint additivity bound for PMF.toOuterMeasure.
For a Finset-indexed family of pairwise-disjoint sets, the sum of their
PMF measures is at most 1.
Proof via the indicator decomposition: each p.toOuterMeasure (f i) is
∑' x, (f i).indicator p x (mathlib's toOuterMeasure_apply). Swap finset
sum and tsum (Summable.tsum_finsetSum, with ENNReal.summable discharging
the summability hypothesis). Per-x, the inner sum bound uses disjointness:
at most one i ∈ s has x ∈ f i, so ∑ i ∈ s, (f i).indicator p x ≤ p x.
Summing pointwise: ∑' x, (...) ≤ ∑' x, p x = 1 (PMF.tsum_coe).
Used by interval-additive sorites bounds in
Phenomena/Gradability/Studies/LassiterGoodman2017PMF.lean (Eq. 37).
General enough that any disjoint-events probability bound consumer
can use it.
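The finite-disjoint additivity bound as a statement sketch (the statement shape is an assumption reconstructed from the proof description above):

```lean
import Mathlib

open scoped ENNReal

variable {α ι : Type*}

-- Pairwise-disjoint events: their PMF outer measures sum to at most 1.
example (p : PMF α) (s : Finset ι) (f : ι → Set α)
    (hdisj : (s : Set ι).PairwiseDisjoint f) :
    ∑ i ∈ s, p.toOuterMeasure (f i) ≤ 1 := by
  sorry -- via the indicator decomposition: at most one `i ∈ s` has `x ∈ f i`
```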
Combined intermediacy: when the support straddles s (witnesses both
in and out), the outer measure is strictly between 0 and 1.
Posterior degeneracy #
The Bayesian posterior concentrates on a single point when the kernel and prior conspire to leave only one positive-score world. Conversely, when the prior assigns equal mass to two worlds, the posterior ordering tracks the kernel ordering.
Posterior concentrates on a singleton score-support: if every
a' ≠ a_unique has either zero prior or zero kernel value at b, then
posterior κ μ b a_unique = 1. The deterministic limit of Bayesian
update — full information transmission.
Posterior collapses to pure at deterministic observation: when only
one prior-supported world has positive kernel mass at b, the posterior is
exactly PMF.pure a_unique. Strengthens
posterior_eq_one_of_singleton_score_support from a single apply value to
full PMF equality.
Posterior order tracks kernel order at equal prior: when
μ a₁ = μ a₂ and the prior is positive there, the posterior ranks a₁ < a₂ exactly when the kernel does. Generalises both
"pragmatic strengthening with a uniform-equivalent prior" and
"threshold-posterior informativeness with a uniform threshold prior".