Shannon entropy on PMF #
Entropy / mutual information / conditional entropy / KL divergence / JSD as
methods on mathlib's PMF α (gated by [Fintype α]).
This file makes PMF the canonical distribution type for Shannon-entropy
operations across linglib. After the unification refactor (CHANGELOG entry
forthcoming), consumers migrate from

    Core.InformationTheory.entropy Finset.univ p  -- old shape (private after Phase 6)

to

    p.entropy  -- new shape (PMF α)
The Core.InformationTheory module is reduced to non-Shannon-entropy
primitives (ΔP, structural lemmas about negMulLog/klFun/Hellinger). The
PMF-canonical shape is what RSA pragmatic operators (L0OfMeaning /
S1Belief / L1) already use as their normalized output type, and is the
natural target type for ParadigmSystem and the Rathi LearnerModel
(both refactored in the same migration).
Public API #
| Operator | Signature |
|---|---|
| PMF.entropy | (p : PMF α) → ℝ — H(p) = -∑ p log p |
| PMF.mutualInformation | (joint : PMF (α × β)) → ℝ — I(X;Y) from joint |
| PMF.conditionalEntropy | (joint : PMF (α × β)) → ℝ — H(β \| α) from joint |
| PMF.klDiv | (p q : PMF α) [MeasurableSpace α] → ℝ≥0∞ — KL(p ‖ q), grounded in mathlib's InformationTheory.klDiv via PMF.toMeasure |
| PMF.jsd | (p q : PMF α) → ℝ — Jensen-Shannon divergence |
| PMF.toRealFn | (p : PMF α) → α → ℝ — ℝ≥0∞ → ℝ coercion |
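A minimal consumer sketch of the table above, assuming this file's context (its imports and open scoped ENNReal). PMF.uniformOfFintype is mathlib's uniform distribution, and Bool carries mathlib's discrete measurable-space instance.

```lean
-- Entropy of a fair three-way choice (≈ log 3 nats).
example : ℝ := (PMF.uniformOfFintype (Fin 3)).entropy

-- KL divergence is ℝ≥0∞-valued; Bool has the discrete σ-algebra in mathlib.
example (p q : PMF Bool) : ℝ≥0∞ := p.klDiv q

-- Consumers that need an ℝ value take toReal, per the no-backcompat note below.
example (p q : PMF Bool) : ℝ := (p.klDiv q).toReal
```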
Implementation #
Each operator is defined as noncomputable def (uses Real.log via
Real.negMulLog, plus ENNReal.toReal). Marginals are derived via
PMF.map Prod.fst/snd so the joint-distribution shape is canonical.
The proofs delegate to Core.InformationTheory lemmas via the coercion
PMF.toRealFn : PMF α → α → ℝ. The bridge is purely definitional
(rfl-provable). Phase 6 of the unification refactor keeps the
Core.InformationTheory.entropy-family operators as private helpers
for these delegations rather than as a public API.
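For concreteness, the canonical marginal shape described above looks like this (a sketch assuming this file's context; only mathlib's PMF.map is used):

```lean
-- Marginal of the first coordinate of a joint PMF, via map on Prod.fst.
example (joint : PMF (Bool × Bool)) : PMF Bool := joint.map Prod.fst

-- Marginal of the second coordinate, via map on Prod.snd.
example (joint : PMF (Bool × Bool)) : PMF Bool := joint.map Prod.snd
```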
Cover-Thomas anchoring #
Foundational properties match the Cover-Thomas (2006) information-theory
text. entropy_nonneg (CT 2.6.4) is proved directly here.
mutualInformation_nonneg (CT 2.6.5) and conditionalEntropy_le_entropy
(CT 2.6.4) are also stated in this file; the marginal-from-joint structural
lemmas they rest on (PMF.map_apply on Prod.fst/snd) land in Phase 4 of the
refactor (cross-paper bridges) along with the substrate-level theorems that
consume them.
All proofs in this file are structural. No sorry, no native_decide.
Shannon entropy of p (in nats): H(p) = -∑ p log p = ∑ negMulLog p.
Defined directly on PMF α (not via delegation to
Core.InformationTheory.entropy); the equivalence is the bridge
theorem entropy_eq_sum_negMulLog_toReal. This independence lets the
PMF entropy substrate stand on its own (mathlib pattern: each operator
at its natural type, no cross-namespace delegation).
Equations
- p.entropy = ∑ a : α, (p a).toReal.negMulLog
Instances For
Entropy is non-negative (Cover-Thomas 2.6.4). Direct proof via
Real.negMulLog_nonneg applied pointwise: each summand is ≥ 0 since
(p a).toReal ∈ [0, 1].
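A sketch of both facts in use (the rfl step is exactly the Equations entry above; the precise binder shape of entropy_nonneg is an assumption):

```lean
-- The Equations entry is definitional, so rfl closes it.
example {α : Type*} [Fintype α] (p : PMF α) :
    p.entropy = ∑ a : α, (p a).toReal.negMulLog := rfl

-- Non-negativity (CT 2.6.4), via the pointwise negMulLog bound described above.
example {α : Type*} [Fintype α] (p : PMF α) : 0 ≤ p.entropy :=
  p.entropy_nonneg
```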
Product PMF #
PMF.product P Q is the independent joint distribution: (P.product Q) (a, b) = P a · Q b. It is the foundation for defining mutualInformation as KL(joint ‖ product).
Mathlib gap. PMF.product is missing from mathlib. Defined here via the
standard monadic bind/map construction.
The independent product distribution of two PMFs.
Equations
- P.product Q = P.bind fun (a : α) => PMF.map (Prod.mk a) Q
Instances For
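A usage sketch under the definition above; the pointwise equation is the one documented in this section.

```lean
-- Independent joint of two coins.
example (P Q : PMF Bool) : PMF (Bool × Bool) := P.product Q

-- The documented pointwise equation, stated as a Prop (proved by bind/map
-- unfolding in this file).
example (P Q : PMF Bool) (a b : Bool) : Prop :=
  (P.product Q) (a, b) = P a * Q b
```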
KL divergence on PMFs: rfl-bridge to mathlib #
PMF.klDiv is defined as the mathlib measure-theoretic
InformationTheory.klDiv applied to PMF.toMeasure. The bridge is
rfl-provable — drift is structurally impossible. If mathlib refactors
klDiv, our def follows automatically.
The return type ℝ≥0∞ matches mathlib's. Non-negativity is by type,
not by theorem — no klDiv_nonneg lemma needed. All ~12 mathlib klDiv
theorems (klDiv_self, klDiv_zero_left, klDiv_eq_zero_iff,
klDiv_eq_top_iff, klDiv_ne_top, ...) become available on PMF.klDiv
via the rfl-bridge with no further proof obligation.
There is no PMF.klDivergence (ℝ-returning) backwards-compatibility
form. Per linglib's no-backcompat-shims discipline (MEMORY:
feedback_no_backcompat), the previous discrete-sum klDivergence was
deleted to force consumers onto the single canonical API. Consumers that
need an ℝ value use (p.klDiv q).toReal. The discrete-sum form (when
needed for proof manipulation) is the theorem klDiv_eq_sum_klFun.
KL divergence on PMFs, grounded in mathlib's
InformationTheory.klDiv. Returns ℝ≥0∞ (non-negative by type;
∞ for non-abs-continuous case). Inherits all mathlib klDiv
theorems via the rfl-bridge. Requires [MeasurableSpace α]; for
[Fintype α], equip with the discrete ⊤ measurable space.
Equations
- p.klDiv q = InformationTheory.klDiv p.toMeasure q.toMeasure
Instances For
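A sketch of the docstring's advice for the [Fintype α] case: equip the discrete (⊤) measurable space locally, then apply the operator.

```lean
-- letI installs the ⊤ σ-algebra so the [MeasurableSpace α] argument resolves.
example {α : Type*} [Fintype α] (p q : PMF α) : ℝ≥0∞ :=
  letI : MeasurableSpace α := ⊤
  p.klDiv q
```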
Mathlib-inherited theorems #
The following theorems are direct corollaries of mathlib's klDiv API on
the toMeasure-coerced PMFs. They're stated here as 1-line wrappers for
ergonomic consumer access (p.klDiv_self reads better than
_root_.InformationTheory.klDiv_self p.toMeasure).
Mathlib klDiv_self: KL divergence of a PMF against itself is 0.
KL divergence is ∞ exactly when p is not absolutely continuous wrt q
or llr is not integrable. Inherited from mathlib's klDiv_eq_top_iff.
KL divergence is finite iff p ≪ q and llr is integrable.
Inherited from mathlib's klDiv_ne_top_iff.
Sufficient condition for p.klDiv q ≠ ∞. Inherited from mathlib's
klDiv_ne_top.
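For instance, the self-divergence wrapper reads as advertised (its exact binder shape is an assumption):

```lean
-- Dot-notation access to the inherited fact KL(p ‖ p) = 0.
example {α : Type*} [MeasurableSpace α] (p : PMF α) : p.klDiv p = 0 :=
  p.klDiv_self
```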
Discrete-sum reduction theorem (klDiv_eq_sum_klFun):
for [Fintype α] PMFs, the mathlib-grounded klDiv reduces to a
discrete sum (when p ≪ q; the value is ∞ otherwise).
p.klDiv q = ∑ a, q a * ENNReal.ofReal (klFun ((p a / q a).toReal))
Proof structure: bridge p.toMeasure = q.toMeasure.withDensity (p/q)
via Measure.ext + lintegral_fintype + per-atom ENNReal cancellation
(p a / q a) * q a = p a (which holds under singleton-AC); from there
Measure.rnDeriv_withDensity gives the rnDeriv ae-equality, and
klDiv_eq_lintegral_klFun_of_ac plus lintegral_fintype close the
sum reduction.
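The right-hand side type-checks as stated; a Prop-level sketch (the exact absolute-continuity hypothesis that klDiv_eq_sum_klFun takes is not fixed by this summary and is omitted here):

```lean
-- The documented discrete-sum shape, stated as a Prop.
example {α : Type*} [Fintype α] [MeasurableSpace α] (p q : PMF α) : Prop :=
  p.klDiv q =
    ∑ a, q a * ENNReal.ofReal (InformationTheory.klFun ((p a / q a).toReal))
```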
Midpoint distribution + KL-based JSD #
Mathlib-style definition: JSD(p, q) := (KL(p ‖ m) + KL(q ‖ m))/2 where
m = (p + q)/2. The classical H(m) − (H(p) + H(q))/2 formula becomes
a derived theorem (jsd_eq_entropy_form).
Reuse payoff: jsd_nonneg reduces to ENNReal.toReal_nonneg on the KL sum
(the final division by 2 preserves non-negativity); jsd_symm reduces to
commutativity of the KL sum plus symmetry of midPMF.
The 1/2-mixture distribution: (midPMF p q) a = (p a + q a) / 2.
Equations
- p.midPMF q = PMF.ofFintype (fun (a : α) => (p a + q a) / 2) ⋯
Instances For
JSD is non-negative — free from ENNReal.toReal_nonneg.
JSD is symmetric — JSD(p, q) = JSD(q, p), free from midPMF_symm.
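A usage sketch (instance arguments are assumptions; jsd inherits klDiv's [MeasurableSpace α] requirement and midPMF's [Fintype α]):

```lean
-- JSD is ℝ-valued.
example {α : Type*} [Fintype α] [MeasurableSpace α] (p q : PMF α) : ℝ :=
  p.jsd q

-- Symmetry via the lemma above (argument shape assumed).
example {α : Type*} [Fintype α] [MeasurableSpace α] (p q : PMF α) :
    p.jsd q = q.jsd p :=
  p.jsd_symm q
```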
Hellinger family on PMF #
@cite{herbstritt-franke-2019} use Hellinger distance as an alternative to KL divergence in RSA speaker utilities, because Hellinger remains finite when literal interpretations assign zero probability to states the speaker considers possible (whereas KL diverges).
Defined directly on PMFs (no Core.InformationTheory delegation). Mathlib
gap — these specializations to PMF are upstreamable.
Bhattacharyya coefficient BC(P, Q) = ∑ √(P · Q).
Equations
- P.bhattacharyyaCoeff Q = ∑ a : α, √(P.toRealFn a * Q.toRealFn a)
Instances For
Squared Hellinger distance H²(P, Q) = 1 - BC(P, Q).
Equations
- P.hellingerDistSq Q = 1 - P.bhattacharyyaCoeff Q
Instances For
Hellinger distance HD(P, Q) = √H²(P, Q). A proper metric on PMFs,
bounded in [0, 1].
Equations
- P.hellingerDist Q = √(P.hellingerDistSq Q)
Instances For
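A sketch chaining the three operators; the middle equation is definitional per its Equations entry.

```lean
-- BC, then H² = 1 − BC, then HD = √H², on one pair of PMFs.
example {α : Type*} [Fintype α] (P Q : PMF α) : ℝ × ℝ × ℝ :=
  (P.bhattacharyyaCoeff Q, P.hellingerDistSq Q, P.hellingerDist Q)

-- H² unfolds by rfl.
example {α : Type*} [Fintype α] (P Q : PMF α) :
    P.hellingerDistSq Q = 1 - P.bhattacharyyaCoeff Q := rfl
```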
Bretagnolle–Huber substrate (private helpers) #
Pure real-arithmetic helpers for the BH inequality 2 · H²(P, Q) ≤ KL(P ‖ Q).
Inlined here (not in Core.InformationTheory) so that Entropy is
self-contained against Core deletion.
Discrete log-ratio sum form of klDiv:
(P.klDiv Q).toReal = ∑ a, P a · log (P a / Q a) under strict-positive Q.
This is the consumer-facing discrete sum that downstream files want
(Cover-Thomas Eq. 2.23). Self-contained — uses only mathlib's klFun
plus klDiv_eq_sum_klFun. Independent of Core.InformationTheory.klFinite.
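Worked instance: for Bernoulli PMFs P = (p, 1 − p) and Q = (q, 1 − q) with 0 < q < 1, the sum specializes to p · log (p / q) + (1 − p) · log ((1 − p) / (1 − q)), the familiar binary KL divergence.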
MI as KL of joint vs product #
Mathlib-style definition: I(X;Y) := KL(joint ‖ marginal_X × marginal_Y).toReal.
The classical H(X) + H(Y) − H(X,Y) formula becomes a derived theorem
(Cover-Thomas Thm 2.6.5, mutualInformation_eq_entropy_sum).
Reuse payoff: mutualInformation_nonneg reduces to ENNReal.toReal_nonneg.
Future klDiv chain rules from mathlib transfer to MI for free.
Mutual information I(X;Y) := KL(joint ‖ marginal_X × marginal_Y).toReal,
grounded in PMF.klDiv (rfl-bridged to mathlib's InformationTheory.klDiv).
Equations
- joint.mutualInformation = (joint.klDiv (joint.fst.product joint.snd)).toReal
Instances For
Mutual information is non-negative (Cover-Thomas 2.6.5):
free from ENNReal.toReal_nonneg since klDiv returns ℝ≥0∞.
Cover-Thomas Thm 2.6.5 bridge: I(X;Y) = H(X) + H(Y) − H(X,Y).
Connects the KL-based definition (this file) to the classical
H-difference form.
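A consumer sketch of the bridge (instance arguments and the fst/snd marginal names follow the surrounding definitions; the exact signature is an assumption):

```lean
-- I(X;Y) rewritten into the classical H-difference form.
example {α β : Type*} [Fintype α] [Fintype β] [MeasurableSpace (α × β)]
    (joint : PMF (α × β)) :
    joint.mutualInformation =
      joint.fst.entropy + joint.snd.entropy - joint.entropy :=
  joint.mutualInformation_eq_entropy_sum
```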
Conditional entropy H(β | α) = H(α, β) - H(α).
Equations
- joint.conditionalEntropy = joint.entropy - joint.fst.entropy
Instances For
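Per the Equations entry, the chain-rule shape is definitional:

```lean
-- H(β | α) = H(α, β) − H(α), by rfl on the definition above.
example {α β : Type*} [Fintype α] [Fintype β] (joint : PMF (α × β)) :
    joint.conditionalEntropy = joint.entropy - joint.fst.entropy := rfl
```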
Conditioning reduces entropy (Cover-Thomas 2.6.4): H(β | α) ≤ H(β).
Direct corollary of mutualInformation_nonneg (free from
ENNReal.toReal_nonneg) plus the H-difference bridge
mutualInformation_eq_entropy_sum.
Bretagnolle–Huber inequality on PMFs: 2 · H²(P, Q) ≤ KL(P ‖ Q).
Stated against the mathlib-grounded PMF.klDiv (returns ℝ≥0∞); the
2 * hellingerDistSq factor is wrapped via ENNReal.ofReal for type
compatibility.
Proof: combines klDiv_eq_sum_klFun (discrete reduction) with the
sqrt-klFun pointwise inequality and the Hellinger bridge
(√p − √q)² = q · (√(p/q) − 1)². Self-contained — no Core deps.
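Prop-level sketch of the statement shape (the theorem's name and hypotheses are not fixed by this summary; the ENNReal.ofReal wrapping is the documented one):

```lean
-- 2 · H²(P, Q) ≤ KL(P ‖ Q), with the left side lifted into ℝ≥0∞.
example {α : Type*} [Fintype α] [MeasurableSpace α] (P Q : PMF α) : Prop :=
  ENNReal.ofReal (2 * P.hellingerDistSq Q) ≤ P.klDiv Q
```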