Shannon entropy on PMF #
Entropy / mutual information / conditional entropy / KL divergence / JSD as
methods on mathlib's PMF α (gated by [Fintype α]).
This file makes PMF the canonical distribution type for Shannon-entropy
operations across linglib. After the unification refactor (CHANGELOG entry
forthcoming), consumers migrate from

    Core.InformationTheory.entropy Finset.univ p  -- old shape (private after Phase 6)

to

    p.entropy  -- new shape (PMF α)
The Core.InformationTheory module is reduced to non-Shannon-entropy
primitives (ΔP, structural lemmas about negMulLog/klFun/Hellinger). The
PMF-canonical shape is what RSA pragmatic operators (L0OfMeaning /
S1Belief / L1) already use as their normalized output type, and is the
natural target type for ParadigmSystem and the Rathi LearnerModel
(both refactored in the same migration).
Public API #
| Operator | Signature |
|---|---|
| PMF.entropy | (p : PMF α) → ℝ — H(p) = -∑ p log p |
| PMF.mutualInformation | (joint : PMF (α × β)) → ℝ — I(X;Y) from joint |
| PMF.conditionalEntropy | (joint : PMF (α × β)) → ℝ — H(β \| α) from joint |
| PMF.klDiv | (p q : PMF α) [MeasurableSpace α] → ℝ≥0∞ — KL(p ‖ q), grounded in mathlib's InformationTheory.klDiv via PMF.toMeasure |
| PMF.jsd | (p q : PMF α) → ℝ — Jensen-Shannon divergence |
| PMF.toRealFn | (p : PMF α) → α → ℝ — ℝ≥0∞ → ℝ coercion |
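A minimal consumer sketch of the table above, assuming this file's context (its imports and open scoped ENNReal). PMF.uniformOfFintype is mathlib's uniform distribution, and Bool carries mathlib's discrete measurable-space instance.

```lean
-- Entropy of a fair three-way choice (≈ log 3 nats).
example : ℝ := (PMF.uniformOfFintype (Fin 3)).entropy

-- KL divergence is ℝ≥0∞-valued; Bool has the discrete σ-algebra in mathlib.
example (p q : PMF Bool) : ℝ≥0∞ := p.klDiv q

-- Consumers that need an ℝ value take toReal, per the no-backcompat note below.
example (p q : PMF Bool) : ℝ := (p.klDiv q).toReal
```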
Implementation #
Each operator is defined as noncomputable def (uses Real.log via
Real.negMulLog, plus ENNReal.toReal). Marginals are derived via
PMF.map Prod.fst/snd so the joint-distribution shape is canonical.
The proofs delegate to Core.InformationTheory lemmas via the coercion
PMF.toRealFn : PMF α → α → ℝ. The bridge is purely definitional
(rfl-provable). Phase 6 of the unification refactor keeps the
Core.InformationTheory.entropy-family operators as private helpers
for these delegations rather than as a public API.
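For concreteness, the canonical marginal shape described above looks like this (a sketch assuming this file's context; only mathlib's PMF.map is used):

```lean
-- Marginal of the first coordinate of a joint PMF, via map on Prod.fst.
example (joint : PMF (Bool × Bool)) : PMF Bool := joint.map Prod.fst

-- Marginal of the second coordinate, via map on Prod.snd.
example (joint : PMF (Bool × Bool)) : PMF Bool := joint.map Prod.snd
```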
Cover-Thomas anchoring #
Foundational properties match the Cover-Thomas (2006) information-theory
text. entropy_nonneg (CT 2.6.4) is proved directly here.
mutualInformation_nonneg (CT 2.6.5) and conditionalEntropy_le_entropy
(CT 2.6.4) are also stated in this file; the marginal-from-joint structural
lemmas they rest on (PMF.map_apply on Prod.fst/snd) land in Phase 4 of the
refactor (cross-paper bridges) along with the substrate-level theorems that
consume them.
All proofs in this file are structural. No sorry, no native_decide.
Shannon entropy of p (in nats): H(p) = -∑ p log p = ∑ negMulLog p.
Defined directly on PMF α (not via delegation to
Core.InformationTheory.entropy); the equivalence is the bridge
theorem entropy_eq_sum_negMulLog_toReal. This independence lets the
PMF entropy substrate stand on its own (mathlib pattern: each operator
at its natural type, no cross-namespace delegation).
Equations
- p.entropy = ∑ a : α, (p a).toReal.negMulLog
Instances For
Entropy is non-negative (Cover-Thomas 2.6.4). Direct proof via
Real.negMulLog_nonneg applied pointwise: each summand is ≥ 0 since
(p a).toReal ∈ [0, 1].
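A sketch of both facts in use (the rfl step is exactly the Equations entry above; the precise binder shape of entropy_nonneg is an assumption):

```lean
-- The Equations entry is definitional, so rfl closes it.
example {α : Type*} [Fintype α] (p : PMF α) :
    p.entropy = ∑ a : α, (p a).toReal.negMulLog := rfl

-- Non-negativity (CT 2.6.4), via the pointwise negMulLog bound described above.
example {α : Type*} [Fintype α] (p : PMF α) : 0 ≤ p.entropy :=
  p.entropy_nonneg
```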
Product PMF #
PMF.product P Q is the independent joint distribution: (P.product Q) (a, b) = P a · Q b. It is the foundation for defining mutualInformation as KL(joint ‖ product).
Mathlib gap. PMF.product is missing from mathlib. Defined here via the
standard monadic bind/map construction.
The independent product distribution of two PMFs.
Equations
- P.product Q = P.bind fun (a : α) => PMF.map (Prod.mk a) Q
Instances For
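A usage sketch under the definition above; the pointwise equation is the one documented in this section.

```lean
-- Independent joint of two coins.
example (P Q : PMF Bool) : PMF (Bool × Bool) := P.product Q

-- The documented pointwise equation, stated as a Prop (proved by bind/map
-- unfolding in this file).
example (P Q : PMF Bool) (a b : Bool) : Prop :=
  (P.product Q) (a, b) = P a * Q b
```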
KL divergence on PMFs: rfl-bridge to mathlib #
PMF.klDiv is defined as the mathlib measure-theoretic
InformationTheory.klDiv applied to PMF.toMeasure. The bridge is
rfl-provable — drift is structurally impossible. If mathlib refactors
klDiv, our def follows automatically.
The return type ℝ≥0∞ matches mathlib's. Non-negativity is by type,
not by theorem — no klDiv_nonneg lemma needed. All ~12 mathlib klDiv
theorems (klDiv_self, klDiv_zero_left, klDiv_eq_zero_iff,
klDiv_eq_top_iff, klDiv_ne_top, ...) become available on PMF.klDiv
via the rfl-bridge with no further proof obligation.
There is no PMF.klDivergence (ℝ-returning) backwards-compatibility
form. Per linglib's no-backcompat-shims discipline (MEMORY:
feedback_no_backcompat), the previous discrete-sum klDivergence was
deleted to force consumers onto the single canonical API. Consumers that
need an ℝ value use (p.klDiv q).toReal. The discrete-sum form (when
needed for proof manipulation) is the theorem klDiv_eq_sum_klFun.
KL divergence on PMFs, grounded in mathlib's
InformationTheory.klDiv. Returns ℝ≥0∞ (non-negative by type;
∞ for non-abs-continuous case). Inherits all mathlib klDiv
theorems via the rfl-bridge. Requires [MeasurableSpace α]; for
[Fintype α], equip with the discrete ⊤ measurable space.
Equations
- p.klDiv q = InformationTheory.klDiv p.toMeasure q.toMeasure
Instances For
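A sketch of the docstring's advice for the [Fintype α] case: equip the discrete (⊤) measurable space locally, then apply the operator.

```lean
-- letI installs the ⊤ σ-algebra so the [MeasurableSpace α] argument resolves.
example {α : Type*} [Fintype α] (p q : PMF α) : ℝ≥0∞ :=
  letI : MeasurableSpace α := ⊤
  p.klDiv q
```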
Mathlib-inherited theorems #
The following theorems are direct corollaries of mathlib's klDiv API on
the toMeasure-coerced PMFs. They're stated here as 1-line wrappers for
ergonomic consumer access (p.klDiv_self reads better than
_root_.InformationTheory.klDiv_self p.toMeasure).
Mathlib klDiv_self: KL divergence of a PMF against itself is 0.
KL divergence is ∞ exactly when p is not absolutely continuous wrt q
or llr is not integrable. Inherited from mathlib's klDiv_eq_top_iff.
KL divergence is finite iff p ≪ q and llr is integrable.
Inherited from mathlib's klDiv_ne_top_iff.
Sufficient condition for p.klDiv q ≠ ∞. Inherited from mathlib's
klDiv_ne_top.
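For instance, the self-divergence wrapper reads as advertised (its exact binder shape is an assumption):

```lean
-- Dot-notation access to the inherited fact KL(p ‖ p) = 0.
example {α : Type*} [MeasurableSpace α] (p : PMF α) : p.klDiv p = 0 :=
  p.klDiv_self
```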
Discrete-sum reduction theorem (klDiv_eq_sum_klFun):
for [Fintype α] PMFs, the mathlib-grounded klDiv reduces to a
discrete sum (when p ≪ q; the value is ∞ otherwise).
p.klDiv q = ∑ a, q a * ENNReal.ofReal (klFun ((p a / q a).toReal))
Proof structure: bridge p.toMeasure = q.toMeasure.withDensity (p/q)
via Measure.ext + lintegral_fintype + per-atom ENNReal cancellation
(p a / q a) * q a = p a (which holds under singleton-AC); from there
Measure.rnDeriv_withDensity gives the rnDeriv ae-equality, and
klDiv_eq_lintegral_klFun_of_ac plus lintegral_fintype close the
sum reduction.
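The right-hand side type-checks as stated; a Prop-level sketch (the exact absolute-continuity hypothesis that klDiv_eq_sum_klFun takes is not fixed by this summary and is omitted here):

```lean
-- The documented discrete-sum shape, stated as a Prop.
example {α : Type*} [Fintype α] [MeasurableSpace α] (p q : PMF α) : Prop :=
  p.klDiv q =
    ∑ a, q a * ENNReal.ofReal (InformationTheory.klFun ((p a / q a).toReal))
```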
Midpoint distribution + KL-based JSD #
Mathlib-style definition: JSD(p, q) := (KL(p ‖ m) + KL(q ‖ m))/2 where
m = (p + q)/2. The classical H(m) − (H(p) + H(q))/2 formula becomes
a derived theorem (jsd_eq_entropy_form).
Reuse payoff: jsd_nonneg reduces to ENNReal.toReal_nonneg on the KL sum
(the final division by 2 preserves non-negativity); jsd_symm reduces to
commutativity of the KL sum plus symmetry of midPMF.
The 1/2-mixture distribution: (midPMF p q) a = (p a + q a) / 2.
Equations
- p.midPMF q = PMF.ofFintype (fun (a : α) => (p a + q a) / 2) ⋯
Instances For
JSD is non-negative — free from ENNReal.toReal_nonneg.
JSD is symmetric — JSD(p, q) = JSD(q, p), free from midPMF_symm.
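A usage sketch (instance arguments are assumptions; jsd inherits klDiv's [MeasurableSpace α] requirement and midPMF's [Fintype α]):

```lean
-- JSD is ℝ-valued.
example {α : Type*} [Fintype α] [MeasurableSpace α] (p q : PMF α) : ℝ :=
  p.jsd q

-- Symmetry via the lemma above (argument shape assumed).
example {α : Type*} [Fintype α] [MeasurableSpace α] (p q : PMF α) :
    p.jsd q = q.jsd p :=
  p.jsd_symm q
```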
Hellinger family on PMF #
@cite{herbstritt-franke-2019} use Hellinger distance as an alternative to KL divergence in RSA speaker utilities, because Hellinger remains finite when literal interpretations assign zero probability to states the speaker considers possible (whereas KL diverges).
Defined directly on PMFs (no Core.InformationTheory delegation). Mathlib
gap — these specializations to PMF are upstreamable.
Bhattacharyya coefficient BC(P, Q) = ∑ √(P · Q).
Equations
- P.bhattacharyyaCoeff Q = ∑ a : α, √(P.toRealFn a * Q.toRealFn a)
Instances For
Squared Hellinger distance H²(P, Q) = 1 - BC(P, Q).
Equations
- P.hellingerDistSq Q = 1 - P.bhattacharyyaCoeff Q
Instances For
Hellinger distance HD(P, Q) = √H²(P, Q). A proper metric on PMFs,
bounded in [0, 1].
Equations
- P.hellingerDist Q = √(P.hellingerDistSq Q)
Instances For
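A sketch chaining the three operators; the middle equation is definitional per its Equations entry.

```lean
-- BC, then H² = 1 − BC, then HD = √H², on one pair of PMFs.
example {α : Type*} [Fintype α] (P Q : PMF α) : ℝ × ℝ × ℝ :=
  (P.bhattacharyyaCoeff Q, P.hellingerDistSq Q, P.hellingerDist Q)

-- H² unfolds by rfl.
example {α : Type*} [Fintype α] (P Q : PMF α) :
    P.hellingerDistSq Q = 1 - P.bhattacharyyaCoeff Q := rfl
```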
Bretagnolle–Huber substrate (private helpers) #
Pure real-arithmetic helpers for the BH inequality 2 · H²(P, Q) ≤ KL(P ‖ Q).
Inlined here (not in Core.InformationTheory) so that Entropy is
self-contained against Core deletion.
Discrete log-ratio sum form of klDiv:
(P.klDiv Q).toReal = ∑ a, P a · log (P a / Q a) under strict-positive Q.
This is the consumer-facing discrete sum that downstream files want
(Cover-Thomas Eq. 2.23). Self-contained — uses only mathlib's klFun
plus klDiv_eq_sum_klFun. Independent of Core.InformationTheory.klFinite.
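Worked instance: for Bernoulli PMFs P = (p, 1 − p) and Q = (q, 1 − q) with 0 < q < 1, the sum specializes to p · log (p / q) + (1 − p) · log ((1 − p) / (1 − q)), the familiar binary KL divergence.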
MI as KL of joint vs product #
Mathlib-style definition: I(X;Y) := KL(joint ‖ marginal_X × marginal_Y).toReal.
The classical H(X) + H(Y) − H(X,Y) formula becomes a derived theorem
(Cover-Thomas Thm 2.6.5, mutualInformation_eq_entropy_sum).
Reuse payoff: mutualInformation_nonneg reduces to ENNReal.toReal_nonneg.
Future klDiv chain rules from mathlib transfer to MI for free.
Mutual information I(X;Y) := KL(joint ‖ marginal_X × marginal_Y).toReal,
grounded in PMF.klDiv (rfl-bridged to mathlib's InformationTheory.klDiv).
Equations
- joint.mutualInformation = (joint.klDiv (joint.fst.product joint.snd)).toReal
Instances For
Mutual information is non-negative (Cover-Thomas 2.6.5):
free from ENNReal.toReal_nonneg since klDiv returns ℝ≥0∞.
Cover-Thomas Thm 2.6.5 bridge: I(X;Y) = H(X) + H(Y) − H(X,Y).
Connects the KL-based definition (this file) to the classical
H-difference form.
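A consumer sketch of the bridge (instance arguments and the fst/snd marginal names follow the surrounding definitions; the exact signature is an assumption):

```lean
-- I(X;Y) rewritten into the classical H-difference form.
example {α β : Type*} [Fintype α] [Fintype β] [MeasurableSpace (α × β)]
    (joint : PMF (α × β)) :
    joint.mutualInformation =
      joint.fst.entropy + joint.snd.entropy - joint.entropy :=
  joint.mutualInformation_eq_entropy_sum
```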
Conditional entropy H(β | α) = H(α, β) - H(α).
Equations
- joint.conditionalEntropy = joint.entropy - joint.fst.entropy
Instances For
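Per the Equations entry, the chain-rule shape is definitional:

```lean
-- H(β | α) = H(α, β) − H(α), by rfl on the definition above.
example {α β : Type*} [Fintype α] [Fintype β] (joint : PMF (α × β)) :
    joint.conditionalEntropy = joint.entropy - joint.fst.entropy := rfl
```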
Conditioning reduces entropy (Cover-Thomas 2.6.4): H(β | α) ≤ H(β).
Direct corollary of mutualInformation_nonneg (free from
ENNReal.toReal_nonneg) plus the H-difference bridge
mutualInformation_eq_entropy_sum.
Bretagnolle–Huber inequality on PMFs: 2 · H²(P, Q) ≤ KL(P ‖ Q).
Stated against the mathlib-grounded PMF.klDiv (returns ℝ≥0∞); the
2 * hellingerDistSq factor is wrapped via ENNReal.ofReal for type
compatibility.
Proof: combines klDiv_eq_sum_klFun (discrete reduction) with the
sqrt-klFun pointwise inequality and the Hellinger bridge
(√p − √q)² = q · (√(p/q) − 1)². Self-contained — no Core deps.
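Prop-level sketch of the statement shape (the theorem's name and hypotheses are not fixed by this summary; the ENNReal.ofReal wrapping is the documented one):

```lean
-- 2 · H²(P, Q) ≤ KL(P ‖ Q), with the left side lifted into ℝ≥0∞.
example {α : Type*} [Fintype α] [MeasurableSpace α] (P Q : PMF α) : Prop :=
  ENNReal.ofReal (2 * P.hellingerDistSq Q) ≤ P.klDiv Q
```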