Blackwell comparison of experiments #

A statistical experiment is a Markov kernel P : Kernel Θ 𝓧 generating data in 𝓧 from a parameter in Θ. Experiment P is at least as informative as P' : Kernel Θ 𝓧' when P' can be recovered from P by Markov post-processing ("garbling"): P' = η ∘ₖ P for some Markov kernel η. This file develops that order and its characterization through Bayes risk.

[Bla53]'s comparison theorem states that P is at least as informative as P' if and only if, for every decision problem, the Bayes risk under P is no greater than under P'. We state and prove this equivalence over ProbabilityTheory.bayesRisk for finite spaces: the forward direction is the data-processing inequality, the converse the Blackwell–Sherman–Stein separation argument (see the implementation notes).

Main definitions #

Kernel.IsGarblingOf: P'.IsGarblingOf P means P' is a Markov garbling of P, i.e. P is at least as informative as P'. Reflexive (Kernel.IsGarblingOf.refl) and transitive (Kernel.IsGarblingOf.trans).
Kernel.BlackwellDominates: P.BlackwellDominates P' means P has Bayes risk no greater than P' for every decision problem and prior — the dual side of the Blackwell equivalence.

Main statements #

bayesRisk_le_of_isGarblingOf / blackwellDominates_of_isGarblingOf: if P' is a garbling of P, then P Blackwell-dominates P' (the data-processing direction).
isGarblingOf_of_blackwellDominates: conversely, if P Blackwell-dominates P', then P' is a garbling of P (the Blackwell–Sherman–Stein direction, finite case). Requires finite, nonempty Θ and that both P and P' are Markov kernels — false otherwise (see the theorem docstring for counterexamples).
isGarblingOf_iff_blackwellDominates: the two directions combined.

Implementation notes #

The development is stated entirely over Mathlib's Kernel and bayesRisk with no further dependencies, so it can serve as a Mathlib.Probability.Decision.Blackwell candidate. The finite, ℝ-valued eig / questionUtility view in Core.Probability.Decision.ExperimentDesign is a downstream consumer, bridged via ObservationModel.toPMFKernel.

Kernel.BlackwellDominates quantifies over all decision problems (every measurable action space 𝓨 and loss ℓ : Θ → 𝓨 → ℝ≥0∞) and priors: dominance for a single one does not force garbling. The action-space universe is pinned to u (the universe of 𝓧') because the converse proof instantiates the dominance hypothesis at the action space 𝓨 := 𝓧' (the identity estimator).

The converse proof encodes finite kernels as real vectors of singleton masses (encode) and realizes the garblings of P as a compact convex polytope garblingSet P (the linear image of the product of standard simplices). If encode P' lies outside the polytope, the geometric Hahn–Banach theorem (geometric_hahn_banach_point_closed) yields a separating functional f; its coordinate matrix a θ x' = f (Pi.single θ (Pi.single x' 1)), shifted by a constant C to be nonnegative, defines a loss ℓ θ x' = ENNReal.ofReal (a θ x' + C) on actions 𝓧'. Under the uniform prior, the Bayes risk of any experiment Q evaluated at the identity estimator equals ENNReal.ofReal (|Θ|⁻¹ · f (encode Q) + C): the identity estimator pins P' to f (encode P'), while every estimator for P produces a garbling, whose f-value exceeds the separation level. This realizes bayesRisk ℓ P' π < bayesRisk ℓ P π, contradicting the hypothesis — no minimax theorem is needed, the infimum is bounded below directly. All analytic inputs come from Mathlib (isCompact_stdSimplex, the geometric_hahn_banach_* lemmas, bayesRisk_fintype). The kernel-to-stochastic-matrix bridge (encode, garblingMap, buildKernel) is currently private proof scaffolding; it is a self-contained finite-kernel ↔ row-stochastic-matrix correspondence that would naturally graduate to its own public file when upstreamed.

References #

[Bla53]

source

def ProbabilityTheory.Kernel.IsGarblingOf {Θ : Type u_1} {𝓧 : Type u_2} {𝓧' : Type u} [mΘ : MeasurableSpace Θ] [m𝓧 : MeasurableSpace 𝓧] [m𝓧' : MeasurableSpace 𝓧'] (P' : Kernel Θ 𝓧') (P : Kernel Θ 𝓧) :

Prop

P' is a garbling of P (Blackwell): there is a Markov post-processing kernel η with P' = η ∘ₖ P. Read "P is at least as informative as P'".

Equations

P'.IsGarblingOf P = ∃ (η : ProbabilityTheory.Kernel 𝓧 𝓧'), ProbabilityTheory.IsMarkovKernel η ∧ P' = η.comp P

Instances For

source

theorem ProbabilityTheory.Kernel.IsGarblingOf.refl {Θ : Type u_1} {𝓧 : Type u_2} [mΘ : MeasurableSpace Θ] [m𝓧 : MeasurableSpace 𝓧] (P : Kernel Θ 𝓧) [IsMarkovKernel P] :

P.IsGarblingOf P

source

theorem ProbabilityTheory.Kernel.IsGarblingOf.trans {Θ : Type u_1} {𝓧 : Type u_2} {𝓧' : Type u} [mΘ : MeasurableSpace Θ] [m𝓧 : MeasurableSpace 𝓧] [m𝓧' : MeasurableSpace 𝓧'] {𝓧'' : Type u_3} [MeasurableSpace 𝓧''] {P'' : Kernel Θ 𝓧''} {P' : Kernel Θ 𝓧'} {P : Kernel Θ 𝓧} (h₂ : P''.IsGarblingOf P') (h₁ : P'.IsGarblingOf P) :

P''.IsGarblingOf P

source

def ProbabilityTheory.Kernel.BlackwellDominates {Θ : Type u_1} {𝓧 : Type u_2} {𝓧' : Type u} [mΘ : MeasurableSpace Θ] [m𝓧 : MeasurableSpace 𝓧] [m𝓧' : MeasurableSpace 𝓧'] (P : Kernel Θ 𝓧) (P' : Kernel Θ 𝓧') :

Prop

P Blackwell-dominates P': for every decision problem (action space 𝓨, loss ℓ) and prior π, the Bayes risk under P is no larger than under P'. The right-hand side of the Blackwell equivalence.

Equations

One or more equations did not get rendered due to their size.

Instances For

source

theorem ProbabilityTheory.bayesRisk_le_of_isGarblingOf {Θ : Type u_1} {𝓧 : Type u_2} {𝓧' : Type u} [mΘ : MeasurableSpace Θ] [m𝓧 : MeasurableSpace 𝓧] [m𝓧' : MeasurableSpace 𝓧'] {𝓨 : Type u} [MeasurableSpace 𝓨] (ℓ : Θ → 𝓨 → ENNReal) {P : Kernel Θ 𝓧} {P' : Kernel Θ 𝓧'} (h : P'.IsGarblingOf P) (π : MeasureTheory.Measure Θ) :

bayesRisk ℓ P π ≤ bayesRisk ℓ P' π

Easy direction (data-processing). If P' is a garbling of P, then for every decision problem the Bayes risk under P is no larger than under P': garbling the more-informative experiment cannot help. Specializes bayesRisk_le_bayesRisk_comp.

source

theorem ProbabilityTheory.blackwellDominates_of_isGarblingOf {Θ : Type u_1} {𝓧 : Type u_2} {𝓧' : Type u} [mΘ : MeasurableSpace Θ] [m𝓧 : MeasurableSpace 𝓧] [m𝓧' : MeasurableSpace 𝓧'] {P : Kernel Θ 𝓧} {P' : Kernel Θ 𝓧'} (h : P'.IsGarblingOf P) :

P.BlackwellDominates P'

Easy direction, bundled. A garbling of P is Blackwell-dominated by P.

The garbling polytope (finite case) #

Over finite spaces, the Markov garblings {η ∘ₖ P | η Markov} of P, encoded by their singleton masses as vectors in Θ → 𝓧' → ℝ, form a compact convex polytope garblingSet P. It is the linear image of the product of standard simplices — the stochastic matrices η — under garblingMap P. This is the geometric substrate for the Blackwell–Sherman–Stein converse: if encode P' lies outside the polytope, a separating functional realizes a decision problem on which P' is strictly worse than P.

source

theorem ProbabilityTheory.isGarblingOf_of_blackwellDominates {Θ : Type u_1} {𝓧 : Type u_2} {𝓧' : Type u} [mΘ : MeasurableSpace Θ] [m𝓧 : MeasurableSpace 𝓧] [m𝓧' : MeasurableSpace 𝓧'] [Fintype Θ] [Fintype 𝓧] [Fintype 𝓧'] [Nonempty Θ] [MeasurableSingletonClass Θ] [MeasurableSingletonClass 𝓧] [MeasurableSingletonClass 𝓧'] {P : Kernel Θ 𝓧} {P' : Kernel Θ 𝓧'} [IsMarkovKernel P] [IsMarkovKernel P'] (h : P.BlackwellDominates P') :

P'.IsGarblingOf P

Blackwell–Sherman–Stein converse (finite case). If P Blackwell-dominates P' (attains a Bayes risk no larger than P' for every decision problem and prior), then P' is a garbling of P.

Stated for finite parameter and sample spaces, with both experiments Markov kernels. All four hypotheses are essential:

The converse is false for general measurable spaces — this is the finite Blackwell equivalence ([Bla53]); the standard-Borel version additionally requires the experiments to be dominated.
[Nonempty Θ] is necessary: with Θ empty every Bayes risk is 0, so the hypothesis holds vacuously, yet a Markov garbling η : Kernel 𝓧 𝓧' need not exist when 𝓧 is nonempty and 𝓧' is empty. (Nonempty Θ together with [IsMarkovKernel P'] also forces 𝓧' nonempty.)
[IsMarkovKernel P] is necessary: a defective P can attain low risk without being informative. E.g. the zero kernel P = 0 has bayesRisk ℓ 0 π = 0 for every loss (the least possible value), so it dominates every P', yet η ∘ₖ 0 = 0 forces P' = 0.
[IsMarkovKernel P'] is necessary: an over-massed P' inflates every risk. E.g. over a one-point sample space with P' = 2 • P one has bayesRisk ℓ P' π = 2 • bayesRisk ℓ P π ≥ bayesRisk ℓ P π for every loss, yet P' (mass 2) is not η ∘ₖ P for any Markov η.

The quantification over all decision problems is likewise essential: dominance for a single one does not force garbling.

The proof separates encode P' from the garbling polytope of P and realizes the separating functional as a shifted loss under the uniform prior (see the module-level implementation notes).

source

theorem ProbabilityTheory.isGarblingOf_iff_blackwellDominates {Θ : Type u_1} {𝓧 : Type u_2} {𝓧' : Type u} [mΘ : MeasurableSpace Θ] [m𝓧 : MeasurableSpace 𝓧] [m𝓧' : MeasurableSpace 𝓧'] [Fintype Θ] [Fintype 𝓧] [Fintype 𝓧'] [Nonempty Θ] [MeasurableSingletonClass Θ] [MeasurableSingletonClass 𝓧] [MeasurableSingletonClass 𝓧'] {P : Kernel Θ 𝓧} {P' : Kernel Θ 𝓧'} [IsMarkovKernel P] [IsMarkovKernel P'] :

P'.IsGarblingOf P ↔ P.BlackwellDominates P'

[Bla53] (finite case). P is at least as informative as P' (P' is a garbling of P) iff P Blackwell-dominates P' (no greater Bayes risk across every decision problem). The forward direction (blackwellDominates_of_isGarblingOf) holds for arbitrary spaces; the reverse (isGarblingOf_of_blackwellDominates) needs finiteness, nonempty Θ, and that both experiments are Markov kernels.

Documentation

Linglib.Core.Probability.Decision.Blackwell

Blackwell comparison of experiments #

Main definitions #

Main statements #

Implementation notes #

References #

The garbling polytope (finite case) #