Documentation

Linglib.Core.Probability.Decision.Blackwell

Blackwell comparison of experiments #

A statistical experiment is a Markov kernel P : Kernel Θ 𝓧 generating data in 𝓧 from a parameter in Θ. Experiment P is at least as informative as P' : Kernel Θ 𝓧' when P' can be recovered from P by Markov post-processing ("garbling"): P' = Ξ· βˆ˜β‚– P for some Markov kernel Ξ·. This file develops that order and its characterization through Bayes risk.

[Bla53]'s comparison theorem states that P is at least as informative as P' if and only if, for every decision problem, the Bayes risk under P is no greater than under P'. We state and prove this equivalence over ProbabilityTheory.bayesRisk for finite spaces: the forward direction is the data-processing inequality, the converse the Blackwell–Sherman–Stein separation argument (see the implementation notes).

Main definitions #

Main statements #

Implementation notes #

The development is stated entirely over Mathlib's Kernel and bayesRisk with no further dependencies, so it can serve as a Mathlib.Probability.Decision.Blackwell candidate. The finite, ℝ-valued eig / questionUtility view in Core.Probability.Decision.ExperimentDesign is a downstream consumer, bridged via ObservationModel.toPMFKernel.

Kernel.BlackwellDominates quantifies over all decision problems (every measurable action space 𝓨 and loss β„“ : Θ β†’ 𝓨 β†’ ℝβ‰₯0∞) and priors: dominance for a single one does not force garbling. The action-space universe is pinned to u (the universe of 𝓧') because the converse proof instantiates the dominance hypothesis at the action space 𝓨 := 𝓧' (the identity estimator).

The converse proof encodes finite kernels as real vectors of singleton masses (encode) and realizes the garblings of P as a compact convex polytope garblingSet P (the linear image of the product of standard simplices). If encode P' lies outside the polytope, the geometric Hahn–Banach theorem (geometric_hahn_banach_point_closed) yields a separating functional f; its coordinate matrix a ΞΈ x' = f (Pi.single ΞΈ (Pi.single x' 1)), shifted by a constant C to be nonnegative, defines a loss β„“ ΞΈ x' = ENNReal.ofReal (a ΞΈ x' + C) on actions 𝓧'. Under the uniform prior, the Bayes risk of any experiment Q evaluated at the identity estimator equals ENNReal.ofReal (|Θ|⁻¹ Β· f (encode Q) + C): the identity estimator pins P' to f (encode P'), while every estimator for P produces a garbling, whose f-value exceeds the separation level. This realizes bayesRisk β„“ P' Ο€ < bayesRisk β„“ P Ο€, contradicting the hypothesis β€” no minimax theorem is needed, the infimum is bounded below directly. All analytic inputs come from Mathlib (isCompact_stdSimplex, the geometric_hahn_banach_* lemmas, bayesRisk_fintype). The kernel-to-stochastic-matrix bridge (encode, garblingMap, buildKernel) is currently private proof scaffolding; it is a self-contained finite-kernel ↔ row-stochastic-matrix correspondence that would naturally graduate to its own public file when upstreamed.

References #

def ProbabilityTheory.Kernel.IsGarblingOf {Θ : Type u_1} {𝓧 : Type u_2} {𝓧' : Type u} [mΘ : MeasurableSpace Θ] [m𝓧 : MeasurableSpace 𝓧] [m𝓧' : MeasurableSpace 𝓧'] (P' : Kernel Θ 𝓧') (P : Kernel Θ 𝓧) :

P' is a garbling of P (Blackwell): there is a Markov post-processing kernel Ξ· with P' = Ξ· βˆ˜β‚– P. Read "P is at least as informative as P'".

Equations
  • P'.IsGarblingOf P = βˆƒ (Ξ· : ProbabilityTheory.Kernel 𝓧 𝓧'), ProbabilityTheory.IsMarkovKernel Ξ· ∧ P' = Ξ·.comp P
Instances For
    theorem ProbabilityTheory.Kernel.IsGarblingOf.refl {Θ : Type u_1} {𝓧 : Type u_2} [mΘ : MeasurableSpace Θ] [m𝓧 : MeasurableSpace 𝓧] (P : Kernel Θ 𝓧) [IsMarkovKernel P] :
    theorem ProbabilityTheory.Kernel.IsGarblingOf.trans {Θ : Type u_1} {𝓧 : Type u_2} {𝓧' : Type u} [mΘ : MeasurableSpace Θ] [m𝓧 : MeasurableSpace 𝓧] [m𝓧' : MeasurableSpace 𝓧'] {𝓧'' : Type u_3} [MeasurableSpace 𝓧''] {P'' : Kernel Θ 𝓧''} {P' : Kernel Θ 𝓧'} {P : Kernel Θ 𝓧} (hβ‚‚ : P''.IsGarblingOf P') (h₁ : P'.IsGarblingOf P) :
    def ProbabilityTheory.Kernel.BlackwellDominates {Θ : Type u_1} {𝓧 : Type u_2} {𝓧' : Type u} [mΘ : MeasurableSpace Θ] [m𝓧 : MeasurableSpace 𝓧] [m𝓧' : MeasurableSpace 𝓧'] (P : Kernel Θ 𝓧) (P' : Kernel Θ 𝓧') :

    P Blackwell-dominates P': for every decision problem (action space 𝓨, loss β„“) and prior Ο€, the Bayes risk under P is no larger than under P'. The right-hand side of the Blackwell equivalence.

    Equations
    • One or more equations did not get rendered due to their size.
    Instances For
      theorem ProbabilityTheory.bayesRisk_le_of_isGarblingOf {Θ : Type u_1} {𝓧 : Type u_2} {𝓧' : Type u} [mΘ : MeasurableSpace Θ] [m𝓧 : MeasurableSpace 𝓧] [m𝓧' : MeasurableSpace 𝓧'] {𝓨 : Type u} [MeasurableSpace 𝓨] (β„“ : Θ β†’ 𝓨 β†’ ENNReal) {P : Kernel Θ 𝓧} {P' : Kernel Θ 𝓧'} (h : P'.IsGarblingOf P) (Ο€ : MeasureTheory.Measure Θ) :
      bayesRisk β„“ P Ο€ ≀ bayesRisk β„“ P' Ο€

      Easy direction (data-processing). If P' is a garbling of P, then for every decision problem the Bayes risk under P is no larger than under P': garbling the more-informative experiment cannot help. Specializes bayesRisk_le_bayesRisk_comp.

      theorem ProbabilityTheory.blackwellDominates_of_isGarblingOf {Θ : Type u_1} {𝓧 : Type u_2} {𝓧' : Type u} [mΘ : MeasurableSpace Θ] [m𝓧 : MeasurableSpace 𝓧] [m𝓧' : MeasurableSpace 𝓧'] {P : Kernel Θ 𝓧} {P' : Kernel Θ 𝓧'} (h : P'.IsGarblingOf P) :

      Easy direction, bundled. A garbling of P is Blackwell-dominated by P.

      The garbling polytope (finite case) #

      Over finite spaces, the Markov garblings {Ξ· βˆ˜β‚– P | Ξ· Markov} of P, encoded by their singleton masses as vectors in Θ β†’ 𝓧' β†’ ℝ, form a compact convex polytope garblingSet P. It is the linear image of the product of standard simplices β€” the stochastic matrices Ξ· β€” under garblingMap P. This is the geometric substrate for the Blackwell–Sherman–Stein converse: if encode P' lies outside the polytope, a separating functional realizes a decision problem on which P' is strictly worse than P.

      theorem ProbabilityTheory.isGarblingOf_of_blackwellDominates {Θ : Type u_1} {𝓧 : Type u_2} {𝓧' : Type u} [mΘ : MeasurableSpace Θ] [m𝓧 : MeasurableSpace 𝓧] [m𝓧' : MeasurableSpace 𝓧'] [Fintype Θ] [Fintype 𝓧] [Fintype 𝓧'] [Nonempty Θ] [MeasurableSingletonClass Θ] [MeasurableSingletonClass 𝓧] [MeasurableSingletonClass 𝓧'] {P : Kernel Θ 𝓧} {P' : Kernel Θ 𝓧'} [IsMarkovKernel P] [IsMarkovKernel P'] (h : P.BlackwellDominates P') :

      Blackwell–Sherman–Stein converse (finite case). If P Blackwell-dominates P' (attains a Bayes risk no larger than P' for every decision problem and prior), then P' is a garbling of P.

      Stated for finite parameter and sample spaces, with both experiments Markov kernels. All four hypotheses are essential:

      • The converse is false for general measurable spaces β€” this is the finite Blackwell equivalence ([Bla53]); the standard-Borel version additionally requires the experiments to be dominated.
      • [Nonempty Θ] is necessary: with Θ empty every Bayes risk is 0, so the hypothesis holds vacuously, yet a Markov garbling Ξ· : Kernel 𝓧 𝓧' need not exist when 𝓧 is nonempty and 𝓧' is empty. (Nonempty Θ together with [IsMarkovKernel P'] also forces 𝓧' nonempty.)
      • [IsMarkovKernel P] is necessary: a defective P can attain low risk without being informative. E.g. the zero kernel P = 0 has bayesRisk β„“ 0 Ο€ = 0 for every loss (the least possible value), so it dominates every P', yet Ξ· βˆ˜β‚– 0 = 0 forces P' = 0.
      • [IsMarkovKernel P'] is necessary: an over-massed P' inflates every risk. E.g. over a one-point sample space with P' = 2 β€’ P one has bayesRisk β„“ P' Ο€ = 2 β€’ bayesRisk β„“ P Ο€ β‰₯ bayesRisk β„“ P Ο€ for every loss, yet P' (mass 2) is not Ξ· βˆ˜β‚– P for any Markov Ξ·.

      The quantification over all decision problems is likewise essential: dominance for a single one does not force garbling.

      The proof separates encode P' from the garbling polytope of P and realizes the separating functional as a shifted loss under the uniform prior (see the module-level implementation notes).

      theorem ProbabilityTheory.isGarblingOf_iff_blackwellDominates {Θ : Type u_1} {𝓧 : Type u_2} {𝓧' : Type u} [mΘ : MeasurableSpace Θ] [m𝓧 : MeasurableSpace 𝓧] [m𝓧' : MeasurableSpace 𝓧'] [Fintype Θ] [Fintype 𝓧] [Fintype 𝓧'] [Nonempty Θ] [MeasurableSingletonClass Θ] [MeasurableSingletonClass 𝓧] [MeasurableSingletonClass 𝓧'] {P : Kernel Θ 𝓧} {P' : Kernel Θ 𝓧'} [IsMarkovKernel P] [IsMarkovKernel P'] :

      [Bla53] (finite case). P is at least as informative as P' (P' is a garbling of P) iff P Blackwell-dominates P' (no greater Bayes risk across every decision problem). The forward direction (blackwellDominates_of_isGarblingOf) holds for arbitrary spaces; the reverse (isGarblingOf_of_blackwellDominates) needs finiteness, nonempty Θ, and that both experiments are Markov kernels.