Blackwell comparison of experiments #
A statistical experiment is a Markov kernel P : Kernel Ξ π§ generating data in π§ from a
parameter in Ξ. Experiment P is at least as informative as P' : Kernel Ξ π§' when P'
can be recovered from P by Markov post-processing ("garbling"): P' = Ξ· ββ P for some Markov
kernel Ξ·. This file develops that order and its characterization through Bayes risk.
[Bla53]'s comparison theorem states that P is at least as informative as P' if and
only if, for every decision problem, the Bayes risk under P is no greater than under P'. We
state and prove this equivalence over ProbabilityTheory.bayesRisk for finite spaces: the
forward direction is the data-processing inequality, the converse the BlackwellβShermanβStein
separation argument (see the implementation notes).
Main definitions #
Kernel.IsGarblingOf:P'.IsGarblingOf PmeansP'is a Markov garbling ofP, i.e.Pis at least as informative asP'. Reflexive (Kernel.IsGarblingOf.refl) and transitive (Kernel.IsGarblingOf.trans).Kernel.BlackwellDominates:P.BlackwellDominates P'meansPhas Bayes risk no greater thanP'for every decision problem and prior β the dual side of the Blackwell equivalence.
Main statements #
bayesRisk_le_of_isGarblingOf/blackwellDominates_of_isGarblingOf: ifP'is a garbling ofP, thenPBlackwell-dominatesP'(the data-processing direction).isGarblingOf_of_blackwellDominates: conversely, ifPBlackwell-dominatesP', thenP'is a garbling ofP(the BlackwellβShermanβStein direction, finite case). Requires finite, nonemptyΞand that bothPandP'are Markov kernels β false otherwise (see the theorem docstring for counterexamples).isGarblingOf_iff_blackwellDominates: the two directions combined.
Implementation notes #
The development is stated entirely over Mathlib's Kernel and bayesRisk with no further
dependencies, so it can serve as a Mathlib.Probability.Decision.Blackwell candidate. The
finite, β-valued eig / questionUtility view in Core.Probability.Decision.ExperimentDesign
is a downstream consumer, bridged via ObservationModel.toPMFKernel.
Kernel.BlackwellDominates quantifies over all decision problems (every measurable action space
π¨ and loss β : Ξ β π¨ β ββ₯0β) and priors: dominance for a single one does not force garbling.
The action-space universe is pinned to u (the universe of π§') because the converse proof
instantiates the dominance hypothesis at the action space π¨ := π§' (the identity estimator).
The converse proof encodes finite kernels as real vectors of singleton masses (encode) and
realizes the garblings of P as a compact convex polytope garblingSet P (the linear image of
the product of standard simplices). If encode P' lies outside the polytope, the geometric
HahnβBanach theorem (geometric_hahn_banach_point_closed) yields a separating functional f;
its coordinate matrix a ΞΈ x' = f (Pi.single ΞΈ (Pi.single x' 1)), shifted by a constant C to
be nonnegative, defines a loss β ΞΈ x' = ENNReal.ofReal (a ΞΈ x' + C) on actions π§'. Under the
uniform prior, the Bayes risk of any experiment Q evaluated at the identity estimator equals
ENNReal.ofReal (|Ξ|β»ΒΉ Β· f (encode Q) + C): the identity estimator pins P' to f (encode P'),
while every estimator for P produces a garbling, whose f-value exceeds the separation level.
This realizes bayesRisk β P' Ο < bayesRisk β P Ο, contradicting the hypothesis β no minimax
theorem is needed, the infimum is bounded below directly. All analytic inputs come from Mathlib
(isCompact_stdSimplex, the geometric_hahn_banach_* lemmas, bayesRisk_fintype). The
kernel-to-stochastic-matrix bridge (encode, garblingMap, buildKernel) is currently
private proof scaffolding; it is a self-contained finite-kernel β row-stochastic-matrix
correspondence that would naturally graduate to its own public file when upstreamed.
References #
P' is a garbling of P (Blackwell): there is a Markov post-processing
kernel Ξ· with P' = Ξ· ββ P. Read "P is at least as informative as P'".
Equations
- P'.IsGarblingOf P = β (Ξ· : ProbabilityTheory.Kernel π§ π§'), ProbabilityTheory.IsMarkovKernel Ξ· β§ P' = Ξ·.comp P
Instances For
P Blackwell-dominates P': for every decision problem (action space π¨, loss β)
and prior Ο, the Bayes risk under P is no larger than under P'. The right-hand side of the
Blackwell equivalence.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Easy direction (data-processing). If P' is a garbling of P, then for every
decision problem the Bayes risk under P is no larger than under P': garbling the
more-informative experiment cannot help. Specializes
bayesRisk_le_bayesRisk_comp.
Easy direction, bundled. A garbling of P is Blackwell-dominated by P.
The garbling polytope (finite case) #
Over finite spaces, the Markov garblings {Ξ· ββ P | Ξ· Markov} of P, encoded by their
singleton masses as vectors in Ξ β π§' β β, form a compact convex polytope garblingSet P.
It is the linear image of the product of standard simplices β the stochastic matrices Ξ· β
under garblingMap P. This is the geometric substrate for the BlackwellβShermanβStein
converse: if encode P' lies outside the polytope, a separating functional realizes a
decision problem on which P' is strictly worse than P.
BlackwellβShermanβStein converse (finite case). If P Blackwell-dominates P' (attains a
Bayes risk no larger than P' for every decision problem and prior), then P' is a garbling of
P.
Stated for finite parameter and sample spaces, with both experiments Markov kernels. All four hypotheses are essential:
- The converse is false for general measurable spaces β this is the finite Blackwell equivalence ([Bla53]); the standard-Borel version additionally requires the experiments to be dominated.
[Nonempty Ξ]is necessary: withΞempty every Bayes risk is0, so the hypothesis holds vacuously, yet a Markov garblingΞ· : Kernel π§ π§'need not exist whenπ§is nonempty andπ§'is empty. (NonemptyΞtogether with[IsMarkovKernel P']also forcesπ§'nonempty.)[IsMarkovKernel P]is necessary: a defectivePcan attain low risk without being informative. E.g. the zero kernelP = 0hasbayesRisk β 0 Ο = 0for every loss (the least possible value), so it dominates everyP', yetΞ· ββ 0 = 0forcesP' = 0.[IsMarkovKernel P']is necessary: an over-massedP'inflates every risk. E.g. over a one-point sample space withP' = 2 β’ Pone hasbayesRisk β P' Ο = 2 β’ bayesRisk β P Ο β₯ bayesRisk β P Οfor every loss, yetP'(mass2) is notΞ· ββ Pfor any MarkovΞ·.
The quantification over all decision problems is likewise essential: dominance for a single one does not force garbling.
The proof separates encode P' from the garbling polytope of P and realizes the separating
functional as a shifted loss under the uniform prior (see the module-level implementation
notes).
[Bla53] (finite case). P is at least as informative as P' (P' is a
garbling of P) iff P Blackwell-dominates P' (no greater Bayes risk across every decision
problem). The forward direction (blackwellDominates_of_isGarblingOf) holds for arbitrary spaces;
the reverse (isGarblingOf_of_blackwellDominates) needs finiteness, nonempty Ξ, and that both
experiments are Markov kernels.