Linglib.Theories.Pragmatics.RSA.Cancellation

Cancellation theorems for RSA models with noisy observation #

@cite{goodman-stuhlmuller-2013}'s cancellation principle (informal): as a speaker's observation kernel becomes noisier, the listener's posterior given an utterance moves closer to the prior.

This file states the structural information-theoretic content that follows from the data processing inequality (DPI), and honestly scopes what's universal vs paper-specific.

What IS structurally provable (Path B from session audit) #

The observation-level DPI cancellation of §2: if the noisy observation kernel is a post-processing of the informative one through a noise channel, MI(state; obs) can only decrease.

What IS NOT structurally provable #

The utterance-level form MI(state; utt_n) ≤ MI(state; utt_i), and GS2013's per-(world-pair) numerical findings; see the scoping note in §3.

Architectural framing #

§2 is the universal structural theorem; per-paper findings are illustrations of how it manifests for specific (lex, kernel) shapes, not corollaries.

§1. Observation-level mutual information #

Define MI(state; obs) = ∑_s prior(s) · KL(κ s ‖ marg). This is the standard chain-rule decomposition (Cover-Thomas Eq 2.61): the average information about the state contained in an observation, equivalent to KL(joint ‖ product).

We work with this form rather than PMF.mutualInformation directly because the per-state decomposition is what makes the DPI applicable: the comparison of each KL(κ_n s ‖ marg_n) term against its KL(κ_i s ‖ marg_i) counterpart reduces, per-state, to klDiv_bind_le with the kernel noise : Obs → PMF Obs. The state component never enters the kernel, so the state-preservation support issue vanishes.
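To make the two definitions concrete, here is a small numerical sketch (illustrative Python, not part of the Lean development; the helper names are mine): the observation marginal is a vector-matrix product, and the MI is a prior-weighted sum of KL terms.

    import numpy as np

    def obs_marginal(prior, kappa):
        # marg(o) = sum_s prior(s) * kappa[s][o]; kappa is |W| x |Obs| row-stochastic
        return prior @ kappa

    def kl(p, q):
        # KL(p || q) in nats, assuming supp(p) is contained in supp(q)
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    def mutual_info_state_obs(prior, kappa):
        # MI(state; obs) = sum_s prior(s) * KL(kappa[s] || marg): the per-state form
        marg = obs_marginal(prior, kappa)
        return sum(prior[s] * kl(kappa[s], marg) for s in range(len(prior)))

    prior = np.array([0.5, 0.5])
    kappa = np.array([[0.9, 0.1],    # state 0 mostly yields obs 0
                      [0.2, 0.8]])   # state 1 mostly yields obs 1
    print(mutual_info_state_obs(prior, kappa))  # ~ 0.275 nats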

noncomputable def RSA.obsMarginal {W : Type u_1} {Obs : Type u_2} (prior : PMF W) (κ : W → PMF Obs) :
    PMF Obs

The observation marginal marg(o) = ∑_s prior(s) · κ(s)(o).

noncomputable def RSA.mutualInfoStateObs {W : Type u_1} {Obs : Type u_2} [Fintype W] [MeasurableSpace Obs] (prior : PMF W) (κ : W → PMF Obs) :
    ENNReal

Per-state-decomposed mutual information between state and observation: MI(state; obs) = ∑_s prior(s) · KL(κ s ‖ marg).

This is the conditional-relative-entropy form (Cover-Thomas Eq 2.61). For any joint J(s, o) = prior(s) · κ(s)(o) with marginal marg(o) = ∑_s J(s, o), the standard MI KL(J ‖ prior × marg) equals this per-state weighted sum. The decomposition is what makes the DPI argument tractable.
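A numerical sanity check of that equivalence (again an illustrative Python sketch, reusing the toy example above): the per-state weighted sum agrees with KL(J ‖ prior × marg) computed on the flattened joint.

    import numpy as np

    def kl(p, q):
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    prior = np.array([0.5, 0.5])
    kappa = np.array([[0.9, 0.1], [0.2, 0.8]])
    marg = prior @ kappa

    # Per-state conditional-relative-entropy form.
    per_state = sum(prior[s] * kl(kappa[s], marg) for s in range(2))

    # Standard MI: KL(joint || product of marginals).
    joint = prior[:, None] * kappa            # J(s, o) = prior(s) * kappa(s)(o)
    product = prior[:, None] * marg[None, :]  # prior(s) * marg(o)
    assert abs(per_state - kl(joint.ravel(), product.ravel())) < 1e-12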


§2. Observation-level DPI cancellation #

Theorem: if κ_n s = (κ_i s).bind noise for all states s (the noisy observation kernel is a post-processing of the informative one through some noise channel), then MI(state; obs_n) ≤ MI(state; obs_i).

The proof applies PMF.klDiv_bind_le per-state. For each s we have κ_n s = (κ_i s).bind noise, and obsMarginal prior κ_n = (obsMarginal prior κ_i).bind noise (by associativity of PMF.bind). The per-state KL inequality then lifts to the prior-weighted sum.

theorem RSA.obsMarginal_bind_noise {W : Type u_1} {Obs : Type u_2} [Fintype W] [Fintype Obs] [MeasurableSpace W] [MeasurableSingletonClass W] [MeasurableSpace Obs] [MeasurableSingletonClass Obs] (prior : PMF W) (κ_i : W → PMF Obs) (noise : Obs → PMF Obs) :
    (obsMarginal prior fun (s : W) => (κ_i s).bind noise) = (obsMarginal prior κ_i).bind noise

Observation marginal under noisy kernel = noise-bind of informative marginal. This is the fact that lets the per-state DPI lift to the marginal level.
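In finite matrix form this lemma is associativity of the kernel composition: marginalising the composed kernel equals pushing the marginal through the noise channel. A small numerical check (illustrative, with hypothetical helper names):

    import numpy as np

    rng = np.random.default_rng(0)

    def random_stochastic(rows, cols):
        m = rng.random((rows, cols))
        return m / m.sum(axis=1, keepdims=True)

    prior = np.array([0.3, 0.7])
    kappa_i = random_stochastic(2, 3)  # informative kernel, W -> Obs
    noise = random_stochastic(3, 3)    # noise channel, Obs -> Obs

    # kappa_n s = (kappa_i s).bind noise corresponds to the row-wise product kappa_i @ noise
    kappa_n = kappa_i @ noise

    # obsMarginal prior kappa_n = (obsMarginal prior kappa_i).bind noise
    assert np.allclose(prior @ kappa_n, (prior @ kappa_i) @ noise)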

theorem RSA.mutualInfoStateObs_bind_noise_le {W : Type u_1} {Obs : Type u_2} [Fintype W] [Fintype Obs] [MeasurableSpace W] [MeasurableSingletonClass W] [MeasurableSpace Obs] [MeasurableSingletonClass Obs] (prior : PMF W) (κ_i : W → PMF Obs) (noise : Obs → PMF Obs) (h_ac : ∀ (s : W), (κ_i s).toMeasure.AbsolutelyContinuous (obsMarginal prior κ_i).toMeasure) (h_bind_ac : ∀ (s : W), ((κ_i s).bind noise).toMeasure.AbsolutelyContinuous ((obsMarginal prior κ_i).bind noise).toMeasure) (h_marg_pos : ∀ (o : Obs), (obsMarginal prior κ_i) o ≠ 0) (h_noise_pos : ∀ (o o' : Obs), (obsMarginal prior κ_i) o ≠ 0 → ((obsMarginal prior κ_i).bind noise) o' ≠ 0 → (noise o) o' ≠ 0) :
    (mutualInfoStateObs prior fun (s : W) => (κ_i s).bind noise) ≤ mutualInfoStateObs prior κ_i

Observation-level cancellation theorem (DPI form).

If the noisy observation kernel is a post-processing of the informative kernel through a noise channel (κ_n s = (κ_i s).bind noise), then the mutual information between state and observation decreases.

Reduces to PMF.klDiv_bind_le applied per-state. The state component never enters the noise kernel, so the technical support precondition on klDiv_bind_le applies cleanly to noise : Obs → PMF Obs.

The hypothesis h_ac is per-state absolute continuity of κ_i s w.r.t. obsMarginal prior κ_i, and h_bind_ac is the analogous absolute continuity after binding both sides with noise; h_marg_pos ensures the obs marginal has full support; h_noise_pos ensures noise has full support compatible with the marginals (the standard DPI support precondition).

This is the structural information-theoretic content of "less informative observation kernel → less information about state in the observation".
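An illustrative numerical instance of the theorem (a Python sketch, independent of the Lean proof): over random kernels and noise channels, post-processing never increases the per-state-decomposed MI.

    import numpy as np

    rng = np.random.default_rng(1)

    def kl(p, q):
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    def mi(prior, kappa):
        marg = prior @ kappa
        return sum(prior[s] * kl(kappa[s], marg) for s in range(len(prior)))

    def random_stochastic(rows, cols):
        m = rng.random((rows, cols))
        return m / m.sum(axis=1, keepdims=True)

    prior = np.array([0.2, 0.5, 0.3])
    for _ in range(1000):
        kappa_i = random_stochastic(3, 4)
        noise = random_stochastic(4, 4)
        kappa_n = kappa_i @ noise  # post-processing through the noise channel
        assert mi(prior, kappa_n) <= mi(prior, kappa_i) + 1e-12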

§3. Honest scoping note #

The utterance-level form (MI(state; utt_n) ≤ MI(state; utt_i)) does NOT follow from §2 by composing with S1g : Obs → PMF U. The natural composition gives:

    MI(state; utt_x) ≤ MI(state; obs_x) ≤ MI(state; obs_i)   for x ∈ {n, i}

So both MI(state; utt_n) and MI(state; utt_i) are bounded by MI(state; obs_i); the bounds give no direct comparison between the two. The Markov chain state → Y_i → (U_i, U_n) gives MI(state; U_n) ≤ MI(state; Y_i), but there is no chain state → U_i → U_n, because U_n depends on Y_i directly via noise(Y_i), not only through U_i.
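A toy counterexample makes the failure concrete (my construction for illustration, not from GS2013 or the Lean development): with a permutation as the "noise" channel and a many-to-one S1g, the noisy utterance carries strictly MORE information about the state, so no inequality between MI(state; utt_n) and MI(state; utt_i) can hold in general.

    import numpy as np

    def kl(p, q):
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    def mi(prior, kappa):
        marg = prior @ kappa
        return sum(prior[s] * kl(kappa[s], marg) for s in range(len(prior)))

    prior = np.array([0.5, 0.5])
    kappa_i = np.array([[1.0, 0.0, 0.0],   # s1 -> obs0 deterministically
                        [0.0, 0.5, 0.5]])  # s2 -> obs1 or obs2
    noise = np.array([[0, 0, 1],           # a permutation: MI(state; obs) is unchanged
                      [1, 0, 0],
                      [0, 1, 0]], dtype=float)
    s1g = np.array([[1, 0],                # obs0, obs1 -> utterance a
                    [1, 0],
                    [0, 1]], dtype=float)  # obs2 -> utterance b

    print(mi(prior, kappa_i @ s1g))          # MI(state; utt_i) ~ 0.216 nats
    print(mi(prior, kappa_i @ noise @ s1g))  # MI(state; utt_n) ~ 0.693 nats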

GS2013's per-(world-pair) findings (e.g. "L1(s2 | u) > L1(s3 | u) weakens at lower access") are NOT corollaries of any clean MI cancellation. They are specific numerical evaluations of the model; see Phenomena/ScalarImplicatures/Studies/GoodmanStuhlmuller2013PMF.lean for the illustrative computations.

The proper architectural framing: §2 above is the universal structural theorem about RSA models with noisy observation chains. Per-paper findings are illustrations of how that structural fact manifests for specific (lex, kernel) shapes, not corollaries provable from §2 alone.