Documentation

Linglib.Theories.Processing.NoisyChannel.Basic

Memory Processes (Noisy-Channel Substrate, Memory-Side) #

@cite{futrell-gibson-levy-2020}

A MemoryProcess is the memory-side noisy-channel substrate of @cite{futrell-gibson-levy-2020}'s lossy-context surprisal: a processor that compresses its observed history into a (possibly noisy) memory state, then predicts the next symbol from that memory state alone. This is the context-noise specialization of the broader noisy-channel comprehension posture (cf. Gibson-Bergen-Piantadosi 2013 for input-string noise; Levy 2011 for noise on both the input and the memory of context).

Following the paper's claims (§3.1, Claims 1–4):

  1. Incrementality of memory (Claim 1) — the encoder encode : List Voc → PMF Mem maps each observed history to a distribution over memory states. (FGL state Claim 1 in terms of a recursive step m : R × W → R; the bundled iterated form here discharges incrementality silently and is a known schema simplification.)
  2. Linguistic knowledge (Claim 2) — the predictor predict : Mem → PMF (Option Voc) maps each memory state to a next-symbol distribution (with none = end-of-string).
  3. Inaccessibility of context (Claim 3) — the predictor sees only the memory state, never the underlying history. This is what makes the model lossy.
  4. Linking hypothesis (Claim 4) — processing difficulty for the next symbol is expectedSurprisal (Eq. 3 of the paper), the expectation of -log P(w | m) over m ~ encode(c).

This file provides the abstract type and the master cost function. Specialisations (lossless = classical surprisal, n-gram, erasure noise, cue-based retrieval) live in sibling files.

The encoder and predictor are mathlib PMF kernels (countable support, ℝ≥0∞-valued), so neither Voc nor Mem need to be Fintype. Expected sums are taken as tsum (∑') which collapses to Finset.sum whenever a Fintype instance is in scope.

Main definitions #

Main theorem #

structure Theories.Processing.NoisyChannel.MemoryProcess (Voc : Type u_1) (Mem : Type u_2) :
Type (max u_1 u_2)

A processing architecture with a memory bottleneck.

encode lossily compresses the observed history into a distribution over memory states; predict gives the next-symbol distribution from a memory state alone. The memory state mediates all information flow from the past to the prediction (the inaccessibility-of-context claim of @cite{futrell-gibson-levy-2020} §3.1, Claim 3).

  • encode : List VocPMF Mem

    Lossy encoder: history → distribution over memory states.

  • predict : MemPMF (Option Voc)

    Predictor: memory state → next-symbol distribution.

Instances For
    noncomputable def Theories.Processing.NoisyChannel.MemoryProcess.perStateSurprisal {Voc : Type u_1} {Mem : Type u_2} (mp : MemoryProcess Voc Mem) (m : Mem) (w : Voc) :

    Per-state surprisal: D_lc(w | m) = -log P(w | m).

    This is the inner factor that appears inside the paper's Eq. 2 (linking hypothesis as proportionality) and Eq. 3 (expected-surprisal expectation). @cite{futrell-gibson-levy-2020} does not give it a standalone equation number, but factoring it out is convenient for the Dirac-collapse proof below.

    Equations
    Instances For
      noncomputable def Theories.Processing.NoisyChannel.MemoryProcess.expectedSurprisal {Voc : Type u_1} {Mem : Type u_2} (mp : MemoryProcess Voc Mem) (c : List Voc) (w : Voc) :

      Expected surprisal under the lossy memory process: D_lc(w | c) = E_{m ~ encode(c)}[-log P(w | m)]. (@cite{futrell-gibson-levy-2020} Eq. 3.)

      This is the master cost function from which classical surprisal, n-gram surprisal, and the cue-based retrieval cost all derive as instantiations of (encode, predict).

      Equations
      Instances For
        noncomputable def Theories.Processing.NoisyChannel.MemoryProcess.marginalProb {Voc : Type u_1} {Mem : Type u_2} (mp : MemoryProcess Voc Mem) (c : List Voc) (w : Voc) :
        ENNReal

        Forward-marginal next-symbol probability under the memory process: P(w | c) = Σ_m P(m | c) · P(w | m).

        Computed as (encode c).bind predict evaluated at some w — the mathlib PMF monad bind is exactly the marginal of the joint kernel.

        This forward marginal is implicit in @cite{futrell-gibson-levy-2020}'s setup but not separately numbered. The paper's noisy-channel inversion in §3.3 (Eqs. 4–9) goes the other direction: marginalizing over contexts c to compute p(w | r) from a fixed memory state r. The forward kernel below — marginalizing over memory states m given a context c — is the dual operation.

        Note: -log marginalProb (the surprisal of the marginal) is not in general equal to expectedSurprisal (the expected surprisal). The two agree iff the encoder is a Dirac (no information loss); otherwise Jensen gives expectedSurprisal ≥ -log marginalProb.

        Equations
        Instances For
          def Theories.Processing.NoisyChannel.MemoryProcess.IsDirac {Voc : Type u_1} {Mem : Type u_2} (mp : MemoryProcess Voc Mem) (f : List VocMem) :

          The memory process is Dirac at f if the encoder concentrates all mass on f c for every history c. This is the lossless ("perfect memory") regime: the memory state is a deterministic function of the history.

          (@cite{futrell-gibson-levy-2020} §3.5.1: classical surprisal arises exactly in this case.)

          Equations
          Instances For
            theorem Theories.Processing.NoisyChannel.MemoryProcess.expectedSurprisal_of_dirac {Voc : Type u_1} {Mem : Type u_2} {mp : MemoryProcess Voc Mem} {f : List VocMem} (h : mp.IsDirac f) (c : List Voc) (w : Voc) :

            Dirac collapse. When the encoder is a Dirac at f c, expected surprisal reduces to per-state surprisal evaluated at f c. The expectation has no spread to average over.

            This is the structural identity that lets us recover classical surprisal as the special case of expectedSurprisal with lossless memory (see LossyContext.lean's expectedSurprisal_eq_surprisal_of_lossless).

            theorem Theories.Processing.NoisyChannel.MemoryProcess.expectedSurprisal_of_constantEncoder {Voc : Type u_1} {Mem : Type u_2} {mp : MemoryProcess Voc Mem} {m₀ : Mem} (h : ∀ (c : List Voc), mp.encode c = PMF.pure m₀) (c : List Voc) (w : Voc) :

            Constant-encoder limit (regression to prior). When the encoder ignores its input and concentrates on a single memory state m₀, expected surprisal collapses to per-state surprisal at m₀. The predictor's distribution at m₀ is then the comprehender's prior (no context-derived information) — so per-state surprisal here is the unigram log-prob.

            This is the opposite extreme of expectedSurprisal_eq_surprisal_of_lossless: where lossless memory recovers classical surprisal, fully lossy memory (all context information discarded) recovers the unigram prior. Together they bracket the lossy-context regime — @cite{futrell-gibson-levy-2020} §3.4.2 ("when all information about a true context is lost, the difficulty of comprehending a word is given exactly by its log prior unigram probability").

            theorem Theories.Processing.NoisyChannel.MemoryProcess.neg_log_marginalProb_le_expectedSurprisal {Voc : Type u_1} {Mem : Type u_2} [Fintype Mem] (mp : MemoryProcess Voc Mem) (c : List Voc) (w : Voc) (h_pos : ∀ (m : Mem), 0 < ((mp.predict m) (some w)).toReal) :
            -Real.log (mp.marginalProb c w).toReal mp.expectedSurprisal c w

            Jensen bound: expected surprisal lower-bounds marginal surprisal. The expected surprisal under the lossy memory process is always at least the surprisal of the forward marginal (encode c).bind predict.

            Equivalently: the per-state-then-expectation order yields a higher processing cost than the marginal-then-surprisal order. The two agree iff the encoder is a Dirac (no information loss); in general Jensen's inequality applied to the concave function log gives the strict direction.

            This is the substantive content of "lossy memory makes things harder": @cite{futrell-gibson-levy-2020} Suppl. Mat. A's memory distortion bound rests on this Jensen gap.

            Hypotheses. [Fintype Mem] reduces tsum to Finset.sum so that mathlib's ConcaveOn.le_map_sum applies directly. h_pos requires the per-state predicted probability of w to be strictly positive at every memory state — needed because mathlib's Real.log 0 = 0 convention breaks the bound at zero entries: a memory state with positive encode mass but zero predict mass for w contributes 0 to expected surprisal under the convention but a positive contribution to the marginal, so −log marginalProb could exceed expectedSurprisal.