Documentation

Linglib.Theories.Processing.PredictiveUncertainty.IAS

Incremental Alternative Sampling: Real-Valued Foundation #

@cite{giulianelli-etal-2026} @cite{giulianelli-opedal-cotterell-2024}

This file is the probabilistic backbone underneath Config.lean's enum-level configuration. The LangModel primitive lives in Theories/Processing/LanguageModel/Basic.lean; this file builds on it to define the genSurprisal family of @cite{giulianelli-etal-2026}'s Eq. 3 — real-valued functions of an LM, context, and target — and shows that classical surprisal is recovered as the special case (warp = −log, score = indicator).

This file is what makes Config.lean's enum tags (WarpingFn, ScoringFn) denote actual mathematical objects rather than just labels: WarpingFn.denote ports each tag to its real function, and standardSurprisal_denotes_surprisal is the (non-trivial) reduction theorem that the enum config "standard surprisal" really computes the classical −log p(w | c).

Main definitions #

Main theorem #

noncomputable def Theories.Processing.PredictiveUncertainty.genSurprisal {Voc : Type u_1} [Fintype Voc] (lm : LanguageModel.LangModel Voc) (warp : ℝ → ℝ) (score : Option Voc → Voc → List Voc → ℝ) (c : List Voc) (w : Voc) :
ℝ

Generalised surprisal (@cite{giulianelli-etal-2026} Eq. 3):

γ(w; c) = warp( E_{a ~ p(·|c)} [score(a, w, c)] )

warp is the f, score is the g, and the sampler is the LM's own next-symbol distribution. Specialising (warp, score) recovers existing processing measures.

Equations
Instances For
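A minimal sketch of what this definition plausibly computes, written against an abstract next-symbol distribution `prob : List Voc → Voc → ℝ` rather than the actual LangModel API (whose field names are not shown in this file):

```lean
-- Sketch only: `prob c a` stands in for the LM's probability of
-- symbol `a` after context `c`; the real LangModel API may differ.
noncomputable def genSurprisalSketch {Voc : Type*} [Fintype Voc]
    (prob : List Voc → Voc → ℝ) (warp : ℝ → ℝ)
    (score : Option Voc → Voc → List Voc → ℝ)
    (c : List Voc) (w : Voc) : ℝ :=
  -- warp of the expected score under the sampling distribution
  warp (∑ a : Voc, prob c a * score (some a) w c)
```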
    noncomputable def Theories.Processing.PredictiveUncertainty.indicatorScore {Voc : Type u_1} [DecidableEq Voc] (o : Option Voc) (w : Voc) :
    List Voc → ℝ

    The indicator scoring function: 1 iff the alternative matches the target.

    Equations
    Instances For
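A sketch of the intended behaviour, assuming (as the docstring says) the score ignores its context argument:

```lean
-- Sketch: returns 1 exactly when the sampled alternative matches
-- the target, 0 otherwise; the context argument is unused.
noncomputable def indicatorScoreSketch {Voc : Type*} [DecidableEq Voc]
    (o : Option Voc) (w : Voc) : List Voc → ℝ :=
  fun _ => if o = some w then 1 else 0
```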

      noncomputable def Theories.Processing.PredictiveUncertainty.WarpingFn.denote :
      WarpingFn → ℝ → ℝ

      Denotation of a WarpingFn enum tag as a real function ℝ → ℝ. This is the bridge that turns the symbolic enum in Config.lean into actual mathematical content.

      Equations
      Instances For
        noncomputable def Theories.Processing.PredictiveUncertainty.ScoringFn.denote {Voc : Type u_1} [DecidableEq Voc] (dist sim : Option Voc → Voc → List Voc → ℝ) :
        ScoringFn → Option Voc → Voc → List Voc → ℝ

        Denotation of a ScoringFn enum tag as a scoring function. The .indicator case is fully concrete (= indicatorScore); the .distance and .similarity cases are parametric in the user-supplied distance and similarity functions, since the paper's framework abstracts over these.

        Equations
        Instances For
          noncomputable def Theories.Processing.PredictiveUncertainty.SurprisalConfig.applyTo {Voc : Type u_1} [Fintype Voc] [DecidableEq Voc] (cfg : SurprisalConfig) (lm : LanguageModel.LangModel Voc) (dist sim : Option Voc → Voc → List Voc → ℝ) (c : List Voc) (w : Voc) :
          ℝ

          Denotation of a complete SurprisalConfig against a language model: applies cfg.warp.denote and cfg.scoring.denote through genSurprisal. The horizon and level fields are recorded labels and currently ignored at the denotation layer (the informationValue1 reduction below is the only horizon-aware case formalised so far).

          Equations
          Instances For
            theorem Theories.Processing.PredictiveUncertainty.standardSurprisal_denotes_surprisal {Voc : Type u_1} [Fintype Voc] [DecidableEq Voc] (lm : LanguageModel.LangModel Voc) (c : List Voc) (w : Voc) :

            Standard surprisal is the special case of genSurprisal with warp = −log and score = indicator. Choosing these (warp, score) collapses Eq. 3 to γ(w; c) = −log p(w | c) — i.e., classical surprisal @cite{levy-2008}.

            This is the non-trivial reduction theorem: it shows that the enum-level claim standardSurprisal = (negLog, indicator, 1, predictive) in Config.lean actually denotes the classical surprisal function on language models, rather than being a definitional rfl.
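The calculation behind the reduction is a one-liner in the paper's notation: the expected indicator score is exactly the target's probability, and warping with −log then yields classical surprisal.

```latex
\mathbb{E}_{a \sim p(\cdot \mid c)}\bigl[\mathbf{1}[a = w]\bigr]
  = \sum_{a} p(a \mid c)\,\mathbf{1}[a = w]
  = p(w \mid c),
\qquad\text{so}\qquad
\gamma(w; c) = -\log p(w \mid c).
```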

            theorem Theories.Processing.PredictiveUncertainty.standardSurprisal_applyTo_eq_surprisal {Voc : Type u_1} [Fintype Voc] [DecidableEq Voc] (lm : LanguageModel.LangModel Voc) (dist sim : Option Voc → Voc → List Voc → ℝ) (c : List Voc) (w : Voc) :
            standardSurprisal.applyTo lm dist sim c w = lm.surprisal c w

            Full enum-config reduction: applying the entire standardSurprisal configuration (not just its warping field) to any LM yields classical surprisal, regardless of which dist and sim parameters one chooses. This is the symmetric counterpart to standardSurprisal_denotes_surprisal — it shows that the whole enum tag tuple in Config.lean denotes correctly, since .indicator ignores its dist/sim arguments.
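A hypothetical usage sketch (assuming the Theories.Processing.PredictiveUncertainty namespace is open): the dist/sim parameters can be anything at all, since the .indicator case discards them.

```lean
-- Sketch: invoking the reduction theorem with arbitrary dist/sim.
example {Voc : Type*} [Fintype Voc] [DecidableEq Voc]
    (lm : LanguageModel.LangModel Voc)
    (dist sim : Option Voc → Voc → List Voc → ℝ)
    (c : List Voc) (w : Voc) :
    standardSurprisal.applyTo lm dist sim c w = lm.surprisal c w :=
  standardSurprisal_applyTo_eq_surprisal lm dist sim c w
```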

            noncomputable def Theories.Processing.PredictiveUncertainty.informationValue1 {Voc : Type u_1} [Fintype Voc] (lm : LanguageModel.LangModel Voc) (d : Option Voc → Voc → ℝ) (c : List Voc) (w : Voc) :
            ℝ

            Incremental information value at horizon 1: the expected distance between sampled next-symbols and the actual outcome.

            This is the single-step specialisation of the IAS measure (@cite{giulianelli-etal-2026}'s V_{r,d,1}, Eq. 6). The full IAS generalises to horizon h ≥ 1 by sampling h-grams; we keep the h = 1 case here as the load-bearing definition (it is what is needed to recover surprisal as a special instance via choice of distance d).

            The relationship to genSurprisal is purely structural: information value is genSurprisal with warp = id and score = a distance.

            Equations
            Instances For
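A sketch of the horizon-1 computation, again against an abstract `prob : List Voc → Voc → ℝ` rather than the actual LangModel API:

```lean
-- Sketch only: expected distance between sampled next symbols and
-- the actual outcome `w`; this is genSurprisal with warp = id.
noncomputable def informationValue1Sketch {Voc : Type*} [Fintype Voc]
    (prob : List Voc → Voc → ℝ) (d : Option Voc → Voc → ℝ)
    (c : List Voc) (w : Voc) : ℝ :=
  ∑ a : Voc, prob c a * d (some a) w
```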
              theorem Theories.Processing.PredictiveUncertainty.informationValue1_eq_genSurprisal {Voc : Type u_1} [Fintype Voc] (lm : LanguageModel.LangModel Voc) (d : Option Voc → Voc → ℝ) (c : List Voc) (w : Voc) :
              informationValue1 lm d c w = genSurprisal lm id (fun (o : Option Voc) (w' : Voc) (x : List Voc) => d o w') c w

              Information value at horizon 1 is genSurprisal with the identity warping. The sampler is the LM, the scoring function is the distance, and there is no warping — exactly the (identity, distance, 1, l) instantiation of @cite{giulianelli-etal-2026}'s family.

              theorem Theories.Processing.PredictiveUncertainty.informationValue_applyTo_eq_informationValue1 {Voc : Type u_1} [Fintype Voc] [DecidableEq Voc] (lm : LanguageModel.LangModel Voc) (dist sim : Option Voc → Voc → List Voc → ℝ) (h : ℕ) (l : RepLevel) (c : List Voc) (w : Voc) :
              (informationValue h l).applyTo lm dist sim c w = informationValue1 lm (fun (o : Option Voc) (w' : Voc) => dist o w' c) c w

              Full enum-config reduction for IAS (horizon-agnostic): the entire informationValue h l configuration, applied to (lm, dist, sim), equals informationValue1 lm dist' where dist' o w := dist o w c. The horizon h and level l are recorded labels at this denotation layer; horizon sensitivity enters only when one moves from informationValue1 to a horizon-h sum over h-grams.
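One immediate consequence, sketched under the assumption that the theorem is usable by rw and the namespace is open: since the right-hand side mentions neither h nor l, two configurations differing only in those tags denote the same function.

```lean
-- Sketch: both sides rewrite to the same informationValue1 term,
-- so the horizon and level tags cannot affect the denotation.
example {Voc : Type*} [Fintype Voc] [DecidableEq Voc]
    (lm : LanguageModel.LangModel Voc)
    (dist sim : Option Voc → Voc → List Voc → ℝ)
    (h h' : ℕ) (l l' : RepLevel) (c : List Voc) (w : Voc) :
    (informationValue h l).applyTo lm dist sim c w
      = (informationValue h' l').applyTo lm dist sim c w := by
  rw [informationValue_applyTo_eq_informationValue1,
      informationValue_applyTo_eq_informationValue1]
```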