Incremental Alternative Sampling: Real-Valued Foundation #
@cite{giulianelli-etal-2026} @cite{giulianelli-opedal-cotterell-2024}
The probabilistic backbone underneath Config.lean's enum-level
configuration. The LangModel primitive lives in
Theories/Processing/LanguageModel/Basic.lean; this file builds on it
to define the genSurprisal family of @cite{giulianelli-etal-2026}'s
Eq. 3 — real-valued functions of an LM, context, and target — and shows
that classical surprisal is recovered as the special case
(warp = −log, score = indicator).
This file is what makes Config.lean's enum tags (WarpingFn,
ScoringFn) denote actual mathematical objects rather than just labels:
WarpingFn.denote maps each tag to its real function, and
standardSurprisal_denotes_surprisal is the (non-trivial) reduction
theorem that the enum config "standard surprisal" really computes the
classical −log p(w | c).
Main definitions #
- genSurprisal: γ(w; c) = warp( E_{a ~ p(·|c)} [score(a, w, c)] )
- indicatorScore: g(a, w, c) = 𝟙[a = some w]
- WarpingFn.denote: the bridge from enum tags to real functions
- informationValue1: horizon-1 IAS, expected distance to target
Main theorem #
- standardSurprisal_denotes_surprisal: when the (warp, score) arguments are the denotations of standardSurprisal's warping and scoring tags, genSurprisal collapses to LangModel.surprisal. This is the formal content of the surprisal-as-prefix-expectation identity (Eqs. 2a–2d of @cite{giulianelli-etal-2026}).
Generalised surprisal (@cite{giulianelli-etal-2026} Eq. 3):
γ(w; c) = warp( E_{a ~ p(·|c)} [score(a, w, c)] )
Here warp plays the role of the paper's f, score plays its g, and the
sampler is the LM's own next-symbol distribution. Specialising
(warp, score) recovers existing processing measures; a toy sketch of the
surprisal case follows the equations below.
Equations
- Theories.Processing.PredictiveUncertainty.genSurprisal lm warp score c w = warp (∑ o : Option Voc, ((lm.next c) o).toReal * score o w c)
Instances For
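As a toy illustration of this collapse, here is a self-contained sketch over a bare mass function in place of the LM (the toy* names, the Mathlib import, and the simp set are illustrative assumptions, not definitions from this file):

import Mathlib.Analysis.SpecialFunctions.Log.Basic

variable {Voc : Type*} [Fintype Voc] [DecidableEq Voc]

-- Toy stand-in for genSurprisal: warp of an expected score under a bare
-- mass function p on Option Voc (none plays the end-of-string role).
def toyGenSurprisal (p : Option Voc → ℝ) (warp : ℝ → ℝ)
    (score : Option Voc → Voc → ℝ) (w : Voc) : ℝ :=
  warp (∑ o : Option Voc, p o * score o w)

-- Toy stand-in for indicatorScore.
def toyIndicator (o : Option Voc) (w : Voc) : ℝ :=
  if o = some w then 1 else 0

-- The indicator zeroes every summand except the target's, so the
-- expectation collapses to p (some w) and negLog yields surprisal.
example (p : Option Voc → ℝ) (w : Voc) :
    toyGenSurprisal p (fun x => -Real.log x) toyIndicator w
      = -Real.log (p (some w)) := by
  simp [toyGenSurprisal, toyIndicator, mul_ite, mul_one, mul_zero,
    Finset.sum_ite_eq']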
The indicator scoring function: 1 iff the alternative matches the target.
Equations
- Theories.Processing.PredictiveUncertainty.indicatorScore o w x✝ = if o = some w then 1 else 0
Instances For
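Continuing the toy sketch above, two spot checks of the indicator on a Bool vocabulary (illustrative only):

-- a matching alternative scores 1 ...
example : toyIndicator (some true) true = (1 : ℝ) := by
  simp [toyIndicator]

-- ... and the end-of-string alternative none never matches a target symbol
example : toyIndicator (none : Option Bool) true = (0 : ℝ) := by
  simp [toyIndicator]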
Denotation of a WarpingFn enum tag as a real function ℝ → ℝ.
This is the bridge that turns the symbolic enum in Config.lean into
actual mathematical content.
Equations
- Theories.Processing.PredictiveUncertainty.WarpingFn.negLog.denote = fun (x : ℝ) => -Real.log x
- Theories.Processing.PredictiveUncertainty.WarpingFn.identity.denote = id
Instances For
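A spot check of the two denotations (assuming the file's namespace is open and that the equations above fire under simp; the simp set is a guess):

-- negLog undoes exponentiated mass: -log (exp (-s)) = s
example (s : ℝ) : WarpingFn.negLog.denote (Real.exp (-s)) = s := by
  simp [WarpingFn.denote, Real.log_exp]

-- identity leaves the expectation untouched
example (x : ℝ) : WarpingFn.identity.denote x = x := rfl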
Denotation of a ScoringFn enum tag as a scoring function. The
.indicator case is fully concrete (= indicatorScore); the .distance
and .similarity cases are parametric in the user-supplied distance and
similarity functions, since the paper's framework abstracts over these.
Equations
- Theories.Processing.PredictiveUncertainty.ScoringFn.denote dist sim Theories.Processing.PredictiveUncertainty.ScoringFn.indicator = Theories.Processing.PredictiveUncertainty.indicatorScore
- Theories.Processing.PredictiveUncertainty.ScoringFn.denote dist sim Theories.Processing.PredictiveUncertainty.ScoringFn.distance = dist
- Theories.Processing.PredictiveUncertainty.ScoringFn.denote dist sim Theories.Processing.PredictiveUncertainty.ScoringFn.similarity = sim
Instances For
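Because .indicator discards both parameters, its denotation cannot depend on which dist and sim one supplies, and by the first equation above this should hold definitionally (a sketch; Ctx stands in for whatever context type the file uses):

example {Ctx : Type*} (dist dist' sim sim' : Option Voc → Voc → Ctx → ℝ) :
    ScoringFn.denote dist sim .indicator
      = ScoringFn.denote dist' sim' .indicator := rfl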
Denotation of a complete SurprisalConfig against a language model:
applies cfg.warp.denote and cfg.scoring.denote through genSurprisal.
The horizon and level fields are recorded labels and are currently
ignored at the denotation layer (the informationValue1 reduction below
is the only horizon-aware case formalised so far).
Equations
- cfg.applyTo lm dist sim c w = Theories.Processing.PredictiveUncertainty.genSurprisal lm cfg.warp.denote (Theories.Processing.PredictiveUncertainty.ScoringFn.denote dist sim cfg.scoring) c w
Instances For
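A hypothetical call shape, assuming standardSurprisal from Config.lean, a model lm : LangModel Voc, and Ctx again standing in for the context type; by the equations for applyTo and the two denote functions, the unfolding below should be definitional:

example {Ctx : Type*} (lm : LangModel Voc)
    (dist sim : Option Voc → Voc → Ctx → ℝ) (c : Ctx) (w : Voc) :
    standardSurprisal.applyTo lm dist sim c w
      = genSurprisal lm WarpingFn.negLog.denote indicatorScore c w := rfl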
Standard surprisal is the special case of genSurprisal with
warp = −log and score = indicator. Choosing these (warp, score)
collapses Eq. 3 to γ(w; c) = −log p(w | c) — i.e., classical surprisal
@cite{levy-2008}.
This is the non-trivial reduction theorem: it shows that the enum-level
claim standardSurprisal = (negLog, indicator, 1, predictive) in
Config.lean actually denotes the classical surprisal function on
language models, rather than being a definitional rfl.
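The expected shape of the statement, paraphrased from the description above (binder names and order are guesses; the actual proof lives in this file):

example {Ctx : Type*} (lm : LangModel Voc)
    (dist sim : Option Voc → Voc → Ctx → ℝ) (c : Ctx) (w : Voc) :
    genSurprisal lm standardSurprisal.warp.denote
        (ScoringFn.denote dist sim standardSurprisal.scoring) c w
      = lm.surprisal c w := by
  sorry -- shape only; see standardSurprisal_denotes_surprisal in this file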
Full enum-config reduction: applying the entire standardSurprisal
configuration (not just its warping field) to any LM yields classical
surprisal, regardless of which dist and sim parameters one chooses.
This is the enum-level companion to standardSurprisal_denotes_surprisal:
it shows that the whole enum tag tuple in Config.lean denotes correctly,
since .indicator ignores its dist/sim arguments.
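And the corresponding full-config shape (again a paraphrase, not the file's statement verbatim):

example {Ctx : Type*} (lm : LangModel Voc)
    (dist sim : Option Voc → Voc → Ctx → ℝ) (c : Ctx) (w : Voc) :
    standardSurprisal.applyTo lm dist sim c w = lm.surprisal c w := by
  sorry -- shape only; follows from the applyTo unfolding plus the theorem above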
Incremental information value at horizon 1: the expected distance between sampled next-symbols and the actual outcome.
This is the single-step specialisation of the IAS measure (@cite{giulianelli-etal-2026}'s V_{r,d,1}, Eq. 6). The full IAS generalises to horizon h ≥ 1 by sampling h-grams; we keep the h = 1 case here as the load-bearing definition (it is what is needed to recover surprisal as a special instance via choice of distance d).
The relationship to genSurprisal is purely structural: information
value is genSurprisal with warp = id and score = a distance.
Equations
- Theories.Processing.PredictiveUncertainty.informationValue1 lm d c w = ∑ o : Option Voc, ((lm.next c) o).toReal * d o w
Instances For
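Continuing the toy sketch, with the 0/1 mismatch distance the horizon-1 information value of a normalised mass function is one minus the target's mass (toy names and the simp set are again illustrative):

-- 0/1 mismatch distance: d(o, w) = 1 - 𝟙[o = some w]
def toyMismatch (o : Option Voc) (w : Voc) : ℝ :=
  1 - toyIndicator o w

-- toy horizon-1 information value: expected distance to the outcome
def toyInfoValue1 (p : Option Voc → ℝ) (d : Option Voc → Voc → ℝ)
    (w : Voc) : ℝ :=
  ∑ o : Option Voc, p o * d o w

-- for a normalised p, expected mismatch is 1 - p (some w)
example (p : Option Voc → ℝ) (w : Voc) (hp : ∑ o : Option Voc, p o = 1) :
    toyInfoValue1 p toyMismatch w = 1 - p (some w) := by
  simp [toyInfoValue1, toyMismatch, toyIndicator, mul_sub, mul_one, mul_ite,
    mul_zero, Finset.sum_sub_distrib, Finset.sum_ite_eq', hp]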
Information value at horizon 1 is genSurprisal with the identity
warping. The sampler is the LM, the scoring function is the distance,
and there is no warping — exactly the (identity, distance, 1, l)
instantiation of @cite{giulianelli-etal-2026}'s family.
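By the two displayed equation sets this should be definitional (a sketch; the lambda merely adapts d to the scoring signature by dropping the context argument):

example {Ctx : Type*} (lm : LangModel Voc) (d : Option Voc → Voc → ℝ)
    (c : Ctx) (w : Voc) :
    informationValue1 lm d c w
      = genSurprisal lm WarpingFn.identity.denote (fun o v _ => d o v) c w := rfl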
Full enum-config reduction for IAS (horizon-agnostic): the entire
informationValue h l configuration, applied to (lm, dist, sim), equals
informationValue1 lm dist', where dist' o w := dist o w c is the
user-supplied distance with the context c held fixed. The horizon
h and level l are recorded labels at this denotation layer; horizon
sensitivity enters only when one moves from informationValue1 to a
horizon-h sum over h-grams.
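Finally, the expected shape of that reduction (the ℕ and Level binder types are guesses, following the doc's description of the h and l arguments as a numeral and a level tag; the statement is a paraphrase):

example {Ctx : Type*} (lm : LangModel Voc)
    (dist sim : Option Voc → Voc → Ctx → ℝ) (c : Ctx) (w : Voc)
    (h : ℕ) (l : Level) :
    (informationValue h l).applyTo lm dist sim c w
      = informationValue1 lm (fun o v => dist o v c) c w := by
  sorry -- shape only; h and l are carried as labels at this layer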