Language Model as a Markov Kernel #
An autoregressive language model over a vocabulary Voc is a Markov
kernel List Voc → PMF (Option Voc) from contexts to next-symbol
distributions, with none denoting end-of-string.
This is the smallest cross-cutting primitive shared by everything in
Theories/Processing/: surprisal-based theories, the IAS family
(@cite{giulianelli-etal-2026}), and any downstream measure that wants
a unified notion of "the LM's predictive distribution at a context".
PMF is mathlib's probability monad
(PMF α := { f : α → ℝ≥0∞ // HasSum f 1 }, mass functions summing to 1,
whose support is then automatically countable); using it here gives the
language-model layer the canonical Markov-kernel typing without
imposing [Fintype Voc] (vocabularies like String or token streams
need not be finite).
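As a quick illustration (a sketch against mathlib's PMF API, not code
from this file), the kernel typing works even for an infinite
vocabulary such as String:

```lean
import Mathlib.Probability.ProbabilityMassFunction.Constructions

-- `String` is not a `Fintype`, yet `PMF (Option String)` typechecks:
-- `PMF` only requires that the mass function sum to 1 (its support is
-- then automatically countable).
noncomputable example : PMF (Option String) :=
  PMF.pure (some "the")

-- The kernel view: any function from contexts to next-symbol
-- distributions has the Markov-kernel shape used here.
noncomputable example : List String → PMF (Option String) :=
  fun _ => PMF.pure none
```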
Main definitions #
- LangModel Voc: kernel List Voc → PMF (Option Voc)
- LangModel.nextProb: conditional probability of a single symbol
- LangModel.surprisal: −log p(w | c), in nats (@cite{levy-2008})
An autoregressive language model over a vocabulary Voc,
expressed as a Markov kernel from contexts to next-symbol distributions.
A draw of none denotes end-of-string.
- next : List Voc → PMF (Option Voc)
Conditional distribution over Option Voc given a context.
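For concreteness, here is a hedged toy instance (illustrative only:
trivialLM is not part of this file, and the nextProb shape below is an
assumption about the actual code, namely plain kernel application
through PMF's function coercion):

```lean
import Mathlib.Probability.ProbabilityMassFunction.Constructions

open scoped ENNReal

-- Restated from the field listing above so the snippet is self-contained.
structure LangModel (Voc : Type*) where
  next : List Voc → PMF (Option Voc)

-- Illustrative only: a degenerate LM that ends the string at every
-- context, putting all mass on `none`.
noncomputable def trivialLM (Voc : Type*) : LangModel Voc where
  next := fun _ => PMF.pure none

-- `nextProb` is plausibly kernel application through the
-- `PMF (Option Voc) → (Option Voc → ℝ≥0∞)` coercion (an assumption,
-- not the file's verbatim definition).
example {Voc : Type*} (M : LangModel Voc) (c : List Voc)
    (w : Option Voc) : ℝ≥0∞ :=
  M.next c w
```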
Surprisal of the next symbol w given context c (in nats).
This is the classical Shannon information content (@cite{levy-2008}).
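In symbols, with p(w | c) the next-symbol probability, surprisal is
−log p(w | c). A minimal sketch of a definition matching this
docstring, reusing the LangModel sketch restated above (the name
surprisal' and the toReal conversion are assumptions, not necessarily
the file's code):

```lean
-- In addition to the imports above:
import Mathlib.Analysis.SpecialFunctions.Log.Basic

-- Sketch: surprisal in nats, −log of the next-symbol probability,
-- moving from ℝ≥0∞ to ℝ via `ENNReal.toReal` before taking `Real.log`.
noncomputable def surprisal' {Voc : Type*} (M : LangModel Voc)
    (c : List Voc) (w : Option Voc) : ℝ :=
  -Real.log (M.next c w).toReal
```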