Language Model as a Markov Kernel #

An autoregressive language model over a vocabulary Voc is a Markov kernel List Voc → PMF (Option Voc) from contexts to next-symbol distributions, with none denoting end-of-string.

This is the smallest cross-cutting primitive shared by everything in Processing/: surprisal-based theories, the IAS family ([GWCF26]), and any downstream measure that wants a unified notion of "the LM's predictive distribution at a context".

PMF is mathlib's probability mass function monad (PMF α := { f : α → ℝ≥0∞ // HasSum f 1 }). It is defined over any type α, and the HasSum condition forces each distribution to have countable support. Using it here gives the language-model layer the canonical Markov-kernel typing without imposing [Fintype Voc]: vocabularies such as String or token streams need not be finite.

Main definitions #

LangModel Voc: kernel List Voc → PMF (Option Voc)
LangModel.nextProb: conditional probability of a single symbol
LangModel.surprisal: −log p(w | c), in nats ([Lev08])

structure Processing.LanguageModel.LangModel (Voc : Type u_1) : Type u_1

An autoregressive language model over a vocabulary Voc, expressed as a Markov kernel from contexts to next-symbol distributions. A draw of none denotes end-of-string.

next : List Voc → PMF (Option Voc)
Conditional distribution over Option Voc given a context.
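
To make the kernel typing concrete, here is a minimal sketch of building a LangModel value. The structure is restated so the snippet stands alone. The Sketch namespace and the stop model are hypothetical names introduced only for illustration and are not part of the library. The snippet assumes a recent mathlib and has not been checked against a specific version.

```lean
import Mathlib.Probability.ProbabilityMassFunction.Monad

namespace Sketch

/-- Restated from this page so the snippet is self-contained. -/
structure LangModel (Voc : Type*) where
  /-- Conditional distribution over `Option Voc` given a context. -/
  next : List Voc → PMF (Option Voc)

/-- Hypothetical example: a degenerate model that puts all mass on
end-of-string (`none`) at every context. -/
noncomputable def LangModel.stop (Voc : Type*) : LangModel Voc where
  next _ := PMF.pure none

/-- No `Fintype` instance is required, so `String` can serve as a vocabulary. -/
noncomputable example : LangModel String := LangModel.stop String

end Sketch
```

The definition is marked noncomputable because PMF.pure is itself noncomputable in mathlib.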

def Processing.LanguageModel.LangModel.nextProb {Voc : Type u_1} (lm : LangModel Voc) (c : List Voc) (w : Voc) : ENNReal

Conditional probability of the next symbol w given context c.

Equations

lm.nextProb c w = (lm.next c) (some w)
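
Because nextProb is a point evaluation of a PMF, the standard PMF bounds carry over directly. The sketch below restates the definitions and derives two such bounds. The lemma names nextProb_le_one and nextProb_ne_top are hypothetical and are not claimed to exist in the library; PMF.coe_le_one and PMF.apply_ne_top are the mathlib lemmas relied on.

```lean
import Mathlib.Probability.ProbabilityMassFunction.Basic

namespace Sketch

/-- Restated from this page so the snippet is self-contained. -/
structure LangModel (Voc : Type*) where
  next : List Voc → PMF (Option Voc)

/-- Restated from this page: probability of `some w` under `lm.next c`. -/
noncomputable def LangModel.nextProb {Voc : Type*}
    (lm : LangModel Voc) (c : List Voc) (w : Voc) : ENNReal :=
  (lm.next c) (some w)

/-- Hypothetical lemma: a next-symbol probability never exceeds 1. -/
theorem LangModel.nextProb_le_one {Voc : Type*}
    (lm : LangModel Voc) (c : List Voc) (w : Voc) :
    lm.nextProb c w ≤ 1 :=
  (lm.next c).coe_le_one (some w)

/-- Hypothetical lemma: a next-symbol probability is finite, so converting
it with `ENNReal.toReal` loses no information. -/
theorem LangModel.nextProb_ne_top {Voc : Type*}
    (lm : LangModel Voc) (c : List Voc) (w : Voc) :
    lm.nextProb c w ≠ ⊤ :=
  (lm.next c).apply_ne_top (some w)

end Sketch
```

The sketch marks nextProb noncomputable as a precaution; the page lists the library definition as a plain def.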

noncomputable def Processing.LanguageModel.LangModel.surprisal {Voc : Type u_1} (lm : LangModel Voc) (c : List Voc) (w : Voc) : ℝ

Surprisal of the next symbol w given context c (in nats). This is the classical Shannon information content [Lev08].

Equations

lm.surprisal c w = -Real.log (lm.nextProb c w).toReal
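
One consequence of this equation is worth spelling out. mathlib sets Real.log 0 = 0, so under this real-valued definition a symbol with probability 0 receives surprisal 0 rather than +∞, the same value as a symbol with probability 1. The sketch below restates the definitions and checks both boundary cases. The Sketch namespace is hypothetical, and the simp proofs assume a recent mathlib and have not been checked against a specific version.

```lean
import Mathlib.Probability.ProbabilityMassFunction.Basic
import Mathlib.Analysis.SpecialFunctions.Log.Basic

namespace Sketch

/-- Restated from this page so the snippet is self-contained. -/
structure LangModel (Voc : Type*) where
  next : List Voc → PMF (Option Voc)

noncomputable def LangModel.nextProb {Voc : Type*}
    (lm : LangModel Voc) (c : List Voc) (w : Voc) : ENNReal :=
  (lm.next c) (some w)

noncomputable def LangModel.surprisal {Voc : Type*}
    (lm : LangModel Voc) (c : List Voc) (w : Voc) : ℝ :=
  -Real.log (lm.nextProb c w).toReal

/-- A certain symbol (probability 1) has surprisal 0, since `Real.log 1 = 0`. -/
example {Voc : Type*} (lm : LangModel Voc) (c : List Voc) (w : Voc)
    (h : lm.nextProb c w = 1) : lm.surprisal c w = 0 := by
  simp [LangModel.surprisal, h]

/-- An impossible symbol (probability 0) also has surprisal 0 here,
because mathlib defines `Real.log 0 = 0`. -/
example {Voc : Type*} (lm : LangModel Voc) (c : List Voc) (w : Voc)
    (h : lm.nextProb c w = 0) : lm.surprisal c w = 0 := by
  simp [LangModel.surprisal, h]

end Sketch
```

Downstream measures that need to tell impossible continuations apart from certain ones may therefore want to test nextProb directly instead of relying on surprisal alone.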

Linglib.Processing.Expectation.LanguageModel
