Documentation

Linglib.Processing.Expectation.PrefixProbability

Prefix probabilities of a generative process #

A generative process — a distribution P : PMF T over complete structures together with their yields str : T → List W — induces probabilities on strings: prefixMass is the total mass of the structures whose yield extends a given prefix ([Sto95]'s prefix probability, computed there by Earley parsing), and nextProb the conditional word probability it induces. This is the shared input type of surprisal theory: [Hal01] defines word difficulty as the log-ratio of successive prefix probabilities, and [Lev08] proves that ratio equals the relative entropy of the induced belief update (Studies/Hale2001.lean, Studies/Levy2008.lean).

Main definitions #

Main results #

def Processing.Expectation.consistent {T : Type u_1} {W : Type u_2} (str : TList W) (ws : List W) :
Set T

The complete structures whose yield extends the prefix ws.

Equations
Instances For
    theorem Processing.Expectation.consistent_anti {T : Type u_1} {W : Type u_2} (str : TList W) {ws ws' : List W} (h : ws <+: ws') :
    consistent str ws' consistent str ws

    Longer prefixes select fewer structures.

    noncomputable def Processing.Expectation.prefixMass {T : Type u_1} {W : Type u_2} (P : PMF T) (str : TList W) (ws : List W) :
    ENNReal

    The prefix probability: total mass of the structures whose yield extends the prefix.

    Equations
    Instances For
      @[simp]
      theorem Processing.Expectation.prefixMass_nil {T : Type u_1} {W : Type u_2} (P : PMF T) (str : TList W) :
      prefixMass P str [] = 1
      theorem Processing.Expectation.prefixMass_anti {T : Type u_1} {W : Type u_2} (P : PMF T) (str : TList W) {ws ws' : List W} (h : ws <+: ws') :
      prefixMass P str ws' prefixMass P str ws

      Longer prefixes carry less mass.

      noncomputable def Processing.Expectation.nextProb {T : Type u_1} {W : Type u_2} (P : PMF T) (str : TList W) (ws : List W) (w : W) :
      ENNReal

      The conditional probability of the next word induced by the generative process.

      Equations
      Instances For