Documentation

Linglib.Studies.Levy2008

Levy (2008): expectation-based syntactic comprehension #

[Lev08] (Cognition 106, 1126–1177) derives a resource-allocation theory of processing difficulty: a comprehender maintains a probability distribution over the complete structures consistent with the input so far, and the difficulty of a word is the relative entropy of the updated distribution with respect to the old one. The paper's central result (its eq. (4)) is that this difficulty is exactly the word's surprisal — [Hal01]'s theory — for any generative process over structures, making surprisal a causal bottleneck (§2.3): the structural representations affect predicted difficulty only through the conditional word probabilities they determine.

Main definitions #

Main results #

Implementation notes #

Structures live in an arbitrary discrete measurable space — PMF handles the paper's "normally infinite" 𝒯 (the theorem is measure-level via ProbabilityTheory.klDiv_cond_self and the PMF.filterMeasure.cond bridge). The prior P is fixed throughout, matching the paper's caveat that the equivalence holds only when extra-sentential context does not change while the word is processed.

noncomputable def Levy2008.posterior {T : Type u_1} {W : Type u_2} (P : PMF T) (str : TList W) (ws : List W) (h : tProcessing.Expectation.consistent str ws, t P.support) :
PMF T

Pᵢ (eq. (3)): the comprehender's distribution over complete structures given the prefix — the prior conditioned on consistency.

Equations
Instances For
    theorem Levy2008.posterior_incremental {T : Type u_1} {W : Type u_2} (P : PMF T) (str : TList W) (ws : List W) (w : W) (hws : tProcessing.Expectation.consistent str ws, t P.support) (hnext : tProcessing.Expectation.consistent str (ws ++ [w]), t P.support) :
    (posterior P str ws hws).filter (Processing.Expectation.consistent str (ws ++ [w])) = posterior P str (ws ++ [w]) hnext

    Incremental update equals direct conditioning (eqs. (5)–(8)): filtering the current posterior by consistency with the extended prefix is conditioning the prior on the extended prefix directly.

    theorem Levy2008.klDiv_posterior_eq_surprisal {T : Type u_1} {W : Type u_2} (P : PMF T) (str : TList W) (ws : List W) (w : W) [MeasurableSpace T] [DiscreteMeasurableSpace T] (hws : tProcessing.Expectation.consistent str ws, t P.support) (hnext : tProcessing.Expectation.consistent str (ws ++ [w]), t P.support) :
    (posterior P str (ws ++ [w]) hnext).klDiv (posterior P str ws hws) = ENNReal.ofReal (-Real.log (Processing.Expectation.nextProb P str ws w).toReal)

    Eq. (4): the relative entropy of the updated distribution over structures with respect to the pre-update distribution is the surprisal of the word that triggered the update.

    theorem Levy2008.klDiv_posterior_eq_lm_surprisal {T : Type u_1} {W : Type u_2} (P : PMF T) (str : TList W) (ws : List W) (w : W) [MeasurableSpace T] [DiscreteMeasurableSpace T] (lm : Processing.LanguageModel.LangModel W) (hlm : lm.nextProb ws w = Processing.Expectation.nextProb P str ws w) (hws : tProcessing.Expectation.consistent str ws, t P.support) (hnext : tProcessing.Expectation.consistent str (ws ++ [w]), t P.support) :
    (posterior P str (ws ++ [w]) hnext).klDiv (posterior P str ws hws) = ENNReal.ofReal (lm.surprisal ws w)

    The update difficulty read through any language model that matches the process's conditional word probabilities is that model's surprisal.

    theorem Levy2008.bottleneck {T : Type u_1} {W : Type u_2} (P : PMF T) (str : TList W) (ws : List W) (w : W) [MeasurableSpace T] [DiscreteMeasurableSpace T] {T' : Type u_3} [MeasurableSpace T'] [DiscreteMeasurableSpace T'] (P' : PMF T') (str' : T'List W) (hagree : Processing.Expectation.nextProb P str ws w = Processing.Expectation.nextProb P' str' ws w) (hws : tProcessing.Expectation.consistent str ws, t P.support) (hnext : tProcessing.Expectation.consistent str (ws ++ [w]), t P.support) (hws' : tProcessing.Expectation.consistent str' ws, t P'.support) (hnext' : tProcessing.Expectation.consistent str' (ws ++ [w]), t P'.support) :
    (posterior P str (ws ++ [w]) hnext).klDiv (posterior P str ws hws) = (posterior P' str' (ws ++ [w]) hnext').klDiv (posterior P' str' ws hws')

    Causal bottleneck (§2.3, Fig. 1b): two generative processes assigning the same conditional word probability incur the same update difficulty, regardless of their structural representations.