Levy (2008): expectation-based syntactic comprehension #

[Lev08] (Cognition 106, 1126–1177) derives a resource-allocation theory of processing difficulty: a comprehender maintains a probability distribution over the complete structures consistent with the input so far, and the difficulty of a word is the relative entropy of the updated distribution with respect to the old one. The paper's central result (its eq. (4)) is that this difficulty is exactly the word's surprisal — [Hal01]'s theory — for any generative process over structures, making surprisal a causal bottleneck (§2.3): the structural representations affect predicted difficulty only through the conditional word probabilities they determine.

Main definitions #

posterior — the comprehender's distribution given a prefix: the prior conditioned on consistency (eq. (3)). The prefix substrate (consistent, prefixMass, nextProb) lives in Processing.Expectation.PrefixProbability.

Main results #

posterior_incremental — incremental update equals direct conditioning (eqs. (5)–(8)), via PMF.filter_filter.
klDiv_posterior_eq_surprisal — eq. (4): the relative entropy of the update equals the surprisal of the word, via PMF.klDiv_filter_self.
klDiv_posterior_eq_lm_surprisal — the difficulty read through any LangModel matching the process's conditional word probabilities.
bottleneck — §2.3: processes agreeing on conditional word probabilities incur identical update difficulty, whatever their structures.

Implementation notes #

Structures live in an arbitrary discrete measurable space — PMF handles the paper's "normally infinite" 𝒯 (the theorem is measure-level via ProbabilityTheory.klDiv_cond_self and the PMF.filter–Measure.cond bridge). The prior P is fixed throughout, matching the paper's caveat that the equivalence holds only when extra-sentential context does not change while the word is processed.

source

noncomputable def Levy2008.posterior {T : Type u_1} {W : Type u_2} (P : PMF T) (str : T → List W) (ws : List W) (h : ∃ t ∈ Processing.Expectation.consistent str ws, t ∈ P.support) :

PMF T

Pᵢ (eq. (3)): the comprehender's distribution over complete structures given the prefix — the prior conditioned on consistency.

Equations

Levy2008.posterior P str ws h = P.filter (Processing.Expectation.consistent str ws) h

Instances For

source

theorem Levy2008.posterior_incremental {T : Type u_1} {W : Type u_2} (P : PMF T) (str : T → List W) (ws : List W) (w : W) (hws : ∃ t ∈ Processing.Expectation.consistent str ws, t ∈ P.support) (hnext : ∃ t ∈ Processing.Expectation.consistent str (ws ++ [w]), t ∈ P.support) :

(posterior P str ws hws).filter (Processing.Expectation.consistent str (ws ++ [w])) ⋯ = posterior P str (ws ++ [w]) hnext

Incremental update equals direct conditioning (eqs. (5)–(8)): filtering the current posterior by consistency with the extended prefix is conditioning the prior on the extended prefix directly.

source

theorem Levy2008.klDiv_posterior_eq_surprisal {T : Type u_1} {W : Type u_2} (P : PMF T) (str : T → List W) (ws : List W) (w : W) [MeasurableSpace T] [DiscreteMeasurableSpace T] (hws : ∃ t ∈ Processing.Expectation.consistent str ws, t ∈ P.support) (hnext : ∃ t ∈ Processing.Expectation.consistent str (ws ++ [w]), t ∈ P.support) :

(posterior P str (ws ++ [w]) hnext).klDiv (posterior P str ws hws) = ENNReal.ofReal (-Real.log (Processing.Expectation.nextProb P str ws w).toReal)

Eq. (4): the relative entropy of the updated distribution over structures with respect to the pre-update distribution is the surprisal of the word that triggered the update.

source

theorem Levy2008.klDiv_posterior_eq_lm_surprisal {T : Type u_1} {W : Type u_2} (P : PMF T) (str : T → List W) (ws : List W) (w : W) [MeasurableSpace T] [DiscreteMeasurableSpace T] (lm : Processing.LanguageModel.LangModel W) (hlm : lm.nextProb ws w = Processing.Expectation.nextProb P str ws w) (hws : ∃ t ∈ Processing.Expectation.consistent str ws, t ∈ P.support) (hnext : ∃ t ∈ Processing.Expectation.consistent str (ws ++ [w]), t ∈ P.support) :

(posterior P str (ws ++ [w]) hnext).klDiv (posterior P str ws hws) = ENNReal.ofReal (lm.surprisal ws w)

The update difficulty read through any language model that matches the process's conditional word probabilities is that model's surprisal.

source

theorem Levy2008.bottleneck {T : Type u_1} {W : Type u_2} (P : PMF T) (str : T → List W) (ws : List W) (w : W) [MeasurableSpace T] [DiscreteMeasurableSpace T] {T' : Type u_3} [MeasurableSpace T'] [DiscreteMeasurableSpace T'] (P' : PMF T') (str' : T' → List W) (hagree : Processing.Expectation.nextProb P str ws w = Processing.Expectation.nextProb P' str' ws w) (hws : ∃ t ∈ Processing.Expectation.consistent str ws, t ∈ P.support) (hnext : ∃ t ∈ Processing.Expectation.consistent str (ws ++ [w]), t ∈ P.support) (hws' : ∃ t ∈ Processing.Expectation.consistent str' ws, t ∈ P'.support) (hnext' : ∃ t ∈ Processing.Expectation.consistent str' (ws ++ [w]), t ∈ P'.support) :

(posterior P str (ws ++ [w]) hnext).klDiv (posterior P str ws hws) = (posterior P' str' (ws ++ [w]) hnext').klDiv (posterior P' str' ws hws')

Causal bottleneck (§2.3, Fig. 1b): two generative processes assigning the same conditional word probability incur the same update difficulty, regardless of their structural representations.

Documentation

Linglib.Studies.Levy2008

Levy (2008): expectation-based syntactic comprehension #

Main definitions #

Main results #

Implementation notes #