Levy (2008): expectation-based syntactic comprehension
[Lev08] (Cognition 106, 1126–1177) derives a resource-allocation theory of processing difficulty: a comprehender maintains a probability distribution over the complete structures consistent with the input so far, and the difficulty of a word is the relative entropy of the updated distribution with respect to the old one. The paper's central result (its eq. (4)) is that this difficulty is exactly the word's surprisal — [Hal01]'s theory — for any generative process over structures, making surprisal a causal bottleneck (§2.3): the structural representations affect predicted difficulty only through the conditional word probabilities they determine.
Main definitions
- posterior: the comprehender's distribution given a prefix, i.e. the prior conditioned on consistency (eq. (3)). The prefix substrate (consistent, prefixMass, nextProb) lives in Processing.Expectation.PrefixProbability.
Main results
- posterior_incremental: incremental update equals direct conditioning (eqs. (5)–(8)), via PMF.filter_filter.
- klDiv_posterior_eq_surprisal: eq. (4), the relative entropy of the update equals the surprisal of the word, via PMF.klDiv_filter_self.
- klDiv_posterior_eq_lm_surprisal: the difficulty read through any LangModel matching the process's conditional word probabilities.
- bottleneck: §2.3, processes agreeing on conditional word probabilities incur identical update difficulty, whatever their structures.
Implementation notes
Structures live in an arbitrary discrete measurable space — PMF handles the
paper's "normally infinite" 𝒯 (the theorem is measure-level via
ProbabilityTheory.klDiv_cond_self and the PMF.filter–Measure.cond
bridge). The prior P is fixed throughout, matching the paper's caveat that
the equivalence holds only when extra-sentential context does not change while
the word is processed.
Pᵢ (eq. (3)): the comprehender's distribution over complete structures
given the prefix — the prior conditioned on consistency.
Equations
- Levy2008.posterior P str ws h = P.filter (Processing.Expectation.consistent str ws) h
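As a rough illustration of this definition, the following Python sketch conditions a finite toy prior on consistency with a prefix. Everything in it is an assumption made for the example: structures are (label, words) pairs, str_of returns the word sequence, and consistent and posterior are toy analogues of the Lean declarations, not translations of them. The zero-mass check only loosely mirrors the role of the hypothesis h.

```python
from fractions import Fraction

def str_of(structure):
    """Toy assumption: a structure is a (label, words) pair; its string is the words."""
    return structure[1]

def consistent(structure, prefix):
    """True when the structure's string starts with the given prefix."""
    return str_of(structure)[:len(prefix)] == tuple(prefix)

def posterior(prior, prefix):
    """Restrict the prior to structures consistent with the prefix and renormalise."""
    mass = sum(p for t, p in prior.items() if consistent(t, prefix))
    if mass == 0:
        # No structure in the support is consistent with the prefix.
        raise ValueError("prefix has zero probability under the prior")
    return {t: p / mass for t, p in prior.items() if consistent(t, prefix)}

# Hypothetical three-structure prior, chosen only for illustration.
prior = {
    ("t1", ("the", "dog", "barked")): Fraction(1, 2),
    ("t2", ("the", "dog", "slept")): Fraction(1, 4),
    ("t3", ("the", "cat", "slept")): Fraction(1, 4),
}

post = posterior(prior, ["the", "dog"])
# t1 -> 2/3, t2 -> 1/3; t3 is inconsistent with the prefix and is dropped.
```

Exact fractions are used so that the renormalisation can be checked by hand: the consistent mass is 3/4, and each surviving probability is divided by it.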
Incremental update equals direct conditioning (eqs. (5)–(8)): filtering the current posterior by consistency with the extended prefix is conditioning the prior on the extended prefix directly.
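The statement above can be checked by a short calculation. The notation is assumed here for the sketch and is not taken from the formalization: C_i is the set of structures consistent with the first i words, P is the prior, and P_i is the posterior after i words. The only facts used are that C_{i+1} ⊆ C_i and that P(C_{i+1}) > 0.

```latex
\begin{align*}
P_i(T) &= \frac{P(T)\,\mathbf{1}[T \in C_i]}{P(C_i)}, \\
P_i(T \mid C_{i+1})
  &= \frac{P_i(T)\,\mathbf{1}[T \in C_{i+1}]}{P_i(C_{i+1})}
   = \frac{P(T)\,\mathbf{1}[T \in C_i]\,\mathbf{1}[T \in C_{i+1}] / P(C_i)}
          {P(C_i \cap C_{i+1}) / P(C_i)} \\
  &= \frac{P(T)\,\mathbf{1}[T \in C_{i+1}]}{P(C_{i+1})}
   = P_{i+1}(T).
\end{align*}
```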
Eq. (4): the relative entropy of the updated distribution over structures with respect to the pre-update distribution is the surprisal of the word that triggered the update.
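A sketch of why this holds, in notation assumed for the example rather than taken from the formalization: C_{i+1} is the set of structures consistent with the extended prefix, P_i and P_{i+1} are the posteriors before and after the word w_{i+1}, and the conditional word probability is taken to be the posterior mass P_i(C_{i+1}). For every T in C_{i+1} the ratio P_{i+1}(T) / P_i(T) is the same constant, so the sum collapses.

```latex
\begin{align*}
D(P_{i+1} \,\|\, P_i)
  &= \sum_{T \in C_{i+1}} P_{i+1}(T)\,\log\frac{P_{i+1}(T)}{P_i(T)}
  && \text{($P_{i+1}$ vanishes outside $C_{i+1}$)} \\
  &= \sum_{T \in C_{i+1}} P_{i+1}(T)\,\log\frac{1}{P_i(C_{i+1})}
  && \text{(since $P_{i+1}(T) = P_i(T) / P_i(C_{i+1})$ on $C_{i+1}$)} \\
  &= -\log P_i(C_{i+1})
  && \text{($P_{i+1}$ sums to $1$)} \\
  &= -\log P(w_{i+1} \mid w_1 \dots w_i).
\end{align*}
```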
Read through any language model whose conditional word probabilities match the process's, the update difficulty is that model's surprisal.
Causal bottleneck (§2.3, Fig. 1b): two generative processes assigning the same conditional word probability incur the same update difficulty, regardless of their structural representations.
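A small numerical sketch of the bottleneck, under toy assumptions: structures are (label, words) pairs, both processes are finite, and prefix_mass, posterior, update_difficulty and surprisal are hypothetical helper names written for this example. Process B splits one string across two structures that process A does not distinguish, yet both assign the same conditional probability to the next word.

```python
import math

def str_of(structure):
    """Toy assumption: a structure is a (label, words) pair."""
    return structure[1]

def prefix_mass(prior, prefix):
    """Total prior mass of structures whose string starts with the prefix."""
    n = len(prefix)
    return sum(p for t, p in prior.items() if str_of(t)[:n] == tuple(prefix))

def posterior(prior, prefix):
    """Prior restricted to structures consistent with the prefix, renormalised."""
    n = len(prefix)
    mass = prefix_mass(prior, prefix)
    return {t: p / mass for t, p in prior.items()
            if str_of(t)[:n] == tuple(prefix)}

def update_difficulty(prior, prefix, word):
    """Relative entropy of the updated posterior with respect to the old one."""
    old = posterior(prior, prefix)
    new = posterior(prior, prefix + [word])
    # Every structure in the new support is also in the old support.
    return sum(p * math.log(p / old[t]) for t, p in new.items())

def surprisal(prior, prefix, word):
    """Negative log of the conditional word probability."""
    return -math.log(prefix_mass(prior, prefix + [word]) / prefix_mass(prior, prefix))

# Process A: one structure per string.
process_a = {("s1", ("a", "b")): 0.5, ("s2", ("a", "c")): 0.5}
# Process B: the string "a b" is structurally ambiguous.
process_b = {("t1", ("a", "b")): 0.25, ("t2", ("a", "b")): 0.25,
             ("t3", ("a", "c")): 0.5}

diff_a = update_difficulty(process_a, ["a"], "b")
diff_b = update_difficulty(process_b, ["a"], "b")
```

In both processes the word "b" has conditional probability 1/2 after "a", so both difficulties come out at log 2 (up to floating-point error) and agree with the surprisal, even though the posteriors over structures differ.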