Dependency Length Minimization #
Formalises the core quantity behind [FLG20]'s claim that
natural languages minimise total dependency length beyond what independent
constraints predict, together with [Beh09]'s "Oberstes Gesetz"
threshold predicate. The two short-before-long arithmetic lemmas record the
direction of the DLM savings at the lemma level; the per-paper example trees
and cross-linguistic data sit in Studies/FutrellEtAl2020.lean,
Studies/ArnoldEtAl2000.lean, and Studies/FedzechkinaEtAl2017.lean.
Main declarations #
depLength— linear distance|headIdx - depIdx|for a single dependency.totalDepLength— sum ofdepLengthover a tree's dependencies; the quantity DLM minimises.oberstesGesetz— Behaghel's threshold predicate: every dependency has length ≤threshold.single_dep_direction_irrelevant— head-before-dependent and dependent-before-head yield equal length.
Implementation notes #
depLength is max - min rather than absolute value to stay in Nat. The
consumers (Studies/FutrellEtAl2020, HarmonicOrder, EnhancedDependencies)
open DepGrammar.DependencyLength to pick up depLength/totalDepLength.
Core quantities #
Linear distance between head and dependent, |headIdx - depIdx|.
Equations
- DepGrammar.DependencyLength.depLength d = max d.headIdx d.depIdx - min d.headIdx d.depIdx
Instances For
Total dependency length of a tree: the quantity minimised by DLM.
Equations
- DepGrammar.DependencyLength.totalDepLength t = List.foldl (fun (acc : ℕ) (d : DepGrammar.Dependency) => acc + DepGrammar.DependencyLength.depLength d) 0 t.deps
Instances For
Behaghel's Oberstes Gesetz #
[behaghel-1932]'s Oberstes Gesetz: every dependency has length at
most threshold.
Equations
- DepGrammar.DependencyLength.oberstesGesetz t threshold = t.deps.all fun (d : DepGrammar.Dependency) => decide (DepGrammar.DependencyLength.depLength d ≤ threshold)
Instances For
Short-before-long #
Smaller subtree closer to the head has no greater contribution to dep
length than larger subtree closer. Base arithmetic case: subtree of size s1
at distance 1 versus size s2 at distance s1 + 2.
The arithmetic savings of placing the smaller subtree first.
Symmetry #
Head and dependent are interchangeable in depLength.
An adjacent dependency has length 1.
A self-loop has length 0.
An empty tree has total length 0.