Dependency Length Minimization #

Formalises the core quantity behind [FLG20]'s claim that natural languages minimise total dependency length beyond what independent constraints predict, together with [Beh09]'s "Oberstes Gesetz" threshold predicate. The two short-before-long arithmetic lemmas record the direction of the DLM savings at the lemma level; the per-paper example trees and cross-linguistic data sit in Studies/FutrellEtAl2020.lean, Studies/ArnoldEtAl2000.lean, and Studies/FedzechkinaEtAl2017.lean.

Main declarations #

depLength — linear distance |headIdx - depIdx| for a single dependency.
totalDepLength — sum of depLength over a tree's dependencies; the quantity DLM minimises.
oberstesGesetz — Behaghel's threshold predicate: every dependency has length ≤ threshold.
single_dep_direction_irrelevant — head-before-dependent and dependent-before-head yield equal length.

Implementation notes #

depLength is max - min rather than absolute value to stay in Nat. The consumers (Studies/FutrellEtAl2020, HarmonicOrder, EnhancedDependencies) open DepGrammar.DependencyLength to pick up depLength/totalDepLength.

Core quantities #

source

def DepGrammar.DependencyLength.depLength (d : Dependency) :

ℕ

Linear distance between head and dependent, |headIdx - depIdx|.

Equations

DepGrammar.DependencyLength.depLength d = max d.headIdx d.depIdx - min d.headIdx d.depIdx

Instances For

source

def DepGrammar.DependencyLength.totalDepLength (t : DepTree) :

ℕ

Total dependency length of a tree: the quantity minimised by DLM.

Equations

DepGrammar.DependencyLength.totalDepLength t = List.foldl (fun (acc : ℕ) (d : DepGrammar.Dependency) => acc + DepGrammar.DependencyLength.depLength d) 0 t.deps

Instances For

Behaghel's Oberstes Gesetz #

source

def DepGrammar.DependencyLength.oberstesGesetz (t : DepTree) (threshold : ℕ) :

Bool

[behaghel-1932]'s Oberstes Gesetz: every dependency has length at most threshold.

Equations

DepGrammar.DependencyLength.oberstesGesetz t threshold = t.deps.all fun (d : DepGrammar.Dependency) => decide (DepGrammar.DependencyLength.depLength d ≤ threshold)

Instances For

Short-before-long #

source

theorem DepGrammar.DependencyLength.short_before_long_minimizes (s1 s2 : ℕ) (h : s1 ≤ s2) :

1 + (s1 + 2) ≤ 1 + (s2 + 2)

Smaller subtree closer to the head has no greater contribution to dep length than larger subtree closer. Base arithmetic case: subtree of size s1 at distance 1 versus size s2 at distance s1 + 2.

source

theorem DepGrammar.DependencyLength.short_before_long_savings (s1 s2 : ℕ) :