Documentation

Linglib.Syntax.DependencyGrammar.Formal.DependencyLength

Dependency Length Minimization #

Formalises the core quantity behind [FLG20]'s claim that natural languages minimise total dependency length beyond what independent constraints predict, together with [Beh09]'s "Oberstes Gesetz" threshold predicate. The two short-before-long arithmetic lemmas record the direction of the DLM savings at the lemma level; the per-paper example trees and cross-linguistic data sit in Studies/FutrellEtAl2020.lean, Studies/ArnoldEtAl2000.lean, and Studies/FedzechkinaEtAl2017.lean.

Main declarations #

Implementation notes #

depLength is max - min rather than absolute value to stay in Nat. The consumers (Studies/FutrellEtAl2020, HarmonicOrder, EnhancedDependencies) open DepGrammar.DependencyLength to pick up depLength/totalDepLength.

Core quantities #

Linear distance between head and dependent, |headIdx - depIdx|.

Equations
Instances For

    Total dependency length of a tree: the quantity minimised by DLM.

    Equations
    Instances For

      Behaghel's Oberstes Gesetz #

      def DepGrammar.DependencyLength.oberstesGesetz (t : DepTree) (threshold : ) :
      Bool

      [behaghel-1932]'s Oberstes Gesetz: every dependency has length at most threshold.

      Equations
      Instances For

        Short-before-long #

        theorem DepGrammar.DependencyLength.short_before_long_minimizes (s1 s2 : ) (h : s1 s2) :
        1 + (s1 + 2) 1 + (s2 + 2)

        Smaller subtree closer to the head has no greater contribution to dep length than larger subtree closer. Base arithmetic case: subtree of size s1 at distance 1 versus size s2 at distance s1 + 2.

        theorem DepGrammar.DependencyLength.short_before_long_savings (s1 s2 : ) :
        s1 s21 + (s2 + 2) - (1 + (s1 + 2)) = s2 - s1

        The arithmetic savings of placing the smaller subtree first.

        Symmetry #

        theorem DepGrammar.DependencyLength.single_dep_direction_irrelevant (h d : ) :
        depLength { headIdx := h, depIdx := d, depType := UD.DepRel.nsubj } = depLength { headIdx := d, depIdx := h, depType := UD.DepRel.nsubj }

        Head and dependent are interchangeable in depLength.

        theorem DepGrammar.DependencyLength.depLength_symmetric (h d : ) (r : UD.DepRel) :
        depLength { headIdx := h, depIdx := d, depType := r } = depLength { headIdx := d, depIdx := h, depType := r }

        depLength ignores the dependency relation.

        theorem DepGrammar.DependencyLength.adjacent_dep_length (h : ) :
        depLength { headIdx := h, depIdx := h + 1, depType := UD.DepRel.nsubj } = 1

        An adjacent dependency has length 1.

        theorem DepGrammar.DependencyLength.self_loop_length (i : ) :
        depLength { headIdx := i, depIdx := i, depType := UD.DepRel.nsubj } = 0

        A self-loop has length 0.

        theorem DepGrammar.DependencyLength.empty_tree_dep_length :
        totalDepLength { words := [], deps := [], rootIdx := 0 } = 0

        An empty tree has total length 0.