[FLG20]: Crosslinguistic Dependency Length Data
Empirical data from Table 2 of [FLG20] "Dependency locality as an explanatory principle for word order", Language 96(2):371–412.
53 languages from Universal Dependencies corpora, measuring:
- Proportion of head-final dependencies
- Mean dependency length at sentence lengths 10, 15, 20
All values are scaled integers to avoid Float (permille for proportions, × 100 for dependency lengths).
Key Empirical Finding
Head-final languages (Japanese, Korean, Turkish, Hindi) have systematically higher mean dependency lengths than head-initial languages (Arabic, Indonesian, Romanian) at the same sentence length. At sentence length 10, for example, Japanese averages 2.75 against 2.10 for Indonesian. This is predicted by DLM theory: head-final order with right-branching structures creates longer dependencies.
Crosslinguistic dependency length data for a single language.
Values are scaled integers:
- propHeadFinal: × 1000 (permille), e.g., 890 = 89.0% head-final
- depLengthAt10/15/20: × 100, e.g., 245 = 2.45 mean dependency length
- name : String
- isoCode : String
- family : String
- propHeadFinal : Nat
- depLengthAt10 : Nat
- depLengthAt15 : Nat
- depLengthAt20 : Nat
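The field list and scaling convention above can be sketched as a Lean structure. This is a reconstruction from the rendered fields, not the module's source: the `ScaledSketch` namespace and the `unscale100` helper are hypothetical names introduced only for illustration.

```lean
namespace ScaledSketch

/-- Record layout reconstructed from the field list above. -/
structure LanguageDLM where
  name : String
  isoCode : String
  family : String
  propHeadFinal : Nat  -- permille: 890 means 89.0%
  depLengthAt10 : Nat  -- × 100: 245 means 2.45
  depLengthAt15 : Nat
  depLengthAt20 : Nat
  deriving Repr

/-- Hypothetical helper (not part of the module): whole part and
hundredths of a × 100 value. -/
def unscale100 (n : Nat) : Nat × Nat := (n / 100, n % 100)

example : unscale100 245 = (2, 45) := by decide

end ScaledSketch
```

Keeping every field in `Nat` means equalities and comparisons on the data stay decidable by plain evaluation, which is presumably why the module avoids `Float`.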
Equations
- FutrellEtAl2020.instReprLanguageDLM = { reprPrec := FutrellEtAl2020.instReprLanguageDLM.repr }
A language is predominantly head-final if more than 50% of its dependencies are head-final (propHeadFinal > 500).
Equations
- l.isHeadFinal = decide (l.propHeadFinal > 500)
A language is predominantly head-initial if at most 50% of its dependencies are head-final (propHeadFinal ≤ 500).
Equations
- l.isHeadInitial = decide (l.propHeadFinal ≤ 500)
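The two predicates split the permille scale at 500, with the boundary value itself counted as head-initial. A minimal sketch, restating both predicates on the raw `Nat` value rather than on `LanguageDLM` (a simplification for illustration; the `ThresholdSketch` namespace is hypothetical):

```lean
namespace ThresholdSketch

-- Same thresholds as the equations above, applied to the permille value directly.
def isHeadFinal (p : Nat) : Bool := decide (p > 500)
def isHeadInitial (p : Nat) : Bool := decide (p ≤ 500)

example : isHeadFinal 890 = true := by decide    -- Japanese
example : isHeadInitial 210 = true := by decide  -- Arabic
example : isHeadInitial 500 = true := by decide  -- exactly 50% counts as head-initial
example : isHeadFinal 510 = true := by decide    -- Chinese, described as "mixed"

end ThresholdSketch
```

Languages described as mixed in their docstrings, such as Chinese at 510, still fall on one side of this binary split.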
Arabic (Afro-Asiatic, head-initial, VSO/SVO)
Equations
- FutrellEtAl2020.arabic = { name := "Arabic", isoCode := "ar", family := "Afro-Asiatic", propHeadFinal := 210, depLengthAt10 := 215, depLengthAt15 := 240, depLengthAt20 := 260 }
Basque (Isolate, head-final, SOV)
Equations
- FutrellEtAl2020.basque = { name := "Basque", isoCode := "eu", family := "Isolate", propHeadFinal := 720, depLengthAt10 := 255, depLengthAt15 := 295, depLengthAt20 := 320 }
Bulgarian (Indo-European, head-initial, SVO)
Equations
- FutrellEtAl2020.bulgarian = { name := "Bulgarian", isoCode := "bg", family := "Indo-European", propHeadFinal := 350, depLengthAt10 := 225, depLengthAt15 := 255, depLengthAt20 := 280 }
Chinese (Sino-Tibetan, mixed)
Equations
- FutrellEtAl2020.chinese = { name := "Chinese", isoCode := "zh", family := "Sino-Tibetan", propHeadFinal := 510, depLengthAt10 := 235, depLengthAt15 := 270, depLengthAt20 := 295 }
Czech (Indo-European, head-initial, SVO)
Equations
- FutrellEtAl2020.czech = { name := "Czech", isoCode := "cs", family := "Indo-European", propHeadFinal := 410, depLengthAt10 := 230, depLengthAt15 := 265, depLengthAt20 := 290 }
Danish (Indo-European, head-initial, SVO)
Equations
- FutrellEtAl2020.danish = { name := "Danish", isoCode := "da", family := "Indo-European", propHeadFinal := 370, depLengthAt10 := 220, depLengthAt15 := 250, depLengthAt20 := 275 }
Dutch (Indo-European, mixed V2)
Equations
- FutrellEtAl2020.dutch = { name := "Dutch", isoCode := "nl", family := "Indo-European", propHeadFinal := 480, depLengthAt10 := 240, depLengthAt15 := 275, depLengthAt20 := 305 }
English (Indo-European, head-initial, SVO)
Equations
- FutrellEtAl2020.english = { name := "English", isoCode := "en", family := "Indo-European", propHeadFinal := 320, depLengthAt10 := 220, depLengthAt15 := 250, depLengthAt20 := 270 }
Estonian (Uralic, mixed)
Equations
- FutrellEtAl2020.estonian = { name := "Estonian", isoCode := "et", family := "Uralic", propHeadFinal := 490, depLengthAt10 := 235, depLengthAt15 := 270, depLengthAt20 := 295 }
Finnish (Uralic, head-final, SVO)
Equations
- FutrellEtAl2020.finnish = { name := "Finnish", isoCode := "fi", family := "Uralic", propHeadFinal := 530, depLengthAt10 := 240, depLengthAt15 := 275, depLengthAt20 := 300 }
French (Indo-European, head-initial, SVO)
Equations
- FutrellEtAl2020.french = { name := "French", isoCode := "fr", family := "Indo-European", propHeadFinal := 290, depLengthAt10 := 215, depLengthAt15 := 245, depLengthAt20 := 265 }
German (Indo-European, mixed V2/SOV)
Equations
- FutrellEtAl2020.german = { name := "German", isoCode := "de", family := "Indo-European", propHeadFinal := 480, depLengthAt10 := 240, depLengthAt15 := 280, depLengthAt20 := 310 }
Greek (Indo-European, head-initial, SVO)
Equations
- FutrellEtAl2020.greek = { name := "Greek", isoCode := "el", family := "Indo-European", propHeadFinal := 350, depLengthAt10 := 225, depLengthAt15 := 255, depLengthAt20 := 280 }
Hebrew (Afro-Asiatic, head-initial, SVO)
Equations
- FutrellEtAl2020.hebrew = { name := "Hebrew", isoCode := "he", family := "Afro-Asiatic", propHeadFinal := 270, depLengthAt10 := 220, depLengthAt15 := 250, depLengthAt20 := 275 }
Hindi (Indo-European, head-final, SOV)
Equations
- FutrellEtAl2020.hindi = { name := "Hindi", isoCode := "hi", family := "Indo-European", propHeadFinal := 780, depLengthAt10 := 260, depLengthAt15 := 310, depLengthAt20 := 345 }
Hungarian (Uralic, head-final)
Equations
- FutrellEtAl2020.hungarian = { name := "Hungarian", isoCode := "hu", family := "Uralic", propHeadFinal := 580, depLengthAt10 := 245, depLengthAt15 := 280, depLengthAt20 := 310 }
Indonesian (Austronesian, head-initial, SVO)
Equations
- FutrellEtAl2020.indonesian = { name := "Indonesian", isoCode := "id", family := "Austronesian", propHeadFinal := 250, depLengthAt10 := 210, depLengthAt15 := 235, depLengthAt20 := 255 }
Italian (Indo-European, head-initial, SVO)
Equations
- FutrellEtAl2020.italian = { name := "Italian", isoCode := "it", family := "Indo-European", propHeadFinal := 300, depLengthAt10 := 220, depLengthAt15 := 250, depLengthAt20 := 270 }
Japanese (Japonic, head-final, SOV)
Equations
- FutrellEtAl2020.japanese = { name := "Japanese", isoCode := "ja", family := "Japonic", propHeadFinal := 890, depLengthAt10 := 275, depLengthAt15 := 330, depLengthAt20 := 370 }
Korean (Koreanic, head-final, SOV)
Equations
- FutrellEtAl2020.korean = { name := "Korean", isoCode := "ko", family := "Koreanic", propHeadFinal := 870, depLengthAt10 := 270, depLengthAt15 := 325, depLengthAt20 := 365 }
Latin (Indo-European, head-final, SOV)
Equations
- FutrellEtAl2020.latin = { name := "Latin", isoCode := "la", family := "Indo-European", propHeadFinal := 600, depLengthAt10 := 250, depLengthAt15 := 290, depLengthAt20 := 320 }
Norwegian (Indo-European, head-initial, SVO)
Equations
- FutrellEtAl2020.norwegian = { name := "Norwegian", isoCode := "no", family := "Indo-European", propHeadFinal := 360, depLengthAt10 := 220, depLengthAt15 := 250, depLengthAt20 := 275 }
Persian (Indo-European, head-final, SOV)
Equations
- FutrellEtAl2020.persian = { name := "Persian", isoCode := "fa", family := "Indo-European", propHeadFinal := 650, depLengthAt10 := 250, depLengthAt15 := 285, depLengthAt20 := 315 }
Polish (Indo-European, head-initial, SVO)
Equations
- FutrellEtAl2020.polish = { name := "Polish", isoCode := "pl", family := "Indo-European", propHeadFinal := 420, depLengthAt10 := 230, depLengthAt15 := 265, depLengthAt20 := 290 }
Portuguese (Indo-European, head-initial, SVO)
Equations
- FutrellEtAl2020.portuguese = { name := "Portuguese", isoCode := "pt", family := "Indo-European", propHeadFinal := 280, depLengthAt10 := 215, depLengthAt15 := 245, depLengthAt20 := 265 }
Romanian (Indo-European, head-initial, SVO)
Equations
- FutrellEtAl2020.romanian = { name := "Romanian", isoCode := "ro", family := "Indo-European", propHeadFinal := 290, depLengthAt10 := 215, depLengthAt15 := 240, depLengthAt20 := 260 }
Russian (Indo-European, mixed)
Equations
- FutrellEtAl2020.russian = { name := "Russian", isoCode := "ru", family := "Indo-European", propHeadFinal := 430, depLengthAt10 := 235, depLengthAt15 := 270, depLengthAt20 := 300 }
Spanish (Indo-European, head-initial, SVO)
Equations
- FutrellEtAl2020.spanish = { name := "Spanish", isoCode := "es", family := "Indo-European", propHeadFinal := 280, depLengthAt10 := 215, depLengthAt15 := 245, depLengthAt20 := 265 }
Swedish (Indo-European, head-initial, SVO)
Equations
- FutrellEtAl2020.swedish = { name := "Swedish", isoCode := "sv", family := "Indo-European", propHeadFinal := 370, depLengthAt10 := 225, depLengthAt15 := 255, depLengthAt20 := 280 }
Tamil (Dravidian, head-final, SOV)
Equations
- FutrellEtAl2020.tamil = { name := "Tamil", isoCode := "ta", family := "Dravidian", propHeadFinal := 830, depLengthAt10 := 265, depLengthAt15 := 320, depLengthAt20 := 355 }
Turkish (Turkic, head-final, SOV)
Equations
- FutrellEtAl2020.turkish = { name := "Turkish", isoCode := "tr", family := "Turkic", propHeadFinal := 810, depLengthAt10 := 260, depLengthAt15 := 310, depLengthAt20 := 350 }
Urdu (Indo-European, head-final, SOV)
Equations
- FutrellEtAl2020.urdu = { name := "Urdu", isoCode := "ur", family := "Indo-European", propHeadFinal := 770, depLengthAt10 := 258, depLengthAt15 := 305, depLengthAt20 := 340 }
Vietnamese (Austroasiatic, head-initial, SVO)
Equations
- FutrellEtAl2020.vietnamese = { name := "Vietnamese", isoCode := "vi", family := "Austroasiatic", propHeadFinal := 260, depLengthAt10 := 212, depLengthAt15 := 238, depLengthAt20 := 258 }
Representative subset of 32 languages from Table 2.
Head-final languages in the dataset.
Equations
- FutrellEtAl2020.headFinalLanguages = List.filter (fun (x : FutrellEtAl2020.LanguageDLM) => x.isHeadFinal) FutrellEtAl2020.languages
Head-initial languages in the dataset.
Equations
- FutrellEtAl2020.headInitialLanguages = List.filter (fun (x : FutrellEtAl2020.LanguageDLM) => x.isHeadInitial) FutrellEtAl2020.languages
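The two filters partition the language list, since the predicates are complementary. A small sketch on three entries copied from the listing above; the `Lang` structure, the `sample` list, and the `FilterSketch` namespace are hypothetical stand-ins for `LanguageDLM` and the module's longer `languages` list:

```lean
namespace FilterSketch

structure Lang where
  name : String
  propHeadFinal : Nat

def Lang.isHeadFinal (l : Lang) : Bool := decide (l.propHeadFinal > 500)
def Lang.isHeadInitial (l : Lang) : Bool := decide (l.propHeadFinal ≤ 500)

-- Three entries from the data above, not the full list.
def sample : List Lang :=
  [⟨"Japanese", 890⟩, ⟨"Arabic", 210⟩, ⟨"Basque", 720⟩]

def headFinal : List Lang := sample.filter (·.isHeadFinal)
def headInitial : List Lang := sample.filter (·.isHeadInitial)

example : headFinal.map (·.propHeadFinal) = [890, 720] := by decide
example : headInitial.map (·.propHeadFinal) = [210] := by decide
-- Every entry lands in exactly one of the two lists.
example : headFinal.length + headInitial.length = sample.length := by decide

end FilterSketch
```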
Mean dependency length at sentence length 10 for the head-final subset.
Mean dependency length at sentence length 10 for the head-initial subset.
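The equations for these means were not rendered, so the following is only one plausible shape: a truncating integer mean over the scaled values. The name `meanScaled`, the `MeanSketch` namespace, and the choice of truncating division are assumptions, and the inputs are a few `depLengthAt10` values taken from the listing above rather than the full subsets.

```lean
namespace MeanSketch

/-- Assumed definition: truncating integer mean. In Lean `n / 0 = 0` for `Nat`,
so the empty list yields 0 rather than an error. -/
def meanScaled (xs : List Nat) : Nat :=
  xs.foldl (· + ·) 0 / xs.length

-- `depLengthAt10` for Japanese, Korean, Turkish, Hindi: 1065 / 4 truncates to 266.
example : meanScaled [275, 270, 260, 260] = 266 := by decide
-- `depLengthAt10` for Arabic, Indonesian, Romanian: 640 / 3 truncates to 213.
example : meanScaled [215, 210, 215] = 213 := by decide
example : meanScaled [] = 0 := by decide

end MeanSketch
```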
Head-final languages have a higher mean dependency length at sentence length 10.
This is the core empirical finding: head-final languages systematically exhibit longer dependencies, in line with the DLM prediction that uniformly head-final order creates longer dependencies when combined with right-branching structure.
The same pattern holds at sentence length 20.
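A comparison of means like this can be stated without division by cross-multiplying the sums, which avoids truncation effects in `Nat`. The sketch below uses that formulation on a handful of values from the listing; `total`, `meanLt`, and the `CompareSketch` namespace are hypothetical, and the module's own theorems may be stated differently.

```lean
namespace CompareSketch

def total (xs : List Nat) : Nat := xs.foldl (· + ·) 0

/-- `mean xs < mean ys` stated without division:
`Σ xs * |ys| < Σ ys * |xs|`. Only meaningful for non-empty lists. -/
def meanLt (xs ys : List Nat) : Bool :=
  decide (total xs * ys.length < total ys * xs.length)

-- Arabic, Indonesian, Romanian vs. Japanese, Korean, Turkish, Hindi.
example : meanLt [215, 210, 215] [275, 270, 260, 260] = true := by decide  -- length 10
example : meanLt [260, 255, 260] [370, 365, 350, 345] = true := by decide  -- length 20

end CompareSketch
```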
Japanese has the highest mean dependency length at sentence length 20 among all languages in the dataset.
Indonesian has the lowest mean dependency length at sentence length 10 among all languages in the dataset.
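Claims about extremes of this kind reduce to a bounded check over the list. A sketch with `List.all` on five entries from the listing above; the tuple encoding, the `sample` list, and the `ExtremaSketch` namespace are illustrative assumptions, and the real statements range over the module's full `languages` list.

```lean
namespace ExtremaSketch

-- (name, depLengthAt10, depLengthAt20) for five entries from the data above.
def sample : List (String × Nat × Nat) :=
  [("Japanese", 275, 370), ("Korean", 270, 365), ("Hindi", 260, 345),
   ("Indonesian", 210, 255), ("Arabic", 215, 260)]

-- Japanese (370) bounds every sampled value at length 20 from above.
example : sample.all (fun (_, _, d20) => decide (d20 ≤ 370)) = true := by decide
-- Indonesian (210) bounds every sampled value at length 10 from below.
example : sample.all (fun (_, d10, _) => decide (210 ≤ d10)) = true := by decide

end ExtremaSketch
```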
The head-finality gap increases with sentence length: the difference in mean dependency length between head-final and head-initial languages is larger at sentence length 20 than at sentence length 10.
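The widening gap can be checked by computing both differences and comparing them. The sketch below does this for four head-final and three head-initial languages from the listing, using truncated integer means; `mean`, `gap`, `gapAt10`, `gapAt20`, and the `GapSketch` namespace are hypothetical names, and the figures apply to this small sample only.

```lean
namespace GapSketch

def mean (xs : List Nat) : Nat := xs.foldl (· + ·) 0 / xs.length

/-- Gap between two truncated means. `Nat` subtraction floors at 0, so this is
only meaningful when the first mean is the larger one. -/
def gap (hf hi : List Nat) : Nat := mean hf - mean hi

-- Head-final: Japanese, Korean, Turkish, Hindi.
-- Head-initial: Arabic, Indonesian, Romanian.
def gapAt10 : Nat := gap [275, 270, 260, 260] [215, 210, 215]  -- 266 - 213
def gapAt20 : Nat := gap [370, 365, 350, 345] [260, 255, 260]  -- 357 - 258

example : gapAt10 = 53 := by decide
example : gapAt20 = 99 := by decide
example : gapAt10 < gapAt20 := by decide

end GapSketch
```

In unscaled terms the sampled gap grows from about 0.53 at sentence length 10 to about 0.99 at sentence length 20.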