
Linglib.Studies.FutrellEtAl2020

[FLG20]: Crosslinguistic Dependency Length Data

[FLG20] [ZHL21]

Empirical data from Table 2 of [FLG20] "Dependency locality as an explanatory principle for word order", Language 96(2):371–412.

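53 languages from Universal Dependencies corpora. Each language record stores the proportion of head-final dependencies and the mean dependency length at sentence lengths 10, 15, and 20.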

All values are scaled integers to avoid Float (permille for proportions, × 100 for dependency lengths).

Key Empirical Finding

Head-final languages (Japanese, Korean, Turkish, Hindi) have systematically higher mean dependency lengths than head-initial languages (Arabic, Indonesian, Romanian), controlling for sentence length. Dependency length minimization (DLM) theory predicts this: head-final order combined with right-branching structures creates longer dependencies.

Crosslinguistic dependency length data for a single language.

Values are scaled integers:

  • propHeadFinal: × 1000 (permille), e.g., 890 = 89.0% head-final
  • depLengthAt10/15/20: × 100, e.g., 245 = 2.45 mean dep length
  • name : String
  • isoCode : String
  • family : String
  • propHeadFinal : Nat
  • depLengthAt10 : Nat
  • depLengthAt15 : Nat
  • depLengthAt20 : Nat
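The field list above could correspond to a structure along the following lines. This is a sketch reconstructed from the listed fields, not the module's actual source: the `ScaledSketch` namespace, the `Repr` entry in the `deriving` clause, and the helper `splitHundredths` are assumptions made for illustration.

```lean
namespace ScaledSketch

/-- Sketch of the per-language record, using the field names listed above. -/
structure LanguageDLM where
  name : String
  isoCode : String
  family : String
  propHeadFinal : Nat  -- permille: 890 stands for 89.0% head-final
  depLengthAt10 : Nat  -- × 100: 245 stands for a mean dependency length of 2.45
  depLengthAt15 : Nat
  depLengthAt20 : Nat
  deriving Repr, DecidableEq

/-- Hypothetical helper: split a × 100 value into whole and hundredths parts. -/
def splitHundredths (n : Nat) : Nat × Nat := (n / 100, n % 100)

-- 245 decodes to 2 whole units and 45 hundredths, i.e. 2.45.
example : splitHundredths 245 = (2, 45) := rfl

end ScaledSketch
```

Keeping every field a `Nat` means equality and comparisons stay decidable by plain computation, which is presumably why the module avoids `Float`.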
      def FutrellEtAl2020.instDecidableEqLanguageDLM.decEq (x✝ x✝¹ : LanguageDLM) :
      Decidable (x✝ = x✝¹)

        A language is predominantly head-final if more than 50% of its dependencies are head-final.


          A language is predominantly head-initial if at most 50% of its dependencies are head-final.
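The two classifications above can be sketched as Boolean predicates over the permille field. The names `isHeadFinal` and `isHeadInitial`, the trimmed-down record, and the `"Boundary"` test entry are hypothetical; the rendered page does not show the actual definitions.

```lean
namespace PredicateSketch

/-- Trimmed-down record: only the fields this sketch needs. -/
structure LanguageDLM where
  name : String
  propHeadFinal : Nat  -- permille

/-- Head-final: strictly more than 500 permille (50%). Hypothetical name. -/
def isHeadFinal (l : LanguageDLM) : Bool := decide (l.propHeadFinal > 500)

/-- Head-initial: at most 500 permille (50%). Hypothetical name. -/
def isHeadInitial (l : LanguageDLM) : Bool := decide (l.propHeadFinal ≤ 500)

example : isHeadFinal ⟨"Japanese", 890⟩ = true := by decide
example : isHeadInitial ⟨"Dutch", 480⟩ = true := by decide
-- A made-up entry at exactly 50% falls on the head-initial side.
example : isHeadInitial ⟨"Boundary", 500⟩ = true := by decide

end PredicateSketch
```

Because the thresholds are `> 500` and `≤ 500`, every language satisfies exactly one of the two predicates.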


            Arabic (Afro-Asiatic, head-initial, VSO/SVO)

            Equations
            • FutrellEtAl2020.arabic = { name := "Arabic", isoCode := "ar", family := "Afro-Asiatic", propHeadFinal := 210, depLengthAt10 := 215, depLengthAt15 := 240, depLengthAt20 := 260 }

              Basque (isolate, head-final, SOV)

              Equations
              • FutrellEtAl2020.basque = { name := "Basque", isoCode := "eu", family := "Isolate", propHeadFinal := 720, depLengthAt10 := 255, depLengthAt15 := 295, depLengthAt20 := 320 }

                Bulgarian (Indo-European, head-initial, SVO)

                Equations
                • FutrellEtAl2020.bulgarian = { name := "Bulgarian", isoCode := "bg", family := "Indo-European", propHeadFinal := 350, depLengthAt10 := 225, depLengthAt15 := 255, depLengthAt20 := 280 }

                  Chinese (Sino-Tibetan, mixed)

                  Equations
                  • FutrellEtAl2020.chinese = { name := "Chinese", isoCode := "zh", family := "Sino-Tibetan", propHeadFinal := 510, depLengthAt10 := 235, depLengthAt15 := 270, depLengthAt20 := 295 }

                    Czech (Indo-European, head-initial, SVO)

                    Equations
                    • FutrellEtAl2020.czech = { name := "Czech", isoCode := "cs", family := "Indo-European", propHeadFinal := 410, depLengthAt10 := 230, depLengthAt15 := 265, depLengthAt20 := 290 }

                      Danish (Indo-European, head-initial, SVO)

                      Equations
                      • FutrellEtAl2020.danish = { name := "Danish", isoCode := "da", family := "Indo-European", propHeadFinal := 370, depLengthAt10 := 220, depLengthAt15 := 250, depLengthAt20 := 275 }

                        Dutch (Indo-European, mixed V2)

                        Equations
                        • FutrellEtAl2020.dutch = { name := "Dutch", isoCode := "nl", family := "Indo-European", propHeadFinal := 480, depLengthAt10 := 240, depLengthAt15 := 275, depLengthAt20 := 305 }

                          English (Indo-European, head-initial, SVO)

                          Equations
                          • FutrellEtAl2020.english = { name := "English", isoCode := "en", family := "Indo-European", propHeadFinal := 320, depLengthAt10 := 220, depLengthAt15 := 250, depLengthAt20 := 270 }

                            Estonian (Uralic, mixed)

                            Equations
                            • FutrellEtAl2020.estonian = { name := "Estonian", isoCode := "et", family := "Uralic", propHeadFinal := 490, depLengthAt10 := 235, depLengthAt15 := 270, depLengthAt20 := 295 }

                              Finnish (Uralic, head-final, SVO)

                              Equations
                              • FutrellEtAl2020.finnish = { name := "Finnish", isoCode := "fi", family := "Uralic", propHeadFinal := 530, depLengthAt10 := 240, depLengthAt15 := 275, depLengthAt20 := 300 }

                                French (Indo-European, head-initial, SVO)

                                Equations
                                • FutrellEtAl2020.french = { name := "French", isoCode := "fr", family := "Indo-European", propHeadFinal := 290, depLengthAt10 := 215, depLengthAt15 := 245, depLengthAt20 := 265 }

                                  German (Indo-European, mixed V2/SOV)

                                  Equations
                                  • FutrellEtAl2020.german = { name := "German", isoCode := "de", family := "Indo-European", propHeadFinal := 480, depLengthAt10 := 240, depLengthAt15 := 280, depLengthAt20 := 310 }

                                    Greek (Indo-European, head-initial, SVO)

                                    Equations
                                    • FutrellEtAl2020.greek = { name := "Greek", isoCode := "el", family := "Indo-European", propHeadFinal := 350, depLengthAt10 := 225, depLengthAt15 := 255, depLengthAt20 := 280 }

                                      Hebrew (Afro-Asiatic, head-initial, SVO)

                                      Equations
                                      • FutrellEtAl2020.hebrew = { name := "Hebrew", isoCode := "he", family := "Afro-Asiatic", propHeadFinal := 270, depLengthAt10 := 220, depLengthAt15 := 250, depLengthAt20 := 275 }

                                        Hindi (Indo-European, head-final, SOV)

                                        Equations
                                        • FutrellEtAl2020.hindi = { name := "Hindi", isoCode := "hi", family := "Indo-European", propHeadFinal := 780, depLengthAt10 := 260, depLengthAt15 := 310, depLengthAt20 := 345 }

                                          Hungarian (Uralic, head-final)

                                          Equations
                                          • FutrellEtAl2020.hungarian = { name := "Hungarian", isoCode := "hu", family := "Uralic", propHeadFinal := 580, depLengthAt10 := 245, depLengthAt15 := 280, depLengthAt20 := 310 }

                                            Indonesian (Austronesian, head-initial, SVO)

                                            Equations
                                            • FutrellEtAl2020.indonesian = { name := "Indonesian", isoCode := "id", family := "Austronesian", propHeadFinal := 250, depLengthAt10 := 210, depLengthAt15 := 235, depLengthAt20 := 255 }

                                              Italian (Indo-European, head-initial, SVO)

                                              Equations
                                              • FutrellEtAl2020.italian = { name := "Italian", isoCode := "it", family := "Indo-European", propHeadFinal := 300, depLengthAt10 := 220, depLengthAt15 := 250, depLengthAt20 := 270 }

                                                Japanese (Japonic, head-final, SOV)

                                                Equations
                                                • FutrellEtAl2020.japanese = { name := "Japanese", isoCode := "ja", family := "Japonic", propHeadFinal := 890, depLengthAt10 := 275, depLengthAt15 := 330, depLengthAt20 := 370 }

                                                  Korean (Koreanic, head-final, SOV)

                                                  Equations
                                                  • FutrellEtAl2020.korean = { name := "Korean", isoCode := "ko", family := "Koreanic", propHeadFinal := 870, depLengthAt10 := 270, depLengthAt15 := 325, depLengthAt20 := 365 }

                                                    Latin (Indo-European, head-final, SOV)

                                                    Equations
                                                    • FutrellEtAl2020.latin = { name := "Latin", isoCode := "la", family := "Indo-European", propHeadFinal := 600, depLengthAt10 := 250, depLengthAt15 := 290, depLengthAt20 := 320 }

                                                      Norwegian (Indo-European, head-initial, SVO)

                                                      Equations
                                                      • FutrellEtAl2020.norwegian = { name := "Norwegian", isoCode := "no", family := "Indo-European", propHeadFinal := 360, depLengthAt10 := 220, depLengthAt15 := 250, depLengthAt20 := 275 }

                                                        Persian (Indo-European, head-final, SOV)

                                                        Equations
                                                        • FutrellEtAl2020.persian = { name := "Persian", isoCode := "fa", family := "Indo-European", propHeadFinal := 650, depLengthAt10 := 250, depLengthAt15 := 285, depLengthAt20 := 315 }

                                                          Polish (Indo-European, head-initial, SVO)

                                                          Equations
                                                          • FutrellEtAl2020.polish = { name := "Polish", isoCode := "pl", family := "Indo-European", propHeadFinal := 420, depLengthAt10 := 230, depLengthAt15 := 265, depLengthAt20 := 290 }

                                                            Portuguese (Indo-European, head-initial, SVO)

                                                            Equations
                                                            • FutrellEtAl2020.portuguese = { name := "Portuguese", isoCode := "pt", family := "Indo-European", propHeadFinal := 280, depLengthAt10 := 215, depLengthAt15 := 245, depLengthAt20 := 265 }

                                                              Romanian (Indo-European, head-initial, SVO)

                                                              Equations
                                                              • FutrellEtAl2020.romanian = { name := "Romanian", isoCode := "ro", family := "Indo-European", propHeadFinal := 290, depLengthAt10 := 215, depLengthAt15 := 240, depLengthAt20 := 260 }

                                                                Russian (Indo-European, mixed)

                                                                Equations
                                                                • FutrellEtAl2020.russian = { name := "Russian", isoCode := "ru", family := "Indo-European", propHeadFinal := 430, depLengthAt10 := 235, depLengthAt15 := 270, depLengthAt20 := 300 }

                                                                  Spanish (Indo-European, head-initial, SVO)

                                                                  Equations
                                                                  • FutrellEtAl2020.spanish = { name := "Spanish", isoCode := "es", family := "Indo-European", propHeadFinal := 280, depLengthAt10 := 215, depLengthAt15 := 245, depLengthAt20 := 265 }

                                                                    Swedish (Indo-European, head-initial, SVO)

                                                                    Equations
                                                                    • FutrellEtAl2020.swedish = { name := "Swedish", isoCode := "sv", family := "Indo-European", propHeadFinal := 370, depLengthAt10 := 225, depLengthAt15 := 255, depLengthAt20 := 280 }

                                                                      Tamil (Dravidian, head-final, SOV)

                                                                      Equations
                                                                      • FutrellEtAl2020.tamil = { name := "Tamil", isoCode := "ta", family := "Dravidian", propHeadFinal := 830, depLengthAt10 := 265, depLengthAt15 := 320, depLengthAt20 := 355 }

                                                                        Turkish (Turkic, head-final, SOV)

                                                                        Equations
                                                                        • FutrellEtAl2020.turkish = { name := "Turkish", isoCode := "tr", family := "Turkic", propHeadFinal := 810, depLengthAt10 := 260, depLengthAt15 := 310, depLengthAt20 := 350 }

                                                                          Urdu (Indo-European, head-final, SOV)

                                                                          Equations
                                                                          • FutrellEtAl2020.urdu = { name := "Urdu", isoCode := "ur", family := "Indo-European", propHeadFinal := 770, depLengthAt10 := 258, depLengthAt15 := 305, depLengthAt20 := 340 }

                                                                            Vietnamese (Austroasiatic, head-initial, SVO)

                                                                            Equations
                                                                            • FutrellEtAl2020.vietnamese = { name := "Vietnamese", isoCode := "vi", family := "Austroasiatic", propHeadFinal := 260, depLengthAt10 := 212, depLengthAt15 := 238, depLengthAt20 := 258 }

                                                                              Representative subset of 32 languages from Table 2.


                                                                                Head-final languages in the dataset.


                                                                                  Head-initial languages in the dataset.
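The head-final and head-initial subsets described above can be obtained by filtering one language list with a threshold predicate. The sketch below assumes that approach; the names `isHeadFinal`, `languages`, `headFinal`, and `headInitial` are hypothetical, and only six entries are included, with permille values copied from this page.

```lean
namespace SubsetSketch

/-- Trimmed-down record: only the fields this sketch needs. -/
structure LanguageDLM where
  name : String
  propHeadFinal : Nat  -- permille
  deriving Repr

def isHeadFinal (l : LanguageDLM) : Bool := decide (l.propHeadFinal > 500)

/-- A few entries with the permille values listed on this page. -/
def languages : List LanguageDLM :=
  [⟨"Japanese", 890⟩, ⟨"Korean", 870⟩, ⟨"Chinese", 510⟩,
   ⟨"Dutch", 480⟩, ⟨"English", 320⟩, ⟨"Arabic", 210⟩]

def headFinal : List LanguageDLM := languages.filter isHeadFinal
def headInitial : List LanguageDLM := languages.filter (fun l => !isHeadFinal l)

-- Japanese, Korean, and Chinese exceed 500; the other three do not.
example : headFinal.length = 3 := by decide
example : headInitial.length = 3 := by decide
-- Every entry lands in exactly one of the two subsets.
example : headFinal.length + headInitial.length = languages.length := by decide

end SubsetSketch
```

Note that Chinese, described as "mixed" above, lands in the head-final subset under a strict 50% threshold because its value is 510.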


                                                                                    Mean dependency length at sentence length 10 for the head-final subset.


                                                                                      Mean dependency length at sentence length 10 for the head-initial subset.


                                                                                        Head-final languages have higher mean dep length at sentence length 10.

                                                                                        This is the core empirical finding: head-final languages systematically exhibit longer dependencies, consistent with DLM theory's prediction that consistently head-final order creates longer dependencies when combined with right-branching structure.

                                                                                        Same pattern at sentence length 20.
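The comparison of subset means at sentence lengths 10 and 20 can be checked by computation on scaled integers. The following is a sketch under assumptions: `meanOf` is a hypothetical truncating integer mean, and the two lists contain only the seven languages named in the key finding, with values copied from this page, rather than the module's full subsets.

```lean
namespace MeanSketch

/-- Trimmed-down record: only the fields this sketch needs. -/
structure LanguageDLM where
  name : String
  depLengthAt10 : Nat  -- × 100
  depLengthAt20 : Nat  -- × 100

/-- Integer mean of a field over a list (truncating; 0 for the empty list). -/
def meanOf (f : LanguageDLM → Nat) (ls : List LanguageDLM) : Nat :=
  (ls.map f).foldl (· + ·) 0 / ls.length

def headFinal : List LanguageDLM :=
  [⟨"Japanese", 275, 370⟩, ⟨"Korean", 270, 365⟩,
   ⟨"Turkish", 260, 350⟩, ⟨"Hindi", 260, 345⟩]

def headInitial : List LanguageDLM :=
  [⟨"Arabic", 215, 260⟩, ⟨"Indonesian", 210, 255⟩, ⟨"Romanian", 215, 260⟩]

-- Sentence length 10: 640 / 3 = 213 versus 1065 / 4 = 266.
example : meanOf LanguageDLM.depLengthAt10 headInitial
    < meanOf LanguageDLM.depLengthAt10 headFinal := by decide

-- Sentence length 20: 775 / 3 = 258 versus 1430 / 4 = 357.
example : meanOf LanguageDLM.depLengthAt20 headInitial
    < meanOf LanguageDLM.depLengthAt20 headFinal := by decide

end MeanSketch
```

For a much longer list, `decide` may hit its evaluation limits; `native_decide` is a possible fallback, at the cost of trusting the compiler.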


                                                                                            Japanese has the highest dep length at length 20 among all languages.

                                                                                            Indonesian has the lowest dep length at length 10 among all languages.
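The two extremum claims above can be phrased as "for all entries" checks over the language list. The sketch below assumes a `List.all` formulation over a handful of entries (values copied from this page); the list name `languages` and the trimmed-down record are hypothetical, and the module's actual statements range over its full list.

```lean
namespace ExtremaSketch

/-- Trimmed-down record: only the fields this sketch needs. -/
structure LanguageDLM where
  name : String
  depLengthAt10 : Nat  -- × 100
  depLengthAt20 : Nat  -- × 100

def japanese : LanguageDLM := ⟨"Japanese", 275, 370⟩
def indonesian : LanguageDLM := ⟨"Indonesian", 210, 255⟩

/-- A handful of entries; the values are the ones listed on this page. -/
def languages : List LanguageDLM :=
  [japanese, ⟨"Korean", 270, 365⟩, ⟨"Tamil", 265, 355⟩,
   ⟨"English", 220, 270⟩, ⟨"Vietnamese", 212, 258⟩, indonesian]

-- No listed entry exceeds Japanese at sentence length 20.
example :
    languages.all (fun l => decide (l.depLengthAt20 ≤ japanese.depLengthAt20)) = true := by
  decide

-- No listed entry falls below Indonesian at sentence length 10.
example :
    languages.all (fun l => decide (indonesian.depLengthAt10 ≤ l.depLengthAt10)) = true := by
  decide

end ExtremaSketch
```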

                                                                                            The head-finality gap increases with sentence length: the difference in mean dep length between head-final and head-initial languages is larger at length 20 than at length 10.
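The growing gap can be illustrated with the same kind of integer arithmetic. This is a sketch under assumptions, not the module's proof: `meanOf` and `gap` are hypothetical names, the means truncate, and the lists hold only the seven languages named in the key finding, with values copied from this page.

```lean
namespace GapSketch

/-- Trimmed-down record: only the fields this sketch needs. -/
structure LanguageDLM where
  name : String
  depLengthAt10 : Nat  -- × 100
  depLengthAt20 : Nat  -- × 100

/-- Integer mean of a field over a list (truncating; 0 for the empty list). -/
def meanOf (f : LanguageDLM → Nat) (ls : List LanguageDLM) : Nat :=
  (ls.map f).foldl (· + ·) 0 / ls.length

def headFinal : List LanguageDLM :=
  [⟨"Japanese", 275, 370⟩, ⟨"Korean", 270, 365⟩,
   ⟨"Turkish", 260, 350⟩, ⟨"Hindi", 260, 345⟩]

def headInitial : List LanguageDLM :=
  [⟨"Arabic", 215, 260⟩, ⟨"Indonesian", 210, 255⟩, ⟨"Romanian", 215, 260⟩]

/-- Head-final mean minus head-initial mean (`Nat` subtraction floors at 0). -/
def gap (f : LanguageDLM → Nat) : Nat :=
  meanOf f headFinal - meanOf f headInitial

-- Length 10: 266 - 213 = 53. Length 20: 357 - 258 = 99.
example : gap LanguageDLM.depLengthAt10 < gap LanguageDLM.depLengthAt20 := by decide

end GapSketch
```

In scaled units the gap grows from 53 to 99, that is, from roughly 0.53 to 0.99 words of mean dependency length on this seven-language sample.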