Linglib.Phenomena.WordOrder.Studies.HahnDegenFutrell2021

Study 2: 54-Language Word-Order Efficiency

@cite{hahn-degen-futrell-2021}

Tests the Efficient Trade-off Hypothesis: the ordering regularities of natural language optimize the memory-surprisal trade-off, serving the communicative interest of the hearer. 54 languages from Universal Dependencies corpora are measured against grammar-preserving random baselines. 50/54 languages have significantly more efficient trade-offs; the 4 exceptions (Latvian, North Sami, Polish, Slovak) all have high word-order freedom (high branching direction entropy).

Key empirical finding (Figure 13): branching direction entropy is negatively correlated with optimization strength (Spearman ρ ≈ −.58, p < .0001). Languages with freer word order show weaker optimization, plausibly because free-order languages use word order to encode information structure rather than minimize processing cost.

Values

LanguageEfficiency: efficiency data for a single language from Study 2.

  • name : String
  • isoCode : String
  • family : String
  • moreEfficient : Bool

    Whether the real language's trade-off AUC is significantly lower than baseline AUCs (Hochberg-corrected p < .01). This is the empirical instantiation of Processing.MemorySurprisal.efficientTradeoffHypothesis from the theory module.

  • gMean1000 : ℕ

    Bootstrapped mean G × 1000 (from SI Figure 2). A value of 1000 (G = 1.0) means fully optimized: the real ordering beats every sampled baseline grammar.

  • branchDirEntropy1000 : Option ℕ

    Branching direction entropy × 1000 (higher = more word-order freedom). none when the value is unavailable in the published data.
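The field list above corresponds to a Lean structure roughly like the following. This is a sketch reconstructed from the documented field names and types, not the module's verbatim source: it uses core Lean 4 only (so `Nat` stands in for `ℕ`), and the `gValue` helper and the standalone `estonian` value are hypothetical additions for illustration.

```lean
/-- Sketch of the per-language record (reconstructed, not the module's source). -/
structure LanguageEfficiency where
  name : String
  isoCode : String
  family : String
  /-- Trade-off AUC significantly lower than the baselines' (Hochberg-corrected p < .01). -/
  moreEfficient : Bool
  /-- Bootstrapped mean G × 1000; 1000 means fully optimized. -/
  gMean1000 : Nat
  /-- Branching direction entropy × 1000; `none` when unpublished. -/
  branchDirEntropy1000 : Option Nat
  deriving Repr

/-- Hypothetical helper: G on its original 0 to 1 scale. -/
def LanguageEfficiency.gValue (l : LanguageEfficiency) : Float :=
  l.gMean1000.toFloat / 1000.0

-- Values copied from the Estonian definition listed below.
def estonian : LanguageEfficiency :=
  { name := "Estonian", isoCode := "et", family := "Uralic",
    moreEfficient := true, gMean1000 := 800, branchDirEntropy1000 := some 435 }

#eval estonian          -- the whole record, via the derived `Repr`
#eval estonian.gValue   -- G as a Float
```

One plausible reason for scaling G and entropy to integers rather than storing `Float`s is that `Nat` comparisons are decidable in the kernel, so properties such as those stated below can be checked with `decide`; `Float` comparisons are not kernel-reducible.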

Efficient languages (50)

G ≥ 0.5 in the LSTM estimator (main paper). Most have G = 1.0.

  • HahnDegenFutrell2021.afrikaans = { name := "Afrikaans", isoCode := "af", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 195 }
  • HahnDegenFutrell2021.amharic = { name := "Amharic", isoCode := "am", family := "Afro-Asiatic", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 247 }
  • HahnDegenFutrell2021.arabic = { name := "Arabic", isoCode := "ar", family := "Afro-Asiatic", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 178 }
  • HahnDegenFutrell2021.armenian = { name := "Armenian", isoCode := "hy", family := "Indo-European", moreEfficient := true, gMean1000 := 920, branchDirEntropy1000 := some 337 }
  • HahnDegenFutrell2021.bambara = { name := "Bambara", isoCode := "bm", family := "Mande", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 110 }
  • HahnDegenFutrell2021.basque = { name := "Basque", isoCode := "eu", family := "Isolate", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 397 }
  • HahnDegenFutrell2021.breton = { name := "Breton", isoCode := "br", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 279 }
  • HahnDegenFutrell2021.bulgarian = { name := "Bulgarian", isoCode := "bg", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 254 }
  • HahnDegenFutrell2021.buryat = { name := "Buryat", isoCode := "bxr", family := "Mongolic", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 169 }
  • HahnDegenFutrell2021.cantonese = { name := "Cantonese", isoCode := "yue", family := "Sino-Tibetan", moreEfficient := true, gMean1000 := 960, branchDirEntropy1000 := some 171 }
  • HahnDegenFutrell2021.catalan = { name := "Catalan", isoCode := "ca", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 207 }
  • HahnDegenFutrell2021.chinese = { name := "Chinese", isoCode := "zh", family := "Sino-Tibetan", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 144 }
  • HahnDegenFutrell2021.croatian = { name := "Croatian", isoCode := "hr", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 271 }
  • HahnDegenFutrell2021.czech = { name := "Czech", isoCode := "cs", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 328 }
  • HahnDegenFutrell2021.danish = { name := "Danish", isoCode := "da", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 250 }
  • HahnDegenFutrell2021.dutch = { name := "Dutch", isoCode := "nl", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 280 }
  • HahnDegenFutrell2021.english = { name := "English", isoCode := "en", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 191 }
  • HahnDegenFutrell2021.erzya = { name := "Erzya", isoCode := "myv", family := "Uralic", moreEfficient := true, gMean1000 := 990, branchDirEntropy1000 := some 429 }
  • HahnDegenFutrell2021.estonian = { name := "Estonian", isoCode := "et", family := "Uralic", moreEfficient := true, gMean1000 := 800, branchDirEntropy1000 := some 435 }
  • HahnDegenFutrell2021.faroese = { name := "Faroese", isoCode := "fo", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 211 }
  • HahnDegenFutrell2021.finnish = { name := "Finnish", isoCode := "fi", family := "Uralic", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 357 }
  • HahnDegenFutrell2021.french = { name := "French", isoCode := "fr", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 186 }
  • HahnDegenFutrell2021.german = { name := "German", isoCode := "de", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 280 }
  • HahnDegenFutrell2021.greek = { name := "Greek", isoCode := "el", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 219 }
  • HahnDegenFutrell2021.hebrew = { name := "Hebrew", isoCode := "he", family := "Afro-Asiatic", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 173 }
  • HahnDegenFutrell2021.hindi = { name := "Hindi", isoCode := "hi", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 59 }
  • HahnDegenFutrell2021.hungarian = { name := "Hungarian", isoCode := "hu", family := "Uralic", moreEfficient := true, gMean1000 := 870, branchDirEntropy1000 := some 290 }
  • HahnDegenFutrell2021.indonesian = { name := "Indonesian", isoCode := "id", family := "Austronesian", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 237 }
  • HahnDegenFutrell2021.italian = { name := "Italian", isoCode := "it", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 216 }
  • HahnDegenFutrell2021.japanese = { name := "Japanese", isoCode := "ja", family := "Japonic", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 24 }
  • HahnDegenFutrell2021.kazakh = { name := "Kazakh", isoCode := "kk", family := "Turkic", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 147 }
  • HahnDegenFutrell2021.korean = { name := "Korean", isoCode := "ko", family := "Koreanic", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := none }
  • HahnDegenFutrell2021.kurmanji = { name := "Kurmanji", isoCode := "kmr", family := "Indo-European", moreEfficient := true, gMean1000 := 930, branchDirEntropy1000 := some 262 }
  • HahnDegenFutrell2021.maltese = { name := "Maltese", isoCode := "mt", family := "Afro-Asiatic", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 185 }
  • HahnDegenFutrell2021.naija = { name := "Naija", isoCode := "pcm", family := "Creole", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 239 }
  • HahnDegenFutrell2021.norwegian = { name := "Norwegian", isoCode := "no", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 220 }
  • HahnDegenFutrell2021.persian = { name := "Persian", isoCode := "fa", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 142 }
  • HahnDegenFutrell2021.portuguese = { name := "Portuguese", isoCode := "pt", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 223 }
  • HahnDegenFutrell2021.romanian = { name := "Romanian", isoCode := "ro", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 267 }
  • HahnDegenFutrell2021.russian = { name := "Russian", isoCode := "ru", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 252 }
  • HahnDegenFutrell2021.serbian = { name := "Serbian", isoCode := "sr", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 244 }
  • HahnDegenFutrell2021.slovenian = { name := "Slovenian", isoCode := "sl", family := "Indo-European", moreEfficient := true, gMean1000 := 820, branchDirEntropy1000 := some 309 }
  • HahnDegenFutrell2021.spanish = { name := "Spanish", isoCode := "es", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 228 }
  • HahnDegenFutrell2021.swedish = { name := "Swedish", isoCode := "sv", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 229 }
  • HahnDegenFutrell2021.thai = { name := "Thai", isoCode := "th", family := "Kra-Dai", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 149 }
  • HahnDegenFutrell2021.turkish = { name := "Turkish", isoCode := "tr", family := "Turkic", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 246 }
  • HahnDegenFutrell2021.ukrainian = { name := "Ukrainian", isoCode := "uk", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 313 }
  • HahnDegenFutrell2021.urdu = { name := "Urdu", isoCode := "ur", family := "Indo-European", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 85 }
  • HahnDegenFutrell2021.uyghur = { name := "Uyghur", isoCode := "ug", family := "Turkic", moreEfficient := true, gMean1000 := 650, branchDirEntropy1000 := some 87 }
  • HahnDegenFutrell2021.vietnamese = { name := "Vietnamese", isoCode := "vi", family := "Austroasiatic", moreEfficient := true, gMean1000 := 1000, branchDirEntropy1000 := some 320 }

Exception languages (4)

G < 0.5 in the LSTM estimator (main paper, Figure 13; SI Figure 2). All have high branching direction entropy (free word order).

  • HahnDegenFutrell2021.latvian = { name := "Latvian", isoCode := "lv", family := "Indo-European", moreEfficient := false, gMean1000 := 490, branchDirEntropy1000 := some 347 }
  • HahnDegenFutrell2021.northSami = { name := "North Sami", isoCode := "sme", family := "Uralic", moreEfficient := false, gMean1000 := 370, branchDirEntropy1000 := some 315 }
  • HahnDegenFutrell2021.polish = { name := "Polish", isoCode := "pl", family := "Indo-European", moreEfficient := false, gMean1000 := 100, branchDirEntropy1000 := some 375 }
  • HahnDegenFutrell2021.slovak = { name := "Slovak", isoCode := "sk", family := "Indo-European", moreEfficient := false, gMean1000 := 70, branchDirEntropy1000 := some 372 }

All 54 languages from Study 2 (SI Table 2).

54 languages in total.

50 out of 54 languages have more efficient word orders than baselines.

Exactly 4 exceptions.

theorem HahnDegenFutrell2021.all_exceptions_have_high_word_order_freedom :
(exceptionLanguages.all fun (l : LanguageEfficiency) => match l.branchDirEntropy1000 with | some e => decide (e > 300) | none => false) = true

All 4 exceptions have high branching direction entropy (branchDirEntropy1000 > 300, i.e. entropy above 0.300).

This supports the paper's explanation that languages with very free word order show weaker optimization: when many orderings are nearly equally acceptable, the signal of optimization is reduced.

Entropy values are from branching_entropy.tsv at https://github.com/m-hahn/memory-surprisal
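The `none => false` branch in the theorem above matters: a language whose entropy is unpublished (such as Korean) makes the check fail instead of letting it pass vacuously. A minimal sketch of the same predicate, assuming a hypothetical two-field stand-in `EntropyRow` rather than the module's `LanguageEfficiency`, with the four exceptions' values copied from the definitions above:

```lean
/-- Hypothetical two-field stand-in for the module's record. -/
structure EntropyRow where
  name : String
  branchDirEntropy1000 : Option Nat

/-- Known entropy above 300 (0.300). Unknown entropy (`none`) counts as
    not high, so missing data cannot make the check pass vacuously. -/
def highFreedom (r : EntropyRow) : Bool :=
  match r.branchDirEntropy1000 with
  | some e => decide (e > 300)
  | none => false

-- The four exceptions; entropy values copied from the definitions above.
def exceptionRows : List EntropyRow :=
  [⟨"Latvian", some 347⟩, ⟨"North Sami", some 315⟩,
   ⟨"Polish", some 375⟩, ⟨"Slovak", some 372⟩]

#eval exceptionRows.all highFreedom                                  -- all four pass
#eval highFreedom { name := "Korean", branchDirEntropy1000 := none }  -- missing data fails
```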

All 4 exceptions have gMean1000 < 500 (G < 0.5, below the optimization threshold).

The moreEfficient flag is consistent with a G ≥ 500 threshold across all 54 languages. This cross-checks two independently encoded fields: moreEfficient (from the binomial test) and gMean1000 (from SI Figure 2's bootstrapped fraction).
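The threshold cross-check amounts to a per-row predicate comparing the significance flag with `gMean1000 ≥ 500`. In this sketch `FlagRow` and `consistentWithThreshold` are hypothetical names, and the rows are a sample copied from the definitions above, including the two values closest to the cut-off (Uyghur at 650, Latvian at 490):

```lean
/-- Hypothetical stand-in holding only the fields this check needs. -/
structure FlagRow where
  name : String
  moreEfficient : Bool
  gMean1000 : Nat

/-- True when the significance flag agrees with the G ≥ 0.5 cut-off. -/
def consistentWithThreshold (r : FlagRow) : Bool :=
  r.moreEfficient == decide (r.gMean1000 ≥ 500)

-- Sample rows copied from the definitions above, including the two values
-- closest to the cut-off (Uyghur 650, Latvian 490).
def sampleRows : List FlagRow :=
  [⟨"Uyghur", true, 650⟩, ⟨"Estonian", true, 800⟩, ⟨"English", true, 1000⟩,
   ⟨"Latvian", false, 490⟩, ⟨"Slovak", false, 70⟩]

#eval sampleRows.all consistentWithThreshold
```

Because the two fields come from different analyses, a row that failed this predicate would point to a transcription error in one of them.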

The 4 exceptions form a contiguous block at the bottom of the G ranking: no efficient language has G below any exception's G.
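Contiguity reduces to a single comparison: the largest exception G must lie below the smallest efficient G. A sketch over plain `Nat` lists with hypothetical helper names; the efficient list holds the eight values below 1000 from the definitions above plus one 1000 standing in for the 42 fully optimized languages, which cannot lower the minimum.

```lean
/-- Largest element (0 for an empty list). -/
def maxNat (xs : List Nat) : Nat := xs.foldl max 0

/-- Smallest element, starting from an upper bound; G × 1000 never exceeds 1000. -/
def minNatFrom (bound : Nat) (xs : List Nat) : Nat := xs.foldl min bound

-- G × 1000 for the four exceptions, copied from the definitions above.
def exceptionG : List Nat := [490, 370, 100, 70]

-- The eight efficient values below 1000, plus one 1000 standing in for the
-- 42 fully optimized languages (extra 1000s cannot lower the minimum).
def efficientG : List Nat := [920, 960, 990, 800, 870, 930, 820, 650, 1000]

/-- Contiguity: every exception ranks strictly below every efficient language. -/
def contiguousBlock (exc eff : List Nat) : Bool :=
  decide (maxNat exc < minNatFrom 1000 eff)

#eval (maxNat exceptionG, minNatFrom 1000 efficientG)
#eval contiguousBlock exceptionG efficientG
```

Comparing one maximum with one minimum is equivalent to the pairwise statement and is linear in the list lengths.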

theorem HahnDegenFutrell2021.japanese_lowest_known_entropy :
((List.filterMap (fun (x : LanguageEfficiency) => x.branchDirEntropy1000) allLanguages).all fun (x : ℕ) => decide (x ≥ 24)) = true

Japanese has the lowest branching direction entropy among languages with known entropy data (most rigid word order). Korean is excluded because its entropy is not available in the published data.

Estonian has the highest entropy among efficient languages (435) yet is still efficient (G = 0.80): high word-order freedom, though shared by all four exceptions, is not sufficient for a language to be an exception.

Mean branching direction entropy is higher for exceptions than for efficient languages (computed over languages with known entropy).
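The mean comparison can stay in `Nat` by cross-multiplying instead of dividing. A sketch with hypothetical helper names (`knownStats`, `meanGreater`); the entropy lists are copied from the definitions above, with Korean as `none` so that `filterMap` drops it, matching "computed over languages with known entropy".

```lean
/-- Sum and count of the known (non-`none`) values. -/
def knownStats (xs : List (Option Nat)) : Nat × Nat :=
  let known := xs.filterMap id
  (known.foldl (fun (acc x : Nat) => acc + x) 0, known.length)

/-- Whether mean(xs) > mean(ys) over known values, decided by
    cross-multiplication (sx / nx > sy / ny iff sx * ny > sy * nx for
    positive counts), so no division or Float is needed. -/
def meanGreater (xs ys : List (Option Nat)) : Bool :=
  let (sx, nx) := knownStats xs
  let (sy, ny) := knownStats ys
  decide (nx > 0) && decide (ny > 0) && decide (sx * ny > sy * nx)

-- Entropy × 1000 for the four exceptions, copied from the definitions above.
def exceptionEntropy : List (Option Nat) := [some 347, some 315, some 375, some 372]

-- The 50 efficient languages in the order listed above; Korean (32nd) is `none`.
def efficientEntropy : List (Option Nat) :=
  ([195, 247, 178, 337, 110, 397, 279, 254, 169, 171,
    207, 144, 271, 328, 250, 280, 191, 429, 435, 211,
    357, 186, 280, 219, 173, 59, 290, 237, 216, 24, 147].map some) ++ [none] ++
  ([262, 185, 239, 220, 142, 223, 267, 252, 244,
    309, 228, 229, 149, 246, 313, 85, 87, 320].map some)

#eval (knownStats exceptionEntropy, knownStats efficientEntropy)
#eval meanGreater exceptionEntropy efficientEntropy
```

With these values the comparison is 1409 / 4 (about 352) against 11271 / 49 (about 230).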

Slovak has the lowest G value of all 54 languages (G = 0.07), i.e. the least evidence for optimization.
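Finding the lowest-G language is an argmin fold. This sketch uses a hypothetical stand-in `GRow` and helper `lowestG`, and runs only over the four exception rows (values copied from the definitions above); since the exceptions sit at the bottom of the G ranking, the overall minimum is among them.

```lean
/-- Hypothetical stand-in with only the fields this needs. -/
structure GRow where
  name : String
  gMean1000 : Nat

/-- Row with the smallest G × 1000 (`none` for an empty list); earlier rows win ties. -/
def lowestG : List GRow → Option GRow
  | [] => none
  | r :: rs =>
    some (rs.foldl (fun (best x : GRow) => if x.gMean1000 < best.gMean1000 then x else best) r)

-- The four exceptions, values copied from the definitions above.
def exceptionGRows : List GRow :=
  [⟨"Latvian", 490⟩, ⟨"North Sami", 370⟩, ⟨"Polish", 100⟩, ⟨"Slovak", 70⟩]

#eval (lowestG exceptionGRows).map GRow.name
```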

theorem HahnDegenFutrell2021.most_efficient_fully_optimized :
(List.filter (fun (x : LanguageEfficiency) => decide (x.gMean1000 = 1000)) efficientLanguages).length = 42

42 out of 50 efficient languages have G = 1.0 (fully optimized: the real language beats every sampled baseline grammar).

ISO codes appearing in @cite{futrell-gibson-2020}'s 32-language dataset.

ISO codes appearing in this study's 54-language dataset.

Languages in both datasets (by ISO code).

At least 20 languages appear in both datasets.
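Matching the two datasets by ISO code is a list intersection, and picking out the shared exceptions is one more filter. A sketch with hypothetical helper names; the module's actual code lists are not rendered in this documentation, so the example runs on toy inputs only. With the real lists, the second step should yield `["pl"]`, per polish_only_shared_exception.

```lean
/-- ISO codes present in both lists, kept in the order of the first list. -/
def sharedCodes (xs ys : List String) : List String :=
  xs.filter (fun c => ys.contains c)

/-- The shared codes whose language is an exception in this study. -/
def sharedExceptions (shared excCodes : List String) : List String :=
  shared.filter (fun c => excCodes.contains c)

-- ISO codes of the four exceptions, copied from the definitions above.
def exceptionCodes : List String := ["lv", "sme", "pl", "sk"]

-- Toy inputs for illustration only: NOT the real 32- and 54-language code lists.
#eval sharedExceptions (sharedCodes ["en", "pl", "de"] ["de", "en", "pl", "lv"]) exceptionCodes
```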

theorem HahnDegenFutrell2021.shared_languages_mostly_efficient :
(List.filter (fun (iso : String) => (List.filter (fun (x : LanguageEfficiency) => x.isoCode == iso) allLanguages).all fun (x : LanguageEfficiency) => x.moreEfficient) sharedIsoCodes).length ≥ sharedIsoCodes.length - 1

All shared languages except one (Polish) are efficient in this study.

theorem HahnDegenFutrell2021.polish_only_shared_exception :
List.filter (fun (iso : String) => (List.filter (fun (x : LanguageEfficiency) => x.isoCode == iso) allLanguages).any fun (x : LanguageEfficiency) => !x.moreEfficient) sharedIsoCodes = ["pl"]

Polish is the only shared language that is an exception.

Negative correlation between word-order freedom and optimization

Figure 13 of @cite{hahn-degen-futrell-2021} shows that branching direction entropy (x-axis) is negatively correlated with the surprisal difference between real and baseline orders (y-axis). Spearman ρ ≈ −.58, p < .0001.
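For reference, Spearman's ρ is the Pearson correlation of the two variables' ranks, with tied values sharing the average of the ranks they span. The sketch below is a plain `Float` implementation with hypothetical names (`avgRank`, `meanF`, `pearson`, `spearman`); it is not part of the module. It only illustrates the statistic: the module stores G rather than the surprisal differences on Figure 13's y-axis, so it cannot reproduce ρ ≈ −.58 from the data above, and `Float` results can be checked with `native_decide` but not kernel `decide`.

```lean
/-- Average rank of `x` within `xs` (1-based). Tied values share the mean of
    the ranks they span: in [1, 5, 5, 9] both 5s get rank 2.5. -/
def avgRank (xs : List Float) (x : Float) : Float :=
  let below := (xs.filter (fun y => decide (y < x))).length
  let tied := (xs.filter (fun y => y == x)).length
  below.toFloat + (tied.toFloat + 1.0) / 2.0

/-- Arithmetic mean (NaN for an empty list). -/
def meanF (xs : List Float) : Float :=
  xs.foldl (fun (acc x : Float) => acc + x) 0.0 / xs.length.toFloat

/-- Pearson correlation of two equal-length lists (NaN if either is constant). -/
def pearson (xs ys : List Float) : Float :=
  let mx := meanF xs
  let my := meanF ys
  let cov := (xs.zip ys).foldl (fun (acc : Float) p => acc + (p.1 - mx) * (p.2 - my)) 0.0
  let vx := xs.foldl (fun (acc a : Float) => acc + (a - mx) * (a - mx)) 0.0
  let vy := ys.foldl (fun (acc b : Float) => acc + (b - my) * (b - my)) 0.0
  cov / Float.sqrt (vx * vy)

/-- Spearman's rho: the Pearson correlation of the two lists' average ranks. -/
def spearman (xs ys : List Float) : Float :=
  pearson (xs.map (avgRank xs)) (ys.map (avgRank ys))

-- Toy data, not the paper's: a strictly decreasing relation has rho = -1.
#eval spearman [1.0, 2.0, 3.0, 4.0] [40.0, 30.0, 20.0, 10.0]
```

Computing each rank by counting smaller and equal values is quadratic, but it avoids depending on a particular sorting API and handles ties directly.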

This file does not compute the Spearman correlation itself, which would require a ranking function. Instead, it verifies the key structural claims that drive the correlation:

                                                                                                                              theorem HahnDegenFutrell2021.rigid_order_languages_efficient :
                                                                                                                              ((List.filter (fun (l : LanguageEfficiency) => match l.branchDirEntropy1000 with | some e => decide (e < 300) | none => false) allLanguages).all fun (x : LanguageEfficiency) => x.moreEfficient) = true

                                                                                                                              Languages with known low branching entropy (< 300) are all efficient. This is the left side of Figure 13: rigid-order languages cluster at high surprisal difference (strong optimization).
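The theorems in this section repeat the same `match` on `branchDirEntropy1000`. Factoring it into two threshold predicates makes one consequence explicit: a language whose entropy is `none` satisfies neither predicate. Such a language is therefore excluded both from the rigid-order group and from every high-entropy group. The predicate names are hypothetical and are not defined in the module. Note also that `List.all` over an empty list returns `true`. The rigid-order claim is non-vacuous only because the filtered group is nonempty (Afrikaans, with entropy 195, is one such language).

```lean
-- Hypothetical helpers: the `match` on `branchDirEntropy1000` used in the
-- theorems above, factored into threshold predicates.
def entropyBelow (t : Nat) : Option Nat → Bool
  | some e => decide (e < t)
  | none   => false

def entropyAtLeast (t : Nat) : Option Nat → Bool
  | some e => decide (e ≥ t)
  | none   => false

-- A missing value satisfies neither predicate.
#eval (entropyBelow 300 none, entropyAtLeast 300 none)  -- (false, false)
-- Afrikaans (195) falls in the rigid group; Estonian (435) in the high-entropy group.
#eval (entropyBelow 300 (some 195), entropyAtLeast 315 (some 435))  -- (true, true)
```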

                                                                                                                              theorem HahnDegenFutrell2021.exceptions_all_high_entropy :
(exceptionLanguages.all fun (l : LanguageEfficiency) => match l.branchDirEntropy1000 with | some e => decide (e ≥ 315) | none => false) = true

                                                                                                                              All 4 exceptions have entropy ≥ 315. This is the lower-right of Figure 13: exceptions cluster at high entropy.

                                                                                                                              theorem HahnDegenFutrell2021.high_entropy_not_sufficient :
((List.filter (fun (l : LanguageEfficiency) => match l.branchDirEntropy1000 with | some e => decide (e ≥ 315) | none => false) allLanguages).any fun (x : LanguageEfficiency) => x.moreEfficient) = true

Not all high-entropy languages are exceptions. In this sample, word-order freedom is necessary but not sufficient for being an exception: Estonian (entropy 435) and Finnish (357), for example, are efficient despite high entropy.
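The necessary-but-not-sufficient pattern amounts to two checks. First, every exception passes the high-entropy filter. Second, the same filter also admits efficient languages. The minimal sketch below uses a hypothetical `Row` type, the Estonian and Finnish entropies quoted above, and one invented placeholder exception. It illustrates the logic of `exceptions_all_high_entropy` and `high_entropy_not_sufficient`, not the module's data.

```lean
-- Hypothetical, trimmed-down row type.
structure Row where
  name : String
  moreEfficient : Bool
  entropy1000 : Option Nat

def highEntropy (r : Row) : Bool :=
  match r.entropy1000 with
  | some e => decide (e ≥ 315)
  | none   => false

-- Estonian and Finnish use the entropies quoted above; "ExceptionX" is a
-- placeholder row whose entropy is invented, not taken from the paper.
def rows : List Row :=
  [⟨"Estonian", true, some 435⟩, ⟨"Finnish", true, some 357⟩,
   ⟨"ExceptionX", false, some 400⟩]

-- Necessity (in this sample): every exception is high-entropy.
#eval (rows.filter (fun r => !r.moreEfficient)).all highEntropy  -- true
-- Non-sufficiency: some high-entropy language is nonetheless efficient.
#eval (rows.filter highEntropy).any (fun r => r.moreEfficient)   -- true
```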

                                                                                                                              theorem HahnDegenFutrell2021.low_entropy_higher_mean_g :
have lowEntropy := List.filter (fun (l : LanguageEfficiency) => match l.branchDirEntropy1000 with | some e => decide (e < 250) | none => false) allLanguages; have highEntropy := List.filter (fun (l : LanguageEfficiency) => match l.branchDirEntropy1000 with | some e => decide (e ≥ 250) | none => false) allLanguages; List.foldl (fun (x1 x2 : ℕ) => x1 + x2) 0 (List.map (fun (x : LanguageEfficiency) => x.gMean1000) lowEntropy) / lowEntropy.length > List.foldl (fun (x1 x2 : ℕ) => x1 + x2) 0 (List.map (fun (x : LanguageEfficiency) => x.gMean1000) highEntropy) / highEntropy.length

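Partition the languages with a recorded entropy into low-entropy (< 250) and high-entropy (≥ 250) groups. The low-entropy group has the higher mean G, computed with ℕ division, so both means are floored. This two-group comparison is a coarse proxy for the negative correlation. It does not test whether mean G decreases monotonically with entropy.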
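The two-group comparison stands in for the rank correlation. For reference, Spearman's ρ itself only needs a ranking function. The procedure is to rank both variables, giving tied values their average rank, and then take the Pearson correlation of the ranks. The sketch below is plain `Float` code with hypothetical names; it is not part of the formalization and is not proved correct. It could be applied to `branchDirEntropy1000` and `gMean1000` after dropping languages whose entropy is `none`. The result would be related to, but not the same as, the paper's ρ ≈ −.58, which uses the surprisal difference rather than G.

```lean
/-- 1-based rank of `x` within `xs`; tied values share their mean rank. -/
def avgRank (xs : List Nat) (x : Nat) : Float :=
  let less  := (xs.filter (fun y => decide (y < x))).length
  let equal := (xs.filter (fun y => y == x)).length
  less.toFloat + (equal.toFloat + 1) / 2

def meanF (xs : List Float) : Float :=
  xs.foldl (fun (acc x : Float) => acc + x) 0 / xs.length.toFloat

/-- Pearson correlation of two equal-length lists (NaN if either is constant). -/
def pearson (xs ys : List Float) : Float :=
  let mx := meanF xs
  let my := meanF ys
  let cov := (xs.zip ys).foldl
    (fun (acc : Float) (p : Float × Float) => acc + (p.1 - mx) * (p.2 - my)) 0
  let vx := xs.foldl (fun (acc x : Float) => acc + (x - mx) * (x - mx)) 0
  let vy := ys.foldl (fun (acc y : Float) => acc + (y - my) * (y - my)) 0
  cov / Float.sqrt (vx * vy)

/-- Spearman's ρ: the Pearson correlation of tie-averaged ranks. -/
def spearman (xs ys : List Nat) : Float :=
  pearson (xs.map (avgRank xs)) (ys.map (avgRank ys))

-- Perfectly decreasing toy data: ρ = -1.
#eval spearman [1, 2, 3, 4] [40, 30, 20, 10]
```

Computing each rank by counting smaller and equal values is quadratic but avoids any dependence on a sorting API, which is adequate for 54 languages.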

                                                                                                                              Information locality generalizes dependency locality #

@cite{hahn-degen-futrell-2021} argue (§"Other Kinds of Memory Bottlenecks" and Discussion) that information locality generalizes dependency length minimization (DLM). DLM minimizes the structural distance between syntactically related words, while information locality concentrates predictive information (the mutual information between words) at short distances.

The HarmonicOrder module proves that consistent head direction yields shorter dependency chains (harmonic_always_shorter). The present study's data show that languages with low branching entropy all achieve efficient memory-surprisal trade-offs (rigid_order_languages_efficient). Low entropy means more consistent head direction and hence, by the HarmonicOrder result, shorter dependencies. Together, these two results support the chain: harmonic order → short dependencies → information locality → efficient trade-off. The link from short dependencies to information locality is the paper's argument and is not itself formalized here.

                                                                                                                              The DLM harmonic order prediction holds: consistent head direction produces shorter total dependency length (from HarmonicOrder.lean).
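The DLM prediction rests on a simple quantity: total dependency length. The hypothetical helper below sums |dependent − head| over a list of arcs; it is not the HarmonicOrder module's definition. It is applied to a standard three-word illustration: a harmonic head-initial phrase (verb, preposition, noun) and a disharmonic one (verb, noun, postposition).

```lean
/-- Total dependency length: the sum of |dependent − head| over all arcs,
given as (dependent position, head position) pairs. Hypothetical helper,
not the `HarmonicOrder` API. -/
def totalDepLength (arcs : List (Nat × Nat)) : Nat :=
  arcs.foldl (fun (acc : Nat) (p : Nat × Nat) =>
    acc + (if p.1 ≥ p.2 then p.1 - p.2 else p.2 - p.1)) 0

-- Position 0 is the verb in both orders.
-- Harmonic V P N: adposition (1) depends on verb (0), noun (2) on adposition (1).
-- Disharmonic V N P: adposition (2) depends on verb (0), noun (1) on adposition (2).
#eval (totalDepLength [(1, 0), (2, 1)], totalDepLength [(2, 0), (1, 2)])  -- (2, 3)
```

In the disharmonic order, the adposition heading the verb's dependent sits two positions from the verb. The harmonic order keeps both arcs at length 1.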

                                                                                                                              theorem HahnDegenFutrell2021.dlm_to_efficiency_chain :
DepGrammar.HarmonicOrder.dlmPredictsHarmonicCheaper = true ∧ ((List.filter (fun (l : LanguageEfficiency) => match l.branchDirEntropy1000 with | some e => decide (e < 300) | none => false) allLanguages).all fun (x : LanguageEfficiency) => x.moreEfficient) = true

The chain as a single statement: the DLM prediction holds, and all languages with low entropy (consistent direction, short dependencies) are efficient. The theorem is a conjunction that places the structural argument (HarmonicOrder) next to the information-theoretic result (memory-surprisal efficiency). It does not derive the second from the first.

                                                                                                                              WALS Language Validation #

The study identifies languages by their Universal Dependencies codes, which are ISO 639-1 (2-letter) codes wherever one exists. WALS entries are matched by ISO 639-3 (3-letter) codes. This mapping connects the two, enabling family-classification cross-checks against WALS v2020.4.

Coverage: 51 of 54 languages have WALS entries (missing: Buryat, Croatian, Serbian). Of these 51, 42 have identical family names; the other 9 differ only in terminology (Turkic/Altaic, Japonic/Japanese, Kra-Dai/Tai-Kadai, etc.).

Study codes that are already ISO 639-3 codes pass through unchanged. For macrolanguages (Arabic, Chinese, Persian, Estonian), the mapping points to the specific ISO 639-3 variety used in WALS.
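A lookup with pass-through can be sketched as follows. The four pairs are standard ISO 639-1 to ISO 639-3 correspondences, but the table is a toy subset and the names `toyIso1to3` and `toIso3` are hypothetical. The module's actual `iso1to3` lists all 54 languages and may encode pass-through entries explicitly rather than through a fallback. The sketch also does not model the macrolanguage redirections.

```lean
-- Toy subset of standard ISO 639-1 → 639-3 pairs; not the module's `iso1to3`.
def toyIso1to3 : List (String × String) :=
  [("en", "eng"), ("de", "deu"), ("fi", "fin"), ("ja", "jpn")]

/-- Look up the ISO 639-3 code; codes absent from the table pass through unchanged. -/
def toIso3 (code : String) : String :=
  match toyIso1to3.find? (fun p => p.1 == code) with
  | some p => p.2
  | none   => code

-- "pcm" (Naija) is already a 3-letter code, so it passes through.
#eval ["de", "ja", "pcm"].map toIso3  -- ["deu", "jpn", "pcm"]
```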

                                                                                                                              def HahnDegenFutrell2021.iso1to3 :
                                                                                                                              List (String × String)

                                                                                                                              ISO 639-1 (study) → ISO 639-3 (WALS) mapping for the 54 languages.


                                                                                                                                Look up a study language's WALS entry via its ISO code.


                                                                                                                                  51 of 54 study languages have WALS entries.

                                                                                                                                  theorem HahnDegenFutrell2021.wals_missing :
                                                                                                                                  List.map (fun (x : LanguageEfficiency) => x.name) (List.filter (fun (x : LanguageEfficiency) => (walsLookup x).isNone) allLanguages) = ["Buryat", "Croatian", "Serbian"]

                                                                                                                                  The 3 languages without WALS entries are Buryat, Croatian, and Serbian.

                                                                                                                                  theorem HahnDegenFutrell2021.wals_family_agreement_count :
                                                                                                                                  (List.filter (fun (l : LanguageEfficiency) => match walsLookup l with | some w => w.family == l.family | none => false) walsMatchedLanguages).length = 42

Of the WALS-matched languages, exactly 42 have a study family name identical to the WALS family name.

                                                                                                                                  theorem HahnDegenFutrell2021.wals_family_divergence_count :
                                                                                                                                  (List.filter (fun (l : LanguageEfficiency) => match walsLookup l with | some w => w.family != l.family | none => false) walsMatchedLanguages).length = 9

                                                                                                                                  The 9 family-name divergences (all terminological, not errors):

                                                                                                                                  • Basque: study "Isolate" vs WALS "Basque"
                                                                                                                                  • Japanese: "Japonic" vs "Japanese"
                                                                                                                                  • Kazakh/Turkish/Uyghur: "Turkic" vs "Altaic" (Altaic hypothesis disputed)
                                                                                                                                  • Korean: "Koreanic" vs "Korean"
                                                                                                                                  • Naija: "Creole" vs "other"
                                                                                                                                  • Thai: "Kra-Dai" vs "Tai-Kadai"
                                                                                                                                  • Vietnamese: "Austroasiatic" vs "Austro-Asiatic" (hyphenation)
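One way to check mechanically that these nine divergences are purely terminological is to compare families modulo an alias table. The sketch below is hypothetical and not part of the module. It treats exactly the (study, WALS) pairs listed above as equivalent. Assuming the data use exactly these spellings, all 51 WALS-matched languages would then agree. Listing Turkic/Altaic as an alias is bookkeeping and does not endorse the Altaic hypothesis.

```lean
-- Hypothetical alias table built from the nine divergences listed above:
-- (study family, WALS family) pairs counted as the same family.
def familyAliases : List (String × String) :=
  [("Isolate", "Basque"), ("Japonic", "Japanese"), ("Turkic", "Altaic"),
   ("Koreanic", "Korean"), ("Creole", "other"), ("Kra-Dai", "Tai-Kadai"),
   ("Austroasiatic", "Austro-Asiatic")]

/-- Exact match, or one of the listed terminological variants. -/
def familiesAgree (study wals : String) : Bool :=
  study == wals || familyAliases.any (fun p => p.1 == study && p.2 == wals)

#eval familiesAgree "Indo-European" "Indo-European"  -- true
#eval familiesAgree "Turkic" "Altaic"                -- true
#eval familiesAgree "Turkic" "Uralic"                -- false
```

Matching on pairs rather than normalizing single names keeps a generic label such as "Isolate" from matching arbitrary WALS families.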