Standard Average European: A Linguistic Area #
@cite{haspelmath-2001}
Formalization of @cite{haspelmath-2001}'s argument that the core European languages — the Romance, Germanic, Balkan, and Balto-Slavic families plus the westernmost Finno-Ugric languages — form a Sprachbund (linguistic area) called Standard Average European (SAE), defined by a dozen shared structural features that are absent in the geographically and genealogically adjacent languages.
This study instantiates the framework-neutral schema in
Theories.Diachronic.Areal: SAE is a LinguisticArea SAELanguage SAEFeature
whose feature set is the twelve major Europeanisms of §2 of the paper, whose
reference frame is the four samples §1 demands (area, cofamilial, adjacent,
world), and whose areality is verified feature-by-feature against
@cite{haspelmath-2001}'s Maps 107.1–107.12.
The cluster-map gradience of §4 (most notably the Charlemagne nucleus
formed by French and German) is recovered automatically from the discrete
feature data via LinguisticArea.clusterScore and nucleus.
Architectural notes #
- Languages and features are local inductives: the paper surveys a
specific sample and committing to its boundaries is appropriate here.
A
SAELanguage.toWALSbridge maps each language to its WALS code where one exists. - The paper's Maps 107.1–107.12 are the primary source for every
isogloss. Each paper-based isogloss (e.g.
articleLgs,vplusNILgs) is the explicit set Haspelmath plots on the corresponding map. For the six features with a directly comparable WALS chapter (37A, 38A, 47A, 101A, 107A, 115A, 121A), a siblingwals*Lgsset is derived fromData.WALS.F*.allDataviawalsClassifies. The two are intentionally not unioned: where they disagree, that disagreement is a fact about Haspelmath's classification vs. WALS's encoding and should remain visible to readers and to bridge theorems. - Isoglosses are
Finset SAELanguage(the schema'sIsoglossis a transparent abbreviation forFinset L). All filter predicates are decidable, so the four areality criteria reduce todecideagainst the computable rationalIsogloss.density.
Languages surveyed by @cite{haspelmath-2001}, partitioned by their role in the four reference samples (area / cofamilial / adjacent / world).
The list follows the paper's coverage but is necessarily a subset — the maps include Sami, Mordvin, Komi, Udmurt, Mari, Tatar, Kalmyk, Lezgian, etc. We retain enough of each subgroup to make the four samples non-empty and to preserve the paper's headline findings (French/German nucleus, Celtic/Basque/Turkic margin).
- French : SAELanguage
- Italian : SAELanguage
- Spanish : SAELanguage
- Portuguese : SAELanguage
- Romanian : SAELanguage
- Catalan : SAELanguage
- German : SAELanguage
- Dutch : SAELanguage
- English : SAELanguage
- Swedish : SAELanguage
- Norwegian : SAELanguage
- Danish : SAELanguage
- Icelandic : SAELanguage
- Russian : SAELanguage
- Polish : SAELanguage
- Czech : SAELanguage
- Bulgarian : SAELanguage
- SerboCroatian : SAELanguage
- Ukrainian : SAELanguage
- Greek : SAELanguage
- Albanian : SAELanguage
- Macedonian : SAELanguage
- Latvian : SAELanguage
- Lithuanian : SAELanguage
- Hungarian : SAELanguage
- Finnish : SAELanguage
- Estonian : SAELanguage
- Welsh : SAELanguage
- Irish : SAELanguage
- Breton : SAELanguage
- Basque : SAELanguage
- Turkish : SAELanguage
- Maltese : SAELanguage
- Persian : SAELanguage
- HindiUrdu : SAELanguage
- Armenian : SAELanguage
- Lezgian : SAELanguage
- Georgian : SAELanguage
- Mongolian : SAELanguage
- Indonesian : SAELanguage
- Yoruba : SAELanguage
- Japanese : SAELanguage
- Mandarin : SAELanguage
- Swahili : SAELanguage
- Quechua : SAELanguage
Instances For
Equations
- Phenomena.ArealTypology.Studies.Haspelmath2001.instDecidableEqSAELanguage x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
WALS code for each SAELanguage where one exists. WALS codes are 3-letter
identifiers used by the World Atlas of Language Structures (the v2020.4 codes
in Data.WALS.Features.*). Returns none for languages outside the WALS
sample (currently: every SAELanguage constructor maps to a code, but the
return type is Option String to accommodate future additions to
SAELanguage that may not be in WALS).
Equations
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.French.toWALS = some "fre"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Italian.toWALS = some "ita"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Spanish.toWALS = some "spa"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Portuguese.toWALS = some "por"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Romanian.toWALS = some "rom"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Catalan.toWALS = some "ctl"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.German.toWALS = some "ger"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Dutch.toWALS = some "dut"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.English.toWALS = some "eng"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Swedish.toWALS = some "swe"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Norwegian.toWALS = some "nor"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Danish.toWALS = some "dsh"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Icelandic.toWALS = some "ice"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Russian.toWALS = some "rus"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Polish.toWALS = some "pol"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Czech.toWALS = some "cze"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Bulgarian.toWALS = some "bul"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.SerboCroatian.toWALS = some "scr"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Ukrainian.toWALS = some "ukr"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Greek.toWALS = some "grk"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Albanian.toWALS = some "alb"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Macedonian.toWALS = some "mcd"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Latvian.toWALS = some "lat"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Lithuanian.toWALS = some "lit"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Hungarian.toWALS = some "hun"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Finnish.toWALS = some "fin"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Estonian.toWALS = some "est"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Welsh.toWALS = some "wel"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Irish.toWALS = some "iri"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Breton.toWALS = some "bre"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Basque.toWALS = some "bsq"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Turkish.toWALS = some "tur"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Maltese.toWALS = some "mlt"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Persian.toWALS = some "prs"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.HindiUrdu.toWALS = some "hin"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Armenian.toWALS = some "arm"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Lezgian.toWALS = some "lez"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Georgian.toWALS = some "geo"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Mongolian.toWALS = some "kha"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Indonesian.toWALS = some "ind"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Yoruba.toWALS = some "yor"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Japanese.toWALS = some "jpn"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Mandarin.toWALS = some "mnd"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Swahili.toWALS = some "swa"
- Phenomena.ArealTypology.Studies.Haspelmath2001.SAELanguage.Quechua.toWALS = some "qcu"
Instances For
Generic WALS classifier: language l has a value in WALS chapter data
that satisfies the boolean predicate pred. Returns False when l lacks
a WALS code or the chapter has no entry for it.
This is the bridging primitive for all WALS-grounded isoglosses below. Used as
walsClassifies Data.WALS.F121A.allData (· == .particle) to ask "does WALS
121A classify this language as having a particle comparative?". The result
is propositional with a derivable Decidable instance, so it slots directly
into Finset.filter.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
The twelve "Europeanisms" identified in §2 of @cite{haspelmath-2001}. Each is the subject of one of Maps 107.1–107.12.
- definiteIndefiniteArticles : SAEFeature
§2.1, Map 107.1: Both definite and indefinite articles present.
- relativeClausesWithRelPro : SAEFeature
§2.2, Map 107.2: Postnominal relative clauses introduced by an inflecting relative pronoun (e.g. der/die/das, qui/que).
- havePerfect : SAEFeature
§2.3, Map 107.3: Transitive perfect formed by 'have' + past participle.
- nominativeExperiencers : SAEFeature
§2.4, Map 107.4: Predominant generalization of experiencer-as-nominative (English-style I like it) over inverting (it pleases me).
- participialPassive : SAEFeature
§2.5, Map 107.5: Canonical participial passive with copula + participle.
- anticausativeProminence : SAEFeature
§2.6, Map 107.6: Anticausative-prominent inchoative–causative pairs.
- dativeExternalPossessor : SAEFeature
§2.7, Map 107.7: Dative external possessors (e.g. German Die Mutter wäscht dem Kind die Haare).
- negativePronounsNoVerbalNeg : SAEFeature
§2.8, Map 107.8: Negative pronouns without obligatory verbal negation (V + NI type, e.g. French personne ne vient, German niemand kommt).
- particleComparative : SAEFeature
§2.9, Map 107.9: Particle comparatives (English than, Latin quam).
- relativeBasedEquative : SAEFeature
§2.10, Map 107.10: Equative constructions based on relative-clause structure (Catalan tan Z com X).
- strictAgreement : SAEFeature
§2.11, Map 107.11: Strict subject agreement — subject affixes that cannot stand alone with referential force (German ich arbeite, not arbeite).
- intensifierReflexiveDifferentiation : SAEFeature
§2.12, Map 107.12: Differentiated intensifier vs. reflexive forms (German selbst vs. sich, Russian sam vs. sebja).
Instances For
Equations
- Phenomena.ArealTypology.Studies.Haspelmath2001.instDecidableEqSAEFeature x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
WALS-derived: languages classified by F37A (Definite Articles) as having a definite article (any of the three positive values).
Equations
- One or more equations did not get rendered due to their size.
Instances For
WALS-derived: languages classified by F38A (Indefinite Articles) as having any indefinite article distinct from "no indefinite article" or "indefinite-only of an unrelated kind".
Equations
- One or more equations did not get rendered due to their size.
Instances For
WALS-derived intersection: languages with both a definite and an indefinite article in the WALS data, matching @cite{haspelmath-2001} §2.1.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Languages with both definite and indefinite articles (Map 107.1).
The paper's reading: Romance, Germanic (except Icelandic), Greek, Albanian, Macedonian, Bulgarian (the edin particle is treated as a budding indefinite article), and Hungarian. Icelandic is excluded — it has a suffixed definite article but no indefinite article.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Languages with relative-pronoun relative clauses (Map 107.2).
Equations
- One or more equations did not get rendered due to their size.
Instances For
Languages with the 'have'-perfect (Map 107.3).
@cite{haspelmath-2001} §2.3 restricts this isogloss to the Romance and Germanic families plus a Balkan/peripheral fringe — Albanian, Greek, Macedonian (an innovation: ima + verbal adjective), and (parts of) Czech. Bulgarian and Serbo-Croatian retain the inherited Slavic 'be'+l-participle perfect rather than the 'have' construction.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Languages with predominant nominative-experiencer coding (Map 107.4).
Equations
- One or more equations did not get rendered due to their size.
Instances For
WALS-derived parallel: languages classified by F107A (Passive Constructions) as having a passive present. F107A counts any passive (periphrastic, morphological, etc.), so this is a strict superset of Haspelmath's copula+participle criterion.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Languages with a canonical participial passive (Map 107.5).
The Romance and Germanic copula+participle pattern, extended through Slavic and Baltic and into Greek, Albanian, Macedonian, and Irish.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Anticausative-prominent languages (Map 107.6: ≥ 70% anticausative).
@cite{haspelmath-2001} §2.6 / Map 107.6 marks only the languages whose inchoative–causative pairs are anticausative-prominent on the Haspelmath 1993 figures. Romance is partially excluded (only French/Romanian register as prominent; Italian/Spanish/Portuguese fall on the causative-prominent side). The full SAE marking is German, French, Russian, Greek, Romanian, Lithuanian, English.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Languages with dative external possessors (Map 107.7).
Equations
- One or more equations did not get rendered due to their size.
Instances For
WALS-derived parallel: languages classified by F115A (Negative Indefinite Pronouns and Predicate Negation) as not requiring predicate negation alongside the negative pronoun. Strict subset of Haspelmath's criterion: F115A.noPredicateNegation captures only the rigid V+NI type (predominantly Germanic), whereas Haspelmath's §2.8 also includes Romance languages where the predicate negative is optional or weakening.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Languages with V + NI negation (no obligatory verbal negation; Map 107.8).
Equations
- One or more equations did not get rendered due to their size.
Instances For
WALS-derived parallel: languages classified by F121A (Comparative Constructions) as having a particle comparative.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Languages with particle comparatives (Map 107.9).
Equations
- One or more equations did not get rendered due to their size.
Instances For
Languages with relative-based equatives (Map 107.10).
Equations
- One or more equations did not get rendered due to their size.
Instances For
WALS-derived parallel: languages classified by F101A (Expression of Pronominal Subjects) as requiring obligatory subject pronouns.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Languages with strict subject agreement (Map 107.11).
A language has strict agreement when subject pronouns are obligatory even in the presence of subject agreement on the verb (i.e., subject-agreement affixes lack referential force on their own). Russian and the pro-drop Romance languages (Italian, Spanish, Portuguese, Romanian) fail this criterion. Welsh has rich agreement but allows pro-drop, so it is also excluded.
Equations
- One or more equations did not get rendered due to their size.
Instances For
WALS-derived parallel: languages classified by F47A (Intensifiers and Reflexive Pronouns) as having differentiated forms.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Languages with differentiated intensifier vs. reflexive forms (Map 107.12).
Equations
- One or more equations did not get rendered due to their size.
Instances For
Dispatch from feature to its isogloss (as a Finset).
Equations
- One or more equations did not get rendered due to their size.
Instances For
Dispatch to the schema's Isogloss = Finset SAELanguage type.
Equations
Instances For
The four reference samples for evaluating areality, per @cite{haspelmath-2001} §1.
area: the core European languages (Romance, Germanic, Balkan, Balto-Slavic, marginal Finno-Ugric) that the paper proposes as SAE.cofamilial: other Indo-European branches (eastern IE: Iranian, Indic, Armenian) that lie outside the area; presence of a feature in these would point to PIE inheritance rather than areal contact.adjacent: geographically adjacent non-SAE languages (Celtic west, Turkic- Nakh-Daghestanian east, Semitic south); presence here would suggest a wider regional drift rather than a strictly European phenomenon.
world: a small worldwide sample for the (iv) "not common worldwide" criterion.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Standard Average European as a LinguisticArea: the 12 diagnostic
features of @cite{haspelmath-2001} §2 over the European/cofamilial/
adjacent/world reference frame.
LinguisticArea does not require every diagnostic feature to satisfy
the strict IsAreal predicate at any particular threshold — and
indeed, several SAE features (anticausative prominence, V+NI negation,
strict agreement) do not pass strict majority on Haspelmath's own data.
This matches the paper: @cite{haspelmath-2001}'s actual argument runs
through the cluster maps of §4, not through per-feature majority
thresholds.
Per-feature IsArealAt claims for the strongly-attested subset are
proved separately below; clusterScore and nucleus are computed
across all 12.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Definite + indefinite articles (Map 107.1) is areal at the strict 1/2 threshold: ubiquitous in the area, absent from cofamilial/adjacent/world samples.
The 'have'-perfect (Map 107.3) is areal at the strict 1/2 threshold.
Particle comparatives (Map 107.9) are areal at the strict 1/2 threshold.
French sits in the SAE cluster nucleus: it exhibits all 12 diagnostic features (the maximum cluster score).
German sits in the SAE cluster nucleus alongside French — @cite{haspelmath-2001} §4's Charlemagne Sprachbund core.
The SAE feature inventory has 12 features, matching Maps 107.1–107.12 of @cite{haspelmath-2001}.