Documentation

Linglib.Theories.Diachronic.Areal

Linguistic Areas (Sprachbünde) #

@cite{haspelmath-2001}

A framework-neutral schema for typological linguistic areas (Sprachbünde) — groups of geographically contiguous, often genealogically heterogeneous languages that share structural features attributable to contact rather than inheritance.

The schema follows the methodology of @cite{haspelmath-2001} but is gradient-first, in the spirit of his §4 cluster maps: the primary data per feature is an ArealProfile (a 4-tuple of densities — in-area, in-cofamilial, in-adjacent, in-world). Binary judgments are derived from profiles by comparison against an ArealThresholds parameter, defaulting to the qualitative 1/2 cut.

Layered structure #

Design notes #

Isogloss is Finset L, not Set L: typology works over finite samples, and Finset gives us computable density without Classical. Set algebra (, , , \) is inherited via mathlib's Finset lattice.

Reference samples carry Finset.Nonempty proofs. @cite{haspelmath-2001} evaluates his criteria against actual evidence — there is no "vacuous absence" in the paper — so requiring non-emptiness eliminates the degenerate 0/0 = 0 density and lets the rational and Nat formulations of the criteria agree on all in-use cases (cf. Isogloss.density_gt_iff / density_lt_iff).

The threshold layer is parameterized rather than hard-coded: this lets later modules (e.g. cluster-map studies, Grambank-style probabilistic treatments) replicate paper-specific cutoffs without rewriting the predicate.

Relation to contemporary typology #

The discrete-criterion framing is faithful to @cite{haspelmath-2001} but is now widely seen as a first approximation. Subsequent work has questioned whether any principled threshold separates Sprachbund from non-Sprachbund, and large databases (Grambank, AUTOTYP) support gradient, probabilistic treatments. The profile/threshold split here is the bridge: a future GradientLinguisticArea sibling can replace Isogloss = Finset L with L → ℚ strength functions and reuse ArealProfile and ArealThresholds unchanged.

@[reducible, inline]

An isogloss is the set of languages exhibiting a given linguistic feature. Following standard typological practice, isoglosses are sets of languages, not the geometric boundaries between them.

This is a transparent abbreviation for Finset L; all finset operations (, , , , \, Finset.card) apply directly, and every membership question is decidable.

Equations
Instances For
    def Theories.Diachronic.Areal.Isogloss.density {L : Type u} [DecidableEq L] (I : Isogloss L) (S : Finset L) :

    The proportion of languages in a finite reference sample S that lie inside the isogloss I. With Finset operations this is a computable rational in [0, 1] whenever S is non-empty; on S = ∅ it returns the ℚ-default 0/0 = 0.

    Equations
    • I.density S = (S I).card / S.card
    Instances For
      @[simp]
      theorem Theories.Diachronic.Areal.Isogloss.density_empty {L : Type u} [DecidableEq L] (I : Isogloss L) :
      I.density = 0
      theorem Theories.Diachronic.Areal.Isogloss.density_nonneg {L : Type u} [DecidableEq L] (I : Isogloss L) (S : Finset L) :
      0 I.density S
      theorem Theories.Diachronic.Areal.Isogloss.density_le_one {L : Type u} [DecidableEq L] (I : Isogloss L) (S : Finset L) :
      I.density S 1
      theorem Theories.Diachronic.Areal.Isogloss.density_gt_iff {L : Type u} [DecidableEq L] (I : Isogloss L) {S : Finset L} (hS : S.Nonempty) (θ : ) :
      I.density S > θ θ * S.card < (S I).card

      Bridge from the rational threshold form to a Nat-arithmetic form. Requires S non-empty so that density is not the degenerate 0 / 0. This is the lemma proofs rw to reach a decide-amenable goal.

      theorem Theories.Diachronic.Areal.Isogloss.density_lt_iff {L : Type u} [DecidableEq L] (I : Isogloss L) {S : Finset L} (hS : S.Nonempty) (θ : ) :
      I.density S < θ (S I).card < θ * S.card
      theorem Theories.Diachronic.Areal.Isogloss.density_gt_half_iff {L : Type u} [DecidableEq L] (I : Isogloss L) {S : Finset L} (hS : S.Nonempty) :
      I.density S > 1 / 2 S.card < 2 * (S I).card

      Specialization of density_gt_iff to the 1/2 threshold. The Nat-form RHS is fully kernel-reducible on concrete Finset data, so downstream decide calls do not need to traverse Rat.div.

      theorem Theories.Diachronic.Areal.Isogloss.density_lt_half_iff {L : Type u} [DecidableEq L] (I : Isogloss L) {S : Finset L} (hS : S.Nonempty) :
      I.density S < 1 / 2 2 * (S I).card < S.card

      Specialization of density_lt_iff to the 1/2 threshold.

      The four reference samples needed to evaluate whether a feature is areal, following @cite{haspelmath-2001} §1's criteria (i)–(iv).

      Non-emptiness is required by construction: @cite{haspelmath-2001}'s methodology presupposes positive evidence on each sample, and an empty sample provides no information about the criterion it is meant to address.

      For Standard Average European, the area is the core European languages, the cofamilial sample is the eastern Indo-European branches (Iranian, Indic, Armenian) — branches of Indo-European that lie outside the area, used to rule out Proto-Indo-European inheritance — the adjacent sample is the geographically neighboring non-Indo-European languages (Turkic, Nakh-Daghestanian, Afro-Asiatic), and the world sample is a representative global typological sample.

      • area : Finset L

        The candidate area itself (e.g. core European languages for SAE).

      • cofamilial : Finset L

        Other branches of the same families as the area's languages, used to rule out genealogical inheritance from a deep common ancestor. For SAE: eastern Indo-European (Iranian, Indic, Armenian).

      • adjacent : Finset L

        Languages geographically adjacent to but outside the area, used to rule out a worldwide tendency that just happens to extend a bit further. For SAE: Turkic, Nakh-Daghestanian, Celtic (in some criteria), etc.

      • world : Finset L

        A representative worldwide sample, used to rule out the feature being a cross-linguistic universal preference rather than an areal phenomenon.

      • area_nonempty : self.area.Nonempty
      • cofamilial_nonempty : self.cofamilial.Nonempty
      • adjacent_nonempty : self.adjacent.Nonempty
      • world_nonempty : self.world.Nonempty
      Instances For

        The areal profile of an isogloss against a reference: the four densities that @cite{haspelmath-2001}'s methodology compares. This is the gradient datum from which both threshold-based binary judgments (IsArealAt) and §4-style cluster maps are derived.

        Profiles are framework-agnostic: a profile with inArea = 0.85, inCofamilial = 0.10, inAdjacent = 0.05, inWorld = 0.20 is a canonical "areal" pattern regardless of any threshold convention.

        • inArea :

          Density of the feature in the candidate area.

        • inCofamilial :

          Density of the feature in the cofamilial sample.

        • inAdjacent :

          Density of the feature in the geographically adjacent sample.

        • inWorld :

          Density of the feature in the worldwide sample.

        Instances For
          Equations
          • One or more equations did not get rendered due to their size.
          Instances For

            The profile of an isogloss against a reference frame.

            Equations
            Instances For

              The maximum density outside the area — the most permissive value that the "lacks elsewhere" criteria must beat.

              Equations
              Instances For

                A natural areality score: in-area density minus the max outside density. Range [-1, 1], with higher values indicating stronger areal patterning. This is one of several reasonable score aggregations and is exposed for convenience; downstream studies are free to define their own.

                Equations
                Instances For

                  Numerical thresholds for @cite{haspelmath-2001}'s qualitative criteria.

                  @cite{haspelmath-2001} reads "the great majority" and "lack" qualitatively; this structure exposes the cutoffs so that:

                  • the default ⟨1/2, 1/2⟩ is the natural first approximation,
                  • stricter conventions (e.g. inside = 3/4, outside = 1/4) can be specified per-study,
                  • a future probabilistic generalization can plug in here without disturbing the rest of the schema.
                  • inside :

                    "Majority inside" cutoff: a feature is areal-at-T only if the area's density exceeds this.

                  • outside :

                    "Lacks outside" cutoff: cofamilial / adjacent / world densities must fall below this.

                  Instances For
                    structure Theories.Diachronic.Areal.IsArealAt {L : Type u} [DecidableEq L] (T : ArealThresholds) (R : ArealReference L) (I : Isogloss L) :

                    The four-part Haspelmath criterion at a chosen ArealThresholds.

                    Each field captures one of @cite{haspelmath-2001} §1 criteria (i)–(iv), parameterized by the threshold T. With T = default this is the plain "majority in / minority out" reading.

                    • area_majority : I.density R.area > T.inside

                      Criterion (i): the area-internal density beats the inside threshold.

                    • cofamilial_lacks : I.density R.cofamilial < T.outside

                      Criterion (iii): cofamilial density falls below the outside threshold, ruling out common genealogical inheritance.

                    • adjacent_lacks : I.density R.adjacent < T.outside

                      Criterion (ii): adjacent-non-area density falls below the outside threshold, ruling out a wider regional drift.

                    • not_worldwide : I.density R.world < T.outside

                      Criterion (iv): worldwide density falls below the outside threshold, ruling out a universal cross-linguistic preference.

                    Instances For
                      @[reducible, inline]
                      abbrev Theories.Diachronic.Areal.IsAreal {L : Type u} [DecidableEq L] (R : ArealReference L) (I : Isogloss L) :

                      Areality at the default Haspelmath threshold (⟨1/2, 1/2⟩).

                      Equations
                      Instances For
                        structure Theories.Diachronic.Areal.LinguisticArea (L : Type u) (F : Type v) [DecidableEq L] :
                        Type (max u v)

                        A linguistic area (Sprachbund) parameterized by a language type L and a feature index type F.

                        An area bundles the three pieces of data needed for cluster-map analysis: a reference frame, a finite set of diagnostic features, and an isogloss assignment. The cluster-map methodology of @cite{haspelmath-2001} §4 is recovered from this data via featureProfile (per-feature gradient), clusterScore (per-language count), and isopleth / nucleus (cluster bands and core).

                        LinguisticArea deliberately does not require every diagnostic feature to satisfy IsArealAt at any particular threshold. Real Sprachbund analyses (including @cite{haspelmath-2001}'s own SAE) propose feature inventories where strong, weak, and tendency-only features coexist; the binary IsArealAt predicate is applied à la carte by downstream studies for whichever subset passes the threshold of interest.

                        • reference : ArealReference L

                          The reference frame against which areality is judged.

                        • features : Finset F

                          The diagnostic features proposed for this Sprachbund.

                        • isogloss : FIsogloss L

                          The isogloss assigned to each feature: which languages exhibit it.

                        Instances For

                          The areal profile of feature f: its four densities under the area's reference frame. The primary gradient datum per feature.

                          Equations
                          Instances For
                            def Theories.Diachronic.Areal.LinguisticArea.featureScore {L : Type u} {F : Type v} [DecidableEq L] (A : LinguisticArea L F) (f : F) :

                            The contrast score of feature f: in-area density minus the max outside density. Range [-1, 1], with IsArealAt features all scoring above T.inside - T.outside.

                            Equations
                            Instances For
                              def Theories.Diachronic.Areal.LinguisticArea.clusterScore {L : Type u} {F : Type v} [DecidableEq L] (A : LinguisticArea L F) (l : L) :

                              The cluster score of a language: how many of the area's features it exhibits. This recovers @cite{haspelmath-2001} §4's cluster-map gradient membership from the discrete feature-by-feature data.

                              Equations
                              Instances For

                                The cluster score is bounded above by the total number of features.

                                def Theories.Diachronic.Areal.LinguisticArea.isopleth {L : Type u} {F : Type v} [DecidableEq L] (A : LinguisticArea L F) (k : ) :
                                Set L

                                The k-isopleth: the set of languages exhibiting at least k of the area's features. Used to draw cluster maps at varying tightness.

                                Equations
                                Instances For
                                  theorem Theories.Diachronic.Areal.LinguisticArea.isopleth_anti {L : Type u} {F : Type v} [DecidableEq L] (A : LinguisticArea L F) {j k : } (h : j k) :
                                  A.isopleth k A.isopleth j

                                  Isopleths are antitone in k: a stricter threshold yields a smaller set.

                                  def Theories.Diachronic.Areal.LinguisticArea.nucleus {L : Type u} {F : Type v} [DecidableEq L] (A : LinguisticArea L F) :
                                  Set L

                                  The nucleus: languages exhibiting all but at most one of the area's features. For SAE, @cite{haspelmath-2001} §4 identifies French and German as the nuclear members — the Charlemagne Sprachbund.

                                  Nat subtraction truncates at zero, so features.card = 0 gives the trivial isopleth 0 = Set.univ; the SAE study verifies the non-degenerate case where features.card = 12.

                                  Equations
                                  Instances For