
Linglib.Phenomena.Polarity.Studies.Haspelmath1997

Haspelmath (1997): Polarity-Side Indefinite Typology

@cite{haspelmath-1997} @cite{haspelmath-2013} @cite{kadmon-landman-1993} @cite{ladusaw-1979} @cite{wals-2013}

Haspelmath, Martin (1997). Indefinite Pronouns. Oxford Studies in Typology and Linguistic Theory. Oxford University Press.

Polarity-side projection of @cite{haspelmath-1997}'s 9-function implicational map for indefinite pronouns. Where the sibling file Phenomena/Indefinites/Studies/Haspelmath1997.lean formalises the indefinite-typology angle (Fragment-derived IndefiniteParadigms for a 6-language sample, with WALS-bridge theorems checking the F46A classification), this file owns the polarity-side claims:

The substrate (HaspelmathFunction, IndefiniteEntry, IndefiniteParadigm, MorphologicalBasis, contiguity / coverage / disjointness predicates, wals46A and converters) lives in Typology/Indefinite.lean.
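As a rough picture of the substrate named above, the following is a minimal sketch and not the actual contents of Typology/Indefinite.lean. The type names come from this page; the constructor names specificKnown and specificUnknown (for SK and SU), the basis constructors (modelled on the five WALS 46A values), and all field names other than functions, basis and isoCode are assumptions.

```lean
-- Hypothetical mirror of the substrate; the real definitions may differ
-- in field names, field types, and constructor names.
namespace Sketch

/-- The nine functions of Haspelmath's implicational map. -/
inductive HaspelmathFunction where
  | specificKnown | specificUnknown | irrealis
  | question | conditional | indirectNeg
  | directNeg | comparative | freeChoice
  deriving DecidableEq, Repr

/-- Morphological basis of a form (constructor names assumed). -/
inductive MorphologicalBasis where
  | interrogative | genericNoun | special | mixed | existential
  deriving DecidableEq, Repr

/-- One indefinite series and the map functions it covers. -/
structure IndefiniteEntry where
  form : String
  functions : List HaspelmathFunction
  basis : MorphologicalBasis
  deriving Repr

/-- A language's full set of indefinite series. -/
structure IndefiniteParadigm where
  language : String
  isoCode : String
  entries : List IndefiniteEntry
  deriving Repr

/-- Number of series in a paradigm. -/
def IndefiniteParadigm.formCount (p : IndefiniteParadigm) : Nat :=
  p.entries.length

end Sketch
```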

Sample

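17 typologically diverse languages: English, Russian, German, Japanese, Mandarin, Turkish, Hindi-Urdu, Italian, Finnish, Korean, Hungarian, Georgian, Quechua (Imbabura), Yoruba, Thai, Tagalog, and Swahili.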

The 17 paradigms are hand-stipulated here rather than derived from Fragments/{Lang}/Indefinites.lean. The per-form IndefiniteEntry.functions field commits to a particular analysis of how forms partition the 9-function map, and the polarity-side analysis (Haspelmath 1997's contiguity-driven encoding) genuinely differs from the existing Fragment-side analysis (Degano-Aloni 2025 / Bubnov 2026's competition-driven encoding) on the three of the 17 languages where Fragments already exist (English, German, Russian).

Concrete disagreement: Haspelmath polarity-view English some- covers {SK, SU} only, with any- (NPI) owning {irrealis, question, conditional, indirectNeg}; the D&A-shape Fragment's someEntry covers {SK, SU, irrealis} with no any- form. This is a real analytical disagreement, not a missing-data gap.
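The English disagreement can be pictured as an extensional inequality between two function lists. This is a minimal sketch: the names somePolarityView and someFragmentView are hypothetical, the constructor names for SK and SU are assumed, and plain lists stand in for whatever set representation the library uses.

```lean
inductive HaspelmathFunction where
  | specificKnown | specificUnknown | irrealis
  | question | conditional | indirectNeg
  | directNeg | comparative | freeChoice
  deriving DecidableEq

open HaspelmathFunction

-- Polarity-side view of English some-: {SK, SU}.
def somePolarityView : List HaspelmathFunction :=
  [specificKnown, specificUnknown]

-- Fragment-side view of English some-: {SK, SU, irrealis}.
def someFragmentView : List HaspelmathFunction :=
  [specificKnown, specificUnknown, irrealis]

-- The two encodings differ extensionally; `decide` checks it by evaluation.
theorem some_views_disagree : somePolarityView ≠ someFragmentView := by
  decide
```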

Audit history (see project_indefinite_substrate_contested.md memory note):

The Fragment-vs-Studies disagreement reflects two published analyses and is lifted to theorem level in Phenomena/Indefinites/Studies/Bubnov2026.lean §11: fragment_polarity_disagree_on_kto_to proves the Russian case and fragment_polarity_disagree_on_some proves the English case. Both are decide-checked extensional inequalities on the Haspelmath function sets. The source of the disagreement is documented there: D&A read profiles theoretically (semantic permission); Bubnov reads them distributionally (actual coverage net of paradigmatic competition with sibling forms). Promotion of the 14 missing-language paradigms to Fragments is deferred on the same grounds: each promoted paradigm would have to pick a classification, replicating the disagreement at more sites.

Relation to Indefinites/Studies/Haspelmath1997.lean

CLAUDE.md permits placing the same paper's formalisation under multiple phenomena when the contributions split cleanly. The split here:

WALSCount is imported from Linglib/Data/WALS/Aggregation.lean.

WALS Ch 46 distribution (N = 326).


Helpers (wals46A, formCount, allFunctions, AllContiguous, CoversAllFunctions, FormsDisjoint, IndefiniteEntry.coverage) are defined on IndefiniteParadigm / IndefiniteEntry in Typology/Indefinite.lean. The Prop-valued predicates have Decidable instances; theorems use them directly without = true tails (mathlib idiom).
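The Prop-plus-Decidable idiom can be illustrated roughly as follows. This is a sketch under assumptions: the real CoversAllFunctions is defined on IndefiniteParadigm, whereas this stand-in takes a bare list of per-form function lists, and the constructor names for SK and SU are assumed. The English split used in the example is the one given on this page.

```lean
inductive Fn where
  | specificKnown | specificUnknown | irrealis
  | question | conditional | indirectNeg
  | directNeg | comparative | freeChoice
  deriving DecidableEq

def allFunctions : List Fn :=
  [.specificKnown, .specificUnknown, .irrealis, .question, .conditional,
   .indirectNeg, .directNeg, .comparative, .freeChoice]

/-- Hypothetical stand-in: every map function occurs in some form. -/
def CoversAllFunctions (forms : List (List Fn)) : Prop :=
  ∀ f ∈ allFunctions, ∃ fs ∈ forms, f ∈ fs

-- The predicate is a Prop; decidability is supplied as an instance.
instance (forms : List (List Fn)) : Decidable (CoversAllFunctions forms) :=
  inferInstanceAs (Decidable (∀ f ∈ allFunctions, ∃ fs ∈ forms, f ∈ fs))

-- English: some- / any- (NPI) / no- / any- (FC).
example : CoversAllFunctions
    [[.specificKnown, .specificUnknown],
     [.irrealis, .question, .conditional, .indirectNeg],
     [.directNeg],
     [.comparative, .freeChoice]] := by
  decide

-- A single some- series does not cover the map.
example : ¬ CoversAllFunctions [[.specificKnown, .specificUnknown]] := by
  decide
```

Stating the theorem as a bare proposition, with no `= true` suffix, leaves `decide` to find the instance and evaluate it.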

English (Indo-European): 4 series, generic-noun-based. some- (SK+SU) / any- NPI (irrealis through indirectNeg) / no- (directNeg) / any- FC (comparative+freeChoice).

Russian (Slavic): 6 series, interrogative-based. Textbook map example.

Per @cite{degano-aloni-2025} Table 2 (the most recent canonical classification): кое- = Specific Known {SK}, -то = Epistemic {SU, NS}, -нибудь = Non-specific {NS}. Note that -то and -нибудь both cover NS: D&A 2025 explicitly observe (p. 960) that "Russian speakers tend to select -нибудь for NS and -то for SU", but both forms admit NS uses. The paradigm therefore violates FormsDisjoint (which is a Prop predicate on IndefiniteParadigm, not a structural requirement; D&A's analysis treats overlapping forms as the empirical norm to be explained, not a violation).

Fragments/Slavic/Russian/Indefinites.lean encodes -то more narrowly as {SU} only, following @cite{bubnov-2026}'s subsequent argument that paradigmatic competition with -нибудь narrows -то's actual distribution. The Fragment-vs-Studies divergence here reflects two published analyses, not a bug: D&A 2025 (this file's encoding) vs Bubnov 2026 (the Fragment's encoding). Both are referenced from their respective consumer chains.

The polarity-region forms (-либо for {question, conditional, indirectNeg}, никто for directNeg, кто угодно for {comparative, freeChoice}) extend the SK/SU/NS triangle with the polarity span Haspelmath's map covers beyond it.

German (Indo-European): 5 series, mixed bases (jemand generic-noun, irgend- special).

Japanese (Japonic): 3 series, interrogative-based. wh + particle.

Mandarin (Sino-Tibetan): 2 series, mixed (yǒu rén existential, shéi interrogative).

Turkish (Turkic): 5 series, generic-noun-based (bir- 'one').

Hindi-Urdu (Indo-Aryan): 3 series, special (koii).

Italian (Romance): 3 series, generic-noun-based.

Finnish (Uralic): 4 series, special (joku/kukaan morphemes).

Korean (Koreanic): 4 series, interrogative-based (wh + particle).

Hungarian (Uralic): 4 series, interrogative-based (vala- / akár-).

Georgian (Kartvelian): 4 series, interrogative-based.

Quechua (Imbabura): 4 series, special. (Not in WALS F46A's sample.)

Yoruba (Niger-Congo): 2 series, generic-noun-based (ẹnìkan 'person').

Thai (Kra-Dai): 3 series, interrogative-based.

Tagalog (Austronesian): 4 series, existential construction.

Swahili (Bantu): 3 series, generic-noun-based (mtu 'person').

All language paradigms in the polarity-typology sample (17 languages).

@cite{haspelmath-1997}'s key constraint: every form covers a contiguous region on the implicational map.
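One way to make "contiguous region" concrete is connectivity in the map's adjacency graph. The sketch below is illustrative only: the edge set is one reading of Haspelmath's published map and may not match the library's encoding, the Bool-valued isContiguous is a hypothetical stand-in for the Prop-valued predicate used here, and the constructor names for SK and SU are assumed.

```lean
inductive Fn where
  | specificKnown | specificUnknown | irrealis
  | question | conditional | indirectNeg
  | directNeg | comparative | freeChoice
  deriving DecidableEq

/-- One direction of each edge in the assumed map layout. -/
def edge : Fn → Fn → Bool
  | .specificKnown, .specificUnknown => true
  | .specificUnknown, .irrealis => true
  | .irrealis, .question => true
  | .irrealis, .conditional => true
  | .question, .conditional => true
  | .question, .indirectNeg => true
  | .indirectNeg, .directNeg => true
  | .indirectNeg, .comparative => true
  | .conditional, .comparative => true
  | .comparative, .freeChoice => true
  | _, _ => false

/-- Symmetric closure of `edge`. -/
def linked (a b : Fn) : Bool := edge a b || edge b a

/-- Members of `fs` that are already reached or adjacent to a reached one. -/
def grow (fs reached : List Fn) : List Fn :=
  fs.filter fun f => reached.contains f || reached.any (fun r => linked r f)

/-- Apply `grow` a fixed number of times. -/
def saturate (fs : List Fn) : Nat → List Fn → List Fn
  | 0, reached => reached
  | n + 1, reached => saturate fs n (grow fs reached)

/-- `fs` is contiguous if everything in it is reachable from its first member
without leaving `fs`. -/
def isContiguous (fs : List Fn) : Bool :=
  match fs with
  | [] => true
  | f :: _ => fs.all fun g => (saturate fs fs.length [f]).contains g

#eval isContiguous [.question, .conditional, .indirectNeg]  -- true
#eval isContiguous [.specificKnown, .irrealis]              -- false
```

The first list is the span given above for Russian -либо; the second skips SU, so it is not connected under this edge set.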

16 of 17 languages in the sample have disjoint forms (no function appears in two different forms). Russian is the exception: per @cite{degano-aloni-2025} Table 2, both -то (Epistemic, {SU, NS}) and -нибудь (Non-specific, {NS}) cover NS. D&A treat overlapping forms as a real empirical phenomenon to be explained (see the Russian paragraph on p. 960), not a violation. FormsDisjoint is a Prop predicate on IndefiniteParadigm, not a structural requirement, so paradigms failing it are well-formed; we just enumerate the witnesses.

Russian fails FormsDisjoint per D&A 2025: -то ({SU, NS}) and -нибудь ({NS}) overlap on NS.
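The Russian overlap can be checked on the two forms in isolation. This is a sketch under assumptions: PairDisjoint is a hypothetical two-form simplification of FormsDisjoint, NS is identified with the irrealis function of the map, and the constructor name for SU is assumed.

```lean
inductive Fn where
  | specificKnown | specificUnknown | irrealis
  | question | conditional | indirectNeg
  | directNeg | comparative | freeChoice
  deriving DecidableEq

-- -то: {SU, NS}, with NS assumed to be the map's irrealis function.
def toFunctions : List Fn := [.specificUnknown, .irrealis]

-- -нибудь: {NS}.
def nibudFunctions : List Fn := [.irrealis]

/-- Hypothetical pairwise version: no function of `xs` occurs in `ys`. -/
def PairDisjoint (xs ys : List Fn) : Prop := ∀ f ∈ xs, f ∉ ys

instance (xs ys : List Fn) : Decidable (PairDisjoint xs ys) :=
  inferInstanceAs (Decidable (∀ f ∈ xs, f ∉ ys))

-- The shared NS function is the witness of the overlap.
example : ¬ PairDisjoint toFunctions nibudFunctions := by
  decide
```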

The 16 non-Russian languages in the sample do satisfy FormsDisjoint.

Coverage + contiguity theorem. The disjointness conjunct from the earlier all_languages_partition is dropped because Russian breaks it per D&A 2025, leaving the universal claim that every paradigm covers all nine functions with each form covering a contiguous region.

Mandarin (2 forms) has fewer forms than Russian (6 forms), and its total coverage is at most Russian's. (Equality held when Russian's 6 forms were disjoint with total coverage 9 = Mandarin's; per @cite{degano-aloni-2025} Russian -то now covers {SU, NS} rather than {SU}, so Russian's total coverage rises to 10 > Mandarin's 9 and the relation weakens to ≤.)
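The arithmetic behind the weakened relation can be spelled out as follows. This is only a sketch: the per-form sizes are read off the Russian description on this page (кое- 1, -то 2, -нибудь 1, -либо 3, никто 1, кто угодно 2), Mandarin's total of 9 is taken as stated, and the names are hypothetical.

```lean
-- Number of map functions covered by each Russian form, in the order
-- кое-, -то, -нибудь, -либо, никто, кто угодно.
def russianFormSizes : List Nat := [1, 2, 1, 3, 1, 2]

/-- Total coverage: the sum of per-form coverage, counting overlaps twice. -/
def totalCoverage (sizes : List Nat) : Nat := sizes.foldl (· + ·) 0

-- Mandarin's total coverage, as stated in the text.
def mandarinTotal : Nat := 9

example : totalCoverage russianFormSizes = 10 := by decide
example : mandarinTotal ≤ totalCoverage russianFormSizes := by decide
```

Because NS is counted once for -то and once for -нибудь, the Russian total exceeds the nine functions of the map.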

Count of languages with a given number of forms.
theorem Phenomena.Polarity.Studies.Haspelmath1997.language_form_counts :
  List.map (fun (p : Typology.Indefinite.IndefiniteParadigm) => (p.isoCode, p.formCount)) allLanguages = [("eng", 4), ("rus", 6), ("deu", 5), ("jpn", 3), ("cmn", 2), ("tur", 5), ("hin", 3), ("ita", 3), ("fin", 4), ("kor", 4), ("hun", 4), ("kat", 4), ("qvi", 4), ("yor", 2), ("tha", 3), ("tgl", 4), ("swh", 3)]

Per-language form-count summary for the 17-language sample.

16 of 17 languages appear in WALS F46A; Quechua (Imbabura, ISO qvi) is absent. The polarity-side annotations of basis : MorphologicalBasis on each form derive a paradigm-level F46A classification via IndefiniteParadigm.toWALS46A. For the polarity sample, however, the paradigm-derived value may differ from WALS for languages whose forms span multiple bases (e.g., German, mixed). We therefore verify the lookupISO-derived classification rather than the structural derivation.

The 16 languages in our sample other than Quechua appear in WALS F46A.

Wh-based indefinite languages (Japanese, Korean, Mandarin, Thai).

Negative concord languages (Russian, Italian, Hungarian).

In some neg-concord language, directNeg is in a multi-function form.

Languages classified as interrogative-based in WALS 46A.

Minimum form count in our sample.

Maximum form count in our sample.

Total number of distinct forms across the sample.
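These summary statistics can be recomputed from the list in language_form_counts. The sketch below copies that list verbatim; the definition names are hypothetical and need not match the library's.

```lean
-- (ISO code, form count) pairs, copied from language_form_counts.
def formCounts : List (String × Nat) :=
  [("eng", 4), ("rus", 6), ("deu", 5), ("jpn", 3), ("cmn", 2), ("tur", 5),
   ("hin", 3), ("ita", 3), ("fin", 4), ("kor", 4), ("hun", 4), ("kat", 4),
   ("qvi", 4), ("yor", 2), ("tha", 3), ("tgl", 4), ("swh", 3)]

def counts : List Nat := formCounts.map (·.2)

/-- Smallest form count, seeded with the first entry. -/
def minFormCount : Nat := counts.foldl (fun acc n => min acc n) (counts.headD 0)

/-- Largest form count. -/
def maxFormCount : Nat := counts.foldl (fun acc n => max acc n) 0

/-- Sum of form counts over the whole sample. -/
def totalForms : Nat := counts.foldl (· + ·) 0

/-- Number of languages with exactly `n` forms. -/
def languagesWithFormCount (n : Nat) : Nat :=
  (counts.filter (· == n)).length

#eval minFormCount              -- 2
#eval maxFormCount              -- 6
#eval totalForms                -- 63
#eval languagesWithFormCount 4  -- 7
```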

Verify that each language's Fragments/{Lang}/PolarityItems.lean NPI entries are licensed in contexts corresponding to the polarity-typology profile's polarity-sensitive forms.