Haspelmath (1997): Polarity-Side Indefinite Typology #
@cite{haspelmath-1997} @cite{haspelmath-2013} @cite{kadmon-landman-1993} @cite{ladusaw-1979} @cite{wals-2013}
Haspelmath, Martin (1997). Indefinite Pronouns. Oxford Studies in Typology and Linguistic Theory. Oxford University Press.
Polarity-side projection of @cite{haspelmath-1997}'s 9-function
implicational map for indefinite pronouns. Where the sibling file
Phenomena/Indefinites/Studies/Haspelmath1997.lean formalises the
indefinite-typology angle (Fragment-derived IndefiniteParadigms for a
6-language sample, with WALS-bridge theorems checking the F46A
classification), this file owns the polarity-side claims:
- 17-language sample with hand-stipulated IndefiniteParadigm instances
- NPI-cluster + FC-region + neg-concord theorems
- WALS Ch 46 distribution and per-language wals46A lookups
- FragmentBridges decide-checks against Fragments/{Lang}/PolarityItems.lean entries (verifies that NPI items the polarity Fragments declare are licensed in the contexts the Haspelmath polarity-sensitive forms cover)
The substrate (HaspelmathFunction, IndefiniteEntry, IndefiniteParadigm,
MorphologicalBasis, contiguity / coverage / disjointness predicates,
wals46A and converters) lives in Typology/Indefinite.lean.
Sample #
17 typologically diverse languages:
- Indo-European: English, Russian, German, Italian, Hindi-Urdu
- Uralic: Finnish, Hungarian
- Turkic: Turkish
- Sino-Tibetan: Mandarin
- Japonic / Koreanic: Japanese, Korean
- Kartvelian: Georgian
- Quechuan: Quechua (Imbabura)
- Niger-Congo: Yoruba, Swahili
- Kra-Dai: Thai
- Austronesian: Tagalog
The 17 paradigms are hand-stipulated here rather than derived from
Fragments/{Lang}/Indefinites.lean because the per-form
IndefiniteEntry.functions field commits to a particular analysis of how
forms partition the 9-function map, and the polarity-side analysis
(Haspelmath 1997's contiguity-driven encoding) genuinely differs from the
existing Fragment-side analysis (Degano-Aloni 2025 / Bubnov 2026's
competition-driven encoding) on three of the 17 languages where Fragments
already exist (English, German, Russian).
Concrete disagreement: Haspelmath polarity-view English some- covers
{SK, SU} only, with any- (NPI) owning {irrealis, question, conditional, indirectNeg}; the D&A-shape Fragment's someEntry covers {SK, SU, irrealis} with no any- form. This is a real analytical disagreement,
not a missing-data gap.
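The disagreement can be made concrete with a minimal Lean sketch. Everything here is a hypothetical stand-in: `Fn`, `somePolarity`, and `someFragment` are illustrative names, not the actual HaspelmathFunction or someEntry definitions, and plain lists replace whatever set type the substrate uses.

```lean
namespace SomeSketch

/-- Hypothetical stand-in for the nine map functions (names follow this page). -/
inductive Fn where
  | specificKnown | specificUnknown | irrealis | question | conditional
  | indirectNeg | directNeg | comparative | freeChoice
  deriving DecidableEq, Repr

open Fn

/-- Polarity-side reading of English some-: {SK, SU}. -/
def somePolarity : List Fn := [specificKnown, specificUnknown]

/-- Fragment-side (D&A-shape) reading of some-: {SK, SU, irrealis}. -/
def someFragment : List Fn := [specificKnown, specificUnknown, irrealis]

-- Witness of the extensional difference: irrealis is in one set but not the other.
example : irrealis ∈ someFragment ∧ irrealis ∉ somePolarity := by decide

end SomeSketch
```

Stating the difference through a witness function, rather than through list inequality, keeps the check independent of the order in which functions are listed.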
Audit history (see project_indefinite_substrate_contested.md memory note):
- A first-pass audit framed this as a 5-framework conflict and recommended
substrate evolution (split functions → attestedFunctions). That framing was wrong: re-audit verified actual writers are 2 (Fragments + this file); other consumers (D&A, Bubnov, Dekier, Chierchia, Modal Indefinites) are READ-only, parallel substrate, or no contact.
- A second-pass check against the most recent canonical paper (@cite{degano-aloni-2025}, How to be (non-)specific, L&P 2025) verified that D&A 2025 explicitly works within Haspelmath's 9-function map unchanged: they do NOT split irrealis into specific/nonspecific sub-functions. So substrate-level function-inventory refinement would put linglib ahead of the field.
- D&A 2025 Table 2 also explicitly allows two forms to cover the same
function (Russian -то and -нибудь both NS). This invalidates universal
FormsDisjoint as a constraint. The Russian paradigm here now follows D&A 2025's classification (-то = Epistemic {SU, NS}); see disjoint_languages_count + russian_not_disjoint.
Phenomena/Polarity/Studies/Chierchia2006.lean already consumes this file's italian / english / german / mandarin paradigms as substrate. Hand-stipulation here is therefore a working pattern with multiple consumers, not an isolated stipulation.
The Fragment-vs-Studies disagreement is two published analyses, lifted
to theorem level in Phenomena/Indefinites/Studies/Bubnov2026.lean §11:
fragment_polarity_disagree_on_kto_to proves the Russian case;
fragment_polarity_disagree_on_some proves the English case. Both are
decide-checked extensional inequalities on the Haspelmath function
sets. The disagreement source is documented there: D&A read profiles
theoretically (semantic permission); Bubnov reads them distributionally
(actual coverage net of paradigmatic competition with sibling forms).
Promotion of the 14 missing-language paradigms to Fragments is deferred
on the same grounds: each promoted paradigm would have to pick a
classification, replicating the disagreement at more sites.
Relation to Indefinites/Studies/Haspelmath1997.lean #
CLAUDE.md permits placing the same paper's formalisation under multiple phenomena when the contributions split cleanly. The split here:
- Indefinites/Studies/Haspelmath1997.lean: typological coverage of indefinites, advancing claims about IndefiniteParadigm's F46A bridge
- Phenomena/Polarity/Studies/Haspelmath1997.lean (this file): polarity-side projection, advancing claims about NPI/FC clustering, neg-concord, and Fragment-PolarityItem consistency
WALSCount is imported from Linglib/Data/WALS/Aggregation.lean.
WALS Ch 46 distribution (N = 326).
Helpers (wals46A, formCount, allFunctions, AllContiguous,
CoversAllFunctions, FormsDisjoint, IndefiniteEntry.coverage)
are defined on IndefiniteParadigm / IndefiniteEntry in
Typology/Indefinite.lean. The Prop-valued predicates have
Decidable instances; theorems use them directly without = true
tails (mathlib idiom).
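The idiom described above can be illustrated with a small self-contained sketch. `MiniParadigm`, `coversAllB`, and `CoversAll` are hypothetical names invented for this illustration, and the real predicates in Typology/Indefinite.lean may be defined quite differently; the point is only the shape: a Prop-valued predicate with a Decidable instance, so theorems state the Prop itself.

```lean
namespace DecidableSketch

/-- Hypothetical miniature paradigm: each form is a list of function indices 0..8. -/
structure MiniParadigm where
  forms : List (List Nat)

/-- The nine function indices. -/
def allIdx : List Nat := [0, 1, 2, 3, 4, 5, 6, 7, 8]

/-- Boolean worker: every index occurs in at least one form. -/
def coversAllB (p : MiniParadigm) : Bool :=
  allIdx.all (fun i => p.forms.any (fun f => f.contains i))

/-- Prop-valued predicate wrapping the Boolean worker. -/
def CoversAll (p : MiniParadigm) : Prop := coversAllB p = true

instance (p : MiniParadigm) : Decidable (CoversAll p) :=
  inferInstanceAs (Decidable (coversAllB p = true))

/-- A toy three-form paradigm covering all nine indices. -/
def toy : MiniParadigm := ⟨[[0, 1, 2], [3, 4, 5, 6], [7, 8]]⟩

-- Stated as a Prop and closed by `decide`, with no `= true` tail.
example : CoversAll toy := by decide

end DecidableSketch
```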
English (Indo-European): 4 series, generic-noun-based.
some- (SK+SU) / any- NPI (irrealis through indirectNeg) / no-
(directNeg) / any- FC (comparative+freeChoice).
Russian (Slavic): 6 series, interrogative-based. Textbook map example.
Per @cite{degano-aloni-2025} Table 2 (the most recent canonical
classification): кое- = Specific Known {SK}, -то = Epistemic
{SU, NS}, -нибудь = Non-specific {NS}. Note that -то AND -нибудь
BOTH cover NS — D&A 2025 explicitly observe (p. 960) that "Russian
speakers tend to select -нибудь for NS and -то for SU" but both
forms admit NS uses. The paradigm therefore violates FormsDisjoint
(which is a Prop predicate on IndefiniteParadigm, not a structural
requirement; D&A's analysis treats overlapping forms as the empirical
norm to be explained, not a violation).
Fragments/Slavic/Russian/Indefinites.lean encodes -то more narrowly
as {SU} only, following @cite{bubnov-2026}'s subsequent argument that
paradigmatic competition with -нибудь narrows -то's actual distribution.
The Fragment-vs-Studies divergence here is two published analyses,
not a bug: D&A 2025 (this file's encoding) vs Bubnov 2026 (Fragment's
encoding). Both are referenced from their respective consumer chains.
The polarity-region forms (-либо for {question, conditional, indirectNeg}, никто for directNeg, кто угодно for {comparative, freeChoice}) extend the SK/SU/NS triangle with the polarity span Haspelmath's map covers beyond it.
German (Indo-European): 5 series, mixed bases (jemand generic-noun, irgend- special).
Japanese (Japonic): 3 series, interrogative-based. wh + particle.
Mandarin (Sino-Tibetan): 2 series, mixed (yǒu rén existential, shéi interrogative).
Turkish (Turkic): 5 series, generic-noun-based (bir- 'one').
Hindi-Urdu (Indo-Aryan): 3 series, special (koii).
Italian (Romance): 3 series, generic-noun-based.
Finnish (Uralic): 5 series, special (joku/kukaan morphemes).
Korean (Koreanic): 4 series, interrogative-based (wh + particle).
Hungarian (Uralic): 4 series, interrogative-based (vala- / akár-).
Georgian (Kartvelian): 4 series, interrogative-based.
Quechua (Imbabura): 4 series, special. (Not in WALS F46A's sample.)
Yoruba (Niger-Congo): 2 series, generic-noun-based (ẹnìkan 'person').
Thai (Kra-Dai): 3 series, interrogative-based.
Tagalog (Austronesian): 4 series, existential construction.
Swahili (Bantu): 3 series, generic-noun-based (mtu 'person').
All language paradigms in the polarity-typology sample (17 languages).
@cite{haspelmath-1997}'s key constraint: every form covers a contiguous region on the implicational map.
Every language in the sample covers all nine functions on the map.
16 of 17 languages in the sample have disjoint forms (no function
appears in two different forms). Russian is the exception: per
@cite{degano-aloni-2025} Table 2, both -то (Epistemic, {SU, NS}) and
-нибудь (Non-specific, {NS}) cover NS. D&A treat overlapping forms as
a real empirical phenomenon to be explained — see the Russian paragraph
on p. 960 — not a violation. FormsDisjoint is a Prop predicate on
IndefiniteParadigm, not a structural requirement, so paradigms
failing it are well-formed; we just enumerate the witnesses.
Russian fails FormsDisjoint per D&A 2025: -то ({SU, NS}) and
-нибудь ({NS}) overlap on NS.
The 16 non-Russian languages in the sample DO satisfy FormsDisjoint.
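The Russian overlap can be reproduced in a minimal sketch. The names `Fn`, `overlap`, `formsDisjoint`, and `russian` are hypothetical stand-ins, with lists in place of the substrate's own types; the function sets are the ones quoted on this page from D&A 2025 Table 2 and the polarity-region forms.

```lean
namespace OverlapSketch

inductive Fn where
  | SK | SU | NS | question | conditional | indirectNeg | directNeg
  | comparative | freeChoice
  deriving DecidableEq, Repr

open Fn

/-- Two forms overlap if some function is listed in both. -/
def overlap (a b : List Fn) : Bool := a.any (fun f => b.contains f)

/-- Pairwise disjointness over a list of forms. -/
def formsDisjoint : List (List Fn) → Bool
  | [] => true
  | f :: rest => rest.all (fun g => !overlap f g) && formsDisjoint rest

/-- Russian as classified on this page (six series). -/
def russian : List (List Fn) :=
  [ [SK],                                  -- кое-
    [SU, NS],                              -- -то
    [NS],                                  -- -нибудь
    [question, conditional, indirectNeg],  -- -либо
    [directNeg],                           -- никто
    [comparative, freeChoice] ]            -- кто угодно

-- -то and -нибудь share NS, so the forms are not pairwise disjoint.
example : formsDisjoint russian = false := by decide

-- Total coverage is 10 rather than 9, because NS is counted twice.
example : (russian.map List.length).foldl (· + ·) 0 = 10 := by decide

end OverlapSketch
```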
Coverage + contiguity theorem (the disjointness conjunct from the
earlier all_languages_partition is dropped — Russian breaks it per
D&A 2025 — leaving the universal claim that every paradigm covers all
nine functions with each form covering a contiguous region).
Every language has a form covering direct negation.
Free choice and comparative are always in the same form.
Specific known and direct negation are never in the same form.
Mandarin (2 forms) has fewer forms than Russian (6 forms), but its total coverage is at most Russian's. (Equality held when Russian had 5 disjoint forms and total coverage 9 = Mandarin's; per @cite{degano-aloni-2025} Russian -то now covers {SU, NS} not {SU}, so total coverage rises to 10 > Mandarin's 9 — the relation weakens to ≤.)
The polarity cluster: in every language, some form covers at least two of {question, conditional, indirectNeg}.
Count of languages with a given number of forms.
Equations
- Phenomena.Polarity.Studies.Haspelmath1997.countByFormCount langs n = (List.filter (fun (p : Typology.Indefinite.IndefiniteParadigm) => p.formCount == n) langs).length
Instances For
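As a usage illustration of the equation above, the following sketch re-declares the helper over a hypothetical `MiniParadigm` that carries only a name and a form count. The counts are the series counts stated in the per-language descriptions on this page; the structure itself is an assumption made for the example, not the real IndefiniteParadigm.

```lean
namespace CountSketch

/-- Hypothetical stand-in: only the field the counting helper needs. -/
structure MiniParadigm where
  name : String
  formCount : Nat

/-- Same shape as the rendered equation: filter by form count, then take the length. -/
def countByFormCount (langs : List MiniParadigm) (n : Nat) : Nat :=
  (langs.filter (fun p => p.formCount == n)).length

/-- Series counts as listed in the per-language descriptions. -/
def sample : List MiniParadigm :=
  [ ⟨"English", 4⟩, ⟨"Russian", 6⟩, ⟨"German", 5⟩, ⟨"Japanese", 3⟩,
    ⟨"Mandarin", 2⟩, ⟨"Turkish", 5⟩, ⟨"Hindi-Urdu", 3⟩, ⟨"Italian", 3⟩,
    ⟨"Finnish", 5⟩, ⟨"Korean", 4⟩, ⟨"Hungarian", 4⟩, ⟨"Georgian", 4⟩,
    ⟨"Quechua", 4⟩, ⟨"Yoruba", 2⟩, ⟨"Thai", 3⟩, ⟨"Tagalog", 4⟩,
    ⟨"Swahili", 3⟩ ]

example : sample.length = 17 := by decide

-- Six languages have exactly four series.
example : countByFormCount sample 4 = 6 := by decide

-- Only Russian has six series.
example : countByFormCount sample 6 = 1 := by decide

end CountSketch
```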
Per-language form-count summary for the 17-language sample.
16 of 17 languages appear in WALS F46A; Quechua (Imbabura, iso qvi)
is absent. The Polarity-side annotations of basis : MorphologicalBasis
on each form derive a paradigm-level F46A classification via
IndefiniteParadigm.toWALS46A — but for the polarity sample, the
paradigm-derived value may differ from WALS for languages where the
forms span multiple bases (e.g., German mixed). We verify the
lookupISO-derived classification rather than the structural derivation.
WALS 46A morphological-source classification per language.
The 16 non-Quechua languages in our sample all appear in WALS F46A.
Wh-based indefinite languages (Japanese, Korean, Mandarin, Thai).
Negative concord languages (Russian, Italian, Hungarian).
In some neg-concord language, directNeg is in a multi-function form.
All four wh-based languages are interrogative-based or mixed in WALS 46A.
Languages classified as interrogative-based in WALS 46A.
{specificKnown, directNeg} skipping intermediates is non-contiguous.
{specificKnown, freeChoice} is non-contiguous.
{specificKnown, comparative} is non-contiguous.
{specificUnknown, directNeg} skipping intermediates is non-contiguous.
{specificKnown, specificUnknown} IS contiguous (adjacent).
{question, conditional, indirectNeg} IS contiguous (a path).
The full set of all nine functions is contiguous (the map is connected).
The NPI region (question through directNeg) is contiguous.
The FC region (comparative, freeChoice) is contiguous.
The specific/irrealis region is contiguous.
The full polarity-sensitive span (question through freeChoice) is contiguous.
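The contiguity facts listed above can be checked in a small sketch that treats the map as an undirected graph and tests whether a function set is connected within itself. The edge list is an assumption read off the map as drawn in Haspelmath (1997), not taken from Typology/Indefinite.lean, and `edges`, `adjacent`, `grow`, `reach`, and `contiguous` are hypothetical names.

```lean
namespace MapSketch

inductive Fn where
  | specificKnown | specificUnknown | irrealis | question | conditional
  | indirectNeg | directNeg | comparative | freeChoice
  deriving DecidableEq, Repr

open Fn

/-- Assumed edges of the implicational map (undirected). -/
def edges : List (Fn × Fn) :=
  [ (specificKnown, specificUnknown), (specificUnknown, irrealis),
    (irrealis, question), (irrealis, conditional),
    (question, conditional), (question, indirectNeg),
    (conditional, comparative), (indirectNeg, directNeg),
    (indirectNeg, comparative), (comparative, freeChoice) ]

def adjacent (a b : Fn) : Bool :=
  edges.contains (a, b) || edges.contains (b, a)

/-- Members of `s` that are already seen or adjacent to something seen. -/
def grow (s seen : List Fn) : List Fn :=
  s.filter (fun x => seen.contains x || seen.any (fun y => adjacent y x))

/-- Apply `grow` a fixed number of times (fuel keeps the recursion structural). -/
def reach (s : List Fn) : Nat → List Fn → List Fn
  | 0, seen => seen
  | n + 1, seen => reach s n (grow s seen)

/-- A set is contiguous if every member is reachable from the first inside the set. -/
def contiguous : List Fn → Bool
  | [] => true
  | x :: rest =>
    (x :: rest).all (fun y =>
      (reach (x :: rest) (rest.length + 1) [x]).contains y)

example : contiguous [specificKnown, specificUnknown] = true := by decide
example : contiguous [question, conditional, indirectNeg] = true := by decide
example : contiguous [comparative, freeChoice] = true := by decide
example : contiguous [specificKnown, directNeg] = false := by decide
example : contiguous [specificKnown, freeChoice] = false := by decide

end MapSketch
```

Using the set's own length as fuel is enough because each productive round of `grow` adds at least one new member, so the reachable part stabilises within that many rounds.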
Minimum form count in our sample.
Maximum form count in our sample.
Total number of distinct forms across the sample.
The most common form count in the sample is 4 (six languages).
Verify that each language's Fragments/{Lang}/PolarityItems.lean NPI
entries are licensed in contexts corresponding to the polarity-typology
profile's polarity-sensitive forms.