
Linglib.Phenomena.Case.Studies.Caha2009

Caha (2009): The Nanosyntax of Case

@cite{caha-2009} @cite{blake-1994}

Caha's central proposal (@cite{caha-2009} §1.1): the morphosyntactic representation of each case literally contains the representations of all cases below it on the universal hierarchy: [[[[[ NOM ] ACC ] GEN ] DAT ] P ]. This study file defines the Caha-specific containment predicate RespectsCahaContainment and applies it to each Fragment case inventory.

Caha's Universal Case sequence is NOM – ACC – GEN – DAT – INST – COM (@cite{caha-2009} (10b), p. 10); the Russian-specific sequence inserts a "prepositional" between GEN and DAT (@cite{caha-2009} (16), p. 12). Vocatives are explicitly excluded from Caha's scope (@cite{caha-2009} §1.1 fn. 4, p. 9). For the substrate's encoding of this hierarchy and how it relates to Caha's actual sequence, see Core/Case/Order.lean.

Of 22 Fragment case inventories, 19 conform. The three principled exceptions are Dargwa (ergative, whereas Caha's containment is keyed to accusative alignment), Finnish (DAT-less; ALL → DAT extension per @cite{blake-1994} Ch. 6), and Hungarian (GEN-less; dative-as-possessor syncretism per @cite{caha-2008} §5).

Caha containment-respect predicate

Does an inventory respect Caha's containment hierarchy? True iff inv is downward-closed under the canonical PartialOrder Case (Caha containment) defined in Core/Case/Order.lean: whenever c ∈ inv and d ≤ c, then d ∈ inv. Off-hierarchy cases (ERG, ABS, INST, COM, …) impose no constraint: in the Caha order each is comparable only to itself (c ≤ c), so for them the downward-closure condition holds trivially. An on-hierarchy case c of rank r forces every lower on-hierarchy case (ranks 0, …, r-1) into inv, which is exactly the prefix contiguity Caha demands.

Mathlib's IsLowerSet expresses the same condition; the Caha-named predicate is kept here for grep-ability and because the substantive claim is Caha-specific.
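As an illustration only, the downward-closure condition can be sketched in plain Lean 4 (no Mathlib) over a hypothetical SketchCase enum. The rank function is an assumption that follows the ranks quoted in this file (GEN = 2, DAT = 3, LOC = 4, with NOM and ACC below them) and sends every other case, the vocative included, to none. It is not Core.Case.Order.containmentRank itself.

```lean
/-- Hypothetical stand-in for the library's `Case` type (names are illustrative). -/
inductive SketchCase where
  | nom | acc | gen | dat | loc | ins | com | erg | abs | ess | voc

/-- Assumed containment rank: NOM 0, ACC 1, GEN 2, DAT 3, LOC 4; all else off-hierarchy. -/
def sketchRank : SketchCase → Option Nat
  | .nom => some 0
  | .acc => some 1
  | .gen => some 2
  | .dat => some 3
  | .loc => some 4
  | _    => none

/-- Downward closure: a member of rank `r` needs some member of each rank `0, …, r-1`;
    members with rank `none` impose nothing. -/
def respectsSketch (inv : List SketchCase) : Bool :=
  inv.all fun c =>
    match sketchRank c with
    | none   => true
    | some r => (List.range r).all fun s => inv.any fun d => sketchRank d == some s

#eval respectsSketch [.nom, .acc, .gen, .dat, .loc, .ins, .voc]  -- true
#eval respectsSketch [.voc]                                       -- true (vacuous)
#eval respectsSketch [.nom, .gen]                                 -- false (GEN without ACC)
```

Checking only the ranks below each member is enough in this sketch because the assumed rank is injective on the hierarchy; the Prop-level predicate described above instead quantifies over every d ≤ c.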


Slavic substrate: containment lemmas

Decoupled from Fragments/Slavic/Case.lean so that the Fragment substrate file does not pull in the Caha-specific containment predicate, which lives in this study file rather than in Core/.

Vacuous: Core.Case.Order.containmentRank .voc = none, so the vocative imposes no containment constraint. This faithfully encodes Caha's own scope choice (@cite{caha-2009} §1.1 fn. 4, p. 9: "Vocatives ... are ignored throughout this dissertation").

§ 1: Conformers (non-Slavic)

§ 2: Slavic conformers (one substrate proof, ten aliases)

Each per-language caseInventory is an abbrev alias of coreInventory, so the ten theorems below all reduce to slavicCore_respectsCaha. Cross-Slavic agreement is therefore structural, not coincidental. The aliases cover every modern Slavic language with productive case morphology (Bulgarian and Macedonian, which lost case in the noun system, have no Case.lean file).
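The aliasing pattern can be illustrated with a toy in plain Lean 4. All names are hypothetical and the predicate is a simplified stand-in for the containment check: because an abbrev is reducible and unfolds during type checking, each per-language theorem can reuse the core proof term directly.

```lean
/-- Hypothetical toy case type with three on-hierarchy cases. -/
inductive TCase where
  | nom | acc | gen
  deriving DecidableEq

/-- Toy stand-in for the containment check: GEN requires ACC, ACC requires NOM. -/
def closedToy (inv : List TCase) : Bool :=
  (!inv.contains .gen || inv.contains .acc) && (!inv.contains .acc || inv.contains .nom)

/-- The shared inventory, proved once. -/
abbrev coreInventoryToy : List TCase := [.nom, .acc, .gen]

theorem coreToy_ok : closedToy coreInventoryToy = true := by decide

-- Per-language aliases: the abbrev unfolds, so the core proof term type-checks as is.
abbrev russianInventoryToy := coreInventoryToy
abbrev czechInventoryToy := coreInventoryToy

theorem russianToy_ok : closedToy russianInventoryToy = true := coreToy_ok
theorem czechToy_ok : closedToy czechInventoryToy = true := coreToy_ok
```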

§ 3: Predicted violators

Dargwa is ergative, whereas Caha's containment is keyed to accusative alignment. ABS/ERG are off-hierarchy in containmentRank, so Dargwa's [abs, erg, gen, dat, com, ess] fails downward closure (GEN/DAT present without NOM/ACC).

Finnish has no dedicated dative: the allative (-lle) covers recipient function (@cite{blake-1994} Ch. 6, ALL → DAT extension; confirmed by @cite{karlsson-2017}). The inventory has rank 4 (LOC) without rank 3 (DAT).

Hungarian has no morphological genitive. Both standard reference grammars (@cite{kenesei-vago-fenyvesi-1998} §1.10, @cite{rounds-2001} ch. 6) gloss -nak / -nek exclusively as dative, and @cite{caha-2008} §5 (pp. 266–267) explicitly addresses Hungarian as the textbook surface counterexample to Blake's hierarchy, citing Blake's own footnote that the GEN-less inventory is resolved by treating the dative as expressing possessor function. The inventory has rank 3 (DAT) on Caha's containment hierarchy without rank 2 (GEN), failing downward closure. (Note: this is a counterexample to the containmentRank-based downward-closure predicate, which encodes Blake's hierarchy in Caha's notation. It is not a counterexample to (28) of @cite{caha-2008}, which concerns suffix-vs-postposition ordering and holds vacuously here because Hungarian marks all cases suffixally.)
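The three failures can be made concrete with a small diagnostic that lists the ranks some member requires but the inventory lacks. This is a sketch over a hypothetical VCase enum with the same assumed ranks (NOM 0 through LOC 4, everything else off-hierarchy). The Dargwa list is the one quoted above; the Finnish- and Hungarian-style lists are toy inventories that isolate each gap, not the Fragment inventories.

```lean
/-- Hypothetical case type; names are illustrative, not the library's. -/
inductive VCase where
  | nom | acc | gen | dat | loc | erg | abs | com | ess

/-- Assumed ranks: NOM 0 … LOC 4 on the hierarchy; everything else off it. -/
def vRank : VCase → Option Nat
  | .nom => some 0
  | .acc => some 1
  | .gen => some 2
  | .dat => some 3
  | .loc => some 4
  | _    => none

/-- Some member of `inv` sits strictly above rank `s`, so rank `s` is required. -/
def isRequired (inv : List VCase) (s : Nat) : Bool :=
  inv.any fun c =>
    match vRank c with
    | some r => decide (s < r)
    | none   => false

/-- Some member of `inv` has rank exactly `s`. -/
def isPresent (inv : List VCase) (s : Nat) : Bool :=
  inv.any fun d => vRank d == some s

/-- Ranks that downward closure demands but `inv` lacks; `[]` means it conforms. -/
def missingRanks (inv : List VCase) : List Nat :=
  (List.range 5).filter fun s => isRequired inv s && !isPresent inv s

#eval missingRanks [.abs, .erg, .gen, .dat, .com, .ess]  -- [0, 1]  Dargwa: no NOM/ACC
#eval missingRanks [.nom, .acc, .gen, .loc]              -- [3]     toy: LOC without DAT
#eval missingRanks [.nom, .acc, .dat]                    -- [2]     toy: DAT without GEN
```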

§ 4: Slavic paradigm-shape syncretism (Caha §§8.3.1–8.3.4)

The conformer theorems above only check which cases each inventory contains; that agreement is trivial, since every Slavic six-case set obviously satisfies downward closure. Caha's substantive prediction concerns paradigm shape: which morphological cells syncretize within a noun's declension. This section formalizes the paradigm-shape predictions for all four Slavic languages Caha analyses in detail. Distinct shapes are factored into Slavic.SyncretismPatterns (§ 4.1), so each per-language section reduces to a docstring plus attestation lists.

Per-language sub-sections appear in encoding order (Serbian, Slovene, Ukrainian, Czech), not in Caha's chapter order (which has Czech §8.3.3 before Ukrainian §8.3.4): the file follows the order in which shapes were added. The cross-Slavic narrative closes in § 4.6.


Caha's Slavic-specific Case sequence (@cite{caha-2009} (16), p. 12 for Russian; (7), p. 238 confirms the same for Serbian): NOM – ACC – GEN – PREP/LOC – DAT – INS. Re-exported from Core.Case.Order.cahaSlavicRank (the substrate definition). For the relationship to containmentRank (LOC at top, INST off-hierarchy), see Core.Case.Order.cahaSlavicRank_vs_containmentRank.
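To see where the two orders disagree, the sketch below walks the Slavic sequence and reads off an assumed containment-style rank for each case (LOC at 4 above DAT at 3, INS off-hierarchy, following this file's description of containmentRank). DCase and both function names are hypothetical.

```lean
/-- Hypothetical case type covering the six Slavic cells. -/
inductive DCase where
  | nom | acc | gen | loc | dat | ins

/-- Caha's Slavic sequence, lowest first: NOM – ACC – GEN – PREP/LOC – DAT – INS. -/
def slavicOrder : List DCase := [.nom, .acc, .gen, .loc, .dat, .ins]

/-- Assumed containment-style rank: LOC above DAT, INS off-hierarchy (`none`). -/
def containmentRankSketch : DCase → Option Nat
  | .nom => some 0
  | .acc => some 1
  | .gen => some 2
  | .dat => some 3
  | .loc => some 4
  | .ins => none

-- Along the Slavic order the ranks run 0, 1, 2, then 4 before 3, and INS has none.
#eval slavicOrder.map containmentRankSketch
-- [some 0, some 1, some 2, some 4, some 3, none]
```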


A morphological paradigm, encoded as one form-class index per cell. Cells follow the Slavic case sequence: 0=NOM, 1=ACC, 2=GEN, 3=PREP/LOC, 4=DAT, 5=INS. Two cells share a form iff their form-class indices are equal.


Caha's Universal Contiguity (@cite{caha-2009} (10), p. 10) on a Slavic paradigm. Defers to the domain-independent Morphology.Containment.isContiguous substrate, the same engine that Core.Case.Allomorphy.AllomorphyPattern.IsContiguous specializes at n=4; here it is specialized at n=6.
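A self-contained sketch of such a contiguity check, assuming paradigms are plain lists of six form-class indices. The names collapseRuns, allDistinct and isContiguousSketch are hypothetical, and this is not Morphology.Containment.isContiguous itself. The idea: merge runs of adjacent cells that share a form class; the paradigm is ABA-free exactly when no form class appears twice after the merge.

```lean
/-- Merge adjacent cells sharing a form class: [0,1,2,3,3,4] ↦ [0,1,2,3,4]. -/
def collapseRuns : List Nat → List Nat
  | [] => []
  | a :: rest =>
    match collapseRuns rest with
    | b :: tl => if a == b then b :: tl else a :: b :: tl
    | []      => [a]

/-- `true` iff no element occurs twice. -/
def allDistinct : List Nat → Bool
  | [] => true
  | x :: xs => !xs.contains x && allDistinct xs

/-- Sketch of contiguity (no ABA): each form class occupies one unbroken run of cells.
    Cells: 0=NOM, 1=ACC, 2=GEN, 3=PREP/LOC, 4=DAT, 5=INS. -/
def isContiguousSketch (p : List Nat) : Bool :=
  allDistinct (collapseRuns p)

-- Schematic shapes, not witness paradigms:
#eval isContiguousSketch [0, 1, 2, 3, 3, 4]  -- true  (PREP=DAT, adjacent cells)
#eval isContiguousSketch [0, 1, 2, 3, 4, 3]  -- false (PREP=INS skips DAT)
#eval isContiguousSketch [0, 1, 0, 1, 1, 2]  -- false (NOM=GEN plus ACC=PREP=DAT)
```

The second shape has the PREP=INS-over-DAT structure of the Slovene and Ukrainian counterexamples discussed in this file, and the third bundles the two Czech ulice syncretisms.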


§ 4.1: Distinct syncretism patterns attested in Caha's Slavic data

Each pattern is a Paradigm (form-class index per cell, indexed 0=NOM, 1=ACC, 2=GEN, 3=LOC/PREP, 4=DAT, 5=INS). A pattern shared across languages is a single def here; the per-language sections below record which Caha-cited noun in which language exemplifies each pattern. Names classify patterns by syncretism structure, not by witness lexeme.

Contiguous patterns (UC-respecting)

Two non-contiguous syncretisms in one paradigm: NOM=GEN (ABA over ACC) plus ACC=PREP=DAT (ABA over GEN). Czech ulice 'street' sg (Caha (29), p. 248, analyzed via the split between pronominal and nominal endings in (40)–(41), pp. 252–253; treated as two accidental homophonies restricted to this single paradigm).


Attested non-contiguous patterns (Caha-acknowledged counterexamples to UC)

Caha defends these in prose as phonological conflations or accidental homophonies; see the per-language Counterexamples sub-namespaces for witness attribution.

Contiguity / non-contiguity proofs

Each proof is decide-checked once per shape; the per-language attestedShapes lists below inherit these results.

Hypothetical (non-attested) ABA patterns

These show that the predicate has bite for arbitrary ABA shapes beyond the specific patterns Caha addresses. Caha predicts that such patterns are unattested in any language.
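One way to see that the predicate has bite is to enumerate every single two-cell syncretism on the six Slavic cells and test each with a literal ABA check. In this plain Lean 4 sketch (all names hypothetical), only the 5 adjacent pairs out of 15 pass.

```lean
/-- Literal *ABA test: no i < j < k with p[i] = p[k] ≠ p[j]. Hypothetical sketch. -/
def abaFree (p : List Nat) : Bool :=
  (List.range p.length).all fun i =>
  (List.range p.length).all fun j =>
  (List.range p.length).all fun k =>
    !(decide (i < j) && decide (j < k) && p[i]! == p[k]! && p[j]! != p[i]!)

/-- Six distinct forms, except that cell `j` reuses cell `i`'s form (assumes `i < j`). -/
def pairParadigm (i j : Nat) : List Nat :=
  (List.range 6).map fun c => if c == j then i else c

/-- All cell pairs `(i, j)` with `i < j < 6`. -/
def allPairs : List (Nat × Nat) :=
  (List.range 6).foldr
    (fun i acc => ((List.range 6).filter fun j => decide (i < j)).map (fun j => (i, j)) ++ acc) []

#eval allPairs.length  -- 15
#eval allPairs.filter fun (i, j) => abaFree (pairParadigm i j)
-- [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
```

In other words, a lone two-cell syncretism respects Universal Contiguity only when its two cells are neighbours in the sequence; the other ten pairs are exactly the ABA shapes Caha predicts to be unattested.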

§ 4.2: Serbian (§8.3.1, pp. 238–240)

Caha's headline: "Serbian can be thought of as another poster child for Universal Contiguity, with no violations thereof" (p. 239). Caha's five named syncretism types (Caha (11), p. 239):

(a) NOM-ACC: neuters in sg and pl; feminine plurals
(b) ACC-GEN: singular masculine animates
(c) GEN-PREP: singular of the 'death' paradigm
(d) PREP-DAT: almost omnipresent (Caha, p. 238, notes that PREP and DAT differ only in stress on monosyllabic nouns and are segmentally identical, and entertains "in fact only a single 'surface' dat/prep case in Serbian")
(e) DAT-INS: plurals

Serbian attests no Caha-acknowledged counterexamples.

All Serbian shapes are attested in Caha (9), p. 238 (singular, 7 nouns) and (10), p. 239 (plural, 7 nouns). The 7 singular paradigms reduce to 4 distinct shapes; the 7 plural paradigms reduce to 2.


§ 4.3: Slovene (§8.3.2, pp. 240–244)

Per Caha (13), p. 240, Slovene uses the same Slavic sequence NOM-ACC-GEN-PREP-DAT-INS but, unlike Serbian, "keeps all the six cases distinct: there is no prep – dat annexion." Slovene also has dual number, so the paradigms below are given per noun and per number.

Caha's four widespread syncretism types in (15), p. 240:

(a) NOM-ACC: widespread; in all neuters, in all duals
(b) ACC-GEN: most pronouns, all masculine animate sg nouns
(c) PREP-DAT: all singular nouns
(d) DAT-INS: all duals

Plus the rarer GEN-PREP syncretism ((16), p. 241), attested in plural adjectives, plural/dual personal pronouns, and certain feminine sg declensions.

Crucially, Slovene also has three Caha-acknowledged counterexamples (Caha (18), pp. 241–242); see the Counterexamples sub-namespace below.

All Slovene contiguous shapes are attested in Caha (14), p. 240, (16), p. 241, and (17), p. 241.


Three paradigms (Caha (18), pp. 241–242) violate Universal Contiguity at the segmental level. Caha defends each in the surrounding prose (pp. 242–243):

(a) 'this n.' PREP=INS: phonological conflation ((19)), a tonal collapse of -em vs -im, visible distinctly in 'that' tîst-em vs tîst-im, where prefixation strips the tone
(b) 'traveller' NOM=INS: tonal conflation via the acute/circumflex plural tone, supported by the otrôc-i / otrók-i 'child' stem alternation
(c) 'this f.' ACC=INS: accidental homophony, restricted to one declension of feminine singulars

Slovene attestations of non-contiguous patterns:


§ 4.4: Ukrainian (§8.3.4, pp. 268–271)

Per Caha §8.3.4, p. 268, Ukrainian also conforms to the Slavic Universal Contiguity sequence NOM-ACC-GEN-PREP-DAT-INS. Caha's data (68), p. 268, illustrate NOM-ACC, ACC-GEN, GEN-PREP, and PREP-DAT pairs. Two "possibly offensive" syncretisms exist (paradigm variants of 'region' (70) and of the adjective 'endless' (71)), but Caha argues that they are conditioned by paradigm variant and isolated.

Caha (74b), p. 271, highlights that Ukrainian has removed a contiguity violation found in earlier stages of the language, "in a way that is predicted by the Superset Principle."

All Ukrainian contiguous shapes are attested in Caha (68), p. 268, and (69), p. 269.


The "less frequently found alternative" variant of the soft-stem adjective 'endless' (bezkrájij) shows PREP=INS skipping DAT. This is the same shape as Slovene's Counterexamples.thisN and is addressable by the same phonological-conflation analysis Caha applies to Slovene (Caha, p. 271: "the homophony represents a phonological conflation of two underlyingly distinct patterns").

§ 4.5: Czech (§8.3.3, pp. 244–267)

Czech is Caha's most permissive Slavic language for syncretism: "When it comes to syncretism, it seems at first blush that 'anything goes'" (p. 244). Three of Czech's six cases (ACC, PREP, INS) each show 4 of the 5 logically possible syncretisms with other cases. Caha then argues that most apparent violations are phonological conflations of distinct underlying representations or accidental homophonies in restricted niches, so that Czech "in fact provides good support for the Universal Contiguity" (p. 267).

Caha's summary table (67), p. 266: 10 attested syncretism types in Czech, with extension and status:

(a) NOM-ACC: widespread; non-accidental
(b) NOM-GEN: 'street' sg only; accidental homophony
(c) NOM-INS: soft C-final masculine animate nouns; phonological conflation
(d) ACC-GEN: masculine animate sg, pronouns; non-accidental
(e) ACC-PREP: 'street' sg only; accidental homophony
(f) ACC-INS: feminine sg adjectives, 'sir' pl; phonological conflation
(g) GEN-PREP: adjectives in pl, numeral 'two', some nouns in sg; non-accidental
(h) PREP-DAT: nouns in sg; non-accidental
(i) PREP-INS: masculine/neuter adjectives in sg; phonological conflation
(j) DAT-INS: numeral 'two', for all-oblique conflation; non-accidental
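For two-cell syncretisms, the contiguous/non-contiguous split in table (67) follows from adjacency alone: a type is contiguous iff its two cells are neighbours in NOM-ACC-GEN-PREP-DAT-INS. The sketch below (hypothetical names; indices 0=NOM … 5=INS) encodes the ten types as index pairs, recovers the split into (a, d, g, h, j) and (b, c, e, f, i), and confirms that ACC, PREP and INS each take part in four types.

```lean
/-- Caha's (67) types as (label, lower cell, higher cell); 0=NOM, 1=ACC, 2=GEN,
    3=PREP, 4=DAT, 5=INS. Hypothetical encoding for illustration. -/
def czechTypes : List (Char × Nat × Nat) :=
  [('a', 0, 1), ('b', 0, 2), ('c', 0, 5), ('d', 1, 2), ('e', 1, 3),
   ('f', 1, 5), ('g', 2, 3), ('h', 3, 4), ('i', 3, 5), ('j', 4, 5)]

/-- A two-cell syncretism is contiguous iff its cells are neighbours. -/
def isAdjacent (t : Char × Nat × Nat) : Bool := t.2.2 == t.2.1 + 1

/-- Number of (67) types in which cell `c` participates. -/
def typeCount (c : Nat) : Nat :=
  (czechTypes.filter fun t => t.2.1 == c || t.2.2 == c).length

#eval (czechTypes.filter isAdjacent).map (·.1)             -- ['a', 'd', 'g', 'h', 'j']
#eval (czechTypes.filter fun t => !isAdjacent t).map (·.1) -- ['b', 'c', 'e', 'f', 'i']
#eval [1, 3, 5].map typeCount                              -- [4, 4, 4]
```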

The 5 contiguous types (a, d, g, h, j) all reuse existing SyncretismPatterns shapes attested in Serbian/Slovene/Ukrainian. The 5 non-contiguous types (b, c, e, f, i) include three already-encoded shapes (nomInsExtremeEnds for c, accInsRestricted for f, prepInsSkipDat-style for i) plus two Czech-distinctive shapes (streetDoubleABA for b and e, bundled in the ulice paradigm; accGenPrepInsSkipDat for i in an ACC=GEN context).

Czech contiguous shapes are attested per Caha (29)–(33) and (67), p. 266. Witness lexemes: 'machine' stroj sg/pl, 'both' oba, 'that' f.pl ty, 'man' muž sg, 'good' adj. m pl dobrý, etc.


Czech (67b, c, e, f, i): five non-contiguous syncretism types, each of which Caha defends in the §8.3.3 prose.


§ 4.6: Cross-Slavic summary (Caha §8.3.5, p. 271)

Caha (73), p. 271, presents a unified table: all five investigated Slavic languages (Russian, Serbian, Slovene, Czech, Ukrainian) share the same Universal Adjacency template NOM-ACC-GEN-PREP-DAT-INS. Non-contiguous attestations are analyzed either as phonological conflations of distinct underlying representations (most cases) or as accidental homophonies in restricted niches (a few).

All four detailed sub-sections (§§ 4.2–4.5) are now formalized: Serbian (no counterexamples; the "poster child"), Slovene (3, in (18), p. 241), Czech (5, in (67), p. 266; the most permissive language), and Ukrainian (1, in (71), p. 269). The per-language all_attested_contiguous lemmas establish UC for each language; the per-language Counterexamples.all_attested_not_contiguous lemmas confirm that the predicate has bite on the Caha-acknowledged violators.

The cross-Slavic claim is documented here rather than asserted as a bundled theorem: the per-language lemmas already carry the substantive content, and bundling them would reintroduce the caha_poster_child smell that prior audits removed twice.

Russian is implicit: Caha (16), p. 12, establishes the same NOM-ACC-GEN-PREP-DAT-INS sequence for Russian as (7), p. 238, does for Serbian, and the paradigm shapes are shared. Russian data appear in §§1.1 and 5.1–5.4 as Caha's running example, but §8.3.x focuses on the four other Slavic languages.