Linglib.Phenomena.Classifiers.Studies.LittleMoroneyRoyer2022

Little, Moroney & Royer (2022) #

@cite{little-moroney-royer-2022}

Classifiers can be for numerals or nouns: Two strategies for numeral modification. Glossa 7(1). 1–35.

Core Claim #

Numeral classifiers form a heterogeneous class. Two families of theories, classifier-for-numeral (CLF-for-NUM) and classifier-for-noun (CLF-for-N), are both correct, but for different languages.

Four Predictions (Table 8) #

The two strategies make divergent predictions about classifier distribution:

  1. (num) Variation in whether a numeral requires a CLF → CLF-for-NUM
  2. (noun) Variation in whether a noun requires a CLF → CLF-for-N
  3. (noun) CLF found beyond numerals (quantifiers, demonstratives) → CLF-for-N
  4. (num) CLF appears in counting (no noun present) → CLF-for-NUM

Ch'ol shows predictions 1 and 4; Shan shows predictions 2 and 3.

Semantic Equivalence #

Despite different derivational strategies, both languages derive the same meaning for "two dogs": {ab, ac, bc} — the set of pluralities of two dogs.
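This shared denotation can be checked concretely on a toy domain. A standalone Lean sketch (the `Dog` type and `DOGS` below are illustrative stand-ins, not the file's actual identifiers):

```lean
import Mathlib

-- Toy three-dog domain (illustrative names a, b, c).
inductive Dog
  | a | b | c
deriving DecidableEq

open Dog

def DOGS : Finset Dog := {a, b, c}

-- "two dogs" = the 2-element pluralities over the domain:
-- powersetCard 2 of {a, b, c} is exactly {{a,b}, {a,c}, {b,c}}.
example : DOGS.powersetCard 2 = {{a, b}, {a, c}, {b, c}} := by decide
```

Since `Finset` equality ignores element order, `decide` closes the goal by computation.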

Architectural Note #

CLF-for-NUM is formalized using Mereology.QMOD — the measure function Finset.card produces a quantized predicate (clfForNum_qua). CLF-for-N is formalized directly as atom-pair selection: ∃ d₁ d₂, d₁ ≠ d₂ ∧ s = {d₁, d₂}. The ClassifierStrategy enum in Typology captures the typological parameter.

Note: Mereology.atomize cannot be applied to Finset Dog directly because Finset has ∅ as a bottom element: Mereology.Atom (no proper part) is only satisfied by ∅, so atomize(DOGS) would be empty. The CLF-for-N semantics is instead formalized at the element level: singletons {d} are the atoms, and clfForNounSem selects 2-element unions of distinct atoms. The extensional equivalence (derivations_extensionally_equal) bridges the two via Finset.card_eq_two.
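The bottom-element problem can be illustrated directly, reading "proper part" as (⊆ and ≠) on Finset (a paraphrase; Mereology.Atom's actual statement may differ):

```lean
import Mathlib

-- Nothing is a proper part of the bottom element ∅:
example : ∀ y : Finset ℕ, y ⊆ ∅ → y = ∅ :=
  fun _ h => Finset.subset_empty.mp h

-- ...but ∅ is a proper part of every singleton, so singletons fail
-- "no proper part" in the subset order. This is why the file takes
-- singletons as atoms element-wise rather than applying atomize.
example : (∅ : Finset ℕ) ⊆ {0} ∧ (∅ : Finset ℕ) ≠ {0} := by decide
```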

Ch'ol noun categorization system: numeral classifier, CLF-for-NUM. @cite{bale-coon-2014} @cite{bale-et-al-2019}

Key properties:

  • Classifiers are bound to the numeral (suffixes)
  • Only Mayan-based numerals (1–6) take classifiers; Spanish loans do not
  • Classifiers appear in counting contexts (no noun)
  • Plural marking -ob co-occurs with classifiers (ex. 30)
  • Classifiers are ungrammatical with quantifiers, demonstratives, modifiers (ex. 19)

    Shan noun categorization system: numeral classifier, CLF-for-N. @cite{moroney-2021}

    Key properties:

    • Classifiers are free morphemes derived from nominal elements
    • All numerals uniformly require classifiers (no idiosyncrasies)
    • Classifiers appear with quantifiers, demonstratives, relative clauses (ex. 42)
    • Classifiers degraded/unacceptable in counting contexts (exs. 48–49)
    • No plural–classifier co-occurrence

      The four distributional predictions from the CLF-for-NUM vs CLF-for-N distinction (Table 2/7/8).

      • numeralIdiosyncrasies : Bool

        Prediction 1: Idiosyncrasies in whether a numeral requires a CLF. Expected for CLF-for-NUM (measure function may be built into numeral).

      • nounIdiosyncrasies : Bool

        Prediction 2: Idiosyncrasies in whether a noun requires a CLF. Expected for CLF-for-N (some nouns may already denote atoms).

      • clfBeyondNumerals : Bool

        Prediction 3: CLF found with the noun beyond numerals (quantifiers, demonstratives, relative clauses). Expected for CLF-for-N (CLF is for the noun, not the numeral).

      • clfInCounting : Bool

        Prediction 4: CLF appears in counting contexts (no noun present). Expected for CLF-for-NUM (CLF is required by the numeral itself).
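The field list above is a plain record of four Booleans. A minimal standalone sketch (field names from the text; the expected profiles follow the four predictions, with the CLF-for-NUM diagnostics 1 and 4 set to true and the CLF-for-N diagnostics 2 and 3 set to true):

```lean
-- Sketch of the Predictions record described above.
structure Predictions where
  numeralIdiosyncrasies : Bool
  nounIdiosyncrasies    : Bool
  clfBeyondNumerals     : Bool
  clfInCounting         : Bool
deriving DecidableEq, Repr

-- Expected profiles for the two strategies (per predictions 1-4):
def forNumExpected  : Predictions := ⟨true,  false, false, true⟩
def forNounExpected : Predictions := ⟨false, true,  true,  false⟩

-- The two profiles are genuinely distinct:
example : forNumExpected ≠ forNounExpected := by decide
```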


              Expected predictions for CLF-for-NUM languages.


                Expected predictions for CLF-for-N languages.


                  Expected predictions for languages whose classifier system is the @cite{sudo-2016} blocking strategy: classifier semantics live with numerals, not nouns; the silent ∪-operator that lifts numerals to predicates is blocked by the lexical presence of classifiers.

                  LMR's diagnostic battery applied to Sudo's framework:

                  • numeralIdiosyncrasies = false: ∪ is uniformly blocked across all numerals; no per-numeral variation (contrast the Ch'ol Mayan-vs-Spanish split, which Sudo's framework does not predict).
                  • nounIdiosyncrasies = false: explanation lives in numerals, not nouns; uniform across the noun lexicon.
                  • clfBeyondNumerals = false: classifiers exist to lift numerals to predicate type; they appear with numerals, not beyond them (contrast LMR's CLF-for-N prediction).
                  • clfInCounting = true: the ∩-operator (Sudo eq. 24) maps the numeral+CL property back to type-n, so number-predicate uses like juu-ni-nin-da "the number is twelve people" (Sudo eq. 22a) are well-formed (contrast LMR's CLF-for-N prediction).
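The battery above pins the blocking strategy to a concrete profile. A standalone sketch (record shape per the four diagnostics; names illustrative), which also exhibits the wedge against the CLF-for-N profile:

```lean
-- Record shape per the four diagnostics (names from the text).
structure Predictions where
  numeralIdiosyncrasies : Bool
  nounIdiosyncrasies    : Bool
  clfBeyondNumerals     : Bool
  clfInCounting         : Bool
deriving DecidableEq

def sudoBlockingExpected : Predictions := ⟨false, false, false, true⟩
def forNounExpected      : Predictions := ⟨false, true,  true,  false⟩

-- Agreement on the first diagnostic, divergence overall:
example : sudoBlockingExpected.numeralIdiosyncrasies
        = forNounExpected.numeralIdiosyncrasies := rfl
example : sudoBlockingExpected ≠ forNounExpected := by decide
```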

                    §2b: LMR's per-language strategy assignments #

                    @cite{little-moroney-royer-2022} assigns Ch'ol the CLF-for-NUM strategy and Shan the CLF-for-N strategy. These assignments are consistent with the @cite{chierchia-1998} CLF-for-N assignment for Mandarin/Japanese (LMR treat Sinitic and Japonic as CLF-for-N). Per-language assignments live here (in this study file) rather than on NounCategorizationSystem.

                    LMR's strategy assignment for Ch'ol: classifier is a measure function required by the numeral.


                      LMR's strategy assignment for Shan: classifier atomizes the noun denotation.


                        The CLF-for-NUM and CLF-for-N profiles are distinct — the two LMR strategies make genuinely different predictions on all four diagnostics.

                        Ch'ol predictions follow from LMR's strategy assignment via predictionsOf.

                        @cite{chierchia-1998} and @cite{sudo-2016} disagree on Japanese's classifier strategy: Chierchia assigns .forNoun, Sudo assigns .sudoBlocking. Run through LMR's diagnostic battery, the two strategies make divergent empirical predictions:

                        The empirical wedge: under LMR's diagnostics, Chierchia's .forNoun and Sudo's .sudoBlocking agree on numeralIdiosyncrasies = false but diverge on the other three. The most decisive disagreement is clfInCounting: Sudo predicts true (citing eq. 22a — juu-ni-nin-da "the number is twelve people" is well-formed via the ∩-operator), Chierchia predicts false. Japanese empirically exhibits the Sudo pattern on this diagnostic.

                        Symmetric divergence on clfBeyondNumerals: Chierchia predicts true (CLF appears with quantifiers, demonstratives, relative clauses independent of numerals); Sudo predicts false (CLF exists for numerals, not beyond them).

                        Grammaticality judgments for Ch'ol classifier distribution (§3.1, §4). Each datum records whether a CLF appears in a given syntactic context.

                        • language : String
                        • context : String
                        • clfPresent : Bool
                        • grammatical : Bool

                            Ch'ol: CLF only with numerals and interrogative jay- 'how many'.


                              Shan: CLF with numerals, quantifiers, demonstratives, relative clauses.


                                @cite{little-moroney-royer-2022} §3.4 refines @cite{greenberg-1972}'s complementarity universal. The original says numeral classifiers and obligatory number marking are in complementary distribution. The refinement: this holds for CLF-for-N (where CLF and PL occupy the same functional projection) but not for CLF-for-NUM (where CLF is in a separate projection and can co-occur with PL).

                                Ch'ol (CLF-for-NUM): cha'-tyikil wiñik-ob 'two-CLF men-PL' (ex. 30)
                                Shan (CLF-for-N): *mǎa sǎam tǒ khǎw 'three CLF dogs PL' (unattested)

                                Prediction 3 (CLF beyond numerals) is derived from the system's scopes. CLF-for-N classifiers serve the noun, so they appear wherever the noun needs individuation — not just with numerals.

                                Ch'ol constituency (51): numeral and classifier form a constituent. [[cha'-kojty]_NumCLF [ts'i']_N] The numeral cha' first combines with the classifier -kojty to form a measure phrase, which then applies to the noun ts'i' 'dog'.


                                  Shan constituency (52): classifier and noun form a constituent. [[sǒŋ]_Num [[tǒ]_CLF [mǎa]_N]] The classifier first combines with the noun mǎa 'dog' to atomize it, then the numeral sǒŋ 'two' selects a 2-element sum.


                                    The two derivation trees have different constituency despite both being binary branching over three terminals. In Ch'ol, the left daughter of the root is complex (Num+CLF); in Shan, the right daughter is complex (CLF+N).

                                    This structural difference is what generates the four distributional predictions: if Num+CLF is a constituent, the classifier is part of the numeral's semantics and appears wherever the numeral appears (counting, number reference). If CLF+N is a constituent, the classifier is part of the noun's semantics and appears wherever the noun needs individuation (quantifiers, demonstratives).

                                    Both trees have the same size (5 nodes each: 2 internal + 3 terminals). The difference is purely structural — which pairs branch together.
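The constituency contrast can be sketched with a minimal binary-tree type (illustrative; the file's actual tree representation may differ). Leaves are tagged 0/1/2 for Num, CLF, and N:

```lean
-- Minimal binary trees over Nat-labelled leaves (0 = Num, 1 = CLF, 2 = N).
inductive Tree
  | leaf (label : Nat)
  | node (l r : Tree)
deriving DecidableEq

open Tree

-- Ch'ol (51): [[cha' -kojty] ts'i'] — complex left daughter.
def chol : Tree := node (node (leaf 0) (leaf 1)) (leaf 2)
-- Shan (52): [sǒŋ [tǒ mǎa]] — complex right daughter.
def shan : Tree := node (leaf 0) (node (leaf 1) (leaf 2))

def size : Tree → Nat
  | leaf _   => 1
  | node l r => 1 + size l + size r

-- Same size (5 nodes), different constituency:
example : size chol = 5 ∧ size shan = 5 := by decide
example : chol ≠ shan := by decide
```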

                                    A finite domain of three atomic dogs: a, b, c.

                                        CLF-for-NUM derivation: Mereology.QMOD applied to the dog domain. ⟦two-CLF⟧ = λP. QMOD(P, μ#, 2) where μ# = Finset.card. This uses Mereology.QMOD from Core/Mereology.lean: QMOD(R, μ, n) = λx. R(x) ∧ μ(x) = n
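A standalone sketch of this derivation, with QMOD written out exactly as in the text, specialized to a toy Finset Dog domain (the file's Mereology.QMOD is presumably more general):

```lean
import Mathlib

inductive Dog
  | a | b | c
deriving DecidableEq

open Dog

-- QMOD(R, μ, n) = λx. R(x) ∧ μ(x) = n, per the text.
def QMOD (R : Finset Dog → Prop) (μ : Finset Dog → ℕ) (n : ℕ) :
    Finset Dog → Prop :=
  fun x => R x ∧ μ x = n

-- ⟦two-CLF⟧ = λP. QMOD(P, card, 2), applied to the bare dog predicate:
abbrev twoDogs : Finset Dog → Prop :=
  QMOD (fun x => x.Nonempty) Finset.card 2

example : twoDogs {a, b} := ⟨by decide, by decide⟩
```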


                                          CLF-for-N derivation: atomize, then count. ⟦CLF⟧(⟦DOGS⟧) restricts to atoms (singletons), then ⟦TWO⟧ selects 2-element sums from the atomized set. The result: s is the union of exactly two distinct atoms.
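A standalone sketch of the element-level CLF-for-N semantics as described (the `clfForNounSem` reconstruction below is illustrative):

```lean
import Mathlib

inductive Dog
  | a | b | c
deriving DecidableEq, Fintype

open Dog

-- The union of exactly two distinct atoms (singletons), per the text.
def clfForNounSem (s : Finset Dog) : Prop :=
  ∃ d₁ d₂ : Dog, d₁ ≠ d₂ ∧ s = {d₁, d₂}

-- {a, b} is a 2-element union of distinct atoms:
example : clfForNounSem {a, b} := ⟨a, b, by decide, by decide⟩

-- a lone atom is not:
example : ¬ clfForNounSem {a} := by
  unfold clfForNounSem
  decide
```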


                                            The two derivation strategies are extensionally equivalent: QMOD(DOGS, μ#, 2) = {s | ∃ d₁ d₂, d₁ ≠ d₂ ∧ s = {d₁, d₂}}. This is the paper's key semantic result (§5): despite different compositional paths, both strategies produce the same denotation for "two dogs."

                                            The CLF-for-NUM path uses the measure constraint directly (QMOD); the CLF-for-N path atomizes then forms 2-element sums. Finset.card_eq_two provides the bridge: a finset has cardinality 2 iff it's a pair of distinct elements.
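The bridge can be exhibited in isolation: Mathlib's `Finset.card_eq_two` is exactly the biconditional between the cardinality constraint and the pair-of-distinct-atoms form (a sketch; the file's theorem presumably also carries the Nonempty conjunct, which card = 2 entails):

```lean
import Mathlib

inductive Dog
  | a | b | c
deriving DecidableEq

-- card = 2 (CLF-for-NUM constraint) iff a pair of distinct atoms
-- (CLF-for-N form); supplied directly by Finset.card_eq_two.
example (s : Finset Dog) :
    s.card = 2 ↔ ∃ d₁ d₂ : Dog, d₁ ≠ d₂ ∧ s = {d₁, d₂} :=
  Finset.card_eq_two
```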

                                            theorem LittleMoroneyRoyer2022.dogs_cum :
                                            Mereology.CUM fun (x : Finset Dog) => x.Nonempty

                                            The full dog predicate (Nonempty) is cumulative: the union of two dog-pluralities is a dog-plurality. Mereology.CUM applied to Finset Dog with ⊔ = ∪.

                                            Cumulativity is what forces classifier languages to need a classifier: counting over a CUM predicate is undefined until it's quantized. CLF-for-NUM uses QMOD to quantize directly; CLF-for-N atomizes first.
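Cumulativity of the bare predicate can be checked in a few lines (CUM paraphrased for Finset with join = ∪; the file's Mereology.CUM may be stated over a general join semilattice):

```lean
import Mathlib

-- Closure under union, per the text's reading of CUM.
def CUM {α : Type} [DecidableEq α] (P : Finset α → Prop) : Prop :=
  ∀ x y : Finset α, P x → P y → P (x ∪ y)

-- A union of two nonempty pluralities is nonempty:
example {α : Type} [DecidableEq α] :
    CUM (fun s : Finset α => s.Nonempty) := by
  intro x y hx _hy
  obtain ⟨d, hd⟩ := hx
  exact ⟨d, Finset.mem_union_left y hd⟩
```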

                                            CLF-for-NUM creates a quantized predicate via QMOD: no proper subset of a 2-element set also has 2 elements.

                                            Proof: y ⊂ x implies |y| < |x| (Finset.card_lt_card), but both have card 2 — contradiction. This mirrors the general Mereology.extMeasure_qua pattern (QMOD by any extensive measure produces QUA), instantiated directly for Finset.card.

                                            CLF-for-N also creates a quantized predicate: no proper subset of a pair of distinct dogs is also a pair of distinct dogs. Both strategies convert CUM predicates to QUA predicates — this is the semantic function of classifiers regardless of strategy.

                                            Proof: if y ⊂ x and both satisfy clfForNounSem, then by derivations_extensionally_equal, both have card 2. But y ⊂ x implies |y| < |x| — contradiction.

                                            Both strategies quantize: the semantic function of classifiers is to turn CUM predicates into QUA predicates, enabling counting.
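The card-based quantization argument above can be sketched directly (QUA paraphrased as "no proper part of a P-plurality is itself a P-plurality"; the file's Mereology.QUA may differ):

```lean
import Mathlib

-- No proper part of a P-plurality is itself a P-plurality.
def QUA {α : Type} (P : Finset α → Prop) : Prop :=
  ∀ x y : Finset α, P x → y ⊂ x → ¬ P y

-- "card = 2" is quantized, by exactly the Finset.card_lt_card argument:
example {α : Type} : QUA (fun s : Finset α => s.card = 2) := by
  intro x y hx hyx hy
  have hlt : y.card < x.card := Finset.card_lt_card hyx
  omega  -- y.card = 2, x.card = 2, y.card < x.card: contradiction
```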

                                            Concrete witness: {a, b} is a two-dog plurality.

                                            Concrete witness: {a, c} is a two-dog plurality.

                                            Concrete witness: {b, c} is a two-dog plurality.

                                            Singletons are not two-dog pluralities: the measure constraint excludes them. This is why CLF-for-N atomization alone doesn't suffice — the numeral still needs to select the right cardinality.

                                            The triple is not a two-dog plurality: QMOD excludes oversized sums.
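The witnesses and non-witnesses can be verified by computation on the toy domain ("two dogs" abbreviated below as the QMOD constraint Nonempty ∧ card = 2; an illustrative stand-in for the file's derivation):

```lean
import Mathlib

inductive Dog
  | a | b | c
deriving DecidableEq

open Dog

abbrev twoDogs (s : Finset Dog) : Prop := s.Nonempty ∧ s.card = 2

-- The three witnesses:
example : twoDogs {a, b} ∧ twoDogs {a, c} ∧ twoDogs {b, c} := by decide
-- Singletons are excluded by the measure constraint ...
example : ¬ twoDogs {a} := by decide
-- ... and so is the oversized triple:
example : ¬ twoDogs {a, b, c} := by decide
```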

                                            Ch'ol and Shan are both numeral classifier systems in Aikhenvald's typology, but have different classifier strategies. They agree on Aikhenvald's morphosyntactic classification but differ on the semantic level — illustrating that ClassifierType is too coarse to capture the CLF-for-NUM vs CLF-for-N distinction.

                                            Sample-restricted: in the 7-language Aikhenvald sample plus Ch'ol and Shan, every classifier-type language lacks agreement.

                                            @cite{chierchia-1998}'s NMP predicts CLF-for-N for [+arg, -pred] languages (Mandarin, Japanese). Shan is also CLF-for-N per @cite{little-moroney-royer-2022}, despite being Kra-Dai not Sino-Tibetan — the strategy is independent of the NMP parameter. Ch'ol is CLF-for-NUM, which @cite{chierchia-1998} does not predict (Ch'ol is not a [+arg, -pred] language in the NMP typology).

                                            This connects the two classifier study files: Chierchia predicts the strategy for Mandarin/Japanese; Little et al. provide the diagnostic framework that confirms it and extends it to new languages.

                                            Ch'ol's CLF-for-NUM strategy differs from the Chierchia-predicted CLF-for-N found in Mandarin and Japanese. This is the paper's main typological contribution: not all numeral classifier languages use the same semantic strategy.

                                            The unified classifierDenot correctly dispatches based on strategy.

                                            • CLF-for-N → clfForNoun (= atomize)
                                            • CLF-for-NUM → clfForNum (= QMOD)

                                            This confirms that the typological enum in Typology is structurally connected to semantic content, not just a label.
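A sketch of what such a dispatch looks like (constructor names follow the text; the signatures and the hard-coded "two" are illustrative, not the file's actual definitions):

```lean
import Mathlib

inductive ClassifierStrategy
  | forNoun | forNum
deriving DecidableEq

variable {α : Type}

-- CLF-for-N: atomize (keep the singleton parts of P).
def clfForNoun (P : Finset α → Prop) : Finset α → Prop :=
  fun s => P s ∧ s.card = 1

-- CLF-for-NUM: quantize by cardinality (QMOD with μ = card).
def clfForNum (P : Finset α → Prop) (n : ℕ) : Finset α → Prop :=
  fun s => P s ∧ s.card = n

-- Dispatch on the typological parameter:
def classifierDenot (strat : ClassifierStrategy) (P : Finset α → Prop) :
    Finset α → Prop :=
  match strat with
  | .forNoun => clfForNoun P
  | .forNum  => clfForNum P 2

example (P : Finset α → Prop) :
    classifierDenot ClassifierStrategy.forNum P = clfForNum P 2 := rfl
```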

                                            theorem LittleMoroneyRoyer2022.clfForNum_agrees_with_local (s : Finset Dog) :
                                            clfForNumSem s ↔ Mereology.QMOD (fun (x : Finset Dog) => x.Nonempty) Finset.card 2 s

                                            The local clfForNumSem IS QMOD from Core.Mereology: both compute R(x) ∧ μ(x) = n with μ = Finset.card and n = 2. The unified clfForNum specializes QMOD, and the local definition applies it directly; both reduce to QMOD.