Linglib.Phenomena.Phonology.Studies.Zuraw2010

@cite{zuraw-2010}: Factorial Typology of Nasal Substitution

Formalizes the factorial typology of Tagalog-style nasal substitution from @cite{zuraw-2010} (NLLT 28: 417–472). When a nasal-final prefix (e.g. maŋ-) is concatenated with an obstruent-initial stem, the nasal and the obstruent may coalesce into a single nasal that retains the obstruent's place of articulation (p, b → m; t, d → n; k, g → ŋ).

Substrate consumption

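This file routes through the project's POC (Partially Ordered Constraints) substrate. For each stem-initial consonant c, it computes two sets of constraint indices: the distinguishing set D_c (relevant c), containing the constraints whose violation counts differ between the YES and NO candidates, and the YES-favoring set Y_c (yesFav c), containing the constraints that assign fewer violations to YES than to NO.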

The structural implication theorems in §7 reuse Core.Constraint.PermSubsetCombinatorics.head_filter_subset_extends and head_filter_smaller_inherits (lifted from earlier versions of this file's private helpers) — pure list-filter monotonicity facts that any binary-output OT factorial-typology study can consume.

Constraint set

Six constraints drive the factorial typology, matching @cite{zuraw-2010}'s §4.2 footnote 17 (page 446) where the free-ranking enumeration appears: NasSub, *NC, *ASSOC, *[ŋ, *[n, and *[m.

Other Zuraw 2010 constraints (MAX(+nas), UNIFORMITY, MORPHEMECOHESION, NOCODA, *GEMINATE, NASASSIM, IDENT(place), FAITH-OO, INTEGRITY-IO) are held high-ranked per Zuraw's analytical choice and do not appear here; they would not vary the YES/NO outcome on the candidate set considered.

Implicational universals (structural)

The voicing effect (voiced→YES implies voiceless→YES at the same place) and the place effect (backer→YES implies fronter→YES within a voicing class) follow from the set-theoretic relationships between D_c and Y_c across consonants — proved structurally per-ranking, no enumeration. These typological generalizations are independently established in @cite{newman-1984}'s overview of Western Austronesian and replicated in @cite{blust-2004}'s 48-language survey.

Dictionary data

@cite{zuraw-2010}'s Tagalog dictionary counts (paper §2.2, page 423) confirm the voicing effect: voiceless stems show higher substitution rates than voiced stems at the labial place (p: 253/263 vs b: 177/277).

Relation to other Tagalog NS analyses

The closely-related study files Phenomena/Phonology/Studies/ZurawHayes2017.lean and Phenomena/Phonology/Studies/Magri2025.lean analyze a 2×2 sub-square of this same phenomenon (maŋ-other / paŋ-res prefixes × /b/ /k/ stems) under a different constraint inventory (NasSub / *NC / *[stemŋ] / *[stemŋ]/n / prefix-indexed UNIFORMITY) for a MaxEnt analysis of the Hayes-Zuraw shifted-sigmoids generalization. The constraint sets and the data slices differ; the two strands are complementary readings of @cite{zuraw-2010}'s underlying phenomenon. ZurawHayes2017 and Magri2025 import the constraint identity definitions in §1 below via comap — those definitions must remain stable.

§ 0: Stems, Substitution Decisions, Dictionary Counts

The six stem-initial obstruents in @cite{zuraw-2010}'s nasal substitution typology. Coalescence maps each to its homorganic nasal: p,b → m; t,d → n; k,g → ŋ.
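
A minimal Lean sketch of the coalescence map just described. The names `Stem`, `Nasal`, and `coalesce` are illustrative and are not the file's own declarations (the file's type is `StemC`); `ng` stands in for ŋ.

```lean
namespace CoalescenceSketch

/-- The six stem-initial obstruents (hypothetical stand-in for `StemC`). -/
inductive Stem where
  | p | t | k | b | d | g
  deriving DecidableEq, Repr

/-- The three nasals that coalescence can produce; `ng` stands for ŋ. -/
inductive Nasal where
  | m | n | ng
  deriving DecidableEq, Repr

/-- Each obstruent maps to the nasal at its own place of articulation. -/
def coalesce : Stem → Nasal
  | .p | .b => .m
  | .t | .d => .n
  | .k | .g => .ng

-- Voicing is neutralized: /b/ and /p/ both surface as m.
example : coalesce Stem.b = Nasal.m := rfl
example : coalesce Stem.p = Nasal.m := rfl
example : coalesce Stem.k = Nasal.ng := rfl

end CoalescenceSketch
```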

    def Zuraw2010.instReprStemC.repr :
    StemC → ℕ → Std.Format

      Whether nasal substitution applies.

        def Zuraw2010.instReprSubSt.repr :
        SubSt → ℕ → Std.Format
          @[reducible, inline]

          A candidate is a stem consonant paired with a substitution decision.

            Dictionary substitution rate for voiceless labial p (253/263 ≈ 96.2%). Counts as reported in @cite{zuraw-2010} §2.2 (page 423) from a Tagalog dictionary corpus study.

              Dictionary substitution rate for voiced labial b (177/277 ≈ 63.9%). Counts as reported in @cite{zuraw-2010} §2.2 (page 423) from a Tagalog dictionary corpus study.

                Voicing effect in dictionary data (labial place): voiceless p has a higher substitution rate than voiced b.
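
The comparison behind the labial voicing effect can be checked without rational arithmetic by cross-multiplying the two dictionary counts. This is a sketch of the arithmetic only; how the file itself states and proves the comparison is not shown on this page.

```lean
-- 177/277 < 253/263 holds iff 177 * 263 < 253 * 277, since both
-- denominators are positive. Numerically: 46551 < 70081.
example : 177 * 263 < 253 * 277 := by decide
```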

                NasSub (project-canonical name following @cite{zuraw-hayes-2017} ex. (3)). Extensionally equivalent to @cite{zuraw-2010}'s DEP-C in the present 6-constraint analysis, though the two papers frame the same constraint differently:

                • @cite{zuraw-2010} §3.1 (page 432, ex. 6): faithfulness DEP-C, penalizes inserting a segmental host for the floating [+nas] feature. Violated by NO candidate pamⱼ-bɪɡaj because the inserted [m] segment has no input correspondent.
                • @cite{zuraw-hayes-2017} ex. (3): markedness NasSub, penalizes nasal + obstruent across morpheme boundaries.

                Both fire on every NO candidate in the present 6-constraint subset, so the violation profile coincides. We follow Zuraw-Hayes 2017's naming for consistency with downstream files (ZurawHayes2017.lean, Magri2025.lean).

                NB: In earlier commits this constraint was labeled *NC; renamed for fidelity to the paper's notation, where *NC is reserved for the voiceless-only constraint (see starNC below).

                  *NC, after @cite{pater-1999} (Austronesian NS) and @cite{pater-2001} (revisited). Penalizes nasal + voiceless-obstruent sequences. Violated by NO for voiceless stems only. Per @cite{zuraw-2010} ex. (17) (page 436): "*NC: A [+nasal] segment must not be immediately followed by a [-voice, -sonorant] segment".

                    *ASSOC: penalizes adding a new association line (faithfulness). Per @cite{zuraw-2010} (page 432, ex. 7), this is "*ASSOCIATE_hetero-morphemic" — the local restriction of a more general *ASSOC family that fires on association lines crossing morpheme boundaries. Violated by YES for every stem.

                      *[ŋ, after @cite{prince-1997-stringency} and @cite{delacy-2002} on stringency hierarchies; @cite{zuraw-2010} ex. (19) (page 437). Stems must not begin with ŋ. Violated by YES for velar stems (k, g coalesce to stem-initial ŋ).

                        *[n: stringency-hierarchy member after @cite{prince-1997-stringency}, @cite{delacy-2002}; @cite{zuraw-2010} ex. (19) (page 437). Stems must not begin with n or any nasal further back (that is, n or ŋ). Violated by YES for coronal and velar stems.

                          *[m: top of the stringency hierarchy after @cite{prince-1997-stringency}, @cite{delacy-2002}; @cite{zuraw-2010} ex. (19) (page 437). Stems must not begin with m or any nasal further back (that is, with any nasal). Violated by YES for all stems (every coalesced output is a stem-initial nasal at some place).

                            The six constraints, indexed for substrate consumption. Order matches @cite{zuraw-2010}'s §4.2 footnote 17 (page 446): NasSub, *NC, *ASSOC, *[ŋ, *[n, *[m.

                              The stringent *[N hierarchy assigns increasing total violation counts to nasals at places further back: summed over *[ŋ, *[n, and *[m, labial m incurs 1 violation, coronal n incurs 2, and velar ŋ incurs 3.
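
The counts 1, 2, 3 arise by summing three nested 0/1 constraints. The sketch below illustrates this under the constraint descriptions given above; `Place`, `starInitVelar`, `starInitCorVelar`, and `total` are hypothetical names, and only `starInitAll` echoes an identifier that appears on this page.

```lean
namespace StringencySketch

/-- Place of the stem-initial nasal produced by coalescence. -/
inductive Place where
  | labial | coronal | velar
  deriving DecidableEq, Repr

/-- `*[ŋ`: violated only by a stem-initial velar nasal. -/
def starInitVelar : Place → Nat
  | .velar => 1
  | _ => 0

/-- `*[n`: violated by a stem-initial coronal or velar nasal. -/
def starInitCorVelar : Place → Nat
  | .labial => 0
  | _ => 1

/-- `*[m`: violated by every stem-initial nasal. -/
def starInitAll : Place → Nat := fun _ => 1

/-- Total violations across the three nested constraints. -/
def total (pl : Place) : Nat :=
  starInitVelar pl + starInitCorVelar pl + starInitAll pl

example : total .labial = 1 := rfl
example : total .coronal = 2 := rfl
example : total .velar = 3 := rfl

end StringencySketch
```

Because each constraint's violation set contains the next one's, the totals can only grow toward the back of the mouth; no ranking of the three can reverse that order.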

                              theorem Zuraw2010.assoc_eq_initAll (c : StemC) (s : SubSt) :
                              starAssoc.eval (c, s) = starInitAll.eval (c, s)

                              *ASSOC and *[m have identical violation profiles on this candidate space. A coincidence of the 0/1-violation simplification rather than a deep identity: in @cite{zuraw-2010}'s richer analysis, *ASSOC's flat penalty contrasts with *[m's stringency-hierarchy role.

                              def Zuraw2010.vp (c : StemC) (s : SubSt) (i : Fin 6) :
                              ℕ

                              Violation profile derived from the constraint definitions, in the Input → Output → Fin n → ℕ shape required by PartialOrderConstraints.PicksAt and pocPredict.

                                def Zuraw2010.nsCands :
                                StemC → Finset SubSt

                                POC candidate set per stem: both YES and NO are available for every stem-initial obstruent.

                                  def Zuraw2010.relevant (c : StemC) :
                                  Finset (Fin 6)

                                  The set of constraint indices that distinguish YES from NO for stem c — i.e. constraints that disagree on the two candidates' violation counts. Computed directly from vp; see relevant_* below for concrete decide-discharged values.

                                    def Zuraw2010.yesFav (c : StemC) :
                                    Finset (Fin 6)

                                    The set of constraint indices that favor YES for stem c — constraints assigning fewer violations to YES than to NO. Computed from vp; see yesFav_* below for concrete values.

                                      Concrete decide-discharged values for relevant and yesFav, matching @cite{zuraw-2010} §4.2 footnote 17's per-consonant constraint subsets.

                                      @[simp]
                                      theorem Zuraw2010.relevant_p :
                                      relevant StemC.p = {0, 1, 2, 5}
                                      @[simp]
                                      theorem Zuraw2010.relevant_t :
                                      relevant StemC.t = {0, 1, 2, 4, 5}
                                      @[simp]
                                      theorem Zuraw2010.relevant_k :
                                      relevant StemC.k = {0, 1, 2, 3, 4, 5}
                                      @[simp]
                                      @[simp]
                                      theorem Zuraw2010.relevant_d :
                                      relevant StemC.d = {0, 2, 4, 5}
                                      @[simp]
                                      theorem Zuraw2010.relevant_g :
                                      relevant StemC.g = {0, 2, 3, 4, 5}
                                      @[simp]
                                      @[simp]
                                      @[simp]
                                      @[simp]
                                      @[simp]
                                      @[simp]
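
The rendered relevant_* values can be recomputed from the constraint descriptions alone. The following is a self-contained sketch, not the file's code: it uses lists of `Nat` in place of `Finset (Fin 6)`, a `Bool` in place of `SubSt`, and a violation table `vp` transcribed from the prose descriptions of the six constraints. All names live in a hypothetical `RelevantSketch` namespace.

```lean
namespace RelevantSketch

/-- Stem-initial obstruents; a stand-in for the file's `StemC`. -/
inductive Stem where
  | p | t | k | b | d | g
  deriving DecidableEq, Repr

def voiceless : Stem → Bool
  | .p | .t | .k => true
  | _ => false

/-- Place index: 0 = labial, 1 = coronal, 2 = velar. -/
def place : Stem → Nat
  | .p | .b => 0
  | .t | .d => 1
  | .k | .g => 2

/-- Violations of constraint `i` by the YES (`yes = true`) or NO candidate for `c`.
Indices: 0 NasSub, 1 *NC, 2 *ASSOC, 3 *[ŋ, 4 *[n, 5 *[m. -/
def vp (c : Stem) (yes : Bool) (i : Nat) : Nat :=
  match i, yes with
  | 0, false => 1                              -- NasSub: every NO candidate
  | 1, false => if voiceless c then 1 else 0   -- *NC: NO, voiceless stems only
  | 2, true  => 1                              -- *ASSOC: every YES candidate
  | 3, true  => if place c ≥ 2 then 1 else 0   -- *[ŋ: YES, velar stems
  | 4, true  => if place c ≥ 1 then 1 else 0   -- *[n: YES, coronal and velar stems
  | 5, true  => 1                              -- *[m: every YES candidate
  | _, _     => 0

/-- Constraints whose violation counts differ between YES and NO. -/
def relevant (c : Stem) : List Nat :=
  (List.range 6).filter fun i => vp c true i != vp c false i

/-- Constraints that assign fewer violations to YES than to NO. -/
def yesFav (c : Stem) : List Nat :=
  (List.range 6).filter fun i => decide (vp c true i < vp c false i)

-- These agree with the rendered `relevant_*` statements for p, t, k, d, g.
example : relevant Stem.p = [0, 1, 2, 5] := by decide
example : relevant Stem.t = [0, 1, 2, 4, 5] := by decide
example : relevant Stem.k = [0, 1, 2, 3, 4, 5] := by decide
example : relevant Stem.d = [0, 2, 4, 5] := by decide
example : relevant Stem.g = [0, 2, 3, 4, 5] := by decide

-- What the sketch computes for the YES-favoring sets of two stems.
example : yesFav Stem.p = [0, 1] := by decide
example : yesFav Stem.d = [0] := by decide

end RelevantSketch
```
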
                                      def Zuraw2010.subProb (c : StemC) :

                                      Substitution probability under POC sampling with the discrete partial order: the fraction of all 6! = 720 total orders that pick YES as the OT optimum for stem c.

                                        Substitution rate for voiceless labial p: 50% of 720 rankings.

                                        Substitution rate for voiceless coronal t: 40% of 720 rankings.

                                        Substitution rate for voiceless velar k: 33⅓% of 720 rankings.

                                        Substitution rate for voiced labial b: 33⅓% of 720 rankings.

                                        Substitution rate for voiced coronal d: 25% of 720 rankings.

                                        Substitution rate for voiced velar g: 20% of 720 rankings.

                                        theorem Zuraw2010.factorial_rates :
                                        subProb StemC.p = 1 / 2 ∧ subProb StemC.t = 2 / 5 ∧ subProb StemC.k = 1 / 3 ∧ subProb StemC.b = 1 / 3 ∧ subProb StemC.d = 1 / 4 ∧ subProb StemC.g = 1 / 5

                                        All six factorial percentages, matching the free-ranking summary in @cite{zuraw-2010} §4.2 footnote 17 (page 446): 50%, 40%, 33⅓%, 33⅓%, 25%, 20% for p, t, k, b, d, g respectively. Each is derived in closed form from the substrate's picksAt_rate_eq, with no enumeration of the 6! rankings.
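
As a sanity check on these six rates, the sketch below counts rankings by brute force and compares the counts with a ratio of set sizes. It is not how the file proves the result (the file avoids enumeration). It assumes, following the constraint descriptions, two candidates per stem with 0/1 violations, so the highest-ranked constraint that separates YES from NO decides the winner. Every name here is hypothetical, including the reading of the closed form as |yesFav c| / |relevant c|.

```lean
namespace RateSketch

inductive Stem where
  | p | t | k | b | d | g
  deriving DecidableEq, Repr

def allStems : List Stem := [.p, .t, .k, .b, .d, .g]

def voiceless : Stem → Bool
  | .p | .t | .k => true
  | _ => false

/-- Place index: 0 = labial, 1 = coronal, 2 = velar. -/
def place : Stem → Nat
  | .p | .b => 0
  | .t | .d => 1
  | .k | .g => 2

/-- Violation table; indices 0 NasSub, 1 *NC, 2 *ASSOC, 3 *[ŋ, 4 *[n, 5 *[m. -/
def vp (c : Stem) (yes : Bool) (i : Nat) : Nat :=
  match i, yes with
  | 0, false => 1
  | 1, false => if voiceless c then 1 else 0
  | 2, true  => 1
  | 3, true  => if place c ≥ 2 then 1 else 0
  | 4, true  => if place c ≥ 1 then 1 else 0
  | 5, true  => 1
  | _, _     => 0

/-- Insert `x` at every position of a list. -/
def inserts (x : Nat) : List Nat → List (List Nat)
  | [] => [[x]]
  | y :: ys => (x :: y :: ys) :: (inserts x ys).map (fun zs => y :: zs)

/-- All permutations of a list, read as rankings from highest to lowest. -/
def perms : List Nat → List (List Nat)
  | [] => [[]]
  | x :: xs => (perms xs).foldr (fun r acc => inserts x r ++ acc) []

/-- YES wins iff the highest-ranked separating constraint prefers YES. -/
def picksYes (c : Stem) (ranking : List Nat) : Bool :=
  match ranking.find? (fun i => vp c true i != vp c false i) with
  | some i => decide (vp c true i < vp c false i)
  | none   => false

/-- Number of the 720 total orders under which `c` substitutes. -/
def yesCount (c : Stem) : Nat :=
  ((perms (List.range 6)).filter (picksYes c)).length

def relevantCount (c : Stem) : Nat :=
  ((List.range 6).filter (fun i => vp c true i != vp c false i)).length

def yesFavCount (c : Stem) : Nat :=
  ((List.range 6).filter (fun i => decide (vp c true i < vp c false i))).length

#eval allStems.map yesCount
-- [360, 288, 240, 240, 180, 144]

-- Cross-multiplied closed form: yesCount / 720 = yesFavCount / relevantCount.
#eval allStems.all fun c => yesCount c * relevantCount c == 720 * yesFavCount c
-- true

end RateSketch
```

Dividing the counts by 720 gives 1/2, 2/5, 1/3, 1/3, 1/4, 1/5, the values in factorial_rates. The ratio form works because, under uniform sampling of total orders, each constraint in the distinguishing set is equally likely to be the highest-ranked one.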

                                        Place monotonicity (model property): the factorial rate strictly decreases from labial to velar within each voicing class. NB: the place effect within voiceless stems is not statistically significant in @cite{zuraw-2010}'s §5 acceptability data (paper page 459: in a mixed-effects model, labials get a slightly lower rating difference than dentals, by 0.3 points, but the difference is not significant). The strict inequality below is therefore a property of the 6-constraint factorial idealization, not a paper-citable empirical claim about voiceless stems.

                                        Voicing monotonicity: voiceless substitution rate is at least as high as voiced at every place. Empirically robust across all of @cite{zuraw-2010}'s data sources (Fig 1 dictionary, Fig 8 corpus, Fig 14 acceptability, Fig 15 web survey) — also significant in every mixed-effects model the paper reports.

                                        These structural per-ranking implication theorems formalize the cross-linguistic implicational universals established in @cite{newman-1984}'s overview of Western Austronesian (replicated in @cite{blust-2004}'s 48-language survey): if NS applies to a voiced obstruent, it applies to the corresponding voiceless obstruent; if NS applies to a stop, it applies to any fronter stop of the same voicing. The substrate proofs go via the lifted helpers Core.Constraint.PermSubsetCombinatorics.head_filter_subset_extends and head_filter_smaller_inherits (originally private here, lifted to substrate alongside perm_filter_head_in_card).

                                        theorem Zuraw2010.PicksAt_extends_smaller_D {σ : Equiv.Perm (Fin 6)} {c c' : StemC} (h_D : relevant c' ⊆ relevant c) (h_Y : yesFav c' ⊆ yesFav c) (h_extra : ∀ x ∈ relevant c, x ∉ relevant c' → x ∈ yesFav c) (h_c' : Core.Constraint.PartialOrderConstraints.PicksAt nsCands vp σ c' SubSt.yes) :
                                        Core.Constraint.PartialOrderConstraints.PicksAt nsCands vp σ c SubSt.yes

                                        Voicing-style extension: if c' has a smaller distinguishing set than c, and every constraint that is relevant to c but not to c' favors YES, then substitution for c' implies substitution for c under the same ranking σ. Used for voiced→voiceless implications.

                                        Place-style extension: if c' has a larger distinguishing set than c, but the YES-favorers of c' all lie in c's smaller set, then substitution for c' implies substitution for c under the same ranking. Used for backer→fronter implications within a voicing class.

                                        Voicing effect, labial: if voiced labial b undergoes substitution, so does voiceless labial p. Per-ranking, structural — no enumeration.

                                        Place effect, voiceless k→t: if velar k undergoes substitution, so does coronal t.

                                        Place effect, voiceless t→p: if coronal t undergoes substitution, so does labial p.

                                        Place effect, voiced g→d: if velar g undergoes substitution, so does coronal d.

                                        Place effect, voiced d→b: if coronal d undergoes substitution, so does labial b.
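
The per-ranking implications can be illustrated with plain lists: a stem substitutes under a ranking exactly when the highest-ranked member of its distinguishing set lies in its YES-favoring set. The sketch below checks two implications, and one failing converse, over all 720 rankings. This is an enumeration for illustration only; the file's proofs are structural. The names `headIn` and `implies` are hypothetical, and the YES-favoring sets {0, 1} (voiceless) and {0} (voiced) are inferred from the constraint descriptions rather than copied from rendered statements.

```lean
namespace ImplicationSketch

/-- Insert `x` at every position of a list. -/
def inserts (x : Nat) : List Nat → List (List Nat)
  | [] => [[x]]
  | y :: ys => (x :: y :: ys) :: (inserts x ys).map (fun zs => y :: zs)

/-- All permutations of a list, read as rankings from highest to lowest. -/
def perms : List Nat → List (List Nat)
  | [] => [[]]
  | x :: xs => (perms xs).foldr (fun r acc => inserts x r ++ acc) []

/-- Does the highest-ranked member of `D` in `ranking` belong to `Y`? -/
def headIn (ranking D Y : List Nat) : Bool :=
  match ranking.find? (fun i => D.contains i) with
  | some i => Y.contains i
  | none   => false

/-- Under every ranking: if the stem with sets `D'`, `Y'` substitutes,
so does the stem with sets `D`, `Y`. -/
def implies (D' Y' D Y : List Nat) : Bool :=
  (perms (List.range 6)).all fun r => !headIn r D' Y' || headIn r D Y

-- Voicing, coronal: d substitutes → t substitutes.
#eval implies [0, 2, 4, 5] [0] [0, 1, 2, 4, 5] [0, 1]            -- true

-- Place, voiceless: k substitutes → t substitutes.
#eval implies [0, 1, 2, 3, 4, 5] [0, 1] [0, 1, 2, 4, 5] [0, 1]   -- true

-- The converse of the voicing implication fails, e.g. when *NC (1)
-- outranks *ASSOC (2), which outranks NasSub (0).
#eval implies [0, 1, 2, 4, 5] [0, 1] [0, 2, 4, 5] [0]            -- false

end ImplicationSketch
```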

                                        A ranking exists under which every consonant undergoes substitution — corresponding to Pattern (j) in @cite{zuraw-2010} Table 5 (page 462), exemplified by Limos Kalinga, Ginaang Kalinga, and Sarangani Manobo (paper page 463; not Tagalog itself, which has variation: Fig 1 rates of 96/91/92/64/26/2% for p/t/k/b/d/g). Witness: the identity permutation, under which NasSub (constraint index 0) is highest-ranked and favors YES for every stem.

                                        The probabilistic 2×2-square version of the Tagalog variation pattern under a different constraint inventory is treated in Phenomena/Phonology/Studies/ZurawHayes2017.lean and Phenomena/Phonology/Studies/Magri2025.lean.