@cite{zuraw-2010}: Factorial Typology of Nasal Substitution #
Formalizes the factorial typology of Tagalog-style nasal substitution from @cite{zuraw-2010} (NLLT 28: 417–472). When a nasal-final prefix (e.g. maŋ-) is concatenated with an obstruent-initial stem, the nasal and the obstruent may coalesce into a single nasal retaining the place of the latter:
- maŋ + bigáj → mamigáj 'to distribute' (nasal substitution = YES)
- paŋ + tabój → pantabój 'to goad' (faithful cluster = NO)
Substrate consumption #
This file routes through the project's POC (Partially Ordered
Constraints) substrate. For each stem-initial consonant c:
- vp : StemC → SubSt → Fin 6 → ℕ is the violation profile derived directly from the six constraint definitions (no separate stipulation).
- relevant c : Finset (Fin 6) and yesFav c : Finset (Fin 6) are computed from vp ({i : vp c .yes i ≠ vp c .no i} and {i : vp c .yes i < vp c .no i} respectively), with concrete decide-discharged values.
- subProb c : ℚ is Core.Constraint.PartialOrderConstraints.pocPredict applied to the discrete partial order on Fin 6, i.e. uniform sampling over all 720 total orders.
- The closed-form rate |Y_c ∩ D_c| / |D_c| follows by a single application of picksAt_rate_eq (which combines the binary-PicksAt bridge with perm_filter_head_in_card's rational form), with no enumeration of 6! = 720 rankings.
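The closed-form rate can be motivated as follows. This is a sketch, assuming D_c stands for relevant c and Y_c for yesFav c (the notation is not spelled out at this point in the file). Under any total ranking, the YES/NO choice for stem c is settled by the highest-ranked constraint in D_c, and under uniform sampling each member of D_c is equally likely to be that constraint. The worked instance for p uses the violation patterns listed under "Constraint set":

```latex
\Pr[\mathrm{YES} \mid c] \;=\; \frac{|Y_c \cap D_c|}{|D_c|},
\qquad
\Pr[\mathrm{YES} \mid p]
  \;=\; \frac{|\{\text{NasSub},\ \text{*NC}\}|}
             {|\{\text{NasSub},\ \text{*NC},\ \text{*ASSOC},\ \text{*[m}\}|}
  \;=\; \frac{2}{4} \;=\; \frac{1}{2}.
```

Since a constraint that favors YES necessarily distinguishes YES from NO, Y_c ⊆ D_c and the intersection in the numerator is just Y_c.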
The structural implication theorems in §7 reuse
Core.Constraint.PermSubsetCombinatorics.head_filter_subset_extends
and head_filter_smaller_inherits (lifted from earlier versions of
this file's private helpers) — pure list-filter monotonicity facts
that any binary-output OT factorial-typology study can consume.
Constraint set #
Six constraints drive the factorial typology, matching @cite{zuraw-2010}'s §4.2 footnote 17 (page 446) where the free-ranking enumeration appears:
- NasSub (project-canonical name following @cite{zuraw-hayes-2017} ex. (3); extensionally coincides with @cite{zuraw-2010}'s DEP-C; see the nasSub docstring): violated by NO for every stem.
- *NC, after @cite{pater-1999}: penalizes nasal+voiceless-obstruent clusters; violated by NO for voiceless stems.
- *ASSOC (faithfulness): penalizes adding a new association line; violated by YES for every stem.
- *[ŋ, *[n, *[m (markedness, stringent hierarchy after @cite{prince-1997-stringency} and @cite{delacy-2002}): penalize stem-initial nasals at velar/coronal/labial places respectively (with backer = more violations).
Other Zuraw 2010 constraints (MAX(+nas), UNIFORMITY, MORPHEMECOHESION, NOCODA, *GEMINATE, NASASSIM, IDENT(place), FAITH-OO, INTEGRITY-IO) are held high-ranked per Zuraw's analytical choice and do not appear here; they would not vary the YES/NO outcome on the candidate set considered.
Implicational universals (structural) #
The voicing effect (voiced→YES implies voiceless→YES at the same place)
and the place effect (backer→YES implies fronter→YES within a voicing
class) follow from the set-theoretic relationships between D_c and
Y_c across consonants — proved structurally per-ranking, no enumeration.
These typological generalizations are independently established in
@cite{newman-1984}'s overview of Western Austronesian and replicated
in @cite{blust-2004}'s 48-language survey.
Dictionary data #
@cite{zuraw-2010}'s Tagalog dictionary counts (paper §2.2, page 423) confirm the voicing effect: voiceless stems show higher substitution rates than voiced stems at the labial place (p: 253/263 vs b: 177/277).
Relation to other Tagalog NS analyses #
The closely-related study files
Phenomena/Phonology/Studies/ZurawHayes2017.lean and
Phenomena/Phonology/Studies/Magri2025.lean analyze a 2×2 sub-square
of this same phenomenon (maŋ-other / paŋ-res prefixes × /b/ /k/ stems)
under a different constraint inventory (NasSub / *NC / *[stemŋ] /
*[stemŋ]/n / prefix-indexed UNIFORMITY) for a MaxEnt analysis of the
Hayes-Zuraw shifted-sigmoids generalization. The constraint sets and
the data slices differ; the two strands are complementary readings of
@cite{zuraw-2010}'s underlying phenomenon. ZurawHayes2017 and Magri2025
import the constraint identity definitions in §1 below via comap —
those definitions must remain stable.
§ 0: Stems, Substitution Decisions, Dictionary Counts #
StemC: the six stem-initial obstruents p, t, k, b, d, g. Derived instances: DecidableEq, Repr, Fintype.
Whether nasal substitution applies.
Constructors: yes, no. Derived instances: DecidableEq, Repr, Fintype.
A candidate is a stem consonant paired with a substitution decision.
Dictionary substitution rate for voiceless labial p (253/263 ≈ 96.2%). Counts as reported in @cite{zuraw-2010} §2.2 (page 423) from a Tagalog dictionary corpus study.
Equations
- Zuraw2010.dictRate_p = 253 / 263
Dictionary substitution rate for voiced labial b (177/277 ≈ 63.9%). Counts as reported in @cite{zuraw-2010} §2.2 (page 423) from a Tagalog dictionary corpus study.
Equations
- Zuraw2010.dictRate_b = 177 / 277
Voicing effect in dictionary data (labial place): voiceless p has a higher substitution rate than voiced b.
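The comparison reduces to integer arithmetic by cross-multiplication, since both denominators are positive. The following core-Lean check is only an illustration of that arithmetic; it works over ℕ and is not the file's actual proof, which presumably compares the ℚ-valued dictRate_p and dictRate_b.

```lean
-- 177/277 < 253/263  ⇔  177 * 263 < 253 * 277  (denominators are positive)
example : 177 * 263 < 253 * 277 := by decide

#eval (177 * 263, 253 * 277)  -- (46551, 70081)
```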
NasSub (project-canonical name following @cite{zuraw-hayes-2017} ex. (3)). Extensionally equivalent to @cite{zuraw-2010}'s DEP-C in the present 6-constraint analysis, though the two papers frame the same constraint differently:
- @cite{zuraw-2010} §3.1 (page 432, ex. 6): faithfulness DEP-C, penalizes inserting a segmental host for the floating [+nas] feature. Violated by the NO candidate pamⱼ-bɪɡaj because the inserted [m] segment has no input correspondent.
- @cite{zuraw-hayes-2017} ex. (3): markedness NasSub, penalizes nasal + obstruent across morpheme boundaries.
Both fire on every NO candidate in the present 6-constraint subset,
so the violation profile coincides. We follow Zuraw-Hayes 2017's
naming for consistency with downstream files
(ZurawHayes2017.lean, Magri2025.lean).
NB: In earlier commits this constraint was labeled *NC; renamed
for fidelity to the paper's notation, where *NC is reserved for
the voiceless-only constraint (see starNC below).
Equations
- Zuraw2010.nasSub = Core.Constraint.OT.mkMark "NasSub" fun (c : Zuraw2010.NSCand) => c.2 = Zuraw2010.SubSt.no
*NC, after @cite{pater-1999} (Austronesian NS) and @cite{pater-2001} (revisited). Penalizes nasal + voiceless-obstruent sequences. Violated by NO for voiceless stems only. Per @cite{zuraw-2010} ex. (17) (page 436): "*NC: A [+nasal] segment must not be immediately followed by a [-voice, -sonorant] segment".
Equations
- Zuraw2010.starNC = Core.Constraint.OT.mkMark "*NC" fun (c : Zuraw2010.NSCand) => c.1 ∈ [Zuraw2010.StemC.p, Zuraw2010.StemC.t, Zuraw2010.StemC.k] ∧ c.2 = Zuraw2010.SubSt.no
*ASSOC: penalizes adding a new association line (faithfulness). Per @cite{zuraw-2010} (page 432, ex. 7), this is "*ASSOCIATE_hetero-morphemic" — the local restriction of a more general *ASSOC family that fires on association lines crossing morpheme boundaries. Violated by YES for every stem.
Equations
- Zuraw2010.starAssoc = Core.Constraint.OT.mkMark "*ASSOC" fun (c : Zuraw2010.NSCand) => c.2 = Zuraw2010.SubSt.yes
*[ŋ, after @cite{prince-1997-stringency} and @cite{delacy-2002} on stringency hierarchies; @cite{zuraw-2010} ex. (19) (page 437). Stems must not begin with ŋ. Violated by YES for velar stems (k, g coalesce to stem-initial ŋ).
Equations
- Zuraw2010.starInitVelar = Core.Constraint.OT.mkMark "*[ŋ" fun (c : Zuraw2010.NSCand) => c.1 ∈ [Zuraw2010.StemC.k, Zuraw2010.StemC.g] ∧ c.2 = Zuraw2010.SubSt.yes
*[n: stringency-hierarchy member after @cite{prince-1997-stringency}, @cite{delacy-2002}; @cite{zuraw-2010} ex. (19) (page 437). Stems must not begin with n or backer. Violated by YES for coronal and velar stems.
*[m: top of the stringency hierarchy after @cite{prince-1997-stringency}, @cite{delacy-2002}; @cite{zuraw-2010} ex. (19) (page 437). Stems must not begin with m or backer. Violated by YES for all stems (every coalesced output begins with a nasal at some place).
Equations
- Zuraw2010.starInitAll = Core.Constraint.OT.mkMark "*[m" fun (c : Zuraw2010.NSCand) => c.2 = Zuraw2010.SubSt.yes
The six constraints, indexed for substrate consumption. Order matches @cite{zuraw-2010}'s §4.2 footnote 17 (page 446): NasSub, *NC, *ASSOC, *[ŋ, *[n, *[m.
The stringent *[N hierarchy assigns increasing total violation counts (summed over *[ŋ, *[n, *[m) to nasals at backer places: labial m=1, coronal n=2, velar ŋ=3.
*ASSOC and *[m have identical violation profiles on this candidate space. This is a coincidence of the 0/1-violation simplification rather than a deep identity: in @cite{zuraw-2010}'s richer analysis, *ASSOC's flat penalty contrasts with *[m's stringency-hierarchy role.
Violation profile derived from the constraint definitions, in the
Input → Output → Fin n → ℕ shape required by
PartialOrderConstraints.PicksAt and pocPredict.
Equations
- Zuraw2010.vp c s i = (Zuraw2010.constraint i).eval (c, s)
POC candidate set per stem: both YES and NO are available for every stem-initial obstruent.
Equations
- Zuraw2010.nsCands x✝ = Finset.univ
The set of constraint indices that distinguish YES from NO for stem
c — i.e. constraints that disagree on the two candidates' violation
counts. Computed directly from vp; see relevant_* below for
concrete decide-discharged values.
Equations
- Zuraw2010.relevant c = {i : Fin 6 | Zuraw2010.vp c Zuraw2010.SubSt.yes i ≠ Zuraw2010.vp c Zuraw2010.SubSt.no i}
The set of constraint indices that favor YES for stem c —
constraints assigning fewer violations to YES than to NO. Computed
from vp; see yesFav_* below for concrete values.
Equations
- Zuraw2010.yesFav c = {i : Fin 6 | Zuraw2010.vp c Zuraw2010.SubSt.yes i < Zuraw2010.vp c Zuraw2010.SubSt.no i}
Concrete decide-discharged values for relevant and yesFav,
matching @cite{zuraw-2010} §4.2 footnote 17's per-consonant constraint
subsets.
Substitution probability under POC sampling with the discrete partial
order: the fraction of all 6! = 720 total orders that pick YES as
the OT optimum for stem c.
Substitution rate for voiceless labial p: 50% of 720 rankings.
Substitution rate for voiceless coronal t: 40% of 720 rankings.
Substitution rate for voiceless velar k: 33⅓% of 720 rankings.
Substitution rate for voiced labial b: 33⅓% of 720 rankings.
Substitution rate for voiced coronal d: 25% of 720 rankings.
Substitution rate for voiced velar g: 20% of 720 rankings.
All six factorial percentages, matching @cite{zuraw-2010} §4.2
footnote 17 (page 446)'s free-ranking summary (50%, 40%, 33⅓%,
33⅓%, 25%, 20% for p, t, k, b, d, g respectively). Each derived in
closed form from the substrate's picksAt_rate_eq — no 6!
enumeration.
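The six percentages can be reproduced from the violation patterns alone. The block below is a minimal, self-contained sketch in core Lean 4 (no Mathlib). The namespace Zuraw2010Sketch, the helper predicates, and the List/Nat encoding are all assumptions made for illustration; the project's own definitions use Finset (Fin 6), ℚ, and Core.Constraint.OT.mkMark instead. Rates are returned as (numerator, denominator) pairs to avoid depending on a rational-number type.

```lean
namespace Zuraw2010Sketch  -- hypothetical namespace, not the project's Zuraw2010

inductive StemC where
  | p | t | k | b | d | g
  deriving DecidableEq, Repr

inductive SubSt where
  | yes | no
  deriving DecidableEq, Repr

def voiceless : StemC → Bool
  | .p | .t | .k => true
  | _ => false

def velar : StemC → Bool
  | .k | .g => true
  | _ => false

def nonLabial : StemC → Bool
  | .p | .b => false
  | _ => true

/-- Violations of constraint `i` on candidate `(c, s)`.
    Index order: 0 NasSub, 1 *NC, 2 *ASSOC, 3 *[ŋ, 4 *[n, 5 *[m. -/
def vp (c : StemC) (s : SubSt) (i : Nat) : Nat :=
  let violated : Bool :=
    match i with
    | 0 => s == SubSt.no
    | 1 => voiceless c && s == SubSt.no
    | 2 => s == SubSt.yes
    | 3 => velar c && s == SubSt.yes
    | 4 => nonLabial c && s == SubSt.yes
    | 5 => s == SubSt.yes
    | _ => false
  if violated then 1 else 0

/-- Constraints whose violation counts differ between YES and NO. -/
def relevant (c : StemC) : List Nat :=
  (List.range 6).filter fun i => vp c SubSt.yes i != vp c SubSt.no i

/-- Constraints assigning strictly fewer violations to YES. -/
def yesFav (c : StemC) : List Nat :=
  (List.range 6).filter fun i => decide (vp c SubSt.yes i < vp c SubSt.no i)

/-- The pair (|Y_c ∩ D_c|, |D_c|). -/
def rate (c : StemC) : Nat × Nat :=
  (((yesFav c).filter (fun i => (relevant c).contains i)).length,
   (relevant c).length)

def stems : List StemC := [.p, .t, .k, .b, .d, .g]

#eval relevant StemC.p  -- [0, 1, 2, 5]
#eval yesFav StemC.p    -- [0, 1]
#eval stems.map rate    -- [(2, 4), (2, 5), (2, 6), (1, 3), (1, 4), (1, 5)]

end Zuraw2010Sketch
```

The pairs reduce to 1/2, 2/5, 1/3, 1/3, 1/4, 1/5, i.e. the 50%, 40%, 33⅓%, 33⅓%, 25%, 20% listed above.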
Place monotonicity (model property): the factorial rate strictly decreases from labial to velar within each voicing class. NB: the place effect within voiceless is statistically not significant in @cite{zuraw-2010}'s §5 acceptability data (paper page 459: in a mixed-effects model labials get a slightly lower rating difference than dentals — by 0.3 points — but this is not significant). The strict inequality below is therefore a property of the 6-constraint factorial idealization, not a paper-citable empirical claim about voiceless stems.
Voicing monotonicity: voiceless substitution rate is at least as high as voiced at every place. Empirically robust across all of @cite{zuraw-2010}'s data sources (Fig 1 dictionary, Fig 8 corpus, Fig 14 acceptability, Fig 15 web survey) — also significant in every mixed-effects model the paper reports.
These structural per-ranking implication theorems formalize the
cross-linguistic implicational universals established in
@cite{newman-1984}'s overview of Western Austronesian (replicated in
@cite{blust-2004}'s 48-language survey): if NS applies to a voiced
obstruent, it applies to the corresponding voiceless obstruent;
if NS applies to a stop, it applies to any fronter stop of the same
voicing. The substrate proofs go via the lifted helpers
Core.Constraint.PermSubsetCombinatorics.head_filter_subset_extends
and head_filter_smaller_inherits (originally private here, lifted to
substrate alongside perm_filter_head_in_card).
Voicing-style extension: if c' has a smaller distinguishing set
than c and every extra constraint in c's set favors YES, then
substitution for c' implies substitution for c. Used for
voiced→voiceless implications.
Place-style extension: if c' has a larger distinguishing set
than c and every YES-favorer of c' lies in c's smaller set, then
substitution for c' implies substitution for c. Used for
backer→fronter implications within a voicing class.
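Both extension patterns can be traced on concrete pairs. The following is a sketch of the reasoning, assuming D_c and Y_c take the values implied by the constraint definitions (D_c = constraints distinguishing YES from NO, Y_c = constraints favoring YES). It does not reproduce the exact hypotheses of head_filter_subset_extends or head_filter_smaller_inherits. Here top_r(S) is an ad hoc name for the member of S ranked highest by ranking r.

```latex
\begin{aligned}
&\textbf{Voicing-style } (b \Rightarrow p):\quad
  D_b = \{\text{NasSub},\ \text{*ASSOC},\ \text{*[m}\},\quad
  D_p = D_b \cup \{\text{*NC}\},\quad
  \text{*NC} \in Y_p,\quad
  Y_b = \{\text{NasSub}\} \subseteq Y_p. \\
&\qquad \mathrm{top}_r(D_b) \in Y_b
  \;\Longrightarrow\;
  \mathrm{top}_r(D_p) \in \{\text{*NC},\ \mathrm{top}_r(D_b)\} \subseteq Y_p. \\[4pt]
&\textbf{Place-style } (k \Rightarrow t):\quad
  D_t = D_k \setminus \{\text{*[ŋ}\},\quad
  Y_k = Y_t = \{\text{NasSub},\ \text{*NC}\} \subseteq D_t. \\
&\qquad \mathrm{top}_r(D_k) \in Y_k
  \;\Longrightarrow\;
  \mathrm{top}_r(D_t) = \mathrm{top}_r(D_k) \in Y_t .
\end{aligned}
```

In the second case the step holds because the top-ranked member of D_k already lies in D_t, and no member of the subset D_t can outrank it.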
Voicing effect, labial: if voiced labial b undergoes substitution, so does voiceless labial p. Per-ranking, structural — no enumeration.
Voicing effect, coronal: if voiced d undergoes substitution, so does voiceless t.
Voicing effect, velar: if voiced g undergoes substitution, so does voiceless k.
Place effect, voiceless k→t: if velar k undergoes substitution, so does coronal t.
Place effect, voiceless t→p: if coronal t undergoes substitution, so does labial p.
Place effect, voiced g→d: if velar g undergoes substitution, so does coronal d.
Place effect, voiced d→b: if coronal d undergoes substitution, so does labial b.
Tagalog-style maximal substitution: if velar voiced g subs, every other consonant subs too. Composition of voicing + place effects — the upper end of the @cite{newman-1984} / @cite{blust-2004} implicational hierarchy.
A ranking exists under which every consonant undergoes substitution — corresponding to Pattern (j) in @cite{zuraw-2010} Table 5 (page 462), exemplified by Limos Kalinga, Ginaang Kalinga, and Sarangani Manobo (paper page 463; not Tagalog itself, which has variation: Fig 1 rates of 96/91/92/64/26/2% for p/t/k/b/d/g). Witness: the identity permutation, under which NasSub (constraint index 0) is highest-ranked and favors YES for every stem.
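The witness can be checked by evaluating a ranking directly. The block below is a minimal sketch in core Lean 4; the namespace RankingSketch, the violation-vector table, and picksYes are hypothetical names introduced for illustration and do not correspond to the substrate's PicksAt API. Each stem is encoded as a pair of violation vectors (YES, NO) in the index order NasSub, *NC, *ASSOC, *[ŋ, *[n, *[m.

```lean
namespace RankingSketch  -- hypothetical namespace for illustration

/-- (YES violations, NO violations) for p, t, k, b, d, g. -/
def tableaux : List (List Nat × List Nat) := [
  ([0, 0, 1, 0, 0, 1], [1, 1, 0, 0, 0, 0]),  -- p
  ([0, 0, 1, 0, 1, 1], [1, 1, 0, 0, 0, 0]),  -- t
  ([0, 0, 1, 1, 1, 1], [1, 1, 0, 0, 0, 0]),  -- k
  ([0, 0, 1, 0, 0, 1], [1, 0, 0, 0, 0, 0]),  -- b
  ([0, 0, 1, 0, 1, 1], [1, 0, 0, 0, 0, 0]),  -- d
  ([0, 0, 1, 1, 1, 1], [1, 0, 0, 0, 0, 0])   -- g
]

/-- OT evaluation for two candidates: the highest-ranked constraint on
    which YES and NO differ decides; YES wins iff it has fewer violations. -/
def picksYes (ranking yes no : List Nat) : Bool :=
  match ranking.find? (fun i => yes.getD i 0 != no.getD i 0) with
  | some i => decide (yes.getD i 0 < no.getD i 0)
  | none   => false

-- Identity ranking: NasSub (index 0) on top, so every stem substitutes.
#eval tableaux.all fun (y, n) => picksYes [0, 1, 2, 3, 4, 5] y n  -- true

-- With *ASSOC (index 2) on top instead, no stem substitutes.
#eval tableaux.any fun (y, n) => picksYes [2, 0, 1, 3, 4, 5] y n  -- false

end RankingSketch
```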
The probabilistic 2×2-square version of the Tagalog variation
pattern under a different constraint inventory is treated in
Phenomena/Phonology/Studies/ZurawHayes2017.lean and
Phenomena/Phonology/Studies/Magri2025.lean.