@cite{anttila-1997}: Deriving Variation from Grammar #
Formalizes the quantitative variation predictions for Finnish genitive plurals from @cite{anttila-1997}. Anttila's claim: free variation in Finnish, and crucially its statistical biases, is derivable from a single partially ranked OT grammar. The probability of a variant equals the fraction of the total rankings consistent with the partial ranking under which that variant wins.
The grammar #
Anttila stratifies 16 constraints into 5 mutually-ranked strata, with internal random ordering within each stratum (@cite{anttila-1997} eq. (49)–(50), page 21):
Set 1 ≫ Set 2 ≫ Set 3 ≫ Set 4 ≫ Set 5
- Set 1: *X̀.X̀ (1 constraint, NoClash)
- Set 2: *L̀, *H̀ (2 constraints; secondary-stress *L, *H)
- Set 3: *H/I, *Í, *L.L (3 constraints)
- Set 4: *H/O, *Ó, *L/A, *H.H, *X.X, *H́ (6 constraints; the final constraint is *H́ (acute) = primary-stressed-heavy, distinct from Set 2's *H̀ (grave) = secondary-stressed-heavy)
- Set 5: 8 lower constraints (irrelevant for the variation cases here)
Substrate consumption #
This file routes through the project's POC (Partially Ordered
Constraints) substrate. For each motif, a violation-profile function
vp : Input → Variant → Fin n → ℕ derives relevant (where vp
disagrees on the two variants) and yesFav (where vp favors the
chosen variant). pocPredict over discrete n (uniform sampling
over all n! total orders) gives the variant probability;
picksAt_rate_eq reduces pocPredict to |Y ∩ D| / |D| in closed
form — no enumeration of n! rankings.
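The reduction from enumeration to a ratio can be checked by brute force on a small case. The following is a minimal sketch in core Lean 4 (no Mathlib, no linglib). The helpers insertEverywhere, perms, beats, strong3 and weak3 are hypothetical names written for this illustration; they are not the substrate's pocPredict or picksAt_rate_eq. The sketch lists all 3! orders of a three-constraint stratum and counts those under which each variant wins, using the motif 3ab profile.

```lean
namespace RankingSketch

/-- Insert `x` at every position of a list (illustrative helper). -/
def insertEverywhere (x : Nat) : List Nat → List (List Nat)
  | [] => [[x]]
  | y :: ys => (x :: y :: ys) :: (insertEverywhere x ys).map (fun zs => y :: zs)

/-- All total orders of a list of constraint indices (illustrative helper). -/
def perms : List Nat → List (List Nat)
  | [] => [[]]
  | x :: xs => (perms xs).foldr (fun p acc => insertEverywhere x p ++ acc) []

/-- `a` beats `b` under a ranking (highest-ranked first) when the first
constraint on which the two profiles differ assigns `a` fewer violations. -/
def beats (a b : Nat → Nat) : List Nat → Bool
  | [] => false
  | c :: cs =>
    if a c < b c then true
    else if b c < a c then false
    else beats a b cs

-- Motif 3ab profiles, indexed *H/I = 0, *Í = 1, *L.L = 2.
def strong3 : Nat → Nat
  | 0 => 1
  | 1 => 1
  | _ => 0

def weak3 : Nat → Nat
  | 2 => 1
  | _ => 0

def set3 : List Nat := [0, 1, 2]

#eval (perms set3).length                                  -- 6
#eval ((perms set3).filter (beats strong3 weak3)).length   -- 2
#eval ((perms set3).filter (beats weak3 strong3)).length   -- 4

end RankingSketch
```

Only the highest-ranked constraint on which the variants differ decides the outcome, so the counts 2 and 4 out of 6 agree with the ratios 1/3 and 2/3 that the closed form gives directly.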
Two POC instances, one per stratum:
- Set 3 (n = 3): motif 3ab only. Input is Unit (single motif).
- Set 4 (n = 6): motifs 4ab and 5ab. Input is Set4Motif, to distinguish the two motifs' violation profiles.
Note on candidate-feature substrate #
We stipulate violation profiles via vp rather than defining
NamedConstraint instances. This matches Anttila's own level of
abstraction: the paper works directly with violation profiles
(@cite{anttila-1997} page 22: "knowing that the weak variant violates
one constraint (*L.L) while the strong variant violates two (*H/I,
*Í) gives us the result directly"). True NamedConstraint
formalisations would require a Finnish syllable substrate (input
forms with stress / weight / sonority features feeding into syllable
structure) which doesn't yet exist in linglib.
Predictions formalized #
From @cite{anttila-1997} table 52 (page 22) and table 53 (page 23):
- 3ab (L.TÍI ∼ L.TI, e.g. naa.pu.rei.den ∼ naa.pu.ri.en): decided in Set 3 (n = 3). Strong wins 1/3, weak wins 2/3. Observed: 36.9% / 63.1% (215 / 368 corpus tokens).
- 4ab (H.TÁA ∼ H.TA, e.g. máa.il.mòi.den ∼ máa.il.mo.jen): decided in Set 4 (n = 6), with both variants violating two Set-4 constraints. Each wins 1/2. Observed: 50.5% / 49.5% (46 / 45 corpus tokens).
- 5ab (H.TÓO ∼ H.TO, e.g. kór.jaa.mòi.den ∼ kór.jaa.mo.jen): decided in Set 4. Strong wins 1/5, weak wins 4/5. Observed: 17.8% / 82.2% (76 / 350 corpus tokens).
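The observed shares listed above can be recomputed from the token counts with integer arithmetic. This is a small sketch in core Lean 4; perMille is a hypothetical helper introduced only for this illustration, and it rounds down rather than to the nearest tenth of a percent.

```lean
/-- Share of `a` among `a + b` tokens, in per mille, rounded down
(illustrative helper, not part of linglib). -/
def perMille (a b : Nat) : Nat := a * 1000 / (a + b)

#eval perMille 215 368   -- 368 (3ab strong; predicted 1/3, about 333)
#eval perMille 46 45     -- 505 (4ab strong; predicted 1/2, i.e. 500)
#eval perMille 76 350    -- 178 (5ab strong; predicted 1/5, i.e. 200)
```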
Out of scope #
- Categorical motifs 1ab, 2ab, 6ab. Per @cite{anttila-1997} table 52, these are decided by Set 1 (NoClash) and Set 2 (*L̀ / *H̀), which this file doesn't model. The categorical predictions follow from higher-stratum constraints decisively favoring one variant.
- NamedConstraint instances for *H/I, *Í, *L.L, etc. These would require a Finnish syllable substrate (see "Note on candidate-feature substrate" above).
- Observed-vs-predicted comparison theorems. The paper's table 53 shows a small gap between predicted (1/3, 2/3, 1/2, 1/2, 1/5, 4/5) and observed; this gap is empirical noise around the discrete prediction (the paper itself notes "as the quantitative predictions of our model are discrete probabilities (1/2, 1/3, 1/5 etc.) it would be difficult to get any closer", page 23).
Same closed form as @cite{zuraw-2010}, @cite{coetzee-pater-2011} #
Anttila's Finnish variation, Zuraw's Tagalog factorial typology, and
Coetzee & Pater's English t/d-deletion all reduce to the same
substrate predictor pocPredict (discrete n) with binary candidate
spaces — variant probability = |Y ∩ D| / |D| (where D distinguishes
and Y favors the chosen variant). That one predictor serves three
phonological domains supports the abstraction; see
Phenomena/Phonology/Studies/Zuraw2010.lean and
Phenomena/Phonology/Studies/CoetzeePater2011.lean for sister
consumers.
§ 0: Variant type — strong vs weak genitive plural #
The two genitive-plural variants: strong (heavy penult, final syllable onset /t/ or /d/) vs weak (light penult, onset /j/ or absent). See @cite{anttila-1997} ex. (1) page 3.
Equations
- Anttila1997.instDecidableEqVariant x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Equations
- Anttila1997.instReprVariant = { reprPrec := Anttila1997.instReprVariant.repr }
Equations
- Anttila1997.instReprVariant.repr Anttila1997.Variant.strong prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Anttila1997.Variant.strong")).group prec✝
- Anttila1997.instReprVariant.repr Anttila1997.Variant.weak prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Anttila1997.Variant.weak")).group prec✝
Equations
- Anttila1997.instFintypeVariant = { elems := { val := ↑Anttila1997.Variant.enumList, nodup := Anttila1997.Variant.enumList_nodup }, complete := Anttila1997.instFintypeVariant._proof_1 }
Set-3 candidate set per (trivial, single-motif) input.
Equations
- Anttila1997.m3Cands x✝ = Finset.univ
Set-3 violation profile for motif 3ab (L.TÍI ∼ L.TI). Constraint
indexing matches @cite{anttila-1997} eq. (50): *H/I = 0, *Í = 1,
*L.L = 2. Strong (L.TÍI) violates *H/I and *Í; weak (L.TI)
violates *L.L.
Equations
- Anttila1997.m3Vp PUnit.unit Anttila1997.Variant.strong ⟨0, isLt⟩ = 1
- Anttila1997.m3Vp PUnit.unit Anttila1997.Variant.strong ⟨1, isLt⟩ = 1
- Anttila1997.m3Vp PUnit.unit Anttila1997.Variant.weak ⟨2, isLt⟩ = 1
- Anttila1997.m3Vp x✝² x✝¹ x✝ = 0
Constraints in Set 3 that distinguish strong from weak for motif 3ab.
Equations
- Anttila1997.relevant_3 = {i : Fin 3 | Anttila1997.m3Vp () Anttila1997.Variant.strong i ≠ Anttila1997.m3Vp () Anttila1997.Variant.weak i}
Constraints in Set 3 that favor strong for motif 3ab.
Equations
- Anttila1997.yesFav_3_strong = {i : Fin 3 | Anttila1997.m3Vp () Anttila1997.Variant.strong i < Anttila1997.m3Vp () Anttila1997.Variant.weak i}
Constraints in Set 3 that favor weak for motif 3ab.
Equations
- Anttila1997.yesFav_3_weak = {i : Fin 3 | Anttila1997.m3Vp () Anttila1997.Variant.weak i < Anttila1997.m3Vp () Anttila1997.Variant.strong i}
Variant probability via POC sampling under the discrete partial order.
Strong L.TÍI wins 1/3 of Set-3 rankings. Closed form via
picksAt_rate_eq: |{2} ∩ {0,1,2}| / |{0,1,2}| = 1/3.
Weak L.TI wins 2/3 of Set-3 rankings. Compare the observed
frequency of 63.1% for naa.pu.ri.en in @cite{anttila-1997}
(table 53, row 3b).
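The sets entering the closed form for motif 3ab can be read off the violation profile mechanically. Below is a minimal sketch in core Lean 4 that uses plain lists in place of Finset. The names relevant, favors, strong3 and weak3 are hypothetical stand-ins for illustration and are not the definitions relevant_3 and yesFav_3_strong themselves.

```lean
namespace ClosedFormSketch

-- Motif 3ab profiles, indexed *H/I = 0, *Í = 1, *L.L = 2.
def strong3 : Nat → Nat
  | 0 => 1
  | 1 => 1
  | _ => 0

def weak3 : Nat → Nat
  | 2 => 1
  | _ => 0

/-- Constraints on which the two variants differ (the set D). -/
def relevant (a b : Nat → Nat) (cs : List Nat) : List Nat :=
  cs.filter (fun c => a c != b c)

/-- Constraints assigning `a` strictly fewer violations than `b` (the set Y). -/
def favors (a b : Nat → Nat) (cs : List Nat) : List Nat :=
  cs.filter (fun c => decide (a c < b c))

#eval relevant strong3 weak3 [0, 1, 2]   -- [0, 1, 2]
#eval favors strong3 weak3 [0, 1, 2]     -- [2]
#eval favors weak3 strong3 [0, 1, 2]     -- [0, 1]

end ClosedFormSketch
```

With |D| = 3, the strong variant's rate is 1/3 (one favoring constraint) and the weak variant's rate is 2/3 (two favoring constraints).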
The two motifs decided by Set 4: 4ab (H.TÁA ∼ H.TA) and 5ab
(H.TÓO ∼ H.TO). They share the same six-constraint stratum but
have different violation profiles.
Equations
- Anttila1997.instDecidableEqSet4Motif x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Equations
- Anttila1997.instReprSet4Motif = { reprPrec := Anttila1997.instReprSet4Motif.repr }
Equations
- Anttila1997.instFintypeSet4Motif = { elems := { val := ↑Anttila1997.Set4Motif.enumList, nodup := Anttila1997.Set4Motif.enumList_nodup }, complete := Anttila1997.instFintypeSet4Motif._proof_1 }
Set-4 violation profile for motifs 4ab and 5ab. Constraint indexing
matches @cite{anttila-1997} eq. (50): *H/O = 0, *Ó = 1,
*L/A = 2, *H.H = 3, *X.X = 4, *H́ = 5.
Motif 4ab (H.TÁA ∼ H.TA): strong violates *H.H, *H́; weak
violates *L/A, *X.X (per @cite{anttila-1997} table 52).
Motif 5ab (H.TÓO ∼ H.TO): strong violates *H/O, *Ó, *H.H, *H́;
weak violates only *X.X.
Equations
- Anttila1997.m45Vp Anttila1997.Set4Motif.four Anttila1997.Variant.strong ⟨3, isLt⟩ = 1
- Anttila1997.m45Vp Anttila1997.Set4Motif.four Anttila1997.Variant.strong ⟨5, isLt⟩ = 1
- Anttila1997.m45Vp Anttila1997.Set4Motif.four Anttila1997.Variant.weak ⟨2, isLt⟩ = 1
- Anttila1997.m45Vp Anttila1997.Set4Motif.four Anttila1997.Variant.weak ⟨4, isLt⟩ = 1
- Anttila1997.m45Vp Anttila1997.Set4Motif.five Anttila1997.Variant.strong ⟨0, isLt⟩ = 1
- Anttila1997.m45Vp Anttila1997.Set4Motif.five Anttila1997.Variant.strong ⟨1, isLt⟩ = 1
- Anttila1997.m45Vp Anttila1997.Set4Motif.five Anttila1997.Variant.strong ⟨3, isLt⟩ = 1
- Anttila1997.m45Vp Anttila1997.Set4Motif.five Anttila1997.Variant.strong ⟨5, isLt⟩ = 1
- Anttila1997.m45Vp Anttila1997.Set4Motif.five Anttila1997.Variant.weak ⟨4, isLt⟩ = 1
- Anttila1997.m45Vp x✝² x✝¹ x✝ = 0
Set-4 distinguishing-constraint set for motif m.
Equations
- Anttila1997.relevant_45 m = {i : Fin 6 | Anttila1997.m45Vp m Anttila1997.Variant.strong i ≠ Anttila1997.m45Vp m Anttila1997.Variant.weak i}
Set-4 strong-favoring constraint set for motif m.
Equations
- Anttila1997.yesFav_45_strong m = {i : Fin 6 | Anttila1997.m45Vp m Anttila1997.Variant.strong i < Anttila1997.m45Vp m Anttila1997.Variant.weak i}
Set-4 weak-favoring constraint set for motif m.
Equations
- Anttila1997.yesFav_45_weak m = {i : Fin 6 | Anttila1997.m45Vp m Anttila1997.Variant.weak i < Anttila1997.m45Vp m Anttila1997.Variant.strong i}
Variant probability via POC sampling under the discrete partial order.
Motif 4ab strong H.TÁA wins 1/2 of Set-4 rankings. Closed form
via picksAt_rate_eq: |{2,4} ∩ {2,3,4,5}| / |{2,3,4,5}| = 2/4 = 1/2.
Motif 4ab weak H.TA wins 1/2 of Set-4 rankings. Compare the
observed 49.5% in @cite{anttila-1997} (table 53, row 4b).
Motif 5ab strong H.TÓO wins 1/5 of Set-4 rankings. Closed form:
|{4} ∩ {0,1,3,4,5}| / |{0,1,3,4,5}| = 1/5.
Motif 5ab weak H.TO wins 4/5 of Set-4 rankings. Compare the
observed 82.2% in @cite{anttila-1997} (table 53, row 5b).
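The Set-4 rates can be cross-checked against an explicit enumeration of all 6! = 720 orders of the stratum. This is a sketch in core Lean 4 under the violation profiles stated above; insertEverywhere, perms, beats and the four profile functions are hypothetical helpers for illustration, not the substrate's pocPredict.

```lean
namespace Set4Sketch

/-- Insert `x` at every position of a list (illustrative helper). -/
def insertEverywhere (x : Nat) : List Nat → List (List Nat)
  | [] => [[x]]
  | y :: ys => (x :: y :: ys) :: (insertEverywhere x ys).map (fun zs => y :: zs)

/-- All total orders of a list of constraint indices (illustrative helper). -/
def perms : List Nat → List (List Nat)
  | [] => [[]]
  | x :: xs => (perms xs).foldr (fun p acc => insertEverywhere x p ++ acc) []

/-- `a` beats `b` when the first constraint in the ranking on which
the two profiles differ assigns `a` fewer violations. -/
def beats (a b : Nat → Nat) : List Nat → Bool
  | [] => false
  | c :: cs =>
    if a c < b c then true
    else if b c < a c then false
    else beats a b cs

-- Indices: *H/O = 0, *Ó = 1, *L/A = 2, *H.H = 3, *X.X = 4, *H́ = 5.
def strong4 : Nat → Nat
  | 3 => 1
  | 5 => 1
  | _ => 0

def weak4 : Nat → Nat
  | 2 => 1
  | 4 => 1
  | _ => 0

def strong5 : Nat → Nat
  | 0 => 1
  | 1 => 1
  | 3 => 1
  | 5 => 1
  | _ => 0

def weak5 : Nat → Nat
  | 4 => 1
  | _ => 0

def set4 : List Nat := [0, 1, 2, 3, 4, 5]

#eval (perms set4).length                                  -- 720
#eval ((perms set4).filter (beats strong4 weak4)).length   -- 360
#eval ((perms set4).filter (beats weak4 strong4)).length   -- 360
#eval ((perms set4).filter (beats strong5 weak5)).length   -- 144
#eval ((perms set4).filter (beats weak5 strong5)).length   -- 576

end Set4Sketch
```

The counts correspond to 360/720 = 1/2 for each 4ab variant, and to 144/720 = 1/5 and 576/720 = 4/5 for 5ab. In each motif the two counts add up to 720, so the two outcomes exhaust the rankings.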
All six variation rate predictions from @cite{anttila-1997} table 52 derived in closed form from the POC substrate.
The two binary outcomes for motif 3ab partition the probability mass (sum to 1). Direct corollary of the rate equalities.
The two binary outcomes for motif 4ab partition the probability mass.
The two binary outcomes for motif 5ab partition the probability mass.