
Linglib.Phenomena.Phonology.Studies.Anttila1997

@cite{anttila-1997}: Deriving Variation from Grammar #

Formalizes the quantitative variation predictions for Finnish genitive plurals from @cite{anttila-1997}. Anttila's claim: free variation in Finnish, and crucially its statistical biases, is derivable from a single partially ranked OT grammar. Each variant's probability equals the fraction of the total rankings consistent with the partial ranking under which that variant wins.
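Stated as a formula, the claim can be sketched as follows. The symbols R (the set of total rankings consistent with the partial ranking) and P(v) (the predicted probability of variant v) are notation introduced here for illustration; they are not taken from the paper or from linglib.

```latex
P(v) \;=\; \frac{\bigl|\{\, r \in R : v \text{ is optimal under } r \,\}\bigr|}{|R|}
```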

The grammar #

Anttila stratifies 16 constraints into 5 mutually-ranked strata, with internal random ordering within each stratum (@cite{anttila-1997} eq. (49)–(50), page 21):

Set 1 ≫ Set 2 ≫ Set 3 ≫ Set 4 ≫ Set 5

Substrate consumption #

This file routes through the project's POC (Partially Ordered Constraints) substrate. For each motif, a violation-profile function vp : Input → Variant → Fin n → ℕ determines two constraint sets: relevant (D, the constraints on which vp differs between the two variants) and yesFav (Y, the constraints on which vp favors the chosen variant). pocPredict over discrete n (uniform sampling over all n! total orders) gives the variant probability; picksAt_rate_eq reduces pocPredict to the closed form |Y ∩ D| / |D|, so the n! rankings never need to be enumerated.
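The agreement between enumeration and the closed form can be checked by brute force on a small stratum. The following is a minimal sketch in core Lean 4 (no Mathlib assumed); insertEverywhere, rankings, and winCount are hypothetical helpers written for this illustration and are not linglib's pocPredict or picksAt_rate_eq. Here D is the set of constraints that distinguish the two variants and Y is the subset favoring the chosen variant; with two candidates, the highest-ranked member of D decides the winner.

```lean
namespace RankingSketch

/-- Hypothetical helper: every way of inserting `x` into a list. -/
def insertEverywhere (x : Nat) : List Nat → List (List Nat)
  | [] => [[x]]
  | y :: ys => (x :: y :: ys) :: (insertEverywhere x ys).map (fun zs => y :: zs)

/-- Hypothetical helper: all total orders of a list of constraint indices. -/
def rankings : List Nat → List (List Nat)
  | [] => [[]]
  | x :: xs => (rankings xs).foldr (fun r acc => insertEverywhere x r ++ acc) []

/-- Number of rankings of `0, …, n-1` whose highest-ranked member of `d`
lies in `y`, i.e. the rankings under which the chosen variant wins. -/
def winCount (n : Nat) (d y : List Nat) : Nat :=
  ((rankings (List.range n)).filter (fun r =>
    match r.find? (fun c => d.contains c) with
    | some c => y.contains c
    | none => false)).length

-- Set 3, strong variant: D = {0, 1, 2}, Y = {2}.
#eval (winCount 3 [0, 1, 2] [2], (rankings (List.range 3)).length)
-- (2, 6): 2 of the 6 rankings, i.e. 1/3 = |Y ∩ D| / |D|

end RankingSketch
```

Enumeration is only feasible here because n is tiny; the closed form avoids the factorial blow-up entirely.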

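Two POC instances, one per stratum: Set 3 (three constraints, Fin 3), which decides motif 3ab, and Set 4 (six constraints, Fin 6), which decides motifs 4ab and 5ab.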

Note on candidate-feature substrate #

We stipulate violation profiles via vp rather than defining NamedConstraint instances. This matches Anttila's own level of abstraction: the paper works directly with violation profiles (@cite{anttila-1997} page 22: "knowing that the weak variant violates one constraint (*L.L) while the strong variant violates two (*H/I, *Í) gives us the result directly"). True NamedConstraint formalisations would require a Finnish syllable substrate (input forms with stress / weight / sonority features feeding into syllable structure) which doesn't yet exist in linglib.

Predictions formalized #

From @cite{anttila-1997} table 52 (page 22) and table 53 (page 23):

Out of scope #

Same closed form as @cite{zuraw-2010}, @cite{coetzee-pater-2011} #

Anttila's Finnish variation, Zuraw's Tagalog factorial typology, and Coetzee & Pater's English t/d-deletion all reduce to the same substrate predictor pocPredict (discrete n) with binary candidate spaces — variant probability = |Y ∩ D| / |D| (where D distinguishes and Y favors the chosen variant). The reusability across three phonological domains validates the abstraction; see Phenomena/Phonology/Studies/Zuraw2010.lean and Phenomena/Phonology/Studies/CoetzeePater2011.lean for sister consumers.

§ 0: Variant type — strong vs weak genitive plural #

The two genitive-plural variants: strong (heavy penult, final syllable onset /t/ or /d/) vs weak (light penult, onset /j/ or absent). See @cite{anttila-1997} ex. (1) page 3.

    def Anttila1997.instReprVariant.repr :
    Variant → ℕ → Std.Format

      The two variants are distinct.

      def Anttila1997.m3Cands :
      Unit → Finset Variant

      Set-3 candidate set per (trivial, single-motif) input.

        def Anttila1997.m3Vp :
        Unit → Variant → Fin 3 → ℕ

        Set-3 violation profile for motif 3ab (L.TÍI ~ L.TI). Constraint indexing matches @cite{anttila-1997} eq. (50): *H/I = 0, *Í = 1, *L.L = 2. Strong (L.TÍI) violates *H/I and *Í; weak (L.TI) violates *L.L.
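How the distinguishing and favoring sets fall out of this profile can be sketched in core Lean 4. This is an illustration only: vp3, distinguishing, and favoring are hypothetical names, constraints are indexed by plain Nat rather than Fin 3, and each listed violation is assumed to count exactly once.

```lean
namespace Set3Sketch

inductive Variant where
  | strong
  | weak

/-- Assumed violation counts for motif 3ab: `*H/I = 0`, `*Í = 1`, `*L.L = 2`. -/
def vp3 : Variant → Nat → Nat
  | .strong, 0 => 1
  | .strong, 1 => 1
  | .weak, 2 => 1
  | _, _ => 0

/-- Constraints on which the two variants differ. -/
def distinguishing (n : Nat) (vp : Variant → Nat → Nat) : List Nat :=
  (List.range n).filter (fun c => vp .strong c != vp .weak c)

/-- Constraints on which `v` incurs fewer violations than `other`. -/
def favoring (n : Nat) (vp : Variant → Nat → Nat) (v other : Variant) : List Nat :=
  (List.range n).filter (fun c => decide (vp v c < vp other c))

#eval distinguishing 3 vp3            -- [0, 1, 2]
#eval favoring 3 vp3 .strong .weak    -- [2]
#eval favoring 3 vp3 .weak .strong    -- [0, 1]

end Set3Sketch
```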

          def Anttila1997.relevant_3 :
          Finset (Fin 3)

          Constraints in Set 3 that distinguish strong from weak for motif 3ab.

            def Anttila1997.yesFav_3_strong :
            Finset (Fin 3)

            Constraints in Set 3 that favor strong for motif 3ab.

              def Anttila1997.yesFav_3_weak :
              Finset (Fin 3)

              Constraints in Set 3 that favor weak for motif 3ab.

                @[simp]

                Strong L.TÍI wins 1/3 of Set-3 rankings. Closed form via picksAt_rate_eq: |{2} ∩ {0,1,2}| / |{0,1,2}| = 1/3.

                Weak L.TI wins 2/3 of Set-3 rankings. Closed form: |{0,1} ∩ {0,1,2}| / |{0,1,2}| = 2/3. Approximates @cite{anttila-1997}'s observed frequency of 63.1% for naa.pu.ri.en (table 53, row 3b).

                The two motifs decided by Set 4: 4ab (H.TÁA ~ H.TA) and 5ab (H.TÓO ~ H.TO). They share the same six-constraint stratum but have different violation profiles.

                  def Anttila1997.instReprSet4Motif.repr :
                  Set4Motif → ℕ → Std.Format

                    Set-4 candidate set per motif.

                      def Anttila1997.m45Vp :
                      Set4Motif → Variant → Fin 6 → ℕ

                      Set-4 violation profile for motifs 4ab and 5ab. Constraint indexing matches @cite{anttila-1997} eq. (50): *H/O = 0, *Ó = 1, *L/A = 2, *H.H = 3, *X.X = 4, *H́ = 5.

                      Motif 4ab (H.TÁA ~ H.TA): strong violates *H.H, *H́; weak violates *L/A, *X.X (per @cite{anttila-1997} table 52). Motif 5ab (H.TÓO ~ H.TO): strong violates *H/O, *Ó, *H.H, *H́; weak violates only *X.X.
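The same derivation for the six-constraint stratum can be sketched in core Lean 4. Again this is only an illustration: Motif, violated, distinguishing, and favoring are hypothetical names, each profile is encoded as the list of violated constraint indices, and every violation is assumed to count once.

```lean
namespace Set4Sketch

inductive Variant where
  | strong
  | weak

inductive Motif where
  | m4ab
  | m5ab

/-- `*H/O = 0`, `*Ó = 1`, `*L/A = 2`, `*H.H = 3`, `*X.X = 4`, `*H́ = 5`. -/
def violated : Motif → Variant → List Nat
  | .m4ab, .strong => [3, 5]
  | .m4ab, .weak   => [2, 4]
  | .m5ab, .strong => [0, 1, 3, 5]
  | .m5ab, .weak   => [4]

/-- Constraints violated by exactly one of the two variants. -/
def distinguishing (m : Motif) : List Nat :=
  (List.range 6).filter (fun c =>
    (violated m .strong).contains c != (violated m .weak).contains c)

/-- Constraints favoring `v`: violated by `other` but not by `v`. -/
def favoring (m : Motif) (v other : Variant) : List Nat :=
  (List.range 6).filter (fun c =>
    (violated m other).contains c && not ((violated m v).contains c))

#eval (favoring .m4ab .strong .weak, distinguishing .m4ab)  -- ([2, 4], [2, 3, 4, 5])
#eval (favoring .m5ab .strong .weak, distinguishing .m5ab)  -- ([4], [0, 1, 3, 4, 5])

end Set4Sketch
```

The list lengths give the strong-variant rates directly: 2/4 for motif 4ab and 1/5 for motif 5ab.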

                        def Anttila1997.relevant_45 (m : Set4Motif) :
                        Finset (Fin 6)

                        Set-4 distinguishing-constraint set for motif m.

                          def Anttila1997.yesFav_45_strong (m : Set4Motif) :
                          Finset (Fin 6)

                          Set-4 strong-favoring constraint set for motif m.

                            def Anttila1997.yesFav_45_weak (m : Set4Motif) :
                            Finset (Fin 6)

                            Set-4 weak-favoring constraint set for motif m.


                              Motif 4ab strong H.TÁA wins 1/2 of Set-4 rankings. Closed form via picksAt_rate_eq: |{2,4} ∩ {2,3,4,5}| / |{2,3,4,5}| = 2/4 = 1/2.

                              Motif 4ab weak H.TA wins 1/2 of Set-4 rankings. Closed form: |{3,5} ∩ {2,3,4,5}| / |{2,3,4,5}| = 2/4 = 1/2. Approximates @cite{anttila-1997}'s observed 49.5% (table 53, row 4b).

                              Motif 5ab strong H.TÓO wins 1/5 of Set-4 rankings. Closed form: |{4} ∩ {0,1,3,4,5}| / |{0,1,3,4,5}| = 1/5.

                              Motif 5ab weak H.TO wins 4/5 of Set-4 rankings. Closed form: |{0,1,3,5} ∩ {0,1,3,4,5}| / |{0,1,3,4,5}| = 4/5. Approximates @cite{anttila-1997}'s observed 82.2% (table 53, row 5b).
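For reference, the three weak-variant predictions can be set against the observed frequencies quoted above. The difference column is simple arithmetic on those figures and is not reported in this form in the source.

```latex
\begin{array}{lccc}
\text{motif} & \text{predicted (weak)} & \text{observed (weak)} & |\text{difference}| \\
\hline
\text{3ab} & 2/3 \approx 0.667 & 0.631 & 0.036 \\
\text{4ab} & 1/2 = 0.500 & 0.495 & 0.005 \\
\text{5ab} & 4/5 = 0.800 & 0.822 & 0.022
\end{array}
```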

                              All six variation rate predictions from @cite{anttila-1997} table 52 derived in closed form from the POC substrate.

                              The two binary outcomes for motif 3ab partition the probability mass (sum to 1). Direct corollary of the rate equalities.

                              The two binary outcomes for motif 4ab partition the probability mass.

                              The two binary outcomes for motif 5ab partition the probability mass.
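A sketch of why the two rates must sum to 1, assuming a two-candidate space and a nonempty distinguishing set D. With only two variants, every constraint in D favors exactly one of them, so the strong-favoring set Y_s and the weak-favoring set Y_w partition D. The symbols Y_s and Y_w are notation introduced here; the Lean proofs themselves may proceed differently.

```latex
\begin{aligned}
& Y_s \cap Y_w = \varnothing, \qquad Y_s \cup Y_w = D, \qquad D \neq \varnothing \\
& \Longrightarrow\quad P(\text{strong}) + P(\text{weak})
  = \frac{|Y_s|}{|D|} + \frac{|Y_w|}{|D|}
  = \frac{|Y_s| + |Y_w|}{|D|} = \frac{|D|}{|D|} = 1 \\[4pt]
& \text{3ab: } \tfrac13 + \tfrac23 = 1, \qquad
  \text{4ab: } \tfrac12 + \tfrac12 = 1, \qquad
  \text{5ab: } \tfrac15 + \tfrac45 = 1
\end{aligned}
```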