Linglib.Phenomena.Phonology.Studies.CoetzeePater2011

Coetzee & Pater (2011): The Place of Variation in Phonological Theory #

@cite{coetzee-pater-2011}

Handbook of Phonological Theory chapter comparing three frameworks for modeling phonological variation, illustrated with English t/d-deletion.

Models formalized #

  1. Partially Ordered Constraints (POC) (@cite{anttila-1997}, @cite{kiparsky-1993b}): grammar is a partial order on OT constraints. Each evaluation randomly samples a total order consistent with the partial order. Probability of an output = fraction of total orders that select it as optimal.

  2. MaxEnt Harmonic Grammar (@cite{goldwater-johnson-2003}): constraints have numerical weights; candidate probability ∝ exp(harmony score). More expressive than POC — can encode arbitrary probability distributions over outputs.

  3. Bridge: the OT limit theorem (maxent_ot_limit) shows that as α → ∞, MaxEnt recovers OT's categorical optimization, connecting the two frameworks.
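As a rough illustration of items 2 and 3, the sketch below computes MaxEnt probabilities as a softmax over harmony scores and shows them sharpening toward a categorical winner as a scale factor grows. It assumes that α multiplies the harmony before exponentiation; the function name `maxent_probs` and the two harmony values are hypothetical and are not taken from the Lean development.

```python
import math

def maxent_probs(harmonies, alpha=1.0):
    """Softmax over harmony scores, each scaled by alpha (assumed role of α)."""
    scaled = [alpha * h for h in harmonies]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical harmonies for two competing candidates (illustrative values only).
harmonies = [-1.0, -1.5]

for alpha in (1, 10, 100):
    # As alpha grows, the higher-harmony candidate absorbs nearly all the mass.
    print(alpha, maxent_probs(harmonies, alpha))
```

With a small α both candidates keep noticeable probability; with a large α the distribution approaches the single optimum that a categorical OT evaluation would pick.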

Key results #

Output form for t/d-deletion: either retain or delete.

      A candidate pairs a phonological context with an output form.

        def CoetzeePater2011.instDecidableEqTDCandidate.decEq (x✝ x✝¹ : TDCandidate) :
        Decidable (x✝ = x✝¹)

            Candidates for a given context.

              *CT (markedness): penalizes word-final consonant clusters ending in a coronal stop. Violated by the faithful (retaining) candidate. @cite{coetzee-pater-2011} example (11).

                MAX (faithfulness): penalizes deletion of an input consonant. Violated by the deleting candidate in all contexts. @cite{coetzee-pater-2011} example (11).

                  MAX-PRE-V (contextual faithfulness): penalizes deletion specifically in pre-vocalic position, where perceptual cues for t/d are maximal. @cite{coetzee-pater-2011} example (11).

                    MAX-FINAL (contextual faithfulness): penalizes deletion in phrase-final position, where consonantal release provides cues. @cite{coetzee-pater-2011} example (11).

                      All violations are bounded by 1 (binary constraints).

                      Violation profile for a candidate under the 4 constraints. Order: [*CT, MAX, MAX-PRE-V, MAX-FINAL].

                        Retain always violates *CT once and nothing else.

                        Delete in pre-C violates MAX only (no contextual faithfulness active).

                        Delete in pre-V violates MAX and MAX-PRE-V.

                        Delete pre-pausally violates MAX and MAX-FINAL.
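The four profile statements above can be tabulated in a few lines. This is a minimal standalone sketch, not the Lean definition: the function name `violations` and the string labels for contexts and outputs are hypothetical, and the tuple order follows the [*CT, MAX, MAX-PRE-V, MAX-FINAL] order given in the text.

```python
CONSTRAINTS = ("*CT", "MAX", "MAX-PRE-V", "MAX-FINAL")
CONTEXTS = ("preV", "pause", "preC")
OUTPUTS = ("retain", "delete")

def violations(context, output):
    """Violation profile in the order [*CT, MAX, MAX-PRE-V, MAX-FINAL]."""
    if output == "retain":
        # Retention keeps the final cluster, so it violates *CT only.
        return (1, 0, 0, 0)
    # Deletion always violates MAX, plus the positional constraint if active.
    return (0, 1, int(context == "preV"), int(context == "pause"))

for ctx in CONTEXTS:
    for out in OUTPUTS:
        print(ctx, out, dict(zip(CONSTRAINTS, violations(ctx, out))))
```

Every entry is 0 or 1, which is the binary-constraint bound stated above.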

                        Candidate set per context for POC: both retain and delete are available in every context. Required by PartialOrderConstraints.pocPredict.

                          Violation profile in POC's Input → Output → Fin n → ℕ shape. Constraint indexing matches constraints: 0 = *CT, 1 = MAX, 2 = MAX-PRE-V, 3 = MAX-FINAL.

                            @cite{coetzee-pater-2011} §3.2 explicitly adopts the POC framework: "the grammar states a partial, rather than a total, order on the constraint set. Each time the grammar is used to evaluate a candidate set, one of the total orders consistent with the partial order is randomly chosen." Equation (9): "If a candidate is selected as optimal in n of these rankings, then this candidate's probability of occurrence is n/t."

                            We formalize the §3.2 t/d-deletion analysis (table 10) using POC's
                            `pocPredict`, with deletion probabilities derived in closed form via
                            the `picksAt_binary_iff_permDList_head_lt` bridge + the substrate's
                            `perm_filter_head_in_card`.
                            
                            The discrete partial order (`PartialOrderConstraints.discrete 4`)
                            encodes "no rankings imposed" — uniform sampling over all 4! = 24
                            total orders. 
                            

                            Probability that POC sampling selects deletion at context ctx, under the discrete partial order.

                              Pre-vocalic deletion probability: 8/24 = 1/3. Closed form via picksAt_rate_eq: count / 4! = |Y ∩ D| / |D| with D = {*CT, MAX, MAX-PRE-V} (3 distinguishing) and Y ∩ D = {*CT} (only *CT favors delete) → 1/3.

                              Phrase-final deletion probability: 8/24 = 1/3. Same closed form as preV with D = {*CT, MAX, MAX-FINAL}, Y ∩ D = {*CT}.

                              Pre-consonantal deletion probability: 12/24 = 1/2 (the highest, since no positional faithfulness applies). D = {*CT, MAX}, Y ∩ D = {*CT} → 1/2.

                              The cross-dialectal generalization: pre-C deletion rate exceeds pre-V and pre-pause rates. Direct consequence of the closed-form arithmetic — no enumeration.
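The three rates can be cross-checked by brute force. The sketch below enumerates all 24 total orders and counts those in which deletion wins; this is deliberately the enumeration route, not the closed-form argument used in the Lean proofs. The names `violations`, `picks_delete`, and `deletion_prob` are hypothetical, and constraints are indexed 0 = *CT, 1 = MAX, 2 = MAX-PRE-V, 3 = MAX-FINAL.

```python
from fractions import Fraction
from itertools import permutations

def violations(context, output):
    """Violation profile in the order [*CT, MAX, MAX-PRE-V, MAX-FINAL]."""
    if output == "retain":
        return (1, 0, 0, 0)
    return (0, 1, int(context == "preV"), int(context == "pause"))

def picks_delete(ranking, context):
    """True if 'delete' is optimal under the ranking (highest-ranked first)."""
    retain = violations(context, "retain")
    delete = violations(context, "delete")
    for c in ranking:
        if retain[c] != delete[c]:
            # The highest-ranked constraint that distinguishes the two decides.
            return delete[c] < retain[c]
    return False  # not reached here: the two profiles always differ on *CT

def deletion_prob(context):
    """Fraction of the 4! total orders under which deletion is optimal."""
    rankings = list(permutations(range(4)))
    wins = sum(picks_delete(rk, context) for rk in rankings)
    return Fraction(wins, len(rankings))

for ctx in ("preV", "pause", "preC"):
    print(ctx, deletion_prob(ctx))
```

The enumeration agrees with the |Y ∩ D| / |D| values quoted above because, under uniform sampling, each distinguishing constraint is equally likely to be the highest-ranked one.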

                              The factorial typology over Equiv.Perm (Fin 4) rankings, using POC's PicksAt for the per-context deletion check. Per-σ pattern is (PicksAt σ .preV .delete, PicksAt σ .pause .delete, PicksAt σ .preC .delete).

                              These are per-σ patterns (not head-in-Y predicates), so the substrate
                              doesn't apply — we use plain `decide` on the 24-perm `Equiv.Perm (Fin 4)`. 
                              
                              def CoetzeePater2011.deletionPattern (σ : Equiv.Perm (Fin 4)) :
                              Bool × Bool × Bool

                              The deletion pattern (preV?, pause?, preC?) produced by ranking σ.

                                def CoetzeePater2011.factorialTypes :
                                Finset (Bool × Bool × Bool)

                                Distinct categorical dialect types over all 4! = 24 total orders.

                                  The factorial typology has exactly 5 distinct language types, matching the 5 crucial ranking classes in table (12). Each pattern is (preV?, pause?, preC?):

                                  a. (F,F,F): MAX >> *CT, no deletion
                                  b. (F,T,T): MAX-PRE-V >> *CT >> {MAX, MAX-FINAL}
                                  c. (T,F,T): MAX-FINAL >> *CT >> {MAX, MAX-PRE-V}
                                  d. (F,F,T): {MAX-PRE-V, MAX-FINAL} >> *CT >> MAX
                                  e. (T,T,T): *CT >> {MAX, MAX-PRE-V, MAX-FINAL}

                                  theorem CoetzeePater2011.type_a_count :
                                  {σ : Equiv.Perm (Fin 4) | deletionPattern σ = (false, false, false)}.card = 12

                                  Type a (F,F,F): 12 rankings — MAX >> *CT blocks all deletion.

                                  theorem CoetzeePater2011.type_b_count :
                                  {σ : Equiv.Perm (Fin 4) | deletionPattern σ = (false, true, true)}.card = 2

                                  Type b (F,T,T): 2 rankings.

                                  theorem CoetzeePater2011.type_c_count :
                                  {σ : Equiv.Perm (Fin 4) | deletionPattern σ = (true, false, true)}.card = 2

                                  Type c (T,F,T): 2 rankings.

                                  theorem CoetzeePater2011.type_d_count :
                                  {σ : Equiv.Perm (Fin 4) | deletionPattern σ = (false, false, true)}.card = 2

                                  Type d (F,F,T): 2 rankings.

                                  theorem CoetzeePater2011.type_e_count :
                                  {σ : Equiv.Perm (Fin 4) | deletionPattern σ = (true, true, true)}.card = 6

                                  Type e (T,T,T): 6 rankings — *CT >> {MAX, MAX-PRE-V, MAX-FINAL}.

                                  theorem CoetzeePater2011.no_preV_only :
                                  {σ : Equiv.Perm (Fin 4) | deletionPattern σ = (true, false, false)}.card = 0

                                  No ranking produces the (T,F,F) pattern — deletion only in pre-V without pre-C deletion is impossible.
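The type counts above can be reproduced with a short standalone enumeration over the 24 rankings. This is a sketch under the same violation profiles as described earlier, not a rendering of the Lean `decide` proofs; `violations`, `picks_delete`, and `deletion_pattern` are hypothetical Python names.

```python
from collections import Counter
from itertools import permutations

def violations(context, output):
    """Violation profile in the order [*CT, MAX, MAX-PRE-V, MAX-FINAL]."""
    if output == "retain":
        return (1, 0, 0, 0)
    return (0, 1, int(context == "preV"), int(context == "pause"))

def picks_delete(ranking, context):
    """True if 'delete' is optimal under the ranking (highest-ranked first)."""
    retain = violations(context, "retain")
    delete = violations(context, "delete")
    for c in ranking:
        if retain[c] != delete[c]:
            return delete[c] < retain[c]
    return False

def deletion_pattern(ranking):
    """(preV?, pause?, preC?) deletion pattern produced by one total order."""
    return tuple(picks_delete(ranking, ctx) for ctx in ("preV", "pause", "preC"))

# Tally how many of the 4! rankings yield each categorical pattern.
counts = Counter(deletion_pattern(rk) for rk in permutations(range(4)))
for pattern, n in sorted(counts.items()):
    print(pattern, n)
```

Patterns that never occur, such as (True, False, False), are simply absent from the tally, so looking them up in the `Counter` gives 0.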

                                  Every ranking that produces pre-V deletion also produces pre-C deletion. This is the structural reason POC cannot generate reversed rates: pre-V deletion requires *CT >> MAX ∧ *CT >> MAX-PRE-V, which entails *CT >> MAX, the sole condition for pre-C deletion.

                                  Similarly, every ranking producing pause deletion also produces pre-C deletion: *CT >> MAX ∧ *CT >> MAX-FINAL entails *CT >> MAX.

                                  No ranking produces pre-V or pause deletion without also producing pre-C deletion. The formal basis for the cross-dialectal generalization P(del|preC) ≥ P(del|preV). Direct corollary of the closed-form deletionProb_* theorems.

                                  Weighted version of the t/d-deletion constraints for MaxEnt. Weight parameterization enables dialect-specific fitting.

                                    theorem CoetzeePater2011.maxent_deletion_preferred (wCT wMax wMaxPreV wMaxFin : ℚ) (ctx : Fragments.English.TDDeletion.Context) (h : Core.Constraint.harmonyScore (mkWeightedConstraints wCT wMax wMaxPreV wMaxFin) { context := ctx, output := TDOutput.delete } > Core.Constraint.harmonyScore (mkWeightedConstraints wCT wMax wMaxPreV wMaxFin) { context := ctx, output := TDOutput.retain }) :
                                    Core.Constraint.harmonyDominates (mkWeightedConstraints wCT wMax wMaxPreV wMaxFin) { context := ctx, output := TDOutput.delete } { context := ctx, output := TDOutput.retain }

                                    MaxEnt harmony ordering is a decidable proxy for probability ordering: H(a) > H(b) ⟺ P(a) > P(b) by monotonicity of exp.

                                    With the AAVE weights from the ME-HG row of table (23), deletion probability ranks pre-C > pause > pre-V. The weights are exact ℚ transcriptions of the one-decimal-place values reported in the paper: *CT = 100.6, MAX = 99.4, MAX-PRE-V = 2.1, MAX-FINAL = 0.2.
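The ordering claim can be checked with exact rational arithmetic. The sketch below assumes harmony is the negated weighted sum of violations (the sign convention used in the witness arithmetic later on this page) and uses the four weights quoted above; `AAVE`, `violations`, and `harmony` are hypothetical Python names.

```python
from fractions import Fraction

# Weights in the order [*CT, MAX, MAX-PRE-V, MAX-FINAL], as quoted in the text.
AAVE = (Fraction("100.6"), Fraction("99.4"), Fraction("2.1"), Fraction("0.2"))

def violations(context, output):
    """Violation profile in the order [*CT, MAX, MAX-PRE-V, MAX-FINAL]."""
    if output == "retain":
        return (1, 0, 0, 0)
    return (0, 1, int(context == "preV"), int(context == "pause"))

def harmony(weights, context, output):
    """Assumed convention: harmony = -(weighted sum of violations)."""
    return -sum(w * v for w, v in zip(weights, violations(context, output)))

contexts = ("preC", "pause", "preV")
delete_h = {ctx: harmony(AAVE, ctx, "delete") for ctx in contexts}
retain_h = {ctx: harmony(AAVE, ctx, "retain") for ctx in contexts}

for ctx in contexts:
    print(ctx, "delete:", float(delete_h[ctx]), "retain:", float(retain_h[ctx]))
```

Because retention has the same harmony in every context, comparing the deletion harmonies across contexts is enough to order the deletion probabilities.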

                                    theorem CoetzeePater2011.nonneg_weights_preserve_ordering (wCT wMax wMaxPreV wMaxFin : ℚ) (hPreV : wMaxPreV ≥ 0) :

                                    With non-negative MAX-PRE-V weight, harmony of pre-C deletion ≥ pre-V deletion. Pre-V delete violates {MAX, MAX-PRE-V} while pre-C delete violates only {MAX}, so H(del|preC) - H(del|preV) = wMaxPreV ≥ 0.

                                    Banning negative weights thus makes MaxEnt respect the same typological restriction as POC (@cite{coetzee-pater-2011} §4.4).

                                    theorem CoetzeePater2011.nonneg_weights_preserve_ordering_pause (wCT wMax wMaxPreV wMaxFin : ℚ) (hFin : wMaxFin ≥ 0) :

                                    Analogously, non-negative MAX-FINAL weight ensures pre-C ≥ pause. H(del|preC) - H(del|pause) = wMaxFin ≥ 0.
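The two harmony gaps can be spot-checked numerically. The sketch below assumes harmony is the negated weighted violation sum and evaluates the gap for a few hypothetical weight vectors, including one with a negative MAX-PRE-V weight; the names `harmony` and `gap` and all sample weights are illustrative only.

```python
from fractions import Fraction

def violations(context, output):
    """Violation profile in the order [*CT, MAX, MAX-PRE-V, MAX-FINAL]."""
    if output == "retain":
        return (1, 0, 0, 0)
    return (0, 1, int(context == "preV"), int(context == "pause"))

def harmony(weights, context, output):
    """Assumed convention: harmony = -(weighted sum of violations)."""
    return -sum(w * v for w, v in zip(weights, violations(context, output)))

def gap(weights, context):
    """H(del | preC) - H(del | context)."""
    return harmony(weights, "preC", "delete") - harmony(weights, context, "delete")

# Hypothetical weight vectors [*CT, MAX, MAX-PRE-V, MAX-FINAL].
samples = [
    (Fraction(3), Fraction(2), Fraction(1, 2), Fraction(0)),
    (Fraction(1), Fraction(5), Fraction(7, 3), Fraction(4)),
    (Fraction(1), Fraction(1), Fraction(-1), Fraction(0)),  # negative MAX-PRE-V
]

for w in samples:
    # The gaps equal the positional weights, so their sign decides the ordering.
    print(w, gap(w, "preV"), gap(w, "pause"))
```

The last sample shows why the non-negativity hypothesis matters: with a negative MAX-PRE-V weight the gap turns negative and pre-V deletion overtakes pre-C deletion in harmony.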

                                    Tejano' is a hypothetical dialect with reversed pre-C/pre-V rates: lowest deletion in pre-consonantal position. Created by swapping Tejano's pre-V (25%) and pre-C (62%) rates. @cite{coetzee-pater-2011} §4.4.

                                      POC cannot generate Tejano': every ranking that produces pre-V deletion also produces pre-C deletion (§7), so P(del|preC) ≥ P(del|preV) for any POC grammar over these 4 constraints.

                                      theorem CoetzeePater2011.maxent_can_generate_tejanoPrime :
                                      ∃ (wCT : ℚ) (wMax : ℚ) (wMaxPreV : ℚ) (wMaxFin : ℚ), Core.Constraint.harmonyScore (mkWeightedConstraints wCT wMax wMaxPreV wMaxFin) { context := Fragments.English.TDDeletion.Context.preV, output := TDOutput.delete } > Core.Constraint.harmonyScore (mkWeightedConstraints wCT wMax wMaxPreV wMaxFin) { context := Fragments.English.TDDeletion.Context.preV, output := TDOutput.retain } ∧ Core.Constraint.harmonyScore (mkWeightedConstraints wCT wMax wMaxPreV wMaxFin) { context := Fragments.English.TDDeletion.Context.preC, output := TDOutput.retain } > Core.Constraint.harmonyScore (mkWeightedConstraints wCT wMax wMaxPreV wMaxFin) { context := Fragments.English.TDDeletion.Context.preC, output := TDOutput.delete } ∧ wMaxPreV < 0

                                      MaxEnt CAN generate Tejano' with negative MAX-PRE-V weight: when MAX-PRE-V has a negative weight, violating it helps the candidate, rewarding deletion in pre-vocalic position.

                                      Witness from @cite{coetzee-pater-2011} table (23) Tejano' ME-HG row: *CT = 99.4, MAX = 100.6, MAX-PRE-V = −1.6, MAX-FINAL = −0.8. These are the weights the paper's MaxEnt fitting procedure learned on Tejano' frequencies.

                                      Witness arithmetic:

                                      • Pre-V delete: H = −(100.6·1 + (−1.6)·1) = −99
                                      • Pre-V retain: H = −(99.4·1) = −99.4 → delete > retain ✓
                                      • Pre-C delete: H = −(100.6·1) = −100.6
                                      • Pre-C retain: H = −(99.4·1) = −99.4 → retain > delete ✓ (REVERSED)

                                      @cite{coetzee-pater-2011} §4.4, table (23)
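The witness arithmetic can be replayed exactly with rationals. The sketch below plugs the four Tejano' weights quoted above into the same negated-weighted-sum harmony; `TEJANO_PRIME`, `violations`, and `harmony` are hypothetical Python names rather than identifiers from the Lean file.

```python
from fractions import Fraction

# Tejano' weights in the order [*CT, MAX, MAX-PRE-V, MAX-FINAL], as quoted above.
TEJANO_PRIME = (Fraction("99.4"), Fraction("100.6"), Fraction("-1.6"), Fraction("-0.8"))

def violations(context, output):
    """Violation profile in the order [*CT, MAX, MAX-PRE-V, MAX-FINAL]."""
    if output == "retain":
        return (1, 0, 0, 0)
    return (0, 1, int(context == "preV"), int(context == "pause"))

def harmony(weights, context, output):
    """Harmony = -(weighted sum of violations), as in the witness arithmetic."""
    return -sum(w * v for w, v in zip(weights, violations(context, output)))

for ctx in ("preV", "preC"):
    h_del = harmony(TEJANO_PRIME, ctx, "delete")
    h_ret = harmony(TEJANO_PRIME, ctx, "retain")
    winner = "delete" if h_del > h_ret else "retain"
    print(ctx, float(h_del), float(h_ret), winner)
```

The negative MAX-PRE-V weight is what lifts pre-V deletion above retention while pre-C deletion stays below it.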

                                      Framework separation: POC/StOT and MaxEnt have different typological predictions. POC cannot generate all patterns that MaxEnt can.

                                      Left conjunct: POC always has pre-C ≥ pre-V (structural implication). Right conjunct: MaxEnt can achieve pre-V > pre-C (negative weights).

                                      When MAX >> *CT, the categorical OT prediction is retention (no deletion) in all contexts.

                                      theorem CoetzeePater2011.ct_dominates_implies_deletion (ctx : Fragments.English.TDDeletion.Context) :
                                      have ranking := [starCT, maxC, maxPreV, maxFinal]; have tab := Core.Constraint.OT.mkTableau (candidatesFor ctx) ranking ; tab.optimal = {{ context := ctx, output := TDOutput.delete }}

                                      When *CT >> all faithfulness, the categorical OT prediction is deletion in all contexts.
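Both categorical statements can be illustrated with a lexicographic comparison of violation profiles, which is one common way to express strict-domination evaluation. This is a standalone sketch, not the `mkTableau` implementation; the function `optimal` and the index encoding 0 = *CT, 1 = MAX, 2 = MAX-PRE-V, 3 = MAX-FINAL are assumptions made for the example.

```python
def violations(context, output):
    """Violation profile in the order [*CT, MAX, MAX-PRE-V, MAX-FINAL]."""
    if output == "retain":
        return (1, 0, 0, 0)
    return (0, 1, int(context == "preV"), int(context == "pause"))

def optimal(ranking, context):
    """Set of optimal outputs under a total ranking (highest-ranked first)."""
    # Reorder each profile by the ranking; tuple comparison is then lexicographic.
    profiles = {
        out: tuple(violations(context, out)[c] for c in ranking)
        for out in ("retain", "delete")
    }
    best = min(profiles.values())
    return {out for out, prof in profiles.items() if prof == best}

CT_TOP = (0, 1, 2, 3)   # *CT >> MAX >> MAX-PRE-V >> MAX-FINAL
MAX_TOP = (1, 0, 2, 3)  # MAX >> *CT >> MAX-PRE-V >> MAX-FINAL

for ctx in ("preV", "pause", "preC"):
    print(ctx, optimal(CT_TOP, ctx), optimal(MAX_TOP, ctx))
```

Ranking *CT on top makes the single *CT violation of the retaining candidate fatal in every context, while ranking MAX above *CT does the same to the deleting candidate.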

                                      At each context, the AAVE MaxEnt model is a per-context ConstraintSystem TDOutput: candidates = {retain, delete}, score = harmony of (ctx, ·), decoder = softmaxDecoder 1. The two-candidate softmax is the logistic function, so predict .delete is a genuine conditional probability P(delete | ctx).

                                      The AAVE constraint weights from table (23) ME-HG row.

                                        The AAVE MaxEnt model at a fixed context, packaged as a generic ConstraintSystem. With only two candidates, predict .delete is the conditional probability P(delete | ctx).

                                          In the pre-consonantal context, the AAVE system predicts deletion over retention. Conditional probability claim: P(delete | preC) > P(retain | preC).

                                          In the pre-vocalic context, the AAVE system predicts retention over deletion: pre-V deletion violates both MAX and MAX-PRE-V, costing more than the *CT violation incurred by retention. Conditional probability claim: P(retain | preV) > P(delete | preV).

                                          The per-context AAVE system is a probability distribution over TDOutput. Generic property of softmaxDecoder.
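The per-context claims can be illustrated numerically in floating point. The sketch below applies a two-candidate softmax with scale 1 to harmonies derived from the AAVE weights quoted earlier (retain = −100.6 everywhere, delete = −99.4 pre-C and −101.5 pre-V) and compares the result with the logistic function of the harmony difference. The names `softmax`, `logistic`, and `HARMONIES` are hypothetical and are not the Lean `softmaxDecoder`.

```python
import math

def softmax(scores):
    """Plain softmax (scale 1), shifted by the maximum for numerical stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# (retain, delete) harmonies computed from the AAVE weights quoted in the text.
HARMONIES = {
    "preC": (-100.6, -99.4),
    "preV": (-100.6, -101.5),
}

RESULTS = {}
for ctx, (h_retain, h_delete) in HARMONIES.items():
    p_retain, p_delete = softmax([h_retain, h_delete])
    RESULTS[ctx] = (p_retain, p_delete)
    # With two candidates, P(delete) equals logistic(H(delete) - H(retain)).
    print(ctx, round(p_retain, 3), round(p_delete, 3),
          round(logistic(h_delete - h_retain), 3))
```

The two probabilities in each context sum to 1, and only the harmony difference between the candidates matters, which is why the large absolute weights are harmless.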