Linglib.Studies.HayesWilson2008

[HW08]: A Maximum Entropy Model of Phonotactics

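[HW08] propose that phonotactic well-formedness is probability: a MaxEnt grammar assigns each surface form x a score h(x) = Σⱼ wⱼ · Cⱼ(x), where Cⱼ(x) is the number of violations of constraint j and wⱼ is its weight, and well-formedness is P(x) = exp(−h(x)) / Z, with Z the sum of exp(−h(y)) over all candidates y.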

Hayes & Wilson's "score" is the negation of harmonyScore: h(x) = −harmonyScore(x), so P(x) ∝ exp(harmonyScore(x)). Higher harmony = higher probability = better well-formedness. This is exactly softmax(harmonyScore, 1) on a finite candidate set.
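
The softmax reading can be sketched numerically. The block below is an illustration only: it uses core Lean's Float (the library itself works over ℝ), the names harmonies and softmaxSketch are hypothetical, and the three harmony values are illustrative inputs rather than output of the library.

```lean
-- Illustrative harmony scores for three candidates (higher = better).
def harmonies : List Float := [0.0, -5.64, -6.66]

-- Softmax at temperature 1: P(x) = exp(harmony x) / Z,
-- where Z sums exp(harmony y) over the whole candidate list.
def softmaxSketch (hs : List Float) : List Float :=
  let es := hs.map Float.exp
  let z := es.foldl (· + ·) 0.0
  es.map (· / z)

-- The resulting probabilities are ordered like the harmonies.
#eval softmaxSketch harmonies
```

Because exp is strictly increasing and Z is shared by all candidates, the ordering of the probabilities always matches the ordering of the harmony scores.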

Key contribution: ganging

The central empirical prediction distinguishing MaxEnt from OT is ganging: two individually weak constraints can jointly override a stronger one. This is impossible with OT's strict ranking, which corresponds to exponentially separated weights (OTLimit.lean).
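
The arithmetic behind ganging can be sketched with three of the Table (4) weights listed further down this page, scaled by 100 so everything stays in Int. The two candidates are hypothetical violation profiles chosen for illustration; nothing here claims that a particular English onset has them, and the names wCont, wVoice and wSon are not part of the library.

```lean
-- Table (4) weights × 100 (illustrative local names).
def wCont : Int := 517   -- *[ ][+continuant]
def wVoice : Int := 537  -- *[ ][+voice, −sonorant]
def wSon : Int := 666    -- *[+sonorant][ ]

-- Each weaker constraint alone is outweighed by the stronger one ...
example : wCont < wSon ∧ wVoice < wSon := by decide

-- ... but their summed weight exceeds it.
example : wSon < wCont + wVoice := by decide

-- Hypothetical candidate A violates only the strong constraint;
-- hypothetical candidate B violates both weak ones.
-- With weighted sums, B ends up with the lower harmony.
example : -(wCont + wVoice) < -wSon := by decide
```

Under strict ranking with the strong constraint on top, B would beat A regardless of how many lower-ranked violations it incurs, which is why this pattern separates the two theories.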

The Ganging definition and anti-ganging theorems live in OTLimit.lean alongside ExponentiallySeparated, since they are two sides of the same coin.

English onset data

We encode a subset of the learned grammar (Table (4)) and verify that the model assigns higher harmony (= higher MaxEnt probability via exp_lt_exp) to attested onsets than to unattested ones (§2).

@[reducible, inline]

abbrev HayesWilson2008.Onset : Type

An English onset: a list of consonants preceding the nucleus.

Equations

HayesWilson2008.Onset = List Phonology.Segment

def HayesWilson2008.c1_star_son_dors :

Constraints.Constraint Onset

Constraint #1 from Table (4): *[+sonorant, +dorsal]. Weight 5.64.

Equations

HayesWilson2008.c1_star_son_dors onset = List.countP (fun (x : Phonology.Segment) => HayesWilson2008.matchesPat✝ x HayesWilson2008.son_dors_pat✝) onset

def HayesWilson2008.c4_star_blank_cont :

Constraints.Constraint Onset

Constraint #4 from Table (4): *[ ][+continuant]. Weight 5.17.

Equations

HayesWilson2008.c4_star_blank_cont = Constraints.Constraint.binary fun (o : HayesWilson2008.Onset) => HayesWilson2008.c4_violated✝ o = true

def HayesWilson2008.c5_star_blank_voice :

Constraints.Constraint Onset

Constraint #5 from Table (4): *[ ][+voice, −sonorant]. Weight 5.37.

Equations

HayesWilson2008.c5_star_blank_voice = Constraints.Constraint.binary fun (o : HayesWilson2008.Onset) => HayesWilson2008.c5_violated✝ o = true

def HayesWilson2008.c6_star_son_blank :

Constraints.Constraint Onset

Constraint #6 from Table (4): *[+sonorant][ ]. Weight 6.66.

Equations

HayesWilson2008.c6_star_son_blank = Constraints.Constraint.binary fun (o : HayesWilson2008.Onset) => HayesWilson2008.c6_violated✝ o = true

def HayesWilson2008.onsetCon :

Constraints.CON Onset 4

The subset grammar's constraint set: 4 constraints from Table (4).

Equations

HayesWilson2008.onsetCon = ![HayesWilson2008.c1_star_son_dors, HayesWilson2008.c4_star_blank_cont, HayesWilson2008.c5_star_blank_voice, HayesWilson2008.c6_star_son_blank]

noncomputable def HayesWilson2008.onsetW :

Fin 4 → ℝ

The learned weights for the four Table (4) constraints: 5.64, 5.17, 5.37, 6.66.

Equations

HayesWilson2008.onsetW = ![564 / 100, 517 / 100, 537 / 100, 666 / 100]

theorem HayesWilson2008.attested_higher_harmony_k_ŋ :

Constraints.harmonyScore onsetCon onsetW [English.Phonology.ŋ] < Constraints.harmonyScore onsetCon onsetW [English.Phonology.k]

Attested [k] (no violations) has higher harmony than unattested *[ŋ] (violates *[+son,+dors], cost 5.64). The harmony magnitude is a weight artifact; the ranking is the empirical prediction.

theorem HayesWilson2008.attested_higher_harmony_br_rk :

Constraints.harmonyScore onsetCon onsetW [English.Phonology.r, English.Phonology.k] < Constraints.harmonyScore onsetCon onsetW [English.Phonology.b, English.Phonology.r]

Attested [br] has higher harmony than unattested *[rk] (violates *[+son][ ]).

theorem HayesWilson2008.gradient_harmony_ŋ_rk :

Constraints.harmonyScore onsetCon onsetW [English.Phonology.r, English.Phonology.k] < Constraints.harmonyScore onsetCon onsetW [English.Phonology.ŋ]

Gradient: among unattested onsets, *[ŋ] has higher harmony than *[rk].
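
The three harmony comparisons above can be reproduced with a small stand-alone sketch. It is not the library's harmonyScore: weights are scaled by 100 to stay in Int, and the violation profiles are assumptions read off the docstrings ([k] violates nothing, *[ŋ] violates only constraint #1, *[rk] violates only constraint #6). All names in the block are hypothetical.

```lean
-- Weights × 100, in the order of onsetCon: #1, #4, #5, #6.
def weightsX100 : List Int := [564, 517, 537, 666]

-- Harmony = −Σ wⱼ · Cⱼ(x), for a violation profile in the same order.
def harmonySketch (violations : List Int) : Int :=
  -((List.zipWith (fun w v => w * v) weightsX100 violations).foldl (· + ·) (0 : Int))

-- Assumed violation profiles (see the lead-in).
def profK : List Int := [0, 0, 0, 0]    -- [k]
def profEng : List Int := [1, 0, 0, 0]  -- *[ŋ]
def profRk : List Int := [0, 0, 0, 1]   -- *[rk]

#eval harmonySketch profK    -- 0
#eval harmonySketch profEng  -- -564
#eval harmonySketch profRk   -- -666

-- *[rk] < *[ŋ] < [k], matching the theorems above.
#eval decide (harmonySketch profRk < harmonySketch profEng ∧
              harmonySketch profEng < harmonySketch profK)  -- true
```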

theorem HayesWilson2008.maxent_prob_k_gt_ŋ :

Real.exp (Constraints.harmonyScore onsetCon onsetW [English.Phonology.ŋ]) < Real.exp (Constraints.harmonyScore onsetCon onsetW [English.Phonology.k])

MaxEnt probability ordering: higher harmony ⟹ higher exp(harmonyScore) ⟹ higher MaxEnt probability. Applies exp_lt_exp to harmonyScore.
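
The step from harmony to exponentiated harmony can be sketched in isolation. This assumes the exp_lt_exp mentioned above is Mathlib's Real.exp_lt_exp (an iff between exp x < exp y and x < y); the variables h₁ and h₂ stand in for any two harmony scores.

```lean
import Mathlib

-- Strict monotonicity of exp: an ordering on harmony scores
-- transfers to the exponentiated scores.
example (h₁ h₂ : ℝ) (hlt : h₁ < h₂) : Real.exp h₁ < Real.exp h₂ :=
  Real.exp_lt_exp.mpr hlt
```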

theorem HayesWilson2008.gradient_prob_ŋ_gt_rk :

Real.exp (Constraints.harmonyScore onsetCon onsetW [English.Phonology.r, English.Phonology.k]) < Real.exp (Constraints.harmonyScore onsetCon onsetW [English.Phonology.ŋ])

Gradient well-formedness: among unattested forms, *[ŋ] has higher MaxEnt probability than *[rk].

Phonological MaxEnt is one instance of the framework-agnostic ConstraintSystem abstraction in Core.Optimization.System. The same ConstraintSystem record that scores phonological onsets here also scores syntactic candidates in HG/MaxEnt syntax models and RSA utterances in softmax pragmatic listeners, among others. The decoder (softmaxDecoder 1) is what makes this MaxEnt rather than HG (argmaxDecoder) or OT (argminDecoder over a LexProfile).

This section eats its own dog food: rather than comparing exp(harmonyScore ...) directly (as in §3), we go through ConstraintSystem.predict.

def HayesWilson2008.candidateOnsets :

Finset Onset

The four onsets used as MaxEnt candidates: two attested ([k], [b,r]) and two unattested (*[ŋ], *[r,k]).

Equations

HayesWilson2008.candidateOnsets = {[English.Phonology.k], [English.Phonology.ŋ], [English.Phonology.r, English.Phonology.k], [English.Phonology.b, English.Phonology.r]}

noncomputable def HayesWilson2008.onsetSystem :

Core.Optimization.ConstraintSystem Onset ℝ

[HW08]'s grammar realised as a generic ConstraintSystem over candidateOnsets, decoded by softmax at temperature 1 (built inline). The score component is harmonyScore onsetCon onsetW (the canonical MaxEnt harmony function).

Equations

One or more equations did not get rendered due to their size.

theorem HayesWilson2008.predict_k_gt_ŋ :

onsetSystem.predict [English.Phonology.ŋ] < onsetSystem.predict [English.Phonology.k]

The system literally predicts a higher MaxEnt probability for [k] than for *[ŋ]. Unlike maxent_prob_k_gt_ŋ, this is a comparison of actual softmax probabilities (numerator / partition function), not just exponentiated harmony scores — so the partition function over candidateOnsets is part of the claim.
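
Why the ordering survives normalisation can be sketched as follows. The block is a generic Mathlib fact, not the library's proof: a and b stand for two exponentiated harmony scores and Z for the shared partition function, which is positive because it is a sum of exponentials.

```lean
import Mathlib

-- Dividing both sides by the same positive partition function Z
-- preserves a strict inequality between the numerators.
example (a b Z : ℝ) (hZ : 0 < Z) (hab : a < b) : a / Z < b / Z := by
  rw [div_eq_mul_inv, div_eq_mul_inv]
  exact mul_lt_mul_of_pos_right hab (inv_pos.mpr hZ)
```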

theorem HayesWilson2008.predict_ŋ_gt_rk :

onsetSystem.predict [English.Phonology.r, English.Phonology.k] < onsetSystem.predict [English.Phonology.ŋ]

The system also predicts a higher MaxEnt probability for *[ŋ] than for *[rk] — gradient well-formedness among unattested forms.

theorem HayesWilson2008.onsetSystem_isProb :

∑ c ∈ candidateOnsets, onsetSystem.predict c = 1

The MaxEnt softmax decoder is a probability decoder, so the system's predictions are non-negative and sum to 1 over the candidate set. Follows from Decoder.IsProb.sum_eq_one for softmaxDecoder.