@cite{hayes-wilson-2008}: A Maximum Entropy Model of Phonotactics #
@cite{hayes-wilson-2008} propose that phonotactic well-formedness is
gradient and modelled as probability: a MaxEnt grammar assigns each
surface form a penalty score h(x) = Σⱼ wⱼ · Cⱼ(x), where Cⱼ(x) counts
x's violations of constraint j and wⱼ ≥ 0 is its weight, and
well-formedness is P(x) = exp(−h(x)) / Z.
Hayes & Wilson's "score" is the negation of harmonyScore:
h(x) = −harmonyScore(x), so P(x) ∝ exp(harmonyScore(x)).
Higher harmony = higher probability = better well-formedness.
This is exactly softmax(harmonyScoreR, 1) on a finite candidate set.
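A minimal Lean sketch of this correspondence, using hypothetical names
(`harmonySketch`, `maxEntProbSketch`) rather than the repository's
`harmonyScoreR` and `softmaxDecoder`: harmony is the negated weighted
violation sum, and the MaxEnt probability is its softmax at temperature 1.

```lean
import Mathlib

-- Hypothetical stand-ins for the repo's definitions.
-- harmony = −Σⱼ wⱼ · Cⱼ(x); higher harmony ⟹ higher probability.
noncomputable def harmonySketch (weights viols : List ℝ) : ℝ :=
  -((weights.zipWith (· * ·) viols).sum)

-- Softmax at temperature 1 over a finite candidate list.
noncomputable def maxEntProbSketch {α : Type} (cands : List α)
    (h : α → ℝ) (x : α) : ℝ :=
  Real.exp (h x) / (cands.map fun y => Real.exp (h y)).sum
```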
Key contribution: ganging #
The central empirical prediction distinguishing MaxEnt from OT is
ganging: two individually weak constraints can jointly override
a stronger one. This is impossible with OT's strict ranking, which
corresponds to exponentially separated weights (OTLimit.lean).
The Ganging definition and anti-ganging theorems live in OTLimit.lean
alongside ExponentiallySeparated, since they are two sides of the same
coin.
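A minimal numeric instance of ganging, with illustrative weights not
taken from Table (4): let C₁ be strong and C₂, C₃ weak, with

```latex
\begin{aligned}
w_1 &= 3, \qquad w_2 = w_3 = 2,\\
\mathrm{harmonyScore}(A) &= -(w_2 + w_3) = -4,\\
\mathrm{harmonyScore}(B) &= -w_1 = -3.
\end{aligned}
```

Since −3 > −4, MaxEnt prefers B, the candidate violating the strong
constraint: the two weak constraints gang up. Under OT's strict ranking
C₁ ≫ C₂, C₃, B's violation of top-ranked C₁ is fatal and A wins, so the
two frameworks make opposite predictions on this configuration.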
English onset data #
We encode a subset of the learned grammar (Table (4)) and verify that
the model assigns higher harmony (= higher MaxEnt probability via
exp_lt_exp) to attested onsets than to unattested ones (§2).
An English onset: a list of consonants preceding the nucleus.
Equations
Instances For
Constraint #1 from Table (4): *[+sonorant, +dorsal]. Weight 5.64.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Constraint #4 from Table (4): *[ ][+continuant]. Weight 5.17.
Equations
- HayesWilson2008.c4_star_blank_cont = Phonology.Constraints.mkMarkW "*[ ][+cont]" (fun (o : HayesWilson2008.Onset) => HayesWilson2008.c4_violated✝ o = true) (517 / 100)
Instances For
Constraint #5 from Table (4): *[ ][+voice, −sonorant]. Weight 5.37.
Equations
- HayesWilson2008.c5_star_blank_voice = Phonology.Constraints.mkMarkW "*[ ][+voice]" (fun (o : HayesWilson2008.Onset) => HayesWilson2008.c5_violated✝ o = true) (537 / 100)
Instances For
Constraint #6 from Table (4): *[+sonorant][ ]. Weight 6.66.
Equations
- HayesWilson2008.c6_star_son_blank = Phonology.Constraints.mkMarkW "*[+son][ ]" (fun (o : HayesWilson2008.Onset) => HayesWilson2008.c6_violated✝ o = true) (666 / 100)
Instances For
The subset grammar: 4 constraints from Table (4).
Equations
Instances For
Attested onset [k]: harmony = 0 (no violations).
Unattested onset *[ŋ]: harmony = −5.64 (violates *[+son,+dors]).
Unattested onset *[rk]: harmony = −6.66 (violates *[+son][ ]).
Attested [k] has higher harmony than unattested *[ŋ].
Attested [br] has higher harmony than unattested *[rk].
MaxEnt probability ordering: higher harmony ⟹ higher
exp(harmonyScore) ⟹ higher MaxEnt probability.
Applies exp_lt_exp (Mathlib) to harmonyScoreR (Core.Constraint.Weighted).
Gradient well-formedness: among unattested forms, *[ŋ]
has higher MaxEnt probability than *[rk]. Uses exp_lt_exp.
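The repository's lemma statements may differ, but the Mathlib step can be
checked standalone. With the harmony values above (*[ŋ] at −5.64, *[rk]
at −6.66), `Real.exp_lt_exp : exp x < exp y ↔ x < y` reduces the
exponentiated comparison to a rational-number inequality:

```lean
import Mathlib

-- Gradient well-formedness among unattested forms:
-- exp(harmony *[rk]) < exp(harmony *[ŋ]).
example : Real.exp (-6.66) < Real.exp (-5.64) := by
  rw [Real.exp_lt_exp]  -- goal becomes -6.66 < -5.64
  norm_num
```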
Phonological MaxEnt is one instance of the framework-agnostic
ConstraintSystem abstraction in Core.Constraint.System. The same
maxEntSystem constructor that scores phonological onsets here also
scores syntactic candidates in HG/MaxEnt syntax models, RSA utterances
in soft-max pragmatic listeners, etc. The decoder (softmaxDecoder 1)
is what makes this MaxEnt rather than HG (argmaxDecoder) or OT
(argminDecoder over a LexProfile).
This section eats the dog food: rather than comparing
exp(harmonyScoreR ...) directly (as in §3), we go through
ConstraintSystem.predict.
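A sketch of the abstraction being dogfooded, with hypothetical names (the
actual `ConstraintSystem` and `softmaxDecoder` in Core.Constraint.System
may be structured differently): a system pairs a score function with a
decoder, and prediction is just their composition, so changing the
decoder is all that distinguishes MaxEnt from HG or OT.

```lean
import Mathlib

-- Hypothetical sketch of a framework-agnostic constraint system.
structure SystemSketch (α : Type) where
  score  : α → ℝ            -- e.g. harmony
  decode : (α → ℝ) → α → ℝ  -- e.g. softmax at temperature 1

-- Prediction composes the decoder with the score.
noncomputable def SystemSketch.predict {α : Type}
    (s : SystemSketch α) (x : α) : ℝ :=
  s.decode s.score x
```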
The four onsets used as MaxEnt candidates: two attested ([k], [b,r]) and two unattested (*[ŋ], *[r,k]).
Equations
- One or more equations did not get rendered due to their size.
Instances For
@cite{hayes-wilson-2008}'s grammar realised as a generic
ConstraintSystem over candidateOnsets, decoded by softmax at
temperature 1. The score component is harmonyScoreR onsetGrammar
(the canonical MaxEnt harmony function).
Equations
Instances For
The system literally predicts a higher MaxEnt probability for [k]
than for *[ŋ]. Unlike maxent_prob_k_gt_ŋ, this is a comparison of
actual softmax probabilities (numerator / partition function), not
just exponentiated harmony scores — so the partition function over
candidateOnsets is part of the claim.
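Spelled out under the harmony values above (assuming the usual softmax
form), the claim reduces to a comparison of numerators, because both
candidates share the same partition function:

```latex
P([\mathrm{k}]) = \frac{e^{h([\mathrm{k}])}}{Z}, \qquad
P([\mathrm{\text{ŋ}}]) = \frac{e^{h([\mathrm{\text{ŋ}}])}}{Z}, \qquad
Z = \sum_{y \in \mathrm{candidateOnsets}} e^{h(y)} > 0,
```

so P([k]) > P([ŋ]) iff e⁰ > e^(−5.64), which holds by strict
monotonicity of exp.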
The system also predicts a higher MaxEnt probability for *[ŋ] than for *[rk] — gradient well-formedness among unattested forms.
The MaxEnt softmax decoder is a probability decoder, so the system's
predictions are non-negative and sum to 1 over the candidate set.
Follows from Decoder.IsProb.sum_eq_one for softmaxDecoder.
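Normalisation is a one-line check given the softmax form, assuming
P(x) = e^(h(x)) / Z with Z the sum over the candidate set:

```latex
\sum_{x} P(x) = \frac{1}{Z}\sum_{x} e^{h(x)} = \frac{Z}{Z} = 1,
\qquad
P(x) = \frac{e^{h(x)}}{Z} > 0 \ \text{ since } e^{h(x)} > 0 \text{ and } Z > 0.
```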