Separable Harmonies and HZ's Generalization [Mag25]
[Mag25] "Constraint Interaction in Probabilistic Phonology: Deducing Maximum Entropy Grammars from Hayes and Zuraw's Shifted Sigmoids Generalization" (Linguistic Inquiry, Early Access).
Overview
Hayes and Zuraw ([ZH17]; [Hay22]) observe that the rates of application of variable phonological processes governed by independent factors can be fit by horizontally shifted sigmoids evaluated at shared abscissas. [Mag25] reformulates this as a constant logit-rate difference condition and proves a biconditional: within harmony-based probabilistic phonology, a harmony function predicts HZ's generalization if and only if it is separable, i.e., it decomposes into a product of powers of unary functions, each fed by a single constraint.
Key definitions
- ConstraintIndependence: the formal constraint condition (§2.4) requiring each constraint to be insensitive to at least one of the two dimensions of a 2×2 "square" of underlying forms
- ConstantLogitDiff: HZ's generalization restated as constant differences of logit rates along each dimension (eq. 13)
- SeparableHarmony: a harmony function that decomposes as H(v) = ∏ₖ (hₖ(vₖ))^{wₖ} (eq. 30)
- meSeparable: ME harmony is separable (eq. 29), a corollary of exp(sum) = prod(exp)
- separable_predicts_hz: the forward direction, i.e., separable harmonies predict HZ's generalization. Follows from logit_uniformity + constraint rescaling
- separable_eq_me_rescaled: any separable harmony is ME under constraint rescaling Ĉₖ = −log hₖ(Cₖ) (eq. 34)
Connection to existing infrastructure
The forward direction leverages logit_uniformity (in NoisyHG.lean)
and maxent_logit_harmony, which already prove that MaxEnt log-odds
equal harmony score differences. [Mag25]'s contribution is
showing this is the only mode of constraint interaction with this
property (the backward direction).
Bridge to the MaxEnt grammar API (§11)
harmonyScore con w is the negated Fin-indexed weighted-violation sum
(harmonyScore_eq_neg_sum), so any MaxEnt grammar (con, w) falls directly
under the separability theory:
- exp_harmonyScore_eq_me_separable: exp(harmonyScore con w c) = meSeparable.eval (con · c)
- maxent_logit_as_finsum: MaxEnt logit = weighted violation-difference sum
These apply the separability results (independence → HZ, rescaling) to any
(con, w) MaxEnt grammar.
The 2×2 Square of Underlying Forms (§2.4)
A square of four underlying forms indexed by two binary factors
(row = top/bottom, column = left/right). This is [Mag25]'s
eq. (12): the four forms x^{TL}, x^{TR}, x^{BL}, x^{BR} arranged
so that rows and columns correspond to independent phonological
dimensions (e.g., prefix identity × stem-initial obstruent quality).
Row and Col are the two binary factors.
- tl : X
Top-left form (e.g., /maŋ+b/).
- tr : X
Top-right form (e.g., /maŋ+k/).
- bl : X
Bottom-left form (e.g., /paŋ+b/).
- br : X
Bottom-right form (e.g., /paŋ+k/).
Constraint Independence (§2.4, Figure 4)
A constraint C is insensitive to the row dimension of a square:
it assigns the same violation count to forms that share a column.
Cf. Figure 4a.
Equations
- HarmonicGrammar.InsensitiveToRow C sq = (C sq.tl = C sq.bl ∧ C sq.tr = C sq.br)
A constraint C is insensitive to the column dimension of a square:
it assigns the same violation count to forms that share a row.
Cf. Figure 4b.
Equations
- HarmonicGrammar.InsensitiveToCol C sq = (C sq.tl = C sq.tr ∧ C sq.bl = C sq.br)
Constraint independence (§2.4): the rows and columns of the square are independent dimensions relative to a constraint set — each constraint is insensitive to at least one dimension (row or column). No constraint can encode an interaction between the two dimensions.
Equations
- HarmonicGrammar.ConstraintIndependence constraints sq = ∀ (k : Fin n), HarmonicGrammar.InsensitiveToRow (constraints k) sq ∨ HarmonicGrammar.InsensitiveToCol (constraints k) sq
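A minimal sketch of the three conditions above as Bool-valued checks on a concrete square, assuming Nat-valued constraints over String forms and a List of constraints in place of `Fin n`-indexed data. The names `Sq`, `insensitiveToRow`, `insensitiveToCol`, `independent`, `prefixC`, `stemC`, `cellC`, and `tagalogSq` are hypothetical toys; the library's definitions are Prop-valued.

```lean
-- Hypothetical Bool-valued toys; the library's definitions are Prop-valued.
structure Sq (X : Type) where
  tl : X
  tr : X
  bl : X
  br : X

-- Equal violations within each column (cf. Figure 4a).
def insensitiveToRow {X : Type} (C : X → Nat) (sq : Sq X) : Bool :=
  C sq.tl == C sq.bl && C sq.tr == C sq.br

-- Equal violations within each row (cf. Figure 4b).
def insensitiveToCol {X : Type} (C : X → Nat) (sq : Sq X) : Bool :=
  C sq.tl == C sq.tr && C sq.bl == C sq.br

-- Every constraint ignores at least one of the two dimensions.
def independent {X : Type} (cs : List (X → Nat)) (sq : Sq X) : Bool :=
  cs.all fun C => insensitiveToRow C sq || insensitiveToCol C sq

-- Hypothetical constraints on the prefix × stem-initial-obstruent square.
def prefixC (x : String) : Nat := if x.startsWith "maŋ" then 1 else 0  -- sees only the row
def stemC (x : String) : Nat := if x.endsWith "k" then 2 else 0        -- sees only the column
def cellC (x : String) : Nat := if x == "paŋ+k" then 1 else 0          -- sees both

def tagalogSq : Sq String := ⟨"maŋ+b", "maŋ+k", "paŋ+b", "paŋ+k"⟩

#eval independent [prefixC, stemC] tagalogSq         -- expected: true
#eval independent [prefixC, stemC, cellC] tagalogSq  -- expected: false
```

Adding `cellC`, which fires only on /paŋ+k/, is what breaks independence: it is the kind of constraint that encodes an interaction between the two dimensions.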
HZ's Generalization as Constant Logit-Rate Differences (§2.2)
HZ's generalization (eq. 13): the difference between logit rates of application for two underlying forms in the same row (or column) does not depend on the choice of row (or column).
Equivalently: for a function LR giving logit rates,
LR(x^TL) − LR(x^TR) = LR(x^BL) − LR(x^BR).
Equations
- HarmonicGrammar.ConstantLogitDiff LR sq = (LR sq.tl - LR sq.tr = LR sq.bl - LR sq.br)
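A minimal `Float` sketch of the identity, with hypothetical helpers `logit`, `logistic`, and `constLogitDiff` and an assumed tolerance to absorb rounding. Rates built as the logistic of a row effect plus a column effect satisfy it; a table in which only the bottom-right rate moves does not.

```lean
-- Float stand-ins for the real-valued functions (hypothetical helpers).
def logit (p : Float) : Float := Float.log (p / (1 - p))
def logistic (z : Float) : Float := 1 / (1 + Float.exp (-z))

-- ConstantLogitDiff on four rates, up to a floating-point tolerance.
def constLogitDiff (tl tr bl br : Float) (tol : Float := 1e-9) : Bool :=
  decide (Float.abs ((logit tl - logit tr) - (logit bl - logit br)) < tol)

-- Additive scores: row effects 0.5 / -1.0, column effects 1.0 / -0.3.
#eval constLogitDiff (logistic (0.5 + 1.0)) (logistic (0.5 - 0.3))
                     (logistic (-1.0 + 1.0)) (logistic (-1.0 - 0.3))  -- expected: true
-- Interaction: only the bottom-right rate moves.
#eval constLogitDiff 0.5 0.5 0.5 0.9  -- expected: false
```

The first table satisfies the identity by construction, since logit undoes logistic; the tolerance only covers floating-point error.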
ME Predicts HZ's Generalization (§3)
Independence of violation differences: if a constraint is insensitive to a dimension, its violation differences are too.
The violation difference Δₖ(x) = Cₖ(x, NO) − Cₖ(x, YES) inherits
insensitivity from the raw constraint: for each of the candidates YES and NO,
the constraint assigns the same violation count to forms that differ only
along the dimension it ignores.
ME predicts HZ (§3.6, eq. 22): when the
violation differences satisfy independence (inherited from constraint
independence), the weighted sum of violation differences — which
equals the ME logit probability by maxent_logit_harmony —
satisfies the constant-difference identity.
The proof follows eq. (18): split the sum by constraint, and for each k, independence ensures the k-th term contributes equally to both sides of the identity.
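The cancellation behind eq. (18) can be written out explicitly. This sketch assumes, as above, that the logit rate is LR(x) = Σₖ wₖ Δₖ(x) with Δₖ(x) = Cₖ(x, NO) − Cₖ(x, YES); the shorthand Bₖ is introduced here only for readability.

```latex
\begin{aligned}
&\bigl(LR(x^{TL}) - LR(x^{TR})\bigr) - \bigl(LR(x^{BL}) - LR(x^{BR})\bigr)
  = \sum_{k} w_k\, B_k,\\
&B_k = \bigl(\Delta_k(x^{TL}) - \Delta_k(x^{BL})\bigr) - \bigl(\Delta_k(x^{TR}) - \Delta_k(x^{BR})\bigr)
     = \bigl(\Delta_k(x^{TL}) - \Delta_k(x^{TR})\bigr) - \bigl(\Delta_k(x^{BL}) - \Delta_k(x^{BR})\bigr).
\end{aligned}
```

If Cₖ is insensitive to the row dimension, both brackets in the first form of Bₖ vanish; if it is insensitive to the column dimension, both brackets in the second form vanish. Independence places every k in at least one case, so every Bₖ = 0 and the constant-difference identity holds.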
Separable Harmonies (§5.1)
An n-ary harmony function H is separable if it decomposes
into a product of powers of unary functions, each attending to a
single constraint (eq. 30):
H(C₁(x,y), …, Cₙ(x,y)) = ∏ₖ (hₖ(Cₖ(x,y)))^{wₖ}
Each hₖ must be positive, normalized (hₖ(0) = 1), and decreasing
(more violations → lower harmony).
- w : Fin n → ℝ
Constraint weights.
- h : Fin n → ℕ → ℝ
Per-constraint rescaling functions.
- h_pos
Each hₖ is positive.
- h_norm (k : Fin n) : self.h k 0 = 1
Normalized: hₖ(0) = 1.
- h_decreasing
Decreasing: more violations → lower score.
Evaluate a separable harmony on a violation profile.
The separable harmony at the zero profile is 1 (normalization).
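A minimal sketch of eq. (30) and of the normalization fact in `Float` arithmetic, assuming the weights, the unary functions, and the violation profile are passed as parallel lists in place of `Fin n`-indexed data. `separableEval`, `expNeg`, and `recip` are hypothetical names, not the library's API.

```lean
-- H(v) = ∏ₖ (hₖ vₖ)^{wₖ}, with parallel lists standing in for Fin n-indexed data.
def separableEval (w : List Float) (h : List (Nat → Float)) (v : List Nat) : Float :=
  ((w.zip h).zip v).foldl (fun acc p => acc * Float.pow (p.1.2 p.2) p.1.1) 1

-- Two admissible unary functions: positive, hₖ(0) = 1, decreasing.
def expNeg (v : Nat) : Float := Float.exp (-(v.toFloat))
def recip (v : Nat) : Float := 1 / (1 + v.toFloat)

#eval separableEval [1.5, 2.0] [expNeg, recip] [2, 1]       -- exp(-3) * (1/2)^2
#eval separableEval [1.5, 2.0] [expNeg, recip] [0, 0] == 1  -- expected: true
```

The normalization check holds for any admissible choice of hₖ, since hₖ(0) = 1 and 1^{wₖ} = 1.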
ME Harmony Is Separable (§5.1, eq. 29)
The ME separable harmony: each hₖ(x) = exp(−x) (the
exponential-of-opposite function from Figure 5a).
This gives H_ME(v) = ∏ₖ (exp(−vₖ))^{wₖ} = exp(−Σ wₖvₖ).
Equations
- HarmonicGrammar.meSeparable n w = { w := w, h := fun (x : Fin n) (v : ℕ) => Real.exp (-↑v), h_pos := ⋯, h_norm := ⋯, h_decreasing := ⋯ }
The ME separable harmony equals exp applied to the standard ME harmony
score (the negated weighted violation sum):
meSeparable.eval(v) = exp(−Σₖ wₖ · vₖ)
Proof: ∏ₖ (exp(−vₖ))^{wₖ} = ∏ₖ exp(−wₖvₖ) = exp(−Σₖ wₖvₖ),
using the real power x^w = exp(w · log x) for x > 0, so that
(exp(−v))^w = exp(w · log(exp(−v))) = exp(−wv).
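A quick check of this identity in Lean's `Float` arithmetic, with invented weights and violation counts; `ws`, `vs`, `meProd`, and `meExp` are hypothetical names for the inputs and the two sides.

```lean
-- Invented weights and violation counts (as Float).
def ws : List Float := [0.7, 1.3, 2.0]
def vs : List Float := [3, 0, 2]

-- Left side: ∏ₖ (exp(-vₖ))^{wₖ}.
def meProd : Float :=
  (ws.zip vs).foldl (fun acc p => acc * Float.pow (Float.exp (-p.2)) p.1) 1

-- Right side: exp(-Σₖ wₖ vₖ).
def meExp : Float :=
  Float.exp (-((ws.zip vs).foldl (fun acc p => acc + p.1 * p.2) 0))

#eval decide (Float.abs (meProd - meExp) < 1e-12)  -- expected: true
```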
Constraint Rescaling (§5.3, eq. 33–34)
Constraint rescaling (eq. 33): given a separable
harmony with unary functions hₖ, the rescaled constraint is
Ĉₖ = −log(hₖ(Cₖ)).
The rescaled constraints are nonnegative (since hₖ(v) ∈ (0, 1] for v ≥ 0) and preserve the ordering on violation profiles.
hₖ(v) ≤ 1 for all v: follows from decreasingness and hₖ(0) = 1.
Rescaled constraints are nonnegative: since hₖ(v) ∈ (0, 1],
−log(hₖ(v)) ≥ 0.
Rescaled constraints at 0 are 0: Ĉₖ(0) = −log(1) = 0.
ME rescaling is the identity: since hₖ = exp(−·),
Ĉₖ(v) = −log(exp(−v)) = v.
Any separable harmony is ME under rescaling (
eq. 34): H(C₁, …, Cₙ) = H_ME(Ĉ₁, …, Ĉₙ) where Ĉₖ = −log hₖ(Cₖ).
This is the key insight: the choice of hₖ only affects how
constraints are rescaled, not how they interact. All separable
harmonies have the same mode of constraint interaction as ME.
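A numerical sketch of eq. (34) for a separable harmony other than ME, assuming each hₖ is 1/(1+v) applied to its own constraint (a hypothetical choice that is positive, normalized, and decreasing). Rescaling then gives Ĉₖ(v) = log(1+v), and the separable harmony coincides with exp(−Σₖ wₖĈₖ). The weights, violations, and the names `recip`, `rescale`, `sepH`, and `meOnRescaled` are invented.

```lean
-- Hypothetical unary function hₖ(v) = 1/(1+v), used for every constraint.
def recip (v : Nat) : Float := 1 / (1 + v.toFloat)

-- Constraint rescaling (eq. 33): Ĉ(v) = -log h(v).
def rescale (h : Nat → Float) (v : Nat) : Float := -(Float.log (h v))

-- Invented weights and violation profile.
def ws : List Float := [1.0, 2.5]
def vs : List Nat := [3, 1]

-- Separable harmony ∏ₖ (recip vₖ)^{wₖ}.
def sepH : Float :=
  (ws.zip vs).foldl (fun acc p => acc * Float.pow (recip p.2) p.1) 1

-- ME harmony on the rescaled violations: exp(-Σₖ wₖ Ĉₖ(vₖ)).
def meOnRescaled : Float :=
  Float.exp (-((ws.zip vs).foldl (fun acc p => acc + p.1 * rescale recip p.2) 0))

#eval decide (Float.abs (sepH - meOnRescaled) < 1e-12)  -- expected: true
#eval decide (Float.abs (rescale recip 0) < 1e-12)      -- Ĉ(0) = 0; expected: true
```

Replacing `recip` with v ↦ exp(−v) makes `rescale` the identity on violation counts (up to rounding), which is the ME case.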
Forward Direction: Separable ⟹ HZ (§5.4)
Separable harmonies predict HZ (§5.4):
for any separable harmony H, if the rescaled violation differences
Δ̂ₖ(x) = Ĉₖ(Cₖ(x,NO)) − Ĉₖ(Cₖ(x,YES)) satisfy independence on a
square, then the logit rate log(H(v_YES)/H(v_NO)) satisfies HZ's
constant-difference identity.
The proof composes two results:
- separable_eq_me_rescaled: H(v) = exp(−Σ wₖĈₖ(vₖ))
- me_predicts_hz: weighted sums with independent differences satisfy constant logit-rate differences
Since log(exp(a)/exp(b)) = a − b, the logit rate is a weighted sum
of rescaled violation differences, and me_predicts_hz applies.
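A numerical sketch of the forward direction, assuming two constraints with weights 1 and 2 and hₖ(v) = 1/(1+v) applied to each constraint separately, so the harmony is separable (unlike the §4.4 harmony, which applies 1/(1+x) once to the weighted sum). The violation profiles `c1` and `c2` are invented: C₁ depends only on the column and C₂ only on the row, so the rescaled differences are independent. All names are hypothetical.

```lean
def recip (v : Nat) : Float := 1 / (1 + v.toFloat)

-- Separable harmony with two constraints, weights w₁ = 1, w₂ = 2, hₖ = recip.
def sepH (v1 v2 : Nat) : Float :=
  Float.pow (recip v1) 1 * Float.pow (recip v2) 2

-- Invented violations: C₁ depends only on the column, C₂ only on the row.
def c1 (left yes : Bool) : Nat :=
  if yes then (if left then 2 else 0) else (if left then 0 else 1)
def c2 (top yes : Bool) : Nat :=
  if yes then (if top then 1 else 3) else 0

-- Logit rate log(H(v_YES) / H(v_NO)) for the form in row `top`, column `left`.
def logitRate (top left : Bool) : Float :=
  Float.log (sepH (c1 left true) (c2 top true) / sepH (c1 left false) (c2 top false))

#eval decide (Float.abs ((logitRate true true - logitRate true false)
  - (logitRate false true - logitRate false false)) < 1e-12)  -- expected: true
```

Both within-row differences equal log(1/6) here: the top row gives log(1/12) − log(1/2) and the bottom row log(1/48) − log(1/8).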
Backward Direction: HZ ⟹ Separable (§5.5, online appendices)
Counterexample: Inverse Function (§4.4)
The inverse function h(x) = 1/(1+x) used in the non-separable
harmony H(v) = 1 / (1 + Σ wₖvₖ) (eq. 27).
Like ME's exp(−x), it is positive, normalized, and decreasing —
but the resulting harmony is not separable.
Equations
- HarmonicGrammar.inverseFunction x = 1 / (1 + x)
The inverse function is positive for nonnegative arguments.
The inverse function is normalized: h(0) = 1.
The inverse function is strictly decreasing on [0, ∞).
The non-separable harmony using the inverse function:
H(v) = 1 / (1 + Σₖ wₖ · vₖ) (eq. 27).
This has the form H = h(Σ wₖCₖ) (eq. 26) with h = inverseFunction,
which is not separable because the single h sees the sum of all
weighted violations, not each constraint individually.
Equations
- HarmonicGrammar.nonSeparableInverseHarmony w v = HarmonicGrammar.inverseFunction (∑ k : Fin n, w k * ↑(v k))
Counterexample (§4.4, online appendix D.1): the non-separable inverse
harmony H(v) = inverseFunction (Σ wₖvₖ) = 1/(1 + Σ wₖvₖ) does not
predict HZ's generalization in general.
For this harmony the logit rate is log((1 + S_NO(x)) / (1 + S_YES(x)))
with S_y(x) = Σₖ wₖ Cₖ(x,y), and HZ holds iff the cross-product of
inverseFunction(S) across the square is equal on both diagonals — i.e.
∏ inverseFunction(S_NO tl, S_YES tr, S_YES bl, S_NO br) equals
∏ inverseFunction(S_YES tl, S_NO tr, S_NO bl, S_YES br).
Stating it over inverseFunction of the actual weighted sums (rather
than the bare (1+S) products) keeps the witness sensitive to the
harmony and weights: with Tagalog constraints and w₅ ≠ w₆ (weights
1,1,1,1,1,2) the two products are 1/72 ≠ 1/60, disproving HZ.
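For contrast, a sketch that uses invented violation profiles (not the Tagalog constraints cited above) and weights (1, 2), and applies the inverse function once to the weighted sum, as in eq. (27). Although each constraint ignores one dimension, the two within-row logit differences come apart, so HZ fails. `inverseFn`, `nonSepH`, and the other names are hypothetical `Float` re-implementations.

```lean
def inverseFn (x : Float) : Float := 1 / (1 + x)

-- Invented violations (not the Tagalog data): C₁ depends only on the column, C₂ only on the row.
def c1 (left yes : Bool) : Nat :=
  if yes then (if left then 2 else 0) else (if left then 0 else 1)
def c2 (top yes : Bool) : Nat :=
  if yes then (if top then 1 else 3) else 0

-- Non-separable harmony H(v) = inverseFn(w₁v₁ + w₂v₂) with w = (1, 2).
def nonSepH (v1 v2 : Nat) : Float := inverseFn (1 * v1.toFloat + 2 * v2.toFloat)

-- Logit rate log(H(v_YES) / H(v_NO)) = log((1 + S_NO) / (1 + S_YES)).
def logitRate (top left : Bool) : Float :=
  Float.log (nonSepH (c1 left true) (c2 top true) / nonSepH (c1 left false) (c2 top false))

#eval logitRate true true - logitRate true false    -- log(3/10)
#eval logitRate false true - logitRate false false  -- log(7/18)
```

In the diagonal form stated above, the two products of inverseFunction values for these profiles are 1/54 and 1/70.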
Bridge: MaxEnt harmony ↔ Separable harmony
MaxEnt unnormalized probability is the ME separable harmony:
exp(harmonyScore con w c) = meSeparable.eval (con · c).
Since meSeparable.eval v = exp(-∑ wₖvₖ) (me_separable_eval) and
harmonyScore con w c = -∑ wₖ·(con k c) (harmonyScore_eq_neg_sum), the
exponential of the harmony is exactly the separable-harmony evaluation —
making all separability theory (independence → HZ, constraint rescaling)
directly applicable to a (con, w) MaxEnt grammar.
MaxEnt logit rate as a weighted violation-difference sum:
the logit of MaxEnt probabilities is the negated weighted sum of violation
differences. Bridges maxent_logit_harmony with me_predicts_hz: since the
logit is a weighted sum, it satisfies HZ whenever the violation differences
satisfy ViolDiffIndependence.
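A numerical check of this bridge for a two-candidate MaxEnt grammar, with invented weights and YES/NO violation vectors; `dot`, `zSum`, `pYes`, `pNo`, and `diffSum` are hypothetical names. The log-odds of the normalized probabilities equal Σₖ wₖ(Cₖ(NO) − Cₖ(YES)).

```lean
-- Invented weights and violation vectors of the YES and NO candidates.
def ws : List Float := [1.0, 2.0, 0.5]
def cYes : List Float := [2, 0, 1]
def cNo  : List Float := [0, 1, 0]

def dot (a b : List Float) : Float := (a.zip b).foldl (fun acc p => acc + p.1 * p.2) 0

-- MaxEnt probabilities from harmony -Σₖ wₖ Cₖ, normalized over the two candidates.
def zSum : Float := Float.exp (-(dot ws cYes)) + Float.exp (-(dot ws cNo))
def pYes : Float := Float.exp (-(dot ws cYes)) / zSum
def pNo  : Float := Float.exp (-(dot ws cNo)) / zSum

-- Weighted violation-difference sum Σₖ wₖ (Cₖ(NO) - Cₖ(YES)).
def diffSum : Float := dot ws (List.zipWith (· - ·) cNo cYes)

#eval decide (Float.abs (Float.log (pYes / pNo) - diffSum) < 1e-12)  -- expected: true
```

The normalizer `zSum` cancels in the ratio, which is why the logit depends only on the weighted differences.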
Consistent Ordering from Monotone Transforms
Consistent ordering from constant logit-rate differences: if a score
function d satisfies ConstantLogitDiff and f is strictly monotone,
then the f-transformed scores exhibit across-the-board consistency:
provided the paired score differences are nonzero, the product of
same-column differences is positive.
This is the mathematical core of why HG produces sigmoid families ([ZH17] §2.5): any strictly monotone probability transform (MaxEnt's softmax, NHG's normal CDF, Normal MaxEnt's probit) applied to scores with constant logit-rate differences preserves the "across-the-board" ordering pattern. Differences may compress at floor and ceiling (producing sigmoids rather than claws), but they never change sign.
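A numerical sketch of the claim, assuming the logistic function as the strictly monotone transform f and invented scores whose within-row difference is 1.2 in both rows; `logistic`, `dTL`, `gapTop`, and the other names are hypothetical. The two row-wise gaps f(d TL) − f(d TR) and f(d BL) − f(d BR) keep the same sign, while the bottom row, placed near the ceiling, is compressed.

```lean
def logistic (z : Float) : Float := 1 / (1 + Float.exp (-z))

-- Invented scores with d(TL) - d(TR) = d(BL) - d(BR) = 1.2.
-- The bottom row sits near the ceiling, where logistic compresses differences.
def dTL : Float := 0.4
def dTR : Float := -0.8
def dBL : Float := 6.0
def dBR : Float := 4.8

def gapTop : Float := logistic dTL - logistic dTR
def gapBot : Float := logistic dBL - logistic dBR

#eval decide (gapTop * gapBot > 0)  -- same sign; expected: true
#eval decide (gapBot < gapTop)      -- compressed near the ceiling; expected: true
```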