Documentation

Linglib.Phonology.HarmonicGrammar.NoisyHG

Noisy Harmonic Grammar and Normal MaxEnt #

Noisy HG ([BP16]) adds i.i.d. Gaussian noise N(0,σ²) to each constraint weight before evaluation. For binary choice between candidates a and b, the harmony score difference H(a)−H(b) becomes a Gaussian random variable with variance σ² · Σⱼ(cⱼ(a)−cⱼ(b))² — the noise is context-dependent, scaling with squared violation differences.

Normal MaxEnt ([Fle21]) instead adds i.i.d. Gaussian noise N(0,ε²) directly to candidate scores, giving a constant noise standard deviation σ_d = ε√2 for binary choice.

Both are instances of the Gaussian random utility model (Core.gaussianChoiceProb, the probit choice rule) — the Gaussian sibling of softmax. They ground directly in that pure-math fact, not in Thurstone's psychophysics: Noisy HG and Thurstone Case V are sibling applications of the same probit RUM, neither depending on the other.

A grammar here is a constraint set con : CON C n and a weight vector w : Fin n → ℝ (from Constraints.Defs); there is no weighted-constraint record. Violation-difference quantities (violationDiffSqSum, nhgCovariance) read only con; harmony quantities read con and w.

MaxEnt logit-harmony identity #

For classical MaxEnt (Gumbel noise → softmax), the log-odds ratio between any two candidates equals their harmony score difference:

log(P(a)/P(b)) = H(a) − H(b)

This implies logit uniformity ([Fle21] §5.1, eq (10)): adding one violation of constraint j changes the logit by exactly −wⱼ, regardless of the violation profile elsewhere. NHG lacks this property because its noise variance σ_d depends on the violation profile.

NHG Noise Variance #

noncomputable def HarmonicGrammar.violationDiffSqSum {C : Type u_1} {n : ℕ} (con : Constraints.CON C n) (a b : C) :

ℝ

Sum of squared violation differences between two candidates. This determines the NHG noise variance: σ_d² = σ² · violationDiffSqSum.

Equations

HarmonicGrammar.violationDiffSqSum con a b = ∑ i : Fin n, (↑(con i a) - ↑(con i b)) ^ 2

Instances For

def HarmonicGrammar.violationDiffSqSumQ {C : Type u_1} {n : ℕ} (con : Constraints.CON C n) (a b : C) :

ℚ

Sum of squared violation differences (ℚ, computable). Use this for concrete examples with decide.

Equations

HarmonicGrammar.violationDiffSqSumQ con a b = ∑ i : Fin n, (↑(con i a) - ↑(con i b)) ^ 2

Instances For

noncomputable def HarmonicGrammar.nhgSigmaD {C : Type u_1} {n : ℕ} (con : Constraints.CON C n) (sigma : ℝ) (a b : C) :

ℝ

NHG noise standard deviation for binary choice: σ_d = σ · √(Σⱼ (cⱼ(a) − cⱼ(b))²).

The noise is context-dependent: it scales with the violation difference profile, not just the per-weight noise σ.

Equations

HarmonicGrammar.nhgSigmaD con sigma a b = sigma * √(HarmonicGrammar.violationDiffSqSum con a b)

Instances For

NHG binary choice (Gaussian random utility model) #

noncomputable def HarmonicGrammar.nhgChoiceProb {C : Type u_1} {n : ℕ} (con : Constraints.CON C n) (w : Fin n → ℝ) (sigma : ℝ) (a b : C) :

ℝ

NHG binary choice probability ([Fle21]): P(a ≻ b) = Φ((H(a) − H(b)) / σ_d).

Noisy HG is the Gaussian random utility model (gaussianChoiceProb, the probit choice rule) applied to the harmony gap, with the context-dependent NHG noise σ_d = σ·√(Σⱼ(cⱼ(a)−cⱼ(b))²) (nhgSigmaD). It grounds directly in the Gaussian RUM — not through Thurstone's psychophysics.

Equations

HarmonicGrammar.nhgChoiceProb con w sigma a b = Core.gaussianChoiceProb (Constraints.harmonyScore con w a - Constraints.harmonyScore con w b) (HarmonicGrammar.nhgSigmaD con sigma a b)

Instances For

theorem HarmonicGrammar.nhg_choiceProb_eq {C : Type u_1} {n : ℕ} (con : Constraints.CON C n) (w : Fin n → ℝ) (sigma : ℝ) (a b : C) :

nhgChoiceProb con w sigma a b = Core.normalCDF ((Constraints.harmonyScore con w a - Constraints.harmonyScore con w b) / nhgSigmaD con sigma a b)

NHG choice probability in closed form: Φ((H(a) − H(b)) / σ_d) ([Fle21] eq (15)).

Normal MaxEnt #

noncomputable def HarmonicGrammar.normalMaxEntSigmaD (epsilon : ℝ) :

ℝ

Normal MaxEnt noise standard deviation: σ_d = ε√2 (constant).

When noise is added i.i.d. N(0,ε²) to each candidate score, the difference of two candidates is N(H(a)−H(b), 2ε²), giving σ_d = ε√2 regardless of the violation profile. This is the key distinction from NHG, where σ_d varies by context.

Equations

HarmonicGrammar.normalMaxEntSigmaD epsilon = epsilon * √2

Instances For

noncomputable def HarmonicGrammar.normalMaxEntChoiceProb {C : Type u_1} {n : ℕ} (con : Constraints.CON C n) (w : Fin n → ℝ) (epsilon : ℝ) (a b : C) :

ℝ

Normal MaxEnt binary choice probability ([Fle21]): P(a ≻ b) = Φ((H(a) − H(b)) / (ε√2)).

Like NHG, the Gaussian random utility model (gaussianChoiceProb) applied to the harmony gap — but with the constant noise σ_d = ε√2 (normalMaxEntSigmaD) rather than NHG's context-dependent σ_d.

Equations

One or more equations did not get rendered due to their size.

Instances For

theorem HarmonicGrammar.normalMaxEnt_choiceProb_eq {C : Type u_1} {n : ℕ} (con : Constraints.CON C n) (w : Fin n → ℝ) (epsilon : ℝ) (a b : C) :

normalMaxEntChoiceProb con w epsilon a b = Core.normalCDF ((Constraints.harmonyScore con w a - Constraints.harmonyScore con w b) / normalMaxEntSigmaD epsilon)

Normal MaxEnt choice probability in closed form: Φ((H(a) − H(b)) / (ε√2)) ([Fle21] eq (17)).

MaxEnt Logit-Harmony Identity #

theorem HarmonicGrammar.logit_uniformity {ι : Type u_2} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (a b : ι) :

Real.log (Core.softmax s a / Core.softmax s b) = s a - s b

Logit uniformity (MaxEnt diagnostic; [Fle21] §5.1): for softmax with α = 1, the log-odds between any two alternatives equals their score difference. Hence changing one score by −w changes the log-odds by exactly −w, regardless of context.

This property characterizes MaxEnt among stochastic HG variants. NHG lacks it because its noise variance σ_d depends on the violation profile (see nhgSigmaD).

See also maxent_logit_as_finsum (Separability.lean) for the Fin-indexed decomposition, and me_predicts_hz for the consequence that independent violation differences yield HZ's generalization.

theorem HarmonicGrammar.maxent_logit_harmony {C : Type u_1} {n : ℕ} [Fintype C] [Nonempty C] (con : Constraints.CON C n) (w : Fin n → ℝ) (a b : C) :

Real.log (Core.softmax (Constraints.harmonyScore con w) a / Core.softmax (Constraints.harmonyScore con w) b) = Constraints.harmonyScore con w a - Constraints.harmonyScore con w b

MaxEnt logit-harmony identity: the log-odds ratio between two candidates equals their harmony score difference.

log(P(a)/P(b)) = H(a) − H(b)

Instantiation of logit_uniformity with harmony scores.

theorem HarmonicGrammar.maxent_iia {C : Type u_1} {n : ℕ} [Fintype C] [Nonempty C] (con : Constraints.CON C n) (w : Fin n → ℝ) (a b : C) :

Core.softmax (Constraints.harmonyScore con w) a / Core.softmax (Constraints.harmonyScore con w) b = Real.exp (Constraints.harmonyScore con w a - Constraints.harmonyScore con w b)

Ratio independence (IIA for MaxEnt): the probability ratio between two candidates depends only on their own harmony scores.

P(a)/P(b) = exp(H(a) − H(b))

Adding or removing other candidates from the competition doesn't change the ratio. Corollary of softmax_odds with α = 1.

Harmony Difference Decomposition #

theorem HarmonicGrammar.harmonyScore_diff {C : Type u_1} {n : ℕ} (con : Constraints.CON C n) (w : Fin n → ℝ) (a b : C) :

Constraints.harmonyScore con w a - Constraints.harmonyScore con w b = -∑ i : Fin n, w i * (↑(con i a) - ↑(con i b))

Harmony difference decomposition: the harmony score difference equals the negated weighted sum of violation differences.

H(a) − H(b) = −Σⱼ wⱼ · (cⱼ(a) − cⱼ(b))

This is the bridge between abstract harmony scores and the constraint violation patterns used in empirical analyses (e.g., French schwa).

Censored NHG ([Fle21] §7.3) #

noncomputable def HarmonicGrammar.censoredWeight (w n : ℝ) :

ℝ

Censored weight: noise is clamped so weights never go negative.

In censored NHG, the noisy weight is max(0, w + n) where n ~ N(0,σ²). This prevents negative weights (which would reverse constraint preferences) and makes the effective noise variance depend on the weight magnitude: constraints with larger weights are less affected by censoring.

Equations

HarmonicGrammar.censoredWeight w n = max 0 (w + n)

Instances For

theorem HarmonicGrammar.censoredWeight_nonneg (w n : ℝ) :

0 ≤ censoredWeight w n

Censored weights are non-negative by construction.

theorem HarmonicGrammar.censoredWeight_mono_weight {w₁ w₂ n : ℝ} (hw : w₁ ≤ w₂) :

censoredWeight w₁ n ≤ censoredWeight w₂ n

Censored weights are monotone in the underlying weight.

theorem HarmonicGrammar.censored_nhg_weight_sensitivity (w₁ w₂ : ℝ) (hw : w₁ < w₂) :

∃ (n : ℝ), censoredWeight w₁ n ≠ censoredWeight w₂ n

Weight sensitivity ([Fle21] §7.3): censoring is non-trivial — different weights produce different censored values for some noise realization.

The witness is n = -w₁: this zeroes out w₁ but leaves w₂ > 0.

Multi-Candidate NHG Covariance #

noncomputable def HarmonicGrammar.nhgCovariance {C : Type u_1} {n : ℕ} (con : Constraints.CON C n) (sigma : ℝ) (a b c : C) :

ℝ

NHG noise covariance between two score differences relative to a reference candidate a:

Cov(ε_b − ε_a, ε_c − ε_a) = σ² · Σₖ (cₖ(b) − cₖ(a)) · (cₖ(c) − cₖ(a))

When this is non-zero, the score differences are correlated, and the joint distribution over 3+ candidates is multivariate normal with non-diagonal covariance — not reducible to independent binary comparisons. This is why NHG violates IIA for 3+ candidates ([Fle21] §9).

Equations

HarmonicGrammar.nhgCovariance con sigma a b c = sigma ^ 2 * ∑ i : Fin n, (↑(con i b) - ↑(con i a)) * (↑(con i c) - ↑(con i a))

Instances For

def HarmonicGrammar.nhgCovarianceQ {C : Type u_1} {n : ℕ} (con : Constraints.CON C n) (a b c : C) :

ℚ

NHG covariance (ℚ, computable).

Equations

HarmonicGrammar.nhgCovarianceQ con a b c = ∑ i : Fin n, (↑(con i b) - ↑(con i a)) * (↑(con i c) - ↑(con i a))

Instances For

theorem HarmonicGrammar.nhgCovariance_self {C : Type u_1} {n : ℕ} (con : Constraints.CON C n) (sigma : ℝ) (a b : C) :

nhgCovariance con sigma a b b = sigma ^ 2 * violationDiffSqSum con b a

The NHG self-covariance Cov(ε_b − ε_a, ε_b − ε_a) equals the variance σ² · violationDiffSqSum, recovering the binary case.