Linglib.Phenomena.Phonology.Studies.Flemming2021

@cite{flemming-2021}: Comparing MaxEnt and Noisy Harmonic Grammar

@cite{flemming-2021} compares three stochastic Harmonic Grammar variants — MaxEnt, Noisy HG (NHG), and Normal MaxEnt — identifying logit uniformity as the diagnostic that distinguishes them.

The three models as Random Utility Models

All three HG variants are Random Utility Models (RUMs) differing only in the noise distribution added to the deterministic harmony scores:

Model	Noise target	Distribution	Binary P	Reference
MaxEnt	candidates	Gumbel	logistic(H−H')	`maxent_eq_gumbelRUM`
NHG	weights	Gaussian	Φ((H−H')/σ_d)	`nhg_choiceProb_eq`
Normal MaxEnt	candidates	Gaussian	Φ((H−H')/(ε√2))	`normalMaxEnt_choiceProb_eq`

Key diagnostic: logit uniformity

MaxEnt exhibits logit uniformity (eq (10)): adding one violation of constraint j changes the logit by exactly −wⱼ, regardless of the tableau context. This follows from the log-odds identity (logit_uniformity):

log(P(a)/P(b)) = H(a) − H(b)
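As a numeric illustration only (a Python sketch, not part of the Lean development), the uniform −wⱼ shift can be checked on made-up tableaux. The weights, violation profiles, and helper names (`harmony`, `softmax`, `logit_ab`, `shift`) are all assumptions introduced here; harmony is taken to be the negated weighted violation sum.

```python
import math

def harmony(weights, violations):
    # Assumed convention: harmony is the negated weighted sum of violations.
    return -sum(w * c for w, c in zip(weights, violations))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def logit_ab(weights, viol_a, viol_b):
    # log(P(a)/P(b)) in a two-candidate MaxEnt tableau
    p = softmax([harmony(weights, viol_a), harmony(weights, viol_b)])
    return math.log(p[0] / p[1])

def shift(weights, ctx, j):
    # Change in the logit when candidate a gains one violation of constraint j.
    viol_a, viol_b = ctx
    a_plus = list(viol_a)
    a_plus[j] += 1
    return logit_ab(weights, a_plus, viol_b) - logit_ab(weights, viol_a, viol_b)

weights = [1.5, 0.7, 2.0]          # hypothetical weights
ctx1 = ([1, 0, 0], [0, 0, 1])      # hypothetical tableau 1: (violations of a, of b)
ctx2 = ([0, 2, 1], [3, 0, 0])      # hypothetical tableau 2

shift1 = shift(weights, ctx1, 1)
shift2 = shift(weights, ctx2, 1)
print(shift1, shift2)  # both are -weights[1], up to floating-point rounding
```

The two tableaux start from different logits, yet the shift is the same in both, which is the content of logit uniformity.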

NHG violates logit uniformity because its noise standard deviation σ_d = σ · √(Σ(cⱼ(a)−cⱼ(b))²) (nhgSigmaD) depends on the violation difference profile. The same harmony difference ΔH produces different probits ΔH/σ_d in different contexts.

Normal MaxEnt has probit uniformity (constant σ_d = ε√2) rather than logit uniformity, leading to probit (Φ) rather than logistic probability functions — an empirically distinguishable prediction.
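The contrast between probit and logit uniformity can be sketched numerically. The following Python snippet is an illustration under assumed values (ε = 1 and an arbitrary grid of harmony differences); the helper names are hypothetical and it uses only the binary formula Φ((H−H')/(ε√2)) from the table above.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf is Φ, inv_cdf is Φ⁻¹

def normal_maxent_p(dh, eps):
    # Binary Normal MaxEnt probability for harmony difference dh.
    return STD_NORMAL.cdf(dh / (eps * math.sqrt(2)))

def logit(p):
    return math.log(p / (1 - p))

eps = 1.0                              # assumed noise s.d.
harmony_diffs = [-1.0, 0.0, 1.0, 2.0]  # hypothetical, equally spaced
probs = [normal_maxent_p(dh, eps) for dh in harmony_diffs]

# Successive differences on the probit scale and on the logit scale.
probit_steps = [STD_NORMAL.inv_cdf(q) - STD_NORMAL.inv_cdf(p)
                for p, q in zip(probs, probs[1:])]
logit_steps = [logit(q) - logit(p) for p, q in zip(probs, probs[1:])]

print(probit_steps)  # constant steps of 1/(eps*sqrt(2))
print(logit_steps)   # steps are not all equal
```

Equal harmony steps give equal probit steps but unequal logit steps, so the two models make different predictions once probabilities move away from 0.5.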

French schwa data #

Flemming tests logit uniformity on French schwa deletion across 8 phonological contexts with 6 constraints (Table (35)). Contexts that share the same *Clash violation difference should show the same logit difference under MaxEnt. We encode this data and verify:

logit_uniformity_clash: the *Clash contribution to the harmony difference is identical across all four paired contexts (MaxEnt prediction)
nhg_sigmaD_sq_varies: the NHG noise variance σ_d² differs between paired contexts, violating probit uniformity (NHG prediction)

theorem Flemming2021.maxent_eq_gumbelRUM {C : Type} [Fintype C] [Nonempty C] (constraints : List (Core.Constraint.WeightedConstraint C)) (c : C) :

Core.mcfaddenIntegral (Core.Constraint.harmonyScoreR constraints) 1 c = Core.softmax (Core.Constraint.harmonyScoreR constraints) 1 c

MaxEnt = Gumbel RUM (@cite{flemming-2021} §4/§10): MaxEnt probability is exactly the McFadden integral with Gumbel scale β = 1.

This formalizes the RUM connection: MaxEnt adds i.i.d. Gumbel noise to candidate harmonies, and by McFadden's theorem (mcfaddenIntegral_eq_softmax), the resulting choice probability is softmax — i.e., the standard MaxEnt formula.

theorem Flemming2021.eq10_logit_harmony {C : Type} [Fintype C] [Nonempty C] (constraints : List (Core.Constraint.WeightedConstraint C)) (a b : C) :

Real.log (Core.softmax (Core.Constraint.harmonyScoreR constraints) 1 a / Core.softmax (Core.Constraint.harmonyScoreR constraints) 1 b) = Core.Constraint.harmonyScoreR constraints a - Core.Constraint.harmonyScoreR constraints b

Flemming's eq (10), the MaxEnt logit-harmony identity: log(P_a / P_b) = h_a − h_b, which is logit(P_a) when a and b are the only two candidates. Alias for maxent_logit_harmony.

theorem Flemming2021.iia {C : Type} [Fintype C] [Nonempty C] (constraints : List (Core.Constraint.WeightedConstraint C)) (a b : C) :

Core.softmax (Core.Constraint.harmonyScoreR constraints) 1 a / Core.softmax (Core.Constraint.harmonyScoreR constraints) 1 b = Real.exp (Core.Constraint.harmonyScoreR constraints a - Core.Constraint.harmonyScoreR constraints b)

MaxEnt ratio independence (IIA): P(a)/P(b) = exp(H(a) − H(b)). The probability ratio depends only on the candidates' own scores, not on any other candidates. Corollary of softmax_odds with α = 1.
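A small Python check of ratio independence, offered as a sketch rather than as part of the formalization: the harmony values and the `softmax` helper are hypothetical, and the third candidate is added only to show that it leaves P(a)/P(b) untouched.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

h_a, h_b = -1.0, -2.5   # hypothetical harmonies
h_extra = -0.3          # hypothetical third candidate

p2 = softmax([h_a, h_b])
p3 = softmax([h_a, h_b, h_extra])

ratio_two = p2[0] / p2[1]     # P(a)/P(b) with two candidates
ratio_three = p3[0] / p3[1]   # P(a)/P(b) after adding a third candidate
expected = math.exp(h_a - h_b)
print(ratio_two, ratio_three, expected)
```

The individual probabilities of a and b drop when the third candidate is added, but their ratio stays at exp(H(a) − H(b)).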

theorem Flemming2021.eq9_maxent_binary_logistic (constraints : List (Core.Constraint.WeightedConstraint (Fin 2))) :

Core.softmax (Core.Constraint.harmonyScoreR constraints) 1 0 = Core.logistic (Core.Constraint.harmonyScoreR constraints 0 - Core.Constraint.harmonyScoreR constraints 1)

MaxEnt binary logistic (@cite{flemming-2021} eq (9)/(11)): with two candidates, MaxEnt probability is the logistic function of the harmony difference.

P(0) = 1 / (1 + e^{-(H(0) − H(1))}) = logistic(H(0) − H(1))

Corollary of softmax_binary with α = 1.

def Flemming2021.schwaDiff (ctx : Fin 8) (con : Fin 6) :

ℤ

Violation difference matrix: ə candidate minus ∅ candidate. Rows = 8 contexts, columns = 6 constraints. Constraint order: 0=NoSchwa, 1=*CCC, 2=*Clash, 3=Max, 4=Dep, 5=*Cluster. Table (35) from @cite{flemming-2021}, data from @cite{smith-pater-2020}.

def Flemming2021.clashPairs :

Fin 4 → Fin 8 × Fin 8

The four *Clash pairs: contexts that differ only in *Clash (index 2). Each pair is (without *Clash, with *Clash).

theorem Flemming2021.clash_pairs_identical_except_clash (pair : Fin 4) (j : Fin 6) (hj : j ≠ 2) :

schwaDiff (clashPairs pair).1 j = schwaDiff (clashPairs pair).2 j

*Clash pairs differ only in the *Clash column (index 2): for each pair, all non-*Clash violations are identical.

theorem Flemming2021.clash_diff_is_one (pair : Fin 4) :

schwaDiff (clashPairs pair).2 2 - schwaDiff (clashPairs pair).1 2 = 1

The *Clash violation difference is exactly 1 for all pairs.

theorem Flemming2021.logit_uniformity_clash (w : Fin 6 → ℚ) (pair : Fin 4) :

∑ j : Fin 6, w j * ↑(schwaDiff (clashPairs pair).2 j) - ∑ j : Fin 6, w j * ↑(schwaDiff (clashPairs pair).1 j) = w 2

Logit uniformity for *Clash (@cite{flemming-2021} §7.1): the *Clash contribution to the harmony difference is the same across all four paired contexts.

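For any weights w, the weighted violation-difference sum changes by w 2 · 1 = w 2 between paired contexts, independent of context; this is the quantity the theorem states. In harmony terms, where each violation is penalized by its weight as in eq (10), the same fact is a shift of −w₂ (the *Clash weight). It follows from clash_pairs_identical_except_clash and clash_diff_is_one: since non-*Clash violations are identical in each pair, their weighted contributions cancel, leaving only the *Clash term.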

This is a special case of me_predicts_hz (Separability.lean): the *Clash violation differences are column-insensitive (constant across paired contexts), so the weighted sum satisfies the constant-difference identity.
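The cancellation argument can be illustrated with exact rational arithmetic in Python. This is a sketch under assumptions: the two difference rows below are hypothetical (they are not rows of Table (35), which is not reproduced on this page) and are only built to differ by 1 in column 2; the weights are arbitrary.

```python
from fractions import Fraction

def weighted_sum(w, row):
    # Σ_j w_j * d_j for one context's violation-difference row
    return sum(wj * dj for wj, dj in zip(w, row))

# Hypothetical difference rows for one *Clash pair (index 2 is *Clash).
without_clash = [1, 0, 0, -1, 0, 1]
with_clash    = [1, 0, 1, -1, 0, 1]

# Arbitrary rational weights, mirroring the theorem's w : Fin 6 → ℚ.
w = [Fraction(3, 2), Fraction(1, 4), Fraction(5, 3),
     Fraction(2), Fraction(1, 2), Fraction(7, 10)]

delta = weighted_sum(w, with_clash) - weighted_sum(w, without_clash)
print(delta, w[2])  # both 5/3
```

Every column other than index 2 contributes the same amount to both sums, so only the *Clash weight survives in the difference, whatever the other weights are.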

def Flemming2021.observedP :

Fin 8 → ℚ

Observed probability of schwa realization across 8 contexts. Data from @cite{smith-pater-2020} (Table 2 of @cite{flemming-2021}).

Values are approximate proportions (hundredths). The key pattern: within each *Clash pair, the +*Clash context always has higher P(schwa), consistent with the *Clash constraint favoring schwa insertion.

theorem Flemming2021.clash_increases_schwa (pair : Fin 4) :

observedP (clashPairs pair).1 < observedP (clashPairs pair).2

Adding a *Clash violation increases P(schwa) in every paired context.

def Flemming2021.schwaSqSum (ctx : Fin 8) :

ℕ

Sum of squared violation differences for a context.

This is the study-local analogue of violationDiffSqSumQ from NoisyHG.lean: both compute Σⱼ (cⱼ(ə) − cⱼ(∅))², but schwaSqSum operates on the pre-computed difference matrix schwaDiff (Table (35)) rather than a WeightedConstraint list.

Equations

Flemming2021.schwaSqSum ctx = List.foldl (fun (acc : ℕ) (j : Fin 6) => acc + (Flemming2021.schwaDiff ctx j).natAbs ^ 2) 0 (List.finRange 6)

theorem Flemming2021.nhg_sigmaD_sq_varies :

schwaSqSum 0 = 3 ∧ schwaSqSum 1 = 4 ∧ schwaSqSum 2 = 3 ∧ schwaSqSum 3 = 4 ∧ schwaSqSum 4 = 3 ∧ schwaSqSum 5 = 4 ∧ schwaSqSum 6 = 3 ∧ schwaSqSum 7 = 4

NHG noise variance σ_d² is context-dependent: without *Clash, the squared violation sum is 3; with *Clash, it is 4. The same *Clash violation change therefore produces different σ_d values in different tableaux: with σ = 1, σ_d = √3 vs σ_d = 2 (Table 3 of @cite{flemming-2021}).
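To make the effect of the two σ_d values concrete, here is a short Python sketch. It takes the squared sums 3 and 4 from the theorem and σ = 1; the harmony difference `dh` and the helper name `nhg_sigma_d` are assumptions for illustration.

```python
import math
from statistics import NormalDist

def nhg_sigma_d(sigma, sq_sum):
    # σ_d = σ · sqrt(Σ_j (c_j(a) − c_j(b))²)
    return sigma * math.sqrt(sq_sum)

sigma = 1.0
sd_without = nhg_sigma_d(sigma, 3)  # contexts without *Clash: sqrt(3)
sd_with = nhg_sigma_d(sigma, 4)     # contexts with *Clash: 2

dh = 1.0  # hypothetical harmony difference, held fixed across both contexts

probit_without = dh / sd_without
probit_with = dh / sd_with
p_without = NormalDist().cdf(probit_without)
p_with = NormalDist().cdf(probit_with)
print(probit_without, probit_with)
print(p_without, p_with)
```

With the harmony difference held fixed, the larger σ_d in the *Clash contexts pulls the probit, and hence the probability, toward 0.5.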

noncomputable def Flemming2021.nhgProbitChange (h_init Δh σ_d σ_d' : ℝ) :

ℝ

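NHG probit change when moving from one context to another. The NHG probit is Φ⁻¹(P) = h / σ_d, where h is the harmony difference; this definition gives its change when h moves from h_init to h_init + Δh and the noise s.d. moves from σ_d to σ_d'.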

h_init = initial harmony difference, Δh = harmony change (e.g., −w_Clash), σ_d / σ_d' = noise s.d. before/after the change.

Equations

Flemming2021.nhgProbitChange h_init Δh σ_d σ_d' = (h_init + Δh) / σ_d' - h_init / σ_d

theorem Flemming2021.nhg_probit_change_depends_on_h_init (Δh σ_d σ_d' h₁ h₂ : ℝ) (hσ : σ_d ≠ σ_d') (hσ_pos : 0 < σ_d) (hσ'_pos : 0 < σ_d') (hh : h₁ ≠ h₂) :

nhgProbitChange h₁ Δh σ_d σ_d' ≠ nhgProbitChange h₂ Δh σ_d σ_d'

Probit non-uniformity (@cite{flemming-2021} §7.2): when σ_d ≠ σ_d', the NHG probit change depends on the initial harmony difference h_init.

Two contexts with different initial harmonies h₁ ≠ h₂ but the same *Clash change Δh produce different probit changes. This is because the denominator shift (σ_d → σ_d') rescales the existing harmony difference differently depending on its magnitude.

Concretely, for French schwa with σ = 1 (@cite{flemming-2021} §7.2): adding a *Clash violation changes σ_d from √3 to 2 in all pairs, but the initial harmony difference h_ə − h_∅ differs between pairs (e.g., −2.2 for pair (0,1) vs 0.01 for pair (4,5)), so the probit changes differ despite the same *Clash change.

theorem Flemming2021.nhgProbitChange_decomp (h_init Δh σ_d σ_d' : ℝ) (hσ_pos : 0 < σ_d) (hσ'_pos : 0 < σ_d') :

nhgProbitChange h_init Δh σ_d σ_d' = h_init * (σ_d - σ_d') / (σ_d * σ_d') + Δh / σ_d'

Probit change decomposition (@cite{flemming-2021} eq (38b)): the NHG probit change decomposes into a context-dependent term (proportional to initial harmony difference) and a uniform term.

Δprobit = h_init · (σ_d − σ_d') / (σ_d · σ_d') + Δh / σ_d'

The first term is why NHG violates probit uniformity: it depends on h_init, which varies across contexts.
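The decomposition can be checked numerically with the values quoted above for French schwa (σ_d = √3, σ_d' = 2, initial harmony differences −2.2 and 0.01). In this Python sketch the *Clash change `dh = -1.0` is a hypothetical value, and the function names are illustrative stand-ins for the Lean definitions.

```python
import math

def nhg_probit_change(h_init, dh, sd, sd_new):
    # Mirrors the Lean definition: (h_init + Δh)/σ_d' − h_init/σ_d
    return (h_init + dh) / sd_new - h_init / sd

def decomposed(h_init, dh, sd, sd_new):
    # Context-dependent term plus uniform term, as in the decomposition.
    context_term = h_init * (sd - sd_new) / (sd * sd_new)
    uniform_term = dh / sd_new
    return context_term + uniform_term

sd, sd_new = math.sqrt(3), 2.0
dh = -1.0  # hypothetical harmony change from adding a *Clash violation

change_pair_01 = nhg_probit_change(-2.2, dh, sd, sd_new)
change_pair_45 = nhg_probit_change(0.01, dh, sd, sd_new)
print(change_pair_01, change_pair_45)
print(decomposed(-2.2, dh, sd, sd_new), decomposed(0.01, dh, sd, sd_new))
```

The uniform term Δh / σ_d' is the same for both pairs; the difference between the two probit changes comes entirely from the h_init term.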

@[implicit_reducible]

instance Flemming2021.instDecidableEqCand3 :

DecidableEq Flemming2021.Cand3✝

Equations

Flemming2021.instDecidableEqCand3 x✝ y✝ = if h : Flemming2021.Cand3.ctorIdx✝ x✝ = Flemming2021.Cand3.ctorIdx✝ y✝ then isTrue ⋯ else isFalse ⋯

@[implicit_reducible]

instance Flemming2021.instReprCand3 :

Repr Flemming2021.Cand3✝

Equations

Flemming2021.instReprCand3 = { reprPrec := Flemming2021.instReprCand3.repr }

def Flemming2021.instReprCand3.repr :

Flemming2021.Cand3✝ → ℕ → Std.Format

@[implicit_reducible]

instance Flemming2021.instFintypeCand3 :

Fintype Flemming2021.Cand3✝

Equations

Flemming2021.instFintypeCand3 = { elems := { val := ↑Flemming2021.Cand3.enumList✝, nodup := Flemming2021.Cand3.enumList_nodup✝ }, complete := Flemming2021.instFintypeCand3._proof_1 }

theorem Flemming2021.table45_equal_harmony :

Core.Constraint.harmonyScore Flemming2021.table45C✝ Flemming2021.Cand3.b✝ = Core.Constraint.harmonyScore Flemming2021.table45C✝ Flemming2021.Cand3.c✝

Candidates b and c have equal harmony: H(b) = H(c) = −16.

theorem Flemming2021.table45_nhg_variance_differs :

Core.Constraint.violationDiffSqSumQ Flemming2021.table45C✝ Flemming2021.Cand3.a✝ Flemming2021.Cand3.b✝ ≠ Core.Constraint.violationDiffSqSumQ Flemming2021.table45C✝ Flemming2021.Cand3.a✝ Flemming2021.Cand3.c✝

NHG noise variances differ: σ²_d(b−a) = 5 ≠ 3 = σ²_d(c−a). Equal-harmony candidates can have different NHG probabilities.

theorem Flemming2021.table45_maxent_equal_prob :

Core.softmax (Core.Constraint.harmonyScoreR Flemming2021.table45C✝) 1 Flemming2021.Cand3.b✝ = Core.softmax (Core.Constraint.harmonyScoreR Flemming2021.table45C✝) 1 Flemming2021.Cand3.c✝

In MaxEnt, equal harmony implies equal probability: since softmax(s, α, b) = exp(α·s(b)) / Σ exp(α·s(i)), candidates with the same score get the same numerator and hence the same probability.

This is the MaxEnt half of the §9 contrast: MaxEnt assigns P(b) = P(c) (both have H = −16), while NHG assigns P(b) ≠ P(c) because their noise variances differ (table45_nhg_variance_differs).

theorem Flemming2021.table45_nhg_covariance_value :

Core.Constraint.nhgCovarianceQ Flemming2021.table45C✝ Flemming2021.Cand3.a✝ Flemming2021.Cand3.b✝ Flemming2021.Cand3.c✝ = 3

NHG noise covariance value: Cov(ε_b−ε_a, ε_c−ε_a) = 3σ².

The paper (@cite{flemming-2021} §9, p. 37) computes Cov(ε_a−ε_b, ε_c−ε_b) = 2σ² using candidate b as reference. Our formalization uses candidate a as reference, giving 3σ² — a different but equally valid demonstration that the covariance matrix is non-diagonal.

theorem Flemming2021.table45_nhg_covariance_nonzero :

Core.Constraint.nhgCovarianceQ Flemming2021.table45C✝ Flemming2021.Cand3.a✝ Flemming2021.Cand3.b✝ Flemming2021.Cand3.c✝ ≠ 0

NHG noise covariance is non-zero: Cov(ε_b−ε_a, ε_c−ε_a) ≠ 0. The multivariate normal over score differences has a non-diagonal covariance matrix, so binary comparisons don't determine the joint distribution — NHG violates IIA (@cite{flemming-2021} §9).
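The reference-candidate point can be sketched in Python. This assumes NHG adds independent N(0, σ²) noise to each constraint weight, so the covariance of two harmony-difference noise terms is σ² times the dot product of the corresponding violation-difference vectors; that `nhgCovarianceQ` computes exactly this is an assumption. The violation profiles below are hypothetical, chosen only to reproduce the quoted numbers (5, 3, 3, 2); they are not Flemming's Table (45).

```python
def diff(x, ref):
    return [xi - ri for xi, ri in zip(x, ref)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical violation profiles over three constraints.
viol = {"a": [0, 0, 0], "b": [2, 1, 0], "c": [1, 1, 1]}

def nhg_cov(x, y, ref):
    # Cov(ε_x − ε_ref, ε_y − ε_ref) in units of σ², under the assumption above.
    return dot(diff(viol[x], viol[ref]), diff(viol[y], viol[ref]))

var_b = nhg_cov("b", "b", "a")      # variance of ε_b − ε_a
var_c = nhg_cov("c", "c", "a")      # variance of ε_c − ε_a
cov_ref_a = nhg_cov("b", "c", "a")  # candidate a as reference
cov_ref_b = nhg_cov("a", "c", "b")  # candidate b as reference
print(var_b, var_c, cov_ref_a, cov_ref_b)  # 5 3 3 2
```

Changing the reference candidate changes the covariance value, but it is non-zero either way, which is the property the IIA argument needs.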

def Flemming2021.schwaSquareNull :

Core.Constraint.Square (Fin 8)

The /∅/ square: contexts 0–3 (underlying /∅/, varying onset × stress).

Equations

Flemming2021.schwaSquareNull = { tl := 0, tr := 1, bl := 2, br := 3 }

def Flemming2021.schwaSquareSchwa :

Core.Constraint.Square (Fin 8)

The /ə/ square: contexts 4–7 (underlying /ə/, varying onset × stress).

Equations

Flemming2021.schwaSquareSchwa = { tl := 4, tr := 5, bl := 6, br := 7 }

theorem Flemming2021.schwaNull_independence :

Core.Constraint.ViolDiffIndependence (fun (k : Fin 6) (ctx : Fin 8) => ↑(schwaDiff ctx k)) schwaSquareNull

Violation differences satisfy independence on the /∅/ square: each of the 6 constraints is insensitive to either onset (row) or stress (column).

theorem Flemming2021.schwaSchwa_independence :

Core.Constraint.ViolDiffIndependence (fun (k : Fin 6) (ctx : Fin 8) => ↑(schwaDiff ctx k)) schwaSquareSchwa

Violation differences satisfy independence on the /ə/ square.

theorem Flemming2021.schwaNull_hz (w : Fin 6 → ℝ) :

Core.Constraint.ConstantLogitDiff (fun (ctx : Fin 8) => ∑ k : Fin 6, w k * ↑(schwaDiff ctx k)) schwaSquareNull

HZ's generalization for French schwa (/∅/ square): for any MaxEnt weights, the logit-rate difference across onset types is constant across stress contexts. Derived from me_predicts_hz + schwaNull_independence.

theorem Flemming2021.schwaSchwa_hz (w : Fin 6 → ℝ) :

Core.Constraint.ConstantLogitDiff (fun (ctx : Fin 8) => ∑ k : Fin 6, w k * ↑(schwaDiff ctx k)) schwaSquareSchwa

HZ's generalization for French schwa (/ə/ square).

Flemming's three-candidate table-(45) MaxEnt model is a ConstraintSystem Cand3 ℝ decoded by softmaxDecoder 1. The key observation H(b) = H(c) ⟹ predict b = predict c is a property of any MaxEnt-style decoder: it depends only on the score equality and the symmetry of softmax. By contrast, NHG distinguishes b and c despite equal harmony (table45_nhg_variance_differs, table45_nhg_covariance_nonzero): same score, different decoder, different predict. Normal MaxEnt adds i.i.d. noise to candidates, so like MaxEnt it treats equal-harmony candidates symmetrically.

theorem Flemming2021.flemmingSystem_b_eq_c :

Flemming2021.flemmingSystem✝.predict Flemming2021.Cand3.b✝ = Flemming2021.flemmingSystem✝.predict Flemming2021.Cand3.c✝

The MaxEnt system assigns equal probability to candidates b and c, matching table45_maxent_equal_prob but stated through the generic predict API. This framing makes the distinction between frameworks sharp: NHG (argmax after Gaussian noise on weights) shares the score function and differs only in the decoder, yet assigns different probabilities to b and c despite their identical harmonies. Normal MaxEnt (argmax after i.i.d. Gaussian noise on candidates) also differs only in the decoder, but because its noise treats candidates symmetrically it keeps P(b) = P(c).

theorem Flemming2021.flemmingSystem_isProb :

∑ x : Flemming2021.Cand3✝, Flemming2021.flemmingSystem✝.predict x = 1

The MaxEnt system on Cand3 is a probability distribution.