MaxEnt Constraint Systems [Sto26b] #
MaxEnt grammars assign probabilities to input→output mappings using the
softmax function over weighted constraint violations — the probabilistic
generalization of OT (where α → ∞ recovers categorical optimization;
see softmax_argmax_limit).
This module extends the basic MaxEnt setup with systemic constraints ([Sto26b]): constraints that evaluate tuples of mappings jointly (e.g., *HOMOPHONY penalizes distinct inputs that map to the same output). Systemic constraints require evaluating a joint distribution over the product space of all input→output mappings, then marginalizing to recover individual mapping probabilities.
The framework is generic in the candidate type — phonology is one instance, but the same machinery applies to any domain that scores candidates by weighted constraint violations.
Architecture #
- A classical MaxEnt grammar is a constraint set
con : CON (I × O) n+ weight vectorw(fromConstraints.Defs), realized as aConstraintSystemviaCON.hgSystem; its probability ispredict(no grammar record). SystemicConstraint n O := (Fin n → O) → ℕscores a whole output tuple (the systemic twin ofConstraint); weights are a separateFin k → ℝvector — no grammar record, mirroring the classicalcon/wsplitjointHarmonyScorecombines classical + systemic scores over the product spacemarginalProbmarginalizes the joint to recover individual mapping probabilities
Key theorem #
marginal_eq_classical_when_no_systemic: when systemic constraint weight = 0,
the marginalized probability equals classical MaxEnt. This is because the joint
score decomposes additively ⇒ the joint distribution factorizes ⇒ each marginal
equals its factor.
Classical MaxEnt #
Classical MaxEnt assigns P(o ∣ i) ∝ exp(-harmonyScore con w (i, o)). There is
no grammar record: a MaxEnt grammar is a constraint set con : CON (I × O) n
and a weight vector w : Fin n → ℝ held as plain values (the HG twin of an OT
Ranking), and its probability is (CON.hgSystem con w Finset.univ).predict —
softmax-decoded harmony, via the shared Core.Optimization.ConstraintSystem
(CON.hgSystem below). The systemic extension (§2–4) scores output tuples
jointly and so does not factor through the per-mapping predict.
Systemic Constraints #
A systemic constraint evaluates a whole output tuple — one output per
input — rather than an individual input→output pair, so it cannot be
decomposed into per-mapping evaluations (e.g. *HOMOPHONY counts colliding
output pairs). Like a Constraint, it is its violation-counting function;
its weight is supplied separately (the systemic twin of an HG weight).
Equations
- HarmonicGrammar.SystemicConstraint n O = ((Fin n → O) → ℕ)
Instances For
*HOMOPHONY: penalizes output tuples where distinct inputs receive the same
output, counting the colliding pairs |{(i, j) : i < j ∧ f i = f j}|.
Equations
- HarmonicGrammar.homophonyAvoidance f = {p : Fin n × Fin n | p.1 < p.2 ∧ f p.1 = f p.2}.card
Instances For
Joint Distribution with Systemic Constraints #
Systemic harmony of an output tuple: -∑ⱼ swⱼ · sconⱼ(f), the negated
weighted sum of the systemic constraints' tuple-violation counts. The
systemic twin of harmonyScore — same negated-weighted-sum shape, but
scoring whole tuples; it is the coupling component of the joint score.
Equations
- HarmonicGrammar.systemicScore sw scon f = -∑ j : Fin k, sw j * ↑(scon j f)
Instances For
Joint harmony score over the product space, combining classical per-mapping
scores with the systemic tuple-level score:
H_joint(f) = ∑ᵢ H(iᵢ, f i) + systemicScore sw scon f.
Equations
- One or more equations did not get rendered due to their size.
Instances For
MaxEnt grammar with systemic constraints as a CoupledSoftmax:
componentScore i v = harmonyScore classicalCon classicalW (inputs i, v) and
couplingScore f = systemicScore sw scon f. The joint probability is a
softmax over all Fin n → O output tuples; its marginal at position i
recovers the individual mapping probability under systemic pressure.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Marginal probability P(oᵢ ∣ iᵢ) = ∑_{f : f i = oᵢ} P_joint(f): marginalize
the joint distribution to recover a specific mapping's probability under
systemic pressure ([Sto26b]'s key equation). Defined through
CoupledSoftmax.marginal so that factorization follows from
marginal_eq_independent_when_uncoupled.
Equations
- HarmonicGrammar.marginalProb inputs classicalCon classicalW sw scon i o = (HarmonicGrammar.maxEntCoupled inputs classicalCon classicalW sw scon).marginal i o
Instances For
Factorization Theorem #
Factorization: when systemic weights are all zero, the marginal equals
the classical MaxEnt probability. The coupling score is then constant (= 0),
so marginal_eq_independent_when_uncoupled applies: the joint factorizes and
each marginal equals its independent per-item softmax.