Coupled Evaluation: Softmax over Product Spaces #
When items are evaluated jointly — via systemic constraints in MaxEnt grammar or via shared latent variables in RSA — the distribution over individual items doesn't factorize. This module formalizes the shared pattern:
- Per-item scores
s(i, v)that depend only on one item's value - Coupling scores
C(f)that evaluate the full assignment jointly - Joint probability via softmax:
P(f) ∝ exp(Σᵢ s(i, f(i)) + C(f)) - Marginal probability:
P_i(v) = Σ_{f : f(i) = v} P(f)
The key theorem is factorization: when the coupling score is constant (no coupling), the joint factorizes and each marginal equals the independent softmax of its per-item score.
Instances #
MaxEnt systemic constraints ([Sto26b]): items = input→output mappings, coupling = *HOMOPHONY or other systemic constraints.
componentScore i v = harmonyScore(classicalConstraints, (inputs i, v)),couplingScore f = -Σₖ wₖ · Sₖ(f).RSA lexical uncertainty ([BLG16]): the analog is a mixture model rather than a coupled exponential — the listener marginalizes over lexicons:
L1(u,w) ∝ prior(w) · Σ_l prior(l) · S1(l,w,u). The algebraic form differs (mixture vs coupled exponential), but the phenomenon is identical: joint evaluation creates dependencies between items that would otherwise be independent, and marginalization recovers individual probabilities that reflect the coupling.See the RSA listener chains for the RSA marginalization pattern.
BToM inference (
Core/Agent/BToM.lean): the observer marginalizes over latent mental state variables (percept, belief, desire) to recover world posteriors.worldMarginal(a, w) = Σ_{p,b,d,s,m} joint(a,p,b,d,s,m,w).
Factorization #
The core theorem (marginal_eq_independent_when_uncoupled): when the coupling
score is constant, the joint distribution factorizes as a product of
independent per-item distributions. This is the mathematical basis for:
- Storme: systemic constraint weight = 0 ⟹ marginal = classical MaxEnt
- RSA: single lexicon (|Lexicon| = 1) ⟹ L1 = standard Bayesian update
- MaxEnt↔OT:
softmax_argmax_limitsends MaxEnt → OT; this theorem sends coupled MaxEnt → uncoupled MaxEnt when coupling vanishes
A coupled evaluation over indexed items, where the joint score may not decompose into independent per-item scores.
The joint probability of assignment f : Index → Value is:
P(f) = softmax(totalScore, 1)(f) over the space of all assignments.
When couplingScore is constant, totalScore decomposes additively
and the joint factorizes into independent per-item softmax distributions.
- componentScore : Index → Value → ℝ
Per-item score: depends only on one item's value. In MaxEnt:
harmonyScore(classicalConstraints, (input_i, v)). In RSA:log S1(lexicon, world, utterance). - couplingScore : (Index → Value) → ℝ
Coupling score: evaluates the full assignment jointly. In MaxEnt: systemic constraint penalty (e.g., *HOMOPHONY). In RSA: log prior over shared lexicon.
Instances For
Total score of an assignment: sum of per-item scores plus coupling.
Equations
- cs.totalScore f = ∑ i : I, cs.componentScore i (f i) + cs.couplingScore f
Instances For
Joint probability: softmax over the space of all assignments I → V.
Equations
- cs.jointProb f = Core.softmax cs.totalScore f
Instances For
Marginal probability: P_i(v) = Σ_{f : f(i) = v} P_joint(f). Marginalizes the joint distribution to recover the probability of a specific value at a specific index.
Instances For
Marginals sum to 1 (marginalization preserves total probability).
Proof: each assignment f contributes to exactly one value v = f(i),
so Σ_v marginal(i, v) = Σ_v Σ_{f:f(i)=v} P(f) = Σ_f P(f) = 1.
Marginal probabilities are non-negative.
Independent per-item probability: what the marginal would be if there
were no coupling. P_indep(i, v) = softmax(componentScore(i, ·), 1)(v).
Equations
- cs.independentProb i v = Core.softmax (cs.componentScore i) v
Instances For
Factorization theorem: when coupling is constant (no genuine coupling), the marginal probability equals the independent per-item softmax.
This is the shared mathematical core of:
- [Sto26b]: systemic weight = 0 ⟹ marginal = classical MaxEnt
- [BLG16]: single lexicon ⟹ L1 = standard posterior
- BToM: delta-function latent prior ⟹ marginal = direct inference
Proof #
When couplingScore f = c for all f:
totalScore f = Σᵢ componentScore(i, f(i)) + c- By
softmax_add_const, addingcdoesn't change softmax:jointProb f = softmax(f ↦ Σᵢ componentScore(i, f(i)), 1)(f) - Since the score decomposes additively over indices,
exp(Σᵢ sᵢ) = Πᵢ exp(sᵢ), so the joint factorizes:jointProb f = Πᵢ softmax(componentScore(i, ·))(f(i)) - Marginalizing:
marginal(i, v) = softmax(componentScore(i, ·))(v) · Π_{j≠i} Σ_{v'} softmax(componentScore(j, ·))(v') = softmax(·)(v) · 1
Step 3 uses the finite Fubini theorem (Fintype.prod_sum):
Σ_{f : I → V} Πᵢ g(i, f(i)) = Πᵢ (Σ_v g(i, v)).
The filter [f(i) = v] is encoded into the product via a zero/one trick:
define g(j, w) = (j = i ? (w = v ? exp(s_i(v)) : 0) : exp(s_j(w))), so that
when f(i) ≠ v the product vanishes (Fubini still applies), and when
f(i) = v the product equals ∏_j exp(s_j(f(j))).
A coupled evaluation is genuinely coupled at index i iff there exist
two assignments agreeing at i but with different total scores.
This is the observable signature of non-factorization: the score of the
assignment at position i depends on the values at other positions.
Equations
- cs.genuinelyCoupled i = ∃ (f : I → V) (g : I → V), f i = g i ∧ cs.totalScore f ≠ cs.totalScore g
Instances For
Construct a CoupledSoftmax from MaxEnt grammar components.
componentScore i v = classicalScore(inputs(i), v)couplingScore f = systemicScore(f)
This shows that the MaxEnt systemic constraint framework from
Phonology.HarmonicGrammar.MaxEnt is an instance of
the general coupled evaluation pattern. The marginal_eq_classical_when_no_systemic
theorem in MaxEnt.lean is a corollary of marginal_eq_independent_when_uncoupled.
Equations
- One or more equations did not get rendered due to their size.