Coupled Evaluation: Softmax over Product Spaces #

When items are evaluated jointly — via systemic constraints in MaxEnt grammar or via shared latent variables in RSA — the distribution over individual items doesn't factorize. This module formalizes the shared pattern:

Per-item scores s(i, v) that depend only on one item's value
Coupling scores C(f) that evaluate the full assignment jointly
Joint probability via softmax: P(f) ∝ exp(Σᵢ s(i, f(i)) + C(f))
Marginal probability: P_i(v) = Σ_{f : f(i) = v} P(f)

The key theorem is factorization: when the coupling score is constant (no coupling), the joint factorizes and each marginal equals the independent softmax of its per-item score.

Instances #

MaxEnt systemic constraints ([Sto26b]): items = input→output mappings, coupling = *HOMOPHONY or other systemic constraints. componentScore i v = harmonyScore(classicalConstraints, (inputs i, v)), couplingScore f = -Σₖ wₖ · Sₖ(f).
RSA lexical uncertainty ([BLG16]): the analog is a mixture model rather than a coupled exponential — the listener marginalizes over lexicons: L1(u,w) ∝ prior(w) · Σ_l prior(l) · S1(l,w,u). The algebraic form differs (mixture vs coupled exponential), but the phenomenon is identical: joint evaluation creates dependencies between items that would otherwise be independent, and marginalization recovers individual probabilities that reflect the coupling.
See the RSA listener chains for the RSA marginalization pattern.
BToM inference (Core/Agent/BToM.lean): the observer marginalizes over latent mental state variables (percept, belief, desire) to recover world posteriors. worldMarginal(a, w) = Σ_{p,b,d,s,m} joint(a,p,b,d,s,m,w).

Factorization #

The core theorem (marginal_eq_independent_when_uncoupled): when the coupling score is constant, the joint distribution factorizes as a product of independent per-item distributions. This is the mathematical basis for:

Storme: systemic constraint weight = 0 ⟹ marginal = classical MaxEnt
RSA: single lexicon (|Lexicon| = 1) ⟹ L1 = standard Bayesian update
MaxEnt↔OT: softmax_argmax_limit sends MaxEnt → OT; this theorem sends coupled MaxEnt → uncoupled MaxEnt when coupling vanishes

source

structure Core.CoupledSoftmax (Index : Type u_1) (Value : Type u_2) [Fintype Index] [Fintype Value] :

Type (max u_1 u_2)

A coupled evaluation over indexed items, where the joint score may not decompose into independent per-item scores.

The joint probability of assignment f : Index → Value is: P(f) = softmax(totalScore, 1)(f) over the space of all assignments.

When couplingScore is constant, totalScore decomposes additively and the joint factorizes into independent per-item softmax distributions.

componentScore : Index → Value → ℝ
Per-item score: depends only on one item's value. In MaxEnt: harmonyScore(classicalConstraints, (input_i, v)). In RSA: log S1(lexicon, world, utterance).
couplingScore : (Index → Value) → ℝ
Coupling score: evaluates the full assignment jointly. In MaxEnt: systemic constraint penalty (e.g., *HOMOPHONY). In RSA: log prior over shared lexicon.

Instances For

source

noncomputable def Core.CoupledSoftmax.totalScore {I : Type u_1} {V : Type u_2} [Fintype I] [Fintype V] (cs : CoupledSoftmax I V) (f : I → V) :

ℝ

Total score of an assignment: sum of per-item scores plus coupling.

Equations

cs.totalScore f = ∑ i : I, cs.componentScore i (f i) + cs.couplingScore f

Instances For

source

noncomputable def Core.CoupledSoftmax.jointProb {I : Type u_1} {V : Type u_2} [Fintype I] [DecidableEq I] [Fintype V] [Nonempty V] (cs : CoupledSoftmax I V) (f : I → V) :

ℝ

Joint probability: softmax over the space of all assignments I → V.

Equations

cs.jointProb f = Core.softmax cs.totalScore f

Instances For

source

noncomputable def Core.CoupledSoftmax.marginal {I : Type u_1} {V : Type u_2} [Fintype I] [DecidableEq I] [Fintype V] [DecidableEq V] [Nonempty V] (cs : CoupledSoftmax I V) (i : I) (v : V) :

ℝ

Marginal probability: P_i(v) = Σ_{f : f(i) = v} P_joint(f). Marginalizes the joint distribution to recover the probability of a specific value at a specific index.

Equations

cs.marginal i v = ∑ f : I → V, if f i = v then cs.jointProb f else 0

Instances For

source

theorem Core.CoupledSoftmax.marginal_sum_eq_one {I : Type u_1} {V : Type u_2} [Fintype I] [DecidableEq I] [Fintype V] [DecidableEq V] [Nonempty V] (cs : CoupledSoftmax I V) (i : I) :

∑ v : V, cs.marginal i v = 1

Marginals sum to 1 (marginalization preserves total probability).

Proof: each assignment f contributes to exactly one value v = f(i), so Σ_v marginal(i, v) = Σ_v Σ_{f:f(i)=v} P(f) = Σ_f P(f) = 1.

source

theorem Core.CoupledSoftmax.marginal_nonneg {I : Type u_1} {V : Type u_2} [Fintype I] [DecidableEq I] [Fintype V] [DecidableEq V] [Nonempty V] (cs : CoupledSoftmax I V) (i : I) (v : V) :

0 ≤ cs.marginal i v

Marginal probabilities are non-negative.

source

noncomputable def Core.CoupledSoftmax.independentProb {I : Type u_1} {V : Type u_2} [Fintype I] [Fintype V] [Nonempty V] (cs : CoupledSoftmax I V) (i : I) (v : V) :

ℝ

Independent per-item probability: what the marginal would be if there were no coupling. P_indep(i, v) = softmax(componentScore(i, ·), 1)(v).

Equations

cs.independentProb i v = Core.softmax (cs.componentScore i) v

Instances For

source

theorem Core.CoupledSoftmax.marginal_eq_independent_when_uncoupled {I : Type u_1} {V : Type u_2} [Fintype I] [DecidableEq I] [Fintype V] [DecidableEq V] [Nonempty V] (cs : CoupledSoftmax I V) (h_const : ∃ (c : ℝ), ∀ (f : I → V), cs.couplingScore f = c) (i : I) (v : V) :

cs.marginal i v = cs.independentProb i v

Factorization theorem: when coupling is constant (no genuine coupling), the marginal probability equals the independent per-item softmax.

This is the shared mathematical core of:

[Sto26b]: systemic weight = 0 ⟹ marginal = classical MaxEnt
[BLG16]: single lexicon ⟹ L1 = standard posterior
BToM: delta-function latent prior ⟹ marginal = direct inference

Proof #

When couplingScore f = c for all f:

totalScore f = Σᵢ componentScore(i, f(i)) + c
By softmax_add_const, adding c doesn't change softmax: jointProb f = softmax(f ↦ Σᵢ componentScore(i, f(i)), 1)(f)
Since the score decomposes additively over indices, exp(Σᵢ sᵢ) = Πᵢ exp(sᵢ), so the joint factorizes: jointProb f = Πᵢ softmax(componentScore(i, ·))(f(i))
Marginalizing: marginal(i, v) = softmax(componentScore(i, ·))(v) · Π_{j≠i} Σ_{v'} softmax(componentScore(j, ·))(v') = softmax(·)(v) · 1

Step 3 uses the finite Fubini theorem (Fintype.prod_sum): Σ_{f : I → V} Πᵢ g(i, f(i)) = Πᵢ (Σ_v g(i, v)).

The filter [f(i) = v] is encoded into the product via a zero/one trick: define g(j, w) = (j = i ? (w = v ? exp(s_i(v)) : 0) : exp(s_j(w))), so that when f(i) ≠ v the product vanishes (Fubini still applies), and when f(i) = v the product equals ∏_j exp(s_j(f(j))).

source

def Core.CoupledSoftmax.genuinelyCoupled {I : Type u_1} {V : Type u_2} [Fintype I] [Fintype V] (cs : CoupledSoftmax I V) (i : I) :

Prop

A coupled evaluation is genuinely coupled at index i iff there exist two assignments agreeing at i but with different total scores. This is the observable signature of non-factorization: the score of the assignment at position i depends on the values at other positions.

Equations

cs.genuinelyCoupled i = ∃ (f : I → V) (g : I → V), f i = g i ∧ cs.totalScore f ≠ cs.totalScore g

Instances For

source

noncomputable def Core.coupledSoftmaxOfMaxEnt {n : ℕ} {I : Type u_3} {O : Type u_4} [Fintype O] [DecidableEq O] (inputs : Fin n → I) (classicalScore : I × O → ℝ) (systemicScore : (Fin n → O) → ℝ) :

CoupledSoftmax (Fin n) O

Construct a CoupledSoftmax from MaxEnt grammar components.

componentScore i v = classicalScore(inputs(i), v)
couplingScore f = systemicScore(f)

This shows that the MaxEnt systemic constraint framework from Phonology.HarmonicGrammar.MaxEnt is an instance of the general coupled evaluation pattern. The marginal_eq_classical_when_no_systemic theorem in MaxEnt.lean is a corollary of marginal_eq_independent_when_uncoupled.

Equations

One or more equations did not get rendered due to their size.

Instances For

Documentation

Linglib.Core.Probability.CoupledEvaluation

Coupled Evaluation: Softmax over Product Spaces #

Instances #

Factorization #

Proof #