Log-Sum-Exp: The Warped Semiring Bridging MaxEnt and Max-Plus #
@cite{cohen-smith-zaidan-2008} @cite{eisner-2002}
The log-sum-exp operator at temperature α > 0,
lse α a b := (1/α) · log(exp(α·a) + exp(α·b)),
is the additive operator of a commutative ("warped") semiring on ℝ
whose multiplicative operator is +. Three things to know:
- Algebra: it satisfies commutativity, associativity, and the
semiring distributivity
d + (a ⊕_α b) = (d + a) ⊕_α (d + b)for allα ≠ 0. Identity is-∞(omitted here; seeWithBot ℝfor the typeclass-level packaging). - Limit (in
LogSumExp/Limit.lean): asα → ∞,lse α a b → max a b. - Bridge to softmax (in
Decoder.leanextensions): the softmax probability admits the partition-function formsoftmax α s c = exp(α · (s c - lseFinset α s f)), which collapses the softmax → argmax limit to the lse → max limit.
Important caveat #
LSE is not idempotent at finite α — lse α a a = a + log 2 / α. So
while it is a commutative semiring, it is not a Kleene algebra.
Idempotency is exactly what is recovered in the α → ∞ limit, and is
what makes max-plus suitable for shortest-path style computation. The
"warp" terminology emphasises this deviation from idempotence:
lse α only behaves like max up to a temperature-dependent slack.
Log-sum-exp at temperature α:
lse α a b := (1/α) · log(exp(α·a) + exp(α·b)).
Equations
- Core.Constraint.lse α a b = 1 / α * Real.log (Real.exp (α * a) + Real.exp (α * b))
Instances For
Cancellation identity (α ≠ 0):
exp(α · lse α a b) = exp(α·a) + exp(α·b).
This is the key building block — every algebraic identity for lse
reduces to an identity inside the log, accessed by exponentiating
both sides and using this lemma.
LSE is associative (for α ≠ 0).
Proof: both sides equal (1/α) · log(exp(α·a) + exp(α·b) + exp(α·c))
by the cancellation identity exp_alpha_lse.
The semiring distributivity law: + distributes over lse α (for
α ≠ 0).
Equivalently, multiplication by exp(α·d) distributes over the
sum inside the log. This is the "multiplicative" axiom of the warped
semiring (ℝ, lse α, +) — the analogue of d * (a + b) = d*a + d*b.
The non-idempotency identity: lse α a a = a + log 2 / α for α ≠ 0.
This is the precise statement that LSE is NOT a tropical/max-plus
operation at finite α — there is a log 2 / α "slack" that vanishes
only in the α → ∞ limit.
The n-ary log-sum-exp over a Finset:
lseFinset α s f := (1/α) · log(∑ c ∈ s, exp(α · f c)).
Equals (1/α) · log Z where Z = ∑ exp(α · f c) is the MaxEnt
partition function. The softmax probability of c ∈ s then has the
clean form exp(α · (f c - lseFinset α s f)).
Equations
- Core.Constraint.lseFinset α s f = 1 / α * Real.log (∑ c ∈ s, Real.exp (α * f c))
Instances For
Cancellation: exp(α · lseFinset α s f) = ∑ c ∈ s, exp(α · f c)
when α ≠ 0 and s is non-empty.
The singleton case: lseFinset α {c} f = f c.
Witnesses that the n-ary lse degenerates to the underlying score on a single-candidate set — there is nothing to aggregate.