Documentation

Linglib.Core.Constraint.Dequantization.LogSumExp.Basic

Log-Sum-Exp: The Warped Semiring Bridging MaxEnt and Max-Plus #

@cite{cohen-smith-zaidan-2008} @cite{eisner-2002}

The log-sum-exp operator at temperature α > 0, lse α a b := (1/α) · log(exp(α·a) + exp(α·b)), is the additive operator of a commutative ("warped") semiring on ℝ whose multiplicative operator is +. Three things to know:

  1. Algebra: it satisfies commutativity, associativity, and the semiring distributivity d + (a ⊕_α b) = (d + a) ⊕_α (d + b) for all α ≠ 0. Identity is -∞ (omitted here; see WithBot ℝ for the typeclass-level packaging).
  2. Limit (in LogSumExp/Limit.lean): as α → ∞, lse α a b → max a b.
  3. Bridge to softmax (in Decoder.lean extensions): the softmax probability admits the partition-function form softmax α s c = exp(α · (s c - lseFinset α s f)), which collapses the softmax → argmax limit to the lse → max limit.

Important caveat #

LSE is not idempotent at finite α — lse α a a = a + log 2 / α. So while it is a commutative semiring, it is not a Kleene algebra. Idempotency is exactly what is recovered in the α → ∞ limit, and is what makes max-plus suitable for shortest-path style computation. The "warp" terminology emphasises this deviation from idempotence: lse α only behaves like max up to a temperature-dependent slack.

noncomputable def Core.Constraint.lse (α a b : ) :

Log-sum-exp at temperature α: lse α a b := (1/α) · log(exp(α·a) + exp(α·b)).

Equations
Instances For
    theorem Core.Constraint.exp_alpha_lse (α : ) ( : α 0) (a b : ) :
    Real.exp (α * lse α a b) = Real.exp (α * a) + Real.exp (α * b)

    Cancellation identity (α ≠ 0): exp(α · lse α a b) = exp(α·a) + exp(α·b).

    This is the key building block — every algebraic identity for lse reduces to an identity inside the log, accessed by exponentiating both sides and using this lemma.

    theorem Core.Constraint.lse_comm (α a b : ) :
    lse α a b = lse α b a

    LSE is commutative.

    theorem Core.Constraint.lse_assoc (α : ) ( : α 0) (a b c : ) :
    lse α (lse α a b) c = lse α a (lse α b c)

    LSE is associative (for α ≠ 0).

    Proof: both sides equal (1/α) · log(exp(α·a) + exp(α·b) + exp(α·c)) by the cancellation identity exp_alpha_lse.

    theorem Core.Constraint.add_lse (α : ) ( : α 0) (d a b : ) :
    d + lse α a b = lse α (d + a) (d + b)

    The semiring distributivity law: + distributes over lse α (for α ≠ 0).

    Equivalently, multiplication by exp(α·d) distributes over the sum inside the log. This is the "multiplicative" axiom of the warped semiring (ℝ, lse α, +) — the analogue of d * (a + b) = d*a + d*b.

    theorem Core.Constraint.lse_self (α : ) ( : α 0) (a : ) :
    lse α a a = a + Real.log 2 / α

    The non-idempotency identity: lse α a a = a + log 2 / α for α ≠ 0.

    This is the precise statement that LSE is NOT a tropical/max-plus operation at finite α — there is a log 2 / α "slack" that vanishes only in the α → ∞ limit.

    noncomputable def Core.Constraint.lseFinset {C : Type u_1} (α : ) (s : Finset C) (f : C) :

    The n-ary log-sum-exp over a Finset: lseFinset α s f := (1/α) · log(∑ c ∈ s, exp(α · f c)).

    Equals (1/α) · log Z where Z = ∑ exp(α · f c) is the MaxEnt partition function. The softmax probability of c ∈ s then has the clean form exp(α · (f c - lseFinset α s f)).

    Equations
    Instances For
      theorem Core.Constraint.exp_alpha_lseFinset {C : Type u_1} (α : ) ( : α 0) {s : Finset C} (hne : s.Nonempty) (f : C) :
      Real.exp (α * lseFinset α s f) = cs, Real.exp (α * f c)

      Cancellation: exp(α · lseFinset α s f) = ∑ c ∈ s, exp(α · f c) when α ≠ 0 and s is non-empty.

      theorem Core.Constraint.lseFinset_singleton {C : Type u_1} (α : ) ( : α 0) (c : C) (f : C) :
      lseFinset α {c} f = f c

      The singleton case: lseFinset α {c} f = f c.

      Witnesses that the n-ary lse degenerates to the underlying score on a single-candidate set — there is nothing to aggregate.