Documentation

Linglib.Core.Probability.Softmax

Softmax distribution on PMFs #

The softmax distribution with EReal-valued score is the PMF with mass proportional to exp(score):

softmax score a ∝ exp(score a)

Working with score : α → EReal (rather than score : α → ℝ) is essential: the natural RSA score score = β · log(speakerWeight) is -∞ exactly when the speaker weight is 0, and EReal.exp (-∞) = 0 makes such utterances correctly receive 0 mass. By contrast, Real.log 0 = 0 in mathlib, so a ℝ-valued softmax-of-log construction would silently assign weight 1 to impossible utterances — the bug the EReal substrate prevents at the root.

The inverse-temperature factor β lives inside the score on the consumer side: write softmax (fun a => (β : EReal) * scoreReal a). This keeps the substrate generic over score-construction patterns (@cite{kao-etal-2014-metaphor}'s λ · log Σ L0, @cite{frank-goodman-2012}'s α · log L0, @cite{herbstritt-franke-2019}'s λ · (-Hellinger), etc.).

Main definitions #

Main statements #

Connection to MeasureTheory.Measure.tilted #

The general exponential-tilting construction is mathlib's Measure.tilted over arbitrary base measures. The softmax PMF is the uniform-prior case at the PMF level. The PMF-level Esscher transform (tilting an arbitrary PMF) is not yet defined; add it when a consumer needs prior · exp(score) shape that doesn't reduce to softmax.

Architectural role #

Substrate for RSA-style speaker constructions across Phenomena/: @cite{frank-goodman-2012}, @cite{goodman-stuhlmuller-2013}, @cite{kao-etal-2014-metaphor}, @cite{lassiter-goodman-2017}, @cite{herbstritt-franke-2019}, @cite{yoon-etal-2020}, etc. Replaces the per-paper inline PMF.normalize ∘ exp ∘ score pattern with a named primitive backed by the structural-decomposition lemma.

Definition #

noncomputable def PMF.softmaxWeight {α : Type u_1} (score : αEReal) (a : α) :
ENNReal

The unnormalised softmax weight at a: EReal.exp (score a) : ℝ≥0∞. Returns 0 when score a = ⊥ and when score a = ⊤.

Equations
Instances For
    @[simp]
    theorem PMF.softmaxWeight_apply {α : Type u_1} (score : αEReal) (a : α) :
    softmaxWeight score a = (score a).exp
    theorem PMF.softmaxWeight_ne_zero_iff {α : Type u_1} (score : αEReal) (a : α) :
    softmaxWeight score a 0 score a
    theorem PMF.softmaxWeight_ne_top_iff {α : Type u_1} (score : αEReal) (a : α) :
    softmaxWeight score a score a
    noncomputable def PMF.softmax {α : Type u_1} [Fintype α] (score : αEReal) (h_no_top : ∀ (a : α), score a ) (h_some_finite : ∃ (a : α), score a ) :
    PMF α

    Softmax distribution for an EReal-valued score function.

    For a finite type, the partition function Z = ∑ b, exp(score b) is non-zero iff some atom has score ≠ ⊥ (h_some_finite) and finite iff no atom has score = ⊤ (h_no_top).

    The inverse-temperature factor β lives inside score on the consumer side. Typical pattern: score a = (β : EReal) * scoreReal a with scoreReal : α → ℝ (or ℝ embedded in EReal).

    Equations
    Instances For

      Apply formula and basic API #

      @[simp]
      theorem PMF.softmax_apply {α : Type u_1} [Fintype α] (score : αEReal) (h_no_top : ∀ (a : α), score a ) (h_some_finite : ∃ (a : α), score a ) (a : α) :
      (softmax score h_no_top h_some_finite) a = softmaxWeight score a / b : α, softmaxWeight score b

      Closed form of the softmax value at a.

      theorem PMF.softmax_partition_ne_zero {α : Type u_1} [Fintype α] (score : αEReal) (h_some_finite : ∃ (a : α), score a ) :
      b : α, softmaxWeight score b 0

      The partition function is non-zero.

      theorem PMF.softmax_partition_ne_top {α : Type u_1} [Fintype α] (score : αEReal) (h_no_top : ∀ (a : α), score a ) :
      b : α, softmaxWeight score b

      The partition function is finite.

      theorem PMF.softmax_pos_iff_score_ne_bot {α : Type u_1} [Fintype α] (score : αEReal) (h_no_top : ∀ (a : α), score a ) (h_some_finite : ∃ (a : α), score a ) (a : α) :
      0 < (softmax score h_no_top h_some_finite) a score a

      Softmax positivity criterion: an atom has positive softmax mass iff its score is not . The score- atoms are exactly the impossible utterances (e.g., utterances where the literal listener has 0 support along the QUD-projection in @cite{kao-etal-2014-metaphor}).

      @[simp]
      theorem PMF.support_softmax {α : Type u_1} [Fintype α] (score : αEReal) (h_no_top : ∀ (a : α), score a ) (h_some_finite : ∃ (a : α), score a ) :
      (softmax score h_no_top h_some_finite).support = {a : α | score a }

      The softmax support is exactly the score-finite-below atoms.

      Comparison decomposition (RSA workhorse) #

      theorem PMF.softmax_lt_iff_score_lt {α : Type u_1} [Fintype α] (score : αEReal) (h_no_top : ∀ (a : α), score a ) (h_some_finite : ∃ (a : α), score a ) (a₁ a₂ : α) :
      (softmax score h_no_top h_some_finite) a₁ < (softmax score h_no_top h_some_finite) a₂ score a₁ < score a₂

      Structural-decomposition lemma: comparing softmax values at two atoms reduces to comparing their scores. Direct via the strict monotonicity of EReal.exp (mathlib's EReal.exp_lt_exp_iff).

      Foundation lemma for "speaker prefers utterance u₂ over u₁ at world w" claims in RSA: the partition function depends on w but not on the utterance being compared, so it cancels exactly. The score function bundles the inverse-temperature factor (β · log L0 for @cite{kao-etal-2014-metaphor}, λ · -hellingerDist for @cite{herbstritt-franke-2019}, etc.).

      theorem PMF.softmax_le_iff_score_le {α : Type u_1} [Fintype α] (score : αEReal) (h_no_top : ∀ (a : α), score a ) (h_some_finite : ∃ (a : α), score a ) (a₁ a₂ : α) :
      (softmax score h_no_top h_some_finite) a₁ (softmax score h_no_top h_some_finite) a₂ score a₁ score a₂

      companion of softmax_lt_iff_score_lt.

      Degenerate cases #

      theorem PMF.softmax_const {α : Type u_1} [Fintype α] [Nonempty α] (c : EReal) (hc_ne_top : c ) (hc_ne_bot : c ) :
      softmax (fun (x : α) => c) = uniformOfFintype α

      Constant-score softmax is uniform: a constant score gives no preference signal.

      Bridge to nat-power form #

      For natural exponent n, the EReal-softmax of n · log w is the natural-power softmax of w — a finite ratio of w(a)^n over the partition Σ_b w(b)^n. Mathlib's ENNReal.rpow_eq_exp_mul_log and ENNReal.rpow_natCast together give the per-weight identity.

      Pivotal for RSA findings discharged via gcongr/norm_num/bound: the EReal form makes the substrate paper-faithful, but the rpow form makes the values computable rationals when w is.

      Finite-coefficient times log: finiteness lemmas #

      The score (α : EReal) * ENNReal.log x is the canonical RSA speaker score shape. It is finite (∈ (⊥, ⊤)) under the natural side conditions: α real-valued (so neither nor ), α ≥ 0, and x either non-zero or non-top respectively.

      theorem PMF.coe_mul_log_ne_top {α : } ( : 0 α) {x : ENNReal} (hx_ne_top : x ) :
      α * x.log

      For α : ℝ non-negative and x : ℝ≥0∞ with x ≠ ⊤, the EReal product (α : EReal) * ENNReal.log x is bounded above (≠ ⊤).

      theorem PMF.coe_mul_log_ne_bot {α : } ( : 0 α) {x : ENNReal} (hx_ne_zero : x 0) :
      α * x.log

      For α : ℝ non-negative and x : ℝ≥0∞ with x ≠ 0, the EReal product (α : EReal) * ENNReal.log x is bounded below (≠ ⊥).

      theorem PMF.softmaxWeight_natMul_log_eq_pow {α : Type u_2} (n : ) (w : αENNReal) (a : α) :
      softmaxWeight (fun (b : α) => n * (w b).log) a = w a ^ n

      EReal-softmax weight equals natural-power weight at each atom, when the underlying weight is in ℝ≥0∞.

      theorem PMF.softmax_natMul_log_apply {α : Type u_2} [Fintype α] (n : ) (w : αENNReal) (h_no_top : ∀ (a : α), n * (w a).log ) (h_some_finite : ∃ (a : α), n * (w a).log ) (a : α) :
      (softmax (fun (b : α) => n * (w b).log) h_no_top h_some_finite) a = w a ^ n / b : α, w b ^ n

      Apply formula for natural-power softmax: softmax (n · log w) a = w(a)^n / Σ_b w(b)^n. Direct rewrite of softmax_apply via softmaxWeight_natMul_log_eq_pow.