Documentation

Linglib.Core.Probability.Softmax

Softmax distribution on PMFs #

The softmax distribution with EReal-valued score is the PMF with mass proportional to exp(score):

softmax score a ∝ exp(score a)

Working with score : α → EReal (rather than score : α → ℝ) is essential: the natural RSA score score = β · log(speakerWeight) is -∞ exactly when the speaker weight is 0, and EReal.exp (-∞) = 0 makes such utterances correctly receive 0 mass. By contrast, Real.log 0 = 0 in mathlib, so a ℝ-valued softmax-of-log construction would silently assign weight 1 to impossible utterances — the bug the EReal substrate prevents at the root.

The inverse-temperature factor β lives inside the score on the consumer side: write softmax (fun a => (β : EReal) * scoreReal a). This keeps the substrate generic over score-construction patterns (@cite{kao-etal-2014-metaphor}'s λ · log Σ L0, @cite{frank-goodman-2012}'s α · log L0, @cite{herbstritt-franke-2019}'s λ · (-Hellinger), etc.).

Main definitions #

PMF.softmax score h_no_top h_some_finite — softmax PMF for an EReal-valued score function bounded above and bounded-below somewhere.

Main statements #

softmax_apply — closed form.
softmax_pos_iff_score_ne_bot — atom has positive softmax mass iff its score is not ⊥.
softmax_lt_iff_score_lt — the structural-decomposition lemma: comparing softmax values reduces to comparing scores. Direct via EReal.exp_lt_exp_iff — no positivity-of-temperature side condition needed because the temperature is folded into score.

Connection to `MeasureTheory.Measure.tilted` #

The general exponential-tilting construction is mathlib's Measure.tilted over arbitrary base measures. The softmax PMF is the uniform-prior case at the PMF level. The PMF-level Esscher transform (tilting an arbitrary PMF) is not yet defined; add it when a consumer needs prior · exp(score) shape that doesn't reduce to softmax.

Architectural role #

Substrate for RSA-style speaker constructions across Phenomena/: @cite{frank-goodman-2012}, @cite{goodman-stuhlmuller-2013}, @cite{kao-etal-2014-metaphor}, @cite{lassiter-goodman-2017}, @cite{herbstritt-franke-2019}, @cite{yoon-etal-2020}, etc. Replaces the per-paper inline PMF.normalize ∘ exp ∘ score pattern with a named primitive backed by the structural-decomposition lemma.

Definition #

noncomputable def PMF.softmaxWeight {α : Type u_1} (score : α → EReal) (a : α) :

ENNReal

The unnormalised softmax weight at a: EReal.exp (score a) : ℝ≥0∞. Returns 0 when score a = ⊥ and ∞ when score a = ⊤.

Equations

PMF.softmaxWeight score a = (score a).exp

Instances For

@[simp]

theorem PMF.softmaxWeight_apply {α : Type u_1} (score : α → EReal) (a : α) :

softmaxWeight score a = (score a).exp

theorem PMF.softmaxWeight_ne_zero_iff {α : Type u_1} (score : α → EReal) (a : α) :

softmaxWeight score a ≠ 0 ↔ score a ≠ ⊥

theorem PMF.softmaxWeight_ne_top_iff {α : Type u_1} (score : α → EReal) (a : α) :

softmaxWeight score a ≠ ⊤ ↔ score a ≠ ⊤

noncomputable def PMF.softmax {α : Type u_1} [Fintype α] (score : α → EReal) (h_no_top : ∀ (a : α), score a ≠ ⊤) (h_some_finite : ∃ (a : α), score a ≠ ⊥) :

PMF α

Softmax distribution for an EReal-valued score function.

For a finite type, the partition function Z = ∑ b, exp(score b) is non-zero iff some atom has score ≠ ⊥ (h_some_finite) and finite iff no atom has score = ⊤ (h_no_top).

The inverse-temperature factor β lives inside score on the consumer side. Typical pattern: score a = (β : EReal) * scoreReal a with scoreReal : α → ℝ (or ℝ embedded in EReal).

Equations

PMF.softmax score h_no_top h_some_finite = PMF.normalize (PMF.softmaxWeight score) ⋯ ⋯

Instances For

Apply formula and basic API #

@[simp]

theorem PMF.softmax_apply {α : Type u_1} [Fintype α] (score : α → EReal) (h_no_top : ∀ (a : α), score a ≠ ⊤) (h_some_finite : ∃ (a : α), score a ≠ ⊥) (a : α) :

(softmax score h_no_top h_some_finite) a = softmaxWeight score a / ∑ b : α, softmaxWeight score b

Closed form of the softmax value at a.

theorem PMF.softmax_partition_ne_zero {α : Type u_1} [Fintype α] (score : α → EReal) (h_some_finite : ∃ (a : α), score a ≠ ⊥) :

∑ b : α, softmaxWeight score b ≠ 0

The partition function is non-zero.

theorem PMF.softmax_partition_ne_top {α : Type u_1} [Fintype α] (score : α → EReal) (h_no_top : ∀ (a : α), score a ≠ ⊤) :

∑ b : α, softmaxWeight score b ≠ ⊤

The partition function is finite.

theorem PMF.softmax_pos_iff_score_ne_bot {α : Type u_1} [Fintype α] (score : α → EReal) (h_no_top : ∀ (a : α), score a ≠ ⊤) (h_some_finite : ∃ (a : α), score a ≠ ⊥) (a : α) :

0 < (softmax score h_no_top h_some_finite) a ↔ score a ≠ ⊥

Softmax positivity criterion: an atom has positive softmax mass iff its score is not ⊥. The score-⊥ atoms are exactly the impossible utterances (e.g., utterances where the literal listener has 0 support along the QUD-projection in @cite{kao-etal-2014-metaphor}).

@[simp]

theorem PMF.support_softmax {α : Type u_1} [Fintype α] (score : α → EReal) (h_no_top : ∀ (a : α), score a ≠ ⊤) (h_some_finite : ∃ (a : α), score a ≠ ⊥) :

(softmax score h_no_top h_some_finite).support = {a : α | score a ≠ ⊥}

The softmax support is exactly the score-finite-below atoms.

Comparison decomposition (RSA workhorse) #

theorem PMF.softmax_lt_iff_score_lt {α : Type u_1} [Fintype α] (score : α → EReal) (h_no_top : ∀ (a : α), score a ≠ ⊤) (h_some_finite : ∃ (a : α), score a ≠ ⊥) (a₁ a₂ : α) :

(softmax score h_no_top h_some_finite) a₁ < (softmax score h_no_top h_some_finite) a₂ ↔ score a₁ < score a₂

Structural-decomposition lemma: comparing softmax values at two atoms reduces to comparing their scores. Direct via the strict monotonicity of EReal.exp (mathlib's EReal.exp_lt_exp_iff).

Foundation lemma for "speaker prefers utterance u₂ over u₁ at world w" claims in RSA: the partition function depends on w but not on the utterance being compared, so it cancels exactly. The score function bundles the inverse-temperature factor (β · log L0 for @cite{kao-etal-2014-metaphor}, λ · -hellingerDist for @cite{herbstritt-franke-2019}, etc.).

theorem PMF.softmax_le_iff_score_le {α : Type u_1} [Fintype α] (score : α → EReal) (h_no_top : ∀ (a : α), score a ≠ ⊤) (h_some_finite : ∃ (a : α), score a ≠ ⊥) (a₁ a₂ : α) :

(softmax score h_no_top h_some_finite) a₁ ≤ (softmax score h_no_top h_some_finite) a₂ ↔ score a₁ ≤ score a₂

≤ companion of softmax_lt_iff_score_lt.

Degenerate cases #

theorem PMF.softmax_const {α : Type u_1} [Fintype α] [Nonempty α] (c : EReal) (hc_ne_top : c ≠ ⊤) (hc_ne_bot : c ≠ ⊥) :

softmax (fun (x : α) => c) ⋯ ⋯ = uniformOfFintype α

Constant-score softmax is uniform: a constant score gives no preference signal.

Bridge to nat-power form #

For natural exponent n, the EReal-softmax of n · log w is the natural-power softmax of w — a finite ratio of w(a)^n over the partition Σ_b w(b)^n. Mathlib's ENNReal.rpow_eq_exp_mul_log and ENNReal.rpow_natCast together give the per-weight identity.

Pivotal for RSA findings discharged via gcongr/norm_num/bound: the EReal form makes the substrate paper-faithful, but the rpow form makes the values computable rationals when w is.

Finite-coefficient times log: finiteness lemmas #

The score (α : EReal) * ENNReal.log x is the canonical RSA speaker score shape. It is finite (∈ (⊥, ⊤)) under the natural side conditions: α real-valued (so neither ⊤ nor ⊥), α ≥ 0, and x either non-zero or non-top respectively.

theorem PMF.coe_mul_log_ne_top {α : ℝ} (hα : 0 ≤ α) {x : ENNReal} (hx_ne_top : x ≠ ⊤) :

↑α * x.log ≠ ⊤

For α : ℝ non-negative and x : ℝ≥0∞ with x ≠ ⊤, the EReal product (α : EReal) * ENNReal.log x is bounded above (≠ ⊤).

theorem PMF.coe_mul_log_ne_bot {α : ℝ} (hα : 0 ≤ α) {x : ENNReal} (hx_ne_zero : x ≠ 0) :

↑α * x.log ≠ ⊥

For α : ℝ non-negative and x : ℝ≥0∞ with x ≠ 0, the EReal product (α : EReal) * ENNReal.log x is bounded below (≠ ⊥).

theorem PMF.softmaxWeight_natMul_log_eq_pow {α : Type u_2} (n : ℕ) (w : α → ENNReal) (a : α) :

softmaxWeight (fun (b : α) => ↑n * (w b).log) a = w a ^ n

EReal-softmax weight equals natural-power weight at each atom, when the underlying weight is in ℝ≥0∞.

theorem PMF.softmax_natMul_log_apply {α : Type u_2} [Fintype α] (n : ℕ) (w : α → ENNReal) (h_no_top : ∀ (a : α), ↑n * (w a).log ≠ ⊤) (h_some_finite : ∃ (a : α), ↑n * (w a).log ≠ ⊥) (a : α) :

(softmax (fun (b : α) => ↑n * (w b).log) h_no_top h_some_finite) a = w a ^ n / ∑ b : α, w b ^ n

Apply formula for natural-power softmax: softmax (n · log w) a = w(a)^n / Σ_b w(b)^n. Direct rewrite of softmax_apply via softmaxWeight_natMul_log_eq_pow.