Documentation

Linglib.Core.Probability.LogitChoice

The softmax function (logit choice rule) #

softmax sᵢ = exp sᵢ / Σⱼ exp sⱼ: the normalized exponential of a score vector over a Fintype, with partition function partitionFn and logSumExp. This is the logit choice rule — the exponential parameterization of Luce's choice rule and the Gibbs/Boltzmann distribution. The inverse-temperature / rationality parameter enters by scaling the argument (softmax (α • s)); elementary properties are surveyed in [FD23].

It is the domain-neutral logit core shared across the library: RSA speakers and listeners (pragmatics), MaxEnt / Noisy HG (phonology), and rational-action agents (Core.RationalAction) all build on it. With the Gumbel→softmax result (mcfaddenIntegral_eq_softmax) it is the logit sibling of the probit choice rule (gaussianChoiceProb).

[UPSTREAM]: Mathlib has Real.exp / Real.sigmoid but no softmax / log-sum-exp layer. Quality-and-role marker: pure-math foundation, no linguistics.

Main definitions #

softmax, partitionFn, logSumExp — the normalized exponential and its normalizing constant.
logit — the inverse of Real.sigmoid.

Main results #

softmax_sum_eq_one, softmax_pos — softmax is a probability distribution.
softmax_odds, log_softmax_odds, softmax_ratio — log-odds are score differences (the defining IIA property).
softmax_binary — the two-element case is Real.sigmoid.
rpow_luce_eq_softmax — Luce's f^α power rule is softmax of log-scores.

The softmax function #

The softmax function σ(s, α)ᵢ = exp(α · sᵢ) / Σⱼ exp(α · sⱼ) is the exponential parameterization of the Luce choice rule; its elementary properties are surveyed in [FD23].

noncomputable def Core.softmax {ι : Type u_1} [Fintype ι] (s : ι → ℝ) :

ι → ℝ

The softmax function: softmax(s)ᵢ = exp(sᵢ) / Σⱼ exp(sⱼ). The inverse-temperature / rationality parameter enters by scaling the argument: softmax (α • s).

Equations

Core.softmax s i = Real.exp (s i) / ∑ j : ι, Real.exp (s j)

Instances For

noncomputable def Core.partitionFn {ι : Type u_1} [Fintype ι] (s : ι → ℝ) :

ℝ

The partition function (normalizing constant) Z = Σⱼ exp(sⱼ).

Equations

Core.partitionFn s = ∑ j : ι, Real.exp (s j)

Instances For

noncomputable def Core.logSumExp {ι : Type u_1} [Fintype ι] (s : ι → ℝ) :

ℝ

Log-sum-exp: log of the partition function.

Equations

Core.logSumExp s = Real.log (∑ j : ι, Real.exp (s j))

Instances For

theorem Core.partitionFn_pos {ι : Type u_1} [Fintype ι] [Nonempty ι] (s : ι → ℝ) :

0 < partitionFn s

The partition function is always positive.

theorem Core.partitionFn_ne_zero {ι : Type u_1} [Fintype ι] [Nonempty ι] (s : ι → ℝ) :

partitionFn s ≠ 0

theorem Core.softmax_pos {ι : Type u_1} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (i : ι) :

0 < softmax s i

Each softmax probability is positive.

theorem Core.softmax_sum_eq_one {ι : Type u_1} [Fintype ι] [Nonempty ι] (s : ι → ℝ) :

∑ i : ι, softmax s i = 1

Softmax probabilities sum to 1.

theorem Core.softmax_nonneg {ι : Type u_1} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (i : ι) :

0 ≤ softmax s i

Softmax is non-negative.

theorem Core.softmax_le_one {ι : Type u_1} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (i : ι) :

softmax s i ≤ 1

Softmax is at most 1.

theorem Core.softmax_odds {ι : Type u_1} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (i j : ι) :

softmax s i / softmax s j = Real.exp (s i - s j)

Odds are determined by score differences: pᵢ/pⱼ = exp(sᵢ - sⱼ).

theorem Core.log_softmax_odds {ι : Type u_1} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (i j : ι) :

Real.log (softmax s i / softmax s j) = s i - s j

Log-odds equal the score difference.

theorem Core.softmax_ratio {ι : Type u_1} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (i j : ι) :

softmax s i = softmax s j * Real.exp (s i - s j)

Ratio form of the odds identity.

noncomputable def Core.logit (p : ℝ) :

ℝ

The logit function L(p) = log(p / (1 - p)) — the inverse of Real.sigmoid on (0, 1) (mathlib provides Real.sigmoid but not its inverse).

Equations

Core.logit p = Real.log (p / (1 - p))

Instances For

theorem Core.logit_sigmoid (x : ℝ) :

logit x.sigmoid = x

logit inverts Real.sigmoid.

theorem Core.sigmoid_logit {p : ℝ} (hp0 : 0 < p) (hp1 : p < 1) :

(logit p).sigmoid = p

Real.sigmoid inverts logit for 0 < p < 1.

theorem Core.softmax_binary (s : Fin 2 → ℝ) (α : ℝ) :

softmax (α • s) 0 = (α * (s 0 - s 1)).sigmoid

For n = 2, softmax reduces to Real.sigmoid.

theorem Core.logit_softmax_binary (s : Fin 2 → ℝ) (α : ℝ) :

logit (softmax (α • s) 0) = α * (s 0 - s 1)

Softmax log-odds equals logit of the binary softmax probability (when there are exactly two alternatives).

theorem Core.softmax_add_const {ι : Type u_1} [Fintype ι] (s : ι → ℝ) (c : ℝ) :

(softmax fun (i : ι) => s i + c) = softmax s

Softmax is translation invariant.

theorem Core.softmax_mono {ι : Type u_1} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (i j : ι) (hij : s i ≤ s j) :

softmax s i ≤ softmax s j

Higher scores get higher probabilities.

theorem Core.softmax_strict_mono {ι : Type u_1} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (i j : ι) (hij : s i < s j) :

softmax s i < softmax s j

Strict monotonicity.

theorem Core.softmax_zero {ι : Type u_1} [Fintype ι] :

softmax 0 = fun (x : ι) => 1 / ↑(Fintype.card ι)

Constant scores give the uniform distribution (the α = 0 case of softmax (α • s)).

theorem Core.log_softmax {ι : Type u_1} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (i : ι) :

Real.log (softmax s i) = s i - Real.log (partitionFn s)

Log of softmax = score minus log partition function.

theorem Core.softmax_exponential_family {ι : Type u_1} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (i : ι) :

softmax s i = Real.exp (s i - logSumExp s)

Softmax is an exponential family distribution.

theorem Core.rpow_luce_eq_softmax {ι : Type u_1} [Fintype ι] [Nonempty ι] (f : ι → ℝ) (α : ℝ) (hf : ∀ (i : ι), 0 < f i) (i : ι) :

f i ^ α / ∑ j : ι, f j ^ α = softmax (α • fun (j : ι) => Real.log (f j)) i

Luce choice with rpow scores equals softmax over scaled log scores: f(i)^α / Σⱼ f(j)^α = softmax (α • (log ∘ f)) i when all f(i) > 0. This connects belief-based RSA (which uses rpow) to the softmax framework.