Softmax Function: Limit Behavior #
§1. Pointwise limits: α → 0 (uniform), α → ∞ (argmax), α → -∞ (argmin).
§2. BToM observer: softmaxObserver_tendsto_one — an observer watching a
softmax agent concentrates on the uniquely optimal state as α → ∞.
This is the RSA–exhaustivity bridge: L1 at α → ∞ computes exh.
The set of indices achieving the maximum score.
Equations
- Softmax.argmaxSet s = {i : ι | ∀ (j : ι), s j ≤ s i}
The set of indices achieving the minimum score.
Equations
- Softmax.argminSet s = {i : ι | ∀ (j : ι), s i ≤ s j}
Fact 4: As α → 0, softmax converges to uniform distribution.
The ratio of non-max to max probability vanishes as α → ∞.
Helper lemma: at an index attaining the maximum, softmax → 1 as α → ∞.
When there's a unique maximum, softmax concentrates on it as α → ∞.
Log-probability difference grows unboundedly.
As α → -∞, softmax concentrates on the minimum.
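The three pointwise limits above (α → 0 uniform, α → ∞ argmax, α → -∞ argmin) can be checked numerically. Here is a minimal Python sketch; the `softmax` helper and the example scores are illustrative choices, not part of the formalization:

```python
import math

def softmax(scores, alpha):
    """Softmax with rationality parameter alpha: P(i) ∝ exp(alpha * score_i)."""
    m = max(alpha * s for s in scores)          # subtract the max exponent for stability
    exps = [math.exp(alpha * s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

scores = [1.0, 3.0, 2.0]   # unique maximum at index 1, unique minimum at index 0

near_uniform = softmax(scores, 1e-9)   # alpha → 0: approaches the uniform distribution
near_argmax  = softmax(scores, 50.0)   # alpha → ∞: mass concentrates on index 1
near_argmin  = softmax(scores, -50.0)  # alpha → -∞: mass concentrates on index 0
```

Large finite α already puts all but an exponentially small remainder of the mass on the arg-max, which is the quantitative content of the concentration lemmas above.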
The IBR limit: hardmax selector.
Equations
- Softmax.hardmax s i_max h_unique i = if i = i_max then 1 else 0
Softmax converges to hardmax as α → ∞ (when the maximum is unique).
Shannon entropy of a distribution.
Equations
- Softmax.entropy p = -∑ i : ι, p i * Real.log (p i)
Maximum possible entropy (uniform distribution).
Equations
- Softmax.maxEntropy ι = Real.log ↑(Fintype.card ι)
As α → 0, entropy of softmax approaches maximum.
As α → ∞ (with unique max), entropy approaches 0.
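Both entropy limits can be illustrated with the same numerical sketch (the `entropy` helper below mirrors the definition −∑ pᵢ log pᵢ, with the usual 0 log 0 = 0 convention; the scores are again illustrative):

```python
import math

def softmax(scores, alpha):
    m = max(alpha * s for s in scores)
    exps = [math.exp(alpha * s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(p):
    """Shannon entropy H(p) = -Σ p_i log p_i, with 0 log 0 := 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

scores = [1.0, 3.0, 2.0]
max_entropy = math.log(len(scores))       # log |ι|, attained by the uniform distribution

h_low  = entropy(softmax(scores, 1e-9))   # alpha → 0: entropy ≈ log 3 (maximal)
h_high = entropy(softmax(scores, 50.0))   # alpha → ∞ with unique max: entropy ≈ 0
```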
Exponential rate of concentration.
For practical computation: when is softmax close enough to hardmax?
If action j has strictly higher score than i, softmax(i, α) → 0 as α → ∞.
Weaker than tendsto_softmax_infty_unique_max: does not require a unique
maximum, just that i is beaten by some j.
Observer posterior for a softmax agent: BToM inference about the state.
An observer sees a softmax agent choose action i and infers the state: P(s | i, α) = softmax(score(·,s), α)(i) · prior(s) / Σ_{s'} softmax(score(·,s'), α)(i) · prior(s').
In RSA: the pragmatic listener L1 observes the speaker's utterance u and
infers the world w. This is L1(w | u) parameterized by α for limits.
Equations
- Softmax.softmaxObserver score prior α i s = Core.softmax (fun (j : ι) => score j s) α i * prior s / ∑ s' : σ, Core.softmax (fun (j : ι) => score j s') α i * prior s'
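A direct Python transcription of this definition may help; the `softmax_observer` function and the two-state, two-action example are illustrative assumptions that mirror the equation above:

```python
import math

def softmax(scores, alpha):
    m = max(alpha * s for s in scores)
    exps = [math.exp(alpha * s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def softmax_observer(score, prior, alpha, i, actions, states):
    """Posterior P(s | agent chose action i) for a softmax(alpha) agent.
    score[j][s] is the agent's utility for action j in state s."""
    def likelihood(s):
        return softmax([score[j][s] for j in actions], alpha)[actions.index(i)]
    z = sum(likelihood(sp) * prior[sp] for sp in states)
    return {s: likelihood(s) * prior[s] / z for s in states}

# Action 0 is uniquely optimal in state "s0" and suboptimal in state "s1",
# so seeing action 0 at large alpha pushes the posterior onto "s0".
score = {0: {"s0": 2.0, "s1": 0.0}, 1: {"s0": 0.0, "s1": 2.0}}
prior = {"s0": 0.5, "s1": 0.5}
post = softmax_observer(score, prior, alpha=10.0, i=0, actions=[0, 1], states=["s0", "s1"])
```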
BToM inference about a softmax agent concentrates on the optimal state.
An observer watches a softmax agent and infers the hidden state. If action i₀ is the unique best action at state s₀, and is not the best at any other state, the observer's posterior P(s₀ | i₀, α) → 1 as α → ∞.
RSA–exhaustivity bridge: the pragmatic listener (L1) observes a softmax speaker (S1) and does Bayesian inversion. At α → ∞, L1 concentrates on worlds where the heard utterance was the speaker's uniquely optimal choice. This IS the exhaustivity operator: L1 at α → ∞ computes exh(u).
For a simple scale {some, all} with worlds {someNotAll, all}:
- "all" is more informative at the "all" world → S1 says "all" there
- "some" is the only true utterance at "someNotAll" → S1 says "some" there
- L1 hearing "some" → concentrates on "someNotAll" → exh(some) = some ∧ ¬all
softmaxObserver is invariant under state-dependent constant shifts.
Adding c(s) to all action scores at state s doesn't change the
observer's posterior, because softmax is translation-invariant
(softmax_add_const).
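The invariance is easy to check numerically: shifting every score by the same constant leaves the softmax output unchanged, because the shift cancels between numerator and normalizer. A small sketch (constant and scores are arbitrary choices):

```python
import math

def softmax(scores, alpha):
    m = max(alpha * s for s in scores)   # max-subtraction also makes the cancellation exact
    exps = [math.exp(alpha * s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

scores = [1.0, 3.0, 2.0]
c = 7.5  # any constant shift

p       = softmax(scores, 2.0)
shifted = softmax([s + c for s in scores], 2.0)   # identical distribution
```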
Open conjectures #
Four open questions about the algebraic and emergent properties of RSA
(previously stubbed in the deleted Core/Conjectures.lean).
TODO: Fixed-point uniqueness — RSA iteration converges to a
unique fixed point for any rationality α > 0.
TODO: Lexicon-refinement monotonicity — if meaning₂ refines
meaning₁ (i.e., meaning₂ u w → meaning₁ u w), then
L1 meaning₂ u w ≤ L1 meaning₁ u w. Refining lexical meanings can only
strengthen pragmatic inferences, never weaken them.
TODO: Tropical limit — in the α → ∞ limit, soft-max S1
converges to iterated best response (argmax / tropical semiring): for
every ε > 0, (S1 α u w − bestResponse u w)² < ε eventually.
TODO: Neural-symbolic emergence — there exists a coarsening map
from a language model's token-level predictions to world-level
predictions such that the result approximates L1 to arbitrary
ε-precision. Bridges next-token modeling to Gricean reasoning.