Linglib.Theories.Pragmatics.RSA.Limits

Softmax Function: Limit Behavior #

§1. Pointwise limits: α → 0 (uniform), α → ∞ (argmax), α → -∞ (argmin). §2. BToM observer: softmaxObserver_tendsto_one — an observer watching a softmax agent concentrates on the uniquely optimal state as α → ∞. This is the RSA–exhaustivity bridge: L1 at α → ∞ computes exh.

def Softmax.argmaxSet {ι : Type u_1} (s : ι → ℝ) :
Set ι

The set of indices achieving the maximum score.

Equations
Instances For
def Softmax.argminSet {ι : Type u_1} (s : ι → ℝ) :
Set ι

The set of indices achieving the minimum score.

Equations
Instances For
noncomputable def Softmax.maxScore {ι : Type u_1} [Nonempty ι] (s : ι → ℝ) :
ℝ

Maximum score value.

Equations
Instances For
noncomputable def Softmax.minScore {ι : Type u_1} [Nonempty ι] (s : ι → ℝ) :
ℝ

Minimum score value.

Equations
Instances For
theorem Softmax.tendsto_softmax_zero {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i : ι) :
Filter.Tendsto (fun (α : ℝ) => Core.softmax s α i) (nhds 0) (nhds (1 / ↑(Fintype.card ι)))

Fact 4: As α → 0, softmax converges to the uniform distribution.
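The α → 0 endpoint can be checked numerically. This is a sketch, not library code: it assumes `Core.softmax` is the standard exponential-weight form p i ∝ exp(α · s i).

```python
import math

def softmax(s, alpha):
    # p_i proportional to exp(alpha * s_i); subtract the max for numerical stability
    m = max(alpha * x for x in s)
    w = [math.exp(alpha * x - m) for x in s]
    z = sum(w)
    return [x / z for x in w]

scores = [1.0, 2.0, 5.0]
p = softmax(scores, 1e-9)  # alpha near 0: all three weights are ~equal
```

With α ≈ 0 the exponents are nearly identical, so each index gets mass close to 1 / Fintype.card ι = 1/3.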

theorem Softmax.softmax_ratio_tendsto_zero {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i j : ι) (hij : s i < s j) :
Filter.Tendsto (fun (α : ℝ) => Core.softmax s α i / Core.softmax s α j) Filter.atTop (nhds 0)

The ratio of a lower-scoring index's probability to a higher-scoring one's vanishes as α → ∞.

theorem Softmax.tendsto_softmax_infty_at_max {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i_max : ι) (h_unique : ∀ (j : ι), j ≠ i_max → s j < s i_max) :
Filter.Tendsto (fun (α : ℝ) => Core.softmax s α i_max) Filter.atTop (nhds 1)

Helper lemma: at the unique maximum, softmax → 1 as α → ∞.

theorem Softmax.tendsto_softmax_infty_unique_max {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i_max : ι) (h_unique : ∀ (j : ι), j ≠ i_max → s j < s i_max) (i : ι) :
Filter.Tendsto (fun (α : ℝ) => Core.softmax s α i) Filter.atTop (nhds (if i = i_max then 1 else 0))

When there's a unique maximum, softmax concentrates on it as α → ∞.

theorem Softmax.log_softmax_ratio_tendsto {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i j : ι) (hij : s i < s j) :
Filter.Tendsto (fun (α : ℝ) => Real.log (Core.softmax s α j / Core.softmax s α i)) Filter.atTop Filter.atTop

Log-probability difference grows unboundedly.

theorem Softmax.tendsto_softmax_neg_infty_unique_min {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i_min : ι) (h_unique : ∀ (j : ι), j ≠ i_min → s i_min < s j) (i : ι) :
Filter.Tendsto (fun (α : ℝ) => Core.softmax s α i) Filter.atBot (nhds (if i = i_min then 1 else 0))

As α → -∞, softmax concentrates on the minimum.
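Both one-sided limits are easy to see numerically. A sketch, again assuming `Core.softmax` is p i ∝ exp(α · s i):

```python
import math

def softmax(s, alpha):
    # p_i proportional to exp(alpha * s_i); subtract the max for numerical stability
    m = max(alpha * x for x in s)
    w = [math.exp(alpha * x - m) for x in s]
    z = sum(w)
    return [x / z for x in w]

scores = [1.0, 2.0, 5.0]       # unique max at index 2, unique min at index 0
p_hi = softmax(scores, 50.0)   # alpha -> +inf: mass concentrates on the argmax
p_lo = softmax(scores, -50.0)  # alpha -> -inf: mass concentrates on the argmin
```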

noncomputable def Softmax.hardmax {ι : Type u_1} [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i_max : ι) (h_unique : ∀ (j : ι), j ≠ i_max → s j < s i_max) :
ι → ℝ

The IBR limit: hardmax selector.

Equations
Instances For
theorem Softmax.softmax_tendsto_hardmax {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i_max : ι) (h_unique : ∀ (j : ι), j ≠ i_max → s j < s i_max) (i : ι) :
Filter.Tendsto (fun (α : ℝ) => Core.softmax s α i) Filter.atTop (nhds (hardmax s i_max h_unique i))

Softmax converges to hardmax as α → ∞ (when the maximum is unique).

noncomputable def Softmax.entropy {ι : Type u_1} [Fintype ι] [Nonempty ι] (p : ι → ℝ) :
ℝ

Shannon entropy of a distribution.

Equations
Instances For
noncomputable def Softmax.maxEntropy (ι : Type u_2) [Fintype ι] :
ℝ

Maximum possible entropy (uniform distribution).

Equations
Instances For
theorem Softmax.entropy_tendsto_max {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) :
Filter.Tendsto (fun (α : ℝ) => entropy (Core.softmax s α)) (nhds 0) (nhds (maxEntropy ι))

As α → 0, entropy of softmax approaches maximum.

theorem Softmax.entropy_tendsto_zero {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i_max : ι) (h_unique : ∀ (j : ι), j ≠ i_max → s j < s i_max) :
Filter.Tendsto (fun (α : ℝ) => entropy (Core.softmax s α)) Filter.atTop (nhds 0)

As α → ∞ (with a unique maximum), entropy approaches 0.
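The two entropy endpoints can be verified numerically. A sketch under the same assumptions as above (standard softmax, Shannon entropy with the 0 · log 0 = 0 convention):

```python
import math

def softmax(s, alpha):
    # p_i proportional to exp(alpha * s_i); subtract the max for numerical stability
    m = max(alpha * x for x in s)
    w = [math.exp(alpha * x - m) for x in s]
    z = sum(w)
    return [x / z for x in w]

def entropy(p):
    # Shannon entropy; terms with p = 0 contribute nothing
    return -sum(x * math.log(x) for x in p if x > 0)

scores = [1.0, 2.0, 5.0]
h_uniform = entropy(softmax(scores, 1e-9))  # alpha -> 0: near log 3, the maximum
h_peaked = entropy(softmax(scores, 50.0))   # alpha -> inf: near 0
```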

theorem Softmax.softmax_exponential_decay {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i_max : ι) (h_max : ∀ (j : ι), s j ≤ s i_max) (i : ι) (hi : s i < s i_max) :
∃ C > 0, ∀ α > 0, Core.softmax s α i ≤ C * Real.exp (-α * (s i_max - s i))

Exponential rate of concentration.

theorem Softmax.softmax_negligible {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i_max : ι) (h_max : ∀ (j : ι), s j ≤ s i_max) (ε : ℝ) (hε : 0 < ε) (gap : ℝ) (hgap : 0 < gap) (h_gap_bound : ∀ (j : ι), j ≠ i_max → s i_max - s j ≥ gap) (α : ℝ) :
α > 1 / gap * |Real.log ε| → ∀ (j : ι), j ≠ i_max → Core.softmax s α j < ε

For practical computation: once α exceeds |log ε| / gap, every non-maximal probability is below ε, so softmax is ε-close to hardmax.
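The threshold can be exercised on concrete numbers. A sketch (standard softmax assumed; the example's scores and ε are arbitrary):

```python
import math

def softmax(s, alpha):
    # p_i proportional to exp(alpha * s_i); subtract the max for numerical stability
    m = max(alpha * x for x in s)
    w = [math.exp(alpha * x - m) for x in s]
    z = sum(w)
    return [x / z for x in w]

scores = [1.0, 2.0, 5.0]                 # i_max = 2; smallest deficit is 5 - 2 = 3
eps, gap = 1e-6, 3.0
alpha = abs(math.log(eps)) / gap + 1.0   # strictly past the bound |log eps| / gap
p = softmax(scores, alpha)
ok = all(p[j] < eps for j in range(3) if j != 2)
```

Here |log ε| / gap ≈ 13.82 / 3 ≈ 4.61, and at α ≈ 5.61 every non-maximal probability is already below 10⁻⁶.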

theorem Softmax.tendsto_softmax_infty_not_max {ι : Type u_1} [Fintype ι] [DecidableEq ι] [Nonempty ι] (s : ι → ℝ) (i j : ι) (hij : s i < s j) :
Filter.Tendsto (fun (α : ℝ) => Core.softmax s α i) Filter.atTop (nhds 0)

If action j has a strictly higher score than i, then softmax(i, α) → 0 as α → ∞. Weaker than tendsto_softmax_infty_unique_max: it does not require a unique maximum, only that i is beaten by some j.

noncomputable def Softmax.softmaxObserver {ι : Type u_1} [Fintype ι] {σ : Type u_2} [Fintype σ] [Nonempty ι] (score : ι → σ → ℝ) (prior : σ → ℝ) (α : ℝ) (i : ι) (s : σ) :
ℝ

Observer posterior for a softmax agent: BToM inference about the state.

An observer sees a softmax agent choose action i and infers the state: P(s | i, α) = softmax(score(·,s), α)(i) · prior(s) / Σ_{s'} ...

In RSA terms: the pragmatic listener L1 observes the speaker's utterance u and infers the world w. This is L1(w | u), parameterized by α so that limits can be taken.
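A numerical mirror of this definition may help. The Python below is a sketch, not the library's code; `softmax_observer` and the toy `score` table are illustrative names.

```python
import math

def softmax(s, alpha):
    # p_i proportional to exp(alpha * s_i); subtract the max for numerical stability
    m = max(alpha * x for x in s)
    w = [math.exp(alpha * x - m) for x in s]
    z = sum(w)
    return [x / z for x in w]

def softmax_observer(score, prior, alpha, i):
    # Posterior over states after observing action i:
    #   P(s | i, alpha) proportional to softmax(score(., s), alpha)(i) * prior(s)
    like = [softmax([row[s] for row in score], alpha)[i] * prior[s]
            for s in range(len(prior))]
    z = sum(like)
    return [x / z for x in like]

# score[action][state]: action 0 is uniquely best in state 0,
# and beaten by action 1 in state 1
score = [[2.0, 0.0],
         [1.0, 3.0]]
post = softmax_observer(score, [0.5, 0.5], 20.0, 0)  # observe action 0, large alpha
```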

Equations
Instances For
theorem Softmax.softmaxObserver_tendsto_one {ι : Type u_1} [Fintype ι] [DecidableEq ι] {σ : Type u_2} [Fintype σ] [DecidableEq σ] [Nonempty ι] [Nonempty σ] (score : ι → σ → ℝ) (prior : σ → ℝ) (i₀ : ι) (s₀ : σ) (h_opt : ∀ (j : ι), j ≠ i₀ → score j s₀ < score i₀ s₀) (h_nonopt : ∀ (s : σ), s ≠ s₀ → ∃ (j : ι), score i₀ s < score j s) (h_prior : 0 < prior s₀) :
Filter.Tendsto (fun (α : ℝ) => softmaxObserver score prior α i₀ s₀) Filter.atTop (nhds 1)

BToM inference about a softmax agent concentrates on the optimal state.

An observer watches a softmax agent and infers the hidden state. If action i₀ is the unique best action at state s₀, and is not the best action at any other state, then the observer's posterior P(s₀ | i₀, α) → 1 as α → ∞.

RSA–exhaustivity bridge: the pragmatic listener (L1) observes a softmax speaker (S1) and performs Bayesian inversion. As α → ∞, L1 concentrates on the worlds where the heard utterance was the speaker's uniquely optimal choice. This IS the exhaustivity operator: L1 at α → ∞ computes exh(u).

For a simple scale {some, all} with worlds {someNotAll, all}:

• "all" is more informative at the "all" world → S1 says "all" there
• "some" is the only true utterance at "someNotAll" → S1 says "some" there
• L1 hearing "some" → concentrates on "someNotAll" → exh(some) = some ∧ ¬all
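The scalar-implicature example above can be simulated end to end. This is a sketch of a minimal RSA pipeline, not library code: the log-informativity speaker utility and the large negative stand-in for log 0 are modeling assumptions.

```python
import math

def softmax(s, alpha):
    # p_i proportional to exp(alpha * s_i); subtract the max for numerical stability
    m = max(alpha * x for x in s)
    w = [math.exp(alpha * x - m) for x in s]
    z = sum(w)
    return [x / z for x in w]

# worlds: 0 = someNotAll, 1 = all; utterances: 0 = "some", 1 = "all"
truth = [[True, True],    # "some" is true in both worlds
         [False, True]]   # "all" is true only in the all-world
NEG = -1e6                # stand-in for log 0 (utterance false in this world)

def l0(u):
    # literal listener: uniform over the worlds where u is true
    tw = [w for w in range(2) if truth[u][w]]
    return [1 / len(tw) if w in tw else 0.0 for w in range(2)]

def s1(w, alpha):
    # speaker: softmax over utterances of log L0(w | u)
    scores = [math.log(l0(u)[w]) if l0(u)[w] > 0 else NEG for u in range(2)]
    return softmax(scores, alpha)

def l1(u, alpha):
    # pragmatic listener: Bayesian inversion of s1 under a uniform world prior
    like = [s1(w, alpha)[u] for w in range(2)]
    z = sum(like)
    return [x / z for x in like]

post = l1(0, 30.0)  # hear "some" at large alpha
```

At large α the posterior mass after "some" sits almost entirely on someNotAll, which is exactly exh(some) = some ∧ ¬all.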
theorem Softmax.softmaxObserver_add_const {ι : Type u_1} [Fintype ι] [DecidableEq ι] {σ : Type u_2} [Fintype σ] [DecidableEq σ] [Nonempty ι] (score : ι → σ → ℝ) (prior c : σ → ℝ) (α : ℝ) (i : ι) (s : σ) :
softmaxObserver (fun (j : ι) (t : σ) => score j t + c t) prior α i s = softmaxObserver score prior α i s

softmaxObserver is invariant under state-dependent constant shifts.

Adding c(s) to all action scores at state s doesn't change the observer's posterior, because softmax is translation-invariant (softmax_add_const).
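The invariance is easy to confirm numerically with the same hypothetical observer sketch as above (illustrative names, not library code):

```python
import math

def softmax(s, alpha):
    # p_i proportional to exp(alpha * s_i); subtract the max for numerical stability
    m = max(alpha * x for x in s)
    w = [math.exp(alpha * x - m) for x in s]
    z = sum(w)
    return [x / z for x in w]

def softmax_observer(score, prior, alpha, i):
    # P(s | i, alpha) proportional to softmax(score(., s), alpha)(i) * prior(s)
    like = [softmax([row[s] for row in score], alpha)[i] * prior[s]
            for s in range(len(prior))]
    z = sum(like)
    return [x / z for x in like]

score = [[2.0, 0.0],
         [1.0, 3.0]]
c = [7.0, -3.0]  # a state-dependent constant shift c(s)
shifted = [[score[j][s] + c[s] for s in range(2)] for j in range(2)]
p1 = softmax_observer(score, [0.4, 0.6], 1.7, 0)
p2 = softmax_observer(shifted, [0.4, 0.6], 1.7, 0)
```

Per state, the shift multiplies every action's weight by the same factor exp(α · c(s)), which cancels in the softmax normalization, so the posterior is unchanged up to floating-point rounding.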

Open conjectures #

Four open questions about the algebraic and emergent properties of RSA (previously stubbed in the deleted Core/Conjectures.lean).

TODO: Fixed-point uniqueness — RSA iteration converges to a unique fixed point for any rationality α > 0.

TODO: Lexicon-refinement monotonicity — if meaning₂ refines meaning₁ (i.e., meaning₂ u w → meaning₁ u w), then L1 meaning₂ u w ≤ L1 meaning₁ u w. Refining lexical meanings can only strengthen pragmatic inferences, never weaken them.

TODO: Tropical limit — in the α → ∞ limit, the softmax speaker S1 converges to iterated best response (argmax / tropical semiring): for every ε > 0, (S1 α u w − bestResponse u w)² < ε eventually.

TODO: Neural-symbolic emergence — there exists a coarsening map from a language model's token-level predictions to world-level predictions such that the result approximates L1 to arbitrary ε-precision. This would bridge next-token modeling to Gricean reasoning.