Documentation

Linglib.Core.Agent.RationalAction

Rational Action @cite{luce-1959} #

The mathematical foundation for all soft-rational agents: RSA speakers/listeners, BToM agents, and decision-theoretic actors. Background: @cite{cover-thomas-2006}, @cite{zaslavsky-hu-levy-2020}, @cite{adams-messick-1958}.

Architecture #

A RationalAction agent selects actions with probability proportional to a non-negative score function — the Luce choice rule. This is the unique choice rule satisfying IIA (independence of irrelevant alternatives): the relative probability of two actions depends only on their scores.

The key mathematical results characterizing this choice rule are:

  1. Softmax (§2): The exponential parameterization score = exp(α · utility) gives policy = softmax(utility, α). This is the standard form in RSA.

  2. Gibbs Variational Principle (§3): Softmax uniquely maximizes H(p) + α · ⟨p, s⟩ on the probability simplex. This is the mathematical foundation for RSA convergence.

  3. Maximum Entropy (§4): Softmax is the max-entropy distribution subject to an expected-utility constraint. Equivalently, it minimizes free energy (the Boltzmann distribution from statistical mechanics).

  4. Bayesian Optimality (§5): The Bayesian posterior maximizes expected log-likelihood. This is the listener half of RSA convergence.

structure Core.RationalAction (State : Type u_1) (Action : Type u_2) [Fintype Action] :
Type (max u_1 u_2)

A rational action agent: selects actions with probability ∝ score(state, action).

This is the Luce choice rule. The score function is unnormalized; normalization to a proper distribution is derived (see policy).

Key instances:

  • RSA L0: score(u, w) = prior(w) · meaning(u, w)
  • RSA S1: score(w, u) = rpow(informativity(w, u), α) · exp(-α · cost(u))
  • BToM agent: score(state, action) = exp(β · Q(state, action))
  • score : State → Action → ℝ

    Unnormalized score: policy(a|s) ∝ score(s, a).

  • score_nonneg (s : State) (a : Action) : 0 ≤ self.score s a

    Scores are non-negative.
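As an illustration (not part of the library), a minimal instance can be sketched directly from the two fields above; with constant scores, the derived policy is uniform:

```lean
-- Hypothetical toy agent: one state, two actions, constant score 1.
-- Field names follow the structure documentation; `zero_le_one` is
-- Mathlib's proof that (0 : ℝ) ≤ 1.
def uniformAgent : Core.RationalAction Unit Bool where
  score := fun _ _ => 1
  score_nonneg := fun _ _ => zero_le_one
```

Since both actions score 1, `uniformAgent.policy () a` is 1/2 for each action `a`.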

Instances For
    noncomputable def Core.RationalAction.totalScore {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) : ℝ

    Total score across all actions in a state (normalization constant).

    Equations
    Instances For
      theorem Core.RationalAction.totalScore_nonneg {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) :
      0 ≤ ra.totalScore s
      noncomputable def Core.RationalAction.policy {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) : ℝ

      Normalized policy: P(a|s) = score(s,a) / Σ_a' score(s,a'). Returns 0 for all actions if totalScore = 0.

      Equations
      Instances For
        theorem Core.RationalAction.policy_nonneg {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) :
        0 ≤ ra.policy s a
        theorem Core.RationalAction.policy_sum_eq_one {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (h : ra.totalScore s ≠ 0) :
        ∑ a : A, ra.policy s a = 1

        Policy sums to 1 when totalScore is nonzero.
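A worked numeric example (not from the source) of the normalization: three actions with scores 2, 1, 1 in a state s give

```latex
P(a_1 \mid s) = \frac{2}{2+1+1} = \frac{1}{2}, \qquad
P(a_2 \mid s) = P(a_3 \mid s) = \frac{1}{4}, \qquad
\sum_i P(a_i \mid s) = 1.
```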

        theorem Core.RationalAction.policy_monotone {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) (h : ra.score s a₁ ≤ ra.score s a₂) :
        ra.policy s a₁ ≤ ra.policy s a₂

        Policy monotonicity: higher score → higher probability.

        theorem Core.RationalAction.policy_eq_zero_of_score_eq_zero {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) (h : ra.score s a = 0) :
        ra.policy s a = 0

        Zero score implies zero policy, regardless of whether totalScore is zero.

        theorem Core.RationalAction.policy_eq_of_score_eq {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) (h : ra.score s a₁ = ra.score s a₂) :
        ra.policy s a₁ = ra.policy s a₂

        Policy respects score equality: equal scores → equal probabilities. Follows directly from the Luce rule: both sides are score / totalScore with the same denominator.

        theorem Core.RationalAction.policy_eq_one_of_totalScore_eq {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) (h_sum : ra.totalScore s = ra.score s a) (h_pos : 0 < ra.score s a) :
        ra.policy s a = 1

        When totalScore equals the score of action a, the policy for a is 1. Used by the compositional proof builder when all other scores are zero, so totalScore = score a + 0 +... + 0 = score a, making policy = 1.

        theorem Core.RationalAction.policy_not_lt_of_score_le {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) (h : ra.score s a₂ ≤ ra.score s a₁) :
        ¬ra.policy s a₁ < ra.policy s a₂

        Score ordering implies ¬(policy strict ordering). Used by compositional proof builder for ¬(L1 w₁ < L1 w₂) goals.

        theorem Core.RationalAction.policy_lt_of_score_lt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) (hlt : ra.score s a₁ < ra.score s a₂) :
        ra.policy s a₁ < ra.policy s a₂

        Strict policy monotonicity: strictly higher score → strictly higher probability.

        Used by rsa_decide to eliminate shared denominator computations: when comparing policy s a₁ < policy s a₂ (same state), it suffices to show score s a₁ < score s a₂, skipping the expensive totalScore computation in the proof term.

        theorem Core.RationalAction.policy_lt_iff_score_lt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) :
        ra.policy s a₁ < ra.policy s a₂ ↔ ra.score s a₁ < ra.score s a₂

        Iff combining policy_lt_of_score_lt with the contrapositive of policy_monotone: at the same state, policy strict ordering coincides with score strict ordering.

        theorem Core.RationalAction.policy_lt_cross {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s₁ s₂ : S) (a : A) (h_pos₁ : 0 < ra.totalScore s₁) (h_pos₂ : 0 < ra.totalScore s₂) (h_cross : ra.score s₁ a * ra.totalScore s₂ < ra.score s₂ a * ra.totalScore s₁) :
        ra.policy s₁ a < ra.policy s₂ a

        Cross-state policy comparison: compares policy values at different states (different denominators). Used for S2 cross-world comparisons where S2(u|w₁) vs S2(u|w₂) have different normalization constants.

        The cross-product condition score(s₁,a) * total(s₂) < score(s₂,a) * total(s₁) is equivalent to score(s₁,a)/total(s₁) < score(s₂,a)/total(s₂) when both totals are positive.
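Writing v₁ = score(s₁, a), v₂ = score(s₂, a), and Tᵢ = totalScore(sᵢ), the claimed equivalence is just clearing the positive denominators:

```latex
\frac{v_1}{T_1} < \frac{v_2}{T_2}
\;\Longleftrightarrow\;
v_1 T_2 < v_2 T_1
\qquad (T_1 > 0,\; T_2 > 0).
```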

        theorem Core.RationalAction.policy_lt_cross_of_cross_lt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s₁ s₂ : S) (a : A) (h_cross : ra.score s₁ a * ra.totalScore s₂ < ra.score s₂ a * ra.totalScore s₁) :
        ra.policy s₁ a < ra.policy s₂ a

        Cross-state policy comparison with positivity derived from the cross-product.

        Like policy_lt_cross but derives the totalScore > 0 conditions from the cross-product inequality itself: if 0 ≤ score(s₁,a) * total(s₂) < score(s₂,a) * total(s₁), then score(s₂,a) * total(s₁) > 0, so both score(s₂,a) > 0 and total(s₁) > 0. And score(s₂,a) ≤ total(s₂), so total(s₂) > 0.

        Used by rsa_predict for cross-utterance L1 comparisons where the two sides have different normalization constants.

        theorem Core.RationalAction.policy_list_sum_lt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (as₁ as₂ : List A) (h : (List.map (ra.score s) as₁).sum < (List.map (ra.score s) as₂).sum) (htot : 0 < ra.totalScore s) :
        (List.map (ra.policy s) as₁).sum < (List.map (ra.policy s) as₂).sum

        Score-sum ordering implies policy-sum ordering when both sides share the same state (same denominator). Used by rsa_predict for marginal L1 comparisons where the worlds being summed differ but the utterance and config are shared.

        theorem Core.RationalAction.finset_sum_policy_lt_of_sum_score_lt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (F₁ F₂ : Finset A) (h : F₁.sum (ra.score s) < F₂.sum (ra.score s)) :
        F₁.sum (ra.policy s) < F₂.sum (ra.policy s)

        Finset-sum ordering implies policy-sum ordering when both sides share the same state (same denominator). Like policy_list_sum_lt but for Finset.sum.

        Derives totalScore positivity from the score ordering itself, so no extra hypothesis is needed: if 0 ≤ Σ_{F₁} score < Σ_{F₂} score, then some score is positive, so totalScore > 0.

        Used by rsa_predict for denominator cancellation in marginal comparisons.

        Luce's Choice Axiom #

        @cite{luce-1959} showed that the ratio rule P(a|s) = v(a)/Σv(b) is characterized by the independence of irrelevant alternatives (IIA): the relative probability of two actions depends only on their scores, not on what other actions are available.

        We formalize:

        theorem Core.RationalAction.policy_ratio {S : Type u_3} {A : Type u_4} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) :
        ra.policy s a₁ * ra.score s a₂ = ra.policy s a₂ * ra.score s a₁

        Constant Ratio Rule (Theorem 2): policy(a₁) · score(a₂) = policy(a₂) · score(a₁). The odds ratio policy(a₁)/policy(a₂) = score(a₁)/score(a₂).

        noncomputable def Core.RationalAction.pChoice {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (T : Finset A) (a : A) : ℝ

        Choice probability from a subset: pChoice(a, T) = score(a) / Σ_{b∈T} score(b). Returns 0 if a ∉ T or the subset total is 0.

        Equations
        • ra.pChoice s T a = if a ∈ T then have subTotal := ∑ b ∈ T, ra.score s b; if subTotal = 0 then 0 else ra.score s a / subTotal else 0
        Instances For
          theorem Core.RationalAction.pChoice_univ {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (a : A) :
          ra.pChoice s Finset.univ a = ra.policy s a

          pChoice on the full set equals policy.

          theorem Core.RationalAction.pChoice_nonneg {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (T : Finset A) (a : A) :
          0 ≤ ra.pChoice s T a

          pChoice is non-negative.

          theorem Core.RationalAction.pChoice_sum_eq_one {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (T : Finset A) (hT : ∑ b ∈ T, ra.score s b ≠ 0) :
          ∑ a ∈ T, ra.pChoice s T a = 1

          pChoice sums to 1 over the subset when the subset total is nonzero.

          theorem Core.RationalAction.pChoice_ratio {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (T : Finset A) (a₁ a₂ : A) (h₁ : a₁ ∈ T) (h₂ : a₂ ∈ T) :
          ra.pChoice s T a₁ * ra.score s a₂ = ra.pChoice s T a₂ * ra.score s a₁

          IIA core: the ratio of pChoice values in any subset equals the score ratio. For a₁, a₂ ∈ T with score(a₂) > 0: pChoice(a₁, T) · score(a₂) = pChoice(a₂, T) · score(a₁) (Axiom 1).

          theorem Core.RationalAction.iia {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (S' T : Finset A) (hST : S' ⊆ T) (a : A) (ha : a ∈ S') (hS_pos : ∑ b ∈ S', ra.score s b ≠ 0) (hT_pos : ∑ b ∈ T, ra.score s b ≠ 0) :
          ra.pChoice s S' a = ra.pChoice s T a / ∑ b ∈ S', ra.pChoice s T b

          IIA (Axiom 1): P(a, S) = P(a, T) / Σ_{b∈S} P(b, T) for S ⊆ T. Choice probability from a subset is the conditional probability.

          theorem Core.RationalAction.product_rule {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (S' T : Finset A) (hST : S' ⊆ T) (a : A) (ha : a ∈ S') (hS_pos : ∑ b ∈ S', ra.score s b ≠ 0) (hT_pos : ∑ b ∈ T, ra.score s b ≠ 0) :
          ra.pChoice s T a = ra.pChoice s S' a * ((∑ b ∈ S', ra.score s b) / ∑ b ∈ T, ra.score s b)

          Product rule (Theorem 1): P(a, T) = P(a, S) · P(S, T) for a ∈ S ⊆ T, where P(S, T) = Σ_{b∈S} score(b) / Σ_{b∈T} score(b).

          noncomputable def Core.RationalAction.scaleBy {S : Type u_3} {A : Type u_4} [Fintype A] (ra : RationalAction S A) (k : ℝ) (hk : 0 < k) : RationalAction S A

          Scale all scores by a positive constant k.

          Equations
          • ra.scaleBy k hk = { score := fun (s : S) (a : A) => k * ra.score s a, score_nonneg := ⋯ }
          Instances For
            theorem Core.RationalAction.scaleBy_policy {S : Type u_3} {A : Type u_4} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) (k : ℝ) (hk : 0 < k) :
            (ra.scaleBy k hk).policy s a = ra.policy s a

            Scale invariance (Theorem 5): scaling scores by k > 0 preserves policy.

            theorem Core.RationalAction.policy_eq_of_proportional {S : Type u_3} {A : Type u_4} [Fintype A] (ra ra' : RationalAction S A) (s : S) (k : ℝ) (hk : 0 < k) (h : ∀ (a : A), ra'.score s a = k * ra.score s a) (a : A) :
            ra'.policy s a = ra.policy s a

            Uniqueness (forward direction, Theorem 4): If scores are proportional (score'(s,a) = k · score(s,a) for some k > 0), then both agents have the same policy.

            Softmax Function #

            The softmax function σ(s, α)ᵢ = exp(α · sᵢ) / Σⱼ exp(α · sⱼ) is the exponential parameterization of the Luce choice rule. Following Franke & Degen (submitted), we establish Facts 1–8.

            noncomputable def Core.softmax {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (α : ℝ) :
            ι → ℝ

            The softmax function: softmax(s, α)ᵢ = exp(α · sᵢ) / Σⱼ exp(α · sⱼ).

            Equations
            • Core.softmax s α i = Real.exp (α * s i) / ∑ j : ι, Real.exp (α * s j)
            Instances For
              noncomputable def Core.partitionFn {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (α : ℝ) : ℝ

              The partition function (normalizing constant) Z = Σⱼ exp(α · sⱼ).

              Equations
              Instances For
                noncomputable def Core.logSumExp {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (α : ℝ) : ℝ

                Log-sum-exp: log of partition function.

                Equations
                Instances For
                  theorem Core.partitionFn_pos {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) :
                  0 < partitionFn s α

                  The partition function is always positive.

                  theorem Core.partitionFn_ne_zero {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) :
                  partitionFn s α ≠ 0
                  theorem Core.softmax_pos {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (i : ι) :
                  0 < softmax s α i

                  Each softmax probability is positive. (Fact 1, part 1)

                  theorem Core.softmax_sum_eq_one {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) :
                  ∑ i : ι, softmax s α i = 1

                  Softmax probabilities sum to 1. (Fact 1, part 2)

                  theorem Core.softmax_nonneg {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (i : ι) :
                  0 ≤ softmax s α i

                  Softmax is non-negative.

                  theorem Core.softmax_le_one {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (i : ι) :
                  softmax s α i ≤ 1

                  Softmax is at most 1.

                  theorem Core.softmax_odds {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (i j : ι) :
                  softmax s α i / softmax s α j = Real.exp (α * (s i - s j))

                  Fact 2: Odds are determined by score differences: pᵢ/pⱼ = exp(α(sᵢ - sⱼ)).
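The shared partition function Z cancels in the ratio:

```latex
\frac{\sigma_i}{\sigma_j}
= \frac{e^{\alpha s_i}/Z}{e^{\alpha s_j}/Z}
= e^{\alpha (s_i - s_j)}.
```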

                  theorem Core.log_softmax_odds {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (i j : ι) :
                  Real.log (softmax s α i / softmax s α j) = α * (s i - s j)

                  Log-odds equal scaled score difference.

                  theorem Core.softmax_ratio {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (i j : ι) :
                  softmax s α i = softmax s α j * Real.exp (α * (s i - s j))

                  Ratio form of Fact 2.

                  noncomputable def Core.logistic (x : ℝ) : ℝ

                  The logistic (sigmoid) function: S(x) = 1 / (1 + exp(−x)).

                  Equations
                  Instances For
                    noncomputable def Core.logit (p : ℝ) : ℝ

                    The logit function: L(p) = log(p / (1 − p)). Inverse of logistic on (0, 1).

                    Equations
                    Instances For
                      theorem Core.logit_logistic (x : ℝ) :
                      logit (logistic x) = x

                      logit inverts logistic: logit(logistic(x)) = x.

                      theorem Core.logistic_logit {p : ℝ} (hp0 : 0 < p) (hp1 : p < 1) :
                      logistic (logit p) = p

                      logistic inverts logit for 0 < p < 1: logistic(logit(p)) = p.

                      theorem Core.softmax_binary (s : Fin 2 → ℝ) (α : ℝ) :
                      softmax s α 0 = logistic (α * (s 0 - s 1))

                      Fact 3: For n = 2, softmax reduces to logistic.
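Dividing numerator and denominator by exp(α·s₀) makes the reduction explicit:

```latex
\sigma_0
= \frac{e^{\alpha s_0}}{e^{\alpha s_0} + e^{\alpha s_1}}
= \frac{1}{1 + e^{-\alpha (s_0 - s_1)}}
= S\bigl(\alpha (s_0 - s_1)\bigr).
```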

                      theorem Core.logit_softmax_binary (s : Fin 2 → ℝ) (α : ℝ) :
                      logit (softmax s α 0) = α * (s 0 - s 1)

                      Softmax log-odds equals logit of the binary softmax probability (when there are exactly two alternatives).

                      theorem Core.softmax_add_const {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (α c : ℝ) :
                      softmax (fun (i : ι) => s i + c) α = softmax s α

                      Fact 6: Softmax is translation invariant.

                      theorem Core.softmax_scale {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (α a : ℝ) (ha : a ≠ 0) :
                      softmax (fun (i : ι) => a * s i) (α / a) = softmax s α

                      Fact 8: Multiplicative scaling can be absorbed into α.

                      theorem Core.softmax_mono {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) {α : ℝ} (hα : 0 < α) (i j : ι) (hij : s i ≤ s j) :
                      softmax s α i ≤ softmax s α j

                      Higher scores get higher probabilities (for α > 0).

                      theorem Core.softmax_strict_mono {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) {α : ℝ} (hα : 0 < α) (i j : ι) (hij : s i < s j) :
                      softmax s α i < softmax s α j

                      Strict monotonicity.

                      theorem Core.softmax_zero {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) :
                      softmax s 0 = fun (x : ι) => 1 / ↑(Fintype.card ι)

                      At α = 0, softmax is uniform.

                      theorem Core.softmax_neg_mono {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) {α : ℝ} (hα : α < 0) (i j : ι) (hij : s i ≤ s j) :
                      softmax s α j ≤ softmax s α i

                      For α < 0, lower scores get higher probabilities.

                      theorem Core.log_softmax {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (i : ι) :
                      Real.log (softmax s α i) = α * s i - Real.log (partitionFn s α)

                      Log of softmax = score minus log partition function.

                      noncomputable def Core.softmax1 {ι : Type u_3} [Fintype ι] (s : ι → ℝ) :
                      ι → ℝ

                      Softmax with default α = 1.

                      Equations
                      Instances For
                        noncomputable def Core.softmaxTemp {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (τ : ℝ) :
                        ι → ℝ

                        Temperature form: τ = 1/α.

                        Equations
                        Instances For
                          theorem Core.softmax_exponential_family {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (α : ℝ) (i : ι) [Nonempty ι] :
                          softmax s α i = Real.exp (α * s i - logSumExp s α)

                          Softmax is an exponential family distribution.

                          theorem Core.rpow_luce_eq_softmax {ι : Type u_3} [Fintype ι] [Nonempty ι] (f : ι → ℝ) (α : ℝ) (hf : ∀ (i : ι), 0 < f i) (i : ι) :
                          f i ^ α / ∑ j : ι, f j ^ α = softmax (fun (j : ι) => Real.log (f j)) α i

                          Luce choice with rpow scores equals softmax over log scores.

                          f(i)^α / Σⱼ f(j)^α = softmax(log ∘ f, α)(i) when all f(i) > 0.

                          This is the general identity connecting belief-based RSA (which uses rpow) to the softmax framework (which uses exp). Every S1 model with s1Score = rpow(l0, α) inherits all softmax limit theorems via this identity: as α → ∞, rpow-based Luce choice concentrates on the argmax of f, i.e., the most informative utterance.
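The identity is a one-line rewrite, valid because every f(i) is positive:

```latex
f(i)^{\alpha} = e^{\alpha \log f(i)}
\quad\Longrightarrow\quad
\frac{f(i)^{\alpha}}{\sum_j f(j)^{\alpha}}
= \frac{e^{\alpha \log f(i)}}{\sum_j e^{\alpha \log f(j)}}
= \mathrm{softmax}(\log \circ f,\, \alpha)_i.
```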

                          Why Softmax? The Fechnerian Characterization #

                          The exponential parameterization score = exp(α · utility) is not a design choice — it is the unique transformation connecting Luce's ratio scale to a utility (interval) scale (§2.A; @cite{adams-messick-1958}).

                          Ratio vs interval scales. Luce's Axiom 1 (IIA) yields a ratio scale v: only ratios v(a)/v(b) are meaningful (Theorem 4). Fechner's psychophysics requires an interval scale u: only differences u(a) - u(b) are meaningful. The question: how are v and u related?

                          Derivation. From P(a,b) = v(a)/(v(a)+v(b)), the odds ratio is v(a)/v(b) = g(u(a) - u(b)) for some function g. Transitivity of ratios (v(a)/v(c) = [v(a)/v(b)] · [v(b)/v(c)]) forces Cauchy's multiplicative functional equation: g(s + t) = g(s) · g(t). The unique monotone increasing solution is g(s) = exp(k · s) (cauchy_mul_exp), giving:

                          v(a)/v(b) = exp(k · (u(a) - u(b))), i.e. v(a) ∝ exp(k · u(a)).

                          The parameter k > 0 is the rationality parameter α in RSA.

                          theorem Core.cauchy_mul_exp (g : ℝ → ℝ) (hg_mul : ∀ (s t : ℝ), g (s + t) = g s * g t) (hg_mono : StrictMono g) :
                          ∃ (k : ℝ), 0 < k ∧ g 0 = 1 ∧ ∀ (s : ℝ), g s = Real.exp (k * s)

                          Cauchy's multiplicative functional equation (classical): If g : ℝ → ℝ satisfies g(s + t) = g(s) · g(t) and is strictly monotone increasing, then g(s) = exp(k · s) for some k > 0.

                          The proof reduces to the additive Cauchy equation via log: setting h = log ∘ g, the multiplicative equation becomes h(s+t) = h(s) + h(t). The key lemma (cauchy_monotone_additive_linear) shows that a strictly monotone additive function must be linear, by density of ℚ in ℝ: h agrees with x ↦ k·x on rationals (by induction), and any deviation on an irrational x would violate monotonicity via a rational witness between x and h(x)/k.

                          theorem Core.luce_fechnerian_exp {X : Type u_3} (v u : X → ℝ) (g : ℝ → ℝ) (hv_pos : ∀ (x : X), 0 < v x) (h_ratio : ∀ (x y : X), v x / v y = g (u x - u y)) (hg_mul : ∀ (s t : ℝ), g (s + t) = g s * g t) (hg_mono : StrictMono g) :
                          ∃ (k : ℝ), 0 < k ∧ ∀ (x₀ x : X), v x = v x₀ * Real.exp (k * (u x - u x₀))

                          Fechnerian uniqueness (§2.A; @cite{adams-messick-1958}): If a ratio scale v and interval scale u represent the same ordering via v(x)/v(y) = g(u(x) - u(y)) for a strictly monotone multiplicative g, then v is the exponential of u.

                          This is WHY fromSoftmax uses exp(α · utility): the exponential is forced by the requirement that log-odds be linear in utility differences. It is the unique bridge between Luce's ratio scale (Chapter 1) and Fechner's interval scale (Chapter 2).

                          noncomputable def Core.RationalAction.fromSoftmax {S : Type u_1} {A : Type u_2} [Fintype A] (utility : S → A → ℝ) (α : ℝ) : RationalAction S A

                          Construct a RationalAction from a utility function via softmax.

                          The score is exp(α · utility(s, a)), so policy = softmax(utility, α). The exponential parameterization is forced by the Fechnerian characterization (luce_fechnerian_exp): it is the unique bridge from Luce's ratio scale to an additive utility scale.

                          Equations
                          Instances For
                            theorem Core.RationalAction.fromSoftmax_policy_eq {S : Type u_1} {A : Type u_2} [Fintype A] [Nonempty A] (utility : S → A → ℝ) (α : ℝ) (s : S) (a : A) :
                            (fromSoftmax utility α).policy s a = softmax (utility s) α a

                            The policy of a softmax agent equals the softmax function.

                            Gibbs Variational Principle #

                            The softmax distribution uniquely maximizes entropy + expected score on the probability simplex. This is the mathematical foundation for RSA convergence (@cite{zaslavsky-hu-levy-2020}, Proposition 1).

                            Proof strategy #

                            The Gibbs VP reduces to KL non-negativity via three identities:

                            1. H(p) + KL(p‖q) = -∑ pᵢ log qᵢ (negMulLog + KL term telescope)
                            2. -∑ pᵢ log qᵢ = -α⟨p,s⟩ + log Z (substitute log qᵢ = α sᵢ - log Z)
                            3. H(q) + α⟨q,s⟩ = log Z (softmax self-information)

                            Combining: H(p) + α⟨p,s⟩ + KL = log Z = H(q) + α⟨q,s⟩, so KL ≥ 0 ⟹ LHS ≤ RHS.

                            The KL machinery used here — klFinite, kl_eq_sum_klFun, kl_nonneg — is inlined as private (ι→ℝ) helpers in this file (mathlib gap; the PMF form lives at PMF.klDiv / PMF.toReal_klDiv_eq_sum_log_div).

                            noncomputable def Core.speakerObj {ι : Type u_3} [Fintype ι] (v : ι → ℝ) (α : ℝ) (s : ι → ℝ) : ℝ

                            The per-meaning speaker objective: f(s) = Σᵤ [negMulLog(sᵤ) + α · sᵤ · vᵤ].

                            This is the function that the speaker maximizes for each meaning m, where vᵤ = log L(m|u) is the listener utility of utterance u.

                            Equations
                            Instances For
                              theorem Core.speakerObj_at_softmax {ι : Type u_3} [Fintype ι] [Nonempty ι] (v : ι → ℝ) (α : ℝ) :
                              speakerObj v α (softmax v α) = logSumExp v α

                              The softmax achieves f(s*) = log Z, where Z is the partition function.

                              theorem Core.gibbs_variational {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (p : ι → ℝ) (hp_nonneg : ∀ (i : ι), 0 ≤ p i) (hp_sum : ∑ i : ι, p i = 1) :
                              ∑ i : ι, (p i).negMulLog + α * ∑ i : ι, p i * s i ≤ ∑ i : ι, (softmax s α i).negMulLog + α * ∑ i : ι, softmax s α i * s i

                              Gibbs Variational Principle: softmax maximizes entropy + expected score.

                              For any distribution p on ι and scores s : ι → ℝ: H(p) + α·⟨p, s⟩ ≤ H(q) + α·⟨q, s⟩ where q = softmax(s, α) and H(p) = Σ negMulLog(pᵢ).

                              Softmax Concentration at High Rationality #

                              As the rationality parameter α → ∞, softmax concentrates all probability mass on the action with highest utility — i.e., softmax converges to argmax. This connects soft optimization (MaxEnt) with hard optimization (OT).

                              Proof sketch #

                              From softmax_odds, we have σᵢ / σⱼ = exp(α(sᵢ − sⱼ)). When sᵢ > sⱼ, this ratio → ∞ as α → ∞, so σⱼ / σᵢ → 0. Since Σ σₖ = 1, the maximizer's probability → 1 by squeezing: 1 - σ_max = Σ_{k≠max} σₖ, and each non-maximal term → 0 (bounded by exp(-α · gap) where gap = sᵢ - sⱼ > 0).
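Spelling out the bound used in the squeeze, with gap = s_max − s_j > 0 for each non-maximal j:

```latex
\sigma_j
= \frac{e^{\alpha s_j}}{\sum_k e^{\alpha s_k}}
\le \frac{e^{\alpha s_j}}{e^{\alpha s_{\max}}}
= e^{-\alpha (s_{\max} - s_j)}
\longrightarrow 0
\qquad (\alpha \to \infty).
```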

                              theorem Core.softmax_argmax_limit {ι : Type u_3} [Fintype ι] [Nonempty ι] [DecidableEq ι] (s : ι → ℝ) (i_max : ι) (h_max : ∀ (j : ι), j ≠ i_max → s j < s i_max) (ε : ℝ) :
                              ε > 0 → ∃ (α₀ : ℝ), ∀ α > α₀, |softmax s α i_max - 1| < ε

                              Softmax → argmax limit: as α → ∞, softmax concentrates on the unique maximizer. For any i_max with s i_max strictly greater than all other scores, softmax(s, α)_{i_max} → 1.

                              This is the formal connection between MaxEnt (soft optimization) and OT (hard optimization): OT is the α → ∞ limit of MaxEnt.

                              theorem Core.softmax_nonmax_limit {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (i_max : ι) (h_max : ∀ (j : ι), j ≠ i_max → s j < s i_max) (j : ι) (hj : j ≠ i_max) (ε : ℝ) :
                              ε > 0 → ∃ (α₀ : ℝ), ∀ α > α₀, softmax s α j < ε

                              Complement of the limit: non-maximal actions get probability → 0.

                              Entropy + softmax interactions #

                              The (ι→ℝ)-typed Shannon entropy lives in Core/InformationTheory.lean as entropy : Finset α → (α → ℝ) → ℝ. The PMF-typed canonical form is PMF.entropy : PMF α → ℝ; the two agree by definition on (ofRealWeightFn p).toRealFn = p for normalized p (see PMF.ofRealWeightFn_toRealFn_eq). This section uses the (ι→ℝ) form because softmax is a real-arithmetic construction; consumers wanting the PMF form can wrap via PMF.ofRealWeightFn.

                              theorem Core.entropy_le_log_card {ι : Type u_3} [Fintype ι] [Nonempty ι] (p : ι → ℝ) (hp_nonneg : ∀ (i : ι), 0 ≤ p i) (hp_sum : ∑ i : ι, p i = 1) :
                              Core.entropy✝ Finset.univ p ≤ Real.log ↑(Fintype.card ι)

                              Maximum entropy is achieved by uniform distribution.

                              Proof: KL(p ‖ uniform) ≥ 0, and KL(p ‖ uniform) = log n - H(p).
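With u the uniform distribution uᵢ = 1/n, the two facts in the proof combine as:

```latex
0 \le \mathrm{KL}(p \,\|\, u)
= \sum_i p_i \log \frac{p_i}{1/n}
= \log n - H(p)
\quad\Longrightarrow\quad
H(p) \le \log n.
```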

                              theorem Core.entropy_softmax {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) :
                              Core.entropy✝ Finset.univ (softmax s α) = Real.log (partitionFn s α) - α * ∑ i : ι, softmax s α i * s i

                              Entropy of softmax: H(softmax(s, α)) = log Z - α · 𝔼[s].
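This follows by substituting log softmax(s, α)ᵢ = α·sᵢ − log Z (log_softmax) into the entropy sum:

```latex
H(\sigma)
= -\sum_i \sigma_i \log \sigma_i
= -\sum_i \sigma_i \bigl(\alpha s_i - \log Z\bigr)
= \log Z - \alpha \sum_i \sigma_i s_i.
```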

                              noncomputable def Core.entropyRegObjective {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (α : ℝ) (p : ι → ℝ) : ℝ

                              Entropy-regularized objective: G_α(p, s) = ⟨s, p⟩ + (1/α) H(p).

                              Equations
                              Instances For
                                theorem Core.entropyRegObjective_softmax {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (hα : 0 < α) :
                                entropyRegObjective s α (softmax s α) = 1 / α * Real.log (partitionFn s α)

                                The maximum value of the entropy-regularized objective.

                                theorem Core.softmax_maximizes_entropyReg {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (hα : 0 < α) (p : ι → ℝ) (hp_nonneg : ∀ (i : ι), 0 ≤ p i) (hp_sum : ∑ i : ι, p i = 1) :
                                entropyRegObjective s α p ≤ entropyRegObjective s α (softmax s α)

                                Fact 5: Softmax maximizes the entropy-regularized objective.

                                Proof: gibbs_variational gives H(p) + α⟨p,s⟩ ≤ H(q) + α⟨q,s⟩; dividing by α > 0 yields the result.

                                theorem Core.softmax_unique_maximizer {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (hα : 0 < α) (p : ι → ℝ) (hp_nonneg : ∀ (i : ι), 0 ≤ p i) (hp_sum : ∑ i : ι, p i = 1) (h_max : entropyRegObjective s α p = entropyRegObjective s α (softmax s α)) :
                                p = softmax s α

                                Softmax is the unique maximizer.

                                Proof: equality in the objective ⟹ KL(p ‖ softmax) = 0 (via speakerObj_plus_kl), hence p = softmax (via kl_eq_zero_imp_eq).

                                noncomputable def Core.freeEnergy {ι : Type u_3} [Fintype ι] (s : ι → ℝ) (α : ℝ) (p : ι → ℝ) : ℝ

                                Free energy (from statistical mechanics).

                                Equations
                                Instances For
                                  theorem Core.softmax_minimizes_freeEnergy {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) (hα : 0 < α) (p : ι → ℝ) (hp_nonneg : ∀ (i : ι), 0 ≤ p i) (hp_sum : ∑ i : ι, p i = 1) :
                                  freeEnergy s α (softmax s α) freeEnergy s α p

                                  Softmax is the Boltzmann distribution: minimizes free energy.

                                  theorem Core.logSumExp_convex {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) :
                                  ConvexOn ℝ Set.univ fun (α : ℝ) => logSumExp s α

                                  The log-partition function is convex in α.

Proof: By Hölder's inequality. For 0 < a, b with a + b = 1: ∑ exp(x·sᵢ)^a · exp(y·sᵢ)^b ≤ (∑ exp(x·sᵢ))^a · (∑ exp(y·sᵢ))^b. Since exp(x·sᵢ)^a · exp(y·sᵢ)^b = exp((ax+by)·sᵢ), taking logs gives logSumExp(s, ax+by) ≤ a·logSumExp(s, x) + b·logSumExp(s, y).
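The midpoint form of this convexity claim can be checked directly over a grid of points and convex weights (standalone Python sketch, independent of the Lean development):

```python
import math

def log_sum_exp(s, alpha):
    # log Σᵢ exp(α·sᵢ)
    return math.log(sum(math.exp(alpha * si) for si in s))

s = [0.5, -1.0, 2.3]
points = [-2.0, -0.5, 0.0, 1.0, 3.0]
for x in points:
    for y in points:
        for a in (0.25, 0.5, 0.75):
            b = 1.0 - a
            # convexity: f(a·x + b·y) ≤ a·f(x) + b·f(y)
            assert log_sum_exp(s, a * x + b * y) <= a * log_sum_exp(s, x) + b * log_sum_exp(s, y) + 1e-12
```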

theorem Core.deriv_logSumExp {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (α : ℝ) :
deriv (fun (α : ℝ) => logSumExp s α) α = ∑ i : ι, softmax s α i * s i

                                  Derivative of log-partition gives expected value: d/dα log(Σ exp(α sᵢ)) = Σ softmax(s,α)ᵢ · sᵢ.

                                  Proof via chain rule on log ∘ Σ exp(α · sᵢ), then hasDerivAt_finset_sum.
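The identity d/dα log Σ exp(α·sᵢ) = Σ softmax(s,α)ᵢ · sᵢ can be checked against a central finite difference (standalone Python sketch, independent of the Lean development):

```python
import math

def log_sum_exp(s, alpha):
    return math.log(sum(math.exp(alpha * si) for si in s))

def softmax(s, alpha):
    z = [math.exp(alpha * si) for si in s]
    total = sum(z)
    return [zi / total for zi in z]

s = [1.0, -0.3, 0.7, 2.1]
alpha = 0.8
h = 1e-6
# central difference approximation of d/dα logSumExp(s, α)
finite_diff = (log_sum_exp(s, alpha + h) - log_sum_exp(s, alpha - h)) / (2 * h)
# expected value of s under softmax(s, α)
expectation = sum(pi * si for pi, si in zip(softmax(s, alpha), s))
assert abs(finite_diff - expectation) < 1e-6
```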

noncomputable def Core.partitionFnOffset {ι : Type u_3} [Fintype ι] (s r : ι → ℝ) (α : ℝ) : ℝ

                                  Partition function with per-element offsets: Z(α) = Σⱼ exp(α · sⱼ + rⱼ).

                                  Equations
                                  Instances For
theorem Core.partitionFnOffset_pos {ι : Type u_3} [Fintype ι] [Nonempty ι] (s r : ι → ℝ) (α : ℝ) :
0 < partitionFnOffset s r α

noncomputable def Core.logSumExpOffset {ι : Type u_3} [Fintype ι] (s r : ι → ℝ) (α : ℝ) : ℝ

                                    Log-sum-exp with offsets: log Σ exp(α · sᵢ + rᵢ).

                                    Equations
                                    Instances For
noncomputable def Core.softmaxOffset {ι : Type u_3} [Fintype ι] (s r : ι → ℝ) (α : ℝ) (i : ι) : ℝ

                                      Softmax with offsets: exp(α · sᵢ + rᵢ) / Z(α).

                                      Equations
                                      Instances For
theorem Core.hasDerivAt_logSumExpOffset {ι : Type u_3} [Fintype ι] [Nonempty ι] (s r : ι → ℝ) (α : ℝ) :
                                        HasDerivAt (logSumExpOffset s r) (∑ i : ι, softmaxOffset s r α i * s i) α

                                        Derivative of offset log-partition gives weighted expected value: d/dα log(Σ exp(α·sᵢ + rᵢ)) = Σ softmaxOffset(s,r,α)ᵢ · sᵢ.

theorem Core.logSumExpOffset_convex {ι : Type u_3} [Fintype ι] [Nonempty ι] (s r : ι → ℝ) :
ConvexOn ℝ Set.univ (logSumExpOffset s r)

                                        The offset log-partition function is convex in α.

                                        Same Hölder argument as logSumExp_convex: the key factoring exp((ax+by)·sᵢ + rᵢ) = exp(x·sᵢ + rᵢ)^a · exp(y·sᵢ + rᵢ)^b holds because a + b = 1, absorbing the offset.

theorem Core.logConditional_concaveOn {ι : Type u_3} [Fintype ι] [Nonempty ι] (s r : ι → ℝ) (y : ι) :
ConcaveOn ℝ Set.univ fun (α : ℝ) => α * s y + r y - logSumExpOffset s r α

                                        The log conditional likelihood α ↦ (α · sᵧ + rᵧ) − logSumExpOffset(s,r,α) is concave (affine minus convex).

theorem Core.hasDerivAt_logConditional {ι : Type u_3} [Fintype ι] [Nonempty ι] (s r : ι → ℝ) (y : ι) (α : ℝ) :
HasDerivAt (fun (w : ℝ) => w * s y + r y - logSumExpOffset s r w) (s y - ∑ i : ι, softmaxOffset s r α i * s i) α

                                        The derivative of log conditional likelihood α ↦ (α·sᵧ + rᵧ) − logSumExpOffset is the observed feature value minus the expected value: d/dα = sᵧ − Σᵢ softmaxOffset(s,r,α)ᵢ · sᵢ.
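This is the familiar gradient of a log-linear (logistic-regression-style) conditional likelihood: observed feature value minus the model's expected feature value. A standalone numerical check (Python, independent of the Lean development; names illustrative):

```python
import math

def softmax_offset(s, r, alpha):
    # exp(α·sᵢ + rᵢ) / Z(α)
    z = [math.exp(alpha * si + ri) for si, ri in zip(s, r)]
    total = sum(z)
    return [zi / total for zi in z]

def log_conditional(s, r, y, alpha):
    # α·s_y + r_y − log Σⱼ exp(α·sⱼ + rⱼ)
    return alpha * s[y] + r[y] - math.log(sum(math.exp(alpha * si + ri) for si, ri in zip(s, r)))

s = [0.4, -1.1, 1.5]
r = [0.2, 0.0, -0.7]
y, alpha, h = 2, 0.9, 1e-6
finite_diff = (log_conditional(s, r, y, alpha + h) - log_conditional(s, r, y, alpha - h)) / (2 * h)
# observed feature minus expected feature under softmaxOffset
grad = s[y] - sum(pi * si for pi, si in zip(softmax_offset(s, r, alpha), s))
assert abs(finite_diff - grad) < 1e-6
```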

theorem Core.max_entropy_duality {ι : Type u_3} [Fintype ι] [Nonempty ι] (s : ι → ℝ) (c α : ℝ) (_hα : 0 < α) (h_constraint : ∑ i : ι, softmax s α i * s i = c) :
                                        Core.entropy✝ Finset.univ (softmax s α) = Real.log (partitionFn s α) - α * c

                                        Strong duality: max entropy = min free energy.
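The duality identity H(softmax(s,α)) = log Z(α) − α·c, with c the expected score under softmax, can be confirmed directly (standalone Python sketch, independent of the Lean development):

```python
import math

def softmax(s, alpha):
    z = [math.exp(alpha * si) for si in s]
    total = sum(z)
    return [zi / total for zi in z]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

s = [0.3, -1.2, 2.0, 0.5]
alpha = 1.7
q = softmax(s, alpha)
c = sum(qi * si for qi, si in zip(q, s))   # expected score: the constraint value
log_z = math.log(sum(math.exp(alpha * si) for si in s))
# strong duality: H(softmax) = log Z − α·c
assert abs(entropy(q) - (log_z - alpha * c)) < 1e-9
```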

theorem Core.bayesian_maximizes {ι : Type u_3} [Fintype ι] (w : ι → ℝ) (hw_nonneg : ∀ (i : ι), 0 ≤ w i) (hC_pos : 0 < ∑ i : ι, w i) (L : ι → ℝ) (hL_pos : ∀ (i : ι), 0 < L i) (hL_sum : ∑ i : ι, L i = 1) :
∑ i : ι, w i * Real.log (L i) ≤ ∑ i : ι, w i * Real.log (w i / ∑ j : ι, w j)

                                        Bayesian optimality: The normalized posterior maximizes weighted log-likelihood on the probability simplex.

                                        For weights wᵢ ≥ 0 with C = Σwᵢ > 0, and any distribution L on the simplex (Lᵢ > 0, Σ Lᵢ = 1), the normalized posterior p*ᵢ = wᵢ/C satisfies:

                                        Σᵢ wᵢ · log(Lᵢ) ≤ Σᵢ wᵢ · log(p*ᵢ)

This is used for listener optimality: with w_m = P(m)·S(u|m), the Bayesian listener L*(m|u) = w_m / C maximizes Σ_m P(m)·S(u|m)·log L(m|u).
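A numerical illustration of this optimality (standalone Python, independent of the Lean development; the weights are arbitrary): the normalized posterior wᵢ/C beats random distributions on the weighted log-likelihood.

```python
import math
import random

random.seed(1)
w = [0.5, 1.2, 0.3, 2.3]            # arbitrary positive weights wᵢ
C = sum(w)
posterior = [wi / C for wi in w]    # normalized posterior p*ᵢ = wᵢ/C

def weighted_loglik(w, L):
    # Σᵢ wᵢ · log(Lᵢ)
    return sum(wi * math.log(Li) for wi, Li in zip(w, L))

best = weighted_loglik(w, posterior)
for _ in range(200):
    raw = [random.random() + 1e-9 for _ in w]
    L = [ri / sum(raw) for ri in raw]
    assert weighted_loglik(w, L) <= best + 1e-9
```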

                                        Uniqueness of the Ratio Scale #

                                        Theorem 4 (proved earlier as policy_eq_of_proportional) shows that proportional scores yield the same policy. The converse — that identical policies force proportional scores — completes the uniqueness characterization: same policy ↔ proportional scores, with proportionality constant k = totalScore₂/totalScore₁.

                                        Note: This is distinct from the "Independence-of-Unit Condition" in §1.F (pp. 28–33), which concerns state-dependent transformations f satisfying f(kv) = kf(v). That condition addresses how scale values transform across experimental conditions, not the uniqueness of the scale within a condition.

                                        theorem Core.RationalAction.proportional_of_policy_eq {S : Type u_3} {A : Type u_4} [Fintype A] (ra₁ ra₂ : RationalAction S A) (s : S) (h₁ : 0 < ra₁.totalScore s) (h₂ : 0 < ra₂.totalScore s) (hpol : ∀ (a : A), ra₁.policy s a = ra₂.policy s a) (a : A) :
                                        ra₂.score s a = ra₂.totalScore s / ra₁.totalScore s * ra₁.score s a

                                        Converse of Theorem 4 (uniqueness of ratio scale): If two agents with positive total scores have the same policy, then their scores are proportional with constant k = total₂/total₁.

                                        theorem Core.RationalAction.policy_eq_iff_proportional {S : Type u_3} {A : Type u_4} [Fintype A] (ra₁ ra₂ : RationalAction S A) (s : S) (h₁ : 0 < ra₁.totalScore s) (h₂ : 0 < ra₂.totalScore s) :
(∀ (a : A), ra₁.policy s a = ra₂.policy s a) ↔ ∃ (k : ℝ), 0 < k ∧ ∀ (a : A), ra₂.score s a = k * ra₁.score s a

                                        Full uniqueness characterization (Theorem 4 and its converse): Two agents with positive total scores have the same policy if and only if their scores are proportional.
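Both directions of the characterization are easy to see on concrete numbers: scaling scores by k leaves the policy unchanged, and k is recovered as total₂/total₁ (standalone Python sketch, independent of the Lean development):

```python
def policy(scores):
    # Luce choice rule: normalize scores to a distribution
    total = sum(scores)
    return [v / total for v in scores]

score1 = [0.5, 1.5, 2.0]
k = 3.7
score2 = [k * v for v in score1]

# proportional scores ⟹ identical policies
assert all(abs(a - b) < 1e-12 for a, b in zip(policy(score1), policy(score2)))
# converse: identical policies force proportionality with k = total₂ / total₁
k_recovered = sum(score2) / sum(score1)
assert all(abs(s2 - k_recovered * s1) < 1e-12 for s1, s2 in zip(score1, score2))
```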

                                        Alternative Forms of Axiom 1 #

Luce (1959) proves three equivalent formulations of the choice axiom:

                                        (a) Ratio form: There exists a positive function v such that P(x, T) = v(x) / Σ_{y∈T} v(y) for all x ∈ T.

                                        (b) Product rule: P(x, T) = P(x, S) · P(S, T) for x ∈ S ⊆ T, where P(S, T) = Σ_{y∈S} P(y, T).

                                        (c) Pairwise independence: The odds ratio P(x,{x,y}) · P(y,T) = P(y,{x,y}) · P(x,T) — pairwise ratios are preserved in any superset.

                                        The ratio form (a) is the definition of RationalAction; (a)→(b) is product_rule and (a)→(c) is pChoice_ratio.
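The three forms can be exercised on a concrete choice function built from a ratio scale v; the standalone Python sketch below (independent of the Lean development) checks (a)→(b) and (a)→(c) numerically:

```python
from itertools import combinations

v = {'a': 1.0, 'b': 2.5, 'c': 0.7, 'd': 1.8}   # a ratio scale

def prob(T, x):
    # form (a): P(x, T) = v(x) / Σ_{y∈T} v(y), zero outside T
    if x not in T:
        return 0.0
    return v[x] / sum(v[y] for y in T)

items = list(v)
T = set(items)
# form (b), product rule: P(x, T) = P(x, S) · Σ_{y∈S} P(y, T) for x ∈ S ⊆ T
for size in (2, 3):
    for S in (set(c) for c in combinations(items, size)):
        for x in S:
            assert abs(prob(T, x) - prob(S, x) * sum(prob(T, y) for y in S)) < 1e-12
# form (c), pairwise IIA: P(x,T) · P(y,{x,y}) = P(y,T) · P(x,{x,y})
for x, y in combinations(items, 2):
    assert abs(prob(T, x) * prob({x, y}, y) - prob(T, y) * prob({x, y}, x)) < 1e-12
```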

                                        structure Core.ChoiceFn (A : Type u_4) [DecidableEq A] :
                                        Type u_4

                                        A choice function on finite subsets — the canonical Luce 1959 form (Definition 1, p. 5: "P(a, T) … is a probability distribution on T").

                                        For each non-empty subset T ⊆ A, prob T : A → ℝ is a probability distribution: non-negative, vanishes outside T, sums to 1 inside. Specializes to:

                                        • Binary choice via cf.binary x y := cf.prob {x, y} x (see ChoiceFn.binary). Complementarity follows from prob_sum_eq_one (see binary_complement); no separate BinaryChoiceFn structure needed.
                                        • Mathlib PMF when restricted to a fixed support: prob T : A → ℝ is a probability distribution conditioned on choice from T.

                                        The Luce axioms (hasRatioScale, hasProductRule, hasPairwiseIIA) are stated as properties below, asserting structural facts about which choice functions admit ratio-scale representations.

• prob : Finset A → A → ℝ

                                          P(a, T): probability of choosing a from set T

• prob_nonneg (T : Finset A) (a : A) : 0 ≤ self.prob T a

                                          Probabilities are non-negative

• prob_zero_outside (T : Finset A) (a : A) : a ∉ T → self.prob T a = 0

                                          Zero probability outside the choice set

• prob_sum_eq_one (T : Finset A) : T.Nonempty → ∑ a ∈ T, self.prob T a = 1

                                          Probabilities sum to 1 within the choice set (Luce 1959 Def 1)

                                        Instances For
def Core.ChoiceFn.binary {A : Type u_4} [DecidableEq A] (cf : ChoiceFn A) (x y : A) : ℝ

                                          Binary choice view: binary cf x y is the probability of choosing x over y in the binary forced choice between them.

                                          Defined via cf.prob {x, y} x. Replaces the legacy BinaryChoiceFn structure in UtilityTheory.lean (see binary_complement for complementarity).

                                          Equations
                                          Instances For
                                            theorem Core.ChoiceFn.binary_nonneg {A : Type u_4} [DecidableEq A] (cf : ChoiceFn A) (x y : A) :
0 ≤ cf.binary x y

                                            Binary choice probabilities are non-negative.

                                            theorem Core.ChoiceFn.binary_le_one {A : Type u_4} [DecidableEq A] (cf : ChoiceFn A) (x y : A) :
cf.binary x y ≤ 1

                                            Binary choice probabilities are at most 1.

theorem Core.ChoiceFn.binary_complement {A : Type u_4} [DecidableEq A] (cf : ChoiceFn A) {x y : A} (hxy : x ≠ y) :
                                            cf.binary x y + cf.binary y x = 1

                                            Binary complementarity: cf.binary x y + cf.binary y x = 1 when x ≠ y. Derived from prob_sum_eq_one over {x, y}.

def Core.ChoiceFn.hasRatioScale {A : Type u_3} [DecidableEq A] (cf : ChoiceFn A) : Prop

                                            Axiom 1, Form (a): ratio scale representation. There exists v > 0 such that P(x, T) = v(x) / Σ v(y).

                                            Equations
• cf.hasRatioScale = ∃ (v : A → ℝ), (∀ (a : A), 0 < v a) ∧ ∀ (T : Finset A), ∀ a ∈ T, cf.prob T a = v a / ∑ b ∈ T, v b
                                            Instances For
def Core.ChoiceFn.hasProductRule {A : Type u_3} [DecidableEq A] (cf : ChoiceFn A) : Prop

                                              Axiom 1, Form (b): product rule. P(x, T) = P(x, S) · Σ_{y∈S} P(y, T) for x ∈ S ⊆ T.

                                              Equations
• cf.hasProductRule = ∀ (S T : Finset A), S ⊆ T → S.Nonempty → ∑ b ∈ T, cf.prob T b = 1 → ∀ a ∈ S, cf.prob T a = cf.prob S a * ∑ b ∈ S, cf.prob T b
                                              Instances For
def Core.ChoiceFn.hasPairwiseIIA {A : Type u_3} [DecidableEq A] (cf : ChoiceFn A) : Prop

                                                Axiom 1, Form (c): pairwise independence (IIA). The odds ratio is preserved in any superset: P(x,T) · P(y,{x,y}) = P(y,T) · P(x,{x,y}).

                                                Equations
                                                Instances For
theorem Core.ratio_implies_product {A : Type u_3} [DecidableEq A] (cf : ChoiceFn A) (h : cf.hasRatioScale) :
cf.hasProductRule

                                                  (a) → (b): Ratio form implies product rule (Appendix 1).

theorem Core.ratio_implies_pairwiseIIA {A : Type u_3} [DecidableEq A] (cf : ChoiceFn A) (h : cf.hasRatioScale) :
cf.hasPairwiseIIA

                                                  (a) → (c): Ratio form implies pairwise IIA (Appendix 1).

theorem Core.pairwiseIIA_implies_ratio {A : Type u_3} [DecidableEq A] [Inhabited A] (cf : ChoiceFn A) (hIIA : cf.hasPairwiseIIA) (hpos : ∀ (T : Finset A), ∀ a ∈ T, 0 < cf.prob T a) :
cf.hasRatioScale

                                                  (c) → (a): Pairwise IIA implies ratio form (Appendix 1). The ratio scale is constructed by fixing a reference element x₀ and setting v(x) = P(x, {x, x₀}) / P(x₀, {x, x₀}). Requires strict positivity for elements in the choice set; normalization (probabilities sum to 1) is now structural (via prob_sum_eq_one).
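The construction in this proof can be replayed numerically: fixing a reference x₀ and setting v(x) = P(x, {x, x₀}) / P(x₀, {x, x₀}) recovers a ratio scale that reproduces every choice probability (standalone Python sketch, independent of the Lean development; the generating scale is hidden inside `prob`):

```python
from itertools import combinations

v = {'a': 1.0, 'b': 2.5, 'c': 0.7}   # hidden ratio scale generating the choices

def prob(T, x):
    return v[x] / sum(v[y] for y in T) if x in T else 0.0

# reconstruct the scale from binary choices against a reference element x₀
x0 = 'a'
v_rec = {x: prob({x, x0}, x) / prob({x, x0}, x0) for x in v}

def prob_rec(T, x):
    return v_rec[x] / sum(v_rec[y] for y in T) if x in T else 0.0

# the reconstructed scale reproduces all choice probabilities
items = list(v)
for size in (2, 3):
    for T in (set(c) for c in combinations(items, size)):
        for x in T:
            assert abs(prob(T, x) - prob_rec(T, x)) < 1e-12
```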

theorem Core.axiom1_ratio_iff_pairwiseIIA {A : Type u_3} [DecidableEq A] [Inhabited A] (cf : ChoiceFn A) (hpos : ∀ (T : Finset A), ∀ a ∈ T, 0 < cf.prob T a) :
cf.hasRatioScale ↔ cf.hasPairwiseIIA

                                                  Axiom 1 equivalence (Appendix 1): Ratio form ↔ pairwise IIA (under strict positivity; normalization is now structural via prob_sum_eq_one).