Rational Action @cite{luce-1959} #
@cite{cover-thomas-2006} @cite{zaslavsky-hu-levy-2020} @cite{adams-messick-1958} The mathematical foundation for all soft-rational agents: RSA speakers/listeners, BToM agents, and decision-theoretic actors.
Architecture #
A RationalAction agent selects actions with probability proportional to a
non-negative score function — the Luce choice rule. This is the
unique choice rule satisfying IIA (independence of irrelevant alternatives):
the relative probability of two actions depends only on their scores.
The key mathematical results characterizing this choice rule are:
- Softmax (§2): The exponential parameterization score = exp(α · utility) gives policy = softmax(utility, α). This is the standard form in RSA.
- Gibbs Variational Principle (§3): Softmax uniquely maximizes H(p) + α · ⟨p, s⟩ on the probability simplex. This is the mathematical foundation for RSA convergence.
- Maximum Entropy (§4): Softmax is the max-entropy distribution subject to an expected-utility constraint. Equivalently, it minimizes free energy (the Boltzmann distribution from statistical mechanics).
- Bayesian Optimality (§5): The Bayesian posterior maximizes expected log-likelihood. This is the listener half of RSA convergence.
A rational action agent selects actions with probability ∝ score(state, action).
This is the Luce choice rule. The score function is unnormalized;
normalization to a proper distribution is derived (see policy).
Key instances:
- RSA L0: score(u, w) = prior(w) · meaning(u, w)
- RSA S1: score(w, u) = rpow(informativity(w, u), α) · exp(-α · cost(u))
- BToM agent: score(state, action) = exp(β · Q(state, action))
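The Luce rule can be sanity-checked numerically. The following Python sketch (outside the Lean development; the prior and meaning values are made up for illustration) normalizes an L0-style score into a policy and confirms scale invariance (Theorem 5):

```python
def policy(scores):
    """Normalize non-negative scores into a Luce-rule policy."""
    total = sum(scores)
    if total == 0:
        return [0.0 for _ in scores]  # convention: zero policy when totalScore = 0
    return [v / total for v in scores]

# An L0-style score: prior(w) * meaning(u, w) for a fixed utterance
# (hypothetical numbers; the utterance is true in worlds 0 and 1 only).
prior = [0.5, 0.3, 0.2]
meaning = [1.0, 1.0, 0.0]
scores = [p * m for p, m in zip(prior, meaning)]
p = policy(scores)
# Scale invariance: multiplying all scores by k > 0 leaves the policy unchanged.
p_scaled = policy([3.0 * v for v in scores])
```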
- score : State → Action → ℝ
Unnormalized score: policy(a|s) ∝ score(s, a).
Scores are non-negative.
Instances For
Total score across all actions in a state (normalization constant).
Equations
- ra.totalScore s = ∑ a : A, ra.score s a
Instances For
Normalized policy: P(a|s) = score(s,a) / Σ_a' score(s,a'). Returns 0 for all actions if totalScore = 0.
Equations
- ra.policy s a = if ra.totalScore s = 0 then 0 else ra.score s a / ra.totalScore s
Instances For
Policy sums to 1 when totalScore is nonzero.
Zero score implies zero policy, regardless of whether totalScore is zero.
Policy respects score equality: equal scores → equal probabilities.
Follows directly from the Luce rule: both sides are score / totalScore
with the same denominator.
When totalScore equals the score of action a, the policy for a is 1.
Used by the compositional proof builder when all other scores are zero,
so totalScore = score a + 0 + ... + 0 = score a, making policy = 1.
Score ordering implies ¬(policy strict ordering). Used by compositional proof builder for ¬(L1 w₁ < L1 w₂) goals.
Strict policy monotonicity: strictly higher score → strictly higher probability.
Used by rsa_decide to eliminate shared denominator computations: when
comparing policy s a₁ < policy s a₂ (same state), it suffices to show
score s a₁ < score s a₂, skipping the expensive totalScore computation
in the proof term.
Iff combining policy_lt_of_score_lt with the contrapositive of
policy_monotone: at the same state, policy strict ordering coincides
with score strict ordering.
Cross-state policy comparison: compares policy values at different states (different denominators). Used for S2 cross-world comparisons where S2(u|w₁) vs S2(u|w₂) have different normalization constants.
The cross-product condition score(s₁,a) * total(s₂) < score(s₂,a) * total(s₁)
is equivalent to score(s₁,a)/total(s₁) < score(s₂,a)/total(s₂) when both
totals are positive.
Cross-state policy comparison with positivity derived from the cross-product.
Like policy_lt_cross but derives the totalScore > 0 conditions from the
cross-product inequality itself: if 0 ≤ score(s₁,a) * total(s₂) < score(s₂,a) * total(s₁),
then score(s₂,a) * total(s₁) > 0, so both score(s₂,a) > 0 and total(s₁) > 0.
And score(s₂,a) ≤ total(s₂), so total(s₂) > 0.
Used by rsa_predict for cross-utterance L1 comparisons where the two sides
have different normalization constants.
Score-sum ordering implies policy-sum ordering when both sides share the same
state (same denominator). Used by rsa_predict for marginal L1 comparisons
where the worlds being summed differ but the utterance and config are shared.
Finset-sum ordering implies policy-sum ordering when both sides share the
same state (same denominator). Like policy_list_sum_lt but for Finset.sum.
Derives totalScore positivity from the score ordering itself, so no extra hypothesis is needed: if 0 ≤ Σ_{F₁} score < Σ_{F₂} score, then some score is positive, so totalScore > 0.
Used by rsa_predict for denominator cancellation in marginal comparisons.
Luce's Choice Axiom #
Luce (1959) showed that the ratio rule P(a|s) = v(a)/Σv(b) is
characterized by the independence of irrelevant alternatives (IIA): the
relative probability of two actions depends only on their scores, not on what
other actions are available.
We formalize:
- The constant ratio rule (Theorem 2): policy(a₁) · score(a₂) = policy(a₂) · score(a₁)
- Choice from subsets (pChoice): restriction of the choice rule to a Finset
- IIA (Axiom 1): ratios in any subset equal score ratios
- The product rule (Theorem 1): P(a, T) = P(a, S) · P(S, T) for S ⊆ T
- Scale invariance (Theorem 5): multiplying all scores by k > 0 preserves policy
- Uniqueness (Theorem 4, forward): proportional scores yield the same policy
Constant Ratio Rule (Theorem 2):
policy(a₁) · score(a₂) = policy(a₂) · score(a₁).
Equivalently, the odds ratio policy(a₁)/policy(a₂) equals score(a₁)/score(a₂).
Choice probability from a subset: pChoice(a, T) = score(a) / Σ_{b∈T} score(b).
Returns 0 if a ∉ T or the subset total is 0.
Equations
Instances For
pChoice is non-negative.
pChoice sums to 1 over the subset when the subset total is nonzero.
IIA core: the ratio of pChoice values in any subset equals the score ratio.
For a₁, a₂ ∈ T with score(a₂) > 0:
pChoice(a₁, T) · score(a₂) = pChoice(a₂, T) · score(a₁) (Axiom 1).
IIA (Axiom 1): P(a, S) = P(a, T) / Σ_{b∈S} P(b, T) for S ⊆ T.
Choice probability from a subset is the conditional probability.
Product rule (Theorem 1):
P(a, T) = P(a, S) · P(S, T) for a ∈ S ⊆ T,
where P(S, T) = Σ_{b∈S} score(b) / Σ_{b∈T} score(b).
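A quick numerical check of the product rule (Python, outside the Lean development; the score function v is hypothetical):

```python
# Product rule (Theorem 1): P(a, T) = P(a, S) * P(S, T) for a ∈ S ⊆ T,
# where P(S, T) = Σ_{b∈S} v(b) / Σ_{b∈T} v(b).
v = {"a": 2.0, "b": 3.0, "c": 5.0}  # made-up positive scores

def p_choice(a, T):
    return v[a] / sum(v[b] for b in T)

S, T = {"a", "b"}, {"a", "b", "c"}
p_ST = sum(v[b] for b in S) / sum(v[b] for b in T)
lhs = p_choice("a", T)           # choose a directly from T
rhs = p_choice("a", S) * p_ST    # first narrow to S, then choose a from S
```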
Scale all scores by a positive constant k.
Instances For
Scale invariance (Theorem 5): scaling scores by k > 0 preserves policy.
Uniqueness (forward direction, Theorem 4):
If scores are proportional (score'(s,a) = k · score(s,a) for some k > 0),
then both agents have the same policy.
Softmax Function #
The softmax function σ(s, α)ᵢ = exp(α · sᵢ) / Σⱼ exp(α · sⱼ) is the
exponential parameterization of the Luce choice rule. Following Franke & Degen
(submitted), we establish Facts 1–8.
The softmax function: softmax(s, α)ᵢ = exp(α · sᵢ) / Σⱼ exp(α · sⱼ).
Equations
- Core.softmax s α i = Real.exp (α * s i) / ∑ j : ι, Real.exp (α * s j)
Instances For
The partition function (normalizing constant) Z = Σⱼ exp(α · sⱼ).
Equations
- Core.partitionFn s α = ∑ j : ι, Real.exp (α * s j)
Instances For
Log-sum-exp: log of partition function.
Equations
- Core.logSumExp s α = Real.log (∑ j : ι, Real.exp (α * s j))
Instances For
The partition function is always positive.
Each softmax probability is positive. (Fact 1, part 1)
Softmax probabilities sum to 1. (Fact 1, part 2)
Softmax is non-negative.
Softmax is at most 1.
Fact 2: Odds are determined by score differences: pᵢ/pⱼ = exp(α(sᵢ - sⱼ)).
Log-odds equal scaled score difference.
Ratio form of Fact 2.
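Fact 2 is easy to confirm numerically. A Python sketch (outside the Lean development, with arbitrary scores):

```python
import math

def softmax(s, alpha):
    z = sum(math.exp(alpha * x) for x in s)
    return [math.exp(alpha * x) / z for x in s]

s = [1.0, 0.25, -0.5]  # arbitrary scores for illustration
alpha = 2.0
p = softmax(s, alpha)
# Fact 2: p_i / p_j = exp(alpha * (s_i - s_j)); the partition function cancels.
odds = p[0] / p[1]
predicted = math.exp(alpha * (s[0] - s[1]))
```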
The logistic (sigmoid) function: S(x) = 1 / (1 + exp(−x)).
Equations
- Core.logistic x = 1 / (1 + Real.exp (-x))
Instances For
The logit function: L(p) = log(p / (1 − p)).
Inverse of logistic on (0, 1).
Equations
- Core.logit p = Real.log (p / (1 - p))
Instances For
Fact 3: For n = 2, softmax reduces to logistic.
Softmax log-odds equals logit of the binary softmax probability
(when there are exactly two alternatives).
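Fact 3 and the logit inverse can be illustrated with a small Python check (outside the Lean development, arbitrary scores):

```python
import math

def logistic(x):
    return 1 / (1 + math.exp(-x))

def softmax2(s0, s1, alpha):
    """Binary softmax: probability of the first alternative."""
    z = math.exp(alpha * s0) + math.exp(alpha * s1)
    return math.exp(alpha * s0) / z

# Fact 3: for n = 2, softmax(s, alpha)_0 = logistic(alpha * (s0 - s1)),
# and the logit of that probability recovers the scaled score difference.
s0, s1, alpha = 0.7, -0.2, 1.5
p0 = softmax2(s0, s1, alpha)
```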
Fact 6: Softmax is translation invariant.
Fact 8: Multiplicative scaling can be absorbed into α.
At α = 0, softmax is uniform.
Log of softmax = score minus log partition function.
Softmax is an exponential family distribution.
Luce choice with rpow scores equals softmax over log scores.
f(i)^α / Σⱼ f(j)^α = softmax(log ∘ f, α)(i) when all f(i) > 0.
This is the general identity connecting belief-based RSA (which uses
rpow) to the softmax framework (which uses exp). Every S1 model with
s1Score = rpow(l0, α) inherits all softmax limit theorems via this
identity: as α → ∞, rpow-based Luce choice concentrates on the
argmax of f, i.e., the most informative utterance.
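The rpow/softmax identity can be verified numerically. A Python sketch (outside the Lean development; f stands in for an L0 listener distribution):

```python
import math

def softmax(s, alpha):
    z = sum(math.exp(alpha * x) for x in s)
    return [math.exp(alpha * x) / z for x in s]

# f(i)^alpha / sum_j f(j)^alpha = softmax(log ∘ f, alpha)_i for positive f,
# since exp(alpha * log f(i)) = f(i)^alpha.
f = [0.5, 0.3, 0.2]  # made-up positive values, e.g. L0 probabilities
alpha = 4.0
luce = [x**alpha / sum(y**alpha for y in f) for x in f]
via_softmax = softmax([math.log(x) for x in f], alpha)
```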
Why Softmax? The Fechnerian Characterization #
The exponential parameterization score = exp(α · utility) is not a design
choice — it is the unique transformation connecting Luce's ratio scale to
a utility (interval) scale (§2.A; @cite{adams-messick-1958}).
Ratio vs interval scales. Luce's Axiom 1 (IIA) yields a ratio scale
v: only ratios v(a)/v(b) are meaningful (Theorem 4). Fechner's
psychophysics requires an interval scale u: only differences
u(a) - u(b) are meaningful. The question: how are v and u related?
Derivation. From P(a,b) = v(a)/(v(a)+v(b)), the odds ratio is
v(a)/v(b) = g(u(a) - u(b)) for some function g. Transitivity of ratios
(v(a)/v(c) = [v(a)/v(b)] · [v(b)/v(c)]) forces Cauchy's multiplicative
functional equation: g(s + t) = g(s) · g(t). The unique monotone increasing
solution is g(s) = exp(k · s) (cauchy_mul_exp), giving:
- v(a) = C · exp(k · u(a)) — the ratio scale IS the exponential of utility
- P(a, b) = 1/(1 + exp(-k · (u(a) - u(b)))) — the logistic function
- For n alternatives: P(a|S) = exp(k · u(a)) / Σ_b exp(k · u(b)) — softmax
The parameter k > 0 is the rationality parameter α in RSA.
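The derivation can be sanity-checked numerically: with v = C · exp(k · u), the Luce binary choice is exactly logistic in the utility difference, and C cancels. A Python sketch (outside the Lean development; C, k, and the utilities are made up):

```python
import math

# With v(a) = C * exp(k * u(a)), the binary Luce choice
# P(a, b) = v(a) / (v(a) + v(b)) is logistic in u(a) - u(b).
C, k = 3.0, 1.7
u = {"a": 0.9, "b": 0.1}
v = {x: C * math.exp(k * u[x]) for x in u}
p_ab = v["a"] / (v["a"] + v["b"])
logistic_form = 1 / (1 + math.exp(-k * (u["a"] - u["b"])))
# The multiplicative constant C cancels in the ratio:
p_ab_rescaled = (10 * v["a"]) / (10 * v["a"] + 10 * v["b"])
```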
Cauchy's multiplicative functional equation (classical):
If g : ℝ → ℝ satisfies g(s + t) = g(s) · g(t) and is strictly
monotone increasing, then g(s) = exp(k · s) for some k > 0.
The proof reduces to the additive Cauchy equation via log: setting
h = log ∘ g, the multiplicative equation becomes h(s+t) = h(s) + h(t).
The key lemma (cauchy_monotone_additive_linear) shows that a strictly
monotone additive function must be linear, by density of ℚ in ℝ:
h agrees with x ↦ k·x on rationals (by induction), and any
deviation on an irrational x would violate monotonicity via a
rational witness between x and h(x)/k.
Fechnerian uniqueness (§2.A; @cite{adams-messick-1958}):
If a ratio scale v and interval scale u represent the same
ordering via v(x)/v(y) = g(u(x) - u(y)) for a strictly monotone
multiplicative g, then v is the exponential of u.
This is WHY fromSoftmax uses exp(α · utility): the exponential
is forced by the requirement that log-odds be linear in utility
differences. It is the unique bridge between Luce's ratio scale
(Chapter 1) and Fechner's interval scale (Chapter 2).
Construct a RationalAction from a utility function via softmax.
The score is exp(α · utility(s, a)), so policy = softmax(utility, α).
The exponential parameterization is forced by the Fechnerian characterization
(luce_fechnerian_exp): it is the unique bridge from Luce's ratio scale
to an additive utility scale.
Equations
- Core.RationalAction.fromSoftmax utility α = { score := fun (s : S) (a : A) => Real.exp (α * utility s a), score_nonneg := ⋯ }
Instances For
The policy of a softmax agent equals the softmax function.
Gibbs Variational Principle #
The softmax distribution uniquely maximizes entropy + expected score on the probability simplex. This is the mathematical foundation for RSA convergence (@cite{zaslavsky-hu-levy-2020}, Proposition 1).
Proof strategy #
The Gibbs VP reduces to KL non-negativity via three identities:
- H(p) + KL(p‖q) = -∑ pᵢ log qᵢ (negMulLog + KL term telescope)
- -∑ pᵢ log qᵢ = -α⟨p,s⟩ + log Z (substitute log qᵢ = α sᵢ - log Z)
- H(q) + α⟨q,s⟩ = log Z (softmax self-information)
Combining: H(p) + α⟨p,s⟩ + KL = log Z = H(q) + α⟨q,s⟩, so KL ≥ 0 ⟹ LHS ≤ RHS.
The KL machinery used here — klFinite, kl_eq_sum_klFun, kl_nonneg —
is inlined as private (ι→ℝ) helpers in this file (mathlib gap; the PMF
form lives at PMF.klDiv / PMF.toReal_klDiv_eq_sum_log_div).
The per-meaning speaker objective: f(s) = Σᵤ [negMulLog(sᵤ) + α · sᵤ · vᵤ].
This is the function that the speaker maximizes for each meaning m, where vᵤ = log L(m|u) is the listener utility of utterance u.
Equations
- Core.speakerObj v α s = ∑ u : ι, ((s u).negMulLog + α * s u * v u)
Instances For
The softmax achieves f(s*) = log Z, where Z is the partition function.
Gibbs Variational Principle: softmax maximizes entropy + expected score.
For any distribution p on ι and scores s : ι → ℝ: H(p) + α·⟨p, s⟩ ≤ H(q) + α·⟨q, s⟩ where q = softmax(s, α) and H(p) = Σ negMulLog(pᵢ).
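The variational inequality can be probed numerically: softmax attains the objective value log Z, and random distributions never exceed it. A Python sketch (outside the Lean development; scores and α are arbitrary):

```python
import math, random

def softmax(s, alpha):
    z = sum(math.exp(alpha * x) for x in s)
    return [math.exp(alpha * x) / z for x in s]

def objective(p, s, alpha):
    # H(p) + alpha * <p, s>, with the convention 0 * log 0 = 0
    h = -sum(x * math.log(x) for x in p if x > 0)
    return h + alpha * sum(pi * si for pi, si in zip(p, s))

s, alpha = [0.3, -0.1, 0.8], 2.0
q = softmax(s, alpha)
best = objective(q, s, alpha)  # should equal log Z

random.seed(0)
for _ in range(1000):
    w = [random.random() + 1e-12 for _ in s]
    p = [x / sum(w) for x in w]
    assert objective(p, s, alpha) <= best + 1e-9  # no random p does better

log_z = math.log(sum(math.exp(alpha * x) for x in s))
```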
Softmax Concentration at High Rationality #
As the rationality parameter α → ∞, softmax concentrates all probability mass on the action with highest utility — i.e., softmax converges to argmax. This connects:
- MaxEnt phonology ↔ OT: a MaxEnt grammar with weights w becomes categorical (OT-like) as temperature → 0 (equivalently α → ∞).
- RSA ↔ neo-Gricean pragmatics: a soft-rational RSA speaker becomes a hard-rational Gricean reasoner in the α → ∞ limit.
Proof sketch #
From softmax_odds, we have σᵢ / σⱼ = exp(α(sᵢ − sⱼ)). When sᵢ > sⱼ,
this ratio → ∞ as α → ∞, so σⱼ / σᵢ → 0. Since Σ σₖ = 1, the maximizer's
probability → 1 by squeezing: 1 - σ_max = Σ_{k≠max} σₖ, and each non-maximal
term → 0 (bounded by exp(-α · gap) where gap = sᵢ - sⱼ > 0).
Softmax → argmax limit: as α → ∞, softmax concentrates on the unique
maximizer. For any i_max with s i_max strictly greater than all other
scores, softmax(s, α)_{i_max} → 1.
This is the formal connection between MaxEnt (soft optimization) and OT (hard optimization): OT is the α → ∞ limit of MaxEnt.
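The concentration behavior shows up immediately in a numerical sketch (Python, outside the Lean development; scores are made up, with a unique maximizer):

```python
import math

def softmax(s, alpha):
    m = max(s)  # shift by the max for numerical stability (Fact 6: translation invariance)
    z = sum(math.exp(alpha * (x - m)) for x in s)
    return [math.exp(alpha * (x - m)) / z for x in s]

s = [0.2, 1.0, 0.5]  # unique maximizer at index 1
# As alpha grows, the maximizer's probability increases toward 1;
# the non-maximal terms decay like exp(-alpha * gap).
probs = [softmax(s, a)[1] for a in (1.0, 10.0, 100.0)]
```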
Entropy + softmax interactions #
The (ι→ℝ)-typed Shannon entropy lives in Core/InformationTheory.lean as
entropy : Finset α → (α → ℝ) → ℝ. The PMF-typed
canonical form is PMF.entropy : PMF α → ℝ; the two agree by definition
on (ofRealWeightFn p).toRealFn = p for normalized p (see
PMF.ofRealWeightFn_toRealFn_eq). This section uses the (ι→ℝ) form because
softmax is a real-arithmetic construction; consumers wanting the PMF form
can wrap via PMF.ofRealWeightFn.
Maximum entropy is achieved by uniform distribution.
Proof: KL(p ‖ uniform) ≥ 0, and KL(p ‖ uniform) = log n - H(p).
Entropy of softmax: H(softmax(s, α)) = log Z - α · 𝔼[s].
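A numerical check of this identity (Python, outside the Lean development; scores and α are arbitrary):

```python
import math

# H(softmax(s, alpha)) = log Z - alpha * E[s], since log q_i = alpha*s_i - log Z.
s, alpha = [0.4, -0.3, 1.1], 1.5
z = sum(math.exp(alpha * x) for x in s)
q = [math.exp(alpha * x) / z for x in s]
entropy = -sum(p * math.log(p) for p in q)
expected_s = sum(p * x for p, x in zip(q, s))
```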
Entropy-regularized objective: G_α(p, s) = ⟨s, p⟩ + (1/α) H(p).
Equations
- Core.entropyRegObjective s α p = ∑ i : ι, p i * s i + 1 / α * Core.entropy✝ Finset.univ p
Instances For
The maximum value of the entropy-regularized objective.
Fact 5: Softmax maximizes the entropy-regularized objective.
Proof: gibbs_variational gives H(p) + α⟨p,s⟩ ≤ H(q) + α⟨q,s⟩;
dividing by α > 0 yields the result.
Softmax is the unique maximizer.
Proof: equality in the objective ⟹ KL(p ‖ softmax) = 0 (via speakerObj_plus_kl),
hence p = softmax (via kl_eq_zero_imp_eq).
Free energy (from statistical mechanics).
Equations
- Core.freeEnergy s α p = -∑ i : ι, p i * s i - 1 / α * Core.entropy✝ Finset.univ p
Instances For
Softmax is the Boltzmann distribution: minimizes free energy.
The log-partition function is convex in α.
Proof: By Hölder's inequality. For 0 < a, b with a + b = 1:
∑ exp(x·sᵢ)^a · exp(y·sᵢ)^b ≤ (∑ exp(x·sᵢ))^a · (∑ exp(y·sᵢ))^b
Since exp(x·sᵢ)^a · exp(y·sᵢ)^b = exp((ax+by)·sᵢ), taking logs gives
logSumExp(s, ax+by) ≤ a·logSumExp(s, x) + b·logSumExp(s, y).
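A midpoint instance of the convexity inequality, checked numerically (Python, outside the Lean development; scores and evaluation points are arbitrary):

```python
import math

def log_sum_exp(s, alpha):
    return math.log(sum(math.exp(alpha * x) for x in s))

# Midpoint convexity: logSumExp(s, (x+y)/2) <= (logSumExp(s, x) + logSumExp(s, y)) / 2.
s = [0.5, -1.0, 2.0]
x, y = -1.0, 3.0
mid = log_sum_exp(s, (x + y) / 2)
avg = (log_sum_exp(s, x) + log_sum_exp(s, y)) / 2
```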
Derivative of log-partition gives expected value:
d/dα log(Σ exp(α sᵢ)) = Σ softmax(s,α)ᵢ · sᵢ.
Proof via chain rule on log ∘ Σ exp(α · sᵢ), then hasDerivAt_finset_sum.
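The derivative identity can be checked against a central finite difference (Python, outside the Lean development; scores, α, and the step size are arbitrary):

```python
import math

def log_sum_exp(s, alpha):
    return math.log(sum(math.exp(alpha * x) for x in s))

def softmax(s, alpha):
    z = sum(math.exp(alpha * x) for x in s)
    return [math.exp(alpha * x) / z for x in s]

# d/dalpha log(sum exp(alpha * s_i)) = sum_i softmax(s, alpha)_i * s_i
s, alpha, h = [0.2, 1.3, -0.7], 0.8, 1e-6
numeric = (log_sum_exp(s, alpha + h) - log_sum_exp(s, alpha - h)) / (2 * h)
analytic = sum(p * x for p, x in zip(softmax(s, alpha), s))
```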
Partition function with per-element offsets: Z(α) = Σⱼ exp(α · sⱼ + rⱼ).
Equations
- Core.partitionFnOffset s r α = ∑ j : ι, Real.exp (α * s j + r j)
Instances For
Log-sum-exp with offsets: log Σ exp(α · sᵢ + rᵢ).
Equations
- Core.logSumExpOffset s r α = Real.log (Core.partitionFnOffset s r α)
Instances For
Softmax with offsets: exp(α · sᵢ + rᵢ) / Z(α).
Equations
- Core.softmaxOffset s r α i = Real.exp (α * s i + r i) / Core.partitionFnOffset s r α
Instances For
Derivative of offset log-partition gives weighted expected value:
d/dα log(Σ exp(α·sᵢ + rᵢ)) = Σ softmaxOffset(s,r,α)ᵢ · sᵢ.
The offset log-partition function is convex in α.
Same Hölder argument as logSumExp_convex: the key factoring
exp((ax+by)·sᵢ + rᵢ) = exp(x·sᵢ + rᵢ)^a · exp(y·sᵢ + rᵢ)^b
holds because a + b = 1, absorbing the offset.
The log conditional likelihood α ↦ (α · sᵧ + rᵧ) − logSumExpOffset(s,r,α)
is concave (affine minus convex).
The derivative of log conditional likelihood α ↦ (α·sᵧ + rᵧ) − logSumExpOffset
is the observed feature value minus the expected value:
d/dα = sᵧ − Σᵢ softmaxOffset(s,r,α)ᵢ · sᵢ.
Strong duality: max entropy = min free energy.
Bayesian optimality: The normalized posterior maximizes weighted log-likelihood on the probability simplex.
For weights wᵢ ≥ 0 with C = Σwᵢ > 0, and any distribution L on the simplex (Lᵢ > 0, Σ Lᵢ = 1), the normalized posterior p*ᵢ = wᵢ/C satisfies:
Σᵢ wᵢ · log(Lᵢ) ≤ Σᵢ wᵢ · log(p*ᵢ)
This is used for listener optimality: with wᵢ = P(m)·S(u|m), the Bayesian listener L*(m|u) = wᵢ/C maximizes Σ_m P(m)·S(u|m)·log L(m|u).
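The listener-optimality claim can be probed numerically: no random distribution beats the normalized posterior on the weighted log-likelihood. A Python sketch (outside the Lean development; the weights are made up):

```python
import math, random

# The normalized posterior p*_i = w_i / C maximizes sum_i w_i * log(L_i)
# over distributions L on the simplex.
w = [0.6, 0.3, 0.1]  # e.g. P(m) * S(u|m) for each meaning m (hypothetical)
C = sum(w)
p_star = [x / C for x in w]
best = sum(wi * math.log(pi) for wi, pi in zip(w, p_star))

random.seed(1)
for _ in range(1000):
    r = [random.random() + 1e-12 for _ in w]
    L = [x / sum(r) for x in r]
    assert sum(wi * math.log(Li) for wi, Li in zip(w, L)) <= best + 1e-9
```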
Uniqueness of the Ratio Scale #
Theorem 4 (proved earlier as policy_eq_of_proportional) shows that
proportional scores yield the same policy. The converse — that identical
policies force proportional scores — completes the uniqueness characterization:
same policy ↔ proportional scores, with proportionality constant
k = totalScore₂/totalScore₁.
Note: This is distinct from the "Independence-of-Unit Condition" in §1.F (pp. 28–33), which concerns state-dependent transformations f satisfying f(kv) = kf(v). That condition addresses how scale values transform across experimental conditions, not the uniqueness of the scale within a condition.
Converse of Theorem 4 (uniqueness of ratio scale):
If two agents with positive total scores have the same policy,
then their scores are proportional with constant k = total₂/total₁.
Full uniqueness characterization (Theorem 4 and its converse): Two agents with positive total scores have the same policy if and only if their scores are proportional.
Alternative Forms of Axiom 1 #
Appendix 1 of Luce (1959) proves three equivalent formulations of the choice axiom:
(a) Ratio form: There exists a positive function v such that
P(x, T) = v(x) / Σ_{y∈T} v(y) for all x ∈ T.
(b) Product rule: P(x, T) = P(x, S) · P(S, T) for x ∈ S ⊆ T,
where P(S, T) = Σ_{y∈S} P(y, T).
(c) Pairwise independence: The odds ratio P(x,{x,y}) · P(y,T) = P(y,{x,y}) · P(x,T) — pairwise ratios are preserved in any superset.
The ratio form (a) is the definition of RationalAction; (a)→(b) is
product_rule and (a)→(c) is pChoice_ratio.
A choice function on finite subsets — the canonical Luce 1959 form (Definition 1, p. 5: "P(a, T) … is a probability distribution on T").
For each non-empty subset T ⊆ A, prob T : A → ℝ is a probability
distribution: non-negative, vanishes outside T, sums to 1 inside.
Specializes to:
- Binary choice via cf.binary x y := cf.prob {x, y} x (see ChoiceFn.binary). Complementarity follows from prob_sum_eq_one (see binary_complement); no separate BinaryChoiceFn structure needed.
- Mathlib PMF when restricted to a fixed support: prob T : A → ℝ is a probability distribution conditioned on choice from T.
The Luce axioms (hasRatioScale, hasProductRule, hasPairwiseIIA)
are stated as properties below, asserting structural facts about which
choice functions admit ratio-scale representations.
- prob : Finset A → A → ℝ — P(a, T): probability of choosing a from set T
- Probabilities are non-negative
- Zero probability outside the choice set
- prob_sum_eq_one (T : Finset A) : T.Nonempty → ∑ a ∈ T, self.prob T a = 1 — probabilities sum to 1 within the choice set (Luce 1959 Def 1)
Instances For
Binary choice view: binary cf x y is the probability of choosing
x over y in the binary forced choice between them.
Defined via cf.prob {x, y} x. Replaces the legacy BinaryChoiceFn
structure in UtilityTheory.lean (see binary_complement for
complementarity).
Instances For
Binary complementarity: cf.binary x y + cf.binary y x = 1
when x ≠ y. Derived from prob_sum_eq_one over {x, y}.
Axiom 1, Form (a): ratio scale representation.
There exists v > 0 such that P(x, T) = v(x) / Σ v(y).
Equations
- cf.hasRatioScale = ∃ (v : A → ℝ), (∀ (a : A), 0 < v a) ∧ ∀ (T : Finset A), ∀ a ∈ T, cf.prob T a = v a / ∑ b ∈ T, v b
Instances For
Axiom 1, Form (b): product rule.
P(x, T) = P(x, S) · Σ_{y∈S} P(y, T) for x ∈ S ⊆ T.
Equations
- cf.hasProductRule = ∀ (S T : Finset A), S ⊆ T → S.Nonempty → ∑ b ∈ T, cf.prob T b = 1 → ∀ a ∈ S, cf.prob T a = cf.prob S a * ∑ b ∈ S, cf.prob T b
Instances For
Axiom 1, Form (c): pairwise independence (IIA).
The odds ratio is preserved in any superset:
P(x,T) · P(y,{x,y}) = P(y,T) · P(x,{x,y}).
Equations
- cf.hasPairwiseIIA = ∀ (T : Finset A) (a b : A), a ∈ T → b ∈ T → cf.prob T a * cf.prob {a, b} b = cf.prob T b * cf.prob {a, b} a
Instances For
(a) → (b): Ratio form implies product rule (Appendix 1).
(a) → (c): Ratio form implies pairwise IIA (Appendix 1).
(c) → (a): Pairwise IIA implies ratio form (Appendix 1).
The ratio scale is constructed by fixing a reference element x₀ and
setting v(x) = P(x, {x, x₀}) / P(x₀, {x, x₀}). Requires strict positivity
for elements in the choice set; normalization (probabilities sum to 1)
is now structural (via prob_sum_eq_one).
Axiom 1 equivalence (Appendix 1):
Ratio form ↔ pairwise IIA (under strict positivity; normalization is
now structural via prob_sum_eq_one).