Documentation

Linglib.Core.Probability.Choice.RationalAction

Rational action #

[Luc59] [CT06] [ZHL21] [AM58]

The mathematical foundation for all soft-rational agents: RSA speakers and listeners, BToM agents, and decision-theoretic actors. A RationalAction agent selects actions with probability proportional to a non-negative score — the Luce choice rule, the unique rule satisfying independence of irrelevant alternatives (IIA): the relative probability of two actions depends only on their scores.

The exponential parameterization (softmax) lives in Core.Probability.LogitChoice, and its variational / information-theoretic / limiting theory in Core.Probability.SoftmaxTheory (re-exported here); this file is the agent-framing layer on top of those pure-math cores.

Main definitions #

RationalAction, RationalAction.policy — the Luce choice rule.
RationalAction.fromSoftmax — build an agent from a utility via exp(α · u), bridging the agent framing to the softmax core.
ChoiceFn and ChoiceFn.hasRatioScale / hasProductRule / hasPairwiseIIA — the choice-axiom forms.

Main results #

RationalAction.policy_eq_of_proportional, RationalAction.policy_eq_iff_proportional — the policy depends only on the ratio scale of the scores.
RationalAction.iia, RationalAction.product_rule — Luce's choice axiom for pChoice.
axiom1_ratio_iff_pairwiseIIA — equivalence of the choice-axiom forms.

Score-based agents (Luce choice rule) #

structure Core.RationalAction (State : Type u_1) (Action : Type u_2) [Fintype Action] :

Type (max u_1 u_2)

A rational action agent: selects actions with probability ∝ score(state, action).

This is the Luce choice rule. The score function is unnormalized; normalization to a proper distribution is derived (see policy).

Key instances:

RSA L0: score(u, w) = prior(w) · meaning(u, w)
RSA S1: score(w, u) = rpow(informativity(w, u), α) · exp(-α · cost(u))
BToM agent: score(state, action) = exp(β · Q(state, action))

score : State → Action → ℝ
Unnormalized score: policy(a|s) ∝ score(s, a).
score_nonneg (s : State) (a : Action) : 0 ≤ self.score s a
Scores are non-negative.

Instances For

noncomputable def Core.RationalAction.totalScore {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) :

ℝ

Total score across all actions in a state (normalization constant).

Equations

ra.totalScore s = ∑ a : A, ra.score s a

Instances For

theorem Core.RationalAction.totalScore_nonneg {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) :

0 ≤ ra.totalScore s

noncomputable def Core.RationalAction.policy {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) :

ℝ

Normalized policy: P(a|s) = score(s,a) / Σ_a' score(s,a'). Returns 0 for all actions if totalScore = 0.

Equations

ra.policy s a = if ra.totalScore s = 0 then 0 else ra.score s a / ra.totalScore s

Instances For

theorem Core.RationalAction.policy_nonneg {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) :

0 ≤ ra.policy s a

theorem Core.RationalAction.policy_sum_eq_one {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (h : ra.totalScore s ≠ 0) :

∑ a : A, ra.policy s a = 1

Policy sums to 1 when totalScore is nonzero.

theorem Core.RationalAction.policy_monotone {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) (h : ra.score s a₁ ≤ ra.score s a₂) :

ra.policy s a₁ ≤ ra.policy s a₂

Policy monotonicity: higher score → higher probability.

theorem Core.RationalAction.policy_eq_zero_of_score_eq_zero {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) (h : ra.score s a = 0) :

ra.policy s a = 0

Zero score implies zero policy, regardless of whether totalScore is zero.

theorem Core.RationalAction.policy_eq_of_score_eq {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) (h : ra.score s a₁ = ra.score s a₂) :

ra.policy s a₁ = ra.policy s a₂

Policy respects score equality: equal scores → equal probabilities. Follows directly from the Luce rule: both sides are score / totalScore with the same denominator.

theorem Core.RationalAction.policy_eq_one_of_totalScore_eq {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) (h_sum : ra.totalScore s = ra.score s a) (h_pos : 0 < ra.score s a) :

ra.policy s a = 1

When totalScore equals the score of action a, the policy for a is 1.

theorem Core.RationalAction.policy_not_lt_of_score_le {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) (h : ra.score s a₂ ≤ ra.score s a₁) :

¬ra.policy s a₁ < ra.policy s a₂

Score ordering implies the negation of the strict policy ordering.

theorem Core.RationalAction.policy_lt_of_score_lt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) (hlt : ra.score s a₁ < ra.score s a₂) :

ra.policy s a₁ < ra.policy s a₂

Strict policy monotonicity: strictly higher score gives strictly higher probability.

theorem Core.RationalAction.policy_lt_iff_score_lt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) :

ra.policy s a₁ < ra.policy s a₂ ↔ ra.score s a₁ < ra.score s a₂

Iff combining policy_lt_of_score_lt with the contrapositive of policy_monotone: at the same state, policy strict ordering coincides with score strict ordering.

theorem Core.RationalAction.policy_lt_cross {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s₁ s₂ : S) (a : A) (h_pos₁ : 0 < ra.totalScore s₁) (h_pos₂ : 0 < ra.totalScore s₂) (h_cross : ra.score s₁ a * ra.totalScore s₂ < ra.score s₂ a * ra.totalScore s₁) :

ra.policy s₁ a < ra.policy s₂ a

Cross-state policy comparison (states with different normalization constants): the cross-product score(s₁,a) · total(s₂) < score(s₂,a) · total(s₁) is equivalent to policy s₁ a < policy s₂ a when both totals are positive.

theorem Core.RationalAction.policy_lt_cross_of_cross_lt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s₁ s₂ : S) (a : A) (h_cross : ra.score s₁ a * ra.totalScore s₂ < ra.score s₂ a * ra.totalScore s₁) :

ra.policy s₁ a < ra.policy s₂ a

Cross-state policy comparison, with the totalScore > 0 hypotheses derived from the cross-product inequality itself (so no positivity hypothesis is needed).

theorem Core.RationalAction.policy_list_sum_lt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (as₁ as₂ : List A) (h : (List.map (ra.score s) as₁).sum < (List.map (ra.score s) as₂).sum) (htot : 0 < ra.totalScore s) :

(List.map (ra.policy s) as₁).sum < (List.map (ra.policy s) as₂).sum

Score-sum ordering implies policy-sum ordering when both sides share the same state (same denominator).

theorem Core.RationalAction.finset_sum_policy_lt_of_sum_score_lt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (F₁ F₂ : Finset A) (h : F₁.sum (ra.score s) < F₂.sum (ra.score s)) :

F₁.sum (ra.policy s) < F₂.sum (ra.policy s)

Finset-sum ordering implies policy-sum ordering when both sides share the same state (same denominator); the Finset.sum analogue of policy_list_sum_lt, with positivity derived from the score ordering.

Luce's choice axiom (IIA) #

[Luc59] showed that the ratio rule P(a|s) = v(a)/Σv(b) is characterized by the independence of irrelevant alternatives (IIA): the relative probability of two actions depends only on their scores, not on what other actions are available.

We formalize:

The constant ratio rule: policy(a₁) · score(a₂) = policy(a₂) · score(a₁)
Choice from subsets (pChoice): restriction of the choice rule to a Finset
IIA: ratios in any subset equal score ratios
The product rule: P(a,T) = P(a,S) · P(S,T) for S ⊆ T
Scale invariance: multiplying all scores by k > 0 preserves policy
Uniqueness (forward direction): proportional scores yield the same policy

theorem Core.RationalAction.policy_ratio {S : Type u_3} {A : Type u_4} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) :

ra.policy s a₁ * ra.score s a₂ = ra.policy s a₂ * ra.score s a₁

Constant ratio rule: policy(a₁) · score(a₂) = policy(a₂) · score(a₁). The odds ratio policy(a₁)/policy(a₂) = score(a₁)/score(a₂).

noncomputable def Core.RationalAction.pChoice {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (T : Finset A) (a : A) :

ℝ

Choice probability from a subset: pChoice(a, T) = score(a) / Σ_{b∈T} score(b). Returns 0 if a ∉ T or the subset total is 0.

Equations

ra.pChoice s T a = if a ∈ T then have subTotal := ∑ b ∈ T, ra.score s b; if subTotal = 0 then 0 else ra.score s a / subTotal else 0

Instances For

theorem Core.RationalAction.pChoice_univ {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (a : A) :

ra.pChoice s Finset.univ a = ra.policy s a

pChoice on the full set equals policy.

theorem Core.RationalAction.pChoice_nonneg {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (T : Finset A) (a : A) :

0 ≤ ra.pChoice s T a

pChoice is non-negative.

theorem Core.RationalAction.pChoice_sum_eq_one {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (T : Finset A) (hT : ∑ b ∈ T, ra.score s b ≠ 0) :

∑ a ∈ T, ra.pChoice s T a = 1

pChoice sums to 1 over the subset when the subset total is nonzero.

theorem Core.RationalAction.pChoice_ratio {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (T : Finset A) (a₁ a₂ : A) (h₁ : a₁ ∈ T) (h₂ : a₂ ∈ T) :

ra.pChoice s T a₁ * ra.score s a₂ = ra.pChoice s T a₂ * ra.score s a₁

IIA core: the ratio of pChoice values in any subset equals the score ratio. For a₁, a₂ ∈ T with score(a₂) > 0: pChoice(a₁, T) · score(a₂) = pChoice(a₂, T) · score(a₁).

theorem Core.RationalAction.iia {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (S' T : Finset A) (hST : S' ⊆ T) (a : A) (ha : a ∈ S') (hS_pos : ∑ b ∈ S', ra.score s b ≠ 0) (hT_pos : ∑ b ∈ T, ra.score s b ≠ 0) :

ra.pChoice s S' a = ra.pChoice s T a / ∑ b ∈ S', ra.pChoice s T b

IIA: P(a, S) = P(a, T) / Σ_{b∈S} P(b, T) for S ⊆ T. Choice probability from a subset is the conditional probability.

theorem Core.RationalAction.product_rule {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (S' T : Finset A) (hST : S' ⊆ T) (a : A) (ha : a ∈ S') (hS_pos : ∑ b ∈ S', ra.score s b ≠ 0) (hT_pos : ∑ b ∈ T, ra.score s b ≠ 0) :

ra.pChoice s T a = ra.pChoice s S' a * ((∑ b ∈ S', ra.score s b) / ∑ b ∈ T, ra.score s b)

Product rule: P(a, T) = P(a, S) · P(S, T) for a ∈ S ⊆ T, where P(S, T) = Σ_{b∈S} score(b) / Σ_{b∈T} score(b).

noncomputable def Core.RationalAction.scaleBy {S : Type u_3} {A : Type u_4} [Fintype A] (ra : RationalAction S A) (k : ℝ) (hk : 0 < k) :

RationalAction S A

Scale all scores by a positive constant k.

Equations

ra.scaleBy k hk = { score := fun (s : S) (a : A) => k * ra.score s a, score_nonneg := ⋯ }

Instances For

theorem Core.RationalAction.scaleBy_policy {S : Type u_3} {A : Type u_4} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) (k : ℝ) (hk : 0 < k) :

(ra.scaleBy k hk).policy s a = ra.policy s a

Scale invariance: scaling scores by k > 0 preserves policy.

theorem Core.RationalAction.policy_eq_of_proportional {S : Type u_3} {A : Type u_4} [Fintype A] (ra ra' : RationalAction S A) (s : S) (k : ℝ) (hk : 0 < k) (h : ∀ (a : A), ra'.score s a = k * ra.score s a) (a : A) :

ra'.policy s a = ra.policy s a

Uniqueness (forward direction): If scores are proportional (score'(s,a) = k · score(s,a) for some k > 0), then both agents have the same policy.

noncomputable def Core.RationalAction.fromSoftmax {S : Type u_1} {A : Type u_2} [Fintype A] (utility : S → A → ℝ) (α : ℝ) :

RationalAction S A

Construct a RationalAction from a utility function via softmax: the score is exp(α · utility(s, a)), so policy = softmax(utility, α).

Equations

Core.RationalAction.fromSoftmax utility α = { score := fun (s : S) (a : A) => Real.exp (α * utility s a), score_nonneg := ⋯ }

Instances For

theorem Core.RationalAction.fromSoftmax_policy_eq {S : Type u_1} {A : Type u_2} [Fintype A] [Nonempty A] (utility : S → A → ℝ) (α : ℝ) (s : S) (a : A) :

(fromSoftmax utility α).policy s a = softmax (α • utility s) a

The policy of a softmax agent equals the softmax function.

Uniqueness of the ratio scale #

policy_eq_of_proportional (proved earlier) shows that proportional scores yield the same policy. The converse — that identical policies force proportional scores — completes the uniqueness characterization: same policy ↔ proportional scores, with proportionality constant k = totalScore₂/totalScore₁.

This is distinct from the independence-of-unit condition, which concerns state-dependent transformations f satisfying f(kv) = kf(v) — how scale values transform across experimental conditions, not the uniqueness of the scale within a condition.

theorem Core.RationalAction.proportional_of_policy_eq {S : Type u_3} {A : Type u_4} [Fintype A] (ra₁ ra₂ : RationalAction S A) (s : S) (h₁ : 0 < ra₁.totalScore s) (h₂ : 0 < ra₂.totalScore s) (hpol : ∀ (a : A), ra₁.policy s a = ra₂.policy s a) (a : A) :

ra₂.score s a = ra₂.totalScore s / ra₁.totalScore s * ra₁.score s a

Converse direction (uniqueness of ratio scale): If two agents with positive total scores have the same policy, then their scores are proportional with constant k = total₂/total₁.

theorem Core.RationalAction.policy_eq_iff_proportional {S : Type u_3} {A : Type u_4} [Fintype A] (ra₁ ra₂ : RationalAction S A) (s : S) (h₁ : 0 < ra₁.totalScore s) (h₂ : 0 < ra₂.totalScore s) :

(∀ (a : A), ra₁.policy s a = ra₂.policy s a) ↔ ∃ (k : ℝ), 0 < k ∧ ∀ (a : A), ra₂.score s a = k * ra₁.score s a

Full uniqueness characterization: Two agents with positive total scores have the same policy if and only if their scores are proportional.

Alternative forms of the choice axiom #

[Luc59] proves three equivalent formulations of the choice axiom:

(a) Ratio form: There exists a positive function v such that P(x, T) = v(x) / Σ_{y∈T} v(y) for all x ∈ T.

(b) Product rule: P(x, T) = P(x, S) · P(S, T) for x ∈ S ⊆ T, where P(S, T) = Σ_{y∈S} P(y, T).

(c) Pairwise independence: The odds ratio P(x,{x,y}) · P(y,T) = P(y,{x,y}) · P(x,T) — pairwise ratios are preserved in any superset.

The ratio form (a) is the definition of RationalAction; (a)→(b) is product_rule and (a)→(c) is pChoice_ratio.

structure Core.ChoiceFn (A : Type u_4) [DecidableEq A] :

Type u_4

A choice function on finite subsets — the canonical [Luc59] form. For each non-empty subset T ⊆ A, prob T : A → ℝ is a probability distribution: non-negative, vanishing outside T, summing to 1 inside. The Luce-axiom forms (hasRatioScale, hasProductRule, hasPairwiseIIA) are stated as predicates below.

prob : Finset A → A → ℝ
P(a, T): probability of choosing a from set T
prob_nonneg (T : Finset A) (a : A) : 0 ≤ self.prob T a
Probabilities are non-negative
prob_zero_outside (T : Finset A) (a : A) : a ∉ T → self.prob T a = 0
Zero probability outside the choice set
prob_sum_eq_one (T : Finset A) : T.Nonempty → ∑ a ∈ T, self.prob T a = 1
Probabilities sum to 1 within the choice set ([Luc59])

Instances For

def Core.ChoiceFn.binary {A : Type u_4} [DecidableEq A] (cf : ChoiceFn A) (x y : A) :

ℝ

Binary choice view: binary cf x y is the probability of choosing x over y in the binary forced choice between them, defined via cf.prob {x, y} x.

Equations

cf.binary x y = cf.prob {x, y} x

Instances For

theorem Core.ChoiceFn.binary_nonneg {A : Type u_4} [DecidableEq A] (cf : ChoiceFn A) (x y : A) :

0 ≤ cf.binary x y

Binary choice probabilities are non-negative.

theorem Core.ChoiceFn.binary_le_one {A : Type u_4} [DecidableEq A] (cf : ChoiceFn A) (x y : A) :

cf.binary x y ≤ 1

Binary choice probabilities are at most 1.

theorem Core.ChoiceFn.binary_complement {A : Type u_4} [DecidableEq A] (cf : ChoiceFn A) {x y : A} (hxy : x ≠ y) :

cf.binary x y + cf.binary y x = 1

Binary complementarity: cf.binary x y + cf.binary y x = 1 when x ≠ y. Derived from prob_sum_eq_one over {x, y}.

def Core.ChoiceFn.hasRatioScale {A : Type u_3} [DecidableEq A] (cf : ChoiceFn A) :

Axiom 1, Form (a): ratio scale representation. There exists v > 0 such that P(x, T) = v(x) / Σ v(y).

Equations

cf.hasRatioScale = ∃ (v : A → ℝ), (∀ (a : A), 0 < v a) ∧ ∀ (T : Finset A), ∀ a ∈ T, cf.prob T a = v a / ∑ b ∈ T, v b

Instances For

def Core.ChoiceFn.hasProductRule {A : Type u_3} [DecidableEq A] (cf : ChoiceFn A) :

Axiom 1, Form (b): product rule. P(x, T) = P(x, S) · Σ_{y∈S} P(y, T) for x ∈ S ⊆ T.

Equations

cf.hasProductRule = ∀ (S T : Finset A), S ⊆ T → S.Nonempty → ∑ b ∈ T, cf.prob T b = 1 → ∀ a ∈ S, cf.prob T a = cf.prob S a * ∑ b ∈ S, cf.prob T b

Instances For

def Core.ChoiceFn.hasPairwiseIIA {A : Type u_3} [DecidableEq A] (cf : ChoiceFn A) :

Axiom 1, Form (c): pairwise independence (IIA). The odds ratio is preserved in any superset: P(x,T) · P(y,{x,y}) = P(y,T) · P(x,{x,y}).

Equations

cf.hasPairwiseIIA = ∀ (T : Finset A) (a b : A), a ∈ T → b ∈ T → cf.prob T a * cf.prob {a, b} b = cf.prob T b * cf.prob {a, b} a

Instances For

theorem Core.ratio_implies_product {A : Type u_3} [DecidableEq A] (cf : ChoiceFn A) (h : cf.hasRatioScale) :

cf.hasProductRule

(a) → (b): the ratio form implies the product rule.

theorem Core.ratio_implies_pairwiseIIA {A : Type u_3} [DecidableEq A] (cf : ChoiceFn A) (h : cf.hasRatioScale) :

cf.hasPairwiseIIA

(a) → (c): the ratio form implies pairwise IIA.

theorem Core.pairwiseIIA_implies_ratio {A : Type u_3} [DecidableEq A] [Inhabited A] (cf : ChoiceFn A) (hIIA : cf.hasPairwiseIIA) (hpos : ∀ (T : Finset A), ∀ a ∈ T, 0 < cf.prob T a) :

cf.hasRatioScale

(c) → (a): pairwise IIA implies the ratio form. The ratio scale is constructed by fixing a reference element x₀ and setting v(x) = P(x, {x, x₀}) / P(x₀, {x, x₀}) (requires strict positivity on the choice set).

theorem Core.axiom1_ratio_iff_pairwiseIIA {A : Type u_3} [DecidableEq A] [Inhabited A] (cf : ChoiceFn A) (hpos : ∀ (T : Finset A), ∀ a ∈ T, 0 < cf.prob T a) :

cf.hasRatioScale ↔ cf.hasPairwiseIIA

Axiom 1 equivalence: the ratio form ↔ pairwise IIA, under strict positivity on each choice set.