Rational action #
The mathematical foundation for all soft-rational agents: RSA speakers and
listeners, BToM agents, and decision-theoretic actors. A RationalAction agent
selects actions with probability proportional to a non-negative score — the
Luce choice rule, the unique rule satisfying independence of irrelevant
alternatives (IIA): the relative probability of two actions depends only on their
scores.
The exponential parameterization (softmax) lives in Core.Probability.LogitChoice,
and its variational / information-theoretic / limiting theory in
Core.Probability.SoftmaxTheory (re-exported here); this file is the agent-framing
layer on top of those pure-math cores.
Main definitions #
RationalAction,RationalAction.policy— the Luce choice rule.RationalAction.fromSoftmax— build an agent from a utility viaexp(α · u), bridging the agent framing to thesoftmaxcore.ChoiceFnandChoiceFn.hasRatioScale/hasProductRule/hasPairwiseIIA— the choice-axiom forms.
Main results #
RationalAction.policy_eq_of_proportional,RationalAction.policy_eq_iff_proportional— the policy depends only on the ratio scale of the scores.RationalAction.iia,RationalAction.product_rule— Luce's choice axiom forpChoice.axiom1_ratio_iff_pairwiseIIA— equivalence of the choice-axiom forms.
Score-based agents (Luce choice rule) #
A rational action agent: selects actions with probability ∝ score(state, action).
This is the Luce choice rule. The score function is unnormalized;
normalization to a proper distribution is derived (see policy).
Key instances:
- RSA L0:
score(u, w) = prior(w) · meaning(u, w) - RSA S1:
score(w, u) = rpow(informativity(w, u), α) · exp(-α · cost(u)) - BToM agent:
score(state, action) = exp(β · Q(state, action))
- score : State → Action → ℝ
Unnormalized score: policy(a|s) ∝ score(s, a).
Scores are non-negative.
Instances For
Total score across all actions in a state (normalization constant).
Equations
- ra.totalScore s = ∑ a : A, ra.score s a
Instances For
Normalized policy: P(a|s) = score(s,a) / Σ_a' score(s,a'). Returns 0 for all actions if totalScore = 0.
Equations
- ra.policy s a = if ra.totalScore s = 0 then 0 else ra.score s a / ra.totalScore s
Instances For
Policy sums to 1 when totalScore is nonzero.
Zero score implies zero policy, regardless of whether totalScore is zero.
Policy respects score equality: equal scores → equal probabilities.
Follows directly from the Luce rule: both sides are score / totalScore
with the same denominator.
When totalScore equals the score of action a, the policy for a is 1.
Iff combining policy_lt_of_score_lt with the contrapositive of
policy_monotone: at the same state, policy strict ordering coincides
with score strict ordering.
Cross-state policy comparison (states with different normalization
constants): the cross-product score(s₁,a) · total(s₂) < score(s₂,a) · total(s₁)
is equivalent to policy s₁ a < policy s₂ a when both totals are positive.
Cross-state policy comparison, with the totalScore > 0 hypotheses derived
from the cross-product inequality itself (so no positivity hypothesis is
needed).
Score-sum ordering implies policy-sum ordering when both sides share the same state (same denominator).
Finset-sum ordering implies policy-sum ordering when both sides share the
same state (same denominator); the Finset.sum analogue of
policy_list_sum_lt, with positivity derived from the score ordering.
Luce's choice axiom (IIA) #
[Luc59] showed that the ratio rule P(a|s) = v(a)/Σv(b) is
characterized by the independence of irrelevant alternatives (IIA): the
relative probability of two actions depends only on their scores, not on what
other actions are available.
We formalize:
- The constant ratio rule:
policy(a₁) · score(a₂) = policy(a₂) · score(a₁) - Choice from subsets (
pChoice): restriction of the choice rule to aFinset - IIA: ratios in any subset equal score ratios
- The product rule:
P(a,T) = P(a,S) · P(S,T)forS ⊆ T - Scale invariance: multiplying all scores by
k > 0preserves policy - Uniqueness (forward direction): proportional scores yield the same policy
Constant ratio rule:
policy(a₁) · score(a₂) = policy(a₂) · score(a₁).
The odds ratio policy(a₁)/policy(a₂) = score(a₁)/score(a₂).
Choice probability from a subset: pChoice(a, T) = score(a) / Σ_{b∈T} score(b).
Returns 0 if a ∉ T or the subset total is 0.
Equations
Instances For
pChoice is non-negative.
pChoice sums to 1 over the subset when the subset total is nonzero.
IIA core: the ratio of pChoice values in any subset equals the score ratio.
For a₁, a₂ ∈ T with score(a₂) > 0:
pChoice(a₁, T) · score(a₂) = pChoice(a₂, T) · score(a₁).
IIA: P(a, S) = P(a, T) / Σ_{b∈S} P(b, T) for S ⊆ T.
Choice probability from a subset is the conditional probability.
Product rule:
P(a, T) = P(a, S) · P(S, T) for a ∈ S ⊆ T,
where P(S, T) = Σ_{b∈S} score(b) / Σ_{b∈T} score(b).
Scale all scores by a positive constant k.
Instances For
Scale invariance: scaling scores by k > 0 preserves policy.
Uniqueness (forward direction):
If scores are proportional (score'(s,a) = k · score(s,a) for some k > 0),
then both agents have the same policy.
Construct a RationalAction from a utility function via softmax: the score
is exp(α · utility(s, a)), so policy = softmax(utility, α).
Equations
- Core.RationalAction.fromSoftmax utility α = { score := fun (s : S) (a : A) => Real.exp (α * utility s a), score_nonneg := ⋯ }
Instances For
The policy of a softmax agent equals the softmax function.
Uniqueness of the ratio scale #
policy_eq_of_proportional (proved earlier) shows that proportional scores
yield the same policy. The converse — that identical policies force proportional
scores — completes the uniqueness characterization: same policy ↔ proportional
scores, with proportionality constant k = totalScore₂/totalScore₁.
This is distinct from the independence-of-unit condition, which concerns
state-dependent transformations f satisfying f(kv) = kf(v) — how scale values
transform across experimental conditions, not the uniqueness of the scale within
a condition.
Converse direction (uniqueness of ratio scale):
If two agents with positive total scores have the same policy,
then their scores are proportional with constant k = total₂/total₁.
Full uniqueness characterization: Two agents with positive total scores have the same policy if and only if their scores are proportional.
Alternative forms of the choice axiom #
[Luc59] proves three equivalent formulations of the choice axiom:
(a) Ratio form: There exists a positive function v such that
P(x, T) = v(x) / Σ_{y∈T} v(y) for all x ∈ T.
(b) Product rule: P(x, T) = P(x, S) · P(S, T) for x ∈ S ⊆ T,
where P(S, T) = Σ_{y∈S} P(y, T).
(c) Pairwise independence: The odds ratio P(x,{x,y}) · P(y,T) = P(y,{x,y}) · P(x,T) — pairwise ratios are preserved in any superset.
The ratio form (a) is the definition of RationalAction; (a)→(b) is
product_rule and (a)→(c) is pChoice_ratio.
A choice function on finite subsets — the canonical [Luc59] form.
For each non-empty subset T ⊆ A, prob T : A → ℝ is a probability
distribution: non-negative, vanishing outside T, summing to 1 inside. The
Luce-axiom forms (hasRatioScale, hasProductRule, hasPairwiseIIA) are
stated as predicates below.
- prob : Finset A → A → ℝ
P(a, T): probability of choosing
afrom setT Probabilities are non-negative
Zero probability outside the choice set
- prob_sum_eq_one (T : Finset A) : T.Nonempty → ∑ a ∈ T, self.prob T a = 1
Probabilities sum to 1 within the choice set ([Luc59])
Instances For
Binary complementarity: cf.binary x y + cf.binary y x = 1
when x ≠ y. Derived from prob_sum_eq_one over {x, y}.
Axiom 1, Form (a): ratio scale representation.
There exists v > 0 such that P(x, T) = v(x) / Σ v(y).
Equations
- cf.hasRatioScale = ∃ (v : A → ℝ), (∀ (a : A), 0 < v a) ∧ ∀ (T : Finset A), ∀ a ∈ T, cf.prob T a = v a / ∑ b ∈ T, v b
Instances For
Axiom 1, Form (b): product rule.
P(x, T) = P(x, S) · Σ_{y∈S} P(y, T) for x ∈ S ⊆ T.
Equations
- cf.hasProductRule = ∀ (S T : Finset A), S ⊆ T → S.Nonempty → ∑ b ∈ T, cf.prob T b = 1 → ∀ a ∈ S, cf.prob T a = cf.prob S a * ∑ b ∈ S, cf.prob T b
Instances For
Axiom 1, Form (c): pairwise independence (IIA).
The odds ratio is preserved in any superset:
P(x,T) · P(y,{x,y}) = P(y,T) · P(x,{x,y}).
Equations
- cf.hasPairwiseIIA = ∀ (T : Finset A) (a b : A), a ∈ T → b ∈ T → cf.prob T a * cf.prob {a, b} b = cf.prob T b * cf.prob {a, b} a
Instances For
(a) → (b): the ratio form implies the product rule.
(a) → (c): the ratio form implies pairwise IIA.
(c) → (a): pairwise IIA implies the ratio form. The ratio scale is
constructed by fixing a reference element x₀ and setting
v(x) = P(x, {x, x₀}) / P(x₀, {x, x₀}) (requires strict positivity on the
choice set).
Axiom 1 equivalence: the ratio form ↔ pairwise IIA, under strict positivity on each choice set.