Documentation

Linglib.Core.Probability.Choice.RationalAction

Rational action #

[Luc59] [CT06] [ZHL21] [AM58]

The mathematical foundation for all soft-rational agents: RSA speakers and listeners, BToM agents, and decision-theoretic actors. A RationalAction agent selects actions with probability proportional to a non-negative score — the Luce choice rule, the unique rule satisfying independence of irrelevant alternatives (IIA): the relative probability of two actions depends only on their scores.

The exponential parameterization (softmax) lives in Core.Probability.LogitChoice, and its variational / information-theoretic / limiting theory in Core.Probability.SoftmaxTheory (re-exported here); this file is the agent-framing layer on top of those pure-math cores.

Main definitions #

Main results #

Score-based agents (Luce choice rule) #

structure Core.RationalAction (State : Type u_1) (Action : Type u_2) [Fintype Action] :
Type (max u_1 u_2)

A rational action agent: selects actions with probability ∝ score(state, action).

This is the Luce choice rule. The score function is unnormalized; normalization to a proper distribution is derived (see policy).

Key instances:

  • RSA L0: score(u, w) = prior(w) · meaning(u, w)
  • RSA S1: score(w, u) = rpow(informativity(w, u), α) · exp(-α · cost(u))
  • BToM agent: score(state, action) = exp(β · Q(state, action))
  • score : StateAction

    Unnormalized score: policy(a|s) ∝ score(s, a).

  • score_nonneg (s : State) (a : Action) : 0 self.score s a

    Scores are non-negative.

Instances For
    noncomputable def Core.RationalAction.totalScore {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) :

    Total score across all actions in a state (normalization constant).

    Equations
    Instances For
      theorem Core.RationalAction.totalScore_nonneg {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) :
      noncomputable def Core.RationalAction.policy {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) :

      Normalized policy: P(a|s) = score(s,a) / Σ_a' score(s,a'). Returns 0 for all actions if totalScore = 0.

      Equations
      Instances For
        theorem Core.RationalAction.policy_nonneg {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) :
        0 ra.policy s a
        theorem Core.RationalAction.policy_sum_eq_one {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (h : ra.totalScore s 0) :
        a : A, ra.policy s a = 1

        Policy sums to 1 when totalScore is nonzero.

        theorem Core.RationalAction.policy_monotone {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) (h : ra.score s a₁ ra.score s a₂) :
        ra.policy s a₁ ra.policy s a₂

        Policy monotonicity: higher score → higher probability.

        theorem Core.RationalAction.policy_eq_zero_of_score_eq_zero {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) (h : ra.score s a = 0) :
        ra.policy s a = 0

        Zero score implies zero policy, regardless of whether totalScore is zero.

        theorem Core.RationalAction.policy_eq_of_score_eq {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) (h : ra.score s a₁ = ra.score s a₂) :
        ra.policy s a₁ = ra.policy s a₂

        Policy respects score equality: equal scores → equal probabilities. Follows directly from the Luce rule: both sides are score / totalScore with the same denominator.

        theorem Core.RationalAction.policy_eq_one_of_totalScore_eq {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) (h_sum : ra.totalScore s = ra.score s a) (h_pos : 0 < ra.score s a) :
        ra.policy s a = 1

        When totalScore equals the score of action a, the policy for a is 1.

        theorem Core.RationalAction.policy_not_lt_of_score_le {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) (h : ra.score s a₂ ra.score s a₁) :
        ¬ra.policy s a₁ < ra.policy s a₂

        Score ordering implies the negation of the strict policy ordering.

        theorem Core.RationalAction.policy_lt_of_score_lt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) (hlt : ra.score s a₁ < ra.score s a₂) :
        ra.policy s a₁ < ra.policy s a₂

        Strict policy monotonicity: strictly higher score gives strictly higher probability.

        theorem Core.RationalAction.policy_lt_iff_score_lt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) :
        ra.policy s a₁ < ra.policy s a₂ ra.score s a₁ < ra.score s a₂

        Iff combining policy_lt_of_score_lt with the contrapositive of policy_monotone: at the same state, policy strict ordering coincides with score strict ordering.

        theorem Core.RationalAction.policy_lt_cross {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s₁ s₂ : S) (a : A) (h_pos₁ : 0 < ra.totalScore s₁) (h_pos₂ : 0 < ra.totalScore s₂) (h_cross : ra.score s₁ a * ra.totalScore s₂ < ra.score s₂ a * ra.totalScore s₁) :
        ra.policy s₁ a < ra.policy s₂ a

        Cross-state policy comparison (states with different normalization constants): the cross-product score(s₁,a) · total(s₂) < score(s₂,a) · total(s₁) is equivalent to policy s₁ a < policy s₂ a when both totals are positive.

        theorem Core.RationalAction.policy_lt_cross_of_cross_lt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s₁ s₂ : S) (a : A) (h_cross : ra.score s₁ a * ra.totalScore s₂ < ra.score s₂ a * ra.totalScore s₁) :
        ra.policy s₁ a < ra.policy s₂ a

        Cross-state policy comparison, with the totalScore > 0 hypotheses derived from the cross-product inequality itself (so no positivity hypothesis is needed).

        theorem Core.RationalAction.policy_list_sum_lt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (as₁ as₂ : List A) (h : (List.map (ra.score s) as₁).sum < (List.map (ra.score s) as₂).sum) (htot : 0 < ra.totalScore s) :
        (List.map (ra.policy s) as₁).sum < (List.map (ra.policy s) as₂).sum

        Score-sum ordering implies policy-sum ordering when both sides share the same state (same denominator).

        theorem Core.RationalAction.finset_sum_policy_lt_of_sum_score_lt {S : Type u_1} {A : Type u_2} [Fintype A] (ra : RationalAction S A) (s : S) (F₁ F₂ : Finset A) (h : F₁.sum (ra.score s) < F₂.sum (ra.score s)) :
        F₁.sum (ra.policy s) < F₂.sum (ra.policy s)

        Finset-sum ordering implies policy-sum ordering when both sides share the same state (same denominator); the Finset.sum analogue of policy_list_sum_lt, with positivity derived from the score ordering.

        Luce's choice axiom (IIA) #

        [Luc59] showed that the ratio rule P(a|s) = v(a)/Σv(b) is characterized by the independence of irrelevant alternatives (IIA): the relative probability of two actions depends only on their scores, not on what other actions are available.

        We formalize:

        theorem Core.RationalAction.policy_ratio {S : Type u_3} {A : Type u_4} [Fintype A] (ra : RationalAction S A) (s : S) (a₁ a₂ : A) :
        ra.policy s a₁ * ra.score s a₂ = ra.policy s a₂ * ra.score s a₁

        Constant ratio rule: policy(a₁) · score(a₂) = policy(a₂) · score(a₁). The odds ratio policy(a₁)/policy(a₂) = score(a₁)/score(a₂).

        noncomputable def Core.RationalAction.pChoice {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (T : Finset A) (a : A) :

        Choice probability from a subset: pChoice(a, T) = score(a) / Σ_{b∈T} score(b). Returns 0 if a ∉ T or the subset total is 0.

        Equations
        • ra.pChoice s T a = if a T then have subTotal := bT, ra.score s b; if subTotal = 0 then 0 else ra.score s a / subTotal else 0
        Instances For
          theorem Core.RationalAction.pChoice_univ {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (a : A) :
          ra.pChoice s Finset.univ a = ra.policy s a

          pChoice on the full set equals policy.

          theorem Core.RationalAction.pChoice_nonneg {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (T : Finset A) (a : A) :
          0 ra.pChoice s T a

          pChoice is non-negative.

          theorem Core.RationalAction.pChoice_sum_eq_one {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (T : Finset A) (hT : bT, ra.score s b 0) :
          aT, ra.pChoice s T a = 1

          pChoice sums to 1 over the subset when the subset total is nonzero.

          theorem Core.RationalAction.pChoice_ratio {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (T : Finset A) (a₁ a₂ : A) (h₁ : a₁ T) (h₂ : a₂ T) :
          ra.pChoice s T a₁ * ra.score s a₂ = ra.pChoice s T a₂ * ra.score s a₁

          IIA core: the ratio of pChoice values in any subset equals the score ratio. For a₁, a₂ ∈ T with score(a₂) > 0: pChoice(a₁, T) · score(a₂) = pChoice(a₂, T) · score(a₁).

          theorem Core.RationalAction.iia {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (S' T : Finset A) (hST : S' T) (a : A) (ha : a S') (hS_pos : bS', ra.score s b 0) (hT_pos : bT, ra.score s b 0) :
          ra.pChoice s S' a = ra.pChoice s T a / bS', ra.pChoice s T b

          IIA: P(a, S) = P(a, T) / Σ_{b∈S} P(b, T) for S ⊆ T. Choice probability from a subset is the conditional probability.

          theorem Core.RationalAction.product_rule {S : Type u_3} {A : Type u_4} [Fintype A] [DecidableEq A] (ra : RationalAction S A) (s : S) (S' T : Finset A) (hST : S' T) (a : A) (ha : a S') (hS_pos : bS', ra.score s b 0) (hT_pos : bT, ra.score s b 0) :
          ra.pChoice s T a = ra.pChoice s S' a * ((∑ bS', ra.score s b) / bT, ra.score s b)

          Product rule: P(a, T) = P(a, S) · P(S, T) for a ∈ S ⊆ T, where P(S, T) = Σ_{b∈S} score(b) / Σ_{b∈T} score(b).

          noncomputable def Core.RationalAction.scaleBy {S : Type u_3} {A : Type u_4} [Fintype A] (ra : RationalAction S A) (k : ) (hk : 0 < k) :

          Scale all scores by a positive constant k.

          Equations
          • ra.scaleBy k hk = { score := fun (s : S) (a : A) => k * ra.score s a, score_nonneg := }
          Instances For
            theorem Core.RationalAction.scaleBy_policy {S : Type u_3} {A : Type u_4} [Fintype A] (ra : RationalAction S A) (s : S) (a : A) (k : ) (hk : 0 < k) :
            (ra.scaleBy k hk).policy s a = ra.policy s a

            Scale invariance: scaling scores by k > 0 preserves policy.

            theorem Core.RationalAction.policy_eq_of_proportional {S : Type u_3} {A : Type u_4} [Fintype A] (ra ra' : RationalAction S A) (s : S) (k : ) (hk : 0 < k) (h : ∀ (a : A), ra'.score s a = k * ra.score s a) (a : A) :
            ra'.policy s a = ra.policy s a

            Uniqueness (forward direction): If scores are proportional (score'(s,a) = k · score(s,a) for some k > 0), then both agents have the same policy.

            noncomputable def Core.RationalAction.fromSoftmax {S : Type u_1} {A : Type u_2} [Fintype A] (utility : SA) (α : ) :

            Construct a RationalAction from a utility function via softmax: the score is exp(α · utility(s, a)), so policy = softmax(utility, α).

            Equations
            Instances For
              theorem Core.RationalAction.fromSoftmax_policy_eq {S : Type u_1} {A : Type u_2} [Fintype A] [Nonempty A] (utility : SA) (α : ) (s : S) (a : A) :
              (fromSoftmax utility α).policy s a = softmax (α utility s) a

              The policy of a softmax agent equals the softmax function.

              Uniqueness of the ratio scale #

              policy_eq_of_proportional (proved earlier) shows that proportional scores yield the same policy. The converse — that identical policies force proportional scores — completes the uniqueness characterization: same policy ↔ proportional scores, with proportionality constant k = totalScore₂/totalScore₁.

              This is distinct from the independence-of-unit condition, which concerns state-dependent transformations f satisfying f(kv) = kf(v) — how scale values transform across experimental conditions, not the uniqueness of the scale within a condition.

              theorem Core.RationalAction.proportional_of_policy_eq {S : Type u_3} {A : Type u_4} [Fintype A] (ra₁ ra₂ : RationalAction S A) (s : S) (h₁ : 0 < ra₁.totalScore s) (h₂ : 0 < ra₂.totalScore s) (hpol : ∀ (a : A), ra₁.policy s a = ra₂.policy s a) (a : A) :
              ra₂.score s a = ra₂.totalScore s / ra₁.totalScore s * ra₁.score s a

              Converse direction (uniqueness of ratio scale): If two agents with positive total scores have the same policy, then their scores are proportional with constant k = total₂/total₁.

              theorem Core.RationalAction.policy_eq_iff_proportional {S : Type u_3} {A : Type u_4} [Fintype A] (ra₁ ra₂ : RationalAction S A) (s : S) (h₁ : 0 < ra₁.totalScore s) (h₂ : 0 < ra₂.totalScore s) :
              (∀ (a : A), ra₁.policy s a = ra₂.policy s a) ∃ (k : ), 0 < k ∀ (a : A), ra₂.score s a = k * ra₁.score s a

              Full uniqueness characterization: Two agents with positive total scores have the same policy if and only if their scores are proportional.

              Alternative forms of the choice axiom #

              [Luc59] proves three equivalent formulations of the choice axiom:

              (a) Ratio form: There exists a positive function v such that P(x, T) = v(x) / Σ_{y∈T} v(y) for all x ∈ T.

              (b) Product rule: P(x, T) = P(x, S) · P(S, T) for x ∈ S ⊆ T, where P(S, T) = Σ_{y∈S} P(y, T).

              (c) Pairwise independence: The odds ratio P(x,{x,y}) · P(y,T) = P(y,{x,y}) · P(x,T) — pairwise ratios are preserved in any superset.

              The ratio form (a) is the definition of RationalAction; (a)→(b) is product_rule and (a)→(c) is pChoice_ratio.

              structure Core.ChoiceFn (A : Type u_4) [DecidableEq A] :
              Type u_4

              A choice function on finite subsets — the canonical [Luc59] form. For each non-empty subset T ⊆ A, prob T : A → ℝ is a probability distribution: non-negative, vanishing outside T, summing to 1 inside. The Luce-axiom forms (hasRatioScale, hasProductRule, hasPairwiseIIA) are stated as predicates below.

              • prob : Finset AA

                P(a, T): probability of choosing a from set T

              • prob_nonneg (T : Finset A) (a : A) : 0 self.prob T a

                Probabilities are non-negative

              • prob_zero_outside (T : Finset A) (a : A) : aTself.prob T a = 0

                Zero probability outside the choice set

              • prob_sum_eq_one (T : Finset A) : T.NonemptyaT, self.prob T a = 1

                Probabilities sum to 1 within the choice set ([Luc59])

              Instances For
                def Core.ChoiceFn.binary {A : Type u_4} [DecidableEq A] (cf : ChoiceFn A) (x y : A) :

                Binary choice view: binary cf x y is the probability of choosing x over y in the binary forced choice between them, defined via cf.prob {x, y} x.

                Equations
                Instances For
                  theorem Core.ChoiceFn.binary_nonneg {A : Type u_4} [DecidableEq A] (cf : ChoiceFn A) (x y : A) :
                  0 cf.binary x y

                  Binary choice probabilities are non-negative.

                  theorem Core.ChoiceFn.binary_le_one {A : Type u_4} [DecidableEq A] (cf : ChoiceFn A) (x y : A) :
                  cf.binary x y 1

                  Binary choice probabilities are at most 1.

                  theorem Core.ChoiceFn.binary_complement {A : Type u_4} [DecidableEq A] (cf : ChoiceFn A) {x y : A} (hxy : x y) :
                  cf.binary x y + cf.binary y x = 1

                  Binary complementarity: cf.binary x y + cf.binary y x = 1 when x ≠ y. Derived from prob_sum_eq_one over {x, y}.

                  def Core.ChoiceFn.hasRatioScale {A : Type u_3} [DecidableEq A] (cf : ChoiceFn A) :

                  Axiom 1, Form (a): ratio scale representation. There exists v > 0 such that P(x, T) = v(x) / Σ v(y).

                  Equations
                  • cf.hasRatioScale = ∃ (v : A), (∀ (a : A), 0 < v a) ∀ (T : Finset A), aT, cf.prob T a = v a / bT, v b
                  Instances For
                    def Core.ChoiceFn.hasProductRule {A : Type u_3} [DecidableEq A] (cf : ChoiceFn A) :

                    Axiom 1, Form (b): product rule. P(x, T) = P(x, S) · Σ_{y∈S} P(y, T) for x ∈ S ⊆ T.

                    Equations
                    • cf.hasProductRule = ∀ (S T : Finset A), S TS.NonemptybT, cf.prob T b = 1aS, cf.prob T a = cf.prob S a * bS, cf.prob T b
                    Instances For
                      def Core.ChoiceFn.hasPairwiseIIA {A : Type u_3} [DecidableEq A] (cf : ChoiceFn A) :

                      Axiom 1, Form (c): pairwise independence (IIA). The odds ratio is preserved in any superset: P(x,T) · P(y,{x,y}) = P(y,T) · P(x,{x,y}).

                      Equations
                      Instances For
                        theorem Core.ratio_implies_product {A : Type u_3} [DecidableEq A] (cf : ChoiceFn A) (h : cf.hasRatioScale) :

                        (a) → (b): the ratio form implies the product rule.

                        theorem Core.ratio_implies_pairwiseIIA {A : Type u_3} [DecidableEq A] (cf : ChoiceFn A) (h : cf.hasRatioScale) :

                        (a) → (c): the ratio form implies pairwise IIA.

                        theorem Core.pairwiseIIA_implies_ratio {A : Type u_3} [DecidableEq A] [Inhabited A] (cf : ChoiceFn A) (hIIA : cf.hasPairwiseIIA) (hpos : ∀ (T : Finset A), aT, 0 < cf.prob T a) :

                        (c) → (a): pairwise IIA implies the ratio form. The ratio scale is constructed by fixing a reference element x₀ and setting v(x) = P(x, {x, x₀}) / P(x₀, {x, x₀}) (requires strict positivity on the choice set).

                        theorem Core.axiom1_ratio_iff_pairwiseIIA {A : Type u_3} [DecidableEq A] [Inhabited A] (cf : ChoiceFn A) (hpos : ∀ (T : Finset A), aT, 0 < cf.prob T a) :

                        Axiom 1 equivalence: the ratio form ↔ pairwise IIA, under strict positivity on each choice set.