[TMS+26]: Act or Clarify? Uncertainty and cost in communication

When should an agent ask a clarification question (CQ) rather than act under uncertainty? [TMS+26] predict, and confirm experimentally, that CQ rates depend on two interacting factors: contextual uncertainty (ε) about the interlocutor's goal and the cost (δ) of safe actions. Uncertainty matters most when acting incorrectly is costly.

The paper's computational model is layered decision theory, not RSA. It has three layers: (i) a decision problem ⟨G, P, R, U⟩ with P(g₂) = ε, in which the exhaustive answer has utility 1 − δ; (ii) a behavioral policy π = SoftMax(α · EU) over the goal-marginal expected utility EU(r) = Σ_g P(g) · U(g, r); and (iii) a CQ gate P(r_cq) = Logistic(τ · (ExpRegret(r*) − c)), where ExpRegret(r*) is the expected regret of the best action r*, i.e. the expected value of perfect information (EVPI, [RS61]). This file formalises all three layers on top of Core.DecisionTheory.DecisionProblem; evpi below identifies the paper's ExpRegret with EVPI.
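The three layers can be sketched numerically. This is a minimal Python sketch, not the file's Lean code: the response set {ms1, ms2, exh} and the utility table (1 for the matching mention-some, 0 for the other mention-some, 1 − δ for the exhaustive answer) are assumptions about how the decision problem is instantiated, and every function name is hypothetical. `reaction` mirrors the layered mixture listed under Main statements.

```python
import math

# Hypothetical encoding (an assumption, not the file's Lean definitions):
# two goals, two mention-some answers that each match one goal, and an
# exhaustive answer that is acceptable under both goals at cost delta.
GOALS = ("g1", "g2")
RESPONSES = ("ms1", "ms2", "exh")


def utility(goal, response, delta):
    """U(g, r): 1 for the matching mention-some, 0 for the other, 1 - delta for exh."""
    if response == "exh":
        return 1.0 - delta
    return 1.0 if response[-1] == goal[-1] else 0.0


def prior(eps):
    """P(g2) = eps, P(g1) = 1 - eps."""
    return {"g1": 1.0 - eps, "g2": eps}


def expected_utility(response, eps, delta):
    """Goal-marginal EU(r) = sum_g P(g) * U(g, r)."""
    p = prior(eps)
    return sum(p[g] * utility(g, response, delta) for g in GOALS)


def policy(eps, delta, alpha):
    """Behavioral policy pi = SoftMax(alpha * EU) over the three responses."""
    scores = [alpha * expected_utility(r, eps, delta) for r in RESPONSES]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return {r: w / total for r, w in zip(RESPONSES, weights)}


def expected_regret(eps, delta):
    """ExpRegret(r*) = EVPI = E_g[max_r U(g, r)] - max_r EU(r)."""
    p = prior(eps)
    oracle = sum(p[g] * max(utility(g, r, delta) for r in RESPONSES) for g in GOALS)
    act_now = max(expected_utility(r, eps, delta) for r in RESPONSES)
    return oracle - act_now


def cq_probability(eps, delta, tau, c):
    """CQ gate P(r_cq) = Logistic(tau * (ExpRegret(r*) - c))."""
    return 1.0 / (1.0 + math.exp(-tau * (expected_regret(eps, delta) - c)))


def reaction(eps, delta, alpha, tau, c):
    """Layered mixture: ask with the gate probability, otherwise act by the policy."""
    g = cq_probability(eps, delta, tau, c)
    acts = policy(eps, delta, alpha)
    return {"cq": g, **{r: (1.0 - g) * p for r, p in acts.items()}}


print(expected_regret(0.49, 0.32), min(0.49, 0.32))
print(reaction(0.49, 0.32, alpha=5.0, tau=3.6, c=0.18))
```

Under this table EU(ms1) = 1 − ε and EU(exh) = 1 − δ, so for ε ≤ 1/2 the brute-force regret agrees with min(ε, δ) up to floating-point rounding.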

Main statements

evpi_eq_min: for the paper's decision problem, the expected regret of the best action has the closed form min ε δ, capturing the uncertainty/cost interaction in one equation.
policy_prefers_exh_of_uncertain / policy_prefers_ms1_of_confident: the behavioral policy prefers the safe exhaustive answer when δ < ε and the matching mention-some when ε < δ, so its choice flips as uncertainty crosses cost.
tl_justAsk, noNeedToAsk, worthAsking, uncertainty_matters_most_when_costly: the gate-level predictions, proved for every τ > 0 and every threshold c. noNeedToAsk is an exact equality, the paper's null prediction.
reaction: the layered mixture, which asks a clarification question with the gate probability and otherwise acts according to the policy π.
L1_exh_transmits_prior etc.: an RSA reinterpretation (this file's extension, not the paper's model). A Bayesian listener inverts the goal-conditioned speaker, which is the ε → 0 (post-clarification) limit of π and coincides with [HTB+25]'s action-utility respondent R₁ at β = 1, w_c = 0. The paper's model itself contains no listener.
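The RSA reinterpretation can be illustrated the same way. This is a minimal sketch, assuming a hypothetical utility table (matching mention-some 1, other mention-some 0, exhaustive answer 1 − δ) and α = 1 as in the implementation notes; `speaker` and `listener` are hypothetical names, not the file's declarations. Because the exhaustive answer has the same utility under both goals, each goal's speaker produces it with the same probability, so the listener's posterior after hearing it equals the prior, which is what the name L1_exh_transmits_prior suggests.

```python
import math

RESPONSES = ("ms1", "ms2", "exh")


def utility(goal, response, delta):
    """Hypothetical U(g, r): matching mention-some 1, other mention-some 0, exh 1 - delta."""
    if response == "exh":
        return 1.0 - delta
    return 1.0 if response[-1] == goal[-1] else 0.0


def speaker(goal, delta, alpha=1.0):
    """Goal-conditioned speaker: S(r | g) = SoftMax_r(alpha * U(g, r))."""
    weights = {r: math.exp(alpha * utility(goal, r, delta)) for r in RESPONSES}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}


def listener(response, eps, delta, alpha=1.0):
    """Bayesian listener: L(g | r) proportional to P(g) * S(r | g), with P(g2) = eps."""
    prior = {"g1": 1.0 - eps, "g2": eps}
    joint = {g: p * speaker(g, delta, alpha)[response] for g, p in prior.items()}
    z = sum(joint.values())
    return {g: v / z for g, v in joint.items()}


# Prior 83 : 17 (eps = 0.17) with delta = 0.32, so exhVal = 0.68.
for r in RESPONSES:
    print(r, listener(r, eps=0.17, delta=0.32))
```

A mention-some answer shifts the posterior toward the goal it matches, while the exhaustive answer leaves the 83 : 17 prior unchanged up to rounding.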

Implementation notes

Parameter provenance (posterior means of the Bayesian fit to Exp 1): δ_L = 0.32 and δ_S = 0.11 (large/small option space), ε_H = 0.49 and ε_L = 0.17 (high/low uncertainty), τ = 3.60, c = 0.18. Because the gate-level prediction theorems hold for all τ > 0 and c, the fitted values matter only through the ordering δ_S < ε_L < δ_L < ε_H ≤ 1/2. In the RSA section, exhVal = 1 − δ, and the priors 83 : 17 and 51 : 49 are (1 − ε) : ε. The choice α = 1 there is this file's, not the paper's (whose α has prior N(5, 1)); the strict-preference theorems are stated for any α > 0.
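Plugging the fitted values into the gate makes the orderings concrete. This sketch assumes the closed form EVPI = min(ε, δ) from evpi_eq_min; the cell labels and function names are hypothetical. In the small option space both uncertainty levels give EVPI = δ_S = 0.11, so the two gate probabilities coincide exactly, while in the large space EVPI rises from 0.17 to 0.32 with uncertainty. The final loop spot-checks a few (τ, c) pairs; it illustrates, but does not prove, the "for all τ > 0 and c" claim.

```python
import math

# Posterior means fitted to Exp 1, as listed in the implementation notes.
DELTA = {"large": 0.32, "small": 0.11}   # cost of the exhaustive answer
EPS = {"high": 0.49, "low": 0.17}        # uncertainty P(g2)
TAU, C = 3.60, 0.18                      # gate slope and threshold


def evpi(eps, delta):
    """Closed form of ExpRegret(r*) from evpi_eq_min (assumes eps <= 1/2)."""
    return min(eps, delta)


def p_cq(eps, delta, tau=TAU, c=C):
    """Gate probability Logistic(tau * (EVPI - c))."""
    return 1.0 / (1.0 + math.exp(-tau * (evpi(eps, delta) - c)))


def uncertainty_effects(tau, c):
    """Effect of raising uncertainty (low -> high) within each option space."""
    def effect(space):
        d = DELTA[space]
        return p_cq(EPS["high"], d, tau, c) - p_cq(EPS["low"], d, tau, c)
    return effect("large"), effect("small")


for u in EPS:
    for s in DELTA:
        e = evpi(EPS[u], DELTA[s])
        print(f"eps={u:<4} space={s:<5} EVPI={e:.2f} P(cq)={p_cq(EPS[u], DELTA[s]):.3f}")

effect_large, effect_small = uncertainty_effects(TAU, C)
print(f"uncertainty effect: large space {effect_large:.3f}, small space {effect_small:.3f}")

# Spot-check that the sign pattern does not depend on the fitted tau and c.
for tau in (0.5, 3.6, 20.0):
    for c in (-1.0, 0.18, 1.0):
        large, small = uncertainty_effects(tau, c)
        assert large > 0.0 and small == 0.0
```

Since the logistic is strictly increasing for τ > 0, only the EVPI ordering of the four cells carries over to the gate, which is why the specific τ and c drop out.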

Expected value of perfect information

[RS61]'s EVPI over a Core.DecisionTheory.DecisionProblem is the gap between the oracle value (expected utility under perfect information) and the value of acting now. It is the maximum possible questionUtility ([vR03a]): it equals questionUtility on the identity partition, so no specific clarification question yields more than EVPI. It is also the paper's ExpRegret(r*) and the upper bound on [DHH+26]'s VoI.
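The relation between EVPI and question utility can be checked on small finite problems. This minimal sketch assumes questionUtility is the expected value after learning which cell of a partition holds, minus the value of acting now (a standard reading of [vR03a], not the file's Lean definition); `value`, `question_utility`, `evpi` and `paper_utility` are hypothetical names, and the two-goal utility table (matching mention-some 1, other 0, exhaustive 1 − δ) is an assumption.

```python
def value(prior, actions, utility):
    """Value of acting now: max over actions of the expected utility under `prior`."""
    return max(sum(p * utility(w, a) for w, p in prior.items()) for a in actions)


def question_utility(prior, actions, utility, partition):
    """Expected value after learning the true cell of `partition`, minus value now."""
    after = 0.0
    for cell in partition:
        mass = sum(prior[w] for w in cell)
        if mass > 0:
            posterior = {w: prior[w] / mass for w in cell}
            after += mass * value(posterior, actions, utility)
    return after - value(prior, actions, utility)


def evpi(prior, actions, utility):
    """EVPI: question utility of the identity partition (each world its own cell)."""
    return question_utility(prior, actions, utility, [[w] for w in prior])


def paper_utility(delta):
    """Hypothetical two-goal utility table: matching ms 1, other ms 0, exh 1 - delta."""
    def u(goal, response):
        if response == "exh":
            return 1.0 - delta
        return 1.0 if response[-1] == goal[-1] else 0.0
    return u


prior = {"g1": 0.83, "g2": 0.17}
actions = ("ms1", "ms2", "exh")
u = paper_utility(0.32)
print(evpi(prior, actions, u))                              # approximately 0.17 = min(0.17, 0.32)
print(question_utility(prior, actions, u, [["g1", "g2"]]))  # uninformative question: approximately 0
```

Refining a partition can only raise the expected post-answer value, so the identity partition, the finest one, gives the maximum; that is the sense in which any specific question yields at most EVPI.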

The decision problem ⟨G, P, R, U⟩

The behavioral policy π = SoftMax(α · EU)

Expected regret of the best action: EVPI = min ε δ

The CQ gate P(r_cq) = Logistic(τ · (ExpRegret(r*) − c))

Gate-level Experiment 1 predictions

The shared decision-rule instance: a soft gate

The layered reaction policy

RSA reinterpretation: the post-clarification speaker and a listener

The CQ gate `P(r_cq) = Logistic(τ · (ExpRegret(r*) − c))`