[And21]: conversation update for RSA #
Tell me everything you know (SCiL 2021, 244–253): multi-turn conversation in RSA. The common ground is a probability distribution over worlds substituted for the RSA world prior (Figure 4), updated each turn by convex combination with the pragmatic-listener posterior at a learning rate, with weighted, thresholded, and difference sampling for cooperative observation selection.
Main results #
- updateCG_matches_linear_learning: the update rule is [Luc59]'s linear learning rule; multi-turn conversation is iterated learning over distributions.
- lr_one_excludes_false_worlds / graded_update_keeps_false_world: set-intersection update is the lr = 1 degenerate limit; the graded common ground is non-monotonic by design (fn. 7).
- Turn-1 and turn-2 predictions over the MutualFriends worlds (individuals typed by major × location), including turn2_breaks_symmetry: an updated common ground changes what the same utterance conveys.
- toBToMSharedUpdate: Shared := PMF W instantiates BToMModel.sharedUpdate, giving BToM discourse dynamics with a distributional shared state.
Implementation notes #
The Figure-4 chain is exact ℚ≥0, parameterized by common-ground weights:
each agent (l0Score/s1Score/l1Score/s2Score) is one
PMF.normalizeScores application over the agent below it, the
distributions are PMF.ofScores, and every prediction closes by the
ofScores comparison family with one kernel certificate.
Distributional common ground #
The common ground is a PMF W: [Sta02]'s context set with
graded plausibility summing to one (§3.2), so entropy — Anderson's
success criterion — and KL divergence are available on the carrier.
toContextSet projects to the positive-mass worlds (PMF.support),
PMF.uniformOfFintype is the empty common ground; ofWeights renormalizes
non-negative weights (fn. 3). Unlike the classical context set, worlds
can regain probability (fn. 7); intersection update survives only at
lr = 1.
The common ground as a distribution #
Anderson's distributional common ground is a PMF W ([And21]
§3.2): graded plausibility summing to one, with PMF.entropy — the
success criterion — and PMF.toRealFn (the ℝ-valued masses every RSA
consumer reads) already on the carrier. The classical context set is the
support.
A distribution projects to [Sta02]'s context set: the positive-mass worlds.
Equations
- Discourse.instHasContextSetPMF = { toContextSet := fun (p : PMF W) (w : W) => w ∈ p.support }
MutualFriends Domain #
Equations
- Anderson2021.worldMajor Anderson2021.MFWorld.ina = Anderson2021.Major.astronomy
- Anderson2021.worldMajor Anderson2021.MFWorld.katie = Anderson2021.Major.astronomy
- Anderson2021.worldMajor Anderson2021.MFWorld.nancy = Anderson2021.Major.german
- Anderson2021.worldMajor Anderson2021.MFWorld.sally = Anderson2021.Major.german
Equations
- Anderson2021.worldLocation Anderson2021.MFWorld.ina = Anderson2021.Location.indoors
- Anderson2021.worldLocation Anderson2021.MFWorld.sally = Anderson2021.Location.indoors
- Anderson2021.worldLocation Anderson2021.MFWorld.katie = Anderson2021.Location.outdoors
- Anderson2021.worldLocation Anderson2021.MFWorld.nancy = Anderson2021.Location.outdoors
Utterances available to speakers. Includes a null utterance for passing when the speaker has no confident observation to share.
- studyHumanity : MFUtterance
- studyScience : MFUtterance
- likeIndoors : MFUtterance
- likeOutdoors : MFUtterance
- null : MFUtterance
Truth-conditional semantics for MutualFriends utterances.
Equations
- Anderson2021.mfMeaning Anderson2021.MFUtterance.studyHumanity x✝ = (Anderson2021.worldMajor x✝ == Anderson2021.Major.german)
- Anderson2021.mfMeaning Anderson2021.MFUtterance.studyScience x✝ = (Anderson2021.worldMajor x✝ == Anderson2021.Major.astronomy)
- Anderson2021.mfMeaning Anderson2021.MFUtterance.likeIndoors x✝ = (Anderson2021.worldLocation x✝ == Anderson2021.Location.indoors)
- Anderson2021.mfMeaning Anderson2021.MFUtterance.likeOutdoors x✝ = (Anderson2021.worldLocation x✝ == Anderson2021.Location.outdoors)
- Anderson2021.mfMeaning Anderson2021.MFUtterance.null x✝ = true
Each non-null utterance partitions the world space in half.
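The domain and its truth-conditional semantics can be mirrored in a few lines of Python. This is only an illustrative sketch: the names `WORLDS`, `UTTERANCES`, `mf_meaning`, and `true_worlds` are hypothetical stand-ins for the Lean declarations above, not part of the library.

```python
# Illustrative Python mirror of the MutualFriends domain (names are assumptions).
WORLDS = {
    "ina": ("astronomy", "indoors"),
    "katie": ("astronomy", "outdoors"),
    "nancy": ("german", "outdoors"),
    "sally": ("german", "indoors"),
}
UTTERANCES = ["studyHumanity", "studyScience", "likeIndoors", "likeOutdoors", "null"]


def mf_meaning(utterance, world):
    """Truth value of an utterance at a world, following the mfMeaning equations."""
    major, location = WORLDS[world]
    if utterance == "studyHumanity":
        return major == "german"
    if utterance == "studyScience":
        return major == "astronomy"
    if utterance == "likeIndoors":
        return location == "indoors"
    if utterance == "likeOutdoors":
        return location == "outdoors"
    return True  # the null utterance is true at every world


def true_worlds(utterance):
    """Worlds at which the utterance is true, in declaration order."""
    return [w for w in WORLDS if mf_meaning(utterance, w)]


# Each non-null utterance is true at exactly two of the four worlds.
halves = {u: true_worlds(u) for u in UTTERANCES if u != "null"}
```

Here `halves["studyHumanity"]` is `["nancy", "sally"]`, and every entry has length 2, which is the half-partition property stated above.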
Distributional Common Ground (re-exported from substrate) #
The PMF substrate above supplies the weights; the
conversation-update and RSA content below consume them.
CommonGround Update #
Common-ground update ([And21] §6, Figure 2): the lr-mixture
of the current common ground with the pragmatic-listener posterior —
PMF.mix at the learning rate.
Equations
- Anderson2021.updateCG cg post lr hlr hlr1 = PMF.mix ⟨lr, hlr⟩ ⋯ cg post
Bridge to [Luc59] linear learning: the CommonGround update has the same
algebraic form as LinearLearner.update with retention rate 1 - lr and
reinforcement target posterior:
CommonGround'(w) = (1 - lr) · CommonGround(w) + lr · posterior(w) [Anderson]
v'(a) = α · v(a) + (1 - α) · r(a) [Luce §4.C]
Setting α = 1 - lr and r = posterior makes the formulas identical.
Multi-turn conversation IS iterated learning over distributions.
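The identity between the two formulas can be checked on exact rationals. The sketch below assumes distributions are plain lists of `Fraction` values; `update_cg` and `luce_update` are hypothetical Python names for the two displayed rules, and the example numbers are arbitrary.

```python
from fractions import Fraction as F


def update_cg(cg, post, lr):
    """Anderson's rule: CG'(w) = (1 - lr) * CG(w) + lr * posterior(w)."""
    return [(1 - lr) * c + lr * p for c, p in zip(cg, post)]


def luce_update(v, r, alpha):
    """Luce's linear rule: v'(a) = alpha * v(a) + (1 - alpha) * r(a)."""
    return [alpha * x + (1 - alpha) * y for x, y in zip(v, r)]


cg = [F(1, 2), F(1, 4), F(1, 4)]    # arbitrary example common ground
post = [F(0), F(1, 2), F(1, 2)]     # arbitrary example posterior
lr = F(1, 3)

anderson = update_cg(cg, post, lr)
luce = luce_update(cg, post, alpha=1 - lr)  # retention rate alpha = 1 - lr
```

With α = 1 - lr and r = posterior, `anderson` and `luce` are the same list, and the result still sums to one because it is a convex combination of two distributions.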
Conversation State #
The state of a two-participant conversation (Figure 2).
Tracks the common ground (distributional), each participant's private
beliefs, and the learning rate for updates. In the shared CommonGround model
(§5.1, Figure 4), both participants access the same cg. In the
approximate CommonGround model (§5.2, Figure 6), each maintains a separate
approximation (not yet formalized).
The distributional CommonGround enters the RSA model at two points
(Figure 4): inside the literal listener and as the pragmatic listener's
prior. At each turn the chain is rebuilt at the current CommonGround
(conversationStep).
- cg : PMF W
- belA : PMF W
- belB : PMF W
- lr : ℝ
- speakerIsA : Bool
Initial conversation state: uniform CommonGround, specified beliefs, A speaks first.
Equations
- Anderson2021.ConversationState.initial belA belB lr = { cg := PMF.uniformOfFintype W, belA := belA, belB := belB, lr := lr, speakerIsA := true }
Observation Sampling #
Weighted sampling (§7.1): a world sampled in proportion to the speaker's belief — truthful (zero-probability worlds never sampled) but flip-flop-prone for noncommittal speakers (Figure 12).
Equations
- Anderson2021.weightedSample bel = bel.toRealFn
Thresholded sampling (§7.1.1): worlds below the confidence threshold are filtered out; with none left the speaker passes with the null utterance and "all updates are skipped" (Figure 13).
Equations
- Anderson2021.thresholdedSample bel θ w = if bel.toRealFn w ≥ θ then bel.toRealFn w else 0
Difference-based sampling (§7.2.1): worlds weighted by "the largest (positive) difference in probability between the speaker's beliefs and the Common Ground" — the paper's own truncation at zero (its fn. 14: probability reductions are also informative but assertions can't convey them). Non-redundancy as Gricean Quantity.
Equations
- Anderson2021.differenceSample bel cg w = max 0 (bel.toRealFn w - cg.toRealFn w)
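The three weighting schemes are small enough to sketch directly. The Python below follows the displayed equations on lists of exact rationals. The function names are hypothetical, and `speaker_passes` is an added helper (an assumption, not a library declaration) that marks the case where no world carries weight and the speaker would fall back to the null utterance.

```python
from fractions import Fraction as F


def weighted_sample(bel):
    """Weights proportional to the speaker's beliefs."""
    return list(bel)


def thresholded_sample(bel, theta):
    """Keep a world's belief mass only if it is at least theta."""
    return [b if b >= theta else F(0) for b in bel]


def difference_sample(bel, cg):
    """Positive part of belief minus common ground."""
    return [max(F(0), b - c) for b, c in zip(bel, cg)]


def speaker_passes(weights):
    """Hypothetical helper: with no weight anywhere, the speaker has nothing to assert."""
    return all(w == 0 for w in weights)


uniform = [F(1, 4)] * 4
peaked = [F(1, 6), F(1, 6), F(1, 2), F(1, 6)]  # illustrative 3:1 beliefs

confident = thresholded_sample(peaked, F(1, 3))   # only the peak survives
noncommittal = thresholded_sample(uniform, F(1, 3))
novel = difference_sample(peaked, uniform)        # mass only where belief exceeds CG
redundant = difference_sample(uniform, uniform)
```

In this example the peaked speaker keeps weight 1/2 on the peak after thresholding and 1/4 on it after differencing, while the uniform speaker ends up with all-zero weights in both cases.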
BToM Shared-State Update #
The Figure-5 beliefs #
The §5.1.1 scenario — A thinks the person is Nancy, B thinks Katie — is qualitative in the paper; the 3:1 peak (renormalised to ½ there, 1/6 elsewhere) is this file's illustrative instantiation.
Beliefs peaked on one world: illustrative 3:1 weights, renormalised.
Equations
- Anderson2021.peakedBeliefs p = PMF.ofRealWeightFn (fun (w : Anderson2021.MFWorld) => if w = p then 3 else 1) ⋯ ⋯
A believes Nancy; B believes Katie (Figure 5).
See beliefsA.
Peaked beliefs favor their peak over every other world.
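The renormalised masses quoted above (½ at the peak, 1/6 elsewhere) follow from the 3:1 weights. A minimal sketch, with `peaked_beliefs` as a hypothetical Python counterpart of the Lean definition:

```python
from fractions import Fraction as F

WORLDS = ["ina", "katie", "nancy", "sally"]


def peaked_beliefs(peak):
    """Weight 3 on the peak and 1 on every other world, renormalised to sum to one."""
    raw = {w: F(3) if w == peak else F(1) for w in WORLDS}
    total = sum(raw.values())
    return {w: x / total for w, x in raw.items()}


beliefs_a = peaked_beliefs("nancy")  # A thinks the person is Nancy
beliefs_b = peaked_beliefs("katie")  # B thinks the person is Katie
```

The total weight is 3 + 1 + 1 + 1 = 6, so the peak receives 3/6 = 1/2 and each remaining world 1/6.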
The Figure-4 model on ℚ≥0 scores #
The Figure-4 chain #
Shared-CG RSA: L0 ∝ ⟦u⟧·CG, S1 ∝ LitList (α = 1, fn. 3), L1 ∝ PragSpeak·CG, with the endorsement speaker S2 ∝ L1.
The score chain #
CG-weighted literal listener ([And21] Figure 4: L0 ∝ ⟦u⟧·CG).
Equations
- Anderson2021.l0Score cg u = PMF.normalizeScores fun (w : Anderson2021.MFWorld) => if Anderson2021.mfMeaning u w = true then cg w else 0
Pragmatic speaker ([And21] Figure 4: S1 ∝ LitList; fn. 3: the
softmax terms are omitted and probabilities renormalized, i.e. α = 1 and
no cost — the speaker weight IS the literal-listener value).
Equations
- Anderson2021.s1Score cg w = PMF.normalizeScores fun (u : Anderson2021.MFUtterance) => Anderson2021.l0Score cg u w
Pragmatic listener ([And21] Figure 4: L1 ∝ PragSpeak·CG).
Equations
- Anderson2021.l1Score cg u = PMF.normalizeScores fun (w : Anderson2021.MFWorld) => cg w * Anderson2021.s1Score cg w u
Endorsement speaker: S2(u|w) ∝ L1(w|u) (uniform utterance prior),
the standard endorsement inversion of L1 over utterances.
Equations
- Anderson2021.s2Score cg w = PMF.normalizeScores fun (u : Anderson2021.MFUtterance) => Anderson2021.l1Score cg u w
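The four score equations can be traced end to end on exact rationals. The following Python sketch mirrors them with dictionaries of `Fraction` values. All names are hypothetical, and the treatment of an all-zero row (returned unchanged) is an assumption about `PMF.normalizeScores`, not something the equations above state.

```python
from fractions import Fraction as F

WORLDS = ["ina", "katie", "nancy", "sally"]
UTTS = ["studyHumanity", "studyScience", "likeIndoors", "likeOutdoors", "null"]
TRUE_IN = {  # worlds at which each utterance is true
    "studyHumanity": {"nancy", "sally"},
    "studyScience": {"ina", "katie"},
    "likeIndoors": {"ina", "sally"},
    "likeOutdoors": {"katie", "nancy"},
    "null": set(WORLDS),
}


def normalize(scores):
    """Divide by the total; an all-zero row is returned unchanged (assumption)."""
    total = sum(scores.values())
    return dict(scores) if total == 0 else {k: v / total for k, v in scores.items()}


def l0(cg, u):
    """Literal listener: CG weight on worlds where u is true, renormalised."""
    return normalize({w: cg[w] if w in TRUE_IN[u] else F(0) for w in WORLDS})


def s1(cg, w):
    """Pragmatic speaker: proportional to the literal listener (alpha = 1, no cost)."""
    return normalize({u: l0(cg, u)[w] for u in UTTS})


def l1(cg, u):
    """Pragmatic listener: speaker value times CG weight."""
    return normalize({w: cg[w] * s1(cg, w)[u] for w in WORLDS})


def s2(cg, w):
    """Endorsement speaker: proportional to L1 over utterances."""
    return normalize({u: l1(cg, u)[w] for u in UTTS})


uniform = {w: F(1) for w in WORLDS}  # unnormalised uniform common ground
```

At the uniform common ground, `s1(uniform, "nancy")` gives 2/5 to "studyHumanity" and "likeOutdoors" and 1/5 to null, and `l1(uniform, "studyHumanity")` gives 1/2 each to nancy and sally.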
The prior-transparency mechanism #
Worlds with the same truth profile get identical speaker columns — the
common-ground factor cancels out of S1's normalization
(PMF.normalizeScores_mul_left) — so L1 orders them purely by their
common-ground weights. Every tie and tie-break below is an instance.
Same truth profile, same speaker column: the CG weight scales the whole L0 row and cancels under normalization.
L1 orders same-profile worlds by their CG weights alone.
Turn-1 speaker (uniform CommonGround, [And21] Figure 2:
CG = Uniform(worlds)).
Equations
- Anderson2021.s1Turn1 w = PMF.ofScores PMF.Fallback.uniform (Anderson2021.s1Score (fun (x : Anderson2021.MFWorld) => 1) w)
Turn-1 pragmatic listener.
Equations
- Anderson2021.l1Turn1 u = PMF.ofScores PMF.Fallback.uniform (Anderson2021.l1Score (fun (x : Anderson2021.MFWorld) => 1) u)
Turn-1 endorsement speaker.
Equations
- Anderson2021.s2Turn1 w = PMF.ofScores PMF.Fallback.uniform (Anderson2021.s2Score (fun (x : Anderson2021.MFWorld) => 1) w)
Turn-1 speaker values ([And21] Figure-4 equations at the uniform
CG; derived exact rationals — Figure 5 reports the qualitative profile):
2/5 on each true specific utterance, 1/5 on null, 0 off support.
Turn-1 listener values (derived; L1 ∝ PragSpeak·CG, Figure 4): 1/2
on each world satisfying a specific utterance, 1/4 on every world after
null, 0 off support.
Turn-1 predictions #
The turn-1 speaker: true-and-specific beats false or uninformative;
equal literal posteriors tie (s1Score_congr at the uniform CG).
The turn-1 listener: utterances favor the worlds they discriminate and
tie same-profile worlds at the uniform CG (l1Score_lt_iff).
The null utterance conveys nothing: L1 stays uniform.
Turn 2 (Post-Update Prior) #
CommonGround weights after hearing "studyHumanity" at turn 1.
After L1 processes "studyHumanity" with a uniform prior, the posterior
concentrates on nancy and sally (the German-studying worlds). In world
order (ina, katie, nancy, sally):
L1(studyHumanity) = [0, 0, 1/2, 1/2]. Updating the normalized uniform
CommonGround [1/4, 1/4, 1/4, 1/4] via updateCG with lr = 0.2 (fn. 9) gives:
CommonGround'(w) = 0.8 · 1/4 + 0.2 · L1(w)
CommonGround' = [1/5, 1/5, 3/10, 3/10]
The weights [2, 2, 3, 3] are proportional to [1/5, 1/5, 3/10, 3/10], which is the exact post-update CommonGround from the paper's Figure 5, panel 1A. Since RSA normalizes, proportional weights give identical predictions.
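The arithmetic above can be reproduced with exact rationals. A minimal sketch, assuming lr = 0.2 is represented as the fraction 1/5; the variable names are illustrative only.

```python
from fractions import Fraction as F

uniform_cg = [F(1, 4)] * 4                    # normalized uniform common ground
l1_posterior = [F(0), F(0), F(1, 2), F(1, 2)]  # L1 after "studyHumanity"
lr = F(1, 5)                                  # learning rate 0.2

# CommonGround'(w) = (1 - lr) * CommonGround(w) + lr * L1(w)
cg_next = [(1 - lr) * c + lr * p for c, p in zip(uniform_cg, l1_posterior)]

# Scaling by 10 recovers the integer weights used for the turn-2 chain.
integer_weights = [10 * x for x in cg_next]
```

Worlds outside the posterior's support keep 0.8 · 1/4 = 1/5, while supported worlds gain 0.2 · 1/2 = 1/10 on top of that, giving 3/10.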
Turn-2 speaker at the post-update CommonGround.
Turn-2 pragmatic listener.
Turn-2 predictions #
After the "studyHumanity" update, L1 orders same-profile worlds by
their new CG weights (l1Score_lt_iff): the outdoor pair breaks toward
Nancy, the indoor pair toward Sally, equal-weight pairs stay tied.
The key multi-turn prediction: turn 1 ties Nancy and Katie under "likeOutdoors"; the CG update breaks the tie toward Nancy.
The CG-adapted speaker #
The CG-weighted L0 makes speakers prefer new over redundant
information: after the update, Nancy's speaker switches to "likeOutdoors"
(re-asserting "studyHumanity" is redundant at 3:3 weights) and Ina's to
"studyScience" (Sally now dominates the indoor partition). At turn 1 both
pairs were symmetric (s1_turn1_informativity).
The endorsement speaker inverts L1: for Nancy, "studyHumanity" beats the false "studyScience" and ties the symmetric "likeOutdoors".
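The turn-2 effects can be checked numerically by running the same chain at the weights [2, 2, 3, 3]. The sketch below is a compact, self-contained Python rendering under two assumptions: all common-ground weights are positive (so no division by zero arises), and the helper names are hypothetical.

```python
from fractions import Fraction as F

WORLDS = ["ina", "katie", "nancy", "sally"]
TRUE_IN = {
    "studyHumanity": {"nancy", "sally"},
    "studyScience": {"ina", "katie"},
    "likeIndoors": {"ina", "sally"},
    "likeOutdoors": {"katie", "nancy"},
    "null": set(WORLDS),
}


def norm(row):
    """Renormalise a score row; an all-zero row is left as is (assumption)."""
    total = sum(row.values())
    return {k: v / total for k, v in row.items()} if total else dict(row)


def s1(cg, w):
    """Speaker at world w: L0(w | u) is cg restricted to u's true worlds, renormalised."""
    l0_at_w = {
        u: cg[w] / sum(cg[v] for v in ws) if w in ws else F(0)
        for u, ws in TRUE_IN.items()
    }
    return norm(l0_at_w)


def l1(cg, u):
    """Pragmatic listener: speaker value times common-ground weight."""
    return norm({w: cg[w] * s1(cg, w)[u] for w in WORLDS})


def preferred(cg, w):
    """Utterance with the highest speaker probability at world w."""
    row = s1(cg, w)
    return max(row, key=row.get)


turn1 = {w: F(1) for w in WORLDS}
turn2 = dict(zip(WORLDS, [F(2), F(2), F(3), F(3)]))

before = l1(turn1, "likeOutdoors")  # katie and nancy tie at 1/2
after = l1(turn2, "likeOutdoors")   # katie 56/155, nancy 99/155
```

Under these assumptions `preferred(turn2, "nancy")` is "likeOutdoors" and `preferred(turn2, "ina")` is "studyScience", matching the speaker switch described above.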
Parametric RSA and Conversation Step #
One Figure-2 loop step on the ℚ≥0 face: build the chain at the current common ground, take the L1 posterior, and mix it in at the learning rate. A dead score row — an utterance contradicting the whole common ground — skips the update (§7.1's null behaviour).
With lr = 0, the conversation step leaves the CommonGround unchanged.
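The shape of one loop step can be sketched without the full chain. In the Python below the chain is abstracted away: `conversation_step` is a hypothetical function that takes an already-computed, unnormalised L1 score row as input, which is a simplifying assumption rather than the library's signature.

```python
from fractions import Fraction as F


def conversation_step(cg, l1_scores, lr):
    """Mix the normalised L1 posterior into the common ground at rate lr.

    A dead score row (all zeros) means the utterance contradicts the whole
    common ground, so the update is skipped and cg is returned unchanged.
    """
    total = sum(l1_scores)
    if total == 0:
        return list(cg)
    posterior = [F(s) / total for s in l1_scores]
    return [(1 - lr) * c + lr * p for c, p in zip(cg, posterior)]


cg = [F(1, 5), F(1, 5), F(3, 10), F(3, 10)]
scores = [0, 56, 99, 0]  # example unnormalised L1 row

stepped = conversation_step(cg, scores, F(1, 5))
frozen = conversation_step(cg, scores, F(0))         # lr = 0 changes nothing
skipped = conversation_step(cg, [0, 0, 0, 0], F(1, 5))  # dead row skips the update
```

Both `frozen` and `skipped` equal the input common ground, while `stepped` remains a distribution summing to one.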
Qualitative information-sharing properties #
An informative utterance concentrates L1 above the uniform null baseline (1/2 vs 1/4 on Nancy).
Bridge to Classical CommonGround Update #
The [Sta02] set-intersection update survives only at the lr = 1 limit: with the prior discarded, a zero-posterior world leaves the context set.
The graded divergence (fn. 7: "worlds can regain probability"): at
any lr < 1 the prior keeps a ruled-out world alive with mass
(1 − lr)·cg w — the update never deletes a world the prior supports.
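Both halves of this bridge can be seen on a small example. The sketch assumes list-based distributions, and `context_set` is a hypothetical helper returning the indices of positive-mass worlds.

```python
from fractions import Fraction as F


def update_cg(cg, post, lr):
    """Convex combination of the prior common ground and the posterior."""
    return [(1 - lr) * c + lr * p for c, p in zip(cg, post)]


def context_set(p):
    """Indices of the worlds with positive mass (the classical context set)."""
    return {i for i, x in enumerate(p) if x > 0}


cg = [F(1, 4)] * 4
post = [F(0), F(0), F(1, 2), F(1, 2)]  # worlds 0 and 1 are ruled out

classical = update_cg(cg, post, F(1))    # prior discarded entirely
graded = update_cg(cg, post, F(1, 5))    # prior retained at weight 4/5
```

At lr = 1 the context set shrinks to the posterior's support, as in set intersection. At lr = 1/5 a ruled-out world keeps mass (1 - 1/5) · 1/4 = 1/5, so all four worlds remain in the context set.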
Exact Numerical Predictions (turn 1) #
Exact turn-1 values #
Null gives uniform L1: every world has the same S1(null|w) by the domain's symmetry, so L1(w|null) = CommonGround(w)/Σ CommonGround = 1/4.
Approximate CommonGround Model (§5.2, Figure 6) #
Approximate common ground #
Separate speaker/listener CG representations (Figure 6): the listener's Pragmatic Listener runs on their own beliefs, so divergence can arise when those differ from the common ground.
State for the Approximate CommonGround model (§5.2, Figure 6).
- cgA : PMF W
- cgB : PMF W
- belA : PMF W
- belB : PMF W
- lr : ℝ
- speakerIsA : Bool
Equations
- Anderson2021.ApproxCGState.initial belA belB lr = { cgA := PMF.uniformOfFintype W, cgB := PMF.uniformOfFintype W, belA := belA, belB := belB, lr := lr, speakerIsA := true }
Approximate comprehension listener ([And21] Figure 6): L0/S1 run
over the listener's CommonGround approximation CG_L, but the Bayesian
inversion uses the listener's private beliefs B_L as the prior.
Equations
- Anderson2021.approxL1 cgL belL u = PMF.ofScores PMF.Fallback.uniform (PMF.normalizeScores fun (w : Anderson2021.MFWorld) => belL w * Anderson2021.s1Score cgL w u)
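The effect of using private beliefs as the prior can be illustrated with a small sketch. To stay short, the Python below takes the speaker column S1(u | w) as a precomputed input rather than rebuilding the chain. The values used are the turn-1 speaker values for "likeOutdoors" at a uniform CG_L, and the uniform fallback on an all-zero row is an assumed reading of `PMF.Fallback.uniform`.

```python
from fractions import Fraction as F

WORLDS = ["ina", "katie", "nancy", "sally"]


def approx_l1(bel_l, s1_column):
    """Listener posterior with private beliefs as prior: proportional to B_L(w) * S1(u | w)."""
    scores = {w: bel_l[w] * s1_column[w] for w in WORLDS}
    total = sum(scores.values())
    if total == 0:
        # Assumed fallback: uniform distribution on a dead row.
        return {w: F(1, len(WORLDS)) for w in WORLDS}
    return {w: s / total for w, s in scores.items()}


# S1("likeOutdoors" | w) at a uniform CG_L: 2/5 on the outdoor worlds, 0 elsewhere.
s1_like_outdoors = {"ina": F(0), "katie": F(2, 5), "nancy": F(2, 5), "sally": F(0)}

uniform_bel = {w: F(1, 4) for w in WORLDS}
katie_bel = {"ina": F(1, 6), "katie": F(1, 2), "nancy": F(1, 6), "sally": F(1, 6)}

neutral = approx_l1(uniform_bel, s1_like_outdoors)  # katie 1/2, nancy 1/2
biased = approx_l1(katie_bel, s1_like_outdoors)     # katie 3/4, nancy 1/4
```

A listener who privately favors Katie resolves "likeOutdoors" toward Katie even though the speaker column treats the two outdoor worlds alike, which is the kind of divergence the approximate model is meant to capture.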
Belief Update Model (§6, Figure 8) #
The belief update model extends the conversation system by also updating participants' private beliefs. After comprehension, the listener updates their beliefs via the same linear rule as CommonGround update:
Bel'(w) = (1 - lr_bel) · Bel(w) + lr_bel · posterior(w)
The speaker does not update their beliefs, since they already knew the information. Separate learning rates for CommonGround and beliefs allow modeling:
- §6.2: skeptical listeners (low belief lr, high CommonGround lr)
- §6.3: uncertainty-based rates (lr scales with entropy)
State for the belief update model (Figure 8). Extends approximate CommonGround with separate learning rates for CommonGround and beliefs.
- cgA : PMF W
- cgB : PMF W
- belA : PMF W
- belB : PMF W
- cgLR : ℝ
Learning rate for CommonGround updates.
- belLR : ℝ
Learning rate for belief updates (may be lower for skeptical agents).
- speakerIsA : Bool
Belief update is algebraically identical to CommonGround update — both are instances of [Luc59]'s linear learning rule. The only difference is the learning rate parameter and the interpretation (private vs shared).
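A skeptical listener can be sketched by applying the same linear rule twice with different rates. The function names and the specific rates (1/2 for the common ground, 1/10 for beliefs) are illustrative assumptions, not values from the paper.

```python
from fractions import Fraction as F


def mix(old, post, lr):
    """Linear rule shared by common-ground and belief updates."""
    return [(1 - lr) * o + lr * p for o, p in zip(old, post)]


def listener_update(cg_l, bel_l, post, cg_lr, bel_lr):
    """Listener side of one turn: both the CG approximation and private beliefs move."""
    return mix(cg_l, post, cg_lr), mix(bel_l, post, bel_lr)


# World order: ina, katie, nancy, sally.
cg_b = [F(1, 4)] * 4
bel_b = [F(1, 6), F(1, 2), F(1, 6), F(1, 6)]  # B privately favors katie
post = [F(0), F(0), F(1, 2), F(1, 2)]         # posterior after "studyHumanity"

# Skeptical listener: high CG rate, low belief rate (illustrative values).
new_cg, new_bel = listener_update(cg_b, bel_b, post, cg_lr=F(1, 2), bel_lr=F(1, 10))
```

With these rates the listener's common ground shifts clearly toward nancy and sally (3/8 each), while their private belief in katie only drops from 1/2 to 9/20 and remains the largest.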
Noncommittal Speaker Problem (§7.1) #
A speaker with uniform beliefs (no private information) assigns equal weight to all worlds under weighted sampling. Since no observation is more probable than any other, the speaker makes random assertions about worlds they don't know, causing the CommonGround to flip-flop (Figure 12).
Solutions:
- Threshold sampling (§7.1.1): filter out low-confidence worlds; a noncommittal speaker passes (null utterance) instead of guessing.
- Uncertainty-based lr (§6.3): scale the CommonGround update rate by the listener's uncertainty, so confident listeners resist random input.
Uniform beliefs assign equal weight to all worlds under weighted sampling — a noncommittal speaker has no basis for choosing.
Threshold sampling filters out all worlds when no belief reaches the
threshold. Every mass of a distribution is ≤ 1, so any θ > 1
produces zero weight everywhere, and the speaker passes (Figure 13).
Threshold preserves confident worlds: weights at or above θ pass through.
Redundancy and Difference Sampling (§7.2) #
Under weighted sampling, a speaker whose beliefs match the CommonGround keeps
repeating already-shared information (Figure 14). Difference-based
sampling fixes this by weighting worlds by max(0, Bel(w) - CommonGround(w)):
worlds already established in the CommonGround get zero weight.
Combined with thresholding, thresholded difference-based sampling gives the best behavior (Figure 15): informed speakers contribute new information, noncommittal speakers pass.
When beliefs match the CommonGround exactly, difference sampling assigns zero weight to all worlds — nothing new to contribute.
Difference sampling assigns positive weight when belief exceeds CommonGround — these worlds carry new information not yet in the common ground.