RSAConfig.composeWithPrior — stage-to-stage Bayesian composition #
@cite{nouwen-2024}
The canonical operation that composes two RSAConfigs by Bayesian update:
the L1 posterior of the previous stage becomes the worldPrior of the next.
  prev : RSAConfig U₁ W                    next : RSAConfig U₂ W
  ─────────────────────                    ────────────────────
  prev.L1 u_prev : W → ℝ            (anything that takes a worldPrior)
            │
            ▼
  composeWithPrior prev u_prev next : RSAConfig U₂ W
    (= next, but with worldPrior overridden by prev.L1 u_prev)
This makes the "one config's L1 IS the next config's prior" idiom — currently
inlined in @cite{nouwen-2024}'s seqAdjCfg — a first-class operation rather
than a per-study plumbing pattern.
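A minimal usage sketch of the idiom. RSAConfig, composeWithPrior, L1, and worldPrior are the names used in this file; the type variables and the evaluative/adjective configs below are illustrative placeholders, not linglib declarations.

```lean
-- Sketch only: evalCfg, adjCfg and u_eval are hypothetical stand-ins for the
-- evaluative and adjective stages of the seqAdjCfg idiom.
variable {Eval Adj World : Type}
variable (evalCfg : RSAConfig Eval World)  -- stage 1: evaluative inference
variable (adjCfg  : RSAConfig Adj World)   -- stage 2: adjective inference
variable (u_eval  : Eval)                  -- observed stage-1 utterance

-- Stage 2 reuses adjCfg wholesale, except that its worldPrior is now
-- evalCfg's L1 posterior after observing u_eval.
example : RSAConfig Adj World := composeWithPrior evalCfg u_eval adjCfg
```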
Speaker chain rule (A) vs listener chain rule (C) #
Sequential RSA admits two encodings that are Bayesian-equivalent in the right setup:
- (A) Speaker chain (within-utterance, Ctx-based). A single RSAConfig with Ctx = List U and a ctx-dependent meaning function. Per-word S1 probabilities are chained via trajectoryProb (Basic.lean). Used by Incremental.lean (CGGP) and Product-of-Experts noisy models (@cite{waldon-degen-2021}, @cite{schlotterbeck-wang-2023}).
- (C) Listener chain (stage-to-stage). A chain of RSAConfigs where stage k's worldPrior is stage (k-1)'s L1 (this operation). Used by @cite{nouwen-2024}'s sequential evaluative-then-adjective inference.
For a noisy lex f : U → W → ℚ and an utterance sequence [u₁, ..., uₙ], both encodings
yield the same L1 posterior:
L1(r | u₁, ..., uₙ) ∝ ∏ₖ f(uₖ, r) · prior(r)
A computes this via the speaker-side product ∏ₖ S1(uₖ | r, ctxₖ), where ctxₖ is the
prefix [u₁, ..., uₖ₋₁] (noting that linglib's trajectoryProb is a product of per-step
softmaxes, which agrees with this only after marginalization; see the
trajectoryProb_eq_compose_chain TODO); C computes it by iterating the Bayesian
update on the listener side.
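A quick way to see the C side: writing P₀ = prior and Pₖ for the posterior after stage k, each stage is one Bayesian update

  Pₖ(r) = f(uₖ, r) · Pₖ₋₁(r) / Zₖ,   Zₖ = Σ_r f(uₖ, r) · Pₖ₋₁(r),

so by induction Pₙ(r) ∝ prior(r) · ∏ₖ f(uₖ, r); the per-step denominators Zₖ do not depend on r and are absorbed by the final normalization.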
The trajectoryProb_eq_compose_chain theorem below states the equivalence
explicitly (with sorry); the full proof relates per-step S1 normalization
(A) to per-step Bayesian denominators (C), and is left as future work
documented in MEMORY.md.
Compose two RSAConfigs by Bayesian update: override next.worldPrior
with prev.L1 u_prev. All other fields come from next.
This is the canonical stage-to-stage composition operation. The previous
stage's listener posterior — given some specific observed utterance
u_prev from prev's vocabulary — becomes the prior over worlds for
the next stage.
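A plausible shape of the definition, assuming RSAConfig is a structure whose worldPrior field can be overridden with structure-update syntax; the real defining equation may carry additional implicit arguments or typeclass assumptions.

```lean
-- Sketch of the defining equation; not the verbatim linglib source.
def composeWithPrior {U₁ U₂ W : Type} (prev : RSAConfig U₁ W) (u_prev : U₁)
    (next : RSAConfig U₂ W) : RSAConfig U₂ W :=
  { next with worldPrior := prev.L1 u_prev }
```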
Associativity at the worldPrior layer. Composing a chain
c₁ → c₂ → c₃ is the same as composing (c₁ → c₂) → c₃ because
composeWithPrior only overrides the worldPrior, and the final
config's worldPrior comes from the immediately-preceding config's L1.
Note: this is associativity of worldPrior threading, not associativity
in a category-theoretic sense — the L1 of composeWithPrior is a derived
quantity that may differ from prev.L1 because it uses the new prior.
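Relative to the definition sketch above (and only relative to it), the threading claim is definitional: the final config's worldPrior is exactly the preceding stage's L1, however the earlier stages were grouped.

```lean
-- Illustration against the sketch definition; the library's composeWithPrior
-- may need an actual proof rather than rfl.
variable {U₁ U₂ U₃ W : Type}
variable (c₁ : RSAConfig U₁ W) (c₂ : RSAConfig U₂ W) (c₃ : RSAConfig U₃ W)
variable (u₁ : U₁) (u₂ : U₂)

example :
    (composeWithPrior (composeWithPrior c₁ u₁ c₂) u₂ c₃).worldPrior
      = (composeWithPrior c₁ u₁ c₂).L1 u₂ := rfl
```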
Speaker-vs-listener chain rule equivalence (TODO, sorry'd).
For a Product-of-Experts noisy lex s : NoisyLex U W over a shared
utterance vocabulary, the within-utterance Ctx-based encoding (A —
speaker chain via trajectoryProb) and the stage-to-stage
composeWithPrior chain (C — listener chain) yield the same L1
posterior at the final stage.
Formally: for any prefix [u₁, ..., uₙ] of utterances and final-step
observation uₙ, the L1 of the (n-stage composeWithPrior chain
starting from one-shot configs over s.lex) equals the L1 of the
sequential config (s.toRSAConfigSeq scene) after observing the
trajectory.
This documents the architectural identity that justifies the coexistence of the two idioms in the codebase. The full proof requires relating per-step S1 normalization (A) to per-step Bayesian denominators (C). It is left as a sorry-marked TODO so the claim is grep-able and warning-flagged.
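One way to organize that proof (an argument sketch only, not the intended Lean development): show that each encoding's final posterior is pointwise proportional to the same score,

  L1_A(r | u₁ … uₙ) ∝ prior(r) · ∏ₖ f(uₖ, r) ∝ L1_C(r | u₁ … uₙ),

where the encoding-specific constants (per-step S1 normalizers for A, after the marginalization step noted above; per-step Bayesian denominators for C) do not depend on r. Since both posteriors sum to 1 over r, those constants cancel and the two posteriors coincide.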