[WD21] — Continuous-Incremental RSA (CI-RSA) #

[CGGP19] [DHG+20]

Waldon, B. & Degen, J. (2021). Modeling cross-linguistic production of referring expressions. Proceedings of the Society for Computation in Linguistics (SCiL) 4, 206–215.

The Model #

CI-RSA synthesizes two RSA extensions:

Incremental RSA ([CGGP19]): Word-by-word production via the chain rule S1(u|r) = ∏ₖ S1(wₖ | [w₁,...,wₖ₋₁], r)
Continuous semantics ([DHG+20]): Noisy adjective reliability L^C(r, i) = v^i if i true of r, else 1 - v^i

The incremental meaning function averages continuous semantics over grammatical completions of the current prefix:

X^C(c, i, r) = Σ_{u ⊒ c+i} ⟦u⟧^C(r) / |{u : u ⊒ c+i}|

The utterance set is scene-filtered: only utterances Boolean-true of at least one scene member are included (Figure 1).

Formalization #

This builds on the incremental word-by-word chain (following [CGGP19]), adding:

Continuous (ℚ-valued) meaning instead of Boolean extension-counting
S1 score with α = 7: informativity factor L0⁷ (s1BaseQ, exact in ℚ) times one cost factor exp(−7/10) per adjective
Scene-parameterized configs for cross-condition comparisons

Predictions 1–3 are trajectory probability comparisons across different (language × scene) configurations of the same chain; Prediction 4 compares these probabilities summed across scenes.
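The chain-rule factorization behind these trajectory probabilities can be sketched as below. This is a minimal illustration rather than the library's definition: `trajectoryProbSketch` and its `step` argument are hypothetical names, with `step ctx w` standing in for S1(w | ctx, r) at a fixed referent r.

```lean
import Mathlib

/-- Hypothetical sketch: multiply per-word speaker probabilities along an
utterance, extending the context by one word at each step. -/
def trajectoryProbSketch {W : Type} (step : List W → W → ℚ) :
    List W → List W → ℚ
  | _, [] => 1
  | ctx, w :: ws => step ctx w * trajectoryProbSketch step (ctx ++ [w]) ws

-- If every step had probability 1/2, a three-word trajectory would have
-- probability (1/2)^3 = 1/8.
example : (1 / 2 : ℚ) ^ 3 = 1 / 8 := by norm_num
```

A prediction then compares two such products, each computed under its own utterance list and scene.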

Predictions #

#	Prediction	Status
1	English color/size asymmetry: SS > CS	`prediction1_english_asymmetry`
2	Cross-linguistic: English SS > Spanish SS	`prediction2_cross_linguistic`
3	Spanish flip: CS > SS for redundant size	`prediction3_spanish_flip`
4	Overall: English total > Spanish total	`prediction4_overall_redundancy`

Connections #

Noise theory: lexContinuousQ instantiates the unified noise channel from RSA.Core.Noise. See lexContinuous_as_noiseChannel.
Incremental RSA: Extends [CGGP19] with continuous semantics and cross-linguistic word order variation.

Domain Types #

source

inductive WaldonDegen2021.Word :

Type

Words available to the incremental speaker: two color adjectives, two size adjectives, a noun ("pin"), and an explicit stop token. The stop token models the speaker's choice to end the utterance; without it, postnominal word orders lack a way to represent the stopping decision after the noun (cf. English where "pin" naturally terminates utterances).

blue : Word
red : Word
big : Word
small : Word
pin : Word
stop : Word

Instances For

source

@[implicit_reducible]

instance WaldonDegen2021.instDecidableEqWord :

DecidableEq Word

Equations

WaldonDegen2021.instDecidableEqWord x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯

source

@[implicit_reducible]

instance WaldonDegen2021.instFintypeWord :

Fintype Word

Equations

WaldonDegen2021.instFintypeWord = { elems := { val := ↑WaldonDegen2021.Word.enumList, nodup := WaldonDegen2021.Word.enumList_nodup }, complete := WaldonDegen2021.instFintypeWord._proof_1 }

source

def WaldonDegen2021.instReprWord.repr :

Word → ℕ → Std.Format

Equations

One or more equations did not get rendered due to their size.
WaldonDegen2021.instReprWord.repr WaldonDegen2021.Word.blue prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "WaldonDegen2021.Word.blue")).group prec✝
WaldonDegen2021.instReprWord.repr WaldonDegen2021.Word.red prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "WaldonDegen2021.Word.red")).group prec✝
WaldonDegen2021.instReprWord.repr WaldonDegen2021.Word.big prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "WaldonDegen2021.Word.big")).group prec✝
WaldonDegen2021.instReprWord.repr WaldonDegen2021.Word.pin prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "WaldonDegen2021.Word.pin")).group prec✝
WaldonDegen2021.instReprWord.repr WaldonDegen2021.Word.stop prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "WaldonDegen2021.Word.stop")).group prec✝

Instances For

source

@[implicit_reducible]

instance WaldonDegen2021.instReprWord :

Repr Word

Equations

WaldonDegen2021.instReprWord = { reprPrec := WaldonDegen2021.instReprWord.repr }

source

instance WaldonDegen2021.instNonemptyWord :

Nonempty Word

source

inductive WaldonDegen2021.Referent :

Type

Referents in the 2×2 reference game: big/small × blue/red.

bigBlue : Referent
bigRed : Referent
smallBlue : Referent
smallRed : Referent

Instances For

source

@[implicit_reducible]

instance WaldonDegen2021.instDecidableEqReferent :

DecidableEq Referent

Equations

WaldonDegen2021.instDecidableEqReferent x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯

source

@[implicit_reducible]

instance WaldonDegen2021.instFintypeReferent :

Fintype Referent

Equations

One or more equations did not get rendered due to their size.

source

@[implicit_reducible]

instance WaldonDegen2021.instReprReferent :

Repr Referent

Equations

WaldonDegen2021.instReprReferent = { reprPrec := WaldonDegen2021.instReprReferent.repr }

source

def WaldonDegen2021.instReprReferent.repr :

Referent → ℕ → Std.Format

Equations

One or more equations did not get rendered due to their size.

Instances For

Boolean Semantics #

source

def WaldonDegen2021.wordApplies :

Word → Referent → Bool

Whether a word is veridically true of a referent.

Equations

Instances For
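Because the equations for wordApplies are not rendered here, the following is only a guess at their shape. `WordSketch`, `ReferentSketch`, and `wordAppliesSketch` are hypothetical names, and treating pin and stop as true of every referent is an assumption.

```lean
/-- Hypothetical mirror of the word inventory above. -/
inductive WordSketch
  | blue | red | big | small | pin | stop
  deriving DecidableEq, Repr

/-- Hypothetical mirror of the 2×2 referent grid above. -/
inductive ReferentSketch
  | bigBlue | bigRed | smallBlue | smallRed
  deriving DecidableEq, Repr

/-- Assumed truth conditions: an adjective holds of the referents carrying
that property; `pin` and `stop` are assumed to hold of everything. -/
def wordAppliesSketch : WordSketch → ReferentSketch → Bool
  | .blue, .bigBlue | .blue, .smallBlue => true
  | .red, .bigRed | .red, .smallRed => true
  | .big, .bigBlue | .big, .bigRed => true
  | .small, .smallBlue | .small, .smallRed => true
  | .pin, _ | .stop, _ => true
  | _, _ => false

example : wordAppliesSketch .blue .smallBlue = true := rfl
example : wordAppliesSketch .big .smallBlue = false := rfl
```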

Continuous Semantics #

source

def WaldonDegen2021.semanticValueQ :

Word → ℚ

Semantic reliability values v^i. Color adjectives are more reliable than size adjectives: v^color = 19/20 (0.95), v^size = 4/5 (0.8).

Equations

Instances For
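The reliability values stated in the docstring could be written as below. `WordSketch` and `semanticValueSketch` are hypothetical names; the value 1 for pin and stop is an assumption, since the rendered page does not show those cases.

```lean
import Mathlib

inductive WordSketch
  | blue | red | big | small | pin | stop

/-- Sketch of the reliability values v^i from the docstring. -/
def semanticValueSketch : WordSketch → ℚ
  | .blue | .red => 19 / 20   -- v^color = 0.95
  | .big | .small => 4 / 5    -- v^size = 0.8
  | .pin | .stop => 1         -- assumption: not shown in the rendered docs

-- Color is more reliable than size.
example : (4 / 5 : ℚ) < 19 / 20 := by norm_num
```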

source

def WaldonDegen2021.lexContinuousQ (r : Referent) (w : Word) :

ℚ

Continuous lexical interpretation L^C(r, i). Returns v^i if true, (1 - v^i) if false.

Equations

WaldonDegen2021.lexContinuousQ r w = if WaldonDegen2021.wordApplies w r = true then WaldonDegen2021.semanticValueQ w else 1 - WaldonDegen2021.semanticValueQ w

Instances For
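As a worked instance of the definition above, take the target smallBlue with the stated reliabilities v^color = 19/20 and v^size = 4/5. "blue" is true of it, so L^C = 19/20; "red" and "big" are false of it, so each gets the complement of its reliability. The arithmetic can be checked as:

```lean
import Mathlib

-- "red" is false of smallBlue: L^C = 1 − v^color.
example : (1 : ℚ) - 19 / 20 = 1 / 20 := by norm_num

-- "big" is false of smallBlue: L^C = 1 − v^size.
example : (1 : ℚ) - 4 / 5 = 1 / 5 := by norm_num
```

A false size word (1/5) is penalized less than a false color word (1/20), which is the sense in which size adjectives are noisier.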

source

def WaldonDegen2021.uttContinuousQ (r : Referent) (u : List Word) :

ℚ

Continuous utterance meaning ⟦u⟧^C(r) = ∏_{w ∈ u} L^C(r, w).

Equations

WaldonDegen2021.uttContinuousQ r u = List.foldl (fun (acc : ℚ) (w : WaldonDegen2021.Word) => acc * WaldonDegen2021.lexContinuousQ r w) 1 u

Instances For
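A worked product for the English utterance [small, blue, pin, stop] may help. The sketch below assumes pin and stop each contribute a factor of 1, which the rendered page does not confirm; the adjective factors follow from v^size = 4/5 and v^color = 19/20.

```lean
import Mathlib

-- Target smallBlue: both adjectives are true, so the product is
-- 1 · v^size · v^color · 1 · 1 (the leading 1 is the fold's initial value).
example : (1 : ℚ) * (4 / 5) * (19 / 20) * 1 * 1 = 19 / 25 := by norm_num

-- Competitor bigRed: both adjectives are false, so each factor is 1 − v^i.
example : (1 : ℚ) * (1 / 5) * (1 / 20) * 1 * 1 = 1 / 100 := by norm_num
```

Unlike a Boolean semantics, the false description keeps a small nonzero value, so it still takes part in the listener's normalization.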

Utterances (Scene-Filtered) #

source

def WaldonDegen2021.uttBoolTrue (u : List Word) (r : Referent) :

Bool

Boolean utterance truth: conjunction of word applicability.

Equations

WaldonDegen2021.uttBoolTrue u r = u.all fun (w : WaldonDegen2021.Word) => WaldonDegen2021.wordApplies w r

Instances For

source

def WaldonDegen2021.allUttsEng :

List (List Word)

All grammatical English (prenominal) utterances, each terminated by .stop. In English the noun always comes last before stop, so "pin" naturally precedes the stopping decision.

Equations

One or more equations did not get rendered due to their size.

Instances For

source

def WaldonDegen2021.allUttsSpn :

List (List Word)

All grammatical Spanish (postnominal) utterances, each terminated by .stop. The stop token is critical here: after [pin, blue], S1 chooses between .stop (ending the 2-word non-redundant utterance) and .small (continuing to the 3-word redundant utterance). Without .stop, the model would force continuation whenever a valid extension exists.

Equations

One or more equations did not get rendered due to their size.

Instances For

source

def WaldonDegen2021.sceneFilter (utts : List (List Word)) (scene : Referent → Bool) :

List (List Word)

Scene-filtered utterances: only those Boolean-true of at least one scene member (Figure 1). This yields 7 utterances per scene.

Equations

One or more equations did not get rendered due to their size.

Instances For
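Since the equations for sceneFilter are not rendered, here is one way the filter described above could be written. `sceneFilterSketch`, `holds`, and `referents` are hypothetical names: `holds` plays the role of uttBoolTrue, and `referents` is an explicit list of all referents, which the real definition may obtain differently.

```lean
/-- Hypothetical sketch: keep an utterance if it is Boolean-true of at least
one referent that belongs to the scene. -/
def sceneFilterSketch {W R : Type} (holds : List W → R → Bool)
    (referents : List R) (utts : List (List W)) (scene : R → Bool) :
    List (List W) :=
  utts.filter fun u => referents.any fun r => scene r && holds u r
```

Under this reading, a three-member scene from the 2×2 grid excludes exactly the utterances that are true only of the absent fourth referent.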

Production Cost #

source

def WaldonDegen2021.wordCostQ :

Word → ℚ

Per-word production cost (Section 4): each adjective incurs cost 0.1. Pin and stop have zero cost (noun and utterance boundary).

Equations

WaldonDegen2021.wordCostQ WaldonDegen2021.Word.pin = 0
WaldonDegen2021.wordCostQ WaldonDegen2021.Word.stop = 0
WaldonDegen2021.wordCostQ x✝ = 1 / 10

Instances For

Extension-Based Continuous Meaning #

source

def WaldonDegen2021.continuousMeaningQ (utts : List (List Word)) (scene : Referent → Bool) (pfx : List Word) (r : Referent) :

ℚ

Incremental continuous meaning: average continuous semantics over all grammatical completions of prefix.

X^C(c, i, r) = Σ_{u ⊒ c+i} ⟦u⟧^C(r) / |{u : u ⊒ c+i}|

Equations

One or more equations did not get rendered due to their size.

Instances For
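The averaging step can be sketched as follows, assuming the utterance list has already been scene-filtered. `completionsSketch`, `continuousMeaningSketch`, and `uttMeaning` are hypothetical names, and the actual definition (which also takes the scene) may differ in details such as how an empty completion set is handled.

```lean
import Mathlib

/-- Hypothetical sketch: utterances that extend the prefix `pfx`. -/
def completionsSketch {W : Type} [BEq W]
    (utts : List (List W)) (pfx : List W) : List (List W) :=
  utts.filter fun u => pfx.isPrefixOf u

/-- Hypothetical sketch: mean continuous meaning over the completions.
Division by zero is 0 in ℚ, so an empty completion set yields 0 here. -/
def continuousMeaningSketch {W R : Type} [BEq W]
    (uttMeaning : R → List W → ℚ) (utts : List (List W))
    (pfx : List W) (r : R) : ℚ :=
  let cs := completionsSketch utts pfx
  (cs.map (uttMeaning r)).sum / (cs.length : ℚ)

-- Illustration only: if the completions of [small] were exactly
-- [small, pin, stop] and [small, blue, pin, stop], with meanings 4/5 and
-- 19/25 for smallBlue, the average would be 39/50.
example : ((4 / 5 : ℚ) + 19 / 25) / 2 = 39 / 50 := by norm_num
```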

Scenes #

source

def WaldonDegen2021.ssScene :

Referent → Bool

Size-sufficient scene: {big_blue, big_red, small_blue}. Target small_blue is uniquely identified by size alone.

Equations

WaldonDegen2021.ssScene WaldonDegen2021.Referent.bigBlue = true
WaldonDegen2021.ssScene WaldonDegen2021.Referent.bigRed = true
WaldonDegen2021.ssScene WaldonDegen2021.Referent.smallBlue = true
WaldonDegen2021.ssScene x✝ = false

Instances For

source

def WaldonDegen2021.csScene :

Referent → Bool

Color-sufficient scene: {small_red, big_red, small_blue}. Target small_blue is uniquely identified by color alone.

Equations

WaldonDegen2021.csScene WaldonDegen2021.Referent.smallRed = true
WaldonDegen2021.csScene WaldonDegen2021.Referent.bigRed = true
WaldonDegen2021.csScene WaldonDegen2021.Referent.smallBlue = true
WaldonDegen2021.csScene x✝ = false

Instances For

Exact-ℚ face and the cost atom #

With α = 7 the informativity factor L0^α is exact ℚ; the only transcendental ingredient is the per-adjective cost factor cAtom = RSA.expAtom (7/10), bounded two-sidedly via the substrate certificates and kernel arithmetic on e-bounds. Every prediction trajectory reduces to K · cAtom / (A + B · cAtom) with kernel-certified rational constants, so the comparisons are linear (the sum comparison quadratic) in the atom.
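To see why a comparison of two such trajectories is linear in the atom, write c for cAtom and label the constants of the two trajectories with subscripts 1 and 2 (the subscripts are notation introduced here for illustration). Assuming c > 0 and both denominators positive, cross-multiplying gives:

```latex
\begin{aligned}
\frac{K_1\, c}{A_1 + B_1 c} > \frac{K_2\, c}{A_2 + B_2 c}
  &\iff K_1 (A_2 + B_2 c) > K_2 (A_1 + B_1 c) \\
  &\iff (K_1 A_2 - K_2 A_1) + (K_1 B_2 - K_2 B_1)\, c > 0 .
\end{aligned}
```

The last line is affine in c, so it holds on a whole interval of c as soon as it holds at both endpoints; this is where two-sided rational bounds on the atom can be used.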

source

def WaldonDegen2021.l0Q (utts : List (List Word)) (scene : Referent → Bool) (ctx : List Word) (u : Word) (r : Referent) :

ℚ

L0 policy value: scene-gated continuous meaning normalized over referents (all rational).

Equations

One or more equations did not get rendered due to their size.

Instances For

source

def WaldonDegen2021.s1BaseQ (utts : List (List Word)) (scene : Referent → Bool) (tgt : Referent) (ctx : List Word) (u : Word) :

ℚ

Informativity factor of the S1 score (α = 7).

Equations

WaldonDegen2021.s1BaseQ utts scene tgt ctx u = WaldonDegen2021.l0Q utts scene ctx u tgt ^ 7

Instances For

source

def WaldonDegen2021.costExp :

Word → ℕ

Cost exponent: one cAtom factor per adjective (C = 1/10, α = 7).

Equations

WaldonDegen2021.costExp WaldonDegen2021.Word.pin = 0
WaldonDegen2021.costExp WaldonDegen2021.Word.stop = 0
WaldonDegen2021.costExp x✝ = 1

Instances For

source

noncomputable def WaldonDegen2021.cAtom :

ℝ

The per-adjective cost factor exp(−α·C) = exp(−7/10).

Equations

WaldonDegen2021.cAtom = RSA.expAtom (7 / 10)

Instances For
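The reason a single atom suffices is that costs add in the exponent: with α = 7 and C = 1/10 per adjective, each adjective contributes one factor exp(−7/10). A minimal check using Mathlib's `Real.exp_add`, written against `Real.exp` directly rather than the library's RSA.expAtom:

```lean
import Mathlib

-- α · C for a single adjective.
example : (7 : ℚ) * (1 / 10) = 7 / 10 := by norm_num

-- Two adjectives: the cost factor is the square of the one-adjective factor.
example :
    Real.exp (-(7 / 10 : ℝ) + -(7 / 10 : ℝ)) =
      Real.exp (-(7 / 10 : ℝ)) * Real.exp (-(7 / 10 : ℝ)) :=
  Real.exp_add _ _
```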

source

theorem WaldonDegen2021.cAtom_pos :

0 < cAtom

source

theorem WaldonDegen2021.cAtom_bounds :

4965 / 10000 < cAtom ∧ cAtom < 4967 / 10000

Kernel-certified atom bounds via RSA.lt_expAtom/expAtom_lt at n = 10: (4965/10000)¹⁰·e⁷ < 1 < (4967/10000)¹⁰·e⁷.

source

noncomputable def WaldonDegen2021.s1PMF (utts : List (List Word)) (scene : Referent → Bool) (tgt : Referent) (ctx : List Word) :

PMF Word

Incremental CI-RSA speaker at context ctx (S1 ∝ L0⁷·exp(−7·C)), made total via a dependent if-then-else (dite).

Equations

One or more equations did not get rendered due to their size.

Instances For
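The proportionality S1 ∝ L0⁷·exp(−7·C) can be sketched as an unnormalized score followed by normalization. `s1ScoreSketch` and `normalizeSketch` are hypothetical names; the real s1PMF is a PMF over Word and is not reproduced here.

```lean
import Mathlib

/-- Hypothetical sketch: unnormalized S1 score for a word with L0 value `l0`
and cost exponent `k` (0 for pin and stop, 1 for an adjective). -/
noncomputable def s1ScoreSketch (l0 : ℚ) (k : ℕ) : ℝ :=
  (l0 : ℝ) ^ 7 * Real.exp (-(7 / 10 : ℝ)) ^ k

/-- Hypothetical sketch: divide one score by the sum over all candidate words. -/
noncomputable def normalizeSketch (scores : List ℝ) (s : ℝ) : ℝ :=
  s / scores.sum
```

Only the exponential factor is irrational, which matches the split between the exact-ℚ informativity factor and the cost atom.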

Scene-Filter Cardinality #

source

theorem WaldonDegen2021.ss_eng_has_7_utts :

(sceneFilter allUttsEng ssScene).length = 7

source

theorem WaldonDegen2021.cs_eng_has_7_utts :

(sceneFilter allUttsEng csScene).length = 7

source

theorem WaldonDegen2021.ss_spn_has_7_utts :

(sceneFilter allUttsSpn ssScene).length = 7

source

theorem WaldonDegen2021.cs_spn_has_7_utts :

(sceneFilter allUttsSpn csScene).length = 7

Predictions #

source

Prediction 1 (English color/size asymmetry): redundant color in the size-sufficient scene beats redundant size in the color-sufficient scene, because v^color > v^size makes color words more informative.

source

Prediction 2 (cross-linguistic): English prenominal order produces more redundant color than Spanish postnominal order.

source

Prediction 3 (novel, Spanish flip): postnominally, redundant size in CS exceeds redundant color in SS — the early noun anchors the extension sets differently.

Semantic Properties #

source

theorem WaldonDegen2021.color_more_reliable_than_size :

semanticValueQ Word.blue > semanticValueQ Word.big ∧ semanticValueQ Word.red > semanticValueQ Word.small

Color adjectives have higher reliability than size adjectives. This asymmetry drives the redundant modification predictions.

source

theorem WaldonDegen2021.semantic_values_positive (w : Word) :

semanticValueQ w > 0

All semantic values are positive (required for valid probability).

Noise Theory Connection + Substrate Bridge #

source

theorem WaldonDegen2021.lexContinuous_as_noiseChannel (r : Referent) (w : Word) :

lexContinuousQ r w = RSA.Noise.noiseChannel (semanticValueQ w) (1 - semanticValueQ w) (if wordApplies w r = true then 1 else 0)

lexContinuousQ is an instance of the unified noise channel from RSA.Core.Noise. The continuous lexical semantics L^C(r, i) is exactly the noise channel with onMatch = v^i, onMismatch = 1 - v^i, b = 1 if item i is true of referent r, 0 otherwise.

This connects [WD21] to the [DHG+20] parameterization where mismatch = 1 - match.
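One channel that is consistent with the statement above is a linear mix of the match and mismatch values. The definition below is an assumption about the shape of RSA.Noise.noiseChannel, not a copy of it, and `noiseChannelSketch` is a hypothetical name.

```lean
import Mathlib

/-- Assumed form of the noise channel: interpolate between `onMismatch`
(at b = 0) and `onMatch` (at b = 1). -/
def noiseChannelSketch (onMatch onMismatch b : ℚ) : ℚ :=
  onMatch * b + onMismatch * (1 - b)

-- b = 1 (word true of referent) returns the match value, v^i.
example (m n : ℚ) : noiseChannelSketch m n 1 = m := by
  simp [noiseChannelSketch]

-- b = 0 (word false of referent) returns the mismatch value, 1 − v^i.
example (m n : ℚ) : noiseChannelSketch m n 0 = n := by
  simp [noiseChannelSketch]
```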

source

def WaldonDegen2021.noisyLex :

RSA.NoisyLex Word Referent

lexContinuousQ packaged as an RSA.NoisyLex bundle. The bundle is the substrate this study and [SW23] share: each provides its own lex and reliability parameters, while the PoE prefix-product machinery (RSA.prefixMeaning and friends) is reused.

Equations

WaldonDegen2021.noisyLex = { lex := fun (w : WaldonDegen2021.Word) (r : WaldonDegen2021.Referent) => WaldonDegen2021.lexContinuousQ r w, lex_nonneg := WaldonDegen2021.noisyLex._proof_1 }

Instances For

source

theorem WaldonDegen2021.uttContinuousQ_eq_prefixMeaning (r : Referent) (u : List Word) :

uttContinuousQ r u = noisyLex.prefixMeaning u r

uttContinuousQ is the NoisyLex.prefixMeaning of the bundled lex (modulo argument order). Substrate-bridge analogue of S&W's prefix_meaning_product for the W&D extension-averaging context.

Uses the polymorphic RSA.prefixMeaning_eq_foldl_mul from Sequential.lean — no need for a study-local foldl helper.

Prediction 4: Overall Cross-Linguistic Redundancy #

source

theorem WaldonDegen2021.prediction4_overall_redundancy :

Prediction 4 (overall cross-linguistic redundancy): summed across scenes, English redundant modification exceeds Spanish.
