Linglib.Studies.WaldonDegen2021

[WD21] — Continuous-Incremental RSA (CI-RSA) #

[CGGP19] [DHG+20]

Waldon, B. & Degen, J. (2021). Modeling cross-linguistic production of referring expressions. Proceedings of the Society for Computation in Linguistics (SCiL) 4, 206–215.

The Model #

CI-RSA synthesizes two RSA extensions:

  1. Incremental RSA ([CGGP19]): Word-by-word production via the chain rule S1(u|r) = ∏ₖ S1(wₖ | [w₁,...,wₖ₋₁], r)
  2. Continuous semantics ([DHG+20]): Noisy adjective reliability, L^C(r, i) = v^i if word i is true of r, else 1 - v^i, where v^i denotes the reliability of word i (a superscript index, not a power)

The incremental meaning function averages continuous semantics over grammatical completions of the current prefix:

X^C(c, i, r) = Σ_{u ⊒ c+i} ⟦u⟧^C(r) / |{u : u ⊒ c+i}|

The utterance set is scene-filtered: only utterances Boolean-true of at least one scene member are included (Figure 1).

Formalization #

This builds on the incremental word-by-word chain (following [CGGP19]), adding the continuous semantics and the scene-filtered utterance set described above.

Predictions 1 to 3 are trajectory probability comparisons across different (language × scene) configurations of the same chain; prediction 4 compares summed trajectory probabilities across the two languages.

Predictions #

| # | Prediction | Status |
|---|------------|--------|
| 1 | English color/size asymmetry: SS > CS | prediction1_english_asymmetry |
| 2 | Cross-linguistic: English SS > Spanish SS | prediction2_cross_linguistic |
| 3 | Spanish flip: CS > SS for redundant size | prediction3_spanish_flip |
| 4 | Overall: English total > Spanish total | prediction4_overall_redundancy |

Connections #

Domain Types #

Words available to the incremental speaker: two color adjectives, two size adjectives, a noun ("pin"), and an explicit stop token. The stop token models the speaker's choice to end the utterance; without it, postnominal word orders lack a way to represent the stopping decision after the noun (cf. English where "pin" naturally terminates utterances).


      Referents in the 2×2 reference game: big/small × blue/red.
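As an illustration, the two domain types and a Boolean applicability check might be declared along the following lines. This is a sketch rather than the module's source: the constructor names are inferred from identifiers quoted in the docstrings (`.stop`, `.small`, `pin`, `blue`, `small_blue`, and so on), while the `TypesSketch` namespace and the helpers `isBig`, `isBlue`, and `applies` are hypothetical.

```lean
namespace TypesSketch

/-- Words available to the incremental speaker (constructor names assumed). -/
inductive Word where
  | big | small | blue | red | pin | stop
  deriving DecidableEq, Repr

/-- Referents of the 2×2 reference game (constructor names assumed). -/
inductive Referent where
  | big_blue | big_red | small_blue | small_red
  deriving DecidableEq, Repr

/-- Hypothetical feature test: is the referent big? -/
def isBig : Referent → Bool
  | .big_blue | .big_red => true
  | _ => false

/-- Hypothetical feature test: is the referent blue? -/
def isBlue : Referent → Bool
  | .big_blue | .small_blue => true
  | _ => false

/-- Boolean applicability; the noun and the stop token are assumed true of every referent. -/
def applies (w : Word) (r : Referent) : Bool :=
  match w with
  | .big => isBig r
  | .small => !isBig r
  | .blue => isBlue r
  | .red => !isBlue r
  | .pin | .stop => true

#eval applies .small .small_blue  -- true
#eval applies .red .small_blue    -- false

end TypesSketch
```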


          Boolean Semantics #

          Continuous Semantics #

          Continuous lexical interpretation L^C(r, i). Returns v^i if true, (1 - v^i) if false.

            def WaldonDegen2021.uttContinuousQ (r : Referent) (u : List Word) :
            ℚ

            Continuous utterance meaning ⟦u⟧^C(r) = ∏_{w ∈ u} L^C(r, w).
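The lexical rule and the product over words can be sketched as follows. This is a floating-point approximation for illustration only: the module works in exact ℚ, and the reliability values 0.95 (color) and 0.80 (size) are placeholders chosen to respect the documented ordering, not values taken from the source. The names `reliability`, `lexC`, `uttC`, and the `LexSketch` namespace are hypothetical.

```lean
namespace LexSketch

inductive Word where
  | big | small | blue | red | pin | stop
  deriving DecidableEq, Repr

inductive Referent where
  | big_blue | big_red | small_blue | small_red
  deriving DecidableEq, Repr

def isBig : Referent → Bool
  | .big_blue | .big_red => true
  | _ => false

def isBlue : Referent → Bool
  | .big_blue | .small_blue => true
  | _ => false

def applies (w : Word) (r : Referent) : Bool :=
  match w with
  | .big => isBig r
  | .small => !isBig r
  | .blue => isBlue r
  | .red => !isBlue r
  | .pin | .stop => true

/-- Placeholder reliabilities v^i (assumed values; color > size). -/
def reliability : Word → Float
  | .blue | .red => 0.95
  | .big | .small => 0.80
  | .pin | .stop => 1.0

/-- L^C(r, i): v^i if word i is true of r, otherwise 1 - v^i. -/
def lexC (r : Referent) (w : Word) : Float :=
  if applies w r then reliability w else 1.0 - reliability w

/-- ⟦u⟧^C(r): product of the lexical values of the words in u. -/
def uttC (r : Referent) (u : List Word) : Float :=
  u.foldl (fun acc w => acc * lexC r w) 1.0

-- 0.80 * 0.95, roughly 0.76 with the placeholder values
#eval uttC .small_blue [.small, .blue, .pin, .stop]

end LexSketch
```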


              Utterances (Scene-Filtered) #

              def WaldonDegen2021.uttBoolTrue (u : List Word) (r : Referent) :
              Bool

              Boolean utterance truth: conjunction of word applicability.


                All grammatical English (prenominal) utterances, each terminated by .stop. In English the noun always comes last before stop, so "pin" naturally precedes the stopping decision.


                  All grammatical Spanish (postnominal) utterances, each terminated by .stop. The stop token is critical here: after [pin, blue], S1 chooses between .stop (ending the 2-word non-redundant utterance) and .small (continuing to the 3-word redundant utterance). Without .stop, the model would force continuation whenever a valid extension exists.

                    def WaldonDegen2021.sceneFilter (utts : List (List Word)) (scene : Referent → Bool) :
                    List (List Word)

                    Scene-filtered utterances: only those Boolean-true of at least one scene member (Figure 1). This yields 7 utterances per scene.
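A minimal sketch of the filter, under stated assumptions: the English utterance list below (four one-adjective and four size-then-color utterances) is a guess that happens to reproduce the documented count of 7, since the module's actual list is not rendered on this page. The `FilterSketch` namespace, `englishUtts`, `allReferents`, and `sizeSufficient` are hypothetical names.

```lean
namespace FilterSketch

inductive Word where
  | big | small | blue | red | pin | stop
  deriving DecidableEq, Repr

inductive Referent where
  | big_blue | big_red | small_blue | small_red
  deriving DecidableEq, Repr

def allReferents : List Referent := [.big_blue, .big_red, .small_blue, .small_red]

def isBig : Referent → Bool
  | .big_blue | .big_red => true
  | _ => false

def isBlue : Referent → Bool
  | .big_blue | .small_blue => true
  | _ => false

def applies (w : Word) (r : Referent) : Bool :=
  match w with
  | .big => isBig r
  | .small => !isBig r
  | .blue => isBlue r
  | .red => !isBlue r
  | .pin | .stop => true

/-- Boolean utterance truth: every word applies to the referent. -/
def uttBoolTrue (u : List Word) (r : Referent) : Bool :=
  u.all (fun w => applies w r)

/-- Keep utterances that are Boolean-true of at least one scene member. -/
def sceneFilter (utts : List (List Word)) (scene : Referent → Bool) : List (List Word) :=
  utts.filter (fun u => allReferents.any (fun r => scene r && uttBoolTrue u r))

/-- Assumed prenominal utterance set (not taken from the source). -/
def englishUtts : List (List Word) :=
  [ [.big, .pin, .stop], [.small, .pin, .stop],
    [.blue, .pin, .stop], [.red, .pin, .stop],
    [.big, .blue, .pin, .stop], [.big, .red, .pin, .stop],
    [.small, .blue, .pin, .stop], [.small, .red, .pin, .stop] ]

/-- Size-sufficient scene: {big_blue, big_red, small_blue}. -/
def sizeSufficient : Referent → Bool
  | .small_red => false
  | _ => true

-- Only "small red pin" is true of no scene member, so 7 utterances remain.
#eval (sceneFilter englishUtts sizeSufficient).length  -- 7

end FilterSketch
```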


                      Production Cost #

                      Per-word production cost (Section 4): each adjective incurs cost 0.1. Pin and stop have zero cost (noun and utterance boundary).


                        Extension-Based Continuous Meaning #

                        def WaldonDegen2021.continuousMeaningQ (utts : List (List Word)) (scene : Referent → Bool) (pfx : List Word) (r : Referent) :
                        ℚ

                        Incremental continuous meaning: average continuous semantics over all grammatical completions of prefix.

                        X^C(c, i, r) = Σ_{u ⊒ c+i} ⟦u⟧^C(r) / |{u : u ⊒ c+i}|
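The averaging over completions can be sketched as below, where `pfx` plays the role of c+i. The sketch omits the `scene` argument and assumes the utterance list passed in has already been scene-filtered; it uses `Float` in place of ℚ, and the reliability values and utterance list are placeholders rather than the module's own. All names in the `ExtensionSketch` namespace are hypothetical.

```lean
namespace ExtensionSketch

inductive Word where
  | big | small | blue | red | pin | stop
  deriving DecidableEq, Repr

inductive Referent where
  | big_blue | big_red | small_blue | small_red
  deriving DecidableEq, Repr

def isBig : Referent → Bool
  | .big_blue | .big_red => true
  | _ => false

def isBlue : Referent → Bool
  | .big_blue | .small_blue => true
  | _ => false

def applies (w : Word) (r : Referent) : Bool :=
  match w with
  | .big => isBig r
  | .small => !isBig r
  | .blue => isBlue r
  | .red => !isBlue r
  | .pin | .stop => true

/-- Placeholder reliabilities (assumed values). -/
def reliability : Word → Float
  | .blue | .red => 0.95
  | .big | .small => 0.80
  | .pin | .stop => 1.0

def lexC (r : Referent) (w : Word) : Float :=
  if applies w r then reliability w else 1.0 - reliability w

def uttC (r : Referent) (u : List Word) : Float :=
  u.foldl (fun acc w => acc * lexC r w) 1.0

/-- X^C: mean of ⟦u⟧^C(r) over all utterances u that extend the prefix. -/
def continuousMeaning (utts : List (List Word)) (pfx : List Word) (r : Referent) : Float :=
  let exts := utts.filter (fun u => pfx.isPrefixOf u)
  if exts.isEmpty then 0.0
  else exts.foldl (fun acc u => acc + uttC r u) 0.0 / exts.length.toFloat

/-- Assumed utterance list for illustration. -/
def utts : List (List Word) :=
  [ [.small, .pin, .stop], [.small, .blue, .pin, .stop], [.small, .red, .pin, .stop] ]

-- (0.80 + 0.76 + 0.04) / 3, roughly 0.533 with the placeholder values
#eval continuousMeaning utts [.small] .small_blue

end ExtensionSketch
```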


                          Scenes #

                          Size-sufficient scene: {big_blue, big_red, small_blue}. Target small_blue is uniquely identified by size alone.


                            Color-sufficient scene: {small_red, big_red, small_blue}. Target small_blue is uniquely identified by color alone.


                              Exact-ℚ face and the cost atom #

                              With α = 7 the informativity factor L0^α is exact in ℚ; the only transcendental ingredient is the per-adjective cost factor cAtom = RSA.expAtom (7/10), which is bounded on both sides via the substrate certificates and kernel arithmetic on e-bounds. Every prediction trajectory reduces to K · cAtom / (A + B · cAtom) with kernel-certified rational constants. Cross-multiplying two such expressions by their positive denominators and cancelling the common factor cAtom leaves an inequality that is linear in the atom, so the trajectory comparisons are linear (the sum comparison is quadratic) in cAtom.

                              def WaldonDegen2021.l0Q (utts : List (List Word)) (scene : Referent → Bool) (ctx : List Word) (u : Word) (r : Referent) :
                              ℚ

                              L0 policy value: scene-gated continuous meaning normalized over referents (all rational).

                                def WaldonDegen2021.s1BaseQ (utts : List (List Word)) (scene : Referent → Bool) (tgt : Referent) (ctx : List Word) (u : Word) :
                                ℚ

                                Informativity factor of the S1 score (α = 7).


                                  Cost exponent: one cAtom factor per adjective (C = 1/10, α = 7).

                                    noncomputable def WaldonDegen2021.cAtom :
                                    ℝ

                                    The per-adjective cost factor exp(−α·C) = exp(−7/10).

                                      theorem WaldonDegen2021.cAtom_bounds :
                                      4965 / 10000 < cAtom ∧ cAtom < 4967 / 10000

                                      Kernel-certified atom bounds via RSA.lt_expAtom/expAtom_lt at n = 10: (4965/10000)¹⁰·e⁷ < 1 < (4967/10000)¹⁰·e⁷.
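As a quick numerical sanity check of these bounds (not a substitute for the kernel-certified proof), the inequalities can be evaluated in floating point. The name `cAtomApprox` and the `BoundsSketch` namespace are hypothetical; only core `Float` functions are used.

```lean
namespace BoundsSketch

/-- Floating-point stand-in for exp(-7/10). -/
def cAtomApprox : Float := Float.exp (-0.7)

-- Prints a value close to 0.49659.
#eval cAtomApprox

-- The two-sided bound stated by cAtom_bounds.
#eval decide (0.4965 < cAtomApprox) && decide (cAtomApprox < 0.4967)  -- true

-- The n = 10 form: (4965/10000)^10 * e^7 < 1 < (4967/10000)^10 * e^7.
#eval decide (Float.pow 0.4965 10.0 * Float.exp 7.0 < 1.0)  -- true
#eval decide (1.0 < Float.pow 0.4967 10.0 * Float.exp 7.0)  -- true

end BoundsSketch
```

Raising to the tenth power removes the fractional exponent: cAtom^10 = exp(−7), so each bound becomes a statement about a rational number times e⁷.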

                                      noncomputable def WaldonDegen2021.s1PMF (utts : List (List Word)) (scene : Referent → Bool) (tgt : Referent) (ctx : List Word) :
                                      PMF Word

                                      Incremental CI-RSA speaker at context ctx (S1 ∝ L0⁷·exp(−7·C)), dite-total.
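One step of the speaker can be sketched numerically as follows: score each candidate next word by L0⁷ · exp(−7·C) and normalise over words. This is an approximation under stated assumptions, not the module's `PMF` construction: it uses `Float` instead of ℚ and ℝ, placeholder reliabilities, an assumed seven-utterance list for the size-sufficient scene, and hypothetical names throughout the `SpeakerSketch` namespace.

```lean
namespace SpeakerSketch

inductive Word where
  | big | small | blue | red | pin | stop
  deriving DecidableEq, Repr

inductive Referent where
  | big_blue | big_red | small_blue | small_red
  deriving DecidableEq, Repr

def allWords : List Word := [.big, .small, .blue, .red, .pin, .stop]
def allReferents : List Referent := [.big_blue, .big_red, .small_blue, .small_red]

def isBig : Referent → Bool
  | .big_blue | .big_red => true
  | _ => false

def isBlue : Referent → Bool
  | .big_blue | .small_blue => true
  | _ => false

def applies (w : Word) (r : Referent) : Bool :=
  match w with
  | .big => isBig r
  | .small => !isBig r
  | .blue => isBlue r
  | .red => !isBlue r
  | .pin | .stop => true

/-- Placeholder reliabilities (assumed values). -/
def reliability : Word → Float
  | .blue | .red => 0.95
  | .big | .small => 0.80
  | .pin | .stop => 1.0

def lexC (r : Referent) (w : Word) : Float :=
  if applies w r then reliability w else 1.0 - reliability w

def uttC (r : Referent) (u : List Word) : Float :=
  u.foldl (fun acc w => acc * lexC r w) 1.0

/-- Extension-averaged meaning of a prefix. -/
def meaning (utts : List (List Word)) (pfx : List Word) (r : Referent) : Float :=
  let exts := utts.filter (fun u => pfx.isPrefixOf u)
  if exts.isEmpty then 0.0
  else exts.foldl (fun acc u => acc + uttC r u) 0.0 / exts.length.toFloat

/-- L0: meaning of ctx ++ [w], normalised over the referents in the scene. -/
def l0 (utts : List (List Word)) (scene : Referent → Bool)
    (ctx : List Word) (w : Word) (r : Referent) : Float :=
  let z := (allReferents.filter scene).foldl
    (fun acc r' => acc + meaning utts (ctx ++ [w]) r') 0.0
  if scene r then
    if z > 0.0 then meaning utts (ctx ++ [w]) r / z else 0.0
  else 0.0

/-- Per-word cost: 0.1 per adjective, 0 for the noun and the stop token. -/
def wordCost : Word → Float
  | .big | .small | .blue | .red => 0.1
  | .pin | .stop => 0.0

/-- Unnormalised score L0^7 * exp(-7 * C). -/
def s1Score (utts : List (List Word)) (scene : Referent → Bool)
    (tgt : Referent) (ctx : List Word) (w : Word) : Float :=
  Float.pow (l0 utts scene ctx w tgt) 7.0 * Float.exp (-7.0 * wordCost w)

/-- Normalised next-word distribution; left unnormalised if every score is zero. -/
def s1 (utts : List (List Word)) (scene : Referent → Bool)
    (tgt : Referent) (ctx : List Word) : List (Word × Float) :=
  let scores := allWords.map (fun w => (w, s1Score utts scene tgt ctx w))
  let total := scores.foldl (fun acc p => acc + p.2) 0.0
  if total > 0.0 then scores.map (fun p => (p.1, p.2 / total)) else scores

/-- Size-sufficient scene and an assumed scene-filtered English utterance list. -/
def sizeSufficient : Referent → Bool
  | .small_red => false
  | _ => true

def ssUtts : List (List Word) :=
  [ [.big, .pin, .stop], [.small, .pin, .stop], [.blue, .pin, .stop], [.red, .pin, .stop],
    [.big, .blue, .pin, .stop], [.big, .red, .pin, .stop], [.small, .blue, .pin, .stop] ]

-- First-word distribution for target small_blue in the size-sufficient scene.
#eval s1 ssUtts sizeSufficient .small_blue []

end SpeakerSketch
```

Words with no grammatical completion after the current context receive meaning 0 and therefore probability 0, which is why `.pin` and `.stop` drop out at the empty context in this sketch.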


                                        Scene-Filter Cardinality #

                                        Predictions #

                                        Semantic Properties #

                                        Color adjectives have higher reliability than size adjectives. This asymmetry drives the redundant modification predictions.

                                        All semantic values are positive (required for valid probability).

                                        Noise Theory Connection + Substrate Bridge #

                                        lexContinuousQ is an instance of the unified noise channel from RSA.Core.Noise. The continuous lexical semantics L^C(r, i) is exactly the noise channel with onMatch = v^i, onMismatch = 1 - v^i, b = 1 if item i is true of referent r, 0 otherwise.
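The correspondence can be illustrated with a small sketch. The interpolation form of `noiseChannel` below is an assumption made for illustration; the actual definition in RSA.Core.Noise is not shown on this page. The `NoiseSketch` namespace and `lexViaChannel` are hypothetical names.

```lean
namespace NoiseSketch

/-- Assumed channel shape: mix the on-match and on-mismatch values by b ∈ [0, 1]. -/
def noiseChannel (onMatch onMismatch b : Float) : Float :=
  onMatch * b + onMismatch * (1.0 - b)

/-- L^C as a channel instance: onMatch = v, onMismatch = 1 - v, b = 1 iff the word is true. -/
def lexViaChannel (v : Float) (isTrue : Bool) : Float :=
  noiseChannel v (1.0 - v) (if isTrue then 1.0 else 0.0)

#eval lexViaChannel 0.8 true   -- match: recovers v, roughly 0.8
#eval lexViaChannel 0.8 false  -- mismatch: recovers 1 - v, roughly 0.2

end NoiseSketch
```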

                                        This connects [WD21] to the [DHG+20] parameterization where mismatch = 1 - match.

                                        lexContinuousQ packaged as an RSA.NoisyLex bundle. The bundle is the substrate this study and [SW23] share: each provides its own lex and reliability parameters, while the PoE prefix-product machinery (RSA.prefixMeaning and friends) is reused.


                                          uttContinuousQ is the NoisyLex.prefixMeaning of the bundled lex (modulo argument order). Substrate-bridge analogue of S&W's prefix_meaning_product for the W&D extension-averaging context.

                                          Uses the polymorphic RSA.prefixMeaning_eq_foldl_mul from Sequential.lean — no need for a study-local foldl helper.

                                          Prediction 4: Overall Cross-Linguistic Redundancy #