Documentation

Linglib.Pragmatics.RSA.Compositional

World states for the embedded SI scenario. Each student either read "some but not all" (S) or "all" (A).

Instances For
    @[implicit_reducible]
    Equations
    @[implicit_reducible]
    Equations
    • One or more equations did not get rendered due to their size.
    Equations
    • One or more equations did not get rendered due to their size.
    Instances For

      Truth conditions under global EXH (EXH > ∀): "every student read some" ∧ ¬"every student read all"

      True at: SS, SA, AS (not AA). The negated alternative is the whole "every student read all", so a world where one student read all still satisfies it.

      Equations
      Instances For
        structure RSA.Compositional.LocalAltNode (World Alt : Type) :

        A node in a compositional derivation with local alternatives.

        • meaning : AltWorldBool

          Meaning of each alternative at each world

        • prejacent : Alt

          The prejacent (what was actually uttered)

        • alternatives : List Alt

          The available alternatives at this node

        Instances For
          def RSA.Compositional.LocalAltNode.toAltSet {World Alt : Type} (node : LocalAltNode World Alt) :
          Set (Set World)

          Convert local alternatives to a set for exhMW

          Equations
          Instances For
            def RSA.Compositional.LocalAltNode.prejacent_prop {World Alt : Type} (node : LocalAltNode World Alt) :
            Set World

            The prejacent proposition

            Equations
            Instances For
              def RSA.Compositional.localExhMW {World Alt : Type} (node : LocalAltNode World Alt) :
              Set World

              Local exhMW: exhaustification using only local alternatives.

              Equations
              Instances For

                LU-RSA: What It Can and Cannot Do #

                [PLLF16] [BLG16]

                The LU Approach #

                LU-RSA models uncertainty over the lexicon:

                The listener marginalizes over lexica: L(w | m) ∝ P(w) × Σ_L P(L) × S₁(m | w, L)

                What LU Can Do #

                For simple cases like "Exactly one player hit some of his shots":

                LU does capture simple embedded implicatures.

                What LU Cannot Do: The Global Lexicon Problem #

                LU applies one lexicon to all occurrences.

                Consider: "Every student read some book and some paper"

                With LU:

                LU cannot express: "every student read some-but-not-all books AND at-least-some papers" (independent exhaustification).

                Scope vs Lexicon #

                For "Every student read some book":

                EXH analysis (scope-sensitive):

                LU analysis (scope-blind):

                For single scalar items, these are observationally equivalent. For multiple scalar items or nested structures, they diverge.

                ApproachWhat variesMultiple scalars?Scope interactions?
                Standard RSANothingNoNo
                LU-RSALexicon globallySame for allNo
                Compositional RSAWhere EXH appliesIndependentYes

                LU treats implicature as a lexical property (of words), while EXH treats it as a structural property (of positions).

                Per-student world

                Instances For
                  @[implicit_reducible]
                  Equations
                  Equations
                  • One or more equations did not get rendered due to their size.
                  Instances For

                    Lexical refinements for "some" (LU approach)

                    Instances For
                      @[implicit_reducible]
                      Equations
                      Equations
                      • One or more equations did not get rendered due to their size.
                      Instances For

                        The Single-Scalar Equivalence (Potts et al.'s Success Case) #

                        For "every student read some book" with a single scalar item, LU with strong lexicon gives the same result as local EXH.

                        LU would give:

                        This matches [PL15], who show strong empirical fits.

                        Where LU Fails: The Principled Derivation Problem #

                        1. No principled lexicon selection: With flat priors, why prefer strong? Potts et al. rely on neo-Gricean constraints on refinement sets.

                        2. Global lexicon: All occurrences of "some" get the same meaning. EXH can apply at different positions independently.

                        3. No scope interaction: LU cannot model EXH interacting with quantifier scope, negation scope, etc.

                        The theorem lu_strong_equals_local below shows the equivalence for single-scalar cases. The limitation emerges with multiple scalars.

                        LU-RSA scenario for "Every student read some book"

                        Instances For

                          LU with weak lexicon allows all worlds (like literal L0)

                          LU with strong lexicon gives local-EXH-like result

                          The Single-Scalar Success #

                          For single-scalar cases, LU achieves the same result as local EXH:

                          The Compositional Question: Where Does Marginalization Happen? #

                          LU approach (Potts et al.):

                          Compositional RSA/EXH approach:

                          The question: Does top-level marginalization over lexica give the same results as node-by-node pragmatic reasoning?

                          How Multiple Uncertainties Compose #

                          In LU, when you have multiple scalar items, each can be refined. The space of lexica is the product of refinement choices.

                          A single lexicon L is used for the whole utterance. The listener reasons about which L the speaker is using globally.

                          The structural question is whether pragmatic reasoning should happen:

                          1. Globally: marginalize over complete lexica at the end (LU)
                          2. Locally: apply RSA at each compositional node, then compose

                          For simple cases these may coincide. The divergence appears when:

                          The architectural difference:

                          • LU: P(L) is a distribution over complete lexica
                          • Compositional: each node has its own local alternatives

                          This affects how multiple sources of uncertainty interact.

                          • lexicon : SomeLexicon

                            A lexicon assigns meanings to all scalar items

                          • globalMarg : Bool

                            Marginalization happens at the top

                          Instances For
                            • localExh : List Bool

                              Each node can have its own EXH choice

                            • localRSA : Bool

                              RSA reasoning at each node

                            Instances For

                              The Structural Difference #

                              What LU gets right:

                              The architectural question:

                              For the cases Potts et al. test (single scalar under quantifier), these approaches are empirically equivalent. The question is whether they diverge for more complex compositional structures.

                              Open question (cf. [FB20b] "Global Intentions"): Does the locus of pragmatic reasoning (global vs. local) matter for predicting human behavior in complex embedded contexts?

                              Compositional RSA #

                              EXH is a structural operator, not a lexical one. Modeling where in the derivation EXH applies requires a compositional approach:

                              1. Build a derivation tree with local alternatives at each node
                              2. Apply RSA/EXH at each node with its local alternatives
                              3. Compose the results via standard semantics

                              For "Every student read some book" #

                              Step 1: At the "some" node (per student)

                              Step 2: Compose with "every"

                              Why This Works #

                              The [Fra11] limit theorem is parametric in the alternative set: IBR(alternatives) → exhMW(alternatives)

                              Instantiating with local alternatives at each node yields local exhMW at each node. Composition gives the local reading.

                              Inner alternatives: some vs all

                              Instances For
                                @[implicit_reducible]
                                Equations
                                Equations
                                • One or more equations did not get rendered due to their size.
                                Instances For

                                  Local node for a single student's reading behavior

                                  Equations
                                  • One or more equations did not get rendered due to their size.
                                  Instances For

                                    Composed local interpretation matches localExhMeaning

                                    Local reading is strictly stronger than global

                                    LU vs Compositional RSA #

                                    LU-RSA (Potts, Lassiter, Levy, Frank):

                                    Compositional RSA:

                                    The Expressivity Hierarchy #

                                    Standard RSA ⊂ LU-RSA ≈ Standard RSA ⊂ Compositional RSA ≈ EXH
                                         ↓ ↓ ↓
                                     scope-blind scope-blind scope-sensitive
                                         ↓ ↓ ↓
                                     global only global only global AND local
                                    

                                    LU-RSA and standard RSA have the same structural expressivity. LU adds uncertainty over word meanings, not scope.

                                    theorem RSA.Compositional.lu_is_scope_blind :
                                    (∀ (w : EmbeddedSIWorld), luMeaning { lexicon := SomeLexicon.weak, world := w } = true) (∀ (w : EmbeddedSIWorld), luMeaning { lexicon := SomeLexicon.strong, world := w } = localExhMeaning w) True

                                    LU-RSA is scope-blind: it cannot distinguish global from local based on structural scope position.

                                    Proof: LU's two lexicons (weak, strong) correspond to (no SI, local-like SI), but the choice is lexical not structural. There is no principled way to derive the local reading.

                                    The expressivity gain: Compositional RSA can express local readings that LU-RSA cannot principally derive.

                                    Why Standard RSA Cannot Derive Local Readings #

                                    No sentence-level alternative can distinguish SS from SA using global worlds. This limitation motivates either:

                                    1. Stipulating EXH as a grammatical primitive (the standard view)
                                    2. Developing "local RSA" that operates inside composition

                                    The Setup #

                                    For "every student read some book":

                                    The Problem #

                                    RSA excludes a world w when hearing utterance u if: ∃ u' ∈ Alt(u). ⟦u'⟧(w) ∧ informativity(u') > informativity(u)

                                    To exclude SA when hearing "every...some", we'd need an alternative u' where: ⟦u'⟧(SA) = true AND ⟦u'⟧(SS) = false (or u' more informative)

                                    No such sentence exists in the nested Aristotelians.

                                    Consequences #

                                    If RSA cannot distinguish SS from SA at the sentence level, then either:

                                    1. EXH is a separate grammatical mechanism (stipulation)
                                    2. RSA must operate sub-sententially with "local" alternatives

                                    Option 2 is "RSA all the way down" -- deriving EXH-like behavior from RSA applied at each compositional node.

                                    @[implicit_reducible]
                                    Equations
                                    Equations
                                    • One or more equations did not get rendered due to their size.
                                    Instances For

                                      Key observation: SS and SA are indistinguishable by sentence-level meaning.

                                      For every nested Aristotelian sentence, if it is true at SS, it is also true at SA (or vice versa in a way that does not help RSA exclude SA).

                                      SA "dominates" SS: if some students did X in SS, then some students did X in SA (with potentially more).

                                      Sentences true at SA but not SS don't help RSA exclude SA.

                                      "some...all" is true at SA and false at SS. But this does not help: RSA excludes worlds where stronger alternatives are true. "some...all" is weaker than "some...some" for the inner quantifier, so it does not trigger exclusion.

                                      The only sentence that could help exclude SA would be one that:

                                      1. Is true at SS
                                      2. Is false at SA
                                      3. Is stronger than "some...some"

                                      No such sentence exists.

                                      The RSA exclusion principle: RSA excludes world w when hearing u if there exists a stronger alternative u' true at w.

                                      For "some...some" (SS sentence), the only stronger alternative is "all...all". "all...all" is false at both SS and SA. Therefore, RSA cannot exclude SA when hearing "some...some".

                                      The core theorem: Standard RSA with sentence-level alternatives cannot derive the local reading {SS}.

                                      Proof: RSA's posterior after hearing "some...some" includes all worlds where "some...some" is true and no strictly stronger alternative is true.

                                      • "some...some" is true at: SS, SA, AS, AA (literally: everyone drank some)
                                      • "all...all" is the strongest alternative, true only at AA
                                      • RSA excludes AA (stronger alternative was available)
                                      • RSA keeps: SS, SA, AS -- this is the global reading, not local

                                      To get the local reading {SS}, we would need to exclude SA and AS. No sentence-level alternative can do this.

                                      The Expressivity Gap #

                                      Standard RSA posterior for "some...some":
                                        {w : ⟦some...some⟧(w) ∧ ¬⟦all...all⟧(w)} = {SS, SA, AS}
                                      
                                      Local EXH reading:
                                        {SS}
                                      
                                      Gap: {SA, AS} -- worlds RSA cannot exclude without sub-sentential reasoning
                                      

                                      This gap is why linguists posit EXH as a grammatical primitive. "Local RSA" -- RSA applied at each compositional node -- can derive the local reading without stipulating EXH. EXH then emerges as the α → ∞ limit of local RSA.

                                      [FB20b]: Global Intentions Model #

                                      Franke & Bergen's "Global Intentions" (GI) model provides machinery for reasoning over where EXH applies.

                                      The GI Architecture #

                                      1. Grammar generates parses: For each utterance, the grammar generates multiple readings based on where EXH is inserted:

                                        • lit: no EXH
                                        • M: EXH at matrix position
                                        • O: EXH at outer quantifier
                                        • I: EXH at inner quantifier
                                        • MO, MI, OI, MOI: combinations
                                      2. Speaker chooses (utterance, parse) jointly:

                                        P_S(m, p | t; α) ∝ [P(t | ⟦m⟧^p)]^α
                                        

                                        The speaker picks both what to say and how to mean it.

                                      3. Listener infers (world, parse) jointly:

                                        P_L(t, p | m; α) ∝ P(t) × P_S(m, p | t; α)
                                        

                                      Finding #

                                      GI outperforms simpler models (vanilla RSA, LU) because it can use strong readings like ⟦SS⟧^M = {██-} that uniquely identify states. However, GI assumes EXH is a grammatical primitive.

                                      The "RSA All The Way Down" Reinterpretation #

                                      Bergen & Franke's "parse" variable can be reinterpreted as "where RSA reasoning applies":

                                      Bergen & Franke: RSA All The Way Down:
                                      ───────────────── ─────────────────────────
                                      parse p ∈ {lit,M,O,I,...} config c ∈ {RSA application sites}
                                      ⟦m⟧^p = EXH at positions ⟦m⟧^c = RSA(α) at positions
                                      Speaker chooses parse Speaker chooses where to be pragmatic
                                      

                                      The Equivalence #

                                      In the α → ∞ limit:

                                      Bergen & Franke's GI model equals "RSA all the way down" in the limit.

                                      The Derivation Direction #

                                      ApproachStarting PointPragmatic Reasoning
                                      Bergen & FrankeEXH is primitiveRSA reasons over EXH positions
                                      RSA All The Way DownRSA is primitiveEXH emerges in α → ∞ limit

                                      The compositional structure is the same. The difference is foundational:

                                      Consequences #

                                      1. EXH is not a separate grammatical mechanism, but emergent behavior of rational communication at high precision.

                                      2. GI's empirical success (Bayes factor analysis) is evidence for the compositional structure, not for EXH-as-primitive.

                                      3. One mechanism (RSA) at different α values, rather than two mechanisms (grammar-EXH + pragmatics-RSA).

                                      Formal Correspondence #

                                      The GI model's joint distribution over (utterance, parse) corresponds to compositional RSA's joint distribution over (utterance, enrichment-sites).

                                      A parse in Bergen & Franke's sense: where EXH is inserted

                                      Instances For
                                        @[implicit_reducible]
                                        Equations
                                        def RSA.Compositional.instReprParse.repr :
                                        ParseStd.Format
                                        Equations
                                        • One or more equations did not get rendered due to their size.
                                        Instances For

                                          An RSA application configuration: where RSA reasoning happens

                                          • matrixRSA : Bool

                                            Apply RSA at matrix level?

                                          • outerRSA : Bool

                                            Apply RSA at outer quantifier?

                                          • innerRSA : Bool

                                            Apply RSA at inner quantifier?

                                          Instances For
                                            def RSA.Compositional.instDecidableEqRSAConfig.decEq (x✝ x✝¹ : RSAConfig) :
                                            Decidable (x✝ = x✝¹)
                                            Equations
                                            • One or more equations did not get rendered due to their size.
                                            Instances For
                                              Equations
                                              • One or more equations did not get rendered due to their size.
                                              Instances For

                                                Convert RSA config to equivalent parse (in the α → ∞ limit)

                                                Equations
                                                Instances For

                                                  The Global Intentions model: speaker chooses (utterance, parse)

                                                  • utterance : Utt
                                                  • parse : Parse
                                                  Instances For

                                                    RSA-All-The-Way-Down: speaker chooses (utterance, config)

                                                    Instances For

                                                      In the α → ∞ limit, RSAATWD choice maps to GI choice

                                                      Equations
                                                      Instances For

                                                        The Limit Theorem (Conceptual) #

                                                        theorem rsaatwd_limit_is_gi :
                                                            ∀ (scenario :...) (m : Utt) (t : World),
                                                              lim_{α→∞} P_RSAATWD(m, c | t; α) = P_GI(m, c.toParse | t)
                                                        

                                                        Bergen & Franke's machinery for reasoning over parses emerges naturally from "RSA at every compositional node" in the limit.

                                                        Implications #

                                                        1. GI's empirical success (they show GI >> LI >> LU >> RSA in Bayes factors) is evidence for compositional pragmatic reasoning, interpretable as:

                                                          • B&F interpretation: Grammar generates EXH, RSA selects among readings
                                                          • Our interpretation: RSA applies compositionally, EXH-like behavior emerges
                                                        2. GI wins because ⟦SS⟧^M uniquely identifies state ██-. This reading requires matrix exhaustification.

                                                          • B&F: Matrix EXH is a grammatical option
                                                          • Us: High-α RSA at the matrix level produces the same strengthening
                                                        3. Parse selection = RSA site selection: B&F's listener inferring the intended parse is equivalent to inferring where the speaker applied pragmatic reasoning.

                                                        Bergen & Franke's key empirical finding: GI model assigns high probability to 'SS' for state ██- because ⟦SS⟧^M = {██-}.

                                                        In RSAATWD terms: with high-α RSA at matrix level, the speaker strongly prefers 'SS' for this state because it's uniquely identifying.

                                                        Algebraic Structure of Compositional RSA #

                                                        Does the joint distribution over (utterance, config) decompose in a principled algebraic way?

                                                        The Product Hypothesis #

                                                        If pragmatic reasoning at different nodes is independent, then:

                                                        P_S(m, c | t; α) ∝ ∏_{node n ∈ c} P_RSA(choice_n | local_state_n; α)
                                                        

                                                        This would mean:

                                                        1. Each node contributes independently to the overall probability
                                                        2. Informativity MULTIPLIES across nodes
                                                        3. The joint inference FACTORS into local inferences

                                                        Informativity Multiplication #

                                                        For a sentence with multiple scalar items, if informativity multiplies:

                                                        informativity(m, c) = ∏_{n ∈ c} informativity_n(choice_n)
                                                        

                                                        Then the softmax over (m, c) decomposes:

                                                        P_S(m, c | t; α) ∝ exp(α × log informativity(m, c))
                                                                         = exp(α × Σ_n log informativity_n(choice_n))
                                                                         = ∏_n exp(α × log informativity_n(choice_n))
                                                                         = ∏_n P_RSA_n(choice_n | local_t_n; α)
                                                        

                                                        This is the algebraic signature of compositional RSA.

                                                        Connection to GradedMonad (RSAFree) #

                                                        The RSAFree graded monad (see GradedMonad.lean) has exactly this structure:

                                                        -- RSAFree is a graded monad where the grade tracks alternatives
                                                        -- Binding (seq) composes RSA computations
                                                        -- Informativity multiplies under seq
                                                        
                                                        seq : RSAFree W A → (A → RSAFree W B) → RSAFree W B
                                                        

                                                        The monad laws ensure:

                                                        1. Associativity: (m >>= f) >>= g = m >>= (λx. f x >>= g)
                                                        2. Identity: return x >>= f = f x

                                                        These correspond to:

                                                        1. Composition is associative: parsing order doesn't matter
                                                        2. Literal meaning is identity: no RSA = base meaning

                                                        The Decomposition Theorem (Conceptual) #

                                                        theorem informativity_multiplicative :
                                                            ∀ (m₁ : RSAFree W A) (m₂ : A → RSAFree W B) (a : A) (b : B),
                                                              informativity (seq m₁ m₂) (a, b) =
                                                              informativity m₁ a × informativity (m₂ a) b
                                                        

                                                        This says: the informativity of a composed utterance (with choices at multiple levels) is the product of the local informativities.

                                                        The joint distribution therefore factors.

                                                        Why Decomposition Matters #

                                                        1. Computational tractability: If the distribution factors, inference is polynomial rather than exponential in the number of nodes.

                                                        2. Theoretical parsimony: Local RSA at each node, composed via standard semantic composition. No special global mechanism.

                                                        3. Empirical predictions: Local RSA predicts independence effects that global reasoning would not.

                                                        4. Connection to grammar: The algebraic structure mirrors compositional semantics. RSA "rides along" with semantic composition.

                                                        The Monoid of Informativities #

                                                        Informativity values form a multiplicative monoid:

                                                        This monoid structure is what makes compositional RSA work:

                                                        Informativity as a value (for algebraic reasoning)

                                                        Instances For

                                                          The multiplicative monoid structure on informativities

                                                          Equations
                                                          • i₁.mul i₂ = { value := i₁.value * i₂.value, pos := , le_one := }
                                                          Instances For
                                                            theorem RSA.Compositional.informativity_value_mul (i₁ i₂ : Informativity) :
                                                            (i₁ * i₂).value = i₁.value * i₂.value

                                                            Informativity values multiply like rationals

                                                            theorem RSA.Compositional.informativity_mul_assoc_value (i₁ i₂ i₃ : Informativity) :
                                                            (i₁ * i₂ * i₃).value = (i₁ * (i₂ * i₃)).value

                                                            Informativity multiplication is associative (on values)

                                                            @[simp]

                                                            Value of unit informativity is 1

                                                            Informativity multiplication has unit (on values)

                                                            The Factorization Picture #

                                                                    Sentence: "Every student read some book"
                                                                                     │
                                                                                ┌────┴────┐
                                                                                │ Matrix │ ← RSA here? (config.matrixRSA)
                                                                                │ EXH? │
                                                                                └────┬────┘
                                                                                     │
                                                                          ┌──────────┴──────────┐
                                                                          │ │
                                                                     ┌────┴────┐ ┌────┴────┐
                                                                     │ "every" │ │ VP │
                                                                     │ student │ │ │
                                                                     └─────────┘ └────┬────┘
                                                                                                │
                                                                                      ┌─────────┴─────────┐
                                                                                      │ │
                                                                                 ┌────┴────┐ ┌────┴────┐
                                                                                 │ "read" │ │ "some" │ ← RSA here?
                                                                                 └─────────┘ │ book │ (config.innerRSA)
                                                                                                     └─────────┘
                                                            
                                                            P_S(m, c | t; α) = P_RSA_matrix(EXH? |...; α) × P_RSA_inner(EXH? |...; α)
                                                            

                                                            The distribution factors along the tree structure. Each node's RSA decision is (conditionally) independent.

                                                            Connection to Bergen & Franke #

                                                            Their finding that GI >> LI >> LU >> RSA can be reinterpreted:

                                                            GI wins because it has the right algebraic structure: the full product over all compositional nodes.

                                                            Summary: The Two Perspectives #

                                                            [FB20b] #

                                                            RSA All The Way Down (This File) #

                                                            The Mathematical Connection #

                                                            The joint inference machinery is identical:

                                                            P_S(m, x | t; α) ∝ [P(t | meaning(m, x))]^α
                                                            P_L(t, x | m; α) ∝ P(t) × P_S(m, x | t; α)
                                                            

                                                            where x = parse (B&F) or x = RSA-config (RSAATWD).

                                                            In the α → ∞ limit, these are the same model, because local RSA → local EXH at each node.

                                                            Bergen & Franke's empirical validation of GI is also validation of compositional RSA, without requiring EXH as a grammatical primitive.

                                                            Open conjectures #

                                                            TODO: Sharp characterization theorem for RSA ≅ EXH (previously stubbed in the deleted Core/Conjectures.lean). Under uniform prior + α → ∞ + depth one + no QUD sensitivity, RSA L1's positive-credence support should coincide with EXH's truth set: (uniform_prior ∧ high_rationality ∧ depth_one ∧ no_qud) → ∀ u w, L1(u,w) > 0 ↔ EXH(u) w. Conjectured to be the precise boundary between notational variants and genuine empirical disagreement of the two frameworks ([FG12], [Fox07], [CFS12]).

                                                            TODO: Phase-bounded exhaustification (3 related conjectures)[Cha14b] treats scope islands as evaluation boundaries and CFS treats Exh as applying at scope positions. Hypothesis: phase boundaries delimit both exhaustification and RSA computation. (a) Local Exh (within a phase) and global Exh agree within the phase domain. (b) S1 optimizes within the current phase; for utterances in the same phase, local and global S1 agree. (c) Alternatives are computed from material within the same phase (cf. [FK11]'s locally- available alternatives).