
Linglib.Phenomena.ScalarImplicatures.Studies.ChemlaSpector2011

@cite{chemla-spector-2011} — Experimental Evidence for Embedded Scalar Implicatures #


Chemla, E. & Spector, B. (2011). Experimental evidence for embedded scalar implicatures. Journal of Semantics, 28(3), 359–400. https://doi.org/10.1093/jos/ffq023

Two threads #

  1. Empirical, contra @cite{geurts-pouscoulous-2009}: using a graded truth-value-judgment paradigm with letter-grid pictures, two experiments show that local readings of embedded some/or under universal quantifiers are detectable (Exp 1), and that local readings under non-monotonic exactly one are detectable as a separate reading logically independent of the literal (Exp 2 — the killer finding against globalist theories).
  2. Methodological: graded judgments on a continuous scale (cursor 0–100%) reveal ambiguities that binary truth-value judgments mask; the §3.2 conjecture is that ratings monotonically reflect the set of available readings true at the picture.

T1/T2/T3 taxonomy (paper §1, page 3) #

The paper carves the globalist/localist debate into three positions:

• T1 (restricted globalist): implicatures are computed only at the matrix level; the mechanism generates {literal, global}.
• T2 (localist): exhaustification can apply in embedded positions; the mechanism also generates local, in all environments.
• T3 (globalist with multi-alternative negation): derives local readings only where they entail the literal, hence only in monotonic environments.

Exp 1 tests T1 vs {T2, T3} (via universal-quantifier embedding). Exp 2 tests {T1, T3} vs T2 (via non-monotonic embedding where the local reading is logically independent of the literal — a reading T3 mechanically cannot derive).

Paper structure (sections mirrored below) #

§1 Theories of scalar implicatures (T1/T2/T3 taxonomy)
§2 Critique of @cite{geurts-pouscoulous-2009}'s methodology
§3 General features of the experimental design (graded judgments)
§4 Experiment 1: scalar items in universal sentences
§5 Experiment 2: scalar items in non-monotonic environments
§6 Conclusions

Empirical data captured #

All numerical values come from Figures 5, 6, 12, 13 and Tables 1–3. Rates are mean cursor positions rounded to integer percentage points; the paper reports them to one decimal (the rounded Exp 1 'some' values are 12% / 44% / 68% / 99%). Page references in 4:N style do not apply (CS11 uses standard pagination).

Statistical-test attribution #

The paper uses Wilcoxon signed-rank tests (per-subject, n=16), Mann-Whitney U tests (per-item), and ANOVA (Block × Condition interactions). Specific W-statistics are not encoded here — same discipline as @cite{geurts-pouscoulous-2009}: load-bearing inequalities are verified at the rate level.

Linglib integration #

Subsequent literature (forward pointers) #

The four reading labels the paper distinguishes: literal, global, local, and wideScopeOr. wideScopeOr arises only in Exp 2 (§5.5.5) for disjunction items; the first three apply to both experiments, though their entailment lattices differ between Exp 1 and Exp 2 (see the Exp 2 docstring).


      A reading is the extensional truth-condition of a sentence under a particular interpretive option, parameterized by the picture type. We use Prop rather than Bool so that Decidable instances derive automatically from Fin n's Fintype (mathlib idiom: define predicates as Prop, get Decidable from Fintype + per-cell DecidableEq, use decide for finite checks). The Implicature W spine defaults to S = Prop, so this also matches the spine bridge.
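A minimal sketch of the idiom, with all names illustrative rather than the file's actual identifiers: readings are Prop-valued predicates over a finite picture type, and decidability comes from Fintype plus per-cell DecidableEq.

```lean
import Mathlib

-- Sketch (names illustrative): readings as Prop, Decidable for free.
inductive Cell | a | b | c
  deriving DecidableEq

abbrev Pic := Fin 6 → Cell
abbrev Reading := Pic → Prop

-- A reading is just a predicate; `abbrev` keeps it unfoldable.
abbrev allA : Reading := fun p => ∀ i, p i = Cell.a

-- Fintype + DecidableEq make this closed finite claim kernel-checkable:
example : allA (fun _ => .a) := by decide
```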


        The three theory families the paper distinguishes (§1, page 3). Exp 1 separates T1 from {T2, T3}; Exp 2 separates T2 from {T1, T3}.


            Theory mechanisms #

            Each theory has a generative mechanism that admits some set of reading labels. The key distinction (paper §1, page 3 + footnote 1):

            Reading labels each theory's mechanism generates in monotonic environments (Exp 1). Both T2 and T3 admit local in this regime because local entails literal.


              Reading labels each theory's mechanism generates in non-monotonic environments (Exp 2). T3 cannot generate local because in non-monotonic environments local does not entail literal — this is the crucial structural difference paper §5 exploits to separate T2 from T3.


                The §3.2 page-10 conjecture: if at picture p₂ strictly more of the sentence's available readings are true than at p₁, then the rating at p₂ is higher than at p₁.

                Stated locally (CS11-internal) over a list of (rating, reading-count) pairs ordered by reading-count, using mathlib's List.Pairwise. Ratings stored as Nat percentages (matching GeurtsPouscoulous2009's discipline) so the property is decide-able. Promotion to shared substrate is deferred until a second graded-TVJ consumer materializes.
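A sketch of this shape (definition and pair-order as described above; the concrete name is assumed), instantiated on the Exp 1 'some' rates from Figure 5:

```lean
import Mathlib

-- Sketch: (rating, reading-count) pairs ordered by reading-count.
-- List.Pairwise R holds when R relates every earlier element to every later one.
def RatingsMonotone (l : List (Nat × Nat)) : Prop :=
  l.Pairwise fun a b => a.2 < b.2 → a.1 < b.1

-- FALSE (0 readings, 12%) < LITERAL (1, 44%) < WEAK (2, 68%) < STRONG (3, 99%):
example : RatingsMonotone [(12, 0), (44, 1), (68, 2), (99, 3)] := by decide
```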


                  §3 Experimental design #

Pictures are letter-grids. Each letter is independently in one of three states with respect to its circles: connected to none (a falsifier), connected to some-but-not-all (a strong verifier), or connected to all (a weak verifier) (paper Appendix 2 / Figure 14, page 35).

The weak/strong verifier terminology tracks the predicate "x is connected with some of its circles" under the literal vs strengthened reading of some:

                  This mapping aligns with SomeAllWorld:
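A sketch of the intended alignment (constructor names assumed for illustration):

```lean
-- Sketch: the three letter-states and their verifier classification.
inductive SomeAllWorld | none | someNotAll | all

inductive VerifierKind | falsifier | strongVerifier | weakVerifier

def kind : SomeAllWorld → VerifierKind
  | .none       => .falsifier       -- "connected with some" false on either reading
  | .someNotAll => .strongVerifier  -- true on the literal AND the strengthened reading
  | .all        => .weakVerifier    -- true on the literal, false on the strengthened
```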


                  A 6-letter picture (Exp 1). Each letter is independently in one of the three SomeAllWorld states with respect to its own set of circles.


                    §4 Experiment 1 #

                    Method (paper §4.1, page 12): 16 native French speakers, ages 19–29 (10 women), no formal-linguistics exposure. Continuous-scale rating task (cursor 0–100%); responses coded as percent of red-line fill.

                    Target sentences:

                    Three readings of (8) (paper (10), page 14):

                    Total order: local ⊊ global ⊊ literal (page 5). Crucial for Exp 1's discriminating logic.

                    Four target conditions (paper §4.2.1 page 14, Table 1 page 36):

                    Reading extensions for Exp 1 sentence (8) #

                    Defined as Prop predicates over Picture6; Decidable instances derive automatically since Fin 6 is Fintype and SomeAllWorld is DecidableEq.


                    Literal (10a): every letter has ≥1 circle connected, i.e. no falsifiers. Uses abbrev so the body unfolds for decide and instance synthesis without explicit unfolds.
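A self-contained sketch of such a definition (type and constructor names assumed):

```lean
import Mathlib

-- Sketch: letter states and a 6-letter picture.
inductive SomeAllWorld | none | someNotAll | all
  deriving DecidableEq

abbrev Picture6 := Fin 6 → SomeAllWorld

-- Literal (10a): no falsifiers, i.e. every letter has ≥1 circle connected.
-- `abbrev` lets the body unfold during `decide` and instance synthesis.
abbrev literal (p : Picture6) : Prop := ∀ i, p i ≠ SomeAllWorld.none

example : literal (fun _ => .all) := by decide
```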


                      Global (10b): literal AND there exists a letter that's not a weak verifier (i.e., not connected with all its circles).


                        The four target conditions for Exp 1 (paper §4.2.1 page 14).


                            The set of readings true at the witness picture for each Exp 1 condition. Exp 1's entailment lattice is a chain (local ⊊ global ⊊ literal), so the truth-sets are nested.


                              Per CS11's §3.2 conjecture, the rating for a condition reflects the intersection of true readings with the readings the theory's mechanism generates. For Exp 1 (universal embedding, monotonic), all theories use Theory.generatesMonotonic.
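The intersection step can be sketched as follows (names and list encoding illustrative, not the file's actual definitions):

```lean
-- Sketch: predicted reading-set = readings true at the picture ∩ readings
-- the theory's mechanism generates.
inductive ReadingLabel | literal | global | local_ | wideScopeOr
  deriving DecidableEq, Repr

def predicted (trueAt generated : List ReadingLabel) : List ReadingLabel :=
  trueAt.filter (generated.contains ·)

-- T1 generates {literal, global}; at STRONG all three chain readings are true,
-- so local_ is filtered out of T1's predicted set:
#eval predicted [.literal, .global, .local_] [.literal, .global]
```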


                                Experiment 1 main results (paper Figure 5, page 18, n = 16). Rates are mean cursor positions in integer percent points, matching the discipline of GeurtsPouscoulous2009.lean (which uses Nat percentages for raw rates and for derived means). Per-condition functions are defined by direct match so decide reduces in the kernel.

Paper's headline finding (page 18): STRONG > WEAK for both items. The two conditions differ only in whether the local reading is true (Fig 4 page 15). T1 (restricted globalist) predicts no difference, because neither condition makes a non-globalist reading true; the observed gap (31 percentage points for 'some', 32 for 'or') is thus evidence that the local reading exists, contra T1.

                                Ratings increase across conditions in step with the number of readings true: FALSE (0 readings) < LITERAL (1 reading) < WEAK (2 readings) < STRONG (3 readings). The §3.2 monotonicity conjecture (page 10) instantiated on the Exp 1 'some' data via RatingsMonotone.

                                T1's structural prediction, derived from Theory.generatesMonotonic: under T1, WEAK and STRONG admit the same reading-set (both intersected with T1's {literal, global} mechanism give {literal, global}, because local is true at STRONG but T1 doesn't generate local). T1 therefore predicts equal ratings; combined with strong_gt_weak_some this is the falsification of T1 in Exp 1.

                                T2's structural prediction (the contrasting case): STRONG admits a strict superset of WEAK's reading-set, because T2 generates local and local is true at STRONG. Together with the §3.2 monotonicity conjecture, T2 therefore predicts STRONG > WEAK — confirmed by strong_gt_weak_some.

                                Distributivity sub-finding (paper §4.4.5, page 20). For the 'or' item under STRONG condition, sub-conditions STRONG[≠] (where strong verifiers vary in shape, so distributivity inferences are supported) and STRONG[=] (where they don't) yield significantly different ratings (99.5% vs 73%, W = 78, p < .005). This is the kind of empirical finding @cite{fox-spector-2018}'s economy-of-exhaustification predicts: embedded exh is licensed when non-vacuous. Rates as per-mille (Nat) so the 99.5% value 995 is exact.


                                  DE controls #

                                  Paper §4.2.2 page 14 + §5.3.2 page 26: DE control sentences (12)/(13) "Aucune lettre n'est reliée à certains de ses cercles" — "No letter is connected with some of its circles" — were tested in three conditions:

                                  Findings (Figure 6 page 19 / Figure 13 page 29):

                                  DE control conditions tested in Exp 1 (paper §4.2.2, page 14).


                                      Replicates @cite{geurts-pouscoulous-2009}'s Exp 4 finding: in DE contexts the ?LOCAL rate is far below the BOTH rate, supporting the no-local-SI-in-DE generalization.

                                      §5 Experiment 2 — the killer finding #

                                      Method (paper §5.2, page 26): 16 native French speakers, ages 18–35 (9 women), no prior formal-linguistics exposure. Same continuous-scale task as Exp 1, with 3-letter grids replacing 6-letter grids.

                                      Target sentences:

                                      Crucial: exactly one creates a non-monotonic environment where the local reading is logically independent of the literal reading (paper page 25):

                                      Lattice (page 25): global ⊊ literal AND global ⊊ local; literal ⊥ local (logically independent). T1 cannot predict local; T3 (globalist with multi-alternative negation) cannot predict local because the local reading does not entail the literal reading. Only T2 (localist) predicts local in non-monotonic environments.

                                      Four target conditions (paper §5.3.1 page 26):

                                      Reading extensions for Exp 2 sentence (21) #

                                      Note the entailment lattice differs from Exp 1: literal and local are logically independent here. The "exactly one" predicates use ∃ i, P i ∧ ∀ j ≠ i, ¬ P j spelled out explicitly so that Fintype.decidableForallFintype and Fintype.decidableExistsFintype derive Decidable automatically.
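A sketch of one such predicate in the stated pattern (type, constructor, and definition names assumed):

```lean
import Mathlib

-- Sketch: letter states and a 3-letter picture (Exp 2).
inductive SomeAllWorld | none | someNotAll | all
  deriving DecidableEq

abbrev Picture3 := Fin 3 → SomeAllWorld

-- "Exactly one" spelled out so Fintype.decidableExistsFintype and
-- Fintype.decidableForallFintype derive Decidable automatically.
abbrev exactlyOneConnected (p : Picture3) : Prop :=
  ∃ i, p i ≠ SomeAllWorld.none ∧ ∀ j, j ≠ i → p j = SomeAllWorld.none

example : exactlyOneConnected (fun i => if i = 0 then .someNotAll else .none) := by
  decide
```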


                                      Literal (19a): exactly one letter has ≥1 circle, others have none.


                                        Global (19b): exactly one letter is a strong verifier, no letter is a weak verifier (the speech-act SI on the exactly one sentence).


                                          Local (19c): exactly one letter is a strong verifier; the others may be either falsifiers or weak verifiers. Logically independent of literal: a configuration with one strong verifier and two weak verifiers makes local true but literal false.


                                            The four target conditions for Exp 2 (paper §5.3.1).


The set of readings true at the witness picture for each Exp 2 condition. Crucially asymmetric: at LOCAL, only local_ is true — the literal reading is FALSE because the other two letters may have all their circles connected, which falsifies the literal reading of the exactly one sentence. This logical independence of local and literal is what makes Exp 2 the diagnostic experiment.


                                                  Per CS11's §3.2 conjecture, the rating reflects the intersection of true readings with theory-generated readings. For Exp 2 (non-monotonic embedding), all theories use Theory.generatesNonMonotonic — this is where T3 fails.


                                                    Sample picture witnessing each Exp 2 condition.


                                                      The witness pictures realize the intended reading-truth pattern, verifying the Exp2Condition enum's intended meaning. Note especially the LOCAL condition: literal=F but local=T, exhibiting the logical independence that distinguishes Exp 2 from Exp 1.

                                                      Experiment 2 main results (paper Figure 12, page 28, n = 16), per-mille Nat.

                                                      The killer finding (paper page 28): for the 'some' item under exactly one, the LOCAL condition is rated higher than the LITERAL condition (73% vs 37%). Globalist theories (T1, T3) cannot explain this: in a non-monotonic environment the local reading is logically independent of the literal reading, and globalist mechanisms cannot derive readings that don't entail the literal. The fact that participants rate LOCAL > LITERAL — despite the literal reading being false at LOCAL pictures — is direct positive evidence for the existence of an embedded local reading.

Existence of the local reading in non-monotonic environments: for both 'some' and 'or', LOCAL is rated far above FALSE (paper Figure 12). For 'or' the LOCAL > LITERAL contrast does not hold (37% vs 58%), but LOCAL > FALSE does.

                                                      T3's structural prediction failure, derived from Theory.generatesNonMonotonic: T3 generates only {literal, global} in non-monotonic environments. At the LOCAL condition the only true reading is local_, so T3's available-reading-set is empty — identical to FALSE. Combined with the §3.2 monotonicity conjecture, T3 predicts LOCAL = FALSE. The observed local_gt_false_both_items (73% vs 6.7% for 'some') falsifies T3.
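The collapse can be sketched as a small computation (list encoding and names illustrative):

```lean
-- Sketch: why T3 collapses LOCAL and FALSE in Exp 2.
inductive ReadingLabel | literal | global | local_
  deriving DecidableEq

-- T3's mechanism in non-monotonic environments generates only these:
def t3Generates : List ReadingLabel := [.literal, .global]

def available (trueAt : List ReadingLabel) : List ReadingLabel :=
  trueAt.filter (t3Generates.contains ·)

-- At LOCAL only local_ is true; at FALSE nothing is. Both intersections are
-- empty, so with the §3.2 conjecture T3 predicts equal ratings, against the
-- observed 73% vs 6.7%:
example : available [.local_] = available [] := by decide
```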

                                                      T2's contrasting structural prediction at the same condition: T2 does generate local in non-monotonic environments, so T2's available-reading-set at LOCAL is {local_} — strictly larger than at FALSE. T2 therefore predicts LOCAL > FALSE, matching the data.

                                                      The killer separation: T3 collapses the LOCAL/FALSE distinction; T2 preserves it. The observed LOCAL > FALSE is therefore evidence FOR T2 and AGAINST T3.

                                                      Wide-scope-or sub-finding (paper §5.5.5, page 30). Within the FALSE condition for the 'or' item, sub-cases where the wide-scope reading is true (despite local/global/literal all being false) are rated higher than sub-cases where it isn't (20% vs 6%, W = 128, p < .001). Evidence that graded TVJ detects scope ambiguities even when participants don't report them. Per-mille Nat.


                                                        Bridges to GP09 and the Implicature spine #

                                                        Three connections to existing linglib content:

1. GP09 paradigm comparison: CS11 replicates GP09's no-local-SI-in-DE finding (in DE controls), but contests GP09's no-local-SI-anywhere conclusion via the universal-embedding STRONG > WEAK and the non-monotonic LOCAL > LITERAL findings. The disagreement is paradigm-relative: GP09's binary inference task vs CS11's graded TVJ. We do not state "GP09 wrong / CS11 right"; we state the empirical complementarity and the methodological argument.
                                                        2. Implicature spine: the qualitative "embedded local reading exists" conclusion is wrapped as an Implicature value over Picture6 with mechanism := .exhIE (Innocent Exclusion / localist EXH family — the @cite{fox-2007} / @cite{chierchia-fox-spector-2008} / T2 cluster).
                                                        3. GP09 exactly two connection: GP09's Exp 3 exactly two condition is the binary-task analog of CS11's Exp 2 exactly one. GP09 found ~50% inference rate (chance); CS11 finds 73% LOCAL rating. The paradigm shift recovers the localist signal.

                                                        A real cross-experiment claim: both papers find DE local-SI rates well below their respective high baselines.

                                                        • CS11 Exp 1: de_qLocal (25% 'some') is far below de_both (92%)
                                                        • GP09 Exp 4: alleged-SI ambiguity (~6%) is far below genuine-ambiguity baseline (70% mean across 5 controls)

                                                        Both gaps exceed 50 percentage points; both papers' DE results qualitatively agree even though their absolute rates differ (paradigm-relative differences).

                                                        The qualitative "local reading exists in embedded position" finding expressed as an Implicature Picture6: scalar SI, content = the local reading extension, alternative = the global reading, mechanism = exhIE (the @cite{fox-2007}-style localist EXH family that T2 represents).


                                                          The local-reading SI is reinforceable: there's a picture (WEAK condition) where the literal reading holds but the local reading fails. The IsReinforceable diagnostic (Sadock 1978) thus applies.

                                                          §6 Conclusions #

                                                          The paper's verdict (page 31): "scalar items in non-monotonic environments give rise to robust local readings, even more robust than the literal reading. Importantly, no globalist theory of scalar implicatures can predict the local reading to be possible in such cases, where the local reading is logically independent of the literal meaning. This result thus seems to vindicate the localist approach."

                                                          Methodological conclusion: graded judgments reveal ambiguities that binary judgments mask; CS11 detected what GP09 missed. The @cite{geurts-pouscoulous-2009} null result is paradigm-relative, not a fact about the language faculty.

                                                          Open questions noted by the paper itself (page 32):