
Linglib.Phenomena.Conditionals.Studies.RamotowskaEtAl2025

@cite{ramotowska-marty-romoli-santorio-2025} - Counterfactuals and Quantificational Force

Ramotowska, S., Marty, P., Romoli, J. & Santorio, P. (2025). Counterfactuals and quantificational force: Experimental evidence for selectional semantics. Semantics & Pragmatics 18, Article 6: 1–43.

Finding

Quantifier STRENGTH, not polarity or QUD, determines graded truth-value judgments for counterfactuals embedded under quantifiers.

This supports the SELECTIONAL theory (Stalnaker + supervaluation) over:

Experimental Paradigm

Both experiments use graded truth-value judgments (0–99 slider from "completely false" to "completely true"). QUD is manipulated between subjects: E-QuD (existential: "at least one has a chance to win") vs U-QuD (universal: "all are guaranteed to win").

Test sentences (Experiment 2):

Key Results

The three theories being tested.


      Quantifiers tested in the experiment.


          Quantifier strength classification, derived from canonical Strength.


            QUD type manipulated between subjects.


                Selectional theory predictions (Table 3): QUD-independent. Strong quantifiers → rejected (low ratings), weak quantifiers → accepted (high ratings).


                  Homogeneity theory predictions (Table 3): QUD-dependent. Positive quantifiers: high under E-QuD, low under U-QuD. Negative quantifiers: low under E-QuD, high under U-QuD. The predicted interaction is between QUD and polarity, not strength.
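The two prediction functions described above can be sketched as follows. This is a minimal reconstruction, not the module's rendered source: the names `isStrong`, `selectionalPredictedHigh` and `homogeneityPredictedHigh` appear elsewhere on this page, but the constructor names, the helper `isPositive`, and the exact signatures are assumptions made for illustration.

```lean
namespace PredictionSketch

-- Constructor names are assumed; the page only says four quantifiers were tested.
inductive Quantifier where
  | every | no | some | notEvery

-- E-QuD and U-QuD; constructor names are assumed.
inductive QUD where
  | existential | universal

-- Strong: every, no. Weak: some, not every.
def Quantifier.isStrong : Quantifier → Bool
  | .every | .no => true
  | .some | .notEvery => false

-- Hypothetical helper: positive quantifiers are every and some.
def Quantifier.isPositive : Quantifier → Bool
  | .every | .some => true
  | .no | .notEvery => false

-- Selectional theory: weak quantifiers are predicted high; the QUD is ignored.
def selectionalPredictedHigh (q : Quantifier) (_qud : QUD) : Bool :=
  !q.isStrong

-- Homogeneity theory: positive is high under E-QuD, negative is high under U-QuD.
def homogeneityPredictedHigh (q : Quantifier) : QUD → Bool
  | .existential => q.isPositive
  | .universal   => !q.isPositive

-- QUD-independence of the selectional prediction holds by definition.
theorem selectional_qud_independent (q : Quantifier) (k k' : QUD) :
    selectionalPredictedHigh q k = selectionalPredictedHigh q k' := rfl

-- The homogeneity prediction for "every" flips with the QUD.
example : homogeneityPredictedHigh .every .existential = true := rfl
example : homogeneityPredictedHigh .every .universal = false := rfl

end PredictionSketch
```

Giving `selectionalPredictedHigh` an unused QUD argument is a design choice of this sketch: it lets both theories share one signature, so they can be compared condition by condition.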


                    Experimental datum: mean slider rating (0–99 scale) for a condition. 0 = "completely false", 99 = "completely true".


                        Experiment 2 Results (card game, n=94)

                        Experiment 2 (§6) provides per-condition mean slider ratings for target counterfactual (TC) sentences in the mixed scenario, reported in §6.7.3 (p. 6:34). Values verified against raw CSV data (OSF: osf.io/3jywr); paper rounds raw means to 1 decimal place.

                        Experiment 2: mean slider ratings for counterfactuals in mixed scenarios. Verified against raw CSV data (OSF) and paper §6.7.3.

                        Strong quantifiers (every/all, none): all means < 4. Weak quantifiers (not all, some): all means > 82.


                          Experiment 1 Results (lottery, n=87)

                          Experiment 1 (§5) uses lottery scenarios where "only some of the tickets that have been bought win a prize." Mean slider ratings (0–99) for counterfactual sentences in the mixed scenario. No per-quantifier × per-QUD breakdown is reported; the paper reports marginal means by STRENGTH (collapsing across QUD and polarity).

                          Key results from the mixed-effects model (§5.7.2):

                          Experiment 1: marginal means by quantifier strength in mixed scenarios. These are the key data points establishing the strength effect (strong < 15, weak > 84). Values verified from raw CSV data (OSF: osf.io/3jywr).

                          • isStrong : Bool
                          • meanRating :

                                Key empirical observation: Strength, not polarity or QUD, determines truth-value judgments for counterfactuals in mixed scenarios.

                                Strong quantifiers (every, no) have uniformly low mean ratings (< 4/99). Weak quantifiers (some, not every) have uniformly high ratings (> 82/99). QUD has no significant effect on counterfactual ratings.

                                  theorem RamotowskaEtAl2025.strength_effect_verified :
                                  (experiment2MixedResults.all fun (d : ExperimentalDatum) => if d.quantifier.isStrong = true then decide (d.meanRating < 5) else decide (d.meanRating > 80)) = true

                                  Strength effect: all strong quantifier ratings are below 5/99 and all weak quantifier ratings are above 80/99 in the mixed scenario.

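                                  The two bands do not overlap and are separated by more than 75 points. This check is descriptive; the inferential support for strength as the dominant factor comes from the mixed-effects model (Table 5 of the paper).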

                                  QUD has no effect on counterfactuals: within each quantifier, E-QuD and U-QuD ratings are close (differ by < 5 points on 0–99 scale).

                                  This is the key prediction of the selectional theory (QUD-independent) and against the homogeneity theory (which predicts QUD × polarity).
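Both checks above (the strength bands and the small QUD differences) can be replayed on the eight Experiment 2 means quoted on this page. The sketch below is an illustration under stated assumptions, not the module's definition of `experiment2MixedResults`: means are stored as natural numbers in tenths of a slider point (so 1.2 becomes 12) to keep the block free of imports, and the constructor names, `meanTenths`, `quantifiers`, `quds` and `absDiff` are hypothetical.

```lean
namespace Exp2Sketch

inductive Quantifier where
  | every | no | some | notEvery

inductive QUD where
  | existential | universal

def Quantifier.isStrong : Quantifier → Bool
  | .every | .no => true
  | .some | .notEvery => false

-- Mean ratings in tenths of a point, copied from the observed values on this page.
def meanTenths : Quantifier → QUD → Nat
  | .every,    .existential => 12
  | .every,    .universal   => 15
  | .some,     .existential => 972
  | .some,     .universal   => 961
  | .no,       .existential => 9
  | .no,       .universal   => 33
  | .notEvery, .existential => 861
  | .notEvery, .universal   => 821

def quantifiers : List Quantifier := [.every, .no, .some, .notEvery]
def quds : List QUD := [.existential, .universal]

-- Strength effect: strong means are below 5.0, weak means are above 80.0.
example :
    (quantifiers.all fun q => quds.all fun k =>
      if q.isStrong then decide (meanTenths q k < 50)
      else decide (meanTenths q k > 800)) = true := by
  decide

-- Absolute difference on Nat (plain subtraction would truncate at zero).
def absDiff (a b : Nat) : Nat := if a ≤ b then b - a else a - b

-- QUD insensitivity: within each quantifier the two QUD means differ by < 5.0.
example :
    (quantifiers.all fun q =>
      decide (absDiff (meanTenths q .existential) (meanTenths q .universal) < 50)) = true := by
  decide

end Exp2Sketch
```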


                                    Selectional theory succeeds: predictions match data.

                                    The selectional theory predicts that quantifier strength determines ratings regardless of QUD. This matches the observed pattern: strong quantifiers uniformly rejected, weak uniformly accepted, with no QUD modulation.


                                      Homogeneity theory fails: predicted QUD × polarity interaction absent.

                                      The homogeneity theory predicts that positive quantifiers (every, some) should be rated HIGH under E-QuD but LOW under U-QuD, and vice versa for negative quantifiers. The data show no such interaction:

                                      • "every" is low under BOTH QUDs (~1.2 and ~1.5)
                                      • "some" is high under BOTH QUDs (~97.2 and ~96.1)
                                        theorem RamotowskaEtAl2025.homogeneity_wrong_count :
                                        (List.filter (fun (d : ExperimentalDatum) => have predictedHigh := homogeneityPredictedHigh d.quantifier d.qud; predictedHigh && decide (d.meanRating < 50) || !predictedHigh && decide (d.meanRating > 50)) experiment2MixedResults).length = 4

                                        The homogeneity theory makes wrong predictions for 4 of 8 conditions.

                                        Under U-QuD, homogeneity predicts:

                                        • every → low (✓ observed: 1.5)
                                        • some → low (✗ observed: 96.1)
                                        • no → high (✗ observed: 3.3)
                                        • not every → high (✓ observed: 82.1)

                                        Under E-QuD, homogeneity predicts:

                                        • every → high (✗ observed: 1.2)
                                        • some → high (✓ observed: 97.2)
                                        • no → low (✓ observed: 0.9)
                                        • not every → low (✗ observed: 86.1)
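The tally of four mismatches can be recomputed from the eight observed means listed above. This is a sketch under assumptions: it mirrors the filter in `homogeneity_wrong_count` but stores means in tenths of a point as natural numbers, and the constructor names, `isPositive`, `meanTenths`, `isWrong` and `wrongCount` are hypothetical stand-ins rather than the module's own definitions.

```lean
namespace HomogeneityCountSketch

inductive Quantifier where
  | every | no | some | notEvery

inductive QUD where
  | existential | universal

-- Hypothetical helper: positive quantifiers are every and some.
def Quantifier.isPositive : Quantifier → Bool
  | .every | .some => true
  | .no | .notEvery => false

-- Positive: high under E-QuD. Negative: high under U-QuD.
def homogeneityPredictedHigh (q : Quantifier) : QUD → Bool
  | .existential => q.isPositive
  | .universal   => !q.isPositive

-- Observed means in tenths of a point (1.2 is stored as 12).
def meanTenths : Quantifier → QUD → Nat
  | .every,    .existential => 12
  | .every,    .universal   => 15
  | .some,     .existential => 972
  | .some,     .universal   => 961
  | .no,       .existential => 9
  | .no,       .universal   => 33
  | .notEvery, .existential => 861
  | .notEvery, .universal   => 821

def quantifiers : List Quantifier := [.every, .no, .some, .notEvery]
def quds : List QUD := [.existential, .universal]

-- A condition is wrong if the predicted side of the midpoint (50.0) is not the observed side.
def isWrong (q : Quantifier) (k : QUD) : Bool :=
  let predictedHigh := homogeneityPredictedHigh q k
  (predictedHigh && decide (meanTenths q k < 500)) ||
    (!predictedHigh && decide (meanTenths q k > 500))

-- Count wrong conditions across all quantifier and QUD combinations.
def wrongCount : Nat :=
  quantifiers.foldl (fun acc q => acc + (quds.filter (isWrong q)).length) 0

example : wrongCount = 4 := by decide

end HomogeneityCountSketch
```

The four failing conditions are every under E-QuD, some under U-QuD, no under U-QuD, and not every under E-QuD, matching the ✗ marks in the lists above.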

                                        Mixed-Effects Model Results (Table 5 of paper)

                                        Experiment 2 target counterfactual sentences, linear mixed-effects model with POLARITY, STRENGTH, QUD and interactions as predictors:

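                                        | Effect | β | p |
                                        |---|---|---|
                                        | INTERCEPT | 46.1 | < .001 |
                                        | STRENGTH | −88.7 | < .001 |
                                        | QUD | −0.6 | 0.7 |
                                        | POLARITY | 5.9 | < .001 |
                                        | QUD:POLARITY | 0.3 | 0.9 |
                                        | STRENGTH:QUD | 3.9 | 0.2 |
                                        | STRENGTH:POLARITY | −13.2 | < .001 |
                                        | STR:POL:QUD | −5.3 | 0.4 |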

                                        Key findings:

                                        Plural Definites vs Counterfactuals: QUD Sensitivity

                                        Experiment 2 also tested plural definite sentences alongside counterfactuals. Plural definites ARE sensitive to QUD (β = −12.6, p = 0.01), while counterfactuals are NOT (β = −0.6, p = 0.7). This dissociation supports three conclusions:

                                        1. The QUD manipulation was effective (plural definites detect it)
                                        2. Counterfactuals' QUD insensitivity is a genuine semantic property
                                        3. Both phenomena use the same DIST operator, but differ in architecture: plural homogeneity is LOCAL (gap before quantifier), while selectional counterfactuals are GLOBAL (Bool before quantifier)

                                        Plural definite datum: mean slider rating under each QUD condition in the mixed scenario ("The players won this round").


                                            Experiment 2: plural definite mean ratings in mixed scenario. Unlike counterfactuals, these show a significant QUD effect. Raw means from OSF data; paper §6.7.2 reports model-estimated marginals (42.2 / 29.6) which differ slightly.

                                              theorem RamotowskaEtAl2025.selectional_prediction_grounded (q : Quantifier) (bs : List Bool) (h_some_true : bs.any id = true) (h_some_false : (bs.any fun (x : Bool) => !x) = true) :

                                              Grounding theorem: the study-level prediction (selectionalPredictedHigh) agrees with the formal selectional semantics for any mixed input.

                                              This connects the theory layer's three-valued projection operations to the study file's simple strength-based classification. The classification is not stipulated — it is derived from the formal theory by construction.

                                              Why Strength Matters: Local vs Global Aggregation

                                              The paper's deepest insight (§2.2): whether gaps arise LOCALLY (before the quantifier) or GLOBALLY (after the quantifier) determines whether quantifier strength matters. The algebra is Core.Duality.aggregate applied to two different inputs:

                                              Homogeneity architecture erases strength: when gaps arise locally, both strong and weak quantifiers return .indet. The quantifier's projection type is invisible — it cannot "see past" gaps.

                                              This is why the homogeneity theory predicts no strength effect and must resort to QUD × polarity to distinguish conditions.

                                              Selectional architecture produces strength effect: when the quantifier sees only Bools (global scope), mixed inputs yield .true for weak (∃/disjunctive) and .false for strong (∀/conjunctive).

                                              This connects the study's embeddedSelectional through the bridging theorem to Duality.aggregate_map_ofBool_mixed.

                                              Selectional counterfactuals are always determinate: under global scope, aggregation over Bools never produces a gap.

                                              This explains why selectional semantics yields crisp true/false judgments (no "undefined"), matching the experimental pattern of extreme slider values (< 4 or > 82).
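The local/global contrast can be made concrete with a small three-valued aggregation. The sketch below is an assumption-laden illustration, not the definition of `Core.Duality.aggregate`: it uses strong Kleene conjunction and disjunction, and the constructor names of `ProjectionType`, the helpers `meet` and `join`, and the sample inputs are all hypothetical.

```lean
namespace AggregationSketch

-- Three truth values: true, false, and a gap.
inductive Truth3 where
  | true | false | indet

-- Embed a classical value; the result is never a gap.
def ofBool : Bool → Truth3
  | true => Truth3.true
  | false => Truth3.false

-- Strong Kleene conjunction.
def meet : Truth3 → Truth3 → Truth3
  | .false, _ => .false
  | _, .false => .false
  | .true, .true => .true
  | _, _ => .indet

-- Strong Kleene disjunction.
def join : Truth3 → Truth3 → Truth3
  | .true, _ => .true
  | _, .true => .true
  | .false, .false => .false
  | _, _ => .indet

-- Strong quantifiers aggregate conjunctively, weak ones disjunctively.
inductive ProjectionType where
  | conjunctive | disjunctive

def aggregate : ProjectionType → List Truth3 → Truth3
  | .conjunctive, vs => vs.foldr meet Truth3.true
  | .disjunctive, vs => vs.foldr join Truth3.false

-- Local architecture: each individual already contributes a gap.
def localInput : List Truth3 := [.indet, .indet, .indet]

-- Global architecture: each individual contributes a Bool (a mixed case here).
def globalInput : List Truth3 := [true, false, true].map ofBool

-- Local gaps erase strength: both projection types return the gap.
example : aggregate .conjunctive localInput = .indet := rfl
example : aggregate .disjunctive localInput = .indet := rfl

-- Global Bools produce the strength effect: weak is true, strong is false.
example : aggregate .disjunctive globalInput = .true := rfl
example : aggregate .conjunctive globalInput = .false := rfl

end AggregationSketch
```

In this sketch the quantifier is the same function in both architectures. Only its input differs, which is the point of the local vs global decomposition described above.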

                                              Why Plural Definites Are QUD-Sensitive But Counterfactuals Are Not

                                              Experiment 2 tested both counterfactuals and plural definites ("The players won this round") in the same mixed scenarios. The key finding:

                                              The NonBivalence dichotomy explains this dissociation:

                                              The consequence: when the semantic layer returns .indet, the only source of variation in judgments is pragmatic resolution — and pragmatic resolution is QUD-dependent (@cite{kriz-2016}: sufficientlyTrue and addressesIssue). When the semantic layer returns a determinate value, there is no gap for pragmatics to exploit — QUD has nothing to modulate.

                                              This is why the SAME mixed scenario produces QUD-sensitivity for PDs but not for CFs: the scope of trivalence determines whether pragmatics gets a foothold.

                                              Plural definites are LOCAL: in mixed scenarios, every quantifier returns .indet. Strength, polarity, and QUD are all invisible at the semantic level — the quantifier cannot see past the gap.

                                              Counterfactuals are GLOBAL: in mixed scenarios, every quantifier returns a determinate value. There is no gap for pragmatics to exploit.

                                              theorem RamotowskaEtAl2025.scope_determines_qud_sensitivity (n : ℕ) (hn : n > 0) (bs : List Bool) (hlen : bs.length = n) (h_some_true : bs.any id = true) (h_some_false : (bs.any fun (x : Bool) => !x) = true) (d : Core.Duality.ProjectionType) :

                                              The dissociation: for the same mixed input (n individuals, some satisfying the predicate, some not), PDs return .indet while CFs return a definite value. The gap is what makes PDs QUD-sensitive — pragmatic resolution via sufficientlyTrue (@cite{kriz-2016}) depends on the QUD partition. CFs have no gap to resolve.

                                              This is a direct corollary of the local/global aggregation decomposition in Core.Duality: local scope produces gaps that pass through quantifiers; global scope produces Bools that quantifiers can distinguish.

                                              1. Local vs Global Aggregation (Core.Duality.aggregate_*): The paper's deepest architectural insight is formalized as two facts about aggregate. homogeneity_erases_strength derives that local gaps make strength invisible; selectional_strength_effect derives that global Bools produce the strength effect.

                                              2. Plural Definite Dissociation (above): scope_determines_qud_sensitivity derives the CF/PD dissociation from the NonBivalence dichotomy. Plural definites are LOCAL (gap before quantifier → QUD-sensitive pragmatic resolution); counterfactuals are GLOBAL (Bool before quantifier → no gap to resolve).

                                              3. Modal Homogeneity (@cite{agha-jeretic-2022}): Weak necessity modals (should) are to strong necessity (must) what plural definites are to all-sentences. shouldEval produces .indet in mixed domains (local), while mustEval produces ofBool (global). The same NonBivalence dichotomy predicts that embedded should-sentences would be strength-insensitive while embedded must-sentences would show the strength effect.

                                              4. Conditional Excluded Middle (CEM): Stalnaker's semantics validates CEM: (A □→ B) ∨ (A □→ ¬B). See Counterfactual.lean for the proof.
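To illustrate why a Stalnaker-style semantics validates CEM, here is a minimal sketch. It is not the proof in Counterfactual.lean: the selection function `sel` and the definition `would` are assumptions introduced for this example. Because `sel` picks a single antecedent world, CEM reduces to excluded middle at that world.

```lean
namespace CEMSketch

variable {W : Type}

-- Stalnaker-style counterfactual: the consequent holds at the one selected antecedent world.
def would (sel : W → (W → Prop) → W) (A B : W → Prop) (w : W) : Prop :=
  B (sel w A)

-- CEM: (A □→ B) ∨ (A □→ ¬B), by excluded middle at the selected world.
theorem cem (sel : W → (W → Prop) → W) (A B : W → Prop) (w : W) :
    would sel A B w ∨ would sel A (fun v => ¬ B v) w :=
  Classical.em (B (sel w A))

end CEMSketch
```

The sketch leaves out any constraint that the selected world satisfies the antecedent, since CEM does not depend on it.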