Linglib.Phenomena.Reference.Studies.GroszJoshiWeinstein1995

@cite{grosz-joshi-weinstein-1995}: Centering Theory

@cite{kameyama-1986} @cite{gordon-grosz-gilliom-1993} @cite{kehler-rohde-2013} @cite{sidner-1983}

Centering: A Framework for Modeling the Local Coherence of Discourse. Computational Linguistics 21(2): 203–225.

Each utterance U_n has a list of forward-looking centers Cf(U_n), ranked by grammatical role (subject > object > other), and at most one backward-looking center Cb(U_n): the highest-ranked element of Cf(U_{n-1}) that is also realized in U_n. The highest-ranked element of Cf(U_n) is its preferred center Cp(U_n). Three transition types (continuation, retaining, shifting) classify adjacent-utterance pairs by whether the Cb is preserved and whether that Cb is also the Cp of the current utterance. A continuation keeps the Cb, and the Cb is the Cp. A retain keeps the Cb, but the Cb is not the Cp. A shift changes the Cb.

Two normative rules govern coherent discourse. Rule 1 is the pronominalization constraint: if any element of Cf(U_{n-1}) is realized as a pronoun in U_n, then Cb(U_n) must be realized as a pronoun as well. Rule 2 is the transition preference: sequences of continuations are preferred over sequences of retentions, and sequences of retentions are preferred over sequences of shifts.
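The definitions of Cf, Cp, Cb, the three transitions and the Rule 2 ordering can be sketched in plain Lean 4 without Mathlib. This is a minimal sketch under stated assumptions, not the substrate's API. The namespace `CenteringSketch`, the `Mention` record, the helpers `pro`, `full`, `cf`, `cp`, `cb` and `classify`, and the numeric `Transition.rank` values are all hypothetical. An enum `Entity` stands in for `String` entities so that `decide` can evaluate the checks. The sketch also assumes that an undefined earlier Cb (at the start of a discourse) is compatible with any current Cb. The worked example encodes only the matrix-clause mentions of the paper's Discourse (20), steps (b) through (e), which under this encoding exhibit all three transition types.

```lean
namespace CenteringSketch

/-- Hypothetical entity type: an enum stands in for `String` so `decide` can run. -/
inductive Entity | john | mike
  deriving DecidableEq, Repr

/-- Grammatical roles, listed in Cf-rank order: subject > object > other. -/
inductive Role | subj | obj | other
  deriving DecidableEq, Repr

/-- One realized entity: its grammatical role and whether it is a pronoun. -/
structure Mention where
  ent : Entity
  role : Role
  pron : Bool

abbrev Utterance := List Mention

def pro (e : Entity) (r : Role) : Mention := ⟨e, r, true⟩
def full (e : Entity) (r : Role) : Mention := ⟨e, r, false⟩

/-- Cf: realized entities, subjects first, then objects, then the rest. -/
def cf (u : Utterance) : List Entity :=
  let byRole (r : Role) := (u.filter fun m => m.role == r).map (·.ent)
  byRole .subj ++ byRole .obj ++ byRole .other

/-- Cp: the highest-ranked element of Cf. -/
def cp (u : Utterance) : Option Entity := (cf u).head?

/-- Cb(cur): the highest-ranked element of Cf(prev) that cur realizes. -/
def cb (prev cur : Utterance) : Option Entity :=
  (cf prev).find? fun e => cur.any fun m => m.ent == e

inductive Transition | continuation | retaining | shifting
  deriving DecidableEq, Repr

/-- Rule 2 as a numeric order (higher is preferred); the values are arbitrary. -/
def Transition.rank : Transition → Nat
  | .continuation => 2 | .retaining => 1 | .shifting => 0

/-- Classify prev → cur given Cb(prev); `none` for prevCb means "undefined". -/
def classify (prevCb : Option Entity) (prev cur : Utterance) : Option Transition :=
  match cb prev cur with
  | none => none
  | some c =>
    let kept := prevCb.isNone || prevCb == some c
    if kept && cp cur == some c then some .continuation
    else if kept then some .retaining
    else some .shifting

-- Matrix-clause mentions of (20b)-(20e).
def u20b : Utterance := [pro .john .subj]
def u20c : Utterance := [pro .john .subj, full .mike .obj]
def u20d : Utterance := [full .mike .subj, pro .john .obj]
def u20e : Utterance := [pro .mike .subj, full .john .obj]

example : classify (some Entity.john) u20b u20c = some Transition.continuation := by decide
example : classify (some Entity.john) u20c u20d = some Transition.retaining := by decide
example : classify (some Entity.john) u20d u20e = some Transition.shifting := by decide
example : Transition.continuation.rank > Transition.retaining.rank ∧
    Transition.retaining.rank > Transition.shifting.rank := by decide

end CenteringSketch
```

Deriving `cp` from `cf` keeps the two notions in lockstep: whatever ranking `cf` implements, the Cp is simply its first element.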

The key empirical contrast is between Discourses 1 and 2 (§4 below). The two discourses have the same propositional content but different transition patterns, and they differ in perceived coherence. Rule 2, the transition preference, predicts the difference.

This file consumes the substrate types and operators from Theories/Discourse/Centering/{Defs,Basic,Transition,Rule1,Rule2}.lean plus the GrammaticalRole Cf-ranker instance from Instances/GrammaticalRole.lean. Per linglib's import-don't-restipulate discipline, no Centering primitives are redefined here. The file does two things: it anchors the substrate in the paper's own examples, and it carries out the §8 comparison with @cite{sidner-1983}.

Throughout, entities are Strings for readability, and utterances have the substrate type Utterance String GrammaticalRole.

@[reducible, inline]

Utterance abbreviation specialized to the GJW use case (String entities, grammatical-role-ranked Cf).

(1a) John went to his favorite music store to buy a piano.

(1b) He had frequented the store for many years.

(1c) He was excited that he could finally buy a piano.

(1d) He arrived just as the store was closing for the day.

(2a) John went to his favorite music store to buy a piano.

(2b) It was a store John had frequented for many years.

(2c) He was excited that he could finally buy a piano.

(2d) It was closing just as John arrived.

Per-pair transition predictions

For each adjacent pair, the Cb (computed from the prior utterance) and the transition type follow from the substrate definitions.

Discourse 1 (a→b): John is the Cb and also Cp(D1.b), so this is a continuation.

Discourse 2 (a→b): the Cb is John, the highest-ranked element of Cf(D2.a) realized in D2.b. The store is also realized in D2.b, but it ranks below John in Cf(D2.a). Because Cp(D2.b) = "store" (not John), this is a retain, which is already a less coherent transition than Discourse 1's continuation.

Discourse 2 (b→c): John stays the Cb, because he is realized in D2.c while the store, which outranks him in Cf(D2.b), is not. John is also Cp(D2.c), so this is a continuation. Rule 1 is satisfied because the Cb, John, is pronominalized.

Discourse 2 (c→d): John remains the Cb, since he is the subject of D2.c and is realized again in D2.d. However, Cp(D2.d) = "store", so this is a RETAIN, not a continuation.

The coherence contrast (paper §2): Discourse 1's transitions are all continuations, while Discourse 2's are retain, continue, retain, a mix that includes weaker transitions. The sum of the Transition.rank values over a discourse is a coarse but theory-aligned coherence measure.
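The per-pair notes and the summed-rank measure can be checked mechanically. The sketch below is hypothetical and self-contained. It re-declares a tiny Centering core whose names are illustrative and do not come from the substrate, and it uses an enum in place of `String` entities. It treats an undefined discourse-initial Cb as compatible with any Cb. It also simplifies role assignments: prepositional, relative-clause and subordinate-clause mentions count as `other`, and the possessive *his* is ignored. The rank values 2/1/0 are an assumption; only their order reflects Rule 2.

```lean
namespace CoherenceSketch

/-- Hypothetical enum standing in for the file's `String` entities. -/
inductive Entity | john | store | piano
  deriving DecidableEq, Repr

/-- Roles in Cf-rank order: subject > object > other. -/
inductive Role | subj | obj | other
  deriving DecidableEq, Repr

structure Mention where
  ent : Entity
  role : Role
  pron : Bool

abbrev Utterance := List Mention

def pro (e : Entity) (r : Role) : Mention := ⟨e, r, true⟩
def full (e : Entity) (r : Role) : Mention := ⟨e, r, false⟩

def cf (u : Utterance) : List Entity :=
  let byRole (r : Role) := (u.filter fun m => m.role == r).map (·.ent)
  byRole .subj ++ byRole .obj ++ byRole .other

def cb (prev cur : Utterance) : Option Entity :=
  (cf prev).find? fun e => cur.any fun m => m.ent == e

inductive Transition | continuation | retaining | shifting
  deriving DecidableEq, Repr

def Transition.rank : Transition → Nat
  | .continuation => 2 | .retaining => 1 | .shifting => 0

/-- An undefined earlier Cb (`none`) is treated as compatible with any Cb. -/
def classify (prevCb : Option Entity) (prev cur : Utterance) : Option Transition :=
  match cb prev cur with
  | none => none
  | some c =>
    let kept := prevCb.isNone || prevCb == some c
    if kept && (cf cur).head? == some c then some .continuation
    else if kept then some .retaining
    else some .shifting

/-- Transitions of a whole discourse, threading each step's Cb forward. -/
def transitions (d : List Utterance) : List (Option Transition) :=
  go none d
where
  go (prevCb : Option Entity) : List Utterance → List (Option Transition)
    | [] => []
    | u :: rest =>
      match rest with
      | [] => []
      | v :: _ => classify prevCb u v :: go (cb u v) rest

/-- Coarse coherence score: the sum of Rule-2 ranks over all defined transitions. -/
def score (d : List Utterance) : Nat :=
  ((transitions d).filterMap id).foldl (fun (acc : Nat) t => acc + t.rank) 0

def d1 : List Utterance :=
  [ [full .john .subj, full .store .other, full .piano .other],  -- (1a)
    [pro .john .subj, full .store .obj],                         -- (1b)
    [pro .john .subj, full .piano .other],                       -- (1c)
    [pro .john .subj, full .store .other] ]                      -- (1d)

def d2 : List Utterance :=
  [ [full .john .subj, full .store .other, full .piano .other],  -- (2a)
    [pro .store .subj, full .john .other],                       -- (2b) It ... John ...
    [pro .john .subj, full .piano .other],                       -- (2c)
    [pro .store .subj, full .john .other] ]                      -- (2d) It ... John ...

example : transitions d2 =
    [some Transition.retaining, some Transition.continuation, some Transition.retaining] := by
  decide
example : score d1 = 6 ∧ score d2 = 4 := by decide

end CoherenceSketch
```

Threading `cb u v` into the next step is what lets a retain in (2b) still count as keeping the Cb in (2c): the comparison is always against the Cb of the previous step, not against the previous Cp.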

(15a) He has been acting quite odd. (The Cb, John, is presumed to be established earlier in the segment.)

(15b) He called up Mike yesterday. (Cb = John, "He" = John.)

(15c) John wanted to meet him urgently. (Cb = John, "him" = Mike.) The Cf member Mike is pronominalized but the Cb, John, is not, which is a Rule 1 violation.

Rule 1 violation: in (15c), Mike is pronominalized but the Cb, John, is realized as a proper name. This is the paper's diagnosis.

The (a→b) pair of the same discourse satisfies Rule 1. The only element of Cf(15a) is John, who is both the Cb of (15b) and pronominalized there. The violation is local to the (b→c) step.
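The localization of the violation can be checked with a hypothetical, self-contained sketch. It uses a small illustrative core consisting of enum entities, roles ranked subject > object > other, and a Boolean pronoun flag per mention. In this core, `rule1` checks that if any element of Cf(prev) is a pronoun in cur, then Cb(cur) is one too. None of these names come from the substrate.

```lean
namespace Rule1Sketch15

/-- Hypothetical enum standing in for `String` entities. -/
inductive Entity | john | mike
  deriving DecidableEq, Repr

/-- Roles in Cf-rank order: subject > object > other. -/
inductive Role | subj | obj | other
  deriving DecidableEq, Repr

structure Mention where
  ent : Entity
  role : Role
  pron : Bool

abbrev Utterance := List Mention

def cf (u : Utterance) : List Entity :=
  let byRole (r : Role) := (u.filter fun m => m.role == r).map (·.ent)
  byRole .subj ++ byRole .obj ++ byRole .other

def cb (prev cur : Utterance) : Option Entity :=
  (cf prev).find? fun e => cur.any fun m => m.ent == e

/-- Rule 1: if any element of Cf(prev) is a pronoun in cur, then so is Cb(cur). -/
def rule1 (prev cur : Utterance) : Bool :=
  let isPron (e : Entity) := cur.any fun m => m.ent == e && m.pron
  match cb prev cur with
  | none => true
  | some c => !((cf prev).any isPron) || isPron c

def u15a : Utterance := [⟨.john, .subj, true⟩]                        -- He has been acting ...
def u15b : Utterance := [⟨.john, .subj, true⟩, ⟨.mike, .obj, false⟩]  -- He called up Mike ...
def u15c : Utterance := [⟨.john, .subj, false⟩, ⟨.mike, .obj, true⟩]  -- John ... meet him ...

example : rule1 u15a u15b = true := by decide
example : cb u15b u15c = some Entity.john := by decide
example : rule1 u15b u15c = false := by decide

end Rule1Sketch15
```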

@cite{grosz-joshi-weinstein-1995} §5 examples (7)-(10). All four variants share utterances (a) and (b) and differ only in how (c) realizes its referents. Variants (7) and (8) satisfy Rule 1; variants (9) and (10) violate it. The paper notes that (10) is "completely unacceptable" and that (9) is also degraded. Both are worse than (7) and (8) because of the Rule 1 violations.

Shared:
> a. Susan gave Betsy a pet hamster.
> b. She reminded her that such hamsters were quite shy.

(a) Susan gave Betsy a pet hamster. Cf = [Susan, Betsy, hamster].

(b) She reminded her ... "She" = Susan (subj), "her" = Betsy (obj).

(7c) She asked Betsy whether she liked the gift. "She" = Susan; Betsy as proper name (object). Cb = Susan, Cp = Susan ⇒ CONTINUE. Susan is pronominalized, so Rule 1 is satisfied.

(8c) Betsy told her that she really liked the gift. Betsy as proper name (subject); "her" = Susan. Cb = Susan (the highest-ranked element of Cf(b) realized in (c)), but Cp = Betsy ⇒ RETAIN. The Cb, Susan, is pronominalized via "her", so Rule 1 is satisfied.

(9c) Susan asked her whether she liked the gift. Susan as proper name (subject); "her" = Betsy. Cb = Susan, Cp = Susan ⇒ this would be a CONTINUE, but Betsy is pronominalized while the Cb (Susan) is a proper name ⇒ Rule 1 VIOLATION.

(10c) She told Susan that she really liked the gift. "She" = Betsy (subj); Susan as proper name (obj). Cb = Susan (the highest-ranked element of Cf(b) realized in (c)), Cp = Betsy ⇒ RETAIN. Betsy is pronominalized while the Cb (Susan) is a proper name ⇒ Rule 1 VIOLATION.

All four (c) variants share Cb = Susan, because Susan is the highest-ranked element of Cf(b) realized in each (c). The choice of variant does not change which entity is the Cb. It changes only how that Cb is realized.

Variant 7 satisfies Rule 1 (the Cb, Susan, is pronominalized).

Variant 8 satisfies Rule 1 (the Cb, Susan, is pronominalized via "her").

Variant 9 violates Rule 1: Betsy is pronominalized but the Cb (Susan) is realized as a proper name.

Variant 10 violates Rule 1: Betsy is pronominalized but the Cb (Susan) is realized as a proper name. The paper calls this case "completely unacceptable".

The Rule 1 split (variants 7 and 8 satisfy it; 9 and 10 violate it) tracks the paper's acceptability ordering: variants 7 and 8 are acceptable, while 9 and 10 are degraded. The framework predicts this directly from Rule 1 plus the subject > object Cf ranking.
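The four-way pattern can be reproduced with a hypothetical, self-contained sketch. It encodes only the matrix-clause mentions annotated above and leaves out the embedded *she*. Several choices are assumptions of the sketch: the names, the enum standing in for `String` entities, and the treatment of *such hamsters* as a full-NP mention of the hamster. Mapping `cb`, `classify` and `rule1` over the variants reproduces the Cb, transition and Rule 1 columns of the notes.

```lean
namespace VariantsSketch

/-- Hypothetical enum standing in for `String` entities. -/
inductive Entity | susan | betsy | hamster
  deriving DecidableEq, Repr

/-- Roles in Cf-rank order: subject > object > other. -/
inductive Role | subj | obj | other
  deriving DecidableEq, Repr

structure Mention where
  ent : Entity
  role : Role
  pron : Bool

abbrev Utterance := List Mention

def pro (e : Entity) (r : Role) : Mention := ⟨e, r, true⟩
def full (e : Entity) (r : Role) : Mention := ⟨e, r, false⟩

def cf (u : Utterance) : List Entity :=
  let byRole (r : Role) := (u.filter fun m => m.role == r).map (·.ent)
  byRole .subj ++ byRole .obj ++ byRole .other

def cb (prev cur : Utterance) : Option Entity :=
  (cf prev).find? fun e => cur.any fun m => m.ent == e

inductive Transition | continuation | retaining | shifting
  deriving DecidableEq, Repr

def classify (prevCb : Option Entity) (prev cur : Utterance) : Option Transition :=
  match cb prev cur with
  | none => none
  | some c =>
    let kept := prevCb.isNone || prevCb == some c
    if kept && (cf cur).head? == some c then some .continuation
    else if kept then some .retaining
    else some .shifting

/-- Rule 1: if any element of Cf(prev) is a pronoun in cur, then so is Cb(cur). -/
def rule1 (prev cur : Utterance) : Bool :=
  let isPron (e : Entity) := cur.any fun m => m.ent == e && m.pron
  match cb prev cur with
  | none => true
  | some c => !((cf prev).any isPron) || isPron c

def ua : Utterance := [full .susan .subj, full .betsy .obj, full .hamster .other]
def ub : Utterance := [pro .susan .subj, pro .betsy .obj, full .hamster .other]
def c7 : Utterance := [pro .susan .subj, full .betsy .obj]   -- She asked Betsy ...
def c8 : Utterance := [full .betsy .subj, pro .susan .obj]   -- Betsy told her ...
def c9 : Utterance := [full .susan .subj, pro .betsy .obj]   -- Susan asked her ...
def c10 : Utterance := [pro .betsy .subj, full .susan .obj]  -- She told Susan ...

def variants : List Utterance := [c7, c8, c9, c10]

example : variants.map (cb ub) = List.replicate 4 (some Entity.susan) := by decide
example : variants.map (classify (cb ua ub) ub) =
    [some Transition.continuation, some Transition.retaining,
     some Transition.continuation, some Transition.retaining] := by decide
example : variants.map (rule1 ub) = [true, true, false, false] := by decide

end VariantsSketch
```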

(20a) John has been having a lot of trouble arranging his vacation.

(20b) He cannot find anyone to take over his responsibilities.

(20c) He called up Mike yesterday.

(20d) Mike has annoyed him a lot recently.

(20e) He called John at 5 AM on Friday last week.

Rule 1 holds at every step of Discourse 20, as the paper claims.
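The claim can be checked with a hypothetical sketch. The sketch tracks only John and Mike, dropping the other referents of (20a), and it flags each mention as a pronoun or a name. It then folds a Rule 1 test over every adjacent pair. The names are illustrative, not the substrate's.

```lean
namespace Rule1Sketch20

/-- Hypothetical enum standing in for `String` entities. -/
inductive Entity | john | mike
  deriving DecidableEq, Repr

/-- Roles in Cf-rank order: subject > object > other. -/
inductive Role | subj | obj | other
  deriving DecidableEq, Repr

structure Mention where
  ent : Entity
  role : Role
  pron : Bool

abbrev Utterance := List Mention

def pro (e : Entity) (r : Role) : Mention := ⟨e, r, true⟩
def full (e : Entity) (r : Role) : Mention := ⟨e, r, false⟩

def cf (u : Utterance) : List Entity :=
  let byRole (r : Role) := (u.filter fun m => m.role == r).map (·.ent)
  byRole .subj ++ byRole .obj ++ byRole .other

def cb (prev cur : Utterance) : Option Entity :=
  (cf prev).find? fun e => cur.any fun m => m.ent == e

/-- Rule 1: if any element of Cf(prev) is a pronoun in cur, then so is Cb(cur). -/
def rule1 (prev cur : Utterance) : Bool :=
  let isPron (e : Entity) := cur.any fun m => m.ent == e && m.pron
  match cb prev cur with
  | none => true
  | some c => !((cf prev).any isPron) || isPron c

/-- Rule 1 at every adjacent pair of a discourse. -/
def rule1All : List Utterance → Bool
  | [] => true
  | u :: rest =>
    match rest with
    | [] => true
    | v :: _ => rule1 u v && rule1All rest

def d20 : List Utterance :=
  [ [full .john .subj],                  -- (20a) John has been having ...
    [pro .john .subj],                   -- (20b) He cannot find anyone ...
    [pro .john .subj, full .mike .obj],  -- (20c) He called up Mike ...
    [full .mike .subj, pro .john .obj],  -- (20d) Mike has annoyed him ...
    [pro .mike .subj, full .john .obj] ] -- (20e) He called John ...

example : rule1All d20 = true := by decide

end Rule1Sketch20
```

In this encoding, the Cb moves from John to Mike at (20e), and Rule 1 still holds. This is because whichever entity is the Cb at each step is the one realized as a pronoun.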

Centering's "highest-ranked Cf element", the Cp (preferred center), corresponds to the topichood side of @cite{kehler-rohde-2013}'s Bayesian decomposition, in which the production component P(pronoun | referent) is conditioned on whether the referent is the topic. A Cp that is an active-clause subject sits exactly at the `default_` topichood level in @cite{kehler-rohde-2013}'s scheme.
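The decomposition can be illustrated numerically. In @cite{kehler-rohde-2013}'s Bayesian model, P(referent | pronoun) is proportional to P(pronoun | referent) · P(referent). The sketch below is hypothetical. The `Candidate` record, the `posterior` function and all the probabilities are made up for illustration and are not estimates from the paper. With equal priors, the candidate with the higher production term (labelled here as the subject, i.e. the Cp) receives the higher posterior.

```lean
namespace BayesSketch

/-- A candidate referent with illustrative, made-up probabilities. -/
structure Candidate where
  name : String
  prior : Float         -- P(referent): next-mention bias
  pronGivenRef : Float  -- P(pronoun | referent): production bias, topichood-sensitive

/-- P(referent | pronoun) ∝ P(pronoun | referent) · P(referent), normalized. -/
def posterior (cs : List Candidate) : List (String × Float) :=
  let joint := cs.map fun c => (c.name, c.pronGivenRef * c.prior)
  let z := joint.foldl (fun (acc : Float) p => acc + p.2) 0
  joint.map fun (n, w) => (n, w / z)

-- Equal priors: only the production term separates the subject (Cp) from the object.
#eval posterior [⟨"subject (Cp)", 0.5, 0.8⟩, ⟨"object", 0.5, 0.4⟩]

end BayesSketch
```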

@cite{kwon-lee-2026}'s Korean Exp 3 finding (null pronouns strongly prefer subject antecedents) is predicted by Centering Theory:

1. The subject is the highest-ranked Cf element (@cite{kameyama-1986}).
2. Hence, whenever the subject of `U_n` is realized in `U_{n+1}`, it is the Cb of `U_{n+1}`.
3. By Rule 1, if anything in `U_{n+1}` is pronominalized, the Cb is too, so the Cb is the
   preferred target of pronominal realization.
4. In Korean, the highest-accessibility marker (the most preferred
   pronominal form) is the *null* pronoun (@cite{ariel-2001}).

Composing: subject → Cb → pronoun → null in Korean.

The derivation is anchored on a concrete two-utterance Korean
continuation pattern: utterance (a) introduces a subject-marked
referent; utterance (b) refers back to it with a null pronoun.

(a) Mary often took Tom to the sea. Adapted from the @cite{kwon-lee-2026} Exp 3 stimulus pattern.

(b) [pro] achieved the dream of becoming a sea navigator. Korean null subject; the realization is pronominal (the empty category counts as pronominal in the Centering sense).

Step 1 of the derivation: in canonical Korean SOV, the Cb of the second utterance is the prior-utterance subject. The subject tops Cf, and it is realized again in the second utterance.

Step 2: realizing the Cb pronominally satisfies Rule 1 (the Korean null subject counts as pronominal).

Step 3: among Korean's three referential forms, the null pronoun is the most accessible (top of the Korean-form linear order, derived from Ariel2001.AccessibilityLevel.rank in KwonLee2026).

Centering predicts Korean's null-subject preference by combining Rule 1 with Korean's calibration of the accessibility scale. The composition predicts the direction of the 71% subject bias for null pronouns reported in @cite{kwon-lee-2026} Exp 3, not its exact magnitude.
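The three steps can be composed in a hypothetical, self-contained sketch, in which each `example` checks one step. The sketch makes the following assumptions:

- An enum stands in for `String` entities.
- The null subject of (b) is flagged as pronominal, per the note on (b).
- The three Korean forms are taken to be null, overt pronoun and full NP.
- `Form.accessibility` is an illustrative score that only encodes the order null > overt pronoun > full NP. It is not `Ariel2001.AccessibilityLevel.rank` itself.

```lean
namespace KoreanSketch

/-- Hypothetical enum standing in for `String` entities. -/
inductive Entity | mary | tom | sea
  deriving DecidableEq, Repr

/-- Roles in Cf-rank order: subject > object > other. -/
inductive Role | subj | obj | other
  deriving DecidableEq, Repr

/-- `pron = true` also covers the null pronoun (pro). -/
structure Mention where
  ent : Entity
  role : Role
  pron : Bool

abbrev Utterance := List Mention

def cf (u : Utterance) : List Entity :=
  let byRole (r : Role) := (u.filter fun m => m.role == r).map (·.ent)
  byRole .subj ++ byRole .obj ++ byRole .other

def cb (prev cur : Utterance) : Option Entity :=
  (cf prev).find? fun e => cur.any fun m => m.ent == e

/-- Rule 1: if any element of Cf(prev) is a pronoun in cur, then so is Cb(cur). -/
def rule1 (prev cur : Utterance) : Bool :=
  let isPron (e : Entity) := cur.any fun m => m.ent == e && m.pron
  match cb prev cur with
  | none => true
  | some c => !((cf prev).any isPron) || isPron c

/-- Assumed Korean referential forms. -/
inductive Form | null | overtPronoun | fullNP
  deriving DecidableEq, Repr

/-- Illustrative accessibility score (higher = more accessible). -/
def Form.accessibility : Form → Nat
  | .null => 2 | .overtPronoun => 1 | .fullNP => 0

/-- The most accessible form on offer (the earlier form wins ties). -/
def mostAccessible : List Form → Option Form
  | [] => none
  | f :: fs =>
    match mostAccessible fs with
    | some g => if g.accessibility > f.accessibility then some g else some f
    | none => some f

-- (a) Mary often took Tom to the sea.  (b) [pro] achieved the dream ...
def ka : Utterance := [⟨.mary, .subj, false⟩, ⟨.tom, .obj, false⟩, ⟨.sea, .other, false⟩]
def kb : Utterance := [⟨.mary, .subj, true⟩]

-- Step 1: the Cb of (b) is the subject (top of Cf) of (a).
example : cb ka kb = (cf ka).head? := by decide
-- Step 2: a pronominal (null) Cb satisfies Rule 1.
example : rule1 ka kb = true := by decide
-- Step 3: the null form tops the assumed Korean accessibility order.
example : mostAccessible [.fullNP, .overtPronoun, .null] = some Form.null := by decide

end KoreanSketch
```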

Centering's Cb (the "currently centered" entity) corresponds to a high-accessibility referent on @cite{ariel-2001}'s scale. Rule 1 predicts that the Cb's realization should use a high-accessibility marker, typically a pronoun.

This section mechanizes the Sidner comparison that the paper makes in its own §9 (p. 222), on the discourse:

(34) a. I haven't seen Jeff for several days.
     b. Carl thinks he's studying for his exams,
     c. but I think he went to the Cape with Linda.

GJW summarize Sidner's prediction: "On Sidner's account, Carl is
the actor focus after (34b) and Jeff is the discourse focus.
Because the actor focus is preferred as the referent of pronominal
expressions, Carl is the leading candidate for the entity referred
to by *he* in (34c)." Then: "On our account, Jeff is the C_b at
(34b) and there is no problem."

Both theories must commit to a referent for *he* in (34c). The
formalization picks the one that is **coherence-preferred** under
each theory:

- Sidner: agent-position pronoun → actor focus (Carl). See
  `Sidner1983.resolvePronoun` and the focus-state computation in
  `Phenomena/Reference/Studies/Sidner1983.lean`.
- GJW: pick the resolution that yields the higher-ranked Rule-2
  transition. With "he" = Jeff, the Cb is preserved (Jeff → Jeff)
  but the matrix subject "I" becomes the new Cp, so this is a
  RETAIN. With "he" = Carl, the Cb shifts (Jeff → Carl), so this
  is a SHIFT. RETAIN outranks SHIFT under Rule 2, so GJW predict
  Jeff.

(34a) I haven't seen Jeff for several days.

(34b) Carl thinks he's studying for his exams. The matrix subject is Carl (full name); the embedded subject "he" co-specifies Jeff (continuing from 34a).

(34c) under the resolution "he" = Jeff: the Cb is preserved (a RETAIN, since "I" is the Cp).

(34c) under the resolution "he" = Carl: shifts the Cb.

The Cb of (34b) given (34a) is Jeff, per the paper's own claim ("Jeff is the C_b at (34b)", §9 p. 222).

Under the Jeff-resolution of (34c), the Cb remains Jeff.

Under the Carl-resolution of (34c), the Cb shifts to Carl.

GJW prediction for (34c): by Rule 2 (RETAIN outranks SHIFT here), the Jeff-resolution is preferred. The function returns Option String. It returns none if the Rule-2 ranks of the two candidate transitions coincide, so that the framework cannot adjudicate; otherwise it returns some "Jeff" or some "Carl".

Caveat about overclaim. GJW themselves do not commit to a unique referent in §9 (p. 222); they only say "Jeff is the C_b at (34b) and there is no problem." Rule 2 in their paper is a constraint on speaker production, not an interpreter's resolution algorithm. This function operationalizes "GJW's prediction" as "the resolution Rule 2 would prefer if a speaker had to choose between the two transitions". In that respect it is closer to the Brennan-Friedman-Pollard 1987 resolution algorithm than to GJW 1995 as published. The headline disagreement theorem is honest about this gap; see its docstring.
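The Rule-2 preference for (34c) can be sketched as follows. This is a hypothetical reconstruction, not the file's function, and it differs in several ways:

- It returns `Option Entity` rather than `Option String`.
- It gives the embedded subject *he* of (34b) role `other`, so that it ranks below the matrix subject Carl.
- It treats *I* as a pronominal `speaker` entity.

It also inherits the caveat above: it operationalizes Rule 2 as a resolution preference, which GJW do not themselves claim.

```lean
namespace Discourse34Sketch

/-- Hypothetical enum standing in for `String` entities; `speaker` is "I". -/
inductive Entity | speaker | jeff | carl | linda
  deriving DecidableEq, Repr

/-- Roles in Cf-rank order: subject > object > other. -/
inductive Role | subj | obj | other
  deriving DecidableEq, Repr

structure Mention where
  ent : Entity
  role : Role
  pron : Bool

abbrev Utterance := List Mention

def cf (u : Utterance) : List Entity :=
  let byRole (r : Role) := (u.filter fun m => m.role == r).map (·.ent)
  byRole .subj ++ byRole .obj ++ byRole .other

def cb (prev cur : Utterance) : Option Entity :=
  (cf prev).find? fun e => cur.any fun m => m.ent == e

inductive Transition | continuation | retaining | shifting
  deriving DecidableEq, Repr

def Transition.rank : Transition → Nat
  | .continuation => 2 | .retaining => 1 | .shifting => 0

def classify (prevCb : Option Entity) (prev cur : Utterance) : Option Transition :=
  match cb prev cur with
  | none => none
  | some c =>
    let kept := prevCb.isNone || prevCb == some c
    if kept && (cf cur).head? == some c then some .continuation
    else if kept then some .retaining
    else some .shifting

def u34a : Utterance := [⟨.speaker, .subj, true⟩, ⟨.jeff, .obj, false⟩]
-- The embedded subject "he" (= Jeff) gets role `other`, below the matrix subject Carl.
def u34b : Utterance := [⟨.carl, .subj, false⟩, ⟨.jeff, .other, true⟩]
/-- (34c) with "he" resolved to the given entity. -/
def u34c (he : Entity) : Utterance :=
  [⟨.speaker, .subj, true⟩, ⟨he, .other, true⟩, ⟨.linda, .other, false⟩]

/-- Rule-2 preference between two resolutions of "he"; `none` on a tie. -/
def rule2Prefer (a b : Entity) : Option Entity :=
  let prevCb := cb u34a u34b
  match classify prevCb u34b (u34c a), classify prevCb u34b (u34c b) with
  | some ta, some tb =>
    if ta.rank > tb.rank then some a
    else if tb.rank > ta.rank then some b
    else none
  | _, _ => none

example : cb u34a u34b = some Entity.jeff := by decide
example : classify (some Entity.jeff) u34b (u34c .jeff) = some Transition.retaining := by decide
example : classify (some Entity.jeff) u34b (u34c .carl) = some Transition.shifting := by decide
example : rule2Prefer .jeff .carl = some Entity.jeff := by decide

end Discourse34Sketch
```

Returning `none` on a tie keeps the function honest: when the two transitions rank equally, Rule 2 gives no reason to prefer either resolution.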

The disagreement on what "he" in (34c) refers to.

Sidner's prediction (§5.2.6 step 3): agent-position pronoun → actor focus = Carl.

GJW's Rule-2 preference (with the caveat above that GJW themselves don't claim uniqueness): RETAIN > SHIFT under Rule 2 ⇒ Jeff.

Stated constructively (mathlib idiom): each side commits to a named prediction, and the inequality then follows transparently by `decide` from the witnesses.
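The constructive statement pattern can be sketched in isolation. In this hypothetical sketch, the two predictions are written down directly as named definitions; in a fuller version, the GJW side would be computed from the transitions. The disagreement theorem is then closed by `decide`. All names are illustrative.

```lean
namespace DisagreementSketch

/-- Hypothetical enum for the two candidate referents of "he" in (34c). -/
inductive Entity | jeff | carl
  deriving DecidableEq, Repr

/-- Sidner: an agent-position pronoun resolves to the actor focus, Carl. -/
def sidnerPrediction : Entity := .carl

/-- GJW, read through the Rule-2 operationalization (`none` would mean a tie). -/
def gjwPrediction : Option Entity := some .jeff

/-- Each side commits to a named witness; the disagreement is then decidable. -/
theorem predictions_disagree : gjwPrediction ≠ some sidnerPrediction := by decide

end DisagreementSketch
```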