@cite{grosz-joshi-weinstein-1995}: Centering Theory
@cite{kameyama-1986} @cite{gordon-grosz-gilliom-1993} @cite{kehler-rohde-2013} @cite{sidner-1983}
Centering: A Framework for Modeling the Local Coherence of Discourse. Computational Linguistics 21(2): 203–225.
Each utterance has a set of forward-looking centers Cf (ranked by
grammatical role: subject > object > other) and at most one
backward-looking center Cb (the highest-ranked Cf element of the
previous utterance that is also realized in the current one). Three
transition types — continuation, retaining, shifting — classify
adjacent-utterance pairs by whether the Cb is preserved and whether
that Cb is the most-highly-ranked Cf.
Two normative rules govern coherent discourse: Rule 1 (pronominalization constraint — if any Cf element is pronominalized in the next utterance, the Cb must be); Rule 2 (transition preference — continuations preferred over retentions, retentions over shifts).
The key empirical contrast is between Discourses 1 and 2 (§ 4 below): same propositional content, different transition pattern, different perceived coherence. The framework predicts the difference.
This file consumes the substrate types and operators from
Theories/Discourse/Centering/{Defs,Basic,Transition,Rule1,Rule2}.lean
plus the GrammaticalRole Cf-ranker instance from
Instances/GrammaticalRole.lean. Per linglib's import-don't-restipulate
discipline, no Centering primitives are redefined here — the file's
contribution is the empirical-example anchor for the substrate plus
the §8 comparison with @cite{sidner-1983}.
Throughout, examples use String entities for readability and
Utterance String GrammaticalRole from the substrate.
Utterance abbreviation specialized to the GJW use case
(String entities, grammatical-role-ranked Cf).
(1a) John went to his favorite music store to buy a piano.
(1b) He had frequented the store for many years.
(1c) He was excited that he could finally buy a piano.
(1d) He arrived just as the store was closing for the day.
(2a) John went to his favorite music store to buy a piano.
(2b) It was a store John had frequented for many years.
(2c) He was excited that he could finally buy a piano.
(2d) It was closing just as John arrived.
Per-pair transition predictions
For each adjacent pair, the Cb (computed from the prior utterance)
and the transition type follow from the substrate definitions.
Discourse 1 (a→b): John continues as Cb.
Discourse 1 (b→c): John continues.
Discourse 1 (c→d): John continues.
Discourse 2 (a→b): the Cb is John (the highest-ranked entity in Cf(D2.a)
that is realized in D2.b), but Cp(D2.b) = "store" (not John),
so this is a RETAIN, already a less coherent transition than
Discourse 1's CONTINUE.
Discourse 2 (b→c): John stays the Cb (he is ranked as object in D2.b,
below "store", but "store" is not mentioned in D2.c); in D2.c John is
also the Cp, so this is a CONTINUE. Rule 1 is also satisfied here.
Discourse 2 (c→d): John remains Cb (was subject in D2.c, object
in D2.d), but Cp(D2.d) = "store". RETAIN, not continuation.
The coherence contrast (paper §2): Discourse 1's transition
pattern is all continuations; Discourse 2's pattern is
retain, continue, retain — a mix of weaker transitions. The sum
of Transition.ranks is a coarse but theory-aligned coherence
measure.
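The per-pair predictions above can be reproduced with a minimal, self-contained Lean 4 sketch. It does not use the linglib substrate: `Role`, `Realization`, `cf`, `cb`, `cp`, `classify`, `step`, `rankSum`, and the namespace are hypothetical stand-ins, and the role assignments for each utterance are an illustrative encoding. The numeric ranks (CONTINUE = 0, RETAIN = 1, SHIFT = 2) are an assumption about what a rank function might look like, not the actual `Transition.rank` values. The sketch also assumes that an undefined previous Cb is compatible with any current Cb, which is what makes (a→b) of Discourse 1 a continuation.

```lean
namespace CenteringSketch.Transitions  -- hypothetical namespace

inductive Role where
  | subject | object | other
  deriving DecidableEq, Repr

structure Realization where
  entity : String
  role : Role
  deriving Repr

/-- Entities of `u` realized with role `r`, in textual order. -/
def pick (r : Role) (u : List Realization) : List String :=
  (u.filter (fun x => x.role == r)).map (fun x => x.entity)

/-- Cf as a ranked list: subject > object > other. -/
def cf (u : List Realization) : List String :=
  pick .subject u ++ pick .object u ++ pick .other u

/-- Cb: highest-ranked element of Cf(prev) that is realized in `cur`. -/
def cb (prev cur : List Realization) : Option String :=
  (cf prev).find? (fun e => cur.any (fun x => x.entity == e))

/-- Cp: highest-ranked element of Cf. -/
def cp (u : List Realization) : Option String := (cf u).head?

inductive Transition where
  | cont | retain | shift
  deriving DecidableEq, Repr

/-- Assumed ranks (lower is more coherent). -/
def Transition.rank : Transition → Nat
  | .cont => 0
  | .retain => 1
  | .shift => 2

/-- An undefined previous Cb is treated as compatible (assumption). -/
def classify (cbPrev cbCur cpCur : Option String) : Transition :=
  if cbPrev.isSome && cbPrev != cbCur then .shift
  else if cbCur == cpCur then .cont
  else .retain

def step (cbPrev : Option String) (prev cur : List Realization) : Transition :=
  classify cbPrev (cb prev cur) (cp cur)

def rankSum (ts : List Transition) : Nat :=
  (ts.map Transition.rank).foldl (· + ·) 0

-- Illustrative encodings; (1a) and (2a) are the same sentence.
def d1a : List Realization := [⟨"John", .subject⟩, ⟨"store", .other⟩, ⟨"piano", .other⟩]
def d1b : List Realization := [⟨"John", .subject⟩, ⟨"store", .object⟩]
def d1c : List Realization := [⟨"John", .subject⟩, ⟨"piano", .object⟩]
def d1d : List Realization := [⟨"John", .subject⟩, ⟨"store", .other⟩]
def d2b : List Realization := [⟨"store", .subject⟩, ⟨"John", .object⟩]
def d2d : List Realization := [⟨"store", .subject⟩, ⟨"John", .object⟩]

def seqD1 : List Transition :=
  [step none d1a d1b, step (cb d1a d1b) d1b d1c, step (cb d1b d1c) d1c d1d]

def seqD2 : List Transition :=
  [step none d1a d2b, step (cb d1a d2b) d2b d1c, step (cb d2b d1c) d1c d2d]

#eval seqD1          -- cont, cont, cont
#eval seqD2          -- retain, cont, retain
#eval rankSum seqD1  -- 0
#eval rankSum seqD2  -- 2

end CenteringSketch.Transitions
```

Under these assumptions the rank sum separates the two discourses (0 against 2) even though both realize the same entities, which is the sense in which the sum is a coarse coherence measure.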
(15a) He has been acting quite odd. (Cb = John, presumed in segment.)
Equations
- GroszJoshiWeinstein1995.D15.a = { realizations := [{ entity := "John", role := Discourse.Centering.GrammaticalRole.subject, isPronoun := true }] }
(15b) He called up Mike yesterday. (Cb = John, "He" = John.)
(15c) John wanted to meet him urgently. (Cb = John, "him" = Mike.) The Cf member Mike is pronominalized but the Cb John is not — a Rule 1 violation.
Rule 1 violation: in (15c), Mike is pronominalized but the Cb John is realized as a proper name. This is the paper's diagnosis.
The (a→b) pair of the same discourse satisfies Rule 1 (the only Cf element from (a) is John, who is pronominalized in (b) — the Cb is also pronominalized). The violation is local to the (b→c) step.
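The Rule 1 check on Discourse 15 can be sketched in a few lines of standalone Lean 4. This is an illustration under stated assumptions rather than the substrate's definition: `Role`, `pick`, `cf`, `cb`, `pronominalized`, `cbPronominalized`, `rule1`, and the namespace are hypothetical names, and only the `entity`, `role`, and `isPronoun` fields mirror the rendered equation for `D15.a`.

```lean
namespace CenteringSketch.Rule1  -- hypothetical namespace

inductive Role where
  | subject | object | other
  deriving DecidableEq, Repr

structure Realization where
  entity : String
  role : Role
  isPronoun : Bool
  deriving Repr

def pick (r : Role) (u : List Realization) : List String :=
  (u.filter (fun x => x.role == r)).map (fun x => x.entity)

/-- Cf as a ranked list: subject > object > other. -/
def cf (u : List Realization) : List String :=
  pick .subject u ++ pick .object u ++ pick .other u

/-- Cb: highest-ranked element of Cf(prev) that is realized in `cur`. -/
def cb (prev cur : List Realization) : Option String :=
  (cf prev).find? (fun e => cur.any (fun x => x.entity == e))

/-- Is entity `e` realized by a pronoun in `u`? -/
def pronominalized (u : List Realization) (e : String) : Bool :=
  u.any (fun x => x.entity == e && x.isPronoun)

def cbPronominalized (prev cur : List Realization) : Bool :=
  match cb prev cur with
  | some e => pronominalized cur e
  | none => false

/-- Rule 1: if any element of Cf(prev) is a pronoun in `cur`, the Cb must be too. -/
def rule1 (prev cur : List Realization) : Bool :=
  !((cf prev).any (pronominalized cur)) || cbPronominalized prev cur

def d15a : List Realization := [⟨"John", .subject, true⟩]
def d15b : List Realization := [⟨"John", .subject, true⟩, ⟨"Mike", .object, false⟩]
def d15c : List Realization := [⟨"John", .subject, false⟩, ⟨"Mike", .object, true⟩]

#eval rule1 d15a d15b  -- true
#eval rule1 d15b d15c  -- false

end CenteringSketch.Rule1
```

The second check fails because "him" (Mike) triggers the antecedent of Rule 1 while the Cb, John, appears as a proper name.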
@cite{grosz-joshi-weinstein-1995} §5 examples (7)-(10). All four variants share utterances (a) and (b); they differ only in (c)'s realization choices. Variants (7) and (8) satisfy Rule 1; variants (9) and (10) violate it. The paper notes (10) is "completely unacceptable" and (9) is also degraded — both more so than (7) and (8) — because of the Rule 1 violations.
Shared:
> a. Susan gave Betsy a pet hamster.
> b. She reminded her that such hamsters were quite shy.
(a) Susan gave Betsy a pet hamster. Cf = [Susan, Betsy, hamster].
(b) She reminded her ... "She" = Susan (subj), "her" = Betsy (obj).
(7c) She asked Betsy whether she liked the gift. — "She" = Susan, Betsy as proper name (object). Cb = Susan, Cp = Susan ⇒ CONTINUE. Susan is pronominalized; Rule 1 satisfied.
(8c) Betsy told her that she really liked the gift. — Betsy as proper name (subject), "her" = Susan. Cb = Susan (highest in Cf(b) realized in c), but Cp = Betsy ⇒ RETAIN. Susan as Cb pronominalized via "her"; Rule 1 satisfied.
(9c) Susan asked her whether she liked the gift. — Susan as proper name (subject), "her" = Betsy. Cb = Susan, Cp = Susan ⇒ would be CONTINUE, but Betsy is pronominalized while Cb (Susan) is a proper name ⇒ Rule 1 VIOLATION.
(10c) She told Susan that she really liked the gift. — "She" = Betsy (subj), Susan as proper name (obj). Cb = Susan (highest in Cf(b) realized in c). Cp = Betsy ⇒ RETAIN. Betsy is pronominalized while Cb (Susan) is a proper name ⇒ Rule 1 VIOLATION.
Variant 7 satisfies Rule 1 (Susan as Cb pronominalized).
Variant 8 satisfies Rule 1 (Susan as Cb pronominalized via "her").
Variant 9 violates Rule 1: Betsy is pronominalized but Cb (Susan) is realized as a proper name.
Variant 10 violates Rule 1: Betsy is pronominalized but Cb (Susan) is realized as a proper name. The paper calls this case "completely unacceptable".
The Rule-1 split (variants 7 and 8 satisfy it; 9 and 10 violate it) tracks the paper's
acceptability ordering: variants 7 and 8 are acceptable, 9 and 10
are degraded. The framework predicts this directly from Rule 1
plus the subject>object Cf ranking.
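The four variants can be cross-tabulated by transition type and Rule 1 status in a standalone Lean 4 sketch. All names here (`Role`, `cf`, `cb`, `cp`, `classify`, `rule1`, `report`, the namespace) are hypothetical and independent of the substrate, and the encoding keeps only the two participants in (b) and (c), which is a simplifying assumption.

```lean
namespace CenteringSketch.Variants  -- hypothetical namespace

inductive Role where
  | subject | object | other
  deriving DecidableEq, Repr

structure Realization where
  entity : String
  role : Role
  isPronoun : Bool
  deriving Repr

inductive Transition where
  | cont | retain | shift
  deriving DecidableEq, Repr

def pick (r : Role) (u : List Realization) : List String :=
  (u.filter (fun x => x.role == r)).map (fun x => x.entity)

def cf (u : List Realization) : List String :=
  pick .subject u ++ pick .object u ++ pick .other u

def cb (prev cur : List Realization) : Option String :=
  (cf prev).find? (fun e => cur.any (fun x => x.entity == e))

def cp (u : List Realization) : Option String := (cf u).head?

def classify (cbPrev cbCur cpCur : Option String) : Transition :=
  if cbPrev.isSome && cbPrev != cbCur then .shift
  else if cbCur == cpCur then .cont
  else .retain

def pronominalized (u : List Realization) (e : String) : Bool :=
  u.any (fun x => x.entity == e && x.isPronoun)

def cbPronominalized (prev cur : List Realization) : Bool :=
  match cb prev cur with
  | some e => pronominalized cur e
  | none => false

def rule1 (prev cur : List Realization) : Bool :=
  !((cf prev).any (pronominalized cur)) || cbPronominalized prev cur

def uA : List Realization :=
  [⟨"Susan", .subject, false⟩, ⟨"Betsy", .object, false⟩, ⟨"hamster", .other, false⟩]
def uB : List Realization := [⟨"Susan", .subject, true⟩, ⟨"Betsy", .object, true⟩]

def c7  : List Realization := [⟨"Susan", .subject, true⟩,  ⟨"Betsy", .object, false⟩]
def c8  : List Realization := [⟨"Betsy", .subject, false⟩, ⟨"Susan", .object, true⟩]
def c9  : List Realization := [⟨"Susan", .subject, false⟩, ⟨"Betsy", .object, true⟩]
def c10 : List Realization := [⟨"Betsy", .subject, true⟩,  ⟨"Susan", .object, false⟩]

/-- Transition for (b→c) paired with the Rule 1 verdict for the same step. -/
def report (c : List Realization) : Transition × Bool :=
  (classify (cb uA uB) (cb uB c) (cp c), rule1 uB c)

#eval report c7   -- cont,   Rule 1 holds
#eval report c8   -- retain, Rule 1 holds
#eval report c9   -- cont,   Rule 1 violated
#eval report c10  -- retain, Rule 1 violated

end CenteringSketch.Variants
```

In this encoding the transition type does not line up with acceptability (9 is a CONTINUE yet degraded), whereas the Rule 1 verdict does, which matches the split described above.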
(20a) John has been having a lot of trouble arranging his vacation.
Equations
- GroszJoshiWeinstein1995.D20.a = { realizations := [{ entity := "John", role := Discourse.Centering.GrammaticalRole.subject, isPronoun := false }] }
(20b) He cannot find anyone to take over his responsibilities.
Equations
- GroszJoshiWeinstein1995.D20.b = { realizations := [{ entity := "John", role := Discourse.Centering.GrammaticalRole.subject, isPronoun := true }] }
(20c) He called up Mike yesterday.
(20d) Mike has annoyed him a lot recently.
(20e) He called John at 5 AM on Friday last week.
The paper-stipulated transition labels for b→c, c→d, and d→e are CONTINUE, RETAIN, and SHIFT, respectively.
Rule 1 holds throughout Discourse 20 (the paper's claim).
Centering's "highest-ranked Cf element" — the Cp (preferred
center) — corresponds to @cite{kehler-rohde-2013}'s topichood
side of the Bayesian decomposition: the production component
P(pronoun | referent) is conditioned on whether the referent is
the topic. The Cp of an active-clause subject is precisely the
default_ topichood level in @cite{kehler-rohde-2013}'s scheme.
@cite{kwon-lee-2026}'s Korean Exp 3 finding (null pronouns strongly prefer subject antecedents) is predicted by Centering Theory:
1. Subject is the highest-ranked Cf element (@cite{kameyama-1986}).
2. The subject of `U_n` typically becomes the Cb of `U_{n+1}`.
3. By Rule 1, the Cb is preferentially realized by a pronoun.
4. In Korean, the highest-accessibility marker (most preferred
pronominal form) is the *null* pronoun (@cite{ariel-2001}).
Composing: subject → Cb → pronoun → null in Korean.
The derivation is anchored on a concrete two-utterance Korean
continuation pattern: utterance (a) introduces a subject-marked
referent; utterance (b) refers back to it with a null pronoun.
(a) Mary often took Tom to the sea. — adapted from @cite{kwon-lee-2026} Exp 3 stimulus pattern.
(b) [pro] achieved the dream of becoming a sea navigator. Korean null subject; the realization is pronominal (the empty category counts as pronominal in the Centering sense).
Equations
- GroszJoshiWeinstein1995.KoreanContinuation.utt_b_null = { realizations := [{ entity := "Mary", role := Discourse.Centering.GrammaticalRole.subject, isPronoun := true }] }
Step 1 of the derivation: in canonical Korean SOV, the Cb of the second utterance is the prior-utterance subject.
Step 2: realizing the Cb pronominally satisfies Rule 1 (the null subject in Korean counts as pronominal).
Step 3: among Korean's three referential forms, the null pronoun is
the most accessible (top of the Korean-form linear order, derived
from Ariel2001.AccessibilityLevel.rank in KwonLee2026).
Centering predicts Korean's null-subject preference by combining Rule 1 with Korean's accessibility-scale calibration. The 71% empirical subject bias for null pronouns (@cite{kwon-lee-2026} Exp 3) is the predicted consequence of this composition.
Centering's Cb (the "currently centered" entity) corresponds to a high-accessibility referent on @cite{ariel-2001}'s scale. Rule 1 predicts that the Cb's realization should use a high-accessibility marker — typically a pronoun.
This section mechanizes the Sidner-comparison the paper makes in its own §9 (p. 222), on the discourse:
> (34) a. I haven't seen Jeff for several days.
>      b. Carl thinks he's studying for his exams,
>      c. but I think he went to the Cape with Linda.
GJW summarize Sidner's prediction: "On Sidner's account, Carl is
the actor focus after (34b) and Jeff is the discourse focus.
Because the actor focus is preferred as the referent of pronominal
expressions, Carl is the leading candidate for the entity referred
to by *he* in (34c)." Then: "On our account, Jeff is the C_b at
(34b) and there is no problem."
Both theories must commit to a referent for *he* in (34c). The
formalization picks the one that is **coherence-preferred** under
each theory:
- Sidner: agent-position pronoun → actor focus (Carl). See
`Sidner1983.resolvePronoun` and the focus-state computation in
`Phenomena/Reference/Studies/Sidner1983.lean`.
- GJW: pick the resolution that yields the higher-ranked Rule-2
transition. With "he" = Jeff the Cb is preserved (Jeff → Jeff)
but the matrix subject "I" becomes the new Cp, so this is a
RETAIN. With "he" = Carl, the Cb shifts (Jeff → Carl), so this
is a SHIFT. RETAIN outranks SHIFT under Rule 2, so GJW predict
Jeff.
(34a) I haven't seen Jeff for several days.
(34b) Carl thinks he's studying for his exams. The matrix subject is Carl (full name); the embedded subject "he" co-specifies Jeff (continuing from 34a).
(34c) under the resolution "he" = Jeff: the Cb stays Jeff.
(34c) under the resolution "he" = Carl: shifts the Cb.
Cb of (34b) given (34a) is Jeff — paper's own claim ("Jeff is the C_b at (34b)", §9 p. 222).
Under the Jeff-resolution of (34c), the Cb remains Jeff.
Under the Carl-resolution of (34c), the Cb shifts to Carl.
Jeff-resolution: Cb stable (Jeff → Jeff), but the matrix "I" becomes the new Cp, so this is a RETAIN, not a CONTINUE.
Carl-resolution: Cb shifts from Jeff to Carl.
GJW prediction for (34c): by Rule 2 (RETAIN outranks SHIFT
here), the Jeff-resolution is preferred. Returns Option String:
none if the candidate Rule-2 ranks coincide and the framework
cannot adjudicate; otherwise some "Jeff" or some "Carl".
Caveat about overclaim. GJW themselves do not commit to a unique referent in §9 p. 222 — they only say "Jeff is the C_b at (34b) and there is no problem." Rule 2 in their paper is a constraint over speaker production, not an interpreter resolution algorithm. This function operationalizes "GJW's prediction" as "the resolution Rule 2 would prefer if a speaker had to choose between the two transitions"; it is closer to the Brennan-Friedman-Pollard 1987 resolution algorithm than to GJW 1995 as published. The headline disagreement theorem is honest about this gap — see its docstring.
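The Rule-2 comparison described here can be sketched as standalone Lean 4, subject to the same caveat: it operationalizes a preference between two candidate transitions and is not a resolution algorithm from the paper. `prefer`, `rankOf`, `Transition.rank`, the numeric ranks, the role assignments for (34a)-(34c), and the namespace are all hypothetical and do not reproduce the definitions in this file.

```lean
namespace CenteringSketch.Resolve  -- hypothetical namespace

inductive Role where
  | subject | object | other
  deriving DecidableEq, Repr

structure Realization where
  entity : String
  role : Role
  deriving Repr

inductive Transition where
  | cont | retain | shift
  deriving DecidableEq, Repr

/-- Assumed ranks (lower is preferred under Rule 2). -/
def Transition.rank : Transition → Nat
  | .cont => 0
  | .retain => 1
  | .shift => 2

def pick (r : Role) (u : List Realization) : List String :=
  (u.filter (fun x => x.role == r)).map (fun x => x.entity)

def cf (u : List Realization) : List String :=
  pick .subject u ++ pick .object u ++ pick .other u

def cb (prev cur : List Realization) : Option String :=
  (cf prev).find? (fun e => cur.any (fun x => x.entity == e))

def cp (u : List Realization) : Option String := (cf u).head?

def classify (cbPrev cbCur cpCur : Option String) : Transition :=
  if cbPrev.isSome && cbPrev != cbCur then .shift
  else if cbCur == cpCur then .cont
  else .retain

-- Illustrative encodings of (34a), (34b), and the two readings of (34c).
def u34a : List Realization := [⟨"I", .subject⟩, ⟨"Jeff", .object⟩]
def u34b : List Realization := [⟨"Carl", .subject⟩, ⟨"Jeff", .other⟩]
def u34cJeff : List Realization := [⟨"I", .subject⟩, ⟨"Jeff", .other⟩, ⟨"Linda", .other⟩]
def u34cCarl : List Realization := [⟨"I", .subject⟩, ⟨"Carl", .other⟩, ⟨"Linda", .other⟩]

/-- Rank of the (34b→34c) transition under a given reading of (34c). -/
def rankOf (c : List Realization) : Nat :=
  (classify (cb u34a u34b) (cb u34b c) (cp c)).rank

/-- `none` when the two candidate ranks coincide. -/
def prefer (rJeff rCarl : Nat) : Option String :=
  if rJeff < rCarl then some "Jeff"
  else if rCarl < rJeff then some "Carl"
  else none

#eval rankOf u34cJeff                           -- 1 (RETAIN)
#eval rankOf u34cCarl                           -- 2 (SHIFT)
#eval prefer (rankOf u34cJeff) (rankOf u34cCarl)  -- some "Jeff"

end CenteringSketch.Resolve
```

Returning `Option String` keeps the tie case explicit, so the sketch never claims a preference when Rule 2 cannot separate the two readings.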
The disagreement on what "he" in (34c) refers to.
Sidner's prediction (§5.2.6 step 3): agent-position pronoun → actor focus = Carl.
GJW's Rule-2 preference (with the caveat above that GJW themselves don't claim uniqueness): RETAIN > SHIFT under Rule 2 ⇒ Jeff.
Stated constructively (mathlib idiom): each side commits to a
named prediction; the inequality follows by transparent
decide from the witnesses.