Documentation

Linglib.Theories.Discourse.Centering.Rule2

Centering Theory — Rule 2 (Transition Preference) and Variants #

@cite{grosz-joshi-weinstein-1995} @cite{brennan-friedman-pollard-1987} @cite{strube-1998} @cite{poesio-stevenson-eugenio-hitzeman-2004}

@cite{poesio-stevenson-eugenio-hitzeman-2004} §2.3.3 (p. 315-316) distinguish three formulations of Rule 2:

Following the same "variants as separate Props/Defs with implication theorems" mathlib pattern as Rule1.lean. Per linguistics- domain-expert audit, the cheap/expensive distinction is Strube 1998 solo (COLING-ACL '98, pp. 1251-1257), not Strube & Hahn 1999 — Hahn co-authored the related information-status work (Instances/InformationStatus.lean) but the cheap/expensive Rule 2 reformulation is Strube alone.

PSDH evaluate all three variants on their corpus. Headline finding: Rule 2 (BFP) is verified at the .01 level only with the Information- Status instantiation and STRUBE-HAHN ranking (Table 15, §5.1.1) — the first parameter setting where all three centering claims (Strong C1, Rule 1, Rule 2) pass simultaneously. Rule 2 (Strube cheap) is not verified by any instantiation PSDH test (cheap-cheap sequences average ~50 vs expensive-expensive ~250–650 across instantiations).

This file extracts the Rule 2 content from the older Rules.lean. With Rule1.lean (sibling file) it dissolves Rules.lean's monolithic structure.

@cite{grosz-joshi-weinstein-1995} Rule 2 (the "sequences" formulation per @cite{poesio-stevenson-eugenio-hitzeman-2004} §2.3.3): a pair of transitions (t₁, t₂) is preferred over a pair (s₁, s₂) when its constituent transitions have higher rank. Implemented as sum-of-ranks — the discriminating measure consistent with the paper's stated CONT-CONT > RET-RET > SHIFT-SHIFT case (min is a coarser alternative).

Per PSDH §4.1.2 (Table 4), this version is "roughly verified" in their corpus but with too few sequences to support strong claims: only ~28% of sequence pairs involve two of the canonical three transitions (CON/RET/SHIFT); the most frequent pair-types are NULL (~48%) and Establishment (~19%), neither of which the GJW 95 formulation classifies.

Equations
Instances For
    def Discourse.Centering.isCheap {E R : Type} [DecidableEq E] [CfRankerOf E R] {U : Type} [Realizes U E] (prev : Utterance E R) (cur : U) (prevCp : Option E) :

    @cite{strube-1998} cheap-vs-expensive distinction per @cite{poesio-stevenson-eugenio-hitzeman-2004} §2.3.3 (p. 316):

    A transition pair is cheap if the CB of the current utterance is correctly predicted by the CP of the previous utterance, that is, if CB(U_n) = CP(U_{n-1}). A transition pair is expensive if CB(U_n) ≠ CP(U_{n-1}).

    Rule 2 (Strube 1998) prefers cheap transitions over expensive ones. The intuition: when the speaker's previous utterance had e as its preferred (highest-ranked) center, processing is cheaper if e is also the next utterance's CB — the hearer's expectation is confirmed.

    Implementation note: takes prevCp : Option E as an explicit parameter (rather than recomputing cp from a third utterance), consistent with how Rule1GJW83 and classifyTransitionExtended take their "previous" arguments. The "previous CP" is whatever the prior cp prev_prev returned.

    Equations
    Instances For
      @[implicit_reducible]
      instance Discourse.Centering.isCheap.decidable {E R : Type} [DecidableEq E] [CfRankerOf E R] {U : Type} [Realizes U E] (prev : Utterance E R) (cur : U) (prevCp : Option E) :
      Decidable (isCheap prev cur prevCp)
      Equations
      def Discourse.Centering.Rule2Strube1998Preferred {E R : Type} [DecidableEq E] [CfRankerOf E R] {U : Type} [Realizes U E] (prev : Utterance E R) (cur : U) (prevCp : Option E) :

      @cite{strube-1998} Rule 2 preference: prefer cheap transitions over expensive ones. PSDH find this version is not verified by any of the parameter instantiations they test (every instantiation produces more expensive than cheap transitions — cheap-cheap sequence pairs average ~50 vs expensive-expensive ~250–650, Table 5 + §5.2.3).

      Equations
      Instances For

        Rule 2 (BFP 87 single transitions)Rule2BFP87Single — requires the 4-way BFPTransition enum (CON | RET | SSH | RSH where Smooth Shift splits from Rough Shift on whether CB(U_n) = CP(U_n)). Per the audit's "extract on second consumer" discipline, the 4-way enum lives in Phenomena/Reference/Studies/PoesioEtAl2004.lean as a private structure until a second consumer (Walker 1989, Brennan 1995, or other BFP-style algorithm) motivates promotion to substrate.

        Until then, the BFP 87 ranking can be encoded as a Nat-valued rank
        on the 4-way enum locally in study files. The substrate's existing
        `Transition.rank` (CON > RET > SHIFT) collapses BFP's CON/RET/SSH/RSH
        to CON/RET/SHIFT-or-SHIFT — a coarser but compatible ranking.