Documentation

Linglib.Phenomena.Reference.Studies.Beaver2004

@cite{beaver-2004}: The Optimization of Discourse Anaphora #

David I. Beaver (2004), "The Optimization of Discourse Anaphora," Linguistics and Philosophy 27(1): 3-56. © 2004 Kluwer. DOI 10.1023/B:LING.0000010796.76522.7a.

Beaver reformulates @cite{grosz-joshi-weinstein-1995} Centering (and the @cite{brennan-friedman-pollard-1987} algorithmic version) as an Optimality Theory system over six ranked constraints (his §3.2 p. 14):

AGREE > DISJOINT > PRO-TOP > FAM-DEF > COHERE > ALIGN

His main theorem (his §3.4 (20), proven Appendix A): given a sentence in which the only definite expressions are proper nouns and pronouns, if either COT or BFP uniquely predicts an interpretation involving fully anaphoric resolution, then both do, and they resolve anaphors identically.

@cite{poesio-stevenson-eugenio-hitzeman-2004} §3.1 fn 12 endorse this OT reformulation as the right framework for "unpacking" Centering's constraints into ranked-OT terms.

Design: deep reuse of Centering primitives #

Per the audit's mathlib-style recommendation, three of Beaver's six constraints are LITERAL RESTATEMENTS of existing Centering primitives (Beaver acknowledges this in his §3.2-3.3 prose):

Beaver constraintBeaver's sourceOur definition
PRO-TOP"essentially the effect of Centering's Rule 1" (his p. 15-16)via Rule1Gordon from Centering/Rule1.lean (NOT Rule1GJW95)
COHERE"the conditions used in BFP to specify transition types" (his p. 17)via cb (topic stability test)
ALIGNsamevia cb + cp (topic in preferred-center position)
AGREEbinding theory (his p. 14)NEW (substrate-gap-flagged: Realization lacks number/gender features)
DISJOINTPrinciple B (his p. 15)NEW (substrate-gap-flagged: Utterance lacks predicate-argument structure)
FAM-DEFfamiliarity (Heim 1982; his p. 16)NEW (substrate-gap-flagged: Realization lacks definiteness)

The 3 reused constraints make Theorem (20)'s equivalence with BFP structural on those clauses (it follows from the constraints being literal restatements). The 3 novel constraints fire as user-supplied boolean flags on the candidate type — substrate gaps that future commits can fill (number/gender features; predicate-argument structure; definiteness marking).

This is the PMF-vs-Measure mathlib pattern applied: Centering and OT are sibling substrates with their own vocabularies; the bridge lives in this paper-anchored study file rather than as a substrate-level identity. Where Beaver explicitly equates his constraints with Centering primitives, we define them via those primitives (deep reuse). Where Beaver introduces independent content (AGREE/DISJOINT/ FAM-DEF), we keep them independent.

Scope #

This file mechanizes:

  1. The 6 COT constraints with the design above.
  2. Beaver's tableau examples (5b), (12c) — predicting BFP-equivalent resolutions.
  3. Theorem (20) witnesses on those examples (the COT-optimal candidate matches the BFP-survivor).
  4. Beaver §4.1's PRO-TOP demotion: a re-ranked constraint hierarchy (PRO-TOP below FAM-DEF) gives Beaver (2c) "Mary likes tennis. She plays Jim. He used to play doubles with Mary." the bound reading where BFP overpredicts. The PSDH §3.1 fn 12 critique mechanized.

Out of scope (Beaver's §5 extensions):

structure Beaver2004.Candidate (E R : Type) :

A COT candidate: a fully-resolved cur utterance plus precomputed boolean flags for the three constraints whose substrates linglib doesn't yet model (AGREE/DISJOINT/FAM-DEF).

Beaver's GEN function (his §3.1 p. 13) generates all candidate pairs of (form, meaning); the constraints filter. For our formalization, candidates are alternative Utterance E R values for the same surface form (different pronoun resolutions), with the 3 substrate-gap flags supplied by the example author.

The 3 flags correspond to constraints whose semantics requires information the existing Realization E R type doesn't carry:

  • agreementOK: pronoun-antecedent number/gender match
  • argDisjointOK: co-arguments of a predicate refer to distinct entities
  • famDefOK: every definite NP has an antecedent (familiarity)
  • The resolved utterance — pronouns bound to specific entities.

  • agreementOK : Bool

    AGREE: pronoun-antecedent agreement OK? (Substrate gap: Realization E R lacks number/gender features. Linglib/Features/{Number,Gender}.lean define the feature enums; an [HasPhi E] typeclass lift would let us derive this from the entity type. Queued.)

  • argDisjointOK : Bool

    DISJOINT: co-arguments of any predicate are distinct entities? (Substrate gap: Utterance E R lacks predicate-argument structure. Linglib/Theories/Semantics/ArgumentStructure/Defs.lean provides ThematicFrame which could carry this info; integration queued.)

  • famDefOK : Bool

    FAM-DEF: every definite NP is familiar (has antecedent in prev)? (Substrate gap: Realization E R lacks definiteness marking. Linglib/Features/Definiteness.lean defines the enum; an [HasDefiniteness E] typeclass + a [HasGivenness E] typeclass (the latter not yet defined; deferred to a follow-up PR per the post-Krifka substrate redesign) would let famDefOK be derived from Features.GivennessStatus. Queued.)

Instances For
    def Beaver2004.instReprCandidate.repr {E✝ R✝ : Type} [Repr E✝] [Repr R✝] :
    Candidate E✝ R✝Std.Format
    Equations
    • One or more equations did not get rendered due to their size.
    Instances For
      @[implicit_reducible]
      instance Beaver2004.instReprCandidate {E✝ R✝ : Type} [Repr E✝] [Repr R✝] :
      Repr (Candidate E✝ R✝)
      Equations

      AGREE (Beaver §3.2 p. 14): "Anaphoric expressions agree with their antecedents in number and gender."

      Substrate gap. Implementation tests the candidate's precomputed agreementOK flag rather than computing agreement from features on Realization.

      Equations
      Instances For

        DISJOINT (Beaver §3.2 p. 14, "mirroring Principle B from binding theory"): "Co-arguments of a predicate are disjoint."

        Substrate gap. Tests the candidate's precomputed argDisjointOK flag.

        Equations
        Instances For

          PRO-TOP (@cite{beaver-2004} §3.2, "essentially the effect of Centering's Rule 1"): "The topic is pronominalized."

          Reused from Rule1Gordon (the unconditional Rule 1 variant from Centering/Rule1.lean, after @cite{gordon-grosz-gilliom-1993}). Beaver §3.2 explicitly REMOVES the if-clause: in OT, PRO-TOP has no antecedent — the topic should be pronominalized, full stop. Defeasibility is encoded by PRO-TOP's lower ranking position (3rd of 6), not by a propositional if-clause.

          Important: an EARLIER version of this file used Rule1GJW95 (the conditional Rule 1: "if any pronoun, then CB pronominalized"). That was the WRONG reuse — Beaver §3.2 ("if there are no pronouns, then all candidate interpretations will be equally bad as far as PRO-TOP is concerned") explicitly states all interpretations VIOLATE PRO-TOP when no pronoun realizes the topic. With the conditional Rule1GJW95, all pronoun-free interpretations would satisfy PRO-TOP — the opposite of Beaver's intent. Rule1Gordon (which fires unconditionally when CB exists and isn't pronominalized) is the correct restatement.

          Equations
          Instances For

            FAM-DEF (Beaver §3.2 p. 15, "Heim 1982"): "Each definite NP is familiar."

            Substrate gap. Tests famDefOK flag.

            Equations
            Instances For
              def Beaver2004.cohere {E R : Type} [DecidableEq E] [Discourse.Centering.CfRankerOf E R] (prev : Discourse.Centering.Utterance E R) (priorTopic : Option E) :

              COHERE (@cite{beaver-2004} §3.2, p. 15 statement; commentary p. 17): "The topic of the current sentence is the topic of the previous one."

              Reused from Centering's cb: the "topic" of the current utterance is cb prev cand.utt; the prior topic (supplied as Option E). Per Beaver p. 17 ("satisfied only if the topic is defined and unchanged") plus p. 16 ("undefinedness of the topic will result in a violation"), COHERE fires when:

              • cb prev c.utt = none (current topic undefined), OR
              • both topics defined but different. Only when both are defined and equal is COHERE satisfied.
              Equations
              • One or more equations did not get rendered due to their size.
              Instances For

                ALIGN (@cite{beaver-2004} §3.2, p. 15 statement; commentary p. 18): "The topic is in subject position."

                Reused from Centering's cb and cp: ALIGN is satisfied iff the topic (cb prev c.utt) coincides with the preferred center (c.utt.cp) — i.e., the highest-ranked Cf is the topic. This IS BFP's "smooth shift / continue" test.

                Per Beaver p. 18: "ALIGN literally requires the topic to be subject, but for canonical English sentences this is equivalent to saying that the topic is the preferred center of the current sentence." Per Beaver p. 16 (general policy on undefined topic), undefined topic counts as a violation — same shape as COHERE.

                Equations
                • One or more equations did not get rendered due to their size.
                Instances For
                  def Beaver2004.cotRanking {E R : Type} [DecidableEq E] [Discourse.Centering.CfRankerOf E R] (prev : Discourse.Centering.Utterance E R) (priorTopic : Option E) :

                  The COT constraint hierarchy (Beaver §3.2 p. 14): AGREE > DISJOINT > PRO-TOP > FAM-DEF > COHERE > ALIGN.

                  Equations
                  Instances For

                    These two theorems are the substrate-level structural backing for the cross-framework finding worked out in Phenomena/Reference/Studies/PoesioEtAl2004.lean §5.1: Beaver's COT cannot distinguish two candidates whose cb agrees, and ALIGN additionally consults cp. Per-example facts in §5.1 (the decide-checked Nat comparisons on u227/u229) follow as corollaries: the cb-and-cp values for cand_u229 reduce, and the structural theorems say the constraint evaluations are determined by those reductions.

                    Mathlib analogue: a function `f` "factors through `g`" when
                    `g x = g y → f x = f y`. Here `g = cb prev` (or `g = cb prev × cp`)
                    and `f = (cohere prev priorTopic).eval` (or `(align prev).eval`).
                    The theorems are the formal statement of `Beaver.cohereCandidate
                    = ψ ∘ cb prev` for some `ψ`, surfacing the cb-only dependence
                    Beaver's substrate makes invisible. 
                    
                    theorem Beaver2004.cohere_factors_through_cb {E R : Type} [DecidableEq E] [Discourse.Centering.CfRankerOf E R] (prev : Discourse.Centering.Utterance E R) (priorTopic : Option E) (c1 c2 : Candidate E R) (h : Discourse.Centering.cb prev c1.utt = Discourse.Centering.cb prev c2.utt) :
                    (cohere prev priorTopic).eval c1 = (cohere prev priorTopic).eval c2

                    COHERE factors through cb. For fixed prev and priorTopic, Beaver's COHERE constraint cannot distinguish two candidates whose cb prev c.utt values agree. Pure unfold + rw proof: COHERE's predicate references cb prev c.utt only; mkMark turns the predicate into an if-then-else; equal cb's give equal evaluations.

                    The cross-framework consequence (worked out in PoesioEtAl2004.lean §5.1): on PSDH (10), cb u227 u229 returns a single member of the cbAll tie set; COHERE's verdict on any candidate c with cb u227 c.utt = cb u227 u229 is therefore fully determined by cb u227 u229 and priorTopic — Beaver's COT cannot recover the discarded tie member from candidate-side information.

                    theorem Beaver2004.align_factors_through_cb_and_cp {E R : Type} [DecidableEq E] [Discourse.Centering.CfRankerOf E R] (prev : Discourse.Centering.Utterance E R) (c1 c2 : Candidate E R) (h_cb : Discourse.Centering.cb prev c1.utt = Discourse.Centering.cb prev c2.utt) (h_cp : c1.utt.cp = c2.utt.cp) :
                    (align prev).eval c1 = (align prev).eval c2

                    ALIGN factors through cb AND cp. ALIGN's predicate references both cb prev c.utt and c.utt.cp; equal cb's AND equal cp's give equal evaluations.

                    The cross-framework consequence: ALIGN inherits the cb-blindness structurally (via h_cb), but additionally requires cp-equality (via h_cp). Two candidates that agree on cb but disagree on cp can produce different ALIGN verdicts; this is the formal opening by which a positional totalizer (Strube-Hahn linear order) could differ from Beaver's lex-min — Strube-Hahn changes which realization wins the cf-sort tiebreaker, hence which is cp, hence ALIGN's verdict on candidates that share cb.

                    Beaver §3.3 (12) (paper p. 23):

                    > (12) a. Jane is happy.
                    >      b. She was congratulated by Freda,
                    >      c. and Mary gave her a present.
                    
                    BFP classifies (12c) as RETAIN (Jane maintained as topic,
                    realized in object position not subject). COT predicts the same
                    resolution: "her" → Jane (l=i in Beaver's notation). The competing
                    resolution "her" → Freda (l=j) is disfavored by COHERE. 
                    

                    (12a) "Jane is happy."

                    Equations
                    Instances For

                      (12b) "She was congratulated by Freda." She=Jane (resolved).

                      Equations
                      • One or more equations did not get rendered due to their size.
                      Instances For

                        (12c) Candidate where "her"=Jane (l=i): the BFP-survivor / COT prediction. The resolved utterance: subject Mary (full name), object her=Jane (pronoun), other "present" (full).

                        Equations
                        • One or more equations did not get rendered due to their size.
                        Instances For

                          (12c) Candidate where "her"=Freda (l=j): non-survivor under both BFP and COT.

                          Equations
                          • One or more equations did not get rendered due to their size.
                          Instances For

                            The two candidates wrapped with substrate-gap flags. AGREE, DISJOINT, FAM-DEF all OK in both (Mary, Freda, present, Jane all distinct; "her" is pronoun matching either Jane or Freda in number/gender; no definite descriptions).

                            Equations
                            Instances For
                              Equations
                              Instances For
                                def Beaver2004.D12.priorTopic :
                                Option String

                                The prior topic, computed from (12a) → (12b): Jane.

                                Equations
                                Instances For

                                  The prior topic for (12) is some "Jane".

                                  Beaver tableau (13) line 1 (winner, l=i): only ALIGN violated. COHERE not violated because the topic (cb b c_l_eq_i = Jane) matches the prior topic (cb a b = Jane).

                                  Beaver tableau (13) line 2 (loser, l=j): COHERE and ALIGN both violated (topic shifts from Jane to Freda; topic not in subject position).

                                  Beaver Theorem (20) witness on example (12): the COT-optimal candidate (l=i) matches BFP's prediction (RETAIN with Jane as topic). The Beaver-side optimization picks l=i because it has fewer COHERE violations (the higher-ranked constraint among the two that distinguish the candidates).

                                  Beaver §2.4 (2) and §4.1 p. 26 (paper pp. 10, 26):

                                  > (2) a. Mary likes tennis.
                                  >     b. She plays Jim quite often.
                                  >     c. He used to play doubles with Mary.
                                  
                                  BFP predicts: in (2c), since the topic (Mary) is not pronominalized
                                  despite "He" being a pronoun, **the bound reading where "Mary" in
                                  (2c) refers back to (2a)'s Mary is filtered out** (Rule 1 violation).
                                  The resulting forced reading involves "two different people called
                                  Mary" (Beaver p. 11) — empirically wrong; the bound reading is
                                  available though awkward.
                                  
                                  Beaver §4.1 p. 26: demoting PRO-TOP below FAM-DEF (re-ranking
                                  `AGREE > DISJOINT > FAM-DEF > PRO-TOP > COHERE > ALIGN`)
                                  correctly predicts the bound reading. Under the demoted ranking,
                                  the FAM-DEF constraint (penalizing introduction of a "second
                                  Mary") outranks PRO-TOP (penalizing the unpronominalized topic),
                                  so the candidate where Mary in (2c) IS the prior Mary wins.
                                  
                                  This is the @cite{poesio-stevenson-eugenio-hitzeman-2004} §3.1 fn
                                  12 critique mechanized: "in BFP Rule 1 is effectively used as a
                                  hard constraint, a problem fixed by Beaver's own optimality-
                                  theoretic reformulation of the algorithm." 
                                  

                                  (2a) "Mary likes tennis."

                                  Equations
                                  • One or more equations did not get rendered due to their size.
                                  Instances For

                                    (2b) "She plays Jim quite often." She=Mary (resolved).

                                    Equations
                                    • One or more equations did not get rendered due to their size.
                                    Instances For

                                      (2c) Bound reading (Mary in (2c) = Mary in (2a)): "He=Jim used to play doubles with Mary=Mary_a." Subject = Jim (pronoun "He"), object = Mary (full name, but ANAPHORIC to prior Mary — so famDefOK = true, FAM-DEF satisfied).

                                      Equations
                                      • One or more equations did not get rendered due to their size.
                                      Instances For

                                        (2c) Two-Marys reading (BFP's predicted resolution): "He=Jim used to play doubles with Mary'≠Mary_a." Same surface; the object "Mary" introduces a NEW entity ≠ the prior Mary. We encode the new entity as a distinct string "Mary_new". Under this resolution, FAM-DEF FIRES (Mary_new is a definite that has no familiar antecedent).

                                        Equations
                                        • One or more equations did not get rendered due to their size.
                                        Instances For

                                          Bound-reading candidate. AGREE/DISJOINT OK; famDefOK=true (Mary is anaphoric to the (2a) Mary).

                                          Equations
                                          Instances For

                                            Two-Marys candidate. AGREE/DISJOINT OK; famDefOK=FALSE (the second Mary is a brand-new definite without antecedent — FAM-DEF fires).

                                            Equations
                                            Instances For
                                              def Beaver2004.D2.priorTopic :
                                              Option String

                                              The prior topic for (2): Mary, the topic of (2b) given (2a).

                                              Equations
                                              Instances For

                                                Bound-reading candidate violates PRO-TOP (the topic Mary is not pronominalized — "Mary" is a full name, not "her"). Under Beaver's CANONICAL ranking with PRO-TOP > FAM-DEF, this filters out the bound reading.

                                                Two-Marys candidate violates FAM-DEF (the second "Mary" is a definite NP without familiar antecedent).

                                                Under canonical COT ranking (PRO-TOP > FAM-DEF), the two-Marys reading wins (matches BFP). PRO-TOP violation is more costly.

                                                def Beaver2004.cotRankingDemoted {E R : Type} [DecidableEq E] [Discourse.Centering.CfRankerOf E R] (prev : Discourse.Centering.Utterance E R) (priorTopic : Option E) :

                                                Beaver §4.1 demoted ranking (his p. 26): AGREE > DISJOINT > FAM-DEF > PRO-TOP > COHERE > ALIGN. PRO-TOP demoted below FAM-DEF.

                                                Equations
                                                Instances For

                                                  Beaver §4.1 prediction: under the demoted ranking, the bound reading wins because FAM-DEF (now ranked higher than PRO-TOP) eliminates the two-Marys candidate. The bound reading's PRO-TOP violation is now LESS costly than the two-Marys candidate's FAM-DEF violation. The PSDH §3.1 fn 12 critique mechanized: Beaver's reformulation FIXES BFP's Rule-1-as-hard-constraint overprediction by allowing Rule 1 to be defeated by higher-ranked constraints.

                                                  Beaver's headline theorem (his §3.4 (20), p. 26):

                                                  > Given a sentence in which the only definite expressions are
                                                  > proper nouns and pronouns, if either COT (with the constraints
                                                  > and constraint ranking above) or BFP uniquely predicts an
                                                  > interpretation involving fully anaphoric interpretation of all
                                                  > definites, then both do, and in this case they resolve anaphors
                                                  > identically.
                                                  
                                                  With our **deep-reuse design** (PRO-TOP via Rule1GJW95, COHERE via
                                                  cb, ALIGN via cb+cp), this theorem is partly STRUCTURAL:
                                                  
                                                  - The COT-side `proTop` constraint IS BFP's Rule 1 by definition.
                                                  - The COT-side `cohere` constraint IS BFP's "Cb stable" test.
                                                  - The COT-side `align` constraint IS BFP's "Cb = Cp" test.
                                                  - AGREE / DISJOINT / FAM-DEF are the BFP-Filter conditions
                                                    (Beaver §2.2 p. 8: BFP's Filter 1 = Rule 1 = our PRO-TOP;
                                                    Filter 3 = argument coreference = our DISJOINT). FAM-DEF is
                                                    Beaver's addition (BFP doesn't have it, since BFP doesn't
                                                    handle definite descriptions).
                                                  
                                                  The structural correspondence makes the COT-vs-BFP equivalence
                                                  mechanically apparent for the 3 reused constraints (PRO-TOP / COHERE
                                                  / ALIGN are LITERALLY defined via `Rule1Gordon` / `cb` / `cb`+`cp`,
                                                  so any "iff" theorem connecting them to the substrate is a
                                                  one-line `unfold` — i.e., the canonical naCanBridge anti-pattern
                                                  CLAUDE.md forbids). We previously had `proTop_iff_rule1Gordon_fails`,
                                                  `cohere_iff_cb_diverges`, `align_iff_cb_neq_cp` theorems here;
                                                  they were correctly flagged by audit as encoding-conclusions-as-
                                                  definitions. The 3 novel constraints don't change BFP's predictions
                                                  on Beaver's worked examples (verified by `decide` per-example). 
                                                  

                                                  Per audit's mathlib-discipline finding (the file defined 6 constraints + per-example violation theorems but never invoked the ViolationProfile lex-ordering machinery), compute the full ViolationProfile 6 for Beaver (12)'s two candidates under the canonical COT ranking and verify the lex-ordering picks l=i.

                                                  A full `Tableau` construction would also work (and is what
                                                  mathlib-style would prefer), but currently `Utterance E R`
                                                  lacks `DecidableEq` in the substrate (deriving only `Repr`),
                                                  so `Finset (Candidate E R)` synthesis is blocked. The
                                                  `ViolationProfile` comparison gives the same kernel-checked OT
                                                  witness without requiring `Finset` machinery. Adding
                                                  `DecidableEq` to the `Realization` / `Utterance` structures'
                                                  deriving block (substrate change) would unblock the full
                                                  `Tableau.optimal = {cand_l_eq_i}` form; queued for follow-up. 
                                                  

                                                  OT lex-min witness on Beaver (12): under the canonical COT ranking, cand_l_eq_i's violation profile is strictly less than cand_l_eq_j's in ViolationProfile's lex order — i.e., the OT optimization mechanism (lex-min on Nat-vector profiles) picks cand_l_eq_i as the unique optimal candidate. This is the kernel-checked OT-mechanism witness, exercising the Core.Constraint.OT substrate's mkProfile / ViolationProfile.LinearOrder infrastructure.

                                                  Per audit's cross-framework finding, three sibling Centering formalizations exist in linglib that Beaver should engage:

                                                  - **`Sidner1983.lean`**: two-focus account. Beaver §4.3 explicitly
                                                    proposes SALIENT EMPATHY (his p. 30) which is structurally
                                                    Sidner's actor focus. Out-of-scope for this commit (Beaver §4.3
                                                    cross-linguistic constraints not formalized), but flagged.
                                                  
                                                  - **`Centering/Rule2.lean::Rule2Strube1998Cheap`** (cheap iff
                                                    `cb prev cur = priorCp`). Beaver's ALIGN at the *previous*
                                                    utterance position structurally encodes the same predicate —
                                                    see hidden-agreement theorem below.
                                                  
                                                  - **`KehlerRohde2013.lean`**: Bayesian decomposition of pronoun
                                                    resolution. Chronology: Beaver < KR2013, so the bridge belongs
                                                    in `KehlerRohde2013.lean`, not here. Future-work. 
                                                  
                                                  theorem Beaver2004.align_within_utterance_iff_cb_eq_cp {E R : Type} [DecidableEq E] [Discourse.Centering.CfRankerOf E R] (prev : Discourse.Centering.Utterance E R) (c : Candidate E R) :
                                                  (align prev).eval c = 0 (Discourse.Centering.cb prev c.utt).isSome = true Discourse.Centering.cb prev c.utt = c.utt.cp

                                                  Hidden agreement: ALIGN ≈ Strube 1998 cheap (structurally). Strube's "cheap" predicate (Rule2Strube1998Preferred) tests cb prev cur = priorCp ∧ cb is some — the previous-Cp predicts the current-CB. ALIGN at the current utterance tests the analogous within-utterance condition: cb prev cur = current-cp.

                                                  Their structural identity: ALIGN says "topic = current Cp"; cheap says "topic = previous Cp." Same shape, different time index. Not literally equal — the "previous Cp" version is Strube's specific contribution. The shapes match; the witness below shows ALIGN is satisfied iff the (constructed) within- utterance cheap-condition holds.