
Linglib.Theories.Syntax.DependencyGrammar.Formal.VPDivergence

VP Divergence: DG vs PSG Constituency

@cite{osborne-2019} @cite{osborne-gross-2012}

Formalizes the central empirical disagreement between Dependency Grammar and Phrase Structure Grammar regarding the finite VP (@cite{osborne-2019}, Ch. 2–4; Osborne, Putnam & Groß 2012, Syntax 15:4).

The core claim: in DG, the finite VP (the verb plus its complements, excluding the subject) is a catena but not a constituent, whereas in PSG it is a constituent. Of the five standard constituency tests (topicalization, clefting, pseudoclefting, proform substitution, answer fragments), four fail to identify the finite VP as a constituent, which supports DG's prediction; only proform substitution passes.

Key Results

Bridges

def DepGrammar.VPDivergence.isStrictSubset (n : ℕ) (deps : List Dependency) :
Bool

Check that constituent count is strictly less than catena count.

    def DepGrammar.VPDivergence.nonConstituentCatenae (n : ℕ) (deps : List Dependency) :
    List (List ℕ)

    Enumerate all catenae that are NOT constituents.

      Strict containment for tree (9): constituents < catenae. (4 constituents < 10 catenae, @cite{osborne-gross-2012}, p. 359)

      Strict containment for chain4: constituents < catenae. (4 constituents < 10 catenae)

      Strict containment for star4: constituents < catenae. (4 constituents < 11 catenae)

      Tree (9) has exactly 6 non-constituent catenae.
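
The catena counts quoted for chain4 and star4 can be reproduced with a small brute-force sketch. It is not the library's implementation: `Dep`, `sublists`, `neighbors`, `reach`, `isCatena` and `catenaCount` are hypothetical names, and the shapes assumed for chain4 (a path 0 → 1 → 2 → 3) and star4 (a hub 0 with three dependents) are guesses based on their names. Each tree has four nodes and therefore four constituents, one per node.

```lean
namespace CatenaCountSketch

/-- A head → dependent edge (hypothetical stand-in for the library's `Dependency`). -/
structure Dep where
  head : Nat
  dep : Nat

/-- All order-preserving sublists of a list; each one represents a node subset. -/
def sublists : List Nat → List (List Nat)
  | [] => [[]]
  | x :: xs => sublists xs ++ (sublists xs).map (fun s => x :: s)

/-- Nodes linked to `v` by a single edge, ignoring direction. -/
def neighbors (deps : List Dep) (v : Nat) : List Nat :=
  deps.filterMap fun d =>
    if d.head == v then some d.dep
    else if d.dep == v then some d.head
    else none

/-- Starting from `seen`, add members of `nodes` adjacent to `seen`, `fuel` times. -/
def reach (deps : List Dep) (nodes : List Nat) : Nat → List Nat → List Nat
  | 0, seen => seen
  | fuel + 1, seen =>
    reach deps nodes fuel
      (nodes.filter fun n =>
        seen.contains n || seen.any (fun s => (neighbors deps s).contains n))

/-- A nonempty node list is a catena when every member is reachable from the
    first member without leaving the list. -/
def isCatena (deps : List Dep) (nodes : List Nat) : Bool :=
  match nodes with
  | [] => false
  | v :: _ => (reach deps nodes nodes.length [v]).length == nodes.length

/-- Number of catenae among all subsets of the nodes `0, …, n - 1`. -/
def catenaCount (deps : List Dep) (n : Nat) : Nat :=
  ((sublists (List.range n)).filter (isCatena deps)).length

-- Assumed shapes, not taken from the source.
def chain4 : List Dep := [⟨0, 1⟩, ⟨1, 2⟩, ⟨2, 3⟩]
def star4 : List Dep := [⟨0, 1⟩, ⟨0, 2⟩, ⟨0, 3⟩]

#eval catenaCount chain4 4  -- 10
#eval catenaCount star4 4   -- 11

end CatenaCountSketch
```

Enumerating all 2^n subsets is only practical for very small trees, which is enough for four-node examples like these.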

      theorem DepGrammar.VPDivergence.exists_catena_not_constituent (deps : List Dependency) (v w : ℕ) (hvw : v ≠ w) (hedge : parentEdge deps v w) :
      Catena.isCatena deps [v] = true ∧ ¬projection deps v = [v]

      Universal witness for strict containment:

      For any tree with ≥2 nodes and an edge (v, w), the singleton {v} is a catena (trivially connected: any singleton is connected in the dep graph) but NOT a constituent ({v} ≠ projection(v) because projection(v) includes w as a descendant).

      Uses the computable isCatena (from Catena.lean) rather than the Prop-level IsCatena which takes a SimpleGraph parameter unrelated to the dependency list. The two key facts:

      1. isCatena deps [v] = true — any singleton is a catena
      2. projection deps v ≠ [v] — v has a child w, so projection is strictly larger
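
The second fact can be checked on a concrete tree. The following is a minimal sketch, not the library's `projection`: the `Dep` structure, the fuel argument and the name `billDeps` are assumptions introduced here for illustration.

```lean
namespace ProjectionSketch

/-- A head → dependent edge (hypothetical stand-in for the library's `Dependency`). -/
structure Dep where
  head : Nat
  dep : Nat

/-- `v` together with all of its descendants; `fuel` bounds the depth explored. -/
def projection (deps : List Dep) : Nat → Nat → List Nat
  | 0, v => [v]
  | fuel + 1, v =>
    let children := (deps.filter fun d => d.head == v).map fun d => d.dep
    children.foldl (fun acc c => acc ++ projection deps fuel c) [v]

/-- Node 0 has the two dependents 1 and 2. -/
def billDeps : List Dep := [⟨0, 1⟩, ⟨0, 2⟩]

#eval projection billDeps 3 0         -- [0, 1, 2]
#eval projection billDeps 3 0 == [0]  -- false: {0} is not a constituent
#eval projection billDeps 3 1 == [1]  -- true: a leaf is its own projection

end ProjectionSketch
```

Any node with at least one dependent behaves like node 0 here: its singleton is trivially connected, but its projection is strictly larger.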

      A minimal phrase structure tree for structural comparisons with DG. NOT intended to replace Minimalism's SyntacticObject/XBarPhrase (those carry theory-specific features). This is purely for the DG-vs-PSG constituency comparison.

        partial def DepGrammar.VPDivergence.instReprPSTree.repr :
        PSTree → Std.Format
        @[irreducible]

        Yield of a PSTree: the leaf words in left-to-right order.

          @[irreducible]

          All constituents of a PSTree: every subtree's yield is a constituent.

            Count of distinct constituents in a PSTree.

              def DepGrammar.VPDivergence.PSTree.hasConstituent (t : PSTree) (ws : List String) :
              Bool

              Check whether a given word sequence is a constituent of this PSTree.
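
As a rough illustration of how yield, constituents and hasConstituent fit together, here is a sketch over a binary-branching tree. The binary restriction and the constructor names `leaf` and `node` are simplifying assumptions, not the library's definition of `PSTree`, and this version does not remove duplicate yields.

```lean
namespace PSTreeSketch

/-- A binary phrase structure tree (assumed shape, for illustration only). -/
inductive PSTree where
  | leaf : String → PSTree
  | node : String → PSTree → PSTree → PSTree

/-- Leaf words in left-to-right order. -/
def PSTree.yield : PSTree → List String
  | .leaf w => [w]
  | .node _ l r => l.yield ++ r.yield

/-- The yield of every subtree, starting with the tree itself. -/
def PSTree.constituents : PSTree → List (List String)
  | .leaf w => [[w]]
  | .node _ l r => (l.yield ++ r.yield) :: (l.constituents ++ r.constituents)

/-- Whether `ws` is the yield of some subtree. -/
def PSTree.hasConstituent (t : PSTree) (ws : List String) : Bool :=
  t.constituents.contains ws

/-- [S Bill [VP plays chess]] -/
def billPSG : PSTree :=
  .node "S" (.leaf "Bill") (.node "VP" (.leaf "plays") (.leaf "chess"))

#eval billPSG.yield                              -- ["Bill", "plays", "chess"]
#eval billPSG.constituents.length                -- 5
#eval billPSG.hasConstituent ["plays", "chess"]  -- true
#eval billPSG.hasConstituent ["Bill", "plays"]   -- false

end PSTreeSketch
```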

                "Bill plays chess" (@cite{osborne-2019}, p. 92, example 24)

                DG analysis:

                    plays(0)
                   / \
                Bill(1) chess(2)
                

                PSG analysis:

                       S
                      / \
                   Bill VP
                         / \
                      plays chess
                

                DG tree for "Bill plays chess": plays(0) → Bill(1), plays(0) → chess(2).

                  PSG tree for "Bill plays chess".

                    "She reads everything" (@cite{osborne-2019}, p. 46, example 12)

                    DG: reads(0) → she(1), reads(0) → everything(2). {reads, everything} is a catena but not a constituent.

                    PSG: [S she [VP reads everything]]. {reads, everything} IS a constituent.

                    DG tree for "She reads everything".

                      PSG tree for "She reads everything".

                        "They will get the teacher a present" (@cite{osborne-2019}, p. 95–97, ex. 30–34)

                        DG analysis — flat tree from will:

                                will(0)
                              / | \ \ \
                        They(1) get(2) teacher(3) present(4) the(5)
                                                              |
                                                              a(6)
                        

                        The formalization instead uses a simplified UD-style analysis in which get, not will, is the root:

                        Simplified DG (UD-style): get(0) → They(1), get(0) → will(2), get(0) → teacher(3), get(0) → present(4), teacher(3) → the(5), present(4) → a(6)

                        {get, teacher, present} is a catena but not a constituent.

                        PSG analysis — deeply layered:

                                 S
                               / \
                            They VP
                                   / \
                                 will VP
                                      / \
                                   get VP
                                         / \
                                   the teacher NP
                                               / \
                                              a present
                        

                        This analysis posits several constituents, including the nested VPs, that DG does not recognize.

                        DG tree for "They will get the teacher a present" (UD-style).

                          PSG tree for "They will get the teacher a present".

                            The finite VP {plays, chess} is a catena in DG.

                            The finite VP {plays, chess} is NOT a constituent in DG. (subtree of plays = {plays, Bill, chess} = whole sentence)

                            The finite VP {plays, chess} IS a constituent in PSG.

                            The finite VP {reads, everything} is a catena in DG.

                            The finite VP {reads, everything} is NOT a constituent in DG.

                            The finite VP {reads, everything} IS a constituent in PSG.

                            {get, teacher, present} is a catena in the DG tree.

                            {get, teacher, present} is NOT a constituent in DG.

                            {get, the, teacher, a, present} is a catena in DG.

                            {get, the, teacher, a, present} is NOT a constituent in DG (subtree of get = whole sentence).
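
The DG claims above can be re-derived with a short sketch that relies on a tree-specific shortcut: in a well-formed dependency tree, a node set is connected exactly when one member has its head outside the set, and it is a constituent when it is also closed under dependents. This characterization and the names `Dep`, `localRoots`, `isCatena` and `isConstituent` are assumptions of the sketch; the library's own `Catena.isCatena` may be defined differently.

```lean
namespace VPSketch

/-- A head → dependent edge (hypothetical stand-in for the library's `Dependency`). -/
structure Dep where
  head : Nat
  dep : Nat

/-- Members of `s` whose head is not in `s` (or that have no head at all). -/
def localRoots (deps : List Dep) (s : List Nat) : List Nat :=
  s.filter fun v => !(deps.any fun d => d.dep == v && s.contains d.head)

/-- In a tree, `s` is connected exactly when it has a single local root. -/
def isCatena (deps : List Dep) (s : List Nat) : Bool :=
  (localRoots deps s).length == 1

/-- A constituent is a catena that contains every dependent of its members. -/
def isConstituent (deps : List Dep) (s : List Nat) : Bool :=
  isCatena deps s && deps.all (fun d => !(s.contains d.head) || s.contains d.dep)

/-- plays(0) → Bill(1), plays(0) → chess(2) -/
def bill : List Dep := [⟨0, 1⟩, ⟨0, 2⟩]

/-- get(0) → They(1), will(2), teacher(3), present(4); teacher → the(5); present → a(6) -/
def getTree : List Dep := [⟨0, 1⟩, ⟨0, 2⟩, ⟨0, 3⟩, ⟨0, 4⟩, ⟨3, 5⟩, ⟨4, 6⟩]

#eval isCatena bill [0, 2]                     -- true:  {plays, chess}
#eval isConstituent bill [0, 2]                -- false
#eval isConstituent bill [0, 1, 2]             -- true:  the whole sentence
#eval isCatena getTree [0, 3, 4]               -- true:  {get, teacher, present}
#eval isConstituent getTree [0, 3, 4]          -- false
#eval isCatena getTree [0, 3, 4, 5, 6]         -- true:  {get, the, teacher, a, present}
#eval isConstituent getTree [0, 3, 4, 5, 6]    -- false
#eval isConstituent getTree [3, 5]             -- true:  {the, teacher}

end VPSketch
```

The "She reads everything" tree has the same edges as `bill`, so the same three results carry over to {reads, everything}.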

                            The five standard constituency tests (@cite{osborne-2019}, p. 92, ex. 25).

                                A constituency test result recording DG vs PSG predictions vs observation.

                                    Constituency test results for the finite VP "plays chess" (@cite{osborne-2019}, p. 92, example 25):

                                    • Topicalization: *"...and plays chess Bill" → FAIL
                                    • Clefting: *"It is plays chess that Bill does" → FAIL
                                    • Pseudoclefting: ?"What Bill does is plays chess" → FAIL (infinitival preferred)
                                    • Proform sub (do so): "Bill does so" → PASS (but do-so matches non-constituents too, §3.5)
                                    • Answer fragment: *"?Plays chess" → FAIL (bare infinitive "Play chess" preferred)

                                    DG predicts: FAIL on all 5 (finite VP is not a constituent).

                                    PSG predicts: PASS on all 5 (finite VP is a constituent).

                                      DG predictions match observed results on 4 of 5 tests (only proform substitution is a mismatch — DG predicts fail, observed pass; but proform sub is known to be unreliable for finite VP, see §3.5).

                                      PSG predictions match observed results on only 1 of 5 tests. PSG predicts all 5 pass (it's a constituent), but only proform sub passes.

                                      theorem DepGrammar.VPDivergence.psg_matches_exactly_one :
                                      (List.filter (fun (t : TestResult) => t.psgPredicts == t.observed) finiteVPTests).length = 1

                                      PSG matches exactly 1 out of 5 tests.

                                      theorem DepGrammar.VPDivergence.dg_matches_exactly_four :
                                      (List.filter (fun (t : TestResult) => t.dgPredicts == t.observed) finiteVPTests).length = 4

                                      DG matches exactly 4 out of 5 tests.
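
The two counts can be reproduced from the table of judgments given earlier. This is a sketch under assumptions: the field names `dgPredicts`, `psgPredicts` and `observed` appear in the theorem statements, but their `Bool` types, the `name` field and the encoding of PASS as `true` are guesses made for illustration.

```lean
namespace TestSketch

/-- One row of the comparison; `true` means the test passes (assumed encoding). -/
structure TestResult where
  name : String
  dgPredicts : Bool
  psgPredicts : Bool
  observed : Bool

/-- Rows transcribed from the judgments listed for "plays chess". -/
def finiteVPTests : List TestResult :=
  [ ⟨"topicalization", false, true, false⟩,
    ⟨"clefting", false, true, false⟩,
    ⟨"pseudoclefting", false, true, false⟩,
    ⟨"proform substitution", false, true, true⟩,
    ⟨"answer fragment", false, true, false⟩ ]

-- Tests on which each theory's prediction agrees with the observation.
#eval (finiteVPTests.filter fun t => t.dgPredicts == t.observed).length   -- 4
#eval (finiteVPTests.filter fun t => t.psgPredicts == t.observed).length  -- 1

end TestSketch
```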

                                      DG always has exactly n constituents for an n-word tree (one per node's complete subtree). Verified for "Bill plays chess" (3 words, 3 constituents).

                                      Ratio of constituents to non-constituents for "Bill plays chess": DG 3:4 vs PSG 5:2 (out of the 7 non-empty subsets of 3 words).

                                      The VP catena {plays, chess} has dependency length 2 (bridge to DependencyLength.lean).

                                      The full sentence constituent has dependency length 3.
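
One reading that is consistent with both figures is to sum |head − dependent| over the edges that lie entirely inside the node set, using the node indices as positions. The sketch below follows that assumption; `dist` and `depLength` are hypothetical names, and the definition in DependencyLength.lean may differ.

```lean
namespace DepLengthSketch

/-- Distance between two positions. -/
def dist (a b : Nat) : Nat := if a ≤ b then b - a else a - b

/-- Sum of edge lengths over the edges whose head and dependent are both in `nodes`. -/
def depLength (deps : List (Nat × Nat)) (nodes : List Nat) : Nat :=
  deps.foldl
    (fun acc e =>
      if nodes.contains e.1 && nodes.contains e.2 then acc + dist e.1 e.2 else acc)
    0

/-- plays(0) → Bill(1), plays(0) → chess(2), as (head, dependent) pairs. -/
def bill : List (Nat × Nat) := [(0, 1), (0, 2)]

#eval depLength bill [0, 2]     -- 2: only the edge plays → chess is internal
#eval depLength bill [0, 1, 2]  -- 3: both edges count, 1 + 2

end DepLengthSketch
```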

                                      Every constituent is a catena — verified exhaustively for "Bill plays chess". (Bridge to Catena.lean's constituent_is_catena theorems)

                                      The finite VP divergence is robust: it holds for the isomorphic "She reads everything" tree as well. Same structure → same divergence.