Documentation

Linglib.Theories.Syntax.DependencyGrammar.Formal.Catena

Catenae: A Novel Unit of Syntactic Analysis #

@cite{osborne-gross-2012}

Formalizes the catena (Osborne, Putnam & Groß 2012, Syntax 15:4, 354–396).

A catena (Latin: "chain") is a connected subgraph of a dependency tree — any word or combination of words that is continuous with respect to the dominance relation. Catenae strictly generalize constituents: every constituent is a catena, but not every catena is a constituent.

Mathlib Integration #

The dependency tree is converted to a mathlib SimpleGraph (Fin n) via depsToSimpleGraph, bridging linglib's DepTree/Dependency types to mathlib's graph theory infrastructure. The Prop-level IsCatena is defined using SimpleGraph.Preconnected on induced subgraphs. Computable Bool functions (isCatena, isConstituent) enable native_decide proofs.

Key Results #

Bridges #

def DepGrammar.Catena.depsToSimpleGraph (n : ) (deps : List Dependency) :
SimpleGraph (Fin n)

The undirected simple graph underlying dependency edges over n nodes. Forgets edge direction and labels: i ~ j iff some dependency connects them. Uses mathlib's SimpleGraph (Fin n) — the fundamental bridge from linglib's DepTree/Dependency types to mathlib's graph theory.

Equations
Instances For
    def DepGrammar.Catena.DepTree.asSimpleGraph (t : DepTree) :
    SimpleGraph (Fin t.words.length)

    Convert a DepTree to a mathlib SimpleGraph on its node set.

    Equations
    Instances For
      def DepGrammar.Catena.IsCatena {n : } (G : SimpleGraph (Fin n)) (S : Finset (Fin n)) :

      A catena is a non-empty subset S of tree nodes where the induced subgraph on S is preconnected. Equivalently: a word or combination of words that is continuous with respect to dominance.

      Uses mathlib's SimpleGraph.induce and SimpleGraph.Preconnected.

      Equations
      Instances For
        def DepGrammar.Catena.isConnected (deps : List Dependency) (nodes : List ) :
        Bool

        Check if a node set is connected within the dependency graph. Uses BFS from the first node and checks all others are reached.

        Equations
        Instances For
          def DepGrammar.Catena.isCatena (deps : List Dependency) (nodes : List ) :
          Bool

          Computable catena check: non-empty and connected in the tree.

          Equations
          Instances For
            inductive DepGrammar.Catena.BidirReachable (deps : List Dependency) (allowed : List ) :
            Prop

            Bidirectional reachability within a restricted node set. BidirReachable deps allowed u v holds when there is a path from u to v using dependency edges (in either direction) where all nodes are in allowed.

            Instances For
              theorem DepGrammar.Catena.bidir_step_append {deps : List Dependency} {allowed : List } {u v w : } (h : BidirReachable deps allowed u v) (hv : v allowed) (hw : w allowed) (hedge : ddeps, d.headIdx = v d.depIdx = w d.depIdx = v d.headIdx = w) :
              BidirReachable deps allowed u w

              Append a step to the end of a bidirectional path.

              theorem DepGrammar.Catena.bidir_symm {deps : List Dependency} {allowed : List } {u v : } (h : BidirReachable deps allowed u v) :
              BidirReachable deps allowed v u

              Bidirectional reachability is symmetric (reverse the path, flip edges).

              theorem DepGrammar.Catena.bidir_trans {deps : List Dependency} {allowed : List } {u v w : } (h1 : BidirReachable deps allowed u v) (h2 : BidirReachable deps allowed v w) :
              BidirReachable deps allowed u w

              Bidirectional reachability is transitive.

              theorem DepGrammar.Catena.bfsReachable_complete (deps : List Dependency) (allowed : List ) (start target : ) (h : BidirReachable deps allowed start target) :
              target DepGrammar.Catena.bfsReachable✝ deps allowed start

              BFS completeness: every bidirectionally reachable node appears in the bfsReachable output.

              Proved by showing the output contains start and is closed under edges within allowed, then applying induction on BidirReachable.

              theorem DepGrammar.Catena.singleton_isCatena (deps : List Dependency) (v : ) :
              isCatena deps [v] = true

              Any singleton is a catena: non-empty and trivially connected.

              def DepGrammar.Catena.DepTree.isCatena' (t : DepTree) (nodes : List ) :
              Bool

              Convenience: check catena on a DepTree directly.

              Equations
              Instances For
                def DepGrammar.Catena.isConstituent (deps : List Dependency) (n : ) (nodes : List ) :
                Bool

                Check if a node set equals the complete subtree (projection) rooted at some node. Uses projection from Core/Basic.lean.

                Equations
                • One or more equations did not get rendered due to their size.
                Instances For
                  def DepGrammar.Catena.allNonEmptySubsets (n : ) :
                  List (List )

                  All non-empty subsets of {0,..., n-1}.

                  Equations
                  Instances For
                    def DepGrammar.Catena.allNonEmptySubsets.powerset (remaining : List ) :
                    List (List )
                    Equations
                    Instances For
                      def DepGrammar.Catena.catenaeCount (n : ) (deps : List Dependency) :

                      Count catenae for a tree with n nodes and given dependency edges.

                      Equations
                      Instances For
                        def DepGrammar.Catena.constituentCount (n : ) (deps : List Dependency) :

                        Count constituents for a tree with n nodes.

                        Equations
                        Instances For

                          Total non-empty subsets of n elements: 2^n - 1.

                          Equations
                          Instances For
                            def DepGrammar.Catena.catenaRatio (n : ) (deps : List Dependency) :
                            ×

                            Catena ratio as (catenae, non-catenae). Flatter trees → higher ratio.

                            Equations
                            Instances For

                              Tree (9), p. 359: 4 abstract nodes. a(0) /
                              b(1) c(2) | d(3)

                              10 catenae, 5 non-catenae, 4 constituents out of 15 total. Catenae: {a},{b},{c},{d},{a,b},{a,c},{b,d},{a,b,c},{a,b,d},{a,b,c,d} Constituents: {d},{c},{b,d},{a,b,c,d}

                              Equations
                              • One or more equations did not get rendered due to their size.
                              Instances For

                                Tree (22), p. 371: 3-node flat tree. a(0) /
                                b(1) c(2)

                                6 catenae, 1 non-catena, 3 constituents out of 7 total.

                                Equations
                                Instances For

                                  4-node chain: a(0) → b(1) → c(2) → d(3). 10 catenae (only contiguous intervals are connected).

                                  Equations
                                  • One or more equations did not get rendered due to their size.
                                  Instances For

                                    4-node star: a(0) → {b(1), c(2), d(3)}. 11 catenae (every root-containing subset is connected).

                                    Equations
                                    • One or more equations did not get rendered due to their size.
                                    Instances For

                                      3-node chain: a(0) → b(1) → c(2).

                                      Equations
                                      Instances For

                                        "pulled some strings" — the idiom {pulled, strings} forms a catena but not a constituent.

                                        Words: pulled(0) some(1) strings(2) UD: pulled → strings (obj), strings → some (det).

                                        Equations
                                        • One or more equations did not get rendered due to their size.
                                        Instances For

                                          Flatter trees have strictly more catenae than chain-shaped trees. (@cite{osborne-gross-2012}, p. 371: the catena ratio increases with flatness)

                                          theorem DepGrammar.Catena.constituent_is_catena_tree9 :
                                          ((allNonEmptySubsets 4).all fun (nodes : List ) => if isConstituent tree9 4 nodes = true then isCatena tree9 nodes else true) = true

                                          Every constituent is a catena — verified exhaustively for tree (9). (@cite{osborne-gross-2012}, p. 360: "every 'constituent' is also a catena")

                                          theorem DepGrammar.Catena.constituent_is_catena_star4 :
                                          ((allNonEmptySubsets 4).all fun (nodes : List ) => if isConstituent star4 4 nodes = true then isCatena star4 nodes else true) = true

                                          Every constituent is a catena — verified for star4.

                                          theorem DepGrammar.Catena.constituent_is_catena_chain4 :
                                          ((allNonEmptySubsets 4).all fun (nodes : List ) => if isConstituent chain4 4 nodes = true then isCatena chain4 nodes else true) = true

                                          Every constituent is a catena — verified for chain4.

                                          n constituents ≤ catenae count ≤ 2^n - 1 total combinations.

                                          Every singleton is a catena.

                                          {a, d} is NOT a catena — a and d aren't connected without b.

                                          {b, c} is NOT a catena — b and c aren't connected without a.

                                          The idiom "pulled strings" is a catena (connected via obj edge)...

                                          ...but NOT a constituent (subtree of "pulled" includes "some").

                                          The full phrase "pulled some strings" IS both a constituent and a catena.

                                          theorem DepGrammar.Catena.IsCatena_singleton {n : } (G : SimpleGraph (Fin n)) (v : Fin n) :
                                          IsCatena G {v}

                                          Every singleton is a catena in any SimpleGraph (mathlib Prop-level). Proof: the induced subgraph on {v} has a single vertex, so it's trivially preconnected.

                                          theorem DepGrammar.Catena.isCatena_iff_IsCatena {n : } (deps : List Dependency) (nodes : List ) (hbounds : inodes, i < n) (hnodup : nodes.Nodup) :
                                          isCatena deps nodes = true IsCatena (depsToSimpleGraph n deps) (List.filterMap (fun (i : ) => if h : i < n then some i, h else none) nodes).toFinset

                                          The computable isCatena agrees with the Prop-level IsCatena.

                                          Forward (isCatena = true → IsCatena): BFS from the start node reaches all nodes in the list. BFS soundness gives BidirReachable from start to every node; symmetry + transitivity gives connectivity between any pair; the bridge converts to SimpleGraph.Reachable.

                                          Backward (IsCatenaisCatena = true): Preconnected gives Reachable start v for every v in the set. The bridge converts each Reachable path to BidirReachable, and BFS completeness ensures every such node appears in the output.

                                          def DepGrammar.Catena.catenaTotalDepLength (deps : List Dependency) (nodes : List ) :

                                          Total dependency length restricted to edges within a catena. Measures the linear spread of the catena.

                                          Equations
                                          • One or more equations did not get rendered due to their size.
                                          Instances For

                                            The idiom catena {pulled, strings} has dep length 2.

                                            The full constituent {pulled, some, strings} has dep length 3.