Documentation

Linglib.Core.Computability.ContextFreeGrammar.Tree

Derivation Trees for Context-Free Grammars #

CFGTree T N is a derivation tree whose leaves hold terminal symbols of type T and whose internal nodes hold a nonterminal of type N together with a list of children. The file provides:

inductive CFGTree (T : Type u_1) (N : Type u_2) :
Type (max u_1 u_2)

A derivation tree for a context-free grammar. Leaves hold terminal symbols; internal nodes hold a nonterminal and a list of children (matching a production rule's RHS).

Instances For
    def CFGTree.rootSymbol {T : Type u_1} {N : Type u_2} :
    CFGTree T NSymbol T N

    The root symbol of a subtree.

    Equations
    Instances For
      def CFGTree.yield {T : Type u_1} {N : Type u_2} :
      CFGTree T NList T

      The terminal frontier (yield) of a derivation tree, read left to right.

      Equations
      Instances For
        def CFGTree.yieldList {T : Type u_1} {N : Type u_2} :
        List (CFGTree T N)List T

        Concatenate yields of a list of subtrees.

        Equations
        Instances For
          def CFGTree.height {T : Type u_1} {N : Type u_2} :
          CFGTree T N

          The height: 0 for leaves, 1 + max child height for nodes.

          Equations
          Instances For
            def CFGTree.heightMax {T : Type u_1} {N : Type u_2} :
            List (CFGTree T N)

            Maximum height among a list of subtrees.

            Equations
            Instances For
              inductive CFGTree.ValidFor {T : Type u_1} (g : ContextFreeGrammar T) :
              CFGTree T g.NTProp

              A derivation tree is valid for a CFG if every internal node (A, children) corresponds to a rule A → [rootSymbol c₁, ..., rootSymbol cₖ], and all children are themselves valid.

              Instances For
                def CFGTree.ruleCount {T : Type u_1} {N : Type u_2} [DecidableEq T] [DecidableEq N] (r : ContextFreeRule T N) :
                CFGTree T N

                Number of times rule r is applied at internal nodes of the derivation tree t. A leaf contributes nothing; a node (nt, cs) contributes 1 if its instantiated rule ⟨nt, cs.map rootSymbol⟩ matches r, plus the count over its children. Substrate primitive used by any weighted-CFG analysis (PCFG corpus probability, Dirichlet posterior, etc.).

                Equations
                Instances For
                  def CFGTree.ruleCountList {T : Type u_1} {N : Type u_2} [DecidableEq T] [DecidableEq N] (r : ContextFreeRule T N) :
                  List (CFGTree T N)

                  List version of ruleCount (mutual companion).

                  Equations
                  Instances For
                    def CFGTree.corpusRuleCount {T : Type u_1} {N : Type u_2} [DecidableEq T] [DecidableEq N] (r : ContextFreeRule T N) (D : Multiset (CFGTree T N)) :

                    Total number of times rule r is used across a corpus D of derivation trees.

                    Equations
                    Instances For
                      @[simp]
                      theorem CFGTree.corpusRuleCount_zero {T : Type u_1} {N : Type u_2} [DecidableEq T] [DecidableEq N] (r : ContextFreeRule T N) :

                      Empty corpus contributes no rule applications.

                      theorem CFGTree.corpusRuleCount_add {T : Type u_1} {N : Type u_2} [DecidableEq T] [DecidableEq N] (r : ContextFreeRule T N) (D₁ D₂ : Multiset (CFGTree T N)) :
                      corpusRuleCount r (D₁ + D₂) = corpusRuleCount r D₁ + corpusRuleCount r D₂

                      Corpus rule counts add over disjoint corpora — corpusRuleCount is a Multiset sum over ruleCount, which respects multiset addition by Multiset.map_add + Multiset.sum_add.

                      noncomputable def ContextFreeGrammar.maxBranch {T : Type u_1} (g : ContextFreeGrammar T) :

                      Maximum rule RHS length in a grammar (at least 2).

                      We take the max over all rules' output lengths, floored at 2 to ensure the branching factor is nontrivial (a tree of branching ≥ 2 and height h has at most b^h leaves).

                      Equations
                      • g.maxBranch = max 2 (List.foldl max 0 (List.map (fun (x : ContextFreeRule T g.NT) => x.output.length) g.rules.val.toList))
                      Instances For
                        noncomputable def ContextFreeGrammar.pumpingConstant {T : Type u_1} (g : ContextFreeGrammar T) :

                        The pumping constant for a CFG: b^(k+1) where b = maxBranch ≥ 2 and k = number of rules (upper bound on distinct nonterminals).

                        Equations
                        Instances For
                          theorem ContextFreeGrammar.maxBranch_ge_two {T : Type u_1} (g : ContextFreeGrammar T) :
                          g.maxBranch 2

                          maxBranch is at least 2.

                          theorem ContextFreeGrammar.pumpingConstant_pos {T : Type u_1} (g : ContextFreeGrammar T) :

                          The pumping constant is positive (b ≥ 2 so b^(k+1) ≥ 2).

                          theorem exists_valid_tree {T : Type u_1} (g : ContextFreeGrammar T) (w : List T) (hw : w g.language) :
                          ∃ (t : CFGTree T g.NT), CFGTree.ValidFor g t t.yield = w t.rootSymbol = Symbol.nonterminal g.initial

                          Tree existence. Every word in a CFG's language has a valid derivation tree rooted at the start symbol.

                          Proof: applies forest_exists to the start nonterminal [g.initial], which yields a singleton forest containing the desired tree. The forest_exists lemma generalizes to all sentential forms by induction on the derivation, with each Produces step "folding" the children of the rewritten nonterminal back into a single node tree.

                          theorem yield_length_le_of_height {T : Type u_1} (g : ContextFreeGrammar T) (t : CFGTree T g.NT) (ht : CFGTree.ValidFor g t) :
                          t.yield.length g.maxBranch ^ t.height

                          Height–yield bound. A valid derivation tree of height h has at most b^h terminal leaves, where b is the max branching factor.

                          Proof: well-founded recursion on tree size. A leaf has height 0 and 1 = b⁰ leaves. A node with children c₁...cₖ (k ≤ b) has |yield| = Σᵢ |yield(cᵢ)| ≤ k · b^(max heights) ≤ b · b^(h-1) = b^h.

                          @[reducible, inline]

                          A position in a tree: list of child indices to follow from the root.

                          Equations
                          Instances For
                            def CFGTree.subtreeAt? {T : Type u_1} {N : Type u_2} :
                            CFGTree T NPosOption (CFGTree T N)

                            Subtree at a given position. Returns none if the path is invalid.

                            Equations
                            Instances For
                              def CFGTree.replaceAt {T : Type u_1} {N : Type u_2} :
                              CFGTree T NPosCFGTree T NCFGTree T N

                              Replace the subtree at a given position. If the path is invalid, returns the original tree unchanged.

                              Equations
                              Instances For
                                theorem CFGTree.rootSymbol_replaceAt_cons {T : Type u_1} {N : Type u_2} (t : CFGTree T N) (i : ) (rest : Pos) (new : CFGTree T N) :
                                (t.replaceAt (i :: rest) new).rootSymbol = t.rootSymbol

                                Replacing a subtree at a non-root position preserves the root symbol.

                                theorem CFGTree.yield_replaceAt_decomp {T : Type u_1} {N : Type u_2} (t : CFGTree T N) (p : Pos) (sub : CFGTree T N) (h : t.subtreeAt? p = some sub) :
                                ∃ (pre : List T) (post : List T), t.yield = pre ++ sub.yield ++ post ∀ (new : CFGTree T N), (t.replaceAt p new).yield = pre ++ new.yield ++ post

                                Yield decomposition: replacing a subtree at position p produces yield pre ++ new.yield ++ post, where pre/post are the surrounding context.

                                theorem CFGTree.validFor_replaceAt {T : Type u_1} {g : ContextFreeGrammar T} (t : CFGTree T g.NT) (p : Pos) (sub new : CFGTree T g.NT) (h : t.subtreeAt? p = some sub) (hroot : new.rootSymbol = sub.rootSymbol) (ht_valid : ValidFor g t) (hnew_valid : ValidFor g new) :
                                ValidFor g (t.replaceAt p new)

                                Replacing a subtree of the same root symbol preserves validity.

                                theorem CFGTree.exists_pos_of_height {T : Type u_1} {N : Type u_2} (t : CFGTree T N) (k : ) (h : t.height k + 1) :
                                ∃ (p : Pos), List.length p = k ∃ (sub : CFGTree T N), t.subtreeAt? p = some sub sub.height 1

                                For a tree of height ≥ k+1, there exists a position list of length k such that the subtree at that position has height ≥ 1 (i.e., is a .node).

                                theorem CFGTree.exists_pos_max_descent {T : Type u_1} {N : Type u_2} (t : CFGTree T N) (k : ) (h : t.height k + 1) :
                                ∃ (p : Pos), List.length p = k ik, ∃ (sub : CFGTree T N), t.subtreeAt? (List.take i p) = some sub sub.height = t.height - i

                                Stronger: extract a max-descent path of length k, with subtree at depth i having height = t.height - i.

                                theorem CFGTree.subtreeAt?_append {T : Type u_1} {N : Type u_2} (t : CFGTree T N) (p1 p2 : Pos) :
                                t.subtreeAt? (p1 ++ p2) = (t.subtreeAt? p1).bind fun (x : CFGTree T N) => x.subtreeAt? p2

                                subtreeAt? splits along path concatenation.

                                theorem CFGTree.spine_node_at_prefix {T : Type u_1} {N : Type u_2} (t : CFGTree T N) (p : Pos) (sub : CFGTree T N) (hsub : t.subtreeAt? p = some sub) (hsub_h : sub.height 1) (k : ) (hk : k < List.length p + 1) :
                                ∃ (nt : N) (children : List (CFGTree T N)), t.subtreeAt? (List.take k p) = some (node nt children)

                                For a valid tree and a path that descends, each prefix subtree is a .node.

                                theorem CFGTree.subtreeAt?_validFor {T : Type u_1} {g : ContextFreeGrammar T} (t : CFGTree T g.NT) (ht : ValidFor g t) (p : Pos) (sub : CFGTree T g.NT) (hsub : t.subtreeAt? p = some sub) :
                                ValidFor g sub

                                Validity propagates through subtreeAt?.

                                def CFGTree.ruleAt? {T : Type u_1} {g : ContextFreeGrammar T} (t : CFGTree T g.NT) (p : Pos) :
                                Option (ContextFreeRule T g.NT)

                                Extract the rule used at a position (option-valued).

                                Equations
                                Instances For
                                  theorem CFGTree.ruleAt?_mem_rules {T : Type u_1} {g : ContextFreeGrammar T} (t : CFGTree T g.NT) (ht : ValidFor g t) (p : Pos) (nt : g.NT) (children : List (CFGTree T g.NT)) (hsub : t.subtreeAt? p = some (node nt children)) :
                                  t.ruleAt? p = some { input := nt, output := List.map rootSymbol children } { input := nt, output := List.map rootSymbol children } g.rules

                                  For a valid tree at a .node subtree, ruleAt? returns the matching rule in g.rules.

                                  def CFGTree.size {T : Type u_1} {N : Type u_2} :
                                  CFGTree T N
                                  Equations
                                  Instances For
                                    def CFGTree.sizeList {T : Type u_1} {N : Type u_2} :
                                    List (CFGTree T N)
                                    Equations
                                    Instances For
                                      theorem CFGTree.size_pos {T : Type u_1} {N : Type u_2} (t : CFGTree T N) :
                                      t.size 1
                                      theorem CFGTree.sizeList_le_of_mem {T : Type u_1} {N : Type u_2} {t : CFGTree T N} {ts : List (CFGTree T N)} (h : t ts) :
                                      theorem CFGTree.size_subtreeAt?_le {T : Type u_1} {N : Type u_2} (t : CFGTree T N) (p : Pos) (sub : CFGTree T N) (h : t.subtreeAt? p = some sub) :
                                      sub.size t.size
                                      theorem CFGTree.size_subtreeAt?_lt_of_cons {T : Type u_1} {N : Type u_2} (t : CFGTree T N) (i : ) (rest : Pos) (sub : CFGTree T N) (h : t.subtreeAt? (i :: rest) = some sub) :
                                      sub.size < t.size
                                      theorem CFGTree.sizeList_set {T : Type u_1} {N : Type u_2} (l : List (CFGTree T N)) (i : ) (x : CFGTree T N) (hi : i < l.length) :
                                      sizeList (l.set i x) + l[i].size = sizeList l + x.size
                                      theorem CFGTree.size_replaceAt_lt {T : Type u_1} {N : Type u_2} (t : CFGTree T N) (p : Pos) (sub new : CFGTree T N) (h : t.subtreeAt? p = some sub) (hlt : new.size < sub.size) :
                                      (t.replaceAt p new).size < t.size

                                      Replacing a subtree with a strictly smaller one gives a strictly smaller tree.

                                      theorem CFGTree.exists_min_size_tree {T : Type u_1} {g : ContextFreeGrammar T} (t : CFGTree T g.NT) (ht : ValidFor g t) :
                                      ∃ (t_min : CFGTree T g.NT), ValidFor g t_min t_min.yield = t.yield t_min.rootSymbol = t.rootSymbol ∀ (t' : CFGTree T g.NT), ValidFor g t't'.yield = t.yieldt'.rootSymbol = t.rootSymbolt_min.size t'.size

                                      Existence of minimum-size valid tree with given yield and root.

                                      theorem CFGTree.exists_repeat_root {T : Type u_1} {g : ContextFreeGrammar T} (t : CFGTree T g.NT) (ht : ValidFor g t) (p : Pos) (sub : CFGTree T g.NT) (hsub : t.subtreeAt? p = some sub) (hsub_h : sub.height 1) (hlen : List.length p g.rules.card) :
                                      ∃ (i : ) (j : ), i < j j List.length p ∃ (ntᵢ : g.NT) (children_i : List (CFGTree T g.NT)) (ntⱼ : g.NT) (children_j : List (CFGTree T g.NT)), t.subtreeAt? (List.take i p) = some (node ntᵢ children_i) t.subtreeAt? (List.take j p) = some (node ntⱼ children_j) ntᵢ = ntⱼ

                                      Pigeonhole: along a long-enough valid path, two prefixes have same root nonterminal.

                                      theorem CFGTree.validFor_derives {T : Type u_3} {g : ContextFreeGrammar T} (t : CFGTree T g.NT) (ht : ValidFor g t) :
                                      g.Derives [t.rootSymbol] (List.map Symbol.terminal t.yield)