Derivation Trees for Context-Free Grammars #
CFGTree T N is a derivation tree whose leaves hold terminal symbols of type T
and whose internal nodes hold a nonterminal of type N together with a list of
children. The file provides:
ValidFor g— validity: every internal node matches a production rule ingyield/yieldList— terminal frontier (left-to-right)subtreeAt?/replaceAt— position-based subtree access and replacementvalidFor_derives— soundness: a valid tree derives its yield from its root- Tree existence (
exists_valid_tree) and height–yield bounds - Size measure, minimality, pigeonhole on derivation paths
A derivation tree for a context-free grammar. Leaves hold terminal symbols; internal nodes hold a nonterminal and a list of children (matching a production rule's RHS).
- leaf {T : Type u_1} {N : Type u_2} (t : T) : CFGTree T N
- node {T : Type u_1} {N : Type u_2} (nt : N) (children : List (CFGTree T N)) : CFGTree T N
Instances For
The root symbol of a subtree.
Equations
- (CFGTree.leaf t).rootSymbol = Symbol.terminal t
- (CFGTree.node nt children).rootSymbol = Symbol.nonterminal nt
Instances For
The terminal frontier (yield) of a derivation tree, read left to right.
Equations
- (CFGTree.leaf t).yield = [t]
- (CFGTree.node nt children).yield = CFGTree.yieldList children
Instances For
Concatenate yields of a list of subtrees.
Equations
- CFGTree.yieldList [] = []
- CFGTree.yieldList (t :: ts) = t.yield ++ CFGTree.yieldList ts
Instances For
The height: 0 for leaves, 1 + max child height for nodes.
Equations
- (CFGTree.leaf t).height = 0
- (CFGTree.node nt children).height = 1 + CFGTree.heightMax children
Instances For
Maximum height among a list of subtrees.
Equations
- CFGTree.heightMax [] = 0
- CFGTree.heightMax (t :: ts) = max t.height (CFGTree.heightMax ts)
Instances For
A derivation tree is valid for a CFG if every internal node (A, children) corresponds to a rule A → [rootSymbol c₁, ..., rootSymbol cₖ], and all children are themselves valid.
- leaf {T : Type u_1} {g : ContextFreeGrammar T} (t : T) : ValidFor g (CFGTree.leaf t)
- node {T : Type u_1} {g : ContextFreeGrammar T} (nt : g.NT) (children : List (CFGTree T g.NT)) (hrule : { input := nt, output := List.map rootSymbol children } ∈ g.rules) (hchildren : ∀ c ∈ children, ValidFor g c) : ValidFor g (CFGTree.node nt children)
Instances For
Number of times rule r is applied at internal nodes of the
derivation tree t. A leaf contributes nothing; a node (nt, cs)
contributes 1 if its instantiated rule ⟨nt, cs.map rootSymbol⟩
matches r, plus the count over its children. Substrate primitive
used by any weighted-CFG analysis (PCFG corpus probability,
Dirichlet posterior, etc.).
Equations
- CFGTree.ruleCount r (CFGTree.leaf t) = 0
- CFGTree.ruleCount r (CFGTree.node nt children) = (if r = { input := nt, output := List.map CFGTree.rootSymbol children } then 1 else 0) + CFGTree.ruleCountList r children
Instances For
List version of ruleCount (mutual companion).
Equations
- CFGTree.ruleCountList r [] = 0
- CFGTree.ruleCountList r (t :: ts) = CFGTree.ruleCount r t + CFGTree.ruleCountList r ts
Instances For
Total number of times rule r is used across a corpus D of
derivation trees.
Equations
- CFGTree.corpusRuleCount r D = (Multiset.map (CFGTree.ruleCount r) D).sum
Instances For
Empty corpus contributes no rule applications.
Corpus rule counts add over disjoint corpora — corpusRuleCount
is a Multiset sum over ruleCount, which respects multiset
addition by Multiset.map_add + Multiset.sum_add.
Maximum rule RHS length in a grammar (at least 2).
We take the max over all rules' output lengths, floored at 2 to ensure the branching factor is nontrivial (a tree of branching ≥ 2 and height h has at most b^h leaves).
Equations
- g.maxBranch = max 2 (List.foldl max 0 (List.map (fun (x : ContextFreeRule T g.NT) => x.output.length) g.rules.val.toList))
Instances For
The pumping constant for a CFG: b^(k+1) where b = maxBranch ≥ 2 and k = number of rules (upper bound on distinct nonterminals).
Equations
- g.pumpingConstant = g.maxBranch ^ (g.rules.card + 1)
Instances For
maxBranch is at least 2.
The pumping constant is positive (b ≥ 2 so b^(k+1) ≥ 2).
Tree existence. Every word in a CFG's language has a valid derivation tree rooted at the start symbol.
Proof: applies forest_exists to the start nonterminal [g.initial],
which yields a singleton forest containing the desired tree. The
forest_exists lemma generalizes to all sentential forms by induction
on the derivation, with each Produces step "folding" the children of
the rewritten nonterminal back into a single node tree.
Height–yield bound. A valid derivation tree of height h has at most b^h terminal leaves, where b is the max branching factor.
Proof: well-founded recursion on tree size. A leaf has height 0 and 1 = b⁰ leaves. A node with children c₁...cₖ (k ≤ b) has |yield| = Σᵢ |yield(cᵢ)| ≤ k · b^(max heights) ≤ b · b^(h-1) = b^h.
A position in a tree: list of child indices to follow from the root.
Equations
- CFGTree.Pos = List ℕ
Instances For
Subtree at a given position. Returns none if the path is invalid.
Equations
- x✝.subtreeAt? [] = some x✝
- (CFGTree.leaf t).subtreeAt? (head :: tail) = none
- (CFGTree.node nt children).subtreeAt? (i :: rest) = children[i]?.bind fun (x : CFGTree T N) => x.subtreeAt? rest
Instances For
Replace the subtree at a given position. If the path is invalid, returns the original tree unchanged.
Equations
- x✝¹.replaceAt [] x✝ = x✝
- (CFGTree.leaf t).replaceAt (head :: tail) x✝ = CFGTree.leaf t
- (CFGTree.node nt children).replaceAt (i :: rest) x✝ = if h : i < children.length then CFGTree.node nt (children.set i (children[i].replaceAt rest x✝)) else CFGTree.node nt children
Instances For
Replacing a subtree at a non-root position preserves the root symbol.
Yield decomposition: replacing a subtree at position p produces yield
pre ++ new.yield ++ post, where pre/post are the surrounding context.
Replacing a subtree of the same root symbol preserves validity.
For a tree of height ≥ k+1, there exists a position list of length k
such that the subtree at that position has height ≥ 1 (i.e., is a .node).
Stronger: extract a max-descent path of length k, with subtree at depth i having height = t.height - i.
subtreeAt? splits along path concatenation.
For a valid tree and a path that descends, each prefix subtree is a .node.
Validity propagates through subtreeAt?.
Extract the rule used at a position (option-valued).
Equations
- t.ruleAt? p = match t.subtreeAt? p with | some (CFGTree.node nt children) => some { input := nt, output := List.map CFGTree.rootSymbol children } | x => none
Instances For
For a valid tree at a .node subtree, ruleAt? returns the matching rule in g.rules.
Equations
- (CFGTree.leaf t).size = 1
- (CFGTree.node nt children).size = 1 + CFGTree.sizeList children
Instances For
Equations
- CFGTree.sizeList [] = 0
- CFGTree.sizeList (t :: ts) = t.size + CFGTree.sizeList ts
Instances For
Existence of minimum-size valid tree with given yield and root.
Pigeonhole: along a long-enough valid path, two prefixes have same root nonterminal.