Closure of IsContextFree under intersection with regular languages #
The intersection of a context-free language L with a regular language R is
context-free. Headline theorem in @cite{hopcroft-motwani-ullman-2000}
Theorem 7.27 (p. 286); originally proved in @cite{bar-hillel-perles-shamir-1961}.
Two textbook constructions exist for this theorem:
PDA-product (HMU 2000 §7.3.4): take a PDA
PforLand a DFAAforR, build a single PDAP'with state setQ_P × Q_Arunning them in lockstep. Cleaner and shorter than (2), but requires PDA infrastructure (PDA type + bidirectional CFG↔PDA equivalence) which mathlib does not currently have.Grammar-level Bar-Hillel triple-product (BPS 1961, the original): build a new CFG
G'directly fromGandA, with state-pair-annotated nonterminalsQ × V × Q ⊕ Unit. Heavier construction (more rules, more bookkeeping) but reachable from mathlib'sContextFreeGrammarsubstrate without adding PDAs. This file mechanizes (2). If mathlib later gainsMathlib.Computability.PushdownAutomaton, an alternate proof via (1) becomes available; this construction stays as the direct grammar-level proof.
The (2) construction:
- Start rules: for each accepting state
qf ∈ F, a ruleinr () → [.nonterminal (.inl (q₀, S, qf))]. Theinr ()start symbol branches into "the input must drive M fromq₀to some acceptingqf." - Derivation rules: for each
G-ruleA → βand each path of intermediate statesq₀, q₁, ..., qₖthrough the nonterminals ofβ, a rule(p, A, q) → annotate βwhere the annotated body tracks state transitions through terminals (deterministic) and nonterminals (chosen path).
Owns the ContextFreeGrammar.product definition and proves
Language.IsContextFree.inter_isRegular. Together with the parallel
homomorphism-closure proof in Map.lean, this dissolves the closure axioms
previously in Closure.lean.
A single-rule rewrite of a concatenation u₁ ++ u₂ happens entirely in
u₁ or entirely in u₂. The output decomposes correspondingly.
A single-step rewrite of s₁ ++ s₂ happens entirely in s₁ or entirely in
s₂.
Splitting lemma for derivations. A derivation s₁ ++ s₂ ⇒* t decomposes
into per-side derivations: s₁ ⇒* t₁, s₂ ⇒* t₂, with t = t₁ ++ t₂.
Count the number of nonterminal symbols in a list of symbols.
Equations
- ContextFreeGrammar.nonterminalCount out = List.countP (fun (s : Symbol T N) => match s with | Symbol.nonterminal n => true | x => false) out
Instances For
Annotate out from start state start, using path as the chosen exit
states for nonterminals (in order). Returns the end state and annotated
output. If path is too short, falls back to start for missing exit
states (those configurations don't produce well-formed Bar-Hillel rules and
are filtered out by the rule-generation enumeration).
Equations
- One or more equations did not get rendered due to their size.
- ContextFreeGrammar.annotateOutput M x✝¹ [] x✝ = (x✝¹, [])
Instances For
The G' rule for a chosen (start, path) of state choices applied to a
G-rule r.
Equations
- One or more equations did not get rendered due to their size.
Instances For
All G' rules generated from a single G-rule. Enumerates over all
(start, path) with path a list of length nonterminalCount r.output.
Equations
- One or more equations did not get rendered due to their size.
Instances For
The start rules: for each accepting state qf, branch the start symbol
.inr () into [.nonterminal (.inl (M.start, G.initial, qf))].
Equations
- ContextFreeGrammar.startRules M G = Finset.image (fun (qf : σ) => { input := Sum.inr (), output := [Symbol.nonterminal (Sum.inl (M.start, G.initial, qf))] }) {x : σ | x ∈ M.accept}
Instances For
The Bar-Hillel triple-product CFG: generates G.language ∩ M.accepts.
Equations
- G.product M = { NT := G.productNT σ, initial := Sum.inr (), rules := ContextFreeGrammar.startRules M G ∪ G.rules.biUnion (ContextFreeGrammar.rulesFromRule M) }
Instances For
A G' symbol-list s' projects to a G symbol-list s if every .terminal
matches and every .nonterminal (.inl (_, A, _)) projects to
.nonterminal A. The constructor doesn't allow .nonterminal (.inr ()),
enforcing that projection is only defined for "post-start" sentential
forms.
- nil {T : Type u} {G : ContextFreeGrammar T} {σ : Type} : Projects [] []
- terminal {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {a : T} {rest : List (Symbol T (G.productNT σ))} {rest' : List (Symbol T G.NT)} (h : Projects rest rest') : Projects (Symbol.terminal a :: rest) (Symbol.terminal a :: rest')
- nonterminal {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {p : σ} {A : G.NT} {q : σ} {rest : List (Symbol T (G.productNT σ))} {rest' : List (Symbol T G.NT)} (h : Projects rest rest') : Projects (Symbol.nonterminal (Sum.inl (p, A, q)) :: rest) (Symbol.nonterminal A :: rest')
Instances For
A projection of s' to a concatenation s₁ ++ s₂ decomposes accordingly.
A projection from a concatenation s'₁ ++ s'₂ to s decomposes the target
correspondingly.
The projection of a list of terminals (G'-side) to the same list of terminals (G-side).
If s' projects to w.map .terminal, then s' is itself a list of
terminals (matching w).
A G' symbol-list is Consistent M p s' q if walking through it from state
p (advancing via M.step for terminals, jumping via the nonterminal's
annotated state pair) ends at state q.
- nil {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {M : DFA T σ} (p : σ) : Consistent M p [] p
- terminal {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {M : DFA T σ} {p q : σ} (a : T) {rest : List (Symbol T (G.productNT σ))} (h : Consistent M (M.step p a) rest q) : Consistent M p (Symbol.terminal a :: rest) q
- nonterminal {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {M : DFA T σ} {p p' q : σ} (A : G.NT) {rest : List (Symbol T (G.productNT σ))} (h : Consistent M p' rest q) : Consistent M p (Symbol.nonterminal (Sum.inl (p, A, p')) :: rest) q
Instances For
Consistency composes along concatenation.
Consistency splits along concatenation.
A list of terminals is consistent from p to M.evalFrom p w.
If a list of terminals is consistent from p to q, then q = M.evalFrom p w.
The annotated output projects back to the original output.
The annotated output is consistent from the start state to the returned end state.
A generated rule has the expected input shape.
A generated rule has the expected output shape.
Membership in startRules: every start rule is parametrized by some
qf ∈ M.accept.
Membership in rulesFromRule: every derivation rule is parametrized by
some (start, path) choice.
A projected G' symbol-list contains no .nonterminal (.inr _) symbols.
A G' symbol-list projecting to a singleton G-nonterminal is itself a singleton (with appropriate state annotation).
The dual: a singleton .nonterminal (.inl ...) on the left projects to the corresponding .nonterminal A.
If a list of terminals on the left projects to s, then s is the same
list of terminals on the G side.
One step of the G' Produces relation lifts to a one-step G Produces, given a Projects + Consistent annotation of the source. The target inherits the annotation.
Multi-step lift: given Projects + Consistent annotation, lift any G' derivation to a G derivation, preserving annotations.
The G' rule corresponding to any consistent annotation of a G-rule's body
is in (G.product M).rules.
The start rule for an accepting state qf is in (G.product M).rules.
Annotation-existence lemma (the heart of the backward direction).
For any G derivation of s to (w.map .terminal), and any starting
DFA state p, there exists a Projects+Consistent annotation s' of s
such that (G.product M).Derives s' (w.map .terminal).
Proved by head_induction_on on the G derivation: the refl case takes
s' = w.map .terminal; the head case constructs the annotation by
extracting a Projects+Consistent decomposition of the IH's annotation
of mid (the post-step sentential form), then applies the corresponding
G' rule (which exists by mem_rules_of_consistent).
CFL closure under intersection with a regular language
(@cite{bar-hillel-perles-shamir-1961}; @cite{hopcroft-motwani-ullman-2000}
Theorem 7.27): if L is context-free and R is regular, then L ∩ R is
context-free.