Documentation

Linglib.Core.Computability.ContextFreeGrammar.InterRegular

Closure of IsContextFree under intersection with regular languages #

The intersection of a context-free language L with a regular language R is context-free. Headline theorem in @cite{hopcroft-motwani-ullman-2000} Theorem 7.27 (p. 286); originally proved in @cite{bar-hillel-perles-shamir-1961}.

Two textbook constructions exist for this theorem:

  1. PDA-product (HMU 2000 §7.3.4): take a PDA P for L and a DFA A for R, build a single PDA P' with state set Q_P × Q_A running them in lockstep. Cleaner and shorter than (2), but requires PDA infrastructure (PDA type + bidirectional CFG↔PDA equivalence) which mathlib does not currently have.

  2. Grammar-level Bar-Hillel triple-product (BPS 1961, the original): build a new CFG G' directly from G and A, with state-pair-annotated nonterminals Q × V × Q ⊕ Unit. Heavier construction (more rules, more bookkeeping) but reachable from mathlib's ContextFreeGrammar substrate without adding PDAs. This file mechanizes (2). If mathlib later gains Mathlib.Computability.PushdownAutomaton, an alternate proof via (1) becomes available; this construction stays as the direct grammar-level proof.

The (2) construction:

Owns the ContextFreeGrammar.product definition and proves Language.IsContextFree.inter_isRegular. Together with the parallel homomorphism-closure proof in Map.lean, this dissolves the closure axioms previously in Closure.lean.

theorem ContextFreeRule.Rewrites.append_split {T : Type u} {N : Type u_1} {r : ContextFreeRule T N} {u₁ u₂ v : List (Symbol T N)} :
r.Rewrites (u₁ ++ u₂) v(∃ (v₁ : List (Symbol T N)), v = v₁ ++ u₂ r.Rewrites u₁ v₁) ∃ (v₂ : List (Symbol T N)), v = u₁ ++ v₂ r.Rewrites u₂ v₂

A single-rule rewrite of a concatenation u₁ ++ u₂ happens entirely in u₁ or entirely in u₂. The output decomposes correspondingly.

theorem ContextFreeGrammar.Produces.append_split {T : Type u} {G : ContextFreeGrammar T} {s₁ s₂ v : List (Symbol T G.NT)} (hp : G.Produces (s₁ ++ s₂) v) :
(∃ (t₁ : List (Symbol T G.NT)), v = t₁ ++ s₂ G.Produces s₁ t₁) ∃ (t₂ : List (Symbol T G.NT)), v = s₁ ++ t₂ G.Produces s₂ t₂

A single-step rewrite of s₁ ++ s₂ happens entirely in s₁ or entirely in s₂.

theorem ContextFreeGrammar.Derives.append_split {T : Type u} {G : ContextFreeGrammar T} {s₁ s₂ t : List (Symbol T G.NT)} :
G.Derives (s₁ ++ s₂) t∃ (t₁ : List (Symbol T G.NT)) (t₂ : List (Symbol T G.NT)), t = t₁ ++ t₂ G.Derives s₁ t₁ G.Derives s₂ t₂

Splitting lemma for derivations. A derivation s₁ ++ s₂ ⇒* t decomposes into per-side derivations: s₁ ⇒* t₁, s₂ ⇒* t₂, with t = t₁ ++ t₂.

def ContextFreeGrammar.productNT {T : Type u} (G : ContextFreeGrammar T) (σ : Type) :

The product nonterminal type: state-pair-annotated G-nonterminals plus a fresh start symbol inr (). Defined as def (not abbrev) to keep the type display stable through (G.product M).NT projections.

Equations
Instances For
    def ContextFreeGrammar.nonterminalCount {T : Type u} {N : Type u_1} (out : List (Symbol T N)) :

    Count the number of nonterminal symbols in a list of symbols.

    Equations
    Instances For
      def ContextFreeGrammar.annotateOutput {T : Type u} {σ : Type} {G : ContextFreeGrammar T} (M : DFA T σ) :
      σList (Symbol T G.NT)List σσ × List (Symbol T (G.productNT σ))

      Annotate out from start state start, using path as the chosen exit states for nonterminals (in order). Returns the end state and annotated output. If path is too short, falls back to start for missing exit states (those configurations don't produce well-formed Bar-Hillel rules and are filtered out by the rule-generation enumeration).

      Equations
      Instances For
        def ContextFreeGrammar.generatedRule {T : Type u} {σ : Type} {G : ContextFreeGrammar T} (M : DFA T σ) (r : ContextFreeRule T G.NT) (start : σ) (path : List σ) :
        ContextFreeRule T (G.productNT σ)

        The G' rule for a chosen (start, path) of state choices applied to a G-rule r.

        Equations
        • One or more equations did not get rendered due to their size.
        Instances For
          noncomputable def ContextFreeGrammar.rulesFromRule {T : Type u} {σ : Type} {G : ContextFreeGrammar T} (M : DFA T σ) [Fintype σ] [DecidableEq σ] (r : ContextFreeRule T G.NT) :
          Finset (ContextFreeRule T (G.productNT σ))

          All G' rules generated from a single G-rule. Enumerates over all (start, path) with path a list of length nonterminalCount r.output.

          Equations
          • One or more equations did not get rendered due to their size.
          Instances For
            noncomputable def ContextFreeGrammar.startRules {T : Type u} {σ : Type} (M : DFA T σ) [Fintype σ] (G : ContextFreeGrammar T) :
            Finset (ContextFreeRule T (G.productNT σ))

            The start rules: for each accepting state qf, branch the start symbol .inr () into [.nonterminal (.inl (M.start, G.initial, qf))].

            Equations
            • ContextFreeGrammar.startRules M G = Finset.image (fun (qf : σ) => { input := Sum.inr (), output := [Symbol.nonterminal (Sum.inl (M.start, G.initial, qf))] }) {x : σ | x M.accept}
            Instances For
              noncomputable def ContextFreeGrammar.product {T : Type u} {σ : Type} [Fintype σ] [DecidableEq σ] (G : ContextFreeGrammar T) (M : DFA T σ) :
              ContextFreeGrammar T

              The Bar-Hillel triple-product CFG: generates G.language ∩ M.accepts.

              Equations
              Instances For
                inductive Projects {T : Type u} {G : ContextFreeGrammar T} {σ : Type} :
                List (Symbol T (G.productNT σ))List (Symbol T G.NT)Prop

                A G' symbol-list s' projects to a G symbol-list s if every .terminal matches and every .nonterminal (.inl (_, A, _)) projects to .nonterminal A. The constructor doesn't allow .nonterminal (.inr ()), enforcing that projection is only defined for "post-start" sentential forms.

                • nil {T : Type u} {G : ContextFreeGrammar T} {σ : Type} : Projects [] []
                • terminal {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {a : T} {rest : List (Symbol T (G.productNT σ))} {rest' : List (Symbol T G.NT)} (h : Projects rest rest') : Projects (Symbol.terminal a :: rest) (Symbol.terminal a :: rest')
                • nonterminal {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {p : σ} {A : G.NT} {q : σ} {rest : List (Symbol T (G.productNT σ))} {rest' : List (Symbol T G.NT)} (h : Projects rest rest') : Projects (Symbol.nonterminal (Sum.inl (p, A, q)) :: rest) (Symbol.nonterminal A :: rest')
                Instances For
                  theorem Projects.append {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {s'₁ s'₂ : List (Symbol T (G.productNT σ))} {s₁ s₂ : List (Symbol T G.NT)} (h₁ : Projects s'₁ s₁) (h₂ : Projects s'₂ s₂) :
                  Projects (s'₁ ++ s'₂) (s₁ ++ s₂)

                  Projection respects concatenation.

                  theorem Projects.split_right {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {s' : List (Symbol T (G.productNT σ))} {s₁ s₂ : List (Symbol T G.NT)} (h : Projects s' (s₁ ++ s₂)) :
                  ∃ (s'₁ : List (Symbol T (G.productNT σ))) (s'₂ : List (Symbol T (G.productNT σ))), s' = s'₁ ++ s'₂ Projects s'₁ s₁ Projects s'₂ s₂

                  A projection of s' to a concatenation s₁ ++ s₂ decomposes accordingly.

                  theorem Projects.split_left {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {s : List (Symbol T G.NT)} {s'₁ s'₂ : List (Symbol T (G.productNT σ))} (h : Projects (s'₁ ++ s'₂) s) :
                  ∃ (s₁ : List (Symbol T G.NT)) (s₂ : List (Symbol T G.NT)), s = s₁ ++ s₂ Projects s'₁ s₁ Projects s'₂ s₂

                  A projection from a concatenation s'₁ ++ s'₂ to s decomposes the target correspondingly.

                  theorem Projects.map_terminal {T : Type u} {G : ContextFreeGrammar T} {σ : Type} (w : List T) :
                  Projects (List.map Symbol.terminal w) (List.map Symbol.terminal w)

                  The projection of a list of terminals (G'-side) to the same list of terminals (G-side).

                  theorem Projects.eq_map_terminal_of_projects {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {s' : List (Symbol T (G.productNT σ))} {w : List T} (h : Projects s' (List.map Symbol.terminal w)) :
                  s' = List.map Symbol.terminal w

                  If s' projects to w.map .terminal, then s' is itself a list of terminals (matching w).

                  inductive Consistent {T : Type u} {G : ContextFreeGrammar T} {σ : Type} (M : DFA T σ) :
                  σList (Symbol T (G.productNT σ))σProp

                  A G' symbol-list is Consistent M p s' q if walking through it from state p (advancing via M.step for terminals, jumping via the nonterminal's annotated state pair) ends at state q.

                  • nil {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {M : DFA T σ} (p : σ) : Consistent M p [] p
                  • terminal {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {M : DFA T σ} {p q : σ} (a : T) {rest : List (Symbol T (G.productNT σ))} (h : Consistent M (M.step p a) rest q) : Consistent M p (Symbol.terminal a :: rest) q
                  • nonterminal {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {M : DFA T σ} {p p' q : σ} (A : G.NT) {rest : List (Symbol T (G.productNT σ))} (h : Consistent M p' rest q) : Consistent M p (Symbol.nonterminal (Sum.inl (p, A, p')) :: rest) q
                  Instances For
                    theorem Consistent.append {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {M : DFA T σ} {p q r : σ} {s'₁ s'₂ : List (Symbol T (G.productNT σ))} (h₁ : Consistent M p s'₁ q) (h₂ : Consistent M q s'₂ r) :
                    Consistent M p (s'₁ ++ s'₂) r

                    Consistency composes along concatenation.

                    theorem Consistent.split {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {M : DFA T σ} {p r : σ} {s'₁ s'₂ : List (Symbol T (G.productNT σ))} (h : Consistent M p (s'₁ ++ s'₂) r) :
                    ∃ (q : σ), Consistent M p s'₁ q Consistent M q s'₂ r

                    Consistency splits along concatenation.

                    theorem Consistent.map_terminal {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {M : DFA T σ} (p : σ) (w : List T) :
                    Consistent M p (List.map Symbol.terminal w) (M.evalFrom p w)

                    A list of terminals is consistent from p to M.evalFrom p w.

                    theorem Consistent.eval_of_terminal {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {M : DFA T σ} {p q : σ} {w : List T} (h : Consistent M p (List.map Symbol.terminal w) q) :
                    q = M.evalFrom p w

                    If a list of terminals is consistent from p to q, then q = M.evalFrom p w.

                    theorem ContextFreeGrammar.annotateOutput_projects {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {M : DFA T σ} (start : σ) (out : List (Symbol T G.NT)) (path : List σ) :
                    Projects (annotateOutput M start out path).2 out

                    The annotated output projects back to the original output.

                    theorem ContextFreeGrammar.annotateOutput_consistent {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {M : DFA T σ} (start : σ) (out : List (Symbol T G.NT)) (path : List σ) :
                    Consistent M start (annotateOutput M start out path).2 (annotateOutput M start out path).1

                    The annotated output is consistent from the start state to the returned end state.

                    @[simp]
                    theorem ContextFreeGrammar.generatedRule_input {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {M : DFA T σ} (r : ContextFreeRule T G.NT) (start : σ) (path : List σ) :
                    (generatedRule M r start path).input = Sum.inl (start, r.input, (annotateOutput M start r.output path).1)

                    A generated rule has the expected input shape.

                    @[simp]
                    theorem ContextFreeGrammar.generatedRule_output {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {M : DFA T σ} (r : ContextFreeRule T G.NT) (start : σ) (path : List σ) :
                    (generatedRule M r start path).output = (annotateOutput M start r.output path).2

                    A generated rule has the expected output shape.

                    theorem ContextFreeGrammar.mem_startRules {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {M : DFA T σ} [Fintype σ] {r : ContextFreeRule T (G.productNT σ)} :
                    r startRules M G qfM.accept, r = { input := Sum.inr (), output := [Symbol.nonterminal (Sum.inl (M.start, G.initial, qf))] }

                    Membership in startRules: every start rule is parametrized by some qf ∈ M.accept.

                    theorem ContextFreeGrammar.mem_rulesFromRule {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {M : DFA T σ} [Fintype σ] [DecidableEq σ] {r : ContextFreeRule T G.NT} {r' : ContextFreeRule T (G.productNT σ)} :
                    r' rulesFromRule M r ∃ (start : σ) (path_fn : Fin (nonterminalCount r.output)σ), r' = generatedRule M r start (List.ofFn path_fn)

                    Membership in rulesFromRule: every derivation rule is parametrized by some (start, path) choice.

                    theorem Projects.not_inr_mem {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {s' : List (Symbol T (G.productNT σ))} {s : List (Symbol T G.NT)} (h : Projects s' s) (u : Unit) :
                    Symbol.nonterminal (Sum.inr u)s'

                    A projected G' symbol-list contains no .nonterminal (.inr _) symbols.

                    theorem Projects.singleton_nonterminal {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {s' : List (Symbol T (G.productNT σ))} {A : G.NT} (h : Projects s' [Symbol.nonterminal A]) :
                    ∃ (p : σ) (q : σ), s' = [Symbol.nonterminal (Sum.inl (p, A, q))]

                    A G' symbol-list projecting to a singleton G-nonterminal is itself a singleton (with appropriate state annotation).

                    theorem Projects.of_singleton_nonterminal {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {p q : σ} {A : G.NT} {s : List (Symbol T G.NT)} (h : Projects [Symbol.nonterminal (Sum.inl (p, A, q))] s) :
                    s = [Symbol.nonterminal A]

                    The dual: a singleton .nonterminal (.inl ...) on the left projects to the corresponding .nonterminal A.

                    theorem Projects.eq_of_map_terminal_left {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {w : List T} {s : List (Symbol T G.NT)} (h : Projects (List.map Symbol.terminal w) s) :
                    s = List.map Symbol.terminal w

                    If a list of terminals on the left projects to s, then s is the same list of terminals on the G side.

                    theorem ContextFreeGrammar.product_produces_lift {T : Type u} {G : ContextFreeGrammar T} {σ : Type} [Fintype σ] [DecidableEq σ] {M : DFA T σ} {s' t' : List (Symbol T (G.productNT σ))} {s : List (Symbol T G.NT)} {p q : σ} (step : (G.product M).Produces s' t') (hp : Projects s' s) (hc : Consistent M p s' q) :
                    ∃ (t : List (Symbol T G.NT)), G.Produces s t Projects t' t Consistent M p t' q

                    One step of the G' Produces relation lifts to a one-step G Produces, given a Projects + Consistent annotation of the source. The target inherits the annotation.

                    theorem ContextFreeGrammar.product_derives_lift {T : Type u} {G : ContextFreeGrammar T} {σ : Type} [Fintype σ] [DecidableEq σ] {M : DFA T σ} {s' t' : List (Symbol T (G.productNT σ))} (hd : (G.product M).Derives s' t') {s : List (Symbol T G.NT)} {p q : σ} (hp : Projects s' s) (hc : Consistent M p s' q) :
                    ∃ (t : List (Symbol T G.NT)), G.Derives s t Projects t' t Consistent M p t' q

                    Multi-step lift: given Projects + Consistent annotation, lift any G' derivation to a G derivation, preserving annotations.

                    theorem ContextFreeGrammar.product_language_le {T : Type u} {σ : Type} [Fintype σ] [DecidableEq σ] (G : ContextFreeGrammar T) (M : DFA T σ) :
                    (G.product M).language G.languageM.accepts

                    Forward inclusion (Bar-Hillel): every word generated by G.product M is in G.language ⊓ M.accepts.

                    theorem ContextFreeGrammar.mem_rules_of_consistent {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {M : DFA T σ} [Fintype σ] [DecidableEq σ] {r : ContextFreeRule T G.NT} (hr_mem : r G.rules) {p q : σ} {β' : List (Symbol T (G.productNT σ))} (hp : Projects β' r.output) (hc : Consistent M p β' q) :
                    { input := Sum.inl (p, r.input, q), output := β' } (G.product M).rules

                    The G' rule corresponding to any consistent annotation of a G-rule's body is in (G.product M).rules.

                    theorem ContextFreeGrammar.start_rule_mem {T : Type u} {σ : Type} [Fintype σ] [DecidableEq σ] (G : ContextFreeGrammar T) (M : DFA T σ) {qf : σ} (hqf : qf M.accept) :
                    { input := Sum.inr (), output := [Symbol.nonterminal (Sum.inl (M.start, G.initial, qf))] } (G.product M).rules

                    The start rule for an accepting state qf is in (G.product M).rules.

                    theorem ContextFreeGrammar.exists_annotation_of_derives {T : Type u} {G : ContextFreeGrammar T} {σ : Type} {M : DFA T σ} [Fintype σ] [DecidableEq σ] {s : List (Symbol T G.NT)} {w : List T} (h_drv : G.Derives s (List.map Symbol.terminal w)) (p : σ) :
                    ∃ (s' : List (Symbol T (G.productNT σ))), Projects s' s Consistent M p s' (M.evalFrom p w) (G.product M).Derives s' (List.map Symbol.terminal w)

                    Annotation-existence lemma (the heart of the backward direction). For any G derivation of s to (w.map .terminal), and any starting DFA state p, there exists a Projects+Consistent annotation s' of s such that (G.product M).Derives s' (w.map .terminal).

                    Proved by head_induction_on on the G derivation: the refl case takes s' = w.map .terminal; the head case constructs the annotation by extracting a Projects+Consistent decomposition of the IH's annotation of mid (the post-step sentential form), then applies the corresponding G' rule (which exists by mem_rules_of_consistent).

                    theorem ContextFreeGrammar.le_product_language {T : Type u} {σ : Type} [Fintype σ] [DecidableEq σ] (G : ContextFreeGrammar T) (M : DFA T σ) :
                    G.languageM.accepts (G.product M).language

                    Backward inclusion (Bar-Hillel): every word in G.language ⊓ M.accepts is generated by G.product M.

                    theorem ContextFreeGrammar.product_language {T : Type u} {σ : Type} [Fintype σ] [DecidableEq σ] (G : ContextFreeGrammar T) (M : DFA T σ) :
                    (G.product M).language = G.languageM.accepts

                    Bar-Hillel theorem: (G.product M).language = G.language ⊓ M.accepts.

                    theorem Language.IsContextFree.inter_isRegular {α : Type u_1} {L R : Language α} (hL : L.IsContextFree) (hR : R.IsRegular) :
                    (LR).IsContextFree

                    CFL closure under intersection with a regular language (@cite{bar-hillel-perles-shamir-1961}; @cite{hopcroft-motwani-ullman-2000} Theorem 7.27): if L is context-free and R is regular, then L ∩ R is context-free.