Documentation

Linglib.Core.Computability.SyntacticMonoid.Equations

Equational characterizations of subregular language classes #

@cite{lambert-2026} §6.2 (paper p. 22-25, with summary in Table 6 p. 28) characterizes each base-class of subregular languages by a system of equations on the syntactic semigroup: D = ⟦sx^ω = x^ω⟧, K = ⟦x^ω y = x^ω⟧, LI = ⟦x^ω y x^ω = x^ω⟧, N = ⟦x^ω y = x^ω; yx^ω = x^ω⟧ (definite, reverse-definite, generalized-definite, co/finite, respectively).

This file lands the k-definite case (Lambert Prop 53, p. 23) as a feasibility probe — the simplest entry into Lambert's algebraic story because it requires no omegaPow (idempotent power) machinery. Lambert's claim:

A language is k-definite if and only if it is in ⟦sx₁ … xₖ = x₁ … xₖ⟧.

Mathlib precedent: monoid + length-k letter-sequence variables #

Lambert's syntactic semigroup excludes the empty word; our Language.syntacticMonoid L (built via Con (FreeMonoid α), see SyntacticMonoid.lean) includes the identity (the class of the empty word). Mathlib's Con.Quotient precedent gives us a Monoid, not a Semigroup; there is no established mathlib syntacticSemigroup pattern. We follow mathlib precedent and keep the Monoid setting.

Letter-sequence quantification (not arbitrary monoid elements) #

To recover Lambert's intended characterization in the monoid setting, the equation quantifies x₁, …, xₖ over length-k letter sequences in the alphabet α rather than over arbitrary syntactic-monoid elements. This is the "generators" interpretation: the equation says "prepending any prefix to a length-k letter sequence preserves its syntactic class".

The naïve pure-monoid form ∀ s ≠ 1, ∀ xs : List M of length k, (∀ x ∈ xs, x ≠ 1) → s * xs.prod = xs.prod is strictly weaker and does not characterize k-definite. Concrete counterexample: take L = (a|b)* over the alphabet {a, b, c} — membership is "no c anywhere". Then [a] = [b] = 1 in the syntactic monoid (inserting a or b anywhere preserves "no c"), and the syntactic monoid is the rank-2 lattice M = {1, 0} with 0 = [c] absorbing. The pure-monoid equation trivially holds: the only non-identity element is 0, and 0 * 0 = 0. Yet L is not k-definite for any k — the words c·a^k and a^k share the length-k suffix a^k but only the latter is in L.

The letter-sequence formulation rules this out: for αs = [a] of length 1, the syntactic class of αs is 1 in M, and the equation s * 1 = 1 forces s = 1, which fails for s = [c] = 0. So the equation correctly detects that L is not 1-definite.

(Lambert's semigroup version sidesteps the trivial-letter issue because his syntactic semigroup is generated only by the non-trivial letter classes, implicitly ignoring L-trivial letters in the alphabet. The letter-sequence monoid form makes this explicit.)

Main definitions #

Main results #

In the same file, Lambert Prop 57 (reverse-definite, K) and Prop 58 (generalized definite, ℒℐ) are also landed using the same letter-sequence template. The Pin omega-power forms (Pin.lean) consume these finite-k iffs to derive their own iffs.

Out of scope (queued for follow-up files) #

References #

def Language.kDefiniteEquation {α : Type u_1} (L : Language α) (k : ) :

Lambert (2026) Prop 53 equation (length-k letter-sequence form): for any element s of L.syntacticMonoid and any length-k letter sequence αs : List α, prepending s to the syntactic class of αs preserves it.

The variables x₁, …, xₖ of Lambert's notation ⟦sx₁ … xₖ = x₁ … xₖ⟧ are instantiated by the letters of αs, each xᵢ as the syntactic class of a single letter, so the product x₁ ⋯ xₖ corresponds to the syntactic class of αs. See file docstring for the rationale: the alternative pure-monoid form quantifying over arbitrary xs : List M is strictly weaker and fails to characterize k-definite.

Equations
Instances For
    theorem Language.takeAt_right_append_left_absorb {α : Type u_2} (x rest : List α) {k : } (h : k rest.length) :

    The right-k-suffix of x ++ rest equals the right-k-suffix of rest whenever rest.length ≥ k.

    theorem Language.syntacticEquiv_of_kDefiniteEquation {α : Type u_1} {L : Language α} {k : } (h : L.kDefiniteEquation k) (w αs : List α) (hαs_len : αs.length = k) :
    L.SyntacticEquiv (w ++ αs) αs

    Under the k-definite equation, prepending any prefix to a length-k word gives a syntactically equivalent word.

    This is the algebraic heart of the reverse direction: applying the equation at s = [w] and unfolding L.toSyntacticMonoid (w * αs) = L.toSyntacticMonoid w * L.toSyntacticMonoid αs gives the syntactic-monoid equality [w * αs] = [αs], which lifts via Quotient.exact to the two-sided syntactic equivalence on the underlying lists.

    Lambert Prop 53 (forward direction): a k-definite language's syntactic monoid satisfies the k-definite equation: prepending any syntactic-monoid element s to a length-k letter sequence αs preserves the syntactic class of αs.

    Lambert Prop 53 (reverse direction): if a language's syntactic monoid satisfies the k-definite equation, then the language is k-definite.

    Construction: G.permitted := {Edge.right.takeAt k w | w ∈ L}. The trivial direction L ⊆ G.lang holds with witness w' = w. The interesting direction G.lang ⊆ L: if w ∈ G.lang, there is some u ∈ L with the same length-k right-suffix; case-split on u.length, w.length vs k:

    1. Both u.length ≤ k, w.length ≤ k: their takeAt k-suffixes are u, w themselves; equality forces w = u ∈ L.
    2. u.length ≤ k, w.length > k: the suffix-length match forces u.length = k, so u is the length-k right-suffix of w. Apply the equation to get SyntacticEquiv L w u; saturation closes.
    3. u.length > k, w.length ≤ k: symmetric to (2).
    4. Both u.length > k, w.length > k: both decompose as prefix ++ ks for the shared length-k suffix ks. Apply equation to both, getting SyntacticEquiv L w ks and SyntacticEquiv L u ks; chain transitively, then saturation.

    Lambert (2026) Prop 53: a language is k-definite iff its syntactic monoid satisfies the k-definite equation. Bidirectional bundling of IsDefinite.satisfies_kDefiniteEquation and isDefinite_of_satisfies_kDefiniteEquation.

    def Language.kReverseDefiniteEquation {α : Type u_1} (L : Language α) (k : ) :

    Lambert (2026) Prop 57 equation for reverse-definite languages (length-k letter-sequence, monoid form): for any element s of the syntactic monoid and any length-k letter sequence αs, post-multiplying [αs] by s preserves it. Mirror of kDefiniteEquation with right-multiplication instead of left.

    Lambert's notation: 𝒦_k = ⟦x₁ ⋯ xₖ s = x₁ ⋯ xₖ⟧ (paper Prop 57).

    Equations
    Instances For
      theorem Language.syntacticEquiv_of_kReverseDefiniteEquation {α : Type u_1} {L : Language α} {k : } (h : L.kReverseDefiniteEquation k) (αs w : List α) (hαs_len : αs.length = k) :
      L.SyntacticEquiv (αs ++ w) αs

      Under the reverse-k-definite equation, appending any suffix to a length-k word gives a syntactically equivalent word. Mirror of syntacticEquiv_of_kDefiniteEquation.

      Lambert Prop 57 (forward direction): a reverse-k-definite language's syntactic monoid satisfies the reverse-k-definite equation.

      Lambert Prop 57 (reverse direction): if a language's syntactic monoid satisfies the reverse-k-definite equation, then the language is reverse-k-definite. Mirror of isDefinite_of_satisfies_kDefiniteEquation.

      Lambert (2026) Prop 57: a language is reverse-k-definite iff its syntactic monoid satisfies the reverse-k-definite equation.

      def Language.kGeneralizedDefiniteEquation {α : Type u_1} (L : Language α) (k : ) :

      Lambert (2026) Prop 58 equation for generalized k-definite languages (length-k letter-sequence, sandwich monoid form): for any s and any length-k letter sequence αs, sandwiching s between two copies of [αs] absorbs s: [αs] · s · [αs] = [αs].

      Lambert's notation: ℒℐ_k = ⟦x₁ ⋯ xₖ s x₁ ⋯ xₖ = x₁ ⋯ xₖ⟧ (@cite{lambert-2026} Prop 58; @cite{straubing-1985}). The two αs instances are bound to the same letter sequence; this is the "simplified" form of the more general two-variable equation [αs · s · βs] = [αs · βs] that @cite{lambert-2026} remarks defines the same class.

      Equations
      • One or more equations did not get rendered due to their size.
      Instances For
        theorem Language.syntacticEquiv_of_kGeneralizedDefiniteEquation {α : Type u_1} {L : Language α} {k : } (h : L.kGeneralizedDefiniteEquation k) (αs w : List α) (hαs_len : αs.length = k) :
        L.SyntacticEquiv (αs ++ w ++ αs) αs

        Under the LI equation, sandwiching any word w between two copies of a length-k word αs is syntactically equivalent to αs alone.

        Lambert Prop 58 (forward direction): a generalized-k-definite language's syntactic monoid satisfies the LI equation.

        Proof: apply IsGeneralizedDefinite to the pair (x ++ αs ++ s ++ αs ++ y, x ++ αs ++ y). They share the length-k left-prefix (both have x ++ αs as prefix) and the length-k right-suffix (both have αs ++ y as suffix).

        Lambert Prop 58 (reverse direction): if a language's syntactic monoid satisfies the LI equation, then L is generalized k-definite.

        Strategy (double-sandwich on [w₁ · w₂]): for w₁, w₂ of length ≥ k with shared length-k prefix αs and length-k suffix βs, both decompose two ways: wᵢ = αs ++ bᵢ = cᵢ ++ βs. Then in the syntactic monoid:

        • [w₁ · w₂] = [αs · b₁ · αs · b₂] = [αs · b₂] = [w₂] (αs-sandwich applied with s := [b₁])
        • [w₁ · w₂] = [c₁ · βs · c₂ · βs] = [c₁ · βs] = [w₁] (βs-sandwich applied with s := [c₂])

        so [w₁] = [w₂] and hence w₁ ≡_L w₂. This single double-sandwich move handles both the long case (|wᵢ| ≥ 2k, αs and βs disjoint) and the overlap case (k ≤ |wᵢ| < 2k, αs and βs overlap in wᵢ) uniformly — the algebra in M_L is identical because the absorption acts on the [bᵢ]/[cᵢ] factors regardless of their length.

        The short case (|w₁| < k or |w₂| < k) is forced trivial: equal takeAt_left k requires equal lengths when one side is shorter than k, so w₁ = w₂ directly.

        Lambert (2026) Prop 58: a language is generalized k-definite iff its syntactic monoid satisfies the LI equation.