Equational characterizations of subregular language classes #
@cite{lambert-2026} §6.2 (paper p. 22-25, with summary in Table 6 p. 28)
characterizes each base-class of subregular languages by a system of
equations on the syntactic semigroup: D = ⟦sx^ω = x^ω⟧,
K = ⟦x^ω y = x^ω⟧, LI = ⟦x^ω y x^ω = x^ω⟧, N = ⟦x^ω y = x^ω; yx^ω = x^ω⟧ (definite, reverse-definite, generalized-definite,
co/finite, respectively).
This file lands the k-definite case (Lambert Prop 53, p. 23) as a
feasibility probe — the simplest entry into Lambert's algebraic story
because it requires no omegaPow (idempotent power) machinery.
Lambert's claim:
A language is
k-definite if and only if it is in⟦sx₁ … xₖ = x₁ … xₖ⟧.
Mathlib precedent: monoid + length-k letter-sequence variables #
Lambert's syntactic semigroup excludes the empty word; our
Language.syntacticMonoid L (built via Con (FreeMonoid α), see
SyntacticMonoid.lean) includes the identity (the class of the empty
word). Mathlib's Con.Quotient precedent gives us a Monoid, not a
Semigroup; there is no established mathlib syntacticSemigroup
pattern. We follow mathlib precedent and keep the Monoid setting.
Letter-sequence quantification (not arbitrary monoid elements) #
To recover Lambert's intended characterization in the monoid setting,
the equation quantifies x₁, …, xₖ over length-k letter sequences
in the alphabet α rather than over arbitrary syntactic-monoid
elements. This is the "generators" interpretation: the equation says
"prepending any prefix to a length-k letter sequence preserves its
syntactic class".
The naïve pure-monoid form
∀ s ≠ 1, ∀ xs : List M of length k, (∀ x ∈ xs, x ≠ 1) → s * xs.prod = xs.prod
is strictly weaker and does not characterize k-definite. Concrete
counterexample: take L = (a|b)* over the alphabet {a, b, c} —
membership is "no c anywhere". Then [a] = [b] = 1 in the syntactic
monoid (inserting a or b anywhere preserves "no c"), and the
syntactic monoid is the rank-2 lattice M = {1, 0} with 0 = [c]
absorbing. The pure-monoid equation trivially holds: the only
non-identity element is 0, and 0 * 0 = 0. Yet L is not
k-definite for any k — the words c·a^k and a^k share the
length-k suffix a^k but only the latter is in L.
The letter-sequence formulation rules this out: for αs = [a] of
length 1, the syntactic class of αs is 1 in M, and the equation
s * 1 = 1 forces s = 1, which fails for s = [c] = 0. So the
equation correctly detects that L is not 1-definite.
(Lambert's semigroup version sidesteps the trivial-letter issue because his syntactic semigroup is generated only by the non-trivial letter classes, implicitly ignoring L-trivial letters in the alphabet. The letter-sequence monoid form makes this explicit.)
Main definitions #
Language.kDefiniteEquation L k— the equation∀ s ∈ L.syntacticMonoid, ∀ αs : List α with αs.length = k, s * [αs] = [αs]. The product on the left is monoid multiplication inL.syntacticMonoid;[αs]denotesL.toSyntacticMonoid (FreeMonoid.ofList αs).
Main results #
Language.IsDefinite.satisfies_kDefiniteEquation— forward direction: ak-definite language's syntactic monoid satisfies the equation. Proof: extract aFreeMonoid αrepresentativewofs; the equation reduces toSyntacticEquiv L (w ++ αs) αs, which follows fromtakeAt_right_append_left_absorb(since|αs| = k, the length-ksuffix ofx ++ w ++ αs ++ yalready discardsw).Language.isDefinite_of_satisfies_kDefiniteEquation— reverse direction: the equation impliesk-definiteness. Construction:G.permitted := { Edge.right.takeAt k w | w ∈ L }. The trivial inclusionL ⊆ G.langholds with witnessw' = w. The reverse inclusionG.lang ⊆ Lis by case analysis on word length: short words equal their own suffix (forcing equality); long words decompose asprefix ++ ksand the equation givesSyntacticEquiv L w ks, thenL-saturation closes the case.Language.isDefinite_iff_satisfies_kDefiniteEquation— Lambert Prop 53 bidirectional bundling.
In the same file, Lambert Prop 57 (reverse-definite, K) and Prop 58
(generalized definite, ℒℐ) are also landed using the same letter-sequence
template. The Pin omega-power forms (Pin.lean) consume these finite-k
iffs to derive their own iffs.
Out of scope (queued for follow-up files) #
multitier ℬ𝒯Cextensions (@cite{lambert-2026} §6.3, Table 6 right column).
References #
- @cite{lambert-2026} §6.2, Prop 53 (paper p. 23).
- @cite{straubing-1985}, @cite{almeida-1995} — the equational-class framework Lambert builds on.
Lambert (2026) Prop 53 equation (length-k letter-sequence form):
for any element s of L.syntacticMonoid and any length-k letter
sequence αs : List α, prepending s to the syntactic class of αs
preserves it.
The variables x₁, …, xₖ of Lambert's notation
⟦sx₁ … xₖ = x₁ … xₖ⟧ are instantiated by the letters of αs,
each xᵢ as the syntactic class of a single letter, so the product
x₁ ⋯ xₖ corresponds to the syntactic class of αs. See file
docstring for the rationale: the alternative pure-monoid form
quantifying over arbitrary xs : List M is strictly weaker and
fails to characterize k-definite.
Equations
- L.kDefiniteEquation k = ∀ (s : L.syntacticMonoid) (αs : List α), αs.length = k → s * L.toSyntacticMonoid (FreeMonoid.ofList αs) = L.toSyntacticMonoid (FreeMonoid.ofList αs)
Instances For
The right-k-suffix of x ++ rest equals the right-k-suffix
of rest whenever rest.length ≥ k.
Under the k-definite equation, prepending any prefix to a
length-k word gives a syntactically equivalent word.
This is the algebraic heart of the reverse direction: applying the
equation at s = [w] and unfolding
L.toSyntacticMonoid (w * αs) = L.toSyntacticMonoid w * L.toSyntacticMonoid αs gives the syntactic-monoid equality
[w * αs] = [αs], which lifts via Quotient.exact to the two-sided
syntactic equivalence on the underlying lists.
Lambert Prop 53 (forward direction): a k-definite language's
syntactic monoid satisfies the k-definite equation: prepending any
syntactic-monoid element s to a length-k letter sequence αs
preserves the syntactic class of αs.
Lambert Prop 53 (reverse direction): if a language's syntactic
monoid satisfies the k-definite equation, then the language is
k-definite.
Construction: G.permitted := {Edge.right.takeAt k w | w ∈ L}. The
trivial direction L ⊆ G.lang holds with witness w' = w. The
interesting direction G.lang ⊆ L: if w ∈ G.lang, there is some
u ∈ L with the same length-k right-suffix; case-split on
u.length, w.length vs k:
- Both
u.length ≤ k,w.length ≤ k: theirtakeAt k-suffixes areu,wthemselves; equality forcesw = u ∈ L. u.length ≤ k,w.length > k: the suffix-length match forcesu.length = k, souis the length-kright-suffix ofw. Apply the equation to getSyntacticEquiv L w u; saturation closes.u.length > k,w.length ≤ k: symmetric to (2).- Both
u.length > k,w.length > k: both decompose asprefix ++ ksfor the shared length-ksuffixks. Apply equation to both, gettingSyntacticEquiv L w ksandSyntacticEquiv L u ks; chain transitively, then saturation.
Lambert (2026) Prop 53: a language is k-definite iff its
syntactic monoid satisfies the k-definite equation. Bidirectional
bundling of IsDefinite.satisfies_kDefiniteEquation and
isDefinite_of_satisfies_kDefiniteEquation.
Lambert (2026) Prop 57 equation for reverse-definite
languages (length-k letter-sequence, monoid form): for any element
s of the syntactic monoid and any length-k letter sequence αs,
post-multiplying [αs] by s preserves it. Mirror of
kDefiniteEquation with right-multiplication instead of left.
Lambert's notation: 𝒦_k = ⟦x₁ ⋯ xₖ s = x₁ ⋯ xₖ⟧ (paper Prop 57).
Equations
- L.kReverseDefiniteEquation k = ∀ (s : L.syntacticMonoid) (αs : List α), αs.length = k → L.toSyntacticMonoid (FreeMonoid.ofList αs) * s = L.toSyntacticMonoid (FreeMonoid.ofList αs)
Instances For
Under the reverse-k-definite equation, appending any suffix to
a length-k word gives a syntactically equivalent word. Mirror of
syntacticEquiv_of_kDefiniteEquation.
Lambert Prop 57 (forward direction): a reverse-k-definite
language's syntactic monoid satisfies the reverse-k-definite equation.
Lambert Prop 57 (reverse direction): if a language's syntactic
monoid satisfies the reverse-k-definite equation, then the language is
reverse-k-definite. Mirror of isDefinite_of_satisfies_kDefiniteEquation.
Lambert (2026) Prop 57: a language is reverse-k-definite iff
its syntactic monoid satisfies the reverse-k-definite equation.
Lambert (2026) Prop 58 equation for generalized k-definite
languages (length-k letter-sequence, sandwich monoid form): for any
s and any length-k letter sequence αs, sandwiching s between
two copies of [αs] absorbs s:
[αs] · s · [αs] = [αs].
Lambert's notation: ℒℐ_k = ⟦x₁ ⋯ xₖ s x₁ ⋯ xₖ = x₁ ⋯ xₖ⟧
(@cite{lambert-2026} Prop 58; @cite{straubing-1985}). The two αs
instances are bound to the same letter sequence; this is the
"simplified" form of the more general two-variable equation
[αs · s · βs] = [αs · βs] that @cite{lambert-2026} remarks defines
the same class.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Under the LI equation, sandwiching any word w between two copies
of a length-k word αs is syntactically equivalent to αs alone.
Lambert Prop 58 (forward direction): a generalized-k-definite
language's syntactic monoid satisfies the LI equation.
Proof: apply IsGeneralizedDefinite to the pair
(x ++ αs ++ s ++ αs ++ y, x ++ αs ++ y). They share the
length-k left-prefix (both have x ++ αs as prefix) and the
length-k right-suffix (both have αs ++ y as suffix).
Lambert Prop 58 (reverse direction): if a language's syntactic
monoid satisfies the LI equation, then L is generalized k-definite.
Strategy (double-sandwich on [w₁ · w₂]): for w₁, w₂ of length
≥ k with shared length-k prefix αs and length-k suffix βs,
both decompose two ways: wᵢ = αs ++ bᵢ = cᵢ ++ βs. Then in the
syntactic monoid:
[w₁ · w₂] = [αs · b₁ · αs · b₂] = [αs · b₂] = [w₂](αs-sandwich applied withs := [b₁])[w₁ · w₂] = [c₁ · βs · c₂ · βs] = [c₁ · βs] = [w₁](βs-sandwich applied withs := [c₂])
so [w₁] = [w₂] and hence w₁ ≡_L w₂. This single double-sandwich
move handles both the long case (|wᵢ| ≥ 2k, αs and βs disjoint)
and the overlap case (k ≤ |wᵢ| < 2k, αs and βs overlap in wᵢ)
uniformly — the algebra in M_L is identical because the absorption
acts on the [bᵢ]/[cᵢ] factors regardless of their length.
The short case (|w₁| < k or |w₂| < k) is forced trivial: equal
takeAt_left k requires equal lengths when one side is shorter than
k, so w₁ = w₂ directly.
Lambert (2026) Prop 58: a language is generalized k-definite
iff its syntactic monoid satisfies the LI equation.