Piecewise Testable Languages (PT_k) #

A language L is piecewise k-testable when membership depends only on which length-k subsequences the input contains @cite{simon-1975} @cite{rogers-pullum-2011} @cite{lambert-2022}. PT is to SP what LT is to SL: the Boolean closure of the strictly variant.

Both are property-based (extensional) classifications — there is no canonical grammar, only an indistinguishability relation on strings: w₁ ~_PT w₂ iff their k-subsequence sets coincide. L is PT_k iff it is closed under ~_PT.

Lambert (2026) and the multitier extension #

The multitier closure of PT (which Lambert classifies as 𝒥) covers a large fraction of attested phonotactic constraints — including the piecewise-testable analyses of unbounded stress patterns surveyed in @cite{lambert-2026} §3 and the Karanga Shona tone analysis (which is piecewise testable but not multitier generalised definite). The substrate sits adjacent to Multitier.lean and is composed via IsBTC IsPiecewiseTestable.

Main definitions #

subseqSet k w — the set of length-k subsequences of w.
IsPiecewiseTestable k L — closure of L under equality of subseqSet.

The cast IsStrictlyPiecewise k L → IsPiecewiseTestable k L lives at the end of the file (mathlib convention: cast lives with the larger class).

source

def Core.Computability.Subregular.subseqSet {α : Type u_1} (k : ℕ) (w : List α) :

Set (List α)

The set of subsequences of w of length at most k. The PT indistinguishability relation w₁ ~_PT w₂ is equality of subseqSet k w₁ and subseqSet k w₂.

The "up to k" semantics matches @cite{simon-1975} and the Heinz/Lambert literature (@cite{lambert-2026} Def 11; @cite{heinz-2018}). The "exactly k" alternative would mis-classify languages that distinguish strings of length less than k (e.g. Luganda high-tone plateauing @cite{lambert-2026} (37) distinguishes [h] from [ℓ] while their length-3 subsequence sets are both empty). Unlike factorSet, no boundary augmentation: SP/PT are insensitive to position.

Equations

Core.Computability.Subregular.subseqSet k w = {s : List α | s.Sublist w ∧ s.length ≤ k}

Instances For

source

@[simp]

theorem Core.Computability.Subregular.mem_subseqSet {α : Type u_1} {k : ℕ} {s w : List α} :

s ∈ subseqSet k w ↔ s.Sublist w ∧ s.length ≤ k

source

def Core.Computability.Subregular.IsPiecewiseTestable {α : Type u_1} (k : ℕ) (L : Language α) :

Prop

A language is piecewise k-testable iff strings with the same set of length-≤-k subsequences are either both in L or both out.

Equations

Core.Computability.Subregular.IsPiecewiseTestable k L = ∀ (w₁ w₂ : List α), Core.Computability.Subregular.subseqSet k w₁ = Core.Computability.Subregular.subseqSet k w₂ → (w₁ ∈ L ↔ w₂ ∈ L)

Instances For

source

theorem Core.Computability.Subregular.subseqSet_eq_iff {α : Type u_1} {k : ℕ} {w₁ w₂ : List α} (heq : subseqSet k w₁ = subseqSet k w₂) {s : List α} (hlen : s.length ≤ k) :

s.Sublist w₁ ↔ s.Sublist w₂

Subsequences of length at most k are interchangeable across strings sharing their subseqSet k. The standard ratchet for proving IsPiecewiseTestable k L from a predicate that depends only on length-≤-k subsequence presence.

source

theorem Core.Computability.Subregular.IsStrictlyPiecewise.toIsPiecewiseTestable {α : Type u_1} {k : ℕ} {L : Language α} (h : IsStrictlyPiecewise k L) :

IsPiecewiseTestable k L

SP_k ⊆ PT_k: every strictly-k-piecewise language is piecewise k-testable. The SP test ("every length-k subsequence is permitted") trivially depends only on the set of subsequences of that length.

Documentation

Linglib.Core.Computability.Subregular.PiecewiseTestable

Piecewise Testable Languages (PT_k) #

Lambert (2026) and the multitier extension #

Main definitions #