Belth (2026): A Learning-Based Account of Phonological Tiers [Bel26] #
D2L (Distant-to-Local) is an iterative learner that constructs phonological tiers as a byproduct of trying to express alternations as adjacent dependencies. Given underlying-form / surface-form pairs, D2L starts by projecting all segments and tries to predict the alternating segment from its linear neighbours. When the resulting rule fails Yang's Tolerance Principle [Yan16], D2L deletes the offending segments from the tier and tries again, until either a tolerated rule is found or no further deletion helps.
The output of D2L is a tier-based alternation rule, modelled here by the
canonical Subregular.TierRule schema (in
Phonology/Alternation.lean); the closely-related SPE
non-tier Subregular.LocalRewrite.Rule schema in
Phonology/Subregular/LocalRewrite.lean is the right substrate
when the alternation does not factor through a tier projection.
The function-level subregular classification of D2L outputs lives in
Core/Computability/Subregular/Function/: tier-mediated dissimilation
rules of the form Belth converges to are Tier-Subsequential
([ARGH20]). For Latin -alis / -aris
allomorphy ([Bel26] §5.3, rule 54), the rule D2L converges
to is
Disagree([?lat], {lat}) / [+cons] __ ∘ proj(·, [+cons]),
i.e. dissimilation of the underspecified affix-initial liquid /L/ from
the immediately preceding [+cons] tier segment. [Bel26]
reports 0.97 accuracy on a 121-word Perseus dataset; the residual ~3%
errors are tolerated by Yang's Tolerance Principle (4 ≤ 121/ln 121),
so D2L converges to this rule rather than memorizing the lexicon.
This study formalizes:
- the rule (
latinDissimRule) and its predictions on six worked examples (Belth ex. 5/53): navalis, popularis, pluvalis, floralis, legalis, lunaris; - the genuine empirical limit: lunaris surfaces with
[r], but the[+cons]-tier rule predicts[l](the immediately preceding tier segment is the nasal/n/, which is[−lat], soDisagreeoutputs[+lat]). This is one of the ~3% errors the Tolerance Principle tolerates; - a Tolerance-Principle certificate
(
latinDissimRule_tolerated_on_examples) showing the 1-of-6 exception count on this corpus is well under Yang's threshold; - a Subregular bridge (
consTier_apply_eq_tierProject) grounding the consonant tier in the [Lam22a]/[HRT11] TSL formalism; - an OCP-on-tier OT constraint (
latinOCP, viaConstraints.mkOCPOnTier) and an OT tableau bridge: each empirical contrast becomes a winner-loser pair,tableauERCextracts the Elementary Ranking Condition ([MR16]), and the OT analysis is shown to track the rule-based analysis exactly, including the same lunaris failure (the lunaris ERC is contradictory on the two-constraint inventory ⟨OCP, *r⟩, which is the OT-side analogue of the rule's underextension).
D2L's Turkish (Belth §5.1) and Finnish (§5.2) rules are sketched in §10 below. They require multi-feature dependencies and explicit Elsewhere defaults; both extensions are admittable on demand.
On verification scope. D2L itself — the learning algorithm — is not run inside Lean. Belth's empirical claim is that, given a corpus, D2L converges to specific rules. Verifying that claim end-to-end would require running D2L on naturalistic datasets (CHILDES, MorphoChallenge, Perseus), which is out of scope for a Lean formalization. We instead formalize the learned rules and verify their predictions on representative examples, plus the Tolerance Principle inequality that licenses convergence.
A minimal Latin segment alphabet sufficient for the -alis / -aris
contrast. The unspecified affix-initial liquid is L; surface
allomorphs are l and r.
- a : LatSeg
- e : LatSeg
- i : LatSeg
- o : LatSeg
- u : LatSeg
- l : LatSeg
- r : LatSeg
- L : LatSeg
- n : LatSeg
- v : LatSeg
- s : LatSeg
- g : LatSeg
- f : LatSeg
- p : LatSeg
- b : LatSeg
Instances For
Equations
- Belth2026.instDecidableEqLatSeg x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Equations
- Belth2026.instReprLatSeg.repr Belth2026.LatSeg.a prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Belth2026.LatSeg.a")).group prec✝
- Belth2026.instReprLatSeg.repr Belth2026.LatSeg.e prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Belth2026.LatSeg.e")).group prec✝
- Belth2026.instReprLatSeg.repr Belth2026.LatSeg.i prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Belth2026.LatSeg.i")).group prec✝
- Belth2026.instReprLatSeg.repr Belth2026.LatSeg.o prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Belth2026.LatSeg.o")).group prec✝
- Belth2026.instReprLatSeg.repr Belth2026.LatSeg.u prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Belth2026.LatSeg.u")).group prec✝
- Belth2026.instReprLatSeg.repr Belth2026.LatSeg.l prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Belth2026.LatSeg.l")).group prec✝
- Belth2026.instReprLatSeg.repr Belth2026.LatSeg.r prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Belth2026.LatSeg.r")).group prec✝
- Belth2026.instReprLatSeg.repr Belth2026.LatSeg.L prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Belth2026.LatSeg.L")).group prec✝
- Belth2026.instReprLatSeg.repr Belth2026.LatSeg.n prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Belth2026.LatSeg.n")).group prec✝
- Belth2026.instReprLatSeg.repr Belth2026.LatSeg.v prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Belth2026.LatSeg.v")).group prec✝
- Belth2026.instReprLatSeg.repr Belth2026.LatSeg.s prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Belth2026.LatSeg.s")).group prec✝
- Belth2026.instReprLatSeg.repr Belth2026.LatSeg.g prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Belth2026.LatSeg.g")).group prec✝
- Belth2026.instReprLatSeg.repr Belth2026.LatSeg.f prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Belth2026.LatSeg.f")).group prec✝
- Belth2026.instReprLatSeg.repr Belth2026.LatSeg.p prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Belth2026.LatSeg.p")).group prec✝
- Belth2026.instReprLatSeg.repr Belth2026.LatSeg.b prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "Belth2026.LatSeg.b")).group prec✝
Instances For
Equations
- Belth2026.instReprLatSeg = { reprPrec := Belth2026.instReprLatSeg.repr }
[+cons] per [Cse10]'s feature inventory: every non-vowel
is consonantal, except the orthographic ⟨v⟩, which [Bel26]
treats as the glide [w] ([−cons]). This is the choice that lets
pluv-aLis surface as [r] (the tier-preceding consonant of pluv-
is /l/, not /v/). L is [+cons] — the underspecified /L/
projects onto the consonant tier even though its [lat] value is
not yet fixed.
Equations
- Belth2026.LatSeg.a.IsCons = False
- Belth2026.LatSeg.e.IsCons = False
- Belth2026.LatSeg.i.IsCons = False
- Belth2026.LatSeg.o.IsCons = False
- Belth2026.LatSeg.u.IsCons = False
- Belth2026.LatSeg.v.IsCons = False
- x✝.IsCons = True
Instances For
Equations
- One or more equations did not get rendered due to their size.
[+lat] per [Cse10]: only l is lateral, and L is
underspecified for [lat]. Returning Option Bool rather than
Bool here avoids the older "junk default" pattern where L had
to fake some concrete value — none is the honest answer.
Equations
- Belth2026.LatSeg.l.isLat = some true
- Belth2026.LatSeg.L.isLat = none
- x✝.isLat = some false
Instances For
The consonantal tier (Belth's learned tier for Latin, rule 54).
Every [+cons] segment projects.
Instances For
The rule D2L learns under the [+cons] tier ([Bel26]
rule 54): Disagree([?lat], {lat}) / [+cons] __ ∘ proj(·, [+cons]).
Equations
- One or more equations did not get rendered due to their size.
Instances For
Underlying form of popularis 'popular': /popul-aLis/.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Underlying form of pluvalis 'rainy': /pluv-aLis/. With /v/ off
the consonant tier (treated as [w], [−cons]), the tier-preceding
consonant in pluv-a is /l/.
Equations
Instances For
Underlying form of floralis 'floral': /flor-aLis/.
Equations
Instances For
Underlying form of legalis 'legal': /leg-aLis/.
Equations
Instances For
Underlying form of lunaris 'lunar': /lun-aLis/. The empirical
counterexample: surface is [r], but the rule predicts [l]
because the immediately preceding [+cons] segment in /lun-a/ is
the nasal /n/ ([−lat]), and Disagree outputs [+lat].
Equations
Instances For
Surface value predicted by the learned rule: some true = surfaces
as l, some false = surfaces as r, none = the rule has no
opinion.
Equations
- Belth2026.predicted ur lPos = Belth2026.latinDissimRule.applyAt (List.take lPos ur)
Instances For
popularis: tier-preceding consonant is /l/ ([+lat]). Disagree
outputs [−lat] = r. ✓
pluvalis: with /v/ off the consonant tier, tier-preceding
consonant of /pluv-a/ is /l/ ([+lat]). Disagree outputs
[−lat] = r. ✓
floralis: tier-preceding consonant is /r/ ([−lat]). The
intervening /r/ blocks the dissimilation that popularis's
preceding /l/ would have triggered — exactly the long-distance
pattern Belth's tier-projection is designed to capture. ✓
legalis: tier-preceding consonant is /g/ ([−lat]). The
stem-initial /l/ is hidden from /L/ by the intervening /g/
on the consonant tier. ✓
lunaris: the rule predicts [l] (the last [+cons] segment in
/lun-a/ is the nasal /n/, which is [−lat], so Disagree
outputs [+lat]). This is the wrong prediction — the surface form
is [r]. [Bel26] reports this as one of the ~3% of
errors the Tolerance Principle tolerates; D2L does not refine the
tier further on this dataset.
Witness pairs (underlying-form-prefix, expected-surface-lat-value)
for each worked example.
Equations
- One or more equations did not get rendered due to their size.
Instances For
The learned rule misses exactly one datum out of six (lunaris) — a 5/6 ≈ 83% match on this minimal corpus, the same single-error rate Belth reports at the corpus scale (~3%).
Yang's Tolerance Principle is the productivity gate inside D2L: a
rule with n items in scope and e exceptions is tolerated iff
e ≤ n / ln n. For our six worked examples with one exception
(lunaris), 1 ≤ 6 / ln 6 ≈ 3.35 — the rule is comfortably under the
threshold, matching [Bel26]'s 4-of-121 (~3%) result on the
full Perseus corpus.
The threshold inequality for the six-example demo: 1 exception ≤ 6/ln 6.
The learned rule passes Yang's Tolerance Principle on the worked
examples: its exception count (1, lunaris) fits under the threshold
latinData.length / ln latinData.length. This is the productivity
certificate that licenses D2L's convergence to latinDissimRule
rather than memorizing each form.
The consonant tier projection equals the canonical
tierProject from the TSL formalism in Core/Computability/Subregular/.
By construction both reduce to List.filter on the [+cons] predicate,
so the autosegmental and language-theoretic tiers coincide. This
grounds Belth's tier in the TSL_k subregular hierarchy.
The TSL_2 grammar witnessing the Latin allomorphy pattern as a
tier-based subregular language: project to the [+cons] tier, then
forbid adjacent identical symbols. Lambert's [Lam22a]
TSL_k schema, instantiated with IsCons as the tier predicate and
the OCP forbidden 2-factor [some x, some x].
Instances For
Belth's tier-based dissimilation has a natural OCP twin: surface
forms produced by the rule should not contain adjacent [+lat]
segments on the consonant tier. We expose this connection via
Constraints.mkOCPOnTier, which already accepts a generic
Tier. A tier-adjacent pair of identical liquids contributes
one violation.
Instances For
The OCP-on-tier evaluation of latinOCP on a candidate is zero iff
that candidate is in latinTSLGrammar.lang. Specialization of
Subregular.mkOCPOnTier_zero_iff_in_ocp_lang to the Latin
grammar. The two perspectives — markedness constraint with zero
violations and TSL_2 grammar membership — coincide.
The OT analysis uses a minimal two-constraint inventory:
latinOCP(markedness): no adjacent laterals on the[+cons]tier.latinStarR(markedness): the default-[l]preference, penalizing each surface[r].
On this inventory, popularis and pluvalis each force OCP ≫ *r; navalis, floralis, legalis are vacuously satisfied. lunaris is unrankable — its ERC is contradictory (no W, one L), recapitulating the rule's empirical limit.
Substitute val for every L in a candidate string. The two
surface candidates for an underlying form are substL .l ur and
substL .r ur.
Equations
- Belth2026.substL val = List.map fun (seg : Belth2026.LatSeg) => if seg = Belth2026.LatSeg.L then val else seg
Instances For
Markedness constraint penalizing each surface [r]: the
default-[l] preference.
Equations
- Belth2026.latinStarR c = (List.filter (fun (x : Belth2026.LatSeg) => decide (x = Belth2026.LatSeg.r)) c).length
Instances For
The Latin ranking implied by allomorphy: OCP/[+cons] dominates *r.
Equations
Instances For
The two surface candidates for an underlying form.
Equations
Instances For
Membership in latinTSLGrammar.lang is decidable: the bridge to the
integer-valued OCP score (latinOCP_zero_iff_in_TSL) transports the
Decidable (latinOCP c = 0) instance to language membership.
Equations
- Belth2026.instDecidableMemListLatSegLanguageLangLatinTSLGrammar c = decidable_of_iff (Belth2026.latinOCP c = 0) ⋯
The empirically expected TSL_2 membership table for the Belth Latin inventory: each row pairs a candidate with whether the OCP-on-tier grammar should admit it. pluvalis and floralis are the discriminating cases (one candidate excluded for tier-adjacent identicals); navalis admits both (no adjacency); lunaris admits both (the OCP-on-tier grammar's empirical limit — the dissimilation rule fires even though only one tier-lateral is adjacent).
Equations
- One or more equations did not get rendered due to their size.
Instances For
The OCP-on-tier TSL_2 grammar agrees with the expected membership table
on every row of latinTSLExpected: the empirical payoff of the
latinOCP_zero_iff_in_TSL bridge in one shot.
popularis: OCP fires once on [p,o,p,u,l,a,l,i,s] (the tier-adjacent
(l, l)). Only *r-once on the [r]-candidate — but OCP outranks, so
[r] wins.
pluvalis: same OCP-ranking-decides pattern as popularis (the
consonant tier of /pluv-a/ is [p, l], so substituting L → l
creates a tier-adjacent (l, l)).
lunaris: the OT analysis selects [l], not the empirical [r] —
the same error the rule makes (§5). Both candidates have OCP=0
on the consonant tier (the intervening /n/ separates the laterals
in /lun-a-l-is/ so they are not tier-adjacent), so *r picks the
[l]-candidate.
The ERC induced by popularis (winner [r], loser [l]): OCP prefers
the winner (W on constraint 0); *r prefers the loser (L on constraint
1). The full vector is ⟨W, L⟩ = simpleERC 0 1.
Equations
- One or more equations did not get rendered due to their size.
Instances For
The ERC induced by pluvalis: same shape as popularis.
Equations
- One or more equations did not get rendered due to their size.
Instances For
The ERC induced by floralis.
Equations
- One or more equations did not get rendered due to their size.
Instances For
The ERC induced by legalis.
Equations
- One or more equations did not get rendered due to their size.
Instances For
The ERC induced by lunaris (taking the empirical [r] as the winner): OCP indifferent (no laterals are tier-adjacent on either candidate), *r prefers the loser [l]. ⟨e, L⟩ — contradictory (no W, one L). No ranking on the inventory ⟨OCP, *r⟩ can satisfy this; the OT formulation thus reproduces exactly the rule's empirical limit.
Equations
- One or more equations did not get rendered due to their size.
Instances For
The five non-lunaris contrasts form a consistent ERC set: the
identity ranking (OCP at position 0, *r at position 1) satisfies
every word's ranking requirements. Two informative ERCs (popularis,
pluvalis) both reduce to simpleERC 0 1; the other three are
trivial.
Bridge to the dominance characterisation: under any ranking satisfying the empirical ERC set (sans lunaris), OCP dominates *r.
TSL_2 witness: Latin liquid dissimilation is tier-strictly-local at window-size 2.
BTSL_2 corollary (via IsTierStrictlyLocal.toIsBTSL in
Subregular.Multitier): Latin liquid dissimilation
is in the multitier closure of strictly local languages, hence consumed
by the [Lam26] BTC framework.