Frisch, Pierrehumbert & Broe (2004) @cite{frisch-pierrehumbert-broe-2004} #
Similarity Avoidance and the OCP. Natural Language & Linguistic Theory 22(1):179–228.
@cite{frisch-pierrehumbert-broe-2004} (FPB) argue that the OCP-Place constraint in Arabic verbal roots is a gradient constraint whose strength is a quantitative function of the similarity between homorganic consonants. The categorical OCP-Place analyses of @cite{mccarthy-1986}, @cite{padgett-1995}, and @cite{mccarthy-1994} all face the same trade-off: dividing consonants into co-occurrence classes either ignores robust within-class variation (broad classes, many exceptions) or fragments the data into ad-hoc sub-classes (narrow classes, many missed generalisations). FPB resolve the trade-off with a single gradient constraint based on the natural-classes similarity metric (eq. 7):
Similarity(a, b) = SharedClasses(a, b) / (SharedClasses(a, b) + NonSharedClasses(a, b))
restricted to natural classes containing a place feature. Identical consonants have similarity 1; non-homorganic consonants have similarity 0. The metric is sensitive to inventory contrast: larger, more divided classes (e.g. coronals) yield lower similarity for any pair within them than smaller classes (e.g. labials), exactly capturing the empirical distance between the strong coronal OCP and the weaker guttural/dorsal OCPs.
What this file formalises #
- §1 — the 28-segment Arabic consonant inventory from FPB (8) p. 201.
- §2 — the labial natural classes as enumerated by the paper itself in its two worked-example computations (p. 199). The enumeration is given separately for the /f, m/ and /b, f/ examples (matching the paper's text); see the design-boundary comment below on why the two enumerations are not identical.
- §3 — the natural-classes similarity metric (eq. 7), parameterised on a list of relevant natural classes.
- §3 worked examples — the paper's two explicit computations: similarity(/f, m/) = 2/9 and similarity(/b, f/) = 3/8 (p. 199).
- §4 — the empirical Table IV (p. 203) for adjacent pairs: 9 (similarity-bin, O/E) data points whose monotonic decrease embodies FPB's gradient claim.
- §5 — the substrate connection (thresholdedTSL + thresholdedTSL_pair_iff), showing that the TSL_2 grammar TSLGrammar.ofForbiddenPairs (similarity ≥ t) Arabic.isLabial makes a binary step-function decision on labial pairs (accept iff similarity is strictly below the threshold), and the cross-framework divergence theorem categorical_fails_three_test_points, showing that no two-valued model (and hence no similarity-threshold TSL_2 grammar) can match three specific Table IV bins with three pairwise-distinct O/E values. This is the necessary-consequence formalisation of the design-boundary claim in Core/Computability/Subregular/ForbiddenPairs.lean. FPB's actual argument is stronger: it compares R² fits across nine bins (Categorical 0.70 vs Natural Classes 0.75 per Table V, p. 207). A full R² formalisation requires the lexical corpus and is deferred.
What this file does not formalise #
- The 2,674-root corpus from @cite{cowan-1979} (the Hans Wehr Arabic-English dictionary) that anchors FPB's O/E computations is not in the paper text. We use the paper's reported O/E values (Table IV) as data, not a re-derived corpus.
- The R² model fits (Table V: Frequency 0.57, Categorical 0.70, Soft 0.73, Feature 0.71, Natural Classes 0.75) require the corpus. Reproduction is deferred.
- The stochastic constraint model of FPB §3.4 (logistic fit with K, S parameters; @cite{frisch-broe-pierrehumbert-1997}, Rutgers Optimality Archive) requires the corpus and is deferred.
- §4.1 Frisch–Zawaydeh nonce-verb judgments (@cite{frisch-zawaydeh-2001}: Arabic speakers rate /babaθa/ identical < /θabama/ similar adjacent < /baʃafa/ similar nonadjacent < /baʔada/ nonhomorganic in OCP-violation severity) — experimental data, summarised in docstring only.
- §4.2 Maltese borrowings from Italian (FPB Table VI p. 213: identical 0.26, similar homorganic 0.45, coronal stop/fric 0.78 — gradient OCP applied selectively to incorporated Italian forms; @cite{mifsud-1995}) — corpus-based, summarised only.
- §4.2 cross-linguistic similarity-OCP attestations (Tigrinya @cite{buckley-1997-ocp}, Russian @cite{padgett-1995}, English @cite{berkley-1994}, Thai @cite{frisch-2000a}) — referenced in docstring.
- §4.3 phonetic / cognitive origin (@cite{berg-1998}, @cite{boersma-1998}, @cite{frisch-1996} processing-difficulty argument; speech-error data of @cite{abd-el-jawad-abu-salim-1987}: /takriib/ for /takbiir/ 'glorification', /maraaʕiʃ/ for /maʃaaʕir/ 'feelings') — diachronic-functional grounding, summarised only.
- Full natural-class derivation from a feature matrix — the paper sketches this (eq. 8 lays out the feature matrix; @cite{broe-1993}'s specification theory provides the lattice machinery), but a faithful Lean implementation requires a substrate effort the audit explicitly flagged as a separate next step. This file reproduces the paper's worked-example natural-class lists (per-pair) rather than deriving them.
Connection to ForbiddenPairs.lean's design boundary #
Linglib/Core/Computability/Subregular/ForbiddenPairs.lean (the
substrate file for tier-based strictly 2-local grammars defined by
forbidden-pair relations) cites FPB in its design-boundary section as
the empirical motivation for "single-tier TSL_2 cannot capture gradient
similarity-based OCP." Before this file existed, that citation lived
only in docstring prose. Two declarations below make the claim Lean-formal:
(1) thresholdedTSL instantiates the substrate's
TSLGrammar.ofForbiddenPairs with R := λ x y => similarity xs x y ≥ t,
giving a real (not metaphorical) TSL_2 grammar over Arabic; and
(2) thresholdedTSL_pair_iff proves the grammar's accept/reject
decision on labial pairs is exactly the binary step function
similarity < t. The categorical_fails_three_test_points divergence
theorem then witnesses that no such two-valued model can match three
specific Table IV bins.
Connection to Hansson2010.lean #
@cite{hansson-2010} (Phenomena/Phonology/Studies/Hansson2010.lean)
cites FPB at line 76 in its design-boundary section on
similarity-graded transparency: cases where intervening segments
behave differently depending on similarity to the harmonising pair
cannot be captured by single-tier TSL with a fixed tier predicate.
This file is the load-bearing instance of that observation for
Arabic OCP-Place specifically.
Why this paper anchors a study file #
Per CLAUDE.md's anchoring discipline: every Lean file is anchored to
exactly one of (a) a specific paper, (b) a documented empirical
pattern, or (c) a named theoretical framework. The audit on
ForbiddenPairs.lean (committed 0.230.508, hash 8579b346) flagged
FPB as one of four "silent divergences" — a paper used as a substrate
file's docstring example without itself being anchored. This file
closes that finding by anchoring FPB to its primary phenomenon
(Phonology) with a Lean-formal divergence theorem connecting back to
the substrate citation.
§ 1: The Arabic Consonant Inventory (FPB feature matrix (8), p. 201) #
The 28-segment Arabic consonant inventory used by
@cite{frisch-pierrehumbert-broe-2004} eq. 8 (p. 201). The labelling
follows the paper's IPA-with-superscript-ˁ-for-emphatic convention.
This file's worked examples and divergence theorem use only the labial
sub-inventory {b, f, m, w}; the full 28-segment list is included to
keep the inventory anchored to FPB's feature matrix and to make the
file extensible to future enumeration work over the full inventory.
- b : Arabic
/b/ — voiced labial stop.
- f : Arabic
/f/ — voiceless labial fricative.
- m : Arabic
/m/ — labial nasal.
- t : Arabic
/t/ — voiceless coronal stop.
- d : Arabic
/d/ — voiced coronal stop.
- tEmph : Arabic
/tˁ/ — emphatic voiceless coronal stop.
- dEmph : Arabic
/dˁ/ — emphatic voiced coronal stop.
- theta : Arabic
/θ/ — voiceless coronal fricative.
- edh : Arabic
/ð/ — voiced coronal fricative.
- s : Arabic
/s/ — voiceless coronal sibilant.
- z : Arabic
/z/ — voiced coronal sibilant.
- sEmph : Arabic
/sˁ/ — emphatic voiceless coronal sibilant.
- zEmph : Arabic
/zˁ/ — emphatic voiced coronal sibilant.
- esh : Arabic
/ʃ/ — voiceless palatoalveolar sibilant.
- k : Arabic
/k/ — voiceless dorsal stop.
- g : Arabic
/g/ — voiced dorsal stop.
- q : Arabic
/q/ — uvular stop (dorsal+pharyngeal in FPB's analysis).
- chi : Arabic
/χ/ — voiceless uvular fricative.
- gamma : Arabic
/ʁ/ — voiced uvular fricative.
- hbar : Arabic
/ħ/ — voiceless pharyngeal fricative.
- ayin : Arabic
/ʕ/ — voiced pharyngeal fricative.
- h : Arabic
/h/ — voiceless laryngeal fricative.
- glottal : Arabic
/ʔ/ — laryngeal stop.
- l : Arabic
/l/ — coronal lateral.
- r : Arabic
/r/ — coronal rhotic.
- n : Arabic
/n/ — coronal nasal.
- w : Arabic
/w/ — labial-velar glide.
- j : Arabic
/j/ — palatal glide.
Equations
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidableEqArabic x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
The labial sub-inventory {b, f, m, w}. Used by the worked examples
and the divergence theorem.
Equations
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.b = isTrue trivial
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.f = isTrue trivial
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.m = isTrue trivial
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.w = isTrue trivial
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.t = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.d = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.tEmph = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.dEmph = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.theta = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.edh = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.s = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.z = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.sEmph = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.zEmph = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.esh = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.k = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.g = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.q = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.chi = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.gamma = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.hbar = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.ayin = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.h = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.glottal = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.l = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.r = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.n = isFalse not_false
- Phonology.Studies.FrischPierrehumbertBroe2004.instDecidablePredArabicIsLabial Phonology.Studies.FrischPierrehumbertBroe2004.Arabic.j = isFalse not_false
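A minimal sketch of how the inventory and the labial predicate described above could be declared. The constructor names follow the list in §1. The Bool-valued `isLabial` is a simplifying assumption of this sketch: the equations above indicate that the file itself uses a Prop-valued predicate with a `DecidablePred` instance.

```lean
/-- Sketch of the 28-segment inventory; constructor names as listed in §1. -/
inductive Arabic where
  | b | f | m | t | d | tEmph | dEmph | theta | edh | s | z | sEmph | zEmph
  | esh | k | g | q | chi | gamma | hbar | ayin | h | glottal | l | r | n | w | j
  deriving DecidableEq, Repr

/-- Bool-valued stand-in (an assumption of this sketch) for the labial
sub-inventory {b, f, m, w}. -/
def Arabic.isLabial : Arabic → Bool
  | .b | .f | .m | .w => true
  | _ => false

-- Spot checks: /m/ is labial, /t/ is not.
example : Arabic.isLabial .m = true := rfl
example : Arabic.isLabial .t = false := rfl
```

A Bool-valued predicate keeps the spot checks closed by `rfl`; a Prop-valued version needs the decidability instance before it can drive a tier projection.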
§ 2: Labial Natural Classes — Per-Pair Enumerations from FPB p. 199 #
Per-pair enumerations vs unified lattice #
@cite{frisch-pierrehumbert-broe-2004} p. 199 enumerates the labial natural classes relevant for two specific worked examples:
- /f, m/: 2 shared classes + 7 non-shared = 9 total. The non-shared list explicitly includes {f} (voiceless labial continuants).
- /b, f/: 3 shared + 5 non-shared = 8 total. The non-shared list does not include {f}, even though {f} contains f and not b and would naturally appear in a unified lattice.
This difference is most likely a paper enumeration error, not a
principled per-pair-relativised derivation: under @cite{broe-1993}'s
specification-theory lattice the relevant labial natural classes are
inventory-determined (closed under intersection of feature-class
extensions), so once the lattice is fixed the set of classes containing
f is fixed too. {f} would therefore appear in both enumerations
under any consistent application of Broe's machinery, giving total = 9
for /b, f/ and similarity 3/9 ≈ 0.33 (rather than the paper's 3/8 = 0.38).
We reproduce the paper's two enumerations faithfully so the worked-
example similarity values match the paper's exact numbers (2/9 and
3/8); a unified-lattice derivation that would give the systematic 3/9
result requires substrate work (Broe 1993 specification theory) and is
deferred.
Labial natural classes relevant for the /f, m/ similarity computation, reproducing FPB p. 199 verbatim. The 2 shared classes appear first; the 7 non-shared follow. Each class is annotated with the phonological gloss the paper provides.
Labial natural classes relevant for the /b, f/ similarity computation, reproducing FPB p. 199 verbatim. The 3 shared classes appear first; the 5 non-shared follow.
§ 3: The Natural-Classes Similarity Metric (FPB eq. 7, p. 198) #
The Nat-valued count of natural classes in xs containing at least
one of a, b — equivalently, shared + non-shared. The denominator of
FPB eq. 7.
FPB eq. 7: the natural-classes similarity metric.
Similarity(a, b) = |classes containing both| / |classes containing a or b|
Generic in xs : List (Finset Arabic), the list of relevant natural
classes (containing a place feature, per FPB's stipulation on p. 198).
Identical consonants participating in any class get similarity 1 (every relevant class containing one contains the other). Non-homorganic consonants get similarity 0 (no relevant labial class contains either, so total = 0 → return 0 by convention to avoid 0/0).
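The definition described above might look roughly like the following sketch. It assumes Mathlib for `Finset` and `ℚ`, and it is generic in the segment type rather than fixed to `Arabic`. The helper names `sharedCount` and `totalCount` are hypothetical; only `similarity` is a name the file itself uses, and the rendered equations are not available to confirm its exact shape.

```lean
import Mathlib

variable {α : Type} [DecidableEq α]

/-- Number of classes in `xs` containing both `a` and `b` (the numerator). -/
def sharedCount (xs : List (Finset α)) (a b : α) : ℕ :=
  (xs.filter fun c => decide (a ∈ c ∧ b ∈ c)).length

/-- Number of classes in `xs` containing `a` or `b` (the denominator). -/
def totalCount (xs : List (Finset α)) (a b : α) : ℕ :=
  (xs.filter fun c => decide (a ∈ c ∨ b ∈ c)).length

/-- Shared over total, with the 0/0 case sent to 0 by convention. -/
def similarity (xs : List (Finset α)) (a b : α) : ℚ :=
  if totalCount xs a b = 0 then 0
  else (sharedCount xs a b : ℚ) / (totalCount xs a b : ℚ)

-- With no relevant classes at all, the convention returns 0.
example (a b : α) : similarity ([] : List (Finset α)) a b = 0 := by
  simp [similarity, totalCount]
```

The explicit `if` mirrors the stated convention; in Mathlib division by zero in `ℚ` already evaluates to 0, so the guard is there for readability rather than necessity.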
§ 3 worked examples (FPB p. 199) #
/f, m/ have 9 total relevant classes (2 shared + 7 non-shared) per FPB p. 199.
/b, f/ have 8 total relevant classes (3 shared + 5 non-shared) per FPB p. 199.
FPB worked example (p. 199): similarity(/f, m/) = 2/9.
/f, m/ share 2 labial natural classes (the labials, labial consonants)
and have 7 non-shared (the obstruents {b, f}, the continuants {f, w},
the voiceless continuants {f}, the voiced labials {b, m, w}, the
voiced stops {b, m}, the voiced sonorants {m, w}, the nasals {m}).
Similarity = 2 / (2 + 7) = 2/9 ≈ 0.22.
FPB worked example (p. 199): similarity(/b, f/) = 3/8.
/b, f/ share 3 labial natural classes (the labials, labial consonants,
obstruents) and have 5 non-shared ({f, w}, {b, m, w}, {b, m},
{b, w}, {b}). Similarity = 3 / (3 + 5) = 3/8 = 0.375.
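The two worked examples can be checked by direct counting. The sketch below uses core Lean only, with lists standing in for `Finset` and a four-segment type standing in for the full inventory. The name `counts` and the extensions chosen for "the labials" ({b, f, m, w}) and "labial consonants" ({b, f, m}) are assumptions of this sketch; the non-shared classes are the ones listed above.

```lean
inductive Labial where
  | b | f | m | w
  deriving DecidableEq, Repr

open Labial

/-- (shared, total) class counts for the pair `x`, `y`. -/
def counts (xs : List (List Labial)) (x y : Labial) : Nat × Nat :=
  ((xs.filter fun c => c.contains x && c.contains y).length,
   (xs.filter fun c => c.contains x || c.contains y).length)

def fmClasses : List (List Labial) :=
  [[b, f, m, w], [b, f, m],          -- shared (extensions assumed)
   [b, f], [f, w], [f], [b, m, w], [b, m], [m, w], [m]]

def bfClasses : List (List Labial) :=
  [[b, f, m, w], [b, f, m], [b, f],  -- shared (first two extensions assumed)
   [f, w], [b, m, w], [b, m], [b, w], [b]]

-- /f, m/: 2 shared out of 9 total, i.e. similarity 2/9.
example : counts fmClasses f m = (2, 9) := by decide

-- /b, f/: 3 shared out of 8 total, i.e. similarity 3/8.
example : counts bfClasses b f = (3, 8) := by decide

-- Adding {f} to the /b, f/ list gives the unified-lattice count 3 out of 9.
example : counts ([f] :: bfClasses) b f = (3, 9) := by decide
```

The last check makes the §2 discrepancy concrete: a single extra class moves the /b, f/ denominator from 8 to 9.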
§ 4: Empirical Table IV (FPB p. 203, adjacent pairs) #
FPB Table IV (p. 203, adjacent column): the gradient O/E pattern as
a function of natural-classes similarity. Each entry is
(similarity-bin-midpoint, O/E). The monotonic decrease from
O/E = 1.22 at similarity 0 (no constraint) down to O/E ≈ 0 at
similarity ≥ 0.4 (categorical avoidance of highly similar pairs)
embodies the gradient OCP claim.
§ 5: Cross-Framework Divergence — Categorical Cannot Fit Table IV #
The categorical-at-threshold model #
Any TSL_2 grammar of the form
Core.Computability.Subregular.TSLGrammar.ofForbiddenPairs (fun a b => similarity l a b ≥ t) p
makes a two-valued prediction
about each consonant pair: forbidden (membership rejected) if similarity
≥ t, permitted (accepted) otherwise. As an O/E predictor, this becomes a
step function: predicted O/E = c1 for similarity < t (where pairs are
attested at some baseline rate) and c2 for similarity ≥ t (where pairs
are categorically suppressed). Below we show that no choice of
t, c1, c2 : ℚ can fit even three specific Table IV bins.
Categorical-at-threshold prediction: O/E = c1 if similarity < t,
otherwise c2. Any TSL_2 grammar with R := (similarity ≥ t) reduces
to this two-valued form.
Equations
- Phonology.Studies.FrischPierrehumbertBroe2004.categoricalAtThreshold t c1 c2 sim = if sim < t then c1 else c2
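To make the step-function behaviour concrete, the sketch below restates the equation above as a standalone definition (assuming Mathlib for `ℚ`) and evaluates it on either side of a threshold. The parameter values (threshold 1/2, baseline 1, suppressed 0) are hypothetical and chosen only for illustration.

```lean
import Mathlib

/-- Two-valued predictor: `c1` below the threshold, `c2` at or above it. -/
def categoricalAtThreshold (t c1 c2 sim : ℚ) : ℚ :=
  if sim < t then c1 else c2

-- Similarity 1/4 is below the threshold 1/2, so the baseline value is returned.
example : categoricalAtThreshold (1 / 2) 1 0 (1 / 4) = 1 := by
  norm_num [categoricalAtThreshold]

-- Similarity 11/20 is at or above the threshold, so the suppressed value is returned.
example : categoricalAtThreshold (1 / 2) 1 0 (11 / 20) = 0 := by
  norm_num [categoricalAtThreshold]
```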
Three Table IV bins exhibit three pairwise-distinct O/E values:
similarity 0 → 1.22, similarity 0.25 → 0.59, similarity 0.55 → 0.06
(see empiricalTableIV above). Witnesses that the FPB pattern has at
least three distinct response levels — more than a two-valued model
can produce.
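The distinctness claim is plain arithmetic. As a sketch (assuming Mathlib, and writing the three O/E values as exact rationals: 1.22 = 61/50, 0.59 = 59/100, 0.06 = 3/50), it could be checked as follows. This is an illustration, not the statement used in the file, whose exact form is not shown here.

```lean
import Mathlib

-- Pairwise distinctness of the three Table IV test values.
example :
    (61 / 50 : ℚ) ≠ 59 / 100 ∧ (59 / 100 : ℚ) ≠ 3 / 50 ∧ (61 / 50 : ℚ) ≠ 3 / 50 :=
  ⟨by norm_num, by norm_num, by norm_num⟩
```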
The substrate connection: a TSL_2 grammar over Arabic whose
forbidden-pair relation is "similarity at or above threshold t",
restricted to the labial tier. Instantiates
Core.Computability.Subregular.TSLGrammar.ofForbiddenPairs directly,
making the connection from FPB's similarity metric to the substrate
machinery true by construction (rather than docstring-only).
Every threshold-induced FPB grammar is TSL_2. Explicit
IsTierStrictlyLocal 2 typing of the implicit complexity claim made by
the thresholdedTSL constructor — the substrate-classification "any
two-valued threshold model is TSL_2" is now a typed theorem rather than
docstring prose.
BTSL_2 corollary (via IsTierStrictlyLocal.toIsBTSL in
Core.Computability.Subregular.Multitier): every threshold-induced FPB
grammar's stringset is in the multitier closure of strictly local
languages, hence consumed by the @cite{lambert-2026} BTC framework.
The TSL_2 grammar makes a binary step-function decision on labial
pairs. For two labial segments x, y, the grammar
thresholdedTSL xs t accepts the bigram [x, y] iff their similarity
is strictly below t. This is the precise sense in which any
similarity-threshold TSL_2 grammar collapses to the categoricalAtThreshold
two-valued prediction: the accept/reject decision is a step function of
similarity, not a graded response.
Cross-framework divergence theorem: for ANY threshold t : ℚ and
any pair of predicted O/E values c1, c2 : ℚ, the categorical-at-threshold
model cannot match all three of FPB Table IV's data points
(sim=0, O/E=1.22), (sim=0.25, O/E=0.59), (sim=0.55, O/E=0.06).
Significance: any similarity-threshold TSL_2 grammar (per
thresholdedTSL, instantiating @cite{heinz-rawal-tanner-2011}'s
substrate) makes a binary step-function decision on each pair (per
thresholdedTSL_pair_iff), so its O/E prediction collapses to two
values: one for permitted (similarity < t) pairs, one for forbidden
(similarity ≥ t) pairs. This theorem proves no such two-valued model
can reproduce three pairwise-distinct Table IV bins exactly. It is a
necessary consequence of FPB's gradient claim, not a full R²
comparison: FPB's actual argument is about aggregate fit quality across
9 bins (Categorical R² = 0.70 vs Natural Classes R² = 0.75; FPB Table V
p. 207) and requires the lexical corpus to formalise. The 3-bin
separation captured here is the corpus-free Lean-formal version.
The natural downstream extension of FPB's gradient observation is the weighted-constraint MaxEnt phonotactic learner of @cite{hayes-wilson-2008}, which uses similarity-relevant features as primitives and reproduces gradient phonotactic patterns by fitting log-linear weights — a positive-fit complement to this file's negative-fit (categorical-fails) result.
Proof strategy: case analysis on which of the four intervals cut out
by 0, 0.25, and 0.55 contains the threshold t. In each case the three
predictions follow a fixed pattern over {c1, c2}, so at least two of
the three test points receive the same predicted value, and linarith
discharges the resulting numerical impossibility (the three required
O/E values are pairwise distinct).
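The strategy can be sketched as a standalone Lean proof, assuming Mathlib. The theorem name `categorical_fails_sketch` and the exact shape of the statement are assumptions of this sketch, since the file's own statement of categorical_fails_three_test_points is not reproduced here. The O/E values are again written as exact rationals (1.22 = 61/50, 0.59 = 59/100, 0.06 = 3/50), and the similarity points as 0, 1/4, and 11/20.

```lean
import Mathlib

def categoricalAtThreshold (t c1 c2 sim : ℚ) : ℚ :=
  if sim < t then c1 else c2

/-- No threshold and no pair of output values fits all three test points. -/
theorem categorical_fails_sketch (t c1 c2 : ℚ) :
    ¬ (categoricalAtThreshold t c1 c2 0 = 61 / 50 ∧
       categoricalAtThreshold t c1 c2 (1 / 4) = 59 / 100 ∧
       categoricalAtThreshold t c1 c2 (11 / 20) = 3 / 50) := by
  rintro ⟨h0, h1, h2⟩
  unfold categoricalAtThreshold at h0 h1 h2
  -- Split on the three threshold comparisons; in every branch two of the
  -- hypotheses pin the same constant (c1 or c2) to two different values.
  split_ifs at h0 h1 h2 <;> linarith
```

Here `split_ifs` generates all eight sign patterns rather than the four consistent intervals; the extra branches are closed by the same `linarith` call, so the sketch trades a slightly larger case split for a shorter script.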