Frisch, Pierrehumbert & Broe (2004) @cite{frisch-pierrehumbert-broe-2004} #

Similarity Avoidance and the OCP. Natural Language & Linguistic Theory 22(1):179–228.

@cite{frisch-pierrehumbert-broe-2004} (FPB) argue that the OCP-Place constraint in Arabic verbal roots is gradient: its strength is a quantitative function of the similarity between homorganic consonants. The categorical OCP-Place analyses of @cite{mccarthy-1986}, @cite{padgett-1995}, and @cite{mccarthy-1994} all face the same trade-off. Dividing consonants into co-occurrence classes either ignores robust within-class variation (broad classes, many exceptions) or fragments the data into ad hoc sub-classes (narrow classes, many missed generalisations). FPB resolve the trade-off with a single gradient constraint based on the natural-classes similarity metric (eq. 7):

Similarity(a, b) = SharedClasses(a, b) / (SharedClasses(a, b) + NonSharedClasses(a, b))

where SharedClasses(a, b) counts the natural classes that contain both consonants, NonSharedClasses(a, b) counts those that contain exactly one of them, and only natural classes containing a place feature are counted. Identical consonants have similarity 1; non-homorganic consonants have similarity 0. The metric is sensitive to inventory contrast: larger, more finely subdivided classes (e.g. coronals) yield lower similarity for any pair within them than smaller classes (e.g. labials), which is how the metric captures the empirical distance between the strong coronal OCP and the weaker guttural/dorsal OCPs.
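The counting behind the metric can be sketched in a few lines of core Lean 4. This is a minimal illustration, not the file's actual definitions: the three-segment type `Seg`, the class list `toyClasses`, and the names `sharedCount`, `nonSharedCount`, and `similarityFrac` are all hypothetical, and the toy class list is not the paper's labial enumeration. Similarity is returned as an unreduced (numerator, denominator) pair so that no rational-number library is assumed.

```lean
/-- Toy three-segment inventory (hypothetical; not the file's 28-segment type). -/
inductive Seg where
  | b | f | m
  deriving DecidableEq, Repr

/-- Natural classes given extensionally as lists of segments (hypothetical toy list). -/
def toyClasses : List (List Seg) :=
  [[.b, .f, .m], [.b, .m], [.b], [.f], [.m]]

/-- Number of classes containing both segments. -/
def sharedCount (classes : List (List Seg)) (x y : Seg) : Nat :=
  (classes.filter fun c => c.contains x && c.contains y).length

/-- Number of classes containing exactly one of the two segments. -/
def nonSharedCount (classes : List (List Seg)) (x y : Seg) : Nat :=
  (classes.filter fun c => c.contains x != c.contains y).length

/-- Similarity as an unreduced (shared, shared + non-shared) pair. -/
def similarityFrac (classes : List (List Seg)) (x y : Seg) : Nat × Nat :=
  let s := sharedCount classes x y
  (s, s + nonSharedCount classes x y)

#eval similarityFrac toyClasses .b .m  -- (2, 4)
#eval similarityFrac toyClasses .b .f  -- (1, 4)
#eval similarityFrac toyClasses .b .b  -- (3, 3)
```

The last line shows why identical consonants come out at similarity 1: no class can contain exactly one of two identical segments, so the non-shared count is zero.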

What this file formalises #

§1 — the 28-segment Arabic consonant inventory from FPB (8) p. 201.
§2 — the labial natural classes as enumerated by the paper itself in its two worked-example computations (p. 199). The enumeration is given separately for the /f, m/ and /b, f/ examples (matching the paper's text); see the design-boundary comment below on why the two enumerations are not identical.
§3 — the natural-classes similarity metric (eq. 7), parameterised on a list of relevant natural classes.
§3 worked examples — the paper's two explicit computations: similarity(/f, m/) = 2/9 and similarity(/b, f/) = 3/8 (p. 199).
§4 — the empirical Table IV (p. 203) for adjacent pairs: 9 (similarity-bin, O/E) data points whose monotonic decrease embodies FPB's gradient claim.
§5 — the substrate connection (thresholdedTSL + thresholdedTSL_pair_iff) showing that the TSL_2 grammar TSLGrammar.ofForbiddenPairs (similarity ≥ t) Arabic.isLabial makes a binary step-function decision on labial pairs (accept iff similarity strictly below threshold), and the cross-framework divergence theorem categorical_fails_three_test_points showing that no two-valued model (and hence no similarity-threshold TSL_2 grammar) can match three specific Table IV bins with three pairwise-distinct O/E values. This is the necessary-consequence formalisation of the design-boundary claim in Core/Computability/Subregular/ForbiddenPairs.lean. FPB's actual argument is stronger — it compares R² fits across nine bins (Categorical 0.70 vs Natural Classes 0.75 per Table V p. 207) — but full R² formalisation requires the lexical corpus and is deferred.
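The logical core of the divergence claim in §5 is a pigeonhole fact: a model whose output takes at most two values cannot reproduce three pairwise-distinct observations. A minimal core-Lean sketch of that fact follows. The theorem name and its generic statement are hypothetical and are not claimed to match the statement or proof of categorical_fails_three_test_points in the file.

```lean
/-- If every output of `f` is either `lo` or `hi`, then `f` cannot take
    three pairwise-distinct values (hypothetical illustration). -/
theorem two_valued_misses_three {α β : Type} (f : α → β) (lo hi : β)
    (hf : ∀ x, f x = lo ∨ f x = hi) (a b c : α)
    (hab : f a ≠ f b) (hbc : f b ≠ f c) (hac : f a ≠ f c) : False :=
  match hf a, hf b, hf c with
  -- Two of the three outputs must coincide; each branch picks such a pair.
  | .inl ha, .inl hb, _ => hab (ha.trans hb.symm)
  | .inr ha, .inr hb, _ => hab (ha.trans hb.symm)
  | .inl ha, _, .inl hc => hac (ha.trans hc.symm)
  | .inr ha, _, .inr hc => hac (ha.trans hc.symm)
  | _, .inl hb, .inl hc => hbc (hb.trans hc.symm)
  | _, .inr hb, .inr hc => hbc (hb.trans hc.symm)
```

Instantiating `f` with a thresholded accept/reject prediction and `a`, `b`, `c` with three bins whose observed O/E values differ pairwise gives the shape of the argument described above.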

What this file does not formalise #

The 2,674-root corpus from @cite{cowan-1979} (the Hans Wehr Arabic-English dictionary) that anchors FPB's O/E computations is not reproduced in the paper text. We take the paper's reported O/E values (Table IV) as data rather than re-deriving them from the corpus.
The R² model fits (Table V: Frequency 0.57, Categorical 0.70, Soft 0.73, Feature 0.71, Natural Classes 0.75) require the corpus. Reproduction is deferred.
The stochastic constraint model of FPB §3.4 (logistic fit with K, S parameters; @cite{frisch-broe-pierrehumbert-1997}, Rutgers Optimality Archive) requires the corpus and is deferred.
§4.1 Frisch–Zawaydeh nonce-verb judgments (@cite{frisch-zawaydeh-2001}: Arabic speakers rate /babaθa/ identical < /θabama/ similar adjacent < /baʃafa/ similar nonadjacent < /baʔada/ nonhomorganic in OCP-violation severity) — experimental data, summarised in docstring only.
§4.2 Maltese borrowings from Italian (FPB Table VI p. 213: identical 0.26, similar homorganic 0.45, coronal stop/fric 0.78 — gradient OCP applied selectively to incorporated Italian forms; @cite{mifsud-1995}) — corpus-based, summarised only.
§4.2 cross-linguistic similarity-OCP attestations (Tigrinya @cite{buckley-1997-ocp}, Russian @cite{padgett-1995}, English @cite{berkley-1994}, Thai @cite{frisch-2000a}) — referenced in docstring.
§4.3 phonetic / cognitive origin (@cite{berg-1998}, @cite{boersma-1998}, @cite{frisch-1996} processing-difficulty argument; speech-error data of @cite{abd-el-jawad-abu-salim-1987}: /takriib/ for /takbiir/ 'glorification', /maraaʕiʃ/ for /maʃaaʕir/ 'feelings') — diachronic-functional grounding, summarised only.
Full natural-class derivation from a feature matrix — the paper sketches this (eq. 8 lays out the feature matrix; @cite{broe-1993}'s specification theory provides the lattice machinery), but a faithful Lean implementation requires a substrate effort the audit explicitly flagged as a separate next step. This file reproduces the paper's worked-example natural-class lists (per-pair) rather than deriving them.

Connection to `ForbiddenPairs.lean`'s design boundary #

Linglib/Core/Computability/Subregular/ForbiddenPairs.lean (the substrate file for tier-based strictly 2-local grammars defined by forbidden-pair relations) cites FPB in its design-boundary section as the empirical motivation for "single-tier TSL_2 cannot capture gradient similarity-based OCP." Before this file existed, that citation lived only in docstring prose. Two declarations below state the claim formally in Lean: (1) thresholdedTSL instantiates the substrate's TSLGrammar.ofForbiddenPairs with R := λ x y => similarity xs x y ≥ t, giving a real (not metaphorical) TSL_2 grammar over Arabic; and (2) thresholdedTSL_pair_iff proves that the grammar's accept/reject decision on labial pairs is exactly the binary step function similarity < t. The divergence theorem categorical_fails_three_test_points then witnesses that no such two-valued model can match three specific Table IV bins.
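The step-function behaviour can be illustrated without the substrate API. The sketch below is a stand-in under stated assumptions: `Frac`, `Frac.ge`, `forbiddenAt`, and `acceptsAt` are hypothetical names, the threshold 1/4 is chosen only for illustration, and nothing here uses TSLGrammar.ofForbiddenPairs itself. Fractions are compared by cross-multiplication on natural numbers, assuming positive denominators.

```lean
/-- A fraction as a numerator/denominator pair (denominator assumed positive). -/
structure Frac where
  num : Nat
  den : Nat

/-- `a ≥ b`, decided by cross-multiplication. -/
def Frac.ge (a b : Frac) : Bool :=
  decide (a.num * b.den ≥ b.num * a.den)

/-- A pair is forbidden when its similarity reaches the threshold `t`. -/
def forbiddenAt (t sim : Frac) : Bool := sim.ge t

/-- A pair is accepted exactly when its similarity is strictly below `t`. -/
def acceptsAt (t sim : Frac) : Bool := !(forbiddenAt t sim)

-- Hypothetical threshold 1/4 applied to the two worked-example similarities.
#eval acceptsAt ⟨1, 4⟩ ⟨2, 9⟩  -- true:  2/9 < 1/4
#eval acceptsAt ⟨1, 4⟩ ⟨3, 8⟩  -- false: 3/8 ≥ 1/4
```

Whatever threshold is chosen, the output is one of two values, which is the property the divergence theorem exploits.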

Connection to `Hansson2010.lean` #

@cite{hansson-2010} (Phenomena/Phonology/Studies/Hansson2010.lean) cites FPB at line 76 in its design-boundary section on similarity-graded transparency: cases where intervening segments behave differently depending on similarity to the harmonising pair cannot be captured by single-tier TSL with a fixed tier predicate. This file is the load-bearing instance of that observation for Arabic OCP-Place specifically.

Why this paper anchors a study file #

Per CLAUDE.md's anchoring discipline: every Lean file is anchored to exactly one of (a) a specific paper, (b) a documented empirical pattern, or (c) a named theoretical framework. The audit on ForbiddenPairs.lean (committed 0.230.508, hash 8579b346) flagged FPB as one of four "silent divergences" — a paper used as a substrate file's docstring example without itself being anchored. This file closes that finding by anchoring FPB to its primary phenomenon (Phonology) with a Lean-formal divergence theorem connecting back to the substrate citation.

§ 1: The Arabic Consonant Inventory (FPB feature matrix (8), p. 201) #

§ 2: Labial Natural Classes — Per-Pair Enumerations from FPB p. 199 #

Per-pair enumerations vs unified lattice #

§ 3: The Natural-Classes Similarity Metric (FPB eq. 7, p. 198) #

§ 3 worked examples (FPB p. 199) #

§ 4: Empirical Table IV (FPB p. 203, adjacent pairs) #

§ 5: Cross-Framework Divergence — Categorical Cannot Fit Table IV #

The categorical-at-threshold model #

Connection to `ForbiddenPairs.lean`'s design boundary #

Connection to `Hansson2010.lean` #