Documentation

Linglib.Studies.Benua1997.Basic

Benua 1997 — Misapplication Unification #

[Ben97]

Benua's central empirical claim, defended across three case studies: overapplication (Sundanese nasal harmony) and underapplication (Tiberian Hebrew spirantization) of phonological processes in morphologically complex words are duals of one mechanism — high-ranked OO-Faithfulness forcing the derivative to mimic the base, even at the cost of marked structures in the derivative's surface form.

This file formalizes the misapplication-unification claim: the same TETRU-shaped ranking schema (M₁ ≫ OO-Ident ≫ M₂ ≫ IO-Faith) derives both kinds of misapplication, depending on whether the relevant markedness constraint M₂ is satisfied or violated by the base output.

What's here #

What's not here #

Architectural integration #

Consumes:

By construction, every claim about misapplication routes through the unifying Corr substrate. There is no separate Sundanese or Hebrew machinery — the cross-linguistic point of [Ben97] is that one mechanism explains both.

The paradigm-uniformity face of TCT: a Corr-typed 3-role diagram over input + base + derivative, with the OO edge carrying the morphological alignment. The asymmetric base-priority evaluation discipline lives in OptimalityTheory/TCT.lean (TCTGrammar); these constructors are the compositional face that the case studies below build their candidates from.

def Benua1997.formFor {α : Type u_1} (input base derivative : List α) :
TCT.RoleList α

Select the form for a TCT role from explicit input/base/derivative lists.

Equations
Instances For
    def Benua1997.diagramWithEdge {α : Type u_1} (input base derivative : List α) (ooEdge : Finset ( × )) (hOO : pooEdge, p.1 < base.length p.2 < derivative.length) :

    The general TCT diagram constructor: the three forms plus an explicit OO correspondence relation ooEdge between base and derivative positions (with a well-formedness proof hOO). The IO relations (input-base, input- derivative) are the parallel-pair correspondence; only the OO relation carries the morphological alignment a study specifies (e.g. the Sundanese infix). The reverse directions are recovered by Corr.edge.

    Equations
    • One or more equations did not get rendered due to their size.
    Instances For
      def Benua1997.diagram {α : Type u_1} (input base derivative : List α) :

      The parallel-pair specialization: the OO edge is the parallel (i, i) correspondence up to min base.length derivative.length. For morphological re-alignment use diagramWithEdge.

      Equations
      Instances For
        def Benua1997.identOOViol {α : Type u_1} [DecidableEq α] (c : OptimalityTheory.Correspondence.Corr TCT.Role α) :

        IDENT-OO: featural identity of corresponding base and derivative positions. The load-bearing constraint of [Ben97]'s misapplication unification — high-ranked IDENT-OO forces overapplication (Sundanese nasal harmony) and underapplication (Tiberian Hebrew spirantization) as duals of one mechanism.

        Equations
        Instances For
          theorem Benua1997.identOO_when_equal {α : Type u_1} [DecidableEq α] (input shared : List α) :
          identOOViol (diagram input shared shared) = 0

          When the derivative is identical to the base, IDENT-OO is satisfied (zero violations) — the "perfect uniformity" baseline where there is nothing to misapply.

          inductive Benua1997.Seg :

          A minimal segmental inventory shared across both case studies, abstract enough to encode the relevant phonological contrasts:

          • Nasal vs. oral consonant (Sundanese trigger; Hebrew sonority)
          • Nasal vs. oral vowel (Sundanese harmony target)
          • Stop vs. spirant (Hebrew spirantization target)

          Per Benua's analysis, the segmental representations abstract from the full IPA forms — what matters is the featural contrast at each position, not segment identity.

          Instances For
            @[implicit_reducible]
            instance Benua1997.instDecidableEqSeg :
            DecidableEq Seg
            Equations
            def Benua1997.instReprSeg.repr :
            SegStd.Format
            Equations
            Instances For
              @[implicit_reducible]
              instance Benua1997.instReprSeg :
              Repr Seg
              Equations

              Data #

              Singular: /ɲiar/ → [ɲĩãr] 'seek' Plural: /ɲ-ar-iar/ → [ɲ-ãl-ĩãr] 'seek (pl)'

              The data source is [Coh90] on Sundanese nasal phonology; [Ben97] reanalyzes it in TCT terms.

              Canonical postnasal nasal harmony: vowels are nasalized iff they follow a nasal consonant. In the singular [ɲĩãr], the post-ɲ vowels [ĩ ã] are nasal as expected.

              In the plural [ɲ-ãl-ĩãr], the root vowels [ĩ ã] in derivative positions 3-4 remain nasal even though they are now post-l (an oral consonant) — overapplication. Without OO-Faith, plain canonical harmony would predict [ɲ-ãl-iar] (oral [i a] in the root). Note the infix vowel [ã] is nasal by canonical post-ɲ harmony — it is not the overapplication target.

              Singular base: /ɲiar/ → [ɲĩãr]. Encoded as [nasalC, nasalV, nasalV, oralC].

              Equations
              Instances For

                Singular surface form (canonical postnasal nasal harmony).

                Equations
                Instances For

                  Plural input: /ɲ-ar-iar/ — root /ɲiar/ with infix /ar/ inserted after the nasal. Six segments: ɲ, a, r (infix), i, a, r (root).

                  Equations
                  Instances For

                    Overapplied plural surface form: [ɲ-ãl-ĩãr]. Position 1 (infix vowel) is nasal by canonical harmony (post-ɲ). Positions 3-4 (root vowels) are also nasal — this is the overapplication: they are post-l (oral) so canonical harmony would not spread to them, but OO-Faith to the base forces them to remain nasal.

                    Equations
                    Instances For

                      Counterfactual non-overapplying surface form: [ɲ-ãl-iar]. Canonical harmony stops at the oral consonant l at position 2; positions 3-4 are NOT in post-nasal context so both root vowels are oral. This is the candidate TCT correctly rules out via OO-Ident.

                      Equations
                      Instances For
                        def Benua1997.Sundanese.baseDerivEdge :
                        Finset ( × )

                        Morphological OO-edge: maps base positions to their morphological correspondents in the derivative, not index-by-index. The infix -ar- is inserted between root-initial ɲ and the rest, so the alignment is:

                        base[0] = ɲ  ↔  deriv[0] = ɲ
                        base[1] = i  ↔  deriv[3] = i  (deriv positions 1, 2 are infix material)
                        base[2] = a  ↔  deriv[4] = a
                        base[3] = r  ↔  deriv[5] = r
                        

                        Standard correspondence-theoretic treatment of infixation: the OO relation respects morphological identity, not linear position.

                        Equations
                        Instances For

                          The TCT correspondence diagram for the overapplication candidate.

                          Equations
                          • One or more equations did not get rendered due to their size.
                          Instances For

                            The TCT correspondence diagram for the non-overapplying candidate.

                            Equations
                            • One or more equations did not get rendered due to their size.
                            Instances For

                              The misapplication theorem (Sundanese case): the overapplied candidate strictly beats the canonical-harmony candidate on OO-Ident. Under TETRU ranking (OO-Ident ≫ *NASAL-AFTER-ORAL), overapplied wins.

                              Data #

                              Imperfect base: /yiktol/ → [yiqθol] 'kill (3sg.m.imperf)' — post-vocalic [t] spirantizes to [θ] by canonical rule. Truncated jussive: /yiktl/ → [yiqṭl] (NOT [yiqθl]) — the [t] in the cluster does NOT spirantize, even though now post-vocalic.

                              Without OO-Faith, the truncated form would canonically spirantize the post-vocalic [t]. Underapplication preserves paradigmatic identity to the imperfect base — but at the featural level: what's preserved is the [-continuant] value of the obstruent, not the segment identity per se. Benua argues this requires IDENT-[continuant]-OO, not segment-level OO-Ident.

                              This case study uses Corr.identViolFeature with a continuant projection to compute featural OO-IDENT, the constraint Benua actually appeals to.

                              The [continuant] feature value of a segment. Stops are [-cont], spirants are [+cont]; vowels are [+cont]; consonants other than obstruents are not in the relevant natural class but for this minimal encoding we project them to a default.

                              Equations
                              Instances For

                                Imperfect base input: /yiktol/. Position 2 is the relevant obstruent.

                                Equations
                                Instances For

                                  Imperfect surface form: [yiqθol] — post-vocalic [t] spirantizes to [θ]. Position 2 is spirant (i.e., [+cont]).

                                  Equations
                                  Instances For

                                    Underapplied jussive: [yiqṭl] — the [t] FAILS to spirantize even though now post-vocalic. Position 2 is stop ([-cont]), preserving the underlying featural value. The empirical winner.

                                    Equations
                                    Instances For

                                      Canonical-spirantization jussive: [yiqθl] — what the canonical rule would predict. Position 2 is spirant ([+cont]). The candidate TCT correctly rules out — using featural OO-IDENT.

                                      Equations
                                      Instances For

                                        Morphological OO-edge for Hebrew: the truncation removes base position 3 (the syllable-nucleus vowel [o]); base positions 0, 1, 2, and 4 map to derivative positions 0, 1, 2, 3 respectively:

                                        base[0] = y  ↔  deriv[0] = y
                                        base[1] = i  ↔  deriv[1] = i
                                        base[2] = θ/t ↔ deriv[2] = θ/t  (the spirantization site)
                                        base[4] = l  ↔  deriv[3] = l    (base[3] = o deleted by truncation)
                                        
                                        Equations
                                        Instances For

                                          TCT diagram for the (empirically winning) underapplied jussive.

                                          Equations
                                          • One or more equations did not get rendered due to their size.
                                          Instances For

                                            TCT diagram for the (empirically losing) canonical-spirantization jussive.

                                            Equations
                                            • One or more equations did not get rendered due to their size.
                                            Instances For

                                              Featural OO-IDENT: counts pairs (i, j) ∈ ooEdge where the [continuant] feature value differs between base[i] and derivative[j]. The constraint [Ben97] actually appeals to (Ch 4).

                                              For the underapplied candidate: base[2] = spirant ([+cont]), deriv[2] = stop ([-cont]) — featural mismatch. Other pairs preserve [continuant].

                                              For the canonical candidate: base[2] = spirant ([+cont]), deriv[2] = spirant ([+cont]) — featural match.

                                              Equations
                                              Instances For

                                                The Hebrew featural-IDENT inversion: the empirically-correct underapplied candidate has more OO-IDENT-[cont] violations than the canonical-spirantization candidate. Under pure OO-IDENT-[cont] ≫ M₂, canonical would win — contradicting the empirical fact.

                                                The resolution (per [Ben97] Ch 4): the load-bearing constraint is IDENT-[continuant]-IO preserving the input stop, not OO-Ident. The TETRU schema for Hebrew puts IO-Ident-[cont] above OO-Ident, then *MAX-V (deletes [o]) above all the rest. This file formalizes the OO substrate; the full Hebrew tableau requires the additional IO-faith and *MAX-V machinery, deferred.

                                                The unified architectural claim of [Ben97]: both overapplication (Sundanese, §2) and underapplication (Tiberian Hebrew, §3) are formalized as the same construction — a 3-role TCT correspondence diagram with IDENT-OO (segmental or featural) on the (.base, .derivative) edge. The empirical contrast between the two languages reduces to which markedness constraint plays the M₂ role in the TETRU schema, and whether OO-Ident is segmental or featural.

                                                Under the TCT.TetruSchema substrate (`Phonology/OptimalityTheory/TCT.lean`):
                                                
                                                - **Sundanese**: M₂ = `*NASAL-AFTER-ORAL`. OO-Ident is segmental (or
                                                  restricted to `[nasal]`). Overapplication = the misapplied
                                                  candidate (deriv positions 3-4 nasal) strictly beats canonical.
                                                - **Tiberian Hebrew**: M₂ involves spirantization plus *MAX-V plus
                                                  featural IO-Ident-[cont]. The OO-Ident comparison alone does not
                                                  decide — full TETRU requires the extra constraints documented above.
                                                
                                                The shared structural mechanism: `TetruSchema.misapplication_wins`
                                                (in `TCT.lean`). 
                                                

                                                The polemic of Benua's thesis #

                                                [Ben97]'s central architectural argument is that parallel TCT subsumes the predictions of stratal/cyclic phonology for misapplication patterns, eliminating the need for cycles. The two architectures differ in how they explain misapplication, but converge on the same surface predictions for the canonical cases.

                                                This section formalizes the architectural comparison on Sundanese: both architectures predict the overapplied surface form [ɲ-ãl-ĩãr] for the plural, but for different structural reasons.

                                                The bridge: on Sundanese, both architectures agree on the surface form. The agreement is documented as an rfl-checkable claim about the architecturally-distinct derivations producing identical phraseOutput.

                                                A two-stratum stratal derivation of the Sundanese plural.

                                                • Stem cycle: input /ɲiar/ harmonized to [ɲĩãr] = Sundanese.baseOutput.
                                                • Word cycle: stem output combined with infix /-ar-/, then Word-stratum harmony produces [ɲ-ãl-ĩãr] = Sundanese.derivOutputOverapplied. Crucially, the Word stratum preserves the nasal vowels carried over from the Stem cycle (high-ranked IDENT-IO at Word level).
                                                • Phrase cycle: identity (no further phonological adjustment).

                                                The full derivation is encoded directly; the actual stratum-by-stratum OT evaluations are deferred (would require stratum-specific constraint rankings + GEN/EVAL machinery instantiated on Sundanese.Seg). The point of this section is the architectural agreement, not the grammar implementation.

                                                Equations
                                                • One or more equations did not get rendered due to their size.
                                                Instances For

                                                  The stratal derivation expressed as a Corr StratalRole Seg. Now stratal and TCT analyses share a substrate type — the Corr family. Cross-stratum feeding edges (.sIn↔.sOut, .sOut↔.wOut, .wOut↔.pOut) carry parallel-pair correspondences.

                                                  Equations
                                                  • One or more equations did not get rendered due to their size.
                                                  Instances For

                                                    The stratal derivation projected down to TCT roles. The .sOut becomes TCT .base; the .pOut becomes TCT .derivative; the .wOut is folded out (TCT doesn't distinguish a separate word stratum).

                                                    Equations
                                                    • One or more equations did not get rendered due to their size.
                                                    Instances For

                                                      Bridge theorem (substrate level): the stratal derivation's phrase output, when projected to a TCT diagram, IS the TCT derivative form. True by rfl via project_preserves_phrase_as_derivative. The two architectures share their notion of "surface form" by construction once they are projected onto the same Corr substrate.

                                                      Bridge theorem (Sundanese-specific): the stratal derivation and the directly-built TCT diagram (Sundanese.overappliedDiagram) agree on the surface form. Both produce Sundanese.derivOutputOverapplied — the empirical convergence claim of [Ben97].

                                                      Bridge theorem (architectural): the stratal stem output IS the TCT base. Benua's identification of "stratum-1 result = OO-base" is true by construction of the project function.

                                                      Bridge theorem (substrate level): the OO-correspondence edge in the projected TCT diagram is the parallel-pair correspondence between the stratal stem output and phrase output. The formal content of "the OO-Faith of TCT replaces the cycle-chain of stratal": the OO edge is recovered from the stratal feeding chain by composing (.sOut, .wOut) and (.wOut, .pOut) into the direct (.base, .derivative) correspondence.

                                                      What's not yet proved #

                                                      This bridge is substrate-level (the stratal and TCT diagrams share the Corr family) and concrete (one paradigm). The full Benua claim — for every stratal analysis there exists a TCT analysis producing the same surface predictions — is a meta-theoretical statement requiring:

                                                      1. A model of stratal grammars as functions Input → Output derived from stratum-specific constraint rankings (Stratal.evalStratum chained via chainEval).
                                                      2. A model of TCT grammars as TCTGrammar instances (already exists in TCT.lean).
                                                      3. A constructive translation stratalToTCT : StratalGrammar → TCTGrammar such that for all inputs, the derived surface forms agree — essentially: "TCT's OO-Faith with stratum-1-output as base = stratum-2's IDENT-IO with stratum-1-output as input."

                                                      Step 3 is the load-bearing piece; [Ben97]'s polemic is that this translation exists (and is empirically supported by Sundanese, Tiberian Hebrew, and English). Formalizing the constructive translation — or finding a counterexample — is the next major scientific step. The substrate is now in place: any candidate translation would map through the Corr Role α type that both architectures now share.