
Linglib.Studies.Bybee1985

Bybee (1985): the relevance hypothesis #

[Byb85] (Morphology: A Study of the Relation Between Meaning and Form, Typological Studies in Language 9) tests the relevance hypothesis on a 50-language stratified probability sample (Perkins 1980): a morpheme category whose meaning is more relevant to the verb stem (a) occurs more often as inflection cross-linguistically, (b) sits closer to the stem in suffixal morphology, and (c) fuses more tightly with it.

This file formalizes the data behind claims (a) and (b) — the Ch 2 §5 frequency surveys and the Ch 2 §6 morpheme-order counts — and grounds the substrate's relevance order (MorphCategory.peripherality, via RelevanceLT / RelevanceLE) in that evidence. Claim (c), fusion, is qualitative in the source and is not formalized.

Main definitions #

Main results #

Implementation notes #

Out of scope: Ch 1 fusion/allomorphy, Ch 3 paradigm organization, Ch 4's lexical-derivational-inflectional continuum (which the discrete MorphStatus enum cannot express; flagged for future work), and the Part II tense/aspect/mood detail (Ch 6-9). MorphCategory.RelevanceLT is exercised independently by Studies/HahnDegenFutrell2021Morphology.lean; the BybeeCategory enum and the toMorphCategory bridge feed Studies/RathiHahnFutrell2026.lean. The Ch 5 network substrate lives in Morphology/UsageBased/Network.lean; the final section below (Ch 5 dynamic network) applies it to the English eat/ate/eaten paradigm.

Verbal categories (Ch 2 §3) #

Bybee's six core categories in relevance order: valence, voice, aspect, tense, mood, agreement (number/person/gender are agreement sub-types).

Bybee's Ch 2 verbal-inflectional categories, in her relevance order (stem first). Object agreement, number, and gender are tracked separately.
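The relevance order can be carried by declaration order together with a numeric rank. A minimal sketch, assuming a hypothetical `CoreCat` enum that holds only the six core categories and a hypothetical `CoreCat.rank`; the real `BybeeCategory` has further constructors (object agreement, number, gender) and its declaration is not reproduced on this page.

```lean
/-- Hypothetical sketch: the six core Ch 2 §3 categories, declared stem first. -/
inductive CoreCat where
  | valence
  | voice
  | aspect
  | tense
  | mood
  | agreement
  deriving DecidableEq, Repr

/-- Hypothetical rank: position in the relevance order (0 = closest to the stem). -/
def CoreCat.rank : CoreCat → Nat
  | .valence => 0
  | .voice => 1
  | .aspect => 2
  | .tense => 3
  | .mood => 4
  | .agreement => 5

/-- The stem-outward listing of the six categories. -/
def coreOrder : List CoreCat :=
  [.valence, .voice, .aspect, .tense, .mood, .agreement]

/-- `true` when each element is strictly smaller than its successor. -/
def strictlyIncreasing (xs : List Nat) : Bool :=
  (xs.zip xs.tail).all fun (a, b) => decide (a < b)

-- Declaration order and rank agree.
example : strictlyIncreasing (coreOrder.map CoreCat.rank) = true := by decide
```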


      Cross-linguistic frequency (Ch 2 §5, Figs 1+2) #

      Fig 1 counts the languages in the 50-language sample that express a category inflectionally; Fig 2 counts those that express it inflectionally or derivationally. Counts are stored as integers: with exactly 50 languages, every reported percentage is even and count = percentage / 2.

      Prediction (a), deriv+infl: valence is the most frequent category, reflecting near-universal valence-changing morphology.

      Among purely inflectional categories, mood is the most frequent (Fig 1, 68%). Valence's drop from 90% to 6% is Bybee's point that valence-changing morphology is almost always derivational.

      In the deriv+infl survey (Fig 2), gender agreement is the least frequent category, matching its place as Bybee's least-relevant verbal category. (In the inflection-only survey, valence falls below gender, so Fig 2 is the ranking faithful to relevance.)
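The percentage-to-count conversion, as a small Lean sketch. `sampleSize` and `pctToCount` are hypothetical names; the figures are the ones quoted above.

```lean
/-- Size of Bybee's stratified sample. -/
def sampleSize : Nat := 50

/-- Hypothetical helper: the language count behind a reported percentage.
    With 50 languages this equals `pct / 2`, exact whenever `pct` is even. -/
def pctToCount (pct : Nat) : Nat := pct * sampleSize / 100

-- Valence, derivational or inflectional (Fig 2): 90% is 45 languages.
example : pctToCount 90 = 45 := by decide
-- Mood, inflectional (Fig 1): 68% is 34 languages.
example : pctToCount 68 = 34 := by decide
-- Valence, inflectional (Fig 1): 6% is 3 languages.
example : pctToCount 6 = 3 := by decide
```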

      Morpheme order (Ch 2 §6) #

      Prediction (b): the most relevant categories sit closest to the stem, the least relevant furthest. Bybee tests the four most frequent — aspect, tense, mood, person — counting, per pair, how many languages place one closer than the other. A pair is excluded when the morphemes are portmanteau, on opposite sides of the stem, mutually exclusive in one slot, or realized by stem modification.

      A Ch 2 §6 morpheme-order pair: closer / further are the predicted nearer / farther categories, closerCount / furtherCount the languages confirming / contradicting it, and total the languages where it is testable.
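A sketch of the pair record and of the two derived quantities the prose relies on. `SurveyCat`, `OrderPairSketch`, `nonRelevant`, and `predictedMajority` are hypothetical names and the field types are assumed; the real `OrderPair` declaration is not reproduced here.

```lean
/-- The four categories tested in Ch 2 §6 (hypothetical local enum). -/
inductive SurveyCat where
  | aspect
  | tense
  | mood
  | person
  deriving DecidableEq, Repr

/-- Sketch of a §6 morpheme-order record. Field names follow the prose
    above; the `Nat` field types are an assumption. -/
structure OrderPairSketch where
  closer : SurveyCat
  further : SurveyCat
  closerCount : Nat
  furtherCount : Nat
  total : Nat
  deriving Repr

/-- Languages in `total` that confirm neither direction, i.e. those set
    aside by the exclusion criteria. -/
def OrderPairSketch.nonRelevant (p : OrderPairSketch) : Nat :=
  p.total - p.closerCount - p.furtherCount

/-- The predicted direction is the majority direction. -/
def OrderPairSketch.predictedMajority (p : OrderPairSketch) : Bool :=
  decide (p.furtherCount < p.closerCount)
```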


          The six pairs Bybee tests in Ch 2 §6; counts verified against the book, each inline comment quoting its source passage.


            Aspect vs. tense and aspect vs. mood are categorical: zero counterexamples in the whole sample, the strongest confirmations Bybee reports.

            Mood vs. person is the freest pair: the only one whose counterexample-to-total ratio exceeds 1/10.

            In every one of the six pairs the predicted direction outnumbers the counter-direction (Ch 2 §6 summary).
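The pair-level claims above can be phrased as Boolean predicates on the counts. This is a sketch with a hypothetical `PairCounts` record; the 1/10 threshold is cross-multiplied so the test stays in natural numbers. No per-pair counts are filled in, since they are not given on this page.

```lean
/-- Hypothetical record holding only the count fields of a §6 pair. -/
structure PairCounts where
  closerCount : Nat
  furtherCount : Nat
  total : Nat

/-- Categorical: no language orders the pair against the prediction. -/
def PairCounts.categorical (p : PairCounts) : Bool :=
  p.furtherCount == 0

/-- Counterexample-to-total ratio above 1/10, cross-multiplied to stay in `Nat`:
    furtherCount / total > 1/10 ↔ 10 * furtherCount > total (for total > 0). -/
def PairCounts.exceedsTenth (p : PairCounts) : Bool :=
  decide (p.total < 10 * p.furtherCount)

/-- The predicted direction outnumbers the counter-direction. -/
def PairCounts.predictedMajority (p : PairCounts) : Bool :=
  decide (p.furtherCount < p.closerCount)

/-- Two of the §6 summary claims over a list of pairs: every pair is a
    predicted majority, and exactly one exceeds the 1/10 threshold. -/
def summaryHolds (ps : List PairCounts) : Bool :=
  ps.all PairCounts.predictedMajority && (ps.filter PairCounts.exceedsTenth).length == 1
```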

            theorem Bybee1985.ch2_section6_aggregate_counts :
            (List.map (fun (x : OrderPair) => x.total) orderPairs).sum = 125 ∧ (List.map (fun (x : OrderPair) => x.closerCount) orderPairs).sum = 59 ∧ (List.map (fun (x : OrderPair) => x.furtherCount) orderPairs).sum = 8

            Aggregate Ch 2 §6 counts: 125 testable observations, 59 in the predicted direction and 8 against (the other 58 are non-relevant under the exclusion criteria above).
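The aggregate arithmetic, as a sketch. `sumBy` is a hypothetical fold standing in for `List.sum`; the numbers are the ones stated above.

```lean
/-- Sum of one field over a list; a fold standing in for the
    `(List.map f orderPairs).sum` terms in the theorem statement. -/
def sumBy {α : Type} (f : α → Nat) (xs : List α) : Nat :=
  xs.foldl (fun acc x => acc + f x) 0

-- The three buckets partition the 125 observations: 59 + 8 + 58.
example : sumBy id [59, 8, 58] = 125 := by decide
-- Equivalently, the non-relevant remainder is 125 - 59 - 8.
example : 125 - 59 - 8 = 58 := by decide
```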

            Connection to substrate MorphCategory.peripherality #

            MorphCategory.peripherality (in Morphology/MorphRule.lean) numerically encodes the hierarchy — lower = closer to stem = more relevant — faithfully to Ch 2 §3 for the six core categories. Its extensions (derivation, degree, negation, nonfinite) are linglib additions, not Bybee's.

            The substrate relevance order is strictly increasing along the six Ch 2 §3 categories: it reproduces valence < voice < aspect < tense < mood < agreement.

            The Ch 2 §6 morpheme-order data is exactly what RespectsRelevanceHierarchy predicts: in the substrate relevance order, Bybee's predicted-closer category is never less stem-relevant than the predicted-further one, so the order agrees with the empirical-majority direction on every tested pair.
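A sketch of what such a hierarchy check can look like, assuming a hypothetical `Cat` enum, hypothetical peripherality values, and a hypothetical `respectsHierarchy` that reads a slot list stem-outward and requires peripherality never to decrease. The actual `RespectsRelevanceHierarchy` and `MorphCategory.peripherality` definitions may be phrased differently.

```lean
/-- Hypothetical stand-in for the six core `MorphCategory` values. -/
inductive Cat where
  | valence
  | voice
  | aspect
  | tense
  | mood
  | agreement
  deriving DecidableEq, Repr

/-- Hypothetical peripherality (lower = closer to the stem = more relevant).
    The values in `Morphology/MorphRule.lean` may differ; only their order matters. -/
def Cat.peripherality : Cat → Nat
  | .valence => 0
  | .voice => 1
  | .aspect => 2
  | .tense => 3
  | .mood => 4
  | .agreement => 5

/-- Sketch of a hierarchy check on a stem-outward slot list: no slot is
    more stem-relevant than the slot before it. -/
def respectsHierarchy (slots : List Cat) : Bool :=
  (slots.zip slots.tail).all fun (a, b) => decide (a.peripherality ≤ b.peripherality)

/-- The six §6 predictions as (closer, further); person agreement is
    mapped to `agreement`. -/
def predictedPairs : List (Cat × Cat) :=
  [(.aspect, .tense), (.aspect, .mood), (.aspect, .agreement),
   (.tense, .mood), (.tense, .agreement), (.mood, .agreement)]

-- Every predicted-closer category is at least as stem-relevant as its partner.
example : predictedPairs.all (fun (a, b) => respectsHierarchy [a, b]) = true := by decide
```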

            Grounding the hierarchy in the survey #

            On the four categories Bybee surveyed (aspect, tense, mood, person), the substrate order is not a free choice: SurveyedCloser, derived from orderPairs, coincides with RelevanceLT via toMorphCategory (survey_order_iso_relevance). So a RespectsRelevanceHierarchy check over these categories rests on an order isomorphism, not a stipulated table.

            a is surveyed closer to the stem than b when some tested Ch 2 §6 pair predicts a closer than b and the language counts confirm that direction (closerCount exceeds furtherCount). Derived from orderPairs, not stipulated.


              A category is surveyed if it appears in any tested Ch 2 §6 pair.
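Bool sketches of the two derived notions. The real `SurveyedCloser` and `Surveyed` are described as Props over `orderPairs`; here they are hypothetical Boolean functions over an explicit pair list, with a hypothetical `SurveyPair` record.

```lean
inductive SurveyCat where
  | aspect
  | tense
  | mood
  | person
  deriving DecidableEq, Repr

/-- Hypothetical §6 record (field names follow the prose). -/
structure SurveyPair where
  closer : SurveyCat
  further : SurveyCat
  closerCount : Nat
  furtherCount : Nat
  total : Nat

/-- Bool sketch of `SurveyedCloser`: some tested pair predicts `a` closer
    than `b`, and its counts favour that direction. -/
def surveyedCloser (pairs : List SurveyPair) (a b : SurveyCat) : Bool :=
  pairs.any fun p => p.closer == a && p.further == b && decide (p.furtherCount < p.closerCount)

/-- Bool sketch of `Surveyed`: the category occurs in some tested pair. -/
def surveyed (pairs : List SurveyPair) (c : SurveyCat) : Bool :=
  pairs.any fun p => p.closer == c || p.further == c
```

Because only the stated direction of each pair is consulted, a pair never makes its further category surveyed-closer, which is what keeps the derived relation irreflexive and asymmetric.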


                SurveyedCloser is irreflexive: no tested pair ranks a category against itself.

                SurveyedCloser is transitive — the §6 survey tested every pair among its categories, so the confirmed dominances compose.

                theorem Bybee1985.surveyedCloser_total (a b : BybeeCategory) :
                Surveyed a → Surveyed b → a ≠ b → SurveyedCloser a b ∨ SurveyedCloser b a

                SurveyedCloser is total on the surveyed categories: any two distinct surveyed categories are ordered by the §6 data in at least one direction, and, given irreflexivity and transitivity, in exactly one. Together, these three properties mean the survey alone determines a strict total order on its four categories.

                Grounding theorem: whenever the survey places a closer than b, the substrate ranks a strictly more stem-relevant — so toMorphCategory is strictly monotone from the surveyed order into the relevance order.

                Order isomorphism: on the surveyed categories, SurveyedCloser and the substrate RelevanceLT coincide via toMorphCategory. The hierarchy there is not merely consistent with Bybee's evidence — it is the order the §6 survey determines.
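The order-theoretic claims can be checked by exhaustion over the four categories. A sketch under stated assumptions: the six pairs are written directly in their confirmed direction (the real file derives them from the counts), and `rank` is a hypothetical stand-in for the substrate order, chosen to match the stem-outward sequence aspect, tense, mood, person.

```lean
inductive SurveyCat where
  | aspect
  | tense
  | mood
  | person
  deriving DecidableEq, Repr

def allCats : List SurveyCat := [.aspect, .tense, .mood, .person]

/-- Assumption: the six §6 pairs written in their confirmed direction. -/
def confirmed : List (SurveyCat × SurveyCat) :=
  [(.aspect, .tense), (.aspect, .mood), (.aspect, .person),
   (.tense, .mood), (.tense, .person), (.mood, .person)]

def closer (a b : SurveyCat) : Bool :=
  confirmed.any fun (x, y) => x == a && y == b

/-- Hypothetical substrate rank (lower = more relevant). -/
def rank : SurveyCat → Nat
  | .aspect => 2
  | .tense => 3
  | .mood => 4
  | .person => 5

-- Irreflexive.
example : allCats.all (fun a => !closer a a) = true := by decide
-- Transitive.
example : allCats.all (fun a => allCats.all fun b => allCats.all fun c =>
    !(closer a b && closer b c) || closer a c) = true := by decide
-- Total on distinct categories.
example : allCats.all (fun a => allCats.all fun b =>
    a == b || closer a b || closer b a) = true := by decide
-- Isomorphism with the rank order: `closer a b` iff `rank a < rank b`.
example : allCats.all (fun a => allCats.all fun b =>
    closer a b == decide (rank a < rank b)) = true := by decide
```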

                The stem-outward ordering of the surveyed categories — a literal, but validated below as fully SurveyedCloser-sorted (bybeeSurveyedOrder_sorted) and exactly the surveyed categories without repeats (_complete, _nodup).


                  The surveyed order mapped to substrate categories — exposed so consumers (HahnDegenFutrell2021Morphology, Karlsson2017, RathiHahnFutrell2026) check their slot orders against the survey rather than re-asserting the hierarchy.


                    The data-derived surveyed order satisfies the substrate predicate, closing the loop between Bybee's §6 evidence and RespectsRelevanceHierarchy.
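A sketch of the three validations of the literal order, using hypothetical names (`surveyedOrder`, `pairwiseSorted`, `noDup`) and a hypothetical `rank` standing in for SurveyedCloser. Pairwise sortedness compares every earlier element with every later one, which is stronger than checking neighbours only.

```lean
inductive SurveyCat where
  | aspect
  | tense
  | mood
  | person
  deriving DecidableEq, Repr

def allCats : List SurveyCat := [.aspect, .tense, .mood, .person]

/-- Hypothetical substrate rank (lower = more relevant). -/
def rank : SurveyCat → Nat
  | .aspect => 2
  | .tense => 3
  | .mood => 4
  | .person => 5

/-- Hypothetical stand-in for `bybeeSurveyedOrder`, written as a literal. -/
def surveyedOrder : List SurveyCat := [.aspect, .tense, .mood, .person]

/-- Pairwise sorted: each category outranks every later one. -/
def pairwiseSorted : List SurveyCat → Bool
  | [] => true
  | a :: rest => rest.all (fun b => decide (rank a < rank b)) && pairwiseSorted rest

/-- No category appears twice. -/
def noDup : List SurveyCat → Bool
  | [] => true
  | a :: rest => !rest.contains a && noDup rest

-- Sorted, complete (every category listed), and duplicate-free.
example : pairwiseSorted surveyedOrder = true := by decide
example : allCats.all (surveyedOrder.contains ·) = true := by decide
example : noDup surveyedOrder = true := by decide
```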

                    Ch 5 dynamic network, derived from the English Fragment #

                    Bybee Ch 5 §8 illustrates the network architecture with the Spanish dormir paradigm. We use English eat/ate/eaten and, rather than stipulating LexicalEntry strings, derive the network from eat : VerbEntry in Fragments/English/Predicates/Verbal.lean: changing eat.formPast there updates the network (CLAUDE.md "derive, don't duplicate"). Token frequencies default to 0; Bybee's verified counts (Francis & Kučera 1982) live in Morphology/UsageBased/Network.lean.

                    A verb's five inflected forms as Bybee LexicalEntry instances (token frequencies default to 0; the Fragment carries none).


                      The Bybee network of a verb's paradigm, built from its Fragment VerbEntry. Every form pair gets a semantic edge (shared meaning) and a phonological edge — the latter approximating Bybee's "shared phonological skeleton"; a real similarity metric would gate it more selectively.
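A sketch of the construction: five forms, with every unordered pair linked by one semantic and one phonological edge. `VerbForms`, `Entry`, `Edge`, `EdgeKind`, and every field name other than `formPast` are hypothetical; the real `LexicalEntry` and network types live in `Morphology/UsageBased/Network.lean` and are not reproduced here.

```lean
/-- Hypothetical stand-in for a Fragment `VerbEntry`. -/
structure VerbForms where
  formBase : String
  form3sg : String
  formPast : String
  formPastPart : String
  formPresPart : String

/-- A network entry; token frequency defaults to 0, as the Fragment carries none. -/
structure Entry where
  form : String
  tokenFreq : Nat := 0

inductive EdgeKind where
  | semantic
  | phonological
  deriving DecidableEq, Repr

structure Edge where
  src : String
  tgt : String
  kind : EdgeKind

def VerbForms.entries (v : VerbForms) : List Entry :=
  [v.formBase, v.form3sg, v.formPast, v.formPastPart, v.formPresPart].map
    fun f => ({ form := f } : Entry)

/-- All unordered pairs of a list, each pair listed once. -/
def unorderedPairs {α : Type} : List α → List (α × α)
  | [] => []
  | x :: xs => xs.map (fun y => (x, y)) ++ unorderedPairs xs

/-- One semantic and one phonological edge per pair. -/
def edgesFor : List (String × String) → List Edge
  | [] => []
  | (a, b) :: rest =>
      { src := a, tgt := b, kind := .semantic } ::
      { src := a, tgt := b, kind := .phonological } :: edgesFor rest

def VerbForms.edges (v : VerbForms) : List Edge :=
  edgesFor (unorderedPairs (v.entries.map (·.form)))

def eatForms : VerbForms :=
  { formBase := "eat", form3sg := "eats", formPast := "ate",
    formPastPart := "eaten", formPresPart := "eating" }

-- Five forms give 10 unordered pairs, hence 20 edges.
example : eatForms.edges.length = 20 := by decide
```

Linking every pair is the unselective approximation the prose describes; a similarity metric would replace the unconditional phonological edge in `edgesFor`.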


                        The network of the English irregular eat, derived from the Fragment.


                          Sanity check: eat's past form appears as a network entry, read off eat.formPast (no string literal), so decide tracks that field.

                          Negative test: a form outside eat's paradigm bears no relation to it — the relation is not vacuously true of arbitrary string pairs.
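The two checks, sketched with a hypothetical `VerbForms` record and a simplified, hypothetical `related` that links any two distinct forms of the paradigm, following the all-pairs construction described above. Because they compare strings, this sketch uses `native_decide`, which evaluates by compiled code; the file's own checks are described as using `decide`.

```lean
/-- Hypothetical paradigm record; field names other than `formPast` are assumptions. -/
structure VerbForms where
  formBase : String
  form3sg : String
  formPast : String
  formPastPart : String
  formPresPart : String

def VerbForms.forms (v : VerbForms) : List String :=
  [v.formBase, v.form3sg, v.formPast, v.formPastPart, v.formPresPart]

/-- Simplified relation: two distinct strings are linked when both are forms of `v`. -/
def VerbForms.related (v : VerbForms) (a b : String) : Bool :=
  v.forms.contains a && v.forms.contains b && a != b

def eatForms : VerbForms :=
  { formBase := "eat", form3sg := "eats", formPast := "ate",
    formPastPart := "eaten", formPresPart := "eating" }

-- The past form is read off the field, not written as a literal.
example : eatForms.forms.contains eatForms.formPast = true := by native_decide
-- Negative test: a string outside the paradigm is related to none of its forms.
example : eatForms.forms.all (fun f => !eatForms.related "walked" f) = true := by native_decide
```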