Bybee (1985): the relevance hypothesis #
[Byb85] (Morphology: A Study of the Relation Between Meaning and Form, Typological Studies in Language 9) tests the relevance hypothesis on a 50-language stratified probability sample (Perkins 1980): a morpheme category whose meaning is more relevant to the verb stem (a) is more often expressed in verbal morphology (inflectional or derivational) cross-linguistically, (b) sits closer to the stem in suffixal morphology, and (c) fuses more tightly with it.
This file formalizes the data behind claims (a) and (b) — the Ch 2 §5 frequency
surveys and the Ch 2 §6 morpheme-order counts — and grounds the substrate's
relevance order (MorphCategory.peripherality, via RelevanceLT / RelevanceLE)
in that evidence. Claim (c), fusion, is qualitative in the source and is not
formalized.
Main definitions #
- BybeeCategory: the verbal-inflectional categories of Ch 2, in relevance order.
- inflectionalCount50, derivOrInflCount50: the Ch 2 §5 frequency surveys (Figs 1+2).
- orderPairs: the Ch 2 §6 morpheme-order counts, one OrderPair per tested category pair.
- toMorphCategory: the embedding of BybeeCategory into the substrate MorphCategory.
- SurveyedCloser: the stem-proximity order derived from orderPairs.
- verbToBybeeNetwork: a verb's Ch 5 paradigm network, derived from a Fragment VerbEntry.
Main results #
- mood_highest_when_inflectional, gender_lowest_when_derivOrInfl: the frequency predictions.
- predicted_outnumbers_counter, aspect_categorical_against_tense_and_mood: the order predictions.
- survey_order_iso_relevance: on the surveyed categories, SurveyedCloser and the substrate RelevanceLT coincide via toMorphCategory, so the hierarchy is the order the §6 survey forces, not a stipulated table.
- bybeeSurveyedSlots_respects_hierarchy: closes the loop to RespectsRelevanceHierarchy.
Implementation notes #
Out of scope: Ch 1 fusion/allomorphy, Ch 3 paradigm organization, Ch 4's
lexical-derivational-inflectional continuum (which the discrete MorphStatus
enum cannot express — flagged for future work), and Part II tense/aspect/mood
detail (Ch 6-9). MorphCategory.RelevanceLT is exercised independently by
Studies/HahnDegenFutrell2021Morphology.lean; the BybeeCategory enum and
toMorphCategory bridge feed Studies/RathiHahnFutrell2026.lean. The Ch 5
network substrate lives in Morphology/UsageBased/Network.lean; the final section
below applies it to the English eat/ate/eaten paradigm.
Verbal categories (Ch 2 §3) #
Bybee's six core categories in relevance order: valence, voice, aspect, tense, mood, agreement (number/person/gender are agreement sub-types).
Bybee's Ch 2 verbal-inflectional categories, in her relevance order (stem first). Number, subject-person, object-person, and gender agreement are separate constructors.
- valence : BybeeCategory
- voice : BybeeCategory
- aspect : BybeeCategory
- tense : BybeeCategory
- mood : BybeeCategory
- numberAgr : BybeeCategory
- personAgr : BybeeCategory
- personAgrObj : BybeeCategory
- genderAgr : BybeeCategory
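A minimal standalone sketch of the enum, assuming Lean 4 core only. The namespace `BybeeSketch` and the helper `relevancePosition` (declaration position, stem first) are hypothetical illustrations, not linglib's code.

```lean
namespace BybeeSketch

/-- Bybee's Ch 2 verbal categories, stem first (standalone re-declaration). -/
inductive BybeeCategory
  | valence | voice | aspect | tense | mood
  | numberAgr | personAgr | personAgrObj | genderAgr
  deriving DecidableEq, Repr

/-- Hypothetical helper: position in the relevance order (0 = closest to the stem). -/
def relevancePosition : BybeeCategory → Nat
  | .valence => 0
  | .voice => 1
  | .aspect => 2
  | .tense => 3
  | .mood => 4
  | .numberAgr => 5
  | .personAgr => 6
  | .personAgrObj => 7
  | .genderAgr => 8

-- Mood is more relevant to the stem than gender agreement.
example : relevancePosition .mood < relevancePosition .genderAgr := by decide

end BybeeSketch
```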
Cross-linguistic frequency (Ch 2 §5, Figs 1+2) #
Fig 1 counts the languages of the 50-language sample with inflectional expression of a category; Fig 2 counts those with inflectional or derivational expression. Counts are integers because the sample has exactly 50 languages (count = percentage / 2).
Languages (of 50) with inflectional expression of c (Fig 1).
Equations
- Bybee1985.inflectionalCount50 Bybee1985.BybeeCategory.valence = 3
- Bybee1985.inflectionalCount50 Bybee1985.BybeeCategory.voice = 13
- Bybee1985.inflectionalCount50 Bybee1985.BybeeCategory.aspect = 26
- Bybee1985.inflectionalCount50 Bybee1985.BybeeCategory.tense = 24
- Bybee1985.inflectionalCount50 Bybee1985.BybeeCategory.mood = 34
- Bybee1985.inflectionalCount50 Bybee1985.BybeeCategory.numberAgr = 27
- Bybee1985.inflectionalCount50 Bybee1985.BybeeCategory.personAgr = 28
- Bybee1985.inflectionalCount50 Bybee1985.BybeeCategory.personAgrObj = 14
- Bybee1985.inflectionalCount50 Bybee1985.BybeeCategory.genderAgr = 8
Languages (of 50) with inflectional or derivational expression of c
(Fig 2). Valence reaches 90% once valence-changing derivation is counted; only
Haitian, Karankawa, Navaho, Serbo-Croatian, and Vietnamese lack it.
Equations
- Bybee1985.derivOrInflCount50 Bybee1985.BybeeCategory.valence = 45
- Bybee1985.derivOrInflCount50 Bybee1985.BybeeCategory.voice = 28
- Bybee1985.derivOrInflCount50 Bybee1985.BybeeCategory.aspect = 37
- Bybee1985.derivOrInflCount50 Bybee1985.BybeeCategory.tense = 25
- Bybee1985.derivOrInflCount50 Bybee1985.BybeeCategory.mood = 34
- Bybee1985.derivOrInflCount50 Bybee1985.BybeeCategory.numberAgr = 33
- Bybee1985.derivOrInflCount50 Bybee1985.BybeeCategory.personAgr = 28
- Bybee1985.derivOrInflCount50 Bybee1985.BybeeCategory.personAgrObj = 14
- Bybee1985.derivOrInflCount50 Bybee1985.BybeeCategory.genderAgr = 8
Prediction (a), deriv+infl: valence is the most frequent category, reflecting near-universal valence-changing morphology.
Among purely inflectional categories, mood is the most frequent (Fig 1, 68%). Valence's drop from 90% to 6% is Bybee's point that valence-changing morphology is almost always derivational.
In the deriv+infl survey (Fig 2), gender agreement is the least frequent category, matching its status as Bybee's least-relevant verbal category. (Counting inflection only, valence (3) falls below gender (8), so Fig 2 is the ranking that reflects relevance.)
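The three frequency predictions are finite checks that `decide` can discharge. A standalone sketch assuming Lean 4 core only: the counts are copied from the two equation lists above, while `allCategories`, `isMax`, `isMin`, and `percent` are hypothetical helpers.

```lean
namespace BybeeSketch

inductive BybeeCategory
  | valence | voice | aspect | tense | mood
  | numberAgr | personAgr | personAgrObj | genderAgr
  deriving DecidableEq, Repr

def allCategories : List BybeeCategory :=
  [.valence, .voice, .aspect, .tense, .mood, .numberAgr, .personAgr, .personAgrObj, .genderAgr]

/-- Fig 1: languages (of 50) with inflectional expression. -/
def inflectionalCount50 : BybeeCategory → Nat
  | .valence => 3
  | .voice => 13
  | .aspect => 26
  | .tense => 24
  | .mood => 34
  | .numberAgr => 27
  | .personAgr => 28
  | .personAgrObj => 14
  | .genderAgr => 8

/-- Fig 2: languages (of 50) with inflectional or derivational expression. -/
def derivOrInflCount50 : BybeeCategory → Nat
  | .valence => 45
  | .voice => 28
  | .aspect => 37
  | .tense => 25
  | .mood => 34
  | .numberAgr => 33
  | .personAgr => 28
  | .personAgrObj => 14
  | .genderAgr => 8

/-- Percentage of the 50-language sample (count = percentage / 2). -/
def percent (count : Nat) : Nat := 2 * count

/-- `c` is at least as frequent as every category under `f`. -/
def isMax (f : BybeeCategory → Nat) (c : BybeeCategory) : Bool :=
  allCategories.all fun d => decide (f d ≤ f c)

/-- `c` is at most as frequent as every category under `f`. -/
def isMin (f : BybeeCategory → Nat) (c : BybeeCategory) : Bool :=
  allCategories.all fun d => decide (f c ≤ f d)

example : isMax inflectionalCount50 .mood = true := by decide
example : percent (inflectionalCount50 .mood) = 68 := by decide
example : isMax derivOrInflCount50 .valence = true := by decide
example : isMin derivOrInflCount50 .genderAgr = true := by decide
-- Inflection only, valence (3) falls below gender (8).
example : inflectionalCount50 .valence < inflectionalCount50 .genderAgr := by decide

end BybeeSketch
```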
Morpheme order (Ch 2 §6) #
Prediction (b): the most relevant categories sit closest to the stem, the least relevant furthest. Bybee tests the four most frequent — aspect, tense, mood, person — counting, per pair, how many languages place one closer than the other. A pair is excluded when the morphemes are portmanteau, on opposite sides of the stem, mutually exclusive in one slot, or realized by stem modification.
A Ch 2 §6 morpheme-order pair: closer / further are the predicted
nearer / farther categories, closerCount / furtherCount the languages
confirming / contradicting the prediction, and total all languages counted for
the pair, including those excluded as non-relevant.
- closer : BybeeCategory
- further : BybeeCategory
- closerCount : ℕ
- furtherCount : ℕ
- total : ℕ
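A standalone sketch of the record, plus three Boolean readings used by the theorems on the pairs: confirmation (predicted majority), categoricity (no counterexamples), and the 1/10 counterexample threshold, cross-multiplied so everything stays in `Nat`. The predicate names and the `toy` counts are hypothetical; `toy` is not one of Bybee's pairs.

```lean
namespace BybeeSketch

inductive BybeeCategory
  | valence | voice | aspect | tense | mood
  | numberAgr | personAgr | personAgrObj | genderAgr
  deriving DecidableEq, Repr

structure OrderPair where
  closer : BybeeCategory
  further : BybeeCategory
  closerCount : Nat
  furtherCount : Nat
  total : Nat
  deriving Repr

/-- The predicted direction outnumbers the counter-direction. -/
def OrderPair.confirms (p : OrderPair) : Bool := decide (p.furtherCount < p.closerCount)

/-- Categorical: no counterexamples at all. -/
def OrderPair.categorical (p : OrderPair) : Bool := p.furtherCount == 0

/-- furtherCount / total > 1/10, cross-multiplied to avoid fractions. -/
def OrderPair.counterAboveTenth (p : OrderPair) : Bool := decide (p.total < 10 * p.furtherCount)

-- Toy values for illustration only, not Bybee's counts.
def toy : OrderPair :=
  { closer := .mood, further := .personAgr, closerCount := 5, furtherCount := 2, total := 15 }

example : toy.confirms = true := by decide
example : toy.categorical = false := by decide
example : toy.counterAboveTenth = true := by decide  -- 2/15 > 1/10

end BybeeSketch
```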
The six pairs Bybee tests in Ch 2 §6; counts verified against the book, each inline comment quoting its source passage.
Aspect vs. tense and aspect vs. mood are categorical: zero counterexamples in the whole sample, the strongest confirmations Bybee reports.
Mood vs. person is the freest pair: the only one whose counterexample-to-total ratio exceeds 1/10.
In every one of the six pairs the predicted direction outnumbers the counter-direction (Ch 2 §6 summary).
Aggregate Ch 2 §6 counts: 125 observations in total, of which 59 follow the predicted direction and 8 go against it; the remaining 58 are non-relevant under the exclusion criteria above.
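The aggregate is bookkeeping: every observation counted for a pair is predicted, counter, or excluded. A sketch with a hypothetical `Tally` record (not linglib's API) that makes the arithmetic explicit.

```lean
namespace BybeeSketch

/-- Hypothetical summary: predicted, counter, and total counts. -/
structure Tally where
  predicted : Nat
  counter : Nat
  total : Nat
  deriving Repr

/-- Component-wise sum over several pairs. -/
def Tally.sum (ts : List Tally) : Tally :=
  ts.foldl (fun (acc t : Tally) =>
      { predicted := acc.predicted + t.predicted,
        counter := acc.counter + t.counter,
        total := acc.total + t.total })
    { predicted := 0, counter := 0, total := 0 }

/-- Observations left over once both directions are counted: the non-relevant ones. -/
def Tally.nonRelevant (t : Tally) : Nat := t.total - t.predicted - t.counter

-- Summing two toy tallies (illustrative values only).
example : (Tally.sum [⟨2, 0, 3⟩, ⟨1, 1, 4⟩]).total = 7 := by decide
-- The aggregate reported above: 125 observations, 59 predicted, 8 counter.
example : Tally.nonRelevant { predicted := 59, counter := 8, total := 125 } = 58 := by decide
-- The predicted direction wins by more than 7 to 1 (7 * 8 = 56 < 59).
example : 7 * 8 < 59 := by decide

end BybeeSketch
```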
Connection to substrate MorphCategory.peripherality #
MorphCategory.peripherality (in Morphology/MorphRule.lean) encodes the
hierarchy numerically (lower value = closer to the stem = more relevant) and
follows Ch 2 §3 faithfully for the six core categories. Its extensions
(derivation, degree, negation, nonfinite) are linglib additions, not Bybee's.
Embed BybeeCategory into the substrate MorphCategory. All four agreement
subtypes collapse to .agreement: Bybee's verbal-number agreement sits at the
low-relevance (rank-8) end with person and gender, not with nominal .number
(rank 3). Subject vs object is preserved via the controller role.
Equations
- Bybee1985.toMorphCategory Bybee1985.BybeeCategory.valence = Morphology.MorphCategory.valence
- Bybee1985.toMorphCategory Bybee1985.BybeeCategory.voice = Morphology.MorphCategory.voice
- Bybee1985.toMorphCategory Bybee1985.BybeeCategory.aspect = Morphology.MorphCategory.aspect
- Bybee1985.toMorphCategory Bybee1985.BybeeCategory.tense = Morphology.MorphCategory.tense
- Bybee1985.toMorphCategory Bybee1985.BybeeCategory.mood = Morphology.MorphCategory.mood
- Bybee1985.toMorphCategory Bybee1985.BybeeCategory.numberAgr = Morphology.MorphCategory.agreement Agreement.Controller.subj
- Bybee1985.toMorphCategory Bybee1985.BybeeCategory.personAgr = Morphology.MorphCategory.agreement Agreement.Controller.subj
- Bybee1985.toMorphCategory Bybee1985.BybeeCategory.personAgrObj = Morphology.MorphCategory.agreement Agreement.Controller.obj
- Bybee1985.toMorphCategory Bybee1985.BybeeCategory.genderAgr = Morphology.MorphCategory.agreement Agreement.Controller.subj
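A cut-down sketch of the embedding, assuming stand-in `Controller` and `MorphCategory` types with only the constructors needed here (linglib's real `MorphCategory` has more, e.g. nominal `.number`). It shows the two properties described above: the agreement subtypes collapse, while the controller role survives.

```lean
namespace BybeeSketch

inductive BybeeCategory
  | valence | voice | aspect | tense | mood
  | numberAgr | personAgr | personAgrObj | genderAgr
  deriving DecidableEq, Repr

/-- Hypothetical stand-in for the agreement controller role. -/
inductive Controller
  | subj | obj
  deriving DecidableEq, Repr

/-- Hypothetical, cut-down stand-in for the substrate `MorphCategory`. -/
inductive MorphCategory
  | valence | voice | aspect | tense | mood
  | agreement (c : Controller)
  deriving DecidableEq, Repr

/-- The four agreement subtypes collapse to `.agreement`; only the controller is kept. -/
def toMorphCategory : BybeeCategory → MorphCategory
  | .valence => .valence
  | .voice => .voice
  | .aspect => .aspect
  | .tense => .tense
  | .mood => .mood
  | .numberAgr => .agreement .subj
  | .personAgr => .agreement .subj
  | .personAgrObj => .agreement .obj
  | .genderAgr => .agreement .subj

-- Not injective: number and gender agreement share a target.
example : toMorphCategory .numberAgr = toMorphCategory .genderAgr := by decide
-- Subject and object person agreement stay distinct.
example : toMorphCategory .personAgr ≠ toMorphCategory .personAgrObj := by decide

end BybeeSketch
```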
The substrate relevance order is strictly increasing along the six Ch 2 §3 categories: it reproduces valence < voice < aspect < tense < mood < agreement.
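The chain can be checked mechanically once ranks are fixed. This sketch assumes illustrative ranks 0 to 5 and a hypothetical `strictlyIncreasing` helper; linglib's actual `peripherality` values differ (the text puts agreement at rank 8), but `RelevanceLT` only compares ranks, so the relative order is what matters.

```lean
namespace BybeeSketch

/-- Hypothetical cut-down substrate category (controller roles omitted). -/
inductive MorphCategory
  | valence | voice | aspect | tense | mood | agreement
  deriving DecidableEq, Repr

/-- Illustrative ranks, lower = closer to the stem (assumed values, not linglib's). -/
def peripherality : MorphCategory → Nat
  | .valence => 0
  | .voice => 1
  | .aspect => 2
  | .tense => 3
  | .mood => 4
  | .agreement => 5

def RelevanceLT (a b : MorphCategory) : Prop := peripherality a < peripherality b

instance (a b : MorphCategory) : Decidable (RelevanceLT a b) :=
  inferInstanceAs (Decidable (peripherality a < peripherality b))

/-- Each adjacent pair in the list is strictly ordered by `RelevanceLT`. -/
def strictlyIncreasing (l : List MorphCategory) : Bool :=
  (l.zip l.tail).all fun (a, b) => decide (RelevanceLT a b)

example : strictlyIncreasing [.valence, .voice, .aspect, .tense, .mood, .agreement] = true := by
  decide
-- Swapping an adjacent step breaks the chain.
example : strictlyIncreasing [.valence, .aspect, .voice] = false := by decide

end BybeeSketch
```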
The Ch 2 §6 morpheme-order data is exactly what RespectsRelevanceHierarchy
predicts: in the substrate relevance order, Bybee's predicted-closer category is
never less stem-relevant than the predicted-further one, so the order agrees
with the empirical-majority direction on every tested pair.
Grounding the hierarchy in the survey #
On the four categories Bybee surveyed (aspect, tense, mood, person), the
substrate order is not a free choice: SurveyedCloser, derived from
orderPairs, coincides with RelevanceLT via toMorphCategory
(survey_order_iso_relevance). So a RespectsRelevanceHierarchy check over
these categories rests on an order isomorphism, not a stipulated table.
a is surveyed closer to the stem than b when some tested Ch 2 §6 pair
predicts a closer than b and the language counts confirm that direction
(predicted majority). Derived from orderPairs, not stipulated.
Equations
- Bybee1985.SurveyedCloser a b = ∃ p ∈ Bybee1985.orderPairs, p.closer = a ∧ p.further = b ∧ p.furtherCount < p.closerCount
A category is surveyed if it appears in any tested Ch 2 §6 pair.
Equations
- Bybee1985.Surveyed c = ∃ p ∈ Bybee1985.orderPairs, p.closer = c ∨ p.further = c
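Because `orderPairs` is a finite list, the existentials in `SurveyedCloser` and `Surveyed` reduce to `List.any`, which is why they are decidable. A standalone Boolean sketch; the function names and the one-pair `toySurvey` (toy counts, not Bybee's data) are hypothetical.

```lean
namespace BybeeSketch

inductive BybeeCategory
  | valence | voice | aspect | tense | mood
  | numberAgr | personAgr | personAgrObj | genderAgr
  deriving DecidableEq, Repr

structure OrderPair where
  closer : BybeeCategory
  further : BybeeCategory
  closerCount : Nat
  furtherCount : Nat
  total : Nat

/-- Boolean reading of `SurveyedCloser a b = ∃ p ∈ pairs, ...`. -/
def surveyedCloser (pairs : List OrderPair) (a b : BybeeCategory) : Bool :=
  pairs.any fun p => p.closer == a && p.further == b && decide (p.furtherCount < p.closerCount)

/-- Boolean reading of `Surveyed c`. -/
def surveyed (pairs : List OrderPair) (c : BybeeCategory) : Bool :=
  pairs.any fun p => p.closer == c || p.further == c

-- A toy one-pair survey (illustrative counts only).
def toySurvey : List OrderPair := [⟨.aspect, .tense, 4, 0, 6⟩]

example : surveyedCloser toySurvey .aspect .tense = true := by decide
example : surveyedCloser toySurvey .tense .aspect = false := by decide
example : surveyed toySurvey .mood = false := by decide

end BybeeSketch
```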
SurveyedCloser is irreflexive: no tested pair ranks a category against
itself.
SurveyedCloser is transitive: the §6 survey tested every pair among its
categories and the confirmed directions contain no cycle, so the confirmed
dominances compose.
SurveyedCloser is total on the surveyed categories: any two distinct
surveyed categories are ordered by the §6 data in exactly one direction. With
irreflexivity and transitivity, the survey alone determines a strict total
order on its four categories.
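Irreflexivity, transitivity, and totality are all finite checks on a four-element carrier. The sketch below uses a hypothetical `isStrictTotalOn` helper and a stand-in `Cat` type with assumed ranks. It also shows why transitivity is an empirical result rather than a consequence of testing every pair: a relation that orders every pair but contains a cycle is total, yet fails transitivity.

```lean
namespace BybeeSketch

/-- Stand-in for the four surveyed categories. -/
inductive Cat
  | aspect | tense | mood | person
  deriving DecidableEq, Repr

/-- Hypothetical check that `r` is a strict total order on the finite list `xs`. -/
def isStrictTotalOn {α : Type} [DecidableEq α] (r : α → α → Bool) (xs : List α) : Bool :=
  -- irreflexive
  xs.all (fun a => !r a a) &&
  -- transitive
  xs.all (fun a => xs.all fun b => xs.all fun c => !(r a b && r b c) || r a c) &&
  -- total on distinct elements
  xs.all (fun a => xs.all fun b => a == b || r a b || r b a)

/-- Illustrative ranks (assumption), stem outward. -/
def rank : Cat → Nat
  | .aspect => 0
  | .tense => 1
  | .mood => 2
  | .person => 3

example : isStrictTotalOn (fun (a b : Cat) => decide (rank a < rank b))
    [.aspect, .tense, .mood, .person] = true := by decide

/-- A cycle: aspect before tense, tense before mood, mood before aspect. -/
def cyclic : Cat → Cat → Bool
  | .aspect, .tense => true
  | .tense, .mood => true
  | .mood, .aspect => true
  | _, _ => false

-- Total on these three, but not transitive, hence not a strict order.
example : isStrictTotalOn cyclic [.aspect, .tense, .mood] = false := by decide

end BybeeSketch
```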
Grounding theorem: whenever the survey places a closer than b, the
substrate ranks a strictly more stem-relevant — so toMorphCategory is
strictly monotone from the surveyed order into the relevance order.
Order isomorphism: on the surveyed categories, SurveyedCloser and the
substrate RelevanceLT coincide via toMorphCategory. The hierarchy there is
not merely consistent with Bybee's evidence — it is the order the §6 survey
determines.
The stem-outward ordering of the surveyed categories — a literal, but
validated below as fully SurveyedCloser-sorted (bybeeSurveyedOrder_sorted)
and exactly the surveyed categories without repeats (_complete, _nodup).
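The three validation properties of the literal (sorted, complete, no repeats) can be phrased as Boolean checks. A sketch with hypothetical helpers (`pairwiseSorted`, `nodupB`, `sameElements`) and a rank-based stand-in for `SurveyedCloser`; the ranks are assumptions.

```lean
namespace BybeeSketch

inductive Cat
  | aspect | tense | mood | person
  deriving DecidableEq, Repr

/-- Every earlier element is related to every later one. -/
def pairwiseSorted {α : Type} (r : α → α → Bool) : List α → Bool
  | [] => true
  | x :: xs => xs.all (r x) && pairwiseSorted r xs

/-- No element occurs twice. -/
def nodupB {α : Type} [BEq α] : List α → Bool
  | [] => true
  | x :: xs => !xs.contains x && nodupB xs

/-- `l` and `target` contain the same elements, ignoring order. -/
def sameElements {α : Type} [BEq α] (l target : List α) : Bool :=
  target.all l.contains && l.all target.contains

/-- Rank-based stand-in for `SurveyedCloser` (assumed ranks). -/
def rank : Cat → Nat
  | .aspect => 0
  | .tense => 1
  | .mood => 2
  | .person => 3

def closer (a b : Cat) : Bool := decide (rank a < rank b)

def surveyedOrder : List Cat := [.aspect, .tense, .mood, .person]

example : pairwiseSorted closer surveyedOrder = true := by decide
example : nodupB surveyedOrder = true := by decide
example : sameElements surveyedOrder [.person, .mood, .tense, .aspect] = true := by decide

end BybeeSketch
```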
The surveyed order mapped to substrate categories — exposed so consumers
(HahnDegenFutrell2021Morphology, Karlsson2017, RathiHahnFutrell2026) check
their slot orders against the survey rather than re-asserting the hierarchy.
The data-derived surveyed order satisfies the substrate predicate, closing
the loop between Bybee's §6 evidence and RespectsRelevanceHierarchy.
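A hypothetical reading of `RespectsRelevanceHierarchy`, assumed here to mean that slots listed stem-outward never step back toward the stem; linglib's actual definition may differ, and the ranks are illustrative.

```lean
namespace BybeeSketch

inductive MorphCategory
  | valence | voice | aspect | tense | mood | agreement
  deriving DecidableEq, Repr

/-- Illustrative ranks, lower = closer to the stem (assumed values). -/
def peripherality : MorphCategory → Nat
  | .valence => 0
  | .voice => 1
  | .aspect => 2
  | .tense => 3
  | .mood => 4
  | .agreement => 5

/-- Hypothetical check: adjacent slots never decrease in peripherality. -/
def respectsHierarchy (slots : List MorphCategory) : Bool :=
  (slots.zip slots.tail).all fun (a, b) => decide (peripherality a ≤ peripherality b)

-- The surveyed order mapped to substrate categories (person agreement ↦ `.agreement`).
example : respectsHierarchy [.aspect, .tense, .mood, .agreement] = true := by decide
-- Putting tense before aspect violates it.
example : respectsHierarchy [.tense, .aspect, .mood, .agreement] = false := by decide

end BybeeSketch
```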
Ch 5 dynamic network, derived from the English Fragment #
Bybee Ch 5 §8 illustrates the network architecture with the Spanish dormir
paradigm. We use English eat/ate/eaten and, rather than stipulating
LexicalEntry strings, derive the network from eat : VerbEntry in
Fragments/English/Predicates/Verbal.lean: changing eat.formPast there
updates the network (CLAUDE.md "derive, don't duplicate"). Token frequencies
default to 0; Bybee's verified counts (Francis & Kučera 1982) live in
Morphology/UsageBased/Network.lean.
A verb's five inflected forms as Bybee LexicalEntry instances (token
frequencies default to 0; the Fragment carries none).
The Bybee network of a verb's paradigm, built from its Fragment VerbEntry.
Every form pair gets a semantic edge (shared meaning) and a phonological edge —
the latter approximating Bybee's "shared phonological skeleton"; a real
similarity metric would gate it more selectively.
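The construction can be sketched with a cut-down verb entry. Only `formPast` is named in the text; the other field names, the `Edge` type, and `network` are hypothetical. With five forms there are 10 unordered pairs, so two edges per pair gives 20 edges.

```lean
namespace BybeeSketch

/-- Hypothetical cut-down verb entry; only `formPast` is named in the text. -/
structure VerbEntry where
  form : String
  form3sg : String
  formPast : String
  formPastPart : String
  formPresPart : String

inductive EdgeKind
  | semantic | phonological
  deriving Repr

structure Edge where
  src : String
  tgt : String
  kind : EdgeKind
  deriving Repr

/-- The five inflected forms, read off the entry's fields. -/
def forms (v : VerbEntry) : List String :=
  [v.form, v.form3sg, v.formPast, v.formPastPart, v.formPresPart]

/-- All unordered pairs of list positions. -/
def pairs {α : Type} : List α → List (α × α)
  | [] => []
  | x :: xs => xs.map (fun y => (x, y)) ++ pairs xs

/-- One semantic and one phonological edge per form pair. -/
def network (v : VerbEntry) : List Edge :=
  (pairs (forms v)).foldr
    (fun (a, b) acc => Edge.mk a b .semantic :: Edge.mk a b .phonological :: acc) []

def eat : VerbEntry := ⟨"eat", "eats", "ate", "eaten", "eating"⟩

-- Five forms give 10 pairs, hence 20 edges.
example : (network eat).length = 20 := by decide

#eval (forms eat).contains eat.formPast  -- true

end BybeeSketch
```

Changing the `formPast` field of `eat` changes the network without touching any string literal elsewhere, which is the "derive, don't duplicate" point above.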
The network of the English irregular eat, derived from the Fragment.
Sanity check: eat's past form appears as a network entry, read off
eat.formPast (no string literal), so decide tracks that field.