Word — the morphosyntactic word (ms-word) token #
The surface token: the unit that (morpho)syntax treats as a word. Wordhood minimally
splits into the ms-word (this type) and the p-word (the prosodic word,
Phonology/Prosodic/Word.lean) — the "one area of robust consensus" on the wordhood
problem ([KBC+26] §3.2); we follow the Element in calling ms-words
simply words. The split is descriptive, not a Lexicalist commitment: ms-words are
"crucial for lexicalist theories" but used descriptively by non-lexicalist ones too
(§3.2.1, §3.3), and this token carries no theory of how words are formed.
Word completes Morphology's word inventory: MorphWord is word-internal structure,
MorphRule/Stem are word-forming processes, Word is the resulting token —
form + UD category + one UD.MorphFeatures bundle, i.e. a CoNLL-U row. The
ms- vs p-boundness typology relating the two word notions ([KBC+26]
Table 3) is formalized in Studies/KalinBjorkmanEtAl2026.lean.
Main declarations #
Word— the token, with its admission rule (see the declaration docstring).Word.phi— the φ-feature projection (person/number/gender).Word.Agree— φ-agreement: a reflexive, symmetric, non-transitive tolerance relation.Word.asPassive— passive variant (voice morphology only; valence effects areDepTree.frames-level facts).
A word: the pure CoNLL-U surface token — surface form, UD category, and UD
morphology (one UD.MorphFeatures bundle; there is no separate word-level feature
record).
Admission rule: a property belongs on Word iff a Fragment-free token-level
engine reads it off the token's own data; otherwise it lives on the typed lexical
carrier (Pronoun, NounEntry, Verb, …) or on the consuming framework's own
structures (e.g. DG subcategorization premises live on DepTree.frames, not here).
Identity caveat: BEq is form + category, so homographs collapse; a CoNLL-U
lemma field is the known fix, deferred until a consumer needs it.
- form : String
- cat : UD.UPOS
- features : UD.MorphFeatures
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- instReprWord = { reprPrec := instReprWord.repr }
φ-agreement between two words: their person/number/gender features are
compatible (an unspecified feature is a wildcard). A reflexive, symmetric
tolerance relation on Word (not transitive), decided by the shared
UD.MorphFeatures.compatible. This is the feature-based agreement primitive
binding and concord consumers share — no surface-form gender lookup.
Equations
- w1.Agree w2 = (w1.phi.compatible w2.phi = true)
Instances For
Equations
- instDecidableAgree w1 w2 = id inferInstance
φ-agreement is symmetric — the docstring's "symmetric tolerance relation", as a theorem.
A word bears the number its UD morphology ingests (Number.fromUD).
Equations
- instHasNumberWord = { numberOf := fun (w : Word) => w.features.number.bind Number.fromUD }
Equations
- instHasPersonWord = { personOf := fun (w : Word) => Option.map Person.fromUD w.features.person }
Equations
- instHasCaseWord = { caseOf := fun (w : Word) => Option.map Case.fromUD w.features.case_ }
The φ-projection preserves numberOf: a word and its phi bundle bear
the same number — the defeq Word.Agree.hasNumber_compatible relies on.
φ-agreement entails number compatibility: the HasNumber mixin never
diverges from the agreement engine on Word.
Derive a passive variant: sets voice to passive. The valence change
(detransitivization) is a frame-level fact carried by the passive analysis on
DepTree.frames, not token data. Composes with VerbEntry.toWordPastPart.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- instToStringWord = { toString := fun (w : Word) => w.form }
Convert a word list to a readable string.
Equations
- wordsToString ws = " ".intercalate (List.map (fun (x : Word) => x.form) ws)