Documentation

Linglib.Morphology.Word

Word — the morphosyntactic word (ms-word) token #

[KBC+26]

The surface token: the unit that (morpho)syntax treats as a word. Wordhood minimally splits into the ms-word (this type) and the p-word (the prosodic word, Phonology/Prosodic/Word.lean) — the "one area of robust consensus" on the wordhood problem ([KBC+26] §3.2); we follow the Element in calling ms-words simply words. The split is descriptive, not a Lexicalist commitment: ms-words are "crucial for lexicalist theories" but used descriptively by non-lexicalist ones too (§3.2.1, §3.3), and this token carries no theory of how words are formed.

Word completes Morphology's word inventory: MorphWord is word-internal structure, MorphRule/Stem are word-forming processes, Word is the resulting token — form + UD category + one UD.MorphFeatures bundle, i.e. a CoNLL-U row. The ms- vs p-boundness typology relating the two word notions ([KBC+26] Table 3) is formalized in Studies/KalinBjorkmanEtAl2026.lean.

Main declarations #

structure Word :

A word: the pure CoNLL-U surface token — surface form, UD category, and UD morphology (one UD.MorphFeatures bundle; there is no separate word-level feature record).

Admission rule: a property belongs on Word iff a Fragment-free token-level engine reads it off the token's own data; otherwise it lives on the typed lexical carrier (Pronoun, NounEntry, Verb, …) or on the consuming framework's own structures (e.g. DG subcategorization premises live on DepTree.frames, not here). Identity caveat: BEq is form + category, so homographs collapse; a CoNLL-U lemma field is the known fix, deferred until a consumer needs it.

Instances For
    def instReprWord.repr :
    WordStd.Format
    Equations
    • One or more equations did not get rendered due to their size.
    Instances For
      @[implicit_reducible]
      instance instReprWord :
      Repr Word
      Equations
      def Word.mk' (form : String) (cat : UD.UPOS) :

      Convenience constructor for a featureless word (form + category only).

      Equations
      Instances For

        The φ-feature subset (person, number, gender) of a word, as a UD.MorphFeatures bundle.

        Equations
        Instances For
          def Word.Agree (w1 w2 : Word) :

          φ-agreement between two words: their person/number/gender features are compatible (an unspecified feature is a wildcard). A reflexive, symmetric tolerance relation on Word (not transitive), decided by the shared UD.MorphFeatures.compatible. This is the feature-based agreement primitive binding and concord consumers share — no surface-form gender lookup.

          Equations
          Instances For
            @[implicit_reducible]
            instance instDecidableAgree (w1 w2 : Word) :
            Decidable (w1.Agree w2)
            Equations
            theorem Word.Agree.refl (w : Word) :
            w.Agree w
            theorem Word.Agree.symm {w1 w2 : Word} (h : w1.Agree w2) :
            w2.Agree w1

            φ-agreement is symmetric — the docstring's "symmetric tolerance relation", as a theorem.

            theorem Word.Agree.not_transitive :
            ¬∀ (w1 w2 w3 : Word), w1.Agree w2w2.Agree w3w1.Agree w3

            φ-agreement is not transitive: an unspecified feature is a wildcard, so underspecified they agrees with both she and he while she ≁ he.

            @[implicit_reducible]

            A word bears the number its UD morphology ingests (Number.fromUD).

            Equations
            @[implicit_reducible]
            Equations
            @[implicit_reducible]
            Equations
            @[simp]

            The φ-projection preserves numberOf: a word and its phi bundle bear the same number — the defeq Word.Agree.hasNumber_compatible relies on.

            theorem Word.Agree.hasNumber_compatible {w1 w2 : Word} (h : w1.Agree w2) :

            φ-agreement entails number compatibility: the HasNumber mixin never diverges from the agreement engine on Word.

            Derive a passive variant: sets voice to passive. The valence change (detransitivization) is a frame-level fact carried by the passive analysis on DepTree.frames, not token data. Composes with VerbEntry.toWordPastPart.

            Equations
            • One or more equations did not get rendered due to their size.
            Instances For
              @[implicit_reducible]
              instance instBEqWord :
              BEq Word
              Equations
              @[implicit_reducible]
              instance instToStringWord :
              ToString Word
              Equations
              def wordsToString (ws : List Word) :
              String

              Convert a word list to a readable string.

              Equations
              Instances For