Unification: subsumption and joins for feature bundles #
The information ordering of unification-based grammar ([Shi86] §3.2), on
UD.MorphFeatures — the depth-1, reentrancy-free fragment of Shieber's feature
structures. Every feature here is atomic-valued, so paths are single features, the
reentrancy clause of subsumption is vacuous, and the definition (§3.2.2: D ⊑ D′ iff
D(l) ⊑ D′(l) for all l ∈ dom(D); "an atomic feature structure neither subsumes nor
is subsumed by a different atomic feature structure"; "variables subsume all other
feature structures") reduces to the product of flat orders. Subsumption is registered
as ≤ (Shieber's ⊑); the all-none bundle — Shieber's variable [ ] — is ⊥.
Unification (§3.2.3) is "the most general feature structure D such that D′ ⊑ D and
D′′ ⊑ D", failing on conflict: MorphFeatures.unify returns some exactly on
Compatible (= bounded above) inputs, and its result is the least upper bound
(unify_isLUB). The example laws of §3.2.3 are theorems: unification is idempotent
(unify_self), commutative (unify_comm), and variables are identity elements
(bot_unify).
Main declarations #
Option.FlatLE— the flat information order on one atomic feature slot.instance : PartialOrder UD.MorphFeatures— subsumption ("only a partial order", §3.2.3), with decidable≤.instance : OrderBot UD.MorphFeatures— the empty bundle is bottom.instance : SemilatticeInf UD.MorphFeatures— the meet is Shieber's generalization (anti-unification): total, unlike the join.UD.MorphFeatures.Compatible— boundedness above (BddAbove {f, g}), decidable via theBoolcheck (compatible_iff_bddAbove).unify_isLUB,unify_eq_some_iff_isLUB,unify_comm,unify_assoc,unify_self,bot_unify/unify_bot,unify_mono— the §3.2.3 laws plus associativity and guarded monotonicity.
Theory-neutrality boundary #
Three strata with different statuses: the record is annotation consensus
(Data/UD/Basic.lean); the order ≤ is shared substrate every framework consumes
its own way (DM's matching clause, underspecification, syncretism down-sets); the
operations ⊔/⊓ are commitments of the unification tradition — [Shi86]
§3.1 states unification-as-sole-combinator as a design constraint, and rival
frameworks combine differently (DM matches and competes; Minimalist Agree values
asymmetrically). This file is not that tradition's headquarters: unification-based
grammar (PATR, HPSG, LFG — reentrant feature structures, phrasal combination) is a
syntax family whose substrate belongs in Syntax/ when consuming studies demand it.
What lives here is only the tradition's morphological-bundle fragment — the algebra
of one token's Feats column — which it shares with rivals: at the level of claims
about morphological feature combination this file is a sibling of DM/ and
Nanosyntax/, not a foundation beneath them.
Implementation notes #
Morphology owns the bundle algebra: MorphFeatures is the token's morphology (UD's
Feats column), and unification at the ms-word level is the morphology/syntax interface
operation. The matching clause of DM's Subset Principle (an exponent is insertable
iff exponent.features ≤ morpheme.features) could consume ≤ directly, but the
existing Morphology/DM/VocabularyInsertion.lean matches by List-subset on [BEq F]
rather than MorphFeatures.≤; bridging is left for a future PR. The competition
clause — most-specified-wins — is separate argmax machinery already implemented in
that same file. (Nanosyntax's Superset Principle is not a consumer: it matches by
containment of syntactic trees, see Nanosyntax/TreeSpellout.lean's NanoTree.contains,
not by an order on flat bundles.) Lives apart from Data/UD/Basic.lean so the
(heavily imported) standard mirror stays mathlib-free — this file is the one that
pays for Mathlib.Order — and it is the canonical home for order instances on
UD.MorphFeatures.
The non-distributivity of the subsumption lattice is a documented
obstruction (MorphFeatures.not_distrib_witness): any ≥3-value slot
(here, Case) makes the per-slot lattice the diamond Mₙ, modular but
not distributive ([Car92] p. 15, eq. (4), notes this
explicitly: "our partial orders are not required to be distributive
(and in fact, are not even required to be modular)"). This matters for
generalization-then-unification reorderings in paradigm-induction
learners.
The flat order on one feature slot #
Option.FlatLE is the slot-level subsumption relation ([Shi86]
§3.2.2: none below everything, distinct atoms incomparable). The
definition and its refl/trans/antisymm/none_le API live in
Linglib/Core/Order/Flat.lean, where they also carry the bundled
PartialOrder/OrderBot/SemilatticeInf/PartialUnify instances on
the order-carrying alias Flat α. Re-exported here under their
original names for backward compatibility with this file's own proofs
and consumers.
The 14-case feature-type index for MorphFeatures — the signature
of UD's Feats column treated as a fixed finite type family.
- number : MorphFeatureType
- gender : MorphFeatureType
- case_ : MorphFeatureType
- definite : MorphFeatureType
- degree : MorphFeatureType
- pronType : MorphFeatureType
- reflex : MorphFeatureType
- person : MorphFeatureType
- verbForm : MorphFeatureType
- tense : MorphFeatureType
- aspect : MorphFeatureType
- mood : MorphFeatureType
- voice : MorphFeatureType
- polarity : MorphFeatureType
Instances For
Equations
- UD.instDecidableEqMorphFeatureType x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Equations
- UD.instReprMorphFeatureType = { reprPrec := UD.instReprMorphFeatureType.repr }
Equations
- One or more equations did not get rendered due to their size.
- UD.instReprMorphFeatureType.repr UD.MorphFeatureType.number prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "UD.MorphFeatureType.number")).group prec✝
- UD.instReprMorphFeatureType.repr UD.MorphFeatureType.gender prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "UD.MorphFeatureType.gender")).group prec✝
- UD.instReprMorphFeatureType.repr UD.MorphFeatureType.case_ prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "UD.MorphFeatureType.case_")).group prec✝
- UD.instReprMorphFeatureType.repr UD.MorphFeatureType.degree prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "UD.MorphFeatureType.degree")).group prec✝
- UD.instReprMorphFeatureType.repr UD.MorphFeatureType.reflex prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "UD.MorphFeatureType.reflex")).group prec✝
- UD.instReprMorphFeatureType.repr UD.MorphFeatureType.person prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "UD.MorphFeatureType.person")).group prec✝
- UD.instReprMorphFeatureType.repr UD.MorphFeatureType.tense prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "UD.MorphFeatureType.tense")).group prec✝
- UD.instReprMorphFeatureType.repr UD.MorphFeatureType.aspect prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "UD.MorphFeatureType.aspect")).group prec✝
- UD.instReprMorphFeatureType.repr UD.MorphFeatureType.mood prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "UD.MorphFeatureType.mood")).group prec✝
- UD.instReprMorphFeatureType.repr UD.MorphFeatureType.voice prec✝ = Repr.addAppParen (Std.Format.nest (if prec✝ ≥ 1024 then 1 else 2) (Std.Format.text "UD.MorphFeatureType.voice")).group prec✝
Instances For
Equations
- UD.instFintypeMorphFeatureType = { elems := { val := ↑UD.MorphFeatureType.enumList, nodup := UD.MorphFeatureType.enumList_nodup }, complete := UD.instFintypeMorphFeatureType._proof_1 }
Per-slot value space. The reflex slot is privative (Unit); all
other slots take their concrete UD enum.
Equations
- UD.MorphFeatureType.number.Val = UD.Number
- UD.MorphFeatureType.gender.Val = UD.Gender
- UD.MorphFeatureType.case_.Val = UD.Case
- UD.MorphFeatureType.definite.Val = UD.Definite
- UD.MorphFeatureType.degree.Val = UD.Degree
- UD.MorphFeatureType.pronType.Val = UD.PronType
- UD.MorphFeatureType.reflex.Val = Unit
- UD.MorphFeatureType.person.Val = UD.Person
- UD.MorphFeatureType.verbForm.Val = UD.VerbForm
- UD.MorphFeatureType.tense.Val = UD.Tense
- UD.MorphFeatureType.aspect.Val = UD.Aspect
- UD.MorphFeatureType.mood.Val = UD.Mood
- UD.MorphFeatureType.voice.Val = UD.Voice
- UD.MorphFeatureType.polarity.Val = UD.Polarity
Instances For
Equations
- UD.MorphFeatureType.number.instDecidableEqVal = UD.MorphFeatureType.instDecidableEqVal._aux_1
- UD.MorphFeatureType.gender.instDecidableEqVal = UD.MorphFeatureType.instDecidableEqVal._aux_3
- UD.MorphFeatureType.case_.instDecidableEqVal = UD.MorphFeatureType.instDecidableEqVal._aux_5
- UD.MorphFeatureType.definite.instDecidableEqVal = UD.MorphFeatureType.instDecidableEqVal._aux_7
- UD.MorphFeatureType.degree.instDecidableEqVal = UD.MorphFeatureType.instDecidableEqVal._aux_9
- UD.MorphFeatureType.pronType.instDecidableEqVal = UD.MorphFeatureType.instDecidableEqVal._aux_11
- UD.MorphFeatureType.reflex.instDecidableEqVal = UD.MorphFeatureType.instDecidableEqVal._aux_13
- UD.MorphFeatureType.person.instDecidableEqVal = UD.MorphFeatureType.instDecidableEqVal._aux_15
- UD.MorphFeatureType.verbForm.instDecidableEqVal = UD.MorphFeatureType.instDecidableEqVal._aux_17
- UD.MorphFeatureType.tense.instDecidableEqVal = UD.MorphFeatureType.instDecidableEqVal._aux_19
- UD.MorphFeatureType.aspect.instDecidableEqVal = UD.MorphFeatureType.instDecidableEqVal._aux_21
- UD.MorphFeatureType.mood.instDecidableEqVal = UD.MorphFeatureType.instDecidableEqVal._aux_23
- UD.MorphFeatureType.voice.instDecidableEqVal = UD.MorphFeatureType.instDecidableEqVal._aux_25
- UD.MorphFeatureType.polarity.instDecidableEqVal = UD.MorphFeatureType.instDecidableEqVal._aux_27
Subsumption is a partial order with bottom #
MorphFeatures as a feature bundle, and the derived order #
MorphFeatures realizes BundleLike over the 14-case signature
MorphFeatureType ([Car92]'s abstract feature structure): each
slot projects to a Flat value, with the reflex flag normalized
false ↦ none, true ↦ some () (the privative Unit case). The
valuation is injective, so MorphFeatures is LawfulBundleLike, and the
subsumption order is the per-slot Flat order pulled back along val
(subsumes_iff_val_le) — the PartialOrder/OrderBot laws derive from
the bundle embedding rather than being proved field by field.
The valuation: project each slot as a Flat value, normalizing
reflex : Bool to a privative Flat Unit.
Equations
- f.val UD.MorphFeatureType.number = f.number
- f.val UD.MorphFeatureType.gender = f.gender
- f.val UD.MorphFeatureType.case_ = f.case_
- f.val UD.MorphFeatureType.definite = f.definite
- f.val UD.MorphFeatureType.degree = f.degree
- f.val UD.MorphFeatureType.pronType = f.pronType
- f.val UD.MorphFeatureType.reflex = if f.reflex = true then some () else none
- f.val UD.MorphFeatureType.person = f.person
- f.val UD.MorphFeatureType.verbForm = f.verbForm
- f.val UD.MorphFeatureType.tense = f.tense
- f.val UD.MorphFeatureType.aspect = f.aspect
- f.val UD.MorphFeatureType.mood = f.mood
- f.val UD.MorphFeatureType.voice = f.voice
- f.val UD.MorphFeatureType.polarity = f.polarity
Instances For
Equations
The valuation is injective: a MorphFeatures bundle is determined by
its per-slot assignments (with reflex reconstructed from Option Unit).
Equations
- One or more equations did not get rendered due to their size.
Equations
- f.instDecidableSubsumes g = id inferInstance
Subsumption is decidable (each slot is).
Equations
- f.instDecidableLe g = f.instDecidableSubsumes g
The empty bundle — Shieber's variable [ ] — is bottom: "variables subsume all
other feature structures … they contain no information at all" (§3.2.2).
Equations
- UD.MorphFeatures.instOrderBot = { bot := { }, bot_le := UD.MorphFeatures.instOrderBot._proof_1 }
Compatibility is boundedness above; unification is the least upper bound #
Prop-native compatibility: the pair is bounded above in the subsumption
order — mathlib's BddAbove, so the bounds API applies directly. Characterized by
the executable compatible check via compatible_iff_bddAbove, which also makes
it decidable.
Equations
- f.Compatible g = BddAbove {f, g}
Instances For
The left input subsumes the merge.
The right input subsumes the merge — given compatibility (the doubly committed slots agree, so the left bias is harmless).
The merge is below every common upper bound — minimality ("the most general feature structure", §3.2.3).
Bounded above implies the executable check passes.
The Bool check is exactly boundedness above in the subsumption order: the
order-theoretic identity of "compatible".
Equations
- f.instDecidableCompatible g = decidable_of_iff (f.compatible g = true) ⋯
Unification succeeds exactly on compatible inputs (§3.2.3: otherwise it "fails").
Unification is the least upper bound ([Shi86] §3.2.3: "the most general
feature structure D such that D′ ⊑ D and D′′ ⊑ D").
Generalization: the meet #
Shieber's generalization (anti-unification): the most specific bundle subsumed by
both inputs. Unlike unification it is total — the meet always exists — so
MorphFeatures is a genuine SemilatticeInf with ⊥.
Equations
- One or more equations did not get rendered due to their size.
Equations
- One or more equations did not get rendered due to their size.
Unification computes least upper bounds — further laws #
MorphFeatures carries the pairwise-LUB structure of [Car92]'s
bounded complete partial order: unify is the partial join, with
unify_isLUB and compatible_iff_bddAbove supplying the two class
axioms. The unification laws — commutativity, associativity (with
failure propagating), ⊥-identity, idempotence, monotonicity — are
inherited as one-line corollaries of the generic theorems in
Core/Order/PartialUnify.lean.
Equations
- UD.MorphFeatures.instPartialUnify = { unify := UD.MorphFeatures.unify, isLUB_of_unify_eq_some := ⋯, isSome_unify_of_bddAbove := @UD.MorphFeatures.instPartialUnify._proof_1 }
The instance-projected unify is the same function as
MorphFeatures.unify.
Unification succeeds with value u exactly when u is the least upper bound.
Unification is commutative — a consequence of total compatibility: doubly committed slots agree, so the per-field left bias washes out.
Unification is idempotent (§3.2.3's example law).
Variables are unification identity elements (§3.2.3's example law): unifying with the empty bundle returns the other input.
Unification is associative, with failure propagating ([Shi86] §3.2.3's order-independence): both associations compute the lub of all three bundles.
The empty bundle is a right identity for unification.
Unification fails exactly on incompatible inputs.
Unification is monotone where defined: shrinking both inputs preserves success and
shrinks the output. (Unguarded merge-monotonicity is false — the guard is needed.)