Documentation

Linglib.Theories.Phonology.ItemSpecificity.Defs

Lexical-frequency channels — the abstract interface #

A frequency channel on a lexicon is an attachment of a per-item log-frequency to lexical entries. Once a lexicon carries one, downstream phonology can be conditioned on it; the sibling files (IndexedConstraints, ScaledWeights, UseListed, RepresentationStrength) each formalize one such conditioning theory and carry their own foundational citations.

This file is Defs.lean in the mathlib sense: a thin typeclass layer that diverse Fragments/{Lang}/ lexical-entry types can implement without inheriting from a single bundled record. Sibling files (IndexedConstraints.lean, ScaledWeights.lean, UseListed.lean, RepresentationStrength.lean) use only this interface.

Why a typeclass and not a record #

Each Fragments/{Lang}/Prosody.lean defines its own lexical-entry record (JLexicalEntry, ELexicalEntry, ...). A bundled FrequencyAnnotated super-record would force every fragment into a single inheritance hierarchy. Instead, fragments implement HasTokenFreq on their own types, mirroring mathlib's HasAdd, Mul, One. Same logic as the project's existing typeclass discipline (e.g., HasAltList, IsBoundaryVacuous).

A type-frequency analogue (HasTypeFreq) would be the natural sibling typeclass for productivity / wug-test predictions (@cite{albright-hayes-2003}); it is not declared here because no current theory in the directory consumes it. Add it when the first such consumer lands.

Log scale #

All frequency channels in this directory are LogFreq := ℝ. The log-transform is built into the channel because (a) log-frequency is the empirically standard predictor and (b) frequency-scaled-weight theories (@cite{coetzee-pater-2008}) are linear in log-frequency. Theories that want raw counts can Real.exp ∘ tokenLogFreq.

@[reducible, inline]

Log-frequency: by convention, log of corpus token count. none is not encoded here — HasTokenFreq requires some annotation; pass a sentinel (e.g., 0 = log 1 = once) for unannotated items, or wrap with Option.

Equations
Instances For

    A type whose values carry a token log-frequency annotation.

    Instances
      @[reducible]

      The all-zero token-frequency channel. Useful as a default when a fragment has not been frequency-annotated and the downstream theory needs some HasTokenFreq instance to typecheck.

      Returning 0 (= log 1) means "every item attested exactly once" — a no-information prior.

      Equations
      Instances For
        @[reducible, inline]
        abbrev Phonology.ItemSpecificity.isAboveThreshold {α : Type} [HasTokenFreq α] (threshold : ) (a : α) :

        An item is above threshold iff its token log-frequency reaches threshold. The shared primitive used by Indexed.isCore (stratum membership for indexed-constraint theories) and UseListed.isListed (storage gate for whole-word retrieval theories).

        Declared abbrev (= @[reducible] def) so that consumer proofs can show (n : ℝ) ≥ threshold directly without an explicit unfold step.

        Equations
        Instances For
          @[implicit_reducible]
          noncomputable instance Phonology.ItemSpecificity.instDecidablePredIsAboveThreshold {α : Type} [HasTokenFreq α] (threshold : ) :
          DecidablePred (isAboveThreshold threshold)

          Classical decidability for isAboveThreshold. Real's order is decidable only via Classical.dec, so all consumers downstream become noncomputable. Acceptable for theory specifications; concrete fitting routines operate over rationals or use a thresholded comparison directly.

          Equations