Documentation

Linglib.Paradigms.WugTest

Wug-Test Paradigm #

@cite{berko-1958} @cite{albright-hayes-2003}

Shared vocabulary for the wug paradigm: subjects are presented with nonce (novel) lexical items and asked to produce, judge, or rate forms. Because nonce items are by construction unlisted, a wug response cannot be a rote recall — it must be the output of a productive generalisation.

Architectural role #

Paradigms/ is the contract layer between Theories/ and Phenomena/Studies/. A wug study provides a typed cell whose Attestation factor can be swapped between attested (a real lexeme) and novel (a wug). The paradigm-level predicates in §4 quantify over the lens, so any theory whose predictions ride along the cell's other factors can be tested for the qualitative pattern the original paper reports.

Anchoring on a methodological lineage #

Two papers ground the contract:

The contract here is deliberately minimal — a single parametric lens class HasFactor (Cell) (F) plus a real-valued Rate observable. The two known dimensions a wug paradigm crosses (Attestation, log-frequency) are exposed as abbrevs HasAttestation/HasFrequency specialising HasFactor at the relevant codomain. Studies that need additional factors (neighbourhood density, paradigm structure, binary IOR membership) instantiate HasFactor at their own codomain and reuse the same predicate machinery.

Anti-UseListed discriminator #

A wug paradigm is the canonical discriminator between theories that locate productivity in the grammar (where novel forms inherit lexical-frequency effects via constraints / weights) and theories that locate productivity in lexical listing (where novel forms cannot inherit anything because they are by definition unlisted). The predicate NovelInvariantInFactor is the UseListed @cite{zuraw-2000} prediction; NovelShowsFactorGradient is the prediction of indexed-constraint @cite{pater-2010} or scaled-weight @cite{coetzee-pater-2008} accounts. The theorem novelGradient_inconsistent_with_invariance proves the two are mutually incompatible on cells with a non-vacuous factor space, so a study that adopts a typed Cell can pose the discrimination as a single bridge theorem at any factor type with [LT].

Bridge to Theories/Phonology/ItemSpecificity/HasTokenFreq #

HasTokenFreq (in Theories/Phonology/ItemSpecificity/Defs.lean) is a getter-only class on fragment lexical entries — fragments are immutable data, so there is no setter. HasFactor Cell ℝ is a lens on paradigm cells, which the paradigm-level predicates below need to quantify over swapping a frequency without touching other factors.

Wug cells that wrap a fragment lexical entry typically store the manipulable factor as a separate field (e.g. WugBKKCell.n2LogFreq) that mirrors the entry's tokenLogFreq for attested items and ranges freely on novel items. This is the right architecture for an experimental paradigm: the cell-level factor IS the experimentally manipulated value, not the lexicon-derived one. The connection to HasTokenFreq is by intent (downstream theories test whether the cell-level factor predicts the rate observable, with the cell-level factor originating in the lexicon channel for attested items), not by an automatic instance.

Out of scope (per CLAUDE.md Processing scope) #

Whether a stimulus is an attested lexical item or a novel (nonce, wug-like) form. The basic categorical contrast that @cite{berko-1958} introduced and that every wug paradigm crosses.

Instances For
    @[implicit_reducible]
    Equations
    Equations
    • One or more equations did not get rendered due to their size.
    Instances For
      @[implicit_reducible]
      Equations
      • One or more equations did not get rendered due to their size.

      Studies' Cell types vary in what additional factors they cross (item, paradigm slot, frequency stratum, IOR membership, …). The shared minimum is that every wug cell carries some collection of factors and a way to swap each one without touching others. The lens laws (get_set, set_get, set_set) make setFactor a proper lens; the paradigm-level predicates rely on them to express "swapping the factor changes the rate" as a statement that quantifies over the rest of the cell uniformly.

      HasFactor takes the codomain F as a parameter so that one schema covers Attestation (categorical), (frequency), Bool (binary IOR membership), and any other factor a future study introduces. Each Cell declares one HasFactor instance per factor it exposes; typeclass synthesis routes by the requested F.

      A lens on Cell exposing a factor of type F.

      Instances
        @[reducible, inline]

        Cell has an attestation factor that can be swapped without touching other factors. The @cite{berko-1958} dimension.

        Equations
        Instances For
          @[reducible, inline]

          Cell exposes a real-valued frequency factor (e.g. log token frequency of the source lexeme; log corpus frequency of an analogous attested compound). Frequency is -valued because lexical-frequency theories (@cite{coetzee-pater-2008}, @cite{coetzee-kawahara-2013}) operate on log frequencies as a continuous regressor.

          Equations
          Instances For
            @[reducible, inline]
            abbrev Paradigms.WugTest.Rate (Cell R : Type) :

            Per-cell numeric outcome — the wug paradigm's primary observable. Polymorphic over the codomain R so that empirical tables ( proportions) and theory predictions ( log-odds, MaxEnt probabilities) can both ride along the same predicate machinery.

            Equations
            Instances For

              These predicates state empirical patterns at the paradigm level. They are written in terms of setFactor (the lens), so any Cell type with the relevant HasFactor instances can claim them without re-deriving the universal quantification per study. The predicates are abstract; they express a shape ("novel forms show a factor gradient") that empirical studies and theoretical models may or may not satisfy.

              def Paradigms.WugTest.NovelShowsFactorGradient {Cell F R : Type} [HasAttestation Cell] [HasFactor Cell F] [LT F] [LT R] (rate : Rate Cell R) :

              A rate observable shows the novel-form factor gradient if, holding all other factors constant and fixing attestation = novel, varying the F-typed factor strictly varies the rate. This is the prediction of indexed-constraint (@cite{pater-2010}), scaled-weight (@cite{coetzee-pater-2008}), and representation-strength (@cite{moore-cantwell-2021}, @cite{smolensky-goldrick-2016}) theories: novel forms inherit a frequency-conditioned grammar pressure from analogous lexical items and therefore show a factor gradient even though they are themselves unlisted.

              Equations
              • One or more equations did not get rendered due to their size.
              Instances For
                def Paradigms.WugTest.NovelInvariantInFactor {Cell F R : Type} [HasAttestation Cell] [HasFactor Cell F] (rate : Rate Cell R) :

                A rate observable is factor-invariant on novel forms if, holding all other factors constant and fixing attestation = novel, varying the F-typed factor leaves the rate unchanged. This is the prediction of UseListed (@cite{zuraw-2000}): novel forms have no lexical entry, so no entry-keyed factor lookup can affect their grammar pressure. The two hypotheses thus make opposite predictions on the same paradigm cell.

                Equations
                • One or more equations did not get rendered due to their size.
                Instances For
                  @[reducible, inline]
                  abbrev Paradigms.WugTest.NovelShowsFreqGradient {Cell R : Type} [HasAttestation Cell] [HasFrequency Cell] [LT R] (rate : Rate Cell R) :

                  Frequency-specific spelling of NovelShowsFactorGradient at F := ℝ. Kept as an abbrev for readability at use sites where "frequency gradient" is the linguist-facing terminology.

                  Equations
                  Instances For
                    @[reducible, inline]
                    abbrev Paradigms.WugTest.NovelInvariantInFrequency {Cell R : Type} [HasAttestation Cell] [HasFrequency Cell] (rate : Rate Cell R) :

                    Frequency-specific spelling of NovelInvariantInFactor at F := ℝ.

                    Equations
                    Instances For
                      theorem Paradigms.WugTest.novelGradient_inconsistent_with_invariance {Cell F R : Type} [HasAttestation Cell] [HasFactor Cell F] [LT F] [Preorder R] (rate : Rate Cell R) (h_grad : NovelShowsFactorGradient rate) (h_inv : NovelInvariantInFactor rate) (c : Cell) (f₁ f₂ : F) (h_lt : f₁ < f₂) :
                      False

                      The two predictions are structurally incompatible: any rate observable that satisfies both NovelShowsFactorGradient and NovelInvariantInFactor at the same factor type F must have a vacuous factor space (no two F-distinct values). On any cell whose typeclasses permit f₁ < f₂ for some F-typed factors, the predicates are mutually exclusive — exactly the discriminator a wug paradigm is supposed to provide.

                      For binary factors (F := Bool), the precondition f₁ < f₂ is discharged automatically by Bool.false_lt_true; for real-valued factors (F := ℝ) any concrete pair like (0 : ℝ) < 1 works.

                      This is the structural source of the empirical claim that wug paradigms can adjudicate between grammar-locus and listing-locus accounts of productivity: the bridge theorem is a single application of this lemma to a study's cell type.