Linglib.Studies.Saito2025

Saito, Tomaschek & Baayen (2025): frequency × inflectional status via the DLM

[STB25] reanalyse German tongue-position data (560 tokens, 88 word types sharing the rhyme [a(:)(X)t], Karl-Eberhard Corpus): high-frequency non-inflected words show articulatory reduction (tongue raising, for the low vowel [a(:)]), while in high-frequency inflected words the reduction is attenuated (paper §2.2). Replacing the binary inflectional-status factor with SemSupSuffix — semantic support from word meaning to the suffix triphone, read off a trained DLM ([BCSBB19], [HCB26]) — improves the tongue-position GAMM by 142.87 AIC units with one fewer effective degree of freedom (paper §3.3, Table 3). The apparent morphological-boundary effect is thus driven by inflectional semantics, challenging production models with an intermediate morpheme layer such as WEAVER++ ([LRM99], [Roe97]).

Main declarations

Implementation notes

The paper's positional measures SemSupVowel and SemSupSuffix (paper §3.1 eqs. 3–4) are instances of semSup (Discriminative/Measures.lean), evaluated at the stem-vowel and suffix triphone indices. The paper's triphone indexing is not reproduced here, so they get no separate definitions. The paper's production matrix G (solving SG = C) corresponds to the substrate's production, and its comprehension matrix F (solving CF = S) corresponds to comprehension. The DLM's no-stored-entries architecture contrasts with frequency-channel theories of a stored lexicon and with [Byb85]'s tokenFreq (Morphology/UsageBased/Network.lean); cf. the channel discrimination in Studies/BreissKatsudaKawahara2026.lean.
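As a rough illustration of how a positional measure can be read off the production mapping, the sketch below treats semantic support as one coordinate of the predicted form vector sG. This is an assumption made for illustration: `semSupSketch` is a hypothetical stand-in, not the library's `semSup`, and it uses plain `Fin`-indexed real vectors rather than the carrier types of this module.

```lean
import Mathlib

-- Hypothetical stand-in for `semSup` (not the definition in
-- Discriminative/Measures.lean): the support that meaning vector `s`
-- lends to the triphone at index `t` is coordinate `t` of the predicted
-- form row vector `s G`, where `G` plays the role of the production matrix.
noncomputable def semSupSketch {n k : ℕ} (G : Matrix (Fin n) (Fin k) ℝ)
    (s : Fin n → ℝ) (t : Fin k) : ℝ :=
  Matrix.vecMul s G t

-- Unfolded, the support is a weighted sum over the meaning dimensions.
example {n k : ℕ} (G : Matrix (Fin n) (Fin k) ℝ) (s : Fin n → ℝ) (t : Fin k) :
    semSupSketch G s t = ∑ j, s j * G j t :=
  rfl
```

Under this reading, SemSupVowel and SemSupSuffix differ only in the index `t` that is passed in, which is consistent with the note that they need no separate definitions.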

@[reducible, inline]

Triphone count of the paper's CELEX-derived form matrix C (paper §3.1).

    @[reducible, inline]

    Dimension of the pretrained German word2vec embeddings of [Mul15].

      @[reducible, inline]

      Zero/one triphone-indicator form vectors. The binary structure is a property of the training data, not of the type.

        @[reducible, inline]

        300-dimensional word2vec meaning vectors.

          @[reducible, inline]

          The paper's DLM: LinearDiscriminativeLexicon at German triphone × word2vec carrier types.

            theorem Saito2025.close_meanings_imply_close_form (D : GermanInflectionalDLM) (s₁ s₂ : GermanWord2VecVec) {ε : ℝ} (h : ‖s₁ - s₂‖ ≤ ε) :
            ‖D.production s₁ - D.production s₂‖ ≤ ‖LinearMap.toContinuousLinearMap D.production‖ * ε

            Close meanings yield close predicted articulations: production is Lipschitz, with constant ‖production‖ (its operator norm as a continuous linear map).
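The bound is the usual operator-norm estimate for a linear map. A minimal sketch of the argument follows, assuming a generic continuous linear map `G` between real normed spaces in place of `D.production`. The names `E`, `F` and `G` are illustrative, and this is not necessarily how the theorem is proved in the module.

```lean
import Mathlib

-- Generic version of the estimate: linearity turns the difference of
-- outputs into the output of the difference, the operator norm bounds
-- that, and the hypothesis `h` bounds the input distance by `ε`.
example {E F : Type*} [NormedAddCommGroup E] [NormedSpace ℝ E]
    [NormedAddCommGroup F] [NormedSpace ℝ F]
    (G : E →L[ℝ] F) (s₁ s₂ : E) {ε : ℝ} (h : ‖s₁ - s₂‖ ≤ ε) :
    ‖G s₁ - G s₂‖ ≤ ‖G‖ * ε :=
  calc ‖G s₁ - G s₂‖ = ‖G (s₁ - s₂)‖ := by rw [map_sub]
    _ ≤ ‖G‖ * ‖s₁ - s₂‖ := G.le_opNorm _
    _ ≤ ‖G‖ * ε := mul_le_mul_of_nonneg_left h (norm_nonneg G)
```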

            theorem Saito2025.semSup_lt_of_forms_lt {m : ℕ} {D : GermanInflectionalDLM} {data : Processing.Lexical.Discriminative.TrainingExperience m TriphoneCount Word2VecGermanDim} {q : Processing.Lexical.Discriminative.FrequencyVector m} (hD : Processing.Lexical.Discriminative.LinearDiscriminativeLexicon.IsTrainedOn D data q) (hq : ∀ (i : Fin m), 0 < q i) {suffixIdx : Fin TriphoneCount} {w : GermanWord2VecVec →ₗ[ℝ] ℝ} (hw : ∀ (i : Fin m), w (data.meanings i) = data.forms i suffixIdx) {i k : Fin m} (hik : data.forms i suffixIdx < data.forms k suffixIdx) :

            If the suffix-triphone coordinate is linearly decodable from word meanings — the paper's §4 mechanism, inflectional semantics tied to the suffix — then a trained DLM's SemSupSuffix reproduces it exactly, so a word carrying the suffix triphone (an inflected word) gets strictly greater suffix support than one lacking it: the direction of the paper's headline contrast (its Fig. 11), from the linear architecture alone.
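The last step of this argument, from exact reproduction to the strict ordering, can be sketched in isolation. The snippet below assumes the reproduction property as a hypothesis instead of deriving it from training. The names `sup`, `meanings`, `forms` and `hrepro` are hypothetical placeholders, not the module's API.

```lean
import Mathlib

-- If a support function reproduces the suffix-triphone coordinate on
-- every training item (`hrepro`), then an ordering between two form
-- coordinates transfers directly to the corresponding supports.
example {M : Type*} {m : ℕ} (sup : M → ℝ) (meanings : Fin m → M)
    (forms : Fin m → ℝ) (hrepro : ∀ i, sup (meanings i) = forms i)
    {i k : Fin m} (hik : forms i < forms k) :
    sup (meanings i) < sup (meanings k) := by
  rw [hrepro i, hrepro k]
  exact hik
```

The substantive part of the theorem is establishing the analogue of `hrepro` from `IsTrainedOn`, the positive frequencies `hq`, and the linear decoder `w`; the sketch takes that part for granted.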