Saito, Tomaschek & Baayen (2025): frequency × inflectional status via the DLM #
[STB25] reanalyse German tongue-position data (560 tokens, 88 word
types sharing the rhyme [a(:)(X)t], Karl-Eberhard Corpus): high-frequency non-inflected
words show articulatory reduction (tongue raising, for the low vowel [a(:)]), while in
high-frequency inflected words the reduction is attenuated (paper §2.2). Replacing the
binary inflectional-status factor with SemSupSuffix — semantic support from word meaning
to the suffix triphone, read off a trained DLM ([BCSBB19],
[HCB26]) — improves the tongue-position GAMM by 142.87 AIC units
with one fewer effective degree of freedom (paper §3.3, Table 3). The apparent
morphological-boundary effect is thus driven by inflectional semantics, challenging
production models with an intermediate morpheme layer such as WEAVER++
([LRM99], [Roe97]).
Main declarations #
GermanInflectionalDLM:LinearDiscriminativeLexiconat the paper's carrier types, triphone form vectors of dimension 14404 and word2vec meaning vectors of dimension 300 (paper §3.1).close_meanings_imply_close_form: the substrate Lipschitz bound at those carriers — close meanings yield close predicted articulations.semSup_lt_of_forms_lt: when the suffix triphone is linearly decodable from meanings, training alone gives suffix-bearing (inflected) words strictly greater suffix support — the direction of the paper's headline contrast.
Implementation notes #
The paper's positional measures SemSupVowel and SemSupSuffix (paper §3.1 eqs. 3–4) are
semSup (Discriminative/Measures.lean) at the stem-vowel and suffix triphone indices;
the paper's triphone indexing is not reproduced here, so they get no separate definitions.
The paper's production matrix G (solving SG = C) is the substrate's production, its
comprehension matrix F (solving CF = S) is comprehension. The DLM's
no-stored-entries architecture sits against frequency-channel theories of a stored
lexicon and [Byb85]'s tokenFreq (Morphology/UsageBased/Network.lean); cf. the
channel discrimination in Studies/BreissKatsudaKawahara2026.lean.
Triphone count of the paper's CELEX-derived form matrix C (paper §3.1).
Equations
- Saito2025.TriphoneCount = 14404
Instances For
Dimension of the pretrained German word2vec embeddings of [Mul15].
Equations
Instances For
Zero/one triphone-indicator form vectors. The binary structure is a property of the training data, not of the type.
Instances For
300-dimensional word2vec meaning vectors.
Equations
Instances For
The paper's DLM: LinearDiscriminativeLexicon at German triphone × word2vec
carrier types.
Equations
Instances For
Close meanings yield close predicted articulations, with constant ‖production‖.
If the suffix-triphone coordinate is linearly decodable from word meanings —
the paper's §4 mechanism, inflectional semantics tied to the suffix — then a
trained DLM's SemSupSuffix reproduces it exactly, so a word carrying the
suffix triphone (an inflected word) gets strictly greater suffix support than
one lacking it: the direction of the paper's headline contrast (its Fig. 11),
from the linear architecture alone.