Saito, Tomaschek & Baayen (2025): frequency × inflectional status via the DLM

[STB25] reanalyse German tongue-position data (560 tokens, 88 word types sharing the rhyme [a(:)(X)t], Karl-Eberhard Corpus). High-frequency non-inflected words show articulatory reduction (tongue raising in the low vowel [a(:)]), whereas in high-frequency inflected words the reduction is attenuated (paper §2.2). Replacing the binary inflectional-status factor with SemSupSuffix, the semantic support from a word's meaning to its suffix triphone as read off a trained DLM ([BCSBB19], [HCB26]), improves the tongue-position GAMM by 142.87 AIC units with one fewer effective degree of freedom (paper §3.3, Table 3). The apparent morphological-boundary effect is thus driven by inflectional semantics, which challenges production models with an intermediate morpheme layer such as WEAVER++ ([LRM99], [Roe97]).

Main declarations

GermanInflectionalDLM: LinearDiscriminativeLexicon instantiated at the paper's carrier types, namely triphone form vectors of dimension 14404 and word2vec meaning vectors of dimension 300 (paper §3.1).
close_meanings_imply_close_form: the substrate Lipschitz bound at those carriers — close meanings yield close predicted articulations.
semSup_lt_of_forms_lt: when the suffix triphone is linearly decodable from meanings, training alone gives suffix-bearing (inflected) words strictly greater suffix support — the direction of the paper's headline contrast.

Implementation notes

The paper's positional measures SemSupVowel and SemSupSuffix (paper §3.1, eqs. 3–4) are semSup (Discriminative/Measures.lean) evaluated at the stem-vowel and suffix triphone indices. The paper's triphone indexing is not reproduced here, so the two measures get no separate definitions. The paper's production matrix G (solving SG = C) corresponds to the substrate's production, and its comprehension matrix F (solving CF = S) to the substrate's comprehension. The DLM's no-stored-entries architecture contrasts with frequency-channel theories of a stored lexicon and with [Byb85]'s tokenFreq (Morphology/UsageBased/Network.lean); cf. the channel discrimination in Studies/BreissKatsudaKawahara2026.lean.
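As a reading aid, the relation between the two mappings and the support measure can be written out as below. This is a sketch under assumptions that this page does not state: that the rows of S and C are the meaning and form vectors of the m training words, that both mappings are least-squares solutions, and that semSup reads one coordinate of the predicted form vector. The authoritative definitions are the ones in the substrate.

```latex
% Assumed shapes: S is m x 300 (meanings), C is m x 14404 (forms).
% \mathrm{Fro} denotes the Frobenius norm, to avoid a clash with the matrix F.
\begin{aligned}
\hat{G} &= \arg\min_{G} \lVert S G - C \rVert_{\mathrm{Fro}}^{2}
  && \text{(production, approximating } SG = C\text{)} \\
\hat{F} &= \arg\min_{F} \lVert C F - S \rVert_{\mathrm{Fro}}^{2}
  && \text{(comprehension, approximating } CF = S\text{)} \\
\operatorname{semSup}(s, j) &= (s \hat{G})_{j}
  && \text{(support from meaning } s \text{ to triphone } j\text{)}
\end{aligned}
```

On this reading, SemSupVowel and SemSupSuffix differ only in which triphone index j is supplied.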


@[reducible, inline]
abbrev Saito2025.TriphoneCount : ℕ

Triphone count of the paper's CELEX-derived form matrix C (paper §3.1).

Equations

Saito2025.TriphoneCount = 14404


@[reducible, inline]
abbrev Saito2025.Word2VecGermanDim : ℕ

Dimension of the pretrained German word2vec embeddings of [Mul15].

Equations

Saito2025.Word2VecGermanDim = 300


@[reducible, inline]
abbrev Saito2025.TriphoneVec : Type

Zero/one triphone-indicator form vectors. The binary structure is a property of the training data, not of the type.

Equations

Saito2025.TriphoneVec = Processing.Lexical.Discriminative.FormVec Saito2025.TriphoneCount


@[reducible, inline]
abbrev Saito2025.GermanWord2VecVec : Type

300-dimensional word2vec meaning vectors.

Equations

Saito2025.GermanWord2VecVec = Processing.Lexical.Discriminative.MeaningVec Saito2025.Word2VecGermanDim


@[reducible, inline]
abbrev Saito2025.GermanInflectionalDLM : Type

The paper's DLM: LinearDiscriminativeLexicon at German triphone × word2vec carrier types.

Equations

Saito2025.GermanInflectionalDLM = Processing.Lexical.Discriminative.LinearDiscriminativeLexicon ℝ Saito2025.TriphoneVec Saito2025.GermanWord2VecVec


theorem Saito2025.close_meanings_imply_close_form (D : GermanInflectionalDLM)
    (s₁ s₂ : GermanWord2VecVec) {ε : ℝ} (h : ‖s₁ - s₂‖ ≤ ε) :
    ‖D.production s₁ - D.production s₂‖ ≤ ‖LinearMap.toContinuousLinearMap D.production‖ * ε

Close meanings yield close predicted articulations, with constant ‖production‖.
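The shape of this bound can be seen in a generic Mathlib statement: every continuous linear map is Lipschitz with its operator norm as the constant. The sketch below is not the Linglib proof. It assumes only Mathlib, works with an arbitrary continuous linear map f between real normed spaces, and skips the step in which the statement above turns the linear map D.production into a continuous one via LinearMap.toContinuousLinearMap.

```lean
import Mathlib

-- Generic version of the bound: a continuous linear map moves two points
-- apart by at most its operator norm times their distance.
example {E F : Type*} [NormedAddCommGroup E] [NormedAddCommGroup F]
    [NormedSpace ℝ E] [NormedSpace ℝ F]
    (f : E →L[ℝ] F) (x y : E) {ε : ℝ} (h : ‖x - y‖ ≤ ε) :
    ‖f x - f y‖ ≤ ‖f‖ * ε :=
  calc ‖f x - f y‖ = ‖f (x - y)‖ := by rw [map_sub]   -- linearity
    _ ≤ ‖f‖ * ‖x - y‖ := f.le_opNorm (x - y)           -- operator-norm bound
    _ ≤ ‖f‖ * ε := mul_le_mul_of_nonneg_left h (norm_nonneg f)
```

Reading f as the production map, x and y as two meaning vectors, and ε as their distance bound recovers the form of the statement above.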


theorem Saito2025.semSup_lt_of_forms_lt {m : ℕ} {D : GermanInflectionalDLM}
    {data : Processing.Lexical.Discriminative.TrainingExperience m TriphoneCount Word2VecGermanDim}
    {q : Processing.Lexical.Discriminative.FrequencyVector m}
    (hD : Processing.Lexical.Discriminative.LinearDiscriminativeLexicon.IsTrainedOn D data q)
    (hq : ∀ (i : Fin m), 0 < q i)
    {suffixIdx : Fin TriphoneCount}
    {w : GermanWord2VecVec →ₗ[ℝ] ℝ}
    (hw : ∀ (i : Fin m), w (data.meanings i) = data.forms i suffixIdx)
    {i k : Fin m}
    (hik : data.forms i suffixIdx < data.forms k suffixIdx) :
    Processing.Lexical.Discriminative.semSup D (data.meanings i) suffixIdx <
      Processing.Lexical.Discriminative.semSup D (data.meanings k) suffixIdx

If the suffix-triphone coordinate is linearly decodable from word meanings (hypothesis hw; this is the paper's §4 mechanism, with inflectional semantics tied to the suffix), then a trained DLM's SemSupSuffix reproduces that coordinate exactly on the training words. A word k carrying the suffix triphone (an inflected word) therefore gets strictly greater suffix support than a word i lacking it (hypothesis hik). This is the direction of the paper's headline contrast (its Fig. 11), obtained from the linear architecture alone.
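One way to see why exact reproduction is plausible is the following informal argument. It rests on assumptions not confirmed by this page: that IsTrainedOn means the production matrix minimises a frequency-weighted squared error, and that semSup reads a coordinate of the predicted form vector. Here s_i and c_i stand for data.meanings i and data.forms i, and j for suffixIdx. The formal proof in the library may proceed differently.

```latex
% Assumed objective: \hat{G} minimises \sum_i q_i \lVert s_i G - c_i \rVert^2.
% Assumed measure:   semSup(s, j) = (s \hat{G})_j.
\begin{aligned}
\sum_{i} q_i \,\lVert s_i G - c_i \rVert^{2}
  &= \sum_{t} \sum_{i} q_i \bigl((s_i G)_{t} - c_{it}\bigr)^{2}
  && \text{(the objective splits by triphone column } t\text{)} \\
w(s_i) = c_{ij} \ \text{for all } i
  &\;\Longrightarrow\; \min_{G} \sum_{i} q_i \bigl((s_i G)_{j} - c_{ij}\bigr)^{2} = 0
  && \text{(take column } j \text{ of } G \text{ to be } w\text{)} \\
q_i > 0 \ \text{for all } i
  &\;\Longrightarrow\; (s_i \hat{G})_{j} = c_{ij} \ \text{for all } i
  && \text{(each term of a vanishing nonnegative sum is zero)} \\
c_{ij} < c_{kj}
  &\;\Longrightarrow\; \operatorname{semSup}(s_i, j) < \operatorname{semSup}(s_k, j)
  && \text{(substitute the exact fit on both sides)}
\end{aligned}
```

With zero/one triphone indicators, the last line compares 0 for a word lacking the suffix triphone with 1 for a word carrying it. The positivity hypothesis hq matters in the third step: a word with zero weight would place no constraint on its own prediction.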
