Chuang, Bell, Tseng & Baayen (2026): Word-specific tonal realizations in Mandarin #
@cite{chuang-bell-tseng-baayen-2026} @cite{baayen-2019} @cite{heitmeier-chuang-baayen-2026}
Chuang, Y.-Y., Bell, M. J., Tseng, Y.-H., & Baayen, R. H. (2026). Word-specific tonal realizations in Mandarin. Language, in press. DOI 10.1017/S0097850725000001.
Empirical claims (the four predictions) #
The paper investigates Mandarin disyllabic words with the rise-fall (RF) tonal pattern (T2 followed by T4) in spontaneous Taiwan Mandarin speech (3,778 tokens across 51 word types from the Taiwan Mandarin Spontaneous Speech Corpus). Standard accounts attribute variation in tonal realisation to (i) lexical tone, (ii) voluntary intonation/focus, and (iii) involuntary articulatory constraints (coarticulation, speech rate, segmental makeup). The paper adds a fourth source — word meaning itself — and derives four testable predictions:
- Word type predicts f0 above segmental controls. A by-word factor smooth in a generalised additive model (GAM) of f0 contour reduces AIC more than the joint contribution of all segment-related controls (vowel height × 2 syllables, onset type × 2, rhyme structure × 2).
- Sense further refines word. Replacing the word-level factor smooth with a sense-level smooth further improves model fit on the subset of senses with sufficient tokens.
- f0 → meaning predictability. A linear (or near-linear) mapping from token-level pitch contours to token-level contextualised embeddings achieves above-chance comprehension accuracy.
- meaning → f0 predictability. A reverse mapping from contextualised embeddings to pitch contours achieves above-chance production accuracy.
The substantive theoretical claim is that word meaning is a co-determiner of phonetic realization — equivalently, that the form space of pitch contours and the meaning space of contextualised embeddings exhibit quantifiable isomorphism (predictions 3 and 4), contradicting the dual-articulation axiom that form and meaning are orthogonal modules of grammar.
Substrate #
The DLM substrate (LinearDiscriminativeLexicon, FormVec, MeaningVec,
broadcastFirstCoord, linear_dlm_distinguishes_meanings,
dlm_neutralizes_meanings_in_kernel, linear_dlm_admits_meaning_specific_contours)
lives in Theories/Processing/Lexical/Discriminative/Defs.lean
@cite{baayen-2019} @cite{heitmeier-chuang-baayen-2026}. This file
imports it and supplies the paper-specific instantiation: 50-dim pitch
contours, 768-dim CKIP GPT-2 contextualised embeddings, the 51 RF
disyllabic word types of the corpus.
The DLM's defining architectural commitment is that the lexicon
contains no stored representations — only the connection weights
of comprehension and production maps. The substrate structure has
exactly two fields, both LinearMaps; there is no entries-typed
projection. This is the substantive reason DLM is housed under
Theories/Processing/Lexical/ rather than Theories/Lexicon/ — see
the substrate file's docstring for the full architectural argument.
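As a shape-level illustration of this commitment (not the substrate's actual code), a minimal Lean sketch is given below; the structure name, field names, and typeclass assumptions are illustrative only, and the authoritative definition is LinearDiscriminativeLexicon in Defs.lean.

```lean
import Mathlib

/-- Sketch only: a linear DLM carries exactly two connection-weight maps
    and no entries-typed field. Field names are assumed for exposition;
    see Theories/Processing/Lexical/Discriminative/Defs.lean for the
    real LinearDiscriminativeLexicon. -/
structure LinearDLMSketch (FormVec MeaningVec : Type)
    [AddCommGroup FormVec] [Module ℝ FormVec]
    [AddCommGroup MeaningVec] [Module ℝ MeaningVec] where
  /-- form → meaning (comprehension) connection weights -/
  comprehension : FormVec →ₗ[ℝ] MeaningVec
  /-- meaning → form (production) connection weights -/
  production : MeaningVec →ₗ[ℝ] FormVec
```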
Relation to existing usage-based / frequency-channel theories #
The DLM is the latest in a lineage of usage-based, gradient-weight theories. Adjacent linglib substrate:
- Theories/Phonology/ItemSpecificity/: four phonological theories parameterised by lexical frequency (UseListed.lean @cite{zuraw-2000}, IndexedConstraints.lean @cite{pater-2010}, RepresentationStrength.lean @cite{moore-cantwell-2021}, ScaledWeights.lean @cite{coetzee-pater-2008}). All four presuppose a stored lexicon to which frequency attaches.
- Theories/Morphology/UsageBased/Network.lean (@cite{bybee-1985}): Bybee's dynamic network of typed LexicalEntrys with tokenFreq strength plus connection edges. Stores entries.
The DLM is the natural extreme of these traditions: it rejects the
storage premise altogether — no LexicalEntry, no tokenFreq, no
entry-typed connections. The architectural divergence is sharpest
against Bybee (both usage-based, only one stores entries) and is
documented in the substrate file's docstring; per the linguistics
audit (see CHANGELOG 0.231.15), the architectural debate the DLM
enters into makes the case for keeping the four ItemSpecificity
channels and Bybee where they are: they are linguistic-level theories
parameterised by lexical frequency, not theories of "the lexicon as
its own object".
Architectural note: tones as emergent vs stored #
The discrete-tone substrate in Theories/Phonology/Tone/Constraints.lean,
Phenomena/Tone/Studies/Hyman2006.lean (@cite{hyman-2006}),
Phenomena/Tone/Studies/Lionnet2025.lean (@cite{lionnet-2025}), and
Phenomena/Tone/Studies/AkinboFwangwar2026.lean
(@cite{akinbo-fwangwar-2026}) treats tonal categories (H/M/L; T1–T4)
as primitive featural objects with faithfulness/markedness violations
defined over them. The present study adopts the opposite stance:
tonal categories are statistical generalisations across token-level
pitch contours, not stored cognitive objects.
This file's claims live on continuous f0 vectors (PitchContour below),
deliberately not on Phonology.Autosegmental.FloatingForm. The two
substrates coexist as competing analyses of overlapping data; this file
does not bridge them. The decision is methodological: bridging would
require committing to a translation between continuous contours and
discrete tonal categories that is itself the empirical question the
paper interrogates.
Cross-framework note: vs. Storme 2026 *HOMOPHONY #
Storme2026.starHomophony (@cite{storme-2026},
Phenomena/Phonology/Studies/Storme2026.lean) formalises a systemic
constraint that penalises output tuples in which distinct inputs
produce identical surface forms. It operates on segmental form and
predicts categorical distinct-meaning–distinct-form pressure.
The DLM-based account here predicts graded distinct-meaning–distinct-form pressure operating on fine phonetic detail (here, f0 contours). Even nominally homophonous Mandarin disyllables (e.g. cheng2shi4 'city' vs. cheng2shi4 'computer program', paper §2.1) have measurably distinct pitch contours.
The two formalisations are sibling responses to homophony pressure at
different levels of phonological/phonetic resolution. Their common
structural generalisation is injectivity of the meaning→form map:
Storme's *HOMOPHONY enforces it categorically over a discrete
output paradigm; the present file's linear_dlm_distinguishes_meanings
expresses it for the linear meaning→form map of a LinearDLM. A formal
subsumption result would require a substrate that admits both
discrete-segmental and continuous-sub-segmental representations of
"the same" lexical item; linglib does not currently provide one.
Sections #
- §1 Paper-specific instantiation: Taiwan Mandarin RF disyllables
- §2 RF disyllabic word types from the corpus (representative subset)
- §3 Quantitative form of prediction (iv) via Lipschitz continuity
- §4 Empirical content (the four predictions, in prose)
The paper uses 50 evenly-spaced f0 samples per token (paper §3.2).
The paper uses 768-dimensional CKIP GPT-2 contextualised embeddings (paper §3.1).
A pitch contour: 50 f0 samples on the normalised time scale [0, 1], centred and scaled per token (paper §3.2 min-max normalisation), representing pitch shape rather than absolute pitch or amplitude.
A contextualised embedding: a 768-dim vector from CKIP GPT-2 conditioned on the token's preceding utterance (paper §3.1). Distinct tokens of the same word type carry distinct CEs reflecting their context-specific meanings.
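Since the rendered definitions are elided on this page, here is a hedged sketch of the carrier declarations described above, assuming a EuclideanSpace encoding; the file's actual definitions may differ in representation.

```lean
import Mathlib

-- Sketch only: dimensions are from paper §3.1–§3.2; the EuclideanSpace
-- encoding is an assumption, not necessarily the file's actual choice.
def pitchDim : ℕ := 50    -- 50 evenly-spaced f0 samples per token
def ceDim : ℕ := 768      -- CKIP GPT-2 contextualised-embedding dimension

abbrev PitchContour := EuclideanSpace ℝ (Fin pitchDim)
abbrev ContextualEmbedding := EuclideanSpace ℝ (Fin ceDim)
```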
The paper's specific DLM instantiation for Taiwan Mandarin RF tones.
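A self-contained sketch of how the instantiation could look, specialising the two connection-weight maps to the hypothetical carriers above; the real TaiwanMandarinRFDLM is built on the substrate's LinearDiscriminativeLexicon and may be parameterised differently.

```lean
import Mathlib

-- Sketch only: 50-dim pitch contours as forms, 768-dim CKIP GPT-2
-- contextualised embeddings as meanings. The real definition reuses
-- the substrate structure rather than redeclaring the fields.
structure TaiwanMandarinRFDLMSketch where
  comprehension : EuclideanSpace ℝ (Fin 50) →ₗ[ℝ] EuclideanSpace ℝ (Fin 768)
  production : EuclideanSpace ℝ (Fin 768) →ₗ[ℝ] EuclideanSpace ℝ (Fin 50)
```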
A representative subset of the paper's 51 RF (rise-fall, T2-T4)
Mandarin disyllabic word types (paper §2.5). The first five are
the high-frequency words sampled at 300 tokens each (paper §2.2);
the remainder are mid- and lower-frequency types used for
visualisation in Fig. 5 / Fig. 18. The list is not exhaustive —
downstream theorems should not depend on Fintype.card RFWordType.
Note on the apparent tension with the substrate's no-stored-representations commitment: this enum is a label set for navigating the corpus, not a list of stored model entries. The DLM does not store these labels; they serve only to refer to attested word-token clusters in the dataset. The substantive "no stored representations" claim is about the model, not the empirical labelling apparatus.
- ran2hou4 : RFWordType
- shi2hou4 : RFWordType
- bu2hui4 : RFWordType
- hai2shi4 : RFWordType
- yi2yang4 : RFWordType
- xue2xiao4 : RFWordType
- xi2guan4 : RFWordType
- yi4ban4 : RFWordType
- za2zhi4 : RFWordType
- quan2bu4 : RFWordType
- rong2yi4 : RFWordType
- wen2hua4 : RFWordType
- bu2shi4 : RFWordType
- jue2ding4 : RFWordType
- qian2mian4 : RFWordType
- cheng2shi4 : RFWordType
- bu2yao4 : RFWordType
Quantitative form of prediction (iv). Paper §3.4 reports that
the trained DLM production net maps similar CEs to similar contours
above chance. The Lipschitz form: any TaiwanMandarinRFDLM satisfies
‖production e₁ - production e₂‖ ≤ ‖production‖ * ‖e₁ - e₂‖. The
paper's empirical content is that the trained ‖production‖ is
moderate, making this bound informative for the homophone pair
cheng2shi4 'city' vs. cheng2shi4 'computer program' (paper
§2.1) — similar context-specific embeddings yield similar but
measurably distinct pitch contours.
Direct application of dlm_neighbor_centroids_imply_neighbor_contours
to the paper-specific carrier types.
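A sketch of the bound itself, assuming the production net is packaged as a Mathlib ContinuousLinearMap between the hypothetical carrier spaces sketched above; the inequality is the standard operator-norm bound applied to e₁ - e₂, not a restatement of the file's theorem.

```lean
import Mathlib

-- Sketch: the Lipschitz form of prediction (iv). Nearby contextualised
-- embeddings are mapped to nearby pitch contours, with slope ‖production‖.
example
    (production : EuclideanSpace ℝ (Fin 768) →L[ℝ] EuclideanSpace ℝ (Fin 50))
    (e₁ e₂ : EuclideanSpace ℝ (Fin 768)) :
    ‖production e₁ - production e₂‖ ≤ ‖production‖ * ‖e₁ - e₂‖ := by
  simpa [map_sub] using production.le_opNorm (e₁ - e₂)
```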
The four predictions as paper-supplied empirical facts #
Per CLAUDE.md (Processing-scope guidance), measurement modalities and
empirical-fit tables are out of scope as Lean theorems. The four
predictions are recorded here as documented empirical findings — not
as Lean-derivable theorems — since their content is the paper's GAM
and DLM training results, not a structural property of the substrate.
Prediction (i) — paper §2.6.2, Fig. 4, Fig. 6. The GAM with a by-word factor smooth has cross-validated SSE strictly less than the GAM with all six segmental factor smooths combined (vowel1, vowel2, onset1, onset2, syllable1, syllable2). Reported AIC reduction relative to baseline: −6,795 (word) vs. −4,938 (omnibus segmental).
Prediction (ii) — paper §2.7.1, Fig. 10, Fig. 12. On the restricted dataset where senses have ≥ 14 tokens (3,458 tokens, 65 senses across 48 word types), the GAM with a sense factor smooth reduces AIC by an additional 365 units relative to the word-smooth GAM. Polysemous words such as bu2yao4 (4 senses: 'prohibition', 'dissuasion', 'unnecessity', 'wish-against'; Fig. 11) have visibly distinct sense-specific contours.
Prediction (iii) — paper §3.3, Fig. 16. A DLM comprehension net trained on (pitch contour → CE) pairs achieves test accuracy ~30% (LDL) and ~50% (ResLDL), where "accuracy" means the predicted CE's nearest neighbour belongs to a token of the same word type. Permutation chance baseline: ~3.5% over the 51-word vocabulary.
Prediction (iv) — paper §3.4, Fig. 17. A DLM production net trained on (CE → pitch contour) pairs achieves test accuracy ~35–40% for both LDL and ResLDL. Notable: linear and nonlinear models perform similarly here, suggesting the meaning → form mapping is dominantly linear. The qualitative match between the LDL-predicted contour from word-centroid CEs and the GAM-predicted contour from word-factor smooths (Fig. 18) is the paper's headline form-meaning isomorphism finding.
Implications recorded in the paper's discussion #
- Anti-stored-tone-representation (paper §4): tones are emergent statistical generalisations, not discrete cognitive objects. This is the substantive challenge to the discrete-tone substrate noted in the module docstring's Architectural note.
- Anti-dual-articulation (paper §4): form and meaning are not orthogonal levels of grammar; their isomorphism is quantifiable and predictively useful in both comprehension and production.
- Contextualism about meaning (paper §1, §3.1): word meaning is a property of the token in context, not a context-independent symbol shared by all tokens of a type. Operationalised by contextualised rather than type-level embeddings.