Chuang, Bell, Tseng & Baayen (2026): Word-specific tonal realizations in Mandarin #
@cite{chuang-bell-tseng-baayen-2026} @cite{baayen-2019} @cite{heitmeier-chuang-baayen-2026}
Chuang, Y.-Y., Bell, M. J., Tseng, Y.-H., & Baayen, R. H. (2026). Word-specific tonal realizations in Mandarin. Language, in press. DOI 10.1017/S0097850725000001.
Empirical claims (the four predictions) #
The paper investigates Mandarin disyllabic words with the rise-fall (RF) tonal pattern (T2 followed by T4) in spontaneous Taiwan Mandarin speech (3,778 tokens across 51 word types from the Taiwan Mandarin Spontaneous Speech Corpus). Standard accounts attribute variation in tonal realisation to (i) lexical tone, (ii) voluntary intonation/focus, and (iii) involuntary articulatory constraints (coarticulation, speech rate, segmental makeup). The paper adds a fourth source — word meaning itself — and derives four testable predictions:
- Word type predicts f0 above segmental controls. A by-word factor smooth in a generalised additive model (GAM) of f0 contour reduces AIC more than the joint contribution of all segment-related controls (vowel height × 2 syllables, onset type × 2, rhyme structure × 2).
- Sense further refines word. Replacing the word-level factor smooth with a sense-level smooth further improves model fit on the subset of senses with sufficient tokens.
- f0 → meaning predictability. A linear (or near-linear) mapping from token-level pitch contours to token-level contextualised embeddings achieves above-chance comprehension accuracy.
- meaning → f0 predictability. A reverse mapping from contextualised embeddings to pitch contours achieves above-chance production accuracy.
The substantive theoretical claim is that word meaning is a co-determiner of phonetic realization — equivalently, that the form space of pitch contours and the meaning space of contextualised embeddings exhibit quantifiable isomorphism (predictions 3 and 4), contradicting the dual-articulation axiom that form and meaning are orthogonal modules of grammar.
Substrate #
The DLM substrate (LinearDiscriminativeLexicon, FormVec, MeaningVec,
broadcastFirstCoord, linear_dlm_distinguishes_meanings,
dlm_neutralizes_meanings_in_kernel, linear_dlm_admits_meaning_specific_contours)
lives in Theories/Processing/Lexical/Discriminative/Defs.lean
@cite{baayen-2019} @cite{heitmeier-chuang-baayen-2026}. This file
imports it and supplies the paper-specific instantiation: 50-dim pitch
contours, 768-dim CKIP GPT-2 contextualised embeddings, the 51 RF
disyllabic word types of the corpus.
The DLM's defining architectural commitment is that the lexicon
contains no stored representations — only the connection weights
of comprehension and production maps. The substrate structure has
exactly two fields, both LinearMaps; there is no entries-typed
projection. This is the substantive reason DLM is housed under
Theories/Processing/Lexical/ rather than Theories/Lexicon/ — see
the substrate file's docstring for the full architectural argument.
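As a shape-level illustration of this commitment (not the substrate's actual code), a minimal Lean sketch is given below; the structure name, field names, and typeclass assumptions are illustrative only, and the authoritative definition is LinearDiscriminativeLexicon in Defs.lean.

```lean
import Mathlib

/-- Sketch only: a linear DLM carries exactly two connection-weight maps
    and no entries-typed field. Field names are assumed for exposition;
    see Theories/Processing/Lexical/Discriminative/Defs.lean for the
    real LinearDiscriminativeLexicon. -/
structure LinearDLMSketch (FormVec MeaningVec : Type)
    [AddCommGroup FormVec] [Module ℝ FormVec]
    [AddCommGroup MeaningVec] [Module ℝ MeaningVec] where
  /-- form → meaning (comprehension) connection weights -/
  comprehension : FormVec →ₗ[ℝ] MeaningVec
  /-- meaning → form (production) connection weights -/
  production : MeaningVec →ₗ[ℝ] FormVec
```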
Relation to existing usage-based / frequency-channel theories #
The DLM is the latest in a lineage of usage-based, gradient-weight theories. Adjacent linglib substrate:
- Theories/Phonology/ItemSpecificity/: four phonological theories parameterised by lexical frequency (UseListed.lean @cite{zuraw-2000}, IndexedConstraints.lean @cite{pater-2010}, RepresentationStrength.lean @cite{moore-cantwell-2021}, ScaledWeights.lean @cite{coetzee-pater-2008}). All four presuppose a stored lexicon to which frequency attaches.
- Theories/Morphology/UsageBased/Network.lean (@cite{bybee-1985}): Bybee's dynamic network of typed LexicalEntrys with tokenFreq strength plus connection edges. Stores entries.
The DLM is the natural extreme of these traditions: it rejects the
storage premise altogether — no LexicalEntry, no tokenFreq, no
entry-typed connections. The architectural divergence is sharpest
against Bybee (both usage-based, only one stores entries) and is
documented in the substrate file's docstring; per the linguistics
audit (see CHANGELOG 0.231.15), the architectural debate the DLM
enters into makes the case for keeping the four ItemSpecificity
channels and Bybee where they are: they are linguistic-level theories
parameterised by lexical frequency, not theories of "the lexicon as
its own object".
Architectural note: tones as emergent vs stored #
The discrete-tone substrate in Theories/Phonology/Tone/Constraints.lean,
Phenomena/Tone/Studies/Hyman2006.lean (@cite{hyman-2006}),
Phenomena/Tone/Studies/Lionnet2025.lean (@cite{lionnet-2025}), and
Phenomena/Tone/Studies/AkinboFwangwar2026.lean
(@cite{akinbo-fwangwar-2026}) treats tonal categories (H/M/L; T1–T4)
as primitive featural objects with faithfulness/markedness violations
defined over them. The present study adopts the opposite stance:
tonal categories are statistical generalisations across token-level
pitch contours, not stored cognitive objects.
This file's claims live on continuous f0 vectors (PitchContour below),
deliberately not on Phonology.Autosegmental.FloatingForm. The two
substrates coexist as competing analyses of overlapping data; this file
does not bridge them. The decision is methodological: bridging would
require committing to a translation between continuous contours and
discrete tonal categories that is itself the empirical question the
paper interrogates.
Cross-framework note: vs. Storme 2026 *HOMOPHONY #
Storme2026.starHomophony (@cite{storme-2026},
Phenomena/Phonology/Studies/Storme2026.lean) formalises a systemic
constraint that penalises output tuples in which distinct inputs
produce identical surface forms. It operates on segmental form and
predicts categorical distinct-meaning–distinct-form pressure.
The DLM-based account here predicts graded distinct-meaning–distinct-form pressure operating on fine phonetic detail (here, f0 contours). Even nominally homophonous Mandarin disyllables (e.g. cheng2shi4 'city' vs. cheng2shi4 'computer program', paper §2.1) have measurably distinct pitch contours.
The two formalisations are sibling responses to homophony pressure at
different levels of phonological/phonetic resolution. Their common
structural generalisation is injectivity of the meaning→form map:
Storme's *HOMOPHONY enforces it categorically over a discrete
output paradigm; the present file's linear_dlm_distinguishes_meanings
expresses it for the linear meaning→form map of a LinearDLM. A formal
subsumption result would require a substrate that admits both
discrete-segmental and continuous-sub-segmental representations of
"the same" lexical item; linglib does not currently provide one.
Sections #
- §1 Paper-specific instantiation: Taiwan Mandarin RF disyllables
- §2 RF disyllabic word types from the corpus (representative subset)
- §3 Quantitative form of prediction (iv) via Lipschitz continuity
- §4 Empirical content (the four predictions, in prose)
The paper uses 50 evenly-spaced f0 samples per token (paper §3.2).
The paper uses 768-dimensional CKIP GPT-2 contextualised embeddings (paper §3.1).
A pitch contour: 50 f0 samples on the normalised time scale [0, 1], centred and scaled per token (paper §3.2 min-max normalisation), representing pitch shape rather than absolute pitch or amplitude.
A contextualised embedding: a 768-dim vector from CKIP GPT-2 conditioned on the token's preceding utterance (paper §3.1). Distinct tokens of the same word type carry distinct CEs reflecting their context-specific meanings.
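Since the rendered definitions are elided on this page, here is a hedged sketch of the carrier declarations described above, assuming a EuclideanSpace encoding; the file's actual definitions may differ in representation.

```lean
import Mathlib

-- Sketch only: dimensions are from paper §3.1–§3.2; the EuclideanSpace
-- encoding is an assumption, not necessarily the file's actual choice.
def pitchDim : ℕ := 50    -- 50 evenly-spaced f0 samples per token
def ceDim : ℕ := 768      -- CKIP GPT-2 contextualised-embedding dimension

abbrev PitchContour := EuclideanSpace ℝ (Fin pitchDim)
abbrev ContextualEmbedding := EuclideanSpace ℝ (Fin ceDim)
```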
The paper's specific DLM instantiation for Taiwan Mandarin RF tones.
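A self-contained sketch of how the instantiation could look, specialising the two connection-weight maps to the hypothetical carriers above; the real TaiwanMandarinRFDLM is built on the substrate's LinearDiscriminativeLexicon and may be parameterised differently.

```lean
import Mathlib

-- Sketch only: 50-dim pitch contours as forms, 768-dim CKIP GPT-2
-- contextualised embeddings as meanings. The real definition reuses
-- the substrate structure rather than redeclaring the fields.
structure TaiwanMandarinRFDLMSketch where
  comprehension : EuclideanSpace ℝ (Fin 50) →ₗ[ℝ] EuclideanSpace ℝ (Fin 768)
  production : EuclideanSpace ℝ (Fin 768) →ₗ[ℝ] EuclideanSpace ℝ (Fin 50)
```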
A representative subset of the paper's 51 RF (rise-fall, T2-T4)
Mandarin disyllabic word types (paper §2.5). The first five are
the high-frequency words sampled at 300 tokens each (paper §2.2);
the remainder are mid- and lower-frequency types used for
visualisation in Fig. 5 / Fig. 18. The list is not exhaustive —
downstream theorems should not depend on Fintype.card RFWordType.
Note on the apparent tension with the substrate's no-stored-representations commitment: this enum is a label set for navigating the corpus, not a list of stored model entries. The DLM does not store these labels; they serve only to refer to attested word-token clusters in the dataset. The substantive "no stored representations" claim is about the model, not the empirical labelling apparatus.
- ran2hou4 : RFWordType
- shi2hou4 : RFWordType
- bu2hui4 : RFWordType
- hai2shi4 : RFWordType
- yi2yang4 : RFWordType
- xue2xiao4 : RFWordType
- xi2guan4 : RFWordType
- yi4ban4 : RFWordType
- za2zhi4 : RFWordType
- quan2bu4 : RFWordType
- rong2yi4 : RFWordType
- wen2hua4 : RFWordType
- bu2shi4 : RFWordType
- jue2ding4 : RFWordType
- qian2mian4 : RFWordType
- cheng2shi4 : RFWordType
- bu2yao4 : RFWordType
Quantitative form of prediction (iv). Paper §3.4 reports that
the trained DLM production net maps similar CEs to similar contours
above chance. The Lipschitz form: any TaiwanMandarinRFDLM satisfies
‖production e₁ - production e₂‖ ≤ ‖production‖ * ‖e₁ - e₂‖. The
paper's empirical content is that the trained ‖production‖ is
moderate, making this bound informative for the homophone pair
cheng2shi4 'city' vs. cheng2shi4 'computer program' (paper
§2.1) — similar context-specific embeddings yield similar but
measurably distinct pitch contours.
Direct application of dlm_neighbor_centroids_imply_neighbor_contours
to the paper-specific carrier types.
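A sketch of the bound itself, assuming the production net is packaged as a Mathlib ContinuousLinearMap between the hypothetical carrier spaces sketched above; the inequality is the standard operator-norm bound applied to e₁ - e₂, not a restatement of the file's theorem.

```lean
import Mathlib

-- Sketch: the Lipschitz form of prediction (iv). Nearby contextualised
-- embeddings are mapped to nearby pitch contours, with slope ‖production‖.
example
    (production : EuclideanSpace ℝ (Fin 768) →L[ℝ] EuclideanSpace ℝ (Fin 50))
    (e₁ e₂ : EuclideanSpace ℝ (Fin 768)) :
    ‖production e₁ - production e₂‖ ≤ ‖production‖ * ‖e₁ - e₂‖ := by
  simpa [map_sub] using production.le_opNorm (e₁ - e₂)
```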
The four predictions as paper-supplied empirical facts #
Per CLAUDE.md (Processing-scope guidance), measurement modalities and
empirical-fit tables are out of scope as Lean theorems. The four
predictions are recorded here as documented empirical findings — not
as Lean-derivable theorems — since their content is the paper's GAM
and DLM training results, not a structural property of the substrate.
Prediction (i) — paper §2.6.2, Fig. 4, Fig. 6. The GAM with a by-word factor smooth has cross-validated SSE strictly less than the GAM with all six segmental factor smooths combined (vowel1, vowel2, onset1, onset2, syllable1, syllable2). Reported AIC reduction relative to baseline: −6,795 (word) vs. −4,938 (omnibus segmental).
Prediction (ii) — paper §2.7.1, Fig. 10, Fig. 12. On the restricted dataset where senses have ≥ 14 tokens (3,458 tokens, 65 senses across 48 word types), the GAM with a sense factor smooth reduces AIC by an additional 365 units relative to the word-smooth GAM. Polysemous words such as bu2yao4 (4 senses: 'prohibition', 'dissuasion', 'unnecessity', 'wish-against'; Fig. 11) have visibly distinct sense-specific contours.
Prediction (iii) — paper §3.3, Fig. 16. A DLM comprehension net trained on (pitch contour → CE) pairs achieves test accuracy ~30% (LDL) and ~50% (ResLDL), where "accuracy" means the predicted CE's nearest neighbour belongs to a token of the same word type. Permutation chance baseline: ~3.5% over the 51-word vocabulary.
Prediction (iv) — paper §3.4, Fig. 17. A DLM production net trained on (CE → pitch contour) pairs achieves test accuracy ~35–40% for both LDL and ResLDL. Notable: linear and nonlinear models perform similarly here, suggesting the meaning → form mapping is dominantly linear. The qualitative match between the LDL-predicted contour from word-centroid CEs and the GAM-predicted contour from word-factor smooths (Fig. 18) is the paper's headline form-meaning isomorphism finding.
Implications recorded in the paper's discussion #
- Anti-stored-tone-representation (paper §4): tones are emergent statistical generalisations, not discrete cognitive objects. This is the substantive challenge to the discrete-tone substrate noted in the module docstring's Architectural note.
- Anti-dual-articulation (paper §4): form and meaning are not orthogonal levels of grammar; their isomorphism is quantifiable and predictively useful in both comprehension and production.
- Contextualism about meaning (paper §1, §3.1): word meaning is a property of the token in context, not a context-independent symbol shared by all tokens of a type. Operationalised by contextualised rather than type-level embeddings.