DLM training: Endstate Learning vs Frequency-Informed Learning #
@cite{baayen-2019} @cite{gahl-baayen-2024} @cite{heitmeier-chuang-baayen-2026}
Sibling to Defs.lean, Normed.lean, Measures.lean. Hosts the
substrate for what counts as a learned production map — the
optimization characterization of Endstate Learning (EL) and
Frequency-Informed Learning (FIL).
Paper-faithful representation: FrequencyVector (= paper's Q) #
FrequencyVector m is the type-faithful representation of paper
@cite{gahl-baayen-2024}'s diagonal frequency matrix Q (appendix §A1.3):
one nonneg weight per usage event. We do not normalise to a
probability distribution; the paper works with raw counts. The PMF
view is a derived bridge — call PMF.ofRealWeightFn from
Core.Probability.Constructions directly when cross-tradition
theorems require it. The ERM-equivalence between raw q and
normalised q.normalize is proved in the sibling file
RescalingInvariance.lean.
The cognitive interpretation: q i is the number of times the
learner has experienced event i. EL corresponds to type-uniform
weights (q ≡ 1); FIL corresponds to token-frequency weights
(q i = #occurrences).
What this file establishes #
The substantive cross-framework structure. Different cognitive
theories of learning correspond to different choices of q; the
substrate captures the architecture in which those theories diverge:
- weightedLoss data q G — the per-event squared-error loss weighted by q. Paper appendix §A1.3 of @cite{gahl-baayen-2024}.
- IsERMSolution data q G — G minimises the weighted loss. The cognitive theory choice IS the choice of q; the optimization procedure is fixed.
- IsELSolution data G and IsFILSolution data q G — abbrevs capturing the type-uniform and frequency-weighted cases.
- weightedLoss_smul_frequency — loss is linear in q.
- ermSolution_iff_rescaled — T-Rescaling: the ERM solution is invariant under positive rescaling of q. Relative frequencies matter; absolute scale doesn't. (Paper §3.1 appendix discussion of the equivalent FIL forms.)
- weightedLoss_zero_event_drops — T-Support (weak form): events with q i = 0 contribute nothing to the loss. Novel unattested events don't update the lexicon.
- isELSolution_eq_isERM_uniform — T-Uniform-EL-equivalence: EL is ERM under the constant-1 frequency vector. Definitional.
What this file does NOT do #
This is not generic regression formalization. The substrate captures:
- The loss function weightedLoss as the cognitive commitment (paper §3 of @cite{gahl-baayen-2024}: minimising squared error per usage event is what the learner does).
- The frequency-weight parameterisation as the cross-theory axis (paper §3.1 distinguishes EL from FIL only via q).
We do not formalise:
- The closed-form (SᵀQS)⁻¹SᵀQC — that's matrix algebra, not theory-specific. A future Training/ClosedForm.lean could derive it from the optimization characterization here as a theorem (= "the closed form is the unique minimum when SᵀQS is invertible") — but that's regression formalization in the service of showing equivalence to the optimization picture.
- The iterative Widrow-Hoff convergence to IsFILSolution (paper appendix §A5.1; @cite{heitmeier-chuang-baayen-2026} Heitmeier 2024 argument). Defer until a second consumer needs it.
- The PMF / ERM-theoretic reformulation. Mathematically equivalent in the finite case, but interpretively additive — the paper authors avoid framing this as ERM under empirical distributions (see §6.4: "we would caution against reifying any particular variable on the basis of its predictiveness"). A derived PMF view is straightforward via normalization.
A training experience is a finite indexed collection of
(meaning, form) observation pairs. The paper's S matrix has
rows data.meanings i and its C matrix has rows data.forms i,
indexed by event i : Fin m; a sketch of the structure follows below.
The cognitive interpretation: each i : Fin m is a "usage event"
— a single attestation of a (meaning, form) pair. Type-based
learning treats each unique pair as one event; frequency-informed
learning may have multiple events per pair (replication = weight,
deferred).
- meanings : Fin numEvents → MeaningVec meaningDim
- forms : Fin numEvents → FormVec formDim
Instances For
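For orientation, a hedged Lean sketch of the shape of this structure. The structure name TrainingExperience and the definitions of MeaningVec / FormVec below are assumptions for illustration; only the two fields are taken from the declaration above.

```lean
import Mathlib

-- Hypothetical sketch: `TrainingExperience`, `MeaningVec` and `FormVec` are
-- assumed names; the vector types are sketched as plain Pi types.
abbrev MeaningVec (dim : ℕ) : Type := Fin dim → ℝ  -- a row of the paper's S matrix
abbrev FormVec (dim : ℕ) : Type := Fin dim → ℝ     -- a row of the paper's C matrix

/-- A finite indexed collection of (meaning, form) usage events. -/
structure TrainingExperience (numEvents meaningDim formDim : ℕ) where
  meanings : Fin numEvents → MeaningVec meaningDim
  forms : Fin numEvents → FormVec formDim
```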
A frequency vector — paper notation Q (the diagonal entries
of the weight matrix in @cite{gahl-baayen-2024} appendix §A1.3).
One nonneg weight per usage event.
We use Fin m → ℝ (rather than Fin m → NNReal) for proof
convenience; nonnegativity is documented and asserted as a
hypothesis in theorems that need it. The cognitive theory choice
IS the choice of q:
- q ≡ 1 (uniform): type-based learning → EL
- q i = #occurrences i: frequency-informed → FIL
- q i = log(#occurrences i): log-frequency learning
- q i = exp(-decay · timeAgo i): recency-weighted

All instantiations of FrequencyVector correspond to different
cognitive commitments about which usage events count for learning;
a sketch of two such instantiations follows the declaration below.
Equations
- Theories.Processing.Lexical.Discriminative.FrequencyVector numEvents = (Fin numEvents → ℝ)
Instances For
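A hedged Lean sketch of two of the instantiations listed above. The names tokenCount, decay and timeAgo are hypothetical placeholders introduced here for illustration, not declarations of this file, and FrequencyVector is re-abbreviated locally so the sketch is self-contained.

```lean
import Mathlib

-- Hypothetical sketch; `tokenCount`, `decay` and `timeAgo` are illustrative
-- assumptions, and `FrequencyVector` is abbreviated locally for self-containment.
abbrev FrequencyVector (m : ℕ) : Type := Fin m → ℝ

variable {m : ℕ} (tokenCount : Fin m → ℕ) (decay : ℝ) (timeAgo : Fin m → ℝ)

-- Frequency-informed weighting (FIL): each event counts as often as it was attested.
example : FrequencyVector m := fun i => (tokenCount i : ℝ)

-- Recency weighting: exponentially discount older events.
example : FrequencyVector m := fun i => Real.exp (-(decay * timeAgo i))
```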
The constant-1 frequency vector — type-uniform weighting; every
event counts once regardless of frequency. Endstate learning
operates with this q.
Equations
Instances For
Total mass of a frequency vector — the sum of all event weights. Cognitive interpretation: the total accumulated experience the learner has been exposed to.
Equations
- q.totalMass = ∑ i : Fin m, q i
Instances For
Normalise a frequency vector so its weights sum to 1. The result
represents the empirical distribution over events. Note this
is a FrequencyVector (still typed as Fin m → ℝ); for the
actual PMF cast use PMF.ofRealWeightFn from
Core.Probability.Constructions.
Instances For
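The defining equation is not shown above; presumably it divides each weight by the total mass, along the lines of q.normalize i = q i / q.totalMass, so that the normalised weights sum to 1 whenever q.totalMass ≠ 0. This is a hedged reading of the docstring, not the rendered source.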
Squared coordinate-distance between two form vectors:
Σⱼ (a j - b j)². Direct formula, no normed-space machinery
required. Distinct from the bare-Pi sup-norm in Normed.lean;
here we want the L² (Frobenius) inner product structure on
Fin n → ℝ that the paper's quadratic loss uses.
Equations
- Theories.Processing.Lexical.Discriminative.squaredDist a b = ∑ j : Fin n, (a j - b j) ^ 2
Instances For
The frequency-weighted training loss for a candidate
production map G:
weightedLoss data q G = Σᵢ qᵢ · ‖G(meaningsᵢ) − formsᵢ‖²
where ‖·‖² is squared coordinate-distance (= squared Frobenius
norm on the form-vector slot).
This is the paper's loss in its appendix §A1.3 form, before
being recast as the matrix expression ‖√Q (SG − C)‖²_F. The
cognitive commitment is at the level of this loss function:
the learner minimises the per-event squared mismatch between
the produced and observed form vectors, weighted by frequency
of occurrence.
Equations
- Theories.Processing.Lexical.Discriminative.weightedLoss data q G = ∑ i : Fin m, q i * Theories.Processing.Lexical.Discriminative.squaredDist (G (data.meanings i)) (data.forms i)
Instances For
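To connect the per-event sum with the matrix form mentioned above (reading G(meaningsᵢ) as row i of SG under the paper's row-vector convention): with S the meaning matrix, C the form matrix and Q = diag(q), row i of SG − C is G(meaningsᵢ) − formsᵢ, and premultiplying by √Q rescales that row by √qᵢ, so

‖√Q (SG − C)‖²_F = Σᵢ qᵢ · ‖G(meaningsᵢ) − formsᵢ‖² = weightedLoss data q G.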
A linear map G is an empirical risk minimiser (ERM) for
the experience data under frequency vector q if no other
linear map achieves a smaller weighted loss.
The cognitive theory choice IS the choice of q. Different
theories of learning specify different q's; the optimisation
procedure (minimise the resulting weighted L² loss) is fixed
across them.
Equations
- One or more equations did not get rendered due to their size.
Instances For
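Since the equation above did not render, the intended reading (modulo how "linear map" is represented in the source, which is not visible here) is:

IsERMSolution data q G ↔ ∀ linear G', weightedLoss data q G ≤ weightedLoss data q G'

i.e. G achieves a weighted loss no larger than that of any competing linear map on the same experience.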
An endstate-learning solution for the experience: an ERM solution under type-uniform weights. Paper appendix §A1.1 of @cite{gahl-baayen-2024}.
Equations
Instances For
A frequency-informed learning solution — abbreviation for
ERM under arbitrary q. The cognitive interpretation that q
is "token frequencies of usage events" lives in the choice of
q the consumer passes; the substrate is agnostic about which
q is empirically correct.
Paper appendix §A1.3 of @cite{gahl-baayen-2024} introduces FIL
with q = corpus token frequencies; future cognitive theories
may motivate other q choices.
Equations
Instances For
T-Loss-linearity: the weighted loss is linear in the
frequency vector. Direct algebraic fact; basis for T-Rescaling.
T-Loss-add-distrib: the weighted loss decomposes additively over the frequency vector. Used downstream to formalise mixture decompositions of cognitive theories.
T-Rescaling: ERM solutions are invariant under positive rescaling of the frequency vector. Relative frequencies matter; absolute scale doesn't.
Cognitive content: a learner who has experienced word A 100
times and word B 50 times has the same trained lexicon as one
who experienced A 200 times and B 100 times. The total amount of
accumulated experience (= overall scale of q) doesn't affect the
trained end-state — only the relative ratios.
Proved via weightedLoss_smul_frequency: scaling q by c > 0
scales the loss by c; argmins are invariant under positive
scalar multiples.
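In symbols, for c > 0:

weightedLoss data (c • q) G = Σᵢ (c · qᵢ) · ‖G(meaningsᵢ) − formsᵢ‖² = c · weightedLoss data q G

so weightedLoss data (c • q) G ≤ weightedLoss data (c • q) G' holds exactly when weightedLoss data q G ≤ weightedLoss data q G', and the two ERM problems have the same solutions.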
T-Support (weak form): if q i = 0, event i contributes
nothing to the weighted loss.
Cognitive content: events that the learner has not experienced
(q i = 0) cannot update the trained lexicon. Novel words and
pseudowords don't retroactively modify the production map; they
can only be processed by the existing D.production (which is
polymorphic over the entire meaning space, hence applies to any
meaning vector — but the trained map's coefficients reflect
only the support of q).
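Equivalently, the sum defining the loss may be restricted to the support of q:

weightedLoss data q G = Σ_{i | q i ≠ 0} qᵢ · ‖G(meaningsᵢ) − formsᵢ‖²

since every term with q i = 0 vanishes.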
Definitional equivalence: an IsELSolution is exactly an ERM
solution under uniform weights. Paper-canonical: paper §3.1 of
@cite{gahl-baayen-2024} introduces EL as the special case of FIL
with Q = I. Here uniformFrequency = fun _ => 1 is the discrete
analogue.
A LinearDiscriminativeLexicon's production map is the
trained production map for given training data and
frequency weights iff the production map is an ERM solution.
The substrate is agnostic about which q is empirically
correct; this predicate just records the relationship between
a DLM's production map and a particular cognitive theory's
training data.
Equations
- D.IsTrainedOn data q = Theories.Processing.Lexical.Discriminative.IsERMSolution data q D.production
Instances For
A DLM is EL-trained for given data iff its production map is the type-uniform ERM solution.
Equations
- D.IsELTrainedOn data = D.IsTrainedOn data (Theories.Processing.Lexical.Discriminative.uniformFrequency m)
Instances For
A DLM is FIL-trained with a given frequency vector iff its production map is the corresponding ERM solution.
Equations
- D.IsFILTrainedOn data q = D.IsTrainedOn data q