DLM ERM-invariance under positive rescaling #
@cite{baayen-2019} @cite{gahl-baayen-2024}
This file is a sibling to Training.lean: two theorems making explicit
that ERM solutions for the DLM weighted loss are invariant under
positive rescaling of the frequency vector. The substrate operations
FrequencyVector.totalMass and FrequencyVector.normalize live in
Training.lean next to FrequencyVector itself; this file consumes
them to prove the invariance theorems.
Why a separate file #
These two theorems form the bridge that lets cross-tradition
comparisons go through: when comparing DLM predictions to Bayesian
word-learning posteriors over the same event inventory, the
"frequency-vector-as-counts" view (DLM-paper-faithful) and the
"frequency-vector-as-empirical-distribution" view (Bayesian-tradition)
make identical predictions about which G : MeaningVec → FormVec is
ERM-optimal.
The DLM literature (paper appendix §A1.3 of @cite{gahl-baayen-2024})
uses Q as a diagonal matrix of raw counts, not a normalised
distribution. Paper §6.4 explicitly cautions against reifying any
single mathematical representation. The rescaling-invariance theorems
make the equivalence formal.
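Schematically, the core invariance lemma has roughly the following shape. This is a sketch only: the predicate name `ErmSolution` and the scalar action `c • q` on `FrequencyVector` are placeholders for whatever this file actually defines; only the name `ermSolution_iff_rescaled` is taken from the text below.

```lean
-- Sketch: `ErmSolution` and `c • q` are stand-ins for the file's
-- actual ERM-optimality predicate and rescaling operation.
theorem ermSolution_iff_rescaled (q : FrequencyVector m) {c : ℝ}
    (hc : 0 < c) (G : MeaningVec → FormVec) :
    ErmSolution (c • q) G ↔ ErmSolution q G := by
  sorry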
PMF view #
To cast a FrequencyVector m into a mathlib PMF (Fin m) (e.g., for
KL-divergence or total-variation comparisons against probabilistic
models), call PMF.ofRealWeightFn from Core.Probability.Constructions
directly:
```lean
PMF.ofRealWeightFn q hq i₀ hq_pos_i₀
```
where i₀ : Fin m is some index with hq_pos_i₀ : 0 < q i₀. Given a
positive-total hypothesis htot : 0 < q.totalMass together with
nonnegativity hq : ∀ i, 0 ≤ q i, extract such a witness via the
standard contrapositive:
```lean
obtain ⟨i₀, hq_pos_i₀⟩ : ∃ i, 0 < q i := by
  by_contra h_no
  push_neg at h_no
  -- every entry is ≤ 0 and ≥ 0, so the sum (= totalMass) vanishes
  have : ∑ i, q i = 0 :=
    Finset.sum_eq_zero fun i _ => le_antisymm (h_no i) (hq i)
  exact absurd this (ne_of_gt htot)
```
There is no project-side toEmpiricalPMF wrapper — the inline
PMF.ofRealWeightFn call is the canonical entry point. (An earlier
toEmpiricalPMF reinvented PMF.ofRealWeightFn; removed 0.231.X.)
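Putting the pieces together, a call site looks roughly like this. The hypothesis names are assumptions for illustration, and the witness extraction is the contrapositive step shown above:

```lean
-- Sketch, assuming `q` coerces to a weight function `Fin m → ℝ`.
example (q : FrequencyVector m) (hq : ∀ i, 0 ≤ q i)
    (htot : 0 < q.totalMass) : PMF (Fin m) := by
  obtain ⟨i₀, hq_pos_i₀⟩ : ∃ i, 0 < q i := by
    by_contra h_no
    push_neg at h_no
    exact absurd (Finset.sum_eq_zero fun i _ => le_antisymm (h_no i) (hq i))
      (ne_of_gt htot)
  exact PMF.ofRealWeightFn q hq i₀ hq_pos_i₀
```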
Normalisation preserves ERM solutions. A FrequencyVector and its empirical distribution agree on which production maps are ERM solutions. This is the formal statement that the cognitive content is in the relative weights (= the normalised distribution) and not in the absolute scale.
Direct application of ermSolution_iff_rescaled with c = 1/totalMass.
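Concretely, the statement has roughly this shape (a sketch: the theorem name `ermSolution_normalize_iff` and the predicate `ErmSolution` are hypothetical; `q.normalize` is the substrate operation from Training.lean):

```lean
-- Hypothetical name and predicate; only the proof strategy is from
-- the text: ermSolution_iff_rescaled with c = 1 / q.totalMass.
theorem ermSolution_normalize_iff (q : FrequencyVector m)
    (htot : 0 < q.totalMass) (G : MeaningVec → FormVec) :
    ErmSolution q.normalize G ↔ ErmSolution q G := by
  -- c = 1 / q.totalMass is positive by htot
  sorry
```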
Two FrequencyVectors with identical empirical distributions are ERM-equivalent. The cognitive content lives at the level of the normalised distribution; theories that assign different absolute scales but the same relative frequencies make the same predictions.
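A sketch of this second theorem, with the same caveats (theorem name and predicate are placeholders for the file's actual API):

```lean
-- Hypothetical name; the content is that equal empirical
-- distributions yield the same ERM solutions.
theorem ermSolution_iff_of_normalize_eq {q q' : FrequencyVector m}
    (htot : 0 < q.totalMass) (htot' : 0 < q'.totalMass)
    (hnorm : q.normalize = q'.normalize) (G : MeaningVec → FormVec) :
    ErmSolution q G ↔ ErmSolution q' G := by
  -- pass both sides through normalisation, then rewrite along hnorm
  sorry
```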