@cite{ackerman-malouf-2013}: The Low Conditional Entropy Conjecture
@cite{ackerman-malouf-2013} @cite{carstairs-mccarthy-2010}
E-complexity vs. I-complexity
Languages differ dramatically in their enumerative complexity (E-complexity): how many inflection classes, allomorphic variants, and paradigm cells they have. But this apparent complexity is misleading. The key question is integrative complexity (I-complexity): given that a speaker knows some forms of a lexeme, how hard is it to predict the rest?
The LCEC
The Low Conditional Entropy Conjecture states that the average conditional entropy of paradigm cells — how uncertain you are about one cell given another — is uniformly low across typologically diverse languages, regardless of E-complexity. Formally:
I-complexity(L) = (1 / n(n-1)) · Σᵢ≠ⱼ H(Cᵢ | Cⱼ)
is low for all natural languages L, where Cᵢ ranges over paradigm cells and H(Cᵢ | Cⱼ) is the conditional entropy of cell i given cell j.
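The averaging step in the formula can be made concrete with a small sketch. The helper `avgOffDiag` below is hypothetical (it is not part of the library) and uses `Float` and nested lists purely for illustration; it assumes the caller has already computed the conditional entropies and stored H(Cᵢ | Cⱼ) in row i, column j.

```lean
/-- Hypothetical helper (not from the source): mean of the off-diagonal
    entries of a square table given as a list of rows. The entry in row `i`,
    column `j` stands for H(Cᵢ | Cⱼ). Missing entries default to 0. -/
def avgOffDiag (table : List (List Float)) : Float :=
  let n := table.length
  let total :=
    (List.range n).foldl (fun acc i =>
      (List.range n).foldl (fun acc' j =>
        -- Skip the diagonal: H(Cᵢ | Cᵢ) is not part of the average.
        if i == j then acc' else acc' + (table.getD i []).getD j 0.0) acc) 0.0
  -- n(n-1) off-diagonal pairs; for n < 2 this divides by zero.
  total / (n * (n - 1)).toFloat

-- Two cells with made-up values H(C₁ | C₂) = 0.5 and H(C₂ | C₁) = 0.25.
#eval avgOffDiag [[0.0, 0.5], [0.25, 0.0]]
```

With two cells there are n(n-1) = 2 off-diagonal pairs, so by hand the average is (0.5 + 0.25) / 2 = 0.375.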
Structure
- §0: i-Complexity (paper-specific aggregation; substrate types InflectionClass / ParadigmSystem / cellEntropy / conditionalCellEntropy live in Core/Morphology/Paradigm.lean, hoisted there in 0.230.X for shared use with @cite{rathi-hahn-futrell-2026}'s informational fusion)
- §1: Per-language LCEC verification (all 10 languages)
- §2: E-complexity / I-complexity dissociation
- §3: Mazatec case study (observed vs. random baseline)
@cite{ackerman-malouf-2013}'s integrative complexity: average conditional cell entropy across all off-diagonal cell pairs.
iComplexity(L) = (1 / n(n-1)) · Σᵢ≠ⱼ H(Cᵢ | Cⱼ)
Instantiated at Form := String since A&M's paradigms are over
natural-language surface forms.
The Low Conditional Entropy Conjecture: i-complexity is bounded by a small threshold. The threshold is empirical (typically ≤ 1 nat).
Equations
- Morphology.WP.LCECHolds ps threshold = (Morphology.WP.iComplexity ps ≤ threshold)
Summary statistics for a language's morphological paradigm system, as reported in published studies.
Fields correspond to Tables 2–3 of @cite{ackerman-malouf-2013}.
- name : String
Language name
- family : String
Language family
- numClasses : ℕ
Number of inflection classes (E-complexity)
- numCells : ℕ
Number of paradigm cells
- avgCondEntropy : ℚ
Average conditional entropy H(Ci|Cj) in bits (I-complexity)
- maxCellEntropy : ℚ
Maximum cell entropy max H(Ci) in bits
Fur (Nilo-Saharan, Fur; Sudan). 4 classes, 2 cells.
Equations
- Phenomena.Morphology.AckermanMalouf2013.fur = { name := "Fur", family := "Nilo-Saharan", numClasses := 4, numCells := 2, avgCondEntropy := 489 / 1000, maxCellEntropy := 1334 / 1000 }
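The field list and the Fur record above can be mirrored in a self-contained sketch. The structure name `LanguageData` and the `Sketch` namespace are assumptions made for illustration, since the declaration header is not shown on this page; the sketch also assumes Mathlib is available for `ℕ` and `ℚ`.

```lean
import Mathlib

namespace Sketch

/-- Summary statistics for one language, mirroring the documented fields.
    The name `LanguageData` is an assumption. -/
structure LanguageData where
  name : String
  family : String
  numClasses : ℕ        -- E-complexity
  numCells : ℕ
  avgCondEntropy : ℚ    -- I-complexity, in bits
  maxCellEntropy : ℚ    -- in bits

/-- The Fur record, using the values shown in the equation above. -/
def fur : LanguageData :=
  { name := "Fur", family := "Nilo-Saharan", numClasses := 4, numCells := 2,
    avgCondEntropy := 489 / 1000, maxCellEntropy := 1334 / 1000 }

-- Field access reduces by definitional unfolding.
example : fur.numClasses = 4 := rfl

end Sketch
```

Storing the reported decimals as exact rationals (489 / 1000 rather than a floating-point 0.489) keeps later comparisons decidable by plain arithmetic.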
Ngiti (Nilo-Saharan, Central Sudanic; DRC). 8 classes, 2 cells.
Equations
- Phenomena.Morphology.AckermanMalouf2013.ngiti = { name := "Ngiti", family := "Nilo-Saharan", numClasses := 8, numCells := 2, avgCondEntropy := 380 / 1000, maxCellEntropy := 1741 / 1000 }
Nuer (Nilo-Saharan, Nilotic; Sudan/South Sudan). 31 classes, 4 cells.
Equations
- Phenomena.Morphology.AckermanMalouf2013.nuer = { name := "Nuer", family := "Nilo-Saharan", numClasses := 31, numCells := 4, avgCondEntropy := 513 / 1000, maxCellEntropy := 3224 / 1000 }
Kwerba (Trans-New Guinea; Papua, Indonesia). 2 classes, 2 cells.
Equations
- Phenomena.Morphology.AckermanMalouf2013.kwerba = { name := "Kwerba", family := "Trans-New Guinea", numClasses := 2, numCells := 2, avgCondEntropy := 469 / 1000, maxCellEntropy := 529 / 1000 }
Chinantec (Oto-Manguean; Oaxaca, Mexico). 62 classes, 4 cells. Comaltepec Chinantec tonal verb paradigms.
Chiquihuitlan Mazatec (Oto-Manguean; Oaxaca, Mexico). 109 classes, 4 cells. The paper's primary case study (section 4).
Finnish (Uralic, Finnic). 51 classes, 8 cells.
Equations
- Phenomena.Morphology.AckermanMalouf2013.finnish = { name := "Finnish", family := "Uralic", numClasses := 51, numCells := 8, avgCondEntropy := 209 / 1000, maxCellEntropy := 3803 / 1000 }
German (Indo-European, Germanic). 7 classes, 8 cells.
Equations
- Phenomena.Morphology.AckermanMalouf2013.german = { name := "German", family := "Indo-European", numClasses := 7, numCells := 8, avgCondEntropy := 45 / 1000, maxCellEntropy := 1906 / 1000 }
Russian (Indo-European, Slavic). 8 classes, 8 cells.
Equations
- Phenomena.Morphology.AckermanMalouf2013.russian = { name := "Russian", family := "Indo-European", numClasses := 8, numCells := 8, avgCondEntropy := 89 / 1000, maxCellEntropy := 2170 / 1000 }
Spanish (Indo-European, Romance). 3 classes, 57 cells.
Equations
- Phenomena.Morphology.AckermanMalouf2013.spanish = { name := "Spanish", family := "Indo-European", numClasses := 3, numCells := 57, avgCondEntropy := 3 / 1000, maxCellEntropy := 1522 / 1000 }
All 10 languages in the @cite{ackerman-malouf-2013} sample (Table 3).
The LCEC threshold: all 10 languages fall below 1 bit of average conditional entropy. Even the most complex system (Mazatec, 109 classes) has I-complexity < 1 bit.
Expected I-complexity under random class assignment for Mazatec (Monte Carlo baseline). The paper reports the mean of 1000 random permutations as ~5.25 bits, far above the observed 0.709 bits.
Each language's reported I-complexity is below the 1-bit threshold. These are "per-datum verification theorems" in linglib's sense: changing a language's avgCondEntropy breaks exactly the corresponding theorem.
All 10 languages satisfy the LCEC.
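A per-datum verification theorem might look like the following sketch. The definition and theorem names are hypothetical, the values are the reported averages for Fur and Spanish, and the choice of `norm_num` as the closing tactic is an assumption rather than what the library necessarily uses.

```lean
import Mathlib

namespace Sketch

-- Reported average conditional entropies in bits (Fur and Spanish).
def furAvg : ℚ := 489 / 1000
def spanishAvg : ℚ := 3 / 1000

-- One theorem per datum: editing `furAvg` can only break `fur_below_one_bit`.
theorem fur_below_one_bit : furAvg < 1 := by
  norm_num [furAvg]

theorem spanish_below_one_bit : spanishAvg < 1 := by
  norm_num [spanishAvg]

end Sketch
```

Keeping one small theorem per language, rather than a single statement over the whole sample, localizes any failure to the datum that changed.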
The LCEC's key prediction: E-complexity and I-complexity are dissociated. A language can have enormous E-complexity but low I-complexity.
Mazatec has maximal E-complexity in the sample (109 classes).
Mazatec's I-complexity is still below 1 bit despite 109 classes.
Kwerba has minimal E-complexity (2 classes) but its I-complexity is not the lowest — German (7 classes) has lower I-complexity. This shows E-complexity doesn't predict I-complexity in either direction.
Spanish has only 3 classes but 57 cells, yet its I-complexity is the lowest in the sample (0.003 bits). With so few classes spread over so many cells, knowing one cell leaves almost no uncertainty about another.
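The dissociation claims reduce to a handful of numeric comparisons, which can be checked directly. This is a minimal sketch over the reported figures (class counts and average conditional entropies in bits), assuming Mathlib for `ℚ` and `norm_num`; it does not reproduce the library's own theorem statements.

```lean
import Mathlib

-- Kwerba has fewer classes than German (2 < 7), yet German has the
-- lower reported I-complexity (0.045 < 0.469 bits).
example : (2 : ℕ) < 7 ∧ (45 : ℚ) / 1000 < 469 / 1000 :=
  ⟨by norm_num, by norm_num⟩

-- Spanish (3 classes) has a lower reported I-complexity than German
-- (7 classes): 0.003 < 0.045 bits.
example : (3 : ℚ) / 1000 < 45 / 1000 := by norm_num
```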
The Mazatec case study (§4 of the paper) demonstrates that the observed I-complexity is far below what random assignment of inflection-class patterns would produce.
Mazatec's observed I-complexity is far below the random baseline. Observed: 0.709 bits. Random permutation baseline: ~5.25 bits. The observed value is less than 14% of the random baseline.
The ratio of observed to random I-complexity is less than 1/7. (0.709 / 5.25 ≈ 0.135, i.e., ~13.5% of random)
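The ratio bounds can be checked with exact rational arithmetic. In this sketch the names `observed` and `randomBaseline` are hypothetical, and the baseline uses the approximate figure 5.25 quoted above, so the result is only as precise as that rounding. Mathlib is assumed.

```lean
import Mathlib

namespace Sketch

-- Values in bits as quoted in the text; the baseline is approximate (~5.25).
def observed : ℚ := 709 / 1000
def randomBaseline : ℚ := 525 / 100

-- 709 / 5250 < 1 / 7, since 709 * 7 = 4963 < 5250.
example : observed / randomBaseline < 1 / 7 := by
  norm_num [observed, randomBaseline]

-- 709 / 5250 < 14 / 100, since 709 * 100 = 70900 < 14 * 5250 = 73500.
example : observed / randomBaseline < 14 / 100 := by
  norm_num [observed, randomBaseline]

end Sketch
```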
Mazatec has nonzero I-complexity: it violates @cite{carstairs-mccarthy-2010}'s synonymy avoidance but satisfies the LCEC. This witnesses that the LCEC is strictly weaker than synonymy avoidance.