Zaslavsky, Kemp, Regier & Tishby (2018): efficient compression in color naming #
[ZKRT18] argue that color-naming systems
efficiently compress meanings into words by optimizing the Information
Bottleneck (IB) trade-off between lexicon complexity (the information rate
I(M;W)) and accuracy (I(W;U)). Cross-language variation is captured by a
single trade-off parameter β, and the Berlin & Kay evolutionary sequence
([BK69]: dark/light, then red, then green/yellow, then blue, …)
falls out as motion up the complexity axis — successive systems carve color
space more finely, paying complexity for accuracy.
This file formalizes what the per-language WALS color profiles (the eight
WALS-sourced ColorProfiles below, derived via ColorProfile.fromWALS) make
observable through the efficient-communication framework in
Pragmatics.Efficiency (CostPair, weightedCost = cost₂ + β·cost₁, efficiencyLossAt).
What is derived vs. cited #
- Derived (from the WALS-sourced color profiles): the Berlin-Kay complexity coordinate of each sampled language, i.e. the number of basic color categories (WALS Ch 133). The IB complexity I(M;W) of a deterministic K-word system is bounded by log K, so the category count is a monotone handle on the complexity axis (not I(M;W) itself).
- Bridge (via Pragmatics.Efficiency): the β-scalarized IB objective weightedCost is monotone in this complexity coordinate, so the Berlin-Kay category ordering is exactly an ordering on the IB complexity axis.
- Cited stimulus (from the paper, not formalized): real languages are near-optimal, i.e. they lie close to the IB curve with small efficiency loss ε_l, at a fitted β_l ≳ 1 (e.g. English β_l ≈ 1.085, Fig. 4). The formal anchor here is only the idealized optimum (zero loss at coincidence); the empirical near-optimality is the paper's measured result.
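The log K bound used in the first bullet follows from standard entropy inequalities. A short derivation, assuming W ranges over the K words of the system and the logarithm uses the same base as the information measures:

```latex
\begin{aligned}
I(M;W) &= H(W) - H(W \mid M) \\
       &\le H(W) \\
       &\le \log K .
\end{aligned}
```

For a deterministic system H(W | M) = 0, so I(M;W) = H(W), which reaches log K only when all K words are used equally often. The category count therefore orders systems by an upper bound on complexity, not by I(M;W) itself.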
Main results #
- bk_complexity_strictMono: the category-count handle is strictly monotone in the WALS Ch 133 ordering (the Berlin-Kay sequence).
- weightedCost_mono_in_complexity: more categories ⇒ a β-scalarized IB complexity cost at least as high, for every β ≥ 0. This is the structural bridge.
- sample_all_warm_and_cool_split: every sampled language distinguishes red from yellow and green from blue (all are high-complexity, late-sequence systems).
WALS Ch 132: Number of non-derived basic color categories #
Number of non-derived basic color categories (WALS Ch 132, [KM13c]). Ranges from 3 to 6 along the Berlin & Kay sequence; transitional half-values represent languages with one composite category undergoing splitting.
- three : NonDerivedColorCount
- threeHalf : NonDerivedColorCount
- four : NonDerivedColorCount
- fourHalf : NonDerivedColorCount
- five : NonDerivedColorCount
- fiveHalf : NonDerivedColorCount
- six : NonDerivedColorCount
Equations
- ZaslavskyKempRegierTishby2018.instDecidableEqNonDerivedColorCount x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
WALS Ch 133: Total number of basic color categories #
Total number of basic color categories including derived ones (WALS Ch 133, [KM13b]). Ranges from 3–4 (minimal systems) to the top bucket, WALS's "more than 10" — canonically 11 basic terms, the Berlin & Kay Stage-VII maximum (e.g., English, Russian).
- v3to4 : BasicColorCount
- v4to5 : BasicColorCount
- v6to6h : BasicColorCount
- v7to7h : BasicColorCount
- v8to8h : BasicColorCount
- v9to10 : BasicColorCount
- v11 : BasicColorCount
Equations
- ZaslavskyKempRegierTishby2018.instDecidableEqBasicColorCount x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
WALS Ch 134: Green and blue #
How a language treats the green-blue region of color space (WALS Ch 134, [KM13a]). The classic grue / green-blue composite distinction, with several other composite patterns (with black, with yellow).
- distinct : GreenBlueRelation
Separate terms for green and blue.
- merged : GreenBlueRelation
A single grue term covering both green and blue.
- blackGreenBlue : GreenBlueRelation
A single term covering black, green, and blue.
- blackBlueVsGreen : GreenBlueRelation
Black/blue merged, green separate.
- yellowGreenBlue : GreenBlueRelation
Yellow, green, blue all merged.
- yellowGreenVsBlue : GreenBlueRelation
Yellow/green merged, blue separate.
- noTerm : GreenBlueRelation
No green or blue term at all.
Equations
- ZaslavskyKempRegierTishby2018.instDecidableEqGreenBlueRelation x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
WALS Ch 135: Red and yellow #
How a language treats the red-yellow region of color space (WALS Ch 135, [KM13d]).
- distinct : RedYellowRelation
Separate terms for red and yellow.
- merged : RedYellowRelation
A single term covering both red and yellow.
- yellowGreenBlueVsRed : RedYellowRelation
Yellow/green/blue merged, vs red.
- yellowGreenVsRed : RedYellowRelation
Yellow/green merged, vs red.
- noTerm : RedYellowRelation
No red or yellow term at all.
Equations
- ZaslavskyKempRegierTishby2018.instDecidableEqRedYellowRelation x✝ y✝ = if h : x✝.ctorIdx = y✝.ctorIdx then isTrue ⋯ else isFalse ⋯
Per-language profile #
A language's color-naming profile across [DH13b] Chs 132–135. Coverage is sparse (~120 languages); fields are optional.
- language : String
- iso : String
- family : String
- nonDerived : Option NonDerivedColorCount
Ch 132: non-derived basic color categories.
- basic : Option BasicColorCount
Ch 133: total basic color categories.
- greenBlue : Option GreenBlueRelation
Ch 134: green-blue relation.
- redYellow : Option RedYellowRelation
Ch 135: red-yellow relation.
WALS converters #
Convert WALS 132A non-derived-color-count values into the substrate enum.
Equations
- ZaslavskyKempRegierTishby2018.fromWALS132A Data.WALS.F132A.NumberOfNonDerivedBasicColourCategories.v3 = ZaslavskyKempRegierTishby2018.NonDerivedColorCount.three
- ZaslavskyKempRegierTishby2018.fromWALS132A Data.WALS.F132A.NumberOfNonDerivedBasicColourCategories.v35 = ZaslavskyKempRegierTishby2018.NonDerivedColorCount.threeHalf
- ZaslavskyKempRegierTishby2018.fromWALS132A Data.WALS.F132A.NumberOfNonDerivedBasicColourCategories.v4 = ZaslavskyKempRegierTishby2018.NonDerivedColorCount.four
- ZaslavskyKempRegierTishby2018.fromWALS132A Data.WALS.F132A.NumberOfNonDerivedBasicColourCategories.v45 = ZaslavskyKempRegierTishby2018.NonDerivedColorCount.fourHalf
- ZaslavskyKempRegierTishby2018.fromWALS132A Data.WALS.F132A.NumberOfNonDerivedBasicColourCategories.v5 = ZaslavskyKempRegierTishby2018.NonDerivedColorCount.five
- ZaslavskyKempRegierTishby2018.fromWALS132A Data.WALS.F132A.NumberOfNonDerivedBasicColourCategories.v55 = ZaslavskyKempRegierTishby2018.NonDerivedColorCount.fiveHalf
- ZaslavskyKempRegierTishby2018.fromWALS132A Data.WALS.F132A.NumberOfNonDerivedBasicColourCategories.v6 = ZaslavskyKempRegierTishby2018.NonDerivedColorCount.six
Convert WALS 133A basic-color-count values into the substrate enum.
Equations
- ZaslavskyKempRegierTishby2018.fromWALS133A Data.WALS.F133A.NumberOfBasicColourCategories.v34 = ZaslavskyKempRegierTishby2018.BasicColorCount.v3to4
- ZaslavskyKempRegierTishby2018.fromWALS133A Data.WALS.F133A.NumberOfBasicColourCategories.v4555 = ZaslavskyKempRegierTishby2018.BasicColorCount.v4to5
- ZaslavskyKempRegierTishby2018.fromWALS133A Data.WALS.F133A.NumberOfBasicColourCategories.v665 = ZaslavskyKempRegierTishby2018.BasicColorCount.v6to6h
- ZaslavskyKempRegierTishby2018.fromWALS133A Data.WALS.F133A.NumberOfBasicColourCategories.v775 = ZaslavskyKempRegierTishby2018.BasicColorCount.v7to7h
- ZaslavskyKempRegierTishby2018.fromWALS133A Data.WALS.F133A.NumberOfBasicColourCategories.v885 = ZaslavskyKempRegierTishby2018.BasicColorCount.v8to8h
- ZaslavskyKempRegierTishby2018.fromWALS133A Data.WALS.F133A.NumberOfBasicColourCategories.v910 = ZaslavskyKempRegierTishby2018.BasicColorCount.v9to10
- ZaslavskyKempRegierTishby2018.fromWALS133A Data.WALS.F133A.NumberOfBasicColourCategories.v11 = ZaslavskyKempRegierTishby2018.BasicColorCount.v11
Convert WALS 134A green-blue values into the substrate enum.
Equations
- ZaslavskyKempRegierTishby2018.fromWALS134A Data.WALS.F134A.GreenAndBlue.greenVsBlue = ZaslavskyKempRegierTishby2018.GreenBlueRelation.distinct
- ZaslavskyKempRegierTishby2018.fromWALS134A Data.WALS.F134A.GreenAndBlue.greenBlue = ZaslavskyKempRegierTishby2018.GreenBlueRelation.merged
- ZaslavskyKempRegierTishby2018.fromWALS134A Data.WALS.F134A.GreenAndBlue.blackGreenBlue = ZaslavskyKempRegierTishby2018.GreenBlueRelation.blackGreenBlue
- ZaslavskyKempRegierTishby2018.fromWALS134A Data.WALS.F134A.GreenAndBlue.blackBlueVsGreen = ZaslavskyKempRegierTishby2018.GreenBlueRelation.blackBlueVsGreen
- ZaslavskyKempRegierTishby2018.fromWALS134A Data.WALS.F134A.GreenAndBlue.yellowGreenBlue = ZaslavskyKempRegierTishby2018.GreenBlueRelation.yellowGreenBlue
- ZaslavskyKempRegierTishby2018.fromWALS134A Data.WALS.F134A.GreenAndBlue.yellowGreenVsBlue = ZaslavskyKempRegierTishby2018.GreenBlueRelation.yellowGreenVsBlue
- ZaslavskyKempRegierTishby2018.fromWALS134A Data.WALS.F134A.GreenAndBlue.none = ZaslavskyKempRegierTishby2018.GreenBlueRelation.noTerm
Convert WALS 135A red-yellow values into the substrate enum.
Equations
- ZaslavskyKempRegierTishby2018.fromWALS135A Data.WALS.F135A.RedAndYellow.redVsYellow = ZaslavskyKempRegierTishby2018.RedYellowRelation.distinct
- ZaslavskyKempRegierTishby2018.fromWALS135A Data.WALS.F135A.RedAndYellow.redYellow = ZaslavskyKempRegierTishby2018.RedYellowRelation.merged
- ZaslavskyKempRegierTishby2018.fromWALS135A Data.WALS.F135A.RedAndYellow.yellowGreenBlueVsRed = ZaslavskyKempRegierTishby2018.RedYellowRelation.yellowGreenBlueVsRed
- ZaslavskyKempRegierTishby2018.fromWALS135A Data.WALS.F135A.RedAndYellow.yellowGreenVsRed = ZaslavskyKempRegierTishby2018.RedYellowRelation.yellowGreenVsRed
- ZaslavskyKempRegierTishby2018.fromWALS135A Data.WALS.F135A.RedAndYellow.none = ZaslavskyKempRegierTishby2018.RedYellowRelation.noTerm
Build a ColorProfile from the WALS Chs 132–135 rows for an ISO 639-3
code, mapping each chapter's datapoint through its converter; a field for
which WALS has no row is none. Makes the per-language Fragment profiles
true-by-construction from the auto-generated WALS tables rather than
hand-transcribed literals.
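The lookup-then-convert pattern described above can be sketched in a few lines of Lean. This is a minimal illustration, not the library's definition: the cut-down `Profile`, the raw enums `Raw134`/`Raw135`, the converters, and the association-list tables are all hypothetical stand-ins for the generated WALS data.

```lean
-- Cut-down stand-ins for two of the four chapter enums.
inductive GreenBlue where
  | distinct
  | merged
  deriving DecidableEq, Repr

inductive RedYellow where
  | distinct
  | merged
  deriving DecidableEq, Repr

structure Profile where
  language  : String
  iso       : String
  greenBlue : Option GreenBlue
  redYellow : Option RedYellow
  deriving Repr

-- Hypothetical raw WALS values, one enum per chapter.
inductive Raw134 where
  | greenVsBlue
  | greenBlue

inductive Raw135 where
  | redVsYellow
  | redYellow

def conv134 : Raw134 → GreenBlue
  | .greenVsBlue => .distinct
  | .greenBlue   => .merged

def conv135 : Raw135 → RedYellow
  | .redVsYellow => .distinct
  | .redYellow   => .merged

-- Hypothetical tables keyed by ISO 639-3 code; Ch 135 deliberately has no rows.
def table134 : List (String × Raw134) := [("eng", .greenVsBlue)]
def table135 : List (String × Raw135) := []

-- Each field is the converted datapoint, or `none` when its table has no row.
def Profile.fromTables (language iso : String) : Profile :=
  { language  := language,
    iso       := iso,
    greenBlue := (List.lookup iso table134).map conv134,
    redYellow := (List.lookup iso table135).map conv135 }

-- A row exists for "eng" in the Ch 134 table, so this field is populated.
#eval (Profile.fromTables "English" "eng").greenBlue
-- The Ch 135 table is empty, so this field is `none`.
#eval (Profile.fromTables "English" "eng").redYellow
```

Because every field is computed from the tables, a profile cannot disagree with the underlying data; a missing row surfaces as `none` rather than as a guessed value.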
The eight WALS-sourced sample profiles #
Equations
- ZaslavskyKempRegierTishby2018.english = ZaslavskyKempRegierTishby2018.ColorProfile.fromWALS "English" "eng" "Indo-European"
- ZaslavskyKempRegierTishby2018.french = ZaslavskyKempRegierTishby2018.ColorProfile.fromWALS "French" "fra" "Indo-European"
- ZaslavskyKempRegierTishby2018.german = ZaslavskyKempRegierTishby2018.ColorProfile.fromWALS "German" "deu" "Indo-European"
- ZaslavskyKempRegierTishby2018.japanese = ZaslavskyKempRegierTishby2018.ColorProfile.fromWALS "Japanese" "jpn" "Japonic"
- ZaslavskyKempRegierTishby2018.korean = ZaslavskyKempRegierTishby2018.ColorProfile.fromWALS "Korean" "kor" "Koreanic"
- ZaslavskyKempRegierTishby2018.mandarin = ZaslavskyKempRegierTishby2018.ColorProfile.fromWALS "Mandarin Chinese" "cmn" "Sino-Tibetan"
- ZaslavskyKempRegierTishby2018.russian = ZaslavskyKempRegierTishby2018.ColorProfile.fromWALS "Russian" "rus" "Indo-European"
- ZaslavskyKempRegierTishby2018.spanish = ZaslavskyKempRegierTishby2018.ColorProfile.fromWALS "Spanish" "spa" "Indo-European"
The Berlin-Kay complexity coordinate #
A representative basic-category count for each WALS Ch 133 bucket (its lower
bound, rounded down to 4 for the 4.5–5.5 bucket). Only the ordering matters:
this is a monotone handle on the IB complexity axis I(M;W) ≤ log K, where K is
the number of color words.
Equations
- ZaslavskyKempRegierTishby2018.basicCount ZaslavskyKempRegierTishby2018.BasicColorCount.v3to4 = 3
- ZaslavskyKempRegierTishby2018.basicCount ZaslavskyKempRegierTishby2018.BasicColorCount.v4to5 = 4
- ZaslavskyKempRegierTishby2018.basicCount ZaslavskyKempRegierTishby2018.BasicColorCount.v6to6h = 6
- ZaslavskyKempRegierTishby2018.basicCount ZaslavskyKempRegierTishby2018.BasicColorCount.v7to7h = 7
- ZaslavskyKempRegierTishby2018.basicCount ZaslavskyKempRegierTishby2018.BasicColorCount.v8to8h = 8
- ZaslavskyKempRegierTishby2018.basicCount ZaslavskyKempRegierTishby2018.BasicColorCount.v9to10 = 9
- ZaslavskyKempRegierTishby2018.basicCount ZaslavskyKempRegierTishby2018.BasicColorCount.v11 = 11
IB complexity handle for a color profile: its basic-category count
(0 when WALS records no Ch 133 datum).
Equations
- ZaslavskyKempRegierTishby2018.ibComplexity p = (Option.map ZaslavskyKempRegierTishby2018.basicCount p.basic).getD 0
The category-count handle is strictly monotone along the WALS Ch 133 ordering — i.e. along the Berlin & Kay evolutionary sequence ([BK69]).
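A self-contained sketch of how such a strict-monotonicity check can go, assuming the bucket ordering is encoded by a hypothetical `rank` function (the actual file may instead use an order instance on BasicColorCount). The theorem name below is illustrative and is not the library's bk_complexity_strictMono.

```lean
inductive BasicColorCount where
  | v3to4 | v4to5 | v6to6h | v7to7h | v8to8h | v9to10 | v11
  deriving DecidableEq, Repr

-- Representative count per bucket, copied from the equations above.
def basicCount : BasicColorCount → Nat
  | .v3to4  => 3
  | .v4to5  => 4
  | .v6to6h => 6
  | .v7to7h => 7
  | .v8to8h => 8
  | .v9to10 => 9
  | .v11    => 11

-- Hypothetical position of each bucket in the WALS Ch 133 ordering.
def rank : BasicColorCount → Nat
  | .v3to4  => 0
  | .v4to5  => 1
  | .v6to6h => 2
  | .v7to7h => 3
  | .v8to8h => 4
  | .v9to10 => 5
  | .v11    => 6

-- Strict monotonicity, checked by exhausting all 7 × 7 pairs of buckets.
theorem basicCount_strictMono_sketch (a b : BasicColorCount) :
    rank a < rank b → basicCount a < basicCount b := by
  cases a <;> cases b <;> decide

-- The handle on an optional Ch 133 datum defaults to 0 when WALS has no row.
def ibComplexity (basic : Option BasicColorCount) : Nat :=
  (basic.map basicCount).getD 0

example : ibComplexity (some .v11) = 11 := rfl
example : ibComplexity none = 0 := rfl
```

Case analysis is enough here because the type is a finite enumeration; no arithmetic reasoning beyond comparing numerals is involved.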
Bridge to the Information-Bottleneck objective #
A color system as a Pragmatics.Efficiency.CostPair: cost₁ is the IB
complexity handle (category count), cost₂ is the system's accuracy/
distortion component, left abstract (WALS does not record it per language).
Equations
- ZaslavskyKempRegierTishby2018.ibCost p acc = { cost₁ := ↑(ZaslavskyKempRegierTishby2018.ibComplexity p), cost₂ := acc }
Structural bridge. Under the β-scalarized IB objective weightedCost,
a system with more basic categories has at least as high a complexity cost,
for every β ≥ 0 and any fixed accuracy. The Berlin-Kay category ordering is
therefore an ordering on the IB complexity axis the paper plots.
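The monotonicity claim reduces to one line of arithmetic. Writing a for the fixed accuracy term (cost₂) and K ≤ K′ for two category counts (cost₁), and using weightedCost = cost₂ + β·cost₁ as stated at the top of this file:

```latex
\begin{aligned}
\mathrm{weightedCost}_\beta(K', a) - \mathrm{weightedCost}_\beta(K, a)
  &= (a + \beta K') - (a + \beta K) \\
  &= \beta\,(K' - K) \;\ge\; 0
  \qquad \text{for } \beta \ge 0,\ K \le K' .
\end{aligned}
```

The inequality is strict when β > 0 and K < K′; at β = 0 the two costs coincide, which is why the statement is "at least as high" rather than "higher".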
The idealized anchor of the paper's near-optimality finding: a color system
that coincides with the IB-optimal system at its fitted β has zero
efficiency loss. Real languages are near-optimal (small nonzero ε_l),
which is the paper's measured result rather than a theorem here.
The Fragment sample #
The eight WALS-sourced color profiles formalized as Fragments. All are industrialized-language systems near the top of the Berlin & Kay sequence.
Every sampled language draws both the warm (red/yellow) and cool
(green/blue) boundaries — all are high-complexity, late-sequence systems,
consistent with their high category counts and fitted β_l > 1.
Concrete complexity contrast: English (11 basic terms) sits higher on the IB complexity axis than Mandarin (8–8.5), as the Berlin-Kay sequence predicts.
The contrast lifts to the IB objective: for every β ≥ 0 (and any fixed accuracy), English's β-scalarized complexity cost is at least Mandarin's.
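As a concrete check of this contrast, the sketch below plugs in the representative counts 11 (English) and 8 (Mandarin). It assumes natural-number stand-ins for β and the accuracy term, which is a simplification of this sketch and not the type used by Pragmatics.Efficiency.

```lean
-- English's scalarized cost is at least Mandarin's for any β and accuracy term.
example (β acc : Nat) : acc + β * 8 ≤ acc + β * 11 := by
  omega

-- The gap between the two costs is exactly three units of β.
example (β acc : Nat) : (acc + β * 11) - (acc + β * 8) = β * 3 := by
  omega
```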