
Linglib.Studies.ZaslavskyKempRegierTishby2018

Zaslavsky, Kemp, Regier & Tishby (2018): efficient compression in color naming

[ZKRT18] [BK69] [DH13b]

[ZKRT18] argue that color-naming systems efficiently compress meanings into words by optimizing the Information Bottleneck (IB) trade-off between lexicon complexity (the information rate I(M;W)) and accuracy (I(W;U)). Cross-language variation is captured by a single trade-off parameter β, and the Berlin & Kay evolutionary sequence ([BK69]: dark/light, then red, then green/yellow, then blue, …) falls out as motion up the complexity axis — successive systems carve color space more finely, paying complexity for accuracy.

This file formalizes per-language WALS color profiles (the eight WALS-sourced ColorProfiles below, derived via ColorProfile.fromWALS) and interprets them through the efficient-communication framework in Pragmatics.Efficiency (CostPair, weightedCost = cost₂ + β·cost₁, efficiencyLossAt).
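
To make the scalarized objective quoted above concrete, here is a minimal Lean sketch. The names CostPairSketch and weightedCostSketch, the choice of ℚ as the cost type, and the field layout are assumptions made for illustration; the actual definitions in Pragmatics.Efficiency may differ.

```lean
import Mathlib

/-- Hypothetical stand-in for `Pragmatics.Efficiency.CostPair`. -/
structure CostPairSketch where
  cost₁ : ℚ  -- complexity-side cost
  cost₂ : ℚ  -- accuracy/distortion-side cost

/-- β-scalarized objective, following the quoted formula `cost₂ + β·cost₁`. -/
def weightedCostSketch (β : ℚ) (p : CostPairSketch) : ℚ :=
  p.cost₂ + β * p.cost₁

-- Toy numbers: β = 2, cost₁ = 3, cost₂ = 1 gives 1 + 2 * 3 = 7.
example : weightedCostSketch 2 { cost₁ := 3, cost₂ := 1 } = 7 := by
  norm_num [weightedCostSketch]
```

Under this reading, β sets how heavily complexity is penalized relative to the accuracy term.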

What is derived vs. cited

Main results

WALS Ch 132: Number of non-derived basic color categories

Number of non-derived basic color categories (WALS Ch 132, [KM13c]). Ranges from 3 to 6 along the Berlin & Kay sequence; transitional half-values represent languages with one composite category undergoing splitting.

WALS Ch 133: Total number of basic color categories

Total number of basic color categories including derived ones (WALS Ch 133, [KM13b]). Ranges from 3–4 (minimal systems) to the top bucket, WALS's "more than 10", which is canonically 11 basic terms, the Berlin & Kay Stage-VII maximum (e.g., English, Russian).

WALS Ch 134: Green and blue

How a language treats the green-blue region of color space (WALS Ch 134, [KM13a]). Covers the classic distinction between a single grue (green-blue composite) term and separate green and blue terms, plus several other composite patterns (with black, with yellow).

WALS Ch 135: Red and yellow

How a language treats the red-yellow region of color space (WALS Ch 135, [KM13d]).

Per-language profile

A language's color-naming profile across [DH13b] Chs 132–135. Coverage is sparse (~120 languages); fields are optional.

WALS converters

Build a ColorProfile from the WALS Chs 132–135 rows for an ISO 639-3 code, mapping each chapter's datapoint through its converter; a field for which WALS has no row is none. This makes the per-language Fragment profiles true by construction from the auto-generated WALS tables rather than hand-transcribed literals.
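
The lookup-then-convert pattern described above can be sketched as follows. Everything here is hypothetical: lookupVia, ProfileSketch, the two-field layout, and the row format (ISO code paired with a numeric value) are illustrative stand-ins, not the real ColorProfile.fromWALS or the real WALS tables.

```lean
/-- Hypothetical per-chapter lookup: find the row for `iso`, if any, and pass
its raw value through `convert`. A missing row, or a value the converter
rejects, yields `none`. -/
def lookupVia {α : Type} (rows : List (String × Nat)) (iso : String)
    (convert : Nat → Option α) : Option α :=
  (List.lookup iso rows).bind convert

/-- Hypothetical two-field profile assembled from two chapter tables. -/
structure ProfileSketch where
  nonDerived : Option Nat
  total : Option Nat
  deriving Repr

/-- Build the profile for one language; each field is filled independently. -/
def ProfileSketch.fromRows (ch132 ch133 : List (String × Nat)) (iso : String) :
    ProfileSketch :=
  { nonDerived := lookupVia ch132 iso some,
    total := lookupVia ch133 iso some }

-- Toy data, not real WALS values: "aaa" has a row only in the first table,
-- so its `total` field comes out as `none`.
#eval ProfileSketch.fromRows [("aaa", 6)] [("bbb", 11)] "aaa"
```

Because every field is computed from the tables, a change in the generated data propagates to the profile without any hand-edited literal to keep in sync.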

The eight WALS-sourced sample profiles

The Berlin-Kay complexity coordinate

IB complexity handle for a color profile: its basic-category count (0 when WALS records no Ch 133 datum).
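
A minimal sketch of the defaulting behavior, assuming the Ch 133 datum has already been reduced to an optional natural-number count. The name complexitySketch and the Option Nat input are assumptions; the real handle operates on a ColorProfile.

```lean
/-- Hypothetical complexity handle: the recorded count, or 0 when absent. -/
def complexitySketch (count : Option Nat) : Nat :=
  count.getD 0

example : complexitySketch (some 11) = 11 := rfl
example : complexitySketch none = 0 := rfl
```

Defaulting to 0 keeps the handle total, at the price that a missing datum is indistinguishable from the lowest possible complexity.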

Bridge to the Information-Bottleneck objective

A color system as a Pragmatics.Efficiency.CostPair: cost₁ is the IB complexity handle (category count), cost₂ is the system's accuracy/distortion component, left abstract (WALS does not record it per language).

Structural bridge. Under the β-scalarized IB objective weightedCost, a system with more basic categories has at least as high a complexity cost, for every β ≥ 0 and any fixed accuracy. The Berlin-Kay category ordering is therefore an ordering on the IB complexity axis the paper plots.
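
The monotonicity argument is short enough to sketch in full. The definition scalarized and the theorem name are hypothetical, and costs are taken in ℚ for convenience; the statement in this file is phrased over CostPair and weightedCost and may be proved differently.

```lean
import Mathlib

/-- Hypothetical scalarized cost: `accuracy + β * complexity`. -/
def scalarized (β accuracy complexity : ℚ) : ℚ :=
  accuracy + β * complexity

/-- With accuracy fixed and `β ≥ 0`, more complexity never lowers the cost. -/
theorem scalarized_mono {β acc c₁ c₂ : ℚ} (hβ : 0 ≤ β) (h : c₁ ≤ c₂) :
    scalarized β acc c₁ ≤ scalarized β acc c₂ := by
  unfold scalarized
  -- Multiplying `c₁ ≤ c₂` by a nonnegative β preserves the inequality.
  have hmul : β * c₁ ≤ β * c₂ := mul_le_mul_of_nonneg_left h hβ
  linarith
```

The hypothesis β ≥ 0 is essential: a negative β would reverse the inequality.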

The idealized anchor of the paper's near-optimality finding: a color system that coincides with the IB-optimal system at its fitted β has zero efficiency loss. Real languages are near-optimal (small nonzero ε_l), which is the paper's measured result rather than a theorem here.
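
A sketch of why the idealized case is immediate, assuming efficiency loss is the gap between a system's scalarized cost and the optimum's, with any normalization omitted. The name lossSketch and its signature are hypothetical and need not match efficiencyLossAt.

```lean
import Mathlib

/-- Hypothetical efficiency loss: a system's cost minus the optimal system's cost. -/
def lossSketch {σ : Type} (cost : σ → ℚ) (opt s : σ) : ℚ :=
  cost s - cost opt

/-- A system that coincides with the optimum has zero loss. -/
theorem lossSketch_self {σ : Type} (cost : σ → ℚ) (opt : σ) :
    lossSketch cost opt opt = 0 := by
  simp [lossSketch]
```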

The Fragment sample

The eight WALS-sourced color profiles formalized as Fragments. All are industrialized-language systems near the top of the Berlin & Kay sequence.

Every sampled language draws both the warm (red/yellow) and cool (green/blue) boundaries: all are high-complexity, late-sequence systems, consistent with their high category counts and fitted β_l > 1.

Concrete complexity contrast: English (11 basic terms) sits higher on the IB complexity axis than Mandarin (8–8.5), as the Berlin-Kay sequence predicts.

The contrast lifts to the IB objective: for every β ≥ 0 (and any fixed accuracy), English's β-scalarized complexity cost is at least Mandarin's.
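
As a numeric illustration of the lifted contrast, the following sketch takes 11 for English and 17/2 for Mandarin. Using the upper end of the 8–8.5 bucket for Mandarin is an assumption made here; how this file maps WALS buckets to numbers is not shown on this page.

```lean
import Mathlib

-- For any fixed accuracy `acc` and any β ≥ 0, the English-side cost is at
-- least the Mandarin-side cost, since the gap is (11 - 17/2) * β ≥ 0.
example (β acc : ℚ) (hβ : 0 ≤ β) : acc + β * (17 / 2) ≤ acc + β * 11 := by
  linarith
```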