Decoders — From Scored Candidates to Probability Distributions #
A Decoder is the choice component of a constraint-based grammar.
Given a finite set of candidates, each carrying a score, a decoder
returns a probability distribution over those candidates:
scored candidates --[Decoder]--> P : Cand → ℝ
Different constraint frameworks correspond to different decoders. This
table shows how the major OT/HG-family frameworks all instantiate the
same Decoder interface, varying only the score type and the choice rule:
| Framework | Score type | Decoder |
|---|---|---|
| OT | LexProfile Nat n | argmin (uniform over winners) |
| HG | ℝ (harmony) | argmax (uniform over winners) |
| MaxEnt | ℝ (harmony) | softmax (Gumbel-noise argmax) |
| Normal MaxEnt | ℝ (harmony) | probit (Gaussian-noise argmax) |
| NHG | ℝ (harmony) | weight-perturbed argmax |
| Stochastic OT | LexProfile Nat n | ranking-perturbed argmin |
The unifying principle is McFadden's Random Utility Model
(@cite{mcfadden-1974}): every stochastic decoder is "argmax after a
noise kernel is applied to the scores". The noise distribution determines
the choice rule (Gumbel ⇒ softmax, Gaussian ⇒ probit, point-mass ⇒
deterministic argmax). Noise-kernel-based decoders live in a sibling
file; here we just expose the Decoder interface and the three
foundational instances.
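The noise-kernel view can be checked numerically. The following is an illustrative Python sketch, independent of the Lean formalization (all names here are hypothetical): perturbing each score with i.i.d. Gumbel noise and taking the argmax reproduces, in the long run, exactly the softmax probabilities.

```python
import math
import random

def softmax(scores, alpha=1.0):
    """Closed-form MaxEnt probabilities at inverse temperature alpha."""
    m = max(alpha * s for s in scores)            # stabilize the exponentials
    exps = [math.exp(alpha * s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def noisy_argmax_freqs(scores, noise, trials=20000, seed=0):
    """Monte-Carlo win frequencies under 'argmax after noise':
    perturb every score with an i.i.d. draw, then take the argmax."""
    rng = random.Random(seed)
    counts = [0] * len(scores)
    for _ in range(trials):
        perturbed = [s + noise(rng) for s in scores]
        counts[perturbed.index(max(perturbed))] += 1
    return [c / trials for c in counts]

scores = [1.0, 0.0, -0.5]
gumbel = lambda rng: -math.log(-math.log(rng.random()))  # standard Gumbel draw
freqs = noisy_argmax_freqs(scores, gumbel)               # empirical choice rule
exact = softmax(scores)                                  # closed-form softmax
```

Swapping the Gumbel draw for `rng.gauss(0, 1)` gives the probit (Normal MaxEnt) choice rule instead; a point-mass draw (`lambda rng: 0.0`) recovers deterministic argmax.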
Zero-temperature bridge #
softmaxDecoder α interpolates between MaxEnt-style soft optimization
and OT-style hard optimization: as α → ∞, the softmax distribution
concentrates on the unique maximizer (when one exists), recovering
argmaxDecoder. This is the formal statement of the OT-as-MaxEnt-limit
correspondence and is captured by softmax_argmax_limit in
Core.Agent.RationalAction.
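The limit is easy to see numerically. A minimal sketch (plain Python, not the Lean development; the helper name is hypothetical): with a unique maximizer, the softmax mass on it increases toward 1 as α grows.

```python
import math

def softmax(scores, alpha):
    # Subtract the max before exponentiating so large alpha stays finite.
    m = max(alpha * s for s in scores)
    exps = [math.exp(alpha * s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

scores = [2.0, 1.5, -1.0]  # unique maximizer at index 0
p1, p10, p100 = (softmax(scores, a)[0] for a in (1.0, 10.0, 100.0))
```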
Semiring connection (deferred) #
For the foundational decoders defined in this file, the induced algebraic structure on scores is a semiring:
- argmaxDecoder over ℝ ↔ max-plus semiring
- argminDecoder over Tropical R ↔ min-plus (tropical) semiring
- softmaxDecoder over ℝ ↔ log-sum-exp ("warped") semiring
The zero-temperature limit softmax → argmax is precisely the semiring
homomorphism log-sum-exp → max in the limit α → ∞. A separate
Semiring.lean file will make these bridges precise.
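As a numeric illustration of that homomorphism (a Python sketch under the assumptions above, not the planned Semiring.lean): the log-sum-exp "addition" at inverse temperature α dominates max and converges to it as α → ∞, while the "multiplication" is ordinary + in both semirings.

```python
import math

def lse(a, b, alpha):
    """'Addition' in the log-sum-exp semiring at inverse temperature alpha:
    (1/alpha) * log(exp(alpha*a) + exp(alpha*b)), computed stably."""
    m = max(alpha * a, alpha * b)
    return (m + math.log(math.exp(alpha * a - m) + math.exp(alpha * b - m))) / alpha
```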
A decoder maps scored candidates to a probability distribution.
Decoder Cand Score packages a single function:
given a finite candidate set and a Score-valued evaluation, return
a real-valued probability for each candidate.
The structure does not enforce that outputs are non-negative or sum
to 1 — those are properties of well-behaved decoders, captured by
IsProbDecoder below. Keeping the structure plain makes it easy to
define non-normalized scoring functions (e.g., raw exp(score) rather
than full softmax) and prove their probabilistic properties separately.
- decode : Finset Cand → (Cand → Score) → Cand → ℝ
Decode a finite scored candidate set into a probability assignment.
A decoder is a probability decoder when its outputs are non-negative and sum to 1 over any non-empty candidate set.
All decoded probabilities are non-negative.
- sum_eq_one {cands : Finset Cand} : cands.Nonempty → ∀ (score : Cand → Score), ∑ c ∈ cands, d.decode cands score c = 1
Decoded probabilities sum to 1 over any non-empty candidate set.
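To make the plain-structure-plus-predicate design concrete, here is an illustrative Python sketch (names hypothetical, independent of the Lean code): a raw `exp(score)` decoder is a perfectly good `Decoder` but fails the `IsProbDecoder` conditions until it is normalized.

```python
import math

def raw_exp_decoder(cands, score):
    """A plain decoder that is NOT a probability decoder:
    returns unnormalized exp(score) weights."""
    return {c: math.exp(score(c)) for c in cands}

def normalize(decode):
    """Turn any positive-valued decoder into a probability decoder."""
    def decode_norm(cands, score):
        p = decode(cands, score)
        z = sum(p.values())
        return {c: v / z for c, v in p.items()}
    return decode_norm

def satisfies_is_prob_decoder(decode, cands, score, tol=1e-9):
    """Check the two IsProbDecoder conditions on one candidate set:
    non-negative outputs that sum to 1."""
    p = decode(cands, score)
    return all(v >= 0.0 for v in p.values()) and abs(sum(p.values()) - 1.0) <= tol
```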
The MaxEnt decoder: softmax over real-valued scores at temperature α.
decode cands score c = exp(α · score(c)) / Σ_{c' ∈ cands} exp(α · score(c'))
when c ∈ cands, and 0 otherwise.
Equivalent to Core.softmax from RationalAction.lean but generalised
from Fintype to a finite subset cands of a possibly-infinite type.
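The formula above, including the "0 outside cands" clause, can be sketched in Python (an illustration only; `softmax_decode` is a hypothetical name, not the Lean definition):

```python
import math

def softmax_decode(cands, score, alpha=1.0):
    """Softmax over the finite set cands: exp(alpha*score(c))/Z inside
    cands, and probability 0 for any candidate outside cands."""
    cands = set(cands)
    m = max(alpha * score(c) for c in cands)   # stabilize the exponentials
    z = sum(math.exp(alpha * score(c) - m) for c in cands)
    def prob(c):
        if c not in cands:
            return 0.0
        return math.exp(alpha * score(c) - m) / z
    return prob

p = softmax_decode({0, 1, 2}, float, alpha=1.0)
```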
The HG-style decoder: uniform distribution over the maximizers of
score on cands. Deterministic (Dirac on the unique winner) when
the maximum is achieved by exactly one candidate.
Used by deterministic HG (with Score = ℝ and score = harmonyScoreR)
and is the α → ∞ limit of softmaxDecoder (see softmax_argmax_limit).
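The uniform-over-maximizers rule, sketched in Python (hypothetical name, not the Lean definition): a tie among maximizers splits the mass evenly; a unique maximizer gets a Dirac.

```python
def argmax_decode(cands, score):
    """Uniform distribution over the maximizers of score on cands."""
    best = max(score(c) for c in cands)
    winners = [c for c in cands if score(c) == best]
    return {c: (1.0 / len(winners) if c in winners else 0.0) for c in cands}
```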
The OT-style decoder: uniform distribution over the minimizers of
score on cands.
For Optimality Theory, instantiate with Score = LexProfile Nat n,
so that the lexicographic minimum on the violation-count profile
selects the optimal candidate(s). With a strict total ranking and a
single optimum, this is a Dirac on the OT winner.
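An OT-flavoured sketch in Python (illustrative; names and the toy tableau are hypothetical): Python tuples compare lexicographically, so tuple-valued violation profiles stand in for `LexProfile`, and argmin over them picks the candidate whose profile is lexicographically least.

```python
def argmin_decode(cands, score):
    """Uniform distribution over the minimizers of score on cands."""
    best = min(score(c) for c in cands)
    winners = [c for c in cands if score(c) == best]
    return {c: (1.0 / len(winners) if c in winners else 0.0) for c in cands}

# Toy OT tableau: violation profiles under a ranking C1 >> C2.
# Tuples compare lexicographically, mirroring LexProfile's order.
profiles = {"cand1": (0, 2), "cand2": (1, 0), "cand3": (0, 1)}
p = argmin_decode(profiles, profiles.get)
```

Here `cand3`'s profile `(0, 1)` is lexicographically least, so the decoder is a Dirac on `cand3`, as in deterministic OT with a strict total ranking.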
The softmax decoder is a proper probability decoder: outputs are non-negative and sum to 1 over any non-empty candidate set.
The argmax decoder is a proper probability decoder. The non-trivial part
is that the set of winners is non-empty when cands is, so the uniform
distribution 1/winners.card makes sense.
The argmin decoder is a proper probability decoder. Symmetric to the argmax case.