Pólya urn (per-sequence likelihood) #
@cite{odonnell-2015}
A Pólya urn over an alphabet α is a sequential sampling
scheme governed by strictly-positive pseudo-counts
π : α → ℝ. The first draw is categorical with weights
π_i / Σ π; the (N + 1)-th draw conditional on previous counts
x_1, … is categorical with weights (π_i + x_i) / (Σ π + Σ x) — a
preferential-attachment dynamic, finite-K variant of the
power-law-tail dynamic that the Pitman–Yor process exhibits in the
unbounded-alphabet limit.
By Dirichlet–Categorical conjugacy, drawing θ ~ Dirichlet(π) then
sampling i.i.d. from Categorical(θ) and integrating out θ yields
the same exchangeable sequence law (the de Finetti representation
theorem guarantees that some mixing measure exists; identifying it
as Dirichlet is conjugacy + integration). The probability of any
specific draw sequence with counts x_1, …, x_K has the closed form
of @cite{odonnell-2015} eq 3.7 (-- UNVERIFIED: section/equation
number from memory; verify against PDF):
P(seq | π) = Γ(Σ π) / Γ(Σ π + Σ x) · ∏ Γ(π_i + x_i) / Γ(π_i)
This file gives only the closed-form per-sequence likelihood
seqProb — the form fragment-grammar consumers in
Theories/Morphology/FragmentGrammars/ actually use (a corpus IS a
labeled derivation sequence, not a draw from the unlabeled-count
distribution). The corresponding count-vector PMF — the
"Dirichlet–multinomial distribution" — lives in the sibling file
DirichletMultinomial.lean, which carries the heavier
Probability.ProbabilityMassFunction.Basic import (transitively
~10s of olean loading via MeasureTheory.Measure.Dirac).
The sequential sampler itself is not formalized — only the closed form is needed by downstream constructions.
Type-polymorphic alphabet #
The alphabet α is an arbitrary type; operations require
[Fintype α] (so that ∑ i, ... and ∏ i, ... are well-defined),
and theorems requiring positivity of the total pseudo-count
additionally need [Nonempty α]. The previous Fin K-indexed shape
is the special case α = Fin K (with [NeZero K] equivalent to
[Nonempty (Fin K)]); the polymorphic shape composes cleanly with
Finset-restricted alphabets needed by per-LHS PCFG factors.
Relationship to PitmanYor #
The Pólya urn is often described as the "finite-K Chinese Restaurant
Process". This is correct sequentially but misleading
distributionally: the labeled count distribution
DirichletMultinomial.pmf (sibling file) is not equal at any
finite K to the partition distribution PitmanYor.partitionProb
at discount = 0. The two agree only in the limit K → ∞ with
symmetric pseudo-counts π_i = b/K (Blackwell & MacQueen 1973;
Ferguson 1973). The bridge is therefore a limit theorem, not a
finite equality, and is not yet formalized — the labeled→unlabeled
.map pushforward to PMF (Nat.Partition N) (the natural target
type for such a bridge) is also deferred.
Main definitions #
PolyaUrn α— pseudo-counts onα(the Dirichlet hyperparameters).PolyaUrn.total— the sumΣ π_i.PolyaUrn.seqProb— closed-form per-sequence likelihood (eq 3.7 of @cite{odonnell-2015}, depending only on counts).
References #
- @cite{odonnell-2015} — Pólya-urn closed form for DMPCFG (-- UNVERIFIED: §3.1.3 eq 3.7 from memory; verify against PDF).
- Blackwell, D. & MacQueen, J. B. (1973). "Ferguson distributions via Pólya urn schemes". The Annals of Statistics 1(2): 353–355.
- Ferguson, T. S. (1973). "A Bayesian analysis of some nonparametric problems". The Annals of Statistics 1(2): 209–230.
A Pólya urn scheme over the alphabet α, parameterized by
strictly-positive pseudo-counts. In the de Finetti representation
the pseudo-counts are the Dirichlet hyperparameters: i.i.d. draws
from Categorical(θ) for θ ~ Dirichlet(pseudo) have the same
exchangeable sequence law as the Pólya urn.
- pseudo : α → ℝ
Per-color pseudo-count (the Dirichlet hyperparameter).
Pseudo-counts are strictly positive.
Instances For
Closed-form per-sequence likelihood (not the count PMF — see
DirichletMultinomial.pmfReal for that): probability that a draw
sequence with counts x was emitted by the urn u.
P(seq | π) = Γ(Σ π) / Γ(Σ π + Σ x) · ∏ Γ(π_i + x_i) / Γ(π_i)
Depends only on the counts (not the order), which is what makes the
recursive stochastic equations defining DMPCFG, adaptor grammars, and
fragment grammars in Theories.Morphology.FragmentGrammars.*
well-defined as marginals over draw order — partition
exchangeability in the EPPF sense, distinct from but implied by
exchangeability proper of the joint sequence law.
To convert to the count-vector PMF, multiply by the multinomial
coefficient (∑ x_i)! / ∏ (x_i!). See DirichletMultinomial.
Equations
Instances For
The empty count vector — no draws — has per-sequence likelihood 1.
Pólya urn predictive recurrence. Incrementing the count at color c
by 1 multiplies seqProb by the Pólya predictive factor
(π_c + x_c) / (Σπ + Σx):
seqProb (x with x_c := x_c + 1) = seqProb x · (π_c + x_c) / (Σπ + Σx)
This is the discrete-time recursion underlying the Dirichlet–Multinomial
normalization. Proof: unfold seqProb, apply Real.Gamma_add_one at
both Σπ + Σx (denominator) and π_c + x_c (the c-th factor in the
product), and let the other terms cancel.
seqProb summed over all length-N sequences equals 1. The Polya urn
defines a probability distribution over sequences; this identity says the
total mass is 1. Proved by induction on N using seqProb_succ.
Per-sequence Pólya likelihood is strictly positive on nonempty
alphabets. Used by downstream consumers (DMPCFG, AdaptorGrammar)
to derive nonnegativity of corpus probabilities.
Construct a symmetric Polya urn with concentration c on every color.
The Dirichlet-conjugate prior Dirichlet(c, c, …, c) underlying any
"all colors equally a priori" scheme.
Equations
- ProbabilityTheory.PolyaUrn.symmetric c hc = { pseudo := fun (x : α) => c, pseudo_pos := ⋯ }
Instances For
The Polya predictive: probability that the next draw is color i,
given observed counts. Closed form
(π_i + counts i) / (∑ π_j + ∑ counts) follows from the ratio
seqProb (counts + e_i) / seqProb counts via Γ(z+1) = z · Γ(z).
Equations
- u.predictive counts i = (u.pseudo i + ↑(counts i)) / (u.total + ∑ j : α, ↑(counts j))
Instances For
The Polya predictive at zero counts is pseudo i / total —
proportional to the pseudo-counts.
For a symmetric urn, the predictive at zero counts is uniform 1/K.
Polya predictive monotonicity: a color with a higher previous count gets higher predictive probability (the "rich get richer" / preferential-attachment property).