Pitman–Yor process #
@cite{pitman-2006} @cite{odonnell-2015}
The Pitman–Yor process (PYP) is a two-parameter Bayesian non-parametric
distribution on partitions of [n], generalising the Chinese Restaurant
Process (the one-parameter Dirichlet process). The mathematical reference
is @cite{pitman-2006} §3.2 (Saint-Flour lectures); the linguistic
application that motivates this file is @cite{odonnell-2015} §3.1.6
(memoization distribution for adaptor and fragment grammars in
Theories/Morphology/FragmentGrammars/).
Naming convention #
@cite{pitman-2006} writes parameters as (α, θ) with α = discount and
θ = concentration; @cite{odonnell-2015} writes (a, b) for the same
two. We use (discount, concentration) to match neither convention's
single letters but to be self-documenting.
Main definitions #
stepPochhammer x s m— generalised step product∏_{k=0}^{m-1} (x + k·s)(@cite{pitman-2006} eq 3.7, @cite{odonnell-2015} eq 3.13). Specialises to the rising factorial(ascPochhammer R m).eval xats = 1and to the geometric powerx^mats = 0.PitmanYor— the two-parameter family(discount, concentration)with0 ≤ discount ≤ 1andconcentration ≥ -discount(@cite{pitman-2006} eq 3.5, second case).PitmanYor.partitionProb— the exchangeable partition probability function (EPPF) of @cite{pitman-2006} Theorem 3.2 (eq 3.6) / @cite{odonnell-2015} eq 3.14.
What partitionProb actually computes #
partitionProb q evaluates Pitman's EPPF formula
(@cite{pitman-2006} eq 3.6) at the multiset of block sizes q.parts.
The EPPF is, per @cite{pitman-2006} p. 39, the probability that the
random partition Π_n equals any specific (set) partition of [n]
whose blocks have sizes (n_1, …, n_k). By the EPPF's symmetry, the
value depends only on the multiset of sizes — which is what makes the
Nat.Partition n → ℝ signature well-typed.
partitionProb q is therefore the prob of one specific set partition
with multiset of block sizes q.parts, NOT the prob of the multiset
q.parts itself. The two differ by the multiplicity factor
mult(q) = n! / (∏_{m ∈ q.parts} m! · ∏_{j} (q.parts.count j)!)
i.e. the number of set partitions of [n] whose block sizes are
q.parts (@cite{pitman-2006} eq 2.2 / Nat.Partition.numSetPartitions).
Sum-to-1 identities #
Pitman 2006 gives several equivalent normalisations of the EPPF:
(a) ∑_{Π : set partition of [n]} EPPF(block sizes of Π) = 1
@cite{pitman-2006} Thm 3.2
(b) ∑_{q : Nat.Partition n} mult(q) · partitionProb q = 1
@cite{pitman-2006} eq 2.2
(c) ∑_{compositions (n_1,…,n_k) of n} (n choose n_1,…,n_k)·1/k! · EPPF(n_1,…,n_k) = 1
@cite{pitman-2006} p. 42
We formalise (a) as sum_partitionProb_set_eq_one, summing over
Finpartition (Finset.univ : Finset (Fin n)). This is the form the
downstream AdaptorGrammar consumer needs (since AG's Y is a labeled
table assignment, equivalent to a set partition under the canonical
"tables labeled by order of creation" convention).
The bare sum ∑_{q : Nat.Partition n} partitionProb q does NOT
equal 1 in general — every multiset appears once in the sum, but the
EPPF interpretation requires counting it mult(q) times. For example,
at α = 0, θ = 1, n = 3 the bare sum is 2/3.
Limitations #
partitionProbreturnsℝrather thanPMF. The downstream consumerAdaptorGrammar.corpusProbGivenTablesis itself an ℝ-valued kernel (table assignments are latent, not marginalised), so the bare-ℝ form is what the consumer wants.- The normalisation theorem
sum_partitionProb_set_eq_one(@cite{pitman-2006} Thm 3.2) is stated assorrybelow. The construction-style proof (build the EPPF as the marginal of the consistent Chinese Restaurant seating plan @cite{pitman-2006} §3.2) needs Markov-kernel infrastructure linglib does not yet have; the closed-form proof needs Vandermonde-style identities not yet in mathlib. Recorded assorryper CLAUDE.md "prefer sorry over weakening." - Reduction theorems (
discount = 0⇒ Chinese Restaurant Process / Dirichlet process @cite{pitman-2006} §3.2 case(α=0, θ>0),discount = 1⇒ all-singletons partition) require a CRP file (linglib has none) and are deferred.
The number of distinct set partitions of Fin n whose multiset of
block cardinalities equals q.parts. By the standard combinatorial
formula (@cite{pitman-2006} eq 2.3 in implicit form):
mult(q) = n! / (∏_{m ∈ q.parts} m! · ∏_{j ∈ q.parts.toFinset} (q.parts.count j)!)
For example, mult({1, 2}) = 3 (counts {{1},{2,3}}, {{2},{1,3}},
{{3},{1,2}}); mult({1, 1, 1}) = 1; mult({3}) = 1.
The natural-number division is exact (the denominator divides n!
because both quantities count the same set of objects from different
angles).
Equations
- q.numSetPartitions = n.factorial / (Multiset.map Nat.factorial q.parts).prod / ∏ j ∈ q.parts.toFinset, (Multiset.count j q.parts).factorial
Instances For
Extend a partition of n by a new singleton block of size 1,
yielding a partition of n + 1. The combinatorial counterpart of
"the next customer sits at a new (singleton) table" in the PYP
seating plan (@cite{pitman-2006} §3.2).
Equations
- q.consOne = { parts := 1 ::ₘ q.parts, parts_pos := ⋯, parts_sum := ⋯ }
Instances For
Replace one occurrence of m (a block of size m) with m + 1,
yielding a partition of n + 1. The combinatorial counterpart of
"the next customer joins an existing table of size m" in the
PYP seating plan (@cite{pitman-2006} §3.2).
Equations
- q.replaceMem m hm = { parts := (m + 1) ::ₘ q.parts.erase m, parts_pos := ⋯, parts_sum := ⋯ }
Instances For
Convert a set partition of Fin n (i.e., a Finpartition of
Finset.univ : Finset (Fin n)) to its block-size multiset
(Nat.Partition n). This is the projection that loses the set-partition
structure (which elements are in which block) and keeps only the
cardinalities.
Used by AdaptorGrammar.pypFactor to evaluate Pitman's EPPF
(PitmanYor.partitionProb) on a labeled table assignment, since the
EPPF formula depends only on block sizes.
Equations
- P.toNatPartition = { parts := Multiset.map Finset.card P.parts.val, parts_pos := ⋯, parts_sum := ⋯ }
Instances For
Convert an ordered set partition of Fin n (mathlib's OrderedFinpartition n,
parts ordered by greatest element) to its block-size multiset
(Nat.Partition n). This is the projection that drops the part-ordering
data and keeps only the multiset of block cardinalities.
Used by PitmanYor.partitionProb evaluation: the EPPF formula depends
only on block sizes, so the OrderedFinpartition's specific ordering is
irrelevant — but the structure gives us mathlib's extendEquiv which
matches Pitman's seating-plan recursion exactly.
Equations
- c.toNatPartition = { parts := Multiset.map c.partSize Finset.univ.val, parts_pos := ⋯, parts_sum := ⋯ }
Instances For
Mathlib's OrderedFinpartition.extendLeft (= "add new singleton block")
corresponds to Nat.Partition.consOne at the multiset level.
Mathlib's OrderedFinpartition.extendMiddle k (= "extend block k by 1
element") corresponds to Nat.Partition.replaceMem (partSize k) at the
multiset level.
Convert a Composition n (mathlib's "list of positive integers summing to
n" view) into the corresponding OrderedFinpartition n whose i-th block
embeds to the consecutive interval [c.sizeUpTo i, c.sizeUpTo (i + 1)) of
Fin n.
Bridges mathlib's two combinatorial views: Composition n carries clean
list-of-sizes data (used wherever a per-input size profile arises); the
OrderedFinpartition view exposes per-element membership embeddings (needed
by OrderedFinpartition.toNatPartition to project to Nat.Partition n,
which is what Pitman's EPPF formula consumes — see partitionProb).
The realization uses mathlib's Composition.embedding : Fin (c.blocksFun i) ↪o Fin n directly for the emb field. Disjointness and coverage follow
from mem_range_embedding_iff plus the interval characterization
range (embedding i) = [sizeUpTo i, sizeUpTo (i+1)).
Equations
- One or more equations did not get rendered due to their size.
Instances For
The step Pochhammer product
[x]_{m, s} := ∏_{k=0}^{m-1} (x + k · s) of @cite{pitman-2006} eq 3.7
(written there as (x)_{m↑s}) and @cite{odonnell-2015} eq 3.13.
Generalises the standard rising factorial in the step s:
stepPochhammer x 0 m = x ^ m(geometric power; seestepPochhammer_zero_step).stepPochhammer x 1 m = (ascPochhammer R m).eval x(rising factorial; seestepPochhammer_one_eq_ascPochhammer_evalfor the bridge toMathlib.RingTheory.Polynomial.Pochhammer).stepPochhammer x s 0 = 1(empty product; seestepPochhammer_zero).
Used by PitmanYor.partitionProb with two values of the step:
s = 1 (block-size factors) and s = discount (table-creation factor).
The latter — step-discount — is the defining generalisation of the
Pitman–Yor process over the one-parameter Dirichlet process / Chinese
Restaurant Process.
Equations
- ProbabilityTheory.stepPochhammer x s m = ∏ k : Fin m, (x + ↑↑k * s)
Instances For
The step-1 case is mathlib's rising factorial. Bridges the
PYP file's notation to Mathlib.RingTheory.Polynomial.Pochhammer,
so users grepping for Pochhammer find both names.
The step-0 case collapses to a power: [x]_{m, 0} = x^m.
Step product is nonnegative when every factor is nonnegative.
General-purpose; the PYP-specific factor positivity lives in
PitmanYor.partitionProb_nonneg.
A Pitman–Yor process @cite{pitman-2006} §3.2 / @cite{odonnell-2015}
§3.1.6 with discount α ∈ [0, 1] and concentration θ > -α
(@cite{pitman-2006} eq 3.5, second case; the first case α = -κ < 0, θ = mκ for integer m is excluded — irrelevant for the linguistic
application here).
Sequential interpretation (@cite{pitman-2006} §3.2 (α, θ) seating
plan): the first customer sits at table 1; the (N + 1)-th customer
sits at occupied table i (current size n_i) with probability
(n_i - α) / (N + θ), or at a new table with probability
(K · α + θ) / (N + θ), where K is the current number of occupied
tables. Higher discount α favours new tables.
The constraint θ > -α is strict (matching @cite{pitman-2006}
eq 3.5). The boundary θ = -α is degenerate: the closed-form formula
has 0/0 ratios that Lean's x/0 = 0 convention resolves to 0,
giving partitionProb = 0 for partitions with K ≥ 2 blocks but
mathematically should give the all-elements-together-or-singleton
limit. Since this boundary is a separate (degenerate) distribution
not a regular PYP, we exclude it here.
Boundary cases (well-defined under the strict constraint):
discount = 0: reduces to the one-parameter Chinese Restaurant Process (Dirichlet process with parameterθ > 0).discount = 1: every customer sits at a new table;partitionProbis0on any partition with a non-singleton block (formula has1 - α = 0factor) and equals 1 on the all-singletons partition.
- discount : ℝ
Discount parameter
α ∈ [0, 1]. Higher values favour new tables. - concentration : ℝ
Concentration parameter
θ > -α(strict, per @cite{pitman-2006} eq 3.5). 0 ≤ discount.discount ≤ 1.-discount < concentration.
Instances For
The Pitman–Yor exchangeable partition probability function (EPPF)
of @cite{pitman-2006} Theorem 3.2 (eq 3.6) / @cite{odonnell-2015}
eq 3.14, evaluated at the multiset of block sizes q.parts:
EPPF(n_1, ..., n_K | α, θ) = [θ + α]_{K-1, α} · ∏_{i=1}^K [1 - α]_{n_i - 1, 1}
/ [θ + 1]_{N - 1, 1}
where K = q.parts.card is the number of blocks and N = n is the
total number of elements.
Semantics: by @cite{pitman-2006} p. 39, this is the probability
that the random partition Π_n equals any specific (set) partition
of [n] with the given multiset of block sizes — NOT the
probability of the multiset itself. The two differ by the multiplicity
factor n! / (∏ m_i! · ∏ count_j!). See the module docstring for the
three equivalent normalisation identities (Pitman 3.2 / 2.2 / p. 42).
The EPPF is symmetric in (n_1, …, n_k), so the value is well-defined
as a function of the multiset q.parts. This is what makes the
Nat.Partition n → ℝ signature well-typed; it does not make
partitionProb a probability distribution on Nat.Partition n (it
isn't — see module docstring).
The n - 1 and q.parts.card - 1 terms use ℕ truncating subtraction.
This is intentional and correct on the boundary: at n = 0 (and hence
the unique empty partition with K = 0), all three step factors
collapse to the empty product 1, giving EPPF = 1 — the vacuous
probability of the unique set partition of zero elements.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Pitman–Yor partition probabilities are nonnegative. Each step
factor in @cite{pitman-2006} eq 3.6 is nonnegative under the PYP
constraints (0 ≤ α ≤ 1, θ ≥ -α), and the overall quotient and
product preserve nonnegativity. Used downstream by
AdaptorGrammar.corpusProbGivenTables_nonneg.
Exchangeability is by construction: partitionProb reads only the
multiset q.parts, never an underlying ordering. Stated for
grep-discoverability — the proof is rfl.
Cross-multiplied form of the seating-plan transition for a new
singleton block (@cite{pitman-2006} §3.2 (α, θ) seating plan,
"new table" branch). The seating-plan probability of the (n+1)-th
customer creating a new table is (K · α + θ) / (n + θ), where K
is the current number of blocks; this is the multiplicative
counterpart, valid in all cases including n = θ = 0.
Equivalent (modulo dividing both sides by (n + θ) when nonzero) to:
partitionProb q.consOne = partitionProb q · (K · α + θ) / (n + θ).
Cross-multiplied form of the seating-plan transition for joining
an existing block of size m (@cite{pitman-2006} §3.2 (α, θ) seating
plan, "join occupied table" branch). The seating-plan probability of
the (n+1)-th customer joining an existing table of size m is
(m - α) / (n + θ); this is the multiplicative counterpart.
Equivalent (modulo dividing by (n + θ)) to:
partitionProb (q.replaceMem m hm) = partitionProb q · (m - α) / (n + θ).
Helper: sum of OrderedFinpartition.partSize over all part-indices
equals n. Follows from the cardinality of the bijection
Σ k, Fin (partSize k) ≃ Fin n.
The Pitman-Yor seating-plan addition rule (@cite{pitman-2006} eq 2.9
for the (α, θ) EPPF). Summing the partition probabilities over all
"ways to seat the (n+1)-th customer" recovers the partition probability
of the predecessor. Proved via Lemmas A (partitionProb_consOne_mul)
and B (partitionProb_replaceMem_mul) plus the arithmetic identity
(K · α + θ) + ∑_k (partSize k - α) = n + θ.
Pitman 2006 Theorem 3.2. The EPPF is a probability mass function
on the set of set partitions of [n], expressed via mathlib's
OrderedFinpartition n (canonical labeling by greatest element):
∑_{c : OrderedFinpartition n} partitionProb c.toNatPartition = 1
The proof is by induction on n using mathlib's
OrderedFinpartition.extendEquiv — the bijection
(c : OrderedFinpartition n) × Option (Fin c.length) ≃ OrderedFinpartition (n + 1) — to decompose the sum.
The inductive step is partitionProb_extend_sum, which combines
Lemma A (partitionProb_consOne_mul) and Lemma B
(partitionProb_replaceMem_mul) with the arithmetic identity
(K · α + θ) + ∑_k (partSize k - α) = n + θ.
OrderedFinpartition carries a specific canonical ordering of parts
(by greatest element), but partitionProb depends only on the
block-size multiset, so this is the right sample space: each set
partition of [n] corresponds to exactly one OrderedFinpartition n.