Documentation

Linglib.Core.Probability.PolyaUrn

Pólya urn (per-sequence likelihood) #

@cite{odonnell-2015}

A Pólya urn over an alphabet α is a sequential sampling scheme governed by strictly-positive pseudo-counts π : α → ℝ. The first draw is categorical with weights π_i / Σ π; the (N + 1)-th draw conditional on previous counts x_1, … is categorical with weights (π_i + x_i) / (Σ π + Σ x) — a preferential-attachment dynamic, finite-K variant of the power-law-tail dynamic that the Pitman–Yor process exhibits in the unbounded-alphabet limit.

By Dirichlet–Categorical conjugacy, drawing θ ~ Dirichlet(π) then sampling i.i.d. from Categorical(θ) and integrating out θ yields the same exchangeable sequence law (the de Finetti representation theorem guarantees that some mixing measure exists; identifying it as Dirichlet is conjugacy + integration). The probability of any specific draw sequence with counts x_1, …, x_K has the closed form of @cite{odonnell-2015} eq 3.7 (-- UNVERIFIED: section/equation number from memory; verify against PDF):

P(seq | π) = Γ(Σ π) / Γ(Σ π + Σ x)  ·  ∏ Γ(π_i + x_i) / Γ(π_i)

This file gives only the closed-form per-sequence likelihood seqProb — the form fragment-grammar consumers in Theories/Morphology/FragmentGrammars/ actually use (a corpus IS a labeled derivation sequence, not a draw from the unlabeled-count distribution). The corresponding count-vector PMF — the "Dirichlet–multinomial distribution" — lives in the sibling file DirichletMultinomial.lean, which carries the heavier Probability.ProbabilityMassFunction.Basic import (transitively ~10s of olean loading via MeasureTheory.Measure.Dirac).

The sequential sampler itself is not formalized — only the closed form is needed by downstream constructions.

Type-polymorphic alphabet #

The alphabet α is an arbitrary type; operations require [Fintype α] (so that ∑ i, ... and ∏ i, ... are well-defined), and theorems requiring positivity of the total pseudo-count additionally need [Nonempty α]. The previous Fin K-indexed shape is the special case α = Fin K (with [NeZero K] equivalent to [Nonempty (Fin K)]); the polymorphic shape composes cleanly with Finset-restricted alphabets needed by per-LHS PCFG factors.

Relationship to PitmanYor #

The Pólya urn is often described as the "finite-K Chinese Restaurant Process". This is correct sequentially but misleading distributionally: the labeled count distribution DirichletMultinomial.pmf (sibling file) is not equal at any finite K to the partition distribution PitmanYor.partitionProb at discount = 0. The two agree only in the limit K → ∞ with symmetric pseudo-counts π_i = b/K (Blackwell & MacQueen 1973; Ferguson 1973). The bridge is therefore a limit theorem, not a finite equality, and is not yet formalized — the labeled→unlabeled .map pushforward to PMF (Nat.Partition N) (the natural target type for such a bridge) is also deferred.

Main definitions #

References #

structure ProbabilityTheory.PolyaUrn (α : Type u_1) :
Type u_1

A Pólya urn scheme over the alphabet α, parameterized by strictly-positive pseudo-counts. In the de Finetti representation the pseudo-counts are the Dirichlet hyperparameters: i.i.d. draws from Categorical(θ) for θ ~ Dirichlet(pseudo) have the same exchangeable sequence law as the Pólya urn.

  • pseudo : α

    Per-color pseudo-count (the Dirichlet hyperparameter).

  • pseudo_pos (i : α) : 0 < self.pseudo i

    Pseudo-counts are strictly positive.

Instances For
    theorem ProbabilityTheory.PolyaUrn.ext_iff {α : Type u_1} {x y : PolyaUrn α} :
    x = y x.pseudo = y.pseudo
    theorem ProbabilityTheory.PolyaUrn.ext {α : Type u_1} {x y : PolyaUrn α} (pseudo : x.pseudo = y.pseudo) :
    x = y
    def ProbabilityTheory.PolyaUrn.total {α : Type u_1} [Fintype α] (u : PolyaUrn α) :

    The total pseudo-count Σ π_i. Strictly positive when α is nonempty.

    Equations
    Instances For
      theorem ProbabilityTheory.PolyaUrn.total_pos {α : Type u_1} [Fintype α] (u : PolyaUrn α) [Nonempty α] :
      0 < u.total
      noncomputable def ProbabilityTheory.PolyaUrn.seqProb {α : Type u_1} [Fintype α] (u : PolyaUrn α) (x : α) :

      Closed-form per-sequence likelihood (not the count PMF — see DirichletMultinomial.pmfReal for that): probability that a draw sequence with counts x was emitted by the urn u.

      P(seq | π) = Γ(Σ π) / Γ(Σ π + Σ x)  ·  ∏ Γ(π_i + x_i) / Γ(π_i)
      

      Depends only on the counts (not the order), which is what makes the recursive stochastic equations defining DMPCFG, adaptor grammars, and fragment grammars in Theories.Morphology.FragmentGrammars.* well-defined as marginals over draw order — partition exchangeability in the EPPF sense, distinct from but implied by exchangeability proper of the joint sequence law.

      To convert to the count-vector PMF, multiply by the multinomial coefficient (∑ x_i)! / ∏ (x_i!). See DirichletMultinomial.

      Equations
      • u.seqProb x = Real.Gamma u.total / Real.Gamma (u.total + i : α, (x i)) * i : α, Real.Gamma (u.pseudo i + (x i)) / Real.Gamma (u.pseudo i)
      Instances For
        theorem ProbabilityTheory.PolyaUrn.seqProb_zero {α : Type u_1} [Fintype α] (u : PolyaUrn α) [Nonempty α] :
        (u.seqProb fun (x : α) => 0) = 1

        The empty count vector — no draws — has per-sequence likelihood 1.

        theorem ProbabilityTheory.PolyaUrn.seqProb_succ {α : Type u_1} [Fintype α] (u : PolyaUrn α) [Nonempty α] [DecidableEq α] (x : α) (c : α) :
        u.seqProb (Function.update x c (x c + 1)) = u.seqProb x * (u.pseudo c + (x c)) / (u.total + i : α, (x i))

        Pólya urn predictive recurrence. Incrementing the count at color c by 1 multiplies seqProb by the Pólya predictive factor (π_c + x_c) / (Σπ + Σx):

        seqProb (x with x_c := x_c + 1) = seqProb x · (π_c + x_c) / (Σπ + Σx)
        

        This is the discrete-time recursion underlying the Dirichlet–Multinomial normalization. Proof: unfold seqProb, apply Real.Gamma_add_one at both Σπ + Σx (denominator) and π_c + x_c (the c-th factor in the product), and let the other terms cancel.

        theorem ProbabilityTheory.PolyaUrn.sum_counts_eq_length {α : Type u_1} [Fintype α] [DecidableEq α] {N : } (seq : Fin Nα) :
        d : α, {i : Fin N | seq i = d}.card = N

        The total count of a length-N sequence equals N.

        theorem ProbabilityTheory.PolyaUrn.sum_seqProb_eq_one {α : Type u_1} [Fintype α] (u : PolyaUrn α) [Nonempty α] [DecidableEq α] (N : ) :
        (∑ seq : Fin Nα, u.seqProb fun (c : α) => {x : Fin N | seq x = c}.card) = 1

        seqProb summed over all length-N sequences equals 1. The Polya urn defines a probability distribution over sequences; this identity says the total mass is 1. Proved by induction on N using seqProb_succ.

        theorem ProbabilityTheory.PolyaUrn.seqProb_pos {α : Type u_1} [Fintype α] (u : PolyaUrn α) [Nonempty α] (x : α) :
        0 < u.seqProb x

        Per-sequence Pólya likelihood is strictly positive on nonempty alphabets. Used by downstream consumers (DMPCFG, AdaptorGrammar) to derive nonnegativity of corpus probabilities.

        def ProbabilityTheory.PolyaUrn.symmetric {α : Type u_1} (c : ) (hc : 0 < c) :

        Construct a symmetric Polya urn with concentration c on every color. The Dirichlet-conjugate prior Dirichlet(c, c, …, c) underlying any "all colors equally a priori" scheme.

        Equations
        Instances For
          noncomputable def ProbabilityTheory.PolyaUrn.predictive {α : Type u_1} [Fintype α] (u : PolyaUrn α) (counts : α) (i : α) :

          The Polya predictive: probability that the next draw is color i, given observed counts. Closed form (π_i + counts i) / (∑ π_j + ∑ counts) follows from the ratio seqProb (counts + e_i) / seqProb counts via Γ(z+1) = z · Γ(z).

          Equations
          Instances For
            theorem ProbabilityTheory.PolyaUrn.predictive_zero {α : Type u_1} [Fintype α] (u : PolyaUrn α) (i : α) :
            u.predictive (fun (x : α) => 0) i = u.pseudo i / u.total

            The Polya predictive at zero counts is pseudo i / total — proportional to the pseudo-counts.

            theorem ProbabilityTheory.PolyaUrn.predictive_zero_symmetric {α : Type u_1} [Fintype α] [Nonempty α] (c : ) (hc : 0 < c) (i : α) :
            (symmetric c hc).predictive (fun (x : α) => 0) i = 1 / (Fintype.card α)

            For a symmetric urn, the predictive at zero counts is uniform 1/K.

            theorem ProbabilityTheory.PolyaUrn.predictive_mono {α : Type u_1} [Fintype α] [Nonempty α] [DecidableEq α] (u : PolyaUrn α) (counts₁ counts₂ : α) (i : α) (h_eq : ∀ (j : α), j icounts₁ j = counts₂ j) (h_le : counts₁ i counts₂ i) :
            u.predictive counts₁ i u.predictive counts₂ i

            Polya predictive monotonicity: a color with a higher previous count gets higher predictive probability (the "rich get richer" / preferential-attachment property).