@cite{kao-etal-2014-metaphor} on mathlib PMF #
@cite{kao-etal-2014-metaphor}
"Formalizing the Pragmatics of Metaphor Understanding" Proceedings of the Annual Meeting of the Cognitive Science Society 36, 719-724.
Model (verified against paper) #
- L0 (Eq. before 1): L0(c, f⃗ | u) = P(f⃗|c) if c = u, else 0
- Speaker utility (Eq. 1): U(u | g, f⃗) = log Σ_{c, f⃗′ : g(f⃗′) = g(f⃗)} L0(c, f⃗′ | u)
- Speaker (Eq. 2): S1(u | g, f⃗) ∝ exp(λ · U(u | g, f⃗))
- Pragmatic listener: L1(c, f⃗ | u) ∝ P(c) · P(f⃗|c) · Σ_g P(g) · S1(u | g, f⃗)
Parameters (paper §"Model Evaluation"):
- λ = 3 (speaker rationality)
- P(whale) = 0.01, P(person) = 0.99 (category prior)
- Vague QUD: uniform P(g) = 1/3
- Specific QUD (f₁): P(g₁) = 0.6, P(g₂) = P(g₃) = 0.2
Softmax-of-log at the root, not rpow workaround #
The speaker utility U = log Σ L0 returns -∞ when the QUD-projected L0
marginal is 0. Mathlib's Real.log 0 = 0 would silently break this —
exp(λ · 0) = 1 would give nonzero weight to impossible utterances.
The substrate fix: PMF.softmax takes score : α → EReal, with
EReal.exp(⊥) = 0 correctly handling impossible utterances. The Kao S1
is then PMF.softmax (fun u => (α : EReal) * ENNReal.log (qudProjL0 g u f⃗))
— directly the paper's formula with no rpow workaround. Mathematically
equivalent to (qudProjL0)^λ (via ENNReal.rpow_eq_exp_mul_log) when
qudProjL0 > 0, but the EReal form makes the boundary case explicit.
For Kao's empirical priors (Experiment 1b), all featurePriorℕ entries
are strictly positive (min = 379), so all QUD-marginals are positive and
the boundary doesn't bite numerically. The substrate handles the general
case correctly anyway — RSA papers with sparser literal-listener support
inherit the right behaviour.
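The boundary behaviour can be made concrete in a short sketch. `Real.log_zero` is mathlib's; the `PMF.softmax` signature and the `s1` shape are the substrate API as described in the text above, reproduced here as an assumption, not a verified definition:

```lean
import Mathlib

-- mathlib's real-valued log takes the junk value 0 at 0,
-- so exp (λ * Real.log 0) = exp 0 = 1: an impossible utterance
-- would silently get positive softmax weight.
example : Real.log 0 = 0 := Real.log_zero

-- The EReal route (as stated above): ENNReal.log 0 = ⊥ and
-- EReal.exp ⊥ = 0, so zero-mass utterances get zero weight.
-- Speaker sketch against the assumed substrate signature:
--   noncomputable def s1 (g : Goal) (f : Features) : PMF Cat :=
--     PMF.softmax fun u => (lam : EReal) * ENNReal.log (qudProjL0 g u f)
```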
Joint posterior over (World × Goal) #
Kao's L1 jointly infers the speaker's category-and-features and
marginalises over goal. PMF-natively: posterior of World × Goal
projected to the World-marginal — exactly PMF.posterior_fst_apply
from Core/Probability/JointPosterior.lean.
This file uses the equivalent kernel-form: World → PMF Cat defined as
goalPrior.bind ∘ s1, folding the goal-marginalisation into the kernel.
The two formulations agree by associativity of bind.
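The associativity step is mathlib's `PMF.bind_bind`; a generic sketch (the file instantiates it at `goalPrior`, the per-goal speaker, and the posterior kernel):

```lean
import Mathlib

-- associativity of PMF.bind: marginalising the middle stage
-- first or last yields the same distribution
example {α β γ : Type*} (p : PMF α) (f : α → PMF β) (g : β → PMF γ) :
    (p.bind f).bind g = p.bind fun a => (f a).bind g :=
  PMF.bind_bind p f g
```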
§0. Domain types #
Categories: whale (metaphor vehicle) and person (literal referent). Categories double as utterance types.
Feature vector: 3 booleans (e.g. {large, graceful, majestic} for whale).
Equations
- Features = (Bool × Bool × Bool)
World = (category, feature vector). 2 × 8 = 16 worlds.
§1. Empirical priors (Experiment 1b counts ×10000) #
Feature prior P(f⃗ | c) as integer counts (×10000). Both per-category
sums = 10000 by construction.
Equations (featurePriorℕ c (f₁, f₂, f₃); per-category sums = 10000):

| c \ f⃗  | TTT  | TTF  | TFT  | TFF  | FTT  | FTF  | FFT  | FFF  |
|--------|------|------|------|------|------|------|------|------|
| whale  | 3059 | 1381 | 1791 | 1310 | 947  | 531  | 602  | 379  |
| person | 1169 | 1058 | 1157 | 1308 | 1529 | 1281 | 1147 | 1351 |
Category prior P(c) as integer counts: 1% whale, 99% person.
Every entry of the feature prior is strictly positive. The witness that drives every positivity proof in the file.
§2. PMF construction #
Feature prior as a PMF over Features, parametric in category.
Category prior as a PMF over Cat.
Joint world prior P(c, f⃗) = P(c) · P(f⃗|c).
worldPmf is full-support: every world has positive joint prior
mass (since both catPriorℕ and featurePriorℕ are strictly positive
across the full domain). Discharged via mem_support_bind_iff +
mem_support_map_iff from mathlib.
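A plausible reconstruction of the definition this section describes (the rendered equation is missing; `catPmf`, `featurePmf`, and `World` are the names used in the surrounding text):

```lean
-- Joint world prior P(c, f⃗) = P(c) · P(f⃗ | c): sample a category,
-- then features conditional on it. The bind + map shape matches the
-- mem_support_bind_iff / mem_support_map_iff discharge noted above.
noncomputable def worldPmf : PMF World :=
  catPmf.bind fun c => (featurePmf c).map fun f => (c, f)
```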
§3. Literal listener L0 #
L0(c, f⃗ | u) = P(f⃗|c) if c = u, else 0. PMF-form: deterministic embedding
of featurePmf u into worlds with category u.
Literal listener: PMF over World concentrated on c = u.
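The deterministic embedding, sketched under the same naming assumptions as the world prior (not the file's rendered definition):

```lean
-- literal listener L0(· | u): category pinned to u,
-- features drawn from P(f⃗ | u)
noncomputable def l0 (u : Cat) : PMF World :=
  (featurePmf u).map fun f => (u, f)
```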
§4. QUD projection #
Project a feature vector onto the QUD-relevant component.
Equations
- projectFeature Goal.large (l, _, _) = l
- projectFeature Goal.graceful (_, g, _) = g
- projectFeature Goal.majestic (_, _, m) = m
The QUD-projected L0 marginal at (g, u, f⃗): sum of featurePmf u f'
over the QUD-equivalence class of f⃗.
Built from the parametric RSA.QUD.proj substrate — the same primitive
shared with Kao 2014 hyperbole and Kao 2015 irony.
The QUD-projected L0 marginal at (g, u, f⃗) is bounded below by
featurePmf u f⃗ (since f⃗ is in its own equivalence class).
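What the marginal computes, written directly as a tsum. The file builds this from the shared RSA.QUD.proj primitive instead; this unfolded form is an illustration, not the actual definition:

```lean
-- mass of the g-equivalence class of f⃗ under featurePmf u;
-- the f' = f⃗ summand gives the lower bound noted above
noncomputable def qudProjL0 (g : Goal) (u : Cat) (f : Features) : ENNReal :=
  ∑' f' : Features,
    if projectFeature g f' = projectFeature g f then featurePmf u f' else 0
```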
§5. Speaker S1 (softmax of log-utility, Eq. 1-2) #
Score s1Score α g f u = (α : EReal) * ENNReal.log (qudProjL0 g u f).
EReal-valued: -∞ when qudProjL0 = 0, finite otherwise. The softmax
substrate PMF.softmax correctly handles -∞ (gives 0 mass).
For Kao's data, qudProjL0 > 0 always, so the score is always finite —
the substrate's -∞ handling isn't load-bearing here, but is the right
default for the general RSA pattern.
Speaker score s1Score α g f u = λ · log(qudProjL0 g u f) : EReal.
Equations
- s1Score α g f u = ↑α * (qudProjL0 g u f).log
The score is never +∞: direct from substrate
PMF.coe_mul_log_ne_top (real coefficient × log of finite ENNReal).
The score is never −∞: direct from substrate
PMF.coe_mul_log_ne_bot (real coefficient × log of nonzero ENNReal).
Speaker S1: softmax of QUD-projected log-utility (Eq. 1-2).
s1 is full-support: every utterance has positive speaker mass.
Direct from the substrate softmax_pos_iff_score_ne_bot: the score is
never ⊥ at any utterance because qudProjL0 g u f > 0 (full
literal-listener support across the QUD projection in Kao's model).
§6. Goal-marginalised speaker #
mixedS1 positivity (the architectural lemma the audit identified).
The goal-marginalised speaker assigns positive mass at every utterance,
provided some goal has positive prior. Witnessed by the g-th term of the
bind-tsum: goalPrior g * s1 ... u > 0 because both factors are positive.
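The kernel-form fold from the joint-posterior note, sketched with the α, hα rationality parameters elided (names assumed from the surrounding text):

```lean
-- goal-marginalised speaker: mix the per-goal speaker s1 over the
-- goal prior; the g-th bind summand goalPrior g * s1 g f u witnesses
-- positivity whenever goalPrior g > 0
noncomputable def mixedS1 (goalPrior : PMF Goal) (f : Features) : PMF Cat :=
  goalPrior.bind fun g => s1 g f
```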
§7. Pragmatic listener L1 #
L1(c, f⃗ | u) ∝ worldPmf(c, f⃗) · mixedS1(u | f⃗).
PMF: posterior of the kernel (c, f⃗) ↦ mixedS1(· | f⃗) against worldPmf.
The kernel for L1: (c, f⃗) ↦ mixedS1(· | f⃗). Independent of c.
Equations
- L1Kernel α hα goalPrior w = mixedS1 α hα goalPrior w.2
L1 marginal at u is non-zero — needed for posterior discharge.
Direct from the witness (.whale, (true, true, true)): worldPmf is
full-support (worldPmf_pos) and mixedS1 is positive at every utterance
when some goal has positive prior (mixedS1_pos).
Pragmatic listener L1: posterior over World given utterance u.
§8. Standard goal priors #
Vague QUD: uniform goal prior ("What is he like?").
Specific QUD targeting f₁ ("Is he f₁?"): P(g₁) = 0.6, P(g₂) = P(g₃) = 0.2.
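A sketch of the two priors. `PMF.uniformOfFintype` and `PMF.ofFintype` are mathlib; the Fintype/Nonempty instances on Goal and the normalisation side goal are assumed, not discharged here:

```lean
-- vague QUD: uniform over the three goals
noncomputable def vagueGoalPrior : PMF Goal :=
  PMF.uniformOfFintype Goal

-- specific-f₁ QUD: P(large) = 3/5, the other two goals 1/5 each
noncomputable def specificGoalPrior : PMF Goal :=
  PMF.ofFintype (fun g => if g = Goal.large then 3/5 else 1/5)
    (by sorry /- side goal: (3 : ℝ≥0∞)/5 + 1/5 + 1/5 = 1 -/)
```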
§9. Empirical findings (paper §"Model Evaluation") #
Each finding stated as an outer-measure inequality on the L1 posterior.
Speaker rationality λ = 3 (paper §"Model Evaluation").
L1 with vague QUD (the most common condition in §"Model Evaluation").
L1 with specific-f₁ QUD.
§9a. Architectural theorems (parametric in priors) #
The paper's headline isn't "P(person|whale) = 0.994 at Kao's specific empirical priors". It's the architectural claim: rational speech act pragmatics, with QUD-projected utility and a kernel that depends only on features (not category), produces L1 posteriors that combine prior knowledge with speaker-side preferences according to Bayes.
Each architectural theorem below states the structural content. The Kao-specific corollaries in §9b instantiate at the empirical priors.
Architectural: posterior-fibre asymmetry follows the unnormalised sums.
The structural-decomposition lemma applied to cat-fibres of the L1 posterior:
which category L1 favours after observing utterance u reduces to which
cat-fibre has more conditional joint mass under worldPmf · mixedS1(u | ·).
This is the architectural form of nonliteral and literal_correct: both
findings are instances of "the cat-fibre with more conditional joint mass
wins". The numerical content per finding is the inequality of those sums
(Kao's specific values vs the model-class generic structure).
Architectural: feature-set asymmetry follows the unnormalised sums.
Companion of L1_cat_fibre_lt_iff_inner_sum_lt for feature-event sets
(used by Findings 2-4 and 5).
The feature_pred predicate carves out a feature-event in World; outer
measure of that event under L1 reduces to summing the conditional joint
masses over the feature-event-fibre.
§9b. Kao-specific corollaries (paper findings at empirical priors) #
The 6 findings from @cite{kao-etal-2014-metaphor} §"Model Evaluation"
expressed as direct outer-measure inequalities at Kao's empirical priors.
Each finding reduces via §9a's architectural theorems to a comparison of
conditional joint sums at Kao's specific values. The remaining numerical
discharge is the empirical-fit content — sorried with TODO notes; full
formalisation requires either (a) the softmaxWeight_natMul_log_eq_pow
bridge (substrate, available) plus per-feature gcongr/norm_num
chains over the rpow form, or (b) a mixedS1_lower lemma giving an
explicit positive lower bound on the speaker contribution.
Finding 1 (Nonliteral interpretation): hearing "whale", listener
infers person, not literally whale. Paper: P(c_p|u_whale) = 0.994.
By L1_cat_fibre_lt_iff_inner_sum_lt, reduces to comparing conditional
joint sums on the two cat-fibres. The 99× catPrior asymmetry dominates
the bounded speaker contribution.
Finding 2 (Feature elevation: large): hearing "whale" raises P(large = T).
By L1_feature_event_lt_iff_inner_sum_lt, reduces to comparing conditional
joint sums over the two feature-event-fibres. Whale's featurePrior is
concentrated on large = T.
Finding 5 (Context sensitivity): specific QUD raises P(large = T). Cross-config comparison: same outer-measure target, different goalPrior.
Finding 6 (Literal correctness): hearing "person", listener
correctly infers person. Both prior AND speaker preferences agree on
.person; the cleanest of the 6 findings.
Demonstrates the architectural-theorem usage: convert Set notation to
Finset.filter, apply L1_cat_fibre_lt_iff_inner_sum_lt, sorry the
remaining inner-sum inequality (the empirical-fit content).
§10. Cross-paper engagement #
@cite{frank-goodman-2012} is the architectural ancestor — basic L0/S1/L1 without QUD inference. Kao's contribution is the joint inference over goals, opening the door to nonliteral interpretations.
@cite{goodman-stuhlmuller-2013} (Phenomena/ScalarImplicatures/Studies/GoodmanStuhlmuller2013PMF.lean)
shares the hypergeometric-kernel architecture with Kao's L0 (both use
"P(features|category) if categories match"). The architectural difference:
G&S2013 is one-step Bayesian (no goal inference); Kao's L1 is two-stage
(joint goal-inference enables metaphorical readings).
@cite{kao-etal-2014-hyperbole} (sister paper, same conference) uses the
identical RSA architecture for hyperbole (with quantity rather than
category as the literally-false dimension). Migration of the hyperbole
file would reuse most of this file's substrate.