Documentation

Linglib.Core.Agent.SignalDetection

Signal Detection Theory @cite{luce-1959} @cite{green-swets-1966} @cite{macmillan-creelman-2005} #

Signal Detection Theory (SDT) models the observer's task of discriminating between two hypotheses — "signal present" (sN) and "noise only" (N) — when each hypothesis generates a noisy internal response.

Symmetric parameterization #

We adopt the symmetric equal-variance Gaussian model throughout: the noise distribution is N(-d'/2, 1) and the signal+noise distribution is N(d'/2, 1), both with unit variance. The criterion c is measured from the midpoint between the two distributions (so c = 0 is unbiased). The single parameter d' (the sensitivity or detectability index) is the signed distance between the two means and is ≥ 0 by convention.

The observer responds "signal" when the internal response exceeds c. Writing tailProb m μ := P(X > c | X ~ N(μ, 1)) = 1 - Φ(c - μ) = Φ(μ - c), both rates are instances:

H = tailProb m (d'/2) = Φ(d'/2 - c),  F = tailProb m (-d'/2) = Φ(-d'/2 - c),

where Φ is the standard normal CDF.
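As a numerical sketch of these definitions (standalone Python using only the standard library; `tail_prob` mirrors the tailProb definition above and is not the library's API):

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def tail_prob(c, mu):
    """P(X > c) for X ~ N(mu, 1): equals 1 - Phi(c - mu) = Phi(mu - c)."""
    return Phi(mu - c)

d_prime, c = 1.5, 0.3
H = tail_prob(c, d_prime / 2)    # hit rate: tail at the signal mean +d'/2
F = tail_prob(c, -d_prime / 2)   # false-alarm rate: tail at the noise mean -d'/2

# Both are probabilities, and H > F because d' > 0.
assert 0 < F < H < 1
```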

The ROC curve (Receiver Operating Characteristic) traces out the (F, H) pairs as c varies, for a fixed d'. When d' > 0, the curve lies above the diagonal (H > F), reflecting genuine discriminability.

Standard observer characterization: (d', c, β) #

An equal-variance Gaussian SDT observer is characterized by three quantities, two free and one derived:

  • dPrime (d'): sensitivity, the signed distance between the two distribution means — free.
  • criterion (c): the response threshold, measured from the midpoint between the means — free.
  • beta (β): the operating likelihood ratio at the criterion, β = exp(d' · c) — derived from the other two.

Under uniform prior odds and 0–1 loss, the Bayes-optimal criterion is c = 0 (β = 1); under prior odds π_N / π_S it is c* = log(π_N / π_S) / d'.
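The optimal-criterion claim follows in one line from the likelihood ratio L(x) = exp(d' · x). A sketch of the derivation (0–1 loss, so the observer responds "signal" exactly when the posterior odds favor signal):

```latex
\text{respond ``signal''}
  \iff \pi_S\, f_{sN}(x) > \pi_N\, f_N(x)
  \iff L(x) = e^{d'x} > \frac{\pi_N}{\pi_S}
  \iff x > \frac{\log(\pi_N/\pi_S)}{d'} = c^{*},
```

which reduces to c* = 0 (β = 1) when π_N = π_S.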

Connection to Luce's choice framework (McFadden's theorem, exact) #

Under the symmetric parameterization, the likelihood ratio at observation x (measured from the midpoint between the two distribution means) is

L(x) = f_{sN}(x) / f_N(x) = exp(d' · x)

and the observer's "choice" between reporting signal vs noise follows a Luce model with v(signal) / v(noise) = L(x). We construct this Luce model as SDTModel.toLuceAt, defined directly as a RationalAction.fromGumbelRUM with binary utilities (d' · x, 0) and scale β = 1. The signal probability and odds-ratio properties are then immediate corollaries of mcfaddenIntegral_binary and RationalAction.fromGumbelRUM_policy — making the SDT/Gumbel-Luce equivalence formally exact for binary detection (UNVERIFIED: @cite{luce-1959} §2.E gives this as the original choice-theoretic framing).
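A numerical restatement of the identification (standalone Python with illustrative function names, not the library's API; the library constructs this as SDTModel.toLuceAt):

```python
from math import exp

def likelihood_ratio(d_prime, x):
    """L(x) = f_sN(x) / f_N(x) = exp(d' * x) in the symmetric parameterization."""
    return exp(d_prime * x)

def luce_signal_prob(d_prime, x):
    """Binary Luce policy on scores (L(x), 1): P(signal | x) = L / (L + 1)."""
    L = likelihood_ratio(d_prime, x)
    return L / (L + 1.0)

def logistic(t):
    return 1.0 / (1.0 + exp(-t))

# The Luce policy on Gumbel-RUM scores (exp(d'x), exp(0)) is exactly
# the logistic function of d' * x -- the binary McFadden identity.
d_prime, x = 1.5, 0.4
assert abs(luce_signal_prob(d_prime, x) - logistic(d_prime * x)) < 1e-12
```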

Connection to Thurstone #

SDT is essentially Thurstone's discriminal-process theory (UNVERIFIED: @cite{luce-1959} §2.B–D) applied to the two-alternative detection context. The formal identity SDTModel.hitRate = ThurstoneCaseV.choiceProb under the yes/no Thurstone model with σ = 1/√2 is proved as SDTModel.hitRate_eq_thurstone in GaussianChoice.lean. The 2AFC version is SDTModel.twoAFC_eq_thurstone in the same file. The logistic approximation Φ ≈ logistic connects SDT choice probabilities to the Luce-Thurstone logistic choice model — see GumbelLuce.lean and logisticApproxConst_eq_thurstoneLuceK in GaussianChoice.lean.

References #

SDT model #

structure Core.SDTModel : Type

A Signal Detection Theory model with equal-variance Gaussian assumptions in the symmetric parameterization.

  • dPrime: sensitivity (d'), the standardized distance between the signal+noise mean (+d'/2) and the noise mean (-d'/2). Non-negative by convention; d' = 0 means no discriminability. (Not enforced as a structure invariant — theorems that need 0 ≤ d' or 0 < d' take it as an explicit hypothesis. This matches mathlib's gaussianReal μ v discipline of leaving inert parameters unconstrained at the type level.)
  • criterion: the observer's response criterion c, measured from the midpoint between the two distribution means. The observer responds "signal" when the internal response exceeds c. c = 0 is unbiased; c > 0 is conservative (favors "noise"); c < 0 is liberal (favors "signal").
  • dPrime : ℝ

    Sensitivity: signed distance between signal+noise and noise means (in standard deviation units). Non-negative by convention but not enforced — pass (h : 0 ≤ m.dPrime) to theorems that need it.

  • criterion : ℝ

    Response criterion: threshold for "signal" response, measured from the midpoint between the signal+noise and noise means.

Instances For
    theorem Core.SDTModel.ext {x y : SDTModel} (dPrime : x.dPrime = y.dPrime) (criterion : x.criterion = y.criterion) :
    x = y
    theorem Core.SDTModel.ext_iff {x y : SDTModel} :
    x = y ↔ x.dPrime = y.dPrime ∧ x.criterion = y.criterion

    Hit and false-alarm rates as instances of tailProb #

    Both rates are tail probabilities of a unit-variance Gaussian shifted by ±d'/2: hit rate is the tail at the signal mean, false-alarm rate is the tail at the noise mean. Factoring the shared structure makes the bound proofs apply uniformly.

    noncomputable def Core.SDTModel.tailProb (m : SDTModel) (μ : ℝ) : ℝ

    Tail probability of N(μ, 1) at the model's criterion c:

    tailProb m μ = P(X > c | X ~ N(μ, 1)) = 1 - Φ(c - μ) = Φ(μ - c).

    Both hitRate (with μ = d'/2) and falseAlarmRate (with μ = -d'/2) are instances.

    Equations
    Instances For
      @[simp]
      theorem Core.SDTModel.tailProb_mem_Icc (m : SDTModel) (μ : ℝ) :
      m.tailProb μ ∈ Set.Icc 0 1

      The tail probability lies in [0, 1].

      tailProb is strictly monotone in the distribution mean: shifting the distribution rightward (larger μ) makes the upper-tail probability larger.

      This is the engine behind roc_above_diagonal: the signal+noise mean +d'/2 exceeds the noise mean -d'/2 whenever d' > 0, so the hit rate exceeds the false-alarm rate.

      @[simp]
      theorem Core.SDTModel.tailProb_eq_normalCDF (m : SDTModel) (μ : ℝ) :
      m.tailProb μ = normalCDF (μ - m.criterion)

      Symmetric form via Φ(μ - c), often more convenient than the 1 - Φ form.

      noncomputable def Core.SDTModel.hitRate (m : SDTModel) : ℝ

      Hit rate H = P("signal" | signal present) = tail at μ = d'/2.

      Equations
      Instances For
        noncomputable def Core.SDTModel.falseAlarmRate (m : SDTModel) : ℝ

        False alarm rate F = P("signal" | noise only) = tail at μ = -d'/2.

        Equations
        Instances For
          @[simp]
          theorem Core.SDTModel.hitRate_mem_Icc (m : SDTModel) :
          m.hitRate ∈ Set.Icc 0 1

          Both rates lie in [0, 1].

          @[simp]

          Hit rate is non-negative (corollary of hitRate_mem_Icc).

          Hit rate is at most 1.

          False alarm rate is non-negative.

          False alarm rate is at most 1.

           noncomputable def Core.SDTModel.proportionCorrect (m : SDTModel) : ℝ

          Proportion correct assuming equal presentation rates: the standard summary accuracy statistic, (H + (1 - F)) / 2. With unbiased criterion and d' > 0, this strictly exceeds 1/2.

          Equations
          Instances For
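A quick worked example of the statistic (standalone Python sketch, not the library's API):

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def proportion_correct(d_prime, c):
    """(H + (1 - F)) / 2 with H = Phi(d'/2 - c), F = Phi(-d'/2 - c)."""
    H = Phi(d_prime / 2 - c)
    F = Phi(-d_prime / 2 - c)
    return (H + (1.0 - F)) / 2.0

# Unbiased observer with d' = 2: H = Phi(1), F = Phi(-1), so PC = Phi(1) ~ 0.841,
# strictly above chance.
pc = proportion_correct(2.0, 0.0)
assert pc > 0.5
```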

            Recovering (d', c) from observed rates #

            Standard SDT diagnostics come in pairs: from observed (H, F) an experimenter recovers both the sensitivity d' = z(H) - z(F) and the response bias c = -(z(H) + z(F)) / 2. Both are roundtrip-exact under the model.
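The roundtrip can be checked numerically. A standalone Python sketch (`probit` here is a hypothetical bisection helper, not the library's quantile function):

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def probit(p):
    """z(p) = Phi^{-1}(p) by bisection (Phi is strictly increasing)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def d_prime_from_rates(H, F):
    """d' = z(H) - z(F)."""
    return probit(H) - probit(F)

def bias_from_rates(H, F):
    """c = -(z(H) + z(F)) / 2."""
    return -(probit(H) + probit(F)) / 2.0

# Forward model, then recover: the roundtrip is exact up to numerics.
d_prime, c = 1.2, -0.4
H = Phi(d_prime / 2 - c)   # z(H) = d'/2 - c
F = Phi(-d_prime / 2 - c)  # z(F) = -d'/2 - c
assert abs(d_prime_from_rates(H, F) - d_prime) < 1e-6
assert abs(bias_from_rates(H, F) - c) < 1e-6
```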

             noncomputable def Core.dPrimeFromRates (hitRate falseAlarmRate : ℝ) : ℝ

            Recover d' from hit and false alarm rates: d' = z(H) - z(F) = Φ⁻¹(H) - Φ⁻¹(F)

            where z = probit is the standard-normal quantile function. Under the symmetric parameterization, z(H) = d'/2 - c and z(F) = -d'/2 - c, so the difference is d' regardless of c.

            Equations
            Instances For
              noncomputable def Core.biasFromRates (hitRate falseAlarmRate : ℝ) : ℝ

              Recover the response bias c from hit and false alarm rates: c = -(z(H) + z(F)) / 2.

              Under the symmetric parameterization, z(H) + z(F) = -2c, so the negated half is c regardless of d'. This is the standard SDT bias diagnostic; together with dPrimeFromRates, the two functions invert the model from observed rates.

              Equations
              Instances For
                @[simp]

                Roundtrip: d' recovered from rates equals the model's d'.

                @[simp]

                Roundtrip: c recovered from rates equals the model's criterion.

                theorem Core.dPrimeFromRates_pos_iff {H F : ℝ} (hH_lo : 0 < H) (hH_hi : H < 1) (hF_lo : 0 < F) (hF_hi : F < 1) :
                0 < dPrimeFromRates H F ↔ F < H

                d' recovered from hit rate H and false alarm rate F is positive iff H > F. This is the fundamental connection between SDT sensitivity and observable discrimination: an observer with d' > 0 can tell signal from noise above chance.

                Operating characteristic β and unbiased observers #

                noncomputable def Core.SDTModel.beta (m : SDTModel) : ℝ

                β (operating likelihood ratio at criterion): β = exp(d' · c).

                The likelihood ratio of the signal vs noise distribution evaluated at the decision criterion c. Standard SDT-observer diagnostic alongside (d', c): β > 1 indicates a conservative observer (prefers "noise"), β < 1 liberal (prefers "signal"), β = 1 unbiased.

                Equations
                Instances For

                  β is always positive (being an exponential).

                  Unbiased observer: criterion = 0 (equivalently β = 1 when d' > 0).

                  Named following mathlib's Is* convention for Prop-valued predicates.

                  Equations
                  Instances For

                    Unbiased implies β = 1. The converse fails when d' = 0 (then β = 1 for any c); for non-trivial discriminators see isUnbiased_iff_beta_eq_one_of_pos.

                    For non-trivial discriminators (d' > 0), unbiased ↔ β = 1.

                    Unbiased ↔ H + F = 1: a clean observable characterization of the unbiased criterion.

                    Under the symmetric parameterization, H + F = Φ(d'/2 - c) + Φ(-d'/2 - c). Writing a = d'/2 - c, the sum is Φ(a) + Φ(-a - 2c); by the reflection identity Φ(a) + Φ(-a) = 1 (which is what the second term collapses to when c = 0), this sum equals 1 exactly when c = 0.

                    The forward direction is easy; the backward needs d' > 0 (otherwise H = F for any c and H + F = 2H constrains H = 1/2 rather than c = 0).
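The observable characterization can be checked numerically (standalone Python sketch, not the library's API):

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def rates(d_prime, c):
    """(hit rate, false-alarm rate) in the symmetric parameterization."""
    return Phi(d_prime / 2 - c), Phi(-d_prime / 2 - c)

# Unbiased (c = 0): H + F = Phi(a) + Phi(-a) = 1 for any d'.
H, F = rates(1.7, 0.0)
assert abs((H + F) - 1.0) < 1e-12

# Biased (c = 0.5) with d' > 0: the sum moves away from 1.
H, F = rates(1.7, 0.5)
assert abs((H + F) - 1.0) > 1e-3
```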

                    Proportion correct exceeds 1/2 for unbiased non-trivial observers.

                    Combining H + F = 1 (from IsUnbiased.hit_plus_fa_eq_one) with tailProb_strictMono applied to -d'/2 < d'/2 (the F < H content of roc_above_diagonal, inlined here to avoid forward reference): proportionCorrect = (H + 1 - F)/2 > 1/2 follows by linarith.

                    ROC curve #

                    noncomputable def Core.rocCurve (dPrime falseAlarmRate : ℝ) : ℝ

                    The ROC curve for a given d': maps false alarm rate to hit rate as the criterion varies. Parameterized by criterion c:

                    (F(c), H(c)) = (1 - Φ(c + d'/2), 1 - Φ(c - d'/2))

                    Eliminating c: H = Φ(Φ⁻¹(F) + d'), equivalently H = 1 - Φ(Φ⁻¹(1 - F) - d'). We define the ROC curve directly as a function of the false alarm rate.

                    Equations
                    Instances For
                      theorem Core.roc_diagonal (f : ℝ) (hf : 0 < f) (hf' : f < 1) :
                      rocCurve 0 f = f

                      For d' = 0, the ROC curve is the diagonal: H = F (chance performance).

                      For d' > 0, the ROC curve lies above the diagonal: H > F.

                      Direct corollary of tailProb_strictMono: the signal+noise mean +d'/2 exceeds the noise mean -d'/2 whenever d' > 0, so the hit-rate tail exceeds the false-alarm tail.

                      Note: monotonicity of the equal-variance AUC Φ(d'/√2) in d' lives in GaussianChoice.lean as SDTModel.twoAFC_mono; the AUC integral identity ∫₀¹ rocCurve d' f df = Φ(d'/√2) (Green & Swets 1966 @cite{green-swets-1966}) is correct but unproved — integrating rocCurve requires additional measure-theoretic infrastructure not currently developed.
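The unproved AUC identity can at least be checked numerically. A standalone Python sketch (`probit` is a hypothetical bisection helper, not the library's quantile function):

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def probit(p):
    """Phi^{-1}(p) by bisection (Phi is strictly increasing)."""
    lo, hi = -10.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def roc_curve(d_prime, f):
    """ROC in criterion-free form: H = Phi(Phi^{-1}(F) + d')."""
    return Phi(probit(f) + d_prime)

d_prime = 1.3
n = 4000
# Midpoint rule for AUC = integral of rocCurve over F in (0, 1).
auc = sum(roc_curve(d_prime, (i + 0.5) / n) for i in range(n)) / n

# Agrees with the equal-variance AUC formula Phi(d'/sqrt(2)) to grid accuracy.
assert abs(auc - Phi(d_prime / sqrt(2.0))) < 1e-3
```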

                      SDT as a Luce model — exact via McFadden's theorem #

                      The SDT signal/noise choice is a binary Gumbel-Luce RUM: with utilities (d' · x, 0) and unit Gumbel scale β = 1, the McFadden integral reduces to the SDT Luce policy exactly. The signal-probability and odds-ratio properties below are immediate corollaries of mcfaddenIntegral_binary and RationalAction.fromGumbelRUM_policy.

                      noncomputable def Core.likelihoodRatio (dPrime x : ℝ) : ℝ

                      The Gaussian likelihood ratio at observation x, given d', in the symmetric parameterization:

                      L(x) = f_{sN}(x) / f_N(x) = exp(d' · x)

                      where f_N ~ N(-d'/2, 1), f_{sN} ~ N(d'/2, 1), and x is the raw internal response (measured from the midpoint between the two means). Derivation: log L(x) = log f_{sN}(x) - log f_N(x) = [-(x - d'/2)²/2] - [-(x + d'/2)²/2] = ((x + d'/2)² - (x - d'/2)²) / 2 = d' · x.

                      The user-facing form is SDTModel.likelihoodRatioAt (namespaced on the model); this bare form takes dPrime as a free parameter and is the underlying definition.

                      Equations
                      Instances For
                        theorem Core.likelihoodRatio_pos (dPrime x : ℝ) :
                        0 < likelihoodRatio dPrime x

                        The likelihood ratio is always positive (being an exponential).

                        noncomputable def Core.SDTModel.likelihoodRatioAt (m : SDTModel) (x : ℝ) : ℝ

                        The Gaussian likelihood ratio at observation x for an SDT model.

                        Equations
                        Instances For

                          Always positive.

                          noncomputable def Core.SDTModel.toLuceAt (m : SDTModel) (x : ℝ) :
                          RationalAction Unit (Fin 2)

                          SDT embedded as a binary Gumbel-Luce RUM (McFadden's theorem).

                          Construction: utilities u(signal) = d' · x, u(noise) = 0, with unit Gumbel scale (β = 1). By RationalAction.fromGumbelRUM, the score is exp(u_i / β) = exp(d' · x) for signal and exp(0) = 1 for noise — i.e., (L(x), 1), the Bayesian-posterior-odds form under uniform prior.

                          The Luce policy P("signal" | x) = L(x) / (L(x) + 1) is then immediate from mcfaddenIntegral_binary (proved as toLuceAt_signal_prob below).

                          Note: the construction depends on m.dPrime and the observation x, not on m.criterion. The criterion enters only at decision time (the observer's response rule); the Luce score structure encodes the Bayesian likelihood, which is criterion-independent.

                          Equations
                          Instances For
                            @[simp]

                            The score on "signal" (action 0) is the likelihood ratio.

                            @[simp]
                            theorem Core.SDTModel.toLuceAt_score_noise (m : SDTModel) (x : ℝ) :
                            (m.toLuceAt x).score () 1 = 1

                            The score on "noise" (action 1) is 1 (the Gumbel-RUM score with utility 0).

                            theorem Core.SDTModel.toLuceAt_odds_ratio (m : SDTModel) (x : ℝ) :
                            (m.toLuceAt x).score () 0 / (m.toLuceAt x).score () 1 = m.likelihoodRatioAt x

                            The Luce odds ratio score(signal) / score(noise) equals the likelihood ratio — the core SDT/Luce identification.

                            Now an immediate corollary of the score-component lemmas.

                            The Luce signal probability L(x) / (L(x) + 1), derived as a corollary of mcfaddenIntegral_binary via RationalAction.fromGumbelRUM_policy.

                            Bayesian posterior interpretation #

                            The SDT Luce policy is exactly the Bayesian posterior on "signal present" under uniform prior odds. Under non-uniform prior π_S for signal, the posterior is π_S · L(x) / (π_S · L(x) + (1 - π_S)) — the same formula but with prior-weighted likelihoods. With π_S = 1/2, this reduces to L(x) / (L(x) + 1) = (m.toLuceAt x).policy () 0.

                            Why posteriorAt is not implemented via PMF.posterior #

                            The SDT observer's hypothesis space is binary ({signal, noise} = Fin 2) and finite, but the observation x : ℝ is continuous (a real-valued internal response). Mathlib's PMF.posterior (Core/Probability/Posterior.lean) operates on countable observation spaces; it does not natively handle continuous observations. The right mathlib substrate for the continuous binary case is MeasureTheory.Kernel + condCDF, which we have not set up here. Until that infrastructure is in place, SDTModel.posteriorAt provides the closed-form Bayes' rule formula directly — not a parallel implementation of PMF.posterior's discrete machinery, but a genuinely distinct abstraction for the continuous case (cf. Finset.sum vs Measure.integral in mathlib: same idea, different abstractions for different domains).

                            noncomputable def Core.SDTModel.posteriorAt (m : SDTModel) (x priorSignal : ℝ) : ℝ

                            The Bayesian posterior P(signal | x) under prior priorSignal ∈ [0, 1] for "signal present", with Gaussian likelihoods determined by m.dPrime.

                            P(signal | x) = π_S · L(x) / (π_S · L(x) + (1 - π_S))

                            by Bayes' rule applied to the binary hypothesis {signal, noise} with likelihood ratio L(x) = f_{sN}(x) / f_N(x) = exp(m.dPrime · x).

                            Equations
                            Instances For
                              theorem Core.SDTModel.posteriorAt_uniform (m : SDTModel) (x : ℝ) :
                              m.posteriorAt x (1 / 2) = (m.toLuceAt x).policy () 0

                              Luce policy = Bayesian posterior under uniform prior.

                              With prior π_S = 1/2, the Bayesian posterior P(signal | x) reduces to L(x) / (L(x) + 1), which is exactly (m.toLuceAt x).policy () 0 by toLuceAt_signal_prob.

                              This is the deepest connection between SDT and Bayesian inference: Luce probability matching on the binary detection task IS Bayes' rule under uniform priors.

                              theorem Core.SDTModel.posterior_gt_half_iff_pos_obs (m : SDTModel) (x : ℝ) (hd : 0 < m.dPrime) :
                              1 / 2 < m.posteriorAt x (1 / 2) ↔ 0 < x

                              Bayes-optimal threshold under uniform prior: the posterior on "signal" exceeds 1/2 iff the observation is positive (when d' > 0).

                              Equivalently: the Bayes-optimal decision rule "respond signal iff posterior > 1/2" coincides with the SDT criterion-zero rule "respond signal iff x > 0" — i.e., criterion = 0 is Bayes-optimal under uniform prior and 0–1 loss.

                              Under non-uniform prior odds π_N / π_S, the optimal threshold is c* = log(π_N / π_S) / d' (UNVERIFIED: standard SDT result, see @cite{green-swets-1966} and @cite{macmillan-creelman-2005} ch. 1; not formalized here — would require continuous Bayesian-decision infrastructure).
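The non-uniform threshold, while not formalized, is easy to check numerically (standalone Python sketch mirroring the posteriorAt formula; not the library's API):

```python
from math import exp, log

def posterior_signal(d_prime, x, prior_signal):
    """pi_S * L(x) / (pi_S * L(x) + (1 - pi_S)) with L(x) = exp(d' * x)."""
    L = exp(d_prime * x)
    return prior_signal * L / (prior_signal * L + (1.0 - prior_signal))

d_prime = 1.5

# Uniform prior: the posterior crosses 1/2 exactly at x = 0.
assert posterior_signal(d_prime, 0.01, 0.5) > 0.5
assert posterior_signal(d_prime, -0.01, 0.5) < 0.5

# Non-uniform prior pi_S = 0.3: the crossing moves to c* = log(pi_N / pi_S) / d'.
pi_S = 0.3
c_star = log((1.0 - pi_S) / pi_S) / d_prime
assert posterior_signal(d_prime, c_star + 0.01, pi_S) > 0.5
assert posterior_signal(d_prime, c_star - 0.01, pi_S) < 0.5
```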

                              Logistic approximation constant #

                              The SDT hit rate Φ(d'/2 - c) is well-approximated by logistic(k · (d'/2 - c)) where k = π/√3 ≈ 1.814 is the variance-matching constant:

                              Φ(x) ≈ logistic(x · π/√3) with max error ≈ 0.023.

                              This is the Thurstone-Luce bridge for the detection context: both SDT (Gaussian noise) and the Gumbel-Luce model (Gumbel noise) are Random Utility Models. The Gumbel-Luce model gives exactly logistic probabilities (McFadden's theorem; see gumbelMaxProb_is_mcfaddenIntegral in GumbelLuce.lean). The Gaussian model gives Φ. These agree up to the numerical approximation Φ ≈ logistic.

                              The constant k = π/√3 equals thurstoneLuceK(1/√2), unifying the SDT and Thurstone parameterizations — proved as logisticApproxConst_eq_thurstoneLuceK in GaussianChoice.lean.

                              The variance-matching constant π/√3 ≈ 1.8138 is exact (logistic has variance π²/3, so scaling Φ by π/√3 matches unit-variance normal). The optimal sup-norm constant for Φ(x) ≈ σ(k · x) is approximately 1.7009 (Page 1977). We use π/√3 rather than the sup-norm optimum because it has a clean variance-matching derivation and equals thurstoneLuceK(1/√2). UNVERIFIED: the sup-error figure ≈ 0.023 is quoted from secondary sources; verify against Bowling et al. 2009 or Page 1977 before relying on it.
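A grid search supports the quoted sup-error figure (standalone Python; a numeric check, not a proof):

```python
from math import erf, exp, pi, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def logistic(t):
    return 1.0 / (1.0 + exp(-t))

k = pi / sqrt(3.0)  # variance-matching constant, ~1.8138

# Sup-norm error of Phi(x) ~ logistic(k * x) on a fine grid over [-5, 5].
max_err = max(abs(Phi(i / 1000.0) - logistic(k * i / 1000.0))
              for i in range(-5000, 5001))

# The maximum is ~0.023, attained near |x| ~ 0.7.
assert 0.021 < max_err < 0.024
```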

                              noncomputable def Core.logisticApproxConst : ℝ

                              The logistic approximation constant: k = π/√3 ≈ 1.814.

                              The variance-matching scale factor between the standard normal and standard logistic distributions: the standard logistic has variance π²/3, so Φ(x) ≈ logistic(x · π/√3).

                              Equations
                              Instances For

                                The logistic approximation constant is positive.