BLOG Language results off by a small fraction

Simon O'Doherty :

I am trying out the Bayesian-Logic language using the following example.

  • 1% of women have breast cancer (and therefore 99% do not).
  • 80% of mammograms detect breast cancer when it is there (and therefore 20% miss it).
  • 9.6% of mammograms detect breast cancer when it’s not there (and therefore 90.4% correctly return a negative result).

I created the following code:

random Boolean Has_Cancer ~ BooleanDistrib(0.01);
random Boolean Detect_Cancer ~ 
    if Has_Cancer then BooleanDistrib(0.8)
    else BooleanDistrib(0.096);

obs Detect_Cancer = true;

query Has_Cancer;

When I run it I get the following results:

======== Query Results =========
Number of samples: 10000
Distribution of values for Has_Cancer
    false   0.9245347606896278
    true    0.07546523931038764
======== Done ========

According to the blog post, true should be 0.0776.

When I run with 100 samples I get this:

======== Query Results =========
Number of samples: 100
Distribution of values for Has_Cancer
    false   0.9223602484472041
    true    0.077639751552795
======== Done ========

I am just trying to understand why.

merv :

The values BLOG generates are point estimates after generating random samples from the conditioned probabilistic graphical model using the Likelihood-Weighting Algorithm (LWA). The differences from the analytic values in the example post are likely due to noise from the random sampling process.
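For reference, the analytic value from the post follows directly from Bayes' rule applied to the three numbers in your model. A quick check in Python (variable names are mine, not from BLOG):

```python
# Posterior P(cancer | positive test) via Bayes' rule
p_cancer = 0.01            # prior: 1% of women have breast cancer
p_pos_given_cancer = 0.80  # sensitivity: 80% of cancers are detected
p_pos_given_no = 0.096     # false-positive rate: 9.6%

# Total probability of a positive mammogram
p_pos = p_cancer * p_pos_given_cancer + (1 - p_cancer) * p_pos_given_no

posterior = p_cancer * p_pos_given_cancer / p_pos
print(round(posterior, 4))  # 0.0776
```

Your 100-sample run landing almost exactly on 0.0776 is a coincidence of the fixed seed; the 10K-sample estimate is just as valid a draw from the sampling distribution.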

What can be confusing, though, is that BLOG defaults to initializing the random number generator with the same fixed seed, so the results misleadingly appear deterministic. If you add the --randomize flag to the run invocation, you will see the results under other random seeds.
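To make the sampling noise concrete, here is a minimal sketch of likelihood weighting for this two-node model. This is my own toy reimplementation of the idea, not BLOG's actual code: sample the latent variable from its prior, then weight each sample by the likelihood of the observed evidence (Detect_Cancer = true).

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

N = 10000
# Sample Has_Cancer from its prior, Bernoulli(0.01)
has_cancer = rng.random(N) < 0.01

# Weight each sample by P(Detect_Cancer = true | Has_Cancer)
weights = np.where(has_cancer, 0.8, 0.096)

# Weighted estimate of P(Has_Cancer = true | Detect_Cancer = true)
estimate = weights[has_cancer].sum() / weights.sum()
print(estimate)  # fluctuates around 0.0776 across seeds
```

Rerunning this with different seeds gives estimates scattered around the analytic 0.0776, which is the same behavior you are seeing in BLOG.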

I don't know the theoretical properties of LWA (e.g., how tightly it bounds posterior means), but at least for a naive generative sampling scheme, the estimates you are getting are well within a 95% CI. Here's a Python example simulating 1000 runs of 10K samples each.

import numpy as np
from scipy.stats import binom

np.random.seed(2019)

N, K = 10000, 1000   # N samples per run, K simulated runs
tp = np.empty(K)     # posterior estimate from each run

for i in range(K):
    t = binom(n=N, p=0.01).rvs()           # women with cancer
    f = N - t                              # women without cancer
    detect_t = binom(n=t, p=0.800).rvs()   # true positives
    detect_f = binom(n=f, p=0.096).rvs()   # false positives
    tp[i] = detect_t / (detect_f + detect_t)  # P(cancer | positive test)

np.quantile(tp, [0.025, 0.5, 0.975])
# array([0.06177242, 0.07714902, 0.09462359])
