And once again we have an exponential bound on the error, which is written over here. And once again, the number of samples appears in the exponent, which makes us happy.

We have the same type of epsilon squared term, which shows the dependence on the tolerance that's required. But here we have this one other term over here, which is the actual value p that we're trying to estimate. So these give us a bound on the error, that is, a bound on how far you are away from p: a bound on the tolerance epsilon, and also a bound on the probability with which you get a bad sample set.

We can now say, for a given tolerance that we want (I'm sorry, for a given error probability that we want, that is): if we want to guarantee that we're within epsilon, with probability greater than one minus delta, we need this many samples. And that's fairly straightforward algebra. You simply say this is less than or equal to delta, then take logs and move things around, and it all comes out to be exactly this. And for the Chernoff bound we have a similar expression, and that gives us exactly the same kind of bound on M as a function of epsilon and delta, and in this case p as well.
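As a sketch of this algebra, the required sample sizes can be computed directly from common two-sided textbook forms of these bounds: 2·exp(−2Mε²) ≤ δ for the additive (Hoeffding-style) bound and 2·exp(−Mpε²/3) ≤ δ for the relative (Chernoff-style) bound. The constants vary somewhat by source, so treat these as illustrative:

```python
import math

def hoeffding_samples(epsilon, delta):
    """Samples M so that P(|estimate - p| > epsilon) <= delta,
    solved from 2 * exp(-2 * M * epsilon**2) <= delta."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

def chernoff_samples(epsilon, delta, p):
    """Samples M so the estimate is within relative error epsilon of p
    with probability >= 1 - delta, solved from
    2 * exp(-M * p * epsilon**2 / 3) <= delta.
    Note that p ends up in the denominator."""
    return math.ceil(3 * math.log(2 / delta) / (p * epsilon ** 2))

# Tightening epsilon grows M quadratically in 1/epsilon.
print(hoeffding_samples(0.01, 0.05))
print(chernoff_samples(0.01, 0.05, p=0.5))
```

Taking logs of the bound and solving for M is exactly the "move things around" step from the lecture.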

So this looks great, right? You give me an epsilon, which is your error tolerance, and a delta, which is the probability of being wrong that you're willing to accept, and I can say that if you sample this many samples M, then I can give you those probabilistic guarantees. They're not deterministic, but they're pretty solid. Why is this not a perfect solution to our inference problems? Because each of these has significant

limitations. So, let's think about the first, which is our additive bound. And let's imagine that you're going in to a doctor and you're saying, you know, what is the probability that I have some horrible disease? Well, that probability, hopefully for you, is still pretty low. So maybe, if you're unlucky, it's, I don't know, 1% or 2%. Well, consider an additive error epsilon on 1%. The epsilon that you need to get something that's meaningfully bounded when the true probability p is 1% needs to be really, really small. You can't do epsilon equals 0.01, because that could move you up from 1% to 2%, and that's a factor of two increase in your probability. And that's something that people really care about, the difference between 1% and 2%. So you'd need something that's more like 0.0001, or maybe 0.00001, depending on, you know, how confident you wanted to feel. And now this epsilon squared over here is beginning to look pretty, pretty scary in terms of the number of samples that are required to get a reasonable bound.

So you might come and say, well, fine, let's use the Chernoff bound, because that gives me errors that are relative, sorry, relative to p. So now if p is small, then by all means I can just use a multiplicative factor of p, and say p plus or minus 1% of p. Well, unfortunately, there's no free lunch, because if p is small, notice that it appears here in the denominator, and so once again we need a number of samples that could potentially be quite large when we're dealing with small probabilities.
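To put rough numbers on the doctor example, here is a quick sketch using common textbook forms of the two bounds, 2·exp(−2Mε²) ≤ δ for the additive case and 2·exp(−Mpε²/3) ≤ δ for the relative case (the constants vary by source):

```python
import math

delta = 0.05  # allow a 5% chance of a bad sample set

# Additive bound: for p around 1%, a meaningful absolute tolerance
# has to be tiny, e.g. epsilon = 0.0001.
eps_add = 0.0001
m_additive = math.log(2 / delta) / (2 * eps_add ** 2)

# Relative bound: epsilon is now 1% *of p*, but p = 0.01 appears
# in the denominator of the sample-size expression.
p, eps_rel = 0.01, 0.01
m_relative = 3 * math.log(2 / delta) / (p * eps_rel ** 2)

print(f"additive bound: ~{m_additive:.1e} samples")  # hundreds of millions
print(f"relative bound: ~{m_relative:.1e} samples")  # tens of millions
```

Either way, small probabilities push the sample count into impractical territory, which is exactly the point of the lecture.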

So the main message from this is that sampling-based estimation is a reasonable thing to do when p is not too small. When p is not too small, this works fine; as p begins to get smaller, the tractability of this is more in doubt.

Now that we understand when we might expect this to work let's think about how

we might apply it in the context of Bayesian Networks.

So here is our little baby network that we've used for the student example.

And what we'd like to do is we'd like to generate samples from this distribution.

So this is the distribution, remember, P(D, I, G, S, L).

And we'd like to sample from this high dimensional distribution.

Not that high, but still. And the way in which this is done is actually very natural when you think about the causal intuitions, or the forward flow, of a Bayesian network.

We start out, say, by sampling the difficulty variable.

And the difficulty variable is sampled from this distribution over here, which is 0.6 versus 0.4. And so we use the trick that we showed a couple slides ago, and say it comes out to be zero.

I'm going to write down the zero over here.

Now I'm going to sample I: I'm going to toss a coin with probability 0.7, 0.3, and say it comes out i1. Now I get to sample grade. And because I previously sampled difficulty and intelligence, I know exactly which distribution grade needs to be sampled from: it's the distribution for i1, d0. And so I now sample one of the three values of grade from the distribution in this row that I've picked out in the CPD. And say it comes out g1.
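The forward-sampling procedure just described can be sketched as follows. Only P(d0) = 0.6 and P(i0) = 0.7 are taken from the lecture; all the other CPD entries below are illustrative placeholders, not the actual numbers from the slide:

```python
import random

# CPDs for the student network D, I, G, S, L.
# P(D) and P(I) come from the lecture; the rest are made-up placeholders.
P_D = [0.6, 0.4]                                        # P(D)
P_I = [0.7, 0.3]                                        # P(I)
P_G = {                                                 # P(G | I, D), 3 grades
    (0, 0): [0.30, 0.40, 0.30],
    (0, 1): [0.05, 0.25, 0.70],
    (1, 0): [0.90, 0.08, 0.02],
    (1, 1): [0.50, 0.30, 0.20],
}
P_S = {0: [0.95, 0.05], 1: [0.20, 0.80]}                # P(S | I)
P_L = {0: [0.10, 0.90], 1: [0.40, 0.60], 2: [0.99, 0.01]}  # P(L | G)

def sample_categorical(probs, rng):
    """The coin-toss trick from a couple slides ago: draw a uniform u
    and return the first index whose cumulative probability exceeds u."""
    u = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if u < cum:
            return i
    return len(probs) - 1

def forward_sample(rng):
    """Sample each variable in topological order, conditioning each CPD
    on the values already drawn for its parents."""
    d = sample_categorical(P_D, rng)
    i = sample_categorical(P_I, rng)
    g = sample_categorical(P_G[(i, d)], rng)  # row picked out by (i, d)
    s = sample_categorical(P_S[i], rng)
    l = sample_categorical(P_L[g], rng)
    return {"D": d, "I": i, "G": g, "S": s, "L": l}

sample = forward_sample(random.Random(0))
print(sample)
```

Because parents are always sampled before children, each variable is drawn from exactly one row of its CPD, just as in the walkthrough above.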