0:00

One of the most common applications of Bayesian networks, or rather one of the earliest ones that are still very much in use today, is for the purpose of diagnosis. And by diagnosis I mean both medical as well as fault diagnosis. This dates back to the early 90s, to the PhD thesis of David Heckerman, which won the ACM Doctoral Dissertation Award, on a system called Pathfinder, which looked at a range of different pieces of evidence in order to help a doctor diagnose a set of diseases. Specifically, it was focused, initially at least, on lymph node pathology: about 60 different diseases, and all sorts of different symptoms.

And they tried out a bunch of different methods for solving this problem. The first one they actually tried, and this was way back in the early days of artificial intelligence, before Bayesian networks were in common use, was a rule-based system, and it didn't work very well. The second version of Pathfinder used the naive Bayes model, which assumes that all of the symptoms are independent given the disease, and even that really simple model got superior performance to the rule-based system that they initially tried. Pathfinder 3 still used naive Bayes, but naive Bayes with better knowledge engineering: that is, they actually understood some of the issues behind what makes a system like this work well, and they fixed them.

So specifically, one of the things that turns out to be really fundamental for the performance of any probabilistic modeling system is never to put in zero probabilities, except for things that are definitional. Because once you put in a zero, no matter how much evidence to the contrary you have, you will never be able to get rid of it: anything times zero is still zero. And so in the initial Pathfinder tool they put in some incorrect zero probabilities for things that were very unlikely, but not impossible. And it turns out that that gave rise to about 10% of the incorrect diagnoses of the system.

They also did better calibration of conditional probabilities, which turns out to be important for knowledge engineering of a Bayesian network. So, for example, it turns out that it's a lot easier for a physician to compare the probability of a finding, a piece of evidence, between two diseases, as opposed to comparing the probabilities of two different findings within a single disease. It's much easier to say, oh, this is much more likely in this context than in that context. And it turns out that when they asked the physicians to calibrate this way, they got much better estimates of the probabilities. Mind you, this was way before they had learning, so it was all hand constructed.
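The zero-probability pitfall falls straight out of Bayes' rule: a hypothesis with zero prior (or zero likelihood) stays at zero after any amount of contrary evidence. A toy sketch, with invented disease names and numbers:

```python
# An incorrect hard zero can never be overcome by evidence: the
# posterior stays at exactly zero no matter how many findings point
# the other way, because zero times anything is zero.

def bayes_update(prior, likelihood):
    """One Bayes-rule update over a dict of mutually exclusive hypotheses."""
    post = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

# Each of 20 findings is 9x more likely under rare_disease.
evidence = {"rare_disease": 0.9, "common_disease": 0.1}

hard_zero = {"rare_disease": 0.0, "common_disease": 1.0}  # the Pathfinder mistake
for _ in range(20):
    hard_zero = bayes_update(hard_zero, evidence)

tiny_prior = {"rare_disease": 1e-6, "common_disease": 1 - 1e-6}
for _ in range(20):
    tiny_prior = bayes_update(tiny_prior, evidence)

print(hard_zero["rare_disease"])    # still exactly 0.0
print(tiny_prior["rare_disease"])   # ~1.0: the evidence wins
```

With even a tiny nonzero prior, twenty pieces of evidence at 9-to-1 odds overwhelm it completely; with a hard zero, they never can.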

Â 2:46

And then finally, Pathfinder 4 was the full Bayesian network in all of its full glory. It no longer made incorrect assumptions about independencies between, say, different symptoms given the disease. And that both allowed them to make the model more correct, and it also turned out to have an unexpected side effect: by allowing, say, a symptom variable to have more parents than just a single disease variable, it actually gave rise to considerably more accurate estimation of the probabilities, because the doctor could think about different cases separately and didn't have to average them all out in his head.

Â 3:28

And this is one of the, I think, really compelling aspects of Bayesian network models: the Bayesian network model actually turned out to agree with an expert panel of physicians in 50 out of the 53 cases. And these were hard cases. These were ones that you really needed the experts' opinions on; it wasn't something that just an average doctor could necessarily diagnose correctly. And this is as compared to 47 out of 53 for the naive Bayes model, and significantly less than that for the rule-based system. Mind you, and this is an interesting and important aspect, the Bayesian network actually outperformed the physician who designed the model. And I mean, it didn't just outperform the expert by a little bit; it outperformed the physician who designed it. Because it was better at putting together all of these different numbers, in a way that a doctor just can't: a doctor can't fit all of these different findings into his or her brain at the same time.

Â 4:34

So we talked about the CPCS network. It's one of my favorite networks because it's kind of big and hairy and sort of scary to look at. But anyway, the actual number of variables in this network is about 500, and each of them has, on average, about four values. So the total number of parameters, if you were to specify a full joint distribution, is 4 to the 500. That's about 4 to the 500, or 2 to the power of 1,000, which is more than the number of atoms in the universe. So obviously one couldn't specify this as a complete joint distribution. Not to mention that the probability of each and every one of these entries is as close to zero as makes no difference, because it's the probability of an event involving 500 variables. If you were to actually construct a CPD for each of these variables, the total number of parameters would be about 133 million, which is considerably better than 2 to the 1,000, but still much too large. And so it turns out that they made additional simplifying assumptions, that we'll talk about later on, that allowed them to avoid a complete table representation of the CPDs and rather do a more compact one, and that gave rise to about 1,000 parameters.
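The parameter counts quoted here are easy to check. The per-variable parent count below is an invented illustration chosen to land in the right ballpark, not the real CPCS structure:

```python
# Reproducing the parameter counts for a CPCS-style network:
# 500 variables, about 4 values each.
n_vars, n_vals = 500, 4

joint = n_vals ** n_vars            # entries in a full joint table
assert joint == 2 ** 1000           # 4^500 = (2^2)^500 = 2^1000
print(len(str(joint)))              # the count has over 300 digits: hopeless

# With tabular CPDs, a variable with k parents (all 4-valued) needs
# 4^k * (4 - 1) independent parameters. An assumed average of ~8
# parents per variable gives the same order of magnitude as the
# ~133 million quoted in the lecture:
tabular = n_vars * (n_vals ** 8) * (n_vals - 1)
print(tabular)                      # 98,304,000
```

The gap between 2^1000 and roughly 10^8 is what the network structure buys you; the further drop to about 1,000 parameters comes from the compact CPD representations mentioned above.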

Â 6:15

So we already talked about the fact that these medical diagnosis systems have emerged from research. Microsoft built a medical diagnosis system, and various other people have built them as well. This has been a little bit slow on the uptake in the medical field, because it doesn't fit naturally into a physician's pipeline. Maybe now, with the advent of electronic health records, there will be more data entered into the computers, so these systems will come into more common use. But until very recently, most doctors just wrote stuff down on paper, and so it was very difficult to put this into the standard production pipeline for diagnosis.

And then, finally, fault diagnosis has been a much more direct application of these systems, because here we don't have the issue of how doctors fit this into their diagnostic pipeline. So within the Windows operating system there are thousands of these little troubleshooters that help you diagnose problems with your printer, with Excel, with your email, and each of these has a little Bayesian network inside that answers probability questions given observations about the model involved, for example the printer. And there's also a big website out there that does car repair: you put in the make, model, and year of the car, and what the main problems with it are, and it figures out and tells you what to look at and what the most likely complaint is.

And the reason behind this, the benefit of this: people don't use Bayesian networks for this just because Bayesian networks are cool, even though they are. They use them because they provide a very flexible user interface for the user.

You instantiate the evidence in the Bayesian network, and out comes a probability. You don't want to answer a question right now? That's okay. You can answer it later. It just means that it's an observation that you didn't condition on.
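That flexibility, answer what you know and skip the rest, can be sketched with a tiny two-finding troubleshooter. The printer model below (structure and numbers) is entirely invented:

```python
# A troubleshooter accepts whatever evidence the user supplies, in any
# order; skipped questions are simply left unobserved. Toy model:
# Fault -> Streaks, Fault -> Noise (all numbers invented).

priors    = {"paper_jam": 0.4, "low_toner": 0.6}    # P(fault)
p_streaks = {"paper_jam": 0.1, "low_toner": 0.9}    # P(streaks | fault)
p_noise   = {"paper_jam": 0.8, "low_toner": 0.2}    # P(noise | fault)

def diagnose(streaks=None, noise=None):
    """Posterior over faults; None means the question was skipped."""
    scores = {}
    for f in priors:
        p = priors[f]
        if streaks is not None:                     # only condition on answers
            p *= p_streaks[f] if streaks else 1 - p_streaks[f]
        if noise is not None:
            p *= p_noise[f] if noise else 1 - p_noise[f]
        scores[f] = p
    z = sum(scores.values())
    return {f: p / z for f, p in scores.items()}

print(diagnose())                       # no answers yet: just the prior
print(diagnose(noise=True))             # one answer, the other still skipped
print(diagnose(noise=True, streaks=False))
```

The same model answers all of these queries; no fixed question order is baked in, which is exactly the contrast with a hard-wired menu tree.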

Â and then for the designer, this type of system is really easy to design and

Â maintain. Because if, for example, something

Â changes a little bit in your printer structure.

Â If you were to design a standard menu-based system.

Â You would have to go and rebuild the entire tree that asks, you know?

Â What, what is the, what, that decides what is the first question to ask?

Â And what is the second question to ask? And what is the most likely diagnosis?

Â Here in the Bayesian network. You change one probability, maybe add an

Â edge, and everything just emerges from that in a very straight forward way.

Â So it's much more modular and more maintainable than, than a hard-wired

Â menu-based system. And that's what the people who use these

Â systems will tell you, that's why that's why they chose this path as opposed to

Â as opposed to the hard-wired methodology.
