0:03

Hi, my name is Stephanie and I'm a PhD student at Duke in the Department of Statistical Science.

>> I'm Willem van den Boom, also a PhD student in the Department of Statistical Science. And here we have Jim Berger, who is a professor in our department as well. To start off with, how did you get interested in Bayesian statistics?

0:23

>> That was a long time ago. My original research was in shrinkage estimation, which arose from a famous result of Stein's about 50 years ago, where Stein discovered that the very commonly used least squares estimator of, say, regression parameters was not optimal from a frequentist perspective if there were three or more regression parameters.

0:44

This caused a big flurry in the community, and there was a lot of research on it. The curious part about this shrinkage estimator is that you had to shrink, say, the least squares estimator towards some point. As soon as I started doing research, the question arose: which point do I shrink to? I could pick any point. Quickly I started saying, well, you have to use Bayesian prior knowledge to decide where to shrink, and that was my first interest in Bayesian statistics.
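The shrinkage just described can be sketched with the James-Stein estimator, which pulls a least-squares-style estimate toward a chosen point; the numbers and the shrink-to point below are hypothetical, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: five unknown means, one noisy observation of each.
theta = np.array([2.0, 2.0, 2.0, 2.0, 2.0])   # true parameters (p >= 3)
x = rng.normal(theta, 1.0)                     # least-squares / maximum-likelihood estimate
m = np.zeros_like(x)                           # the point we choose to shrink toward

# James-Stein (positive-part) estimator: pull x toward m by a data-dependent factor.
p = len(x)
factor = max(1 - (p - 2) / np.sum((x - m) ** 2), 0.0)
js = m + factor * (x - m)

# Stein's surprise: for p >= 3, js beats x in expected squared error no matter
# which fixed point m you shrink toward -- but a well-chosen m gives the
# biggest gains.
```

Choosing `m` is exactly where prior knowledge enters: shrinking toward a plausible value of the parameters is what makes the estimator genuinely useful, which is the Bayesian connection Berger describes.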

1:10

>> I see. In your Decision Theory book, you mentioned that through writing the book you became a rabid Bayesian. Why is that?

>> That was more for, I guess I would call it, philosophical reasons. I was looking at the foundations of statistics, and one of the things I encountered was something called the Likelihood Principle, which is my favorite principle in statistics. Basically, what it implies is that the common frequentist measures of statistics are not appropriate. Now, that by itself is not a big deal, except for the fact that the Likelihood Principle was shown to follow from two other principles which everybody believed in. One was the sufficiency principle, and the other was the conditionality principle. The conditionality principle basically says: suppose you're given two different measuring instruments, you measure something with one of them, and one has variance 1 while the other has variance 3. If you get the measuring instrument with variance 3, do you report variance 3 as your error, or do you say, I could have gotten the other instrument, and report (1 + 3)/2? That second option seems nuts to everybody. Obviously you should use the measuring instrument that you were given. But if you simply believe that, then together with sufficiency it implies the Likelihood Principle, which in turn more or less throws out all of frequentist statistics. And so at that point I was astonished, and sort of began to more heavily embrace the Bayesian world and started using it in applications.
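The measuring-instrument story reduces to simple arithmetic; a minimal sketch, using the variances from the example above:

```python
# Two measuring instruments; chance (say, a coin flip) decides which one you get.
var_a = 1.0   # variance of the first instrument
var_b = 3.0   # variance of the second instrument

# Conditional report: you actually got instrument B, so report its variance.
conditional_error = var_b                     # 3.0

# Unconditional report: average over the instrument you *might* have gotten.
unconditional_error = (var_a + var_b) / 2     # 2.0

# The conditionality principle says to report 3.0: base the error measure on
# the experiment actually performed, not on experiments that never happened.
```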

>> So what applications have you found where Bayesian statistics proved useful?

3:02

>> One, Bayesian analysis is often associated with the use of prior information, like I just mentioned. And one application, as an example, that I did 20 years ago was a problem where you had to use prior information to solve it. It was a problem we worked on with, actually, all of the automotive companies in Michigan: trying to assess how much fuel economy gain was possible, which had to do with government regulations. We built a huge, complicated hierarchical model with all sorts of different vehicles, all sorts of different manufacturers, all sorts of different car parts that might affect fuel efficiency. And then, after we had done this, we found that there was a part of the problem we had no data about. And so we had to go back to the engineers and do about two months of elicitation of their expert knowledge in order to complete the problem. So here was a problem that would have been impossible to do without Bayesian analysis, both because the subjective elicitation was necessary and because the super complex hierarchical model could not have been handled in a non-Bayesian way.

A second class of problems, I would say, are problems that are simply easier to do in a Bayesian way.

4:26

Variance component models, for instance, are very prominent in, say, the chemical industry. With those models it's very straightforward to do a Bayesian analysis, but it's very difficult to do other kinds. For instance, maximum likelihood is a very common alternative approach

4:44

that's used by non-Bayesians. But the likelihoods in variance component problems are very nasty. You often have modes at zero and things like that, which make them really difficult to analyze. So there are a lot of problems that fall in that category, which are so much simpler to do in a Bayesian way.

A third class I guess I would call understanding statistics. When you get to problems like testing, non-statisticians don't understand what common statistical tests mean, for instance the use of p-values. They don't understand what that means. The Bayesian answer, like the posterior probability that the hypothesis is true, is so much simpler and easier to understand. And I think using Bayesian analysis on these problems just makes all the non-statisticians more capable of doing good statistics, because they understand it.

Then there are entire fields of research that are almost entirely Bayesian. The one I've been working in for the last ten years is called uncertainty quantification of simulation models. And that's become essentially completely Bayesian, because it's essentially the only way to do the problem.

>> So what is uncertainty quantification of simulations?

6:23

>> I started working on this, again, with the car companies, with General Motors, where the goal was essentially to create, in the computer, models of vehicles and the different parts of vehicles, so they could experiment on the computer rather than having to build $200,000 vehicle prototypes to test.

6:41

So a simulation model in this vein is a big, usually applied-math, computer program that's very intensive to run. And the question is: is this computer model a good representation of reality or not? To answer that question you have to involve all sorts of data and all sorts of statistics, and you just can't do it in a non-Bayesian way.

7:06

>> How can misuse of p-values contribute to the lack of reproducibility of research?

>> I mentioned that in testing, p-values are not well understood. If you ask a non-statistician what a p-value of 0.05 means, they will typically tell you that it means some kind of error: the probability that you're making a mistake in rejecting the hypothesis, or, they might say, the probability that the hypothesis is true. They always think of it as an error. It's not. It's something completely different. And if you look at true errors, either Bayesian or what are called conditional frequentist, a p-value of 0.05 corresponds to a true error of about 0.3.
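That figure of roughly 0.3 can be reproduced with the p-value calibration −e·p·log(p) of Sellke, Bayarri, and Berger, which lower-bounds the Bayes factor in favor of the null hypothesis; a sketch (the equal-prior-odds conversion to an error probability is an assumption of this illustration):

```python
import math

def min_bayes_factor(p):
    """Lower bound -e * p * log(p) on the Bayes factor in favor of the null,
    valid for p < 1/e (the Sellke-Bayarri-Berger calibration)."""
    assert 0.0 < p < 1.0 / math.e
    return -math.e * p * math.log(p)

def min_error_prob(p):
    """Corresponding lower bound on the conditional error probability: the
    chance the null is true given rejection, assuming equal prior odds."""
    b = min_bayes_factor(p)
    return b / (1.0 + b)

print(round(min_error_prob(0.05), 2))   # -> 0.29: close to 30%, not 5%
```

So even under a bound that is as favorable as possible to rejecting, a p-value of 0.05 cannot correspond to an error rate much below 0.29.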

So now you see the problem. If a scientist publishes a paper that rejects the hypothesis, and so proves the theory they're trying to establish, and the p-value is 0.05, they think they have only a 5% chance of being wrong. Actually, it's a 30% chance of being wrong. And so a lot of the papers published in science are simply wrong for this reason. In science, that often will get corrected, because if it's an interesting result, somebody will go back, look at it, redo the experiment, and discover that it really wasn't a true theory.

8:34

In industry, however, experiments are not reproduced, for reasons of cost. And so you really have to get it right the first time. So in industry, if p-values are misinterpreted, it can be really dangerous and financially costly. I think there are especially strong reasons there to use Bayesian methods.

8:54

>> Great, well, thank you for letting us interview you today. That was all very interesting and useful. Thank you.

>> Well, my pleasure, and I hope the rest of the course goes very well.