0:00

[SOUND] Welcome back to Part II of the lesson.

In this lesson, we're going to look at an application of the data science approach

to extracting structure-property linkages.

So now on to Step 3.

[COUGH] In this step, we're trying to connect the principal components

we established in step two to the properties we established in step one.

We're going to take a very simple approach, and we're going to try and

connect these variables through a very simple polynomial expression.

In this expression, p1 is principal component score 1.

p2 is principal component score 2.

The table here shows the expressions we actually used.

The property of interest, let's say,

is the yield point, the effective yield point, and for

this property of interest, it turns out that we only need to use three terms.

A constant term, a linear term in principal component 1, and

a term that has principal component 1 times principal component 2, so

it's something like this term.
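
Written out, the three-term expression just described might look like the following. The coefficient symbols a_0, a_1, a_2 and the symbol used for the effective yield point are placeholders assumed here for illustration; the fitted values themselves are in the table on the slide and are not reproduced here.

```latex
\sigma_y^{\mathrm{eff}} \approx a_0 + a_1\, p_1 + a_2\, p_1 p_2
```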

1:09

Whereas if you look at another property,

like the localization propensity, it turns out we need to use a lot more terms.

The decision on how many terms we use for

each case really depends on the error in the linkage,

and more importantly, the cross validation error.

Again, we discussed leave-one-out cross validation in a previous lesson.

1:34

And to remind you, what it does is tell us when we have an overfit.

In other words, if we use too many terms here, we'll always get a lower fitting error, but

there is a danger that we actually get a much higher leave-one-out

cross validation error, which essentially tells you that we have overfit.

As a particular example in this case, we notice that this particular choice,

five principal components and a polynomial degree of 3,

gives us the best leave-one-out cross validation error.

2:11

In general, as the number of components and

the polynomial degree go up, the fitting error always goes down.

So you might be tempted to use as many terms as you can.

But in this case, you can see that with five principal components,

if I choose a fourth-degree polynomial instead of a third-degree polynomial,

the leave-one-out cross validation error goes up.

That means that the model now is very sensitive to the data.

And we want to avoid that.

So that's how we make the decision on which terms to keep and

which terms to drop.
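
To make the selection procedure concrete, here is a minimal sketch of comparing polynomial degrees by fitting error and by leave-one-out cross validation error. It is not the code used in the lesson: the data are synthetic, only two principal component scores are used, and the helper names (`poly_features`, `training_mse`, `loocv_mse`) are hypothetical and introduced only for illustration.

```python
import itertools
import numpy as np

def poly_features(P, degree):
    """Build all monomials of the columns of P up to the given total degree."""
    n, k = P.shape
    cols = [np.ones(n)]  # constant term
    for d in range(1, degree + 1):
        for combo in itertools.combinations_with_replacement(range(k), d):
            cols.append(np.prod(P[:, list(combo)], axis=1))
    return np.column_stack(cols)

def training_mse(X, y):
    """Mean squared error of an ordinary least-squares fit on all the data."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((X @ coef - y) ** 2)

def loocv_mse(X, y):
    """Leave-one-out error: refit without point i, then predict point i."""
    n = len(y)
    errors = []
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errors.append((X[i] @ coef - y[i]) ** 2)
    return np.mean(errors)

# Synthetic stand-in data (assumption): 30 samples, 2 PC scores, and a
# property that truly depends on a constant, p1, and p1*p2, plus noise.
rng = np.random.default_rng(0)
P = rng.normal(size=(30, 2))
y = 1.0 + 2.0 * P[:, 0] + 0.5 * P[:, 0] * P[:, 1] + 0.1 * rng.normal(size=30)

results = {}
for degree in range(1, 5):
    X = poly_features(P, degree)
    results[degree] = (training_mse(X, y), loocv_mse(X, y))
    print(degree, X.shape[1], results[degree])

# Keep the degree with the lowest leave-one-out error, not the lowest fit error.
best_degree = min(results, key=lambda d: results[d][1])
print("selected degree:", best_degree)
```

The fitting error can only go down as terms are added, because each higher-degree model contains the lower-degree one. The leave-one-out error has no such guarantee, which is why it is the number to watch when deciding where to stop.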

And this table now describes

the models we established using the data science approach.

Now let's see how well they performed.

To see how well they performed, we make this plot,

where we cross plot the predictions from the simple model against the actual

data, which came from the simulation results.

So in some sense, we think of the simulation results as ground truth, and

we're looking at how well the predictions captured the simulations.

If everything is done well, in other words, if our models are really good,

all the data should be along this line, the black line.

If in fact, all the data is exactly on the black line,

then we have a really good model.
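
One way to put a number on how close the points sit to that line is sketched below. This is an illustrative assumption, not a metric quoted in the lesson: the function name `parity_metrics` is hypothetical, and it reports the mean absolute deviation from the line together with a coefficient of determination measured against the line predicted = actual.

```python
import numpy as np

def parity_metrics(actual, predicted):
    """Summarize scatter about the parity line (predicted == actual)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = predicted - actual
    mae = np.mean(np.abs(residual))          # average distance from the line
    ss_res = np.sum(residual ** 2)           # squared scatter about the line
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot               # 1.0 means every point is on the line
    return mae, r2

# Made-up numbers purely for illustration.
print(parity_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

Points lying exactly on the line give a mean absolute deviation of zero and a coefficient of determination of one; a model that is systematically off can give a negative value.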

So in this particular case, we're comparing the predictive

ability of the data science model, which is shown in red on these plots,

with the predictive capability of some of the traditionally used approaches.

So for example, in this case, when we're looking at the effective yield point,

3:52

we have a power law that connects the effective yield point to the particle size,

or another power law that connects it to the volume fraction of the inclusions.

And you'll see that both of these conventional approaches do poorly,

compared to the data science approach.
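
For reference, a conventional power-law baseline of the kind mentioned above can be fit in a few lines. The lesson does not state the exact functional form, so the sketch below assumes the simplest one, y = a * x^b with a single descriptor x (for example particle size or volume fraction), and fits it by linear regression in log-log space. The function name `fit_power_law` and the sample numbers are hypothetical.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on log(y) versus log(x).

    Assumes all x and y values are strictly positive.
    """
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    return np.exp(intercept), slope  # (a, b)

# Made-up data that follow y = 3 * x**-0.5 exactly, for illustration only.
x = np.array([1.0, 2.0, 4.0, 8.0])
y = 3.0 * x ** -0.5
a, b = fit_power_law(x, y)
print(a, b)
```

A model like this sees only one scalar descriptor of the microstructure, which is one plausible reason it can do well for one property and poorly for another.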

4:10

On the other hand, if you look at the localization propensity as a property,

you'll notice that both the data science method and the power law

based on volume fraction do a pretty decent job.

Only the power law based on particle size does not do a good job.

4:34

Now, here are two more properties we captured using the same approach, and

once again, you'll notice that the data science approach consistently

does a much better job at capturing the structure-property linkage.

[COUGH] And the real benefit

of using the data science approach is that it's pretty much the same approach,

no matter what property we are interested in, and

no matter what physical material phenomenon we are interested in.

So in some sense, we can template this entire process, and we can automate it.

As a summary for this class, the main steps involved in the data

science approach were demonstrated and validated

using a simple case study of a non-metallic inclusion/steel composite system.

5:22

We noticed that the data science approach provided a practical

tool to extract robust and reliable structure-property linkages of interest.

The data science approach can also be generalized

to a very broad range of applications.

As I said, it can be templated and used in multiscale materials modeling and design.

Thank you.

[SOUND]