A typical data science project will be structured in a few different phases

that I'll talk about separately in this lecture.

So there are roughly five different phases that we can think about

in a data science project.

The first phase is the most important phase,

and that's the phase where you ask the question and

you specify what it is that you're interested in learning from data.

Now, specifying the question and kind of refining it over time is really

important because it will ultimately guide the data that you obtain and

the type of analysis that you do.

Part of specifying the question is also determining the type of question that

you are gonna be asking.

There are roughly six types of questions that you can

ask, going from descriptive, to exploratory, to inferential,

to causal, to predictive, and mechanistic.

And so figuring out what type of question you're asking and

what exactly the question is, is really influential.

And so you should spend a lot of time thinking about this.

Once you've figured out what your question is,

typically you'll get some data.

Now, either you'll have the data or you'll have to go out and get it somewhere or

maybe someone will provide it to you, but the data will come to you.

And then the next phase will be exploratory data analysis.

So this is the second phase, and there are two main goals to exploratory data analysis.

The first is you wanna know if the data that you have is suitable for

answering the question that you have.

This will depend on a variety of factors, ranging from very basic things

like whether there is enough data or whether there are too many missing values,

to more fundamental ones, like whether you are missing certain variables or

whether you need to collect more data to get those variables.
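
To make these suitability checks concrete, here is a minimal sketch in Python. It is only an illustration, not a method from the lecture: the function name `suitability_report`, the thresholds `min_rows` and `max_missing_frac`, and the example variables are all assumptions. It assumes the data arrive as a list of dictionaries, with `None` marking a missing value.

```python
def suitability_report(rows, required_vars, min_rows=30, max_missing_frac=0.2):
    """Run basic suitability checks on a list of dict records.

    The thresholds are illustrative assumptions; what counts as
    "enough data" depends on the question being asked.
    """
    n = len(rows)

    # Collect every variable name that appears in at least one record.
    present = set()
    for row in rows:
        present.update(row.keys())

    # Variables the question needs but the data never contain.
    absent_vars = sorted(set(required_vars) - present)

    # Fraction of records where each available required variable is missing.
    missing_frac = {}
    for var in required_vars:
        if var in present:
            n_missing = sum(1 for row in rows if row.get(var) is None)
            missing_frac[var] = n_missing / n if n else 1.0

    too_sparse = sorted(v for v, f in missing_frac.items() if f > max_missing_frac)

    return {
        "n_rows": n,
        "enough_rows": n >= min_rows,
        "absent_vars": absent_vars,
        "missing_frac": missing_frac,
        "too_sparse": too_sparse,
    }


# Hypothetical records, made up purely for illustration.
example_rows = [
    {"age": 30, "income": None},
    {"age": 41, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 25, "income": 48000},
]
print(suitability_report(example_rows, ["age", "income", "region"], min_rows=3))
```

A non-empty `absent_vars` list corresponds to the more fundamental problem of missing variables, while `enough_rows` and `too_sparse` cover the more basic checks.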

The second goal of exploratory data analysis is

to start to develop a sketch of the solution.

And so if the data are appropriate for

answering your question, you can start using them to sketch

out what the answer might be, to get a sense of what it'll look like.

This can be done without any formal modeling or any kind of statistical

testing or things like that, just to get a good picture of what it might be.
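
One way such a sketch might look in practice is a simple comparison of group averages, with no model and no test. The code below is a hedged illustration in Python: the function name `group_means` and the example fields `treated` and `score` are hypothetical, and the records are made up.

```python
from collections import defaultdict
from statistics import mean


def group_means(records, group_key, value_key):
    """Average value_key within each level of group_key, skipping missing values."""
    groups = defaultdict(list)
    for record in records:
        value = record.get(value_key)
        if value is not None:
            groups[record[group_key]].append(value)
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical records, made up purely for illustration.
example_records = [
    {"treated": True, "score": 8},
    {"treated": True, "score": 10},
    {"treated": False, "score": 5},
    {"treated": False, "score": 7},
    {"treated": False, "score": None},
]
print(group_means(example_records, "treated", "score"))
```

A gap between the group averages is not yet evidence of anything, but it gives a first picture of what the answer might look like before any formal modeling.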

The next stage, the third stage, is formal modeling.

So if your sketch works out,

you've got the right data and it seems appropriate to move on,

the formal modeling phase is the way to specifically write

down what questions you're asking, what parameters you're trying to estimate.

And it also provides a framework for challenging your results.

So just because you've come up with an answer in the exploratory data analysis

phase doesn't mean that it's necessarily going to be the right answer and

you need to be able to challenge your results through a variety of

approaches, whether it's sensitivity analysis or other types of analysis.

So challenging your model and just developing a formal framework is really

important to making sure that you can develop robust evidence for

answering your question.
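
As one illustration of challenging a result, here is a minimal leave-one-out sensitivity check in Python. This is a sketch under assumptions, not the lecture's own procedure: it assumes the parameter of interest is the slope of a simple least-squares line, and the function names `ols_slope` and `leave_one_out_slopes` are hypothetical.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x for a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def leave_one_out_slopes(xs, ys):
    """Re-estimate the slope with each observation dropped in turn."""
    return [
        ols_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        for i in range(len(xs))
    ]


# Made-up data in which the last observation is an outlier.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 30]
slopes = leave_one_out_slopes(xs, ys)
print("full-data slope:", ols_slope(xs, ys))
print("leave-one-out range:", min(slopes), "to", max(slopes))
```

If the estimate moves a lot when a single observation is dropped, the result depends heavily on that observation, which is the kind of fragility a sensitivity analysis is meant to expose.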

The next phase is interpretation. So once you've done your analysis and your

formal modeling, you wanna think about how to interpret your results, and

there are a variety of things to think about in the interpretation phase

of the data science project.

The first is to think about how your results jibe with what

you expected to find when you were first asking the question.

And also you wanna think about the totality of the evidence

that you've developed.

At this point, you've probably done many different analyses,

you've probably fit many different models.

And so you have many different bits of information to think about and

part of the interpretation phase is to

assemble all that information and weigh the different pieces of evidence,

so that you know

which are more reliable and which are more important than others, and to get a sense

of the totality of evidence with respect to answering the question.