An introduction to the statistics behind the most popular genomic data science projects. This is the sixth course in the Genomic Big Data Science Specialization from Johns Hopkins University.

From the course by Johns Hopkins University

Statistics for Genomic Data Science

From the lesson

Module 1

This course is structured to hit the key conceptual ideas of normalization, exploratory analysis, linear modeling, testing, and multiple testing that arise over and over in genomic studies.

- Jeff Leek, PhD, Associate Professor, Biostatistics

Bloomberg School of Public Health

So this lecture is about what statistics is. This class is about statistical genomics, so I thought I'd give you a bit of an overview of my own personal view of statistics, and in particular of statistics for genomics. For me, statistics is the science of getting generalizable knowledge out of a set of data. That's a quote that I made up just now, but it's a fairly typical view of statistics from someone who's working in the field.

So there are a few different things that statisticians, and people who do statistics in genomics, do. One is study design: trying to decide how many people or how many organisms to sample, which parts of those organisms to sample, and what to measure. Should we genotype them, should we measure their gene expression, and so forth. It also covers calculating power and those sorts of things. That's typically one thing that statistics does for genetics.
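As a rough illustration of the "calculating power" step, here is a minimal sketch of a sample-size calculation for comparing the means of two groups. It assumes a two-sided test and uses the standard normal approximation rather than the exact t distribution. The function name `samples_per_group` and the example numbers are made up for illustration and do not come from the lecture.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for comparing two group means.

    Normal approximation:
        n = 2 * ((z_(1 - alpha/2) + z_power) * sd / effect_size) ** 2
    """
    if effect_size <= 0 or sd <= 0:
        raise ValueError("effect_size and sd must be positive")
    # Critical value for a two-sided test at significance level alpha
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Quantile corresponding to the desired power
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / effect_size) ** 2
    return math.ceil(n)


# Hypothetical scenarios: a large and a moderate difference in expression
large_effect_n = samples_per_group(effect_size=1.0, sd=1.0)
moderate_effect_n = samples_per_group(effect_size=0.5, sd=1.0)
print(large_effect_n, moderate_effect_n)
```

Halving the difference you want to detect roughly quadruples the number of samples needed, which is why this calculation is usually done before any data are collected.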

Another thing that statistics is involved in is data visualization and exploration. Here I'm showing you a heat map of correlations, and then two variables, gene 74 and gene 77, and how correlated they are. This sort of investigation, plotting and exploring a set of data, is something that statisticians would do as well.
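The number behind each cell of a correlation heat map is a pairwise correlation between two variables. The sketch below computes a Pearson correlation for two genes. The expression values are invented for illustration and are not the data shown on the slide, and the lecture does not say which correlation measure the plot uses, so Pearson is an assumption here.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cross / math.sqrt(ss_x * ss_y)


# Made-up expression values across six samples (illustration only)
gene_74 = [2.1, 3.4, 1.9, 4.2, 3.8, 2.7]
gene_77 = [1.8, 3.1, 2.2, 4.0, 3.5, 2.9]

r = pearson_correlation(gene_74, gene_77)
print(f"correlation between gene 74 and gene 77: {r:.3f}")
```

Repeating this for every pair of genes gives the matrix that a heat map displays.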

The other thing they would do is help to preprocess and normalize data. We talked a little about the pipeline from raw data to processed data. Typically, to get from raw data to processed data, you have to perform statistical calculations or computations on the data to make it more comparable across people or to remove sources of bias. That sort of statistical preprocessing is something that statisticians do.
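One simple example of making data "more comparable across people" is rescaling each sample's read counts by that sample's total, so that sequencing depth does not masquerade as a biological difference. The sketch below uses counts per million, which is only one of many normalization schemes and is not necessarily the one used in this course. The sample names and counts are hypothetical.

```python
def counts_per_million(sample_counts):
    """Rescale one sample's raw counts so that they sum to one million."""
    total = sum(sample_counts)
    if total == 0:
        raise ValueError("sample has no counts to normalize")
    return [count * 1_000_000 / total for count in sample_counts]


# Hypothetical raw counts for three genes in two samples.
# sample_b has the same gene proportions but twice the sequencing depth.
raw_counts = {
    "sample_a": [10, 200, 790],
    "sample_b": [20, 400, 1580],
}

normalized = {name: counts_per_million(counts) for name, counts in raw_counts.items()}
for name, values in normalized.items():
    print(name, [round(v, 1) for v in values])
```

Before rescaling, the two samples look different for every gene; afterwards they agree, because the only difference between them was depth.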

Another thing that statistics does is statistical inference. This is probably the thing that everybody knows about statistics: if you think about the t-test, or about calculating a standard deviation or estimates of error or anything like that, those sorts of things are all tied up in statistical inference. It's basically this: if we have a small set of samples, how do we say something about the big population when we're uncertain about what we're saying? That's the most common thing that people think of, but it's only one part of what statistics does in genomics.
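To make the t-test and "estimates of error" concrete, here is a minimal sketch that computes a standard error of the mean and a two-sample t-statistic. It uses Welch's unequal-variance form, which is a choice made for this example rather than something the lecture specifies, and the measurements are invented. Turning the statistic into a p-value needs the t distribution, which is left out here.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def welch_t_statistic(group_a, group_b):
    """Difference in group means divided by its estimated standard error."""
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    se_diff = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return mean_diff / se_diff


# Made-up expression measurements for one gene in two groups
treated = [5.1, 4.8, 5.6, 5.3, 4.9]
control = [4.2, 4.0, 4.5, 4.1, 4.4]

t_stat = welch_t_statistic(treated, control)
print(f"standard error (treated): {standard_error(treated):.3f}")
print(f"t-statistic: {t_stat:.2f}")
```

The standard error is the quantitative version of "we're uncertain about what we're saying": it shrinks as the number of samples grows, and the t-statistic measures the observed difference in units of that uncertainty.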

The last thing, and I think maybe this is the one that is most undervalued but critically important, is that statistics is about communicating the results of an analysis to a broader community. Here I'm showing you an example of a compiled R Markdown document; we'll talk about that later. It's basically how you do a reproducible analysis, and how you intermix the text describing the models you fit with the models themselves. So how do you make people understand what you've done: what computations, what statistical calculations, and what inferences you've made? I think statistical communication is a critical part of what statisticians do.
