So the first step in this process is to cross-link proteins to DNA, as you'll
recall from the Genomic Technologies class, and then fragment the DNA.
And then if you have an antibody for the particular protein that you've
cross-linked to the DNA, you can then do an antibody pulldown.
So this basically enriches for a particular subset of the DNA that's
associated with the proteins that you're interested in, whether they're
transcription factors or something else.
And so then we pull down just those fragments, and then we sequence them.
And then we're going to be looking at how we analyze the data that
come from sequencing the fragments from this pulldown experiment.
So the first step in the analysis, just like the first step often in
a next-generation sequencing experiment, is to align.
And in this case, you don't necessarily need to worry, like you did with,
say, something like RNA-seq, about doing anything
other than a sort of straight-ahead alignment to the genome.
And so the popular software for doing this is software like Bowtie2 and BWA,
which are very fast aligners to the genome.
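Just to make the alignment step concrete, here is a minimal sketch in Python of how you might assemble a single-end Bowtie2 command. The index prefix, the file names, and the helper function bowtie2_command are all hypothetical placeholders, and the sketch only builds and prints the command; it does not run the aligner.

```python
import shlex


def bowtie2_command(index_prefix, fastq_path, sam_path, threads=4):
    """Build a single-end Bowtie2 command line as a list of arguments.

    All paths are hypothetical placeholders for illustration.
    """
    return [
        "bowtie2",
        "-p", str(threads),   # number of alignment threads
        "-x", index_prefix,   # prefix of the prebuilt genome index
        "-U", fastq_path,     # unpaired (single-end) reads
        "-S", sam_path,       # SAM output file
    ]


cmd = bowtie2_command("genome_index", "chip_sample.fastq", "chip_sample.sam")
print(shlex.join(cmd))
```

If Bowtie2 is installed and the index exists, a list like this could be handed to subprocess.run; paired-end data would need different options.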
The next step in the process is to detect peaks. Again, we've
enriched particular sequences because they've been pulled down with the proteins.
And so we want to be able to identify where there are peaks,
or where there are big piles of reads corresponding to those sequences.
And so there are a couple of different software packages for doing this. CisGenome, MACS,
and PICS are a couple of the popular ones.
PICS is in Bioconductor, and CisGenome and
MACS are both sort of standalone software.
And they can be used to basically detect where those peaks are in the sequence.
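To give a feel for what "a big pile of reads" means, here is a toy sketch in Python. It is not the algorithm used by CisGenome, MACS, or PICS, which model background and read shifts statistically; it simply computes per-base depth and reports regions above a fixed threshold. The read coordinates, the threshold, and the function names are made up for illustration.

```python
def coverage(reads, length):
    """Per-base read depth for reads given as (start, end) half-open intervals."""
    depth = [0] * length
    for start, end in reads:
        for pos in range(max(start, 0), min(end, length)):
            depth[pos] += 1
    return depth


def call_peaks(depth, min_depth):
    """Return (start, end) half-open regions where depth >= min_depth."""
    peaks = []
    start = None
    for pos, d in enumerate(depth):
        if d >= min_depth and start is None:
            start = pos                    # a region opens here
        elif d < min_depth and start is not None:
            peaks.append((start, pos))     # the region closes here
            start = None
    if start is not None:                  # region runs to the end
        peaks.append((start, len(depth)))
    return peaks


# Made-up aligned reads on a 50-base toy chromosome.
reads = [(2, 8), (4, 10), (5, 11), (20, 26), (21, 27), (22, 28), (40, 46)]
depth = coverage(reads, 50)
peaks = call_peaks(depth, min_depth=3)
print(peaks)
```

A fixed depth cutoff like this ignores sequencing depth and control samples, which is a large part of why dedicated peak callers exist.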
And then the next step is to count, basically to try to obtain
a measurement of the number of reads that cover a particular peak.
Now, there is a question as to how quantitative the ChIP-seq technology is,
in terms of how much binding there is there.
But it is useful to have the quantification of how many reads fall
into each of the different peaks.
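The counting step can be sketched in a few lines of Python. This is a minimal illustration, assuming reads and peaks are both half-open (start, end) intervals on the same chromosome and that any overlap counts; the coordinates and the function name count_reads_in_peaks are hypothetical.

```python
def count_reads_in_peaks(reads, peaks):
    """Count, for each peak, the reads that overlap it by at least one base."""
    counts = []
    for p_start, p_end in peaks:
        n = sum(
            1
            for r_start, r_end in reads
            if r_start < p_end and r_end > p_start  # half-open overlap test
        )
        counts.append(n)
    return counts


# Made-up reads and peaks for illustration.
reads = [(2, 8), (4, 10), (5, 11), (20, 26), (21, 27), (40, 46)]
peaks = [(5, 8), (22, 26)]
counts = count_reads_in_peaks(reads, peaks)
print(counts)
```

For real data sets this quadratic loop would be too slow, and you would normally rely on an interval index or an existing counting tool instead.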
So then the next step is normalization, and it's actually relatively recent that these sorts
of processes have been heavily introduced, especially cross-sample normalization.
Until relatively recently, many ChIP-seq experiments didn't have a large number
of replicates, but the numbers are definitely increasing over time.
And so some of the ideas that have been used in RNA sequencing analysis and
other places have been moved over into the ChIP-seq world.
And so it's now common to apply some kind of normalization,
whether that's MA normalization, as I've shown in this figure here, or
some other kind.
And so the DiffBind package in Bioconductor and
the MAnorm package in Bioconductor are two approaches that use various
different types of normalization to make the peak counts comparable to each other.
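As a rough illustration of the MA idea, here is a minimal sketch in Python. For each peak it computes M, the log2 ratio of the counts in two samples, and A, the average log2 count, and then shifts M so that its median is zero. That median shift is a deliberately simple stand-in, not what DiffBind or MAnorm do internally; those packages use their own, more elaborate procedures. The counts, the pseudocount, and the function names are assumptions for illustration.

```python
import math
from statistics import median


def ma_values(counts_a, counts_b, pseudocount=1.0):
    """Return (M, A) lists: log2 ratio and mean log2 count per peak."""
    m, a = [], []
    for x, y in zip(counts_a, counts_b):
        lx = math.log2(x + pseudocount)   # pseudocount avoids log2(0)
        ly = math.log2(y + pseudocount)
        m.append(lx - ly)
        a.append(0.5 * (lx + ly))
    return m, a


def median_center(m):
    """Shift M values so that their median is zero."""
    shift = median(m)
    return [v - shift for v in m]


# Made-up peak counts: sample A was sequenced about twice as deeply as B.
sample_a = [15, 31, 63, 127]
sample_b = [7, 15, 31, 63]
m, a = ma_values(sample_a, sample_b)
m_norm = median_center(m)
print(m, a, m_norm)
```

In this toy case every peak has the same log ratio before centering, so after centering the apparent twofold difference between samples goes away, which is the behavior you would want if it came only from sequencing depth.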
And then you need to do some sort of statistical test.
And so this is basically to identify whether there are any differences between