But on the computational side, once we had received all these data,
we had to actually figure out how to appropriately combine them
with other large microbiome data sets.
One in particular is the Human Microbiome Project.
Now, unfortunately, it's not a matter of just taking two data sets and
smashing them together because you end up with inherent biases that
are present within the studies based on how the sequencing was done and
how the actual samples themselves were processed.
So we had to figure out what types of controls and
normalizations were needed so that we could combine these data
without artificial biases that were simply due to study design.
So that was one of the first hurdles.
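The talk does not say which normalization was used. As an illustration only, one common way to make samples from differently sequenced studies comparable is to subsample ("rarefy") every sample to the same number of reads. The sketch below assumes made-up taxon counts and a hypothetical `rarefy` helper. It only evens out sequencing depth and does not correct for differences in sample processing.

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample a sample's taxon counts, without replacement, down to `depth` reads."""
    total = sum(counts.values())
    if total < depth:
        return None  # sample is too shallow and would be dropped
    # Expand the counts into one entry per read, then draw `depth` of them.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    picked = rng.sample(reads, depth)
    out = {}
    for taxon in picked:
        out[taxon] = out.get(taxon, 0) + 1
    return out

# Hypothetical samples from two studies sequenced to very different depths.
study_a_sample = {"Bacteroides": 6000, "Prevotella": 3000, "Akkermansia": 1000}
study_b_sample = {"Bacteroides": 700, "Prevotella": 200, "Akkermansia": 100}

depth = 1000  # assumed common depth, chosen for illustration
rarefied = [rarefy(s, depth) for s in (study_a_sample, study_b_sample)]
```

After this step both samples contain exactly 1,000 reads, so a tenfold difference in sequencing depth between the two studies no longer shows up as a difference between the samples.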
The next hurdle was figuring out how to get all of this compute done in
a fashion where we could just put it up online so
that people could see how the data were processed and recreate that themselves.
And so we chose to use the IPython notebook platform to get this done.
So IPython notebooks are basically like Google Docs for
programming, and they're absolutely fantastic. They're really great for
science because you can write up an analysis and redistribute it to people,
and then they can rerun the analysis.
It's absolutely tremendous for reproducibility in science.
However, with the American-Gut, processing
all the sequences takes on the order of 5,000 to 10,000 CPU hours.
So what that means is that if you were to let your individual laptop,
just a single processor, run for a year, you might do all the compute.
So in order to help with all the compute, we have these notebooks running against
dedicated supercomputers that are able to do a large amount of
this compute in parallel, so we can get it done in a fairly timely manner.
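To make the arithmetic behind those figures concrete, here is a rough sketch. The 5,000 to 10,000 CPU-hour range comes from the talk. The 500-core figure is a made-up example, and the calculation assumes the work divides perfectly across cores, which real pipelines only approximate.

```python
HOURS_PER_DAY = 24

def wall_clock_days(cpu_hours, cores):
    """Days of wall-clock time, assuming the work splits perfectly across cores."""
    return cpu_hours / cores / HOURS_PER_DAY

low, high = 5_000, 10_000  # CPU-hour range quoted in the talk

# One processor running around the clock.
single = (wall_clock_days(low, 1), wall_clock_days(high, 1))

# A hypothetical 500-core allocation (assumed figure, not from the talk).
cluster = (wall_clock_days(low, 500), wall_clock_days(high, 500))

print(f"single core: {single[0]:.0f} to {single[1]:.0f} days")
print(f"500 cores:   {cluster[0] * 24:.0f} to {cluster[1] * 24:.0f} hours")
```

On one processor the range works out to roughly 7 to 14 months, which matches the "run for a year" estimate. Under the perfect-scaling assumption, a few hundred cores bring it down to less than a day.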
Getting all of those to come together and figuring out the slices of the data that
we thought would be the most interesting to show participants
took a bit of time, as did compiling the actual result sheets that we could
distribute back out to people and doing all of this in an automated fashion.
So now where the fun is really coming in
is that we have collected enough samples at this point that we can begin to
investigate some of the interesting trends that we're seeing in the data.
So the American-Gut started in November of 2012.
One of the challenges with crowd-funded projects, any crowd-funded project,
is you don't know if it's going to succeed or not.
And as such, you can't plan ahead: are we going to receive $10,000,
or are we going to receive $100,000?
Are we going to have five participants, or are we going to have 1,000 participants?
So you don't know in advance
how many people you actually need to get involved in the project or
what a lot of the logistics will look like for actually scaling a project.
So that was really one of the big challenges for
the first few months of getting under way:
how do we actually send out thousands of samples all over the country,
and do so in such a manner that the people who
are volunteering their time are not volunteering all of their time?
So the people who are contributing to
the American-Gut from the lab side are all researchers who have their own projects or
graduate students who are trying to graduate, not necessarily using
the American-Gut as their thesis.
So figuring out how to make those processes as efficient as possible
was really big.
We learned a lot through that.
As for the future of the American-Gut:
first off,
one of the things we realized in this past year and
a half is that the diet questions we were asking were both incredibly tedious and
not yielding information as useful as we wanted.
So we are in the process of changing up the whole diet portion of
the questionnaire, as well as touching up a few of the other questions to
make them a bit more useful on the science side.
So that will be going out very soon.
We're going to ask all the participants who have previously participated if
they want to re-answer the questions as well.
In addition, we're looking at expanding out into additional projects that
are more focused on specific sub-populations.
In particular, we've got one that is focused on an autism spectrum disorder
cohort, where we're trying to get as many individuals or
children who have autism spectrum disorder as possible,
as well as any neurotypical siblings, to contribute samples.
But what we'd really like to do is expand this out into other sub-populations,
dig down, and use the infrastructure we've built up for
the American-Gut to enable other really interesting projects.
One of the other things that we're doing very soon at the American-Gut Project
is reaching out to other sites around the world that are interested in
participating as well.
So we're trying to model the American-Gut after the Personal Genome
Project and start sites up in different locations to make it easier for
people to contribute samples.
So one project in particular that we'll be starting up,
a sister project, is the British-Gut project.
And what that means is that people can send samples to an institute in the UK
as opposed to sending samples here, since it is, of course,
a lot more expensive to send samples from Europe over here to the United States.
And we're also looking at expanding to the Australian-Gut.
And once we get familiar with how this would look in other English-speaking
countries, we'd like to branch out into non-English-speaking countries,
because we want to understand as many of the different populations of the world as
we can. We already know that a lot of the microbiome differs substantially between populations.
But it'll be very exciting to expand the reach of this project around the world and
really, I think, enable access to a very broad population, which is exciting both,
I think, for the general public and for us as researchers.