So far, we've looked at customizing our documents by changing
various options either in the code chunks or in the YAML header.
In this lesson, we're going to customize our documents by adding parameters to them.
To learn more about adding parameters to your documents,
go to the R Markdown website,
click on Get Started,
and look at the link there for Parameters.
There's a description about how to use parameters and how that changes your document.
In the example shown here,
they change out information for different locations
based on the maps that are shown in the resulting document.
For example, in the first one,
they're showing a map of Hawaii and
then the second one they're actually showing a map of the Aleutian Islands.
There is additional information at the bottom of
this web page where you can learn more about parameterized reports.
There is detailed information here on how you declare parameters in
your YAML header and how to access those and use them for creating your document.
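Just to give you a feel for it before we get there, a minimal sketch of a parameterized YAML header might look something like the following; the title, parameter name, and default value here are placeholders I've made up, and inside the document you would read the value with params$region.

```yaml
---
title: "Steak Survey Report"   # placeholder title
output: html_document
params:
  region: "Mountain"           # placeholder parameter with a default value
---
```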
For this lesson though,
we're going to work with a data set from the fivethirtyeight R package. So,
go ahead and open RStudio and install the fivethirtyeight R package.
Click on Tools, Install Packages, type in fivethirtyeight, and click Install.
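If you prefer the console, a single command does the same thing:

```r
# Install the fivethirtyeight package from CRAN (equivalent to Tools > Install Packages)
install.packages("fivethirtyeight")
```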
You can learn more about the fivethirtyeight package at the CRAN repository site,
which is a reliable place from which to install R packages.
Here's the CRAN website.
You'll notice that under Software, if we click Packages,
we get a list of all of
the packages available for R on the CRAN repository.
There are currently over 11,000 packages available.
We can look at these sorted by date of publication or by
name, and there's also a link here for Task Views.
So if you don't know the name of a package
you're interested in, you can click on Task Views
and browse the packages by topic and functionality.
So, the CRAN repository for the fivethirtyeight package is shown here.
In this listing, you'll also notice that there is a link
provided for their GitHub repository.
We can also open that and take a look and get information on the fivethirtyeight package.
This is usually where the authors of a package host it
while it's still under development.
I've mentioned this briefly before, but when you install R and RStudio,
you only get the capabilities of
the core functions and packages built into the base R software.
The bulk of the functionality of R comes from user
contributed R packages and you install them as you need them.
The biggest advantage of this is that you literally have the whole world of
R developers working every day to improve R. However,
the biggest disadvantage of open source software is that there's
no single central group that checks and validates these R packages.
So in the open source software arena,
you should always check to see how reliable
the authors and the package are before you use it.
Investigate how long the package has been around,
how many people use it,
and whether you've seen it used before in other publications.
You can also get an idea of how much code testing and validation has been
done based on the repository that's actually hosting the R package.
There are typically four places where you can find R packages.
The first is the CRAN repository, which we just showed you,
where you can explore submitted packages alphabetically and by task views.
There are currently over 11,000 packages hosted by CRAN.
These packages have all been through some validation and testing.
So, these packages are considered to have been released for production.
That said though, it's still possible these packages have errors or bugs in them,
so always be sure to double check that the package is working the way you expect it to.
Another very reliable repository for R packages is Bioconductor.
Bioconductor hosts R packages primarily
for use in the computational biology research arena.
However, the R packages hosted here go through
even more rigorous vetting and testing before being accepted into the repository.
It's still a good idea though,
to test these packages to make sure they're working like you expect.
And there are currently over 1,400 R packages hosted by Bioconductor.
The next most common place to find R packages is on GitHub.
In most cases, the R packages found on GitHub are considered to still be in development.
As such, it's more common to find errors and bugs in these R packages.
However, you can often get the leading edge or bleeding edge of
the latest and greatest functionality but you need to be
even more certain that the R package is working as you expect.
There is currently no accurate count for the number of R packages hosted
on GitHub but it's well into the many tens of thousands.
Finally, virtually any other file server repository hosting service like
Bitbucket or other mode of transferring files or
groups of files can be used to distribute an R package.
Obviously, only accept files from sources that you trust.
So, let's take a moment and explore
the fivethirtyeight package that we'll use for this and future lessons.
Go to the packages tab in the lower right window.
This tab lists all of the R packages that are installed on your local drive.
These are all of the packages that you currently have access to
from your local drive for use when writing R code,
to do analyses, and manipulate your data files.
Scroll down until you see the fivethirtyeight package,
and click this hypertext link.
This is going to open a help page for the fivethirtyeight package.
This R package mainly consists of data sets that have been used over
the past several years at
fivethirtyeight.com for the purposes of writing their news articles.
For example, for this lesson,
we're going to work with this steak survey data set.
Let's scroll down, click on the link for the steak survey data set.
This data set was used for an article on fivethirtyeight.com,
published in May of 2014, on how Americans like their steak.
There is a link provided here in
the help menu for this article that was published online.
Take a few minutes and read through this article to get an idea of how
fivethirtyeight.com uses data to support their journalism.
We're going to explore this data set similar to how they did.
While we're not going to try and replicate their results,
it's good to know that you're working with the same data
set that they used when they published this article.
To explore the steak survey data set from the fivethirtyeight package,
let's create an R script to save our R commands,
and then, we're going to use some of these later when we build our document.
So if you don't already have it open,
go ahead and start RStudio and open the project for Module 3.
Once this opens, click File,
New File, but this time,
we're going to create an R Script.
We're going to type in the R script that was provided in the read-ahead
materials for this lesson, and I'll explain each line in a minute.
In addition to the fivethirtyeight R package,
we're also going to use the Tidyverse R package that was developed by Hadley Wickham.
The Tidyverse package actually includes a number of other R packages.
So when you install the Tidyverse package,
it's going to take a few minutes to get everything downloaded and installed.
Go ahead and click on Tools, Install Packages,
type in tidyverse, and click Install.
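Again, if you'd rather use the console, this one command is equivalent:

```r
# Install the tidyverse collection of packages from CRAN
install.packages("tidyverse")
```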
This is the website here for Tidyverse.
Tidyverse is actually a new approach to
data manipulation and analysis that has a very good programming workflow.
And while the focus of this course is not on R programming per se,
we're going to be using R code to create
objects like figures and tables for our documents.
For consistency, I'm going to be using
the Tidyverse approach for our programming for these lessons.
So go ahead and take a few moments to install the Tidyverse package.
Now, let's go back to the R script that you just created.
The first two lines of code shown here use the library statement.
These load the two R packages, fivethirtyeight and tidyverse.
Even though the packages are installed on our local drive,
we have to load them into our current R session for us to
be able to use the data sets and functions in these packages.
So let's highlight these two lines of code,
and then we'll click run to load these two libraries.
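For reference, loading the two packages looks like this:

```r
# Load the installed packages into the current R session
library(fivethirtyeight)
library(tidyverse)
```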
This next line of code uses the data function to
load the steak survey data set from the fivethirtyeight package.
Highlight this line of code and click run.
We're also going to run the next line of code to view the steak survey data set.
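Those two lines look roughly like the following; I'm assuming the data set's name within the package, steak_survey, here:

```r
# Load the steak survey data set from the fivethirtyeight package into memory
data("steak_survey")

# Open the data set in the RStudio viewer
View(steak_survey)
```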
You'll notice when we finish,
in our environment tab at the top right,
you now see a copy of the steak survey data set.
This means that the data set is in local memory and we can see it.
We can actually click on the little table icon,
shown here on the right, to view the data set.
You'll notice that this data set has
550 observations or rows and 15 variables or columns.
As you can see, most of the columns contain
true-false responses for the yes-no questions that were asked on the steak survey.
There are other responses for how people like their steak prepared.
There's also information on gender,
age groups, income levels,
education, and the region in the United States that the survey participant lived.
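If you'd rather see that same structure as text instead of the spreadsheet view, the glimpse function loaded with the tidyverse prints each column's name, type, and first few values:

```r
# Print a compact, column-by-column summary of the data set
glimpse(steak_survey)
```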
The next two lines of code shown here
use some of the functionality of the dplyr package
that's loaded along with the Tidyverse package,
namely, the filter function.
These two lines make a copy of the steak survey data set,
but first they omit the NAs,
the "not availables", which are the missing data in this data set.
Then, the data are piped, using the percent-greater-than-percent
symbol (%>%) shown here,
into the filter function, which subsets the data to
only look at responses from people in the Mountain region of the United States.
So highlight these next two lines of code, and click run.
You should now see that a new data set was created in your environment, called Estep.
It has 24 observations and 15 variables.
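As a rough sketch, that subsetting step looks something like the code below; the object name steak_mtn and the value used to match the Mountain region are my own assumptions, so use the names from the script in the read-ahead materials.

```r
# Copy the steak survey data, drop rows with missing values,
# and keep only the responses from the Mountain region
steak_mtn <- steak_survey %>%      # hypothetical object name
  na.omit() %>%                    # drop rows containing NAs
  filter(region == "Mountain")     # assumed value in the region column
```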
Now that we have a subset of the steak survey called Estep,
we're going to use that data to make
a clustered bar chart of how people like their steak prepared,
using the variable steak prep, broken out by gender, which is indicated with the variable female.
These next two lines of code use
the ggplot2 package which was also loaded as part of the Tidyverse package.
Again, you should read more on the Tidyverse website about the ggplot2 package.
You can also learn more about making nice ggplot2 figures and
graphics at the cookbook for R website provided in the read ahead materials.
Go ahead and highlight these last four lines of code and
click run to see the clustered bar chart graphic produced by this R code.
The resulting figure is shown here in the bottom right window for plots.
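A minimal sketch of a clustered bar chart like this one, continuing with the hypothetical object name from above, could look like the following; the aesthetics and labels in the provided script may differ.

```r
# Clustered (dodged) bar chart of steak preparation, split by the female variable
ggplot(steak_mtn, aes(x = steak_prep, fill = female)) +
  geom_bar(position = "dodge")
```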
In the next lesson, you're going to create a basic template for the report
that you want to see based on which region of the United States is selected.
You'll be working with parameters to further
customize your templates and automate your workflow.
For now though, let's go ahead and back everything up to your GitHub account.
Open Git Bash and make sure that you're in the correct directory for Module 3.
Go ahead and add and stage the files that you just created,
type in a commit statement for your new R script,
and then push everything up to the cloud.
When you're done, type git status to confirm that
there's nothing left to commit and your working tree is clean.
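If you want a quick reference for that sequence, the Git Bash commands look something like this; the directory path and commit message are just examples.

```bash
cd ~/module3-project                                  # example path to your Module 3 directory
git add .                                             # stage the new R script
git commit -m "Add steak survey exploration script"   # example commit message
git push                                              # push the commit up to GitHub
git status                                            # confirm the working tree is clean
```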
Finally, go back to your GitHub account for Module 3,
click refresh, and you should now see that the R script you just created is here.