# Graphing with ggplot

How to use ggplot

tutorial
Author

Miriam Heiss

Published

July 29, 2022

The function `ggplot()` is essential to any data scientist. It is a really simple way to graph data, make maps, and make neat things. It might seem really hard at first, you have to learn a formula, keep all the `geoms` straight (or curvy, if you need to), and it will be overwhelming. Don’t be scared though, because once you learn the basics, all will become clear! Let’s begin!

## Libraries

Here, I’m loading my libraries. The package tidyverse includes eight packages, one of them being ggplot. The package primer.data has more datasets than the ones built in to R.

Code
``````library(tidyverse)
library(primer.data)``````

## Plotting

Once you’ve loaded your libraries, it’s time to start plotting! Start your plot by typing `ggplot()`. Right now, if you run it, it will not show anything, because we don’t have data!

Code
``ggplot()`` We are going to be using the dataset called nhanes. Inside `ggplot()`, set data equal to nhanes. If you want to see the dataset, put `> glimpse(nhanes)` in your console.

Code
``ggplot(data = nhanes)`` Do not worry about the graph being blank at this point, because we have not added axes or `geoms`, so there is nothing on the graph.

Inside `ggplot()` after the nhanes, you should type `mapping = aes()`. Inside `aes()`, we are going to put the x, the y, and the color.

Code
``````ggplot(data = nhanes,
mapping = aes())`````` Let’s start graphing! In `aes()`, choose your x, your y, and your color. I am going to be using height for x, weight for y, and gender for color.

Code
``````ggplot(data = nhanes,
mapping = aes(x = height,
y = weight,
color = gender))`````` Now it has an empty graph, instead of just blank because we added the axes. But there is still nothing on the graph!

## Geoms

We need to add a `geom` to the graph! A `geom` is what shows us the data. We add a layer by using ‘+’ Let’s use `geom_point` to make a Scatterplot.

Code
``````ggplot(data = nhanes,
mapping = aes(x = height,
y = weight,
color = gender))+
geom_point()`````` Hmmmm. Our graph doesn’t look quite right. There’s a lot of overplotting! We can fix that by changing `alpha` to 0.3 within the geom.That will change the opacity of the dots. Let’s also go ahead and change the size of the dots by adding another argument to `geom_point`. Let’s change the size to 0.5.

Code
``````ggplot(data = nhanes,
mapping = aes(x = height,
y = weight,
color = gender)) +
geom_point(alpha = 0.3, size =  0.5)`````` That looks much nicer. But what if we want to separate the gender so they don’t touch?

## Faceting

We are going to add another layer called `facet_wrap`. This will separate the graph and make a Female graph and a Male graph.

Code
``````ggplot(data = nhanes,
mapping = aes(x = height,
y = weight,
color = gender)) +
geom_point(alpha = 0.3, size =  0.5) +
facet_wrap(~gender)`````` Now we have two separate graphs!

## Second Geom

Now we’re going to add a trend line. Let’s add `geom_smooth` to the graph. We are going to want it by `geom_point`, that way all the `geoms` are next to each other. Now we need a ‘+’ to add the `facet_wrap` layer. Once we add the `geom_smooth`, it is going to be super crowded, and the line will be almost invisible. We can change that by setting `alpha` (in `geom_point`) to 0.1. This will reduce crowding significantly.

Code
``````ggplot(data = nhanes,
mapping = aes(x = height,
y = weight,
color = gender)) +
geom_point(alpha = 0.1, size =  0.5) +
geom_smooth() +
facet_wrap(~gender)`````` Now, I want a smoother curve. In geom_smooth, we are going to set the method to `"loess"`, and the formula to `y~x`. The method `"loess"` makes a smoother curve, and formula `y~x` means that y has something to do with x.

Code
``````ggplot(data = nhanes,
mapping = aes(x = height,
y = weight,
color = gender)) +
geom_point(alpha = 0.1, size =  0.5) +
geom_smooth(method = "loess", formula = y~x) +
facet_wrap(~gender)`````` Now it’s looking pretty neat!

## Labels

Here, we are going to start adding labels. Add another layer with ‘+’, and then type `labs()`. Inside `labs()`, we can add a label for title, subtitle, caption, x, y, and the legend. Choose a title and subtitle that relate to the graph. Make sure you put the text in quotation marks (““), otherwise you will get a bunch of errors.

Code
``````ggplot(data = nhanes,
mapping = aes(x = height,
y = weight,
color = gender)) +
geom_point(alpha = 0.1, size =  0.5) +
geom_smooth(method = "loess", formula = y~x) +
facet_wrap(~gender) +
labs(title = "Heights in the U.S.",
subtitle = "On average, men weigh more and are taller than women")`````` Now we have a title and subtitle, but the x and y axes labels don’t look very nice. Let’s fix that. Set x equal to “Height (cm)” and y equal to “Weight (kg)”.

Code
``````ggplot(data = nhanes,
mapping = aes(x = height,
y = weight,
color = gender)) +
geom_point(alpha = 0.1, size =  0.5) +
geom_smooth(method = "loess", formula = y~x) +
facet_wrap(~gender) +
labs(title = "Heights in the U.S.",
subtitle = "On average, men weigh more and are taller than women",
x = "Height (cm)",
y = "Weight (kg)")`````` Awesome! But what about the “gender” above the legend?

In the `aes()` we set x equal to height, y equal to weight, and color equal to gender. We changed x and y, and we can change “gender” the same way! We can set color equal to “Gender”, and that will fix the lowercase “gender” on the legend.

Code
``````ggplot(data = nhanes,
mapping = aes(x = height,
y = weight,
color = gender)) +
geom_point(alpha = 0.1, size =  0.5) +
geom_smooth(method = "loess", formula = y~x) +
facet_wrap(~gender) +
labs(title = "Heights in the U.S.",
subtitle = "On average, men weigh more and are taller than women",
x = "Height (cm)",
y = "Weight (kg)",
color = "Gender")`````` What if we want to give credit to the source of the data?

This data came from NHANES, the National Health and Nutrition Examination Survey, so we want to credit that source. How do we do that? We add a caption!

Code
``````ggplot(data = nhanes,
mapping = aes(x = height,
y = weight,
color = gender)) +
geom_point(alpha = 0.1, size =  0.5) +
geom_smooth(method = "loess", formula = y~x) +
facet_wrap(~gender) +
labs(title = "Heights in the U.S.",
subtitle = "On average, men weigh more and are taller than women",
x = "Height (cm)",
y = "Weight (kg)",
color = "Gender",
caption = "Source: National Health and Nutrition Examination Survey")`````` ## Changing colors

That is a pretty cool looking graph! But, I don’t like the pink and blue that `ggplot()` chose. How do I change that?

There are two ways to solve this problem. One way is to hand pick the built in colors from R, like `"blue"` and `"magenta"`. We want to do the function `scale_color_manual()` right after the `geoms`, just to be more organized. Inside, we will put `values = c`. That makes a list of the colors that we want.That will look something like this:

Code
``````ggplot(data = nhanes,
mapping = aes(x = height,
y = weight,
color = gender)) +
geom_point(alpha = 0.1, size =  0.5) +
geom_smooth(method = "loess", formula = y~x) +
scale_color_manual(values = c("magenta",
"blue"))+
facet_wrap(~gender) +
labs(title = "Heights in the U.S.",
subtitle = "On average, men weigh more and are taller than women",
x = "Height (cm)",
y = "Weight (kg)",
color = "Gender",
caption = "Source: National Health and Nutrition Examination Survey")`````` But what if we don’t want the built in R colors? You can also use HEX codes for your colors. I searched “color picker” in my browser and picked a pink and a blue HEX. It is almost the same, you just put the HEX instead of `"color"` in the `scale_color_manual`. It will look something like this:

Code
``````ggplot(data = nhanes,
mapping = aes(x = height,
y = weight,
color = gender)) +
geom_point(alpha = 0.1, size =  0.5) +
geom_smooth(method = "loess", formula = y~x) +
scale_color_manual(values = c("#ba4c85",
"#5f4cba"))+
facet_wrap(~gender) +
labs(title = "Heights in the U.S.",
subtitle = "On average, men weigh more and are taller than women",
x = "Height (cm)",
y = "Weight (kg)",
color = "Gender",
caption = "Source: National Health and Nutrition Examination Survey")`````` That looks super nice with the custom colors. Let’s take a look at the HEX colors again. they are #ba4c85 and #5f4cba. Which one is pink and which one is blue? You can’t tell without looking at both the code and the graph.

This is where comments come in. They are really easy to add to your code, all it is is a `#` (pound sign, not hashtag!). Anything after a `#` will not run in the code. We can add comments like `# Pink` or `# Blue`. After making comments, it will look like this:

Code
``````ggplot(data = nhanes,
mapping = aes(x = height,
y = weight,
color = gender)) +
geom_point(alpha = 0.1, size =  0.5) + # Alpha changes the opacity
geom_smooth(method = "loess", formula = y~x) +
scale_color_manual(values = c("#ba4c85", # Pink
"#5f4cba"))+ # Blue
facet_wrap(~gender) +
labs(title = "Heights in the U.S.", # Children are included in the data too
subtitle = "On average, men weigh more and are taller than women",
x = "Height (cm)",
y = "Weight (kg)",
color = "Gender",
caption = "Source: National Health and Nutrition Examination Survey")`````` Now you can easily read the code, and the graph looks nice!

As you can see, `ggplot()` is actually really simple to use.