Boxplots
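Boxplots are good for showing distributions and great for comparing them. They show the median as a line inside a box that represents the interquartile range, meaning the middle half of the data, from the 25th to the 75th percentile. The lines that extend from the box are called whiskers, and they show the spread of the data beyond the quartiles; by default in ggplot2 they reach to the most extreme values within 1.5 times the interquartile range of the box edges. Any points beyond the whiskers are drawn as dots and treated as outliers.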
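To make those pieces concrete, here is a minimal sketch that computes the numbers a boxplot draws. The yumminess scores are made up for illustration and are not the lesson's data; the 1.5 multiplier is the ggplot2 default.

```r
# Made-up yumminess scores, for illustration only (not the lesson's data)
yumminess <- c(2, 4, 5, 6, 6, 7, 8, 9, 9, 20)

# The line in the box is the median; the box edges are the quartiles
q <- quantile(yumminess, c(0.25, 0.5, 0.75))

# Interquartile range: the height of the box
iqr <- IQR(yumminess)

# Whiskers stop at the most extreme values inside these limits
lower_limit <- q[["25%"]] - 1.5 * iqr
upper_limit <- q[["75%"]] + 1.5 * iqr

# Anything outside the limits would be drawn as a dot
outliers <- yumminess[yumminess < lower_limit | yumminess > upper_limit]

print(q)
print(iqr)
print(outliers)
```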
We’ll make a boxplot of yumminess versus flavor to see which type Charlie likes best. Again we start with the same basic structure, but now we use geom_boxplot() as the graph type. See if you can add our variables:
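If you get stuck, here is one way the finished code might look. This is a sketch only: the data frame name treats and its values are hypothetical stand-ins, so swap in the name of the data frame you have been using in this lesson.

```r
library(ggplot2)

# Hypothetical stand-in data; replace with the lesson's own data frame
treats <- data.frame(
  flavor = rep(c("chocolate", "vanilla", "strawberry"), each = 5),
  yumminess = c(8, 9, 7, 9, 10, 5, 6, 4, 6, 5, 2, 9, 4, 7, 6)
)

# Same basic structure as before: data, aesthetics, then the graph type.
# flavor (categorical) goes on x, yumminess (numeric) goes on y.
p <- ggplot(treats, aes(x = flavor, y = yumminess)) +
  geom_boxplot()

p
```

Putting the categorical variable on the x-axis gives one box per flavor, which is what makes the side-by-side comparison possible.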
Can you tell which flavor Charlie likes best? Which flavor has the most variation in yumminess?
Here’s a good site if you need help with interpreting boxplots.
Color by Variable
It would be nice to color these boxes differently so they stand out a little more. We can do that by adding some code to the aesthetics, inside aes(). First, let’s just color the individual flavors differently by setting color to the variable flavor.
Note that this might not have done what you expected. In ggplot2, color controls lines, points, and the outlines of shapes, while fill controls the inside of shapes. So in the code, change the word color to fill and see what happens.
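Here is a sketch of the two versions side by side so you can compare them. As before, the treats data frame is a hypothetical stand-in for the lesson's data.

```r
library(ggplot2)

# Hypothetical stand-in data; replace with the lesson's own data frame
treats <- data.frame(
  flavor = rep(c("chocolate", "vanilla", "strawberry"), each = 5),
  yumminess = c(8, 9, 7, 9, 10, 5, 6, 4, 6, 5, 2, 9, 4, 7, 6)
)

# color = flavor: only the box outlines, whiskers, and outlier dots are colored
p_color <- ggplot(treats, aes(x = flavor, y = yumminess, color = flavor)) +
  geom_boxplot()

# fill = flavor: the inside of each box is colored
p_fill <- ggplot(treats, aes(x = flavor, y = yumminess, fill = flavor)) +
  geom_boxplot()

p_color
p_fill
```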
R will choose colors for you automatically, but if you don’t like its choices you can choose your own. Here I add the command scale_fill_manual() and list all the colors that I want my boxes to be. Note that I use the function c() to combine the colors into a vector. Also, note that I have to list at least as many colors as I have boxes or I’ll get an error.
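A minimal sketch of what that can look like is below. The treats data frame and the particular color names are assumptions chosen for illustration; any valid R color names or hex codes would work.

```r
library(ggplot2)

# Hypothetical stand-in data; replace with the lesson's own data frame
treats <- data.frame(
  flavor = rep(c("chocolate", "vanilla", "strawberry"), each = 5),
  yumminess = c(8, 9, 7, 9, 10, 5, 6, 4, 6, 5, 2, 9, 4, 7, 6)
)

# One color per box. Unnamed values are matched to the flavors in
# alphabetical order here: chocolate, strawberry, vanilla.
my_colors <- c("chocolate4", "pink", "ivory")

p <- ggplot(treats, aes(x = flavor, y = yumminess, fill = flavor)) +
  geom_boxplot() +
  scale_fill_manual(values = my_colors)

p
```

If you would rather not depend on the order, you can name the values instead, for example c(vanilla = "ivory", chocolate = "chocolate4", strawberry = "pink").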