WebR Charlie Data Example

Scatterplots

Scatterplots are great for showing individual data points and revealing correlations between variables.

We can reuse the code from the previous graph example to make a simple scatterplot. All we need to do is change one thing: instead of geom_line() we’ll use geom_point() as the type of graph.
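Here is a minimal sketch of that one-word swap. The data frame charlie_walks and its columns week and walk_minutes are made up for illustration (the real dataset from the previous example is not shown here), and the data is passed straight to ggplot() rather than piped in, just to keep the example self-contained.

```r
library(ggplot2)

# Made-up stand-in data; swap in the real data frame from the previous example.
charlie_walks <- data.frame(
  week = 1:6,
  walk_minutes = c(20, 25, 22, 30, 35, 33)
)

# Same structure as a line graph, but with geom_point() instead of geom_line()
walk_plot <- ggplot(charlie_walks, aes(x = week, y = walk_minutes)) +
  geom_point()

walk_plot
```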

Let’s work on making that graph look a little nicer by adding additional lines of code to customize it.

Note one annoying thing: in graphs, instead of using the %>% to add another line, you have to use a +. I’m sorry!

The first thing we can do is change our labels. By default, they will be whatever the variable is named, but that often includes underscores and doesn’t look nice. We’ll add functions to make new labels. Note that for each line we add, we add a + to the end of the previous line, and that the label text has to be in quotes "".
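As a rough sketch of the labeling step, the example below uses xlab(), ylab(), and ggtitle(), which is one common way to do it (labs() would work too). The data frame and the label text are made up for illustration.

```r
library(ggplot2)

# Made-up stand-in data, for illustration only.
charlie_walks <- data.frame(
  week = 1:6,
  walk_minutes = c(20, 25, 22, 30, 35, 33)
)

# Each new line is joined to the previous one with a +,
# and every label is a quoted string.
labeled_plot <- ggplot(charlie_walks, aes(x = week, y = walk_minutes)) +
  geom_point() +
  xlab("Week") +
  ylab("Minutes walked") +
  ggtitle("Charlie's walks")

labeled_plot
```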

If we want the points to be a different color, we can add that to our main graph aesthetics. And if we don’t like the gray background, we can add another line at the end that changes the background.
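One way this could look is sketched below. It makes two assumptions: the color is given as a fixed argument to geom_point() (a constant placed inside aes() would be treated as data instead of a color), and theme_minimal() is used to drop the gray background, though other built-in themes such as theme_bw() would also do the job. The data is made up for illustration.

```r
library(ggplot2)

# Made-up stand-in data, for illustration only.
charlie_walks <- data.frame(
  week = 1:6,
  walk_minutes = c(20, 25, 22, 30, 35, 33)
)

colored_plot <- ggplot(charlie_walks, aes(x = week, y = walk_minutes)) +
  geom_point(color = "purple") +  # fixed color for every point
  theme_minimal()                 # replaces the default gray background

colored_plot
```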

R knows the names of lots of colors; running colors() lists all the built-in color names. But you can also use any hex code as a color, so the color can be whatever you want.
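A quick sketch of both options: peeking at the built-in names with colors(), and using a hex code instead. The hex value "#FF6600" and the data frame are arbitrary picks for illustration.

```r
library(ggplot2)

# Peek at the first few built-in color names
head(colors())

# Check whether a particular name is one R knows
"steelblue" %in% colors()

# Made-up stand-in data, for illustration only.
charlie_walks <- data.frame(
  week = 1:6,
  walk_minutes = c(20, 25, 22, 30, 35, 33)
)

# A hex code works anywhere a color name does
hex_plot <- ggplot(charlie_walks, aes(x = week, y = walk_minutes)) +
  geom_point(color = "#FF6600")

hex_plot
```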

Graphing Treat Data

Let’s make a more interesting plot now. Let’s graph yumminess by price of treat to see how fancy Charlie’s tastes are.
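A minimal sketch of that plot follows. The column names price, price_per_item, and yumminess come from this tutorial, but the treats data frame itself and every value in it are made up here, since the real treat data is not shown.

```r
library(ggplot2)

# Made-up stand-in for the treat data; values are illustrative only.
treats <- data.frame(
  treat = c("biscuit", "jerky", "dental chew", "cheese cube",
            "peanut butter bone", "salmon bite"),
  price = c(5.00, 12.00, 8.00, 4.00, 10.00, 9.00),
  price_per_item = c(0.25, 1.00, 0.50, 0.25, 2.50, 0.50),
  yumminess = c(3, 5, 4, 4, 5, 4)
)

# Yumminess (y) by price (x)
treat_plot <- ggplot(treats, aes(x = price, y = yumminess)) +
  geom_point() +
  xlab("Price") +
  ylab("Yumminess")

treat_plot
```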

Not much of a correlation, but we probably shouldn’t use the price column. Swap it out for price_per_item and see if things change.

Here you can see that some of your points are almost right on top of each other. We can add some code to change that: we’ll add an option to geom_point() that jitters the points around so they aren’t so close together. (If we wanted, we could control just how much each point jitters, but let’s not go crazy yet.)
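The jitter option can be sketched like this, using position = "jitter" inside geom_point(). The treats data frame is made up for illustration, with two treats deliberately given identical values so that they overlap without jitter.

```r
library(ggplot2)

# Made-up stand-in for the treat data; values are illustrative only.
treats <- data.frame(
  treat = c("biscuit", "jerky", "dental chew", "cheese cube",
            "peanut butter bone", "salmon bite"),
  price_per_item = c(0.25, 1.00, 0.50, 0.25, 2.50, 0.50),
  yumminess = c(3, 5, 4, 4, 5, 4)
)

# position = "jitter" nudges each point by a small random amount,
# so points with identical values no longer hide each other.
jitter_plot <- ggplot(treats, aes(x = price_per_item, y = yumminess)) +
  geom_point(position = "jitter")

jitter_plot
```

Because the nudges are random, the points land in slightly different places each time the plot is drawn.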

Now let’s do one more thing and add a trend line to see if that helps us see any correlation. We do this by adding the function geom_smooth() and telling it the method we want to use is "lm", meaning “linear model”, which will draw a straight (y = mx + b) trend line.
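Adding the trend line might look roughly like the sketch below. Only geom_smooth(method = "lm") comes from the text above; the treats data frame and its values are made up for illustration.

```r
library(ggplot2)

# Made-up stand-in for the treat data; values are illustrative only.
treats <- data.frame(
  treat = c("biscuit", "jerky", "dental chew", "cheese cube",
            "peanut butter bone", "salmon bite"),
  price_per_item = c(0.25, 1.00, 0.50, 0.25, 2.50, 0.50),
  yumminess = c(3, 5, 4, 4, 5, 4)
)

trend_plot <- ggplot(treats, aes(x = price_per_item, y = yumminess)) +
  geom_point(position = "jitter") +
  geom_smooth(method = "lm")  # straight-line (linear model) trend

trend_plot
```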

Note: Unfortunately the interactive code here will print out tons of stuff when you add geom_smooth(). Scroll all the way down and you can see the graph. But you can also see along the way all the things you could customize!

The blue line is the trend line and the gray area around it represents a confidence interval. There are of course ways to change or hide the confidence interval and to change the color of everything, but we’ll save detailed aesthetics for another day.
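For the curious, here is a small taste of those options, as a sketch only: se = FALSE hides the confidence band and color changes the line color. Both are standard geom_smooth() arguments; the color choice and the treats data are made up for illustration.

```r
library(ggplot2)

# Made-up stand-in for the treat data; values are illustrative only.
treats <- data.frame(
  price_per_item = c(0.25, 1.00, 0.50, 0.25, 2.50, 0.50),
  yumminess = c(3, 5, 4, 4, 5, 4)
)

plain_trend_plot <- ggplot(treats, aes(x = price_per_item, y = yumminess)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "darkorange")  # no gray band

plain_trend_plot
```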