Renaming, Deleting, and Adding Columns
Selecting and Deleting Columns
If you want to select several columns or delete columns, you will use the select() function.
What select() does is essentially subset our data by column name. Inside the parentheses, you list the names of the columns you want to see, and they will appear in the order you list them, so this is also helpful if you want to rearrange the data.
For example, maybe I just want to view the price and the flavor of treats so I can see which type is the best for my budget. Here’s what I would code:
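A minimal sketch of that selection, assuming the dplyr package is loaded and that treats has columns named price and flavor. The small data frame below is a made-up stand-in for the real data, and its brand, quantity, and yumminess values are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the treats data (values are made up)
treats <- data.frame(
  brand = c("Pawsome", "Good Dog", "Snackers"),
  flavor = c("chicken", "peanut butter", "salmon"),
  price = c(4.99, 7.50, 12.00),
  quantity = c(1, 10, 24),
  yumminess = c(2, 5, 3)
)

# Keep only price and flavor, in the order listed
treats %>%
  select(price, flavor)
```

Because price is listed first, it appears first in the output even though it comes after flavor in the original data.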
Or maybe I don’t care about the flavor or brand name. I can remove those columns by adding a - before each column name. I’ll do this for both columns, and the output will be everything but those columns.
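Dropping columns might look like the sketch below. It assumes the brand name column is called brand, which is a guess, and it uses a made-up stand-in for the treats data.

```r
library(dplyr)

# Hypothetical stand-in for the treats data (values are made up)
treats <- data.frame(
  brand = c("Pawsome", "Good Dog", "Snackers"),
  flavor = c("chicken", "peanut butter", "salmon"),
  price = c(4.99, 7.50, 12.00),
  quantity = c(1, 10, 24),
  yumminess = c(2, 5, 3)
)

# A minus sign in front of a column name drops that column
treats %>%
  select(-flavor, -brand)
```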
So far we have just printed out the output, but if you want to save your changes you can make a new data object:
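As a sketch, saving the result means assigning it to a new name with <-. The object name treats_budget is hypothetical, and the data frame is a made-up stand-in for the treats data.

```r
library(dplyr)

# Hypothetical stand-in for the treats data (values are made up)
treats <- data.frame(
  brand = c("Pawsome", "Good Dog", "Snackers"),
  flavor = c("chicken", "peanut butter", "salmon"),
  price = c(4.99, 7.50, 12.00),
  quantity = c(1, 10, 24),
  yumminess = c(2, 5, 3)
)

# Store the selected columns in a new object; treats itself is unchanged
treats_budget <- treats %>%
  select(price, flavor)

treats_budget
```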
You can also change the treats dataset itself if you don’t want to make a new dataset. This is called “overwriting,” and it just means that you assign your changes back to your original variable name. So here, any changes we make get assigned back to the object treats.
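A sketch of overwriting, using the column removal as the example change. The column name brand and the data values are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the treats data (values are made up)
treats <- data.frame(
  brand = c("Pawsome", "Good Dog", "Snackers"),
  flavor = c("chicken", "peanut butter", "salmon"),
  price = c(4.99, 7.50, 12.00),
  quantity = c(1, 10, 24),
  yumminess = c(2, 5, 3)
)

# Assign the result back to the same name, replacing the original object
treats <- treats %>%
  select(-flavor, -brand)

# treats now has only price, quantity, and yumminess
treats
```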
Be careful when overwriting your data because if you want any of those other columns back, you’ll have to fully reload your data. But it’s also nice to not have a million new objects for everything, so if you know a change is something you will always want, overwriting is good.
Renaming Columns
Overwriting is particularly useful when you want to rename a column. When renaming columns, we want to consider the type of text R likes: it doesn’t like spaces, so you can use underscores, and since it’s case sensitive, it’s probably best to just always use lowercase.
We use the rename() function to change column names. The syntax is to pick a new column name and set it equal to the old column name.
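To illustrate the new-name-equals-old-name pattern, here is a sketch that assumes the data has a column called Brand Name, with a space and capital letters. That column name and the data values are hypothetical. A name containing a space has to be wrapped in backticks.

```r
library(dplyr)

# Hypothetical stand-in for the treats data; check.names = FALSE keeps
# the space in the column name instead of converting it to a dot
treats <- data.frame(
  `Brand Name` = c("Pawsome", "Good Dog", "Snackers"),
  price = c(4.99, 7.50, 12.00),
  check.names = FALSE
)

# New name on the left, old name on the right; overwrite treats with the result
treats <- treats %>%
  rename(brand_name = `Brand Name`)

names(treats)
```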
Adding or Changing Columns
The mutate() function lets us add or change columns.
A good thing to have would be a column showing the cost per item, since some treats have multiple items per bag. We can use the mutate() function to divide the price column by the quantity column. To make a new column, inside the parentheses you give it a name and set it equal to whatever you want. We’ll call our new column price_per_item.
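A minimal sketch of that calculation, assuming the columns are named price and quantity as described. The data frame is a made-up stand-in for the treats data.

```r
library(dplyr)

# Hypothetical stand-in for the treats data (values are made up)
treats <- data.frame(
  flavor = c("chicken", "peanut butter", "salmon"),
  price = c(4.99, 7.50, 12.00),
  quantity = c(1, 10, 24)
)

# New column name on the left, the calculation on the right
treats <- treats %>%
  mutate(price_per_item = price / quantity)

treats
```

The calculation is done row by row, so each treat gets its own price divided by its own quantity.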
We can also change existing columns using mutate(). Let’s say we want to rescale how we define yumminess, so a 3 is now the baseline: anything above 3 is positive, anything below 3 is negative, and 3 itself is a neutral 0.
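One way to get that rescaling is to subtract 3 from every value. The sketch below assumes yumminess is a numeric column, and the data values are made up.

```r
library(dplyr)

# Hypothetical stand-in for the treats data (values are made up)
treats <- data.frame(
  flavor = c("chicken", "peanut butter", "salmon"),
  yumminess = c(2, 5, 3)
)

# Reusing an existing column name replaces that column's values
treats <- treats %>%
  mutate(yumminess = yumminess - 3)

# yumminess is now -1, 2, 0
treats
```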