Students taking my R for epidemiology course often come into the course thinking it will be a math or statistics course. In reality, this course is probably much closer to a foreign language course. There is no doubt that we need a foundational understanding of math and statistics to understand the results we get from R, but R will take care of all of the complicated stuff for us. All we have to do is learn how to ask R to do what we want it to do. To some extent, this entire book is about learning to communicate with R. So, in this chapter we will introduce the R programming language from the 30,000-foot level.
In the same way that many people use the English language to communicate with each other, we will use the R programming language to communicate with R. Just like the English language, the R language comes complete with its own structure and vocabulary. Unfortunately, just like the English language, it also includes some weird exceptions and occasional miscommunications. We’ve already seen a couple examples of commands written to R in the R programming language. Specifically:
# Store the value 2 in the variable x <- 2 x # Print the contents of x to the screen x
##  2
# Print an example number sequence to the screen seq(from = 2, to = 100, by = 2)
##  2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 ##  40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 ##  78 80 82 84 86 88 90 92 94 96 98 100
🗒Side Note: The gray boxes you see above are called R code chunks and I created them (and this entire book) using something called R markdown. Can you believe that you can write an entire book with R and RStudio? How cool is that? You will learn to use R markdown documents later in this book. R markdown is great because it allows you to mix R code with narrative text and multimedia content as I’ve done throughout the page you’re currently looking at. This makes it really easy for us to add context and aesthetic appeal to our results.
Question: I keep talking about “speaking” to R, but when you speak to R using the R language, who are you actually speaking to?
Well, you are speaking to something called the R interpreter. The R interpreter takes the commands we’ve written in the R language, sends them to your computer to do the actual work (e.g., get the mean of a set of numbers), and then translates the results of that work back to us in a form that we humans can understand (e.g., the mean is 25.5). At this stage, one of the key concepts for you to understand about the R language is that is extremely literal! Understanding the literal nature of R is important because it will be the underlying cause of a lot of errors in your R code.
No matter what I write next, you are going to get errors in your R code. I still get errors in my R code every single time I write R code. However, my hope is that this section will help you begin to understand why you are getting errors when you get them and provide us with a common language for discussing errors.
So, what exactly do I mean when I say that the R interpreter is extremely literal? Well, in the Navigating RStudio chapter I already told you that R is a case sensitive language. Again, that means that uppercase x (X) and lowercase x (x) are different things to R. So, if you assign 2 to lowercase x (
x <- 2). And then later ask R to tell what number you stored in upper case X; you will get an error (
Error: object 'X' not found).
<- 2 x X
## Error in eval(expr, envir, enclos): object 'X' not found
Specifically, this is an example of a logic error. Meaning, R understands what you are asking it to do – you want it to print the contents of the uppercase X object to the screen. However, it can’t complete your request because you are asking it to do something that doesn’t logically make sense – print the contents of a thing that doesn’t exist. Remember, R is literal, and it will not try to guess that you actually meant to ask it to print the contents of lowercase x.
Another general type of error is known as a syntax error. In programming languages, syntax refers to the rules of the language. You can sort of think of this as the grammar of the language. In English, I could say something like, “giving dog water drink.” This sentence is grammatically completely incorrect; however, most of you would roughly be able to figure out what I’m asking you to do based on your life experience and knowledge of the situational context. The R interpreter, as awesome as it is, would not be able to make an assumption about what I want it to do. There would either be one, and only one, preprogrammed correct response to such a request, or the R interpreter would say, “I don’t know what you’re asking me to do.” When the R interpreter says, “I don’t know what you’re asking me to do,” you’ve made a syntax error.
Throughout the rest of the book, I will try to point out situations where R programmers often encounter errors and how you may be able to address them. The remainder of this chapter will discuss some key components of R’s syntax and the data structures (i.e., ways of storing data) that the R syntax interacts with.
R is a functional programming language, which simply means that functions play a central role in the R language. But what are functions? Well, factories are a common analogy used to represent functions. In this analogy, arguments are raw material inputs that go into the factory. For example, steel and rubber. The function is the factory where all the work takes place – converting raw materials into the desired output. Finally, the factory output represents the returned results. In this case, bicycles.
To make this concept more concrete, in the Navigating RStudio chapter we used the
seq() function as a factory. Specifically, we wrote
seq(from = 2, to = 100, by = 2). The inputs (arguments) were
by. The output (returned result) was a set of numbers that went from 2 to 100 by 2’s. Most functions, like the
seq() function, will be a word or word part followed by parentheses. Other examples are the
sum() function for addition and the
mean() function to calculate the average value of a set of numbers.
In addition to functions, the R programming language also includes objects. In the Navigating RStudio chapter we created an object called
x with a value of 2 using the
x <- 2 R code. In general, you can think of objects as anything that lives in your R global environment. Objects may be single variables (also called vectors in R) or entire data sets (also called data frames in R).
Objects can be a confusing concept at first. I think it’s because it is hard to precisely define exactly what an object is. I’ll say two things about this. First, you’re probably overthinking it. When we use R, we create and save stuff. We have to call that stuff something in order to talk about it or write books about it. Somebody decided we would call that stuff “objects.” The second thing I’ll say is that this becomes much less abstract when we finally get to a place where you can really get your hands dirty doing some R programming.
Sometimes it can be useful to relate the R language to English grammar. That is, when you are writing R code you can roughly think of functions as verbs and objects as nouns. Just like nouns are things in the English language, and verbs do things in the English language, objects are things and functions do things in the R language.
So, in the
x <- 2 command
x is the object and
<- is the function. “Wait! Didn’t you just tell us that functions will be a word followed by parentheses?” Fair question. Technically, I said, “Most functions will be a word, or word part, followed by parentheses.” Just like English, R has exceptions. All operators in R are also functions. Operators are symbols like
<-. There are many more operators, but you will notice that they all do things. In this case, they add, subtract, and assign values to objects.
In addition to being a functional programming language, R is also a type of programming language called an open source programming language. For our purposes, this has two big advantages. First, it means that R is FREE! Second, it means that smart people all around the world get to develop new packages for the R language that can do cutting edge and/or very niche things.
That second advantage is probably really confusing if this is not a concept you are already familiar with. For example, when you install Microsoft Word on your computer all the code that makes that program work is owned and Maintained by the Microsoft corporation. If you need Word to do something that it doesn’t currently do, your only option is really to make a feature request on Microsoft’s website.
R works a little differently. When you downloaded R from the CRAN website, you actually downloaded something called Base R. Base R maintained by the R Core Team. However, anybody – even you – can write your own code (called packages) that add new functions to the R syntax. Like all functions, these new functions allow you to do things that you can’t do (or can’t do as easily) with Base R.
An analogy that I really like here is used by Ismay and Kim in ModernDive.
A good analogy for R packages is they are like apps you can download onto a mobile phone. So R is like a new mobile phone: while it has a certain amount of features when you use it for the first time, it doesn’t have everything. R packages are like the apps you can download onto your phone from Apple’s App Store or Android’s Google Play.1
So, when you get a new smart phone it comes with apps for making phone calls, checking email, and sending text messages. But, what if you want to listen to music on Spotify? You may or may not be able to do that through your phone’s web browser, but it’s way more convenient and powerful to download and install the Spotify app.
In this course, we will make extensive use of packages developed by people and teams outside of the R Core Team. In particular, we will use a number of related packages that are collectively known as the Tidyverse. One of the most popular packages in the tidyverse collection (and one of the most popular R packages overall) is called the
dplyr package for data management.
In the same way that you have to download and install Spotify on your mobile phone before you can use it, you have to download and install new R packages on your computer before you can use the functions they contain. Fortunately, R makes this really easy. For most packages, all you have to do is run the
install.packages() function in the R console. For example, here is how you would install the
# Make sure you remember to wrap the name of the package in single or double quotes. install.packages("dplyr")
Over time, you will download and install a lot of different packages. All those packages with all of those new functions start to create a lot of overhead. Therefore, R doesn’t keep them loaded and available for use at all times. Instead, every time you open RStudio, you will have to explicitly tell R which packages you want to use. So, when you close RStudio and open it again, the only functions that you will be able to use are Base R functions. If you want to use functions from any other package (e.g.,
dplyr) you will have to tell R that you want to do so using the
# No quotes needed here library(dplyr)
Technically, loading the package with the
library() function is not the only way to use a function from a package you’ve downloaded. For example, the
dplyr package contains a function called
filter() that helps us keep or drop certain rows in a data frame. To use this function, we have to first download the
dplyr package. Then we can use the filter function in one of two different ways.
library(dplyr) filter(states_data, state == "Texas") # Keeps only the rows from Texas
The first way you already saw above. Load all the functions contained in the
dplyr package using the
library() function. Then use that function just like any other Base R function.
The second way is something called the double colon syntax. To use the double colon syntax, you type the package name, two colons, and the name of the function you want to use from the package. Here is an example of the double colon syntax.
::filter(states_data, state == "Texas") # Keeps only the rows from Texasdplyr
Most of the time you will load packages using the
library() function. However, I wanted to show you the double colon syntax because you may come across it when you are reading R documentation and because there are times when it makes sense to use this syntax.
Finally, I want to discuss programming style. R can read any code you write as long as you write it using valid R syntax. However, R code can be much easier or harder for people (including you) to read depending on how it’s written. The coding best practices chapter of this book gives complete details on writing R code that is as easy as possible for people to read. So, please make sure to read it. It will make things so much easier for all of us!