11 Introduction to data transfer

In previous chapters, you wrote your own simple R programs by directly creating data frames in RStudio with the data.frame() function, the tibble() function, or the tribble() function. I consider this to be a really fundamental skill to master because it allows you to simulate data and get your data into R regardless of what format that data is stored in (assuming you can “see” the stored data). In other words, if nothing else, you can always resort to creating data frames this way.

In practice, however, this is not how people generally exchange data. You might recall that in Section 2.2.1 Transferring data I briefly mentioned the need to get data into R that others have stored in various different file types. These file types are also sometimes referred to as file formats. Common examples encountered in epidemiology include database files, spreadsheets, text files, SAS data sets, and Stata data sets.

Further, the data frames we’ve created so far don’t currently live in our global environment from one programming session to the next. We haven’t yet learned how to efficiently store our data long-term. I think the limitations of having to manually create a data frame every time you start a new programming session are probably becoming obvious to you.

In this part of the book, you will learn to import data stored in various different file types into R for data management and analysis, you will learn to store R data frames in a more permanent way so that you can come back later to modify or analyze them, and you will learn to export data so that you may efficiently share it with others.