16 Exporting data

The data frames we’ve created so far don’t persist in our global environment from one R session to the next, because we haven’t yet learned how to store our data long-term. This limitation makes it difficult to share our data with others, or even to come back later and modify or analyze it ourselves. In this chapter, you will learn to export data from R’s memory to a file on your hard drive so that you can store it efficiently or share it with others. In the examples that follow, I’m going to use this simulated data:

library(tibble)

demo <- tibble(
  id  = c("001", "002", "003", "004"),
  age = c(30, 67, 52, 56),
  edu = c(3, 1, 4, 2)
)

👆 Here’s what we did above:

  • We created a data frame that is meant to simulate some demographic information about 4 hypothetical study participants.

  • The first variable (id) is the participant’s study id.

  • The second variable (age) is the participant’s age at enrollment in the study.

  • The third variable (edu) is the highest level of formal education the participant completed, where:

    • 1 = Less than high school

    • 2 = High school graduate

    • 3 = Some college

    • 4 = College graduate

16.1 Plain text files

Most of readr’s read_ functions that were introduced in the importing plain text files chapter have a write_ counterpart that allows you to export data from R into a plain text file.
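Here’s a minimal sketch of a few of those write_ counterparts. It assumes the readr and tibble packages are installed. It writes to R’s temporary folder (tempdir()) rather than a desktop so the example runs on any computer. The file names are only illustrations.

```r
library(readr)
library(tibble)

demo <- tibble(
  id  = c("001", "002", "003", "004"),
  age = c(30, 67, 52, 56),
  edu = c(3, 1, 4, 2)
)

# Temporary folder, used only so this sketch runs anywhere
out_dir <- tempdir()

# Comma-separated values
write_csv(demo, file.path(out_dir, "demo.csv"))

# Tab-separated values
write_tsv(demo, file.path(out_dir, "demo.txt"))

# Any single-character delimiter you choose; here, a pipe
write_delim(demo, file.path(out_dir, "demo_pipe.txt"), delim = "|")

# Peek at the raw text of the pipe-delimited file
readLines(file.path(out_dir, "demo_pipe.txt"))
```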

Additionally, most of haven’s read_ functions that were introduced in the importing binary files chapter have a write_ counterpart that allows you to export data from R into SAS, Stata, and SPSS binary file formats.
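As a rough sketch, here is how the demo data might be written to each of those formats with haven. It assumes haven and tibble are installed. The SAS example uses write_xpt(), which writes a SAS transport (.xpt) file. Again, the temporary folder and file names are only for illustration.

```r
library(haven)
library(tibble)

demo <- tibble(
  id  = c("001", "002", "003", "004"),
  age = c(30, 67, 52, 56),
  edu = c(3, 1, 4, 2)
)

out_dir <- tempdir()

write_dta(demo, file.path(out_dir, "demo.dta"))  # Stata
write_sav(demo, file.path(out_dir, "demo.sav"))  # SPSS
write_xpt(demo, file.path(out_dir, "demo.xpt"))  # SAS transport file

# Read the Stata file back in to check the export
demo_dta <- read_dta(file.path(out_dir, "demo.dta"))
```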

Interestingly, readxl does not have a write_excel() function for exporting R data frames as .xls or .xlsx files. However, this gap matters less than it might seem, because Excel can open .csv files and readr contains a function (write_csv()) for exporting data frames in the .csv file format. If you absolutely have to export your data frame as a .xls or .xlsx file, there are other R packages capable of doing so (e.g., xlsx).

So, with all these options, what format should you choose? My answer sort of depends on the answers to two questions. First, will this data be shared with anyone else? Second, will I need any of the metadata that would be lost if I export this data to a plain text file?

Unless you have a compelling reason to do otherwise, I’m going to suggest that you always export your R data frames as csv files if you plan to share your data with others. The reason is simple. They just work. I can think of many times when someone sent me a SAS or Stata data set and I wasn’t able to import it for some reason or the data didn’t import in the way that I expected it to. I don’t recall ever having that experience with a csv file. Further, every operating system and statistical analysis software application that I’m aware of is able to accept csv files. Perhaps for that reason, they have become the closest thing to a standard for data sharing that exists – at least that I’m aware of.

Exporting an R data frame to a csv file is really easy. The example below shows how to export our simulated demographic data to a csv file on my computer’s desktop:

readr::write_csv(demo, "/Users/bradcannell/Desktop/demo.csv")

👆 Here’s what we did above:

  • We used readr’s write_csv() function to export a data frame called demo in our global environment to a csv file on our desktop called demo.csv.

  • You can type ?write_csv into your R console to view the help documentation for this function and follow along with the explanation below.

  • The first argument to the write_csv() function is the x argument. The value passed to the x argument should be a data frame that is currently in our global environment.

  • The second argument to the write_csv() function is the file argument (called path in older versions of readr). The value passed to the file argument should be a file path telling R where to create the new csv file.

    • You name the csv file directly in the file path. Whatever name you write after the final slash in the file path is what the csv file will be named.

    • As always, make sure you remember to include the file extension in the file path.
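The next sketch makes the same call with both arguments named explicitly, and then reads the file back to confirm that the export worked. It assumes a recent version of readr, where the second argument is named file. It builds the path with file.path() and tempdir() so it runs on any operating system; swap in your own folder. Because readr guesses each column’s type when reading, the sketch tells read_csv() that id is a character column, so the leading zeros are kept.

```r
library(readr)
library(tibble)

demo <- tibble(
  id  = c("001", "002", "003", "004"),
  age = c(30, 67, 52, 56),
  edu = c(3, 1, 4, 2)
)

# Portable path to a file called demo.csv in a temporary folder
csv_path <- file.path(tempdir(), "demo.csv")

# Same export as above, with the arguments named
write_csv(x = demo, file = csv_path)

# Read it back, keeping id as character so "001" stays "001"
demo_csv <- read_csv(csv_path, col_types = cols(id = col_character()))
```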

Even if you don’t plan on sharing your data, there is another benefit to saving your data as a csv file. That is, it’s easy to open the file and take a quick peek if you need to for some reason. You don’t have to open R and load the file. You can just find the file on your computer, double-click it, and quickly view it in your text editor or spreadsheet application of choice.

However, there is a downside to saving your data frames to a csv file. In general, csv files don’t store any metadata, which can sometimes be a problem (or at least a pain). For example, if you’ve coerced several variables to factors, that information would not be preserved in the csv file. Instead, the factors will be converted to character strings. If you need to preserve metadata, then you may want to save your data frames in a binary format.
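To see this metadata loss for yourself, here’s a small sketch. It coerces edu to a factor using the labels from the start of the chapter, writes the data to a csv file, and reads the file back in. The edu_f column name and the temporary file path are made up for this example.

```r
library(readr)
library(tibble)

demo <- tibble(
  id  = c("001", "002", "003", "004"),
  age = c(30, 67, 52, 56),
  edu = c(3, 1, 4, 2)
)

# Coerce edu to a factor (edu_f is a hypothetical column name)
demo$edu_f <- factor(
  demo$edu,
  levels = 1:4,
  labels = c("Less than high school", "High school graduate",
             "Some college", "College graduate")
)

csv_path <- file.path(tempdir(), "demo_factor.csv")
write_csv(demo, csv_path)
demo_csv <- read_csv(csv_path)

class(demo$edu_f)      # "factor"
class(demo_csv$edu_f)  # "character": the factor levels were not stored
```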

16.2 R binary files

In the chapter on importing binary files I mentioned that most statistical analysis software allows you to save your data in a binary file format. The primary advantage to doing so is that potentially useful metadata is stored alongside your analysis data. We were first introduced to factor vectors in the chapter on numerical descriptions of categorical variables. There, we saw how coercing some of your variables to factors can be useful. However, doing so requires R to store metadata along with the analysis data. That metadata would be lost if you were to export your data frame to a plain text file. This is an example of a time when we may want to consider exporting our data to a binary file format.

R actually allows you to save your data in multiple different binary file formats. The two most popular are the .Rdata format and the .Rds format. I’m going to suggest that you use the .Rds format to save your R data frames. Exporting to this format is really easy with the readr package.
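readr’s write_rds() and read_rds() are wrappers around base R’s saveRDS() and readRDS(). The sketch below uses base R to compare the two formats. One practical difference is how you get the data back. load() recreates an .Rdata object under the name it was saved with, and it silently overwrites any existing object with that name. readRDS() simply returns the object, so you choose what to call it. This is one common reason people prefer .Rds. The file paths are temporary and only for illustration.

```r
library(tibble)

demo <- tibble(
  id  = c("001", "002", "003", "004"),
  age = c(30, 67, 52, 56),
  edu = c(3, 1, 4, 2)
)

rdata_path <- file.path(tempdir(), "demo.Rdata")
rds_path   <- file.path(tempdir(), "demo.rds")

# .Rdata: save() stores one or more objects together with their names
save(demo, file = rdata_path)

# .Rds: saveRDS() stores a single object without its name
saveRDS(demo, rds_path)

rm(demo)

# load() brings demo back under its original name
load(rdata_path)

# readRDS() returns the object, so you pick the name
demo_copy <- readRDS(rds_path)
```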

The example below shows how to export our simulated demographic data to an .Rds file on my computer’s desktop:

readr::write_rds(demo, "/Users/bradcannell/Desktop/demo.rds")

👆 Here’s what we did above:

  • We used readr’s write_rds() function to export a data frame called demo in our global environment to an .Rds file on our desktop called demo.rds.

  • You can type ?write_rds into your R console to view the help documentation for this function and follow along with the explanation below.

  • The first argument to the write_rds() function is the x argument. The value passed to the x argument should be a data frame that is currently in our global environment.

  • The second argument to the write_rds() function is the file argument (called path in older versions of readr). The value passed to the file argument should be a file path telling R where to create the new .Rds file.

    • You name the .Rds file directly in the file path. Whatever name you write after the final slash in the file path is what the .Rds file will be named.

    • As always, make sure you remember to include the file extension in the file path.
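By default, write_rds() writes an uncompressed file, whereas base R’s saveRDS() compresses by default. If file size matters, write_rds() has a compress argument. Here’s a minimal sketch using gzip compression, with temporary file paths chosen just for the example. For a four-row data frame, the two file sizes will be close.

```r
library(readr)
library(tibble)

demo <- tibble(
  id  = c("001", "002", "003", "004"),
  age = c(30, 67, 52, 56),
  edu = c(3, 1, 4, 2)
)

plain_path <- file.path(tempdir(), "demo_plain.rds")
gz_path    <- file.path(tempdir(), "demo_gz.rds")

write_rds(demo, plain_path)                # default: no compression
write_rds(demo, gz_path, compress = "gz")  # gzip-compressed

# File sizes in bytes
file.size(c(plain_path, gz_path))

# read_rds() reads either file without any extra arguments
identical(read_rds(plain_path), read_rds(gz_path))
```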

To load the .Rds data back into your global environment, simply pass the path to the .Rds file to readr’s read_rds() function:

demo <- readr::read_rds("/Users/bradcannell/Desktop/demo.rds")
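Unlike the csv round trip, an .Rds round trip keeps metadata such as factor levels. The sketch below checks this by adding a factor version of edu (edu_f, a hypothetical column name), writing the data with write_rds(), and reading it back with read_rds(). It uses a temporary path instead of the desktop.

```r
library(readr)
library(tibble)

demo <- tibble(
  id  = c("001", "002", "003", "004"),
  age = c(30, 67, 52, 56),
  edu = c(3, 1, 4, 2)
)

demo$edu_f <- factor(
  demo$edu,
  levels = 1:4,
  labels = c("Less than high school", "High school graduate",
             "Some college", "College graduate")
)

rds_path <- file.path(tempdir(), "demo_factor.rds")
write_rds(demo, rds_path)
demo_rds <- read_rds(rds_path)

class(demo_rds$edu_f)      # "factor": the levels survived the round trip
identical(demo, demo_rds)  # TRUE
```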

There is a final thought I want to share on exporting data frames. When I got to the end of this chapter, it occurred to me that the way I wrote it may give the impression that you must choose to export data frames as plain text files or binary files, but not both. That isn’t the case. I frequently export my data as a csv file that I can easily open, view, and share with others, and also as an .Rds file that retains useful metadata I might need the next time I return to my analysis. I suppose there could be times when your files are so large that this is not an efficient strategy, but that is generally not the case in my projects.
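If you find yourself doing this often, you could wrap both exports in a small function. The export_both() helper below is hypothetical and is not part of readr. It is just a sketch that writes a csv file and an .Rds file with the same base name, then returns their paths. The temporary folder in the usage line stands in for your own project folder.

```r
library(readr)
library(tibble)

# Hypothetical helper: write one data frame to both a csv and an .Rds file
export_both <- function(df, dir, name) {
  csv_path <- file.path(dir, paste0(name, ".csv"))
  rds_path <- file.path(dir, paste0(name, ".rds"))
  write_csv(df, csv_path)  # easy to open, view, and share
  write_rds(df, rds_path)  # keeps factors and other metadata
  invisible(c(csv = csv_path, rds = rds_path))
}

demo <- tibble(
  id  = c("001", "002", "003", "004"),
  age = c(30, 67, 52, 56),
  edu = c(3, 1, 4, 2)
)

paths <- export_both(demo, tempdir(), "demo")
```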