Erik Igelström

R packages for reading and writing data

This is part of a series of posts about which R package to use for various tasks.


Microsoft Excel

Short answer: Use readxl.

More details: According to the readxl documentation, “Compared to many of the existing packages (e.g. gdata, xlsx, xlsReadWrite) readxl has no external dependencies, so it’s easy to install and use on all operating systems.” xlsx in particular depends on a Java library. openxlsx does not, but it is less popular than readxl.
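As a minimal sketch of what reading a workbook with readxl looks like, the example below uses one of the sample files bundled with the package, so it does not assume any file of your own. The variable names are only illustrative.

```r
# Requires the readxl package: install.packages("readxl")
library(readxl)

# readxl ships with sample workbooks; readxl_example() returns the path to one.
path <- readxl_example("datasets.xlsx")

# List the sheet names in the workbook, then read one sheet by name.
sheets <- excel_sheets(path)
iris_df <- read_excel(path, sheet = "iris")

# read_excel() returns a tibble, which is also a regular data frame.
head(iris_df)
```

For your own data, replace `path` with the path to your .xls or .xlsx file.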

JSON

Short answer: Use jsonlite. It’s by far the most popular.

More details: This article has some feature comparisons and performance benchmarks. The main caveat is that rjson seems to be faster (at the expense of some features), so it might be the better choice if performance is really important.
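A short sketch of a round trip with jsonlite, parsing a JSON string into a data frame and writing it back out. The JSON content and variable names here are made up for illustration.

```r
# Requires the jsonlite package: install.packages("jsonlite")
library(jsonlite)

# Illustrative JSON: an array of objects that share the same keys.
json_text <- '[{"name": "Ada", "age": 36}, {"name": "Grace", "age": 45}]'

# By default, fromJSON() simplifies an array of similar objects to a data frame.
people <- fromJSON(json_text)

# toJSON() converts the data frame back to JSON; pretty = TRUE adds indentation.
json_out <- toJSON(people, pretty = TRUE)

# Parsing the output again gives back the same table.
people_again <- fromJSON(json_out)
```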

XML

It depends on your needs. xml2 is newer and has a more modern interface, but XML has more functions for parsing special kinds of XML files, e.g. xmlToDataFrame for simple table-like structures (I think), and readKeyValueDB for reading plist files.
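For a sense of the xml2 workflow, here is a small sketch that parses an XML string and pulls values out with XPath. The document structure is invented for the example, and building the data frame by hand is just one possible approach.

```r
# Requires the xml2 package: install.packages("xml2")
library(xml2)

# Illustrative XML: a simple table-like structure.
xml_input <- '<books>
  <book id="1"><title>R Basics</title></book>
  <book id="2"><title>More R</title></book>
</books>'

# read_xml() accepts a file path, a URL, or literal XML as a string.
doc <- read_xml(xml_input)

# Select nodes with XPath, then extract attributes and text content.
books  <- xml_find_all(doc, "//book")
ids    <- xml_attr(books, "id")
titles <- xml_text(xml_find_all(doc, "//book/title"))

# Assemble the extracted values into a data frame.
book_df <- data.frame(id = ids, title = titles)
```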

Data from URLs/API endpoints

To download any file and save it as a local file, you can use the built-in download.file() function. If you just want to keep the text as a character vector in memory, you should probably use httr. Older sources might recommend RCurl, but it doesn’t support newer versions of TLS, which will prevent you from downloading from many websites (e.g. GitHub).
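The two options can be sketched roughly as follows. The helper names `save_to_file` and `fetch_text` are hypothetical wrappers written for this example, not part of base R or httr, and the URL in the usage note is a placeholder.

```r
# Requires the httr package: install.packages("httr")
library(httr)

# Option 1: save the file to disk with base R's download.file().
# mode = "wb" writes in binary mode, which avoids corrupting
# non-text files on Windows.
save_to_file <- function(url, destfile) {
  download.file(url, destfile, mode = "wb")
  invisible(destfile)
}

# Option 2: keep the response body in memory as a character vector.
fetch_text <- function(url) {
  resp <- GET(url)
  # Turn HTTP errors (e.g. 404) into R errors instead of returning an error page.
  stop_for_status(resp)
  content(resp, as = "text", encoding = "UTF-8")
}
```

You would then call, for example, `save_to_file("https://example.com/data.csv", "data.csv")` or `txt <- fetch_text("https://example.com/data.csv")`, substituting a real URL.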

Some packages for reading specific text formats (e.g. jsonlite, xml2) can also take URLs as input, bypassing the need for a separate package.
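As a brief sketch, both jsonlite::fromJSON() and xml2::read_xml() accept a URL in place of a file path or string. The wrapper names below are hypothetical and exist only to keep the example from making network requests until you call them with a real URL.

```r
# Requires jsonlite and xml2: install.packages(c("jsonlite", "xml2"))
library(jsonlite)
library(xml2)

# fromJSON() fetches and parses in one step when given a URL.
read_json_url <- function(url) {
  fromJSON(url)
}

# read_xml() likewise downloads and parses the document at a URL.
read_xml_url <- function(url) {
  read_xml(url)
}
```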

