R packages for reading and writing data
This is part of a series of posts about which R package to use for various tasks.
(Accessibility note: When I’ve mentioned a package that’s available on CRAN, I’ve included a badge (from METACRAN) showing how many monthly downloads the package has. There’s no good way to make these accessible, and I thought the nicest experience for screen reader users might be to ignore them entirely, so I’ve given them empty alt attributes.)
Microsoft Excel
Short answer: Use readxl.
More details: “Compared to many of the existing packages (e.g. gdata, xlsx, xlsReadWrite) readxl has no external dependencies, so it’s easy to install and use on all operating systems.” (source) xlsx in particular depends on a Java library; openxlsx does not, but is less popular than readxl.
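A minimal sketch of reading a spreadsheet with readxl (the file name and sheet name here are placeholders, not real data):

```r
library(readxl)

# Read the first sheet of a workbook into a tibble
df <- read_excel("sales.xlsx", sheet = 1)

# Column types are guessed by default, but can be forced
# with col_types; sheets can also be selected by name
q1 <- read_excel("sales.xlsx", sheet = "Q1", col_types = "text")
```

Note that readxl only reads Excel files; for writing them you’d need a package like openxlsx.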
JSON
Short answer: Use jsonlite. It’s by far the most popular.
More details: This article has some feature comparisons and performance benchmarks. The main caveat of note is that rjson seems to be faster (at the expense of some features), so it might be a better choice when performance really matters.
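A quick sketch of the jsonlite round trip (the JSON string here is made up for illustration):

```r
library(jsonlite)

# An array of objects parses into a data frame by default
txt <- '[{"name": "Ada", "year": 1815}, {"name": "Alan", "year": 1912}]'
df <- fromJSON(txt)
df$name                     # "Ada" "Alan"

# Serialize an R object back to JSON
toJSON(df, pretty = TRUE)
```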
XML
Depends on your needs. xml2 is newer and more modern, but XML has more functions for parsing special kinds of XML files, e.g. xmlToDataFrame for simple table-like structures (I think), and readKeyValueDB to read plist files.
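For the common case, a small sketch of parsing with xml2 (the XML snippet is invented for the example):

```r
library(xml2)

doc <- read_xml("<books><book id='1'><title>R for Data Science</title></book></books>")

# XPath queries return node sets; extract text or attributes from them
titles <- xml_find_all(doc, ".//title")
xml_text(titles)                                  # "R for Data Science"
xml_attr(xml_find_first(doc, ".//book"), "id")    # "1"
```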
Data from URLs/API endpoints
To download any file and save it locally, you can use the built-in download.file() function. If you just want to keep the text as a character vector in memory, you should probably use httr. Older sources might recommend RCurl, but it doesn’t support newer versions of TLS, which will prevent you from downloading from many websites (e.g. GitHub).
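A sketch of both approaches (the URL is a placeholder; these lines need network access to run):

```r
library(httr)

# Download a file to disk with the built-in function
download.file("https://example.com/data.csv", destfile = "data.csv")

# Fetch a URL and keep the body as text in memory
resp <- GET("https://example.com/data.csv")
stop_for_status(resp)  # raise an error on HTTP failures (4xx/5xx)
txt <- content(resp, as = "text", encoding = "UTF-8")
```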
Some packages for reading specific text formats (e.g. jsonlite, xml2) can also take URLs as input, bypassing the need for a separate package.