Thursday 26 March 2015

Getting variable labels in R, from SPSS

When using R, we may find that our data has been saved by a different statistics package. Other statistical software often includes functions to export data to a different filetype (or we could simply save it as a .csv file), but R can also import some datasets directly from their native filetype.

SPSS is one of those filetypes. SPSS datafiles have a .sav extension, and we can import them into R using the foreign package. This package is installed by default as part of a standard R installation.

Ensure the foreign package is attached:
library(foreign)

There is a nifty trick for getting the filepath of the SPSS datafile you wish to import. Run:
file.choose()

Copy and paste the filepath into this code:
dataset <- read.spss("[filepath including filename goes here]", to.data.frame=TRUE)
The to.data.frame=TRUE option at the end returns the data as a dataframe (by default, read.spss() returns a list), which is the type of data object I want in R.
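If you would rather skip the copy-and-paste step, file.choose() can be passed straight to read.spss(), because it simply returns the chosen path as a character string. Below is a minimal sketch wrapped in a helper I've called read_sav_as_df(); that name is hypothetical and not part of the foreign package. Note that file.choose() only works in an interactive R session.

```r
library(foreign)

# Hypothetical helper (not part of foreign): if no path is given, the default
# argument opens the file picker; either way the .sav file comes back as a dataframe.
read_sav_as_df <- function(path = file.choose()) {
  read.spss(path, to.data.frame = TRUE)
}

# Usage in an interactive session:
# dataset <- read_sav_as_df()                        # pick the file in a dialog
# dataset <- read_sav_as_df("C:/data/survey.sav")   # or give a path directly
```

The default argument is only evaluated when path is actually used, so supplying a path never opens the dialog.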

Note: I am using dataset as my dataset name in this example. Use whatever name is best for you, and remember to change all instances of dataset to your actual dataset name in later code.

Unfortunately, if your SPSS datafile had variable labels (e.g. "Sex of respondent"), these aren't shown in the R dataframe; only the variable names are shown (e.g. Sex). The labels are not lost, though: read.spss() keeps them as an attribute of the dataframe. While the name is often clear for variables such as sex, you may find that names are much less clear for other variables (e.g. in a survey containing multiple "select all that apply" type questions/responses). It is therefore very useful to have a list of the variable names and their associated labels.

You can simply print this concordance of names and labels to the console by using:
attr(dataset, "variable.labels")

I didn't find this helpful, for two reasons:
  • I have a lot of variables, so it takes up a sizeable amount of console space
  • I am going to keep referring to the labels when I do analyses, and leaving the information in the console is not helpful if I have to keep scrolling back or reissuing the command
I found it much more efficient to output the variable names and their labels to a separate dataframe that I can refer back to:
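For a quick one-off check, you don't have to print every label: the attribute is a named character vector, so you can index it by variable name. The sketch below uses a small made-up dataframe called toy (the variables Q5a and Q5b and their labels are hypothetical) standing in for data imported with read.spss(); replace toy with your own dataset name.

```r
# Toy stand-in for an imported SPSS file (made-up variables and labels).
toy <- data.frame(Sex = c("M", "F"), Q5a = c(1, 0), Q5b = c(0, 1))
attr(toy, "variable.labels") <- c(Sex = "Sex of respondent",
                                  Q5a = "Q5: Sources used - Newspaper",
                                  Q5b = "Q5: Sources used - Radio")

labs <- attr(toy, "variable.labels")

# Label for a single variable, looked up by its name
labs[["Q5a"]]

# Labels for several variables at once (returned as a named vector)
labs[c("Q5a", "Q5b")]
```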
dataset.labels <- as.data.frame(attr(dataset, "variable.labels"))
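With the one-liner above, the variable names end up as row names, and the single label column gets a long auto-generated name. If you would prefer the names as an ordinary column (handy for searching or sorting), one option is to build the dataframe explicitly. This is a sketch using the same kind of made-up toy dataframe as a stand-in for your imported data; the column names name and label are my own choice.

```r
# Toy stand-in for an imported SPSS file (made-up variables and labels).
# Replace `toy` with your own dataset name.
toy <- data.frame(Sex = c("M", "F"), Q5a = c(1, 0), Q5b = c(0, 1))
attr(toy, "variable.labels") <- c(Sex = "Sex of respondent",
                                  Q5a = "Q5: Sources used - Newspaper",
                                  Q5b = "Q5: Sources used - Radio")

labs <- attr(toy, "variable.labels")

# One row per variable: the name in one column, its label in the other
toy.labels <- data.frame(
  name  = names(labs),
  label = unname(labs),
  stringsAsFactors = FALSE
)

# Find every variable whose label mentions a keyword (case-insensitive)
toy.labels[grepl("sources used", toy.labels$label, ignore.case = TRUE), ]
```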

Voilà: five lines of code to get my SPSS data and variable labels into R.