Tuesday 14 May 2013

Easier confidence interval estimation with matrices and similar arrays in R

When dealing with survey data in particular, social scientists are often wanting to produce proportions from the data, and associated confidence intervals. The prop.test command in R can be used to generate the desired results. When dealing with small tables, such as 2x2, prop.test is easily applied to the output of an xtabs command, e.g.
load(MASS)
test <- xtabs(~low+smoke, data=birthwt)
test
#get 95% CI estimate of the proportion of low birth weight babies who had mothers who smoked during pregnancy
prop.test(test[4], test[2]+test[4], 0.05)

We can see that the xtab output can be referenced in the prop.test command by putting in the cell reference information for the data. The cell references are column-by-column, so go down each row in the first column, before moving onto the second column. We can also see that this type of prop.test command is quite short when the xtabs output is relatively small.

Issues arise when there are multiple rows or columns in the xtabs output. For example, if we wanted to examine number of physician's visits in the first trimester by mother's age:
test2 <- xtabs(~age+ftv, data=birthwt) 
test2 

Now if we want to know the proportion and associated CIs for each age of the mother, for each number of physician visits, there are 24 cells to sum for each physician visit count. This is way too long to enter into our prop.test; while prop.test can handle summing all these numbers, the R syntax will be hard to understand.

This is where colSums is a handy command. The column sums for test2 are simply calculated using:
colSums(test2)

 This command can be easily used in prop.test. For example, to get the proportion of mothers who saw no physical, who were 19 years of age, the prop.test command is:
prop.test(test2[6], colSums(test2)[1], 0.05)

Because the count we want is cell number 6 in test2, and the colSum we want is for the first column. We can instruct R to conduct the colSums test within prop.test, or alternatively we could assign the colSums result to an object, and then reference the cell in the object.