R equivalent of SQL coalesce
In SQL there is a function called coalesce
, which accepts any number of columns and returns the first non-null value in each row, reading across the columns in the order they’re passed in. This is super useful when you’re trying to relationalize a not-so-relational database, where a desired value might be found in one column or in another.
Today I needed to do the same thing in CRAN R, and a few quick Google searches for “R equivalent of SQL coalesce” didn’t turn up anything. So I wrote this quick function which accepts any number of R vectors of the same length and returns the first non-NA value in each position.
# accepts a list of vectors of identical length and returns one vector with the first non-NA value
coalesce = function(...) {
# convert input arguments into a list of vectors
input_list = list(...)
# check that all input vectors are of same length
vectorlength = length(input_list[[1]])
for (j in 1:length(input_list)) {
if(length(input_list[[j]]) != vectorlength) {
stop(paste("Not all vectors are of same length. First vector length: ",vectorlength,". Vector #",j,"'s length: ",length(input_list[[j]]),sep=""))
}
}
# create a result vector to fill with first non-NA values
result = rep(NA,vectorlength)
# fill with first non-NA value
for (i in 1:length(result)) {
for (j in 1:length(input_list)) {
if(!is.na(input_list[[j]][i])) {
result[i] = input_list[[j]][i]
break
}
}
}
return(result)
}
# examples to show how it works
most_preferred_measurement = seq(1,10,1)
most_preferred_measurement = NA
backup_measurement = seq(11,20,1)
backup_measurement = NA
least_preferred_measurement = seq(21,30,1)
least_preferred_measurement = NA
other_vector_of_different_length = seq(31,35,1)
coalesce(most_preferred_measurement,backup_measurement,least_preferred_measurement) # this works
coalesce(most_preferred_measurement,backup_measurement,least_preferred_measurement,other_vector_of_different_length) # this will throw an error
If you run this code, you’ll see the expected result:
> coalesce(most_preferred_measurement,backup_measurement,least_preferred_measurement) # this works [1] 11 2 3 24 NA 16 17 18 9 10 > coalesce(most_preferred_measurement,backup_measurement,least_preferred_measurement,other_vector_of_different_length) # this will throw an error Error in coalesce(most_preferred_measurement, backup_measurement, least_preferred_measurement, : Not all vectors are of same length. First vector length: 10. Vector #4's length: 5 >
Enjoy!