Introduction to R

22. The powerful “merge” function

Merging dataframes is even more powerful.  This process involves combining (merging) dataframes that share both common and different variables (columns) as well as some matching rows. For example, a trial may have measured blood concentrations of a drug as well as the effect of the drug eg. heart rate.  These data may have been collected at different times after the dose and the information was stored in separate files.  For example, one dataset pertains to the concentrations another to the effect on exercise-induced heart rate.  Not all of the collections were the same time in each subject or measurement (concentration or effect). Naturally it’s best to have these key data in a single dataframe, and probably ordered by subject ID and sequentially in time. Doing so in Excel, for example, would involve a lot of copying and pasting, re-ordering, sorting, with each step prone to errors and very time consuming.  The merge(dataframe1, dataframe2, dataframe3, etc) makes this very simple.

In this case we will merge some drug concentration-time data and some effect-time data. The effect is heart rate (hr).  Import the concentration_data.csv and effect_data.csv (link this please) files into R:

conc_data <- read.csv(file = 'concentration_data.csv', header = TRUE, sep = ',', na.strings=c("",".","NA"))

head(conc_data)

hr_data <- read.csv(file = 'effect_data.csv', header = TRUE, sep = ',', na.strings=c("",".","NA"))

head(hr_data)

 

all_data <- merge(conc_data, hr_data, all = TRUE)

head(all_data)

Notice that the data from both subjects now includes both the variable conc (concentration) and hr (heart rate), and that they are ordered sequentially by time.

Exercise:

Create plots of concentration versus time and heart-rate versus time, colour the symbols by subject ID, and save the plots.  Do this for both the original data and the merged data.  Satisfy yourself that they are identical.