9. Subsetting dataframes

To subset a dataframe, we can use any of the logical operators described above. For example, we might want to keep only the data from subjects aged 25 or over. We might also want to store the result as a separate dataframe so as not to overwrite our original data. We can then check that the command worked by finding the minimum age in the new dataframe with the min function:

data_gt25y <- data[data$age >= 25,]

head(data_gt25y)

min(data_gt25y$age)
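One caveat is worth knowing. If age contains missing values, bracket indexing with data$age >= 25 returns a row of NAs for each missing age, and min() on the result then returns NA. The base R function subset() treats a missing condition as FALSE and drops those rows instead. The sketch below uses a small made-up dataframe: the name toy and all of its values are hypothetical, not the survey data.

```r
# Hypothetical toy dataframe (invented values) with one missing age
toy <- data.frame(
  age = c(22, 25, 31, NA, 40),
  comfort = c(2, NA, 4, 5, NA)
)

# Bracket indexing: the NA age (NA >= 25 is NA) yields a row of NAs
toy_bracket <- toy[toy$age >= 25, ]

# subset() treats an NA condition as FALSE, so that row is dropped
toy_subset <- subset(toy, age >= 25)

print(toy_bracket)
print(toy_subset)
min(toy_bracket$age)  # NA, because of the all-NA row
min(toy_subset$age)   # the true minimum of the kept ages
```

Wrapping the condition in which(), as in toy[which(toy$age >= 25), ], has the same effect as subset() here, because which() returns only the positions where the condition is TRUE.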

The above method subsets the entire dataframe. If we instead wanted to subset only the variable “comfort”, keeping just the ratings from subjects who found their boots uncomfortable or worse (i.e., a rating of 4 or more):

data$comfort[data$comfort >= 4]
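Comparing a missing value with 4 gives NA rather than TRUE or FALSE, so logical indexing returns an NA at every position where the rating itself is missing. The sketch below shows this, plus two ways to exclude those NAs: combining the condition with !is.na(), or counting matches with sum(..., na.rm = TRUE). It uses an invented vector; the name comfort and its values are hypothetical.

```r
# Hypothetical comfort ratings (invented values); NA marks a missing response
comfort <- c(1, 4, NA, 5, 2, NA, 3)

# Plain logical indexing keeps an NA for each missing rating
uncomf_raw <- comfort[comfort >= 4]

# Requiring a non-missing value first drops those NAs
uncomf <- comfort[!is.na(comfort) & comfort >= 4]

# Count the ratings of 4 or more, ignoring missing values
n_uncomf <- sum(comfort >= 4, na.rm = TRUE)

print(uncomf_raw)
print(uncomf)
print(n_uncomf)
```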

This shows that very few subjects found their boots uncomfortable or worse. The large number of NA values in the output also shows that many subjects did not report a comfort rating at all! These NA (missing) values can cause problems. Try to calculate the median comfort rating: because missing values are present, median() returns NA. Then add the argument na.rm=TRUE (remove NA values = TRUE) so that the missing values are ignored:

median(data$comfort)

median(data$comfort, na.rm=TRUE)
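Because na.rm=TRUE silently drops missing values, it can be worth checking how many there are and how many ratings the median is actually based on. A minimal sketch on invented ratings (the name comfort and its values are hypothetical):

```r
# Hypothetical comfort ratings (invented values); NA marks a missing response
comfort <- c(1, 4, NA, 5, 2, NA, 3)

med_all <- median(comfort)                # NA, because a missing value is present
med_obs <- median(comfort, na.rm = TRUE)  # median of the observed ratings only

n_missing <- sum(is.na(comfort))          # number of missing responses
n_used <- sum(!is.na(comfort))            # number of ratings behind med_obs

print(c(median = med_obs, used = n_used, missing = n_missing))
```

Reporting the number of ratings used alongside the median makes it clear to a reader how much data was set aside.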