12. Factors – recoding variables
Factors in R represent categorical data and are important in plots, data summaries and statistics. Before version 4.0.0, R imported text data as factors by default; from 4.0.0 onward, text columns are read as character strings unless you ask otherwise. Often, however, the data we have uses numbers to represent different categories. For example, gender is currently coded 1 or 2 to indicate Male and Female, respectively. To check this, try the unique command on the gender variable:
unique(data$gender)
Factors have levels (names), and the levels have an order, so it is easy to accidentally mix them up. The safest way I have found to recode a variable is to:
(i) create a new variable to play the role of the factor (e.g. genderf)
(ii) explicitly assign the text label that belongs to each numeric code
data$genderf[data$gender == "1"] <- "Male"
data$genderf[data$gender == "2"] <- "Female"
(iii) make this new variable a factor
data$genderf <- as.factor(data$genderf)
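Since as.factor sorts text levels alphabetically, it is worth looking at the level order after step (iii). The sketch below runs the three steps on a small made-up data frame (toy is hypothetical and only stands in for your own data object) and then, as an optional extra that is not part of the steps above, uses factor() with an explicit levels argument to set the order by hand.

```r
# Hypothetical toy data standing in for the real 'data' object
toy <- data.frame(gender = c(1, 2, 2, 1, 2, 1))

# Steps (i) to (iii) from the text, applied to the toy data
toy$genderf[toy$gender == "1"] <- "Male"
toy$genderf[toy$gender == "2"] <- "Female"
toy$genderf <- as.factor(toy$genderf)

# as.factor() orders the levels alphabetically
levels(toy$genderf)   # "Female" "Male"

# Optional: set the level order explicitly so it follows the 1/2 coding
toy$genderf <- factor(toy$genderf, levels = c("Male", "Female"))
levels(toy$genderf)   # "Male" "Female"
```

The level order matters mainly for how categories are arranged in plots and tables, and for which level is treated as the reference in statistical models.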
I also like to check that the recoding worked as intended. I create a new object (Factor_summary) holding a table that cross-tabulates the original numerical coding against the new text coding.
In this table, the only non-zero counts for the levels “Male” and “Female” should correspond to the original 1 and 2 identifiers, respectively.
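A minimal sketch of this check, again on a made-up data frame (toy is hypothetical; use your own data object in practice). The useNA argument is my addition: it adds a row or column for missing values, so any code that was left unrecoded shows up in the table.

```r
# Hypothetical toy data; in practice use the real 'data' object
toy <- data.frame(gender = c(1, 2, 2, 1, 2, 1))
toy$genderf[toy$gender == "1"] <- "Male"
toy$genderf[toy$gender == "2"] <- "Female"
toy$genderf <- as.factor(toy$genderf)

# Cross-tabulate the original numeric codes (rows) against the new labels (columns)
Factor_summary <- table(original = toy$gender,
                        recoded  = toy$genderf,
                        useNA    = "ifany")
Factor_summary
```

If the recoding is correct, each row has a single non-zero count: all the 1s fall under Male and all the 2s under Female.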
Exercise: Assign text-based factors to the remaining categorical variables.