12. Factors – recoding variables

Factors in R represent categorical data and are important in plots, data summaries and statistics.  Versions of R before 4.0.0 imported text data as factors by default; from R 4.0.0 onwards text is imported as character strings unless you set stringsAsFactors = TRUE.  However, the data we have often uses numbers to represent the different categories.  For example, gender is currently coded 1 or 2 to indicate Male and Female, respectively.  To check this, try the unique command on the gender variable:

unique(data$gender)
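To see how import settings and numeric codes interact, here is a minimal sketch. The CSV text and the names csv_text, toy_chr, toy_fct and site are made up for illustration; they are not part of the course data.

```r
# Hypothetical CSV contents standing in for the real data file
csv_text <- "id,gender,site
1,1,North
2,2,South
3,2,North
4,1,South"

# Default import (R >= 4.0.0): text columns stay as character
toy_chr <- read.csv(text = csv_text)
class(toy_chr$site)

# Ask for text columns to be converted to factors on import
toy_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(toy_fct$site)
levels(toy_fct$site)

# Numeric codes are imported as numbers either way, never as factors
class(toy_fct$gender)
unique(toy_fct$gender)
```

Either way, a numerically coded column such as gender still has to be recoded by hand.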

Factors have levels (names) and levels have an order.  It’s easy to accidentally mix them up.  The safest way I find to recode a numeric variable as a factor is to

(i) create a new variable to play the role of the factor (e.g. genderf)

(ii) explicitly assign the text descriptor for each numeric code

data$genderf[data$gender == "1"] <- "Male"

data$genderf[data$gender == "2"] <- "Female"

(iii) make this new variable a factor

data$genderf <- as.factor(data$genderf)
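The three steps above can be sketched on a small made-up data frame (toy is a hypothetical stand-in for data). The sketch also shows that as.factor orders the levels alphabetically, and, as one possible alternative rather than the method used in this section, that factor() with explicit levels and labels lets you fix the order yourself.

```r
# Made-up data standing in for the real data frame
toy <- data.frame(gender = c(1, 2, 2, 1, 2))

# Steps (i) and (ii): new character variable with explicit labels
toy$genderf <- NA_character_
toy$genderf[toy$gender == 1] <- "Male"
toy$genderf[toy$gender == 2] <- "Female"

# Step (iii): convert the character variable to a factor
toy$genderf <- as.factor(toy$genderf)
levels(toy$genderf)   # alphabetical order: "Female" "Male"

# Alternative: declare codes, labels and level order in one call
toy$genderf2 <- factor(toy$gender,
                       levels = c(1, 2),
                       labels = c("Male", "Female"))
levels(toy$genderf2)  # order as declared: "Male" "Female"
```

The level order matters later because it determines, for example, the order of groups in plots and which level is treated as the reference in a model.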

I also like to check that the recoding worked as intended. I simply create a new object (Factor_summary) holding a table that cross-tabulates the original numerical coding against the new text coding.
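A minimal sketch of such a check, using a made-up data frame (toy is hypothetical; with the real data the call would presumably be table(data$gender, data$genderf)):

```r
# Made-up data, recoded with the three steps described earlier
toy <- data.frame(gender = c(1, 2, 2, 1, 2))
toy$genderf <- NA_character_
toy$genderf[toy$gender == 1] <- "Male"
toy$genderf[toy$gender == 2] <- "Female"
toy$genderf <- as.factor(toy$genderf)

# Rows: original numeric code; columns: new text label.
# useNA = "ifany" adds an NA row/column if any value was left unlabelled.
Factor_summary <- table(toy$gender, toy$genderf, useNA = "ifany")
Factor_summary
```

Any non-zero count in an unexpected cell (for example code 1 under "Female"), or an NA column, would indicate a recoding mistake.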

 

In the resulting table, the only non-zero counts for the levels “Male” and “Female” should fall in the rows for the original identifiers 1 and 2, respectively.

Exercise:

Assign text-based factors to the remaining categorical variables.