Introduction to Stata
8. Basic variable recoding
In this intro, we’re not going to cover much data manipulation, such as checking your dataset for duplicate cases and deleting them. You may, however, want to analyse only a subset of the study sample. The drop and keep commands can be used to drop or keep certain observations. Remember that dropped observations are removed from the dataset in memory, and if you then save over the original file, you can’t get them back! Therefore, when saving the revised dataset, use a different file name.
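As a minimal sketch of the steps mentioned above, the following lines check for duplicates, remove exact duplicates, and save the result under a new name. The file name mydata_clean is only a placeholder, and duplicates drop without a variable list removes only observations that are identical on every variable.
. * count observations that are exact copies of another observation
. duplicates report
. * keep one copy of each duplicated observation
. duplicates drop
. * save under a new (hypothetical) name so the original file is untouched
. save mydata_clean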
There are two key commands for analysing subsamples: drop and keep.
If you want to analyse only the subset of the dataset that includes countries with a life expectancy of 65 years or more, type either of the following (they give the same result, so use only one):
. drop if lexp < 65
. keep if lexp >= 65
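Before dropping anything, it can help to see how many observations are affected. Keep in mind that Stata treats missing values as larger than any number, so both commands above keep countries where lexp is missing. A hedged sketch, assuming you want to exclude missing values as well:
. * how many observations have life expectancy below 65?
. count if lexp < 65
. * how many have no life expectancy recorded?
. count if missing(lexp)
. * keep only countries with a recorded life expectancy of 65 or more
. keep if lexp >= 65 & !missing(lexp)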
Variable recoding is something that everyone will need to do at some point. It is usually recommended to create a new variable rather than overwrite the existing one. You may also want to transform your variables in other ways, for example by taking logarithms. After creating new variables, it’s good practice to label the variables and their values, so that you won’t forget how the recoding was done.
In order to create a new variable called loggnppc that contains the natural logarithm of the variable gnppc, type:
. generate loggnppc = log(gnppc)
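A quick check of the new variable might look like the sketch below. The label text is only a suggestion. Note that log() returns a missing value wherever gnppc is missing, zero, or negative.
. * attach a description to the new variable
. label variable loggnppc "Natural log of gnppc"
. * compare the original and transformed variables
. summarize gnppc loggnppc
. * count observations where the log could not be computed
. count if missing(loggnppc)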
In order to recode the variable region so that the two American regions (codes 2 and 3) are combined into one category while Europe/Asia keeps code 1, and to store the result in a new variable called region2, type:
. recode region (1=1) (2/3=2), generate(region2)
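To confirm that the recode did what you intended, one option is to cross-tabulate the old and new variables. Every original code should fall into exactly one new category.
. * rows show the original codes, columns the recoded ones
. tabulate region region2, missing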
In order to label the values of variable region2, type:
. label define region2_label 1 "Europe/Asia" 2 "America"
. label values region2 region2_label
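As a sketch of how to check and complete the labelling, you could list the value label, label the variable itself, and tabulate it. The variable label text here is an assumption chosen for illustration.
. * show the codes and text stored in the value label
. label list region2_label
. * describe the variable itself
. label variable region2 "Region (Americas combined)"
. * the table should now show the label text instead of 1 and 2
. tabulate region2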
Task: Variable recoding
Try using some of the commands introduced in this section.