Introduction to Stata

8. Basic variable recoding

In this intro, we’re not going to cover a lot of data manipulation, such as checking your dataset for duplicate cases and deleting the duplicates. You may, however, want to analyse only a subset of the study sample. The drop and keep commands can be used to drop or keep certain observations. Remember that dropped observations are removed from the dataset in memory, and if you then save over the original file, you can’t get them back! Therefore, when saving the revised dataset, use a different file name.

There are two key commands for analysing subsamples: drop and keep.

If you want to analyse only the subset of the dataset that includes countries with a life expectancy of 65 years or more, type either of the following commands (they have the same effect):

. drop if lexp <65

. keep if lexp >=65
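
Before dropping anything, it can help to check how many observations will be affected and then to save the result under a new name. The lines below are a minimal sketch, assuming the dataset from the earlier sections is still in memory; the file name lifeexp_65plus is only an illustrative choice. Note that Stata treats a missing numeric value as larger than any number, so the condition lexp >= 65 is also true for countries with missing life expectancy. The extra condition !missing(lexp) excludes them.

. * how many countries have a life expectancy below 65?

. count if lexp < 65

. * how many countries have no life expectancy recorded?

. count if missing(lexp)

. * keep only countries with a recorded life expectancy of 65 or more

. keep if lexp >= 65 & !missing(lexp)

. * save under a new (hypothetical) file name so the original file stays intact

. save lifeexp_65plus, replace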

Variable recoding is something that everyone will need to do at some point. It is usually recommended to create a new variable rather than overwrite the existing one. You may also want to modify your variables in other ways, such as applying a logarithmic transformation. After creating new variables, it’s good practice to label the variables and their values, so that you won’t forget how the recoding was done.

In order to create a new variable called loggnppc, which is the natural logarithm of the values of variable gnppc, type:

. generate loggnppc = log(gnppc)
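
Once loggnppc exists, you may want to label it and check the result. The following is a small sketch rather than a required step; the label text is an assumption about what gnppc measures, so adjust it to your own data. The last command counts cases where the logarithm is missing although gnppc is not, which happens when gnppc is zero or negative.

. * attach a descriptive label (the wording is an assumption)

. label variable loggnppc "Log of GNP per capita"

. * compare the original and the transformed variable

. summarize gnppc loggnppc

. * the log of zero or a negative number is missing in Stata

. count if missing(loggnppc) & !missing(gnppc)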

In order to recode the variable region so that categories 2 and 3 (North and South America) are combined into a single category, while category 1 (Europe and Asia) stays unchanged, and to generate a new variable called region2, type:

. recode region (1=1) (2/3=2), generate(region2)

In order to label the values of variable region2, type:

. label define region2_label 1 "Europe/Asia" 2 "America"

. label values region2 region2_label
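
To confirm that the recoding did what you intended, you can cross-tabulate the old and the new variable. This is a minimal check, assuming region and region2 both exist as created above; the variable label text is only a suggestion. Each original category should fall into exactly one category of region2.

. * rows show the original categories, columns the recoded ones

. tabulate region region2, missing

. * optionally describe the new variable (the wording is an assumption)

. label variable region2 "Region, Americas combined"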

Task: Variable recoding

Try using some of the commands introduced in this section.