7. Basic data description

Before you can start doing data analyses, you need to familiarise yourself with the data. I like to call this ‘getting to know your data’. Several commands help you to understand the characteristics of your data. Some of them provide similar information, and it’s up to you which ones you prefer to use. Whichever you choose, before starting any analysis always check the frequencies of all variables, how categorical variables have been coded, the minimum and maximum values, and the number of missing observations. This is a good way to identify outliers and potential mistakes in the dataset.
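The checks above can be sketched as follows. This is a minimal example that assumes the data in memory is Stata's bundled example dataset lifeexp.dta, loaded with sysuse. The text does not name its dataset, but lifeexp contains the variables region, country, lexp and gnppc used later in this section.

```stata
* Assumption: Stata's example dataset lifeexp.dta stands in for your own data
sysuse lifeexp, clear

* Frequencies of a categorical variable, including missing values
tabulate region, missing

* How the categorical variable is coded: value label name and numeric codes
codebook region
label list

* Number of non-missing observations, mean, SD, minimum and maximum
summarize

* Number of missing values in each numeric variable that has any
misstable summarize
```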


There are several commands for describing your data:

Table 4


Task: Get to know your dataset

Try typing some of the data description commands:

. count

. describe

. summarize

. codebook

. list
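Most of these commands also accept a list of variables, and list can be restricted to a few observations so that it does not print the whole dataset. A short sketch, again assuming Stata's example dataset lifeexp.dta as the data in memory:

```stata
* Assumption: lifeexp.dta is used as example data
sysuse lifeexp, clear

* Restrict the output to selected variables
describe lexp gnppc

* detail adds percentiles, skewness and kurtosis
summarize lexp gnppc, detail

* One summary line per variable instead of the full codebook entry
codebook region, compact

* Show only the first five observations
list country lexp gnppc in 1/5
```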


If you want to check the frequencies for a certain variable, type e.g.:

. tab region


If you also want to include missing values in the table (by default, tab leaves them out), type e.g.:

. tab region, missing
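To see the numeric codes stored behind the value labels of a categorical variable, you can add the nolabel option; the sort option orders the rows by frequency. A minimal sketch, assuming Stata's example dataset lifeexp.dta as the data in memory:

```stata
* Assumption: lifeexp.dta is used as example data
sysuse lifeexp, clear

* Show numeric codes instead of value labels, including missing values
tabulate region, missing nolabel

* Order the categories from most to least frequent
tabulate region, missing sort
```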


You can combine the data description commands with qualifiers and operators. For example, the if qualifier selects the observations to which a command applies. The operators used in Stata are listed in the table below:

Operators in logical expressions:

Logical operators
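As an illustration of how these operators combine in an if qualifier, here is a short sketch that assumes Stata's example dataset lifeexp.dta as the data in memory. It uses == for equality (note the double equals sign), != for "not equal", & for "and", | for "or" and ! for "not".

```stata
* Assumption: lifeexp.dta is used as example data
sysuse lifeexp, clear

* == tests equality: observations whose region code is 1
count if region == 1

* != means "not equal"; & means "and"
count if region != 1 & lexp >= 70

* | means "or"
list country lexp gnppc if lexp < 60 | gnppc < 1000

* ! means "not": summarise lexp only where gnppc is not missing
summarize lexp if !missing(gnppc)
```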


If you want to list the countries with missing information for the variable gnppc (GNP per capita), type e.g.:

. list country gnppc if gnppc==.
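The expression gnppc==. matches only the system missing value (.). If a dataset also uses extended missing values (.a to .z), for example to record different reasons for missingness, the missing() function catches all of them. A minimal sketch, assuming Stata's example dataset lifeexp.dta as the data in memory:

```stata
* Assumption: lifeexp.dta is used as example data
sysuse lifeexp, clear

* missing() is true for ., .a, .b, ..., .z
list country gnppc if missing(gnppc)

* Number of countries without GNP per capita
count if missing(gnppc)
```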


If you want to list countries with life expectancy less than 65 years, type e.g.:

. list country lexp if lexp < 65
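Stata treats missing values as larger than any number. A "less than" condition such as the one above therefore leaves out countries with missing life expectancy, but a "greater than" condition such as lexp > 75 would include them. A sketch of excluding them explicitly, assuming Stata's example dataset lifeexp.dta as the data in memory:

```stata
* Assumption: lifeexp.dta is used as example data
sysuse lifeexp, clear

* Without the second condition, countries with missing lexp would also be listed
list country lexp if lexp > 75 & !missing(lexp)

* Equivalent: . is the smallest missing value, so "< ." excludes ., .a, ..., .z
list country lexp if lexp > 75 & lexp < .
```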