Descriptive statistics

Non-parametrics

Introduction

In this module, Stata will mainly be used to demonstrate the various procedures. However, nearly all of the main statistical packages, such as SPSS and SAS, can undertake the same analyses.

Many statistical tests assume that the variable of interest, or dependent variable, has a normal (bell-shaped) distribution.  How can we tell if the variable is normally distributed?  The data below are the SF-36 physical health scores for 205 military recruits. The SF-36 scales range from 0 (poorest quality of life) to 1 (best quality of life).

  

The distribution clearly looks skewed, with a long left-hand tail.  This is quite typical of quality of life measures, where the majority of people have a good quality of life.
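Summary statistics give a first numerical impression of the shape. The lines below are a minimal sketch, assuming the dataset is already in memory and that the score is stored in phy_b, the variable name used in the commands later in this section.

```stata
* Assumes the dataset containing phy_b is already loaded
* Compact table of sample size, location and spread
tabstat phy_b, statistics(n mean median sd min max)
```

With a long left-hand tail, the mean would typically be expected to fall below the median.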

We can have a quick look at the shape of the distribution by creating a histogram. We do this in Stata using the histogram procedure (hist is its abbreviation).

hist phy_b

This quite clearly shows the long left-hand tail.

 

We can also get a smoother view of the distribution using the Stata kdensity command, which plots a kernel density estimate.

kdensity phy_b

The long left-hand tail is even clearer.
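Both views can be compared with a normal curve on the same graph. This is a sketch rather than part of the original analysis; it assumes phy_b is in memory and relies on the normal and kdensity options of these commands, which overlay a normal density based on the sample mean and standard deviation.

```stata
* Histogram with a kernel density estimate and a normal curve overlaid
histogram phy_b, kdensity normal

* Kernel density estimate alone, with the normal density for comparison
kdensity phy_b, normal
```

If the variable were approximately normal, the bars and the kernel density curve would follow the normal curve closely.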
 

A final quick check on the shape of the distribution is to draw a box plot. We use the graph box procedure in Stata to do this.

graph box phy_b

In the box plot, the line inside the shaded box is the median. For a symmetric distribution such as the normal, it should lie in the centre of the box.  The two horizontal lines at the edges of the box represent the 25th and 75th percentiles. The two extreme horizontal lines are called the lower and upper adjacent values.  They are the most extreme observations that lie within 1.5 IQR (interquartile ranges) of the 25th and 75th percentiles. Finally, the dots represent outliers, that is, observations beyond the adjacent values.
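The quantities drawn in the box plot can be reproduced by hand, which makes the 1.5 IQR rule concrete. This is a sketch under stated assumptions: phy_b is in memory, the local macro names are illustrative, and the lines are run together as one block so that the macros stay defined. Percentile definitions can differ slightly between commands, so small discrepancies from the graph are possible.

```stata
* Assumes phy_b is in memory; the local macro names are illustrative
quietly summarize phy_b, detail
local q1  = r(p25)
local q3  = r(p75)
local iqr = `q3' - `q1'

* Fences lie 1.5 interquartile ranges beyond the quartiles
local lower_fence = `q1' - 1.5 * `iqr'
local upper_fence = `q3' + 1.5 * `iqr'

* Adjacent values: the most extreme observations inside the fences
quietly summarize phy_b if phy_b >= `lower_fence' & phy_b <= `upper_fence'
display "Lower adjacent value: " r(min)
display "Upper adjacent value: " r(max)

* Observations outside the fences are the ones drawn as dots
count if (phy_b < `lower_fence' | phy_b > `upper_fence') & !missing(phy_b)
```

The final count is the number of points that would be expected to appear as dots in the box plot.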

Test of symmetry

The normal distribution is symmetrical about the mean, so the first thing we can do is check to see if our distribution is symmetrical. This is easily done in Stata using the symplot procedure.

symplot phy_b

A symmetry plot graphs the distance above the median for the i-th value against the distance below the median for the i-th value. A variable that is symmetric would have points that lie on the diagonal line.  Clearly, this is not the case for our data.

Normal quantile plot

A normal quantile (Q-Q) plot graphs the quantiles of a variable against the quantiles of a normal distribution. We ask for it using the Stata qnorm procedure.

qnorm phy_b  

qnorm is sensitive to non-normality near the tails, and we see considerable deviations from the diagonal line in the tails.

 

Standardised normal probability (P-P) plot

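In a standardised normal probability plot, the cumulative probability that a normal distribution (with the sample mean and standard deviation) assigns to each sorted observation is plotted against that observation's empirical cumulative proportion. The points lie close to a straight diagonal line if the data are approximately normally distributed. We use the Stata pnorm procedure to obtain this plot.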

pnorm phy_b

        

The standardised normal probability plot is sensitive to deviations from normality nearer to the centre of the distribution. We again see some departure from the diagonal line near the centre.
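Because qnorm emphasises the tails and pnorm emphasises the centre, it can help to view the two plots side by side. The following is a sketch, assuming phy_b is in memory; the graph names qq_plot and pp_plot are illustrative.

```stata
* Store each plot in memory under an illustrative name
qnorm phy_b, name(qq_plot, replace) title("qnorm")
pnorm phy_b, name(pp_plot, replace) title("pnorm")

* Display the two stored graphs next to each other
graph combine qq_plot pp_plot
```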

Statistical test of departure from normality

There are several tests available to formally test for departure from normality, including the Shapiro-Wilk test, the Shapiro-Francia test, and the Kolmogorov-Smirnov test. The Shapiro-Wilk test is suitable for sample sizes ranging from 4 to 2,000. It is undertaken in Stata using the swilk command. The null hypothesis is that the variable is normally distributed.

swilk phy_b   

       

The very small probability leads us to reject the null hypothesis of normality.  However, the test is sensitive to sample size and is highly likely to be statistically significant for large samples, even when the departure from normality is small. Hence, it should be used alongside the plots described above.
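The other tests mentioned above can be requested in a similar way. This is a sketch, assuming phy_b is in memory. The Kolmogorov-Smirnov lines compare the data with a normal distribution whose mean and standard deviation are estimated from the sample, which tends to make that test conservative, so its p-value should be read with caution.

```stata
* Shapiro-Wilk test, then the stored test statistic and p-value
swilk phy_b
display "W = " r(W) "   p = " r(p)

* Shapiro-Francia test
sfrancia phy_b

* One-sample Kolmogorov-Smirnov test against a normal distribution
* whose mean and standard deviation are estimated from the sample
quietly summarize phy_b
ksmirnov phy_b = normal((phy_b - r(mean)) / r(sd))
```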

Skewness and kurtosis

Even better than an overall test of normality, such as the Shapiro-Wilk test, we can look at skewness and kurtosis separately. Skewness is a measure of asymmetry, and kurtosis is a measure of how flat or highly peaked the distribution is.  We can test for skewness and kurtosis using the Stata sktest procedure.

sktest phy_b 

          

sktest first checks for skewness, then kurtosis, and finally provides a joint test.  From the above, both skewness and kurtosis are acceptable, individually and jointly.
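The skewness and kurtosis statistics themselves can be inspected alongside the test. A minimal sketch, assuming phy_b is in memory: summarize with the detail option stores both values, and for a normal distribution the skewness is 0 and the kurtosis, as Stata reports it, is 3.

```stata
* Stored skewness and kurtosis from the detailed summary
quietly summarize phy_b, detail
display "Skewness: " r(skewness)
display "Kurtosis: " r(kurtosis)
```

A negative skewness value indicates a longer left-hand tail, and a kurtosis above 3 indicates heavier tails than the normal distribution.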