13. Summarising data

To get a quick summary of the data you can use the summary(<dataframe>) function, or for a single variable summary(<dataframe$variable>). For example, create a new object and obtain a basic summary of the overall dataset:

Summary_stats_basic <- summary(data)

Summary_stats_basic

Or just summarise the BMI data:

Summary_stats_bmi_basic <- summary(data$bmi)

Summary_stats_bmi_basic
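
If you want to see what summary() returns without loading the workshop data, the following is a minimal sketch using a made-up vector (toy_bmi and its values are invented for illustration and are not from the dataset). For a numeric variable the result is a named set of six statistics, and you can pull out a single one by name:

```r
# Made-up BMI values, purely for illustration (not the workshop data)
toy_bmi <- c(22.1, 27.4, 31.0, 24.8, 29.3)

# summary() on a numeric vector gives min, quartiles, median, mean and max
toy_summary_bmi <- summary(toy_bmi)
toy_summary_bmi

# The statistics are named, so one value can be extracted by its name
names(toy_summary_bmi)
toy_summary_bmi[["Median"]]
```

The same approach should work on Summary_stats_bmi_basic, assuming bmi is stored as a numeric variable.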

This is limited in that the summary can’t easily be separated (faceted) by factors; for example, you can’t readily split the summary of BMI by gender.  There is a very useful add-on package for R called “doBy”, which can be easily downloaded and installed by going to the Packages menu -> Install Packages, scrolling down to “doBy”, and selecting install.  If R asks for a “CRAN mirror” (i.e. the site you wish to download a package from), my recommendation is one from inside Australia, although they are all approved by the R Project Group.

Once installed, you need to load the package you wish to use.  This is achieved by entering the command library(<package name>); in this case:

library(doBy)

Sometimes a package will depend upon another package (R will warn you of this), which means you need to install that package as well.  If this is the case, simply repeat the process to install all required packages.  The good news is that once you have installed a package you won’t need to install it again in the future, although you do still need to load it with library() at the start of each new R session.
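
The menu steps can also be done from the console with install.packages(), which by default installs the packages the requested one depends on as well. The sketch below wraps this in a small helper so that the download only happens when the package is missing. Note that install_if_missing is a hypothetical name I have made up (it is not part of R or doBy), and the default mirror is an assumption: https://cloud.r-project.org is the CRAN address that redirects to a nearby server, and you can substitute an Australian mirror if you prefer.

```r
# Hypothetical helper (not part of R or doBy): install a package only if
# it is not already available on this computer.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns TRUE if the package is installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # install.packages() also installs required dependencies by default
    install.packages(pkg, repos = repos)
  }
  # Report (invisibly) whether the package is now available
  invisible(requireNamespace(pkg, quietly = TRUE))
}
```

You would then run install_if_missing("doBy") once, followed by library(doBy) in each session where you want to use the package.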

Using doBy, you can “facet” the statistics for a specific variable by different factors (given on the right-hand side of the formula in the first argument) and specify which statistics you would like in the summary (given in the FUN=c() argument).  For example, create an object “Summary_stats_boots” that summarises the comfort by boot type and gender:

Summary_stats_boots <- summaryBy(comfortf~bootsf+genderf, FUN=c(length, median, mean, sd, min, max),data=data)

Summary_stats_boots
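
If doBy is not available on your computer, a similar grouped table can be sketched with aggregate() from base R. This is a different function from summaryBy(), shown only as a cross-check, and the layout of its output is not identical. The data frame toy and its values are made up for illustration; only the column names bmi and genderf follow the naming used in this guide.

```r
# Made-up data standing in for the workshop dataset
toy <- data.frame(
  bmi     = c(22.1, 27.4, 31.0, 24.8, 29.3, 20.5),
  genderf = factor(c("Female", "Male", "Male", "Female", "Male", "Female"))
)

# Count, mean and standard deviation of BMI within each gender.
# The formula reads "bmi, split by genderf", as in summaryBy().
bmi_by_gender <- aggregate(
  bmi ~ genderf,
  data = toy,
  FUN  = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)

bmi_by_gender
```

More factors can be added on the right-hand side of the formula with +, in the same way as bootsf+genderf above.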

 

Note that I used the factors for boots (bootsf) and gender (genderf) – this way I get the understandable name, rather than the original number format. You would probably like to save this result for use later, for example in a paper you are writing.  This is achieved the same way as you saved the data before, except this time the dataframe is Summary_stats_boots.

write.csv(Summary_stats_boots, file="Summary_stats_boots.csv", na = "NA", row.names=FALSE)

You can then open the file you created and copy/paste the information you want.
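
You can also check the saved file from within R by reading it back with read.csv(). The following is a minimal sketch using a made-up table written to a temporary folder, so it does not touch your working directory; toy_summary, its column names and its values are invented for illustration and are not real results.

```r
# Made-up summary table standing in for Summary_stats_boots
toy_summary <- data.frame(
  bootsf       = c("Leather", "Synthetic"),
  comfort.mean = c(6.5, 7.2),
  comfort.sd   = c(1.1, 0.9)
)

# Write to a temporary location rather than the working directory
csv_path <- file.path(tempdir(), "toy_summary.csv")
write.csv(toy_summary, file = csv_path, na = "NA", row.names = FALSE)

# Read the file back in and look at it to confirm it saved as expected
check <- read.csv(csv_path)
check
```

For your own results, read.csv("Summary_stats_boots.csv") should do the same job, provided the file is in your current working directory.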

 

Exercise:

Summarise BMI by gender and smoking status.  Save it as a csv file for later use.