5. Part III: Managing data in SPSS

5.4. Sorting and merging data

 The sort syntax is quite simple, and you can write this yourself.  If I want to sort cases by sex, in ascending order (although for this type of variable it probably doesn’t matter whether ascending or descending), I can simply type into my syntax file the following:

           sort cases by sex(A).

The command to sort your data is sort cases.  The “A” in brackets at the end of the statement indicates we want the data sorted in ascending order – you can write “D” if you want descending.

Let’s say now, we have an additional variable – year – that we have in a separate dataset.  This second dataset has the same participants as our HLTH1025_2016 dataset, and they are identified by the variable IDnumber.  But we have several years’ worth of data from this course, and when we put it all together, we want to be able to identify which students belong to which cohort (year of study).

The students we have information about belong to the second year of the study.  We need to merge this variable in to our main dataset.

This is a relatively simple merge procedure, but demonstrates the key steps involved.

 

task   Task: Sort and merge data

1. You already have open the main dataset (HLTH1025_2016).  You now need to open the second dataset, called HLTH1025_2016_yr.  Do this using the get file syntax:

get file=‘HLTH1025_2016_yr’.

We also want to name this second dataset – it will help us to tell SPSS exactly which dataset we want to do things with … we’ll see in a minute when we sort.  So use the dataset name syntax to name your new dataset:

dataset name HLTH1025_2016_yr window=front.

2. Now we need to sort cases by IDnumber in both datasets.  Here’s where it’s helpful to have datasets named, since we can ‘activate’ a dataset to run a procedure, then activate another dataset to run the same or different procedure.  It’s easy to activate a particular dataset if you’ve named it.  Use the following syntax:

dataset activate  HLTH1025_2016.

sort cases by IDnumber(A).

dataset activate  HLTH1025_2016_yr.

sort cases by IDnumber(A).

3. Now that you’ve written the syntax to open, name and sort your data, you will need to select those lines of code and Run Selection (big green arrow button).

4. The final step is to merge the year variable into your main dataset.  Here we are going to use the menus to help us, and then paste the syntax and run it from our syntax file.

 a. First, activate your main dataset (HLTH1025_2016):

dataset activate HLTH1025_2016.

b. Click Data > Merge Files > Add Variables …

39.

 

c. Select the dataset HLTH1025_2016_yr to merge with your active dataset, and click Continue:

40

 d. A new dialogue box will appear, like this:

41

e. Tick the “Match cases on key variables” check box, and then tick the “Cases are sorted in order of key variables in both datasets” check box.

f. Click the IDnumber variable in the “Excluded Variables:” box to the top left, and move it using the arrow button into the “Key Variables:” box.

This tells SPSS to match cases in both datasets by the IDnumber variable, and we confirm that the data in both datasets are sorted by IDnumber (the variable we want to match on).

g. The dialogue box should now look something like this:

42

 

h. Click Paste to paste the merge syntax to your syntax file. 

When you click Paste (or OK) a warning message will come up to say that if your data are not sorted in ascending order of the Key Variable(s) then the procedure will fail.  Luckily we already sorted the data as required, so we click OK.

i. Your syntax should look something like this:

43

 

j. Select the syntax and Run Selection.

After you run a procedure like merging, it’s best to check a few things.  First, I always check my output to see whether there were any error messages.  Hopefully you don’t receive any error messages, and your output record looks something like this:

44

 

The next thing I normally check is the Data Editor window in Variable View, to see whether my new variable is now at the bottom of my list of variables, and check whether it looks like it merged correctly. 

Looks good to me:

45

 

Then, I’d just double check using Data View that the values have actually merged for that variable, and that they look right.  I know that all records should have a value of “2” for the variable year. 

Again, it looks good:

 

46

 

You should now feel relatively comfortable with the sorting and merging procedures in SPSS.