
ABC, 123, Ruby, C#, SAS, SQL, TDD, VB.NET, XYZ

Saturday, December 1, 2007

Futzing with FUTS - Part III

Let's dig a little deeper into using FUTS (Framework for Unit Testing SAS® programs).

We have 10,000 hospitalization records in a CSV file, hospitalizations.csv. We want to do some basic analyses on this file: average age at admission, average length of stay, and number of patient-days per doctor.

As we learned in Parts I and II of this series, testable SAS code is written as macros, so I've written three relatively simple SAS macros to do the three calculations. (I'm using Billy Kreuter's age calculation code.) The file containing the three macros is c:\hosp\macros.sas.
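%macro Age(agevar,bdatevar,indexdt);
&agevar = floor((intck('month',&bdatevar,&indexdt) - (day(&indexdt) < day(&bdatevar))) / 12);
%mend Age;

%macro LOS(losvar,admitdtvar,dischargedtvar);
&losvar = &dischargedtvar - &admitdtvar; *LOS in days (assumed: discharge date minus admission date);
%mend LOS;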
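Kreuter's expression counts the month boundaries between the two dates with INTCK, subtracts one month if the index date's day of the month comes before the birth day, and floors the result divided by 12. A minimal illustrative DATA step (the dataset name age_demo and the dates are made up) applies the same expression directly, without the macro, to the day before a birthday and to the birthday itself:

```sas
/* Illustrative only: the Kreuter age expression on two index dates */
data age_demo;
  bdate = '15MAR1950'd;
  do indexdt = '14MAR2007'd, '15MAR2007'd;
    /* INTCK counts month boundaries crossed (684 for both dates here);  */
    /* subtract 1 when the index day-of-month is before the birth day    */
    months = intck('month', bdate, indexdt) - (day(indexdt) < day(bdate));
    age = floor(months / 12);
    output;
  end;
  format bdate indexdt date9.;
run;

proc print data=age_demo;
run;
```

The first observation gets AGE = 56 (683 / 12 floors to 56) and the second gets 57 (684 / 12), so the age only ticks over on the birthday itself.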

The Age and LOS macros calculate new variables and are to be called inside a data step, whereas the DocPatDays macro creates a new summary dataset, one record per DoctorID.
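A summary macro with DocPatDays's call signature could be built on PROC SUMMARY. The following is only a minimal sketch, not the macro from macros.sas: the name DocPatDays_sketch is hypothetical, and the parameter order (output dataset, input dataset, name of the summed variable, doctor ID variable, LOS variable) is assumed from how DocPatDays is called later in this post.

```sas
/* Hypothetical sketch: one record per doctor with the summed length of stay. */
/* Parameter order assumed from calls like %DocPatDays(out,in,sumvar,docvar,losvar). */
%macro DocPatDays_sketch(outds, inds, sumvar, docvar, losvar);
  proc summary data=&inds nway;
    class &docvar;              /* one output row per doctor, ascending */
    var &losvar;
    output out=&outds(drop=_type_ _freq_) sum=&sumvar;
  run;
%mend DocPatDays_sketch;

/* Tiny made-up input: doctor 100 has stays of 3 and 5 days, doctor 101 one stay of 4 */
data stays;
  input DoctorID LOS;
  datalines;
100 3
100 5
101 4
;
run;

%DocPatDays_sketch(patdays_demo, stays, PatDays, DoctorID, LOS);
```

This yields two records: DoctorID 100 with PatDays 8 and DoctorID 101 with PatDays 4. Dropping _TYPE_ and _FREQ_ is a design choice; the unit test later keeps only DOCTORID and PT_DAYS anyway.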

Before writing the production SAS program, let's test the macros with unit tests that use the FUTS macros. The test data for Age and LOS is in test_hosps.csv, which is basically a sample of the main hospitalizations file plus EXPECTED_AGE and EXPECTED_LOS columns calculated by hand.


The unit test code for Age looks like this. (We import the test_hosps.csv data, calculate AGE using the production macro, and then compare EXPECTED_AGE with the age calculated by the macro.)

options mprint;
%include 'c:\hosp\macros.sas';
proc import datafile='test_hosps.csv' out=TestData dbms=csv replace;
getnames=yes;
run;

data actual;
set TestData (drop=EXPECTED_AGE);
%Age(AGE,BDATE,ADMITDATE);
run;

data expected;
set TestData (rename=(EXPECTED_AGE=AGE));
run;

%assert_compare_equal(base=expected,compare=actual);
On the first run of the unit test program (utAge.sas) we get an error on the assert.
68         %assert_compare_equal(base=expected,compare=actual);
MPRINT(ASSERT_COMPARE_EQUAL): proc compare base=expected compare=actual;
MPRINT(ASSERT_COMPARE_EQUAL): ;
MPRINT(ASSERT_COMPARE_EQUAL): run;

NOTE: There were 10 observations read from the data set WORK.EXPECTED.
NOTE: There were 10 observations read from the data set WORK.ACTUAL.
NOTE: The PROCEDURE COMPARE printed page 1.
NOTE: PROCEDURE COMPARE used (Total process time):
real time 0.45 seconds
cpu time 0.03 seconds


MPRINT(GENERATE_EVENT): options linesize=max;
ERROR: Data set actual not equal to expected
Looking at the .lst file reveals what is going on: PROC IMPORT attached the BEST12. format and the BEST32. informat to the expected age variable, while the age variable calculated by the macro has neither. The key thing is that all compared variables are equal in all observations, but the mismatched attributes still produce that annoying error in the log.

The COMPARE Procedure
Comparison of WORK.EXPECTED with WORK.ACTUAL
(Method=EXACT)

Data Set Summary

Dataset         Created           Modified          NVar   NObs

WORK.EXPECTED   26NOV07:19:37:43  26NOV07:19:37:43     7     10
WORK.ACTUAL     26NOV07:19:37:43  26NOV07:19:37:43     7     10


Variables Summary

Number of Variables in Common: 7.
Number of Variables with Differing Attributes: 1.


Listing of Common Variables with Differing Attributes

Variable  Dataset        Type  Length  Format   Informat

AGE       WORK.EXPECTED  Num        8  BEST12.  BEST32.
          WORK.ACTUAL    Num        8


Observation Summary

Observation Base Compare

First Obs 1 1
Last Obs 10 10

Number of Observations in Common: 10.
Total Number of Observations Read from WORK.EXPECTED: 10.
Total Number of Observations Read from WORK.ACTUAL: 10.

Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 10.

NOTE: No unequal values were found. All values compared are exactly equal.
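PROC COMPARE reports what kinds of differences it found in the automatic macro variable SYSINFO, a bit mask in which, for example, 4 means a variable has a different informat, 8 a different format and 4096 an unequal value. With only the format and informat differing, as in the listing above, the code would be 4 + 8 = 12. How FUTS's %assert_compare_equal decides to raise its error isn't shown here, but an assertion that ignores the attribute-only bits (1 through 32) would accept this case. Here is a hypothetical sketch of such an assertion; the macro name assert_values_equal and the global macro variable values_equal are my own, not part of FUTS.

```sas
/* Hypothetical assertion: fail only on data differences, not on  */
/* dataset/variable attribute differences (SYSINFO bits 1 to 32). */
%macro assert_values_equal(base=, compare=);
  %global values_equal;
  %local rc;
  proc compare base=&base compare=&compare noprint;
  run;
  /* Save SYSINFO right away; the next step would overwrite it */
  %let rc = &sysinfo;
  /* 65472 = bits 64 to 32768: unmatched obs/vars/BY groups, */
  /* unequal values, conflicting types, fatal error          */
  %if %sysfunc(band(&rc, 65472)) = 0 %then %do;
    %let values_equal = 1;
    %put NOTE: &compare matches &base on all compared values (SYSINFO=&rc).;
  %end;
  %else %do;
    %let values_equal = 0;
    %put ERROR: &compare differs from &base (SYSINFO=&rc).;
  %end;
%mend assert_values_equal;
```

Swapping it in for %assert_compare_equal in the first version of utAge.sas would let that test pass as written. The trade-off is that a changed variable length, label or format would slip through silently, which is one reason to strip the irrelevant attributes in the test data instead and keep the stricter assertion.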

To take care of that bothersome error, which is caused only by a difference in irrelevant variable attributes, the utAge.sas unit test program gets updated to this.

options mprint;
%include 'c:\hosp\macros.sas';
proc import datafile='test_hosps.csv' out=TestData dbms=csv replace;
getnames=yes;
run;

data actual;
set TestData (drop=EXPECTED_AGE);
%Age(AGE,BDATE,ADMITDATE);
run;

data expected;
set TestData (rename=(EXPECTED_AGE=AGE));
format age; informat age; *Remove the format and informat added by PROC IMPORT;
run;

%assert_compare_equal(base=expected,compare=actual);
The error goes away! Our unit test for the Age macro passes! :)
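Removing attributes one variable at a time gets tedious when a test compares several imported columns. In a DATA step, the FORMAT and INFORMAT statements also accept the _ALL_ variable list, so a small helper can strip everything at once. A minimal sketch; the macro name strip_attrs is hypothetical:

```sas
/* Hypothetical helper: rewrite a dataset with no formats or informats */
%macro strip_attrs(ds);
  data &ds;
    set &ds;
    format _all_;    /* disassociate every format   */
    informat _all_;  /* disassociate every informat */
  run;
%mend strip_attrs;
```

Because ACTUAL also inherits whatever formats PROC IMPORT attached to the other columns, such a helper would need to be applied to both datasets (%strip_attrs(expected); %strip_attrs(actual);) before the assertion; otherwise the comparison would start reporting those columns instead.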

The unit test code for LOS looks very similar to the Age unit test code.
options mprint;
%include 'C:\hosp\macros.sas';
proc import datafile='test_hosps.csv' out=TestData dbms=csv replace;
getnames=yes;
run;

data actual;
set TestData (drop=EXPECTED_LOS);
%LOS(LOS,ADMITDATE,DISCHARGEDATE);
run;

data expected;
set TestData (rename=(EXPECTED_LOS=LOS));
format LOS; informat LOS; *Remove the format and informat added by PROC IMPORT;
run;

%assert_compare_equal(base=expected,compare=actual);
We need another test data file for unit testing DocPatDays. This file, test_patient_days.csv, holds hand-calculated doctor-level summary data (DOCTORID and EXPECTED_PT_DAYS) and supplies the expected values for running the DocPatDays macro on test_hosps.csv.

The utDocPatDays.sas unit test code looks like this.
options mprint;
%include 'C:\hosp\macros.sas';
proc import datafile='test_hosps.csv' out=InputData dbms=csv replace;
getnames=yes;
run;

data InputData;
set InputData;
%LOS(LOS,AdmitDate,DischargeDate);
run;

proc import datafile='test_patient_days.csv' out=Expected dbms=csv replace;
getnames=yes;
run;

data Expected;
set Expected (rename=(EXPECTED_PT_DAYS=PT_DAYS));
format PT_DAYS; informat PT_DAYS; *Remove the format and informat added by PROC IMPORT;
run;

%DocPatDays(actual,InputData,PT_DAYS,DOCTORID,LOS);

data actual;
set actual (keep=DOCTORID PT_DAYS);
format PT_DAYS; informat PT_DAYS;
run;

%assert_compare_equal(base=expected,compare=actual);
And finally, the production code looks like this.

options mprint;
%include 'C:\hosp\macros.sas';
proc import datafile='hospitalizations.csv' out=HospitalData dbms=csv replace;
getnames=yes;
run;

*Calculate age at hospital admission and length of stay;
data HospitalData2;
set HospitalData;
%Age(AgeAtAdmit,BDate,AdmitDate);
%LOS(LOS,AdmitDate,DischargeDate);
run;

proc means data=HospitalData2;
var AgeAtAdmit LOS;
run;

*Calculate patient-days per doctor;
%DocPatDays(PatDays,HospitalData2,PatDays,DoctorID,LOS);

proc means data=PatDays maxdec=2;
var PatDays;
run;

proc print data=PatDays;
var DoctorID PatDays;
run;
The output looks like this.


The MEANS Procedure

Variable         N          Mean       Std Dev       Minimum       Maximum

AgeAtAdmit   10000    46.2669000    23.1170955     6.0000000    86.0000000
LOS          10000     8.0142000     3.7280810     2.0000000    14.0000000


The MEANS Procedure

Analysis Variable : PatDays

  N      Mean   Std Dev   Minimum   Maximum

100    801.42     95.13    609.00   1049.00


Obs DOCTORID PatDays

1 100 744
2 101 732
3 102 840
4 103 1035
5 104 879
6 105 907
7 106 752
8 107 771
9 108 880
10 109 901
11 110 775
12 111 910
13 112 799
14 113 849
15 114 774
16 115 753
17 116 845
18 117 726
19 118 690
20 119 856
21 120 675
22 121 747
23 122 727
24 123 692
25 124 992
26 125 755
27 126 782
28 127 954
29 128 773
30 129 926
31 130 790
32 131 727
33 132 702
34 133 922
35 134 843
36 135 835
37 136 755
38 137 625
39 138 687
40 139 728
41 140 805
42 141 687
43 142 669
44 143 651
45 144 1010
46 145 909
47 146 950
48 147 805
49 148 920
50 149 679
51 150 777
52 151 756
53 152 837
54 153 798
55 154 832
56 155 893
57 156 684
58 157 754
59 158 808
60 159 788
61 160 793
62 161 816
63 162 752
64 163 1049
65 164 712
66 165 778
67 166 880
68 167 945
69 168 866
70 169 713
71 170 842
72 171 807
73 172 854
74 173 609
75 174 698
76 175 712
77 176 704
78 177 944
79 178 696
80 179 754
81 180 811
82 181 803
83 182 771
84 183 798
85 184 766
86 185 872
87 186 623
88 187 755
89 188 759
90 189 795
91 190 788
92 191 706
93 192 959
94 193 785
95 194 823
96 195 1027
97 196 903
98 197 751
99 198 765
100 199 891
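As a quick consistency check on this output, both summaries account for the same total number of patient-days: 10,000 stays times the mean LOS from the first PROC MEANS, and 100 doctors times the mean PatDays from the second.

```latex
\[
10{,}000 \times 8.0142 = 80{,}142 = 100 \times 801.42
\]
```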
FYI: I generated the hospitalizations.csv dataset with a C# data generation program.
