Subscribe Now

ABC, 123, Ruby, C#, SAS, SQL, TDD, VB.NET, XYZ

Sunday, November 18, 2007

Futzing with FUTS - Part II

In this second look at FUTS (Framework for Unit Testing SAS® programs), I will walk through an example of how to convert a regular piece of SAS code into a testable piece of SAS code with a unit test for testing it. The example consists of proc sql code to calculate age group based on age at an index date.

proc sql;
create table Calc as
select *, (case
when int((IndexDt-BDate)/365.25) between 0 and 17 then 'A: 0-17'
when int((IndexDt-BDate)/365.25) between 18 and 34 then 'B: 18-34'
when int((IndexDt-BDate)/365.25) between 35 and 49 then 'C: 35-49'
when int((IndexDt-BDate)/365.25) between 50 and 64 then 'D: 50-64'
when int((IndexDt-BDate)/365.25) > 64 then 'E: 65+'
else '?' end) as AgeGroup
from sample;
quit;
As I mentioned in Part I, the trick to making this code testable is to place it in a macro. Editing this code to fit in a macro results in something like this. The macro creates the calculated age group variable in an output dataset (outfile) given an input dataset (infile) containing a birthdate variable (bdatevar) and index date variable (indexdtvar).
%macro CalcAgeGroup(outfile,infile,bdatevar,indexdtvar);
proc sql;
create table &outfile as
select *, (case
when int((&indexdtvar-&bdatevar)/365.25) between 0 and 17 then 'A: 0-17'
when int((&indexdtvar-&bdatevar)/365.25) between 18 and 34 then 'B: 18-34'
when int((&indexdtvar-&bdatevar)/365.25) between 35 and 49 then 'C: 35-49'
when int((&indexdtvar-&bdatevar)/365.25) between 50 and 64 then 'D: 50-64'
when int((&indexdtvar-&bdatevar)/365.25) > 64 then 'E: 65+'
else '?' end) as AgeGroup
from &infile;
quit;
%mend CalcAgeGroup;

The production code that calls this macro to achieve the same goal as the first bit of code looks like this. A simple one liner.

%CalcAgeGroup(Calc,sample,BDate,IndexDt);

To unit test the macro, we need at least a minimal set of test data like this CSV. In this sample data file, you can see there's an ID var, BDATE, INDEXDT, and EXPECTED_AGEGROUP based on a calculation done by hand.

Test Data CSV

Given the CSV, the unit test code looks like this. (The macro is assume to be saved in a file called CalcAgeGroup.sas.)


*Import the CSV file into a SAS dataset called TestData;
proc import datafile='test_data.csv' out=TestData dbms=csv replace; getnames=yes; run;

data input; *Prep input dataset;
set TestData (drop=Expected_AgeGroup);
run;

data expected; *Prep expected output dataset;
length AgeGroup $8.; *Make AgeGroup the right size;
set TestData (rename=(Expected_AgeGroup=AgeGroup));
format AgeGroup; informat AgeGroup; *Drop format/informat;
run;

%include 'CalcAgeGroup.sas';
%CalcAgeGroup(Calc,input,bdate,indexdt); *Run test data through macro;

%assert_compare_equal(base=expected,compare=Calc); *Compare macro output & expected output;

Here we see the use of the FUTS macro %assert_compare_equal which compares SAS two datasets (attributes and data values) and throws an error into the log if the base and compare files are at all different. The rest of the code is fairly self-explanatory and involves preparing an input dataset for the macro and an expected output file to compare with the actual output file.

No comments: