Subscribe Now

Technology ramblings

ABC, 123, Ruby, C#, SAS, SQL, TDD, VB.NET, XYZ

Thursday, December 13, 2007

SQL Server 2005 Non-Adoption

The tide may have finally turned, but I have some opinions why SQL Server 2005 was not initially embraced by a lot of IT shops.

First of all, have you seen how well it integrates with Access 2003 for ADP development? It's phenomenal! (In the sense that it is non-functional.) Let's hope Access 2007 / SQL Server 2008 integration works.

Second of all, I have a sneaking suspicion that there really weren't a lot of devs out there dying to write C# or VB.NET stored procedures, and that this highly touted new feature was largely a gimmick imagined up by the big brains in Redmond and not really something that the practical developer would really use. T-SQL rocks after all! :)

You have opinions one way or the other?

Saturday, December 1, 2007

Futzing with FUTS - Part III

Let's dig a little deeper into using FUTS (Framework for Unit Testing SAS® programs).

We have 10,000 hospitalization records in a CSV file that looks like this. We want to do some basic analyses on this file: average age at admission, average length of stay and number of patient-days per doctor.

In order to unit test the SAS code that will perform the analysis, we learned in part 1 & part 2 in this series that testable code is written as macros. I've written three relatively simple SAS macros to do the three calculations. (I'm using Billy Kreuter's age calculation code.) This file that contains the three macros is named c:\hosp\macros.sas.
%macro Age(agevar,bdatevar,indexdt);
&agevar = floor((intck('month',&bdatevar,&indexdt) - (day(&indexdt) < losvar =" &dischargedtvar">

The Age and LOS macros calculate new variables and are to be called inside a data step, whereas the DocPatDays macro creates a new summary dataset, one record per DoctorID.

Before writing the production SAS program, let's test the macros with unit tests that utilize the FUTS macros. The test data for testing Age and LOS is called test_hosps.CSV and looks like this. It's basically a sample of the main hospitalizations CSV file plus EXPECTED_AGE and EXPECTED_LOS, calculated by hand.


The unit test code for Age looks like this. (We're importing the test_hosps.CSV data, calculated Age using the production macro and then comparing the EXPECTED_AGE with the age calculated by the macro.)

options mprint;
%include 'c:\hosp\macros.sas';
proc import datafile='test_hosps.csv' out=TestData dbms=csv replace;
getnames=yes;
run;

data actual;
set TestData (drop=EXPECTED_AGE);
%Age(AGE,BDATE,ADMITDATE);
run;

data expected;
set TestData (rename=(EXPECTED_AGE=AGE));
run;

%assert_compare_equal(base=expected,compare=actual);
On the first run of the unit test program (utAge.sas) we get an error on the assert.
68         %assert_compare_equal(base=expected,compare=actual);
MPRINT(ASSERT_COMPARE_EQUAL): proc compare base=expected compare=actual;
MPRINT(ASSERT_COMPARE_EQUAL): ;
MPRINT(ASSERT_COMPARE_EQUAL): run;

NOTE: There were 10 observations read from the data set WORK.EXPECTED.
NOTE: There were 10 observations read from the data set WORK.ACTUAL.
NOTE: The PROCEDURE COMPARE printed page 1.
NOTE: PROCEDURE COMPARE used (Total process time):
real time 0.45 seconds
cpu time 0.03 seconds


MPRINT(GENERATE_EVENT): options linesize=max;
ERROR: Data set actual not equal to expected
Looking at the .lst file reveals what is going on....proc import is attaching the BEST12 format and informat to the expected age variable (not present on the calculated age variable). They key thing is that all observations with all compared variables are equal, but there's still that annoying error in the log.

The COMPARE Procedure
Comparison of WORK.EXPECTED with WORK.ACTUAL
(Method=EXACT)

Data Set Summary

Dataset Created Modified NVar NObs

WORK.EXPECTED 26NOV07:19:37:43 26NOV07:19:37:43 7 10
WORK.ACTUAL 26NOV07:19:37:43 26NOV07:19:37:43 7 10


Variables Summary

Number of Variables in Common: 7.
Number of Variables with Differing Attributes: 1.


Listing of Common Variables with Differing Attributes

Variable Dataset Type Length Format Informat

AGE WORK.EXPECTED Num 8 BEST12. BEST32.
WORK.ACTUAL Num 8


Observation Summary

Observation Base Compare

First Obs 1 1
Last Obs 10 10

Number of Observations in Common: 10.
Total Number of Observations Read from WORK.EXPECTED: 10.
Total Number of Observations Read from WORK.ACTUAL: 10.

Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 10.

NOTE: No unequal values were found. All values compared are exactly equal.

To take care of that bothersome error caused by a simple difference is irrelevant variable attributes, the utAge.sas unit test SAS program gets updated to this.

options mprint;
%include 'c:\hosp\macros.sas';
proc import datafile='test_hosps.csv' out=TestData dbms=csv replace;
getnames=yes;
run;

data actual;
set TestData (drop=EXPECTED_AGE);
%Age(AGE,BDATE,ADMITDATE);
run;

data expected;
set TestData (rename=(EXPECTED_AGE=AGE));
format age; informat age; *Takes away formating/informating;
run;

%assert_compare_equal(base=expected,compare=actual);
The error goes away! Our unit test for the Age macro passes! :)

The unit test code for LOS looks very similar to the Age unit test code.
options mprint;
%include 'C:\hosp\macros.sas';
proc import datafile='test_hosps.csv' out=TestData dbms=csv replace;
getnames=yes;
run;

data actual;
set TestData (drop=EXPECTED_LOS);
%LOS(LOS,ADMITDATE,DISCHARGEDATE);
run;

data expected;
set TestData (rename=(EXPECTED_LOS=LOS));
format LOS; informat LOS; *Takes away formating/informating;
run;

%assert_compare_equal(base=expected,compare=actual);
We need another test data file for the PatDocDays unit testing. It is hand-calculated doctor-level summary data and will supply the expected values when processing the test_hosps.CSV with the PatDocDays macro.

The utDocPatDays.sas unit test code looks like this.
options mprint;
%include 'C:\hosp\macros.sas';
proc import datafile='test_hosps.csv' out=InputData dbms=csv replace;
getnames=yes;
run;

data InputData;
set InputData;
%LOS(LOS,AdmitDate,DischargeDate);
run;

proc import datafile='test_patient_days.csv' out=Expected dbms=csv replace;
getnames=yes;
run;

data Expected;
set Expected (rename=(EXPECTED_PT_DAYS=PT_DAYS));
format PT_DAYS; informat PT_DAYS; *Takes away formating/informating;
run;

%DocPatDays(actual,InputData,PT_DAYS,DOCTORID,LOS);

data actual;
set actual (keep=DOCTORID PT_DAYS);
format PT_DAYS; informat PT_DAYS;
run;

%assert_compare_equal(base=expected,compare=actual);
And finally, the production code looks like this.

options mprint;
%include 'C:\hosp\macros.sas';
proc import datafile='hospitalizations.csv' out=HospitalData dbms=csv replace;
getnames=yes;
run;

*Calculate age at hospital admission and length of stay;
data HospitalData2;
set HospitalData;
%Age(AgeAtAdmit,BDate,AdmitDate);
%LOS(LOS,AdmitDate,DischargeDate);
run;

proc means data=HospitalData2;
var AgeAtAdmit LOS;
run;

*Calculate patient-days per doctor;
%DocPatDays(PatDays,HospitalData2,PatDays,DoctorID,LOS);

proc means data=PatDays maxdec=2;
var PatDays;
run;

proc print data=PatDays;
var DoctorID PatDays;
run;
The output looks like this.


The MEANS Procedure

Variable N Mean Std Dev Minimum Maximum

AgeAtAdmit 10000 46.2669000 23.1170955 6.0000000 86.0000000
LOS 10000 8.0142000 3.7280810 2.0000000 14.0000000


The MEANS Procedure

Analysis Variable : PatDays

N Mean Std Dev Minimum Maximum

100 801.42 95.13 609.00 1049.00


Pat
Obs DOCTORID Days

1 100 744
2 101 732
3 102 840
4 103 1035
5 104 879
6 105 907
7 106 752
8 107 771
9 108 880
10 109 901
11 110 775
12 111 910
13 112 799
14 113 849
15 114 774
16 115 753
17 116 845
18 117 726
19 118 690
20 119 856
21 120 675
22 121 747
23 122 727
24 123 692
25 124 992
26 125 755
27 126 782
28 127 954
29 128 773
30 129 926
31 130 790
32 131 727
33 132 702
34 133 922
35 134 843
36 135 835
37 136 755
38 137 625
39 138 687
40 139 728
41 140 805
42 141 687
43 142 669
44 143 651
45 144 1010
46 145 909
47 146 950
48 147 805
49 148 920
50 149 679
51 150 777
52 151 756
53 152 837
54 153 798
55 154 832
56 155 893
57 156 684
58 157 754
59 158 808
60 159 788
61 160 793
62 161 816
63 162 752
64 163 1049
65 164 712
66 165 778
67 166 880
68 167 945
69 168 866
70 169 713
71 170 842
72 171 807
73 172 854
74 173 609
75 174 698
76 175 712
77 176 704
78 177 944
79 178 696
80 179 754
81 180 811
82 181 803
83 182 771
84 183 798
85 184 766
86 185 872
87 186 623
88 187 755
89 188 759
90 189 795
91 190 788
92 191 706
93 192 959
94 193 785
95 194 823
96 195 1027
97 196 903
98 197 751
99 198 765
100 199 891
FYI: I generated the hospitalizations.CSV dataset with this C# program.


C# data generation program

Thursday, November 22, 2007

The Value of Cross-Training

It was back when Java 1.1 was a hot ticket item that I decided to tinker with this cool, new language. I learned the basics and made a number of fun applets. One silly one I recall involved a cat chasing a mouse around on a rectangular surface. The hardware mouse moved the software mouse and the cat head followed the mouse wherever it went automatically. At the time I was in a primarily Microsoft shop, so I never got to use my Java knowledge at work, but when C# came out, it was very easy to pick up for me due to my Java experience.

A while back I also learned a fair bit about writing Flash applications using Macromedia Flash MX. I wonder how much this is going to help when I dig into Silverlight one of these days.

More recently, I spent a weekend studying Ruby on Rails and then BAM! - a few months later Microsoft's ASP.NET MVC Framework becomes a hot topic. Due to my experience with RoR, understanding the architecture and coding of MS MVC has been very easy. I may never code a production RoR app, but by studying it, I've given myself a valuable education.

Cynics may grumble about "M$ copying others" (I'm won't touch that one!), but there have definitely been other instances of helpful cross-training not involving Microsoft products. An obvious example is all that IBM Mainframe JCL I got to write a decade or so ago...that prepared me quite well for…for…for, well, um, Soap Headers! (And punch cards prepared me for voting in Palm Beach county.) LOL. Ok, maybe that's a bit of a stretch, but I am certain that my work with Perl helped with understanding some Ruby syntax and of course when SAS started supporting regular expressions, the experience with Perl regex paid off. And my explorations into the Matisse object-oriented database definitely helped grokking NNHibernate easier. Procedural programming in QBASIC certainly prepped me for classic VB. And let's not forget BASIC with its fun and frequent GOTO statements and the not infrequent use of same in VB/VBA error handling. On Error Goto Err_Handler. Good times.

My point is that when the urge strikes you to explore that language you've been curious about, go for it. It might just turn out to be the next cool thing or cross-train you for it.

Changing gears a bit, this reminds of the cross-training encouraged by the Integral Life Practice starter kit. By exercising your body, mind, spirit, and shadow (unconscious) in different ways, each realm of human experience is enhanced. For example, the benefits of meditation (spirit) extend well beyond the spiritual realm, into the physical (a more relaxed state), mental (more clarity), and shadow (unconscious material is more accessible to productive processing).

Happy Thanksgiving! :)

Wednesday, November 21, 2007

Visual Studio.NET 2008 Arrives

No more beta blues! Visual Studio.NET 2008 has been released to manufacturing!

90-day trial editions are available right now at this location on MSDN.

Here is the express products web install page for Visual Basic 2008 Express Edition, Visual C# 2008 Express Edition, Visual C++ 2008 Express Edition, & Visual Web Developer 2008 Express Edition. Free to download immediately. (You'll want a fast connection.)

Here's a nice blog post about new features courtesy of Doug Stewart.

Sunday, November 18, 2007

Futzing with FUTS - Part II

In this second look at FUTS (Framework for Unit Testing SAS® programs), I will walk through an example of how to convert a regular piece of SAS code into a testable piece of SAS code with a unit test for testing it. The example consists of proc sql code to calculate age group based on age at an index date.

proc sql;
create table Calc as
select *, (case
when int((IndexDt-BDate)/365.25) between 0 and 17 then 'A: 0-17'
when int((IndexDt-BDate)/365.25) between 18 and 34 then 'B: 18-34'
when int((IndexDt-BDate)/365.25) between 35 and 49 then 'C: 35-49'
when int((IndexDt-BDate)/365.25) between 50 and 64 then 'D: 50-64'
when int((IndexDt-BDate)/365.25) > 64 then 'E: 65+'
else '?' end) as AgeGroup
from sample;
quit;
As I mentioned in Part I, the trick to making this code testable is to place it in a macro. Editing this code to fit in a macro results in something like this. The macro creates the calculated age group variable in an output dataset (outfile) given an input dataset (infile) containing a birthdate variable (bdatevar) and index date variable (indexdtvar).
%macro CalcAgeGroup(outfile,infile,bdatevar,indexdtvar);
proc sql;
create table &outfile as
select *, (case
when int((&indexdtvar-&bdatevar)/365.25) between 0 and 17 then 'A: 0-17'
when int((&indexdtvar-&bdatevar)/365.25) between 18 and 34 then 'B: 18-34'
when int((&indexdtvar-&bdatevar)/365.25) between 35 and 49 then 'C: 35-49'
when int((&indexdtvar-&bdatevar)/365.25) between 50 and 64 then 'D: 50-64'
when int((&indexdtvar-&bdatevar)/365.25) > 64 then 'E: 65+'
else '?' end) as AgeGroup
from &infile;
quit;
%mend CalcAgeGroup;

The production code that calls this macro to achieve the same goal as the first bit of code looks like this. A simple one liner.

%CalcAgeGroup(Calc,sample,BDate,IndexDt);

To unit test the macro, we need at least a minimal set of test data like this CSV. In this sample data file, you can see there's an ID var, BDATE, INDEXDT, and EXPECTED_AGEGROUP based on a calculation done by hand.

Test Data CSV

Given the CSV, the unit test code looks like this. (The macro is assume to be saved in a file called CalcAgeGroup.sas.)


*Import the CSV file into a SAS dataset called TestData;
proc import datafile='test_data.csv' out=TestData dbms=csv replace; getnames=yes; run;

data input; *Prep input dataset;
set TestData (drop=Expected_AgeGroup);
run;

data expected; *Prep expected output dataset;
length AgeGroup $8.; *Make AgeGroup the right size;
set TestData (rename=(Expected_AgeGroup=AgeGroup));
format AgeGroup; informat AgeGroup; *Drop format/informat;
run;

%include 'CalcAgeGroup.sas';
%CalcAgeGroup(Calc,input,bdate,indexdt); *Run test data through macro;

%assert_compare_equal(base=expected,compare=Calc); *Compare macro output & expected output;

Here we see the use of the FUTS macro %assert_compare_equal which compares SAS two datasets (attributes and data values) and throws an error into the log if the base and compare files are at all different. The rest of the code is fairly self-explanatory and involves preparing an input dataset for the macro and an expected output file to compare with the actual output file.

Friday, November 16, 2007

The Other Mother of Invention: Impatience

On 11/8, I blogged about Unit Testing an Email Method. Things have happened since then...



Bad Sleep

You may have heard the saying about necessity or laziness being the mother of invention, but I say impatience is a good mother, too. I say this based on the fact that I quickly got sick of waiting for my email unit tests to run when I was running the whole batch of unit tests in NUnit. Pretty soon I had them [Ignore]'d out. I was distressed about the yellow instead of the green, but my impatience was stronger. One property of a good unit test is that it runs rapidly.



Eventually (doh!) it dawned on me that I could both have and eat my cake. How? I modified my email unit test to query my Outlook Inbox every two seconds to see if the target email had arrived and exit early if so (with success). Otherwise, it would sleep for two more seconds and check the inbox again -- up to a maximum of eight repetitions (i.e., 16 seconds).



This refactoring made me quite the happy camper. Imagine my joy as I ran all of my unit tests quickly without the manditory 10-second waits for the tests involving email. I, like many NUnit users I am sure, just love to see that green bar -- and as soon as possible!



I took all of the Outlook-related goo and put it in a separate class called OutlookInbox with a single public (static) method called EmailReceived. I just pass in the subject line of the email I am expecting to receive and the method works its magic and I get back a boolean value telling me if the email arrived in time or not. The responsibility for doing the 2-second chunks of waiting is delegated to this method and the work of actually interfacing with Outlook is delegated to a helper method called checkInbox.



using System;
using System.Runtime.InteropServices;
using Outlook = Microsoft.Office.Interop.Outlook;

namespace TestHarness
{
public class OutlookInbox
{
private OutlookInbox() { }

public static bool EmailReceived(string subjectline)
{
const int MAXTRIES = 8;
int tries = 0;
bool emailReceived = false;
do
{
tries++;
System.Threading.Thread.Sleep(2000);
emailReceived = checkInbox(subjectline);
} while (emailReceived == false && tries < MAXTRIES);
return emailReceived;
}

private static bool checkInbox(string subjectline)
{
//This webpage (http://www.developerfusion.co.uk/show/4667/) was helpful in
//developing this code.
Outlook.ApplicationClass outlookApp = null;
Outlook.NameSpace outlookNS = null;
bool emailArrived = false;
try
{
outlookApp = new Outlook.ApplicationClass();
outlookNS = outlookApp.GetNamespace("MAPI");
outlookNS.Session.Logon("outlook", "", false, true);
Outlook.MAPIFolder inbox = outlookNS.GetDefaultFolder(Outlook.OlDefaultFolders.olFolderInbox);
foreach (Outlook.MailItem email in inbox.Items)
{
if (email.Subject == subjectline)
{
emailArrived = true;
email.Delete();
}
}
}
finally
{
if (outlookNS != null)
outlookNS.Logoff();
}
return emailArrived;
}
}
}


Now my unit test code looks like this. This is testing a web service that sends email. The business about ServiceConnections.WebServiceBaseUrl() manages whether we are pointing at the dev or prod service.



using System;
using NUnit.Framework;
using IndependentContractorApp.BusinessLayer;

namespace TestHarness
{
[TestFixture]
public class EmailServiceTestSuite
{
[Test]
public void SendRegularText()
{
string asmx = "EmailService.asmx";
string url = ServiceConnections.WebServiceBaseUrl() + asmx;
EmailWebService.EmailService svc = new EmailWebService.EmailService();
svc.Url = url;
svc.Credentials = System.Net.CredentialCache.DefaultCredentials;
DateTime now = DateTime.Now;
svc.SendRegularText("me@mycompany.org", "SendRegularText Test Email - " + now, "Body Text");
Assert.IsTrue(OutlookInbox.EmailReceived("SendRegularText Test Email - " + now),
"Email failed to arrive");
}
}
}


NUnit green bar of success

Ahh, a thing of beauty. :)