Oh noes! Healthcare.gov has been a disaster! Plagued by cronyism, bad management and too many cooks!
My background is IT, working with ERP systems. Based upon my experience, the implementation of healthcare.gov has been a disaster, as long as you ignore all of the successes. But if you look at the successes and put them in perspective, then the healthcare.gov implementation looks like a success. I discuss their successes on the other side.
Let's start with the basics:
* Healthcare.gov is a huge, huge project that requires disparate systems from the federal government and every single state to talk to each other in real time
* There was no possibility of the gradual rollout that most projects this big would have. Nor was any slip in schedule possible. All fifty states had to go live on the same day
* There was a massive but understandable underestimation of the number of users when the site went live. I will discuss this further later
+ + + +
What are the most important things for a system like this:
* The data in the healthcare.gov system does not get corrupted
Some examples of corrupt data: (1) if multiple insurance policies were assigned to a single person, (2) if a person bought family coverage but not all family members were covered, (3) if insurance sales were recorded for non-existent customers, (4) if there were multiple records for the same person.
I haven't heard of any example of corrupt data. There could be some and it just hasn't been noticed. However, it is usually pretty obvious when you are working with data that has been corrupted.
* The information presented to customers is correct
The #1 problem with healthcare.gov in its last month of development was that it was getting the subsidized insurance rates wrong. If they hadn't fixed that problem, they wouldn't have been able to sell insurance through healthcare.gov. From the linked article, "Still, the long-term consequences of any malfunctions in registering and pricing may be limited. People may still be able to sign up offline, even if the online exchanges aren't fully functional at first, several insurers said."
* Security
Wouldn't it be fun to hack into your neighbor/co-worker/brother/sister's account and see what their income is? Or sign them up for an insurance policy even though they get insurance through their work? It is still early on this front, but I haven't seen any reports of anyone easily hacking into someone else's account.
* Performance
I am going to discuss this in the next section.
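The four kinds of corrupt data listed under the first bullet can each be turned into a simple check. Here is a minimal sketch of what that might look like. The record layout (`person_id`, `family_id`, `covered`) and the function name `find_corruption` are my own invention for illustration; I have no idea what the real healthcare.gov schema looks like.

```python
from collections import Counter

# Hypothetical, simplified records (NOT the real healthcare.gov schema).
people = [
    {"person_id": 1, "name": "Ann", "family_id": 10},
    {"person_id": 2, "name": "Bob", "family_id": 10},
    {"person_id": 3, "name": "Cal", "family_id": None},
    {"person_id": 3, "name": "Cal", "family_id": None},   # duplicate record
]
policies = [
    {"policy_id": "A", "type": "family", "covered": [1]},       # Bob is missing
    {"policy_id": "B", "type": "individual", "covered": [3]},
    {"policy_id": "C", "type": "individual", "covered": [3]},   # second policy for Cal
    {"policy_id": "D", "type": "individual", "covered": [99]},  # nobody has id 99
]


def find_corruption(people, policies):
    """Return a list of (problem, person_id) tuples for the four example cases."""
    problems = []
    known_ids = {p["person_id"] for p in people}

    # (4) multiple records for the same person
    id_counts = Counter(p["person_id"] for p in people)
    for pid, n in sorted(id_counts.items()):
        if n > 1:
            problems.append(("duplicate_person", pid))

    # (1) multiple policies assigned to a single person
    coverage = Counter(pid for pol in policies for pid in pol["covered"])
    for pid, n in sorted(coverage.items()):
        if n > 1:
            problems.append(("multiple_policies", pid))

    # (3) sales recorded for customers that do not exist
    for pol in policies:
        for pid in pol["covered"]:
            if pid not in known_ids:
                problems.append(("unknown_customer", pid))

    # (2) family coverage that leaves out a family member
    for pol in policies:
        if pol["type"] != "family":
            continue
        family_ids = {
            p["family_id"]
            for p in people
            if p["person_id"] in pol["covered"] and p["family_id"] is not None
        }
        expected = {p["person_id"] for p in people if p["family_id"] in family_ids}
        for pid in sorted(expected - set(pol["covered"])):
            problems.append(("family_member_uncovered", pid))

    return problems


for problem in find_corruption(people, policies):
    print(problem)
```

Checks like these are cheap to run against a copy of the data, which is part of why corrupted data tends to get noticed fairly quickly.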
+ + + +
Why did they go live when they had such performance issues?
Whenever you are talking about a system this complex, you can never solve every single problem. For one thing, you can never make a system foolproof because fools are so darn clever - they will do things that you never, ever would have thought of. So there is a point where further testing doesn't provide much bang for the buck because all the most important known problems are solved and you know that the unknown problems are going to be worse than the remaining known problems.
For performance, they did some performance testing. The government expected its healthcare reform website to draw 50,000-60,000 users at once, based upon the all-time high of 30,000 simultaneous users for Medicare.gov. Also, they expected the volume to be low initially. I find that expectation reasonable because (1) the website went live three months before the insurance you bought from it could take effect and (2) you have to pay a month's worth of insurance when you sign up, so it is foolish to pay for insurance in October.
My guess is that they were expecting to have 30-45 days to work out the performance problems before they started getting high volumes of users. My guess would be that with a system this complex, there really isn't a way to know for sure what the performance bottlenecks are until you go live. You can do load testing, but it is based upon lots and lots of assumptions that won't be true.
You know what happened - the initial volume was far, far more than what they were expecting. Traffic hit over 250,000 users. I saw an estimate that 10 million users visited the healthcare.gov website on the first day. The volume crushed the web site. From what I have read, no insurance was sold on the first two days. However, by Saturday I was able to set up an account and get insurance quotes.
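To put the miss in perspective, here is the back-of-the-envelope arithmetic using only the figures quoted above. The variable and function names are mine, and treating "over 250,000" as exactly 250,000 is a simplifying assumption.

```python
# Figures quoted in the text; none of this is measured data.
medicare_peak = 30_000                      # all-time high for Medicare.gov
planned_low, planned_high = 50_000, 60_000  # what the government planned for
actual_peak = 250_000                       # "over 250,000", taken as a floor


def overload_factor(actual, planned):
    """How many times larger the real load was than the planned load."""
    return actual / planned


print(f"vs. upper planning figure: {overload_factor(actual_peak, planned_high):.1f}x")
print(f"vs. lower planning figure: {overload_factor(actual_peak, planned_low):.1f}x")
print(f"vs. Medicare.gov record:   {overload_factor(actual_peak, medicare_peak):.1f}x")
```

So the site saw at least four to five times the load it was planned for, and more than eight times the busiest moment Medicare.gov had ever had.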
+ + + +
How do problems with systems like this get fixed?
What happens is the managers decide what the top problem is and they throw all of their resources at the problem. When it gets fixed, they throw all their resources at the new top problem. Repeat lots of times until you have a bunch of small problems that are insignificant enough that you can work on them in parallel.
As I said, the top problem before going live was inaccurate price quotes. The top problem when they went live was site performance. Both of those problems appear to be dead and the support team has moved on to other problems.
+ + + +
But Ezra Klein said it was a disaster!
Ezra Klein is a really smart dude, but he has never worked in IT. He ignores the problems that have been fixed and mentions one (count 'em - one) problem:
In the weeks leading up to the launch I heard some very ugly things about how the system was performing when transferring data to insurers -- a necessary step if people are actually going to get insurance...Here is one example from a carrier–and I have received numerous reports from many other carriers with exactly the same problem. One carrier exec told me that yesterday they got 7 transactions for 1 person – 4 enrollments and 3 cancellations.
First off - as long as there aren't data corruption problems with the healthcare.gov database, the transmission of data to insurance companies isn't a huge problem because they have months to get it right.
Most important, the example he gives shows a lack of IT experience. It is really hard to figure out incremental changes to a record, so whenever a change happens, the easiest thing is to re-send all the information. Otherwise, the system has to keep track of what the prior record value was, what exactly changed and how to send the change information. That is far more error-prone than just re-sending the current account information. So if I sign up for insurance, then change my mailing address, the system probably sends a cancellation of my prior policy and then re-sends all the information for my account. The insurance company receiving the information should have ETL (extract, transform and load) code that compares each new record for an account against what it already has to determine what has to be changed. So Ezra is getting upset about something that shouldn't be a problem.
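Here is a minimal sketch of the kind of compare-and-load logic I am talking about on the insurer's side. Everything in it is hypothetical: the message kinds (`"enroll"`, `"cancel"`), the field names and the functions `diff_record` and `load_messages` are made up for illustration and are not the actual format healthcare.gov sends to carriers.

```python
def diff_record(current, incoming):
    """Return {field: (old, new)} for fields that differ between two snapshots.

    Fields missing from the incoming snapshot are ignored in this sketch.
    """
    return {
        field: (current.get(field), value)
        for field, value in incoming.items()
        if current.get(field) != value
    }


def load_messages(store, messages):
    """Apply full-record messages to the insurer's store and log the net changes.

    A cancel that is followed by a fresh enroll for the same account is
    treated as an update, not as a real cancellation.
    """
    pending_cancel = set()
    log = []
    for kind, record in messages:
        account = record["account_id"]
        if kind == "cancel":
            pending_cancel.add(account)   # wait and see if a re-send follows
            continue
        # kind == "enroll": a full snapshot of the account
        pending_cancel.discard(account)
        if account not in store:
            log.append((account, "new", record))
        else:
            changed = diff_record(store[account], record)
            if changed:
                log.append((account, "update", changed))
        store[account] = record
    # Cancels that were never followed by a re-send are real cancellations.
    for account in sorted(pending_cancel):
        if store.pop(account, None) is not None:
            log.append((account, "cancelled", None))
    return log


# Sign up, then change the mailing address: three messages, one real change.
messages = [
    ("enroll", {"account_id": 42, "plan": "silver", "address": "1 Elm St"}),
    ("cancel", {"account_id": 42}),
    ("enroll", {"account_id": 42, "plan": "silver", "address": "9 Oak Ave"}),
]
store = {}
for entry in load_messages(store, messages):
    print(entry)
```

In this sketch the three messages boil down to one new account and one address update. That is how a pile of enrollments and cancellations for one person can be perfectly harmless, provided the receiving side does the comparison.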
+ + + +
What about cronyism, bad management and too many cooks?
David Auerbach does yeoman work digging into what contracts were let for the development of the healthcare.gov website. However, he seems to have strong opinions about how things should have been done that color his judgement about the success of the healthcare.gov project.
For example, he appears to hate how the government picks vendors for IT projects. Now, I am sure there are better ways of picking vendors for IT projects, but to make sweeping judgements about the project because the government picked vendors like it always has is stupid. Where does the "cronyism" from the title come from? As far as I can tell, it comes from the fact that Booz-Allen got a $6 million contract for the project. I too hate high-level consulting companies and think they charge ridiculous amounts for so-so advice. And there were probably lots of companies that would have given better advice for less money. We are talking about a project that was way, way beyond the experience of Health and Human Services. So it was really impossible for them to pick which companies really knew their stuff on this issue and which didn't. So Booz-Allen was not the best pick, but probably a safe pick.
Where does the "bad management" in David Auerbach's title come from? My impression is that he expected the development process to be done a certain way and the government didn't do it that way. Now, he could argue that was a poor decision, but he should make that argument. But to declare the project was badly managed just because they didn't use his desired development process is stupid.