Over a year ago, I was talking to a friend of mine who is involved in setting up a health care exchange in Colorado. I told him that if the system is not architected properly, it will be a disaster, and if it is deployed with a poor architecture, it is pain we will have to live with for a long time. If it is architected properly, it will be a crazy success and will be a system we are happy to live with for a long time. I offered to architect it for free as part of my doctoral program, but was turned down. Today, on the Chuck Todd show (I know, but in this case they were right), the failure of the exchanges in their first few weeks was traced to the architecture of the system.
My friend told me I didn't understand the real issues being encountered in setting up the systems. He felt that the biggest hurdle was the politics of the different parties involved. I tried to tell him that good architecting considers those things. Unfortunately, once a system is deployed, it is extremely difficult to fix the architecture. It is like trying to rearchitect a building while people are living in it. Not easy.
So what went wrong with the architecture of the exchanges? I submit a series of issues and what has to be done to fix them.
First: They failed to create what are called "operational views." Operational views are the diagrams and descriptions of what the deployed product looks like. Operational views consider what it looks like to users, i.e., the user experience. Operational views address who talks to whom, not in a system way but in terms of those who are involved in operations: the different elements of the exchanges (insurance companies), the overseers, the maintainers, all the stakeholders. What do they need from the product and how do those needs get met? They address things like the number of people who will be trying to access the system. They talk about the information that will be passed from one part (node) of the system to another. Operational views create use cases. Use cases address what happens step by step. They create nominal use cases, which talk about what happens step by step in an ideal experience, and off-nominal use cases, which address what needs to happen if something happens that is not ideal.
Once the operational views are created, the systems views are derived from the operational views. The systems views address what the system has to do to make the operational views come about, otherwise known as realizing the operational views. So my user interface (GUI) has to have a certain look and feel. How does the system give users this look and feel? My system has to respond with a certain speed. What throughput is needed to give each user that speed (throughput vs. latency)? What bandwidth do we need to accomplish that? How much processing capability will we need to handle all the potential simultaneous users? If there are no operational views, there are no system requirements that tell developers what their throughput, bandwidth, and processing need to be. The systems views create system use cases: step-by-step descriptions of what the system does to make the use cases in the operational views happen. They create off-nominal use cases to determine what the system will do if something happens that is not ideal. They identify all the interfaces in the system that allow elements of the system to talk to each other, and carefully define how those interfaces work. This is critical, because for data to flow from node to node, it has to be passed in a way the receiving node can receive it.
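To make the step from operational numbers to system requirements concrete, here is a rough sizing sketch. It uses Little's law (average number of requests in the system equals arrival rate times average time in the system), which is a standard queueing result but not something the ACA teams are known to have used. The function name and every number below are hypothetical placeholders, not figures from any real exchange.

```python
import math

def required_capacity(arrivals_per_sec, target_latency_sec,
                      per_server_concurrency, margin=0.5):
    """Estimate concurrent requests and server count from operational figures."""
    # Little's law: average number in the system = arrival rate * time in system.
    concurrent = arrivals_per_sec * target_latency_sec
    # Add headroom, since predicting human behavior is difficult.
    with_margin = concurrent * (1 + margin)
    servers = math.ceil(with_margin / per_server_concurrency)
    return concurrent, servers

concurrent, servers = required_capacity(
    arrivals_per_sec=2000,       # hypothetical peak rate taken from an operational view
    target_latency_sec=0.5,      # hypothetical response-time requirement
    per_server_concurrency=100,  # hypothetical measured capacity of one server
)
print(f"{concurrent:.0f} concurrent requests, {servers} servers with 50% margin")
```

The point is not the particular formula. It is that the inputs come from the operational views, so without those views there is nothing to plug in.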
Correction: Off line, in a different room, go create these views. This will probably take some time; it should have been done first.
Once the operational and system views are created, the use cases are used to create test cases. This is done before any coding is done. Then all the parts of the system that already exist are examined to see if they can meet the needs of the system, or if they can be modified to do so. If not, those pieces have to be redone. What happened in the ACA system is that the system requirements were derived from the existing parts. However, none of those preexisting parts had ever had to handle the volume of a national program. Many were obsolete. Why would we design a system to accommodate obsolete parts?
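As a small illustration of turning use cases into test cases before any coding, the sketch below derives one test from the nominal path and one from each off-nominal branch. The `UseCase` structure, the `derive_test_cases` helper, and the enrollment steps are all invented for this example; a real project would take them from its own operational views.

```python
from dataclasses import dataclass, field

@dataclass
class UseCase:
    name: str
    steps: list                                      # nominal steps, in order
    off_nominal: dict = field(default_factory=dict)  # step index -> what can go wrong

def derive_test_cases(use_case):
    """Return one nominal test plus one test per off-nominal branch."""
    tests = [{
        "id": f"{use_case.name}/nominal",
        "steps": list(use_case.steps),
        "expect": "success",
    }]
    for index, failure in sorted(use_case.off_nominal.items()):
        tests.append({
            "id": f"{use_case.name}/off-nominal-{index}",
            # Run the nominal steps up to and including the one that fails.
            "steps": use_case.steps[: index + 1],
            "inject": failure,
            "expect": "handled",
        })
    return tests

# Hypothetical enrollment use case with two things that can go wrong.
enroll = UseCase(
    name="enroll",
    steps=["create account", "verify identity", "compare plans", "select plan"],
    off_nominal={1: "identity service times out",
                 3: "insurer rejects enrollment record"},
)
tests = derive_test_cases(enroll)
for t in tests:
    print(t["id"], "->", t["expect"])
```

Every off-nominal branch written down in a use case becomes a test that exists before the code does, which is what lets the tests check for the right product.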
What happened in ACA, according to the IT experts discussing the failures of the system as well as evidence provided by the experience of users today, is that the developers started coding with none of the above groundwork performed. As a result, the system is a kluge. What is worse, without the prior steps having been performed, the tests did not test the right things. As a result, the system could pass all the tests and still not perform. It is what we call the difference between verification and validation. Verification means you built the product right. Validation means you built the right product. Without the operational views informing the system views, you do not know what the right product is, so you can't be sure you built the right product.
Correction: Identify all the pieces and parts in the system and catalog them. Identify the interfaces they require and catalog them. Identify those that are obsolete and mark that in the catalog. Identify the parts that require security certification, those that have that certification and those that do not. And so on.
Then you have to find the holes. What obsolete parts are unsalvageable and need to be redone? Do it. What parts have interfaces that are incompatible with some of the parts they need to talk to? How many cannot support the data flow? Fix them. Test those interfaces. What parts can't perform to the level they need to perform? Are they salvageable? Fix or replace them. Test them. Stress test them.
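The catalog and the hunt for holes can be sketched in a few lines. This is only an illustration under made-up assumptions: the `Part` fields, the part names, the message formats, and the required rate are all hypothetical, and a real catalog would also track things like security certification.

```python
from dataclasses import dataclass

@dataclass
class Part:
    name: str
    sends: set       # message formats this part emits
    accepts: set     # message formats this part can receive
    capacity: int    # requests per second it has been shown to handle
    obsolete: bool = False

def find_holes(parts, links, required_rate):
    """List obsolete parts, under-capacity parts, and incompatible interfaces."""
    holes = []
    by_name = {p.name: p for p in parts}
    for p in parts:
        if p.obsolete:
            holes.append((p.name, "obsolete"))
        if p.capacity < required_rate:
            holes.append((p.name, "under capacity"))
    for src, dst in links:
        # An interface works only if the sender emits a format the receiver accepts.
        if not (by_name[src].sends & by_name[dst].accepts):
            holes.append((f"{src}->{dst}", "incompatible interface"))
    return holes

# Hypothetical catalog entries.
catalog = [
    Part("portal", sends={"json"}, accepts={"json"}, capacity=5000),
    Part("eligibility", sends={"xml"}, accepts={"xml"}, capacity=300, obsolete=True),
    Part("insurer_gateway", sends={"edi"}, accepts={"json", "edi"}, capacity=2000),
]
links = [("portal", "eligibility"), ("portal", "insurer_gateway")]

holes = find_holes(catalog, links, required_rate=1000)
for where, problem in holes:
    print(where, "-", problem)
```

Each hole found this way is a decision to make (fix, replace, or redo) and a test to write once the decision is carried out.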
Develop tests that prove you built the right system from the operational views. Stress them. Allow margin - predicting human behavior is difficult.
NEVER go straight to coding with a new system!!!
When I was talking to my friend, I was still employed as a software systems architect. However, as the sequester made my company tighten its belt, it was determined that software architects were a luxury they could not afford. The government was not that willing to pay for software architecture. I am now laid off, dying with frustration at the pain being inflicted by this system that did not need to be painful.
The message here is: Start by figuring out what the deployed product needs to look like and do. There need to be scenarios for every string of events that could occur. You need to think about what could go wrong at any step and figure out what to do about it. THEN design the system. THEN develop the test cases. THEN code. THEN test. THEN deploy.
It can be fixed, but it will be painful and expensive to fix, and in the meantime it will be a painful experience for users, overseers, and especially maintainers. It always seems easier and cheaper to simply code and deliver, but in the long run it is far more expensive.