In the days before you went SaaS, a bug showed up on your bug list and would hang around for one release cycle or more. Some dropped off the list. Some didn't. Some came back. Your QA team built the test plan and found the bugs. Your developers cured some bugs and made others. Your user groups wanted to hear about Bug #674891. No, they were not random numbers.
Now, if you were a manufacturing plant that produced a piece of metal with a hole in it, with both the piece and the hole well specified to close tolerances, you would have QC people testing the product as it rolled off the line. The testing process would be validated statistically. Defects would be found. Nobody would fix them. You would call them scrap. If the defects fell within what you would call random variation, no problem. If the defects established a trend, QA would get the call. QA would start looking for the cause, not causes, and certainly not cures. The cause would eventually be found. It would definitely originate in a specific manufacturing process. Then QA would engineer a process improvement. And plant life would continue uninterrupted for a long, long time.
Somehow, in the software business, we take QC and call it QA. I have never worked at a software company where QA fixed the process. I did know one programmer who psychoanalyzed bugs to discover why the author of the code he was debugging had written it that way, that buggy way. Still, the originating programmer's process was never called into question.
Back in the TQM days, behavior was corrected by process. Do you still press Spellcheck? No? Oh. Yeah, it wasn't a spelling error. It was a typo.
Bring it forward to contemporary times, or the late market, and we've seen some tweets over the past two weeks from sites apologizing because, allegedly, a programmer wrote some buggy code. Agile puts some process in place with pair programming, but "allegedly" really is the word, because even so, it wasn't the programmer. It was the lack of server operations management, the stuff IT is made of.
CIOs put ITIL in place to meet their Six Sigma goals. They start with problem management, which kicks in when they have an outage. They call everyone in on their beepers on Saturday. Do bugs know when it's not a workday? Apparently. The team searches here and there, since everything is connected. They write it up. Then they follow it up. Then change management is put in place, because calling people in and writing bugs up doesn't eliminate bugs.
Change management is the CIO's QA department. It ensures that code gets tested and passes before it gets into the operational environment. It manages change so that only good, working changes get into the operational code. But bugs still manage to get in there and cause an outage. So each morning, immediately after the prior day's outage meeting, the QA hunt begins. They call all possible parties and listen for the passing of the ball and the avoidance of responsibility, which can go on for weeks. Or maybe, on a good day, one manager stands up and says, "Hey, I did it. I'll take care of it." I love this guy. Process improvement happens. Beepers don't go off. The family is content. And, hopefully, change management has less to do. Pipe dream. But bugs do diminish, getting smaller and smaller. One day, on the hunt, some manager will say, "It was only 5 milliseconds." Are we at 26 standard deviations from the mean yet? Hardly, so that excuse is pointless. Besides, where would Nike be with "Oh, just excuse it!"
If you serve other vendors' functionality, they can only make changes during declared and previously arranged change windows. If you don't have a maintenance window, expect a call: "Get a window." Go negotiate. If their outage becomes your outage, something is broken on your side. It's your customers calling your support lines.
Here is the thing, though: those websites claiming programmer error should say the CIO blew it! Or the CEO. Or, more importantly to us product managers, we blew it. Yes, we product managers blew it. We blew it because we made promises and claims, and we set the expectation that 24/7/365 meant something. We did it without checking. We did it without ensuring it, or at least making it safe.
Twitter says "Too many tweets." It doesn't fail. It's safe. Frustrating, but safe. Frustrating only long enough for us to hit Refresh.
That always-up service is in the offer. As an offer component, it is the responsibility of the product manager whose application or service gets bundled with it. Product managers can't hire the people or create the QA processes, but they can say over and over again, "We need it." At some point you'll get it. Hopefully, you'll get it before your customers evacuate to the fast follower of the week.
It may seem out of scope, but it's not. It affects your P&L, so speak up. Get this fixed. And, you know blaming a programmer will get you nowhere. Hey, your phone is ringing. It's change management.
And what about that in-offer privacy policy? Yes, you again, the product manager. Scope creep. But then, it happens to the CEO every day.
Talk back! Leave a comment. Or, for those of you reading via RSS feed, email comments to locke.david@rocketmail.com. Or tweet me at @DavidWLocke.
Thanks.