As testers, we’re used to performing most of our jobs in an isolated bubble where the worst that can happen is, often, getting blocked by a bug. We test code, sometimes in multiple environments (just to be really sure) and then we push up to production. Once the code is there, we may do some smoke testing or sanity checking, just to check there are no 404s as the result of a bad deploy… and then we scurry off as quickly as possible, for fear of affecting (whisper it) real live data.
This talk, from Marcel Gehlen and Benjamin Hofmann, was met, at least at the outset, with some degree of incredulity, something the hosts acknowledged right at the start. “How do we test in live?” wasn’t even the question on most of the room’s mind – the bigger question was “WHY?!”, which was soon answered.
The Final Frontier? Testing in Production – Marcel Gehlen and Benjamin Hofmann
The staging environment is a lie
Marcel and Benjamin started by telling us our staging environment lies to us. This is pretty hard to dispute, and as testers we have all no doubt experienced pain from environmental mismatch at some point in our careers. If we are to accept that we want live-like data and live-like code, why do we accept non-live-like hardware or deployment? The reason, almost always, is the cost implication of a live-like system. We are always pushed to reduce costs and in reality it is hugely wasteful to maintain a second live instance which is never seen by a customer, just so the organisation’s few testers have higher confidence.
Whilst we are used to taking that risk, to accepting that we not only do not but cannot have a “really realistic” environment to stage tests on, that doesn’t have to be the case. We have an environment already, literally sitting right there waiting to be used, with the exact spec, hardware and infrastructure of production.
It’s called… production.
So Marcel and Benjamin’s talk became a kind of “How To” on getting into the mindset that it’s fine to test on production, in order to leverage this advantage.
Monitoring as a tool to enable Production Testing
One of the big thrusts of their advice was to build strong and reliable monitoring in order to establish the basic “I haven’t totally fucked the whole system” approach to testing in Prod. If we have strong monitoring it’s generally quite easy to tell if something has caused a really horrible outage or made the error rate or response time spike – but Benjamin cautioned us to take a wide view, so that we don’t mistake normal periodic fluctuations for problems. He showed a slide with a zoomed-in graph, showing a drop in logins. Then he zoomed out and showed that, over a daily cycle, this was actually normal.
The point here was that it’s very easy to panic and misinterpret monitoring data when we’re fearful. By taking a more clear-headed and holistic approach to genuine risks in our system, we gain better information.
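To make the “zoom out” idea a bit more concrete, here is a minimal sketch of my own (not something shown in the talk) that judges a metric against what is normal for that hour of the day, rather than against the daily peak. The function name, the threshold and the login numbers are all made up for illustration.

```python
from statistics import mean, stdev


def is_anomalous(current, same_hour_history, threshold=3.0):
    """Return True if `current` is far from what this hour of day usually looks like.

    `same_hour_history` holds readings taken at the same hour on previous days
    (hypothetical data here). `threshold` is an assumed cut-off, measured in
    standard deviations from the historical mean.
    """
    if len(same_hour_history) < 2:
        return False  # not enough history to judge either way
    baseline = mean(same_hour_history)
    spread = stdev(same_hour_history)
    if spread == 0:
        return current != baseline
    return abs(current - baseline) / spread > threshold


# Hypothetical logins-per-minute at 03:00 over the previous seven days.
logins_at_3am = [40, 42, 38, 41, 39, 43, 40]

# 41 would look like a collapse next to a daytime peak, but it is normal for 03:00.
print(is_anomalous(41, logins_at_3am))
# 5 is well outside the usual range for this hour, so it is worth a look.
print(is_anomalous(5, logins_at_3am))
```

A real monitoring stack will have far better tooling for this than a hand-rolled check, but the principle is the same: compare like with like before panicking.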
Deployed vs Released Code
A key distinction in the presentation was between deployed and released code.
Deployed code: code on production servers.
Released code: code which is accessible to customers.
Code can be deployed without being released! Things like feature flagging can make it very easy to test new features on production hardware, as can A/B, Canary or Blue/Green deployments. More on these below:
A/B Deployment – Load balancer sends all customer traffic to a subset of the servers available. The remaining servers are upgraded to the new version and tested. Once the new version passes testing, the load balancer switches all customer traffic to the new (now tested) version. The other servers are then upgraded (and may be tested again in certain circumstances), before the load balancer is returned to distributing customer traffic to all servers.
Canary Deployment – As per A/B deploys, this uses all available servers running two versions, but customer traffic is distributed to both from the start (e.g. 90/10 old/new). As long as the error rate, response time etc. are shown not to degrade, the load balancer gradually sends more and more traffic to the new version.
Blue/Green Deployment – Load balancer sends all customer traffic to one complete set of servers. Another complete set of servers is upgraded to the new version, where testing occurs. Assuming testing passes, the load balancer diverts all customer traffic to the new (now tested) version. The issue with this approach is it requires 2x the hardware.
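The gradual traffic shift in the Canary description above can be sketched roughly like this. This is my own illustration rather than anything from the slides: the function names, the 10% step and the 1% tolerance are assumptions, and a real load balancer would handle this through its own configuration.

```python
import random


def next_canary_weight(current_weight, old_error_rate, new_error_rate,
                       step=0.10, tolerance=0.01):
    """Return the share of traffic (0.0 to 1.0) the new version should get next.

    Rolls back to 0.0 if the new version's error rate is worse than the old
    one's by more than `tolerance`; otherwise raises the share by `step`.
    Both defaults are assumed values for illustration.
    """
    if new_error_rate > old_error_rate + tolerance:
        return 0.0  # new version is degrading things: stop sending it traffic
    return min(1.0, current_weight + step)


def choose_version(canary_weight, rng=random):
    """Pick which version serves a single request, given the current weight."""
    return "new" if rng.random() < canary_weight else "old"


# Start at 90/10 old/new, as in the example above, and step up while healthy.
weight = 0.10
weight = next_canary_weight(weight, old_error_rate=0.02, new_error_rate=0.02)
print(weight)
```

The useful property is that the rollback decision is driven by the same monitoring data as everything else, so a bad release only ever reaches a slice of customers.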
By keeping in mind there are a ton of strategies for keeping deployed code distinct from released code, we build ourselves opportunities to test before the customer gets hands-on with the product.
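Feature flagging is probably the simplest of these to picture in code. Below is a minimal sketch, assuming a hypothetical in-memory flag store and made-up flag and user names; real teams would more likely read flags from configuration or a dedicated flag service.

```python
# Hypothetical flag store: the new checkout is deployed, but only released to testers.
FLAGS = {
    "new_checkout": {"enabled_for": {"tester-1", "tester-2"}},
}


def is_enabled(flag_name, user_id, flags=FLAGS):
    """Return True if the flag exists and is switched on for this user."""
    flag = flags.get(flag_name)
    if flag is None:
        return False  # unknown flags default to off, so customers see the old path
    return user_id in flag["enabled_for"]


def checkout(user_id):
    """Both code paths are deployed; the flag decides which one is released."""
    if is_enabled("new_checkout", user_id):
        return "new checkout flow"
    return "old checkout flow"


print(checkout("tester-1"))     # tester sees the deployed-but-unreleased feature
print(checkout("customer-42"))  # customer still gets the existing behaviour
```

Here the new code is sitting on production hardware, talking to production infrastructure, and yet no customer can reach it until the flag is opened up.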
Reduce time to repair – and deploy small!
Another enabler for testing in production is to reduce the time it takes to repair the code if it does break. Releasing a bug needn’t be the end of the world and if devs can quickly patch, there’s less concern about the significance of production releases.
Another mention in this section should go to Matthew Bretten who gave a 99 Second Lightning Talk about deploying smaller code changes in order to make testing in production more feasible. He suggested that by releasing only smaller features, rather than overhauling the system in major “version intervals”, we do not place ourselves at the same level of risk – if something’s wrong, we can fix it quickly in live without screwing up every customer’s day. Smaller, more frequent releases equate to smaller risk.
It’s definitely an interesting thought to consider, and the point around scaling hardware to meet testing requirements is a persuasive one. Like many in the room, I have concerns about how this would fit with other parts of our software’s life-cycle, but Marcel and Benjamin gave some really strong guidance on how to actually achieve some of this benefit whilst exposing ourselves to limited risk. I can say with some certainty that this was one of the talks which generated the most conversation after it had finished, as few people seemed to do any real substantive testing in production.
The concern remains, and the solutions to not pulling down the whole production platform involve things like “build technical resilience”. This is absolutely a worthy goal which touches more than just whether or not we can test in production, but like any initiative it requires investment from the business to become a reality. Without that investment, many teams find themselves on an endless delivery mill, running just to stay in place and without much time to address technical debt, much less technical resilience.
However, their pathway was clear – start small, do one meaningful thing in live. Build from there. It’s hard to argue we can’t all find one thing we could test in live which would give us SOME greater confidence in our software than we currently have, prior to customers signing in.
The platform is part of the thing the user gets to interact with. It’s part of the “truth” of your software. As such, if you’re not testing in live, you’re not testing the real software. Sure, you can get pretty close, but you are at some level releasing your code untested on your user base. It doesn’t have to be this way! By being smart with your production environment, your deployment mechanism and by capitalising on the distinction between deployed code and released code, you can do better (and more realistic) testing.
A mention should go to the wonderful art on the slides, showing Star Trek scenes including many red-shirts meeting their bloody fate. Thanks to Franziska Haaf for these – the room loved them!