Testing vs. Alerting Part I

If you want to evaluate a testing plan, you must also consider your alerting plan.

Alerting and testing are complementary; both serve to identify defects. Testing typically identifies defects before code is deployed to production, while alerting typically notifies developers of an issue with a running system.

You need to consider both testing and alerting to create an effective defect mitigation plan.

Start by asking yourself a few important questions:

  1. What is the business's tolerance for defects in this system once it is in production?
  2. Can we easily rectify production issues with this system after the fact (once we have been alerted), or will they cause non-trivial damage to business operations and reputation?
  3. Are developers capable and willing to do on-call fixes to production systems? How much ongoing cost is there in training?

Once you’ve identified your tolerance for defects in production (and ability to fix them), you can better evaluate what preventative measures should live as real-time alerting, and what measures should live as pre-deployment tests.

Often, I find that using alerting to catch errors takes an order of magnitude less time than developing comprehensive integration tests to catch the same defects. The downside is that resolving the alerts still requires developer time and manual effort.
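To give a sense of how small an alert can be, here is a minimal sketch of a threshold check in Python. Everything in it is an assumption made for illustration: the `WindowStats` shape, the `should_alert` name, the 5% error-rate threshold, and the minimum-traffic cutoff are hypothetical and not taken from any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Request counts for one monitoring window (hypothetical shape)."""
    total_requests: int
    failed_requests: int


def should_alert(stats: WindowStats, max_error_rate: float = 0.05,
                 min_requests: int = 20) -> bool:
    """Return True when the window's error rate exceeds the threshold."""
    # Skip low-traffic windows so a single failure does not page anyone.
    if stats.total_requests < min_requests:
        return False
    error_rate = stats.failed_requests / stats.total_requests
    return error_rate > max_error_rate
```

A check like this says nothing about why requests are failing, which is the trade-off described above: it is cheap to write, but a developer still has to investigate once it fires.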

In my opinion, alerting on production systems is more fundamental than automated testing, but both should play some role in designing a defect mitigation plan.

The point is: You can’t design a good testing plan without having an alerting plan.
