The Pitfalls of the Agile Testing Pyramid in High-Consequence Software: Lessons from CrowdStrike

Aug 27, 2024

Updated: Apr 17

By Andrew Park | 2024-08-27

The Agile Testing Pyramid has been a cornerstone for software teams using Continuous Integration and Continuous Deployment (CI/CD) systems. It rests on a broad base of unit tests, followed by component and integration tests, with a comparatively small number of end-to-end tests at the top. This model aligns well with the speed-driven nature of Agile and DevOps, facilitating faster software iterations and deployments. However, its limitations become starkly evident in high-consequence software systems, where failures can have catastrophic outcomes.

The CrowdStrike incident of July 19, 2024 highlights these shortcomings and shows that, for high-consequence applications, the Agile Testing Pyramid alone is insufficient. A stronger emphasis on integration and end-to-end testing is crucial to mitigate risk in industries where failures can have severe safety, security, or financial repercussions: nuclear power, defense, aviation, automotive, cybersecurity, banking, healthcare, space exploration, energy grid management, industrial control, and telecommunications.

The Testing Spectrum: Pyramid, Trapezoid, and Rectangle

When considering testing strategies for software, it’s essential to recognize that different approaches are more suitable depending on the consequences of failure. For lower-consequence software, the Testing Pyramid remains a practical choice. As you move to higher-consequence systems, the testing strategy must shift to a Testing Trapezoid before finally transitioning to a Testing Rectangle for the most critical software applications.

The Testing Pyramid fits lower-consequence software because it delivers the best return on the labor invested while still reaching sufficient quality. In these cases, prioritizing unit and component tests with fewer integration and end-to-end tests allows for rapid iteration and frequent deployments, which suits systems where the value of agility and speed outweighs the risks.
The Testing Trapezoid represents an intermediate strategy that adjusts the balance between unit/component tests and integration/end-to-end tests. This approach recognizes that as the consequences of failure increase, more emphasis on integration and real-world testing is needed to catch issues that unit tests alone cannot detect.
The Testing Rectangle is for high-consequence applications, where rigorous testing at every level (unit, component, integration, and end-to-end) is necessary. The goal is not to trade one level off against another, but to hold all of them to a high degree of rigor in order to prevent failures like CrowdStrike’s global outage.

The higher the consequences of failure, the more essential it becomes to adopt a Testing Trapezoid or Testing Rectangle strategy. Integration testing ensures that components work seamlessly together, while end-to-end testing simulates real-world scenarios to validate the entire system’s behavior.
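
To make the three shapes concrete, here is a minimal sketch that labels a test suite by comparing its upper levels (integration plus end-to-end) with its base (unit plus component). The `SuiteCounts` class, the `suite_shape` function, and both threshold values are my own illustrative assumptions, not an industry standard, and a raw test count is only a crude proxy for the effort and rigor invested at each level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteCounts:
    """Number of automated tests at each level (hypothetical structure)."""
    unit: int
    component: int
    integration: int
    end_to_end: int


def suite_shape(counts: SuiteCounts,
                pyramid_ratio: float = 0.25,
                rectangle_ratio: float = 0.8) -> str:
    """Label a suite by the ratio of upper-level tests to base-level tests.

    Both thresholds are arbitrary assumptions chosen for illustration.
    """
    base = counts.unit + counts.component
    top = counts.integration + counts.end_to_end
    if base == 0:
        raise ValueError("suite has no unit or component tests")
    ratio = top / base
    if ratio >= rectangle_ratio:
        return "rectangle"
    if ratio <= pyramid_ratio:
        return "pyramid"
    return "trapezoid"


# Three made-up suites, one per shape.
web_app = SuiteCounts(unit=800, component=150, integration=40, end_to_end=10)
payments = SuiteCounts(unit=500, component=200, integration=200, end_to_end=100)
flight_control = SuiteCounts(unit=300, component=200, integration=250, end_to_end=200)

for name, suite in [("web_app", web_app), ("payments", payments),
                    ("flight_control", flight_control)]:
    print(name, suite_shape(suite))
```

A check like this could run in a CI job as a coarse guardrail, flagging a high-consequence product whose suite has drifted back toward a pyramid.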

The Risks of Under-Emphasizing Integration and End-to-End Testing

The Agile Testing Pyramid’s focus on unit testing makes sense for less critical software, where rapid iteration and frequent deployments are the goals and the cost of an escaped defect is modest.

As the consequences of failure rise, that trade-off stops working, and a shift toward the Testing Trapezoid and eventually the Testing Rectangle is required. In the case of CrowdStrike, despite the company’s emphasis on CI/CD and unit tests, critical gaps in integration and end-to-end testing led to catastrophic system failures across millions of devices.

The CrowdStrike software team placed too much confidence in their automated CI/CD unit tests while operating with critical gaps in their integration and end-to-end testing. CrowdStrike’s Preliminary Post Incident Review of this event points to the following lapses:

3 Lapses in CrowdStrike’s Integration Testing

Inadequate Validation: The engineering team failed to detect a bug in the Content Validator, allowing problematic content to pass through.
Insufficient Stress Testing for Instances: The engineering team did not perform thorough stress testing on individual Template Instances.
Lack of Continuous Integration Monitoring: The engineering team did not implement adequate continuous integration monitoring to detect issues in real time.
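
The first lapse above is the kind of gap that a unit test on the validator alone will not expose, but an integration test that feeds validated content into its real consumer will. The sketch below is a deliberately simple stand-in for that class of problem. It is not CrowdStrike’s code: `validate_content`, `interpret_content`, `integration_check`, and the field-count rule are all hypothetical names and assumptions made for illustration.

```python
from typing import List

# Hypothetical: the consumer always reads exactly this many fields.
EXPECTED_FIELDS = 3


def validate_content(fields: List[str]) -> bool:
    """Hypothetical validator with a deliberate gap.

    It checks that every field is a non-empty string, but never checks
    how many fields there are.
    """
    return len(fields) > 0 and all(isinstance(f, str) and f for f in fields)


def interpret_content(fields: List[str]) -> str:
    """Hypothetical consumer that reads a fixed number of fields.

    Raises IndexError when the content has fewer fields than it expects.
    """
    return "|".join(fields[i] for i in range(EXPECTED_FIELDS))


def integration_check(fields: List[str]) -> str:
    """Run content through the validator and then the consumer together."""
    if not validate_content(fields):
        return "rejected"
    try:
        interpret_content(fields)
    except IndexError:
        return "validator passed content the consumer cannot read"
    return "ok"


# The validator on its own accepts two-field content ...
print(validate_content(["a", "b"]))
# ... and only the combined check reveals that the consumer cannot handle it.
print(integration_check(["a", "b"]))
print(integration_check(["a", "b", "c"]))
```

The point of the design is that the assertion lives at the seam between the two components, which is exactly where a validator unit test has no visibility.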

3 Lapses in CrowdStrike’s End-to-End Testing

Failure in Scenario Testing: The engineering team did not conduct complete real-world simulation testing.
Absence of Staggered Deployment: The engineering team did not employ a phased rollout, leading to widespread crashes.
Insufficient Rollback Mechanism: The engineering team lacked effective rollback mechanisms, delaying the response to the issue.
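
The last two lapses, staggered deployment and rollback, can be sketched together. The example below is a minimal illustration under my own assumptions: `staged_rollout`, the ring names, and the `deploy`, `is_healthy`, and `rollback` callbacks are all hypothetical, and a real system would also need soak time between rings, telemetry-based health signals, and a rollback path that still works when a host cannot boot.

```python
from typing import Callable, Dict, List, Sequence


def staged_rollout(
    rings: Sequence[List[str]],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> Dict[str, object]:
    """Deploy ring by ring and stop at the first ring with an unhealthy host.

    When a ring fails its health check, every host updated so far is
    rolled back (most recent first) and later rings are never touched.
    """
    updated: List[str] = []
    for ring_index, ring in enumerate(rings):
        for host in ring:
            deploy(host)
            updated.append(host)
        if not all(is_healthy(host) for host in ring):
            for host in reversed(updated):
                rollback(host)
            return {"completed": False, "failed_ring": ring_index,
                    "rolled_back": list(updated)}
    return {"completed": True, "failed_ring": None, "rolled_back": []}


# In-memory stand-ins for real infrastructure (all names are made up).
state: Dict[str, str] = {}
rings = [
    ["canary-1"],
    ["internal-1", "internal-2"],
    ["customer-1", "customer-2", "customer-3"],
]

result = staged_rollout(
    rings,
    deploy=lambda host: state.__setitem__(host, "new"),
    is_healthy=lambda host: host != "internal-2",  # simulate one bad host
    rollback=lambda host: state.__setitem__(host, "old"),
)
print(result)
print(state)
```

In this simulated run the failure surfaces in the second ring, so the customer ring never receives the update. That containment is the whole value of a phased rollout.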

CrowdStrike management has committed to addressing these integration and end-to-end testing gaps to prevent future issues, but their DevOps engineering teams will need to move away from solely prioritizing speed and agility. Many modern software organizations, in their drive for rapid iteration, tend to sacrifice thorough testing and risk management. This approach is dangerous for high-consequence software, where failure can have severe repercussions. Companies developing such critical applications must enhance their engineering discipline by focusing on rigorous code reviews, led by qualified staff, and investing heavily in integration and end-to-end testing. Minimizing risk should be the primary focus, rather than simply delivering updates quickly.

Exploratory Testing: Always Essential

While a shift toward the Testing Rectangle emphasizes more structured integration and end-to-end tests, Exploratory Testing should never be neglected. Exploratory testing, which involves human testers actively investigating the system without predefined scripts, is invaluable for uncovering edge cases and unexpected behaviors that automated tests may overlook.

Implementing the Testing Trapezoid or Testing Rectangle approach, however, requires sufficiently skilled Quality Engineering talent. Unfortunately, most companies lack both the quantity and the quality of Quality Engineering talent necessary to implement such rigorous testing strategies. Effective recruitment, training, mentoring, and retention of this talent are critical to success. Over the past 20 years, I have established processes for building, cultivating, and retaining strong Quality Engineering talent within my technical teams, and this has been essential to delivering successful high-consequence software applications.

Companies need to focus on recruiting effectively and providing continuous training and mentorship to grow and retain their Quality Engineering talent. Otherwise, implementing a robust testing strategy that includes exploratory testing may become an ongoing challenge for many organizations.

Lessons from CrowdStrike: The Need for a Holistic Approach

The CrowdStrike incident exposed the dangers of over-reliance on unit testing and under-investment in integration and end-to-end testing. Teams that follow the Testing Pyramid model can fall into the trap of false confidence: they see green checkmarks for unit tests without fully understanding the risks at higher levels of the system.

For high-consequence software, this approach is not sufficient. A Testing Rectangle framework, which provides rigor across unit, component, integration, and end-to-end tests, is necessary to mitigate risk. When combined with exploratory testing, this holistic approach ensures that every layer of the software is thoroughly tested and validated.

Conclusion: Rethinking the Testing Strategy for High-Consequence Software

The CrowdStrike incident makes one thing clear: for high-consequence software, Agile and DevOps teams must move beyond the traditional Testing Pyramid. Depending on the consequences of failure, teams need to adopt a Testing Trapezoid or even a Testing Rectangle philosophy—where unit, component, integration, and end-to-end tests are all rigorously emphasized—to prevent critical failures in software that has severe real-world consequences.

Additionally, exploratory testing by sufficiently skilled Quality Engineers must always be part of the process to account for the unexpected and to catch potential issues that scripted tests may miss. The goal is not just to develop software quickly, but to ensure that every release is reliable and safe. In high-consequence environments, nothing less is acceptable.

If your company is developing high-consequence software and needs help building a strategy for rigorous testing—or guidance on how to recruit, develop, and retain Quality Engineering talent—feel free to direct message me via LinkedIn.