by Danial Schwartz
In Disciplined Agile Delivery (DAD), testing is so important we do it all the way through the lifecycle. One approach that your team will need to consider performing is recovery testing, which is used to see the ability of a system to handle faults. If a fault occurs, does the system keep working and does not stop? In case of a fault can the system recover within a specified period of time? In the event of a critical failure will damage such as physical, economical, health related, etc., result or not?
Recovery testing constitutes of making the system fail; then the results of system recovery are observed. The efficiency of the system to return to normal and the time it takes to do so are examined. The disturbances which can result in failure and need to be checked vary from product to product and from industry to industry.
Consider the healthcare industry and medical devices. When products are developed for the health care industry they have to be in strict accordance with FDA guidelines. They also have to adhere to the guidelines provided by the company for which the product is being made. When recovery tests are made they naturally have to comply with these strict rules. The tests require validation and so does the environment in which they are to be carried out.
The Defense Industry consists of complex systems embedded within one another. The interlink of the systems requires recovery testing which takes into account how different systems affect one another. Since the industry has to deal with harsh environmental variables, these have to be replicated for recovery testing. Doing so is no easy task.
Cloud applications are increasing in popularity. They are part of cloud systems. The cloud systems, in turn, are made up of commodity machines. This allows taking advantage of economies of scale. But this results in needing to use complex software which makes recovery testing quite a challenge.
Before a recovery test can be carried out, the software recovery tester has to make sure that recovery analysis has been undertaken. A fail over test is designed. The fail over test serves to determine that if a given threshold is reached, can the system allocate extra resources. It also serves to show if, in case of critical failure, a system can distribute resources and continue to operate or recover within a specified time.
Consider the example of a server which is reachable but it is not responding as one would expect it to. This is the fail-over cause. The result of this, known as the possible impact, could be a crash. The severity of the impact is medium to high. To simulate this one could initiate wrong responses on the server side.
Another example of a fail-over cause is a power supply failure. If the failure was in the auxiliary power source its possible impact could be a complete shutdown. This is critical. To simulate this the system could be subjected to a change in power strength or the power cord could simply be unplugged.
A low impact severity example includes a DB overload. This could result in slow response time. It could also result in information not being fetched from the DB leading to an error. Using appropriate tools a load test could be created to simulate this scenario.
At times a service might stop posing a low to high impact severity depending on the service which stopped. There might not be any possible impact or an application might stop working. To simulate this one could stop the service manually to see the possible impact.
The tester also has to ensure that the test plan and test environment are prepared, information is backed up, the recovery panel has been provided education and a record is kept of the techniques used for recovery.
Use of resources and having to deal with unpredictable possibilities makes recovery testing a daunting task, but its benefits are worth the trouble. First, recovery testing improves the system quality. It removes risk since one knows that in case of a failure the system will continue to work. Second, recovery testing results in a staff which is educated to perform recovery failure when need arise. Third, recovery testing also fixes problems and mistakes in a system before it has to go live. Finally, recovery testing shows how important recovery is and raises awareness of the fact that long term business continuity relies heavily on recovery management.
In conclusion, recovery testing is used to see how a system behaves when failure occurs. Recovery testing can be a tedious process but shows the efficiency of a recovery plan, educates the staff on how to deal with faults and failures which occur in systems, highlights the importance of recovery at times of crisis to members of the IT and business organizations, and shows how important it is to the long term success of a business to have a recovery strategy in case of a disaster.
About the Author
Danial Schwartz is a content strategist who sheds light on various engaging and informative topics related to the health IT and Q&A industry. His belief in technology, compliance and cost reduction have opened new horizons for people in the health care industry. He is passionate about topics such as Affordable Care Act, EHR,testing, test automation, and privacy and security of data.
There are also several common operations-friendly features that developers with a Disciplined DevOps mindset will choose to build into their solutions:
Now that we have overviewed a collection of development practices and implementation features, in the next blog posting in this series we will explore strategies that streamline your operations efforts.
One of the process goals that a disciplined agile team will want to address during construction is Accelerate Value Delivery. Ideally, in each construction iteration a team will move closer to having a version of their solution that provides sufficient functionality to its stakeholders. This implies that the solution is a minimally viable product (MVP) that adds greater business value than its cost to develop and deploy. Realistically it isn’t a perfect world and sometimes a team will run into a bit of trouble resulting in an iteration where they may not have moved closer to something deployable but hopefully they’ve at least learned from their experiences.
This is an important process goal for several reasons. First, it encompasses the packaging aspects of solution development (other important development aspects are addressed by its sister goal Produce a Potentially Consumable Solution). This includes artifact/asset management options such as version control and configuration management as well as your team’s deployment strategy. Second, it provides deployment planning options, from not planning at all (yikes!) to planning late in the lifecycle to the more DevOps-friendly strategies of continuous planning and active stakeholder participation. Third, this goal covers critical validation and verification (V&V) strategies, many of which push testing and quality assurance “left in the lifecycle” so that they’re performed earlier and thereby reducing the average cost of fixing any defects.
The process goal diagram for Accelerate Value Delivery is shown below. The rounded rectangle indicates the goal, the squared rectangles indicate issues or process factors that you may need to consider, and the lists in the right hand column represent potential strategies or practices that you may choose to adopt to address those issues. The lists with an arrow to the left are ordered, indicating that in general the options at the top of the list are more preferable from an agile point of view than the options towards the bottom. The highlighted options (bolded and italicized) indicate default starting points for teams looking for a good place to start but who don’t want to invest a lot of time in process tailoring right now. Each of these practices/strategies has advantages and disadvantages, and none are perfect in all situations, which is why it is important to understand the options you have available to you.
Let’s consider each process factor:
We want to share two important observations about this goal. First, this goal, along with Explore Initial Scope, Coordinate Activities, and Identify Initial Technical Strategy seem to take the brunt of your process tailoring efforts when working at scale. It really does seem to be one of those Pareto situations where 20% addresses 80% of the work, more on this in a future blog posting. As you saw in the discussion of the process issues, the process tailoring decisions that you make regarding this goal will vary greatly based on the various scaling factors. Second, as with all process goal diagrams, the one above doesn’t provide an exhaustive list of options although it does provide a pretty good start.
We’re firm believers that a team should tailor their strategy, including their team structure, their work environment, and their process, to reflect the situation that they find themselves in. When it comes to process tailoring, process goal diagrams not only help teams to identify the issues they need to consider they also summarize potential options available to them. Agile teams with a minimal bit of process guidance such as this are in a much better situation to tailor their approach that teams that are trying to figure it out on their own. The DA toolkit provides this guidance.
Early in the lifecycle, during the Inception phase, disciplined agile teams will invest some time in initial requirements envisioning and initial architecture envisioning. One of the issues to be considered as part of requirements envisioning is to identify non-functional requirement (NFRs), also called quality of service (QoS) or simply quality requirements. The NFRs will drive many of your technical decisions that you make when envisioning your initial architectural strategy. These NFRs should be captured someone and implemented during Construction. It isn’t sufficient to simply implement the NFRs, you must also validate that you have done so appropriately. In this blog posting I overview a collection of agile strategies that you can apply to validate NFRs.
A mainstay of agile validation is the philosophy of whole team testing. The basic idea is that the team itself is responsible for validating its own work, they don’t simply write some code and then throw it over the wall to testers to validate. For organizations new to agile this means that testers sit side-by-side with developers, working together and learning from one another in a collaborative manner. Eventually people become generalizing specialists, T-skilled people, who have sufficient testing skills (and other skills).
Minimally your developers should be performing regression testing to the best of their ability, adopting a continuous integration (CI) strategy in which the regression test suite(s) are run automatically many times a day. Advanced agile teams will take a test-driven development (TDD) approach where a single test is written just before sufficient production code which fulfills that test. Regardless of when tests are written by the development team, either before or after the writing of the production code, some tests will validate functional requirements and some will validate non-functional requirements.
Whole team testing is great in theory, and it is strategy that I wholeheartedly recommend, but in some situations it proves insufficient. It is wonderful to strive to have teams with sufficient skills to get the job done, but sometimes the situation is too complex to allow that. There are some types of NFRs which require significant expertise to address properly: NFRs pertaining to security, usability, and reliability for example. To validate these types of requirements, worse yet even to identify them, requires skill and sometimes even specialized (read expensive) tooling. It would be a stretch to assume that all of your delivery teams will have this expertise and access to these tools.
Recognizing that whole team testing may not sufficiently address validating NFRs many organizations will supplement their whole team testing efforts with parallel independent testing . With this approach a delivery team makes their working builds available to a test team on a regular basis, minimally at the end of each iteration, and the testers perform the types of testing on it that the delivery team is either unable or unlikely to perform. Knowing that some classes of NFRs may be missed by the team, independent test teams will look for those types of defects. They will also perform pre-production system integration testing and exploratory testing to name a few. Parallel independent testing is also common in regulatory compliance environments.
From a verification point of view some agile teams will perform either formal or informal reviews. Experienced agilists prefer to avoid reviews due to their inherently long feedback cycle, which increases the average cost of addressing found defects, in favor of non-solo development strategies such as pair programming and modeling with others. The challenge with non-solo strategies is that managers unfamiliar with agile techniques, or perhaps the real problem is that they’re still overly influenced by disproved traditional theories of yesteryear, believe that non-solo strategies reduce team productivity. When done right non-solo strategies increase overall productivity, but the political battle required to convince management to allow your team to succeed often isn’t worth the trouble.
Another strategy for validating NFRs code analysis, both dynamic and static. There is a range of analysis tools available to you that can address NFR types such as security, performance, and more. These tools will not only identify potential problems with your code many of them will also provide summaries of what they found, metrics that you can leverage in your automated project dashboards. This strategy of leveraging tool-generated metrics such as this is a technique which IBM calls Development Intelligence and is highly suggested as an enabler of agile governance in DAD. Disciplined agile teams will include invocation of code analysis tools from you CI scripts to support continuous validation throughout the lifecycle.
Your least effective validation option is end-of-lifecycle testing, in the traditional development world this would be referred to as a testing phase. The problem with this strategy is that you in effect push significant risk, and significant costs, to the end of the lifecycle. It has been known for several decades know that the average cost of fixing defects rises the longer it takes you to identify them, motivating you to adopt the more agile forms of testing that I described earlier. Having said that I still run into organizations in the process of adopting agile techniques that haven’t really made embraced agile, as a result still leave most of their testing effort to the least effective time to do such work. If you find yourself in that situation you will need to validate NFRs in addition to functional requirements.
To summarize, you have many options for validating NFRs on agile delivery teams. The secret is to pick the right one(s) for the situation that you find yourself in. The DA toolkit helps to guide you through these important process decisions, describing your options and the trade-offs associated with each one.