Project Management

Disciplined Agile

This blog contains details about various aspects of PMI's Disciplined Agile (DA) tool kit, including new and upcoming topics.


Do you test your ability to respond to emergencies?

Categories: Operations

IT Operations - Mitigate Disasters

Today in Canada we tested our nationwide emergency response system. Apparently the test failed in the province of Quebec. It did in fact succeed in Ontario, where I live. Knowing about the test, I purposely kept my phone on this afternoon because I was interested in what would actually happen. Sure enough, my phone made a very annoying noise and a message came up to inform me that it was just a test. So that was good.

An important aspect of IT Operations, and Business Operations for that matter, is to be prepared to respond to emergencies. While the Canadian government is worried about responding to inclement weather, terrorist attacks, military attacks, and coffee being sold out at the local Timmies, your IT department should be concerned about ensuring that your systems are running properly, that they are repelling cyber attacks, and that your data centres are operational (to name a few potential issues). This is why the IT Operations process goal includes a process decision point called Mitigate Disasters (see the pic above).

By running this scheduled disaster simulation, after careful planning and communication (which is why I had heard about it), the Canadian government has discovered in a controlled test that their strategy needs work. This is exactly the type of thing you want to find out when you have the luxury of safely addressing any problems that you do find. The government certainly wouldn’t have wanted to discover that their emergency alert system didn’t work as expected in the middle of an actual emergency.

What your organization should ask itself is what would happen if:

  • One of your data centres lost power?  Or connectivity?
  • Some of your servers went down?
  • Outsourced services you rely on (think SaaS, PaaS, and other cloud solutions) went down?
  • An application/system went down?
  • A denial of service (DoS) attack succeeded?
  • And many more issues.

Will your IT ecosystem respond properly?  Will it recover automatically?  Are you guessing at these answers or do you know for sure because you’ve actually simulated them?
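
One way to move from guessing to knowing is to keep a simple register of disaster scenarios and when each was last simulated. The sketch below is only an illustration: the `Scenario` class, the `untested_scenarios` function, the 180-day threshold, and all of the dates are assumptions made up for this example, not part of the DA tool kit.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Scenario:
    name: str
    last_simulated: Optional[date]  # None means the scenario was never simulated


def untested_scenarios(scenarios, today, max_age_days=180):
    """Return names of scenarios never simulated, or not simulated within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        s.name
        for s in scenarios
        if s.last_simulated is None or s.last_simulated < cutoff
    ]


# Hypothetical register; the scenario names echo the questions above.
register = [
    Scenario("data centre power loss", date(2018, 3, 1)),
    Scenario("data centre connectivity loss", None),
    Scenario("cloud provider outage", date(2017, 6, 15)),
    Scenario("denial of service attack", date(2018, 4, 20)),
]

stale = untested_scenarios(register, today=date(2018, 5, 7))
print("Scenarios you are still guessing about:", stale)
```

Anything that shows up in the list is a scenario where you are relying on hope rather than evidence.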

I hope this blog has been food for thought.  Time for a Timmies.

Posted by Scott Ambler on: May 07, 2018 02:49 PM

The Lean IT Operations Mindset

The Disciplined Agile (DA) toolkit describes strategies for how an organization’s IT group can support a lean enterprise.  An important part of this is to have an effective IT operations strategy, and to do that the people involved need to have what we call a “lean IT operations mindset.”  The philosophies behind such a mindset include:

  1. Run a trustworthy IT ecosystem.  At a high level the goal is to “keep the lights on.”  At a detailed level anyone responsible for IT operations wants to run an IT ecosystem that is sufficiently secure, resilient, available, performant, usable, and environmentally friendly.  Part of running a trustworthy ecosystem is monitoring running services so as to identify and hopefully avoid potential problems before they occur.  For some systems, and perhaps for your IT ecosystem as a whole, you may have service level agreements (SLAs) in place with your end users that guarantee a minimum level of trustworthiness.
  2. Focus on the strategic (long-term) over the tactical (short-term).  Anyone responsible for IT operations needs to have a very good understanding of the trade-off between the long-term implications of a decision and its short-term conveniences.  A classic example of this right now is the preference of people building micro-services to use what they believe to be the best technologies for each service.  This makes a lot of sense from the narrow viewpoint of that service and it often proves to be incredibly convenient, and fun, for the developers because they often get to work with new technologies.  However, from an operational point of view you end up with a mishmash of technologies that must be operated and evolved over time, resulting in a potential maintenance nightmare.  Yes, you will still make some short-term decisions but you should do so intelligently.  Too great a focus on the long term results in a stagnant IT ecosystem; too great a focus on short-term decisions results in operations teams who spend all their time fighting fires.  The long-term technical vision for your organization is developed by your Enterprise Architecture efforts and the long-term business vision comes from your Product Management activities.
  3. Streamline the overall flow of work.  This arguably should be part of everyone’s mindset, but it is particularly important for people doing IT operations work.  IT operations has traditionally been a bottleneck in many organizations, often as a result of the need to run a trustworthy ecosystem and to focus on long-term considerations.  But it isn’t just operational work that we need to streamline; it is the overall flow of work into, within, and out of IT operations.  In this case we need a disciplined approach to DevOps that takes all aspects of the development-operations lifecycle into account, including the support of multiple development lifecycles (not just continuous delivery), the release management process, and the operational aspects of data management.  Of course, streamlining the flow of work goes beyond development-operations and is an important goal of your organization’s continuous improvement strategy.
  4. Help end-users succeed.  An important goal of people performing operations activities is to ensure that your end users are successfully using your IT systems.  It doesn’t matter how well your systems are built, or how trustworthy they are, if your end users are unable or unwilling to use them effectively.  End users are going to need help – you need to be prepared to provide a support function.
  5. Standardization without stagnation.  The more standardized your IT ecosystem is the easier it will be to run, to release new functionality into, and to find and fix problems if they should arise.  However, too much standardization can lead to stagnation where it becomes very difficult to evolve your ecosystem.  You will need to work very closely with people performing enterprise architecture and product management activities to ensure that you understand the long term vision and are working towards it.
  6. Regulate releases into production.   Most DevOps strategies reflect the viewpoint of a single product team.  But what about the viewpoint of your overall IT ecosystem, which may comprise hundreds of products?  An interesting question to ask is what is the WIP limit for releases across your overall ecosystem?  In other words, what rate of change can your infrastructure, and your stakeholder community, bear?  In the Disciplined Agile (DA) toolkit this philosophy is an important driver of the Release Management process blade.  Furthermore, some regulatory compliance regimes call out a separation of concerns pertaining to release management – the people building a product are not allowed to release the product into production, someone else must make that decision and do the work (even if “the work” is merely pressing a button to run a script).
  7. Sufficient documentation.  Yes, there will be some documentation maintained about your IT ecosystem.  Hopefully this documentation is concise, accurate, and high-level.  Common documentation includes overviews of your infrastructure, release procedures (even if fully automated, there’s still some overview documentation and training), and high-level views of critical aspects of your infrastructure including security, data architecture, and network architecture.  Organizations that operate in regulated industries will of course need to comply with the documentation requirements of the appropriate regulations.  When infrastructure components are discoverable and self-documenting there is a lesser need for external documentation, but there is still a need.  Any documentation that you do create should be maintained under configuration management (CM) control.
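
To make the SLA point from the first philosophy concrete, it can help to translate an availability percentage into the downtime it actually permits. The following is a minimal sketch of that arithmetic. The function names and the 30-day measurement period are assumptions for illustration; a real SLA will define its own measurement window and what counts as downtime.

```python
def allowed_downtime_minutes(availability_pct, period_days=30):
    """Minutes of downtime permitted in the period for a given availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - availability_pct) / 100


def sla_met(observed_downtime_minutes, availability_pct, period_days=30):
    """True if the observed downtime stays within the budget implied by the target."""
    budget = allowed_downtime_minutes(availability_pct, period_days)
    return observed_downtime_minutes <= budget


# Print the downtime budget for a few common availability targets.
for target in (99.0, 99.9, 99.99):
    budget = allowed_downtime_minutes(target)
    print(f"{target}% availability allows about {budget:.1f} minutes of downtime per 30 days")
```

Seeing the budget in minutes makes it easier to judge whether a proposed SLA is realistic given how long your recoveries actually take.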

Future blog postings in this series about IT operations and support will explore topics such as why you need IT operations and support, what activities you perform, and the workflow of doing such.

Posted by Scott Ambler on: June 01, 2016 10:13 AM

DevOps Strategies: Operations

Categories: Operations

DevOps Practices - Operations

There are several technical strategies that support the operational aspects of DevOps:

  1. Solution monitoring.  As the name suggests, this is the operational practice of monitoring running solutions and applications once they are in production. Technology infrastructure platforms such as operating systems, application servers, and communication services often provide monitoring capabilities that can be leveraged by monitoring tools (such as Microsoft Management Console, IBM Tivoli Monitoring, and jManage). However, for monitoring application-specific functionality, such as what user interface (UI) features are being used by given types of users, instrumentation that is compliant with your organization’s monitoring infrastructure will need to be built into the applications. Development teams need to be aware of this operational requirement or, better yet, have access to a toolkit that makes it straightforward to provide such instrumentation.
  2. Standard platforms. Software development practices, such as continuous deployment and initial architecture envisioning, are enabled by consistency within your operational infrastructure. It is much easier to deploy to a handful of standard hardware configurations than it is to a myriad of unique ones. It is easier to deploy when there are consistent versions of infrastructure software (e.g. operating systems, databases, middleware, and so on) deployed across your environment. For example, all instances of your Oracle DB are the same version; you don’t have several different versions installed in various places. Furthermore, it is much easier to make architecture decisions when there is consistency of infrastructure software packages in the first place. For example, if you standardize on Linux for your server operating system, you don’t also have Windows, z/OS, and others in production (and if you do, you’re actively retiring them).
  3. Deployment testing. After a solution, or an update to a component of your operational infrastructure, has been deployed you should run a quick set of tests to verify that the deployment was successful. Were the right versions of the files installed where they need to be? And were they deployed to all appropriate servers? Were database transformations applied successfully? Did the appropriate announcements, if any, get sent out? Did the overall deployment process run within the desired time frame?
  4. Automated deployment.  Deployments should be automated, not manual. This increases the consistency of your deployments and supports the practice of continuous deployment. Part of your automation effort should be to support both self-recovery and self-testing as native aspects of your deployment strategy.
  5. Support environments. Anyone doing solution support, even if it is the development team itself, is likely to need an environment in which they can reproduce problems that end users experience. There are several options available to you:
    • Production. In some cases your production environment is sufficient, although many regulatory regimes, particularly life-critical and financial-critical ones, will not allow this.
    • Pre-production test sandbox. Some support teams will find that they can use their pre-production test environment to try to simulate production problems. The advantage is that you don’t put production at risk when trying to reproduce problems. The disadvantage is that the test environment will be different from production and as a result you may not be able to simulate all reported problems.
    • Support sandbox. Some organizations choose to have a specific environment set up to enable support staff to simulate production problems. This strategy has the same tradeoffs as using a pre-production test sandbox plus the additional cost and maintenance associated with yet another environment.
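
The deployment testing practice above can be sketched as a small post-deployment check that compares what should be installed against what each server reports. This is an illustrative sketch only: `verify_deployment`, the server names, the artifact names, and the version strings are all hypothetical, and how you collect the reported versions depends entirely on your own tooling.

```python
def verify_deployment(expected, reported_by_server):
    """Compare expected artifact versions with what each server reports.

    expected: dict of artifact name -> version string
    reported_by_server: dict of server name -> dict of artifact name -> version
    Returns a list of problems; an empty list means the check passed.
    """
    problems = []
    for server, reported in sorted(reported_by_server.items()):
        for artifact, version in sorted(expected.items()):
            actual = reported.get(artifact)
            if actual is None:
                problems.append(f"{server}: {artifact} is missing")
            elif actual != version:
                problems.append(f"{server}: {artifact} is {actual}, expected {version}")
    return problems


# Hypothetical release manifest and per-server reports.
expected = {"billing-service": "2.4.1", "db-schema": "57"}
reported = {
    "app-01": {"billing-service": "2.4.1", "db-schema": "57"},
    "app-02": {"billing-service": "2.4.0", "db-schema": "57"},
    "app-03": {"billing-service": "2.4.1"},
}

for problem in verify_deployment(expected, reported):
    print(problem)
```

A check like this answers two of the questions above, whether the right versions were installed and whether they reached all appropriate servers. Announcements and deployment timing would need their own checks.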

In the next blog posting in this DevOps series we will explore solution support strategies.

Posted by Scott Ambler on: February 19, 2015 04:59 AM

DevOps: Operational Disaster Strategies

Burning Building

There are several disaster mitigation strategies that IT departments may choose to adopt:

  1. Disaster planning. Disciplined organizations will plan for operational disasters. Potential disasters include servers going down, network connectivity going down, power outages, failed solution deployments, failed infrastructure deployments, natural disasters such as fires and floods, terrorist attacks, and many more. This planning will include identification of potential problems, identification of strategies to address those problems, and putting mechanisms in place to hopefully mitigate the disasters. Potential strategies to address these disasters include building solutions that self-test and self-recover, building redundancies into your operational infrastructure, having disaster procedures in place, and practicing those procedures in simulated disasters.
  2. Scheduled disaster simulation. It is one thing to have disaster mitigation plans in place; it is another to know whether they actually work. Disciplined organizations will run through disaster scenarios to verify how well their mitigation strategies work in practice. For example, to test whether your power outage emergency plan works you would purposely simulate a power outage at one of your data centers and then work through your recovery plan. Like fire drills, these simulations should be done on a regular basis so that staff members build up the “body memory” required to act swiftly and appropriately in an emergency. The advantage of a scheduled disaster simulation is that you knowingly run it at a time when it will have minimal impact on your stakeholders. A disadvantage, at least when people are informed of the simulation ahead of time, is that people are mentally prepared for it and aren’t caught unaware, so you don’t simulate the real level of stress that people would be under during an actual emergency.
  3. Random disaster simulation. Very disciplined organizations will implement a service within their operational environment that causes problems such as server or service outages at random times. An example of this is Chaos Monkey, a tool that Netflix built to randomly terminate instances in its production environment on Amazon Web Services (AWS); similar functionality is being implemented within many organizations now. Chaos Monkey injects random problems into production to verify that the IT operations organization is capable of overcoming them. This is done to verify that your solutions really are able to recover automatically from problems and, failing that, that operators are at least alerted to the problem.

As you would expect, truly disciplined organizations have adopted all of these strategies.
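
The idea behind random disaster simulation can be illustrated with a toy model. The sketch below runs entirely in memory and does not touch real infrastructure or call any real chaos-engineering tool; the `Service` class, the `run_random_drill` function, and the service names are assumptions invented for this example. It picks a service at random, fails it, and then checks the two outcomes described above: either the service recovers on its own, or an alert is raised for operators.

```python
import random


class Service:
    """A simulated service that may or may not be able to recover by itself."""

    def __init__(self, name, self_recovers):
        self.name = name
        self.self_recovers = self_recovers
        self.up = True

    def fail(self):
        self.up = False

    def attempt_recovery(self):
        # A self-recovering service comes back up; any other stays down.
        if self.self_recovers:
            self.up = True
        return self.up


def run_random_drill(services, rng, alerts):
    """Fail one randomly chosen service and record an alert if it stays down."""
    victim = rng.choice(services)
    victim.fail()
    if not victim.attempt_recovery():
        alerts.append(f"ALERT: {victim.name} is down and did not recover")
    return victim


# Hypothetical service pool; a seeded generator keeps the drill repeatable.
pool = [
    Service("web-frontend", self_recovers=True),
    Service("order-api", self_recovers=True),
    Service("legacy-batch", self_recovers=False),
]
alerts = []
rng = random.Random(2015)
for _ in range(5):
    victim = run_random_drill(pool, rng, alerts)
    print(f"Injected failure into {victim.name}; up afterwards: {victim.up}")
print(alerts)
```

In a real environment the interesting result is the list of alerts: every entry is a service that needed a human, which is exactly what you want to discover before an actual emergency.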

Posted by Scott Ambler on: February 17, 2015 02:33 AM

DevOps Teaming Strategies

Categories: DevOps, Operations, Teams

DevOps - Initial vision

There are several teaming strategies that you can choose to adopt when it comes to getting development professionals and operations professionals to work together. Starting with the least effective and working our way to the most effective, they are:

  1. Production hand-off. When a development team releases a solution into production the operations team takes on the responsibility for running and supporting the solution. At this point the development team is often disbanded or moves on to another effort. A sustainment team of one or more developers may be formed to perform maintenance updates as needed over time, or the responsibility to do this work is given to an existing sustainment team.   The advantage of this approach is that your organization no longer has to fund the full development team moving forward. However, you risk losing the knowledge and expertise of the team that is required to maintain and evolve the solution over time. This can be particularly problematic when there are high-severity defects to be fixed.
  2. Warranty period. With this strategy the development team commits to fixing critical defects for a pre-defined period of time after the solution is released into production. For example, a development team may be required to fix any severity 1 or severity 2 defects free of charge for the first thirty days following a production release. Warranty periods are often combined with the production hand-off strategy to reduce the risks associated with it. Warranty periods are also common when development teams are funded via a fixed-price funding model or in outsourcing situations because the stakeholders typically want to ensure that they received the level of quality that they paid for.
  3. Production support.  In enterprise environments most application development teams are working on new releases of a solution that already exists in production. Not only will they be working on the new release, they will also have the responsibility of addressing serious production problems that are escalated to them. The development team will often be referred to as “level three support” for the application because they will be the third (and last) team to be involved with fixing critical production problems. The primary advantage is that production emergencies associated with a specific solution are often resolved by the most qualified people: the actual developers of that solution. Another advantage is that it gives developers an appreciation of the kinds of things that occur in production, providing them with learning opportunities to improve the way that they design solutions in the first place. A potentially significant disadvantage is that the need to fix production emergencies will distract the development team from working on new functionality.
  4. Developer-led operations.  This strategy turns up the dial on production support by having the development team be responsible for operating and supporting their own solution. This is often referred to as “you build it, you run it”. This strategy has the benefits that it focuses the team on ensuring that their solution is easy to operate and support and it ensures that the most qualified people are the ones evolving the solution. However, this strategy can result in Scrum teams producing silo solutions running on disparate platforms. Luckily, DAD teams are enterprise aware and include someone in the role of architecture owner who will guide the team in avoiding this very sort of architecture mistake. Another common strategy is to include someone with strong operations experience in your team.  A developer-led operations strategy also runs the risk of varying levels of support quality, as some teams will be better than others at this.  Once again, teams that are enterprise aware will be following common guidelines and will reach out to other teams for help in improving their approach.

Of the four approaches listed above, the only one that is clearly a DevOps strategy is developer-led operations. The production support strategy is definitely a step in the right direction and is often seen as sufficient in many enterprises. If this is the case in your organization we recommend that you experiment with the developer-led operations strategy on a few teams to see how well it works for you. We suspect that you’ll be pleasantly surprised.

In the next blog in this series we will explore disaster mitigation strategies.

Posted by Scott Ambler on: February 13, 2015 05:32 AM