Program management question

Information Technology

shoba L India

Please provide an answer to the question below.

You are operating a mission-critical system on behalf of one of your customers. You are contractually committed to availability of over 99.999%. What methods and procedures will you install to ensure the needed system availability? How will you act when you find out the system is down unexpectedly?

Posted: Sep 29, 2019 2:22 AM

James Shields IS Director - Portfolio Solutions| City and County of San Francisco, SFPD San Francisco, Ca, United States

The question you pose seems to be operational, not project.

Regardless, the answer on a high-availability requirement is always rooted in a solution that has redundancy, backup & fail-over.
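To put numbers on why redundancy comes up first, here is a rough sketch of the arithmetic behind a 99.999% target. The function names are made up for illustration, and the redundancy formula assumes node failures are independent and that fail-over is instantaneous, which real systems only approximate.

```python
# Illustrative arithmetic only; all names here are hypothetical.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_budget_minutes(availability: float) -> float:
    """Allowed downtime per year for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(node_availability: float, nodes: int) -> float:
    """Availability of N redundant nodes where any single node can serve.

    Assumes independent failures and perfect fail-over (an idealization).
    """
    return 1.0 - (1.0 - node_availability) ** nodes


def nodes_needed(node_availability: float, target: float) -> int:
    """Smallest number of redundant nodes that meets the target."""
    if not 0.0 < node_availability < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("availabilities must be strictly between 0 and 1")
    nodes = 1
    while parallel_availability(node_availability, nodes) < target:
        nodes += 1
    return nodes


if __name__ == "__main__":
    print(f"{downtime_budget_minutes(0.99999):.2f} minutes of downtime per year")
    print(nodes_needed(0.999, 0.99999), "nodes at 99.9% each")
    print(nodes_needed(0.99, 0.99999), "nodes at 99% each")
```

Five nines leaves roughly five minutes of downtime per year, which is why a single node, however reliable, is rarely enough on its own.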

Posted: Oct 1, 2019 2:12 PM

Stéphane Parent Self Employed / Semi-retired| Leader Maker Prince Edward Island, Canada

What methods and procedures will you install to ensure the needed system availability? James answered that question. It will be expensive, but you will need processes to monitor and adjust the environment preemptively, rather than waiting for something to happen. For example, you may need to re-allocate CPUs, memory or disk space to allow for year-end processes.

How will you act when you find out the system is down unexpectedly? I will follow the process that will have been defined for such situations. You should have the process documented and tested properly.

Posted: Oct 1, 2019 2:35 PM

Sergio Luis Conte Helping to create solutions for everyone| Worldwide based Organizations Buenos Aires, Argentina

This is not about project management; it is about operations management. The procedures you have to implement are well known in the quality framework. Mainly, take a look at the non-functional attributes or requirements of the product. You can find a good guide in Barry Boehm's NFR classification. In addition, other disciplines like ITIL will help you a lot.

Posted: Oct 2, 2019 7:36 AM (Updated by author: Oct 2, 2019 7:37 AM)

Karl Twort Senior Project Manager| Fresh Egg United Kingdom

As others have identified, this is operational, not project, however:

Monitoring, Response Process, Back-up(s), Fail Over(s), Documentation, Lessons

Monitoring - ensuring that your team are alerted at the very earliest opportunity is key to the response plan being initiated within the contracted SLAs. With this level of committed uptime, your infrastructure must be overpowered, rugged and resolute. Your monitoring will then buy you the time to address any early signs that could build to a critical outage.
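As a minimal sketch of the early-warning idea, the class below keeps a sliding window of health-probe results and escalates from "ok" to "warning" to "critical" as failures accumulate. The class name, window size and thresholds are all assumptions for illustration; a real setup would rely on a proper monitoring product and tuned thresholds.

```python
from collections import deque


class HealthMonitor:
    """Hypothetical sliding-window monitor over recent probe results."""

    def __init__(self, window: int = 10, warn_after: int = 2, critical_after: int = 5):
        if not 0 < warn_after <= critical_after <= window:
            raise ValueError("need 0 < warn_after <= critical_after <= window")
        # Only the most recent `window` probe results are kept.
        self.results = deque(maxlen=window)
        self.warn_after = warn_after
        self.critical_after = critical_after

    def record(self, probe_ok: bool) -> str:
        """Store one probe result and return the current status."""
        self.results.append(probe_ok)
        failures = self.results.count(False)
        if failures >= self.critical_after:
            return "critical"
        if failures >= self.warn_after:
            return "warning"
        return "ok"


if __name__ == "__main__":
    monitor = HealthMonitor(window=5, warn_after=2, critical_after=4)
    for probe_ok in (True, False, False, False, False):
        print(monitor.record(probe_ok))
```

The "warning" state is the time Karl's monitoring buys: the team, or an automated process, can act on it before the failure count reaches "critical".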

Response Process - Documented, practiced, understood. If the team knows how to respond when a critical issue is identified, they are one step ahead. Problems happen; it's how the team are trained to handle them that will keep you calm and ready to get back to a stable environment.

Backups - multiple backups in multiple locations sound like a solution here. This gives you resilience in the event of a location-based outage. Mirroring to off-site backup locations mitigates this risk.

Fail Overs - Automate your failovers. This can be actioned on early warning signs, meaning that a failure of the primary system may even go unnoticed by the client if you have already monitored, predicted and switched to a mirror system before the primary system fails.
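One way to picture an automated fail-over that acts on early warning signs is sketched below. It is a toy model, not a production design: `Node`, `FailoverController` and the status strings "ok", "warning" and "critical" are assumed names, and the status is taken to come from whatever monitoring is in place for the active node.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a primary or mirror system."""
    name: str
    healthy: bool = True


class FailoverController:
    """Toy controller: switch to the standby before the active node fails."""

    def __init__(self, primary: Node, mirror: Node):
        self.active = primary
        self.standby = mirror
        self.switch_log = []  # (from_name, to_name, status) per switch

    def on_status(self, status: str) -> str:
        """React to the active node's status; return the serving node's name."""
        if status in ("warning", "critical") and self.standby.healthy:
            demoted = self.active
            # Mark the demoted node unhealthy so it is not switched back to
            # until someone has repaired it and reset the flag.
            demoted.healthy = False
            self.active, self.standby = self.standby, demoted
            self.switch_log.append((demoted.name, self.active.name, status))
        return self.active.name


if __name__ == "__main__":
    controller = FailoverController(Node("primary"), Node("mirror"))
    for status in ("ok", "warning", "critical"):
        print(status, "->", controller.on_status(status))
```

Switching on "warning" rather than waiting for "critical" is what lets the change go unnoticed by the client. The switch log also feeds the documentation and lessons steps afterwards.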

Documentation - Not only for the process of how to react but to ensure you document the issue, its causes, the solutions and next steps.

Lessons - Making sure lessons, good and bad, are taken away from an issue is critical to future success. Issues, whilst distracting and sometimes costly, are also learning opportunities which can be a positive move forward for not only the immediate, but future projects too.

Obviously, the above is a very high-level list, but certainly things that should be considered from the outset.

Karl

Posted: Oct 9, 2019 5:24 AM
