Your Database Is Toast: Get Out the Butter and Jelly

From the Project Management in Real Life Blog
by Drake Settsu
Sharing my Project Management adventures and some tips. I try to keep my articles brief and to the point. Project Management is an Art, Science, and Discipline.

It's Aloha Friday morning, I'm on my way to work, and I get a call from the Data Center that the organization's mission-critical system is acting strange. Users are unable to access the system. Moments later I get a text page from the system saying that a process that needs to run is gone. I had scripts monitoring the system, so I knew something bad had happened.
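
A monitoring check of this kind can be quite small. The sketch below is only an illustration and not the original script: the helper names `running_process_names` and `missing_processes` are made up for the example, and it assumes a Unix-like host where `ps -eo comm=` lists the names of running commands.

```python
import subprocess


def running_process_names():
    """Return the set of command names currently running (assumes a Unix-like ps)."""
    output = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    ).stdout
    return {line.strip() for line in output.splitlines() if line.strip()}


def missing_processes(required, running):
    """Return the required process names that are absent from the running set, sorted."""
    return sorted(set(required) - set(running))
```

Run from a scheduler every few minutes, something like `missing_processes(["db_listener"], running_process_names())` (with `db_listener` standing in for whatever must always be running) would return a non-empty list when a required process has disappeared, and that list is what would trigger the page.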

I log on to the system and it's not normal: the process that needs to be running is missing. I speak with the Data Center staff and determine that an operator error caused the problem. The production database was an interim version, in place only until the migration to a new permanent database later in the month. It had known bugs, and we had just stepped on them by accident.

I call an emergency meeting with management to brief them that the database has been corrupted. I contact the application vendor to check the database integrity, and I advise management to implement downtime procedures. Hours go by as we work to fix the database, with no luck. The database cannot be repaired. My only option now is the last system backup. I ask the Data Center Manager to bring me the backup tapes. I'm holding in my hands the tapes of the organization's mission-critical system. I need a successful restore to minimize the window of data loss.

I call another meeting to break the bad news to management and staff. It's going to be a long night. We need to inform our users that the system will be down for an extended time and that we will give periodic updates on the recovery progress. Key personnel need to be on standby and available when called.

The restore process is initiated. Restoring a large system takes hours, and patience while waiting for it to complete. Once the restore completes, we need to run the integrity checker utilities to make sure no corruption exists in the restored database. The system is validated and released back to the users, with some minimal data loss between the time of the backup and the time of the database corruption. I make some minor tweaks to the system. It's back online for everyone to use.
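
The restore-then-verify sequence can be sketched in a few lines. This is an assumption-heavy illustration, not the procedure used that night: the database product involved is not named here, so SQLite stands in for it, the `orders` table is invented, and `restore_and_verify` is a hypothetical helper. It uses Python's `sqlite3` backup API to copy the backup into a fresh database and SQLite's `PRAGMA integrity_check` as the integrity checker.

```python
import sqlite3


def restore_and_verify(backup_conn, restored_conn):
    """Copy a backup into the target database, then run SQLite's integrity check."""
    backup_conn.backup(restored_conn)  # restore step: copy the backup into the target
    rows = restored_conn.execute("PRAGMA integrity_check").fetchall()
    return [row[0] for row in rows]  # a healthy database yields ["ok"]


# Stand-in for the last good backup (in-memory so the example is self-contained).
backup = sqlite3.connect(":memory:")
backup.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
backup.execute("INSERT INTO orders (item) VALUES ('widget')")
backup.commit()

restored = sqlite3.connect(":memory:")
result = restore_and_verify(backup, restored)
row_count = restored.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

if result == ["ok"]:
    print("restore verified, rows:", row_count)
else:
    print("corruption found, do not release:", result)
```

The point of the design is the ordering: the system is released to users only after the check on the restored copy comes back clean.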

I got into the office at 8 a.m. Friday and it's now 8 a.m. Saturday. I just experienced a System Administrator's nightmare. When you are coordinating a major Disaster Recovery effort, it's all about teamwork. You need to remain calm as you become the point person who knows everything that's going on. All eyes are on you. My Project Management experience definitely came into play on this recovery effort, which had to be put together on the fly. You really find out what you are made of when you go through an experience like a major system outage. It's a really good feeling seeing everyone work together as a team.

(Note - this article was originally written by Drake Settsu and published on DrakeSettsu.BlogSpot.com in February 2015)

 

* This incident happened a long time ago and I will never forget it. Technology has come a long way since that disaster. Today's technology can provide a faster recovery from a major disaster. 

Posted on: January 31, 2018 06:21 AM

Comments (10)

Great reminders, Drake! I had a few such horror stories from much earlier in my career and the variety of tape media I was using was pretty diverse. I still remember learning how to use one particular type of tape media where if you didn't unmount it properly, the tape itself would remain in the drive and you'd remove an empty cartridge! I also remember the horror of restoring a backup only to find that it was only an incremental backup!

Lessons learned & risk management go hand in hand!

Kiron

Thanks for sharing this incident, Drake. I had a very similar experience with my database project several years ago. It was a nightmare and I will never forget it.

Good story, Drake. Reminds me of some nightmare projects.

I think everyone who's been in IT as long as we have has a few hair-raising events. I remember the time we installed an IBM PC XT for our director general. The very next day, he called us to say it didn't work anymore.

We found out he had wanted to format a diskette. From the DOS prompt, he simply typed format, then Return. (That's what the Enter key was called back then.) That was back when the format command did not have an "Are you sure (y/n)?" prompt before proceeding. And, of course, when you didn't specify A: or B:, it blithely defaulted to C:, your hard drive.

The best part was trying to explain that to the DG without saying "it's your fault!"

Thank you Kiron, Eduin, Anish, Rami, Sante & Stephane for your comments.

Hi,

Such experiences are really hair-raising. And trust me, everyone has a similar experience of some kind at least once in their life.

But looking at it from a positive angle, it does make us a bit more cautious and better prepared in the future.

I once updated a LIVE database with the wrong parameters by mistake. This was because I forgot to copy the WHERE clause in the query.

It was such a big mess that I still cannot forget it. My PM helped me a lot at that time to get the data back, and I still truly respect him for that help.

Otherwise it would have caused a million-dollar mess.

I just remembered it after reading about your incident.

Thanks for sharing your unforgettable incident Rajesh.

How we respond to unforgettable incidents is what defines us.
