Does your Disaster Recovery Plan Include the Cloud?

In years past, companies have relied on multiple data center locations to host their main disaster recovery (DR) systems and data. This has generally worked well for companies that have planned and tested their DR systems and plans appropriately.

In recent years, organizations have been looking for more robust disaster recovery solutions than storing their data in separate data centers. With the growth in popularity, functionality and capabilities of cloud technology and cloud vendors, CIOs and IT managers began to investigate public, private and hybrid cloud systems for disaster recovery.

It’s taken a while for many companies to feel comfortable with the cloud as an integral part of their business systems, but most CIOs and IT professionals have come to terms with the capabilities and impact of cloud technology. While secondary sites still dominate disaster recovery planning, cloud deployment of disaster recovery solutions continues to grow. With a cloud DR deployment, companies can ensure geographic diversity for their data. They can also use multiple cloud vendors to ensure diversity of networks and systems, which makes for a very robust disaster recovery plan.

Cloud-based disaster recovery makes a lot of sense, but there are still plenty of people worried about moving to the cloud for their DR. Many people get hung up on a few old myths (e.g., downtime doesn’t cost that much, disaster recovery means long-term contracts, etc.) that keep them from moving their disaster recovery systems and plans to the cloud, while others believe their on-premises DR systems and plans will work just fine.

Cloud-based DR can provide an enormous amount of value to an organization. In the event of a disaster, a cloud-based system can help a company recover quickly and efficiently. Not only can data be stored safely and reliably in the cloud but systems and applications can be replicated in the cloud to allow the organization to bring their systems online quickly after a disaster.

Many clients that I work with have cloud-based disaster recovery systems in place or they’ve put them on their roadmap for the coming years. They’ve been able to look past the myths about the cloud and cloud-based DR and see the value. They see the benefits of the cloud for disaster recovery and have started shifting their disaster recovery planning and budget initiatives to the cloud.

From my experience talking with CIOs and other IT leaders, there’s quite a lot of interest in cloud technology these days. Many companies are looking at cloud-based disaster recovery for their next iteration of disaster recovery. Thankfully, people are starting to move past the concerns and myths about the cloud and are seeing it for what it is: a great platform for building agile, flexible and cost-effective solutions for their business.

What about your organization? Does your disaster recovery plan include the cloud?

This post is brought to you by the VMware vCloud Air Network.

Is Your SaaS Data as Protected as You Think?

Originally published on Its All About Recovery as Is Your SaaS Data as Protected as You Think?

I pride myself on the backup and recovery process I use for my home. I’m also pretty proud to call myself a geek.

I have multiple computers at home with each having different functions, but all have a backup system in place that stores backups on the local machine as well as on a NAS. The NAS is then backed up to two separate off-site storage locations to ensure redundancy.

Additionally, I have another backup/recovery process that I use on my off-site Web server to ensure all of my Web-based data and content is backed up to the cloud.

While this is overkill for most people, it’s necessary to ensure that the data my wife and I use for our businesses is safely stored and ready for recovery in case disaster strikes. For every conceivable scenario that might hit a computer (or computers) my wife and I use, I have a disaster recovery scenario ready to go.

Or so I thought.

Now, for the sake of my pride, I have to say that the disaster occurred in one of my Google Mail accounts. I didn’t have a backup solution that covered my Google Mail account since Google provides a real-time backup of mail data.

The problem arose this past weekend. A large portion of emails had been deleted. They weren’t in the trash folder, nor were they filed away somewhere. They were just gone.

I contacted Google for help in restoring the data. Their response: “Sorry . . . the data is deleted . . . our backup solution doesn’t recover deleted data.”

My data is gone, and there’s nothing that Google can do about it.

I’ve spent a considerable amount of time building backup and recovery processes and solutions to protect my data. I went above and beyond what most folks do for their own home and felt I had all my bases covered — and I do, when it comes to the data that I have full control over.

My backup processes fell short when it came to dealing with SaaS-type applications like Google Mail. I was working under an assumption that the SaaS data was being backed up in the manner that I needed it to be.

I’ve got some work to do to ensure my SaaS data is as secure as my “local” data. I’m going to be looking for a solution that ensures my SaaS data is backed up just as securely as all my other data.

Are you using SaaS applications? If so, have you done your homework to make sure your SaaS data is as protected via backup as your local data? Does your current backup and recovery solution provide functionality to back up both your local data and your SaaS data?

Image Credit: Sunset over Dallas

The Tale of the Tape

Originally published on Its All About Recovery as The Tale of the Tape.

Back in my younger days, I worked in IT operations. One of the projects that consumed about a year of my life was a backup and recovery project.

The project’s scope was limited but important. We were tasked with selecting a new backup and recovery platform and building a new process that would take advantage of the new system.

The project was initially planned to be a fairly short selection and implementation project, with four months estimated for the selection process and six months for the implementation, testing and rollout.

After three months of review, testing and demos, we selected a solution. It was one of the top-tier backup and recovery solutions at the time. The implementation phase of the project went well with no real hiccups.

As part of the selection and implementation project, a new backup/recovery process was created for the organization. This new process was built around the new platform and was created from scratch to allow the organization to take full advantage of the features and functionality of the new solution.

We built a fairly robust process using on-site backup and off-site storage using tapes.

The solution that we chose would automatically build multiple tape copies of backups to be used for off-site storage. For robustness, we used three off-site backup locations, which required three separate tape copies of the backup volumes. Our off-site storage was geographically distributed to ensure safety, with one location being close enough to the data center to get same-day delivery if needed.

As part of the backup process, before shipping tapes off, we tested each tape to ensure it contained a backup and could be read from. During this testing process, the tapes were inserted into a computer provided as part of the new backup platform. This computer had one task – to check tapes for data and to ensure no corruption.
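The vendor's integrity check isn't described here in any detail, so the following is only a rough, file-based sketch of the general idea: confirm that a backup copy exists, contains data, and reads back identically to the original. The function names `sha256_of` and `verify_copy` are made up for illustration, and comparing SHA-256 digests is an assumption on my part, not the method the tape platform used.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source: Path, copy: Path) -> bool:
    """True only when the copy exists, is non-empty, and matches the source."""
    if not copy.is_file() or copy.stat().st_size == 0:
        return False
    return sha256_of(source) == sha256_of(copy)
```

A check like this only tells you that the copy can be read back on the device doing the reading. As the rest of this story shows, a failed check can point at the reader just as easily as at the media.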

The new process was built. The system was implemented and rolled out. We were all happy . . . until we weren’t.

Crystal Bedell recently wrote a piece titled Tape vs. Cloud: Reducing the Risk of Data Loss During Backup and Recovery. In that post she wrote:

Despite all your best efforts, there are no guarantees that data backed up to tape will be there when you need it most. It’s not uncommon for backup data to become corrupted due to operational error or mishandling of the tape. It’s also possible to accidentally overwrite critical business data by inserting or partially formatting the wrong tape.

How true. And that’s exactly what happened to my organization about three months after going live.

One day during the integrity checking process of a new batch of tapes, we started seeing quite a few errors on tapes. Not just on one tape, but on multiple tapes. The first time the issue occurred, we thought it was a glitch and asked for support from the vendor. The glitch was highlighted as a known bug and fixed by the vendor.

The next week, we saw issues again. And then again. Our vendor provided updates and fixes for the integrity checking process, but we still saw issues throughout the next few weeks.

This integrity issue drove us crazy. We spent a little over a month working through the issue with little luck. Our backups were still running and our on-site storage was working, but our tape backups weren’t passing the integrity tests so we couldn’t rely on them.

Then one day we received a call from our vendor. They had shipped us a new tape drive to use with the integrity checking process. They wouldn’t really provide us with more information about why we needed to replace the drive other than to say the change was required.

When we received the new drive, we noticed it didn’t look much different from the original. It was the same size as the original and tapes were loaded in the same manner. After we installed the new drive, we started getting tapes to pass the integrity test. In fact, in the new drive, every tape passed the integrity test every time.

We were happy to see the issue resolved but I still wondered why we had seen the issues, so I asked to have the original drive added to another machine for testing. I grabbed a few tapes to test and planned to spend a few hours getting to the bottom of what the problem had been.

As it turns out, it didn’t take me a few hours to find the problem. I inserted the first tape and ran the integrity test and it failed. I then took that tape to the new drive and ran the tests and found that it passed. Then I took the tape back to the original drive and it failed. But now I realized why it failed.

The failure of the integrity checking process wasn’t due to the tape or the backup process. It was due to a simple defect in the tape drive itself. The original drive had a small (but very important) flaw. The flaw? The drive didn’t actually force the tape into a “seated” position. In other words, the tape could never be fully inserted into the drive for the data to be read.

After three months of headaches and many hours spent trying to solve the problem, a simple defect was found to be the culprit. A simple plastic mount caused the tapes to not seat fully in the drive, thereby causing the data to fail integrity checks.

Crystal finishes off her post with this:

Backup and recovery is a necessity, but the headaches and risk associated with tape are not.

Talk about headaches. I had a lot of them from that tape drive fiasco. While there will always be headaches in IT, reducing the number of moving parts in the backup and recovery process can help eliminate many of them.

Moving to the cloud can help reduce those moving parts. The cloud isn’t a perfect solution, but it’s a solution that can help reduce some risks (and headaches) with your backup and recovery solution.

Image Credit: Tape Backups on flickr

Is Your Data Really Protected?

Originally published on Its All About Recovery as Is Your Data Really Protected?

You’ve spent a great deal of time and money getting your SaaS application “live.” You went with SaaS as a way to take advantage of the cloud and to offload some of the operational issues around your applications. Your SaaS application is built on a private cloud that your organization manages.

You’ve trained your users. You’ve made certain that the right governance model is in place to ensure everything is covered for the future.

Three weeks after you begin using your SaaS application, you run across a small problem. You start hearing about some data corruption issues from users. The data loss isn’t major and can easily be repaired by having users make a few minor data edits.

After a few more days, you’re still hearing about data corruption issues. You start to worry that you’ve got a major issue with your application—or perhaps a data security issue. The workflow of the application is analyzed to understand if there are any pieces of the process that might introduce corrupt data, but you find nothing out of the ordinary. Your team dives into access and security records to see if you’ve had a security breach of the SaaS system, but nothing points to nefarious activity.

Your team continues to use the system and continues to see small areas of data corruption. Again, it’s nothing major, but the corruption is there. After more digging, it appears that the corruption issues arise only at certain times of the day when the number of users reaches a very high level.

You reach out to your SaaS application vendor and ask them to take a look at the issues. They determine that there are some configuration issues that prevent the application from running correctly when a high user load exists.

Your vendor describes the necessary steps that your team needs to take to reconfigure the system. Over the weekend, your operations teams make the necessary changes to the system to meet the vendor’s requirements after running a full backup of the current data.

During these changes, the application’s database had to be reset to rebuild the tables to meet new requirements. Your team gets to the last few steps of reconfiguration and realizes a major issue.

When the SaaS database was initially configured, it was spread across multiple servers. The backup procedure required each server to have a separate backup process that stored backup data into a single location, which would then be stored off-site in the cloud.

The problem? The backup process didn’t take into account the complex nature of your SaaS database. Each backup agent stored the backup data for each server as a single file within the backup location. When it came time to do the necessary restore, the recovery process failed because the backup system didn’t store the right metadata required to restore the data to the appropriate location.

This wasn’t an issue during the initial test phase. During testing, the single server test system passed the backup/recovery process with flying colors. But the production system wasn’t tested.

All isn’t lost; the data can be restored. The downside to the recovery process is that it takes a great deal of time to recover using a recovery process that must be strictly followed. The downtime for this SaaS application grows from a few hours to a few days of recovery time. You are quite upset. Your team is quite upset. The organization is extremely upset.

The reason for going to the cloud was the robust nature of the systems and increased uptime (along with other benefits). But the lack of a well-planned backup process has immediately discounted the benefits of going with the cloud.

The above issues could have been mitigated by more robust testing, to ensure the backup/recovery process worked before going live. That said, this type of thing happens more often than most folks like to admit.
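One way to catch the missing-metadata problem described earlier is to have each backup run write a manifest that records, for every server, which file was written and where it must be restored, and then to check that manifest against the list of production servers before anyone resets a database. The sketch below is an illustration only: `BackupEntry`, `write_manifest`, `missing_servers` and the field names are hypothetical, and a real backup product would track this in its own catalog format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BackupEntry:
    server: str          # hypothetical host name of one database server
    backup_file: str     # file written by that server's backup agent
    restore_target: str  # location the data must be restored to
    sha256: str          # checksum of the backup file


def write_manifest(entries: list[BackupEntry]) -> str:
    """Serialize the per-server restore metadata as JSON."""
    return json.dumps([asdict(entry) for entry in entries], indent=2, sort_keys=True)


def missing_servers(manifest_json: str, expected: set[str]) -> set[str]:
    """Return the expected servers that have no entry in the manifest."""
    recorded = {item["server"] for item in json.loads(manifest_json)}
    return expected - recorded
```

If `missing_servers` returns anything other than an empty set for the production server list, the restore would be incomplete, and that is far better to learn in a test run than in the middle of a weekend reconfiguration.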

Have you considered all of the points of failure in your applications and systems? Do you know if your backup/recovery process will actually work the way it’s designed to work when it needs to work?
