Avoiding a data disaster: could your business recover from human error?
“Data.” Ask senior management at any major organization to name their most critical business asset and they’ll likely respond with that one word.
As such, developing a disaster recovery strategy – both for data backup and restoration – is a central part of planning for business continuity management at any organization. It is essential that your company and the vendors you work with can protect against data loss and ensure data integrity in the event of catastrophic failure – whether from an external event or human error.
Think about this: What would you do if one of your trusted database administrators made a mistake that wiped out all of your databases in one fell swoop? Could your business recover?
Backing up data at an off-site data center has long been a best practice, and that strategy relates more to the disaster recovery (DR) component of business continuity management (BCM). DR and BCM go hand in hand, but there is a difference: BCM is about making sure the enterprise can resume business quickly after a disaster, while DR falls within the continuity plan and specifically addresses protecting the IT infrastructure – including systems and databases – that organizations need to operate.
While replicating data off-site is smart, it doesn’t fully address human error, which can be an even greater risk for businesses than a major external catastrophe. The human error factor is why a two-pronged approach to disaster recovery makes sense. Backing up customer data off-site means it is protected from a major uncontrollable event like a natural disaster. But a local strategy is also essential to ensure there are well-trained people, defined processes and the right technology in place to reduce the risk of human error.
Think automation and consider the cloud
Automating backups (making a copy of the data), replication (copying and then moving data to another location), and off-site verification and restoration processes is the most effective way to address the risk of human error.
Storage replication mirrors your most important data sets between your primary and DR site or service. Most, if not all, mainstream storage vendors provide this functionality out of the box or for a license fee. The replication solution should support scheduled replication events, mirroring of data sets in line with your recovery point objective (RPO), and archival services that allow systems administrators to set up policies that match your business continuity objectives (e.g., six months of off-site monthly archives). And, for added protection, consider FIPS-certified encryption solutions at the disk or controller level, which protect your most critical and sensitive data against accidental exposure by encrypting it at rest.
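To make that kind of policy concrete, here is a minimal sketch in Python of a scheduled compliance check; the RPO, retention target, and timestamps are hypothetical values invented for illustration, not any vendor's API.

    from datetime import datetime, timedelta, timezone

    # Hypothetical policy mirroring the objectives above: replicate within a
    # four-hour RPO and keep six months of off-site monthly archives.
    POLICY = {"rpo": timedelta(hours=4), "monthly_archives_to_keep": 6}

    def check_replication(last_replicated_at, archive_dates):
        """Return a list of policy violations for one replicated data set."""
        now = datetime.now(timezone.utc)
        violations = []
        age = now - last_replicated_at
        if age > POLICY["rpo"]:
            violations.append(f"RPO exceeded: last replication was {age} ago")
        cutoff = now - timedelta(days=30 * POLICY["monthly_archives_to_keep"])
        recent = [d for d in archive_dates if d >= cutoff]
        if len(recent) < POLICY["monthly_archives_to_keep"]:
            violations.append(f"Archive gap: only {len(recent)} monthly archives on record")
        return violations

    # Example: replicated an hour ago, but only five monthly archives retained.
    now = datetime.now(timezone.utc)
    print(check_replication(now - timedelta(hours=1),
                            [now - timedelta(days=30 * i) for i in range(5)]))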
You can also leverage WAN acceleration technologies to speed up your off-site replication and/or backups by maximizing the efficiency of your replication or backup streams, saving costs in both bandwidth and the time it takes to replicate changes off-site. Used in combination with storage replication, this makes for a very secure and resilient architectural approach to data protection and, in some cases, can help lower recurring expenses.
Another option, if storage replication is not available, is to leverage your persistent storage solutions (RDBMS or NoSQL) to replicate changes in real time, as most best-of-breed technologies come with data replication and backup services by default. Spending time up front to understand which solution is most effective from a cost and execution standpoint is advisable, as there are bound to be differences driven by compliance requirements.
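As one illustration of monitoring database-level replication, here is a minimal sketch assuming a PostgreSQL primary and the psycopg2 driver; the connection string and lag threshold are placeholders to adapt to your own environment.

    from datetime import timedelta

    import psycopg2  # third-party driver: pip install psycopg2-binary

    MAX_REPLAY_LAG = timedelta(seconds=30)  # hypothetical threshold tied to your RPO

    # Connection details below are placeholders for your own primary database.
    conn = psycopg2.connect("host=primary.example.internal dbname=postgres user=monitor")
    with conn, conn.cursor() as cur:
        # pg_stat_replication reports one row per connected standby (PostgreSQL 10+).
        cur.execute("SELECT application_name, state, replay_lag FROM pg_stat_replication;")
        for name, state, replay_lag in cur.fetchall():
            lagging = replay_lag is not None and replay_lag > MAX_REPLAY_LAG
            if state != "streaming" or lagging:
                print(f"ALERT: standby {name} is {state}, replay lag {replay_lag}")
            else:
                print(f"OK: standby {name} streaming, replay lag {replay_lag}")
    conn.close()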
In addition, investing in automation tools and services can greatly improve your response to an unplanned disaster, but it requires a solid foundation of configuration management standards to successfully deploy and validate your configuration items (network hardware, storage appliances, server technology, etc.). A dedicated team of DevOps resources can be most effective in this area as Infrastructure as Code continues to gain widespread adoption. Imagine for a moment that instead of troubleshooting failures, you could simply re-provision to a previously certified configuration. Not only would you be proving your ability to respond in the face of a disaster, but you may also benefit from automating your infrastructure builds, where applicable, freeing up valuable time and resources for other important work.
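To make the re-provisioning idea concrete, here is a minimal sketch assuming AWS and the boto3 SDK; the AMI ID, instance type, and tag are hypothetical stand-ins for a previously certified configuration you would maintain yourself.

    import boto3  # AWS SDK for Python: pip install boto3

    # Hypothetical identifiers for a certified, pre-approved build.
    CERTIFIED_AMI = "ami-0123456789abcdef0"  # placeholder image ID
    INSTANCE_TYPE = "m5.large"               # placeholder size

    def reprovision(count=1):
        """Instead of troubleshooting a failed server, launch fresh instances
        from the certified image and return their IDs."""
        ec2 = boto3.client("ec2")
        resp = ec2.run_instances(
            ImageId=CERTIFIED_AMI,
            InstanceType=INSTANCE_TYPE,
            MinCount=count,
            MaxCount=count,
            TagSpecifications=[{
                "ResourceType": "instance",
                "Tags": [{"Key": "provisioned-by", "Value": "dr-automation"}],
            }],
        )
        return [i["InstanceId"] for i in resp["Instances"]]

    if __name__ == "__main__":
        print(reprovision())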
If you have the right automation in place, with an expected input and an expected output verified through repeatable processes, you mitigate the risk that an engineer or a database administrator will inadvertently push the wrong button and create a data disaster.
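A minimal sketch of that kind of verification, assuming each backup ships with a simple manifest of SHA-256 checksums (the manifest format here is invented for illustration): refuse to restore unless every file matches its expected digest.

    import hashlib
    import json
    from pathlib import Path

    def sha256(path):
        """Compute the SHA-256 digest of a file without loading it all into memory."""
        digest = hashlib.sha256()
        with path.open("rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def verify_backup(backup_dir):
        """Compare every file against the checksums recorded when the backup was taken."""
        manifest = json.loads((backup_dir / "manifest.json").read_text())
        ok = True
        for name, expected in manifest.items():
            actual = sha256(backup_dir / name)
            if actual != expected:
                print(f"MISMATCH: {name} expected {expected}, got {actual}")
                ok = False
        return ok

    # Example: run verify_backup(Path("/backups/2024-01-31")) before any restore step.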
The traditional approach is to invest in server, network and storage hardware, and co-location. But you should also consider the major cloud services – Amazon’s AWS, Microsoft Azure or Google Cloud Platform – that allow you to back up your most important data straight to the cloud. It’s another way of investing in disaster recovery without necessarily incurring the cost of buying data centers or hardware.
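As one illustration of backing up straight to the cloud, here is a minimal sketch using AWS S3 via boto3; the bucket name and file path are placeholders, and the Azure or Google Cloud SDKs offer equivalent calls.

    from datetime import datetime, timezone

    import boto3  # AWS SDK for Python: pip install boto3

    BUCKET = "example-dr-backups"  # placeholder bucket name

    def upload_backup(local_path):
        """Copy a local backup file to an off-site S3 bucket under a dated key."""
        key = f"{datetime.now(timezone.utc):%Y/%m/%d}/{local_path.rsplit('/', 1)[-1]}"
        s3 = boto3.client("s3")
        # Request server-side encryption so the object is encrypted at rest.
        s3.upload_file(local_path, BUCKET, key,
                       ExtraArgs={"ServerSideEncryption": "AES256"})
        return key

    # Example: upload_backup("/var/backups/db-dump.sql.gz")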
Companies of all sizes face pressure from investors and customers who want assurance that sensitive data – from credit card numbers to personally identifiable information (PII) – will be protected correctly no matter what happens. As a cloud-based software provider, I know how important it is that customers have confidence that their data are protected at all times.
Following are some top questions to ask cloud vendors:
Are they investing in automation? Your vendor should be investing in automation to support its own DR plan. A vendor’s own fortified technology foundation and strong security framework can help you meet your own stringent data requirements.
Are they seeking third-party assessments? Ask about their verification processes. They should be engaging an independent assessor twice yearly to verify the efficacy of BCM and DR processes for both U.S. and non-U.S. operations. Testing them twice a year is important because the software space is always changing, and these assessments help to ensure that BCM and DR plans stay fresh.
Are they making assessor reports available to you? Any vendor should make the independent assessor’s reports available to customers – ask to see them. Documentation of specific security certifications can provide additional evidence that their BCM and DR processes are effective.
Are they focused on recovery time? A recovery point objective (RPO) is the maximum targeted period in which data might be lost due to a major incident. Ask where your vendor falls in its industry segment. Similarly, ask about their recovery time objective (RTO) – the targeted duration of time within which they can restore their service after a disaster. Many providers guarantee a two- to three-day average restoration time frame.
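To see how those targets translate into numbers, here is a minimal sketch; the vendor targets, backup interval, and restore duration are invented for illustration.

    from datetime import timedelta

    # Hypothetical vendor targets and observed figures, for illustration only.
    rpo_target = timedelta(hours=4)         # at most 4 hours of data may be lost
    rto_target = timedelta(days=2)          # service restored within 2 days
    backup_interval = timedelta(hours=6)    # how often data is actually backed up
    measured_restore = timedelta(hours=30)  # how long the last restore drill took

    # Worst-case data loss is the gap between backups; compare it to the RPO.
    print("RPO met" if backup_interval <= rpo_target
          else f"RPO missed: up to {backup_interval} of data could be lost")
    print("RTO met" if measured_restore <= rto_target
          else f"RTO missed: restore took {measured_restore}")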
Just as you are concerned about data loss and integrity for your own business, you should seek the same from any vendor. Test and refine your own processes, and make sure your vendors do too.
Original article: By Mark Goldin, NetworkWorld on May 30, 2017