Protecting your data

Some general thoughts on protecting your non-professional data. Probably overkill. Because me.

A friend was after some general advice on storage/availability and backups for home stuff so rather than just reply specifically to him I thought I’d put my general thoughts together on non-professional data protection. The stuff in my day job is generally focused on 99.999% (5 nines) availability so the backup and availability strategy is usually beyond what you’d want to do for home.

The questions are usually along the lines of: should I implement RAID? How do I back up? What should I back up? Where to…? To answer these questions you really need to look at the fundamental aspects of your data protection strategy, and they are:

  • Recovery Time Objective (RTO). How long would it take to restore your data, and how much of a pain would it be not having that data for that recovery time? There’s also the question of the effort needed to restore data, but that’s a softer consideration. If you’re busy, this can be a significant added burden on any restoration process, and arguably it massively increases the pain of not having your data for that recovery time.
  • Recovery Point Objective (RPO). How much data can you afford to lose between backups? For static stuff that doesn’t change often, for example, you may only back up once a week or so.
From a general data protection point of view, the 3-2-1 backup strategy is the one most talked about. What this means is:

  • You have three copies of your data at any one time.
  • Two of them are on physically different devices.
  • One of them is away from the main premises you use, i.e. off-site storage.
Considering the above is how I would come to a backup & data protection strategy. A couple of quick points:

  • RAID is not a backup. Using RAID is a strategy that affects your RTO and RPO. Lose a RAID array and you’re in trouble, aren’t you? Having RAID does not affect the 3-2-1 strategy; it is an availability technology, nothing more. It vastly reduces your RTO & RPO when a drive fails. Lose the array with no backup and your RT & RP become infinite….
  • Automation is key to a good backup strategy. If you have to do something manually, the one time you think you’ll be fine is the one time you’ll be crying into your soup.
  • You may want to consider having a second off-site copy. Why? Well, consider ransomware protection. If your backup solutions are automated to the cloud, for example, there is an (albeit remote) possibility that your off-site backups also get encrypted by ransomware. To see what I mean in a bit more detail, have a look at my video: RansomWare - Protect Your Stuff!
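On the automation point, here is a minimal sketch of the kind of job you would hand to a scheduler rather than run by hand. The function name `backup_once` and the timestamped folder layout are my own assumptions for illustration, not any particular product’s behaviour, and it makes a plain full copy each time rather than anything clever.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup_once(source: Path, backup_root: Path, now: Optional[datetime] = None) -> Path:
    """Copy `source` into a new timestamped folder under `backup_root`."""
    now = now or datetime.now()
    # Hypothetical naming scheme: one folder per run, e.g. 2024-01-05_230000.
    target = backup_root / now.strftime("%Y-%m-%d_%H%M%S")
    # copytree creates missing parent folders and raises FileExistsError
    # instead of silently overwriting an earlier copy.
    shutil.copytree(source, target)
    return target
```

Point cron or Task Scheduler (or whatever you have) at something like this and the “I’ll do it tomorrow” failure mode goes away.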
So, in reality, what would a backup solution look like?

  • One device with live data.
  • One device with a copy of your live data.
  • One off-site copy.
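If you want to sanity-check a setup against the 3-2-1 rule, the three conditions can be sketched as below. The `Copy` class and the device labels are made up for the example; all it does is count copies, count distinct devices and look for one off-site copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    device: str    # hypothetical label for the physical device holding this copy
    offsite: bool  # True if the copy lives away from the main premises


def meets_3_2_1(copies: list[Copy]) -> bool:
    """Return True if the copies satisfy the 3-2-1 rule."""
    enough_copies = len(copies) >= 3
    enough_devices = len({c.device for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_devices and has_offsite


setup = [Copy("primary", False), Copy("secondary", False), Copy("cloud", True)]
print(meets_3_2_1(setup))      # True
print(meets_3_2_1(setup[:2]))  # False: only two copies, none off-site
```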
So where do the RTO and RPO come into it? Well, it comes down to how quickly you need your data back, and how much you can lose. Traditionally, most systems would back up every evening (often using a Grandfather, Father, Son scheme) and this will probably be enough for most home systems. What’s the worst case here?

Let’s say you back up at 23:00 every night. One lovely Friday at 22:59 your main storage blows up/gets flooded with milk (don’t ask). Well, you’ll have lost all of your data from 23:00 the previous night to 22:59 on the day of the milk issue. That’s your Recovery Point.

Next, you need to consider how long it takes to restore your data - that’s your recovery time.
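The milk scenario is just arithmetic, so here is a rough sketch of both numbers. The dates are illustrative (any Thursday/Friday pair would do), and the 6 TB size and 100 MB/s throughput in the restore estimate are assumptions to plug your own figures into, not measurements.

```python
from datetime import datetime, timedelta


def data_lost(last_backup: datetime, failure: datetime) -> timedelta:
    """Everything changed between the last good backup and the failure is gone."""
    if failure < last_backup:
        raise ValueError("failure cannot precede the last backup")
    return failure - last_backup


def restore_hours(size_tb: float, throughput_mb_s: float) -> float:
    """Rough restore time, using decimal units (1 TB = 1,000,000 MB)."""
    seconds = size_tb * 1_000_000 / throughput_mb_s
    return seconds / 3600


last_backup = datetime(2024, 1, 4, 23, 0)  # Thursday 23:00
failure = datetime(2024, 1, 5, 22, 59)     # Friday 22:59, a minute before the next run
print(data_lost(last_backup, failure))     # 23:59:00
print(round(restore_hours(6, 100), 1))     # hours to pull back 6 TB at an assumed 100 MB/s
```

The first figure is your recovery point in the worst case; the second is only the copy time, and the effort of actually getting round to the restore comes on top.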

Where does RAID come into this? Like I say, this is an availability consideration, not a backup. If you:

  • Have a good backup system that’s automated and backs up to another device every night.
  • Would be OK with losing 24 hours of data.
  • Would be OK with the time it takes to get access to your data….
…. Then what will you gain from RAID? Not a lot really. However, consider that you may want everything to just carry on working even in the event of a drive failure. In that scenario RAID is a great help: you can take a drive failure, carry on as you are, and replace the drive. Note you’re still backing up at this point.

When considering your backups from device one to device two, do you just want them to be exact replicas? There’s danger in this. Imagine corrupting some stuff and not realising. You’ll end up with the corruption duplicated on to the other devices, and your off-site backup. This is where the Grandfather, Father, Son model of historical backups comes in. It takes more automation to achieve, and you may of course consider it well beyond the requirements for home.
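To make the Grandfather, Father, Son idea a bit more concrete, here is a minimal sketch of one possible retention rule. The specifics are my assumptions for the example: “sons” are the last 7 nightly backups, “fathers” are Sunday backups from the last 4 weeks, and “grandfathers” are first-of-the-month backups from roughly the last 12 months. Real backup tools each have their own take on this.

```python
from datetime import date, timedelta


def gfs_keep(backups: list[date], today: date,
             daily: int = 7, weekly: int = 4, monthly: int = 12) -> set[date]:
    """Pick which nightly backups to keep under a simple GFS scheme."""
    keep = set()
    for d in backups:
        age = (today - d).days
        if age < 0:
            continue  # ignore anything dated in the future
        is_son = age < daily                                 # recent nightlies
        is_father = d.weekday() == 6 and age < weekly * 7    # Sunday backups
        is_grandfather = d.day == 1 and age < monthly * 31   # first of the month
        if is_son or is_father or is_grandfather:
            keep.add(d)
    return keep


today = date(2024, 3, 31)
nightly = [today - timedelta(days=n) for n in range(100)]
kept = gfs_keep(nightly, today)
print(f"keeping {len(kept)} of {len(nightly)} backups")
```

The point is that a corrupted file copied last night does not wipe out the good version from three weeks or three months ago, and you are not storing every single night forever to get that.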

So…do I need RAID? It’s not as simple a question to answer as it may first appear, is it? Personally I think that anything that keeps your data available, and saves me having to resort to backup systems, is absolutely worth it. You really want your backup system to be a ‘last resort’ type thing, so in reality I always tend to RAID my stuff. This is where NAS devices come in by the way, not just for their RAID systems but also for their in-built backup solutions. Let’s take how I used to use my Synology stuff (I’ve upgraded now for 10GbE and I have a ridiculous internet connection so rely on Azure stuff a lot more now):

Primary Device
Synology 918 with 4 x 12TB drives giving about 36TB available.

Secondary Device
Synology 416 (I think) with 4 x 12TB drives giving about 36TB available.

Overnight the primary device was backed up to the secondary with a 12 month retention, i.e. I could go back to pretty much any point in the previous 12 months. In addition to that, live data that changed often was snapshotted from the primary to the secondary about 4 times an hour.

Finally, the secondary Synology also sync’d those backups to an off-site hosted solution.

Probably way over the top; however, the principle can be easily replicated without all that expensive gear.

Primary Device
2 x 6TB drives, mirrored, so 6TB available. If you get a drive failure your data is still available, and you can replace the drive.
Your primary device also replicates your data to cloud storage.

Secondary Device
A 6TB or larger hard disk with point-in-time incremental backups of the primary.

Far smaller, but with the same principle, and you get the same range of dates for your recovery point (i.e. you can restore back to a point in time).
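The capacity figures in both setups follow the same back-of-an-envelope sum, sketched below. I am assuming one drive’s worth of space goes to redundancy in each case (which is what a two-drive mirror does, and is consistent with the “about 36TB” from 4 x 12TB); the RAID level on the Synology boxes isn’t stated above, and formatted capacity will come out a bit lower in practice.

```python
def usable_tb(drives: int, size_tb: float, redundancy_drives: int = 1) -> float:
    """Raw usable capacity with `redundancy_drives` drives' worth of space given to redundancy."""
    if not 0 <= redundancy_drives < drives:
        raise ValueError("redundancy must be at least 0 and less than the drive count")
    return (drives - redundancy_drives) * size_tb


print(usable_tb(4, 12))  # 36
print(usable_tb(2, 6))   # 6
```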

Told you the question isn’t as simple as you’d imagine.
