I’m going to close my eyes and take a WAG (wild-ass-guess – one of my favorite TLAs) about your current backup process. You’re probably still on tape. You back up using a default grandfather-father-son rotation scheme – that is, you keep daily backups for two weeks, weekly backups for a month, monthly backups for a year, and yearly backups… forever?
If you’re lucky enough to be off tape, you still use that same kind of scheme, only to a disk-backup system. Odds are high that backup exists on-site. If you’re on tape, the tapes are either being stored at Iron Mountain (or something similar) – or being taken home by a staff member every week. Those of you at bigger firms will have your disk-backup located at a colocation facility.
You feel pretty good, but not *great* about your backups. Those of you on disk backups have a higher confidence level, helped by the ease of your restores and the automated, no-human-involvement backups done every night.
While things are being backed up, there’s probably no real plan in place for all aspects of a situation where your office is made completely unreachable. The tapes or the data exist at the colo or at Iron Mountain – but now what? How do you access the data? How do you provide computers for the staff? Where do they work? Where do they get software? That’s the difference between a backup strategy and a business continuity/DR plan – and a topic for another day.
Going back to your situation – you get restore requests fairly infrequently. The vast majority of your restore requests come from users who deleted or changed a file and need something from the last 30 days. Occasionally, you’ll get a request for something 60-90 days back. Once or twice a year, you’ll get a request for a file or folder that’s several months old.
If you’re on tape, and you get one of those “old” requests – you’re crossing your fingers. You look up the Iron Mountain bin number, order the tapes back, and – assuming they’re still compatible with your current tape system – you do the restore. But as you know very well – from explaining it numerous times to your users – the farther back you go, the less granular your retrievability becomes. Assuming you’re using the scheme we started with – if they want a file from July 10, 2010 – you can give them the file as it looked on July 1 – or August 1 – but no closer. And past that one-year range – say July 10, 2008 – if you’re like most, you’ll have January 1, 2008 and January 1, 2009 – or, if you also keep quarterlies, July 1, 2008 or October 1, 2008. That loss of granularity looks like this:
There are a couple of assumptions here: 1) that granularity = usefulness – i.e., the closer the restore point is to the date in question, the better the restore, and 2) that the pyramid showing the effort required to restore the data is the exact inverse of the granularity pyramid – the older the data, the harder it is to restore.
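To put dates on that loss of granularity, here is a rough Python sketch of the rotation scheme described at the top of this post. The function names, the `first_year` parameter, and the exact retention windows (weeklies counted back from today, monthlies on the 1st, yearlies on January 1) are my assumptions for illustration, not how any particular backup product schedules its jobs.

```python
from datetime import date, timedelta


def retained_backups(today, first_year=2005):
    """Return the sorted dates a grandfather-father-son rotation still holds.

    Assumed retention: dailies for 14 days, weeklies for about a month,
    monthlies (1st of the month) for a year, yearlies (January 1) forever.
    `first_year` is a hypothetical "when backups began" year.
    """
    points = set()
    # Dailies: one backup per day for the last two weeks.
    for n in range(1, 15):
        points.add(today - timedelta(days=n))
    # Weeklies: one backup per week, kept for roughly a month.
    for n in range(1, 5):
        points.add(today - timedelta(weeks=n))
    # Monthlies: the 1st of each of the last twelve months.
    year, month = today.year, today.month
    for _ in range(12):
        points.add(date(year, month, 1))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    # Yearlies: January 1 of every year since backups began.
    for y in range(first_year, today.year + 1):
        points.add(date(y, 1, 1))
    return sorted(points)


def closest_restore_points(wanted, points):
    """Return (latest backup on or before `wanted`, earliest backup after it)."""
    before = max((p for p in points if p <= wanted), default=None)
    after = min((p for p in points if p > wanted), default=None)
    return before, after


today = date(2011, 3, 15)  # hypothetical "now"
points = retained_backups(today)
for wanted in (date(2011, 3, 10), date(2010, 7, 10), date(2008, 7, 10)):
    before, after = closest_restore_points(wanted, points)
    print(wanted, "->", before, "or", after)
```

Under those assumptions, the request from last week hits a daily backup exactly, the July 10, 2010 request lands on July 1 or August 1, and the July 10, 2008 request falls all the way back to January 1 of 2008 or 2009.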
So what do the charts summarize? That not only are most of the requests “near-term” requests, but that we can service them better, and we can do so more easily. So the next logical question is: “How do we make all requests ‘near-term’, and why are we spending so much time, energy and money on *extremely* infrequent requests for old data?”
Let’s tackle the latter first. Ever since the first WordPerfect document was accidentally deleted, executives have been demanding backups. Tape backups have been around since the 1960s. There is a ton of traditional, “because that’s the way it’s always been done” thinking with tape backups. But bottom line – they work. They are a tried and true methodology for data protection. No IT director has lost their job because they implemented a stable, reliable tape backup system. The formula of tiered retention, and the loss of granularity over time, is taken for granted by users and IT pros alike.
The former part of the equation is much more difficult. I think for Architecture and design firms, we should be asking “why do we need backups older than 90 days?” There are only a few underlying reasons for a restore request for project data. The vast majority, as stated, come from accidental deletions/overwrites. If the user hasn’t discovered that error within 90 days, they’re not going to – ever. Restore requests from *beyond* those 90 days tend to be requests to see projects at a certain phase/point in time. And the foundation of this post is that if restores are being used for that purpose, there needs to be a significant change in the way project teams manage data. Worse still are the restore requests that come from users who archived project data off to other media (DVD/hard drive) and then lost that secondary media.
All this has led us to this point: a proposal for a new methodology for managing Architectural data. There are a few Golden Rules:
1) Separating Project Data is Evil. All data related to a project should *always* be kept, whenever possible, in the same location, i.e., under a single master project folder. Project components should not be distributed across other shares/folders, and should *never* be archived (moved) off to secondary media for any reason. Single-homing the project data and maintaining its integrity is paramount. Obviously, there will be format differences (e.g., Public Folders) that can’t be avoided.
2) Hard Drive Storage is Dirt Cheap. I can buy a 3 TB drive for $180. But But But you’re saying… “it’s not enterprise class” or “it’s not the drive size, it’s the backup costs that are expensive.” Yep. Heard ’em all. To point one: ok, but an HP hot-plug 2TB drive is still only $600. And to point two: if you’re using those same disks for backup, those costs are seriously diminished. All this with the knowledge that 4TB drives are around the corner. Price points will remain the same, with sizes constantly increasing.
3) Big Data is Not Bad. There is a sense amongst some CIOs that a large amount of data is equivalent to disorganization. While large data sets *can* be disorganized, there is no direct correlation. What is more disorganized, a single 1 TB folder with 20 50-gig Photoshop files, or a single 50-gig folder with 10,000 DWG files? At the end of the day, it doesn’t matter what size a project folder is, as long as it is organized internally to either company standards or a format that the project team understands and that allows them to work efficiently. And that *must* be a responsibility of the project team, not of IT.
Here’s the proposal, in four components:
First: A primary file server. This can be a SAN, or other mechanism, but it must have two characteristics: it must easily hold all your active projects, and it must be easy to add drive space, either internally or through chained expansion. If you feel performance is an issue, fill it with enterprise drives.
Second: Archive storage space. A “big, dumb box” as we call them – be it an appliance or otherwise – but full of cheap disks. This houses all your non-active (historical) projects, with read-only permissions.
Third: A local backup box. Assuming you’re a Windows shop and can use VSS for the first 30 days of backup coverage, but want the 30-90 day range as well so that you cover the 96% of restore requests you get, this box serves as a backup-to-disk target for snapshot images of your primary server for days 30 through 90.
Fourth: Offsite nightly snapshots of your primary and secondary storage. There’s a plethora of services out there (Rackspace, etc.) that will do nightly images of your environment at rates as low as 15 cents per gig. This acts as your disaster recovery component. You can replace this with an internal service to your colo, but again, only a single nightly image is required.
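To make the division of labor between those four components concrete, here is a small Python sketch of which box would answer a restore request based on how old the wanted version is. The function and constant names are mine, and treating day 30 as still covered by VSS is an assumption about where exactly the boundary sits.

```python
from datetime import date

# Labels are illustrative only; they are not product or feature names.
VSS = "VSS snapshots on the primary file server"
BACKUP_BOX = "local backup-to-disk box"
PROJECT_FOLDER = "no backup this old: use a snapshot saved in the project folder"


def restore_source(wanted, today):
    """Return which component would serve a file as it looked on `wanted`."""
    age_days = (today - wanted).days
    if age_days < 0:
        raise ValueError("wanted date is in the future")
    if age_days <= 30:   # assumption: day 30 is still within VSS coverage
        return VSS
    if age_days <= 90:   # days 30 through 90 live on the local backup box
        return BACKUP_BOX
    return PROJECT_FOLDER


today = date(2011, 3, 15)  # hypothetical "now"
for wanted in (date(2011, 3, 1), date(2011, 1, 15), date(2010, 7, 10)):
    print(wanted, "->", restore_source(wanted, today))
```

Note what is missing from that routing: the offsite nightly image. It exists to rebuild the environment after a disaster, not to answer day-to-day file requests, so it never appears as a restore source here.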
Those are the technical components. The other half involves a perception shift on the part of the Architects and project staff. Before, they probably viewed the project drive as a finite resource. No longer. They should view it as a bottomless storage area where any and all project information should be kept. They should be informed that with a shift to only doing 90-day restores, they need to manage their own snapshots – that is, any time they want to preserve the project condition for future use, they should do so within the project folder. Creating that snapshot themselves, and keeping it online within the project folder, makes things much easier for all parties than a future, subject-to-failure restore request.
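For firms that want to give project teams a one-step way to do this, a snapshot helper could look something like the Python sketch below. The `_snapshots` folder name, the `snapshot_project` function, and the date-plus-label naming are all my own inventions for illustration; use whatever convention your project teams will actually follow.

```python
import shutil
from datetime import date
from pathlib import Path

SNAPSHOT_DIR = "_snapshots"  # hypothetical folder name, not a standard


def snapshot_project(project_root, label, today=None):
    """Copy the current project state to <project_root>/_snapshots/<date>_<label>.

    Raises FileExistsError if a snapshot with the same date and label exists.
    """
    root = Path(project_root)
    stamp = (today or date.today()).isoformat()
    target = root / SNAPSHOT_DIR / f"{stamp}_{label}"
    # Skip the snapshot folder itself so older snapshots are not copied
    # into each new one.
    shutil.copytree(root, target, ignore=shutil.ignore_patterns(SNAPSHOT_DIR))
    return target


# Example call (the path is made up):
# snapshot_project(r"P:\Projects\1234-Library", "schematic-design")
```

Because the copy lives under the master project folder, it stays single-homed with the rest of the project and moves with it when the project goes to archive storage. The copy is still writable, so it is only as trustworthy as the folder permissions around it.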
Over the years, I’ve become convinced that the amount of money spent on making long-term, historical retrievals available is a huge waste of limited IT resources. We can deliver better, easier restores and increase staff productivity by re-examining some of the backup beliefs we’ve long taken for granted.
What do you think? Would this work in your firm? Would your senior leadership buy into it?