What to do about the file bloat plague decimating backup and restore





Marc Staimer: Welcome to the joint webinar from Neuxpower and Dragon Slayer Consulting: 'What to do about the file bloat plague decimating backup and restore'.

My name is Marc Staimer. I am the President and Chief Dragon Slayer of Dragon Slayer Consulting. I've been consulting for over twelve years – on storage and SANs and software and networks and servers and backup and restore and replication and snapshots and just about everything that goes on in the data center. I've consulted for over 100 vendors and more than 400 end-users, and end-users are always welcome to contact me on any subject and I will provide them information at no cost to them. I provide analysis at trade shows, I publish consistently for TechTarget, on the web and for a variety of their trade magazines, and other trade magazines as well. I have over 30 years of industry experience.

Today we're going to cover the file bloat problem and the severe consequences as it relates to backup and restore. Then Mike Power, the President and CEO of Neuxpower, will come on and discuss Neuxpower's solution.

So what is file bloat? It's also known as insidious file size creep and you've all seen it. You notice that your PowerPoints, Excel files and Word Documents continuously get bigger, month over month, year over year. And you don't know why they are getting bigger. Well, it's a direct result of inefficient software that does not optimize the file. And it puts stuff in the files. Higher resolution that you may not need, or pointless data baggage, or junk data, or superfluous redundant data. It consistently does this in your PowerPoints and Excels and Words and JPEGs. And you never notice what's going on because you don't see it in the data – but it makes the files bigger. What does this mean when it comes to backup?

Pretty obviously, there are severe consequences. You're going to see patently slower backups. Bigger files mean more data, more data means more data to back up, more data to back up takes more time. There is no free lunch. If you have more data to back up, you have longer backup times. So you have longer backup times and add to that the consistent growth of data that we're seeing – the explosion in unstructured data.

But restore consequences are worse because restores are always urgent. When you're restoring data, it means you've got to have your data NOW. And the last time you want to find out your data is not restorable is when you need to restore it. You get exceedingly long restores and miss your Recovery Time Objective (RTO). Recoveries include unwanted and unnecessary data. You just want to recover your data; do you really want to recover the bloat that takes much longer to recover? And as a result you're waiting on data you need because you're recovering data you don't care about? I don't think so. More data, as I said, equals longer recovery times. Longer recovery times equal unhappy users, unhappy customers and unhappy executives.

File bloat, at the end of the day, wastes a lot of time and money. Lost time is a non-recoverable resource – you can never get it back. Once time is spent it's gone, it's gone forever. And it leads to decisively slower backups and restores.

And the problem is growing worse. There are over 100 million Microsoft Office documents created every day – that's from Forrester Research. Per IDC, more than 161 exabytes of digital information was created last year – that's more than three million times the information in all the books that were ever written. And per Taneja Group, more than half of new corporate data growth is of the unstructured data type that we're talking about, like Microsoft Office documents. And that's set to grow at nearly 100% per year.

Now common workarounds are only partial solutions. Data deduplication or compression, they work great on backups after the first backup. So they will reduce backup times and windows on those, but only on duplicate files or blocks. When it comes to recovery? Doesn't do a thing. Your recoveries are going to be much longer because you're consistently recovering data you don't need, you don't want, and it slows you down. So your RTOs are always going to be longer. And remember, that's the part where you can't afford for it to be longer.

Deduplication essentially just gets rid of fat copies. You've still got to reconstitute those copies to be read. So even though you may be improving your backup windows over time (because you're getting rid of deduplicated copies which occur a lot in backups and snapshots and replication) you're not going to do anything about it on recovery. Because you've still got to recover all that data. Not just copies of it, but FAT copies of it!

It has severe limitations. As I said, it works great after the first backup, on all the duplicates, but it doesn't do much for recovery. So restoring your data, you're going to have even more latency, because you've got to restore it from the deduplication, and it's going to take much longer. It doesn't do anything about file bloat – file bloat still remains.
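
Editor's illustration: the point about deduplication can be sketched with a toy block store. This is only a sketch under stated assumptions: the function names, the 4-byte block size and the sample file contents are all hypothetical and do not represent any vendor's implementation. It shows that duplicate blocks are stored once, yet a restore still has to rebuild every file at its full, bloated size.

```python
import hashlib


def dedupe_store(files, block_size=4):
    """Store each unique block once and keep a per-file list of block hashes."""
    blocks = {}   # block hash -> block bytes (each unique block stored once)
    recipes = {}  # file name -> ordered list of block hashes
    for name, data in files.items():
        recipe = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            blocks.setdefault(digest, block)
            recipe.append(digest)
        recipes[name] = recipe
    return blocks, recipes


def restore(name, blocks, recipes):
    """Reconstitute a file by reading back every block in its recipe."""
    return b"".join(blocks[digest] for digest in recipes[name])


# Hypothetical sample data: two identical files and one near-duplicate.
files = {
    "a.ppt": b"AAAABBBBCCCC",
    "b.ppt": b"AAAABBBBCCCC",
    "c.ppt": b"AAAABBBBDDDD",
}
blocks, recipes = dedupe_store(files)

# Bytes actually kept in the deduplicated store.
stored_bytes = sum(len(b) for b in blocks.values())
# Bytes that must be read back and written out to restore every file.
restored_bytes = sum(len(restore(n, blocks, recipes)) for n in files)
```

In this toy case the store holds far fewer bytes than the original files, but `restored_bytes` equals the full original total: the saving is on the backup side only.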

What about compression? Well, compression just hides the fat. It too has to be reconstituted to be read. At the end of the day, you're going to have to read all that data to recover it again – and you're adding latency.

So again, there are compression limitations. Works great on reducing the backup data size (when you're doing backups you're going to have less data to back up), but when you're doing recoveries, you still have to recover all that data. And you're adding latency, making that restoring process that much longer. File bloat still remains. You're not dealing with the problem – you're dealing with only one symptom of the problem, which is on the backup side.

You have to remember that backup is a means to recovery. The 'ends' are the recovery. Backup is a necessary and important task, so dedupe and compression do help a necessary and important task. But they do nothing for recovery. And the purpose of backup is to recover data. The worst time, as I said earlier, to find out recoveries take too long, is when you need to recover.

There has to be a better way. Now, to tell you about the better way is Mike Power, CEO and President of Neuxpower. Mike...

Mike Power: Thanks Marc, for telling us about the problems of file bloat and their effect on backup. I'd like to talk about file optimization, a completely new approach to data reduction that allows you to cut file bloat at the source.

Information Week describes our technology as 'wringing files out, shaking unneeded bytes out of graphics and included objects to radically reduce their size without affecting their appearance.'

The key to NXPowerLite file optimization software is that it's lossy. This may sound a little dangerous, but the data we're removing is gratuitous. It's unnecessary baggage that nobody actually needs. And we do this while taking great pains to ensure that we don't compromise the visual content integrity of your files. By that, I mean that the optimized files look identical to the originals in every way – they will just be a lot smaller.

Lossy technology is the only way to reduce file bloat. And because we're tackling the problem at the source, those reductions will be passed on and will enhance the effects of both dedupe and compression.

Let me give you an example of the kind of fat NXPowerLite is able to eliminate. Modern digital cameras these days create huge JPEGs. And when people paste these into documents, they tend to be a lot larger than they need to be – and then consequently so are the documents. Not a huge problem in isolation. But when you multiply that by tens of images per document, hundreds of documents per user and maybe thousands of users on a network, it can add up to be a pretty huge problem. NXPowerLite will find every single one of those images and make sure they are exactly the size they need to be – and no bigger. And the net result of that is a huge reduction in the storage consumed by the documents.

An added bonus is that the files remain in their original format, which means you don't need to decompress them. You don't need any special software to view or edit them. Optimized files are the original files, just without the junk.

NXPowerLite currently optimizes PowerPoint, Word, Excel and JPEG files. And we picked those in particular because they are the biggest contributors to storage consumption. They tend to be the most bloated files and they are certainly the most prolific.

So what kind of reductions can you expect on these file types? Well, in 2007 the Coalition Navies independently tested NXPowerLite and recorded average file size reductions of 68% for Word, 76% for Excel and an amazing 84% for PowerPoint. And these savings will translate typically into 30-40% overall storage reduction for most organizations.
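
Editor's illustration: one way to see how per-file-type reductions turn into an overall storage figure is a weighted sum. The per-type reductions below are the ones quoted in the webinar; the share of total storage taken by each file type is a hypothetical assumption chosen only for the example, so the result is illustrative rather than a measured figure.

```python
# Average file size reductions quoted in the webinar.
reduction = {"word": 0.68, "excel": 0.76, "powerpoint": 0.84}

# ASSUMPTION: hypothetical share of total storage held by each file type.
# The remaining 55% is treated as data that is not optimized at all.
storage_share = {"word": 0.15, "excel": 0.10, "powerpoint": 0.20}

# Overall reduction is the sum of (share of storage) x (reduction for that type).
overall_reduction = sum(storage_share[t] * reduction[t] for t in reduction)

print(f"Overall storage reduction: {overall_reduction:.1%}")
```

With these assumed shares the overall figure lands inside the 30-40% range mentioned above; a different mix of file types would move it up or down.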

Our desktop solution is used by over a million people worldwide.

It has been deployed extensively by many of the world's leading organizations.

It has been tested, accredited and heavily adopted by major defense organizations around the world, including the US Army, Air Force and Navy.

I'd like to finish on a quote from NATO. They sent us these words when they returned from using our software in Afghanistan. 'NXPowerLite has been thoroughly tested under the most rigorous of operational circumstances, and was never once found wanting.'

In conclusion, file bloat leads to slower backups. And slower backups mean missed windows, longer restores and ultimately wasted time and money. Deduplication and compression do not fix file bloat. NXPowerLite does.


Marc Staimer: Outstanding, Mike. That was really good information – the audience loved it. So much so, they have questions for you. The first question is 'Will your software work with my deduplication or compression?'.

Mike Power: Absolutely. File optimization does not preclude the use of deduplication or compression. In fact, it complements them nicely. You're able to optimize your files, removing the file bloat at the source, and this will improve the results you'll get with your deduplication. So you'll have smaller, optimized files to run through dedupe or compression.

Marc Staimer: The next question is 'How processor-intensive is NXPowerLite?' In other words, 'What is the performance hit?'

Mike Power: It is pretty processor intensive actually. I certainly wouldn't recommend that it was run on a network while users are trying to use the system. It's definitely something that you'll want to be running during downtimes – at the weekends or overnight.

Marc Staimer: OK. The third question, and the last one we have time for today, is 'When should I run it and how do I make sure there is no conflict with the backup software?'

Mike Power: That's a good question. So, as I was saying, we definitely recommend that it's run in downtimes. So, as backup is also pretty processor-intensive, I would say that you'll want to schedule it for a time when your backup is not running. Say, in a time slot at the weekend maybe. And we have a built-in scheduler for the tool, that will allow you to pick any time slots when you know other applications are not running and using the system resources. And the great thing about the scheduler is that you can set up the time slots you want to use, and then just leave it. It will start, build a list of all the files to optimize, and then run for the time that you have allotted. The next time it picks up, it will start from where it left off, and keep going until it's done.
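
Editor's illustration: the behavior Mike describes (build a list of files, work for an allotted time slot, resume where it left off next time) can be sketched as a small resumable loop. This is not NXPowerLite's scheduler or API. Every name here (`build_worklist`, `run_slot`, the `optimize` callback, the JSON state file and the file suffixes) is a hypothetical assumption made for the example.

```python
import json
import time
from pathlib import Path


def build_worklist(root, suffixes=(".ppt", ".doc", ".xls", ".jpg")):
    """List candidate files under root in a stable order (suffixes are assumed)."""
    return sorted(
        str(p) for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )


def run_slot(state_path, root, optimize, budget_seconds, clock=time.monotonic):
    """Process files until the time slot runs out; return True when all are done.

    Progress is saved after every file, so the next slot resumes where
    this one stopped. `optimize` is a hypothetical per-file callback.
    """
    state_file = Path(state_path)
    if state_file.exists():
        state = json.loads(state_file.read_text())
    else:
        state = {"files": build_worklist(root), "next": 0}

    deadline = clock() + budget_seconds
    while state["next"] < len(state["files"]) and clock() < deadline:
        optimize(state["files"][state["next"]])
        state["next"] += 1
        state_file.write_text(json.dumps(state))  # checkpoint after each file

    return state["next"] >= len(state["files"])
```

A caller would invoke `run_slot` once per scheduled window, for example from a nightly job, passing the same state file each time; the time check happens between files, so a single large file can overrun the slot slightly.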

Marc Staimer: Great. Mike, thank you very much, and thank you all for attending. This webinar will be available for download at any time from www.neuxpower.com. Thanks again.