The storage file bloat problem and what you can do about it





Marc Staimer: Hello, and welcome to 'The storage file bloat problem and what you can do about it', a joint Neuxpower and Dragon Slayer Consulting webinar.

My name is Marc Staimer. I am the President and Chief Dragon Slayer of Dragon Slayer Consulting. I've been a consultant now for over 12 years. I have focused on storage and SANs and software and virtualisation and networks and servers and just about everything in the data center. I've consulted for well over 100 vendors and considerably more than 400 end users. I publish consistently online at TechTarget and a variety of other trade magazines. And I have over 30 years of industry experience.

Today we're going to cover five things. I'm going to cover the file bloat problem and the storage consequences of the file bloat problem. And Mike Power, the CEO and President of Neuxpower, will talk about the Neuxpower solution, proof points of that solution and some conclusions.

So let's begin with the file bloat problem. What is file bloat? Well, it's also known as insidious file size creep, and I'm sure you've all seen it. Files keep getting bigger on an ongoing basis. And this is a direct result of gratuitously fatter files – and that comes because the software is inefficient. Applications like Office create excessive file resolution, pointless data baggage, redundant data and superfluous 'junk' data within the files. So the files are bigger than they need to be – in fact a lot bigger than they need to be – because the software is inefficient.

There are serious storage consequences that are a direct result of this inefficient software. First, you're going to consume storage a heck of a lot faster. And if you're consuming it faster, you've got to buy more of it. So if you have more storage, that results in more storage systems, more storage management, more storage networks, more server ports, more server management, more switch ports, more switches, more switch management, more cables, more cable management (we all love cable management), more power, more cooling, more racks, more floorspace and a lot more manually intensive administrative tasks. At the end of the day, it's a lot more capital expenditure and operating expenditure.

But it doesn't stop there. Because if you're consuming more primary storage for your data, for these fat files, you're going to consume even more secondary storage. You're going to end up with longer backups, longer replications or longer snapshots – so much longer that you may even miss backup windows. And recoveries? They're much longer too. Because you're recovering data you don't care about, don't want, don't need. But when you have fat files, you're going to be recovering them. More secondary storage ultimately means even more systems, more storage networks, more port cards, more switch ports, more switches, more cables (there are those cables again), more power, more cooling, racks, floorspace, etc., and a lot more management and a lot more manually intensive tasks.

File bloat wastes your time and wastes your money. And remember, unlike money, time is a non-renewable resource. It's non-recoverable – once it's spent it's gone, it's gone forever. And that means you're wasting not just capital expenditure, but organizational resources – people. At the end of the day it's an incredible IT infrastructure waste – a tremendous waste.

And when you add to file bloat the accelerating file growth, you're going to have a migraine headache. IDC, Forrester and Taneja Group all have facts on how fast this is growing. Forrester Research says there are over 100 million Microsoft Office documents created every day – and every single one of them has file bloat. IDC says over 161 exabytes of digital information was created last year alone. And that represents three million times the information in all the books ever written. And Taneja Group says more than half of all new corporate data growth is unstructured data such as Microsoft Office documents. Again, bloated files. And that's set to grow at nearly one hundred percent per year. That's a migraine headache!

What about the common workarounds? What about deduplication, that should solve it, right? Or compression? Not exactly. They can make matters worse. They were designed for the secondary storage consumption problem, and they do work OK there. But on primary storage? And on bloated files? Not so much.

Dedupe is designed to get rid of fat copies. In other words, if I have duplications – copies of my primary files and my data in those files – then I'm going to get rid of those copies. OK? But you still have to reconstitute whatever you're deduping. So when you start with a file, or block of data, or blocklet of data, and you dedupe it, when you read it you have to reconstitute it. You're gonna take time deduping it and you're gonna take time reading it – remember time is a non-renewable resource. So that's going to add to your response times, because you're adding latency.

So, from a user perspective, dedupe on primary storage has severe limitations. One, you're not going to get the reductions you expect. It's designed, as I said, for secondary storage where you have lots of duplication. But in primary data, you don't have that kind of duplication. A lot of primary data – a lot of file data – is compressed. And when it's compressed, you can't dedupe it. Why? Because compression moves your blocks around. So at a file level, you can dedupe to a point. But at a block level, it's harder to find duplications. And you're not going to have that many duplicate files. And, as I said, reading and writing – it's gonna add latency. You know that phone system? It's going to light up like a Christmas tree when users decide 'Hey, my performance is just grinding to a halt. It's getting much slower'. Remember, file bloat with deduplication still remains. You're just hiding it in the secondary storage.
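
[Editor's illustration] The point that compression "moves your blocks around" can be sketched in a few lines of Python. This is a minimal sketch, not any vendor's dedupe implementation: the 4096-byte fixed block size, the word list and the helper names `block_hashes` and `shared_blocks` are all assumptions made for the example. It builds two versions of a document that differ only in their first 64 bytes, then counts how many fixed-size blocks the two versions share before and after each version is compressed with zlib.

```python
import hashlib
import random
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size for this illustration


def block_hashes(data: bytes) -> set[str]:
    """Hash each fixed-size block, as a simple block-level dedupe index might."""
    return {
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    }


def shared_blocks(a: bytes, b: bytes) -> int:
    """Count distinct blocks present in both inputs (blocks dedupe could store once)."""
    return len(block_hashes(a) & block_hashes(b))


# Hypothetical document content: seeded pseudo-random words, not real Office data.
rng = random.Random(0)
words = [b"storage", b"backup", b"file", b"bloat", b"dedupe", b"block", b"server", b"data"]
original = b" ".join(rng.choice(words) for _ in range(40000))

# A second version of the document with only its first 64 bytes overwritten.
edited = b"X" * 64 + original[64:]

shared_raw = shared_blocks(original, edited)
shared_compressed = shared_blocks(zlib.compress(original), zlib.compress(edited))

print(f"shared blocks, uncompressed: {shared_raw}")
print(f"shared blocks, compressed:   {shared_compressed}")
```

In the uncompressed versions, every block after the first is byte-identical, so a block-level dedupe index finds plenty to share. Once each version is compressed, the small edit changes the compressed stream that follows it, and far fewer blocks line up.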

Compression? It's another one of those tools that kinda hides the fat, but doesn't eliminate it. It too has to be reconstituted, after it's been compressed, to read it. Therefore, you're again adding latency at both the write side and the read side.

Compression in primary data also has serious reduction limits. Most files are already compressed. Microsoft Office 2007 compresses documents and spreadsheets and presentations – so they are already in a compressed format. It's hard to compress a compressed file. JPEGs are already compressed. So you don't really reduce much when you're compressing compressed files. And, as I said, it adds latency again. So you're going to hurt your response time and reduce your IOPS. At the end of the day, file bloat still remains.
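
[Editor's illustration] The claim that it's hard to compress a compressed file is easy to check. The sketch below is only an illustration under stated assumptions: the input is seeded pseudo-random text standing in for uncompressed document content, zlib stands in for whatever compression a file format or storage system applies, and `compression_ratio` is a helper name invented for the example. It compresses the text once, then tries to compress the result a second time.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; values near 1.0 mean no savings."""
    return len(zlib.compress(data)) / len(data)


# Hypothetical uncompressed content: seeded pseudo-random words, not real Office data.
rng = random.Random(0)
words = [b"storage", b"backup", b"file", b"bloat", b"dedupe", b"block", b"server", b"data"]
raw = b" ".join(rng.choice(words) for _ in range(40000))

# Compress once, as a file format with built-in compression would.
already_compressed = zlib.compress(raw)

first_pass = compression_ratio(raw)                  # raw text shrinks substantially
second_pass = compression_ratio(already_compressed)  # compressed bytes barely shrink

print(f"first pass:  {first_pass:.2f} of original size")
print(f"second pass: {second_pass:.2f} of original size")
```

The first pass removes most of the redundancy, so the second pass has very little left to work with while still costing CPU time on every write and read.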

There has to be a better way. Now, to show you that better way, Mike Power, CEO and President of Neuxpower.

Mike Power: OK. Thanks Marc for that great introduction to storage growth, and the two existing solutions for data reduction. I'd like to talk today about a third approach, called file optimization, which allows you to eliminate fat from the files on your network, cutting file bloat at the source. It is neither dedupe, nor is it compression – it's a completely new approach.

Information Week describes our technology as 'wringing files out, shaking unneeded bytes out of graphics and included objects to radically reduce their size without affecting their appearance.'

The key to NXPowerLite file optimization software is that it's lossy. This may sound a little dangerous, but the data we're removing is gratuitous. It's unnecessary baggage that nobody actually needs. And we do this while taking great pains to ensure that we don't compromise the visual content integrity of your files. By that, I mean that the optimized files look identical to the originals in every way – they will just be a lot smaller.

Lossy technology is the only way to reduce file bloat. And because we're tackling the problem at the source, those reductions will be passed on and will enhance the effects of both dedupe and compression.

Let me give you an example of the kind of fat NXPowerLite is able to eliminate. Modern digital cameras these days create huge JPEGs. And when people paste these into documents, they tend to be a lot larger than they need to be – and then consequently so are the documents. Not a huge problem in isolation. But when you multiply that by tens of images per document, hundreds of documents per user and maybe thousands of users on a network, it can add up to be a pretty huge problem. NXPowerLite will find every single one of those images and make sure they are exactly the size they need to be – and no bigger. And the net result of that is a huge reduction in the storage consumed by the documents.

An added bonus is that the files remain in their original format, which means you don't need to decompress them. You don't need any special software to view or edit them. Optimized files are the original files, just without the junk.

NXPowerLite currently optimizes PowerPoint, Word, Excel and JPEG files. And we picked those in particular because they are the biggest contributors to storage consumption. They tend to be the most bloated files and they are certainly the most prolific.

So what kind of reductions can you expect on these file types? Well, in 2007 the Coalition Navies independently tested NXPowerLite and recorded average file size reductions of 68% for Word, 76% for Excel and an amazing 84% for PowerPoint. And these savings will translate typically into 30-40% overall storage reduction for most organizations.

Our desktop solution is used by over a million people worldwide.

It has been deployed extensively by many of the world's leading organizations.

It has been tested, accredited and heavily adopted by major defense organizations around the world, including the US Army, Air Force and Navy.

I'd like to finish on a quote from NATO. They sent us these words when they returned from using our software in Afghanistan. 'NXPowerLite has been thoroughly tested under the most rigorous of operational circumstances, and was never once found wanting.'

In conclusion, file bloat consumes a lot of storage. And more storage means increased infrastructure, management, power, cooling and a lot more cost. Deduplication and compression do not fix file bloat. NXPowerLite does.


Marc Staimer: Mike, that was great information. Some of the audience members have sent some questions in, so let's go ahead and answer them. The first question for you is 'Will it break my files?'

Mike Power: That's a great question. We get asked that a lot! It is possible, and early on we did break some files. But now, with over a million installs, we've learned the hard way exactly how not to break files. We're confident that our efforts over the last nine years to preserve the content integrity of files have given us the safest, most reliable solution on the market.

Marc Staimer: OK, that's good. And the next question is 'What level of data reduction can I expect with your product?'

Mike Power: There's a couple of metrics here, and on the files that we focus on, we tend to achieve in the average range of 60-80% reduction – which translates, on an average customer's server, to an average of 30-40% overall reduction of storage.
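
[Editor's illustration] The step from a 60-80% reduction on targeted files to a 30-40% overall reduction can be worked through as simple arithmetic. The sketch below assumes the targeted file types make up about half of the storage on a server. That 50% share is an assumption chosen because it makes the two quoted ranges consistent; it is not a figure given in the webinar, and the function name `overall_reduction` is invented for the example.

```python
def overall_reduction(target_share: float, file_reduction: float) -> float:
    """Fraction of total storage saved when only `target_share` of it is optimized."""
    return target_share * file_reduction


# Assumption, not a figure from the webinar: targeted files are half of all storage.
TARGET_SHARE = 0.5

low = overall_reduction(TARGET_SHARE, 0.60)   # 60% reduction on targeted files
high = overall_reduction(TARGET_SHARE, 0.80)  # 80% reduction on targeted files

print(f"overall reduction: {low:.0%} to {high:.0%}")
```

On a server where the targeted file types are a smaller or larger share of the total, the overall figure scales in proportion.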

Marc Staimer: OK. And a third question for you is 'How do you reduce the file size without reducing the visual content integrity?'

Mike Power: So, there is a balance there – and it's certainly more of an art than a science. But we have worked extensively, with customers and in-house testing, to find the best balance between reclaiming storage from files and keeping the content integrity. And it's something we continue to work on over time, as new information becomes available. We have literally thousands of files that we have tested, and we're pretty certain that we've got a good balance there.

Marc Staimer: And there's a follow-on question to the second one, which has to do with 'How does your data reduction differ from compression?' Because the compression guys say that they get 60-80% reduction.

Mike Power: Well, I guess there's a couple of answers to that. We are targeting different areas and additional areas of the file than compression targets. So what we do is not mutually exclusive to compression. So you can still use compression, and you can also optimize and you'll get double the reduction. And an added benefit is that we don't wrap the file – so there's no file format change, there's no need to rehydrate or decompress your files. So the reduction we're getting is without those compromises.

Marc Staimer: Well thank you very much, Mike. And thank you all for coming. This was a very informative webinar.

Mike Power: Great stuff. Thanks.