With the myriad possible hardware solutions to storage I/O performance issues, the question people are starting to ask is something like:
If I just buy newer, faster Storage, won’t that fix my application performance problem?
The short answer is:
Maybe Yes (for a while), Quite Possibly No.
I know – not a satisfying answer. For the next couple of minutes, I want to take a 10,000-foot view of just three issues that affect I/O performance to shine some technical light on the question and hopefully give you a more satisfying answer (or maybe more questions) as you look to discover IT truth. There are other issues, but let’s spend just a moment looking at the following three:
- Non-Application I/O Overhead
- Data Pipelines
- File System Overhead
These three issues by themselves can create I/O bottlenecks that degrade your application performance by 30-50% or more.
#1 Non-Application I/O Overhead:
One of the most commonly overlooked performance issues is that an awful lot of I/Os are NOT application generated. Maybe you can add enough DRAM, go to an NVMe direct-attached storage model, and get your application data cached at an 80%+ rate. Of course, you still need to process Writes, and NVMe probably makes that a lot faster than what you can do today. But you still need to get that data to the Storage.
On top of that, lots of I/Os generated on your system do not come directly from your application. There are also lots of application-related I/Os that are not targeted for caching; they’re simply non-essential overhead I/Os to manage metadata and such. People generally don’t think about the management layers of the computer and application that have to perform Storage I/O just to make sure everything can run. Those I/Os hit the data path to Storage along with the I/Os your application has to send to Storage, even if you have huge caches. They get in the way, stall your application-specific I/Os, and slow down responsiveness.
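To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The I/O counts, the 80% hit rate, and the function name are illustrative assumptions, not measurements. The only point is that a cache shrinks application reads while writes and overhead I/Os still travel the path to Storage.

```python
def storage_path_mix(app_reads, app_writes, overhead_ios, cache_hit_rate):
    """Return (total I/Os reaching storage, share that is non-application overhead).

    All inputs are hypothetical I/O counts over some interval.
    """
    # Only application reads benefit from the cache; misses still go to storage.
    read_misses = app_reads * (1.0 - cache_hit_rate)
    # Writes must eventually reach storage, and overhead I/Os are not cached.
    total = read_misses + app_writes + overhead_ios
    overhead_share = overhead_ios / total if total else 0.0
    return total, overhead_share


# Hypothetical workload: 10,000 reads, 2,000 writes, 3,000 overhead I/Os.
no_cache = storage_path_mix(10_000, 2_000, 3_000, cache_hit_rate=0.0)
with_cache = storage_path_mix(10_000, 2_000, 3_000, cache_hit_rate=0.8)
print(f"no cache:   {no_cache[0]:.0f} I/Os, overhead share {no_cache[1]:.0%}")
print(f"80% cached: {with_cache[0]:.0f} I/Os, overhead share {with_cache[1]:.0%}")
```

With these made-up numbers, caching cuts the traffic to Storage roughly in half, but the overhead I/Os grow from about a fifth of that traffic to more than two fifths of it. The better your cache, the more of what is left on the data path is noise.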
And let’s face it, a full Hyper-Converged, NVMe-based storage infrastructure sounds great, but besides the enormous cost, it comes with lots of issues. What about data redundancy and localization? That brings us to issue #2.
#2 Data Pipelines:
Since your data is exploding and you’re pushing 100s of Terabytes, perhaps Petabytes and in a few cases maybe even Exabytes of data, you’re not going to get that much data on your one server box, even if you didn’t care about hardware/data failures.
Like it or not, you have an entire infrastructure of Servers, Switches, SANs, whatever. Somehow, all that data needs to get to and from the application and wherever it is stored. And if you add Cloud storage into the mix, it gets worse. At some point the data pipes themselves become the limiting factor. Even with Converged infrastructures, and software technologies that stage data for you where it is supposedly needed most, data needs to be constantly shipped along a pipe that is nowhere close to the speed of access that your new high-speed storage can handle. Then add lots of users and applications simultaneously beating on that pipe and you can quickly start to visualize the problem.
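A quick sketch helps visualize it. The Python below assumes a simple model where each user gets an even share of one shared link and the slowest stage sets the pace. The bandwidth figures and function names are hypothetical, picked only to show the shape of the problem.

```python
def effective_throughput_gbps(storage_gbps, link_gbps, concurrent_users):
    """Per-user throughput in Gbit/s, limited by the slowest stage."""
    # A shared link is split between users (an even split is a simplification).
    per_user_link = link_gbps / concurrent_users
    return min(storage_gbps, per_user_link)


def transfer_seconds(data_gigabytes, throughput_gbps):
    """Time to move data_gigabytes at throughput_gbps (8 bits per byte)."""
    return data_gigabytes * 8 / throughput_gbps


# Hypothetical figures: storage capable of 28 Gbit/s behind a 10 Gbit/s link.
for users in (1, 10, 50):
    rate = effective_throughput_gbps(28.0, 10.0, users)
    print(f"{users:>2} users: {rate:.2f} Gbit/s each, "
          f"{transfer_seconds(100, rate):.0f} s to move 100 GB")
```

In this model the storage never gets to run at full speed, even for a single user, because the link is the narrower stage. Making the storage faster changes nothing until the pipe is widened.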
If this weren’t enough, there are other factors, and that takes us to issue #3.
#3 File System Overhead:
You didn’t buy your computer to run an operating system. You bought it to manipulate data. Most likely, you don’t even really care about the actual application. You care about doing some kind of work. Most people use Microsoft Word to write documents. I did to draft this blog. But I didn’t really care about using Word. I cared about writing this blog and Word was something I had, I knew how to use and was convenient for the task. That’s your application, but manipulating the data is your real conquest. The application is a tool to allow you to paint a beautiful picture of your data, so you can see it and accomplish your job better.
The Operating System (let’s say Windows), is one of a whole stack of tools between you, your application and your data. Operating Systems have lots of layers of software to manage the flow from your user to the data and back. Storage is a BLOB of stuff. Whether it is spinning hard drives, SSDs, SANs, cloud-based storage, or you name it, it is just a canvas where the data can be stored. One of the first strokes of the brush that will eventually allow you to create that picture you want from your data is the File System. It brings some basic order. You can see this by going into Windows File Explorer and perusing the various folders. The file system abstracts that BLOB into pieces of data in a hierarchical structure with folders, files, file types, information about size/location/ownership/security, etc… you get the idea. Before the painting you want to see from your data emerges, a lot of strokes need to be placed on the canvas and a lot of those strokes happen from the Operating and File Systems. They try to manage that BLOB so your Application can turn it into usable data and eventually that beautiful (we hope) picture you desire to draw.
Most people know there is an Operating System and those of you reading this know that Operating Systems use File Systems to organize raw data into useful components. And there are other layers as well, but let’s focus. The reality is there are lots of layers that have to be compensated for. Ignoring file system overhead and focusing solely on application overhead is ignoring a really big Elephant in the room.
The Wrap Up
Let’s wrap this up and talk about the initial question. If I just buy newer, faster Storage won’t that fix my application performance? I suppose if you have enough money you might think you can. You’ll still have data pipeline issues unless you have a very small amount of data, little if any data/compute redundancy requirements and a very limited number of users. And yet, the File System overhead will still get in your way.
When SSDs were starting to come out, Condusiv worked with several OEMs to produce software to handle obvious issues like the fact that writes were slower and re-writes were limited in number. In doing that work, one of our surprise discoveries was that when you got beyond a certain level of file system fragmentation, the File System overhead of trying to collect/arrange the small pieces of data made a huge impact regardless of how fast the underlying storage was. Just making sure data wasn’t broken down into too many pieces each time a need to manipulate it came along provided truly measurable and, in some instances, incredible performance gains.
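Here is a toy model of why that can happen, written in Python. It is not how our software measures anything; it simply assumes one I/O request per fragment and a fixed software cost per request that does not shrink when the device gets faster. The file size, overhead value, and device speeds are all hypothetical.

```python
def read_time_ms(file_mb, fragments, per_io_overhead_ms, device_mb_per_s):
    """Estimated time to read a file stored in `fragments` pieces.

    Assumes one I/O request per fragment, each paying a fixed software
    overhead that does not depend on device speed.
    """
    transfer_ms = file_mb / device_mb_per_s * 1000.0
    overhead_ms = fragments * per_io_overhead_ms
    return transfer_ms + overhead_ms


# Hypothetical: a 64 MB file with 0.05 ms of overhead per request.
for device in (500.0, 3000.0):  # MB/s, slower vs. faster storage
    for fragments in (1, 1_000, 10_000):
        t = read_time_ms(64, fragments, 0.05, device)
        print(f"{device:>6.0f} MB/s, {fragments:>6} fragments: {t:8.1f} ms")
```

Under these assumptions, the faster device wins easily when the file is in one piece, but once the file is in thousands of pieces the per-request overhead dominates on both devices and most of the hardware advantage disappears.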
Then there is that whole issue of I/Os that have nothing to do with your data/application. We also discovered that there was a path to finding/eliminating the I/Os that, while not obvious, made substantial differences in performance because we could remove them from the flow, thus allowing the I/Os your application wants to perform to happen without the noise. Think of traffic jams. Have you ever driven in stop-and-go traffic and noticed there aren’t any accidents or other distractions to account for such slowness? It’s just too many vehicles on the road with you. What if you could get all the people who were just out for a drive off the road? You’d get where you want to go a LOT faster. That’s what we figured out how to do. And it turns out no one else is focused on that – not the Operating System, not the File System, and certainly not your application.
And then you got swamped with more data. Okay, so you’re in an industry where regulations forced that decision on you. Either way, you get the point. There was a time when 1GB was more storage than you would ever need. Not too long ago, 1TB was the ultimate. Now that embedded SSD on your laptop is 1TB. Before too long, your phone will have 1TB of storage. Mine has 512GB, but hey I’m a geek and MicroSD cards are cheap. My point is that the explosion of data in your computing environment strains File System Architectures. The good news is that we’ve built technologies to compensate for and fix limitations in the File System.
Where I get to Have Fun
Let me wrap this up by giving you a 10,000-foot view of us and our software. The big picture is we have been focused on Storage Performance for a very long time and at all layers. We’ve seen lots of hardware solutions that were going to fix Storage slowness. And we’ve seen that about the time a new generation comes along, there will be reasons it will still not fix the problem. Maybe it does today, but tomorrow you’ll overtax that solution as well. As computing gets faster and storage gets denser, your needs/desires to use it will grow even faster. We are constantly looking into the crystal ball knowing the future presents new challenges. We know from looking in the rear-view mirror that the future doesn’t solve the problem; it just means the problems are different. And that’s where I get to have fun. I get to work on solving those problems before you even realize they exist. That’s what turns us on. That’s what we do, and we have been doing it for a long time and, with all due modesty, we’re really good at it!
So yes, go ahead and buy that shiny new toy. It will help, and your users will see improvements for a time. But we’ll be there filling in those gaps and your users will get even greater improvements. And that’s where we really shine. We make you look like the true genius you are, and we love doing it.
Rick Cadruvi, Chief Architect
Originally Published on Sep 20, 2018. Last updated Jan 17, 2023
Thanks for the comment and question Arnie. The performance related functions in Samsung’s Magician Management Software for SSDs do have an overlap with some of our functions. If you desire to use the other functionality such as Firmware Update, Drive Health & TBW Check, etc…, I would recommend you disable Rapid Mode to avoid any conflicts as you pointed out yourself. If nothing else, you will avoid the potential overlapped DRAM usage and our product provides a very optimized DRAM cache anyway along with other optimizations you won’t get from the Magician Suite.
Very well presented – so many more considerations than just implementing a shiny new drive. I would like to see you address interaction considerations for those of us that use Samsung drives and invoke the use of their Magician management software. It seems to me that some of the Magician functions could be duplicative to Diskeeper functions and thus cause interactive problems. I have ceased using Samsung Magician and only run Diskeeper on my system. I would just like to know if my concerns are founded or not.