I/O Assessment tool

The Condusiv I/O Assessment tool is designed to show you how well your storage is performing. It gathers numerous storage performance metrics that Windows automatically collects over an extended period of time. It then performs numerous statistical analyses looking for potential problems throughout the monitoring period. It even looks for potential cross-node conflicts: by correlating across multiple systems, it can infer that nodes are causing performance issues for each other during overlapping periods of time. It then displays a number of metrics that will help you understand where potential bottlenecks might be.

The tool has 4 basic phases:

Setup
Data Collection
Analysis
Reporting

Setup:

The Condusiv I/O Assessment tool requires .NET Framework version 4.5.1 and supports Windows 7, Windows 8, and Windows 10, as well as Windows Server 2008 R2, Windows Server 2012, and Windows Server 2012 R2.

The setup phase allows you to do one of the following:

Import Data From A Previous Data Collection Cycle
Enter The Information Necessary To Perform A Data Collection Cycle

If you choose to import data, you are directed to a file chooser popup that will allow you to locate and select a previously stored data collection. By default, data collections are comma separated value lists (.csv files). Once you have selected the file you want to report on and opened it, you will go straight to the reporting screen.

If you decide you want to gather data from your systems, you will need to provide the following data:

List Of Systems To Monitor
The Number Of Days To Monitor (1-7)
The Starting Day Of The Upcoming Week You Want To Start Collecting Data On
Credentials That Will Allow The Tool To Access Your Systems Using Remote WMI To Collect The Required Data

The system names you enter must be remotely accessible, so they can be IP addresses or names that DNS can resolve to IP addresses. The tool will collect data for ALL the systems you enter. If the tool cannot connect to one of the systems when you tell it to start collecting data, it will report the problem and let you re-enter the system name so that it can collect the data.

The tool allows you to monitor multiple systems for a period of one to seven days. The monitoring period starts right after midnight at the beginning of the day you select. In other words, if you pick Monday as your starting day, data collection starts right after midnight on Sunday night. This gives you data for the entire day for every day you asked to monitor.

Windows collects various types of data, especially performance data, that tools can use to manage or monitor Windows systems. It exposes this data through Windows Management Instrumentation (WMI). The data collection portion of the I/O Assessment tool uses WMI to capture specific system and storage-related performance metrics that Windows already collects. This data can be accessed remotely, and many tools, such as Microsoft’s System Center, do so. This tool uses the credentials you enter to collect WMI data remotely, which is why it does not need to install any software on the systems being monitored: Windows already collects the data and makes it available, and this tool merely gathers it for analysis and reporting. The credentials you enter are not saved persistently; once the tool exits, they are discarded. Importing an existing data collection does not require credentials, because all the data is local and no remote access is needed.

Data Collection:

Once you tell the I/O Assessment tool to “Start Data Collection”, it will display a countdown timer letting you know how long until the actual data collection starts and then how long until it completes.

Example: Let’s say it was Friday at 6pm and you had requested the monitoring to start on Monday and run for 5 days. The initial countdown timer until the actual data collection starts will begin with “54:00:00”. In other words, it will be 54 hours (2 days and 6 hours) until the data collection starts. When the initial countdown timer runs down to 00:00:00, it will reset for the number of days you selected. In our example it will reset to 120:00:00 (120 hours or 5 days).

Note: You must NOT exit this tool during the data collection cycle. If you do, the data collection will NOT finish and you will not have the data you intended to collect for reporting on.
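The countdown arithmetic in the example above can be sketched as follows. This is only an illustration, not the tool's own code: the function names are hypothetical, and it assumes that choosing the current weekday means the same weekday of the following week.

```python
from datetime import datetime, timedelta

def countdown(now, start_weekday, days_to_monitor):
    """Return (wait before collection starts, length of the collection)."""
    # Days until the next occurrence of start_weekday (Monday=0). Assumption:
    # the start is always in the future, so "today" means next week.
    days_ahead = (start_weekday - now.weekday() - 1) % 7 + 1
    midnight_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = midnight_today + timedelta(days=days_ahead)
    return start - now, timedelta(days=days_to_monitor)

def as_timer(delta):
    """Format a timedelta as an HH:MM:SS countdown string."""
    seconds = int(delta.total_seconds())
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"

# Friday at 6pm, start on Monday, monitor for 5 days.
wait, duration = countdown(datetime(2024, 1, 5, 18, 0), start_weekday=0, days_to_monitor=5)
print(as_timer(wait))      # 54:00:00
print(as_timer(duration))  # 120:00:00
```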

Analysis:

Once the I/O Assessment tool has collected the data, or imported a previous data set, it is ready to prepare the data for reporting. At this time a number of the raw data metrics are transformed into something more insightful. An example is the I/O Response Time, which is the average amount of time that a typical I/O takes to complete. The tool knows the total amount of time taken by the I/Os that were performed, and it knows the number of I/Os. During analysis it divides the first by the second to get the average.
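As a minimal sketch of that transformation (the function name and the millisecond unit for the raw total are assumptions made for illustration):

```python
def average_response_time_ms(total_io_time_ms, io_count):
    """Average time per I/O; returns 0.0 when no I/O was performed."""
    if io_count == 0:
        return 0.0
    return total_io_time_ms / io_count

# 4,000 I/Os that took 12,000 ms in total average 3 ms each.
print(average_response_time_ms(12_000.0, 4_000))  # 3.0
```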

The I/O Assessment tool also performs a significant number of statistical calculations. This tool collects the data in “buckets”. These buckets represent relatively small periods of time (5 minutes). This approach allows the tool to see peaks and valleys in how the storage gets used. It also allows for cross node checks. For each metric it reports on, the tool calculates the median and standard deviation across the buckets.
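The bucketing idea can be sketched as below. The sample format, a list of (timestamp, value) pairs, and the function names are assumptions made for illustration; the tool's internal representation is not described here.

```python
from collections import defaultdict
from datetime import datetime

BUCKET_SECONDS = 5 * 60  # bucket length stated in the text

def bucket_start(ts):
    """Round a timestamp down to the start of its 5-minute bucket."""
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    floored = seconds - seconds % BUCKET_SECONDS
    return ts.replace(hour=floored // 3600, minute=floored % 3600 // 60,
                      second=0, microsecond=0)

def sum_by_bucket(samples):
    """Total a metric per bucket from (timestamp, value) samples."""
    totals = defaultdict(float)
    for ts, value in samples:
        totals[bucket_start(ts)] += value
    return dict(totals)

samples = [
    (datetime(2024, 1, 8, 10, 1, 30), 2.0),
    (datetime(2024, 1, 8, 10, 4, 59), 3.0),
    (datetime(2024, 1, 8, 10, 5, 0), 1.0),
]
buckets = sum_by_bucket(samples)
print(buckets[datetime(2024, 1, 8, 10, 0)])  # 5.0
print(buckets[datetime(2024, 1, 8, 10, 5)])  # 1.0
```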

The I/O Assessment tool also creates a summary system based on the systems you chose to report on. The summary system is an aggregate of the systems you have selected. In the reporting screen you can decide which of the systems in the data collection to select for reporting. Once you make that selection and choose to show the results, this tool creates the summary system and does statistical analysis on the aggregated data.

The I/O Assessment tool can also be used to create a Performance Comparison Report. To generate one, select two data collection files that were run on the same set of systems and for the same duration. Click the “Load Files” button and then select the systems for which the report needs to be generated. After selecting the systems to compare, click the “Show Report” button to generate the report. The comparison report shows a comparison of the aggregated data from the two data collections.

Reporting:

The reporting screen has three main sections:

Summary Of Systems In Data Collection
Series Of Individual Storage Performance Metrics
Conclusions About Your Storage Performance For Selected Systems

In the summary section, there is a grid containing the list of systems that are available from the data collection you just collected data on or imported. The grid also contains totals for each system for the various metrics for the entire time of the data collection. The list is sorted from systems that have potential storage performance issues to those that do not appear to have storage performance issues. The systems that have storage performance issues are highlighted in red. The systems that might have storage performance issues are in yellow. The systems that do not appear to have storage performance issues are in green. By default the systems that have storage performance issues are selected for the report. You can select all the systems, or any set of systems including a single system, for reporting on.

Once you have selected some systems to report on and asked to display the report, you can expand any of the following set of metrics to look at:

Workload In Gigabytes
I/O Response Time
Queue Depth
Split I/Os
IOPS
I/O Size
I/O Blender Effect Index
Seconds Per Gigabyte
Reads To Writes Ratio
Memory Utilization
CPU Utilization

When you expand one of these sections, you will be able to see data from any day within the set of days this data collection represents. By default you will see the day with the Highest Total GB. This is the day with the highest total throughput to your storage. You can also select the Day with Highest IORT (I/O Response Time). A peak is the highest value for any data bucket. Data buckets are small time periods (5 minutes).

Each section shows you the following data for the day:

Maximum Peak Value
Hours With Peak Above Normal
Normal

The normal range is the median across all the buckets plus or minus one standard deviation. The tool reports the upper bound, the median plus one standard deviation, as the Normal value. It then shows you the peak value during that day and the number of hours with a peak outside that normal range. Peaks outside the normal range are generally values greater than or equal to the median plus one standard deviation.
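The three values described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: it takes one day's bucket values in time order, uses the population standard deviation, and treats an hour as 12 consecutive 5-minute buckets.

```python
from statistics import median, pstdev

BUCKETS_PER_HOUR = 12  # 60 minutes / 5-minute buckets

def daily_summary(bucket_values):
    """Summarize one day's metric values (one value per bucket, in time order)."""
    # Normal is the upper bound of the range: median plus one standard deviation.
    normal = median(bucket_values) + pstdev(bucket_values)
    # Peak of each hour is the highest bucket value within that hour.
    hourly_peaks = [
        max(bucket_values[i:i + BUCKETS_PER_HOUR])
        for i in range(0, len(bucket_values), BUCKETS_PER_HOUR)
    ]
    return {
        "maximum_peak": max(bucket_values),
        "normal": normal,
        "hours_with_peak_above_normal": sum(p >= normal for p in hourly_peaks),
    }

# A quiet day (value 10.0 in every bucket) with two isolated spikes.
day = [10.0] * (24 * BUCKETS_PER_HOUR)
day[3 * BUCKETS_PER_HOUR + 4] = 50.0   # spike during hour 3
day[15 * BUCKETS_PER_HOUR + 7] = 90.0  # spike during hour 15
summary = daily_summary(day)
print(summary["maximum_peak"])                  # 90.0
print(summary["hours_with_peak_above_normal"])  # 2
```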

Each section also allows you to see a graph for peak values during each hour of the day. This will allow you to see the times during the day when your storage has the greatest loads. It also shows you when your storage is not terribly busy. This might point to a convenient time for doing some extended reporting or other I/O extensive operations that are schedulable, thus avoiding extra pressure on application performance during peak load periods.

For most metrics, there is also displayed an aggregated “total” for the day.

For each section, the reporting is based on data collection bucket time periods, not the whole day. The analysis is done over the entire day’s data.

Let’s move on to a description of the metrics reported on.

Workload in Gigabytes:

This is a measure of the number of Gigabytes of data that was processed by your storage. It is represented in 5 minute time slices. Remember, the normal and peaks are per bucket, NOT for the whole day. It is a measurement of throughput. The peaks indicate when the storage is being used the most and can show you periods where you can offload some work to provide greater performance during peak load periods. The valleys indicate periods of lower storage utilization.

I/O Response Time:

The I/O Response Time is the average amount of time in milliseconds (1000ths of a second) that your storage system takes to process any one I/O. The higher the I/O Response Time, the worse the storage performance. The peaks indicate possible storage performance bottlenecks.

Queue Depth:

Queue Depth represents the number of I/Os that are having to wait because the storage is busy processing other I/O requests. The larger the value, the more the storage system is struggling to keep up with your need to access data. The higher the queue depth, the worse the storage performance. It directly correlates to inefficient storage performance.

Split I/Os:

Split I/Os are extra I/O operations that have to be performed because the file system has broken up a file into multiple fragments on the disk. To have a truly dynamic file system with the ability for files to be different sizes, easily expandable, and accessible using different sized I/Os, file systems have to break files up into multiple pieces. Since the size of volumes has gotten much larger and the number of files on a volume has also exploded, fragmentation has become a more severe problem. However, not all file fragments cause performance problems. Sometimes I/Os are done in such a manner that they are aligned with the file allocations and therefore always fit within a file’s fragments. Most of the time, however, that is simply not the case. When it isn’t the case, a single I/O to process data for an application may have to be split up by the file system into multiple I/Os. Hence the term Split I/O. When the free space gets severely fragmented, this becomes even more likely and accelerates the rate of fragmentation and therefore corresponding Split I/Os. Split I/Os are bad for storage performance. Preventing and eliminating Split I/Os is one of the easiest ways to make a big difference in improving storage performance.

IOPS:

IOPS is the average number of I/O Operations per second that your storage system is being asked to perform. The higher the IOPS, the more work that is being done.

I/O Size:

I/O Size is the average size (in kilobytes) of the I/Os you are performing to your storage system. It is an indication of how efficiently your systems are processing data. Generally, the smaller the I/O size, the less efficiently the data is being processed. Please note that certain applications may simply process smaller I/Os. They tend to be exceptions to the rule, however.
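Both IOPS and I/O Size are simple averages over a time interval. A minimal sketch, assuming the raw inputs are an I/O count and a byte total for one 5-minute bucket (the function names are hypothetical):

```python
def iops(io_count, interval_seconds):
    """Average I/O operations per second over the interval."""
    return io_count / interval_seconds

def average_io_size_kb(total_bytes, io_count):
    """Average size of one I/O in kilobytes (1 KB = 1024 bytes)."""
    return (total_bytes / 1024) / io_count if io_count else 0.0

# One 5-minute bucket: 90,000 I/Os that moved 2,949,120,000 bytes.
print(iops(90_000, 300))                          # 300.0
print(average_io_size_kb(2_949_120_000, 90_000))  # 32.0
```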

I/O Blender Effect Index:

This is a measure of I/Os from multiple systems at the same time that are likely causing performance problems. The problem arises because those I/Os conflict with I/Os from other systems at the same time. When multiple VMs on a single hypervisor are sending I/Os to the hypervisor at the same time, the potential for conflict rears its ugly head. The same is true when multiple systems (physical or virtual) are using shared storage such as SANs. Because this tool collects data from multiple systems in small, discrete, and overlapping periods of time, it is able to estimate contention. By searching for periods of time where performance appears to be suffering and then checking whether any other system is having a potential problem during the same time, the tool can determine statistically that this particular period of time is problematic due to cross node interference. The amount of cross node conflict is taken into consideration, thus creating the index.
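The exact formula behind the index is not given here, so the following is only a rough sketch of the overlap check described above, not the tool's calculation. It assumes each system has a list of I/O Response Time values for the same sequence of buckets, and it treats a bucket as a problem when its value is at or above the median plus one standard deviation. The system names and function names are hypothetical.

```python
from statistics import median, pstdev

def problem_buckets(values):
    """Indexes of buckets at or above median + one standard deviation."""
    threshold = median(values) + pstdev(values)
    return {i for i, v in enumerate(values) if v >= threshold}

def overlapping_problem_buckets(response_times_by_system):
    """Bucket indexes where at least two systems look slow at the same time."""
    counts = {}
    for values in response_times_by_system.values():
        for i in problem_buckets(values):
            counts[i] = counts.get(i, 0) + 1
    return sorted(i for i, n in counts.items() if n >= 2)

series = {
    "vm-a": [2, 2, 9, 2, 2, 2],
    "vm-b": [3, 3, 8, 3, 3, 7],
    "vm-c": [1, 1, 1, 1, 1, 6],
}
print(overlapping_problem_buckets(series))  # [2, 5]
```

Buckets 2 and 5 are flagged because two systems are slow in each of them at once, which is the kind of coincidence that suggests cross node interference.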

Seconds per Gigabyte:

This is a measure of how many seconds it would take to process one gigabyte of data through your storage system using the current I/O Response Time and the current I/O Size. Effectively, this tool calculates the number of potential operations per second at the current I/O Response Time rate. It then divides one gigabyte by the product of potential operations per second times the I/O Size. This can vary widely based on I/O contention, size of I/Os, and several other factors. The lower the value, the better the storage performance.
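The calculation described above can be sketched as follows. It assumes the I/O Response Time is in milliseconds, the I/O Size is in kilobytes, one gigabyte is 1024 × 1024 kilobytes, and I/Os are issued one at a time; the function name is hypothetical.

```python
KB_PER_GB = 1024 * 1024  # assumption: binary gigabyte

def seconds_per_gigabyte(response_time_ms, io_size_kb):
    """Seconds needed to move one gigabyte at the given response time and I/O size."""
    # Potential operations per second at the current response time.
    ops_per_second = 1000.0 / response_time_ms
    # Data moved per second at the current I/O size.
    kb_per_second = ops_per_second * io_size_kb
    return KB_PER_GB / kb_per_second

# 5 ms per I/O and 64 KB per I/O: 200 I/Os per second, 12,800 KB per second.
print(seconds_per_gigabyte(5.0, 64.0))  # 81.92
```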

Reads to Writes Ratio:

This is the number of reads expressed as a percentage of all I/Os (reads plus writes). If you had 5,000 total I/Os and 3,456 were reads (1,544 writes), the ratio would be 69.12%. It shows the workload characteristics of the environment. In other words, it shows whether the application is predominantly read or write intensive. Generally, the potential to optimize performance is greater for read intensive applications.
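The arithmetic in the example above, as a small sketch (the function name is hypothetical):

```python
def read_percentage(reads, writes):
    """Reads as a percentage of all I/Os; 0.0 when there were no I/Os."""
    total = reads + writes
    return 100.0 * reads / total if total else 0.0

print(round(read_percentage(3_456, 1_544), 2))  # 69.12
```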

Memory Utilization:

This is a measure of the percentage of memory being used by your system. Some performance problems may be caused by having limited amounts of available memory. High memory utilization may indicate that one of the bottlenecks to storage performance is inadequate memory for your systems and applications to process data. Having adequate free memory can open doors to potential optimization techniques. Sometimes just increasing the available memory on a system can make a significant difference in overall performance and storage performance specifically.

CPU Utilization:

This is a measure of how busy your CPU is as a percentage. This is overall utilization for the entire system, not just per core or socket. The reason this measure matters is that if your CPU utilization is close to 100%, you probably do not have a storage related issue.

After these metrics, the user is presented with a Conclusion section. This section presents two important pieces of information:

Potential For I/O Performance Optimization
I/O Performance Issues Detected

Potential for I/O Performance Optimization:

This measurement looks at a substantial amount of the data collected and determines how likely it is that your I/O performance can be increased via various optimization techniques without having to acquire more or faster hardware.

It is possible to have Minimal I/O Performance Issues Detected and yet still have really good opportunities to improve performance. For example, you may have plenty of free memory and an I/O profile that lends itself nicely to a sophisticated caching algorithm. This could still have a significant impact on your applications’ performance, because data rates from RAM are much faster than from disks.

It is also possible that you have Critical I/O Performance Issues and yet the profile of your I/O does not easily lend itself to optimization. This is a much less likely scenario than the first.

I/O Performance Issues Detected:

The information in this area categorizes I/O Performance Issues into the following three levels:

Critical
Moderate
Minimal

The Critical I/O Performance Issues level signifies that this tool has found that the selected systems have significant, measurable I/O performance problems for extended periods of time. As a result, it is estimated that your storage and application performance is suffering when you need it to perform at its best.

The Moderate I/O Performance Issues level indicates that you have measurable I/O performance problems, but that they are not significant enough to be Critical. There may be just a few peaks, or peaks that are not that high. You should still look into them and consider whether they may be causing problems when you most need top storage performance. They just aren’t as clearly problematic as they would be at the Critical level.

The Minimal I/O Performance Issues level is effectively a green light. It means that for all intents and purposes your storage is performing well without serious delays. As pointed out above, there may still be potential for I/O Performance Optimization.

Finally, you will have the option to send the collected data to Condusiv for deeper analysis of your current storage performance and the potential for improving it without expensive hardware upgrades.