NTFS allocation units and PowerMax storage provisioning and performance considerations


Welcome to my blog and thanks for visiting! For a while I had been contemplating starting a blog, and I finally came across a topic that certainly warrants writing one. I hope you find the content here useful, and I would appreciate any comments you may have regarding this topic or any other topics you would like me to cover.

NTFS has been the de facto file system for Microsoft Windows for many years. It supports very large capacities, many different types of devices, and several format options suitable for many different applications. It also provides resiliency, security, and access control capabilities, among many other useful features. Starting with Windows Server 2012, Windows has also leveraged storage array capabilities for Offloaded Data Transfer (ODX), using the underlying storage array's copy capabilities instead of host-based copies. The Dell EMC PowerMax family is an industry-leading storage system with a Non-Volatile Memory Express (NVMe) back end for PCI Express (PCIe)-based access to Non-Volatile Memory (NVM) media, which includes modern NAND-based flash along with high-performance storage class memory (SCM) media. Dual-ported NVMe drives with flash-optimized protocol access provide very low latencies and extremely high I/O densities for mission-critical applications. In Q3 2019, Dell EMC enhanced PowerMax systems to support end-to-end NVMe, SCM drives, and 32 Gb Fibre Channel (FC) NVMe to offer an unprecedented level of performance for host applications. PowerMax also offers storage-based local and remote snapshots and D/R copies.
The PowerMax storage system is a fully thin-provisioned storage system that allows application storage consumption to grow on demand while offering ease of provisioning, high performance, superior data services, and an excellent degree of storage efficiency through inline compression and data deduplication. Thin devices are sets of pointers to capacity allocated at 128KB extent granularity in the back-end storage resource pool (SRP); to the host, however, they look and respond just like regular LUNs. Thus, all I/O on PowerMax devices and the underlying data services operate on multiples of this 128KB track size, and optimal performance is achieved when the host I/O subsystem also operates on partitions aligned with the 128KB track size.

I will go over some basics of NTFS allocation and the latest advances in NTFS that make it even more friendly to thin devices on PowerMax. The topics covered here include:
  • NTFS Allocation Units and PowerMax storage provisioning considerations
  • Use cases dependent on aligned NTFS allocation unit and PowerMax device track boundaries
  • Windows Server 2019 and the solution to address the misalignment issue
  • Considerations for using 128K NTFS allocation unit size for various use cases
      o SQL Server Performance considerations
      o PowerMax Data Reduction for Windows NTFS
      o Windows ODX performance
      o Rapid provisioning of virtual machines using Microsoft System Center Virtual Machine Manager (SCVMM) and ODX
  • Conclusion
  • References and Further Readings

NTFS Allocation Units and PowerMax storage provisioning overview
NTFS allocation units were designed with the diverse types of direct-attached storage media in mind, along with the allocations required for the many different types and sizes of files they support. While supporting these allocations, the designers also considered fragmentation, which wastes storage capacity and requires expensive defragmentation operations to free up space. Accordingly, NTFS has supported allocation units of 4096 bytes and multiples thereof, such as 8192 bytes, 16K, 32K, and 64K. The general recommendation for direct-attached storage is to simply use the default size, which is 4K. Whenever a device presented to Windows is formatted, users can choose the appropriate allocation size for the application. Here is a screenshot of the Disk Management console, followed by a PowerShell cmdlet, where an allocation unit size of 65536 refers to 64K.
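For illustration, here is a minimal PowerShell sketch of formatting a freshly presented device with a 64K allocation unit; the disk number, drive letter, and volume label are placeholders for your environment.

    # Minimal sketch, assuming the device appears as disk 2 and E: is a free drive letter
    # 65536 bytes = 64K allocation unit
    New-Partition -DiskNumber 2 -UseMaximumSize -DriveLetter E |
        Format-Volume -FileSystem NTFS -AllocationUnitSize 65536 -NewFileSystemLabel "Data_64K"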

External smart storage systems such as PowerMax abstracted multiple physical disk drives, offering RAID protection and flexibility in the storage sizes that hosts see and use, just like locally attached storage. While offering ease of management and growth for applications, this storage provisioning model also delivered high performance and recovery from drive failures, and met the demands of many different applications. Here is a schematic of PowerMax storage provisioning and the relevant components.



Virtual Provisioning (also known as Thin Provisioning) abstracted this even further and introduced a "grow as you go" model for storage provisioning. The thin devices, or host LUNs, are just pointers into back-end storage resource pools, allowing expansion of storage capacity, different service levels at sub-LUN granularity, and the ability to choose different media types, such as solid-state drives and now even storage class memory, for a single application. Although PowerMax can support different allocation unit sizes and work with applications using different block sizes, for best application performance when using LUNs presented this way, alignment of the allocation unit size with the PowerMax track size is important. Our best practice so far has been to choose a 64K or larger allocation unit size, as it serves Online Transaction Processing (OLTP), Decision Support System (DSS), and other application workload profiles very well.

When using thin provisioning, actual storage capacity is allocated only when applications write to the device, so simply formatting the device does not consume the specified capacity. SQL Server data and log files use different I/O sizes; OLTP applications use smaller I/O sizes, whereas backup and DSS workloads use large sequential I/Os. But all these different workloads can be served effectively by standardizing the allocation unit size when using PowerMax storage systems.

As virtual provisioning offers storage efficiency anyway, there is no reason to keep too many devices around; device sizes keep growing, with many applications using a smaller number of larger devices for ease of manageability.

Use cases dependent on aligned NTFS allocation unit and PowerMax device track boundaries

VMAX All Flash and PowerMax storage systems use a track size of 128K, so all back-end allocations, I/O activities, data reduction, and other data services operate on 128K tracks. Because the NTFS maximum allocation unit size has historically been 64K, Windows NTFS I/O can be misaligned with track boundaries. The implications of this misalignment are:
  • Deduplication uses a hash key generated from a 128K track write, as shown in the figure below. Even two identical write requests from Windows servers using a 64K allocation unit may not match, depending on how each request aligns with the 128K track boundary. So the first copy of the data may not dedupe, but subsequent copies will benefit from deduplication, as those tracks will match one of the prior writes. Additional allocation takes place in the meantime, making the data copy a little less efficient.

  • An ODX copy request might be rejected and reverted to a host copy if the source and target starting tracks do not align at the 128K track boundary, and the extents involved in the read and write requests might not be completely aligned either (see the alignment check sketch after this list). This behavior can result in only partial success for ODX copies and hence degraded copy performance. The highlighted portion in the figure below, from the "procmon" output for an ODX file copy, shows an ODX copy request that failed due to misalignment.

  • Microsoft System Center Virtual Machine Manager (SCVMM), shown below, allows predefined, template-based rapid deployment of a Hyper-V cloud using pre-formatted VMAX/PowerMax storage devices. VM deployment performance will not be consistent whenever ODX copies are rejected due to track misalignment, as the copy reverts to a host copy for the requests that fail.

  • Unless compression and deduplication are used, extraneous storage allocation on the back end may occur due to the discrepancy between host write sizes and PowerMax track sizes. Without PowerMax data reduction, 64K Windows writes will still result in 128K allocations on the PowerMax back end. Also, more storage system resources will be consumed for write coalescing to utilize the available track allocation effectively, and depending on the type of write (random or sequential), that may not always be feasible.
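To make the alignment requirement concrete, here is a small, hypothetical PowerShell helper that checks whether a copy extent starts and ends on the 128K track boundary; the offsets and lengths are illustrative only.

    # Hypothetical helper: both the starting offset and the ending offset of an extent
    # must fall on a 128K track boundary for the request to be fully aligned
    $TrackSize = 128KB
    function Test-TrackAligned {
        param([long]$OffsetBytes, [long]$LengthBytes)
        (($OffsetBytes % $TrackSize) -eq 0) -and ((($OffsetBytes + $LengthBytes) % $TrackSize) -eq 0)
    }
    Test-TrackAligned -OffsetBytes 64KB  -LengthBytes 256KB   # False: extent starts mid-track (64K offset)
    Test-TrackAligned -OffsetBytes 128KB -LengthBytes 256KB   # True: start and end on track boundaries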


Windows Server 2019 and the solution to address the misalignment issue
Starting with Windows Server 2019, NTFS allows allocation unit (cluster) sizes beyond the previous maximum of 64K, up to 2M. So now 128K, 256K, 512K, 1M, and 2M allocation unit sizes are available with the Windows Server 2019 release.

Following suit with other platforms such as Oracle and VMware that already work well with larger allocation unit sizes, our recommendation is now for NTFS users to start using larger allocation unit sizes as well. I will cover various aspects of moving from the current recommendation of a 64K allocation unit to 128K in the rest of this document. The following figure shows the new allocation unit sizes using Disk Management as well as PowerShell. The allocation unit size 131072 refers to 128K.
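As a minimal sketch, assuming Windows Server 2019 or later and that the new device appears as disk 3 with F: free, the same PowerShell cmdlets now accept the larger sizes; 131072 bytes corresponds to the 128K allocation unit.

    # Hedged sketch: disk number, drive letter, and label are placeholders
    # 131072 bytes = 128K allocation unit, matching the PowerMax 128K track size
    New-Partition -DiskNumber 3 -UseMaximumSize -DriveLetter F |
        Format-Volume -FileSystem NTFS -AllocationUnitSize 131072 -NewFileSystemLabel "Data_128K"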



Once the file system is formatted, NTFSINFO shows the new cluster size of 131,072, as shown below.
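If you prefer a built-in check, the fsutil utility reports the same information; the drive letter below is just an example.

    # Built-in alternative to NTFSINFO; G: is an example drive letter
    fsutil fsinfo ntfsinfo G: | Select-String "Bytes Per Cluster"
    # Expected for a 128K-formatted volume:  Bytes Per Cluster : 131072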


Considerations for using 128K NTFS allocation unit size for various use cases

Now I will cover some considerations for using the 128K allocation unit size and explain why it is the right choice for applications deployed on PowerMax.



SQL Server Performance considerations
Here I used the example of SQL Server and compared the performance of a SQL Server OLTP application running with 64K and 128K allocation units. In both cases the SQL Server I/O sizes remained the same, so other than the difference in allocation unit at the storage/file system level, there was no difference in the application's performance profile. I ran a 1 TB SQL Server TPC-C workload on 128K- and 64K-formatted data and log volumes and collected "perfmon" data during the runs. As we can see, the 128K-formatted SQL Server database reported higher maximum and average IOPS than the 64K-formatted volumes. This was also reflected in SQL Server Batch Requests/sec, which was higher for the 128K-formatted SQL Server data volumes. So, using a 128K allocation unit improves, or at least maintains, the performance of a SQL Server OLTP application.
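For reference, here is a hedged sketch of how the relevant disk and SQL Server counters can be sampled with PowerShell instead of the perfmon UI; the counter paths assume a default SQL Server instance, and the sampling interval and duration are examples.

    # Hedged sketch: sample disk IOPS and SQL Server batch rate for 5 minutes at 5-second intervals
    $counters = @(
        '\LogicalDisk(_Total)\Disk Transfers/sec',
        '\SQLServer:SQL Statistics\Batch Requests/sec'   # default instance; named instances use MSSQL$<name>
    )
    Get-Counter -Counter $counters -SampleInterval 5 -MaxSamples 60 |
        ForEach-Object { $_.CounterSamples | Select-Object Path, CookedValue }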

PowerMax Data Reduction
PowerMax Data Reduction works very well when using the 128K allocation unit size on NTFS. Whenever a host write takes place, the PowerMax-generated hash key corresponds to an entire allocation unit on the NTFS file system extent. As a result, a subsequent matching write, such as a host copy from that device to another, will be completely deduplicated and no storage allocation will take place.

In my example below, I used a 2 TB source device with about 1.5 TB of data, and compression was enabled on that storage group to turn on data reduction. The nature of the files resulted in 1.5:1 compression from all the data reduction techniques that PowerMax incorporates.

The source file system was then copied in its entirety to a file system on another 2 TB target device. As we can see, there was no additional allocation due to this copy. Subsequent copies will continue to realize the same deduplication benefits, so further storage efficiency is achieved even with multiple host-based copies.
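For completeness, here is a hedged sketch of the kind of Solutions Enabler check that shows back-end allocation for the thin devices involved; the SID and device range are placeholders, and the exact options can vary by Solutions Enabler version.

    # Hedged sketch: list thin-device allocations before and after the host copy
    # 0123 and 00A1:00A2 are placeholder SID and device IDs
    symcfg list -tdev -sid 0123 -devs 00A1:00A2
    # After a fully deduplicated copy, the target device should show little or no new allocated tracks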


Windows ODX performance

Here I measured Windows ODX performance and monitored the copy using "procmon" to determine the success rate of ODX. The tests measured the file copy time using a PowerShell script. Source and target devices were formatted with 64K and 128K allocation units; prior to each test, brand-new devices were created and formatted, and the file system cache was completely freed to ensure a clean starting point.
It should be noted that an ODX copy is accepted by PowerMax only when the request contains source and target extents fully aligned at the 128K track boundary, and the total number of blocks included in the copy request must also result in fully aligned ending extents.
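The timing harness itself can be as simple as the following hedged PowerShell sketch; the file name and drive letters are placeholders, and on ODX-capable arrays Windows offloads the Copy-Item transfer automatically.

    # Hedged sketch of the copy timing test; S: and T: are the formatted source and target volumes
    $source  = 'S:\testdata\largefile1.vhdx'
    $target  = 'T:\'
    $elapsed = Measure-Command { Copy-Item -Path $source -Destination $target }
    '{0:N1} seconds' -f $elapsed.TotalSeconds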



The following table shows the ODX copy performance timing. I also ran the tests with ODX disabled to show the effect of ODX on copy performance. As we can see, the best copy performance is achieved when both source and target devices are 128K formatted, as that results in a fully aligned copy from the source to the target file system.
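For the runs with ODX disabled, I relied on the Microsoft-documented registry setting that turns ODX off system-wide; a hedged sketch follows (set the value back to 0 to re-enable ODX).

    # FilterSupportedFeaturesMode: 1 = ODX disabled, 0 = ODX enabled (default)
    Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' `
        -Name FilterSupportedFeaturesMode -Value 1
    # Re-run the copy test, then restore the value to 0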



Most current deployments use 64K source allocation units, and a copy from a 64K-formatted source to a 128K-formatted target device will depend on the alignment. In this case the target start and end blocks are fully aligned with 128K track boundaries, so the success rate is considerably higher than simply going from a 64K source to a 64K target. But as the alignment depends entirely on the relative NTFS cluster position of the files being copied, your mileage may vary. When there is a mix of a 64K source and a 128K target allocation unit size, the ODX request might be accepted when a file is copied standalone, whereas the same file copied along with other files in the file system might be rejected, or vice versa, depending on source- or target-side misalignment.




Once the ODX copy to a 128K target is done, all subsequent copies to other 128K-formatted target file systems will get the highest copy performance, since at that point both source and target devices are 128K formatted. These copies will also realize the excellent data reduction benefits that PowerMax provides, so no additional storage capacity allocation will be needed.



Rapid provisioning of VMs using SCVMM and ODX

As ODX and data deduplication both work very effectively when using an NTFS allocation size of 128K, the rapid-provisioning-of-VMs use case with SCVMM will provide consistent results across multiple VMs. OS images stored in the VM template are copied to target VMs very quickly, and because every such copy is 100% track-aligned with the source template, no additional storage is consumed and all newly created VMs share the same copy of the guest operating system.


Here are the high-level steps to accomplish rapid provisioning of a VM (a hedged SCVMM PowerShell sketch follows the list):

  1. Start with a brand-new Windows VM, create a template from it, and store the template on a share created on a PowerMax device formatted with a 128K allocation unit.
  2. Format the PowerMax device to be used for the target VM with a 128K allocation unit and use the template created in step 1 to deploy the VM.
  3. Using "procmon", we can see that an ODX copy operation was used to copy the VHDX file to the target device.
  4. Running Solutions Enabler (or Unisphere for PowerMax), we can see that even though the file copy completed, there was no additional allocation for this VM image. All the target devices share their allocations with the source device containing the template VHDX.
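For readers who script these steps, here is a heavily hedged SCVMM PowerShell sketch of step 2; the template, host, VM, and path names are placeholders, and the exact parameter sets may vary by SCVMM version.

    # Hedged sketch: deploy a VM from a template stored on a 128K-formatted PowerMax device
    Import-Module virtualmachinemanager
    $template = Get-SCVMTemplate -Name 'Win2019-Template-128K'      # placeholder template name
    $vmHost   = Get-SCVMHost -ComputerName 'hyperv-host01'          # placeholder Hyper-V host
    $config   = New-SCVMConfiguration -VMTemplate $template -Name 'SQLVM01'
    Set-SCVMConfiguration -VMConfiguration $config -VMHost $vmHost -VMLocation 'V:\VMs'   # V: is 128K-formatted
    New-SCVirtualMachine -Name 'SQLVM01' -VMConfiguration $config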

Conclusion

Windows Server 2019 introduced support for 128K and larger allocation units in NTFS, the file system commonly used for SQL Server and other Windows-based applications running on PowerMax. Using a 128K NTFS allocation unit complements the 128K track size that PowerMax uses for storage allocation and for all its data services (snapshots, ODX), and it works very well with the PowerMax storage efficiency features, compression and data deduplication.



The benefits of using a 128K allocation unit on NTFS include:

(1) Better or on-par performance for Windows applications compared to the current 64K allocation practice.

(2) PowerMax deduplication is based on hash keys generated on 128K track writes, so it works much better with NTFS when using 128K-aligned I/O. Now even the very first copy of the data will be deduplicated, as all hash keys for the source and target devices will match.

(3) Offloaded Data Transfer (ODX) copies work well when both source and target devices are aligned at 128K boundaries. Even existing devices with NTFS formatted at a 64K allocation unit can achieve higher ODX copy success rates, as the targets will already be aligned to 128K boundaries.

(4) Rapid provisioning of virtual machines using System Center Virtual Machine Manager will exhibit consistent performance, as the data copy leverages ODX and PowerMax deduplication, resulting in faster VM deployments. Copies will not only be faster, they will also not result in additional allocations.

References and further readings
(1)    NTFS Overview
