Some years ago, when we started working with Nutanix, the solution was essentially a stable, user-friendly hyper converged solution offering a less feature-rich version of what is now called the distributed storage fabric. This is what competing solutions typically offer today, and for many customers it isn’t easy to understand the added value Nutanix offers today in comparison to other approaches (I would argue these additions should in fact be a requirement).
Over the years Nutanix has added lots of enterprise functionality like deduplication, compression, erasure coding, snapshots, (a)-sync replication and so on. These features are very useful, scale extremely well on Nutanix and offer VM granular configuration (if you don’t care about granularity, configure them cluster wide by default). However, it is other, maybe less obvious features, or I should say design principles, which should interest most customers a lot:
UPGRADEABLE WITH A SINGLE CLICK
This was introduced a while ago, I believe around version 4 of the product. At first it was mainly used to upgrade the Nutanix software (Acropolis OS or AOS) but today we use it for pretty much anything from the hypervisor to the system BIOS, the disk firmware and also to upgrade sub components of the Acropolis OS. There is for example a standardized system check (around 150 checks) called NCC (Nutanix Cluster Check) which can be upgraded throughout the cluster with a single click independent of AOS. The One-Click process also allows you to apply a granular hypervisor upgrade such as an ESXi offline bundle (this could be a patch release). The Nutanix cluster will then take care of the rolling reboot, vMotion etc. so that everything happens in a fully hyper-converged fashion (e.g. it never reboots multiple nodes at the same time). Compared to a traditional three tier architecture (including converged generation 1) you get a much simpler and well tested workflow, and it is the one you use by default. And yes, it runs automatic prechecks and also ensures that what you are updating is on the Nutanix compatibility matrix. It is also worth mentioning that upgrading AOS (the complete Nutanix software layer) doesn’t require a host reboot since it isn’t part of the hypervisor but installed as a VSA (regular VM). It also doesn’t require any VMs to migrate away from the node/host during or after the upgrade. I love that fact since bigger clusters tend to have some hiccups when using vMotion and similar techniques, especially if you have 100 VMs on a host, not to mention the network impact.
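To make the rolling behaviour concrete, here is a minimal sketch of the idea in Python. It is not Nutanix code: the `Node` class, the `rolling_upgrade` function, the node names and the starting version "4.5" are all hypothetical and only illustrate the two rules described above (run prechecks first, then never take more than one node down at a time).

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical cluster node, for illustration only."""
    name: str
    version: str
    online: bool = True


def rolling_upgrade(nodes, target_version, compatibility_matrix):
    """Upgrade nodes one at a time and return the order they were upgraded in."""
    # Precheck 1: the target must be on the (hypothetical) compatibility matrix.
    if target_version not in compatibility_matrix:
        raise ValueError(f"{target_version} is not on the compatibility matrix")
    # Precheck 2: every node must be online before we start.
    if not all(node.online for node in nodes):
        raise RuntimeError("cluster is not healthy, refusing to upgrade")

    upgraded = []
    for node in nodes:
        node.online = False                      # take exactly one node down
        offline = [n for n in nodes if not n.online]
        assert len(offline) == 1                 # never more than one at a time
        node.version = target_version            # apply the upgrade
        node.online = True                       # node rejoins before we continue
        upgraded.append(node.name)
    return upgraded


cluster = [Node("node-a", "4.5"), Node("node-b", "4.5"), Node("node-c", "4.5")]
order = rolling_upgrade(cluster, "4.6", compatibility_matrix={"4.6"})
```

The point of the sketch is the ordering: the checks run before anything is touched, and the loop only moves on once the previous node is back online.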
Nutanix has several unique capabilities to ensure linear scalability. The key ingredients are data locality, a fully distributed meta data layer and granular data management. The first is important especially when you grow your cluster. It is true that 10G networks offer very low latency, but the overhead counts towards every single read IO, so you should consider the sum of them (and there are a lot of read IOs you get out of every single Nutanix node!). If you look at the development currently ongoing in the field of persistent flash storage, you will see that the network overhead will only become more important going forward. The second key point is the fully distributed meta data database. Every node holds a part of the database (for the most part the meta data belonging to its currently local data, plus replica information from other nodes). All meta data is stored on at least three nodes for redundancy (each node writes to its neighbor nodes in a ring structure; there are no meta data master nodes). No matter how many nodes your cluster holds (or will hold), there is always a defined number of nodes (three or five) involved when a meta data update is performed (a lookup/read is typically local). I like to describe this architecture using Big O notation: the cost of a single meta data update is O(1) with respect to the cluster size, so the total meta data throughput can grow linearly with the number of nodes, and since there are no master nodes there aren’t any bottlenecks at scale. The last key point is the fact that Nutanix acts as an object storage (you work with so-called vDisks), but the objects are split into small pieces (called extents) and distributed throughout the cluster, with one copy residing on the local node and each replica residing on other cluster nodes. If your VM writes three blocks to its virtual disk, they will all end up on the local SSD, and the replicas (for redundancy) will be spread out in the cluster for fast replication (they can go to three different nodes in the cluster, avoiding hot spots).
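The ring idea for meta data can be sketched in a few lines of Python. This is a simplified illustration and not the actual Nutanix implementation: the function name `metadata_replicas` and the rule "owner plus its next neighbors on the ring" are my assumptions, chosen only to show that the number of nodes touched by an update does not depend on the cluster size.

```python
def metadata_replicas(owner_index, cluster_size, replication_factor=3):
    """Return the ring positions that hold one piece of meta data.

    Assumption for illustration: the owning node writes to itself and to
    its next neighbors on the ring until `replication_factor` copies exist.
    """
    if replication_factor > cluster_size:
        raise ValueError("not enough nodes for the requested replication factor")
    return [(owner_index + step) % cluster_size for step in range(replication_factor)]


# The number of nodes involved stays the same for a small and a large cluster.
small = metadata_replicas(owner_index=3, cluster_size=4)
large = metadata_replicas(owner_index=62, cluster_size=64, replication_factor=5)
```

With four nodes, the owner at position 3 wraps around the ring to positions 0 and 1. With 64 nodes and five copies, still only five nodes take part in the update.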
If you move your VM to another node, data locality (for read access) will automatically be rebuilt (of course only for the extents your VM currently uses). You might now think that you don’t want to migrate those extents from the previous node to the now local node. But the extent has to be fetched anyhow, so why not save it locally and serve it directly from the local SSD going forward, instead of discarding it and reading it over the network every single time? This is possible because the data structure is very granular. If you had to migrate the whole vDisk (e.g. VMDK) because this is the way your storage layer saves its underlying data, you simply wouldn’t do it (imagine vSphere DRS migrating your VMs around and your cluster constantly having to migrate the whole VMDK(s)). If you wonder how all this matters when a rebuild (disk failure, node failure) is required, there is good news too! Nutanix immediately starts self healing (rebuilding the lost replica extents) whenever a disk or node is lost. During a rebuild all nodes are potentially used as source and target to rebuild the data. Since extents are used (not big objects), data is evenly spread out within the cluster. A bigger cluster increases the probability of a disk failure, but the rebuild is also faster since a bigger cluster has more participating nodes. Furthermore, a rebuild of cold data (on SATA) will happen directly on all remaining SATA drives within the cluster (it doesn’t use your SSD tier), since Nutanix can directly address all disks (and disk tiers) within the cluster.
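The "fetch once, then serve locally" behaviour after a VM move can be illustrated with a small Python model. The `Cluster` class and its methods are hypothetical, and the model deliberately ignores replicas and tiers. It only shows that the first read of an extent after a move crosses the network and every later read of the same extent does not.

```python
class Cluster:
    """Toy model of extent placement; replicas and tiers are ignored."""

    def __init__(self, node_names):
        self.extents = {name: set() for name in node_names}
        self.network_reads = 0

    def write(self, node, extent_id):
        # New data always lands on the node the VM is running on.
        self.extents[node].add(extent_id)

    def read(self, node, extent_id):
        if extent_id in self.extents[node]:
            return "local"
        holders = [n for n, held in self.extents.items() if extent_id in held]
        if not holders:
            raise KeyError(extent_id)
        # The extent has to cross the network anyway, so keep it locally
        # instead of throwing it away after serving the read.
        self.network_reads += 1
        self.extents[holders[0]].discard(extent_id)
        self.extents[node].add(extent_id)
        return "remote"


cluster = Cluster(["node-a", "node-b"])
cluster.write("node-a", "extent-1")              # VM runs on node-a
first = cluster.read("node-b", "extent-1")       # VM was moved to node-b
second = cluster.read("node-b", "extent-1")      # same extent, read again
```

Only extents the VM actually reads are moved, which is why this works at extent granularity but would be impractical for a whole virtual disk.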
Thanks to data locality a large portion of your IOs (all reads, which can be 70% or more of the total) are served from local disks and therefore only impact the local node. While writes are replicated for data redundancy, the replica writes have second priority over the local writes of the destination node(s). This gives you a high degree of predictability: you can plan with a certain number of VMs per node and be confident that this will be reproducible when adding new nodes to the cluster. As I mentioned above, the architecture neither reads all data constantly over the network nor uses meta data master nodes to track where everything is stored. Looking at other hyper converged architectures, you won’t get that kind of assurance, especially when you scale your infrastructure and the network can’t keep up with all the read IOs and meta data updates going over it. With Nutanix a VM can’t take over the whole cluster’s performance. It will have an influence on other VMs on the local node since they share the local hot tier (SSD), but that’s much better than today’s noisy neighbor and IO blender issues with external storage arrays. If you have too little local hot storage (SSD), your VMs are allowed to consume remote SSD with secondary priority over the other node’s local VMs. This means no more data locality for that data, but it is better than accessing local SATA instead. Once you move some VMs away or the load on the VM gets smaller, you automatically get your data locality back. As described further down, Nutanix can tell you exactly which virtual disk uses how much local (and possibly remote) data, so you get full transparency there as well.
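The placement preference for hot data described above (local SSD first, then remote SSD, and local SATA only as a last resort) can be written down as a simple decision function. This is my own simplified reading of that paragraph, not Nutanix logic: the function name, the tier labels and the capacity-based check are assumptions for illustration.

```python
def choose_hot_tier(needed_gb, local_ssd_free_gb, remote_ssd_free_gb):
    """Pick where hot data should live, in order of preference."""
    if local_ssd_free_gb >= needed_gb:
        return "local-ssd"        # best case: full data locality
    if remote_ssd_free_gb >= needed_gb:
        return "remote-ssd"       # no locality, but still flash
    return "local-sata"           # last resort: local but slow


enough_local = choose_hot_tier(needed_gb=50, local_ssd_free_gb=200, remote_ssd_free_gb=500)
local_full = choose_hot_tier(needed_gb=50, local_ssd_free_gb=10, remote_ssd_free_gb=500)
all_ssd_full = choose_hot_tier(needed_gb=50, local_ssd_free_gb=10, remote_ssd_free_gb=20)
```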
I think it is known that hyper converged systems offer very high storage performance. Not much to add here other than to say that it is indeed extremely fast compared to traditional storage arrays. And yes, a full flash Nutanix cluster is as fast as (if not faster than) an external full flash storage array, with the added benefit that you read from your local SSD and don’t have to traverse the network/SAN to get to the data (that and of course all the other hyper convergence benefits). Performance was the area where Nutanix had the most focus when releasing 4.6 earlier this year. The great flexibility of working with small blocks (extents) rather than the whole object on the storage layer comes at the price of much greater meta data complexity, since you need to track all these small entities throughout the cluster. To my understanding Nutanix invested a great deal of engineering to make their meta data layer extremely efficient, to the point of beating the performance of an object based implementation. As a partner we regularly conduct IO tests in our lab and at our customers, and it was very impressive to see how all existing customers could benefit from 30-50% better performance by simply applying the latest software (using one-click upgrade of course).
Since Nutanix has full visibility into every single virtual disk of every single VM, it also has lots of ways to optimize how it deals with our data. This is not only the simple random vs sequential way of processing data; it also makes it possible to prevent one application from taking over all system performance and letting others starve (to name one example). During a support case we can see all sorts of crazy information (I have a storage background so I can get pretty excited about this), like where exactly your application consumes its resources (local or remote disks), what block size is used, whether the IO is random or sequential, the working set size (hot data) and lots more. All with single virtual disk granularity. At some point they were even thinking about making a tool which would look inside your VM and tell you which files (actually at sub file level) are currently hot, because the data is there and just needs to be visualized.
If you take a look at the upcoming functionality I wrote about further down, you can see just some examples of what is possible due to the very extensible and flexible architecture. Nutanix isn’t a typical infrastructure company but more comparable to how Google, Facebook and others engineer and build their data centers. Nutanix is a software company following state of the art design patterns and using modern frameworks, something I was missing when working with traditional infrastructure. For about a year now they have been heavily extending what they call the app mobility fabric, which sits on top of the distributed storage fabric I mentioned above. This layer allows you to move workloads between local hypervisors (currently KVM<->ESXi) and soon between private and public cloud as well. You can for example use KVM based Acropolis Hypervisor clusters for all your remote offices to get rid of high vSphere licensing costs without losing the main functionality, and replicate the VMs to a central vSphere based cluster. The replicated VMs can then be started on vSphere and Nutanix takes care of the conversion. The hypervisor is a commodity just like your x86 servers.
When Nutanix released version 1 of its hyper converged product in 2011, it was a great idea and a good implementation of the same. Most people in IT didn’t however expect that it would become the approach with the highest focus throughout the industry. Today the largest players in IT infrastructure push their hyper converged products and solutions more than any other, and while there are still other less radical approaches (e.g. external all flash storage), it is foreseeable that they will become less and less important for the bulk of IT projects. Nutanix is the leader in the hyper convergence space, but having converged storage within your x86 commodity compute layer is far from the only thing Nutanix has done since then. Their own included hypervisor is a pretty interesting alternative for all those who don’t want to spend lots of dollars on vSphere licenses. While it will not yet suit all of your use cases, you might actually be surprised at how much of the vSphere functionality you care about today (distributed switch, host profiles, guest customization, HA etc.) is already included out of the box, with the added value of greatly reduced complexity (yes, I am calling vSphere complex compared to Nutanix Acropolis Hypervisor).
Since Nutanix is purchased solely as an appliance solution (even though they are only making the software on top), you are always dealing with a pretested, preconfigured solution stack. You do have a choice when it comes to memory, CPU, disk and GPU, and you get to select from three hardware providers (Nutanix directly, DELL and Lenovo), but they are all predefined options. This makes it possible to guarantee a high level of stability and fast resolution of support cases. As a Nutanix partner this is worth a lot, since the experience we get from one customer is valid for any other customer as well. It also allows us to be very efficient and consistent when implementing or expanding the solution, since we can put standardized processes in place to reduce possible issues during implementation to a minimum. Once the Nutanix hardware is rack mounted at the customer, their software automatically installs the hypervisor of choice (KVM, Hyper-V or ESXi) and configures all necessary variables (IP addresses, DNS, NTP etc.). This is done by the cluster itself: the nodes stage each other over the local network.
AND LAST BUT NOT LEAST: WITH OUTSTANDING SUPPORT
The support we get from Nutanix is easily the best of all the vendors we work with. If you open a case you speak directly to an engineer who can help quickly and efficiently. Our customers sometimes open support cases directly (not through us) and so far the feedback has been great. One interesting aspect is the VMware support we receive from Nutanix even if the licenses are not sold by them directly. They analyze all ESXi/vCenter logs we send them. If the bug isn’t storage related we also open a case with VMware to continue investigating. They do have the possibility to engage with VMware themselves by opening a support case directly (Nutanix->VMware), which we saw on multiple occasions. The last case we witnessed was a non-responsive hostd process (vCenter disconnects) where the first log analysis by Nutanix pointed out a possible issue with the Active Directory Integration Service. We then opened a VMware case which was handled politely, but after two weeks, when there wasn’t much progress other than collecting logs and more logs, we remembered what the Nutanix engineer had suggested, and there was our solution. Disabling Active Directory Integration did the trick. I wouldn’t say VMware support isn’t good, but we are always glad that Nutanix takes a look at the logs as well, because at the end of the day you are just happy if you can move on and work on other things, not support cases.
Note: I strongly encourage you to take a look at the Nutanix Bible (nutanixbible.com) where all mentioned aspects and many more are described in great detail.
ROOM FOR IMPROVEMENT
Nutanix has the potential to replace most of today’s traditional storage solutions. These are classic hybrid SAN arrays (dual and multi controller), NAS Filers, newer All-Flash Arrays as well as any object, big data etc. use cases.
For capacity it usually comes down to the price for large amounts of data, where Nutanix may offer higher than needed storage performance at a price point which isn’t very attractive. This has been addressed in a first step using storage only nodes, which are essentially an intelligent disk shelf (mainly SATA) with its own virtual SDS appliance preinstalled. Storage nodes are managed directly by the Nutanix cluster (the hypervisor isn’t visible and no hypervisor license is necessary). While this is going in the right direction, larger storage nodes are needed to better support “cheap, big storage” use cases. For typical big data use cases today’s combined compute and storage nodes (plus optionally storage only nodes) are already a very good fit!
The Nutanix File Services (a filer with Active Directory integration) are a very welcome addition which customers get with a simple software upgrade. Currently this is available as a tech preview to all Acropolis Hypervisor (AHV) customers and will soon be released for ESXi as well. This is one example of a service running on top of the Nutanix distributed storage fabric, well integrated with the existing management layer (Prism) and offering native scale out capabilities and One-Click upgrade like everything else. The demand from customers for a built-in filer is big; they are looking to no longer depend on legacy filer technology. We are looking forward to seeing this technology mature and offer more features over the coming months and years.
Another customer need is to be able to consume Nutanix storage from outside the cluster for other, non-Nutanix workloads. These could include bare metal systems as well as non-supported hypervisors (e.g. XenServer etc.). This functionality (called Volume Groups) is already implemented and available for use by local VMs (e.g. Windows Failover Cluster Quorum) and will soon be qualified for external access (it already works from a technical point of view, including MPIO multipathing with failover). It will be interesting to see if Nutanix will allow active-active access to such iSCSI LUNs (as opposed to the current active-passive implementation) with the upcoming release(s). Imagine you upgraded your Nutanix cluster (again this would be a simple One-Click software upgrade) and all of a sudden you had a multi-controller, active-active (high-end) storage array. (Please note that I am not a Nutanix employee and that these statements describing possible future functionality are to be understood as speculation on my side which might never become officially available.)
Author: Samuel Rothenbühler