Storage Caching vs Tiering Part 1

Recently I had the privilege of being a Tech Field Day delegate. Tech Field Day is organized by Gestalt IT; if you want more detail on Tech Field Day, visit right here. In the interest of full disclosure, the vendors we visit sponsor the event, but the delegates are under no obligation to review the sponsoring companies favorably or unfavorably.

The first place hosting the delegates was NetApp. I have worked with several different storage vendors, but I must admit I had never really experienced NetApp before, except for Storage vMotioning virtual machines from an old NetApp (I don’t even know the model) to a new SAN.

Over the four hours of slide shows I learned a ton. One great topic was Storage Caching vs Tiering. Some of the delegates have already blogged about the sessions here and here.

So I am going to give my super quick summary of caching as I understood it from the NetApp session, followed by a post about tiering as I learned it from our subsequent session with Avere.

1. Caching is superior to Tiering because Tiering requires too much management.
2. Caching outperforms tiering.
3. Tiering drives cost up.

The NetApp method is to use really fast flash memory to speed up the performance of the SAN. Their software attempts to predict what data will be read and keeps that data available in the cache, which “front-ends” a giant pool of SATA drives. The cache cards provide the performance, and the SATA drives provide a single large pool to manage. With a simplified management model and just one type of big disk, the cost is driven down.
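To make the idea concrete, here is a tiny toy model of a read cache front-ending a slow disk pool. This is not NetApp's algorithm (their software predicts which blocks will be read); it is just a plain LRU stand-in, and the latencies, cache size, and workload skew below are made-up numbers for illustration.

```python
from collections import OrderedDict
import random

# Toy read cache "front-ending" a big SATA pool. All numbers are illustrative.
FLASH_MS, SATA_MS = 0.2, 8.0     # assumed per-read service times
CACHE_BLOCKS = 1000              # how many blocks fit in the flash cache

cache = OrderedDict()
hits = misses = 0

def read(block):
    """Serve a read from flash if cached, otherwise from the SATA pool."""
    global hits, misses
    if block in cache:
        hits += 1
        cache.move_to_end(block)      # recently read blocks stay "hot"
        return FLASH_MS
    misses += 1
    cache[block] = True
    if len(cache) > CACHE_BLOCKS:
        cache.popitem(last=False)     # evict the coldest block
    return SATA_MS

# Skewed workload: 90% of reads land on 5% of a 100,000-block volume.
reads = 50_000
total_ms = sum(
    read(random.randint(0, 5_000) if random.random() < 0.9
         else random.randint(0, 100_000))
    for _ in range(reads))
print(f"hit rate {hits / reads:.0%}, average read {total_ms / reads:.2f} ms")
```

The point of the toy: with a skewed working set, a cache that is a small fraction of the volume absorbs most of the reads, so the average latency lands much closer to flash than to SATA.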

My Takeaway on Tierless Caching

This is a solution that has a place and would work well in many situations, but it is not the only solution. All in all the presentation was very good. The comparisons against tiering, however, were really set up against a “straw man.” A multi-device tiered solution requiring manual management of all the different storage tiers is of course a really hard solution to live with; it could cost more to obtain and could be more expensive to manage. I asked about fully automated virtual tiering solutions, the kind that manage your “tiers” as one big pool. These solutions would seem to solve the problem of managing tiers of disks while keeping the cost down. The question was somewhat deflected because these solutions move data on a schedule. “How can I know when to move my data up to the top tier?” was the question posed by NetApp. Of course this is not exactly how a fully automated tiering SAN works, but it is a valid concern.

My Questions for the Smart Guys:

1. How can NetApp’s caching software make better or worse placement decisions than the tiering software from companies that have been doing this for several years?
2. If tiering is so bad, why does Compellent’s stock continue to rise in anticipation of an acquisition by someone big?
3. Would I really want to pay NetApp-sized money to send my backups to a NetApp pool of SATA disks? Would I be better off with a more affordable SATA solution for backup to disk, even if I have to spend slightly more time managing the device?

Fast Don’t Lie – Tech Field Day

Apologies to the new Adidas Basketball YouTube campaign; I am going to steal their title for this post.

Time has flown by and it is now time to get going to Gestalt IT’s Tech Field Day. Thursday and Friday will be full of some pretty exciting companies. I have some familiarity with three of them: Solarwinds, NetApp and Intel, but I am still excited to get some in-depth information from them.

Then there are Aprius, Avere Systems, Actifio, and Asigra, companies I have never really heard anything about, so it will be interesting to see what they do and how it fits into my perspective as a virtualization dude.

For now I have one question on my list (I will come up with others): Is it fast? Watch the videos, because when we talk about the cloud, fast don’t lie.

I’m Fast

I’m Fast 2

Fast Don’t Lie

Equallogic, VAAI and the Fear of Queues

Previously I posted on how using bigger VMFS volumes helps Equallogic reduce its scalability issues with total iSCSI connections. A commenter asked whether this means we can have a new best practice for VMFS size. I quickly said, “Yeah, make ’em big or go home.” I didn’t really say that, but something like it. The commenter then responded with a long statement from Equallogic saying VAAI only fixes SCSI locking and that all the other issues with bigger datastores still remain. ALL the other issues being “queue depth.”

Here is my order of potential IO problems with VMware on Equallogic:

  1. Being spindle bound. You have an awesome virtualized array that will send IO to every disk in the pool or group. Unlike some others, you can take advantage of a lot of spindles. Even then, depending on the types of disks, some IO workloads are going to use up all your available IO.
    Solution(s): More spindles is always a good solution if you have an unlimited budget, but that is not always practical. Put some planning into your deployment. Don’t just buy 17TB of SATA; get some faster disk, break your group into pools, and separate the workloads onto storage better suited to their IO needs.
  2. Connection limits. The next problem you will run into, if you are not having IO problems, is the total number of iSCSI connections. In an attempt to get all of the IO you can from your array, you have multiple vmkernel ports using MPIO. This multiplies the connections very quickly, and when you reach the limit, connections drop and bad things happen.
    Solution: The new 5.02 firmware increases the maximum number of connections. Additionally, bigger datastores mean fewer connections. Do the math.
  3. Queue depth. There are queues everywhere: the SAN ports have queues, each LUN has a queue, and the HBA has a queue. I will defer to this article by Frank Denneman (a much smarter guy than myself): a balanced storage design is the best course of action.
    Solution(s): Refer to problem 1. Properly designed storage is going to give you the best protection against any potential (even if unlikely) queue problems. In your great storage design, make room for monitoring. Equallogic gives you SAN HQ: USE IT!!! See how the front-end queues are doing on all your ports, and use esxtop or resxtop to see how the queues look on the ESX host. Most of us will find that queues are not a problem when problem 1 is properly taken care of. If you still have a queuing problem, then go ahead and make a new datastore. I would also ask Equallogic (and others) to release a Path Selection Policy plugin that uses a least-queue-depth algorithm (or something smarter); a rough sketch of that idea follows this list. It would help a lot.
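Here is what I mean by a least-queue-depth policy, as a minimal sketch. The path names and queue counts are made up, and a real Path Selection Policy would of course live inside the ESX pluggable storage architecture rather than in a script; this just shows the decision it would make.

```python
# Rough sketch of a "least queue depth" path selection idea: instead of strict
# round-robin, send the next IO down whichever path currently has the fewest
# outstanding commands. Path names and queue counts below are made up.

outstanding = {
    "vmhba33:C0:T1:L0": 12,   # busy path
    "vmhba33:C1:T1:L0": 3,    # lightly loaded path
    "vmhba34:C0:T1:L0": 7,
}

def pick_path(queues):
    # choose the path whose queue is shortest right now
    return min(queues, key=queues.get)

next_path = pick_path(outstanding)
outstanding[next_path] += 1       # the new IO joins that path's queue
print(f"issue next IO on {next_path}")
```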

So I will repeat my earlier statement that VAAI allows you to make bigger datastores and house more VMs per datastore. I will add a caveat: if you have a particular application that needs a high IO workload, give it its own datastore.

Gestalt IT – Tech Field Day

I am honored to be included in the upcoming Gestalt IT Field Day. It looks like a great group from the community will be in attendance, and I am looking forward to the collection of presenters. With how busy I have been delivering solutions lately, it will be really good to dedicate some time to learning what is new and exciting. I plan to take good notes and share my thoughts here on the blog. For more information on the Field Day, check it out right here: http://bit.ly/ITTFD4

Random picture of my dog.

How VAAI Helps Equallogic

I previously posted about the limits on iSCSI connections when using Equallogic arrays and MPIO. If you have lots of datastores and lots of ESX hosts with multiple paths, the number of connections multiplies pretty quickly. Now with VAAI support in the Equallogic 5.02 firmware (hopefully no recalls this time), the number of virtual machines per datastore is not as important. Among other improvements, the entire VMFS volume will no longer lock; as I understand VAAI, only the blocks (or maybe files?) are locked when exclusive access is needed.

Let’s look at the improvement when using fewer, larger EQ volumes:
The old way (with 500GB datastores, for example):
8 hosts x 2 (vmkernel connections) x 10 (datastores) = 160 connections (already too many for the smaller arrays like the PS4000).

With VAAI (and 1.9TB* datastores):
8 hosts x 2 (vmkernel connections) x 3 (datastores) = 48 connections

The scalability for Equallogic is much better with VAAI when trying to stay under the connection limits.

*The limit for a single-extent VMFS volume is 2TB minus 512 bytes, so 1.9TB works out nicely.
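If you want to plug in your own numbers, the arithmetic above is trivial to script. Here is a minimal sketch; the per-array connection limit varies by model and firmware, so check yours against the result.

```python
# Back-of-the-envelope iSCSI connection count for an Equallogic group:
# roughly hosts x vmkernel ports used for MPIO x volumes (datastores).
# The 160 and 48 figures above fall straight out of this formula.

def iscsi_connections(hosts, vmk_ports, datastores):
    return hosts * vmk_ports * datastores

print(iscsi_connections(8, 2, 10))   # ten 500GB datastores    -> 160
print(iscsi_connections(8, 2, 3))    # three ~1.9TB datastores -> 48
```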

Update Manager Problem after 4.1 Upgrade

A quick note to hopefully publicize a problem I had, which I see is already discussed in the VMware Community Forums.

After building a new vCenter Server and upgrading the vSphere 4.0 databases for vCenter and Update Manager, I noticed I could not scan hosts that had been upgraded to 4.1. To be fair, by upgraded I mean rebuilt with a fresh install but with the exact same names and IP addresses. It seems the process I took to upgrade has some kind of weird effect on the Update Manager database: the scans fail almost immediately. I searched around the Internet and found a couple of posts on the VMware forums about the subject. One person was able to fix the problem by removing Update Manager and, when reinstalling, selecting the option to install a new database. I figured I didn’t have anything important in my UM database, so I gave it a try and it worked like a champ.

Right now there are not any new patches for vSphere 4.1, but I have some extension packages that need to be installed (Xsigo HCA drivers). I wanted to note that I like the ability to upload extensions directly into Update Manager. For tracking and change control purposes, this is a much cleaner process than loading the patches via the vMA.

ESXi 4.1 pNics Hard Coded to 1000 Full

I have recently made the transition to using ESXi for all customer installs. One thing I noticed was that after installing from a couple of different types of media (ISO and PXE install), the servers come up with the NICs hard coded to 1000/Full. I have always made it a practice to keep Gigabit Ethernet at auto-configure; I was told by a wise Cisco engineer many years ago that GigE and auto/auto is the way to go. You can also check the Internet for articles and best practices around using auto-configure with Gigabit Ethernet. Even the VMware “Health Analyzer” recommends using auto. So it is perplexing to me that ESXi 4.1 would start to default to a hard-set speed. Is it just me? Has anyone else noticed this behavior?

The only reason I make an issue of it is that I was ready to call VMware support a couple of weeks ago because nothing in a DRS/HA cluster just built with 4.1 would work. One vMotion would be successful, the next would fail. Editing settings on the hosts would fail miserably when done from the vSphere Client connected to vCenter. After changing all the pNICs to auto (matching the switches), everything worked just fine.
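For what it is worth, here is a rough sketch of how I would put a host’s NICs back to auto-negotiate. It only prints the esxcfg-nics commands to run in the Tech Support Mode console (or adapt for the remote CLI); the vmnic names are placeholders, and the -a flag is the auto-negotiate option as I remember it, so double-check it against your ESXi version.

```python
# Print the commands to return each physical NIC to auto-negotiate.
# Assumptions: the esxcfg-nics CLI is available on the host and -a sets
# auto-negotiation; the vmnic list below is a placeholder for your own host.

vmnics = ["vmnic0", "vmnic1", "vmnic2", "vmnic3"]   # adjust to your host

for nic in vmnics:
    print(f"esxcfg-nics -a {nic}")   # let the NIC auto-negotiate speed/duplex
```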

Hit me up in the comments or on twitter if you have noticed this.

Vote for the top VMware Blogs

The vSphere Land top 25 is up for vote once again. I am low on the list of bloggers; I just want to get close enough to see the shoes of the guy at #25. Like the picture I took in San Francisco during VMworld, I can barely see the top of the hill. Still, I am very excited to be on the ballot once again. Get on over and vote, and vote for me if you like the blog.

DSC04652.png

Here are a few of my top blog posts from the last few months.
1. The mini ESXi 4 Portable Server
2. Storage IO Control An Idea
3. You Might be a vDiva if…
4. Adaptive Queuing in ESX

VMworld 2010 Recap – Five Session Highlights

I thought I would get more into posting my thoughts on each session. To be completely honest, I was in some really good and some really bad sessions. My goal was to find sessions that would potentially benefit my day-to-day work, not just sessions about features we may or may not see in the next year. More of that kind of knowledge came from doing the labs; next year I will make more time to check out all the labs. I do not really learn well listening to someone speak anyway. I am more of a hands-on learner.

I went back and forth over how I would address the sessions I didn’t like. I think the best way to comment is to just say there were some sessions that were not helpful at all, and others that were really good. So I wanted to list five good lessons I learned in the VMworld 2010 breakout sessions.

1. A common theme for me was that the distributed virtual switch (dvSwitch) is required to do anything advanced. This convinced me to push harder to use the dvSwitch on deployments when possible; I figure more and more network features will depend on it. Features available now include Network IO Control and private VLANs (needed for cross-host network fences, and important for cloud networking in vSphere and vCloud Director).

2. Innovation is coming to the network. Converged networks from Xsigo and Cisco are just the beginning of virtualizing the network and I/O.

3. Doing VDI and having happy users is going to be harder than Server Virtualization.

4. VMware is working hard to get View deployments right. The View benchmarking tool is going to help validate deployments so they can scale. I am hoping for good things here.

5. There are so many moving parts in a virtual datacenter solution. Architecture, when it comes to VMware, is basically knowing how to account for everything involved. Seeing how the lab datacenter was put together was encouraging; even the rock-star architects at VMware face the same challenges as the everyday folks. They did a great job, because in my opinion the labs rocked.

I learned a great deal during VMworld. It was once again a great experience. At the same time, I hope the words “deep dive” are not misused next year like they were this year. VMware did a great job and hopefully will do even better next year. See you all at PEX 2011 in Orlando?

The Fun Stuff at VMworld 2010

Much of what I had planned for the blog didn’t work out this year. There was not too much in the sessions or keynotes worth a blog post yet; expect some View 4.5 and vCloud Director posts once I can get them into the lab. Probably the most useful parts of VMworld were the discussions at the Thirsty Bear, the Bloggers Lounge, the Chieftain, and over breakfast or dinner, among many other places. There was a great turnout for the In-N-Out trip, even though it took around 30 minutes on public transportation to get there. This post shares a few of the experiences* I had and the couple of pictures I thought to take while in San Francisco. I met a lot more people than last year. I couldn’t even begin to name them all, but it was a great time hanging out with all of you, enjoying a few drinks, and talking virtualization, storage, and other topics.

DSC04648.png

This is the hall in our hotel. I kept seeing these twin girls at the end of the hall. It was scary.

In-n-out.png

Here is proof of my In-N-Out takedown: a Double-Double and fries, well done. Several people showed up, and I hope everyone enjoyed it. I do not think any In-N-Out vs. Five Guys battles were decided, though.

DSC04653.png

I hung off the side of the cable car all the way back to Powell and Market. Jase McCarty @jasemccarty and Josh Leibster @vmsupergenius

DSC04657.png

The view from the top of the hill and the front of the Cable Car. The picture does not do justice to how steep the hill is.

DSC04659.png

Random shot at the Veeam party.

DSCN0396.png

A couple of VMware R&D managers I met at the CTO gathering before the VMware party. Steve Herrod hosted a party that included a great mix of vExperts and some of the thought leaders at VMware. It was a great chance to meet some people, though @kendrickcoleman beat me down in Wii bowling. I will be practicing until next year.

JO-vmCTO.png

Proof that I at least made it to the door of the CTO party. By Wednesday I had a pretty good collection of flair on my badge; TGI Fridays made me an offer, but I didn’t want to move my family back to the West Coast.

RB-RV-JO-vmCTO.png

A less fuzzy picture with Rich Brambley @rbrambley and Rick Vanover @rickvanover. I am honored to just hold the sign for these guys.

GroupPhoto_DragonCon06_1.png

The Veeam party got a bit crazy when 17 Princess Leias showed up.

atlanta-dragon-con-parade.png

The EMC vSpecialists rolled up on VMworld 2010. There were at least 4,000 more people at VMworld than last year, and 3,500 of them were from EMC. I actually found out they are real guys (and girls) and are really cool; many good conversations about virtualization were had with these folks. If you haven’t seen it yet, Nick Weaver @lynxbat and the other vSpecialists put together a pretty good rap video. Check it out here.

*In the event I did not have actual pictures of the event, artistic liberties were taken.