Jon Owings | CrashLoopBackoff

Not the Same Ol’ Sessions from Pure Storage at VMworld

I am really excited to be going to VMworld once again. I will be wearing my orange Nikes, so most likely my feet won’t hurt quite as bad. Also expect the Pure Orange Superman to make an appearance.

More about the sessions. So I will be attending VMworld San Francisco, and speaking in EMEA.

STO2996-SPO – The vExpert Storage Game Show

The session I am stoked to be a part of is STO2996-SPO – The vExpert Storage Game Show. It will be a fun and informative time about next generation storage architectures presented in the form of a game show. PLUS, two members of the audience will join the session to help the vExpert teams. I know everyone will want to be on my team in EMEA.

STO3000-SPO – Flash Storage Best Practices and Technology Preview

This very exciting session with Vaughn and Cody (super-genius vExperts) will go into what to consider when moving your datacenter to all flash. Plus previews of the Pure VVOLs. If you think you are not ready for all flash, come to this session and learn how Flashy you can be.

STO2999-SPO – Customers Unplugged: Real-World Results with VMware on Flash

I wish I had thought of this. Customers using All Flash with VMware. All Tech, No Slides.

STO1965 – Virtual Volumes Technical Deep Dive

Dive into Virtual Volumes with Rawlinson Rivera – VMware, Suzy Visvanathan – VMware and Vaughn Stewart – Pure Storage. Over the last 3 years, so many customers have asked me what VVOLs will actually do. This will be a great chance to find out.

VAPP2132 – Virtualizing Mission Critical Applications on All Flash Storage

How does Pure Storage enable that final 10% of critical applications that just a few years ago people said would be impossible to virtualize? Meet my friend Avi Nayek from Pure and Mohan Potheri from VMware and learn how flash eliminates storage as the roadblock to critical applications becoming virtual.

MGT1265 – Improving Cloud Operations Visibility with Log Management and vCenter Log Insight

Cody Hosterman. Did I tell you he is smart? Yeah. He is. Join Cody, Dominic Rivera from US Bank and Bill Roth from VMware to learn how to increase your Cloud Operations Visibility.

SDDC2754-SPO – New Kids on the Storage Block, File and Share: Lessons in Storage and Virtualization

Lessons from all the upstarts in the storage industry. Most of them are not “startups” anymore. They are finding new ways to solve the issues of using virtualization with legacy storage. Featuring Pure Storage, Nimble Storage, Tintri, Tegile, Coho Data and DataGravity, moderated by Howard Marks from DeepStorage.net.

STO2496-SPO – vSphere Storage Best Practices: Next-Gen Storage Technologies

The Chad and Vaughn show. Now with Rawlinson Rivera! Storage is changing. Did I say that yet?

More information on Pure Storage Sessions

Customers Unplugged at VMworld: All Tech, No Slides

Coming Soon: Support for VMware VVOLs
Pure Storage set to paint VMworld 2014 orange!

VAAI and XCOPY with Pure Storage

VAAI has been around for a while now (almost 4 years), and XCOPY is one thing I don’t hear customers or others talking about very often. When your vSphere hosts detect that Hardware Acceleration is supported, the host will attempt to send VAAI-compatible commands to the storage device. Full Copy is usually explained like this: if you need to clone or Storage vMotion a VM, the ESXi host issues a command telling the storage device to move the blocks. So when describing this in the past it was very simple: the host issues the command and the blocks move. Set it and forget it, right?

Not so fast, my friend!

As good ol’ Lee Corso would say, “Not so fast, my Friend!”

The VAAI Xcopy command tells the storage device to move 4096 KB (AKA 4MB) at a time. So every 4MB is a new command. Not a big deal for disk based xcopy because the blocks could only move from spindle to spindle so fast. Still way more efficient than before but sometimes not actually faster at all.

Along came the FlashArray.

The FlashArray, XCOPY and VAAI

The Pure Storage snapshot technology is used for XCOPY commands, no matter where they are coming from. This results in just a metadata pointer change in order to move the data. The blocks don’t actually move anywhere since they are stored once and mapped in metadata. This enables zero-impact snaps and clones that can be created as fast as I can click the button in the GUI.

What does this all mean?

Since the ESXi host is telling the FlashArray to move 4MB at a time, the copy function does not reach the full potential of what the FlashArray can really do. It is like using a freight train to move cargo across the country but only putting one box in each car.

Pure Storage recommendation

This is why Pure recommends changing MaxHWTransferSize (the setting that controls the size of each transfer) to the maximum allowed value of 16384 KB (16MB).

The default is 4096.

Commands to help you change the setting via the CLI

Check the current value:

esxcfg-advcfg -g /DataMover/MaxHWTransferSize
Value of MaxHWTransferSize is 4096

Set the transfer size to the Pure Storage best practice:

esxcfg-advcfg -s 16384 /DataMover/MaxHWTransferSize
Value of MaxHWTransferSize is 16384
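
To get a feel for what the bigger transfer size buys you, here is a minimal sketch that counts how many XCOPY commands a copy would need at each setting. It assumes every command moves exactly MaxHWTransferSize worth of data, which is a simplification, and the 100 GB disk size and the function name xcopy_commands are made up for the example.

```shell
#!/bin/sh
# Rough count of XCOPY commands needed to copy a virtual disk.
# Assumption: each command moves exactly MaxHWTransferSize KB.
#   $1 = size of the copy in GB (hypothetical example value)
#   $2 = value of /DataMover/MaxHWTransferSize in KB
xcopy_commands() {
    size_kb=$(( $1 * 1024 * 1024 ))
    transfer_kb=$2
    # Round up so a partial final chunk still counts as one command.
    echo $(( (size_kb + transfer_kb - 1) / transfer_kb ))
}

echo "100 GB at 4096 KB:  $(xcopy_commands 100 4096) commands"
echo "100 GB at 16384 KB: $(xcopy_commands 100 16384) commands"
```

Under that assumption the 16MB setting needs a quarter of the commands for the same copy. The boxes per freight car go up, but the train is still far from full.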

…but wait there is more!

So the Pure Storage FlashArray is cool with cloning multi-TB volumes using XCOPY with no impact on performance or space usage. So the question is, why only 16MB at a time? (The real answer should come from someone way smarter than me at VMware.)

I am curious to try out a Storage vMotion or cloning persistent View desktops that fully use the power of the array.
Until then, still better than spinning disk or no VAAI at all.

Changing the vCenter Graphs to Microsecond

So if you are moving your data center to the next generation of Flash Storage you may have noticed your performance charts in VMware vCenter or other tools look something like this.

You start to think, what good is the millisecond scale in a microsecond world? (I know that screenshot is from vCOPS.)

Luckily VMware provided an answer (sorta kinda).

Using microsecond for Virtual Disk Metrics

Go ahead and select your VM and go to Monitor –> Performance and select Advanced.
First change the View from CPU to Virtual Disk(1).
Then select Chart Options(2)

Deselect the Legacy and move on to microseconds.

Then you can select Save Options to use these settings easily next time. The new settings will be saved in the drop down list in the top right corner.

Finally, you have a scale that can let you see what the Virtual Disks are doing for read and write latency.
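
As a quick illustration of why the scale matters, here is a small sketch that converts latencies to whole milliseconds. It assumes a chart that can only show whole milliseconds, and the sample latencies are invented for the example, not taken from a real array.

```shell
#!/bin/sh
# Convert a latency in microseconds to whole milliseconds.
# Integer division stands in for a chart that only shows whole
# milliseconds (an assumption made for this illustration).
us_to_whole_ms() {
    echo $(( $1 / 1000 ))
}

# Hypothetical sample latencies in microseconds.
for latency_us in 350 800 1200; do
    echo "${latency_us} us -> $(us_to_whole_ms "$latency_us") ms"
done
```

Anything under 1000 microseconds collapses to 0 on that kind of scale, which is exactly the range where flash lives.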

Disk vs Virtual Disk Metrics

In the vSphere Online documentation the Disk Metric group is described as:
Disk utilization per host, virtual machine, or datastore. Disk metrics include I/O performance (such as latency and read/write speeds), and utilization metrics for storage as a finite resource.

While Virtual Disk is defined:
Disk utilization and disk performance metrics for virtual machines.

Someone can correct me if I am wrong, but here is the difference I see. Both are choices when a VM is selected. Only the Disk metric gives stats for the datastore device that the VM lives on, which can be shown side by side with that VM’s stats, but it does NOT give the option to change the scale to microseconds if needed. Virtual Disk shows only VM-level statistics but lets you view them in microseconds, at least for read and write latency.

Hope this helps.

Twelve Months for a Forklift? Check that, Forever Flash

Recently I was speaking with a potential customer who was planning on taking 12 months to move from one end-of-life architecture to the latest and greatest from their very big storage provider. It is absolutely amazing that customers everywhere have been living with this for years now. Pure Storage introduced a very awesome solution to this issue, built on the technical awesomeness that a purpose-built-for-flash platform can provide. With no legacy to protect, Pure is more than happy to change the way storage business is done. More on this later.

First, Never Move Your Data

Since I am a geek I will start with real production upgrades to your array. Pure can upgrade with no downtime and no performance impact. This is true for software revisions AND hardware upgrades.

Imagine you have the N-1 generation controllers and you want to get all the speed and efficiency that comes with the latest and greatest. Usually you would have to wait to buy an all-new array, use some tool to mirror all the data (if you are lucky) and take a short (if you are super lucky) downtime to move over. Do this for every single host and it could take months. Storage vMotion made this super easy, but remember there are still those pesky databases that the DBA never let you virtualize because they don’t want to risk it. One more thing: they can never ever go down. Except when you would rather be at your kid’s soccer game or something.

Pure Storage allows you to move from controller series older (but still awesome) to series new and shiny (and more awesome) with no downtime, performance still better than you ever had on any $1M boat anchor and get your weekends back.

Now Get the Refresh without the Refresh Quote

Now, imagine getting those new controllers and their inherent boost in performance and efficiency every three years. Just keep your maintenance up to date. Now the conversation dives into OPEX vs CAPEX and resetting contracts and econ stuff I generally don’t cover. Head over to the Forever Flash landing page to dive deeper into what this means. Basically two options exist:

Free Every Three – Renew maintenance for 2 more years after year 3 and get the newest controllers.
Fresh Every Upgrade – Reset your maintenance every time you buy an upgrade (capacity or compute).

No Mas Forklift

More #ForeverFlash Information
http://www.purestorage.com/blog/introducing-forever-flash-an-end-to-maintenance-extortion-and-forklift-upgrades/
http://www.purestorage.com/company/pure-storage-reinvents-the-enterprise-storage-business-model-with-forever-flash.html
http://www.purestorage.com/forever/

Say it with me, “FOREVER, FOR-EV-ERRRR.”

By the way, that customer came out of his seat with excitement when he heard about Pure NDU and Forever Flash. Awesome.

What happened while getting 100% Virtualized

I often think about how many people have stalled around getting to 100% virtual. I know you are thinking I need to find some fun things to do. You are probably right.

The first thing I thought when I deployed my very first virtual infrastructure project back in the day was, “Man, I want to see if I can virtualize EVERYTHING.” This is before I knew much about storage, cloud, and management. I may be naive but I think there is real potential out there to achieve this goal. There is low hanging fruit still out there depending how you deploy your infrastructure. Having attended VMware Partner Exchange (PEX) I know how the ecosystem is built around your journey to virtualization. The biggest slide to resellers and other partners is the one VMware shows off that says, “Every $1 a customer spends on VMware they buy $9-11 in infrastructure.” Which I fully believe is the reason many customers never saw the FULL cost savings they could have when going virtual.

Roadblocks

I believe we all ran into a couple of different kinds of roadblocks on our path. First were organizational. Line of business owners, groups within IT and other political entities made traveling the road very difficult. Certain groups didn’t want to share. Others started to think VM’s were free and went crazy with requests. Finally the very important people who own the very important application didn’t want to be virtual because somehow virtualization was a downgrade from dedicated hardware.

Then if we were able to dodge the roadside problems organizationally, there were technical problems. Remember that $11 of drag? The big vendors made an art of refreshing and updating you with new technology. I know, I helped do it. So performance was a problem? Probably buy more disk or servers. Then every 3-5 years they were back, with something new to fix what the previous generation did not deliver on. This “spinning drag” in the case of storage slowed you from getting to your goal. 100%.

Disillusionment

At some point you lose the drive to be 100% virtual. The ideal has been beaten out of you. Well at least my vendor takes me for steak dinner and I get to go to VMworld and pretend I am a big shot every year. This is where you settle. Resign yourself to the fact that everything is so complicated and hard it will never get done. The big vendors make a huge living on keeping you there. Changing the name from VI, to Private Cloud, Hybrid super happy land or whatever some marketing guys that have never opened the vCenter client think of next.

Distractions

So trying to rebuild Amazon in your data center? Probably lots of other things to fix first. Using more complicated abstraction layers may help in the long run to build a cloud. I see more customers continue to refresh wasteful infrastructure with new infrastructure while they are still trying to figure this out. What we need is a quick and easy win. Make things better and save money right away. Then maybe we can keep working on building the utopian cloud.

The low hanging fruit

When we first started to virtualize we looked for the easy wins. To get you rolling again down the path we need to identify the lowest hanging fruit in the data center. We found all the web servers running at 1% CPU and 300MB of RAM (if that) and virtualized those so quick the app owner didn’t even know it happened. Just like a room of 1000 servers all running at 2% CPU usage, there are giant tracts of heat-generating spinning waste covering the data center. You had to get so many of them and stripe so wide just to make performance serviceable. You wasted weeks of your life in training classes to learn how to tweak and tune these boat anchors because it was always YOUR fault it didn’t do what the vendor said it would.

Take that legacy disk technology and consolidate to a system made to make sure it is not the roadblock on the way to being 100% virtual. I remember taking pictures of the stacks of servers getting picked up by the recycling people and now is the time to send off tons of refrigerator sized boxes of spinning dead weight. I am not in marketing so I don’t want to sound like a sales pitch. I am seeing customers realize their goal of virtualization with simple and affordable flash storage. No more data migrations or End of Life forklift upgrades. No more having to decide if the maintenance is so high I should just buy a new box. Just storage that performs well all the time and is fine running virtual Oracle and VDI on the same box.

How we do it

How is Pure Storage able to replace disk with Flash (SSD)? Mainly, we created a system from the ground up just for Flash. We created a company that believes the old way of doing business needs to disappear. Customers say, “You actually do what you said, and more.” (Biggest reason I am here). Also, do it all at the price of traditional 15k disk. Not there on SATA, yet.

Make it ultra simple. No more tweaking, moving, migrating or refreshing. If you can give a volume a name and a size you can manage Pure Storage.
Make it efficient. No more wasted space due to having to short stroke drives, no more wasted space because you created a RAID 10 pool and now have nowhere to move things so you can destroy and recreate it.
Make it available. Support that is awesome because things do happen. Most likely, though, most of your downtime is planned when it comes to migrating or upgrading code. Pure Storage will allow zero performance hit and zero outage to reboot a controller to upgrade the firmware/code (whatever you want to call it). Pretty nice for an environment that needs ultimate uptime.
Make sure it always performs. Imagine going to the DBAs and saying, “Everything is under 1ms latency. How about you stop blaming storage and double check your SQL code?” Now that is something I wanted to say as an administrator for a long, long, long time.

Once you remove complicated storage from the list of things preventing you from being 100% virtual, you can focus on getting the applications working right, the automation to make life easier and maybe make it to your kid’s soccer games on Saturday.

What do we really need? Cloud? or Change?

While going through the VCAP-DCD material I had a question, since it comes with the assumption that everyone is working toward building a private cloud. So I started asking: do I need to build a “cloud,” and why? Now don’t think I have completely gone bonkers. I still think the benefits of cloud could help many IT departments. I think that more than “how do I build a cloud,” the question should be what we need to change to provide better service to the business.

We are infrastructure people

As VMware/Storage/Networking professionals we tend to think about what equipment we need to do this or that. Or how if I could just get 40Gb Ethernet, problems XYZ would go away. Often we have to build it on top of a legacy. If we do ever get a green field opportunity it usually needs to be done so quickly we never quite get to investigate all the technology we wish we could. There is stuff like All Flash, Hyper-converged things, accelerator appliances and software defined everything, all aiming at replacing legacy Compute/Network/Storage.

My last post was about knowing the applications and this is not a repeat of that, but it is very important for us to look at how our infrastructure choices will impact the business. Go beyond business metrics like “my FlashArray allows business unit X to do so many more transactions in a day, which means more money for the business.” What else do the internal customers require from the blinking lights in the loud room with really cold AC?

Ask better questions

How does faster storage change the application?
What will change if we automate networking?
Could workers be more productive if the User experience was better?
What are things we do just because we always do them that way?
What legacy server, storage and network thought processes can we turn upside down?

This type of foundation enables you to focus on the important things like getting better at Halo. Just kidding. My goal is one day Infrastructure Administrators will get to sleep well at night, their kids will know their names and weekends will once again be for fun things and not Storage, Server or Network cutovers. That is the value of Private Cloud, not that I can now let internal customers self-service provision a VM or application (which is still cool). We gain confidence that our infrastructure is manageable. We have time to work on automating the boring repetitive stuff. You get your life back. Awesome.

Start with Applications

I have been revisiting my work towards some advanced datacenter certifications and decided to journal some of the thoughts I have during the process. After a 3 year break I decided it was time to start pushing toward some of these goals.

This may sound eerily similar to something I have said before. It is a constant fight in the infrastructure technology field not to get weighed down by speeds and feeds and features. You begin to lose sight of why you actually put servers, switches, storage and software together in the first place. While looking at the requirements guide for the VCAP-DCD, the very first thing that is mentioned is getting the business requirements. How do I actually do that? What does the business actually require?

Know what the applications actually do.
Ask! What does this Microsoft SQL database do? How does email relate to our business doing deals? Find out how money goes in and out of the business. How does your company pay bills? How do you charge for whatever it is you produce? How do the MBA types make decisions about who, what, when, where and why for your business? In IT we often get so involved in rolling out a new widget from vendor X, Y and Z we often don’t realize what is the purpose to the business. Understand this from a high level first.
Map technology to the impact on the business.
Who cares if I can do a million IOPS if all I do is check email all day? How do I consolidate servers with no plan on how they impact the bottom line? How do I provide cloud-like capabilities if no one really needs them? So start to map the capabilities to the benefits to the business. If the decisions being made can be done with data that is 5 minutes old instead of 24 hours old, how can that change the landscape of your business? Does this give an advantage over competitors?
Know something about the Apps.
If your answer is, “I don’t know how our business runs or anything about SQL or Oracle. I just make empty VMs for people to put the apps on. I make sure they turn on and I move them around when they need performance or more capacity,” guess what? Those functions can be done by VMware Orchestrator. If you put 4 vCPUs on a SQL VM because the batch jobs don’t ever use more than that, but you don’t know why they don’t, you need to learn. If you need tools to decipher the differences then get them. At least get the trial versions so you can see what happens. Get close to the queries that run at night. Do you know if they are CPU, memory or storage bound? Find out. Get off of reddit and check it out. Do you know whether putting in faster servers will improve the app in a way that makes things better for the business? Are you really going to gamble your budget on marginal improvements?

Can you connect how all of these things relate and benefit the business?

Just some small things I have been thinking about. In my job it is a constant temptation to push how many IOPS you can do with this thing or that, when what I need to ask is, “What process needs the performance? If that process is faster AND you get additional benefits of data reduction, floor tile reduction and power usage reduction, what will it mean to your business users?”

Presidio and Pure Storage at Sweetwater Brewery – January 16th 5:30

If you would like to try out some awesome beer and learn about how Flash can change your data center, meet Presidio and Pure Storage at the Sweetwater Brewery in Atlanta on January 16th at 5:30.

Learn how to change your Database, Virtual and VDI environments. No longer worry about performance and get amazing high availability.

Join us! I am excited to meet you if you are in the Atlanta area.

Remember to register now!

No Spindles Bro

I was assisting one of my local team members the other day with sizing a VM for Microsoft SQL. I usually fall back to this guide from VMware. So I started out with the basic separation of Data, Logs and TempDB.

Make it look like this:

VM Disk Layout

LSI SCSI Adapter
C: – Windows

Paravirtual SCSI Adapter
D: – Logs
E: – Data
F: – TempDB

Which is pretty standard. Then someone said, “Why do we need to do that?” I thought for a second or five. Why DO we need to do that? I knew the answer in the old school. Certain RAID types were awesomer at the types of data written by the different parts of the SQL database. We are in a total post-spindle-count world. No Spindles Bro! So what are some reasons to still do it this way for an All Flash Array?

1. Disk Queues
I think of these like torpedo tubes. The more tubes, the fewer people are waiting in line to load torpedoes. You can fire more, so to speak. Just make sure the array on the other end is able to keep up. Having 30 queues all going to one 2 Gbps Fibre Channel port would be no good. See number 3 for paths.

2. Logical Separation and OCD compliance (if using RDMs)
Don’t argue with the DBA. Just do it. If something horrifically bad happens the logs and data will be in different logical containers. So maybe that bad thing happens to one or the other, not both. I am not a proponent of RDMs. So much more to manage. If you can’t win or don’t want to fight that fight, at least with RDMs you will be able to label the LUN on the array “SQLSERVER10 Logs D” so you know the LUN matches to something in Windows. This also makes writing snapshot scripts much easier.

3. Paths
Each datastore or RDM has its own paths. If you are using Round Robin (recommended for the Pure FlashArray), more IO on more paths equals better usage of the iSCSI or FC interconnects. If you put it all on one LUN, you only get those queues (see #1) and those paths. Remember, do what you can to limit waiting.

Am I going down the right path? How does this make it easier? Are there other reasons to separate the logs and data for a database other than making sure the RAID 10 flux capacitor is set correctly for 8k sequential writes? I don’t want to worry about that anymore. Pretty sure plenty of other VM admins and DBAs don’t either.
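
To put rough numbers on the torpedo tubes from reason 1, here is a minimal sketch of the upper bound on outstanding IOs when every disk gets its own queue. The per-disk queue depth of 32 is only an assumed value for illustration, not a VMware or Pure Storage figure, and total_outstanding is a made-up helper name.

```shell
#!/bin/sh
# Upper bound on I/Os that can be outstanding across all disks of a VM,
# assuming every disk has its own queue of the same depth.
#   $1 = number of disks (Logs, Data, TempDB, ...)
#   $2 = queue depth per disk (assumed value, check your own environment)
total_outstanding() {
    echo $(( $1 * $2 ))
}

echo "Everything on 1 disk: $(total_outstanding 1 32) outstanding IOs"
echo "Logs, Data, TempDB:   $(total_outstanding 3 32) outstanding IOs"
```

More tubes means more IOs in flight, as long as the paths and the array on the other end can keep up.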

For me, this was a good exercise in questioning why I did things one way and whether I should still do them that way now.

I am now at Pure Storage

Thought that after 2 weeks I would put it on my blog. It is long past official as I have already done “New Hire” and I am officially part of the Puritan family. My Orange pants are on order. One thing I am excited about is getting to install the array for my customers. Not just talking about how awesome it is but getting to see it. This should definitely inspire blog posts to share what I learn along the way.
I know many people probably already knew this, but someday I would like my blog to be a FLASH of the progression through my career.