Recent ESXi articles

I’ve published a number of ESXi-specific posts recently.  However, I realize that I tend to publish these at weekends or late in the evening, which isn’t optimal for most readers coming from Planet V12n or Twitter.  Sorry, that’s just when I have free time (yes, I’ve heard of delayed publishing in WordPress, but I’m usually too excited and want to get things out there).  So here is a wee compilation summarizing the ESXi-related ones, just in case they’ve passed you by:

Understanding ESXi – stateless, diskless, feckless – What does ESXi stateless or diskless really mean?  This article tries to explain the concepts behind emerging ESXi install options and what defaults you can expect depending on your hardware set-up.  It then discusses the impact they can have on your ESXi design.

ESXi disks must be “considered local” for scratch to be created – Some servers’ local disks are actually seen by the ESXi installer as remote.  This can change the default install options and create a setup that you weren’t expecting.

Check for ESXi scratch persistence – How to check what the installer has actually done when it installed itself.  This looks at the issue examined in the previous post regarding ESXi scratch locations, and how to check it across your servers.

“Best Practice” for Persistent ESXi scratch? – If your installs have given you mixed configurations, what should you do?  I discuss standardizing versus optimizing.

How to PXE boot from your trunked vmnic0 – Typically, your vmnic0 is physically connected to a trunked switch port.  PXE booting servers don’t tag their traffic. How do you PXE boot from this same connection without re-cabling?

How to PXE boot from your trunked vmnic0

I’ve recently been thinking about the practicalities of PXE booting ESXi servers.  Sounds great, but how do you make this work in a typical environment?

Using trunked connections on ESXi hosts is very much commonplace.  It’s likely that your ESXi host’s Management Network connection, which by default will be on your first onboard NIC (vmnic0), is connected to a trunked uplink switch port.  Probably the most popular configuration is bonding your Management Network with your vMotion vmknic on a vSwitch with two trunked uplinks, one of which is vmnic0.  The drive towards 10GbE and cable consolidation only increases the likelihood that your vmnic0 will be patched into a trunked port.
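For reference, a minimal sketch of building that sort of trunked configuration from the Tech Support Mode console might look like the following.  The second uplink (vmnic1), the port group names and the VLAN IDs are just assumptions for the sake of the example, and the vMotion vmknic itself would still be created separately with esxcfg-vmknic:

# Add a second trunked uplink to vSwitch0 (vmnic0 is normally linked already)
esxcfg-vswitch -L vmnic1 vSwitch0
# Add a vMotion port group alongside the default Management Network
esxcfg-vswitch -A "vMotion" vSwitch0
# Tag each port group with its own VLAN ID (100 and 101 are examples only)
esxcfg-vswitch -v 100 -p "Management Network" vSwitch0
esxcfg-vswitch -v 101 -p "vMotion" vSwitch0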

VMware are starting to pursue solutions that use servers’ ability to PXE boot.  The potential to PXE boot into an installation routine is not a new concept, and VMware’s AutoDeploy and the recently announced PXE Manager fling both use this technique.  In fact, it’s not only PXE booting the installer, but actually PXE booting the OS itself over the network, or stateless as it is being referred to (although this term really defines something more specific than just PXE booting).

The question comes – how do I PXE boot my servers when they are connected to trunked interfaces on the switch?  If your servers are physically patched to a trunked port, then a standard PXE boot won’t tag the traffic appropriately (tell me if I’m wrong – is this something you can set in a server BIOS these days?).  You don’t want to re-patch a server’s network cables if you have to quickly rebuild it.  Or, if you are PXE booting the OS itself (stateless), you’d have to do this for every reboot.  And you don’t want to trouble your Network Admin to flip the port to an access port (and back again) every time.

This is where I think Native VLANs can help out.  As a vSphere server guy, what I know about Native VLANs is VMware’s advice to avoid tagging traffic with VLAN 1, because this is what Cisco sets as the default Native VLAN on its switches.  When thinking about VLAN IDs for your trunked ESXi ports, you just choose something other than 1.  But Native VLANs could also provide a solution to the problem of PXE booting on trunks.

If the interface for your vmnic0 has a Native VLAN, then when the server tries to PXE boot, it can get out onto the network.  If untagged traffic is received on a switch’s trunked interface, the switch will assume it is for that interface’s Native VLAN.  You could set the Native VLAN to the same VLAN as your Management Network subnet.  Then the server will PXE boot straight onto the same subnet that it will use once the Management Network is brought up.  Alternatively, if you only want to PXE boot into an installer, you could set your Native VLAN to a special build subnet.  Once the server is built, the Management Network traffic is tagged back onto your regular trunked VLAN.
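To illustrate that last step, here is a minimal sketch of re-tagging the Management Network once the build is finished, assuming the port group lives on vSwitch0 and that VLAN 100 is your regular trunked management VLAN (both of those are assumptions, not defaults).  The Native VLAN itself is set on the physical switch port, so there is nothing to configure on the host for the PXE boot stage:

# Tag the Management Network port group back onto the trunked VLAN
# (VLAN 100 is only an example ID - use whatever your trunk carries)
esxcfg-vswitch -v 100 -p "Management Network" vSwitch0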

So what do you think?  Feasible, secure enough, any potential issues? Or do you have other ways you set this up in your environment that you can recommend to everyone?

"Best Practice" for Persistent ESXi scratch?

This is the third post in as many days regarding the ESXi scratch partition – here are the first and the second.

Aaron Delp posed the question – what should be a “best practice” regarding this?

Should we all run around making changes?  Personally, I think that at the very least you should go out and run a discovery of your environment.  It’s up to you to know what you are dealing with, and it certainly seems as though there are some inconsistencies out there.

What I am particularly interested in is what to do if you find hosts that aren’t set with a persistent scratch location, and there is a local disk available (as I described in the first post).  VMware state in their KB that if you want to add this to your kickstart script, you should include the following:
# Build a unique scratch directory name from the host name and system UUID
scratchdirectory=/vmfs/volumes/DatastoreName/.locker-$(hostname 2> /dev/null)-$(esxcfg-info -b 2> /dev/null)
# Create the directory and point the host's scratch location at it (takes effect after a reboot)
mkdir -p $scratchdirectory
vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string $scratchdirectory

There is no reason you can’t also use these lines to retro-fit this setting to existing servers.  But I’m not sure this is always the best approach, because it assumes that none of your hosts have a 4GB scratch partition already created.  If the ESXi server already has 4GB set aside as a FAT scratch partition, why would you want to move the scratch location to a VMFS datastore?

I guess there are two schools of thought here:
1) Change all your hosts to use a VMFS datastore, regardless of the availability of any existing allocated space. That way you know all your servers are the same.
2) Stick with the build default, whatever that is.  So if it created the partition – use it; if it already set a scratch location on the first VMFS volume it found – use that; or if it thought the local disks were remote, THEN create a folder and set this as the scratch location.

Forcibly standardise or go with the defaults – that is your choice.  Standardising is probably better in larger environments, where managing unknowns is less attractive than losing a little disk space.  In smaller places, where it is important to eke out every bit of value from your CAPEX, you might want to use the FAT partition if it’s already there.  Either way, you’ll also need to factor in the “cost” of making changes, as it requires a reboot, which needs planning and execution.  If you want to standardise, then go ahead and use something like the script above.
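As a reminder of what that “cost” looks like per host, the change only takes effect after a restart, so something along these lines is needed from the console (assuming the host has already been evacuated, or you can tolerate its VMs being down):

# Put the host into maintenance mode before restarting it
vim-cmd hostsvc/maintenance_mode_enter
reboot
# ...and once the host is back up, bring it out of maintenance mode again
vim-cmd hostsvc/maintenance_mode_exit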

If you don’t want to move the scratch location if the 4GB FAT partition exists, then try something like this:
# Crude check: does df report a 4.0G FAT scratch partition?
if df -h | grep -q 4.0G
then echo "Scratch partition already exists, let's use that"
else
# No partition - has a persistent scratch location already been set?
if grep -q "\.locker" /etc/vmware/locker.conf 2> /dev/null
then echo "Persistent scratch location already set to VMFS folder"
else
# Neither exists, so create a .locker folder and point scratch at it
mkdir -p /vmfs/volumes/datastore1/.locker
vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /vmfs/volumes/datastore1/.locker
fi
fi

Now I don’t purport to be a scripter, so test my hack carefully – it may kill all your kittens by mistake.  And let me know if you have a better suggestion.

Check for ESXi scratch persistence

In my last post, I looked at how the ESXi installer may not create a scratch partition if it identifies the local disks as remote during the install.  I had stated that the following was a good check to see if you had a scratch partition set up: cat /etc/vmware/locker.conf

However, after a bit more testing down the rabbit hole, it appears this isn’t a definitive test.  Before I explain why, here is how to check whether an ESXi host is using a persistent scratch “location” – run this instead:
vim-cmd hostsvc/advopt/view ScratchConfig.CurrentScratchLocation
If the value is null, i.e. value = “”, then no persistent scratch location is set in the running configuration.  Changing the ScratchConfig.ConfiguredScratchLocation value will load it after the next reboot (as per the instructions in my last post).
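If you want to run that check across a batch of hosts rather than one at a time, a rough sketch like this works from any machine that can SSH to them.  The host names are made up, and it assumes remote Tech Support Mode (SSH) is enabled on each host:

# Report the current scratch location for each host (names are examples)
for host in esx01 esx02 esx03
do
echo -n "$host: "
ssh root@$host vim-cmd hostsvc/advopt/view ScratchConfig.CurrentScratchLocation | grep value
done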

The reason the locker.conf file isn’t a definitive test is that ESXi can set the Configured value in several ways.  If you use the vim-cmd method, it creates an entry in the locker.conf file (and creates the file if it doesn’t already exist).  However, if this file doesn’t exist, then ESXi goes on to check the following (from the KB):

2. A Fat16 filesystem of at least 4 GB on the Local Boot device.
3. A Fat16 filesystem of at least 4 GB on a Local device.
4. A VMFS Datastore on a Local device, in a .locker/ directory.
5. A ramdisk at /tmp/scratch/

I have found hosts where there is no locker.conf file, but because a 4GB FAT partition had been created during the initial install, they use that.  In these cases there is no .locker directory; everything sits directly in the partition, which is mounted under /vmfs/volumes/ so as to be accessible by the vmkernel.  Interestingly, in this configuration there is no symlinked datastore, so you won’t see this volume in the vSphere client.

For hosts where the 4GB FAT partition doesn’t exist but a local VMFS datastore is present, you can find that a .locker folder has been created. You can see these from the vSphere client datastore browser.  But remember that if you are in a POSIX-style console (like the vMA or the ESXi shell), this folder’s name begins with a period (“full stop” in real English :)), so the folder will be hidden.
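If you want to see it from the console anyway, just ask ls to include hidden entries (datastore1 here is only a placeholder for whatever your local VMFS volume is called):

# List the datastore including hidden (dot-prefixed) folders such as .locker
ls -la /vmfs/volumes/datastore1/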

Various changes have occurred with regard to the scratch location during the 4.x cycle.  I guess this is why ESXi has to check all these locations for possible dump sites.  Also, when using vendor-specific images, it could depend on how they patch their master images before releasing them.  So it’s very difficult to understand which versions are set up in which ways.

The interesting thing is that the existence of a scratch-specific “partition” does not categorically determine the persistence of scratch.  ESXi can use a scratch folder and it will still be persistent across reboots.  Only the 5th option above forces it into a volatile ramdisk.  So the correct terminology is “persistent scratch location”. I for one welcome our new persistent scratch location nomenclature overlords…

Remember though, the moral of the first post is still valid.  Some servers’ local disks are treated as non-local and therefore aren’t configured with a “persistent scratch location” at all (even though there is a local VMFS volume available).  This inconsistency is something you want to check if you don’t want surprises.