Resizing the Linux Root Partition in a Gen2 Hyper-V VM

Without a doubt, modern virtualization has changed the landscape of enterprise computing forever. Since virtual machines are abstracted away from the physical hardware, changes in compute, memory, and storage resources become mere clicks of a mouse. And, as hypervisors mature, many operations that were once thought of as out-of-band tasks, such as adding storage or even memory, can now be done with little or even zero downtime.

Hyper-V SCSI Disks and Linux

In many cases, hypervisors are backed by large storage area networks (SANs). This provides shared storage for hypervisor nodes that supports failover clustering and high availability. Additionally, it gives administrators the ability to scale the virtual environment, including the ability to easily add or expand storage on existing virtual servers. Microsoft’s Hyper-V 2012 introduced Generation 2 VMs, which extend this functionality. Among the many benefits of Gen2 VMs was the ability to boot from a SCSI disk rather than IDE. This requires UEFI rather than a legacy BIOS, so it’s only supported on newer operating systems. Many admins I talk to think this is limited to Windows Server 2012 and newer, probably because of the sub-optimal phrasing in the Hyper-V VM creation UI, which altogether fails to mention Linux operating systems.

The fact is, however, that many newer Linux OSes also support this ability, as shown in these tables from Microsoft.

More Disk, Please

Once you’ve built a modern Linux VM and you’re booting from synthetic SCSI disks rather than emulated IDE drives, you gain numerous advantages, not the least of which is the ability to resize the OS virtual hard disk (VHDX) on the fly. This is really handy functionality – after all, what sysadmin hasn’t had an OS drive run low on disk space at some point in their career? This is simply done from the virtual machine settings in Hyper-V Manager or Failover Cluster Manager by editing the VHDX.

Now, if you’re a Microsoft gal or guy, you already know that what comes next is pretty straightforward. Open the Disk Management MMC, rescan the disks, extend the file system, and voilà, you now automagically have a bigger C:\ drive. But what about Linux VMs? Though it might be a little less intuitive, we can still accomplish the same goal of expanding the primary OS disk with zero downtime in Linux.

On-the-Fly Resizing

To demonstrate this, let’s start with a vanilla, Hyper-V Generation 2, CentOS 7.6 VM with a 10GB VHDX attached to a SCSI controller in our VM. Let’s also assume we’re using the default LVM partitioning scheme during the CentOS install. Looking at the block devices in Linux, we can see that we have a 10GB disk called sda which has three partitions – sda1, sda2 and sda3. We’re interested in sda3, since that contains our root partition, which is currently 7.8GB, as demonstrated here by the lsblk command.
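
For reference, on a stock CentOS 7 install with the default LVM layout, the lsblk output looks roughly like this (device names and sizes here are illustrative):

    lsblk
    NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    sda               8:0    0   10G  0 disk
    ├─sda1            8:1    0  200M  0 part /boot/efi
    ├─sda2            8:2    0    1G  0 part /boot
    └─sda3            8:3    0  8.8G  0 part
      ├─centos-root 253:0    0  7.8G  0 lvm  /
      └─centos-swap 253:1    0    1G  0 lvm  [SWAP]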

Now let’s take a look at df. Here we can see an XFS filesystem on our 7.8GB logical volume, /dev/mapper/centos-root, which is mounted at /.
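
A trimmed df -h on such a system would look something like this (values are illustrative):

    df -h
    Filesystem               Size  Used Avail Use% Mounted on
    /dev/mapper/centos-root  7.8G  1.3G  6.5G  17% /
    /dev/sda2               1014M  185M  830M  19% /boot
    /dev/sda1                200M   12M  189M   6% /boot/efi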

Finally, let’s have a look at our LVM summary:
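
The summary can be pulled with pvs, vgs, and lvs; on a stock install it looks roughly like this (values illustrative):

    pvs && vgs && lvs
      PV         VG     Fmt  Attr PSize  PFree
      /dev/sda3  centos lvm2 a--  <8.80g    0
      VG     #PV #LV #SN Attr   VSize  VFree
      centos   1   2   0 wz--n- <8.80g    0
      LV   VG     Attr       LSize
      root centos -wi-ao---- <7.80g
      swap centos -wi-ao----  1.00g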

From this information we can see that there’s currently no room to expand our physical volume or logical volume, as the entirety of /dev/sda is consumed. In the past, with a Gen1 Hyper-V virtual machine, we would have had to shut the VM down and edit the disk, since it used an emulated IDE controller. Now that we have a Gen2 CentOS VM with a SCSI controller, however, we can simply edit the disk on the fly, expanding it to 20GB.

Once the correct virtual disk is located, select the “Expand” option.

Next, provide the size of the new disk. We’ll bump this one to 20GB.

Finally, click “Finish” to resize the disk. This process should be instant for dynamic virtual hard disks, but may take a few seconds to several minutes for fixed virtual hard disks, depending on the size of the expansion and the speed of your storage subsystem. You can then verify the new disk size by inspecting the disk.

OK, so we’ve expanded the VHDX in Hyper-V, but we haven’t done anything to make our VM’s operating system aware of the new space. As seen here with lsblk, the OS is indifferent to the expanded drive.

Taking a look at parted, we again see that our /dev/sda disk is still showing 10.7GB. We need to make the CentOS operating system aware of the new space. A reboot would certainly do this, but we want to perform this entire operation with no downtime.



Issue the following command to rescan the relevant disk – sda in our case. This tells the system to rescan the SCSI bus for changes, and will report the new space to the kernel without a restart.
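
For sda, that looks like this (run as root):

    echo 1 > /sys/class/block/sda/device/rescan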

Now, when we look at parted again, we’re prompted to move the GPT table to the back of the disk, since the secondary table is no longer in the proper location after the VHDX expansion. Type “Fix” to correct this, and then once again to edit the GPT to use all the available disk space. Once this is complete, we can see that /dev/sda is now recognized as 20GB, but our sda3 partition is still only 10GB.
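
The exchange looks roughly like the following; the prompt wording is abbreviated here and varies a bit between parted versions:

    parted /dev/sda
    (parted) print
    Error: The backup GPT table is not at the end of the disk ... Fix/Cancel/Ignore? Fix
    Warning: Not all of the space available to /dev/sda appears to be used ... Fix/Ignore? Fix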

Next, from the parted CLI, use the resizepart command to grow the partition to the end of the disk.
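
Assuming the root PV partition is number 3, as in our example, the resize looks like this from within parted (newer versions accept a percentage for the end value; older versions may prompt for it, and parted may also ask you to confirm since the partition is in use):

    (parted) resizepart 3 100%
    (parted) print
    (parted) quit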

Our sda3 partition is now using the maximum space available, 20.2GB. The lsblk command also now correctly reports our disk as 20GB.

But what about our LVM volumes? As suspected, our physical volumes, volume groups and logical volumes all remain unchanged.

We need to first tell our pv to expand into the available disk space on the partition. Do this with the pvresize command as follows:
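
With the default layout the physical volume sits on /dev/sda3, so:

    pvresize /dev/sda3
    pvs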

Sure enough, our pv is now 18.8GB with 10.00GB free. Now we need to extend the logical volume and its associated filesystem into the free pv space. We can do this with a single command:
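
One way to do it in a single step is lvextend with the --resizefs (-r) flag, which grows the logical volume and then the XFS filesystem on top of it; the LV path below assumes the stock centos volume group:

    lvextend -r -l +100%FREE /dev/centos/root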

Looking at our logical volumes confirms that our root lv is now 17.80GB of the 18.80GB total, or exactly 10.0GB larger than we started with, as one would expect to see.

A final confirmation with the df command illustrates that our XFS root filesystem was also resized.

Conclusion

So there you have it. Despite some hearsay to the contrary, modern Linux OSes run just fine as Gen2 VMs on Hyper-V. Coupled with a SCSI disk controller for the OS VHDX, this yields the advantage of zero-downtime root partition resizing in Linux, though it’s admittedly a few more steps than a Windows server requires. And though Linux on Hyper-V might not seem like the most intuitive choice to some sysadmins, Hyper-V has matured significantly over the past several releases and is quite a powerful and stable platform for both Linux and Windows. And one last thing – when you run critically low on disk space on Linux, don’t forget to check those reserved blocks for a quick fix!

Reclaim Linux Filesystem Reserved Space

As IT Pros, we have a myriad of tools available to us to configure and tweak and tune the systems we manage. So much so that there are often everyday tools right under our noses with applications we might not immediately realize. In a Linux environment, tune2fs is an indispensable tool, used to tune parameters on ext2/ext3/ext4 filesystems. Most Linux sysadmins who have used mdadm software RAID will certainly recognize this utility if they’ve ever had to manipulate the stride size or stripe width in an array.

tune2fs

First, let’s take a look at the disks on an Ubuntu file server so we can see what this tool does.
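
The data disk in question here is /dev/sdb1; a df -h of it looked something like this (the mount point and exact figures are illustrative):

    df -h /dev/sdb1
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sdb1       382G  313G   50G  87% /mnt/multimedia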

Now, we can use tune2fs with the -l option to list the existing parameters of the filesystem superblock on /dev/sdb1.
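
The command itself is simply the following; it dumps a long list of superblock fields:

    sudo tune2fs -l /dev/sdb1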

Reserved Blocks?

As you can see, there are a number of filesystem parameters we can view, several of which can be tuned with tune2fs. In this article, however, we’re going to focus on a rather simple and somewhat innocuous parameter – reserved block count. Let’s take a look at that parameter again:
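
Pulling just that field out of the listing (the block count shown is an illustrative figure for a filesystem of roughly this size with 4K blocks):

    sudo tune2fs -l /dev/sdb1 | grep -i "reserved block count"
    Reserved block count:     5013043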

At first glance, it isn’t obvious what this parameter means. In fact, I’ve worked with Linux sysadmins with years of experience who weren’t aware of this little gem. To understand this parameter, we probably have to put its origins in a bit of context. Once upon a time, SSDs didn’t exist, and no one knew what a terabyte was. In fact, I remember shelling out well north of $100 for my first 20GB drive. To date myself even further, I remember the first 486-DX PC I built with my father in the early ’90s, and its drive was measured in megabytes. Crazy, I know. Since drive space wasn’t always so plentiful, and the consequences of running out of disk space on the root partition in a Linux system are numerous, early filesystem developers did something smart – they reserved a percentage of filesystem blocks for privileged processes. This ensured that even if disk space ran precariously low, the root user could still log in, and the system could still execute critical processes.

That magic number? Five percent.

And while five percent of that 20GB drive back in 1998 wasn’t very much space, imagine that new 4-disk RAID1/0 array you just created with 10TB WD Red Pros. That’s five percent of 20TB of usable space, or a full terabyte. You see, though this was likely intended for the root filesystem, by default this setting applies to every filesystem created. Now, I don’t know about you, but at $450 for a 10TB WD Red Pro, that’s not exactly space I’d want to throw away.



We Don’t Need No Stinking Reserved Blocks!

The good news, however, is that space isn’t lost forever. If you forget to initially set this parameter when you create the filesystem, tune2fs allows you to retroactively reclaim that space with the -m option.
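
Setting the reserved percentage to zero on our data disk looks like this:

    sudo tune2fs -m 0 /dev/sdb1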

Here you can see we’ve set the reserved blocks on /dev/sdb1 to 0%. Again, this isn’t something you’d want to do on a root filesystem, but for our “multimedia” drive, this is fine – more on that later. Now, let’s look at our filesystem parameters once again.

Notice now that our reserved block count is set to zero. Finally, let’s have a look at our free disk space to see the real-world impact. Initially, we had 50GB of 382GB free. Now we can see that, although neither the size of the disk nor the amount of used space has changed, we now have 69GB free, reclaiming 19GB of space.

Defrag Implications

Lastly, I’d be remiss if I didn’t mention that there’s one other function these reserved blocks serve. As always, in life there’s no such thing as a free lunch (or free space, in this case). The filesystem reserved blocks also serve to provide the system with free blocks with which to defragment the filesystem. Clearly, this isn’t something you’d want to do on a filesystem that contained a database, or in some other situation in which you had a large number of writes and deletions. However, if like in our case, you’re dealing with mostly static data, in a write once, read many (WORM) type configuration, this shouldn’t have a noticeable impact. In fact, the primary developer for tune2fs, Google’s Theodore Ts’o, can be seen here confirming this supposition.

So there you have it. You may be missing out on some valuable space, especially in those multi-terabyte arrays out there. And though it’s no longer 1998, and terabytes do come fairly cheap these days, it’s still nice to know you’re getting all that you paid for.

Choosing the Right Hard Drive for Your Lab

At Teknophiles, we run a fairly large number of hard drives in our lab servers. These drives fulfill several different duties, but typically fall into three primary categories. Over the years, we’ve tried a bunch of combinations, been through several iterations, and found some setups that worked well and others that didn’t. We’ll walk through our criteria for each type of drive, and hopefully help you choose the right drive for the application at hand.

Performance Drives

First, let’s talk about performance drives. We typically use 2.5″ enterprise SATA solid state drives, designed for high IOPS and long service lives. Similar in performance to high-end consumer drives, these typically have additional  features such as power-loss protection, higher write endurance, a greater number of spare blocks, and better error correction. These drives are great for arrays housing virtual machines, databases, or other high workload operations. What you get in performance, however, is offset by a much higher price per GB compared to other types of drives. You’re likely not going to put your media collection on these drives, unless of course you’ve got really deep pockets! For these drives, look for used, low-hour enterprise SSDs. These can typically be found with much of their useful service life left after retirement from a data center.

Storage Drives

The 4TB Seagate STDR4000100 is an excellent candidate for shucking

Second, we have storage drives. These drives comprise the SATA arrays that contain mostly static data, and typically fall into the write once/read many (WORM) category of service. With these drives, we’re not so much concerned with raw performance or ultra-high reliability. In a lab environment, we’re looking for three primary attributes with our storage drives: 1) low price per GB, 2) storage density (TB per rack unit) and 3) low watts per TB. Since it takes a significant number of drives to assemble a 30, 50, or 100 TB array, meeting these criteria keeps the overall cost of drives down, takes up less space in the rack and requires lower energy costs to operate. Individually, these drives may be quite slow – even 5400 RPM spindles will suffice – but in the proper configuration they can still saturate 1Gbps or even 10Gbps links. And since we’ll be employing a number of these drives in a single array, we’ll be assuming a “strength in numbers” approach, both from a performance and reliability standpoint. A popular, low-cost strategy for sourcing 2.5″ or 3.5″ HDDs is shucking external USB drives from several different vendors. A bit of research will reveal which drives are housed in each external drive model. But be careful! Not all external USB drives use a standard SATA connector internally, and you’re also sacrificing your warranty by doing this. It’s best to thoroughly check the drive for errors before disassembling the USB enclosure, and make a warranty claim if necessary. However, because you can save tens of dollars per drive by adopting this strategy, you can save enough to essentially “self-warranty” the drives by using the savings to keep a spare drive around, with the added benefit of limiting downtime in the event of a failure.

Archive & Backup Drives

A third category of drive is the archive or backup drive. These drives are typically not configured into RAID arrays, though they can be if one so chooses. In our lab environment, we choose to use individual backup disks grouped into a large storage pool. This gives us the benefit of a single, large backup target, but without the added cost and complexity of RAID groups. We have redundancy in our primary storage arrays, so if a single backup disk fails, the next backup job will simply copy that data back to the pool. Like storage drives, the backup drives are large, inexpensive, relatively slow disks. We typically use 4TB-6TB or larger 3.5″ disks for this purpose. Again like storage drives, many people choose to shuck drives like the WD My Book external drives and adopt a self-warranty strategy.



OS Drives

The final category we’d like to mention is server OS drives. Why do we consider OS drives to fall into their own category? Simple – efficient use of disk space. With many drives, whether SSD or spindle HDDs, you’ll likely find that after installing your OS to a RAID1 array, you have much more space than you’ll ever need. Unless you’re purposing servers for multiple duties, you’ll find that most Windows Server OSes use less than 20 GB of disk space, and even applications like WSUS which employ the Windows Internal Database (WID) will use less than 40 GB of space for the C: drive. Thus, it makes little sense to use drives that are terabytes, or even only a few hundred gigabytes, since the majority of that space will just be wasted. And though not paramount, some reasonable amount of performance is desirable for these drives, as it speeds reboot times and increases overall responsiveness of the server. To that end, small, consumer SSDs fit the bill perfectly for these drives. They’re inexpensive (sometimes, under $40 each), reasonably fast, mostly reliable, and we don’t have to worry much about write cycles, since a typical OS workload is primarily read (75-80% in our tests in Enterprise environments). While there aren’t a huge number of drives that fit these criteria, there are still a reasonable number of 50-60 GB drives available and plenty of affordable 120 GB options out there as well.

One note if you choose to use a small SSD for your OS drives. In most cases it won’t be an issue, but do exercise caution if you use a relatively small drive for a hypervisor with a significant amount of RAM. Since a Windows-managed page file can grow quite large (as much as 3x RAM!), you can see how the page file could easily fill a 60 GB drive. Consider a system with 64 GB of RAM. When performing a complete system crash dump, a full 1x RAM is required to write out the dump to either the page file or a dedicated dump file. This would quickly overwhelm the drive, even though a page file likely wouldn’t be needed at all during normal operation, assuming the system’s memory was well managed. Given this potential issue, some admins choose to manually set the page file to a specific value to prevent the drive from filling. This comes with the tradeoff of not being able to perform the full system dump, however. Check here for more information on Microsoft’s recommendations for calculating page file sizes.

TIP

If you have multiple servers, try to stick with the same drive, or at least a drive of the same capacity. This way, you can stockpile one extra drive to serve as a cold spare for all of your servers.

Other Drives

You might have noticed that we neglected to mention Enterprise SAS and ultra high-end SSDs. These drives certainly have value in specific applications, but a home lab environment is probably not the best use case. SAS drives can be expensive, power-hungry, require SAS controllers, and are finicky about mixed-use with SATA drives. And while you might have one in your high-end gaming rig, it’s not likely you’re going to fill your home lab with a dozen or more PCIe or NVMe drives, due to cost alone. We find it’s best to keep things relatively simple when it comes to your home lab, and we hope these tips will help you select a storage strategy that will serve you well into the future.

Low MTU Issue on Sophos UTM

If you know me, you know that I am one that likes to apply the latest firmware or software updates to my equipment at home. I hate looking at my consoles and seeing that there is an update waiting to be applied. Well this finally bit me at home about a month ago. If you saw my other post, you know that I use Sophos UTM Home Edition at home as my firewall of choice. I have been using this for close to a year now. For the entire time that I have been using UTM I have been applying the firmware updates as soon as I see them available in the WebAdmin. Not once have I had a problem. That is until version 9.405-5.

Like any other update, I applied it like normal and everything appeared happy and functioning. Then comes the day of the Battlefield 1 demo for the Xbox One. I was geared up to play this demo all day at work. I get home, eat dinner with the family and put the kids to bed. Finally, it is time to play. I jump on, create a party, and let my buddy know that I am online. He jumps online and attempts to join the party. Next thing you know I get the message “your network settings are blocking party chat”. We attempt to create the party a few more times unsuccessfully. At this point I am puzzled and angry. This is starting to cut into my game time.

It had been a while since I last checked my NAT on Xbox, so I decided to go there to check what my NAT type was. I went to the Home screen then chose Settings > All Settings > Network > Network Settings. Sure enough, right there on my screen was my NAT type, still showing Open. So now what? I figured maybe that was a false reading, so why not run the test manually and see what happens. I went ahead and clicked on “Test Network Connection” located on the right-hand side of the screen. The test took 10 – 15 seconds to run (felt longer than that when it is chewing into my limited gaming time) and came back with all tests passed. So I go back to the Home screen and create a new party and have my buddy join, and again I receive the message that my network settings are blocking party chat. At this point I was really frustrated. I figured why not try and join a game and see what happens, so I launch the Battlefield 1 demo and attempt to join a match. To my surprise I was able to join and play a round without any issue. Now I was wondering if maybe there were issues with the Xbox Live services, so I jump on my phone and went to check the Xbox Live Status. Unfortunately, everything was showing normal. So I decided to jump into the UTM WebAdmin and check the firewall logs. While watching the live log feed I could see myself connecting to Xbox Live successfully without anything being blocked. At this point it was late and I was tired, so as much as it pained me to leave something not working, I had to walk away.

The next night I attempt to jump online with my friend, hoping that everything was magically resolved, and to my disappointment the same thing happened. This time after I went and tested my network connection on my Xbox I noticed another test to run called “Test Multiplayer Connection”. I gave this a shot expecting it to succeed just like the network tests had been. To my surprise the multiplayer test failed and returned a message “There’s an MTU problem” and showed that I had an MTU of 576. I thought that was odd because I knew I had my MTU set at 1500 on my WAN interface within UTM. I proceeded to log into UTM to verify that this did not change for some reason. After logging into the WebAdmin I went to Interfaces & Routing > Interfaces and I take a look at the External (WAN) connection and it is showing an MTU of 1500. At this point I thought of the good ole saying “When all else fails, reboot!” So that is exactly what I did. I started by rebooting my cable modem, then my firewall and finally my Xbox. I then ran through both the network test and the multiplayer test and received the same results. The network tests passed and the multiplayer tests failed with the MTU error. At this point I figure I should start ruling out equipment, so I go to the basement, unplug my cable modem and bring it upstairs and hook it directly to my Xbox. Once the modem is fully booted, I start up my Xbox and head over to the networking section to start running the tests again. I run the network test and the multiplayer tests and they both pass. I jump into a party chat and call up my friend to see if he can join really quick to test. He was able to join successfully and we were able to chat and stay connected without a problem. So at this point the problem would point to either my firewall or my switch. Well, I knew I hadn’t made any changes to my switch in months, so that left me with my firewall.

On the third night I began combing through my firewall configs. I start looking at my NAT rules and my firewall rules to make sure they are configured the way I left them. I decided to start disabling features to see if anything I had enabled would have started affecting the Xbox. (I plan on covering my favorite features of UTM in a later post.) Even though I have exceptions created for my Xbox I figured this was a good place to start. I began by disabling the Web Filtering feature. No change. Multiplayer tests are still failing. Now I disable the Advanced Threat Protection feature and test again. No change. Lastly, I disable the Intrusion Prevention feature. And again no change. The multiplayer tests continue to fail and now a new issue has cropped up. My daughter likes to stream Netflix from my Xbox after dinner for a few minutes before bed. She is now unable to stream anything from the Xbox. At this point I am thinking that I made things worse by messing around in the configs; however, I am puzzled as to why my other devices are able to stream Netflix just fine. I figured it was best to roll back to a previous day’s backup where I knew Netflix was working. Luckily, that did fix Netflix; however, I did notice that it was taking a much longer time to cache the movies before they began to play. I decided to try and stream a video outside of Netflix to see how that performed. I was able to stream a movie from my Plex server without an issue, so now I needed to find a way to stream a video from the internet on my Xbox. I went to the help section and found a video about the new features from the latest Xbox update. When I went to stream this video something very interesting happened. The video would only play for 3 seconds then stop, cache, then play for another 3 seconds and cache, and it kept going like that. At this point I am puzzled as to how to proceed.

Prior to using UTM as my firewall I was using pfsense and had been using it successfully for years. I decided to stand up a pfsense install on a spare PC that I had laying around and put it in place of my UTM server and see what happened. Luckily nobody was home that night so I was able to take the internet down without any complaints to put the pfsense server in. After placing the pfsense server in line I went upstairs to begin testing my Xbox. I hard powered my Xbox, logged in, went to the network settings and began the same round of testing that I had been doing. To both my joy and disappointment all the tests passed. I again called up my friend to test out the party chat and so we jumped into a party and everything connected and worked just fine. I was not satisfied with this solution though. I want to use UTM for my firewall solution.



The next day at work I began to scour the internet for solutions. I did not realize that ISPs are notorious for handing out low MTUs, and it just so happens that those low MTUs are exactly 576. I still didn’t understand why my MTU was showing as 1500 in the Sophos WebAdmin console but my Xbox was reporting my MTU as 576. Once I Googled “UTM 9405-5 low mtu” the search returned exactly what I was looking for. The first result linked to a forum post on Sophos’ site referencing this exact problem, and within this post was a workaround for it. Apparently, in version 9.405-5 Sophos made a change: if your ISP hands out an MTU when your modem requests a DHCP address, it overrides the MTU that you specified and goes with what the ISP handed out, which just so happens to be the low MTU of 576. The workaround was either to reinstall your UTM to a version lower than 9.405-5 or to modify a file on the server to remove the new MTU option that was introduced. Here is the process that you have to go through to modify the default.conf file and remove the changed MTU option.

First, log into the UTM server’s shell console using the root login.

Now change to the directory that houses the default.conf file

You can less the default.conf file by typing:

You should see something similar to:

The important line here is this one:

Now let’s modify the default.conf file and make the changes that will fix this MTU issue:

change the line to

then save the file.

At this point you could just take your WAN interface down and back up again, but I always like to reboot. If taking the interface down and back up doesn’t work, try giving the server a reboot.

After making this change I put Sophos back in line and, lo and behold, everything worked just fine. Personally, I didn’t like this as a fix for my system. So what I did was log into Sophos WebAdmin and download a copy of all of my config backups, dating back to when I first stood up the system. I then reinstalled Sophos UTM to the original version that I had saved, which was version 9.356-3. I then browsed to the Sophos Up2Date repository and downloaded the manual Up2Date files to get my UTM box to version 9.404-5, which was the version on my config backups right before version 9.405-5.

A few days after I had fixed my UTM server I had a coworker update his UTM server firmware to version 9.405-5 and the next day his Vonage service began failing. I pointed him to the fix above and he was able to implement it successfully and his Vonage service returned to normal.

Sophos has acknowledged the issue and began working on a fix ([NUTM-4992]: [Network] Unitymedia / KabelBW customer getting always the MTU 576). This fix was included in version 9.407-3. I have not personally applied this firmware yet. Sophos still defaults to the MTU issued by the ISP; however, you can change the default behavior via the command shell.

I did find a very helpful post on the Sophos forum that walked you through how to disable the MTU auto discovery feature that was applied in version 9.407-3.  I have not personally tried this fix but it appears many people have had success using this so I figured it was worth sharing:

When you type cc and press Enter you will enter a kind of second shell. Inside that shell you will type those commands twister5800 provided. I’ll try to explain a bit further to you:

You will get an output like:

This means you are inside the cc shell.

Now it gets a little tricky. Most setups which have this issue use an ethernet-type WAN, so:

Here you will have to “select” your WAN interface. To do that:

type REF_ (this is case sensitive) and press [TAB] two times. It should list all your ethernet type interfaces, like this:

On a default configuration system, it should look exactly like this, but don’t worry if it doesn’t. From those lines, look for the one that contains something like “REF_IntEthExternaWan” or the name of your WAN interface. After you locate the name of your WAN interface in the list, type the rest of the object name (case sensitive). To avoid any typos, you can copy and paste the rest of the object name after REF_.

For example, provided that your WAN interface is using the default name, you should then complete REF_ with:

That will autocomplete the name for your WAN interface. Then, press [ENTER] again.

You should get an output like this:

If you do, you are on the right track. Then type:

You will get the same output as before, but mind the subtle change on ‘mtu_auto_discovery’ line, that should now be 0.

To save, type 

this will save your configuration.

this will return to the shell.

After that, fix the MTU in Webadmin and it should not revert to 576 anymore.

Let me know how it goes.

Regards – Giovani

Here is a link to the page that contains the post from Giovani:  https://community.sophos.com/products/unified-threat-management/f/hardware-installation-up2date-licensing/80641/sophos-utm-9-407-3-released#pi2132219853=2

I am undecided if I am going to apply the fix in version 9.407-3 or not.  I am leaning towards holding off until the next firmware version to see if there is any chance that this can be controlled via the WebAdmin.  Feel free to leave a comment in the comments section and let me know if you applied the solution from version 9.407-3 successfully.

List Directories and Files with Tree

On one of our backup servers, we run StableBit’s DrivePool with great success. As we’ve mentioned, this is a great program that allows you to pool disparate hard drives on a Windows Desktop or Server and has some great features and options. We use it to simply pool a number of drives to provide a large (20+ TB) backup target for our uSANs. After all, it’s backup, and in a home lab, you may not want to spend extra money on parity drives in your backup server when you already have parity and redundancy at other levels. And though it’s been working without fail for some time now, there’s one nagging thought that always lurks in the shadows for me.

As with any virtual file system layered on top of a drive pool, not knowing exactly where your files are is just how things work. After all, that’s what it’s designed to do – obfuscate the disk subsystem to provide a single large file system to place your files. Copy your stuff to the pool and let the software do the rest. To the user, all your files transparently appear in one neat and tidy place.

Perhaps it’s my OCD, but I still like to know where everything is. In a pinch, say if a backup drive fails, I like knowing exactly what’s gone. It’s like the old saying, “you don’t know what you don’t know.” “But Bill,” you say, “if a disk fails, simply rerun your backup scripts and let the system do its thing.” I know, and you’re exactly right, but you still can’t convince my OCD of that.



So, without further ado, here’s a simple command-line tool in Windows that will output a list of your files for reference should you need it — tree. Tree is included with nearly all versions of Windows and it’s quite easy to use. In its simplest form, tree simply outputs a list of directories, beginning with the current directory, and does so in a visual tree form that shows the directory structure. In system32, for instance, it looks like this:
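
For example (listing trimmed, and the folder names will of course vary by system):

    cd /d C:\Windows\System32
    tree
    ├───0409
    ├───AdvancedInstallers
    ├───am-et
    ...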

Tree only has a couple of command-line switches, but both can be useful. Running tree with /F also displays the names of the files in each folder. As you can imagine, the output could get quite lengthy for a folder like system32, but sending the output to a logfile allows you to review or search the output as needed. Using /A outputs the results using ASCII characters instead of extended characters. This is important when sending the output to a plain-text file, in which extended characters may not appear properly.

The simple command just looks like this:
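
(The drive letter and paths below are just placeholders for your own pool and log locations.)

    tree D:\Pool /F /A > C:\Logs\pool_tree.txt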

Output is neatly sent to a plain-text file, which documents the file and folder layout.

For our backup pool, we simply send the output tree to a log file as part of a daily scheduled task.  Should a drive in the pool fail, we can simply reference the log file for that day to determine exactly which files were lost.
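
One way to set that up is with schtasks; the task name, paths, and run time in this sketch are placeholders:

    schtasks /Create /SC DAILY /ST 02:00 /TN "Pool Tree Log" /TR "cmd /c tree D:\Pool /F /A > C:\Logs\pool_tree.txt"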

Configuring Fibre in LIO

Here’s a quick walk-through to get you up and running with fibre channel in LIO; a consolidated command sketch follows the numbered steps below.

1) Install LIO if not already installed

2) Create the qla2xxx.conf file to configure fibre HBA to target mode

(the x’s are part of the file name, not placeholders for the actual model number)

3) Add the following line to the qla2xxx.conf file

4) Save the file and exit

5) Now we must update initramfs with the new changes

6) Restart the server to apply changes

7) Launch targetcli to configure LIO

8) Add the LVM luns to Lio-Target (LVM luns are added to the iblock option under /backstores)

9) Now find the WWN(s) for the fibre card that is located in the server

10) Change to the qla2xxx directory

11) Now we need to add the WWN(s) from step 9 into targetcli (Do this step for each WWN that you wish to use)

(fill out the WWN from step 9)

12) Change to the WWN that you wish to add storage to

13) Add the storage from step 8 to the WWN that you just changed to.

14) Repeat steps 12 and 13 for each WWN and storage that you wish to use

15) Now change to the acls directory so we can add the ACL that allows the host to talk to the uSAN

16) Now create the ACL for the WWN of the host you are trying to present the LUN to

(This is the WWN on the host)

17) Review all the configuration changes that were just made

18) Once all configuration changes have been verified, save the configuration
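
The consolidated sketch below shows one way the steps above look end to end. Package names, device paths, backstore names, and WWNs are placeholders, and the exact targetcli syntax can vary slightly between versions:

    # 1) Install the LIO management shell (package may be targetcli or targetcli-fb)
    apt-get install targetcli

    # 2-4) Put the qla2xxx HBA into target mode via a modprobe option
    echo "options qla2xxx qlini_mode=disabled" > /etc/modprobe.d/qla2xxx.conf

    # 5) Rebuild the initramfs so the option takes effect at boot
    update-initramfs -u    # (dracut -f on RHEL/CentOS)

    # 6) Reboot, then 7) launch targetcli
    targetcli

    # 8) Add an LVM LUN as an iblock backstore (name and LV path are examples)
    /backstores/iblock create name=lun0 dev=/dev/vg_data/lv_lun0

    # 9) From a regular shell, find the local HBA WWN(s):
    #    cat /sys/class/fc_host/host*/port_name

    # 10-11) Create a target for each local WWN (colon-separated WWPN)
    /qla2xxx create 21:00:00:24:ff:xx:xx:xx

    # 12-14) Map the backstore to a LUN under that WWN (repeat per WWN/backstore)
    cd /qla2xxx/21:00:00:24:ff:xx:xx:xx/
    luns/ create /backstores/iblock/lun0

    # 15-16) Create an ACL for the host's initiator WWN
    cd acls
    create 21:00:00:1b:32:xx:xx:xx

    # 17-18) Review and save
    cd /
    ls
    saveconfig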

 

Bootable USB for Sophos UTM

As an IT professional, I know how important security is to an environment. For this reason, a simple Linksys router with a firewall isn’t enough for me. For my firewall I am currently using Sophos UTM installed on a SUPERMICRO 5015A-EHF-D525 1U server containing an X7SPE-HF-D525 motherboard, with an Atom 1.8GHz processor and 4GB RAM. If you are at all familiar with this server you know that it does not include an optical drive. Most people use a USB drive to install their operating systems now. Well, I ran into an issue when installing Sophos UTM: it does not install nicely from a USB stick, so I thought I would document the process that I found and used successfully.

First download Sophos UTM Home ISO

–https://www.sophos.com/en-us/products/free-tools/sophos-utm-home-edition.aspx

Next download Rufus (used to create the bootable USB)

–https://rufus.akeo.ie/

There is a reason for using Rufus. There are file names buried within the ISO that are too long for other bootable USB programs, so they truncate the end of the file names. This requires you to 1) know where all of the files are located and 2) manually fix the filenames to contain the complete file name. Rufus leaves all of the file names intact. That is why I recommend using Rufus to create your bootable USB.



Next launch Rufus and from within Rufus

–Select the USB Device
–Next to the “Create a bootable disk using” section click on the cdrom icon and choose the ISO

Now boot the soon-to-be UTM server from the USB stick

Once you reach the UTM install screen, press Alt-F2

You should now be at a command prompt. Mount the USB stick to /install
–The command to mount the USB stick to /install is:
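
Assuming the USB stick shows up as /dev/sdb1 (see the note below), the command is:

    mount /dev/sdb1 /install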

Note: because I had two hard drives in the system, my USB stick was /dev/sdc1. Keep this in mind, because your USB stick may not be /dev/sdb1.

Finally, once the mount is successful, press Alt-F2 and proceed with the install as normal.

The information in this post was put together and tested from a forum post on the Sophos forum and all credit for finding the process goes to the authors there.  I just put together what worked for me for future reference.  The Sophos forum post is located here:

https://community.sophos.com/products/unified-threat-management/f/52/t/27887?pi394=2#pi394=4

A Better Block Device in LIO

If you’ve read our previous articles on LIO, you’ve probably gathered that LIO is one of our favorite Linux utilities. We love the ability to use inexpensive hardware and FC or iSCSI cards to create a rock-solid Linux-based SAN to provide back-end storage for Hyper-V cluster shared volumes, highly-available shared VHDXs, or LUNs for Windows File Servers. We also love the flexibility that Linux MDADM/LVM offers to seamlessly add or expand storage arrays or add new LUNs. It really gives the IT Pro the ability to use many Enterprise features in a home lab that you’d otherwise only be able to replicate with expensive, impractical hardware.

In the end, all this flexibility means we will inevitably tinker with configurations, add and remove hardware, and just generally screw around with things until we break them, then fix them, then break them again. That’s what we do. And, as it so often goes in IT, with any luck we’ll learn a thing or two along the way.

This was exactly the case when we recently expanded one of our backup Ubuntu SANs by adding a new disk. After the new volume was added, it became apparent that the previous method of using the typical Linux device notation for hard disks (/dev/sda, /dev/sdb, etc.) was not an optimal configuration.

Consider the following LIO backstores configuration:
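
It looked something like this in targetcli (listing trimmed, and the object names are just examples; the point is that each backstore was defined against a bare /dev/sdX device):

    /backstores/iblock> ls
    o- iblock ................................. [4 Storage Objects]
      o- csv01 ..................... [/dev/sdd activated]
      o- csv02 ..................... [/dev/sde activated]
      o- backup01 .................. [/dev/sdf activated]
      o- backup02 .................. [/dev/sdg activated]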

This configuration has worked fine for months. However, after adding the new disk, we quickly realized one of the volumes being presented to a two-node Hyper-V cluster was now listed as “Offline – Not initialized,” and any attempts to bring it online failed with I/O errors.

Looking at the backup uSAN, the disk that was formerly /dev/sdg was now /dev/sdh, and LIO’s ACLs were no longer correct. Though quick and dirty, using the /dev/sdx notation is clearly not the best way to add a single disk to the LIO backstores, since these values are subject to change. Looking in /dev/disk, we see a few different options that may be helpful:
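
A quick listing shows the usual suspects (the exact set of directories depends on what udev finds on your system):

    ls /dev/disk/
    by-id  by-label  by-partlabel  by-partuuid  by-path  by-uuid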

Typically by-UUID is a good option – we’ve used it in the past for other operations. However, we’re specifically exporting block devices, and UUID only shows disks with partitions. /dev/disk/by-label, /dev/disk/by-partlabel, and /dev/disk/by-partuuid are much the same way; not all disks will have a label or partitions to view. /dev/disk/by-path is promising, but only if all the relevant disks are hanging off a SAS controller. Since many lab environments, such as ours, may make use of both on-board SATA headers as well as PCIe SAS controllers, that only leaves /dev/disk/by-id. Now listing disks by /dev/disk/by-id appears a bit messy at first, but if you look carefully you’ll see a neat and tidy way of referencing disks.



Specifically, let’s look at this system’s disk in question, /dev/sdh.
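
Filtering the by-id listing for sdh gives something like this (columns trimmed; the drive model and serial shown here are illustrative placeholders):

    ls -l /dev/disk/by-id/ | grep sdh
    ata-WDC_WD40EFRX-68N32N0_WD-WCC7KXXXXXXX -> ../../sdh
    wwn-0x50014ee2bXXXXXXX -> ../../sdh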

Very nice — we see a disk ID that not only tells us the make and model of the drive, but also appends the drive’s serial number to the end. This is quite handy in a system with 10 or 20 drives, many of which may be the same model. Now, let’s go back to LIO’s targetcli and try to add the block device using this new identification, rather than the device letter.
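
Inside targetcli, that looks something like this, using the illustrative backstore name and disk ID from above:

    /backstores/iblock create name=backup02 dev=/dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7KXXXXXXX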

OK. Looks like that works just fine. Now, let’s create the associated LUN.
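
Still in targetcli, the new backstore gets mapped as a LUN under the appropriate target WWN (the WWN here is a placeholder):

    cd /qla2xxx/21:00:00:24:ff:xx:xx:xx/
    luns/ create /backstores/iblock/backup02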

Again, all appears well. To summarize the final steps, we then added the storage in Failover Cluster Manager, created the Cluster Shared Volume from the new disk, created a VHDX to fill the LUN, and attached the new virtual disk to our virtual machine without issue. Now, when we add or swap drives, change disk controllers, or even completely move disks to a new motherboard/chassis, we no longer have to worry about device letters, as this new (and better) method removes any ambiguity as to which disk is which.

Clear the Disk Read-Only Flag in Windows

While recently adding a new disk to one of our backup servers, one of the disks changed device letters in Linux. Ordinarily this is not a big deal, but since this particular disk was an iblock device in an LIO backstore, and was defined by the /dev/sd[x] notation, it was no longer listed correctly. Oddly, the disk was still listed in the Disk Manager on the hypervisors, but any attempt at I/O would result in errors. The disk was ultimately removed from the LIO configuration, which then caused the LUN to drop from the hypervisor nodes.

After adding the disk back to LIO using a slicker method as detailed here, the disk reappeared on the hypervisors, and we reconnected the disk to the VM in Hyper-V. However, after adding the storage back, we noticed the LUN from LIO was marked as read-only in the virtual server, and would not permit any writes. Should you run into a similar situation, the fix is usually pretty simple, as noted below.

First, start the diskpart utility from a Windows CLI and list the available disks:
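
From an elevated command prompt:

    diskpart
    DISKPART> list disk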

 

Next, select the disk in question, in this case Disk 6. Notice that when we look at the disk details in diskpart, this disk is definitely listed as read-only:
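
The relevant bits look something like this (detail disk output trimmed):

    DISKPART> select disk 6

    Disk 6 is now the selected disk.

    DISKPART> detail disk
    ...
    Current Read-only State : Yes
    Read-only               : Yes
    ...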

 

With the disk still selected, clear the readonly attribute for the disk with the following command:
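
The command, and the confirmation you should see, look like this:

    DISKPART> attributes disk clear readonly

    Disk attributes cleared successfully.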

 

The disk should now be listed as “Read-Only: No,” and available for writing. You can verify its status with the detail command as before.

We’re still not quite sure what caused this little issue, as we’ve removed and added several disks back in LIO without this cropping up. Perhaps it was the less-than-graceful removal of the disk from the hypervisor while it was attempting I/O. Whatever the case, though an old utility, diskpart can still prove to be a useful tool when the need arises.