Windows Subsystem for Linux – What’s the Deal?

This past summer, Microsoft released its “Anniversary Update” for Windows 10. It included a lot of the business-as-usual sort of operating system updates: enhanced security, improved integration with mobile devices, and updates to Microsoft’s “virtual assistant” Cortana (who is totally not named after a video game AI character that went rampant and is currently trying to destroy all biological life in the known universe, because what company would possibly tempt fate like that?).

[Image: Cortana from Halo 4, mid-rampancy]
“NO I WILL NOT OPEN ANOTHER INCOGNITO WINDOW FOR YOU, FILTH”

But possibly the biggest under-the-radar change to Windows 10 was the introduction of Bash on Ubuntu on Windows. Microsoft partnered with Canonical, the company that develops the popular Linux operating system distribution Ubuntu, to create a full-fledged Linux/Ubuntu subsystem (essentially Ubuntu 14.04 LTS) inside of Windows 10. That’s like a turducken of operating systems.

[Image: a turducken]
Which layer is the NT kernel, though?

What does that mean, practically speaking? For years, if you were interested in command-line control of your Windows computer, you could use PowerShell or the Command Prompt – the latter being essentially the same command-line system that Microsoft has been using since the pre-Windows days of MS-DOS. Contrast that with Unix-based systems like Mac OS X and Ubuntu, which by default use a command-line shell called Bash – the thing you see any time you open the application Terminal.


The Bash shell is very popular with developers and programmers. Why? A variety of reasons. It’s an open-source system versus Microsoft’s proprietary interface, for one. It has some enhanced security features to keep users from completely breaking their operating system with an errant command (if you’re a novice command-line user, that’s why you sometimes use the “sudo” command in Terminal but never in Command Prompt – Windows has traditionally assumed that everyone using Command Prompt is a “super user” with access to root directories, whereas Mac OS X/Linux prefers to at least check that you still remember your administrative password before deleting your hard drive from practical existence). The Bash scripting language handles batch processing (working with a whole bunch of files at once), scheduling commands to be executed at future times, and other automated tasks a little more intuitively. And, finally, Unix systems have a lot more built-in utility tools that make software development and navigating file systems more elegant (to be clear, these utility applications are not technically part of the Bash shell – they are built into the Mac OS X/Linux operating system itself and accessed via the Bash shell).
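To make that batch-processing point concrete, here’s a minimal sketch (the file extension and the checksum step are just illustrative) of the kind of one-off loop Bash makes easy:

[cc lang="Bash"]
# Generate an MD5 checksum for every .mov file in the current directory
for f in *.mov; do
    md5sum "$f" >> checksums.md5
done
[/cc]

Try expressing that in a Command Prompt batch file and you’ll start to see the appeal.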

Bringing a Linux subsystem and Bash shell to Windows is a pretty bold move to try to win developers back to Microsoft’s platform. There have been some attempts before to build Linux-like environments on Windows to port Mac/Linux software – Cygwin was probably the most notable – but no method I ever tried, at least, felt as intuitive to a Mac/Linux user as Bash on Ubuntu on Windows does.

[Image: the Cygwin setup window]
what even are you

Considering the increasing attention on open-source software development and command-line implementation in the archival community, I was very curious as to whether Bash on Ubuntu on Windows could start bridging the divide between Mac and Windows systems in archives and libraries. The problems of incompatible software and the difference in command-line language between Terminal and Command Prompt aren’t insurmountable, but they’re not exactly convenient. What if we could get all users on the same page with the tools they use AND how they use them – regardless of operating system???

OK. That’s still a pipe dream. I said earlier that the Windows Subsystem for Linux (yes, that’s what it’s technically called, even though that sounds like the exact opposite of what it should be) was “full-fledged” – buuuuuut I kinda lied. Microsoft intends the WSL to be a platform for software development, not implementation. You’re supposed to use it to build your applications, but not necessarily deploy them into a Windows-based workflow. To that end, there are some giant glaring holes compared to a pure Ubuntu installation: using Bash on Ubuntu on Windows, you can’t run any Linux software with a graphical user interface (GUI) (for example, the common built-in Linux text editing program gedit doesn’t work – but nano, which lets you edit text from within the Bash terminal window itself, does). It’s CLI or bust. Any web-based application is also a big no-no, so you’re not going to be able to sneakily run a Windows machine as a server using the Linux subsystem any time soon.

Edit: Oh, and the other giant glaring thing I forgot to mention the first time around – there’s no external drive support yet. The WSL can’t access removable media on USB or optical disc mounted on the Windows file system – only fixed drives. So disc imaging software, while it technically “works”, can only operate on data already moved onto your Windows system.

But with all those caveats in mind… who cares what Microsoft says is supposed to happen? What does it actually do? What works, and what doesn’t? I went through a laundry list of command-line tools that have been used or taught the past few years in our MIAP courses (primarily Video Preservation, Digital Preservation and Handling Complex Media), plus a few tools that I’ve personally found useful. First, I wanted to see if they installed at all – and if they did, I would try a couple of that program’s most basic commands, hardly anything in the way of flags or options. I wasn’t really trying to stress-test these applications, just see if they could indeed install and access the Windows file system in a manner familiar to Mac/Linux users.

[Image: a Bash on Ubuntu on Windows terminal]
“Hello Bash my only friend / I’ve come to ‘cat’ with you again”

Before I start the run-down, a note on using Bash on Ubuntu on Windows yourself, if interested. Here are the instructions for installing and launching the Windows Subsystem for Linux – since the whole thing is technically still in beta, you’ll need to activate “developer” mode. Once installed and launched (typing “bash” at a Command Prompt will get you there), ALL of these applications will only work through the Bash terminal window – you cannot access the Linux subsystem, and all software installed thereon, from the traditional Windows Command Prompt. (It goes the other way too – you can’t launch your Windows applications from the Bash shell. This is all about accessing and working with the same files from your preferred command-line environment.) And once again, the actual Ubuntu version in this subsystem is 14.04 LTS – which is not the latest stable version of that operating system. So any software designed only to work with Ubuntu 16.04 or the very latest 16.10 isn’t going to work in the Windows subsystem.
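If you want to confirm which Ubuntu release your particular install is running, lsb_release will tell you from inside the Bash terminal (output here is illustrative of a 14.04 install):

[cc lang="Bash"]$ lsb_release -a
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:        14.04
Codename:       trusty[/cc]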

Once you’re in a Bash terminal, you can access all your files stored within the Windows file system by navigating into the “/mnt/” directory:

[cc lang="Bash"]$ cd /mnt/[/cc]

You should see one lettered directory here for each drive mounted in your computer, matching its assigned letter in Windows. For instance, for many Windows users, all your files will probably be contained within something like:

[cc lang="Bash"]/mnt/c/Users/your_user_name/Downloads[/cc] or

[cc lang="Bash"]/mnt/c/Users/your_user_name/Desktop[/cc], etc.
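On a machine with both a C: and a D: drive, for instance, listing that directory would show something like this (output illustrative):

[cc lang="Bash"]$ ls /mnt
c  d[/cc]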

And one last caveat: dragging and dropping a file into the Bash terminal to quickly get the full file path doesn’t work. It will give you a Windows-style file path (e.g. “C:\Users\username\Downloads\file.pdf”) that the Bash shell can’t read. You’re going to have to manually type out the full file path yourself (tab completion of directory and file names does still work, at least).
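If you do a lot of this, you can make the shell do the translating. Here’s a quick-and-dirty sketch (the path is hypothetical, and it assumes a simple single-letter drive and GNU sed):

[cc lang="Bash"]
# Convert a Windows-style path into its WSL /mnt equivalent:
# flip the backslashes, then lowercase the drive letter under /mnt
winpath='C:\Users\username\Downloads\file.pdf'
echo "$winpath" | sed -e 's|\\|/|g' -e 's|^\([A-Za-z]\):|/mnt/\L\1|'
# => /mnt/c/Users/username/Downloads/file.pdf
[/cc]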

Let’s get to it!

Programs That Install Via Apt-Get:

  • bagit-java
  • bagit-python
  • cdrdao
  • ClamAV
  • ddrescue (install w/package name “gddrescue”, execute w/ command “ddrescue”)
  • ffmpeg (but NOT ffplay)
  • git
  • imagemagick
  • md5deep
  • mediainfo
  • MKVToolNix
  • Python/Python3/pip
  • Ruby/RubyGems
  • rsync
  • tree

Installing via Ubuntu’s “apt-get” utility is by far the easiest and most desirable method of getting applications installed on your Linux subsystem. It’s a package manager that works the same way as Homebrew on Mac, for those used to that system: just execute

[cc lang="Bash"]$ sudo apt-get install nameofpackage[/cc]

and apt-get will install the desired program, including all necessary dependencies (any software libraries or other utilities needed to make the program run). As you can see, the WSL can handle a variety of useful applications: disk imaging (cdrdao, ddrescue), transcoding (ffmpeg, imagemagick), virus scanning (ClamAV), file system and metadata utilities (mediainfo, tree), and hash/checksum generation (md5deep).
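A typical session, using tree as the guinea pig (the path is just an example):

[cc lang="Bash"]
$ sudo apt-get update
$ sudo apt-get install tree
# Show the Windows file system, two levels deep
$ tree -L 2 /mnt/c/Users
[/cc]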

[Image: mediainfo output in the Bash terminal]
Windows 10!

You can also get distributions of programming languages like Python and Ruby and use their own package managers (pip, RubyGems) to install further packages/libraries/programs. I tried this out with Python by installing bagit-python (my preferred flavor of BagIt – see this previous post for the difference between bagit-python and the bagit-java program you get by just running “apt-get install bagit-java”), and with Ruby by installing the Nokogiri library and running through this little Ruby exercise by Ashley Blewer. (I’d tried it before on Mac OS X, but guess what, it works on Windows through the WSL too!)
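For the curious, the bagit-python route looked roughly like this (the target folder is hypothetical):

[cc lang="Bash"]
# pip itself comes from Ubuntu's repos; bagit comes from PyPI
$ sudo apt-get install python-pip
$ sudo pip install bagit
# Bag a folder in place
$ bagit.py /mnt/c/Users/your_user_name/Desktop/test_files
[/cc]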

A couple of things to note: one, if you’re trying to install the Ubuntu version of ddrescue, there are, confusingly, a couple of different packages with near-identical names that serve the same purpose. There’s a nice little rundown on the Ubuntu forums of how that happened and how to make sure you’re installing and executing exactly the program you want.
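The short version, as noted in the list above: the GNU version lives in the “gddrescue” package, but the command you actually run is still “ddrescue”:

[cc lang="Bash"]
$ sudo apt-get install gddrescue
$ ddrescue --version
[/cc]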

Also, while ffmpeg’s transcoding and metadata-gathering features (ffprobe) work fine, its built-in media playback command (ffplay) will not, because of the aforementioned issue with GUIs (it has to do with X11, the window system that Unix systems use for graphical display, but never mind that for now). Actually, it sort of depends on how you define “work”, because while ffplay won’t properly play back video, it will generate some fucking awesome text art for you in the Bash terminal:

[Image: ffplay rendering video as text art in the Bash terminal]
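If you just want the metadata without the light show, ffprobe behaves as expected (file name hypothetical):

[cc lang="Bash"]$ ffprobe -show_format -show_streams myvideo.mov[/cc]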

Programs That Require More Complicated Installation:

  • bulk_extractor (requires legacy JDK)
  • exiftool
  • FSlint
  • MediaConch
  • The Sleuth Kit tools

These applications can’t be installed via an apt-get package, but you can still get them running with a little extra work, thanks to other Linux features such as dpkg. Dpkg is another package management program – this one comes from Debian, the Linux operating system of which Ubuntu is a direct (more user-friendly) derivative. You can use dpkg to install Debian (.deb) packages (like the CLI of MediaConch), although take note that unlike apt-get, dpkg does not automatically install dependencies – so you might need to go out and find other libraries/packages to install via apt-get or dpkg before your desired program actually starts working (for MediaConch, for instance, you should just apt-get install mediainfo first to make sure you have the libmediainfo library already in place).
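As a sketch of that two-step (the .deb file name here is hypothetical – use whatever package you actually downloaded):

[cc lang="Bash"]
# Dependencies first, via apt-get...
$ sudo apt-get install mediainfo
# ...then the downloaded Debian package itself
$ sudo dpkg -i mediaconch_cli.deb
[/cc]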

The WSL also has the autoconf and automake utilities of full Linux distributions, so you can use those to build packages like The Sleuth Kit (a bunch of digital forensics tools) or FSlint (a duplicate file finder) from source. The best solution is to follow whatever Linux installation documentation there is for each of these programs – if you have questions about troubleshooting specific programs, let me know and I’ll try to walk you through my process.
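In broad strokes, that means the classic configure/make routine – a generic sketch only, since each program’s own docs take precedence:

[cc lang="Bash"]
# From inside the unpacked source directory
$ ./configure
$ make
$ sudo make install
[/cc]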

[Image: FSlint]


Programs That Don’t Work/Install…Yet:

  • Archivematica
  • Guymager
  • vrecord

I had no expectation that these programs would work given the stated GUI and web-based limitations of the WSL, but this is just to confirm that, as far as I can tell, there’s no way to get them running. Guymager has the obvious GUI/X11 issue (plus the inability to recognize external devices anyway, and the general dysfunction of the /dev/ directory). The vrecord team hasn’t gotten that program successfully running on Linux yet, and the WSL would hit the GUI issue even if they do release a Linux version. And web applications definitely aren’t my strong suit, but in the long process of attempting an Archivematica installation, the WSL seemed to have separate issues with Apache, uWSGI and NGINX. That’s a lot of troubleshooting to likely no end, so best to probably leave that one aside.

That’s about all for now – I’m curious if anyone else has been testing the WSL, or has any thoughts about its possible usefulness in bridging compatibility concerns. Is there any reason we shouldn’t just be teaching everyone Bash commands now??

Update (10/20): So the very day that I post this, Microsoft released a pretty major update to the WSL, with two major effects: 1) new installations of WSL will now be Ubuntu 16.04 (Xenial), though existing users such as myself will not automatically upgrade from 14.04; and 2) the Windows and Linux command-line interfaces now have cross-compatibility, so you can launch Windows applications from the Bash terminal and Linux applications from Command Prompt or Powershell. Combine that with the comment below from Euan with directions to actually launch Linux applications with GUIs, and there’s a whole slew of options to continue exploring here. Look for further posts in the future! This subsystem is clearly way more powerful than Microsoft initially wanted to let on.

Dual-Boot a Windows Machine

It is an inconvenient truth that the MIAP program is spread across two separate buildings along Broadway. They’re only about five minutes apart, and the vast majority of the time this presents no problems for students or staff, but it does mean that my office and one of our primary lab spaces are in geographically separate locations. Good disaster planning, troublesome for day-to-day operations.

The Digital Forensics Lab (alternately referred to as the Old Media Lab or the Dead Media Lab, largely depending on my current level of frustration or endearment towards the equipment contained within it) is where we house our computing equipment for the excavation and exploration of born-digital archival content: A/V files created and contained on hard drive, CD, floppy disk, Zip disk, etc. We have both contemporary and legacy systems to cover decades of potential media, primarily Apple hardware (stretching back to a Macintosh SE running System 7), but also a couple of powerful modern Windows machines set up with virtual machines and emulators to handle Microsoft operating systems back to Windows 3.1 and MS-DOS.

Having to schedule planned visits over from my office to the main Tisch building in order to test, update, or otherwise work with any of this equipment is mildly irksome. That’s why my office Mac is chock full of emulators and other forensic software that I hardly use on any kind of regular basis – when I get a request from a class for a new tool to be installed in the Digital Forensics Lab, it’s much easier to familiarize myself with the setup process right where I am before working with the legacy equipment; and I’m just point-blank unlikely to trek over to the other building for no other reason than to test out new software that I’ve just read about or otherwise think might be useful for our courses.

[Image: sleepy office worker at a desk with multiple coffees]
#ProtestantWorkEthic

This is a long-winded way of justifying why the department purchased, at my request, a new Windows machine that I will be able to use as a testing ground for Windows-based software and workflows (I had previously installed a Windows 7 virtual machine on my Mac to try to get around some of this, but the slowed processing power of a VM on a desktop not explicitly set up for such a purpose was vaguely intolerable). The first thing I was quite excited to do with this new hardware was to set up a dual-boot operating system: that is, make it so that on starting up the computer I would have the choice of using either Windows 7 or Windows 10 – which is the main thing I’m going to talk about today.

[Image: the new Windows machine]
Swag

Pretty much all of our Windows computers in the archive and MIAP program still run Windows 7 Pro, for a variety of reasons – Windows 8 was geared so heavily towards improved communication with and features for mobile devices that it was hardly worth the cost of upgrading an entire department, and Windows 10 is still not even a year old, which gives me pause in terms of the stability and compatibility of the software we rely on from Windows 7. So I needed Windows 7 in order to test how new programs work with our current systems. However, as it increases in market share and developers begin to migrate over, I’m increasingly intrigued by Windows 10, to the point that I also wanted access to it in order to test out the direction our department might go in the future. In particular I very much wanted to try out the new Windows Subsystem for Linux, available in the Windows 10 Anniversary Update coming this summer – a feature that will in theory make Linux utilities and local files accessible to the Windows user via a Bash shell (the command-line interface already seen on Mac and Ubuntu setups). Depending on how extensive the compatibility gets, that could smooth over some of the kinks we have getting all our students (on different operating systems) on the same page in our Digital Literacy and Digital Preservation courses. But that is a more complicated topic for another day.

When my new Windows machine arrived, it came with a warning right on the box that even though the computer came pre-installed with Windows 7 and licenses/installation discs for both 7 and Windows 10,

You may only use one version of the Windows software at a time. Switching versions will require you to uninstall one version and install the other version.


This statement is only true if you have no sense of partitioning, a process by which you can essentially separate your hard drive into distinct, discrete sections. The computer can treat separate partitions as separate drives, allowing you to format different partitions with entirely separate file systems or, as we will see here, install completely different operating systems.

Now, as it happens, it also turned out to be semi-true for my specific setup, but only temporarily, and because of some kinks specific to the manufacturer who provided this desktop (hi, HP!). I’ll explain more in a minute, but right now would be a good point to note that I was working with a totally clean machine, and therefore endangering no personal files in this whole partitioning/installation process. If you also want to set up some kind of dual-boot partition, please please please make sure all of your files are backed up elsewhere first. You never know when you will, in fact, have to perform a clean install and completely wipe your hard drive just to get back to square one.

[Image: Arnim Zola]
“Arnim Zola sez: back up your files, kids!”

So, as the label said, booting up the computer right out of the box, I got a clean Windows 7 setup. The first step was to make a new blank partition on the hard drive, onto which I could install the Windows 10 operating system files. To do this, we run the Windows Disk Management utility (you can find it by hitting the Windows Start button and typing “disk management” into the search bar):

[Image: searching the Start menu for “disk management”]

Once the Disk Management window popped up, I could see the 1TB hard drive installed inside the computer (labelled “Disk 0”), as well as all the partitions (also called “volumes”) already on that drive. Some small partitions containing system and recovery files (from which the computer could boot into at least some very basic functionality even if the Windows operating system were to become corrupted or fail) were present, but the drive was mostly (~900 GB) dedicated to the main C: volume, which contains all the Windows 7 operating files, program files, personal files if there were any, etc. By right-clicking on this main partition and selecting “Shrink Volume,” I could set aside some of that space for a new partition, onto which we will install the Windows 10 OS. (Note: all illustrative photos were gathered after the fact, so some numbers aren’t going to line up exactly here, but the process is the same.)

[Image: the Disk Management window showing Disk 0 and its volumes]

If you wanted to dual-boot two operating systems that use completely incompatible file systems – for instance, Mac and Windows – you would have to set aside space for not only the operating system’s files, but also all of the storage you would want to dedicate to software, file storage, etc. However, Windows 7 and 10 both use the NTFS file system – meaning Windows 10 can easily read and work with files that have been created on or are stored in a Windows 7 environment. So in setting up this new partition I only technically had to create space for the Windows 10 operating system files, which run about 25 GB total. In practice I wanted to leave some extra space, just in case some software comes along that can only be installed on the Windows 10 partition, so I went ahead and doubled that number to 50 GB (since Disk Management works in MB, we enter “50000” into the amount of space to shrink from the C: volume – strictly speaking, 50 GB is 51,200 MB in Windows’ binary math, but a round number is close enough here).

[Image: the Shrink Volume dialog]

Disk Management runs for a minute and then a new Blank Partition appears on Disk 0. Perfect! I pop in the Windows 10 installation disc that came with the computer and restart. In my case, the hardware automatically knew to boot up from the installation disc (rather than the Windows 7 OS on the hard drive), but it’s possible others would have to reset the boot order to go from the CD/DVD drive first, rather than the installed hard drive (this involves the computer’s BIOS or UEFI firmware interface – more on that in a minute – but for now if it gives you problems, there’s plenty of guides out there on the Googles).

Following the instructions for the first few parts of the Windows 10 installer is straightforward (entering a user name and password, name for the computer, suchlike), but I ran into a problem when finally given the option to select the partition on to which I wanted to install Windows 10. I could see the blank, unformatted 50 GB partition I had created, right there, but in trying to select it, I was given this warning message:

Windows cannot be installed to this disk. The selected disk is of the GPT partition style.

Humph. In fact I could not select ANY of the partitions on the disk, so even if I had wanted to do a clean install of Windows 10 on to the main partition where Windows 7 now lived, I couldn’t have done that either. What gives, internet?

So for many many many years (in computer terms, anyway – computer years are probably at least equivalent to dog years), PCs came installed with a firmware interface called the BIOS – Basic Input/Output System. In order to install or reinstall operating system software, you need a way to send very basic commands to the hard drive. The BIOS was able to do this because it lived on the PC’s motherboard, rather than on the hard drive – as long as your BIOS was intact, your computer would have at least some very basic functionality, even if your operating system corrupted or your hard drive had a mechanical failure. With the BIOS you could reformat your hard drive, select whether you booted the operating system from the hard drive or an external source (e.g. floppy drive or CD drive), etc.

[Image]
Or rule a dystopian underwater society! …wait

In the few seconds after you first powered on a PC, the BIOS would look to the very first section of the hard drive, which (if already formatted) would contain something called a Master Boot Record: a table with information about the partitions present on that drive – how many there are, how large each one is, what file system is on each, which one(s) contain a bootable operating system, and which partition to boot from first (if multiple partitions have a bootable OS).

[Image: the “Windows cannot be installed to this disk” error]
You probably saw something like this screen by accident once when your cat walked across your keyboard right as you started up the computer.

Here’s the thing: because of the limitations of the time, the BIOS and MBR partition style can only handle a total of four primary partitions on any one drive, and can only boot from a partition if it is less than about 2.2 TB in size. For a long time, that was plenty of space and functionality to work with, but with rapid advancements in the storage size of hard drives and the processing power of motherboards, the BIOS and MBR partitioning became increasingly severe and arbitrary roadblocks. So from the late ’90s through the mid-’00s, an international consortium developed a more advanced firmware interface called UEFI (Unified Extensible Firmware Interface), which employs a new partition style, GPT (GUID Partition Table). With GPT, there’s theoretically no limit to the number of partitions on a drive, and UEFI can boot from partitions as large as 9.4 ZB (yes, that’s zettabytes). For comparison’s sake, 1 ZB is about equivalent to 36,000 years of 1080p high-definition video. So we’re probably set for motherboard firmware and partition styles for a while.

[Image]
We’re expected to hit about 40 zettabytes of known data in 2020. Like, total. In the world. Our UEFI motherboards are good for now.

UEFI cannot read MBR partitions as is, though it has a legacy mode that restricts its functionality to that of the BIOS and can thereby read MBR. And if a UEFI motherboard is set to boot only from the legacy BIOS, it cannot understand or work with GPT partitions. Follow?

So GETTING BACK TO WHAT WE WERE ACTUALLY DOING… the reason I could not install a new, Windows 10-bootable partition onto my drive was that the UEFI motherboard in my computer had booted in legacy BIOS mode, for some reason.

[Image]
Me.

Honestly, I’m not sure why this is. Obviously this was not a clean hard drive when I received it – someone at HP had already installed Windows 7 onto this GPT-partitioned hard drive, which would’ve required the motherboard to be in UEFI boot mode. So why did it arrive with legacy BIOS boot mode not only enabled, but set first in the preferential boot order? My only possible answer is that after installing Windows 7, they went back into the firmware settings and switched on legacy BIOS boot mode to improve compatibility with the Windows 7 OS – which was developed and released back when BIOS was still the default for new equipment.

This was a quick fix – restart the computer, follow the brief on-screen instructions to enter the firmware settings (usually by pressing the ESC key, though it can vary with your setup), and navigate through them to re-enable UEFI boot mode. (I also left legacy BIOS boot enabled, though lower in the boot order, for the above-stated reasoning about compatibility with Windows 7 – so now, theoretically, my computer can start up from either MBR or GPT drives/discs with no problem.)

Phew. Are you still with me after all this? As a reward, here’s a vine of LeBron James blocking Andre Iguodala to seal an NBA championship, because that is now you, owning computer history and functionality.

https://vine.co/v/5BuzmV0Xw5b

From this point on, we can just pop the Windows 10 installation disc back in and follow the instructions like we did before. I can now select the unformatted 50 GB partition on which to install Windows 10 – and the installation wizard basically runs itself. After a lot of practical username-and-password setup nonsense, now when I start up my computer, I get this screen:

[Image: the boot menu offering a choice of Windows 7 or Windows 10]

And I can just choose whether to enter the Windows 7 or 10 OS. Simple as that. I’ll go more into some of what this setup allows me to do (particularly the Windows Subsystem for Linux) another day, as this post has gone on waaaayy too long. Happy summer, everyone!