Your Promotion to Package Manager Manager

Let’s talk about Homebrew. No, not the beer-making technique that left you befuddled by the appearance of approximately two thousand quirky-named, identically-tasting IPAs at the liquor store. I mean the popular package manager for macOS.

I literally had to search for “quintuple pale ale” before I stopped getting actual results.

Homebrew is a beloved tool of digital preservation educators and practitioners alike, and it’s little wonder why. While I will continue to bang the drum for Linux operating systems and the Windows Subsystem for Linux, macOS remains a popular choice for digipres workstations, given the general familiarity of Macs and their ability to easily and natively run many handy Bash/command line tools. And Homebrew provides macOS with a major piece that it’s “missing” out-of-the-box: a command line-based package manager.

A “package manager” or package management system allows a user to install, uninstall, upgrade and configure the software on their computer in a consistent and centralized manner. The official Mac App Store is a package manager, for instance – rather than needing to trawl the internet to find, download, and run individual installers for every single application you’d like to try, the App Store (in theory) puts everything in one place. Homebrew is frequently described as an “App Store” for CLI programs, and the comparison is pretty apt.​*​ Setting up Homebrew is pretty much step one for any Mac-based digipres machine: once it’s in place, it’s only a matter of minutes to ffmpeg, mediainfo and mediaconch, vrecord, exiftool, imagemagick, rsync, youtube-dl, hashdeep, ddrescue, and much more.

Homebrew is far from the only command line package manager out there – part of why it’s called the “missing” package manager for macOS is because CLI package managers are included in Linux operating systems by default. Debian-based systems like Ubuntu have the Advanced Packaging Tool (APT);​†​ Red Hat-based systems have YUM or DNF; Arch systems have Pacman; SUSE systems have zypper; etc etc etc (and Homebrew can be used on Linux as well, incidentally). Native Windows users can even get in on the action with Chocolatey.

Homebrew isn’t even the first attempt to bring command-line package management to macOS, following in the footsteps of Fink and MacPorts (both of which are still around, though I would say with less robust or user-friendly communities around them). And that’s not even getting into the crazy number of package or “dependency managers” that do basically the same work for developers looking to add modules in particular programming language environments: pip (Python), RubyGems (Ruby), npm and yarn (NodeJS/JavaScript), Maven (Java)…

It’s a software preservation nightmare! Hooray!

All of these projects have more or less the same goal and advantages: automatically do all the work it takes to install a piece of software and make it available to the user. In many cases that means, for example, typing $ brew install mediainfo so that the mediainfo command-line application is then available by calling $ mediainfo example.mov, where it wasn’t available before.
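To make that concrete, here’s a sketch of that before-and-after in the terminal (the error message is Bash’s standard complaint; any other output details will vary by system):

$ mediainfo example.mov        # before: the command doesn't exist yet
-bash: mediainfo: command not found
$ brew install mediainfo       # Homebrew downloads and sets everything up
$ mediainfo example.mov        # after: technical metadata about the file prints out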

However, as with so many lovely, user-friendly innovations, there is an element here of ceding control for convenience. And while most of the time it is hopefully unnecessary to peek under the Homebrew hood, a couple of times in the past year I’ve seen situations come up where it helps to have some clarity about what all, exactly, Homebrew is doing when one hits enter on a “brew install”.

So today I’m going to dive a bit deeper on the ins and outs of package management. I’ll use Homebrew for several concrete examples because of its popularity and ubiquity in digipres training; keep in mind that some of Homebrew’s (beer-obsessed) vocabulary is unique to it, but the concepts described here are basically applicable across the board, no matter what package manager you’re using.

Return to the Source

First off, I would like to clarify a few basic computing concepts: namely, the different types of code and executable programs that can be installed with a package manager.

There is a key difference in programming between source code and machine code. Source code is “human-readable” – open up a source code file and you will be greeted with plain text, written and formatted according to the conventions of a particular programming language (Python, Bash, JavaScript, Ruby, HTML, etc). These sorts of files make up the majority of what you see on popular source code hosting and version control platforms like GitHub, GitLab, Bitbucket, SourceForge. They facilitate the work of programmers and developers: broadly speaking, people read and write source code.

Machine code, as the name implies…is for machines. It is a set of instructions that can be directly executed by a computer’s processor. Machine code is numerical – there are no words to read, but the CPU interprets and understands a given string of numbers, or even just ones and zeroes, as a series of actions for it to perform. Open up a machine code file in a text editor, and you will just get a string of absolute wingding gobbledygook as the editor tries (and fails) to interpret and display this numerical series as text. (Open it up in a hex editor instead, and you might get a different result, but that’s another day’s topic.)

Literal machine code

In order to get from source code (human-readable) to machine code (machine-readable) you usually have to compile the source code. The purpose of a compiler is to take your source code text and translate it down into ones and zeroes for your computer to actually do something with. Very broadly speaking, a compiler targets the particular OS and CPU that you are using right now. The same source code file (written, let’s say, in C) is going to produce different machine code depending on whether it is intended and compiled to run on macOS or Windows.
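If you’d like to see that translation for yourself, here’s a minimal sketch from the command line – assuming you have a C compiler installed (on a Mac, that means the Xcode command line tools), and noting that the exact hex bytes at the end will vary by OS and CPU:

$ cat hello.c            # source code: plain, human-readable text
#include <stdio.h>
int main(void) { printf("hello\n"); return 0; }
$ cc hello.c -o hello    # compile: translate the source into machine code
$ ./hello                # run the resulting binary
hello
$ xxd hello | head -n 1  # peek inside the binary: numbers, not prose
00000000: cffa edfe 0700 0001 0300 0000 0200 0000  ................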

I’m probably making this sound more complicated than it is: we see the results when you, for instance, try to open and run a “.exe” file intended for Windows on macOS.​‡​ Windows .exes are machine code: again, try opening one up in a plain text editor like Notepad or Atom or Sublime Text and see the results.

Because executable machine code is numerical – at its very core just a very long string of ones and zeroes – executable machine code files, like Windows .exe files, are sometimes just called “binaries”.

This can get *really* confusing because technically a “binary” is just “any file not written and meant to be displayed as plain text” – .mp3s, .movs, .pdfs, .jpgs, all of them and many more are also “binary” files. But in the particular context of package management and installing applications, the term “binary” is very frequently used interchangeably with “version”, e.g. “a macOS binary”, “a Windows binary”, etc. I sort of wish I could’ve avoided this altogether but it will absolutely come up when troubleshooting or searching support forums with package management questions, so here we are.

So if you want to create software that works on different operating systems and processors, someone usually has to compile it from source code first into executable binaries that match the desired operating systems and processors. Again, you’ve seen this out in the digital world: you’re trying to download a piece of software from a website and there are two different links, one “for Windows” that downloads a .exe, and one “for macOS” that gives you a .app or .pkg or such. And because compilation is essentially an act of translation, things can correspondingly get lost in that process. We’ll see a direct example of this later in a case study.

*Extremely undergrad film student voice* It’s really more of a character study

A last aside: you may have encountered scripts and be thinking, “hey, but I can run (execute) a .py Python script or a .sh Bash script, which are source code files, without compiling!” Well, you got me! Just as I mentioned above that not all binaries are executable, not all executables are binaries. Scripting uses a process called interpreting *instead* of compiling. This usually allows scripts to be a little more portable than binaries across operating systems, but the basic idea – that there is a layer of translation necessary between the source code file and the computer – is the same.
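Here’s that difference in miniature – the same “hello” example as before, but as a Bash script, with no compiler in sight:

$ cat hello.sh                # source code for a tiny Bash script
#!/usr/bin/env bash
echo "hello from a script"
$ chmod +x hello.sh           # mark it executable – no compiling required
$ ./hello.sh                  # bash interprets the plain text as it runs
hello from a script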

Define Your “Install”

So what exactly do source code and machine code and binaries have to do with package management and installing software?

Let’s say you are a web archivist running macOS and you installed two different pieces of software at the same time: youtube-dl, which is a handy command line tool for downloading media from YouTube and other hosting sources, and Webrecorder Player, a wonderful desktop tool for viewing and inspecting web archives (WARCs), no internet connection required.

You “installed” both programs. But youtube-dl is not in your “Applications” folder next to Webrecorder Player. And no matter how many variations on $ webrecorder you type into your Terminal, Webrecorder Player does not launch on your desktop.

If that’s the case… what does “install” mean?

“Installing” an application is actually a highly contextual process. The end-goal is always the same: you want to take a program and make it usable. But how you’re supposed to use a program….depends on the program! And thus the steps actually necessary to complete an installation also depend on the program, user, and goals involved. When broken down into smaller, discrete actions – downloading source code, compiling the code, creating a binary, moving that binary to a particular folder, changing some operating system settings so that you can execute that binary – the installation process can get quite variable and customizable.
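To make those discrete actions concrete, here’s a sketch of what a “manual,” package-manager-free install of a command line tool often looks like – the tool name and URL are made up, but these are exactly the sorts of nitty-gritty steps involved:

$ curl -LO https://example.com/sometool-1.0.tar.gz   # 1. download the source tarball
$ tar -xzf sometool-1.0.tar.gz && cd sometool-1.0    # 2. unpack it
$ ./configure                                        # 3. set build options for this system
$ make                                               # 4. compile the source into a binary
$ sudo make install                                  # 5. copy the binary somewhere executable, like /usr/local/bin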

So finally, we return to package managers. As I said up top, package managers take care of many of the nitty-gritty details of installing programs so that as far as you, the user, are concerned, “install” just means “click this button” or “type one brief command.” They are meant to cover a large majority of use cases with minimal effort – but potentially to the detriment of edge cases, or of tweaking an installation to work exactly how a user needs it.

To return specifically to Homebrew, every package or program that you can install with Homebrew has a corresponding formula. You can browse all of them here. Homebrew’s formulae are all hosted on GitHub, and every single one of them is just a brief script (written in Ruby) which defines instructions to answer the question: “what does ‘install’ mean for this particular program?” When you type $ brew install mediainfo, Homebrew searches the Homebrew/homebrew-core repository on GitHub, finds the mediainfo.rb formula, and then follows whatever steps that formula tells it to do.
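You don’t have to take my word for it, either – Homebrew will happily show you a formula before you run it:

$ brew info mediainfo    # quick summary: description, homepage, version, dependencies
$ brew cat mediainfo     # print the full Ruby formula, right in the terminal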

To get a better idea of what these instructions actually look like and how Homebrew interprets and performs them, let’s look at a concrete example.

Case Study: vrecord’s Homebrew Formula

This is the whole Homebrew formula for vrecord, the open source video digitization program, developed and maintained by the Association of Moving Image Archivists’ Open Source Committee – let’s take a look and break it down by section:

class Vrecord < Formula
  desc "Capturing a video signal and turning it into a digital file"
  homepage "https://github.com/amiaopensource/vrecord"
  url "https://github.com/amiaopensource/vrecord/archive/v2020-07-01.tar.gz"
  version "2020-07-01"
  sha256 "983264ca6a69b78b4487a7479ab5a4db04cbc425f865ec2cb15844e72af4f4ac"
  head "https://github.com/amiaopensource/vrecord.git"

  depends_on "amiaopensource/amiaos/deckcontrol"
  depends_on "amiaopensource/amiaos/ffmpegdecklink"
  depends_on "amiaopensource/amiaos/gtkdialog"
  depends_on "cowsay"

  on_macos do
    depends_on "bash"
    depends_on "gnuplot"
    depends_on "mediaconch"
    depends_on "mkvtoolnix"
    depends_on "mpv"
    depends_on "qcli"
    depends_on "xmlstarlet"
  end

  on_linux do
    def caveats
      <<~EOS
        ** IMPORTANT FOR LINUX INSTALL **
        Additional install steps are necessary for a fully functioning Vrecord
        install on Linux. This includes using the standard package manager to
        install gnuplot, xmlstarlet, mkvtoolnix and mediaconch. Additionally,
        it often is necessary to remove the Homebrew installed version of SDL2
        to prevent conflicts. For more information please see:
        https://github.com/amiaopensource/vrecord/blob/master/Resources/Documentation/linux_installation.md
      EOS
    end
  end

  def install
    bin.install "vrecord"
    bin.install "vtest"
    prefix.install "Resources/audio_mode.gif"
    prefix.install "Resources/qcview.lua"
    prefix.install "Resources/vrecord_policy_ffv1.xml"
    prefix.install "Resources/vrecord_policy_uncompressed.xml"
    prefix.install "Resources/vrecord_logo.png"
    prefix.install "Resources/vrecord_logo_playback.png"
    prefix.install "Resources/vrecord_logo_audio.png"
    prefix.install "Resources/vrecord_logo_edit.png"
    prefix.install "Resources/vrecord_logo_help.png"
    prefix.install "Resources/vrecord_logo_documentation.png"
    man1.install "vrecord.1"
    man1.install "vtest.1"
  end

  test do
    system "#{bin}/vrecord", "-h"
  end
end

Metadata

class Vrecord < Formula
  desc "Capturing a video signal and turning it into a digital file"
  homepage "https://github.com/amiaopensource/vrecord"
  url "https://github.com/amiaopensource/vrecord/archive/v2020-07-01.tar.gz"
  version "2020-07-01"
  sha256 "983264ca6a69b78b4487a7479ab5a4db04cbc425f865ec2cb15844e72af4f4ac"
  head "https://github.com/amiaopensource/vrecord.git"
  • class Vrecord < Formula – This line is required to start every formula – Homebrew needs to know that the script you’ve pointed it to is indeed a formula! (The first letter of the program/formula name has to be capitalized to conform with Ruby syntax, regardless of how you normally write the name)
  • desc – A brief description of the application and its purpose. Not strictly required, but helpful – this will often match the “About” project description on the application’s GitHub/GitLab page, if the code is hosted there.
  • homepage – A project site where users can go for more information about the program. It’s mandatory to include a homepage if you want to include your formula in Homebrew’s core list of packages.
  • url – This is required – it directs Homebrew to where it should download the program’s code (in this case, and in the case of many Homebrew formulae, the source code is contained in a tarball, a format that takes all the source code files and packages them together into one, like a .zip file). If a program has multiple versions or releases, this is the field that specifies *which* version Homebrew will try to install.
  • version – This is metadata that helps Homebrew keep track of which version of a program you have installed (particularly helpful if you have multiple versions of a program/formula installed on the same computer). It’s not required to have this field, and if the source code URL above comes from GitHub, Homebrew can usually pull this version info from the file name automatically – but it doesn’t hurt to specify manually.
  • sha256 – This is the checksum for the source code tarball from the URL above. Once the tarball has downloaded to your computer, Homebrew will automatically check that it matches this checksum here – basically a security feature to make sure that what Homebrew downloads is indeed the code/program that you wanted (see the quick check after this list). This is required.
  • head – “Head” specifies a cutting-edge version of the program – if it’s specified, it means early adopters can try out the absolute newest changes and revisions to the program by its developers by running $ brew install --HEAD <formula> instead of just $ brew install <formula>. It’s basically a way to signal to users that some new options or features may be available for them to try out but they are not yet considered stable. It’s not required if the formula/program author only wants Homebrew users to install “guaranteed”, stable releases of their software.
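Incidentally, you can replicate that sha256 check yourself: download the tarball from the formula’s url and hash it (shasum ships with macOS). The output should match the formula’s sha256 line exactly:

$ curl -LO https://github.com/amiaopensource/vrecord/archive/v2020-07-01.tar.gz
$ shasum -a 256 v2020-07-01.tar.gz
983264ca6a69b78b4487a7479ab5a4db04cbc425f865ec2cb15844e72af4f4ac  v2020-07-01.tar.gz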

Dependencies

depends_on "amiaopensource/amiaos/deckcontrol"
  depends_on "amiaopensource/amiaos/ffmpegdecklink"
  depends_on "amiaopensource/amiaos/gtkdialog"
  depends_on "cowsay"

  on_macos do
    depends_on "bash"
    depends_on "gnuplot"
    depends_on "mediaconch"
    depends_on "mkvtoolnix"
    depends_on "mpv"
    depends_on "qcli"
    depends_on "xmlstarlet"
  end

  on_linux do
    def caveats
      <<~EOS
        ** IMPORTANT FOR LINUX INSTALL **
        Additional install steps are necessary for a fully functioning Vrecord
        install on Linux. This includes using the standard package manager to
        install gnuplot, xmlstarlet, mkvtoolnix and mediaconch. Additionally,
        it often is necessary to remove the Homebrew installed version of SDL2
        to prevent conflicts. For more information please see:
        https://github.com/amiaopensource/vrecord/blob/master/Resources/Documentation/linux_installation.md
      EOS
    end
  end

In this section of a formula, the program developer needs to specify any and all external dependencies – that is, any other code or programs that have to be present and installed on the user’s computer before vrecord can be installed and used correctly.

Every depends_on line specifies a dependency, and every dependency listed is…another Homebrew formula. So before Homebrew proceeds to the next section of the vrecord formula (the actual “install” section), it will go to each and every one of these formulae *first* and complete the instructions found there…(including, if those formulae specify dependencies of their own, going to *their* dependent formulae – and on and on, down the line, as necessary).

(This means that the amount of time it takes vrecord to install can vary wildly, depending on how many of those depends_on formulae are already present on your computer when you start – if you’ve already installed cowsay or mediaconch before, Homebrew will just skip over this part for those dependencies.)

In this case, the vrecord formula also specifies slightly different behavior depending on whether the Homebrew user is running macOS or Linux. Homebrew works on Linux systems, but, as I mentioned earlier, Linux systems usually have their own baked-in package managers (e.g. apt), and sometimes Homebrew and native Linux package managers don’t play nice with each other – so in this case, rather than having Homebrew run the install process for all those dependencies on_linux, the vrecord formula instead defines a caveat, which is just a warning to display to the user. This caveat, obviously, recommends using the native package manager to install certain dependencies instead of Homebrew, to avoid conflicts and errors.
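If you’re curious just how deep the rabbit hole goes before you commit, Homebrew can show you the whole dependency tree up front:

$ brew deps --tree vrecord    # every formula vrecord depends on, recursively
$ brew list                   # the formulae already installed on this machine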

Installation

  def install
    bin.install "vrecord"
    bin.install "vtest"
    prefix.install "Resources/audio_mode.gif"
    prefix.install "Resources/qcview.lua"
    prefix.install "Resources/vrecord_policy_ffv1.xml"
    prefix.install "Resources/vrecord_policy_uncompressed.xml"
    prefix.install "Resources/vrecord_logo.png"
    prefix.install "Resources/vrecord_logo_playback.png"
    prefix.install "Resources/vrecord_logo_audio.png"
    prefix.install "Resources/vrecord_logo_edit.png"
    prefix.install "Resources/vrecord_logo_help.png"
    prefix.install "Resources/vrecord_logo_documentation.png"
    man1.install "vrecord.1"
    man1.install "vtest.1"
  end

Finally, the meat of the matter – in this section of the formula, we actually get to what “install” even means in the context of vrecord.

vrecord is a relatively straightforward case because the source code doesn’t need to be compiled. The source code is itself a Bash script, which can be interpreted and run by a macOS or Linux system as-is (or, as-long-as-the-dependencies-have-been-installed). There is no translation down to the ones and zeroes of machine code required. So the installation process here isn’t a matter of compiling – it’s just a matter of moving the files to where they can be used.

prefix here is a variable – it’s a setting that’s part of your overall Homebrew configuration, so that any time any Homebrew formula mentions “prefix”, the package manager will just sub in the value it has stored there. Specifically, prefix defines a file path, an over-arching directory/folder for all of Homebrew’s files to live in.

Usually, prefix is set by default when you first install Homebrew and you never have to mess with it again (the whole point of a package manager being to mess with things as little as possible). You can see yours by running $ brew config and looking for the HOMEBREW_PREFIX line – on macOS, it’s usually something like /usr/local. So, all of the programs that Homebrew downloads, installs, compiles, whatever – they’ll go into that /usr/local directory (unless otherwise specified by a formula) – all neatly nested and organized according to the package/formula name and version.

(The nested directory for keeping code, in Homebrew-speak, is called the Cellar. So if you want to find all your Homebrew-installed programs, poke around your /usr/local/Cellar directory.)
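A quick way to poke around your own setup – the listing below is illustrative, and will show whatever formulae you happen to have installed:

$ brew config | grep HOMEBREW_PREFIX
HOMEBREW_PREFIX: /usr/local
$ ls "$(brew --prefix)"/Cellar    # one folder per installed formula
cowsay    ffmpeg    mediainfo    vrecord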

So all those lines that start with prefix.install are saying: take these files (from inside the tarball you downloaded) and put them inside the nested folder specified by the prefix variable. In vrecord’s case, we’re just taking some image files and configuration templates (the .xml files) for vrecord’s GUI mode and putting them in appropriate locations in the Cellar.

The bin.install options are doing the same thing, but flagging an extra step. These two entries (vrecord and vtest) are the actual scripts that we want to run, so they need to be made executable for you, the user, to run them. The bin.install directive 1) links the specified files to a particular directory – in this case, /usr/local/bin – where your operating system expects to find executable command-line programs, and 2) adjusts their permissions so that you can run these scripts (without needing “sudo” permissions).

(“bin” in these file paths stands for “binaries” – remember that there is a general but imprecise equation between binaries and executables; that is why we are putting the vrecord script there – because it is executable, even though it is a source code file, not actually a binary/machine code.)
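You can verify this chain on your own machine once vrecord is installed – a sketch, with the ls output trimmed to the relevant part:

$ which vrecord                   # where the shell finds the command
/usr/local/bin/vrecord
$ ls -l /usr/local/bin/vrecord    # …which is really a link into the Cellar
/usr/local/bin/vrecord -> ../Cellar/vrecord/2020-07-01/bin/vrecord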

The man1.install directives are again very similar to bin.install – these are manual pages that explain how to run the program (by typing $ man vrecord or $ man vtest). macOS expects to find these files in a certain place, just like how it expects executable binaries to be in /usr/bin or /usr/local/bin. So man1.install copies these files to that location.

Test

  test do
    system "#{bin}/vrecord", "-h"
  end
end

Homebrew formula writers can optionally put in a “test” block at the end of the script to see if the installation process proceeded correctly. It’s generally out of scope for the Homebrew project to make sure that every single feature of every single program works as expected – but it can be a handy check just to make sure at least at the end of all this you got an executable something out of it.

In the case of vrecord, this test block simply directs the user’s computer to try running the command $ vrecord -h – if the computer encounters no errors trying to run this command (which just displays vrecord’s “help” page), then Homebrew will consider this a successful installation and finish running.
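You can also re-run a formula’s test block yourself any time after installation – a nice first sanity check if a Homebrew-installed program ever starts misbehaving:

$ brew test vrecord    # runs the formula's "test do" block – here, vrecord -h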

Phew. Let’s review – in the end, what did that vrecord Homebrew formula do? It told the computer to:

  1. Go to a certain URL, download the file (tarball) it found there, and check it against a checksum.
  2. Check for any other external software that vrecord needs to work.
  3. Extract vrecord’s files out of the tarball and move them to a different folder.
  4. Test that the computer can find and execute the program now that it’s been moved.

Case Study: Troubleshooting ewfmount

…now what about a slightly more complicated example? One that involves source code compilation? Let’s start with a Tweet.

A bit of context: libewf is a collection of software that allows users to work with EWF-formatted disk images, a pretty popular option for preserving hard drives. One of the tools included in libewf is ewfmount, a program that allows you to, well, mount EWF disk images and explore the files on them, just as if they were a physical, external hard drive attached over USB or what have you.

Complicating things, ewfmount won’t work with macOS out-of-the-box. First you have to install “FUSE for macOS”, a piece of software that allows macOS to work with file systems beyond the ones that Apple cares about (basically just APFS at this point). Otherwise, trying to mount your EWF disk image is going to have the same effect as plugging in an unformatted hard drive.

Eddy, the tweet’s author, had installed these two pieces – libewf/ewfmount and FUSE for macOS – using Homebrew, but the two pieces of software still weren’t “seeing” each other. To find out why, it helped to take a look at the Homebrew formulae – if the installation wasn’t working (Eddy didn’t have a usable version of the desired program at the end of the process), then one of the steps Homebrew was taking by default had to be incorrect for his context.

Here is the current Homebrew formula for libewf – see if you can spot the problematic line:

class Libewf < Formula
  desc "Library for support of the Expert Witness Compression Format"
  homepage "https://github.com/libyal/libewf"
  # The main libewf repository is currently "experimental".
  url "https://github.com/libyal/libewf-legacy/releases/download/20140808/libewf-20140808.tar.gz"
  sha256 "dfe29b5f2f1841ff1fe11979780d710a660dbc4727af82ec391f398e6b49e5fd"
  license "LGPL-3.0"

  bottle do
    cellar :any
    sha256 "43d8ba6c2441f65080f257a7239fe468be70cb2578ec2106230edd1164e967b6" => :catalina
    sha256 "4c5482f8f1c97f9c3f3687bccd9c3628b314699bc26743e641f2ae573bf95eeb" => :mojave
    sha256 "cae6fd2f38855fd15f8a50b644d0817181fed055aef85b7793759d7703a833d4" => :high_sierra
  end

  head do
    url "https://github.com/libyal/libewf.git"
    depends_on "autoconf" => :build
    depends_on "automake" => :build
    depends_on "gettext" => :build
    depends_on "libtool" => :build
  end

  depends_on "pkg-config" => :build
  depends_on "openssl@1.1"

  uses_from_macos "bzip2"
  uses_from_macos "zlib"

  def install
    if build.head?
      system "./synclibs.sh"
      system "./autogen.sh"
    end

    args = %W[
      --disable-dependency-tracking
      --disable-silent-rules
      --prefix=#{prefix}
      --with-libfuse=no
    ]

    system "./configure", *args
    system "make", "install"
  end

  test do
    assert_match version.to_s, shell_output("#{bin}/ewfinfo -V")
  end
end

…give up? On Line 40, in the “install” section of the formula, there are several “args” listed, including --with-libfuse=no. This is our clue and our culprit!

These “args” are arguments (options) for source code compilation. So by default, the Homebrew formula defines “installation” of libewf/ewfmount to not include a software library called libfuse – which, as the name implies, is a critical component that libewf/ewfmount requires for communicating with FUSE for macOS. Without it, ewfmount cannot mount EWF disk images on macOS.

Now, you can edit *any* Homebrew formula and how it works, just for you, by running $ brew edit <package>. This will open up a local copy of the formula in a text editor and let you change and edit options, without affecting how this formula behaves for all other Homebrew users.

But, unfortunately, in this case it is still not enough to just run $ brew edit libewf and change line 40 to --with-libfuse=yes. That’s because, towards the top of this formula, you’ll notice a section that starts with “bottle”.

“Bottles” are Homebrew’s clever name for binaries. These are pre-compiled versions of the source code for libewf, ready to go for macOS (with flavors for High Sierra, Mojave, or Catalina, as indicated). If a bottle is specified in a Homebrew formula, that’s it – any instructions for how to compile the source code, later on in the formula, will be ignored, because as far as Homebrew is concerned, you already have a working binary, and it will proceed from there. So editing line 40 will make no difference, because either way Homebrew will by default use a bottle/binary that was (we can assume/infer) already created with --with-libfuse=no.

The solution, in this case, is to both edit line 40 to --with-libfuse=yes AND run $ brew install --build-from-source libewf instead of just $ brew install libewf. This extra flag/option tells Homebrew to ignore the “bottle” section, skip over those pre-compiled binaries, and start from scratch with the source code specified at url. THEN, proceeding down the formula to the “install” section, Homebrew will compile the source according to the options you set, creating a new version/binary with libfuse enabled.
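Put together, the whole fix might look something like this in practice (assuming the default, bottled libewf had already been installed):

$ brew uninstall libewf                      # clear out the bottled version first
$ brew edit libewf                           # change --with-libfuse=no to --with-libfuse=yes, save, quit
$ brew install --build-from-source libewf    # skip the bottles and compile with the new flag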

(There is a longer, more rambling and circular version of this solution that I wrote in that Twitter thread – but it is a little out-of-date, as the Homebrew formula has been edited to adjust the default source code URL since that writing. The basic point – that getting a version of ewfmount that works with FUSE for macOS using Homebrew requires the two-step process of changing a flag for use during compilation AND telling Homebrew to start from source – still stands.)

Wrapping Up

I hope that these examples and explanations help digipres users understand that, while an AMAZING tool and community, Homebrew is not magic! It plays by certain rules and relies on assumptions about what will work best for the greatest number of users. The vast majority of the time – those assumptions will probably work fine for you!

But if they don’t, it’s just like any other piece of technology, analog or digital – you’ll need to know a bit more about what it’s doing in order to effectively troubleshoot, fix, or change it. So I’ll leave off with a note that Homebrew’s documentation IS ALSO AMAZING! The (open-source and volunteer!!!) contributors have pulled together tons and tons of information about how Homebrew works, guidelines for different ways of using it, templates and automated tools for creating and testing formulae, etc. I really recommend exploring those pages to learn more about the various tricks up Homebrew’s sleeve (there’s a lot of built-in neatness you might not even know about!!) and as a diving-off point to learn more about code compilation, operating systems, interpreters, and just how software works, in general.

Am I just saying that because this is the longest blog post I’ve ever written, and there were about twenty different tangents and other topics that I didn’t even explore, and that I probably don’t have the time to write about? Maybe! But it’s time to put a cork in it.


  1. ​*​
    Please return to this line in a minute so you can appreciate just how clever I am.
  2. ​†​
    See, told you! Why aren’t you laughing?
  3. ​‡​
    macOS does some trickery that usually hides its executable machine code and files from your view, which again is a topic for another post; but broadly speaking the flip side is true as well when you try to open a macOS .app or .pkg on Windows


Everyday Linux

A couple weeks ago, there was a Forbes article that caught my eye. (No, librarians, it wasn’t that twaddle, please don’t hurt me)

No, it was this one, relating one writer/podcaster’s decision to switch to Linux as his everyday operating system after a few too many of the new “Blue Screen of Death”: Windows 10’s staggeringly inconvenient and endless Update screens.

Actually, turn it on, turn it off, who cares! You’re not working on that spreadsheet today anyway.

The article stood out because I had already been planning to write something extremely similar myself. A couple years ago, I needed a new laptop and decided I wanted out of the Apple ecosystem, partly because Apple’s desktop/laptop hardware and macOS design seemed increasingly shunted to the side in favor of iOS/mobile/tablets, but mostly because of $$$. I considered jumping ship to a Windows 10 machine, which, as I’ve said on several occasions, is actually a pretty nifty OS at its core – but, like Mr. Evangelho, I had encountered one too many productivity-destroying updates for my liking on my Windows station at work. Never mind the intrusive privacy defaults and the insane inability to permanently uninstall Candy Crush, Minecraft and other bloatware forced upon me by Microsoft.

why

I had used Linux operating systems, particularly Ubuntu (via BitCurator), before, and thought it might be time to take the leap to everyday use. After a little bit of research to make sure I would still be able to find versions of my most common/critical applications, I jumped ship and haven’t looked back. So, whereas I have written before about Linux in a professional context for digital preservation on several occasions, I want to finally make my evangelizing case for Linux as an everyday, personal operating system – for anyone.

Linux has a reputation as a “geeky” system for programmers and hardcore computer tinkerers, but it’s become incredibly accessible to anyone – or at least, certainly to anyone who’s used to having macOS or Windows in their daily lives. In fact, you’re almost certainly already using Linux even if you don’t realize it – if you have an Android smartphone, if you have a Chromebook laptop, if you have any of a thousand different smart/networked home devices (which, please throw them in the trash, but whatever), you’re using and relying on Linux.

Breaking away from the Mac/Windows dichotomy is as easy as your original choice of one or the other – the hurdle is largely just realizing there’s another option to debate.

Why Linux?

A Linux operating system is an example of free and open source software, often abbreviated FOSS. The “free” in there is meant to refer to “freedom”, not price (although FOSS tends to be free in that sense as well) – a legacy of the Free Software Foundation’s four maxims that computer users should be able to:

  • run a program
  • study a program’s source code (in essence, to understand exactly how it works)
  • redistribute exact copies of that program
  • create and distribute modified versions of that program

This is all in opposition to closed and proprietary software, which use copyright and patent licenses to run contrary to one or more of these ideas. (Note that some open source software may still carry licenses that restrict the latter two points in certain ways – always check!)

Look, I could go on my anti-capitalist screed here, but you should probably just go see a more clever and entertaining one. But what it comes down to is that, unlike the proprietary model of one company hiring employees to build and distribute/sell its own, closed software, open source software is built by collaborative networks of programmers and users, under the general philosophy that humanity tends to make better, more broadly applicable advancements when everyone stands to (at least potentially) benefit.

Laika believed in this, don’t you?

That doesn’t mean that all open source developers are noble self-sacrificing volunteers. There are entire companies – like Canonical, Mozilla, Red Hat – dedicated to creating and supporting it, and any number of name-brand tech giants – Google, Oracle, yes Microsoft and Apple even – that at least participate in certain open source projects. When I say everyone can benefit, that often includes Big Tech. So don’t get me wrong, there’s plenty of ways to participate in and advocate for FOSS in ways that don’t involve a total shift in your operating system and computing environment, if you’re perfectly content where you are now.

But for me, switching over completely to a FOSS operating system in Linux felt like a way to take back some control from increasingly intrusive devices. For many years, Apple products’ big selling point was “it just works”, and I solidly felt that way with my first couple MacBooks – buy a laptop and the operating system got out of the way, letting you browse the internet, make movies, write up Sticky Note reminders, listen to music, and install other favorite programs and games, in a matter of minutes. I could do whatever it was I wanted to do.

I don’t feel like macOS (or Windows) “just works” quite in that same way anymore – they’re designed to work the way Apple and Microsoft want me to work. Constant, barraging notifications to log in to iCloud or OneDrive accounts, to enable Siri or Cortana AI assistance. Obscured telemetry settings sending data back to the hivemind and downloading “helpful” background programs, clogging up the computer’s resources without user knowledge. Stepping way beyond security concerns to slowly but surely cordon off anything downloaded by the user, to pigeonhole them into corporately-vetted App Stores. A six-month-long hoopla over “Dark Mode.”

…go away forever?

(Look, I’m no fool – Apple’s choices were always business choices, made to ultimately improve their company’s market share, no matter which way you slice it – but I don’t think I’m alone in feeling that for some time that meant ceding at least the illusion of control to the user, or at least not nagging them every damn day into the feeling they were somehow using their own computer the “wrong” way)

Linux operating systems, because they are open and modifiable, are also extremely flexible and controllable – if that means you want to get into the nitty-gritty and install every single piece of software that makes a computer work yourself, go for it. But if that means you just want something that gets out of the way and lets you play Oregon Trail on the Internet Archive, Linux can also be that. It can be your everyday, bread-and-butter, “just works” computer, without voices constantly shouting at you about what that should look like.

What’s different?

Well, despite the whole stirring case I may have just made…there is no “Linux operating system.” Or at least, there is no one thing called “Linux” that you just go out and download and start streaming Netflix on.

Linux is a kernel. It’s the very center, core, most important piece of an operating system, but it’s not entirely functional in and of itself. You have to pile a bunch of other things on top of it: a desktop environment, a way to install and update applications, icons and windows and buttons – all the sexy, front-facing stuff that most of us actually consider when picking which operating system we want to use. So many, many, many people and companies have created their own version of that stuff, piled it on top of Linux, and released it as their own operating system. And each one of those can have a completely different look or feel to them.

All these different flavors or versions of Linux are referred to as distributions. (If you want to really fit in, call them “distros”.)

So… what distribution do you choose???

This is absolutely the most overwhelming thing about switching to Linux. There are a lot of distributions, and they all have their own advantages and disadvantages – sometimes not very obvious, because there aren’t necessarily whole marketing teams behind them to give you the quick, summarized pitch on what makes their distribution different from others.

I’ve tried out several myself and will give out some recommendations, in what I hope are user-friendly terms, in the next/last section. This huge amount of choice at this very first stage can be staggering, but consider the benefit compared to closed systems: do you ever wish macOS had a “home” button and a super-key like Windows, so you could just pull up applications and more without having to remember the keyboard shortcut for Spotlight? Do you wish the Windows dock was more responsive, or its drop-down menus were located all the way at the top of the screen so you had more space for your word document? You’re probably never going to be able to make those tweaks unless Apple or Microsoft make them for you. With Linux, you can find the distribution that either already mixes and matches things the right way for you – or lets you tweak them yourself!

Like literally entire applications dedicated to easily tweaking the system

In terms of hardware, if you’re coming from Windows/PC-land, there’s not going to be much difference at all. Like Windows, you install a Linux operating system on a third-party hardware manufacturer’s device: HP, IBM, Lenovo, etc. You can shop competitively for features to your liking – more or less storage, higher resolution screen, higher quality keyboard or trackpad, whatever it is that’s important to you and your everyday comfort.

A small handful of companies will even directly sell you laptops with Linux distributions pre-installed (System76, Dell). But for the most competitive (read: cheapest) options, you’ll have to install Linux on a PC of your choice yourself.

Maybe not the most efficient choice

Like with Windows, this does also mean you *may* occasionally have to install or reinstall drivers to make certain peripheral devices (Wi-Fi cards, external mice) play nice with your operating system. This used to be a much more common issue than it is now, and a legitimate knock against Linux systems – but these days, if you’re using a major, well-supported distribution, it’s really no worse than Windows. And if you’re sitting there, a dedicated Windows user, thinking “huh, I’ve never had to deal with that”, neither have I in two years on my Linux laptop. This is more a warning to the Mac crowd that, hey, it’s possible for problems to arise when the company making the software isn’t also making the hardware (and if you’ve ever used a cheap non-Apple Thunderbolt adapter or power charger – you probably knew that anyway!)

Finally, applications! Again, the major Linux distributions have all at this point pretty much borrowed the visual conception of the “App Store” – a program you can use to easily browse, install and launch a vast range of open source software. The vetting may not be as thorough – so bring the same healthy dose of skepticism and awareness that you do to the Google Play Store and you’ll be fine.

Sure, “contains ads”, seems legit

If you’re worried about losing out on your favorite Mac/Windows programs, you absolutely may want to do some research to make sure there are either Linux versions or at least satisfactory Linux equivalents to the software you need. But while you might not be able to get Adobe programs, for instance, on to your new operating system, there’s plenty of big-name proprietary apps that have made the leap in recent years: Spotify, Slack, even Skype. And Linux programs are usually available to at least open and convert the files you originally made with their Mac/Windows equivalents (LibreOffice can open and work with the various Microsoft Office formats, for instance, and GIMP can at least partially convert Photoshop .PSDs.)

Installing these applications is as easy as clicking “Install” in an App Store. No Apple ID hoops to go through. And the really wonderful thing is that unlike Mac or Windows, most Linux systems will track and perform application updates at the same time and in the same place as operating system updates – no more menu-searching and notifications from individual applications to make sure you’re on the latest, greatest, and most secure version of any given program. You’ll just get a general pop-up from the “Software Center” or equivalent and perform updates in one quick, fell swoop, or as nit-picky as desired. (And my Linux laptop has never unexpectedly forced a restart to update while I was doing something else)
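On Ubuntu and its derivatives, that same one-stop update behavior is available from the command line too, if that’s more your speed – a sketch using APT, the package manager mentioned way back at the top of this post:

$ sudo apt update     # refresh the list of available OS *and* application updates
$ sudo apt upgrade    # install them, all in one go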

And finally, Linux drives are formatted differently than macOS or Windows drives, typically using the “ext4” file system. This means you can encounter some of the same quirks in moving your old files to a Linux system that you’ve had in shuttling between Mac and Windows – but Linux pretty much always comes with at least read support for HFS+ (Mac) and NTFS (Windows) drives, so likewise I’ve never had issues with just transferring old files over to a new drive.

How can I try it out?

The great news is, unlike Mac or Windows, you don’t have to go to a physical store or buy a completely new laptop just to try a Linux distribution and see if it’s something you would like to use!

Just like when you install (or reinstall) the operating system on a Mac or Windows computer, to install Linux you’ll need a USB drive that is at least 8GB large to house the installation disk image – an ISO file. Unlike Mac or Windows, Linux installation images, in addition to the installation program itself, pretty much always have a “Live” mode – this lets you run a Linux session on your computer, to see how well it works on your hardware and if you like the distribution’s design and features. It’s a fantastic try-before-you-buy feature, and can even work with MacBooks, if that’s all you have (just don’t be surprised/blame Linux if there’s some hardware wonkiness, like your keyboard not responding 100% correctly).

Once you have an 8GB USB flash drive and the ISO file for the OS you want to try downloaded, you’ll need an application to “burn” the ISO to the flash drive and make it bootable. I recommend Etcher, which is multi-platform so it’ll work whether you’re starting out on Mac or Windows (and also, just for the record, if you’re trying to make bootable installers from macOS .DMGs or Windows ISOs – Etcher is a rad tool!). From there you’ll need to boot into the installer USB according to instructions that will depend on your laptop manufacturer (it usually means holding down one of the function keys at the top of your keyboard during startup, but the key combination varies depending on the hardware/maker).
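For what it’s worth, if you already have access to a Linux machine and are comfortable in a terminal, you can also write the ISO with the built-in dd command instead of a separate app. A sketch – the ISO file name here is hypothetical, /dev/sdX is a placeholder for your actual USB drive, and you should triple-check that device name with lsblk first, because dd will cheerfully overwrite whatever you point it at:

$ lsblk    # identify your USB drive (e.g. /dev/sdb) by its size
$ sudo dd if=ubuntu.iso of=/dev/sdX bs=4M status=progress conv=fsync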

So what about all those distributions? Which ones should you try? Here are some of the most popular flavors that I think would also be accessible to converts making their way over from macOS or Windows. These distributions all have wide user bases, meaning they all have either good documentation or even active support accounts that you can contact in the event of questions or problems.

Ubuntu

Thanks to its popularity as an operating system for servers and the Internet of Things, Ubuntu is probably the biggest name in Linux, and if you just want a super stable, incredibly well-supported desktop with thoughtful features, it is still my go-to recommendation for most casual users seeking an alternative to macOS and Windows (and as I said earlier, if you’ve ever encountered BitCurator, you already know what it looks/feels like). It’s what I use myself for day-to-day web browsing, streaming services, word processing, some Steam gaming, light digipres/coding and a bit of server maintenance for this very site.

When it comes to Ubuntu, you’re going to want to try out a version labeled “LTS” – that’s “Long-Term Support”, meaning OS updates are guaranteed for five years (the other versions are primarily for developers and other anxious early adopters who don’t mind a few more bugs). The latest LTS release, 18.04, just came out a couple months ago with a pretty major desktop redesign, but it’s as attractive, sleek and functional as ever, and I came back to it after flirting with some of the other distributions on this list.

Linux Mint

Linux Mint is itself a derivative of Ubuntu, so everything I just said about stability and support goes for Mint as well – the Mint developers just wait for Ubuntu to release updates and then add their own spin. The differences are thus largely visual – Linux Mint’s desktop is made to look more like Windows, so users who are migrating from that direction are more likely to be at home here. It’s been around for a while so the support community is likewise large and varied.

Zorin

Zorin is also an Ubuntu derivative that’s even more explicitly targeted at converting Windows users/Linux newbies. It’s newer than Mint but I have to say personally I think that’s led to a fresher, more attractive design. (More Windows 10 to Linux Mint’s Windows 7). In fact it’s pretty much built around flexible, easily changed desktop design. Going with a distribution based on super pretty icons and easily swapping around menus might seem silly, but honestly, there’s so many distributions with such similar core features that these *are* the kinds of decisions that make Linux users go with one over the other.

elementary OS

Whereas Linux Mint and Zorin are more or less “Ubuntu for Windows converts”, elementary OS has a visual design more targeted to bring over macOS users, with a dock, top menu bar, etc. It’s a very light, sleek OS that really only ships with the most basic apps, and the design encourages you to keep things simple (their media apps are literally just called things like “Music” and “Videos”, and the default web browser, Epiphany, has a bare minimum of features to keep from becoming a memory hog like Firefox and Chrome). But since it’s Ubuntu-based you can still easily install more familiar open source apps like VLC, etc. Also a very intriguing Linux OS to try if you’re used to something like a Chromebook or a tablet as your primary device.

Pop!_OS

System76 is known more for their attractive hardware and great customer support (I’ve had my Lemur for two years and am still in love), but they recently put out their own Ubuntu-based Linux distribution targeted at developers and creative professionals. It pretty much seems like Ubuntu but with some more minimalist icon design and some nifty expanded features for multiple workspaces, but mostly I’m plugging them here because if you wind up looking for higher-end hardware and some really helpful, responsive support, System76 is tops.

Solus

Whereas all the previous distributions I’ve mentioned have built off the stability of Ubuntu, Solus is different: it is its own, completely independent OS. It’s an example of a “rolling release” operating system. With macOS, Windows, and many Linux distributions like Ubuntu, you have to worry about your particular, fixed-point version eventually becoming obsolete and no longer receiving updates (e.g. once macOS 10.14 comes out, those users still using 10.11 will no longer receive security and software updates from Apple); Solus will just continually roll out updates, forever. Or, well, “forever,” but you get what I mean.

That means users actually get the latest software and features much faster, as you don’t have to wait for those things to get bundled into the next “release”. It’s an interesting model if you’re really ready to unmoor and explore away from the usual systems.

Aside from that, Solus’ custom “Budgie” desktop design is extremely pretty and modern, with a quick-launch menu and a custom applets bar. It feels a lot like Windows 10 in that sense.

Doing DigiPres with Windows

A couple of times at NYU, a new student would ask me what kind of laptop I would recommend for courses and professional use in video and digital preservation. Maybe their personal one had just died and they were shopping for something new. Often they’d used both Mac and Windows in the past so were generally comfortable with either.

I was and still am conflicted by this question. There are essentially three options here: 1) some flavor of MacBook, 2) a PC running Windows, or 3) a PC running some flavor of Linux. (There are Chromebooks as well, but given their focus on cloud applications and lack of a robust desktop environment, I wouldn’t particularly recommend them for archivists or students looking for flexibility in professional and personal use)

Each of these options has its drawbacks. MacBooks are and always have been prohibitively expensive for many people, and I’m generally on board with calling those “butterfly switch” keyboards on recent models a crime. Though a Linux distribution like Ubuntu or Linux Mint is my actual recommendation and personally preferred option, it remains unfamiliar to the vast majority of casual users (though Android and ChromeOS are maybe opening the window), and difficult to recommend without going full FOSS evangelist on someone who is largely just looking to make small talk (I’ll write this post another day).

So I never want to steer people away from the very real affordability benefits of PC/Windows – yet feel guilty knowing that’s angling them full tilt against the grain of Unix-like environments (Mac/Linux) that seem to dominate digital preservation tutorials and education. It makes them that student or that person in a workshop desperately trying in vain to follow along with Bash commands or muck around in /usr/local while the instructor guiltily confronts the trolley problem of spending their time saving the one or the many.

I also recognize that *nix education, while the generally preferred option for the #digipres crowd, leaves a gaping hole in many archivists’ understanding of computing environments. PC/Windows remains the norm in a large number of enterprise/institutional workplaces. Teaching ffmpeg is great, but do we leave students stranded when they step into a workplace and can’t install command line programs without Homebrew (or don’t have admin privileges to install new programs at all)?

This post is intended as a mea culpa for some of these oversights by providing an overview of some fundamental concepts of working with Windows beyond the desktop environment – with an eye towards echoing the Windows equivalents to usual digital preservation introductions (the command line, scripting, file system, etc.).

*nix vs NT

To start, let’s take a moment to consider how we got here – why are Mac/Linux systems so different from Windows, and why do they tend to dominate digital preservation conversations?

Every operating system (whether macOS, Windows, a Linux distribution, etc.) is actually a combination of various applications and a kernel. The kernel is the core part of the OS – it handles the critical task of translating all the requests you as a user make in software/applications to the hardware (CPU, RAM, drives, monitor, peripherals, etc.) that actually perform/execute the request.

“Applications” and “Kernel” together make up a complete operating system.

UNIX was a complete operating system created by Bell Labs in 1969 – but it was originally distributed with full source code, meaning others could see exactly how UNIX worked and develop their own modifications. There is a whole complex web of UNIX offshoots that emerged in the ’80s and early ’90s, but the upshot is that a lot of people took UNIX’s kernel and its design and used them as the base for a number of other kernels and operating systems – sometimes reusing UNIX code directly, sometimes (as with Linux) reimplementing its behavior from scratch – including modern-day macOS and Linux distributions. (These modifications mean that such operating systems, while derived in some way from UNIX, are not technically UNIX itself – hence you will see these systems referred to as “Unix-like” or “*nix”)

The result of this shared lineage is that Mac and Linux operating systems are similar to each other in many fundamental ways, which in turn makes cross-compatible software relatively easy to create. They share a basic file system structure and usually a default shell and scripting language (Bash) for doing command line work, for instance.

Meanwhile there was Microsoft, which, though it originally created its own (phenomenally popular!) variant of UNIX in the ’80s, switched over in the late ’80s/early ’90s to developing its own, proprietary kernel, entirely separate from UNIX. This was the NT kernel, which is at the heart of basically every modern Windows operating system (Windows XP onward, plus the business-oriented NT line before it). These fundamental differences in architecture (affecting not just clearly visible characteristics like file systems/naming and command line work, but extremely low-level methods of how software communicates with hardware) make it more difficult to make software cross-compatible between both *nix and Windows operating systems without a bunch of time and effort from programmers.

So, given the choice between these two major branches in computing, why this appearance that Mac hardware and operating systems have won out for digital preservation tasks, despite being the more expensive option for constantly cash-strapped cultural institutions and employees?

It seems to me it has very little to do with an actual affinity for Macs/Apple and everything to do with Linux and GNU and open source software. Unlike either macOS or Windows, Linux systems are completely open source – meaning any software developer can look at the source code for Linux operating systems and thus both design and make available software that works really well for them without jumping through any proprietary hoops (like the App Store or Windows Store). It just so happens that Macs, because at their core they are also Unix-like, can run a lot of the same, extremely useful software, with minimal to no effort. So a lot of the software that makes digital preservation easier was created for Unix-like environments, and we wind up recommending/teaching on Macs because given the choice between the macOS/Windows monoliths, Macs give us Bash and GNU tools and other critical pieces that make our jobs less frustrating.

Microsoft, at least in the ’90s and early aughts, didn’t make as much of an effort to appeal directly to developers (who, very very broadly speaking, value independence and the ability to do their own, unique thing to make themselves or their company stand out) – instead, they built their rampant success on enterprise-level business, selling Windows as an operating system that could be managed en masse as the desktop environment for entire offices and institutions, with less troubleshooting of individual machines/setups.

(Also key is that Microsoft kept their systems cheaper by never getting into the game of hardware manufacturing – unlike Mac operating systems, Windows is meant to run on third-party hardware made by companies like HP, Dell, IBM, etc. The competition between these manufacturers keeps prices down, unlike Apple’s all-in-one hardware + OS systems, making them the cheaper option for businesses opening up a new office to buy a lot of computers all at once. Capitalism!)

As I’ll get into soon, there have been some very intriguing reversals to this dynamic in recent years. But, in summary: the reason Macs have seemed to dominate digital preservation workshops and education is that there was software that made our jobs easier. It was never that you can’t do these things with Windows.

Windows File System

Beyond the superficial desktop design differences (dock vs. panel, icons), the first thing any user moving between macOS and Windows probably notices are the differences between Finder (Mac) and File Explorer (Windows) in navigating between files and folders.

In *nix systems, all data storage devices – hard disk drives, flash drives, external discs or floppies, or partitions on those devices – are mounted into the same “root” file system as the operating system itself. In Windows, every single storage device or partition is separated into its own “drive” (I use quotes because two partitions might exist on the same physical hard disk drive, but are identified by Windows as separate drives for the purpose of saving/working with files), identified by a letter: C:, D:, G:, etc. Usually, the Windows operating system itself, and probably most of a casual user’s files, are saved onto the default C: drive. (Why start at C? It harkens back to the days of floppy drives and disk operating systems, but that’s a story for another post.) From that point, any additional drives, discs, or partitions are assigned another letter (if for some reason you need more than 26 drives or partitions mounted, you’ll have to resort to some trickery!)
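For instance, take a hypothetical external drive named MYDRIVE: on a Mac, it gets mounted into the root file system under /Volumes, so you’d reach a file on it like so –

[cc lang=”Bash”]$ ls /Volumes/MYDRIVE/file.txt[/cc]

– while on Windows, that same drive would simply show up under its own letter, as something like E:\file.txt.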

Also, file paths in Windows use backslashes (e.g. C:\Users\Ethan\Desktop) instead of forward slashes (/Users/Ethan/Desktop).

And if you’re doing command-line work, Bash (the default macOS shell) treats names case-sensitively – to Bash, /Users/Ethan/Desktop/sample.txt and /Users/Ethan/Desktop/SAMPLE.txt are two different paths (though note that macOS’s default file system is itself case-insensitive, if case-preserving) – while Windows’ DOS/PowerShell command line interfaces (more on these in a minute) are not (so SAMPLE.txt would overwrite sample.txt).

¯\_(ツ)_/¯

Fundamentally, I don’t think these minor differences affect the difficulty either way of performing common tasks like navigating directories in the command line or scripting from Macs – it’s just different, and an important thing to keep in mind to remember where your files live.

What DOES make things more difficult is when you start trying to move files between macOS and Windows file systems. You’ve probably encountered issues when trying to connect an external hard drive formatted for one file system to a computer running another. Though there are a lot of deeper differences between file systems that cause these issues (logic, security, file size limits, etc.), you can see the problem just with the minor, user-visible differences just mentioned: if a file lives at E:\DRIVE\file.txt on a Windows-formatted external drive, how would a computer running macOS even understand where to find that file?

Modern Windows systems/drives are formatted with a file system called NTFS (for many years Microsoft used FAT and its variants such as FAT16 and FAT32; Windows still reads and writes these legacy file systems, which is why older FAT-formatted drives keep working when you plug them in). Apple, meanwhile, is in the middle of transitioning from its longtime file system HFS+ (which you may have also seen referred to in software as “Mac OS Extended”) to a new file system, APFS.

File system transitions tend to be a headache for users. NTFS and HFS+ have at least been around long enough that quite a lot of software has been made to allow moving back and forth between the two (if you are on Windows, HFS Explorer is a great free program for exploring and extracting files on Mac-formatted drives; on macOS, the NTFS-3G driver similarly allows reading and writing Windows-formatted storage). Since APFS is still pretty new, I am less aware of options for reading APFS volumes on Windows, but will happily correct that if anyone has recommendations!

** I’ve gotten a lot of questions in the past about the exFAT file system, which is theoretically cross-compatible with both Mac and Windows systems. I personally have had great success formatting external hard drives with exFAT, but I have also heard a number of horror stories from those who have tried to use it and been unable to mount the drive on one or the other OS. Anecdotally, drives originally formatted exFAT on a Windows computer seem to fare better than drives formatted exFAT on a Mac, but I don’t know enough to say why that might be the case. So: proceed, but with caution!

Executable Files

No matter what operating system you’re using, executable files are pieces of code that directly instruct the computer to perform a task (as opposed to any other kind of data file, whose information usually needs to be parsed/interpreted by executable software to become meaningful in any way – like a word processing application opening a text file, where the application is the executable and the text file is the data file). As a common user, any time you’ve double-clicked on a file to open an application or run a script, whether a GUI or a command line program, you’ve run an executable.

Where executable files live and how they’re identified to you, the user, varies a bit from one operating system to the next. A macOS user is accustomed to launching installed programs from the Applications folder, which are all actually .app files (.apps in turn are secretly “bundles” that contain executables and other data files inside; you can view a bundle like a folder if you right-click an application and select “Show Package Contents”).
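You can also peek inside a bundle from the command line – for instance, with Safari (assuming the standard install location), listing the bundle’s MacOS folder reveals the actual executable hiding within:

[cc lang=”Bash”]$ ls /Applications/Safari.app/Contents/MacOS
Safari[/cc]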

Executable files (with or without a file extension) might also be automatically identified by macOS anywhere in the file system, and labeled with a generic terminal-style “exec” icon.

And finally, because executable files are also called binaries, you might find executables in a folder labeled “bin” (for instance, Homebrew puts the executable files for programs it installs into /usr/local/bin)
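If you’ve already got Homebrew set up, you can see this for yourself. What’s listed depends entirely on what you’ve brewed, but it might look something like:

[cc lang=”Bash”]$ ls /usr/local/bin
brew  exiftool  ffmpeg  mediainfo  rsync[/cc]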

Windows-compatible executables are almost always identified with .exe file extensions. Applications that come with an installer (or, say, from the Windows Store) tend to put these in directories named for the application inside the “Program Files” folder(s) by default. But applications you download straight off of GitHub or SourceForge or anywhere else on the internet might stick them in a “bin” folder or somewhere else entirely.

(Note: script files, which are often identified by an extension according to the programming language they were written in – .py files for Python, .rb for Ruby, .sh for Bash, etc. – can usually be made executable like an .exe, but they aren’t necessarily executable by default. If you download a script, whether you’re working on Mac or Windows, you may need to check whether it arrived ready to execute or whether more steps need to be taken. Scripts often don’t arrive executable, because downloading ready-to-run executables is a big no-no from the perspective of malware protection.)
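On a Mac, for instance, granting a downloaded script permission to execute (and then running it) is a quick two lines – the script name here is just a hypothetical stand-in:

[cc lang=”Bash”]$ chmod +x example-script.sh
$ ./example-script.sh[/cc]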

Command Prompt vs. PowerShell

On macOS, we usually do command line work (which offers powerful tools for automation, cool/flexible software, etc.) using the Terminal application and Bash shell/programming language.

Modern Windows operating systems are a little more confusing because they actually offer two command line interfaces by default: Command Prompt and PowerShell.

The Command Prompt application uses DOS commands and has essentially not changed since the days of MS-DOS, Microsoft’s pre-Windows operating system that was accessed/used entirely via the command line (in fact, early Windows operating systems like Windows 3.1 and Windows 95 were to a large degree just graphical user interfaces built on top of MS-DOS).

The stagnation of Command Prompt/DOS was another reason developers started migrating over to Linux and Bash – to take advantage of more advanced/convenient features for CLI work and scripting. In 2006, Microsoft introduced PowerShell as a more contemporary, powerful command line interface and programming language aimed at developers in an attempt to win some of them back.

While I recognize that PowerShell can do a lot more and make automation/scripting way easier, I still tend to stick with Command Prompt/DOS when doing or teaching command line work in Windows. Why? Because I’m lazy.

The Command Prompt/DOS interface and commands date back to the same era as Bash’s Unix shell ancestors – so even though development continued on Bash while Command Prompt stagnated, a lot of the most basic commands and syntax remain analogous. For instance, the command for making a directory is the same in both Terminal and Command Prompt (“mkdir”), even if the specific file paths are written out differently.

So for me, while learning Command Prompt is a matter of tweaking some commands and knowledge that I already have from Bash, learning PowerShell is like learning a whole new language – making a directory in PowerShell, e.g., is done with the command shortcut “md”.* It’s sort of like saying I already know Castilian Spanish, so I can tweak that to figure out a regional dialect or accent – but learning Portuguese would be a whole other course.

Relevant? Maybe not.

But your mileage on which of those approaches is easier may vary! Perhaps it’s clearer to keep your *nix and Windows knowledge totally separate and take advantage of PowerShell scripting. But this is just to explain the Command Prompt examples I’ll use for the remainder of this post.

* ok, yes, “mkdir” will also work in PowerShell, where “md” is an alias for “mkdir”, itself a wrapper around the New-Item cmdlet – this comparison doesn’t work so well with the basic stuff, but it gets truer for more advanced functions/commands and you know it, Mx. Know-It-All

The PATH Variable

The PATH system variable is a fun bit of computing knowledge that hopefully ties together all the three previous topics we’ve just covered!

Every modern operating system has a default PATH variable set when you start it up (system variables are basically configuration settings that, as their name implies, you can update or change). PATH determines where your command line interface looks for executable files/binaries to use as commands.

Here’s an example back on macOS. Many of the binaries you know and execute as commands (ls, mkdir, echo, rm) live in /bin or /usr/bin in the file system. (A few commands, like cd, are actually “builtins” baked into the shell itself rather than separate binaries – but most live as files somewhere.) By default, the PATH variable is set as a list that contains both “/bin” and “/usr/bin” – so that when you type

[cc lang=”Bash”]$ ls /my/home/directory[/cc]

into Terminal, the computer knows to look in /bin and /usr/bin to find the executable file that contains the instructions for running the “ls” command.
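You can ask Bash exactly where it found a given command with which:

[cc lang=”Bash”]$ which ls
/bin/ls[/cc]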

Package managers, like Homebrew, usually put all the executables for the programs you install into one folder (in Homebrew’s case, /usr/local/bin) and ALSO automatically update the PATH variable so that /usr/local/bin is added to the PATH list. If Homebrew put, let’s say, ffmpeg’s executable file into the /usr/local/bin folder, but never updated the PATH variable, your Terminal wouldn’t know where to find ffmpeg (and would just return a “command not found” error if you ran any command starting with [cc lang=”Bash”]$ ffmpeg[/cc]), even though it *is* somewhere on your computer.

So let’s move over to Windows: because Windows, by default, doesn’t come with a command line package manager, if you download a command line program like ffmpeg, Command Prompt will not automatically know where to find that command – unless a) you move the ffmpeg directory, executable file included, somewhere already in your PATH list (maybe somewhere like “C:\Program Files\”), or b) you update the PATH list to include wherever you’ve put the ffmpeg directory.
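(Command Prompt does have a rough equivalent of *nix’s which for troubleshooting this: the where command, which reports whether – and where – an executable can be found via your PATH. The install location below is just a hypothetical example.)

[cc lang=”DOS” escaped=”true”] > where ffmpeg
C:\Program Files\ffmpeg\bin\ffmpeg.exe[/cc]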

I personally work with Windows rarely enough that I tend to just add a command line program’s directory to the PATH variable when needed – a change that only takes effect in a freshly opened Command Prompt window (and, in my experience, sometimes seems to want a full reboot). So I can see how, if you were adding/installing command line programs on Windows more frequently, it would be convenient to designate a single “here’s my folder of command line executables” directory, update the PATH variable once, and then just drop newly downloaded executables into that folder from then on.

*** I do NOT recommend putting your Downloads or Documents folder – or any user directory whose files you can change without admin privileges – into your PATH!!! It’s an easy trick for accidentally-downloaded malware to land in a commonly-accessed folder (one that doesn’t require raised/administrative privileges) and then run its executable files from there without you ever knowing, precisely because that folder is in your PATH.

Updating the PATH variable was a pain in the ass in Windows 7 and 8 but is much less so now on Windows 10.

If you are in the command line and want to quickly troubleshoot which file paths/directories are and aren’t in your PATH, you can use the “echo” command, which is available on both *nix and Windows systems. In *nix/Bash, variables are invoked with a dollar sign, as in $PATH, so

[cc lang=”Bash”]$ echo $PATH[/cc]

would return something like the following (the exact contents will vary by machine – this example assumes Homebrew has added /usr/local/bin), with the different file paths in the PATH list separated by colons:
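[cc lang=”Bash”]/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin[/cc]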

While on Windows, variables are identified like %THIS%, so you would run

[cc lang=”DOS” escaped=”true”] > echo %PATH%[/cc]
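which, on a stock-ish Windows 10 machine, prints something like this – note that the separator is a semicolon rather than a colon:

[cc lang=”DOS” escaped=”true”]C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\[/cc]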

Package Management

Now, as I said above, package managers like Homebrew on macOS can usually handle updating/maintaining the PATH variable so you don’t have to do it by hand. For some time, there wasn’t any equivalent command line package manager (at least that I was aware of) for Windows, so there wasn’t any choice in the matter.

That’s no longer true! Now there’s Chocolatey, a package manager that will work with either Command Prompt or PowerShell. (Note: Chocolatey *uses* PowerShell in the background to do its business, so even though you can install and use Chocolatey with Command Prompt, PowerShell still needs to be on your computer. As long as your computer/OS is newer than 2006, this shouldn’t be an issue).

Users of Homebrew will generally be right at home with Chocolatey: instead of [cc lang=”Bash”]$ brew install [package-name][/cc] you can just type [cc lang=”DOS” escaped=”true”] > choco install [package-name][/cc] and Chocolatey will do the work of downloading the program and putting the executable in your PATH, so you can get right back to command line work with your new program. It works whether you’re on Windows 7 or 10.

The drawback, at least if you’re coming over from macOS and Homebrew, is that Chocolatey is much more limited in the number/type of packages it offers. That is, Chocolatey still needs Windows-compatible software to offer, and as we went over before, some *nix software has just never been ported over to supported Windows versions (or at least, into Chocolatey packages). But many of the apps/commands you probably know and love as a media archivist – ffmpeg, youtube-dl, mediainfo, exiftool, etc. – are there!
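So kitting out a fresh Windows machine can go pretty quickly – for instance (the -y flag just auto-confirms the installation prompts; it’s worth double-checking exact package names in Chocolatey’s own repository):

[cc lang=”DOS” escaped=”true”] > choco install ffmpeg exiftool -y[/cc]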

Scripting

If you’ve been doing introductory digipres stuff, you’ve probably dipped your toes into Bash scripting – which is just the idea of writing Bash commands into a plain text file containing instructions for the Terminal. Your computer can then run the instructions in the script without you needing to type out all the commands into Terminal by hand. This is obviously most useful for automating repetitive tasks, where you might want to run the same set of commands over and over again on different directories or sets of files, without constant supervision and re-typing/remembering commands. Scripts written in Bash and intended for Mac/Linux Terminals tend to be identified with “.sh” file extensions (though not exclusively).
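For instance, here’s a minimal sketch of the kind of script I mean – the paths are hypothetical, and it assumes you’ve installed mediainfo:

[cc lang=”Bash”]#!/bin/bash
# run mediainfo on every .mov file in a directory, saving each report to its own text file
for f in /path/to/files/*.mov; do
  mediainfo "$f" > "${f%.mov}_mediainfo.txt"
done[/cc]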

Scripting is absolutely still possible with Windows, but we have to adjust for the fact that the Windows command line interface doesn’t speak Bash. Command Prompt scripts are written in DOS style, and their benefit for Windows systems (over, say, PowerShell scripts) is that they are extremely portable – it doesn’t much matter what version of Windows or PowerShell you’re using.

Command Prompt scripts are often called “batch files” and can usually be identified by two different file extensions: .bat or .cmd

(The difference between the two extensions/file formats is negligible on a modern operating system like Windows 7 or 10. Basically, BAT files were used with MS-DOS, and CMD files were introduced with Windows NT operating systems to account for the slight improvements from the MS-DOS command line to NT’s Command Prompt. But, NT/Command Prompt remained backwards-compatible with MS-DOS/BAT files, so you would only ever encounter issues if you tried to run a CMD file on MS-DOS, which, why are you doing that?)

Actually writing Windows batch scripts for Command Prompt is its own whole tutorial. Which – thankfully – I don’t have to write, because you can follow this terrific one instead!

Because Command Prompt batch scripts are still so heavily rooted in decades-old strategies (DOS) for talking to computers, they are pretty awkward to look at and understand.
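To make that concrete, here’s the little Bash sketch from above translated into a batch file – again, hypothetical paths, and it assumes mediainfo is installed and findable on your PATH:

[cc lang=”DOS” escaped=”true”]@echo off
REM run mediainfo on every .mov file in a directory, saving each report to its own text file
FOR %%f IN (C:\path\to\files\*.mov) DO (
  mediainfo "%%f" > "%%~dpnf_mediainfo.txt"
)[/cc]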

I mean what

A lot of software developers skip right to a more advanced/human-readable programming language like Python or Ruby. Scripts in these languages also have the benefit of being cross-platform, since Python, for example, can be installed on either Mac or Windows. But Windows will not understand them out of the box, so you have to install the language and set up a development environment, rather than start scripting from scratch as you can with BAT/CMD files.

Windows Registry

The Windows Registry is fascinating (to me!). Whereas in a *nix operating system, critical system-wide configuration settings are stored as text in files (usually high up under the “root” level of your computer’s file system, where you need special elevated permissions to access and change them), in Windows these settings (beyond the everyday user settings you can change in the Settings app or Control Panel) are stored in what’s called the Windows Registry.

The Windows Registry is a database where configuration settings are stored, not as text in files, but as “registry values” – pieces or strings of data that store instructions. These values are stored in “keys” (folders), which are themselves stored in “hives” (larger folders that categorize the keys within related subfolders).

Registry values can be a number of different things – text, a string of characters representing hex code, a boolean value, and more. What kind of data a given value holds, and what you can acceptably change it to, depends entirely on the specific setting it is meant to control. However, values and keys are also often super abstracted from the settings they change – meaning it can be very, very hard to tell what exactly a registry value controls just by looking at it.
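For instance, here’s a real value, pulled up with Command Prompt’s built-in reg query tool (the 0x1 is what it looked like on my machine – yours may differ). See if you can guess what setting this controls:

[cc lang=”DOS” escaped=”true”] > reg query "HKCR\CLSID\{018D5C66-4533-4307-9B53-224DE2ED1FE6}" /v System.IsPinnedToNameSpaceTree

HKEY_CLASSES_ROOT\CLSID\{018D5C66-4533-4307-9B53-224DE2ED1FE6}
    System.IsPinnedToNameSpaceTree    REG_DWORD    0x1[/cc]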

…..no, not at all!

For that reason, casual users are generally advised to stay out of the Windows Registry altogether, and those who are going to change something are advised to make a backup of their Registry before making edits, so they can restore it if anything goes wrong.

Just as you may never need to mess around in your “/etc” folder on a Mac, even as a digitally-minded archivist, you may or may not ever need to mess with the Windows Registry! I’ve only poked around in it to do things like removing that damn OneDrive tab from the side of my File Explorer windows so I stop accidentally clicking on it (which, incidentally, is exactly what that cryptic System.IsPinnedToNameSpaceTree value quoted above controls). But I thought I’d mention it, since it’s a pretty big difference in how the Windows operating and file systems work.

Or disable Microsoft’s over-extensive telemetry! Which, along with the fact that you can’t fucking uninstall Candy Crush no matter how hard you try, is one of the big things holding back an actually pretty nifty OS.

How to Cheat: *nix on Windows

OK. So ALL OF THIS has been based on getting more familiar with Windows and working with it as-is, in its quirky not-Unix-like glory. I wanted to write these things out, because to me, understanding the differences between the major operating systems helped me get a better handle on how computers work in general.

But guess what? You could skip all that. Because there are totally ways to just follow along with *nix-based, Bash-centric digital preservation education on Windows, and get everyone on the same page.

On Windows 7, there were several programs that would port a Bash-like command line environment and a number of Linux/GNU utilities over to Windows – Cygwin probably being the most popular. Crucially, this was not a way to run native Linux applications on Windows – it was a way of doing the most basic tasks of navigating and manipulating files on your Windows computer in a *nix-like way, with Bash commands (Cygwin provides a compatibility layer that translates Unix-style system calls into Windows ones behind the scenes).

But a way way more robust boon for cross-compatibility was the introduction of the Windows Subsystem for Linux in Windows 10. Microsoft, for many years the fiercest commercial opponent of open source software/systems, has in the last ten years or so increasingly been embracing the open source community (probably seeing how Linux took over the web and making a long-term bet to win back the business of software/web developers).


The WSL is a complete compatibility layer for, in essence, installing Linux operating systems like Ubuntu or Kali or openSUSE *inside* a Windows operating system. Though it still has its own limitations, this basically means that you can run a Bash terminal, install Linux software, and navigate around your computer just as you would on a *nix-only system – from inside Windows.

The major drawback from a teaching perspective: it is still a Linux operating system that Windows users will be working with (I’d personally recommend Ubuntu for digipres newbies) – meaning there may still be nagging differences from a Mac setup (command line package management using “apt” instead of Homebrew, for instance, although this too can be smoothed over with the addition of Linuxbrew). This little tutorial covers some of those, like mounting external drives into the WSL file system and how to get to your Windows files in the WSL.
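Once you’re inside a WSL Ubuntu terminal, installing command line software looks just like it would on any other Debian-based Linux system – say, to grab ffmpeg from Ubuntu’s repositories:

[cc lang=”Bash”]$ sudo apt update
$ sudo apt install ffmpeg[/cc]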

You can follow Microsoft’s instructions for installing Ubuntu on the WSL here (it may vary a bit depending on which update/version exactly of Windows 10 you’re running).

That’s it! To be absolutely clear at the end of things here, I’ve generally not worked with Windows for most digital/audiovisual preservation tasks, so I’m hardly an expert. But I hope this post can be used to get Windows users up to speed and on the same page as the Mac club when it comes to the basics of using their computers for digipres fun!