The Cloud Is Just Someone Else’s 10,000 Computers

I don’t need to be the umpteenth person to tell you, in dramatically vague terms, that cloud computing and software-as-a-service have (*Paul LaFontaine voice*) changed the very society we live in. But every now and then I am reminded that the Big Tech consolidation and movement of everything online in the last ~15 years has fundamentally obstructed and obscured how computers work and most people’s understanding of what’s happening in/on the device right in front of them. [1]

And as individuals and institutions across the board, from enterprise-level research universities to local collectives, have quickly come to rely on cloud-based file hosting, I think it has become absolutely critical for archivists – whether it’s in the context of picking one’s own storage options for a digital preservation and access plan, or sifting through acquisition, management, and organization of other folks’ collections – to understand what these services actually are and what they are providing. I’ve overheard requests for an “open source alternative” to Google Drive or Dropbox a couple of times recently, and it both encourages and troubles me a bit: encouraging because folks are questioning the ubiquity and motivation of these companies and services, but troubling because it means cloud-based companies have rather successfully obscured not only where people’s files and data live, but the mélange of software we use to interact with them. In essence, an “open-source alternative” to Google Drive does not exist; or at least, it doesn’t exist in the same way that we commonly talk about open-source alternatives to proprietary and expensive desktop software products (Adobe Premiere vs. Kdenlive, GarageBand vs. Audacity, Microsoft Office vs. LibreOffice, etc.)

Don’t SaaS Me

I will repeatedly use the word “service” in this post, and that’s not accidental. As part of a productivity suite (including the other “Google Workspace” products, like Google Docs, Google Sheets, Google Photos, etc), Google Drive and its equivalents are not just software the way Finder or Windows File Explorer are: pieces of software, running on your computer, that allow you to browse and manipulate your files. Drive is an example of what’s called Software as a Service (SaaS), an access and delivery model where, instead of downloading and installing a particular program to your computer, [2] you log in and use a software platform that runs on another computer (often, though as we’ll see not exclusively, in return for some kind of subscription fee or license). The software platform itself, and often the data you’re manipulating (Word files, spreadsheets, presentations, for example), is not actually stored and running on the computer you’re using to access the service – it’s on someone else’s, a server. To be even more specific to the current technological moment and not 20-year-old definitions of computing, it’s running in “the cloud”, which is not precisely “someone else’s computer” as the joke goes, but more, as my title suggests, “someone else’s data center(s) with potentially thousands of servers working in concert” (scale is going to be a running theme here, and it’s important).

So when we talk about using these kinds of monolithic file hosting and productivity platforms – I’ll stick with Google Drive for this post as a very representative and relevant example, but this also encompasses, for instance, iCloud, Microsoft OneDrive and Office 365, Adobe Creative Cloud, Dropbox, Box, etc. – using them actually encompasses four [3] different things:

1. Storage space (GBs and TBs of drive space on which to put data; likely when you get down to it these are hard disk drives, but we could also be talking about solid state drives or even tape depending on precisely the service/price/purpose)

2. Computing power (processors and RAM on which to run software)

3. Server software (the software that runs on the cloud computing power and that actually manages/interacts with your files; with proprietary services, you will basically never as a user actually directly see or interact with this platform, in Google Drive’s case only Google’s internal developers and sysadmins know what this looks like or how to install, run, debug it, etc.)

4. Client software (the software that users actually see and use to control/direct the server application; in Google Drive’s case, the most commonly-used client is the web interface you see when you log in via a browser to drive.google.com, meaning this web client also runs on Google’s servers)

Because the last piece there is basically the only one that is visible to folks in any meaningful way, there can be a conflation of the client with the totality of the service. But putting all these four factors together is why, to put it bluntly, there is no “open source alternative” to Google Drive: in particular there is no such thing as open-source storage space or computing power. Those are material/physical resources that you either have or you don’t – and Google has them in spades, while the typical computer/SaaS user doesn’t.

Compounding this is the capitalist trap paradox that the more you use these platforms (again, either as an individual or an organization) the more difficult it will be to extricate yourself and your stuff, because you have been invisibly relying more and more on the time, money, and knowledge it takes to manage storage space and computing power – especially communally-used storage and computing power, as is the case with a file sharing service accessed by potentially many staff, patrons, and/or community members within one organization alone. Software, no matter how openly or ethically made, cannot replace those considerations on its own.

But! This is not to despair, nor disparage the instinct that I know is leading folks to ask such questions and seek alternatives. For many, many (good!) reasons, archivists are looking to divorce themselves and their work as much as possible from Big Tech companies. I do not mean to say “this is impossible” and leave it at that, but rather elaborate on exactly what currently available options are, and a glimmer of the effort and tradeoffs involved with them.

The Disclaimer

I will, for the purposes of this post, be assuming a thing or two, namely that by “Big Tech” I mean folks are looking to avoid proprietary SaaS, file hosting, and cloud computing options from a certain cadre of outsized companies including: Google, Microsoft, Amazon, Apple, Oracle, Dropbox. In the course of doing so I will mention other options, potentially including paid services offered by smaller companies (“smaller”, in some of these cases, being an extremely relative term).

This should not be read as a full-throated endorsement, paid advertisement for, or suggestion that these companies/services are not still part of the problem. You know – *gestures wildly* – the problem. Sometimes we just need the footing to investigate options, start a conversation, and move the needle a bit, particularly in work or communal settings. That’s all I’m aiming for here. For more well-rounded critique, the larger social implications of all cloud computing, and envisioning and embodying true alternatives, may I humbly encourage you go read something else, like Logic Magazine. [4]

Switching Clients

The good news – maybe there’s no open-source Google Drive, but there are open-source Google Drive clients that you can switch to, right now, today, with absolutely no disruption, changes, or migration of your files necessary. Just as you are not actually limited to interacting with files hosted on Drive with Google’s browser-based web app – you can also install and use their official desktop client, which, as the name implies, runs on *your* desktop/laptop’s computing power rather than Google’s servers; or use their mobile app client, which does the same but on your phone – you are not limited here by Google-made clients at all.

Open-source clients take advantage of Google Drive’s public API, which means that as long as you have a Drive account and can provide the credentials to that account, any piece of client software can control/pass commands to Drive’s *server* application to perform certain tasks with your files. You are still taking advantage of Google’s resources (storage space, computing power, server-based file management platform), but you can also take advantage of features that SaaS companies like Google don’t always directly offer or intend with their own client software.

This may be particularly useful for archival or preservation-minded organizations, who often have use cases that Google doesn’t seek to serve (because our usage/business is at a scale that pales compared to general office productivity, personal file backup, education, etc). That might include more stable transfers than the upload/download options offered by a web browser, or automatically checking file fixity.
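To make "checking file fixity" a bit more concrete, here is a minimal sketch in Python of what a client does under the hood: record a checksum before a transfer, then recompute it later and compare. The function names (`sha256_of`, `check_fixity`) and the throwaway sample file are my own illustrations, not any particular client's API, and I'm assuming SHA-256 purely as an example algorithm.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to spare RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path: Path, expected: str) -> bool:
    """True if the file's current checksum matches the recorded one."""
    return sha256_of(path) == expected


# Demo on a throwaway file standing in for a copy pulled down from the cloud.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"hello archive")
    recorded = sha256_of(sample)            # checksum taken before transfer
    unchanged = check_fixity(sample, recorded)
    sample.write_bytes(b"hello archive!")   # simulate corruption or an edit
    changed = check_fixity(sample, recorded)

print("intact copy passes:", unchanged)
print("altered copy passes:", changed)
```

In practice you would store the recorded checksums in a manifest alongside the files, but the comparison step is the whole trick.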

Open-source clients also tend to be designed more generically or comprehensively to hook into multiple cloud storage platforms. So you can use them to manage files on multiple services, e.g. Google Drive and Dropbox, or even transfer between them, without getting locked into using every vendor/SaaS’ client separately. That in turn helps for cleaner and more protected workflows – over time, as platforms change their clients and service offerings (i.e. pricing, limitations on storage space/computing power, privacy or ToS) you and your org/community can step back from these vagaries a bit and make smaller tweaks to settings or configurations (or even just move over to another hosting platform, if need be) rather than completely re-learning and re-training.

(Note that open-source software certainly *also* changes design, features, or workflows, and that such cycles still need to be taken into account; but these changes are likely not made from a place of planned/forced obsolescence or pushing you into more profitable behaviors, which in my experience leads to more gradual changes and longer tails of backwards-compatibility.)

  • rclone
    A personal favorite – rclone builds on basic Unix command-line tools (especially rsync) for an experience tailor-made for using cloud services in all manner of situations. You can manually control transfer speeds (for uploading/downloading over low bandwidth without losing or cancelling transfers), easily preserve timestamps and other file system data, encrypt sensitive files, and more. And their docs are great, there are example configurations on their site for a large number of the most popular file hosting services. Rclone itself has an experimental GUI built-in now but I personally prefer Rclone Browser for day-to-day use.

  • CyberDuck
    This is a pretty old-school client that goes back to the early aughts and was originally meant for uploading/downloading data to personal or web servers over FTP/SFTP, but seems to have successfully made the transition over to working with cloud services.

  • Duplicati
    Duplicati is specifically designed for encryption and creating backups of local files on external storage/remote servers (such as cloud services), so it doesn’t seem to be a full-fledged file manager per se. But, for archivists, orgs, or individuals looking for a way to primarily just use Google Drive or similar services as a remote storage option and not do a lot of sharing or manipulation of files at rest (that is, you just want a secure backup for your second or third copy of certain files/materials), it seems like a pretty good option for managing that.
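As a rough illustration of the rclone features mentioned above (throttled transfers, checksum comparison), here is a small Python sketch that only *builds* an `rclone copy` command without running it. The helper name `rclone_copy_command`, the local path, and the remote name `gdrive:` are all hypothetical; the remote is whatever you named it when running `rclone config`.

```python
import shlex


def rclone_copy_command(source, destination, bwlimit=None, dry_run=True):
    """Build (but do not run) an `rclone copy` command as a list of arguments."""
    # --checksum: compare files by checksum and size rather than mod-time and size
    command = ["rclone", "copy", source, destination, "--checksum"]
    if bwlimit:
        command += ["--bwlimit", bwlimit]   # cap bandwidth, e.g. "1M"
    if dry_run:
        command.append("--dry-run")         # only report what would be transferred
    return command


# Hypothetical local folder and remote; substitute your own.
command = rclone_copy_command(
    "/path/to/collection", "gdrive:backups/collection", bwlimit="1M"
)
printable = shlex.join(command)
print(printable)
```

Keeping `--dry-run` on by default is a deliberate choice here: you get to read what rclone intends to do before letting it touch anything.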

Hack the Platform

Let’s say though that an open source client isn’t enough – you want to divorce from the Google Drive platform altogether and not still be invisibly relying on the Drive server software. Yes, open source file hosting platforms (server + client) exist!

The issue here is that, unlike clients which can be run on any old laptop/desktop/mobile device, server software requires – well, a server. So if you want to make the leap and use such platforms, you may need to learn something about server administration, network configuration, etc. Plus again, you need the computing power and storage space to run the software and store the data, whether that’s server hardware that you actually control/own like a desktop NAS unit, or going back to a cloud service (something like Google Cloud rather than Drive, a service that provides computing infrastructure but no particular application or software running on it). I’ll elaborate on options for the latter in a moment.

But overall I want to highlight that this is the step where we’re potentially starting to talk about time and effort to learn *new* technological skills in order to take advantage of this software, rather than just open source tools that build on or unlock skills, expertise, and workflows that you already have as a digital archivist.

  • Nextcloud
    Again I’m starting here with the platform I myself use and am most familiar with. Nextcloud’s probably the biggest name in “Google Drive” alternatives in FOSS-world (at least in the U.S.), and it definitely aims to go pound-for-pound against Drive by offering not just file storage/backup but tons of sharing options, integrations, and (as of fall 2021) an out-of-the-box office productivity suite that lets you collaboratively create and edit documents, spreadsheets and presentations in the built-in web client, just like Google Docs/Sheets/Slides. Personally I’ve used mine to sync my most important and frequently-used documents across desktops; share digitized videos with family; draft blog posts with contributors for this very site; automatically back up photos from my phone; and set up a personal music streaming service. It’s pretty great, and that’s all just for me, ignoring much of the multi-account/user options.

  • ownCloud
    Nextcloud is actually a fork of ownCloud, so they are extremely similar and many plugins/integrations, configuration settings, and other aspects of Nextcloud/ownCloud management seem completely cross-compatible. The history of the two projects is kind of murky to me as an outsider/onlooker – the split seems to have occurred over disagreements in business modeling and licensing [5], resulting in two platforms that look and behave in much the same ways but serve slightly different communities, with ownCloud skewing toward enterprise-level business and support. Anecdotally it seems like ownCloud is a bit more popular/adopted in Europe as well. I’ll link a breakdown between the two here if you are interested and really want to get into the nitty-gritty of the differences between the two.

  • Seafile
    Like ownCloud, Seafile offers both a completely free and open source “community edition” as well as a paid tier business/enterprise option. Seafile was originally created in China and seems to have been primarily adopted in Asian and European markets; in fact its current lack of presence in the U.S. may at least be partly due to its Chinese and German partner organizations squabbling over the rights to Seafile’s U.S. trademark/intellectual property. I know little about Seafile but it seemed worth mentioning.

Mo Hosting Mo Problems

All right, now as I’ve repeatedly mentioned, if you’re interested in a Google Drive platform alternative like Nextcloud/ownCloud/Seafile, you’re going to need to invest in some hardware to go with your shiny new open source software. But let’s say you don’t have the physical space, time, or uninterrupted power supply to run and configure your own actual server/rack. That’s what cloud computing is supposed to offer: access to someone else’s hardware resources.

Of course, in our efforts to get away from prominent SaaS file hosting solutions like Google Drive, this leads us right back to the same suspects: there’s a bit of a paradox, ethically speaking, in setting up your personal or organizational Nextcloud platform on Google Cloud, or Microsoft Azure, or Amazon Web Services. This is also where the whole scale and model of cloud computing runs into problems in a capitalist society in general: if you’re big enough to offer cloud services at a useful/reasonable price point, there’s a not-insignificant chance you are: 1) still contributing to climate collapse and/or 2) about to get swallowed/bought up by one of those three or four giants anyway.

So, take all of this with a grain of salt, but if I were at a minimum just looking into alternatives to getting away from Google, I would look at:

  • DigitalOcean
    Disclosure: the web site you’re reading right now is hosted on DigitalOcean (as is my personal Nextcloud server I mentioned earlier). I like the clear pricing models and above all the community around DO – their documentation and tutorials, written by both staff and community members, are a terrific place to start if you’re interested in learning about system/server administration or just need some clear instructions for setting up particular, popular platforms (like say, Nextcloud).

  • Linode
    Linode also comes up a lot as an AWS/Azure/Google Cloud alternative – I tried them once and I admit pricing and usage was all extremely similar to DO, so I have no real comparison to make here except, as I’ve mentioned, that I’ve preferred DO’s documentation (though there’s absolutely nothing preventing you from using DO’s tutorials to set up open source software on Linode’s infrastructure!)

  • Backblaze
    Both Backblaze and the next service here specialize more in cloud storage than computing; which is to say, to my knowledge they don’t offer virtual servers and computing power (processors/RAM) the way DigitalOcean and Linode do; just remote storage management and client software to allow syncing/backing up your files. That means you’re not going to, e.g. install Nextcloud on Backblaze and get a lot of productivity/sharing options, a complete “Google Drive” equivalent – but if you’re the type of person or organization that’s simply using Google Drive or Dropbox for remote storage and backup in the first place, and largely ignoring the other features of the Google Workspace SaaS, then maybe this (which can be used in combination with one of the open source clients I mentioned earlier) is the kind of “alternative” service you’re looking for.

  • Wasabi
    Again, like Backblaze, Wasabi specializes in cloud storage, and the company’s very quick rise in this space (since only 2018) has a lot to do with their dirt-cheap pricing model. Take that as you will.

  • Reclaim Hosting
    Though they are technically piggy-backing on DigitalOcean’s cloud computing infrastructure, I did want to shout out Reclaim Hosting as an option that the sort of folks likely to be reading this – educators, non-profit/cultural heritage practitioners, students – should probably be aware of. In essence Reclaim packages support/service plans on top of DigitalOcean’s cloud computing infrastructure, taking a lot of the nitty-gritty of self-hosting SaaS platforms (domain management, setting up firewalls and network security, installing the actual server software like WordPress, Drupal, Omeka) either out of your hands entirely or at least simplifying it.

    And if you don’t specifically have ties to the educator/academia community that Reclaim is targeting, know that there are similar kinds of services out there: essentially, SaaS companies running open source platforms for you but managing the cloud computing infrastructure. A good place to start is looking at the websites for one of the open source platforms I mentioned above and looking for their “partners” or “hosting providers” – third-party companies that run these open source options on their servers and offer you access, basically the same way Google does with Drive. Pricing and terms of service may vary wildly depending where the company is located and other vagaries, so you’re always going to want to pay attention to the fine print, particularly if you’re working with any sensitive digital collection or community data.

The Part Where I Ask *You* Questions

None of the sections above are meant to be comprehensive. I’ve listed some tools and services that I have at least a passing familiarity with so that if anyone’s interested, I could maybe answer particular questions or chat more about my experience with them. But above all I’m offering these as examples, and trying to ground some of the vocabulary and features of these services, so that folks have a better idea of how to look for and evaluate the right combination of software and computing services that works for your stuff and situation.

With all of these tools and SaaS options, whether you pursue alternatives or stick with a monolith like Google Drive, remember to keep one eye peeled behind the curtain. Some useful things to ask yourself in any file hosting project or evaluation:

  • Where are my files actually stored?
  • Where is the software I’m using running (and who’s running/managing/maintaining it)?
  • Who else needs to access these files, when, and for what purpose? Do I need to be able to collaboratively manage and edit files, or do I just need a backup?
  • How much storage space do I actually need?
  • Do I prefer working with files on my desktop or in a browser? Both?
  • Do I have time and capacity (including $) to learn more about computing and self-hosting software – or do I just need a quick, trusted service solution (also $, there’s really no way to get around the $)?
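For the "how much storage space do I actually need?" question, a quick first pass is simply totaling up what you already have. Here is a minimal Python sketch that walks a folder and sums file sizes; the function names and the demo files are my own placeholders, and a real survey would want to account for duplicates and growth over time.

```python
import os
import tempfile
from pathlib import Path


def total_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = Path(folder) / name
            if not path.is_symlink():
                total += path.stat().st_size
    return total


def human_readable(size):
    """Format a byte count using decimal (1000-based) units."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1000 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1000


# Demo on a throwaway folder; point total_bytes() at your own collection instead.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.bin").write_bytes(b"\0" * 1000)
    (Path(tmp) / "nested").mkdir()
    (Path(tmp) / "nested" / "b.bin").write_bytes(b"\0" * 500)
    demo_total = total_bytes(tmp)

print(human_readable(demo_total))
```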

Footnotes

1 Google Drive’s native file “formats” being another fabulous example, but that is a post for another day
2 Say, via Homebrew
3 There is a fifth factor that I could discuss here – bandwidth – which encompasses the amount and speed at which you can move data to, from, and within these platforms. But that’s really getting into the weeds, potentially brings in complications like your Internet Service Provider, and frankly IMHO isn’t going to be a high-impact decision point for individuals or small-scale orgs. If you’re looking to get your mid- to large-scale institution away from Amazon Web Services, it’s going to come into play; but I’ll leave it out of “what do we do about Google Drive”.
4 I shit you not, I had no idea that their latest issue (16) is literally called “Clouds” when I first wrote this.
5 The unfortunate and basically unanswerable kind of fight that annoyingly erupts from time to time in open source communities over who is really behaving according to “the spirit” of free or open software

Your Promotion to Package Manager Manager

Let’s talk about Homebrew. No, not the beer-making technique that left you befuddled by the appearance of approximately two thousand quirky-named, identically-tasting IPAs at the liquor store. I mean the popular package manager for macOS.

I literally had to search for “quintuple pale ale” before I stopped getting actual results.

Homebrew is a beloved tool of digital preservation educators and practitioners alike, and it’s little wonder why. While I will continue to bang the drum for Linux operating systems and the Windows Subsystem for Linux, macOS remains a popular choice for digipres workstations given the general familiarity of Macs and the ability to easily and natively run much handy Bash/command line software. And Homebrew provides macOS with a major piece that it’s “missing” out-of-the-box: a command line-based package manager.

A “package manager” or package management system allows a user to install, uninstall, upgrade and configure the software on their computer in a consistent and centralized manner. The official Mac App Store is a package manager, for instance – rather than needing to trawl the internet to find, download, and run individual installers for every single application you’d like to try, the App Store (in theory) puts everything in one place. Homebrew is frequently described as an “App Store” for CLI programs, and the comparison is pretty apt.* Setting up Homebrew is pretty much step one for any Mac-based digipres machine: once it’s in place, it’s only a matter of minutes to install ffmpeg, mediainfo and mediaconch, vrecord, exiftool, imagemagick, rsync, youtube-dl, hashdeep, ddrescue, and much more.

Homebrew is far from the only command line package manager out there – part of why it’s called the “missing” package manager for macOS is because CLI package managers are included in Linux operating systems by default. Debian-based systems like Ubuntu have the Advanced Packaging Tool (APT);​†​ Red Hat-based systems have YUM or DNF; Arch systems have Pacman, SUSE systems have zypper, etc etc etc (and Homebrew can be used on Linux as well, incidentally). Since 2017, native Windows users can even get in on the action with Chocolatey.

Homebrew isn’t even the first attempt to bring command-line package management to MacOS, following in the footsteps of Fink and MacPorts (both of which are still around, though I would say with less robust or user-friendly communities around them). And that’s not even getting into the crazy number of package or “dependency managers” that do basically the same work for developers looking to add modules in particular programming language environments: pip (Python), RubyGems (Ruby), npm and yarn (NodeJS/JavaScript), Maven (Java)…

It’s a software preservation nightmare! Hooray!

All of these projects have more or less the same goal and advantages: automatically do all the work it takes to install a piece of software and make it available to the user. In many cases that means, for example, typing $ brew install mediainfo so that the mediainfo command-line application is then available by calling $ mediainfo example.mov , where it wasn’t available before.
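One way to see that "available where it wasn't before" distinction for yourself is to ask where (or whether) a command lives on your PATH. Below is a minimal Python sketch using the standard library's `shutil.which`; the helper name `is_installed` and the list of tools are just illustrative picks from the programs mentioned above.

```python
import shutil


def is_installed(command):
    """True if the command can be found in one of the folders on PATH."""
    return shutil.which(command) is not None


# Report where each tool lives, if it has been installed at all.
for tool in ("mediainfo", "ffmpeg", "exiftool"):
    location = shutil.which(tool)
    print(f"{tool}: {location or 'not found on PATH'}")
```

Run it before and after a `brew install` and the output for that tool should change from "not found" to an actual file path.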

However, as with so many lovely, user-friendly innovations, there is an element here of ceding control for convenience. And while most of the time it is hopefully unnecessary to peek under the Homebrew hood, a couple of times in the past year I’ve seen situations come up where it helps to have some clarity about what all, exactly, Homebrew is doing when one hits enter on a “brew install”.

So today I’m going to dive a bit deeper on the ins and outs of package management. I’ll use Homebrew for several concrete examples because of its popularity and ubiquity in digipres training, so keep in mind that some of Homebrew’s (beer-obsessed) vocabulary may be unique; but the concepts described here are basically applicable across the board, no matter what package manager you’re using.

Return to the Source

First off, I would like to clarify a few basic computing concepts: namely, the different types of code and executable programs that can be installed with a package manager.

There is a key difference in programming between source code and machine code. Source code is “human-readable” – open up a source code file and you will be greeted with plain text, written and formatted according to the conventions of a particular programming language (Python, Bash, JavaScript, Ruby, HTML, etc). These sorts of files make up the majority of what you see on popular source code hosting and version control platforms like GitHub, GitLab, Bitbucket, SourceForge. They facilitate the work of programmers and developers: broadly speaking, people read and write source code.

Machine code, as the name implies…is for machines. It is a set of instructions that can be directly executed by a computer’s processor. Machine code is numerical – there are no words to read, but the CPU interprets and understands a given string of numbers, or even just ones and zeroes, as a series of actions for it to perform. Open up a machine code file in a text editor, and you will just get a string of absolute wingding gobbledygook as the editor tries (and fails) to interpret and display this numerical series as text. (Open it up in a hex editor instead, and you might get a different result, but that’s another day’s topic)

Literal machine code
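If you'd like to reproduce that gobbledygook safely, here is a small Python sketch that reads the first few bytes of a compiled program and shows them both ways: forced into text, and as hexadecimal numbers. I'm assuming the Python interpreter itself (`sys.executable`) as a conveniently available binary; any other executable would do.

```python
import sys

# The Python interpreter is itself (usually) a compiled binary, so it makes
# a handy specimen that is guaranteed to exist on the machine running this.
with open(sys.executable, "rb") as handle:
    first_bytes = handle.read(16)

# Forcing the bytes into characters, roughly what a text editor attempts.
as_text = first_bytes.decode("latin-1")

# The same bytes as two-digit hexadecimal numbers, hex-editor style.
as_hex = first_bytes.hex(" ")

print("as text:", repr(as_text))
print("as hex: ", as_hex)
```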

In order to get from source code (human-readable) to machine code (machine-readable) you usually have to compile the source code. The purpose of a compiler is to take your source code text and translate it down into ones and zeroes for your computer to actually do something with. Very broadly speaking, every operating system has its own compiler to make executable machine code for the particular OS and CPU that you are using right now. The same source code file (written, let’s say, in C) is going to produce different machine code depending on whether it is compiled to run on macOS or Windows.

I’m probably making this sound more complicated than it is: we see the results when you, for instance, try to open and run a “.exe” file intended for Windows on macOS.​‡​ Windows .exes are machine code: again, try opening one up in a plain text editor like Notepad or Atom or Sublime Text and see the results.

Because executable machine code is numerical, and at its very core just a very long string of ones and zeroes, executable machine code files, like Windows .exe files, are sometimes just called “binaries”.

This can get *really* confusing because technically a “binary” is just “any file not written and meant to be displayed as plain text” – .mp3s, .movs, .pdfs, .jpgs, all of them and many more are also “binary” files. But in the particular context of package management and installing applications, the term “binary” is very frequently used interchangeably and synonymously with “version”, e.g. “a macOS binary”, “a Windows binary”, etc. I sort of wish I could’ve avoided this altogether but it will absolutely come up when troubleshooting or searching support forums with package management questions, so here we are.

So if you want to create software that works on different operating systems and processors, someone usually has to compile it from source code first into executable binaries that match the desired operating systems and processors. Again, you’ve seen this out in the digital world: you’re trying to download a piece of software from a website and there are two different links, one “For Windows” that downloads a .exe, and one “for MacOS” that gives you a .app or .pkg or such. And because compilation is essentially an act of translation, there are correspondingly things that can get lost in that process. We’ll see a direct example of this later in a case study.

*Extremely undergrad film student voice* It’s really more of a character study
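To see how a compiled binary is tied to its operating system, you can peek at the "magic number" in its first few bytes, which identifies the executable format. The Python sketch below checks for a handful of well-known signatures; the function name `identify_executable` and the labels are mine, and the table is far from exhaustive.

```python
# First bytes ("magic numbers") of some common executable formats.
MAGIC_NUMBERS = {
    b"\x7fELF": "ELF (Linux and most Unix-likes)",
    b"MZ": "PE (Windows .exe/.dll)",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit (macOS)",
    b"\xce\xfa\xed\xfe": "Mach-O 32-bit (macOS)",
    # Caveat: Java .class files start with these same four bytes.
    b"\xca\xfe\xba\xbe": "Mach-O universal binary (macOS)",
}


def identify_executable(header: bytes) -> str:
    """Guess an executable format from the first bytes of a file."""
    for magic, label in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return label
    return "unrecognized (perhaps a script or a non-executable file)"


print(identify_executable(b"MZ\x90\x00"))         # how a Windows .exe begins
print(identify_executable(b"\x7fELF\x02\x01"))    # how a Linux binary begins
print(identify_executable(b"#!/bin/sh\n"))        # a shell script, i.e. plain text
```

To try it on a real file, read its first four bytes in binary mode and pass them in.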

A last aside: you may have encountered scripts and be thinking, “hey, but I can run (execute) a .py Python script or a .sh Bash script, which are source code files, without compiling!” Well, you got me! Just as I mentioned above that not all binaries are executable, not all executables are binaries. Scripting uses a process called interpreting *instead* of compiling. This usually allows scripts to be a little more portable than binaries across operating systems, but the basic idea – that there is a layer of translation necessary between the source code file and the computer – is the same.
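A tiny demonstration of interpreting, sketched in Python: write a one-line source file, then hand it straight to an interpreter with no compile step in between. The file name `hello.py` and its contents are made up for the example, and I'm using the currently running Python (`sys.executable`) as the interpreter.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # The "program" is nothing but plain, human-readable text.
    script = Path(tmp) / "hello.py"
    script.write_text('print("hello from source code")\n')

    # The interpreter reads that text and does the translating at run time.
    result = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        check=True,
    )

output = result.stdout.strip()
print(output)
```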

Define Your “Install”

So what all exactly do source code and machine code and binaries have to do with package management and installing software?

Let’s say you are a web archivist running macOS and you installed two different pieces of software at the same time: youtube-dl, which is a handy command line tool for downloading media from YouTube and other hosting sources, and Webrecorder Player, a wonderful desktop tool for viewing and inspecting web archives (WARCs), no internet connection required.

You “installed” both programs. But youtube-dl is not in your “Applications” folder next to Webrecorder Player. And no matter how many variations on $ webrecorder you type into your Terminal, Webrecorder Player does not launch on your desktop.

If that’s the case… what does “install” mean?

“Installing” an application is actually a highly contextual process. The end-goal is always the same: you want to take a program and make it usable. But how you’re supposed to use a program….depends on the program! And thus the steps actually necessary to complete an installation also depend on the program, user, and goals involved. When broken down into smaller, discrete actions – downloading source code, compiling the code, creating a binary, moving that binary to a particular folder, changing some operating system settings so that you can execute that binary – the installation process can get quite variable and customizable.

So finally, we return to package managers. As I said up top, package managers take care of many of the nitty-gritty details of installing programs so that as far as you, the user, are concerned, “install” just means “click this button” or “type one brief command.” They are meant to cover a large majority of use cases with minimal effort – but potentially to the detriment of edge cases or of tailoring an installation to exactly what a user needs.

To return specifically to Homebrew, every package or program that you can install with Homebrew has a corresponding formula. You can browse all of them here. Homebrew’s formulae are all hosted on GitHub, and every single one of them is just a brief script (written in Ruby) which defines instructions to answer the question: “what does ‘install’ mean for this particular program?” When you type $ brew install mediainfo, Homebrew searches in the Homebrew/homebrew-core repository on GitHub, finds the mediainfo.rb formula, and then follows whatever steps that formula tells it to do.
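If “just a brief Ruby script” sounds abstract, here is a rough sketch of the general idea. To be clear, this is not Homebrew’s actual code: ToyFormula, HelloArchive, and the example.com URLs are all made up for illustration. It only shows how a Ruby class can collect a few pieces of metadata and leave it to each subclass to define what “install” means.

```ruby
# Toy stand-in for Homebrew's Formula class (NOT the real implementation).
# Each class-level method stores a value when called with an argument
# and returns the stored value when called without one.
class ToyFormula
  class << self
    %i[desc homepage url sha256].each do |field|
      define_method(field) do |value = nil|
        @metadata ||= {}
        @metadata[field] = value unless value.nil?
        @metadata[field]
      end
    end
  end

  # Subclasses override this to say what "install" means for them.
  def install
    raise NotImplementedError, "#{self.class} does not define install"
  end
end

# Hypothetical formula for an imaginary program called "hello-archive".
class HelloArchive < ToyFormula
  desc "Imaginary command-line tool used only for this illustration"
  homepage "https://example.com/hello-archive"
  url "https://example.com/hello-archive-1.0.tar.gz"
  sha256 "0" * 64 # placeholder, not a real checksum

  def install
    "copy hello-archive into bin"
  end
end

puts HelloArchive.desc
puts HelloArchive.url
puts HelloArchive.new.install
```

The point is only that lines like desc "..." and url "..." are ordinary Ruby method calls, which is why a formula can be read top to bottom like a settings file.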

To get a better idea of what these instructions actually look like and how Homebrew interprets and performs them, let’s look at a concrete example.

Case Study: vrecord’s Homebrew Formula

This is the whole Homebrew formula for vrecord, the open source video digitization program, developed and maintained by the Association of Moving Image Archivists’ Open Source Committee – let’s take a look and break it down by section:

class Vrecord < Formula
  desc "Capturing a video signal and turning it into a digital file"
  homepage "https://github.com/amiaopensource/vrecord"
  url "https://github.com/amiaopensource/vrecord/archive/v2020-07-01.tar.gz"
  version "2020-07-01"
  sha256 "983264ca6a69b78b4487a7479ab5a4db04cbc425f865ec2cb15844e72af4f4ac"
  head "https://github.com/amiaopensource/vrecord.git"

  depends_on "amiaopensource/amiaos/deckcontrol"
  depends_on "amiaopensource/amiaos/ffmpegdecklink"
  depends_on "amiaopensource/amiaos/gtkdialog"
  depends_on "cowsay"

  on_macos do
    depends_on "bash"
    depends_on "gnuplot"
    depends_on "mediaconch"
    depends_on "mkvtoolnix"
    depends_on "mpv"
    depends_on "qcli"
    depends_on "xmlstarlet"
  end

  on_linux do
    def caveats
      <<~EOS
        ** IMPORTANT FOR LINUX INSTALL **
        Additional install steps are necessary for a fully functioning Vrecord
        install on Linux. This includes using the standard package manager to
        install gnuplot, xmlstarlet, mkvtoolnix and mediaconch. Additionally,
        it often is necessary to remove the Homebrew installed version of SDL2
        to prevent conflicts. For more information please see:
        https://github.com/amiaopensource/vrecord/blob/master/Resources/Documentation/linux_installation.md
      EOS
    end
  end

  def install
    bin.install "vrecord"
    bin.install "vtest"
    prefix.install "Resources/audio_mode.gif"
    prefix.install "Resources/qcview.lua"
    prefix.install "Resources/vrecord_policy_ffv1.xml"
    prefix.install "Resources/vrecord_policy_uncompressed.xml"
    prefix.install "Resources/vrecord_logo.png"
    prefix.install "Resources/vrecord_logo_playback.png"
    prefix.install "Resources/vrecord_logo_audio.png"
    prefix.install "Resources/vrecord_logo_edit.png"
    prefix.install "Resources/vrecord_logo_help.png"
    prefix.install "Resources/vrecord_logo_documentation.png"
    man1.install "vrecord.1"
    man1.install "vtest.1"
  end

  test do
    system "#{bin}/vrecord", "-h"
  end
end

Metadata

class Vrecord < Formula
  desc "Capturing a video signal and turning it into a digital file"
  homepage "https://github.com/amiaopensource/vrecord"
  url "https://github.com/amiaopensource/vrecord/archive/v2020-07-01.tar.gz"
  version "2020-07-01"
  sha256 "983264ca6a69b78b4487a7479ab5a4db04cbc425f865ec2cb15844e72af4f4ac"
  head "https://github.com/amiaopensource/vrecord.git"
  • class Vrecord < Formula – This line is required to start every formula – Homebrew needs to know that the script you’ve pointed it to is indeed a formula! (The first letter of the program/formula name has to be capitalized to conform with Ruby syntax, regardless of how you normally write the name)
  • desc – A brief description of the application and its purpose. Not strictly required, but helpful – this will often match the “About” project description on the application’s GitHub/GitLab page, if the code is hosted there.
  • homepage – A project site where users can go for more information about the program. It’s mandatory to include a homepage if you want to include your formula in Homebrew’s core list of packages.
  • url – This is required – it directs Homebrew to where it should download the program’s code (in this case, and in the case of many Homebrew formulae, the source code is contained in a tarball, a format that takes all the source code files and packages them together into one, like a .zip file). If a program has multiple versions or releases, this is the field that specifies *which* version Homebrew will try to install.
  • version – This is metadata that helps Homebrew keep track of which version of a program you have installed (particularly helpful if you have multiple versions of a program/formula installed on the same computer). It’s not required to have this field, and if the source code URL above comes from GitHub, Homebrew can usually pull this version info from the file name automatically – but it doesn’t hurt to specify manually.
  • sha256 – This is the checksum for the source code tarball from the URL above. Once the tarball has downloaded to your computer, Homebrew will automatically check that it matches this checksum here – basically a security feature to make sure that what Homebrew downloads is indeed the code/program that you wanted. This is required.
  • head – “Head” specifies a cutting-edge version of the program – if it’s specified, it means early adopters can try out the absolute newest changes and revisions to the program by its developers by running $ brew install --HEAD <formula> instead of just $ brew install <formula>. It’s basically a way to signal to users that some new options or features may be available for them to try out but they are not yet considered stable. It’s not required if the formula/program author only wants Homebrew users to install “guaranteed”, stable releases of their software.
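The sha256 check is easy to try for yourself. Below is a minimal sketch using Ruby’s standard Digest library. The checksum_matches? helper and the fake “tarball” contents are my own inventions for the example, and Homebrew’s real verification code will differ in its details.

```ruby
require "digest"
require "tmpdir"

# Returns true when the file's SHA-256 digest matches the expected hex string.
def checksum_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

Dir.mktmpdir do |dir|
  # Stand-in for a downloaded tarball; the contents are made up.
  download = File.join(dir, "example.tar.gz")
  File.binwrite(download, "pretend this is source code")

  # In a real formula this value is written down by the formula author.
  recorded = Digest::SHA256.hexdigest("pretend this is source code")

  puts checksum_matches?(download, recorded) # untouched file

  # Simulate a corrupted or tampered download by changing one character.
  File.binwrite(download, "pretend this is source code!")
  puts checksum_matches?(download, recorded)
end
```

Changing even a single byte of the file produces a completely different digest, which is why one short string in the formula is enough to vouch for the whole download.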

Dependencies

  depends_on "amiaopensource/amiaos/deckcontrol"
  depends_on "amiaopensource/amiaos/ffmpegdecklink"
  depends_on "amiaopensource/amiaos/gtkdialog"
  depends_on "cowsay"

  on_macos do
    depends_on "bash"
    depends_on "gnuplot"
    depends_on "mediaconch"
    depends_on "mkvtoolnix"
    depends_on "mpv"
    depends_on "qcli"
    depends_on "xmlstarlet"
  end

  on_linux do
    def caveats
      <<~EOS
        ** IMPORTANT FOR LINUX INSTALL **
        Additional install steps are necessary for a fully functioning Vrecord
        install on Linux. This includes using the standard package manager to
        install gnuplot, xmlstarlet, mkvtoolnix and mediaconch. Additionally,
        it often is necessary to remove the Homebrew installed version of SDL2
        to prevent conflicts. For more information please see:
        https://github.com/amiaopensource/vrecord/blob/master/Resources/Documentation/linux_installation.md
      EOS
    end
  end

In this section of a formula, the program developer needs to specify any and all external dependencies – that is, if there is other code, or other programs, that have to be present and installed on the user’s computer before vrecord can be installed and used correctly.

Every depends_on line specifies a dependency, and every dependency listed is…another Homebrew formula. So before Homebrew proceeds to the next section of the vrecord formula (the actual “install” section), it will go to each and every one of these formulae *first* and complete the instructions found there…(including, if those formulae specify dependencies, going to their dependent formulae – and on and on, down the line, if necessary).

(This means that the amount of time it takes vrecord to install can vary wildly, depending on how many of those depends_on formulae are already present on your computer when you start – if you already have installed cowsay or mediaconch before, Homebrew will just skip over this part for those dependencies.)
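That “down the line” behavior can be sketched in a few lines of Ruby. Big caveat: the dependency table below is invented. The top-level names come from the vrecord formula, but the second-level x264 entries are placeholders, and install_order is a hypothetical helper rather than anything in Homebrew. It also assumes there are no circular dependencies.

```ruby
# Hypothetical dependency table: formula name => names it depends on.
# These relationships are invented for illustration only.
TOY_DEPENDENCIES = {
  "vrecord"        => ["ffmpegdecklink", "cowsay", "mpv"],
  "ffmpegdecklink" => ["x264"],
  "mpv"            => ["x264"],
  "cowsay"         => [],
  "x264"           => []
}.freeze

# Returns the order in which formulae would be installed: dependencies first,
# skipping anything already installed or already queued.
def install_order(name, installed = [], order = [])
  return order if installed.include?(name) || order.include?(name)

  TOY_DEPENDENCIES.fetch(name, []).each do |dep|
    install_order(dep, installed, order)
  end
  order << name
end

p install_order("vrecord")
p install_order("vrecord", ["cowsay", "x264"])
```

The second call has less to do because two of the dependencies are treated as already installed, which is the same reason a repeat install tends to go faster than a first one.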

In this case, the vrecord formula also specifies slightly different behavior depending on whether the Homebrew user is running macOS or Linux. Homebrew works on Linux systems, but, as I mentioned earlier, Linux systems usually have their own baked-in package managers (e.g. apt), and sometimes Homebrew and native Linux package managers don’t play nice with each other – so in this case, rather than having Homebrew run the install process for all those dependencies on_linux, the vrecord formula instead defines a caveat, which is just a warning to display to the user. This caveat, obviously, recommends using the native package manager to install certain dependencies instead of Homebrew, to avoid conflicts and errors.

Installation

  def install
    bin.install "vrecord"
    bin.install "vtest"
    prefix.install "Resources/audio_mode.gif"
    prefix.install "Resources/qcview.lua"
    prefix.install "Resources/vrecord_policy_ffv1.xml"
    prefix.install "Resources/vrecord_policy_uncompressed.xml"
    prefix.install "Resources/vrecord_logo.png"
    prefix.install "Resources/vrecord_logo_playback.png"
    prefix.install "Resources/vrecord_logo_audio.png"
    prefix.install "Resources/vrecord_logo_edit.png"
    prefix.install "Resources/vrecord_logo_help.png"
    prefix.install "Resources/vrecord_logo_documentation.png"
    man1.install "vrecord.1"
    man1.install "vtest.1"
  end

Finally, the meat of the matter – in this section of the formula, we actually get to what “install” even means in the context of vrecord.

vrecord is a relatively straightforward case because the source code doesn’t need to be compiled. The source code is itself a Bash script, which can be interpreted and run by a macOS or Linux system as-is (or, as-long-as-the-dependencies-have-been-installed). There is no translation down to the ones and zeroes of machine code required. So the installation process here isn’t a matter of compiling, it’s just a matter of moving the files to where they can be used.

prefix here is a variable – Homebrew works out its value for you, so that any time a Homebrew formula mentions “prefix”, the package manager will just sub in the right file path. Specifically, inside a formula, prefix points to that formula’s own folder (named for the package and its version), nested inside the over-arching directory where all of Homebrew’s files live.

That over-arching directory is set by default when you first install Homebrew and you never have to mess with it again (the whole point of a package manager being to mess with things as little as possible). You can see yours by running $ brew config and looking for the HOMEBREW_PREFIX line – on macOS, it’s usually something like /usr/local (or /opt/homebrew on Apple Silicon Macs). So, all of the programs that Homebrew downloads, installs, compiles, whatever – they’ll go into that directory (unless otherwise specified by a formula) – all neatly nested and organized according to the package/formula name and version.

(The nested directory for keeping code, in Homebrew-speak, is called the Cellar. So if you want to find all your Homebrew-installed programs, poke around your /usr/local/Cellar directory.)

So all those lines that start with prefix.install are saying: take this file (from inside the tarball you downloaded) and put it inside the nested folder specified by the prefix variable. In vrecord’s case, we’re just taking some images and configuration templates (the .xml files) for vrecord’s GUI mode and putting them in appropriate locations in the Cellar.

The bin.install options are doing the same thing, but flagging an extra step. These two entries (vrecord and vtest) are the actual scripts that we want to run, so they need to be made executable for you, the user, to run them. The bin.install directive 1) links the specified files to a particular directory – in this case, /usr/local/bin – where your operating system expects to find executable command-line programs, and 2) adjusts their permissions so that you can run these scripts (without needing “sudo” permission).

(“bin” in these file paths stands for “binaries” – remember that there is a general but imprecise equation between binaries and executables, so that is why we are putting the vrecord script there – because it is executable, even though it is a source code file, not actually a binary/machine code)

The man1.install directives are again very similar to bin.install – these are manual pages that explain how to run the program (by typing $ man vrecord or $ man vtest). macOS expects to find these files in a certain place, just like how it expects executable binaries to be in /usr/bin or /usr/local/bin. So man1.install copies these files to that location.
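To make the “it’s just moving files” point concrete, here is a small sketch using Ruby’s standard FileUtils library, run entirely inside a throwaway temporary folder. toy_bin_install is a hypothetical helper of mine, not Homebrew’s bin.install, and the folder names only imitate the Cellar layout. It assumes a macOS or Linux system where symbolic links are available.

```ruby
require "fileutils"
require "tmpdir"

# Copies a script into keg/bin with executable permissions, then links it
# into shared_bin. Returns the path of the link.
def toy_bin_install(source, keg, shared_bin)
  keg_bin = File.join(keg, "bin")
  FileUtils.mkdir_p([keg_bin, shared_bin])
  FileUtils.install(source, keg_bin, mode: 0o755) # copy + set permissions
  installed = File.join(keg_bin, File.basename(source))
  link = File.join(shared_bin, File.basename(source))
  FileUtils.ln_sf(installed, link)
  link
end

Dir.mktmpdir do |root|
  # Pretend this is a script from the unpacked tarball.
  script = File.join(root, "hello")
  File.write(script, "#!/bin/sh\necho hello\n")

  keg = File.join(root, "Cellar", "hello", "1.0") # stands in for prefix
  link = toy_bin_install(script, keg, File.join(root, "bin"))

  puts File.symlink?(link)
  puts File.executable?(link)
end
```

Nothing gets compiled anywhere in that sketch: a copy, a permissions change, and a link are enough to turn a downloaded script into something you can run by name.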

Test

  test do
    system "#{bin}/vrecord", "-h"
  end
end

Homebrew formula writers can optionally put in a “test” block at the end of the script to see if the installation process proceeded correctly. It’s generally out-of-scope of the Homebrew project to make sure that every single feature of every single program works as expected – but it can be a handy check just to make sure at least at the end of all this you got an executable something out of it.

In the case of vrecord, this test block simply directs the user’s computer to try running the command $ vrecord -h – if the computer encounters no errors trying to run this command (which just displays vrecord’s “help” page), then Homebrew will consider the test passed. (The test block runs when you ask for it with $ brew test vrecord, rather than as part of every $ brew install.)
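What does “encounters no errors” mean to a computer? Usually it means the program exited with status 0. Here is a minimal sketch using plain Ruby’s built-in system method. Inside a formula, Homebrew supplies its own version of system, so treat this as an analogy. The command_succeeds? helper is hypothetical, and the example commands just ask Ruby itself to exit with a chosen status.

```ruby
require "rbconfig"

# Runs a command quietly and reports whether it exited with status 0,
# which is the conventional signal for "no errors".
def command_succeeds?(*command)
  system(*command, out: File::NULL, err: File::NULL) == true
end

ruby = RbConfig.ruby # path to the Ruby interpreter running this script

puts command_succeeds?(ruby, "-e", "exit 0")
puts command_succeeds?(ruby, "-e", "exit 3")
puts command_succeeds?("definitely-not-a-real-command-xyz")
```

A check like this says nothing about whether every feature works. It only confirms that the program can be found and runs to completion, which is all the vrecord test block is asking for.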

Phew. Let’s review – in the end, what did that vrecord Homebrew formula do? It told the computer to:

  1. Go to a certain URL and download the file (tarball) it found there.
  2. Check for any other external software that vrecord needs to work.
  3. Extract vrecord’s files out of the tarball and move them to a different folder.
  4. Test that the computer can find and execute the program now that it’s been moved.

Case Study: Troubleshooting ewfmount

…now what about a slightly more complicated example? One that involves source code compilation? Let’s start with a Tweet.

A bit of context: libewf is a collection of software that allows users to work with EWF-formatted disk images, which is a pretty popular option for preserving hard drives. One of the tools included in libewf is ewfmount, a program that allows you to, well, mount EWF disk images and explore the files on them, just as if they were a physical, external hard drive attached over USB or what have you.

Complicating things, ewfmount won’t work with macOS out-of-the-box. First you have to install “FUSE for macOS”, a piece of software that allows macOS to work with file systems beyond the ones that Apple cares about (basically just APFS at this point). Otherwise, trying to mount your EWF disk image is going to have the same effect as plugging in an unformatted hard drive.

Eddy had installed these two pieces – libewf/ewfmount and FUSE for macOS – using Homebrew, but the two pieces of software still weren’t “seeing” each other. To find out why, it helped to take a look at the Homebrew formulae – if the installation wasn’t working (Eddy didn’t have a usable version of the desired program at the end of the process), then one of the steps Homebrew was taking by default had to be incorrect for his context.

Here is the current Homebrew formula for libewf – see if you can spot the problematic line:

class Libewf < Formula
  desc "Library for support of the Expert Witness Compression Format"
  homepage "https://github.com/libyal/libewf"
  # The main libewf repository is currently "experimental".
  url "https://github.com/libyal/libewf-legacy/releases/download/20140808/libewf-20140808.tar.gz"
  sha256 "dfe29b5f2f1841ff1fe11979780d710a660dbc4727af82ec391f398e6b49e5fd"
  license "LGPL-3.0"

  bottle do
    cellar :any
    sha256 "43d8ba6c2441f65080f257a7239fe468be70cb2578ec2106230edd1164e967b6" => :catalina
    sha256 "4c5482f8f1c97f9c3f3687bccd9c3628b314699bc26743e641f2ae573bf95eeb" => :mojave
    sha256 "cae6fd2f38855fd15f8a50b644d0817181fed055aef85b7793759d7703a833d4" => :high_sierra
  end

  head do
    url "https://github.com/libyal/libewf.git"
    depends_on "autoconf" => :build
    depends_on "automake" => :build
    depends_on "gettext" => :build
    depends_on "libtool" => :build
  end

  depends_on "pkg-config" => :build
  depends_on "openssl@1.1"

  uses_from_macos "bzip2"
  uses_from_macos "zlib"

  def install
    if build.head?
      system "./synclibs.sh"
      system "./autogen.sh"
    end

    args = %W[
      --disable-dependency-tracking
      --disable-silent-rules
      --prefix=#{prefix}
      --with-libfuse=no
    ]

    system "./configure", *args
    system "make", "install"
  end

  test do
    assert_match version.to_s, shell_output("#{bin}/ewfinfo -V")
  end
end

…give up? On Line 40, in the “install” section of the formula, there are several “args” listed, including --with-libfuse=no. This is our clue and our culprit!

These “args” are arguments (options) for source code compilation. So by default, the Homebrew formula defines “installation” of libewf/ewfmount to not include a software library called libfuse – which, as the name implies, is a critical component that libewf/ewfmount requires for communicating with FUSE for macOS. Without it, ewfmount cannot mount EWF disk images on macOS.
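If the %W[...] and *args syntax in the formula looks cryptic, this sketch shows what they do in plain Ruby. The configure_args method and its with_fuse switch are my own illustration (the real formula hard-codes the flag), and the paths are examples only.

```ruby
# Builds the argument list that would be handed to ./configure.
# install_prefix and with_fuse are parameters invented for this sketch.
def configure_args(install_prefix, with_fuse: false)
  fuse = with_fuse ? "yes" : "no"
  # %W[...] makes an array of strings, one per word, with #{...} filled in.
  %W[
    --disable-dependency-tracking
    --disable-silent-rules
    --prefix=#{install_prefix}
    --with-libfuse=#{fuse}
  ]
end

p configure_args("/usr/local/Cellar/libewf/20140808")

# The splat (*) spreads the array into separate arguments, so the formula's
# `system "./configure", *args` amounts to a command line like this one:
puts ["./configure", *configure_args("/tmp/example", with_fuse: true)].join(" ")
```

Seen this way, flipping one word in that list is the whole difference between a libewf that is built to talk to FUSE and one that is not.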

Now, you can edit *any* Homebrew formula and how it works, just for you, by running $ brew edit <package>. This will open up a local copy of the formula in a text editor and let you change and edit options, without affecting how this formula behaves for all other Homebrew users.

But, unfortunately, in this case it is still not enough to just run $ brew edit libewf and change line 40 to --with-libfuse=yes. That’s because, towards the top of this formula, you’ll notice a section that starts with “bottle”.

“Bottles” are Homebrew’s clever name for binaries. These are pre-compiled versions of the source code for libewf, ready to go for macOS (with flavors for High Sierra, Mojave, or Catalina, as indicated). If a bottle is specified in a Homebrew formula, that’s it – any instructions for how to compile the source code, later on in the formula, will be ignored, because as far as Homebrew is concerned, you already have a working binary, and it will proceed from there. So editing line 40 will make no difference, because either way Homebrew will by default use a bottle/binary that was (we can assume/infer) already created with --with-libfuse=no.

The solution, in this case, is to both edit line 40 to --with-libfuse=yes AND run $ brew install --build-from-source libewf instead of just $ brew install libewf. This extra flag/option tells Homebrew to ignore the “bottle” section, skip over those pre-compiled binaries, and start from scratch with the source code specified at url. THEN, proceeding down the formula to the “install” section, Homebrew will compile the source according to the options you set, creating a new version/binary with libfuse enabled.
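Boiled down, the choice Homebrew is making here looks something like the sketch below. This is a toy model of the behavior described above, not Homebrew’s real logic: install_method and LIBEWF_BOTTLES are hypothetical names, with the operating system list copied from the formula’s bottle section.

```ruby
# Toy model of the bottle-or-source choice; not Homebrew's real logic.
# bottles: list of OS names that have a pre-compiled binary available.
def install_method(bottles, os, build_from_source: false)
  return :compile_from_source if build_from_source

  bottles.include?(os) ? :pour_bottle : :compile_from_source
end

LIBEWF_BOTTLES = %i[catalina mojave high_sierra].freeze

p install_method(LIBEWF_BOTTLES, :catalina)
p install_method(LIBEWF_BOTTLES, :catalina, build_from_source: true)
p install_method(LIBEWF_BOTTLES, :sierra)
```

Only the compile-from-source path ever reads the args in the “install” section, which is why the edit to line 40 and the extra flag have to travel together.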

(There is a longer, more rambling and circular version of this solution that I wrote in that Twitter thread – but, it is a little out-of-date, as the Homebrew formula has been edited to adjust the default source code URL since that writing. The basic point – that getting a version of ewfmount that works with FUSE for MacOS using Homebrew requires the two-step process of changing a flag for use during compilation AND telling Homebrew to start from source – still stands)

Wrapping Up

I hope that these examples and explanations help digipres users understand that, while an AMAZING tool and community, Homebrew is not magic! It plays by certain rules and relies on assumptions of what will work best for the greatest number of users. The vast majority of the time – those assumptions will probably work fine for you!

But if they don’t, it’s just like any other piece of technology, analog or digital – you’ll need to know a bit more about what it’s doing in order to effectively troubleshoot, fix, or change it. So I’ll leave off with a note that Homebrew’s documentation IS ALSO AMAZING! The (open-source and volunteer!!!) contributors have pulled together tons and tons of information about how Homebrew works, guidelines for different ways of using it, templates and automated tools for creating and testing formulae, etc.  I really recommend exploring those pages to learn more about the various tricks up Homebrew’s sleeve (there’s a lot of built-in neatness you might not even know about!!) and as a diving-off point to learn more about code compilation, operating systems, interpreters, and just how software works, in general.

Am I just saying that because this is the longest blog post I’ve ever written, and there were about twenty different tangents and other topics that I didn’t even explore, and that I probably don’t have the time to write about? Maybe! But it’s time to put a cork in it.


  1. * Please return to this line in a minute so you can appreciate just how clever I am.
  2. † See, told you! Why aren’t you laughing?
  3. ‡ macOS does some trickery that usually hides its executable machine code and files from your view, which again is a topic for another post; but broadly speaking the flip side is true as well when you try to open a macOS .app or .pkg on Windows.

 

Everyday Linux

A couple weeks ago, there was a Forbes article that caught my eye. (No, librarians, it wasn’t that twaddle, please don’t hurt me)

No, it was this one, relating one writer/podcaster’s decision to switch to Linux as his everyday operating system after a few too many of the new “Blue Screen of Death”: Windows 10’s staggeringly inconvenient and endless Update screens.

Actually, turn it on, turn it off, who cares! You’re not working on that spreadsheet today anyway.

The article stood out because I had already been planning to write something extremely similar myself. A couple years ago, I needed a new laptop and decided I wanted out of the Apple ecosystem, partly because Apple’s desktop/laptop hardware and macOS design seemed increasingly shunted to the side in favor of iOS/mobile/tablets, but mostly because of $$$. I considered jumping ship to a Windows 10 machine, which, as I’ve said on several occasions, is actually a pretty nifty OS at its core – but, like Mr. Evangelho, I had encountered one too many productivity-destroying updates for my liking on my Windows station at work. Never mind the intrusive privacy defaults and the insane inability to permanently uninstall Candy Crush, Minecraft and other bloatware forced upon me by Microsoft.

why

I had used Linux operating systems, particularly Ubuntu (via BitCurator) before, and thought it might be time to take the leap to everyday use. After a little bit of research to make sure I would still be able to find versions of my most common/critical applications, I jumped ship and haven’t looked back. So, whereas I have written before about Linux in a professional context for digital preservation on several occasions, I want to finally make my evangelizing case of Linux as an everyday, personal operating system – for anyone.

Linux has a reputation as a “geeky” system for programmers and hardcore computer tinkerers, but it’s become incredibly accessible to anyone – or at least, certainly to anyone who’s used to having macOS or Windows in their daily lives. In fact, you’re almost certainly already using Linux even if you don’t realize it – if you have an Android smartphone, if you have a Chromebook laptop, if you have any of a thousand different smart/networked home devices (which, please throw them in the trash, but whatever), you’re using and relying on Linux.

Breaking away from the Mac/Windows dichotomy is as easy as your original choice of one or the other – the hurdle is largely just realizing there’s another option to debate.

Why Linux?

A Linux operating system is an example of free and open source software, often abbreviated FOSS. The “free” in there is meant to refer to “freedom”, not price (although FOSS tends to be free in that sense as well) – a legacy of the Free Software Foundation‘s four maxims that computer users should be able to:

  • run a program
  • study a program’s source code (in essence, to understand exactly how it works)
  • redistribute exact copies of that program
  • create and distribute modified versions of that program

This is all in opposition to closed and proprietary software, which uses copyright and patent licenses to run contrary to one or more of these ideas. (Note that some open source software may still carry licenses that restrict the latter two points in certain ways – always check!)

Look, I could go on my anti-capitalist screed here, but you should probably just go see a more clever and entertaining one. But what it comes down to is that, unlike the proprietary model of one company hiring employees to build and distribute/sell its own, closed software, open source software is built by collaborative networks of programmers and users, under the general philosophy that humanity tends to make better, more broadly applicable advancements when everyone stands to (at least potentially) benefit.

Laika believed in this, don’t you?

That doesn’t mean that all open source developers are noble self-sacrificing volunteers. There are entire companies – like Canonical, Mozilla, Red Hat – dedicated to creating and supporting it, and any number of name-brand tech giants – Google, Oracle, yes Microsoft and Apple even – that at least participate in certain open source projects. When I say everyone can benefit, that often includes Big Tech. So don’t get me wrong, there’s plenty of ways to participate in and advocate for FOSS in ways that don’t involve a total shift in your operating system and computing environment, if you’re perfectly content where you are now.

But for me, switching over completely to a FOSS operating system in Linux felt like a way to take back some control from increasingly intrusive devices. For many years, Apple products’ big selling point was “it just works”, and I solidly felt that way with my first couple MacBooks – buy a laptop and the operating system got out of the way, letting you browse the internet, make movies, write up Sticky Note reminders, listen to music, and install other favorite programs and games, in a matter of minutes. I could do whatever it was I wanted to do.

I don’t feel like macOS (or Windows) “just works” quite in that same way anymore – they’re designed to work the way Apple and Microsoft want me to work. Constant, barraging notifications to log in to iCloud or OneDrive accounts, to enable Siri or Cortana AI assistance. Obscured telemetry settings sending data back to the hivemind and downloading “helpful” background programs, clogging up the computer’s resources without user knowledge. Stepping way beyond security concerns to slowly but surely cordon off anything downloaded by the user, to pigeonhole them into corporately-vetted App Stores. A six-month long hooplah over “Dark Mode.”

…go away forever?

(Look, I’m no fool – Apple’s choices were always business choices, made to ultimately improve their company’s market share, no matter which way you slice it – but I don’t think I’m alone in feeling that for some time that meant ceding at least the illusion of control to the user, or at least not nagging them every damn day into the feeling they were somehow using their own computer the “wrong” way)

Linux operating systems, because they are open and modifiable, are also extremely flexible and controllable – if that means you want to get into the nitty-gritty and install every single piece of software that makes a computer work yourself, go for it. But if that means you just want something that gets out of the way and lets you play Oregon Trail on the Internet Archive, Linux can also be that. It can be your everyday, bread-and-butter, “just works” computer, without voices constantly shouting at you about what that should look like.

What’s different?

Well, despite the whole stirring case I may have just made…there is no “Linux operating system.” Or at least, there is no one thing called “Linux” that you just go out and download and start streaming Netflix on.

Linux is a kernel. It’s the very center, core, most important piece of an operating system, but it’s not entirely functional in and of itself. You have to pile a bunch of other things on top of it: a desktop environment, a way to install and update applications, icons and windows and buttons – all the sexy, front-facing stuff that most of us actually consider when picking which operating system we want to use. So many, many, many people and companies have created their own version of that stuff, piled it on top of Linux, and released it as their own operating system. And each one of those can have a completely different look or feel to them.

All these different flavors or versions of Linux are referred to as distributions. (If you want to really fit in, call them “distros“)

So….what distribution do you choose???

This is absolutely the most overwhelming thing about switching to Linux. There are a lot of distributions, and they all have their own advantages and disadvantages – sometimes not very obvious, because there aren’t necessarily whole marketing teams behind them to give you the quick, summarized pitch on what makes their distribution different from others.

I’ve tried out several myself and will give out some recommendations, in what I hope are user-friendly terms, in the next/last section. This huge amount of choice at this very first stage can be staggering, but consider the benefit compared to closed systems: do you ever wish macOS had a “home” button and a super-key like Windows, so you could just pull up applications and more without having to remember the keyboard shortcut for Spotlight? Do you wish the Windows dock was more responsive, or its drop-down menus were located all the way at the top of the screen so you had more space for your Word document? You’re probably never going to be able to make those tweaks unless Apple or Microsoft make them for you. With Linux, you can find the distribution that either already mixes and matches things the right way for you – or lets you tweak them yourself!

Like literally entire applications dedicated to easily tweaking the system

In terms of hardware, if you’re coming from Windows/PC-land, there’s not going to be much difference at all. As with Windows, you install a Linux operating system on a third-party hardware manufacturer’s device: HP, IBM, Lenovo, etc. You can competitively price features to your liking – more or less storage, higher resolution screen, higher quality keyboard or trackpad, whatever it is that’s important to you and your everyday comfort.

A small handful of companies will even directly sell you laptops with Linux distributions pre-installed (System76, Dell). But for the most competitive (read: cheapest) options, you’ll have to install Linux on a PC of your choice yourself.

Maybe not the most efficient choice

As with Windows, this does also mean you *may* occasionally have to install or reinstall drivers to make certain peripheral devices (Wi-Fi cards, external mice) play nice with your operating system. This used to be a much more common issue than it is now, and a legitimate knock against Linux systems – but these days, if you’re using a major, well-supported distribution, it’s really no worse than Windows. And if you’re sitting there, a dedicated Windows user, thinking “huh, I’ve never had to deal with that”, neither have I in two years on my Linux laptop. This is more a warning to the Mac crowd that, hey, it’s possible for problems to arise when the company making the software isn’t also making the hardware (and if you’ve ever used a cheap non-Apple Thunderbolt adapter or power charger, you probably knew that anyway!)

Finally, applications! Again, the major Linux distributions have all at this point pretty much borrowed the visual conception of the “App Store” – a program you can use to easily browse, install and launch a vast range of open source software. The vetting may not be as thorough – so bring the same healthy dose of skepticism and awareness that you do to the Google Play Store and you’ll be fine.

Sure, “contains ads”, seems legit

If you’re worried about losing out on your favorite Mac/Windows programs, you absolutely may want to do some research to make sure there are either Linux versions or at least satisfactory Linux equivalents to the software you need. But while you might not be able to get Adobe programs, for instance, on to your new operating system, there are plenty of big-name proprietary apps that have made the leap in recent years: Spotify, Slack, even Skype. And Linux programs are usually available to at least open and convert the files you originally made with their Mac/Windows equivalents (LibreOffice can open and work with the various Microsoft Office formats, for instance, and GIMP can at least partially convert Photoshop .PSDs).

Installing these applications is as easy as clicking “Install” in an App Store. No Apple ID hoops to go through. And the really wonderful thing is that unlike Mac or Windows, most Linux systems will track and perform application updates at the same time and in the same place as operating system updates – no more menu-searching and notifications from individual applications to make sure you’re on the latest, greatest, and most secure version of any given program. You’ll just get a general pop-up from the “Software Center” or equivalent and perform updates in one fell swoop, or as nit-pickily as desired. (And my Linux laptop has never unexpectedly forced a restart to update while I was doing something else.)

And finally, Linux drives are formatted differently than macOS or Windows drives, typically using the “ext4” file system. This means you can encounter some of the same quirks in moving your old files to a Linux system that you may have run into when shuttling between Mac and Windows – but Linux pretty much always comes with at least read support for HFS+ (Mac) and NTFS (Windows) drives, so I’ve never had issues with just transferring old files over to a new drive.

How can I try it out?

The great news is, unlike Mac or Windows, you don’t have to go to a physical store or buy a completely new laptop just to try a Linux distribution and see if it’s something you would like to use!

Just like when you install (or reinstall) the operating system on a Mac or Windows computer, to install Linux you’ll need a USB drive of at least 8GB to house the installation disk image – an ISO file. Unlike Mac or Windows, Linux installation images, in addition to the installation program itself, pretty much always have a “Live” mode – this lets you run a Linux session on your computer, to see how well it works on your hardware and if you like the distribution’s design and features. It’s a fantastic try-before-you-buy feature, and can even work with MacBooks, if that’s all you have (just don’t be surprised/blame Linux if there’s some hardware wonkiness, like your keyboard not responding 100% correctly).

Once you have an 8GB USB flash drive and have downloaded the ISO file for the OS you want to try, you’ll need an application to “burn” the ISO to the flash drive and make it bootable. I recommend Etcher, which is multi-platform so it’ll work whether you’re starting out on Mac or Windows (and also, just for the record, if you’re trying to make bootable installers from macOS .DMGs or Windows ISOs – Etcher is a rad tool!). From there you’ll need to boot into the installer USB according to instructions that will depend on your laptop manufacturer (it usually means holding down one of the function keys at the top of your keyboard during startup, but the key combination varies depending on the hardware/maker).
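One optional extra step before you burn anything: distributions generally publish a checksum (usually SHA-256) next to their ISO downloads, so you can confirm the file arrived intact. Here is a minimal sketch in Python of how that check could look. The function names are my own, and the ISO path and checksum are whatever you pass in on the command line; nothing here is specific to any one distribution.

```python
import hashlib
import sys


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a multi-gigabyte ISO never sits in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published_hex):
    """Compare the file's hash with the hex string copied from the download page."""
    return sha256_of_file(path) == published_hex.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (both arguments are placeholders you supply yourself):
    #   python check_iso.py path/to/downloaded.iso <published-sha256>
    iso_path, published = sys.argv[1], sys.argv[2]
    if matches_published_checksum(iso_path, published):
        print("Checksum matches, safe to burn.")
    else:
        print("Checksum does NOT match, download the ISO again.")
```

If the two values don’t match, the download was probably interrupted or corrupted, and it’s worth grabbing the ISO again before blaming your USB drive.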

So what about all those distributions? Which ones should you try? Here are some of the most popular flavors that I think would also be accessible to converts making their way over from macOS or Windows. These distributions all have wide user bases, meaning they all have either good documentation or even active support accounts that you can contact in the event of questions or problems.

Ubuntu

Thanks to its popularity as an operating system for servers and the Internet of Things, Ubuntu is probably the biggest name in Linux, and if you just want a super stable, incredibly well-supported desktop with thoughtful features, it is still my go-to recommendation for most casual users seeking an alternative to macOS and Windows (and as I said earlier, if you’ve ever encountered BitCurator, you already know what it looks/feels like). It’s what I use myself for day-to-day web browsing, streaming services, word processing, some Steam gaming, light digipres/coding and a bit of server maintenance for this very site.

When it comes to Ubuntu, you’re going to want to look to try out a version labeled “LTS” – that’s “Long-Term Support”, meaning OS updates are guaranteed for five years (any of the other versions are primarily for developers and other anxious early adopters who don’t mind a few more bugs). The latest LTS release, 18.04, just came out a couple months ago with a pretty major desktop redesign, but it’s as attractive, sleek and functional as ever, and I came back to it after flirting with some of the other distributions on this list.

Linux Mint

Linux Mint is itself a derivative of Ubuntu, so everything I just said about stability and support goes for Mint as well – the Mint developers just wait for Ubuntu to release updates and then add their own spin. The differences are thus largely visual – Linux Mint’s desktop is made to look more like Windows, so users who are migrating from that direction are more likely to be at home here. It’s been around for a while so the support community is likewise large and varied.

Zorin

Zorin is also an Ubuntu derivative, one that’s even more explicitly targeted at converting Windows users/Linux newbies. It’s newer than Mint but I have to say personally I think that’s led to a fresher, more attractive design. (More Windows 10 to Linux Mint’s Windows 7.) In fact it’s pretty much built around flexible, easily changed desktop design. Going with a distribution based on super pretty icons and easily swapping around menus might seem silly, but honestly, there are so many distributions with such similar core features that these *are* the kinds of decisions that make Linux users go with one over the other.

elementary OS

Whereas Linux Mint and Zorin are more or less “Ubuntu for Windows converts”, elementary OS has a visual design more targeted to bring over macOS users, with a dock, top menu bar, etc. It’s a very light, sleek OS that really only ships with the most basic apps and the design encourages you to keep things simple (their media apps are literally just called things like “Music” and “Videos”, and their default web browser, Epiphany, has a bare minimum of features to keep from becoming a memory hog like Firefox and Chrome). But since it’s Ubuntu-based you can still easily install more familiar open source apps like VLC, etc. It’s also a very intriguing Linux OS to try if you’re used to something like a Chromebook or a tablet as your primary device.

Pop!_OS

System76 is known more for their attractive hardware and great customer support (I’ve had my Lemur for two years and am still in love), but they recently tried putting out their own Ubuntu-based Linux distribution targeted at developers and creative professionals. It pretty much seems like Ubuntu but with some more minimalist icon design and some nifty expanded features for multiple workspaces, but mostly I’m plugging them here because if you wind up looking for higher-end hardware and some really helpful, responsive support, System76 is tops.

Solus

Whereas all these previous distributions I’ve mentioned have built off the stability of Ubuntu, Solus is different, as it is its own completely independent OS. It’s an example of a “rolling release” operating system: while with macOS, Windows, and many Linux distributions like Ubuntu, you have to worry about your particular, fixed-point version eventually becoming obsolete and no longer receiving updates (e.g. once macOS 10.14 comes out, those users still using 10.11 will no longer receive security and software updates from Apple), Solus will just continually roll out updates, forever. Or, well, “forever,” but you get what I mean.

That means users actually get the latest software and features much faster, as you don’t have to wait for those things to get bundled into the next “release”. It’s an interesting model if you’re really ready to unmoor and explore away from the usual systems.

Aside from that, Solus’ custom “Budgie” desktop design is extremely pretty and modern, with a quick-launch menu and a custom applets bar. It feels a lot like Windows 10 in that sense.