The Cloud Is Just Someone Else’s 10,000 Computers

I don’t need to be the umpteenth person to tell you, in dramatically vague terms, that cloud computing and software-as-a-service have (*Paul LaFontaine voice*) changed the very society we live in. But every now and then I am reminded that the Big Tech consolidation and movement of everything online in the last ~15 years has fundamentally obstructed and obscured how computers work and most people’s understanding of what’s happening in/on the device right in front of them. [1] Google Drive’s native file “formats” being another fabulous example, but that is a post for another day

And as individuals and institutions across the board, from enterprise-level research universities to local collectives, have quickly come to rely on cloud-based file hosting, I think it has become absolutely critical for archivists – whether it’s in the context of picking one’s own storage options for a digital preservation and access plan, or sifting through acquisition, management, and organization of other’s folks’ collections – to understand what these services actually are and what they are providing. I’ve overheard requests for an “open source alternative” to Google Drive or Dropbox a couple of times recently, and it both encourages and troubles me a bit: encouraging because folks are questioning the ubiquity and motivation of these companies and services, but troubling because it means cloud-based companies have rather successfully obscured not only where people’s files and data live, but the mélange of software we use to interact with them. In essence, an “open-source alternative” to Google Drive does not exist; or at least, it doesn’t exist in the same way that we commonly talk about open-source alternatives to proprietary and expensive desktop software products (Adobe Premiere vs. Kdenlive, GarageBand vs. Audacity, Microsoft Office vs. LibreOffice, etc.)

Don’t SaaS Me

I will repeatedly use the word “service” in this post, and that’s not accidental. As part of a productivity suite (including the other “Google Workspace” products, like Google Docs, Google Sheets, Google Photos, etc), Google Drive and its equivalents are not just software, the way Finder or Windows File Manager are pieces of software, running on your computer, that allow you to browse and manipulate your files.  It is an example of what’s called Software as a Service (SaaS), an access and delivery model where, instead of downloading and installing a particular program to your computer, [2] Say, via Homebrew you log in and use a software platform that runs on another computer (often, though as we’ll see not exclusively, in return for some kind of subscription fee or license). The software platform itself, and often the data you’re manipulating (word files, spreadsheets, presentations, for example) is not actually stored and running on the computer you’re using to access the service – it’s on someone else’s, a server. To be even more specific to the current technological moment and not 20-year-old definitions of computing, it’s running in “the cloud“, which is not precisely “someone else’s computer” as the joke goes, but more as my title suggests, “someone else’s data center(s) with potentially thousands of servers working in concert” (scale is going to be a running theme here, and it’s important).

So when we talk about using these kinds of monolithic file hosting and productivity platforms – I’ll stick with Google Drive for this post as a very representative and relevant example, but this also encompasses, for instance, iCloud, Microsoft OneDrive and Office 365, Adobe Creative Cloud, Dropbox, Box, etc. – using them actually encompasses four [3]There is a fifth factor that I could discuss here – bandwidth – which encompasses the amount and speed at which you can move data to, from, and within these platforms. But that’s … Continue reading different things:

1. Storage space (GBs and TBs of drive space on which to put data; likely when you get down to it these are hard disk drives, but we could also be talking about solid state drives or even tape depending on precisely the service/price/purpose)

2. Computing power (processors and RAM on which to run software)

3. Server software (the software that runs on the cloud computing power and that actually manages/interacts with your files; with proprietary services, you will basically never as a user actually directly see or interact with this platform, in Google Drive’s case only Google’s internal developers and sysadmins know what this looks like or how to install, run, debug it, etc.)

4. Client software (the software that users actually see and use to control/direct the server application; in Google Drive’s case, the most commonly-used client is the web interface you see when you log in via a browser to drive.google.com, meaning this web client also runs on Google’s servers)

Because the last piece there is basically the only one that is visible to folks in any meaningful way, there can be a conflation of the client with the totality of the service. But putting all these four factors together is why, to put it bluntly, there is no “open source alternative” to Google Drive: in particular there is no such thing as open-source storage space or computing power. Those are material/physical resources that you either have or you don’t – and Google has them in spades, while the typical computer/SaaS user doesn’t.

Compounding this is the capitalist trap paradox that the more you use these platforms (again, either as an individual or an organization) the more difficult it will be to extricate yourself and your stuff, because you have been invisibly relying more and more on the time, money, and knowledge it takes to manage storage space and computing power – especially communally-used storage and computing power, as is the case with a file sharing service accessed by potentially many staff, patrons, and/or community members within one organization alone. Software, no matter how openly or ethically made, cannot replace those considerations on its own.

But! This is not to despair, nor disparage the instinct that I know is leading folks to ask such questions and seek alternatives. For many, many (good!) reasons, archivists are looking to divorce themselves and their work as much as possible from Big Tech companies. I do not mean to say “this is impossible” and leave it at that, but rather elaborate on exactly what currently available options are, and a glimmer of the effort and tradeoffs involved with them.

The Disclaimer

I will, for the purposes of this post, be assuming a thing or two, namely that by “Big Tech” I mean folks are looking to avoid proprietary SaaS, file hosting, and cloud computing options from a certain cadre of outsized companies including: Google, Microsoft, Amazon, Apple, Oracle, Dropbox. In the course of doing so I will mention other options, potentially including paid services offered by smaller companies (“smaller”, in some of these cases, being an extremely relative term).

This should not be read as a full-throated endorsement, paid advertisement for, or suggestion that these companies/services are not still part of the problem. You know – *gestures wildly* – the problem. Sometimes we just need the footing to investigate options, start a conversation, and move the needle a bit, particularly in work or communal settings. That’s all I’m aiming for here. For more well-rounded critique, the larger social implications of all cloud computing, and envisioning and embodying true alternatives, may I humbly encourage you go read something else, like Logic Magazine. [4] I shit you not, I had no idea that their latest issue (16) is literally called “Clouds” when I first wrote this.

Switching Clients

The good news – maybe there’s no open-source Google Drive, but there are open-source Google Drive clients that you can switch to, right now, today, with absolutely no disruption, changes, or migration of your files necessary. Just as you are not actually limited to interacting with files hosted on Drive with Google’s browser-based web app – you can also install and use their official desktop client, which, as the name implies, runs on *your* desktop/laptop’s computing power rather than Google’s servers; or use their mobile app client, which does the same but on your phone – you are not limited here by Google-made clients at all.

Open-source clients take advantage of Google Drive’s public API, which means that as long you have a Drive account and can provide the credentials to that account, any piece of client software can control/pass commands to Drive’s *server* application to perform certain tasks with your files. You are still taking advantage of Google’s resources (storage space, computing power, server-based file management platform), but you can also take advantage of features that SaaS companies like Google don’t always directly offer or intend with their own client software.

This may be particularly useful for archival or preservation-minded organizations, who often have use cases that Google doesn’t seek to serve (because our usage/business is at a scale that pales compared to general office productivity, personal file backup, education, etc). That might include more stable transfers than the upload/download options offered by a web browser, or automatically checking file fixity.

Open-source clients also tend to be designed more generically or comprehensively to hook into multiple cloud storage platforms. So you can use them to manage files on multiple services, e.g. Google Drive and Dropbox, or even transfer between them, without getting locked into using every vendor/SaaS’ client separately. That in turn helps for cleaner and more protected workflows – over time, as platforms change their clients and service offerings (i.e. pricing, limitations on storage space/computing power, privacy or ToS) you and your org/community can step back from these vagaries a bit and make smaller tweaks to settings or configurations (or even just move over to another hosting platform, if need be) rather than completely re-learning and re-training.

(Not that open-source software certainly *also* changes design, features, or workflows, and that such cycles still need to be taken into account; but, these changes are likely not made from a place of planned/forced obsolescence or pushing you into more profitable behaviors, which in my experience leads to more gradual changes and longer tails of backwards-compatibility)

  • rclone
    A personal favorite – rclone builds on basic Unix command-line tools (especially rsync) for an experience tailor-made for using cloud services in all manner of situations. You can manually control transfer speeds (for uploading/downloading over low bandwidth without losing or cancelling transfers), easily preserve timestamps and other file system data, encrypt sensitive files, and more. And their docs are great, there are example configurations on their site for a large number of the most popular file hosting services. Rclone itself has an experimental GUI built-in now but I personally prefer Rclone Browser for day-to-day use.

  • CyberDuck
    This is a pretty old-school client that goes back to the early aughts and was originally meant for uploading/downloading data to personal or web servers over FTP/SFTP, but seems to have successfully made the transition over to working with cloud services.

  • Duplicati
    Duplicati is specifically designed for encryption and creating backups of local files on external storage/remote servers (such as cloud services), so it doesn’t seem to be a full-fledged file manager per se. But, for archivists, orgs, or individuals looking for a way to primarily just use Google Drive or similar services as a remote storage option and not do a lot of sharing or manipulation of files at rest (that is, you just want a secure backup for your second or third copy of certain files/materials), it seems like a pretty good option for managing that.

Hack the Platform

Let’s say though that an open source client isn’t enough – you want to divorce from the Google Drive platform altogether and not still be invisibly relying on the Drive server software. Yes, open source file hosting platforms (server + client) exist!

The issue here is that, unlike clients which can be run on any old laptop/desktop/mobile device, server software requires – well, a server. So if you want to make the leap and use such platforms, you may need to learn something about server administration, network configuration, etc. Plus again, you need the computing power and storage space to run the software and store the data, whether that’s server hardware that you actually control/own like a desktop NAS unit, or going back to a cloud service (something like Google Cloud rather than Drive, a service that provides computing infrastructure but no particular application or software running on it). I’ll elaborate on options for the latter in a moment.

But overall I want to highlight that this is the step where we’re potentially starting to talk about time and effort to learn *new* technological skills in order to take advantage of this software, rather than just open source tools that build on or unlock skills, expertise, and workflows that you already have as a digital archivist.

  • Nextcloud
    Again I’m starting here with the platform I myself use and am most familiar with. Nextcloud’s probably the biggest name in “Google Drive” alternatives in FOSS-world (at least in the U.S.), and it definitely aims to go pound-for-pound against Drive by offering not just file storage/backup but tons of sharing options, integrations, and (as of fall 2021) an out-of-the-box office productivity suite that lets you collaboratively create and edit documents, spreadsheets and presentations in the built-in web client, just like Google Docs/Sheets/Slides. Personally I’ve used mine to sync my most important and frequently-used documents across desktops; share digitized videos with family; draft blog posts with contributors for this very site; automatically back up photos from my phone; and set up a personal music streaming service. It’s pretty great, and that’s all just for me, ignoring much of the multi-account/user options.

  • ownCloud
    Nextcloud is actually a fork of ownCloud, so they are extremely similar and many plugins/integrations, configuration settings, and other aspects of Nextcloud/ownCloud management seem completely cross-compatible. The history of the two projects is kind of murky to me as an outsider/onlooker – the split seems to have occurred over disagreements in business modeling and licensing [5]The unfortunate and basically unanswerable kind of fight that annoyingly erupts from time to time in open source communities over who is really behaving according to “the spirit” of free … Continue reading, resulting in two platforms that look and behave in much the same ways but serve slightly different communities, with ownCloud skewing toward enterprise-level business and support. Anecdotally it seems like ownCloud is a bit more popular/adopted in Europe as well. I’ll link a breakdown between the two here if you are interested and really want to get into the nitty-gritty of the differences between the two.

  • Seafile
    Like ownCloud, Seafile offers both a completely free and open source “community edition” as well as a paid tier business/enterprise option. Seafile was originally created in China and seems to have been primarily adopted in Asian and European markets; in fact its current lack of presence in the U.S. may at least be partly due to its Chinese and German partner organizations squabbling over the rights to Seafile’s U.S. trademark/intellectual property. I know little about Seafile but it seemed worth mentioning.

Mo Hosting Mo Problems

All right, now as I’ve repeatedly mentioned, if you’re interested in a Google Drive platform alternative like Nextcloud/ownCloud/Seafile, you’re going to need to invest in some hardware to go with your shiny new open source software. But let’s say you don’t have the physical space, time, or uninterrupted power supply to run and configure your own actual server/rack. That’s what cloud computing is supposed to offer: access to someone else’s hardware resources.

Of course, in our efforts to get away from prominent SaaS file hosting solutions like Google Drive, this leads us right back to the same suspects: there’s a bit of paradox, ethically-speaking, in setting up your personal or organizational Nextcloud platform on Google Cloud, or Microsoft Azure, or Amazon Web Services. This is also where the whole scale and model of cloud computing runs into problems in a capitalist society in general: if you’re big enough to offer cloud services at a useful/reasonable price point, there’s a not-insignificant chance you are: 1) still contributing to climate collapse and/or 2) about to get swallowed/bought up by one of those three or four giants anyway.

So, take all of this with a grain of salt, but if I were at a minimum just looking into alternatives to getting away from Google, I would look at:

  • DigitalOcean
    Disclosure: the web site you’re reading right now is hosted on DigtalOcean (as is my personal Nextcloud server I mentioned earlier). I like the clear pricing models and above all the community around DO – their documentation and tutorials, written by both staff and community members, are a terrific place to start if you’re interested in learning about system/server administration or just need some clear instructions for setting up particular, popular platforms (like say, Nextcloud).

  • Linode
    Linode also comes up a lot as an AWS/Azure/Google Cloud alternative – I tried them once and I admit pricing and usage was all extremely similar to DO, so I have no real comparison to make here except, as I’ve mentioned, that I’ve preferred DO’s documentation (though there’s absolutely nothing preventing you from using DO’s tutorials to set up open source software on Linode’s infrastructure!)

  • Backblaze
    Both Backblaze and the next service here specialize more in cloud storage than computing; which is to say, to my knowledge they don’t offer virtual servers and computing power (processors/RAM) the way DigitalOcean and Linode do; just remote storage management and client software to allow syncing/backing up your files. That means you’re not going to, e.g. install Nextcloud on Backblaze and get a lot of productivity/sharing options, a complete “Google Drive” equivalent – but if you’re the type of person or organization that’s simply using Google Drive or Dropbox for remote storage and backup in the first place, and largely ignoring the other features of the Google Workspace SaaS, then maybe this (which can be used in combination with one of the open source clients I mentioned earlier) is the kind of “alternative” service you’re looking for.

  • Wasabi
    Again, like Backblaze, Wasabi specializes in cloud storage, and the company’s very quick rise in this space (since only 2018) has a lot to do with their dirt-cheap pricing model. Take that as you will.

  • Reclaim Hosting
    Though they are technically piggy-backing on DigitalOcean’s cloud computing infrastructure, I did want to shout out Reclaim Hosting as an option that the sort of folks likely to be reading this – educators, non-profit/cultural heritage practitioners, students – should probably be aware of. In essence Reclaim packages support/service plans on top of DigitalOcean’s cloud computing infrastructure, taking a lot of the nitty-gritty of self-hosting SaaS platforms (domain management, setting up firewalls and network security, installing the actual server software like WordPress, Drupal, Omeka) either out of your hands entirely or at least simplifying it.

    And if you don’t specifically have ties to the educator/academia community that Reclaim is targeting, know that there are similar kinds of services out there: essentially, SaaS companies running open source platforms for you but managing the cloud computing infrastructure. A good place to start is looking at the websites for one of the open source platforms I mentioned above and looking for their “partners” or “hosting providers” – third-party companies that run these open source options on their servers and offer you access, basically the same way Google does with Drive. Pricing and terms of service may vary wildly depending where the company is located and other vagaries, so you’re always going to want to pay attention to the fine print, particularly if you’re working with any sensitive digital collection or community data.

    The Part Where I Ask *You* Questions


    None of the sections above are meant to be comprehensive. I’ve listed some tools and services that I have at least a passing familiarity with so that if anyone’s interested, I could maybe answer particular questions or chat more about my experience with them. But above all I’m offering these as examples, and trying to ground some of the vocabulary and features of these services, so that folks have a better idea of how to look for and evaluate the right combination of software and computing services that works for your stuff and situation.

    With all of these tools and SaaS options, whether you pursue alternatives or stick with a monolith like Google Drive, remember to keep one eye peeled behind the curtain. Some useful things to ask yourself in any file hosting project or evaluation:

  • Where are my files actually stored?
  • Where is the software I’m using running (and who’s running/managing/maintaining it)?
  • Who else needs to access these files, when, and for what purpose? Do I need to be able to collaboratively manage and edit files, or do I just need a backup?
  • How much storage space do I actually need?
  • Do I prefer working with files on my desktop or in a browser? Both?
  • Do I have time and capacity (including $) to learn more about computing and self-hosting software – or do I just need a quick, trusted service solution (also $, there’s really no way to get around the $)?

Footnotes

Footnotes
1 Google Drive’s native file “formats” being another fabulous example, but that is a post for another day
2 Say, via Homebrew
3 There is a fifth factor that I could discuss here – bandwidth – which encompasses the amount and speed at which you can move data to, from, and within these platforms. But that’s really getting into the weeds, potentially brings in complications like your Internet Service Provider, and frankly IMHO isn’t going to be a high-impact decision point for individuals or small-scale orgs. If you’re looking to get your mid- to large-scale institution away from Amazon Web Services, it’s going to come into play; but I’ll leave it out of “what do we do about Google Drive”.
4 I shit you not, I had no idea that their latest issue (16) is literally called “Clouds” when I first wrote this.
5 The unfortunate and basically unanswerable kind of fight that annoyingly erupts from time to time in open source communities over who is really behaving according to “the spirit” of free or open software

Your Promotion to Package Manager Manager

Let’s talk about Homebrew. No, not the beer-making technique that left you befuddled by the appearance of approximately two thousand quirky-named, identically-tasting IPAs at the liquor store. I mean the popular package manager for macOS.

I literally had to search for “quintuple pale ale” before I stopped getting actual results.

Homebrew is a beloved tool of digital preservation educators and practitioners alike, and it’s little wonder why. While I will continue to bang the drum for Linux operating systems and the Windows Subsystem for Linux, macOS remains a popular choice for digipres workstations given the general familiarity of Macs and the ability to easily and natively run much handy Bash/command line software. And Homebrew provides macOS with a major piece that it’s “missing” out-of-the-box: a command line-based package manager.

A “package manager” or package management system allows a user to install, uninstall, upgrade and configure the software on their computer in a consistent and centralized manner. The official Mac App Store is a package manager, for instance – rather than needing to trawl the internet to find, download, and run individual installers for every single application you’d like to try, the App Store (in theory) puts everything in one place. Homebrew is frequently described as an “App Store” for CLI programs, and the comparison is pretty apt.​*​ Setting up Homebrew is pretty much step one for any Mac-based digipres machine: once it’s in place, it’s only a matter of minutes to ffmpeg, mediainfo and mediaconch, vrecord, exiftool, imagemagick, rsync, youtube-dl, hashdeep, ddrescue, and much more.

Homebrew is far from the only command line package manager out there – part of why it’s called the “missing” package manager for macOS is because CLI package managers are included in Linux operating systems by default. Debian-based systems like Ubuntu have the Advanced Packaging Tool (APT);​†​ Red Hat-based systems have YUM or DNF; Arch systems have Pacman, SUSE systems have zypper, etc etc etc (and Homebrew can be used on Linux as well, incidentally). Since 2017, native Windows users can even get in on the action with Chocolatey.

Homebrew isn’t even the first attempt to bring command-line package management to MacOS, following in the footsteps of Fink and MacPorts (both of which are still around, though I would say with less robust or user-friendly communities around them). And that’s not even getting into the crazy number of package or “dependency managers” that do basically the same work for developers looking to add modules in particular programming language environments: pip (Python), RubyGems (Ruby), npm and yarn (NodeJS/JavaScript), Maven (Java)…

It’s a software preservation nightmare! Hooray!

All of these projects have more or less the same goal and advantages: automatically do all the work it takes to install a piece of software and make it available to the user. In many cases that means, for example, typing $ brew install mediainfo so that the mediainfo command-line application is then available by calling $ mediainfo example.mov , where it wasn’t available before.

However, as with so many lovely, user-friendly innovations, there is an element here of ceding control for convenience. And while most of the time it is hopefully unnecessary to peek under the Homebrew hood, a couple of times in the past year I’ve seen situations come up where it helps to have some clarity about what all, exactly, Homebrew is doing when one hits enter on a “brew install”.

So today I’m going to dive a bit deeper on the ins and outs of package management. I’ll use Homebrew for several concrete examples because of its popularity and ubiquity in digipres training, so keep in mind that some of Homebrew’s (beer-obsessed) vocabulary may be unique; but the concepts described here are basically applicable across the board, no matter what package manager you’re using.

Return to the Source

First off, I would like to clarify a few basic computing concepts: namely, the different types of code and executable programs that can be installed with a package manager.

There is a key difference in programming between source code and machine code. Source code is “human-readable” – open up a source code file and you will be greeted with plain text, written and formatted according to the conventions of a particular programming language (Python, Bash, JavaScript, Ruby, HTML, etc). These sorts of files make up the majority of what you see on popular source code hosting and version control platforms like GitHub, GitLab, Bitbucket, SourceForge. They facilitate the work of programmers and developers: broadly speaking, people read and write source code.

Machine code, as the name implies…is for machines. It is a set of instructions that can be directly executed by a computer’s processor. Machine code is numerical – there are no words to read, but the CPU interprets and understands a given string of numbers, or even just ones and zeroes, as a series of actions for it to perform. Open up a machine code file in a text editor, and you will just get a string of absolute wingding gobbledygook as the editor tries (and fails) to interpret and display this numerical series as text. (Open it up in a hex editor instead, and you might get a different result, but that’s another day’s topic)

Literal machine code

In order to get from source code (human-readable) to machine code (machine-readable) you usually have to compile the source code. The purpose of a compiler is to take your source code text and translate it down into ones and zeroes for your computer to actually do something with. Very broadly speaking, every operating system has its own compiler to make executable machine code for the particular OS and CPU that you are using right now. The same source code file (written, let’s say, in Python) is going to create different machine code whether the machine code is intended and compiled to run on MacOS or Windows.

I’m probably making this sound more complicated than it is: we see the results when you, for instance, try to open and run a “.exe” file intended for Windows on macOS.​‡​ Windows .exes are machine code: again, try opening one up in a plain text editor like Notepad or Atom or Sublime Text and see the results.

Because executable machine code is numerical, and at its very core just a very long string of ones and zeroes, executable machine code files, like Windows .exe files, are sometimes just called “binaries”.

This can get *really* confusing because technically a “binary” is just “any file not written and meant to be displayed as plain text” – .mp3s, .movs, .pdfs, .jpgs, all of them and many more are also “binary” files. But in the particular context of package management and installing applications, the term “binary” is very frequently used interchangeably and synonymously with “version”, e.g. “a macOS binary”, “a Windows binary”, etc. I sort of wish I could’ve avoided this altogether but it will absolutely come up when troubleshooting or searching support forums with package management questions, so here we are.

So if you want to create software that works on different operating systems and processors, someone usually has to compile it from source code first into executable binaries that match the desired operating systems and processors. Again, you’ve seen this out in the digital world: you’re trying to download a piece of software from a website and there are two different links, one “For Windows” that downloads a .exe, and one “for MacOS” that gives you a .app or .pkg or such. And because compilation is essentially an act of translation, there are correspondingly things that can get lost in that process. We’ll see a direct example of this later in a case study.

*Extremely undergrad film student voice* It’s really more of a character study

A last aside: you may have encountered scripts and be thinking, “hey, but I can run (execute) a .py Python script or a .sh Bash script, which are source code files, without compiling!” Well, you got me! Just as I mentioned above that not all binaries are executable, not all executables are binaries. Scripting uses a process called interpreting *instead* of compiling. This usually allows scripts to be a little more portable than binaries across operating systems, but the basic idea – that there is a layer of translation necessary between the source code file and the computer – is the same.

Define Your “Install”

So what all exactly does source code and machine code and binaries have to do with package management and installing software?

Let’s say you are a web archivist running macOS and you installed two different pieces of software at the same time: youtube-dl, which is a handy command line tool for downloading media from YouTube and other hosting sources, and Webrecorder Player, a wonderful desktop tool for viewing and inspecting web archives (WARCs), no internet connection required.

You “installed” both programs. But youtube-dl is not in your “Applications” folder next to Webrecorder Player. And no matter how many variations on $ webrecorder you type into your Terminal, Webrecorder Player does not launch on your desktop.

If that’s the case… what does “install” mean?

“Installing” an application is actually a highly contextual process. The end-goal is always the same: you want to take a program and make it usable. But how you’re supposed to use a program….depends on the program! And thus the steps actually necessary to complete an installation also depend on the program, user, and goals involved. When broken down into smaller, discrete actions – downloading source code, compiling the code, creating a binary, moving that binary to a particular folder, changing some operating system settings so that you can execute that binary – the installation process can get quite variable and customizable.

So finally, we return to package managers. As I said up top, package managers take care of many of the nitty-gritty details of installing programs so that as far as you, the user, are concerned, “install” just means “click this button” or “type one brief command.” They are meant to cover a large majority of use cases with minimal effort – but potentially to the detriment of edge cases or tinkering an installation to exactly how a user needs it.

To return specifically to Homebrew, every package or program that you can install with Homebrew has a corresponding formula. You can browse all of them here. Homebrew’s formulae are all hosted on GitHub, and every single one of them is just a brief script (written in Ruby) which defines instructions to answer the question:”what does ‘install’ mean for this particular program?” When you type $ brew install mediainfo, Homebrew searches in the Homebrew/homebrew-core repository in GitHub, finds the mediainfo.rb formula, and then follows whatever steps that formula tells it to do.

To get a better idea of what these instructions actually look like and how Homebrew interprets and performs them, let’s look at a concrete example.

Case Study: vrecord’s Homebrew Formula

This is the whole Homebrew formula for vrecord, the open source video digitization program, developed and maintained by the Association of Moving Image Archivists’ Open Source Committee – let’s take a look and break it down by section:

class Vrecord < Formula
  desc "Capturing a video signal and turning it into a digital file"
  homepage "https://github.com/amiaopensource/vrecord"
  url "https://github.com/amiaopensource/vrecord/archive/v2020-07-01.tar.gz"
  version "2020-07-01"
  sha256 "983264ca6a69b78b4487a7479ab5a4db04cbc425f865ec2cb15844e72af4f4ac"
  head "https://github.com/amiaopensource/vrecord.git"

  depends_on "amiaopensource/amiaos/deckcontrol"
  depends_on "amiaopensource/amiaos/ffmpegdecklink"
  depends_on "amiaopensource/amiaos/gtkdialog"
  depends_on "cowsay"

  on_macos do
    depends_on "bash"
    depends_on "gnuplot"
    depends_on "mediaconch"
    depends_on "mkvtoolnix"
    depends_on "mpv"
    depends_on "qcli"
    depends_on "xmlstarlet"
  end

  on_linux do
    def caveats
      <<~EOS
        ** IMPORTANT FOR LINUX INSTALL **
        Additional install steps are necessary for a fully functioning Vrecord
        install on Linux. This includes using the standard package manager to
        install gnuplot, xmlstarlet, mkvtoolnix and mediaconch. Additionally,
        it often is necessary to remove the Homebrew installed version of SDL2
        to prevent conflicts. For more information please see:
        https://github.com/amiaopensource/vrecord/blob/master/Resources/Documentation/linux_installation.md
      EOS
    end
  end

  def install
    bin.install "vrecord"
    bin.install "vtest"
    prefix.install "Resources/audio_mode.gif"
    prefix.install "Resources/qcview.lua"
    prefix.install "Resources/vrecord_policy_ffv1.xml"
    prefix.install "Resources/vrecord_policy_uncompressed.xml"
    prefix.install "Resources/vrecord_logo.png"
    prefix.install "Resources/vrecord_logo_playback.png"
    prefix.install "Resources/vrecord_logo_audio.png"
    prefix.install "Resources/vrecord_logo_edit.png"
    prefix.install "Resources/vrecord_logo_help.png"
    prefix.install "Resources/vrecord_logo_documentation.png"
    man1.install "vrecord.1"
    man1.install "vtest.1"
  end

  test do
    system "#{bin}/vrecord", "-h"
  end
end

Metadata

class Vrecord < Formula
  desc "Capturing a video signal and turning it into a digital file"
  homepage "https://github.com/amiaopensource/vrecord"
  url "https://github.com/amiaopensource/vrecord/archive/v2020-07-01.tar.gz"
  version "2020-07-01"
  sha256 "983264ca6a69b78b4487a7479ab5a4db04cbc425f865ec2cb15844e72af4f4ac"
  head "https://github.com/amiaopensource/vrecord.git"
  • class Vrecord < Formula – This line is required to start every formula – Homebrew needs to know that the script you’ve pointed it to is indeed a formula! (The first letter of the program/formula name has to be capitalized to conform with Ruby syntax, regardless of how you normally write the name)
  • desc – A brief description of the application and its purpose. Not strictly required, but helpful – this will often match the “About” project description on the application’s GitHub/GitLab page, if the code is hosted there.
  • homepage – A project site where users can go for more information about the program. It’s mandatory to include a homepage if you want to include your formula in Homebrew’s core list of packages.
  • url – This is required – it directs Homebrew to where it should download the program’s code (in this case, and in the case of many Homebrew formulae, the source code is contained in a tarball, a format that takes all the source code files and packages them together into one, like a .zip file). If a program has multiple versions or releases, this is the field that specifies *which* version Homebrew will try to install.
  • version – This is metadata that helps Homebrew keep track of which version of a program you have installed (particularly helpful if you have multiple versions of a program/formula installed on the same computer). It’s not required to have this field, and if the source code URL above comes from GitHub, Homebrew can usually pull this version info from the file name automatically – but it doesn’t hurt to specify manually.
  • sha256 – This is the checksum for the source code tarball from the URL above. Once the tarball has downloaded to your computer, Homebrew will automatically check that it matches this checksum here – basically a security feature to make sure that what Homebrew downloads is indeed the code/program that you wanted. This is required.
  • head – “Head” specifies a cutting-edge version of the program – if it’s specified, it means early adopters can try out the absolute newest changes and revisions to the program by its developers by running $ brew install --HEAD <formula> instead of just $ brew install <formula>. It’s basically a way to signal to users that some new options or features may be available for them to try out but they are not yet considered stable. It’s not required if the formula/program author only wants Homebrew users to install “guaranteed”, stable releases of their software.

Dependencies

depends_on "amiaopensource/amiaos/deckcontrol"
  depends_on "amiaopensource/amiaos/ffmpegdecklink"
  depends_on "amiaopensource/amiaos/gtkdialog"
  depends_on "cowsay"

  on_macos do
    depends_on "bash"
    depends_on "gnuplot"
    depends_on "mediaconch"
    depends_on "mkvtoolnix"
    depends_on "mpv"
    depends_on "qcli"
    depends_on "xmlstarlet"
  end

  on_linux do
    def caveats
      <<~EOS
        ** IMPORTANT FOR LINUX INSTALL **
        Additional install steps are necessary for a fully functioning Vrecord
        install on Linux. This includes using the standard package manager to
        install gnuplot, xmlstarlet, mkvtoolnix and mediaconch. Additionally,
        it often is necessary to remove the Homebrew installed version of SDL2
        to prevent conflicts. For more information please see:
        https://github.com/amiaopensource/vrecord/blob/master/Resources/Documentation/linux_installation.md
      EOS
    end
  end

In this section of a formula, the program developer needs to specify any and all external dependencies – that is, if there is other code, or other programs, that have to be present and installed on the user’s computer before vrecord can be installed and used correctly.

Every depends_on line specifies a dependency, and every dependency listed is…another Homebrew formula. So before Homebrew proceeds to the next section of the vrecord formula (the actual “install” section), it will go to each and everyone of these formula *first* and complete the instructions found there…(including, if those formula specify dependencies, going to their dependent formula – and on and on, down the line, if necessary).

(This means that the amount of time it takes vrecord to install can vary wildly, depending on how many of those depends_on formula are already present on your computer when you start – if you already have installed cowsay or mediaconch before, Homebrew will just skip over this part for those dependencies.)

In this case, the vrecord formula also specifies slightly different behavior depending on whether the Homebrew user is running macOS or Linux. Homebrew works on Linux systems, but, as I mentioned earlier, Linux systems usually have their own baked-in package managers (e.g. apt), and sometimes Homebrew and native Linux package managers don’t play nice with each other – so in this case, rather than having Homebrew run the install process for all those dependencies on_linux, the vrecord formula instead defines a caveat, which is just a warning to display to the user. This caveat, obviously, recommends using the native package manager to install certain dependencies instead of Homebrew, to avoid conflicts and errors.

Installation

def install
    bin.install "vrecord"
    bin.install "vtest"
    prefix.install "Resources/audio_mode.gif"
    prefix.install "Resources/qcview.lua"
    prefix.install "Resources/vrecord_policy_ffv1.xml"
    prefix.install "Resources/vrecord_policy_uncompressed.xml"
    prefix.install "Resources/vrecord_logo.png"
    prefix.install "Resources/vrecord_logo_playback.png"
    prefix.install "Resources/vrecord_logo_audio.png"
    prefix.install "Resources/vrecord_logo_edit.png"
    prefix.install "Resources/vrecord_logo_help.png"
    prefix.install "Resources/vrecord_logo_documentation.png"
    man1.install "vrecord.1"
    man1.install "vtest.1"
  end

Finally, the meat of the matter – in this section of the formula, we actually get to what “install” even means in the context of vrecord.

vrecord is a relatively straightforward case because, the source code doesn’t need to be compiled. The source code is itself a Bash script, which can be interpreted and run by a macOS or Linux system as-is (or, as-long-as-the-dependencies-have-been-installed). There is no translation down to the ones and zeroes of machine code required. So the installation process here isn’t a matter of compiling, it’s just a matter of moving the files to where they can be used.

prefix here is a variable – it’s a setting that’s part of your overall Homebrew configuration, so that any time any Homebrew formula mentions “prefix”, the package manager will just sub in the value it has stored there. Specifically, prefix defines a file path, an over-arching directory/folder for all of Homebrew’s files to live in.

Usually, prefix is set by default when you first install Homebrew and you never have to mess with it again (the whole point of a package manager being to mess with things as little as possible). You can see yours by running $ brew config and looking for the HOMEBREW_PREFIX line – on MacOS, it’s usually something like /usr/local. So, all of the programs that Homebrew downloads, installs, compiles, whatever – they’ll go into that /usr/local directory (unless otherwise specified by a formula) – all neatly nested and organized according to the package/formula name and version.

(The nested directory for keeping code, in Homebrew-speak, is called the Cellar. So if you want to find all your Homebrew-installed programs, poke around your /usr/local/Cellar directory.)

So all those lines that start with prefix.install are saying: take this file (from inside the tarball you downloaded) and put them inside the nested folder specified by the prefix variable. In vrecord’s case, we’re just taking some image and configuration templates (the .XML files) for vrecord’s GUI mode and putting them in appropriate locations in the Cellar.

The bin.install options are doing the same thing, but flagging an extra step. These two entries (vrecord and vtest) are the actual scripts that we want to run, so they need to be made executable for you, the user, to run them. The bin.install directive 1) links the specified files to a particular directory – in this case, /usr/local/bin – where your operating system expects to find executable command-line programs, and 2) adjusts their permissions so that you can run these scripts (without needing “sudo” permission.

(“bin” in these file paths stands for “binaries” – remember that there is a general but imprecise equation between binaries and executables, so that is why we are putting the vrecord script there – because it is executable, even though it is a source code file, not actually a binary/machine code)

The man1.install directives are again very similar to bin.install – these are manual pages that explain how to run the program (by typing $ man vrecord or $ man vtest). macOS expects to find these files in a certain place, just like how it expects executable binaries to be in /usr/bin or /usr/local/bin. So man1.install copies these files to that location.

Test

test do
    system "#{bin}/vrecord", "-h"
  end
end

Homebrew formula writers can optionally put in a “test” block at the end of the script to see if the installation process proceeded correctly. It’s generally out-of-scope of the Homebrew project to make sure that every single feature of every single program works as expected – but it can be a handy check just to make sure at least at the end of all this you got an executable something out of it.

In the case of vrecord, this test block simply directs the user’s computer to try running the command $ vrecord -h – if the computer encounters no errors trying to run this command (which just displays vrecord’s “help” page), then Homebrew will consider this a successful installation and finish running.

Phew. Let’s review – in the end, what did that vrecord Homebrew formula do? It told the computer to:

  1. Go to a certain URL and download the file (tarball) it found there.
  2. Check for any other external software that vrecord needs to work.
  3. Extract vrecord’s files out of the tarball and move them to a different folder.
  4. Test that the computer can find and execute the program now that it’s been moved.

Case Study: Troubleshooting ewfmount

..now what about a slightly more complicated example? One that involves source code compilation? Let’s start with a Tweet.

A bit of context: libewf is a collection of software that allows users to work with EWF-formatted disk images, which is a pretty popular option for preserving hard drives. One of the tools included in libewf is ewfmount, a program that allows you to, well, mount EWF disk images and explore the files on them, just as if they were a physical, external hard drive attached over USB or what have you.

Complicating things, ewfmount won’t work with MacOS out-of-the-box. First you have to install “FUSE for MacOS”, a piece of software that allows MacOS to work with file systems beyond the ones that Apple cares about (basically just APFS at this point). Otherwise, trying to mount your EWF disk image is going to have the same affect as plugging in an unformatted hard drive.

Eddy had installed these two pieces – libewf/ewfmount and FUSE for MacOS, using Homebrew, but the two pieces of software still weren’t “seeing” each other. To find out why, it helped to take a look at the Homebrew formulae – if the installation wasn’t working (Eddy didn’t have a usable version of the desired program at the end of the process), then one of the steps Homebrew was taking by default had to be incorrect for his context.

Here is the current Homebrew formula for libewf – see if you can spot the problematic line:

class Libewf < Formula
  desc "Library for support of the Expert Witness Compression Format"
  homepage "https://github.com/libyal/libewf"
  # The main libewf repository is currently "experimental".
  url "https://github.com/libyal/libewf-legacy/releases/download/20140808/libewf-20140808.tar.gz"
  sha256 "dfe29b5f2f1841ff1fe11979780d710a660dbc4727af82ec391f398e6b49e5fd"
  license "LGPL-3.0"

  bottle do
    cellar :any
    sha256 "43d8ba6c2441f65080f257a7239fe468be70cb2578ec2106230edd1164e967b6" => :catalina
    sha256 "4c5482f8f1c97f9c3f3687bccd9c3628b314699bc26743e641f2ae573bf95eeb" => :mojave
    sha256 "cae6fd2f38855fd15f8a50b644d0817181fed055aef85b7793759d7703a833d4" => :high_sierra
  end

  head do
    url "https://github.com/libyal/libewf.git"
    depends_on "autoconf" => :build
    depends_on "automake" => :build
    depends_on "gettext" => :build
    depends_on "libtool" => :build
  end

  depends_on "pkg-config" => :build
  depends_on "[email protected]"

  uses_from_macos "bzip2"
  uses_from_macos "zlib"

  def install
    if build.head?
      system "./synclibs.sh"
      system "./autogen.sh"
    end

    args = %W[
      --disable-dependency-tracking
      --disable-silent-rules
      --prefix=#{prefix}
      --with-libfuse=no
    ]

    system "./configure", *args
    system "make", "install"
  end

  test do
    assert_match version.to_s, shell_output("#{bin}/ewfinfo -V")
  end
end

…give up? On Line 40, in the “install” section of the formula, there are several “args” listed, including --with-libfuse=no. This our clue and our culprit!

These “args” are arguments (options) for source code compilation. So by default, the Homebrew formula defines “installation” of libewf/ewfmount to not include a software library called libfuse – which, as the name implies, is a critical component that libewf/ewfmount requires for communicating with FUSE for MacOS. Without it, ewfmount can not mount EWF disk images on MacOS.

Now, you can edit *any* Homebrew formula and how it works, just for you, by running $ brew edit <package>. This will open up a local copy of the formula in a text editor and let you change and edit options, without affecting how this formula behaves for all other Homebrew users.

But, unfortunately, in this case it is still not enough to just run $ brew edit libewf and change line 40 to --with-libfuse=yes. That’s because, towards the top of this formula, you’ll notice a section that starts with “bottle”.

“Bottles” are Homebrew’s clever name for binaries. These are pre-compiled versions of the source code for libewf, ready to go for MacOS (with flavors for High Sierra, Mojave, or Catalina, as indicated). If a bottle is specified in a Homebrew formula, that’s it – any instructions for how to compile the source code, later on in the formula, will be ignored, because as far as  Homebrew is concerned, you already have a working binary, and it will proceed from there. So editing line 40 will make no difference, because either way Homebrew will by default use a bottle/binary that was (we can assume/infer) already created with --with-libfuse=no.

The solution, in this case, is to both edit line 40 to --with-libfuse=yes AND run $ brew install --build-from-source libewf instead of just $ brew install libewf. This extra flag/option tells Homebrew to ignore the “bottle” section, skip over those pre-compiled binaries, and start from scratch with the source code specified at url. THEN, proceeding down the formula to the “install” section, Homebrew will compile the source according to the options you set, creating a new version/binary with libfuse enabled.

(There is a longer, more rambling and circular version of this solution that I wrote in that Twitter thread – but, it is a little out-of-date, as the Homebrew formula has been edited to adjust the default source code URL since that writing. The basic point – that getting a version of ewfmount that works with FUSE for MacOS using Homebrew requires the two-step process of changing a flag for use during compilation AND telling Homebrew to start from source – still stands)

Wrapping Up

I hope that these examples and explanations help digipres users understand that, while an AMAZING tool and community, Homebrew is not magic! It plays by certain rules and relies on assumptions of what will work best for the most number of users. The vast majority of the time – those assumptions will probably work fine for you!

But if they don’t, it’s just like any other piece of technology, analog or digital – you’ll need to know a bit more about what it’s doing in order to effectively troubleshoot, fix, or change it. So I’ll leave off with a note that Homebrew’s documentation IS ALSO AMAZING! The (open-source and volunteer!!!) contributors have pulled together tons and tons of information about how Homebrew works, guidelines for different ways of using it, templates and automated tools for creating and testing formulae, etc.  I really recommend exploring those pages to learn more about the various tricks up Homebrew’s sleeve (there’s a lot of built-in neatness you might not even know about!!) and as a diving-off point to learn more about code compilation, operating systems, interpreters, and just how software works, in general.

Am I just saying that because this is the longest blog post I’ve ever written, and there were about twenty different tangents and other topics that I didn’t even explore, and that I probably don’t have the time to write about? Maybe! But it’s time to put a cork in it.


  1. ​*​
    Please return to this line in a minute so you can appreciate just how clever I am.
  2. ​†​
    See, told you! Why aren’t you laughing?
  3. ​‡​
    macOS does some trickery that usually hides its executable machine code and files from your view, which again is a topic for another post; but broadly speaking the flip side is true as well when you try to open a macOS .app or .pkg on Windows

 

A Brief Introduction to Signal-to-Noise Ratio for Analog Video Preservation

This is a guest post written by Jeff Lauber, based on his paper originally written for a course on “Video Preservation” in NYU’s M.A. program in Moving Image Archiving and Preservation.

Jeff is a media archivist based in New York City. He currently works as an archivist for Jenny Holzer Studio in Brooklyn and carries out freelance archival projects for a number of NYC institutions.

In its most basic sense, signal-to-noise ratio (SNR) explains itself: the proportion of noise present in transmission of a given signal compared to the pure signal itself. In more technical terms: “the dimensionless ratio of the signal power to the noise power contained in a recording […] the signal-to-noise ratio parameterizes the performance of optimal signal processing systems when the noise is Gaussian [i.e. normally distributed and/or predictable].”​1​ Applicable in numerous fields in which imaging and/or transmission of electronic signals are used, SNR allows for a more precise understanding of how accurately a signal is being received. In the preservation of videotape, SNR can act as an important indication of whether or not audiovisual signals from the tape are being captured properly and with minimal unwanted background noise in the analog-to-digital conversion process.

Noise in magnetic/electronic audiovisual signal transmission can be the result of a number of factors. In its most common instance, noise is perceived in the video signal as “snow” and is the result of random electrical disturbances in the signal transmission process. Most of this random noise is white (i.e. distributed evenly across the frequency spectrum) and is introduced to the signal via components that contain low signal levels such as camera imagers, videotape recorders, cable circuits, and broadcast receivers.​2​ In fact, noise can be introduced into the signal at any stage in the recording and transmission process; as Jim Lindner importantly notes, “the impact [of introducing noise into the signal] is cumulative over the entire signal chain and is not necessarily just an issue of the magnetic media itself or of a problem with magnetic fields in storage.”​3​

An illustration of chrominance noise in a Betacam transfer (red specks in a woman's dark hair)
Noise in the chroma (color) signal from a Betacam transfer; image courtesy the AV Artifact Atlas

In general, the SNR of analog recording media and electronics has improved in tandem with advancements in technology. Though natural to both analog and digital recordings, noise has often been considered impure and undesirable in audiovisual signals, and efforts to improve recording technology over the years were intent on achieving the highest possible SNR, i.e. the lowest level of noise proportionate to the audiovisual signals.​3​ Video SNR is most often expressed in decibels and, in more precise terms, is the “power ratio […] of the peak-to-peak signal voltage or current, from black level to reference white but not including the sync pulses, to the rms [sic] value of the noise.”​2​ The noise power in a video signal can be expressed as either a weighted or unweighted value. Unweighted noise is an expression of the noise power in a given signal and can be determined mathematically/logarithmically or using an instrument with a uniform frequency response which quantitatively measures the output spectrum of a signal transmission. Weighted noise takes into account the noise that the average human can visibly perceive in a video signal: by considering factors such as natural eye response, screen brightness, and scan-line width, among others, the weighted factor simulates the aperture response of the human eye to adjust the noise power measurement to what can be visually perceived.​2​ In general, weighted noise measurements result in a SNR that is about 8dB higher than unweighted, the greater proportion a result of adjusting the factor for noise that is present in the signal but cannot be visually perceived.​4​ A final important consideration when measuring the SNR of a given video signal is to factor in time : in terms of human perception, still frames of a film or video image appear noisier than moving images, eye-brain coordination generally integrating around six frames of a video image to improve the SNR. It is noted that incorporating a 0.2 second factor into the mathematical equation to calculate SNR is good practice in that respect.​5​ In broadcast television, weighted SNR of between 43 and 53dB is considered reasonable, though newer analog and digital technologies have become capable of ratios far greater.​2​

SNR is an important factor to consider in the analog-to-digital conversion of videotape signals and in video- and audiotape preservation generally. In the simplest sense, awareness of noise and its proportion to the pure, desired audiovisual signal is essential. Lindner notes that many people—notably younger generations whose lives have been predominantly digital—have come to expect pure, seemingly noiseless image and sound quality from audiovisual content. However, both visual and audible noise are natural qualities of analog recordings.​3​ Thus, attempting to reduce noise in analog-to-digital transfer of video- and audiotapes has the potential to inaccurately capture intrinsic noise qualities if done too severely.

A comparison of shots from the Buffy HD remaster and the original DVDs, showing Xander's too-smooth skin
The HD remaster of Buffy the Vampire Slayer is infamous for its over-emphatic Digital Noise Reduction of material originally shot on film.

In a technical sense, digitization of analog videotape must take into consideration the SNR of a given audiovisual signal since, as with signal amplification and transmission, the digitization process involves sending a signal through multiple components or pieces of equipment, each of which has the potential to introduce noise. This consideration is especially pertinent given that noise is additive in analog systems, i.e. noise accumulates at every stage of the recording, transmission, and/or digitization process.​2​ Converting the waveform of analog audiovisual content into quantized, digital values in general introduces error, and the quality of the recording or presence of noise is also determined by the number of bits per sample. The signal to quantization noise ratio (SQNR), in that respect, measures the difference between the signal and the noise introduced in quantization; it has been noted that every bit adds around 6dB of resolution to the digitized signal so that a 16-bit sample, for instance, would allow for a maximum SQNR of 96dB.​6​ An awareness of target SNRs and methods for maximizing the ratio of digitized content can help ensure cleaner, more accurate digital transfer.

However, when dealing with analog videotape and transmitting its signal, noise is an inevitability; even when sample size is significant and even when other measures are taken to reduce noise in the signal path during digitization (e.g. minimizing the length of the path and the number of components used – literally, shorter cables and less equipment, if possible), there is an “irreducible amount of noise superimposed on the signal each time the camera image is read out, amplified, and digitized.”​7​

Heavy metal band thrashing
Embrace the noise

Works Cited

  1. 1.
    Johnson D. Signal-to-Noise Ratio. Scholarpedia. http://www.scholarpedia.org/article/Signal-to-noise_ratio. Accessed October 14, 2018.
  2. 2.
    Inglis A. Video Engineering. New York, NY: McGraw-Hill; 1993.
  3. 3.
    Lindner J. Magnetic Fields, Erasing Magnetic Media by Mistake, and Analog Video Noise. AMIA Listserv. https://lsv.uky.edu/archives/amia-l.html. Published February 11, 2015. Accessed October 12, 2018.
  4. 4.
    Constant M. Signal to Noise Ratio. CCTV Information. https://www.cctv-information.co.uk/i/Signal_to_Noise_Ratio. Published 2009. Accessed October 14, 2018.
  5. 5.
    Rose A. Comparative Noise Properties of Vision, Television, and Photographic Film. In: Vision: Human and Electronic. New York, NY: Plenum Press; 1973:95-110.
  6. 6.
    Chan CF. Chapter 2: Digitization of Sound. City University of Hong Kong. http://www.ee.cityu.edu/hk/~cfchan/EE5809/chap2.pdf. Accessed October 14, 2018.
  7. 7.
    Russ JC. Correcting Imaging Defects. In: The Image Processing Handbook. 5th ed. Boca Raton, FL: CRC Press; 2007:163-242.