Using BagIt

do you know how to use Bagit?
cause I’m lost
the libcongress github is sooo not user friendly

I’m missing bagger aren’t I
ugh ugh I just want to make a bag!

That’s an email I got a few weeks back from a good friend and former MIAP classmate. I wanted to share it because I feel like it sums up the attitude of a lot of archivists towards a tool that SHOULD be something of a godsend to the field and a foundational step in many digital preservation workflows – namely, the Library of Congress’ BagIt.

What is BagIt? It’s a software library, developed by the LoC in conjunction with their partners in the National Digital Information Infrastructure and Preservation Program (NDIIPP) to support the creation of “bags.” OK, so what’s a bag?


Let’s back up a minute. One of the big challenges in digital archiving is file fixity – a fancy term for checking that the contents of a file have not been changed or altered (that the file has remained “fixed”). There are all sorts of reasons to regularly verify file fixity, even if a file has done nothing but sit on a computer or server or external hard drive: to make sure that a file hasn’t become corrupted over time, that its metadata (file name, technical specs, and so on) hasn’t been accidentally changed by software or an operating system, etc.
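To make fixity concrete, here is a minimal Python sketch of the basic idea, using only the standard library (it is not part of any BagIt tool, and the function names are hypothetical): record a file's checksum once, then recompute it later and compare. MD5 is used only because it's one of the algorithms BagIt tools commonly offer, and reading in chunks keeps memory use flat even for large video files.

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in 1 MiB chunks so that
    large video files never have to fit in memory all at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_digest, algorithm="md5"):
    """True if the file's checksum today matches the one recorded earlier."""
    return file_checksum(path, algorithm) == recorded_digest.lower()
```

You would store the result of `file_checksum("interview_01.mov")` (a hypothetical file) somewhere safe and call `fixity_ok` on a schedule. Note that a checksum like this covers only the file's bytes: renaming the file leaves the digest unchanged.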

But one of the biggest threats to file fixity is when you move a file – from a computer to a hard drive, or onto a server, or even just from one folder/directory in your computer to another. Think of it kind of like putting something in the mail: there are a lot of points in the mailing process where a computer or USPS employee has to read the labeling and sort your mail into the proper bin or truck or plane so that it ends up getting to the correct destination. And there’s a LOT of opportunity for external forces to batter and jostle and otherwise get in your mail’s personal space. If you just slap a stamp on that beautiful glass vase you bought for your mother’s birthday and shove it in the mailbox, it’s not going to get to your mom in one piece.

And what if you’re delivering something even more precious than a vase?
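The "check at both ends" idea behind a bag can be sketched for a single file moving from one place to another: hash it, copy it, hash the copy, and refuse to report success unless the two match. This is a hypothetical helper for illustration, not how any particular BagIt tool is written, and it reads each file whole, so it only suits small files.

```python
import hashlib
import shutil


class FixityError(Exception):
    """Raised when a copy's checksum does not match the original's."""


def md5_hex(path):
    # Reads the whole file at once: fine for a demo, not for huge video files.
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def copy_with_verification(src, dst):
    """Copy src to dst, then confirm the bits survived the trip."""
    before = md5_hex(src)
    shutil.copy2(src, dst)  # copies contents plus timestamps/permissions
    after = md5_hex(dst)
    if before != after:
        raise FixityError("checksum mismatch: %s != %s" % (before, after))
    return after
```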

So a “bag” is a kind of special digital container – a way of packaging files together to make sure what we get on the receiving end of a transfer is the same thing that started the journey (like putting that nice glass vase in a heavily padded box with “fragile” stamped all over it).

Great, right? Generating a bag can take more time than people want, particularly if you’re an archive dealing with large, preservation-quality uncompressed video files, but it’s a no-brainer to build into workflows for backing up/storing data. The thing is, as I said, the BagIt tools developed by the Library of Congress to support the creation of bags are software libraries – not necessarily in and of themselves fully-developed, ready-to-go applications. Developers have to put some kind of interface on top of the BagIt library for people to actually interact with it and use it to create bags.

So right off the bat, even though tech-savvier archivists may constantly be recommending that others in the field “use BagIt” to deliver or move their files, we’re already muddling the issue for new users, because there literally is no one, monolithic thing called “BagIt” that someone can just google and download and start running. And I think we seriously underestimate how much of a hindrance that is to widespread implementation. Basically anyone can understand the principles and need behind BagIt (I hopefully did a swift job of it in the paragraphs above) – but actually sifting through and installing the various BagIt distributions currently takes time, and an ability to read between the lines of some seriously scattered documentation.

So here I’m going to walk through the major BagIt implementations and explain a bit about how and why you might use each one. I hope consolidating all this information in one place will be more helpful than the Library of Congress’ github pages (which indeed make little effort to make their instructions accessible to anyone unfamiliar with developer-speak). If you want to learn more about the BagIt specification itself (i.e. what pieces/files actually make up a bag, how data gets hierarchically structured inside a bag, how BagIt uses checksums and tag manifests to do the file fixity work I mentioned earlier), I can recommend this introductory slideshow to BagIt from Justin Littman at the LoC.

Update (12/15/2017): While all the above info still stands, the roundup and installation instructions below are no longer 100% accurate. I’m keeping this post up for the sake of web archiving and laying the ever-changing state of digital preservation bare and all that, but if you’re here I’d now recommend that you proceed over to this post on using BagIt in 2018 for more up-to-date documentation!

1. Bagger (BagIt-java)


The BagIt library was originally developed using a programming language called Java. For its first four stable versions, BagIt-java could be used either via command-line interface (a Terminal window on Macs or Linux/Ubuntu, Command Prompt in Windows), or via a Graphical User Interface (GUI) developed by the LoC itself called Bagger.

As of version 5 of BagIt-java, the LoC has completely ceased supporting BagIt-java via the command line. That doesn’t mean it isn’t still out there – if, for instance, you’re on a Mac and use the popular package manager Homebrew, typing

$ brew install bagit

will install the last stable version (4.12.1) of BagIt-java. But damned if I know how to actually use it, because in deprecating support the LoC seems to have removed (or maybe just never written?) any online documentation (github or elsewhere) of how to use BagIt-java via the command line. No manual page for you.

Instead you now have to use Bagger to employ BagIt-java (from the LoC’s perspective, anyway). Bagger is primarily designed to run on Windows or, with some tinkering, Linux/Ubuntu and Mac OSX.

They do, funnily enough, include a screed about the iPhone 7’s lack of a 3.5mm headphone jack.

So once you actually download Bagger, which I primarily recommend if you’re on Windows, there’s some pretty good existing documentation for using the application’s various features, and even without doing that reading, it’s a pretty intuitive program. Big honking buttons help you either start making a new bag (picking the A/V and other files you want to be included in the bag one-by-one and safely packaging them together into one directory) or create a “bag in place”, which just takes an already-existing folder/directory and structures the content within that folder according to the BagIt specification. You can also validate bags that have been given/sent to you (that is, check the fixity of the data). The “Is Bag Complete” feature checks whether a folder/directory you have is, in fact, officially a bag according to the rules of the BagIt spec.
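Bagger's two checks differ in a way that's easy to miss. A bag is "complete" when bagit.txt and the manifest are present, every file the manifest names exists, and nothing unlisted is sitting in data/. It is "valid" when it is complete and every checksum still matches. The sketch below (hypothetical function names, standard library only, an MD5 manifest assumed) illustrates that distinction; it ignores parts of the spec such as fetch.txt and percent-encoded file names, so use Bagger or BagIt-python for real validation.

```python
import hashlib
import os


def read_manifest(bag_dir, algorithm="md5"):
    """Parse manifest-<algorithm>.txt into {bag-relative path: checksum}."""
    entries = {}
    path = os.path.join(bag_dir, "manifest-%s.txt" % algorithm)
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                checksum, relpath = line.split(None, 1)  # path may contain spaces
                entries[relpath] = checksum.lower()
    return entries


def payload_files(bag_dir):
    """Every file under data/, as bag-relative paths like 'data/reel1.mov'."""
    found = set()
    for root, _dirs, files in os.walk(os.path.join(bag_dir, "data")):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), bag_dir)
            found.add(rel.replace(os.sep, "/"))
    return found


def is_complete(bag_dir, algorithm="md5"):
    """Complete: bagit.txt and the manifest exist, and the manifest lists
    exactly the files under data/ (nothing missing, nothing extra)."""
    manifest = os.path.join(bag_dir, "manifest-%s.txt" % algorithm)
    if not (os.path.isfile(os.path.join(bag_dir, "bagit.txt"))
            and os.path.isfile(manifest)):
        return False
    return set(read_manifest(bag_dir, algorithm)) == payload_files(bag_dir)


def is_valid(bag_dir, algorithm="md5"):
    """Valid: complete, and every payload checksum still matches."""
    if not is_complete(bag_dir, algorithm):
        return False
    for relpath, expected in read_manifest(bag_dir, algorithm).items():
        digest = hashlib.new(algorithm)
        with open(os.path.join(bag_dir, relpath), "rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected:
            return False
    return True
```

A bag whose file was silently altered is still complete but no longer valid; a bag with an extra, unlisted file in data/ is not even complete.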

(FWIW: I managed to get Bagger running on my OSX 10.10.5 desktop by installing an older version, 2.1.3, off of the Library of Congress’ Sourceforge. That download included a bagger.jar file that my Mac could easily open after installing the legacy Java 6 runtime environment (available here). But, that same Sourceforge page yells at you that the project has moved to Github, where you can only download the latest 2.7.1 release, which only includes the Windows-compatible bagger.bat file and material for compiling on Linux, no OSX-compatible .jar file. I have no idea what’s going on here, and we’ve definitely fallen into a tech-jargon zone that will scare off laypeople, so I’m going to leave it at “use Bagger with Windows”)

Update: After some initial confusion (see above paragraph), the documentation for running Bagger on OSX has improved twofold! First, the latest release of Bagger (2.7.2) came with some tweaks to the github repo’s documentation, including instructions for Linux/Ubuntu/OSX! Thanks, guys! Also check out the comments section on this page for some instructions for launching Bagger in OSX from the command-line and/or creating an AppleScript that will let you launch Bagger from Spotlight/the Applications menu like you would any other program.

2. Exactly

Ed Begley knows exactly what I’m talking about

Developed by the consulting/developer agency AVPreserve with University of Kentucky Libraries, Exactly is another GUI built on top of BagIt-java. Unlike Bagger, it’s very easy to download, install and immediately run versions of Exactly for Mac or Windows, and AVPreserve provides a handy quickstart guide and a more detailed user manual, both very useful if you’re just getting started with bagging. Its features are at once more limited and more expansive than Bagger’s. The interface isn’t terribly verbose, meaning it’s not always clear what the application is actually doing from moment to moment. But Exactly is more robustly designed for insertion of extra metadata into the bag (users can create their own fields and values to be inserted in a bag’s bag-info.txt file, so you could include administrative metadata unique to your own institution).
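For a sense of what that extra metadata looks like on disk: bag-info.txt is a plain text file of "Label: value" lines. Here is a hypothetical sketch (not Exactly's code) that writes custom fields plus the spec's Payload-Oxum field, which is the total number of payload bytes, a dot, then the number of payload files. Source-Organization is one of the spec's reserved labels; Internal-Accession-Number is made up for illustration. Fields are passed as a list of pairs because the spec allows a label to repeat. If a bag already has a tag manifest, it would need regenerating after bag-info.txt changes.

```python
import os


def payload_oxum(bag_dir):
    """Payload-Oxum is '<total bytes>.<file count>' for everything under data/."""
    total_bytes = 0
    count = 0
    for root, _dirs, files in os.walk(os.path.join(bag_dir, "data")):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
            count += 1
    return "%d.%d" % (total_bytes, count)


def write_bag_info(bag_dir, fields):
    """Write bag-info.txt as 'Label: value' lines, one per (label, value) pair,
    followed by a computed Payload-Oxum line."""
    lines = ["%s: %s" % (label, value) for label, value in fields]
    lines.append("Payload-Oxum: %s" % payload_oxum(bag_dir))
    with open(os.path.join(bag_dir, "bag-info.txt"), "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```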

And Exactly’s biggest attraction is that it actually serves as a file delivery client – that is, it won’t just package the files you want into a bag before transferring, but will actually perform the transfer. So if you want to regularly move files to and from a dedicated server for storage with minimal fuss, Exactly might be the tool you want, though it could still use some aesthetic/verbosity design upgrades.

3. Command Line (BagIt-python)


Let’s say you really prefer to work with command-line software. If you’re comfortable with CLI, there are many advantages – greater control over your bagging software and exactly how it operates. I mentioned earlier that the LoC stopped supporting BagIt-java for command-line, but that doesn’t mean you command-line junkies are completely out of luck. Instead they shifted support and development of command-line bagging to a different programming language: Python.

If you’re working in the command-line, chances are you’re on Mac OSX or maybe Linux. I’m going to assume from here on you’re on OSX, because anyone using Linux has likely figured all this out themselves (or would prefer to). And here’s the thing, if you’re a novice digital archivist working in the Terminal on OSX: you can’t install BagIt-python using Homebrew.

Instead, you’re going to need Python’s own package manager, a program called “pip.” In order to get BagIt-python, you’re going to need to do the following:

  1. Check what version of Python is on your computer. Mac OSX and Linux machines should come with Python already installed, but BagIt-python requires Python 2.6 or above. You can check what version of Python you’re running in the Terminal with:

    $ python --version

    If your version isn’t high enough, visit https://www.python.org/downloads/ and download/install Python 2.7.12 for your operating system. (Do not download a version of Python 3.x – BagIt-python will not work with Python 3, as if this wasn’t confusing enough.)

  2. Now you’ll need the package manager/installer, pip. It may have come ready to go with your Python installation, or not. You can check whether you have pip installed with:

    $ pip --version

    If you get a version number, you’re ready to go to step 3. If you get a message that pip isn’t installed, you’ll have to visit https://pip.pypa.io/en/stable/installing/ and click on the hyperlinked “get-pip.py”. A tab should open with a buncha text – just hit Command-S and you should have the option to save/download this page as a Python script (that is, as a .py file). Then, back in the Terminal, navigate into whatever directory you just downloaded that script into (Downloads perhaps, or Desktop, or wherever else), and run the script by invoking:

    $ python get-pip.py

    Pip should now be installed.

  3. Once pip is in place you can just use it to install BagIt-python the same way you use Homebrew (prefix the command with sudo if necessary):

    $ pip install bagit
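If you'd rather confirm those three steps from inside Python, a small hypothetical check could look like the following. It reports on the interpreter that runs it, which is not necessarily the same `python` your Terminal finds first, and it only tests whether the `pip` and `bagit` modules can be imported.

```python
import sys


def python_version_ok(minimum=(2, 6)):
    """Step 1: is this interpreter at least the given (major, minor) version?"""
    return tuple(sys.version_info[:2]) >= tuple(minimum)


def module_installed(name):
    """Steps 2-3: can this interpreter import the named module?"""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


def report():
    """Print a one-screen summary of the checks above."""
    print("Python %d.%d.%d" % tuple(sys.version_info[:3]))
    for name in ("pip", "bagit"):
        print("%-6s %s" % (name, "installed" if module_installed(name) else "missing"))
```

Calling `report()` prints the interpreter version and whether each module was found.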

You should be all set now to start using BagIt-python via the command line. You can invoke BagIt-python using commands that start with “bagit.py” – it’s a good idea to go over command-line usage and options for adding metadata by visiting the help page first, which is one of the LoC’s better efforts at documentation: https://github.com/LibraryOfCongress/bagit-python or:

$ bagit.py --help

But the easiest usage is just to create a bag “in place” on a directory you already have, with a flag for whichever kind of checksum you want to use for fixity:

$ bagit.py --md5 /path/to/directory

As with Bagger, the directory will remain exactly where it is in your file system, but now will contain the various tag/checksum manifests and all the media files within a “data” folder according to the BagIt spec. Again, the power of command-line BagIt-python lies in its flexibility – the ability to add metadata, choose different checksum configurations, increase or decrease the verbosity of the software. If you’re comfortable with command-line tools, this is the implementation I would most recommend!
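Roughly what happens during a "bag in place" can be sketched in a few dozen lines. This is a simplified, hypothetical illustration using only Python's standard library, not BagIt-python's actual code: it moves everything into data/, writes manifest-md5.txt, writes bagit.txt, and then writes tagmanifest-md5.txt covering those two tag files. The BagIt-Version value is an assumption (0.97 was the spec draft current in 2016), there's no bag-info.txt and no error recovery, so try it only on a copy of your files.

```python
import hashlib
import os
import shutil


def md5_of(path):
    """Hex MD5 of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_text(path, text):
    # Write tag files as UTF-8 with "\n" line endings.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)


def bag_in_place(directory):
    """Restructure `directory` into a minimal MD5 bag, in place."""
    data_dir = os.path.join(directory, "data")
    if os.path.exists(data_dir):
        raise ValueError("directory already has a 'data' entry; refusing to guess")
    entries = os.listdir(directory)  # snapshot before data/ exists
    os.mkdir(data_dir)
    for name in entries:
        shutil.move(os.path.join(directory, name), data_dir)

    # Payload manifest: "<md5>  data/<relative path>" for every file.
    lines = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, directory).replace(os.sep, "/")
            lines.append("%s  %s" % (md5_of(full), rel))
    write_text(os.path.join(directory, "manifest-md5.txt"), "\n".join(sorted(lines)) + "\n")

    # Bag declaration (the version number is an assumption, see above).
    write_text(os.path.join(directory, "bagit.txt"),
               "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")

    # Tag manifest: checksums of the tag files themselves.
    tags = ["%s  %s" % (md5_of(os.path.join(directory, name)), name)
            for name in ("bagit.txt", "manifest-md5.txt")]
    write_text(os.path.join(directory, "tagmanifest-md5.txt"), "\n".join(tags) + "\n")
    return directory
```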

Update: Please also read this terrific tutorial by Kathryn Gronsbell for the 2016 NDSR symposium for a more detailed rundown of BagIt-python use and installation, including sample exercises!!!

4. BaggerJS


It’s still in an early/experimental phase, but the LoC is also working on a web-based application for bagging called BaggerJS (built on a version of the BagIt library written in JavaScript, which yes, for those wondering at home, is a totally different programming language than Java, one best known for powering web applications, because we needed more versioning talk in this post).

Right now you can select and hash files (generate checksums for file fixity), and upload valid bags to a cloud server compatible with Amazon’s S3 protocol. Metadata entry and other features are still incomplete, but if development continues, this might, like Exactly, be a nice, simplified way to perform light bag transfers, particularly if you use a cloud storage service to back up files. It also has the advantage of not requiring any installation whatsoever, so novice users can more or less step around the Java vs. Python, GUI vs. CLI questions.

https://libraryofcongress.github.io/bagger-js/

5. Integrated into other software platforms/programs

The BagIt library has also been integrated into larger, more complex software packages, designed for broader digital repository management. Creating bags is only one piece of what these platforms are designed to do. One good example is Archivematica, which can perform all kinds of file conformance checks, transcoding, automatic metadata generation and more. But it does package data according to the BagIt spec whenever it actually transfers files from one location to another.

And that’s the other, more complicated way to use the BagIt library – by building it into your own software and scripts! Obviously this is a more advanced step for archivists who are interested in coding and software development. But the various BagIt versions (Java, Python, JavaScript) and the spec itself are all in the public domain and anyone could incorporate them into their own applications, or recreate the BagIt library in the programming language of their choice (there is, for instance, a BagIt-ruby version floating around out there, though it’s apparently deprecated and I’ve never heard of anyone who used it).

Dual-Boot a Windows Machine

It is an inconvenient truth that the MIAP program is spread across two separate buildings along Broadway. They’re only about five minutes apart, and the vast majority of the time this presents no problems for students or staff, but it does mean that my office and one of our primary lab spaces are in geographically separate locations. Good disaster planning, troublesome for day-to-day operations.

The Digital Forensics Lab (alternately referred to as the Old Media Lab or the Dead Media Lab, largely depending on my current level of frustration or endearment towards the equipment contained within it) is where we house our computing equipment for the excavation and exploration of born-digital archival content: A/V files created and contained on hard drive, CD, floppy disk, zip disk, etc. We have both contemporary and legacy systems to cover decades of potential media, primarily Apple hardware (stretching back to a Macintosh SE running OS 7), but also a couple of powerful modern Windows machines set up with virtual machines and emulators to handle Microsoft operating systems back to Windows 3.1 and MS-DOS.

Having to schedule planned visits over from my office to the main Tisch building in order to test, update, or otherwise work with any of this equipment is mildly irksome. That’s why my office Mac is chock full of emulators and other forensic software that I hardly use on any kind of regular basis – when I get a request from a class for a new tool to be installed in the Digital Forensics Lab, it’s much easier to familiarize myself with the setup process right where I am before working with legacy equipment; and I’m just point-blank unlikely to trek over to the other building for no other reason than to test out new software that I’ve just read about or otherwise think might be useful for our courses.

#ProtestantWorkEthic

This is a long-winded way of justifying why the department purchased, at my request, a new Windows machine that I will be able to use as a testing ground for Windows-based software and workflows (I had previously installed a Windows 7 virtual machine on my Mac to try to get around some of this, but the slowed processing power of a VM on a desktop not explicitly set up for such a purpose was vaguely intolerable). The first thing I was quite excited to do with this new hardware was to set up a dual-boot operating system: that is, make it so that on starting up the computer I would have the choice of using either Windows 7 or Windows 10, which is the main thing I’m going to talk about today.

Swag

Pretty much all of our Windows computers in the archive and MIAP program still run Windows 7 Pro, for a variety of reasons – Windows 8 was geared so heavily towards improved communication with and features for mobile devices that it was hardly worth the cost of upgrading an entire department, and Windows 10 is still not even a year old, which gives me pause in terms of the stability and compatibility of software that we rely on from Windows 7. So I needed Windows 7 in order to test how new programs work with our current systems. However, as Windows 10 increases in market share and developers begin to migrate over, I’m increasingly intrigued by it, to the point that I also wanted access to it in order to test out the direction our department might go in the future. In particular I very much wanted to try out the new Windows Subsystem for Linux, available in the Windows 10 Anniversary Update coming this summer – a feature that will in theory make Linux utilities and local files accessible to the Windows user via a Bash shell (the command-line interface already seen on Mac and Ubuntu setups). Depending on how extensive the compatibility gets, that could smooth over some of the kinks we have getting all our students (on different operating systems) on the same page in our Digital Literacy and Digital Preservation courses. But that is a more complicated topic for another day.

When my new Windows machine arrived, it came with a warning right on the box that even though the computer came pre-installed with Windows 7 and licenses/installation discs for both 7 and Windows 10,

You may only use one version of the Windows software at a time. Switching versions will require you to uninstall one version and install the other version.


This statement is only really true if you don’t know about partitioning, a process by which you can essentially separate your hard drive into distinct, discrete sections. The computer can basically treat separate partitions as separate drives, allowing you to format the different partitions with entirely separate file systems, or, as we will see here, install completely different operating systems.

Now, as it happens, it also turned out to be semi-true for my specific setup, but only temporarily and because of some kinks specific to the manufacturer who provided this desktop (hi, HP!). I’ll explain more in a minute, but right now would be a good point to note that I was working with a totally clean machine, and therefore endangering no personal files in this whole partitioning/installation process. If you also want to set up some kind of dual-boot partition, please please please make sure all of your files are backed up elsewhere first. You never know when you will, in fact, have to perform a clean install and completely wipe your hard drive just to get back to square one.

“Arnim Zola sez: back up your files, kids!”

So, as the label said, booting up the computer right out of the box, I got a clean Windows 7 setup. The first step was to make a new blank partition on the hard drive, onto which I could install the Windows 10 operating system files. In order to do this, we run the Windows Disk Management utility (you can find it by just hitting the Windows Start button and typing “disk management” into the search bar).


Once the Disk Management window popped up, I could see the 1TB hard drive installed inside the computer (labelled “Disk 0”), as well as all the partitions (also called “volumes”) already on that drive. Some small partitions containing system and recovery files (from which the computer could boot into at least some very basic functionality even if the Windows operating system were to become corrupted or fail) were present, but most of the drive (~900 GB) was dedicated to the main C: volume, which contains all the Windows 7 operating files, program files, personal files if there were any, etc. By right-clicking on this main partition and selecting “Shrink Volume,” I can set aside some of that space for a new partition, onto which we will install the Windows 10 OS. (Note: all illustrative photos were gathered after the fact, so some numbers aren’t going to line up exactly here, but the process is the same.)


If you wanted to dual-boot two operating systems that use completely incompatible file systems – for instance, Mac and Windows – you would have to set aside space for not only the operating system’s files, but also all of the space you would want to dedicate to software, file storage, etc. However, Windows 7 and 10 both use the NTFS file system – meaning Windows 10 can easily read and work with files that have been created on or are stored in a Windows 7 environment. So in setting up this new partition I only technically had to create space for the Windows 10 operating system files, which run about 25 GB total. In practice I wanted to leave some extra space, just in case some software comes along that can only be installed on the Windows 10 partition, so I went ahead and doubled that number to 50 GB (since Disk Management works in MB, we enter “50000” into the amount of space to shrink from the C: volume).
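One wrinkle in that arithmetic: Windows Disk Management counts a "MB" as 1,024 × 1,024 bytes rather than one million, and Windows displays sizes in the same binary units, so entering 50000 yields a little less than the "50 GB" Windows will report. The worked conversion:

```latex
\begin{aligned}
50\,000\ \text{MB} \times 1\,048\,576\ \tfrac{\text{bytes}}{\text{MB}} &= 52\,428\,800\,000\ \text{bytes} \\
\frac{52\,428\,800\,000\ \text{bytes}}{1\,073\,741\,824\ \text{bytes per GB}} &\approx 48.83\ \text{GB} \\
50\ \text{GB} \times 1024\ \tfrac{\text{MB}}{\text{GB}} &= 51\,200\ \text{MB}
\end{aligned}
```

In this setup the shortfall (about 1.2 GB) doesn't matter, but typing 51200 would give a partition that Windows reports as exactly 50 GB.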


Disk Management runs for a minute and then a new Blank Partition appears on Disk 0. Perfect! I pop in the Windows 10 installation disc that came with the computer and restart. In my case, the hardware automatically knew to boot up from the installation disc (rather than the Windows 7 OS on the hard drive), but it’s possible others would have to reset the boot order to go from the CD/DVD drive first, rather than the installed hard drive (this involves the computer’s BIOS or UEFI firmware interface – more on that in a minute – but for now if it gives you problems, there’s plenty of guides out there on the Googles).

Following the instructions for the first few parts of the Windows 10 installer is straightforward (entering a user name and password, name for the computer, suchlike), but I ran into a problem when finally given the option to select the partition on to which I wanted to install Windows 10. I could see the blank, unformatted 50 GB partition I had created, right there, but in trying to select it, I was given this warning message:

Windows cannot be installed to this disk. The selected disk is of the GPT partition style.

Humph. In fact I could not select ANY of the partitions on the disk, so even if I had wanted to do a clean install of Windows 10 on to the main partition where Windows 7 now lived, I couldn’t have done that either. What gives, internet?

So for many many many years (in computer terms, anyway – computer years are probably at least equivalent to dog years), PCs came installed with a firmware interface called the BIOS – Basic Input/Output System. In order to install or reinstall operating system software, you need a way to send very basic commands to the hard drive. The BIOS was able to do this because it lived on the PC’s motherboard, rather than on the hard drive – as long as your BIOS was intact, your computer would have at least some very basic functionality, even if your operating system corrupted or your hard drive had a mechanical failure. With the BIOS you could reformat your hard drive, select whether you booted the operating system from the hard drive or an external source (e.g. floppy drive or CD drive), etc.

Or rule a dystopian underwater society! …wait

In the few seconds after you first powered on a PC, the BIOS would look to the very first section of a hard drive, which (if already formatted) would contain something called a Master Boot Record, a table that contains information about the partitions present on that hard drive: how many partitions were present, how large each one was, what file system was present on each, which one(s) contained bootable operating system software, and which partition to boot from first (if multiple partitions had a bootable OS).

You probably saw something like this screen by accident once when your cat walked across your keyboard right as you started up the computer.
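The MBR's layout is simple enough to decode by hand. In the first 512-byte sector of the drive, the partition table starts at byte 446 and holds four 16-byte entries, and the sector ends with the signature bytes 0x55 0xAA. Each entry records a boot ("active") flag, a partition type code, the partition's starting sector, and its length in sectors (plus legacy cylinder/head/sector fields, skipped here). The hypothetical sketch below decodes those fields from bytes you supply, for example the first 512 bytes of a disk image file; reading a live system disk would need administrator rights and isn't attempted. Type 0xEE is worth knowing: GPT disks carry a "protective MBR" with an entry of that type so older tools don't mistake the disk for empty.

```python
import struct

SECTOR_SIZE = 512
# One 16-byte entry: status, 3 CHS bytes (skipped), type, 3 CHS bytes (skipped),
# first sector (LBA) and sector count as little-endian 32-bit integers.
ENTRY_FORMAT = "<B3xB3xII"


def parse_mbr(sector):
    """Decode the four primary partition entries of a 512-byte MBR."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    if bytes(sector[510:512]) != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    partitions = []
    for i in range(4):
        entry = sector[446 + 16 * i: 446 + 16 * (i + 1)]
        status, ptype, first_lba, num_sectors = struct.unpack(ENTRY_FORMAT, entry)
        if ptype == 0:
            continue  # unused slot
        partitions.append({
            "slot": i + 1,
            "bootable": status == 0x80,
            "type": ptype,
            "first_lba": first_lba,
            "size_bytes": num_sectors * SECTOR_SIZE,
            "protective_gpt": ptype == 0xEE,
        })
    return partitions
```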

Here’s the thing: because of the limitations of the time, the BIOS and MBR partition style can only handle a total of four primary partitions on any one drive, and can only boot from a partition if it is less than about 2.2 TB in size. For a long time, that was plenty of space and functionality to work with, but with rapid advancements in the storage size of hard drives and the processing power of motherboards, BIOS and MBR partitioning became increasingly severe and arbitrary roadblocks. So from the late ’90s through the mid-’00s, an international consortium developed a more advanced firmware interface, called UEFI (Unified Extensible Firmware Interface), that employed a new partition system, GPT (GUID Partition Table). With GPT, there’s theoretically no limit to the number of partitions on a drive (Windows, for its part, supports up to 128), and UEFI can boot from partitions as large as 9.4 ZB (yes, that’s zettabytes). For comparison’s sake, 1 ZB is about equivalent to 36,000 years of 1080p high-definition video. So we’re probably set for motherboard firmware and partition styles for a while.

We’re expected to hit about 40 zettabytes of known data in 2020. Like, total. In the world. Our UEFI motherboards are good for now.
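Those limits fall straight out of the size of the address fields, assuming the traditional 512-byte sector: an MBR entry stores sector addresses and counts as 32-bit numbers, GPT stores them as 64-bit numbers, and the MBR's 64-byte partition table only has room for four 16-byte entries.

```latex
\begin{aligned}
\text{MBR entries:}\quad & 64\ \text{bytes} \div 16\ \text{bytes per entry} = 4 \\
\text{MBR limit:}\quad & 2^{32}\ \text{sectors} \times 512\ \text{bytes} = 2\,199\,023\,255\,552\ \text{bytes} \approx 2.2\ \text{TB} \\
\text{GPT limit:}\quad & 2^{64}\ \text{sectors} \times 512\ \text{bytes} = 2^{73}\ \text{bytes} \approx 9.44 \times 10^{21}\ \text{bytes} \approx 9.4\ \text{ZB}
\end{aligned}
```

Drives that use 4,096-byte logical sectors stretch the MBR ceiling to 16 TiB, but the 32-bit field remains the bottleneck.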

UEFI cannot read MBR partitions as is, though it has a legacy mode that can be enabled to restrict its own functionality to that of the BIOS, and thereby read MBR. If the UEFI motherboard is set to only boot from the legacy BIOS, it cannot understand or work with GPT partitions. Follow?

So GETTING BACK TO WHAT WE WERE ACTUALLY DOING… the reason I could not install Windows 10 on to the new partition was that the UEFI motherboard in my computer had booted in legacy BIOS mode, for some reason.

Me.
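The rule that bit me boils down to a lookup. As far as I understand Windows Setup's behavior, a machine booted in UEFI mode can only install Windows to a GPT disk, and a machine booted in legacy BIOS mode can only install to an MBR disk, which is why a legacy boot produced the GPT warning above. A hypothetical sketch of that rule (a simplification of Setup's real checks):

```python
# Which disk partition style Windows Setup will install to,
# keyed by the firmware mode the installer was booted in.
REQUIRED_STYLE = {
    "uefi": "gpt",    # UEFI boot: Windows goes on a GPT disk
    "legacy": "mbr",  # legacy BIOS boot: Windows goes on an MBR disk
}


def windows_setup_accepts(firmware_mode, partition_style):
    """True if Setup, booted in `firmware_mode`, can install to a disk
    using `partition_style`."""
    return REQUIRED_STYLE[firmware_mode.lower()] == partition_style.lower()
```

My HP arrived booting in legacy mode with a GPT disk, so `windows_setup_accepts("legacy", "gpt")` is `False`; switching the firmware to UEFI makes the same disk acceptable.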

Honestly, I’m not sure why this is. Obviously this was not a clean hard drive when I received it – someone at HP had already installed Windows 7 on to this GPT-partitioned hard drive, which would’ve required the motherboard to be in UEFI boot mode. So why did it arrive with legacy BIOS boot mode not only enabled, but set first in the preferential boot order? My only possible answer is that after installing Windows 7, they went back in and set the firmware to legacy BIOS boot mode in order to improve compatibility with the Windows 7 OS – which was developed and released in the days when BIOS was still the default for new equipment.

This was a quick fix: restart the computer, follow the brief on-screen instructions to enter the BIOS (usually by pressing the ESC key, though it can vary with your setup), and navigate through the firmware settings to re-enable UEFI boot mode (I also left legacy BIOS boot enabled, though lower in the boot order, for the above-stated reasoning about compatibility with Windows 7 – so now, theoretically, my computer can start up from either MBR or GPT drives/disks with no problem).

Phew. Are you still with me after all this? As a reward, here’s a vine of LeBron James blocking Andre Iguodala to seal an NBA championship, because that is now you owning computer history and functionality.

https://vine.co/v/5BuzmV0Xw5b

From this point on, we can just pop the Windows 10 installation disc back in and follow the instructions like we did before. I can now select the unformatted 50 GB partition on which to install Windows 10 – and the installation wizard basically runs itself. After a lot of practical setup (username and password nonsense), now when I start up my computer, I get a boot screen asking which operating system to start.


And I can just choose whether to enter the Windows 7 or 10 OS. Simple as that. I’ll go more into some of what this setup allows me to do (particularly the Windows Subsystem for Linux) another day, as this post has gone on waaaayy too long. Happy summer, everyone!

PowerPC Mac Emulation

A couple weeks ago Mona Jimenez asked me to step into her course on Handling Complex Media, to help a student group with a tech request (business as usual). Going back to the lab, I had a hint of what was coming from the whiteboard:

Uh oh.

Yes, as it turned out, the students were working with a piece of multimedia artwork/software that required a PowerPC version of Mac OSX (10.0 through 10.5) in order to run. Normally, this wouldn’t present much of an issue, as MIAP’s “Old Media Lab” still has several old Power Mac G4 desktops and even a couple of MacBook laptops running various early versions of Mac OSX. However, the students would only have access to the digital materials on-site at the partner institution for this project, and could not bring the software back to NYU. They could (and might still, if it comes to it) just bring the laptops to the site and run the software in the native environment, but that’s not ideal for a couple of reasons: first, I’m always somewhat hesitant for department equipment to leave campus; and second, having old hardware running these old operating systems natively is something of a luxury, which our students may very well not have in the future as equipment continues to age, or if they work at an institution with shallower pockets for digital preservation.

In order to access software or digital files created for obsolete systems, the primary solutions these days are emulation and virtualization – two slightly different methods of, essentially, using software to trick a contemporary computer into mimicking the behavior and limitations of other hardware and/or operating systems. Emulation has gotten incredibly sophisticated recently – the Internet Archive has even made it possible to run thousands of vintage MS-DOS and Windows 3.1 programs from an emulator inside your web browser, no additional downloads required, which is really an incredible feat of programming. Emulators for early Mac systems (anywhere from 1.0 to 9.x) are relatively simple to set up in OSX 10.10 (Yosemite) or 10.11 (El Capitan), likewise virtual machine software like VirtualBox (all topics for another day).

But right now the early, PowerPC versions of OSX seem to be something of an emulation/virtualization dead zone. I’m not the person to ask why. My assumption is that the shift from PowerPC to Intel processors (which began in 2006 with Intel builds of OSX 10.4 and was complete with OSX 10.6, Snow Leopard, the first Intel-only release) changed the system architecture dramatically while the operating system stayed relatively the same, resulting in a particular hardware/software configuration that just confuses the heck out of current setups, even through an emulator. It’s clearly possible: sift through the forums of Emaculation or other emulation enthusiast sites and you’ll find five-year-old boasts of people getting OSX Puma to run in Windows XP, or whatever. But documentation is sketchy and scattered even by internet standards, and replication is therefore a crapshoot.

So, how do I help these students get a PowerPC version of OSX on one of their (Intel Mac) laptops? Anytime we need new Mac software in the department, I try it out first on my office computer, a mid-2011 iMac running OSX 10.10.5.

Note: I can’t even use some of this stuff, but it’s cluttering up my desktop anyway.

I rule out VirtualBox almost right off the bat: its makers explicitly state that the software does not support PowerPC architecture, which, again, doesn’t mean it’s impossible, but does mean that unless I magically have the same computer setup as some random YouTube user, I’m completely on my own. Instead I’m going to use PearPC, an old PowerPC architecture emulator (it hasn’t been updated since 2011), but one with some solid documentation to get started. I’ll be trying to install OSX Tiger (10.4) in PearPC, as we have a couple of original Tiger installation discs still lying around the department, and Apple install discs are otherwise hard to come by (if you don’t like visiting and supporting super dubious torrent sites, or buying overly expensive copies off Amazon).

PearPC recommends first installing Darwin as your guest OS (the OS running inside the emulation software) to properly partition and format your virtual hard disk (the fake hard drive the emulator uses to make the OS think it’s being installed directly onto a piece of hardware). But I immediately ignore that because WHAT THE HELL IS DARWIN?! So I skip to just downloading the PearPC 0.5 source archive for Unix (e.g. Mac) systems.

Uh oh. A source archive means the software needs to be compiled before it will actually run. Normally I would immediately turn away and go find someone who had already compiled a packaged OSX build FOR me, but the PearPC documentation includes some seemingly straightforward command-line instructions for this step. So I open a Terminal window, navigate into the PearPC-0.5 directory, and attempt a default configure and build with

$ ./configure && make

Lots of Terminal gobbledygook aaaaand PearPC seems to automatically detect my system configuration fine:

But then in the make….

Oops. I have no idea what this ‘MAP_32BIT’ identifier is, nor how to change it, nor if that’s really even the issue here. So ends my efforts to self-compile – pretty please, tell me someone has already done this for me?
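If you’d like to dig further than I did, a reasonable first step is finding every place the source mentions the offending identifier. As far as I understand it, MAP_32BIT is a Linux-specific flag for the mmap memory-mapping call that macOS doesn’t define, which would explain why the build dies there; treat that as an educated guess rather than a diagnosis. Here’s a minimal sketch built on plain grep (`find_identifier` is a hypothetical name, not part of PearPC):

```shell
#!/bin/sh
# Hypothetical helper (not part of PearPC): find_identifier DIR IDENTIFIER
# Prints file:line:text for every C/C++ source or header under DIR
# that contains IDENTIFIER as a fixed string.
find_identifier() {
  dir="$1"
  ident="$2"
  grep -rnF \
    --include='*.c' --include='*.cc' --include='*.cpp' --include='*.h' \
    -e "$ident" "$dir"
}
```

Run it against wherever you unpacked the source, e.g. `find_identifier PearPC-0.5 MAP_32BIT`; each hit is a place you’d have to look at before make has a chance of succeeding.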

“I’ll save you, Ethan!”

Huzzah! Google directs me to a very nice Dutch expert in the Emaculation forums (who is also apparently secretly a cat on his 7th life) who has already compiled an Intel Mac OSX build of PearPC. YOINK.

Per the Dutch Cat, I still need a configuration file and a blank hard disk image. So it turns out to be good that I downloaded that Unix source archive, even if the compiling didn’t work, because I can just steal the “ppccfg.example” configuration file from that directory and move it into my OSX build directory. It’s just a simple text file, so I can rename it whatever I want for clarity’s sake.

Now I need the blank hard disk image. Back in the PearPC documentation, we’ve got some handy details on the specs needed (a multiple of 3GiB size, in particular), and how about that, a sample dd command to make one. When I did this I just used 3GiB, but I’d recommend 6GiB, just to make sure you have room for the installation of OSX Tiger with something left over. The sample command’s seek=6241 gives you a 3GiB image; doubling it to seek=12482 gives you 6GiB:

$ dd if=/dev/zero of=~/Desktop/pearpc_osx_generic/PearPCTiger.img bs=516096 seek=12482 count=0
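In case the dd incantation looks like magic: with count=0, dd copies no data at all and just sets the output file’s length to seek × bs bytes, so the image size is plain multiplication. The block size of 516,096 bytes works out to 16 × 63 × 512, which I’d guess is a classic disk-cylinder geometry (16 heads, 63 sectors, 512-byte sectors), though that reading is my assumption. Here’s a minimal sketch wrapping the same command in two hypothetical shell functions:

```shell
#!/bin/sh
# Hypothetical helpers wrapping the PearPC sample dd command.
BLOCK=516096   # 16 x 63 x 512 bytes, the bs value from the sample command

# pearpc_image_bytes SEEK: size in bytes of an image made with seek=SEEK
pearpc_image_bytes() {
  echo $(( $1 * BLOCK ))
}

# make_blank_image PATH SEEK: create the image, then print its actual size.
# count=0 means dd writes nothing; it only sets the file length to SEEK x BLOCK.
make_blank_image() {
  dd if=/dev/zero of="$1" bs="$BLOCK" seek="$2" count=0 2>/dev/null || return 1
  wc -c < "$1" | tr -d ' '
}
```

So `pearpc_image_bytes 6241` prints 3220955136 (just under 3GiB), `pearpc_image_bytes 12482` prints 6441910272 (just under 6GiB), and `make_blank_image ~/Desktop/pearpc_osx_generic/PearPCTiger.img 12482` builds the 6GiB image and reports its size.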

My OSX build directory now looks something like this in a Finder window:

Dandy. Now I just need to set up the configuration file, so the PearPC application is pointed to the blank hard disk image and the OSX Tiger install disc (currently sitting unmounted in my iMac’s optical drive) when it tries to boot up. So I open the configuration file in a simple text editor (TextEdit, Xcode, even Word will do, as long as you save it as plain text) and find and edit the lines that correspond to the hard disk image and install disc paths. You can find the device path for your optical drive by running the command “$ diskutil list” in a Terminal window, then run “$ umount /path/to/disc/drive/” to make sure your host computer unmounts the disc. In most cases, if your desktop/laptop has just one hard drive with one partition and one optical drive, the path will be /dev/disk1.
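If you’d rather not hand-edit, the same change can be scripted. This is a minimal sketch, assuming the relevant settings are lines of the form `key = "value"` (possibly commented out with #); the key names in the usage lines below are my assumption, so check them against your own copy of ppccfg.example. `set_ppc_option` is a hypothetical helper, not something that ships with PearPC:

```shell
#!/bin/sh
# Hypothetical helper (not part of PearPC): set_ppc_option CONFIG KEY VALUE
# Rewrites any line of the form `KEY = ...`, commented out or not,
# to `KEY = "VALUE"`. Assumes VALUE contains no '|' or '&' characters.
# Writes to a temp file instead of using `sed -i`, whose syntax differs
# between macOS (BSD) sed and GNU sed.
set_ppc_option() {
  cfg="$1"
  key="$2"
  value="$3"
  sed "s|^[#[:space:]]*$key[[:space:]]*=.*|$key = \"$value\"|" "$cfg" > "$cfg.tmp" &&
    mv "$cfg.tmp" "$cfg"
}
```

For example, after copying the example file into the build directory under a new name (adjust the paths to wherever yours live):

$ cp PearPC-0.5/ppccfg.example pearpc_osx_generic/osxtiger.rawr
$ set_ppc_option osxtiger.rawr pci_ide0_master_image PearPCTiger.img
$ set_ppc_option osxtiger.rawr pci_ide0_slave_image /dev/disk1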

Save the configuration file and we’re ready to go, right? PearPC is a command-line application, so it’s back to Terminal: navigate into the OSX build directory and run the build’s executable file with

$ ./ppc_osx_generic osxtiger.rawr

Aaaaand nothing happens. I’m just sitting on the cursor. Why? Tell me, Dutch Cat Man!

As it turns out, the X that matters here isn’t the X in “OSX” (that one really does just mean “10”). It’s the X Window System (X11), the standard windowing system for applications with graphical user interfaces on Unix systems. Older versions of OSX came with X11, but Apple stopped bundling it as of 10.8 (Mountain Lion), so on a current Mac you need to download some extra software to run cross-platform X11 applications like PearPC. This software!

So many Xs.
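Since PearPC just sat there silently when X11 was missing, a tiny pre-flight check might have saved me some head-scratching. Here’s a minimal sketch of a hypothetical launcher, assuming XQuartz’s usual install location of /opt/X11 (check yours) and the executable and configuration file names used above:

```shell
#!/bin/sh
# Hypothetical launcher (not part of PearPC): run_pearpc CONFIG [X11_DIR]
# Refuses to start PearPC unless an X11 install and the config file exist.
run_pearpc() {
  cfg="$1"
  x11_dir="${2:-/opt/X11}"   # assumed XQuartz install location
  if [ ! -d "$x11_dir" ]; then
    echo "No X11 found at $x11_dir; install XQuartz first" >&2
    return 1
  fi
  if [ ! -f "$cfg" ]; then
    echo "Config file $cfg not found" >&2
    return 1
  fi
  ./ppc_osx_generic "$cfg"
}
```

From inside the OSX build directory, that’s just `run_pearpc osxtiger.rawr`.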

All right, XQuartz is now installed, and since I forgot to terminate PearPC and it’s been running in the background this whole time, XQuartz suddenly opens and PearPC starts booting from the OSX Tiger install disc.

I get the classic gray apple screen, then after a moment, some terrifying-looking text appears. Then…it just sits there. For too long.

That can’t be good. Perhaps I’m getting too fancy trying to boot off the physical disc in my host computer’s optical drive. What if I make an image of that instead, and plug it into the PearPC configuration file? There are many options for making disk images, and that’s a whole other topic. I’m going to run the absolute simplest of my command-line options right now and see how that goes:

$ cp /dev/disk1 ~/Desktop/pearpc_osx_generic/Mac_OSX_Tiger_Install_DVD.iso
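The cp approach is the simplest, but the more conventional tool for a block-for-block copy is dd, and since this is an archives blog, it’s worth confirming that the copy actually matches its source. Here’s a minimal sketch, assuming the disc is still at /dev/disk1 (check diskutil list) and has been unmounted first; `image_disc` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: image_disc SOURCE DEST
# Copies SOURCE (e.g. a disc's device node) to the DEST image file with dd,
# then compares the two byte for byte with cmp.
image_disc() {
  src="$1"
  dest="$2"
  dd if="$src" of="$dest" bs=1048576 2>/dev/null || return 1
  if cmp -s "$src" "$dest"; then
    echo "image matches source"
  else
    echo "image differs from source" >&2
    return 1
  fi
}
```

Usage would be `image_disc /dev/disk1 ~/Desktop/pearpc_osx_generic/Mac_OSX_Tiger_Install_DVD.iso`. The comparison reads the whole disc a second time, so expect it to take a while; on a Mac, reading from the raw device (/dev/rdisk1) is often faster.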

Once that’s finished running, I go back into the configuration file and edit the line that corresponds to the install disk image:

What happens if I run the PearPC executable again now? I’ve booted back to the scary text screen again, but…

This time it keeps running! I let things scroll for a minute and eventually am greeted by a very familiar sight….

A Mac Installer wizard! We did it everybody!

Well, not quite yet. I start to move through the Installer, but we haven’t actually formatted that blank hard disk image so that Mac OSX can be installed on it. So, when stuck at the “Select Destination” screen with no options for where to install the OS, I’m going to head into the “Utilities” menu and open Disk Utility.

To format my blank hard disk image, I’m going to select the image from the left-hand menu, navigate to the “Partition” tab, select “1 Partition” under Volume Scheme and “Mac OS Extended (Journaled)” as the Format, and click Partition in the lower-right to execute.

Please do not ask me why Disk Utility is shouting at me in German.

Once that’s finished, I’m able to exit Disk Utility and return to the Installer – and the formatted hard disk image is now available to select as an installation destination. Now, OSX Tiger needs about 4.8GB of space to install in its entirety, which is why I told you to make a 6GiB image earlier. If you’ve made a smaller, 3GiB image as I did, you’ll have to de-select some of the installation packages. It’s not that big a deal – a ton of space is taken up by non-essential features like device drivers and extra languages.

And I was so looking forward to doing this again in Russian.
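To double-check the arithmetic: the sample dd command’s 3GiB image is 3,220,955,136 bytes, well under Tiger’s roughly 4.8GB, while a 6GiB image made with seek=12482 (6,441,910,272 bytes) clears it comfortably. Here’s a hypothetical check; the default requirement of 4,800,000,000 bytes is my assumption (if the installer’s “GB” means binary gigabytes, the real figure is about 5.15 billion bytes, and the 6GiB image still fits):

```shell
#!/bin/sh
# Hypothetical check: image_has_room IMAGE_BYTES [NEEDED_BYTES]
# NEEDED_BYTES defaults to 4.8 GB read as 4,800,000,000 bytes (an assumption;
# the post's figure is approximate).
image_has_room() {
  image_bytes="$1"
  needed="${2:-4800000000}"
  if [ "$image_bytes" -ge "$needed" ]; then
    echo "fits"
  else
    echo "too small: short by $((needed - image_bytes)) bytes"
    return 1
  fi
}
```

To check a real image file: `image_has_room "$(wc -c < PearPCTiger.img | tr -d ' ')"`.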

OSX Tiger is now ready to install. Now, if you’re following along, you may have noticed at this point that PearPC runs slooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooow

So yeah, this installation takes a while. Like possibly hours, plural. I went off to do some other work and forgot to tell my host computer not to go to sleep, which made PearPC just pause the installation for about an extra hour. Don’t do that.
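On a Mac, the built-in caffeinate utility can keep the host awake for exactly as long as a given command runs; its -i flag prevents idle sleep. Here’s a minimal sketch of a hypothetical wrapper that falls back to running the command normally on systems without caffeinate:

```shell
#!/bin/sh
# Hypothetical wrapper: run_awake COMMAND [ARGS...]
# On macOS, runs COMMAND under `caffeinate -i` so the host won't idle-sleep
# until COMMAND exits; elsewhere, warns and runs COMMAND as-is.
run_awake() {
  if command -v caffeinate >/dev/null 2>&1; then
    caffeinate -i "$@"
  else
    echo "caffeinate not found; running without sleep prevention" >&2
    "$@"
  fi
}
```

From the OSX build directory, that would be `run_awake ./ppc_osx_generic osxtiger.rawr`.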

OSX Tiger did install successfully, but here’s the kicker: as I went through the initial setup, it turned out that PearPC had done something COMPLETELY WONKY to my keyboard mapping during the installation. So when I tried to set up a user account by just typing like a normal person, I got this nonsense:

I wish I could tell you that I solved this problem, but no, I just had to painstakingly poke at one button at a time until I figured out that, while in PearPC, e=delete, 4=n, v=9, and so on and so forth. When I finally got into the OSX Tiger desktop and was able to open System Preferences, I thought I had fixed it by switching to a Canadian keyboard layout (why do you even have a separate keyboard layout, Canada?), but now every time I boot back into PearPC the mapping resets. So that’s a mystery, and if anyone has ideas on how to fix it, I’m all ears.

But for the moment I’m calling this mission accomplished. Again, OSX Tiger in PearPC runs AS SLOW AS DIRT, so this is not ideal, and I would still like to figure out how to crack the VirtualBox solution for all this. But from what I can tell, that might involve this mysterious Darwin operating system that apparently makes all my Apple computers work… and given that this post has already turned out much longer than I intended, that’s a topic for another day.

[Image: Hexley the platypus]
WHAT ARE YOU

If you’ve had any success setting up a PowerPC Mac OS in an emulator or VM, I’d love to hear about it!