Using BagIt in 2018

One of my more popular posts to this blog has been my 2016 round-up of BagIt, the Library of Congress’ seminal file packaging specification/software library. My overall explanation for what BagIt is, why it’s so important, the still-scattered state of documentation, and the need for a roundup of implementations for practical use all still stand… but I’ve realized lately that this post/topic could use a revisit, for a couple of reasons:

1) A year on, I’ve done a lot more interaction on GitHub and with open source software, and I regret my general tone when discussing the need for better BagIt documentation. One of the beautiful things about open source projects (and BagIt particularly, since the LoC hosts all the code for the BagIt libraries and several of its implementations on GitHub, which is *made* for collaboration) is the opportunity for direct, constructive feedback. I should have raised my problems with unclear documentation as an issue on GitHub (looky here, just as I did while preparing this post), or at least posed my confusion as a question/concern to be improved, rather than as a complaint “behind the backs” of the developers! Etiquette is important, and I will do better at remembering that digital preservation is not an unfeeling collection of tools and tech – there are people behind every line of code and every social media post (OK besides the twitter bots but you know what I mean).

2) Software changes! It updates! That’s the whole point! And instructions that worked even a year or two ago may no longer work in the most contemporary environments. To that end, there have been some changes in macOS systems in particular that make me want to create new installation instructions (particularly for bagit-python) to help people avoid headaches.

So check out my previous post for why BagIt’s great – and then look below for a new roundup of how and why to use its various interfaces and implementations in 2018! (yeah I know it’s still 2017, but as much as digital preservation is about constant updating I’d like to future-proof this thing by *at least* two weeks, ya know?)

1. Bagger (GUI)

It’s Bagger! Still a nice intuitive GUI interface with big honking buttons for the basic tasks of bagging (creation from multiple files or bagging a directory in place, adding metadata, verification/validation). Still probably the best/most intuitive implementation for novice users. And the LoC GitHub repo for Bagger now has specific first-time installation/run instructions for both Windows and Mac. Beautiful!

2. bagit-java (library + CLI)

The LoC’s bagit-java 5.x library can be incorporated into any scripts or applications written in Java (such as the two GUI implementations elsewhere on this list). It can not, however, be interfaced with as a stand-alone command line utility. For that, you can still install and use bagit-java version 4.x, even though that version has obviously been surpassed and is not being actively developed. For installing and using bagit-java via CLI, you can use Homebrew (note that you will need Java installed as well):

$ brew install bagit

which installs bagit-java v4.12.3. Documentation for using bagit-java can be found in the utility’s help page, invoked with:

$ bagit –help

Just a quick note: the –help page incorrectly refers to the command to invoke bagit-java. The help page usage example says to use “$ bag <operation> [operation arguments]”, but the correct syntax is in fact “$ bagit <operation> [operation arguments]” ! (per my question on GitHub about this, apparently this problem is hard-coded and would require recompiling the Java source rather than just tweaking a doc, so since bagit-java CLI isn’t actively maintained, no fix is forthcoming)

Screen Shot 2017-12-15 at 11.54.15 AM.png

3. bagit-python (library + CLI)

So, this section is really less an update on bagit-python and more an update on python itself. Bagit-python can still be used either as a library to integrate into scripts and applications written in python, or as stand-alone command line utility. Your preference for using bagit-java or bagit-python in the CLI could be decided by looking at both utilities’ help pages. In either case, if you are interested in using/installing bagit-python, changes in recent macOS versions have meant that my previous instructions created more headaches than intended.

(Thanks to the brave MIAP students in Video Preservation who discovered and tried to deal with these inconsistencies!!)

So, for explanation: starting with OSX 10.11 (El Capitan), Apple introduced a feature called System Integrity Protection, nominally to keep unverified or malevolent applications downloaded from the internet from messing with critical OS-installed system software. What this means is, without futzing around a lot with permissions (which is not a great idea for a novice user), using a package manager like Homebrew winds up with some software in the OS-controlled “/usr” directory and its subfolders, and some software in the user- or package-manager-controlled “/usr/local” directory and its subfolders.

My previous instructions, which directed people to mix the default macOS-installed version of Python with the user-installed versions of pip (python’s package manager) and bagit-python, generated a whole bunch of permissions issues.

The solution? Stay away from the macOS python altogether and install all components with a package manager to keep the installation contained within “/usr/local”.

So, assuming you have Homebrew installed:

  1. $ brew install python

    This will install Homebrew’s Python 2.x package (currently 2.7.14), which includes the pip package manager by default (macOS’ Python package does *not* include pip by default). Note however! Since your Mac already came with a python installation (at /usr/bin/python), Homebrew renames its versions “python2” and “pip2” to avoid confusion/overwriting. (so its commands/binaries live at /usr/local/bin/python2 and /pip2)

  2. $ sudo pip2 install bagit

    Although “sudo” shouldn’t be necessary here, I’ve encountered some errors when not using it. So I think “pip2” may still mess with some files outside of /usr/local. Best to err on the side of using “sudo” and entering your admin password – there shouldn’t be any issue with doing so.

    That’s it! bagit-python is now installed. You can invoke bagit-python commands with

$ bagit.py [path/to/directory]

just as before. Check out the help page with the “–help” flag for more info.

(If you already tried to install bagit-python with the previous instructions, you will likely need to do some cleanup in the /usr folder to clear everything out and stop throwing errors. If you need help or advice doing this, feel free to get in touch!!)

4. BagIt for Ruby (library + CLI)

The “bagit-ruby” implementation has been expanded and documented since last year! If you are interested in including a BagIt module in a Ruby application/script, or using this version via the command line, you’ll first need to install Ruby:

$ brew install ruby

Which will include Ruby’s built-in package manager, gem.

$ gem install bagit validatable

Note that you can’t install this Ruby package and the Homebrew package of bagit-java at the same time, as you’ll get a collision with them named the same thing in /usr/local/bin. Once downloaded/installed with gem, the BagIt for Ruby CLI is documented at:

$ bagit –help

….but it’s real basic, even compared to the CLI for bagit-python. This particular implementation is probably most ideal for its library and incorporation in Ruby scripts/apps, not necessarily for direct command line interfacing.

5. Exactly (GUI)

Not much more to say about AVPreserve’s packaging/transfer application since last year – but the combined ability to not just bag, but deliver or receive directories over standard network protocols still make it a great option for those on Mac or Windows and in need of a simple workflow that combines two major ingest steps (bagging and delivery) into one quick and easy tool.

6. bagger-js (experimental library + web app)

Likewise, the LoC’s BaggerJS library/app could serve as both bagging and delivery system, via a web browser interface instead of a stand-alone, downloaded app. It’s basically “bagit-javascript” – that is, the BagIt library written in JavaScript (which is a web programming language entirely separate from Java). I assume it’s referred to as “bagger-js” because in the LoC’s naming system, “bagger” implies a GUI, whereas “bagit” is just the underlying library or CLI.

Bagger-js is still referred to in the LoC GitHub repo as “experimental”, so the library and accompanying demo web interface (which can bag a local directory and send it to a remote server compliant with Amazon’s s3 protocol) are not production-ready like Bagger or the other BagIt libraries/interfaces. But, again, all the work that they’ve done so far is right there and available to adapt/incorporate into your own JavaScript/web app projects!

7. other apps

Of course, there are likely a number of applications or other pieces of software that incorporate BagIt as one piece or microservice of a larger workflow/system. Archivematica’s a major one that I’m aware of. Maybe you have another! Feel free to let me know what I’ve missed.

 

3 thoughts on “Using BagIt in 2018”

  1. I think you should mention the original bagit tool, md5deep and a text editor.

    https://gist.github.com/edsu/edc19d99299b734fa1b29ce18919a723

    Seriously, the point of BagIt wasn’t that you need specialized GLAM tools to use it. The specification was intended to be a pattern for creating a manifest for some files and some minimal metadata so that a set of files could moved around in time and space with some confidence. While the tools you mention are useful, the kind of obscure what’s really going on. The point was for people to roll their own way of creating the manifest and metadata file, that suited their particular environment. Not that everyone should install a particular tool. Perhaps that was overly optimistic…

    Regarding your not recommending the bagit-python anymore: as you previously pointed out, it does have a –verify command. I’d also argue that if you are splitting bags you might be doing things wrong. But I’m biased because I helped create bagit-python. I do think an update command would be a useful addition though. Something along the lines of a git commit.

    Thanks for your comments about being constructive with open source tools. You’re right, they were put on GitHub so other people could help make them better. People don’t often realize what a ragtag group of people maintain these projects, mostly out of the goodness of their heart. As I’m sure you know, there are pressures at places like the Library of Congress to not do any open source software development at all because of the fear of getting negative exposure. It’s far easier to keep everything in house. We thought we were doing a public service by getting them out onto GitHub. But there’s no such thing as a free puppy.

  2. Thanks so much for your comments, Ed! I was working on an overhaul of this post – the responses to my Mastodon post (https://digipres.club/web/statuses/99361531814192420) made me realize how limited my understanding of BagIt implementation was, and diving deeper and deeper into GitHub repos I could see that a comprehensive roundup of tools and languages was misguided – or, at the very least, not particularly useful. As you’ve pointed out, the entire point is for it to be simple, and therefore nearly infinitely sharable, tweakable.

    I’ll fully admit that the motivation for this post was largely to correct installation instructions around bagit-python, and I did not do the research in the moment to define what “using BagIt in 2018” means – specifically, who is the audience that I’m addressing. I tend to work with students, community orgs, artists/individuals – people who are generally not programmers or even necessarily archivists, are not necessarily working at scale, and are often being first introduced to concepts of fixity, working in the command line, etc. So handing them a ruby module and saying “put it in your Rails app” is probably not the way I should have gone – but neither is handing them a specification document, even an extraordinarily simple/clear one like BagIt, and saying, make this by hand.

    So I still believe a round-up of GUIs and stand-alone command line tools can serve a purpose (and it was always my intent to suggest strengths and balances of various tools depending on what the user’s looking for or comfortable with, not to lock anyone in to one tool). But the framing should be more “Getting Started with BagIt for Digital Preservation” suggesting the further layers that the spec can serve. I’ll work on finishing my revision and would love for you to take a look!

    And thank you for fighting the open source good fight. The LoC absolutely has done a public service by putting both the spec itself and the tools developed on GitHub – the proliferation of implementations (again, I did not realize how spectacularly off I was here) proves that.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.