Using BagIt in 2018
One of my more popular posts to this blog has been my 2016 round-up of BagIt, the Library of Congress' seminal file packaging specification/software library. My overall explanation for what BagIt is, why it's so important, the still-scattered state of documentation, and the need for a roundup of implementations for practical use all still stand... but I've realized lately that this post/topic could use a revisit, for a couple of reasons:
1) A year on, I've done a lot more interaction on GitHub and with open source software, and I regret my general tone when discussing the need for better BagIt documentation. One of the beautiful things about open source projects (and BagIt particularly, since the LoC hosts all the code for the BagIt libraries and several of its implementations on GitHub, which is *made* for collaboration) is the opportunity for direct, constructive feedback. I should have raised my problems with unclear documentation as an issue on GitHub (looky here, just as I did while preparing this post), or at least posed my confusion as a question/concern to be improved, rather than as a complaint "behind the backs" of the developers! Etiquette is important, and I will do better at remembering that digital preservation is not an unfeeling collection of tools and tech - there are people behind every line of code and every social media post (OK besides the twitter bots but you know what I mean).
2) Software changes! It updates! That's the whole point! And instructions that worked even a year or two ago may no longer work in the most contemporary environments. To that end, there have been some changes in macOS systems in particular that make me want to create new installation instructions (particularly for bagit-python) to help people avoid headaches.
So check out my previous post for why BagIt's great - and then look below for a new roundup of how and why to use its various interfaces and implementations in 2018! (yeah I know it's still 2017, but as much as digital preservation is about constant updating I'd like to future-proof this thing by *at least* two weeks, ya know?)
1. Bagger (GUI)
It's Bagger! Still a nice intuitive GUI with big honking buttons for the basic tasks of bagging (creation from multiple files or bagging a directory in place, adding metadata, verification/validation). Still probably the best/most intuitive implementation for novice users. And the LoC GitHub repo for Bagger now has specific first-time installation/run instructions for both Windows and Mac. Beautiful!
2. bagit-java (library + CLI)
The LoC's bagit-java 5.x library can be incorporated into any scripts or applications written in Java (such as the two GUI implementations elsewhere on this list). It cannot, however, be used as a stand-alone command line utility. For that, you can still install and use bagit-java version 4.x, even though that version has obviously been superseded and is not being actively developed. For installing and using bagit-java via CLI, you can use Homebrew (note that you will need Java installed as well):
$ brew install bagit
which installs bagit-java v4.12.3. Documentation for using bagit-java can be found in the utility's help page, invoked with:
$ bagit --help
Just a quick note: the --help page gets the name of its own command wrong. The usage example says to use "$ bag <operation> [operation arguments]", but the correct syntax is in fact "$ bagit <operation> [operation arguments]"! (Per my question on GitHub about this, the usage string is apparently hard-coded, so fixing it would mean recompiling the Java source rather than just tweaking a doc. Since the bagit-java CLI isn't actively maintained, no fix is forthcoming.)
3. bagit-python (library + CLI)
So, this section is really less an update on bagit-python and more an update on Python itself. Bagit-python can still be used either as a library to integrate into scripts and applications written in Python, or as a stand-alone command line utility. If you're choosing between bagit-java and bagit-python for the CLI, comparing the two utilities' help pages is a good way to decide. In either case, if you are interested in using/installing bagit-python, changes in recent macOS versions have meant that my previous instructions created more headaches than intended.
(Thanks to the brave MIAP students in Video Preservation who discovered and tried to deal with these inconsistencies!!)
So, for explanation: starting with OS X 10.11 (El Capitan), Apple introduced a feature called System Integrity Protection, nominally to keep unverified or malevolent applications downloaded from the internet from messing with critical OS-installed system software. What this means is, without futzing around a lot with permissions (which is not a great idea for a novice user), using a package manager like Homebrew alongside the built-in tools winds up with some software in the OS-controlled "/usr" directory and its subfolders, and some software in the user- or package-manager-controlled "/usr/local" directory and its subfolders (which System Integrity Protection leaves alone).
My previous instructions, which directed people to mix the default macOS-installed version of Python with the user-installed versions of pip (Python's package manager) and bagit-python, generated a whole bunch of permissions issues.
The solution? Stay away from the macOS python altogether and install all components with a package manager to keep the installation contained within "/usr/local".
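If you're not sure which Python you're actually running, here's a rough sketch that reports whether the current interpreter lives in OS-controlled territory or not. The function names are my own, and the list of protected directories only covers the main top-level ones that System Integrity Protection locks down, so treat the answer as a hint rather than a verdict.

```python
import sys

# Main top-level directories that System Integrity Protection locks down.
# This list is an approximation for illustration, not Apple's full policy.
OS_CONTROLLED = ("/System", "/usr", "/bin", "/sbin")
# The carve-out that package managers like Homebrew rely on.
EXEMPT = ("/usr/local",)


def _is_under(path, root):
    """True if path is root itself or sits somewhere below it."""
    return path == root or path.startswith(root.rstrip("/") + "/")


def is_os_controlled(path):
    """Guess whether a path falls in an OS-controlled location."""
    if any(_is_under(path, root) for root in EXEMPT):
        return False
    return any(_is_under(path, root) for root in OS_CONTROLLED)


if __name__ == "__main__":
    # sys.executable is the interpreter binary; sys.prefix is where its
    # standard library and site-packages hang off of.
    for label, path in (("interpreter", sys.executable), ("prefix", sys.prefix)):
        if is_os_controlled(path):
            owner = "OS-controlled"
        else:
            owner = "user/package-manager-controlled"
        print("%-12s %s (%s)" % (label, path, owner))
```

Save it under any name you like and run it with whichever interpreter you want to check (e.g. "python" vs. "python2"); it works with Python 2 or 3.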
So, assuming you have Homebrew installed:
$ brew install python
This will install Homebrew's Python 2.x package (currently 2.7.14), which includes the pip package manager by default (macOS' Python package does *not* include pip by default). Note however! Since your Mac already came with a Python installation (at /usr/bin/python), Homebrew renames its versions "python2" and "pip2" to avoid confusion/overwriting. (So its commands/binaries live at /usr/local/bin/python2 and /usr/local/bin/pip2.)
$ sudo pip2 install bagit
Although "sudo" shouldn't be necessary here, I've encountered some errors when not using it. So I think "pip2" may still mess with some files outside of /usr/local. Best to err on the side of using "sudo" and entering your admin password - there shouldn't be any issue with doing so.
That's it! bagit-python is now installed. You can invoke bagit-python commands with
$ bagit.py [path/to/directory]
just as before. Check out the help page with the "--help" flag for more info.
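For the curious, here's roughly what bagging a directory in place boils down to, sketched with nothing but Python's standard library. To be clear, this is *not* bagit-python's actual code, and the function names are mine: it's a simplified illustration that writes only the bag declaration and a single SHA-256 payload manifest (my choice of algorithm), and skips bag-info.txt metadata and tag manifests entirely. Use the real tool for real bags!

```python
import hashlib
import os
import shutil
import tempfile


def _sha256(path):
    """Hash a file in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _payload_files(bag_dir):
    """Yield payload paths relative to the bag root, with forward slashes."""
    for root, _dirs, files in os.walk(os.path.join(bag_dir, "data")):
        for name in sorted(files):
            full = os.path.join(root, name)
            yield os.path.relpath(full, bag_dir).replace(os.sep, "/")


def bag_in_place(directory):
    """Move the directory's contents into data/ and write the tag files."""
    entries = os.listdir(directory)
    # Stage into a temp folder first, in case something is already named "data".
    staging = tempfile.mkdtemp(dir=directory)
    for entry in entries:
        shutil.move(os.path.join(directory, entry), os.path.join(staging, entry))
    os.rename(staging, os.path.join(directory, "data"))

    # One "<checksum>  <relative path>" line per payload file.
    with open(os.path.join(directory, "manifest-sha256.txt"), "w") as manifest:
        for rel in _payload_files(directory):
            manifest.write("%s  %s\n" % (_sha256(os.path.join(directory, rel)), rel))

    # The bag declaration that marks this folder as a bag.
    with open(os.path.join(directory, "bagit.txt"), "w") as declaration:
        declaration.write("BagIt-Version: 0.97\n")
        declaration.write("Tag-File-Character-Encoding: UTF-8\n")


def is_valid(directory):
    """Re-hash the payload and compare it with what the manifest recorded."""
    expected = {}
    with open(os.path.join(directory, "manifest-sha256.txt")) as manifest:
        for line in manifest:
            checksum, rel = line.rstrip("\n").split(None, 1)
            expected[rel] = checksum
    actual = dict(
        (rel, _sha256(os.path.join(directory, rel)))
        for rel in _payload_files(directory)
    )
    # Catches changed files as well as missing or extra ones.
    return expected == actual
```

Calling bag_in_place() on a folder and then is_valid() on it should come back True; change, add, or delete anything under data/ and is_valid() comes back False. That comparison is the whole point of a bag.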
(If you already tried to install bagit-python with the previous instructions, you will likely need to do some cleanup in the /usr folder to clear everything out and stop throwing errors. If you need help or advice doing this, feel free to get in touch!!)
4. BagIt for Ruby (library + CLI)
The "bagit-ruby" implementation has been expanded and documented since last year! If you are interested in including a BagIt module in a Ruby application/script, or using this version via the command line, you'll first need to install Ruby:
$ brew install ruby
which includes Ruby's built-in package manager, gem. Then:
$ gem install bagit validatable
Note that you can't have this Ruby package and the Homebrew package of bagit-java installed at the same time: both try to put a command named "bagit" in /usr/local/bin, so they collide. Once downloaded/installed with gem, the BagIt for Ruby CLI is documented at:
$ bagit --help
...but it's real basic, even compared to the CLI for bagit-python. This particular implementation is probably best suited to use as a library in Ruby scripts/apps, not necessarily for direct command line interfacing.
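On that name collision: if you've tried both the Homebrew and gem installs at some point and aren't sure which "bagit" your shell will actually run, a little sketch like this can help. It lists every executable with that name on your PATH in lookup order, shows where each one really lives once symlinks are resolved, and prints the script's first "#!" line if it has one. The helper names are my own invention, and reading the results is up to you: my assumption is that a Homebrew install resolves to somewhere inside Homebrew's Cellar, while a gem install shows a "#!" line pointing at Ruby.

```python
import os


def find_all_on_path(name, search_path=None):
    """Return every executable called `name` on the search path, in lookup order."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    matches = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in matches:
                matches.append(candidate)
    return matches


def shebang(path):
    """Return a script's '#!' line, or None if the file doesn't start with one."""
    with open(path, "rb") as handle:
        first = handle.readline(256)
    if not first.startswith(b"#!"):
        return None
    return first.decode("utf-8", "replace").strip()


if __name__ == "__main__":
    found = find_all_on_path("bagit")
    if not found:
        print("no 'bagit' command on PATH")
    for position, path in enumerate(found):
        # The first match is the one the shell runs; later ones are shadowed.
        marker = "runs    " if position == 0 else "shadowed"
        print("%s %s -> %s" % (marker, path, os.path.realpath(path)))
        print("         %s" % (shebang(path) or "(no #! line)"))
```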
5. Exactly (GUI)
Not much more to say about AVPreserve's packaging/transfer application since last year - but the combined ability to not just bag, but deliver or receive directories over standard network protocols still makes it a great option for those on Mac or Windows and in need of a simple workflow that combines two major ingest steps (bagging and delivery) into one quick and easy tool.
6. bagger-js (experimental library + web app)
Likewise, the LoC's BaggerJS library/app could serve as both bagging and delivery system, via a web browser interface instead of a stand-alone, downloaded app. It's basically "bagit-javascript" - that is, the BagIt library written in JavaScript (which is a web programming language entirely separate from Java). I assume it's referred to as "bagger-js" because in the LoC's naming system, "bagger" implies a GUI, whereas "bagit" is just the underlying library or CLI.
Bagger-js is still referred to in the LoC GitHub repo as "experimental", so the library and accompanying demo web interface (which can bag a local directory and send it to a remote server compliant with Amazon's S3 protocol) are not production-ready like Bagger or the other BagIt libraries/interfaces. But, again, all the work that they've done so far is right there and available to adapt/incorporate into your own JavaScript/web app projects!
7. Other apps
Of course, there are likely a number of applications or other pieces of software that incorporate BagIt as one piece or microservice of a larger workflow/system. Archivematica's a major one that I'm aware of. Maybe you have another! Feel free to let me know what I've missed.