Getting Started with BagIt in 2018

Take two!

In December, I hastily wrote an update to an old post about BagIt, the Library of Congress’ open-source specification for hierarchical packaging of files to support safe data storage and transfer. The primary motivation for the update was some issues that the Video Preservation course I work with  encountered with my instructions for installing the bagit-python command-line tool, so I wanted to double-check my process there and make sure I was guiding readers correctly. I also figured that it had been a couple years and I could write about new implementations while I was at it. A cursory search turned up a BagIt-for-Ruby library, so I threw that in there, posted, *then* opened up a call for anything I’d missed.

Uhhhh –  I missed a lot.

It was at this point, as I sifted through the various scripts, apps, tools and libraries that create bags in some way that I realized I had lost the thread of what I was even trying to summarize or explain.

Every piece of software using the BagIt spec ever? That, happily, is a fool’s errand – the whole point of the spec is that it’s super easy and flexible to implement, no matter how short the script. So…there’s a lot of implementations.

Every programming language with an available port/module for creating bags according to the BagIt spec? Mildly interesting for hybrid archivist/developers, but probably of less practical use for preservation students, or the average user/creator just trying to take care of their own files, or archivists that are less programming-inclined. A Ruby module for BagIt is objectively cool and useful – for those working/writing apps and scripts in Ruby. Given that setting up a Ruby dev environment requires some other command-line setup that I didn’t even get into, someone’s likely not heading straight to that module right out of the gate.

“Using BagIt” was/is the wrong framework. Too broad, too undefined, and as Ed Summers pointed out, antithetical to the spirit in which a simple, open source specification is made in the first place: to allow anyone to use it, anywhere, however they can – not according to one of four or five methods proscribed in a blog post.

So I am rewriting this post from the mindset, not of “here’s all the forms and tools in which BagIt exists”, but rather, “ok, so I’m learning what a bag is and why’s it useful – how can I make one to get started?”

Because the contents of a specification are terrific and informative, but in my experience nothing reinforces understanding of a spec like a concrete example. And not only that, but one step further – *making* an example. Technical concepts without hands-on labwork or activities to solidify them get lost – and budding digital preservationists told to use the BagIt spec need somewhere to start.

So whether you’re just trying to securely back up your personal files to a cloud service, or trying to get a GLAM institution’s digital repository to be OAIS-compliant, validation and fixity starts at square one. Let me do that as well.

What’s a bag?

Just for refresher’s sake, I’m going to re-post here what I wrote back in 2016 – so that this post can stand alone as a primer:

One of the big challenges in digital archiving is file fixity – a fancy term for checking that the contents of a file have not been changed or altered (that the file has remained “fixed”). There’s all sorts of reasons to regularly verify file fixity, even if a file has done nothing but sit on a computer or server or external hard drive: to make sure that a file hasn’t corrupted over time, that its metadata (file name, technical specs, etc.) hasn’t been accidentally changed by software or an operating system, etc.

But one of the biggest threats to file fixity is when you move a file – from a computer to a hard drive, or over a server. Think of it kind of like putting something in the mail: there are a lot of points in the mailing process where a computer or USPS employee has to read the labeling and sort your mail into the proper bin or truck or plane so that it ends up getting to the correct destination. And there’s a LOT of opportunity for external forces to batter and jostle and otherwise get in your mail’s personal space. If you just slap a stamp on that beautiful glass vase you bought for your mother’s birthday and shove it in the mailbox, it’s not going to get to your mom in one piece.

So a “bag” is a kind of special digital container – a way of packaging files together to make sure what we get on the receiving end of a transfer is the same thing that started the journey (like putting that nice glass vase in a heavily padded box with “fragile” stamped all over it).

Sounds great! How do I make a bag?

At its core, all you need to make a bag out of a digital file or group of files is an editor capable of making plain text files (.txt) and an ability to generate MD5 checksums. An MD5 generator takes *any* string of digital information – including an entire file – and encodes it into a 128-bit fingerprint; that is, a 32-character string of seemingly “random” letters and numbers. Running an MD5 generator on the same file will always produce the same 32-character string. If the file changes in some way (even some change or edit invisible to the user), the MD5 string will change as well. So this process of generating and checking strings allows you to know whether a file is exactly the same on the receiving end of a transfer as it was at the beginning.

BagIt bags facilitate this process via a “tag manifest” – a text file including all the digital files contained in the bag (the “data” in question) and their corresponding MD5 checksums. Packaged together (along with some meta information on the BagIt spec and the bag itself), this all allows for convenient fixity checking.

Convenient, though, in the sense of easing automation. While you *can* put together a bag by hand – generating checksums for each file, copying them into text files to create the manifests, structuring the data and manifests together in BagIt’s dictated hierarchy -that is a copy/paste nightmare, and not exactly going to encourage the computer-shy into healthier digipres practice.

This is why simple scripts and tools and apps are handy. Down the line, when you’re creating your own archival workflow, you may want to find or tweak or make your own process for creating bags – but for your first bag, there’s no need to reinvent the wheel.

I’m going to cover intro tools here, for either the command line or GUI user.

Command Line Tools

  1. this Bash scriptA simple shell script by Ed that just requires just two arguments: the directory you want to bag, and an output directory (in which to put the bag).Hit the green “Download” button in the corner of the GitHub page, select the ZIP file, then unzip the result. Move the “bagit.sh” file inside to a convenient/accessible location in your computer.Once in Terminal, you can run this bash script by navigating to wherever you put this script, then executing it with:
    $ ./bagit.sh /path/to/directory /path/to/bag

    or

    $ bash bagit.sh /path/to/directory /path/to/bag

    (the “./” or “bash” commands do the same thing – indicating to the Bash terminal to execute the bagit.sh script)

    The “/path/to/directory” should be a folder containing all the files you want to be in the bag. Then you will specify the output path for the bag with “/path/to/bag”. Both can be accomplished with drag-and-dropping folders from the Finder.

  2. bagit-pythonBagit-python is the Library of Congress’s officially-supported command line utility for making and working with bags. It requires a working Python interpreter on your computer, plus Python’s package manager, “pip”. By default, macOS comes with a Python interpreter (2.7.10), but not pip. So we go to the popular command-line Mac package manager Homebrew to put this all together.Sigh. OK. So one of the reasons this post didn’t come out last week is that, literally in that same time frame, Homebrew went through….something with regards to their Python packages and how they behaved with Python 2.x vs Python 3.x vs the Python installation that comes with your Mac. (they’ve locked/deleted a lot of the conversations and issues now, but it was really the dark side of FOSS projects in there for a bit). I kept trying to check my instructions were correct, and meanwhile, every “$ brew update” was sending my python installs haywire. It seems like they’ve finally settled, but, I’d still now generally recommend giving this page a once-over before working with python-via-homebrew.

But to summarize: if you want to work with Python 3.x, you install a *package* called “python” and then invoke it with python3 and pip3 commands. If you want to use Python 2.x, you install a package called “python@2” and then invoke with either python and pip or python2 and pip2 commands.

…got it?

For the purposes of just using the bagit-python command-line tool, at least, it doesn’t matter whether you choose Python 2.x or 3.x. It’ll work with both. But stick with one or the other through this installation process. So either:

$ brew install python

+

$ sudo pip3 install bagit

or:

$ brew install python@2

+

$ sudo pip install bagit

That’s it! It’s just making sure you have a version of python installed through Homebrew, then use the python package/module installer “pip”to install the bagit-python tool. I highly recommend using admin privileges with “sudo” to globally install and avoid some weird permissions issues that may arise from trying to run python scripts and tools like bagit-python otherwise.

One installed, look over the help page with

$ bagit.py --help

to see the command syntax – and all the features that you can cover! Including using different hash generators (rather than MD5), adding metadata, validating existing bags rather than creating new ones, etc.

*** a note about bagit-java***
If you are using Homebrew and just run

$ brew install bagit

it will install the bagit-java 4.12.3 library and command-line tool. The LOC no longer supports and doesn’t recommend this tool for command line use, and the –help instructions that come with it don’t even actually reflect the command syntax you have to use to make it work. So! This isn’t a recommendation but just a note for Homebrew users who might get confused about what’s happening here.

GUIs

1. Bagger

Again, the LOC’s official graphical utility program for creating and validating bags. Following the instructions from their GitHub repository linked above, you’re going to download a release and then run on macOS by finding and clicking on the “bagger.jar” file (you’ll need a working Java install as well).Inside Bagger, once you choose the “Create a Bag” option, Bagger will ask you to choose a “profile” – these just refer to the metadata fields available for inserting more administrative information about your bag and the files therein, within the bag itself. These are really useful for keeping metadata consistent if you’re creating a whole bunch of bags, but choosing “<no profile>” is also totally acceptable to get started (you can always re-open bags and insert more metadata later!)”Create Bag in Place” is also a useful option if you don’t want (or digital storage limitations even *prevents*) to have two copies of your files (the original + the copy inside the “data” folder in your bag). Rather than copying and creating the bag in a new directory elsewhere, it’ll just move around/checksum/restructure the files according to the BagIt spec within the original directory.

2. Exactly

A GUI developed by AVP and the University of Kentucky that combines the bagging process with file transfer – which is the presumed end-goal of bagging in any case. To that end, Exactly doesn’t “bag in place” – you always have to pick a source file/directory (or sources – Exactly will bundle them all together into one bag) and a destination for the resulting, created bag. Like Bagger, you can also add metadata via custom-designed fields or import templates/profiles. Added support for FTP or SFTP transfers to remote servers (in addition to locally-attached network storage units like a Samba or Windows share) make it a simple starter option for file delivery.

***************************

If you’re getting started with the BagIt spec, these are the places I’d begin. But as to what implementation *you* can come up with from there, based on your personal/institutional needs…that’s up to you!

Using BagIt in 2018

One of my more popular posts to this blog has been my 2016 round-up of BagIt, the Library of Congress’ seminal file packaging specification/software library. My overall explanation for what BagIt is, why it’s so important, the still-scattered state of documentation, and the need for a roundup of implementations for practical use all still stand… but I’ve realized lately that this post/topic could use a revisit, for a couple of reasons:

1) A year on, I’ve done a lot more interaction on GitHub and with open source software, and I regret my general tone when discussing the need for better BagIt documentation. One of the beautiful things about open source projects (and BagIt particularly, since the LoC hosts all the code for the BagIt libraries and several of its implementations on GitHub, which is *made* for collaboration) is the opportunity for direct, constructive feedback. I should have raised my problems with unclear documentation as an issue on GitHub (looky here, just as I did while preparing this post), or at least posed my confusion as a question/concern to be improved, rather than as a complaint “behind the backs” of the developers! Etiquette is important, and I will do better at remembering that digital preservation is not an unfeeling collection of tools and tech – there are people behind every line of code and every social media post (OK besides the twitter bots but you know what I mean).

2) Software changes! It updates! That’s the whole point! And instructions that worked even a year or two ago may no longer work in the most contemporary environments. To that end, there have been some changes in macOS systems in particular that make me want to create new installation instructions (particularly for bagit-python) to help people avoid headaches.

So check out my previous post for why BagIt’s great – and then look below for a new roundup of how and why to use its various interfaces and implementations in 2018! (yeah I know it’s still 2017, but as much as digital preservation is about constant updating I’d like to future-proof this thing by *at least* two weeks, ya know?)

1. Bagger (GUI)

It’s Bagger! Still a nice intuitive GUI interface with big honking buttons for the basic tasks of bagging (creation from multiple files or bagging a directory in place, adding metadata, verification/validation). Still probably the best/most intuitive implementation for novice users. And the LoC GitHub repo for Bagger now has specific first-time installation/run instructions for both Windows and Mac. Beautiful!

2. bagit-java (library + CLI)

The LoC’s bagit-java 5.x library can be incorporated into any scripts or applications written in Java (such as the two GUI implementations elsewhere on this list). It can not, however, be interfaced with as a stand-alone command line utility. For that, you can still install and use bagit-java version 4.x, even though that version has obviously been surpassed and is not being actively developed. For installing and using bagit-java via CLI, you can use Homebrew (note that you will need Java installed as well):

$ brew install bagit

which installs bagit-java v4.12.3. Documentation for using bagit-java can be found in the utility’s help page, invoked with:

$ bagit –help

Just a quick note: the –help page incorrectly refers to the command to invoke bagit-java. The help page usage example says to use “$ bag <operation> [operation arguments]”, but the correct syntax is in fact “$ bagit <operation> [operation arguments]” ! (per my question on GitHub about this, apparently this problem is hard-coded and would require recompiling the Java source rather than just tweaking a doc, so since bagit-java CLI isn’t actively maintained, no fix is forthcoming)

Screen Shot 2017-12-15 at 11.54.15 AM.png

3. bagit-python (library + CLI)

So, this section is really less an update on bagit-python and more an update on python itself. Bagit-python can still be used either as a library to integrate into scripts and applications written in python, or as stand-alone command line utility. Your preference for using bagit-java or bagit-python in the CLI could be decided by looking at both utilities’ help pages. In either case, if you are interested in using/installing bagit-python, changes in recent macOS versions have meant that my previous instructions created more headaches than intended.

(Thanks to the brave MIAP students in Video Preservation who discovered and tried to deal with these inconsistencies!!)

So, for explanation: starting with OSX 10.11 (El Capitan), Apple introduced a feature called System Integrity Protection, nominally to keep unverified or malevolent applications downloaded from the internet from messing with critical OS-installed system software. What this means is, without futzing around a lot with permissions (which is not a great idea for a novice user), using a package manager like Homebrew winds up with some software in the OS-controlled “/usr” directory and its subfolders, and some software in the user- or package-manager-controlled “/usr/local” directory and its subfolders.

My previous instructions, which directed people to mix the default macOS-installed version of Python with the user-installed versions of pip (python’s package manager) and bagit-python, generated a whole bunch of permissions issues.

The solution? Stay away from the macOS python altogether and install all components with a package manager to keep the installation contained within “/usr/local”.

So, assuming you have Homebrew installed:

  1. $ brew install python

    This will install Homebrew’s Python 2.x package (currently 2.7.14), which includes the pip package manager by default (macOS’ Python package does *not* include pip by default). Note however! Since your Mac already came with a python installation (at /usr/bin/python), Homebrew renames its versions “python2” and “pip2” to avoid confusion/overwriting. (so its commands/binaries live at /usr/local/bin/python2 and /pip2)

  2. $ sudo pip2 install bagit

    Although “sudo” shouldn’t be necessary here, I’ve encountered some errors when not using it. So I think “pip2” may still mess with some files outside of /usr/local. Best to err on the side of using “sudo” and entering your admin password – there shouldn’t be any issue with doing so.

    That’s it! bagit-python is now installed. You can invoke bagit-python commands with

$ bagit.py [path/to/directory]

just as before. Check out the help page with the “–help” flag for more info.

(If you already tried to install bagit-python with the previous instructions, you will likely need to do some cleanup in the /usr folder to clear everything out and stop throwing errors. If you need help or advice doing this, feel free to get in touch!!)

4. BagIt for Ruby (library + CLI)

The “bagit-ruby” implementation has been expanded and documented since last year! If you are interested in including a BagIt module in a Ruby application/script, or using this version via the command line, you’ll first need to install Ruby:

$ brew install ruby

Which will include Ruby’s built-in package manager, gem.

$ gem install bagit validatable

Note that you can’t install this Ruby package and the Homebrew package of bagit-java at the same time, as you’ll get a collision with them named the same thing in /usr/local/bin. Once downloaded/installed with gem, the BagIt for Ruby CLI is documented at:

$ bagit –help

….but it’s real basic, even compared to the CLI for bagit-python. This particular implementation is probably most ideal for its library and incorporation in Ruby scripts/apps, not necessarily for direct command line interfacing.

5. Exactly (GUI)

Not much more to say about AVPreserve’s packaging/transfer application since last year – but the combined ability to not just bag, but deliver or receive directories over standard network protocols still make it a great option for those on Mac or Windows and in need of a simple workflow that combines two major ingest steps (bagging and delivery) into one quick and easy tool.

6. bagger-js (experimental library + web app)

Likewise, the LoC’s BaggerJS library/app could serve as both bagging and delivery system, via a web browser interface instead of a stand-alone, downloaded app. It’s basically “bagit-javascript” – that is, the BagIt library written in JavaScript (which is a web programming language entirely separate from Java). I assume it’s referred to as “bagger-js” because in the LoC’s naming system, “bagger” implies a GUI, whereas “bagit” is just the underlying library or CLI.

Bagger-js is still referred to in the LoC GitHub repo as “experimental”, so the library and accompanying demo web interface (which can bag a local directory and send it to a remote server compliant with Amazon’s s3 protocol) are not production-ready like Bagger or the other BagIt libraries/interfaces. But, again, all the work that they’ve done so far is right there and available to adapt/incorporate into your own JavaScript/web app projects!

7. other apps

Of course, there are likely a number of applications or other pieces of software that incorporate BagIt as one piece or microservice of a larger workflow/system. Archivematica’s a major one that I’m aware of. Maybe you have another! Feel free to let me know what I’ve missed.

 

Using Bagit

do you know how to use Bagit?
cause I’m lost
the libcongress github is sooo not user friendly

I’m missing bagger aren’t I
ugh ugh I just want to make a bag!

That’s an email I got a few weeks back from a good friend and former MIAP classmate. I wanted to share it because I feel like it sums up the attitude of a lot of archivists towards a tool that SHOULD be something of a godsend to the field and a foundational step in many digital preservation workflows – namely, the Library of Congress’ BagIt.

What is BagIt? It’s a software library, developed by the LoC in conjunction with their partners in the National Digital Information Infrastructure and Preservation Program (NDIIPP) to support the creation of “bags.” OK, so what’s a bag?

amerbeauty_165pyxurz

Let’s back up a minute. One of the big challenges in digital archiving is file fixity – a fancy term for checking that the contents of a file have not been changed or altered (that the file has remained “fixed”). There’s all sorts of reasons to regularly verify file fixity, even if a file has done nothing but sit on a computer or server or external hard drive: to make sure that a file hasn’t corrupted over time, that its metadata (file name, technical specs, etc.) hasn’t been accidentally changed by software or an operating system, etc.

But one of the biggest threats to file fixity is when you move a file – from a computer to a hard drive, or over a server, or even just from one folder/directory in your computer to another. Think of it kind of like putting something in the mail: there are a lot of points in the mailing process where a computer or USPS employee has to read the labeling and sort your mail into the proper bin or truck or plane so that it ends up getting to the correct destination. And there’s a LOT of opportunity for external forces to batter and jostle and otherwise get in your mail’s personal space. If you just slap a stamp on that beautiful glass vase you bought for your mother’s birthday and shove it in the mailbox, it’s not going to get to your mom in one piece.

screen-shot-2016-09-20-at-10-04-35-am
And what if you’re delivering something even more precious than a vase?

So a “bag” is a kind of special digital container – a way of packaging files together to make sure what we get on the receiving end of a transfer is the same thing that started the journey (like putting that nice glass vase in a heavily padded box with “fragile” stamped all over it).

Great, right? Generating a bag can take more time than people want, particularly if you’re an archive dealing with large, preservation-quality uncompressed video files, but it’s a no-brainer idea to implement into workflows for backing up/storing data. The thing is, as I said, the BagIt tools developed by the Library of Congress to support the creation of bags are software libraries – not necessarily in and of themselves fully-developed, ready-to-go applications. Developers have to put some kind of interface on top of the BagIt library for people to actually be able to actually interact and use it to create bags.

So right off the bat, even though tech-savvier archivists may constantly be recommending to others in the field to “use BagIt” to deliver or move their files, we’re already muddling the issue for new users, because there literally is no one, monolithic thing called “BagIt” that someone can just google and download and start running. And I think we seriously underestimate how much of a hindrance that is to widespread implementation. Basically anyone can understand the principles and need behind BagIt (I hopefully did a swift job of it in the paragraphs above) – but actually sifting through and installing the various BagIt distributions currently takes time, and an ability to read between the lines of some seriously scattered documentation.

So here I’m going to walk through the major BagIt implementations and explain a bit about how and why you might use each one. I hope consolidating all this information in one place will be more helpful than the Library of Congress’ github pages (which indeed make little effort to make their instructions accessible to anyone unfamiliar with developer-speak). If you want to learn more about the BagIt specification itself (i.e. what pieces/files actually make up a bag, how data gets hierarchically structured inside a bag, how BagIt checksums and tag manifests to do the file fixity work I mentioned earlier), I can recommend this introductory slideshow to BagIt from Justin Littman at the LoC.

Update (12/15/2017): While all the above info still stands, the roundup and installation instructions below are no longer 100% accurate. I’m keeping this post up for the sake of web archiving and laying the ever-changing state of digital preservation bare and all that, but if you’re here I’d now recommend that you proceed over to this post on using BagIt in 2018 for more up-to-date documentation!

1. Bagger (BagIt-java)

screen-shot-2016-09-20-at-9-27-12-am

The BagIt library was originally developed using a programming language called Java. For its first four stable versions, Bagit-java could be used either via command-line interface (a Terminal window on Macs or Linux/Ubuntu, Command Prompt in Windows), or via a Graphical User Interface (GUI) developed by the LoC itself called Bagger.

As of version 5 of Bagit-java, the LoC has completely ceased support of using BagIt-java via the command line. That doesn’t mean it isn’t still out there – if, for instance, you’re on a Mac and use the popular package manager Homebrew, typing

$ brew install bagit

will install the last, stable version (4.12.1) of BagIt-java. But damned if I know how to actually use it, because in its deprecation of support the LoC seems to have removed (or maybe just never wrote?) any online documentation (github or elsewhere) of how to use BagIt-java via command-line. No manual page for you.

Instead you now have to use Bagger to employ BagIt-java (from the LoC’s perspective, anyway). Bagger is primarily designed to run on Windows or, with some tinkering, Linux/Ubuntu and Mac OSX.

maxresdefault
They do, funnily enough, include a screed about the iPhone 7’s lack of a 3.5mm headphone jack.

So once you actually download Bagger, which I primarily recommend if you’re on Windows, there’s some pretty good existing documentation for using the application’s various features, and even without doing that reading, it’s a pretty intuitive program. Big honking buttons help you either start making a new bag (picking the A/V and other files you want to be included in the bag one-by-one and safely packaging them together into one directory) or create a “bag in place”, which just takes an already-existing folder/directory and structures the content within that folder according to the BagIt specification. You can also validate bags that have been given/sent to you (that is, check the fixity of the data). The “Is Bag Complete” feature checks whether a folder/directory you have is, in fact, officially a bag according to the rules of the BagIt spec.

(FWIW: I managed to get Bagger running on my OSX 10.10.5 desktop by installing an older version, 2.1.3, off of the Library of Congress’ Sourceforge. That download included a bagger.jar file that my Mac could easily open after installing the legacy Java 6 runtime environment (available here). But, that same Sourceforge page yells at you that the project has moved to Github, where you can only download the latest 2.7.1 release, which only includes the Windows-compatible bagger.bat file and material for compiling on Linux, no OSX-compatible .jar file. I have no idea what’s going on here, and we’ve definitely fallen into a tech-jargon zone that will scare off laypeople, so I’m going to leave it at “use Bagger with Windows”)

Update: After some initial confusion (see above paragraph), the documentation for running Bagger on OSX has improved twofold! First, the latest release of Bagger (2.7.2) came with some tweaks to the github repo’s documentation, including instructions for Linux/Ubuntu/OSX! Thanks, guys! Also check out the comments section on this page for some instructions for launching Bagger in OSX from the command-line and/or creating an AppleScript that will let you launch Bagger from Spotlight/the Applications menu like you would any other program.

2. Exactly

screen-shot-2016-09-20-at-9-29-04-am
Ed Begley knows exactly what I’m talking about

Developed by the consulting/developer agency AVPreserve with University of Kentucky Libraries, Exactly is another GUI built on top of Bagit-java. Unlike Bagger, it’s very easy to download, immediately install and run versions of Exactly for Mac or Windows, and AVPreserve provides a handy quickstart guide and a more detailed user manual, both very useful if you’re just getting started with bagging. Its features are at once more limited and more expansive than Bagger. The interface isn’t terribly verbose, meaning it’s not always clear what the application is actually doing from moment to moment. But Exactly is more robustly designed for insertion of extra metadata into the bag (users can create their own fields and values to be inserted in a bag’s bag-info.txt file, so you could include administrative metadata unique to your own institution).

And Exactly’s biggest attraction is that it actually serves as a file delivery client – that is, it won’t just package the files you want into a bag before transferring, but actual perform the transfer. So if you want to regularly move files to and from a dedicated server for storage with minimal fuss, Exactly might be the tool you want, albeit it could still use some aesthetic/verbosity design upgrades.

3. Command Line (BagIt-python)

screen-shot-2016-09-20-at-9-02-09-am

Let’s say you really prefer to work with command-line software. If you’re comfortable with CLI, there are many advantages – greater control over your bagging software and exactly how it operates. I mentioned earlier that the LoC stopped supporting BagIt-java for command-line, but that doesn’t mean you command-line junkies are completely out of luck. Instead they shifted support and development of command-line bagging to a different programming language: Python.

If you’re working in the command-line, chances are you’re on Mac OSX or maybe Linux. I’m going to assume from here on you’re on OSX, because anyone using Linux has likely figured all this out themselves (or would prefer to). And here’s the thing, if you’re a novice digital archivist working in the Terminal on OSX: you can’t install BagIt-python using Homebrew.

Instead, you’re going to need Python’s own package manager, a program called “pip.” In order to get Bagit-python, you’re going to need to do the following:

  1. Check what version of Python is on your computer. Mac OSX and Linux machines should come with Python already installed, but Bagit-python will require at least Python version 2.6 or above. You can check what version of Python you’re running in the terminal with:$ python ––version

    If your version isn’t high enough, visit https://www.python.org/downloads/ and download/install Python 2.7.12 for your operating system. (do not download a version of Python 3.x – Bagit-python will not work with Python 3, as if this wasn’t confusing enough)

  2. Now you’ll need the package manager/installer, pip. It may have come ready to go with your Python installation, or not. You can check that you have pip installed with:$ pip ––version

    If you get a version number, you’re ready to go to step 3. If you get a message that pip isn’t installed, you’ll have to visit https://pip.pypa.io/en/stable/installing/Click on the hyperlinked “get-pip.py”. A tab should open with a buncha text – just hit Command-S and you should have the option to save/download this page as a Python script (that is, as a .py file). Then, back in the Terminal, navigate into whatever directory you just downloaded that script into (Downloads perhaps, or Desktop, or wherever else), and run the script by invoking$ python get-pip.py

    Pip should now be installed.

     

  3. Once pip is in place you can just use it to install Bagit-python the same way you use Homebrew:$ pip install bagit               (sudo if necessary)

You should be all set now to start using Bagit-python via the command-line. You can invoke Bagit-python using commands that start with “bagit.py” – it’s a good idea to go over command line usage and options for adding metadata by visiting the help page first, which is one of the LoC’s better efforts at documentation: https://github.com/LibraryOfCongress/bagit-python or:

$ bagit.py –help

But the easiest usage is just to create a bag “in place” on a directory you already have, with a flag for whichever kind of checksum you want to use for fixity:

$ bagit.py –md5 /path/to/directory

As with Bagger, the directory will remain exactly where it is in your file system, but now will contain the various tag/checksum manifests and all the media files within a “data” folder according to the BagIt spec. Again, the power of command-line BagIt-python lies in its flexibility – the ability to add metadata, choose different checksum configurations, increase or decrease the verbosity of the software. If you’re comfortable with command-line tools, this is the implementation I would most recommend!

Update:  Please also read this terrific tutorial by Kathryn Gronsbell for the 2016 NDSR symposium for a more detailed rundown of BagIt-python use and installation, including sample exercises!!!

4. BaggerJS

screen-shot-2016-09-20-at-9-31-51-am

It’s still in an early/experimental phase, but the LoC is also working on a web-based application for bagging called BaggerJS (built on a version of the BagIt library written in JavaScript, which yes, for those wondering at home, is a totally different programming language than Java that works specifically for web applications, because we needed more versioning talk in this post).

Right now you can select and hash files (generate checksums for file fixity), and upload valid bags to a cloud server compatible with Amazon’s s3 protocol. Metadata entry and other features are still incomplete, but if development continues, this might, like Exactly, be a nice, simplified way to perform light bag transfers, particularly if you use a cloud storage service to back up files. It also has the advantage of not requiring any installation whatsoever, so novice users can more or less step around around the Java vs. Python, GUI vs. CLI questions.

https://libraryofcongress.github.io/bagger-js/

5. Integrated into other software platforms/programs

The BagIt library has also been integrated into larger, more complex software packages, designed for broader digital repository management. Creating bags is only one piece of what these platforms are designed to do. One good example is Archivematica, which can perform all kinds of file conformance checks, transcoding, automatic metadata generation and more. But it does package data according to the BagIt spec whenever it actually transfers files from one location to another.

And that’s the other, more complicated way to use the BagIt library – by building it into your own software and scripts! Obviously this is a more advanced step for archivists who interested in coding and software development. But the various BagIt versions (Java, Python, JavaScript) and the spec itself are all in the public domain and anyone could incorporate them into their own applications, or recreate the BagIt library in the programming language of their choice (there is, for instance, a BagIt-ruby version floating around out there, though it’s apparently deprecated and I’ve never heard of anyone who used it).