The Cloud Is Just Someone Else’s 10,000 Computers

I don’t need to be the umpteenth person to tell you, in dramatically vague terms, that cloud computing and software-as-a-service have (*Don LaFontaine voice*) changed the very society we live in. But every now and then I am reminded that the Big Tech consolidation and movement of everything online in the last ~15 years has fundamentally obstructed and obscured how computers work and most people’s understanding of what’s happening in/on the device right in front of them. [1]

And as individuals and institutions across the board, from enterprise-level research universities to local collectives, have quickly come to rely on cloud-based file hosting, I think it has become absolutely critical for archivists – whether it’s in the context of picking one’s own storage options for a digital preservation and access plan, or sifting through acquisition, management, and organization of other folks’ collections – to understand what these services actually are and what they are providing. I’ve overheard requests for an “open source alternative” to Google Drive or Dropbox a couple of times recently, and it both encourages and troubles me a bit: encouraging because folks are questioning the ubiquity and motivation of these companies and services, but troubling because it means cloud-based companies have rather successfully obscured not only where people’s files and data live, but the mélange of software we use to interact with them. In essence, an “open-source alternative” to Google Drive does not exist; or at least, it doesn’t exist in the same way that we commonly talk about open-source alternatives to proprietary and expensive desktop software products (Adobe Premiere vs. Kdenlive, GarageBand vs. Audacity, Microsoft Office vs. LibreOffice, etc.).

Don’t SaaS Me

I will repeatedly use the word “service” in this post, and that’s not accidental. As part of a productivity suite (including the other “Google Workspace” products, like Google Docs, Google Sheets, Google Photos, etc), Google Drive and its equivalents are not just software, the way Finder or Windows File Explorer are pieces of software, running on your computer, that allow you to browse and manipulate your files. They are examples of what’s called Software as a Service (SaaS), an access and delivery model where, instead of downloading and installing a particular program to your computer, [2] you log in and use a software platform that runs on another computer (often, though as we’ll see not exclusively, in return for some kind of subscription fee or license). The software platform itself, and often the data you’re manipulating (Word files, spreadsheets, presentations, for example), is not actually stored and running on the computer you’re using to access the service – it’s on someone else’s, a server. To be even more specific to the current technological moment and not 20-year-old definitions of computing, it’s running in “the cloud”, which is not precisely “someone else’s computer” as the joke goes, but more as my title suggests, “someone else’s data center(s) with potentially thousands of servers working in concert” (scale is going to be a running theme here, and it’s important).

So when we talk about using these kinds of monolithic file hosting and productivity platforms – I’ll stick with Google Drive for this post as a very representative and relevant example, but this also encompasses, for instance, iCloud, Microsoft OneDrive and Office 365, Adobe Creative Cloud, Dropbox, Box, etc. – using them actually encompasses four [3] different things:

1. Storage space (GBs and TBs of drive space on which to put data; likely when you get down to it these are hard disk drives, but we could also be talking about solid state drives or even tape depending on precisely the service/price/purpose)

2. Computing power (processors and RAM on which to run software)

3. Server software (the software that runs on the cloud computing power and that actually manages/interacts with your files; with proprietary services, you will basically never as a user actually directly see or interact with this platform, in Google Drive’s case only Google’s internal developers and sysadmins know what this looks like or how to install, run, debug it, etc.)

4. Client software (the software that users actually see and use to control/direct the server application; in Google Drive’s case, the most commonly-used client is the web interface you see when you log in via a browser to drive.google.com, meaning this web client also runs on Google’s servers)

Because the last piece there is basically the only one that is visible to folks in any meaningful way, there can be a conflation of the client with the totality of the service. But putting all these four factors together is why, to put it bluntly, there is no “open source alternative” to Google Drive: in particular there is no such thing as open-source storage space or computing power. Those are material/physical resources that you either have or you don’t – and Google has them in spades, while the typical computer/SaaS user doesn’t.

Compounding this is the capitalist trap paradox that the more you use these platforms (again, either as an individual or an organization) the more difficult it will be to extricate yourself and your stuff, because you have been invisibly relying more and more on the time, money, and knowledge it takes to manage storage space and computing power – especially communally-used storage and computing power, as is the case with a file sharing service accessed by potentially many staff, patrons, and/or community members within one organization alone. Software, no matter how openly or ethically made, cannot replace those considerations on its own.

But! This is not to despair, nor disparage the instinct that I know is leading folks to ask such questions and seek alternatives. For many, many (good!) reasons, archivists are looking to divorce themselves and their work as much as possible from Big Tech companies. I do not mean to say “this is impossible” and leave it at that, but rather elaborate on exactly what currently available options are, and a glimmer of the effort and tradeoffs involved with them.

The Disclaimer

I will, for the purposes of this post, be assuming a thing or two, namely that when folks say they want to avoid “Big Tech”, they mean proprietary SaaS, file hosting, and cloud computing options from a certain cadre of outsized companies including: Google, Microsoft, Amazon, Apple, Oracle, Dropbox. In the course of doing so I will mention other options, potentially including paid services offered by smaller companies (“smaller”, in some of these cases, being an extremely relative term).

This should not be read as a full-throated endorsement, paid advertisement for, or suggestion that these companies/services are not still part of the problem. You know – *gestures wildly* – the problem. Sometimes we just need the footing to investigate options, start a conversation, and move the needle a bit, particularly in work or communal settings. That’s all I’m aiming for here. For more well-rounded critique, the larger social implications of all cloud computing, and envisioning and embodying true alternatives, may I humbly encourage you go read something else, like Logic Magazine. [4]

Switching Clients

The good news – maybe there’s no open-source Google Drive, but there are open-source Google Drive clients that you can switch to, right now, today, with absolutely no disruption, changes, or migration of your files necessary. Just as you are not actually limited to interacting with files hosted on Drive with Google’s browser-based web app – you can also install and use their official desktop client, which, as the name implies, runs on *your* desktop/laptop’s computing power rather than Google’s servers; or use their mobile app client, which does the same but on your phone – you are not limited here by Google-made clients at all.

Open-source clients take advantage of Google Drive’s public API, which means that as long as you have a Drive account and can provide the credentials to that account, any piece of client software can control/pass commands to Drive’s *server* application to perform certain tasks with your files. You are still taking advantage of Google’s resources (storage space, computing power, server-based file management platform), but you can also take advantage of features that SaaS companies like Google don’t always directly offer or intend with their own client software.
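To make “any piece of client software can pass commands to the server” a little more concrete, here is a minimal sketch of what a client does under the hood: it builds an authenticated HTTP request against the Drive API’s file-listing endpoint. This is only an illustration, not how any particular client is implemented. The function names are mine, the access token is a placeholder, and actually obtaining one (through Google’s OAuth flow) is assumed and not shown.

```python
import json
import urllib.parse
import urllib.request

# Drive API v3 endpoint for listing the files in an account.
DRIVE_FILES_ENDPOINT = "https://www.googleapis.com/drive/v3/files"


def build_list_request(access_token, page_size=10):
    """Build (but do not send) a request asking Drive to list files.

    `access_token` is assumed to be an OAuth 2.0 token you already hold.
    """
    query = urllib.parse.urlencode({
        "pageSize": page_size,
        # Ask only for the metadata fields we care about.
        "fields": "files(id,name,md5Checksum,modifiedTime)",
    })
    return urllib.request.Request(
        f"{DRIVE_FILES_ENDPOINT}?{query}",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def list_files(access_token):
    """Send the request and return the list of file metadata records."""
    with urllib.request.urlopen(build_list_request(access_token)) as resp:
        return json.load(resp)["files"]


if __name__ == "__main__":
    request = build_list_request("PLACEHOLDER_TOKEN")
    print(request.get_method(), request.full_url)
```

Everything a third-party client offers is, in the end, layered on top of requests like this one; the server software and the storage underneath are still Google’s.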

This may be particularly useful for archival or preservation-minded organizations, who often have use cases that Google doesn’t seek to serve (because our usage/business is at a scale that pales compared to general office productivity, personal file backup, education, etc). That might include more stable transfers than the upload/download options offered by a web browser, or automatically checking file fixity.
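As a rough illustration of what “checking file fixity” means in practice, the sketch below computes an MD5 checksum for a local file and compares it with a checksum reported by the remote service. I’m assuming here that the service exposes one; the Drive API, for instance, reports an `md5Checksum` field for ordinary uploaded (binary) files, though not for native Google Docs formats. The function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to spare memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, reported_checksum):
    """Compare the local checksum against the one the service reports."""
    return file_md5(path) == reported_checksum.strip().lower()


if __name__ == "__main__":
    # Demonstrate with a throwaway file instead of a real collection item.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.bin"
        sample.write_bytes(b"hello")
        local = file_md5(sample)
        print("checksums match:", fixity_ok(sample, local))
```

A client built for preservation work can run this kind of comparison after every transfer, which is exactly the sort of feature a browser upload form won’t give you.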

Open-source clients also tend to be designed more generically or comprehensively to hook into multiple cloud storage platforms. So you can use them to manage files on multiple services, e.g. Google Drive and Dropbox, or even transfer between them, without getting locked into using every vendor/SaaS’ client separately. That in turn helps for cleaner and more protected workflows – over time, as platforms change their clients and service offerings (i.e. pricing, limitations on storage space/computing power, privacy or ToS) you and your org/community can step back from these vagaries a bit and make smaller tweaks to settings or configurations (or even just move over to another hosting platform, if need be) rather than completely re-learning and re-training.

(Not that open-source software doesn’t *also* change design, features, or workflows, and such cycles still need to be taken into account; but these changes are likely not made from a place of planned/forced obsolescence or pushing you into more profitable behaviors, which in my experience leads to more gradual changes and longer tails of backwards-compatibility.)

  • rclone
    A personal favorite – rclone is modeled on basic Unix command-line tools (especially rsync) for an experience tailor-made for using cloud services in all manner of situations. You can manually control transfer speeds (for uploading/downloading over low bandwidth without losing or cancelling transfers), easily preserve timestamps and other file system data, encrypt sensitive files, and more. And their docs are great: there are example configurations on their site for a large number of the most popular file hosting services. Rclone itself has an experimental GUI built in now, but I personally prefer Rclone Browser for day-to-day use.

  • Cyberduck
    This is a pretty old-school client that goes back to the early aughts and was originally meant for uploading/downloading data to personal or web servers over FTP/SFTP, but seems to have successfully made the transition over to working with cloud services.

  • Duplicati
    Duplicati is specifically designed for encryption and creating backups of local files on external storage/remote servers (such as cloud services), so it doesn’t seem to be a full-fledged file manager per se. But, for archivists, orgs, or individuals looking for a way to primarily just use Google Drive or similar services as a remote storage option and not do a lot of sharing or manipulation of files at rest (that is, you just want a secure backup for your second or third copy of certain files/materials), it seems like a pretty good option for managing that.
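For a sense of what working with one of these clients looks like, here is a small sketch that assembles rclone commands for a bandwidth-limited, checksum-verified copy, followed by a verification pass. It assumes rclone is installed and that you have already configured a remote with `rclone config`; the remote name `gdrive:` and the paths are made up for illustration. By default the copy command is built with `--dry-run`, so it would only report what it would transfer.

```python
import shlex
import subprocess


def rclone_copy_command(source, dest, bwlimit=None, dry_run=True):
    """Assemble an `rclone copy` command as an argument list."""
    # --checksum compares files by checksum and size, not modification time.
    cmd = ["rclone", "copy", source, dest, "--checksum"]
    if bwlimit:
        # Cap transfer speed, e.g. "1M", to be gentle on a slow connection.
        cmd += ["--bwlimit", bwlimit]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def rclone_check_command(source, dest):
    """Assemble an `rclone check` command to compare source and destination."""
    return ["rclone", "check", source, dest]


def run(cmd):
    """Run a command and raise if it fails (requires rclone on your PATH)."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "gdrive:" is a hypothetical remote name set up beforehand.
    copy_cmd = rclone_copy_command("/data/collection", "gdrive:collection", "1M")
    check_cmd = rclone_check_command("/data/collection", "gdrive:collection")
    print(shlex.join(copy_cmd))
    print(shlex.join(check_cmd))
```

Swapping `gdrive:` for a remote pointing at a different provider is, in principle, the only change needed to move the same workflow elsewhere, which is the whole appeal.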

Hack the Platform

Let’s say though that an open source client isn’t enough – you want to divorce from the Google Drive platform altogether and not still be invisibly relying on the Drive server software. Yes, open source file hosting platforms (server + client) exist!

The issue here is that, unlike clients which can be run on any old laptop/desktop/mobile device, server software requires – well, a server. So if you want to make the leap and use such platforms, you may need to learn something about server administration, network configuration, etc. Plus again, you need the computing power and storage space to run the software and store the data, whether that’s server hardware that you actually control/own like a desktop NAS unit, or going back to a cloud service (something like Google Cloud rather than Drive, a service that provides computing infrastructure but no particular application or software running on it). I’ll elaborate on options for the latter in a moment.

But overall I want to highlight that this is the step where we’re potentially starting to talk about time and effort to learn *new* technological skills in order to take advantage of this software, rather than just open source tools that build on or unlock skills, expertise, and workflows that you already have as a digital archivist.

  • Nextcloud
    Again I’m starting here with the platform I myself use and am most familiar with. Nextcloud’s probably the biggest name in “Google Drive” alternatives in FOSS-world (at least in the U.S.), and it definitely aims to go pound-for-pound against Drive by offering not just file storage/backup but tons of sharing options, integrations, and (as of fall 2021) an out-of-the-box office productivity suite that lets you collaboratively create and edit documents, spreadsheets and presentations in the built-in web client, just like Google Docs/Sheets/Slides. Personally I’ve used mine to sync my most important and frequently-used documents across desktops; share digitized videos with family; draft blog posts with contributors for this very site; automatically back up photos from my phone; and set up a personal music streaming service. It’s pretty great, and that’s all just for me, ignoring much of the multi-account/user options.

  • ownCloud
    Nextcloud is actually a fork of ownCloud, so they are extremely similar and many plugins/integrations, configuration settings, and other aspects of Nextcloud/ownCloud management seem completely cross-compatible. The history of the two projects is kind of murky to me as an outsider/onlooker – the split seems to have occurred over disagreements in business modeling and licensing [5], resulting in two platforms that look and behave in much the same ways but serve slightly different communities, with ownCloud skewing toward enterprise-level business and support. Anecdotally it seems like ownCloud is a bit more popular/adopted in Europe as well. I’ll link a breakdown between the two here if you are interested and really want to get into the nitty-gritty of the differences between the two.

  • Seafile
    Like ownCloud, Seafile offers both a completely free and open source “community edition” as well as a paid tier business/enterprise option. Seafile was originally created in China and seems to have been primarily adopted in Asian and European markets; in fact its current lack of presence in the U.S. may at least be partly due to its Chinese and German partner organizations squabbling over the rights to Seafile’s U.S. trademark/intellectual property. I know little about Seafile but it seemed worth mentioning.
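Because these platforms are server software with documented interfaces, you are also free to script against them yourself. As a sketch, Nextcloud exposes files over WebDAV, so an upload is an HTTP PUT to a path under `remote.php/dav/files/`. The host, username, and app password below are placeholders, the function name is mine, and the details on ownCloud or Seafile will differ, so treat this as an illustration rather than a recipe.

```python
import base64
import urllib.parse
import urllib.request


def build_upload_request(base_url, username, app_password, remote_path, data):
    """Build (but do not send) a WebDAV PUT request for a Nextcloud server."""
    user = urllib.parse.quote(username)
    path = urllib.parse.quote(remote_path.lstrip("/"))
    url = f"{base_url.rstrip('/')}/remote.php/dav/files/{user}/{path}"
    # HTTP Basic auth; an app password is preferable to your main password.
    credentials = f"{username}:{app_password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    request = build_upload_request(
        "https://cloud.example.org",  # hypothetical server address
        "archivist",                  # hypothetical account
        "app-password-goes-here",
        "Backups/finding aid.pdf",
        b"file contents would go here",
    )
    print(request.get_method(), request.full_url)
    # Sending it would be: urllib.request.urlopen(request)
```

The difference from the Drive case is who answers that request: a server you (or a host you chose) installed, can inspect, and can move.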

Mo Hosting Mo Problems

All right, now as I’ve repeatedly mentioned, if you’re interested in a Google Drive platform alternative like Nextcloud/ownCloud/Seafile, you’re going to need to invest in some hardware to go with your shiny new open source software. But let’s say you don’t have the physical space, time, or uninterrupted power supply to run and configure your own actual server/rack. That’s what cloud computing is supposed to offer: access to someone else’s hardware resources.

Of course, in our efforts to get away from prominent SaaS file hosting solutions like Google Drive, this leads us right back to the same suspects: there’s a bit of a paradox, ethically-speaking, in setting up your personal or organizational Nextcloud platform on Google Cloud, or Microsoft Azure, or Amazon Web Services. This is also where the whole scale and model of cloud computing runs into problems in a capitalist society in general: if you’re big enough to offer cloud services at a useful/reasonable price point, there’s a not-insignificant chance you are: 1) still contributing to climate collapse and/or 2) about to get swallowed/bought up by one of those three or four giants anyway.

So, take all of this with a grain of salt, but if I were at a minimum just looking into options for getting away from Google, I would look at:

  • DigitalOcean
    Disclosure: the web site you’re reading right now is hosted on DigitalOcean (as is my personal Nextcloud server I mentioned earlier). I like the clear pricing models and above all the community around DO – their documentation and tutorials, written by both staff and community members, are a terrific place to start if you’re interested in learning about system/server administration or just need some clear instructions for setting up particular, popular platforms (like say, Nextcloud).

  • Linode
    Linode also comes up a lot as an AWS/Azure/Google Cloud alternative – I tried them once and I admit pricing and usage was all extremely similar to DO, so I have no real comparison to make here except, as I’ve mentioned, that I’ve preferred DO’s documentation (though there’s absolutely nothing preventing you from using DO’s tutorials to set up open source software on Linode’s infrastructure!)

  • Backblaze
    Both Backblaze and the next service here specialize more in cloud storage than computing; which is to say, to my knowledge they don’t offer virtual servers and computing power (processors/RAM) the way DigitalOcean and Linode do; just remote storage management and client software to allow syncing/backing up your files. That means you’re not going to, e.g. install Nextcloud on Backblaze and get a lot of productivity/sharing options, a complete “Google Drive” equivalent – but if you’re the type of person or organization that’s simply using Google Drive or Dropbox for remote storage and backup in the first place, and largely ignoring the other features of the Google Workspace SaaS, then maybe this (which can be used in combination with one of the open source clients I mentioned earlier) is the kind of “alternative” service you’re looking for.

  • Wasabi
    Again, like Backblaze, Wasabi specializes in cloud storage, and the company’s very quick rise in this space (since only 2018) has a lot to do with their dirt-cheap pricing model. Take that as you will.

  • Reclaim Hosting
    Though they are technically piggy-backing on DigitalOcean’s cloud computing infrastructure, I did want to shout out Reclaim Hosting as an option that the sort of folks likely to be reading this – educators, non-profit/cultural heritage practitioners, students – should probably be aware of. In essence Reclaim packages support/service plans on top of DigitalOcean’s cloud computing infrastructure, taking a lot of the nitty-gritty of self-hosting SaaS platforms (domain management, setting up firewalls and network security, installing the actual server software like WordPress, Drupal, Omeka) either out of your hands entirely or at least simplifying it.

    And if you don’t specifically have ties to the educator/academia community that Reclaim is targeting, know that there are similar kinds of services out there: essentially, SaaS companies running open source platforms for you but managing the cloud computing infrastructure. A good place to start is looking at the websites for one of the open source platforms I mentioned above and looking for their “partners” or “hosting providers” – third-party companies that run these open source options on their servers and offer you access, basically the same way Google does with Drive. Pricing and terms of service may vary wildly depending where the company is located and other vagaries, so you’re always going to want to pay attention to the fine print, particularly if you’re working with any sensitive digital collection or community data.

The Part Where I Ask *You* Questions

None of the sections above are meant to be comprehensive. I’ve listed some tools and services that I have at least a passing familiarity with so that if anyone’s interested, I could maybe answer particular questions or chat more about my experience with them. But above all I’m offering these as examples, and trying to ground some of the vocabulary and features of these services, so that folks have a better idea of how to look for and evaluate the right combination of software and computing services that works for your stuff and situation.

With all of these tools and SaaS options, whether you pursue alternatives or stick with a monolith like Google Drive, remember to keep one eye peeled behind the curtain. Some useful things to ask yourself in any file hosting project or evaluation:

  • Where are my files actually stored?
  • Where is the software I’m using running (and who’s running/managing/maintaining it)?
  • Who else needs to access these files, when, and for what purpose? Do I need to be able to collaboratively manage and edit files, or do I just need a backup?
  • How much storage space do I actually need?
  • Do I prefer working with files on my desktop or in a browser? Both?
  • Do I have time and capacity (including $) to learn more about computing and self-hosting software – or do I just need a quick, trusted service solution (also $, there’s really no way to get around the $)?
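On the “how much storage space do I actually need?” question, a quick way to get a real number is to total up what you already have. Here is a minimal sketch, assuming the material currently lives in a folder on a local or mounted drive (the example path is made up):

```python
import os


def tree_size_bytes(root):
    """Add up the sizes of all regular files under `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so nothing is counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human_readable(num_bytes):
    """Format a byte count using decimal units, as storage vendors do."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1000 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1000


if __name__ == "__main__":
    root = "/data/collection"  # hypothetical path; point this at your files
    print(root, human_readable(tree_size_bytes(root)))
```

Multiply the result by however many copies you plan to keep, leave some room for growth, and you have a figure to hold up against any provider’s pricing page.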

Footnotes
1 Google Drive’s native file “formats” being another fabulous example, but that is a post for another day
2 Say, via Homebrew
3 There is a fifth factor that I could discuss here – bandwidth – which encompasses the amount and speed at which you can move data to, from, and within these platforms. But that’s really getting into the weeds, potentially brings in complications like your Internet Service Provider, and frankly IMHO isn’t going to be a high-impact decision point for individuals or small-scale orgs. If you’re looking to get your mid- to large-scale institution away from Amazon Web Services, it’s going to come into play; but I’ll leave it out of “what do we do about Google Drive”.
4 I shit you not, I had no idea that their latest issue (16) is literally called “Clouds” when I first wrote this.
5 The unfortunate and basically unanswerable kind of fight that annoyingly erupts from time to time in open source communities over who is really behaving according to “the spirit” of free or open software
