Progress Report: The Data Lifeboat viewer, circa 2024

In my previous post, I showed you our prototype Data Lifeboat creation workflow. At the end of the workflow, we’d created a Data Lifeboat we could download! Now I want to show you what you get inside the Data Lifeboat package.

Design goals

When we were designing the contents of a Data Lifeboat, we had the following principles in mind:

Self-contained – a Data Lifeboat should have no external dependencies, and not rely on any external services that might go away between it being created and opened.
Long lasting – a Data Lifeboat should be readable for a long time. It’s a bit optimistic to imagine anything digital we create today will last for 100 years, but we can aim for several decades at least!
Understandable – it should be easy for anybody to understand what’s in a Data Lifeboat, and why it might be worth exploring further.
Portable – a Data Lifeboat should be easy to move around, and slot into existing preservation systems and workflows without too much difficulty.

A lot of the time, when you export your data from a social media site, you get a folder full of opaque JSON files. That’s tricky to read, and it’s not obvious why you’d care about what’s inside – we wanted to do something better!

We decided to create a “viewer” that lives inside every Data Lifeboat package, which gives you a more human-friendly way to browse the contents. The underlying data is still machine-readable, but you can see all the photos and metadata without needing to read a JSON file. This viewer is built as a static website. Building small websites with vanilla HTML and JavaScript gives us something lightweight, portable, and likely to last a long time.

This is inspired by services like Twitter and Instagram which also create static websites as part of their account export – but we’re going much smaller and simpler.

Folder structure

When you open a Data Lifeboat, here’s what’s inside:

The files folder contains all of the photo and video files – the JPEGs, GIFs, and so on. We currently store two sizes of each file: the high-resolution original file that was uploaded to Flickr, and a low-resolution thumbnail.

The metadata folder contains all of the metadata, in machine-readable JavaScript/JSON files. This includes the technical metadata (like the upload date or resolution) and the social metadata (like comments and favorites).

The viewer folder contains the code for our viewer. It’s a small number of hand-written HTML, CSS, and JavaScript files.

The README.html file is the entry point to the viewer, and the first file we want people to open. This name is a convention that comes from the software industry, but we hope that the meaning will be clear even if people are unfamiliar with it.
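
As a rough illustration of this layout, the following Python sketch checks that a folder has the four top-level entries described above. The entry names come from this post; the function name `missing_entries` is made up for this example and is not part of the Data Lifeboat tooling.

```python
from pathlib import Path

# Top-level entries described in this post.
EXPECTED_DIRS = ("files", "metadata", "viewer")
EXPECTED_FILES = ("README.html",)


def missing_entries(lifeboat_dir):
    """Return the expected top-level entries that are absent from lifeboat_dir.

    Hypothetical helper, for illustration only.
    """
    root = Path(lifeboat_dir)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing
```

An empty list means the folder has the shape described here. It says nothing about whether the contents themselves are complete.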

If you’re trying to put a Data Lifeboat into a preservation system that requires a fixed packaging format like BagIt or OCFL, you could make the Data Lifeboat folder the payload – but we didn’t want to require those tools in Data Lifeboat. Those structures are useful in large institutions, but less understandable to individuals. We think of this as progressive enhancement, but for data formats.
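
To make the payload idea concrete, here is a minimal sketch of wrapping a Data Lifeboat in a BagIt bag using only the Python standard library. This is not part of the Data Lifeboat tooling: `wrap_in_bag` is a hypothetical helper, and a real preservation workflow would more likely use an established BagIt library and validate the result.

```python
import hashlib
import shutil
from pathlib import Path


def wrap_in_bag(lifeboat_dir, bag_dir):
    """Copy a Data Lifeboat folder into a minimal BagIt bag at bag_dir.

    Hypothetical helper: the Data Lifeboat becomes the payload under data/,
    unchanged, and the BagIt tag files are written alongside it.
    """
    bag = Path(bag_dir)
    payload = bag / "data"
    # copytree creates bag_dir and data/; it fails if data/ already exists.
    shutil.copytree(lifeboat_dir, payload)

    # Bag declaration required by the BagIt specification.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <path>" line per payload file,
    # with paths relative to the bag's base directory.
    # Reading each file fully into memory is fine for a sketch, but large
    # originals would be better hashed in chunks.
    lines = []
    for path in sorted(p for p in payload.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag).as_posix()}")
    (bag / "manifest-sha256.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )
    return bag
```

The Data Lifeboat itself is untouched inside the bag, so somebody who unpacks it later can still open README.html without knowing anything about BagIt.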

Inside the viewer

Let’s open the viewer and take a look inside.

When you open README.html, the first thing you see is a “cover sheet”. This is meant to be a quick overview of what’s in the Data Lifeboat – a bit like the cover sheet on a box of papers in a physical archive. It gives you some summary statistics and tells you why the creator thought these photos were worth keeping – this is what was written in the Data Lifeboat creation workflow. It also shows a small number of photos, from the most popular tags in the Data Lifeboat.

We plan for this cover sheet to be a completely self-contained HTML file. Normally web pages load multiple external resources, like images or style sheets, but here the styles will be inline, and images will be base64-encoded data URIs. This design choice makes it easy to create multiple copies of the cover sheet, independent of the rest of the Data Lifeboat, as a summary of the contents.

For example, if you had a large collection of Data Lifeboats, you could create an index from these cover sheets that a researcher could browse before deciding exactly which Data Lifeboat they wanted to download.
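
The self-contained approach can be sketched in a few lines of Python. This is only an illustration of inline styles and data URIs, not the code that builds the real cover sheet. The function names `to_data_uri` and `cover_sheet_html` are invented for this example, and the real page contains much more than a title and one image.

```python
import base64
from html import escape


def to_data_uri(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def cover_sheet_html(title, css, image_bytes):
    """Build a one-file HTML page: inline styles, image embedded as a data URI.

    Hypothetical helper, for illustration only.
    """
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        f'<head><meta charset="utf-8"><title>{escape(title)}</title>\n'
        f"<style>{css}</style></head>\n"
        f"<body><h1>{escape(title)}</h1>\n"
        f'<img src="{to_data_uri(image_bytes)}" alt=""></body>\n'
        "</html>\n"
    )
```

Because nothing in the resulting file points at another file or server, it can be copied anywhere and still render. The trade-off is size, since base64 encoding makes each image roughly a third larger.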

Now let’s look at a list of photos. If you click on any of the summary stats, or the “Photos” tab in the header, you see a list of photos.

This list shows you a preview thumbnail of each photo, and some metadata that can be used for filtering and sorting. For example, you can sort by photos with the most/least comments, or filter to photos uploaded by a particular Flickr member.
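
The sorting and filtering logic is simple enough to sketch. The viewer itself is written in JavaScript; the Python below only illustrates the idea, and the record fields (`id`, `owner`, `comment_count`) are assumptions made for this example rather than the real Data Lifeboat metadata format.

```python
# Hypothetical photo records; the real Data Lifeboat metadata format may differ.
photos = [
    {"id": "1", "owner": "alice", "comment_count": 4},
    {"id": "2", "owner": "bob", "comment_count": 0},
    {"id": "3", "owner": "alice", "comment_count": 9},
]


def sort_by_comments(records, most_first=True):
    """Return records ordered by comment count (most or least first)."""
    return sorted(records, key=lambda r: r["comment_count"], reverse=most_first)


def uploaded_by(records, owner):
    """Return only the records uploaded by the given member."""
    return [r for r in records if r["owner"] == owner]


most_commented = sort_by_comments(photos)
alice_photos = uploaded_by(photos, "alice")
```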

If you click on a photo, you can see an individual photo page. This shows you the original copy of the photo, and all the metadata we have about it.

Eventually you’ll be able to use the metadata on this page to find similar photos – for example, you’ll be able to click on a tag to find other photos with the same tag.

These pages still need a proper visual design, and this prototype is just meant to show the range of data we can capture. It’s already more understandable than a JSON file, but we think we can do even better!

Legible in the long term

The viewer will also contain documentation, about both the idea of Data Lifeboat and the structure of this particular package. If a Data Lifeboat is opened by somebody who doesn’t know about the project in 50 years, we want them to understand what they’re looking at and how they can use it.

It will also contain the text and agreement date of any policies agreed upon by the creator of this particular Data Lifeboat.

As we create the machine-readable metadata files, we’re also starting to document their structure. This should make it easier for future users to extract the metadata programmatically, or even build alternative viewer applications.
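
As a sketch of what programmatic extraction might look like, the following Python loads every JSON file from the metadata folder. It assumes the files are plain JSON with a `.json` extension. This post only says “JavaScript/JSON”, so the real files may need extra handling, and `load_metadata` is a hypothetical name.

```python
import json
from pathlib import Path


def load_metadata(lifeboat_dir):
    """Load every .json file in the metadata folder, keyed by file name.

    Assumes plain JSON files; other file types in the folder are ignored.
    """
    metadata_dir = Path(lifeboat_dir) / "metadata"
    loaded = {}
    for path in sorted(metadata_dir.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            loaded[path.name] = json.load(f)
    return loaded
```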

Lo-fi and low-tech

The whole viewer is written in a deliberately low-tech way. All the HTML templates, CSS and JavaScript are written by hand, with no external dependencies or bloated frameworks. This keeps the footprint small, makes it easier for us to work on as a small team, and, we believe, gives the viewer a good chance of lasting for multiple decades. The technology behind the web has a lot of sticking power.

This is a work-in-progress – we have more ideas that we haven’t built yet, and lots of areas where we know the viewer can be improved. Check back soon for updates as we continue to improve it, and look out for a public alpha next year where you’ll be able to create your own Data Lifeboats!

