Progress Report: Creating a Data Lifeboat circa 2024

Alex Chan

In my previous post, I showed you our prototype Data Lifeboat creation workflow. At the end of the workflow, we’d created a Data Lifeboat we could download! Now I want to show you what you get inside the Data Lifeboat package.

Design goals

When we were designing the contents of a Data Lifeboat, we had the following principles in mind:

  • Self-contained – a Data Lifeboat should have no external dependencies, and not rely on any external services that might go away between it being created and opened.
  • Long lasting – a Data Lifeboat should be readable for a long time. It’s a bit optimistic to imagine anything digital we create today will last for 100 years, but we can aim for several decades at least!
  • Understandable – it should be easy for anybody to understand what’s in a Data Lifeboat, and why it might be worth exploring further.
  • Portable – a Data Lifeboat should be easy to move around, and slot into existing preservation systems and workflows without too much difficulty.

A lot of the time, when you export your data from a social media site, you get a folder full of opaque JSON files. That’s tricky to read, and it’s not obvious why you’d care about what’s inside – we wanted to do something better!

We decided to create a “viewer” that lives inside every Data Lifeboat package which gives you a more human-friendly way to browse the contents. The underlying data is still machine-readable, but you can see all the photos and metadata without needing to read a JSON file. This viewer is built as a static website. Building small websites with vanilla HTML and JavaScript gives us something lightweight, portable, and likely to last a long time.

This is inspired by services like Twitter and Instagram which also create static websites as part of their account export – but we’re going much smaller and simpler.

Folder structure

When you open a Data Lifeboat, here’s what’s inside: a files folder, a metadata folder, a viewer folder, and a README.html file.


The files folder contains all of the photo and video files – the JPEGs, GIFs, and so on. We currently store two sizes of each file: the high-resolution original file that was uploaded to Flickr, and a low-resolution thumbnail.

The metadata folder contains all of the metadata, in machine-readable JavaScript/JSON files. This includes the technical metadata (like the upload date or resolution) and the social metadata (like comments and favorites).

The viewer folder contains the code for our viewer. It’s a small number of hand-written HTML, CSS, and JavaScript files.

The README.html file is the entry point to the viewer, and the first file we want people to open. This name is a convention that comes from the software industry, but we hope that the meaning will be clear even if people are unfamiliar with it.

If you’re trying to put a Data Lifeboat into a preservation system that requires a fixed packaging format like BagIt or OCFL, you could make this the payload folder – but we didn’t want to require those tools in Data Lifeboat. Those structures are useful in large institutions, but less understandable to individuals. We think of this as progressive enhancement, but for data formats.
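As a rough illustration of the "payload folder" idea, here is a minimal sketch of wrapping an unpacked Data Lifeboat in a BagIt-style bag. It is not part of the Data Lifeboat tooling. The function name wrap_in_bag and the folder arguments are hypothetical, and the layout follows the BagIt convention of a bagit.txt declaration, a data/ payload folder, and a SHA-256 manifest.

```python
import hashlib
import shutil
from pathlib import Path


def wrap_in_bag(lifeboat_dir: Path, bag_dir: Path) -> None:
    """Copy a Data Lifeboat folder into bag_dir/data and write BagIt tag files.

    Hypothetical helper for illustration; bag_dir/data must not already exist.
    """
    payload_dir = bag_dir / "data"
    # copytree creates bag_dir and data/ and copies the whole folder tree.
    shutil.copytree(lifeboat_dir, payload_dir)

    # The bag declaration names the BagIt version and the tag file encoding.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # The manifest lists one checksum per payload file, relative to the bag root.
    lines = []
    for path in sorted(payload_dir.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
```

The Data Lifeboat itself is left untouched inside data/, so somebody who has never heard of BagIt can still open README.html directly.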

Inside the viewer

Let’s open the viewer and take a look inside.

When you open README.html, the first thing you see is a “cover sheet”. This is meant to be a quick overview of what’s in the Data Lifeboat – a bit like the cover sheet on a box of papers in a physical archive. It gives you some summary statistics and tells you why the creator thought these photos were worth keeping – this is what was written in the Data Lifeboat creation workflow. It also shows a small number of photos, from the most popular tags in the Data Lifeboat.

This cover sheet is designed to be a completely self-contained HTML file. Normally web pages load multiple external resources, like images or style sheets, but this file won’t load anything external: styles will be inline, and images will be base64-encoded data URIs. This design choice makes it easy to create multiple copies of the cover sheet, independent of the rest of the Data Lifeboat, as a summary of the contents.
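To make the self-contained idea concrete, here is a minimal sketch of how a page with inline styles and base64-encoded images could be generated. This is not the real cover sheet code: the function names to_data_uri and build_cover_sheet, the page layout, and the styling are all assumptions for illustration.

```python
import base64
import html
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Return a data URI embedding `data`; the MIME type is guessed from the file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        mime_type = "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_cover_sheet(title: str, thumbnails: dict[str, bytes]) -> str:
    """Build one HTML page whose styles and images are all inline (hypothetical layout)."""
    images = "\n".join(
        f'<img src="{to_data_uri(data, name)}" alt="{html.escape(name)}">'
        for name, data in thumbnails.items()
    )
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        f'<head><meta charset="utf-8"><title>{html.escape(title)}</title>\n'
        "<style>body { font-family: sans-serif; } img { max-width: 150px; }</style>\n"
        "</head>\n"
        f"<body><h1>{html.escape(title)}</h1>\n{images}\n</body>\n"
        "</html>\n"
    )
```

Because the resulting string has no links to other files, it can be saved anywhere and still render the same way, at the cost of base64 making the embedded images roughly a third larger.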

For example, if you had a large collection of Data Lifeboats, you could create an index from these cover sheets that a researcher could browse before deciding exactly which Data Lifeboat they wanted to download.

Now let’s look at a list of photos. If you click on any of the summary stats, or the “Photos” tab in the header, you see a list of photos.

This list shows you a preview thumbnail of each photo, and some metadata that can be used for filtering and sorting. For example, you can sort by photos with the most/least comments, or filter to photos uploaded by a particular Flickr member.
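The sorting and filtering described above can be sketched in a few lines. The real viewer is written in JavaScript, so this Python version only illustrates the logic. The Photo fields (title, owner, comment_count) are hypothetical names, not the actual metadata schema.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    # Hypothetical fields, chosen for illustration only.
    title: str
    owner: str
    comment_count: int


def sort_by_comments(photos: list[Photo], most_first: bool = True) -> list[Photo]:
    """Return a new list ordered by comment count, without modifying the input."""
    return sorted(photos, key=lambda photo: photo.comment_count, reverse=most_first)


def filter_by_owner(photos: list[Photo], owner: str) -> list[Photo]:
    """Keep only the photos uploaded by one Flickr member."""
    return [photo for photo in photos if photo.owner == owner]


photos = [
    Photo("Harbour at dawn", "alice", 3),
    Photo("Market day", "bob", 12),
    Photo("Old bridge", "alice", 0),
]
most_commented = sort_by_comments(photos)
alices_photos = filter_by_owner(photos, "alice")
```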

If you click on a photo, you can see an individual photo page. This shows you the original copy of the photo, and all the metadata we have about it:

Eventually you’ll be able to use the metadata on this page to find similar photos – for example, you’ll be able to click on a tag to find other photos with the same tag.

These pages still need a proper visual design, and this prototype is just meant to show the range of data we can capture. It’s already more understandable than a JSON file, but we think we can do even better!

Legible in the long term

The viewer will also contain documentation, about both the idea of Data Lifeboat and the structure of this particular package. If a Data Lifeboat is opened by somebody who doesn’t know about the project in 50 years, we want them to understand what they’re looking at and how they can use it.

It will also contain the text and agreement date of any policies agreed upon by the creator of this particular Data Lifeboat.

For example, as we create the machine-readable metadata files, we’re starting to document their structure. This should make it easier for future users to extract the metadata programmatically, or even build alternative viewer applications.
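As a sketch of what extracting the metadata programmatically might look like, the snippet below reads every JSON file in a metadata folder. The structure of the real files is still being documented, so this assumes plain .json files sitting directly in the folder. The function name load_metadata is hypothetical.

```python
import json
from pathlib import Path


def load_metadata(metadata_dir: Path) -> dict[str, object]:
    """Parse every .json file in the folder, keyed by file name.

    Assumes plain JSON files; the real Data Lifeboat layout may differ.
    """
    return {
        path.name: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(metadata_dir.glob("*.json"))
    }
```

An alternative viewer could start from a dictionary like this and build its own pages, without depending on any of the code in the viewer folder.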

Lo-fi and low-tech

The whole viewer is written in a deliberately low-tech way. All the HTML templates, CSS and JavaScript are written by hand, with no external dependencies or bloated frameworks. This keeps the footprint small, makes it easier for us to work on as a small team, and we believe gives the viewer a good chance of lasting for multiple decades. The technology behind the web has a lot of sticking power.

This is a work-in-progress – we have more ideas that we haven’t built yet, and lots of areas where we know the viewer can be improved. Check back soon for updates as we continue to improve it, and look out for a public alpha next year where you’ll be able to create your own Data Lifeboats!

Progress Report: Creating a Data Lifeboat circa 2024

Alex Chan

In October and November, we held two Data Lifeboat workshops, funded by the Mellon Foundation. We had four days of detailed discussions about how Data Lifeboat should work, we talked about some of the big questions around ethics and care, and got a lot of useful input from our attendees.

As part of the workshops, we showed a demo of our current software, where we created and downloaded a prototype Data Lifeboat. This is an early prototype, with a lot of gaps and outstanding questions, but it was still very helpful to guide some of our conversations. Feedback from the workshops will influence what we are building, and we plan to release a public alpha next year.

We are sharing this work in progress as a snapshot of where we’ve got to. The prototype isn’t built for other people to use, but I can walk you through it with screenshots and explanations.

In this post, I’ll walk you through the creation workflow – the process of preparing a Data Lifeboat. In a follow-up post, I’ll show you what you get when you download a finished Data Lifeboat.

Step 1: Sign in to Flickr

To create a Data Lifeboat, you have to sign in to your Flickr account:

This gives us an authenticated source of identity for who is creating each Data Lifeboat. This means each Data Lifeboat will reflect the social position of its creator in Flickr.com. For example, after you log in, your Data Lifeboat could contain photos shared with you by friends and family, where those photos would not be accessible to other Flickr members who aren’t part of the same social graph.

Step 2: Choose the photos you want to save

To choose photos, you enter a URL that points to photos on Flickr.com:

In the prototype, we can only accept a single URL, either a Gallery, Album, or Photostream for now. This is good enough for prototyping, but we know we’ll need something more flexible later – for example, we might allow you to enter multiple URLs, and then we’d get photos from all of them.
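Here is a minimal sketch of how a single URL could be recognised as a Gallery, Album, or Photostream. The URL shapes below are assumptions based on how Flickr.com addresses commonly look (/photos/<user>/, /photos/<user>/albums/<id>, /photos/<user>/galleries/<id>), and classify_flickr_url is a hypothetical name, not the prototype's code.

```python
from urllib.parse import urlparse


def classify_flickr_url(url: str) -> tuple[str, str]:
    """Return (kind, identifier) for a Flickr.com URL, or raise ValueError.

    The recognised URL shapes are assumptions made for illustration.
    """
    parsed = urlparse(url)
    if parsed.netloc not in {"flickr.com", "www.flickr.com"}:
        raise ValueError(f"Not a Flickr.com URL: {url}")

    # Split the path into its non-empty components, e.g. ["photos", "user"].
    parts = [part for part in parsed.path.split("/") if part]

    if len(parts) >= 2 and parts[0] == "photos":
        if len(parts) == 2:
            return ("photostream", parts[1])
        if len(parts) == 4 and parts[2] in {"albums", "sets"}:
            return ("album", parts[3])
        if len(parts) == 4 and parts[2] == "galleries":
            return ("gallery", parts[3])

    raise ValueError(f"Unrecognised Flickr.com URL: {url}")
```

Accepting multiple URLs later would then be a matter of running each one through the same function and combining the results.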

Step 3: See a summary of the photos you’d be downloading

Once you’ve given us a URL, we fetch the first page of photos and show you a summary. This includes things like:

  • How many photos are there?
  • How many photos are public, semi-public, or private?
  • What license are the photos using?
  • What’s the safety level of the photos?
  • Have the owners disabled downloads for any of these photos?
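The questions above boil down to counting photos by a few properties. Here is a minimal sketch of that tally. The dictionary keys (visibility, license, safety_level, can_download) are hypothetical and don't reflect the real Flickr API response or the prototype's data model.

```python
from collections import Counter


def summarise(photos: list[dict]) -> dict:
    """Tally a list of photo records by the properties shown on the summary screen."""
    return {
        "total": len(photos),
        "visibility": Counter(photo["visibility"] for photo in photos),
        "license": Counter(photo["license"] for photo in photos),
        "safety_level": Counter(photo["safety_level"] for photo in photos),
        "downloads_disabled": sum(1 for photo in photos if not photo["can_download"]),
    }


# Made-up sample records, for illustration only.
sample = [
    {"visibility": "public", "license": "CC BY 2.0", "safety_level": "safe", "can_download": True},
    {"visibility": "public", "license": "All Rights Reserved", "safety_level": "safe", "can_download": False},
    {"visibility": "private", "license": "All Rights Reserved", "safety_level": "moderate", "can_download": True},
]
summary = summarise(sample)
```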

Each of these controls affects what we are permitted to put in a Data Lifeboat, and the answers will be different for different people. Somebody creating their family archive may want all the photos, whereas somebody creating a Data Lifeboat for a museum might only want photos which are publicly visible and openly licensed.

We want Data Lifeboat creators to make informed decisions about what goes in their Data Lifeboat, and we believe we can do better than showing them a series of toggle switches. The current design of this screen is to give us a sense of how these controls are used in practice. It exposes the raw mechanics of Flickr.com, and relies on a detailed understanding of how Flickr.com works. We know this won’t be the final design. We might, for example, build an interface that asks people where they intend to store the Data Lifeboat, and use that to inform which photos we include. This is still speculative, and we have a lot of ideas we haven’t tried yet.

The prototype only saves public, permissively-licensed photos, because we’re still working out the details of how we handle licensed and private photos.
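A rule like this can be sketched as a simple filter. The post doesn't say which licenses count as permissive, so the set below is only an example, and the record keys are hypothetical.

```python
# Example set only: which licenses count as "permissive" is an assumption here.
EXAMPLE_PERMISSIVE_LICENSES = {"CC BY 2.0", "CC0 1.0"}


def select_for_prototype(photos: list[dict], allowed_licenses: set[str]) -> list[dict]:
    """Keep photos that are publicly visible and carry one of the allowed licenses."""
    return [
        photo
        for photo in photos
        if photo["visibility"] == "public" and photo["license"] in allowed_licenses
    ]


# Made-up candidate records, for illustration only.
candidates = [
    {"id": "1", "visibility": "public", "license": "CC BY 2.0"},
    {"id": "2", "visibility": "public", "license": "All Rights Reserved"},
    {"id": "3", "visibility": "private", "license": "CC0 1.0"},
]
selected = select_for_prototype(candidates, EXAMPLE_PERMISSIVE_LICENSES)
```

Passing the allowed licenses in as a parameter leaves room for different creators to make different choices later.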

Step 4: Write a README

This is a vital step – it’s where people give us more context. A single Data Lifeboat can only contain a sliver of Flickr, so we want to give the creator the opportunity to describe why they made this selection, and also to include any warnings about sensitive content so it’s easier to use the archive with care in future.

Tori will be writing up what happened at the workshops around how we could design this particular interface to encourage creators to think carefully here.

We like the idea of introducing ‘positive friction’ to this process, supporting people to write constructive and narrative notes to the future about why this particular sliver is important to keep.

Step 5: Agree to policies

When you create a Data Lifeboat, you need to agree to certain conditions around responsible use of the photos you’re downloading:

The “policies” in the current prototype are obviously placeholders. We know we will need to impose certain conditions, but we don’t know what they are yet.

One idea we’re developing is that these policies might adapt dynamically based on the contents of the Data Lifeboat. If you’re creating a Data Lifeboat that only contains your own public photos, that’s very different from one that contains private photos uploaded by other people.

Step 6: One moment please…

Creating or “baking” a Data Lifeboat can take a while – we need to download all the photos, their associated metadata, and construct a Data Lifeboat package.

In the prototype we show you a holding page:

We imagine that in the future, we’d email you a notification when the Data Lifeboat has finished baking.

Step 7: Download the Data Lifeboat

We have a page where you can download your Data Lifeboat:

Here you see a list of all the Data Lifeboats that we’ve been prototyping, because we wanted people to share their ideas for Data Lifeboats at our co-design workshops. In the real tool, you’ll only be able to see and download Data Lifeboats that you created.

What’s next?

We still have plenty to get on with, but you can see the broad outline of where we’re going, and it’s been really helpful to have an end-to-end tool that shows the whole Data Lifeboat creation process.

Come back next week, and I’ll show you what you get inside the Data Lifeboat when you download it!

“Flickr.com is a Gathering of Memory” Insights from the Flickr Foundation’s First Conversation

Susan Mernit & George Oates

On September 26, 2024, we hosted our first-ever public conversation featuring director George Oates and advisors Anasuya Sengupta and Eliza Gregory. The event explored critical questions about preserving digital visual history in our rapidly evolving technological landscape.

The discussion centered on our purpose: to keep Flickr pictures visible for 100 years. We discussed the long list of technological uncertainties, with George quoting another Flickr Foundation advisor, Temi Odumosu, who said, “We don’t even know what a JPEG will be in ten years.”

Anasuya described social justice issues, emphasizing that digital preservation must address questions of power and representation. She stressed how important it is for marginalized communities to control their narratives, and how we must keep this in the front of our minds as we make tools.

The conversation touched on several key points:

  • We must balance vast scale with meaningful personal engagement
  • Using Flickr Commons to empower communities to define their histories
  • How we can support smaller cultural institutions in digital preservation efforts
  • What community engagement brings to enriching digital archives
  • Myriad curatorial challenges as we consider deciding what to preserve

→ Read the full transcript of the event

 

Preserving Our Visual Heritage: The Flickr Foundation was established in 2022 with the purpose to keep Flickr pictures visible for 100 years. As part of our work, we look after the Flickr Commons, a unique collection of historical photographs from cultural institutions all around the world.

To stay in touch, follow us on LinkedIn or sign up for our occasional newsletter (at the bottom of our home page).

Progress Report: Creating a Data Lifeboat circa 2024

Alex Chan

In September, Tori and I went to Belgium for iPres 2024. We were keen to chat about digital preservation and discuss some of our ideas for Data Lifeboat – and enjoy a few Belgian waffles, of course!

Photo by the Digital Preservation Coalition, used under CC BY-NC-SA 2.0

We ran a workshop called “How do you preserve 50 billion photos?” to talk about the challenges of archiving social media at scale. We had about 30 people join us for a lively discussion. Sadly we don’t have any photos of the workshop, but we did come away with a lot to think about, and we wanted to share some of the ideas that emerged.

Thanks to the National Endowment for the Humanities for supporting this trip as part of our Digital Humanities Advancement Grant.

How much can we preserve?

Some people think we should try to collect all of social media – accumulate as much data as possible, and sort through it later. That might be appealing in theory, because we won’t delete anything that’s important to future researchers, but it’s much harder in practice.

For Data Lifeboat, this is a problem of sheer scale. There are 50 billion photos on Flickr, and trillions of points that form the social graph that connects them. It’s simply too much for a single institution to collect as a whole.

At the conference we heard about other challenges that make it hard to archive social media, like constraints on staff time, limited resources, and a lack of cooperation from platform owners. Twitter/X came up repeatedly as an example of a site which has become much harder to archive after changes to the API.

There are also longer-term concerns. Sibyl Schaefer, who works at the University of California, San Diego, presented a paper about climate change, and how scarcity of oil and energy will affect our ability to do digital preservation. All of our digital services rely on a steady supply of equipment and electricity, which seem increasingly fraught as we look over the next 100 years. “Just keep everything” may not be a sustainable strategy.

This paper was especially resonant for us, because she encourages us to think about these problems now, before the climate crisis gets any worse. It’s better to make a decision when you have more options and things are (relatively) calm, than wait for things to get really bad and be forced to make a snap judgment. This matches our approach to rights, privacy, and legality with Data Lifeboat – we’re taking the time to consider the best approach while Flickr is still happy and healthy, and we’re not under time pressure to make a quick decision.

What should we keep?

We went to iPres believing that trying to keep everything is inappropriate for Flickr and social media, and the conversations we had only strengthened this view. There are definitely benefits to this approach, but they require an abundance of staffing and resources that simply don’t exist.

One thing we heard at our Birds of a Feather session is that if you can only choose a selection of photos from Flickr, large institutions don’t want to make that selection themselves. They want an intermediate curator to choose photos to go in a Data Lifeboat, and then bequeath that Data Lifeboat to their archive. That person decides what they think is worth keeping, not the institution.

Who chooses what to keep?

If you can only save some of a social media service, how do you decide which part to take? You might say “keep the most valuable material”, but who decides what’s valuable? This is a thorny question that came up again and again at iPres.

An institution could conceivably collect Data Lifeboats from many people, each of whom made a different selection. Several people pointed out that any selection process will introduce bias and inequality – and while it’s impossible to fix these completely, having many people involved can help mitigate some of the issues.

This ‘collective selection’ helps deal with the fact that social media is really big – there’s so much stuff to look at, and it’s not always obvious where the interesting parts are. Sharing that decision with different people creates a broader perspective of what’s on a platform, and what material might be worth keeping.

Why are we archiving social media?

The discussion around why we archive social media is still frustratingly speculative. We went to iPres hoping to hear some compelling use cases or examples, but we didn’t.

There are plenty of examples of people using material from physical archives to aid their research. For example, one person told the story of the Minutes of the Milk Marketing Board. Once thought of as a dry and uninteresting collection, it became very useful when there was an outbreak of foot-and-mouth disease in Britain. We didn’t hear any case studies like that for digital archives.

There are already lots of digital archives and archives of Internet material. It would be interesting to hear from historians and researchers who are using these existing collections, to hear what they find useful and valuable.

The Imaginary Future Researcher

A lot of discussion revolved around an imaginary future researcher or PhD student, who would dive into the mass of digital material and find something interesting – but these discussions were often frustratingly vague. The researcher would do something with the digital collections, but more specifics weren’t forthcoming.

As we design Data Lifeboat, we’ve found it useful to imagine specific scenarios:

  • The Museum of London works with schools across the city, engaging students to collect great pictures of their local area. A schoolgirl in Whitechapel selects 20 photos of Whitechapel she thinks are worth depositing in the Museum’s collection.
  • The botany student at California State looks across Flickr to find photography of plant coverage in a specific area and gathers them as a longitudinal archive.
  • A curation student interning at Qtopia in Sydney wants to gather community documentation of Sydney’s Mardi Gras.

These only cover a handful of use cases, but they’ve helped ground our discussions and imagine how a future researcher might use the material we’re saving.

A Day Out in Antwerp

On my final day in Belgium, I got to visit a couple of local institutions in Antwerp. I got a tour of the Plantin-Moretus Museum, which focuses on sixteenth-century book printing. The museum is an old house with some gorgeous gardens:

And a collection of old printing machines:

There was even a demonstration on a replica printing machine, and I got to try a bit of printing myself – it takes a lot of force to push the metal letters and the paper together!

Then in the afternoon I went to the FelixArchief, the city archive of Antwerp, which is stored inside an old warehouse next to the Port of Antwerp:

We got a tour of their stores, including some of the original documents from the very earliest days of Antwerp:

And while those shelves may look like any other archive, there’s a twist – they have to fit around the interior shape of the original warehouse! The archivists have got rather good at playing tetris with their boxes, and fitting everything into tight spaces – this gets them an extra 2 kilometres of shelving space!

Our tour guide explained that this is all because the warehouse is a listed building – and if the archives were ever to move out, they’d need to remove all their storage and leave it in a pristine condition. No permanent modifications allowed!

Next steps for Data Lifeboat

We’re continuing to think about what we heard at iPres, and bringing it into the design and implementation of Data Lifeboat.

This month and next, we’re holding Data Lifeboat co-design workshops in Washington DC and London to continue these discussions.

100-year plan workshop – Edinburgh

By Robert Pembleton, George Oates, and Melissa Terras

This workshop was co-developed, iterated and convened by the Data + Design Lab, with organisational help from the Centre for Data, Culture and Society, both based at Edinburgh Futures Institute.

Imagine a vibrant student cafe at the University of Edinburgh, on a cold but sunny January day. Imagine a group of academics, students, community leaders, and changemakers gathered in the corner near surreal interpretations of bookshelves, speaking over the excited conversations of a Friday morning. This was the setting for a “How to write a 100-year plan” workshop hosted by the University’s Edinburgh Futures Institute – a challenge-led multidisciplinary initiative which tackles complex issues to imagine and shape better futures.

We convened to imagine the preservation of our digital heritage for future generations. A sense of excitement filled the room. There was an energy. Everyone was eager to contribute and collaborate, to give what they had to this purpose. 50 billion images; worthy of protection. The horizon was 100 years.

This workshop gravitated to the challenge of continuing access, about what to keep. How do we ensure that the viewers and researchers of future generations can see things both in their raw form, and with contextual colour around single photos? Flickr is interestingly different here because most of the images are described directly by their creators, and have factual EXIF data attached. 

We began by trying to step out of time and allow ourselves to think on the scale of centuries. Each of us dug out a picture of a meaningful place: breath-taking landscapes such as Ben Nevis and Zumaya Beach, a now empty hut sometimes buzzing with community vibrancy, and Bobby’s blurry family photo qualified with generational memories:

“It may not be the best picture, but it’s my picture.”

This sentence popped out as we were doing the first exercise, where we ask people to find pictures of places that mean something to them. It’s always interesting to see which places people choose. They’re often of views, or capture a place where good memories have been made with loved ones. They are rarely what you might call spectacular or historic or exceptionally well-made, but they are poignant for their viewer. John Berger has called photographs “observable moments.” Meaningful to maybe just a few people, but no less valuable than a well-constructed photograph of a classical landscape. Vernacular. Flickr is full of pictures like that. Brimming.

We discussed meaning, longevity, humility, and value. What do we value, and how? We tend to show a primacy for personal value. There was once a glimmer that the internet could be a space outside of capitalism, but it has of course become integrated into the machine. Now is a good time to be cautious, as we imagine new systems.

The profundity of the changes at hand causes pause. Perhaps we should leave things in a state where they can be found and used in ways we couldn’t possibly imagine. If we curate this with the lens of the present, there is a threat of sanitisation. Obtuse decision making shrouds bias, and we’re in the midst of a swell of disinformation that’s colliding with wanting to present an unbiased picture. 

There are practical considerations for such a large archive. Do we really need 50 billion images plus the infinite amount yet to come? Maybe its value should be measured against the archive’s carbon footprint. Maybe it’s OK for some things to disappear forever. We discussed the simplistic beauty in randomness, and so perhaps an approach could be to keep a random percentage of everything. This would mean we’d keep some of the boring stuff, and history lovers of the future may be most enamoured with the mundane. Libraries, archives and other memory institutions have detailed deaccession policies – where they decide what to no longer look after – but Flickr can be thought of as a vernacular, outside, “fugitive” archive. Any decision regarding deletion of content should be collective, cooperative, collaborative, and transparent. To expose our methods, to help future generations understand how we made our decisions. Then, maybe we can release the weight of our history with joy, with ceremony, with something that could become like a tradition that we encourage, support, revisit, and maintain ourselves collectively. 

Thinking about long organisations

We talked about how ritual might help sustain a strong direction across a century. (There were jokes about everyone wearing robes.) Ritual has had a place to play in human society thus far – it seems to have longevity. Imagine a successful 1,000-year-old pub that never franchised and never exceeded its comfort level; it was just right.

Towards the end of our session we went for a walk and explored the ancient Royal Mile of Edinburgh, passing by the World’s End pub. That was once the perception: there was nothing of value, the end of the world, outside of the castle’s gates. We walked in contemplation with ancient volcanoes which dot the landscape surrounding this beautiful city. A 3.8 billion year old rock? Maybe 100 years isn’t so long after all. There are more species than ourselves, and there is more to come. 

We walked past another neighbourhood whose significant industrial heritage is now demolished. The machines of the future may be quite upset indeed that they aren’t able to visit their ancestors. 

Few digital photos will survive for a century by accident. A 100 year plan should be a practical, responsive, future facing declaration of intent. We train our lens, knowing that our visions are from fascinating unknowns, impossible perspectives. We do love to play. We should nurture that in ourselves. Play inspires joy. Joy celebrates humanity. The Flickr archive is a record of what it has been like to be in certain spaces, from a certain perspective, at a certain time. It cherishes our sense of identity.

“The nature of our identity must not be destroyed.”

The session ended in quiet, contemplative reflection. It elicited poetry, a snapshot of the moment. We could share it, but it wouldn’t really make sense out of context. Nothing really does.

We thank all attendees for their contributions. As with any good, invigorating engagement, our discussions provoked more questions than they answered. Here are some things that we may explore in the future.

  • Oral histories happen everywhere. Could we or should we figure out how to attach sound to pictures in flickr.org?
  • When does a story become a history?
  • Is decentralisation a route to protecting neutrality? If more copies live in more and differently accessed and described spaces, could that diffuse the “truth” of history?

Postscript: Post-it transcriptions

The first section of the workshop is about thinking in centuries. Not something we do often, so we’ve found it helpful to expand our normal timeframes a bit at the start. We have an exercise in three parts, and participants note down their thoughts on post-its and we can all have a look afterwards.

  1. What in your life has lasted for 100 years? Fisherrow Harbour; books, photographs/postcards, documents (incl. design), monuments, environment, buildings/roads/rail; my house ~1910, grandfather’s crown from WWI, great-grandmother’s ring (now mine!); Victorian egg & bird collection, great AUK(?) items, egg at NMS, old paintings; a cannon from a tall ship; the ring I’m wearing; my flat; my house; the ring on my right hand; music (I sing classical music); I have a doll that was my mother’s mother’s from her childhood; my house; copies of vintage books I adore and own, e.g. Lafcadio Hearn’s.
  2. What in your life do you want to last for another 100 years? The same books, some of my pictures, some of my writings; code and info on AI planners, life/work results; my dichroic ring and the story behind it; HOPE; photos of my family; music!; family photographs; international communication; my children; my children, or at least, their children; my house; environment, community/society; how we lived, understanding of how we lived; the house.
  3. What in your life do you not want to last 100 years? The album with my weedy singing made by the band I was in in my 20s; META CRISIS; AI-generated crud; “reinforced divisions,” “Left v Right”; climate change; Tories; all this plastic; videos of my early teaching/lectures (Lecture Capture!); the rest of my pictures and writings; revenge porn!