Flickr Foundation at iPres 2024

Alex Chan

In September, Tori and I went to Belgium for iPres 2024. We were keen to chat about digital preservation and discuss some of our ideas for Data Lifeboat – and enjoy a few Belgian waffles, of course!

Photo by the Digital Preservation Coalition, used under CC BY-NC-SA 2.0

We ran a workshop called “How do you preserve 50 billion photos?” to talk about the challenges of archiving social media at scale. We had about 30 people join us for a lively discussion. Sadly we don’t have any photos of the workshop, but we did come away with a lot to think about, and we wanted to share some of the ideas that emerged.

Thanks to the National Endowment for the Humanities for supporting this trip as part of our Digital Humanities Advancement Grant.

How much can we preserve?

Some people think we should try to collect all of social media – accumulate as much data as possible, and sort through it later. That might be appealing in theory, because we won’t delete anything that’s important to future researchers, but it’s much harder in practice.

For Data Lifeboat, this is a problem of sheer scale. There are 50 billion photos on Flickr, and trillions of points that form the social graph that connects them. It’s simply too much for a single institution to collect as a whole.

At the conference we heard about other challenges that make it hard to archive social media, such as constraints on staff time, limited resources, and a lack of cooperation from platform owners. Twitter/X came up repeatedly as an example of a site that has become much harder to archive after changes to the API.

There are also longer-term concerns. Sibyl Schaefer, who works at the University of California, San Diego, presented a paper about climate change, and how scarcity of oil and energy will affect our ability to do digital preservation. All of our digital services rely on a steady supply of equipment and electricity, which seem increasingly fraught as we look over the next 100 years. “Just keep everything” may not be a sustainable strategy.

This paper was especially resonant for us, because she encourages us to think about these problems now, before the climate crisis gets any worse. It’s better to make a decision when you have more options and things are (relatively) calm, than wait for things to get really bad and be forced to make a snap judgment. This matches our approach to rights, privacy, and legality with Data Lifeboat – we’re taking the time to consider the best approach while Flickr is still happy and healthy, and we’re not under time pressure to make a quick decision.

What should we keep?

We went to iPres believing that trying to keep everything is inappropriate for Flickr and social media, and the conversations we had only strengthened this view. There are definitely benefits to keeping everything, but they require an abundance of staffing and resources that simply don’t exist.

One thing we heard at our Birds of a Feather session is that if you can only choose a selection of photos from Flickr, large institutions don’t want to make that selection themselves. They want an intermediate curator to choose photos to go in a Data Lifeboat, and then bequeath that Data Lifeboat to their archive. That person decides what they think is worth keeping, not the institution.

Who chooses what to keep?

If you can only save some of a social media service, how do you decide which part to take? You might say “keep the most valuable material”, but who decides what’s valuable? This is a thorny question that came up again and again at iPres.

An institution could conceivably collect Data Lifeboats from many people, each of whom made a different selection. Several people pointed out that any selection process will introduce bias and inequality – and while it’s impossible to fix these completely, having many people involved can help mitigate some of the issues.

This ‘collective selection’ helps deal with the fact that social media is really big – there’s so much stuff to look at, and it’s not always obvious where the interesting parts are. Sharing that decision with different people creates a broader perspective of what’s on a platform, and what material might be worth keeping.

Why are we archiving social media?

The discussion around why we archive social media is still frustratingly speculative. We went to iPres hoping to hear some compelling use cases or examples, but we didn’t.

There are plenty of examples of people using material from physical archives to aid their research. For example, one person told the story of the Minutes of the Milk Marketing Board. Once thought of as a dry and uninteresting collection, it became very useful when there was an outbreak of foot-and-mouth disease in Britain. We didn’t hear any case studies like that for digital archives.

There are already lots of digital archives and archives of Internet material. It would be interesting to hear from historians and researchers who are using these existing collections, and to learn what they find useful and valuable.

The Imaginary Future Researcher

A lot of discussion revolved around an imaginary future researcher or PhD student, who would dive into the mass of digital material and find something interesting – but these discussions were often frustratingly vague. The researcher would do something with the digital collections, but more specifics weren’t forthcoming.

As we design Data Lifeboat, we’ve found it useful to imagine specific scenarios:

  • The Museum of London works with schools across the city, engaging students to collect great pictures of their local area. A schoolgirl in Whitechapel selects 20 photos of Whitechapel she thinks are worth depositing in the Museum’s collection.
  • A botany student at California State looks across Flickr to find photographs of plant coverage in a specific area and gathers them as a longitudinal archive.
  • A curation student interning at Qtopia in Sydney wants to gather community documentation of Sydney’s Mardi Gras.

These only cover a handful of use cases, but they’ve helped ground our discussions and imagine how a future researcher might use the material we’re saving.

A Day Out in Antwerp

On my final day in Belgium, I got to visit a couple of local institutions in Antwerp. I got a tour of the Plantin-Moretus Museum, which focuses on sixteenth-century book printing. The museum is an old house with some gorgeous gardens and a collection of old printing machines.

There was even a demonstration on a replica printing machine, and I got to try a bit of printing myself – it takes a lot of force to push the metal letters and the paper together!

Then in the afternoon I went to the FelixArchief, the city archive of Antwerp, which is stored inside an old warehouse next to the Port of Antwerp. We got a tour of their stores, including some of the original documents from the very earliest days of Antwerp.

And while the shelves may look like those in any other archive, there’s a twist – they have to fit around the interior shape of the original warehouse! The archivists have got rather good at playing Tetris with their boxes, fitting everything into tight spaces – this gets them an extra 2 kilometres of shelving space!

Our tour guide explained that this is all because the warehouse is a listed building – and if the archives were ever to move out, they’d need to remove all their storage and leave it in a pristine condition. No permanent modifications allowed!

Next steps for Data Lifeboat

We’re continuing to think about what we heard at iPres, and to bring it into the design and implementation of Data Lifeboat.

This month and next, we’re holding Data Lifeboat co-design workshops in Washington DC and London to continue these discussions.