Progress Report: Creating a Data Lifeboat circa 2024

Alex Chan

In October and November, we held two Data Lifeboat workshops, funded by the Mellon Foundation. We had four days of detailed discussions about how Data Lifeboat should work, talked about some of the big questions around ethics and care, and got a lot of useful input from our attendees.

As part of the workshops, we showed a demo of our current software, where we created and downloaded a prototype Data Lifeboat. This is an early prototype, with a lot of gaps and outstanding questions, but it was still very helpful to guide some of our conversations. Feedback from the workshops will influence what we are building, and we plan to release a public alpha next year.

We are sharing this work in progress as a snapshot of where we’ve got to. The prototype isn’t built for other people to use, but I can walk you through it with screenshots and explanations.

In this post, I’ll walk you through the creation workflow – the process of preparing a Data Lifeboat. In a follow-up post, I’ll show you what you get when you download a finished Data Lifeboat.

Step 1: Sign in to Flickr

To create a Data Lifeboat, you have to sign in to your Flickr account:

This gives us an authenticated source of identity for who is creating each Data Lifeboat. This means each Data Lifeboat will reflect the social position of its creator on Flickr.com. For example, after you log in, your Data Lifeboat could contain photos shared with you by friends and family – photos that would not be accessible to other Flickr members who aren’t part of the same social graph.

Step 2: Choose the photos you want to save

To choose photos, you enter a URL that points to photos on Flickr.com:

In the prototype, we only accept a single URL – a Gallery, Album, or Photostream – for now. This is good enough for prototyping, but we know we’ll need something more flexible later – for example, we might allow you to enter multiple URLs, and then we’d get photos from all of them.
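As an illustration of how such a URL might be classified, here’s a toy sketch in Python. This is not the prototype’s actual parser – the real code leans on our Flickr URL parsing library, which handles many more of the URL shapes Flickr has used over the years – and the example URL is made up:

import re

PATTERNS = {
    "album": re.compile(r"flickr\.com/photos/[^/]+/albums/(\d+)"),
    "gallery": re.compile(r"flickr\.com/photos/[^/]+/galleries/(\d+)"),
    "photostream": re.compile(r"flickr\.com/photos/([^/]+)/?$"),
}

def classify_flickr_url(url):
    """Return (kind, identifier) for a supported Flickr URL, else None."""
    for kind, pattern in PATTERNS.items():
        match = pattern.search(url)
        if match is not None:
            return kind, match.group(1)
    return None

print(classify_flickr_url("https://www.flickr.com/photos/example/albums/72157720000000000"))
# ('album', '72157720000000000')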

Step 3: See a summary of the photos you’d be downloading

Once you’ve given us a URL, we fetch the first page of photos and show you a summary. This includes things like:

  • How many photos are there?
  • How many photos are public, semi-public, or private?
  • What license are the photos using?
  • What’s the safety level of the photos?
  • Have the owners disabled downloads for any of these photos?
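Behind this screen, most of those numbers can be tallied from a single Flickr API call. The code below is a simplified illustration rather than the prototype’s actual implementation – it assumes a Photostream, a hypothetical NSID, and an API key in the environment, and it only tallies licenses and privacy (the real screen also covers safety levels and download settings):

import collections
import os

import httpx

resp = httpx.get(
    "https://api.flickr.com/services/rest/",
    params={
        "method": "flickr.people.getPhotos",
        "api_key": os.environ["FLICKR_API_KEY"],
        "user_id": "12037949754@N01",  # hypothetical example NSID
        "extras": "license",
        "per_page": "500",
        "format": "json",
        "nojsoncallback": "1",
    },
)
photos = resp.json()["photos"]["photo"]

summary = {
    "photos_on_this_page": len(photos),
    "public": sum(p["ispublic"] for p in photos),
    "non_public": sum(1 - p["ispublic"] for p in photos),
    "licenses": collections.Counter(p["license"] for p in photos),
}
print(summary)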

Each of these controls affects what we are permitted to put in a Data Lifeboat, and the answers will be different for different people. Somebody creating their family archive may want all the photos, whereas somebody creating a Data Lifeboat for a museum might only want photos which are publicly visible and openly licensed.

We want Data Lifeboat creators to make informed decisions about what goes in their Data Lifeboat, and we believe we can do better than showing them a series of toggle switches. The current design of this screen is meant to give us a sense of how these controls are used in practice. It exposes the raw mechanics of Flickr.com, and relies on a detailed understanding of how Flickr.com works. We know this won’t be the final design. We might, for example, build an interface that asks people where they intend to store the Data Lifeboat, and use that to inform which photos we include. This is still speculative, and we have a lot of ideas we haven’t tried yet.

The prototype only saves public, permissively-licensed photos, because we’re still working out the details of how we handle licensed and private photos.

Step 4: Write a README

This is a vital step – it’s where people give us more context. A single Data Lifeboat can only contain a sliver of Flickr, so we want to give the creator the opportunity to describe why they made this selection, and also to include any warnings about sensitive content so it’s easier to use the archive with care in future.

Tori will be writing up the workshop discussions about how we could design this particular interface to encourage creators to think carefully here.

We like the idea of introducing ‘positive friction’ to this process, supporting people to write constructive and narrative notes to the future about why this particular sliver is important to keep.

Step 5: Agree to policies

When you create a Data Lifeboat, you need to agree to certain conditions around responsible use of the photos you’re downloading:

The “policies” in the current prototype are obviously placeholders. We know we will need to impose certain conditions, but we don’t know what they are yet.

One idea we’re developing is that these policies might adapt dynamically based on the contents of the Data Lifeboat. If you’re creating a Data Lifeboat that only contains your own public photos, that’s very different from one that contains private photos uploaded by other people.

Step 6: One moment please…

Creating or “baking” a Data Lifeboat can take a while – we need to download all the photos, their associated metadata, and construct a Data Lifeboat package.

In the prototype we show you a holding page:

We imagine that in the future, we’d email you a notification when the Data Lifeboat has finished baking.

Step 7: Download the Data Lifeboat

We have a page where you can download your Data Lifeboat:

Here you see a list of all the Data Lifeboats that we’ve been prototyping, because we wanted people to share their ideas for Data Lifeboats at our co-design workshops. In the real tool, you’ll only be able to see and download Data Lifeboats that you created.

What’s next?

We still have plenty to get on with, but you can see the broad outline of where we’re going, and it’s been really helpful to have an end-to-end tool that shows the whole Data Lifeboat creation process.

Come back next week, and I’ll show you what you get inside the Data Lifeboat when you download it!

Our Data Lifeboat workshops are complete

Thanks to support from the Mellon Foundation, we have now completed our two international Data Lifeboat workshops. They were great! We have various blog posts planned to share what happened, and I’ll just start with a very quick summary.

As you may know, we had planned two workshops:

  1. Washington DC, at The Library of Congress, in October, and
  2. London, at the Garden Museum and Autograph Gallery, in November.

We were pleased to welcome a total of 32 people across the events, from libraries, archives, academic institutions, the freelance world, other like-minded nonprofits, Flickr.com, and Flickr.org.

Now we are doing the work of sifting through the bazillion post-its and absorbing the great conversations we had as we worked through Tori’s fantastic program for the event. We were all very well-fed and organized too, thanks to Ewa’s superb project management. Thank you both.

Workshop aims

The aims of each workshop were the same:

  • Articulate the value of archiving social media, and of Data Lifeboat
  • Detail where Data Lifeboat fits in the current ecology of tools and practices
  • Detail where Data Lifeboat fits with curatorial approaches and content delivery
  • Plot (and recognise) the type and amount of work it would take to establish Data Lifeboat or similar in organisations

Workshop outline

We met these aims by organising the workshops into different sessions:

  1. Foundations of Long-Term Digital Preservation – Backward/forward horizons; understanding digital infrastructures; work happening in long-term digital preservation
  2. Data Lifeboat: What we’re thinking so far – Reporting on our NEH work to prototype software and policy, including a live demo; positioning a Data Lifeboat in emergency/not-emergency scenarios; curation needs or desires to use Data Lifeboats as a selection/acquisition tool
  3. Consent and Care in Social Media Archiving – Ethics of care in digital archives; social context and care vs extractive data practices; mapping ethical rights, risks, responsibilities including copyright and data protection, and consent, and
  4. Characteristics of a Robust & Responsible Safe Harbor Network (our planned extension of the Data Lifeboat concept – think LOCKSS-ish) – The long history of safe harbor networks; logistics of such a network; trust.

I’m not going to report on these now – consider this a teaser for our further reporting back.

Background readings

Tori also prepared some grounding readings for the event, which we thought others may like to review:

Needless to say, we all enjoyed it very much, and heard the same from our attendees. Several follow-on chats have been arranged, and the community continues to wiggle towards each other.

How do you preserve 50 billion photos? Notes from iPres 2024

Alex Chan

In September, Tori and I went to Belgium for iPres 2024. We were keen to chat about digital preservation and discuss some of our ideas for Data Lifeboat – and enjoy a few Belgian waffles, of course!

Photo by the Digital Preservation Coalition, used under CC BY-NC-SA 2.0

We ran a workshop called “How do you preserve 50 billion photos?” to talk about the challenges of archiving social media at scale. We had about 30 people join us for a lively discussion. Sadly we don’t have any photos of the workshop, but we did come away with a lot to think about, and we wanted to share some of the ideas that emerged.

Thanks to the National Endowment for the Humanities for supporting this trip as part of our Digital Humanities Advancement Grant.

How much can we preserve?

Some people think we should try to collect all of social media – accumulate as much data as possible, and sort through it later. That might be appealing in theory, because we won’t delete anything that’s important to future researchers, but it’s much harder in practice.

For Data Lifeboat, this is a problem of sheer scale. There are 50 billion photos on Flickr, and trillions of data points forming the social graph that connects them. It’s simply too much for a single institution to collect as a whole.

At the conference we heard about other challenges that make it hard to archive social media, like constraints on staff time, limited resources, and a lack of cooperation from platform owners. Twitter/X came up repeatedly as an example of a site which has become much harder to archive after changes to the API.

There are also longer-term concerns. Sibyl Schaefer, who works at the University of California, San Diego, presented a paper about climate change, and how scarcity of oil and energy will affect our ability to do digital preservation. All of our digital services rely on a steady supply of equipment and electricity, which seems increasingly fraught as we look over the next 100 years. “Just keep everything” may not be a sustainable strategy.

This paper was especially resonant for us, because she encourages us to think about these problems now, before the climate crisis gets any worse. It’s better to make a decision when you have more options and things are (relatively) calm, than wait for things to get really bad and be forced to make a snap judgment. This matches our approach to rights, privacy, and legality with Data Lifeboat – we’re taking the time to consider the best approach while Flickr is still happy and healthy, and we’re not under time pressure to make a quick decision.

What should we keep?

We went to iPres believing that trying to keep everything is inappropriate for Flickr and social media, and the conversations we had only strengthened this view. There are definitely benefits to keeping everything, but they require an abundance of staffing and resources that simply doesn’t exist.

One thing we heard at our Birds of a Feather session is that if you can only choose a selection of photos from Flickr, large institutions don’t want to make that selection themselves. They want an intermediate curator to choose photos to go in a Data Lifeboat, and then bequeath that Data Lifeboat to their archive. That person decides what they think is worth keeping, not the institution.

Who chooses what to keep?

If you can only save some of a social media service, how do you decide which part to take? You might say “keep the most valuable material”, but who decides what’s valuable? This is a thorny question that came up again and again at iPres.

An institution could conceivably collect Data Lifeboats from many people, each of whom made a different selection. Several people pointed out that any selection process will introduce bias and inequality – and while it’s impossible to fix these completely, having many people involved can help mitigate some of the issues.

This ‘collective selection’ helps deal with the fact that social media is really big – there’s so much stuff to look at, and it’s not always obvious where the interesting parts are. Sharing that decision with different people creates a broader perspective of what’s on a platform, and what material might be worth keeping.

Why are we archiving social media?

The discussion around why we archive social media is still frustratingly speculative. We went to iPres hoping to hear some compelling use cases or examples, but we didn’t.

There are plenty of examples of people using material from physical archives to aid their research. For example, one person told the story of the Minutes of the Milk Marketing Board. Once thought of as a dry and uninteresting collection, it became very useful when there was an outbreak of foot-and-mouth disease in Britain. We didn’t hear any case studies like that for digital archives.

There are already lots of digital archives and archives of Internet material. It would be interesting to hear from historians and researchers who are using these existing collections, to hear what they find useful and valuable.

The Imaginary Future Researcher

A lot of discussion revolved around an imaginary future researcher or PhD student, who would dive into the mass of digital material and find something interesting – but these discussions were often frustratingly vague. The researcher would do something with the digital collections, but more specifics weren’t forthcoming.

As we design Data Lifeboat, we’ve found it useful to imagine specific scenarios:

  • The Museum of London works with schools across the city, engaging students to collect great pictures of their local area. A schoolgirl in Whitechapel selects 20 photos of Whitechapel she thinks are worth depositing in the Museum’s collection.
  • A botany student at California State looks across Flickr to find photographs of plant coverage in a specific area and gathers them as a longitudinal archive.
  • A curation student interning at Qtopia in Sydney wants to gather community documentation of Sydney’s Mardi Gras.

These only cover a handful of use cases, but they’ve helped ground our discussions and imagine how a future researcher might use the material we’re saving.

A Day Out in Antwerp

On my final day in Belgium, I got to visit a couple of local institutions in Antwerp. I got a tour of the Plantin-Moretus Museum, which focuses on sixteenth-century book printing. The museum is an old house with some gorgeous gardens:

And a collection of old printing machines:

There was even a demonstration on a replica printing machine, and I got to try a bit of printing myself – it takes a lot of force to push the metal letters and the paper together!

Then in the afternoon I went to the FelixArchief, the city archive of Antwerp, which is stored inside an old warehouse next to the Port of Antwerp:

We got a tour of their stores, including some of the original documents from the very earliest days of Antwerp:

And while those shelves may look like any other archive, there’s a twist – they have to fit around the interior shape of the original warehouse! The archivists have got rather good at playing Tetris with their boxes and fitting everything into tight spaces – this gets them an extra 2 kilometres of shelving space!

Our tour guide explained that this is all because the warehouse is a listed building – and if the archives were ever to move out, they’d need to remove all their storage and leave it in a pristine condition. No permanent modifications allowed!

Next steps for Data Lifeboat

We’re continuing to think about what we heard at iPres, and to bring it into the design and implementation of Data Lifeboat.

This month and next, we’re holding Data Lifeboat co-design workshops in Washington DC and London to continue these discussions.

Field Notes #01: Lughnasadh

by Fattori McKenna

Deep Reading in the Last Days of Summer

 

I joined the Foundation team in early August, with the long-term goal of better understanding future users of the Data Lifeboat project and Safe Harbor Network. Thanks to the Digital Humanities Advancement Grant we were awarded by the National Endowment for the Humanities, my first task was to get up to speed with the Data Lifeboat project – a concept that has been in the works since 2022 as part of Flickr.org’s Content Mobility Program, and which recently became a working prototype. I have the structured independence to design my own research plan and, as every researcher knows, being able to immerse oneself in the topic beforehand is a huge advantage. It allows us to frame the problem at hand, to be resolute with objectives, and to ground the research in what is known and current.

 

Stakeholder interviews

To understand what would be needed from the research plan, I first wanted to understand how we got to where we are with the Data Lifeboat project.

I spoke with Flickr.org’s tight-knit internal team to gather perspectives that emphasised varying approaches to the question of long-term digital preservation: ranging from the technological, to the speculative, to the communal. It was curious to see how different team members viewed the project, each speaking from their own specialty, with their wider ambitions and community in mind.

Branching out, I enlisted external stakeholders – those who’ve had a hand in the Data Lifeboat project since it was in napkin-scribble format – for half-hour chats. The tool owes its present form to a cadre of digital preservation experts and enthusiasts who do not work on the project full-time, but have generously given their hours to partake in workshops, coffees, Whereby calls, and a blissfully meandering Slack thread. Knowing these folks would themselves be a huge repository of knowledge, I wanted a way to capture it. Besides introductions to the Safe Harbor Network co-design workshops (as supported by the recent Mellon Foundation grant) and my new role, I centred our conversation around three key questions:

  1. What has your experience of the last six months of the Data Lifeboat project been like? How do you think we are doing? Any favourite moments, any concerns?
  2. What are the existing practices around digital acquisition, storage and maintenance in your organisation(s)? How would the Data Lifeboat and Safe Harbor Network differ from the existing practices?
  3. Where are the blind-spots that still exist for developing the Data Lifeboat project and Safe Harbor Network? What might we want to find out from the co-design workshops in October and November?

Here it was notable to learn what had stuck with them in the pause since the last Data Lifeboat project meet-up. For some, the emphasis was on how the Data Lifeboat tool could connect institutions; for others, it was how the technology can decentralise power and ownership of data. All were keen to see what shape the project would take next.

One point, however, remained amorphous to all stakeholders, and we ought to carry it forward into the research: what is the problem that the Data Lifeboat project is solving? Specifically in a non-emergency scenario (the emergency need is intuitive). How can we best articulate that problem to our imagined users?

As our prototype user group is likely to be institutional users of Flickr (Galleries, Libraries, Archives and Museums), it will be important to meet them where they are, which brought me onto my next August task: the mini-literature review.

 

Mini Literature Review

Next, I wanted to get up to date on the contemporary discourses around digital preservation. Whilst stakeholders have brought their understanding of these topics to shaping the Data Lifeboat project, it felt as if the project was missing its own bibliography or set of citations. I wanted to ask: what are the existing conversations that the Data Lifeboat project is speaking to?

It goes without saying that this is a huge topic and, despite my humble background in digital heritage research (almost always theoretical), cramming it all into one month would be impossible. Thus, I adopted the archival ‘sliver’ that so informs the ethos of the Data Lifeboat project, and took a snapshot of current literature. After reviewing the writing to date on the project (shout-out to Jenn’s reporting here and here), I landed on three guiding topics for the literature review:

 

The Status of Digital Preservation

  • What are the predominant tools and technologies of digital preservation?
  • What are recent reflections and learnings from web archiving experiments?
  • What are current institutional and corporate strategies for digital social collecting and long-term data storage?

Examples include:

Care & Ethics of Archives

  • What are the key ethical considerations among archivists today?
  • How are care practices being embedded into archives and archival practice?
  • What reflections and responses exist to previous ethical interventions?

Examples include:

Collaboration and Organisation in Archival Practice

  • What are the infrastructures (hard and soft) of archival practice?
  • What are the predominant organisational structures, considerations, and difficulties in digital archives?
  • How does collaboration appear in archives? Who are the (visible and invisible) stakeholders?

Examples include:

 

A selection of academic articles, blog posts, and industry guidelines served as source materials (along with crowdsourced favourites from the Flickr.org team). In reading these texts, I kept one question top of mind – ‘What does this mean for the Data Lifeboat project and the Safe Harbor Network?’ – or, in more granular terms: ‘What can we learn from these investigations?’, ‘Where are we positioned in the wider ecosystem of digital preservation?’ and finally, ‘What should we be thinking about that we aren’t yet?’

Naturally, with more time or an academic audience in mind, a more rigorous methodology for discourse capture would be appropriate. For our purposes, however, this snapshot approach suffices – ultimately the data this research is grounded in comes not from textual problematising, but will instead emerge from our workshops with future users.

Having this resource is of huge benefit to meeting our session participants where they stand. Whilst there will inevitably be discourses, approaches and critiques I have missed, I will at least be able to speak the same language as our participants and get into the weeds of our problems in a complex, rather than baseline, manner. Furthermore, my ambition is for this bibliography to become an ongoing and open-source asset, expanding as the project develops.

These three headers (1. The Status of Digital Preservation, 2. Care & Ethics of Archives, 3. Collaboration and Organisation in Archival Practice) currently constitute placeholders for our workshop topics. It is likely, however, that these titles could evolve, splinter or coalesce as we come closer to a more refined and targeted series of questions for investigating with our participants.

 

Question Repository [in the works]

Concurrently to these ongoing workstreams, I am building a repository, or long-list, of questions for our upcoming workshops. The aim is to first go broad, listing all possible questions, in an attempt to capture as many inquisitive voices as possible. These will then be refined down, grouped under thematic headings which will in turn structure the sub-points or provocations for our sessions. This iterative process reflects a ground-up methodology, derived from interviews, reading, and the collective knowledge of the Flickr.org community, to finally land on working session titles for our October and November Safe Harbor Network co-design workshops.

Looking ahead, there is an opportunity to test several of these provocations around Data Lifeboat at our Birds-of-a-Feather session, taking place at this year’s International Conference on Digital Preservation (iPres) in Ghent later this month. Here we might foresee which questions generate lively and engaged discussion; which features of the Data Lifeboat tool and project prompt anticipation or concern; and finally, which pathways we ought to explore further.

 

Other things I’ve been thinking about this month

Carl Öhman’s concept of the Neo-Natufians in The Afterlife of Data: What Happens to Your Information When you Die and Why You Should Care

Öhman proposes that the digital age has ushered in a major shift in how we interact with our deceased. Referencing the Natufians – the first non-nomadic peoples to keep the dead among their tribe (they would adorn skulls with seashells and place them in the walls) instead of leaving them behind to the elements – he posits that our current position is equally seismic. The dead now live alongside us in the digital realm. In a profound shift from the family shoebox of photographs, the dead are accessible from virtually anywhere at any time, their (visible and invisible) data trail co-existing with ours. An inescapable provocation for the Data Lifeboat project to consider.

“The imago mask, printed not in wax but in ones and zeros”

The Shikinen Sengu Ritual at Ise Jingu, Japan

The Shikinen Sengu is a ritual held at the Ise Grand Shrine in Japan every 20 years, in which the shrine is completely rebuilt and the sacred objects are transferred to the new structure. This practice has been ongoing for over a millennium, and it makes me think about the mobility of cultural heritage (analogue or digital), and how stasis, despite its intuitive appeal, can cause objects to perish. I am reminded of the oft-exalted quote from di Lampedusa’s Sicilian epic:

“If we want things to stay as they are, things will have to change.” The Leopard, by Giuseppe Tomasi di Lampedusa

Furthermore, Shikinen Sengu highlights the importance of ritual in sustaining objects, despite the wear-and-tear that handling over millennia may cause. What might our rituals around digital cultural data be? What practices could we generate (even if the original impetus gets lost)?

 

Background Ephemera

Currently Playing: Laura Misch Sample the Earth and Sample the Sky

Currently Reading: The Hearing Trumpet by Leonora Carrington

Currently Drinking: Clipper Green Tea

Welcome, Fattori!

Hello, world! I’m Fattori, Lead Researcher on the Data Lifeboat Project at the Flickr Foundation.

I first used Flickr in 2005; at that time, I was an angsty teen who needed a place to store grainy photos of Macclesfield, my post-industrial hometown, that I shot on an old Minolta camera. Since then, both my career and my academic research have focused on themes that are central to the aims of Flickr.org: images, databases, community, and the recording of human experiences.

In 2017 I began working as a researcher for strategic design studios based in New York, Helsinki, London and Mumbai. My research tried to address complex questions about humans’ experience of modern visual cultures by blending semiotics, ethnography and participatory methods. My commercial projects allowed me to explore women’s domestic needs in rural Vietnam, the future of work in America’s Rust Belt, and much in between.

As a postgraduate researcher at the Oxford Internet Institute, my work explores how blockchain experiments have shaped art and heritage sectors in the U.K. and Italy. At an Oxford Generative AI Summit I met the Flickr Foundation’s Co-Founder, George, and we hosted a workshop on Flickr’s 100-Year Plan with University and Bodleian academics, archivists, and students. I subsequently became more involved with Flickr.org when I contributed research to their generative AI time-capsule, A Generated Family of Man.

Now, as Lead Researcher at Flickr.org, I’m developing a plan to help better understand future users of Data Lifeboat and the proposed Safe Harbor Network. We want to know how these tools might be implemented in real-world contexts, what problems they might solve, and how we can maintain the soft, collective infrastructure that keeps the Data Lifeboat afloat.

Beyond my professional life, I always have a jumper on my knitting needles (I can get quite nerdy about wool), I rush to a potter’s wheel whenever I can, and I’m writing a work of historical fiction about a mystic in the Balearic Islands. Like my 2005 self, I still snap the odd photo, these days on a Nikon L35AF.

Improving millions of files on Wikimedia Commons with Flickypedia Backfillr Bot

Last year, we built Flickypedia, a new tool for copying photos from Flickr to Wikimedia Commons. As part of our planning, we asked for feedback on Flickr2Commons and analysed other tools. We spotted two consistent themes in the community’s responses:

  • Write more structured data for Flickr photos
  • Do a better job of detecting duplicate files

We tried to tackle both of these in Flickypedia, and initially, we were just trying to make our uploader better. Only later did we realize that we could take our work a lot further, and retroactively apply it to improve the metadata of the millions of Flickr photos already on Wikimedia Commons. At that moment, Flickypedia Backfillr Bot was born. Last week, the bot completed its millionth update, and we guesstimate we will be able to operate on another 13 million files.

The main goals of the Backfillr Bot are to improve the structured data for Flickr photos on Wikimedia Commons and to make it easier to find out which photos have been copied across. In this post, I’ll talk about what the bot does, and how it came to be.

Write more structured data for Flickr photos

There are two ways to add metadata to a file on Wikimedia Commons: by writing Wikitext or by creating structured data statements.

When you write Wikitext, you write your metadata in a MediaWiki-specific markup language that gets rendered as HTML. This markup can be written and edited by people, and the rendered HTML is designed to be read by people as well. Here’s a small example, which adds some metadata to a file linking it back to the original Flickr photo:

== {{int:filedesc}} ==
{{Information
|Description={{en|1=Red-whiskered Bulbul photographed in Karnataka, India.}}
|Source=https://www.flickr.com/photos/shivanayak/12448637/
|Author=[[:en:User:Shivanayak|Shiva shankar]]
|Date=2005-05-04
|Permission=
|other_versions=
}}

and here’s what that Wikitext looks like when rendered as HTML:

A table with four rows: Description (Red-whiskered Bulbul photographed in Karnataka, India), Date (4 May 2005), Source (a Flickr URL) and Author (Shiva shankar)

This syntax is convenient for humans, but it’s fiddly for computers – it can be tricky to extract key information from Wikitext, especially when things get more complicated.

In 2017, Wikimedia Commons added support for structured data, which allows editors to add metadata in a machine-readable format. This makes it much easier to edit metadata programmatically, and there’s a strong desire from the community for new tools to write high-quality structured metadata that other tools can use.

When you add structured data to a file, you create “statements” which are attached to properties. The list of properties is chosen by the volunteers in the Wikimedia community.

For example, there’s a property called “source of file” which is used to indicate where a file came from. The file in our example has a single statement for this property, which says the file is available on the Internet, and points to the original Flickr URL:

Structured data is exposed via an API, and you can retrieve this information in nice machine-readable XML or JSON:

$ curl 'https://commons.wikimedia.org/w/api.php?action=wbgetentities&sites=commonswiki&titles=File%3ARed-whiskered%20Bulbul-web.jpg&format=xml'
<?xml version="1.0"?>
<api success="1">
  …
  <P7482>
    …
    <P973>
      <_v snaktype="value" property="P973">
        <datavalue
          value="https://www.flickr.com/photos/shivanayak/12448637/"
          type="string"/>
      </_v>
    </P973>
    …
  </P7482>
</api>

(Here “P7482” means “source of file” and “P973” is “described at URL”.)
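Here’s a sketch of pulling that Flickr URL out of the JSON form of the same response in Python. It assumes the statement follows the common Commons pattern shown above – a “source of file” (P7482) statement with the URL in a “described at URL” (P973) qualifier:

import httpx

resp = httpx.get(
    "https://commons.wikimedia.org/w/api.php",
    params={
        "action": "wbgetentities",
        "sites": "commonswiki",
        "titles": "File:Red-whiskered Bulbul-web.jpg",
        "format": "json",
    },
)
entity = next(iter(resp.json()["entities"].values()))

for statement in entity["statements"].get("P7482", []):
    for qualifier in statement.get("qualifiers", {}).get("P973", []):
        print(qualifier["datavalue"]["value"])
        # https://www.flickr.com/photos/shivanayak/12448637/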

Part of being a good structured data citizen is following the community’s established patterns for writing structured data. Ideally every tool would create statements in the same way, so the data is consistent across files – this makes it easier to work with later.

We spent a long time discussing how Flickypedia should use structured data, and we got a lot of helpful community feedback. We’ve documented our current data model as part of our Wikimedia project page.

Do a better job of detecting duplicate files

If a photo has already been copied from Flickr onto Wikimedia Commons, nobody wants to copy it a second time.

This sounds simple – just check whether the photo is already on Commons, and don’t offer to copy it if it’s already there. In practice, it’s quite tricky to tell if a given Flickr photo is on Commons. There are two big challenges:

  1. Files on Wikimedia Commons aren’t consistent in where they record the URL of the original Flickr photo. Newer files put the URL in structured data; older files only put the URL in Wikitext or the revision descriptions. You have to look in multiple places.
  2. Files on Wikimedia Commons aren’t consistent about which form of the Flickr URL they use – with and without a trailing slash, with the user NSID or their path alias, or the myriad other URL patterns that have been used in Flickr’s twenty-year history.

Here’s a sample of just some of the different URLs we saw in Wikimedia Commons:

https://www.flickr.com/photos/joyoflife//44627174
https://farm5.staticflickr.com/4586/37767087695_bb4ecff5f4_o.jpg
www.flickr.com/photo_edit.gne?id=3435827496
https://www.flickr.com/photo.gne?short=2ouuqFT

There’s no easy way to query Wikimedia Commons and see if a Flickr photo is already there. You can’t, for example, do a search for the current Flickr URL and be sure you’ll find a match – it wouldn’t find any of the examples above. You can combine various approaches that will improve your chances of finding an existing duplicate, if there is one, but it’s a lot of work and you get varying results.

For the first version of Flickypedia, we took a different approach. We downloaded snapshots of the structured data for every file on Wikimedia Commons, and we built a database of all the links between files on Wikimedia Commons and Flickr photos. For every file in the snapshot, we looked at the structured data properties where we might find a Flickr URL. Then we tried to parse those URLs using our Flickr URL parsing library, and find out what Flickr photo they point at (if any).
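As an illustration of where that process ends up – a sketch, not the real Flickypedia schema:

import sqlite3

con = sqlite3.connect("flickr_ids_on_commons.db")
con.execute(
    "CREATE TABLE IF NOT EXISTS flickr_photos_on_commons"
    "  (flickr_photo_id TEXT, wikimedia_filename TEXT)"
)
con.execute(
    "CREATE INDEX IF NOT EXISTS idx_flickr_photo_id"
    "  ON flickr_photos_on_commons (flickr_photo_id)"
)

def find_existing_copies(flickr_photo_id):
    """Which Commons files came from this Flickr photo?"""
    rows = con.execute(
        "SELECT wikimedia_filename FROM flickr_photos_on_commons"
        "  WHERE flickr_photo_id = ?",
        (flickr_photo_id,),
    )
    return [filename for (filename,) in rows]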

This gave us a SQLite database that mapped Flickr photo IDs to Wikimedia Commons filenames. We could use this database to do fast queries to find copies of a Flickr photo that already exist on Commons. This proved the concept, but it had a couple of issues:

  • It was an incomplete list – we only looked in the structured data, and not the Wikitext. We estimate we were missing at least a million photos.
  • Nobody else can use this database; it only lives on the Flickypedia server. Theoretically somebody else could create it themselves – the snapshots are public, and the code is open source – but it seems unlikely.
  • This database is only as up-to-date as the latest snapshot we’ve downloaded – it could easily fall behind what’s on Wikimedia Commons.

We wanted to make this process easier – both for ourselves, and anybody else building Flickr–Wikimedia Commons integrations.

Adding the Flickr Photo ID property

Every photo on Flickr has a unique numeric ID, so we proposed a new Flickr photo ID property to add to structured data on Wikimedia Commons. This proposal was discussed and accepted by the Wikimedia Commons community, and gives us a better way to match files on Wikimedia Commons to photos on Flickr:

This is a single field that you can query, and there’s an unambiguous, canonical way that values should be stored in this field – you don’t need to worry about the different variants of Flickr URL.
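For example, you can now look up a Flickr photo on Commons with a single search API call. The sketch below assumes the new property was assigned the ID P12120 – check the property’s page on Commons before relying on that:

import httpx

resp = httpx.get(
    "https://commons.wikimedia.org/w/api.php",
    params={
        "action": "query",
        "list": "search",
        "srsearch": "haswbstatement:P12120=12448637",  # a Flickr photo ID
        "srnamespace": "6",  # the File: namespace
        "format": "json",
    },
)
for result in resp.json()["query"]["search"]:
    print(result["title"])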

We added this field to Flickypedia, so any files uploaded with our tool will get this new field, and we hope that other Flickr upload tools will consider adding it as well. But what about the millions of Flickr photos already on Wikimedia Commons? This is where Flickypedia Backfillr Bot comes in.

Updating millions of files

Flickypedia Backfillr Bot applies our structured data mapping to every Flickr photo it can find on Wikimedia Commons – whether or not it was uploaded with Flickypedia. For every photo which was copied from Flickr, it compares the structured data to the live Flickr metadata, and updates the structured data if the two don’t match. This includes the Flickr Photo ID.

It reuses code from our duplicate detector: it goes through a snapshot looking for any files that come from Flickr photos. Then it gets metadata from Flickr, checks if the structured data matches that metadata, and if not, it updates the file on Wikimedia Commons.

Here’s a brief sketch of the process:
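In Python-flavoured pseudocode – every helper named here is hypothetical, not the bot’s actual source:

for file in commons_snapshot:
    flickr_photo_id = find_flickr_photo_id(file)  # check SDC, then Wikitext
    if flickr_photo_id is None:
        continue  # not a Flickr photo; nothing to do

    photo = get_flickr_metadata(flickr_photo_id)  # live Flickr API call
    expected = build_structured_data(photo)       # our documented mapping

    for property_id, statement in expected.items():
        existing = file.structured_data.get(property_id)
        if existing is None:
            write_statement(file, property_id, statement)
        elif existing != statement:
            # A different value is already there: never overwrite,
            # flag it for a human to look at instead.
            flag_for_manual_review(file, property_id, statement)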

Most of the time this logic is fairly straightforward, but occasionally the bot will get confused – this is when the bot wants to write a structured data statement, but there’s already a statement with a different value. In this case, the bot will do nothing and flag it for manual review. There are edge cases and unusual files in Wikimedia Commons, and it’s better for the bot to do nothing than write incorrect or misleading data that will need to be reverted later.

Here are two examples:

  • Sometimes Wikimedia Commons has more specific metadata than Flickr. For example, this Flickr photo was posted by the Donostia Kultura account, and the description identifies Leire Cano as the photographer.

    Flickypedia Backfillr Bot wants to add a creator statement for “Donostia Kultura”, because it can’t understand the description – but when this file was copied to Wikimedia Commons, somebody added a more specific creator statement for “Leire Cano”.

    The bot isn’t sure which statement is correct, so it does nothing and flags this for manual review – and in this case, we’ve left the existing statement as-is.

  • Sometimes existing data on Wikimedia Commons has been mapped incorrectly. For example, this Flickr photo was taken “circa 1943”, but when it was copied to Wikimedia Commons somebody added an overly precise “date taken” statement claiming it was taken on “1 Jan 1943”.

    This bug probably occurred because of a misunderstanding of the Flickr API. The Flickr API will always return a complete timestamp in the “date” field, and then return a separate granularity value telling you how accurate it is. If you ignore that granularity value, you’ll create an incorrect statement of the date – see the sketch below this list.

    The bot isn’t sure which statement is correct, so it does nothing and flags this for manual review – and in this case, we made a manual edit to replace the statement with the correct date.
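Here’s a sketch of how that granularity value should be interpreted, using the granularity values documented in the Flickr API (0 = exact time, 4 = month, 6 = year, 8 = circa):

def describe_date_taken(taken, granularity):
    """Render a Flickr "date taken" value at the right precision."""
    year = taken[:4]
    if granularity == 0:
        return taken  # e.g. "2005-05-04 13:00:00"
    if granularity == 4:
        return taken[:7]  # year and month only
    if granularity == 6:
        return year  # year only
    if granularity == 8:
        return f"circa {year}"
    raise ValueError(f"unrecognised granularity: {granularity}")

# A "circa 1943" photo comes back as taken="1943-01-01 00:00:00" with
# granularity=8 - reading the timestamp alone produces the wrong claim.
assert describe_date_taken("1943-01-01 00:00:00", 8) == "circa 1943"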

What next?

We’re going to keep going! There were a few teething problems when we started running the bot, but the Wikimedia community helped us fix our mistakes. It’s now been running for a month or so, and has processed over a million files.

All the Flickypedia code is open source on GitHub, and a lot of it isn’t specific to Flickr – it’s general-purpose code for working with structured data on Wikimedia Commons, and could be adapted to build similar bots. We’ve already had conversations with a few people about other use cases, and we’ve got some sketches for how that code could be extracted into a standalone library.

We estimate that at least 14 million files on Wikimedia Commons are photos that were originally uploaded to Flickr – more than 10% of all the files on Commons. There’s plenty more to do. Onwards and upwards!

Data Lifeboat 5: Prototypes and policy

We are now past the midpoint of our first project stage, and have our three basic prototype Data Lifeboats. At the moment, they run locally via the command line and generate rough versions of what Data Lifeboats will eventually contain—data and pictures.

The last step for those prototypes is to move them into a clicky web prototype showing the full workflow—something we will share with our working group (but may not put online publicly). We are working towards completing this first prototyping stage around the end of June and writing up the project in July.

We’ve made a few key decisions since we last posted an update, namely about who we’re designing for and what other expertise we need to bring in. We still have more questions than answers, but really, that’s what prototyping is for.

Who might do which bit

It took us a while to get to this decision, but once we had gone through the initial discovery phase, it became clear that we need to concentrate our efforts on three key user groups:

  1. Flickr members – People who’ve uploaded pictures to Flickr, have set licenses and permissions, and may either be happy or not happy for their pictures to be put into Data Lifeboats.
  2. Data Lifeboat creators – Could be archivists or other curatorial types looking to gather sets of pictures to copy into archives elsewhere, whether that be an institution like The Library of Congress, or a family archivist with a DropBox account.
  3. Dock operators – This group is a bit more speculative, but we envision that Data Lifeboats could actually land (or dock) in specific destinations and be treated with special care there. Our ideal scenario would be to develop a network of docks – something we’ve been calling a “Safe Harbor Network” – made up of the great and good of cultural organizations: they are already really good at keeping things safe over the long term.

It’ll be good to flesh out the needs and wants of these three groups in more detail in our next stage. If you are a Flickr member reading this, and want to share your story about what your Flickr account means to you, we’d love to hear it.

Web archive vs object archive

Some digital/web preservation experts take the view that it’s archivally important to also preserve the user interface of a digital property, in order to fully understand a digital object’s context. This has arguably resulted in web archives containing a whole lot more information and structural stuff than is useful or necessary. It’s sort of like archiving the entire house within which the shoebox of photos was found.

We have decided that archiving the flickr.com interface itself is not necessary for a Data Lifeboat, and we will be designing a special viewer that will live inside each Data Lifeboat to help people explore its contents.

Analysing the need for new policy

The Data Lifeboat idea is about so much more than technology. Even though the technology is certainly challenging, the more we think about it, the more challenging the social and ethical aspects appear. It’s gritty, complex stuff, made more so by the delicate socio-technical settings available to Flickr members, like privacy, search settings, and licensing. The crosshatch of these three vectors makes managing stable permissions over time harder than weaving a complicated textile!

Once we narrowed down our focus to these specific user groups, it also became clear that we need to address the (very) complex legal landscape surrounding the potential for archiving Flickr images outside the service. It’s particularly gnarly when you start considering how permissions might change over time, or how access might shift for different scales of audience. For example, a Flickr member might be happy for Data Lifeboats containing their images to be shared with friends of friends, but a little apprehensive about them being shared with a recognized cultural institution that would use them for research. They may be much less happy for their Flickr pictures to be fully archived and available to anyone in perpetuity.

To help us explore these questions, and begin prototyping policies for each type of user group we foresee, we have enlisted the help of Dr. Andrea Wallace of the Law School at the University of Exeter. She is working with us to develop legal and policy frameworks tailored to the needs of each of these three groups, and to study how the current Flickr Terms of Service may be suitable for, or need adaptation around, this idea of a Data Lifeboat. This may include drafting the terms and conditions needed to create a Data Lifeboat, considering how we might be able to enhance rights management, and exploring how to manage the expiration or decay of privacy or licensing into the future.

Data Lifeboat prototypes

We have generated three different prototype Data Lifeboats to think with, and show to our working group:

  1. Photos tagged with “Flickrhq”: This prototype includes thousands of tagged images of ‘life working at Flickr’, which is useful to explore the tricky aspects of collating other people’s pictures into a Data Lifeboat. Creating it revealed a search foible, whereby the result set that is delivered by searching via a tag is not consistent. Many of the pictures are also marked as All Rights Reserved, with 33% having downloads disabled. This raises juicy questions about licensing and permissions that need further discussion.
  2. Two photos from each Flickr Commons Member: We picked this subset because Flickr Commons photos are earmarked with the ‘no known copyright restrictions’ assertion, so questions about copying or reusing are theoretically simpler. 
  3. All photos from the Library of Congress (LoC) account: Comprising roughly 42,000 photos also marked as “no known copyright restrictions,” this prototype contains a set that is simpler to manage as all images have a uniform license setting. It was also useful to generate a Data Lifeboat of this size as it allowed us to do some very early benchmarking on questions like how long it takes to create one and where changes to our APIs might be helpful.

Preparing these prototypes has underscored the challenges of balancing the legal, social, and technical aspects of this kind of social media archiving, making clear the need for a special set of terms & conditions for Data Lifeboat creation. They also reveal the limitations of tags in capturing all relevant content (which, to some extent, we were expecting) and the user-imposed restrictions set on images in the Flickr context, like ‘can be downloaded.’

Remaining questions?

OMG, so many. Although the prototypes are still in progress, they have already stimulated great discussion and raised some key questions, such as:

  • How might user intentions or permissions change over time and how could software represent them?
  • How could the scope or scale of sharing influence how shared images are perceived, updated, and utilized?
  • How can we understand the different use cases, and how archivists/librarians could engage with Data Lifeboats?
  • How important is it to make sure Data Lifeboats are launched with embedded rights information, and how might those decay over time?
  • How should we be considering the descriptive or social contexts that accompany images, and how should they inform subsequent decisions about expiration dates?

Long term sustainability and funding models

It’s really too early to be talking about this – and we’re definitely not ready to present any actual, reasonable, viable models here, because we don’t know enough yet about how Data Lifeboats could be used or under what circumstances. We did do a first-pass review of some obvious potential business models, for example:

  • A premium subscription service that allows Flickr.com users to create personalized Data Lifeboats for their own collections.
  • A consulting service for institutions and individuals who want to create Data Lifeboats for specific archival purposes.
  • Developing training and certification programs for digital archiving that use Data Lifeboats as the foundation.
  • Membership fees for members of the Safe Harbor network, or charging fees for access to the Data Lifeboat archives.

While there were aspects of each that appealed to our partners, there were also significant flaws, so overall we’re still a long way from having an answer. This is something else we’re planning to explore more broadly in partnership with the wider Flickr Commons membership in subsequent phases of this project.

Next steps

This month we’ll be wrapping up this first prototyping phase supported by the National Endowment for the Humanities. After we’ve completed the required reporting, we’ll move into the next phase in earnest, reaching out to those three user groups more deliberately to learn more about how Data Lifeboats could operate for them and what they would need them to do. 

Two upcoming in-person events!

We’re also very happy to be able to tell you the Mellon Foundation has awarded us a grant to support this next stage, and we’re especially looking forward to running two small events later in the year to gather people from our Flickr Commons partner institutions, as well as other birds of a feather, to discuss these key challenges together.

If you’d like to register your interest in attending one of these meetings, please let us know via this short Registration of Interest form. Please note: these will be small – maybe 20ish people at each – registering interest does not guarantee a spot, and we’ve only just begun planning in earnest.

 

Data Lifeboat Update 4: What a service architecture could be like

We’re starting to write code for our Data Lifeboat, and that’s pushed us to decide what the technical architecture looks like. What are the different systems and pieces involved in creating a Data Lifeboat? In this article I’m going to outline what we imagine that might look like.

We’re still very early in the prototyping stage of this work. Our next step is going to be building an end-to-end prototype of this design, and seeing how well it works.

Here’s the diagram we drew on the whiteboard last week:

Let’s step through it in detail.

First somebody has to initiate the creation of a Data Lifeboat, and choose the photos they want to include. There could be a number of ways to start this process: a command-line tool, a graphical web app, a REST API.

We’re starting to think about what those interfaces will look like, and how they’ll work. When somebody creates a Data Lifeboat, we need more information than just a list of photos. We know we’re going to need things like legal agreements, permission statements, and a description of why the Lifeboat was created. All this information needs to be collected at this stage.

However these interfaces work, it all ends in the same way: with a request to create a Data Lifeboat for a list of photos and their metadata from Flickr.
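As a sketch of what such a request might carry – every field name here is hypothetical, since the real schema is still being designed:

from dataclasses import dataclass, field

@dataclass
class DataLifeboatRequest:
    creator: str                  # authenticated Flickr member (NSID)
    photo_ids: list               # the photos to include
    readme: str                   # note to the future: why this selection
    agreements: list = field(default_factory=list)        # accepted policies
    content_warnings: list = field(default_factory=list)  # care notes

request = DataLifeboatRequest(
    creator="12037949754@N01",
    photo_ids=["52000000001", "52000000002"],
    readme="Photos of Whitechapel selected for a school project.",
    agreements=["responsible-use-v0"],
)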

To take a list of photos and create a Data Lifeboat, we’ll have a new Data Lifeboat Creator service. This will call the Flickr API to fetch all the data from Flickr.com, and package it up into a new file. This could take a long time, because we need to make a lot of API calls! (Minutes, if not hours.)

We already have the skeleton of this service in the Commons Explorer, and we expect to reuse that code for the Data Lifeboat.

We are also considering creating an index of all the Data Lifeboats we’ve created – for example, “Photo X was added to Data Lifeboat Y on date Z”. This would be a useful tool for people wanting to look up Flickr URLs if the site ever goes away. “I have a reference to photo X, where did that end up after Flickr?”
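If we build it, an index record might look something like this – purely illustrative, since no such index exists yet:

index_entry = {
    "flickr_photo_id": "12448637",
    "data_lifeboat_id": "lifeboat-0042",  # hypothetical ID scheme
    "added_at": "2024-05-17",
}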

When all the API calls are done, this service will eventually produce a complete, standalone Data Lifeboat which is ready to be stored!

When we create the Data Lifeboat, we’re imagining we’ll keep it on some temporary storage owned by the Flickr Foundation. Once the packaging is complete, the person or organization who requested it can download it to their permanent storage. Then it becomes their responsibility to make sure it’s kept safely – for example, creating backups or storing it in multiple geographic locations.

The Flickr Foundation isn’t going to run a single, permanent store of all Data Lifeboats ever created. That would turn us into another Single Point of Failure, which is something we’re keen to avoid!

There are still lots of details to hammer out at every step of this process, but thinking about the broad shape of the Data Lifeboat service has already been useful. It’s helped us get a consistent understanding of what the steps are, and exposed more questions for us to ponder as we keep building.

Data Lifeboat Update 3

March has been productive. The short version is it’s complicated but we’re exploring happily, and adjusting the scope in small ways to help simplify it. Let me summarise the main things we did this month.

Legal workshop

We welcomed two of our advisors – Neil from the Bodleian and Andrea from the GLAM-E Lab – to our HQ to get into the nitty gritty of what a 50-year-old Data Lifeboat needs to accommodate.

As we began the conversation, I centred us in the C.A.R.E. Principles and asked that we always keep them in our sights for this work. The main future challenges are settling around the questions of how identity and the right to be forgotten must be expressed, how Flickr account holders can or should be identified, and whether an external name resolver service of some kind could help us. We think we should develop policies for Flickr members (on consent to be in a Data Lifeboat), Data Lifeboat creators (on their obligations as creators), and Dock Operators (an operations manual & obligations for operating a dock). It’s possible there will also be some challenges ahead around database rights, but we don’t know enough yet to give a good update. We’d like a first-take legal framework of the Data Lifeboat system to be an outcome of these first six months.

Privacy & licensing

Privacy and licensing are concepts central to Flickr, and we must make sure we do our utmost to respect them in all our work. It would be irresponsible for us to jettison the desires encoded in those settings for our convenience, tempting though that may be. By that I mean: it would be easier for us to make Data Lifeboats containing whatever photos from whomever, but we must respect the desires of Flickr creators in the creation process.

There are still big and unanswered questions about consent, and how we get millions of Flickr members to agree to participate and give permission to allow their pictures to be put in other people’s Data Lifeboats. 

Extending the prototype Data Lifeboat sets 

Initially, we had planned to run this 6-month prototype stage with just one test set of images – some or all of the Flickr Commons photographs. But in order to explore the challenges around privacy and licensing, we’ve decided to expand our set of working prototypes to also include the entire Library of Congress Flickr Commons account, and all the photos tagged with “flickrhq”. (That last set is something the Flickr Foundation may decide to collect for its own archive, and it contains photographs from different Flickr members who also happen to have been Flickr staff and would therefore, theoretically, be more sympathetic to the consent question.)

Visit to Greenwich

Ewa spotted that there was an exhibition of ambrotype photographic portraits of women in the RNLI at the Maritime Museum in Greenwich at the moment, so we decided to take a day trip to see the portraits and poke around the brilliant museum. We ended up taking a boat from Greenwich to Battersea which was a nice way to experience the Thames (and check out that boat’s life saving capabilities).

Day Out: Maritime Museum & Lifeboats

The Data Lifeboat creation process

I found myself needing to start sketching out what it could look like to actually create a Data Lifeboat – particularly not via a command line – so we spent a while in front of a whiteboard kicking that off.

At this point, we’re imagining a few key steps:

  1. The Query – “I want these photos” – is like a search. We could borrow from our existing Flinumeratr toy.
  2. The Results – Show the images and some metadata. But it’s hard to show information about the set in aggregate at this stage, e.g., how many of the contents are licensed in which way. This could form a manifest for the Data Lifeboat.
  3. Agreement – We think there’s a need for the Data Lifeboat creator to agree to certain terms. Simple, active language that echoes the CARE principles, API ToS, and Flickr Community Guidelines. We think this should also be included in the Data Lifeboat it’s connected with.
  4. README / Note to the Future – We love the idea that the Data Lifeboat creator could add a descriptive narrative at this point, about why they are making this lifeboat and for whom, though we recognise this may not get done at all, especially if it's too complicated or time-consuming. This is also a good spot to describe or configure warnings, timers, or other conditions needed for future access. (Thanks to two of our other advisors – Commons members Mary Grace and Alan – who shared their organisations' policies on acquisitions for reference.)
  5. Packaging – This would be asynchronous and invisible to the creator, downloading everything in the background. We realised it could take days, especially if there are lots of Data Lifeboats being made at once.
  6. Ready! – The Data Lifeboat creator gets a note somehow about the Data Lifeboat being ready for download. We may need to consider keeping it available only for a short time(?).
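
To make the middle steps a little more concrete, here's a minimal sketch of how the Results step might aggregate photo metadata into the beginnings of a manifest. This is purely illustrative: it's written in Python, and the `Photo` shape and field names are assumptions made for the example, not the real Flickr API response or our actual prototype code.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Photo:
    """A simplified stand-in for per-photo metadata from the Flickr API.
    These field names are illustrative, not the real API shape."""
    id: str
    license: str          # e.g. "CC BY 2.0" or "All Rights Reserved"
    is_public: bool
    downloads_disabled: bool

def summarise(photos: list[Photo]) -> dict:
    """Aggregate the facts a creator needs to see before agreeing to
    package a Data Lifeboat: how many photos, under which licenses,
    how many are public, and how many have downloads disabled."""
    return {
        "total": len(photos),
        "licenses": dict(Counter(p.license for p in photos)),
        "public": sum(p.is_public for p in photos),
        "downloads_disabled": sum(p.downloads_disabled for p in photos),
    }
```

A summary like this could sit alongside the creator's agreement and README as the start of the manifest.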

Creation Schematic, 19th March

Emergency v Non-Emergency 

We keep coming up against this… 

The original concept of the Data Lifeboat is a response to the near-death experience Flickr had in 2017, when its then-owner, Verizon/Yahoo, almost decided to vaporise it because they deemed it too expensive to sell (something known as "the cost of economic divestment"). In the event of that kind of emergency, we'd want to save as much of this unique collection as possible, as quickly as possible – which means a million lifeboats full of pictures created more or less simultaneously, or certainly within a relatively short period of time.

In the early days of this work, Alex said that the pressure of this kind of emergency would be the equivalent of being "hugged to death by the archivists," as we all try – in very caring and responsible ways – to save as much as we can. And then there's the bazillion-emergency-hits-to-the-API problem – aka the "Thundering Herd" problem – which we do not yet have a solution for, and which is very likely to affect any other social media platforms that may also be curious to explore this concept.
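
While we don't have a solution yet, the standard client-side mitigation for thundering-herd load is to retry with exponential backoff plus random jitter, so that a million lifeboat-builders who fail at the same moment don't all retry at the same moment too. A minimal sketch, purely illustrative and not something we've built:

```python
import random
import time

def fetch_with_backoff(fetch, max_retries=8, base_delay=1.0):
    """Retry `fetch` with exponential backoff plus "full jitter", so many
    clients hammering a struggling API spread out their retries."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:  # in practice: catch only rate-limit / 5xx errors
            if attempt == max_retries - 1:
                raise
            # Wait somewhere between 0 and base_delay * 2^attempt seconds.
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
```

Backoff only smooths the spike, though; a real emergency plan would also need coordination on the server side, which is part of why we're talking to the Flickr.com team.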

We’re connecting with the Flickr.com team to start discussing how to address this challenge. We’re beginning to think about how emergency selection might work, as well as the present and future challenges of establishing the identity of photo subjects and account owners. The millions of lifeboats that would be created would surely need the support of the company to launch if they’re ever needed.

This work is supported by the National Endowment for the Humanities.

NEH logo

Data Lifeboat Update 2a: Deeper research into the challenge of archiving social media objects

By Jenn Phillips-Bacher

For all of us at the Flickr Foundation, the idea of Flickr as an archive in waiting inspires our core purpose. We believe the billions of photos that have amassed on Flickr in the last 20 years have the potential to be the material of future historical research. With so much of our everyday lives being captured digitally and posted to public platforms, we – both the Flickr Foundation and the wider cultural heritage community – have begun figuring out how to proactively gather, make available, and preserve digital images and their metadata for the long term.

In this blog post, I’m setting my sights beyond technology to consider the institutional and social aspects that enable the collection of digital photography from online platforms.

It’s made of people

Our Data Lifeboat project is now underway. Its goal is to build a mechanism that makes it possible to assemble and decentralize slivers of Flickr photos for potential future users. (You can read project update 1 and project update 2 for the background.) The outcome of the first project phase will be one or more prototypes we will show to our Flickr Commons partners for feedback. We’re already looking ahead to the second phase, where we will work with cultural heritage institutions across the wider Flickr Commons network to make sure that anything we put into production best suits their real-world needs.

We’ve been considering multiple possible use cases for creating, and importantly, docking a Data Lifeboat in a safe place. The two primary institutional use cases we see are:

  1. Cultural heritage institutions want to proactively collect born digital photography on topics relevant to their collections
  2. In an emergency situation, cultural heritage institutions (and maybe other Flickr members) want to save what they can from a sinking online platform – either photos they’ve uploaded themselves, or whatever else they can generously rescue. (And let me be clear: Flickr.com is thriving! But it’s better to design for a worst-case scenario than to find ourselves scrambling for a solution with no time to spare.)

We are working towards our Flickr Commons members (and other interested institutions) being able to accept Data Lifeboats as archival materials. For this to succeed, “dock” institutions will need to:

  • Be able to accept a Data Lifeboat, and have the technology to use it
  • Already have a view on collecting born digital photography, ideally with this type of media included in their collection development strategy. (This is probably the more important of the two.)

This isn’t just a technology problem. It’s a problem made of everything else the technology is made of: people who work in cultural heritage institutions, their policies, organizational strategies, legal obligations, funding, commitment to maintenance, the willing consent of people who post their photos to online platforms and lots more.

Preserving born digital photos from the web requires the enthusiastic backing of institutions – which are fundamentally social creatures – to do what they’re designed to do: save, and ensure access to, the raw material of future research.

Collecting social photography

I’ve been doing some background research to inform the early stages of Data Lifeboat development. I came across the 2020 Collecting Social Photography (CoSoPho) research project, which set out to understand how photography is used on social media, in order to develop methods for collecting it and transmitting it to future generations. Their report, ‘Connect to Collect: approaches to collecting social digital photography in museums and archives’, is freely available as a PDF.

The project collaborators were:

  • The Nordic Museum / Nordiska Museet
  • Stockholm County Museum / Stockholms Läns Museum
  • Aalborg City Archives / Aalborg Stadsarkiv
  • The Finnish Museum of Photography / Finland’s Fotografiska Museum
  • Department of Social Anthropology, Stockholm University

The CoSoPho project was a response to the current state of digital social photography and its collection/acquisition – or lack thereof – by museums and archives.

Implicit in the team’s research is the assumption that digital photography from online platforms is worth collecting. Three big questions were centered in their research:

  1. How can data collection policies and practices be adapted to create relevant and accessible collections of social digital photography?
  2. How can digital archives, collection databases and interfaces be relevantly adapted – considering the character of the social digital photograph and digital context – to serve different stakeholders and end users?
  3. How can museums and archives change their role when collecting and disseminating, to increase user influence in the whole life cycle of the vernacular photographic cultural heritage?

There’s a lot in this report that is relevant to the Data Lifeboat project. The team’s research focussed on ‘digital social photography’, taken to mean any born digital photos that are taken for the purpose of sharing on social media. It interrogates Flickr alongside Snapchat, Facebook, and Instagram, as well as region-specific social media sites like IRC-Galleria (a very early-2000s Finnish social media platform).

I would consider Flickr a bit different from the other apps mentioned, if only because the report doesn’t address Flickr-specific use cases such as:

  • Showcasing photography as craft
  • Serving as a public photo repository or image library, where photos can be downloaded and re-used outside of Flickr – unlike walled-garden apps like Instagram or Snapchat.

The ‘massification’ of images

The CoSoPho project highlighted the challenges of collecting today’s digital photos while simultaneously digitizing analog images from the past, something cultural heritage institutions have been actively doing for many years. Anna Dahlgren describes this as a “‘massification’ of images online”. The complexities of digital social photos – with their continually changing and growing dynamic connections – combined with the unstoppable growth of social platforms, pose particular challenges for libraries, archives and museums trying to collect and preserve them.

To collect digital photos requires a concerted effort to change the paradigm:

  • from static accumulation to dynamic connection
  • from hierarchical files to interlinked files
  • and from pre-selected quantities of documents to aggregation of unpredictably variable image and data objects.

Dahlgren argues that “…in order to collect and preserve digital cultural heritage, the infrastructure of memory institutions has to be decisively changed.”

The value of collecting and contributing

“Put bluntly, if images on Instagram, Facebook or any other open online platform should be collected by museums and archives what would the added value be? Or, put differently, if the images and texts appearing on these sites are already open and public, what is the role of the museum, or what is the added value of having the same contents and images available on a museum site?” (A. Dahlgren)

Those of us working in the cultural heritage sector can imagine many good responses to this question. At the Flickr Foundation, we look to our recent internet history and how many web platforms have been taken offline. Our digital lives are at risk of disappearing. Museums, libraries and archives have that long-term commitment to preservation. They are repositories of future knowledge, and expect to be there to provide access to it.

Cultural heritage institutions that choose to collect from social online spaces can forge a path for a multiplicity of voices within collections, moving beyond standardized metadata toward richer, more varied descriptions from the communities the photos are drawn from. There is significant potential to collect in collaboration with the publics the institution serves. This is a great opportunity to design a more inclusive ethics of care into collections.

But what about potential contributors whose photos are being considered for collection by institutions? What values might they apply to these collections?

CoSoPho uncovered useful insights about how people participating in community-driven collecting projects considered their own contributions. Contributors wanted to be selective about which of their photos would make it into a collection; this could be for aesthetic reasons (choosing the best, most representative photos) or concerns for their own or others’ anonymity. Explicit consent to include one’s photos in a future archive was a common theme – and one which we’re thinking deeply about.

Overall, people responded positively to the idea of cultural institutions collecting digital social photos – they too can be part of history! – and they also think it’s important that the community from which those photos are drawn has a say in what is collected and how it’s made available. Future user researchers at the Flickr Foundation might want to explore contributor sentiment even further.

What’s this got to do with Data Lifeboats?

As an intermediary between billions of Flickr photos and cultural heritage institutions, we need to create the possibilities for long-term preservation of this rich vein of digital history. These considerations will help us design a system that works for Flickr members as well as for museums and archives.

Adapting collection development practices

All signs point to cultural heritage institutions needing to prepare to take on born digital items. Many are already doing this as part of their acquisition strategies, but most often this born digital material comes entangled in a larger archival collection.

If institutions aren’t ready to proactively collect born digital material from the public web, this is a risk to the longevity of this type of knowledge. And if this isn’t a problem that currently matters to institutions, how can we convince them to save Flickr photos?

As we move into the next phase of the Data Lifeboat project, we want to find out:

  • Are Flickr Commons member institutions already collecting, or considering collecting, born digital material?
  • What kinds of barriers do they face?

Enabling consent and self-determination

CoSoPho’s research surfaced the critical importance of consent, ownership and self-determination in shaping how public contributors engage with their role in creating a new digital archive.

  • How do we address issues of consent when preserving photos that belong to creators?
  • How do we create a system that allows living contributors to have a say in what is preserved, and how it’s presented?
  • How do we design a system that enables the informed collection of a living archive?
  • Is there a form of donor agreement, or an opt-in, to encourage this ethics of care?

Getting choosy

With 50 billion Flickr photos, not all of them visible to the public or openly licensed, we are working from the assumption that the Data Lifeboat needs to enable selective collecting.

  • Are there acquisition practices and policies within Flickr Commons institutions that can inform how we enable users to choose what goes into a Data Lifeboat?
  • What policies for protecting data subjects in collections need to be observed?
  • Are there existing paradigms for public engagement for proactive, social collecting that the Data Lifeboat technology can enable?

Co-designing usable software

Cultural heritage institutions have massively complex technical environments with a wide variety of collection management systems, digital asset management systems and more. This complexity often means that institutions miss out on chances to integrate community-created content into their collections.

The CoSoPho research team developed a prototype for collecting digital social photography. That work attempted to address some of the significant tech challenges the Flickr Foundation is already considering:

  • Individual institutions need reliable, modern software that interfaces with their internal systems; few institutions have internal engineering capacity to design, build and maintain their own custom software
  • Current collection management systems don’t have a lot of room for community-driven metadata; this information is often wedged into local data fields
  • Collection management systems lack the ability to synchronize data with social media platforms (and vice versa) if the data changes, which makes it more difficult to use third-party platforms for community description and collecting projects (a rough sketch of what such a sync might involve follows below).
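
To illustrate that last gap, here’s a rough sketch of the simplest possible one-way sync: polling the platform for each photo’s current metadata and recording what changed. Everything here – the function name, the record shape, the hypothetical `fetch_remote` call – is an assumption made for illustration, not a description of any existing system.

```python
def sync_photo_metadata(local_records: dict, fetch_remote) -> dict:
    """One-way sync sketch: pull current metadata for each locally held
    photo and fold any remote changes into the local record.
    `fetch_remote(photo_id)` is a hypothetical call returning a dict of
    the platform's current metadata for that photo."""
    changes = {}
    for photo_id, local in local_records.items():
        remote = fetch_remote(photo_id)
        # Keep only the fields where the platform's copy now differs.
        diff = {k: v for k, v in remote.items() if local.get(k) != v}
        if diff:
            changes[photo_id] = diff
            local.update(diff)
    return changes
```

Even this toy version hints at the hard parts: deletions, conflicts between local edits and remote changes, and rate limits on the platform’s API.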

So there’s a huge opportunity for the Flickr Foundation to contribute software that works with this complexity to solve real challenges for institutions. Co-design – that is, a design process that draws on your professional expertise and institutional realities – is the way forward!

We need you!

We are working on the challenge of keeping Flickr photos visible for 100 years and we believe it’s essential that cultural heritage institutions are involved. Therefore, we want to make sure we’re building something that works for as many organizations as possible – both big and small – no matter where you are in your plans to collect born digital content from the web.

If you’re part of the Flickr Commons network already, we are planning two co-design workshops for Autumn 2024, one to be held in the US and the other likely to be in London. Keep your eyes peeled for Save-the-Date invitations, or let us know you’re interested, and we’ll be sure to keep you in the loop directly.

This work is supported by the National Endowment for the Humanities.

NEH logo