From Desiderata to READMEs: The case for a C.A.R.E.-full Data Lifeboat Pt. I

By Fattori McKenna

This is the first part of a two-part blog post in which we detail our thinking around ethics and the Data Lifeboat README function. In this post we reflect on the theoretical precursors and structural interventions that inform our approach. We specifically question how these dovetail with the dataset we are working with (i.e. images on Flickr.com) and the tool we’re developing, the Data Lifeboat. In Part 2 (forthcoming), we will detail the learnings from our ethics session at the Mellon co-design workshops and how we plan to embed these into the README feature.

Installation View of Smithsonian Photography Exhibition Art Section | Smithsonian Institution

Spencer Baird, the American naturalist and first curator of the Smithsonian Institution, instructed his collectors in ‘the field’ what to collect, how to describe it and how to preserve it until returning Eastwards, carts laden. His directions included:

Birds and mammalia larger than a rat should be skinned. For insects and bugs — the harder kinds may be put in liquor, but the vessels and bottles should not be very large. Fishes under six inches in length need not have the abdominal incision… Specimens with scales and fins perfect, should be selected and if convenient, stitched or pinned in bits of muslin to preserve the scales. Skulls of quadrupeds may be prepared by boiling in water for a few hours… A little potash or ley will facilitate the operation.

Baird’s 1848 General Directions for Collecting and Preserving Objects of Natural History is an example of a collecting guide, also known at the time as a desiderata (literally ‘desired things’). It is this archival architecture that Hannah Turner (2021) takes critical aim at in Cataloguing Culture: Legacies of Colonialism in Museum Documentation. According to Turner, Baird’s design “enabled collectors in the field and museum workers to slot objects into existing categories of knowledge”.

Whilst the desiderata prompted the diffuse and amateur spread of collecting in the 19th century, no doubt flooding burgeoning institutional collections with artefacts from the so-called ‘field’, the input and classification systems these collecting guides held came with their own risks. Baird’s 1848 desiderata shockingly includes human subjects—Indigenous people—perceived as extensions of the natural world and thus procurable materials in a concerted attempt to both Other and historicise. Later collecting guides would be issued for indigenous tribal artefacts, such as the Haíłzaqv-Haida Great Canoe – now in the American Museum of Natural History’s Northwest Coast Hall – as well as capturing intangible cultural artefacts – as documented in Kara Lewis’ study of the 1890 collection of Passamaquoddy wax recording cylinders used for tribal music and language. But Turner pivots our focus away from what has been collected, and instead towards how these objects were collected, explaining, “practices and technologies, embedded in catalogues, have ethical consequences”.

While many physical artefacts have been returned to Indigenous tribes through activist-turned-institutional measures (such as the repatriation of Iroquois Wampum belts from the National Museum of the American Indian or the Bååstede project returning Sami cultural heritage from Norway’s national museums), the logic of the collecting guides remains. Two centuries later, the nomenclature and classification systems from these collecting guides have been largely transposed into digital collection management systems (CMS), along with digital copies of the objects themselves. Despite noteworthy efforts to provide greater access and transparency through F.A.I.R. principles, or to rewrite and reclaim archival knowledge systems through Traditional Knowledge (T.K.) Labels and C.A.R.E. principles, Kara Lewis (2024) notes that “because these systems developed out of the classification structures before them, and regardless of how much more open and accessible they become, they continue to live with the colonial legacies ingrained within them”. The slowness of the Galleries, Libraries, Archives and Museums (G.L.A.M.) sector to adapt, Lewis continues, stems less from “an unwillingness to change, and more with budgets that do not prioritize CMS customizations”. Evidently a challenge lies in the rigidly programmed nature of rationalising cultural description for computational input.

In our own Content Mobility programme, the Data Lifeboat project, we propose that creators write a README. In our working prototype, the input is an open-text field, allowing creators to write as much or as little as they wish about their Data Lifeboat’s purpose, contents, and future intentions. However, considering Turner’s cautionary perspective, we face a modern parallel: today’s desiderata is data, and the field is the social web—deceptively public for users to browse and “Right-Click-Save” at will. We realised that in designing the input architecture for Data Lifeboats, we could inadvertently be creating a 21st century desiderata: a seemingly open and neutral digital collecting tool that beneath the surface risks perpetuating existing inequalities.

This blog post will introduce the theoretical and ethical underpinnings to the Data Lifeboat’s collecting guide, or README, that we want to design. The decades of remedial and reconciliatory work, driven tirelessly and primarily by Indigenous rights activists, to address the archival injustices first cemented by early collecting guides provide a robust starting point for embedding ethics into the Data Lifeboat. Indigenous cultural heritage inevitably exists within Flickr’s collections, particularly among our Flickr Commons members who are actively pursuing their own reconciliation initiatives. Yet the value of these interventions extends beyond Indigenous cultural heritage, serving as a foundation for ethical data practices that benefit all data subjects in the age of Big Data.

Untitled, Smithsonian Institution

A Brief History of C.A.R.E. Principles

Building on decades of Indigenous activism and scholarship in restitution and reconciliation, the C.A.R.E. principles emerged in 2018 from a robust lineage of interventions, such as the Native American Graves Protection and Repatriation Act (NAGPRA, 1990) and the United Nations Declaration on the Rights of Indigenous Peoples (UNDRIP, 2007), which sought to recognise and restore Indigenous sovereignty over tangible and intangible cultural heritage.

These earlier frameworks were primarily rooted in consultation processes with Indigenous communities, ensuring that their consent and governance shaped the management of artefacts and knowledge systems. For instance, NAGPRA enabled tribes to reclaim human remains and sacred objects through formalised dialogues and consultation sessions with museums. Similarly, Traditional Knowledge Labels (Local Contexts Initiative) were designed to identify Indigenous protocols for accessing and using knowledge within the museum’s existing collection; for instance, a tribal object may be reserved for viewing only by female tribal members. These methods worked effectively within the domain of physical collections but faltered when confronted with the scale and opaqueness of data in the digital age.

In this context, Indigenous governance of data emerged as essential, particularly for sensitive datasets such as health records, where documented misuse showed evidence of perpetuating harm. As the Data Science field developed, it often prioritised the technical ideals of F.A.I.R. principles (Findable, Accessible, Interoperable, Reusable), which advocate for improved usability and discoverability of data, to counter increasingly oblique and privatised resources. Though valuable, F.A.I.R. principles fell short on the ethical dimensions of data, particularly on how data is collected and used in ways that affect already at-risk communities (see also O’Neil 2016, Eubanks 2018, and Benjamin 2019). As the Global Indigenous Data Alliance argued:

“Mainstream values related to research and data are often inconsistent with Indigenous cultures and collective rights”

Recognising the challenges posed by Big Data and Machine Learning (ML)—from entrenched bias in data to the opacity of ML algorithms—Indigenous groups such as the Te Mana Raraunga Māori Data Sovereignty Network, the US Indigenous Data Sovereignty Network, and the Maiam nayri Wingara Aboriginal and Torres Strait Islander Data Sovereignty Collective led efforts to articulate frameworks for ethical data governance. These efforts culminated in a global, inter-tribal workshop in Gaborone, Botswana, in 2018, convened by Stephanie Russo Carroll and Maui Hudson in collaboration with the Research Data Alliance (RDA) International Indigenous Data Sovereignty Interest Group. The workshop formalised the C.A.R.E. principles, which were published by the Global Indigenous Data Alliance in September 2019 and proposed as a governance framework with people and purpose at its core.

The C.A.R.E. principles foreground the following four values around data:

Collective Benefit: Data must enhance collective well-being and serve the communities to which it pertains.
Authority to Control: Communities must retain governance over their data and decide how it is accessed, used, and shared.
Responsibility: Data handlers must minimise harm and ensure alignment with community values.
Ethics: Ethical considerations rooted in cultural values and collective rights must guide all stages of the data lifecycle.

Untitled, Smithsonian Institution

C.A.R.E. in Data Lifeboats?

While the C.A.R.E. principles were initially developed to address historical data inequities and exploitation faced by Indigenous communities, they offer a framework that can benefit all data practices: as the Global Indigenous Data Alliance argues, “Being CARE-Full is a prerequisite for equitable data and data practices.”

We believe the principles are important for Data Lifeboat, as collecting networked images from Flickr poses the following complexities:

Data Lifeboat creators will be able to include images from Flickr Commons members (which may include images of culturally sensitive content)
Data Lifeboat creators may be able to include images from other Flickr members, besides themselves
Subjects of photographs in a Data Lifeboat may be from historically at-risk groups
Data Lifeboats are designed to last and therefore may be separated from their original owners, intents and contexts.

As the Global Indigenous Data Alliance asserts, their principles must guide every stage of data governance “from collection to curation, from access to application, with implications and responsibilities for a broad range of entities from funders to data users.” The creation of a Data Lifeboat is the creation of a new collection, and thus an opportunity to embed C.A.R.E. principles from the start. Although we cannot control how Data Lifeboats will be used or handled after their creation, we can attempt to establish an architecture that encourages C.A.R.E. to be deployed throughout the data lifecycle.

Untitled, Smithsonian Institution

Enter: The README

Our ambition for the Data Lifeboat (and the ethos behind many of Flickr.org’s programmes) is the principle of “conscious collecting”. We aim to move away from the mindset of perpetual accumulation that plagues both museums and Big Tech alike—a mindset that advances a dangerous future, as cautioned by both anti-colonialist and environmentalist critiques. Conscious collecting allows us to better consider and care for what we already have.

One of the possible ways we can embed conscious collecting is through the inclusion of a README—a reflective, narrative-driven process for creating a Data Lifeboat.

READMEs are files traditionally used in software development and distribution that contain information about the other files within a directory. They are often written in plain text (.txt, .md) to maximise readability, and frequently contain operating instructions, troubleshooting, credits, licensing and changelogs, intended to be read on start-up. In the Data Lifeboat, we have adopted this container to supplement the files. Data Lifeboat creators are introduced to the README in the creation process and, in the present prototype, are met with the following prompts to assist writing:

Tell the future why you are making this Data Lifeboat.
Is there anything special you’d like future viewers to know about the contents? Anything to be careful about?

(These prompts are not fixed, as you’ll read in Part 2)
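As a rough illustration of how such prompts might travel with the files, here is a minimal Python sketch that pairs each prompt with the creator's free-text answer and renders plain-text README content. It is not the Data Lifeboat implementation: `PROMPTS`, `render_readme` and the "left blank" placeholder are hypothetical names and choices made for this example only.

```python
# Hypothetical sketch: these names are illustrative and are not taken
# from the Data Lifeboat prototype.

PROMPTS = [
    "Tell the future why you are making this Data Lifeboat.",
    "Is there anything special you’d like future viewers to know about "
    "the contents? Anything to be careful about?",
]


def render_readme(title: str, answers: dict[str, str]) -> str:
    """Return plain-text README content: each prompt, then the creator's answer."""
    lines = [title, ""]
    for prompt in PROMPTS:
        answer = answers.get(prompt, "").strip()
        lines.append(prompt)
        # Creators may write as much or as little as they wish, so an
        # unanswered prompt is recorded rather than silently dropped.
        lines.append(answer if answer else "(left blank by the creator)")
        lines.append("")
    return "\n".join(lines)


example = render_readme(
    "README for an example Data Lifeboat",
    {PROMPTS[0]: "To keep these street photographs together with their stories."},
)
```

Keeping the prompts in a plain list reflects the point that they are not fixed: swapping or adding prompts would not change how the README is assembled.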

During our workshops, participants noted the positive (and rarely seen) experience of introducing friction to data preservation. This friction slows down the act of collecting and creates space to engage with the social and ethical dimensions of the content. As Christen & Anderson (2019) emphasise in their call for Slow Archives, “Slowing down creates a necessary space for emphasising how knowledge is produced, circulated, and exchanged through a series of relationships”. We hope that Data Lifeboat’s README will contribute to Christen & Anderson’s invocation for the “development of new methodologies that move toward archival justice that is reparative, reflective, accountable, and restorative”.

We propose three primary functions of the README in a Data Lifeboat:

Telling the Story of an Archive

Boast, Bravo, and Srinivasan (2007), reflecting on Inuit artefacts in an institutional museum collection, write that their transplantation deprives them of the “richly situated life of objects in their communities and places of origin.” Once subsumed into a collection, artefacts often suffer the “loss of narrative and thick descriptions when transporting them to distant collections”.

We are conscious that this could be the fate of many images once transplanted in a Data Lifeboat. Questions emerged in our workshops as to how to maintain the contextual world around the object, speaking of not only its social metadata (comments, tags, groups, albums) but also the more personal levers of choice, value and connection. A README resists the diminishment of narrative by creating opportunities to retain and reflect on the relational life of the materials.

The README directly resists the archival instinct toward neutrality: by its very format it holds that neutrality can never be achieved. Boden critiques the paucity of current content management systems, whose highly structured input formats cannot meet our responsibilities to communities because they do not give space to fully citing how information came to be known and associated with an object, and on whose authority. Boden argues for “reflections on the knowledge production process”, which is what we intend the README to encourage the Data Lifeboat creator to do. The README prompts could suggest that Data Lifeboat creators reflect on issues around ownership (e.g. is this your photo?), consent (e.g. were all photo subjects able to consent to inclusion in a Data Lifeboat?), and embedded power relations (e.g. are there any persecuted minorities in this Data Lifeboat?), acknowledging that the archive is never objective.

More poetically, the README could prompt greater storytelling, serving as a canvas for both critical and emotional reflection on the content of a Data Lifeboat. Through guided prompts, creators could explore their personal connections to the images, share the stories behind their selection process, and document the emotional resonance of their collection. A README allows creators not only to capture and contextualise the images themselves, but also to add layers of personal inscription and meaning, creating a richer, more distributed archive.

Decentralised and Distributed Annotation

The Data Lifeboat constitutes a new collecting format that intends to operate outside traditional archival systems’ rigid limitations and universalising classification schemes. The README encourages decentralised curation and annotation by enabling communities to directly contribute to selecting and contextualising archival and contemporary images, fostering what Huvila (2008) terms the ‘participatory archive’ [more on Data Lifeboat as a tool for decentralised curation here].

User-generated descriptions such as comments, tags, groups, and galleries—known on Flickr as ‘social metadata’—serve as “ontological keys that unlock the doors to diverse, rich, and incommensurable knowledge communities” (Boast et al., 2007), embracing multiple ways of knowing the world. Together, these create ‘folksonomies’—socially-generated digital classification systems that David Sturz argues are particularly well-suited to “loosely-defined, developing fields,” such as photo subjects and themes often overlooked by the institutional canon. The Data Lifeboat captures the rich social media that is native to Flickr, preserving decades’ worth of user contributions.

The success of community annotation projects has been well-documented. The Library of Congress’s own Flickr Pilot Project demonstrated how community input enhanced detail, correction, and enrichment. As Michelle Springer et al. (2008) note, “many of our old photos came to us with very little description and that additional description would be appreciated”. Within nine months of joining Flickr, committing to a hands-off approach, the Library of Congress accumulated 67,000 community-added tags. “The wealth of interaction and engagement that has taken place within the comments section has resulted in immediate benefits both for the Library and users of the collections,” continue Springer et al. After staff verification, these corrections and additions to captions and titles demonstrated how decentralised annotation could reshape the central archive itself. As Laura Farley (2014) observes, community annotation “challenges archivists to see their collections not as closely guarded property of the repository, but instead as records belonging to a society of users”.

Beyond capturing existing metadata, the README enables Data Lifeboat creators to add free-form context, such as correcting erroneous tags or clarifying specific terminology that future viewers might misinterpret—like the Portals to Hell group. As Duff and Harris (2002) write, “the power to describe is the power to make and remake records and to determine how they will be used and remade in the future. Each story we tell about our records, each description we compile, changes the meaning of records and recreates them” — the README hands over the narrative power to describe.

Data Restitution and Justice

Thinking speculatively, the README could serve an even more transformative purpose as a tool for digital restitution. Through the Data Lifeboat framework, communities could reclaim contested archival materials and reintegrate them into their own digital ecosystems. This approach aligns with “Steal It Back” (Rivera-Carlisle, 2023) initiatives such as Looty, which creates digital twins of contested artefacts currently held in Western museums. By leveraging digital technologies, these initiatives counter the slow response of GLAM institutions to restitution calls. As Pavis and Wallace (2023) note, digital restitution offers the chance to “reverse existing power hierarchies and restore power with the peoples, communities, and countries of origin”. In essence, this offers a form of “platform exit” that carves out an alternative avenue for returning control of content to original creators or communities, regardless of who initially uploaded the materials. In an age of encroaching data extractivism, the power to disengage, though severe, can be for at-risk communities the “reassertion of autonomy and agency in the face of pervasive connectivity” (Kaun and Treré, 2021).

It is a well-documented challenge in digital archives that many of the original uploaders were not the original creators, which ought to prompt reflections around copyright and privacy. As Payal Arora (2019) has noted, our dominant frameworks largely ignore empirical realities of the Global South: “We need to open our purview to alternative meanings including paying heed to the desire for selective visibility, how privacy is often not a choice, and how the cost of privacy is deeply subjective”. Within the README, Data Lifeboat creators can establish terms for their collections, specifying viewing contexts, usage conditions, and other critical contextual information. They can also specify restrictions on where and how their images may be hosted or reused in the future (e.g. ‘I refuse to let these images be used in AI training data sets’). A README could allow Data Lifeboat creators to expand on and detail more fluid, culturally specific and context-specific conditions for privacy and re-use.

At best, these terms would allow Data Lifeboat creators to articulate their preferences for how their materials are accessed, interpreted and reused in the future, functioning as an ethical safeguard. While these terms may not always be enforceable, they provide a clear record of the creators’ intentions. Looking ahead, we could envision the possibility of making these terms machine-readable and executable. The sustenance of these terms could potentially be incorporated into the governance framework of the Safe Harbor Network, our proposed decentralised storage system of cultural institutions that can hold Data Lifeboats for the long-term.
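To make the idea of machine-readable terms a little more concrete, the following Python sketch keeps the creator's statement verbatim and adds a small list of refused uses that software could check. Everything here is an assumption for illustration: `ReadmeTerms`, `is_refused` and the `"ai-training"` label are hypothetical, and no such schema is defined by the Data Lifeboat or the Safe Harbor Network.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ReadmeTerms:
    """Hypothetical machine-readable companion to a README's free-text terms."""

    statement: str  # the creator's own words, kept verbatim
    refused_uses: list[str] = field(default_factory=list)  # e.g. "ai-training"


def is_refused(terms: ReadmeTerms, use: str) -> bool:
    """Return True if the creator has explicitly refused this kind of use."""
    refused = {label.strip().lower() for label in terms.refused_uses}
    return use.strip().lower() in refused


def to_json(terms: ReadmeTerms) -> str:
    """Serialise the terms so they could be stored alongside the README."""
    return json.dumps(asdict(terms), ensure_ascii=False, indent=2)


def from_json(text: str) -> ReadmeTerms:
    """Load terms back from their JSON form."""
    return ReadmeTerms(**json.loads(text))


terms = ReadmeTerms(
    statement="I refuse to let these images be used in AI training data sets.",
    refused_uses=["ai-training"],
)
```

A check such as `is_refused(terms, "ai-training")` only records and reports the creator's intention; as noted above, whether that intention is honoured remains a matter of governance rather than code.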

Untitled, Smithsonian Institution

Discussion: README as a Datasheet for Networked Social Photography Data Sets?

In the long history of cataloging and annotating data, Timnit Gebru et al.’s (2018) Datasheets for Datasets stands out as an emerging best practice for the machine learning age. These datasheets provide “a structured approach to the description of datasets,” documenting provenance, purpose, and ethical considerations. By encouraging creators to critically reflect on the collection, composition, and application of datasets, datasheets foster transparency and accountability in an otherwise vast, opaque, and voraciously consuming sphere.

The Digital Cultural Heritage space has made similar calls for datasheets in archival contexts, as they too handle large volumes of often uncontextualised and culturally sensitive data. As Alkemade et al. (2023) note, cultural heritage data is unique: “They are extremely diverse by nature, biased by definition and hardly ever created or collected with computation in mind”. They argue, “In contrast to industrial or research datasets that are assembled to create knowledge… cultural heritage datasets may present knowledge as it was fabricated in earlier times, or community-based knowledge from lost local contexts”. Given this uniqueness, digital cultural heritage requires a tailored datasheet format that enables rich, detailed contextualization reflecting both the passage of time and potentially lost or inaccessible meanings. Just as datasheets have transformed technical datasets, the README has the potential to reshape how we collect, interpret, and preserve the networked social photography that is native to the Flickr.com platform — something we argue is part of our collective digital heritage.

There are, of course, limitations—neither datasheets nor READMEs will be a panacea for C.A.R.E.-full data practices. Gebru et al. acknowledge that “Dataset creators cannot anticipate every possible use of a database”. The descriptive approach also presents possible trade-offs: “identifying unwanted societal biases often requires additional labels indicating demographic information about individuals,” which may conflict with privacy or data protection. Gebru et al. note that the Datasheet “will necessarily impose overhead on dataset creator”—we recognise this friction as a positive, echoing Christen and Anderson’s call: “Slowing down is about focusing differently, listening carefully, and acting ethically”.

Marshall Islands Navigation Chart | Smithsonian Institution

Conclusion

Our hope is that the README is both a reflective and instructive tool that prompts Data Lifeboat Creators to consider the needs and wishes of each of the four main user groups in the Data Lifeboat ecosystem:

Flickr Members
Data Lifeboat Creators
Safe Harbor Dock Operators
Subjects in the Photo

While we do not yet know precisely what form the README will take, we hope our iterative design process can offer flexibility to accommodate the needs of—and our responsibilities to—Data Lifeboat creators, photographic subjects and communities, and future viewers.

In our Mellon-funded Data Lifeboat workshops in October and November, we asked our participants to support us in co-designing a digital collecting tool with care in mind. We asked:

What prompts or questions for Data Lifeboat creators could we include in the README to help them think about C.A.R.E. or F.A.I.R. principles? Try to map each question to a letter.

The results of this exercise and what this means for Data Lifeboat development will be detailed in Part 2.

The photographs in this blog post come from the Smithsonian Institution’s Thomas Smillie Collection (Record Unit 95) – Thomas Smillie served as the first official photographer for the Smithsonian Institution from 1870 until his death in 1917. As head of the photography lab as well as its curator, he was responsible for photographing all of the exhibits, objects, and expeditions, leaving an informal record of early Smithsonian collections.

Bibliography

Alkemade, Henk, et al. “Datasheets for Digital Cultural Heritage Datasets.” Journal of Open Humanities Data, vol. 9, 2023, doi:10.5334/johd.124.

Arora, Payal. “Decolonizing Privacy Studies.” Television & New Media, vol. 20, no. 4, 26 Oct. 2018, pp. 366–378, doi:10.1177/1527476418806092.

Baird, Spencer. “General Directions for Collecting and Preserving Objects of Natural History”, c. 1848, Dickinson College Archives & Special Collections

Benjamin, Ruha. Race After Technology: Abolitionist Tools for the New Jim Code. Polity, 2019.

Boast, Robin, et al. “Return to Babel: Emergent Diversity, Digital Resources, and Local Knowledge.” The Information Society, vol. 23, no. 5, 27 Sept. 2007, pp. 395–403, doi:10.1080/01972240701575635.

Boden, Gertrud. “Whose Information? What Knowledge? Collaborative Work and a Plea for Referenced Collection Databases.” Collections: A Journal for Museum and Archives Professionals, vol. 18, no. 4, 12 Oct. 2022, pp. 479–505, doi:10.1177/15501906221130534.

Carroll, Stephanie Russo, et al. “The CARE Principles for Indigenous Data Governance.” Data Science Journal, vol. 19, 2020, doi:10.5334/dsj-2020-043.

Christen, Kimberly, and Jane Anderson. “Toward Slow Archives.” Archival Science, vol. 19, no. 2, 1 June 2019, pp. 87–116, doi:10.1007/s10502-019-09307-x.

Treré, Emiliano, and Anne Kaun. “Digital Media Activism: A Situated, Historical, and Ecological Approach Beyond the Technological Sublime.” Digital Roots, De Gruyter Oldenbourg, 2021.

Duff, Wendy M., and Verne Harris. “Stories and Names: Archival Description as Narrating Records and Constructing Meanings.” Archival Science, vol. 2, no. 3–4, Sept. 2002, pp. 263–285, doi:10.1007/bf02435625.

Eubanks, Virginia. Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. Picador, 2018.

Gebru, Timnit, et al. “Datasheets for Datasets.” Communications of the ACM, vol. 64, no. 12, 19 Nov. 2021, pp. 86–92, doi:10.1145/3458723.

Griffiths, Kalinda E et al. “Indigenous and Tribal Peoples Data Governance in Health Research: A Systematic Review.” International journal of environmental research and public health vol. 18,19 10318. 30 Sep. 2021, doi:10.3390/ijerph181910318

Lewis, Kara. “Toward Centering Indigenous Knowledge in Museum Collections Management Systems.” Collections: A Journal for Museum and Archives Professionals, vol. 20, no. 1, Mar. 2024, pp. 27–50, doi:10.1177/15501906241234046.

O’Neil, Cathy. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Penguin, 2017.

Rivera-Carlisle, Joanna. “Contextualising the Contested: XR as Experimental Museology.” Herança, vol. 6, no. 1, 2023, doi:10.52152/heranca.v6i1.676.

Pavis, Mathilde, and Andrea Wallace. “Recommendations on Digital Restitution and Intellectual Property Restitution.” SSRN Electronic Journal, 2023, doi:10.2139/ssrn.4323678.

Schaefer, Sibyl. “Energy, Digital Preservation, and the Climate: Proactively Planning for an Uncertain Future.” iPRES 2024 Papers – International Conference on Digital Preservation. 2024.

Shilton, Katie, and Ramesh Srinivasan. “Participatory Appraisal and Arrangement for Multicultural Archival Collections.” Archivaria, vol. 63, Spring 2007.

Springer, Michelle et al. “For the Common Good: The Library of Congress Flickr Pilot Project”. Library of Congress Collections, 2008.

Sturz, David N. “Communal Categorization: The Folksonomy”, INFO622: Content Representation, 2004.

Turner, Hannah. Cataloguing Culture: Legacies of Colonialism in Museum Documentation. University of British Columbia Press, 2022.

A Phoenix in Paris: Data Lifeboats for Citizen-Driven Histories

By Fattori McKenna & George Oates

This blog post discusses the value of social media photography in enhancing our understanding and emotional vocabulary around historic events. It makes the case for a Data Lifeboat as an effective collecting tool for these types of citizen-made galleries (and histories). It also recounts the recommendations for other Data Lifeboat themes collated during the Mellon co-design workshops.

On Saturday, December 7th 2024, Notre Dame Cathedral reopened its iron-clad tracery doors, marking the end of a closure of more than five years. The news coverage focused on the splendour — and occasional controversy — of its distinguished guests, contentious seating plans and retro light shows. The reopening inevitably brought back memories of the 2019 tragedy that befell the cathedral, destroyed by fire. Somehow the event underscored our collective helplessness under the Covid-19 lockdowns as viewers could only watch in horror as the same images spread around news and social media. On reflection, the ubiquity and uniformity of the images is surprising, so often captured from the southeast end of the chancel: the flèche engulfed in flames like a tapered candle, and behind it, through the iridescent smoke, a pair of lancet windows seemed to peer back at the viewer, embodying a vision of Notre Dame des Larmes—Our Lady in tears.

A selection of newspapers displaying the Notre Dame fire 2019

There is an alternative history of the Notre Dame fire, captured not through mainstream media but in the dispersed archives of networked social photography—an often overlooked and underreported lens on the event. Among the pictures gathered by the Flickr Foundation is a remarkable collection of street photographs that offer a fresh perspective (also shared in this post). These images place the fire in context: smoke billows against the backdrop of the Eiffel Tower as seen from Trocadéro; a woman in a fur coat strolls nonchalantly past the boarded-up bouquinistes; a couple steal a glance from their moped; a man abandons his bicycle, supplicating on the banks of the Seine. User-generated social photography expands the event from its mass-reproduced, singular, fixed perspective, into a multi-dimensional, multi-vocal narrative, that unfolds longitudinally over time.

This is the incantation of social photography at its best, so often dismissed for its sheer volume, producing images that are “unoriginal, obvious, boring.” Yet, as art historian Geoffrey Batchen counters, “There are no such things as banal photographs, only banal accounts of photography.” The true value lies not just in the images themselves, but in how we look at them. It is through this act of curation, contextualisation and interpretation that these photographs gain their depth.

A People’s History through the Lens

Embedded within the story of photography itself is a people’s history. From its inception, photography has centred the social subject, capturing the overlooked and hidden realities that traditional media refused to. In mid-19th century Paris, early photographers such as Eugène Atget and Félix Nadar chronicled the changing urban landscapes, preserving scenes of working-class neighbourhoods subject to Haussmann’s destruction, proletarian characters and everyday life. The camera’s portability, speed and (perceived) candidness made it suitable to the task of documenting the unseen.

The social subject in photography has long been intended to elicit sentiment and action. Social photographs compel viewers to respond emotionally and, ideally, to take action. Jacob Riis’s How the Other Half Lives (1890), aimed at middle-class audiences, used images of New York’s Lower East Side slums to generate empathy and drive charity. As Walter Benjamin later described it, the medium of photography has a “revolutionary use-value” in its ability to render visible hidden social structures. Contemporary documentary collections, such as Kris Graves’s A Bleak Reality, have harnessed this historic compulsion to catalyse social change.

There is, rightly, caution about social photography’s treatment of its subjects. As Maren Stange argues in her analysis of American documentary photographers including Riis (along with Hine and the famed Farm Security Administration collection), social photography has historically rested on assumptions about its subjects, instrumentalising them or reducing them to symbolic devices. Moreover, it often fails to acknowledge that the photographer inherently constructs the photograph’s reality. As David Peeler notes, each photograph holds “layers of accreted meanings” shaped by the photographer’s choice in composition, processing, and presentation. In the age of citizen-driven photography, with the distributed ability to manipulate images, these limitations become even more pronounced, requiring an explicit recognition of the constructed nature of the medium.

The construction of this fragment of reality in photography, however, is not inherently negative. When acknowledged, it can be a source of power, offering what Donna Haraway describes as situated knowledge: a rejection of objectivity in favor of partial, contextual perspectives. Citizen-driven collections, though subjective and, by their very nature, incomplete, serve as an antidote to what Haraway calls the “conquering gaze from nowhere.” They counter dominant, institutional narratives with fragmented, personal views.

The Photograph Over Time

The value of a photograph often increases over time, as its meaning and significance evolve in relation to historical, cultural, and social contexts. Photographs gain depth through longitudinal perspectives, becoming more compelling at a distance as they reveal shifts, absences, and forgotten details. As Silke Helmerdig discusses in Fragments, Futures, Absence and the Past, in her treatment of German photography of the 2010s, we ought to see photography as speaking to us in the future subjunctive: it asks us, “what could be, if?”

With time and attention, photographs can be viewed in aggregate, and the future historian can pull from concurrent sources. Our contemporary photographic collecting tools, as in the case of Flickr’s Galleries and Albums, which allow curation of other people’s photographs, can come to resemble a sort of photomontage. Rosalind Krauss, writing on the photomontages of Hannah Höch and other Dadaists in The Optical Unconscious, argues that the medium forces a dialogue between images, creating unexpected connections and breaking the linearity of traditional visual narratives, thus opening space for political critique. The Notre Dame gallery disrupts the throughline of the ‘official’ imagery of the event, creating a space for discourse of other elements besides the central action (e.g. gender, capitalism, urbanism).

Securing the People’s Archive

Having discussed the value of the citizen-made collection, we are compelled to ask: what if our institutional archives began collecting more contemporaneously? We believe Data Lifeboat can help with this. The Notre Dame Gallery is just one example of a potential collection for a Data Lifeboat: our tool for long-term preservation of networked social photography and citizen-made collections from Flickr.com.

Data Lifeboats could be deployed as a curatorial tool for networked social photography, providing institutions with a way to collect, catalogue and reflect on citizen-driven narratives. At present, there is not an archival standard for images on social media and archivists still struggle with the vastness and maintenance of those they’ve managed to collect [see our blog-post from iPres]. Data Lifeboat thus operates as a sort of packaging tool, flexible and light enough to adapt to collections of differing scales and purviews, but still maintaining the social context that makes networked images so valuable.

There are two potential approaches:

Hands-on: Data Lifeboats could be commissioned by an institution around a certain topic. For example, the Museum of London could commission a group of Whitechapel teenagers to collect photos from Flickr.com of their neighbourhood spaces that are meaningful to them.
Hands-off: Citizens create Data Lifeboats independently, on a topic of their choosing. Institutions may choose to hold these bounded social media archives as a public good, for the benefit of our collective digital heritage.

In both cases, the institutions become holders of Data Lifeboats, which are subsumed into their digital collections management systems. Data Lifeboats become part of a process of Participatory Appraisal, extending and diversifying the ‘official archive’, addressing the persistent gap of who gets to be represented. As we have discussed elsewhere, there are also possibilities for distributing the load of Data Lifeboats; more on this in the Safe Harbor Network.

Other possible Data Lifeboats

During our Mellon-funded workshops, we asked participants to suggest Data Lifeboats they would like to see in their institutional collections, but also any they would create themselves for personal use.

At-risk Subjects

Collections focus on documenting vulnerable or ephemeral content that might disappear without active intervention. This includes both environmental changes and socio-political documentation that could be censored or lost.

e.g. glaciers over time, rapid response after a disaster, disappearing rural life across Europe, politically at-risk accounts

Subjects often overlooked

Collections that aim to preserve marginalised voices and underrepresented perspectives, helping to fill gaps in traditional institutional archives and ensure a more representative historical record.

e.g. a queer community coffee shop, Black astronauts, local street art, life in Communist Poland

Nostalgia for Web 1.0

As so much of Web 1.0 disappears (e.g. Geocities, MySpace music, see also ‘Digital Dark Age’), there is a desire to archive and begin critically reflecting on the early days of the web.

e.g. self-portraits from the early 2000s, vernacular photography from the 2010s, Flickr HQ, most viewed Flickr photos

Quirky Collections

Flickr is renowned as a home for serendipitous discovery on the web, sometimes lauded as a ‘digital shoebox of photographs’. There is an opportunity to replicate this ‘quirkiness’ with Data Lifeboats.

e.g. ghost signs, every Po’Boy in town, electricity pylons of the world

Personal collections

e.g. family archives, 365 challenges, a group of friends

Data Lifeboats could serve as secure containers for digital family heirlooms. Built into Flickr.com are privacy controls (Friends, Family) that would carry over to Data Lifeboats, preserving privacy for the long term.

Conclusion

The Notre Dame gallery exemplifies an ideal subject for a Data Lifeboat, both in its content and curatorial approach. The Data Lifeboat framework serves as an apt vessel, with its built-in capabilities:

Data Lifeboats can capture alternative viewpoints, situated knowledges and stories from below through tapping into the vast Flickr archive. We recognise that we can never capture, nor preserve, the archive in its entirety, so Data Lifeboats tap into the logic of the archival sliver.
Data Lifeboats can preserve citizen-driven communication through their unique storage of social metadata. This means that the conversations around the images are preserved with the images themselves, creating a holistic entity.
Data Lifeboats are purposely designed with posterity in mind. Technologically, their light-touch design means they are built to last. Furthermore, the README nudges the Data Lifeboat creator toward conscious curation and commentary, providing value to future historians.

Can you think of any other Data Lifeboats? We’d love to hear about them.

As you may know, we had planned to run two workshops:

Washington DC, at The Library of Congress, in October, and
London, at the Garden Museum and Autograph Gallery, in November.

We were pleased to welcome a total of 32 people across the events, from libraries, archives, academic institutions, the freelance world, other like-minded nonprofits, Flickr.com, and Flickr.org.

Now we are doing the work of sifting through the bazillion post-its and absorbing the great conversations had as we worked through Tori’s fantastic program for the event. We were all very well-fed and organized too, thanks to Ewa’s superb project management. Thank you both.

Workshop aims

The aims of each workshop were the same:

Articulate the value of archiving social media, and Data Lifeboat
Detail where Data Lifeboat fits in current ecology of tools and practices
Detail where Data Lifeboat fits with curatorial approaches and content delivery
Plot (and recognise) the type and amount of work it would take to establish Data Lifeboat or similar in organisations

Workshop outline

We met these aims by lining up the workshops into different sessions:

Foundations of Long-Term Digital Preservation – Backward/forward horizons; understanding digital infrastructures; work happening in long-term digital preservation
Data Lifeboat: What we’re thinking so far – Reporting on our NEH work to prototype software and policy, including a live demo; positioning a Data Lifeboat in emergency/not-emergency scenarios; curation needs or desires to use Data Lifeboats as selection/acquisition tool
Consent and Care in Social Media Archiving – Ethics of care in digital archives; social context and care vs extractive data practices; mapping ethical rights, risks, responsibilities including copyright and data protection, and consent, and
Characteristics of a Robust & Responsible Safe Harbor Network (our planned extension of the Data Lifeboat concept – think LOCKSS-ish) – The long history of safe harbor networks; logistics of such a network; Trust.

I’m not going to report on these now, but rather whet your appetite for our further reporting back.

Background readings

Tori also prepared some grounding readings for the event, which we thought others may like to review:

The Archival Sliver: A Perspective on the Construction of Social Memory in Archives and the Transition from Apartheid to Democracy by Verne Harris
Energy, Digital Preservation, and the Climate: Proactively Planning for an Uncertain Future by Sibyl Schaefer
The Image As Witness: Collecting Visual Materials from the National Tragedy by Jeremy Adamson
Digital Preservation: A Critical Vocabulary, Chapter 5: Risk by Rebecca Frank
“I Can’t Wait for You to Die” A Community Archives Critique by Harrison Apple

Needless to say, we all enjoyed it very much, and heard the same from our attendees. Several follow-on chats have been arranged, and the community continues to wiggle towards each other.

[Spectators taking photographs at tomb of John F. Kennedy. Arlington National Cemetery] (LOC)
at The Library of Congress

We are now past the midpoint of our first project stage, and have our three basic prototype Data Lifeboats. At the moment, they run locally via the command line and generate rough versions of what Data Lifeboats will eventually contain—data and pictures.

The last step for those prototypes is to move them into a clicky web prototype showing the full workflow—something we will share with our working group (but may not put online publicly). We are working towards completing this first prototyping stage around the end of June and writing up the project in July.

We’ve made a few key decisions since we last posted an update, namely about who we’re designing for and what other expertise we need to bring in. We still have more questions than answers, but really, that’s what prototyping is for.

Who might do which bit

It took us a while to get to this decision, but once we had gone through the initial discovery phase, it became clear that we need to concentrate our efforts on three key user groups:

Flickr members – People who’ve uploaded pictures to Flickr, have set licenses and permissions, and may either be happy or not happy for their pictures to be put into Data Lifeboats.
Data Lifeboat creators – Could be archivists or other curatorial types looking to gather sets of pictures to copy into archives elsewhere, whether that be an institution like The Library of Congress, or a family archivist with a DropBox account.
Dock operators – This group is a bit more speculative, but we envision that Data Lifeboats could actually land (or dock) in specific destinations and be treated with special care there. Our ideal scenario would be to develop a network of docks (something we’ve been calling a “Safe Harbor Network”) made up of members that are our great and good cultural organizations: they are already really good at keeping things safe over the long term.

It’ll be good to flesh the needs and wants of these three groups out in more detail in our next stage. If you are a Flickr member reading this, and want to share your story about what your Flickr account means to you, we’d love to hear it.

Web archive vs object archive

Some digital/web preservation experts take the opinion that it’s archivally important to also archive the user interface of a digital property in order to fully understand a digital object’s context. This has arguably resulted in web archives containing a whole lot more information and structural stuff than is useful or necessary. It’s sort of like archiving the entire house within which the shoebox of photos was found.

We have decided that archiving the flickr.com interface itself is not necessary for a Data Lifeboat, and we will be designing a special viewer that will live inside each Data Lifeboat to help people explore its contents.

Analysing the need for new policy

The Data Lifeboat idea is about so much more than technology. Even though that’s certainly challenging, the more we think about it, the more challenging the social and ethical aspects are. It’s gritty, complex stuff, made more so by the delicate socio-technical settings available to Flickr members, like privacy, search settings, and licensing. The crosshatch of these three vectors makes managing stable permissions over time harder than weaving a complicated textile!

Once we narrowed down our focus to these specific user groups it also became clear that we need to address the (very) complex legal landscape surrounding the potential for archiving of Flickr images external to the service. It’s particularly gnarly when you start considering how permissions might change over time, or how access might shift for different scales of audience. For example, a Flickr member might be happy for Data Lifeboats containing their images to be shared with friends of friends, but a little apprehensive about them being shared with a recognized cultural institution that would use them for research. They may be much less happy for their Flickr pictures to be fully archived and available to anyone in perpetuity.

To help us explore these questions, and begin prototyping policies for each type of user group we foresee, we have enlisted the help of Dr. Andrea Wallace of the Law School at the University of Exeter. She is working with us to develop legal and policy frameworks tailored to the needs of each of these three groups, and to study how the current Flickr Terms of Service may be suitable for, or need adaptation around, this idea of a Data Lifeboat. This may include drafting terms and conditions needed to create a Data Lifeboat, how we might be able to enhance rights management, and exploring how to manage expiration or decay of privacy or licensing into the future.
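To make the "scales of audience" and "expiration" ideas a little more concrete, here is a minimal sketch of how software might represent one member's consent. Everything in it is an assumption for illustration: the `Audience` levels, the `Consent` record and the `may_share` check are hypothetical names, and treating audiences as a simple narrow-to-wide ordering is a simplification that a real policy may well not allow.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum
from typing import Optional


class Audience(IntEnum):
    """Hypothetical audience scales, ordered from narrowest to widest."""
    FRIENDS_OF_FRIENDS = 1
    CULTURAL_INSTITUTION = 2
    PUBLIC_IN_PERPETUITY = 3


@dataclass
class Consent:
    """One member's (hypothetical) consent for their photos in a Data Lifeboat."""
    widest_audience: Audience          # widest scale the member accepts
    expires_on: Optional[date] = None  # None means no expiry was set


def may_share(consent: Consent, audience: Audience, today: date) -> bool:
    """Return True if the consent is still valid and covers the audience."""
    if consent.expires_on is not None and today > consent.expires_on:
        return False  # consent has lapsed
    return audience <= consent.widest_audience


# A member who accepts institutional research use until the start of 2030.
consent = Consent(Audience.CULTURAL_INSTITUTION, expires_on=date(2030, 1, 1))
print(may_share(consent, Audience.FRIENDS_OF_FRIENDS, date(2025, 6, 1)))
print(may_share(consent, Audience.PUBLIC_IN_PERPETUITY, date(2025, 6, 1)))
```

Even a toy like this shows where the hard questions sit: who sets the expiry date, and what should happen to a Data Lifeboat that has already docked when consent lapses.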

Data Lifeboat prototypes

We have generated three different prototype Data Lifeboats to think with, and show to our working group:

Photos tagged with “Flickrhq”: This prototype includes thousands of tagged images of ‘life working at Flickr’, which is useful to explore the tricky aspects of collating other people’s pictures into a Data Lifeboat. Creating it revealed a search foible, whereby the result set that is delivered by searching via a tag is not consistent. Many of the pictures are also marked as All Rights Reserved, with 33% having downloads disabled. This raises juicy questions about licensing and permissions that need further discussion.
Two photos from each Flickr Commons Member: We picked this subset because Flickr Commons photos are earmarked with the ‘no known copyright restrictions’ assertion, so questions about copying or reusing are theoretically simpler.
All photos from the Library of Congress (LoC) account: Comprising roughly 42,000 photos also marked as “no known copyright restrictions,” this prototype contains a set that is simpler to manage as all images have a uniform license setting. It was also useful to generate a Data Lifeboat of this size as it allowed us to do some very early benchmarking on questions like how long it takes to create one and where changes to our APIs might be helpful.

Preparing these prototypes has underscored the challenges of balancing the legal, social, and technical aspects of this kind of social media archiving, making clear the need for a special set of terms & conditions for Data Lifeboat creation. They also reveal the limitations of tags in capturing all relevant content (which, to some extent, we were expecting) and the user-imposed restrictions set on images in the Flickr context, like ‘can be downloaded.’
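As a rough illustration of the kind of tally that surfaced these restrictions, the sketch below counts licenses and download settings across a small set of photo records. The records and field names (`license`, `can_download`) are made up for the example and are not the Flickr API's actual response format.

```python
from collections import Counter

# Hypothetical photo records; field names are illustrative only,
# not the Flickr API's actual response schema.
photos = [
    {"id": "1", "license": "All Rights Reserved", "can_download": False},
    {"id": "2", "license": "All Rights Reserved", "can_download": True},
    {"id": "3", "license": "CC BY 2.0", "can_download": True},
    {"id": "4", "license": "No known copyright restrictions", "can_download": True},
]


def summarise_permissions(photos):
    """Count photos per license and the share with downloads disabled."""
    license_counts = Counter(p["license"] for p in photos)
    blocked = sum(1 for p in photos if not p["can_download"])
    share_blocked = blocked / len(photos) if photos else 0.0
    return {
        "total": len(photos),
        "licenses": dict(license_counts),
        "downloads_disabled_share": share_blocked,
    }


summary = summarise_permissions(photos)
print(summary)
```

A summary like this could be shown to a Data Lifeboat creator before anything is copied, so that mixed licensing is visible up front.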

Remaining questions?

OMG, so many. Although the prototypes are still in progress, they have already stimulated great discussion and raised some key questions, such as:

How might user intentions or permissions change over time and how could software represent them?
How could the scope or scale of sharing influence how shared images are perceived, updated, and utilized?
How can we understand the different use cases and how archivists/librarians could engage with the Data Lifeboats?
How important is it to make sure Data Lifeboats are launched with embedded rights information, and how might those decay over time?
How should we be considering the descriptive or social contexts that accompany images, and how should they inform subsequent decisions about expiration dates?

Long term sustainability and funding models

It’s really so early to be talking about this – and we’re definitely not ready to present any actual, reasonable, viable models here because we don’t know enough yet about how Data Lifeboats could be used or under what circumstances. We did do a first pass review of some obvious potential business models, for example:

A premium subscription service that allows Flickr.com users to create personalized Data Lifeboats for their own collections.
A consulting service for institutions and individuals who want to create Data Lifeboats for specific archival purposes.
Developing training and certification programs for digital archiving that use Data Lifeboats as the foundation.
Membership fees for members of the Safe Harbor network, or charging fees for access to the Data Lifeboat archives.

While there were aspects to each that appealed to our partners, there were also significant flaws so overall, we’re still a long way from having an answer. This is something else we’re planning to explore more broadly in partnership with the wider Flickr Commons membership in subsequent phases of this project.

Next steps

This month we’ll be wrapping up this first prototyping phase supported by the National Endowment for the Humanities. After we’ve completed the required reporting, we’ll move into the next phase in earnest, reaching out to those three user groups more deliberately to learn more about how Data Lifeboats could operate for them and what they would need them to do.

Two upcoming in-person events!

We’re also very happy to be able to tell you the Mellon Foundation has awarded us a grant to support this next stage, and we’re especially looking forward to running two small events later in the year to gather people from our Flickr Commons partner institutions, as well as other birds of a feather, to discuss these key challenges together.

If you’d like to register your interest in attending one of these meetings, please let us know via this short Registration of Interest form. Please note, these will be small, maybe 20ish people at each, and registering interest does not guarantee a spot, and we’ve only just begun planning in earnest.

Flocks of Blue Geese and Snow Stop at the Squaw Creek National Wildlife Refuge near Mound City, Missouri…10/1974
at The U.S. National Archives

Tramore Lifeboat Crew, National Library of Ireland
see it on flickr.com

What’s the grant for?

It’s a 12-month grant, and mostly involves using the prototype work we’ve been doing to demonstrate and discuss the concept with our community. We can’t wait to hold the two events we have planned in the (Northern hemisphere) autumn, and we’ll likely be having them on the East Coast of the USA, and in our homebase, London. If you’d like to learn more about attending one of these small meetings, please let us know via hello [at] flickr.org.

We expect to also iterate on the software itself, but we’re not quite sure where we’ll end up just yet, especially if all our conversations result in us needing to pursue different directions.

Growing the team

As part of this grant, we’ll be advertising for two new roles, likely on contract: Researcher and Software Developer. Stay tuned for those!

What’s a Data Lifeboat again?

A Data Lifeboat is an archival piece of Flickr, not all of the 50 billion images and their metadata. For example, a Lifeboat might contain all the photos tagged with “sunflower” or all the Recipes to Share group submissions. Whatever facet of the data you can think of, you could generate a Data Lifeboat for it. We envision an archival sliver richer than a mere folder of JPGs: one where you can navigate the content to explore and understand its networked context. Even better, an archival sliver that is updated if things change at flickr.com.

Today, Flickr members can make an archive of their own photostream, and that works really well. You can “get your data” and that download includes most, if not all, of the kinds of information we expect a Data Lifeboat to contain. And, we want to take it two steps further, from an archival point of view:

Allow creators to make Data Lifeboats that can contain other people’s images (with permission, and that’s very, very gnarly), and
We plan to develop ‘known places’ for Data Lifeboats to land, so they can be registered or even accessioned as bona fide objects of meaningful cultural value. We’re calling those landing places Docks. That work is probably going to start in earnest in 2025.

In our ideal world, these docks will live inside our great and good cultural organizations, spreading the load, responsibility, and acknowledgement that our digital, user-generated cultural heritage is valuable and worthy of the attention and care our archives, museums, and libraries can provide. Jenn’s recent deeper dive into this is worth your time.

Building steadily

Our prototyping stage is nearly done now, and we expect to come out of it with some Data Lifeboats to look at and critique, and some “prototype policies” for Flickr members, Data Lifeboat creators, and possible “dock” operators. We are also doing some foundational work on models for sustainability, because, as you will know, to date, we’ve been largely quite bad at planning for the long-term life of our digital projects.

Thank you

A huge thank you to Jenn and Ewa for your fantastic support getting the grant application done, and to the team at Mellon for such constructive feedback.

March has been productive. The short version is it’s complicated but we’re exploring happily, and adjusting the scope in small ways to help simplify it. Let me summarise the main things we did this month.

Legal workshop

We welcomed two of our advisors—Neil from the Bodleian and Andrea from GLAM e-Lab—to our HQ to get into the nitty gritty of what a 50-year-old Data Lifeboat needs to accommodate.

As we began the conversation, I centred us in the C.A.R.E. Principles and asked that we always keep them in our sights for this work. The main future challenges are settling around the questions of how identity and the right to be forgotten must be expressed, how Flickr account holders can or should be identified, and whether an external name resolver service of some kind could help us. We think we should develop policies for Flickr members (on consent to be in a Data Lifeboat), Data Lifeboat creators (on their obligations as creators), and Dock Operators (an operations manual & obligations for operating a dock). It’s possible there will also be some challenges ahead around database rights, but we don’t know enough yet to give a good update. We’d like a first-take legal framework of the Data Lifeboat system to be an outcome of these first six months.

Privacy & licensing

These are key concepts central to Flickr—privacy and licensing—and we must make sure we do our utmost to respect them in all our work. It would be irresponsible for us to jettison the desires encoded in those settings for our convenience, tempting though that may be. By that I mean, it would be easier for us to make Data Lifeboats that contained whatever photos from whomever, but we must respect the desires of Flickr creators in the creation process.

There are still big and unanswered questions about consent, and how we get millions of Flickr members to agree to participate and give permission to allow their pictures to be put in other people’s Data Lifeboats.

Extending the prototype Data Lifeboat sets

Initially, we had planned to run this 6-month prototype stage with just one test set of images, which would be some or all of the Flickr Commons photographs. But in order to explore the challenges around privacy and licensing, we’ve decided to expand our set of working prototypes to also include the entire Library of Congress Flickr Commons account, and all the photos tagged with “flickrhq” (since that set is something the Flickr Foundation may decide to collect for its own archive and contains photographs from different Flickr members who also happen to have been Flickr staff and would therefore (theoretically) be more sympathetic to the consent question).

Visit to Greenwich

Ewa spotted that there was an exhibition of ambrotype photographic portraits of women in the RNLI at the Maritime Museum in Greenwich at the moment, so we decided to take a day trip to see the portraits and poke around the brilliant museum. We ended up taking a boat from Greenwich to Battersea which was a nice way to experience the Thames (and check out that boat’s life saving capabilities).

The Data Lifeboat creation process

I found myself needing to start sketching out what it could look like to actually create a Data Lifeboat, and particularly not via a command line, so we spent a while in front of a whiteboard kicking that off.

At this point, we’re imagining a few key steps:

The Query – “I want these photos” – is like a search. We could borrow from our existing Flinumeratr toy.
The Results – Show the images, some metadata. But it’s hard to show information about the set in aggregate at this stage, e.g., how many of the contents are licensed in which way. This could form a manifest for the Data Lifeboat.
Agreement – We think there’s a need for the Data Lifeboat creator to agree to certain terms. Simple, active language that echoes the CARE principles, API ToS, and Flickr Community Guidelines. We think this should also be included in the Data Lifeboat it’s connected with.
README / Note to the Future – we love the idea that the Data Lifeboat creator could add a descriptive narrative at this point, about why they are making this lifeboat, and for whom, but we recognised that this may not get done at all, especially if it’s too complicated or time-consuming. This is also a good spot to describe or configure warnings, timers, or other conditions needed for future access. Thanks also to two of our other advisors – Commons members Mary Grace and Alan – who shared with us their organisation’s policies on acquisitions for reference.
Packaging – This would be asynchronous and invisible to the creator; downloading everything in the background. We realised it could take days, especially if there are lots of Data Lifeboats being made at once.
Ready! – The Data Lifeboat creator gets a note somehow about the Data Lifeboat being ready for download. We may need to consider keeping it available only for a short time(?).
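The steps above can be sketched as a small data structure that tracks what a creator has filled in before packaging begins. This is only a thinking aid: `LifeboatDraft`, `missing_steps` and `ready_to_package` are hypothetical names, and the choice to treat the README as optional simply mirrors our worry that it may not get written.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LifeboatDraft:
    """Hypothetical in-progress Data Lifeboat, one field per creation step."""
    query: str = ""                                      # The Query
    photo_ids: List[str] = field(default_factory=list)   # The Results
    agreed_to_terms: bool = False                        # Agreement
    readme: str = ""                                     # README (optional here)


def missing_steps(draft: LifeboatDraft) -> List[str]:
    """List the required steps that are still incomplete, in flow order."""
    missing = []
    if not draft.query:
        missing.append("query")
    if not draft.photo_ids:
        missing.append("results")
    if not draft.agreed_to_terms:
        missing.append("agreement")
    return missing


def ready_to_package(draft: LifeboatDraft) -> bool:
    """Packaging can start once every required step is complete."""
    return not missing_steps(draft)


draft = LifeboatDraft(query="flickrhq", photo_ids=["1", "2"])
print(missing_steps(draft))      # the creator has not agreed to the terms yet
draft.agreed_to_terms = True
print(ready_to_package(draft))
```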

Creation Schematic, 19th March


Emergency v Non-Emergency

We keep coming up against this…

The original concept of the Data Lifeboat is a response to the near-death experience that Flickr had in 2017 when its then-owner, Verizon/Yahoo, almost decided to vaporise it because they deemed it too expensive to sell (something known as “the cost of economic divestment”). So, in the event of that kind of emergency, we’d want to try to save as much of this unique collection as possible as quickly as possible, so we’d need a million lifeboats full of pictures created more or less simultaneously or certainly in a relatively short period of time.

In the early days of this work, Alex said that the pressure of this kind of emergency would be the equivalent of being “hugged to death by the archivists,” as we all try, in very caring and responsible ways, to save as much as we can. And then there’s the bazillion-emergency-hits-to-the-API-connection problem (aka the “Thundering Herd” problem), which we do not yet have a solution for, and which is very likely to affect any other social media platforms that may also be curious to explore this concept.
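To be clear, we have not chosen an approach, but one widely used mitigation for thundering-herd behaviour on the client side is exponential backoff with random jitter, so that retrying clients spread themselves out instead of all hitting the API at the same moment. The sketch below shows only the delay calculation. The function name and the `base` and `cap` values are assumptions for illustration, not anything in the Flickr API.

```python
import random


def backoff_delay(attempt, base=1.0, cap=300.0, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    Uses "full jitter": pick a random delay between zero and an
    exponentially growing ceiling, which is itself capped at `cap`.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


# A seeded generator keeps this demonstration repeatable.
rng = random.Random(42)
delays = [backoff_delay(n, rng=rng) for n in range(6)]
for n, d in enumerate(delays):
    print(f"retry {n}: wait up to {min(300.0, 2 ** n):.0f}s, chose {d:.2f}s")
```

Backoff only softens the problem from the client side. A million lifeboats launching at once would still need coordination with the platform itself.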

We’re connecting with the Flickr.com team to start discussing how to address this challenge. We’re beginning to think about how emergency selection might work, as well as the present, and future, challenges of establishing the identity of photo subjects and account owners. The millions of lifeboats that would be created would surely need the support of the company to launch if they’re ever needed.

This work is supported by the National Endowment for the Humanities.

For all of us at Flickr Foundation, the idea of Flickr as an archive in waiting inspires our core purpose. We believe the billions of photos that have amassed on Flickr in the last 20 years have potential to be the material of future historical research. With so much of our everyday lives being captured digitally and posted to public platforms, we – both the Flickr Foundation and the wider cultural heritage community – have begun figuring out how to proactively gather, make available, and preserve digital images and their metadata for the long term.

In this blog post, I’m setting my sights beyond technology to consider the institutional and social aspects that enable the collection of digital photography from online platforms.

It’s made of people

Our Data Lifeboat project is now underway. Its goal is to build a mechanism to make it possible to assemble and decentralize slivers of Flickr photos for potential future users. (You can read project update 1 and project update 2 for the background). The outcome of the first project phase will be one or more prototypes we will show to our Flickr Commons partners for feedback. We’re already looking ahead to the second phase where we will work with cultural heritage institutions within the wider Flickr Commons network to make sure that anything we put into production best suits cultural heritage institutions’ real-world needs.

We’ve been considering multiple possible use cases for creating, and importantly, docking a Data Lifeboat in a safe place. The two primary institutional use cases we see are:

Cultural heritage institutions want to proactively collect born digital photography on topics relevant to their collections
In an emergency situation, cultural heritage institutions (and maybe other Flickr members) want to save what they can from a sinking online platform – either photos they’ve uploaded themselves or, more generously, whatever else they can. (And let me be clear: Flickr.com is thriving! But it’s better to design for a worst-case scenario than to find ourselves scrambling for a solution with no time to spare.)

We are working towards our Flickr Commons members (and other interested institutions) being able to accept Data Lifeboats as archival materials. For this to succeed, “dock” institutions will need to:

Be able to use it, and have the technology to accept it
Already have a view on collecting born digital photography, and ideally this type of media is included in their collection development strategy. (This is probably more important.)

This isn’t just a technology problem. It’s a problem made of everything else the technology is made of: people who work in cultural heritage institutions, their policies, organizational strategies, legal obligations, funding, commitment to maintenance, the willing consent of people who post their photos to online platforms and lots more.

To preserve born digital photos from the web requires the enthusiastic backing of institutions—which are fundamentally social creatures—to do what they’re designed to do, which is to save and ensure access to the raw material of future research.

Collecting social photography

I’ve been doing some background research to inform the early stages of Data Lifeboat development. I came across the 2020 Collecting Social Photography (CoSoPho) research project, which set out to understand how photography is used in social media in order to be able to develop methods for collection and transmission to future generations. Their report, ‘Connect to Collect: approaches to collecting social digital photography in museums and archives’, is freely available as a PDF.

The project collaborators were:

The Nordic Museum / Nordiska Museet
Stockholm County Museum / Stockholms Läns Museum
Aalborg City Archives / Aalborg Stadsarkiv
The Finnish Museum of Photography / Finland’s Fotografiska Museum
Department of Social Anthropology, Stockholm University

The CoSoPho project was a response to the current state of digital social photography and its collection/acquisition – or lack thereof – by museums and archives.

Implicit to the team’s research is that digital photography from online platforms is worth collecting. Three big questions were centered in their research:

How can data collection policies and practices be adapted to create relevant and accessible collections of social digital photography?
How can digital archives, collection databases and interfaces be relevantly adapted – considering the character of the social digital photograph and digital context – to serve different stakeholders and end users?
How can museums and archives change their role when collecting and disseminating, to increase user influence in the whole life circle of the vernacular photographic cultural heritage?

There’s a lot in this report that is relevant to the Data Lifeboat project. The team’s research focussed on ‘digital social photography’, taken to mean any born digital photos that are taken for the purpose of sharing on social media. It interrogates Flickr alongside Snapchat, Facebook, Instagram, as well as region-specific social media sites like IRC-Galleria (a very early 2000s Finnish social media platform).

I would consider Flickr a bit different to the other apps mentioned, if only because that definition doesn’t address Flickr-specific use cases such as:

Showcasing photography as craft
Using Flickr as a public photo repository or image library where photos can be downloaded and re-used outside of Flickr, unlike walled garden apps like Instagram or Snapchat.

The ‘massification’ of images

The CoSoPho project highlighted the challenges of collecting digital photos of today while simultaneously digitizing analog images from the past, the latter of which cultural heritage institutions have been actively doing for many years. Anna Dahlgren describes this as a “‘massification’ of images online”. The complexities of digital social photos, with their continually changing and growing dynamic connections, combined with the unstoppable growth of social platforms, pose certain challenges for libraries, archives and museums to collect and preserve.

To collect digital photos requires a concerted effort to change the paradigm:

from static accumulation to dynamic connection
from hierarchical files to interlinked files
and from pre-selected quantities of documents to aggregation of unpredictably variable image and data objects.

Dahlgren argues that “…in order to collect and preserve digital cultural heritage, the infrastructure of memory institutions has to be decisively changed.”

The value of collecting and contributing

“Put bluntly, if images on Instagram, Facebook or any other open online platform should be collected by museums and archives what would the added value be? Or, put differently, if the images and texts appearing on these sites are already open and public, what is the role of the museum, or what is the added value of having the same contents and images available on a museum site?” (A. Dahlgren)

Those of us working in the cultural heritage sector can imagine many good responses to this question. At the Flickr Foundation, we look to our recent internet history and how many web platforms have been taken offline. Our digital lives are at risk of disappearing. Museums, libraries and archives have that long-term commitment to preservation. They are repositories of future knowledge, and expect to be there to provide access to it.

Cultural heritage institutions that choose to collect from social online spaces can forge a path for a multiplicity of voices within collections, moving beyond standardized metadata toward richer, more varied descriptions from the communities from which the photos are drawn. There is significant potential to collect in collaboration with the publics the institution serves. This is a great opportunity to design a more inclusive ethics of care into collections.

But what about potential contributors whose photos are being considered for collection by institutions? What values might they apply to these collections?

CoSoPho uncovered useful insights about how people participating in community-driven collecting projects considered their own contributions. Contributors wanted to be selective about which of their photos would make it into a collection; this could be for aesthetic reasons (choosing the best, most representative photos) or concerns for their own or others’ anonymity. Explicit consent to include one’s photos in a future archive was a common theme – and one which we’re thinking deeply about.

Overall, people responded positively to the idea of cultural institutions collecting digital social photos – they too can be part of history! – and they also think it’s important that the community from which those photos are drawn has a say in what is collected and how it’s made available. Future user researchers at Flickr Foundation might want to explore contributor sentiment even further.

What’s this got to do with Data Lifeboats?

As an intermediary between billions of Flickr photos and cultural heritage institutions, we need to create the possibilities for long-term preservation of this rich vein of digital history. These considerations will help us to design a system that works for Flickr members and museums and archives.

Adapting collection development practices

All signs point to cultural heritage institutions needing to prepare to take on born digital items. Many are already doing this as part of their acquisition strategies, but most often this born digital material comes entangled in a larger archival collection.

If institutions aren’t ready to proactively collect born digital material from the public web, this is a risk to the longevity of this type of knowledge. And if this isn’t a problem that currently matters to institutions, how can we convince them to save Flickr photos?

As we move into the next phase of the Data Lifeboat project, we want to find out:

Are Flickr Commons member institutions already collecting, or considering collecting, born digital material?
What kinds of barriers do they face?

Enabling consent and self-determination

CoSoPho’s research surfaced the critical importance of consent, ownership and self-determination in determining how public users/contributors engage with their role in creating a new digital archive.

How do we address issues of consent when preserving photos that belong to creators?
How do we create a system that allows living contributors to have a say in what is preserved, and how it’s presented?
How do we design a system that enables the informed collection of a living archive?
Is there a form of donor agreement or an opt-in to encourage this ethics of care?

Getting choosy

With 50 billion Flickr photos, not all of them visible to the public or openly licensed, we are working from the assumption that the Data Lifeboat needs to enable selective collecting.

Are there acquisition practices and policies within Flickr Commons institutions that can inform how we enable users to choose what goes into a Data Lifeboat?
What policies for protecting data subjects in collections need to be observed?
Are there existing paradigms for public engagement for proactive, social collecting that the Data Lifeboat technology can enable?
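
As a thought experiment, selective collecting could start from a very simple filter over candidate photos. The sketch below assumes a hypothetical `Photo` record with a visibility flag and a license label, and an illustrative set of “open” licenses. None of these names or rules are project policy; they only show what one selection step might look like.

```python
from dataclasses import dataclass

# Illustrative only: which licenses count as "open" is a policy question.
OPEN_LICENSES = {"CC BY 2.0", "CC0 1.0", "Public Domain Mark"}


@dataclass(frozen=True)
class Photo:
    """Hypothetical minimal record for a candidate photo."""
    photo_id: str
    is_public: bool
    license: str


def select_for_lifeboat(photos, allowed_licenses=OPEN_LICENSES):
    """Keep only photos that are publicly visible and openly licensed."""
    return [
        p for p in photos
        if p.is_public and p.license in allowed_licenses
    ]


candidates = [
    Photo("a", True, "CC BY 2.0"),
    Photo("b", False, "CC BY 2.0"),           # not visible to the public
    Photo("c", True, "All Rights Reserved"),  # not openly licensed
]
selected = select_for_lifeboat(candidates)
print([p.photo_id for p in selected])
```

A real selection step would also have to reflect the consent and data-subject questions above, which a license check alone cannot answer.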

Co-designing usable software

Cultural heritage institutions have massively complex technical environments with a wide variety of collection management systems, digital asset management systems and more. This complexity often means that institutions miss out on chances to integrate community-created content into their collections.

The CoSoPho research team developed a prototype for collecting digital social photography. That work was attempting to address some of these significant tech challenges, which Flickr Foundation is already considering:

Individual institutions need reliable, modern software that interfaces with their internal systems; few institutions have internal engineering capacity to design, build and maintain their own custom software
Current collection management systems don’t have a lot of room for community-driven metadata; this information is often wedged into local data fields
Collection management systems lack the ability to synchronize data with social media platforms (and vice versa) if the data changes. That makes it more difficult to use third-party platforms for community description and collecting projects.

So there’s a huge opportunity for the Flickr Foundation to contribute software that works with this complexity to solve real challenges for institutions. Co-design–that is, a design process that draws on your professional expertise and institutional realities–is the way forward!

We need you!

We are working on the challenge of keeping Flickr photos visible for 100 years and we believe it’s essential that cultural heritage institutions are involved. Therefore, we want to make sure we’re building something that works for as many organizations as possible – both big and small – no matter where you are in your plans to collect born digital content from the web.

If you’re part of the Flickr Commons network already, we are planning two co-design workshops for Autumn 2024, one to be held in the US and the other likely to be in London. Keep your eyes peeled for Save-the-Date invitations, or let us know you’re interested, and we’ll be sure to keep you in the loop directly.

Thanks to the Digital Humanities Advancement Grant we were awarded by the National Endowment for the Humanities, our Data Lifeboat project (which is part of the Content Mobility Program) is now well and truly underway. The Data Lifeboat is our response to the challenge of archiving the 50 billion or so images currently on Flickr, should the service go down. It’s simply too big to archive as a whole, and we think that these shared histories should be available for the long term, so we’re exploring a decentralized approach. Find out more about the context for this work in our first blog post.

So, after our kick-off last month, we were left with a long list of open questions. That list became longer thanks to our first all-hands meeting that took place shortly afterwards! It grew again once we had met with the project user group – staff from the British Library, San Diego Air & Space Museum, and Congregation of Sisters of St Joseph – a small group representing the diversity of Flickr Commons members. Rather than being overwhelmed, we were buoyed by the obvious enthusiasm and encouragement across the group, all of whom agreed that this is very much an idea worth pursuing.

As Mia Ridge from the British Library put it: “we need ephemeral collections to tell the story of now and give people who don’t currently think they have a role in preservation a different way of thinking about it”. And from Mary Grace of the Congregation of Sisters of St. Joseph in Canada: “we [the smaller institutions] don’t want to be the 3rd class passengers who drown first”.

Software sketching

We’ve begun working on the software approach to create a Data Lifeboat, focussing on the data model and assessing existing protocols we may use to help package it. Alex and George started creating some small prototypes to test how we should include metadata, and have begun exploring what “social metadata” could be like – that’s the kind of metadata that can only be created on Flickr, and is therefore a required element in any Data Lifeboat (as you’ll see from the diagram below, it’s complex).

Feb 2024: An early sketch of a Data Lifeboat’s metadata graph structure.

Thanks to our first set of tools, Flinumeratr and Flickypedia, we have robust, reusable code for getting photos and metadata from Flickr. We’ve done some experiments with JSON, XML, and METS as possible ways to store the metadata, and started to imagine what a small viewer that would be included in each Data Lifeboat might be like.
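
To give a feel for the JSON option, here is a minimal sketch of how one photo’s metadata might be written out and read back. Every field name and value is an assumption made up for illustration (the real data model is still being worked out); the only idea taken from the project is the split between objective metadata such as EXIF and social metadata created on Flickr.

```python
import json

# Every field name and value below is hypothetical.
record = {
    "photo_id": "example-0001",
    "objective_metadata": {
        "exif": {
            "Make": "ExampleCam",
            "DateTimeOriginal": "2024:02:01 10:00:00",
        },
    },
    "social_metadata": {
        "tags": ["lifeboat", "archive"],
        "comments": [
            {"author": "example-member", "text": "Lovely light!"},
        ],
        "albums": ["Example album"],
    },
    "license": "CC BY 2.0",
}


def to_json(rec):
    """Serialise a record as human-readable, stably ordered JSON."""
    return json.dumps(rec, indent=2, sort_keys=True, ensure_ascii=False)


def from_json(text):
    """Parse a serialised record back into plain Python data."""
    return json.loads(text)


serialised = to_json(record)
print(serialised)
```

Sorting the keys and indenting the output keeps the file readable in a plain text editor, which seems useful for something meant to be opened decades from now.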

Complexity of long-term licensing

Alongside the technical development we have started developing our understanding of the legal issues that a Data Lifeboat is going to have to navigate to avoid unintended consequences of long-term preservation colliding with licenses set in the present. We discussed how we could build care and informed participation into the infrastructure, and what the pitfalls might be. There are fiddly questions around creating a Data Lifeboat containing photos from other Flickr members.

As the image creator, would you need to be notified if one of your images has been added to a Data Lifeboat?
Conversely, how would you go about removing an image from a Data Lifeboat?
What happens if there’s a copyright dispute regarding images in a Data Lifeboat that is docked somewhere else?

We discussed which aspects of other legal and licensing models might apply to Data Lifeboats, given the need to maintain stewardship and access over the long term (100 years at least!), as well as the need for the software to remain usable over this kind of time horizon. This isn’t something that the world of software has ready answers for.

Could Flickr.org offer this kind of service?
How would we notify future users of the conditions of the license, let alone monitor the decay of licenses in existing Data Lifeboats over this kind of timescale?

So many standards to choose from

We had planned to do a deep dive into the various digital asset management systems used by cultural institutions, but this turned out to be a trickier subject than we thought as there are simply too many approaches, tools, and cobbled-together hacks being used in cultural institutions. Everyone seems to be struggling with this, so it’s not clear (yet) how best to approach this. If you have any ideas, let us know!

When I do planning, I usually carve it up along three axes: Projects, Pipeline, and People. I want to keep our project list very short in 2024. That allows us to focus more deeply, I think, and spend time thinking and waxing and wandering a bit as we map the new terrain of our mission, to keep Flickr images visible for 100 years.

Projects

There are three main flows of project work for the team:

Flickr Commons nurturing and growing
Start Data Lifeboat
Continue 100-year plan ideation and workshopping

Flickr Commons

Flickr Commons turned 16 years old last week. To celebrate, we launched the first instantiation of a new front door which lives at commons.flickr.org. The intent is to help Commons fans explore the different members’ collections more easily, and get a sense of recent activity across the aggregate. We hope to do another handful of releases over this year and beyond.

The other good news is that we’re nearly, finally, ready to welcome new members into the program. The software that supported new registrations and members had decayed a bit over the last decade, so, working with the company team—thanks Ruppel et al—we’ve co-designed a new set of Commons-specific APIs that will help the Foundation really lean into supporting Flickr Commons members from now on.

We are going to build: 1) a new registration form, 2) improved onboarding resources/workflow, 3) the new discovery layer you can now see at commons.flickr.org, and 4) better admin tools for the team to watch over the health of the program, and the happiness of our members. This will all be rolling out in the first half of this year. I don’t have a date for our first new tranche of members, but rest assured, we’ll let you know!

Later in the year, we want to find out a lot more about how Flickr members interact with Flickr Commons and see if we can support them to more easily keep track of their input and progress. If you fit into this group, we’d like to know you!

Data Lifeboat

Last year, we applied to the National Endowment for the Humanities (NEH) to develop a first set of prototypes for our Data Lifeboat concept. That’s the idea that we should actually plan for a possible end of flickr.com, developing “lifeboats” that can carry Flickr photos to other places if the big ship goes down. It was gratifying that the NEH decided to support this first block of work.

Our framing for the grant is to create two identical lifeboats containing Flickr pictures, “objective metadata” like EXIF, and a first crack at “social metadata”—the stuff that is only created on Flickr—because we think that’s essential for longer term contextual, archival framing of the existence of a Flickr photo. After all, on Flickr (and off) a photo is a social object, that is discussed, arranged, annotated, pointed at, and displayed, and EXIF data (the data that is created when a digital camera takes a photograph) falls short.

We’re planning to post NEH-grant-specific updates to the blog at the end of each month, so stay tuned for that. (I’d better write that next!)

The 100-year plan

I don’t have a structure or plan written yet. But I’ve really enjoyed all the discussions I’ve had about the idea, and especially the workshops we’ve run on it with different groups. Basically, the workshop is called How to write a 100-year plan and my opening gambit is “I don’t know, what do you think?” and conversation ensues.

We do hope to be able to at least get that workshop into a form where you might be able to run it without us. We’ll let you know about that too.

Pipeline

We’re just over one year old, having launched officially in November 2022. We’ve had an amazing start, thanks to support from SmugMug and our first cornerstone funder, Filecoin Foundation for the Decentralized Web. Since then, we’ve figured out how to accept donations of cash online via Stripe, and even stock donations! We’ve sketched out the grants we’re planning to apply for too.

People

Ewa Spohn, who also helped write the NEH grant for Data Lifeboat, has joined the crew to manage the project. She has a background in mechanical engineering, program management, and people-arranging, and we’re lucky to have her! Welcome, Ewa!

We’ve brought on a new part-time team member to help wrangle our Pipeline work, Susan Mernit. (Check out her sledgehammer!!) A veteran of the tech industry, Susan changed gears to lead two non-profits in California, to great success. She’s now working with nonprofits to help shore up their development plans and strategy, and we’re very glad she’s come on board to support us.

And, in case you missed it, we’re hiring: Our first job ad for this year is Archivist. It’s live now, closing January 31st.
