Data Lifeboat Update 3

March has been productive. The short version is it’s complicated but we’re exploring happily, and adjusting the scope in small ways to help simplify it. Let me summarise the main things we did this month.

Legal workshop

We welcomed two of our advisors, Neil from the Bodleian and Andrea from GLAM e-Lab, to our HQ to get into the nitty-gritty of what a 50-year-old Data Lifeboat needs to accommodate.

As we began the conversation, I centred us in the C.A.R.E. Principles and asked that we always keep them in our sights for this work. The main future challenges are settling around the questions of how identity and the right to be forgotten must be expressed, how Flickr account holders can or should be identified, and whether an external name resolver service of some kind could help us. We think we should develop policies for Flickr members (on consent to be in a Data Lifeboat), Data Lifeboat creators (on their obligations as creators), and Dock Operators (an operations manual & obligations for operating a dock). It’s possible there will also be some challenges ahead around database rights, but we don’t know enough yet to give a good update. We’d like a first-take legal framework of the Data Lifeboat system to be an outcome of these first six months.

Privacy & licensing

Privacy and licensing are central to Flickr, and we must do our utmost to respect them in all our work. It would be irresponsible for us to jettison the desires encoded in those settings for our convenience, tempting though that may be. By that I mean it would be easier for us to make Data Lifeboats that contained whatever photos from whomever, but we must respect the desires of Flickr creators in the creation process.

There are still big and unanswered questions about consent, and how we get millions of Flickr members to agree to participate and give permission to allow their pictures to be put in other people’s Data Lifeboats. 

Extending the prototype Data Lifeboat sets 

Initially, we had planned to run this 6-month prototype stage with just one test set of images, which would be some or all of the Flickr Commons photographs. But in order to explore the challenges around privacy and licensing, we’ve decided to expand our set of working prototypes to also include the entire Library of Congress Flickr Commons account, and all the photos tagged with “flickrhq”. That last set is something the Flickr Foundation may decide to collect for its own archive. It contains photographs from different Flickr members who also happen to have been Flickr staff, and who would therefore (theoretically) be more sympathetic to the consent question.

Visit to Greenwich

Ewa spotted that there was an exhibition of ambrotype photographic portraits of women in the RNLI at the Maritime Museum in Greenwich at the moment, so we decided to take a day trip to see the portraits and poke around the brilliant museum. We ended up taking a boat from Greenwich to Battersea which was a nice way to experience the Thames (and check out that boat’s life saving capabilities).

Day Out: Maritime Museum & Lifeboats

The Data Lifeboat creation process

I found myself needing to start sketching out what it could look like to actually create a Data Lifeboat, and particularly not via a command line, so we spent a while in front of a whiteboard kicking that off. 

At this point, we’re imagining a few key steps:

  1. The Query – “I want these photos” – is like a search. We could borrow from our existing Flinumeratr toy.
  2. The Results – Show the images, some metadata. But it’s hard to show information about the set in aggregate at this stage, e.g., how many of the contents are licensed in which way. This could form a manifest for the Data Lifeboat.
  3. Agreement – We think there’s a need for the Data Lifeboat creator to agree to certain terms. Simple, active language that echoes the CARE principles, API ToS, and Flickr Community Guidelines. We think this should also be included in the Data Lifeboat it’s connected with.
  4. README / Note to the Future – we love the idea that the Data Lifeboat creator could add a descriptive narrative at this point, about why they are making this lifeboat, and for whom, but we recognised that this may not get done at all, especially if it’s too complicated or time-consuming. This is also a good spot to describe or configure warnings, timers, or other conditions needed for future access. Thanks also to two of our other advisors – Commons members Mary Grace and Alan – who shared with us their organisation’s policies on acquisitions for reference.
  5. Packaging – This would be asynchronous and invisible to the creator; downloading everything in the background. We realised it could take days, especially if there are lots of Data Lifeboats being made at once.
  6. Ready! – The Data Lifeboat creator gets a note somehow about the Data Lifeboat being ready for download. We may need to consider keeping it available only for a short time(?).
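
To make step 2 a little more concrete, here is a minimal sketch of how a result set might be summarised in aggregate to seed a manifest. Everything in it is an assumption for illustration: the `Photo` fields, the `build_manifest` helper, the manifest keys and the sample license labels are hypothetical, and none of it reflects a decided Data Lifeboat format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Photo:
    # Hypothetical minimal record; a real Data Lifeboat would carry far more metadata.
    photo_id: str
    owner: str
    license: str


def build_manifest(query: str, photos: list[Photo]) -> dict:
    """Summarise a result set in aggregate, e.g. how many photos carry each license."""
    return {
        "query": query,
        "photo_count": len(photos),
        "owner_count": len({p.owner for p in photos}),
        "licenses": dict(Counter(p.license for p in photos)),
    }


# Made-up sample data standing in for the results of a query.
photos = [
    Photo("1", "alice", "CC BY 2.0"),
    Photo("2", "alice", "All Rights Reserved"),
    Photo("3", "bob", "CC BY 2.0"),
]
manifest = build_manifest("tag:flickrhq", photos)
print(manifest)
```

A summary like this could be shown to the creator before they reach the Agreement step, so they know what mix of licenses they are about to package.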

Creation Schematic, 19th March

Emergency v Non-Emergency 

We keep coming up against this… 

The original concept of the Data Lifeboat is a response to the near-death experience that Flickr had in 2017 when its then-owner, Verizon/Yahoo, almost decided to vaporise it because they deemed it too expensive to sell (something known as “the cost of economic divestment”). So, in the event of that kind of emergency, we’d want to try to save as much of this unique collection as possible as quickly as possible, so we’d need a million lifeboats full of pictures created more or less simultaneously or certainly in a relatively short period of time. 

In the early days of this work, Alex said that the pressure of this kind of emergency would be the equivalent of being “hugged to death by the archivists,” as we all try, in very caring and responsible ways, to save as much as we can. And then there’s the bazillion-emergency-hits-to-the-API-connection problem, aka the “Thundering Herd” problem, which we do not yet have a solution for, and which is very likely to affect any other social media platforms that may also be curious to explore this concept.
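
For readers unfamiliar with the Thundering Herd problem, one widely used general mitigation on the client side is exponential backoff with random jitter, so that many clients retrying at once spread themselves out over time. The sketch below only illustrates that general technique. It is not a solution we have adopted, and the function name `backoff_delays` and its parameters are made up for this example.

```python
import random


def backoff_delays(attempts, base=1.0, cap=60.0, rng=None):
    """Return one randomised wait (in seconds) per retry attempt ("full jitter")."""
    if rng is None:
        rng = random.Random()
    delays = []
    for attempt in range(attempts):
        # The ceiling doubles on each attempt, up to a fixed cap.
        ceiling = min(cap, base * 2 ** attempt)
        # Picking a random point below the ceiling stops clients retrying in lockstep.
        delays.append(rng.uniform(0, ceiling))
    return delays


# A seeded generator is used here only so the example is repeatable.
delays = backoff_delays(5, rng=random.Random(42))
print([round(d, 2) for d in delays])
```

Backoff only softens the load from well-behaved clients. It would not, on its own, make a million simultaneous lifeboat launches feasible.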

We’re connecting with the Flickr.com team to start discussing how to address this challenge. We’re beginning to think about how emergency selection might work, as well as the present, and future, challenges of establishing the identity of photo subjects and account owners. The millions of lifeboats that would be created would surely need the support of the company to launch if they’re ever needed.

This work is supported by the National Endowment for the Humanities.

New! Flickr Commons Explorer

commons.flickr.org

At the Flickr Foundation, one of the goals we set early on when we took over responsibility for running the Flickr Commons program was to build an improved ‘discovery layer’ for the Commons collection.

We’re pleased to share with you a first look at our new Commons Explorer, available at commons.flickr.org.

We’ve built the explorer using the standard Flickr API, and have created a secondary database which is updated pretty regularly. (This is a way of us saying not all the data is live live.) And, please note that photos on display link back to flickr.com.
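
As a rough illustration of what “a secondary database which is updated pretty regularly” can mean in practice, here is a minimal sketch of a periodic sync into a local cache. It is an assumption-laden example rather than a description of how the explorer is built: the `sync` function, the table layout and the sample rows are all hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3


def sync(conn, records):
    """Copy a batch of (photo_id, title, comment_count) rows into the local cache."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS photos ("
        "photo_id TEXT PRIMARY KEY, title TEXT, comment_count INTEGER)"
    )
    # INSERT OR REPLACE overwrites a cached row when the same photo_id is seen again.
    conn.executemany("INSERT OR REPLACE INTO photos VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
# In a real system these rows would come from API responses; here they are made up.
sync(conn, [("1", "Harbour", 2), ("2", "Pie", 0)])
sync(conn, [("1", "Harbour", 5)])  # a later run picks up new comments
rows = conn.execute(
    "SELECT photo_id, comment_count FROM photos ORDER BY photo_id"
).fetchall()
print(rows)
```

Between runs the cache simply shows the last values it saw, which is why not all the data is live.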

It’s a work in development, but we wanted to show you our progress in this early version. We’ve prioritized being able to look across the Commons in an interface that’s richer than search results. We’re surfacing activity levels across the collection too, to show that there’s a ton of chatting happening, and new uploads all the time.

The views we’ve built so far:

Home page

This is a list of recent uploads from across the Commons collection, and a sample of our members.

Members

This is a list of all the Flickr Commons members, which you can sort in different ways. By default it’s sorted by most recent upload, so you’ll see active members at a glance.

Each member has their own page, where you can see their popular tags, interesting photos, and recent uploads.

Conversations

For the first time ever, you can enjoy catching up on the last week’s conversations about photos in Flickr Commons. You’ll see immediately the fantastic community that’s grown up around photostreams like the National Library of Ireland’s, and get to know some of the volunteer researchers who contribute their time and detective skills to enrich the Commons.

Stats

Here we present activity across the collection, like upload volumes, comments, and popular tags…

About

A simple static page which outlines what we’re doing. And finally…

Search!

We’ve made a bone simple search for the explorer too, so you can quickly see a splat of pictures about just about anything. Even with a few million photos, there’s a huge range of tagging and other description happening. Jump into London, pie, Istanbul, and smiles, or just look for the magnifying glass in the top right of the nav bar.

We hope you enjoy exploring, and, please let us know if you have ideas for how we can improve upon what’s there so far!

In other Flickr Commons news

We are working with the Flickr company to develop a new set of API methods the Foundation will be able to use to build the member management tools we need to really lean into rejuvenating the Commons and especially growing the new membership. If we can introduce 5-10 new members to the program this year, we’ll be stoked! More, we’ll be even stoked-er.

This will involve a new home for registrations of interest, and a smoother onboarding experience for new members. Generally, we’re looking forward to new insights into the overall health of the program in the form of better views on activity (the beginnings of which you can see in commons.flickr.org).

If you are either from an existing member institution, or you’re curious about joining in and sharing your historical photography collections, please let us know.

In our early research back in 2021, we noted we wanted to get to know more of the volunteer community too, and see if we can learn about their needs for participating with research and commentary, and I’m pleased to report we’ve begun that, with our first interview with a prominent community member last week. (I was so excited I could barely talk, but Jessamyn wisely recorded the conversation and will be reporting on it soon.)

Repurposing and Remixing Archival Images

A surprising glimpse of a historical photo with a long history

A recent episode of Abbott Elementary had a historical photograph as a plot point of the episode. We tracked down that photograph and offer a bit more of its real-life history.

The satisfying conclusion of a recent episode of Abbott Elementary discussed the “specialness” of the fictional school in Philadelphia where the show takes place.

The last scene shows teachers and students assembled to hang a framed photograph showing the first Black teachers who worked at the school. The photo, which they found in the “school archives”, was highlighted as a point of pride, something they could all feel good about. The photograph looked familiar to me and I wondered if it was one from Flickr Commons.

Using some image editing skills and reverse image search tools, I found that the photograph shown above was from one of the Flickr Commons members, the Library of Congress. Going to the original source of this image showed that it was by a Black photographer, Thomas E. Askew, who worked at the turn of the last century.

Thomas E. Askew self-portrait

Another photograph from the same series as the one in Abbott Elementary is in the Commons.

[Four African American women seated on steps of building at Atlanta University, Georgia] (LOC)

Not much is known about Askew. He was a formerly enslaved man, born in 1847. He lived in Atlanta, Georgia, where these photographs were taken. Business directories show him working at the CW Motes Studio, and he had a personal photography studio in his home. From a blog post from the Historic Oakland Foundation (where he is buried):

Askew’s personal, intimate portraits showed a broader range of the Black experience that stood in stark contrast to the stereotypes present in the media of the time. His subjects ranged in age, skin-tone, attire, and vocation and reframed the visual aesthetics and culture developed by Black Americans in the decades following emancipation. This imagery challenged the perception that the American middle class was an exclusively white experience.

He became better-known after his photographs were included in an album titled Types of American Negroes that was compiled by W. E. B. Du Bois for the Exposition Universelle of 1900 in Paris. Du Bois won a gold medal for his role as “collaborator” and “compiler” of materials for the exhibit.

W.E.B. (William Edward Burghardt) Du Bois, 1868-1963 (LOC)

The Library of Congress has determined that Du Bois specifically commissioned Askew to take photographs of Black middle-class people for this exhibition. Askew’s wife Mary was a seamstress, and clothing and accessories played an important part in these photos. As the LoC states in its 2003 book A Small Nation of People: W.E.B. Du Bois and African American Portraits of Progress,

When we look at the photographs Askew took of these people, we can see a tension in the well-dressed students and residents of Georgia. The style of dress worn by the subjects reveals the status of the sitters, either real or hoped for; we see them today as class-conscious Blacks.

Of the albums Du Bois curated with Thomas J. Calloway, four were photographic, and included this image (photographer unknown) of a baseball team from Morris Brown College which had been founded less than two decades previously by the African Methodist Episcopal Church.

[African American baseball players from Morris Brown College, with boy and another man standing at door, Atlanta, Georgia] (LOC)

Askew had nine children. Five of them (plus one neighbor) are in this photograph, which was also sent to the Paris Exposition.

Celebrating World Photography Day! (LOC)

From a five-second peek at a photograph in a television show, we can look closer and learn more about the history of the United States, and even of photography itself.

 

Data Lifeboat Update 2a: Deeper research into the challenge of archiving social media objects

By Jenn Phillips-Bacher

For all of us at Flickr Foundation, the idea of Flickr as an archive in waiting inspires our core purpose. We believe the billions of photos that have amassed on Flickr in the last 20 years have potential to be the material of future historical research. With so much of our everyday lives being captured digitally and posted to public platforms, we – both the Flickr Foundation and the wider cultural heritage community – have begun figuring out how to proactively gather, make available, and preserve digital images and their metadata for the long term.

In this blog post, I’m setting my sights beyond technology to consider the institutional and social aspects that enable the collection of digital photography from online platforms.

It’s made of people

Our Data Lifeboat project is now underway. Its goal is to build a mechanism to make it possible to assemble and decentralize slivers of Flickr photos for potential future users. (You can read project update 1 and project update 2 for the background.) The outcome of the first project phase will be one or more prototypes we will show to our Flickr Commons partners for feedback. We’re already looking ahead to the second phase, where we will work with cultural heritage institutions within the wider Flickr Commons network to make sure that anything we put into production best suits their real-world needs.

We’ve been considering multiple possible use cases for creating, and importantly, docking a Data Lifeboat in a safe place. The two primary institutional use cases we see are:

  1. Cultural heritage institutions want to proactively collect born digital photography on topics relevant to their collections
  2. In an emergency situation, cultural heritage institutions (and maybe other Flickr members) want to save what they can from a sinking online platform – either photos they’ve uploaded or generously saving whatever they can. (And let me be clear: Flickr.com is thriving! But it’s better to design for a worst-case scenario than to find ourselves scrambling for a solution with no time to spare.)

We are working towards our Flickr Commons members (and other interested institutions) being able to accept Data Lifeboats as archival materials. For this to succeed, “dock” institutions will need to:

  • Be able to use it, and have the technology to accept it
  • Already have a view on collecting born digital photography, and ideally this type of media is included in their collection development strategy. (This is probably more important.)

This isn’t just a technology problem. It’s a problem made of everything else the technology is made of: people who work in cultural heritage institutions, their policies, organizational strategies, legal obligations, funding, commitment to maintenance, the willing consent of people who post their photos to online platforms and lots more.

To preserve born digital photos from the web requires the enthusiastic backing of institutions—which are fundamentally social creatures—to do what they’re designed to do, which is to save and ensure access to the raw material of future research.

Collecting social photography

I’ve been doing some background research to inform the early stages of Data Lifeboat development. I came across the 2020 Collecting Social Photography (CoSoPho) research project, which set out to understand how photography is used in social media in order to be able to develop methods for collection and transmission to future generations. Their report, ‘Connect to Collect: approaches to collecting social digital photography in museums and archives’, is freely available as a PDF.

The project collaborators were:

  • The Nordic Museum / Nordiska Museet
  • Stockholm County Museum / Stockholms Läns Museum
  • Aalborg City Archives / Aalborg Stadsarkiv
  • The Finnish Museum of Photography / Finland’s Fotografiska Museum
  • Department of Social Anthropology, Stockholm University

The CoSoPho project was a response to the current state of digital social photography and its collection/acquisition – or lack thereof – by museums and archives.

Implicit to the team’s research is that digital photography from online platforms is worth collecting. Three big questions were centered in their research:

  1. How can data collection policies and practices be adapted to create relevant and accessible collections of social digital photography?
  2. How can digital archives, collection databases and interfaces be relevantly adapted – considering the character of the social digital photograph and digital context – to serve different stakeholders and end users?
  3. How can museums and archives change their role when collecting and disseminating, to increase user influence in the whole life circle of the vernacular photographic cultural heritage?

There’s a lot in this report that is relevant to the Data Lifeboat project. The team’s research focussed on ‘digital social photography’, taken to mean any born digital photos that are taken for the purpose of sharing on social media. It interrogates Flickr alongside Snapchat, Facebook, Instagram, as well as region-specific social media sites like IRC-Galleria (a very early 2000s Finnish social media platform).

I would consider Flickr a bit different to the other apps mentioned, because the report’s definition of ‘digital social photography’ doesn’t address Flickr-specific use cases such as:

  • Showcasing photography as craft
  • Using Flickr as a public photo repository or image library where photos can be downloaded and re-used outside of Flickr, unlike walled garden apps like Instagram or Snapchat.

The ‘massification’ of images

The CoSoPho project highlighted the challenges of collecting digital photos of today while simultaneously digitizing analog images from the past, the latter of which cultural heritage institutions have been actively doing for many years. Anna Dahlgren describes this as a “‘massification’ of images online”. The complexities of digital social photos, with their continually changing and growing dynamic connections, combined with the unstoppable growth of social platforms, pose certain challenges for libraries, archives and museums to collect and preserve.

To collect digital photos requires a concerted effort to change the paradigm:

  • from static accumulation to dynamic connection
  • from hierarchical files to interlinked files
  • and from pre-selected quantities of documents to aggregation of unpredictably variable image and data objects.

Dahlgren argues that “…in order to collect and preserve digital cultural heritage, the infrastructure of memory institutions has to be decisively changed.”

The value of collecting and contributing

“Put bluntly, if images on Instagram, Facebook or any other open online platform should be collected by museums and archives what would the added value be? Or, put differently, if the images and texts appearing on these sites are already open and public, what is the role of the museum, or what is the added value of having the same contents and images available on a museum site?” (A. Dahlgren)

Those of us working in the cultural heritage sector can imagine many good responses to this question. At the Flickr Foundation, we look to our recent internet history and how many web platforms have been taken offline. Our digital lives are at risk of disappearing. Museums, libraries and archives have that long-term commitment to preservation. They are repositories of future knowledge, and expect to be there to provide access to it.

Cultural heritage institutions that choose to collect from social online spaces can forge a path for a multiplicity of voices within collections, moving beyond standardized metadata toward richer, more varied descriptions from the communities from which the photos are drawn. There is significant potential to collect in collaboration with the publics the institution serves. This is a great opportunity to design a more inclusive ethics of care into collections.

But what about potential contributors whose photos are being considered for collection by institutions? What values might they apply to these collections?

CoSoPho uncovered useful insights about how people participating in community-driven collecting projects considered their own contributions. Contributors wanted to be selective about which of their photos would make it into a collection; this could be for aesthetic reasons (choosing the best, most representative photos) or concerns for their own or others’ anonymity. Explicit consent to include one’s photos in a future archive was a common theme – and one which we’re thinking deeply about.

Overall, people responded positively to the idea of cultural institutions collecting digital social photos (they too can be part of history!) and also think it’s important that the community from which those photos are drawn has a say in what is collected and how it’s made available. Future user researchers at Flickr Foundation might want to explore contributor sentiment even further.

What’s this got to do with Data Lifeboats?

As an intermediary between billions of Flickr photos and cultural heritage institutions, we need to create the possibilities for long-term preservation of this rich vein of digital history. These considerations will help us to design a system that works for Flickr members and museums and archives.

Adapting collection development practices

All signs point to cultural heritage institutions needing to prepare to take on born digital items. Many are already doing this as part of their acquisition strategies, but most often this born digital material comes entangled in a larger archival collection.

If institutions aren’t ready to proactively collect born digital material from the public web, this is a risk to the longevity of this type of knowledge. And if this isn’t a problem that currently matters to institutions, how can we convince them to save Flickr photos?

As we move into the next phase of the Data Lifeboat project, we want to find out:

  • Are Flickr Commons member institutions already collecting, or considering collecting, born digital material?
  • What kinds of barriers do they face?

Enabling consent and self-determination

CoSoPho’s research surfaced the critical importance of consent, ownership and self-determination in determining how public users/contributors engage with their role in creating a new digital archive.

  • How do we address issues of consent when preserving photos that belong to creators?
  • How do we create a system that allows living contributors to have a say in what is preserved, and how it’s presented?
  • How do we design a system that enables the informed collection of a living archive?
    Is there a form of donor agreement or an opt-in to encourage this ethics of care?

Getting choosy

With 50 billion Flickr photos, not all of them visible to the public or openly licensed, we are working from the assumption that the Data Lifeboat needs to enable selective collecting.

  • Are there acquisition practices and policies within Flickr Commons institutions that can inform how we enable users to choose what goes into a Data Lifeboat?
  • What policies for protecting data subjects in collections need to be observed?
  • Are there existing paradigms for public engagement for proactive, social collecting that the Data Lifeboat technology can enable?
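
To show what “selective collecting” could look like in code, here is a deliberately simple sketch that keeps only photos that are both public and carry a license on an allow-list. The rule itself is purely illustrative, since the real policy questions above are still open. The `Candidate` record, the `select_for_lifeboat` function and the license labels are all hypothetical.

```python
from dataclasses import dataclass

# Illustrative allow-list only; which licenses qualify is an open policy question.
ALLOWED_LICENSES = {"CC BY 2.0", "CC0 1.0", "No known copyright restrictions"}


@dataclass
class Candidate:
    # Hypothetical record describing one photo being considered for a Data Lifeboat.
    photo_id: str
    is_public: bool
    license: str


def select_for_lifeboat(candidates, allowed_licenses=ALLOWED_LICENSES):
    """Keep photos that are public AND carry a license on the allow-list."""
    return [
        c for c in candidates
        if c.is_public and c.license in allowed_licenses
    ]


candidates = [
    Candidate("1", True, "CC BY 2.0"),
    Candidate("2", False, "CC BY 2.0"),           # private, so excluded
    Candidate("3", True, "All Rights Reserved"),  # not on the allow-list
]
selected_ids = [c.photo_id for c in select_for_lifeboat(candidates)]
print(selected_ids)
```

A filter like this respects the settings a member has already chosen, but it says nothing about consent, which would need its own mechanism.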

Co-designing usable software

Cultural heritage institutions have massively complex technical environments with a wide variety of collection management systems, digital asset management systems and more. This complexity often means that institutions miss out on chances to integrate community-created content into their collections.

The CoSoPho research team developed a prototype for collecting digital social photography. That work was attempting to address some of these significant tech challenges, which Flickr Foundation is already considering:

  • Individual institutions need reliable, modern software that interfaces with their internal systems; few institutions have internal engineering capacity to design, build and maintain their own custom software
  • Current collection management systems don’t have a lot of room for community-driven metadata; this information is often wedged into local data fields
  • Collection management systems lack the ability to synchronize data with social media platforms (and vice versa) if the data changes. That makes it more difficult to use third-party platforms for community description and collecting projects.

So there’s a huge opportunity for the Flickr Foundation to contribute software that works with this complexity to solve real challenges for institutions. Co-design – that is, a design process that draws on your professional expertise and institutional realities – is the way forward!

We need you!

We are working on the challenge of keeping Flickr photos visible for 100 years and we believe it’s essential that cultural heritage institutions are involved. Therefore, we want to make sure we’re building something that works for as many organizations as possible – both big and small – no matter where you are in your plans to collect born digital content from the web.

If you’re part of the Flickr Commons network already, we are planning two co-design workshops for Autumn 2024, one to be held in the US and the other likely to be in London. Keep your eyes peeled for Save-the-Date invitations, or let us know you’re interested, and we’ll be sure to keep you in the loop directly.

This work is supported by the National Endowment for the Humanities.

In “Seance of the Digital Image” I began to seek out the “ghosts” that haunt the material that machines use to make new images. In my residency with the Flickr Foundation, I’ll continue to dig into training data — particularly, the Flickr Commons collection — to see the ways it shapes AI-generated images. These will not be one to one correlations, because that’s not how these models work.

So how do these diffusion models work? How do we make an image with AI? The answer to this question is often technical: a system of diffusion, in which training images are broken down into noise and reassembled. But this answer ignores the cultural component of the generated image. Generative AI is a product of training datasets scraped from the web, and entangled in these datasets are vast troves of cultural heritage data and photographic archives. When training data-driven AI tools, we are diffusing data, but we are also diffusing visual culture. 
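
For anyone curious about the technical half of that answer, here is a minimal sketch of the “broken down into noise” step only. It assumes a linear noise schedule from 0.0001 to 0.02 over 1,000 steps, which is a common textbook choice rather than the setting of any particular model, and the names `alpha_bar` and `add_noise` are invented for this example. The reassembly step needs a trained neural network and is not shown.

```python
import math
import random


def alpha_bar(t, steps=1000, beta_start=1e-4, beta_end=0.02):
    """Share of the original image's variance that survives t noising steps."""
    product = 1.0
    for i in range(t):
        # Assumed linear schedule: a little more noise is mixed in at each step.
        beta = beta_start + (beta_end - beta_start) * i / (steps - 1)
        product *= 1.0 - beta
    return product


def add_noise(pixels, t, rng):
    """Blend pixel values with Gaussian noise, as if t noising steps had run."""
    a = alpha_bar(t)
    return [
        math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for x in pixels
    ]


rng = random.Random(0)       # seeded only so the example is repeatable
pixels = [0.2, -0.5, 0.9]    # a made-up three-pixel "image"
slightly_noisy = add_noise(pixels, 10, rng)
mostly_noise = add_noise(pixels, 1000, rng)
print(slightly_noisy, mostly_noise)
```

By the final step almost nothing of the original pixel values remains, which is the sense in which a training image is diffused.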

 

Eryk Salvaggio: Flowers Blooming Backward Into Noise (2023) from ARRG! on Vimeo.

 

In my research, I have developed a methodology for “reading” AI-generated images as the products of these datasets, as a way of interrogating the biases that underwrite them. Since then, I have taken an interest in this way of reading for understanding the lineage, or genealogy, of generated images: what stew do these images make with our archives? Where does a model learn the concept of what represents a person, or a tree, or even an archive? Again, we know the technical answer. But what is the cultural answer to this question?

By looking at generated images and the prompts used to make them, we’ll build a way to map their lineages: the history that shapes and defines key concepts and words for image models. My hope is that this endeavor shows us new ways of looking at generated images, and surfaces new stories about what such images mean.

As the tech industry continues building new infrastructures on this training data, our window of opportunity for deciding what we give away to these machines is closing, and understanding what is in those datasets is difficult, if not impossible. Much of the training data is proprietary, or has been taken offline. While we cannot map generated images to their true training data, massive online archives like Flickr give us insight into what they might be. Through my work with the Flickr Foundation, I’ll look at the images from institutions and users to think about what these images mean in this generated era. 

In this sense, I will interrogate what haunts a generated image, but also what haunts the original archives: what stories do we tell, and which do we lose? I hope to reverse the generated image in a meaningful way: to break the resulting image apart, tackling correlations between the datasets that train them, the archives that built those datasets, and the images that emerge from those entanglements.

Data Lifeboat Update 2: More questions than answers

By Ewa Spohn

Thanks to the Digital Humanities Advancement Grant we were awarded by the National Endowment for the Humanities, our Data Lifeboat project (which is part of the Content Mobility Program) is now well and truly underway. The Data Lifeboat is our response to the challenge of archiving the 50 billion or so images currently on Flickr, should the service go down. It’s simply too big to archive as a whole, and we think that these shared histories should be available for the long term, so we’re exploring a decentralized approach. Find out more about the context for this work in our first blog post.

So, after our kick-off last month, we were left with a long list of open questions. That list became longer thanks to our first all-hands meeting that took place shortly afterwards! It grew again once we had met with the project user group – staff from the British Library, San Diego Air & Space Museum, and Congregation of Sisters of St Joseph – a small group representing the diversity of Flickr Commons members. Rather than being overwhelmed, we were buoyed by the obvious enthusiasm and encouragement across the group, all of whom agreed that this is very much an idea worth pursuing. 

As Mia Ridge from the British Library put it: “we need ephemeral collections to tell the story of now and give people who don’t currently think they have a role in preservation a different way of thinking about it”. And from Mary Grace of the Congregation of Sisters of St. Joseph in Canada: “we [the smaller institutions] don’t want to be the 3rd class passengers who drown first”.

Software sketching

We’ve begun working on the software approach to create a Data Lifeboat, focussing on the data model and assessing existing protocols we may use to help package it. Alex and George started creating some small prototypes to test how we should include metadata, and have begun exploring what “social metadata” could be like – that’s the kind of metadata that can only be created on Flickr, and is therefore a required element in any Data Lifeboat (as you’ll see from the diagram below, it’s complex). 


Feb 2024: An early sketch of a Data Lifeboat’s metadata graph structure.

Thanks to our first set of tools, Flinumeratr and Flickypedia, we have robust, reusable code for getting photos and metadata from Flickr. We’ve done some experiments with JSON, XML, and METS as possible ways to store the metadata, and started to imagine what a small viewer that would be included in each Data Lifeboat might be like. 

Complexity of long-term licensing

Alongside the technical development we have started developing our understanding of the legal issues that a Data Lifeboat is going to have to navigate to avoid unintended consequences of long-term preservation colliding with licenses set in the present. We discussed how we could build care and informed participation into the infrastructure, and what the pitfalls might be. There are fiddly questions around creating a Data Lifeboat containing photos from other Flickr members. 

  • As the image creator, would you need to be notified if one of your images has been added to a Data Lifeboat? 
  • Conversely, how would you go about removing an image from a Data Lifeboat? 
  • What happens if there’s a copyright dispute regarding images in a Data Lifeboat that is docked somewhere else? 

We discussed which aspects of other legal and licensing models might apply to Data Lifeboats, given the need to maintain stewardship and access over the long term (100 years at least!), as well as the need for the software to remain usable over this kind of time horizon. This isn’t something that the world of software has ready answers for. 

  • Could Flickr.org offer this kind of service? 
  • How would we notify future users of the conditions of the license, let alone monitor the decay of licenses in existing Data Lifeboats over this kind of timescale? 

So many standards to choose from

We had planned to do a deep dive into the various digital asset management systems used by cultural institutions, but this turned out to be a trickier subject than we thought as there are simply too many approaches, tools, and cobbled-together hacks being used in cultural institutions. Everyone seems to be struggling with this, so it’s not clear (yet) how best to approach this. If you have any ideas, let us know!

This work is supported by the National Endowment for the Humanities.


Black History Through Archival Images: Part 2

Flickr Commons’ Curated Albums

Too many images of underrepresented people and groups go unidentified in archival collections. For Black History Month in the United States we’re showcasing some of our curated collections which tell the stories of Black experiences.

State Archives of North Carolina – Charlotte Hawkins Brown

Charlotte Hawkins Brown was an educator and civil rights activist who opened the Palmer Institute for Black students in Sedalia, North Carolina, in 1902.

N_83_12_9CHBrwn-c1930-GOOD

The Palmer Institute was the only accredited rural high school (for African American or white students) in Guilford County, NC. It graduated generations of Black educators; Brown worked there herself until she retired in 1952.

N-83-12-7PalmerInst1933

The State Archives also have a set of sixty archival images of North Carolinian women from the 1800s through the 1950s.

PC2177_B1_F1_B

pc2154_V9_P90

Other notable collections include this set of photographs of Black soldiers from North Carolina who fought in World War I and a collection of Raleigh’s lost African American architectural landmarks (as well as some that are still around).

N_2009_4_162 371st Infantry Band 1917

N_53_17_119 Shaw Hall

 

San Diego Air and Space Museum Archives – African Americans in Aviation

From the Tuskegee Airmen to Mae Jemison, the San Diego Air and Space Museum Archives collects photographs and other ephemera, some of it from personal scrapbooks, documenting Black people working in aviation and aerospace.

Tuskegee

Benjamin Davis, in particular, had a long military career; he was advanced to the rank of four-star general in 1998.

Ben O Davis and P-51

Leroy Criss, another of the Tuskegee Airmen, kept a scrapbook that is the source of many of these images.

Criss 050-1

Mae Jemison

Willa Brown was the first Black woman to earn a pilot’s license in the United States.

 

Willa Brown

While we’re on the subject of space, NASA has also created a collection of Black astronauts and other people who worked in aerospace.

Winston Scott during EVA

Col. Frederick D. Gregory

 

National Library of Medicine – African American Medical Practitioners

The NLM has curated a collection of Black workers, mostly women, in the Public Health Service for their History of Medicine division.

Nurses standing with bicycles

Teeth cleaning

Improvised clinic

Mennonite Church USA – Camp Ebenezer Photographs, 1947-1950

Tillie Yoder Nauraine founded an early “fresh air” camp in Ohio for poor Black children from Chicago. This was part of the Mennonite movement towards “building an interracial church in a segregated society.” Yoder opened the camp out of her conviction that “all people are equal in God’s eyes.”

 

Camp Ebenezer:  Boys Playing Baseball

Camp Ebenezer:  The First Ebenezer Campers

Camp Ebenezer: African American Children on Teeter-Totters

Kheel Center for Labor-Management Documentation Cornell University – Civil Rights

The International Ladies Garment Workers Union actively worked for the rights of Black workers, including picketing Woolworths and making a New York to Washington, DC prayer pilgrimage to mark the anniversary of the Supreme Court decision that segregated schools are unconstitutional.

People picket against the Woolworth Company's practice of segregation, April 20, 1963.

Prayer pilgrimage attendees holding an ILGWU sign in front of their bus

The Kheel Center also has documentation of the Southern Tenant Farmers Union, an integrated union which held meetings in Parkin, Arkansas, in 1937.

Smiling STFU members at an outdoor meeting

Image verso: "An early union meeting." Black and White STFU members including Myrtle Lawrence and Ben Lawrence, listen to Norman Thomas speak outside Parkin, Arkansas on September 12, 1937. One man carries an enamel pot and drinking glass.

Large group sharing a meal at outdoor banquet tables during an STFU meeting

Black men listening to a speaker at an outdoor STFU meeting

If you’d like to see more archival photography (or other material) about Black history and culture, the Schomburg Center for Research in Black Culture, Photographs and Prints Division at New York Public Library owns over 300,000 images, thousands of which are online and over a thousand of which are in the public domain.

Or, if you’re interested in modern Black photographers, read this GQ article in which twenty-five Black photographers discuss what drives their work, this Guardian article showcasing the best photography by Black female photographers, or this blog post at Flickr.com spotlighting the work of photographer Ayesha Kazim.

 

Introducing Flickypedia, our first tool

Building a new bridge between Flickr and Wikimedia Commons

For the past four months, we’ve been working with the Culture & Heritage team at the Wikimedia Foundation — the non-profit that operates Wikipedia, Wikimedia Commons, and other Wikimedia free knowledge projects — to build Flickypedia, a new tool for bridging the gap between photos on Flickr and files on Wikimedia Commons. Wikimedia Commons is a free-to-use library of illustrations, photos, drawings, videos, and music. By contributing their photos to Wikimedia Commons, Flickr photographers help to illustrate Wikipedia, a free, collaborative encyclopedia written in over 300 languages. More than 1.7 billion unique devices visit Wikimedia projects every month.

We demoed the initial version at GLAM Wiki 2023 in Uruguay, and now that we’ve incorporated some useful feedback from the Wikimedia community, we’re ready to launch it. Flickypedia is now available at https://www.flickr.org/tools/flickypedia/, and we’re really pleased with the result. Our goal was to create higher quality records on Wikimedia Commons, with better connected data and descriptive information, and to make it easier for Flickr photographers to see how their photos are being used.

This project has achieved our original goals – and a couple of new ones we discovered along the way.

So what is Flickypedia?

An easy way to copy photos from Flickr to Wikimedia Commons

The original vision of Flickypedia was a new tool for copying photos from Flickr to Wikimedia Commons, a re-envisioning of the popular Flickr2Commons tool, which copied around 5.4M photos.

This new upload tool is what we built first, leveraging ideas from Flinumeratr, a toy we built for exploring Flickr photos. You start by entering a Flickr URL:

And then Flickypedia will find all photos at that URL, and show you the ones which are suitable for copying to Wikimedia Commons. You can choose which photos you want to upload:

Then you enter a title, a short description, and any categories you want to add to the photo(s):

Then you click “Upload”, and the photo(s) are copied to Wikimedia Commons. Once it’s done, you can leave a comment on the original Flickr photo, so the photographer can see the photo in its new home:

As well as the title and caption written by the uploader, we automatically populate a series of machine-readable metadata fields (“Structured Data on Commons” or “SDC”) based on the Flickr information – the original photographer, date taken, a link to the original, and so on. You can see the exact list of fields in our data modeling document. This should make it easier for Commons users to find the photos they need, and maintain the link to the original photo on Flickr.

This flow has a little more friction than some other Flickr uploading tools, which is by design. We want to enable high-quality descriptions and metadata for carefully selected photos; not just bulk copying for the sake of copying. Our goal is to get high quality photos on Wikimedia Commons, with rich metadata which enables them to be discovered and used – and that’s what Flickypedia enables.

Reducing risk and responsible licensing

Flickr photographers can choose from a variety of licenses, and only some of them can be used on Wikimedia Commons: CC0, Public Domain, CC BY and CC BY-SA. If it’s any other license, the photo shouldn’t be on Wikimedia Commons, according to its licensing policy.

As we were building the Flickypedia uploader, we took the opportunity to emphasize the need for responsible licensing – when you select your photographs, it checks the licenses, and doesn’t allow you to copy anything that doesn’t have a Commons-compatible license:

This helps to reduce risk for everyone involved with Flickr and Wikimedia Commons.
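As a rough illustration of that check, here is a minimal sketch of filtering photos by license before upload. The license identifiers (`cc0`, `public-domain`, `cc-by`, `cc-by-sa`), the `license` key, and both function names are assumptions made up for this example; they are not taken from Flickypedia's code or the Flickr API.

```python
# Hypothetical identifiers for the four licenses named above; the real
# Flickr API and Flickypedia may label licenses differently.
COMMONS_COMPATIBLE = {"cc0", "public-domain", "cc-by", "cc-by-sa"}


def is_commons_compatible(license_id: str) -> bool:
    """Return True if the license is one of the four listed in the text."""
    return license_id.strip().lower() in COMMONS_COMPATIBLE


def split_by_license(photos):
    """Partition photos (dicts with a 'license' key) into allowed and blocked."""
    allowed, blocked = [], []
    for photo in photos:
        if is_commons_compatible(photo["license"]):
            allowed.append(photo)
        else:
            blocked.append(photo)
    return allowed, blocked
```

Keeping the compatible licenses in an explicit allow-list means any license the code doesn't recognise is blocked by default, which is the safer failure mode for this kind of check.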

Better duplicate detection

When we looked at the feedback on existing Flickr upload tools, there was one bit of overwhelming feedback: people want better duplicate detection. There are already over 11 million Flickr photos on Wikimedia Commons, and if a photo has already been copied, it doesn’t need to be copied again.

Wikimedia Commons already has some duplicate detection. It’ll spot if you upload a byte-for-byte identical file, but it can’t detect duplicates if the photo has been subtly altered – say, converted to a different file format, or a small border cropped out.

It turns out that there’s no easy way to find out if a given Flickr photo is in Wikimedia Commons. Although most Flickr upload tools will embed that metadata somewhere, they’re not consistent about it. We found at least four ways to spot possible duplicates:

  • You could look for a Flickr URL in the structured data (the machine-readable metadata)
  • You could look for a Flickr URL in the Wikitext (the human-readable description)
  • You could look for a Flickr ID in the filename
  • Or Flickypedia could know that it had already uploaded the photo

And even looking for matching Flickr URLs can be difficult, because there are so many forms of Flickr URLs – here are just some of the varieties of Flickr URLs we found in the existing Wikimedia Commons data:

(And this is without some of the smaller variations, like trailing slashes and http/https.)
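To give a feel for the problem, here is a minimal sketch of pulling a numeric photo ID out of two common URL shapes: a photo page and a static image file. It is not the parser we built, and it covers far fewer forms; the function name, the regular expressions, and the assumption that the "secret" part of an image file name is hexadecimal are all ours for this example.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Photo page: /photos/<user>/<numeric id>, optionally followed by more path.
PHOTO_PAGE = re.compile(r"^/photos/[^/]+/(\d+)(?:/.*)?$")
# Static image file: .../<numeric id>_<secret>[_<size>].<ext>
# (assumes the secret is hexadecimal; that is a guess for this sketch)
STATIC_FILE = re.compile(r"/(\d+)_[0-9a-f]+(?:_[a-z0-9]+)?\.(?:jpg|png|gif)$")


def flickr_photo_id(url: str) -> Optional[str]:
    """Return the numeric photo ID found in a Flickr URL, or None."""
    if "://" not in url:
        # Tolerate URLs written without a scheme, e.g. "www.flickr.com/...".
        url = "https://" + url
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host in {"flickr.com", "www.flickr.com"}:
        match = PHOTO_PAGE.match(parsed.path)
    elif host.endswith(".staticflickr.com"):
        match = STATIC_FILE.search(parsed.path)
    else:
        match = None
    return match.group(1) if match else None
```

Even this small version has to deal with missing schemes, two host families, and trailing path segments, and it still ignores short links and older URL styles.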

We’d already built a Flickr URL parser as part of Flinumeratr, so we were able to write code to recognise these URLs – but it’s a fairly complex component, and that only benefits Flickypedia. We wanted to make it easier for everyone.

So we did!

We proposed a new Flickr Photo ID property, and it was accepted. This is a new field in the machine-readable structured data, which can contain the numeric ID. This is a clean, unambiguous pointer to the original photo, and dramatically simplifies the process of looking for existing Flickr photos.

When Flickypedia uploads a new photo to Wikimedia Commons, it adds this new property. This should make it easier for other tools to find Flickr photos uploaded with Flickypedia, and skip re-uploading them.
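Once every record carries a plain numeric ID, duplicate detection can become an exact lookup. The sketch below assumes Commons files are represented as dictionaries with a hypothetical `flickr_photo_id` key and a `title`; that shape, and both function names, are illustrative rather than how Wikimedia Commons or Flickypedia actually expose the data.

```python
def index_by_flickr_id(commons_files):
    """Map Flickr photo ID -> Commons file title, skipping files without an ID."""
    return {
        f["flickr_photo_id"]: f["title"]
        for f in commons_files
        if f.get("flickr_photo_id")
    }


def plan_upload(candidate_ids, index):
    """Split candidate Flickr photo IDs into (new, already_on_commons)."""
    new = [pid for pid in candidate_ids if pid not in index]
    existing = {pid: index[pid] for pid in candidate_ids if pid in index}
    return new, existing
```

Compared with matching URLs, there is nothing to normalise here: two records refer to the same Flickr photo exactly when their IDs are equal.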

Backfillr Bot: Making Flickr metadata better for all Flickr photos on Commons

That’s great for new photos uploaded with Flickypedia – but what about photos uploaded with other tools, tools that don’t use this field? What about the 10M+ Flickr photos already on Wikimedia Commons? How do we find them?

To fix this problem, we created a new Wikimedia Commons bot: Flickypedia Backfillr Bot. It goes back and fills in structured data on Flickr photos on Commons, including the Flickr Photo ID property. It uses our URL parser to identify all the different forms of Flickr URLs.

This bot is still at a preliminary stage, waiting for approval from the Wikimedia Commons community. Once approval is granted, we’ll be able to improve the metadata for every Flickr photo on Wikimedia Commons. It will also create a hook that other tools can use – either to fill in more metadata, or to search for Flickr photos.

Sydney Harbour Bridge, from the Museums of History New South Wales. No known copyright restrictions.

Flickypedia started as a tool for copying photos from Flickr to Wikimedia Commons. From the very start, we had ideas about creating stronger links between the two – the “say thanks” feature, where uploaders could leave a comment for the original Flickr photographer – but that was only for new photos.

Along the way, we realized we could build a proper two-way bridge, and strengthen the connection between all Flickr photos on Wikimedia Commons, not just those uploaded with Flickypedia.

We think this ability to follow a photo around the web is really important – to see where it’s come from, and to see where it’s going. A Flickr photo isn’t just an image, it comes with a social context and history, and being uploaded to Wikimedia Commons is the next step in its journey. You can’t separate an image from its context.

As we start to focus on Data Lifeboat, we’ll spend even more time looking at how to preserve the history of a photo – and Flickypedia has given us plenty to think about.

If you want to use Flickypedia to upload some photos to Wikimedia Commons, visit www.flickr.org/tools/flickypedia.

If you want to look at the source code, go to github.com/Flickr-Foundation/flickypedia.
