Early View
RESEARCH ARTICLE
Open Access

Structural elements and spheres of expertise: Creating a healthy ecosystem for cultural data initiatives

Lisa M. Given

Social Change Enabling Impact Platform, RMIT University, Melbourne, Victoria, Australia

Sarah Polkinghorne

Corresponding Author

Social Change Enabling Impact Platform, RMIT University, Melbourne, Victoria, Australia

Correspondence

Sarah Polkinghorne, Social Change Enabling Impact Platform, RMIT University, 360 Swanston Street, Melbourne, Vic 3000, Australia.

Email: [email protected]

Joann Cattlin

Social Change Enabling Impact Platform, RMIT University, Melbourne, Victoria, Australia

First published: 08 November 2023

Abstract

While technology affords creation of digital collections, and promises access to all, the reality is that many cultural data collections exist in a precarious ecosystem, where erratic funding, fragmented support, and disconnected expertise threaten their continued existence. As a significant branch of the broader information ecosystem, cultural data collections range in size and scope, from national institutions to bespoke local collections supported by individuals. This exploratory, qualitative study engaged cultural data experts in Australia, Canada, and the United Kingdom to map the broad cultural data ecosystem and to identify opportunities for healthier growth. The development and maintenance of cultural data collections requires integration across the spheres of expertise of creators, curators, subject matter experts, information science, and computing and technology. The foundational structural elements of the ecosystem include funding, policies, access to existing data, community context, and technological infrastructure. The key elements of a healthy data ecosystem are clarity of purpose, user-focused design, sustainability, allied coproduction, and reciprocal interconnection. A healthier cultural data ecosystem means more collections and initiatives will have positive impacts for research, knowledge, and diverse communities, contributing positively to the broader information ecosystem and to society, at large.

1 INTRODUCTION

Cultural data initiatives preserve and provide access to the historic artifacts of our communities. While national libraries, galleries, museums, and archives gather historically significant works that tell the stories of a country's broad, collective experience, individuals and small groups (such as regional historical societies) collect historical items documenting rich, localized activities. These organizations, along with the many creators and curators of cultural data artifacts, comprise a complex ecosystem that has been transformed significantly through digitization and the use of online platforms. This is what Burkey (2022) refers to as a “new memory ecosystem, whereby heritage communities are invited to contribute, participate with, and share more of what they are interested in collectively remembering, rather than simply accepting the authoritative narratives of heritage institutions” (p. 185).

Globally, the cultural data ecosystem produces, manages, preserves, and facilitates interaction with collections that are extremely diverse in the materials they include, across myriad topics, for varied audiences. The development and maintenance of these collections relies on expertise from across various academic and practice disciplines, including information science, humanities, computing, social sciences, and cultural industries, among others. The diversity in data scope, creator type, material format, longevity of retention, and availability of access points (to name a few examples) makes the cultural data ecosystem one of the most complex and rich segments of the broader information ecosystem incorporating human-generated knowledge artifacts. Terras et al. (2021) describe this ecosystem as “a patchwork of small to large scale content, held in different locations, formats and under different reuse licenses, with different institutional approaches to risk, public engagement and entrepreneurship” (p. 11).

Thus, while cultural data initiatives are growing in number, globally, they lack a cohesive, sustainable, and healthy ecosystem to enable collaboration and sharing across related contexts. The results of this exploratory, qualitative study demonstrate that the lack of a healthy ecosystem for cultural data initiatives leads to fragmented, ineffective approaches, where collections have significant design limitations, are not sustainable, and lack clarity of purpose. The potential for cultural data to be irrecoverably lost is very real due to limited funding, insufficient infrastructure, and a precarious workforce that lacks the integrated, sustained, interdisciplinary approach needed for collections' long-term viability.

This paper examines cultural data initiatives and practices as critical components of the broader information ecosystem; it identifies their interrelated parts and articulates the features that constitute a healthy cultural data ecosystem that, in turn, benefits society. The research draws on Nardi and O'Day's (1999) concept of a “complex system of parts and relationships” (that is, an ecosystem) characterized by the presence of multiple essential elements and qualities (p. 50) and “marked by strong interrelationships and dependencies among its different parts” (p. 51). These parts are diverse and include “keystone species” necessary for the survival of the ecosystem, which often coevolve for the benefit of the whole (p. 52). An important example of a keystone species in information ecosystems is “people who build bridges across institutional boundaries and translate across disciplines” (p. 54). Cultural data initiatives are well-suited to ecological analysis because their character as ecosystems is highly evident, relying on diverse experts committed to building bridges and coevolving as technologies and other circumstances change.

The cultural data ecosystem is a distinctive branch of the larger global information ecosystem. It includes collections managed with standardized protocols within galleries, libraries, archives, and museums, and those curated by universities, cultural organizations, businesses, special interest groups, and individuals, such as performance companies, music collectors, artists, and local history associations. The goals of these collections vary across preservation, public access, research, education, and commercial purposes. Interactive technologies that enable public interaction and annotation of collections have also contributed to this “new ecosystem of commemorative practices and collective remembering” (Burkey, 2022, p. 186).

While digitization facilitates increased access and reuse of cultural data (Terras, 2015), it also creates challenges for reuse and interoperability, and “a complex, interleaving, network of issues regarding training and upskilling, licensing and copyright, access to computational resources, access to data and consideration of the place of technological development within the cultural and creative sectors, and how this sits alongside existing or inherited activities, resources and cultural policy” (Terras et al., 2021, p. 11). While there are moves to address the barriers to improved interoperability within libraries and archives, this requires significant investment and professional expertise that is not reflected across the other organizations creating cultural data collections (Hawkins, 2022; Zhang, 2022).

While assessing and addressing the health of the overarching information ecosystem is critical, it is also significantly challenging to analyze such an immense ecosystem, across data types, formats, and intention of design. Nardi and O'Day (1999) argue that examining “the biggest picture possible” can be difficult, and even pessimism-inducing, because macrolevel processes can “seem impenetrable” (p. 57). They argue that examining a specific, locally rooted, instance can provide a “viable point of intervention in a larger system” (p. 57). Examining the health of the cultural data ecosystem provides a critical window into the interrelated workings of the individuals, groups, and data sources that comprise this significant branch of the broader information ecosystem.

Cultural datasets include information that is among the richest, most diverse, and the most ephemeral, in existence. Examples discussed in this paper include recorded performances of poetry readings and circus events, national literatures, government debates, videogame code, and resources supporting linguistic and cultural resurgence. These collections preserve and provide public access to artifacts of the arts, heritage, and culture, which “help shape reflective individuals, produce engaged citizens, impact cities and urban life, improve health and well-being, and have distinctive economic benefits” (Terras et al., 2021, p. 2). Cultural data encompass artifacts that reflect and represent what it means to be human. This is what sets these collections apart from other types of data collections. As such, there is an urgent need, and great potential, for the cultural data ecosystem to reflect the diversity of human experience, expression, and creativity, beyond the Western canon and the formats included in many library collections.

A healthy cultural data ecosystem will strengthen the broader information ecosystem by democratizing access to cultural knowledge and ensuring appropriate standards are applied by content creators and curators. In this way, the information ecosystem benefits from heightened resilience against systems of misinformation and disinformation, as abundant, interoperable, and accessible cultural data strengthen our understandings and analyses of history and society. One well-known example of the power of such a cultural data approach is highlighted by Verwayen et al. (2011), who describe how the online proliferation of fake and low-quality images of Vermeer's painting The Milkmaid led the Rijksmuseum to release their cultural data, openly. The museum discovered that there were more than ten thousand fake images of the painting circulating online, causing a situation where “people simply didn't believe the postcards in our museum shop were showing the original painting [and this] was the trigger for us to put high-resolution images of the original work with open metadata on the web ourselves. Opening up our data is our best defence against [fakes that mislead the public]” (2011, p. 2).

A healthy cultural data ecosystem is also more capable of responding to engagements with systems of power, advantage, and disadvantage, as diverse cultural data support the correction of biased historical narratives. As Burkey (2022) notes, “heritage communities can utilize digital heritage initiatives as a nexus of information, where more voices can be brought together in providing a richer set of perspectives and a multitude of conversations instead of a particular narrative” (p. 196). This echoes Montenegro's (2019) critiques of the universalization of knowledge representation where, for example, “metadata and analyses [of traditional Indigenous knowledge are] generated by professionals and authorities… external to those communities, resulting in… incorrect information about Indigenous people's histories and realities” (p. 74). Indeed, Burkey (2022) explains that there is now a “consensus [that] a wider variety of channels equates to more voices in the conversation and at least the potential for increased involvement, broader interpretations, and more democratized versions of remembering through digital heritage initiatives” (p. 192). This makes visible the untold stories of marginalized peoples who would otherwise be absent from historical records. The research results presented here demonstrate that with sufficient, consistent care and appropriate resources, the social dynamics that challenge the sustainability of cultural data initiatives can be overcome. This, in turn, enables the cultural data ecosystem to contribute to the betterment of society.

2 GLOBAL CHALLENGES IN SUSTAINABILITY OF CULTURAL DATA PRACTICES

The gathering and management of cultural data is dispersed across government agencies, libraries, museums, galleries, universities, and independent collections held by cultural organizations or individuals. This work occurs within a broad spectrum of organizational contexts and available infrastructure and resources, ranging from large-scale, formalized collections with significant, long-term investment (e.g., mandated depository collections in national libraries), to small, bespoke collections developed by individuals with limited resources (e.g., an individual researcher gathering cultural materials over the span of their career). While countries have different approaches for providing funding and support opportunities for cultural data initiatives, there are many similarities, particularly regarding long-term sustainability and accessibility challenges.

In Australia, for example, many initiatives are undertaken by the Australian Research Data Commons (ARDC), which provides high-capacity data storage, hosting for many discipline-based collections, and project funding (https://ardc.edu.au/). Research Data Australia (https://researchdata.edu.au/), an ARDC data discovery service, provides access to data collections held by over 100 organizations. Yet, many initiatives focus on large consortia, not small-scale collections. At the individual level, researchers and small organizations rely on discrete, time-limited and highly competitive funding, influenced by government priorities. For example, the Australian Research Council's (ARC's) Linkage Infrastructure, Equipment and Facilities (LIEF) grants enable university researchers to form cooperative data partnerships (see https://www.arc.gov.au/funding-research/funding-schemes/linkage-program/linkage-infrastructure-equipment-and-facilities).

Canada also provides competitive funding to support cultural data initiatives, including the Social Sciences and Humanities Research Council (SSHRC) (https://www.sshrc-crsh.gc.ca/home-accueil-eng.aspx) and the Canada Council for the Arts (https://canadacouncil.ca/). SSHRC's Research Data Management Capacity Building Initiative, for example, supports development of skills and adoption of data management tools (see https://www.sshrc-crsh.gc.ca/funding-financement/programs-programmes/data_management-gestion_des_donnees-eng.aspx). The National Heritage Digitization Strategy (https://ccdh-cnpc.ca/) is a 10-year project (from 2016) supporting collaboration among Canada's memory institutions to preserve and provide access to heritage materials. Library and Archives Canada funds the Documentary Heritage Communities Program (https://library-archives.canada.ca/eng/services/funding-programs/dhcp/pages/dhcp.aspx), enabling community organizations (e.g., historical societies) to increase access to collections and foster preservation.

The United Kingdom (UK) provides similar, incremental funding schemes, with similar challenges for long-term sustainability (Wright & Gray, 2022). Interdisciplinary data initiatives are supported by the UK Research and Innovation's Arts and Humanities Research Council (AHRC) (https://www.ukri.org/councils/ahrc/), and international organizations such as the European Research Infrastructure Consortium's Digital Research Infrastructure for the Arts and Humanities (DARIAH) (https://www.dariah.eu/). However, the time-limited nature of these schemes, and their focus on projects instead of programmatic funding, do not provide sustainable infrastructure and skills development. Yet, several national initiatives do focus on longer-term strategies: the Creative Industries Clusters Programme (https://creativeindustriesclusters.com/), driving innovation, commercialization, and skills development; Towards a National Collection (https://www.nationalcollection.org.uk/), a project designed to remove collection boundaries and enable accessibility; and the Museum Data Service (https://artuk.org/about/museum-data-service), a partnership between Art UK, the Collections Trust, and University of Leicester to create a data repository for national museums and public collections.

3 THE NEED FOR A HEALTHY CULTURAL DATA ECOSYSTEM

Where initiatives rely on short-term funding, and small research teams, the fragility of the cultural data ecosystem is particularly evident. A healthy cultural data ecosystem relies on a mature funding and support model, where maintenance and expansion are sustained. The current approach, globally, means many artifacts are not discoverable by users, collections are developed in isolation and lack interoperability, and all phases of collecting work are managed with a precarious workforce. These aspects of the current cultural data ecosystem raise significant questions for long-term viability and sustainability.

While these challenges are not new, they are becoming more pressing. Cultural heritage institutions have been digitizing to preserve materials and increase access since the 1970s (Terras et al., 2021). Yet, a recent audit of digital collections found significant variation in the structure and organization of metadata, limited resources, and outdated systems (Gosling et al., 2022). Growing emphasis on commercialization of large datasets raises concerns about conflicts with the fundamental principles of open access, and lack of sectoral capabilities for collections work. The Alan Turing Institute (McGillivray et al., 2020) found the digital humanities discipline to be at a critical turning point, requiring reassessment of interdisciplinary collaboration, skills, and infrastructure to realize technological developments (p. 8). Their recommendations include:
  1. A reconsideration of methodologies and developing a shared understanding across the technical and humanities/arts disciplines.
  2. Technical infrastructure that is not project dependent and time limited to avoid fragmentation.
  3. Research funding to support and recognize the conditions needed for interdisciplinary research including cross-council schemes, recognition of hybrid roles and institutional resources to enable researchers to bridge disciplinary gaps.
  4. Training for humanities researchers and professionals through degree programs and short courses.

Several reviews by government agencies and disciplinary peak bodies identify the need for national policies and resourcing to support consistency and interoperability (e.g., Academy of the Social Sciences in Australia, 2022; Tindall & Duncan, 2020). These reports highlight the risks to cultural data preservation due to lack of consistent policy, inadequate resourcing, and lack of skills, globally. As Terras et al. (2021) note, the “legacy of 30 years of investment in cultural heritage digitization is a patchwork of small to large scale content, held in different locations, formats and under different reuse licenses, with different institutional approaches to risk, public engagement and entrepreneurship” (p. 11).

The National Library of Australia's Trove is an excellent example of the sustainability challenges faced even by large organizations mandated to collect and provide access to cultural data. Trove provides public access to more than 6 billion cultural data items from hundreds of partners, including libraries, museums, galleries, media organizations, government, and community organizations. Yet, Trove suffers from a severe lack of funding, which culminated in a financial crisis in 2022. As Jones and Verhoeven (2022) note, this is a symptom of a larger funding crisis across a sector that needs “sustainable, recurrent funding”; this speaks to the fragility of cultural data initiatives, broadly.

4 CHALLENGES WORKING ACROSS DISCIPLINES AND PRACTICE ENVIRONMENTS RELEVANT TO CULTURAL DATA

In addition to these funding and sustainability challenges, there is also a lack of integration among the experts working on these initiatives. As Edmond (2015) notes, there is historical tension between libraries and historians in the management of collections, leading to the idea that digital humanities research infrastructure is in some way separate from traditional library, archive, and museum work. The digital humanities have engaged in negotiation of disciplinary boundaries and traditions, which remain unresolved. As Edmond (2015) notes, the “fragmented definition and conceptualization of infrastructure is another result of the organic changes that form a part of the shift that has brought digital humanities into existence, with a traditional and valued paradigm facing competition based on a new conceptualization of and by users” (p. 59). Yet, Edmond (2015) also links these challenges to the current state of uncertainty for cultural data projects, noting:

Maintenance of digital resources falls between traditional roles and areas of expertise, and therefore responsibility for it remains ambiguous… to guarantee that both data and interface will be available in anything approximating perpetuity is an exceptionally difficult and expensive promise to make, and it is therefore no surprise that the long-term fate of so many digital humanities projects remains uncertain. (p. 61)

Indeed, cultural data initiatives extend well beyond established institutions (such as libraries), or disciplines (like digital humanities). These data arise from research and practice activities in various communities and disciplines. They are curated and maintained by professionals in varied settings and contexts, with design and implementation informed by expertise across the humanities, social sciences, and creative industries, as well as computing and information science. They rely on expertise in preservation, collection, access, commercialization, decolonization, representation, knowledge organization, information behavior, user experience, human–computer interaction, and software engineering, among others.

5 WHO ENGAGES IN THE CULTURAL DATA ECOSYSTEM?

Despite a shared interest in creating and maintaining cultural data, academic and practice-based entities are often siloed and singular in their focus. This means the various actors who need to be involved in the ecosystem may not be integrated with—or even aware of—other expertise required for short-term activities or long-term sustainability. Computing, humanities, and social sciences students and academics typically sit in different College or Faculty structures. Academic librarians may support similar disciplines, without enabling interdisciplinary investigations of cultural data needs. Funding agencies target initiatives to disciplines, reinforcing siloed approaches to handling cultural data.

Within this broader social ecosystem, cultural data initiatives are conceived of, built, and maintained by many different experts. A healthy cultural data ecosystem requires these various groups to be interconnected and willing to coevolve, to share similar goals, and to be supported by appropriate experts. Five spheres of expertise are relevant to this work, and the experts in all five must work together for the ecosystem to thrive: creators, curators, subject matter experts, information science experts, and computing and technology experts.

5.1 Creators

Without cultural data creators there would be no collections in the broader information ecosystem. Creators include artists, performers, writers, and other professionals who create objects, performances, recordings, images, texts, or other cultural artifacts. This category also includes “broadcasters, publishers, …, innovators, creatives, exhibition creators and, developers” (Trove Strategy, n.d.). Creators may be contemporaneous, such as coders on independent video games, or historical, such as Renaissance composers. Creators generate cultural data through their creative work, and these data are integrated into various types of informing collections. For their data to endure, creators must accumulate, retain, or share them at point of creation. Creators generally lack formal training in collection management, curation, or preservation. Rather, they may preserve their work because of its informative value, for professional or legacy connections, or due to an instinctive impulse.

5.2 Curators

Where a creator amasses cultural data over a lifetime of creative practice, a curator then shapes that cultural data into an organized, described, and now often digitized or born-digital, collection. These collections, in turn, feed into the broader information ecosystem, to represent experiences across local and global societies. This category includes all types of information professionals, including librarians, museum curators, and archivists, who are formally trained in collection, description, presentation, storage, and preservation practices. Curators navigate an “interactive relationship” between users and collections (Walter, 1996). Digital collections extend beyond the walls of information institutions and “show the potential for greater interactivity, transforming how narratives around heritage spaces and objects are constructed and interpreted” (Evans, 2016, p. 51). In the cultural data ecosystem, curators create databases, catalogues, metadata schemes, and user interfaces. They contribute expertise around system interoperability, copyright, information ethics, and record management and retention. They preserve the historical record and ensure equitable information access through education and research, providing critical content and infrastructure that intersects with the broader information ecosystem.

5.3 Subject matter experts

Subject matter specialists connect with cultural data through disciplinary expertise or practice contexts, including research and teaching. While these experts may be creators, many are not; they are often historians, linguists, musicologists, dramaturgs, or literary scholars, who may also contribute to other branches across the broader information ecosystem. These experts may not start out thinking about data, but over time, they come to think about their texts as data. This category includes experts in digital humanities, which continues to evolve as a field of practice and research (Callaway et al., 2020). It has been defined as “interdisciplinary” (Wymer, 2021) or “a field, a methodological tool kit, a discipline, a subdiscipline, and a paradiscipline” (Risam, 2021), and “an array of convergent practices” (Schnapp & Presner, 2009). Increasing integration of technology into humanities research practices has involved development of new approaches to data analysis, modeling, visualization, and conceptualization (Camlot et al., 2020; Luhmann & Burghardt, 2022; Terras et al., 2021).

5.4 Information science experts

Two information science specializations contribute significantly to cultural data initiatives, as well as to the broader information ecosystem. First, information behavior scholars bring expertise in user needs and practices. They examine how people engage with data and systems, including “information experiences, in diverse circumstances and settings, and across various personal activities and outcomes” (Given et al., 2023). They explore how people navigate information, including purposeful, question-driven information seeking, and more serendipitous, often leisure-driven, information exploration. Second, knowledge organization experts study the organization and structure of information in support of discoverability and access, with expertise in classification, indexing, abstracting, taxonomies, and metadata. Knowledge organization's contribution to the cultural data ecosystem includes resolving issues where “information is being organized ad hoc, often resulting in systems that underperform and even effectively prevent access to data, information and knowledge” (Golub & Liu, 2022, p. 17).

5.5 Computing and technology experts

Cultural data initiatives require the expertise of system designers, software engineers, and often, experts in computational analysis. These experts contribute technical expertise, rather than expertise rooted in the subject domain of a data initiative; they contribute critical skills to all branches of the information ecosystem. Their research and practices are usually focused on the computing-related organization, structure, retrieval, and management of cultural data, rather than its substance, qualitative meanings, or underlying organizational principles. Some computing and technology experts bring formal expertise in other disciplines, as well; for example, many linguists work in computational methods, having developed technical expertise in addition to subject specialization. This category includes research software engineers, who support research in the humanities and other disciplines; they often contribute to research projects by developing specific technical solutions for data collection, analysis, management, digitization, system design, and maintenance (Hettrick, 2016).

The integration of all five spheres of expertise is essential for a healthy cultural data ecosystem to deliver on intended outcomes. Yet, knowledge siloing remains widespread, with project teams often including experts from only a few of these categories. For the ecosystem to thrive, and to support long-term viability of cultural data initiatives, we need to better understand the perspectives and experiences of these groups. The results of this exploratory, empirical study identify the key elements required for a healthy cultural data ecosystem. This includes elements that can enable experts who are often trained (and work) in isolation, to collaborate more effectively in a shared commitment to allied coproduction (Figure 1).

Figure 1. A healthy cultural data ecosystem requires all experts (creators, curators, subject matter experts, information science experts, and computing and technology experts) to work together.

6 METHODS AND PARTICIPANTS

This research used qualitative, in-depth, exploratory key informant interviews to map the current state of the cultural data ecosystem, globally, and to formulate recommendations for a healthier ecosystem, in future. Using a maximum variation sampling approach (Palys, 2008; Stebbins, 2008), we interviewed nine experts in three countries (Australia, Canada, United Kingdom); participants were identified using a mix of purposive, convenience, and snowball sampling. Interviews were semistructured and ranged between ~40 and ~120 min, based on participant availability; six interviews were conducted online (via Zoom or MS Teams) and three were in-person. Participants reflected the range of roles present within the cultural data ecosystem; they reflected on their own cultural data experiences and work, and they provided views and insights on global trends (including challenges and opportunities) across the sector. As appropriate in a study of experts' experiences and views, participants were given the option to be anonymized, or to be referenced by name.

The resulting participant group includes diverse experts. Interviewees include librarians and archivists, drawn from research libraries and national archival organizations, with specializations in print and digital collections, digitization initiatives, and metadata creation and management. Researchers who came into cultural data work from humanities fields, such as history, creative writing, musicology, and literary studies, were interviewed, as well as those with digital humanities specializations. The participants included:
  • David, a professor and writer with long-term involvement in building and sustaining the Living Archive of Circus Oz, a contemporary Australian circus collective.
  • Andy, a senior leader in a national archival institution in Canada.
  • Jessie, a senior leader in a national archival institution in the United Kingdom.
  • Sean, a digital curation librarian, in the Canadian research library sector.
  • Sharon, a head librarian in a research-intensive institution in Canada, whose responsibilities focus on metadata.
  • Taylor, a research fellow affiliated with a major Australian cultural data initiative.
  • Jason, a professor of literature and director of the SpokenWeb Partnership Network, based in Canada.
  • Melanie, a historian of computing in Australia, who is closely involved in leading large-scale software preservation initiatives.
  • Daniel, a data scientist based at a large Australian university.

Interviews were transcribed, verbatim, and analyzed using an inductive approach where themes were identified as they emerged from the coding process (Fox, 2008). As this research is exploratory and inductive, its results cannot be decontextualized or isolated from interpretation, and so findings and discussion are presented together (Richardson, 2000). Due to the ecosystem perspective taken in the study design, we sensitized our analysis not only to the experiences of our participants as individuals, but also to what their accounts reveal about the health of cultural data initiatives as part of a broader, global information ecosystem.

7 FINDINGS AND DISCUSSION

Across the participant group, no two experts work with the same cultural data collections. Formats in participants' cultural data work include photographs, digitized print, born-digital texts, music recordings, nonmusical sound recordings, video recordings, software, and social media and web content. As contextualized documents, these formats represent family, community, and colonial histories; books, magazines, newspapers, letters, government records, and other text-based documents; professional and amateur music; poetry readings, spoken word, oral histories, interviews, and conversations; musical scores; films and documentaries; video games and complex media artifacts such as architectural files; and the contents of social media platforms such as Twitter. As participant Sharon, head of metadata at a large Canadian research library, points out, these documents embody “intangible cultural heritage,” because they make possible the discovery of “intangible things, as captured in tangible things.” By speaking with experts with different spheres of expertise, and who work with divergent cultural data collections, this study has identified common elements and challenges.

First, our analysis finds five predominant structural elements underpinning the cultural data ecosystem. In the following sections we outline these elements, including characteristics that lead them to function in unhealthy or healthy ways. Second, our analysis also reveals rich distinctions between the five spheres of expertise described previously, identifying considerations for assembling effective teams for the collection, preservation, and study of cultural data. Third, participants' accounts identify five core signs of health needed within the cultural data ecosystem and, by extension, individual projects. A healthy cultural data ecosystem is depicted in Figure 2, a visualization of our findings. As in any ecosystem, all elements are essential for the ongoing vitality of the system; each is discussed, in turn, in the sections that follow.

A healthy cultural data ecosystem

7.1 Structural elements

7.1.1 Funding environment

Funding, as an essential but uncertain resource, is the most significant structural element of the cultural data ecosystem. The cyclical nature of grant funding can bring projects to a premature end and cause lost progress when projects languish due to a lack of resources. For Taylor, a research fellow working on a large university-based Australian cultural data initiative, the funding status quo is marked by “bursts of investment […] There's kind of a finite end to these projects, and then you sort of have to start again.” David, a professor and writer with long-term involvement in building and sustaining the Circus Oz Living Archive, confirms Taylor's observation. The Circus Oz Living Archive is a portal showcasing and enabling research into the performance videos of Circus Oz, a prominent Australian circus company. David recounts initial investments into the planning and construction of the Living Archive, but then, as investment ceased, the Living Archive degraded (starting in 2014), until it ceased to function. Once the Archive became a cultural data emergency, urgent efforts by other researchers revived it; but it remains precarious in the absence of dedicated funding and institutional commitment. Today, David observes that “hosting and maintenance of the Living Archive itself is still very much…in doubt.”

Insufficient funding also causes instability for project staff, who often work on finite contracts. Experts in data science roles are difficult to retain as they can earn larger salaries in industry roles. Taylor observes “there can be some precarity around the people in those roles because they can get paid better [elsewhere] […] Who wants to be attached to, like, a complex, difficult, you know, challenging project on a kind of mediocre salary?” The funding cycle affects the ability of cultural data projects to build collections and systems over time. The grants funding this work are extremely competitive, and projects funded in the past are not guaranteed to be funded in the future.

Funding uncertainty also pervades cultural data collecting within national institutions. For example, Andy, a senior leader in a national archival organization, highlights the paradox of working for a collecting institution that is simultaneously influential and vulnerable to cutbacks. Andy points out that:

We also have so much authority and credit in some spaces that if we were to say, ‘we do this [metadata modernization] now,’ a lot of folks would be, like, great. In fact, they're looking to us to do a lot of that. But […] you know, the next government could come in and be like, ‘you guys get twelve dollars a year.’

Achievements that take years to build can be compromised quickly. Andy mentions the example of establishing relationships with communities historically affected by colonization; these relationships are key to knowledgeably and responsibly collecting certain cultural data, such as historical photos of Indigenous peoples whose names have not been documented. This important work is easily destabilized by government austerity measures.

One healthier side of today's uncertain funding environment is the ability of granting requirements to motivate partnerships. The SpokenWeb Partnership Network is a group of collecting institutions focused on “literary audio” such as poetry readings. Jason, SpokenWeb's director and a professor of literature in Montreal, describes coordinating a SSHRC Partnership Grant for the purposes of expanding SpokenWeb. The application process required institutions to make commitments, which was helpful. Jason observes:

You have to have a certain degree of matching funds with [partnering institutions] to put the money to digitize the collections on the table. […] Before year one even began, I felt like I had made a big win, you know, because it meant that all those institutions had committed.

In other words, SSHRC's requirements were an imperative to develop crucial partnerships and material commitments.

7.1.2 Policy environment

Cultural data collections and research exist within policy contexts, with many forms of regulation and guidelines affecting this work. Around metadata, for example, Sharon cites the current importance of the FAIR principles for digital data management (FAIR Principles, 2016), and the CARE Principles for Indigenous Data Governance (GIDA, 2018). In Canada, the Calls to Action from the Truth and Reconciliation Commission have motivated improvements in how Indigenous cultural data collections are managed and described (Truth and Reconciliation Commission of Canada, 2015).

Copyrights, ownership, and permissions are a near-ubiquitous concern in the cultural data space. Sean, a digital curation librarian at a large Canadian research library, observes that copyright concerns always factor into decisions with any digitization project: “The university's own risk tolerance does come into the conversation.” In Australia, Melanie, a computing historian and lead investigator on large software preservation initiatives, finds Australia's copyright law is an enabler rather than a hindrance. She describes studying the law when considering how to proceed with a new large-scale, LIEF grant-funded initiative involving multiple institutions across the country, in a new shared adoption of Emulation as a Service Infrastructure (EaaSI) for software preservation, sharing, and use. She recalls:

I was glancing through the act [and] thought Section 113J looked pretty promising. But I wanted to make sure, and so I sought expert IP legal advice. […] And I was right. […] It's legal for libraries and archives, as defined in the act, to make a preservation copy of content that's in their collection, and to make that available to research purposes, in the library or archive, or in another library or archive. And that's the really exciting bit.

This EaaSI project enables libraries and archives to make emulation available to their users, and to share collected, preserved software with each other's users. This is an example of beneficial alignment between a project's goals—software preservation and access—and the current copyright environment.

7.1.3 Extant data

Cultural data initiatives always hinge on the question of what data exist, affected by historical collecting decisions made by individual private collectors or staff in collecting institutions. Forms of cultural data of interest today are often unaligned with past mandates of collecting institutions, so projects must sometimes draw on varied strategies to build a collection. Melanie must continually address this challenge. In collecting 1980s and 1990s video games, Melanie's work relies on eBay vendors. She reports “finding them really is tricky. […] We have bought a lot of the games that we've targeted for acquisition. […] That's just a matter of, you know, watching and waiting and finding a good copy and spending the money.” Melanie rightly argues that acquiring software not collected by libraries or archives is an ongoing challenge. Cases frequently arise of contemporary software becoming unusable. As Melanie observes, “This is not just about the past. This is about the now, and contemporary memory.” Librarians, archivists, and others involved in collecting decisions have significant power to shape future cultural data collections, which can only be built from data artifacts that are retained.

7.1.4 Institutional or community context

All cultural data collections emerge from, and exist within, an institutional or community context. Context is fundamental to the understanding of collections as data in the first place. Context is also a core influence on the intentions and imagined purpose of cultural data collections, including which research questions are imagined for them, and how a community will benefit from supporting them. Institutional or community context also frames thinking around who may be the intended, potential, or current users of cultural data. To satisfy funder requirements, cultural data initiatives are often positioned as beneficial for the public. In fact, initiatives are more likely to have specific users, such as researchers, or a specific Indigenous community focused on linguistic and cultural reclamation.

Context also drives the research potential of a collection. As Jason, director of the SpokenWeb Partnership Network, observes, thinking about literary audio recordings as data has expanded his view of research possibilities:

Everything we were doing, even on the development side of this project was a research question, right? […] From a literary studies perspective, research only begins with the content, you know. And it'll be like, ohhh, it's with the text, but really everything in the SpokenWeb Network sort of became a research question about the management of data, like so in a lot of ways this whole project is essentially looking in different ways at how we work with, how we create, how we imagine, how we structure, and how we use data.

Sharon, who also contributes to cultural data collection management in Inuvik, within the Inuvialuit Settlement Region of northern Canada, adds that research emerging from institutional contexts need not be at odds with a collection's community purposes. She observes “the more access people have to things […] everything from a local genealogy researcher to somebody who does text mining […] Just making things available and accessible for people, I think creates opportunities there.”

7.1.5 Technological infrastructure

In the cultural data ecosystem, information and digital technologies are tools in service of goals, rather than ends unto themselves. Each collection requires in-depth consideration of a suitable approach. A collection's purpose, constraints, and user needs must be considered closely before any technical decisions are made. Sean, the digital curation librarian, often provides in-depth consultations prior to the library partnering with data-holding community groups. He explains:

A lot of my role is helping [community groups] figure out rights, ethics, access to those materials, preservation, metadata, description, and then often we're trying to see if we can help them with infrastructure. So, you know, can you? Put your materials in our repositories? […] And sometimes that works, sometimes it doesn't, and it…may lead to, you know, we've consulted, given advice, connected.

Sean's work illustrates the considerations required before asking questions of infrastructure.

The cultural data ecosystem produces new technologies in the process of meeting aspirations around preservation, access, and research into cultural data. These new technologies may be “shiny,” to use Sean's term, but they may also be more “organic,” as Jessie describes. Jessie is a senior leader, with a metadata specialization, in a national archival institution. Having done this work for some years, Jessie observes that amalgams of systems can be necessary, if complicated, even as a collecting institution simultaneously works toward transitioning to new infrastructures:

You have to kind of do workarounds and bolt this bit on and bolt that bit on […] we have this sort of organic growth of systems such that we now have multiple systems and […] everything has to be squished together and remodelled and remodelled and remodelled […] We have some, you know, quite old relational databases, fundamentally […] But relational databases is not where it's at when it comes to your data-driven website.

The task for Jessie's team is to “move the infrastructures” into the future. Andy, another leader in a national archival institution, agrees, while pointing out that although technological infrastructure is always important, “it's not actually the hard part. It's figuring out all the thinking around it, getting a coordinated vision of what folks want. It's looking at, you know, what are the drivers?” Technology must follow the drivers, as Andy puts it, to avoid outcomes whose only benefit is being “shiny.”

7.2 Spheres of expertise

Cultural data collecting and research require people with widely divergent forms of expertise. The interview data provide rich understandings of key distinctions among the spheres of expertise within the ecosystem, which can inform positive team formation and overcome barriers to collaboration. People with common expertise often relate to one another more than they may relate across spheres. As Jessie observes, when people collaborate across spheres it can feel “like being able to talk more than one language.”

7.2.1 Creators

David, with his longstanding involvement with Circus Oz, is an example of a creator in this study. As David describes his involvement with the Circus Oz Living Archive, he speaks from an artist's perspective:

As a performing arts practitioner myself and as a video practitioner myself […] I was interested in memory studies, and I was interested in notions of storytelling and how this collection of video recordings could be seen to be telling the story of this company and this large ensemble of people who had made up the company across many years because it's a very particular form of cultural production, which began as a collective and then morphed over the years, but in a very organic way. […] So it was really as a creative artist, I suppose, that I was involved in this project, and interested in trying to mesh the desires of the company with thinking about the interaction design issues as well.

David has an abiding, multifaceted interest in the original circumstances of the creation of the Circus Oz cultural data (i.e., performance videos), the evolution of the company over time, and the data's storytelling potential. He illustrates that a creator can make a rich contribution to a cultural data collection, without having created the dataset.

7.2.2 Curators

The curators in this study are true to form with their commitment to data access, organization, description, and preservation. Andy, working in a national archival institution, voices a common sentiment for curators: “Access is the key to understanding.” Curators know their work is antecedent to the pursuit of research or other engagement with cultural data. Unlike other spheres of expertise, curators are usually working on multiple collections, with responsibilities that traverse any single collection. This creates challenges for curators who wish to develop closer involvement with collections of interest. Curators' roles are often conceived of as functional, with disciplinary knowledge deprioritized.

Curators are considering how their roles will evolve over time. Jessie, reflecting on her work as a metadata expert in a national archival organization, asks: “How can we maintain some of those traditional [archival] principles, but actually put those principles into practice in a different context, because it's a very different context now?” Similarly, Sean observes a shift toward a more proactive role: “It's always a question of who should take the first step, in a way, right, and maybe that's not a conventional orientation for librarians and archivists. You know, where there's a mindset of, you know, we'll be here when you need us.” Curators think deeply about the supports they provide and the systems they build, and they are committed to access as a principle. At the same time, because curators are situated within institutions, such as libraries or archives, their roles, including with cultural data, are always intertwined with that institutional context.

7.2.3 Subject matter experts

Unlike curators, these experts often have long-term involvement with a single collection, subject area, person of interest, format, or document type. Subject matter experts are characterized by deep expertise in the knowledge domain of the cultural data initiative with which they engage. They may describe a formative experience, through domain knowledge, as their entry point into cultural data. For example, Jason, a literature scholar, reflects on starting to see texts as data:

There were all sorts of data awakenings […] I became increasingly interested in sort of thinking about telling literary history through this other medium, sound, which introduced me to all kinds of thinking about, from a practical perspective, but also a theoretical perspective about what a sound recording was as an entity, you know, as a research object.

Subject matter experts can sustain cultural data collections over long periods of time. At the same time, for some humanities and social sciences scholars, there can be a perceived legitimacy benefit to working with cultural data, simply because it carries the language and potential of data. Taylor, the Australian research fellow, recalls a period when:

Everyone felt the need to, either literally or indirectly, say that they are doing something that involves a quantitative method. Because that could be tethered to a sense of, you know, scientific rigor or impact. And so, the idea that you're collecting all of this data seemed in and of itself, a useful thing to do. But what we know is that it's only as good as what people actually use it for and do with it.

Taylor's observation is that this period of collecting data for its own sake has largely passed, at least in Australia, but the sentiment is important to consider.

7.2.4 Information science, computation, and technology experts

What is most telling in the interview data is the relative absence of discussion by cultural data experts (i.e., creators, curators, and subject matter experts) on the roles of information science experts and computation and technology experts. While the participants in this study do have some background and expertise in these areas, their focus in the discussions was primarily technical and functionalist with respect to these domain areas. None of the participants discussed the influence of research on their practice within these areas of expertise. This indicates a significant gap contributing to the lack of a healthy ecosystem for cultural data, as research in these domains is critical for long-term viability through enhanced knowledge organization practices and evidence-based understandings of people's information needs.

While the interviews demonstrated a level of understanding and sophistication around knowledge organization expertise (e.g., metadata required for collection descriptions), information behavior expertise was not discussed at all. While the participants regularly refer to the concept of “users,” the interviewees primarily demonstrate a surface understanding of the implications of collection work on end-users' experiences, needs, and understandings of their communities. This significant gap requires additional research, as well as integration of expertise from this subfield of information science, across the cultural data ecosystem.

7.3 Signs of health

We have demonstrated how cultural data, considered holistically, benefits from an ecosystem perspective. All cultural data initiatives are shaped by multiple structural elements that are much larger than individual projects, such as the availability of funding, which influences the scale and pace at which the work can proceed, as well as the past collecting practices of individuals and institutions, which determine what documents and artifacts have endured until now and are available to be contemplated as data. We have also described the five predominant spheres of expertise that are required for cultural data projects to proceed successfully.

In the preceding sections, we referenced challenges articulated by participants. These challenges often represent persistent quandaries faced by many people involved in cultural data work. They can be thought of as signs of “unhealthiness” within the current cultural data ecosystem. The purpose of this section is to articulate five predominant qualities that signify health within a cultural data project and, if consistently present across projects, signify health within the broader cultural data ecosystem. A healthy cultural data ecosystem (Figure 2) is one that contains these qualities, favorable structural elements, and participation representative of the predominant spheres of expertise.

7.3.1 Clarity of purpose

In a healthy cultural data ecosystem, a project proceeds with a shared sense of internal alignment, and its aims are clear. Clarity of purpose does not mean an absence of disagreement; contributors may bring divergent goals or research questions to a project based on their own interests and specializations. However, a healthy project radiates a sense of shared aspirations. Contributors can articulate the purpose of the project. This, over time, enables them to identify the insights and impacts that are being created. Contributors can document that intended research, community benefits, or other outcomes are in fact being enabled through the project.

For example, in discussing cultural data initiatives within Indigenous communities, Sharon observes a need for people working in this space to create “community-driven, sort of social contact-driven spaces in a lot of ways […] really thinking more critically about this space and trying to open to a bit more kind of a fluid, dynamic space.” Similarly, Taylor describes the thinking behind a large Australian cultural data initiative: “We thought that it would be more effective with the resourcing that we had, to deeply consider what kind of effort, skills, teams, and so on, intelligences, are required to take cultural data that's sitting in robust but actually fragile collections and see what needs to be done to it to make it immediately useful for some kind of analysis.” These examples of intentionality illustrate how a healthy cultural data project embodies the clarity brought by a reflective orientation to the work.

Clarity of purpose also creates space for unexpected benefits, or positive outcomes that cannot be predicted precisely. This is evident in how Jason, the director and chair of the SpokenWeb Partnership Network, describes this initiative's broad benefits. Having grown out of a smaller project based on the study of a single set of reel-to-reel tapes, scaling SpokenWeb upward involved circulating a call across Canada to locate other literary recording collections. The clear need addressed by the initiative led to the identification of collections that had not been shared before, and that diversify our understanding of people's involvements with literature in Canada in the 20th century. Jason observes: “Some of the most exciting parts of the discovery process of new collections has been like, of queer communities that were recording their gatherings, you know; of Black communities in Montreal, for example, like where we found interesting collections; and just all kinds of communities that, like, were completely invisible, from even the many diverse collections that we have in our university archives. You know, they just weren't there at all.” SpokenWeb's credibility and clarity of purpose have made it appealing to new partners, including those who can contribute rare and little-known collections to the Partnership Network.

7.3.2 User-focused design

In a healthy cultural data ecosystem, consideration is given to users from the outset, whether they are future researchers, members of specific communities, or the public at large. A user focus is embedded into the design and implementation of cultural data projects; users are not an afterthought. There is an ongoing commitment to the discoverability and accessibility of a healthy cultural data collection.

There is great potential in the cultural data sphere for user-focused design to be more prominent and normalized, and for information behavior research to inform cultural data practices. Taylor points out that user considerations often come up toward the end of a cultural data project, “when it's like ‘what do we do with this collection?’ […] And I think there's been a big shift in the last five years with, at least to get funding, a lot of people would say, well, ‘I'm building a database.’” However, Taylor also points to shifting attitudes, at least in Australia, in recognition of the need for purposeful approaches that integrate consideration of users; simply “building a database” is “not as in vogue anymore for a whole range of reasons, the main one being, like, who cares? Everyone's got data. So, you're gonna build a database that's not actually in and of itself an impactful output, unless you can demonstrate precisely who might be using it, what they're using it for, whether it's interoperable with other data sources, and so on.” Understanding the anticipated user, or the community, for a project is not an afterthought, but a foundational consideration.

A user-focused approach also includes awareness-raising and outreach, and the need to resource these important activities sufficiently. As Sharon puts it, “How do you let people know this thing [a collection] is here for them to find?” She points out how, around cultural data collections, people often imagine “the one stop shop. You know, one portal to rule them all. But people don't know the portal.” Healthy cultural data initiatives incorporate ongoing outreach as part of iterative, user-focused design, and they ensure that this design is supported by research evidence from information behavior studies.

7.3.3 Sustainability

Sustainability refers to the provision of appropriate investment over time, including but not limited to regularized funding. Adequate and secure long-term funding enables staffing to be less precarious, strengthening teams and ensuring project continuity and integrity. Programmatic investment enables collaborators to develop a vision for long-term maintenance and the less “shiny” work of ensuring that cultural data collections endure and remain usable over time. Sharon voices a common sentiment among participants that “right when the money happens, you get it, create a thing, and then it sits and then it just never goes anywhere or continues to thrive. […] There's so much support and infrastructure and resources for building the new thing.” By contrast, it can be difficult to secure resources to support the essential work “that's not the jazzy, interesting new stuff.” These less “jazzy” elements of cultural data work, such as infrastructure maintenance, are essential to a project's sustainability, but they must be provisioned adequately with resources and leadership.

David's description of the history of the Circus Oz Living Archive illustrates how cultural data projects can “fall off the perch,” as he puts it. The Living Archive is the public-facing access point for the company's archival videos, imagined originally as a portal that would allow collaborative research including identification and description of performances. After 2014, when initial funding rounds expired, David notes:

There wasn't funding for the ongoing maintenance of the platform, and Circus Oz themselves had some changes in personnel and some changes in strategic direction, I suppose, and so they became both less interested and less able to maintain the Living Archive themselves, and it gradually, over the ensuing years from 2014, it fell from daily use and eventually got to the point where it was no longer accessible.

It was not until several years later, when the Living Archive's decline was noted by some of the original project team, that efforts began to revitalize it.

7.3.4 Allied coproduction

In a healthy cultural data ecosystem, there is a sense of open collaboration and mutual understanding among experts. Connections across spheres of expertise are particularly important, where ongoing respectful engagement is necessary as very different experts work together toward shared aims. Cultural data work is interdisciplinary and must be appreciated as an inherently shared enterprise.

Participants consistently emphasize the importance of not only working together well, but also maintaining awareness of the divergent expertise necessary for a cultural data project to succeed. Jessie observes that “patience and persistence” are needed, and that, at times, “it's like being able to talk more than one language, like I constantly find myself having to switch language.” Taylor echoes the importance of “having the right knowledge within a team.” Similarly, Daniel, a data specialist at a large Melbourne university, describes international collaborations with data-driven musicologists. He observes that over time, “people find a way to work together and find a common ground. […] The community develops a shared vocabulary which informs the technical terms used to describe things, but it also informs the way that people are able to talk to one another, sort of across disciplinary boundaries.”

Cultural data project teams can be highly diverse in terms of skills and expertise. Accordingly, participants observe that their role on a team is always partially pedagogical. Everyone must devote intentional effort to communicating the complexities of a project from their point of view. This includes complexities around technical requirements and possibilities, metadata schemas, formats, and matters of ethics, ownership, and copyright; it also includes the grounding knowledge domain that informs a project's overarching structure, purpose, and research questions.

Jason, describing the leadership model for the SpokenWeb Partnership Network, points out that the Network's governance committee meets every 2 weeks, and has done so for 6 years running: “The people involved are just, you know, great. And we really like each other. […] And that's a huge, a huge important part of it, it sounds maybe glib or something, but it really has been such an important element of the success of the research network, that we care about each other. And we are interested in hearing each other's concerns and adapting to them.” A well-allied team, bringing together the right knowledge, is essential to the health of a cultural data project.

7.3.5 Reciprocal interconnectedness

Each cultural data project is unique and local, to some extent. At the same time, the health of the larger ecosystem and the success of individual projects benefit greatly when collaborators maintain some awareness of other projects, innovations, and communities of practice around the world. Reciprocal interconnectedness includes technical and metadata interoperability, which are key to the growth, flexibility, and sustainability of cultural data efforts. However, it also includes a commitment to, and practice of, staying acquainted with other relevant work. In a healthy ecosystem, there is a balance between minimizing redundancy (i.e., not “reinventing the wheel”) and identifying local needs and unique features.

Andy, drawing on experience in a leadership role in a national archival institution, emphasizes the potential benefits of reciprocal interconnectedness, as well as the future hazards of not attending to it. Andy argues that a key opportunity for those working in cultural data is to “really leverage technology for access in new and exciting ways. […] There's a lot coming that is being built at universities and other non-profit things that we'll be able to leverage. […] The next 10 years is going to be really big for that, because holy shit, the potential disaster of everybody going off and creating all that stuff and then later, how are we going to tie it all back together?” Andy also emphasizes that interconnectedness should be both international and intergovernmental: “Not only should Canada and Australia be talking about things, but, like, Higher Ed and Cultural Heritage [federal government departments], often because of where they live in funding models. [They] don't talk to each other, but there's so much overlap there. […] I worry about that a lot and currently in [my country] I have many worries about [multiple peak bodies] […] I'm just like, ah, everybody stop and do it together.” More fragmentary, competitive approaches, and their attendant instability, undermine the benefits of more robust, normalized interconnectedness.

Cultural data projects involving multiple partners gain sustainability and resilience when there is a high-functioning, well-run team in charge. Reciprocal interconnectedness relies on effective leadership, where the individual components of a project or initiative are led by experts who can manage their complexities as well as articulate them to other collaborators. The SpokenWeb Partnership Network is one example. One of its key characteristics is its network approach; its leadership team decided not to build a single repository, and instead to focus on enabling discovery and metadata crosswalks among existing repositories. While scaling up is often desirable for cultural data projects, consideration must be given to the multiple ways in which this can occur, as well as to potential unintended implications. Daniel, whose experiences include studying music scores with computational methods, cautions that relying on broader technical schemas, such as Dublin Core for metadata, can undermine the utility of collections for specific research questions. Daniel points out:

A lot of the work that we're doing, we're asking fairly unique questions of the data that often require a very bespoke data model. One of the issues you can have with things like Dublin Core is that […] you can easily lose information as you sort of transform things. You can easily end up with a situation where anyone who had anything to do with, like, the creation of a song ends up being a creator, which is totally useless if you're interested in looking at like the poetry, the authorship, and other things.
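The loss of information that Daniel describes can be illustrated with a minimal sketch. It assumes a hypothetical role-aware record for a song and a deliberately naive crosswalk to simple Dublin Core-style fields; the record structure, names, roles, and function names are invented for illustration and are not drawn from any participant's project or from a specific repository's mapping rules.

```python
# Hypothetical bespoke record for a song; field names and values are illustrative only.
song = {
    "title": "Example Song",
    "contributors": [
        {"name": "A. Poet", "role": "lyricist"},
        {"name": "B. Composer", "role": "composer"},
        {"name": "C. Arranger", "role": "arranger"},
    ],
}


def to_simple_dublin_core(record):
    """Flatten a role-aware record into simple Dublin Core-style fields.

    This naive crosswalk (an assumption for illustration) maps every
    contributor to the single 'creator' element, so roles are discarded.
    """
    return {
        "title": [record["title"]],
        "creator": [c["name"] for c in record["contributors"]],
    }


def lyricists(record):
    """Answer a role-specific question against the bespoke model."""
    return [c["name"] for c in record["contributors"] if c["role"] == "lyricist"]


dc = to_simple_dublin_core(song)
print(lyricists(song))  # the bespoke record can still say who wrote the words
print(dc["creator"])    # after the crosswalk, every name is simply a creator
```

Once the record has been flattened, a question about authorship of the text can no longer be answered from the transformed data alone, which is the trade-off that projects weigh when adopting a broader schema.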

Calls for interconnectedness on the largest scale, such as those from advocates of unified national repositories that ingest many data types, must be questioned so that loss of local specificity is clearly understood and can be factored into decision-making. As with any part of an ecosystem, too much interconnectedness may weaken the overall system and compromise other priorities.

8 CONCLUSION

Although many cultural data collections remain at risk, and initiatives are often precarious for reasons documented in this paper, there is great potential for these collections to have diverse positive impacts on society. Taylor sees a significant opportunity in getting “robust, clean, useful quantitative cultural data in the hands of decision-makers and government. […] It actually has very specific things to say that speak to a range of industries and contexts.” Andy cites the potential of cultural data to contribute to repairing historical injustices, in an inclusive, participatory way: “We have this big initiative where we have all these photos of [Indigenous] folks and we're reaching out to communities to be like, do you know any of these people? Because we want to name these people in our collection.” Jason highlights SpokenWeb's success at increasing the visibility of voices not previously known to Canadian literature: “It's really diversifying our understanding of what literary activity and ‘the literary’ meant in Canada, really from the 50s to now, and bringing in a lot of different personal voices.” With this study, we have gathered the perspectives of experts from across five spheres of expertise, and through their accounts, we have mapped the elements of a healthy cultural data ecosystem for the first time. With more stable investment, and more diverse, expert teams, many more cultural data collections could have widespread positive impacts like those documented here.

It is also important to note that, as one branch of the broader information ecosystem, the cultural data ecosystem holds many potential lessons for other branches. This study was limited to one branch and the experiences of experts working within only three countries; as an exploratory study, it makes a significant contribution to mapping the current landscape for cultural data work in Australia, the UK, and Canada, and laying the groundwork for future research. The long-term sustainability of collections, for example, is not limited to cultural data; government records, genealogical materials, scientific research data, and other information sources may face similar challenges, threats, and opportunities to those identified in this study of cultural data initiatives. Similarly, ensuring that curatorial initiatives draw on expertise from all types of individuals working within the area is critical to the long-term health of the specific ecosystem branch, as well as that of the overarching information ecosystem. Additional research is needed across various branches to understand the specific needs and complexities that must be addressed.

As the information ecosystem is global in nature, it is also important to examine these issues across nations, systems, and cultures, to ensure that the ecosystem can work for the betterment of society. While this research examined cultural data experiences in three countries, building on these results with new perspectives, particularly from Indigenous peoples and non-Western countries, will further extend our understanding of the importance of cultural data initiatives, globally, and their impact on the broader information ecosystem. If the information ecosystem, as a whole, includes perspectives and artifacts representing all people, society will benefit from its collective wisdom long into the future.

ACKNOWLEDGMENT

The authors acknowledge the support of Australian Research Council Linkage Infrastructure, Equipment and Facilities (LIEF) Program Grant LE210100021, entitled ACD-Engine: Enriching cultural data for research, industry and government. Open access publishing facilitated by RMIT University, as part of the Wiley - RMIT University agreement via the Council of Australian University Librarians.