Connecting DMSPs to Research Outputs

By Brian Riley, California Digital Library (CDL), and Mary O’Brien Uhlmansiek, The Association of Research Libraries (ARL)

In March, the lead developer for the DMP Tool, Brian Riley, attended a workshop on “Scientometrics Using Open Data” offered by the Centre for Science and Technology Studies (CWTS) at Leiden University. Participation in this session allowed us to share the work we are doing as part of the MAP Pilot project funded by the NSF and IMLS, and to collaborate on scientometric analyses using open data sources such as Crossref and DataCite.

The MAP Pilot project involves working with 10 institutions across the US to test connecting machine-actionable data management and sharing plans (maDMSPs) with related research outputs. Using research project metadata and persistent identifiers to query open data sources, it is relatively easy to find research articles produced by a particular project, but not the datasets, software, and other artifacts that are described in a DMSP. We are investigating ways to improve the findability of these outputs using automation, including machine learning/AI.

When maDMSPs are created in the DMP Tool, users can enter useful project metadata to enable queries with other systems. This includes ORCIDs for contributors, funding opportunity identifiers, RORs for affiliations and funders, anticipated project start and end dates, and the planned data repository for storage. The DMP Tool then assigns a DMP ID to the DMSP.
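As a rough illustration of the metadata listed above, the sketch below models a plan record as a Python dataclass, reports which identifier fields are still empty, and serializes the record to JSON. The class name, field names, and helper methods are assumptions made for this example; they are not the DMP Tool's actual schema or API.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class PlanMetadata:
    """Hypothetical container for the project metadata a user can enter."""
    title: str
    contributor_orcids: list[str] = field(default_factory=list)
    funding_opportunity_id: Optional[str] = None
    affiliation_rors: list[str] = field(default_factory=list)
    funder_rors: list[str] = field(default_factory=list)
    start_date: Optional[str] = None  # anticipated start, ISO 8601 string
    end_date: Optional[str] = None    # anticipated end, ISO 8601 string
    planned_repository: Optional[str] = None
    dmp_id: Optional[str] = None      # assigned by the tool, not by the user

    def missing_identifiers(self) -> list[str]:
        """Return the names of identifier fields that are still empty."""
        checked = [
            "contributor_orcids",
            "funding_opportunity_id",
            "affiliation_rors",
            "funder_rors",
            "planned_repository",
        ]
        return [name for name in checked if not getattr(self, name)]

    def to_json(self) -> str:
        """Serialize the record so that other systems could consume it."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

A record with many empty identifier fields gives other systems little to query against, which is why a check like `missing_identifiers` could be useful before a plan is registered.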

DMSPs are often created years before the research outputs. The DMSPs in the DMP Tool with good metadata are only 2-3 years old, and their DMSP outputs have not yet been published. Therefore, the institutions contributing to our pilot have been asked to find older, funded research projects and their outputs to use as test cases. Using a new feature to upload an existing DMSP, they will enter basic information about the project (i.e., title, PI, grant identifiers) for research funded by 4 major US agencies (NSF, NIH, DOE, and NASA), the agencies for which we have the most developed API integrations. As potential DMSP outputs are identified, the pilot teams will verify their relation to the research.

Identifying related DMSP outputs within the DMP Tool will give data librarians and research/grant management offices insight into the outputs of research projects, academic departments, and the institution. Users can generate reports for compliance checks (was the data shared according to the funder’s policy), grant reporting, and research management activities.

With sufficient metadata, how do we find related DMSP outputs? We start by exploring open data sources like Crossref, DataCite, and COKI. For example, we use DataCite’s GraphQL API to extract DataCite metadata and compare it with DMP Tool projects. Because each data source structures its metadata differently, we first transform that metadata into a standardized format. We then use an algorithm to compare and score each field in the records, and we weight those scores into a confidence level for any matches found. A matching grant ID gives a high confidence level, but this is currently rare. Confidence levels improve with additional identifiers like ORCIDs, RORs, and repository IDs.
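The field-by-field comparison described above can be sketched roughly as follows. This is a minimal illustration, not the algorithm used in the DMP Tool: the `Record` class, the choice of fields, and the point values in `WEIGHTS` are all assumptions made for the example, with grant IDs weighted most heavily because a grant ID match is described as the strongest signal.

```python
from dataclasses import dataclass, field

# Hypothetical weights (points out of 100). These values are
# illustrative assumptions, not the ones used by the DMP Tool.
WEIGHTS = {
    "grant_ids": 50,
    "orcids": 20,
    "rors": 15,
    "repository_ids": 15,
}


@dataclass
class Record:
    """A project or output record after transformation to a common format."""
    grant_ids: set[str] = field(default_factory=set)
    orcids: set[str] = field(default_factory=set)
    rors: set[str] = field(default_factory=set)
    repository_ids: set[str] = field(default_factory=set)


def normalize(values: set[str]) -> set[str]:
    """Trim whitespace and lowercase so equivalent identifiers compare equal."""
    return {v.strip().lower() for v in values if v and v.strip()}


def match_score(project: Record, output: Record) -> int:
    """Add a field's weight whenever the two records share an identifier."""
    score = 0
    for name, weight in WEIGHTS.items():
        shared = normalize(getattr(project, name)) & normalize(getattr(output, name))
        if shared:
            score += weight
    return score
```

For example, a project and an output that share an ORCID and a ROR but no grant ID would score 35 under these assumed weights, while a shared grant ID alone would score 50. Any threshold for treating a score as a likely match would be a further assumption.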

Some development challenges discussed at the workshop include:

  • US funding agencies lack a standard way of sharing metadata via their APIs and rarely include grant IDs. Grant IDs are important but not yet reliable for identification purposes.
  • Research/DMSP outputs associated with older projects frequently lack identifiers such as RORs and ORCIDs in their metadata records.
  • How can we find datasets and software related to published research articles in systems like COKI? Can we use an article’s references to find these artifacts? What other hooks will allow us to identify these related outputs, and how could improved metadata and the usage of identifiers help facilitate making these connections?  

We are exploring adding more data aggregators to combine findings and create a clearer picture of a research project and its outputs. We will also explore methods to identify related works from research article reference sections, like dataset or software references. We are experimenting with ML/AI techniques to determine if a research output might be related to a DMSP.
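One simple starting point for mining reference sections is to pull DOIs out of reference strings and flag those whose prefix belongs to a data or software repository. The sketch below assumes a small hand-picked prefix list (Zenodo, Dryad, figshare) and a general-purpose DOI pattern; the function name and the list are illustrative assumptions, not part of the MAP Pilot tooling, and a real workflow would need a far more complete registry lookup.

```python
import re

# General-purpose DOI pattern; it may also capture trailing punctuation,
# which is stripped below.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

# Hand-picked DOI prefixes for illustration only; not an exhaustive list.
REPOSITORY_PREFIXES = {
    "10.5281": "Zenodo",
    "10.5061": "Dryad",
    "10.6084": "figshare",
}


def find_candidate_outputs(references: list[str]) -> list[tuple[str, str]]:
    """Return (doi, repository) pairs for DOIs with a known repository prefix."""
    candidates = []
    for reference in references:
        for raw in DOI_PATTERN.findall(reference):
            # Drop sentence punctuation that the pattern may have captured.
            # Note: this would also trim a DOI that genuinely ends in ")".
            doi = raw.rstrip(".,;:)").lower()
            prefix = doi.split("/", 1)[0]
            repository = REPOSITORY_PREFIXES.get(prefix)
            if repository:
                candidates.append((doi, repository))
    return candidates
```

Candidates found this way would still need the verification step carried out by the pilot teams, since a cited dataset is not necessarily an output of the citing project.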

Findings from the MAP Pilot will be published as reports and best practices for implementing maDMSP workflows at research institutions after the project ends in 2025. If you are interested in collaborating on this important developmental work, please contact muhlmansiek [at] arl [dot] org for more information.


New Project Director Joins the MAP Pilot Project

By Mary O’Brien Uhlmansiek, Project Director, The Association of Research Libraries (ARL)

This February, I joined the MAP Pilot team as Project Director, serving in a joint position with The Association of Research Libraries (ARL) and the California Digital Library (CDL). In this role, I will support ten research libraries in our pilot project, exploring ways to advance institutional coordination around machine-actionable data management and sharing plans (maDMSPs). The project will compile resources for research workflow improvements utilizing maDMSPs, such as tracking compliance with funder data-sharing requirements or initiating internal research infrastructure requests upon grant award. Our pilot partners will also help drive improvements in the DMP Tool itself, providing valuable software testing and feedback as new interoperability features are developed, and using real-world examples to ensure the application will meet the needs of researchers and stakeholders alike.

Through my experiences serving as a data and repository manager for sensitive health-related information, in managing research software adoption and implementation at a large medical university, and as a facilitator for the adoption of outputs and recommendations at the Research Data Alliance, I can see the potential for the DMP Tool to provide critical research infrastructure for researchers and administrators alike as they navigate new data-sharing requirements from funders. I am excited to work with the project PIs, Cynthia Hudson Vitale and Maria Praetzellis, and the many other dedicated professionals from research library organizations in the open science movement. Projects such as the MAP Pilot are building blocks for the transition to more open science, and I look forward to the dissemination of the teams’ outputs to aid research institutions in adopting and continuing this important work. 

If you would like to learn more about maDMSPs or to get involved in future work in this area, please consider joining a group such as the Active Data Management Plans Interest Group at the Research Data Alliance.

Institutions Selected to Pilot Development of Scalable Data-Management Infrastructure

The Association of Research Libraries (ARL) and the California Digital Library (CDL) have selected five institutional teams to pilot the integration or creation of prototypes and possible workflows for machine-actionable data management and sharing plans (maDMSPs). The pilot project will run January–December 2024. This project is funded by an Institute of Museum and Library Services (IMLS) National Leadership Grant. Additional information about the project is on our project webpage.

Machine-actionable data management and sharing plans are structured, machine-readable documents that allow for dynamic reporting on the intentions and outcomes of a research project, enabling streamlined information exchange across relevant parties and systems. These plans go beyond traditional static document-based DMSPs, and contain an inventory of key metadata about a project and its outputs (not just datasets), with a change history that stakeholders can query for information over the lifetime of the research. Implementing maDMSPs can be a key piece of establishing interconnected, automated systems for research data management and compliance.

The maDMSPs pilot institutions will help shape the development of maDMSPs and gain valuable early experience with new approaches to enable more automated and connected research data management. The institutions are:

  • Arizona State University
  • Northwestern University Feinberg School of Medicine
  • Pennsylvania State University
  • University of California, Riverside
  • University of Colorado, Boulder

An additional five institutions have been selected for the maDMSP extended cohort that will engage closely with the pilot cohort.

Call for Institutions to Pilot Development of Scalable Data-Management Infrastructure

The Association of Research Libraries (ARL) and the California Digital Library (CDL) are seeking four institutional teams to pilot the integration or creation of prototypes and possible workflows for machine-actionable data management and sharing plans (maDMSPs). The pilot project will run January–December 2024. This project is funded by an Institute of Museum and Library Services (IMLS) National Leadership Grant. Additional information about the project is on our project webpage.

Interested organizations should submit their expression of interest here.

Machine-actionable data management and sharing plans are structured, machine-readable documents that allow for dynamic reporting on the intentions and outcomes of a research project, enabling streamlined information exchange across relevant parties and systems. These plans go beyond traditional static document-based DMSPs, and contain an inventory of key metadata about a project and its outputs (not just datasets), with a change history that stakeholders can query for information over the lifetime of the research. Implementing maDMSPs can be a key piece of establishing interconnected, automated systems for research data management and compliance.

This pilot provides an exciting opportunity for selected institutions to help shape the development of maDMSPs and gain valuable early experience with new approaches to enable more automated and connected research data management.

By agreeing to be part of this pilot program, institutions will:

  • Define a set of success measures for institutional pilot projects of maDMSPs at their organization.
  • Gather a sample set of data management plans from funded research projects to use as test cases for connecting plans with associated datasets and other research outputs.
  • Provide engaged feedback on the maDMSP features and uses at their organization.
  • Conduct ongoing work to meet the locally defined success measures.
  • Attend and actively participate in project meetings every other month.
  • Participate in project communication, outreach, and engagement (such as conference panels, webinars, reports and articles, etc.).
  • Coordinate and manage one program team site visit.

Pilot projects should include a team of three to five people representing institutional stakeholders who will work together to test or prototype an institutional solution to support public access to research data leveraging the maDMSP. Teams may include representatives from the offices of several institutional stakeholders, such as the research office, library, information technology, institutional review board (IRB), high-performance computing units, and/or faculty.

Examples of possible pilot projects include, but are not limited to:

  • Modeling notification workflows that could be automated through maDMSPs to alert stakeholders to key events over the data life cycle. Example use cases include alerts around sensitive data, managing big data, enabling data transfer, and linking datasets to published outputs.
  • Building prototype integrations connecting maDMSPs with existing research information management systems (RIMS) or researcher profile systems. For example, automatically updating and exchanging key metadata between maDMSPs and other research systems.
  • Engaging academic or administrative departments to test the utility of maDMSPs for their research workflows and data management needs. Departmental testing would provide feedback to inform the optimization of maDMSP systems.
  • Demonstrating and improving communication workflows between key campus units involved in research data management using maDMSPs as a connecting platform. Example stakeholders include the library, research office, IT/security, IRB, research computing, and high-performance computing units.

Pilot institutions will:

  • Gain early access to new maDMSP features and functionality.
  • Influence technical development and workflow processes of the maDMSP platform.
  • Be reimbursed for up to $6,000 per institution to attend conferences or workshops to communicate pilot project goals or outcomes.

The ARL/CDL project team will produce all required reporting to IMLS; there are no federal grant reporting requirements for pilot partners.

We are seeking a range of institutions that are diverse in size, research activity, and level of development of services and infrastructure for research data management and sharing. Even if your institution has just begun planning for research data management and sharing, we invite you to apply.

Applications will remain open until Friday, November 10, 2023, and we anticipate notifying applicants by the end of November.

If you are interested in learning more, you are invited to register to attend an optional, informational webinar on Thursday, November 2, at 10:00 a.m. PDT/1:00 p.m. EDT.

Please direct any questions to Cynthia Hudson Vitale cvitale@arl.org or Maria Praetzellis maria.praetzellis@ucop.edu.

Association of Research Libraries and California Digital Library Receive Grant to Advance Data Management and Sharing

Cross-posted from ARL News and written by Cynthia Hudson-Vitale | cvitale@arl.org | August 4, 2023


The Association of Research Libraries (ARL) and the California Digital Library (CDL) have received a $668,048 National Leadership Grant from the US Institute of Museum and Library Services (IMLS) to assist institutions in managing and sharing federally funded research data. This project will build a machine-actionable data-management plan (maDMP) tool by enhancing and developing new DMPTool features utilizing persistent identifiers (PIDs). CDL and ARL will work together to further strengthen institutional capacity for tracking research outputs by piloting the institutional integration of maDMPs across an academic campus and building community across institutions for maDMPs.

The promise of the maDMP is to be a vehicle for reporting on the intentions and outcomes of a research project that enables information exchange across relevant stakeholders and systems. maDMPs contain an inventory of key information about a project and its outputs with a change history that stakeholders can query for updated information about the project over its lifetime. By incorporating open persistent identifiers (PIDs) into DMPs and leveraging all DMP metadata for interoperability across infrastructures, institutions—and specifically libraries—will be better equipped to track and manage their institutional research data products.

CDL and ARL have collaborated before on advancing PIDs and maDMPs, including joint efforts on the 2019 National Science Foundation (NSF) grant Implementing Effective Data Practices that led to stakeholder recommendations for collaborative research support. The new IMLS project builds on this prior work by piloting maDMP workflows in the DMPTool, gathering feedback from partner institutions, and iterating on maDMP features to put those recommendations into practice at scale.

“We are thrilled to work with ARL on this timely project to advance open science by utilizing machine-actionable DMPs,” said Günter Waibel, associate vice provost and executive director, California Digital Library. “Facilitating the sharing and tracking of research data furthers our goals of supporting open scholarship and leveraging innovative technology to situate research data within an open knowledge graph of scholarly activity. We look forward to collaborating with ARL and partner institutions to build new tools and workflows to strengthen the research data ecosystem.”

“ARL is eager to engage its members and the broader research library community in testing new DMPTool features to improve cross-institution communications around open-science practices and research integrity,” said Mary Lee Kennedy, executive director, Association of Research Libraries.

In addition to developing DMPTool workflows to link research outputs and track relationships, this project will also work with four institutions to pilot the new features and improve capabilities. The call for institutional teams will be distributed in the next few months. Stay tuned for information on community calls and other project updates.

About the Association of Research Libraries

The Association of Research Libraries (ARL) is a nonprofit organization of research libraries in Canada and the US whose vision is to create a trusted, equitable, and inclusive research and learning ecosystem and prepare library leaders to advance this work in strategic partnership with member libraries and other organizations worldwide. ARL’s mission is to empower and advocate for research libraries and archives to shape, influence, and implement institutional, national, and international policy. ARL develops the next generation of leaders and enables strategic cooperation among partner institutions to benefit scholarship and society. ARL is on the web at ARL.org.

About the California Digital Library

The University of California (UC) founded the CDL in 1997 to take advantage of emerging technologies that were transforming the way digital information was being published and accessed. Since then, in collaboration with the UC libraries and other partners, we assembled one of the world’s largest digital research libraries and changed the ways that faculty, students, and researchers discover and access information. In partnership with the UC libraries, the CDL has continually broken new ground by developing systems linking our users to the vast print and online collections within UC and beyond. Building on the foundations of the Melvyl Catalog, we developed one of the largest online library catalogs in the country. We saved the university millions of dollars by facilitating the co-investment and sharing of materials and services used by libraries across the UC system. We work in partnership with campuses to bring the treasures of our libraries, museums, and cultural heritage organizations to the world. And we continue to explore how services such as digital curation, scholarly publishing, archiving, and preservation support research throughout the information life cycle. Serving the UC libraries is a vital component of our mission. Our unique position within the university allows us to provide the infrastructure and support commonly needed by the campus libraries, freeing them to focus their resources on the needs of their users. Looking ahead, the CDL will continue to use innovative technology to connect content and communities in ways that enhance teaching, learning, and research. CDL is on the web at cdlib.org.

About the Institute of Museum and Library Services

The Institute of Museum and Library Services is the primary source of federal support for the nation’s libraries and museums. We advance, support, and empower America’s museums, libraries, and related organizations through grantmaking, research, and policy development. IMLS envisions a nation where individuals and communities have access to museums and libraries to learn from and be inspired by the trusted information, ideas, and stories they contain about our diverse natural and cultural heritage. To learn more, visit www.imls.gov and follow us on Facebook and Twitter.


Supporting FAIR Data with Integrations

As part of our work to extend the usability and interoperability of the new Networked DMP, we have partnered with Research Space (RSpace), a connected electronic lab notebook, and developed a prototype integration that allows users to track research data throughout the research life cycle. This new integration enables tri-directional data flows between RSpace, DMPTool, and data repositories, facilitating higher quality and more comprehensive research data capture and tracking.

The lack of interoperability between tools is arguably the most significant barrier to streamlining workflows throughout the research lifecycle. This gap prevents the comprehensive collection and incorporation of research data and metadata into the research record captured during the active research phase. Furthermore, it limits the scope for passing this data and metadata on to data repositories, thus undermining FAIR data principles and reproducibility. Bridging this gap is precisely what the integration seeks to address. 

Researchers are now able to reference and update their DMPs from within RSpace. The aims of this feature are:

  1. Add value to DMPs by linking them to datasets generated throughout a study, so that they become ‘living documents.’
  2. Reduce the burden on researchers in keeping their DMPs up to date.
  3. Append DOIs and links for datasets exported from RSpace to a repository, so that they are associated with a DMP.

Learn More

To see this workflow in action, check out this short demo from FORCE 2021, where Julie Goldman from the Harvard Medical School explains how the DMPTool – RSpace – Dataverse workflow enhances research data capture in the Harvard environment. The user documentation also contains a walk-through of the feature with instructions. 

Additionally, an upcoming FORCE11 Community Call on January 25, 2022, will showcase this and other integrations and demonstrate ways in which data capture and reproducibility are enhanced by tool interoperability.  

Please reach out with any comments, suggestions, or ideas for future integrations!

FAIR Island Project Receives NSF Funding


Crossposted from the FAIR Island website

The California Digital Library (CDL), University of California Gump South Pacific Research Station, Berkeley Institute for Data Science (BIDS), Metadata Game Changers, and DataCite are pleased to announce that they have been awarded a 2-year NSF EAGER grant entitled “The FAIR Island Project for Place-based Open Science” (full proposal text).

The FAIR Island project examines the impact of implementing optimal research data management policies and requirements, affording us the unique opportunity to look at the outcomes of strong data policies at a working field station. Building on the Island Digital Ecosystem Avatars (IDEA) Consortium (see Davies et al. 2016), the FAIR Island Project leverages collaboration between the Gump Station on the island of Moorea in French Polynesia (host of the NSF Moorea Coral Reef Long-Term Ecological Research site), and Tetiaroa Society, which operates a newly established field station located on the atoll of Tetiaroa a short distance from Moorea. 

The FAIR Island project builds interoperability between pieces of critical research infrastructure (DMPs, research practice, PIDs, data policy, and publications), contributing to the advancement and adoption of Open Science. In the global context, there are ongoing efforts to make science Open and FAIR to bring more rigor to the research process, in turn increasing the reproducibility and reusability of scientific results. DataCite, as a global partner in the project, has been working to recognize the importance of better management of research entities. This has led to critical advances concerning the development of infrastructure for Open Science. Increased availability of the different research outputs of a project (datasets, pre-registrations, software, protocols, etc.) would enable the reuse of research to aggregate findings across studies to evaluate discoveries in the field and ultimately assess and accelerate progress.

Key outcomes the FAIR Island team will develop include: 

  1. CDL, BIDS, and the University of California Natural Reserve System will work together to build an integrated system for linking research data to their associated publications via PIDs. We will develop a provenance dashboard from field to publication, documenting all research data and research outcomes derived from that data.
  2. The project also facilitates further development of the DataCite Commons interface and extends connections made possible via the networked DMP that allows users to track relationships between DMPs, investigators, outputs, organizations, research methods, and protocols; and display citations throughout the research lifecycle.
  3. Developing an optimal data policy for place-based research by CDL, BIDS, and Metadata Game Changers is the cornerstone component of the FAIR Island project. A reusable place-based data policy template will be shared and implemented amongst participating UC-managed field stations and marine labs. In addition, we will be incorporating these policies into a templated data management plan within the DMPTool application and sharing it with the broader community via our website, whitepapers, and conferences such as the Research Data Alliance (RDA) Plenaries.

The FAIR Island project is in a unique position to demonstrate how we can advance open science by creating optimal FAIR data policies governing all research conducted at field stations. Starting with the field station on Tetiaroa, the project team plans to demonstrate how FAIR data practices can make data reuse and collaboration around data more efficient. Data Management Plans (DMPs) in this “FAIR data utopia” will be utilized as key documents for tracking provenance, attribution, compliance, deposit, and publication of all research data collected on the island by implementing mandatory registration requirements, including extensive use of controlled vocabularies, persistent identifiers (PIDs), and other identifiers.

The project will make significant contributions to international Open Science standards and collaborate with open infrastructure providers to provide a scalable implementation of best practices across services. In addition, DataCite seeks to extend the infrastructure services developed in the project to their member community across 48 countries and 2,500 repositories globally. 

We will continue to share details and feature developments related to the FAIR Island project via our blog. You can join the conversation at the next RDA plenary in November 2021. Feedback or questions are most welcome and can be sent directly to info@fairisland.org

Connecting the DMP ID to an ORCID record

Recently we announced that the DMPTool can now generate persistent, unique IDs (the DMP ID) for plans created within the application. Building on this development, we are thrilled to share that the scholarly identifier service for researchers, ORCID, recently adopted the DMP as a resource type. As a result, DMPs are now a defined work type within an ORCID record and listed on an individual’s ORCID record. The connection between a DMP ID and ORCID is crucial for the Networked DMP, as ORCIDs play a key role in facilitating connections between researchers, institutions, outputs, and projects. It is precisely these types of relationships that we are enabling through our work on Networked DMPs.

Screenshot of manually adding a DMP as a work to an ORCID record

Additionally, DMP IDs generated via the DMPTool are now automatically linked to the DMP creator’s ORCID record. This means that when a DMPTool user “Registers” their plan, a DMP ID is generated, and this record is automatically pushed to ORCID and included as a work on their ORCID profile page. 

“Registering” a DMP will generate a DMP ID and push this work to the associated ORCID record
After a DMP ID is generated, the plan will be listed as a work on the researcher’s ORCID record

Together with Liz Krznarich from DataCite and DMPTool Editorial Board member Nina Exner from Virginia Commonwealth University, I recently participated in an ORCID Community Call demonstrating this new integration and discussing our approach to building the Networked DMP. A recording of the webinar is available here, and our combined slide deck is available here.

The DMPTool team continues to expand the Networked DMP. Development is currently underway for additional features within the DMPTool, including DMP versioning and advancing our API to facilitate external integrations. We look forward to sharing updates with you soon about these exciting advancements. In the meantime, as always, feedback or questions are most welcome and can be sent directly to maria.praetzellis@ucop.edu.

Interviews on Implementing Effective Data Practices, Part I: Why This Work Matters

Cross-posted from ARL News by Natalie Meyers, Judy Ruttenberg, and Cynthia Hudson-Vitale | October 28, 2020

In preparation for the December 2019 invitational conference, “Implementing Effective Data Practices,” hosted by the Association of Research Libraries (ARL), Association of American Universities (AAU), Association of Public and Land-grant Universities (APLU), and California Digital Library (CDL), we conducted a series of short pre-conference interviews.

We interviewed representatives from scholarly societies, research communities, funding agencies, and research libraries about their perspectives and goals around machine-readable data management plans (maDMPs) and persistent identifiers (PIDs) for data. We hoped to help expose the community to the range of objectives and concerns we bring to the questions we collectively face in adopting these practices. We asked about the value the interviewees see or wish to see in maDMPs and PIDs, their concerns, and their pre-conference goals.

In an effort to make these perspectives more widespread, we are sharing excerpts from these interviews and discussing them in the context of the final conference report that was released recently. Over the next three weeks, we will explore and discuss interview themes in the context of broad adoption of these critical tools.

Why This Work Matters

To start off this series of scholarly communications stakeholder perspectives, we need to position the importance of this infrastructure within broader goals. The overall goal of the conference was to explore the ways that stakeholders could adopt a more connected ecosystem for research data outputs. The vision of why this was important and how it would be implemented was a critical discussion point for the conference attendees.

Benjamin Pierson, then senior program officer, now deputy director for enterprise data, Bill and Melinda Gates Foundation, expressed the value of this infrastructure as key to solving real-world issues and making data and related assets first-class research assets that can be reused with confidence.

Clifford Lynch, executive director, Coalition for Networked Information, stated how a public sharing of DMPs within an institution would create better infrastructure and coordination at the university level for research support.

From the funder perspective, Jason Gerson, senior program officer, PCORI (Patient-Centered Outcomes Research Institute), indicated that PIDs are also essential for providing credit for researchers as well as for providing funders with a mechanism to track the impact of the research they fund.

Margaret Levenstein, director, ICPSR (Inter-university Consortium for Political and Social Research), spoke about the importance of machine-readable DMPs and PIDs for enhancing research practices of graduate students and faculty as well as the usefulness for planning repository services.

For those developing policies at the national level, Dina Paltoo, then assistant director for policy development, US National Library of Medicine, currently assistant director, scientific strategy and innovation, Immediate Office of the Director, US National Heart, Lung, and Blood Institute, discussed how machine-readable data management plans are integral for connecting research assets.

All of the pre-conference interviews are available on the ARL YouTube channel.

Natalie Meyers is interim head of the Navari Family Center for Digital Scholarship and e-research librarian for University of Notre Dame, Judy Ruttenberg is senior director of scholarship and policy for ARL, and Cynthia Hudson-Vitale is head of Research Informatics and Publishing for Penn State University Libraries.

Effective Data Practices: new recommendations to support an open research ecosystem

We are pleased to announce the release of a new report written with our partners at the Association of Research Libraries (ARL), the Association of American Universities (AAU), and the Association of Public and Land-grant Universities (APLU): Implementing Effective Data Practices: Stakeholder Recommendations for Collaborative Research Support.  

The report brings together information and insights shared during a December 2019 National Science Foundation sponsored invitational conference on implementing effective data practices. In this report, experts from library, research, and scientific communities provide key recommendations for effective data practices to support a more open research ecosystem. 

The conference focused on designing guidelines for (1) using persistent identifiers (PIDs) for datasets, and (2) creating machine-readable data management plans (DMPs), both data practices recommended in NSF’s May 2019 Dear Colleague Letter. Based on the information and insights shared during the conference, the project team developed a set of recommendations for the broad adoption and implementation of these practices.

The report focuses on recommendations for research institutions and also provides guidance for publishers, tool builders, and professional associations. The AAU-APLU Institutional Guide to Accelerating Public Access to Research Data, forthcoming in spring 2021, will include the recommendations.

Five key takeaways from the report are:

  • Center the researcher by providing tools, education, and services that are built around data management practices that accommodate the scholarly workflow.
  • Create closer integration of library and scientific communities, including researchers, institutional offices of research, research computing, and disciplinary repositories.
  • Provide sustaining support for the open PID infrastructure that is a core community asset and essential piece of scholarly infrastructure. Beyond adoption and use of PIDs, organizations that sustain identifier registries need the support of the research community.
  • Unbundle the DMP, because the DMP as currently understood may be overloaded with too many expectations (for example, simultaneously a tool within the lab, among campus resource units, and with repositories and funding agencies). Unbundling may allow for different parts of a DMP to serve distinct and specific purposes.
  • Unlock discovery by connecting PIDs across repositories to assemble diverse data to answer new questions, advance scholarship, and accelerate adoption by researchers.

The report also identifies five core PIDs that are fundamental and foundational to an open data ecosystem. Using these PIDs will ensure that basic metadata about research is standardized, networked, and discoverable in scholarly infrastructure: 

  1. Digital object identifiers (DOIs) from DataCite to identify research data, as well as from Crossref to identify publications
  2. Open Researcher and Contributor (ORCID) iDs to identify researchers
  3. Research Organization Registry (ROR) IDs to identify research organization affiliations 
  4. Crossref Funder Registry IDs to identify research funders
  5. Crossref Grant IDs to identify grants and other types of research awards
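
For tool builders, one practical first step is recognizing which kind of identifier a metadata field contains. The following is a minimal Python sketch, not part of the report: the function names are hypothetical, and the patterns are simplified assumptions based on the publicly documented DOI, ORCID, and ROR formats. Crossref Grant IDs are themselves DOIs, so this sketch cannot tell them apart from dataset or publication DOIs without looking up the registry metadata.

```python
import re

# Assumed, simplified patterns; the registries may accept forms not covered here.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")
ROR_RE = re.compile(r"^0[a-hj-km-np-tv-z0-9]{6}\d{2}$")
FUNDER_PREFIX = "10.13039/"  # DOI prefix used by the Crossref Funder Registry


def strip_resolver(value: str) -> str:
    """Drop a leading resolver URL such as https://doi.org/ (hypothetical helper)."""
    return re.sub(
        r"^https?://(dx\.)?(doi\.org|orcid\.org|ror\.org)/", "", value.strip()
    )


def orcid_checksum_ok(orcid: str) -> bool:
    """Verify the final character with the ISO 7064 MOD 11-2 check used by ORCID."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def classify_pid(value: str) -> str:
    """Return 'orcid', 'funder', 'doi', 'ror', or 'unknown' for a PID string."""
    bare = strip_resolver(value)
    if ORCID_RE.match(bare):
        return "orcid" if orcid_checksum_ok(bare) else "unknown"
    if bare.startswith(FUNDER_PREFIX):
        return "funder"
    if DOI_RE.match(bare):
        return "doi"
    if ROR_RE.match(bare):
        return "ror"
    return "unknown"
```

A pattern match only shows that a string is well formed. Confirming that an identifier is actually registered, and retrieving the metadata attached to it, still requires a query to the relevant registry.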

The report is intended to encourage collaboration and conversation among a wide range of stakeholder groups in the research enterprise by showcasing how collaborative processes help with implementing PIDs and machine-actionable DMPs (maDMPs) in ways that can advance public access to research.

The full report is now available online.

This material is based upon work supported by the National Science Foundation under Grant Number 1945938. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Project team:

  • John Chodacki, California Digital Library
  • Cynthia Hudson-Vitale, Pennsylvania State University
  • Natalie Meyers, University of Notre Dame
  • Jennifer Muilenburg, University of Washington
  • Maria Praetzellis, California Digital Library
  • Kacy Redd, Association of Public and Land-grant Universities
  • Judy Ruttenberg, Association of Research Libraries
  • Katie Steen, Association of American Universities

Additional report and conference contributors:

  • Joel Cutcher-Gershenfeld, Brandeis University
  • Maria Gould, California Digital Library