Connecting DMSPs to Research Outputs

By Brian Riley, California Digital Library (CDL), and Mary O’Brien Uhlmansiek, The Association of Research Libraries (ARL)

In March, the lead developer for the DMP Tool, Brian Riley, attended a workshop on “Scientometrics Using Open Data” offered by the Centre for Science and Technology Studies (CWTS) at Leiden University. Participation in this session allowed us to share the work we are doing as part of the MAP Pilot project funded by the NSF and IMLS, and to collaborate on scientometric analyses using open data sources such as Crossref and DataCite.

The MAP Pilot project involves working with 10 institutions across the US to test connecting machine-actionable data management and sharing plans (maDMSPs) with related research outputs. Using research project metadata and persistent identifiers to query open data sources, it is relatively easy to find research articles produced by a particular project, but much harder to find the datasets, software, and other artifacts that are described in a DMSP. We are investigating ways to improve the findability of these outputs using automation, including machine learning/AI.

When maDMSPs are created in the DMP Tool, users can enter useful project metadata to enable queries with other systems. This includes ORCIDs for contributors, funding opportunity identifiers, RORs for affiliations and funders, anticipated project start and end dates, and the planned data repository for storage. The DMP Tool then assigns a DMP ID to the DMSP.
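As a rough illustration of how these fields fit together, the sketch below models them as a small Python record. The class and field names (`PlanMetadata`, `Contributor`, `query_keys`) are hypothetical and are not the DMP Tool's actual data model; the ROR strings in the usage note are placeholders.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Contributor:
    name: str
    orcid: Optional[str] = None            # e.g. "0000-0002-1825-0097"
    affiliation_ror: Optional[str] = None  # ROR identifier of the affiliation


@dataclass
class PlanMetadata:
    """Hypothetical container for the project metadata fields listed above."""
    title: str
    dmp_id: Optional[str] = None           # assigned by the DMP Tool
    contributors: List[Contributor] = field(default_factory=list)
    funder_ror: Optional[str] = None
    funding_opportunity_id: Optional[str] = None
    start_date: Optional[date] = None
    end_date: Optional[date] = None
    planned_repository: Optional[str] = None

    def query_keys(self) -> dict:
        """Collect the identifiers that could be used to query other systems."""
        rors = {c.affiliation_ror for c in self.contributors if c.affiliation_ror}
        if self.funder_ror:
            rors.add(self.funder_ror)
        keys = {
            "orcids": [c.orcid for c in self.contributors if c.orcid],
            "rors": sorted(rors),
            "funding_opportunity_id": self.funding_opportunity_id,
            "repository": self.planned_repository,
        }
        # Drop empty entries so callers only see usable identifiers.
        return {k: v for k, v in keys.items() if v}
```

For example, `PlanMetadata(title="T", contributors=[Contributor("A", orcid="0000-0002-1825-0097")]).query_keys()` returns only the `orcids` entry, since no other identifiers were supplied.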

DMSPs are often created years before the research outputs. The DMSPs in the DMP Tool with good metadata are only 2-3 years old, and their DMSP outputs have not yet been published. Therefore, the institutions contributing to our pilot have been asked to find older, funded research projects and their outputs to use as test cases. Using a new feature to upload an existing DMSP, they will enter basic information about the project (i.e., title, PI, grant identifiers). The test cases cover research funded by four major US agencies (NSF, NIH, DOE, and NASA), for which we have the most developed API integrations. As potential DMSP outputs are identified, the pilot teams will verify their relation to the research.

Identifying related DMSP outputs within the DMP Tool will give data librarians and research/grant management offices insight into the outputs of research projects, academic departments, and the institution. Users can generate reports for compliance checks (was the data shared according to the funder’s policy?), grant reporting, and research management activities.

With sufficient metadata, how do we find related DMSP outputs? We start by exploring open data sources like Crossref, DataCite, and COKI. For example, we use DataCite’s GraphQL API to extract DataCite metadata and compare it with DMP Tool projects. Each data source structures its metadata differently, so we first transform that metadata into a standardized format. We then use an algorithm to compare and score each field in the records, and weight those scores to give a confidence level for any matches found. Matching grant IDs give a high confidence level, but such matches are currently rare. Confidence levels improve with additional identifiers like ORCIDs, RORs, and repository IDs.
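The field-by-field comparison described above can be sketched roughly as follows. This is a minimal illustration, not the algorithm used in the pilot: the field names, the weights, and the `match_score` function are all assumptions, and both records are assumed to have already been transformed into the same dictionary shape.

```python
from typing import Dict, Iterable, Set

# Hypothetical weights: a grant ID match is treated as the strongest signal,
# as the text suggests. The numbers themselves are invented for illustration.
WEIGHTS: Dict[str, float] = {
    "grant_ids": 0.5,
    "orcids": 0.2,
    "rors": 0.15,
    "repository_ids": 0.15,
}


def normalize(values: Iterable[str]) -> Set[str]:
    """Trim and lower-case identifiers so records from different sources compare equal."""
    return {v.strip().lower() for v in values if v and v.strip()}


def match_score(plan: Dict[str, Iterable[str]], candidate: Dict[str, Iterable[str]]) -> float:
    """Return a confidence score between 0 and 1 for a plan/candidate pair."""
    score = 0.0
    for field_name, weight in WEIGHTS.items():
        plan_ids = normalize(plan.get(field_name, []))
        cand_ids = normalize(candidate.get(field_name, []))
        if plan_ids & cand_ids:  # any shared identifier counts as a match for this field
            score += weight
    return round(score, 2)
```

With these weights, a candidate that shares only an ORCID with the plan scores 0.2, while one that shares both a grant ID and an ORCID scores 0.7. A real implementation would likely also compare titles and dates, which need fuzzier matching than exact identifier overlap.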

Some development challenges discussed at the workshop include:

  • US funding agencies lack a standard way of sharing metadata via their APIs and rarely include grant IDs. Grant IDs are important but not yet reliable for identification purposes.
  • Research/DMSP outputs associated with older projects frequently lack identifiers such as RORs and ORCIDs in their metadata records.
  • How can we find datasets and software related to published research articles in systems like COKI? Can we use an article’s references to find these artifacts? What other hooks will allow us to identify these related outputs, and how could improved metadata and the usage of identifiers help facilitate making these connections?  

We are exploring adding more data aggregators to combine findings and create a clearer picture of a research project and its outputs. We will also explore methods to identify related works from research article reference sections, like dataset or software references. We are experimenting with ML/AI techniques to determine if a research output might be related to a DMSP.
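One simple starting point for mining reference sections is to pull DOIs out of free-text reference strings, which can then be looked up in a source such as DataCite to see whether they resolve to datasets or software. The sketch below is an assumption about how that first step might look, not a description of the project's method; the `extract_dois` name is hypothetical and the DOIs in the usage note are placeholders.

```python
import re
from typing import List

# Pattern based on the common shape of modern DOIs ("10.", a registrant code,
# a slash, then a suffix). It will not catch every DOI ever issued.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_dois(references: List[str]) -> List[str]:
    """Return the unique DOIs found in free-text reference strings, in order."""
    found: List[str] = []
    for ref in references:
        for match in DOI_PATTERN.findall(ref):
            # Drop trailing sentence punctuation; this can also trim a DOI
            # that legitimately ends in ")" or ".", which is rare.
            doi = match.rstrip(".,;)").lower()
            if doi not in found:
                found.append(doi)
    return found
```

For instance, passing the strings "Example dataset. https://doi.org/10.1234/example.dataset." and "Example software, doi:10.5555/EXAMPLE-software" yields the two DOIs in lower case with the trailing period removed.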

Findings from the MAP Pilot will be published as reports and best practices for implementing maDMSP workflows at research institutions after the project ends in 2025. If interested in collaborating on this important developmental work, please contact muhlmansiek [at] arl [dot] org for more information.


New Project Director Joins the MAP Pilot Project

By Mary O’Brien Uhlmansiek, Project Director, The Association of Research Libraries (ARL)

This February, I joined the MAP Pilot team as Project Director, serving in a joint position with The Association of Research Libraries (ARL) and the California Digital Library (CDL). In this role, I will support ten research libraries in our pilot project, exploring ways to advance institutional coordination around machine-actionable data management and sharing plans (maDMSPs). The project will compile resources for research workflow improvements that use maDMSPs, such as tracking compliance with funder data-sharing requirements or initiating internal research infrastructure requests upon grant award. Our pilot partners will also help drive improvements in the DMP Tool itself, providing valuable software testing and feedback as new interoperability features are developed, and using real-world examples to ensure the application will meet the needs of researchers and stakeholders alike.

Through my experiences serving as a data and repository manager for sensitive health-related information, in managing research software adoption and implementation at a large medical university, and as a facilitator for the adoption of outputs and recommendations at the Research Data Alliance, I can see the potential for the DMP Tool to provide critical research infrastructure for researchers and administrators alike as they navigate new data-sharing requirements from funders. I am excited to work with the project PIs, Cynthia Hudson Vitale and Maria Praetzellis, and the many other dedicated professionals from research library organizations in the open science movement. Projects such as the MAP Pilot are building blocks for the transition to more open science, and I look forward to the dissemination of the teams’ outputs to aid research institutions in adopting and continuing this important work. 

If you would like to learn more about maDMSPs or to get involved in future work in this area, please consider joining a group such as the Active Data Management Plans Interest Group at the Research Data Alliance.

DMP Tool 5.0 Release

We are excited to share the latest enhancements in the DMP Tool with the 5.0 release. This update marks a shift in our technology, featuring substantial back-end advancements that set the stage for a more robust, efficient, and scalable future. 

Infrastructure Improvements  

At the core of these updates is an overhaul of our infrastructure. The DMPTool’s back-end, built on legacy Rails code, is evolving. We’re transitioning towards a more modern architecture, separating the front-end and back-end operations. This shift involves transforming existing code into API endpoints and developing a React-based front-end. These changes will allow the DMP Tool to effectively generate the structured data required to realize the potential of machine-actionable plans. 

A New Look and Feel

The first thing you’ll notice is an updated DMP Tool homepage. This redesign aims to streamline how users and prospective partners access information about the tool. Recognizing the frequent inquiries we receive about joining the DMP Tool, we’ve focused on making key information about the application more accessible and straightforward.  

Versioning of Registered DMPs

Plan versioning is a key feature for machine-actionable DMPs and one we have received many requests for. Rather than static, quickly outdated documents, effective DMPs track progress by logging critical events from planning to preservation. Regularly revisiting and updating DMPs as research unfolds creates dynamic records that monitor ongoing activities. 

As a first step to exposing updates to DMPs, this release also includes the introduction of versioning for plans with DMP-IDs. This means a new version is created whenever a registered DMP is updated. Changes made within the same hour are combined into a single version. This feature provides a clear history of updates and ensures that you can easily track and reference different iterations of a DMP.
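The hourly grouping rule can be illustrated with a short sketch. It assumes that "within the same hour" means the same clock hour (for example, 9:00 to 9:59); the release notes do not say whether that or a rolling 60-minute window is used, and the `coalesce_versions` function is hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def coalesce_versions(update_times: List[datetime]) -> List[Dict]:
    """Group update timestamps into versions, one per clock hour (an assumption)."""
    versions: List[Dict] = []
    for ts in sorted(update_times):
        hour_bucket = ts.replace(minute=0, second=0, microsecond=0)
        if versions and versions[-1]["bucket"] == hour_bucket:
            # Same hour as the latest version: fold the change into it.
            versions[-1]["last_modified"] = ts
            versions[-1]["changes"] += 1
        else:
            # First change in a new hour: start a new version.
            versions.append({"bucket": hour_bucket, "last_modified": ts, "changes": 1})
    return versions
```

Under this assumption, updates at 9:05, 9:40, and 10:02 on the same day produce two versions: one holding both 9 o'clock changes and one for the 10:02 change.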

We welcome your input on these latest updates. Please reach out with any comments, questions, or feedback about these changes or the DMP Tool in general.

Institutions Selected to Pilot Development of Scalable Data-Management Infrastructure

The Association of Research Libraries (ARL) and the California Digital Library (CDL) have selected five institutional teams to pilot the integration or creation of prototypes and possible workflows for machine-actionable data management and sharing plans (maDMSPs). The pilot project will run January–December 2024. This project is funded by an Institute of Museum and Library Services (IMLS) National Leadership Grant. Additional information about the project is on our project webpage.

Machine-actionable data management and sharing plans are structured, machine-readable documents that allow for dynamic reporting on the intentions and outcomes of a research project, enabling streamlined information exchange across relevant parties and systems. These plans go beyond traditional static document-based DMSPs, and contain an inventory of key metadata about a project and its outputs (not just datasets), with a change history that stakeholders can query for information over the lifetime of the research. Implementing maDMSPs can be a key piece of establishing interconnected, automated systems for research data management and compliance.

The maDMSPs pilot institutions will help shape the development of maDMSPs and gain valuable early experience with new approaches to enable more automated and connected research data management. The institutions are:

  • Arizona State University
  • Northwestern University Feinberg School of Medicine
  • Pennsylvania State University
  • University of California, Riverside
  • University of Colorado, Boulder

An additional five institutions have been selected for the maDMSP extended cohort that will engage closely with the pilot cohort.

Call for Institutions to Pilot Development of Scalable Data-Management Infrastructure

The Association of Research Libraries (ARL) and the California Digital Library (CDL) are seeking four institutional teams to pilot the integration or creation of prototypes and possible workflows for machine-actionable data management and sharing plans (maDMSPs). The pilot project will run January–December 2024. This project is funded by an Institute of Museum and Library Services (IMLS) National Leadership Grant. Additional information about the project is on our project webpage.

Interested organizations should submit their expression of interest here.

Machine-actionable data management and sharing plans are structured, machine-readable documents that allow for dynamic reporting on the intentions and outcomes of a research project, enabling streamlined information exchange across relevant parties and systems. These plans go beyond traditional static document-based DMSPs, and contain an inventory of key metadata about a project and its outputs (not just datasets), with a change history that stakeholders can query for information over the lifetime of the research. Implementing maDMSPs can be a key piece of establishing interconnected, automated systems for research data management and compliance.

This pilot provides an exciting opportunity for selected institutions to help shape the development of maDMSPs and gain valuable early experience with new approaches to enable more automated and connected research data management.

By agreeing to be part of this pilot program, institutions will:

  • Define a set of success measures for institutional pilot projects of maDMSPs at their organization.
  • Gather a sample set of data management plans from funded research projects to use as test cases for connecting plans with associated datasets and other research outputs.
  • Provide engaged feedback on the maDMSP features and uses at their organization.
  • Conduct ongoing work to meet the locally defined success measures.
  • Attend and actively participate in project meetings every other month.
  • Participate in project communication, outreach, and engagement (such as conference panels, webinars, reports and articles, etc.).
  • Coordinate and manage one program team site visit.

Pilot projects should include a team of three to five people representing institutional stakeholders who will work together to test or prototype an institutional solution to support public access to research data leveraging the maDMSP. Teams may include representatives from the offices of several institutional stakeholders, such as the research office, library, information technology, institutional review board (IRB), high-performance computing units, and/or faculty.

Examples of possible pilot projects include, but are not limited to:

  • Modeling notification workflows that could be automated through maDMSPs to alert stakeholders to key events over the data life cycle. Example use cases include alerts around sensitive data, managing big data, enabling data transfer, and linking datasets to published outputs.
  • Building prototype integrations connecting maDMSPs with existing research information management systems (RIMS) or researcher profile systems. For example, automatically updating and exchanging key metadata between maDMSPs and other research systems.
  • Engaging academic or administrative departments to test the utility of maDMSPs for their research workflows and data management needs. Departmental testing would provide feedback to inform the optimization of maDMSP systems.
  • Demonstrating and improving communication workflows between key campus units involved in research data management using maDMSPs as a connecting platform. Example stakeholders include the library, research office, IT/security, IRB, research computing, and high-performance computing units.

Pilot institutions will:

  • Gain early access to new maDMSP features and functionality.
  • Influence technical development and workflow processes of the maDMSP platform.
  • Be reimbursed for up to $6,000 per institution to attend conferences or workshops to communicate pilot project goals or outcomes.

The ARL/CDL project team will produce all required reporting to IMLS; there are no federal grant reporting requirements for pilot partners.

We are seeking a range of institutions that are diverse in size, research activity, and level of development of services and infrastructure for research data management and sharing. Even if your institution has just begun planning for research data management and sharing, we invite you to apply.

Applications will remain open until Friday, November 10, 2023, and we anticipate notifying applicants by the end of November.

If you are interested in learning more, you are invited to register to attend an optional, informational webinar on Thursday, November 2, at 10:00 a.m. PDT/1:00 p.m. EDT.

Please direct any questions to Cynthia Hudson Vitale (cvitale@arl.org) or Maria Praetzellis (maria.praetzellis@ucop.edu).

Association of Research Libraries and California Digital Library Receive Grant to Advance Data Management and Sharing

Cross-posted from ARL News and written by Cynthia Hudson-Vitale | cvitale@arl.org | August 4, 2023


The Association of Research Libraries (ARL) and the California Digital Library (CDL) have received a $668,048 National Leadership Grant from the US Institute of Museum and Library Services (IMLS) to assist institutions in managing and sharing federally funded research data. This project will build a machine-actionable data-management plan (maDMP) tool by enhancing and developing new DMPTool features utilizing persistent identifiers (PIDs). CDL and ARL will work together to further strengthen institutional capacity for tracking research outputs by piloting the institutional integration of maDMPs across an academic campus and building community across institutions for maDMPs.

The promise of the maDMP is to be a vehicle for reporting on the intentions and outcomes of a research project that enables information exchange across relevant stakeholders and systems. maDMPs contain an inventory of key information about a project and its outputs with a change history that stakeholders can query for updated information about the project over its lifetime. By incorporating open persistent identifiers (PIDs) into DMPs and leveraging all DMP metadata for interoperability across infrastructures, institutions—and specifically libraries—will be better equipped to track and manage their institutional research data products.

CDL and ARL have collaborated before on advancing PIDs and maDMPs, including joint efforts on the 2019 National Science Foundation (NSF) grant Implementing Effective Data Practices that led to stakeholder recommendations for collaborative research support. The new IMLS project builds on this prior work by piloting maDMP workflows in the DMPTool, gathering feedback from partner institutions, and iterating on maDMP features to put those recommendations into practice at scale.

“We are thrilled to work with ARL on this timely project to advance open science by utilizing machine-actionable DMPs,” said Günter Waibel, associate vice provost and executive director, California Digital Library. “Facilitating the sharing and tracking of research data furthers our goals of supporting open scholarship and leveraging innovative technology to situate research data within an open knowledge graph of scholarly activity. We look forward to collaborating with ARL and partner institutions to build new tools and workflows to strengthen the research data ecosystem.”

“ARL is eager to engage its members and the broader research library community in testing new DMPTool features to improve cross-institution communications around open-science practices and research integrity,” said Mary Lee Kennedy, executive director, Association of Research Libraries.

In addition to developing DMPTool workflows to link research outputs and track relationships, this project will also work with four institutions to pilot the new features and improve capabilities. The call for institutional teams will be distributed in the next few months. Stay tuned for information on community calls and other project updates.

About the Association of Research Libraries

The Association of Research Libraries (ARL) is a nonprofit organization of research libraries in Canada and the US whose vision is to create a trusted, equitable, and inclusive research and learning ecosystem and prepare library leaders to advance this work in strategic partnership with member libraries and other organizations worldwide. ARL’s mission is to empower and advocate for research libraries and archives to shape, influence, and implement institutional, national, and international policy. ARL develops the next generation of leaders and enables strategic cooperation among partner institutions to benefit scholarship and society. ARL is on the web at ARL.org.

About the California Digital Library

The University of California (UC) founded the CDL in 1997 to take advantage of emerging technologies that were transforming the way digital information was being published and accessed. Since then, in collaboration with the UC libraries and other partners, we assembled one of the world’s largest digital research libraries and changed the ways that faculty, students, and researchers discover and access information. In partnership with the UC libraries, the CDL has continually broken new ground by developing systems linking our users to the vast print and online collections within UC and beyond. Building on the foundations of the Melvyl Catalog, we developed one of the largest online library catalogs in the country. We saved the university millions of dollars by facilitating the co-investment and sharing of materials and services used by libraries across the UC system. We work in partnership with campuses to bring the treasures of our libraries, museums, and cultural heritage organizations to the world. And we continue to explore how services such as digital curation, scholarly publishing, archiving, and preservation support research throughout the information life cycle. Serving the UC libraries is a vital component of our mission. Our unique position within the university allows us to provide the infrastructure and support commonly needed by the campus libraries, freeing them to focus their resources on the needs of their users. Looking ahead, the CDL will continue to use innovative technology to connect content and communities in ways that enhance teaching, learning, and research. CDL is on the web at cdlib.org.

About the Institute of Museum and Library Services

The Institute of Museum and Library Services is the primary source of federal support for the nation’s libraries and museums. We advance, support, and empower America’s museums, libraries, and related organizations through grantmaking, research, and policy development. IMLS envisions a nation where individuals and communities have access to museums and libraries to learn from and be inspired by the trusted information, ideas, and stories they contain about our diverse natural and cultural heritage. To learn more, visit www.imls.gov and follow us on Facebook and Twitter.


Supporting the FDP NIH Data Management and Sharing Pilot Project

The FDP (Federal Demonstration Partnership) NIH Data Management and Sharing Pilot Project aims to simplify the process of creating an NIH DMSP. The Pilot is testing the effectiveness and usability of two distinct templates.

The goal of the FDP pilot project is not only to test the two templates but also to gather data from both the researcher’s perspective and that of the NIH program. Feedback from these two perspectives will be instrumental in refining the templates based on the pilot data.

DMPTool supports researchers in fulfilling data management requirements efficiently and effectively. In light of this, we have developed templates based on the two FDP pilot templates, Alpha and Bravo. These templates follow the same design principles as the two FDP pilot templates, making it easier for researchers to navigate and comply with the requirements of their respective projects.

DMPTool administrators can customize the FDP templates like other DMPTool templates, including adding custom guidance and example answers and customizing the Research Outputs tab.

We are proud to support the FDP NIH Pilot Project and aid researchers in creating comprehensive NIH DMSPs that foster open science. As always, feel free to contact us with any questions or feedback. 

Template Customization for the Research Outputs Tab

V4.1 release: customize the Research Outputs tab; define output types, repositories, licenses, and metadata standards; add tooltips.

We are pleased to announce the latest DMPTool release, which introduces a new feature set focused on template customization. This update includes a “Preferences” tab in the Template Builder, offering administrators enhanced control over the Research Outputs section of the DMPTool. Documentation on this release is available here, and our full release notes are here.

Here’s an overview of the features included in this update:

1. Customize Research Output Types: The Research Outputs tab traditionally offers a standard list of output types. To provide more flexibility, we’ve added the ability for administrators to customize this list based on the needs of their specific research areas.

2. Customize Available Repositories: The current list of nearly 3,500 repository entries can be overwhelming for researchers. This update allows administrators to refine this list, helping guide researchers toward specific repositories as mandated by funders or institutional requirements. Additionally, DMPTool administrators can add descriptive text to the repository that provides additional information to help researchers select or utilize these resources. 

3. Customize Metadata Standards and License Types: In the same vein as repository customization, this update allows for the customization of metadata standards and license types. 

4. Tooltips: This update introduces tooltips on fields in the Research Outputs section, allowing administrators to provide additional guidance for repositories, metadata standards, and licenses. This text will be displayed as a tooltip to the researcher adjacent to the field, enhancing the clarity of information.

5. Define Custom Repositories: Administrators can now add custom repositories to their preferred repository list. This feature allows admins to record the name, description, and homepage URL of custom repositories.

DMPTool Administrators can add repositories not included in the re3data registry. These repositories will be included in the local index for all DMPTool users. 

6. Remove the Research Outputs Tab from a template: In response to feedback from administrators who noted duplication in information capture, we’ve added the ability to turn off the Research Outputs tab. This can be useful when using older templates that already ask users to provide comprehensive information about their anticipated research outputs.

We appreciate the continuous feedback from our users, and we hope these enhancements will provide a more streamlined, efficient experience for administrators and researchers alike. We invite you to explore these new features and share your feedback with us. 

Latest DMPTool Release 

The DMPTool team is hard at work developing a suite of new features to facilitate the creation of optimized and structured DMPs. In response to growing federal mandates for data sharing, the DMPTool is focused on supporting these new policies by exposing the information contained within a DMP in a machine-readable format to facilitate tracking research projects as they progress through the research lifecycle. Two National Science Foundation EAGER grants have supported this research: the first for DMP Roadmap: Making Data Management Plans Actionable and the second for the ongoing FAIR Island project. This work has also received support from the Templeton World Charity Foundation as part of the FAIR Workflows project.

This latest DMPTool release includes several updates focused on UX improvements and streamlining workflows.

New Follow-up tab 

Follow-up tab to update funding status & link research outputs

This new tab on the Create Plan workflow allows plan creators (or administrators with permission to modify a plan) to add information about funded projects. The new fields include the funding status and a grant number or ID. This tab will only appear on plans that have associated DMP-IDs.

A new section allows users to connect Research Outputs from their work to their associated DMP. Research Outputs can be anything related to a project with a DOI or other URL-based identifier. For example, a project output could be a dataset, protocol, or software connected to a research project. 

These research outputs will be recorded in the metadata of the DMP-ID as related identifiers and passed back to DataCite. By submitting updated metadata to DataCite, this workflow facilitates tracking scholarly outputs, and the resulting metadata is openly available for consumption and ingest by other systems.
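To make the idea concrete, the sketch below adds a research output to a DMP-ID metadata record using the `relatedIdentifiers` structure from the DataCite metadata schema (`relatedIdentifier`, `relatedIdentifierType`, `relationType`). The `add_related_output` helper is hypothetical, and the default relation type of "References" is an assumption: the post does not say which relation type the DMPTool records.

```python
from typing import Dict, List


def add_related_output(metadata: Dict, identifier: str, relation_type: str = "References") -> Dict:
    """Return a copy of the DMP-ID metadata with one more related identifier."""
    # Treat anything that looks like a DOI as a DOI; everything else as a URL.
    is_doi = identifier.startswith("10.") or "doi.org/" in identifier
    entry = {
        "relatedIdentifier": identifier,
        "relatedIdentifierType": "DOI" if is_doi else "URL",
        "relationType": relation_type,  # assumed default, not confirmed by the post
    }
    related: List[Dict] = list(metadata.get("relatedIdentifiers", []))
    if entry not in related:  # avoid recording the same output twice
        related.append(entry)
    return {**metadata, "relatedIdentifiers": related}
```

Calling the helper with a placeholder DOI such as "10.1234/example-dataset" and then with "https://example.org/protocol" yields a record with one DOI entry and one URL entry. The original dictionary is left unchanged, so the updated copy can be compared against it before being submitted.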

UX changes to the Finalize tab

In response to user feedback, we have streamlined the UI to clarify what a researcher needs to do to get a persistent identifier for their DMP and why they would want to do this. UI changes include:

  • New plan text explains what a DMP-ID is, what identifiers do, and why researchers should get one for their DMP.
  • The “Register” button is present but disabled if the preconditions are not met.
  • The Finalize tab now comes before Download to better reflect the logical workflow of creating a DMP.

Improved DMP-ID Landing Page Design

Based on user feedback, we have redesigned the DMP-ID landing page for improved accessibility and to make it clear where a user can view a PDF version of a plan. This redesign also allows us to build new plan versioning features in the coming months. 

Sample DMP-ID Landing Page with a link to the full-text narrative 
Example of Project Outputs as they appear on the DMP ID landing page

Other miscellaneous updates & bug fixes included in this release

  • Research Outputs now appear in the CSV and TXT versions #406
  • Fixed an issue that was causing the DOCX version of the plan to display an error when opened in MS Word
  • Fixed an issue with the sans-serif font used in PDF generation. Switched from Helvetica (which is no longer downloadable for free) to Roboto and also updated spacing between questions/sections.
  • Fixed an issue that was preventing an institutional admin from adding more than one URL/link on the Org Details page #413 #405
  • Fixed an issue that was preventing associated research outputs from being deleted #372
  • Fixed an issue with the emails sent out after the plan’s visibility changes #416
  • Other updates are detailed in the release notes.

How to get involved

We welcome contributions or collaborations. For those interested in following our work’s technical development, please see our GitHub project board. Please contact us if you have suggestions or ideas for pilot partnerships or if you’re interested in being an early tester.

Updates on DMPTool Support for the NIH DMSP Requirements 

A few months back, we announced a new DMPTool NIH Template Working Group focused on supporting the upcoming NIH requirements for data sharing. Since then, this hard-working group, chaired by Nina Exner of Virginia Commonwealth University, has collaborated to develop several new resources for the community. 

Updated NIH-GEN Template

The updated NIH-GEN DMSP (forthcoming 2023) template (v6) follows the structure laid out by the NIH in the optional DMS Plan format and aligns with the NIH-recommended Elements of a DMS Plan. The DMPTool NIH Template Working Group augmented NIH notices and other policy documents with additional sample language and guidance designed to help researchers create DMS plans. The new NIH-GEN DMSP (forthcoming 2023) template also includes guidance for data covered under the Genomic Data Sharing (GDS) policy, as NIH now expects a single data sharing plan to satisfy both the GDS Policy and the DMS Policy (NOT-OD-22-198). 

This new DMPTool NIH-GEN DMSP (forthcoming 2023) template includes answer prompts and sample answer text. 

Test out the new template by creating a plan. Or, preview the new template by downloading a PDF version with sample language and guidance included. 

DMPTool administrators can customize this (or any DMPTool template) and add institution-specific guidance and sample language. Instructions on how to customize templates and a short video tutorial are available. Any institutions with existing customizations will need to migrate to this new version of the template by publishing the template. Please see our documentation on the two steps required to transfer existing customizations. 

The DMPTool will deprecate the older NIH templates (NIH-GDS: Genomic Data Sharing and NIH-GEN: Generic (Current until 2023)) on January 24. Any plans with these older templates will still be available, but new plans for NIH will be directed to the new template. After we make this switch, the NIH-GEN DMSP (Forthcoming 2023) template title will change to NIH-GEN.

New educational materials

The Education Sub-committee of the DMPTool NIH Template Working Group developed materials that institutions can use to promote the NIH requirements and use of the DMPTool. To support the increasing number of new medical centers and other institutions joining the DMPTool community, the Sub-committee produced a slide deck and flyers that institutions can utilize to train local researchers on using the DMPTool templates.

The Education Sub-committee also collaborated on a DMPTool training workshop held by the Network of the National Library of Medicine’s National Center for Data Services (NCDS). The first DMPTool workshop was held in December and broke attendance records. Betsy Gunia of Johns Hopkins led this training webinar, giving an excellent overview and DMPTool demonstration. 

If you missed this first session, never fear! A recording of this session is available via the NNLM, and Jim Martin of the University of Arizona is giving a repeat session on February 15. Registration is available via NNLM.

Ongoing work

We will continue to iterate on the new templates, including the sample language and guidance provided, and welcome feedback from the community. As the NIH releases additional recommendations and guidance, we’ll continue incorporating these into NIH templates. As always, please reach out with any questions, suggestions, or feedback!