Recently we announced that the DMPTool can now generate persistent, unique IDs (the DMP ID) for plans created within the application. Building on this development, we are thrilled to share that the scholarly identifier service for researchers, ORCID, recently adopted the DMP as a resource type. As a result, DMPs are now a defined work type within an ORCID record and listed on an individual’s ORCID record. The connection between a DMP ID and ORCID is crucial for the Networked DMP, as ORCIDs play a key role in facilitating connections between researchers, institutions, outputs, and projects. It is precisely these types of relationships that we are enabling through our work on Networked DMPs.
Additionally, DMP IDs generated via the DMPTool are now automatically linked to the DMP creator’s ORCID record. This means that when a DMPTool user “Registers” their plan, a DMP ID is generated, and this record is automatically pushed to ORCID and included as a work on their ORCID profile page.
Together with Liz Krznarich from DataCite and DMPTool Editorial Board member Nina Exner from Virginia Commonwealth University, I recently participated in an ORCID Community Call demonstrating this new integration and discussing our approach to building the Networked DMP. A recording of the webinar is available here, and our combined slide deck is available here.
The DMPTool team continues to expand the Networked DMP. Development is currently underway for additional features within the DMPTool, including DMP versioning and advancing our API to facilitate external integrations. We look forward to sharing updates with you soon about these exciting advancements. In the meantime, as always, feedback or questions are most welcome and can be sent directly to firstname.lastname@example.org.
Building on the recent creation of “A Brave New PID” for DMPs, we are excited to announce that DMP creators can now receive IDs for their DMPs within the DMPTool. Since the outset of our NSF-funded EAGER research project, the ability to generate DMP IDs has been on the strategic roadmap for integrating DMPs into the scholarly knowledge sharing and persistent identifier ecosystem.
Supporting NSF recommendations for data management
The DMPTool team continues to work towards supporting these recommendations by building new features and services for an open, automatically updated, interconnected system for data management of research projects.
Generating IDs for DMPs represents tangible progress towards our shared goal of moving DMPs from static text documents to structured, interoperable data that can flow between stakeholders, linking metadata, repositories, and institutions, and allowing for notifications, verification, and reporting in real time.
What’s included in this latest release?
Below is an outline of three new features included in this release. For technical details and a few additional features, please see our v3.1.0 documentation. These improvements have also been distributed to the larger community within our shared open source codebase, DMPRoadmap. Thank you to the DMPTool Editorial Board for their guidance and feedback as we developed this feature set. We also appreciate the DMPTool Administrators who submitted feedback on an early iteration of this release. We intend to incorporate many of these suggestions in future releases and to build on the many good ideas shared by all as we continue to expand our support for Networked DMPs.
1. IDs for DMPs
Within the Finalize/Publish tab users can “Register” their plan and generate a DMP ID. The DMP ID will then display within the tool and link to a landing page for the plan. For further details on this feature please see our DMP ID documentation.
2. DMP ID Landing Page
After a plan receives a DMP ID, the system generates a DMP landing page that includes high-level details about the plan. The DMP ID metadata does not include the narrative components of a DMP. For an example of a DMP ID landing page, please see this DMP.
The landing pages also demonstrate the types of connections made possible by tracking a research project over time from the point of DMP creation. As a project progresses over time, updates to the plan can be connected to the DMP ID and will display on the associated landing page.
3. Research Outputs Tab
The new Research Outputs tab allows researchers to describe specific project outputs in a more granular and controlled manner than was previously possible via the text narrative alone. In designing this new section, we strove to use as many controlled vocabularies and PIDs as possible. Here are some highlights of the new tab:
Repository selector tool utilizing the Registry of Research Data Repositories (re3data) that allows researchers to define where they anticipate depositing a specific output
License selector (from SPDX) that allows researchers to define the associated license for specific outputs
Ability to flag outputs as containing sensitive data and/or PII
Researchers can create an unlimited number of specific research outputs. All entered outputs are included in the downloaded version of the plan, placed after the narrative component of the plan so as not to interfere with funder page count limits.
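As a rough illustration of how a structured research output differs from narrative text, the sketch below models one output as a small record. The class name, field names, and sample values are hypothetical and are not taken from the DMPTool codebase; only the ideas (a repository choice, an SPDX license identifier, and sensitive data and PII flags) come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResearchOutput:
    """Hypothetical record for one planned research output."""
    title: str
    output_type: str                        # e.g. "dataset" or "software"
    repository: Optional[str] = None        # name or identifier of the chosen repository
    license_spdx_id: Optional[str] = None   # SPDX license identifier, e.g. "CC-BY-4.0"
    sensitive_data: bool = False            # flagged as containing sensitive data
    personal_data: bool = False             # flagged as containing PII


def outputs_needing_review(outputs: List[ResearchOutput]) -> List[ResearchOutput]:
    """Return the outputs flagged as containing sensitive data and/or PII."""
    return [o for o in outputs if o.sensitive_data or o.personal_data]


# Illustrative values only.
planned = [
    ResearchOutput("Survey responses", "dataset",
                   repository="Example Repository",
                   license_spdx_id="CC-BY-4.0",
                   personal_data=True),
    ResearchOutput("Analysis scripts", "software", license_spdx_id="MIT"),
]
flagged = outputs_needing_review(planned)
print([o.title for o in flagged])
```

Because each field is a controlled value rather than free text, a plan built from records like these can be filtered or exported without parsing the narrative.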
What’s up next?
With the ability to generate DMP IDs now in place, we are one step closer to creating networked, living DMPs. While this is a great start, we have many additional features in development that will extend the usability and interoperability of this new generation of DMPs. In the coming months, we will be working on developing these additional features:
Connecting DMPs to other related research outputs such as datasets and journal articles via the PID Graph
Connecting DMP IDs to corresponding ORCID records
Incorporating additional PIDs including research resource identifiers (RRIDs)
Sponsor and funder approval workflow wherein these stakeholders can review, comment, and approve submitted DMPs
Integration with the Electronic Lab Notebook, RSpace
Adding the ability for DMPTool admins to curate a list of recommended repositories for the new repository selector tool
Additionally, in response to several DMPTool admin requests for outreach materials supporting adoption of the DMP ID, we are developing materials to share with the DMPTool admin community in order to promote these data practices amongst their users.
We will continue to share details on this work and the development of new features to support the networked DMP. Stay tuned for further developments over the coming months.
Despite the challenges over the last year, we are pleased to share some exciting news about launching the brave new PID, DMP IDs. Two years ago we set out a plan in collaboration with the University of California Curation Center and the DMPTool to bring DMP IDs to life. The work was part of the NSF EAGER grant DMP Roadmap: Making Data Management Plans Actionable and allowed us to explore the potential of machine-actionable DMPs as a means to transform the DMP into a critical component of networked research data management.
The plan was to develop a persistent identifier (PID) for Data Management Plans (DMPs). We already have PIDs for many entities, such as articles and datasets (DOIs), people (ORCID iDs), and places (ROR IDs). We knew that it would be important for DataCite to support the community in establishing a unique persistent identifier for DMPs. Until now, we had no PID for the document that “describes data that will be acquired or produced during research; how the data will be managed, described, and stored, what standards you will use, and how data will be handled and protected during and after the completion of the project”. There was no such thing as a DMP ID. Today that changes.
At a fundamental level, DMP IDs are registered as DOIs with the resourceTypeGeneral “OutputManagementPlan.” Since the release of DataCite schema 4.4, the resourceTypeGeneral controlled vocabulary has included this value. DMP IDs are created in the same way as any other DOI, with the same required fields, but must include the “OutputManagementPlan” resourceTypeGeneral to be identifiable.
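To make the required fields concrete, here is a minimal sketch that assembles DataCite metadata XML for a DMP using only the Python standard library. It makes two assumptions. First, it spells the resource type as `OutputManagementPlan`, which is how the value appears in the schema 4.4 controlled list. Second, the function name, DOI, creator, title, and publisher are all hypothetical placeholders. The sketch only builds the XML; it does not register anything with DataCite.

```python
import xml.etree.ElementTree as ET

DATACITE_NS = "http://datacite.org/schema/kernel-4"
# Assumed spelling of the schema 4.4 controlled-list value for DMPs.
DMP_RESOURCE_TYPE_GENERAL = "OutputManagementPlan"


def build_dmp_metadata_xml(doi, creators, title, publisher, year):
    """Build DataCite XML containing the required fields for a DMP ID."""
    if not creators:
        raise ValueError("at least one creator is required")

    ET.register_namespace("", DATACITE_NS)

    def tag(name):
        return f"{{{DATACITE_NS}}}{name}"

    root = ET.Element(tag("resource"))
    ET.SubElement(root, tag("identifier"), identifierType="DOI").text = doi

    creators_el = ET.SubElement(root, tag("creators"))
    for name in creators:
        creator = ET.SubElement(creators_el, tag("creator"))
        ET.SubElement(creator, tag("creatorName")).text = name

    titles_el = ET.SubElement(root, tag("titles"))
    ET.SubElement(titles_el, tag("title")).text = title

    ET.SubElement(root, tag("publisher")).text = publisher
    ET.SubElement(root, tag("publicationYear")).text = str(year)

    # This attribute is what marks the DOI as a DMP ID.
    resource_type = ET.SubElement(
        root, tag("resourceType"),
        resourceTypeGeneral=DMP_RESOURCE_TYPE_GENERAL,
    )
    resource_type.text = "Data Management Plan"

    return ET.tostring(root, encoding="unicode")


# Hypothetical values for illustration only.
xml_text = build_dmp_metadata_xml(
    doi="10.5072/example-dmp",
    creators=["Researcher, Example"],
    title="Example Data Management Plan",
    publisher="Example University",
    year=2021,
)
print(xml_text)
```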
Generating DMP IDs creates a persistent link between a data plan and the project outputs and allows access to DataCite’s supporting services, such as Event Data, to facilitate connections via the PID Graph.
Assigning DOIs to persistently identify DMPs is a trend that we have seen already. Since 2019, more than 200 DMPs have been assigned a DOI for their identification. Repositories such as Zenodo made this possible by allowing users to select Data Management Plans as one of the many types of resources.
We know through our work with the DMP community that the introduction of the formal DMP ID will allow DMP IDs to proliferate and serve downstream use cases.
Besides persistently identifying DMPs, the assignment of DMP IDs realizes the promises of machine-actionable DMPs. The DataCite GraphQL API can now expose Data Management Plans and all their connections. Other applications can use the same APIs to build applications based on machine-actionable DMPs, such as visualizations or summary statistics.
From today, DataCite members can use the MDS API and Fabrica to assign DMP IDs to their Data Management Plans. Our team has created documentation to support the community in registering DMP IDs, understanding best practices, and exploring related connections in the PID Graph.
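As a small illustration of the summary statistics idea, the sketch below counts a DMP's connections by relation type. It does not call the GraphQL API. It assumes a DMP record has already been retrieved as a dictionary with a `relatedIdentifiers` list in the style of DataCite metadata; the function name and the sample DOIs are hypothetical.

```python
from collections import Counter


def summarize_connections(dmp_record):
    """Count a DMP's related identifiers, grouped by relation type."""
    related = dmp_record.get("relatedIdentifiers", [])
    return Counter(item["relationType"] for item in related)


# Hypothetical record for illustration only.
example_dmp = {
    "doi": "10.5072/example-dmp",
    "relatedIdentifiers": [
        {"relatedIdentifier": "10.5072/dataset-1",
         "relatedIdentifierType": "DOI", "relationType": "Describes"},
        {"relatedIdentifier": "10.5072/dataset-2",
         "relatedIdentifierType": "DOI", "relationType": "Describes"},
        {"relatedIdentifier": "10.5072/article-1",
         "relatedIdentifierType": "DOI", "relationType": "IsReferencedBy"},
    ],
}

for relation, count in summarize_connections(example_dmp).items():
    print(relation, count)
```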
We are really pleased to have reached this milestone and look forward to tracking the downstream impact.
Cross-posted from ARL News by Natalie Meyers, Judy Ruttenberg, and Cynthia Hudson-Vitale | October 28, 2020
In preparation for the December 2019 invitational conference, “Implementing Effective Data Practices,” hosted by the Association of Research Libraries (ARL), Association of American Universities (AAU), Association of Public and Land-grant Universities (APLU), and California Digital Library (CDL), we conducted a series of short pre-conference interviews.
We interviewed representatives from scholarly societies, research communities, funding agencies, and research libraries about their perspectives and goals around machine-readable data management plans (maDMPs) and persistent identifiers (PIDs) for data. We hoped to help expose the community to the range of objectives and concerns we bring to the questions we collectively face in adopting these practices. We asked about the value the interviewees see or wish to see in maDMPs and PIDs, their concerns, and their pre-conference goals.
In an effort to make these perspectives more widespread, we are sharing excerpts from these interviews and discussing them in the context of the final conference report that was released recently. Over the next three weeks, we will explore and discuss interview themes in the context of broad adoption of these critical tools.
Why This Work Matters
To start off this series of scholarly communications stakeholder perspectives, we need to position the importance of this infrastructure within broader goals. The overall goal of the conference was to explore the ways that stakeholders could adopt a more connected ecosystem for research data outputs. The vision of why this was important and how it would be implemented was a critical discussion point for the conference attendees.
Benjamin Pierson, then senior program officer, now deputy director for enterprise data, Bill and Melinda Gates Foundation, expressed the value of this infrastructure as key to solving real-world issues and making data and related assets first-class research assets that can be reused with confidence.
Clifford Lynch, executive director, Coalition for Networked Information, stated how a public sharing of DMPs within an institution would create better infrastructure and coordination at the university level for research support.
From the funder perspective, Jason Gerson, senior program officer, PCORI (Patient-Centered Outcomes Research Institute), indicated that PIDs are also essential for providing credit for researchers as well as for providing funders with a mechanism to track the impact of the research they fund.
Margaret Levenstein, director, ICPSR (Inter-university Consortium for Political and Social Research), spoke about the importance of machine-readable DMPs and PIDs for enhancing research practices of graduate students and faculty as well as the usefulness for planning repository services.
For those developing policies at the national level, Dina Paltoo, then assistant director for policy development, US National Library of Medicine, currently assistant director, scientific strategy and innovation, Immediate Office of the Director, US National Heart, Lung, and Blood Institute, discussed how machine-readable data management plans are integral for connecting research assets.
Natalie Meyers is interim head of the Navari Family Center for Digital Scholarship and e-research librarian for University of Notre Dame, Judy Ruttenberg is senior director of scholarship and policy for ARL, and Cynthia Hudson-Vitale is head of Research Informatics and Publishing for Penn State University Libraries.
During the December conference, the project team developed a set of recommendations for the broad adoption and implementation of NSF’s recommended data practices as described in the NSF’s May 2019 Dear Colleague Letter. The report focuses on recommendations for research institutions and also provides guidance for publishers, tool builders, and professional associations. The AAU-APLU Institutional Guide to Accelerating Public Access to Research Data, forthcoming in spring 2021, will include the recommendations.
The conference focused on designing guidelines for (1) using persistent identifiers (PIDs) for datasets, and (2) creating machine-readable data management plans (DMPs), both data practices that were recommended by NSF. Based on the information and insights shared during the conference, the project team developed a set of recommendations for the broad adoption and implementation of NSF’s preferred data practices.
Five key takeaways from the report are:
Center the researcher by providing tools, education, and services that are built around data management practices that accommodate the scholarly workflow.
Create closer integration of library and scientific communities, including researchers, institutional offices of research, research computing, and disciplinary repositories.
Provide sustaining support for the open PID infrastructure that is a core community asset and essential piece of scholarly infrastructure. Beyond adoption and use of PIDs, organizations that sustain identifier registries need the support of the research community.
Unbundle the DMP, because the DMP as currently understood may be overloaded with too many expectations (for example, simultaneously a tool within the lab, among campus resource units, and with repositories and funding agencies). Unbundling may allow for different parts of a DMP to serve distinct and specific purposes.
Unlock discovery by connecting PIDs across repositories to assemble diverse data to answer new questions, advance scholarship, and accelerate adoption by researchers.
The report also identifies five core PIDs that are fundamental and foundational to an open data ecosystem. Using these PIDs will ensure that basic metadata about research is standardized, networked, and discoverable in scholarly infrastructure:
Digital object identifiers (DOIs) from DataCite to identify research data, as well as from Crossref to identify publications
Open Researcher and Contributor (ORCID) iDs to identify researchers
Research Organization Registry (ROR) IDs to identify research organization affiliations
Crossref Funder Registry IDs to identify research funders
Crossref Grant IDs to identify grants and other types of research awards
The report is intended to encourage collaboration and conversation among a wide range of stakeholder groups in the research enterprise by showcasing how collaborative processes help with implementing PIDs and machine-actionable DMPs (maDMPs) in ways that can advance public access to research.
This material is based upon work supported by the National Science Foundation under Grant Number 1945938. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
John Chodacki, California Digital Library
Cynthia Hudson-Vitale, Pennsylvania State University
Natalie Meyers, University of Notre Dame
Jennifer Muilenburg, University of Washington
Maria Praetzellis, California Digital Library
Kacy Redd, Association of Public and Land-grant Universities
Judy Ruttenberg, Association of Research Libraries
The Research Data Alliance (RDA) recently hosted a three-day (27-29 May 2020) machine-actionable DMP hackathon to build integrations and test the Common Standard for maDMPs. The event, coordinated through teams at RDA-Austria and TU Wien, was well attended, with over 70 participants from Australia, Europe, Africa, and North America.
The teams that work on DMPTool (dmptool.org) and DMPonline (dmponline.org) were really pleased to represent our shared DMPRoadmap codebase and show our conformance with the standard and ability to exchange DMPs across systems. This blog post details the work of the DMPRoadmap group in the hackathon; for a full review of all outputs, please visit the Hackathon GitHub.
What did we work on?
Maria Praetzellis and Sarah Jones, product managers from DMPRoadmap, joined the hackathon “TigTag” team and focused on mapping maDMPs to funder templates. During the hackathon, their group successfully mapped required questions from several funder-specific DMPs including:
National Science Foundation
U.S. Geological Survey
The goal of the exercise was to develop guidance on how to normalize the ways that fields from specific funder templates can be mapped to the standard and, when necessary, develop extensions to incorporate template-specific needs. The team came up with several proposals for changes to the documentation and structure of the DMP Common Standard and made a few recommendations for extensions to the standard. The team is now assembling the recommendations and will submit ideas as issues to the Common Standard GitHub so work can be tracked going forward.
Brian Riley and Sam Rust, developers from DMPRoadmap, joined the hackathon “DMP Exchange team” and worked to determine how the RDA Common Standard JSON format could be used to exchange DMP metadata between tools. Their team provided a staging service and granted API keys to other development teams to allow testing of prototypes, which helped all participants debug issues. Over the course of the hackathon, our new maDMP API helped developers of the following DMP systems implement their own APIs:
Based on this work, we were able to exchange maDMP metadata between DMPTool and those three systems by the end of the hackathon. Below are screenshots of DMP exports from the Data Stewardship Wizard that were imported into the DMPTool. Because we were each using the RDA Common Standard format, the new DMP was created within the DMPTool and the appropriate metadata was successfully mapped: title, description, project start/end dates, grant ID, contact information, and contributor information.
While the data models used by many systems do not yet offer full support of the RDA Common Standard model, progress was made towards mapping the high level DMP information across the board. Also, the confirmation that these systems could exchange information using RDA Common Standard JSON was encouraging and will likely open the door for future integrations.
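The field mapping described above can be sketched as a small function that reads an RDA Common Standard JSON document and pulls out the fields named in the text. This is an illustrative reading of the standard's top-level `dmp` object, not the DMPTool import code; the function name, the output keys, and the sample plan are hypothetical.

```python
import json


def extract_plan_fields(common_standard_json):
    """Pull the high-level plan fields out of RDA Common Standard JSON."""
    dmp = json.loads(common_standard_json)["dmp"]
    # Use the first project and first funding entry, if present.
    project = (dmp.get("project") or [{}])[0]
    funding = (project.get("funding") or [{}])[0]
    return {
        "title": dmp.get("title"),
        "description": dmp.get("description"),
        "project_start": project.get("start"),
        "project_end": project.get("end"),
        "grant_id": (funding.get("grant_id") or {}).get("identifier"),
        "contact_email": (dmp.get("contact") or {}).get("mbox"),
        "contributors": [c.get("name") for c in dmp.get("contributor", [])],
    }


# Hypothetical exported plan for illustration only.
exported = json.dumps({
    "dmp": {
        "title": "Example plan",
        "description": "A plan exported from another DMP system.",
        "contact": {"name": "Example Contact", "mbox": "contact@example.org"},
        "contributor": [{"name": "Example Contributor", "role": ["DataCurator"]}],
        "project": [{
            "title": "Example project",
            "start": "2020-06-01",
            "end": "2023-05-31",
            "funding": [{"grant_id": {"identifier": "0000000", "type": "other"}}],
        }],
    }
})

fields = extract_plan_fields(exported)
print(fields["title"], fields["grant_id"])
```

Missing sections fall back to empty values rather than raising errors, which matches the situation described above where not every system fully supports the model yet.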
We also collaborated with members of the DMP Melbourne, University of Cape Town and Stockholm University on an integration with their institutional repository platform. The teams were interested in pushing both DMP metadata and the physical DMP document into that repository. However, they did not yet support the maDMP standard, so the team created two separate prototype scripts. The first script extracts DMPs from a DMPRoadmap system, creates a placeholder Project that future datasets can be connected to, and uploads a PDF copy of the DMP. The second script converts their JSON into RDA Common Standard compliant JSON. While their institutional repositories do not contain many DMPs at this point, a service like this could help extract DMPs for import into DMP systems that utilize the RDA Common Standard. We hope to build upon this work to facilitate integrations with additional repositories in the future.
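The second script's job, converting a repository's own JSON into Common Standard JSON, might look roughly like the sketch below. The input field names (`name`, `summary`, `created_at`, `updated_at`, `owner`, `url`) are invented for illustration and do not describe any particular repository platform, and this is not the script written at the hackathon.

```python
import json


def to_common_standard(repo_record):
    """Convert a hypothetical repository record to Common Standard style JSON."""
    owner = repo_record["owner"]
    created = repo_record["created_at"]
    return {
        "dmp": {
            "title": repo_record["name"],
            "description": repo_record.get("summary", ""),
            "created": created,
            # Fall back to the creation time if the record was never updated.
            "modified": repo_record.get("updated_at", created),
            "contact": {"name": owner["full_name"], "mbox": owner["email"]},
            # The repository URL stands in as the plan identifier.
            "dmp_id": {"identifier": repo_record["url"], "type": "url"},
        }
    }


# Hypothetical repository record for illustration only.
record = {
    "name": "Example DMP",
    "summary": "Plan deposited as a PDF.",
    "created_at": "2020-05-27T09:00:00Z",
    "owner": {"full_name": "Example Depositor", "email": "depositor@example.org"},
    "url": "https://repository.example.org/items/1",
}

converted = to_common_standard(record)
print(json.dumps(converted, indent=2))
```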
Hackathon participants are now collating work produced during the hackathon into a final report. In addition, participants expressed interest in:
More communities. Most of the attendees at this hackathon were developers from DMP-focused tools. In the future, it would be great to have participants from other communities, including developers of CRIS systems, data repository platforms, and ethics tools. This would help us expand the types of use cases being served.
More PIDs. The power of connected information relies on persistent identifiers. We would like to increase our connection with various standards and integrate with the Research Organization Registry (ROR), the Funder Registry, and the Contributor Roles Taxonomy (CRediT) to provide more structured information to support such integrations.
The goal of our EAGER research project is to explore the potential of machine-actionable DMPs as a means to transform DMPs from a compliance exercise based on static text documents into a key component of a networked research data management ecosystem. This ecosystem will not only facilitate, but also improve the research process for all stakeholders.
We will be laying out the phases of work in the coming months and will continue to use this blog to keep the community informed of our progress, and to solicit your feedback and ideas.
Phase 1 Workplan
Phase 1 of our research entails exploring the following three high-level ideas:
How to optimize the Digital Object Identifiers (DOI) metadata schema for DMPs
How to best incorporate other persistent identifiers (PIDs) into DMPs
The common data model for the creation of machine-actionable DMPs, produced by the RDA working group on DMP Common Standards, was recently released for community feedback. Our partners at the Digital Curation Centre (DCC) have now implemented this model in the DMPRoadmap codebase. A big thank you to Sam Rust from DCC for his work on this! Those interested in learning more about the Common Standard in DMPRoadmap may want to view a recent webinar recording of Sam detailing this work. This was a fundamental step towards machine-actionable DMPs, as it forms the foundation for standardized information flow between DMPs and affiliated external systems.
DOIs for DMPs
With our partners at the Digital Curation Centre (DCC), we are working to incorporate the common standards into the shared DMPRoadmap codebase and our DMPTool development plans. As part of this work, we have partnered with DataCite to update their metadata schema to better support DMPs and to optimize a workflow for generating DOIs for DMPs. By relying on the DOI infrastructure, we will then be able to utilize the Event Data service from DataCite to record when assertions have been made on the DOI. More on the workflows surrounding this aspect of the project below.
DMPs and the PID graph
Projects such as Freya have been working to connect research outputs through a PID graph. A key question underpinning much of our work is how we can best leverage the PID graph (see Principle 5: Use PIDs and controlled vocabularies) within the DMP ecosystem. To connect DMPs to the larger PID ecosystem, our first phase will also include incorporating the following persistent identifiers into the DMP as a baseline for future work:
As discussed above, in Phase 1 we are building a system to mint DOIs for DMPs and creating a landing page for DMP DOIs to record updates to the DOI that occur over time. Although the system can be thought of as a giant API, pulling and pushing data from various sources, the landing page also serves to visually demonstrate the types of connections made possible by tracking a research project over time from the point of DMP creation.
Below is a high level overview of this workflow and whiteboarding of its potential architecture. (For those that would like a more detailed view, please check out our GitHub).
maDMP system accepts common standard metadata from DMPTool (DMP Roadmap)
maDMP system sends that metadata to DataCite to mint a DOI (which it then returns to the DMPTool)
A landing page is generated for the DMP DOI
A separate harvester application queries outside APIs to check for assertions recorded against the DOI. For this phase of work we will work with the NSF awards API, and return any award information into the maDMP system.
The maDMP system then sends any award info returned to DataCite
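The steps above can be sketched as a small in-memory simulation. Everything here is a stand-in: `FakeRegistry` replaces the DOI registration service, `lookup_award` replaces the harvester's call to an awards API, and the DOI prefix, class and function names, and sample values are hypothetical. No network calls are made, and this is not the maDMP system's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class FakeRegistry:
    """In-memory stand-in for a DOI registration service."""
    prefix: str = "10.5072"
    records: Dict[str, dict] = field(default_factory=dict)

    def mint(self, metadata: dict) -> str:
        """Store the metadata and return a newly assigned DOI."""
        doi = f"{self.prefix}/dmp-{len(self.records) + 1}"
        self.records[doi] = {"metadata": dict(metadata), "awards": []}
        return doi

    def add_award(self, doi: str, award: dict) -> None:
        """Record award information against an existing DOI."""
        self.records[doi]["awards"].append(award)


def landing_page(doi: str, record: dict) -> str:
    """Render a plain-text landing page for a DMP DOI."""
    lines = [f"DMP {doi}", f"Title: {record['metadata']['title']}"]
    lines += [f"Award: {award['id']}" for award in record["awards"]]
    return "\n".join(lines)


def run_workflow(
    metadata: dict,
    registry: FakeRegistry,
    lookup_award: Callable[[str], Optional[dict]],
) -> Tuple[str, str]:
    doi = registry.mint(metadata)                     # accept metadata, mint a DOI
    page = landing_page(doi, registry.records[doi])   # generate the landing page
    award = lookup_award(metadata["title"])           # harvester checks an outside source
    if award is not None:
        registry.add_award(doi, award)                # send award info to the registry
        page = landing_page(doi, registry.records[doi])
    return doi, page


# Hypothetical harvester result for illustration only.
def fake_award_lookup(title: str) -> Optional[dict]:
    return {"id": "0000000"} if "Example" in title else None


registry = FakeRegistry()
doi, page = run_workflow({"title": "Example plan"}, registry, fake_award_lookup)
print(page)
```

Passing the award lookup in as a function keeps the harvester separate from the minting logic, mirroring the separate harvester application described in the steps.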
Our goal is to leverage the work being done by the RDA Exposing DMP working group to help us address the privacy concerns raised by exposing certain types of assertions on this landing page.
Looking ahead, we plan to produce a basic prototype ready for testing and feedback by the end of October. I will be presenting on our work thus far at the upcoming RDA and CODATA meetings. During these meetings, I look forward to continuing our work with the RDA Common Standards Working Group (and to meeting many of those active in this space for the first time in-person)!
Once we establish the workflow to record assertions to a DMP DOI, our next phase of work will include pilot projects with domain-specific and institutional stakeholders to test the flow and integration of relevant information across services and systems. With these partners we plan to test how maDMPs can help track data management activities as they occur during the course of a grant project.
Finally, it’s important to note that all of our development work is being done in a test environment where we will continue to iterate for the next several months as we determine how best to deploy new features to the DMPTool and DMPRoadmap codebase.
Interested in contributing?
Lastly, we realize that maDMP is far from the most euphonious or creative name for this service (nor is our original idea of the DMPHub much better). We are open to any and all ideas for naming this work so if you have any ideas, however strange or off the wall, please do let us know. If we use your idea we promise to shower you with accolades for your denomination genius. Also, free stickers galore.
To review or contribute to the technical components of the project check out our GitHub. And most importantly, please send any and all feedback, questions, or ideas for names to email@example.com.