Supporting FAIR Data with Integrations

As part of our work to extend the usability and interoperability of the new Networked DMP, we have partnered with Research Space (RSpace), a connected electronic lab notebook, and developed a prototype integration that allows users to track research data throughout the research life cycle. This new integration enables tri-directional data flows between RSpace, DMPTool, and data repositories, facilitating higher quality and more comprehensive research data capture and tracking.

The lack of interoperability between tools is arguably the most significant barrier to streamlining workflows throughout the research lifecycle. This gap prevents the comprehensive collection and incorporation of research data and metadata into the research record captured during the active research phase. Furthermore, it limits the scope for passing this data and metadata on to data repositories, thus undermining FAIR data principles and reproducibility. This gap is precisely what the new integration seeks to address.

Researchers are now able to reference and update their DMPs from within RSpace. The aims of this feature are:

  1. To add value to DMPs by linking them to the datasets generated throughout a study, turning plans into ‘living documents.’
  2. To reduce the burden on researchers of keeping their DMPs up to date.
  3. To append DOIs and links to datasets exported from RSpace to a repository, so that they are associated with the DMP.

Learn More

To see this workflow in action, check out this short demo from FORCE 2021, where Julie Goldman from Harvard Medical School explains how the DMPTool – RSpace – Dataverse workflow enhances research data capture in the Harvard environment. The user documentation also contains a walk-through of the feature with instructions.

Additionally, an upcoming FORCE11 Community Call on January 25, 2022, will showcase this and other integrations and demonstrate ways in which data capture and reproducibility are enhanced by tool interoperability.  

Please reach out with any comments, suggestions, or ideas for future integrations!

FAIR Island Project Receives NSF Funding

FAIR Island

Crossposted from the FAIR Island website

The California Digital Library (CDL), University of California Gump South Pacific Research Station, Berkeley Institute for Data Science (BIDS), Metadata Game Changers, and DataCite are pleased to announce that they have been awarded a 2-year NSF EAGER grant entitled “The FAIR Island Project for Place-based Open Science” (full proposal text).

The FAIR Island project examines the impact of implementing optimal research data management policies and requirements, affording us the unique opportunity to look at the outcomes of strong data policies at a working field station. Building on the Island Digital Ecosystem Avatars (IDEA) Consortium (see Davies et al. 2016), the FAIR Island Project leverages collaboration between the Gump Station on the island of Moorea in French Polynesia (host of the NSF Moorea Coral Reef Long-Term Ecological Research site), and Tetiaroa Society, which operates a newly established field station located on the atoll of Tetiaroa a short distance from Moorea. 

The FAIR Island project builds interoperability between pieces of critical research infrastructure — DMPs, research practice, PIDs, data policy, and publications — contributing to the advancement and adoption of Open Science. In the global context, there are ongoing efforts to make science Open and FAIR in order to bring more rigor to the research process, in turn increasing the reproducibility and reusability of scientific results. DataCite, as a global partner in the project, has been working to recognize the importance of better management of research entities. This has led to critical advances in the development of infrastructure for Open Science. Increased availability of the different research outputs of a project (datasets, pre-registrations, software, protocols, etc.) would enable the reuse of research to aggregate findings across studies, evaluate discoveries in the field, and ultimately assess and accelerate progress.

Key outcomes the FAIR Island team will develop include: 

  1. CDL, BIDS, and the University of California Natural Reserve System will work together to build an integrated system for linking research data to their associated publications via PIDs. We will develop a provenance dashboard from field to publication, documenting all research data and research outcomes derived from that data. 
  2. The project also facilitates further development of the DataCite Commons interface and extends connections made possible via the networked DMP, which allows users to track relationships between DMPs, investigators, outputs, organizations, research methods, and protocols, and to display citations throughout the research lifecycle.
  3. Developing an optimal data policy for place-based research by CDL, BIDS, and Metadata Game Changers is the cornerstone component of the FAIR Island project. A reusable place-based data policy template will be shared and implemented amongst participating UC-managed field stations and marine labs. In addition, we will be incorporating these policies into a templated data management plan within the DMPTool application and sharing it with the broader community via our website, whitepapers, and conferences such as the Research Data Alliance (RDA) Plenaries.

The FAIR Island project is in a unique position to demonstrate how we can advance open science by creating optimal FAIR data policies governing all research conducted at field stations. Starting with the field station on Tetiaroa, the project team plans to demonstrate how FAIR data practices can make data reuse and collaboration around data more efficient. Data Management Plans (DMPs) in this “FAIR data utopia” will be utilized as key documents for tracking provenance, attribution, compliance, deposit, and publication of all research data collected on the island by implementing mandatory registration requirements, including extensive use of controlled vocabularies, persistent identifiers (PIDs), and other identifiers.

The project will make significant contributions to international Open Science standards and collaborate with open infrastructure providers to provide a scalable implementation of best practices across services. In addition, DataCite seeks to extend the infrastructure services developed in the project to their member community across 48 countries and 2,500 repositories globally. 

We will continue to share details and feature developments related to the FAIR Island project via our blog. You can join the conversation at the next RDA plenary in November 2021. Feedback or questions are most welcome and can be sent directly to info@fairisland.org.

Connecting the DMP ID to an ORCID record

Recently we announced that the DMPTool can now generate persistent, unique IDs (the DMP ID) for plans created within the application. Building on this development, we are thrilled to share that the scholarly identifier service for researchers, ORCID, recently adopted the DMP as a resource type. As a result, DMPs are now a defined work type within an ORCID record and listed on an individual’s ORCID record. The connection between a DMP ID and ORCID is crucial for the Networked DMP, as ORCIDs play a key role in facilitating connections between researchers, institutions, outputs, and projects. It is precisely these types of relationships that we are enabling through our work on Networked DMPs.

Screenshot of manually adding a DMP as a work to an ORCID record

Additionally, DMP IDs generated via the DMPTool are now automatically linked to the DMP creator’s ORCID record. This means that when a DMPTool user “Registers” their plan, a DMP ID is generated, and this record is automatically pushed to ORCID and included as a work on their ORCID profile page. 
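
As an illustration of what this push might involve, here is a minimal Python sketch of building an ORCID-style work payload for a registered DMP. This is not the DMPTool's actual implementation: the payload shape loosely follows ORCID's v3.0 work schema and the new "data-management-plan" work type, but the helper function and the example DOI are our own placeholders.

```python
import json

def build_orcid_work(dmp_title: str, dmp_id_doi: str) -> dict:
    """Build an ORCID v3.0-style work payload for a registered DMP.

    The "data-management-plan" work type and the overall payload shape
    mirror ORCID's work schema, but this is an illustrative sketch;
    consult the ORCID API documentation for the authoritative format.
    """
    return {
        "title": {"title": {"value": dmp_title}},
        "type": "data-management-plan",
        "external-ids": {
            "external-id": [{
                "external-id-type": "doi",
                "external-id-value": dmp_id_doi,  # the DMP ID (a DOI)
                "external-id-relationship": "self",
            }]
        },
    }

# Registering a plan yields a DMP ID, which is wrapped in a work payload.
# The DOI below is a placeholder, not a real DMP ID.
work = build_orcid_work("My NSF Data Management Plan", "10.1234/example-dmp")
print(json.dumps(work, indent=2))
```

In the real integration, a payload like this would be sent to ORCID on the user's behalf after they grant the DMPTool permission to update their record.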

“Registering” a DMP will generate a DMP ID and push this work to the associated ORCID record
After a DMP ID is generated this work will be listed as a work on the researcher’s ORCID record

Together with Liz Krznarich from DataCite and DMPTool Editorial Board member Nina Exner from Virginia Commonwealth University, I recently participated in an ORCID Community Call demonstrating this new integration and discussing our approach to building the Networked DMP. A recording of the webinar is available here, and our combined slide deck is available here.

The DMPTool team continues to expand the Networked DMP. Development is currently underway for additional features within the DMPTool, including DMP versioning and advancing our API to facilitate external integrations. We look forward to sharing updates with you soon about these exciting advancements. In the meantime, as always, feedback or questions are most welcome and can be sent directly to maria.praetzellis@ucop.edu.

Interviews on Implementing Effective Data Practices, Part I: Why This Work Matters

Cross-posted from ARL News by Natalie Meyers, Judy Ruttenberg, and Cynthia Hudson-Vitale | October 28, 2020

In preparation for the December 2019 invitational conference, “Implementing Effective Data Practices,” hosted by the Association of Research Libraries (ARL), Association of American Universities (AAU), Association of Public and Land-grant Universities (APLU), and California Digital Library (CDL), we conducted a series of short pre-conference interviews.

We interviewed representatives from scholarly societies, research communities, funding agencies, and research libraries about their perspectives and goals around machine-readable data management plans (maDMPs) and persistent identifiers (PIDs) for data. We hoped to help expose the community to the range of objectives and concerns we bring to the questions we collectively face in adopting these practices. We asked about the value the interviewees see or wish to see in maDMPs and PIDs, their concerns, and their pre-conference goals.

In an effort to make these perspectives more widespread, we are sharing excerpts from these interviews and discussing them in the context of the final conference report that was released recently. Over the next three weeks, we will explore and discuss interview themes in the context of broad adoption of these critical tools.

Why This Work Matters

To start off this series of scholarly communications stakeholder perspectives, we need to position the importance of this infrastructure within broader goals. The overall goal of the conference was to explore the ways that stakeholders could adopt a more connected ecosystem for research data outputs. The vision of why this was important and how it would be implemented was a critical discussion point for the conference attendees.

Benjamin Pierson, then senior program officer, now deputy director for enterprise data, Bill and Melinda Gates Foundation, described the value of this infrastructure as key to solving real-world issues and making data and related outputs first-class research assets that can be reused with confidence.

Clifford Lynch, executive director, Coalition for Networked Information, described how public sharing of DMPs within an institution would create better infrastructure and coordination at the university level for research support.

From the funder perspective, Jason Gerson, senior program officer, PCORI (Patient-Centered Outcomes Research Institute), indicated that PIDs are also essential for providing credit for researchers as well as for providing funders with a mechanism to track the impact of the research they fund.

Margaret Levenstein, director, ICPSR (Inter-university Consortium for Political and Social Research), spoke about the importance of machine-readable DMPs and PIDs for enhancing research practices of graduate students and faculty as well as the usefulness for planning repository services.

For those developing policies at the national level, Dina Paltoo, then assistant director for policy development, US National Library of Medicine, currently assistant director, scientific strategy and innovation, Immediate Office of the Director, US National Heart, Lung, and Blood Institute, discussed how machine-readable data management plans are integral for connecting research assets.

All of the pre-conference interviews are available on the ARL YouTube channel.

Natalie Meyers is interim head of the Navari Family Center for Digital Scholarship and e-research librarian for University of Notre Dame, Judy Ruttenberg is senior director of scholarship and policy for ARL, and Cynthia Hudson-Vitale is head of Research Informatics and Publishing for Penn State University Libraries.

Effective Data Practices: new recommendations to support an open research ecosystem

We are pleased to announce the release of a new report written with our partners at the Association of Research Libraries (ARL), the Association of American Universities (AAU), and the Association of Public and Land-grant Universities (APLU): Implementing Effective Data Practices: Stakeholder Recommendations for Collaborative Research Support.  

The report brings together information and insights shared during a December 2019 National Science Foundation sponsored invitational conference on implementing effective data practices. In this report, experts from library, research, and scientific communities provide key recommendations for effective data practices to support a more open research ecosystem. 

During the December conference, the project team developed a set of recommendations for the broad adoption and implementation of NSF’s recommended data practices as described in the NSF’s May 2019 Dear Colleague Letter.  The report focuses on recommendations for research institutions and also provides guidance for publishers, tool builders, and professional associations. The AAU-APLU Institutional Guide to Accelerating Public Access to Research Data, forthcoming in spring 2021, will include the recommendations.

The conference focused on designing guidelines for (1) using persistent identifiers (PIDs) for datasets and (2) creating machine-readable data management plans (DMPs), both data practices recommended by NSF.


Five key takeaways from the report are:

  • Center the researcher by providing tools, education, and services that are built around data management practices that accommodate the scholarly workflow.
  • Create closer integration of library and scientific communities, including researchers, institutional offices of research, research computing, and disciplinary repositories.
  • Provide sustaining support for the open PID infrastructure that is a core community asset and essential piece of scholarly infrastructure. Beyond adoption and use of PIDs, organizations that sustain identifier registries need the support of the research community.
  • Unbundle the DMP, because the DMP as currently understood may be overloaded with too many expectations (for example, simultaneously a tool within the lab, among campus resource units, and with repositories and funding agencies). Unbundling may allow for different parts of a DMP to serve distinct and specific purposes.
  • Unlock discovery by connecting PIDs across repositories to assemble diverse data to answer new questions, advance scholarship, and accelerate adoption by researchers.

The report also identifies five core PIDs that are fundamental and foundational to an open data ecosystem. Using these PIDs will ensure that basic metadata about research is standardized, networked, and discoverable in scholarly infrastructure: 

  1. Digital object identifiers (DOIs) from DataCite to identify research data, as well as from Crossref to identify publications
  2. Open Researcher and Contributor (ORCID) iDs to identify researchers
  3. Research Organization Registry (ROR) IDs to identify research organization affiliations 
  4. Crossref Funder Registry IDs to identify research funders 
  5. Crossref Grant IDs to identify grants and other types of research awards
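
To make this concrete, the sketch below shows a hypothetical DMP metadata record carrying all five PID types, plus a small well-formedness check. The identifier values are placeholders (apart from ORCID's published example iD and NSF's Funder Registry ID), and the record's field names are invented for illustration rather than taken from any published schema.

```python
import re

# Illustrative record linking the five core PID types. Field names are
# ours, not a standard schema; most values are placeholders.
dmp_record = {
    "dmp_doi": "10.1234/example-dmp",            # DataCite DOI (placeholder)
    "publication_doi": "10.5678/example-paper",  # Crossref DOI (placeholder)
    "creator_orcid": "0000-0002-1825-0097",      # ORCID's documented example iD
    "affiliation_ror": "https://ror.org/00x0x0x00",      # ROR ID (placeholder)
    "funder_id": "https://doi.org/10.13039/100000001",   # Funder Registry (NSF)
    "grant_doi": "10.9999/example-grant",        # Crossref Grant ID (placeholder)
}

# Simple shape checks for a few of the identifier types.
PATTERNS = {
    "dmp_doi": r"^10\.\d{4,9}/\S+$",
    "creator_orcid": r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$",
    "affiliation_ror": r"^https://ror\.org/0[0-9a-hj-km-np-tv-z]{6}\d{2}$",
}

def check_pids(record: dict) -> list:
    """Return the names of fields whose values fail their PID pattern."""
    return [k for k, pat in PATTERNS.items()
            if not re.match(pat, record.get(k, ""))]

print(check_pids(dmp_record))  # prints []
```

Checks like these only test that an identifier is well-formed; resolving the PID against its registry is what actually makes the metadata networked and discoverable.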

The report is intended to encourage collaboration and conversation among a wide range of stakeholder groups in the research enterprise by showcasing how collaborative processes help with implementing PIDs and machine-actionable DMPs (maDMPs) in ways that can advance public access to research.

The full report is now available online

This material is based upon work supported by the National Science Foundation under Grant Number 1945938. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Project team:

  • John Chodacki, California Digital Library
  • Cynthia Hudson-Vitale, Pennsylvania State University
  • Natalie Meyers, University of Notre Dame
  • Jennifer Muilenburg, University of Washington
  • Maria Praetzellis, California Digital Library
  • Kacy Redd, Association of Public and Land-grant Universities
  • Judy Ruttenberg, Association of Research Libraries
  • Katie Steen, Association of American Universities

 

Additional report and conference contributors:

  • Joel Cutcher-Gershenfeld, Brandeis University
  • Maria Gould, California Digital Library

DMPRoadmap Team at the maDMP Hackathon

Research Data Alliance (RDA) recently hosted a three-day (27–29 May 2020) machine-actionable DMP hackathon to build integrations and test the Common Standard for maDMPs. The event, coordinated through teams at RDA-Austria and TU Wien, was well attended, with over 70 participants from Australia, Europe, Africa, and North America.

The teams that work on DMPTool (dmptool.org) and DMPonline (dmponline.org) were really pleased to represent our shared DMPRoadmap codebase and to show our conformance with the standard and our ability to exchange DMPs across systems. This blog post details the work of the DMPRoadmap group at the hackathon; for a full review of all outputs, please visit the Hackathon GitHub.

What did we work on?

Maria Praetzellis and Sarah Jones, product managers from DMPRoadmap, joined the hackathon “TigTag” team and focused on mapping maDMPs to funder templates. During the hackathon, their group successfully mapped required questions from several funder-specific DMP templates, including: 

  • Horizon 2020
  • Science Europe
  • National Science Foundation
  • U.S. Geological Survey

The goal of the exercise was to develop guidance on how to normalize the ways that fields from specific funder templates can be mapped to the standard, and, when necessary, develop extensions to incorporate template specific needs. The team came up with several proposals for changes to the documentation and structure of DMP Common Standard and made a few recommendations for extensions to the standard. The team is now assembling the recommendations and will submit ideas as issues to the Common Standard GitHub so work can be tracked going forward. 

Brian Riley and Sam Rust, developers from DMPRoadmap, joined the hackathon “DMP Exchange” team and worked to determine how the RDA Common Standard JSON format could be used to exchange DMP metadata between tools. Their team provided a staging service and granted API keys to other development teams to allow testing of prototypes, which helped all participants debug issues. Over the course of the hackathon, our new maDMP API helped the developers of three other DMP systems implement their own APIs.

Based on this work, we were able to exchange maDMP metadata between DMPTool and those three systems by the end of the hackathon.  Below are screenshots of DMP exports from the Data Stewardship Wizard that were imported into the DMPTool. Because we were each using the RDA Common Standard format, the new DMP was created within the DMPTool and the appropriate metadata was successfully mapped: title, description, project start/end dates, grant ID, contact information, and contributor information.
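
To give a flavor of what was exchanged, below is a trimmed sketch of an RDA Common Standard JSON payload covering just the fields mentioned above. The field names follow the common standard's published structure as we understand it, but the values are invented and most required fields are omitted; the Common Standard GitHub holds the authoritative schema.

```python
import json

# A trimmed, illustrative maDMP payload. The top-level "dmp" key and the
# nested field names follow the RDA DMP Common Standard, but the values
# are made up and many fields are omitted for brevity.
madmp_json = """
{
  "dmp": {
    "title": "Example coral reef study DMP",
    "description": "Plan for managing survey data.",
    "contact": {
      "name": "A. Researcher",
      "mbox": "a.researcher@example.edu",
      "contact_id": {"identifier": "0000-0002-1825-0097", "type": "orcid"}
    },
    "project": [{
      "title": "Example project",
      "start": "2020-01-01",
      "end": "2022-12-31",
      "funding": [{"grant_id": {"identifier": "NSF-0000000", "type": "other"}}]
    }]
  }
}
"""

# Parse the payload and pull out the fields the hackathon teams mapped:
# title, description, project start/end dates, grant ID, and contact info.
dmp = json.loads(madmp_json)["dmp"]
print(dmp["title"], dmp["project"][0]["funding"][0]["grant_id"]["identifier"])
```

Because every system emits and consumes this one shape, a plan exported from one tool can be re-created in another without bespoke field mapping for each pair of systems.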

While the data models used by many systems do not yet offer full support of the RDA Common Standard model, progress was made across the board towards mapping high-level DMP information. Also, the confirmation that these systems could exchange information using RDA Common Standard JSON was encouraging and will likely open the door for future integrations.

Other outcomes

We also collaborated with members of the DMP Melbourne, University of Cape Town, and Stockholm University teams on an integration with their institutional repository platform. The teams were interested in pushing both DMP metadata and the physical DMP document into that repository; however, their systems did not yet support the maDMP standard. So the team created two separate prototype scripts. The first script extracts DMPs from a DMPRoadmap system, creates a placeholder Project that future datasets can be connected to, and uploads a PDF copy of the DMP. The second script converts their JSON into RDA Common Standard-compliant JSON. While their institutional repositories do not contain many DMPs at this point, a service like this could help extract DMPs for import into DMP systems that use the RDA Common Standard in the future. We hope to build upon this work to facilitate integrations with additional repositories.
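
The second script's core idea can be sketched in a few lines of Python. The source field names here ("name", "abstract", "owner_email") are hypothetical stand-ins for whatever the institutional repository actually exports; this is the general shape of such a converter, not the real prototype's code.

```python
# Sketch of converting an institutional repository's own JSON into RDA
# Common Standard-shaped JSON. The source keys are hypothetical; only the
# output's "dmp"/"title"/"description"/"contact" keys follow the standard.
def to_common_standard(repo_record: dict) -> dict:
    return {
        "dmp": {
            "title": repo_record.get("name", "Untitled DMP"),
            "description": repo_record.get("abstract", ""),
            "contact": {"mbox": repo_record.get("owner_email", "")},
        }
    }

converted = to_common_standard({
    "name": "Marine biology DMP",
    "abstract": "Plan covering field survey data.",
    "owner_email": "pi@example.edu",
})
print(converted["dmp"]["title"])  # prints Marine biology DMP
```

A converter like this is deliberately lossy in one direction only: anything it cannot map is dropped, while defaults keep the output valid against the standard.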

Future work 

Hackathon participants are now collating work produced during the hackathon into a final report. In addition, participants expressed interest in:

  • More communities. Most of the attendees at this hackathon were developers from DMP-focused tools. In the future, it would be great to have participants from other communities, including developers of CRIS systems, data repository platforms, and ethics tools.  This would help us expand the types of use cases being served.
  • More PIDs. The power of connected information relies on persistent identifiers. We would like to increase our connection with various standards and integrate with the Research Organization Registry (ROR), the Funder Registry, and the Contributor Roles Taxonomy (CRediT) to provide more structured information to support such integrations.

Thank you again to the team at RDA Austria and TU Wien for organizing the hackathon. If you’re interested in tracking future development and outputs of this work, please follow the GitHub and consider joining the RDA Common Standard Working Group or the Active DMPs Interest Group.

What’s new with our machine actionable DMP work?

Building on the conceptual framework laid out in articles such as Ten principles for machine-actionable data management plans and prior blog posts covering topics such as what maDMPs are, what they can do to support automation, utilizing common standards and PIDs, and maDMPs as living documents, we are now moving into active development on the technical aspects of our NSF-funded EAGER research project.

A phased approach: building a plan for maDMPs

The goal of our EAGER research project is to explore the potential of machine-actionable DMPs as a means to transform the DMP from a compliance exercise based on static text documents into a key component of a networked research data management ecosystem, one that will not only facilitate but also improve the research process for all stakeholders.

We will be laying out the phases of work in the coming months and will continue to use this blog to keep the community informed of our progress, and to solicit your feedback and ideas.

Phase 1 Workplan


Phase 1 of our research entails exploring the following three high-level ideas:

  1. How to best restructure the DMPTool metadata to utilize the RDA Working Group Common Standard
  2. How to optimize the Digital Object Identifier (DOI) metadata schema for DMPs 
  3. How to best incorporate other persistent identifiers (PIDs) into DMPs

Common Standards

The common data model for the creation of machine-actionable DMPs, produced by the RDA working group on DMP Common Standards, was recently released for community feedback. Our partners at the Digital Curation Centre (DCC) have now implemented this model in the DMPRoadmap codebase. A big thank you to Sam Rust from DCC for his work on this! Those interested in learning more about the Common Standard in DMPRoadmap may want to view a recent webinar recording of Sam detailing this work. This was a fundamental step towards machine-actionable DMPs, as it forms the foundation for information flow between DMPs and affiliated external systems in a standardized manner.

DOIs for DMPs

With our partners at the Digital Curation Centre (DCC), we are working to incorporate the common standard into the shared DMPRoadmap codebase and our DMPTool development plans. As part of this work, we have partnered with DataCite to update their metadata schema to better support DMPs and to optimize a workflow for generating DOIs for DMPs. By relying on the DOI infrastructure, we will then be able to use the Event Data service from DataCite to record when assertions have been made on the DOI. More on the workflows surrounding this aspect of the project below. 

DMPs and the PID graph

Projects such as FREYA have been working to connect research outputs through a PID graph. A key question underpinning much of our work is how we can best leverage the PID graph (see Principle 5: Use PIDs and controlled vocabularies) within the DMP ecosystem. To connect DMPs to the larger PID ecosystem, our first phase will also include incorporating the following persistent identifiers into the DMP as a baseline for future work:

Phase 1 workflows

As discussed above, in Phase 1, we are building a system to mint DOIs for DMPs and creating a landing page for DMP DOIs to record updates to the DOI that occur over time. Although the system can be thought of as a giant API, pulling and pushing data from various sources, we are also building a landing page for these DOIs in order to visually demonstrate the types of connections made possible by tracking a research project over time from the point of DMP creation. 

Below is a high-level overview of this workflow and a whiteboard sketch of its potential architecture. (For those who would like a more detailed view, please check out our GitHub.)

  1. maDMP system accepts common standard metadata from DMPTool (DMP Roadmap) 
  2. maDMP system sends that metadata to DataCite to mint a DOI (which it then returns to the DMPTool)
  3. A landing page is generated for the DMP DOI
  4. A separate harvester application queries outside APIs to check for assertions recorded against the DOI. For this phase of work, we will query the NSF Awards API and return any award information to the maDMP system. 
  5. The maDMP system then sends any award info returned to DataCite 
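
A highly simplified sketch of steps 2, 4, and 5 is shown below. The DataCite payload shape follows that API's general `{"data": {"attributes": ...}}` convention, but the function names and record structures are our own simplifications; the real system handles authentication, landing pages, and the harvester as separate services, and no network calls are made here.

```python
# Illustrative sketch of the Phase 1 workflow: shape common-standard
# metadata for DOI minting, then fold harvested award assertions back in.
# These helpers are our own simplifications, not the production code.

def build_datacite_payload(dmp: dict) -> dict:
    """Step 2: shape common-standard DMP metadata for DataCite DOI minting."""
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "titles": [{"title": dmp["title"]}],
                "creators": [{"name": dmp["contact"]["name"]}],
            },
        }
    }

def record_award(dmp: dict, award: dict) -> dict:
    """Steps 4-5: attach award info harvested from a funder API so it can
    be sent on to DataCite as an assertion against the DMP's DOI."""
    dmp.setdefault("project", [{}])[0]["funding"] = [
        {"grant_id": {"identifier": award["id"], "type": "other"}}
    ]
    return dmp

dmp = {"title": "Example DMP", "contact": {"name": "A. Researcher"}}
payload = build_datacite_payload(dmp)
dmp = record_award(dmp, {"id": "NSF-0000000"})  # placeholder award ID
print(payload["data"]["attributes"]["titles"][0]["title"])  # prints Example DMP
```

The key design point the steps above illustrate is that the DOI acts as the join key: the minting payload and every later assertion reference the same identifier, so updates accumulate on one record over the life of the project.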

Our goal is to leverage the work being done by the RDA Exposing DMP working group to help address the privacy concerns of exposing certain types of assertions on this landing page.

Next Steps

Looking ahead, we plan to produce a basic prototype ready for testing and feedback by the end of October. I will be presenting on our work thus far at the upcoming RDA and CODATA meetings. During these meetings, I look forward to continuing our work with the RDA Common Standards Working Group (and to meeting many of those active in this space for the first time in-person)! 

Once we establish the workflow to record assertions to a DMP DOI, our next phase of work will include pilot projects with domain-specific and institutional stakeholders to test the flow and integration of relevant information across services and systems. With these partners we plan to test how maDMPs can help track data management activities as they occur during the course of a grant project. 

Finally, it’s important to note that all of our development work is being done in a test environment where we will continue to iterate for the next several months as we determine how best to deploy new features to the DMPTool and DMPRoadmap codebase. 

Interested in contributing?

Lastly, we realize that maDMP is far from the most euphonious or creative name for this service (nor is our original idea of the DMPHub much better). We are open to any and all ideas for naming this work so if you have any ideas, however strange or off the wall, please do let us know. If we use your idea we promise to shower you with accolades for your denomination genius. Also, free stickers galore.

To review or contribute to the technical components of the project check out our GitHub. And most importantly, please send any and all feedback, questions, or ideas for names to maria.praetzellis@ucop.edu.

 

Representing time in machine-actionable DMPs

In this next installment of the machine-actionable DMP blog series, we want to address the broader context of time and home in on answering the following question:

How and when do you update some piece of information in a DMP?

This happens to be the substance of Principle 9 from our preprint (Miksa et al. 2018), forthcoming in PLOS: maDMPs should be versioned, updatable, living documents.

DMPs should not just be seen as a “plan” but as updatable, versioned documents representing and recording the actual state of data management as the project unfolds. The act of planning is far more important than the plan itself, and to derive value for researchers and other stakeholders, the plan needs to evolve. DMPs should track the course of research activities from planning to sharing and preserving outputs, recording key events over the course of a project to become an evolving record of activities related to the implementation of the plan.

We can all agree that it’s important to treat maDMPs as living documents, but there are multiple approaches we might take to updating them, and multiple stakeholders who should be able to provide updates for particular pieces of information at particular points along the way. First we’ll provide a quick overview of the current state of DMP-time as represented in systems and policies related to our NSF EAGER project, plus a handful of other relevant systems and policies that extend the geographical and organizational scope. Then, we’ll pitch an idea for how we can handle DMP-time using Crossref/DataCite Event Data Service. We welcome, nay encourage your feedback about this and other ideas as we experiment and iterate and prove things out in practice.

Representing time in DMPs

So we built a graph database with seed data from our partners at BCO-DMO and the UC Gump Field Station on Moorea, and enriched it with information from the NSF Awards API and public plans created with the DMPTool. All of the projects represented in the database correspond with NSF awards and therefore the DMPs have an associated timeline of:

  1. Create DMP and submit grant proposal (via institutional Office of Research, NSF Fastlane system)
  2. Grant awarded (grant number issued by NSF)
  3. Grant period ends, final report due (data deposited at appropriate repository)

This current grant/DMP workflow fails to capture information about actual data management activities as they unfold over the course of a project. However, data management staff at BCO-DMO and the Gump Field Station intervene and provide manual updates in their own repository systems opportunistically. These updates can occur during the active stages of multi-year projects, but most are made at the grant closeout stage, when researchers are engaged in reporting activities and aware that they must deposit their data. Relevant NSF program officers from the Geosciences Directorate conduct manual compliance checks to ensure that grantees have deposited data before issuing a new award, which is a very useful feature of this case study.

In addition to the data repository systems, information about these projects flows through institutional grant management systems and NSF’s Fastlane system, and a subset is made publicly available via the NSF Awards API (example of our award). Each of these systems records the start date and end date for the award, and some include interim reporting dates. Our ongoing analysis for maDMP prototyping is focused on identifying additional milestones during the course of a project and which stakeholders should be responsible for updating which pieces of information…drilling into the original question: how and when do you update things?

DMP-time in European contexts

To avoid an overly narrow focus on one national context and one funding agency in this larger thematic discussion about time, we’ll also consider some European examples. The European Commission’s Horizon 2020 program acknowledges that information about research data changes from the planning stage through final preservation; as a result, DMPs have built-in versioning. Horizon 2020 projects that receive an award must submit a first version of the DMP within the first 6 months of the project. The DMP needs to be updated over the course of the project whenever significant changes arise; however, this “requirement” is somewhat vague and reads more like a best practice. Updated versions of the DMP are required at any periodic reporting deadline and at the time of the final report. DMPonline provides an optional set of Horizon 2020 templates that includes a 1) Initial DMP, 2) Detailed DMP, and 3) Final review DMP.

Our maDMP collaborators at the Technical University of Vienna are forging ahead with their own institutional prototyping efforts to automate DMPs and integrate them with local infrastructure. They just released this excellent interactive “mockups” tool and invite your feedback. Within the mockups system, time is represented through the concept of DMP Granularity and in some cases this is related to funding status. The level of granularity corresponds roughly with versions, which carry the labels “initial, detailed, or sophisticated.”

Representing time in maDMPs: Ideas for the future

The ability to update DMPs is central to our own plans for realizing machine-actionability and relies on infrastructure that already exists. In a nutshell, our idea is to insert DMPs and corresponding grant numbers into the sprawling web of information connecting people and their published outputs. We think the mechanism for accomplishing this is to issue DataCite DOIs for DMPs: this creates an identifier against which we can assert things programmatically. In addition, this hooks DMPs into Crossref/DataCite Event Data, which is a stream of assertions of relationships between research-related things. Existing and emerging registries of information are already leveraging this infrastructure—Scholix, ORCID, Wikidata, Make Data Count, etc. DMPs and grant numbers would provide a view of the connections between everything at the project level.

Documentation for Event Data explains that it “is a hub for the collection and distribution of a variety of Events and contains data from a selection of Sources. Every Event has a time at which it was created. This is usually soon after the Event was observed. In addition to this, every Event has a theoretical date on which it occurred…dates are represented as the occurred_at, timestamp and updated_date fields on each Event. The Query API has two views which allow you to find Events filtered by both occurred_at and timestamp timescales. It also lets you query for Events that have been updated since a given date.” This hub of information would therefore support versioning of the DMP as well as dynamic updating of key pieces of information (e.g. data types, volumes, licenses, repositories) by various stakeholders over time. Stakeholders could rely on this open hub of information and begin to make plans based on it (e.g., a named repository learns that a TB of data is expected within a specific timeframe).
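
As a sketch of what such a query could look like, the following builds an Event Data Query API request filtered by occurred_at dates. The endpoint and parameter names follow the public Event Data documentation quoted above, but treat the exact shape as an assumption to verify before use; the DOI is invented:

```python
from urllib.parse import urlencode

EVENT_DATA_API = "https://api.eventdata.crossref.org/v1/events"

def build_query(dmp_doi, from_occurred, until_occurred, mailto="you@example.org"):
    """Build an Event Data Query API URL asking for Events whose object
    is the DMP's DOI, filtered by the date each Event occurred."""
    params = {
        "obj-id": dmp_doi,                    # events pointing at the DMP
        "from-occurred-date": from_occurred,  # YYYY-MM-DD
        "until-occurred-date": until_occurred,
        "mailto": mailto,                     # polite-pool contact address
    }
    return f"{EVENT_DATA_API}?{urlencode(params)}"

url = build_query("10.1234/dmp.5678", "2019-01-01", "2019-12-31")
```

A repository manager could run a query like this on a schedule to learn about any new assertions made against a DMP during a reporting window.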

In this scenario, the DMP would become an assertion store (cf. Wikidata and Wikibase). The assertion store would have a timeline component, and anyone could use the DMP identifier to ping/query the Event Data Query API and find out what’s been asserted about the project. Various DMP stakeholders could also assert things about the project and update information over time. Each stakeholder could query and model DMP information based on the types of relationships and get the specific details they’re interested in… so an institution could discover who their PIs are collaborating with, a funder could check if a dataset has been deposited in a named repository, a repository manager could search for any changes to a specific project or all relevant projects within a specific date range, etc. In fact, Wikidata has already begun indexing policies; once this happens at scale and is integrated with indexing of datasets, we could have automated dashboards displaying policy compliance and project progress.
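
To make the stakeholder queries concrete, here is a minimal sketch of filtering such assertions by relation type. The field names mirror Event Data records (subj_id, obj_id, relation_type_id), but the relation labels and identifiers are made up for the example:

```python
# Illustrative Events as they might appear in an assertion store.
events = [
    {"subj_id": "doi:10.1234/dmp.5678", "obj_id": "doi:10.5555/dataset.1",
     "relation_type_id": "references", "occurred_at": "2019-03-02"},
    {"subj_id": "doi:10.1234/dmp.5678", "obj_id": "https://orcid.org/0000-1111-2222-3333",
     "relation_type_id": "is_authored_by", "occurred_at": "2019-01-10"},
]

def assertions_about(events, dmp_id, relation=None):
    """All assertions whose subject is the DMP, optionally of one relation type."""
    hits = [e for e in events if e["subj_id"] == dmp_id]
    if relation:
        hits = [e for e in hits if e["relation_type_id"] == relation]
    return hits

# A funder checking whether any dataset has been linked to this DMP:
deposits = assertions_about(events, "doi:10.1234/dmp.5678", relation="references")
```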

That’s about it. Please tell us what you think about this approach to transforming a DMP into something active and updated, versioned and linked to research outputs.

Common standards and PIDs for machine-actionable DMPs

QR code cupcakes

From Flickr by Amber Case CC BY-NC 2.0 https://www.flickr.com/photos/caseorganic/4663192783/

Picking up where we left off from “Machine-actionable DMPs: What can we automate?”… Let’s unpack a couple of topics central to our machine-actionable DMP prototyping and automating efforts. These are the top rallying themes from all conversations, workshops, and working groups we’ve been privy to in the past few years. In addition, they feature in the “10 principles for machine-actionable DMPs” (principles 4 and 5):

  • DMP common standards
  • Persistent identifiers (PIDs)

DMP common standards
There’s community consensus about the need to first establish common standards for DMPs in order to enable anything else (Simms et al. 2017). Interoperability and delivery of DMP information across systems—to alleviate administrative burdens, improve quality of information, and reap other benefits—requires a common data model.

To address this requirement, the DMP Common Standards working group was launched at the 9th RDA plenary meeting in Barcelona. They’re making excellent progress and are on track to deliver a set of recommendations in 2019, which we intend to incorporate into our existing tools and emerging prototypes. Adoption of the common data model will enable tools and systems (e.g., CRIS, repositories, funder systems) involved in processing research data to read and write information to/from DMPs. The working group deliverables will be publicly available under a CC0 license and will consist of models, software, and documentation. For a summary of their scope and activities to date see Miksa et al. 2018.

A second round of consultation is underway currently to tease out more details and gather additional requirements about what DMP info is needed when for each stakeholder group. This international, multi-stakeholder working group is open to all; check out their session at the next RDA plenary in Botswana and contribute to the DMP common data model (6 Nov; remote participation is available).

Current/traditional DMPs - model questionnaires

<administrative_data>
    <question>Who will be the Principal Investigator?</question>
    <answer>The PI will be John Smith from our university.</answer>
</administrative_data>
Machine-actionable DMPs - model information

"dc:creator": [ {
         "foaf:name": "John Smith",
         "@id": "orcid.org/0000-1111-2222-3333",
         "foaf:mbox": "mailto:jsmith@tuwien.ac.at",
         "madmp:institution": "AT-Vienna-University-of-Technology"
} ],

Caption: An example of data models for traditional DMPs (upper part) and machine-actionable DMPs (lower part). (Miksa et al. 2018: Fig. 1)
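
The difference matters because the structured record can be consumed programmatically, where the free-text answer cannot. A minimal sketch of extracting the ORCID iD from the machine-actionable creator record in the figure (the helper name is ours, not part of any maDMP standard):

```python
# The machine-actionable creator record from the figure, as a Python dict.
creator = {
    "foaf:name": "John Smith",
    "@id": "orcid.org/0000-1111-2222-3333",
    "foaf:mbox": "mailto:jsmith@tuwien.ac.at",
    "madmp:institution": "AT-Vienna-University-of-Technology",
}

def orcid_of(creator):
    """Pull the bare ORCID iD out of the @id field, if one is present."""
    identifier = creator.get("@id", "")
    return identifier.rsplit("/", 1)[-1] if "orcid.org" in identifier else None

orcid = orcid_of(creator)
```

No parsing of the prose answer “The PI will be John Smith from our university” could do this reliably.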

PIDs and DMPs
The story of PIDs in DMPs, or at least my involvement in the discussion, began with a lot of hand waving and musical puns at PIDapalooza 2016 (slides). After a positive reception and many deep follow-on conversations (unexpected yet gratifying to discover a new nerd community), things evolved into what is now a serious exploration of how to leverage PIDs for and in DMPs. The promise of PIDs to identify and connect research-relevant entities is tremendous and we’re fortunate to ride the coattails of some smart people who are making significant strides in this arena.

For our own PID-DMP R&D we’re partnering with one of the usual PID suspects, DataCite, to draw from their expertise and technical capabilities. DataCite contributed to the timely publication of the European Commission-funded FREYA report, which provides the necessary background research and some straightforward starting points. There’s also an established RDA PID interest group that we plan to engage with more as things progress.

A primary goal of FREYA is the creation and expansion of the “PID Graph.” The PID Graph “connects and integrates PID systems, creating relationships across a network of PIDs and serving as a basis for new services.” The report summarizes the current state of PID services as well as some emerging initiatives that we hope to harness (each is classified as mature, emerging, or immature):

  • ORCID iDs for researchers (mature)
  • DOIs for publications and data (mature), and software (emerging; also see SWH IDs)
  • Research OrgIDs for organizations (aka ROR; emerging and CDL is participating so we have an intimate view)
  • Global grant IDs (emerging and very exciting to track the prototyping efforts of Wellcome, NIH, and MRC!)
  • Data repository IDs (immature but on the radar as we address DMPs)
  • Project IDs/RAiDs (emerging and we see a lot of overlap with DMPs)

It also describes a vast array of PIDs for other things, all of which are potentially useful for maDMPs as we reconfigure them as an inventory of linked research outputs (Table 1: RRIDs, protocols, research facilities, field stations, physical samples, cultural artifacts, conferences, etc.). Taken together, these efforts are aimed at extending the universe of things that can be identified with PIDs and expanding what can be done with them. This, in turn, supports automation and machine-actionability to achieve better research data management and promote open science.
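
A toy sketch of the PID Graph idea, with a DMP DOI linked to a grant ID, an ORCID iD, and a dataset DOI. All identifiers and relation labels below are invented for illustration:

```python
# Nodes are PIDs; edges are typed relationships between them.
edges = [
    ("doi:10.1234/dmp.5678", "funded_by", "grant:NSF-0000000"),
    ("doi:10.1234/dmp.5678", "authored_by", "orcid:0000-1111-2222-3333"),
    ("doi:10.5555/dataset.1", "documented_in", "doi:10.1234/dmp.5678"),
]

def neighbors(edges, pid):
    """Every PID one hop away from `pid`, in either direction."""
    out = {dst for src, _, dst in edges if src == pid}
    inc = {src for src, _, dst in edges if dst == pid}
    return out | inc

# Starting from the DMP's DOI, we recover the project-level view:
linked = neighbors(edges, "doi:10.1234/dmp.5678")
```

This is the sense in which a DMP DOI plus a grant ID can act as the hub that connects everything at the project level.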

Summing up
For now we’ll continue exploring our graph database and interviewing stakeholders who contributed seed data to dive deeper into their workflows, challenges, and use cases for maDMPs. This runs parallel to the activities of the RDA DMP Common Standards WG and various emerging PID initiatives. Based on this overlapping community research, we can move forward with outlining what to implement and test. The recommendations of the RDA group for DMP common standards are a given, and below is a high-level plan for PID prototyping:

PIDs for DMPs and PIDs in DMPs:

  • DOIs for DMPs: define metadata
  • PIDs in DMPs: What can we achieve by leveraging mature PID services? How do we make the information flow between stakeholders and systems?
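
As a first pass at the “define metadata” task, here is a sketch of what a DataCite REST API payload for registering a DMP DOI might look like. The attribute names follow DataCite’s public schema as we understand it, but check them against current DataCite documentation before relying on this shape; the DOI and creator are invented:

```python
import json

def dmp_doi_payload(doi, title, creator_name, orcid, year):
    """Sketch of a DataCite REST API payload for minting a DOI for a DMP."""
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "doi": doi,
                "titles": [{"title": title}],
                "creators": [{
                    "name": creator_name,
                    "nameIdentifiers": [{
                        "nameIdentifier": f"https://orcid.org/{orcid}",
                        "nameIdentifierScheme": "ORCID",
                    }],
                }],
                "publicationYear": year,
                # A general type of "Text" plus a specific resourceType
                # is one way to flag the record as a DMP.
                "types": {"resourceTypeGeneral": "Text",
                          "resourceType": "Data Management Plan"},
            },
        }
    }

payload = dmp_doi_payload("10.1234/dmp.5678", "DMP for Project X",
                          "Smith, John", "0000-1111-2222-3333", 2018)
body = json.dumps(payload)
```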

Stay tuned as the story develops here on the blog! I’ll also be presenting on maDMPs in a data repositories session convened by our BCO-DMO partners at the upcoming American Geophysical Union meeting in DC (program here, 11 Dec). And Daniel Mietchen will be at PIDapalooza 2019 (Dublin, 23-24 Jan) promoting a highly relevant initiative: PIDs for FAIR ethics review processes.

Roadmap back to school edition

Summer activities and latest (major 2.0.0) release
The DMPRoadmap team is checking in with an overdue update after rotating holidays and work travels over the past few months. We also experienced some core team staff transitions and began juggling some parallel projects. As a result we haven’t been following a regular development schedule, but we have been busy tidying up the codebase and documentation.

This post summarizes the contents of the major release and provides instructions for those with existing installations, who will need to make some configuration changes in order to upgrade to the latest and greatest DMPRoadmap code. In addition to infrastructure improvements, we fixed some bugs and completed some feature enhancements. We appreciate the feedback and encourage you to keep it coming, since this helps us set priorities (listed on the development roadmap) and meet the data management planning needs of our increasingly international user community. On that note, we welcome Japan (National Institute of Informatics) and South Africa (NeDICC) as additional voices in the DMP conversation!

Read on for more details about all the great things packed into the latest release, as well as some general updates about our services and of course machine-actionable DMPs. The DCC has already pushed the release out to its services and the DMPTool will be upgrading soon – separate communications to follow. Those who run their own instances should check out the full release notes and a video tutorial on the validations and data clean-up (thanks Gavin!) to complete the upgrade.

DMPRoadmap housekeeping work (full release notes, highlights below)

  • Instructions for existing installations to upgrade to the latest release. Please read and follow these carefully to prevent any issues arising from invalid data. We highly recommend that you backup your existing database before running through these steps to prepare your system for Roadmap 2.0.0!
  • Added a full suite of automated unit tests to make it easier to incorporate external contributions and improve overall reliability.
  • Added data validations for improved data integrity.
  • Created new and revised existing documentation for coding conventions, tests, translations, etc (Github wiki). We can now update existing translations and add new ones more efficiently.

DMPRoadmap new features and bug fixes

  • Comments are now visible by default without having to click ‘Show.’ Stay tuned for additional improvements to the plan comments functionality in upcoming sprints.
  • Renamed/standardized text labels for ‘Save’ buttons for clarity.
  • Added a button to download a list of org users as a csv file (Admin > ‘Users’ page)
  • Added a global usage report for total users and plans for all orgs (Admin > ‘Usage’ page)
  • Admins can create customized template sections and place them at the beginning or end of funder templates via drag-and-drop
  • Removed multi-select box as an answer format and replaced with multiple choice

DCC/DMPonline subscriptions
[Please note: this does not apply to DMPTool users] Another recent change is in the DMPonline service delivery model. The DCC has been running DMP services for overseas clients for several years and is now transitioning the core DMPonline tool to a subscription model based on administrator access to the tool. The core functionality (developing, sharing, and publishing DMPs) remains freely accessible to all, as do the templates, guidance, and user manuals we offer. We also remain committed to the open-source DMPRoadmap codebase. The charges cover the support infrastructure necessary to run a production-level international service. More information is available for our users in a recent announcement. We’re also growing the support team to keep up with the requests we’re receiving. If you are interested in being at the cutting edge of DMP services and engaging with the international community to define future directions, please apply to join us!

Machine-actionable DMPs
Increasing the opportunities for machine-actionability of DMPs was one of the spurs behind the DMPRoadmap collaboration. Some facilities already exist via a number of standard identifiers, and we’re moving forward on both standards development and code development and testing.

The CDL has been prototyping for the NSF EAGER grant and started a blog series focused on this work (#1, #2, next installment forthcoming), with an eye to seeding conversations and sharing experiences as many of us begin to experiment in multiple directions. CDL prototyping efforts are currently separate from the DMPRoadmap project but will inform future enhancements.

We’re also attempting to inventory global activities and projects on https://activedmps.org/. Some updates for this page are in the works to highlight new requirements and tools. Please add any other updates you’re aware of! Sarah ran a workshop in South Africa in August on behalf of NeDICC to gather requirements for machine-actionable DMPs there, and the DCC will be hosting a visit from DIRISA in December. All the content from the workshop is on Zenodo, and you can see how engaged the audience got in mapping our solutions. The DCC is also presenting on recent trends in DMPs as part of the OpenAIRE and FOSTER webinar series for Open Access Week 2018. The talk maps out the current and emerging tools from a European perspective. Check out the slides and video.

You can also check out the preprint and/or stop by the poster for ‘Ten Principles for Machine-Actionable DMPs’ at Force2018 in Montreal and the RDA plenary in Botswana. This work presents 10 community-generated principles to put machine-actionable DMPs into practice and realize their benefits. The principles describe specific actions that various stakeholders are already undertaking or should take.

We encourage everyone to contribute to the session for the DMP Common Standards working group at the next RDA plenary (Nov 5-8 in Botswana). There is community consensus that interoperability and delivery of DMP information across systems requires a common data model; this group aims to deliver a framework for this essential first step in actualizing machine-actionable DMPs.