Scoping Machine-Actionable DMPs

Machine-actionable data management plans (maDMPs) are happening. Over the past several years we’ve contributed to community discussions and various events to suss out what we all mean by this term and why we think maDMPs are important. In the midst of these efforts, we (California Digital Library) also received an NSF EAGER grant to prototype maDMPs and are now in the process of designing that work.

To connect our prototyping with the constantly evolving maDMP landscape, we remain active in the Research Data Alliance, Force11, and domain-based efforts (e.g., AGU Enabling FAIR Data), and of course we run the DMPTool service as part of an international policy/support initiative called the DMP Roadmap project. We also recently helped launch a website, activedmps.org, to identify all of the people and projects across the globe working on maDMPs.

In keeping with this community thread, as well as for our own edification, we’re kicking off an maDMP blog series. The primary goal is to offer some framing documents so other stakeholders, especially those who’ve invested as much time as we have thinking about such an obscure topic (!), can help us ask and answer the many outstanding questions about maDMPs. A secondary motivation is to respond to the frequent queries from our users and other stakeholders about how to envision and plan for an maDMP future, which seems inevitable as more of us begin to prototype in different directions.

For this inaugural scoping piece we want to address the following high-level questions. And just to reiterate, the answers herein are distilled from our own thinking; by no means do we think that these are the correct or only answers. We invite others to challenge our ideas at any/every step along the way.

  1. What are maDMPs?
  2. What are they not? 
  3. Who are they for?
  4. How are they different from “traditional” DMPs?
  5. What does this mean for the future of DMPs and support services?

…What comes next?

 

1. What are maDMPs?
maDMPs are a vehicle for reporting on the intentions and outcomes of a research project that enables information exchange across relevant parties and systems. They contain an inventory of key information about a project and its outputs (not just data), with a change history that stakeholders can query for updated information about the project over its lifetime. The basic framework requires common data models for exchanging information, currently under development in the RDA DMP Common Standards WG, as well as a shared ecosystem of services that send notifications and act on behalf of humans. Other components of the vision include machine-actionable policies, persistent identifiers (PIDs; e.g., ORCID iDs, funder IDs, forthcoming Org IDs, RRIDs for biomedical resources, protocols.io, IGSNs for geosamples, etc.), and the removal of barriers to information sharing.
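To make the "inventory plus change history" idea concrete, here is a minimal Python sketch. It is only an illustration: the `MaDMP` class, its field names, and the identifier values are assumptions invented for this example, and it is not the common data model being developed in the RDA working group.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class MaDMP:
    """Hypothetical, heavily simplified maDMP record (not the RDA model)."""
    title: str
    contact_orcid: str   # PID for the contact person
    funder_id: str       # PID for the funder
    outputs: list = field(default_factory=list)   # inventory of project outputs
    history: list = field(default_factory=list)   # change history

    def update(self, actor, field_name, new_value):
        """Set a top-level field and log who changed what."""
        old_value = getattr(self, field_name)
        setattr(self, field_name, new_value)
        self.history.append({
            "version": len(self.history) + 1,
            "actor": actor,
            "field": field_name,
            "old": old_value,
            "new": new_value,
        })

    def changes_since(self, version):
        """Return every change recorded after the given version number."""
        return [c for c in self.history if c["version"] > version]


# All values below are placeholders chosen for illustration.
plan = MaDMP(
    title="Coral reef survey",
    contact_orcid="https://orcid.org/0000-0002-1825-0097",
    funder_id="funder:EXAMPLE",
)
plan.update("researcher", "title", "Coral reef survey, Moorea")
plan.update("dmp-service", "outputs",
            plan.outputs + [{"type": "dataset", "title": "Reef temperature logs"}])

# A stakeholder that last saw version 1 asks what has changed since then.
latest = plan.changes_since(1)

# The whole record serializes to JSON for exchange between systems.
exchange_payload = json.dumps(asdict(plan), indent=2)
```

The version counter stands in for whatever mechanism a real service would use (timestamps, event logs); the point is only that updates are recorded in a form other systems can query.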

2. What are they not?
maDMPs are not a collection of best practices for creating a data management plan (those exist already; see Michener 2015), nor are they a comprehensive record of every detail about a research project and how it was conducted (i.e., they are not the Open Science Framework). It is out of scope to use maDMPs to connect all the things in the universe and try to solve reproducibility. Instead, they are a plan and instructions about how to implement the plan, as well as a report about the completion of the plan; this plan includes an inventory/registry of research outputs and information about what to do with each thing (e.g., length of time to retain a dataset in a repository).

3. Who are they for?
maDMPs are focused primarily on infrastructure providers, systems, and those responsible for creating and enforcing research data policies. maDMPs are not focused primarily on researchers, data librarians, or other research support staff. However, broad adoption by all stakeholders in the research enterprise is required to achieve the goals of the policies, and ideally everyone will reap the benefits. Here is a (roughly) ranked-order list of the target audience for maDMPs:

  • Funder: Funding agencies and foundations that specify requirements for DMPs and monitor compliance.
  • Repository Operator: General (e.g., Zenodo, Dryad), disciplinary (e.g., GenBank, ICPSR), and institutional data repositories.
  • Infrastructure Provider: Providers of systems for creating DMPs (DMPTool, DMPonline), grants administration, researcher profiles (RIMS/CRIS), etc.
  • Institutional Administrator: Office of Research/Sponsored Programs, Chief Information Officers, University Librarians, others.
  • Ethics Review: Institutional Review Boards (IRB)/Research Ethics Boards (REB) that authorize human subjects research.
  • Legal Expert: Technology transfer offices; copyright and patent experts.
  • Publisher: Purveyors of article and data publication services.
  • Researcher: Principal Investigator and collaborators, including postdoctoral researchers, graduate and undergraduate students.
  • Research Support Staff: Data managers/curators, research administrators, and data librarians.

[Figure: machine-actionable DMP info flows]

Examples of stakeholder interactions within the ecosystem of machine-actionable DMPs. Stakeholders communicate with each other by exchanging information through DMPs. For example, a repository operator can select a proper repository, set an embargo period, and assign a correct license to data submitted by researchers. In return, a system acting on behalf of a repository operator provides a list of DOIs assigned to the data and provides information on costs of storage and preservation. This in turn can be accessed by a funder to check how the DMP was implemented.
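The exchange described in this example can be sketched in a few lines of Python. Everything here is hypothetical: the record layout, the function names, the DOI (a placeholder under a test prefix), and the cost figure are invented to show the flow, not taken from any existing service.

```python
# Hypothetical DMP record; field names and identifiers are invented.
dmp = {
    "grant": "GRANT-0001",
    "outputs": [
        {"title": "Reef temperature logs", "doi": None},
        {"title": "Fish census counts", "doi": None},
    ],
}


def record_deposit(dmp, title, doi, license_id, embargo_months, annual_cost_usd):
    """Called by a system acting for a repository operator after a deposit.

    Writes the DOI, license, embargo period, and cost back onto the DMP.
    """
    for output in dmp["outputs"]:
        if output["title"] == title:
            output.update(
                doi=doi,
                license=license_id,
                embargo_months=embargo_months,
                annual_cost_usd=annual_cost_usd,
            )
            return output
    raise KeyError(f"no planned output titled {title!r}")


def undeposited_outputs(dmp):
    """What a funder's system might ask: which planned outputs lack a DOI?"""
    return [o["title"] for o in dmp["outputs"] if not o.get("doi")]


# The repository reports back on one of the two planned datasets.
record_deposit(
    dmp,
    "Reef temperature logs",
    doi="10.5555/example.1",
    license_id="CC-BY-4.0",
    embargo_months=6,
    annual_cost_usd=12.50,
)

# The funder checks how far the plan has been implemented.
still_missing = undeposited_outputs(dmp)
```

In a real ecosystem the two functions would live in different systems and communicate through a shared data model and notifications; a single in-memory dictionary is used here only to keep the sketch short.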

4. How are they different from “traditional” DMPs?
The vision for maDMPs is to automate certain pieces of the DMP process, especially to alleviate the administrative burden of entering the same information in multiple places (e.g., it would be great if a researcher could recycle part or all of an IRB application for a DMP, or generate a Biosketch/CV automatically from their ORCID profile, or automatically generate a data availability statement when publishing data/articles). There is still a need for a human-readable narrative that describes digital research methods and outputs, but the main difference is that it should be updatable so that DMPs can become useful beyond the grant application stage.
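As one small illustration of this kind of reuse, a data availability statement could be assembled from the outputs already listed in a structured DMP. The sketch below assumes a hypothetical list of output records with `title`, `repository`, and `doi` keys; the wording of the statement and the sample DOI are likewise invented.

```python
def data_availability_statement(outputs):
    """Build a short statement from the outputs listed in a DMP.

    Each output is a dict with hypothetical keys: title, repository, doi.
    """
    deposited = [o for o in outputs if o.get("doi")]
    pending = [o for o in outputs if not o.get("doi")]

    sentences = [
        f"{o['title']} is available from {o['repository']} "
        f"at https://doi.org/{o['doi']}."
        for o in deposited
    ]
    if pending:
        titles = ", ".join(o["title"] for o in pending)
        sentences.append(f"Not yet deposited: {titles}.")
    if not sentences:
        return "No research outputs are listed in the data management plan."
    return " ".join(sentences)


# Sample records; the DOI is a placeholder, not a real dataset.
outputs = [
    {"title": "Reef temperature logs", "repository": "Dryad",
     "doi": "10.5555/example.1"},
    {"title": "Fish census counts", "repository": "Dryad", "doi": None},
]
statement = data_availability_statement(outputs)
print(statement)
```

Because the statement is derived from the plan, it can be regenerated whenever the plan is updated, which is the same property that lets the human-readable narrative stay current.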

5. What does this mean for the future of DMPs and support services?
We get asked this question often, most recently in the form of a provocative email from Dr. Devan Ray Donaldson as he was designing the curriculum for his digital curation course at Indiana University Bloomington.

Our response: Librarians and other digital curation experts absolutely have a role to play in supporting researchers with DMPs and data management issues more broadly. At CDL we spend a lot of time digging into the weeds of digital curation issues with librarians and researchers at all 10 UC campuses and we noticed that a major barrier to effectively supporting researchers is that they don’t recognize the language/jargon of digital curation. At the risk of self-promotion I’ll direct you to this guide that we created based on our collective experiences as researchers, and now as people who support researchers, called “Support Your Data.” John Borghi was the main driver of the project (more details from him here) and we’re now developing more attractive resources and a website to adapt for your purposes if you find these materials useful. The goal is to educate researchers about good data management practices by relating to their current practices, and demonstrate how small habits (e.g., file naming conventions) can amount to better/more efficient research.

… What comes next?
maDMPs present an opportunity to move DMPs beyond a compliance exercise by providing needed structure, interoperability, and added-value functionality to support open, reusable research data. We’re designing and developing an open framework for maDMPs that builds on existing initiatives and infrastructure. There are numerous efforts focused on connecting people and outputs (e.g., ORCID, Wikidata, Scholix, NCBI accession numbers). We want to link this information with grant numbers to create a dynamic inventory of assertions about a grant-funded research project (note: in the future we’ll also consider DMPs not associated with grants).
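A rough sketch of what such an inventory of assertions might look like follows. The assertion format (subject, predicate, object, source), the grant numbers, and the identifiers are all assumptions made up for this example; no real service is queried.

```python
from collections import defaultdict

# Invented assertions, as they might arrive from different systems.
assertions = [
    {"grant": "GRANT-0001", "subject": "orcid:0000-0002-1825-0097",
     "predicate": "is_investigator_on", "object": "GRANT-0001",
     "source": "funder"},
    {"grant": "GRANT-0001", "subject": "doi:10.5555/example.1",
     "predicate": "is_output_of", "object": "GRANT-0001",
     "source": "repository"},
    # The same fact reported a second time by another system.
    {"grant": "GRANT-0001", "subject": "doi:10.5555/example.1",
     "predicate": "is_output_of", "object": "GRANT-0001",
     "source": "publisher"},
    {"grant": "GRANT-0002", "subject": "doi:10.5555/example.2",
     "predicate": "is_output_of", "object": "GRANT-0002",
     "source": "repository"},
]


def build_inventory(assertions):
    """Group assertions by grant number, merging duplicates.

    Identical facts from several systems collapse into one entry that
    lists every source that reported it.
    """
    grouped = defaultdict(dict)
    for a in assertions:
        triple = (a["subject"], a["predicate"], a["object"])
        grouped[a["grant"]].setdefault(triple, set()).add(a["source"])
    return {
        grant: [
            {"subject": s, "predicate": p, "object": o,
             "sources": sorted(sources)}
            for (s, p, o), sources in triples.items()
        ]
        for grant, triples in grouped.items()
    }


inventory = build_inventory(assertions)
```

Rebuilding the inventory whenever new assertions arrive is what would make it "dynamic"; keeping the list of sources alongside each fact lets a consumer judge how well corroborated it is.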

Step 1 for us is to get seed data from our partners at BCO-DMO and the UC Berkeley Gump Field Station on Moorea and structure it to define native maDMPs. We’ll discuss subsequent steps in future blog posts. Stay tuned!

2 thoughts on “Scoping Machine-Actionable DMPs”

  1. As part of the Research Support Staff, Data Manager/curator at Stockholm University Library, I find it disturbing to be ranked (however roughly) last on the list under point 3 above. In our work, we would definitely find it very useful to get e.g. easy export output of DMPs in XML or JSON, in order to validate compliance with local / national / international policies and regulations (allowing us to construct our own validation schemas in XSD, Schematron, JSON) and to use DMPs as one of the metadata sources for archival purposes, that is by extracting metadata for the transformation of SIPs to AIPs. This would definitely help us in achieving for our researchers the aim stated under point 4 above, “to alleviate the administrative burden of entering the same information in multiple places”. (We’ve been talking to DMPonline about this already. The APIs and Ruby Gem solutions currently implemented by the DMP Roadmap are a bit too technical to be useful to us for these purposes.) Sincerely, Joakim Philipson, Stockholm University Library

    • Hi Joakim. Thanks for your comment; it provides an opportunity to clarify the ranking and reiterate that a major motivation for this post was to respond to questions from those who support researchers with DMPs and data management more broadly about what they should be doing to prepare for an maDMP future. The statement that “maDMPs are focused primarily on infrastructure providers, systems, and those responsible for creating and enforcing research data policies” means that these are the stakeholders with primary responsibility for taking the technical steps to implement maDMPs (developing a common data model, APIs, retrofitting existing systems, making machine-actionable policies, etc.). Researchers and those who support research data management are also essential players, of course, but their primary focus should still be on doing research and managing data. The ranking is more about infrastructure and implementation and who is responsible for leading the charge in this arena. But your input throughout this phase of experimentation and prototyping is of equal value. I hope this helps alleviate your concerns, as your work is really the whole point of the research enterprise!
