Common standards and PIDs for machine-actionable DMPs

QR code cupcakes. From Flickr by Amber Case, CC BY-NC 2.0: https://www.flickr.com/photos/caseorganic/4663192783/

Picking up where we left off from “Machine-actionable DMPs: What can we automate?”… Let’s unpack a couple of topics central to our machine-actionable DMP prototyping and automating efforts. These are the top rallying themes from all conversations, workshops, and working groups we’ve been privy to in the past few years. In addition, they feature in the “10 principles for machine-actionable DMPs” (principles 4 and 5):

  • DMP common standards
  • Persistent identifiers (PIDs)

DMP common standards
There’s community consensus about the need to first establish common standards for DMPs in order to enable anything else (Simms et al. 2017). Interoperability and delivery of DMP information across systems—to alleviate administrative burdens, improve quality of information, and reap other benefits—requires a common data model.

To address this requirement, the DMP Common Standards working group was launched at the 9th RDA plenary meeting in Barcelona. They’re making excellent progress and are on track to deliver a set of recommendations in 2019, which we intend to incorporate into our existing tools and emerging prototypes. Adoption of the common data model will enable tools and systems (e.g., CRIS, repositories, funder systems) involved in processing research data to read and write information to/from DMPs. The working group deliverables will be publicly available under a CC0 license and will consist of models, software, and documentation. For a summary of their scope and activities to date see Miksa et al. 2018.

A second round of consultation is currently underway to tease out more details and gather additional requirements about what DMP information is needed, and when, for each stakeholder group. This international, multi-stakeholder working group is open to all; check out their session at the next RDA plenary in Botswana (6 Nov; remote participation is available) and contribute to the DMP common data model.

Current/traditional DMPs - model questionnaires

<administrative_data>
    <question>Who will be the Principal Investigator?</question>
    <answer>The PI will be John Smith from our university.</answer>
</administrative_data>
Machine-actionable DMPs - model information

"dc:creator":[ {
         "foaf:name":"John Smith",
         "@id":"orcid.org/0000-1111-2222-3333",
         "foaf:mbox":"mailto:jsmith@tuwien.ac.at",
         "madmp:institution":"AT-Vienna-University-of-Technology"
} ],

Caption: An example of data models for traditional DMPs (upper part) and machine-actionable DMPs (lower part). (Miksa et al. 2018: Fig. 1)

PIDs and DMPs
The story of PIDs in DMPs, or at least my involvement in the discussion, began with a lot of hand waving and musical puns at PIDapalooza 2016 (slides). After a positive reception and many deep follow-on conversations (unexpected yet gratifying to discover a new nerd community), things evolved into what is now a serious exploration of how to leverage PIDs for and in DMPs. The promise of PIDs to identify and connect research-relevant entities is tremendous and we’re fortunate to ride the coattails of some smart people who are making significant strides in this arena.

For our own PID-DMP R&D we’re partnering with one of the usual PID suspects, DataCite, to draw from their expertise and technical capabilities. DataCite contributed to the timely publication of the European Commission-funded FREYA report, which provides the necessary background research and a straightforward starting point. There’s also an established RDA PID interest group that we plan to engage with more as things progress.

A primary goal of FREYA is the creation and expansion of the “PID Graph.” The PID Graph “connects and integrates PID systems, creating relationships across a network of PIDs and serving as a basis for new services.” The report summarizes the current state of PID services as well as some emerging initiatives that we hope to harness (each is classified as mature, emerging, or immature):

  • ORCID iDs for researchers (mature)
  • DOIs for publications and data (mature), and software (emerging; also see SWH IDs)
  • Research OrgIDs for organizations (aka ROR; emerging and CDL is participating so we have an intimate view)
  • Global grant IDs (emerging and very exciting to track the prototyping efforts of Wellcome, NIH, and MRC!)
  • Data repository IDs (immature but on the radar as we address DMPs)
  • Project IDs/RAiDs (emerging and we see a lot of overlap with DMPs)

It also describes a vast array of PIDs for other things, all of which are potentially useful for maDMPs as we reconfigure them as an inventory of linked research outputs (Table 1: RRIDs, protocols, research facilities, field stations, physical samples, cultural artifacts, conferences, etc. etc.). Taken together, these efforts are aimed at extending the universe of things that can be identified with PIDs and expanding what can be done with them. This, in turn, supports automation and machine-actionability to achieve better research data management and promote open science.
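
To make the PID Graph idea a bit more concrete, here is a minimal sketch, assuming the public DataCite REST API (https://api.datacite.org) and Python’s requests library, of walking from a dataset DOI to its creators’ ORCID iDs and any related identifiers. The DOI in the example is a placeholder, and this is our illustration rather than anything prescribed by FREYA:

# Minimal PID Graph walk: dataset DOI -> creator ORCID iDs and related identifiers.
# Assumes the public DataCite REST API; the DOI below is only a placeholder.
import requests

def describe_doi(doi):
    resp = requests.get(f"https://api.datacite.org/dois/{doi}",
                        headers={"Accept": "application/vnd.api+json"})
    resp.raise_for_status()
    attrs = resp.json()["data"]["attributes"]
    # Creator ORCID iDs, where the DOI metadata includes them
    orcids = [ni["nameIdentifier"]
              for creator in attrs.get("creators", [])
              for ni in creator.get("nameIdentifiers", [])
              if ni.get("nameIdentifierScheme") == "ORCID"]
    # Links to other PIDs (publications, software, samples, ...)
    related = [(r.get("relationType"), r.get("relatedIdentifier"))
               for r in attrs.get("relatedIdentifiers", [])]
    return {"title": attrs["titles"][0]["title"], "creator_orcids": orcids, "related": related}

print(describe_doi("10.5072/example-dataset"))  # placeholder DOI

Each hop in a traversal like this is just another PID lookup, which is what makes the graph framing so attractive for DMPs: a plan that stores PIDs rather than free text can be refreshed against these services at any time.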

Summing up
For now we’ll continue exploring our graph database and interviewing stakeholders who contributed seed data to dive deeper into their workflows, challenges, and use cases for maDMPs. This runs parallel to the activities of the RDA DMP Common Standards WG and various emerging PID initiatives. Based on this overlapping community research, we can move forward with outlining what to implement and test. The recommendations of the RDA group for DMP common standards are a given, and below is a high-level plan for PID prototyping:

PIDs for DMPs and PIDs in DMPs:

  • DOIs for DMPs: define metadata (a rough sketch of what this might look like follows below)
  • PIDs in DMPs: What can we achieve by leveraging mature PID services? How do we make the information flow between stakeholders and systems?
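
For the first bullet, here is a hedged sketch of the kind of metadata one might register when minting a DOI for a DMP, assuming the DataCite REST API and its JSON:API payload format; the prefix, names, identifiers, and landing-page URL are all placeholders, and the actual metadata profile for DMPs is exactly what remains to be defined:

# Hypothetical draft-DOI registration for a DMP via the DataCite REST API.
# All values (prefix, ORCID iD, landing page, credentials) are placeholders.
import requests

payload = {
    "data": {
        "type": "dois",
        "attributes": {
            "prefix": "10.5072",  # placeholder prefix; a random suffix is generated
            "titles": [{"title": "Data Management Plan for the Example Project"}],
            "creators": [{
                "name": "Smith, John",
                "nameIdentifiers": [{
                    "nameIdentifier": "https://orcid.org/0000-1111-2222-3333",
                    "nameIdentifierScheme": "ORCID"
                }]
            }],
            "publisher": "Example University",
            "publicationYear": 2018,
            "types": {"resourceTypeGeneral": "Text", "resourceType": "Data Management Plan"},
            "url": "https://dmptool.example.org/plans/12345"  # public landing page for the DMP
        }
    }
}

# Omitting the "event" attribute leaves the DOI in the draft state.
resp = requests.post("https://api.datacite.org/dois",
                     json=payload,
                     headers={"Content-Type": "application/vnd.api+json"},
                     auth=("REPO_ID", "REPO_PASSWORD"))  # placeholder repository credentials
print(resp.status_code)

The open questions are exactly which fields (types, relatedIdentifiers, versioning, and so on) a DMP record should carry, and how the record gets updated as the plan evolves; that is part of what we want to work out with DataCite.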

Stay tuned as the story develops here on the blog! I’ll also be presenting on maDMPs in a data repositories session convened by our BCO-DMO partners at the upcoming American Geophysical Union meeting in DC (program here, 11 Dec). And Daniel Mietchen will be at PIDapalooza 2019 (Dublin, 23-24 Jan) promoting a highly relevant initiative: PIDs for FAIR ethics review processes.

Roadmap back to school edition

Summer activities and latest (major 2.0.0) release
The DMPRoadmap team is checking in with an overdue update after rotating holidays and work travels over the past few months. We also experienced some core team staff transitions and began juggling some parallel projects. As a result we haven’t been following a regular development schedule, but we have been busy tidying up the codebase and documentation.

This post summarizes the contents of the major release and provides instructions for those with existing installations who will need to make some configuration changes in order to upgrade to the latest and greatest DMPRoadmap code. In addition to infrastructure improvements, we fixed some bugs and completed some feature enhancements. We appreciate the feedback and encourage you to keep it coming since this helps us set priorities (listed on the development roadmap) and meet the data management planning needs of our increasingly international user community. On that note, we welcome Japan (National Institute of Informatics) and South Africa (NeDICC) as additional voices in the DMP conversation!

Read on for more details about all the great things packed into the latest release, as well as some general updates about our services and of course machine-actionable DMPs. The DCC has already pushed the release out to its services and the DMPTool will be upgrading soon – separate communications to follow. Those who run their own instances should check out the full release notes and a video tutorial on the validations and data clean-up (thanks Gavin!) to complete the upgrade.

DMPRoadmap housekeeping work (full release notes, highlights below)

  • Instructions for existing installations to upgrade to the latest release. Please read and follow these carefully to prevent any issues arising from invalid data. We highly recommend that you back up your existing database before running through these steps to prepare your system for Roadmap 2.0.0!
  • Added a full suite of automated unit tests to make it easier to incorporate external contributions and improve overall reliability.
  • Added data validations for improved data integrity.
  • Created new and revised existing documentation for coding conventions, tests, translations, etc. (GitHub wiki). We can now update existing translations and add new ones more efficiently.

DMPRoadmap new features and bug fixes

  • Comments are now visible by default without having to click ‘Show.’ Stay tuned for additional improvements to the plan comments functionality in upcoming sprints.
  • Renamed/standardized text labels for ‘Save’ buttons for clarity.
  • Added a button to download a list of org users as a CSV file (Admin > ‘Users’ page).
  • Added a global usage report for total users and plans for all orgs (Admin > ‘Usage’ page).
  • Admins can create customized template sections and place them at the beginning or end of funder templates via drag-and-drop.
  • Removed the multi-select box as an answer format and replaced it with multiple choice.

DCC/DMPonline subscriptions

[Please note: this does not apply to DMPTool users] Another recent change is in the DMPonline service delivery model. The DCC has been running DMP services for overseas clients for several years and is now transitioning the core DMPonline tool to a subscription model based on administrator access to the tool. The core functionality (developing, sharing, and publishing DMPs) remains freely accessible to all, as do the templates, guidance, and user manuals we offer. We also remain committed to the open-source DMPRoadmap codebase. The charges cover the support infrastructure necessary to run a production-level international service. More information is available for our users in a recent announcement. We’re also growing the support team to keep up with the requests we’re receiving. If you are interested in being at the cutting edge of DMP services and engaging with the international community to define future directions, please apply to join us!

Machine-actionable DMPs
Increasing the opportunities for machine-actionability of DMPs was one of the spurs behind the DMPRoadmap collaboration. Facilities for this already exist through a number of standard identifiers, and we’re moving forward on both standards development and code development and testing.

The CDL has been prototyping for the NSF EAGER grant and started a blog series focused on this work (#1, #2, next installment forthcoming), with an eye to seeding conversations and sharing experiences as many of us begin to experiment in multiple directions. CDL prototyping efforts are currently separate from the DMPRoadmap project but will inform future enhancements.

We’re also attempting to inventory global activities and projects on https://activedmps.org/. Some updates for this page are in the works to highlight new requirements and tools; please add any other updates you’re aware of! Sarah ran a workshop in South Africa in August on behalf of NeDICC to gather requirements for machine-actionable DMPs there, and the DCC will be hosting a visit from DIRISA in December. All the content from the workshop is on Zenodo and you can see how engaged the audience got in mapping our solutions. The DCC is also presenting on recent trends in DMPs as part of the OpenAIRE and FOSTER webinar series for Open Access Week 2018. The talk maps out the current and emerging tools from a European perspective. Check out the slides and video.

You can also check out the preprint and/or stop by the poster for ‘Ten Principles for Machine-Actionable DMPs’ at Force2018 in Montreal and the RDA plenary in Botswana. This work presents 10 community-generated principles to put machine-actionable DMPs into practice and realize their benefits. The principles describe specific actions that various stakeholders are already undertaking or should take.

We encourage everyone to contribute to the session for the DMP Common Standards working group at the next RDA plenary (Nov 5-8 in Botswana). There is community consensus that interoperability and delivery of DMP information across systems requires a common data model; this group aims to deliver a framework for this essential first step in actualizing machine-actionable DMPs.

Machine-actionable DMPs: What can we automate?

Following on some initial thoughts about Scoping Machine-Actionable DMPs (maDMPs), we’re keen to dive into the substance. There are plenty of research questions we plan to explore here and over the course of our maDMP prototyping efforts. Let’s begin with these:

What can we automate?
What needs to be entered manually?

One of the major goals for maDMPs is to automate the creation and maintenance of some pieces of information.

Automation stands to alleviate administrative burdens and improve the quality of information contained in a DMP.

Thankfully, we’re not starting from scratch since Tomasz Miksa crafted an assignment for his CS students at the Vienna University of Technology (TU Wien) to build an maDMP prototype tool and answer these very questions (course details; assignment). The student reports provide valuable insights that will help guide our own and others’ work on the topic. Read on for a brief overview of the assignment and a discussion of the key results; the results are woven into answers to the questions above.

I will also note that our own project includes grant numbers as a key piece of project metadata, which is not part of this assignment. We’re currently exploring the NSF Awards API and institutional grants management systems in the context of these questions, more on this anon.

Assignment
Students were instructed to build a tool that gathers information from external sources and automatically creates a DMP. Modeled on the European Commission’s DMP requirements for Horizon 2020, students could choose to create a DMP when a project starts (first version upon receiving funding) or when a project ends and all products have been preserved/published (final report). For the first option, the tool should help researchers estimate their storage needs and select a proper repository to store their research outputs. For the second option, the tool should connect to services where data is stored to retrieve information for creating a DMP.

External (or controlled) sources of information included:

  1. Administrative info (researcher name, project title): Use one or both of these inputs to search the university profile system and/or ORCID API to retrieve additional info (affiliation, contact email, etc).
  2. Find a repository (option 1): Use the OpenDOAR API or re3data API to recommend a repository based on sample data types and location (Europe, Austria)
  3. Get metadata about things deposited in a repository (option 2): Collect as much info as possible (e.g., license, format, size) from the GitHub API for software products and from OAI-PMH-compliant repositories for other products.
  4. Select a license (if not provided in step 3): EUDAT license selector, reuse existing code.
  5. Preservation details: Allow users to tag all research products (e.g., input data, output data, software, documentation, presentation, etc.). Group them if appropriate. Provide a combo-box to define how long each product will be preserved (5, 10, 20 years).

The final reports describe the architecture and implementation of the tool; demonstrate how it works; include a human-readable DMP and an maDMP created with the tool; and answer some questions about the benefits and limitations of automation.
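
To give a flavour of steps 1 and 3 above, here is a minimal sketch of pulling administrative details from the public ORCID API and basic software metadata from the GitHub API, the kind of lookups the student tools automate. The ORCID iD and repository below are well-known public examples rather than project data, and real tools obviously need error handling and richer mappings:

# Sketch of auto-filling two DMP sections from public APIs.
# The ORCID iD and GitHub repository are public examples, not project data.
import requests

def researcher_info(orcid_id):
    """Administrative info (step 1): name pulled from the public ORCID API."""
    r = requests.get(f"https://pub.orcid.org/v3.0/{orcid_id}/record",
                     headers={"Accept": "application/json"})
    r.raise_for_status()
    name = r.json()["person"]["name"]
    return {"orcid": orcid_id,
            "name": f"{name['given-names']['value']} {name['family-name']['value']}"}

def software_info(owner, repo):
    """Product metadata (step 3): license, size, and language from the GitHub API."""
    r = requests.get(f"https://api.github.com/repos/{owner}/{repo}")
    r.raise_for_status()
    data = r.json()
    license_info = data.get("license") or {}
    return {"repository": data["html_url"],
            "license": license_info.get("spdx_id", "unspecified"),
            "size_kb": data["size"],
            "language": data.get("language")}

dmp_fragment = {"creator": researcher_info("0000-0002-1825-0097"),    # ORCID's example iD
                "software": software_info("octocat", "Hello-World")}  # GitHub's example repo
print(dmp_fragment)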

Results
The student reports emphasize that a mixture of automation and manual processes is necessary to produce DMPs that meet all of the requirements outlined by funders. They demonstrate how we can leverage automation for maDMPs and provide thoughtful analyses about how we can consume available sources of information.

Portions of a DMP that can be automated easily include:

  • Basic project details such as title, names/authors, DMP creation date
  • Information (including metadata) about the research products associated with the project (e.g., data, software…)
  • Repository details: e.g., Zenodo, GitHub for software

Other automated portions of a DMP enable some inference but aren’t adequate by themselves:

  • Licenses: can be derived from a GitHub/Zenodo link
  • Software and data preservation details: some data is given for each file; some assumptions can be made based on the repository
  • Data sharing, access, and security details: some data is given for each file; some assumptions can be made based on the repository
  • Costs/resources: estimations can be made based on the size and type of data

Portions of a DMP that cannot be completed via automation:

  • Roles and responsibilities (although at TU Wien this is partially automated; they assume the project uses their infrastructure and provide details to designate individuals responsible for backups, final data deposit, etc)
  • Licenses and policies for reuse, derivatives (complete answers must be provided manually)
  • Ethical and privacy questions

Check out this example of a human-readable landing page for the DMP produced by one student team (Rafael Konlechner and Simon Oblasser) and the corresponding JSON output for the maDMP version. Some other examples of maDMP-creation tools for both assignment options are available here (ex 1, ex 2, ex 3, ex 4, ex 5, ex 6); they’re provided as Docker containers that can be launched quickly.

Discussion
The student prototypes and some other projects in this arena (e.g., UQRDM) inform larger maDMP goals surrounding automation and maintenance/versioning (i.e., keeping info in a DMP up to date). They identify sources/systems of existing information, mechanisms (APIs, persistent identifiers) for consuming and connecting it, and some important limitations regarding the informational content that require manual interventions and enrichment.

Our own prototype is following a similar trajectory as the student assignment. We’re defining existing data sources/systems and exploring the possibilities for moving information between them. The good news is that there are lots of sources and APIs out there in the wild with implications for maDMPs. There are also lots of existing initiatives to connect all the things that could become part of an maDMP framework (e.g., Scholix, ORCIDs, OrgIDs).

By taking this approach, we want to make the creation and maintenance of a DMP an iterative and incremental process that engages all relevant stakeholders (not just researchers writing grant proposals). Researchers need guides and translators to find the best resources and do their research efficiently, and in a manner that complies with open data policies. As we noted in the previous blog post, we want to enable repository operators, research support staff, policy experts, and many others to contribute to DMPs in order to achieve good data management.

Up next
Some related questions that we’re mulling over, but won’t endeavor to answer in this post:

  • Which stakeholders and/or systems should be able to make and update assertions (in a DMP) about a grant-funded project?
  • What is required to put it all together?

A teaser for the second question: interoperability and delivery of the DMP information across systems requires a common data model for DMPs. You can join the RDA DMP Common Standards working group to contribute to this ongoing effort. We’ll unpack this one in a future blog post.

Thanks to Tomasz (also a co-chair of the RDA group) and his students for taking an inspirational lead in maDMP prototyping!

Scoping Machine-Actionable DMPs

Machine-actionable data management plans (maDMPs) are happening. Over the past several years we’ve contributed to community discussions and various events to suss out what we all mean by this term and why we think maDMPs are important. In the midst of these efforts, we (California Digital Library) also received an NSF EAGER grant to prototype maDMPs and are now in the process of designing that work.

To connect our prototyping with the constantly evolving maDMP landscape, we remain active in the Research Data Alliance, Force11, domain-based efforts (e.g., AGU Enabling FAIR Data), and of course we run the DMPTool service as part of an international policy/support initiative called the DMP Roadmap project. We also recently helped launch a website activedmps.org to identify all of the people and projects across the globe working on maDMPs.

In keeping with this community thread, as well as for our own edification, we’re kicking off an maDMP blog series. The primary goal is to offer some framing documents so other stakeholders, especially those who’ve invested as much time as we have thinking about such an obscure topic (!), can help us ask and answer the many outstanding questions about maDMPs. A secondary motivation is to respond to the frequent queries from our users and other stakeholders about how to envision and plan for an maDMP future, which seems inevitable as more of us begin to prototype in different directions.

For this inaugural scoping piece we want to address the following high-level questions. And just to reiterate, the answers herein are distilled from our own thinking; by no means do we think that these are the correct or only answers. We invite others to challenge our ideas at any/every step along the way.

  1. What are maDMPs?
  2. What are they not? 
  3. Who are they for?
  4. How are they different from “traditional” DMPs?
  5. What does this mean for the future of DMPs and support services?

…What comes next?

 

1. What are maDMPs?
maDMPs are a vehicle for reporting on the intentions and outcomes of a research project that enable information exchange across relevant parties and systems. They contain an inventory of key information about a project and its outputs (not just data), with a change history that stakeholders can query for updated information about the project over its lifetime. The basic framework requires common data models for exchanging information, currently under development in the RDA DMP Common Standards WG, as well as a shared ecosystem of services that send notifications and act on behalf of humans. Other components of the vision include machine-actionable policies, persistent identifiers (PIDs) (e.g., ORCID iDs, funder IDs, forthcoming Org IDs, RRIDs for biomedical resources, protocols.io, IGSNs for geosamples, etc), and the removal of barriers for information sharing.
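
As a purely illustrative sketch, in the spirit of the Miksa et al. fragment shown earlier and not the RDA working group’s model (which is still being defined), one entry in an maDMP’s output inventory might look something like this, with every value a placeholder:

# Purely illustrative inventory entry for a hypothetical maDMP; not the RDA data model.
# Every identifier below is a placeholder.
import json

dataset_entry = {
    "title": "CTD casts from the example field campaign",
    "dataset_id": "https://doi.org/10.5072/example-dataset",       # placeholder DOI
    "creator": "https://orcid.org/0000-1111-2222-3333",            # placeholder ORCID iD
    "grant": "https://example.org/awards/0000000",                 # placeholder grant PID
    "host_repository": "https://www.re3data.org/repository/r3dXXXXXXXXX",  # placeholder repository record
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "retention_period_years": 10,
    "last_modified": "2018-10-01"  # part of the change history stakeholders can query
}

print(json.dumps(dataset_entry, indent=2))

The particular fields matter less than the pattern: everything a machine might act on (creator, repository, license, grant) is a PID or controlled value rather than free text.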

2. What are they not?
maDMPs are not a collection of best practices for creating a data management plan (those exist already, Michener 2015) nor are they a comprehensive record of every detail about a research project and how it was conducted (i.e., they are not the Open Science Framework). It is out of scope to use maDMPs to connect all the things in the universe and try to solve reproducibility. Instead they are a plan and instructions about how to implement the plan, as well as a report about the completion of the plan; this plan includes an inventory/registry of research outputs and information about what to do with each thing (e.g., length of time to retain a dataset in a repository).

3. Who are they for?
maDMPs are focused primarily on infrastructure providers, systems, and those responsible for creating and enforcing research data policies. They are not focused primarily on researchers, data librarians, or other research support staff. However, broad adoption by all stakeholders in the research enterprise is required to achieve the goals of the policies, and ideally everyone will reap the benefits. Here is a (roughly) rank-ordered list of the target audience for maDMPs:

  • Funder: funding agencies and foundations that specify requirements for DMPs and monitor compliance.
  • Repository Operator: General (e.g., Zenodo, Dryad), disciplinary (e.g., GenBank, ICPSR), and institutional data repositories.
  • Infrastructure Provider: Providers of systems for creating DMPs (DMPTool, DMPonline), grants administration, researcher profiles (RIMS/CRIS), etc.
  • Institutional Administrator: Office of Research/Sponsored Programs, Chief Information Officers, University Librarians, others.
  • Ethics Review: Institutional Review Boards (IRB)/Research Ethics Boards (REB) that authorize human subjects research.
  • Legal Expert: Technology transfer offices; copyright and patent experts.
  • Publisher: Purveyors of article and data publication services.
  • Researcher: Principal Investigator and collaborators, including postdoctoral researchers, graduate and undergraduate students.
  • Research Support Staff: Data managers/curators, research administrators, and data librarians.

Figure: machine-actionable DMP info flows

Examples of stakeholder interactions within the ecosystem of machine-actionable DMPs. Stakeholders communicate with each other by exchanging information through DMPs. For example, a repository operator can select a proper repository, set an embargo period, and assign a correct license to data submitted by researchers. In return, a system acting on behalf of a repository operator provides a list of DOIs assigned to the data and provides information on costs of storage and preservation. This in turn can be accessed by a funder to check how the DMP was implemented.

4. How are they different from “traditional” DMPs?
The vision for maDMPs is to automate certain pieces of the DMP process, especially to alleviate the administrative burden of entering the same information in multiple places (e.g. it would be great if a researcher could recycle part or all of an IRB application for a DMP, or generate a Biosketch/CV automatically from their ORCID profile, or automatically generate a data availability statement when publishing data/articles). There is still a need for a human-readable narrative that describes digital research methods and outputs, but the main difference is that it should be updatable so that DMPs can become useful beyond the grant application stage.
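
As a toy example of that last idea, a data availability statement becomes a string a machine can assemble once the underlying facts live in structured DMP fields. This sketch assumes nothing beyond placeholder values:

# Toy sketch: assembling a data availability statement from structured DMP fields.
# The DOI and repository name are placeholders.
def availability_statement(dataset_doi, repository, license_name, embargo_until=None):
    statement = (f"The data supporting this study are openly available in {repository} "
                 f"at https://doi.org/{dataset_doi}, under a {license_name} license.")
    if embargo_until:
        statement += f" The data are under embargo until {embargo_until}."
    return statement

print(availability_statement("10.5072/example-dataset", "the Example Data Repository", "CC BY 4.0"))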

5. What does this mean for the future of DMPs and support services?
We get asked this question often, most recently in the form of a provocative email from Dr. Devan Ray Donaldson as he was designing the curriculum for his digital curation course at Indiana University Bloomington.

Our response: Librarians and other digital curation experts absolutely have a role to play in supporting researchers with DMPs and data management issues more broadly. At CDL we spend a lot of time digging into the weeds of digital curation issues with librarians and researchers at all 10 UC campuses, and we noticed that a major barrier to effectively supporting researchers is that researchers often don’t recognize the language/jargon of digital curation. At the risk of self-promotion I’ll direct you to this guide that we created based on our collective experiences as researchers, and now as people who support researchers, called “Support Your Data.” John Borghi was the main driver of the project (more details from him here) and we’re now developing more attractive resources and a website to adapt for your purposes if you find these materials useful. The goal is to educate researchers about good data management practices by relating to their current practices, and to demonstrate how small habits (e.g., file naming conventions) can amount to better/more efficient research.

… What comes next?
maDMPs present an opportunity to move DMPs beyond a compliance exercise by providing needed structure, interoperability, and added-value functionality to support open, reusable research data. We’re designing and developing an open framework for maDMPs that builds on existing initiatives and infrastructure. There are numerous efforts focused on connecting people and outputs (e.g., ORCID, Wikidata, Scholix, NCBI accession numbers). We want to link this information with grant numbers to create a dynamic inventory of assertions about a grant-funded research project (note: in the future we’ll also consider DMPs not associated with grants).

Step 1 for us is to get seed data from our partners at BCO-DMO and the UC Berkeley Gump Field Station on Moorea and structure it to define native maDMPs. We’ll discuss subsequent steps in future blog posts. Stay tuned!

Set the controls for the heart of the sun

Our DMPTool and DMPonline services have been humming along with the same underlying code for a couple of months now. Since our MVP release, we’ve shifted gears to more regular sprints. We’re also pleasantly surprised by how eager the wider DMP community has been to join forces in migrating, translating, and even contributing new features already! Here’s a brief retrospective and a glimpse into the future.

Post MVP Backlog
There is a modest backlog of work that didn’t make it into the MVP release. We’ve prioritized these issues and are focused on tying up the loose ends over the coming months. Those following the DMPRoadmap GitHub repository will notice regular releases. The goal is to settle into a steady two-week rhythm, but in the near term we’re working on slightly shorter or longer cycles to address critical bugs and some minor refactoring. Many thanks to our users on both sides of the pond who have reported issues and provided overwhelmingly positive feedback so far!

Evolving processes
We’ve been communicating with our respective user communities about new fixes and features as things pertain to them. Some things to note about our evolving development process:

  • DMPRoadmap GitHub repo: this is where most development work happens since the majority of fixes and features apply to the core codebase. This repository also contains all technical documentation, release notes, and other info for those interested in deploying their own instances or contributing to the project.
  • The DMPRoadmap wiki has a list of potential future enhancements. We’re collating ideas here and will define priorities and requirements in consultation with the community via user groups and listserv discussions. If you have other desired new features please let us know.
  • Any service-specific customizations reside in separate GitHub repos. For example, you can find the custom Single-Sign-On code in the DMPTool GitHub repo. The way that we handle helpdesk functions varies too. DMPTool users can report issues directly in the DMPTool repo or via the helpdesk. If something pertains to the common codebase, Stephanie will tag the issue and transfer it to DMPRoadmap. For DMPonline users we ask you to report issues via the helpdesk.

External contributions
Our core dev team is test driving the external contributor guidelines with the French team from DMP OPIDoR. They developed a new feature for a global notification system (e.g., to display maintenance messages, updates to funder templates) that happens to be in our backlog. The new feature looks great and is exactly the kind of contribution we’d like from others. You’ll see it in the next release. Thanks Benjamin and Quentin!

We’re also keen to commence monthly community dev calls to learn about other new features that folks might be planning and keep track of how we collaborate on DMP support across the globe.

Translations
We’ll be adding new translations for Brazilian Portuguese (thanks to Benilton de Sá Carvalho and colleagues at UNICAMP) and Finnish thanks to DMPTuuli. We’re also reaching out to fill in missing portions of existing translations for other languages since we added so many new features. New translations are always welcome; more information is available on the GitHub wiki and/or contact us.

A machine-actionable future
With the launch milestone behind us, we’re devoting more attention and resources to creating a machine-actionable future for DMPs. Two working groups hosted productive sessions at the recent RDA plenary (DMP Common Standards, Exposing DMPs) that included lightning talk presentations by members of the DMPRoadmap project (slides 1 and slides 2). Both of the groups are on track to provide actionable outputs in the next 12 months that will bolster wider community efforts on this front. We’ll continue participating in both groups as well as begin prototyping things with the NSF EAGER grant awarded to the California Digital Library. Stay tuned for more details via future updates and check out the activedmps.org site to get involved.

Prepare for launch in 3… 2… 1…

In about two weeks we will launch the new DMPTool on Tues, 27 Feb. The much-anticipated third version of the tool represents an exciting next step in what has always been a community-driven project. We’ve now successfully merged the primary US- and UK-based data management planning tools into a single codebase (DMP Roadmap): the engine under the new DMPTool hood.

Why are we doing this?

A little background for those who haven’t been following along with our codevelopment journey: in 2016 the University of California Curation Center (UC3) decided to join forces with the Digital Curation Centre (DCC) to maintain a single open-source platform for DMPs. We took this action to extend our reach beyond national boundaries and move best practices forward, with a lofty goal to begin making DMPs machine actionable (i.e., useful for managing data). We’ll continue to run our own branded services (DMPTool, DMPonline, DMPTuuli, DMPMelbourne) on the shared codebase, and incorporate partners in Canada, Argentina, South Africa, and throughout Europe who are already running their own instances (full list).

In parallel with our co-development efforts we’ve been making the rounds of Research Data Alliance, Force11, IDCC, and disciplinary meetings to collect use cases for machine-actionable DMPs (details here) and help define common standards (RDA Working Group; just posted pre-print for 10 Simple Rules for Machine-Actionable DMPs). We also got an NSF EAGER grant so we can begin prototyping muy pronto.

The new version of the DMPTool will enable us to implement and test machine-actionable things in a truly global open science ecosystem. Successful approaches to making DMPs a more useful exercise will require input from and adoption by many stakeholders so we look forward to working with our existing DMP Roadmap community (an estimated 50k+ users, 400+ participating institutions, and a growing list of funder contacts across the globe) and welcoming others into the fold!

Preparing for Launch

To help DMPTool administrators prepare themselves and their institutional users for the upcoming launch, we will host a webinar on:

Mon, 26 Feb 2018, 9-10 AM Pacific Time
Zoom link (recording on Vimeo; Q&A and slides)

By that time we’ll have a new user guide for administrators, a new Quick Start Guide for researchers, and refreshed promo materials. Everyone will have seamless access to their existing DMPTool accounts, just through a new user interface that looks and feels more like DMPonline (spoiler alert: we made it blue). And one of the most exciting things about the new tool is that it contains 34 freshly updated funder templates with links to additional funder guidance.

Stay tuned to the DMPTool communication channels in the coming weeks (blog, admin email list, Twitter) for more news and updates. We look forward to seeing you at the webinar and welcome your feedback at any point.

First annual funder template pizza party!

Photo: template editors

As we approach our target release date of Feb 2018 for the DMP Roadmap platform, the DMPTool team has embarked on a major housekeeping effort. A top-to-bottom content review is underway, and last week we began an audit of the funder templates and guidance. Ten participants gathered for an all-day, pizza-fueled event that amounted to a huge template success (but an epic pizza fail, see evidence below). We were so productive and gratified by the opportunity to analyze multiple DMP policies in a group setting that we decided to make it an annual event. Read on for more DMPTool funder template news + migration plans, followed by brief updates on the DMP Roadmap project and machine-actionable DMPs.

DMPTool funder templates

The DMPTool is a hugely popular community resource in part because it serves as a central clearinghouse of information about DMP requirements and guidance for researchers applying for grants from U.S. funding agencies. Migrating the DMPTool data to the new platform provides an opportunity to update and normalize things to maintain this value. [Side note: we’re also adding a “Last updated” field to the DMP Requirements table as an enhancement in the new platform per your feedback.]

At present the tool contains 32 templates for 16 different federal and private funders. This top 10 templates list demonstrates that our users are especially keen on getting support with NSF and NIH grant proposals, although the NEH is #7, and DOE and others aren’t far behind. Some global usage statistics to put these numbers in context: 26.8k users have created 20k plans; and we have 216 participating institutions (mostly U.S. colleges and universities).

Table: DMPTool funder templates

Our goals for the pizza party included: 1) ensuring that template language comes directly from the most recent versions of funder policy documents; and 2) applying themes (more on themes here). Staying up to date with DMP requirements remains a crowdsourced effort spearheaded by data librarians using the Twitter hashtag #OSTPResp and a Google spreadsheet. In the past year, two additional resources entered the scene: a list of public access plans from U.S. federal agencies at CENDI.gov and this lovely SPARC tool. Using these reference materials and some additional internet research, we updated 7 links to policy documents in the current DMPTool platform (NIH-GDS, NEH-ODH, NSF-CHE, NOAA, USDA-NIFA, Joint Fire Science Program, Sloan) and made some revisions to templates in the new platform (mostly formatting). We also identified some templates that require deeper investigation and/or consultation with agency contacts to verify the best way to present DMP requirements; between now and the release date we’ll continue to work on these templates. In addition, Jackie Wilson is contracting with us to finalize the clean-up of templates and guidance (checking links and guidance text provided by funders).

#pizzafail

By January we aim to have a beta DMPTool-branded version of the new platform ready for training and testing purposes. Stay tuned for a rollout plan in the new year that includes webinars for institutional administrators, with an orientation to templates and themes. Also, please note that we will be disabling template editing functionality on 18 Dec in the current version of the DMPTool to maintain the integrity of template data in the new platform. Admin users who wish to make changes to templates and guidance after that date can contact the helpdesk, but it would be great if you could keep changes to a minimum. All other functionality in the current DMPTool will remain the same up to the final migration date (adding new users and institutions, creating and editing plans, etc.).

A million thanks to the 2017 template fixing team: Amy Neeser, Joan Starr, Alana Miller, Jackie Wilson, Marisa Strong, Daniella Lowenberg, Perry Willett, John Chodacki, and Stephen Abrams.

DMP Roadmap update

The co-development team is busy building and refining the final MVP features. The usage dashboard is the last new feature left to add. In the meantime, parallel data migration efforts are underway at DCC to move from the existing 28 DMPonline themes to the new set of 14. By January both service teams will be working on new user guides, updating other content, testing and branding. If all continues to go smoothly, we’ll be on track for a DMP Roadmap demo at IDCC in Barcelona (19–22 Feb) and an official code release. Stay tuned!

Machine-actionable DMPs

On the machine-actionable DMP front, there are two items to report:

  1. We’ll be emailing the various DMP lists shortly to encourage everyone to participate in working meetings for the RDA WGs (DMP Common Standards & Exposing DMPs) at the next plenary. For now mark your calendars for 21–23 Mar and join us in Berlin!
  2. Following on a productive session at FORCE2017, we’re finishing a draft of the 10 Simple Rules for Machine-Actionable DMPs that we will circulate soon.

As always, we encourage you to contact us to get involved!

Roll up, roll up. Get yer DMP update here!

Paper seller and bench. From Flickr by henry… CC-BY-NC-ND

by Sarah Jones

Last month saw a busy Active DMPs and Domain Repositories Interest Groups joint session at the RDA Plenary in Montreal. Two new working groups have been launched to advance work in this area: one on developing Common Standards for DMPs and another on Exposing DMPs. In addition, there are multiple active projects in this space including ezDMP, the University of Queensland’s Data Management Records approach, FAIRsharing and our own DMPRoadmap project. All the slides and notes from the RDA session are available from the link above if you want to find out more. The working groups are just starting to get underway too, so please review their plans and contribute if you can.

We’ve been progressing the machine-actionable DMP agenda through the DMPRoadmap team too. With support from an RDA Europe collaboration award, we integrated the disciplinary Metadata Standards Directory (MSD) into the tool. Template administrators can choose the MSD as an answer format for metadata questions so users can browse the directory from within the tool. We’d love your feedback on this – both admins trialling it on templates and end users selecting standards. Can you find relevant standards easily? Is the functionality intuitive? Are there other features or additions you would like to see? Please try it out at https://dmponline-test.dcc.ac.uk and let us know.

Screenshot: the RDA Metadata Standards Directory integration

Integrating the MSD is just one small step on the path to improving the DMP experience. We also plan to surface other registries, such as FAIRsharing and re3data, to recommend appropriate standards and services. Experimentation in this area will also aim to facilitate the exchange of information between systems and alert services to data in the pipeline. The DMPTool team have just received a 2-year NSF EAGER grant to address these bigger aims! The work plan includes pilot projects with the Biological and Chemical Oceanographic Data Management Office (BCO-DMO) at Woods Hole, MA and understanding the institutional workflow in collaboration with Purdue and others. Find out more on the DMPTool blog; additional details forthcoming as we refine the work plan.

The next stop for us is FORCE2017 in Berlin next week. We’ll be running a session on 10 Simple Rules for Active DMPs on Friday morning (27 Oct) in collaboration with the FAIR DMP group. The session will introduce participants to the concepts of FAIR and machine-actionable DMPs and then build community consensus around common goals and definitions. We’ve been working on a draft that we’ll share and iterate on at the meeting. Join us there if you can!

We’re also looking forward to the International Digital Curation Conference (IDCC) in Barcelona next February. The call for papers is out now and closes later this month. Last year we outlined ideas for Next-Generation DMPs (here) and hosted a workshop that resulted in this white paper with community-generated use cases for machine-actionable DMPs. Thanks again to all those who contributed to defining these preliminary requirements for the work now being addressed by us and the RDA working groups. IDCC is a great opportunity to get international input on your ideas so share what you’ve been working on and join us in Barcelona!

NSF EAGER Grant for Actionable DMPs

We’re delighted to announce that the California Digital Library has been awarded a 2-year NSF EAGER grant to support active, machine-actionable data management plans (DMPs). The vision is to convert DMPs from a compliance exercise based on static text documents into a key component of a networked research data management ecosystem that not only facilitates, but improves the research process for all stakeholders.

Machine-actionable “refers to information that is structured in a consistent way so that machines, or computers, can be programmed against the structure” (DDI definition). Through prototyping and pilot projects we will experiment with making DMPs machine-actionable.

Imagine if the information contained in a DMP could flow across other systems automatically (e.g., to populate faculty profiles, monitor grants, notify repositories of data in the pipeline) and reduce administrative burdens. What if DMPs were part of active research workflows, and served to connect researchers with tailored guidance and resources at appropriate points over the course of a project? The grant will enable us to extend ongoing work with researchers, institutions, data repositories, funders, and international organizations (e.g., Research Data Alliance, Force11) to define a vision of machine-actionable DMPs and explore this enhanced DMP future. Working with a broad coalition of stakeholders, we will implement, test, and refine machine-actionable DMP use cases. The work plan also involves outreach to domain-specific research communities (environmental science, biomedical science) and pilot projects with various partners (full proposal text).

Active DMP community

Building on our existing partnership with the Digital Curation Centre, we look forward to incorporating new collaborators and aligning our work with wider community efforts to create a future world of machine-actionable DMPs. We’re aware that many of you are already experimenting in this arena and are energized to connect the dots, share experiences, and help carry things forward. These next-generation DMPs are a key component in the globally networked research data management ecosystem. We also plan to provide a neutral forum (not tied to any particular tool or project or working group) to ground conversations and community efforts.

Follow the conversation @ActiveDMPs #ActiveDMPs and activedmps.org (forthcoming). You can also join the active, machine-actionable DMP community (live or remote participation) at the RDA plenary in Montreal and Force11 meeting in Berlin to contribute to next steps.

Contact us to get involved!

On the right track(s) – DCC release draws nigh

blog post by Sarah Jones

Eurostar. From Flickr by red hand records, CC-BY-ND

Preliminary DMPRoadmap out to test

We’ve made a major breakthrough this month, getting a preliminary version of the DMPRoadmap code out to test on DMPonline, DMPTuuli and DMPMelbourne. This has taken longer than expected but there’s a lot to look forward to in the new code. The first major difference users will notice is that the tool is now lightning quick. This is thanks to major refactoring to optimise the code and improve performance and scalability. We have also reworked the plan creation wizard, added multi-lingual support, ORCID authentication for user profiles, on/off switches for guidance, and improved admin controls to allow organisations to upload their own logos and assign admin rights within their institutions. We will run a test period for the next 1-2 weeks and then move this into production for DCC-hosted services.

Work also continues on additional features needed to enable the DMPTool team to migrate to the DMPRoadmap codebase. These include enhancements to existing features, a statistics dashboard, an email notifications dashboard, a public DMP library, template export, creating plans and templates from existing ones, and flagging “test” plans (see the Roadmap to MVP on the wiki to track our progress). We anticipate this work will be finished in August and the DMPTool will migrate over the summer. When we issue the full release we’ll also provide a migration path and documentation so those running instances of DMPonline can join us in the DMPRoadmap collaboration.

Machine-actionable DMPs

Stephanie and Sarah are also continuing to gather requirements for machine-actionable DMPs. Sarah ran a DMP workshop in Milan last month where we considered what tools and systems need to connect with DMPs in an institutional context, and Stephanie has been working with Purdue University and UCSD to map out the institutional landscape. The goal is to produce maps/diagrams for two specific institutions and extend the exercise to others to capture more details about practices, workflows, and systems. All the slides and exercise from the DMP workshop in Milan are on the Zenodo RDM community collection, and we’ll be sharing a write-up of our institutional mapping in due course. I’m keen to replicate the exercise Stephanie has been doing with some UK unis, so if you want to get involved, drop me a line. We have also been discussing potential pilot projects with the NSF and Wellcome Trust, and have seen the DMP standards and publishing working groups proposed at the last RDA plenary host their initial calls. Case statements will be out for comment soon – stay tuned for more!

We have also been discussing DMP services with the University of Queensland in Australia who are doing some great work in this area, and will be speaking with BioSharing later this month about connecting up so we can start to trial some of our machine-actionable DMP plans.

The travelling roadshow

Our extended network has also been helping us to disseminate DMPRoadmap news. Sophie Hou of NCAR (National Center for Atmospheric Research) took our DMP poster to the USGS Community for Data Integration meeting (Denver, CO, 16–19 May) and Sherry Lake will display it next at the Dataverse community meeting (Cambridge, MA, 14–16 June). We’re starting an inclusive sisterhood of the travelling maDMPs poster: display the poster, take a picture, and go into the Hall of Fame! Robin Rice and Josh Finnell have also been part of the street team taking flyers to various conferences on our behalf. If you would like a publicity pack, Stephanie will send one out stateside and Sarah will share through the UK and Europe. Just email us your contact details and we’ll send you materials. The next events we’ll be at are the Jisc Research Data Network in York, the EUDAT and CODATA summer schools, the DataONE Users Group and Earth Science Information Partners meetings (Bloomington, IN), the American Library Association Annual Conference (Chicago, IL), and the Ecological Society of America meeting (Portland, OR). Catch up with us there!