What’s new with the DMPTool?

The past few months have been quite fruitful in terms of pushing forward on the technical details surrounding machine-actionable DMPs.

The common data model for the creation of machine-actionable DMPs, produced by the RDA working group on DMP Common Standards, was recently released for community feedback. With our partners at the Digital Curation Center (DCC), we are now actively incorporating this model into the DMPRoadmap codebase and our DMPTool development plans.

As part of our NSF EAGER grant, CDL has partnered with DataCite to explore how DOI infrastructure could pass information between RDM systems and support integration between related systems. The initial phase of this work includes piloting workflows that efficiently move information between stakeholders, systems, and researcher workflows. Our goal is a working prototype by mid-October of this year. This is exciting as it represents the first step towards realizing our long-term goal of machine-actionable DMPs as critical infrastructure in the research process.


Community involvement 

Another key goal for the coming months is to re-engage the DMPTool community via regular virtual user meetings and the re-creation of advisory boards. Most importantly, we want to hear more from you about how the DMPTool is working (or not), and to gather feedback on future developments and ideas for making the DMPTool even more useful and vital. In the coming weeks, we will reach out with more details on the above. In the meantime, please feel free to contact me and introduce yourself!

I am interested in hearing any input, questions, comments or feedback. You can contact me directly at maria.praetzellis@ucop.edu.

Meet the new DMPTool Product Manager

Today, August 19, marks my seventh week as the new DMPTool Product Manager, and the latest Research Data Specialist to join the team at UC3. I’m thrilled to be joining such an active and engaged community of professionals committed to the principles of open science, open infrastructure, and public access to research and knowledge.

As I take the reins from Stephanie Simms, I’m grateful for her instrumental work in rethinking the capabilities of a data management plan (DMP) and her work with the community in developing the conceptual frameworks and use cases for the creation of machine-actionable DMPs. As I’ve learned more in these first weeks, I am invigorated by the plans for machine-actionable DMPs, seeing the critical role they could play in research and data sharing and the exciting potential for expanding their dynamism, utility, and centrality to research data workflows. 

Prior to joining CDL, I was a Program Manager in the Web Archiving and Data Services group at the Internet Archive. At the Internet Archive, I managed domain-scale web harvesting, dataset and indexing services, and computational access to large-scale data for researchers. I bring a strong background in product management for services used by a global set of partners and a commitment to community-driven feature development and system integrations. 

I’m looking forward to expanding upon this experience as I begin work on furthering development of the DMPTool, keeping in step with what can be useful to and benefit the community, and advancing our shared commitment to open access to research and research data.

Please feel free to reach out and introduce yourself! I’m eager to receive any feedback or questions. You can reach me directly at maria.praetzellis@ucop.edu.

Representing time in machine-actionable DMPs

In this next installment of the machine-actionable DMP blog series, we want to address the broader context of time to home in on answering the following question:

How and when do you update some piece of information in a DMP?

This happens to be the substance of Principle 9 from our preprint, forthcoming in PLOS (Miksa et al. 2018): maDMPs should be versioned, updatable, living documents.

DMPs should not just be seen as a “plan” but as updatable, versioned documents representing and recording the actual state of data management as the project unfolds. The act of planning is far more important than the plan itself, and to derive value for researchers and other stakeholders, the plan needs to evolve. DMPs should track the course of research activities from planning to sharing and preserving outputs, recording key events over the course of a project to become an evolving record of activities related to the implementation of the plan.

We can all agree that it’s important to treat maDMPs as living documents, but there are multiple approaches we might take to updating them, and multiple stakeholders who should be able to provide updates for particular pieces of information at particular points along the way. First we’ll provide a quick overview of the current state of DMP-time as represented in systems and policies related to our NSF EAGER project, plus a handful of other relevant systems and policies that extend the geographical and organizational scope. Then, we’ll pitch an idea for how we can handle DMP-time using the Crossref/DataCite Event Data service. We welcome, nay encourage, your feedback about this and other ideas as we experiment and iterate and prove things out in practice.

Representing time in DMPs

So we built a graph database with seed data from our partners at BCO-DMO and the UC Gump Field Station on Moorea, and enriched it with information from the NSF Awards API and public plans created with the DMPTool. All of the projects represented in the database correspond with NSF awards and therefore the DMPs have an associated timeline of:

  1. Create DMP and submit grant proposal (via institutional Office of Research, NSF Fastlane system)
  2. Grant awarded (grant number issued by NSF)
  3. Grant period ends, final report due (data deposited at appropriate repository)

This current grant/DMP workflow fails to capture information about actual data management activities as they unfold over the course of a project. However, data management staff at BCO-DMO and the Gump Field Station perform interventions and provide manual updates in their own repository systems opportunistically. These updates can occur during active stages of multi-year projects, but most of them are done at the grant closeout stage, when researchers are engaged with reporting activities and aware that they must deposit their data. Relevant NSF program officers from the Geosciences Directorate conduct manual compliance checks to ensure that grantees have deposited data prior to issuing a new award, which is a very useful feature of this case study.

In addition to the data repository systems, information about these projects flows through institutional grant management systems and NSF’s Fastlane system, and a subset is made publicly available via the NSF Awards API (example of our award). Each of these systems records the start date and end date for the award, and some include interim reporting dates. Our ongoing analysis for maDMP prototyping is focused on identifying additional milestones during the course of a project and which stakeholders should be responsible for updating which pieces of information…drilling into the original question of how and when do you update things?

DMP-time in European contexts

To avoid an overly narrow focus on one national context and one funding agency in this larger thematic discussion about time, we’ll also consider some European examples. The European Commission’s Horizon 2020 program acknowledges the fact that information about research data changes from the planning to final preservation stages; as a result, DMPs have built-in versioning. Horizon 2020 proposals that receive an award must submit a first version of the DMP within the first 6 months of the project. The DMP needs to be updated over the course of the project whenever significant changes arise; however, this “requirement” is somewhat vague and reads more like a best practice. Updated versions of the DMP are required at any periodic reporting deadline and at the time of the final report. DMPonline provides an optional set of Horizon 2020 templates that includes 1) an Initial DMP, 2) a Detailed DMP, and 3) a Final review DMP.

Our maDMP collaborators at the Technical University of Vienna are forging ahead with their own institutional prototyping efforts to automate DMPs and integrate them with local infrastructure. They just released this excellent interactive “mockups” tool and invite your feedback. Within the mockups system, time is represented through the concept of DMP Granularity and in some cases this is related to funding status. The level of granularity corresponds roughly with versions, which carry the labels “initial, detailed, or sophisticated.”

Representing time in maDMPs: Ideas for the future

The ability to update DMPs is central to our own plans for realizing machine-actionability and relies on infrastructure that already exists. In a nutshell, our idea is to insert DMPs and corresponding grant numbers into the sprawling web of information connecting people and their published outputs. We think the mechanism for accomplishing this is to issue DataCite DOIs for DMPs: this creates an identifier against which we can assert things programmatically. In addition, this hooks DMPs into Crossref/DataCite Event Data, which is a stream of assertions of relationships between research-related things. Existing and emerging registries of information are already leveraging this infrastructure—Scholix, ORCID, Wikidata, Make Data Count, etc. DMPs and grant numbers would provide a view of the connections between everything at the project level.

Documentation for Event Data explains that it “is a hub for the collection and distribution of a variety of Events and contains data from a selection of Sources. Every Event has a time at which it was created. This is usually soon after the Event was observed. In addition to this, every Event has a theoretical date on which it occurred…dates are represented as the occurred_at, timestamp and updated_date fields on each Event. The Query API has two views which allow you to find Events filtered by both occurred_at and timestamp timescales. It also lets you query for Events that have been updated since a given date.” This hub of information would therefore support versioning of the DMP as well as dynamic updating of key pieces of information (e.g. data types, volumes, licenses, repositories) by various stakeholders over time. Stakeholders could rely on this open hub of information and begin to make plans based on it (e.g., a named repository learns that a TB of data is expected within a specific timeframe).
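To make the three time fields a bit more concrete, here is a minimal Python sketch that filters a handful of Events by when they occurred and by when they were last updated. Only the field names occurred_at, timestamp, and updated_date come from the documentation quoted above; the Event records, the DMP DOI, and the helper names (parse, occurred_between, updated_since) are made up for illustration, and the sketch works on an in-memory list rather than calling the real Query API.

```python
from datetime import datetime, timezone


def parse(ts):
    """Parse an ISO 8601 UTC timestamp such as '2018-10-01T12:00:00Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Hypothetical Events about a hypothetical DMP DOI (assumption, not real data).
DMP_DOI = "https://doi.org/10.1234/example-dmp"
EVENTS = [
    {"id": "e1", "obj_id": DMP_DOI,
     "occurred_at": "2018-03-01T00:00:00Z", "timestamp": "2018-03-02T08:15:00Z"},
    {"id": "e2", "obj_id": DMP_DOI,
     "occurred_at": "2018-09-15T00:00:00Z", "timestamp": "2018-09-15T10:00:00Z",
     "updated_date": "2018-10-20T09:00:00Z"},
    {"id": "e3", "obj_id": DMP_DOI,
     "occurred_at": "2018-11-05T00:00:00Z", "timestamp": "2018-11-05T12:30:00Z"},
]


def occurred_between(events, start, end):
    """Events whose occurred_at falls in the half-open range [start, end)."""
    lo, hi = parse(start), parse(end)
    return [e for e in events if lo <= parse(e["occurred_at"]) < hi]


def updated_since(events, since):
    """Events that carry an updated_date at or after `since`."""
    cutoff = parse(since)
    return [e for e in events
            if "updated_date" in e and parse(e["updated_date"]) >= cutoff]


in_2018_h2 = [e["id"] for e in occurred_between(
    EVENTS, "2018-07-01T00:00:00Z", "2019-01-01T00:00:00Z")]
recently_updated = [e["id"] for e in updated_since(EVENTS, "2018-10-01T00:00:00Z")]
print(in_2018_h2, recently_updated)
```

The same two questions (what happened in this window, and what changed since I last looked) are the ones a repository or funder would ask of the real service.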

In this scenario, the DMP would become an assertion store (cf. Wikidata and Wikibase). The assertion store would have a timeline component and anyone could use the DMP identifier to ping/query the Event Data Query API and find out what’s been asserted about the project. Various DMP stakeholders could also assert things about the project and update information over time. Each stakeholder could query and model DMP information based on the types of relationships and get the specific details they’re interested in… so an institution could discover who their PIs are collaborating with, a funder could check if a dataset has been deposited in a named repository, a repository manager could search for any changes to a specific project or all relevant projects within a specific date range, etc. Wikidata has already begun indexing policies, in fact; once this happens at scale and is integrated with indexing of datasets, we could have automated dashboards displaying policy compliance and project progress.
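As a thought experiment, the assertion-store idea can be sketched in a few lines of Python. This is a toy, in-memory model under our own assumptions, not a description of Event Data or Wikibase: the class names (Assertion, AssertionStore), the predicate labels, and the DMP DOI are all hypothetical. The point is simply that updates are appended rather than overwritten, so any date yields a "version" of the DMP.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Assertion:
    dmp_id: str        # identifier of the DMP the claim is about
    predicate: str     # what is being asserted, e.g. "repository"
    value: str
    asserted_by: str   # stakeholder making the claim
    asserted_on: date


class AssertionStore:
    """Append-only store: updates add new assertions instead of overwriting."""

    def __init__(self):
        self._log = []

    def assert_(self, assertion):
        self._log.append(assertion)

    def history(self, dmp_id, since=None, until=None):
        """Assertions about one DMP, optionally within a date range, oldest first."""
        hits = [a for a in self._log
                if a.dmp_id == dmp_id
                and (since is None or a.asserted_on >= since)
                and (until is None or a.asserted_on <= until)]
        return sorted(hits, key=lambda a: a.asserted_on)

    def current(self, dmp_id, as_of=None):
        """Latest value per predicate as of a date, i.e. one 'version' of the DMP."""
        state = {}
        for a in self.history(dmp_id, until=as_of):
            state[a.predicate] = a.value
        return state


DMP = "https://doi.org/10.1234/example-dmp"  # hypothetical identifier
store = AssertionStore()
store.assert_(Assertion(DMP, "data_volume", "1 TB", "researcher", date(2018, 2, 1)))
store.assert_(Assertion(DMP, "repository", "BCO-DMO", "researcher", date(2018, 2, 1)))
store.assert_(Assertion(DMP, "data_volume", "3 TB", "repository manager", date(2018, 9, 1)))

planned = store.current(DMP, as_of=date(2018, 6, 1))   # the plan as first written
latest = store.current(DMP)                            # the plan as it stands now
changes = store.history(DMP, since=date(2018, 7, 1))   # what changed after July 1
```

Here a repository manager asking for changes since July would see only the revised volume estimate, while the original plan remains recoverable.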

That’s about it. Please tell us what you think about this approach to transforming a DMP into something active and updated, versioned and linked to research outputs.

Common standards and PIDs for machine-actionable DMPs

QR code cupcakes

From Flickr by Amber Case CC BY-NC 2.0 https://www.flickr.com/photos/caseorganic/4663192783/

Picking up where we left off from “Machine-actionable DMPs: What can we automate?”… Let’s unpack a couple of topics central to our machine-actionable DMP prototyping and automating efforts. These are the top rallying themes from all conversations, workshops, and working groups we’ve been privy to in the past few years. In addition, they feature in the “10 principles for machine-actionable DMPs” (principles 4 and 5):

  • DMP common standards
  • Persistent identifiers (PIDs)

DMP common standards
There’s community consensus about the need to first establish common standards for DMPs in order to enable anything else (Simms et al. 2017). Interoperability and delivery of DMP information across systems—to alleviate administrative burdens, improve quality of information, and reap other benefits—requires a common data model.

To address this requirement, the DMP Common Standards working group was launched at the 9th RDA plenary meeting in Barcelona. They’re making excellent progress and are on track to deliver a set of recommendations in 2019, which we intend to incorporate into our existing tools and emerging prototypes. Adoption of the common data model will enable tools and systems (e.g., CRIS, repositories, funder systems) involved in processing research data to read and write information to/from DMPs. The working group deliverables will be publicly available under a CC0 license and will consist of models, software, and documentation. For a summary of their scope and activities to date see Miksa et al. 2018.

A second round of consultation is underway currently to tease out more details and gather additional requirements about what DMP info is needed when for each stakeholder group. This international, multi-stakeholder working group is open to all; check out their session at the next RDA plenary in Botswana and contribute to the DMP common data model (6 Nov; remote participation is available).

Current/traditional DMPs - model questionnaires

<administrative_data>
    <question>Who will be the Principal Investigator?</question>
    <answer>The PI will be John Smith from our university.</answer>
</administrative_data>

Machine-actionable DMPs - model information

"dc:creator":[ {
         "foaf:name":"John Smith",
         "@id":"orcid.org/0000-1111-2222-3333",
         "foaf:mbox":"mailto:jsmith@tuwien.ac.at",
         "madmp:institution":"AT-Vienna-University-of-Technology"
} ],

Caption: An example of data models for traditional DMPs (upper part) and machine-actionable DMPs (lower part). (Miksa et al. 2018: Fig. 1)
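To illustrate why the second form is "machine-actionable," here is a small Python sketch that reads the creator block from the figure and pulls out the name, ORCID iD, and email without any text parsing of a free-form answer. The fragment is wrapped in braces so it parses as a complete JSON object; the function name creators is our own, and the real common data model may structure these fields differently.

```python
import json

# The fragment from the figure, wrapped in braces so it is a complete JSON object.
MADMP = """
{
  "dc:creator": [ {
    "foaf:name": "John Smith",
    "@id": "orcid.org/0000-1111-2222-3333",
    "foaf:mbox": "mailto:jsmith@tuwien.ac.at",
    "madmp:institution": "AT-Vienna-University-of-Technology"
  } ]
}
"""


def creators(dmp_json):
    """Return (name, ORCID iD, email) for each creator listed in the record."""
    record = json.loads(dmp_json)
    people = []
    for c in record.get("dc:creator", []):
        orcid = c["@id"].rsplit("/", 1)[-1]             # keep the part after the last slash
        email = c["foaf:mbox"].replace("mailto:", "", 1)  # strip the URI scheme
        people.append((c["foaf:name"], orcid, email))
    return people


result = creators(MADMP)
print(result)
```

Doing the same with the questionnaire version would mean guessing at who "John Smith from our university" is, which is exactly the ambiguity that identifiers remove.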

PIDs and DMPs
The story of PIDs in DMPs, or at least my involvement in the discussion, began with a lot of hand waving and musical puns at PIDapalooza 2016 (slides). After a positive reception and many deep follow-on conversations (unexpected yet gratifying to discover a new nerd community), things evolved into what is now a serious exploration of how to leverage PIDs for and in DMPs. The promise of PIDs to identify and connect research-relevant entities is tremendous and we’re fortunate to ride the coattails of some smart people who are making significant strides in this arena.

For our own PID-DMP R&D we’re partnering with one of the usual PID suspects, DataCite, to draw from their expertise and technical capabilities. DataCite contributed to the timely publication of the European Commission-funded FREYA report, which provides the necessary background research and straightforward starting point(s). There’s also an established RDA PID interest group that we plan to engage with more as things progress.

A primary goal of FREYA is the creation and expansion of the “PID Graph.” The PID Graph “connects and integrates PID systems, creating relationships across a network of PIDs and serving as a basis for new services.” The report summarizes the current state of PID services as well as some emerging initiatives that we hope to harness (each is classified as mature, emerging, or immature):

  • ORCID iDs for researchers (mature)
  • DOIs for publications and data (mature), and software (emerging; also see SWH IDs)
  • Research OrgIDs for organizations (aka ROR; emerging and CDL is participating so we have an intimate view)
  • Global grant IDs (emerging and very exciting to track the prototyping efforts of Wellcome, NIH, and MRC!)
  • Data repository IDs (immature but on the radar as we address DMPs)
  • Project IDs/RAiDs (emerging and we see a lot of overlap with DMPs)

It also describes a vast array of PIDs for other things, all of which are potentially useful for maDMPs as we reconfigure them as an inventory of linked research outputs (Table 1: RRIDs, protocols, research facilities, field stations, physical samples, cultural artifacts, conferences, etc. etc.). Taken together, these efforts are aimed at extending the universe of things that can be identified with PIDs and expanding what can be done with them. This, in turn, supports automation and machine-actionability to achieve better research data management and promote open science.

Summing up
For now we’ll continue exploring our graph database and interviewing stakeholders who contributed seed data to dive deeper into their workflows, challenges, and use cases for maDMPs. This runs parallel to the activities of the RDA DMP Common Standards WG and various emerging PID initiatives. Based on this overlapping community research, we can move forward with outlining what to implement and test. The recommendations of the RDA group for DMP common standards are a given, and below is a high-level plan for PID prototyping:

PIDs for DMPs and PIDs in DMPs:

  • DOIs for DMPs: define metadata
  • PIDs in DMPs: What can we achieve by leveraging mature PID services? How do we make the information flow between stakeholders and systems?

Stay tuned as the story develops here on the blog! I’ll also be presenting on maDMPs in a data repositories session convened by our BCO-DMO partners at the upcoming American Geophysical Union meeting in DC (program here, 11 Dec). And Daniel Mietchen will be at PIDapalooza 2019 (Dublin, 23-24 Jan) promoting a highly relevant initiative: PIDs for FAIR ethics review processes.

Prettier and mobile-ready(-er)

The latest DMPTool release is focused on making the tool more attractive and mobile-ready, AND more accessible (release notes). Continue reading for highlights and screenshots, or you can visit the live version of the tool for an improved user experience. Please report any issues via the helpdesk or GitHub issues.

After a thorough round of quality assurance testing, we’ll upgrade the DMPTool to the latest version of the Roadmap code in the coming weeks (v2.0.0 and blog news)…stay tuned. Some accessibility work will continue on the core codebase in the coming months.

Improved responsiveness

We made it easier to access the public pages and improved menu navigation on mobile and tablet devices (screenshots below).

mobile pages

Redesigned institutional branding banner

Organizational admins have the option to upload a logo that displays in the main branding banner for their local users (see example for UCSD users below; instructions for admins on Customizing your institutional profile). In connection with the changes to improve the responsiveness of the tool, we also redesigned the banner to accommodate the wide variety of logo shapes and sizes, in particular, large horizontal logos.

unaffiliated branding

Standard DMPTool branding for unaffiliated users.

UCSD branding

UCSD branding for users affiliated with that institution.

Roadmap back to school edition

Summer activities and latest (major 2.0.0) release
The DMPRoadmap team is checking in with an overdue update after rotating holidays and work travels over the past few months. We also experienced some core team staff transitions and began juggling some parallel projects. As a result we haven’t been following a regular development schedule, but we have been busy tidying up the codebase and documentation.

This post summarizes the contents of the major release and provides instructions for those with existing installations who will need to make some configuration changes in order to upgrade to the latest and greatest DMPRoadmap code. In addition to infrastructure improvements, we fixed some bugs and completed some feature enhancements. We appreciate the feedback and encourage you to keep it coming since this helps us set priorities (listed on the development roadmap) and meet the data management planning needs of our increasingly international user community. On that note, we welcome Japan (National Institute for Informatics) and South Africa (NeDICC) as additional voices in the DMP conversation!

Read on for more details about all the great things packed into the latest release, as well as some general updates about our services and of course machine-actionable DMPs. The DCC has already pushed the release out to its services and the DMPTool will be upgrading soon – separate communications to follow. Those who run their own instances should check out the full release notes and a video tutorial on the validations and data clean-up (thanks Gavin!) to complete the upgrade.

DMPRoadmap housekeeping work (full release notes, highlights below)

  • Instructions for existing installations to upgrade to the latest release. Please read and follow these carefully to prevent any issues arising from invalid data. We highly recommend that you back up your existing database before running through these steps to prepare your system for Roadmap 2.0.0!
  • Added a full suite of automated unit tests to make it easier to incorporate external contributions and improve overall reliability.
  • Added data validations for improved data integrity.
  • Created new and revised existing documentation for coding conventions, tests, translations, etc. (GitHub wiki). We can now update existing translations and add new ones more efficiently.

DMPRoadmap new features and bug fixes

  • Comments are now visible by default without having to click ‘Show.’ Stay tuned for additional improvements to the plan comments functionality in upcoming sprints.
  • Renamed/standardized text labels for ‘Save’ buttons for clarity.
  • Added a button to download a list of org users as a csv file (Admin > ‘Users’ page)
  • Added a global usage report for total users and plans for all orgs (Admin > ‘Usage’ page)
  • Admins can create customized template sections and place them at the beginning or end of funder templates via drag-and-drop
  • Removed multi-select box as an answer format and replaced with multiple choice

DCC/DMPonline subscriptions
[Please note: this does not apply to DMPTool users] Another recent change is in the DMPonline service delivery model. The DCC has been running DMP services for overseas clients for several years and is now transitioning the core DMPonline tool to a subscription model based on administrator access to the tool. The core functionality (developing, sharing and publishing DMPs) remains freely accessible to all, as well as the templates, guidance and user manuals we offer. We also remain committed to the Open Source DMPRoadmap codebase. The charges cover the support infrastructure necessary to run a production-level international service. More information is available for our users in a recent announcement. We’re also growing the support team to keep up with the requests we’re receiving. If you are interested in being at the cutting edge of DMP services and engaging with the international community to define future directions, please apply to join us!

Machine-actionable DMPs
Increasing the opportunities for machine-actionability of DMPs was one of the spurs behind the DMPRoadmap collaboration. Facilities already exist via use of a number of standard identifiers and we’re moving on both the standards development tracks and code development and testing.

The CDL has been prototyping for the NSF EAGER grant and started a blog series focused on this work (#1, #2, next installment forthcoming), with an eye to seeding conversations and sharing experiences as many of us begin to experiment in multiple directions. CDL prototyping efforts are separate from the DMPRoadmap project currently but will inform future enhancements.

We’re also attempting to inventory global activities and projects on https://activedmps.org/. Some updates for this page are in the works to highlight new requirements and tools. Please add any other updates you’re aware of! Sarah ran a workshop in South Africa in August on behalf of NeDICC to gather requirements for machine-actionable DMPs there and the DCC will be hosting a visit from DIRISA in December. All the content from the workshop is on Zenodo and you can see how engaged the audience got in mapping our solutions. The DCC is also presenting on recent trends in DMPs as part of the OpenAIRE and FOSTER webinar series for Open Access week 2018. The talk maps out the current and emerging tools from a European perspective. Check out the slides and video.

You can also check out the preprint and/or stop by the poster for ‘Ten Principles for Machine-Actionable DMPs’ at Force2018 in Montreal and the RDA plenary in Botswana. This work presents 10 community-generated principles to put machine-actionable DMPs into practice and realize their benefits. The principles describe specific actions that various stakeholders are already undertaking or should take.

We encourage everyone to contribute to the session for the DMP Common Standards working group at the next RDA plenary (Nov 5-8 in Botswana). There is community consensus that interoperability and delivery of DMP information across systems requires a common data model; this group aims to deliver a framework for this essential first step in actualizing machine-actionable DMPs.

Minor NSF template updates + other miscellanea

In the waning weeks of summer, we accomplished a wide range of DMPTool things. A bulleted summary of mostly template-related updates is below. Admins should take note that the minor National Science Foundation (NSF) template updates resulted in new versions of the 4 templates in question. This means that admins will need to transfer any customizations they may have created for these templates (instructions here). All users will also see a dismissible notification message when they log into the tool (screenshot below). Read on for more details.

TL;DR

  • Some minor NSF template updates: AGS, EAR, CISE, SBE
  • DCC template now available in Brazilian Portuguese
  • DMPTool templates added to protocols.io
  • Final promo materials shipped and order form closed
  • First successful eduGAIN configuration: welcome to Australian National University!

transfer template customization

notification of template changes

Minor NSF template updates
While working on our machine-actionable DMPs grant, we noticed that a handful of NSF entities had issued updates to DMP requirements since our comprehensive template audit in Feb 2018. The four divisions/directorates listed below posted new documents in Apr 2018 with very minor changes from the previous versions. None of the changes affect the core requirements; most involve updated links and resources. A detailed summary of the changes for each template follows and you can view all templates on the DMPTool Funder Requirements page:

NSF-AGS: Atmospheric and Geospace Sciences

NSF-EAR: Earth Sciences

  • updated PDF document with new links
  • updated appendix with list of recommended repositories and other resources

NSF-CISE: Computer & Information Science & Engineering

  • updated links and reformatting on webpage
  • merged redundant questions about data storage reducing total questions from 7 to 6

NSF-SBE: Social, Behavioral & Economic Sciences

  • new PDF document with no substantive changes; mostly reformatting and removal of references to specific repositories

DCC template available in Brazilian Portuguese

A big thanks to Vitor Silvério Rodrigues from São Paulo State University (UNESP) for translating the DCC template (defined by our Digital Curation Centre partners) into Brazilian Portuguese! This is the default, best practices template provided when users check the box to indicate that they aren’t applying to a specific funder. Anyone can now download the translated template from the Funder Requirements page. The DMPTool is not localized to automatically serve up the translated template for users who set their language to Brazilian Portuguese, however. In order to create a new plan with the translated version, users should make the following selections in the create plan wizard (regardless of language setting):

  1. Enter a project title
  2. Select São Paulo State University (UNESP) as your organization
  3. Select Digital Curation Centre (DCC) as the “funder”
  4. Click button to create plan

Brazilian Portuguese create plan options

new plan with translated DCC template

DMPTool templates added to protocols.io

Protocols.io is an open repository popular among computational and bioinformatics researchers, yet open to all domains, where all scientific protocols (private or public) can be annotated and discussed at the step or protocol level. Users can also fork (clone) public protocols and publish modified versions as well as connect protocols to published articles and other research outputs, all in the pursuit of increasing transparency and reproducibility.

Scientific protocols are among the many research outputs that we aim to inventory with machine-actionable DMPs. We often promote the notion that DMPs themselves are essentially protocols (i.e., a description of digital research methods), and should be maintained as such over the course of a project. During conversations with the protocols.io team about our intersecting activities, they suggested that we experiment with enabling researchers to create and maintain DMPs on their platform. So we created a Data Management Plans group with two basic DMP templates for users who might prefer this dynamic platform for documenting their digital protocols to an online wizard that produces a static text file. Go check it out and spread the word!

Final promo materials shipped and order form closed

Everyone who placed orders for DMPTool marketing materials (postcards and stickers) should have received them by now, hopefully in time for workshops and other events to kick off the new academic year. The order form for free materials associated with the launch is now closed. Just a reminder that we provide various promo materials (all CC0) on the website so anyone can produce their own swag and spread the DMPTool gospel.

First successful eduGAIN/SSO configuration!

One of the most popular features of the DMPTool is the ability for participating institutions to configure Shibboleth single sign-on, thereby enabling their users to sign in easily with institutional credentials. Until recently, we only provided this functionality for members of the US-based InCommon federation. There is now an interfederation service called eduGAIN that connects identity federations around the world. We are pleasantly surprised (since Shib can be a tricky, black-box affair) that we were able to configure our first eduGAIN institution: the Australian National University. We hope for (but cannot promise) similar success stories for other identity federations that participate in eduGAIN. The Australian Access Federation is documenting the process and we’re delighted to welcome ANU to the DMPTool community!

Machine-actionable DMPs: What can we automate?

Following on some initial thoughts about Scoping Machine-Actionable DMPs (maDMPs), we’re keen to dive into the substance. There are plenty of research questions we plan to explore here and over the course of our maDMP prototyping efforts. Let’s begin with these:

What can we automate?
What needs to be entered manually?

One of the major goals for maDMPs is to automate the creation and maintenance of some pieces of information.

Automation stands to alleviate administrative burdens and improve the quality of information contained in a DMP.

Thankfully, we’re not starting from scratch since Tomasz Miksa crafted an assignment for his CS students at the Technical University of Vienna to build an maDMP prototype tool and answer these very questions (course details; assignment). The student reports provide valuable insights that will help guide our own and others’ work on the topic. Read on for a brief overview of the assignment and a discussion of the key results; the results are woven into answers to the questions above.

I will also note that our own project includes grant numbers as a key piece of project metadata, which is not part of this assignment. We’re currently exploring the NSF Awards API and institutional grants management systems in the context of these questions; more on this anon.

Assignment
Students were instructed to build a tool that gathers information from external sources and automatically creates a DMP. Modeled on the European Commission’s DMP requirements for Horizon 2020, students could choose to create a DMP when a project starts (first version upon receiving funding) or when a project ends and all products have been preserved/published (final report). For the first option, the tool should help researchers estimate their storage needs and select a proper repository to store their research outputs. For the second option, the tool should connect to services where data is stored to retrieve information for creating a DMP.

External (or controlled) sources of information included:

  1. Administrative info (researcher name, project title): Use one or both of these inputs to search the university profile system and/or ORCID API to retrieve additional info (affiliation, contact email, etc).
  2. Find a repository (option 1): Use the OpenDOAR API or re3data API to recommend a repository based on sample data types and location (Europe, Austria)
  3. Get metadata about things deposited in a repository (option 2): Collect as much info as possible from the GitHub API about software products and OAI-PMH compliant repositories (e.g., license, format, size, etc) for other products.
  4. Select a license (if not provided in step 3): EUDAT license selector, reuse existing code.
  5. Preservation details: Allow users to tag all research products (e.g., input data, output data, software, documentation, presentation, etc.). Group them if appropriate. Provide a combo-box to define how long each product will be preserved (5, 10, 20 years).
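To make steps 3 and 5 a little more concrete, here is a minimal sketch in Python of how a tool might turn repository metadata into a tagged DMP entry. It is not taken from any of the student tools. The input is assumed to be a dictionary shaped like the repository record that the GitHub API returns (we rely only on the `full_name`, `html_url`, `license`, and `size` fields); the `ResearchProduct` class and the `product_from_github` function are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Retention choices offered by the combo-box in step 5 (5, 10, 20 years).
ALLOWED_RETENTION_YEARS = (5, 10, 20)


@dataclass
class ResearchProduct:
    """One entry in a DMP's inventory of research products (hypothetical shape)."""
    title: str
    url: str
    tag: str                     # e.g. "software", "input data", "documentation"
    license_id: Optional[str]    # None means a license still has to be selected (step 4)
    size_kb: Optional[int]       # assumption: GitHub reports repository size in kilobytes
    retention_years: int


def product_from_github(repo: dict, tag: str, retention_years: int) -> ResearchProduct:
    """Map a GitHub-style repository record to a DMP product entry."""
    if retention_years not in ALLOWED_RETENTION_YEARS:
        raise ValueError(f"retention must be one of {ALLOWED_RETENTION_YEARS}")
    # The license field can be null when no license file was detected.
    license_info = repo.get("license") or {}
    return ResearchProduct(
        title=repo["full_name"],
        url=repo["html_url"],
        tag=tag,
        license_id=license_info.get("spdx_id"),
        size_kb=repo.get("size"),
        retention_years=retention_years,
    )


# Made-up record standing in for a real API response.
sample_repo = {
    "full_name": "example-lab/analysis-scripts",
    "html_url": "https://github.com/example-lab/analysis-scripts",
    "license": None,
    "size": 2048,
}
product = product_from_github(sample_repo, tag="software", retention_years=10)
```

In this sketch a missing license is left as `None` rather than guessed, which is the point where a license selector (step 4) or the researcher would need to step in.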

The final reports describe the architecture and implementation of the tool; demonstrate how it works; include a human-readable DMP and an maDMP created with the tool; and answer some questions about the benefits and limitations of automation.

Results
The student reports emphasize that a mixture of automation and manual processes is necessary to produce DMPs that meet all of the requirements outlined by funders. They demonstrate how we can leverage automation for maDMPs and provide thoughtful analyses about how we can consume available sources of information.

Portions of a DMP that can be automated easily include:

  • Basic project details such as title, names/authors, DMP creation date
  • Information (including metadata) about the research products associated with the project (e.g., data, software…)
  • Repository details: e.g., Zenodo, GitHub for software

Other automated portions of a DMP enable some inference but aren’t adequate by themselves:

  • Licenses: can be derived from a GitHub/Zenodo link
  • Software and data preservation details: some data is given for each file; some assumptions can be made based on the repository
  • Data sharing, access, and security details: some data is given for each file; some assumptions can be made based on the repository
  • Costs/resources: estimations can be made based on the size and type of data

Portions of a DMP that cannot be completed via automation:

  • Roles and responsibilities (although at TU Wien this is partially automated; they assume the project uses their infrastructure and provide details to designate individuals responsible for backups, final data deposit, etc)
  • Licenses and policies for reuse, derivatives (complete answers must be provided manually)
  • Ethical and privacy questions
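One way to picture the three lists above is as a label attached to each section of a DMP, which a tool could then use to tell a researcher what still needs their attention. The following Python sketch is our own illustration under that assumption; the section names, the `Source` labels, and the `needs_human_attention` function are hypothetical and do not come from the student reports.

```python
from enum import Enum


class Source(Enum):
    AUTOMATED = "automated"   # retrieved from an external system as-is
    INFERRED = "inferred"     # derived automatically; a person should confirm it
    MANUAL = "manual"         # has to be written by a person


# Hypothetical section names, grouped as in the three lists above.
SECTION_SOURCES = {
    "project_details": Source.AUTOMATED,
    "research_products": Source.AUTOMATED,
    "repository": Source.AUTOMATED,
    "license": Source.INFERRED,
    "preservation": Source.INFERRED,
    "sharing_access_security": Source.INFERRED,
    "costs": Source.INFERRED,
    "roles_responsibilities": Source.MANUAL,
    "reuse_policies": Source.MANUAL,
    "ethics_privacy": Source.MANUAL,
}


def needs_human_attention(dmp: dict) -> list:
    """Return the sections a person must still write or confirm.

    `dmp` maps section names to {"value": ..., "confirmed": bool}.
    """
    todo = []
    for section, source in SECTION_SOURCES.items():
        entry = dmp.get(section, {})
        has_value = entry.get("value") is not None
        if source is Source.AUTOMATED and has_value:
            continue  # filled in by a machine; nothing more to do
        if has_value and entry.get("confirmed"):
            continue  # a person has supplied or signed off on this section
        todo.append(section)
    return todo


draft = {
    "project_details": {"value": "Example project"},
    "research_products": {"value": ["dataset A"]},
    "repository": {"value": "Zenodo"},
    "license": {"value": "MIT", "confirmed": False},
    "ethics_privacy": {"value": "No human subjects data", "confirmed": True},
}
remaining = needs_human_attention(draft)
```

For this draft, the inferred license is still waiting for confirmation and the sections nobody has touched yet are reported alongside it, while the automated sections and the confirmed ethics statement are not.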

Check out this example of a human-readable landing page for the DMP produced by one student team (Rafael Konlechner and Simon Oblasser) and the corresponding JSON output for the maDMP version. Some other examples of maDMP-creation tools for both assignment options are available here (ex 1, ex 2, ex 3, ex 4, ex 5, ex 6); they’re provided as Docker containers that can be launched quickly.

Discussion
The student prototypes and some other projects in this arena (e.g., UQRDM) inform larger maDMP goals surrounding automation and maintenance/versioning (i.e., keeping info in a DMP up to date). They identify sources/systems of existing information, mechanisms (APIs, persistent identifiers) for consuming and connecting it, and some important limitations regarding the informational content that require manual interventions and enrichment.

Our own prototype is following a trajectory similar to that of the student assignment. We’re defining existing data sources/systems and exploring the possibilities for moving information between them. The good news is that there are lots of sources and APIs out there in the wild with implications for maDMPs. There are also lots of existing initiatives to connect all the things that could become part of an maDMP framework (e.g., Scholix, ORCIDs, OrgIDs).

By taking this approach, we want to make the creation and maintenance of a DMP an iterative and incremental process that engages all relevant stakeholders (not just researchers writing grant proposals). Researchers need guides and translators to find the best resources and do their research efficiently, and in a manner that complies with open data policies. As we noted in the previous blog post, we want to enable repository operators, research support staff, policy experts, and many others to contribute to DMPs in order to achieve good data management.

Up next
Some related questions that we’re mulling over, but won’t endeavor to answer in this post:

  • Which stakeholders and/or systems should be able to make and update assertions (in a DMP) about a grant-funded project?
  • What is required to put it all together?

A teaser for the second question: interoperability and delivery of the DMP information across systems requires a common data model for DMPs. You can join the RDA DMP Common Standards working group to contribute to this ongoing effort. We’ll unpack this one in a future blog post.
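As a rough illustration of why a shared model matters, consider two systems that pass a DMP back and forth as JSON and refuse anything that does not carry the agreed fields. The Python sketch below is an assumption-laden toy: the field names in `REQUIRED_FIELDS` and the `export_dmp` and `import_dmp` functions are hypothetical, and they are not the model under development in the RDA working group.

```python
import json

# Hypothetical minimal set of fields both systems agree on. This is NOT the
# RDA common data model, only a stand-in to show what a shared model buys us.
REQUIRED_FIELDS = {"dmp_id", "title", "modified", "products"}


def _check(dmp: dict, action: str) -> None:
    """Raise if the DMP lacks any field of the shared model."""
    missing = REQUIRED_FIELDS - set(dmp)
    if missing:
        raise ValueError(f"cannot {action}, missing fields: {sorted(missing)}")


def export_dmp(dmp: dict) -> str:
    """Serialize a DMP for delivery to another system."""
    _check(dmp, "export")
    return json.dumps(dmp, sort_keys=True)


def import_dmp(payload: str) -> dict:
    """Parse a DMP received from another system and check it fits the model."""
    dmp = json.loads(payload)
    _check(dmp, "import")
    return dmp


# Placeholder values for illustration only.
plan = {
    "dmp_id": "example-dmp-001",
    "title": "Example project DMP",
    "modified": "2018-09-01",
    "products": [{"title": "dataset A", "repository": "Zenodo"}],
}
received = import_dmp(export_dmp(plan))
```

Because both sides validate against the same list of fields, the receiving system can rely on finding a title and an inventory of products without knowing anything else about the sender.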

Thanks to Tomasz (also a co-chair of the RDA group) and his students for taking an inspirational lead in maDMP prototyping!

Release notes: Guidance, Request feedback, and fixes

We just deployed some bug fixes and minor changes to the DMPTool based on your feedback. Comprehensive release notes are available on GitHub. Keep reading for a summary of things that affect the user interface:

Guidance and Customizing Templates

  • You can create guidance with multiple themes again (instructions in the admin help guide). We’re still working to make sure that guidance tagged with multiple themes only appears once (in some cases it’s duplicated in the accordions displayed to end users); this will be fixed soon.
  • When a user selects guidance on the project details tab, the main organizational guidance group is checked by default and any optional subgroups are unchecked (screenshot below with UCSF test data).
  • We fixed a bug related to customizing funder templates so please go forth and customize (instructions in the admin help guide)! Contact the helpdesk if you created any customizations in the past few weeks and notice that the themes were removed from the customized questions.

default guidance selections

Request feedback workflow

  • We revised the tooltip language to clarify changes on the Admin > Organization details > Request Feedback tab.
  • If you enable this functionality, the system now displays your ‘Request Feedback Message’ to users from your organization on-screen instead of sending them an email (screenshots below; #2 is from the Share tab where users can click the button to ‘Request feedback’).
  • We updated the ‘Request Feedback complete’ email, which is sent when you finish providing feedback on a plan and click the ‘Complete’ link on the Admin > Plans page. It now includes clearer instructions on how the plan owner can find your comments and a direct link to the appropriate page.

request feedback tab

share tab request feedback button

Other DMPTool news

  • The recording from the webinar on themes, guidance, and templates is available on the CDL Vimeo page. You can also find links to webinars from the GitHub wiki and blog.
  • For those who placed orders for marketing materials after 18 Jun, we’ll ship the second batch next month after restocking. Thanks for all the suggestions about other ways we can help you promote the DMPTool—please keep them coming!

Scoping Machine-Actionable DMPs

Machine-actionable data management plans (maDMPs) are happening. Over the past several years we’ve contributed to community discussions and various events to suss out what we all mean by this term and why we think maDMPs are important. In the midst of these efforts, we (California Digital Library) also received an NSF EAGER grant to prototype maDMPs and are now in the process of designing that work.

To connect our prototyping with the constantly evolving maDMP landscape, we remain active in the Research Data Alliance, Force11, domain-based efforts (e.g., AGU Enabling FAIR Data), and of course we run the DMPTool service as part of an international policy/support initiative called the DMP Roadmap project. We also recently helped launch a website, activedmps.org, to identify all of the people and projects across the globe working on maDMPs.

In keeping with this community thread, as well as for our own edification, we’re kicking off an maDMP blog series. The primary goal is to offer some framing documents so other stakeholders, especially those who’ve invested as much time as we have thinking about such an obscure topic (!), can help us ask and answer the many outstanding questions about maDMPs. A secondary motivation is to respond to the frequent queries from our users and other stakeholders about how to envision and plan for an maDMP future, which seems inevitable as more of us begin to prototype in different directions.

For this inaugural scoping piece we want to address the following high-level questions. And just to reiterate, the answers herein are distilled from our own thinking; by no means do we think that these are the correct or only answers. We invite others to challenge our ideas at any/every step along the way.

  1. What are maDMPs?
  2. What are they not? 
  3. Who are they for?
  4. How are they different from “traditional” DMPs?
  5. What does this mean for the future of DMPs and support services?

…What comes next?

 

1. What are maDMPs?
maDMPs are a vehicle for reporting on the intentions and outcomes of a research project that enable information exchange across relevant parties and systems. They contain an inventory of key information about a project and its outputs (not just data), with a change history that stakeholders can query for updated information about the project over its lifetime. The basic framework requires common data models for exchanging information, currently under development in the RDA DMP Common Standards WG, as well as a shared ecosystem of services that send notifications and act on behalf of humans. Other components of the vision include machine-actionable policies, persistent identifiers (PIDs) (e.g., ORCID iDs, funder IDs, forthcoming Org IDs, RRIDs for biomedical resources, protocols.io, IGSNs for geosamples, etc), and the removal of barriers for information sharing.
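To illustrate what an inventory "with a change history that stakeholders can query" could look like, here is a small Python sketch. It is only one possible reading of the idea: the `Change` and `MaDMP` classes, their method names, and the sample values (including the DOI, which is a placeholder) are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any


@dataclass
class Change:
    when: date
    who: str     # stakeholder or system making the assertion
    key: str     # which piece of project information changed
    value: Any


@dataclass
class MaDMP:
    """An append-only log of assertions about a project (hypothetical shape)."""
    changes: list = field(default_factory=list)

    def record(self, when: date, who: str, key: str, value: Any) -> None:
        self.changes.append(Change(when, who, key, value))

    def state_at(self, as_of: date) -> dict:
        """Replay the history up to `as_of` and return the inventory at that time."""
        state = {}
        # sorted() is stable, so same-day changes keep the order they were recorded in.
        for change in sorted(self.changes, key=lambda c: c.when):
            if change.when <= as_of:
                state[change.key] = change.value
        return state

    def history(self, key: str) -> list:
        """Return every change made to one piece of information, oldest first."""
        ordered = sorted(self.changes, key=lambda c: c.when)
        return [c for c in ordered if c.key == key]


plan = MaDMP()
plan.record(date(2018, 1, 10), "researcher", "repository", "Zenodo")
plan.record(date(2018, 9, 1), "repository system", "dataset_doi", "10.5072/example")
plan.record(date(2018, 9, 1), "researcher", "repository", "Dryad")

at_proposal = plan.state_at(date(2018, 6, 1))
at_close = plan.state_at(date(2018, 12, 31))
```

Keeping the log append-only means a funder or repository can ask both "what does the plan say now?" and "what did it say when the grant was awarded?" without anyone overwriting earlier answers.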

2. What are they not?
maDMPs are not a collection of best practices for creating a data management plan (those exist already; see Michener 2015), nor are they a comprehensive record of every detail about a research project and how it was conducted (i.e., they are not the Open Science Framework). It is out of scope to use maDMPs to connect all the things in the universe and try to solve reproducibility. Instead, they are a plan and instructions about how to implement the plan, as well as a report about the completion of the plan; this plan includes an inventory/registry of research outputs and information about what to do with each thing (e.g., length of time to retain a dataset in a repository).

3. Who are they for?
maDMPs are focused primarily on infrastructure providers, systems, and those responsible for creating and enforcing research data policies. maDMPs are not focused primarily on researchers, data librarians, or other research support staff. However, broad adoption by all stakeholders in the research enterprise is required to achieve the goals of the policies, and ideally everyone will reap the benefits. Here is a (roughly) ranked-order list of the target audience for maDMPs:

  • Funder: funding agencies and foundations that specify requirements for DMPs and monitor compliance.
  • Repository Operator: General (e.g., Zenodo, Dryad), disciplinary (e.g., GenBank, ICPSR), and institutional data repositories.
  • Infrastructure Provider: Providers of systems for creating DMPs (DMPTool, DMPonline), grants administration, researcher profiles (RIMS/CRIS), etc.
  • Institutional Administrator: Office of Research/Sponsored Programs, Chief Information Officers, University Librarians, others.
  • Ethics Review: Institutional Review Boards (IRB)/Research Ethics Boards (REB) that authorize human subjects research.
  • Legal Expert: Technology transfer offices; copyright and patent experts.
  • Publisher: Purveyors of article and data publication services.
  • Researcher: Principal Investigator and collaborators, including postdoctoral researchers, graduate and undergraduate students.
  • Research Support Staff: Data managers/curators, research administrators, and data librarians.
machine-actionable DMP info flows

Examples of stakeholder interactions within the ecosystem of machine-actionable DMPs. Stakeholders communicate with each other by exchanging information through DMPs. For example, a repository operator can select a proper repository, set an embargo period, and assign a correct license to data submitted by researchers. In return, a system acting on behalf of a repository operator provides a list of DOIs assigned to the data and provides information on costs of storage and preservation. This in turn can be accessed by a funder to check how the DMP was implemented.

4. How are they different from “traditional” DMPs?
The vision for maDMPs is to automate certain pieces of the DMP process, especially to alleviate the administrative burden of entering the same information in multiple places (e.g., it would be great if a researcher could recycle part or all of an IRB application for a DMP, or generate a Biosketch/CV automatically from their ORCID profile, or automatically generate a data availability statement when publishing data/articles). There is still a need for a human-readable narrative that describes digital research methods and outputs, but the main difference is that it should be updatable so that DMPs can become useful beyond the grant application stage.
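As a small example of the "enter it once, reuse it elsewhere" idea, a data availability statement could be generated from the structured entries already held in a DMP. The Python sketch below assumes a hypothetical entry format with `title`, `repository`, `identifier`, and an optional `embargo_until`; the function name and the wording of the statement are ours, not a publisher's required template.

```python
def data_availability_statement(products: list) -> str:
    """Build a short statement from structured DMP entries.

    Each product is a dict with "title", "repository", and "identifier";
    "embargo_until" is optional.
    """
    if not products:
        return "No datasets were generated or analyzed for this work."
    sentences = []
    for product in products:
        sentence = (
            f"{product['title']} is available from {product['repository']} "
            f"at {product['identifier']}"
        )
        if product.get("embargo_until"):
            sentence += f" after {product['embargo_until']}"
        sentences.append(sentence + ".")
    return " ".join(sentences)


# Placeholder entry; the DOI is not a real identifier.
statement = data_availability_statement([
    {
        "title": "The survey dataset",
        "repository": "Dryad",
        "identifier": "https://doi.org/10.5072/example",
        "embargo_until": "2020-01-01",
    }
])
```

If the repository or embargo date changes in the DMP, regenerating the statement picks up the change, so the researcher does not have to retype it for the publisher.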

5. What does this mean for the future of DMPs and support services?
We get asked this question often, most recently in the form of a provocative email from Dr. Devan Ray Donaldson as he was designing the curriculum for his digital curation course at Indiana University Bloomington.

Our response: Librarians and other digital curation experts absolutely have a role to play in supporting researchers with DMPs and data management issues more broadly. At CDL we spend a lot of time digging into the weeds of digital curation issues with librarians and researchers at all 10 UC campuses and we noticed that a major barrier to effectively supporting researchers is that they don’t recognize the language/jargon of digital curation. At the risk of self-promotion I’ll direct you to this guide that we created based on our collective experiences as researchers, and now as people who support researchers, called “Support Your Data.” John Borghi was the main driver of the project (more details from him here) and we’re now developing more attractive resources and a website to adapt for your purposes if you find these materials useful. The goal is to educate researchers about good data management practices by relating to their current practices, and demonstrate how small habits (e.g., file naming conventions) can amount to better/more efficient research.

… What comes next?
maDMPs present an opportunity to move DMPs beyond a compliance exercise by providing needed structure, interoperability, and added-value functionality to support open, reusable research data. We’re designing and developing an open framework for maDMPs that builds on existing initiatives and infrastructure. There are numerous efforts focused on connecting people and outputs (e.g., ORCID, Wikidata, Scholix, NCBI accession numbers). We want to link this information with grant numbers to create a dynamic inventory of assertions about a grant-funded research project (note: in the future we’ll also consider DMPs not associated with grants).

Step 1 for us is to get seed data from our partners at BCO-DMO and the UC Berkeley Gump Field Station on Moorea and structure it to define native maDMPs. We’ll discuss subsequent steps in future blog posts. Stay tuned!