Finding our Roadmap rhythm

Image from page 293 of "The life of the Greeks and Romans" (1875) by Guhl, Koner, and Hueffer. Retrieved from the Internet Archive https://archive.org/details/lifeofgreeksroma00guhl

In keeping with our monthly updates about the merged Roadmap platform, here’s the short and the long of what we’ve been up to lately:

Short update

Long(er) update

This month our main focus has been getting into a steady 2-week sprint groove that you can track on our GitHub Projects board. DCC/DMPonline is keen to migrate to the new codebase asap so in preparation we’re revising the database schema and optimizing the code. This clean-up work not only makes things easier for our core development team, but will facilitate community development efforts down the line. It also addresses some scalability issues that we encountered during a week of heavy use on the hosted instance of the Finnish DMPTuuli (thanks for the lessons learned, Finland!). We’ve also been evaluating dependencies and fixing all the bugs introduced by the recent Rails and Bootstrap migrations.

Once things are in good working order, DMPonline will complete their migration and we’ll shift focus to adding new features from the MVP roadmap. DMPTool won’t migrate to the new system until we’ve added everything on the list and conducted testing with our institutional partners from the steering committee. The UX team from the CDL is helping us redesign some things, with particular attention to internationalization and improving accessibility for users with disabilities.

The rest of our activities revolve around gathering requirements and refining some use cases for machine-actionable DMPs. This runs the gamut from big-picture brainstorming to targeted work on features that we’ll implement in the new platform. The first step to achieving the latter involves a collaboration with Substance.io to implement a new text editor (Substance Forms). The new editor offers increased functionality and a framework for future work on machine-actionability, and it delivers a better user experience throughout the platform. In addition, we’re refining the DMPonline themes (details here). We’re still collecting feedback and are grateful to all those who have weighed in so far. Sarah and I will consolidate community input and share the new set of themes during the first meeting of a DDI working group to create a DMP vocabulary. We plan to coordinate our work on the themes with this parallel effort; more details will follow as things get moving on that front in November.

Future brainstorming events include PIDapalooza—come to Iceland and share your ideas about persistent identifiers in DMPs!—and the International Digital Curation Conference (IDCC) 2017 for which registration is now open. We’ll be presenting a Roadmap update at IDCC along with a demo of the new system. In addition, we’re hosting an interactive workshop for developers et al. to help us envision (and plan for) a perfect DMP world with tools and services that support FAIR, machine-actionable DMPs (more details forthcoming).

Two final pieces of info: 1) We’re still seeking funding to speed up progress toward building machine-actionable DMP infrastructure; we weren’t successful with our Open Science Prize application but are hoping for better news on an IMLS preliminary proposal (both available here). 2) We’re also continuing to promote greater openness with DMPs; one approach involves expanding the RIO Journal Collection of exemplary plans. Check out the latest plan from Ethan White that also lives on GitHub and send us your thoughts on DMP workflows, publishing and sharing DMPs.

New template: DOD

As far as we can discern, DMPs are not yet a required component of Department of Defense (DOD) grant applications. But in an effort to address numerous user requests for a DOD template, we went ahead and created one based on the draft DOD Public Access Plan issued in Feb 2015, which states:

“This proposed plan is a draft at this point and has not been adopted as part of the DoD regulatory system or as a definitive course of action.”

The (draft) DOD requirements for DMPs are similar to those issued by NSF, NASA, and others so DMPTool users should note the resemblance among these templates. Another similarity is that the DOD plan focuses heavily on access to data underlying published articles. The plan mentions an implementation date at the end of FY 2016 — we will monitor the situation and update the template accordingly. This also presents an opportunity to monitor the new CENDI.gov inventory of public access plans.

Meanwhile, the DOD encourages pilot projects with voluntary submission of articles and data. The Defense Technical Information Center (DTIC) will be responsible for key elements of policy implementation and compliance monitoring (see their prototype DOD Public Access Search for articles that mention DOD funding).

Official news remains pending, but for now we’re happy to provide a draft DOD template for conscientious researchers. If anyone has experience with DOD programs asking for DMPs or related developments, please let us know!

A common set of themes for DMPs: Seeking input

When the Digital Curation Centre (DCC) revised DMPonline in 2013, we introduced the concept of themes to the tool. The themes represent the most common topics addressed in Data Management Plans (DMPs) and work like tags to associate questions and guidance. Questions within DMP templates can be tagged with one or more themes, and guidance can be written by theme to allow organisations to apply their advice over multiple templates at once. This means organisations don’t have to worry about monitoring changes in requirements and updating their guidelines each time a new template is released.

Institutional guidance on ‘Storage and Backup,’ overlaid onto a funder template
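To make the tagging idea concrete, here is a minimal sketch of how theme tags could connect questions and guidance. It is not the DMPonline implementation: the data layout, the `guidance_for` helper, and the question and guidance wording are all assumptions made up for illustration. Only the theme names come from our posts.

```python
# Hypothetical data: theme names appear in the post, but the question and
# guidance texts below are invented for illustration.
template_questions = [
    {"text": "How will the data be stored and backed up?",
     "themes": {"Storage and Backup"}},
    {"text": "What documentation and metadata will accompany the data?",
     "themes": {"Metadata", "Documentation"}},
]

# An organisation writes its guidance once per theme, not once per template.
guidance_by_theme = {
    "Storage and Backup": ["Use the university's managed storage service."],
    "Metadata": ["Follow a disciplinary metadata standard where one exists."],
}


def guidance_for(question, guidance):
    """Collect guidance for every theme tagged on the question."""
    matched = []
    for theme in sorted(question["themes"]):  # sorted for a stable order
        matched.extend(guidance.get(theme, []))
    return matched


for question in template_questions:
    print(question["text"], "->", guidance_for(question, guidance_by_theme))
```

Because the lookup goes through the theme rather than the template, a question in a newly released template picks up the same guidance as soon as it carries the same tag.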

Moving forward, we see potential for broader application of the themes. In collaboration with the DMPTool, we plan to use a refined set of themes to support our objectives around machine-actionable DMPs. The themes provide the beginnings of a common vocabulary and structure for DMPs and could help to identify sections of text to mine, e.g., to identify a repository named in a DMP and the volume of data in the pipeline.

Stephanie and I have revised the existing set of Data Management Planning themes and propose a shortened set of 17 themes. We merged several closely related themes, e.g., ‘Metadata’ and ‘Documentation.’ Now we’re keen to collect your feedback about whether the themes still cover all the required elements and if they make sense to users. The goal is to find a suitable balance between the total number of themes (for mining and for usability considerations when creating guidance) and granularity. Specific questions we have are:

  • Should ‘Existing data’ be a separate category? We’ve merged it with the general ‘Data description’ on the rationale that reusing data doesn’t apply in all domains.
  • Should the ‘Data repository’ theme be merged with ‘Preservation,’ or is it better kept separate since repositories cover both preservation and sharing?
  • Several themes address data sharing: one is generic (‘Data sharing’), one addresses the ‘Timeframe for sharing,’ and one covers ‘Restricted-use data.’ Is this granularity needed, or should some of these themes be merged, e.g., ‘Data sharing’ and ‘Restricted-use data’?

We’re reaching out to various groups on this: the FORCE11 FAIR DMP group, the RDA Active DMPs group, the CASRAI UK DMP working group, and the Data Documentation Initiative (DDI) Active DMPs working group. Naturally we’re also consulting the DMPonline and DMPTool user groups and are keen to receive feedback from any other quarters too, so please pass this notice on to colleagues! Comments can be left on the blog here or emailed to the DMPONLINE-USER-GROUP.

The original and revised sets of themes are below for reference:

 

New release: Privacy policy, plan visibility, and more…

We just released a batch of subtle changes designed to boost community insight into DMP behaviors. With DMPTool usage continuing to grow by leaps and bounds, we’re well embedded in burgeoning initiatives to build RDM programs, promote open scholarship, and reimagine DMPs as dynamic, updatable inventories of research activities. The tweaks and enhancements outlined below are about determining what we should be measuring and using this information to contribute to our collective data management efforts.

But before we get into the technical details, here’s a snapshot of DMPTool usage to date (a full report is next on the agenda). Our U.S.-centric user community is comparable in shape and size to that of DMPonline for the UK (plus Europe, Canada, and Australia), which reinforces our combined position as international DMP players.

  • Total n users = 20,390
  • Total n plans = 17,526 (13,612 excluding plans with “test” in the title)
  • Total n participating organizations = 194
    • 171 universities/institutions
    • 8 organizations, distributed or discipline-specific (e.g., DataONE, UCAR, WHOI)
    • 15 funders, some participating actively (e.g., template maintenance), others passively
  • Top 5 templates: NSF-SBE, NIH-GEN, NSF-GEN, NSF-ENG, NSF-BIO
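For anyone who wants to check the arithmetic behind the snapshot, the short sketch below derives the number of plans with “test” in the title and confirms that the organization breakdown adds up. The figures are copied from the list; the variable names are ours.

```python
# Figures copied from the usage snapshot; variable names are illustrative.
total_plans = 17_526
plans_excluding_test = 13_612
org_breakdown = {
    "universities/institutions": 171,
    "distributed or discipline-specific organizations": 8,
    "funders": 15,
}

# Plans with "test" in the title are the difference between the two counts.
test_titled_plans = total_plans - plans_excluding_test
test_share = test_titled_plans / total_plans

# The three categories should sum to the reported 194 organizations.
total_orgs = sum(org_breakdown.values())

print(f"Plans with 'test' in the title: {test_titled_plans}")
print(f"Share of all plans: {test_share:.1%}")
print(f"Participating organizations: {total_orgs}")
```

Roughly one plan in five carries “test” in its title, which is part of why filtering test plans matters for meaningful usage statistics.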

Release Notes

  • Privacy Policy/Terms of Use. We updated our privacy policy and terms of use, rolling them into a single, easy-to-read-and-understand package (see Terms of Use). There were no changes to the policy itself; rather we wanted to make the terms transparent to users, bring our policy language in line with DMPonline, and lay a foundation for exposing more usage data to institutional admins. This also helps pave the way for machine-actionable DMPs—more on that subject in a forthcoming blog post.
  • Plan visibility settings. We made some related changes to revise language within the tool about plan visibility settings (screenshots below). Note that plans are no longer “private” by default. We’re now asking users to choose a visibility setting at the beginning of the plan creation process. In addition, they’ll be asked to confirm their choice at the end. This should reveal preferences about sharing plans, and *hopefully* we can encourage more users to open their plans up to “public” or “institutional” audiences. The Quick Start Guide and other portions of the Help menu have also been updated to reflect these changes.
  • Test plans. We added a “test or practice” option for plan visibility (screenshots below). This will enable us (and institutional partners) to filter test plans from usage statistics in addition to helping us curate the Public DMPs list.
  • Get a list of plans. We updated two API calls so authorized admins can retrieve information about ALL plans created by users from their institution (get a list of plans, and get a list of plans with all related attributes). Please note that admins will only be able to see private plans created after we implemented these changes. Admins can still get aggregated, anonymized usage info about total plans, templates used, etc. for all plans created at their institution since the beginning of DMPTool time (see the GitHub wiki for a complete list of API calls).
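As a rough illustration of why an explicit “test or practice” setting helps, the sketch below compares filtering on the plan title with filtering on a visibility value. The record layout, the `"test"` visibility label, and the sample titles are assumptions for illustration, not the DMPTool data model or API output.

```python
# Hypothetical plan records; field names and values are assumptions.
plans = [
    {"title": "Soil microbiome survey", "visibility": "public"},
    {"title": "test plan please ignore", "visibility": "private"},
    {"title": "Practice run", "visibility": "test"},
    {"title": "Protest movements archive", "visibility": "institutional"},
]


def exclude_by_title(records):
    """Title heuristic: drop any plan with 'test' anywhere in its title."""
    return [p for p in records if "test" not in p["title"].lower()]


def exclude_by_visibility(records):
    """Explicit flag: drop only plans the user marked as test or practice."""
    return [p for p in records if p["visibility"] != "test"]


print([p["title"] for p in exclude_by_title(plans)])
print([p["title"] for p in exclude_by_visibility(plans)])
```

In this made-up sample the title heuristic wrongly drops “Protest movements archive” and misses “Practice run,” while the explicit flag only catches plans whose owners chose the new setting, so older test plans would still need the title check.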

[Screenshots: plan visibility buttons; tooltip for plan visibility options; “Confirm your DMP visibility choice” message]

As always, we’re eager to know what you think. Please send us your questions, comments, use cases for machine-actionable DMPs, etc!

New template: NIJ (DOJ)

The National Institute of Justice (NIJ) is the research, development, and evaluation agency of the U.S. Department of Justice (DOJ). We created a template to assist NIJ funding applicants with preparing a Data Archiving Plan. This is essentially a 1–2 page DMP submitted with grant proposals: 1) to demonstrate that you recognize that data sets resulting from your research must be submitted as grant products for archiving and that you have budgeted accordingly, and 2) to describe how the data will be prepared and documented to allow reproduction of the project’s findings as well as future research that can extend the scientific value of the original project. The policy also notes that “some amount of grant award funds is typically withheld for submission of research data along with the final report and other products/deliverables.”

In most cases, the NIJ requires grantees to deposit their data in the National Archive of Criminal Justice Data (NACJD), which is hosted by ICPSR. The template contains links to guidelines, best practices, FAQs, and other helpful information provided by the NACJD and ICPSR, including specific instructions pertaining to common types of social science data and software.

While the NIJ is not subject to the OSTP Memo, the requirement to submit a Data Archiving Plan has been in place since 2014. We finally added a template in response to a user request.

NASA template update & bug fix

NASA template

Last week NASA launched a new Research Portal, with consolidated information regarding data management plans and publications. There are no changes to the DMP requirements as the public access plan remains the same. The big news concerns the creation of PubSpace, an open access article repository that is part of the NIH-managed PubMed Central. Beginning with 2016 awards, all NASA-funded authors and co-authors will be required to deposit copies of their peer-reviewed scientific publications and associated data into PubSpace.

Another new resource is the NASA Data Portal, which bears the following description:

“The NASA data catalog serves not as a repository of study data, but as a registry that has information describing the dataset (i.e., metadata) and information about where and how to access the data. The public has access to the catalog and associated data free of charge. NASA will continue to identify additional approaches involving public and private sector entities and will continue efforts to improve public access to research data. NASA will explore the development of a research data commons—a federated system of research databases—along with other departments and agencies for the storage, discoverability, and reuse of data, with a particular focus on making the data underlying the conclusions of federally funded peer-reviewed scientific research publications available for free at the time of publication.”

In response to the announcement, we’ve updated a few guidance links for the NASA template and reached out to the NASA Open Innovation Team (part of the office of the CIO), which appears to be in charge of these new initiatives.

Review workflow: Refinements and fixes

After releasing the review workflow enhancements, we encountered a bug that prevented the system from sending out an email notification if an institution did not create a customized message. Only one user was affected and we have since fixed the issue. We also added a grayed-out default message to the box on the Institution Profile page. We apologize if any emails went awry and invite you to test again and let us know if things are working as expected. You can also check out the updated documentation on the GitHub wiki.

Review workflow enhancements

We deployed some enhancements to the review workflow in response to feedback. With increasing use of this functionality, we appreciate you letting us know what works for you and what doesn’t. In the next version of the tool, we plan to dispense with the term “review” altogether and replace it with more informal language to avoid confusing researchers (e.g., “feedback” or “comments”). The following changes to the current tool should hopefully improve things for all users. And as always, we want to know what you think!

One more small thing to note: we updated the generic slide decks (PDF and Google doc) on the promotional materials page.

  • Replaced “Submit for Review” button with “Request Feedback” for templates enabled for Informal Review

  • Provided complete history of reviewed plans in admin dashboard. Admins and plan owners can add new comments to previously reviewed plans.

  • Added a field to the Institution Profile page where admins can customize the automated email message that users receive when they Request Feedback on a plan

New templates: DOT and NASA

We just added two new funder templates in response to user requests. Both the U.S. Department of Transportation (DOT) and the National Aeronautics and Space Administration (NASA) have required a data management plan with grant proposals since 2015, but for various reasons (detailed below) we held off on creating templates.

Next on the list are DOD and NIJ templates. Please let us know if you need a specific template and we’ll bump it to the front of the line.

DOT Template

Via conversations with members of the National Transportation Library (NTL) and the American Association of State Highway and Transportation Officials (AASHTO), we learned more about the bureaucratic hurdles that stand between an agency issuing a public access plan in compliance with the OSTP memo and being able to enforce that plan legally. Suffice it to say, it’s complicated (for the DOT it involves the Paperwork Reduction Act). The DOT lawyers requested that we not provide a public DOT template until they cleared these hurdles, but then they softened their stance on the condition that we include the following disclaimer:

“This tool serves to provide guidance for how to prepare a Data Management Plan (DMP). The output of this tool does not constitute an approved government form. Those preparing DMPs for submission to the U.S. Department of Transportation (USDOT) should use their best judgment in determining what information to include. USDOT has identified five (5) broad areas that should be addressed in a DMP, but is not requiring any specific information to be included in any submitted DMP. USDOT may, at its discretion, establish an Office of Management and Budget-approved information collection. Once approved, the information collection will become a form with a control number, and certain DMP elements may become mandatory.”

Throughout these conversations, we gained valuable insight into the vibrant DOT community and became fans of the NTL for providing such helpful guidance (links included in the template). The NTL also hosts a regular webinar series on data management and invited me to give a DMPTool presentation (past recordings available on their website). One noteworthy feature of the DOT plan is that it requires researchers to obtain an ORCID, which will be used in the reporting workflow to identify research outputs. We look forward to working with the NTL to maintain the DOT template in the future!

NASA Template

NASA also seems to be in limbo regarding enforcement of their public access plan. This blog post is instructive and various NASA webpages contain general information about data management plans, often infused with humor, e.g.:

“Remember, this is a directive from the white house and if you are really bad The President will call your dean and shame you. Just kidding, but awardees who do not fulfill the intent of their DMPs may have continuing funds withheld and this may be considered in the evaluation of future proposals, which may be even worse…” (DMP FAQ Roses)

Because we received so many requests for a NASA template, we decided to go ahead and create one with the information at hand (official Public Access Plan), and with the expectation that there will be revisions and updates to come. If you have suggestions of additional resources to include in the NASA template, please let us know.

Getting our ducks in a row

From Flickr by Cliff Johnson, CC BY-SA 2.0

Recent activity on the Roadmap project encompasses two major themes: 1) machine-actionable data management plans and 2) kicking off co-development of the shared codebase.

Machine-actionable DMPs

The first of these has been a hot topic of conversation among stakeholders in the data management game for some time now, although most use the phrase “machine-readable DMPs.” So what do we mean by machine-actionable DMPs? Per the Data Documentation Initiative definition, “this term refers to information that is structured in a consistent way so that machines can be programmed against the structure.” The goal of machine-actionable DMPs, then, is to better facilitate good data management and reuse practices (think FAIR: Findable, Accessible, Interoperable, Reusable) by enabling:

  • Institutions to manage their data
  • Funders to mine the DMPs they receive
  • Infrastructure providers to plan their resources
  • Researchers to discover data

This term is consistent with the Research Data Alliance Active DMPs Interest Group and the FORCE11 FAIR DMPs group mission statements, and it seems to capture what we’re all thinking: i.e., we want to move beyond static text files to create a dynamic inventory of digital research methods, protocols, environments, software, articles, data… One reason for the DMPonline-DMPTool merger is to develop a core infrastructure for implementing use cases that make this possible. We still need a human-readable document with a narrative, but underneath the DMP could have more thematic richness with value for all stakeholders.
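To illustrate what “programmed against the structure” could look like in practice, here is a minimal sketch in which a plan keeps its narrative but also carries structured dataset entries. The class names, fields, repository names, and numbers are all hypothetical; this is not a proposed schema, just one way an infrastructure provider might total expected deposits across plans.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetEntry:
    # Hypothetical fields chosen for illustration only.
    title: str
    repository: str
    expected_volume_gb: float


@dataclass
class MachineActionableDMP:
    project_title: str
    narrative: str  # the human-readable document is still kept
    datasets: List[DatasetEntry] = field(default_factory=list)


def volume_by_repository(dmps: List[MachineActionableDMP]) -> Dict[str, float]:
    """Sum the expected data volume per repository across all plans."""
    totals: Dict[str, float] = {}
    for dmp in dmps:
        for dataset in dmp.datasets:
            totals[dataset.repository] = (
                totals.get(dataset.repository, 0.0) + dataset.expected_volume_gb
            )
    return totals


# Invented example plans and repositories.
example_dmps = [
    MachineActionableDMP(
        project_title="Project A",
        narrative="We will deposit survey data after publication.",
        datasets=[DatasetEntry("Survey responses", "Example Repository X", 50.0)],
    ),
    MachineActionableDMP(
        project_title="Project B",
        narrative="Sensor readings will be archived annually.",
        datasets=[
            DatasetEntry("Sensor readings", "Example Repository X", 200.0),
            DatasetEntry("Field notes", "Example Repository Y", 1.5),
        ],
    ),
]

print(volume_by_repository(example_dmps))
```

With free-text plans the same question would require mining the narrative; with consistent structure it is a few lines of code.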

A recent CERN/RDA workshop presented the perfect opportunity to consolidate our notes and ideas. In addition to the Roadmap project members, Daniel Mietchen (NIH) and Angus Whyte (DCC) participated in the exercise. We conducted a survey of previous work on the topic (we know we didn’t capture everything so please alert us to things we missed) and began outlining concrete use cases for machine-actionable DMPs, which we plan to develop further through community engagement over the coming months. Another crucial piece of our presentation was a call to make DMPs public, open, discoverable resources. We highlighted existing efforts to promote public DMPs (e.g., the DMPTool Public DMPs list, publishing exemplary DMPs in RIO Journal, Dataverse collections that include DMPs) but these are just a drop in the bucket compared to what we might be able to do if all DMPs were open by default.

You can review our slides here. And please send feedback—we want to know what you think!

Let the co-development begin!

Now for the second news item: our ducks are all in a row and work is underway on the shared Roadmap codebase.

We open with a wistful farewell to Marta Ribeiro, who is moving on to an exciting new gig at the Urban Big Data Centre. DCC has hired two new developers to join our ranks—Ray Carrick and Jimmy Angelakos—both from their sister team at EDINA. The finalized co-development team commenced weekly check-in calls and in the next week or two we’ll begin testing the draft co-development process by adding three features from the roadmap:

  1. Enhanced institutional branding
  2. Funder template export
  3. Link an ORCID via OAuth

In the meantime, Brian completed the migration to Rails 4.2 and both teams are getting our development environments in place. Our intention is to iterate on the process for a few sprints, iron out the kinks, and then use it and the roadmap as the touchstones for a monthly community developer check-in call. We hope this will provide a forum for sharing use cases and plans for future work (on all instances of the tool) in order to prioritize, coordinate, and alleviate duplication of effort.

The DCC interns have also been plugging away at their respective projects. Sam Rust just finished building some APIs for creating plans and extracting guidance, and is now starting work on the usage statistics use case. Damodar Sójka meanwhile is completing the internationalization project, drawing from work done by the Canadian DMP Assistant team. We’ll share more details about their work once we roll it all back into the main codebase.

Next month the UC Berkeley Web Services team will evaluate the current version of DMPonline to flag any accessibility issues that need to be addressed in the new system. We’ve also been consulting with Rachael Hu on UX strategy. We’re keeping track of requests for the new system and invite you to submit feedback via GitHub issues.

Stay tuned to GitHub and our blog channels for more documentation and regular progress updates.

The 20:51 sprint (Roadmap team-building: UK edition)

This week we hosted the DMPTool team to flesh out our plans for ‘roadmap’ – the joint codebase we’re building together based on DMPonline and DMPTool. The key focus was reviewing and prioritising tasks for an initial release. Building on discussions from the earlier US visit, we confirmed what work was to be done and agreed to begin with some well-defined, short tasks as a test of our co-development procedures. With everyone taking leave over the coming weeks, the first sprint will start in mid-July at which point we’ll begin adding documentation to the GitHub repository.

We also discussed communication plans. Stephanie and I will take turns to do monthly blog posts so you can stay in the loop with what’s happening, and we aim to start regular calls in a few months with others who are actively working on the code, such as the Portage group in Canada. This will allow everyone to share their plans for future enhancements and to coordinate development activities. We’re always learning about new people who have picked up on our software – the latest being a group in Germany who have extended the DMPTool code to offer a bilingual interface – so we want to do more to bring these efforts together. While we get our work underway, we encourage people to join the developers list as a place to start discussions and form a community of interest.

Machine-actionable DMPs was another key theme for which Daniel Mietchen joined our discussions. Stephanie has an RDA/US Data Share Fellowship to pursue work in this area and we’re planning to give talks at some upcoming events highlighting our ideas. We’ve started to refine the themes used for guidance in DMPonline. Currently, DCC defines 28 themes corresponding with UK funder questions that are often addressed in a DMP (e.g., Data format, Metadata, Ethical issues, etc.). These themes offer the perfect starting point for standardising and structuring DMPs so it’s more feasible to identify and mine relevant text. We’ll be seeking comments from our user communities and key working groups such as CASRAI, RDA and FORCE11 shortly on this. We’re also keen to capture more data in a controlled way so it can be put to better uses. One idea is to provide an actionable list of repositories to allow researchers to select where they are going to deposit research outputs, and then to use this data to push notifications out to alert repositories and/or monitor compliance. Machine-actionable DMPs have been part of the future plans of both teams for some years, and they are currently a hot topic. We’re excited that we now have the resources to develop those ideas and a system that will allow us to test them via deployment. We also want to collect additional use cases and explore integrations with other systems so please don’t hesitate to get in touch.

The eagle-eyed among you may have spotted some new faces in the team photos. The DCC has two student interns from Informatics working on DMPonline over the summer. Damodar is doing the internationalisation work that we consulted with the user group on at IDCC and Sam is busy developing an API. Both are making great progress so we’ll be looking for input from the user group again soon to try out the new features. The DMPTool team includes a new developer called Brian joining as technical lead. The visit was a great team-building opportunity for our transatlantic DMP roadmap project.

It was a jam-packed week with lots of meetings, brainstorming sessions and time working together on the code. We had new culinary experiences (deep-fried haggis balls no less!), heard some hilarious tales from the adventures of John Chodacki, and initiated the US team in the Glasgow-Edinburgh commute, including a quick dash one evening to make the 20:51 train home. Here’s the photographic proof of our first successful joint sprint. Stay tuned for what else we deliver over the coming months.