Importance of Data Management Education

Seventy percent of all PhDs awarded in the state of California in science, technology, engineering, and mathematics are granted by the University of California system. In fact, the UC system is responsible for training approximately 10,000 graduate researchers and supporting over 6,000 postdoctoral fellows in their research. The amount of data generated by these graduate students and scientists is difficult to estimate, but is likely near the petabyte range (a petabyte is 10^15 bytes, that is, a 1 followed by 15 zeros). Although fledgling scientists receive training in collection methods, instrument use, statistics, and the ethics of science, good data management is not often emphasized. Without proper data management, these data are at risk of being lost.

The lack of data management training may stem from advisors themselves having poor data literacy, but this situation is likely to change with the implementation of data management plan requirements by NSF. Principal investigators who head a laboratory group and write grant proposals are now also responsible for writing data management plans. With the assistance of the DMP Tool, these investigators will be able to construct thoughtful data management plans that can be implemented by themselves and the scientists they train. The next generation of scientists will then be better equipped to properly document, manage, and archive their own data and those of the scientists they train.

Source for University of California statistics

UC Funding at Risk without Good DMPs

Government agencies are tightening their belts due to the current economic climate. The National Science Foundation is no exception: its budget for research activities decreased by $150 million in 2011. The logical assumption is that fewer projects will be funded, and therefore competition for the remaining funds will be fierce.

Many scientists have experienced the frustration of submitting a grant proposal that receives favorable reviews (all “very good” or “excellent” ratings) yet is not funded. The funding rate for NSF as a whole was 32% in 2009, with the lowest funding rates in the Engineering and Biological Sciences directorates (25% and 28%, respectively). Increasingly, proposals that are funded must be more than good; they must be stellar.

As of January 18, 2011, all NSF grant proposals must include a data management plan, or DMP. This document, much like a Broader Impacts statement, is a supplement to the main body of the 15-page proposal. Although the DMP is in its early phase of implementation, it is reasonable to expect that the DMP will carry weight similar to that of the Broader Impacts statement in awarding grants during these fiscally challenging times.

In 2010, the University of California received almost $500 million in research funds from the National Science Foundation.  This funding is in jeopardy if UC scientists are not cognizant of the importance of a good data management plan in their next NSF proposal. The DMP Tool is meant to help guide scientists in the creation of an excellent data management plan, and should be used in conjunction with talking to your discipline’s librarian about how best to structure your data organization, storage, and archiving.

Aside from the “stick” that NSF is using to encourage data management plans, there is also a “carrot”: researchers will benefit immensely from even a minimal amount of planning for their data management. The University of California campuses are leaders among universities in receiving research funding from the National Science Foundation. To maintain our position as top-tier research institutions, it is imperative that good data management plans accompany all proposals to the NSF.

Sources: UC Data, NSF Merit Review Document for FY 2009, Blog from Today’s Engineer about FY 2012 budget, citing House Resolution 1