Seventy percent of all PhDs awarded in the state of California for science, technology, engineering, and mathematics are granted by the University of California system. The UC system is responsible for training approximately 10,000 graduate researchers and supporting over 6,000 postdoctoral fellows in their research. The amount of data generated by these graduate students and scientists is difficult to estimate, but is surely near the petabyte range (a petabyte is 10^15 bytes, that is, a 1 followed by 15 zeros). Although fledgling scientists receive training in collection methods, instrument use, statistics, and the ethics of science, good data management is not often emphasized. Without proper data management, these data are at risk of being lost.
The lack of data management training may stem from advisors themselves having poor data literacy, but this situation is likely to change with the implementation of data management plan requirements by the NSF. Principal investigators, who head laboratory groups and write grant proposals, are now also responsible for writing data management plans. With the assistance of the DMP Tool, these investigators will be able to construct thoughtful data management plans that can be implemented by themselves and the scientists they train. The next generation of scientists will then be better equipped to properly document, manage, and archive their own data and those of the scientists they train.