Data practices and data management

Blog series on scientific integrity

Have you ever had to go back to research data you collected a few years ago, only to be frustrated that you could no longer remember where you stored them or what you did to obtain them? If you’re like the average researcher, this has most likely happened to you many times throughout your research career. When we store data, we rarely think about documenting everything consistently.

At the time of storage, your brain knows exactly where everything is and what happened to it. But human memory is notoriously unreliable, and over time those once-crisp memories turn into a murky lake of chaos and frustration! The key to solving this problem is proper data management. The FAIR principles of Findability, Accessibility, Interoperability and Reusability are a useful guideline here: they let you rely on documentation rather than memory, and they help you contribute to the Open Science movement. Following them makes your data reusable not just for you, but for any other researcher you plan to share those data with. Moreover, being FAIR and Open increases transparency in science and helps you adhere to the standards of scientific integrity.

Data management during research

From the moment you plan your experiment all the way to analysing your results and writing them up, data management should always be in the back of your mind. The Findability (F), Interoperability (I) and Reusability (R) aspects of FAIR play a big part here. Important things to consider are:

Storing data in open, non-proprietary file formats (so no obscure formats or formats that require highly specialized software). (I)
Using a standard vocabulary, preferably based on official ontology terms, making it much easier to compare different studies with each other. (I)
Using an electronic (lab) notebook instead of a paper one, together with a set of guidelines on how to consistently document experiments (if you conduct them), other types of data collection, and metadata. (F and R)

Sharing data

If you manage your data consistently, sharing those data with other researchers will be easy for everyone involved. But how can you best share your data? And when do you normally do this? Sharing data is strongly linked to the Findability (F), Accessibility (A), and Reusability (R) aspects of FAIR. One of the best-known examples of data sharing is placing your data set in a public data repository. This generally means that:

The data set can easily be found using metadata and a globally unique persistent identifier, such as a DOI. (F)
It is clearly described how the data can be accessed (freely or, if required, using authentication/authorisation steps) and the metadata are always freely accessible. (A)
Data sets are released under a clearly defined usage license, often a flexible Creative Commons license, which tells other people exactly how they may reuse your work. (R)

These three points normally also apply to papers you publish, where articles receive a DOI, access protocols are clear (either Open Access, which is much preferred in Open Science, or via a subscription), metadata is always available (e.g., title, authors, abstract), and the usage license is clearly defined.

A great example of a general-purpose data repository is DataverseNL. Just like your papers, data sets in DataverseNL receive their own DOI and are made available under a flexible Creative Commons usage license. Apart from DataverseNL, many other options are available, including domain-specific repositories. To make your work truly reusable for others, sharing your data is the way to go!

Long-term storage of data

The Dutch code of conduct for academic practice states that research data should remain available for a minimum of 10 years. While you could use a public data repository to store your data for the long term, you would need a guarantee that the data will stay stored for that 10-year minimum. More importantly, some data cannot be shared publicly at all, for example because of intellectual property (IP) restrictions or for privacy reasons. How can you then adhere to the code of conduct without violating IP or privacy regulations? The answer is simple: DataHub’s Maastricht Data Repository (MDR). MDR is Maastricht University’s (UM) in-house private data archive, where you can safely store all types of data for the long term in a secure environment. Access is restricted to you as the data owner and anyone else you explicitly grant access to. If you ever need to go back to your old data, they will be right there waiting for you!

Getting help

The University Library can help you with many aspects of Open Science and research integrity, and it provides the services and infrastructure you need to “FAIR-ify” your data. Don’t hesitate to check out its website to find out more. Of course, you can also contact the Open Science Community Maastricht (OSCM) for more information on Open Science or FAIR. The OSCM organizes FAIR lectures and workshops, has a wide range of expertise, and has the right connections to help you with any questions.