Data management is a large topic and there are many excellent resources available on the internet. This page provides links ranging from information specific to the HMTF programme to general guidance on best practice in research data management.
NERC requires the programme to document its datasets (metadata) to a standard that would allow a future researcher to understand, or potentially reproduce, each dataset. Comprehensive documentation of datasets is good scientific research practice and will ensure that datasets archived at NERC data centres (or other appropriate repositories, e.g. ForestPlots) can contribute to future research. Part of the documentation process is also ensuring that anyone re-using an archived dataset correctly references the original researchers.
If you have any queries about specific or general data management issues, please contact the HMTF Data Manager (firstname.lastname@example.org).
Metadata – describing your datasets
Metadata is data that describes other data. The EIDC and NERC require metadata that conforms to the UK GEMINI standard for spatial data.
NERC requires discovery metadata – the essential information that enables the potential user of data to find out if a particular resource exists, its location, ownership and whether it meets their requirements.
HMTF data management resources
All researchers have been sent an Excel spreadsheet template to fill in and return to the data manager (email@example.com). The template records basic details about the datasets they will be creating, e.g. name, description, file type, likely final size and the date the dataset is likely to be complete.
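As a rough illustration only, the details requested in the template could be checked for completeness before the spreadsheet is returned. The sketch below is not the HMTF template itself: the class name, field names and example values are hypothetical, chosen to mirror the details listed above.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List, Optional


@dataclass
class DatasetEntry:
    """One row of a dataset inventory (hypothetical field names)."""
    name: str = ""
    description: str = ""
    file_type: str = ""
    likely_final_size: str = ""
    expected_completion: Optional[date] = None


def missing_fields(entry: DatasetEntry) -> List[str]:
    """Return the names of the fields that have not been filled in yet."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


# Made-up example entry with two details still to be supplied.
entry = DatasetEntry(
    name="Example stream chemistry",
    description="Monthly water samples from example catchments",
    file_type="csv",
)
print(missing_fields(entry))  # ['likely_final_size', 'expected_completion']
```

A check like this only confirms that each detail is present; whether the description is clear enough for another researcher still needs a human reader.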
NERC metadata guidelines
General good practice guidance
Searches on the metadata portals and search engines to which the metadata is exposed can return a large number of results. The metadata should therefore be clear and comprehensible enough for the reader to understand the nature of the entry and to assess whether it is suitable for reuse. Poor quality metadata can mean that a resource is effectively hidden from users and remains unused.
When writing good quality metadata, always keep in mind the ABCD of good discovery metadata. Metadata should be:
Accurate – correctly and precisely describes the resource in question
Beneficial – contains information that is useful to the end user without lots of extraneous, irrelevant information
Clear – easily understandable by a non-technical user and unambiguous
Distinctive – contains information that allows it to be distinguished from other, potentially similar, resources
The following metadata fields will be needed. Guidance on each field is given here, and should be read in conjunction with this document.
Spatial extent: (bounding box)
Temporal extent: (dates from / to)
Spatial reference system: e.g. British National Grid, WGS84
Spatial representation type: e.g. raster, vector
Spatial resolution: for gridded data, this is the ground distance (in metres) represented by each pixel. For point data, it is the degree of confidence in the point’s location, e.g. for a point expressed as a six-figure grid reference, SN666781, the resolution would be 100 m.
Author name: (Bloggs, J.J.)
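As an illustration of the spatial resolution guidance above, the resolution implied by a grid reference can be worked out from the number of digits it contains. The sketch below assumes a British National Grid reference written as two letters followed by an even number of digits; the function name and the dictionary keys are hypothetical, and the letter check is deliberately simplified.

```python
import re


def grid_reference_resolution(grid_ref: str) -> int:
    """Return the resolution in metres implied by a grid reference.

    Assumes two letters (a 100 km square) followed by an even number of
    digits, half for the easting and half for the northing. Each extra
    digit per axis divides the 100,000 m square width by ten.
    """
    compact = grid_ref.replace(" ", "").upper()
    match = re.fullmatch(r"[A-Z]{2}(\d*)", compact)
    if match is None or len(match.group(1)) % 2 != 0:
        raise ValueError(f"Not a recognised grid reference: {grid_ref!r}")
    digits_per_axis = len(match.group(1)) // 2
    if digits_per_axis > 5:
        raise ValueError("More than five digits per axis is not supported")
    return 100_000 // 10 ** digits_per_axis


# Hypothetical record using some of the fields listed above.
record = {
    "spatial_reference_system": "British National Grid",
    "spatial_representation_type": "vector",
    "spatial_resolution_m": grid_reference_resolution("SN666781"),
    "author": "Bloggs, J.J.",
}
print(record["spatial_resolution_m"])  # 100
```

The same function gives 10 for an eight-figure reference and 1 for a ten-figure reference, which is a quick way to check that the stated resolution matches how the point locations were actually recorded.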
Where appropriate, all datasets should also have detailed metadata on aspects such as experimental design, sampling, fieldwork or laboratory instrumentation and analytical methods, i.e. any information that a researcher not involved in the project would need to understand and/or re-use the dataset. Further guidance is available.
HMTF programme-specific links
NERC Data Catalogue service – Searchable catalogue of NERC funded data held at all NERC data centres.
Environmental Information Data Centre (EIDC)
The EIDC is the NERC data centre which will store HMTF datasets of long term value.
EIDC data catalogue: View descriptions of currently available datasets from the EIDC and see what discovery metadata is required for the catalogue when your dataset is uploaded to EIDC.
Datasets held by the Environmental Information Data Centre can be found by searching or browsing the CEH Data Catalogue using a text search and/or map search option. Each data resource has its own page, providing a description, background information, digital object identifier (where available) and links from which the dataset can be downloaded or ordered. Datasets are available under the Open Government Licence where possible, and licences for each dataset are provided.
Amazonian stream dataset – example of available dataset
The deposit process can be viewed here.
Stability of Altered Forest Ecosystems (SAFE) project
Project website www.safeproject.net
- SAFE data policy
- SAFE metadata protocols
- SAFE open datasets (e.g. Above-ground microclimate, SAFE experimental layout, sampling stations, LiDAR flight locations, stream networks)
- SAFE project Wiki pages – shared knowledge base for researchers working at SAFE
- Earthcape – platform for SAFE datasets
- ForestPlots – hosts information about over 2000 forest plots in 31 countries and networks including ECOFOR, AfriTRON, RAINFOR
- GEM – Global Ecosystems Monitoring
- TRY – a network of vegetation scientists headed by Future Earth and the Max Planck Institute for Biogeochemistry, providing a global archive of curated plant traits.
Research data management resources
As outlined in the HMTF data management plan, researchers in the BALI, ECOFOR and LOMBOK consortia are encouraged to follow the data management guidelines outlined by their institutions (e.g. backup, storage, policies and support). Links to relevant website pages by institution are shown below:
See general information for further advice.
ITBC, University of Malaysia
Mendel University in Brno
The Digital Curation Centre (DCC) website has a wealth of information about all aspects of data management. While it was set up to support UK research institutions, the majority of information is applicable to all research data management.
DataONE (www.dataone.org) is another good source of general guidelines on data management. It lists many resources; its summary sheets on various aspects of data management are particularly useful.
- Data management introduction – Trends in data collection, storage and loss, the importance and benefits of data management, and an introduction to the data life cycle
- Data sharing – Data sharing in the context of the data life cycle, the value of sharing data, concerns about sharing data, and methods and best practices for sharing data
- Data management planning – Benefits of a data management plan (DMP), DMP components, tools for creating a DMP, NSF DMP information, and a sample DMP
- Data entry and manipulation – Best practices for data entry, data entry and data manipulation tools.
- Data quality control and assurance – Types of data errors, best practices for data quality assurance and control to prevent and correct errors.
- Protecting your data – The difference between data protection, backup, archiving and preservation, best practices for backing up and preserving data.
- Metadata – Metadata defined, information included in metadata, selection of metadata standards, the value and utility of metadata.
- Writing quality metadata – Best practices for writing high quality metadata
- Data citation – Data citation defined, benefits of data citation, examples and best practices for data citation
- Data analysis and workflows – Types of data analyses, introduction to reproducibility, provenance, and workflows, informal (conceptual) and formal (executable) workflows
- Legal and policy issues – Legal and policy issues, copyright and licenses, data restrictions and ethical considerations.
re3data (http://www.re3data.org/) – registry of research data repositories