Data

Data management is a large topic and there are many excellent resources available on the internet. This page provides links ranging from information specific to the HMTF programme to general guidance on best practice in research data management.

NERC requires the programme to document its datasets (metadata) to a standard that would allow a future researcher to understand or potentially reproduce each dataset. Comprehensive documentation of datasets is good scientific research practice and will ensure that datasets archived to NERC data centres (or other appropriate repositories, e.g. ForestPlots) can contribute to future research. Part of the documentation process is also ensuring that anyone re-using an archived dataset correctly references the original researchers.

If you have any queries about specific or general data management issues, please contact the HMTF Data Manager (finella.blair@ouce.ox.ac.uk).

 

Metadata – describing your datasets

Metadata is data that describes other data. The Environmental Information Data Centre (EIDC) and NERC require metadata that conforms to the UK GEMINI standard for spatial data.

NERC requires discovery metadata – the essential information that enables the potential user of data to find out if a particular resource exists, its location, ownership and whether it meets their requirements.

 

HMTF data management resources

All researchers have been sent an Excel spreadsheet template for recording basic details of the datasets they will be creating, e.g. name, description, file type, likely final size and the date the dataset is likely to be complete. Please fill it in and return it to the data manager (finella.blair@ouce.ox.ac.uk).

Basic dataset information form

Example dataset

 

NERC metadata guidelines

General good practice guidance

Searches on the metadata portals and search engines to which the metadata is exposed can return a large number of results. The metadata should therefore be clear and comprehensible enough for the reader to understand the nature of the entry and to assess whether it is suitable for reuse. Poor quality metadata can mean that a resource is effectively hidden from users and remains unused.

When writing good quality metadata, always keep in mind the ABCD of good discovery metadata. Metadata should be:

Accurate – correctly and precisely describes the resource in question

Beneficial – contains information that is useful to the end user without lots of extraneous, irrelevant information

Clear – is unambiguous and easily understandable by a non-technical user

Distinctive – contains information that allows it to be distinguished from other, potentially similar, resources

The following metadata fields will be needed. Guidance on each field is given here and should be read in conjunction with this document.

Title:

Abstract:

Lineage:

Spatial extent: (bounding box)

Temporal extent: (dates from / to)

Spatial reference system: e.g. British National Grid, WGS 84

Spatial representation type: e.g. raster, vector

Spatial resolution: for gridded data, this is the distance on the ground (in metres) represented by each pixel. For point data, it is the degree of confidence in the point’s location, e.g. for a point expressed as a six-figure grid reference, SN666781, the resolution would be 100 m.

Keywords:

Author name: (Bloggs, J.J.)

Author email:

Author organisation:
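
As a minimal sketch of how the fields above might be checked before a record is sent to the data manager, the Python below lists the fields as dictionary keys, reports any that are missing or empty, and derives the resolution implied by a British National Grid reference such as SN666781. The key names, both helper functions and the example values (other than the grid reference and the author name format) are illustrative assumptions; they are not part of the NERC or EIDC guidance.

```python
import re

# Field names follow the list above; the exact key spellings are assumptions.
REQUIRED_FIELDS = [
    "title", "abstract", "lineage", "spatial_extent", "temporal_extent",
    "spatial_reference_system", "spatial_representation_type",
    "spatial_resolution_m", "keywords", "author_name", "author_email",
    "author_organisation",
]


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "", [], {})]


def grid_reference_resolution_m(grid_ref):
    """Resolution in metres implied by a British National Grid reference.

    Two letters identify a 100 km square; every extra digit per axis
    divides the square size by ten, so a six-figure reference gives 100 m.
    """
    match = re.fullmatch(r"[A-Za-z]{2}((?:\d\d){1,5})", grid_ref.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a recognised grid reference: {grid_ref!r}")
    digits_per_axis = len(match.group(1)) // 2
    return 100_000 // 10 ** digits_per_axis


# Hypothetical record: every value here is a placeholder for illustration.
example_record = {
    "title": "Example stream chemistry dataset",
    "abstract": "Placeholder abstract describing what was measured and why.",
    "lineage": "Placeholder description of how the data were produced.",
    "spatial_extent": {"west": -4.0, "east": -3.9, "south": 52.3, "north": 52.4},
    "temporal_extent": {"from": "2014-01-01", "to": "2014-12-31"},
    "spatial_reference_system": "British National Grid",
    "spatial_representation_type": "vector",
    "spatial_resolution_m": grid_reference_resolution_m("SN666781"),
    "keywords": ["streams", "water chemistry"],
    "author_name": "Bloggs, J.J.",
    "author_email": "j.bloggs@example.org",
    "author_organisation": "Example University",
}

print(missing_fields(example_record))           # []
print(example_record["spatial_resolution_m"])   # 100
```

A check like this only confirms that each field has been filled in; whether the content is accurate, beneficial, clear and distinctive still needs a human reader.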

Where appropriate, all datasets should also have detailed metadata on aspects such as experimental design, sampling, fieldwork or laboratory instrumentation and analytical methods, i.e. any information that a researcher not involved in the project would need to understand and/or re-use the dataset. Further guidance is available.

 

HMTF programme – specific links

NERC website  

NERC – HMTF programme information

Award details

NERC Data policy document

Guidance notes for Data policy

NERC data policy information page

NERC Data Catalogue service – Searchable catalogue of NERC-funded data held at all NERC data centres.

 

Environmental Information Data Centre (EIDC) 

 

The EIDC is the NERC data centre which will store HMTF datasets of long term value.

EIDC website

EIDC data catalogue: View descriptions of currently available datasets from the EIDC and see what discovery metadata is required for the catalogue when your dataset is uploaded to the EIDC.

Datasets held by the Environmental Information Data Centre can be found by searching or browsing the CEH Data Catalogue using a text search and/or map search option. Each data resource has its own page, providing a description, background information, digital object identifier (where available) and links from which the dataset can be downloaded or ordered. Datasets are available under the Open Government Licence where possible, and licences for each dataset are provided.

 

Amazonian stream dataset – example of available dataset

The deposit process can be viewed here.

 

 

Stability of Altered Forest Ecosystems (SAFE) project

Project website: www.safeproject.net

 

Online databases

 

Research data management resources

Institution guidelines

As outlined in the HMTF data management plan, researchers in the BALI, ECOFOR and LOMBOK consortia are encouraged to follow their own institutions' data management guidelines (e.g. on backup, storage, policies and support). Links to the relevant web pages are listed below by institution:

 

UK

Cranfield University

Queen Mary University of London

University of Aberdeen

University of Bristol

University of Cambridge

University of Edinburgh

Imperial College London

University of Kent

Lancaster University

University of Leeds

University of Liverpool

University of Oxford

University of York

 

Brazil

See general information for further advice.

 

Malaysia

ITBC, Universiti Malaysia Sabah

 

Other

James Cook University

National University of Singapore

Northern Arizona University

Mendel University in Brno

 

Other resources

 

The Digital Curation Centre (DCC) website has a wealth of information about all aspects of data management. Although it was set up to support UK research institutions, most of its information applies to research data management in general.

DataONE (www.dataone.org) is another good source of general guidelines on data management. It lists many resources; its summary sheets on various aspects of data management are particularly useful:

  • Data management introduction – Trends in data collection, storage and loss, the importance and benefits of data management, and an introduction to the data life cycle
  • Data sharing – Data sharing in the context of the data life cycle, the value of sharing data, concerns about sharing data, and methods and best practices for sharing data
  • Data management planning – Benefits of a data management plan (DMP), DMP components, tools for creating a DMP, NSF DMP information, and a sample
  • Data entry and manipulation – Best practices for data entry, data entry and data manipulation tools.
  • Data quality control and assurance – Types of data errors, best practices for data quality assurance and control to prevent and correct errors.
  • Protecting your data – The difference between data protection, backup, archiving and preservation, best practices for backing up and preserving data.
  • Metadata – Metadata defined, information included in metadata, selection of metadata standards, the value and utility of metadata.
  • Writing quality metadata – Best practices for writing high quality metadata
  • Data citation – Data citation defined, benefits of data citation, examples and best practices for data citation
  • Data analysis and workflows – Types of data analyses, introduction to reproducibility, provenance, and workflows, informal (conceptual) and formal (executable) workflows
  • Legal and policy issues – Legal and policy issues, copyright and licenses, data restrictions and ethical considerations.

 

Digital Object Identifiers

 

Repositories

 

http://www.re3data.org/ – registry of research data repositories

http://datadryad.org/

https://figshare.com/