The Gateway does not hold a copy of Data Custodian data. Instead, it stores summary information that describes each of the datasets the Data Custodian holds. This summary information is commonly known as metadata and includes details such as where the dataset has come from, a description of the dataset, the time period it covers, and the geographical areas it covers.
This metadata helps a researcher understand whether a dataset will be of use to them without having to see the data itself. The more metadata that is provided about each dataset, and the more accurate it is, the easier it is for the researcher or innovator to decide whether the dataset is helpful for their work. The metadata also allows them to check that they are eligible for access before making a request to the Data Custodian, which helps to reduce the number of invalid requests that custodians have to handle.
Below this summary banner are a number of tabs providing additional information about the dataset. Each tab is described in turn.
The About tab provides a high-level overview of the dataset, broken down as follows:
- Details: Latest release date; publishing frequency; and resource creator
- Coverage: Time period covered; any time lag in the dataset; geographical coverage; typical age range of participants in the dataset; availability of physical samples in the dataset; any follow-up period for patients in the dataset; and any pathway represented in the dataset
- Formats and standards: A list of semantic annotations used within the dataset; any standardised data models used; language of the dataset; and format of the dataset, e.g. CSV
- Provenance: Purpose for which the dataset was collected; and whether derived datasets are available and, if so, the type of derivation used
- Data access request: Jurisdiction; data controller; and data processor, if relevant
- Related resources: Any keystone paper associated with the dataset; any active projects using the dataset; and any tools or models that have been created for the dataset
- Linked datasets: Any datasets linked to this dataset
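
For readers who think in terms of data structures, the About tab sections listed above can be pictured as groups of fields on a single metadata record. The following Python sketch is purely illustrative: the field names, the `ABOUT_TAB_SECTIONS` mapping, the `missing_fields` helper and the example values are all assumptions made for this example and do not represent the Gateway's actual schema or API. It shows how a record might be checked, section by section, for metadata that has not yet been provided.

```python
# Hypothetical field names grouped by About tab section.
# These are illustrative only and are NOT the Gateway's real schema.
ABOUT_TAB_SECTIONS = {
    "Details": ["latest_release_date", "publishing_frequency", "resource_creator"],
    "Coverage": [
        "time_period", "time_lag", "geographical_coverage", "typical_age_range",
        "physical_samples", "follow_up_period", "pathway",
    ],
    "Formats and standards": ["semantic_annotations", "data_models", "language", "format"],
    "Provenance": ["purpose", "derived_datasets"],
    "Data access request": ["jurisdiction", "data_controller", "data_processor"],
    "Related resources": ["keystone_paper", "active_projects", "tools_and_models"],
    "Linked datasets": ["linked_datasets"],
}


def missing_fields(record: dict) -> dict:
    """Return, per section, the fields that are absent or empty in a record."""
    gaps = {}
    for section, names in ABOUT_TAB_SECTIONS.items():
        absent = [name for name in names if record.get(name) in (None, "", [])]
        if absent:
            gaps[section] = absent
    return gaps


# A made-up, partially completed record used only to exercise the helper.
example = {
    "latest_release_date": "2024-01-31",
    "publishing_frequency": "Quarterly",
    "resource_creator": "Example Custodian",
    "language": "en",
    "format": "csv",
}

for section, absent in missing_fields(example).items():
    print(f"{section}: missing {', '.join(absent)}")
```

A check of this kind reflects the point made earlier: the more complete the metadata, the easier it is for a researcher to judge whether a dataset suits their work.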