Glossary
A
Anonymization
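De-identification of individuals in the data, i.e. processing data so that the people it relates to can no longer be identified, either directly or indirectly.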
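A minimal Python sketch of two common de-identification steps: removing direct identifiers and generalising quasi-identifiers (values that could identify someone in combination). The field names (`name`, `email`, `age`, `postcode`), the 10-year age bands and the postcode truncation are illustrative assumptions. Steps like these reduce identifiability but do not by themselves guarantee that a dataset is anonymous.

```python
from typing import Any

# Hypothetical field names, used for illustration only.
DIRECT_IDENTIFIERS = {"name", "email"}


def generalise_age(age: int, band: int = 10) -> str:
    """Replace an exact age with an age band, e.g. 37 -> '30-39'."""
    low = (age // band) * band
    return f"{low}-{low + band - 1}"


def deidentify(record: dict[str, Any]) -> dict[str, Any]:
    """Drop direct identifiers and coarsen quasi-identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in out:
        out["age"] = generalise_age(out["age"])
    if "postcode" in out:
        # Keep only the first two characters of the postcode.
        out["postcode"] = str(out["postcode"])[:2] + "**"
    return out


record = {"name": "Jane Doe", "email": "jane@example.org", "age": 37,
          "postcode": "1051", "diagnosis": "asthma"}
print(deidentify(record))
# {'age': '30-39', 'postcode': '10**', 'diagnosis': 'asthma'}
```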
ARGOS
A collaborative tool for writing data management plans, jointly developed by OpenAIRE and EUDAT.
ARK
ARK (Archival Resource Key) is a type of persistent identifier used by museums, archives, libraries, etc.
ARP AROMA
In the ARP Data Repository, the ARP AROMA service can also be used to add metadata to individual files within hosted data packages. Metadata can be added to data packages and to the files within them according to the public metadata schemas stored in the ARP Schema Registry.
ARP Federated Search
In the ARP Data Repository, file-level metadata can also be specified, and this metadata can be searched using the ARP Federated Search. The service searches research data not only in the ARP Data Repository but also in other repositories.
ARP Schema Registry
The ARP Schema Registry is a CEDAR-based metadata schema registry. Users can also create new metadata sets (schemas) tailored to their specific needs.
B
Big Data
Data sets so large or complex that they require specialised data management and analysis methods.
C
CEDAR
CEDAR (Center for Expanded Data Annotation and Retrieval) is a metadata schema registry that helps researchers comply with requirements to describe and archive their data so that others can understand and reuse them.
Creative Commons (CC) licences
CC licences allow authors to retain their copyright while permitting others to use, adapt and distribute the work under conditions that vary from licence to licence. The Creative Commons project offers several such licences free of charge.
CRIS (Current Research Information System)
An information system used to record and manage information on research activities, such as projects, publications and research data.
D
Data access level
The degree of access permitted to data (e.g. open, restricted, closed).
Data access permission
Authorisation rules for access and use of data.
Data authentication
Checking the authenticity and veracity of the data.
Data citation
A standard way of referencing data to ensure traceability and recognition.
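A minimal Python sketch of assembling a data citation from a dataset's metadata. The output loosely follows the pattern recommended by DataCite (Creator (PublicationYear): Title. Version. Publisher. Identifier); the `DatasetMetadata` class, its field names, the publisher and the DOI `10.1234/abcd` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetMetadata:
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str
    version: Optional[str] = None


def format_citation(md: DatasetMetadata) -> str:
    """Build a citation string from dataset metadata."""
    parts = [f"{'; '.join(md.creators)} ({md.year}): {md.title}."]
    if md.version:
        parts.append(f"Version {md.version}.")
    parts.append(f"{md.publisher}.")
    # Cite the DOI in its resolvable URL form.
    parts.append(f"https://doi.org/{md.doi}")
    return " ".join(parts)


md = DatasetMetadata(["Doe, J.", "Roe, R."], 2023, "Survey of soil samples",
                     "Example Repository", "10.1234/abcd", "1.0")
print(format_citation(md))
# Doe, J.; Roe, R. (2023): Survey of soil samples. Version 1.0. Example Repository. https://doi.org/10.1234/abcd
```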
Data cleaning
Removing noise and errors from data to improve the accuracy of analyses.
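A small Python sketch of typical cleaning steps on tabular data: trimming whitespace, normalising missing-value markers, discarding rows whose measurement cannot be parsed, and removing exact duplicates. The column names (`site`, `temp_c`), the sample data and the set of missing-value markers are assumptions for illustration; real cleaning rules depend on the dataset.

```python
import csv
import io

MISSING = {"", "NA", "N/A", "null"}  # assumed missing-value markers

raw = """site,temp_c
 A ,21.5
B,NA
A,21.5
C,abc
D,19.0
"""


def clean(text: str) -> list:
    seen = set()
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Trim stray whitespace around every value.
        row = {k: v.strip() for k, v in row.items()}
        value = row["temp_c"]
        if value in MISSING:
            row["temp_c"] = None  # explicit missing value
        else:
            try:
                row["temp_c"] = float(value)
            except ValueError:
                continue  # drop rows with unparseable measurements
        key = tuple(row.values())
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        rows.append(row)
    return rows


print(clean(raw))
# [{'site': 'A', 'temp_c': 21.5}, {'site': 'B', 'temp_c': None}, {'site': 'D', 'temp_c': 19.0}]
```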
Data documentation
Information describing the meaning, structure and production process of data.
Data format
The way data is structured and encoded for storage (e.g. CSV, XML).
Data integrity
Ensuring the reliability, completeness and consistency of data.
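Integrity is commonly checked with cryptographic checksums: a digest recorded when the data is deposited can later be recomputed to detect any change. A minimal Python sketch using SHA-256 from the standard library; the file name and contents are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Return True if the file still matches the recorded checksum."""
    return sha256_of(path) == expected.lower()


with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "data.csv"
    p.write_bytes(b"id,value\n1,42\n")
    checksum = sha256_of(p)         # record this alongside the file
    ok_before = verify(p, checksum)  # unchanged file: True
    p.write_bytes(b"id,value\n1,43\n")
    ok_after = verify(p, checksum)   # modified file: False
```

Reading in chunks keeps memory use constant, which matters for large data files.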
Data licence
The legal and ethical framework governing the use of data, typically expressed as a licence (e.g. a Creative Commons licence).
Data management
The process of collecting, storing, processing, sharing and protecting data.
Data Management Plan (DMP)
A formal document that specifies how data should be handled in research projects.
Data preservation
Procedures to ensure the long-term readability and interpretability of data and their metadata.
Data privacy
A set of principles and practices for protecting data and keeping it secure. It aims to prevent unauthorised access to sensitive or personal data, for example by complying with data protection regulations such as the GDPR.
Data quality
A set of indicators describing the accuracy, reliability and consistency of data.
Data repository
A datastore that is used to publish and share data packages, usually from different sources.
Data steward
A person who is responsible for maintaining and interpreting data and is familiar with data storage and preservation options and best practices.
Data storage
The methods and technologies used to store, retrieve and manage data. Data storage can be local (e.g. on a hard drive) or cloud-based.
Data Use Agreement (DUA)
A Data Use Agreement (DUA) is a legal agreement that specifies who can use a particular data file, how and under what conditions.
Dataset
A collection of data that can be treated as a single entity from a data management, legal and sharing perspective: practically all elements of the dataset are subject to the same legal constraints, share common metadata and are shared together.
Dataverse
An open-source platform for archiving and sharing research data that allows researchers to store and publish data in a systematic way. The ARP Data Repository is an extended version of the Dataverse data repository.
Derived data
Data that results from processing other data, i.e. data that is not primary data. The term secondary data is used in a similar sense, mainly in statistics.
DMPonline
A platform created by the Digital Curation Centre that allows registered users to create data management plans. Users can write their own customised plans or use ready-made template questions from several universities and research funding organisations.
DMPTool
The Data Management Plan Tool (DMPTool) is an online tool that helps researchers to create a Data Management Plan (DMP).
DOI
DOI (Digital Object Identifier) is a type of persistent identifier that is widely used for scientific publications.
Dublin Core
A generic metadata schema that can describe almost any kind of resource and is therefore very widely used. It is standardised as ISO 15836-1:2017.
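A minimal Python sketch that serialises a Dublin Core record as XML using the standard library. It uses the Dublin Core element namespace (`http://purl.org/dc/elements/1.1/`) inside an `oai_dc:dc` wrapper, the form used for harvesting via OAI-PMH; the wrapper is simplified (no schema location), and the record values are made up.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC)
ET.register_namespace("oai_dc", OAI_DC)

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_record(fields: dict) -> str:
    """Serialise {element: [values]} as an oai_dc XML record."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:  # every element is optional and repeatable
            ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = dublin_core_record({
    "title": ["Soil samples 2023"],
    "creator": ["Doe, J.", "Roe, R."],
    "date": ["2023-05-01"],
    "format": ["text/csv"],
})
print(xml_text)
```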
F
FAIR principles
Principles for managing research data so that it is Findable, Accessible, Interoperable and Reusable, in order to ensure the availability and reproducibility of research results.
G
GDPR
The EU General Data Protection Regulation (Regulation (EU) 2016/679), which governs the processing of personal data relating to natural persons.
H
Handle
The Handle System is a general-purpose persistent identifier system developed by CNRI and now administered by the DONA Foundation. DOIs are also implemented as handles. A persistent identifier registered in the Handle System is called a handle.
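A handle has the form `prefix/suffix`, where the prefix identifies the naming authority and the suffix is a local name that may itself contain slashes; handles can be resolved through the global proxy at `https://hdl.handle.net/`. DOIs are handles whose prefix starts with `10.`. A minimal Python sketch of parsing a handle; the non-DOI handle `20.500.12345/abc/def` is hypothetical.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    prefix: str  # naming authority
    suffix: str  # local name, may contain "/"

    @property
    def is_doi(self) -> bool:
        # DOI prefixes start with "10."
        return self.prefix.startswith("10.")

    def resolver_url(self) -> str:
        return f"https://hdl.handle.net/{self.prefix}/{self.suffix}"


def parse_handle(text: str) -> Handle:
    """Split a handle at the first slash into prefix and suffix."""
    prefix, sep, suffix = text.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle: {text!r}")
    return Handle(prefix, suffix)


print(parse_handle("10.1000/182").resolver_url())
# https://hdl.handle.net/10.1000/182
```

Splitting only at the first slash matters because the suffix is opaque and may legitimately contain further slashes.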
M
Metadata
Data that describes other data. The most common metadata elements are the name, creator and creation date of the data. Other metadata elements are usually grouped and defined in schemas; one of the best known is Dublin Core. The set of metadata describing an item is called a metadata record, and its structure is governed by a metadata schema.
Metadata schema
Standardizing metadata is key to its use. A metadata schema describes the structure of the metadata, what elements can be used in a metadata record, what values they can have, and how it is recommended to fill or publish the metadata.
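A minimal Python sketch of checking a metadata record against a schema: which elements are allowed, which are required, and which values some of them may take. The schema below (`title`, `creator`, `date`, `language` and its allowed codes) is hypothetical; real schemas are usually expressed in a dedicated format rather than a Python dictionary.

```python
from typing import Any, Dict, List

# A hypothetical, minimal schema.
SCHEMA: Dict[str, Dict[str, Any]] = {
    "title":    {"required": True},
    "creator":  {"required": True},
    "date":     {"required": True},
    "language": {"required": False, "allowed": {"en", "hu", "de"}},
}


def validate(record: Dict[str, Any], schema: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means the record is valid."""
    errors = []
    for name, rule in schema.items():
        if rule.get("required") and not record.get(name):
            errors.append(f"missing required element: {name}")
    for name, value in record.items():
        rule = schema.get(name)
        if rule is None:
            errors.append(f"unknown element: {name}")
        elif "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"invalid value for {name}: {value!r}")
    return errors


record = {"title": "Soil samples 2023", "creator": "Doe, J.",
          "language": "fr", "colour": "red"}
print(validate(record, SCHEMA))
# ['missing required element: date', "invalid value for language: 'fr'", 'unknown element: colour']
```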
O
Ontology
A formal representation of knowledge that describes concepts and the relationships between them. An ontology is built from definitions of classes and properties. For example, an ontology describing scientific publications may contain a journal article class and a book chapter class, and an author property and a page number property. Ontologies are typically based on description logic, a family of formal logics for knowledge representation.
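A toy Python sketch of the publication example above, storing an ontology as (subject, predicate, object) triples and inferring class membership through subclass links. The class and property names and the `Publication` superclass are hypothetical, and the predicates are only loosely modelled on RDF Schema (`subClassOf`, `domain`); treating `domain` as "properties that apply to a class" is a simplification, and real ontologies would use RDF/OWL tooling and a reasoner.

```python
from typing import Set, Tuple

# Hypothetical triples: (subject, predicate, object).
TRIPLES: Set[Tuple[str, str, str]] = {
    ("JournalArticle", "subClassOf", "Publication"),
    ("BookChapter", "subClassOf", "Publication"),
    ("author", "domain", "Publication"),
    ("pageNumber", "domain", "Publication"),
    ("article1", "type", "JournalArticle"),
}


def superclasses(cls: str) -> Set[str]:
    """The class itself plus all classes reachable via subClassOf."""
    result, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in TRIPLES:
            if s == current and p == "subClassOf" and o not in result:
                result.add(o)
                frontier.append(o)
    return result


def types_of(individual: str) -> Set[str]:
    """Direct types of an individual plus their inferred superclasses."""
    direct = {o for s, p, o in TRIPLES if s == individual and p == "type"}
    return set().union(*(superclasses(c) for c in direct))


def applicable_properties(individual: str) -> Set[str]:
    """Properties whose domain includes one of the individual's classes."""
    classes = types_of(individual)
    return {s for s, p, o in TRIPLES if p == "domain" and o in classes}


print(sorted(applicable_properties("article1")))
# ['author', 'pageNumber']
```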
Open access
Open access means free and unrestricted access to publications and data, including reading, downloading and reuse, with appropriate attribution. A common distinction is made between gold and green open access: in gold open access the publisher provides open access, while in green open access the institution or researcher does, typically by depositing a version in a repository.
Open Data
Data that is freely accessible, reusable and shareable, usually under appropriate licences (e.g. Creative Commons). Used in science to promote transparency and reproducibility.
Open science, Open research
Open research is a set of principles and practices that aim to make the outputs of research freely accessible and usable, thereby maximising their potential public benefit. It is also characterised by collaboration, transparency and reproducibility.
P
Persistent identifier
Used for the long-term identification of digital objects. A PID must uniquely identify a digital object globally, and it must also be resolvable (actionable) to a URL. DOI, ORCID and ISBN are well-known examples of PID services.
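Two small Python sketches of working with PIDs. ORCID iDs end in a check digit computed with the ISO 7064 MOD 11-2 algorithm (the value 10 is written as `X`), so simple typing errors can be detected locally; DOIs are made actionable by prefixing them with the `https://doi.org/` resolver. `0000-0002-1825-0097` is the example iD used in ORCID's own documentation; the function names are illustrative.

```python
import re

ORCID_RE = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check digit over the first 15 digits."""
    total = 0
    for d in base_digits:
        total = (total + int(d)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and the check digit of an ORCID iD."""
    if not ORCID_RE.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:-1]) == digits[-1]


def doi_url(doi: str) -> str:
    """Express a DOI in its resolvable form via the doi.org proxy."""
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return f"https://doi.org/{doi}"


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
print(doi_url("10.1000/182"))                 # https://doi.org/10.1000/182
```

A valid check digit only shows the identifier is well formed; whether it is actually registered can only be confirmed by resolving it.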
Personal data
Any information relating to an identified or identifiable natural person (the data subject), including any inferences that can be drawn from the data about that person.
Primary data (raw data)
Unprocessed data, in the form and with the content they had when they were created.
Provenance
Traditionally, a chronological list of the owners and storage locations of a historical object. In modern research data management, it describes in as much detail as possible the stages in the life cycle of the data, i.e. how and from what other data it was created, by whom and what transformations were made to it.
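A minimal Python sketch of recording provenance and tracing a dataset's lineage back to its sources. The `ProvenanceRecord` structure is hypothetical; its fields are only loosely inspired by W3C PROV terms (`prov:wasDerivedFrom`, `prov:wasGeneratedBy`, `prov:wasAttributedTo`), and the file names are made up.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProvenanceRecord:
    entity: str                                             # the data item described
    derived_from: List[str] = field(default_factory=list)   # cf. prov:wasDerivedFrom
    activity: str = ""                                       # transformation, cf. prov:wasGeneratedBy
    agent: str = ""                                          # responsible person, cf. prov:wasAttributedTo


def lineage(entity: str, records: Dict[str, ProvenanceRecord]) -> List[str]:
    """Return all upstream entities of `entity`, nearest first (breadth-first)."""
    order: List[str] = []
    queue = deque([entity])
    seen = {entity}
    while queue:
        current = queue.popleft()
        rec = records.get(current)
        if rec is None:
            continue  # no record: treated as primary (raw) data
        for parent in rec.derived_from:
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


records = {r.entity: r for r in [
    ProvenanceRecord("cleaned.csv", ["raw.csv"], "data cleaning", "Doe, J."),
    ProvenanceRecord("summary.csv", ["cleaned.csv", "stations.csv"], "aggregation", "Roe, R."),
]}
print(lineage("summary.csv", records))
# ['cleaned.csv', 'stations.csv', 'raw.csv']
```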
R
Registry
A list of digital objects of a given type, each uniquely identified within the registry. Its main purpose is to make the objects locatable rather than to store their complete content (unlike a repository). Typical examples are domain name registries, protocol registries and metadata schema registries.
Research data
Research data is any information collected or generated to support research findings. Research data can be created by researchers, generated by tools, collected or observed. It does not exist only in digital form; it also includes paper notes and research journals.
Research data management
Research data management covers the creation, storage, access, sharing, reuse and preservation of research data. How these activities will be carried out in a project is typically set out in a data management plan.
Research Object
A Research Object provides a way to aggregate, package and identify research data on the Internet. It involves describing the data (metadata), documenting the content of the data files and the relationships between them.
RO-Crate
A packaging format for research data, based on JSON-LD and schema.org, in which the whole package and each file it contains can be richly described with metadata. This supports FAIR compliance and the long-term preservation of research data.
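An RO-Crate is described by a JSON-LD file named `ro-crate-metadata.json` at the root of the package. A minimal Python sketch that builds such a file following the structure of RO-Crate 1.1: a metadata descriptor, a root `Dataset` with the identifier `./`, and one entity per file. The dataset name, date, licence and file list are illustrative, and a production crate would usually carry more properties (authors, keywords and so on).

```python
import json
from typing import Dict, List


def make_ro_crate_metadata(name: str, description: str, date_published: str,
                           license_url: str, files: List[Dict[str, str]]) -> dict:
    """Build the content of ro-crate-metadata.json (RO-Crate 1.1 layout)."""
    graph = [
        {   # metadata descriptor: says what this JSON file is about
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # root data entity: the package as a whole
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": license_url},
            "hasPart": [{"@id": f["path"]} for f in files],
        },
    ]
    for f in files:  # one data entity per file
        graph.append({"@id": f["path"], "@type": "File",
                      "name": f["name"], "encodingFormat": f["format"]})
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


crate = make_ro_crate_metadata(
    "Soil samples 2023", "Measurements of soil samples.", "2023-05-01",
    "https://creativecommons.org/licenses/by/4.0/",
    [{"path": "data.csv", "name": "Measurements", "format": "text/csv"}],
)
print(json.dumps(crate, indent=2))
```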
V
Version management
Tracking and documenting changes to data files over time, so that each version can be identified and, if needed, retrieved.
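Version management is normally handled by dedicated tools (e.g. Git, or a repository's built-in dataset versioning). As a minimal illustration, Python's standard `difflib` module can show what changed between two versions of a text-based data file; the file names and contents below are made up.

```python
import difflib

# Two versions of a small CSV file, as lists of lines.
v1 = ["id,value\n", "1,42\n", "2,17\n"]
v2 = ["id,value\n", "1,42\n", "2,18\n", "3,5\n"]

# A unified diff lists removed lines with "-" and added lines with "+".
diff = list(difflib.unified_diff(v1, v2, fromfile="data_v1.csv", tofile="data_v2.csv"))
print("".join(diff))
```

Recording such a diff together with a short change note for each new version makes it easier to document what changed and why.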
W
Wikidata
A free and open knowledge base containing structured data on a wide range of topics. Wikidata is operated by the Wikimedia Foundation and is intended to serve as a central data source for Wikipedia and other projects.
Wikidata Identifier
A unique identifier (a Q-number, e.g. Q42) that Wikidata assigns to each item. It allows both machines and humans to identify an entity unambiguously.
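A minimal Python sketch that checks the form of a Wikidata item identifier and derives the standard URLs for it: the human-readable item page, the entity (concept) URI used in linked data, and the JSON data export. The function name is illustrative; it only builds URLs and does not check that the item exists.

```python
import re
from typing import Dict

QID_RE = re.compile(r"^Q[1-9][0-9]*$")


def wikidata_urls(qid: str) -> Dict[str, str]:
    """Return the standard URLs for a Wikidata item identifier."""
    if not QID_RE.match(qid):
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    return {
        "page": f"https://www.wikidata.org/wiki/{qid}",
        "entity": f"http://www.wikidata.org/entity/{qid}",
        "json": f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json",
    }


print(wikidata_urls("Q42")["page"])  # https://www.wikidata.org/wiki/Q42
```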