HUN-REN Data Repository Platform

Glossary

Let's edit the glossary together! Please send your suggestions for changes and additions to support@researchdata.hu

A

Accessible

One of the four FAIR principles (Findable, Accessible, Interoperable, Reusable) for managing research data. There are clear rules for access to data and metadata, even if the data are not open (e.g. appropriate authorisation and authentication mechanisms).

Anonymization

Removing or altering identifying information so that individuals can no longer be identified from the data.

ARGOS

A collaborative tool for writing data management plans, jointly developed by OpenAIRE and EUDAT.

ARK

ARK (Archival Resource Key) is a type of persistent identifier used by museums, archives, libraries, etc.

ARP AROMA

In the ARP Data Repository, the ARP AROMA service can be used to add file-specific metadata to files within hosted data packages. Metadata can be added to data packages and to the files within them according to the public metadata schemas stored in the ARP Schema Registry.

ARP Federated Search

In the ARP Data Repository, it is also possible to specify file-level metadata, which can be searched using ARP Federated Search. The service searches research data stored not only in the ARP Data Repository but also in other repositories.

ARP Schema Registry

The ARP Schema Registry is a CEDAR-based metadata schema registry. Users can also create new metadata sets (schemas) according to their specific needs.

B

Big Data

A large amount of data requiring a variety of data management and analysis methods.

C

CEDAR

CEDAR (Center for Expanded Data Annotation and Retrieval) is a metadata schema registry that helps researchers comply with requirements to archive their data so that others can understand and use them.

Creative Commons (CC) licences

CC licences allow authors to keep their copyright while permitting others to process and distribute the work within limits that differ from licence to licence. Creative Commons offers a number of free licences.

CRIS (Current Research Information System)

An information system used to record research projects, publications and research data.

D

Data access level

Controls on access to data (e.g. open, restricted, closed).

Data access permission

Authorisation rules for access and use of data.

Data archiving

Long-term retention and storage of data for future use.

Data authentication

Checking the authenticity and veracity of the data.

Data citation

A standard way of referencing data to ensure traceability and recognition.

Data cleaning

Removing noise and errors from data to increase the accuracy of analyses.

Data documentation

Information describing the meaning, structure and production process of data.

Data format

How the data is stored (e.g. CSV, XML).

Data integrity

Ensuring the reliability, completeness and consistency of data.

Data licence

Legal and ethical framework for the use of data (e.g. CC licences).

Data management

The process of collecting, storing, processing, sharing and protecting data.

Data Management Plan (DMP)

A formal document that specifies how data should be handled in research projects.

Data owner

The person who is legally responsible for a dataset.

Data preservation

Procedures to ensure the long-term readability and interpretability of data and their metadata.

Data privacy

A set of principles and practices related to data protection and security. It aims to prevent unauthorised access to sensitive or personal data, for example by complying with data protection regulations (GDPR).

Data quality

A set of indicators describing the accuracy, reliability and consistency of data.

Data repository

A datastore that is used to publish and share data packages, usually from different sources.

Data sharing

Making data available to other researchers.

Data Sovereignty

Control and jurisdiction over data.

Data steward

A person who is responsible for maintaining and interpreting data and is familiar with data storage and preservation options and best practices.

Data storage

The method and technology used to store, retrieve and manage data. Data storage can be local (e.g. on a hard drive) or cloud-based.

Data Use Agreement (DUA)

A Data Use Agreement (DUA) is a legal agreement that specifies who can use a particular data file, how and under what conditions.

Dataset

A collection of data that can be considered as a single entity from a data management, legal and sharing perspective, meaning that practically all elements of the dataset are subject to the same legal constraints, share common metadata and are shared together.

Dataverse

An open source platform used to archive and share research data. It allows researchers to store and publish data in a systematic way. The ARP data repository is an enhanced version of the Dataverse data repository.

Derived data

Data that is the result of processing other data, i.e. not primary data. The related term "secondary data" is used mainly in statistics.

DMPonline

A platform created by the Digital Curation Centre that allows registered users to create data management plans. You can create your own personalised plan, and you also have the option to use pre-designed template questions from several universities and research funding organisations.

DMPTool

The Data Management Plan Tool (DMPTool) is an online tool that helps researchers to create a Data Management Plan (DMP).

DOI

DOI (Digital Object Identifier) is a type of persistent identifier that is widely used for scientific publications.

Dublin Core

A generic metadata schema that can be used to describe almost any kind of resource and is therefore very widely used. It is standardised as ISO 15836-1:2017.

E

EduID

EduID is an identification service that allows a user to log in to a service hosted by another institution via an identity server hosted by the user's own institution. EduID thus allows for more secure password management and service use. Institutions connect to EduID at the institutional level. The connection can be initiated at Pro-M Zrt.

Embargo

In the scientific and research context, an embargo is a period during which a particular publication or set of data is not available to the public. This is often due to the protection of research data or the requirements of scientific journals.

EOSC

The European Open Science Cloud (EOSC) is an initiative of the European Commission to develop an infrastructure that provides services to its users to promote open scientific practices.

https://eosc.eu/eosc-about/

F

FAIR data

Data that comply with the FAIR principles are called FAIR data.

FAIR principles

Principles for managing research data (Findable, Accessible, Interoperable, Reusable) that help ensure the availability and reproducibility of research results.

File Naming Conventions

Rules and recommendations on how to name files so that they are easy to search, organise and identify. For example, "Year_Month_Day_ProjectName_Version.txt".
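
The pattern above can be generated programmatically so that every file in a project follows it. The following is a minimal sketch, not a prescribed convention: the function name `build_file_name`, the zero-padded `v02` version style and the removal of non-alphanumeric characters from the project name are all illustrative assumptions.

```python
from datetime import date


def build_file_name(day: date, project: str, version: int, ext: str = "txt") -> str:
    """Build a name of the form Year_Month_Day_ProjectName_Version.ext."""
    # Keep only letters and digits in the project name so that "_"
    # remains an unambiguous separator between the name parts.
    safe_project = "".join(ch for ch in project if ch.isalnum())
    # Zero-padded date parts and version numbers sort correctly as plain text.
    return f"{day:%Y_%m_%d}_{safe_project}_v{version:02d}.{ext}"


print(build_file_name(date(2025, 3, 7), "Soil Survey", 2))
# prints: 2025_03_07_SoilSurvey_v02.txt
```

Putting the date first in year, month, day order means that an alphabetical file listing is also a chronological one.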

Findable

One of the four FAIR principles (Findable, Accessible, Interoperable, Reusable) for managing research data. Data and their metadata should be clearly identifiable and searchable (e.g. persistent identifiers, searchable metadata).

G

GDPR

General Data Protection Regulation, which governs the processing of personal data relating to natural persons.

H

Handle

The Handle System is a general persistent identification system operated by CNRI. The Handle System also serves DOIs. The persistent identifier registered by the Handle System is called a handle.

HRDA

The Research Data Alliance Hungarian National Node (HRDA) is the Hungarian member of the Research Data Alliance (RDA), a global organisation supporting research data management. The founding members are the Institute for Computer Science and Control (SZTAKI), the MTA Library and Information Centre (MTA KIK), the HUNgarian Open Repositories (HUNOR) and the Governmental Agency for IT Development (KIFÜ). (KIFÜ ceased to exist on 31 December 2024 and its tasks were taken over by several other organisations.)

I

Interoperable

One of the four FAIR principles (Findable, Accessible, Interoperable, Reusable) for managing research data. Data and metadata are available in standard formats and structures so that they can be used with other systems and data.

L

Linked Data

Data that is linked to other data in order to improve its interpretability and retrievability.

Linked Open Data (LOD)

Linked open data is information that is both freely accessible and linked in a machine-interpretable way. Data can be open but not linked, or linked but not open; the term linked open data applies only when both conditions are met.

M

Metadata

Data describing the data. The most common metadata are the name, creator and creation date of the data. Other possible metadata descriptors are usually grouped and defined in schemas. The best known such schema is Dublin Core. The set of metadata for an item is called a metadata record, the structure of which is governed by metadata schemas.
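
As an illustration, a small metadata record with the three most common descriptors (name, creator, creation date) can be written with Dublin Core elements. This is a minimal sketch using only the Python standard library. The wrapper element `metadata`, the helper name `dublin_core_record` and the sample values are assumptions made for the example; the namespace URI is the one published for the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set (title, creator, date, ...).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialise a flat dict of Dublin Core elements as an XML string."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values, invented for illustration only.
record = dublin_core_record({
    "title": "Soil moisture measurements",
    "creator": "Example Researcher",
    "date": "2025-03-07",
})
print(record)
```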

Metadata schema

Standardizing metadata is key to its use. A metadata schema describes the structure of the metadata, what elements can be used in a metadata record, what values they can have, and how it is recommended to fill or publish the metadata.
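
To make this concrete, the sketch below checks a metadata record against a very small schema that states which elements are allowed, which are required and which values are permitted. The schema layout, the element names and the `validate` function are hypothetical and are not the format used by CEDAR or the ARP Schema Registry.

```python
# A toy schema: element name -> rules. Purely illustrative.
SCHEMA = {
    "title": {"required": True},
    "creator": {"required": True},
    "access_level": {"required": True, "allowed": {"open", "restricted", "closed"}},
    "description": {"required": False},
}


def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    # Every required element must be present.
    for name, rule in schema.items():
        if rule.get("required") and name not in record:
            problems.append(f"missing required element: {name}")
    # Every element must be known, and its value must be permitted.
    for name, value in record.items():
        if name not in schema:
            problems.append(f"unknown element: {name}")
        elif "allowed" in schema[name] and value not in schema[name]["allowed"]:
            problems.append(f"invalid value for {name}: {value}")
    return problems


print(validate({"title": "Survey", "creator": "A. Author", "access_level": "open"}, SCHEMA))
print(validate({"title": "Survey", "access_level": "public"}, SCHEMA))
```

The first call reports no problems; the second reports a missing `creator` and an `access_level` value that the schema does not allow.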

Metadata strategy

A plan that defines how the metadata (descriptive data) related to the data is collected, managed, organised, documented and accessed. It aims to facilitate data retrieval, sharing and reuse.

O

Ontology

A formal representation of a body of knowledge that captures the relationships between concepts. The framework of an ontology consists of definitions of classes and properties. For example, an ontology describing scientific publications may contain the classes journal article and book chapter, and the properties author and page number. An ontology is based on a special form of mathematical logic, description logic.

Open access

Open access means free and unrestricted access to publications and data, including reading, downloading and reuse, with appropriate attribution. It is common to distinguish between gold and green open access, where in the gold case the publisher and in the green case the institution or researcher provides open access.

Open Content Licence

A licence that allows the free use, modification and distribution of content under certain conditions. Examples include Creative Commons (CC) licences and the GNU Free Documentation License (GFDL).

Open Data

Data that is freely accessible, reusable and shareable, usually under appropriate licences (e.g. Creative Commons). Used in science to promote transparency and reproducibility.

Open File Formats

File formats that are publicly documented and free to use in any software. Examples are TXT (plain text file without formatting), ODT (OpenDocument Text), CSV (Comma-Separated Values), PNG (Portable Network Graphics).

Open science, Open research

Open science (also called open research) is a set of principles and practices that aim to make the outputs of research freely accessible and usable, thereby maximising the possibility of public benefit. It is also characterised by collaboration, transparency and reproducibility.

ORCID

Open Researcher and Contributor ID (ORCID), a unique author identifier that helps to clearly identify researchers, manage name variations and track their publications.

P

Persistent identifier

A persistent identifier (PID) is used for long-term identification of digital objects. The function of a PID is to uniquely identify a digital object globally, and it is also required that the identifier be resolvable (actionable) to a URL. DOI, ORCID, ISBN are known examples of PID services.
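
Resolvability can be illustrated by turning an identifier into the URL of its public resolver. The sketch below is a simplified example: the `RESOLVERS` table and the `pid_to_url` function are assumptions made for illustration, and the sample identifiers are used only as examples. The code builds the URL but does not contact the resolver.

```python
from urllib.parse import quote

# Public resolver prefixes for three identifier types.
RESOLVERS = {
    "doi": "https://doi.org/",
    "handle": "https://hdl.handle.net/",
    "orcid": "https://orcid.org/",
}


def pid_to_url(scheme: str, identifier: str) -> str:
    """Return the resolver URL for an identifier of the given type."""
    prefix = RESOLVERS[scheme.lower()]  # raises KeyError for unknown types
    # Percent-encode unusual characters but keep the "/" that separates
    # the prefix and suffix of a DOI or handle.
    return prefix + quote(identifier.strip(), safe="/")


print(pid_to_url("doi", "10.1000/182"))
print(pid_to_url("orcid", "0000-0002-1825-0097"))
```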

Personal data

Personal data are any data that can be associated with a natural person and any inference that can be drawn from the data concerning the data subject.

Primary data (raw data)

Unprocessed data, in the form and with the content they had when they were created.

Provenance

Traditionally, a chronological list of the owners and storage locations of a historical object. In modern research data management, it describes in as much detail as possible the stages in the life cycle of the data, i.e. how and from what other data it was created, by whom and what transformations were made to it.

R

RDA

The Research Data Alliance (RDA) is a global organisation dedicated to promoting the interoperability and sharing of research data. RDA creates opportunities for collaboration between different disciplines, organisations and countries to manage research data more efficiently.

https://www.rd-alliance.org/

Re3data

The Registry of Research Data Repositories (re3data.org) is a global registry, with more than 2,400 detailed data repository records, and is widely acknowledged as a tool for the identification of suitable repositories for research data.

https://www.re3data.org/

Readme file

A text file, usually found in software projects or data sets. It may contain information about how to use the files, the purpose of the project, and installation or run instructions.

Registry

A list of digital objects of some type, uniquely identified within the registry. Its main purpose is to allow the object to be located, rather than to store the complete data content (unlike a repository). Typical examples are: domain name registry, protocol registry, metadata schema registry.

Repository

A file server for archiving and making scientific material available.

Reproducibility

Replicability and independent verifiability of research results.

Research data

Research data is any information generated to support research findings. Research data can be created by authors, generated by tools, collected or observed. Research data do not only exist in digital form, but also include paper notes and journals.

Research data lifecycle

The process of research data management, which includes the collection, storage, processing, sharing, publication and long-term archiving of data.

Research data management

Research data management refers to the creation, storage, access, sharing, reuse, and preservation of research data.

Research Lifecycle

The set of phases of research, from the initial idea through the implementation, publication and exploitation of the research, to the archiving and re-use of the results.

Research Object

A Research Object provides a way to aggregate, package and identify research data on the Internet. It involves describing the data (metadata), documenting the content of the data files and the relationships between them.

Reusable

One of the four FAIR principles (Findable, Accessible, Interoperable, Reusable) for managing research data. The data is well documented and clearly licensed for reuse by other researchers.

RO-Crate

A research data packaging format in which the whole package and the individual files it contains can be richly described with metadata. This supports FAIR compliance and long-term preservation of research data.
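
In practice the package description is a JSON-LD file named `ro-crate-metadata.json` placed in the root of the package. The sketch below builds a minimal description of this kind as understood from version 1.1 of the RO-Crate specification; the function name `minimal_ro_crate` and all sample values are assumptions for illustration, and the specification should be consulted before relying on the exact set of properties.

```python
import json


def minimal_ro_crate(name: str, description: str, published: str,
                     license_url: str, files: list[str]) -> dict:
    """Describe a data package and its files as an RO-Crate style JSON-LD graph."""
    graph = [
        {   # The metadata file describes itself and points at the package root.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # The package as a whole.
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": published,
            "license": {"@id": license_url},
            "hasPart": [{"@id": path} for path in files],
        },
    ]
    # One entry per file, so each file can carry its own metadata.
    graph += [{"@id": path, "@type": "File"} for path in files]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


crate = minimal_ro_crate(
    "Soil moisture measurements",              # sample values, invented
    "Daily soil moisture readings from one test site.",
    "2025-03-07",
    "https://creativecommons.org/licenses/by/4.0/",
    ["data/readings.csv", "README.txt"],
)
print(json.dumps(crate, indent=2))
```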

ROR

The Research Organization Registry (ROR) is an open, community-run database of unique identifiers for research organisations and universities. It helps to clearly identify institutions when linking scientific publications and research data.

https://ror.org/

ROR ID

The Research Organization Registry Identifier (ROR ID) is a unique identifier provided by the Research Organization Registry (ROR) system for research institutions, universities and other scientific organizations.

S

Sensitive data

Data that require special protection (e.g. personal or health data).

Structured data

Structured data is data that has a standardised format for efficient access by software and people. Typically tabular, with rows and columns.

T

Timestamp

Information recording the date and time at which the data were created or modified.

V

Version management

Tracking and documenting changes to data files.

W

Wikidata

A free and open data set containing structured data on various topics. Wikidata is operated by the Wikimedia Foundation and is intended to serve as a central data source for Wikipedia and other projects.

https://www.wikidata.org

Wikidata Identifier

A unique identifier (Q-number, e.g. Q42) that Wikidata assigns to each record. This helps both machines and humans to clearly identify the data.
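
A Q-number can be checked and turned into a link with a few lines of code. This is a minimal sketch: the function name `wikidata_url` is hypothetical, and the pattern assumes that a Q-number is the letter Q followed by a positive integer without leading zeros.

```python
import re

# "Q" followed by a positive integer, e.g. Q42.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def wikidata_url(qid: str) -> str:
    """Return the Wikidata page URL for a Q-number, or raise ValueError."""
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError(f"not a Wikidata Q-number: {qid!r}")
    return f"https://www.wikidata.org/wiki/{qid}"


print(wikidata_url("Q42"))
# prints: https://www.wikidata.org/wiki/Q42
```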