Research Data

Guidelines for Handling Research Data at TU Hamburg

What is research data?

Almost every research process, in any field, produces data. Data arise from experiments, measurements, surveys, and interviews, as well as from digitization or source research. Primary or raw data can therefore vary widely in content: measurement results and analysis data with their associated programs, pictures and drawings, or empirical data.


Why should research data be published?

The storage and provision of data of research projects in addition to the publication of results is becoming increasingly important:

  • Research results become transparent and verifiable in conjunction with the data used.
  • Reuse of data is possible. Thus, duplication of effort is avoided and instead new research ideas may emerge.
  • Securing and storing primary data is in accordance with good scientific practice. A corresponding recommendation is given in the Guidelines for Safeguarding Good Research Practice by Deutsche Forschungsgemeinschaft (DFG).

Many project funders also require a data management plan and the publication of research data.

Data management planning

If your project involves data, data management planning should be one of its early tasks. These considerations are usually recorded in a data management plan (DMP). Depending on the project, a DMP can range from a few lines to a very comprehensive document. The following questions should be considered:

  • Are there any existing data that can be reused?
  • What kind of data is generated in the project?
  • How will data be organized?
  • Are there any administrative or legal aspects to consider?
  • How is the data published and made available for reuse by the community?
  • How are responsibilities defined?
  • What costs can be expected?

In Germany there are no set procedures for the preparation of a DMP yet. Two online tools have become well established internationally:

  • DMPonline:
    Provided by the UK Digital Curation Centre (DCC), with a strong focus on British requirements. It is also suitable for Horizon 2020.
  • DMPTool:
    Provided by the University of California Curation Center, with a strong focus on the US requirements of NSF and NIH.

Where and how can I publish research data?

Research data should be stored in a recognized repository or archiving system, provided with a persistent identifier (DOI) and metadata in accordance with the FAIR principles (“Findable, Accessible, Interoperable, Reusable”), and, if possible, made publicly available.

The FAIR Data Principles: Findable, Accessible, Interoperable, Reusable
SangyaPundir, CC BY-SA 4.0, via Wikimedia Commons

Research data is best published in a domain-specific data repository. If no suitable repository is available, you can publish to the TUHH Open Research (TORE) research data collection. For citable software, see GitHub and Zenodo.

1. Domain-specific repositories

Depending on your research area and community, requirements for searchability and accessibility can vary greatly. re3data.org, the Registry of Research Data Repositories, can help you choose the right home for your data.

Criteria for the selection of a trustworthy repository

Trustworthy repositories should meet the following minimum criteria:

  1. Provision of Persistent and Unique Identifiers (PIDs)
    1. Allow data discovery and identification
    2. Enable searching, citing, and retrieval of data
    3. Provide support for data versioning
  2. Metadata
    1. Enable finding of data
    2. Enable referencing to related relevant information, such as other data and publications
    3. Provide information that is publicly available and maintained, even for non-published, protected, retracted, or deleted data
    4. Use metadata standards that are broadly accepted (by the scientific community)
    5. Ensure that metadata are machine-retrievable
  3. Data access and usage licences
    1. Enable access to data under well-specified conditions
    2. Ensure data authenticity and integrity
    3. Enable retrieval of data
    4. Provide information about licensing and permissions (ideally in machine-readable form)
    5. Ensure confidentiality and respect rights of data subjects and creators
  4. Preservation
    1. Ensure persistence of metadata and data
    2. Be transparent about mission, scope, preservation policies, and plans (including governance, financial sustainability, retention period, and continuity plan)

For guidance, see Science Europe: Practical Guide to the International Alignment of Research Data Management (Extended Edition). January 2021, p. 26 ff.

2. Hamburg University of Technology: TORE

At TUHH, research data produced at the university can be published in TUHH Open Research (TORE).

TUHH Open Research (TORE) https://tore.tuhh.de is operated as an institutional repository for research data at the TU Hamburg in conformity with the FAIR data principles. Long-term archiving takes place on an S3 storage of the regional computing center of the University of Hamburg. TORE supports DataCite as a metadata schema. DOIs for a dataset, ORCID iDs for persons and ROR for institutions are used as persistent identifiers.
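To give a feel for what such descriptive metadata looks like, here is a minimal sketch of a DataCite-style record for a dataset cited on this page. The field names are loosely modelled on the mandatory DataCite properties (identifier, creator, title, publisher, publication year, resource type). The flat dictionary layout, the `REQUIRED_FIELDS` tuple, the `missing_fields` helper, and the resource type value are illustrative assumptions, not the format TORE uses internally.

```python
# Hypothetical, simplified record; field names loosely follow the
# mandatory properties of the DataCite Metadata Schema.
REQUIRED_FIELDS = (
    "doi",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceTypeGeneral",
)

record = {
    "doi": "10.15480/336.2396",
    "creators": [{"name": "Aberle, Christoph"}],
    "titles": [
        {"title": "Mobility as a Service: ein Angebot auch für "
                  "Einkommensarme? (Geo-Datensatz)"}
    ],
    "publisher": "TUHH Universitätsbibliothek",
    "publicationYear": 2019,
    "resourceTypeGeneral": "Dataset",  # assumption: the record is a dataset
}


def missing_fields(rec: dict) -> list:
    """Return the required fields that are absent or empty in rec."""
    return [field for field in REQUIRED_FIELDS if not rec.get(field)]
```

Calling `missing_fields(record)` before submission is a quick way to spot incomplete descriptions. ORCID iDs and ROR identifiers would be attached to the creator and affiliation entries; they are omitted here rather than guessed.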


TORE is included in the Registry of Research Data Repositories: re3data.org: TUHH Open Research – Research Data TUHH; editing status 2020-05-14; re3data.org – Registry of Research Data Repositories. https://doi.org/10.17616/R31NJML0 last accessed: 2022-05-17


FAQs

Before publication:

Files up to 5 GB in size can be uploaded via the TORE website. Larger files can be uploaded to TORE via a WebDAV directory. Please contact us if you want to upload files larger than 5 GB.
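If a dataset consists of many files, it can help to check in advance which ones exceed the website limit. The following sketch sorts the files of a local folder into the two upload routes. It assumes that "5 GB" means 5 × 10⁹ bytes (TORE may count differently), and the names `upload_route` and `plan_uploads` are made up for this example.

```python
from pathlib import Path

# Assumption: the limit is 5 decimal gigabytes; adjust if TORE counts in GiB.
WEB_UPLOAD_LIMIT_BYTES = 5 * 1000**3


def upload_route(size_bytes: int) -> str:
    """Return 'web' for files within the website limit, else 'webdav'."""
    return "web" if size_bytes <= WEB_UPLOAD_LIMIT_BYTES else "webdav"


def plan_uploads(directory: str) -> dict:
    """Map each regular file in directory to its suggested upload route."""
    return {
        path.name: upload_route(path.stat().st_size)
        for path in sorted(Path(directory).iterdir())
        if path.is_file()
    }
```

For example, `plan_uploads("my_dataset")` returns a dictionary such as `{"readme.txt": "web", ...}` for a hypothetical folder `my_dataset`. Any file marked `"webdav"` is one to mention when contacting the library.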


I do not want to publish the data until my paper is accepted.

The Digital Object Identifier (DOI) is displayed in TORE after you enter the descriptive data and before you upload your data. You can use it to reference your dataset and include it in the data documentation.

Important: The DOI will not be registered until your dataset is published. Until then, your dataset cannot be displayed on the web.


Depending on the software used and the further goals of a research project, some file formats are more suitable than others. In the scientific field, particular attention should be paid to compatibility, suitability for long-term archiving and lossless conversion to alternative formats.

Not all file formats are archivable to the same extent in the medium or long term. In particular, proprietary formats whose usability and readability depend on specific software manufacturers or platforms are not suitable for archiving and should therefore be converted into independent formats that can be read over the long term.

forschungsdaten.info: Overview of frequently used data formats (German)


Access protection:

An embargo can be set at file level, individually for each uploaded file, via its Access Settings. Once the embargo expires, the file automatically becomes publicly accessible.
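The embargo rule can be expressed as a simple date comparison. The sketch below assumes that the embargo date is the first day on which the file is public. The source does not state whether that day is included, so treat the boundary as an assumption. The function name is hypothetical.

```python
from datetime import date
from typing import Optional


def is_publicly_accessible(embargo_until: Optional[date], today: date) -> bool:
    """Return True if a file is public on the given day.

    A file without an embargo date is always public. Assumption: the
    embargo date itself is the first day of public access.
    """
    if embargo_until is None:
        return True
    return today >= embargo_until
```

With an embargo until October 1, 2022, the function returns `False` for September 30, 2022 and `True` from October 1, 2022 onward.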

File with embargo displayed until October 1, 2022. Below this, Request copy is displayed.


In principle, yes. “Request copy” generates a request that offers to send the file by e-mail. Depending on the file size, however, sending by e-mail may not be possible. In that case, access can be set up for selected persons:

  1. All persons authorized to access the dataset must have an account with TORE.
  2. The owner of the data set notifies forschungsdaten@tuhh.de of the data set and the persons authorized to access it.
  3. As soon as access is set up, feedback is sent to all parties involved.

After publication:

Please use the feedback option next to your data set on the right side. We will correct the error.


Please use the feedback option next to your data set on the right side. We will link your research data to your paper.


How to create a new version:

  1. Log into TORE
  2. Open your record
  3. Click the button “Create Version of this item” on the right side
    • Change the descriptive data if necessary
    • Make a note of the new, versioned DOI and incorporate it into your documentation
    • Delete the obsolete files and upload the new versions
    • Add a brief description of the change
    • Publish

The record is then released after a formal check by the library. The versions are automatically linked to each other.

Example of a versioned record on TORE:

Sardhara, T., Aydin, R. C., Li, Y., Piché, N., Gauvin, R., Cyron, C. J., & Ritter, M. (2021). Training deep neural networks to reconstruct nanoporous structures from FIB tomography images using synthetic training data. TUHH Universitätsbibliothek. https://doi.org/10.15480/336.3932


3. Software on Zenodo and GitHub

Make your code citable

Software development is often an important part of scientific work at TUHH. To make your repositories easier to reference in the scientific literature, you can assign them persistent identifiers in the form of Digital Object Identifiers (DOIs). You can use the data archiving tool Zenodo to archive a repository on GitHub.com and assign a DOI to the archive.

  • Develop on GitLab (internal)
  • Put on GitHub (public)
  • Archive on Zenodo (CERN-hosted)
    → obtain DOI
    → reference DOI in your paper
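Once a DOI exists, one common way to tell others how to cite the code is a `CITATION.cff` file in the repository root, which GitHub displays as a "Cite this repository" entry. This convention is not mentioned in the source and is offered only as one option. The sketch below renders such a file with plain string formatting. The helper `render_citation_cff`, the project title, the author, and the DOI `10.5281/zenodo.0000000` are placeholders, not real records.

```python
def render_citation_cff(title, authors, doi, version, date_released):
    """Render a minimal CITATION.cff (Citation File Format 1.2.0) as text.

    authors is a list of (family_name, given_name) tuples. Values are
    inserted as-is, so they must not contain double quotes.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        "authors:",
    ]
    for family, given in authors:
        lines.append(f'  - family-names: "{family}"')
        lines.append(f'    given-names: "{given}"')
    lines.append(f'doi: "{doi}"')
    lines.append(f'version: "{version}"')
    lines.append(f"date-released: {date_released}")  # format: YYYY-MM-DD
    return "\n".join(lines) + "\n"
```

A call such as `render_citation_cff("my-analysis-code", [("Doe", "Jane")], "10.5281/zenodo.0000000", "1.0.0", "2022-05-17")` returns the file content, which can then be saved as `CITATION.cff` and committed alongside the code.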

Digital Object Identifiers (DOIs) have become the de facto standard for referencing electronic publications, so using DOIs for software as well is an obvious step. This has been possible since 2014 thanks to the cooperation between GitHub and Zenodo. Zenodo is an open platform for the permanent archiving of research results of almost any kind, operated by CERN and funded by the EU, among others: zenodo.org/features

How can I make research data citable?

If research data are to complement research results, they must also be reliably citable. Many repositories use a registration agency to list their content in DataCite. Each data set is assigned a unique DOI there, which makes it permanently citable and accessible.

Aberle, Christoph (2019). Mobility as a Service: ein Angebot auch für Einkommensarme? (Geo-Datensatz). TUHH Universitätsbibliothek. https://doi.org/10.15480/336.2396
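A registered DOI can also be turned into a formatted citation automatically, because doi.org supports content negotiation for DOIs registered with DataCite. The sketch below builds such a request using only the Python standard library. The function names are made up for this example, and the citation style `apa` is just one possible choice.

```python
import urllib.request


def doi_url(doi: str) -> str:
    """Return the resolver URL for a DOI given with or without the prefix."""
    return "https://doi.org/" + doi.strip().removeprefix("https://doi.org/")


def citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build a content-negotiation request asking for a formatted citation."""
    return urllib.request.Request(
        doi_url(doi),
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )


def fetch_citation(doi: str, style: str = "apa") -> str:
    """Resolve the DOI over the network and return the citation text."""
    with urllib.request.urlopen(citation_request(doi, style), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_citation("10.15480/336.2396")` requires network access and should return a citation for the data set above. The exact wording depends on the metadata registered for the DOI.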

Any publication in TUHH Open Research or DataCite that you have authored can also be added to your ORCID profile.

If you have questions