#OAWeek2022: The PDF is not enough: Why science needs open formats

Contribution by Axel Dürkop and Florian Hagen

From 2019 to 2021, the Modern Publishing project pooled many years of experience at the Hamburg University of Technology (TUHH) and the Hamburg State and University Library (SUB) as part of the Hamburg Open Science (HOS) initiative. The goal was to develop a socio-technical system for single source publishing, i.e. for generating different output formats from one source format. The system builds on open source solutions such as GitLab and Open Journal Systems (OJS) and offers an open alternative to commercial and proprietary publishing services for scientific results.

An open system for writing and publishing

A first architectural draft of the project was presented at the Open Access Days 2019. At that time, the tool chain around GitLab and OJS had already been extended to include Markdown, pandoc-scholar, Docker, and Hypothesis. Among other things, this brought collaboration and participation in the writing process into the publication process.

This status of the socio-technical system, i.e. the link between people and technology, is also depicted in the poster contribution to the OA Days 2019.

Fig.: Poster "Offenheit leben: Kollaboratives Schreiben und Publizieren unter Berücksichtigung der Werte von Open Science" ("Living openness: collaborative writing and publishing in keeping with the values of Open Science") https://doi.org/10.5281/zenodo.3267473

However, the work did not end with the Open Access Days 2019. Feedback was gathered and, thanks to the help of many colleagues, the process chain was further developed from different perspectives. The result of this further work was single-source publishing with Swapfire and OJS, which was also presented at various workshops and events.

Fig.: The Swapfire process chain, "Single Source Publishing mit Swapfire und OJS" ("Single source publishing with Swapfire and OJS") https://doi.org/10.15480/882.2902

The spiral-shaped illustration shows the single source publishing workflow for a journal publication. Depending on the target format (for example journal articles, but also teaching and learning scripts or websites), Markdown texts are converted into PDF and HTML files, among others, using static site generators and converters.
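The conversion step can be sketched in a few lines of Python. This is only a minimal illustration, not the Swapfire tool chain itself: it assumes that pandoc is installed, and the function names, the file name article.md and the output directory build are hypothetical. The sketch builds one pandoc call per target format from a single Markdown source.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_commands(source, out_dir, extensions=("html", "pdf")):
    """Return one pandoc command (as an argument list) per target format."""
    source = Path(source)
    out_dir = Path(out_dir)
    commands = []
    for ext in extensions:
        target = out_dir / f"{source.stem}.{ext}"
        # pandoc infers the output format from the target file extension.
        commands.append(
            ["pandoc", str(source), "--standalone", "--output", str(target)]
        )
    return commands


def run_commands(commands):
    """Run the commands; raises if pandoc is missing or a conversion fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    for command in commands:
        # Make sure the output directory exists before pandoc writes to it.
        Path(command[-1]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(command, check=True)
```

Calling run_commands(build_pandoc_commands("article.md", "build")) would write build/article.html and build/article.pdf. PDF output additionally requires a PDF engine such as a LaTeX installation.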

The different possibilities of the process chain were tested with colleagues from various institutions and used, among other things, for the relaunch of the freely available and peer-reviewed scientific journal kommunikation@gesellschaft. The work on open single source publishing approaches did not end with the end of the project.

Founding of the Single Source Publishing Community

Former team members of the project have founded the Single Source Publishing Community (SSPC). This community focuses on scientific writing and publishing with open tools and formats and is a meeting place for researchers, lecturers, publishers and developers. Under the motto “Collaborate more, compete less”, the active members of the community exchange information at their monthly meetings on current developments in their projects and discuss strategies for cultural change in the field of scientific publishing.

Open tools for open formats

Numerous open source tools support this independence from proprietary offerings: software projects such as Open Journal Systems, Vivliostyle, Paged.js, Swapfire, FidusWriter, HedgeDoc, Quarto and, last but not least, pandoc are combined in different ways in community projects to create alternative open systems.

Many projects use Markdown as the source format to create versions in HTML, JATS/XML and EPUB that complement the PDF. These formats have the advantage that they preserve the semantic structure of the information they contain and thus open up a wide range of applications in automated text mining. At the same time, they increase the usability and reach of published scientific findings.
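A small, hedged example shows what semantic markup makes possible. It assumes that abbreviations in an HTML version are marked up with the standard abbr element and a title attribute. The sample text, the class name and the function name are invented for illustration and are not taken from any of the projects mentioned. With this markup, a few lines of Python from the standard library are enough to extract a glossary automatically, which is not reliably possible from a PDF.

```python
from html.parser import HTMLParser

# Invented sample; real documents would be read from a file.
SAMPLE = (
    "<p>The <abbr title=\"Intergovernmental Panel on Climate Change\">IPCC</abbr> "
    "report is also published as <abbr title=\"HyperText Markup Language\">HTML</abbr>.</p>"
)


class AbbreviationCollector(HTMLParser):
    """Collect abbreviations and their expansions from <abbr title="..."> elements."""

    def __init__(self):
        super().__init__()
        self.abbreviations = {}
        self._current_title = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "abbr":
            # Elements without a title attribute are skipped (title is None).
            self._current_title = dict(attrs).get("title")
            self._buffer = []

    def handle_data(self, data):
        if self._current_title is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "abbr" and self._current_title is not None:
            short_form = "".join(self._buffer).strip()
            self.abbreviations[short_form] = self._current_title
            self._current_title = None


def collect_abbreviations(html):
    """Return a dict that maps each abbreviation to its expansion."""
    parser = AbbreviationCollector()
    parser.feed(html)
    parser.close()
    return parser.abbreviations
```

For the sample above, collect_abbreviations(SAMPLE) maps "IPCC" and "HTML" to their expansions.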

Formats For Future in Open Access Week 2022

The importance of open formats for scientific publications is demonstrated by the IPCC's current report on climate change, which is published as a PDF of several thousand pages. Peppered with abbreviations and jargon, the important information on the climate crisis is understandable only to a specialist community and unreadable to machines. It is gratifying that some parts have recently appeared in HTML format.

The #semanticClimate group led by chemist and open knowledge activist Peter Murray-Rust has therefore taken it upon itself to convert the IPCC’s PDF documents into HTML and XML and to mark them up semantically – valuable and time-consuming work that would not be necessary if the IPCC were to publish in semantically excellent formats from the outset.

With this in mind, activists in the single source publishing community and #semanticClimate have joined forces to raise awareness of sustainable publishing workflows and formats during Open Access Week 2022, which is themed “Open for Climate Justice.”

In a one-week hackathon entitled “Formats For Future: Liberating and Semantify IPCC Reports”, the tools of the #semanticClimate group can be tested and further chapters of the IPCC report can be liberated. The organization is supported by the University Library of the TU Hamburg, the Open Science Lab of the TIB Hannover, the Hamburg Open Online University at the TU Hamburg and many other people who have contributed time and expertise over the past weeks.

Join in!

The Single Source Publishing Community meetings are held once a month and are open to all interested parties. Further information can be found on GitHub.

The international hackathon “Formats For Future: Liberating and Semantify IPCC Reports” will start on 24 October 2022 and run until the end of Open Access Week on 30 October 2022. Further information can be found in this blog post and on the group’s website.

Under the motto “Formats For Future”, further activities are planned to make scientific publishing more open, independent and future-proof.