Data, Volume 6, Issue 9 (September 2021) – 6 articles

Cover Story: Trends in the sciences are indicative of data management becoming a feature of the mainstream research process. In this context, the European Commission introduced an Open Research Data (ORD) pilot at the start of its Horizon 2020 Research and Innovation programme. With Horizon 2020 gradually coming to an end and Horizon Europe having recently started, an important facet of the new EU research cycle is to support research data management and open access to research data according to the principle “as open as possible, as closed as necessary”. With this in mind, a review of projects that participated in the Horizon 2020 ORD pilot has been undertaken in anticipation of identifying best practices and providing insights into the formulation and implementation of effective data management plans.

13 pages, 3072 KiB

Open Access Data Descriptor

Dataset of Flow-Induced Vibrations on a Pipe Conveying Cold Water

by Francisco Villa, Cherlly Sánchez, Marcela Vallejo, Juan S. Botero-Valencia and Edilson Delgado-Trejos

Data 2021, 6(9), 100; https://doi.org/10.3390/data6090100 - 17 Sep 2021

Cited by 3 | Viewed by 3039

Abstract

Analysis of flow-induced pipe vibrations has been used in a variety of applications, such as flowrate inference and leak detection. These applications are based on a functional relationship between the vibration features estimated in the pipe walls and the dynamics related to the flow of the substance. The dataset described in this document comprises signals acquired using an accelerometer attached to a pipe conveying cold water at specific flowrate values. Tests were carried out under the provisions of the ISO 4064-1/2:2016 standard and were performed on two measurement benches designed for flowmeter calibration. A total of 80 flowrate values, from 25 L/h to 20,000 L/h, were considered. For each flowrate value, 3 to 6 samples were taken, so the resulting dataset has a total of 382 signals, each containing acceleration values in three axes and a timestamp in microseconds.

(This article belongs to the Section Information Systems and Data Management)
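
As a rough illustration of how one such signal might be summarized, the sketch below computes a mean sampling rate from microsecond timestamps and a per-axis RMS value. The array layout, the function name `summarize_signal`, and the choice of RMS as a vibration feature are assumptions made for this example; the abstract does not specify the file format or the features used.

```python
import numpy as np

def summarize_signal(t_us, acc):
    """Return (mean sampling rate in Hz, per-axis RMS) for one signal.

    t_us : 1-D array of timestamps in microseconds (strictly increasing).
    acc  : array of shape (n_samples, 3) with x, y, z acceleration.
    Both the layout and the function name are hypothetical.
    """
    t_us = np.asarray(t_us, dtype=float)
    acc = np.asarray(acc, dtype=float)
    # Mean sampling interval in seconds, from microsecond timestamps.
    dt = np.mean(np.diff(t_us)) * 1e-6
    fs = 1.0 / dt
    # Remove the per-axis mean (gravity or sensor offset) before the RMS.
    centered = acc - acc.mean(axis=0)
    rms = np.sqrt(np.mean(centered ** 2, axis=0))
    return fs, rms

# Synthetic example: 1 kHz sampling, 50 Hz sine on x, constants on y and z.
t_us = np.arange(1000) * 1000.0
x = np.sin(2 * np.pi * 50 * t_us * 1e-6)
acc = np.column_stack([x, np.full(1000, 0.2), np.full(1000, 9.8)])
fs, rms = summarize_signal(t_us, acc)
```

With real recordings, the same function could be applied to each of the 382 signals and the resulting features compared across flowrate values.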

13 pages, 2860 KiB

Open Access Data Descriptor

Technical Data of Heterologous Expression and Purification of SARS-CoV-2 Proteases Using Escherichia coli System

by Rafida Razali, Vijay Kumar Subbiah and Cahyo Budiman

Data 2021, 6(9), 99; https://doi.org/10.3390/data6090099 - 16 Sep 2021

Cited by 7 | Viewed by 3023

Abstract

The SARS-CoV-2 coronavirus expresses two essential proteases: the 3-chymotrypsin-like protease (3CLpro), also called the main protease (Mpro), and the papain-like protease (PLpro). Both are considered viable drug targets for the inhibition of viral replication. To perform drug discovery assays for SARS-CoV-2, efficient methods must be established for the production and purification of the 3CLpro and PLpro of SARS-CoV-2, designated 3CLpro-CoV2 and PLpro-CoV2, respectively. This article expands on the data collected in attempts to express the SARS-CoV-2 proteases under different conditions and to purify them by single-step chromatography. The data showed that the E. coli BL21(DE3) strain was sufficient to express 3CLpro-CoV2 in a fully soluble form. Nevertheless, the single affinity chromatography step was only applicable to 3CLpro-CoV2 expressed at 18 °C, with a yield and purification fold of 92% and 49, respectively. Meanwhile, PLpro-CoV2 was successfully expressed in a fully soluble form in either the BL21(DE3) or the BL21-CodonPlus(DE3) strain. In contrast, the single affinity chromatography step was only applicable to PLpro-CoV2 expressed using E. coli BL21-CodonPlus(DE3) at 18 or 37 °C, with a yield of 86% (18 °C) or 83.36% (37 °C) and a purification fold of 112 (18 °C) or 71 (37 °C). The findings provide a guide for optimizing the production of SARS-CoV-2 proteases in E. coli host cells.

(This article belongs to the Section Computational Biology, Bioinformatics, and Biomedical Data Science)
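
The yield and purification fold quoted in the abstract are usually derived from enzyme activity and total protein measured before and after purification. The sketch below assumes those conventional activity-based definitions, which the abstract itself does not spell out; the function name `purification_summary` and the input numbers are hypothetical.

```python
def purification_summary(crude_activity, crude_protein_mg,
                         purified_activity, purified_protein_mg):
    """Return (yield in %, purification fold) for one purification step.

    Activities are total enzyme units; protein amounts are total milligrams.
    Assumes the conventional activity-based definitions.
    """
    # Specific activity = units of activity per mg of total protein.
    crude_specific = crude_activity / crude_protein_mg
    purified_specific = purified_activity / purified_protein_mg
    # Yield: share of the starting activity recovered after purification.
    yield_percent = 100.0 * purified_activity / crude_activity
    # Fold: increase in specific activity relative to the crude extract.
    fold = purified_specific / crude_specific
    return yield_percent, fold

# Made-up numbers for illustration only (not taken from the article).
y, fold = purification_summary(1000.0, 200.0, 900.0, 4.0)
```

Under these definitions a high yield with a high fold means most of the activity was kept while most of the contaminating protein was removed.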

Figure 1. The three-dimensional model structures of (a) 3CLpro-CoV2 (PDB ID: 6WTM) and (b) PLpro-CoV2 (PDB ID: 6W9C). The domain organization and catalytic residues of both proteases were also indicated for clarity.
Figure 2. The primary structure of SARS-CoV-2 proteases: (a) 3CLpro-CoV2 and (b) PLpro-CoV2. The linker sequence for connecting MBP and 3CLpro is LINGDGAGLEVLSAVLQ. The 6His-tag sequences for 3CLpro-CoV2 and PLpro-CoV2 are GPHHHHHH and HHHHHH, respectively. The figures are not drawn to scale.
Figure 3. Expression profile of 3CLpro-CoV2 in E. coli BL21 (DE3) under 15% SDS-PAGE. Lane 1: The cell before IPTG induction; Lane 2: The cell after IPTG induction; Lane 3: Soluble fraction of the cell obtained after the sonication; Lane 4: Insoluble fraction of the cell obtained after the sonication. The area that corresponds to the 3CLpro-CoV2 band is indicated by a red box: (a) The expression profile under condition 1; (b) The expression profile under condition 2. Details of the conditions are shown in Table 2.
Figure 4. Expression check of PLpro-CoV2 under 15% SDS-PAGE. Lane 1: The cell before IPTG induction; Lane 2: The cell after IPTG induction; Lane 3: Soluble fraction of the cell obtained after the sonication; Lane 4: Insoluble fraction of the cell obtained after the sonication. The area that corresponds to the PLpro-CoV2 band is indicated by a red box: (a) The expression profile under condition 5; (b) The expression profile under condition 6; (c) The expression profile under condition 9; (d) The expression profile under condition 10. Details of the conditions are shown in Table 2.
Figure 5. The 15% SDS-PAGE analysis of purified 3CLpro-CoV2. Lane M: Protein marker; Lane 1: Purified protein after Ni2+-NTA chromatography: (a) Purified 3CLpro-CoV2 expressed under condition 1; (b) Purified 3CLpro-CoV2 expressed under condition 2. The band that corresponds to the 3CLpro-CoV2 is indicated by an arrow. Details of the conditions are shown in Table 2.
Figure 6. The 15% SDS-PAGE analysis of purified PLpro-CoV2. Lane M: Protein marker; Lane 1: Purified protein after Ni2+-NTA chromatography: (a) Purified PLpro-CoV2 expressed under condition 5; (b) Purified PLpro-CoV2 expressed under condition 6; (c) Purified PLpro-CoV2 expressed under condition 9; (d) Purified PLpro-CoV2 expressed under condition 10. Details of the conditions are shown in Table 2.
Figure 7. The formation of yellow color in the reaction cocktails of (a) 3CLpro-CoV2 and (b) PLpro-CoV2.

6 pages, 1817 KiB

Open Access Data Descriptor

Seismic Envelopes of Coda Decay for Q-coda Attenuation Studies of the Gargano Promontory (Southern Italy) and Surrounding Regions

by Marilena Filippucci, Salvatore Lucente, Salvatore de Lorenzo, Edoardo Del Pezzo, Giacomo Prosser and Andrea Tallarico

Data 2021, 6(9), 98; https://doi.org/10.3390/data6090098 - 13 Sep 2021

Cited by 1 | Viewed by 1824

Abstract

Here, we describe the dataset of seismic envelopes used to study the S-wave Q-coda attenuation quality factor $Q_c$ of the Gargano Promontory (Southern Italy). With this dataset, we investigated crustal seismic attenuation through the $Q_c$ parameter. We collected this dataset starting from two different earthquake catalogues: the first covering the period from April 2013 to July 2014, the second covering the period from July 2015 to August 2018. Visual inspection of the envelopes was carried out on recordings filtered with a two-pole Butterworth filter with central frequency $f_c = 6$ Hz. The obtained seismic envelopes of coda decay can be linearly fitted in a bilogarithmic diagram in order to obtain a series of single source-receiver measurements of $Q_c$ for each seismogram component at different frequencies $f_c$. The analysis of the trend $Q_c(f_c)$ gives important insights into the heterogeneity and the anelasticity of the sampled Earth medium.

(This article belongs to the Section Spatial Data Science and Digital Earth)
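
To make the idea of fitting a coda envelope concrete, the sketch below estimates $Q_c$ from a synthetic envelope. It assumes the widely used single-backscattering form $A(t) = S\,t^{-\alpha}\exp(-\pi f_c t / Q_c)$, which linearizes as $\ln(A(t)\,t^{\alpha}) = \ln S - (\pi f_c / Q_c)\,t$. This model, the function name `estimate_qc`, and the default $\alpha = 1$ are assumptions for illustration and may differ from the exact fitting procedure used by the authors.

```python
import numpy as np

def estimate_qc(t, envelope, fc, alpha=1.0):
    """Estimate Q_c from a coda envelope by a least-squares line fit.

    Assumed model: A(t) = S * t**(-alpha) * exp(-pi * fc * t / Q_c),
    so ln(A(t) * t**alpha) = ln(S) - (pi * fc / Q_c) * t.
    t is lapse time in seconds; fc is the filter central frequency in Hz.
    """
    t = np.asarray(t, dtype=float)
    y = np.log(np.asarray(envelope, dtype=float) * t ** alpha)
    # Degree-1 polynomial fit: the slope equals -pi * fc / Q_c.
    slope, _intercept = np.polyfit(t, y, 1)
    return -np.pi * fc / slope

# Synthetic noise-free envelope with a known Q_c, to check the round trip.
t = np.linspace(20.0, 60.0, 200)
true_qc = 150.0
env = 3.0 / t * np.exp(-np.pi * 6.0 * t / true_qc)
qc = estimate_qc(t, env, fc=6.0)
```

Repeating the fit for envelopes filtered at several central frequencies would give the trend $Q_c(f_c)$ mentioned in the abstract.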

Figure 1. Plot of the first envelope file in Table 3, as an example.
Figure 2. Plot of the first envelope file in Table 6, as an example.
Figure 3. Three-component seismograms at station OT01, as an example. Over each record, the origin time in absolute time is overwritten; the X-axis is time (s), the Y-axis is amplitude (counts/s). The P-wave marker (IPU0) and S-wave marker (IS) are overwritten.
Figure 4. Three-component seismograms at station OT01, filtered with $f_c = 6$ Hz and band-width [4.24; 8.48] Hz, as an example. Over each record, the origin time in absolute time is overwritten; the X-axis is time (s), the Y-axis is amplitude (counts/s).
Figure 5. Envelopes of the filtered seismograms in Figure 4. Over the first record, the T3 and T4 markers are overwritten; the X-axis is time (s), the Y-axis is amplitude (counts/s).

11 pages, 2352 KiB

Open Access Article

BioCPR–A Tool for Correlation Plots

by Vidal Fey, Dhanaprakash Jambulingam, Henri Sara, Samuel Heron, Csilla Sipeky and Johanna Schleutker

Data 2021, 6(9), 97; https://doi.org/10.3390/data6090097 - 8 Sep 2021

Cited by 7 | Viewed by 5121

Abstract

A gene is a sequence of DNA bases through which genetic information is passed on to the next generation. Most genes encode proteins that ultimately control cellular function. Understanding the interrelation between genes without the application of statistical methods can be a daunting task. Correlation analysis is a powerful approach to determine the strength of association between two variables (e.g., gene-wise expression). Moreover, it becomes essential to visualize these data to establish patterns and derive insight. The most common method for gene expression visualization is to use correlation heatmaps, in which the colors of the plot represent the strength of co-expression. To address this requirement, we developed a visualization tool called BioCPR: Biological Correlation Plots in R. This tool performs both correlation analysis and subsequent visualization in the form of an interactive heatmap, improving both usability and interpretation of the data. BioCPR is an R Shiny-based application and can be run locally in RStudio or a web browser.

(This article belongs to the Section Computational Biology, Bioinformatics, and Biomedical Data Science)
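
The correlation step behind such a heatmap can be sketched in a few lines. The example below is written in Python with NumPy rather than R, uses Pearson correlation, and works on a made-up expression matrix; it is not BioCPR's own code, and the function name `gene_correlation` is hypothetical.

```python
import numpy as np

def gene_correlation(expr):
    """Pearson correlation between genes.

    expr : array of shape (n_genes, n_samples), one row per gene.
    Returns an (n_genes, n_genes) matrix with values in [-1, 1].
    """
    return np.corrcoef(np.asarray(expr, dtype=float))

# Toy expression matrix: gene B rises with gene A, gene C falls with it.
expr = [
    [1.0, 2.0, 3.0, 4.0],   # gene A
    [2.0, 4.0, 6.0, 8.0],   # gene B = 2 * A
    [4.0, 3.0, 2.0, 1.0],   # gene C = 5 - A
]
corr = gene_correlation(expr)
```

Each cell of `corr` would be mapped to a color in the heatmap, so strongly co-expressed gene pairs and strongly anti-correlated pairs stand out at opposite ends of the color scale.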

19 pages, 508 KiB

Open Access Article

Lessons Learnt from Engineering Science Projects Participating in the Horizon 2020 Open Research Data Pilot

by Timothy Austin, Kyriaki Bei, Theodoros Efthymiadis and Elias P. Koumoulos

Data 2021, 6(9), 96; https://doi.org/10.3390/data6090096 - 6 Sep 2021

Cited by 3 | Viewed by 3369

Abstract

Trends in the sciences are indicative of data management becoming established as a feature of the mainstream research process. In this context, the European Commission introduced an Open Research Data pilot at the start of the Horizon 2020 research programme. This initiative followed the success of the Open Access pilot implemented in the prior (FP7) research programme, which thereafter became an integral component of Horizon 2020. While the Open Access phenomenon can reasonably be argued to be one of many instances of web technologies disrupting established business models (namely publication practices and workflows established over several centuries in the case of Open Access), initiatives designed to promote research data management have no established foundation on which to build. For Open Data to become a reality and, more importantly, to contribute to the scientific process, data management best practices and workflows are required. Furthermore, with the scientific community having operated to good effect in the absence of data management, there is a need to demonstrate the merits of data management. This circumstance is complicated by the lack of the necessary ICT infrastructures, especially interoperability standards, required to facilitate the seamless transfer, aggregation and analysis of research data. Any activity aiming to promote Open Data thus needs to overcome a number of cultural and technological challenges. It is in this context that this paper examines the data management activities and outcomes of a number of projects participating in the Horizon 2020 Open Research Data pilot. The result has been to identify a number of commonly encountered benefits and issues; to assess the utilisation of data management plans; and through the close examination of specific cases, to gain insights into obstacles to data management and potential solutions. 
Although primarily anecdotal and difficult to quantify, the experiences reported in this paper tend to favour developing data management best practices rather than doggedly pursuing the Open Data mantra. While Open Data may prove valuable in certain circumstances, there is good reason to claim that managed access to scientific data of high inherent intellectual and financial value will prove more effective in driving knowledge discovery and innovation.

(This article belongs to the Section Information Systems and Data Management)

19 pages, 2945 KiB

Open Access Article

TRIPOD—A Treadmill Walking Dataset with IMU, Pressure-Distribution and Photoelectric Data for Gait Analysis

by Justin Trautmann, Lin Zhou, Clemens Markus Brahms, Can Tunca, Cem Ersoy, Urs Granacher and Bert Arnrich

Data 2021, 6(9), 95; https://doi.org/10.3390/data6090095 - 26 Aug 2021

Cited by 10 | Viewed by 5871

Abstract

Inertial measurement units (IMUs) enable easy-to-operate, low-cost data recording for gait analysis. When combined with treadmill walking, a large number of steps can be collected in a controlled environment without the need for a dedicated gait analysis laboratory. In order to evaluate existing and novel IMU-based gait analysis algorithms for treadmill walking, a reference dataset that includes IMU data as well as reliable ground truth measurements for multiple participants and walking speeds is needed. This article provides a reference dataset recorded from 15 healthy young adults who walked on a treadmill at three different speeds. Data were acquired using seven IMUs placed on the lower body, two different reference systems (Zebris FDMT-HQ and OptoGait), and two RGB cameras. Additionally, in order to validate an existing IMU-based gait analysis algorithm using the dataset, an adaptable modular data analysis pipeline was built. Our results show agreement between the pressure-sensitive Zebris and the photoelectric OptoGait system (r = 0.99), demonstrating the quality of our reference data. As a use case, the performance of an algorithm originally designed for overground walking was tested on treadmill data using the data pipeline. The accuracy of stride length and stride time estimations was comparable to that reported in other studies with overground data, indicating that the algorithm is equally applicable to treadmill data. The Python source code of the data pipeline is publicly available, and the dataset will be provided by the authors upon request, enabling future evaluations of IMU gait analysis algorithms without the need to record new data.

(This article belongs to the Special Issue Measurements of User and Sensor Data from the Internet of Things (IoT) Devices)
