1. Introduction
Healthcare is evolving towards the application of automated systems for home monitoring and pre-clinical screening to complement diagnostic routines. The current reference practice for the diagnosis of sleep-related pathologies is a labor-intensive overnight stay in a specialized sleep center. There, a polysomnography (PSG) is performed, requiring the patient to wear electroencephalography electrodes, oronasal airflow sensors, thoracic and abdominal belts, electrocardiography (ECG) sensors, an oxygen saturation finger-clip sensor, a body position sensor, and chin and leg electromyography and electrooculography sensors over a full night. This setup is highly obtrusive for the patient and impedes a normal night’s sleep. Moreover, the PSG procedure requires well-trained staff for analysis and is costly and burdensome, and sleep centers often have limited capacity. Therefore, unobtrusive, cheap, and simple yet reliable systems for monitoring at home are desired. Such systems could be used to screen patients and prioritize them for hospital diagnostics, to increase healthcare accessibility, or to enable long-term follow-up.
Among sleep disorders, obstructive sleep apnea (OSA) has the highest prevalence, ranging from 13% to 33% in men and from 6% to 19% in women. However, these numbers are probably underestimated and are likely to grow, as OSA is closely associated with obesity and advancing age [1]. OSA is characterized by events of breathing disturbance causing hypoxaemia, large chest motions, and arousals from sleep. These events fragment the patient’s sleep and reduce the phases of rapid eye movement and slow wave sleep. Consequently, OSA is an acknowledged risk factor for excessive daytime sleepiness, hypertension, and cardiovascular diseases [2]. The severity of sleep apnea is assessed by the Apnea–Hypopnea Index (AHI), which is the number of respiratory events (apneas and hypopneas) per hour. A patient is categorized as not suffering from sleep apnea (AHI < 5), as having mild apnea (5 ⩽ AHI < 15), moderate apnea (15 ⩽ AHI < 30), or severe apnea (AHI ⩾ 30) [3].
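To make the index and its categories concrete, the following minimal Python sketch computes the AHI from event counts and total sleep time and maps it onto severity classes. It assumes the widely used cut-offs of 5, 15, and 30 events per hour; the function names are hypothetical and not part of any tool used in this study.

```python
def apnea_hypopnea_index(n_apneas: int, n_hypopneas: int, sleep_hours: float) -> float:
    """Respiratory events (apneas + hypopneas) per hour of sleep."""
    if sleep_hours <= 0:
        raise ValueError("sleep duration must be positive")
    return (n_apneas + n_hypopneas) / sleep_hours


def severity(ahi: float) -> str:
    """Map an AHI value onto severity classes (cut-offs of 5, 15 and 30 events/h assumed)."""
    if ahi < 5:
        return "no sleep apnea"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"


# Example: 60 apneas and 66 hypopneas over 7 h of sleep give 18 events per hour
ahi = apnea_hypopnea_index(60, 66, 7.0)
print(ahi, severity(ahi))  # 18.0 moderate
```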
In order to expand unobtrusive resources for home-based sleep apnea screening and sleep monitoring, a commercial off-the-shelf sensor was explored: the Emfit QS (referred to as Emfit, developed and manufactured by Emfit, Finland). The Emfit is a pressure sensor built from electromechanical film (EMFi), which is a polypropylene film containing gas voids. The material behaves similarly to piezoelectric materials, as a displacement charge is produced when a force is applied. However, the change of the internal electric field is caused by the movement of static charges that were injected during fabrication of the film [4]. From the pressure-modulated signal, a respiratory signal and a ballistocardiography (BCG) signal can be derived. The latter is an unobtrusive measurement of the body’s recoil caused by cardiovascular pulsation. As such, the sensor can provide information on sleep-disordered breathing as well as on other origins of motion. Koyama et al. [5] studied the feasibility of a piezoelectric sensor for apnea screening based on BCG. They considered apneas during Cheyne–Stokes-like breathing to be correlated with the AHI. This type of breathing, however, is only present in cardiac patients, so the approach targets a subset of patients. Tenhunen et al. [6] evaluated a custom-made Emfit sheet and derived several parameters from breathing patterns to correlate these with the AHI and assess sleep apnea severity. Despite a sensitivity of 0.95 in detecting subjects with AHI < 15 using a combined parameter, the method required annotators to score breathing patterns visually and did not contribute to the automatic detection of these patterns. The same authors also derived heart rate variability (HRV) [7], which resembled known HRV results of sleep apnea patients during periodic apneic events. This revealed an increase in sympathetic activity, and the authors claimed good reliability in detecting periodic sleep-disordered breathing. However, periods with wakefulness, movements, and artefacts were manually omitted, which hinders the application of the Emfit as a stand-alone device.
Currently, no fully automated sleep apnea screening method has been established based on the Emfit sensor. Moreover, to the best of the authors’ knowledge, no study has been performed using the commercial off-the-shelf Emfit sensor. Hence, the goal of the present study was twofold (see Figure 1). First, the potential of the Emfit sensor for sleep apnea screening in a stand-alone setting was investigated. Sleep apnea is characterized by breathing cessations, which are terminated by arousals often accompanied by large motions of the chest. These arousals and chest motions cause deviations in the signals, which are referred to as artefacts. Hence, the Emfit data was explored to reveal clusters of artefacts and clean segments in the signal, and the characteristics of these clusters were linked to the AHI. This cluster analysis was performed unsupervised, because the Emfit sensor was not automatically synchronized with the PSG and to avoid burdensome manual labeling of the data into clean and artefact segments. Secondly, the study explored the suitability of this commercial sensor to be merged with gold-standard polysomnography data for future sleep monitoring. Since PSG and Emfit signals originate from different types of sensor modalities and cannot be regarded as strictly coupled, an automated synchronization procedure based on the previously detected artefact patterns was developed. After synchronization, two different positions of the Emfit were investigated to find the optimal position for capturing respiratory and cardiac information similar to the PSG.
2. Materials
The Emfit QS is a commercially available pressure sensor (542 mm × 70 mm × 1.4 mm). Both the raw data and the prefiltered data were made available. The raw data was sampled at 100 Hz. The prefiltered data contained a respiratory signal, bandpass filtered to [0.08, 3] Hz, and a BCG signal, bandpass filtered to [6, 16] Hz. The filtering techniques were not specified by the manufacturer. From the PSG system (B3IP, Medatec, Belgium), the thoracic belt and ECG signals were analyzed.
In this study, two sensor setups were investigated. The bed consisted of a mattress on top of which a mattress topper of approximately 4 cm thickness was added. One sensor was positioned underneath the thorax of the patient, separated from the patient by the mattress cover (position Top). A second sensor was placed beneath the topper (position Bottom) at a 2.5 cm horizontal distance from the top sensor (see Figure 2). This horizontal offset limited the influence of the top sensor and compensated for patients moving down in the bed when the head of the mattress was raised. This setup was applied simultaneously in two beds in the sleep laboratory.
The Emfit sensor and PSG simultaneously recorded data for patients referred for sleep diagnosis in the sleep laboratory of the University Hospitals Leuven (UZ Leuven). Overnight PSG signals were annotated by sleep specialists according to the 2012 scoring rules of the American Academy of Sleep Medicine [8] to derive the AHI. The dataset was recorded in two phases with an interruption of 7.5 months. The sensor setup remained the same; the sensors were only removed between phases and relocated as close as possible to their original location. Specifications of both datasets are given in Table 1. The last column, Top+Bottom, indicates the number of top sensor signals for which a corresponding bottom sensor signal is available; the difference is due to data loss caused by technical problems, mostly with the bottom sensor.
All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol with registration number B322201732928 was approved on November 8th 2018 by the UZ/KU Leuven Ethics Committee (Ethische Commissie Onderzoek UZ/KU Leuven).
3. Emfit-Based Sleep Apnea Screening
The Emfit sensor was evaluated in terms of its potential for sleep apnea screening in a stand-alone setting. As sleep apnea is characterized by breathing cessations that are often accompanied by large chest motions, these motions induce deviations in the signal. These deviations are referred to as artefacts, although they can also be induced by non-pathological body motions. It was hypothesized that the distortion of the data increases with AHI, as more movements and arousals would be detected. Therefore, these artefacts were identified in the data by an unsupervised clustering method. First, the raw Emfit data was pre-processed. Thereafter, features that highlight irregularities in the signal were extracted, and the features that optimally clustered the data were selected. Finally, the characteristics of the clustering were applied for sleep apnea screening.
3.1. Emfit Preprocessing
First, data quality was assessed by investigating the peak-to-peak (PP) amplitude distribution of the sensors for both measurement phases. Then, after subtraction of the mean value, the prefiltered respiratory signal of the Emfit sensor was further bandpass filtered to [0.08, 2] Hz. The respiratory signal was resampled at 4 Hz and the BCG signal at 50 Hz. As the signal amplitude depended on the weight and position of the patient, the signals were normalized. Normalization was based on the assumption that long-lasting periods of signal saturation corresponded to position changes of the patient. Each segment between these periods was normalized by the median PP amplitude of that segment. If the median value was zero, the normalization of the previous segment was applied. This procedure was applied separately to the raw pressure, prefiltered respiratory, and prefiltered BCG signals. The periods of position changes and other saturated values were clipped to a value of 1, which was double the value of a signal at the median amplitude.
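The segment-wise normalization can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the study's code: the mask of long saturation periods is taken as given (its detection rule is not specified above), the PP amplitude is measured over fixed windows of `win` samples, and saturated periods reuse the most recent scale. The names `contiguous_runs` and `normalize_segments` are hypothetical.

```python
import numpy as np


def contiguous_runs(mask):
    """Return (start, stop) index pairs of consecutive True values in a boolean mask."""
    padded = np.concatenate(([False], np.asarray(mask, dtype=bool), [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    return list(zip(edges[::2], edges[1::2]))


def normalize_segments(x, saturated, win):
    """Normalize each unsaturated segment by its median peak-to-peak (PP) amplitude.

    x         : 1-D signal with the mean already removed
    saturated : boolean mask of long saturation periods (assumed to be given)
    win       : hypothetical window length (samples) over which PP is measured
    """
    x = np.asarray(x, dtype=float)
    saturated = np.asarray(saturated, dtype=bool)
    runs = [(s, e, False) for s, e in contiguous_runs(~saturated)]
    runs += [(s, e, True) for s, e in contiguous_runs(saturated)]
    y = np.empty_like(x)
    scale = 1.0  # assumed fallback if the recording starts with a zero-median segment
    for start, stop, is_saturated in sorted(runs):
        seg = x[start:stop]
        if not is_saturated:
            n_win = max(len(seg) // win, 1)
            med = np.median([np.ptp(w) for w in np.array_split(seg, n_win)])
            if med > 0:  # zero median: keep the normalization of the previous segment
                scale = med
        y[start:stop] = seg / scale
    # A zero-mean signal at the median PP amplitude now spans roughly [-0.5, 0.5];
    # saturated and other extreme values are clipped to +/-1.
    return np.clip(y, -1.0, 1.0)
```

Because the scale carries over from one segment to the next, a flat (zero-median) segment simply inherits the previous normalization, as described in the text.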
Next, time–frequency information was extracted from the resulting signals by means of the discrete wavelet transform. To accentuate steep changes in the raw pressure signal indicating motion, a Daubechies 1 (i.e., db1 or Haar) wavelet was applied. Taking into account the window size and sampling frequency, the signal was decomposed down to level 8, i.e., [0.2, 0.4] Hz. The respiratory signal was decomposed with a db4 wavelet (down to level 3, [0.25, 0.5] Hz) and the BCG with a db6 wavelet (down to level 2, [6.25, 12.5] Hz). The respective wavelets were chosen for their resemblance to the natural wave shape of each signal. A total of 16 signals (original signals and decompositions) were used for the subsequent feature extraction step.
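The quoted frequency ranges follow from the dyadic structure of the discrete wavelet transform: the detail coefficients at level j nominally cover [fs/2^(j+1), fs/2^j]. The short check below reproduces the bands for the three signals at the sampling rates given above. These are idealized bands only; real wavelet filters have overlapping pass bands, especially the short db1 filter.

```python
def detail_band(fs_hz: float, level: int) -> tuple:
    """Nominal frequency band (Hz) of the DWT detail coefficients at `level`.

    Each decomposition level halves the band, so level j covers
    [fs / 2**(j + 1), fs / 2**j].
    """
    return fs_hz / 2 ** (level + 1), fs_hz / 2 ** level


# (signal, sampling rate in Hz, wavelet, deepest level) as stated in the text
for name, fs, wavelet, level in [("raw pressure", 100.0, "db1", 8),
                                 ("respiration", 4.0, "db4", 3),
                                 ("BCG", 50.0, "db6", 2)]:
    low, high = detail_band(fs, level)
    print(f"{name:<12} {wavelet}: level {level} -> [{low:.3f}, {high:.3f}] Hz")
```

For the raw signal this gives [0.195, 0.391] Hz, i.e. the rounded [0.2, 0.4] Hz above. With PyWavelets installed, a decomposition of this kind can be obtained with `pywt.wavedec(x, "db1", level=8)`, which returns the approximation coefficients followed by the detail coefficients from the deepest level upwards.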
3.2. Artefact Detection
3.2.1. Feature Extraction
A feature window of 10 s was applied to obtain sufficient time resolution and to include two to three breaths of the respiratory signal. In total, 19 features were extracted in order to locate artefacts by inspecting outliers as well as irregularities (see Table 2). For features 9–19, the window was split into 3 equal subsegments over which the PP amplitude was calculated, resulting in three PP values per window [9].
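A minimal sketch of the windowing step behind the subsegment features is shown below. It assumes non-overlapping 10 s windows (overlap is not stated above) and a 1-D NumPy signal; `subsegment_pp` is a hypothetical name, and the way features 9–19 of Table 2 are derived from these PP values is not reproduced here.

```python
import numpy as np


def subsegment_pp(x, fs_hz, win_s=10.0, n_sub=3):
    """Peak-to-peak amplitude of n_sub equal subsegments of each 10 s window.

    Non-overlapping windows are assumed; an incomplete trailing window is dropped.
    Returns an array of shape (n_windows, n_sub).
    """
    x = np.asarray(x, dtype=float)
    win = int(round(win_s * fs_hz))
    n_windows = x.size // win
    windows = x[: n_windows * win].reshape(n_windows, win)
    parts = np.array_split(windows, n_sub, axis=1)  # nearly equal column blocks
    return np.stack([np.ptp(p, axis=1) for p in parts], axis=1)
```

For the respiratory signal at 4 Hz, each window holds 40 samples, split into subsegments of 14, 13, and 13 samples.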
Time domain features were derived from both the untransformed signals and the three wavelet-decomposed signals. These features were then normalized per subject using the z-score, and features with a Pearson correlation coefficient larger than 0.9 were removed. Lastly, the feature values were transformed by means of Euclidean norm normalization to decrease the effect of extreme values.
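The three normalization steps could be chained as in the sketch below. Several details are assumptions rather than statements from the text: the correlation threshold is applied to the absolute Pearson coefficient on the pooled, per-subject z-scored data; features are pruned greedily in column order; the Euclidean norm is taken per segment (row); and features are assumed non-constant within each subject. All function names are hypothetical.

```python
import numpy as np


def zscore_per_subject(F, subject_ids):
    """Standardize every feature column separately within each subject."""
    F = np.asarray(F, dtype=float)
    subject_ids = np.asarray(subject_ids)
    Z = np.empty_like(F)
    for s in np.unique(subject_ids):
        rows = subject_ids == s
        Z[rows] = (F[rows] - F[rows].mean(axis=0)) / F[rows].std(axis=0)
    return Z


def prune_correlated(Z, threshold=0.9):
    """Greedily keep features whose |Pearson r| with every kept feature is <= threshold."""
    R = np.abs(np.corrcoef(Z, rowvar=False))
    keep = []
    for j in range(Z.shape[1]):
        if all(R[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep


def unit_norm_rows(Z):
    """Scale each segment's feature vector to unit Euclidean norm."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z / np.where(norms == 0, 1.0, norms)


# Hypothetical usage: F holds one row per 10 s segment, pooled over subjects.
# Z = zscore_per_subject(F, subject_ids)
# X = unit_norm_rows(Z[:, prune_correlated(Z)])
```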
3.2.2. Unsupervised Feature Selection
The unsupervised feature selection framework was based on robust spectral learning for unsupervised feature selection (RSFS) [10] (see Figure 3). This method provides a ranking of the features that depends on three parameters of the RSFS objective function. Input feature vectors were taken from a reduced training dataset selected using K-medoids clustering with the Mahalanobis distance metric [11]. The K-medoids clustering was performed 100 times, such that the parameter optimization pipeline was run with 100 different training sets. Additionally, the Rényi entropy of every training set was calculated to verify the diversity within a training set and the stability over training sets. Next, the three RSFS parameters were taken from a 3D grid search over values equispaced on a logarithmic scale up to 3. For every parameter combination, a feature ranking was calculated and the d top-ranked features were selected. Subsequently, k-means clustering in the d-dimensional feature space was performed 20 times using the squared Euclidean distance and random initialization. The clustering performance was evaluated by the overall average silhouette score [12]. The pipeline was iterated over a range of numbers of features d and numbers of clusters k. After completion of these iterative steps, the pipeline yielded the optimized RSFS parameters, the resulting feature ranking, and the optimal number of features d.
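RSFS itself is not reproduced here. Assuming the feature columns are already sorted by an RSFS ranking for one parameter combination, the inner loop over d and k could look like the following NumPy sketch. It runs k-means 20 times with random initialization, keeps the run with the lowest within-cluster sum of squares (this best-of-20 rule is an assumption), and scores the result with the overall average silhouette using Euclidean distances. All names are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_init=20, max_iter=100, seed=0):
    """Lloyd's k-means (squared Euclidean), random initialization, best of n_init runs."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                            else centroids[c] for c in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        inertia = d2[np.arange(len(X)), labels].sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def mean_silhouette(X, labels):
    """Overall average silhouette score with Euclidean distances."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treated as the worst score here
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: silhouette 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()


def select_d_and_k(X_ranked, d_values, k_values):
    """Grid over the number of top-ranked features d and the number of clusters k."""
    best = (-np.inf, None, None)
    for d in d_values:
        Xd = X_ranked[:, :d]
        for k in k_values:
            score = mean_silhouette(Xd, kmeans(Xd, k))
            if score > best[0]:
                best = (score, d, k)
    return best
```

In the full pipeline described above, this grid would be repeated for every RSFS parameter combination and for each of the 100 training sets.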
3.2.3. Clustering of Artefacts
With the optimized features, the training points were clustered using k-means with k = 2. From this clustered training set, the centroids of both clusters were identified. These centroids acted as target points for the test data: every test data point was assigned to the cluster of its closest centroid. The characteristics of the clusters were analyzed based on their feature values and a pairwise Mann–Whitney U test. As the features were tailored to detect large deviations in the signal, it was assumed that one cluster contained clean data segments and the other contaminated, or artefact, data segments.
3.3. Screening of Sleep Apnea
Artefacts present in the Emfit signal originated from different sources, such as position changes and apneic arousals. It was hypothesized that patients with more severe sleep apnea would have more artefacts in their data than healthier subjects. These artefacts were clustered using k-means clustering. This method assumes globular data structures, as it partitions the feature space into the Voronoi cells of the centroids. However, artefacted segments exhibited a varying morphology, resulting in less globular clusters. Therefore, some artefacted segments might be assigned to the clean cluster. The cleanness of the clean cluster was inspected by considering the distances of the segments in the clean cluster to the clean cluster centroid. Outlying values were discarded by only considering values below the 95th percentile of distances. This distance distribution was calculated for every subject. A larger 95th percentile indicates larger distances within the clean cluster and thus more artefact-like segments; hence, a larger AHI was expected for such a subject.
Training of the cluster centroids was performed with the dataset from Phase 1 (see Table 1). The dataset from Phase 2 was applied for testing by mapping the data of individual subjects to the trained centroids and evaluating the cleanness of the clean cluster based on the 95th percentile.
6. Discussion
The approach presented here demonstrated the potential for unobtrusive home-monitoring screening of patients at risk of sleep apnea with an off-the-shelf sensor intended for a home environment. Patients in whom a large number of artefacts is detected, due to position changes or apneic arousals, are considered to be at higher risk of suffering from sleep apnea. The irregularity of the data showed a trend with AHI (see Figure 10a), although the linear relation was limited (0.16). Moreover, a distinction was made between patients suffering from sleep apnea (15 ⩽ AHI) and patients considered healthy (see Figure 10b). A significant difference existed between both classes, which is a beneficial result for screening purposes. Doctors are most interested in identifying these patients, as they should be referred for further examination in a sleep clinic and ideally prioritized on the waiting lists. The screening with ROC analysis resulted in a sensitivity of 0.72, a specificity of 0.70, and a diagnostic odds ratio (DOR) of 6.00. Inspection of the misclassifications revealed a trend towards higher BMI values for both false negatives and false positives, which can be attributed to saturation of the Emfit pressure signal at heavy body weight. As patients with a high BMI are already known to have an increased risk of sleep apnea, these patients were removed from the screening analysis. This increased the DOR of the Emfit screening method for 15 ⩽ AHI to 8.96. Additionally, different body positions, such as lying higher, lower, or sideways, can influence the signal and cause misclassification.
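The reported screening figures are internally consistent: with DOR = [sens/(1 − sens)] / [(1 − spec)/spec], a sensitivity of 0.72 and a specificity of 0.70 give (0.72 × 0.70)/(0.28 × 0.30) = 6.00. The sketch below computes sensitivity, specificity, and DOR for a threshold on a per-subject score and picks the threshold that maximizes Youden's index. Youden's index is an assumed operating-point criterion, since the criterion used in the ROC analysis is not stated here, and the function names are hypothetical.

```python
import numpy as np


def diagnostic_odds_ratio(sens, spec):
    """DOR = (sens / (1 - sens)) / ((1 - spec) / spec)."""
    return (sens * spec) / ((1.0 - sens) * (1.0 - spec))


def screen(scores, has_osa, threshold):
    """Sensitivity and specificity when 'score >= threshold' flags a patient."""
    scores = np.asarray(scores, dtype=float)
    has_osa = np.asarray(has_osa, dtype=bool)
    flagged = scores >= threshold
    sens = flagged[has_osa].mean()
    spec = (~flagged[~has_osa]).mean()
    return sens, spec


def youden_threshold(scores, has_osa):
    """Threshold maximizing sensitivity + specificity - 1 over the observed scores."""
    candidates = np.unique(scores)
    j = [sum(screen(scores, has_osa, t)) - 1.0 for t in candidates]
    return candidates[int(np.argmax(j))]


print(round(diagnostic_odds_ratio(0.72, 0.70), 2))  # 6.0
```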
A similar screening procedure was performed in [15], in which a higher sensitivity (80%) and specificity (87%) were obtained for severe sleep apnea screening. That study was based on the dataset of Phase 1 but used a leave-one-subject-out approach for testing. In the current study, a separate test set (Phase 2) was applied for screening. The sensors of the test set were slightly relocated compared to the training set. This relocation could have changed the properties of the artefacts and of the signal itself, thereby deteriorating the results. Therefore, the approach was improved by normalizing the input data and by refining the interpretation of the clustering results. A more gradual increase in the irregularity of the data with AHI was observed in this study, complicating the screening of specifically severe sleep apnea patients (30 ⩽ AHI).
In clinical practice, screening questionnaires for OSA are readily available. Chiu et al. [16] compared the screening performance of commonly used questionnaires and found the STOP-BANG questionnaire (SBQ) to be a superior tool for detecting mild, moderate, and severe OSA. However, its high sensitivity comes at the expense of a low specificity (15 ⩽ AHI: sensitivity of 0.90, specificity of 0.36, and DOR of 5.05), and its DOR is lower than that of the current Emfit-based method. Nonetheless, given the different balance between sensitivity and specificity of both screening methods, they could be applied simultaneously to reinforce each other. Since Tenhunen et al. [6] reached a screening sensitivity of 0.95 and specificity of 0.92 based on manual annotation of Emfit signals, there remains room for improvement in automated methods.
One such improvement concerns the clustering of the data into clean and artefact segments, which was performed using k-means clustering, a method that assumes globular data structures. However, artefacted segments exhibited a varying morphology, resulting in less globular clusters and causing some artefacted segments to be assigned to the clean cluster. A more complex clustering algorithm, such as kernel spectral clustering [17], may be able to capture the varying morphologies of artefacts in multiple clusters. On the other hand, the simplified threshold method for screening performed similarly to the unsupervised clustering-based method. However, establishing an optimized threshold requires the AHI of the patients. In contrast, the clustering method is purely data-driven and can be trained without prior knowledge. Furthermore, its application can be extended to capture different types of irregularities in the data.
In order to integrate the Emfit sensor with the PSG, an automated synchronization approach was developed. The segments in Figure 8 show that the wave shapes of both modalities differ. As such, the signals cannot be compared as a whole based on cross-correlations, and the procedure first detected large artefact patterns in a coarse synchronization step. In patients with a very high AHI, synchronization becomes more difficult, as signal deviations are almost continuously present.
The synchronization approach was automated by the introduction of a performance indicator, namely the bandwidth of the delay distribution. A bandwidth threshold of 14.26 could be defined to ensure sufficient synchronization accuracy, and most of the data (86.3%) attained a value below this threshold. However, some signals exhibited a delay distribution bandwidth above 15 while synchronization was accurate enough. One reason was that some patients left the bed during the night. The electrodes are then detached and only noise is recorded, which distorts the synchronization between both sensors. The optimal shift before and after detachment differs, causing the bandwidth of the shift distribution to increase. As leaving the bed is a typical event, future work on Emfit–PSG integration should include the detection of electrode detachment and a separate synchronization of the different segments of the night. For the other recordings, the delay was constant over the night. The differences in delay among recordings were suspected to originate from instabilities during the recording of the Emfit data, its transmission over the hospital’s Wi-Fi network, or its upload to the Emfit server. Furthermore, synchronization of the signals of patients with a very large AHI (AHI > 90) was more difficult, as artefacted segments were more similar due to almost continuous apneic events (see Figure 4). Different delays then result in similar cross-correlation values. Additionally, signal quality tends to decrease, which causes the correlation value during synchronization to drop.
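The idea of delay estimation and of a delay distribution can be illustrated with the minimal sketch below. It searches the lag that maximizes the Pearson correlation between two equally sampled sequences (for instance, artefact indicators derived from the Emfit and PSG signals) and repeats this per window to obtain a distribution of delays. The windowing, the lag range, and the use of the interquartile range as a stand-in for the bandwidth of the delay distribution are assumptions; the bandwidth definition used in this study is not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def best_lag(a, b, max_lag):
    """Lag L (samples) maximizing corr(a[n], b[n - L]) over |L| <= max_lag."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    best, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[lag:], b[:n - lag]
        else:
            x, y = a[:n + lag], b[-lag:]
        if x.std() == 0 or y.std() == 0:
            continue  # correlation undefined on a flat overlap
        r = np.corrcoef(x, y)[0, 1]
        if r > best_r:
            best, best_r = lag, r
    return best


def windowed_lags(a, b, win, max_lag):
    """Per-window lag estimates; their spread indicates how consistent the delay is."""
    n = min(len(a), len(b))
    return np.array([best_lag(a[s:s + win], b[s:s + win], max_lag)
                     for s in range(0, n - win + 1, win)])


def lag_spread(lags):
    """Interquartile range of the lag estimates (hypothetical 'bandwidth' proxy)."""
    return np.percentile(lags, 75) - np.percentile(lags, 25)
```

If the electrodes detach midway through the night, the per-window lags split into two groups and their spread grows, which mirrors the behavior described above.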
In a second stage, the sensor signals were precisely synchronized based on heart rate information instead of the respiratory signal. As the calculated delay between the ECG and BCG tachograms was small, a good synchronization had already been reached during the respiration-based synchronization. The presented synchronization framework enables a supervised analysis of the commercial Emfit sensor in future studies. Additionally, the framework can be applied to other multi-modal systems that record movements during sleep, including pressure-based signals of the thorax and respiratory-related signals, as simultaneous and similar artefacts can be expected in these signals.
Regarding the positioning of the Emfit sensor, Figure 11a–c shows that the performance parameters exhibit similar distributions for the top and bottom sensors. Because the parameters were only calculated for clean segments, the percentage of (clean) segments included in the analysis from every sensor is visualized in Figure 11d. From the bottom sensor, more clean segments of at least 1 min could be extracted, as these signals were attenuated by the mattress topper and fewer artefacts were present in the signal. On the other hand, median values were significantly higher for the top sensor, indicating a better correspondence with the hospital’s PSG. This is because the recorded signal amplitude of the bottom sensor was lower, making it more difficult for the algorithm to detect heart beats in the BCG. In general, the MS coherence and correlation values of the Emfit compared to the PSG were modest. The Emfit sensor has a different measuring mechanism than the PSG thoracic belt or the PSG ECG. Therefore, different frequency components can be expected in the Emfit respiration signals compared to the PSG thoracic belt. Moreover, the quality of the Emfit signal is expected to be less consistent during the night due to the different body positions of the patient.
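For reference, the magnitude-squared (MS) coherence between an Emfit signal and its PSG counterpart is C_xy(f) = |P_xy(f)|² / (P_xx(f) P_yy(f)), where the auto- and cross-spectra are averaged over segments (Welch's method). The NumPy sketch below implements this with Hann-windowed, 50%-overlapping segments. The segment length is an assumption, not the setting used in this study; SciPy's `scipy.signal.coherence` provides an equivalent, more complete implementation.

```python
import numpy as np


def ms_coherence(x, y, fs, nperseg=256):
    """Magnitude-squared coherence via Welch averaging (Hann window, 50% overlap)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.hanning(nperseg)
    step = nperseg // 2
    pxx = pyy = 0.0
    pxy = 0.0 + 0.0j
    for s in range(0, min(len(x), len(y)) - nperseg + 1, step):
        xs = x[s:s + nperseg]
        ys = y[s:s + nperseg]
        X = np.fft.rfft(w * (xs - xs.mean()))  # windowed, mean-removed segment spectra
        Y = np.fft.rfft(w * (ys - ys.mean()))
        pxx = pxx + np.abs(X) ** 2
        pyy = pyy + np.abs(Y) ** 2
        pxy = pxy + np.conj(X) * Y
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    # Normalization constants of the spectra cancel in the ratio.
    return freqs, np.abs(pxy) ** 2 / (pxx * pyy)
```

Two signals that differ only by a scale factor give a coherence of 1 at every frequency, whereas independent noise gives values close to 0; a signal pair measured through different mechanisms, such as the Emfit and the thoracic belt, is expected to lie in between.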