RU2758550C1 - Method for diagnosing signs of bronchopulmonary diseases associated with covid-19 virus disease

Info

Publication number: RU2758550C1
Application number: RU2021106232A
Authority: RU
Inventors: Павел Романович Самсонов; Дмитрий Михайлович Михайлов; Вера Васильевна Чуманская
Original assignee: Общество с ограниченной ответственностью "Кардио Маркер"
Priority date: 2021-03-10
Filing date: 2021-03-10
Publication date: 2021-10-29
Also published as: WO2022191740A1

Abstract

FIELD: medicine.

SUBSTANCE: the invention relates to medicine and can be used in practical medicine for non-invasive diagnosis of diseases of the bronchopulmonary system. Three types of audio recordings are taken from the patient: cough, breathing and speech. A discrete integral transform of the recordings is performed, yielding a set of spectrograms of these recordings; the spectrograms are additionally segmented into separate fragments that overlap in time; signal preprocessing methods using ultra-precise linear layers are applied to the fragments to obtain a set of feature vectors, which are fed to the input of a convolutional neural network for classification, yielding a generated feature vector at the output; the feature vectors obtained from the three original recordings are combined; the combination is transformed using a linear layer; and a conclusion about the patient's health is formed from the results.

EFFECT: invention provides automation and simplification of the technological process for the diagnosis of COVID using deep machine learning methods.

5 cl, 10 dwg

Description

The invention relates to medicine and can be used in practical medicine for non-invasive diagnosis of diseases of the bronchopulmonary system.

The developed technical solution is a method for diagnosing acoustic signs caused by changes in the respiratory tract that accompany COVID-19. Deep learning methods are used to solve a regression problem: estimating, from recordings of cough, breathing and speech, the probability that a person is infected with a viral disease affecting the respiratory tract, in particular one caused by the COVID-19 virus.

A method is known (RU, patent 2304928, publ. 27.08.2007) for acoustic diagnosis of focal changes in human lungs. It includes recording and computing the spectrum of acoustic signals of the voice sound conducted to the chest surface at examination points located symmetrically on the right and left, and measuring and comparatively assessing their amplitudes. The spectrum of the recorded signal is computed in the frequency band from 80 to 2000 Hz on a logarithmic amplitude scale at each examination point. The amplitudes and frequencies of the first (A1, f1), second (A2, f2) and third (A3, f3) spectral maxima, located at harmonically related frequencies and having a level no lower than 60 dB relative to the level of the first maximum, are measured. The following quantities are then computed: the ratio A12/f21, equal to the ratio of (A1-A2) to (f2-f1); the ratio A23/f32, equal to the ratio of (A2-A3) to (f3-f2); and the difference ΔA12 of the values A1 and A2 over symmetric points on the right (D) and left (S). The obtained values are compared with the corresponding threshold values for the given type of disease. A pathological decrease in pneumatization at an examination point is recorded if at least one of the following conditions holds: A12/f21 is less than the first threshold value of this parameter, (A12/f21)_p1; A23/f32 is less than the first threshold value (A23/f32)_p1; f1 is greater than the threshold value (f1)_p; ΔA12 is less than the first threshold value (ΔA12)_p1 for examination points of the right lung; ΔA12 is greater than the second threshold value (ΔA12)_p2 for examination points of the left lung. A pathological increase in pneumatization at an examination point is recorded if A12/f21 is greater than the second threshold value (A12/f21)_p2 and/or A23/f32 is greater than the second threshold value (A23/f32)_p2. The first and second threshold values are computed as the 5th and 95th percentiles of the distribution of these parameters in a group of healthy subjects.

Also known (RU, patent 2354285, publ. 10.05.2009) is a method for diagnosing obstructive disorders of external respiration function by performing bronchophonography and recording the respiratory cycle. The following parameters are assessed: the acoustic equivalent of the work of the respiratory muscles (ARD) in different frequency ranges: ARD0, 200-1200 Hz; ARD1, 1200-12600 Hz; ARD2, 5000-12600 Hz; ARD3, 1200-5000 Hz. The coefficients K1, K2, K3 are computed as K1 = ARD1/ARD0 × 100, K2 = ARD2/ARD0 × 100, K3 = ARD3/ARD0 × 100; ΔK, the relative increase of a coefficient K, is computed as ΔK = (K at forced expiration - K at quiet breathing)/K at quiet breathing × 100; and the coefficient increase index is IPK = ΔK2/ΔK3. Obstructive disorders of external respiration function are diagnosed when, in quiet breathing mode, ARD1 and ARD3 exceed 100 mJ, K1 and K3 exceed 15, ΔK1 and ΔK3 are below 200%, and IPK is 2 or more.

Also known (RU, patent 2598051, publ. 20.09.2016) is a method for determining changes in human voice function in COPD. It includes measuring the parameters of the change in voice-forming function based on acoustic analysis using the Specta PLUS computer program, and is characterized in that the fundamental frequency, the maximum phonation time and the sections of voice noise are determined sequentially over time; an increase on the 10th day of treatment of the fundamental frequency to 142.6±15.2, of the maximum phonation time to 20.5±2.9, and of the sections of voice noise to "+" indicates an improvement in voice function.

The drawback of all the technical solutions listed above is that they are not applicable to diagnosing diseases caused by the COVID-19 virus.

The technical problem addressed by the developed method is to expand the range of tools for diagnosing diseases caused by the COVID-19 virus.

The technical result achieved by the developed method is the automation and simplification of the process of diagnosing COVID using deep machine learning methods.

To achieve this technical result, a method is proposed for diagnosing signs of bronchopulmonary diseases accompanying COVID-19 virus disease. In the method, three types of audio recordings are taken from the patient: cough, breathing and speech; a discrete integral transform of the recordings is performed, yielding a set of spectrograms of these recordings; the spectrograms are additionally segmented into separate fragments that overlap in time; signal preprocessing methods using ultra-precise linear layers are applied to the resulting spectrogram fragments to obtain a set of feature vectors, which are fed to the input of a convolutional neural network for classification, yielding a generated feature vector at the output; the feature vectors obtained from the three original recordings are combined; the combination is transformed using a linear layer; and a conclusion about the patient's health is formed from the results.

In some embodiments of the method, after the three types of audio recordings (cough, breathing, speech) are taken from the patient, spectral characteristics of the recordings are extracted and passed to the input of classical machine learning algorithms.

Spectrograms can be obtained using the windowed Fourier transform or wavelet transforms.

In some embodiments of the method, after the spectrogram fragments are preprocessed and the feature vectors are obtained, the vector is fed to the input of a recurrent neural network. Feature classification by the neural network is performed using an attention mechanism.

The developed technical solution is a method for diagnosing acoustic signs caused by changes in the respiratory tract that accompany COVID-19. Deep learning methods are used to solve a regression problem: estimating, from recordings of cough, breathing and speech, the probability that a person is infected with a viral disease affecting the respiratory tract. The method includes conversion, preparation, preprocessing and analysis of the data using deep learning methods. For disease classification, a recurrent network with a convolutional neural network as the encoder and an attention mechanism is proposed.

The presented technology is a server application that analyzes patients' medical acoustic data to detect and classify respiratory diseases, as well as complications and abnormalities caused by viruses, in particular COVID-19.

Coronavirus infection has been a real test for society. The work of doctors faced with a huge number of patients cannot be overestimated. However, the coronavirus outbreak exposed some problems in healthcare, in particular a shortage of medical workers. In the age of high technology, it is worth considering equipping hospitals with special software that can help a doctor diagnose a disease. Given the growing popularity of machine learning and deep learning methods, turning to this field for a solution is a natural step.

Several approaches to diagnosing respiratory and viral diseases exist today. Most of them are based on processing audio signals from the human body: cough, breathing, chest sounds. According to the results of research groups, simple binary classifiers based on logistic regression (logit model), gradient boosting and support vector machines give a precision of up to 82%. An approach using a random forest achieved a classification accuracy of 66.74% on test data. Some researchers are developing a classifier consisting of three branches and a mediator, by analogy with the independent opinions of several doctors.

In the proposed implementation, a positive or negative COVID-19 result is given only when the decisions of the three branches agree, which reduces the error probability to 6.147⋅10^-4. Convolutional networks and support vector machines were used for classification.

In addition to processing the sounds of the human body, chest X-ray and computed tomography images can also be used to diagnose COVID with deep learning methods.

The developed technical solution is a method for analyzing the acoustic data of a patient's cough, breathing and speech to detect and classify respiratory diseases or accompanying signs of a viral disease.

The disease diagnosis model is an ensemble of recurrent neural networks with an encoder, an attention mechanism and linear layers that follow it.
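The attention mechanism is not specified further in the text. As a minimal sketch, assuming simple dot-product attention with a hypothetical learned query vector (`query`), per-fragment feature vectors can be pooled into one vector per recording as follows; the function names and the pooling form are illustrative assumptions, not the patented implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(fragment_vectors, query):
    """Weight each fragment vector by softmax(dot(vector, query)) and sum.

    `query` stands in for a learned parameter (an assumption of this sketch).
    """
    scores = [sum(v * q for v, q in zip(vec, query)) for vec in fragment_vectors]
    weights = softmax(scores)
    dim = len(fragment_vectors[0])
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, fragment_vectors))
        for d in range(dim)
    ]
    return pooled, weights


# Three fragment vectors; a zero query gives every fragment equal weight.
fragments = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(fragments, query=[0.0, 0.0])
```

With a trained query, fragments that carry more diagnostic signal would receive larger weights than silent or noisy ones.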

The invention is a method for processing recordings received from users. Its architecture is an ensemble of neural networks consisting of three independent branches whose results are then refined by fully connected layers.
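The fusion of the three branches can be sketched as follows. This is an illustration under stated assumptions: each branch is taken to have already produced a fixed-length feature vector, a single linear layer is used, and a sigmoid maps its output to a score between 0 and 1. The sigmoid, the vector sizes and all names (`fuse`, `linear`, the weights) are hypothetical and not taken from the source.

```python
import math


def linear(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def fuse(cough_vec, breath_vec, speech_vec, weights, bias):
    """Concatenate the three branch vectors and map them to a single score."""
    combined = cough_vec + breath_vec + speech_vec  # list concatenation
    (logit,) = linear(combined, weights, bias)
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid is an assumption


# Untrained (all-zero) weights give a neutral score of 0.5.
score = fuse(
    [0.2, 0.1], [0.0, 0.3], [0.5, 0.4],
    weights=[[0.0] * 6],
    bias=[0.0],
)
```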

Fig. 1 shows the general architecture of the COVID diagnosis system used in implementing the method.

For analysis, three audio recordings are fed to the system input: speech, cough and breathing. Every recording goes through the same processing, and each recording is processed in parallel in a separate branch. The branches are identical; their scheme is shown in Fig. 1.

The processing of an audio recording includes the following steps:

• checking and converting the audio recording parameters;

• slicing and feature extraction for each individual window of the audio recording;

• obtaining a feature vector for the complete audio recording using an RNN (recurrent neural network).
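The slicing step can be illustrated with a short sketch that cuts a sequence into fixed-length windows overlapping in time. The window length and hop size below are arbitrary illustrative values; the source does not state the ones actually used. The same idea applies to the columns of a spectrogram.

```python
def frame_signal(samples, frame_len, hop):
    """Split `samples` into windows of `frame_len` that start every `hop` items.

    Windows overlap whenever hop < frame_len. A trailing remainder shorter
    than `frame_len` is dropped in this sketch.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


# Ten samples, windows of 4 with a hop of 2 (50% overlap).
frames = frame_signal(list(range(10)), frame_len=4, hop=2)
```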

Then the audio recording is checked and converted. Recordings from users arrive at the processing block, which checks the audio file against the system requirements for data format, sampling rate, bit rate and number of channels. If the parameters do not match, the data are converted to the parameters the system requires:

• conversion of the audio track into a numeric array;

• conversion from stereo to mono;

• resampling to a sampling rate of 44.1 kHz.

If conversion to the required parameters is impossible, the block raises an error indicating the invalid parameters of the audio file.
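The conversion steps above can be sketched in dependency-free Python. This is a simplified illustration: channels are averaged to obtain mono, and resampling uses plain linear interpolation without an anti-aliasing filter, which a production system would normally add. The function names and the error message are assumptions of this sketch.

```python
TARGET_RATE = 44_100  # Hz, the sampling rate named in the text


def to_mono(frames):
    """Average the channel values of each frame (one tuple per sample)."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample by linear interpolation (no anti-aliasing; sketch only)."""
    if src_rate <= 0 or not samples:
        raise ValueError("invalid audio parameters: empty audio or bad sample rate")
    if src_rate == dst_rate:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # position in the source signal
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


stereo = [(0.0, 1.0), (1.0, 1.0), (0.5, -0.5)]
mono = to_mono(stereo)

# One second of audio at 22.05 kHz becomes one second at 44.1 kHz.
resampled = resample_linear([0.0, 1.0] * 11_025, src_rate=22_050)
```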

Next comes the slicing and feature extraction stage. Feature extraction selects the most significant features in the audio files so that they can then be fed to a recurrent neural network to extract regularities and patterns. Feature extraction can be done in different ways, such as:

• integral transforms (windowed Fourier transform, wavelet transform, and others);

• extraction of i-vectors;

• hidden Markov models;

• others.

Next comes the stage of continuous integral transforms for time-signal analysis. There are various families of integral transforms of non-stationary time signals. The time signal is mapped to the frequency domain, where the dynamics of the process are easier to analyze and numerical characteristics are easier to extract. There are various kinds of time-frequency integral transforms that map a signal into the frequency domain. Besides the Fourier Transform (FT), signal analysis applications also use the Short-time Fourier Transform (STFT), Gabor Transform (GT), Wavelet Transform (WT), Wigner Distribution Function (WDF), etc.

STFT

By definition, the continuous windowed Fourier transform can be represented as an integral

where w(⋅) is a window function that makes it possible to select the time interval of interest and carry out additional processing within it. When a Gaussian function is chosen as the window function, the windowed Fourier transform (STFT) is called the Gabor transform (GT).
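The integral itself is not reproduced in this text. For reference, the standard textbook form of the continuous windowed Fourier transform is given below; the symbols x(t) for the signal, τ for the time shift and ω for the angular frequency are notation assumed here and may differ from the original formula.

```latex
\[
  F(\tau, \omega) \;=\; \int_{-\infty}^{\infty} x(t)\, w(t - \tau)\, e^{-i \omega t}\, dt
\]
% Gabor transform: the same integral with a Gaussian window, e.g.
\[
  w(t) \;=\; e^{-t^{2} / (2\sigma^{2})}, \qquad \sigma > 0 .
\]
```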

WT

The wavelet transform is a generalization of the STFT. In the general case, the integral wavelet transform (2) is written in the form

where the kernel of the transform is the wavelet function ψ(⋅), and the transform itself uses its complex conjugate. While the window function in the STFT depends on a single parameter τ, which defines the time shift, the wavelet in the CWT depends on two parameters a and b, which control the scale (compression or dilation of the transform kernel) and the shift (translation), respectively. For example, in medical applications the Morlet wavelet (also called the Gabor wavelet) is used as the kernel ψ(⋅); Fig. 2 shows the Morlet wavelet and its first derivative. It is a function of the form
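Neither formula is reproduced in this text. As a reference sketch, the continuous wavelet transform is commonly written as below, with a the scale and b the shift as described above; the normalization factor and the particular real-valued Morlet form (centre frequency 5) are common conventions assumed here, not necessarily the ones used in the original.

```latex
\[
  W(a, b) \;=\; \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty}
      x(t)\, \overline{\psi\!\left(\frac{t - b}{a}\right)}\, dt ,
  \qquad a \neq 0
\]
% One common real-valued Morlet wavelet (assumed form):
\[
  \psi(t) \;=\; e^{-t^{2}/2} \cos(5t)
\]
```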

In addition, a wavelet function must satisfy the following properties:

1. Finite energy

2. The admissibility condition

3. For complex wavelet functions, the Fourier transform must be real and vanish for negative frequencies.
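The formulas for the first two properties are not reproduced in this text. In their standard textbook form, with the Fourier transform of ψ denoted by \hat{\psi} (notation assumed here), they read:

```latex
% 1. Finite energy
\[
  E \;=\; \int_{-\infty}^{\infty} |\psi(t)|^{2}\, dt \;<\; \infty
\]
% 2. Admissibility condition
\[
  C_{\psi} \;=\; \int_{-\infty}^{\infty}
      \frac{|\hat{\psi}(\omega)|^{2}}{|\omega|}\, d\omega \;<\; \infty
\]
```

For an integrable wavelet, admissibility implies \hat{\psi}(0) = 0, that is, the wavelet has zero mean.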

Note that there are various ways of constructing wavelet systems, both orthogonal and non-orthogonal. For instance, infinitely differentiable splines or atomic functions can be used as an approximation basis for constructing various systems of wavelet functions. Examples of computing quantitative characteristics of time signals with such synthesized systems of wavelet functions have also been published. An example of the simplest atomic function, which coincides with the Fabius function on the interval [0; 2], is shown in Fig. 3, which depicts the Fabius function and its first derivative.

Note that Python libraries exist both for visualizing wavelet systems and for computing wavelet transforms.

Next comes the stage of discrete integral transforms for time-signal analysis. Because the input data are discrete, the finite number of samples must be taken into account, which leads to discrete analogues of the continuous integral transforms described above.

DWFT

The discrete version of the continuous windowed transform takes the form

where X(k) are the discrete frequency components of the time sequence x(n), n is the time index, k is the frequency index, N is the number of samples, and w(n) are the samples of the window function. The window function can be chosen in various ways. In practical applications the Hann window is used, which is defined as follows
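As an illustration, the discrete windowed transform of a single frame can be written directly from the symbols defined above: X(k) is the sum over n of x(n)·w(n)·exp(-2πi·k·n/N). The sketch below assumes the periodic form of the Hann window, w(n) = 0.5·(1 − cos(2πn/N)); the symmetric variant divides by N − 1 instead, and the source formula is not reproduced here to say which was intended. The direct O(N²) sum is used only for clarity; in practice an FFT routine would be used.

```python
import cmath
import math


def hann(N):
    """Periodic Hann window: w(n) = 0.5 * (1 - cos(2*pi*n / N))."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / N)) for n in range(N)]


def windowed_dft(x, w):
    """X(k) = sum_n x(n) * w(n) * exp(-2j*pi*k*n / N) for k = 0..N-1."""
    N = len(x)
    return [
        sum(x[n] * w[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


# A cosine with exactly 3 cycles per frame peaks in frequency bin 3.
N = 16
tone = [math.cos(2.0 * math.pi * 3 * n / N) for n in range(N)]
spectrum = windowed_dft(tone, hann(N))
magnitudes = [abs(c) for c in spectrum[: N // 2 + 1]]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
```

Stacking the magnitude spectra of successive overlapping frames column by column gives the spectrogram that later stages consume.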

извлечения i-vectors.extracting i-vectors.

The i-vectors (identity vectors) method is a method for extracting and using auxiliary features. The i-vectors class of methods is currently a relatively new way of solving recognition problems for objects of various kinds. The method originally arose to solve the speech recognition problem. Its idea is based on representing models by Gaussian mixture expressions

The representation of this expression is also used as a feature vector in the language classifier.

Applying the windowed Fourier transform

As an example, consider a feature extraction scheme based on the discrete windowed Fourier transform. The standard scheme for applying the DWFT is as follows. A region of interest is selected from the full data signal for analysis (Fig. 4).

The part of the signal that falls within the region of interest is multiplied by a window function, i.e. it is "weighted" (Figs. 5-6).
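The weighting step and the subsequent discrete transform can be illustrated with a minimal sketch. It assumes the standard textbook form X(k) = sum over n of x(n) w(n) exp(-2πi·k·n/N) and the periodic Hann window w(n) = 0.5 - 0.5·cos(2πn/N); the patent's own formulas are not reproduced in this text, so both are assumptions, as are the function names and the test signal.

```python
import cmath
import math


def hann(N):
    """Periodic Hann window (assumed form): w(n) = 0.5 - 0.5*cos(2*pi*n/N)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]


def windowed_dft(frame, window):
    """Weight the frame by the window, then take a direct O(N^2) DFT."""
    N = len(frame)
    weighted = [x * w for x, w in zip(frame, window)]
    return [
        sum(weighted[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


# Hypothetical region of interest: 32 samples holding exactly 4 cycles of a sine.
N = 32
frame = [math.sin(2 * math.pi * 4 * n / N) for n in range(N)]

spectrum = windowed_dft(frame, hann(N))
magnitudes = [abs(X) for X in spectrum[: N // 2]]
peak_bin = max(range(N // 2), key=lambda k: magnitudes[k])
```

For this test signal the magnitude peak falls in bin 4, with smaller contributions in the neighbouring bins 3 and 5 caused by the window. Stacking such spectra over successive window positions gives a spectrogram.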

The sum of the shifts of the Hann window function forms a partition of unity (Figs. 7-8). Wavelets and atomic functions, whose sums of shifts also satisfy the partition of unity, can likewise be used as window functions.
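The partition-of-unity property can be checked numerically. The sketch below assumes the periodic Hann window and a shift of half the window length; neither choice is stated in the text, and the helper names are illustrative.

```python
import math


def hann(N):
    """Periodic Hann window (assumed form): w(n) = 0.5 - 0.5*cos(2*pi*n/N)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]


def overlap_sum(window, hop, total):
    """Sum copies of the window shifted by `hop` samples over `total` samples."""
    acc = [0.0] * total
    start = 0
    while start + len(window) <= total:
        for n, w in enumerate(window):
            acc[start + n] += w
        start += hop
    return acc


N = 16
acc = overlap_sum(hann(N), hop=N // 2, total=8 * N)

# Ignore the first and last half-window, where only one shifted copy contributes.
interior = acc[N // 2 : len(acc) - N // 2]
max_dev = max(abs(v - 1.0) for v in interior)
```

Away from the edges, every sample is covered by exactly two shifted windows whose values add up to 1, so `max_dev` stays at the level of floating-point rounding.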


In the present invention, this approach yields a spectrogram, which is then split into fragments 1 second long with a step of 0.5 seconds; the fragments are fed to the input of the CNN encoders.
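The overlapping segmentation can be sketched as follows. The 1-second length and 0.5-second step come from the text; the spectrogram frame rate (`frames_per_second`), the data layout as a list of time frames, and the function name are assumptions made for illustration.

```python
def segment_spectrogram(spec, frames_per_second, length_s=1.0, step_s=0.5):
    """Split a spectrogram (list of time frames) into overlapping fragments.

    Fragments are `length_s` seconds long and start every `step_s` seconds;
    a trailing remainder shorter than one fragment is dropped.
    """
    length = int(round(length_s * frames_per_second))
    step = int(round(step_s * frames_per_second))
    return [spec[s : s + length] for s in range(0, len(spec) - length + 1, step)]


# Hypothetical 3-second spectrogram at 100 frames per second with 4 frequency bins.
# Each frame stores its own time index so the fragment boundaries are easy to see.
spec = [[float(t)] * 4 for t in range(300)]
fragments = segment_spectrogram(spec, frames_per_second=100)
```

With these numbers the fragments start at frames 0, 50, 100, 150 and 200, so consecutive fragments share half of their frames.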

Here, the CNN encoders extract representative (significant) features and reduce the dimensionality of the data passed to the LSTM layers. Each encoder consists of four blocks, each comprising a convolution with a 3×3 kernel, an activation layer with the Leaky ReLU function, dropout with a neuron drop probability of 0.7 to prevent overfitting, and batch normalization. The encoders process the spectrogram windows received as input, and the extracted features are fed to the input of the LSTM layers (Fig. 9).

The recurrent neural network with LSTM follows the many-to-many scheme. After passing through the feature extractors, each fragment of the audio recording is fed to a separate LSTM layer whose internal gates have dimension 512.

The output of each layer of the recurrent network is then passed to the attention block.

Attention

The output of each LSTM layer, a vector of dimension 512, passes through a linear layer with the hyperbolic tangent as its activation function.

The vectors obtained after the linear layer are combined by dot product with a weight vector whose weights are adjusted by gradient descent during model training, and the resulting features are passed to softmax for normalization.

The normalized values are multiplied by the original features obtained from the LSTM layers, and the resulting values are combined in a weighted sum with the outputs of all the other layers. The architecture of the proposed deep learning algorithm is shown in Fig. 10. The vectors of weighted sums for all three audio recordings are fed to a concatenation block followed by a linear transformation, whose output is the probability that the patient is infected with COVID-19.
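The attention steps for a single recording (linear layer with tanh, dot product with a trained weight vector, softmax, weighted sum of the LSTM outputs) can be sketched in plain Python. The parameters `W`, `b` and `v` stand in for trained weights and are hypothetical, as is the toy dimension of 2 used here in place of 512.

```python
import math


def attention_pool(hidden, W, b, v):
    """Additive attention pooling over a sequence of LSTM output vectors.

    hidden: list of T vectors; W, b: linear layer; v: scoring weight vector.
    Returns the softmax weights and the weighted sum of the hidden vectors.
    """
    scores = []
    for h in hidden:
        # Linear layer followed by the hyperbolic tangent activation.
        u = [
            math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
            for row, b_i in zip(W, b)
        ]
        # Dot product with the scoring weight vector.
        scores.append(sum(v_i * u_i for v_i, u_i in zip(v, u)))

    # Numerically stable softmax over the T scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]

    # Weighted sum of the original LSTM outputs.
    dim = len(hidden[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden)) for d in range(dim)]
    return alphas, context


# Toy example: 3 fragments with 2-dimensional LSTM outputs.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
v = [1.0, 1.0]
alphas, context = attention_pool(hidden, W, b, v)
```

In the described architecture one such pooled vector would be produced for each of the three recordings (cough, breathing, speech) before concatenation and the final linear transformation.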

The model is trained with the Adam optimization algorithm, and the learning rate is reduced by a factor of 10 every 100 steps.
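The learning-rate schedule can be written as a simple step decay. The factor of 10 and the 100-step interval come from the text; the initial learning rate of 1e-3 and the function name are assumptions.

```python
def step_decay(initial_lr, step, drop_every=100, factor=0.1):
    """Learning rate after `step` training steps under step decay."""
    return initial_lr * factor ** (step // drop_every)


initial_lr = 1e-3  # hypothetical starting value, not given in the source
schedule = [step_decay(initial_lr, s) for s in (0, 99, 100, 250)]
```

The rate stays constant within each block of 100 steps and drops tenfold at steps 100, 200 and so on.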

The described method can be applied with any device that has a microphone and can record from it (including, but not limited to, a voice recorder, push-button mobile phone, smartphone, smart watch, terminal or smart speaker). Specialized software adapted to the device guides the user through the sequence of steps needed to prepare and record the audio files. The recorded data are transmitted over any data channel to a server on which the file processing system is deployed. The system on the server processes the audio files according to the method described above and sends the result to the user, or to another recipient defined in the system settings (either a person or another system), using adaptable formats and any available communication channel.

Claims

1. A method for diagnosing signs of bronchopulmonary diseases in COVID-19, characterized in that three types of audio recordings are registered from a patient: cough, breathing and speech; a discrete integral transformation of the audio recordings is performed, yielding a set of spectrograms of these recordings; the spectrograms are additionally segmented into separate fragments that overlap in time; signal preprocessing methods using ultra-precise linear layers are applied to the obtained spectrogram fragments to obtain a set of feature vectors, which are fed to the input of a convolutional neural network for classification, producing a generated feature vector at the output; the feature vectors obtained from the three original audio recordings are combined; the combination of the obtained vectors is transformed using a linear layer; and, based on the results obtained, a conclusion about the patient's health is formed.

2. The method according to claim 1, characterized in that after the three types of audio recordings (cough, breathing, speech) are registered from the patient, the spectral characteristics of the audio recordings are extracted and passed to the input of classical machine learning algorithms.

3. The method according to claim 1, characterized in that a windowed Fourier transform or a wavelet transform is used to obtain the spectrograms.

4. The method according to claim 1, characterized in that after preprocessing the spectrogram fragments and obtaining the feature vectors, the vector is fed to the input of the recurrent neural network.

5. The method according to claim 1, characterized in that the classification of the features by the neural network is carried out using the attention mechanism.