
CN116434739A - Device and related components for constructing a classification model for identifying different stages of heart failure - Google Patents

Device and related components for constructing a classification model for identifying different stages of heart failure

Info

Publication number
CN116434739A
CN116434739A (application number CN202310205344.5A)
Authority
CN
China
Prior art keywords
classification model
heart failure
voice
identifying
stage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310205344.5A
Other languages
Chinese (zh)
Inventor
武晓静
燕楠
周骐
姚圣森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University
Priority to CN202310205344.5A
Publication of CN116434739A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G10L 15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063: Training
    • G10L 2015/0635: Training updating or merging of old and new templates; Mean values; Weighting
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a device, and related components, for constructing a classification model that identifies different stages of heart failure. The device includes a sample processing unit, which converts a collected analog voice signal into a digital voice signal, preprocesses the digital voice signal, and extracts features from the preprocessed signal to obtain multi-class voice feature samples; and a model training unit, which constructs a classification model for identifying heart failure stages and trains and optimizes the model with the multi-class voice feature samples to obtain an optimal classification model. With the optimal classification model, the different stages of heart failure can be identified relatively accurately.

Description

Device and related components for constructing a classification model for identifying different stages of heart failure

Technical Field

The present invention relates to the field of computer technology, and in particular to a device for constructing, from voice features, a classification model that identifies different stages of heart failure, a computer-readable storage medium, and a computer device.

Background Art

Heart failure is a group of complex clinical syndromes caused by abnormal changes in cardiac structure and/or function arising from various causes, and it is the severe and terminal stage of many common cardiovascular diseases. According to the course of its development, heart failure is currently divided into four stages: the at-risk stage (stage A), the pre-heart-failure stage (stage B), the heart failure stage (stage C), and the end-stage heart failure stage (stage D). The purpose of staging heart failure is early detection, early diagnosis, and early intervention, in particular the timely identification and treatment of patients at risk of heart failure or in the pre-heart-failure stage. Early intervention is of great significance for delaying ventricular remodeling and the progression of heart failure, protecting cardiac function, improving quality of life, and reducing re-hospitalization rates.

In the prior art, identification of heart failure stages usually relies on medical history, physical examination, laboratory tests, cardiac imaging, and functional tests. These examinations are often performed only when patients seek care because of symptoms, which is unfavorable for early identification. Hemodynamic or lung-water monitoring through implanted devices developed in recent years, such as the CardioMEMS, MultiSENSE, and ReDS sensor devices, as well as evaluation with the HeartLogic multi-sensor index and alert algorithm, can provide early warning of decompensation events in heart failure patients. However, such equipment is expensive and invasive, requires an implanted sensor or an already implanted pacemaker, is suitable only for a small proportion of patients with severe or refractory heart failure, and is not suitable for screening patients in the at-risk or pre-heart-failure stages. On the basis of standardized diagnosis and treatment and patient education, developing non-invasive, convenient, and widely applicable monitoring and early-warning methods, identifying patients at different heart failure stages, and strengthening home monitoring and early warning are the keys to reducing re-hospitalization and mortality in the management of chronic heart failure. The prior art also includes classification models trained on certain parameters to recognize certain diseases, but the data used by these models likewise depend on many clinical examinations, and their recognition accuracy is not high.

Summary of the Invention

The embodiments of the present invention aim to provide a device for constructing, from voice features, a classification model that can accurately identify different stages of heart failure, a computer-readable storage medium, and a computer device.

In a first aspect, an embodiment of the present invention provides a device for constructing a classification model for identifying different stages of heart failure based on voice features, comprising:

a sample processing unit, configured to convert a collected analog voice signal into a digital voice signal, preprocess the digital voice signal, and extract features from the preprocessed digital voice signal to obtain multi-class voice feature samples;

a model training unit, configured to construct a classification model for identifying heart failure stages and to train and optimize the classification model with the multi-class voice feature samples to obtain an optimal classification model, wherein the classification model for distinguishing heart failure stage A from stage B is an AdaBoost classification model on the original variables; the classification model for distinguishing stage B from stage C is an AdaBoost classification model with Lasso dimensionality reduction; and the classification model for distinguishing stages A and B from stage C is an AdaBoost classification model with Lasso dimensionality reduction.

In a second aspect, an embodiment of the present invention further provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following method: converting a collected analog voice signal into a digital voice signal, preprocessing the digital voice signal, and extracting features from the preprocessed digital voice signal to obtain multi-class voice feature samples; constructing a classification model for identifying heart failure stages, and training and optimizing the classification model with the multi-class voice feature samples to obtain an optimal classification model, wherein the classification model for distinguishing heart failure stage A from stage B is an AdaBoost classification model on the original variables; the classification model for distinguishing stage B from stage C is an AdaBoost classification model with Lasso dimensionality reduction; and the classification model for distinguishing stages A and B from stage C is an AdaBoost classification model with Lasso dimensionality reduction.

In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the following method: converting a collected analog voice signal into a digital voice signal, preprocessing the digital voice signal, and extracting features from the preprocessed digital voice signal to obtain multi-class voice feature samples; constructing a classification model for identifying heart failure stages, and training and optimizing the classification model with the multi-class voice feature samples to obtain an optimal classification model, wherein the classification model for distinguishing heart failure stage A from stage B is an AdaBoost classification model on the original variables; the classification model for distinguishing stage B from stage C is an AdaBoost classification model with Lasso dimensionality reduction; and the classification model for distinguishing stages A and B from stage C is an AdaBoost classification model with Lasso dimensionality reduction.

Based on voice features that can reflect the different stages, the embodiments of the present invention construct, for each staging task, a classification model that identifies the corresponding stages, which improves the recognition accuracy of the models.

Brief Description of the Drawings

To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic structural diagram of a device for constructing a classification model for identifying different stages of heart failure based on voice features according to an embodiment of the present invention;

FIG. 2 shows the effect of different heart failure stages on Jitter according to an embodiment of the present invention;

FIG. 3 shows the effect of different heart failure stages on Shimmer according to an embodiment of the present invention;

FIG. 4 shows the effect of different heart failure stages on Harmonic difference according to an embodiment of the present invention;

FIG. 5 shows the effect of different heart failure stages on HNR according to an embodiment of the present invention;

FIG. 6 shows the effect of different heart failure stages on Alpha Ratio according to an embodiment of the present invention;

FIG. 7 shows the effect of different heart failure stages on voiced/unvoiced duration according to an embodiment of the present invention;

FIG. 8 shows the effect of different heart failure stages on Loudness according to an embodiment of the present invention;

FIG. 9 shows the effect of different heart failure stages on the Hammarberg Index according to an embodiment of the present invention;

FIG. 10 shows the effect of different heart failure stages on Spectral Slope according to an embodiment of the present invention;

FIG. 11 shows the cross-correlation coefficients of glottal cycles for different heart failure stages according to an embodiment of the present invention;

FIG. 12 shows a nonlinear analysis of voice features for different heart failure stages according to an embodiment of the present invention;

FIG. 13 shows changes in cepstrum-based acoustic voice features for different heart failure stages according to an embodiment of the present invention;

FIG. 14 is a sample-level ROC curve of the optimal model (AdaBoost on the original variables) according to an embodiment of the present invention;

FIG. 15 is a sample-level ROC curve of the optimal model (AdaBoost with Lasso dimensionality reduction) according to an embodiment of the present invention;

FIG. 16 is a sample-level ROC curve of the optimal model (AdaBoost with Lasso dimensionality reduction) according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

It should be understood that, when used in this specification and the appended claims, the terms "include" and "comprise" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof.

It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the present invention. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

Referring now to FIG. 1, which is a schematic structural diagram of a device for constructing a classification model for identifying different stages of heart failure based on voice features according to an embodiment of the present invention. The device includes a sample processing unit 101 and a model training unit 201.

The sample processing unit 101 is configured to convert a collected analog voice signal into a digital voice signal, preprocess the digital voice signal, and extract features from the preprocessed digital voice signal to obtain multi-class voice feature samples.

In this unit, sound is an analog signal and must be converted into a digital signal before it can be processed by a computer. The sample processing unit therefore includes a first conversion unit for sampling, a second conversion unit for quantization, and a third conversion unit for encoding.

The first conversion unit is specifically configured to convert the time-continuous analog voice signal into time-discrete, amplitude-continuous signal samples at a predetermined sampling period.

The sampling period is the time interval between two adjacent sampling points, and the sampling frequency is its reciprocal; a sampling frequency of 8 kHz, for example, means that 8000 samples are collected per second. The higher the sampling frequency, the more faithfully the sound is restored and the more realistic it is.

The second conversion unit is specifically configured to convert each amplitude-continuous signal sample (an analog quantity) into a discrete value (a digital quantity) represented in binary, yielding digital data.

The quantized signal samples are usually represented in binary. The sampling precision, i.e. the number of bits per sample, reflects the quality of the sound: common CDs use a 16-bit sample depth, which can represent 65536 (2^16) different values, DVDs use a 24-bit sample depth, and most telephone equipment uses an 8-bit sample depth.

The third conversion unit is specifically configured to convert the digital data into a binary bit stream, yielding the digital voice signal.

Encoding converts the sampled and quantized digital data into a binary bit stream for storage, processing and transmission by a computer; pulse-code modulation achieves the highest fidelity. The data rate of the sound can be calculated from the sampling frequency and precision as data rate (bps) = sampling frequency x precision x number of channels, and the amount of data in the sound signal as data size (bytes) = data rate x duration / 8.
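As a quick illustration of these two relations, the following minimal Python sketch (the language also used later in this embodiment for feature extraction) evaluates them with example values that are not taken from the patent:

```python
def data_rate_bps(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Data rate (bit/s) = sampling frequency * precision (bits) * number of channels."""
    return sample_rate_hz * bit_depth * channels

def data_size_bytes(rate_bps: int, duration_s: float) -> float:
    """Data size (bytes) = data rate * duration / 8."""
    return rate_bps * duration_s / 8

rate = data_rate_bps(16000, 16, 1)        # e.g. 16 kHz, 16-bit, mono speech
print(rate, "bit/s;", data_size_bytes(rate, 10.0), "bytes for 10 s of audio")
```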

In this embodiment, in order to improve the quality of the digital voice signal and retain as much voice information as possible, the digital voice signal is preprocessed before analysis; that is, the sample processing unit further includes a synchronization unit, an endpoint detection unit, an emphasis unit, and a framing and windowing unit.

The synchronization unit is configured to synchronize multiple digital voice signals to a uniform sampling rate by downsampling.

Because different recording devices differ in sampling frequency, recording duration and placement, the recorded amplitudes may differ, so the original speech is downsampled before signal analysis. For example, if the sampling frequency of the original signal is 22050 Hz, then, to reduce computational complexity and improve processing efficiency without losing the main speech components, the signal can be downsampled to 16000 Hz in this embodiment, in accordance with the Nyquist sampling theorem.
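A minimal downsampling sketch is shown below; the patent does not prescribe a particular library, so scipy is assumed here, and the 220 Hz sine merely stands in for a recorded voice signal. The 22050 Hz to 16000 Hz conversion corresponds to the rational factor 320/441:

```python
import numpy as np
from scipy.signal import resample_poly

orig_sr, target_sr = 22050, 16000
t = np.arange(0, 1.0, 1.0 / orig_sr)
x = np.sin(2 * np.pi * 220.0 * t)                  # stand-in for a recorded voice signal

g = np.gcd(target_sr, orig_sr)                     # 50, so 16000/22050 = 320/441
y = resample_poly(x, up=target_sr // g, down=orig_sr // g)
print(len(x), "->", len(y))                        # 22050 -> 16000 samples per second
```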

The endpoint detection unit is configured to perform endpoint detection on the digital voice signals after they have been brought to the uniform sampling rate, distinguishing speech regions from non-speech regions.

Endpoint detection, also called voice activity detection, aims to separate speech regions from non-speech regions, i.e. to accurately locate the start and end points of speech in a noisy recording, remove the silent and noisy parts, and keep the genuinely useful speech content. A common method is the double-threshold approach based on short-time energy and short-time zero-crossing rate: since voiced sounds carry more energy than unvoiced sounds, and unvoiced sounds have a higher zero-crossing rate than silence, the short-time energy is first used to detect the voiced parts and the zero-crossing rate is then used to extract the unvoiced parts, completing the endpoint detection.
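A simplified sketch of such a double-threshold detector is given below; the frame sizes follow this embodiment (25 ms frames, 10 ms shift), while the two thresholds are illustrative assumptions, since the text names the method but not concrete values:

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (trailing samples are dropped)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])

def double_threshold_vad(x, sr, frame_ms=25, hop_ms=10,
                         energy_ratio=0.1, zcr_thresh=0.3):
    """Mark frames as speech using short-time energy (voiced) and zero-crossing rate (unvoiced)."""
    frames = frame_signal(x, int(sr * frame_ms / 1000), int(sr * hop_ms / 1000))
    energy = np.sum(frames ** 2, axis=1)                               # short-time energy
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)  # zero-crossing rate
    voiced = energy > energy_ratio * energy.max()    # high-energy frames -> voiced speech
    unvoiced = (~voiced) & (zcr > zcr_thresh)        # high-ZCR frames -> unvoiced speech
    return voiced | unvoiced                         # True where speech is assumed
```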

The emphasis unit is configured to emphasize the high-frequency part of the speech regions, increasing the high-frequency resolution of the speech.

Because the average power spectrum of a speech signal is shaped by glottal excitation and oral/nasal radiation, the power spectrum decreases as frequency increases and the energy of speech is concentrated in the low-frequency part. Pre-emphasis is therefore applied to boost the high-frequency part of the speech, remove the effect of lip radiation, and increase the high-frequency resolution, so that the spectrum can be computed with the same signal-to-noise ratio over the whole band from low to high frequencies, which facilitates spectral analysis and vocal-tract parameter analysis. Pre-emphasis is generally implemented with a high-pass digital filter with transfer function H(z) = 1 - a·z^(-1), where a is the pre-emphasis coefficient in the range 0.9 < a < 1.0, typically a = 0.97.
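The filter H(z) = 1 - a·z^(-1) corresponds to the difference equation y[n] = x[n] - a·x[n-1], which can be written in a few lines (a = 0.97 as stated above):

```python
import numpy as np

def pre_emphasis(x: np.ndarray, a: float = 0.97) -> np.ndarray:
    """Apply y[n] = x[n] - a * x[n-1], i.e. the high-pass filter H(z) = 1 - a*z^(-1)."""
    return np.append(x[0], x[1:] - a * x[:-1])
```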

The framing and windowing unit is configured to frame and window the emphasized speech regions to obtain multiple speech signal segments.

A speech signal is a non-stationary, continuous analog signal with time-varying characteristics, but over a short interval (generally 10-30 ms) its characteristics remain essentially unchanged, i.e. relatively stable. Speech therefore has short-time stationarity, which means that any analysis and processing of a speech signal must be "short-time analysis": the signal is divided into segments, each called a "frame", whose characteristic parameters are analyzed; the frame length is generally 10-30 ms. To make the transition between frames smooth and continuous, overlapping segmentation is used. The overlap between the previous frame and the next frame is called the frame shift, and the ratio of frame shift to frame length is generally 0-0.5. In this embodiment the frame length is 25 ms and the frame shift is 10 ms.

Because of the short-time stationarity of speech, the signal is not only framed but also windowed, in order to emphasize the speech waveform near the sampled segment and attenuate the remainder of the waveform. Commonly used window functions include the rectangular window and the Hamming window: the rectangular window has higher spectral resolution, but interference between adjacent harmonics is severe and high-frequency components are lost, so waveform detail is lost, whereas the Hamming window behaves in the opposite way.
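A minimal framing-and-windowing sketch with the 25 ms frame length and 10 ms frame shift of this embodiment, using a Hamming window, might look as follows (NumPy is an assumption; trailing samples that do not fill a frame are simply dropped):

```python
import numpy as np

def frame_and_window(x: np.ndarray, sr: int, frame_ms: float = 25, hop_ms: float = 10) -> np.ndarray:
    """Return an array of shape (n_frames, frame_len) of Hamming-windowed frames."""
    frame_len, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    window = np.hamming(frame_len)
    return np.stack([x[i * hop: i * hop + frame_len] * window for i in range(n_frames)])
```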

In this embodiment, the sample processing unit further includes an extraction unit and a merging unit.

The extraction unit is configured to extract multi-dimensional first voice feature samples with the openSMILE open-source toolkit and to extract multi-dimensional second voice feature samples with Python.

The merging unit is configured to merge the multi-dimensional first voice feature samples and the multi-dimensional second voice feature samples to obtain the multi-class voice feature samples.

It should be noted that the first and second voice feature samples can be chosen according to actual needs. The multi-class voice feature samples extracted in this embodiment total 100 dimensions. The first voice feature samples use the eGeMAPS feature set, an extension of GeMAPS consisting of 88 hand-crafted features extracted by the openSMILE open-source toolkit. It contains 18 low-level descriptors (LLDs) and, on top of GeMAPS, adds 5 spectral features (MFCC 1-4 and spectral flux) and 2 frequency-related features (the bandwidths of the second and third formants), covering frequency-related, energy/amplitude-related and spectral features. In addition, 12-dimensional second voice feature samples are extracted with Python.
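For the 88-dimensional eGeMAPS part, the following sketch uses the opensmile-python wrapper around the openSMILE toolkit; the wrapper and the file name "recording.wav" are assumptions for illustration, and the 12 additional nonlinear features would be computed separately and concatenated:

```python
import numpy as np
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,        # the 88 eGeMAPS functionals
    feature_level=opensmile.FeatureLevel.Functionals,
)
egemaps = smile.process_file("recording.wav").to_numpy().ravel()   # shape (88,)
# The 12 additional nonlinear features (GNE, VFER, RPDE, DFA, SampEn, MSEn, ...)
# would be computed separately and concatenated, e.g. np.concatenate([egemaps, extra_12]).
print(egemaps.shape)
```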

In this embodiment, the first voice feature samples include: pitch, jitter (frequency perturbation), formant, shimmer (amplitude perturbation), loudness, harmonics-to-noise ratio, harmonic difference, alpha ratio, Hammarberg index, spectral slope, Mel-frequency cepstral coefficient, spectral flux, rate-of-loudness-peaks, voiced/unvoiced segment, and equivalent sound level features. The details of the first voice feature samples are shown in Table 1.

Table 1

(The detailed feature table is reproduced as an image in the original publication; its contents are described below.)

In Table 1, the features are defined as follows:
• Pitch: the fundamental frequency of vocal-fold vibration, i.e. the number of vibrations per second; computed as log F0 on a semitone frequency scale starting from 27.5 Hz.
• Jitter (frequency perturbation): reflects the cycle-to-cycle variation in frequency; described as the deviation within individual consecutive pitch periods.
• Formants: the center frequencies and bandwidths of the first, second and third formants, and the energy ratios of the first three formants to the fundamental.
• Shimmer (amplitude perturbation): reflects the cycle-to-cycle variation in amplitude.
• Loudness: the perceived magnitude of the sound.
• Harmonics-to-noise ratio: the proportion of periodically repeating harmonic components in the waveform of a sustained vowel.
• Harmonic difference: the energy ratio of the first pitch harmonic H1 to the second pitch harmonic H2, or of H1 to the harmonic at the third formant.
• Alpha ratio: the summed energy in 50-1000 Hz divided by the summed energy in 1-5 kHz.
• Hammarberg index: the strongest energy peak in 0-2 kHz divided by the strongest energy peak in 2-5 kHz.
• Spectral slope: the linear-regression slope of the logarithmic power spectrum over the 0-500 Hz and 500-1500 Hz ranges.
• Mel-frequency cepstral coefficients: MFCC 1-4.
• Spectral flux: the spectral difference between two adjacent frames.
• Rate of loudness peaks: the number of loudness peaks per second.
• Voiced and unvoiced segments: the durations of continuously voiced (F0 > 0) and unvoiced (F0 = 0) regions.
• Equivalent sound level: the energy-averaged A-weighted sound level over a given period of time.

In this embodiment, the second voice feature samples include: glottal-to-noise excitation ratio, vocal fold excitation ratio, recurrence period density entropy, detrended fluctuation analysis, sample entropy, and multiscale entropy features. The details of the second voice feature samples are shown in Table 2.

Table 2

(The detailed feature table is reproduced as an image in the original publication; its contents are described below.)

In Table 2, the glottal-to-noise excitation ratio (GNE) and the vocal fold excitation ratio (VFER) both quantify the energy proportion, across frequency bands, of the normal speech signal excited simultaneously by glottal pulses relative to the noise signal excited irregularly by turbulent noise (usually caused by incomplete glottal closure). To compute the GNE parameters for an original speech signal sampled at 44.1 kHz, the signal is first downsampled to 10 kHz; inverse filtering is then used to find the opening and closing instants of each glottal cycle and thus the corresponding time series; for each time series, filters with a 500 Hz bandwidth extract the bands 0-500 Hz, 500-1000 Hz, 1000-1500 Hz and so on up to 11.5 kHz; the lowest five bands (up to 2.5 kHz) are treated as signal and the remaining high bands (2.5-11.5 kHz) as noise, and the SEO and TKEO energy values of signal and noise are computed; finally the signal-to-noise ratio (SNR) and the noise-to-signal ratio (NSR) are derived from these energies. The VFER parameters are computed in roughly the same way, except that the downsampling step is omitted and the DYPSA algorithm, rather than inverse filtering, is used in the second step to obtain the glottal opening and closing sequence. In other words, GNE and VFER first detect glottal pulses within a given time window using inverse filtering (for GNE) or DYPSA (for VFER); the original sound is then split into two parts, noise above 2.5 kHz and the energy signal below 2.5 kHz; and, combining the concepts of SEO and TKEO, the energy values of the different bands are computed to obtain the signal-to-noise ratio measured by the empirical mode decomposition excitation ratio (EMD-ER). After the value for each time window is computed, the mean and standard deviation of GNE and VFER are taken, giving the parameters GNE_SEO_SNR, GNE_TKEO_SNR, GNE_mean, GNE_std, VFER_SEO_SNR, VFER_TKEO_SNR, VFER_mean and VFER_std.
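The following heavily simplified sketch illustrates only the band-energy-ratio idea behind GNE/VFER: the glottal-pulse detection by inverse filtering or DYPSA is omitted, the absolute TKEO is used as the band energy, and the band edges follow the description above. It is an assumption-laden approximation, not the exact procedure of the embodiment:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def teager_kaiser(x):
    """TKEO of a discrete signal: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def band_energy_snr(x, sr, split_hz=2500.0, top_hz=11500.0, width_hz=500.0):
    """SNR-like ratio (dB) of low-band 'signal' energy to high-band 'noise' energy."""
    signal_e, noise_e, lo = 0.0, 0.0, 0.0
    while lo + width_hz <= top_hz and lo + width_hz < sr / 2:
        sos = butter(4, [max(lo, 1.0), lo + width_hz], btype="band", fs=sr, output="sos")
        band = sosfiltfilt(sos, x)
        e = np.sum(np.abs(teager_kaiser(band)))      # TKEO energy of this 500 Hz band
        if lo + width_hz <= split_hz:
            signal_e += e                            # bands below 2.5 kHz: "signal"
        else:
            noise_e += e                             # bands above 2.5 kHz: "noise"
        lo += width_hz
    return 10 * np.log10(signal_e / noise_e)
```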

The recurrence period density entropy (RPDE) feature comes from a method used in dynamical systems, stochastic processes and time-series analysis to determine the periodicity or repetitiveness of a signal. Its value lies between 0 and 1: it is 0 for a quasi-periodic signal and close to 1 for uniform white noise. It is computed as follows:

In the first step, the time series Xn = [x_n, x_{n+τ}, x_{n+2τ}, …, x_{n+(M-1)τ}] is projected into a phase space according to Takens' embedding theorem, where M is the embedding dimension and τ is the embedding delay; both parameters are obtained by a parameter-search algorithm. In the second step, an M-dimensional ball of a given radius is drawn around each point Xn of the phase space; the time between each arrival of the trajectory in this ball and its subsequent departure is recorded, the recorded recurrence times are collected into a histogram, and the histogram is normalized to give the recurrence period density function P(T).

The recurrence period density entropy is then obtained as the normalized entropy of P(T):

H_norm = - ( Σ_{t=1}^{Tmax} P(t) · ln P(t) ) / ln(Tmax)

where Tmax is the maximum recurrence period observed in the embedded phase space.
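Assuming the recurrence times have already been obtained from the phase-space embedding, the final normalization step can be sketched as follows:

```python
import numpy as np

def rpde_from_recurrence_times(times, t_max=None):
    """Normalized recurrence period density entropy from a list of integer recurrence times."""
    times = np.asarray(times, dtype=int)
    t_max = int(t_max or times.max())
    p = np.bincount(times, minlength=t_max + 1)[1:].astype(float)
    p /= p.sum()                                     # recurrence period density P(t)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz])) / np.log(t_max)   # value in [0, 1]
```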

Detrended fluctuation analysis (DFA) is a scaling-exponent method that removes the influence of trend components in a time series on the fluctuation analysis. It is used here to analyze the long-range correlation of the speech signal, i.e. to determine whether the noise term of the time series has positive or negative autocorrelation. One of its advantages is that it effectively filters out trend components of all orders and can detect long-range correlations in signals that contain noise and superimposed polynomial trends, which makes it well suited to long-range power-law correlation analysis of non-stationary time series.

It is implemented as follows:

1. First, for the sequence x(t), compute its cumulative deviation y(t):

y(t) = Σ_{i=1}^{t} [ x(i) - x̄ ]

where x̄ denotes the mean value of the sequence x(t).

Here the mean of the time series is filtered out first. Since cyclic or fluctuating components may be present in a general time series, and a time series may also contain random components, filtering these components out of the series is very helpful.

2. The sequence is then re-segmented: y(t) is divided into m non-overlapping intervals of equal length s, where m = ⌊N/s⌋ and N is the series length. Because the series length is not always an integer multiple of s, a small amount of data at the end of the sequence may be left unused; the same segmentation is therefore also applied to the sequence in reversed order, giving 2m intervals of equal length in total.

3. For each interval v, a first-order (linear) least-squares fit y_v(i) is computed from the s data points contained in that interval.

4. The mean-square fluctuation of each detrended interval is computed (the forward and reversed segmentations are handled by the corresponding formulas):

F²(s, v) = (1/s) · Σ_{i=1}^{s} { y[(v-1)·s + i] - y_v(i) }²,  for v = 1, …, m

F²(s, v) = (1/s) · Σ_{i=1}^{s} { y[N - (v-m)·s + i] - y_v(i) }²,  for v = m+1, …, 2m

5. Averaging over all intervals of equal length and taking the square root gives the DFA fluctuation function:

F(s) = sqrt( (1/(2m)) · Σ_{v=1}^{2m} F²(s, v) )

6. If the time series {x(t)} is long-range power-law correlated, then F(s) and s satisfy the power-law relation

F(s) ~ s^h

Taking the logarithm of both sides of this expression gives

ln(F(s)) ~ h·ln(s)

In the scatter plot in double-logarithmic coordinates (ln(s), ln(F(s))), the data points are fitted by least squares, and the slope of the straight-line portion is the Hurst exponent h.

Relationship between the Hurst exponent and correlation:

(1) When 0.5 < h < 1, the time series has long-range correlation and shows persistent behavior: if it is increasing (decreasing) in one period, it will also tend to increase (decrease) in the next period, and the closer h is to 1, the stronger the correlation.

(2) When h = 0.5, the time series is uncorrelated and is an independent random process, i.e. the current state does not influence future states.

(3) When 0 < h < 0.5, the time series has only negative correlation and shows anti-persistent behavior: if it is increasing (decreasing) in one period, it will tend to decrease (increase) in the next period.
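Putting steps 1-6 together, a compact DFA sketch (first-order detrending; the set of window sizes s is an illustrative choice) is shown below; for uncorrelated white noise it returns a value of h close to 0.5:

```python
import numpy as np

def dfa_hurst(x, scales=(16, 32, 64, 128, 256)):
    y = np.cumsum(x - np.mean(x))                          # step 1: cumulative deviation
    fluct = []
    for s in scales:
        n_seg = len(y) // s
        segs = [y[:n_seg * s].reshape(n_seg, s),           # forward segmentation
                y[len(y) - n_seg * s:].reshape(n_seg, s)]  # reversed-order counterpart
        f2 = []
        t = np.arange(s)
        for block in segs:                                 # steps 3-4: detrend each segment
            for seg in block:
                coef = np.polyfit(t, seg, 1)
                f2.append(np.mean((seg - np.polyval(coef, t)) ** 2))
        fluct.append(np.sqrt(np.mean(f2)))                 # step 5: fluctuation function F(s)
    h, _ = np.polyfit(np.log(scales), np.log(fluct), 1)    # step 6: slope = Hurst exponent
    return h

print(dfa_hurst(np.random.randn(4096)))                    # close to 0.5 for white noise
```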

Empirical mode decomposition excitation ratio (EMD-ER): an original speech signal sampled at 44.1 kHz can be decomposed into a finite number of intrinsic mode functions (IMFs), each of which captures local characteristics of the original signal at a different time scale. The IMFs extracted first are high-frequency noise components, while the IMFs extracted later are the actually useful signal components. Using the energy-operator formulas (SEO and TKEO; for a discrete signal the Teager-Kaiser energy operator is ψ[x(n)] = x(n)² - x(n-1)·x(n+1)) and the Shannon entropy, the SEO, TKEO and Shannon-entropy values of each IMF can be computed. When computing the signal-to-noise ratio, the first four IMFs are taken as the noise signal, and the SNR-related parameters are obtained from the per-IMF values by the corresponding formula (rendered as an image in the original publication), where u denotes the SEO, TKEO or Shannon-entropy value of each IMF and D is the number of IMFs obtained by the decomposition. When computing the noise-to-signal ratio, the logarithm of each IMF is taken first and the first two IMFs are taken as the noise signal; the SEO, TKEO and Shannon-entropy values of each IMF are then computed, and the NSR-related parameters are obtained from the analogous formula (also rendered as an image in the original), with u and D defined as above.

The sample entropy (SampEn) feature is an improved, approximate-entropy-based method for measuring the complexity of a time series.

The multiscale entropy (MSEn) feature extends sample entropy to multiple time scales, computing the complexity of the signal at different time scales.
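A straightforward, unoptimized sketch of sample entropy and its multiscale extension by coarse-graining is given below; m = 2 and r = 0.2 times the standard deviation are the usual defaults and are assumptions here, since the text does not specify them:

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """SampEn = -ln(A/B), with Chebyshev distance and tolerance r."""
    x = np.asarray(x, dtype=float)
    r = r if r is not None else 0.2 * np.std(x)
    def count(mm):
        templ = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        d = np.max(np.abs(templ[:, None, :] - templ[None, :, :]), axis=2)
        return np.sum(d <= r) - len(templ)           # matching pairs, excluding self-matches
    b, a = count(m), count(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def multiscale_entropy(x, scales=(1, 2, 3, 4, 5), m=2):
    """Sample entropy of coarse-grained versions of the signal at several time scales."""
    x = np.asarray(x, dtype=float)
    return [sample_entropy(x[:len(x) // s * s].reshape(-1, s).mean(axis=1), m=m)
            for s in scales]
```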

The glottal quotient (GQ) measures the stability of vocal-fold vibration through the mean and standard deviation of the speech signal. The DYPSA algorithm is first used to find the glottal opening and closing instants, dividing a stretch of speech into glottal-open segments and glottal-closed segments, and the mean and standard deviation are then computed separately for the open segments and the closed segments.

MFCCs are cepstral parameters extracted in the Mel-scale frequency domain; the Mel scale describes the nonlinear frequency perception of the human ear. The 39-dimensional MFCC vector consists, from left to right, of one log-energy term and 12 cepstral coefficients, together with their first-order (delta) and second-order (delta-delta) differences.
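A 39-dimensional MFCC sketch with librosa is shown below (the library choice and the synthetic test signal are assumptions; toolkits differ in how they handle the log-energy term, and here the 13 base coefficients simply include the 0th coefficient in its place):

```python
import numpy as np
import librosa

sr = 16000
y = np.sin(2 * np.pi * 200 * np.arange(sr) / sr).astype(np.float32)  # stand-in for speech
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=int(0.025 * sr), hop_length=int(0.010 * sr))
delta = librosa.feature.delta(mfcc)                # first-order differences
delta2 = librosa.feature.delta(mfcc, order=2)      # second-order differences
mfcc39 = np.vstack([mfcc, delta, delta2])          # 13 + 13 + 13 = 39 coefficients per frame
print(mfcc39.shape)
```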

The model training unit 201 is configured to construct a classification model for identifying heart failure stages and to train and optimize the classification model with the multi-class voice feature samples to obtain an optimal classification model, wherein the classification model for distinguishing heart failure stage A from stage B is an AdaBoost classification model on the original variables; the classification model for distinguishing stage B from stage C is an AdaBoost classification model with Lasso dimensionality reduction; and the classification model for distinguishing stages A and B from stage C is an AdaBoost classification model with Lasso dimensionality reduction.

In one embodiment, binary classification models are combined with dimensionality-reduction methods to optimize the classifier algorithm. In this embodiment, the candidates are six classification models and two dimensionality-reduction methods: the support vector machine (SVM), decision tree (DT), adaptive boosting (AdaBoost), least absolute shrinkage and selection operator (LASSO), ridge regression, and elastic net classification models, together with principal component analysis (PCA) dimensionality reduction and LASSO dimensionality reduction.

The models finally established are shown in Table 3 and comprise three variants: (1) classification models fed with the original variables; (2) classification models fed with the principal components obtained after PCA dimensionality reduction; (3) classification models fed with the feature variables selected by LASSO.

Table 3

(The model comparison table is reproduced as an image in the original publication.)
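A scikit-learn sketch of the two model families that turned out optimal, AdaBoost on the original variables and AdaBoost preceded by Lasso-based feature selection, is shown below; X and y stand for the 100-dimensional speech-feature samples and the stage labels, and all hyperparameters are illustrative assumptions:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.ensemble import AdaBoostClassifier

ada_raw = Pipeline([                     # stage A vs stage B: original variables
    ("scale", StandardScaler()),
    ("clf", AdaBoostClassifier(n_estimators=200, random_state=0)),
])

ada_lasso = Pipeline([                   # tasks involving stage C: Lasso dimensionality reduction
    ("scale", StandardScaler()),
    ("select", SelectFromModel(Lasso(alpha=0.001, max_iter=10000),
                               max_features=20, threshold=-np.inf)),  # keep 20 strongest coefs
    ("clf", AdaBoostClassifier(n_estimators=200, random_state=0)),
])

X = np.random.rand(60, 100)              # placeholder data: 60 samples x 100 features
y = np.random.randint(0, 2, 60)          # placeholder binary stage labels
ada_lasso.fit(X, y)
print(ada_lasso.score(X, y))
```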

The device of the embodiment of the present invention further includes an evaluation unit configured to evaluate the classification models using a holdout scheme.

Model building should avoid "overfitting" as far as possible, i.e. the classifier treating peculiarities of the training samples themselves as general properties that all potential samples would have, which degrades generalization: the final model performs well on the training set but poorly on the test set. "Underfitting" should also be avoided, i.e. the general properties of the training samples have not yet been learned and performance is poor on both the training set and the test set. Overfitting cannot be eliminated, only mitigated, whereas underfitting can be overcome by adding features, increasing model complexity, or reducing the regularization coefficient. Ultimately the model should fit the training data set well (low training error) while also fitting unseen data (the test set) well, i.e. generalize well.

Model building usually requires splitting the sample data into a training set and a test set, which are mutually exclusive; model evaluation then uses the test set to assess the learner's ability to discriminate new samples, taking the "test error" on the test set as an approximation of the generalization error. Commonly used methods include the holdout method, cross-validation, and the bootstrap.

Holdout method: the data set is split directly into two mutually exclusive sets, one used for training and the other for testing. The split should keep the data distribution as consistent as possible, to avoid introducing additional bias that would affect the final result. Because an estimate obtained from a single holdout split is often unreliable, the evaluation is usually repeated over several random splits and averaged; generally 2/3 to 4/5 of the samples are used for training and the remainder for testing. This embodiment uses the leave-one-out method, which is one form of the holdout approach.
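A leave-one-out evaluation can be sketched with scikit-learn as follows (placeholder data; the AdaBoost classifier stands for any of the candidate models):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

X = np.random.rand(30, 100)                       # placeholder feature samples
y = np.random.randint(0, 2, 30)                   # placeholder stage labels
scores = cross_val_score(AdaBoostClassifier(random_state=0), X, y, cv=LeaveOneOut())
print("leave-one-out accuracy:", scores.mean())
```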

The device of the embodiment of the present invention further includes a performance measurement unit configured to measure the performance of the classification models with predetermined indicators, including error rate and accuracy, precision and recall, F1 score, specificity, sensitivity, ROC curve and AUC, and unweighted average recall.

Evaluating the generalization performance of a classifier requires evaluation criteria that measure the model's generalization ability, i.e. performance metrics. For a binary classification problem, a confusion matrix can be formed from the combinations of the true class and the class predicted by the classifier, covering four cases: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). The confusion matrix is shown in Table 4.

Table 4

|                 | Predicted positive | Predicted negative |
|-----------------|--------------------|--------------------|
| Actual positive | TP                 | FN                 |
| Actual negative | FP                 | TN                 |

The predetermined indicators are as follows:

Accuracy (ACC): the proportion of correctly classified samples among all samples. It applies to both binary and multi-class tasks and reflects overall correctness, but when the classes are imbalanced, accuracy becomes misleading and cannot be used on its own to measure the result, so other indicators are needed as a complement.

ACC = (TP + TN) / (TP + FN + FP + TN)

Error rate (ERR): the proportion of misclassified samples among all samples; ERR = 1 - ACC.

ERR = (FN + FP) / (TP + FN + FP + TN)

Precision (P): focuses on the prediction results and is the proportion of the samples predicted as positive that are actually positive.

P = TP / (TP + FP)

Recall (R): also called sensitivity (SEN); it focuses on the original samples and is the proportion of the actually positive samples that are predicted as positive.

R = TP / (TP + FN)

Specificity (SPE): the proportion of the actually negative samples that are predicted as negative.

SPE = TN / (FP + TN)

F1 score: the harmonic mean of precision and recall, taking values between 0 and 1; it is commonly used in statistics to measure the accuracy of binary (or multi-task binary) classification models.

F1 = 2 * P * R / (P + R)

ROC curve and AUC: ROC stands for "receiver operating characteristic". The ordinate of the ROC plot is the true positive rate (sensitivity) and the abscissa is the false positive rate (1 - specificity); the curve is obtained by computing these coordinates at different thresholds and connecting them. The closer the ROC curve lies to the diagonal, the lower the model's accuracy. If the ROC curve of classifier A "encloses" that of classifier B, classifier A has the better classification performance; when the two curves cross, it is difficult to judge which performs better, and the area under the ROC curve, the AUC (Area Under ROC Curve), is used instead. Since the ROC curve generally lies above the line y = x, the AUC usually takes values between 0.5 and 1, and a larger AUC (area) indicates a better classifier.

Unweighted average recall (UAR): if the class labels are distributed unevenly, the traditional indicators (ACC, P, R, F1, etc.) give overly optimistic results for the class with more samples; UAR can then be used as the performance measure to avoid the proposed classifier overfitting to a particular class.
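All of the above metrics are available in scikit-learn; in the following sketch, y_true and y_score are placeholder values, and balanced_accuracy_score is used for the UAR since, as the macro-averaged recall, it coincides with the unweighted average recall:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score, f1_score,
                             roc_auc_score, balanced_accuracy_score, confusion_matrix)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.6, 0.8, 0.4, 0.1, 0.3, 0.7])   # classifier probabilities
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("ACC", accuracy_score(y_true, y_pred), "ERR", 1 - accuracy_score(y_true, y_pred))
print("P", precision_score(y_true, y_pred), "R/SEN", recall_score(y_true, y_pred))
print("SPE", tn / (tn + fp), "F1", f1_score(y_true, y_pred))
print("AUC", roc_auc_score(y_true, y_score), "UAR", balanced_accuracy_score(y_true, y_pred))
```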

Specific Embodiment

Patient data: from April 2021 to December 2022, 101 patients were enrolled and divided by heart failure stage into group A (stage A, n = 35), group B (stage B, n = 26), and group C (stage C, n = 40); 29 volunteers without heart failure were also enrolled as group N (n = 29). There were no significant differences among the three heart failure stage groups in sex, body mass index (BMI), systolic blood pressure, hemoglobin, creatinine, low-density lipoprotein cholesterol (LDL-C), or history of coronary heart disease, hypertension, diabetes, smoking, drinking, or dyslipidemia. The differences among the three groups in age, creatinine, troponin, NT-proBNP, left ventricular ejection fraction, and left ventricular internal diameter were statistically significant (P < 0.05). The specific clinical data are shown in Table 5:

表5Table 5

| Variable | Stage A (n=35) | Stage B (n=26) | Stage C (n=40) | P |
| Gender (%, M) | 28 (68%) | 28 (97%) | 32 (80%) | 0.014 |
| Age | 46±12 | 51±13 | 57±12 | <0.001 |
| BMI | 25.7±3.5 | 26.9±3.8 | 25.3±4.8 | 0.246 |
| Systolic blood pressure (mmHg) | 139±20 | 141±24 | 121±18 | <0.001 |
| Hemoglobin | 145±15 | 146±11 | 141±28 | 0.518 |
| Troponin (ng/ml) | 0.005 (0.003) | 0.011 (0.013) | 0.031 (0.045) | <0.001 |
| NT-proBNP (pg/ml) | 31.0 (45.4) | 59.0 (168.4) | 1391.0 (1754.0) | <0.001 |
| LDL-C (mmol/L) | 2.90±1.09 | 2.65±1.12 | 2.54±1.10 | 0.326 |
| Left ventricular ejection fraction (%) | 67±5 | 66±5 | 41±10 | <0.001 |
| Left ventricular internal diameter (diastolic, mm) | 45±3 | 47±4 | 61±10 | <0.001 |
| History of coronary heart disease (%) | 15 (37%) | 14 (48%) | 23 (57%) | 0.168 |
| History of hypertension (%) | 24 (59%) | 24 (83%) | 19 (48%) | 0.011 |
| History of diabetes (%) | 9 (22%) | 8 (28%) | 11 (28%) | 0.809 |
| Smoking history (%) | 7 (17%) | 11 (38%) | 16 (40%) | 0.053 |
| History of dyslipidemia (%) | 13 (32%) | 12 (41%) | 13 (33%) | 0.664 |

In this embodiment, 130 cases meeting the inclusion criteria were enrolled, yielding 4055 voice samples with a total effective duration of 2.216 h. Of these, 63 cases (%) were aged 30-50 years and 67 cases (%) were older than 50. The heart-failure grouping was as follows: group A, 35 cases (%), 1085 voice samples, effective duration 0.574 h; group B, 26 cases (%), 849 voice samples, effective duration 0.462 h; group C, 40 cases (%), 1231 voice samples, effective duration 0.715 h; group N, 29 cases (%), 890 voice samples, effective duration 0.465 h; control group, 18 cases (%), 890 voice samples, effective duration 0.465 h.

The speech features in this embodiment use the eGeMAPS feature set, the extended version of GeMAPS. It consists of 88 hand-crafted functionals extracted with the openSMILE open-source toolkit, built on 18 low-level descriptors (LLDs) and extended beyond GeMAPS with 5 additional spectral features (MFCC 1-4 and spectral flux) and 2 frequency-related features (the bandwidths of the second and third formants), covering frequency-related, energy/amplitude-related and spectral features. In addition, 12 further features were extracted with Python: GNE_SEO_SNR, GNE_TKEO_SNR, GNE_mean, GNE_std, VFER_SEO_SNR, VFER_TKEO_SNR, VFER_mean, VFER_std, recurrence period density entropy (RPDE), detrended fluctuation analysis (DFA), sample entropy (SampEn) and multiscale entropy (MSEn). The feature set used in this study therefore comprises 100 dimensions in total. Analysis of these 100-dimensional speech features across patients at different heart-failure stages showed that the features reflecting voice roughness and breathiness differ significantly between stages.
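For illustration, the 88 eGeMAPS functionals described above can be obtained with the opensmile Python wrapper of the openSMILE toolkit; the feature-set version (eGeMAPSv02) and the file name are assumptions, and the additional 12 nonlinear features would have to be computed separately.

```python
import opensmile

# functionals of the extended Geneva Minimalistic Acoustic Parameter Set (88 dimensions)
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,      # assumed version; the text only says "eGeMAPS"
    feature_level=opensmile.FeatureLevel.Functionals,
)

features = smile.process_file("recording_0001.wav")   # hypothetical voice recording
print(features.shape)                                 # (1, 88): one row of 88 functionals
print(list(features.columns)[:3])                     # e.g. 'F0semitoneFrom27.5Hz_sma3nz_amean', ...
```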

Voice roughness reflects control of the glottis and vocal folds and the degree of hoarseness; its main indicators include Jitter, Shimmer, Harmonic difference, HNR and Alpha Ratio.

Jitter代表单个连续基音周期内的偏差,反映嗓音韵律的音质特征,统计量为均值、标准差;共2个特征,不同心衰分期具有显著差异。其实验结果如图2所示。Jitter represents the deviation within a single continuous fundamental period, reflecting the sound quality characteristics of the voice rhythm. The statistics are mean and standard deviation. There are two features in total, and there are significant differences in different heart failure stages. The experimental results are shown in Figure 2.

Shimmer代表相邻基音周期间振幅峰值之差,也反映嗓音韵律的音质特征。统计量为均值、标准差。共2个特征,心衰类型均有显著影响。其实验结果如图3所示。Shimmer represents the difference between the peak amplitudes of adjacent fundamental cycles and also reflects the sound quality characteristics of the voice rhythm. The statistics are mean and standard deviation. There are two features in total, and both types of heart failure have significant effects. The experimental results are shown in Figure 3.

Harmonic difference comprises two measures. H1-H2 is the energy ratio of the first fundamental-frequency harmonic H1 to the second harmonic H2; statistics are the mean and standard deviation, giving 2 features, both significantly affected by heart-failure type. H1-A3 is the energy ratio of the first harmonic H1 to the third formant amplitude A3; statistics are the mean and standard deviation, giving 2 features, both significantly affected by heart-failure type. The experimental results are shown in Figure 4.

HNR代表谐波噪声比,即稳态元音的声波中周期性重复的谐波分量所占的比重。统计量为均值、标准差;共2个特征,心衰类型均有显著影响。其实验结果如图5所示。HNR stands for harmonic noise ratio, which is the proportion of periodically repeated harmonic components in the sound waves of steady-state vowels. The statistics are mean and standard deviation; there are two features in total, and the type of heart failure has a significant impact. The experimental results are shown in Figure 5.

Alpha Ratio is the ratio of the summed energy in 50-1000 Hz to the summed energy in 1-5 kHz. Statistics are the mean and standard deviation over voiced regions and the mean over unvoiced regions, giving 3 features, all significantly affected by heart-failure type. The experimental results are shown in Figure 6.

Voice breathiness reflects the pausing and intensity of the voice; its indicators include Loudness, voiced/unvoiced duration, Hammarberg Index and Spectral Slope. Breathiness differs significantly between heart-failure stages.

voiced/unvoiced duration表示连续浊音(F0>0)时长,统计量为平均长度和标准差,共2个特征。清音(F0=0)时长,统计量为平均长度和标准差,共2个特征。每秒浊音区域的个数,共1个特征。心衰类型在5个特征上均有显著影响。其实验结果如图7所示。Voiced/unvoiced duration indicates the duration of continuous voiced sounds (F0>0), and the statistics are the average length and standard deviation, with a total of 2 features. Unvoiced sound (F0=0) duration, the statistics are the average length and standard deviation, with a total of 2 features. The number of voiced sound areas per second, with a total of 1 feature. Heart failure type has a significant impact on all 5 features. The experimental results are shown in Figure 7.

Loudness表示统计量为均值、标准差、20/50/80百分位、20-80百分位的范围、上升/下降语音信号的斜率的均值和标准差;共10个特征,不同心衰分期均有显著影响。其实验结果如图8所示。Loudness means that the statistics are the mean, standard deviation, 20/50/80 percentiles, 20-80 percentile range, and the mean and standard deviation of the slope of the rising/falling speech signal; there are 10 features in total, and different heart failure stages have significant effects. The experimental results are shown in Figure 8.

Hammarberg Index表示0-2kHz的最强能量峰除以2-5kHz的最强能量峰。The Hammarberg Index represents the strongest energy peak between 0 and 2 kHz divided by the strongest energy peak between 2 and 5 kHz.

统计量为浊音区域的均值、标准差,清音区域均值,共3个特征,心衰类型均有显著影响。其实验结果如图9所示。The statistics are the mean and standard deviation of the voiced area and the mean of the unvoiced area, a total of 3 features, and the heart failure type has a significant impact. The experimental results are shown in Figure 9.

Spectral Slope is the linear-regression slope (decay rate) of the logarithmic power spectrum within 0-500 Hz and 500-1500 Hz, i.e. the tilt of the spectral envelope; the larger the slope, the greater the attenuation of spectral components outside the fitted band. Statistics are the mean and standard deviation over voiced regions and the mean over unvoiced regions, giving 6 features in total, 5 of which are significantly affected by heart-failure type. The experimental results are shown in Figure 10.

The cross-correlation coefficient over each glottal cycle, or the ratio of energy above 2.5 kHz to energy below 2.5 kHz (the glottal- and vocal-fold-excitation measures), gives 4 features in total, all significantly affected by heart-failure type. The experimental results are shown in Figure 11.

For nonlinear analysis, nonlinear parameters are better suited to describing the intrinsic characteristics of acoustic signals with poor periodicity, and speech from heart-failure patients often shows poor periodicity. Entropy reflects the disorder of speech information in the frequency domain, and sample entropy reflects the complexity of the signal in the time domain. This study first extracted nonlinear parameters, including recurrence period density entropy (RPDE), detrended fluctuation analysis (DFA), sample entropy and multiscale entropy, to describe periodic, aperiodic and chaotic characteristics of the speech signal.

Recurrence period density entropy (RPDE), detrended fluctuation analysis (DFA), sample entropy (SampEn) and multiscale entropy (MSEn) all reflect the roughness of the voice; 4 features in total, all significantly affected by heart-failure type. The experimental results are shown in Figure 12.
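As a sketch of how one of these nonlinear parameters can be computed, the following numpy implementation of sample entropy follows the standard SampEn(m, r) definition; the embedding dimension m = 2 and tolerance r = 0.2·std are conventional defaults rather than values reported here, and RPDE, DFA and multiscale entropy would be computed analogously or with dedicated toolboxes.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn(m, r): negative log of the conditional probability that sequences
    similar for m points remain similar at m + 1 points."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)

    def count_matches(dim):
        templates = np.lib.stride_tricks.sliding_window_view(x, dim)
        count = 0
        for i in range(len(templates) - 1):
            # Chebyshev distance between template i and every later template
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.sum(dist <= r)
        return count

    b = count_matches(m)        # similar pairs of length m
    a = count_matches(m + 1)    # similar pairs of length m + 1
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

# toy usage on a noisy, roughly periodic signal
rng = np.random.default_rng(0)
frame = np.sin(np.linspace(0, 20 * np.pi, 800)) + 0.1 * rng.standard_normal(800)
print(sample_entropy(frame))
```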

对于基于倒谱的声学特征参数而言,梅尔倒谱系数是Mel标度频率域提取出来的倒谱参数,Mel标度描述了人耳频率的非线性特性。基于倒谱分析的声学特征参数可以有效地规避基频不规律性所带来的分析不准确。For acoustic feature parameters based on cepstrum, Mel cepstrum coefficients are cepstrum parameters extracted from the Mel scale frequency domain. The Mel scale describes the nonlinear characteristics of human ear frequency. Acoustic feature parameters based on cepstrum analysis can effectively avoid the inaccurate analysis caused by the irregularity of the fundamental frequency.

梅尔倒谱系数1-4。统计量为整体及其浊音段的均值、标准差;共16个特征,心衰类型在15个特征上有显著影响。其实验结果如图13所示。Mel cepstral coefficients 1-4. Statistics are the mean and standard deviation of the whole and voiced segments; there are 16 features in total, and the type of heart failure has a significant impact on 15 features. The experimental results are shown in Figure 13.
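The study obtains MFCC 1-4 through openSMILE; purely as an independent illustration, comparable coefficients can be computed with librosa, where the file name, sampling rate, frame length and hop size below are assumptions.

```python
import librosa
import numpy as np

y, sr = librosa.load("recording_0001.wav", sr=16000)     # hypothetical recording at 16 kHz
# 13 MFCCs per frame; index 0 is the energy-like coefficient, so keep coefficients 1-4
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=400, hop_length=160)
mfcc_1_4 = mfcc[1:5, :]

# utterance-level functionals: mean and standard deviation of each coefficient
stats = np.concatenate([mfcc_1_4.mean(axis=1), mfcc_1_4.std(axis=1)])
print(stats.shape)   # (8,): 4 means + 4 standard deviations over the whole utterance
```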

It should be noted that in Figures 2-13 the horizontal axis denotes the groups in the heart-failure staging study and the vertical axis denotes the corresponding speech feature parameters. For example, in Figure 2 two parameters were extracted for Jitter, jitterLocal_sma3nz_amean and jitterLocal_sma3nz_stddevNorm; the number of extracted parameters is consistent with Table 1.

In this embodiment, starting from the original 100-dimensional eGeMAPS-based voice features of patients at different heart-failure stages, we further ran binary classification experiments, reducing dimensionality with PCA and with LASSO, and compared six classifiers: support vector machine (SVM), decision tree (DT), adaptive boosting (AdaBoost), least absolute shrinkage and selection operator (LASSO), ridge regression and Elastic Net. The goal was to observe how well the different classifiers distinguish patients at different heart-failure stages and to identify the optimal classification model.
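A schematic of such a comparison is sketched below with scikit-learn; it is not the implementation used in the study. The hyper-parameters, the number of principal components, the Lasso penalty, the synthetic data, and the choice to realise the LASSO / ridge / Elastic Net "classifiers" as penalized (logistic) models are all assumptions.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, RidgeClassifier, LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

classifiers = {
    "SVM": SVC(probability=True),
    "DecisionTree": DecisionTreeClassifier(),
    "AdaBoost": AdaBoostClassifier(n_estimators=100),
    "Ridge": RidgeClassifier(),
    # approximated here as penalized logistic regressions for classification
    "LASSO": LogisticRegression(penalty="l1", solver="liblinear", max_iter=5000),
    "ElasticNet": LogisticRegression(penalty="elasticnet", solver="saga",
                                     l1_ratio=0.5, max_iter=5000),
}

reducers = {
    "original": "passthrough",
    "PCA": PCA(n_components=20),                              # placeholder dimension
    "LASSO-selection": SelectFromModel(Lasso(alpha=0.01)),    # placeholder penalty
}

def compare(X, y):
    """X: (n_subjects, 100) voice features, y: binary stage labels."""
    for red_name, red in reducers.items():
        for clf_name, clf in classifiers.items():
            pipe = Pipeline([("scale", StandardScaler()), ("reduce", red), ("clf", clf)])
            acc = cross_val_score(pipe, X, y, cv=LeaveOneOut()).mean()
            print(f"{red_name:>15s} + {clf_name:<12s} LOO accuracy = {acc:.3f}")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((61, 100))     # synthetic stand-in: 61 subjects x 100 features
    y = np.array([0] * 35 + [1] * 26)      # e.g. stage A vs. stage B labels
    compare(X, y)
```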

1.A、B二分类实验(留一法)1. A, B binary classification experiment (leave one out method)

Binary classification models were built to distinguish the stage-A and stage-B populations. Among the models fitted on the original 100-dimensional speech features, the AdaBoost classifier was the best, with an accuracy of 0.869, precision (P) of 0.846, recall (R) of 0.846 and F1 score of 0.846. With PCA dimensionality reduction, AdaBoost remained the best classifier, with a training-set accuracy of 0.738; with LASSO dimensionality reduction, AdaBoost was again the best, with a training-set accuracy of 0.770. Both are lower than the model built on the original features, which is likely related to the information lost during dimensionality reduction. The sample-level ROC curve of the optimal model, AdaBoost on the original features, is shown in Figure 14 (AUC = 0.793). The results indicate that speech features can distinguish the stage-A and stage-B populations, suggesting that they can be used for preliminary screening of patients at risk of heart failure and pre-heart-failure patients with target-organ damage. The binary classification accuracies (mean, standard deviation) based on the original 100-dimensional eGeMAPS features are shown in Table 6.

表6Table 6

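The confusion matrices and sample-level metrics reported below can be reproduced in outline with leave-one-out predictions as sketched here; the feature matrix is a synthetic stand-in and the AdaBoost settings are placeholders, not the study's configuration.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

# synthetic stand-in for the stage-A / stage-B voice features (35 + 26 subjects, 100 dims)
rng = np.random.default_rng(0)
X_ab = rng.standard_normal((61, 100))
y_ab = np.array([0] * 35 + [1] * 26)          # 0 = stage A, 1 = stage B

clf = AdaBoostClassifier(n_estimators=100)    # placeholder hyper-parameter
y_pred = cross_val_predict(clf, X_ab, y_ab, cv=LeaveOneOut())

print(confusion_matrix(y_ab, y_pred))         # rows: true class, columns: predicted class
print("ACC", round(accuracy_score(y_ab, y_pred), 3),
      "P", round(precision_score(y_ab, y_pred), 3),
      "R", round(recall_score(y_ab, y_pred), 3),
      "F1", round(f1_score(y_ab, y_pred), 3))
```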

For the stage-A vs. stage-B binary classification model built from the original variables, the AdaBoost confusion matrices are shown in Table 7 and Table 8:

表7Table 7


表8Table 8


基于PCA进行降维,利用主成分代入的分类模型的结果如表9所示:Based on PCA dimensionality reduction, the results of the classification model using principal components are shown in Table 9:

表9Table 9


AdaBoost混淆矩阵如表10和表11所示:The AdaBoost confusion matrix is shown in Table 10 and Table 11:

表10Table 10


表11Table 11


基于LASSO进行降维,将特征变量代入的分类模型的结果如表12所示:The results of the classification model based on LASSO dimensionality reduction and substituting the feature variables into the classification model are shown in Table 12:

表12Table 12


AdaBoost混淆矩阵如表13和表14所示:The AdaBoost confusion matrix is shown in Table 13 and Table 14:

表13Table 13


表14Table 14


After LASSO dimensionality reduction, AdaBoost again gave the best classification model, with a training-set accuracy of 0.770, lower than the model built on the original features, which is likely related to the information lost during dimensionality reduction. The importances of the original 100 features in the AdaBoost model (2 of which have zero importance) are listed in Table 15.

表15Table 15

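As an illustration of how such a feature-importance ranking is obtained, the sketch below reads the impurity-based importances of a fitted AdaBoost model; the data and feature names are synthetic placeholders, not the study's features.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# synthetic stand-in for the 100-dimensional features and stage-A / stage-B labels
rng = np.random.default_rng(0)
X = rng.standard_normal((61, 100))
y = np.array([0] * 35 + [1] * 26)
feature_names = [f"feature_{i:03d}" for i in range(100)]   # hypothetical names

model = AdaBoostClassifier(n_estimators=100).fit(X, y)
importance = model.feature_importances_                    # non-negative, sums to 1

for idx in np.argsort(importance)[::-1][:10]:              # ten most important features
    print(f"{feature_names[idx]}  {importance[idx]:.4f}")
print("features with zero importance:", int(np.sum(importance == 0)))
```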

A、B二分类实验(留一法)结果总结A, B binary classification experiment (leave one out method) results summary

原始分类模型在测试集中的性能评价指标如表16所示:The performance evaluation indicators of the original classification model in the test set are shown in Table 16:

表16Table 16


“PCA降维+分类模型”在测试集中的性能评价指标如表17所示:The performance evaluation indicators of "PCA dimensionality reduction + classification model" in the test set are shown in Table 17:

表17Table 17


“LASSO特征选择+分类模型”在测试集中的性能评价指标如表18所示:The performance evaluation indicators of "LASSO feature selection + classification model" in the test set are shown in Table 18:

表18Table 18


The sample-level ROC curve of the optimal model (AdaBoost on the original features, AUC = 0.793) is shown in Figure 14; the horizontal and vertical axes are the false positive rate and the true positive rate, respectively (Receiver Operating Characteristic curve).
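A sample-level ROC curve of this kind can be drawn as sketched below; the out-of-fold scores come from synthetic data here, and the matplotlib/scikit-learn defaults are assumptions rather than the study's settings.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_curve, auc

# synthetic stand-in for sample-level features and labels
rng = np.random.default_rng(0)
X, y = rng.standard_normal((120, 100)), rng.integers(0, 2, 120)

# out-of-fold decision scores, so every sample contributes one point to the curve
scores = cross_val_predict(AdaBoostClassifier(n_estimators=100), X, y,
                           cv=LeaveOneOut(), method="decision_function")

fpr, tpr, _ = roc_curve(y, scores)
plt.plot(fpr, tpr, label=f"AdaBoost (AUC = {auc(fpr, tpr):.3f})")
plt.plot([0, 1], [0, 1], "k--")                 # chance diagonal
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.title("Receiver Operating Characteristic")
plt.legend()
plt.show()
```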

2.B、C二分类结果(留一法)2. B, C binary classification results (leave one out method)

Binary classification models were built to distinguish the stage-B and stage-C populations. Among the models fitted on the original 100-dimensional speech features, the AdaBoost classifier was the best, with an accuracy of 0.788, precision (P) of 0.771, recall (R) of 0.925 and F1 score of 0.841. With PCA dimensionality reduction, AdaBoost and SVM performed similarly, with a training-set accuracy of 0.773, and the Elastic Net model reached a training accuracy of 0.742, both lower than the model built on the original features. With LASSO dimensionality reduction, AdaBoost was the best, with a training-set accuracy of 0.803, an improvement over the model built on the original features. As shown in Figure 15, the optimal model, AdaBoost with LASSO dimensionality reduction, achieved a sample-level area under the ROC curve (AUC) of 0.819. The results indicate that speech features can distinguish the stage-B and stage-C populations, suggesting that they can separate pre-heart-failure patients with target-organ damage from patients who have experienced symptomatic heart failure.
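A sketch of the "LASSO dimensionality reduction + AdaBoost" combination described above is given below; how the penalty was chosen in the study is not stated, so the cross-validated LassoCV, the synthetic data and the other settings are assumptions.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

# synthetic stand-in for stage-B / stage-C voice features (26 + 40 subjects, 100 dims),
# with a weak injected dependence on a few features so the toy example is not pure noise
rng = np.random.default_rng(0)
X_bc = rng.standard_normal((66, 100))
y_bc = (X_bc[:, 0] + 0.8 * X_bc[:, 5] - 0.6 * X_bc[:, 42]
        + 0.5 * rng.standard_normal(66) > 0).astype(int)   # 0 = stage B, 1 = stage C

lasso_adaboost = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectFromModel(LassoCV(cv=5), threshold=1e-5)),  # keep non-zero Lasso weights
    ("clf", AdaBoostClassifier(n_estimators=100)),
])

acc = cross_val_score(lasso_adaboost, X_bc, y_bc, cv=LeaveOneOut()).mean()
print(f"leave-one-out accuracy of LASSO selection + AdaBoost: {acc:.3f}")
```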

基于原始100维eGeMAPs特征的二分类正确率(均值,标准差)如表19所示:The binary classification accuracy (mean, standard deviation) based on the original 100-dimensional eGeMAPs features is shown in Table 19:

表19Table 19


AdaBoost混淆矩阵如表20和表21所示:The AdaBoost confusion matrix is shown in Table 20 and Table 21:

表20Table 20


表21Table 21


基于PCA进行降维,利用主成分代入的分类模型如表22所示:Based on PCA dimensionality reduction, the classification model using principal components is shown in Table 22:

表22Table 22


AdaBoost混淆矩阵如表23和表24所示:The AdaBoost confusion matrix is shown in Table 23 and Table 24:

表23Table 23


表24Table 24


SVM混淆矩阵如表25和表26所示:The SVM confusion matrix is shown in Table 25 and Table 26:

表25Table 25


表26Table 26


基于LASSO进行降维,将特征变量代入的分类模型如表27所示:Based on LASSO dimensionality reduction, the classification model in which the feature variables are substituted is shown in Table 27:

表27Table 27


AdaBoost混淆矩阵如表28和表29所示:The AdaBoost confusion matrix is shown in Table 28 and Table 29:

表28Table 28


表29Table 29


The features whose LASSO regularization coefficients are non-zero number 66 in total (the remaining 34 coefficients are zero), as listed in Table 30:

表30Table 30

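For illustration, the number of features retained by the LASSO penalty can be inspected as below; the penalty value and the synthetic data are placeholders, not the study's settings.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso

# synthetic stand-in for the 100-dimensional voice features and binary stage labels
rng = np.random.default_rng(0)
X = rng.standard_normal((66, 100))
y = (X[:, 3] - 0.7 * X[:, 17] + 0.5 * rng.standard_normal(66) > 0).astype(int)

lasso = Lasso(alpha=0.05).fit(StandardScaler().fit_transform(X), y)   # alpha is a placeholder
kept = np.flatnonzero(lasso.coef_)
print(f"{kept.size} features have non-zero coefficients; {100 - kept.size} were shrunk to zero")
```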

B、C二分类实验(留一法)结果总结B. C Binary Classification Experiment (Leave One Out) Results Summary

原始分类模型在测试集中的性能评价指标如表31所示:The performance evaluation indicators of the original classification model in the test set are shown in Table 31:

表31Table 31


“PCA降维+分类模型”在测试集中的性能评价指标如表32所示:The performance evaluation indicators of "PCA dimensionality reduction + classification model" in the test set are shown in Table 32:

表32Table 32


“LASSO特征选择+分类模型”在测试集中的性能评价指标如表33所示:The performance evaluation indicators of "LASSO feature selection + classification model" in the test set are shown in Table 33:

表33Table 33


The sample-level ROC curve of the optimal model (AdaBoost with LASSO dimensionality reduction, AUC = 0.819) is shown in Figure 15; the horizontal and vertical axes are the false positive rate and the true positive rate, respectively (Receiver Operating Characteristic curve).

3.AB、C二分类实验(留一法)3. AB, C binary classification experiment (leave one out method)

Binary classification models were built to distinguish patients in stages A and B combined (AB) from stage-C patients. Among the models fitted on the original 100-dimensional speech features, the AdaBoost classifier was the best, with an accuracy of 0.802, precision (P) of 0.857, recall (R) of 0.600 and F1 score of 0.706. With PCA dimensionality reduction, SVM gave the best model, but the training-set accuracy dropped to 0.723; with LASSO dimensionality reduction, AdaBoost was the best, with a training-set accuracy of 0.812, an improvement over the model built on the original features. As shown in Figure 16, the optimal model, AdaBoost with LASSO dimensionality reduction, achieved a sample-level area under the ROC curve (AUC) of 0.731. The results indicate that speech features can separate patients at risk of heart failure (stage A) and pre-heart-failure patients with target-organ damage (stage B) from stage-C patients who have experienced symptomatic heart failure, i.e. speech features can help identify patients who have ever had symptomatic heart failure. The binary classification accuracies (mean, standard deviation) based on the original 100-dimensional eGeMAPS features are shown in Table 34.

表34Table 34


AdaBoost混淆矩阵如表35和表36所示:The AdaBoost confusion matrix is shown in Table 35 and Table 36:

表35Table 35


表36Table 36


基于PCA进行降维,利用主成分代入的分类模型如表37所示:Based on PCA dimensionality reduction, the classification model using principal components is shown in Table 37:

表37Table 37


SVM混淆矩阵如表38和表39所示:The SVM confusion matrix is shown in Table 38 and Table 39:

表38Table 38


表39Table 39


基于LASSO进行降维的结果如表40所示:The results of dimensionality reduction based on LASSO are shown in Table 40:

表40Table 40


AdaBoost混淆矩阵如表41和表42所示:The AdaBoost confusion matrix is shown in Table 41 and Table 42:

表41Table 41


表42Table 42


The features whose LASSO regularization coefficients are non-zero number 56 in total (the remaining 44 coefficients are zero), as listed in Table 43:

表43Table 43


AB、C二分类结果整理(留一法)AB and C classification results (leave one out)

原始分类模型在测试集中的性能评价指标如表44所示:The performance evaluation indicators of the original classification model in the test set are shown in Table 44:

表44Table 44


“PCA降维+分类模型”在测试集中的性能评价指标如表45所示:The performance evaluation indicators of the "PCA dimensionality reduction + classification model" in the test set are shown in Table 45:

表45Table 45


“LASSO特征选择+分类模型”在测试集中的性能评价指标如表46所示:The performance evaluation indicators of "LASSO feature selection + classification model" in the test set are shown in Table 46:

表46Table 46


The sample-level ROC curve of the optimal model (AdaBoost with LASSO dimensionality reduction, AUC = 0.731) is shown in Figure 16; the horizontal and vertical axes are the false positive rate and the true positive rate, respectively (Receiver Operating Characteristic curve).

经过上述试验证明,(1)不同心衰分期患者,嗓音特征是有区别的,基于eGeMAPS特征集和python提取的共计100维特征,不同心衰分期患者反映嗓音粗糙度的主要指标,Jitter、Shimmer、Harmonic difference、HNR、Alpha Ratio等,在不同分期心衰间均有区别。(2)反映嗓音气息度的主要指标,包括Loudness、voiced/unvoiced duration、Hammarberg Index、Spectral Slope等,不同心衰分期间也有显著差异。在这些声音指标中,反映声音的基本特征,包括频率、能量/振幅相关特征、非线性等特征方面,不同心衰分期均有区别。(3)不同分期间语音特征的贡献度不完全相同。(4)基于不同分期心衰患者嗓音原始100维eGeMAPs特征,通过二分类法,构建分类模型,比较原始变量、PCA降维和LASSO降维,及不同分类器识别不同分期心衰患者的性能,优化分类模型,识别A期和B期心衰患者的最优模型为基于原始变量的AdaBoost分类方法,其ROC曲线AUC为0.793;识别B期和C期心衰患者的最优模型为基于Lasso降维的AdaBoost分类方法,其ROC曲线AUC为0.819;将AB期与C期识别区分的最优模型为基于Lasso降维的AdaBoost分类方法,其ROC曲线AUC为0.731。The above experiments have proved that (1) the voice characteristics of patients with different heart failure stages are different. Based on the eGeMAPS feature set and a total of 100-dimensional features extracted by python, the main indicators reflecting the roughness of the voice of patients with different heart failure stages, such as Jitter, Shimmer, Harmonic difference, HNR, Alpha Ratio, etc., are different between different stages of heart failure. (2) The main indicators reflecting the breathiness of the voice, including Loudness, voiced/unvoiced duration, Hammarberg Index, Spectral Slope, etc., also have significant differences between different heart failure stages. Among these sound indicators, the basic characteristics of the voice, including frequency, energy/amplitude related characteristics, nonlinearity and other characteristics, are different in different heart failure stages. (3) The contribution of speech features in different stages is not exactly the same. (4) Based on the original 100-dimensional eGeMAPs features of the voices of patients with heart failure at different stages, a classification model was constructed using a binary classification method. The performance of the original variables, PCA dimensionality reduction, LASSO dimensionality reduction, and different classifiers in identifying patients with heart failure at different stages was compared, and the classification model was optimized. The optimal model for identifying patients with heart failure at stage A and stage B was the AdaBoost classification method based on the original variables, with an ROC curve AUC of 0.793; the optimal model for identifying patients with heart failure at stage B and stage C was the AdaBoost classification method based on Lasso dimensionality reduction, with an ROC curve AUC of 0.819; the optimal model for distinguishing between stage AB and stage C was the AdaBoost classification method based on Lasso dimensionality reduction, with an ROC curve AUC of 0.731.

本发明还提供了一种计算机设备,包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现如下方法:将所采集的语音模拟信号转换为语音数字信号,并对所述语音数字信号进行预处理,以及对预处理后的语音数字信号进行特征提取,得到多类语音特征样本;构建用于识别心力衰竭分期的分类模型,利用所述多类语音特征样本对所述分类模型进行训练和优化,得到最优分类模型,其中,用于识别心力衰竭A期和心力衰竭B期的分类模型为基于原始变量的AdaBoost分类模型;用于识别心力衰竭B期和C期的分类模型为基于Lasso降维的AdaBoost分类模型;用于识别心力衰竭A期和B期与C期的分类模型为基于Lasso降维的AdaBoost分类模型。The present invention also provides a computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following method when executing the computer program: converting the collected voice analog signal into a voice digital signal, preprocessing the voice digital signal, and extracting features from the preprocessed voice digital signal to obtain multiple categories of voice feature samples; constructing a classification model for identifying heart failure stages, and training and optimizing the classification model using the multiple categories of voice feature samples to obtain an optimal classification model, wherein the classification model for identifying heart failure stage A and heart failure stage B is an AdaBoost classification model based on original variables; the classification model for identifying heart failure stage B and stage C is an AdaBoost classification model based on Lasso dimensionality reduction; and the classification model for identifying heart failure stages A, B, and C is an AdaBoost classification model based on Lasso dimensionality reduction.

本发明还提供了一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时实现如下方法:将所采集的语音模拟信号转换为语音数字信号,并对所述语音数字信号进行预处理,以及对预处理后的语音数字信号进行特征提取,得到多类语音特征样本;构建用于识别心力衰竭分期的分类模型,利用所述多类语音特征样本对所述分类模型进行训练和优化,得到最优分类模型,其中,用于识别心力衰竭A期和心力衰竭B期的分类模型为基于原始变量的AdaBoost分类模型;用于识别心力衰竭B期和C期的分类模型为基于Lasso降维的AdaBoost分类模型;用于识别心力衰竭A期和B期与C期的分类模型为基于Lasso降维的AdaBoost分类模型。The present invention also provides a computer-readable storage medium, on which a computer program is stored. When the computer program is executed by a processor, the following method is implemented: converting the collected voice analog signal into a voice digital signal, preprocessing the voice digital signal, and extracting features from the preprocessed voice digital signal to obtain multiple categories of voice feature samples; constructing a classification model for identifying heart failure stages, training and optimizing the classification model using the multiple categories of voice feature samples to obtain an optimal classification model, wherein the classification model for identifying heart failure stage A and heart failure stage B is an AdaBoost classification model based on original variables; the classification model for identifying heart failure stage B and stage C is an AdaBoost classification model based on Lasso dimensionality reduction; and the classification model for identifying heart failure stages A, B and C is an AdaBoost classification model based on Lasso dimensionality reduction.
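Purely as a sketch of how the described pipeline (signal conversion, preprocessing, feature extraction, classification) might be wired together, the function below assumes an already trained and serialized model, a 16 kHz target sampling rate, a 0.97 pre-emphasis coefficient and the opensmile, soundfile, scipy and joblib packages; none of these specifics are given in the text, and the 12 additional nonlinear features would have to be appended for a model trained on all 100 dimensions.

```python
import joblib
import numpy as np
import opensmile
import soundfile as sf
from scipy.signal import resample_poly

def predict_stage(wav_path, model_path="stage_classifier.joblib"):
    """Hypothetical inference sketch: digitized speech -> preprocessing -> features -> stage label."""
    audio, sr = sf.read(wav_path)                     # already-digitized speech signal
    if audio.ndim > 1:
        audio = audio.mean(axis=1)                    # mix down to mono
    if sr != 16000:
        audio = resample_poly(audio, 16000, sr)       # resample to a uniform 16 kHz rate (assumed)
        sr = 16000
    # pre-emphasis of the high-frequency part (coefficient 0.97 is an assumption)
    audio = np.append(audio[0], audio[1:] - 0.97 * audio[:-1])

    smile = opensmile.Smile(feature_set=opensmile.FeatureSet.eGeMAPSv02,
                            feature_level=opensmile.FeatureLevel.Functionals)
    feats = smile.process_signal(audio, sr)           # 88 eGeMAPS functionals
    model = joblib.load(model_path)                   # model assumed trained on these 88 functionals
    return model.predict(feats.to_numpy())
```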

说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似部分互相参见即可。对于实施例公开的装置而言,由于其与实施例公开的方法相对应,所以描述的比较简单,相关之处参见方法部分说明即可。应当指出,对于本技术领域的普通技术人员来说,在不脱离本发明原理的前提下,还可以对本发明进行若干改进和修饰,这些改进和修饰也落入本发明权利要求的保护范围内。The various embodiments in the specification are described in a progressive manner, and each embodiment focuses on the differences from other embodiments. The same and similar parts between the various embodiments can be referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant parts can be referred to the method part description. It should be pointed out that for ordinary technicians in this technical field, without departing from the principle of the present invention, several improvements and modifications can be made to the present invention, and these improvements and modifications also fall within the scope of protection of the claims of the present invention.

还需要说明的是,在本说明书中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的状况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。It should also be noted that, in this specification, relational terms such as first and second, etc. are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprises", "comprising" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements, but also other elements not explicitly listed, or also includes elements inherent to such process, method, article or device. In the absence of further restrictions, an element defined by the statement "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or device including the element.

Claims (10)

1. An apparatus for constructing a classification model based on acoustic features that identifies different stages of heart failure, comprising:
the sample processing unit is used for converting the collected voice analog signals into voice digital signals, preprocessing the voice digital signals, and extracting the characteristics of the preprocessed voice digital signals to obtain multiple types of voice characteristic samples;
the model training unit is used for constructing a classification model for identifying heart failure stage, training and optimizing the classification model by utilizing the multi-class voice feature samples to obtain an optimal classification model, wherein the classification model for identifying heart failure stage A and heart failure stage B is an AdaBoost classification model based on an original variable; the classification model used for identifying the B phase and the C phase of heart failure is an AdaBoost classification model based on Lasso dimension reduction; the classification model used for identifying heart failure stage A and B and stage C is an AdaBoost classification model based on Lasso dimension reduction.
2. The apparatus for constructing a classification model for identifying different stages of heart failure based on acoustic features according to claim 1, wherein the sample processing unit comprises:
a first conversion unit for converting a time-continuous speech analog signal into time-discrete, amplitude-continuous signal samples at a predetermined sampling period;
The second conversion unit is used for converting each signal sample with continuous value in amplitude into discrete value and representing the discrete value by binary system to obtain digital data;
and the third conversion unit is used for converting the digital data into a binary code stream to obtain a voice digital signal.
3. The apparatus for constructing a classification model for identifying different stages of heart failure based on acoustic features of claim 1, wherein the sample processing unit further comprises:
a synchronizing unit for synchronizing the plurality of voice digital signals to a uniform sampling rate by adopting a downsampling method;
the end point detection unit is used for carrying out end point detection on the voice digital signal after the unified sampling rate and distinguishing a voice area and a non-voice area;
the emphasis unit is used for emphasizing the high-frequency part of the voice area and increasing the high-frequency resolution of the voice;
and the framing and windowing unit is used for framing and windowing the emphasized voice region to obtain a plurality of voice signal segments.
4. The apparatus for constructing a classification model for identifying different stages of heart failure based on acoustic features of claim 1, wherein the sample processing unit further comprises:
An extraction unit for extracting a multi-dimensional first speech feature sample using an openSMILE open source toolkit and extracting a multi-dimensional second speech feature sample using python;
and the merging unit is used for merging the multi-dimensional first voice characteristic sample and the multi-dimensional second voice characteristic sample to obtain multi-class voice characteristic samples.
5. The apparatus for constructing a classification model identifying different stages of heart failure based on acoustic features of claim 4, wherein the first speech feature sample comprises: pitch characteristics, frequency perturbation characteristics, formant characteristics, amplitude perturbation characteristics, loudness characteristics, harmonic to noise characteristics, harmonic difference characteristics, alpha ratio characteristics, hammarberg coefficient characteristics, spectral slope characteristics, mel-frequency cepstrum characteristics, spectral flow characteristics, ratio characteristics of loudness peaks, continuous and unvoiced regions characteristics, equivalent sound level characteristics.
6. The apparatus for constructing a classification model based on acoustic features that identifies different stages of heart failure according to claim 4, wherein the second speech feature sample comprises: glottal noise excitation ratio characteristics, vocal cord excitation ratio characteristics, cyclic period density entropy characteristics, trend fluctuation elimination analysis characteristics, sample entropy characteristics and multi-scale entropy characteristics.
7. The apparatus for constructing a classification model for identifying different stages of heart failure based on acoustic features of claim 1, further comprising:
and the evaluation unit is used for carrying out model evaluation on the classification model according to a set aside method.
8. The apparatus for constructing a classification model for identifying different stages of heart failure based on acoustic features of claim 1, further comprising:
and the performance measurement unit is used for performing performance measurement on the classification model according to a preset index, wherein the preset index comprises error rate and accuracy rate, precision rate and recall rate, F1 value, specificity, sensitivity, ROC curve and AUC, and unweighted average recall rate.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following method when executing the computer program: converting the collected voice analog signals into voice digital signals, preprocessing the voice digital signals, and extracting features of the preprocessed voice digital signals to obtain multiple types of voice feature samples; constructing a classification model for identifying heart failure stage, and training and optimizing the classification model by utilizing the multi-class voice feature samples to obtain an optimal classification model, wherein the classification model for identifying heart failure stage A and heart failure stage B is an AdaBoost classification model based on an original variable; the classification model used for identifying the B phase and the C phase of heart failure is an AdaBoost classification model based on Lasso dimension reduction; the classification model used for identifying heart failure stage A and B and stage C is an AdaBoost classification model based on Lasso dimension reduction.
10. A computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, the computer program when executed by a processor implementing the method of: converting the collected voice analog signals into voice digital signals, preprocessing the voice digital signals, and extracting features of the preprocessed voice digital signals to obtain multiple types of voice feature samples; constructing a classification model for identifying heart failure stage, and training and optimizing the classification model by utilizing the multi-class voice feature samples to obtain an optimal classification model, wherein the classification model for identifying heart failure stage A and heart failure stage B is an AdaBoost classification model based on an original variable; the classification model used for identifying the B phase and the C phase of heart failure is an AdaBoost classification model based on Lasso dimension reduction; the classification model used for identifying heart failure stage A and B and stage C is an AdaBoost classification model based on Lasso dimension reduction.
CN202310205344.5A 2023-03-06 2023-03-06 Device and related components for constructing a classification model for identifying different stages of heart failure Pending CN116434739A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310205344.5A CN116434739A (en) 2023-03-06 2023-03-06 Device and related components for constructing a classification model for identifying different stages of heart failure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310205344.5A CN116434739A (en) 2023-03-06 2023-03-06 Device and related components for constructing a classification model for identifying different stages of heart failure

Publications (1)

Publication Number Publication Date
CN116434739A true CN116434739A (en) 2023-07-14

Family

ID=87078617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310205344.5A Pending CN116434739A (en) 2023-03-06 2023-03-06 Device and related components for constructing a classification model for identifying different stages of heart failure

Country Status (1)

Country Link
CN (1) CN116434739A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117727296A (en) * 2023-12-18 2024-03-19 杭州恒芯微电子技术有限公司 Speech recognition control system based on single fire panel
CN117727296B (en) * 2023-12-18 2024-08-09 杭州恒芯微电子技术有限公司 Speech recognition control system based on single fire panel
CN117898684A (en) * 2024-03-20 2024-04-19 北京大学 A heart failure condition monitoring method, device, equipment and readable storage medium
CN117898684B (en) * 2024-03-20 2024-06-18 北京大学 Method, device and equipment for monitoring heart failure illness state and readable storage medium

Similar Documents

Publication Publication Date Title
Cheng et al. Automated sleep apnea detection in snoring signal using long short-term memory neural networks
Dibazar et al. Feature analysis for automatic detection of pathological speech
CN102429662B (en) Screening system for sleep apnea syndrome in family environment
CN103280220B (en) A kind of real-time recognition method for baby cry
Panek et al. Acoustic analysis assessment in speech pathology detection
KR20240135018A (en) Multi-modal system and method for voice-based mental health assessment using emotional stimuli
CN110123367B (en) Computer device, heart sound recognition method, model training device, and storage medium
AU2013274940B2 (en) Cepstral separation difference
Reddy et al. The automatic detection of heart failure using speech signals
CN107657964A (en) Depression aided detection method and grader based on acoustic feature and sparse mathematics
Abou-Abbas et al. A fully automated approach for baby cry signal segmentation and boundary detection of expiratory and inspiratory episodes
CN116434739A (en) Device and related components for constructing a classification model for identifying different stages of heart failure
US20250194938A1 (en) Diagnosis of medical conditions using voice recordings and auscultation
CN105448291A (en) Parkinsonism detection method and detection system based on voice
Reggiannini et al. A flexible analysis tool for the quantitative acoustic assessment of infant cry
Qian et al. Automatic detection, segmentation and classification of snore related signals from overnight audio recording
Abou-Abbas et al. Expiratory and inspiratory cries detection using different signals' decomposition techniques
Das et al. Supervised model for Cochleagram feature based fundamental heart sound identification
CN113974607A (en) A sleep snore detection system based on spiking neural network
CN112820279A (en) Parkinson disease detection method based on voice context dynamic characteristics
Mittapalle et al. Glottal flow characteristics in vowels produced by speakers with heart failure
CN108682432A (en) Speech emotion recognition device
Touahria et al. Discrete Wavelet based Features for PCG Signal Classification using Hidden Markov Models.
Saloni et al. Disease detection using voice analysis: A review
Singh et al. IIIT-S CSSD: A cough speech sounds database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination