
CN114676249A - Data sample detection method and system for Alzheimer's disease - Google Patents

Data sample detection method and system for Alzheimer's disease

Info

Publication number
CN114676249A
Authority
CN
China
Prior art keywords
data
text
sentence
alzheimer
disease
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210193493.XA
Other languages
Chinese (zh)
Inventor
朱浩瑾
李嘉淳
孟岩
王韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiao Tong University
Original Assignee
Shanghai Jiao Tong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiao Tong University
Priority to CN202210193493.XA
Publication of CN114676249A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/16 Program or content traceability, e.g. by watermarking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/226 Validation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Technology Law (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a data sample detection method and system for Alzheimer's disease, comprising: performing passive and active multi-task data collection using Internet of Things (IoT) smart home devices; converting the speech data in the collected data to text, and authorizing the text information to be transmitted at each smart home node to obtain authorized text data; after verifying the authorized text data, extracting text data features and performing sentence coherence analysis; and aggregating and classifying the extracted features, then combining the classification with the sentence coherence analysis result to obtain the final detection result of the data sample for Alzheimer's disease. The invention has low dependence on equipment and can be deployed efficiently with simple data collection; it provides a security mechanism to secure the whole process; and multi-task assessment together with an effective feature extraction mechanism ensures high-precision data collection.

Description

Data sample detection method and system for Alzheimer's disease

Technical Field

The present invention relates to the technical field of medical devices, and in particular to a data sample detection method and system for Alzheimer's disease, together with a corresponding device and computer-readable storage medium.

Background

Alzheimer's disease is a progressive neurodegenerative disease with an insidious onset. Clinically, it is characterized by manifestations of global dementia such as memory impairment, aphasia, apraxia, agnosia, impaired visuospatial skills, executive dysfunction, and changes in personality and behavior. Collecting and testing physiological data related to Alzheimer's disease has always required specialized, expensive equipment and repeated hospital visits. Because patients are mainly middle-aged and elderly, early examination is very inconvenient. Alzheimer's disease is irreversible, so without early detection and prevention it has an extremely adverse effect on life and health.

Because patients with early Alzheimer's disease show slowed speech, forgetting of events and similar symptoms, the condition can manifest in speech and text. Machine learning can therefore be used to analyze collected data, detect subtle changes, and distinguish healthy people from patients. Current examination methods are mainly questionnaire-based manual examination and pathological examination based on medical equipment. Questionnaire-based (MMSE) manual examination for Alzheimer's disease leaves room for improvement in accuracy and cost, and often fails to catch the disease early enough for prevention. Pathological examination based on medical equipment, such as magnetic resonance imaging, may have difficulty detecting the disease at an early stage, relies on expensive medical equipment, is costly, and may pose other risks to the user's health.

Today, massive numbers of distributed IoT devices are widely deployed in smart home scenarios (Amazon Alexa, Google Home, and so on), which has greatly improved the user experience and solved many complex data collection problems. Using IoT technology to collect and test Alzheimer's-related data would therefore have great research value and application potential.

No description or report of technology similar to the present invention has been found, and no similar material has been collected in China or abroad.

Summary of the Invention

In view of the above deficiencies in the prior art, the present invention provides a data sample detection method and system for Alzheimer's disease, together with a corresponding device and computer-readable storage medium.

According to one aspect of the present invention, a data sample detection method for Alzheimer's disease is provided, comprising:

performing passive and active multi-task data collection using IoT smart home devices;

converting the speech data in the collected data to text, and authorizing the text information to be transmitted at each smart home node to obtain authorized text data;

after verifying the authorized text data, extracting text data features and performing sentence coherence analysis;

aggregating and classifying the extracted features, and combining the classification with the sentence coherence analysis result to obtain the final detection result of the data sample for Alzheimer's disease.

Preferably, the passive data collection comprises collecting the instructions given to smart home devices in daily use; the active data collection comprises collecting speech and text data for set tasks.

Preferably, converting the speech data in the collected data to text and authorizing the text information to be transmitted at each smart home node to obtain the authorized text data comprises:

preprocessing the collected data;

converting the speech data in the preprocessed data to text;

at each smart home node, signing and authorizing the text information to be transmitted to obtain the authorized text data.

Preferably, the preprocessing comprises:

filtering and denoising the collected data;

applying pre-emphasis to part of the collected data.

Preferably, signing and authorizing the text information to be transmitted to obtain the authorized text data comprises:

for the $i$-th smart home node, using the polynomial-time algorithm GroupGen to generate a cyclic group $G$ of order $p$ with generator $g$, where $G$ satisfies the DDH (decisional Diffie-Hellman) assumption;

randomly selecting two elements $a$ and $b$ from $[1, p]$, generating $g^{a}$ and $g^{b}$ and exchanging them, and computing $g^{ab}$ as the signature key;

signing the text information to be transmitted with the signature algorithm $\mathrm{Sig}(\cdot)$, and mapping the text information with the hash function $\mathrm{Hash}$, to obtain the authorized text data $\mathrm{Pre}(\mathrm{Data}_i)$:

$$\mathrm{Pre}(\mathrm{Data}_i) = \left[\mathrm{Trans}(\mathrm{Data}_i),\ \mathrm{Sig}\left(g^{ab}, \mathrm{Hash}(\mathrm{Trans}(\mathrm{Data}_i))\right)\right]$$

where $\mathrm{Pre}(\mathrm{Data}_i)$ is the authorized text data of the $i$-th smart home node, and $\mathrm{Trans}(\mathrm{Data}_i)$ is the text information transmitted by the $i$-th smart home node.
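The key agreement and signing step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the source does not name a concrete group, signature algorithm, or hash function, so the toy modulus `P`, the base `G`, SHA-256 as Hash, and HMAC-SHA256 as a stand-in for Sig(·) are all choices made here for illustration. The helper names `keygen`, `shared_key`, and `authorize` are hypothetical, and exponents are drawn from [1, P-1] rather than [1, p].

```python
import hashlib
import hmac
import secrets

# Toy parameters (assumption): a Mersenne prime modulus and a small base.
# A real deployment would use a vetted group in which DDH is believed to hold.
P = 2**127 - 1
G = 3


def keygen():
    """Pick a private exponent in [1, P-1] and return (private, public = g^x mod P)."""
    private = secrets.randbelow(P - 1) + 1
    return private, pow(G, private, P)


def shared_key(private, peer_public):
    """Derive g^(ab) mod P from one's own exponent and the peer's public value."""
    secret = pow(peer_public, private, P)
    return secret.to_bytes(16, "big")  # P < 2^127, so 16 bytes always suffice


def authorize(key, text):
    """Return Pre(Data_i) = [Trans(Data_i), Sig(g^ab, Hash(Trans(Data_i)))]."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()   # Hash(.)
    tag = hmac.new(key, digest, hashlib.sha256).hexdigest()  # stand-in for Sig(.)
    return text, tag


# The node holds a, the receiver holds b; g^a and g^b are exchanged.
a, ga = keygen()
b, gb = keygen()
key_node = shared_key(a, gb)
key_server = shared_key(b, ga)
packet = authorize(key_node, "turn on the living room light")
```

Because both sides derive the same key, the receiver can recompute the tag over the received text and compare it with the transmitted one.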

Preferably, verifying the authorized text data, then extracting text data features and performing sentence coherence analysis, comprises:

using a signature verification algorithm to check whether the text data has been tampered with; if the data has been tampered with, directly discarding the current computation result of the corresponding smart home edge node; if the data has not been tampered with, performing the corresponding feature extraction and sentence coherence analysis;

using a hierarchical attention network model to extract features from the text data at the word, sentence, and discourse levels;

performing sentence coherence analysis on the text data based on a Markov chain.

Preferably, the hierarchical attention network model comprises a word-level encoder layer, a word-level attention layer, a sentence-level encoder layer, and a sentence-level attention layer, wherein:

the word-level encoder layer encodes each word to obtain a hidden vector;

the word-level attention layer computes dot products on the selected hidden vectors to obtain attention weights;

the sentence-level encoder layer computes a weighted sum of the hidden vectors over the resulting sequence of hidden vectors to obtain the sentence vector;

the sentence-level attention layer computes dot products on the selected sentence vectors to obtain the attention weight of each sentence;

the sentence vectors are averaged, weighted by the sentence attention weights, to obtain and output the final discourse vector, thereby extracting features from words to sentences to the entire discourse.
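The two attention stages described above can be illustrated with a small pure-Python sketch. It assumes the hidden vectors have already been produced by some encoder (not shown), and it scores each vector by a dot product with a context vector, as is common in hierarchical attention networks; the source does not state what the dot product is taken against, so `word_context` and `sentence_context` are assumptions, as are the function names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def attention_pool(vectors, context):
    """Weight each vector by softmax(dot(vector, context)) and sum them."""
    weights = softmax([dot(v, context) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]
    return pooled, weights


def document_vector(doc_hidden, word_context, sentence_context):
    """doc_hidden: list of sentences, each a list of word hidden vectors.

    Word vectors are pooled into sentence vectors, then sentence vectors
    are pooled into one discourse vector.
    """
    sentence_vectors = [attention_pool(words, word_context)[0] for words in doc_hidden]
    return attention_pool(sentence_vectors, sentence_context)
```

In a trained model the context vectors would be learned parameters and a sentence-level encoder would sit between the two pooling steps; both are omitted here to keep the weighting logic visible.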

Preferably, performing sentence coherence analysis on the text data based on the Markov chain comprises:

extracting, based on word frequency, the most frequent words in the text data, recorded as $W_{set} = [w_1, w_2, \ldots, w_n]$;

for any text data, splitting it into sentences, retrieving each sentence $j$ that contains words in $W_{set}$, and extracting the corresponding words, recorded as the expression in Figure BDA0003525875000000031 (formula image, not reproduced in this text);

training the Markov chain on an existing text training dataset of Alzheimer's disease patients to obtain the corresponding transition probabilities; according to the transition probabilities, checking the corresponding words of sentence $j$ in the text data: if a transition is correct, the coherence score $Score_i$ of sentence $j$ is increased by 1, otherwise nothing is done;

normalizing the coherence scores of all sentences in the text data and partitioning them by a set threshold $Th$: each $Score_i$ greater than the threshold is set to 1 and each $Score_i$ less than the threshold is set to 0, giving the final sentence coherence analysis result.
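A minimal sketch of the Markov-chain coherence scoring described above is given below. Several details are assumptions, since the source leaves them open: a transition counts as "correct" when its trained probability is at least `min_prob`, normalization divides a sentence's score by its number of transitions, and tokenization is a simple lowercase regex suited to English text. All function names are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def top_words(texts, n):
    """W_set: the n most frequent words across the texts."""
    counts = Counter(w for t in texts for w in tokenize(t))
    return {w for w, _ in counts.most_common(n)}


def keyword_sequences(text, w_set):
    """Per sentence, the ordered words that belong to W_set."""
    return [[w for w in tokenize(s) if w in w_set] for s in split_sentences(text)]


def train_transitions(texts, w_set):
    """Estimate first-order transition probabilities between W_set words."""
    counts = defaultdict(Counter)
    for t in texts:
        for seq in keyword_sequences(t, w_set):
            for u, v in zip(seq, seq[1:]):
                counts[u][v] += 1
    return {u: {v: c / sum(nxt.values()) for v, c in nxt.items()}
            for u, nxt in counts.items()}


def coherence_labels(text, w_set, trans, min_prob=0.5, th=0.5):
    """Score each sentence, normalize, and binarize against threshold th."""
    labels = []
    for seq in keyword_sequences(text, w_set):
        pairs = list(zip(seq, seq[1:]))
        if not pairs:
            continue  # a sentence with fewer than two keywords has no transition
        score = sum(1 for u, v in pairs if trans.get(u, {}).get(v, 0.0) >= min_prob)
        labels.append(1 if score / len(pairs) > th else 0)
    return labels


train = ["the cat sat. the cat ran."]
w_set = top_words(train, 2)
trans = train_transitions(train, w_set)
labels = coherence_labels("the cat slept. cat the dog.", w_set, trans)
```

Since the source trains the chain on patient texts, a label of 1 here means the sentence's keyword transitions match the patterns seen in that training data.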

Preferably, a lightweight neural network is used to classify the aggregated features to obtain a classification result; the classification result is combined with the sentence coherence analysis result to obtain the final detection result of the data sample for Alzheimer's disease.

According to another aspect of the present invention, a data sample detection system for Alzheimer's disease is provided, comprising:

a multi-source data collection module, which performs passive and active multi-task data collection using IoT smart home devices;

a data preprocessing module, which converts the speech data in the collected data to text and authorizes the text information to be transmitted at each smart home node to obtain authorized text data;

a feature extraction module, which verifies the authorized text data, then extracts text data features and performs sentence coherence analysis;

a result detection module, which aggregates and classifies the extracted features and combines the classification with the sentence coherence analysis result to obtain the final detection result of the data sample for Alzheimer's disease.

According to a third aspect of the present invention, a device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, can perform any of the methods described above or run the system described above.

According to a fourth aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, can perform any of the methods described above or run the system described above.

By adopting the above technical solution, the present invention has at least one of the following beneficial effects compared with the prior art:

The data sample detection method and system for Alzheimer's disease provided by the present invention address the task of detecting data related to early Alzheimer's disease and can distinguish whether a subject has early Alzheimer's disease. The invention only needs the IoT devices already widely deployed in smart homes to collect user data in multi-task scenarios. Four tasks are defined: an instruction task, a description task, a recall task, and a writing task. They include passive and active scenarios and cover multi-source speech and text data, so more user features are available for the subsequent prediction and classification of Alzheimer's-related data, which helps improve the accuracy and validity of the final prediction results.

In feature extraction and final classification, the data sample detection method and system for Alzheimer's disease provided by the present invention also add data analysis of the differences between Alzheimer's disease patients and healthy people. In addition to preprocessing the collected data, a Markov-chain-based semantic coherence analysis is added, and a hierarchical attention network suited to this task is selected to extract discourse features. The invention selects multiple classification models and, based on the data prediction results, can report whether a subject matches the data characteristics of Alzheimer's disease. For feature collection and prediction of early Alzheimer's disease, it makes any one or more of the following contributions:

- Low dependence on equipment: data can be collected once multi-task coordination is deployed on the IoT devices in a smart home scenario, so the system can be deployed efficiently with simple data collection.

- A security mechanism is proposed to secure the whole process: a signature verification mechanism is added to prevent data from being tampered with during transmission.

- Multi-task assessment and an effective feature extraction mechanism ensure high-precision data collection. A hierarchical attention network is introduced for feature extraction, and a Markov chain is used to fit semantic coherence, improving system performance.

Description of the Drawings

Other features, objects, and advantages of the present invention will become more apparent from the detailed description of non-limiting embodiments given with reference to the following drawings:

FIG. 1 is a flowchart of a data sample detection method for Alzheimer's disease according to an embodiment of the present invention.

FIG. 2 is a schematic diagram of the component modules of a data sample detection system for Alzheimer's disease according to an embodiment of the present invention.

FIG. 3 is a schematic diagram of the operation of a data sample detection system for Alzheimer's disease according to a preferred embodiment of the present invention.

Detailed Description

Embodiments of the present invention are described in detail below. The embodiments are implemented on the basis of the technical solution of the present invention, and detailed implementations and specific operating procedures are given. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present invention, all of which fall within the scope of protection of the present invention.

FIG. 1 is a flowchart of a data sample detection method for Alzheimer's disease according to an embodiment of the present invention.

As shown in FIG. 1, the data sample detection method for Alzheimer's disease provided by this embodiment may include the following steps:

S100, performing passive and active multi-task data collection using IoT smart home devices;

S200, converting the speech data in the collected data to text, and authorizing the text information to be transmitted at each smart home node to obtain authorized text data;

S300, after verifying the authorized text data, extracting text data features and performing sentence coherence analysis;

S400, aggregating and classifying the extracted features, and combining the classification with the sentence coherence analysis result to obtain the final detection result of the data sample for Alzheimer's disease.

In a preferred embodiment of S100, the passive data collection may include collecting the instructions given to smart home devices in daily use, and the active data collection may include collecting speech and text data for set tasks. In a specific application example, the set tasks may include a description task, a recall task, and a writing task. Further, the passive and active multi-task data collection covers the speech data collected under the instruction, description, and recall tasks (this speech data is later processed into transcribed text) and the text data obtained under the writing task. The speech data is collected with the smart home's smart speakers, microphones, and similar devices, and the text data is collected with the help of devices such as the smart home's smart scanner.
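The four tasks and their properties can be captured in a small data structure. This is only an illustrative sketch: the type names (`Mode`, `Modality`, `Task`) and the helper `needs_transcription` are hypothetical, while the task names, their passive or active nature, and their data type follow the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    PASSIVE = "passive"  # collected during everyday device use
    ACTIVE = "active"    # collected while the user performs a set task


class Modality(Enum):
    SPEECH = "speech"    # captured by smart speakers or microphones
    TEXT = "text"        # captured with a smart scanner or similar device


@dataclass(frozen=True)
class Task:
    name: str
    mode: Mode
    modality: Modality


TASKS = [
    Task("instruction", Mode.PASSIVE, Modality.SPEECH),
    Task("description", Mode.ACTIVE, Modality.SPEECH),
    Task("recall", Mode.ACTIVE, Modality.SPEECH),
    Task("writing", Mode.ACTIVE, Modality.TEXT),
]


def needs_transcription(task):
    """Speech tasks must pass through speech-to-text before signing."""
    return task.modality is Modality.SPEECH
```

A pipeline could use `needs_transcription` to route only speech tasks through the speech-to-text step of S200.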

In a preferred embodiment of S200, converting the speech data in the collected data to text and authorizing the text information to be transmitted at each smart home node to obtain the authorized text data may include the following steps:

S201, preprocessing the collected data;

S202, converting the speech data in the preprocessed data to text;

S203, at each smart home node, signing and authorizing the text information to be transmitted to obtain the authorized text data.

In a preferred embodiment of S201, the preprocessing may include the following:

filtering and denoising the collected data;

applying pre-emphasis to part of the collected data.
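Pre-emphasis is commonly implemented as a first-order high-pass filter, y[n] = x[n] - alpha * x[n-1]. The sketch below shows that form; the source does not specify the filter or its coefficient, so the formula and the default `alpha=0.97` (a conventional value in speech processing) are assumptions.

```python
def pre_emphasis(samples, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample is passed through."""
    if not samples:
        return []
    out = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - alpha * prev)
    return out
```

The filter boosts high-frequency content relative to low-frequency content, which is typically done before speech recognition front ends.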

In a preferred embodiment of S203, in order to generate a signature key and ensure that the transmitted data is not tampered with, signing and authorizing the text information to be transmitted to obtain the authorized text data may include the following steps:

S2031, for the $i$-th smart home node, using the polynomial-time algorithm GroupGen to generate a cyclic group $G$ of order $p$ with generator $g$, where $G$ satisfies the DDH (decisional Diffie-Hellman) assumption;

S2032, randomly selecting two elements $a$ and $b$ from $[1, p]$, generating $g^{a}$ and $g^{b}$ and exchanging them, and computing $g^{ab}$ as the signature key;

S2033, signing the text information to be transmitted with the signature algorithm $\mathrm{Sig}(\cdot)$, and mapping the text information with the hash function $\mathrm{Hash}$, to obtain the authorized text data $\mathrm{Pre}(\mathrm{Data}_i)$:

$$\mathrm{Pre}(\mathrm{Data}_i) = \left[\mathrm{Trans}(\mathrm{Data}_i),\ \mathrm{Sig}\left(g^{ab}, \mathrm{Hash}(\mathrm{Trans}(\mathrm{Data}_i))\right)\right]$$

where $\mathrm{Pre}(\mathrm{Data}_i)$ is the authorized text data of the $i$-th smart home node, and $\mathrm{Trans}(\mathrm{Data}_i)$ is the text information transmitted by the $i$-th smart home node. The text information $\mathrm{Trans}(\mathrm{Data}_i)$ may be text obtained by speech transcription, or text obtained directly by a smart home device for the writing task.

In a preferred embodiment of S300, verifying the authorized text data, then extracting text data features and performing sentence coherence analysis, may include the following steps:

S301, using a signature verification algorithm to check whether the text data has been tampered with; if the data has been tampered with, directly discarding the current computation result of the corresponding smart home edge node; if the data has not been tampered with, performing the corresponding feature extraction and sentence coherence analysis;

S302, using a hierarchical attention network model to extract features from the text data at the word, sentence, and discourse levels;

S303, performing sentence coherence analysis on the text data based on a Markov chain.
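The verification gate in S301 can be sketched as follows. The source does not name the signature verification algorithm, so this example assumes an HMAC-SHA256 tag computed over a SHA-256 digest of the text with a shared key; `sign`, `verify`, and `accept_untampered` are hypothetical helper names.

```python
import hashlib
import hmac


def sign(key, text):
    """Tag = HMAC(key, SHA-256(text)); an assumed stand-in for Sig and Hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return hmac.new(key, digest, hashlib.sha256).hexdigest()


def verify(key, text, tag):
    """Constant-time comparison of the recomputed tag with the received tag."""
    return hmac.compare_digest(sign(key, text), tag)


def accept_untampered(key, packets):
    """Keep only texts whose tag verifies; tampered packets are dropped."""
    return [text for text, tag in packets if verify(key, text, tag)]
```

Only the texts returned by `accept_untampered` would go on to feature extraction and coherence analysis; anything else is discarded, as S301 requires.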

In a preferred embodiment of S302, the hierarchical attention network model includes a word-level encoder layer, a word-level attention layer, a sentence-level encoder layer, and a sentence-level attention layer, wherein:

The word-level encoder layer encodes each word to obtain a hidden vector.

The word-level attention layer computes a dot product with the selected hidden vectors to obtain the attention weights.

The sentence-level encoder layer computes a weighted sum of the hidden vectors, following their sequence, to obtain the sentence vector.

The sentence-level attention layer computes a dot product with the selected sentence vectors to obtain the attention weight of each sentence.

The sentence vectors are averaged, weighted by the sentence attention weights, to obtain and output the final discourse vector representation v, completing feature extraction from words to sentences to the entire discourse.

The resulting discourse vector can be classified by the softmax layer of a fully connected layer to obtain the output.
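The two attention stages described above can be illustrated with a minimal sketch. It assumes the encoder layers have already produced hidden vectors (the encoders themselves are omitted), and the context vectors `word_context` and `sentence_context`, like all function names here, are hypothetical stand-ins for the "selected" vectors mentioned in the text.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(vectors, context):
    # Dot-product attention: score each vector against the context vector,
    # turn the scores into weights, and return the weighted sum.
    scores = [sum(x * c for x, c in zip(vec, context)) for vec in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights

def discourse_vector(document, word_context, sentence_context):
    # document: list of sentences, each a list of word hidden vectors.
    sentence_vectors = [attention_pool(words, word_context)[0] for words in document]
    # Pool the sentence vectors into the discourse vector v.
    return attention_pool(sentence_vectors, sentence_context)
```

With an all-zero context vector every weight is equal, so the pooling reduces to a plain average; a trained context vector would instead emphasize the words and sentences most relevant to the classification.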

In a preferred embodiment of S303, performing sentence coherence analysis on the text data based on a Markov chain may include the following steps:

S3031: Based on word frequency, extract the most frequent words in the text data, and record them as W_set = [w_1, w_2, ..., w_n].

S3032: For any text data, split it into sentences, retrieve each sentence j that contains words from W_set, and extract the corresponding words, recorded as

Figure BDA0003525875000000071

S3033: Train the Markov chain on an existing text training dataset of Alzheimer's patients to obtain the corresponding transition probabilities. Using these transition probabilities, check the extracted words of sentence j in the text data: if a transition is correct, increase the coherence score Score_i of sentence j by 1; otherwise do nothing.

S3034: Normalize the coherence scores of all sentences of the text data and partition them by a set threshold Th: each Score_i above the threshold is set to 1 and each Score_i below it is set to 0, giving the final sentence coherence analysis result.
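Steps S3031 to S3034 can be sketched roughly as follows. Several details are assumptions made only for illustration: sentences are taken as pre-tokenized word lists, a transition counts as "correct" when the observed next keyword is the most probable successor under the trained chain, and scores are normalized by the maximum score. All function names are hypothetical.

```python
from collections import Counter, defaultdict

def top_words(sentences, n):
    # S3031: the n most frequent words form W_set.
    counts = Counter(w for s in sentences for w in s)
    return {w for w, _ in counts.most_common(n)}

def keyword_sequence(sentence, wset):
    # S3032: keep only the words of a sentence that belong to W_set.
    return [w for w in sentence if w in wset]

def train_transitions(sentences, wset):
    # S3033 (training): estimate transition probabilities between keywords.
    counts = defaultdict(Counter)
    for s in sentences:
        seq = keyword_sequence(s, wset)
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {prev: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
            for prev, ctr in counts.items()}

def coherence_labels(sentences, wset, transitions, th):
    scores = []
    for s in sentences:
        seq = keyword_sequence(s, wset)
        score = 0
        for prev, nxt in zip(seq, seq[1:]):
            probs = transitions.get(prev)
            # Assumed rule: the transition is correct if it is the likeliest one.
            if probs and max(probs, key=probs.get) == nxt:
                score += 1
        scores.append(score)
    # S3034: normalize (here by the maximum score) and threshold at th.
    top = max(scores, default=0)
    normalized = [x / top if top else 0.0 for x in scores]
    return [1 if x > th else 0 for x in normalized]
```

A stricter or looser notion of a correct transition (for example, any transition whose probability exceeds a cutoff) would fit the description equally well; the text does not pin this down.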

In a preferred embodiment of S400, a lightweight neural network classifies the aggregated features to obtain a classification result, and this result is combined with the sentence coherence analysis result to obtain the final detection result for the Alzheimer's disease data sample. The collected data sample is judged normal only when both results indicate a sample that meets normal conditions; otherwise it is judged a diseased sample.

The data sample detection method for Alzheimer's disease provided by the above embodiments of the present invention meets the needs of multi-scenario tasks and, through a signature verification scheme, achieves reliable, secure, and effective at-home detection of Alzheimer's disease data samples. The method is deployed in smart home scenarios, where a large number of IoT devices collect, analyze, and process multi-task data. A signature verification scheme is used during data transmission to ensure that the data is not tampered with in transit, effectively safeguarding the transmission. Finally, a machine learning model classifies the user data, providing an auxiliary data reference for the early detection of Alzheimer's disease.

A preferred embodiment of the present invention provides a data sample detection method for Alzheimer's disease. The method provided by this preferred embodiment can be deployed in a smart home and includes the following steps:

Step 1: multi-task data collection. Speech can be collected through the microphones built into smart home devices, using both passive and active collection. Passive collection covers everyday commands given to smart home devices, including wake words, conversations, and so on. Active collection covers the remaining tasks defined by the present invention.

Step 2: data preprocessing. The speech is first denoised, for which low-pass filters can be used. The collected data then undergoes speech-to-text processing. At each smart home node, the information to be transmitted is signed so that an attacker cannot tamper with it in transit.
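As a rough illustration of the audio preprocessing in this step, the sketch below pairs a crude moving-average low-pass filter with the pre-emphasis operation that the application example later mentions for unclear audio. The filter design, the window size, and the coefficient `alpha=0.97` are assumptions chosen for illustration; the source does not specify them.

```python
def moving_average_lowpass(samples, window=5):
    # Crude low-pass filter: each output is the mean of up to `window`
    # most recent samples, which smooths out high-frequency noise.
    out = []
    acc = 0.0
    for i, x in enumerate(samples):
        acc += x
        if i >= window:
            acc -= samples[i - window]
        out.append(acc / min(i + 1, window))
    return out

def pre_emphasis(samples, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]; boosts high-frequency content,
    # commonly applied before speech recognition.
    if not samples:
        return []
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]
```

In practice the two operations would be applied selectively, for example filtering every recording but pre-emphasizing only those judged unclear.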

Step 3: feature extraction. This module uses a hierarchical attention network to extract features from the transcribed text, from words to sentences to the discourse, so that features of the entire discourse are available for the final classification and detection. A Markov-chain-based model also analyzes the sentence coherence of the discourse, as a reference for the final classification.

Step 4: analysis and decision. The medical data center collects the aggregated features and uses a lightweight neural network and traditional simple classification models to classify and predict the results. Taking the earlier coherence analysis score into account, it reaches the final decision regarding Alzheimer's disease.

The method provided by this preferred embodiment is described in further detail below with reference to a specific application example.

In the method provided by this specific application example, commercial smart home IoT devices can be used for data collection. For N users, data is collected for four tasks. The first is passive command collection, with the data recorded as

Figure BDA0003525875000000091

denoting the data of the i-th user for the command task. The second is active description of a specified picture, with the data recorded as

Figure BDA0003525875000000092

denoting the data of the i-th user for the description task. The other two kinds of data correspond to the recall task and the writing task, recorded as

Figure BDA0003525875000000093

For a specific user User_i, the data finally obtained is

Figure BDA0003525875000000094

In the preprocessing stage, the audio is filtered, and pre-emphasis is applied to some unclear audio to make transcription into text easier. The method provided by this specific application example uses an open-source speech-to-text package to obtain the transcribed text Trans(Data_i). To ensure that transmitted data is secure and cannot be tampered with, a signature verification scheme is introduced. For the medical data center and the i-th smart home node, the GroupGen polynomial-time algorithm is first used to generate a cyclic group G of order p with generator g. The generated group must satisfy the DDH assumption (Decisional Diffie-Hellman Assumption). Two elements a and b are selected at random from [1, p]; g^a and g^b are generated and exchanged, and g^{ab} is computed as the signing key. The collected data is signed with the signature algorithm Sig(·), and the hash function Hash is used to map the data, yielding the new data:

Pre(Data_i) = [Trans(Data_i), Sig(g^{ab}, Hash(Trans(Data_i)))]

Here Pre(Data_i) is the authorized text data for the i-th user, that is, the data after preprocessing, speech-to-text transcription, and signature processing, ready for feature extraction.

During feature extraction, the medical data center first verifies the signature on the received data, using a polynomial-time verification algorithm to check whether the data has been tampered with. If the data has been tampered with, this round's computation result from the corresponding smart home edge node is discarded directly; feature extraction then proceeds on the data that passes verification.
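The key agreement, signing, and verification flow described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not name a concrete Sig(·) or Hash, so HMAC-SHA256 (a message authentication code, not a public-key signature) keyed by the shared secret stands in for Sig, and SHA-256 stands in for Hash. The group parameters `P` and `G` are toy values, not a group vetted for the DDH assumption, and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Toy parameters (assumption): a Mersenne prime modulus and a small base.
P = 2**127 - 1
G = 3

def keypair():
    # Pick a random exponent and publish G^exponent mod P.
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)

def shared_key(private, other_public):
    # Both sides derive the same key from g^{ab}.
    secret = pow(other_public, private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()

def pre(trans_text, key):
    # Pre(Data_i) = [Trans(Data_i), Sig(g^{ab}, Hash(Trans(Data_i)))]
    digest = hashlib.sha256(trans_text.encode("utf-8")).digest()
    tag = hmac.new(key, digest, hashlib.sha256).digest()
    return trans_text, tag

def verify(package, key):
    # Recompute the tag at the data center; a mismatch means tampering.
    text, tag = package
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    expected = hmac.new(key, digest, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)
```

A node would call `pre` on its transcribed text before transmission, and the medical data center would call `verify` and discard any package for which it returns `False`.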

The method provided by this specific application example extracts features based on the differences in speech and text between healthy people and people with Alzheimer's disease. Specifically, the feature extraction model is a hierarchical attention network consisting of four parts: a word-level encoder, a word-level attention layer, a sentence-level encoder, and a sentence-level attention layer. Each word is encoded to obtain a hidden vector, and a dot product with a pre-selected vector gives the attention weights. A weighted sum over the sequence of hidden vectors then gives the sentence vector. The subsequent step from sentences to the discourse is designed in the same way as the step from words to sentences, and finally yields the feature vector of the entire discourse.

A Markov chain is then introduced to examine semantic coherence, which is used to assist the final prediction. Specifically, the 50 most frequent words in Pre(Data_i) are first extracted based on word frequency; the value 50 is chosen with the size of the discourse and of the tasks in mind. The resulting words are recorded as W_set = [w_1, w_2, ..., w_50]. A given user's data is split into sentences, and the sentences containing words from W_set are retrieved and the corresponding words extracted. A sentence that contains no word from W_set is not recorded; otherwise it is recorded. For example, if the j-th sentence contains k words from W_set, it is recorded as

Figure BDA0003525875000000101

The Markov chain is trained on an existing text dataset of Alzheimer's patients to obtain the corresponding transition probabilities, that is, for a given word in W_set, which W_set word should appear next. This rule is compared against the records in the currently collected sample: if a transition is correct, the sample coherence score Score_i of the i-th user is increased by 1; if it does not match, nothing is done. Finally, all of the user's sample coherence scores are normalized and partitioned by a set threshold Th: a Score_i above the threshold is set to 1 and one below it is set to 0, to assist the final classification result.

After feature extraction, a classification task produces the final Alzheimer's disease prediction. A lightweight neural network and traditional classification methods (such as random forest, k-nearest neighbors, and support vector machines) are used for classification, and the one with the best classification performance is selected. In this specific application example, the collected data must be divided into a training set and a test set, for example in a 7:3 ratio. Lightweight algorithms allow fast classification, producing a result of 0 or 1. A result of 0 means the subject's data sample is classified as meeting the conditions for Alzheimer's disease; 1 means it is classified as meeting the conditions for a healthy person. The final result is obtained by combining this with the earlier Markov chain analysis: if both results are 1, the final output is 1, meaning the subject belongs to the healthy population; all other cases are assigned to the diseased population, so as to avoid missed detections as far as possible.
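The 7:3 split and the conservative combination rule described above can be written down in a few lines. This is a minimal sketch; the function names and the fixed shuffle seed are assumptions, and the classifier itself is left out.

```python
import random

def split_7_3(samples, seed=0):
    # Shuffle a copy of the samples and cut it 70% / 30%.
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]

def final_label(classifier_label, coherence_label):
    # 1 (healthy) only when both results are 1; every other combination
    # is reported as 0 (diseased) to reduce missed detections.
    return 1 if classifier_label == 1 and coherence_label == 1 else 0
```

Requiring agreement before reporting a healthy result trades more false alarms for fewer missed cases, which matches the stated goal of avoiding missed detections.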

FIG. 2 is a schematic diagram of the component modules of a data sample detection system for Alzheimer's disease according to an embodiment of the present invention.

As shown in FIG. 2, the data sample detection system for Alzheimer's disease provided by this embodiment may include the following modules:

A multi-source data acquisition module, which performs passive and active multi-task data collection based on IoT smart home devices;

A data preprocessing module, which performs speech-to-text processing on the speech data in the collected data and, at each smart home node, authorizes the text information to be transmitted, obtaining the authorized text data;

A feature extraction module, which verifies the authorized text data, extracts text data features, and performs sentence coherence analysis;

A result detection module, which aggregates the extracted features, classifies them, and combines the classification with the sentence coherence analysis result to obtain the final detection result for the Alzheimer's disease data sample.

The operation of the data sample detection system for Alzheimer's disease provided by the above embodiments of the present invention is further described below with reference to the accompanying drawings.

As shown in FIG. 3, in a preferred embodiment of this embodiment:

Multi-source data acquisition module:

Data is collected for multiple tasks and submitted to the medical data center for processing and management.

To ensure that the collected data is valid and diverse, this module defines four tasks: the passively collected command task, and the actively collected description, recall, and writing tasks. Together they gather multi-source data from users broadly.

Data collection relies on the IoT devices widely deployed in smart home scenarios, which makes collection convenient.

Data preprocessing module:

This module performs speech-to-text processing on the aggregated data and signs the information to be transmitted.

An open-source speech-to-text package processes the collected speech data. Because the collected data may differ in clarity and number of channels, preprocessing is required.

A signature mechanism is added: the data is signed at the collection end, laying the groundwork for later security verification.

Feature extraction module:

At the medical data center, the signature on the received data is verified to determine whether it comes from a legitimate user. A hierarchical attention network then extracts useful features, and semantic coherence analysis is performed, yielding the final aggregated features.

The signatures on the collected data are verified to determine whether the data comes from legitimate users. Features are then extracted from the valid content: a network model suited to this scenario is selected, and suitable features are extracted for classification in the next module.

Markov-chain-based semantic coherence analysis is added, producing a coherence score that serves as a reference for the final classification.

Result detection module:

At the medical data center, a neural network and other simple classifier models are trained on the data and used for classification, providing an auxiliary data reference as to whether a user has the disease.

The collected features are classified with a neural network or other common classification models, such as random forest, k-nearest neighbors, and support vector machines. The coherence score given by the previous module is taken into account to produce the final classification result, supporting data-based detection of early Alzheimer's disease.

It should be noted that the steps of the method provided by the present invention can be implemented with the corresponding modules, devices, units, and so on of the system, and that those skilled in the art can implement the composition of the system by referring to the technical solution of the method. In other words, the embodiments of the method can be understood as preferred examples for constructing the system, and are not repeated here.

An embodiment of the present invention provides a device including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it can perform the method of any of the above embodiments of the present invention, or run the system of any of the above embodiments.

An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program can perform the method of any of the above embodiments of the present invention, or run the system of any of the above embodiments.

Optionally, the memory is used to store programs. The memory may include volatile memory, for example random-access memory (RAM) such as static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM). The memory may also include non-volatile memory, for example flash memory. The memory stores computer programs (such as application programs and functional modules implementing the above methods), computer instructions, and so on, which may be stored in partitions across one or more memories. These computer programs, computer instructions, and data can be called by the processor.

The processor executes the computer program stored in the memory to implement the steps of the method in the above embodiments. For details, refer to the relevant descriptions in the preceding method embodiments.

The processor and the memory may be separate structures or may be integrated into a single structure. When they are separate structures, the memory and the processor can be coupled through a bus.

The data sample detection method and system for Alzheimer's disease provided by the above embodiments of the present invention consider the user's performance across multiple tasks, cover both active and passive detection, and collect data with devices that are widely deployed in smart home scenarios, which makes the approach convenient, reliable, and low-cost. The underlying principle is that people with the disease and healthy people differ in language, thinking, and writing. For example, the speech of people with the disease contains many pauses and gaps and is less logical. These fine-grained differences can be captured by the feature analysis proposed in the above embodiments. Manual examination based on questionnaires and pathological examination based on medical equipment have already been proposed, but they have many shortcomings and struggle to meet the requirements for detecting and preventing early Alzheimer's disease. Existing solutions also lack any security protection mechanism: the user's speech text and other private data (such as age and gender) may be leaked during diagnosis, and the data may be tampered with. The above embodiments therefore introduce a signature mechanism to ensure the reliability of the data source.

The data sample detection method and system for Alzheimer's disease provided by the above embodiments of the present invention have the following characteristics. Low device dependence: data collection relies only on IoT devices deployed in smart home scenarios. Security mechanism: users' health data is protected from leakage, and the data is provided securely. High accuracy and low cost: compared with existing Alzheimer's disease data detection solutions, multi-task data collection draws on more sources and is easy to deploy, achieving good detection accuracy at low cost.

Matters not covered in the above embodiments of the present invention are well known in the art.

Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to these specific embodiments, and that those skilled in the art can make various variations or modifications within the scope of the claims without affecting the substance of the present invention.

Claims (10)

1. A data sample detection method for Alzheimer's disease, characterized by comprising the following steps:
performing passive and active multi-task data collection based on IoT smart home devices;
performing speech-to-text processing on the speech data in the collected data, and authorizing the text information to be transmitted at each smart home node to obtain authorized text data;
after verifying the authorized text data, extracting text data features and performing sentence coherence analysis;
aggregating the extracted features, classifying them, and combining the classification with the sentence coherence analysis result to obtain a final detection result for the Alzheimer's disease data sample.
2. The data sample detection method for Alzheimer's disease according to claim 1, wherein the passive data collection comprises: collecting everyday command information given to the smart home devices; and the active data collection comprises: collecting speech and text data for set tasks.
3. The method according to claim 1, wherein performing speech-to-text processing on the speech data in the collected data, and authorizing the text information to be transmitted at each smart home node to obtain the authorized text data, comprises:
preprocessing the collected data;
performing speech-to-text processing on the speech data in the preprocessed data;
at each smart home node, signing and authorizing the text information to be transmitted to obtain the authorized text data.
4. The data sample detection method for Alzheimer's disease according to claim 3, further comprising any one or more of the following:
- the preprocessing comprises:
filtering and denoising the collected data;
applying pre-emphasis to part of the collected data;
- the signing and authorizing of the text information to be transmitted to obtain the authorized text data comprises:
for the i-th smart home node, generating a cyclic group G using the GroupGen polynomial-time algorithm, wherein the order of the cyclic group G is p, the generator is g, and the cyclic group G satisfies the DDH assumption;
randomly selecting two elements a and b from [1, p], generating g^a and g^b and exchanging them, and obtaining g^{ab} as the signing key;
signing the text information to be transmitted using the signature algorithm Sig(·), and mapping the text information using the hash function Hash, to obtain the authorized text data Pre(Data_i):
Pre(Data_i) = [Trans(Data_i), Sig(g^{ab}, Hash(Trans(Data_i)))]
wherein Pre(Data_i) is the authorized text data of the i-th smart home node, and Trans(Data_i) is the text information transmitted by the i-th smart home node.
5. The data sample detection method for Alzheimer's disease according to claim 1, wherein verifying the authorized text data, extracting text data features, and performing sentence coherence analysis comprises:
verifying whether the text data has been tampered with using a signature verification algorithm; if the data has been tampered with, directly discarding this round's computation result from the corresponding smart home edge node; if the data has not been tampered with, performing the corresponding feature extraction and sentence coherence analysis;
extracting features from the text data, from words to sentences to the discourse, using a hierarchical attention network model;
performing sentence coherence analysis on the text data based on a Markov chain.
6. The data sample detection method for Alzheimer's disease according to claim 5, further comprising any one or more of the following:
- the hierarchical attention network model comprises: a word-level encoder layer, a word-level attention layer, a sentence-level encoder layer, and a sentence-level attention layer; wherein:
the word-level encoder layer encodes each word to obtain a hidden vector;
the word-level attention layer computes a dot product with the selected hidden vectors to obtain the attention weights;
the sentence-level encoder layer computes a weighted sum of the hidden vectors, following their sequence, to obtain the sentence vector;
the sentence-level attention layer computes a dot product with the selected sentence vectors to obtain the attention weight of each sentence;
the sentence vectors are averaged, weighted by the sentence attention weights, to obtain and output the final discourse vector, completing feature extraction from words to sentences to the entire discourse;
- the sentence coherence analysis of the text data based on a Markov chain comprises:
extracting the most frequent words in the text data based on word frequency, and recording the resulting words as W_set = [w_1, w_2, ..., w_n];
for any text data, splitting it into sentences, retrieving each sentence j that contains words from W_set, and extracting the corresponding words, recorded as
Figure FDA0003525874990000021
training a Markov chain on an existing text training dataset of Alzheimer's patients to obtain the corresponding transition probabilities; checking the extracted words of sentence j in the text data against the transition probabilities, and if a transition is correct, increasing the coherence score Score_i of sentence j by 1, and otherwise doing nothing;
normalizing the coherence scores of all sentences of the text data and partitioning them by a set threshold Th, setting each Score_i above the threshold to 1 and each Score_i below the threshold to 0, to obtain the final sentence coherence analysis result.
7. The method for detecting the data samples of the alzheimer's disease as claimed in claim 1, wherein the aggregated features are classified by using a lightweight neural network to obtain a classification result; and combining the classification result with the sentence continuity analysis result to obtain a final detection result of the data sample aiming at the Alzheimer disease.
8. A data sample detection system for Alzheimer's disease, comprising:
a multi-source data acquisition module, configured to perform passive and active multi-task data acquisition based on Internet of Things smart home devices;
a data preprocessing module, configured to perform speech-to-text processing on voice data in the collected data, to authorize the text information to be transmitted at each smart home node, and to obtain the authorized text data;
a feature extraction module, configured to extract features of the text data and perform sentence consistency analysis after the authorized text data has been verified;
and a result detection module, configured to summarize and aggregate the extracted features, classify them, and obtain the final detection result of the data sample for Alzheimer's disease by combining the sentence consistency analysis result.
9. An apparatus comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the program, when executed by the processor, is operable to perform the method of any one of claims 1 to 7 or to operate the system of claim 8.
10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, is operable to perform the method of any one of claims 1 to 7 or to operate the system of claim 8.
CN202210193493.XA 2022-03-01 2022-03-01 Data sample detection method and system for Alzheimer's disease Pending CN114676249A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210193493.XA CN114676249A (en) 2022-03-01 2022-03-01 Data sample detection method and system for Alzheimer's disease

Publications (1)

Publication Number Publication Date
CN114676249A true CN114676249A (en) 2022-06-28

Family

ID=82072304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210193493.XA Pending CN114676249A (en) 2022-03-01 2022-03-01 Data sample detection method and system for Alzheimer's disease

Country Status (1)

Country Link
CN (1) CN114676249A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180165554A1 (en) * 2016-12-09 2018-06-14 The Research Foundation For The State University Of New York Semisupervised autoencoder for sentiment analysis
KR101998881B1 (en) * 2018-05-03 2019-07-10 주식회사 에프티에치코리아 Old man dementia prevention and safety management system
US20200300972A1 (en) * 2015-07-17 2020-09-24 Origin Wireless, Inc. Method, apparatus, and system for vital signs monitoring using high frequency wireless signals
CN113569001A (en) * 2021-01-29 2021-10-29 腾讯科技(深圳)有限公司 Text processing method, apparatus, computer device, and computer-readable storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200300972A1 (en) * 2015-07-17 2020-09-24 Origin Wireless, Inc. Method, apparatus, and system for vital signs monitoring using high frequency wireless signals
US20180165554A1 (en) * 2016-12-09 2018-06-14 The Research Foundation For The State University Of New York Semisupervised autoencoder for sentiment analysis
KR101998881B1 (en) * 2018-05-03 2019-07-10 주식회사 에프티에치코리아 Old man dementia prevention and safety management system
CN113569001A (en) * 2021-01-29 2021-10-29 腾讯科技(深圳)有限公司 Text processing method, apparatus, computer device, and computer-readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIACHUN LI: "A Federated Learning Based Privacy-Preserving Smart Healthcare System", IEEE, 20 July 2021 (2021-07-20), pages 2021 - 2027 *
JIN ZHUXIN; QIN FEIWEI; FANG MEI'E: "Early diagnosis of Alzheimer's disease assisted by deep transfer learning", Computer Applications and Software, no. 05, 12 May 2019 (2019-05-12) *

Similar Documents

Publication Publication Date Title
Tsanas et al. Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease
Chen et al. Automatic detection of Alzheimer’s disease using spontaneous speech only
CN102723078B (en) Emotion speech recognition method based on natural language comprehension
CN108550375A (en) A kind of emotion identification method, device and computer equipment based on voice signal
CN111329494B (en) Depression reference data acquisition method and device
Niu et al. A time-frequency channel attention and vectorization network for automatic depression level prediction
Qian et al. Automatic detection, segmentation and classification of snore related signals from overnight audio recording
Wang et al. Automatic assessment of pathological voice quality using multidimensional acoustic analysis based on the GRBAS scale
Khan et al. Battling voice spoofing: a review, comparative analysis, and generalizability evaluation of state-of-the-art voice spoofing counter measures
Ravi et al. A step towards preserving speakers’ identity while detecting depression via speaker disentanglement
Xue et al. Cross-modal information fusion for voice spoofing detection
Qin et al. Automatic speech assessment for aphasic patients based on syllable-level embedding and supra-segmental duration features
Chen et al. Deep learning in automatic detection of dysphonia: Comparing acoustic features and developing a generalizable framework
Li et al. Unsupervised latent behavior manifold learning from acoustic features: Audio2behavior
CN116978409A (en) Depression state evaluation method, device, terminal and medium based on voice signal
Ding et al. Speech based detection of Alzheimer’s disease: a survey of AI techniques, datasets and challenges
Wang et al. Mixture of experts fusion for fake audio detection using frozen wav2vec 2.0
Kang et al. Retrieval-augmented audio deepfake detection
Ma et al. ClearSpeech: Improving Voice Quality of Earbuds Using Both In-Ear and Out-Ear Microphones
Zhang et al. A retrieval method for encrypted speech based on improved power normalized cepstrum coefficients and perceptual hashing
Yagnavajjula et al. Detection of neurogenic voice disorders using the fisher vector representation of cepstral features
CN114676249A (en) Data sample detection method and system for Alzheimer's disease
Li et al. SONAR: A Synthetic AI-Audio Detection Framework and Benchmark
Lopez‐Otero et al. Influence of speaker de‐identification in depression detection
Fayad et al. Vocal test analysis for assessing Parkinson's disease at early stage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination