
TW202135048A - Method of extracting acoustic features in a disease estimation program, and disease estimation program and an apparatus using the acoustic features - Google Patents


Info

Publication number
TW202135048A
TW202135048A (Application number TW110100726A)
Authority
TW
Taiwan
Prior art keywords
disease
voice
estimation
sound
voice feature
Prior art date
Application number
TW110100726A
Other languages
Chinese (zh)
Inventor
大宮康宏
熊本賴夫
Original Assignee
日商生命科學研究所股份有限公司
日商Pst股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日商生命科學研究所股份有限公司, 日商Pst股份有限公司 filed Critical 日商生命科學研究所股份有限公司
Publication of TW202135048A

Classifications

    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B10/00Instruments for taking body samples for diagnostic purposes; Other methods or instruments for diagnosis, e.g. for vaccination diagnosis, sex determination or ovulation-period determination; Throat striking implements
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Acoustics & Sound (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • Signal Processing (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • Veterinary Medicine (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

Provided are an apparatus for estimating a plurality of psychiatric and neurological disorders by means of speech analysis, the apparatus being equipped with means for extracting acoustic features unaffected by the location of speech acquisition, and a method of operating the estimation apparatus.

Description

Method for extracting acoustic feature quantities in a disease estimation program, and disease estimation program and apparatus using the acoustic feature quantities

The present invention relates to a method for extracting environment-independent acoustic feature quantities in a disease estimation program, and to a disease estimation program and a disease estimation apparatus that use such environment-independent acoustic feature quantities.

Techniques for estimating emotion by analyzing a subject's voice are becoming widespread. Patent Document 1 discloses a technique that converts the subject's voice into a frequency spectrum, slides it along the frequency axis to obtain an autocorrelation waveform, and calculates the fundamental frequency from that waveform to estimate the emotional state.

Prior art documents

Patent documents

Patent Document 1: International Patent Publication No. WO 2006/132159

Problem to be solved by the invention

However, when a user inputs speech in a room such as a home or a medical institution, acoustic interference may occur depending on where the speech is acquired, owing to sound reflected from the walls, floor, ceiling, and other surfaces of the room. Such interference can alter the acoustic feature quantities extracted from the input speech and reduce the accuracy of disease estimation, but Patent Document 1 does not address this problem.

In addition, the apparatus of Patent Document 1 goes no further than estimating the user's emotional state, and does not mention a program for estimating psychiatric or neurological diseases (hereinafter sometimes called neuropsychiatric diseases). In general, because no effective biomarkers exist, it is difficult to single out one disease from among multiple types of neuropsychiatric disease.

For example, according to the diagnostic criteria in the DSM-5 manual published by the American Psychiatric Association (APA), the distinction between dementia with Lewy bodies and Parkinson's disease depends on where the Lewy bodies occur, so the symptoms can be similar. Likewise, Alzheimer's disease and frontotemporal dementia, Alzheimer's disease and dementia with Lewy bodies, dementia with Lewy bodies and Parkinson's disease, and bipolar disorder and major depressive disorder are difficult to tell apart.

In addition, patients often exhibit depressive symptoms, which are an early sign of dementia. Conversely, a patient with depressive pseudodementia is actually depressed, but the symptoms appear as cognitive decline. It is therefore difficult to determine whether a patient has some type of dementia, has depression, or has a combination of the two in which one set of symptoms dominates.

If the disease cannot be inferred from among multiple candidate psychiatric and neurological diseases, the patient may choose the wrong medical institution to visit and inadvertently allow the symptoms to worsen.

Accordingly, an object of the present invention is to provide, in an apparatus that estimates multiple psychiatric and neurological diseases through speech analysis, an estimation apparatus having means for extracting acoustic feature quantities unaffected by the place where the speech is acquired, and a method of operating that estimation apparatus.

Technical means for solving the problem

As a result of intensive research into the above problems, the inventors of the present application arrived at an estimation apparatus for estimating multiple psychiatric and neurological diseases, the apparatus having means for extracting acoustic feature quantities unaffected by the place where the speech is acquired, and at a method of operating that apparatus.

In other words, the present invention includes the following aspects.

[1] An apparatus for estimating psychiatric and neurological diseases, comprising:
an extraction unit that, based on acoustic feature quantities (A) that show no significant difference across recording environments and acoustic feature quantities (B) related to the various diseases, extracts acoustic feature quantities (C) common to (A) and (B);
a calculation unit that calculates disease prediction values based on the acoustic feature quantities (C); and
an estimation unit that estimates a disease by taking the disease prediction values as input.

[2] The estimation apparatus according to [1] above, wherein the candidate diseases that can be estimated include Neurocognitive Disorder Due to Alzheimer's Disease, Neurocognitive Disorder with Lewy Bodies, Parkinson's Disease, Bipolar Disorder, Depressive Disorder with Atypical Features, and Major Depressive Disorder.

[3] The estimation apparatus according to [1] above, wherein one of the candidate diseases that can be estimated is major depressive disorder.

[4] A method of operating an estimation apparatus, comprising:
a step in which the extraction unit of the estimation apparatus, based on acoustic feature quantities (A) that show no significant difference across recording environments and acoustic feature quantities (B) related to the various diseases, extracts acoustic feature quantities (C) common to (A) and (B);
a step in which the calculation unit of the estimation apparatus calculates disease prediction values based on the acoustic feature quantities (C); and
a step in which the estimation unit of the estimation apparatus estimates a disease by taking the disease prediction values as input.

Effect of the invention

The present invention can provide a method for extracting, in a program that estimates multiple psychiatric and neurological diseases, acoustic feature quantities unaffected by the place where the speech is acquired, and an apparatus using that method.

In the following, the apparatus of the present invention for estimating multiple psychiatric and neurological diseases will be described in detail. The constituent requirements described below are one embodiment of the present invention. In the description below, the disease prediction value may be referred to as a "mental value".

<1. Program>

The estimation apparatus 200 of this embodiment is realized, for example, by a computer 100 having the configuration shown in FIG. 1, and the description below uses this example. FIG. 1 is a hardware configuration diagram showing an example of a computer that implements the functions of the estimation apparatus 200. The computer 100 has a central processing unit (CPU) 101, random access memory (RAM) 102, read-only memory (ROM) 103, a hard disk drive (HDD) 104, a communication interface (I/F) 105, an input/output interface 106, and a media interface 107.

The CPU 101 operates based on programs stored in the ROM 103 or the HDD 104 and controls each part. When the computer 100 starts, the CPU 101 executes a boot program stored in the ROM 103, programs that depend on the hardware of the computer 100, and so on.

The HDD 104 stores the programs executed by the CPU 101 and the data those programs use. The communication interface 105 receives data from other devices via a network N and passes it to the CPU 101, and sends data generated by the CPU 101 to other devices.

Via the input/output interface 106, the CPU 101 controls output devices such as a display, voice input devices such as a microphone, and input devices such as a keyboard and mouse. The CPU 101 acquires voice data from the input devices via the input/output interface 106, and outputs generated data to the output devices via the same interface.

The media interface 107 reads programs or data stored on a recording medium 108 and provides them to the CPU 101 via the RAM 102. The CPU 101 loads a program from the recording medium 108 into the RAM 102 via the media interface 107 and executes the loaded program. The recording medium 108 is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a magnetic tape medium, a magnetic recording medium, a semiconductor memory, or the like.

For example, when the computer 100 runs the estimation apparatus 200 according to the embodiment, the CPU 101 of the computer 100 implements the functions of the control unit by executing the program loaded into the RAM 102, and the data of the memory unit is stored in the HDD 104. The CPU 101 of the computer 100 reads these programs from the recording medium 108 and executes them, but as another example the programs may be obtained from another device.

<2. Configuration of the estimation apparatus>

Next, the configuration of the estimation apparatus 200 according to the embodiment will be described with reference to FIG. 2. As shown in FIG. 2, the estimation apparatus 200 is communicably connected to a client terminal 201 via the network N in a wired or wireless manner. The estimation apparatus 200 may also be connected to multiple client terminals 201.

As shown in FIG. 2, the estimation apparatus 200 includes a communication unit 202, an acoustic feature extraction unit 203 having a first acoustic feature extraction unit 204 and a second acoustic feature extraction unit 205, a calculation unit 206, an estimation unit 207, and a memory unit 208. The acoustic feature extraction unit 203, the calculation unit 206, and the estimation unit 207 are executed by a processing unit (CPU) and cooperate with one another as a control unit (not shown).

The communication unit 202 is realized by, for example, a network interface card (NIC) or the like. The communication unit 202 is connected to the network N by wire or wirelessly, and sends information to and receives information from the client terminal 201.

The control unit can be realized by, for example, a central processing unit (CPU) or a micro processing unit (MPU) executing the various programs stored in the memory unit 208 while using the RAM as a working area. The control unit may also be realized by an integrated circuit such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).

The memory unit 208 is realized by, for example, a semiconductor storage element such as RAM or flash memory, or a storage device such as a hard disk or optical disk. The client terminal 201 includes a voice input unit and an estimation result output unit. The estimation apparatus 200 acquires the user's voice from the input unit, converts the voice from an analog signal into digital voice data, and stores the voice data in the memory unit 208 via the communication unit 202.

The input unit acquires the voice signal uttered by the subject through a voice acquisition unit such as a microphone, and generates digital voice data by sampling the voice signal at a predetermined sampling frequency (for example, 11025 Hz). The input unit may include its own memory unit for recording voice data separately from the memory unit 208 of the estimation apparatus 200; in that case, the input unit may be a portable recorder. The memory unit of the input unit may be a recording medium such as a CD, DVD, USB memory, SD card, or MiniDisc.

The output unit includes a receiving unit that receives data such as the estimation result and a display unit that displays that data. The display unit is a display that shows data such as the estimation result, and may be an organic electroluminescence (EL) display, a liquid crystal display, or the like.

<<Extraction unit 203>>

The extraction unit 203 has the first acoustic feature extraction unit 204 and the second acoustic feature extraction unit 205. Here, the first acoustic feature extraction unit 204 is used to create a set of first acoustic feature quantities. Multiple healthy subjects utter the same speech content at multiple locations in advance; the acquired speech is labeled and normalized, then analyzed to extract multiple feature quantities. These feature quantities are compared by a paired t-test, and the group of acoustic feature quantities that show no significant difference between any locations is defined as the set of first acoustic feature quantities. As an example of a set with no significant difference, acoustic feature quantities whose P value in the paired t-test exceeds 0.05 are preferable, and those whose P value exceeds 0.1 are more preferable. The theoretical range of the P value is 0 to 1, and the significance level is usually set at 0.05.
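The screening step just described can be sketched as follows. The feature names and site data are invented, and the p-value is obtained with a standard normal approximation to the t distribution purely for brevity; a production version would use the exact t distribution (e.g. `scipy.stats.ttest_rel`).

```python
import math

def paired_t_pvalue(x, y):
    """Two-sided paired t-test.

    The t statistic is converted to a p-value with a standard normal
    approximation, adequate only for larger samples; a real implementation
    would use the t distribution (e.g. scipy.stats.ttest_rel).
    """
    n = len(x)
    diffs = [a - b for a, b in zip(x, y)]
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    t = mean / math.sqrt(var / n)
    return math.erfc(abs(t) / math.sqrt(2))

def location_independent_features(by_site, p_threshold=0.05):
    """Keep a feature only if no pair of recording sites differs significantly.

    by_site maps site name -> {feature name: [per-subject values]}; subjects
    appear in the same order at every site, so the comparison is paired.
    """
    sites = list(by_site)
    kept = []
    for feature in by_site[sites[0]]:
        pvals = [
            paired_t_pvalue(by_site[s1][feature], by_site[s2][feature])
            for i, s1 in enumerate(sites)
            for s2 in sites[i + 1:]
        ]
        if all(p > p_threshold for p in pvals):
            kept.append(feature)
    return kept

# Invented per-subject feature values measured at two sites:
data = {
    "site1": {"stable": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
              "roomy":  [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    "site2": {"stable": [1.1, 1.9, 3.05, 3.95, 5.1, 5.9],
              "roomy":  [2.0, 3.02, 3.98, 5.01, 5.99, 7.0]},
}
kept = location_independent_features(data)  # "roomy" shifts with the room
```

Here "roomy" is rejected because it is systematically offset between the two sites, while "stable" survives the screening.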

The set of first acoustic feature quantities is stored in the memory unit 208. This set may be used together with the set of second acoustic feature quantities described later, or the set of first acoustic feature quantities alone may be used as the environment-independent feature quantities.

The second acoustic feature extraction unit 205 creates a set of second acoustic feature quantities. Multiple healthy subjects utter different speech content at multiple locations in advance; the acquired speech is labeled and normalized, then analyzed to extract multiple feature quantities. These feature quantities are compared by an unpaired t-test, and the group of acoustic feature quantities that show no significant difference between any locations is defined as the set of second acoustic feature quantities. As an example of a set with no significant difference, acoustic feature quantities whose P value in the unpaired t-test exceeds 0.05 are preferable, and those whose P value exceeds 0.1 are more preferable.

The set of second acoustic feature quantities is stored in the memory unit 208. This set may be used together with the set of first acoustic feature quantities, or the set of second acoustic feature quantities alone may be used as the environment-independent feature quantities.

Next, the basis for setting the P-value threshold will be explained. FIG. 6 shows an example of acoustic feature quantities, extracted by analyzing the speech of healthy subjects, that do show a significant difference in a paired t-test or t-test. FIG. 7, by contrast, shows an example of acoustic feature quantities that show no significant difference in those tests. When healthy subjects utter the same or different speech content at different locations and a given acoustic feature quantity shows a significant difference as in FIG. 6, the only attribute of the speech that differs is the environment, so that feature quantity is strongly suspected of being environment-dependent. Accordingly, when the P value for a set of acoustic feature quantities exceeds 0.05, that is, when there is no significant difference as in FIG. 7, the set can be selected as environment-independent acoustic feature quantities.

Furthermore, if the P value of a set of acoustic feature quantities exceeds 0.1, the feature quantities were unaffected even by the subjects' slight changes in physical condition while traveling to each location, and they can be selected as environment-independent acoustic feature quantities. In addition, a P value above 0.1 means the set is unlikely to affect at least one of the acoustic feature quantities used for disease estimation (described below as feature quantity F(a)), which is preferable from the standpoint of creating a disease estimation program.

Next, the method of creating the set of first acoustic feature quantities will be described more concretely. Here, in order to eliminate differences caused by the environment of each location, significant differences in acoustic feature quantities between locations are measured. For example, for speech collected at seven locations (referred to as location 1 through location 7), all 7C2 pairs are formed, such as location 1 and location 2, or location 1 and location 3, and the acoustic feature quantities that show no significant difference in any pair are extracted (paired t-test). This paired t-test uses speech obtained from one or more healthy subjects at all of the target locations. Here, a healthy subject means a person who does not have any of the diseases targeted for analysis.
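The 7C2 pairing described above is a simple combinatorial enumeration; the location labels below are placeholders.

```python
from itertools import combinations

# The seven recording locations; every unordered pair is tested once.
locations = ["loc1", "loc2", "loc3", "loc4", "loc5", "loc6", "loc7"]
pairs = list(combinations(locations, 2))  # 7C2 = 21 unordered pairs
```

Each of the 21 pairs is then fed to the paired t-test, and only features passing every pair are retained.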

One healthy subject is sufficient for this paired t-test, but two or more are preferable for greater reliability, and three or more are more preferable. When multiple healthy subjects participate, the speech obtained at the same location may be processed for all of them together or for each person individually. In individual processing, the number of tested pairs becomes 7C2 × the number of people.

Likewise, when a healthy subject utters multiple phrases at each location to provide speech, those phrases may be processed together or individually. In individual processing, the acoustic feature quantities with no significant difference are extracted for each phrase.

Next, the method of creating the set of second acoustic feature quantities will be described more concretely. Here, in order to eliminate differences caused by the patient group (and the healthy subject group), significant differences in acoustic feature quantities within groups sharing the same condition are measured. For example, suppose the voices of multiple patients with major depressive disorder (major depression group A) and of multiple healthy subjects (healthy subject group A) are obtained in one period, and the voices of multiple patients with major depressive disorder (major depression group B) and of multiple healthy subjects (healthy subject group B) are obtained in another period. An unpaired t-test is then performed between the groups with the same condition (major depression group A versus major depression group B, and healthy subject group A versus healthy subject group B) to measure significant differences in acoustic feature quantities. When the patients in each group utter multiple phrases to provide speech, those phrases may be processed together or individually; in individual processing, the acoustic feature quantities with no significant difference are extracted for each phrase.
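A minimal sketch of this unpaired comparison between two cohorts recorded in different periods follows. The group values are invented, and the p-value again uses a normal approximation to the t distribution (Welch's form) purely for illustration; the patent does not specify the exact variant.

```python
import math

def welch_t_pvalue(x, y):
    """Two-sided unpaired (Welch) t-test with a normal-approximation p-value.

    Illustrative only; a real implementation would use the t distribution
    (e.g. scipy.stats.ttest_ind with equal_var=False).
    """
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    t = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return math.erfc(abs(t) / math.sqrt(2))

# Invented feature values for major-depression groups A and B:
group_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
group_b = [1.05, 2.0, 2.95, 4.1, 4.9, 6.0]
p_stable = welch_t_pvalue(group_a, group_b)   # same condition, similar values
p_shifted = welch_t_pvalue(group_a, [v + 5.0 for v in group_a])
```

A feature with `p_stable`-like behavior across periods would be kept for the second set; one behaving like `p_shifted` would be rejected.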

The acoustic feature extraction unit 203 compares the set of first acoustic feature quantities and the set of second acoustic feature quantities whose P values exceed the desired threshold, and defines the set of feature quantities common to both as the set of third acoustic feature quantities, which are unaffected by the place where the speech is acquired. Alternatively, the set of third acoustic feature quantities may be defined from the set of first acoustic feature quantities alone.
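Taking the third set as the features common to the first and second sets reduces to a set intersection; the feature names below are purely hypothetical, not ones named in the patent.

```python
# Hypothetical feature names surviving each screening step.
first_set = {"f0_mean", "hnr", "zero_crossing_rate", "jitter"}
second_set = {"hnr", "zero_crossing_rate", "shimmer"}

# The third set: features unaffected by the recording location in both tests.
third_set = first_set & second_set
```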

The set of third acoustic feature quantities is used when extracting the one or more sets of acoustic feature quantities (feature quantity F(a)) for calculating the prediction values of multiple diseases. For example, the feature quantities common to the one or more sets of acoustic feature quantities used for calculating multiple diseases and the set of third acoustic feature quantities are extracted as the one or more sets of acoustic feature quantities (feature quantity F(a)) actually used to calculate the prediction values of the multiple diseases.

<<Processing flow of the extraction unit 203>>

The processing flow of the extraction unit 203 is described here with reference to FIG. 3. When operation starts, in step S1001 the extraction unit 203 applies utterance labeling to the voice data acquired in advance and stored in the memory unit 208. Next, in step S1002, the extraction unit 203 normalizes the labeled voice data; once normalization is done, preprocessing is complete. Then, in step S1003, the extraction unit 203 extracts acoustic feature quantities from the preprocessed voice data.
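The patent does not specify the normalization method used in step S1002; as one plausible reading, the sketch below peak-normalizes the amplitude of a labeled utterance segment (the segment values are invented).

```python
def peak_normalize(samples):
    """Scale a waveform so its maximum absolute amplitude is 1.0.

    One common normalization choice; the patent's exact method is unspecified.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)
    return [s / peak for s in samples]

segment = [0.1, -0.4, 0.2, -0.1]      # a labeled utterance segment (invented)
normalized = peak_normalize(segment)  # peak amplitude becomes 1.0
```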

Next, in step S1004A, the first acoustic feature extraction unit 204 of the extraction unit 203 compares, by paired t-test, the extracted acoustic feature quantities with those created from speech obtained in advance from multiple healthy subjects uttering the same content at multiple locations. Then, in step S1005A, the first acoustic feature extraction unit 204 defines the set of acoustic feature quantities showing no significant difference between any locations, according to the desired P-value threshold, as the set of first acoustic feature quantities.

Meanwhile, in step S1004B, the second acoustic feature extraction unit 205 of the extraction unit 203 compares, by unpaired t-test, the extracted acoustic feature quantities with those created from speech obtained in advance from multiple healthy subjects uttering different content at multiple locations. Then, in step S1005B, the second acoustic feature extraction unit 205 defines the set of acoustic feature quantities showing no significant difference between any locations, according to the desired P-value threshold, as the set of second acoustic feature quantities.

Next, in step S1006, the acoustic feature extraction unit 203 compares the first and second sets of acoustic feature quantities satisfying the desired P value, defines the feature quantities common to both as the third set of acoustic feature quantities, which is unaffected by the location where the voice is acquired, and ends the operation. Note that when only the first set of acoustic feature quantities satisfying the desired P value is defined as the third set, step S1006 may be omitted.
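Steps S1004A through S1006 amount to keeping only the features that show no significant location effect. A minimal pure-Python sketch of the paired-t-test branch follows; the hard-coded critical value 2.262 (two-tailed 5% level at 9 degrees of freedom, i.e. ten paired recordings) stands in for the desired P-value threshold, and the per-location feature table is hypothetical.

```python
import math
from itertools import combinations

def paired_t(a, b):
    """Paired t statistic for two equal-length samples (cf. step S1004A)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    return mean / math.sqrt(var / n) if var else 0.0

def stable_features(per_location, t_crit=2.262):
    """Keep the names of features whose |t| stays below t_crit for every
    pair of locations, i.e. no significant difference between locations.
    t_crit = 2.262 is the two-tailed 5% critical value at df = 9, an
    illustrative stand-in for the desired P-value threshold."""
    locations = list(per_location)
    names = per_location[locations[0]].keys()
    stable = set(names)
    for loc_a, loc_b in combinations(locations, 2):
        for name in names:
            if abs(paired_t(per_location[loc_a][name],
                            per_location[loc_b][name])) >= t_crit:
                stable.discard(name)
    return stable
```

The unpaired branch of step S1004B would be analogous with an independent-samples t statistic, and step S1006 is then a set intersection of the two results.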

By performing the above processing, the third set of acoustic feature quantities, which is unaffected by the location where the voice is acquired, is combined with at least one set of acoustic feature quantities (feature quantity F(a)) used to calculate predicted values for multiple diseases, enabling more accurate disease estimation. <<Calculation Unit 206 and Estimation Unit 207>>

The calculation unit 206 calculates predicted values for multiple diseases based on the disease inference models described below and on a combination of at least one acoustic feature quantity. The estimation unit 207 takes the disease predicted values as input and estimates multiple neuropsychiatric diseases. The calculation unit 206 and the estimation unit 207 are described in detail below. <<Calculating Disease Predicted Values>>

The calculation of disease predicted values is outlined as follows. The calculation unit 206 performs a step of extracting multiple acoustic feature quantities from the subject's voice data. An acoustic feature quantity is a quantity characterizing the voice as it is produced. For example, acoustic feature quantities include the zero-crossing rate and the Hurst exponent. The zero-crossing rate is calculated as the number of times per unit time that the sound-pressure waveform of the voice crosses a reference pressure, and indicates how rapidly the waveform changes. The Hurst exponent indicates the correlation of changes in the voice waveform.
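The zero-crossing rate described above can be computed directly from the waveform. A minimal sketch follows (the function name and the sign-change test against a reference pressure are illustrative; a Hurst-exponent estimator is omitted for brevity):

```python
def zero_crossing_rate(samples, sample_rate, reference=0.0):
    """Number of times per second the sound-pressure waveform crosses
    the reference pressure (a sign change relative to the reference)."""
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:])
        if (prev - reference) * (cur - reference) < 0
    )
    return crossings * sample_rate / len(samples)
```

A rapidly fluctuating waveform yields a high rate; a slowly varying one yields a low rate.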

From here on, the disease estimation program is explained. To conveniently distinguish these quantities from the first through third sets of acoustic feature quantities described above, the term "acoustic feature quantity" is hereinafter referred to as "acoustic parameter". In this description, however, "acoustic feature quantity" and "acoustic parameter" are essentially the same: both are used as inputs to the inference device and both denote sequences of values representing physical characteristics.

The acoustic parameters used by the disease estimation apparatus include first acoustic parameters and second acoustic parameters. A first acoustic parameter is an acoustic parameter extracted from the voice of the subject for whom a specific disease is to be estimated. A second acoustic parameter is an acoustic parameter recorded in advance in the memory unit 208; it is extracted from the voice data of patients with Alzheimer's disease, Lewy body dementia, Parkinson's disease, major depression, atypical depression, or bipolar disorder, and each acoustic parameter is linked in advance to its corresponding disease.

The acoustic parameters used in the present invention include the following items:
1) Volume envelope (attack time, decay time, sustain level, release time)
2) Waveform perturbation information (shimmer, jitter)
3) Zero-crossing rate
4) Hurst exponent
5) Voice onset time (VOT)
6) Statistics of the within-utterance distribution of mel-frequency cepstral coefficients (first quartile, median, third quartile, 95th percentile, arithmetic mean, geometric mean, difference between the third quartile and the median, etc.)
7) Statistics of the within-utterance distribution of the spectral change rate (same statistics as in 6)
8) Statistics of the within-utterance distribution of the temporal variation of mel-frequency cepstral coefficients (same statistics as in 6)
9) Statistics of the within-utterance distribution of the second-order temporal variation of mel-frequency cepstral coefficients (same statistics as in 6)
10) Approximate squared error of a quadratic regression of the 90% spectral roll-off against time within an utterance
11) Approximate error of a quadratic regression of the spectral centroid against time within an utterance
In addition, other candidates include pitch rate, voicing probability, frequency power in an arbitrary band, musical scale, speaking rate (number of sounds per unit time), pauses/intervals, volume, and the like.
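Items 6) through 9) all reduce to computing distribution statistics over a per-frame trajectory of some feature (for example, one MFCC coefficient over an utterance). A minimal sketch using Python's standard `statistics` module follows; the function name and dictionary keys are illustrative, and the geometric mean assumes strictly positive values:

```python
import statistics

def distribution_stats(values):
    """Distribution statistics of a per-frame feature trajectory, as
    listed in items 6)-9): quartiles, 95th percentile, arithmetic and
    geometric means, and the third-quartile-minus-median spread."""
    q1, median, q3 = statistics.quantiles(values, n=4)   # quartile cut points
    p95 = statistics.quantiles(values, n=100)[94]        # 95th percentile
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": p95,
        "arith_mean": statistics.fmean(values),
        "geom_mean": statistics.geometric_mean(values),  # positive values only
        "q3_minus_median": q3 - median,
    }
```

Applying the same function to frame-to-frame differences of the trajectory gives the "temporal variation" statistics of items 8) and 9).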

The estimation program has an artificial-intelligence learning function and performs the estimation processing through that learning function. The inference model may use classification algorithms such as linear regression, ridge regression, Lasso, or logistic regression. Deep learning with neural networks may also be used, as may reinforcement learning. Genetic algorithms, cluster analysis, self-organizing maps, ensemble learning, and the like may also be used, and of course other artificial-intelligence techniques beyond these may be applied. In ensemble learning, a classification algorithm can be created by a method combining boosting and decision trees.

At the stage of creating the estimation program, the algorithm creator selects one or more acoustic parameters from the second-acoustic-parameter items above, using the stepwise regression method to obtain a better combination of the arbitrary acoustic parameters used as the variables f(n). Coefficients are then assigned to the selected acoustic parameters to create one or more terms, and these are further combined to create F(a).

There are three types of stepwise regression — forward selection, backward elimination, and bidirectional elimination — and any of them may be used. The regression analysis used in the stepwise method includes linear classification processing such as linear discriminant analysis and logistic regression analysis. The coefficients xn of the variables f(n) in the formula F(a) below are called regression coefficients and give the weight of each function f(n).
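As one illustration of forward selection, the sketch below greedily adds the variable that most reduces the residual squared error of a one-variable least-squares fit, refitting on the residual at each step. This stagewise simplification is a stand-in for the full stepwise regression described above, not the procedure used in the specification.

```python
def simple_fit(x, y):
    """Least-squares fit y ≈ a + b·x; return (a, b, sse)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx else 0.0
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, sse

def forward_select(features, y, k):
    """Greedy forward selection: repeatedly add the feature that gives
    the smallest squared error against the current residual, then
    subtract its fitted contribution from the residual."""
    residual, chosen = list(y), []
    for _ in range(k):
        best = min(
            (name for name in features if name not in chosen),
            key=lambda name: simple_fit(features[name], residual)[2],
        )
        a, b, _ = simple_fit(features[best], residual)
        residual = [r - (a + b * xi) for r, xi in zip(residual, features[best])]
        chosen.append(best)
    return chosen
```

Backward elimination would instead start from all candidates and repeatedly drop the least useful one.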

After the creator of the learning algorithm has selected the regression coefficients, their quality can be improved through machine learning aimed at raising estimation accuracy, using disease information and the like accumulated in the database.

The subject's disease predicted value can be calculated from one or more acoustic parameters based on, for example, the following formula F(a).

Formula 1: F(a) = x1 × f(1) + x2 × f(2) + x3 × f(3) + ... + xn × f(n)

Here, each f(n) is one or more second acoustic parameters arbitrarily selected from items (1) through (11) above, and xn is a regression coefficient specific to the disease. f(n) and xn may be recorded in advance in the recording device 120 of the estimation program, and the regression coefficients of the parameter F(a) may be improved in the course of the estimation program's machine learning.
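Formula 1 is a plain weighted sum, so its evaluation is a one-line dot product. The function name below is illustrative; the coefficient and parameter values in the test are invented for the example.

```python
def disease_score(coeffs, params):
    """Evaluate Formula 1: F(a) = x1·f(1) + x2·f(2) + ... + xn·f(n),
    where coeffs are the disease-specific regression coefficients xn
    and params are the selected acoustic parameter values f(n)."""
    return sum(x * f for x, f in zip(coeffs, params))
```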

The calculation unit 206 of FIG. 2 discriminates healthy subjects from diseased subjects, or calculates a parameter for discriminating diseased subjects, based on a combination of second acoustic parameters. From this parameter, the subject's disease predicted value is calculated by computing a reference range and scoring the distance between the subject's value and that reference range.

FIG. 8 shows a graph of the differing intensities of a certain acoustic parameter for each disease; disease A shows the highest score. The predicted value calculated for the subject with respect to disease A is therefore higher than for the other disease groups. Furthermore, by setting an intensity of 50 as a threshold, for example, the group of diseases A, D, and E can be separated from the group of diseases B and C.
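The threshold separation just described for FIG. 8 can be sketched as follows; the intensity values in the test are invented for illustration.

```python
def split_by_threshold(intensities, threshold=50.0):
    """Split diseases into those at or above the intensity threshold
    and those below it (cf. the intensity-50 example for FIG. 8)."""
    above = {d for d, v in intensities.items() if v >= threshold}
    below = set(intensities) - above
    return above, below
```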

Although FIG. 8 calculates disease predicted values from the intensity of a single acoustic parameter, in practice it is difficult to classify diseases with only one parameter. Diseases can therefore also be classified by combining several acoustic parameters to calculate the required parameter F(a).

Based on the parameter F(a), disease predicted values are calculated for the voices of labeled subjects, and the distribution of predicted values for each disease is obtained. Each disease can thereby be classified.

FIG. 9 is a distribution chart of disease predicted values (described as "mental value" in FIG. 9) obtained by combining three acoustic parameters.

As can be seen from FIG. 9, the distribution of predicted values in the Lewy body dementia group can be distinguished from the distributions in the other disease groups and in the healthy-subject group. In the present invention, a combination of acoustic parameters is set for each disease so as to distinguish it from the other diseases, the parameter F(a) is calculated, and the parameter F(a) representing that combination is input, so that it can be determined which disease each subject's voice corresponds to.

Another approach is to extract the parameter F(a) of each disease from each patient's voice, determine which disease's parameter is larger, and compare the disease predicted values with one another to estimate which disease the patient has.

In this case, the disease predicted value can be regarded as the degree to which the subject is affected by that disease, and comparing the predicted values of the diseases expresses the probability of being affected by each one.

In this way, the parameter F(a) associated with each disease is extracted from the voices of patients with the six diseases of Alzheimer's disease, Lewy body dementia, Parkinson's disease, major depression, atypical depression, and bipolar disorder, and from the voices of healthy subjects, and the predicted value of each disease is calculated.

In addition, regarding the target diseases, the estimation program can also be created from the voices of patients with ten diseases, adding four further conditions: vascular neurocognitive disorder, frontotemporal neurocognitive disorder, cyclothymic disorder, and persistent depressive disorder.

Finally, by analyzing and discriminating the voice spoken by the subject, the estimation unit 207 estimates whether the subject has any of the above six to ten diseases or is healthy.

Regarding the estimation flow of the estimation program: as described above, a disease predicted value can be calculated for each individual disease by extracting its feature quantity F(a). Alternatively, a combination of acoustic feature quantities associated with a group of diseases can first be created, the feature quantity F(a) associated with that group used as input to the estimation unit, and the input and estimation then divided into two or more stages, with the final estimation made for each disease or for the healthy state. <<Processing of the Estimation Apparatus>>
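The multi-stage flow just described (a group-level estimate followed by a within-group estimate) might look like the following sketch; the group membership follows the text, while the function names and all score values are hypothetical.

```python
# Two-stage estimation sketch: stage 1 picks the most likely disease
# group from group-level predicted values, stage 2 picks the disease
# within that group. All scores are hypothetical inputs.

GROUPS = {
    "cognitive": ["Alzheimer", "Lewy body", "Parkinson"],
    "mood": ["major depression", "atypical depression", "bipolar"],
    "healthy": ["healthy"],
}

def two_stage_estimate(group_scores, disease_scores):
    """Return the highest-scoring disease within the highest-scoring
    group (diseases missing a score default to 0.0)."""
    group = max(group_scores, key=group_scores.get)
    return max(GROUPS[group], key=lambda d: disease_scores.get(d, 0.0))
```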

FIG. 4 shows an example of the estimation processing of the estimation apparatus 200 shown in FIG. 2. The processing of FIG. 4 is realized by the central processing unit (CPU) of the estimation apparatus 200 executing the estimation program stored in the memory unit 208 of the estimation apparatus 200.

When the processing starts, in step S2001 the control unit acquires voice data. The voice data may be acquired from the input unit of the user terminal 201, or stored in the memory unit 208 beforehand and then read out by the control unit. Next, in step S2002, the acoustic feature extraction unit 203 extracts first acoustic parameters from the voice data. Then, in step S2003, acoustic feature quantities related to the environment are excluded from the first acoustic parameters, yielding the processed first acoustic parameters. For example, by comparing the first acoustic parameters with the third acoustic feature quantities obtained by the extraction unit 203, the portions they do not have in common can be judged to be environment-related acoustic feature quantities.

Next, in step S2004, the calculation unit 206 compares the parameter F(a) obtained from the second acoustic parameters with the processed first acoustic parameters obtained in step S2003 to calculate the predicted value of each disease.

Then, in step S2005, the estimation unit 207 separates the patients for whom disease predicted values have been calculated into the patients to be identified and the other patients, using a threshold for distinguishing each specific disease from the other diseases, and ends the processing. In the embodiments described below, the determination is made by classifying cases into those that exceed the threshold and those that do not. <3. Fields of Use of the Program>

Since the estimation program of the present invention can analyze voices even from remote locations, it can be used in online medical care and online counseling. When diagnosing neuropsychiatric diseases, a physician observes the patient's expressions, movements, and conversation through examination or interview. However, because of the stigma attached to neuropsychiatric diseases, patients may hesitate to visit psychiatric hospitals and clinics.

Online consultation and treatment with doctors and counselors can be carried out without visiting a medical institution. Compared with diseases other than neuropsychiatric ones, online medical care therefore has a particularly high affinity with neuropsychiatric diseases.

When interviewing patients (or clients) online, doctors, counselors, and clinical psychologists can use the present estimation program for analysis. This makes it very easy to estimate whether the person has a neuropsychiatric disease and which one. In addition, various psychological and cognitive function tests, such as the MMSE, BDI, and PHQ-9, can be administered during the same interview.

In this case, the patient needs computer hardware capable of transmitting voice, a monitor screen for the interview, and a microphone for voice recording.

If the patient does not have these devices at home, they can be installed in, for example, a family clinic. The patient can then go to the family clinic and conduct the interview using the devices there.

Alternatively, for example, when a patient visits a family clinic for treatment of a physical illness and the family doctor suspects a neuropsychiatric disease on examination, the voice can be acquired on the spot and analyzed with the program of the present invention.

Likewise, in other settings, as long as psychiatrists and neurologists are available for online medical care, family doctors, psychiatrists, and neurologists can cooperate online to make a diagnosis.

By raising the sensitivity for estimating a specific disease (in which case the specificity generally decreases), the estimation program of the present invention can be used as a screening device.

By using it as a test item in health checkups conducted by companies and local governments and in medical examinations conducted by medical institutions, it contributes to the early detection of neuropsychiatric diseases, which have so far been difficult to detect and for which no simple test method exists.

For example, as with fundus examinations, vision tests, and hearing tests, voice acquisition can be made one of a series of tests, and the estimation result of the program can be reported on the spot, or together with the other test results.

Since the estimation program of the present invention requires no special equipment, anyone can use it easily. On the other hand, because its use is limited to neuropsychiatric conditions, the frequency of use is not always high. Therefore, if a set of the estimation apparatus of the present invention is installed in a specialized hospital equipped with expensive examination equipment, a family doctor or the like can request that specialized hospital to perform the examination when a target patient visits.

Devices used for neuropsychiatric diseases include optical topography, myocardial scintigraphy, cerebral blood flow scintigraphy, CT, MRI, and electroencephalography. These devices are used for disease estimation and exclusion diagnosis; since the estimation apparatus of the present invention is minimally invasive, it can be used in combination with, or prior to, these examinations.

Since the estimation program of the present invention can easily be used at home, it can serve as a monitoring device after diagnosis. For example, in the case of a disease in the mood disorder group, the patient is treated with medication and psychotherapy, and the effectiveness of these therapies can be evaluated. In addition, with continuous use, it can be observed daily whether the symptoms are stable and whether there are signs of relapse.

Since the estimation program of the present invention analyzes the voice produced by speaking, it can also be used as a watch-over device for the elderly.

Whether an elderly person living alone is doing well is a concern for close relatives. By implementing the estimation program of the present invention in an elderly watch-over system that uses communication means such as telephone or videophone, it becomes possible not only to observe signs of daily life but also to estimate tendencies toward dementia or depression, so that appropriate measures can be taken even for a person living alone.

In these various embodiments, the method of acquiring the voice is not particularly limited; examples include (1) sending a recorded voice from the subject's side by telephone or the Internet, (2) an examiner contacting the subject by telephone or the Internet and acquiring the voice through conversation, (3) installing a voice acquisition device in the subject's residence and recording the subject with that device, and (4) having the voice acquisition device start automatically at regular intervals and acquire the subject's voice through dialogue with the subject.

When acquiring the voice, it is preferable to display the sentence to be spoken on a display provided in the estimation apparatus, or to play the sound of the sentence from a speaker, so that the subject can read it aloud smoothly. Recording is started by a mechanical recording-start sound, and when the utterance ends, recording is ended by a switch, so that the spoken voice can be acquired for each sentence. <4. Estimation Program Flow> <<Estimation Flow-1>>

For example, in the first step, three groups are estimated: (1-A) the cognitive disorder group, comprising Alzheimer's disease, Lewy body dementia, and Parkinson's disease; (1-B) the mood disorder group, comprising major depression, atypical depression, and bipolar disorder; and (1-C) the healthy group.

Next, in the second step, the disease of a patient in the cognitive disorder group is estimated by a program that can estimate, from the voice of a patient classified into the (1-A) cognitive disorder group, which of the three diseases — (1-A-1) Alzheimer's disease, (1-A-2) Lewy body dementia, or (1-A-3) Parkinson's disease — the patient has. Meanwhile, the disease of a patient in the mood disorder group is estimated by a program that can estimate, from the voice of a patient classified into the (1-B) mood disorder group, which of the three diseases — (1-B-1) major depression, (1-B-2) atypical depression, or (1-B-3) bipolar disorder — the patient has. <<Estimation Flow-2>>

As another form of the determination flow, in the first step three groups are estimated: (2-A) the cognitive disorder group, comprising Alzheimer's disease, Lewy body dementia, and Parkinson's disease; (2-B) the mood disorder group, comprising major depression, atypical depression, and bipolar disorder; and (2-C) the healthy group.

Next, in the second step, the voice of a patient classified into the (2-A) cognitive disorder group is processed by a program that estimates whether the patient has (2-A-1) Lewy body dementia or another cognitive disorder. Meanwhile, the voice of a patient classified into the (2-B) mood disorder group is processed by a program that estimates whether the patient has (2-B-1) major depression or another mood disorder.

Similarly, there are programs that estimate whether the patient has (2-A-2) Alzheimer's disease or another cognitive disorder, (2-A-3) Parkinson's disease or another cognitive disorder, atypical depression or another mood disorder, and bipolar disorder or another mood disorder. By using these programs, it can finally be determined whether the patient has Alzheimer's disease, Lewy body dementia, Parkinson's disease, major depression, atypical depression, or bipolar disorder. <<Estimation Flow-3>>

In another embodiment, in the second step, the voice of a patient classified into the (3-A) cognitive disorder group is processed using a program that estimates whether the patient has (3-A-1) Alzheimer's disease or Lewy body dementia, a program that estimates whether the patient has (3-A-2) Lewy body dementia or Parkinson's disease, and a program that estimates whether the patient has (3-A-3) Parkinson's disease or Alzheimer's disease. Used together, the three estimation programs can estimate the disease of a patient classified into the cognitive disorder group.

Meanwhile, the voice of a patient classified into the (3-B) mood disorder group is processed using a program that estimates whether the patient has (3-B-1) major depression or atypical depression, a program that estimates whether the patient has (3-B-2) atypical depression or bipolar disorder, and a program that estimates whether the patient has (3-B-3) bipolar disorder or major depression. Used together, the three estimation programs can estimate the disease of a patient classified into the mood disorder group. <<Estimation Flow-4>>

Furthermore, classification into the three categories described above — the cognitive disorder group, the mood disorder group, and the healthy group — may be performed in two stages: the first step divides subjects into (4-A) the healthy group and (4-B) the disease group, and the next step divides the disease group into (4-B-1) the cognitive disorder group and (4-B-2) the mood disorder group. (Subjects whose voices are acquired when creating the estimation program)

The acquisition of the voice data for the second acoustic parameters will now be described. Preferably, the subjects whose voices are to be acquired are selected according to the following criteria.

(A) After receiving a sufficient explanation, the subject should consent in writing to the use of the acquired voice for disease analysis.

(B) The present estimation system does not perform disease estimation analysis based on the meaning or content (text) of words, so there are essentially no restrictions on nationality or native language. However, since differences may exist between ethnicities and languages, it is preferable to fix the same ethnicity and language for comparison. For example, for convenience, when the present invention is implemented in Japan, it is preferable to create the estimation program by acquiring the voices of native Japanese speakers and to estimate diseases from Japanese speech; when implemented in an English-speaking country, it is preferable to create the estimation program from the voices of native English speakers and to estimate diseases from English speech.

(C) There is no particular age restriction as long as the person has language ability. However, considering voice change and emotional stability, an age of 15 or older is preferable, 18 or older more preferable, and 20 or older particularly preferable. In addition, considering speech difficulties due to age, an age of under 100 is preferable, and under 90 more preferable.

(D) When the present invention is implemented in Japan, subjects who can read out Japanese sentences are best for voice acquisition. However, when estimating for a person whose native language is not Japanese, the determination can be made by having the person read sentences in the corresponding native language.

(E)在創建估計算法時，最好使用屬於以下六種疾病的人的聲音：阿茲海默症、利維體認知障礙、帕金森氏症、重度憂鬱症、非典型憂鬱症和雙相情感障礙。然而，同時患有這些疾病的人被排除在外。除了上述六種疾病之外，還可以使用那些分別屬於血管性認知障礙症、前顳葉認知障礙症、情緒調節障礙症和循環型情緒障礙症的聲音。此外，也可以使用對應於精神分裂症、廣泛性焦慮症和其他神經精神疾病的人的聲音。(E) When creating the estimation algorithm, it is best to use the voices of people with the following six diseases: Alzheimer's disease, Lewy body dementia, Parkinson's disease, severe depression, atypical depression, and bipolar disorder. However, people suffering from more than one of these diseases at the same time are excluded. In addition to the above six diseases, the voices of people with vascular dementia, frontotemporal dementia, dysthymia, and cyclothymic disorder can also be used. Furthermore, the voices of people with schizophrenia, generalized anxiety disorder, and other neuropsychiatric diseases can also be used.

(F)健康受檢者最好是被確認為既不屬於認知障礙組也不屬於情緒障礙組的人。 <<取得語音的方法>>(F) Healthy subjects should preferably be identified as those who belong to neither the cognitive impairment group nor the emotional impairment group. <<How to get voice>>

將描述用於取得語音的方法。The method for acquiring voice will be described.

(1)對麥克風沒有特別限制，只要可以取得聲音即可。例如，可以選擇手持麥克風、頭戴式耳機、內置在可攜式終端中的麥克風、或內置在個人電腦或平板電腦中的麥克風等。從僅能夠取得受檢者的聲音的觀點出發，優選針式麥克風、定向麥克風和內置在可攜式終端中的麥克風。(1) There is no particular restriction on the microphone as long as it can pick up sound. For example, a handheld microphone, a headset, a microphone built into a portable terminal, or a microphone built into a personal computer or tablet can be selected. From the viewpoint of acquiring only the subject's voice, a pin microphone, a directional microphone, or a microphone built into a portable terminal is preferable.

(2)記錄器可以是可攜式記錄器、個人電腦、平板電腦,或內置或外置於可攜式終端的記錄媒介。(2) The recorder can be a portable recorder, a personal computer, a tablet computer, or a recording medium built-in or externally placed in a portable terminal.

(3)在估計疾病時，對語音的內容沒有限制。例如，可以使用由受檢者自由說出的語音、受檢者預先準備好的文本中讀出的短語、或從電話或面對面交談中的話語。然而，在創建一個估計算法時，最好是使用受檢者常見的語音內容。因此，在創建估計算法時，最好使用受檢者事先準備好的文本的語料。(3) When estimating a disease, there is no restriction on the content of the speech. For example, speech freely uttered by the subject, phrases read from a text prepared in advance, or utterances from telephone or face-to-face conversations can be used. However, when creating an estimation algorithm, it is best to use speech content common to all subjects. Therefore, when creating the estimation algorithm, it is best to use readings of a text prepared for the subjects in advance.

如果說話時間太短,估計結果的準確性將變低。優選為15秒以上,更優選為20秒以上,特別優選為30秒以上。另外,如果所需時間更長,則需要花費一些時間來取得結果。優選為5分鐘以下,更優選為3分鐘以下,特別優選為2分鐘以下。If the speaking time is too short, the accuracy of the estimation result will be lower. It is preferably 15 seconds or more, more preferably 20 seconds or more, and particularly preferably 30 seconds or more. In addition, if it takes longer, it will take some time to get results. It is preferably 5 minutes or less, more preferably 3 minutes or less, and particularly preferably 2 minutes or less.

(4)觀察和檢查項目不受限制。然而，最好是取得資訊，以驗證是否因受檢者的情況不同而具有聲音差異。這些資訊包括性別、年齡、身高、體重等一般資訊，明確的診斷名稱、嚴重程度、身體疾病的併發症、病史、發病時間等醫療資訊，MRI、CT等檢查資訊，以及病人健康問卷(Patient Health Questionnaire, PHQ-9)、迷你國際神經精神病學訪談(M.I.N.I.-International Neuropsychiatric Interview)(迷你螢幕)、漢氏憂鬱量表(Hamilton Depression Rating Scale)(HAM-D或HDRS)、楊氏躁症量表(Young Mania Rating Scale, YMRS)、簡易心智量表(Mini-Mental State Examination, MMSE)、躁鬱症頻譜診斷量表(Bipolar Spectrum Diagnostic Scale, BSDS)、由動作障礙協會(Movement Disorder Society)修訂的統一帕金森氏症評定量表(Unified Parkinson's Disease Rating Scale)(MDS-UPDRS)等醫療訪談和問卷。(4) Observation and examination items are not restricted. However, it is best to obtain information that can verify whether differences in voice arise from the subject's condition. Such information includes general information such as sex, age, height, and weight; medical information such as the definitive diagnosis, severity, complications of physical diseases, medical history, and time of onset; examination information such as MRI and CT; and medical interviews and questionnaires such as the Patient Health Questionnaire (PHQ-9), the Mini-International Neuropsychiatric Interview (M.I.N.I.) screen, the Hamilton Depression Rating Scale (HAM-D or HDRS), the Young Mania Rating Scale (YMRS), the Mini-Mental State Examination (MMSE), the Bipolar Spectrum Diagnostic Scale (BSDS), and the Movement Disorder Society revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS).

(5)關於取得語音的環境,只要是只能取得患者的語音的環境即可,沒有特別的限制。安靜的環境,具體來說,40分貝或更低為佳,30分貝或更低則更為理想。具體例子包括檢查室、諮詢室、會議室、聽力檢查室、CT、MRI、X射線和其他檢查室。另外,也可以在受檢者家的安靜房間中取得語音。(5) Regarding the environment for obtaining the voice, there is no particular limitation as long as it is an environment in which only the voice of the patient can be obtained. A quiet environment, specifically, 40 decibels or lower is better, and 30 decibels or lower is more ideal. Specific examples include examination rooms, consultation rooms, conference rooms, hearing examination rooms, CT, MRI, X-ray and other examination rooms. In addition, the voice can also be obtained in a quiet room of the subject's home.

如上所述創建的估計程式可以不受任何特別限制的使用，無論受檢者是被懷疑患有神經精神疾病還是被估計為健康。如果取得了受檢者的聲音，該程式還可以作為醫生的體檢工具，或者作為體檢或身體檢查中的檢查項目。The estimation program created as described above can be used without any particular restriction, regardless of whether the subject is suspected of having a neuropsychiatric disease or is presumed healthy. As long as the subject's voice can be obtained, the program can also serve as an examination tool for doctors, or as one of the test items in a health checkup or physical examination.

至於語音取得次數，可以一次就能判別出來。然而，例如，即使是健康受檢者也可能因為生活中的某個事件而感到憂鬱，而且很有可能在取得聲音的時候剛好處於憂鬱的情緒中。此外，重度憂鬱症、非典型憂鬱症等，其情緒可能在早晚之間改變。因此，如果估計結果顯示此人患有某種精神疾病時，最好至少再取得一次語音並再次進行估計。As for the number of voice acquisitions, a single acquisition can be sufficient for the determination. However, even a healthy subject may, for example, feel depressed because of some event in daily life, and may happen to be in a depressed mood at the time of acquisition. In addition, in severe depression, atypical depression, and the like, mood may change between morning and evening. Therefore, if the estimation result indicates that the person has a certain mental illness, it is best to acquire the voice at least once more and perform the estimation again.

醫生、臨床心理學家、護士、實驗室技術人員、顧問或任何其他在面對取得語音的受檢者時操縱本發明的裝置的人都可使用本發明的估計系統100。在一個保持安靜環境的房間裡，如治療室或諮詢室，一個或多個運作本發明裝置的人使用該裝置，同時直接向受檢者解釋語音取得的方法。此外，取得聲音的受檢者進入聽力測試室或其他各種測試室，上述操作者在通過玻璃或通過監視器圖像觀察受檢者時使用該裝置。如果受檢者身處偏遠地區，如家中，可事先向受檢者解釋語音採集的方法，受檢者可在指定的日期和時間前自行錄製語音。如果是在遠程場所，也可以使用個別的通信線路來取得語音，同時用攝像機圖像等確認受檢者的身份。The estimation system 100 of the present invention can be used by doctors, clinical psychologists, nurses, laboratory technicians, counselors, or anyone else who operates the device of the present invention while facing the subject whose voice is acquired. In a room kept quiet, such as a treatment room or consultation room, one or more operators of the device use it while directly explaining the voice acquisition method to the subject. Alternatively, the subject may enter a hearing test room or another examination room, and the operator uses the device while observing the subject through glass or on a monitor image. If the subject is in a remote location such as their home, the voice acquisition method can be explained in advance, and the subject can record their own voice by a designated date and time. In a remote location, it is also possible to acquire the voice over a dedicated communication line while confirming the subject's identity with camera images or the like.

此外,當受檢者去醫療機構或諮詢室時,可以取得和估計聲音,也可以作為公司或地方政府的健康檢查或人體健康檢查的檢查項目之一來取得和估計聲音。 <5. 創建估計程式1的示例> <<排除了取決於環境的聲音特徵量的估計過程的例子>>In addition, when the subject goes to a medical institution or consultation room, the voice can be obtained and estimated, and it can also be used as one of the inspection items of the company or local government's health check or human health check to obtain and estimate the voice. <5. Example of creating estimation program 1> <<Excluding the example of the estimation process of the sound feature quantity depending on the environment>>

如圖3和圖4所示,使用第三聲音特徵量的集合和隨後的估計來測試排除與環境有關的聲音特徵量的過程。 <<估計程式1的創建和結果>>As shown in FIGS. 3 and 4, the third set of sound feature quantities and the subsequent estimation are used to test the process of excluding sound feature quantities related to the environment. <<Creation and result of estimation program 1>>

將介紹在提取第一組或第三聲音特徵量的集合之後執行估計處理時的驗證的示例。首先，作為第一次聲音特徵量提取中要使用的相同語音內容，發明人使用13個固定短語(每個短語複誦兩次)、3個長母音和短語「pa ta ka」(日文羅馬拼音)作為相同發音內容的示例，總共準備了30個發音，並讓多個健康受檢者在七個不同的場所說出相同的說話內容。關於在提取第二聲音特徵量中使用的不同發聲內容，允許健康受檢者自由說話並取得語音。An example of verification in which the estimation process is performed after extracting the first or third set of voice features will be described. First, as the identical speech content used for the first voice feature extraction, the inventors prepared 13 fixed phrases (each read twice), 3 long vowels, and the phrase "pa ta ka" (in Japanese romanization), 30 utterances in total, and had multiple healthy subjects speak the same content at seven different locations. As the differing speech content used for extracting the second voice features, the healthy subjects were allowed to speak freely and their speech was acquired.

在對取得的語音進行標記作業和歸一化處理後，用分析軟體OpenSMILE提取了7440個聲音特徵量。接下來，用成對t檢定(Paired t-test)進行比較，取得170個P值大於0.5的聲音特徵量作為第一聲音特徵量的集合。接下來，通過非成對t檢定(Unpaired t-test)進行比較，取得549個P值大於0.5的聲音特徵量作為第二聲音特徵量的集合。接下來，將第一聲音特徵量的集合和第二聲音特徵量的集合進行比較，得到共通的聲音特徵量作為第三聲音特徵量的集合，這個集合不受語音採集場所的影響，共得到169個聲音特徵量。這169個聲音特徵量相當於第一聲音特徵量的集合(170個)中的99.4%。從上述結果來看，也可以僅根據超過期望的P值的第一聲音特徵量的集合將其定義為第三聲音特徵量的集合。After labeling and normalizing the acquired speech, 7,440 voice features were extracted with the analysis software OpenSMILE. Next, the features were compared by paired t-test, and 170 voice features with P values greater than 0.5 were obtained as the first set of voice features. Next, the features were compared by unpaired t-test, and 549 voice features with P values greater than 0.5 were obtained as the second set of voice features. Next, the first and second sets were compared, and the features common to both were obtained as the third set of voice features, a set unaffected by the voice acquisition location, 169 features in total. These 169 voice features correspond to 99.4% of the 170 features in the first set. From this result, the third set may also be defined solely from the first set of voice features exceeding the desired P value.
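The two-stage selection above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes feature matrices of shape (recordings × features), and the function and variable names are ours.

```python
import numpy as np
from scipy import stats

def select_stable_features(rec_a, rec_b, p_threshold=0.5):
    """Paired t-test per feature between two recordings of the same
    phrases by the same speakers; keep features whose P value exceeds
    the threshold, i.e. features with no significant difference."""
    kept = []
    for i in range(rec_a.shape[1]):
        _, p = stats.ttest_rel(rec_a[:, i], rec_b[:, i])
        if p > p_threshold:
            kept.append(i)
    return kept

def common_features(first_set, second_set):
    """Third feature set: indices common to both selections."""
    return sorted(set(first_set) & set(second_set))

# Toy data: feature 0 fluctuates with zero mean difference (kept),
# feature 1 shifts systematically between recordings (rejected).
a = np.array([[1.0, 10.0], [2.0, 11.0], [3.0, 12.0], [4.0, 13.0]])
b = a + np.array([[1.0, 5.0], [-1.0, 5.1], [1.0, 4.9], [-1.0, 5.2]])
stable = select_stable_features(a, b)    # → [0]
third = common_features(stable, [0, 1])  # → [0]
```

In the same spirit, the second set would come from an unpaired t-test across speaker groups, and the intersection of the two index lists plays the role of the third feature set.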

接下來，將上述169個不含環境因素的聲音特徵量與從第二聲音參數中得到的參數F(a)進行比較，得到並驗證了一個排除與環境有關的聲音特徵量的推理程式。作為學習資料，使用了30名重度憂鬱症患者，共963個短語，以及30名健康受檢者，共965個短語。14名重度憂鬱症患者和30名健康受檢者被作為驗證資料。在驗證資料方面，每個人的大約30個短語被分別判斷為重度憂鬱症或健康，並以這30個短語中的多數判斷作為最終估計結果。Next, the above 169 environment-independent voice features were compared with the parameter F(a) obtained from the second voice parameters, and an inference program excluding environment-dependent voice features was obtained and verified. As training data, 963 phrases from 30 patients with severe depression and 965 phrases from 30 healthy subjects were used. Fourteen patients with severe depression and 30 healthy subjects served as validation data. For the validation data, each person's roughly 30 phrases were individually judged as severe depression or healthy, and the majority judgment over those 30 phrases was taken as the final estimation result.
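The per-subject majority decision over phrase-level judgments can be sketched as follows; a minimal illustration, with label strings following the HE/MDD abbreviations used in Figure 5.

```python
from collections import Counter

def final_estimate(phrase_labels):
    """Majority vote over the per-phrase judgments for one subject.

    phrase_labels: one label per phrase, e.g. "MDD" (severe
    depression) or "HE" (healthy).
    """
    label, _ = Counter(phrase_labels).most_common(1)[0]
    return label

# A subject with 18 of 30 phrases judged as severe depression:
result = final_estimate(["MDD"] * 18 + ["HE"] * 12)  # → "MDD"
```

Aggregating roughly 30 independent phrase-level judgments this way makes the final result robust to occasional misclassified phrases.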

圖5是驗證資料的混淆矩陣，其中HE表示健康，MDD表示重度憂鬱症。如圖5所示，估計程式的正確診斷率為79.5%。 <6. 創建估計程式2的示例> <<用語音資料關聯多種疾病的工作-語音採集>>Figure 5 is the confusion matrix of the validation data, where HE denotes healthy and MDD denotes severe depression. As shown in Figure 5, the correct diagnosis rate of the estimation program was 79.5%. <6. Example of creating estimation program 2> <<Work to associate multiple diseases with voice data - voice collection>>

以下介紹了創建估計程式時使用的程式。為了將語音資料附加到多種疾病,在2017年12月25日至2018年5月30日的期間取得了下列患者和健康受檢者的語音。The following describes the program used when creating the estimation program. In order to attach voice data to a variety of diseases, the voices of the following patients and healthy subjects were obtained from December 25, 2017 to May 30, 2018.

・阿茲海默症患者的語音20例 ・利維體認知障礙症患者的語音20例 ・帕金森氏症患者的語音20例 ・重度憂鬱症患者的語音20例(重度憂鬱症A組) ・雙相情感障礙症患者的語音16例 ・非典型憂鬱症患者的語音19例 ・健康受檢者的語音20例(健康受檢者A組)・20 voice samples from patients with Alzheimer's disease ・20 voice samples from patients with Lewy body dementia ・20 voice samples from patients with Parkinson's disease ・20 voice samples from patients with severe depression (severe depression group A) ・16 voice samples from patients with bipolar disorder ・19 voice samples from patients with atypical depression ・20 voice samples from healthy subjects (healthy subject group A)

此外,在2019年6月28日至2019年10月31日的期間取得了下列患者和健康受檢者的語音。In addition, the voices of the following patients and healthy subjects were obtained from June 28, 2019 to October 31, 2019.

・阿茲海默症患者的語音37例 ・利維體認知障礙症患者的語音57例 ・其他認知障礙症(包含血管性認知障礙症和前顳葉認知障礙症)患者的語音28例 ・帕金森氏症患者的語音35例 ・重度憂鬱症患者的語音57例(重度憂鬱症B組) ・雙相情感障礙症患者的語音34例 ・非典型憂鬱症患者的語音30例 ・其他憂鬱症(包括情緒調節障礙症和循環型情緒障礙症)患者的語音38例 ・健康受檢者的語音60例+28例(取得4個人在七個不同的場所的語音：健康受檢者B組)・37 voice samples from patients with Alzheimer's disease ・57 voice samples from patients with Lewy body dementia ・28 voice samples from patients with other dementias (including vascular dementia and frontotemporal dementia) ・35 voice samples from patients with Parkinson's disease ・57 voice samples from patients with severe depression (severe depression group B) ・34 voice samples from patients with bipolar disorder ・30 voice samples from patients with atypical depression ・38 voice samples from patients with other depressive disorders (including dysthymia and cyclothymic disorder) ・60 + 28 voice samples from healthy subjects (voices of 4 people acquired at seven different locations: healthy subject group B)

而且，這些患者是由精神科和神經內科等專業領域的醫生根據DSM-5或ICD-10的標準確認患有各自疾病的患者。此外，透過進行PHQ-9、MMSE等，醫生確認沒有其他神經精神疾病的併發症。Moreover, these patients were confirmed as having their respective diseases by doctors in specialized fields such as psychiatry and neurology, based on DSM-5 or ICD-10 criteria. In addition, by administering the PHQ-9, MMSE, and the like, the doctors confirmed that there were no comorbid neuropsychiatric diseases.

通過執行PHQ-9,MMSE等,可以確認健康受檢者沒有抑鬱症狀或認知功能下降。By performing PHQ-9, MMSE, etc., it can be confirmed that healthy subjects have no depressive symptoms or cognitive decline.

使用奧林巴斯(Olympus)針式麥克風和樂蘭(Roland)可攜式錄音機進行語音採集。語音資料記錄在SD卡上。An Olympus pin microphone and a Roland portable recorder were used for voice acquisition. The voice data were recorded on an SD card.

說話內容是讓受檢者在圖10所示的17個句子中,分別朗讀第1至13項兩次以及第14至17項一次。The content of the speech is for the subject to read items 1 to 13 twice and items 14 to 17 once in the 17 sentences shown in Figure 10.

在取得語音時,向受檢者說明將在研究中使用語音分析神經精神病患的聲音與疾病的關係性、說話的內容以及取得語音的方法,並簽署書面同意。另外,以無法識別個人身份的格式對包括語音的取得資料進行符號化和管理。When obtaining the speech, explain to the examinee that it will be used in the study to analyze the relationship between the neuropsychiatric patient's voice and the disease, the content of the speech, and the method of obtaining the speech, and sign a written consent. In addition, symbolize and manage the acquired data, including voice, in a format that cannot identify individuals.

在上述17種類型的說話內容中,每位受檢者的朗讀第1到13種的說話內容(各2次,總共說26次)和朗讀第14到17種的說話內容(各1次,總共說4次),總共30次說話中,將較長的說話內容分為兩部分,並把不清楚的說話內容排除,從而取得每種疾病患者和健康受檢者的語音。 <<估計程式 2>> <<提取與環境無關的語音特徵量>> 針對健康受檢者B組中的四個健康受檢者,在七個不同的場所(醫院檢查室和治療室)中取得了語音。Among the above 17 types of speech content, each subject reads the 1st to 13th types of speech content (2 times each, totaling 26 times) and reads the 14th to 17th types of speech content (1 time each, A total of 4 speeches). In a total of 30 speeches, the longer speech content is divided into two parts, and the unclear speech content is excluded, so as to obtain the speech of each disease patient and healthy subject. <<Estimation Program 2>> <<Extraction of voice features irrelevant to the environment>> For the four healthy subjects in the healthy subject group B, voices were obtained in seven different places (hospital examination rooms and treatment rooms).

將這些語音進行了歸一化處理之後，使用OpenSMILE進行語音分析並提取7440個聲音特徵量。關於該特徵量，對應每個短語以成對t檢定(Paired t-test)進行比較。結果顯示，朗讀「從前從前在某個地方」的短語中，取得505個場所之間沒有顯著差異(p>0.5)的聲音特徵量。同樣以相同的方式，朗讀「昨天睡的很好」取得553個、朗讀「感到生氣」取得727個、朗讀「加油吧」取得525個場所之間沒有顯著差異的聲音特徵量。After these voices were normalized, voice analysis was performed with OpenSMILE and 7,440 voice features were extracted. These features were compared phrase by phrase using a paired t-test. As a result, for the reading of the phrase "Once upon a time, in a certain place," 505 voice features showed no significant difference (p>0.5) between locations. In the same way, 553 such features were obtained for the reading of "I slept well yesterday," 727 for "I feel angry," and 525 for "Come on."

另外，關於相同特徵量，以非成對t檢定(Unpaired t-test)比較了健康受檢者A組和健康受檢者B組以及重度憂鬱症A組和重度憂鬱症B組的語音。另外，以非成對t檢定(Unpaired t-test)比較了健康受檢者A組和重度憂鬱症A組、健康受檢者B組和重度憂鬱症B組的語音。結果，在朗讀「從前從前在某個地方」中選擇了246個聲音特徵量，這些聲音特徵量在健康受檢者A組和B組之間以及重度憂鬱症A組和B組之間沒有顯著差異(P>0.1)，但在重度憂鬱症組和健康受檢者組之間有顯著差異。另外，以相同的方法，在朗讀「昨天睡的很好」中選擇了336個、在朗讀「感到生氣」中選擇了231個、在朗讀「加油吧」中選擇了363個聲音特徵量。In addition, for the same features, unpaired t-tests were used to compare the voices of healthy subject group A with healthy subject group B, and severe depression group A with severe depression group B. Unpaired t-tests were also used to compare healthy subject group A with severe depression group A, and healthy subject group B with severe depression group B. As a result, 246 voice features were selected for the reading of "Once upon a time, in a certain place": features showing no significant difference between cohorts A and B within the same group (P>0.1), but a significant difference between the severe depression group and the healthy subject group. In the same way, 336 voice features were selected for the reading of "I slept well yesterday," 231 for "I feel angry," and 363 for "Come on."
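This combined unpaired-t-test selection can be sketched as follows. A minimal illustration under our own assumptions: the matrices have shape (speakers × features), only the P>0.1 stability threshold comes from the text, and the disease-difference threshold is assumed.

```python
import numpy as np
from scipy import stats

def select_disease_features(he_a, he_b, mdd_a, mdd_b,
                            stability_p=0.1, disease_p=0.05):
    """Keep features that do not differ between cohorts A and B of the
    same group (unpaired t-test, P > stability_p) but do differ
    between the depression and healthy groups (P < disease_p)."""
    kept = []
    for i in range(he_a.shape[1]):
        _, p_he = stats.ttest_ind(he_a[:, i], he_b[:, i])
        _, p_mdd = stats.ttest_ind(mdd_a[:, i], mdd_b[:, i])
        he_all = np.concatenate([he_a[:, i], he_b[:, i]])
        mdd_all = np.concatenate([mdd_a[:, i], mdd_b[:, i]])
        _, p_dis = stats.ttest_ind(he_all, mdd_all)
        if p_he > stability_p and p_mdd > stability_p and p_dis < disease_p:
            kept.append(i)
    return kept

# Toy data: feature 0 is cohort-stable and disease-discriminative
# (kept); feature 1 shifts between healthy cohorts A and B (rejected).
he_a = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]])
he_b = np.array([[1.0, 5.0], [0.0, 6.0], [1.0, 5.0], [0.0, 6.0]])
mdd_a = np.array([[10.0, 0.0], [11.0, 1.0], [10.0, 0.0], [11.0, 1.0]])
mdd_b = np.array([[11.0, 0.0], [10.0, 1.0], [11.0, 0.0], [10.0, 1.0]])
selected = select_disease_features(he_a, he_b, mdd_a, mdd_b)  # → [0]
```

The design mirrors the text: cohort stability filters out recording-environment effects, while the group comparison keeps only features that actually carry disease information.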

然後，作為透過成對t檢定和非成對t檢定選擇的聲音特徵量，朗讀「從前從前在某個地方」取得21個、朗讀「昨天睡的很好」取得14個、朗讀「感到生氣」取得28個、朗讀「加油吧」取得46個語音特徵量。Then, as the voice features selected by both the paired t-test and the unpaired t-test, 21 features were obtained for the reading of "Once upon a time, in a certain place," 14 for "I slept well yesterday," 28 for "I feel angry," and 46 for "Come on."

以相同的方式,在每個短語中,提取出通過成對t檢定無顯著差異的選定特徵量和通過非成對t檢定無顯著差異的選定特徵量。然後,選擇在成對t檢定和非成對t檢定之間沒有顯著差異的選定特徵量作為共通特徵量。圖16總結了以這種方式提取與環境無關的聲音特徵量的結果。 <<6. 創建估計程式2-1(機器學習)>>In the same way, in each phrase, the selected feature quantities that are not significantly different by the paired t test and the selected feature quantities that are not significantly different by the unpaired t test are extracted. Then, the selected feature quantity that has no significant difference between the paired t test and the unpaired t test is selected as the common feature quantity. Fig. 16 summarizes the results of extracting sound feature quantities irrelevant to the environment in this way. <<6. Create estimation program 2-1 (machine learning)>>

接著，使用30名重度憂鬱症患者和30名健康受檢者的「從前從前在某個地方」的說話語音，以及21個與環境無關的語音特徵量作為學習資料，並基於用以估計重度憂鬱症及健康者其中之一的特徵量F(a)創建了估計程式2-1。 <<通過估計程式2-1估計重度憂鬱症>>Next, the readings of "Once upon a time, in a certain place" by 30 patients with severe depression and 30 healthy subjects, together with the 21 environment-independent voice features, were used as training data, and estimation program 2-1 was created based on the feature value F(a) for estimating whether a person has severe depression or is healthy. <<Estimation of severe depression by estimation program 2-1>>

使用25名重度憂鬱症患者和52名健康受檢者的聲音作為驗證資料，並將結果顯示於圖11(尤登指數中的混淆矩陣，以下相同)。 <<估計程式2-2>>The voices of 25 patients with severe depression and 52 healthy subjects were used as validation data, and the results are shown in Figure 11 (the confusion matrix at the Youden index, the same applies hereinafter). <<Estimation program 2-2>>

與估計程式2-1相同，使用朗讀「昨天睡的很好」的語音。同時，除了對「昨天睡的很好」使用了14個在成對t檢定和非成對t檢定中沒有顯著差異的聲音特徵量外，使用與創建估計程式2-1相同的方式創建估計程式2-2並進行驗證，其結果顯示於圖12。 <<估計程式2-3>>As with estimation program 2-1, the reading of "I slept well yesterday" was used. Except that the 14 voice features showing no significant difference in either the paired or the unpaired t-test for "I slept well yesterday" were used, estimation program 2-2 was created and verified in the same way as estimation program 2-1; the results are shown in Figure 12. <<Estimation program 2-3>>

與估計程式2-1相同，使用朗讀「感到生氣」的語音。同時，除了對「感到生氣」使用了28個在成對t檢定和非成對t檢定中沒有顯著差異的聲音特徵量外，使用與創建估計程式2-1相同的方式創建估計程式2-3並進行驗證，其結果顯示於圖13。 <<估計程式2-4>>As with estimation program 2-1, the reading of "I feel angry" was used. Except that the 28 voice features showing no significant difference in either the paired or the unpaired t-test for "I feel angry" were used, estimation program 2-3 was created and verified in the same way as estimation program 2-1; the results are shown in Figure 13. <<Estimation program 2-4>>

與估計程式2-1相同，使用朗讀「加油吧」的語音。同時，除了對「加油吧」使用了46個在成對t檢定和非成對t檢定中沒有顯著差異的聲音特徵量外，使用與創建估計程式2-1相同的方式創建估計程式2-4並進行驗證，其結果顯示於圖14。 <7. 創建估計程式3的示例>As with estimation program 2-1, the reading of "Come on" was used. Except that the 46 voice features showing no significant difference in either the paired or the unpaired t-test for "Come on" were used, estimation program 2-4 was created and verified in the same way as estimation program 2-1; the results are shown in Figure 14. <7. Example of creating estimation program 3>

除了估計程式2-1至2-4之外，使用上述相同的方法，以「今天是晴天」的語音創建估計程式5、以「筋疲力盡」的語音創建估計程式6、以「心情很平靜」的語音創建估計程式7。使用上述七個估計程式，根據各自對應的說話，可以判斷是患有重度憂鬱症還是健康的。然後，根據七個判斷的多數決，獲得每個人的最終估計結果，其結果顯示於圖15。In addition to estimation programs 2-1 to 2-4, estimation program 5 was created in the same manner from the reading of "Today is sunny," estimation program 6 from "I am exhausted," and estimation program 7 from "I feel calm." Using these seven estimation programs, each corresponding utterance can be judged as indicating severe depression or health. Then, the final estimation result for each person was obtained by majority decision over the seven judgments; the results are shown in Figure 15.

本發明的疾病估計系統中使用的疾病估計程式並不分析說話語音在聲音特徵量方面的內容。估計程式從提取說話中的聲音特徵量所形成的特徵量中計算出疾病預測值。因此，具有不依賴語言的優勢。然而，當受檢者實際說話時，聲音特性可能會受到影響，因為除非句子是他們的母語，否則他們無法流利地說話。因此，例如，在估計母語為英語的受檢者的疾病時，最好先收集和分析母語為英語的患者和健康受檢者的聲音，創建一個英語的估計程式，然後用這個程式通過英語語音來估計疾病。以相同的方式，可以創建除日語和英語以外的其他語言的估計程式。The disease estimation program used in the disease estimation system of the present invention does not analyze the linguistic content of the spoken speech; it calculates a disease prediction value from the features formed by extracting voice features from the speech. It therefore has the advantage of being language-independent. However, when subjects actually speak, the voice characteristics may be affected, because they cannot speak fluently unless the sentences are in their native language. Therefore, for example, when estimating the disease of a subject whose native language is English, it is best to first collect and analyze the voices of patients and healthy subjects whose native language is English, create an English estimation program, and then use that program to estimate the disease from English speech. In the same way, estimation programs can be created for languages other than Japanese and English.

當創建用於英語的估計程式並據此估計疾病時,受檢者或受檢者所閱讀的句子的示例包括以下英語句子。When an estimation program for English is created and diseases are estimated accordingly, examples of the subject or sentences read by the subject include the following English sentences.

例如，以下可以作為英語句子的示例。 (1) A,B,C,D,E,F,G (2) Prevention is better than cure. (3) Time and tide wait for no man. (4) Seeing is believing. (5) A rolling stone gathers no moss. (6) One, Two, Three, Four, Five, Six, Seven, Eight 對用任何語言說出的句子沒有特別的限制，但從便於任何人閱讀的角度來看，這些句子最好是眾所周知的。另外，長母音如「a~~~」、「e~~~」、「u~~~」等，最好是任何人都能讀出來，無論哪種語言是他們的母語。For example, the following can serve as examples of English sentences. (1) A, B, C, D, E, F, G (2) Prevention is better than cure. (3) Time and tide wait for no man. (4) Seeing is believing. (5) A rolling stone gathers no moss. (6) One, Two, Three, Four, Five, Six, Seven, Eight There is no particular restriction on the sentences spoken in any language, but from the standpoint of being easy for anyone to read, well-known sentences are preferable. In addition, long vowels such as "a~~~," "e~~~," and "u~~~" can preferably be produced by anyone, regardless of their native language.

也可以使用市售的特徵量提取程式來從說話的語音提取聲音特徵量。具體的例子,例如是openSMILE等。It is also possible to use a commercially available feature quantity extraction program to extract the voice feature quantity from the spoken voice. A specific example is openSMILE, etc.

估計裝置200可以應用於例如機器人、人工智慧、汽車、或呼叫中心、互聯網、移動終端裝置應用和服務,如智慧手機和平板裝置以及檢索系統。此外,估計裝置200也可應用於診斷裝置、自動醫療問診裝置、災難分類等。The estimation device 200 can be applied to, for example, robots, artificial intelligence, automobiles, or call centers, the Internet, mobile terminal device applications and services, such as smart phones and tablet devices, and retrieval systems. In addition, the estimation device 200 can also be applied to a diagnosis device, an automatic medical consultation device, disaster classification, and the like.

而且，上述詳細的說明已清楚闡明實施例的特徵和優點。其目的在於在不偏離申請專利範圍的精神和範圍的情況下，使申請專利範圍涵蓋上述實施例的特點和優點。此外，具有本領域普通知識的人應該能夠輕易地設想出各種改進和修改。因此，無意將本發明的實施例的範圍限制為上述範圍，並且可以依賴於實施例中公開的範圍中包括的適當的改進和等同形式。 產業利用性Moreover, the above detailed description has clearly set forth the features and advantages of the embodiments. It is intended that the claims extend to the features and advantages of the embodiments described above without departing from the spirit and scope of the claims. Furthermore, a person of ordinary skill in the art should be able to easily conceive of various improvements and modifications. Therefore, the scope of the embodiments of the present invention is not intended to be limited to the above, and appropriate improvements and equivalents included within the scope disclosed in the embodiments may be relied upon. Industrial applicability

提供一種可估計受檢者說話的語音，識別和估計受檢者所患的疾病以防止疾病加重，並基於對疾病的準確確認，使患者能夠接受適當的治療的估計系統、估計程式及估計方法。Provided are an estimation system, an estimation program, and an estimation method that estimate disease from a subject's spoken voice, identify and estimate the disease the subject suffers from so as to prevent aggravation of the disease, and, based on accurate confirmation of the disease, enable the patient to receive appropriate treatment.

本專利申請係主張2020年1月9日提交的日本專利申請號2020-001943的優先權,並且引用該日本專利申請中說明的所有內容。This patent application claims the priority of Japanese Patent Application No. 2020-001943 filed on January 9, 2020, and cites all the contents described in the Japanese Patent Application.

100:電腦 101:CPU 102:RAM 103:ROM 104:HDD 105:通信介面 106:輸入/輸出介面 107:媒體介面 108:記錄媒體 200:估計裝置 201:用戶端 202:通信單元 203:聲音特徵量提取單元 204:第一聲音特徵量的提取單元 205:第二聲音特徵量的提取單元 206:計算單元 207:估計單元 208:記憶單元 N:網路 S1001,S1002,S1003,S1004A:步驟 S1004B,S1005A,S1005B,S1006:步驟 S2001,S2002,S2003,S2004,S2005:步驟100: Computer 101: CPU 102: RAM 103: ROM 104: HDD 105: Communication interface 106: input/output interface 107: Media Interface 108: recording media 200: estimation device 201: Client 202: Communication Unit 203: Sound feature extraction unit 204: Extraction unit of the first sound feature quantity 205: Extraction unit of the second sound feature amount 206: Computing Unit 207: estimation unit 208: memory unit N: Network S1001, S1002, S1003, S1004A: steps S1004B, S1005A, S1005B, S1006: steps S2001, S2002, S2003, S2004, S2005: steps

圖1為本發明的硬體配置的示例圖。 圖2為本發明的配置的示例圖。 圖3為本發明的流程圖，其中詳細說明提取不受語音取得場所影響的聲音特徵量。 圖4為本發明的流程圖。 圖5顯示出根據本發明的在提取不受語音取得場所影響的聲音特徵量之後疾病估計的準確性的示例圖。 圖6顯示出在成對t檢定或t檢定中具有顯著差異的聲音特徵量的示例圖。 圖7顯示出在成對t檢定或t檢定中沒有顯著差異的聲音特徵量的示例圖。 圖8顯示出疾病預測值的示例圖。 圖9顯示出按疾病分佈的疾病預測值的示例圖。 圖10顯示出受檢者大聲讀出的短語內容的示例圖。 圖11顯示出根據本發明的在提取不受語音取得場所影響的聲音特徵量之後疾病估計的準確性的另一示例圖。 圖12顯示出根據本發明的在提取不受語音取得場所影響的聲音特徵量之後疾病估計的準確性的另一示例圖。 圖13顯示出根據本發明的在提取不受語音取得場所影響的聲音特徵量之後疾病估計的準確性的另一示例圖。 圖14顯示出根據本發明的在提取不受語音取得場所影響的聲音特徵量之後疾病估計的準確性的另一示例圖。 圖15顯示出根據本發明的在提取不受語音取得場所影響的聲音特徵量之後疾病估計的準確性的另一示例圖。 圖16顯示出提取獨立於環境的聲音特徵量的結果的表格。Fig. 1 is an example diagram of the hardware configuration of the present invention. Fig. 2 is a diagram showing an example of the configuration of the present invention. Fig. 3 is a flowchart of the present invention, which explains in detail the extraction of voice feature quantities that are not affected by the location where the voice is obtained. Fig. 4 is a flow chart of the present invention. FIG. 5 shows an exemplary diagram of the accuracy of disease estimation after extracting sound feature quantities that are not affected by the voice acquisition location according to the present invention. Fig. 6 shows an example diagram of voice feature quantities that have significant differences in paired t-tests or t-tests. Fig. 7 shows an example diagram of voice feature quantities that are not significantly different in paired t-tests or t-tests. Figure 8 shows an example graph of disease predictive value. Fig. 9 shows an example graph of disease prediction values distributed by disease. Fig. 10 shows an example diagram of the content of a phrase read aloud by the subject. FIG. 11 shows another example diagram of the accuracy of disease estimation after extracting sound feature quantities that are not affected by the voice acquisition location according to the present invention. FIG. 12 shows another example diagram of the accuracy of disease estimation after extracting the sound feature amount that is not affected by the voice acquisition location according to the present invention. FIG. 13 shows another exemplary diagram of the accuracy of disease estimation after extracting the sound feature amount that is not affected by the voice acquisition location according to the present invention. FIG. 14 shows another example diagram of the accuracy of disease estimation after extracting the sound feature amount that is not affected by the voice acquisition location according to the present invention. FIG. 15 shows another example diagram of the accuracy of disease estimation after extracting the sound feature amount that is not affected by the voice acquisition location according to the present invention. Fig. 16 shows a table of the results of extracting sound feature quantities independent of the environment.

100:電腦 100: Computer

101:CPU 101: CPU

102:RAM 102: RAM

103:ROM 103: ROM

104:HDD 104: HDD

105:通信介面 105: Communication interface

106:輸入/輸出介面 106: input/output interface

107:媒體介面 107: Media Interface

108:記錄媒體 108: recording media

N:網路 N: Network

Claims (4)

1. 一種精神和神經系統疾病的估計裝置，包含：一提取單元，基於不隨錄音環境產生顯著差異的聲音特徵量(A)及各種疾病相關的聲音特徵量(B)，提取所述聲音特徵量(A)及所述聲音特徵量(B)的共通的聲音特徵量(C)；一計算單元，基於所述聲音特徵量(C)計算疾病預測值；以及一估計單元，通過輸入所述疾病預測值來估計疾病。An estimation device for mental and nervous system diseases, comprising: an extraction unit that, based on voice features (A) showing no significant difference across recording environments and voice features (B) related to various diseases, extracts voice features (C) common to the voice features (A) and the voice features (B); a calculation unit that calculates a disease prediction value based on the voice features (C); and an estimation unit that estimates a disease by inputting the disease prediction value.

2. 如請求項1之估計裝置，其中，可估計的所述疾病的候選者包括阿茲海默症、利維體認知障礙症、帕金森氏症、雙相情感障礙、非典型性憂鬱症以及重度憂鬱症。The estimation device of claim 1, wherein candidates for the estimable diseases include Alzheimer's disease, Lewy body dementia, Parkinson's disease, bipolar disorder, atypical depression, and severe depression.

3. 如請求項1之估計裝置，其中，可估計的所述疾病的候選者之一是重度憂鬱症。The estimation device of claim 1, wherein one of the candidates for the estimable diseases is severe depression.

4. 一種估計裝置的運作方法，包含：以估計裝置的提取單元基於不隨錄音環境產生顯著差異的聲音特徵量(A)及各種疾病相關的聲音特徵量(B)，提取所述聲音特徵量(A)及所述聲音特徵量(B)的共通的聲音特徵量(C)的步驟；以所述估計裝置的計算單元基於所述聲音特徵量(C)計算疾病預測值的步驟；以及以所述估計裝置的估計單元通過輸入所述疾病預測值來估計疾病的步驟。An operation method of an estimation device, comprising: a step in which an extraction unit of the estimation device, based on voice features (A) showing no significant difference across recording environments and voice features (B) related to various diseases, extracts voice features (C) common to the voice features (A) and the voice features (B); a step in which a calculation unit of the estimation device calculates a disease prediction value based on the voice features (C); and a step in which an estimation unit of the estimation device estimates a disease by inputting the disease prediction value.
TW110100726A 2020-01-09 2021-01-08 Method of extracting acoustic features in a disease estimation program, and disease estimation program and an apparatus using the acoustic features TW202135048A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020001943A JP2023015420A (en) 2020-01-09 2020-01-09 Acoustic feature amount extraction method in disease estimation program, disease estimation program using the acoustic feature amount, and device
JP2020-001943 2020-01-09

Publications (1)

Publication Number Publication Date
TW202135048A true TW202135048A (en) 2021-09-16

Family

ID=76788074

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110100726A TW202135048A (en) 2020-01-09 2021-01-08 Method of extracting acoustic features in a disease estimation program, and disease estimation program and an apparatus using the acoustic features

Country Status (3)

Country Link
JP (1) JP2023015420A (en)
TW (1) TW202135048A (en)
WO (1) WO2021141088A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7713184B1 (en) * 2025-02-04 2025-07-25 株式会社エクサウィザーズ Method, program, information processing device, and information processing system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4876207B2 (en) * 2010-06-11 2012-02-15 国立大学法人 名古屋工業大学 Cognitive impairment risk calculation device, cognitive impairment risk calculation system, and program
JP6695057B2 (en) * 2016-04-27 2020-05-20 パナソニックIpマネジメント株式会社 Cognitive function evaluation device, cognitive function evaluation method, and program
JP6748965B2 (en) * 2016-09-27 2020-09-02 パナソニックIpマネジメント株式会社 Cognitive function evaluation device, cognitive function evaluation method, and program
JP6337362B1 (en) * 2017-11-02 2018-06-06 パナソニックIpマネジメント株式会社 Cognitive function evaluation apparatus and cognitive function evaluation system

Also Published As

Publication number Publication date
JP2023015420A (en) 2023-02-01
WO2021141088A1 (en) 2021-07-15

Similar Documents

Publication Publication Date Title
JP7662168B2 (en) System, program, and method for predicting mental and nervous system disorders
Espinola et al. Detection of major depressive disorder using vocal acoustic analysis and machine learning—an exploratory study
Wanderley Espinola et al. Detection of major depressive disorder, bipolar disorder, schizophrenia and generalized anxiety disorder using vocal acoustic analysis and machine learning: an exploratory study
Silva et al. Voice acoustic parameters as predictors of depression
Tanaka et al. Detecting dementia through interactive computer avatars
Lopez-de-Ipiña et al. On automatic diagnosis of Alzheimer’s disease based on spontaneous speech analysis and emotional temperature
Zhou et al. Developing a machine learning model for detecting depression, anxiety, and apathy in older adults with mild cognitive impairment using speech and facial expressions: A cross-sectional observational study
Pernon et al. Perceptual classification of motor speech disorders: The role of severity, speech task, and listener's expertise
Makiuchi et al. Speech paralinguistic approach for detecting dementia using gated convolutional neural network
Tanaka et al. Automatic detection of very early stage of dementia through multimodal interaction with computer avatars
Alghowinem From joyous to clinically depressed: mood detection using multimodal analysis of a person's appearance and speech
Wisler et al. Speech-based estimation of bulbar regression in amyotrophic lateral sclerosis
Luo et al. Differentiation between depression and bipolar disorder in child and adolescents by voice features
Pacheco-Lorenzo et al. Analysis of voice biomarkers for the detection of cognitive impairment
CN120584382A (en) System for assisting in the diagnosis of neurodevelopmental disorders and related mental health disorders in child or adolescent users
Yamada et al. Automatic assessment of loneliness in older adults using speech analysis on responses to daily life questions
Wu et al. Mobile virtual assistant for multi-modal depression-level stratification
TW202135048A (en) Method of extracting acoustic features in a disease estimation program, and disease estimation program and an apparatus using the acoustic features
Zhao et al. Decoupled multi-perspective fusion for speech depression detection
Talkar et al. Detection of Cognitive Impairment And Alzheimer's Disease Using a Speech-and Language-Based Protocol
Williams et al. Predicting acute pain levels implicitly from vocal features
JP7265293B2 (en) Apparatus for estimating mental and nervous system diseases using voice
Shibina et al. Acoustic Signal Based Diagnosis of Neurodegenerative Parkinson's Disease through Machine Learning Approach: A Review
Chang et al. A dual-modal fusion framework for detection of mild cognitive impairment based on autobiographical memory
Härmä et al. Survey on biomarkers in human vocalizations