TWI854592B - Methods and systems for predicting risk of cardiovascular disease - Google Patents
Methods and systems for predicting risk of cardiovascular disease
- Publication number
- TWI854592B
- Authority
- TW
- Taiwan
- Prior art keywords
- data
- cvd
- blood pressure
- risk
- sbp
Landscapes
- Measuring And Recording Apparatus For Diagnosis (AREA)
- Measuring Pulse, Heart Rate, Blood Pressure Or Blood Flow (AREA)
Abstract
Description
The present invention relates to methods and computer-implemented methods for predicting the risk of cardiovascular disease, which use a machine learning model to analyze the relationship between the occurrence of cardiovascular disease and a patient's health data.
Hypertension is one of the most prevalent public health problems in the world. It increases the risk of serious health problems such as stroke, kidney failure, and cardiovascular disease (CVD). If not treated promptly, hypertension can lead to a heart attack or stroke. In recent years, researchers have gained a deeper understanding of the pathophysiology of hypertension and have developed more effective methods of treatment and prevention. Notably, home blood pressure (HBP) levels have been shown to predict future hypertension and cardiovascular disease.
Poor blood pressure control is influenced by multiple factors, including obesity, salt sensitivity, psychological stress, genetic predisposition, epigenetics (regulation of gene expression without changes to the DNA sequence), sleep apnea, autonomic regulation, the microbiome, and environmental factors such as smoking, excessive salt intake, excessive alcohol consumption, and socioeconomic status. These factors usually interact in complex ways and affect various diseases. For this reason, traditional statistical models rarely capture all of the complex causal relationships among risk factors, and comprehensive data analysis is essential for developing accurate disease prediction models. In the past, various statistical methods were used to develop prediction models and to discover important risk factors. More recently, artificial intelligence (AI) and big data have emerged and are increasingly being used to develop disease prediction models.
The present invention relates to methods and computer-implemented methods for assessing an individual to predict the risk of a cardiovascular disease event occurring within a period of 1 month to 4 years. These methods use machine learning models to analyze the relationship between the occurrence of cardiovascular disease and the individual's health data.
CVD prediction is an important goal of modern medicine. Predicting CVD risk so that treatment can begin early can significantly reduce the risk of serious illness. Blood pressure (BP) is a key indicator measured in most health examinations. In the past, BP could only be measured in hospitals or health centers; portable BP monitoring devices now allow regular BP monitoring at home, so these measurements have become easier to obtain. The results presented herein clearly reveal the relationship between CVD and BP. Daily BP monitoring with wearable devices, together with records of exercise, alcohol intake, medication dosage, and other data, can help treat cardiovascular disease promptly and accurately in the future.
Prediction of CVD can also be improved if the patient's blood pressure data from before the onset of CVD are available.
As used herein, the terms "a" or "an" are used to describe elements and components of the present invention. This is done merely for convenience and to give the basic concept of the present invention. Further, the description should be understood to include one or at least one, and unless the context clearly indicates otherwise, singular terms include the plural and plural terms include the singular. When used in conjunction with the word "comprising" in the claims, the terms "a" or "an" may mean one or more.
The term "or" as used herein may mean "and/or".
The present invention provides a method for generating a prediction model for estimating cardiovascular disease (CVD) risk, comprising: (a) obtaining a database from one or more sources, wherein the database comprises health data of non-CVD patients and CVD patients, and data related to the CVD onset time of the CVD patients, and the health data comprise demographic data, personal habit data, disease data, treatment data, blood analysis data, and blood pressure data; (b) inputting the database into at least one machine learning model to train the at least one machine learning model to predict the occurrence of CVD; (c) evaluating the accuracy of the at least one machine learning model from step (b), and selecting a first machine learning model from the at least one machine learning model when the accuracy of the first machine learning model is higher than an accuracy threshold; and (d) using the first machine learning model to generate the prediction model for estimating CVD risk at different time points.
In one embodiment, the one or more sources comprise the database of the Taiwan Consortium of Hypertension-associated Cardiac Disease and the database of the Taiwan Health and Welfare Data Science Center.
In certain aspects, the non-CVD patient is a patient initially diagnosed as not having CVD, and the CVD patient is a patient initially diagnosed as having CVD.
In another embodiment, the CVD comprises myocardial infarction, stroke, heart failure, cardiovascular death, or a combination thereof.
In one embodiment, the demographic data comprise age, gender, body mass index (BMI), and waist circumference.
In another embodiment, the personal habit data comprise smoking habits, drinking habits, and exercise habits. In certain aspects, smoking habits and drinking habits are collected once a month or once a year, and exercise habits are collected once a week.
In one embodiment, the disease data comprise hypertension (HT), diabetes mellitus (DM), hyperlipidemia (HL), or a combination thereof.
In another embodiment, the treatment data comprise the use of antihypertensive drugs, antihyperglycemic drugs, antihyperlipidemic drugs, aspirin, or a combination thereof.
In one embodiment, the blood analysis data comprise aspartate aminotransferase (GOT), alanine aminotransferase (GPT), blood glucose, glycated hemoglobin (such as HbA1c), cholesterol, low-density lipoprotein (LDL), high-density lipoprotein (HDL), or a combination thereof.
In one embodiment, the blood pressure data comprise systolic blood pressure (SBP) data and diastolic blood pressure (DBP) data. In a preferred embodiment, the SBP data comprise the mean of SBP, the standard deviation (SD) of SBP, the coefficient of variation (CV) of SBP, the average real variability (ARV) of SBP, the maximum and minimum of the morning and evening average (MEave) of SBP over a period of time, or a combination thereof. In a more preferred embodiment, the SBP data comprise the maximum and minimum of the morning and evening average of SBP over a period of time. In another embodiment, the DBP data comprise the mean of DBP, the SD of DBP, the CV of DBP, the ARV of DBP, the maximum and minimum of the MEave of DBP over a period of time, or a combination thereof. In certain aspects, the MEave of SBP and DBP is collected by measuring blood pressure before meals between 6 am and 8 am and between 4 pm and 6 pm. In another embodiment, the collection period for SBP and DBP is 1 to 12 weeks. In a preferred embodiment, the collection period for SBP and DBP is 1 to 6 weeks. In a more preferred embodiment, the collection period for SBP and DBP is 1 week.
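The blood pressure summary features named above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the invention: the text does not give formulas, so the sketch assumes the sample standard deviation, CV as SD divided by the mean, ARV as the mean absolute difference between consecutive readings, and MEave as the per-day average of one morning and one evening reading. The function names `bp_variability_features` and `me_ave_extremes` are hypothetical.

```python
from statistics import mean, stdev


def bp_variability_features(readings):
    """Summarise a chronological list of BP readings (mmHg).

    Assumed definitions (not stated in the source text):
    SD is the sample standard deviation, CV = SD / mean,
    ARV = mean absolute difference between consecutive readings.
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    avg = mean(readings)
    sd = stdev(readings)
    successive_diffs = [abs(b - a) for a, b in zip(readings, readings[1:])]
    return {"mean": avg, "sd": sd, "cv": sd / avg, "arv": mean(successive_diffs)}


def me_ave_extremes(morning, evening):
    """Return (max, min) of the daily morning-evening average.

    Assumes one morning and one evening reading per day, in the same order.
    """
    if len(morning) != len(evening) or not morning:
        raise ValueError("need matching, non-empty morning and evening series")
    daily = [(m + e) / 2 for m, e in zip(morning, evening)]
    return max(daily), min(daily)
```

The same two functions would apply unchanged to SBP and DBP series, for example over a 7-day collection period.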
In one embodiment, the database further comprises temperature data and air pollution data. In a preferred embodiment, the air pollution data comprise the air quality index (AQI), O3, PM10, and PM2.5. In a more preferred embodiment, the air pollution data comprise PM2.5.
In the present invention, the database is randomly divided into a training set and a validation set for training the at least one machine learning model. Typically, the training set is used in conjunction with the validation set. The term "validation set" refers to a group of patients in a statistical sample whose data are used to validate or evaluate the quantitative values determined using the training set.
In one embodiment, the at least one machine learning model comprises logistic regression (LR), deep neural networks (DNNs), random forest (RF), light gradient boosting machine (LightGBM), eXtreme gradient boosting (XGBoost), decision trees (DTs), k-nearest neighbors (KNN), adaptive boosting (AdaBoost), gradient boosting (Gboost), DT bagging (DTB), KNN bagging (KNNB), or RF bagging (RFB). In a preferred embodiment, the at least one machine learning model comprises XGBoost, DTB, or RF. In a preferred embodiment, the first machine learning model comprises XGBoost, DTB, or RF.
In addition, the at least one machine learning model comprises deep-learning architectures. In certain aspects, the deep-learning architectures comprise deep neural networks, deep belief networks, deep reinforcement learning, recurrent neural networks, convolutional neural networks, and transformers.
In the present invention, the at least one machine learning model may be a supervised machine learning algorithm. The supervised machine learning algorithm may be trained using the patient's previous data, previous data of similar patients, or a combination thereof. The supervised machine learning algorithm may be a regression algorithm, a support vector machine, a decision tree, a neural network, and the like. Where the machine learning model is a regression algorithm, the weights may be regression parameters. The supervised machine learning algorithm may be a binary classifier that predicts whether a patient will experience a CVD event. The binary classifier may produce a probabilistic risk score between 0 and 1. In some cases, the system may map the probabilistic risk score to a qualitative risk category. Alternatively, the supervised machine learning algorithm may be a multiclass classifier that directly predicts the qualitative risk category. In addition, internal validation of the at least one machine learning model is performed using ten-fold cross-validation.
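Two ideas from the preceding paragraph, ten-fold cross-validation and the mapping of a probabilistic risk score to a qualitative category, can be sketched as follows. This is an illustrative sketch under stated assumptions: the cut-off values 0.10 and 0.30 and the labels "low", "moderate", and "high" are hypothetical and do not come from the text, and `k_fold_indices` and `risk_category` are names introduced here for illustration only.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for k-fold cross-validation.

    Indices are shuffled once, then dealt into k folds; each fold serves
    as the validation set exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def risk_category(probability, cutoffs=((0.10, "low"), (0.30, "moderate"))):
    """Map a risk score in [0, 1] to a qualitative label.

    The cut-offs are hypothetical placeholders, not values from the source.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    for upper_bound, label in cutoffs:
        if probability < upper_bound:
            return label
    return "high"
```

In practice each training split would be used to fit a model and the matching validation split to score it, with the ten scores averaged.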
In another embodiment, the accuracy threshold is 0.85. In a preferred embodiment, the accuracy threshold is 0.9. In a more preferred embodiment, the accuracy threshold is 0.95.
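The threshold-based selection of a model can be illustrated with a short sketch. It assumes accuracy means the fraction of correct predictions on held-out data, which the text does not spell out. The accuracy figures in `example_accuracies` are made-up placeholders for illustration and are not results from the invention; `accuracy` and `select_models` are hypothetical helper names.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def select_models(accuracies, threshold=0.85):
    """Return names of models whose accuracy exceeds the threshold, best first."""
    passing = {name: acc for name, acc in accuracies.items() if acc > threshold}
    return sorted(passing, key=passing.get, reverse=True)


# Made-up accuracies, for illustration only (not reported results).
example_accuracies = {"XGBoost": 0.93, "DTB": 0.91, "RF": 0.96, "LR": 0.78}
```

Raising the threshold from 0.85 to 0.95 narrows the candidates; with the placeholder numbers above, only one model would remain.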
In one embodiment, the data related to the CVD onset time of the CVD patients comprise the onset time of a CVD event occurring in a CVD patient within 1 month, 3 months, 6 months, 9 months, 1 year, 2 years, 3 years, or 4 years. In certain aspects, the at least one machine learning model is trained to analyze the relationship between the health data of the non-CVD patients and the CVD patients and the data related to the CVD onset time of the CVD patients, in order to predict the future occurrence of CVD. The present invention can therefore use data related to different CVD onset times of CVD patients to train the at least one machine learning model and generate different prediction models, each of which is used to predict the probability that an individual will experience a CVD event by a different future time point, namely within 1 month, 3 months, 6 months, 9 months, 1 year, 2 years, 3 years, or 4 years.
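One way to prepare training targets for the separate per-horizon models described above is to derive a binary label for each time point from the time to CVD onset. The sketch below is an assumption-laden illustration: it treats a month as 30 days and a year as 365 days, represents non-CVD patients with `None`, and uses the hypothetical names `HORIZON_DAYS` and `horizon_labels`.

```python
# Horizons from the text, converted to days under the assumption of
# 30-day months and 365-day years.
HORIZON_DAYS = {
    "1m": 30, "3m": 90, "6m": 180, "9m": 270,
    "1y": 365, "2y": 730, "3y": 1095, "4y": 1460,
}


def horizon_labels(days_to_onset):
    """Return {horizon: 0 or 1} for one patient.

    days_to_onset is the number of days from baseline to the CVD event,
    or None for a patient without a CVD event.
    """
    if days_to_onset is not None and days_to_onset < 0:
        raise ValueError("days_to_onset must be non-negative")
    return {
        name: int(days_to_onset is not None and days_to_onset <= limit)
        for name, limit in HORIZON_DAYS.items()
    }
```

A patient whose event occurs 100 days after baseline would be a negative example for the 1-month and 3-month models and a positive example for every longer horizon.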
In the present invention, the patient's health data are used to build a relatively high-accuracy prediction model for long-term (1 to 4 years) CVD occurrence. In addition, the patient's health data, the temperature data, and the air pollution data are used to build a relatively high-accuracy prediction model for short-term (1 to 9 months) CVD occurrence. The prediction model of the present invention can therefore estimate the risk of CVD onset at different time points. In one embodiment, the time points comprise 1 month, 3 months, 6 months, 9 months, 1 year, 2 years, 3 years, or 4 years.
The present invention also provides a system comprising one or more computers and one or more storage devices that store an executable program which, when run by the one or more computers, causes the one or more computers to perform the following operations to generate a prediction model for estimating CVD risk: (a) obtaining a database from one or more sources, wherein the database comprises health data of non-CVD patients and CVD patients, and data related to the CVD onset time of the CVD patients, and the health data comprise demographic data, personal habit data, disease data, treatment data, blood analysis data, and blood pressure data; (b) inputting the database into at least one machine learning model to train the at least one machine learning model to predict the occurrence of CVD; (c) evaluating the accuracy of the at least one machine learning model from step (b), and selecting a first machine learning model from the at least one machine learning model when the accuracy of the first machine learning model is higher than an accuracy threshold; and (d) using the first machine learning model to generate the prediction model for estimating CVD risk at different time points.
The present invention further provides a method for predicting the CVD risk of an individual, comprising: (i) obtaining health data of the individual, wherein the health data comprise demographic data, personal habit data, disease data, treatment data, blood analysis data, and blood pressure data; (ii) inputting the health data into the above-mentioned prediction model for estimating CVD risk; and (iii) outputting CVD risk prediction results for different time points.
In another embodiment, the prediction result of the prediction model for estimating CVD risk provides the probability that the individual will experience a CVD event within 10 years. In a preferred embodiment, the prediction result provides the probability that the individual will experience a CVD event within 6 years. In a more preferred embodiment, the prediction result provides the probability that the individual will experience a CVD event within 4 years.
In one embodiment, the prediction result of the prediction model for estimating CVD risk provides the probability that the individual will experience a CVD event within 1 month, 3 months, 6 months, 9 months, 1 year, 2 years, 3 years, or 4 years.
In another embodiment, the method further comprises a step (iv) after step (iii), wherein step (iv) comprises determining, based on the prediction result, whether to initiate medical intervention for the individual at risk of CVD.
The present invention further provides a computer-implemented method for determining the CVD risk of an individual, comprising: receiving health data of the individual; using the above-mentioned prediction model for estimating CVD risk to determine the CVD risk of the individual based on the received health data; and outputting the CVD risk determined for different time points, wherein the prediction model for estimating CVD risk evaluates the determined CVD risk by analyzing the relationship between the occurrence of CVD and the received health data.
In one embodiment, the individual and the patient are human.
The present invention also provides a health care system comprising: (a) a patient monitoring module for collecting patient data generated by real-time monitoring of a patient; (b) a database for collecting the patient's health data, wherein the health data comprise demographic data, personal habit data, disease data, treatment data, blood analysis data, and blood pressure data; and (c) an integration module for receiving the patient data and the health data, analyzing the patient data and the health data using the above-mentioned prediction model for estimating CVD risk, and outputting, based on that analysis, prediction results of the patient's CVD risk at different time points.
In one embodiment, the patient monitoring module is a remote patient monitoring module. The remote patient monitoring module can therefore collect patient data through remote real-time monitoring of the patient. In certain aspects, the functions of the patient monitoring module include capturing vital signs. In addition, the purpose of the remote patient monitoring module is to monitor the patient's home blood pressure.
In another embodiment, the patient monitoring module, the database, and the integration module are interconnected.
10: Health care system
101: Patient monitoring module
102: Database
103: Integration module
Figure 1 describes the data sources of the present invention. n: number of patients.
Figure 2 shows the health care system used to estimate a patient's risk of cardiovascular disease.
The embodiments of the present invention may be implemented in different ways and are not limited to the examples given below. The following examples merely represent various aspects and features of the present invention.
1.研究設計 1. Research design
The model of the present invention was built from two databases. The first is the Taiwan Consortium of Hypertension-associated Cardiac Disease (TCHC) database; the TCHC is a non-profit research consortium of 11 participating medical centers that focuses on clinical trials and research collaboration in hypertension and hypertension-related diseases. The second is the Taiwan Health and Welfare Data Science Center (HWDSC) database, which contains records of Taiwan's National Health Insurance beneficiaries, outpatient claims, inpatient claims, the pharmacy database, and the cause-of-death database from 2000 to 2017. The combined database has nearly 2,820 participants (Figure 1). Approximately 80% and 20% of the dataset were randomly selected for training and validation, respectively. Because only 20% of the patients in the dataset developed CVD, the split was made within each class: 80% of the CVD-positive records and 80% of the CVD-negative records were used for training, and the remaining 20% were used for testing. TCHC contains cardiovascular disease records, blood pressure data, medication records, lifestyle interviews, basic health information, and records of other diseases. This helps in understanding not only the effect of measurable health values on cardiovascular disease but also whether different diseases interact. Several algorithms were evaluated: logistic regression (LR), deep neural network (DNN), random forest (RF), Light Gradient Boosting Machine (LightGBM), extreme gradient boosting (XGBoost), decision trees (DTs), k-nearest neighbors (KNN), adaptive boosting (AdaBoost), gradient boosting (GBoost), DT bagging (DTB), KNN bagging (KNNB), and RF bagging (RFB). In the end, the three best-performing models were retained: XGBoost, DTB, and RF.
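The class-wise 80/20 split described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `stratified_split`, the seed, and the toy labels are assumptions made for the example and do not come from the patent.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split indices into train/test sets, sampling each class separately.

    Taking the same fraction from the CVD-positive and CVD-negative groups
    keeps the class proportions equal in both subsets.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (illustrative only): 20% CVD-positive (1), 80% CVD-negative (0).
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(labels)
```

With these toy labels the training subset keeps 16 positive and 64 negative records, so the 20% positive rate is preserved in both subsets.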
The proportions of CVD outcomes are shown in Table 1.
Table 1. Proportions of CVD outcomes
2. Data processing
The main features are as follows:
2-1. Features not related to BP:
Demographics: age, sex, BMI, waist circumference
Lifestyle habits: smoking, alcohol consumption, exercise
Diseases: hypertension (HT), diabetes mellitus (DM), hyperlipidemia (HL)
Treatment: antihypertensive drugs (anti-BP drugs), hypoglycemic drugs, lipid-lowering drugs, aspirin
Others: GOT, GPT, blood glucose (Glu_AC and Glu_PC), HbA1c, cholesterol, LDL, and HDL
2-2. BP-related features
Systolic blood pressure (SBP): mean, standard deviation (SD), coefficient of variation (CV), average real variability (ARV), and the maximum and minimum of the 7-day morning-evening average (MEave) (6 am to 8 pm / 4 pm to 6 pm, before meals).
Diastolic blood pressure (DBP): mean, SD, CV, ARV, and the maximum and minimum of MEave
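The BP variability statistics listed above can be computed from a series of readings as in the sketch below. It uses the usual definitions (CV as SD divided by the mean, ARV as the mean absolute difference between consecutive readings). The use of the sample SD, the function name `bp_variability`, and the toy readings are assumptions, since the text does not specify them.

```python
from statistics import mean, stdev


def bp_variability(readings):
    """Return summary statistics for an ordered series of BP readings (mmHg)."""
    m = mean(readings)
    sd = stdev(readings)  # sample SD (assumption: the source does not say which)
    cv = sd / m * 100     # coefficient of variation, in percent
    # Average real variability: mean absolute change between successive readings.
    arv = mean([abs(b - a) for a, b in zip(readings, readings[1:])])
    return {"mean": m, "sd": sd, "cv": cv, "arv": arv,
            "max": max(readings), "min": min(readings)}


# Illustrative 7-day series of SBP values (not patient data).
sbp_series = [130, 134, 128, 132, 136, 130, 134]
stats = bp_variability(sbp_series)
```

Unlike SD, ARV depends on the order of the readings, so it captures day-to-day swings that SD alone would miss.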
2-3. Weather
Temperature: if the patient was diagnosed with cardiovascular disease on a given day, the temperature on that day is used; if the patient did not have cardiovascular disease, the temperature on the day blood pressure was measured is used.
Air pollution values: commonly used air pollution indicators such as AQI, O3, PM10, and PM2.5.
Age, sex, BMI, and waist circumference are common features that are easy to add to a medical model. Habits (smoking, drinking, exercise) affect a person's health status; the data collected on smoking, drinking, and exercise habits are shown in Table 2. As noted above, diseases may influence one another; the present invention therefore includes hypertension, diabetes, hyperlipidemia, cancer, panic, anger, and depression as features. Aspartate aminotransferase (GOT), alanine aminotransferase (GPT), and blood glucose are health risk factors. Low-density lipoprotein (LDL), sometimes called "bad" cholesterol, and high-density lipoprotein (HDL), or "good" cholesterol, are important cholesterol parameters. The goal is to predict the occurrence of CVD from BP data. Because BP data are complex, an approach based on SBP and DBP data is used instead. Some common CVD-related diseases are also analyzed. Missing data are imputed by age-based regression. Patients with and without CVD are labeled 1 and 0, respectively; likewise, patients with and without HT, DM, HL, and cancer are labeled 1 and 0, respectively. For numerical data, the original values are used, and feature scaling is applied to standardize the influence of each feature. For categorical data, one-hot encoding converts the features into a trainable form. Tables 3 and 4 list the continuous variables and binary variables used in the present invention. In addition, the present invention performs feature selection using recursive feature elimination (RFE) based on logistic regression and principal component analysis (PCA).
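The two preprocessing steps named above, feature scaling and one-hot encoding, might look like the following sketch. The text does not say which scaling method was used; z-score standardization is assumed here. The category names and values are hypothetical, since Table 2 is not reproduced in this section.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score scaling: subtract the mean and divide by the population SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s else 0.0 for v in values]


def one_hot(values):
    """Map each categorical value to a 0/1 vector with one slot per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


# Illustrative inputs (hypothetical values and category names).
ages = [40, 50, 60]
smoking = ["never", "current", "former", "never"]

scaled_ages = standardize(ages)
categories, smoking_encoded = one_hot(smoking)
```

After scaling, every numeric feature has mean 0 and unit variance, so features measured on large scales (for example cholesterol in mg/dL) do not dominate distance-based or gradient-based models.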
Table 2. Categorical data on smoking, drinking, and exercise habits
Table 3. Continuous variables
Table 4. Binary variables
Some important features are defined as follows: the mean of the morning and evening BP values (MEave), and the SD, CV, ARV, and variability independent of the mean (VIM) of the MEave for home systolic and diastolic BP. Although BP data were included, features related to CVD were still selected to build the model. TCHC includes systolic and diastolic BP for each patient, measured four times daily for 7 days. Interpolation was used to reconstruct missing data. This approach identifies BP characteristics, variability, and trends accurately.
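As a rough illustration of the interpolation and MEave steps, the sketch below fills missing readings and then averages the morning and evening values for each day. The text does not state which interpolation scheme was used; linear interpolation is an assumption, as are the function name `interpolate_missing`, the per-day averaging, and the toy readings.

```python
def interpolate_missing(series):
    """Fill None gaps linearly; gaps at either end copy the nearest known value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


# Illustrative 7-day home SBP readings with gaps (not patient data).
morning = interpolate_missing([128, None, 132, 130, None, None, 136])
evening = interpolate_missing([124, 126, None, 128, 130, 132, 134])

# Per-day morning-evening average.
meave = [(m + e) / 2 for m, e in zip(morning, evening)]
```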
Experiments showed that most BP-related parameters do not affect model accuracy. Therefore, for the results in Tables 5, 6, and 7, only the MEave of SBP is used as the BP predictor, which reduces the training load of the model.
3. Model
Based on previous experiments, the present invention uses the three models with the highest accuracy in predicting the occurrence of CVD. Predictions of short-term CVD occurrence give very good results. For long-term CVD occurrence, the accuracy drops to about 0.85; the area under the curve (AUC), however, remains high in most cases, close to 0.9.
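Accuracy and AUC measure different things, which is why one can fall while the other stays high. Accuracy depends on a decision threshold. AUC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The sketch below computes both from scratch; the 0.5 threshold and the toy scores are assumptions for illustration and are not results from the patent.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of cases classified correctly at the given threshold."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_score):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    # A positive ranked above a negative counts 1; a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (illustrative only).
y_true = [1, 0, 1, 0, 0]
y_score = [0.9, 0.4, 0.45, 0.6, 0.1]

acc = accuracy(y_true, y_score)
auc = roc_auc(y_true, y_score)
```

In this toy case accuracy is 0.6 while AUC is about 0.83: the ranking of patients is mostly right even though the fixed threshold misclassifies two of them.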
4. CVD prediction with temperature and air pollution
Tables 5, 6, and 7 show the results obtained after adding air temperature and air pollution to the above models. Temperature is computed as the average temperature over the 7 days of BP measurement, and air pollution as the average PM2.5 over the same 7 days. Table 6 shows that adding air temperature or air pollution slightly improves model accuracy.
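The 7-day environmental averages described above amount to a simple windowed mean over daily values. In the sketch below, the function name `seven_day_mean`, the dictionary layout keyed by date, and the PM2.5 values are all assumptions made for illustration.

```python
from datetime import date, timedelta
from statistics import mean


def seven_day_mean(daily_values, start_day, days=7):
    """Average the daily values over the BP measurement window.

    Days with no recorded value are skipped; returns None if none are present.
    """
    window = [start_day + timedelta(days=k) for k in range(days)]
    observed = [daily_values[d] for d in window if d in daily_values]
    return mean(observed) if observed else None


# Illustrative daily PM2.5 values for one measurement week (not real data).
start = date(2023, 1, 1)
pm25 = {start + timedelta(days=k): v
        for k, v in enumerate([12, 15, 18, 20, 14, 16, 17])}

pm25_feature = seven_day_mean(pm25, start)
```

The same function can be applied to a daily temperature series to obtain the 7-day temperature feature.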
For temperature, only the temperature on the day the cardiovascular disease occurred was used, because the effect of temperature is immediate.
Table 5. Model comparison for short-term CVD occurrence using temperature only
Table 5 shows that predicting the occurrence of cardiovascular disease from temperature alone reaches an accuracy above 0.88, which is already a good predictive model. More features can nevertheless be added to improve accuracy.
The features to be added cover basic vital signs, BP features, and air pollution, which are not only short-term factors but also common causes of cardiovascular disease in the short to medium term. The present invention therefore expanded the dataset to include patients who experienced cardiovascular disease within 1, 3, 6, and 9 months.
Table 6 shows that adding the other features improves model accuracy markedly: accuracy reaches as high as 0.99 and averages above 0.95. This can be regarded as a very accurate short- to medium-term prediction model for cardiovascular disease.
Table 6. Model comparison for short-term CVD occurrence using temperature and air pollution
5. Long-term model
The short- to medium-term models showed good results. Next, the present invention attempts long-term prediction of cardiovascular disease. The time horizons are set for patients who experienced cardiovascular disease within 1, 2, 3, and 4 years. Because there are more long-term patients and more variables to consider, accuracy is expected to decrease.
As shown in Table 7, the prediction accuracy is above 0.9 at 1, 2, and 3 years and about 0.88 at 4 years. Although slightly lower than that of the short- to medium-term models, the accuracy is still good.
Table 7. Model comparison for long-term CVD occurrence
The results of the present invention were also validated by screening categorical variables, such as the relationships of body size, exercise, alcohol consumption, and medication with CVD. Moreover, traditional regression models such as LR do not easily achieve high accuracy, whereas machine learning models are well suited to binary classification of biological outcomes. The experiments identified several models suitable for predicting CVD risk, such as RF, DTB, and XGBoost. The purpose of the present invention is to identify CVD risk. However, because of the limited size of the database, BP records cover only 7 days; extending the records to 4-6 weeks may produce more accurate results.
In summary, participants were divided by the time of CVD occurrence into a short-term group (within 1/3/6/9 months) and a long-term group (within 1/2/3/4 years). The input factors differ according to the time period.
For short-term models: a combination of basic physical characteristics, BP data, and temperature is used to predict short-term CVD occurrence.
For long-term models: basic physical characteristics are used to predict long-term CVD occurrence.
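One way to turn the short-term and long-term groupings into training targets is to derive one binary label per horizon from the time between baseline and the CVD event. The sketch below is an assumption about how such labels could be built. The day counts used for months and years, the dictionary names, and the function name `horizon_labels` are hypothetical and do not come from the patent.

```python
# Approximate horizon lengths in days (assumed conversions).
SHORT_TERM_DAYS = {"1m": 30, "3m": 91, "6m": 182, "9m": 274}
LONG_TERM_DAYS = {"1y": 365, "2y": 730, "3y": 1095, "4y": 1460}


def horizon_labels(days_to_event, horizons):
    """Label 1 if a CVD event occurred within the horizon, else 0.

    days_to_event is None for patients with no recorded CVD event.
    """
    return {name: int(days_to_event is not None and days_to_event <= limit)
            for name, limit in horizons.items()}


# A hypothetical patient whose CVD event occurred 100 days after baseline.
short_labels = horizon_labels(100, SHORT_TERM_DAYS)
long_labels = horizon_labels(100, LONG_TERM_DAYS)
no_event_labels = horizon_labels(None, SHORT_TERM_DAYS)
```

Each horizon then gets its own binary classification task, which matches the per-horizon results reported in the tables.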
The model constructed by the present invention can be applied to the prediction of short-term acute cardiovascular disease and can also serve as a prediction model for long-term follow-up and preventive medicine.
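In addition, the present invention further provides a health care system for estimating a patient's cardiovascular disease risk. In Figure 2, the health care system 10 comprises: (a) a patient monitoring module 101 for collecting patient data generated by real-time monitoring of the patient; (b) a database 102 for collecting health data of the patient, wherein the health data comprise demographic data, personal habit data, disease data, treatment data, blood analysis data, and blood pressure data; and (c) an integration module 103 for receiving the patient data and the health data, analyzing the patient data and the health data using the above prediction model for estimating CVD risk, and outputting prediction results of the patient's CVD risk at different time points based on the analysis by the prediction model. In the health care system, the patient monitoring module is a remote patient monitoring module.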
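The three-part structure of the health care system (monitoring module 101, database 102, integration module 103) can be pictured with the sketch below. All class names, method names, field names, and patient values are hypothetical. The `placeholder_model` function is a stand-in that returns a constant; it is not the patent's prediction model and does not produce a real risk estimate.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class PatientMonitoringModule:
    """Module 101: holds data produced by real-time (remote) monitoring."""
    live_data: Dict[str, dict] = field(default_factory=dict)

    def collect(self, patient_id: str) -> dict:
        return dict(self.live_data.get(patient_id, {}))


@dataclass
class HealthDatabase:
    """Database 102: demographic, habit, disease, treatment, blood, BP data."""
    records: Dict[str, dict] = field(default_factory=dict)

    def fetch(self, patient_id: str) -> dict:
        return dict(self.records.get(patient_id, {}))


@dataclass
class IntegrationModule:
    """Module 103: merges both sources and runs one model per time horizon."""
    monitor: PatientMonitoringModule
    database: HealthDatabase
    models: Dict[str, Callable[[dict], float]]

    def gather(self, patient_id: str) -> dict:
        features = self.database.fetch(patient_id)
        features.update(self.monitor.collect(patient_id))
        return features

    def predict(self, patient_id: str) -> Dict[str, float]:
        features = self.gather(patient_id)
        return {horizon: model(features)
                for horizon, model in self.models.items()}


def placeholder_model(features: dict) -> float:
    """Stand-in only: returns a constant, NOT a real risk estimate."""
    return 0.5


monitor = PatientMonitoringModule({"p001": {"sbp_meave": 132.0}})
database = HealthDatabase({"p001": {"age": 63, "bmi": 26.4}})
system = IntegrationModule(
    monitor, database,
    {"3_months": placeholder_model, "1_year": placeholder_model},
)
risk_by_horizon = system.predict("p001")
```

Keeping one callable per horizon mirrors the description's output of risk "at different time points", and lets a trained classifier replace the placeholder without changing the surrounding modules.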
Those skilled in the art will understand the above summary as a description of the method used to convey the information of the filed application. Those skilled in the art will recognize that it is merely illustrative and that many equivalents are possible.
Claims (12)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW112113645A TWI854592B (en) | 2023-04-12 | 2023-04-12 | Methods and systems for predicting risk of cardiovascular disease |
Publications (2)
Publication Number | Publication Date |
---|---|
TWI854592B true TWI854592B (en) | 2024-09-01 |
TW202443591A TW202443591A (en) | 2024-11-01 |
Family
ID=93648841
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW112113645A TWI854592B (en) | 2023-04-12 | 2023-04-12 | Methods and systems for predicting risk of cardiovascular disease |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI854592B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10483006B2 (en) * | 2017-05-19 | 2019-11-19 | Siemens Healthcare Gmbh | Learning based methods for personalized assessment, long-term prediction and management of atherosclerosis |
CN111553478A (en) * | 2020-05-06 | 2020-08-18 | 西安电子科技大学 | Cardiovascular disease prediction system and method for the elderly in community based on big data |
CN115831374A (en) * | 2022-12-05 | 2023-03-21 | 南通先进通信技术研究院有限公司 | Method for constructing cardiovascular disease diagnosis model |
-
2023
- 2023-04-12 TW TW112113645A patent/TWI854592B/en active
Non-Patent Citations (1)
Title |
---|
Journal article: Joung Ouk (Ryan) Kim, Yong-Suk Jeong, Jin Ho Kim, Jong-Weon Lee, Dougho Park and Hyoung-Seop Kim, "Machine Learning-Based Cardiovascular Disease Prediction Model: A Cohort Study on the Korean National Health Insurance Service Health Screening Database", Diagnostics 11(6), 943, MDPI, 2021-05-25, https://www.mdpi.com/2075-4418/11/6/943 * |
Also Published As
Publication number | Publication date |
---|---|
TW202443591A (en) | 2024-11-01 |