CN111627553A

CN111627553A - Method for constructing individualized prediction model of first-onset schizophrenia

Info

Publication number: CN111627553A
Application number: CN202010454990.1A
Authority: CN
Inventors: 张程程; 李涛
Original assignee: West China Hospital of Sichuan University
Current assignee: West China Hospital of Sichuan University
Priority date: 2020-05-26
Filing date: 2020-05-26
Publication date: 2020-09-04

Abstract

The invention belongs to the fields of psychiatry, neuroimaging and artificial intelligence, and discloses a method for constructing an individualized prediction model of first-episode schizophrenia, which addresses the low auxiliary-diagnosis accuracy of existing schizophrenia (SCH) brain structural network models. The method comprises the following steps: A. obtaining diffusion tensor images of first-episode schizophrenia patients; B. preprocessing the obtained diffusion tensor images; C. constructing a sparse brain structural network from the preprocessed images; D. using the similarity network fusion method to construct, for each subject, a multi-threshold fused brain structural network from the sparsified networks; E. extracting topological attribute features of the multi-threshold fused brain structural network and then performing feature screening; F. training a classifier on the screened features to obtain an individualized prediction model of first-episode schizophrenia; G. verifying and evaluating the performance of the trained individualized prediction model.

Description

Construction method of individualized prediction model of first-episode schizophrenia

Technical field

The invention belongs to the fields of psychiatry, neuroimaging and artificial intelligence, and in particular relates to a method for constructing an individualized prediction model of first-episode schizophrenia.

Background

Schizophrenia (SCH) is a highly disabling and fatal mental disorder. The World Health Organization ranks it among the top ten diseases in the global burden of disease. However, its brain mechanisms have not been fully clarified, diagnosis lacks objective criteria, and the cure rate is low. Finding an objective, effective, convenient and feasible biological marker for early individualized classification, diagnosis and treatment of SCH has become an urgent clinical problem. Changes in the brain structural network are an important biological basis of the neuroanatomical abnormalities in SCH. As data-driven prediction and analysis tools, machine learning methods can make full use of the intrinsic structural information in biomarker data to build individualized brain structural network models of SCH.

The current state of research on SCH brain structural network models is as follows: 1) structural connection values are extracted directly from the original network as features, which includes spurious connections with low weights; 2) methods that sparsify the network with a single fixed threshold are affected by different levels of noise, and the choice of a single threshold is subjective; 3) the original structural connection values are used directly as low-level structural features, ignoring important network properties of the brain's complex topology.

Therefore, with existing SCH brain structural network models it is very difficult to find sensitive biomarkers of schizophrenia from brain structural network features, and the auxiliary-diagnosis accuracy of these models is low.

Summary of the invention

The technical problem to be solved by the present invention is to provide a method for constructing an individualized prediction model of first-episode schizophrenia, so as to solve the low auxiliary-diagnosis accuracy of existing SCH brain structural network models.

The technical solution adopted by the present invention to solve the above technical problem is as follows:

A method for constructing an individualized prediction model of first-episode schizophrenia, comprising the following steps:

A. obtaining diffusion tensor images of first-episode schizophrenia patients;

B. preprocessing the acquired diffusion tensor images;

C. constructing a sparse brain structural network from the preprocessed images;

D. using the similarity network fusion method to construct each subject's multi-threshold fused brain structural network from the sparsified networks;

E. extracting topological attribute features of the multi-threshold fused brain structural network, and then performing feature screening;

F. training a classifier on the screened features to obtain the individualized prediction model of first-episode schizophrenia;

G. verifying and evaluating the performance of the trained individualized prediction model of first-episode schizophrenia.

As a further optimization, in step A, the diffusion tensor images of first-episode schizophrenia patients are acquired on a magnetic resonance imaging (MRI) scanner using single-shot echo-planar imaging (EPI).

As a further optimization, in step B, preprocessing the acquired diffusion tensor images specifically includes:

B1. using MRIConvert to convert the diffusion tensor images from DICOM format to NIfTI format;

B2. performing eddy current correction and head motion correction on the converted diffusion tensor images;

B3. applying the FSL Brain Extraction Tool (BET) to strip the skull and remove non-brain tissue from the images.
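Steps B2 and B3 can be sketched as a list of FSL command lines. This is a minimal illustration, not the patented pipeline: it assumes the legacy FSL tools `eddy_correct`, `fslroi` and `bet` are used, that volume 0 is the b0 reference, and that skull stripping is run on the extracted b0 volume. The file names and the threshold value 0.3 are hypothetical.

```python
from pathlib import Path


def preprocessing_commands(dwi, out_dir, ref_volume=0, bet_f=0.3):
    """Build FSL command lines for steps B2 and B3; nothing is executed here."""
    out_dir = Path(out_dir)
    corrected = out_dir / "dwi_ecc.nii.gz"   # hypothetical output names
    b0 = out_dir / "b0.nii.gz"
    brain = out_dir / "b0_brain.nii.gz"
    return [
        # B2: eddy current and head motion correction against the reference volume
        ["eddy_correct", str(dwi), str(corrected), str(ref_volume)],
        # Extract the reference (b0) volume: start index, number of volumes
        ["fslroi", str(corrected), str(b0), str(ref_volume), "1"],
        # B3: skull stripping; -f is the fractional intensity threshold,
        # -m additionally writes a binary brain mask
        ["bet", str(b0), str(brain), "-f", str(bet_f), "-m"],
    ]
```

On a machine with FSL installed, each returned list could be passed to `subprocess.run(cmd, check=True)`.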

As a further optimization, in step C, constructing the sparse brain structural network from the preprocessed images specifically includes:

C1. in diffusion tensor space, registering each subject's preprocessed brain images to the b0 image using linear registration with rotation and translation; then registering the b0 image to the T1 image in standard MNI space; inverting the resulting transformation matrix and using the inverse matrix to transform the AAL template from MNI space into diffusion tensor space, which yields for each subject 90 brain-region network nodes defined by the AAL template;

C2. using probabilistic tractography: performing Bayesian estimation of the diffusion parameters by sampling with the BEDPOSTX tool, where Markov chain Monte Carlo (MCMC) sampling builds a distribution of the diffusion parameters at each voxel; a crossing-fiber model is assumed at every brain voxel, and the number of crossing fiber bundles is determined automatically;

C3. reconstructing fiber bundles by probabilistic tracking with the PROBTRACKX tool: the distribution of the principal diffusion direction at each voxel is sampled repeatedly, a streamline is generated from each set of local samples, and over many samples a histogram of the posterior distribution of streamline locations is built, which gives the distribution of structural connection probabilities between each pair of brain regions; the weight of each edge is defined as the fiber-bundle connection probability between the two node regions, so that each subject obtains a symmetric 90×90 weighted network matrix of fiber-bundle connection probabilities;

C4. setting a threshold on the fiber-bundle connection probability, such that two brain regions whose connection probability exceeds the threshold are considered structurally connected; the effect of different sparsity thresholds on the fusion result is tested, and a relatively narrow threshold range (5% to 40%, in steps of 1%) is used to construct the sparse structural networks.
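The multi-threshold sparsification in step C4 can be sketched as follows. This is an illustration under one assumption: a sparsity of s is taken to mean that the strongest fraction s of all possible undirected edges is kept, which is a common convention but is not spelled out in the text. The function names are hypothetical.

```python
def sparsify(matrix, sparsity):
    """Keep the strongest `sparsity` fraction of undirected edges; zero the rest."""
    n = len(matrix)
    # Rank every undirected edge (upper triangle) by its weight, strongest first.
    edges = sorted(
        ((matrix[u][v], u, v) for u in range(n) for v in range(u + 1, n)),
        reverse=True,
    )
    keep = int(round(sparsity * len(edges)))
    sparse = [[0.0] * n for _ in range(n)]
    for weight, u, v in edges[:keep]:
        sparse[u][v] = sparse[v][u] = weight
    return sparse


# Sparsity levels from 5% to 40% in steps of 1%, as stated in step C4.
SPARSITY_LEVELS = [s / 100 for s in range(5, 41)]


def multi_threshold_networks(matrix):
    """Return one sparsified copy of the network per sparsity level."""
    return [sparsify(matrix, s) for s in SPARSITY_LEVELS]
```

For a 90×90 connection-probability matrix this yields 36 sparse networks per subject, which are the inputs to the fusion in step D.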

As a further optimization, in step D, using the similarity network fusion method to construct each subject's multi-threshold fused brain structural network from the sparsified networks specifically includes:

D1. defining the sparsified structural connection matrix as the full kernel matrix W_i^j; for the full kernel matrix W_i^j of the i-th subject at the j-th threshold, a corresponding sparse kernel matrix is further constructed:

Let δ_u be the k nearest neighbors of node u in the full kernel matrix W_i^j (including node u itself); the sparse kernel matrix S_i^j is then defined as:

D2. iteratively updating the full kernel matrix using its corresponding sparse kernel matrix:

where (W_i^c)^(m) denotes the full kernel matrix of the i-th subject at the c-th threshold at the m-th iteration, (W_i^j)^(m+1) denotes the full kernel matrix at the (m+1)-th iteration, and N is the total number of sparsity thresholds;

D3. checking whether the iteration convergence condition is satisfied; if so, going to step D4, otherwise continuing to iterate;

where the convergence condition is: ||(W_i^j)^(m+1) − (W_i^j)^(m)|| ≤ 0.01;

D4. averaging the updated full kernel matrices W_i^j over the N thresholds to construct an average full kernel matrix for each subject:

D5. normalizing the elements of W_i to the interval [0, 1] to generate the final fused network for each subject.
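The equations referred to in steps D1, D2 and D4 do not appear in this text. As a hedged reconstruction, assuming the steps follow the standard similarity network fusion formulation applied across the N thresholds of one subject, they could be written as below. The summation index p is introduced here for illustration, and the exact normalization in the original may differ.

```latex
% D1: sparse kernel built from the k nearest neighbours \delta_u of node u
S_i^j(u,v) =
\begin{cases}
\dfrac{W_i^j(u,v)}{\sum_{p \in \delta_u} W_i^j(u,p)}, & v \in \delta_u \\[2ex]
0, & \text{otherwise}
\end{cases}

% D2: update of threshold j using the other N-1 threshold networks
\left(W_i^j\right)^{(m+1)} =
S_i^j \times \left( \frac{1}{N-1} \sum_{c \neq j} \left(W_i^c\right)^{(m)} \right) \times \left(S_i^j\right)^{T}

% D4: average of the converged full kernel matrices over the N thresholds
W_i = \frac{1}{N} \sum_{j=1}^{N} W_i^j
```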

As a further optimization, in step E, extracting the topological attribute features of the multi-threshold fused brain structural network specifically includes:

Using graph-theoretical analysis, the area under the curve (AUC) over all thresholds is computed for 8 global topological attributes and 3 nodal topological attributes of the fused network, and these AUC values serve as the initial features for subsequent classification;
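A minimal sketch of the AUC feature, assuming it is the area under the curve of a metric plotted against the threshold, integrated with the trapezoidal rule. That is the usual convention in graph-theoretical brain network analysis, but the integration rule is not stated in the text, and the function name is hypothetical.

```python
def metric_auc(thresholds, values):
    """Trapezoidal area under a metric-versus-threshold curve."""
    if len(thresholds) != len(values) or len(thresholds) < 2:
        raise ValueError("need at least two matching (threshold, value) pairs")
    area = 0.0
    for k in range(1, len(thresholds)):
        width = thresholds[k] - thresholds[k - 1]
        # Area of one trapezoid between two neighbouring thresholds
        area += width * (values[k] + values[k - 1]) / 2.0
    return area
```

Each global attribute then contributes one scalar feature per subject, and each nodal attribute contributes one scalar per brain region.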

The 8 global topological attributes are the network strength S_p, global efficiency E_glob, local efficiency E_loc, shortest path length L_p, clustering coefficient C_p, normalized shortest path length λ, normalized clustering coefficient γ, and small-world property σ:

The network strength S_p is calculated as:

where S(i) is the sum of the weights of the edges connected to the i-th node, and N is the number of brain regions in the whole-brain network;

The shortest path length L_p is calculated as:

where L_ij is the shortest path length between node i and node j, and L_p is the shortest path length of the whole network G;

The global efficiency E_glob is calculated as:

where E_{glob_i}(G) is the global efficiency of node i, and the global efficiency of network G is the average of the global efficiencies of all nodes in the network;

The local efficiency E_loc is calculated as:

where L_jk is the shortest path length between region j and region k, G_i is the sub-network formed by the nodes connected to region i, N_Gi is the number of brain regions in sub-network G_i, E_{loc_i}(G) is the local efficiency of node i, and the local efficiency E_loc(G) of network G is the average of the local efficiencies of all nodes in the network;

The clustering coefficient C_p is calculated as:

where C(i) is the clustering coefficient of node i, and the clustering coefficient of network G is the average of the clustering coefficients of all nodes;

The normalized clustering coefficient γ and the normalized shortest path length λ are calculated as follows:

where C_p^rand and L_p^rand are the averages of the clustering coefficient and the shortest path length, respectively, over 100 random networks;

The small-world property σ is calculated as:

σ = γ/λ;

The 3 nodal topological attributes are the nodal degree D_nodal(i), nodal efficiency E_nodal(i), and nodal betweenness centrality B_nodal(i), defined respectively as follows:

where e_st is the number of all shortest paths from node s to node t in network G, and e_sit is the number of those shortest paths that pass through node i.
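The equations themselves are not reproduced in this text. The block below gives the standard graph-theoretical definitions that match the symbol descriptions above, offered as a hedged reconstruction rather than the original formulas. The symbols w_ij (edge weight), k_i (number of neighbours of node i) and e_i (number of edges among those neighbours) are introduced here for illustration, and normalization constants may differ from the original.

```latex
S_p = \frac{1}{N} \sum_{i \in G} S(i)
\qquad
L_p = \frac{1}{N(N-1)} \sum_{i \neq j \in G} L_{ij}

E_{glob_i}(G) = \frac{1}{N-1} \sum_{j \neq i \in G} \frac{1}{L_{ij}}
\qquad
E_{glob}(G) = \frac{1}{N} \sum_{i \in G} E_{glob_i}(G)

E_{loc_i}(G) = \frac{1}{N_{G_i}(N_{G_i}-1)} \sum_{j \neq k \in G_i} \frac{1}{L_{jk}}
\qquad
E_{loc}(G) = \frac{1}{N} \sum_{i \in G} E_{loc_i}(G)

C(i) = \frac{2 e_i}{k_i (k_i - 1)}
\qquad
C_p = \frac{1}{N} \sum_{i \in G} C(i)

\gamma = \frac{C_p}{C_p^{rand}}
\qquad
\lambda = \frac{L_p}{L_p^{rand}}
\qquad
\sigma = \frac{\gamma}{\lambda}

D_{nodal}(i) = \sum_{j \neq i \in G} w_{ij}
\qquad
E_{nodal}(i) = \frac{1}{N-1} \sum_{j \neq i \in G} \frac{1}{L_{ij}}
\qquad
B_{nodal}(i) = \sum_{s \neq i \neq t \in G} \frac{e_{sit}}{e_{st}}
```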

As a further optimization, in step E, the feature screening specifically includes:

The recursive feature elimination (RFE) algorithm based on a support vector machine performs feature selection by repeatedly training the classifier and removing the feature dimensions with small weights, specifically:

① initialize the selected feature set to contain all candidate features;

② take the feature set as input and train the classifier to obtain the classification performance and the weight of each feature;

③ remove the feature with the smallest weight to form a new feature set;

④ repeat ② and ③ until […], and select the case with the best classification performance.
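The loop in steps ① to ④ can be sketched generically. Two assumptions are made: the stopping condition, which is not legible in this text, is taken to be "no features remain", and the classifier is abstracted as a hypothetical callback `train` that returns a score and one weight per feature (the role played by the SVM in the patent).

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callback: trains a classifier on the given features and returns
# (classification score, weight per feature).
TrainFn = Callable[[Sequence[str]], Tuple[float, Dict[str, float]]]


def recursive_feature_elimination(
    features: Sequence[str], train: TrainFn
) -> Tuple[List[str], float]:
    current = list(features)                 # step 1: start from all features
    best_set, best_score = list(current), float("-inf")
    while current:                           # assumed stop: feature set is empty
        score, weights = train(current)      # step 2: train, get score and weights
        if score > best_score:
            best_set, best_score = list(current), score
        # step 3: drop the feature with the smallest absolute weight
        weakest = min(current, key=lambda f: abs(weights[f]))
        current.remove(weakest)
    return best_set, best_score              # step 4: best-scoring subset seen
```

Tracking the best score across all iterations, rather than returning the last surviving features, corresponds to "select the case with the best classification performance".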

As a further optimization, in step F, the classifier is an SVM classifier with a radial basis function (RBF) kernel, a logistic regression classifier, or an ensemble of multiple classifiers; during classification training, multiple pattern recognition classification methods are applied to find the optimal classifier model and to screen the key features of the multi-threshold fused brain structural network.

As a further optimization, in step G, cross-validation is used to evaluate the performance of multiple prediction models on the schizophrenia training and test sets, including accuracy, sensitivity, specificity, the ROC curve and the AUC value; by testing feature screening and classification algorithms and applying permutation tests, the system optimizes brain imaging feature selection and the individual subtype prediction model and identifies subtype-related core brain imaging features, so as to improve the accuracy of individualized prediction.
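A minimal sketch of the evaluation metrics named in step G, with a simplified permutation test. The simplification is an assumption of this sketch: labels are shuffled against fixed predictions, whereas a full permutation test would retrain the model on each shuffled label set. The function names are hypothetical.

```python
import random


def confusion_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Share of label shufflings whose accuracy reaches the observed accuracy."""
    rng = random.Random(seed)
    observed = confusion_metrics(y_true, y_pred)["accuracy"]
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if confusion_metrics(shuffled, y_pred)["accuracy"] >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_permutations + 1)
```

In a cross-validated setting these metrics would be computed on the held-out fold of each split and then averaged.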

The beneficial effects of the present invention are:

(1) A multi-threshold fused network is constructed. By integrating the complementary information that the original network provides under different topological views, the topological attribute features of a fused network that does not depend on any single threshold are generated as the initial features for classification, and classification training is performed after feature screening. This approach improves classification accuracy while taking the components of the structural indicators into account, and is therefore interpretable;

(2) The artificial intelligence model, built on fused multi-threshold brain structural network data, can automatically collect data to update the model and establish objective biological markers for the early diagnosis of schizophrenia. The model shows very high reliability and high stability in the classification accuracy for first-episode schizophrenia patients.

Description of drawings

FIG. 1 is a flowchart of the method for constructing the individualized prediction model of first-episode schizophrenia according to the present invention.

Detailed description

The present invention aims to provide a method for constructing an individualized prediction model of first-episode schizophrenia, so as to solve the low auxiliary-diagnosis accuracy of existing SCH brain structural network models. The core idea is: acquire diffusion tensor images of first-episode schizophrenia patients by single-shot echo-planar imaging; preprocess the diffusion tensor images; construct a sparse brain structural network from the preprocessed images; use the similarity network fusion method to construct each subject's multi-threshold structural network from the sparsified networks; extract the topological attribute features of the fused multi-threshold brain structural network, perform classification training after feature screening to obtain the individualized prediction model of first-episode schizophrenia, and finally verify and evaluate the performance of the trained model.

Networks obtained at different sparsity thresholds can be regarded as different feature representations of the same subject's brain network. By fusing the networks from multiple sparsity thresholds, the present invention obtains richer topological information, which benefits the subsequent classification and also avoids both the varying levels of noise that affect methods sparsifying the network with a single fixed threshold and the subjectivity of choosing a single threshold. Moreover, the network topological attributes obtained by graph-theoretical analysis reflect high-level properties of the network; using these attributes as classification features yields better classification results than using the original network connection values, which reflect only low-level information.

In a specific implementation, the flow of the method for constructing the individualized prediction model of first-episode schizophrenia is shown in FIG. 1 and includes the following steps:

1. Obtain diffusion tensor images of first-episode schizophrenia patients;

In this step, Philips 3.0T and GE 3.0T MRI scanners are used to collect data as the training-module test set and the test-module data set. As a specific implementation, the scan parameters are as follows: 32 axial-plane directions, TR = 10295 ms, TE = 91 ms, FOV = 128 × 128 mm², flip angle = 90°, slice thickness = 4 mm, matrix = 256 × 256, voxel size = 2 × 2 × 2 mm³, b = 1000 s/mm². 3D T1 structural images are acquired to improve DTI registration, with the following scan parameters: TR = 8.4 ms, TE = 3.8 ms, FOV = 256 × 256 mm², flip angle = 90°, slice thickness = 1 mm, continuous scanning with no gap, matrix = 256 × 256, voxel size = 1 × 1 × 1 mm³, and 188 slices covering the whole brain.

2. Preprocess the acquired diffusion tensor images;

In this step, as a specific implementation, the preprocessing is as follows:

① use MRIConvert to convert the diffusion tensor images from DICOM format to NIfTI format;

② perform eddy current correction and head motion correction on the converted diffusion tensor images;

③ apply the FSL Brain Extraction Tool (BET) to strip the skull and remove non-brain tissue from the images.

3. Construct a sparse brain structural network from the preprocessed images;

In this step, as a specific implementation, the sparse brain structural network is constructed as follows:

① In diffusion tensor space, register each subject's preprocessed brain images to the b0 image using linear registration with rotation and translation; then register the b0 image to the T1 image in standard MNI space; invert the resulting transformation matrix and use the inverse matrix to transform the AAL template from MNI space into diffusion tensor space, which yields for each subject 90 brain-region network nodes defined by the AAL template;

② Use probabilistic tractography: perform Bayesian estimation of the diffusion parameters by sampling with the BEDPOSTX tool, where Markov chain Monte Carlo sampling builds a distribution of the diffusion parameters at each voxel; a crossing-fiber model is assumed at every brain voxel, and the number of crossing fiber bundles is determined automatically;

③ Reconstruct fiber bundles by probabilistic tracking with the PROBTRACKX tool: the distribution of the principal diffusion direction at each voxel is sampled repeatedly, a streamline is generated from each set of local samples, and over many samples a histogram of the posterior distribution of streamline locations is built, which gives the distribution of structural connection probabilities between each pair of brain regions; the weight of each edge is defined as the fiber-bundle connection probability between the two node regions, so that each subject obtains a symmetric 90×90 weighted network matrix of fiber-bundle connection probabilities;

④ Set a threshold on the fiber-bundle connection probability, such that two brain regions whose connection probability exceeds the threshold are considered structurally connected; the effect of different sparsity thresholds on the fusion result is tested, and a relatively narrow threshold range (5% to 40%, in steps of 1%) is used to construct the sparse structural networks.

4. Use the similarity network fusion method to construct each subject's multi-threshold fused brain structural network from the sparsified networks;

In this step, as a specific implementation, the multi-threshold fused brain structural network is constructed as follows:

① The sparsified structural connection matrix is defined as the full kernel matrix W_i^j. For the full kernel matrix W_i^j of the i-th subject at the j-th threshold, a corresponding sparse kernel matrix is further constructed; the sparse kernel matrix encodes the strong connections that are retained after the network is sparsified:

② Iteratively update the full kernel matrix using its corresponding sparse kernel matrix:

By interacting with all threshold networks other than its own, the full kernel matrix W_i^j integrates the complementary information that the original network provides under other topological views. At the same time, the sparse kernel matrix S_i^j guides the iterative process through the strongest connections in the corresponding full kernel matrix W_i^j, and therefore effectively suppresses noise. Viewed as matrix multiplication, the above formula means that the connection value between any two nodes in the full kernel matrix W_i^j also depends on the k nearest neighbors of the corresponding nodes in the other threshold networks. In particular, if the respective k nearest neighbors of two nodes are strongly connected in the networks at the other thresholds, the connection between the two nodes is strengthened after the iterative update (even though it may itself be weak), and vice versa.

③ Check whether the iteration convergence condition is satisfied; if so, go to step ④, otherwise continue iterating;

④ Average the updated full kernel matrices W_i^j over the N thresholds to construct an average full kernel matrix for each subject:

⑤ Normalize the elements of W_i to the interval [0, 1] to generate the final fused network for each subject.
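Steps ① to ⑤ can be sketched in plain Python. This is an illustration under several assumptions not fixed by the text: the sparse kernel is the row-normalized k-nearest-neighbour kernel of standard similarity network fusion and is computed once from the initial matrices, the update averages the other N-1 threshold networks, the norm in the convergence test is the Frobenius norm, and the final normalization is min-max scaling. All function names and the defaults for `k` and `max_iter` are hypothetical.

```python
import math


def knn_sparse_kernel(W, k):
    """Row-normalize W over node u itself plus its k-1 strongest neighbours."""
    n = len(W)
    S = [[0.0] * n for _ in range(n)]
    for u in range(n):
        others = sorted((v for v in range(n) if v != u),
                        key=lambda v: W[u][v], reverse=True)
        neighbours = [u] + others[: k - 1]
        total = sum(W[u][v] for v in neighbours)
        if total > 0:
            for v in neighbours:
                S[u][v] = W[u][v] / total
    return S


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def frobenius_diff(A, B):
    return math.sqrt(sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)))


def fuse_thresholds(networks, k=3, tol=0.01, max_iter=50):
    """Fuse N >= 2 thresholded networks of one subject into one matrix in [0, 1]."""
    N, n = len(networks), len(networks[0])
    kernels = [knn_sparse_kernel(W, k) for W in networks]
    current = [[row[:] for row in W] for W in networks]
    for _ in range(max_iter):
        updated = []
        for j in range(N):
            # Mean of all threshold networks except the j-th one
            mean_other = [[sum(current[c][u][v] for c in range(N) if c != j) / (N - 1)
                           for v in range(n)] for u in range(n)]
            S_t = [list(col) for col in zip(*kernels[j])]
            updated.append(matmul(matmul(kernels[j], mean_other), S_t))
        change = max(frobenius_diff(updated[j], current[j]) for j in range(N))
        current = updated
        if change <= tol:          # convergence threshold 0.01 from the text
            break
    average = [[sum(W[u][v] for W in current) / N for v in range(n)] for u in range(n)]
    lo = min(min(row) for row in average)
    hi = max(max(row) for row in average)
    span = hi - lo
    return [[(x - lo) / span if span > 0 else 0.0 for x in row] for row in average]
```

For symmetric inputs the fused matrix stays symmetric up to rounding, because each update has the form S·P·Sᵀ with a symmetric P.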

5. Extract the topological attribute features of the multi-threshold fused brain structural network, then perform feature screening;

In this step, as a specific implementation, graph-theoretical analysis is used to compute the area under the curve (AUC) over all thresholds for 8 global topological attributes and 3 nodal topological attributes of the fused network, and these AUC values serve as the initial features for subsequent classification. The 8 global topological attributes are the network strength (node degree) S_p, global efficiency E_glob, local efficiency E_loc, shortest path length L_p, clustering coefficient C_p, normalized shortest path length λ, normalized clustering coefficient γ, and small-world property σ. The specific definitions are as follows:

Network strength (node degree) S_p reflects important evolutionary characteristics of the network. The degree of a node is defined as the sum of the weights of the edges directly connected to it. The larger a node's degree, the more connections it has and the more important its position in the network. The defining formula is:

where S(i) is the sum of the weights of the edges connected to the i-th node, and N is the number of brain regions in the whole-brain network. The average degree of all nodes in a network is the strength of that network.

Shortest path length L_p: the average of the shortest path lengths between all pairs of nodes is the shortest path length of the network, which reflects the operating efficiency of the whole network. Information is transmitted faster along shortest paths, which saves system resources. The defining formula is:

where L_i,j is the shortest path length between node i and node j, and L_p is the shortest path length of the whole network G. As can be seen, the shortest path length can only be computed when the network is connected: if node i cannot reach node j by any path, then L_i,j does not exist or is infinite, and L_p does not exist either.

Global efficiency E_glob describes the efficiency of information transmission within the network. The global efficiency of node i is defined on the basis of the shortest path length by the following formula:

As the above formula shows, the shorter a node's shortest paths, the faster information is transferred between that node and other nodes, that is, the higher the node's global efficiency.

The global efficiency E_glob of network G is defined as the average of the global efficiencies of all nodes in the network.
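The relationship between shortest path length and global efficiency can be sketched as follows. The sketch assumes that the length of an edge is the reciprocal of its weight, a common convention for weighted brain networks that the text does not state, and uses the Floyd-Warshall algorithm; the function names are hypothetical.

```python
import math


def shortest_path_lengths(W):
    """All-pairs shortest path lengths (Floyd-Warshall), edge length = 1 / weight."""
    n = len(W)
    L = [[0.0 if u == v else (1.0 / W[u][v] if W[u][v] > 0 else math.inf)
          for v in range(n)] for u in range(n)]
    for k in range(n):
        for u in range(n):
            for v in range(n):
                if L[u][k] + L[k][v] < L[u][v]:
                    L[u][v] = L[u][k] + L[k][v]
    return L


def shortest_path_length(L):
    """L_p: mean over all ordered node pairs; infinite if the network is disconnected."""
    n = len(L)
    return sum(L[u][v] for u in range(n) for v in range(n) if u != v) / (n * (n - 1))


def global_efficiency(L):
    """E_glob: mean over nodes of the mean inverse shortest path length."""
    n = len(L)
    nodal = [sum(1.0 / L[u][v] for v in range(n) if v != u) / (n - 1) for u in range(n)]
    return sum(nodal) / n
```

Because 1/∞ evaluates to 0, the global efficiency remains finite for a disconnected network, whereas L_p becomes infinite, matching the remark above that L_p does not exist in that case.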

Local efficiency E_loc measures how compact the "cliques" formed by neighboring nodes in the network are, and is also an important indicator of the network's redundancy and its tolerance to external attacks. The local efficiency of node i and the local efficiency of network G are defined by the following formulas:

where L_jk is the shortest path length between region j and region k, G_i is the sub-network formed by the nodes connected to region i, N_Gi is the number of brain regions in sub-network G_i, and N is the number of nodes in the whole-brain network G.

Clustering coefficient C_p is an important indicator of the cliquishness and the degree of local interconnectivity of the network. The clustering coefficient C(i) of node i is defined as the ratio of the number of edges between the "other nodes" directly connected to node i in network G to the maximum possible number of edges between these "other nodes", according to the following formula. The clustering coefficient C_p of network G is defined as the average of the clustering coefficients of all nodes.
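The verbal definition of the clustering coefficient can be sketched directly. The sketch treats any positive weight as an edge (the binary reading of the definition above) and assigns a coefficient of 0 to nodes with fewer than two neighbours; both choices are assumptions, and weighted variants of the coefficient also exist.

```python
def clustering_coefficient(W):
    """Return (per-node C(i), network mean C_p) for an undirected network."""
    n = len(W)
    coefficients = []
    for i in range(n):
        neighbours = [j for j in range(n) if j != i and W[i][j] > 0]
        k = len(neighbours)
        if k < 2:
            coefficients.append(0.0)   # no neighbour pair exists
            continue
        # Count the edges that exist among the neighbours of node i
        links = sum(
            1
            for a in range(k)
            for b in range(a + 1, k)
            if W[neighbours[a]][neighbours[b]] > 0
        )
        coefficients.append(2.0 * links / (k * (k - 1)))
    return coefficients, sum(coefficients) / n
```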

A network is considered to have the small-world property (small-worldness) if it has both a high clustering coefficient and a short shortest path length. To determine quantitatively whether a network has the small-world property, its clustering coefficient and shortest path length are generally compared with the corresponding properties of random networks.

The normalized clustering coefficient γ and the normalized shortest path length λ are calculated according to the following formulas:

where C_p^rand and L_p^rand are, respectively, the mean clustering coefficient and the mean shortest path length of 100 random networks.

The small-world property σ is defined as σ = γ/λ. If γ > 1 and λ ≈ 1, so that σ > 1, the network is judged to have the small-world property.
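
The normalization can be sketched as below, assuming the usual ratios γ = C_p / C_p^rand and λ = L_p / L_p^rand, which match the symbol definitions given above. The function name is hypothetical, and the generation of the random reference networks is not shown; their clustering coefficients and path lengths are simply passed in as lists.

```python
from statistics import mean

def small_world_metrics(c_p, l_p, c_rand_samples, l_rand_samples):
    """Return (gamma, lambda, sigma) for one network.

    c_rand_samples and l_rand_samples hold the clustering coefficients and
    shortest path lengths of the random reference networks (100 in the text).
    """
    c_rand = mean(c_rand_samples)
    l_rand = mean(l_rand_samples)
    gamma = c_p / c_rand
    lam = l_p / l_rand
    sigma = gamma / lam
    return gamma, lam, sigma
```

A network with C_p = 0.5 and L_p = 2.2 compared against random networks averaging 0.25 and 2.0 gives γ = 2 and λ = 1.1, so σ is greater than 1.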

The 3 node topological attributes are the node degree D_nodal(i), the node efficiency E_nodal(i) and the node betweenness centrality B_nodal(i), defined respectively as follows:

Betweenness centrality defines the centrality of a node from the perspective of information flow. In the formula, e_st is the number of all shortest paths from node s to node t in the network G, and e_sit is the number of those shortest paths that pass through node i.
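
A minimal sketch of two of these node attributes on a binary graph follows. Betweenness is computed with Brandes' algorithm, a standard technique chosen here for illustration rather than one named in this text, and it sums e_sit / e_st over ordered pairs (s, t); halving the result gives the unordered-pair convention for undirected networks. Which convention the formula image uses is not recoverable here, and the function names are hypothetical.

```python
from collections import deque

def node_degree(adj, i):
    """Number of edges attached to node i in a binary graph."""
    return len(adj[i])

def betweenness_centrality(adj):
    """B(i) = sum over ordered pairs s != i != t of e_sit / e_st."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths to every node.
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Accumulate pair dependencies from the farthest nodes back towards s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```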

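In this step, the feature screening specifically includes:

① Initialize the selected feature set to contain all candidate features;

② Take the feature set as input, train the classifier, and obtain the classification performance and the weight of each feature;

③ Remove the feature with the smallest weight to form a new feature set;

④ Repeat ② and ③ until

Select the case with the best classification performance.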
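
The elimination loop can be sketched generically as below. Two assumptions are made: the loop runs until the feature set is empty (the stopping condition is not legible in this text), and the classifier is abstracted as a caller-supplied `fit` function that returns a score and per-feature weights. The function name and the `fit` interface are hypothetical; no actual SVM is trained in this sketch.

```python
def recursive_feature_elimination(feature_names, fit):
    """Backward elimination by smallest absolute weight.

    fit(features) must return (score, weights), where weights maps each
    feature name to its weight in the trained classifier.
    """
    current = list(feature_names)
    best_score, best_features = float("-inf"), list(current)
    while current:  # assumed stopping condition: no features left
        score, weights = fit(current)
        if score > best_score:
            best_score, best_features = score, list(current)
        weakest = min(current, key=lambda f: abs(weights[f]))
        current.remove(weakest)
    return best_features, best_score
```

In practice `fit` would wrap cross-validated training of the chosen classifier; keeping it as a parameter lets the same loop be reused with different classifiers.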

6. Based on the screened features, a classifier is used for classification training to obtain the individualized prediction model of first-episode schizophrenia;

In this step, as a specific implementation, the classifier is an SVM classifier based on a radial basis function (RBF) kernel, a logistic regression classifier, or an ensemble of multiple classifiers. During classification training, several pattern recognition classification methods are applied to find the optimal classifier model and to screen the key features of the multi-threshold fused brain structure network.
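
For reference, the RBF kernel used by such an SVM has the standard form K(x, y) = exp(-gamma * ||x - y||^2). The sketch below only evaluates the kernel and the kernel (Gram) matrix for a set of feature vectors; it does not train a classifier. The parameter name `gamma` and its value are assumptions, since no kernel width is given in this text.

```python
from math import exp

def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * squared Euclidean distance between x and y)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * squared_distance)

def gram_matrix(samples, gamma):
    """Kernel matrix over all pairs of feature vectors."""
    return [[rbf_kernel(x, y, gamma) for y in samples] for x in samples]
```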

7. Performance verification and evaluation are carried out on the individualized prediction model of first-episode schizophrenia obtained by training.

In this step, as a specific implementation, cross-validation is used to evaluate the performance of the various prediction models on the schizophrenia training set and test set, including accuracy, sensitivity, specificity, ROC curve and AUC value. By testing feature screening and classification algorithms and applying permutation tests, the system optimizes brain imaging feature selection and the individual subtype prediction model and identifies the core subtype-related brain imaging features, so as to improve the accuracy of individualized prediction.
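
The evaluation measures named above can be computed as in the following sketch. It assumes binary labels with 1 for patient and 0 for control, and obtains the AUC from pairwise comparison of scores (the rank-based equivalent of the area under the ROC curve). The function names are hypothetical; cross-validation splitting and the permutation test are not shown.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = patient)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def auc(y_true, scores):
    """Probability that a random positive scores higher than a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```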

In summary, the present invention uses artificial intelligence and machine learning to analyze and mine brain structural magnetic resonance imaging data of first-episode schizophrenia, and constructs a robust individualized prediction model of first-episode schizophrenia for the early diagnosis and identification of schizophrenia, so as to achieve accurate and objective auxiliary diagnosis and improve curative effect.

Claims

1. A method for constructing an individualized prediction model of first-episode schizophrenia, characterized by comprising the following steps:

A. acquiring a diffusion tensor image of a patient with first-episode schizophrenia;

B. preprocessing the acquired diffusion tensor image;

C. constructing a sparse brain structure network based on the preprocessed image;

D. constructing, for each subject, a multi-threshold fused brain structure network from the sparsified networks by a similarity network fusion method;

E. extracting topological attribute features of the multi-threshold fused brain structure network, and then performing feature screening;

F. based on the screened features, performing classification training with a classifier to obtain the individualized prediction model of first-episode schizophrenia;

G. performing performance verification and evaluation on the individualized prediction model of first-episode schizophrenia obtained by training.

2. The method for constructing an individualized prediction model of first-episode schizophrenia according to claim 1, characterized in that

in step A, a magnetic resonance imaging scanner is used to acquire the diffusion tensor image of the patient with first-episode schizophrenia by a single-shot echo-planar imaging technique.

3. The method for constructing an individualized prediction model of first-episode schizophrenia according to claim 1, characterized in that

in step B, the preprocessing of the acquired diffusion tensor image specifically includes:

B1, converting the diffusion tensor image from the DICOM data format into the NIFTI format using MRIConvert;

B2, performing eddy current correction and head motion correction on the diffusion tensor image after data format conversion;

B3, stripping the skull with the Brain Extraction Tool of FSL to remove non-brain tissue from the image.

4. The method for constructing an individualized prediction model of first-episode schizophrenia according to claim 1, characterized in that

in step C, the constructing of a sparse brain structure network based on the preprocessed image specifically includes:

C1, registering the preprocessed brain image of each subject to the b0 image in diffusion tensor space by a linear (rotation and translation) registration method; then registering the registered b0 image to the T1 image in standard MNI space; inverting the transformation matrix, and using the resulting inverse matrix to transform the AAL template from MNI space into diffusion tensor space, to obtain 90 brain-region network nodes parcellated according to the AAL template;

C2, using a probabilistic fiber tractography method, performing Bayesian estimation of the diffusion parameters with the BEDPOSTX tool, establishing the distribution of the diffusion parameters at each voxel by Markov chain Monte Carlo sampling, presetting a crossing-fiber model for each voxel of the brain, and automatically determining how many crossing fiber bundles pass through it;

C3, performing probabilistic fiber tracking and reconstruction with the PROBTRACKX tool: repeatedly sampling the distribution of the principal diffusion direction at each voxel, generating a streamline from the local samples drawn each time, and building a statistical map of the posterior distribution of streamline positions over many samples, to obtain the distribution of the structural connection probability between every two brain regions; defining the weight of each edge as the fiber-bundle connection probability between the two node regions, and obtaining for each subject a symmetric 90 × 90 weighted network matrix of fiber-bundle connection probabilities;

C4, setting a threshold on the fiber-bundle connection probability, a structural connection being deemed to exist between two brain regions whose connection probability exceeds the threshold; testing the influence of different sparsity thresholds on the fusion effect, and constructing the sparse structural networks using a relatively narrow threshold range.

5. The method for constructing an individualized prediction model of first-episode schizophrenia according to claim 1, characterized in that

in step D, the constructing, for each subject, of the multi-threshold fused brain structure network from the sparsified networks by the similarity network fusion method specifically includes:

D1, defining the sparsified structural connection matrix as a full kernel matrix W_i^j, where W_i^j is the full kernel matrix of the i-th subject under the j-th threshold, and further constructing the corresponding sparse kernel matrix: taking the K nearest neighbors of node u in the full kernel matrix W_i^j, the sparse kernel matrix is defined as:

D2, iteratively updating each full kernel matrix based on its corresponding sparse kernel matrix:

wherein (W_i^c)^(m) denotes the full kernel matrix of the i-th subject under the c-th threshold at the m-th iteration, (W_i^j)^(m+1) denotes the full kernel matrix at the (m+1)-th iteration, and N is the total number of sparsity thresholds;

D3, judging whether the iteration convergence condition is met; if so, executing step D4; otherwise, continuing the iteration;

wherein the convergence condition is: ||(W_i^j)^(m+1) - (W_i^j)^(m)|| ≤ 0.01;

D4, averaging the updated full kernel matrices corresponding to the N thresholds to construct an averaged full kernel matrix for each subject:

D5, normalizing W_i to the interval [0, 1], thereby generating the final fused network for each subject.

6. The method for constructing an individualized prediction model of first-episode schizophrenia according to claim 1, characterized in that

in step E, the extracting of the topological attribute features of the multi-threshold fused brain structure network specifically includes:

calculating, by a graph-theoretical analysis method, the AUC values across all thresholds of 8 global topological attributes and 3 node topological attributes of the fused network, as the initial features for subsequent classification;

the 8 global topological attributes include the network strength S_p, global efficiency E_glob, local efficiency E_loc, shortest path length L_p, clustering coefficient C_p, normalized shortest path length λ, normalized clustering coefficient γ and small-world attribute σ;

the network strength S_p is calculated as:

wherein S(i) is the sum of the weights of the edges connected to the i-th node, and N is the number of brain regions in the whole-brain network;

the shortest path length L_p is calculated as:

wherein L_ij denotes the shortest path length between node i and node j, and L_p is the shortest path length of the whole network G;

the global efficiency E_glob is calculated as:

wherein E_glob_i(G) is the global efficiency of node i, and the global efficiency of the network G is the average of the global efficiencies of all nodes;

the local efficiency E_loc is calculated as:

wherein L_jk is the shortest path length between region j and region k, G_i is the sub-network formed by the nodes connected to region i, and N_Gi is the number of brain regions in the sub-network G_i; E_loc_i(G) is the local efficiency of node i, and the local efficiency E_loc(G) of the network G is the average of the local efficiencies of all nodes in the network;

the clustering coefficient C_p is calculated as:

wherein C(i) is the clustering coefficient of node i, and the clustering coefficient of the network G is the average of the clustering coefficients of all nodes;

the normalized clustering coefficient γ and the normalized shortest path length λ are calculated as follows:

wherein C_p^rand and L_p^rand are, respectively, the mean clustering coefficient and the mean shortest path length of 100 random networks;

the small-world attribute σ is calculated as:

σ = γ/λ;

the 3 node topological attributes include the node degree D_nodal(i), node efficiency E_nodal(i) and node betweenness centrality B_nodal(i), respectively defined as follows:

wherein e_st denotes the number of all shortest paths from node s to node t in the network G, and e_sit is the number of those shortest paths that pass through node i.

7. The method for constructing an individualized prediction model of first-episode schizophrenia according to claim 1, characterized in that

in step E, the performing of feature screening specifically includes:

using a recursive feature elimination algorithm based on a support vector machine, which performs feature selection by repeatedly training a classifier and removing the feature dimensions with the smaller feature weights, and which specifically comprises the following steps:

① initializing the selected feature set to contain all candidate features;

② taking the feature set as input, training the classifier, and obtaining the classification performance and the weight of each feature;

③ removing the feature with the smallest weight to form a new feature set;

④ repeating ② and ③ until

and selecting the case with the best classification performance.

8. The method for constructing an individualized prediction model of first-episode schizophrenia according to claim 1, characterized in that

in step F, the classifier is an SVM classifier based on a radial basis function kernel, a logistic regression classifier, or an ensemble of multiple classifiers; during classification training, several pattern recognition classification methods are applied to find the optimal classifier model and to screen the key features of the multi-threshold fused brain structure network.

9. The method for constructing an individualized prediction model of first-episode schizophrenia according to any one of claims 1 to 8, characterized in that, in step G, cross-validation is used to evaluate the performance of the various prediction models on the schizophrenia training set and test set, including accuracy, sensitivity, specificity, ROC curve and AUC value; and, by testing feature screening and classification algorithms and applying permutation tests, the system optimizes brain imaging feature selection and the individual subtype prediction model and identifies the core subtype-related brain imaging features, so as to improve the accuracy of individualized prediction.