CN112908470B - Hepatocellular carcinoma prognosis scoring system based on RNA binding protein gene and application thereof - Google Patents

Info

Publication number
CN112908470B
Authority
CN
China
Prior art keywords
hepatocellular carcinoma
score
binding protein
gene
scoring system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110172416.1A
Other languages
Chinese (zh)
Other versions
CN112908470A (en)
Inventor
刘利平
张强弩
刘权
严巧婷
张育森
孙哲
鲍世韵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Peoples Hospital
Original Assignee
Shenzhen Peoples Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Peoples Hospital
Priority to CN202110172416.1A
Publication of CN112908470A
Application granted
Publication of CN112908470B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Public Health (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Chemical & Material Sciences (AREA)
  • Computational Linguistics (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Analytical Chemistry (AREA)
  • Databases & Information Systems (AREA)
  • Genetics & Genomics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a hepatocellular carcinoma prognosis scoring system based on RNA-binding protein genes and an application thereof. The input variables of the scoring system comprise the Gini coefficient and the score of each hepatocellular carcinoma-associated RNA-binding protein gene in a data set, where the score is determined from the mRNA expression level and the hazard ratio of that gene in the data set. The output variable of the scoring system is the weighted sum of the Gini coefficients and the scores of the hepatocellular carcinoma-associated RNA-binding protein genes. The scoring system can effectively evaluate and predict patient prognosis, is cross-platform and generally applicable, and has broad prospects for clinical application; the score is significantly associated with some of the patients' clinical characteristics.

Description

A prognostic scoring system for hepatocellular carcinoma based on RNA-binding protein genes and its application

Technical field

The invention belongs to the technical field of oncology and relates to a hepatocellular carcinoma prognosis scoring system based on RNA-binding protein genes and its application.

Background

Because of its insidious onset and the difficulty of early diagnosis, hepatocellular carcinoma (HCC) has a 5-year survival rate of less than 20%. In recent years, surgery, radiotherapy and chemotherapy, targeted therapy, and immunotherapy have advanced significantly and brought new hope to HCC patients, but efficacy in intermediate and advanced HCC remains very limited. Many molecular mechanisms underlying the occurrence and development of HCC are still unclear. It is therefore necessary to fill in the remaining pieces of the HCC molecular puzzle and identify key molecules, so as to support early diagnosis, individualized treatment, and prognosis assessment of HCC.

RNA-binding proteins (RBPs) are important proteins involved in post-transcriptional regulation. With their variable RNA-binding domains and flexible structures, RBPs dynamically control many aspects of RNA metabolism, including RNA splicing, localization, transport, and stability maintenance. It is now established that RBPs participate in tumor occurrence and development through post-transcriptional regulation and other mechanisms; their target genes include oncogenes, cell cycle/apoptosis regulator genes, autophagy regulator genes, and inflammatory factor genes. Some RBPs are differentially expressed between cancer tissue and adjacent tissue, and these differentially expressed RBPs are associated with patient prognosis and clinical characteristics. Recent high-throughput data analyses based on TCGA suggest that altered RBP expression is present in many cancers, especially HCC. Many studies have identified RBPs related to the occurrence and development of HCC. For example, Zhao et al. found that the RNA-binding protein RPS3 promotes HCC cell proliferation through post-transcriptional regulation of SIRT1 (Zhao L, Cao J, Hu K, Wang P, Li G, He X, et al. RNA-binding protein RPS3 contributes to hepatocarcinogenesis by post-transcriptionally up-regulating SIRT1. Nucleic Acids Res 2019;47(4):2011-28.); Dong et al. found that the RNA-binding protein RBM3 promotes HCC cell proliferation by regulating the production of the circular RNA SCD-circRNA 2 (Dong W, Dai ZH, Liu FC, Guo XG, Ge CM, Ding J, et al. The RNA-binding protein RBM3 promotes cell proliferation in hepatocellular carcinoma by regulating circular RNA SCD-circRNA 2 production. EBioMedicine 2019;45:155-67).

However, most previous studies examined the function and mechanism of individual RBPs in HCC cells in vitro, and the expression pattern of the same gene may differ significantly between data sets, or even point in opposite directions. If a study includes only a small number of cohorts, its results tend to be limited, not generally applicable, and unrepresentative of the real situation. The prior art lacks systematic reviews and clinical application studies of RBPs. Although some studies have addressed the relationship between RBPs and clinical prognosis, most of them rely on a single data set, so their results are inconsistent and contradictory.

It is therefore necessary to obtain consistent and clinically useful RBP-related data from multiple cohorts and large-scale samples.

Summary of the invention

In view of the shortcomings of the prior art and actual needs, the present invention provides a hepatocellular carcinoma prognosis scoring system based on RNA-binding protein genes and its application. The scoring system can effectively predict the overall survival (OS) and disease-free survival (DFS) of HCC patients and has high clinical application value.

To this end, the present invention adopts the following technical solutions:

In a first aspect, the present invention provides a hepatocellular carcinoma prognosis scoring system based on RNA-binding protein genes, whose input variables include the Gini coefficient and the score of each hepatocellular carcinoma-associated RNA-binding protein gene in a data set;

the score is determined from the mRNA expression level and the hazard ratio of the hepatocellular carcinoma-associated RNA-binding protein gene in the data set.

In the present invention, based on the results of Cox survival analysis and a random forest model, the hepatocellular carcinoma prognosis scoring system RBP-score is constructed from the mRNA expression values of 10 key HCC-associated RBP genes. RBP-score links the importance of these 10 genes to the prognosis and clinical characteristics of HCC patients. Across different HCC data sets, patients with different RBP-scores differ significantly in overall survival and disease-free survival: the higher the RBP-score, the worse the overall and disease-free survival. RBP-score is an effective, cross-platform, generally applicable tool for assessing patient prognosis; its prognostic ability is no weaker than clinical TNM staging, and it has practical clinical value.

Preferably, the hazard ratio (HR) is determined from a Cox proportional hazards model of overall survival for the hepatocellular carcinoma-associated RNA-binding protein gene in the data set. Preferably, the HR is obtained with a univariate Cox proportional hazards model based on GSE14520, TCGA-LIHC, and ICGC-LIRI-JP; when the Cox proportional hazards model is fitted, the software automatically reports the hazard ratio, i.e. the HR value.

Preferably, the score is 0 or 1;

if the mRNA expression level of the hepatocellular carcinoma-associated RNA-binding protein gene is ≥ the average expression level (the median expression level) and the hazard ratio is > 1, or the mRNA expression level is < the average expression level (the median expression level) and the hazard ratio is < 1, the score is 1; otherwise the score is 0.

Preferably, the hepatocellular carcinoma-associated RNA-binding protein genes include PRPF3, SLBP, CPEB3, PPARGC1A, IGF2BP3, SF3B4, ILF2, CSTF2, ACO1, and FBL.

Preferably, the data set includes any one or a combination of at least two of the hepatocellular carcinoma cohorts of the Gene Expression Omnibus (GEO), The Cancer Genome Atlas hepatocellular carcinoma data (TCGA-LIHC), or the International Cancer Genome Consortium Japanese liver cancer data (ICGC-LIRI-JP).

In the present invention, by integrating mRNA expression data from multiple large-scale, cross-platform HCC cohorts, 30 hepatocellular carcinoma-associated RNA-binding protein genes with consistent differential expression in HCC tissue were identified. The random forest machine learning algorithm was then used to select 10 key hepatocellular carcinoma-associated RNA-binding protein genes, giving better accuracy and higher application value.

Preferably, the GEO hepatocellular carcinoma cohorts include any one or a combination of at least two of GSE14520, GSE22058, GSE25097, GSE36376, GSE45436, GSE64041, GSE76427, GSE54236, or GSE63898.

Preferably, the data set includes GSE14520, The Cancer Genome Atlas hepatocellular carcinoma data (TCGA-LIHC), and the International Cancer Genome Consortium Japanese liver cancer data (ICGC-LIRI-JP).

Preferably, the output variable of the hepatocellular carcinoma prognosis scoring system is the weighted sum of the Gini coefficients and the scores of the hepatocellular carcinoma-associated RNA-binding protein genes.

Preferably, the hepatocellular carcinoma prognosis scoring system is calculated as:

RBP-score = ∑(Gene_score × Gene_Weight)

where Gene_Weight is the Gini coefficient and Gene_score is the score, which takes the value 0 or 1.
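
The weighted sum above can be sketched in a few lines of Python. This is only an illustration: the function name `rbp_score` and the example weights are hypothetical, and the weights shown are not values from the patent.

```python
from typing import Mapping


def rbp_score(gene_scores: Mapping[str, int],
              gene_weights: Mapping[str, float]) -> float:
    """Return sum(Gene_score * Gene_Weight) over all genes in gene_weights."""
    total = 0.0
    for gene, weight in gene_weights.items():
        score = gene_scores[gene]
        if score not in (0, 1):
            raise ValueError(f"Gene_score for {gene} must be 0 or 1, got {score}")
        total += score * weight
    return total


# Hypothetical weights and scores, for illustration only.
example_weights = {"PRPF3": 2.0, "SLBP": 1.5, "CPEB3": 1.0}
example_scores = {"PRPF3": 1, "SLBP": 0, "CPEB3": 1}
print(rbp_score(example_scores, example_weights))  # 3.0
```

Because Gene_score is binary, the RBP-score is simply the total weight of the genes whose expression falls on the unfavourable side.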

In the present invention, the hepatocellular carcinoma prognosis scoring system RBP-score, built with the random forest algorithm, has high potential value for indicating and predicting patient prognosis. RBP-score can effectively predict the OS and DFS of HCC patients and also correlates to some extent with other clinical characteristics, namely TNM stage, AFP, and metastasis risk. Its clinical value has been verified in data sets from different platforms, and Cox analysis indicates that a high RBP-score is an independent risk factor for poor prognosis in HCC patients.

Specifically, Gene_score is determined according to the table below: if a gene's integrated HR is > 1 and its mRNA expression is ≥ the median, or its integrated HR is < 1 and its mRNA expression is < the median, the Gene_score of that gene is 1; otherwise the Gene_score is 0. In the table, high expression is defined as an expression value ≥ the median and low expression as an expression value < the median. The integrated HR is obtained with a univariate Cox proportional hazards model based on the GSE14520, TCGA-LIHC, and ICGC-LIRI-JP data.
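
The rule for Gene_score can be written as a small Python function. This is a minimal sketch; the function and parameter names are hypothetical, and treating an HR of exactly 1 as score 0 follows from the "otherwise" clause rather than from an explicit statement in the text.

```python
def gene_score(expression: float, median_expression: float,
               hazard_ratio: float) -> int:
    """Return 1 when expression lies on the unfavourable side for this gene.

    Risk gene (HR > 1): high expression (>= median) scores 1.
    Protective gene (HR < 1): low expression (< median) scores 1.
    Every other case, including HR == 1, scores 0.
    """
    is_high = expression >= median_expression
    if hazard_ratio > 1 and is_high:
        return 1
    if hazard_ratio < 1 and not is_high:
        return 1
    return 0


print(gene_score(5.0, 4.0, 1.8))  # risk gene, high expression
print(gene_score(3.0, 4.0, 0.6))  # protective gene, low expression
print(gene_score(5.0, 4.0, 0.6))  # protective gene, high expression
```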

In a second aspect, the invention provides hepatocellular carcinoma-associated RNA-binding protein gene markers, which include PRPF3, SLBP, CPEB3, PPARGC1A, IGF2BP3, SF3B4, ILF2, CSTF2, ACO1, and FBL.

Preferably, the hepatocellular carcinoma-associated RNA-binding protein gene markers further include POLR2G, MBNL2, PUF60, RALY, LSM4, CASC3, ZFP36, LARP1, SNRPC, TSNAX, RBM34, IGF2BP2, ABCF1, NSUN6, RBMS3, NONO, LSM2, SNRPB, ZGPAT, and XPO5.

In the present invention, the overall technical roadmap is shown in Figure 1. First, across the GEO gene chip (microarray) expression data of 9 hepatocellular carcinoma cohorts, the robust rank aggregation (RRA) algorithm was used to screen, from 430 RBPs with well-defined functions, 30 RBPs whose expression patterns were consistent across the 9 HCC cohorts. RRA can effectively integrate microarray data from multiple platforms and thus yield valid integrated results. Using RRA not only identified HCC-associated RBPs with good specificity but also reduced the dimensionality of the data, narrowing the scope of study from 430 RBPs to 30 and greatly reducing the research burden. The integrated results were then verified in the RNA sequencing data of the TCGA-LIHC and ICGC-LIRI-JP cohorts: the expression patterns of the 30 HCC-associated RBPs in the RNA sequencing data were fully consistent with those in the 9 mRNA gene chip data sets.

In the present invention, the random forest algorithm was further used to calculate the importance of the 30 HCC-associated RBP genes in determining patients' 5-year survival, and the 10 most important HCC-associated RBP genes were selected.

In a third aspect, the present invention provides a method for screening the hepatocellular carcinoma-associated RNA-binding protein gene markers described in the second aspect, the screening method including:

screening RNA-binding protein genes with consistent differential expression from the GEO hepatocellular carcinoma cohorts to obtain the initial hepatocellular carcinoma-associated RNA-binding protein genes.

Preferably, the robust rank aggregation (RRA) algorithm is used to screen 30 RNA-binding protein genes with consistent differential expression from the 9 GEO hepatocellular carcinoma cohorts, and these are further verified in the TCGA-LIHC and ICGC-LIRI-JP cohorts. The 30 initial RNA-binding protein genes with consistent differential expression were analyzed for copy number variation, single nucleotide mutation, and promoter methylation; some of these RNA-binding protein genes are risk or protective factors for patient prognosis.

Preferably, the screening method further includes:

using The Cancer Genome Atlas hepatocellular carcinoma data (TCGA-LIHC) as the training set, dividing the training samples into 5-year survivors and 5-year non-survivors, and building a random forest classification model;

using the random forest classification model to classify the initial hepatocellular carcinoma-associated RNA-binding protein genes and obtain the key hepatocellular carcinoma-associated RNA-binding protein genes.
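
The text does not say how the random forest turns gene importance into the Gini coefficient used as Gene_Weight. Assuming it corresponds to the usual decrease in Gini impurity, the quantity behind that importance can be sketched for a single split on one gene's expression. All names below are hypothetical; a real forest would accumulate such decreases over many trees and splits.

```python
from collections import Counter
from typing import Sequence


def gini_impurity(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum(p_k^2) of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(values: Sequence[float], labels: Sequence[int],
                  threshold: float) -> float:
    """Impurity decrease from splitting samples at `threshold` on one feature."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted_children


# Hypothetical toy data: expression of one gene and 5-year survival labels
# (1 = 5-year survivor, 0 = 5-year non-survivor).
expression = [1.0, 2.0, 3.0, 4.0]
survived_5y = [0, 0, 1, 1]
print(gini_decrease(expression, survived_5y, threshold=2.5))
```

A gene whose splits separate survivors from non-survivors cleanly produces large decreases and therefore a high importance.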

In a fourth aspect, the present invention provides the use of the hepatocellular carcinoma prognosis scoring system described in the first aspect in the preparation of a hepatocellular carcinoma prognosis monitoring product.

Preferably, the hepatocellular carcinoma prognosis monitoring product includes a hepatocellular carcinoma prognosis monitoring kit and/or a hepatocellular carcinoma prognosis monitoring medical device.

Compared with the prior art, the present invention has the following beneficial effects:

(1) By integrating and analyzing mRNA expression data from multiple large-scale, cross-platform HCC cohorts, the present invention obtains an expression profile of HCC-associated RBP genes with highly consistent differential expression, and uses the random forest machine learning model to construct a hepatocellular carcinoma prognosis scoring system. The scoring system is a simple and powerful prognostic evaluation tool that is cross-platform and applicable to different subgroups of HCC patients;

(2) The HCC-associated RNA-binding protein genes identified in the present invention may serve as new targets for the diagnosis and treatment of hepatocellular carcinoma and may be used for HCC prognosis prediction and assessment.

Description of the drawings

Figure 1 is the overall technical roadmap;

Figure 2A shows the 30 HCC-associated RBP genes with highly consistent expression characteristics identified in the HCC cohorts by the robust rank aggregation algorithm; Figure 2B shows the expression of these 30 genes in the 9 HCC chip data cohorts; Figure 2C shows their expression in the TCGA-LIHC RNA sequencing data; Figure 2D shows their expression in the ICGC-LIRI-JP RNA sequencing data; Figure 2E shows their expression by TNM stage in HCC patients of the TCGA-LIHC cohort; Figure 2F shows their expression by TNM stage in HCC patients of the ICGC-LIRI-JP cohort; Figure 2G shows that these 30 genes effectively distinguish cancer tissue from peritumoral tissue in the TCGA-LIHC cohort; Figure 2H shows that these 30 genes effectively distinguish cancer tissue from peritumoral tissue in the ICGC-LIRI-JP HCC cohort;

Figure 3A shows the overall survival hazard ratios of the 30 HCC-associated RBP genes in TCGA-LIHC, calculated with the Cox model; Figure 3B shows the Kaplan-Meier curves of the RBP genes associated with overall survival in TCGA-LIHC; Figure 3C shows the Kaplan-Meier curves of the RBP genes associated with disease-free survival in TCGA-LIHC; Figure 3D shows the integrated survival analysis results for the 30 HCC-associated RBP genes in the three HCC data sets TCGA-LIHC, GSE14520, and ICGC-LIRI-JP;

Figure 4A shows the overall survival of TCGA-LIHC patients calculated with Kaplan-Meier curves; Figure 4B shows the overall survival of GSE14520 patients calculated with Kaplan-Meier curves; Figure 4C shows the overall survival of ICGC-LIRI-JP patients calculated with Kaplan-Meier curves; Figure 4D uses ROC curves to evaluate the accuracy and specificity of RBP-score in predicting 1-year, 3-year, and 5-year overall survival in TCGA-LIHC patients; Figure 4E uses ROC curves to evaluate the accuracy and specificity of RBP-score in predicting 1-year, 3-year, and 5-year overall survival in GSE14520 patients; Figure 4F uses ROC curves to evaluate the accuracy and specificity of RBP-score in predicting 1-year, 3-year, and 5-year overall survival in ICGC-LIRI-JP patients; Figure 4G shows the disease-free survival of TCGA-LIHC patients calculated with Kaplan-Meier curves; Figure 4H shows the disease-free survival of GSE14520 patients calculated with Kaplan-Meier curves;

Figures 5A to 5G show the prognostic evaluation of each patient subgroup with RBP-score after stratifying HCC patients by sex (Figure 5A), age (Figure 5B), TNM stage (Figure 5C), AFP level (Figure 5D), HBV infection status (Figure 5E), HCV infection status (Figure 5F), and liver cirrhosis status (Figure 5G).

具体实施方式Detailed ways

为进一步阐述本发明所采取的技术手段及其效果,以下结合实施例和附图对本发明作进一步地说明。可以理解的是,此处所描述的具体实施方式仅仅用于解释本发明,而非对本发明的限定。In order to further illustrate the technical means adopted by the present invention and its effects, the present invention will be further described below with reference to the embodiments and drawings. It can be understood that the specific embodiments described here are only used to explain the present invention, but not to limit the present invention.

实施例中未注明具体技术或条件者,按照本领域内的文献所描述的技术或条件,或者按照产品说明书进行。所用试剂或仪器未注明生产厂商者,均为可通过正规渠道商购获得的常规产品。If specific techniques or conditions are not specified in the examples, the techniques or conditions described in literature in the field shall be followed, or the product instructions shall be followed. If the manufacturer of the reagents or instruments used is not indicated, they are all conventional products that can be purchased through regular channels.

实施例1Example 1

本实施例首先从基因表达综合数据库(Gene Expression Omnibus,GEO,https://www.ncbi.nlm.nih.gov/geo/)获取9个HCC队列的mRNA芯片数据GSE14520、GSE22058、GSE25097、GSE36376、GSE45436、GSE64041、GSE76427、GSE54236和GSE63898,从NIH GDCData Portal(https://portal.gdc.cancer.gov/)收集包括mRNA表达数据的肝细胞癌基因组图谱(The Cancer Genome Atlas Liver Hepatocellular Carcinoma,TCGA-LIHC),从ICGC DCC(https://dcc.icgc.org/releases)收集国际癌症基因组联盟日本肝癌(International Cancer Genome Consortium Japanese liver cancer,ICGC-LIRI-JP)数据。This example first obtains the mRNA chip data GSE14520, GSE22058, GSE25097, GSE36376, and 9 HCC cohorts from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) GSE45436, GSE64041, GSE76427, GSE54236 and GSE63898, The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA- LIHC), collecting International Cancer Genome Consortium Japanese liver cancer (ICGC-LIRI-JP) data from ICGC DCC (https://dcc.icgc.org/releases).

The gene identifiers in all of the above HCC cohort datasets were the latest HUGO gene names, and the mRNA expression data were normalized by log2 transformation.
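The log2 transformation can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the pseudocount of 1 (to avoid taking the logarithm of zero), the function name `log2_normalize` and the raw values are all assumptions that do not appear in the source.

```python
import math


def log2_normalize(expression, pseudocount=1.0):
    """Return log2(value + pseudocount) for each gene's raw expression value.

    The pseudocount is an assumption; the source only states that log2 was used.
    """
    return {gene: math.log2(value + pseudocount) for gene, value in expression.items()}


# Hypothetical raw expression values for two genes named in the claims.
raw = {"SF3B4": 1023.0, "CPEB3": 63.0}
normalized = log2_normalize(raw)
print(normalized)
```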

In the following examples, statistical analysis was performed with R (version 3.6.1). Differences between two groups were analyzed with the independent-samples t-test for normally distributed data and with the Wilcoxon test for non-normally distributed data. The relationships of individual HCC-associated RBPs genes and of the RBP-score with patient overall survival (OS) and disease-free survival (DFS) were analyzed by Kaplan-Meier survival analysis with the log-rank test. The effects on OS of individual HCC-associated RBPs genes, the RBP-score and other clinical indicators were assessed by univariate and multivariate Cox analysis. The relationship between the RBP-score and clinical characteristics such as TNM stage and AFP level was analyzed with the chi-square test. P<0.05 was considered statistically significant.

Example 2

In this example, the robust rank aggregation (RRA) algorithm (Kolde R, Laur S, Adler P, Vilo J. Robust rank aggregation for gene list integration and meta-analysis. Bioinformatics 2012;28(4):573-80) was used to integrate the nine mRNA microarray datasets and obtain mRNAs with consistent expression patterns. The RRA algorithm was run in R (version 3.6.1). Genes with a differential expression fold change >1.5 or <-1.5 between cancer and normal tissues and P<0.05 were selected to build the HCC RRA list. The list contains 1326 significantly differentially expressed genes that are consistently up-regulated or down-regulated across the nine HCC cohorts.
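The selection threshold applied to the aggregated results can be illustrated with a short sketch. It covers only the final filtering step (fold change >1.5 or <-1.5 and P<0.05) and does not implement RRA itself; the function name `is_differential`, the gene labels and all numeric values are hypothetical.

```python
def is_differential(fold_change, p_value, fc_cutoff=1.5, p_cutoff=0.05):
    """Apply the stated thresholds: |fold change| above the cutoff and P below 0.05."""
    return (fold_change > fc_cutoff or fold_change < -fc_cutoff) and p_value < p_cutoff


# Hypothetical (gene, fold change, P value) records for illustration only.
records = [
    ("GENE_A", 2.3, 0.001),   # up-regulated and significant
    ("GENE_B", -1.8, 0.03),   # down-regulated and significant
    ("GENE_C", 1.2, 0.01),    # fold change too small
    ("GENE_D", 2.0, 0.2),     # not significant
]
selected = [gene for gene, fc, p in records if is_differential(fc, p)]
print(selected)
```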

By searching the database of RNA-binding specificities (RBPDB, http://rbpdb.ccbr.utoronto.ca/) and referring to the work of Gerstberger et al. (Gerstberger S, Hafner M, Tuschl T. A census of human RNA-binding proteins. Nat Rev Genet 2014;15(12):829-45), 430 RNA-binding protein genes were obtained from the significantly differentially expressed genes and a list of human RBPs genes was constructed. The translation products of the genes in this list are functionally characterized proteins capable of binding RNA.

As shown in Figure 2A, intersecting the HCC RRA list with the RBPs gene list identified 30 RBP mRNAs whose differential expression was consistent across the microarray datasets of the nine HCC cohorts. These were defined as HCC-associated RBPs genes. Their differential expression (cancer tissue vs. normal tissue) in the nine HCC cohorts is shown in Figure 2B: 8 RBP mRNAs showed low expression in HCC tissues (P<0.05) and 22 showed high expression (P<0.05).
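The intersection step amounts to a set operation on two gene lists. The sketch below assumes both lists already use the same HUGO gene symbols; the function name `intersect_gene_lists` and the placeholder entries GENE_X and GENE_Y are hypothetical.

```python
def intersect_gene_lists(hcc_rra_genes, rbp_genes):
    """Return the genes present in both lists, sorted alphabetically."""
    return sorted(set(hcc_rra_genes) & set(rbp_genes))


# Tiny hypothetical excerpts of the two lists (the real lists are far longer).
hcc_rra_list = ["SF3B4", "CPEB3", "GENE_X"]
rbp_gene_list = ["CPEB3", "SF3B4", "GENE_Y"]
hcc_associated = intersect_gene_lists(hcc_rra_list, rbp_gene_list)
print(hcc_associated)
```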

Next, RNA sequencing data from TCGA-LIHC and ICGC-LIRI-JP were used to validate the expression of the HCC-associated RBPs genes. As shown in Figures 2C and 2D, the expression of the 30 HCC-associated RBPs genes in TCGA-LIHC and ICGC-LIRI-JP was highly consistent with that in the nine HCC cohorts.

Further analysis of the expression of the 30 HCC-associated RBPs genes in tissues of different TNM stages (Figures 2E and 2F) showed that some RNA-binding proteins, such as XPO5 and CPEB3, were expressed at significantly different levels in early-stage and late-stage tissues, and that this difference followed the same trend as the change between normal and tumor tissues. These RNA-binding proteins are therefore likely to play tumor-promoting or tumor-suppressing roles. The PCA results for the 30 HCC-associated RBPs genes in TCGA-LIHC and ICGC-LIRI-JP are shown in Figures 2G and 2H. PCA showed that the mRNA expression profiles of the 30 genes effectively separate tumor tissue from normal tissue, indicating that the identified genes are highly associated with HCC and merit further study.

Example 3

This example further explored the clinical value of the 30 HCC-associated RBPs genes by analyzing the association between the mRNA expression of each individual RBP gene and prognosis in three HCC cohorts: GSE14520, TCGA-LIHC and ICGC-LIRI-JP.

The results of survival analysis of the 30 RBP genes based on the Cox proportional hazards model are shown in Figure 3A. With the median mRNA expression as the cut-off, the Kaplan-Meier curves of each HCC-associated RBP gene for overall survival and disease-free survival are shown in Figures 3B and 3C (only results with log-rank P<0.05 are shown). The results of the integrated analysis of the three datasets are shown in Figure 3D.
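The median split and the Kaplan-Meier estimate can be sketched in plain Python. This is an illustrative sketch only: the source performed these analyses in R, the assignment of values equal to the median to the low group is an assumption, and the patient identifiers, times and event flags are hypothetical.

```python
from statistics import median


def split_by_median(expression):
    """Split patients into high and low groups at the median expression value.

    Values equal to the median go to the low group (an assumption).
    """
    cutoff = median(expression.values())
    high = [pid for pid, value in expression.items() if value > cutoff]
    low = [pid for pid, value in expression.items() if value <= cutoff]
    return high, low


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 for an observed death and 0 for a censored observation.
    """
    curve = []
    survival = 1.0
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: expression per patient, follow-up in months, event flags.
high, low = split_by_median({"P1": 2.0, "P2": 4.0, "P3": 6.0, "P4": 8.0})
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
print(high, low, curve)
```

Comparing the curves of the high and low groups with a log-rank test would be the next step; that test is not implemented here.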

Most of the 30 RBPs genes were associated with the survival of HCC patients, suggesting that these genes may play tumor-promoting or tumor-suppressing roles. After integrating the three datasets TCGA-LIHC, GSE14520 and ICGC-LIRI-JP, genes such as CSTF2, SF3B4, PPARGC1A and RALY behaved consistently across the three datasets. Taken together, this evidence indicates that some of the 30 HCC-associated RBPs genes are closely related to the survival of HCC patients.

Example 4

To obtain an RBPs gene signature containing fewer genes, this example used a random forest algorithm to further screen for key HCC-associated RBPs genes. The steps were as follows:

With the TCGA-LIHC dataset as the training set, all patients were divided into 5-year survivors and 5-year non-survivors, and a random forest classification model was built with the mRNA expression data of the HCC-associated RBPs genes as input variables (ntree=500). After 10-fold cross-validation (CV=10) stratification, 10 key HCC-associated RBPs genes were selected according to the importance measure, the mean Gini value (cut-off=5.1), to construct the RBPs gene signature.
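The final selection step, keeping genes whose mean Gini importance exceeds the cut-off of 5.1, can be sketched as below. The sketch does not train a random forest; it assumes the importance values have already been computed. The function name `select_key_genes`, the strict ">" comparison and every importance value shown are assumptions for illustration and are not taken from the source.

```python
def select_key_genes(mean_gini, cutoff=5.1):
    """Return genes whose mean Gini importance exceeds the cutoff, highest first."""
    ranked = sorted(mean_gini.items(), key=lambda item: item[1], reverse=True)
    return [gene for gene, importance in ranked if importance > cutoff]


# Hypothetical importance values; GENE_X is a placeholder that falls below the cutoff.
importance = {"SF3B4": 7.2, "CPEB3": 6.4, "GENE_X": 4.9, "FBL": 5.3}
key_genes = select_key_genes(importance)
print(key_genes)
```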

To bring the HCC-associated RBPs genes into clinical use, this example used the 10 selected key HCC-associated RBPs genes to construct an HCC prognostic score system (RBP-score). The RBP-score is calculated as follows:

RBP-score=∑(Gene_score×Gene_Weight)

Here, Gene_Weight is the Gini coefficient from the random forest model, and Gene_score is determined from the mRNA expression level of each of the 10 key HCC-associated RBPs genes and its integrated hazard ratio (Integrated HR). The Integrated HR is determined from the integrated results of Cox regression analysis of overall survival for the 10 key genes in the three cohorts GSE14520, TCGA-LIHC and ICGC-LIRI-JP. If a gene has Integrated HR>1 and mRNA expression > mean expression, or Integrated HR<1 and mRNA expression < mean expression, then Gene_score=1 for that gene; otherwise Gene_score=0.
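The scoring rule can be written out as a short sketch. The logic follows the definition of Gene_score and the weighted sum RBP-score=∑(Gene_score×Gene_Weight); the function names and every numeric value (weights, hazard ratios, mean and patient expression) are hypothetical, and only three of the ten genes are shown for brevity.

```python
def gene_score(expression, mean_expression, integrated_hr):
    """Return 1 when expression deviates from the mean in the risk direction, else 0."""
    if integrated_hr > 1 and expression > mean_expression:
        return 1
    if integrated_hr < 1 and expression < mean_expression:
        return 1
    return 0


def rbp_score(patient_expression, mean_expression, integrated_hr, gene_weight):
    """Weighted sum of the binary gene scores over all genes in gene_weight."""
    return sum(
        gene_score(patient_expression[g], mean_expression[g], integrated_hr[g]) * gene_weight[g]
        for g in gene_weight
    )


# All numbers below are hypothetical and serve only to exercise the formula.
weights = {"SF3B4": 7.0, "CPEB3": 6.0, "FBL": 5.5}
hazard_ratios = {"SF3B4": 1.5, "CPEB3": 0.7, "FBL": 1.2}
cohort_means = {"SF3B4": 8.0, "CPEB3": 5.0, "FBL": 9.0}
patient = {"SF3B4": 9.0, "CPEB3": 4.0, "FBL": 8.5}

score = rbp_score(patient, cohort_means, hazard_ratios, weights)
print(score)
```

In this hypothetical patient, SF3B4 and CPEB3 each contribute their weight and FBL contributes nothing, because its expression is below the mean while its hazard ratio is above 1.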

Next, the relationship between the RBP-score and patient prognosis was validated in the three cohorts GSE14520, TCGA-LIHC and ICGC-LIRI-JP. The RBP-score of each patient's HCC tissue was calculated with the formula above. Using the lower quartile, median and upper quartile of the RBP-score as cut-offs, patients were divided into four groups, Q1, Q2, Q3 and Q4, with RBP-score(Q1) < RBP-score(Q2) < RBP-score(Q3) < RBP-score(Q4), and the overall survival (OS) and disease-free survival (DFS) of each group were analyzed.
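The quartile grouping can be sketched with the Python standard library. The quartile interpolation method and the assignment of boundary values to the lower group are assumptions, since the source does not state them; the patient identifiers and scores are hypothetical.

```python
from statistics import quantiles


def assign_quartile_groups(scores):
    """Map each patient to Q1..Q4 using the three quartile cut points of the scores."""
    q1, q2, q3 = quantiles(list(scores.values()), n=4)
    groups = {}
    for patient_id, score in scores.items():
        if score <= q1:
            groups[patient_id] = "Q1"
        elif score <= q2:
            groups[patient_id] = "Q2"
        elif score <= q3:
            groups[patient_id] = "Q3"
        else:
            groups[patient_id] = "Q4"
    return groups


# Hypothetical RBP-scores for eight patients.
scores = {f"P{i}": float(i) for i in range(1, 9)}
groups = assign_quartile_groups(scores)
print(groups)
```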

As shown in Figures 4A, 4B and 4C, patient OS clearly tended to decrease as the RBP-score increased. As shown in Figures 4D, 4E and 4F, ROC analysis showed that the predictive accuracy of the RBP-score for 1-year, 3-year and 5-year OS was >65% in every dataset. As shown in Figures 4G and 4H, a higher RBP-score indicated poorer DFS in GSE14520 and TCGA-LIHC.
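One simple way to quantify how well a score separates patients who died within a fixed horizon from those who did not is the area under the ROC curve. The sketch below computes it by pairwise comparison. It is a simplification: it ignores censoring, whereas the source does not state which ROC method was used, and the scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels[i] is 1 if the patient died within the horizon and 0 otherwise.
    Ties between a positive and a negative count as half a concordant pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical RBP-scores and 5-year death labels for four patients.
auc = roc_auc([13.0, 7.0, 6.0, 0.0], [1, 0, 1, 0])
print(auc)
```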

Chi-square analysis of the RBP-score against other clinical characteristics of HCC patients (TCGA-LIHC and GSE14520) showed that patients with a higher RBP-score more often had AFP>300 ng/mL, advanced TNM stage (III-IV), advanced CLIP stage (>3), tumor size >5 cm, and vascular invasion.

In TCGA-LIHC and GSE14520, Cox proportional hazards analysis incorporating other clinical characteristics showed that the RBP-score was an independent risk factor for poorer overall survival in HCC patients (HR in TCGA-LIHC = 2.57, HR in GSE14520 = 1.66, P<0.05).

Example 5

This example performed a subgroup survival analysis. HCC patients were divided into subgroups according to seven clinical parameters: gender, age, TNM stage, alpha-fetoprotein (AFP) level, HBV status, HCV status and liver cirrhosis. The patients in each subgroup were further divided into a high RBP-score group and a low RBP-score group (cut-off = mean RBP-score), and the overall survival (OS) and disease-free survival (DFS) of each group were analyzed.
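The stratification can be sketched as a two-level grouping: first by a clinical parameter, then by the RBP-score cut-off. The sketch assumes the cut-off is computed within each subgroup, which the source does not specify, and it exposes the cut-off function as a parameter so that either the mean or the median can be used. The field names and patient records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def stratify_by_score(patients, clinical_key, cutoff_fn=mean):
    """Group patients by a clinical parameter, then split each group at the cutoff.

    The cutoff is computed within each subgroup (an assumption).
    """
    subgroups = defaultdict(list)
    for patient in patients:
        subgroups[patient[clinical_key]].append(patient)
    result = {}
    for level, members in subgroups.items():
        cutoff = cutoff_fn([p["rbp_score"] for p in members])
        result[level] = {
            "high": [p["id"] for p in members if p["rbp_score"] > cutoff],
            "low": [p["id"] for p in members if p["rbp_score"] <= cutoff],
        }
    return result


# Hypothetical patient records.
patients = [
    {"id": "P1", "sex": "male", "rbp_score": 10.0},
    {"id": "P2", "sex": "male", "rbp_score": 20.0},
    {"id": "P3", "sex": "male", "rbp_score": 30.0},
    {"id": "P4", "sex": "female", "rbp_score": 5.0},
    {"id": "P5", "sex": "female", "rbp_score": 15.0},
]
strata = stratify_by_score(patients, "sex")
print(strata)
```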

As shown in Figures 5A to 5G, with the median as the cut-off, the RBP-score effectively predicted OS in every TCGA-LIHC subgroup. Similar subgroup survival analyses were performed in GSE14520 and ICGC-LIRI-JP; apart from subgroups with a biased distribution, the RBP-score indicated OS in most subgroups. Notably, the RBP-score also effectively indicated OS among patients of the same clinical stage. The RBPs gene-based molecular prognostic scoring system for HCC is therefore broadly applicable.

In summary, the hepatocellular carcinoma prognosis scoring system of the present invention, built on hepatocellular carcinoma-associated RNA-binding protein genes, is a simple and powerful prognostic evaluation tool. It is applicable to different subgroups of HCC patients and works across platforms, and the identified hepatocellular carcinoma-associated RNA-binding protein genes may serve as new targets for HCC diagnosis and treatment.

The applicant declares that the above examples illustrate the detailed methods of the present invention, but the invention is not limited to these detailed methods; that is, the invention does not have to rely on them to be implemented. Those skilled in the art should understand that any improvement to the present invention, any equivalent replacement of the raw materials of its products, any addition of auxiliary ingredients, and any selection of specific modes all fall within the scope of protection and disclosure of the present invention.

Claims (6)

1. A hepatocellular carcinoma prognosis scoring system based on RNA-binding protein genes, wherein the input variables of the hepatocellular carcinoma prognosis scoring system comprise the Gini coefficient and the score of each hepatocellular carcinoma-associated RNA-binding protein gene in a dataset;
the score is determined based on the mRNA expression level and the hazard ratio of the hepatocellular carcinoma-associated RNA-binding protein gene in the dataset;
the hepatocellular carcinoma-associated RNA-binding protein genes comprise PRPF3, SLBP, CPEB3, PPARGC1A, IGF2BP3, SF3B4, ILF2, CSTF2, ACO1 and FBL;
the dataset comprises GSE14520, The Cancer Genome Atlas Liver Hepatocellular Carcinoma data, and the International Cancer Genome Consortium Japanese liver cancer data;
the output variable of the hepatocellular carcinoma prognosis scoring system is a weighted sum of the Gini coefficients and the scores of the hepatocellular carcinoma-associated RNA-binding protein genes;
the calculation formula of the hepatocellular carcinoma prognosis scoring system is as follows:
RBP-score=∑(Gene_score×Gene_Weight)
wherein RBP-score is the score in the hepatocellular carcinoma prognosis scoring system, Gene_Weight is the Gini coefficient, Gene_score is the score, and Gene_score takes a value of 0 or 1.
2. The hepatocellular carcinoma prognosis scoring system according to claim 1, wherein the hazard ratio is determined from the overall survival associated with the hepatocellular carcinoma-associated RNA-binding protein gene in the dataset, based on a univariate Cox proportional hazards model.
3. The hepatocellular carcinoma prognosis scoring system of claim 1, wherein the score is 0 or 1;
when the mRNA expression level of the hepatocellular carcinoma-associated RNA-binding protein gene is greater than or equal to the average expression level and the hazard ratio is greater than 1, or the mRNA expression level of the hepatocellular carcinoma-associated RNA-binding protein gene is less than the average expression level and the hazard ratio is less than 1, the score is 1; otherwise, the score is 0.
4. A method of screening for hepatocellular carcinoma-associated RNA-binding protein gene markers, the method comprising:
screening out RNA-binding protein genes with consistent differential expression from the Gene Expression Omnibus data of hepatocellular carcinoma cohorts to obtain initial hepatocellular carcinoma-associated RNA-binding protein genes;
the screening method further comprises the steps of:
taking The Cancer Genome Atlas Liver Hepatocellular Carcinoma data as a training set, dividing the training set samples into 5-year survival patients and 5-year non-survival patients, and establishing a random forest classification model;
classifying the initial hepatocellular carcinoma-associated RNA-binding protein genes by using the random forest classification model to obtain key hepatocellular carcinoma-associated RNA-binding protein genes;
the RNA-binding protein genes include PRPF3, SLBP, CPEB3, PPARGC1A, IGF2BP3, SF3B4, ILF2, CSTF2, ACO1, and FBL.
5. A method of preparing a hepatocellular carcinoma prognosis monitoring product, characterized in that the method employs the hepatocellular carcinoma prognosis scoring system of any one of claims 1-3.
6. The method of claim 5, wherein the hepatocellular carcinoma prognosis monitoring product comprises a hepatocellular carcinoma prognosis monitoring kit and/or a hepatocellular carcinoma prognosis monitoring medical device.
CN202110172416.1A 2021-02-08 2021-02-08 Hepatocellular carcinoma prognosis scoring system based on RNA binding protein gene and application thereof Active CN112908470B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110172416.1A CN112908470B (en) 2021-02-08 2021-02-08 Hepatocellular carcinoma prognosis scoring system based on RNA binding protein gene and application thereof

Publications (2)

Publication Number Publication Date
CN112908470A CN112908470A (en) 2021-06-04
CN112908470B true CN112908470B (en) 2023-10-03

Family

ID=76122735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110172416.1A Active CN112908470B (en) 2021-02-08 2021-02-08 Hepatocellular carcinoma prognosis scoring system based on RNA binding protein gene and application thereof

Country Status (1)

Country Link
CN (1) CN112908470B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113416786A (en) * 2021-08-09 2021-09-21 深圳市人民医院 Biomarker combination for hepatocellular carcinoma prognosis evaluation and screening method and application thereof
CN113611363B (en) * 2021-08-09 2023-11-28 上海基绪康生物科技有限公司 Method for identifying cancer driving gene by using consensus prediction result
CN114783609B (en) * 2022-05-11 2025-07-15 深圳市人民医院 A prognostic scoring system for hepatocellular carcinoma based on PUS family genes and its application
CN115920006B (en) * 2022-09-19 2023-09-05 山东大学 Application of ABCF1 or its agonist in preparation of anti-DNA virus preparation
CN115807089B (en) * 2022-11-14 2024-09-13 石河子大学 Liver cell liver cancer prognosis biomarker and application thereof
CN116844685B (en) * 2023-07-03 2024-04-12 广州默锐医药科技有限公司 Immunotherapeutic effect evaluation method, device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201320061D0 (en) * 2013-11-13 2013-12-25 Electrophoretics Ltd Materials nad methods for diagnosis and prognosis of liver cancer

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1852974A (en) * 2003-06-09 2006-10-25 密歇根大学董事会 Compositions and methods for treating and diagnosing cancer
CN101622348A (en) * 2006-12-08 2010-01-06 奥斯瑞根公司 Gene and the approach regulated as the miR-20 of targets for therapeutic intervention
CN101627121A (en) * 2006-12-08 2010-01-13 奥斯瑞根公司 As the miRNA regulatory gene and the path for the treatment of the target of intervening
CN101801419A (en) * 2007-06-08 2010-08-11 米尔纳疗法公司 MiR-34 regulated genes and pathways as targets for therapeutic intervention
CN104271033A (en) * 2012-05-03 2015-01-07 曼迪奥研究有限公司 Methods and systems of evaluating a risk of a gastrointestinal cancer
CN107922973A (en) * 2015-07-07 2018-04-17 远见基因组系统公司 Method and system for the modification detection based on sequencing
CN108603230A (en) * 2015-10-09 2018-09-28 南安普敦大学 The screening of the adjusting of gene expression and protein expression imbalance
CN106771200A (en) * 2016-11-22 2017-05-31 陈静 Application and kit of the IGF2BP3 and AFP joint marks in liver cancer diagnosis and treatment assessment kit is prepared
CN110996990A (en) * 2017-06-02 2020-04-10 亚利桑那州立大学董事会 Universal cancer vaccines and methods of making and using the same
CN111132682A (en) * 2017-07-28 2020-05-08 雷莫内克斯生物制药有限公司 Pharmaceutical composition for preventing or treating liver cancer
CN107657149A (en) * 2017-09-12 2018-02-02 中国人民解放军军事医学科学院生物医学分析中心 System for predicting liver cancer patient prognosis
CN110070915A (en) * 2017-11-10 2019-07-30 首尔大学医院 The next generation utilizes the Prognosis in Breast Cancer prediction technique and forecasting system based on machine learning of base sequence analysis
KR20190090939A (en) * 2018-01-26 2019-08-05 충남대학교산학협력단 Biomarker composition comprising human cytochrome P450 4A for diagnosis or predicting prognosis of hepatocellular carcinoma
CN108410984A (en) * 2018-02-11 2018-08-17 中山大学 RBMS3 is as tumor drug resistance detection, treatment and the application of prognosis molecule target spot
CN109593848A (en) * 2018-11-08 2019-04-09 浙江大学 A kind of tumour correlated series, long-chain non-coding RNA and its application
CN110993106A (en) * 2019-12-11 2020-04-10 深圳市华嘉生物智能科技有限公司 Liver cancer postoperative recurrence risk prediction method combining pathological image and clinical information

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
《Prognostic potential of PRPF3 in hepatocellular carcinoma》;Liu, YL et al.;《AGING-US》;20200115;第12卷(第1期);第912-930页 *
《TCGA 数据库中肝癌相关差异长链非编码RNA筛选和功能预测》;孙金旗等;《胃肠病学和肝病学杂志》;第28卷(第2期);第147-153页 *
《基于蛋白互作网络分析SNRPB 在肝癌发生中的作用》;李康智等;《基因组学与应用生物学》;20191031;第38卷(第10期);第4673-4679页 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant