Method and kit for detecting polymorphism of 21q22.3 region related to severe coronavirus pneumonia, and application of method and kit
Technical Field
The invention relates to the technical field of biology, in particular to a method and a kit for detecting polymorphism of 21q22.3 region related to severe coronavirus pneumonia and application thereof.
Background
Infection with the novel coronavirus SARS-CoV-2 can cause novel coronavirus pneumonia (Coronavirus Disease 2019, COVID-19), which is mainly characterized by fever, cough and dyspnea and seriously threatens human life and health. However, the course of disease differs markedly between infected individuals and can manifest as simple infection, mild pneumonia, common pneumonia, severe pneumonia or critical pneumonia. These differences in clinical outcome may be caused by differences between individuals in factors such as sex and age. For example, it has been reported that elderly individuals, male individuals, and individuals with poor underlying health are more likely to develop severe pneumonia after infection with SARS-CoV-2. On the other hand, studies have shown that host genetic factors also play an important role in the clinical outcome of viral infection. For example, Severe Acute Respiratory Syndrome (SARS) results from SARS-CoV infection, and individuals carrying the G-2518A variant of the chemokine ligand 2 (CCL2) gene recruit more monocytes and macrophages, which renders them more susceptible to SARS-CoV. As another example, after Hepatitis B Virus (HBV) infection, some individuals clear the virus spontaneously while others develop persistent infection. This difference is mainly attributed to the polymorphism rs4646287 in SLC10A1, the gene encoding sodium taurocholate cotransporting polypeptide (NTCP), which is the receptor of HBV: the NTCP receptor of some individuals prevents HBV from invading and infecting hepatocytes.
The discovery of susceptibility genes for infectious diseases is of great significance for their clinical treatment. For example, the genetic variant rs12979860 of the IL28B gene, which encodes interferon λ3, is significantly associated with chronic hepatitis C virus (HCV) infection. Under a treatment regimen of pegylated interferon alpha combined with ribavirin, the clearance rate of HCV (especially genotype 1 virus) in individuals carrying the IL28B susceptibility site is markedly higher than in individuals not carrying it. Therefore, the discovery of susceptibility genes related to severe COVID-19 is expected to provide a theoretical basis and research clues regarding the occurrence, disease progression, treatment response and prognosis of COVID-19.
Genome-wide association studies (GWAS) have been shown to allow unbiased and efficient discovery of SNPs associated with diseases or phenotypes. To date, over 2000 genome-wide association studies have been reported. Subsequent mechanistic research has shown that disease susceptibility genes discovered by GWAS play an important role in the occurrence and development of diseases, laying a solid foundation for the development of measures for disease prevention, diagnosis and treatment. Recently, a GWAS in Italian and Spanish populations identified 2 SNPs significantly associated with severe COVID-19 with respiratory failure: rs11385942 at 3p21.31 and rs657152 at 9q34.2. The frequency distributions of these two SNPs differ greatly between populations; in particular, rs11385942 is not polymorphic in the Chinese population.
At present, there is no research report on genetic susceptibility to severe COVID-19 in the Chinese population. SNP-based technology for predicting severe COVID-19 still needs to be developed.
Disclosure of Invention
The present invention is based on the discovery and recognition by the inventors of the following facts and problems:
In order to discover new susceptibility genes for severe COVID-19, the inventors recruited 2 populations of patients with COVID-19 (population 1: recruited from hospital A, hereinafter referred to as population A; population 2: recruited from hospital B, hereinafter referred to as population B). According to the Diagnosis and Treatment Protocol for Novel Coronavirus Pneumonia (Trial Seventh Edition), the inventors classified the patients into mild, common, severe and critical types. Mild and common-type patients were regarded as mild patients and used as controls in the present study; severe and critical-type patients were regarded as severe patients and used as cases. Thus, population A included 663 cases and 322 controls, and population B included 200 cases and 207 controls. The inventors used a case-control strategy to conduct a genome-wide association study in both populations and meta-analyzed the results of the two populations. SARS-CoV-2 is known to enter cells by binding to angiotensin I converting enzyme 2 (ACE2) and to use Transmembrane Serine Protease 2 (TMPRSS2) as an activator of the viral spike protein. Thus, the TMPRSS2 gene is a candidate susceptibility gene for COVID-19. Focusing on this gene, the inventors successfully located the 21q22.3 region, where the TMPRSS2 gene resides, as a susceptibility region for severe COVID-19. The inventors then analyzed this region in more detail.
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
According to a first aspect of the invention, the invention proposes the use of a reagent for detecting a SNP in the preparation of a kit, wherein the kit is used for screening COVID-19 patients, and the SNP is rs430915, located at position 42,864,743 of human chromosome 21 (based on human genome version hg19). The inventors used a case-control strategy to complete association analysis in two populations and finally determined that rs430915 reaches a nominally significant association (P < 0.001) with the risk of severe COVID-19, so that the reagent for detecting the SNP can be used for effectively screening COVID-19 patients.
According to some embodiments of the invention, the reagent comprises a primer and/or a probe.
According to some embodiments of the invention, a genotype of AA or AG at rs430915 indicates a high risk of severe COVID-19. By analyzing expression quantitative trait loci (eQTL), the inventors found that the risk allele A of rs430915 (genotype AA or AG) is significantly associated with high expression of the TMPRSS2 gene in lung tissue, and the protein encoded by TMPRSS2 plays a key role in mediating the entry of SARS-CoV-2 into cells. Therefore, detecting the genotype of rs430915 allows effective screening for patients at risk of severe COVID-19, and the kit has potential application value in assisting and guiding the treatment of severe COVID-19 (for example, with the TMPRSS2 inhibitor camostat mesylate).
According to a second aspect of the invention, the invention provides a kit for screening patients with COVID-19, wherein the kit contains a reagent for detecting a SNP, and the SNP is rs430915. The inventors used a case-control strategy to complete association analysis in two populations and finally determined that rs430915 is significantly associated with the risk of severe COVID-19, so that the kit can be used for effectively detecting the risk of severe COVID-19.
According to some embodiments of the invention, the reagent comprises a primer and/or a probe. Methods of using the reagents include, but are not limited to, Sequenom typing and TaqMan typing, among others.
According to some embodiments of the invention, the Sequenom primers of rs430915 are ACGTTGGATGCCAGGAAACGTGGAAATGTG (SEQ ID NO: 1) and ACGTTGGATGATCTCCAGACGGTGGTGTTC (SEQ ID NO: 2), wherein ACGTTGGATGCCAGGAAACGTGGAAATGTG is the forward primer and ACGTTGGATGATCTCCAGACGGTGGTGTTC is the reverse primer. The SNP site of the invention can be effectively detected by using the forward primer and the reverse primer.
According to some embodiments of the invention, the probe of rs430915 is (FAM) CTGGGTTTAGCCgTC (SEQ ID NO: 3) and/or (HEX) CTGGGTTTAGCCaTC (SEQ ID NO: 4). The SNP site of the present invention can be effectively detected by using the probe.
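The two probes above differ at a single allele-specific base, which is what lets the two dyes discriminate the alleles. The following is a minimal sketch, assuming the lowercase base in each probe is the reported allele (FAM for G, HEX for A) and that each dye has already been reduced to a positive/negative signal. The function names and the boolean inputs are hypothetical simplifications; real TaqMan allelic discrimination clusters fluorescence intensities rather than using simple yes/no signals.

```python
from typing import List, Optional

# Probe sequences as given in the text (SEQ ID NO: 3 and SEQ ID NO: 4).
FAM_PROBE = "CTGGGTTTAGCCgTC"  # FAM-labelled probe, assumed to report the G allele
HEX_PROBE = "CTGGGTTTAGCCaTC"  # HEX-labelled probe, assumed to report the A allele


def discriminating_positions(probe_1: str, probe_2: str) -> List[int]:
    """Return the 0-based positions at which two equal-length probes differ."""
    if len(probe_1) != len(probe_2):
        raise ValueError("probes must have the same length")
    pairs = zip(probe_1.upper(), probe_2.upper())
    return [i for i, (a, b) in enumerate(pairs) if a != b]


def call_genotype(fam_positive: bool, hex_positive: bool) -> Optional[str]:
    """Map hypothetical per-dye signals to a genotype; None means no call."""
    if fam_positive and hex_positive:
        return "AG"  # both probes bind: heterozygous
    if hex_positive:
        return "AA"  # only the A-allele probe binds
    if fam_positive:
        return "GG"  # only the G-allele probe binds
    return None      # neither probe binds: failed reaction or no template


if __name__ == "__main__":
    print(discriminating_positions(FAM_PROBE, HEX_PROBE))
    print(call_genotype(fam_positive=True, hex_positive=True))
```

The position check is a quick sanity test that the probe pair differs only at the SNP base.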
According to some embodiments of the invention, a genotype of AA or AG at rs430915 indicates a high risk of severe COVID-19. According to embodiments of the invention, the primer set of the invention can be used to effectively PCR-amplify the fragment containing the SNP marker associated with the risk of severe COVID-19 in the individual to be tested, so that the SNP marker can be detected by sequencing, the genotype of the individual at the SNP marker site can be determined, and whether the individual is at high risk of severe COVID-19 can be effectively predicted. In particular, the frequency of the rs430915 risk allele A is significantly higher in severe COVID-19 patients than in mild patients. By analyzing eQTL, the inventors found that the risk allele A of rs430915 (genotype AA or AG) is significantly associated with high expression of the TMPRSS2 gene in lung tissue, and the protein encoded by TMPRSS2 plays a key role in mediating the entry of SARS-CoV-2 into cells. Thus, the primer set for detecting the SNP marker of the present invention can be effectively used for screening populations susceptible to severe COVID-19, and can assist in the early prediction of individuals at risk of severe COVID-19 in a short time, at low cost and with high accuracy, thereby providing a theoretical basis for clinical care, prognosis evaluation and the like.
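The genotype rule stated above (AA or AG indicates high risk) amounts to checking whether the risk allele A is present. A minimal sketch follows, assuming genotypes are supplied as two-letter strings over the alleles A and G; the function name and the input format are illustrative assumptions, not part of the disclosed kit.

```python
RISK_ALLELE = "A"
VALID_ALLELES = {"A", "G"}  # alleles of rs430915 as described in the text


def is_high_risk(genotype: str) -> bool:
    """Return True if the rs430915 genotype carries at least one A allele."""
    alleles = genotype.strip().upper()
    if len(alleles) != 2 or not set(alleles) <= VALID_ALLELES:
        raise ValueError(f"unexpected rs430915 genotype: {genotype!r}")
    return RISK_ALLELE in alleles


if __name__ == "__main__":
    for g in ("AA", "AG", "GA", "GG"):
        print(g, is_high_risk(g))
```

Allele order is ignored, so "AG" and "GA" are treated as the same heterozygous genotype.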
According to a third aspect of the present invention, there is provided an apparatus for screening for patients with severe COVID-19, comprising:
a sequencing device for sequencing at least a part of the patient's whole genome to obtain a sequencing result, wherein the at least a part of the whole genome is a region comprising rs430915;
an alignment device connected to the sequencing device and configured to determine the genotype of rs430915 based on the sequencing result; and
an analysis device connected to the alignment device and configured to determine the risk of severe COVID-19 based on the genotype. The inventors used a case-control strategy to complete a genome-wide association analysis in two populations and finally determined that rs430915 is significantly associated with the risk of severe COVID-19, so that the risk of severe COVID-19 can be effectively detected by using the apparatus.
According to a specific embodiment of the present invention, the sequencing device is used for sequencing a predetermined region in the whole genome of an individual so as to obtain a sequencing result, wherein the predetermined region consists of rs430915 together with the 1 Kb upstream and the 1 Kb downstream of it. Thus, sequencing can be performed more efficiently.
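The predetermined region can be written as a coordinate interval around the SNP. The sketch below assumes that 1 Kb means 1,000 bp, that coordinates are 1-based and inclusive, and that the chromosome is named "chr21"; the position is the hg19 coordinate given earlier in the text. The type and function names are hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


RS430915_CHROM = "chr21"        # naming convention is an assumption
RS430915_POS_HG19 = 42_864_743  # position stated in the text (hg19)


def flanking_region(chrom: str, pos: int, flank_bp: int = 1000) -> Region:
    """Return the interval covering pos plus flank_bp on each side."""
    if pos < 1 or flank_bp < 0:
        raise ValueError("pos must be >= 1 and flank_bp must be >= 0")
    return Region(chrom, max(1, pos - flank_bp), pos + flank_bp)


def contains(region: Region, chrom: str, pos: int) -> bool:
    """Check whether a position falls inside the region."""
    return chrom == region.chrom and region.start <= pos <= region.end


if __name__ == "__main__":
    target = flanking_region(RS430915_CHROM, RS430915_POS_HG19)
    print(target)
```

Under these assumptions the target spans 2,001 bp, which could be used, for example, to restrict which reads or variant calls are passed to the analysis step.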
Drawings
FIG. 1: Regional association plot of the chromosomal region around rs430915, based on the meta-analysis of the association results of population A and population B in Example 1 of the present invention.
The regional plot shows the region 50 kilobases (Kb) upstream and downstream of the TMPRSS2 gene in the 21q22.3 region. The P values are the results of the meta-analysis of the association results of population A and population B. Genomic positions are based on the human reference genome (hg19 version). The P value of rs430915 is shown in purple. The linkage disequilibrium (LD) value (r²) of the other SNPs with rs430915 is indicated by the marker color: red indicates r² ≥ 0.8, orange indicates 0.6 ≤ r² < 0.8, green indicates 0.4 ≤ r² < 0.6, light blue indicates 0.2 ≤ r² < 0.4, and blue indicates r² < 0.2. The estimated recombination rate of the 1000 Genomes Project reference population (November 2014 release) is shown in light blue against the vertical axis.
FIG. 2: an eQTL correlation plot of rs430915 and TMPRSS2 gene expression in lung tissue was drawn using GTEx online eQTL calculation and visualization tools.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments described below with reference to the drawings are exemplary, are intended to illustrate the invention, and are not to be construed as limiting the invention.
Examples
Example 1 Genome-wide association study of severe COVID-19
1. Materials and methods
1.1 Study subjects
The genome-wide association study included 1,431 hospitalized COVID-19 patients. These patients were recruited from two hospitals: (1) the first population was recruited from hospital A between February 4, 2020 and March 23, 2020 (n = 1,010; defined as population A); (2) the second population was recruited from hospital B between January 15, 2020 and March 30, 2020 (n = 421; defined as population B). Novel coronavirus pneumonia was diagnosed as follows: nasopharyngeal swab samples collected from the patients in the two hospitals were analyzed by reverse transcription-polymerase chain reaction, and the diagnosis was confirmed by a positive result for novel coronavirus nucleic acid. The patients were clinically classified according to the Diagnosis and Treatment Protocol for Novel Coronavirus Pneumonia (Trial Seventh Edition), briefly as follows:
(I) Mild type. The clinical symptoms are mild, and no signs of pneumonia are seen on imaging.
(II) Common type. Fever, respiratory tract symptoms and the like are present, and pneumonia is visible on imaging.
(III) Severe type. Any one of the following is met: 1. respiratory distress, with a respiratory rate (RR) ≥ 30 breaths/minute; 2. oxygen saturation ≤ 93% in the resting state; 3. arterial partial pressure of oxygen (PaO2)/fraction of inspired oxygen (FiO2) ≤ 300 mmHg (1 mmHg = 0.133 kPa).
(IV) Critical type. Any one of the following is met: 1. respiratory failure requiring mechanical ventilation; 2. shock; 3. failure of other organs requiring monitoring and treatment in an intensive care unit.
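The clinical typing criteria (I) to (IV) above, together with the case/control split used in this study (types III and IV as cases, types I and II as controls), can be sketched as a small rule-based function. This is an illustration under simplifying assumptions: the record fields are hypothetical, FiO2 is taken as a fraction between 0.21 and 1.0, missing measurements are treated as not meeting a criterion, and types I and II are separated only by the imaging finding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClinicalRecord:
    """Hypothetical per-patient record; all field names are assumptions."""
    pneumonia_on_imaging: bool
    respiratory_rate: Optional[float] = None  # breaths per minute
    resting_spo2: Optional[float] = None      # oxygen saturation at rest, percent
    pao2_mmhg: Optional[float] = None         # arterial partial pressure of oxygen
    fio2_fraction: Optional[float] = None     # fraction of inspired oxygen, 0.21 to 1.0
    mechanical_ventilation: bool = False      # respiratory failure requiring ventilation
    shock: bool = False
    organ_failure_needing_icu: bool = False


def clinical_type(r: ClinicalRecord) -> str:
    """Return 'mild', 'common', 'severe' or 'critical' for one record."""
    if r.mechanical_ventilation or r.shock or r.organ_failure_needing_icu:
        return "critical"
    rr_met = r.respiratory_rate is not None and r.respiratory_rate >= 30
    spo2_met = r.resting_spo2 is not None and r.resting_spo2 <= 93
    ratio_met = (
        r.pao2_mmhg is not None
        and r.fio2_fraction is not None
        and r.fio2_fraction > 0
        and r.pao2_mmhg / r.fio2_fraction <= 300
    )
    if rr_met or spo2_met or ratio_met:
        return "severe"
    return "common" if r.pneumonia_on_imaging else "mild"


def is_case(r: ClinicalRecord) -> bool:
    """Severe and critical types are cases; mild and common types are controls."""
    return clinical_type(r) in {"severe", "critical"}


if __name__ == "__main__":
    example = ClinicalRecord(pneumonia_on_imaging=True, resting_spo2=91.0)
    print(clinical_type(example), is_case(example))
```

Checking the critical criteria first ensures that a patient who meets both severe and critical criteria is assigned the more serious type.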
In this study, COVID-19 patients of the severe and critical types were defined as "cases", and COVID-19 patients of the mild and common types were defined as "controls". Thus, population A included 679 cases and 331 controls, and population B included 206 cases and 215 controls. After quality control, 663 cases and 322 controls of population A were retained for subsequent association analysis; the average age of the cases was 64.2 ± 13.7 years with a male-to-female ratio of 1.1, and the average age of the controls was 56.6 ± 13.9 years with a male-to-female ratio of 1.1. Population B retained 200 cases and 207 controls for subsequent association analysis; the average age of the cases was 61.8 ± 14.0 years with a male-to-female ratio of 1.2, and the average age of the controls was 52.2 ± 16.2 years with a male-to-female ratio of 0.9. Since the samples in this study were collected from hospitalized patients, asymptomatic infected individuals and the vast majority of mild patients who did not require hospitalization were excluded. Thus, the percentage of severe/critical patients in this study (~60%) was much higher than the percentage in the overall infected population (~15%). Clinical information of the patients in the two populations came from electronic medical records, from which the inventors extracted information such as age, sex, comorbidities (including hypertension, diabetes, coronary artery disease, chronic hepatitis B, chronic obstructive pulmonary disease, chronic kidney disease and cancer) and symptoms (fever, cough, headache, fatigue, pharyngalgia and dyspnea). The population information for the genome-wide association study samples can be found in Table 1.
1.2 Association study
Case-control association analysis was performed using the logistic regression model in SNPTEST software (version 2.5.4), with adjustment for age, sex and comorbidities (hypertension, type II diabetes and coronary heart disease).
1.3 statistical test
Meta-analysis was performed using the fixed-effect model in META software (version 1.7) to combine the odds ratios and 95% confidence intervals from the association analyses of the two populations. In the meta-analysis, P = 5 × 10⁻⁸ was taken as the genome-wide significance threshold and P = 0.001 as the nominal significance threshold. The inventors calculated the Q statistic to test for heterogeneity between the populations, which was considered statistically significant when P < 0.05.
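The fixed-effect combination and the Q statistic described above can be illustrated with a generic inverse-variance calculation. This is a sketch of the standard textbook method, not the implementation used by the META software, and the input values in the usage example are hypothetical rather than taken from the study. The heterogeneity P value is computed only for two populations (one degree of freedom), which matches the design here and avoids external dependencies.

```python
import math
from typing import Dict, List, Optional, Tuple


def fixed_effect_meta(estimates: List[Tuple[float, float]]) -> Dict[str, Optional[float]]:
    """Inverse-variance fixed-effect meta-analysis.

    estimates: one (log odds ratio, standard error) pair per population.
    """
    if not estimates or any(se <= 0 for _, se in estimates):
        raise ValueError("need at least one estimate with a positive standard error")
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, (b, _) in zip(weights, estimates))
    # With two populations Q follows chi-square with 1 degree of freedom,
    # whose upper tail probability is erfc(sqrt(Q / 2)).
    p_het = math.erfc(math.sqrt(q / 2)) if len(estimates) == 2 else None
    return {
        "or": math.exp(beta),
        "ci_low": math.exp(beta - 1.959964 * se),
        "ci_high": math.exp(beta + 1.959964 * se),
        "p": p,
        "q": q,
        "p_het": p_het,
    }


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    result = fixed_effect_meta([(0.35, 0.10), (0.20, 0.15)])
    for key, value in result.items():
        print(key, value)
```

Because the weights are the inverse variances, the larger population dominates the pooled odds ratio, which is why a non-significant result in the smaller population need not prevent a significant combined result.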
2. Results
2.1 Whole genome SNP data quality control results
To find a new susceptibility region for severe COVID-19, the inventors used the Affymetrix World chip (Affymetrix World Array) to genotype 770,570 SNPs in the 1,431 COVID-19 patients of population A and population B. Because both populations were genotyped on the same platform, the inventors performed unified quality control on the genotyping data of the two populations. After strict quality control (Table 2), 663 cases and 322 controls were retained for population A, and 200 cases and 207 controls were retained for population B. Both populations finally retained 558,642 SNPs, with average call rates of 98.6% and 98.9%, respectively.
TABLE 2 quality control Process for group A and group B
(a) Quality control of A and B population samples
(b) SNP quality control process
2.2 Association analysis results
Examining the region 50 Kb upstream and downstream of the TMPRSS2 gene, the inventors identified that the 21q22.3 region was significantly associated with severe COVID-19 (exceeding the nominal significance threshold of P = 0.001): the tag SNP is rs430915, whose A allele had an odds ratio (OR) of 1.39 (95% confidence interval, 1.18-1.64; P = 1.18 × 10⁻⁴; Table 3). In population A, the P value of rs430915 reached the threshold for significant association (P = 1.27 × 10⁻⁴; Table 3). In population B, the A allele was also associated with severe disease, but the P value did not reach significance (P = 0.20; Table 3), possibly because of the sample size of this population. No heterogeneity in the OR of rs430915 was observed between the two populations (Pheterogeneity = 0.70 and 0.07; Table 3).
In conclusion, this study identified 21q22.3 as a new susceptibility region for severe COVID-19.
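As a plausibility check on the association result reported above, the standard error and P value can be recovered approximately from the odds ratio and its 95% confidence interval. The sketch assumes a Wald-type interval on the log odds ratio scale and a two-sided normal test, which is a common convention but is not confirmed by the text. With the rounded values of 1.39 and 1.18-1.64 it yields a P value on the order of 10⁻⁴, consistent in magnitude with the reported value; exact agreement is not expected because the published figures are rounded.

```python
import math
from typing import Tuple


def p_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float,
                 z_crit: float = 1.959964) -> Tuple[float, float, float]:
    """Recover (standard error, z, two-sided P) from an OR and its 95% CI.

    Assumes the interval is exp(log(OR) +/- z_crit * SE).
    """
    if not 0 < ci_low < ci_high:
        raise ValueError("confidence limits must satisfy 0 < ci_low < ci_high")
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


if __name__ == "__main__":
    # Meta-analysis values for the rs430915 A allele as stated in the text.
    se, z, p = p_from_or_ci(1.39, 1.18, 1.64)
    print(f"SE = {se:.4f}, z = {z:.2f}, P = {p:.2e}")
```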
Example 2 Mapping of the susceptibility gene in the rs430915 region
rs430915 is located in an intron of the Transmembrane Serine Protease 2 (TMPRSS2) gene in the 21q22.3 region (FIG. 1). According to eQTL data from the Genotype-Tissue Expression (GTEx) database, the risk allele A of rs430915 is significantly associated with high expression of the TMPRSS2 gene in lung tissue (P = 6.7 × 10⁻⁸; Table 4 and FIG. 2). This eQTL signal, a significant association of the rs430915 risk allele A with high expression of the TMPRSS2 gene in lung tissue, was further verified in the QTLbase database (most significant P value = 2.8 × 10⁻¹³; Table 4).
The most likely candidate susceptibility gene in the 21q22.3 region is TMPRSS2. The gene encodes a transmembrane serine protease. The TMPRSS2 protein comprises a type II transmembrane domain, a receptor class A domain, a scavenger receptor cysteine-rich domain, and a protease domain. According to the GTEx database, the TMPRSS2 gene is mainly expressed in prostate, stomach, colon, pancreas and lung tissues. Transmembrane serine proteases are involved in a variety of physiological and pathological processes and play important roles in mediating viral infection in particular. TMPRSS2 can cleave and activate viral envelope glycoproteins through proteolysis, thereby facilitating viral entry into the host cell. Viruses known to enter cells with the help of the TMPRSS2 protein include influenza viruses and human coronaviruses (e.g., HCoV-229E, MERS-CoV, SARS-CoV, and SARS-CoV-2). Experiments have shown that SARS-CoV-2 can enter human cells after its spike protein is activated through proteolytic cleavage by TMPRSS2; the entry of SARS-CoV-2 into cells can be partially blocked by the TMPRSS2 inhibitor camostat mesylate alone, while the combination of the TMPRSS2 inhibitor camostat mesylate and the CtsB/L inhibitor E64-d can completely block the entry of SARS-CoV-2. The above evidence suggests that SNPs of the TMPRSS2 gene may also contribute to the severity of COVID-19.
Taken together, the candidate susceptibility gene in the 21q22.3 region may be TMPRSS2. The discovery of the invention can provide a theoretical basis for further exploring the mechanism of occurrence and development, clinical treatment, prognosis evaluation measures and the like of novel coronavirus pneumonia, and has potential application value.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such references do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, various embodiments or examples described in this specification, and features of different embodiments or examples, can be combined by one skilled in the art provided that they do not contradict one another.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.