CN114858906B - Kit for diagnosing novel coronavirus infection - Google Patents

Info

Publication number
CN114858906B
Authority
CN
China
Prior art keywords
polypeptide
characteristic
mass
ala
leu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110157110.9A
Other languages
Chinese (zh)
Other versions
CN114858906A (en
Inventor
廖璞
孙巍
乔亮
吕倩
马庆伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Clin Bochuang Biotechnology Co Ltd
Original Assignee
Beijing Clin Bochuang Biotechnology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Clin Bochuang Biotechnology Co Ltd
Priority to CN202110157110.9A
Priority to PCT/CN2021/142779 (WO2022166485A1)
Publication of CN114858906A
Application granted
Publication of CN114858906B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
    • G01N27/626 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode using heat to ionise a gas
    • G01N27/628 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode using heat to ionise a gas and a beam of energy, e.g. laser enhanced ionisation
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
    • G01N27/626 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode using heat to ionise a gas
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/62 Detectors specially adapted therefor
    • G01N30/72 Mass spectrometers
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/88 Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/569 Immunoassay; Biospecific binding assay; Materials therefor for microorganisms, e.g. protozoa, bacteria, viruses
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6848 Methods of protein analysis involving mass spectrometry
    • G01N33/6851 Methods of protein analysis involving laser desorption ionisation mass spectrometry
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/88 Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86
    • G01N2030/8809 Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample
    • G01N2030/8813 Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample biological materials
    • G01N2030/8822 Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample biological materials involving blood
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/88 Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86
    • G01N2030/8809 Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample
    • G01N2030/8813 Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample biological materials
    • G01N2030/8831 Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample biological materials involving peptides or proteins
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2333/00 Assays involving biological materials from specific organisms or of a specific nature
    • G01N2333/005 Assays involving biological materials from specific organisms or of a specific nature from viruses
    • G01N2333/08 RNA viruses
    • G01N2333/165 Coronaviridae, e.g. avian infectious bronchitis virus

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Immunology (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Urology & Nephrology (AREA)
  • Hematology (AREA)
  • Cell Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Food Science & Technology (AREA)
  • Microbiology (AREA)
  • Biotechnology (AREA)
  • Medicinal Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Electrochemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Optics & Photonics (AREA)
  • Tropical Medicine & Parasitology (AREA)
  • Virology (AREA)
  • Biophysics (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Peptides Or Proteins (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention provides a characteristic polypeptide composition for detecting novel coronavirus infection. The composition comprises 29 characteristic polypeptides with specific mass-to-charge ratios, and analysing the expression of these polypeptides in a sample indicates whether the sample comes from a patient infected with the novel coronavirus. The invention also provides a mass spectrometry model prepared from the characteristic polypeptide composition, products for diagnosing novel coronavirus infection, and related applications. For the first time, the invention provides several characteristic protein combinations that differ between novel coronavirus patients and normal persons, tuberculosis patients, and controls with symptoms resembling novel coronavirus infection. This departs from the conventional approach of searching for characteristic polypeptides only between normal persons and novel coronavirus patients, and effectively avoids false positive results caused by conditions with similar symptoms. The method is simple to operate, inexpensive and accurate, and is expected to be suitable for large-scale screening for novel coronavirus infection.

Description

Kit for diagnosing novel coronavirus infection
Technical Field
The invention belongs to the field of detection and relates to the rapid detection of novel coronavirus infection by time-of-flight mass spectrometry.
Background
Coronaviruses are a class of pathogens that primarily cause respiratory and intestinal diseases. The surface of the virus particle carries many regularly arranged protrusions, so that the whole particle resembles an imperial crown, hence the name "coronavirus". In addition to humans, coronaviruses can infect a variety of mammals, such as pigs, cattle, cats, dogs, minks, camels, bats, mice and hedgehogs, as well as a variety of birds.
Six types of human coronavirus are known to date. Four of them are common in the population and weakly pathogenic, generally causing only mild respiratory symptoms similar to the common cold. The other two, severe acute respiratory syndrome coronavirus and Middle East respiratory syndrome coronavirus (SARS coronavirus and MERS coronavirus for short), can cause severe respiratory disease.
The novel coronavirus COVID-19 is a coronavirus strain that had never before been found in humans. Its transmission pattern, infection mechanism, and evolution and mutation patterns are still unclear, which makes it difficult to control.
To prevent the occurrence and spread of novel coronavirus (COVID-19) infection and to take prompt measures that effectively control the development of the epidemic, rapid detection of novel coronavirus infection is particularly important. For a long time, coronaviruses have been identified by traditional microbiological methods, namely morphological, physiological and biochemical characterisation and serological identification. These methods are accurate but slow, taking tens of hours at best, so they cannot meet the need for rapid detection. Nucleic acid detection based on multiplex PCR is of great significance for the early diagnosis of coronaviruses and the identification of infection sources.
Matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS for short) is a mass spectrometry technique that emerged at the end of the 1980s and has developed rapidly since. Its mass analyzer is an ion drift tube. Ions generated by the ion source are first collected, so that all ions in the collector have zero velocity; they are then accelerated by a pulsed electric field, enter the field-free drift tube and fly to the ion receiver at constant velocity. The larger the mass of an ion, the longer it takes to reach the receiver; the smaller the mass, the shorter the time. On this principle, ions of different masses are separated according to their mass-to-charge ratio, and the molecular mass and purity of biological macromolecules such as polypeptides, proteins, nucleic acids and polysaccharides can be measured accurately. The technique offers high accuracy, high flexibility, high throughput, a short detection cycle and good cost performance.
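The relationship between ion mass and arrival time described above can be illustrated with a short calculation. The sketch below assumes an idealised linear instrument in which an ion of charge z gains kinetic energy z·e·U in the acceleration region and then drifts a length L at constant velocity, so that t = L·sqrt(m / (2·z·e·U)). The accelerating voltage (20 kV) and drift length (1 m) are illustrative assumptions, not values taken from the patent.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def flight_time(mz, charge=1, accel_voltage=20_000.0, drift_length=1.0):
    """Return the field-free drift time in seconds for an ion.

    mz            -- mass-to-charge ratio (daltons per elementary charge)
    charge        -- number of elementary charges on the ion
    accel_voltage -- accelerating potential in volts (assumed value)
    drift_length  -- drift tube length in metres (assumed value)
    """
    mass_kg = mz * charge * DALTON
    kinetic_energy = charge * ELEMENTARY_CHARGE * accel_voltage
    velocity = math.sqrt(2.0 * kinetic_energy / mass_kg)
    return drift_length / velocity


if __name__ == "__main__":
    # Three of the mass-to-charge ratios named in the description.
    for mz in (5158, 8986, 28091):
        print(f"{mz:>6} m/z -> {flight_time(mz) * 1e6:.2f} microseconds")
```

Because the charge cancels out of the ratio of mass to kinetic energy, the drift time depends only on the mass-to-charge ratio and grows with its square root, which is why heavier ions reach the receiver later.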
In recent years, mass spectrometry techniques have emerged for detecting proteins or polypeptides characteristic of pathogenic microorganisms or viruses. For example, Chinese patent application CN102337223A, "Penicillium chrysogenum antifungal protein Pc-Arctin and its preparation method", discloses a MALDI-TOF identification method for the Penicillium chrysogenum antifungal protein Pc-Arctin. Spores of Penicillium chrysogenum A096 are picked from a plate and inoculated into SGY liquid medium for culture; the crude protein solution obtained by pretreatment is separated and purified on a chromatographic column and then on a carboxymethyl cation exchange column; the eluted fractions are collected and each is concentrated to the required volume by centrifugal ultrafiltration; Paecilomyces variotii is used as the sensitive indicator strain to track the antifungal active fractions; the purity of the protein in the active fractions is assessed; and a single band is excised from the SDS-PAGE gel and identified by MALDI-TOF. This method applies only to a specific microorganism and requires several protein purification steps before the characteristic polypeptide Pc-Arctin is finally identified by MALDI-TOF. The process is complex and narrow in scope, and it does not achieve the detection of viruses by mass spectrometry.
Chinese patent applications 201110154723, "Method for MALDI TOF MS assisted identification of Listeria monocytogenes", and 201110154469, "Method for MALDI TOF MS assisted identification of Vibrio cholerae", disclose a method for the assisted identification of bacteria by MALDI TOF MS. The method comprises pretreating bacterial cultures, collecting MALDI TOF MS (matrix-assisted laser desorption ionization time-of-flight mass spectrometry) spectra of all strain samples, preparing a standard bacterial spectrum with software, acquiring the spectrum of the bacteria under test in the same way, comparing the spectra, and judging by matching score. Because the method uses conventional sample treatment (absolute ethanol, formic acid and acetonitrile, assisted by centrifugation, with the supernatant finally drawn off for detection), it can characterise the spectrum of the bacteria to some extent, but the detected material contains proteins, lipids, lipopolysaccharides and lipo-oligosaccharides, DNA, polypeptides and other ionisable molecules, so the resulting spectrum is essentially a superposition of the spectra of all these molecules. The amount of spectral information to be processed and compared is therefore too large, and the spectrum is poorly specific because too many kinds of molecule are detected. The method suits only a specific bacterium and cannot be extended to the detection of viruses.
Chinese patent application 200880121570, entitled "Methods and biomarkers for diagnosing and monitoring mental disorders", reports that nearly one hundred neuropeptides associated with mental disorders, including influenza virus, can be detected by MALDI-TOF mass spectrometry. However, that document only briefly summarises various possible techniques; it reports neither a specific protocol nor a specific target for coronaviruses, and so it can hardly teach researchers how to detect influenza viruses by MALDI-TOF mass spectrometry.
Thus, there is a need for a novel characteristic polypeptide mass spectrometry model for detecting coronavirus infection by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS) and use thereof.
Disclosure of Invention
The first object of the present invention is to provide a characteristic polypeptide composition, based on the serum peptidome, with which the novel coronavirus (COVID-19) can be detected by MALDI-TOF mass spectrometry. The characteristic polypeptide composition comprises 25 characteristic polypeptides with mass-to-charge ratios (m/z) of 5158, 5366, 5893, 6357, 6654, 6939, 7364, 7614, 8034, 8043, 8226, 8425, 8560, 8986, 9626, 13719, 13765, 13886, 14049, 14095, 14102, 15123, 15867, 28091 and 28232, or comprises 29 characteristic polypeptides with mass-to-charge ratios (m/z) of 5158, 5366, 5893, 6357, 6654, 6939, 7364, 7614, 8034, 8043, 8226, 8425, 8560, 8986, 9626, 11435, 11495, 11523, 11680, 13719, 13765, 13886, 14049, 14095, 14102, 15123, 15867, 28091 and 28232.
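As an illustration of how such a panel might be handled in software, the following sketch stores the two panels as tuples and assigns observed spectrum peaks to the nominal panel values. The matching tolerance of 200 ppm and the function name `match_peaks` are hypothetical choices made for this example; the patent does not specify a peak-matching tolerance or procedure.

```python
# Nominal m/z values of the 25-peptide panel listed in the description.
PANEL_25 = (
    5158, 5366, 5893, 6357, 6654, 6939, 7364, 7614, 8034, 8043,
    8226, 8425, 8560, 8986, 9626, 13719, 13765, 13886, 14049, 14095,
    14102, 15123, 15867, 28091, 28232,
)
# The four additional peaks that extend the panel to 29 peptides.
EXTRA_4 = (11435, 11495, 11523, 11680)
PANEL_29 = tuple(sorted(PANEL_25 + EXTRA_4))


def match_peaks(observed_mz, panel=PANEL_29, tolerance_ppm=200.0):
    """Map each panel m/z to the nearest observed peak within tolerance.

    tolerance_ppm is an assumed value for illustration only. Panel
    entries without an observed peak inside the window map to None.
    """
    matches = {}
    for target in panel:
        window = target * tolerance_ppm / 1e6
        candidates = [mz for mz in observed_mz if abs(mz - target) <= window]
        if candidates:
            matches[target] = min(candidates, key=lambda mz: abs(mz - target))
        else:
            matches[target] = None
    return matches


if __name__ == "__main__":
    # Made-up peak list, not measured data.
    observed = [5158.4, 8986.9, 14095.5, 14101.2, 20000.0]
    for target, peak in match_peaks(observed).items():
        if peak is not None:
            print(target, "->", peak)
```

A fairly narrow window is used because some panel members lie close together (for example 14095 and 14102, or 8034 and 8043), so a wide tolerance could assign one observed peak to two panel entries.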
In one embodiment, the characteristic polypeptide composition comprises 19 characteristic polypeptides having the following mass-to-charge ratios and polypeptide sequences:
a characteristic polypeptide with a mass-to-charge ratio of 6939 m/z, whose polypeptide sequence is shown in SEQ ID No. 1;
a characteristic polypeptide with a mass-to-charge ratio of 7614 m/z, whose polypeptide sequence is shown in SEQ ID No. 2;
a characteristic polypeptide with a mass-to-charge ratio of 8034 m/z, whose polypeptide sequence is shown in SEQ ID No. 3;
a characteristic polypeptide with a mass-to-charge ratio of 8226 m/z, whose polypeptide sequence is shown in SEQ ID No. 4;
a characteristic polypeptide with a mass-to-charge ratio of 8986 m/z, whose polypeptide sequence is shown in SEQ ID No. 5;
a characteristic polypeptide with a mass-to-charge ratio of 9626 m/z, whose polypeptide sequence is shown in SEQ ID No. 6;
a characteristic polypeptide with a mass-to-charge ratio of 13719 m/z, whose polypeptide sequence is shown in SEQ ID No. 7;
a characteristic polypeptide with a mass-to-charge ratio of 13765 m/z, whose polypeptide sequence is shown in SEQ ID No. 8;
a characteristic polypeptide with a mass-to-charge ratio of 13886 m/z, whose polypeptide sequence is shown in SEQ ID No. 9;
a characteristic polypeptide with a mass-to-charge ratio of 14049 m/z, whose polypeptide sequence is shown in SEQ ID No. 10;
a characteristic polypeptide with a mass-to-charge ratio of 14095 m/z, whose polypeptide sequence is shown in SEQ ID No. 11;
a characteristic polypeptide with a mass-to-charge ratio of 14102 m/z, whose polypeptide sequence is shown in SEQ ID No. 12;
a characteristic polypeptide with a mass-to-charge ratio of 15123 m/z, whose polypeptide sequence is shown in SEQ ID No. 13;
a characteristic polypeptide with a mass-to-charge ratio of 15867 m/z, whose polypeptide sequence is shown in SEQ ID No. 14;
a characteristic polypeptide with a mass-to-charge ratio of 28091 m/z, whose polypeptide sequence is shown in SEQ ID No. 15;
a characteristic polypeptide with a mass-to-charge ratio of 11435 m/z, whose polypeptide sequence is shown in SEQ ID No. 16;
a characteristic polypeptide with a mass-to-charge ratio of 11495 m/z, whose polypeptide sequence is shown in SEQ ID No. 17;
a characteristic polypeptide with a mass-to-charge ratio of 11523 m/z, whose polypeptide sequence is shown in SEQ ID No. 18;
a characteristic polypeptide with a mass-to-charge ratio of 11680 m/z, whose polypeptide sequence is shown in SEQ ID No. 19.
In any of the above embodiments, when the peaks of the characteristic polypeptides with m/z 8986 and 28091 are up-regulated while the peaks of the characteristic polypeptides with m/z 6939, 13886, 14049 and 14102 are down-regulated, the serum sample is indicated to be a positive sample, i.e. the patient is determined to be infected with the novel coronavirus; the ten-fold cross-validation accuracy is about 91%. In a preferred embodiment, the characteristic polypeptide composition comprises only the characteristic polypeptides with m/z 8986, 28091, 6939, 13886, 14049 and 14102.
In another embodiment, when the peaks of the characteristic polypeptides with m/z 7614, 8034, 8226, 8986, 9626, 11435, 11495, 11523, 11680, 15123, 15867 and 28091 are up-regulated while the peaks of the characteristic polypeptides with m/z 6939, 13719, 13765, 13886, 14049, 14095 and 14102 are down-regulated, the serum sample is indicated to be a positive sample, i.e. the patient is infected with the novel coronavirus; the ten-fold cross-validation accuracy is about 93.31%. In a preferred embodiment, the characteristic polypeptide composition comprises only the characteristic polypeptides with m/z 7614, 8034, 8226, 8986, 9626, 11435, 11495, 11523, 11680, 15123, 15867 and 28091, and 6939, 13719, 13765, 13886, 14049, 14095 and 14102.
In other embodiments, when the peaks of the characteristic polypeptides with m/z 5158, 5366, 5893, 7364, 7614, 8034, 8043, 8226, 8425, 8560, 8986, 9626, 11435, 11495, 11523, 11680, 15123, 15867 and 28091 are up-regulated while the peaks of the characteristic polypeptides with m/z 6357, 6654, 6939, 13719, 13765, 13886, 14049, 14095, 14102 and 28232 are down-regulated, the serum sample is indicated to be a positive sample, i.e. the patient is infected with the novel coronavirus; the ten-fold cross-validation accuracy is about 98.69%.
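The up-regulation and down-regulation pattern described in these embodiments can be read as a simple decision rule. The sketch below is only one literal interpretation for the six-peak embodiment: it compares peak intensities in a sample against reference intensities and requires every "up" peak to rise and every "down" peak to fall by a chosen factor. The fold threshold of 1.5, the reference intensities and the function name `is_positive` are assumptions made for illustration; the patent does not state thresholds here, and its actual mass spectrometry model may combine the peaks differently.

```python
# Six-peak embodiment: peaks expected to rise or fall in positive samples.
UP_6 = frozenset({8986, 28091})
DOWN_6 = frozenset({6939, 13886, 14049, 14102})


def is_positive(sample, reference, up=UP_6, down=DOWN_6, fold=1.5):
    """Apply a literal up/down rule to peak intensities.

    sample, reference -- dicts mapping nominal m/z to peak intensity
    fold              -- assumed minimum fold change (hypothetical value)

    Returns True only if every 'up' peak is at least `fold` times the
    reference and every 'down' peak is at most 1/`fold` of the reference.
    """
    ups_ok = all(sample[mz] >= reference[mz] * fold for mz in up)
    downs_ok = all(sample[mz] <= reference[mz] / fold for mz in down)
    return ups_ok and downs_ok


if __name__ == "__main__":
    # Made-up intensities in arbitrary units, not measured data.
    reference = {mz: 100.0 for mz in UP_6 | DOWN_6}
    sample = {8986: 220.0, 28091: 180.0,
              6939: 40.0, 13886: 50.0, 14049: 30.0, 14102: 60.0}
    print(is_positive(sample, reference))
```

The larger 19-peak and 29-peak embodiments could be expressed the same way by passing different `up` and `down` sets.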
It is a second object of the present invention to provide a mass spectrometry model for detecting novel coronavirus infection, prepared from a characteristic polypeptide composition having the mass-to-charge ratio peaks of any of the above schemes.
In one embodiment, the mass spectrometry model is prepared from the characteristic polypeptides with m/z 5158, 5366, 5893, 6357, 6654, 6939, 7364, 7614, 8034, 8043, 8226, 8425, 8560, 8986, 9626, 13719, 13765, 13886, 14049, 14095, 14102, 15123, 15867, 28091 and 28232. When the peaks of the characteristic polypeptides with m/z 5158, 5366, 5893, 7364, 7614, 8034, 8043, 8226, 8425, 8560, 8986, 9626, 15123, 15867 and 28091 are up-regulated while the peaks of the characteristic polypeptides with m/z 6357, 6654, 6939, 13719, 13765, 13886, 14049, 14095, 14102 and 28232 are down-regulated, the serum sample is indicated to be a positive sample, i.e. the patient is infected with the novel coronavirus, and the ten-fold cross-validation accuracy is about 97.96%.
Alternatively, in another embodiment, the mass spectrometry model is prepared from the characteristic polypeptides with m/z 5158, 5366, 5893, 6357, 6654, 6939, 7364, 7614, 8034, 8043, 8226, 8425, 8560, 8986, 9626, 11435, 11495, 11523, 11680, 13719, 13765, 13886, 14049, 14095, 14102, 15123, 15867, 28091 and 28232. When the peaks of the characteristic polypeptides with m/z 5158, 5366, 5893, 7364, 7614, 8034, 8043, 8226, 8425, 8560, 8986, 9626, 11435, 11495, 11523, 11680, 15123, 15867 and 28091 are up-regulated while the peaks of the characteristic polypeptides with m/z 6357, 6654, 6939, 13719, 13765, 13886, 14049, 14095, 14102 and 28232 are down-regulated, the serum sample is indicated to be a positive sample, i.e. the patient is infected with the novel coronavirus, and the ten-fold cross-validation accuracy is about 98.69%.
In another embodiment, the mass spectrometry model is prepared only from the characteristic polypeptides with m/z 7614, 8034, 8226, 8986, 9626, 11435, 11495, 11523, 11680, 15123, 15867 and 28091, and 6939, 13719, 13765, 13886, 14049, 14095 and 14102. When the peaks of the characteristic polypeptides with m/z 7614, 8034, 8226, 8986, 9626, 11435, 11495, 11523, 11680, 15123, 15867 and 28091 are up-regulated while the peaks of the characteristic polypeptides with m/z 6939, 13719, 13765, 13886, 14049, 14095 and 14102 are down-regulated, the serum sample is indicated to be a positive sample, i.e. the patient is infected with the novel coronavirus, and the ten-fold cross-validation accuracy is about 93.31%.
In other embodiments, the mass spectrometry model is prepared only from the characteristic polypeptides with m/z 8986, 28091, 6939, 13886, 14049 and 14102. When the peaks of the characteristic polypeptides with m/z 8986 and 28091 are up-regulated while the peaks of the characteristic polypeptides with m/z 6939, 13886, 14049 and 14102 are down-regulated, the serum sample is indicated to be a positive sample, i.e. the patient is determined to be infected with the novel coronavirus, and the ten-fold cross-validation accuracy is about 91%.
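The accuracies quoted above are ten-fold cross-validation figures. The following sketch shows, under stated assumptions, how such a figure is generally computed: the samples are split into ten folds, a model is trained on nine and scored on the held-out fold, and the correct predictions are pooled. The nearest-centroid classifier used here is a stand-in chosen for brevity and is not the model described in the patent; all function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(features, labels, train, predict, k=10, seed=0):
    """Fraction of samples classified correctly when each is held out once.

    train(X, y) returns a fitted model; predict(model, x) returns a label.
    """
    correct = 0
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = train([features[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct += sum(predict(model, features[i]) == labels[i]
                       for i in held_out)
    return correct / len(labels)


def train_centroids(X, y):
    """Stand-in classifier: per-class mean of the peak intensity vectors."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}


def predict_centroid(model, row):
    """Assign the label of the closest class centroid (squared distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(model, key=lambda label: sq_dist(model[label]))
```

In use, `features` would hold one intensity vector per serum sample (one entry per panel peak) and `labels` the known infection status; `cross_validated_accuracy(features, labels, train_centroids, predict_centroid)` then returns a value between 0 and 1.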
It is a third object of the present invention to provide a kit for detecting a novel coronavirus infection comprising the above-described characteristic polypeptide composition, or comprising the above-described mass spectrometry model.
In one embodiment, the polypeptide composition or mass spectrometry model is prepared from a signature polypeptide 5158m/z、5366m/z、5893m/z、6357m/z、6654m/z、6939m/z、7364m/z、7614m/z、8034m/z、8043m/z、8226m/z、8425m/z、8560m/z、8986m/z、9626m/z、13719m/z、13765m/z、13886m/z、14049m/z、14095m/z、14102m/z、15123m/z、15867m/z、28091m/z、28232m/z, wherein when the peak of signature polypeptide 5158m/z、5366m/z、5893m/z、7364m/z、7614m/z、8034m/z、8043m/z、8226m/z、8425m/z、8560m/z、8986m/z、9626m/z、15123m/z、15867m/z、28091m/z is up-regulated while the peak of signature polypeptide 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z is down-regulated, the serum sample is indicative of a positive sample, i.e., the patient is a novel coronavirus infected patient, and the ten fold cross-validation accuracy is about 97.96%.
Or in another embodiment, the polypeptide composition or mass spectrometry model is prepared from a signature polypeptide 5158m/z、5366m/z、5893m/z、6357m/z、6654m/z、6939m/z、7364m/z、7614m/z、8034m/z、8043m/z、8226m/z、8425m/z、8560m/z、8986m/z、9626m/z、11435m/z、11495m/z、11523m/z、11680m/z、13719m/z、13765m/z、13886m/z、14049m/z、14095m/z、14102m/z、15123m/z、15867m/z、28091m/z、28232m/z, wherein when the peak of signature polypeptide 5158m/z、5366m/z、5893m/z、7364m/z、7614m/z、8034m/z、8043m/z、8226m/z、8425m/z、8560m/z、8986m/z、9626m/z、11435m/z、11495m/z、11523m/z、11680m/z、15123m/z、15867m/z、28091m/z is up-regulated while the peak of signature polypeptide 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z is down-regulated, the serum sample is indicative of a positive sample, i.e., the patient is a novel coronavirus infected patient, and the ten fold cross-validation accuracy is about 98.69%.
In another embodiment, the polypeptide composition or mass spectrometry model is prepared from only the characteristic polypeptides having mass to charge ratios of 7614m/z、8034m/z、8226m/z、8986m/z、9626m/z、11435m/z、11495m/z、11523m/z、11680m/z、15123m/z、15867m/z、28091m/z, and 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, respectively, wherein when the peak of characteristic polypeptide 7614m/z、8034m/z、8226m/z、8986m/z、9626m/z、11435m/z、11495m/z、11523m/z、11680m/z、15123m/z、15867m/z、28091m/z is up-regulated while the peak of characteristic polypeptide 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z is down-regulated, the serum sample is indicative of a positive sample, i.e., the patient is a novel coronavirus infected patient, and the ten fold cross-validation accuracy is about 93.31%.
In other embodiments, the polypeptide composition or mass spectrometry model is prepared from only the following characteristic polypeptides having mass to charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, 14102m/z, respectively, wherein when the peaks of the characteristic polypeptides 8986m/z, 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13886m/z, 14049m/z, 14102m/z are down-regulated, the serum sample is indicative of a positive sample, i.e., the patient is determined to be a novel coronavirus infected patient, and the ten fold cross-validation accuracy is about 91%.
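The four peak panels in the embodiments above can be written down as data and checked for internal consistency. The following is a minimal sketch in Python and is not part of the kit: the names `PANELS` and `matches_pattern`, and the idea of a signed change per peak relative to a non-infected reference, are assumptions made for illustration. The embodiments give no numeric cut-off for "up-regulated" or "down-regulated", so none is used here, and the model described in this specification is built with a learning algorithm rather than this simple rule.

```python
# Peak panels (m/z) from the embodiments above: peaks reported as up-regulated
# and down-regulated in positive samples. All names are illustrative only.
SAA = [11435, 11495, 11523, 11680]
UP_25 = [5158, 5366, 5893, 7364, 7614, 8034, 8043, 8226, 8425, 8560,
         8986, 9626, 15123, 15867, 28091]
DOWN_25 = [6357, 6654, 6939, 13719, 13765, 13886, 14049, 14095, 14102, 28232]

PANELS = {
    25: {"up": UP_25, "down": DOWN_25},
    29: {"up": UP_25 + SAA, "down": DOWN_25},
    19: {"up": [7614, 8034, 8226, 8986, 9626] + SAA + [15123, 15867, 28091],
         "down": [6939, 13719, 13765, 13886, 14049, 14095, 14102]},
    6: {"up": [8986, 28091], "down": [6939, 13886, 14049, 14102]},
}


def matches_pattern(change, panel):
    """Return True if every 'up' peak has a positive change and every 'down'
    peak a negative change. `change` maps m/z to a signed difference versus a
    reference (hypothetical input; the text gives no threshold)."""
    ups = all(change.get(mz, 0) > 0 for mz in panel["up"])
    downs = all(change.get(mz, 0) < 0 for mz in panel["down"])
    return ups and downs


# Each panel size equals the number of up- plus down-regulated peaks.
for size, panel in PANELS.items():
    assert len(panel["up"]) + len(panel["down"]) == size
```

Keeping the panels as plain lists makes it easy to confirm that the 29-peak panel is the 25-peak panel plus the four SAA peaks.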
In one embodiment, the kit comprises a sample treatment fluid developed by Beijing-based New Boc Biotechnology Inc.
In another embodiment, the kit further comprises standard mass spectrometry sample tubes for ensuring the accuracy of the molecular weights measured by the mass spectrometer. These may be several tubes each containing a single characteristic polypeptide, or one tube containing several characteristic polypeptides. The standards are analyzed in parallel with the sample to be tested in order to judge whether the molecular weight information obtained for that sample is accurate and reliable.
In another embodiment, the kit may contain software or a chip holding a standard database of the characteristic polypeptides, which provides standard data or curves for comparison when a sample is analyzed by mass spectrometry, so as to judge the expression of the characteristic polypeptides in the sample to be tested.
It is a fourth object of the present invention to provide the use of said characteristic polypeptide composition, or said mass spectrometry model, for the preparation of a product for diagnosing a novel coronavirus infection.
In one embodiment, the polypeptide composition or mass spectrometry model is prepared from the characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z, wherein when the peaks of characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z are up-regulated while the peaks of characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z are down-regulated, the serum sample is indicated as a positive sample, i.e., the patient is a novel coronavirus infected patient, and the ten-fold cross-validation accuracy is about 97.96%.
Or in another embodiment, the polypeptide composition or mass spectrometry model is prepared from the characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z, wherein when the peaks of characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z are up-regulated while the peaks of characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z are down-regulated, the serum sample is indicated as a positive sample, i.e., the patient is a novel coronavirus infected patient, and the ten-fold cross-validation accuracy is about 98.69%.
In another embodiment, the polypeptide composition or mass spectrometry model is prepared from only the characteristic polypeptides having mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, and 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, respectively, wherein when the peaks of characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z are up-regulated while the peaks of characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z are down-regulated, the serum sample is indicated as a positive sample, i.e., the patient is a novel coronavirus infected patient, and the ten-fold cross-validation accuracy is about 93.31%.
In other embodiments, the polypeptide composition or mass spectrometry model is prepared from only the following characteristic polypeptides having mass to charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, 14102m/z, respectively, wherein when the peaks of the characteristic polypeptides 8986m/z, 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13886m/z, 14049m/z, 14102m/z are down-regulated, the serum sample is indicative of a positive sample, i.e., the patient is determined to be a novel coronavirus infected patient, and the ten fold cross-validation accuracy is about 91%.
In any of the above embodiments, the product for diagnosing a novel coronavirus infection refers to any conventional product for diagnosing a novel coronavirus infection, including: detection reagent, detection chip, detection carrier, detection kit, etc.
A fifth object of the present invention is to provide a method for constructing a mass spectrometry model, comprising:
1) Serum samples of a plurality of clinically confirmed patients infected with the novel coronavirus and of persons not infected with the novel coronavirus (including tuberculosis patients, patients with similar symptoms such as fever and cough, and healthy people) are collected and frozen at low temperature for later use;
2) Carrying out mass spectrum pretreatment on serum proteins;
3) Carrying out mass spectrometry detection and reading on the two groups of preprocessed serum proteins to obtain fingerprint patterns of the two groups of serum polypeptides;
4) Normalizing the serum polypeptide fingerprints of all patients and normal persons, and collecting the data;
5) Performing quality control on the obtained data; screening out characteristic polypeptides having the following mass-to-charge ratio peaks: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z; performing secondary mass spectrometry identification on the characteristic polypeptides; and establishing a mass spectrometry model for detecting novel coronavirus infection based on these mass-to-charge ratio peaks.
In one embodiment, step 5) performs quality control on the obtained data; screens out characteristic polypeptides having the following mass-to-charge ratio peaks: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z; performs secondary mass spectrometry identification on the characteristic polypeptides; and establishes a mass spectrometry model for detecting novel coronavirus infection based on these mass-to-charge ratio peaks.
In a preferred embodiment, the mass spectrometry model of step 5) is prepared from only the characteristic polypeptides having mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, and 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, respectively, wherein when the peaks of characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z are up-regulated while the peaks of characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z are down-regulated, the serum sample is indicated as a positive sample, i.e., the patient is a novel coronavirus infected patient, and the ten-fold cross-validation accuracy is about 93.31%.
In another embodiment, the mass spectrometry model of step 5) is prepared from only the following characteristic polypeptides having mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, 14102m/z, respectively, wherein when the peaks of the characteristic polypeptides 8986m/z, 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13886m/z, 14049m/z, 14102m/z are down-regulated, the serum sample is indicated as a positive sample, i.e., the patient is determined to be a novel coronavirus infected patient, and the ten-fold cross-validation accuracy is about 91%.
In any of the above embodiments, the characteristic polypeptides are respectively:
a characteristic polypeptide with a mass-to-charge ratio of 6939m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 1;
a characteristic polypeptide with a mass-to-charge ratio of 7614m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 2;
a characteristic polypeptide with a mass-to-charge ratio of 8034m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 3;
a characteristic polypeptide with a mass-to-charge ratio of 8226m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 4;
a characteristic polypeptide with a mass-to-charge ratio of 8986m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 5;
a characteristic polypeptide with a mass-to-charge ratio of 9626m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 6;
a characteristic polypeptide with a mass-to-charge ratio of 13719m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 7;
a characteristic polypeptide with a mass-to-charge ratio of 13765m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 8;
a characteristic polypeptide with a mass-to-charge ratio of 13886m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 9;
a characteristic polypeptide with a mass-to-charge ratio of 14049m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 10;
a characteristic polypeptide with a mass-to-charge ratio of 14095m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 11;
a characteristic polypeptide with a mass-to-charge ratio of 14102m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 12;
a characteristic polypeptide with a mass-to-charge ratio of 15123m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 13;
a characteristic polypeptide with a mass-to-charge ratio of 15867m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 14;
a characteristic polypeptide with a mass-to-charge ratio of 28091m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 15;
a characteristic polypeptide with a mass-to-charge ratio of 11435m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 16;
a characteristic polypeptide with a mass-to-charge ratio of 11495m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 17;
a characteristic polypeptide with a mass-to-charge ratio of 11523m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 18;
a characteristic polypeptide with a mass-to-charge ratio of 11680m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 19.
In any of the above embodiments, wherein the method of step 2) of pre-treating comprises diluting the serum protein or polypeptide in the stabilized sample with a sample treatment fluid.
In any of the above embodiments, in the step 3), the polypeptide mass spectrometry universal pretreatment kit is used to dilute and read two groups of serum proteins, so as to obtain fingerprints of the two groups of serum polypeptides.
In any of the above embodiments, the quality control process described in step 5) measures blank matrix crystallization spots with the same mass spectrometry parameters; if distinct mass spectrum peaks appear, the matrix solution is considered unqualified.
In any one of the above embodiments, wherein the quality control processing in step 5) selects the following 8 characteristic peaks as quality control peaks: 6426m/z, 6623m/z, 8753m/z, 8785m/z, 8904m/z, 9118m/z, 9409m/z, 9700m/z.
In time-of-flight mass spectrometry of biological samples, the quality of a spectrum is affected by many factors, such as individual differences, sample quality, changes in ambient temperature and humidity, and the crystallization state of the sample and matrix. To keep abnormal spectra from affecting the analysis, the above 8 characteristic peaks common to human serum were introduced as quality control peaks; their occurrence is independent of whether the patient has a novel coronavirus infection. Of the 843 spectra collected, 683 contained all 8 quality control peaks (81.0% of the total) and 156 contained 7 quality control peaks (18.5% of the total). The spectrum quality control condition is set as follows: the spectrum of a single sample is qualified when 6 to 8 quality control peaks are detected and the molecular weight deviation of the internal standard peaks is less than 0.002 (i.e., no more than 2‰). Unqualified spectra must be re-acquired.
The invention uses bioinformatics methods to screen out the corresponding novel coronavirus infection markers and to establish a detection model for analysis and detection. The bioinformatics methods comprise normalizing the fingerprint spectra, applying experimental quality control to the resulting data, screening the desired serum characteristic polypeptides, and establishing a mass spectrometry model, optionally building and validating the model with an LR algorithm. The experimental quality control retains mass spectrometry data with no fewer than 6 internal standard peaks and uses the internal standard peaks for secondary calibration of the spectra.
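The spectrum quality control condition can be illustrated with a short Python sketch. It assumes that a spectrum is available as a plain list of observed peak m/z values; the function names are hypothetical and are not taken from any software named in this specification.

```python
QC_PEAKS = [6426, 6623, 8753, 8785, 8904, 9118, 9409, 9700]  # m/z, from the text


def count_qc_peaks(observed_mz, tolerance=0.002):
    """Count quality control peaks that have an observed peak within the
    relative mass deviation `tolerance` (0.002, i.e. 2 per mille)."""
    found = 0
    for ref in QC_PEAKS:
        if any(abs(mz - ref) / ref <= tolerance for mz in observed_mz):
            found += 1
    return found


def spectrum_passes_qc(observed_mz, min_peaks=6):
    """A spectrum is qualified when 6 to 8 quality control peaks are found."""
    return count_qc_peaks(observed_mz) >= min_peaks


# Shares of the 843 collected spectra with all 8 and with 7 quality control peaks.
share_all_8 = round(100 * 683 / 843, 1)
share_7 = round(100 * 156 / 843, 1)
```

The two shares reproduce the 81.0% and 18.5% figures quoted above.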
Terminology and definitions
Ten-fold cross-validation (10-fold cross-validation) is a common method for testing the accuracy of an algorithm. The data set is divided into ten parts; in turn, 9 parts are used as training data and 1 part as test data. Each round yields an accuracy (or error rate), and the average over the 10 rounds serves as the estimate of the accuracy of the algorithm. In general, the 10-fold cross-validation is itself repeated several times (e.g., 10 times 10-fold cross-validation) and the results are averaged. It should be noted that ten-fold cross-validation accuracy is related to, but not equivalent to, the accuracy (or sensitivity) of the actual test. When evaluating an algorithm, if the ten-fold cross-validation accuracy falls within the confidence interval, varies in a correlated way with the number of characteristic polypeptides, and reaches a value feasible for clinical diagnosis, the mass spectrometry model built from those polypeptides is considered to meet the requirements for clinical diagnosis.
Serum amyloid A (SAA) proteins are a family of acute-phase response proteins and belong to the heterogeneous proteins of the apolipoprotein family. Humans have 4 serum amyloid A genes, SAA1-SAA4; the SAA1 and SAA2 proteins are the acute-phase proteins and are referred to as A-SAA.
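The procedure can be sketched in a few lines of Python. This is a generic illustration of 10-fold cross-validation, not the model of the invention: the threshold classifier and all function names are assumptions chosen so that the example needs no external library.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, fit, predict, k=10, seed=0):
    """Return the mean accuracy over k folds; each fold is the test set once."""
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Toy classifier (an assumption, not the patent's model): a threshold placed
# halfway between the class means of a single feature.
def fit_threshold(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0
```

Calling `cross_validate(xs, ys, fit_threshold, predict_threshold)` on a labeled data set returns the averaged accuracy; repeating the call with different `seed` values and averaging again corresponds to the repeated 10-fold cross-validation mentioned above.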
Technical effects
Compared with the prior art, the invention has the following advantages:
1. The invention detects serum samples using combinations of multiple characteristic proteins that differ between novel coronavirus infected patients and normal persons, pulmonary tuberculosis patients, and control patients with symptoms resembling novel coronavirus infection, and processes the data by combining traditional statistics with modern bioinformatics, thereby obtaining a polypeptide fingerprint detection model for novel coronavirus infected patients, healthy persons, and other control patients. The series of mass-to-charge ratio peaks discovered provides a basis and resource for finding new and better markers.
2. Compared with existing detection methods, the method has higher sensitivity and specificity, is simple to operate, low in cost, and high in throughput, and is expected to be used for large-scale screening for novel coronavirus infection.
3. The model construction method is reasonable and feasible in design, provides a new screening method for improving the clinical cure rate of novel coronavirus infection, and offers a new line of thought for exploring the mechanism of occurrence and development of novel coronavirus infection.
4. The invention provides, for the first time, combinations of multiple characteristic proteins that differ between 146 patients with confirmed novel coronavirus infection and 46 normal persons, 33 tuberculosis patients, and 73 patients with symptoms resembling novel coronavirus infection. This breaks from the traditional approach of searching for characteristic polypeptides only between normal persons and novel coronavirus infected patients, and effectively avoids false positive results caused by conditions with symptoms similar to novel coronavirus infection.
5. The mass spectrometry model of the invention achieves a detection accuracy of 99%, a sensitivity of 98%, and a specificity of 100%, showing that the serum peptidome characteristic polypeptide model of the invention can be used to rapidly screen for novel coronavirus infected patients in the population.
6. Compared with the composition and mass spectrometry model constructed from 25 characteristic polypeptides, the 4 newly introduced characteristic polypeptides (SEQ ID NO: 16-19) belong to the SAA protein marker family, which is used clinically as a biomarker for diagnosing pathogen and viral infection by methods such as ELISA, immunoturbidimetry, the colloidal gold method, and immunofluorescence chromatography. Building on the completed mass spectrometry model of 25 characteristic polypeptides, the invention is the first to propose using SAA protein markers for virus detection by laser time-of-flight mass spectrometry, and accurately identifies the specific SAA protein sequences (SEQ ID NO: 16-19) for the first time, which effectively avoids misdiagnosing normal samples in clinical practice. The results show that the ten-fold cross-validation accuracy of the 29-characteristic-polypeptide mass spectrometry model incorporating the 4 SAA polypeptide markers is about 98.69%, compared with about 97.96% for the 25-characteristic-polypeptide mass spectrometry model.
Drawings
Fig. 1: Comparison of serum polypeptide fingerprints of the different groups (healthy group, pulmonary tuberculosis group, similar-symptom group, and COVID-19 patient group); from top to bottom: negative healthy person, negative pulmonary tuberculosis, negative similar-symptom, and positive COVID-19 patient spectra.
Fig. 2-1: The 20 peaks with the highest repetition frequency in LASSO. Fig. 2-2: The 20 peaks with the highest variable importance in projection (VIP) in PLS-DA.
Fig. 2-3: The 10 peaks with the highest cross-validation accuracy in RFECV.
Fig. 3: Intensity of each characteristic peak, wherein the left column is the negative control group and the right column is the positive group.
Fig. 4-1: Comparison of training set ROC curves for various machine learning methods. Fig. 4-2: Comparison of test set ROC curves.
Fig. 5: Confusion matrix of true groups versus predicted results on the test set.
Fig. 6: Procedure for establishing a characteristic polypeptide mass spectrometry model for rapidly screening novel coronavirus infected (COVID-19) patients.
Fig. 7: Mass spectrum peak of the characteristic polypeptide m/z 5157.6; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 8: Mass spectrum peak of the characteristic polypeptide m/z 5366.2; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 9: Mass spectrum peak of the characteristic polypeptide m/z 5892.9; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 10: Mass spectrum peak of the characteristic polypeptide m/z 6357.4; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 11: Mass spectrum peak of the characteristic polypeptide m/z 6654.0; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 12: Mass spectrum peak of the characteristic polypeptide m/z 6939.1; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 13: Mass spectrum peak of the characteristic polypeptide m/z 7364.2; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 14: Mass spectrum peak of the characteristic polypeptide m/z 7614.2; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 15: Mass spectrum peak of the characteristic polypeptide m/z 8034.3; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 16: Mass spectrum peak of the characteristic polypeptide m/z 8042.7; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 17: Mass spectrum peak of the characteristic polypeptide m/z 8226.4; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 18: Mass spectrum peak of the characteristic polypeptide m/z 8424.9; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 19: Mass spectrum peak of the characteristic polypeptide m/z 8559.8; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 20: Mass spectrum peak of the characteristic polypeptide m/z 8986.1; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 21: Mass spectrum peak of the characteristic polypeptide m/z 9626.4; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 22: Mass spectrum peak of the characteristic polypeptide m/z 13719.2; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 23: Mass spectrum peak of the characteristic polypeptide m/z 13765.2; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 24: Mass spectrum peak of the characteristic polypeptide m/z 13886.1; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 25: Mass spectrum peak of the characteristic polypeptide m/z 14049.4; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 26: Mass spectrum peak of the characteristic polypeptide m/z 14094.7; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 27: Mass spectrum peak of the characteristic polypeptide m/z 14101.8; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 28: Mass spectrum peak of the characteristic polypeptide m/z 15123.4; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 29: Mass spectrum peak of the characteristic polypeptide m/z 15866.5; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 30: Mass spectrum peak of the characteristic polypeptide m/z 28091.4; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 31: Mass spectrum peak of the characteristic polypeptide m/z 28231.5; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 32: Mass spectrum peak of the characteristic polypeptide m/z 11435.1; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 33: Mass spectrum peak of the characteristic polypeptide m/z 11495.3; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 34: Mass spectrum peak of the characteristic polypeptide m/z 11522.8; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Fig. 35: Mass spectrum peak of the characteristic polypeptide m/z 11680.3; the upper panel is the non-COVID-19 control and the lower panel is COVID-19.
Detailed Description
The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
Example 1 sample processing
Serum samples from 146 confirmed patients were obtained from a hospital in Chongqing in February 2020. All patients tested positive by nucleic acid detection and were strictly classified according to the guidelines.
Classification is based on the following criteria:
(1) Mild: clinical symptoms are mild, with no signs of pneumonia on imaging;
(2) Common type: fever and respiratory symptoms, with pneumonia visible on imaging;
(3) Severe: dyspnea, respiratory rate ≥30 breaths/min, oxygen saturation ≤93% at rest, arterial partial pressure of oxygen (PaO2)/fraction of inspired oxygen (FiO2) ≤300 mmHg;
(4) Critical: respiratory failure requiring a ventilator, shock, or other organ failure requiring treatment in the ICU.
The 152 serum samples without novel coronavirus infection used as controls were obtained from a hospital in Chongqing in March 2020 and comprised 46 normal persons, 33 tuberculosis patient controls, and 73 controls with symptoms resembling novel coronavirus infection.
All samples were drawn in the early morning on an empty stomach, collected in additive-free vacuum serum collection tubes, centrifuged at 2,264 g for 10 min, and incubated at 56 ℃ for 30 min; the serum was then aliquoted and frozen at -80 ℃.
Mass spectrometry pretreatment of serum samples: before each mass spectrometry experiment, one tube of each aliquoted serum sample was taken from the low-temperature freezer, placed on wet ice, and thawed for 60-90 min. 5 µL of serum was mixed with 45 µL of sample treatment solution and vortexed at 1200 rpm for 30 s; 10 µL of the treated sample solution was added to 10 µL of prepared matrix solution and vortexed at 1200 rpm for 30 s; 1 µL of the mixture was spotted on a target plate, with three replicates per sample, and air-dried before mass spectrometry detection.
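The dilution steps in the pretreatment can be followed with a short worked calculation. The Python sketch below only restates the volumes given above; the variable names are illustrative.

```python
from fractions import Fraction

# Volumes in microlitres, taken from the pretreatment steps above.
serum, treatment = 5, 45      # serum diluted in sample treatment solution
aliquot, matrix = 10, 10      # diluted sample mixed with matrix solution
spot = 1                      # volume spotted on the target plate
replicates = 3                # each sample is spotted three times

# Fraction of serum after the first dilution (5 in 50).
serum_after_dilution = Fraction(serum, serum + treatment)

# Fraction of serum in the sample/matrix mixture (a further 1:1 mix).
serum_in_mixture = serum_after_dilution * Fraction(aliquot, aliquot + matrix)

# Serum actually deposited per spot, and per sample over all replicates.
serum_per_spot = spot * serum_in_mixture
serum_per_sample = replicates * serum_per_spot
```

Under these volumes the serum is diluted 1:10 and then 1:20 overall, so each 1 µL spot carries 0.05 µL of serum.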
Example 2 creation of a Mass Spectrometry model for MALDI-TOF-MS
(I) Sample preparation
For each sample, 5 µL of serum was diluted in 45 µL of sample treatment fluid (Bioyong Technologies Inc.). Then 10 µL of the diluted serum was mixed with 10 µL of matrix solution (Bioyong Technologies Inc.).
2 µL of the mixture was spotted onto a stainless steel target plate. After drying at room temperature, the target plate was loaded into a MALDI-TOF MS mass spectrometer (Clin-TOF-II; Bioyong Technologies Inc.). Each sample was tested in parallel 3 times.
The matrix-assisted laser desorption/ionization time-of-flight mass spectrometer Clin-TOF and the universal polypeptide mass spectrometry pretreatment kit used in the experiments were developed by Bioyong (China). The data were preprocessed with the MALDIquant program: the data were square-root transformed, smoothed by a filter fitting method, and baseline corrected. The mass spectrometer was calibrated with a mixture of polypeptides and proteins of known molecular weight; the mass drift of the calibrant should be within 500 ppm. 500 spectra were acquired for each sample spot. The molecular weight acquisition range was m/z 3000-30000.
The mass spectra of the different sample groups are shown in Fig. 1 (comparison of serum polypeptide fingerprints of the different groups), which shows, from top to bottom, the negative healthy person spectrum, the negative tuberculosis spectrum, the negative similar-symptom spectrum, and the positive COVID-19 patient spectrum. In the negative healthy person spectrum, the peak intensities of 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11523m/z, 15123m/z, 15867m/z, 28091m/z were lower, while the peak intensities of 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z were higher. In the negative tuberculosis spectrum, the peak intensities of 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z were lower, while the peak intensities of 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z were higher. In the negative similar-symptom group spectrum, the peak intensities of 5158m/z, 5366m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z were lower, while the peak intensities of 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z were higher. In the positive COVID-19 patient spectrum, the peak intensities of 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z were higher, while the peak intensities of 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z were lower.
(II) Mass Spectrometry data acquisition
A Clin-TOF mass spectrometer was used. An appropriate laser energy was set to acquire data from a given sample crystallization spot. For each sample spot, 50 laser bombardment positions were selected and each position was bombarded 10 times; that is, each sample crystallization spot received 500 laser shots, and a spectrum was collected. Laser frequency: 30 Hz. Data collection range: 3-30 kDa. External calibration was performed with the standard before each sample crystallization spot was acquired, with an average molecular weight deviation of less than 500 ppm.
Experiment quality control:
(1) Blank matrix crystallization spots are measured with the same mass spectrometry parameters. If obvious mass spectrum peaks appear, the quality of the matrix solution is considered unacceptable and the matrix must be replaced with a fresh one.
(2) When the standard is used for external calibration, the mass deviation at the different calibrator spots must not exceed 500 ppm, and all 5 calibrator peaks must meet this requirement simultaneously.
(3) Eight polypeptide peaks in serum are selected as internal standard quality control peaks. If 6-8 internal standard peaks are detected and their molecular weight deviation does not exceed 2‰, the spectrum is considered qualified. Otherwise, the spectrum must be acquired again. The internal standard peaks are: 6426m/z, 6623m/z, 8753m/z, 8785m/z, 8904m/z, 9118m/z, 9409m/z, 9700m/z.
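The quality control rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, a peak list is assumed to be a plain list of detected m/z values, and an internal standard counts as detected when any detected peak lies within 2‰ of it.

```python
# Internal standard m/z values listed in quality control item (3).
INTERNAL_STANDARDS = [6426, 6623, 8753, 8785, 8904, 9118, 9409, 9700]


def ppm_deviation(observed_mz, reference_mz):
    """Relative mass deviation in parts per million."""
    return abs(observed_mz - reference_mz) / reference_mz * 1e6


def calibration_ok(observed, reference, limit_ppm=500.0):
    """True only if every calibrator peak deviates by at most limit_ppm."""
    return all(ppm_deviation(o, r) <= limit_ppm for o, r in zip(observed, reference))


def count_internal_standards(peak_mzs, references=INTERNAL_STANDARDS, rel_tol=0.002):
    """Count reference peaks matched by at least one detected peak within rel_tol."""
    found = 0
    for ref in references:
        if any(abs(mz - ref) / ref <= rel_tol for mz in peak_mzs):
            found += 1
    return found


def spectrum_qualified(peak_mzs, min_found=6):
    """Apply rule (3): at least 6 of the 8 internal standards must be detected."""
    return count_internal_standards(peak_mzs) >= min_found
```

A spectrum that fails `spectrum_qualified` would be acquired again, as rule (3) requires.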
(III) raw data preprocessing
The MALDI-TOF raw data undergo a secondary, internal-standard calibration in the internal standard calibration software and are saved as txt files. The internal standard peaks are: 6426m/z, 6623m/z, 8753m/z, 8785m/z, 8904m/z, 9118m/z, 9409m/z, 9700m/z. The spectra are then processed with the MALDIquant program. Spectral processing includes smoothing, baseline correction, and molecular weight calibration. Peak detection is performed with a signal-to-noise ratio of 3. The peaks are binned with the binPeaks command using a tolerance of 0.002. Peaks with a frequency of not less than 25% within a group are retained. Finally, the resulting matrix is used for the following analysis.
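The binning and frequency-filtering steps can be illustrated with a simplified Python sketch. It is not the MALDIquant implementation: the greedy grouping below is a stand-in for binPeaks, the function names and data layout are hypothetical, and all spectra passed in are assumed to belong to one group.

```python
def bin_peaks(spectra, tolerance=0.002):
    """Greedily group peak m/z values across spectra into bins.

    spectra: list of peak lists, each a list of (mz, intensity) tuples.
    A peak opens a new bin when its relative distance to the running
    mean m/z of the current bin exceeds the tolerance.
    """
    tagged = sorted(
        (mz, intensity, idx)
        for idx, peaks in enumerate(spectra)
        for mz, intensity in peaks
    )
    groups, current = [], []
    for mz, intensity, idx in tagged:
        if current:
            mean_mz = sum(p[0] for p in current) / len(current)
            if abs(mz - mean_mz) / mean_mz > tolerance:
                groups.append(current)
                current = []
        current.append((mz, intensity, idx))
    if current:
        groups.append(current)

    bins = []
    for members in groups:
        mean_mz = sum(p[0] for p in members) / len(members)
        by_spectrum = {}
        for _, intensity, idx in members:
            # Keep the strongest peak if one spectrum contributes twice.
            by_spectrum[idx] = max(intensity, by_spectrum.get(idx, intensity))
        bins.append({"mz": mean_mz, "members": by_spectrum})
    return bins


def filter_by_frequency(bins, n_spectra, min_frequency=0.25):
    """Keep bins detected in at least min_frequency of the spectra."""
    return [b for b in bins if len(b["members"]) / n_spectra >= min_frequency]
```

Each retained bin then corresponds to one column of the peak intensity matrix.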
After log2 transformation, the peak intensity matrix is quantile normalized with the R package limma. Missing values are filled with the minimum value across all samples. The COVID-19 patient data and control sample data were randomly divided into a training group and a test group at a ratio of 2:1.
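A minimal Python sketch of the transformation, imputation and splitting steps follows. It makes several assumptions: missing values are represented as `None`, "the minimum value" is taken to be the global minimum of the log2 matrix, the limma normalization step is omitted because it is an R routine, and the function names are hypothetical.

```python
import math
import random


def log2_and_impute(matrix):
    """log2-transform intensities and fill missing values (None) with the global minimum."""
    logged = [[math.log2(v) if v is not None else None for v in row] for row in matrix]
    observed = [v for row in logged for v in row if v is not None]
    fill = min(observed)
    return [[fill if v is None else v for v in row] for row in logged]


def split_two_to_one(samples, seed=0):
    """Randomly split samples into training and test groups at roughly 2:1."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * 2 / 3)
    return shuffled[:n_train], shuffled[n_train:]
```

Fixing the seed makes the random split reproducible, which helps when the same partition has to be reused across several models.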
(IV) selection of characteristic proteins
After intensity normalization and missing-value imputation, the peaks of the training set were analyzed by the following three machine learning methods: the LASSO algorithm (LASSO), partial least squares discriminant analysis (PLS-DA), and recursive feature elimination with cross-validation (RFECV). LASSO, in full the Least Absolute Shrinkage and Selection Operator, is a shrinkage estimator. It obtains a more parsimonious model by constructing a penalty function that shrinks the regression coefficients, i.e., it forces the sum of the absolute values of the coefficients to be smaller than a fixed value, while setting some regression coefficients to zero. It thus retains the advantage of subset shrinkage and is a biased estimator suited to data with complex collinearity.
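The constraint described above can be written compactly in the standard least-squares form of LASSO. The symbols are assumptions introduced here for illustration: y_i is the class label of sample i, x_ij is the intensity of peak j in sample i, and t is the fixed bound on the coefficients.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
\quad \text{subject to} \quad \sum_{j=1}^{p} \lvert \beta_j \rvert \le t
```

A smaller t shrinks more coefficients exactly to zero, so the peaks whose coefficients remain non-zero form the selected feature subset.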
FIG. 2-1 shows the 20 peaks with the highest repetition frequency in LASSO, where the vertical axis is the mass-to-charge ratio of each preferred characteristic peak. Partial least squares discriminant analysis (PLS-DA) is a multivariate statistical method for discriminant analysis. Discriminant analysis is a common statistical method for deciding how a study subject should be classified based on observed or measured variable values. Its principle is to train on the characteristics of differently treated samples (such as observation samples and control samples) to produce a training set, and then to test the reliability of the training set.
FIG. 2-2 shows the 20 peaks with the highest variable importance in projection (VIP) in PLS-DA, where the vertical axis is the mass-to-charge ratio of each preferred characteristic peak. RFECV finds the optimal number of features by cross-validation. RFE (recursive feature elimination) ranks the importance of the features; CV (cross-validation) means that, after the features have been ranked, the best number of features is selected by cross-validation. FIG. 2-3 shows the 10 peaks with the highest cross-validation accuracy in RFECV, where the vertical axis is the mass-to-charge ratio of each preferred characteristic peak.
By empirically examining the original spectra of the selected peaks, 29 peaks that passed quality control were selected as features. The intensities of the characteristic peaks are shown in FIG. 3. Each row in the figure represents a characteristic peak, each column represents the data of one spectrum, and the shading represents peak intensity. The left columns are the negative control group and the right columns are the positive group. It can be seen that the peaks of the characteristic polypeptides at 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z and 28232m/z are generally more intense in the negative group than in the positive group, while the peaks of the characteristic polypeptides at 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z and 28091m/z are generally more intense in the positive group than in the negative group. The intensities of these peaks differed significantly between the COVID-19 group and the controls.
(V) model algorithm
Eight machine learning methods were used to build models from the 29 characteristic peaks of the training set data, and the model results were evaluated by cross-validation accuracy. The 8 machine learning methods were: logistic regression (LR), support vector machine (SVM), random forest (RF), naive Bayes (NB), gradient boosting decision tree (GBDT), K-nearest neighbors (KNN), decision tree (DT) and adaptive boosting (AdaBoost).
FIGS. 4-1 and 4-2 show the model results for the training set and the test set, respectively, as ROC curves. An ROC curve is plotted from a series of different classification cut-offs (demarcation values or decision thresholds), with the true positive rate (sensitivity) on the ordinate and the false positive rate (1 - specificity) on the abscissa. The area under the ROC curve (AUC) of each test is calculated for comparison; the test with the largest AUC has the best diagnostic value. In this study, the AUC of every model on the training set was greater than 0.99, and the AUC of LR, SVM, GBDT, DT and AdaBoost was 1 (FIG. 4-1). ROC curve analysis of the test set data showed that the AUC of all 8 models obtained by the 8 machine learning methods exceeded 0.94; for the LR model, the AUC was 1 (FIG. 4-2). After evaluating the accuracy, recall, precision, F1, sensitivity and specificity of the 8 models, the LR model was found to have the best classification performance (AUC = 1, sensitivity = 98%, specificity = 100%, accuracy = 99%, precision = 99%, recall = 99%, F1 = 99%) and could be further applied to COVID-19 detection.
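For readers who want to see how an ROC curve and its AUC follow from model scores, here is a small self-contained Python sketch. The function names and the label convention (1 for positive, 0 for negative) are assumptions for illustration; the AUC is computed as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def roc_points(labels, scores):
    """(false positive rate, true positive rate) pairs, one per decision threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points
```

An AUC of 1 means that every positive sample received a higher score than every negative sample, regardless of where the decision threshold is placed.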
The confusion matrix of the LR model in the test set is shown in FIG. 5, wherein the vertical axis in the figure represents the real grouping situation of samples, the upper row represents the number of negative samples, and the lower row represents the number of positive samples; the horizontal axis represents the model prediction result, the left column represents the number of samples judged negative by the model, and the right column represents the number of samples judged positive by the model. Among the 51 negative samples, all the negative samples are judged to be negative, and the judgment accuracy (namely model specificity) of the negative samples is 100%; of the 49 positive samples, 1 was misjudged as negative, 48 were judged as positive, and the positive sample judgment accuracy (i.e., model sensitivity) was 98.0%.
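The sensitivity, specificity and accuracy quoted for the LR model can be reproduced from the four confusion matrix counts given above. The following Python sketch uses a hypothetical helper name; only the counts come from the text.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts from the test set: 49 positives (48 correct, 1 missed), 51 negatives (all correct).
lr_test_metrics = binary_metrics(tp=48, fn=1, tn=51, fp=0)
```

With these counts the sensitivity is 48/49, the specificity is 51/51 and the accuracy is 99/100, in line with the figures reported for the test set.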
TABLE 1 means and quartile range of 29 characteristic polypeptides in each group in training set
The specific procedure for establishing a mass spectrometry model for rapid screening of patients with novel coronavirus infection (COVID-19) is shown in FIG. 6. The procedure comprises the following steps: (1) collecting serum samples from patients with novel coronavirus infection and from a negative control population; (2) subjecting the serum samples to mass spectrometry pretreatment with the kit; (3) performing MALDI-TOF MS detection to obtain spectral information; (4) processing the spectra and obtaining a peak list; (5) performing bioinformatic analysis; (6) determining the mass spectrometry model.
Example 3 Construction of a screening model for patients with novel coronavirus infection
A total of 298 serum samples were available: 146 from patients with confirmed novel coronavirus infection, 46 from normal persons, 33 from tuberculosis patient controls, and 73 from controls with symptoms similar to novel coronavirus infection (fever, cough). Of these, 198 were selected as training samples for model establishment: 97 from patients with novel coronavirus infection, 34 from normal persons, 19 from tuberculosis patient controls, and 48 from patients with symptoms similar to novel coronavirus infection. All blood samples were drawn in the early morning under fasting conditions; the serum was separated, virus-inactivated, and stored in a -80 ℃ low-temperature freezer.
The remaining samples (49 patients with novel coronavirus infection, 12 normal persons, 14 tuberculosis patients, and 25 patients with symptoms similar to novel coronavirus infection) were used as verification samples for blind testing. The processing method was the same as above.
A polypeptide mass spectrometry model of novel coronavirus infection was established using the serum characteristic polypeptide peaks of novel coronavirus infection patients screened in Examples 1-2. The model was defined as using the following 29 characteristic peaks: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z.
The characteristic mass spectrum peaks of the characteristic polypeptides are shown in FIGS. 7-35.
The training set and validation set AUC of the LR model were both greater than 0.99. The accuracy of the test set is 99%, the sensitivity is 98% and the specificity is 100%. The model has good prediction capability.
TABLE 2 model training results
From the above table it can be seen that the results for the training set samples are: 34 of the 34 normal cases were judged correctly, a specificity of 100.00%; 97 of the 97 patients were judged correctly, a sensitivity of 100.00%; 19 of the 19 tuberculosis patients were judged correctly, a specificity of 100.00%; and 48 of the 48 similar-symptom patients were judged correctly, a specificity of 100.00%.
Example 4 Identification of characteristic polypeptides of novel coronavirus infection
After the peaks to be identified had been determined in Examples 2 and 3, 7 serum samples in which the peaks to be identified had different intensities were selected from the pretreated samples. After the samples were reduced with DTT, proteins with a molecular weight above 50 kDa were removed by ultrafiltration centrifugation. The small proteins/polypeptides in the filtrate were separated by tricine-SDS-PAGE. Each band was digested in-gel and then identified by tandem mass spectrometry.
Polypeptide sequence identification was performed on a nano-LC-MS/MS platform comprising a nanoflow HPLC (Thermo Fisher Scientific, USA) and a Q-Exactive mass spectrometer (Thermo Fisher Scientific, USA). The instrument was operated in positive ion mode with a scanning range of 300-1400 m/z. The MS1 resolution was 70000 and the MS2 resolution was 17500.
Liquid chromatography column: model: Exsil Pure 120C18 (Dr. Maisch GmbH, USA); dimensions: 360 μm × 12 cm; inner diameter: 150 μm; particle size: 1.9 μm. Elution mode: linear gradient of mobile phase B (80% acetonitrile, 0.1% formic acid) from 7% B to 45% B. Flow rate: 600 nL/min; total time: 38 minutes. The identification results are shown in Tables 3 and 4.
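As a small worked illustration of the elution program, the following Python sketch returns the mobile phase B percentage at a given time. It assumes that the linear gradient from 7% B to 45% B spans the full 38 minutes, which the text does not state explicitly; the function name is hypothetical.

```python
def percent_b(t_min, start=7.0, end=45.0, duration_min=38.0):
    """Mobile phase B percentage at time t_min for a linear gradient."""
    if t_min <= 0:
        return start
    if t_min >= duration_min:
        return end
    return start + (end - start) * t_min / duration_min
```

Under this assumption the gradient rises by 1% B per minute, so the midpoint of the run (19 minutes) corresponds to 26% B.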
TABLE 3 Identification of characteristic peak polypeptides
m/z | Gene name | Protein name
5158 | H2AJ | Histone H2A.J
6357 | S100A7 | Protein S100-A7
6654 | IGLL5 | Immunoglobulin lambda-like polypeptide 5
6939 | UBB | Polyubiquitin-B
7364 | IGKV3-7 | Probable non-functional immunoglobulin kappa variable 3-7
7614 | PF4V1 | Platelet factor 4 variant
8034 | IGKV3-15 | Immunoglobulin kappa variable 3-15
8226 | CFI | Complement factor I
8986 | RAB7A | Ras-related protein Rab-7a
9626 | ELANE | Neutrophil elastase
13719 | B2M | Beta-2-microglobulin
13765 | TTR | Transthyretin
13886 | PPBP | Platelet basic protein
14049 | DUSP14 | Dual specificity protein phosphatase 14
14095 | H2AC11 | Histone H2A type 1
14102 | H2AC6 | Histone H2A type 1-C
15123 | HBA1 | Hemoglobin subunit alpha
15867 | HBB | Hemoglobin subunit beta
28091 | WRAP73 | WD repeat-containing protein WRAP73
11435 | SAA1 | Serum amyloid A-1 protein
11495 | SAA2 | Serum amyloid A-2 protein
11523 | SAA1 | Serum amyloid A-1 protein
11680 | SAA1 | Serum amyloid A-1 protein
TABLE 4 polypeptide identification sequences
Example 5 Blind test of the screening model for patients with novel coronavirus infection
After model training, one model was built with 25 characteristic polypeptide fragments comprising SEQ ID NO. 1-15 as input variables and another with 29 characteristic polypeptide fragments comprising SEQ ID NO. 1-19 as input variables; in addition, a model was built with only the 19 sequenced characteristic polypeptide fragments (i.e., SEQ ID NO. 1-19) as input variables.
Following the method of Example 3, the samples from 49 patients, 12 normal persons, 14 tuberculosis patients and 25 similar-symptom patients were predicted blind with the above three models to determine the type of each sample; the method was the same as in the above example. The results are shown in Tables 5-1, 5-2 and 5-3, respectively.
TABLE 5-1 Prediction results for test samples using 25 variables
Sample | Number of cases | Predicted novel coronavirus infection | Predicted non-novel coronavirus infection | Prediction accuracy (%)
Patient group | 49 | 48 | 1 | 97.96
Normal group | 12 | 0 | 12 | 100.00
Tuberculosis group | 14 | 0 | 14 | 100.00
Symptom analogue group | 25 | 0 | 25 | 100.00
Total | 100 | | | 99.00
From Table 5-1, it can be seen that the results for the test group samples are: 12 of the 12 normal cases were judged correctly, a specificity of 100.00%; 48 of the 49 patients were judged correctly, a sensitivity of 97.96%; 14 of the 14 tuberculosis patients were judged correctly, a specificity of 100.00%; and 25 of the 25 similar-symptom patients were judged correctly, a specificity of 100.00%.
TABLE 5-2 Prediction results for test samples using 29 variables
Sample | Number of cases | Predicted novel coronavirus infection | Predicted non-novel coronavirus infection | Prediction accuracy (%)
Patient group | 49 | 48 | 1 | 97.96
Normal group | 12 | 0 | 12 | 100.00
Tuberculosis group | 14 | 0 | 14 | 100.00
Symptom analogue group | 25 | 0 | 25 | 100.00
Total | 100 | | | 99.00
From Table 5-2, it can be seen that the results for the test group samples are: 12 of the 12 normal cases were judged correctly, a specificity of 100.00%; 48 of the 49 patients were judged correctly, a sensitivity of 97.96%; 14 of the 14 tuberculosis patients were judged correctly, a specificity of 100.00%; and 25 of the 25 similar-symptom patients were judged correctly, a specificity of 100.00%.
As can be seen from Tables 5-1 and 5-2, the prediction accuracy for the same 100 samples met the criteria for clinical diagnosis in both cases. The two accuracies are identical, possibly because the number of domestic patients tested was too small to reveal a difference. However, based on the ten-fold cross-validation accuracy, it can be predicted that, as the number of patients tested increases, the mass spectrometry diagnostic model using 29 characteristic polypeptides will exhibit higher accuracy.
TABLE 5-3 Prediction results for test samples using 19 variables
Sample | Number of cases | Predicted novel coronavirus infection | Predicted non-novel coronavirus infection | Prediction accuracy (%)
Patient group | 49 | 46 | 3 | 93.88
Normal group | 12 | 0 | 12 | 100.00
Tuberculosis group | 14 | 0 | 14 | 100.00
Symptom analogue group | 25 | 4 | 21 | 84.00
Total | 100 | | | 93.00
From Table 5-3, it can be seen that the results for the test group samples are: 46 of the 49 novel coronavirus patients were judged correctly, a sensitivity of 93.88%; 12 of the 12 normal cases were judged correctly, a specificity of 100.00%; 14 of the 14 tuberculosis patients were judged correctly, a specificity of 100.00%; and 21 of the 25 similar-symptom patients were judged correctly, a specificity of 84.00%. This shows that, for healthy persons and tuberculosis patients, the model built from the input variables of the 19 characteristic polypeptides achieves the same specificity as the model using the complete variable set, with only a small number of misjudgments in the other two groups. The model meets the clinical requirement of rapid screening and diagnosis of patients.
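The per-group and overall accuracies in Table 5-3 follow directly from the case counts. A short Python sketch of that arithmetic is given below; the row layout and function name are hypothetical, and only the counts are taken from the table.

```python
# Rows of Table 5-3: (group, cases, predicted infected, predicted not infected, truly infected)
TABLE_5_3 = [
    ("Patient group", 49, 46, 3, True),
    ("Normal group", 12, 0, 12, False),
    ("Tuberculosis group", 14, 0, 14, False),
    ("Symptom analogue group", 25, 4, 21, False),
]


def summarize(rows):
    """Return per-group accuracy (%) and overall accuracy (%) for a prediction table."""
    per_group = {}
    correct_total = 0
    n_total = 0
    for name, n_cases, pred_pos, pred_neg, infected in rows:
        if pred_pos + pred_neg != n_cases:
            raise ValueError(f"row {name!r} does not add up")
        # A prediction is correct when it matches the true infection status.
        correct = pred_pos if infected else pred_neg
        per_group[name] = round(100.0 * correct / n_cases, 2)
        correct_total += correct
        n_total += n_cases
    overall = round(100.0 * correct_total / n_total, 2)
    return per_group, overall


per_group, overall = summarize(TABLE_5_3)
```

Here 46 + 12 + 14 + 21 = 93 of the 100 samples are classified correctly, which gives the overall accuracy of 93.00% shown in the last row of the table.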
In addition, the tables above show that, in the blind test, the accuracy of the complete set of 29 characteristic polypeptide variables for the novel coronavirus infection group is essentially the same as in model training, while the prediction accuracy for the non-infected groups reaches 100%. Building on the trained model, an experimenter can therefore exclude false positive results through fine optimization, which supports the reliability of a positive diagnostic result and avoids missed diagnoses and/or misdiagnoses to the greatest extent; this is of positive significance.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the technical principle of the present invention, and these improvements and modifications should also be regarded as falling within the scope of protection of the present invention.
Sequence listing
<110> Beijing Clin Bochuang Biotechnology Co., Ltd.
<120> Kit for diagnosing novel coronavirus pneumonia
<140> 2021101571109
<141> 2021-02-04
<160> 19
<170> SIPOSequenceListing 1.0
<210> 1
<211> 61
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 1
Thr Ile Thr Leu Glu Val Glu Pro Ser Asp Thr Ile Glu Asn Val Lys
1 5 10 15
Ala Lys Ile Gln Asp Lys Glu Gly Ile Pro Pro Asp Gln Gln Arg Leu
20 25 30
Ile Phe Ala Gly Lys Gln Leu Glu Asp Gly Arg Thr Leu Ser Asp Tyr
35 40 45
Asn Ile Gln Lys Glu Ser Thr Leu His Leu Val Leu Arg
50 55 60
<210> 2
<211> 68
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 2
Glu Glu Asp Gly Asp Leu Gln Cys Leu Cys Val Lys Thr Thr Ser Gln
1 5 10 15
Val Arg Pro Arg His Ile Thr Ser Leu Glu Val Ile Lys Ala Gly Pro
20 25 30
His Cys Pro Thr Ala Gln Leu Ile Ala Thr Leu Lys Asn Gly Arg Lys
35 40 45
Ile Cys Leu Asp Leu Gln Ala Leu Leu Tyr Lys Lys Ile Ile Lys Glu
50 55 60
His Leu Glu Ser
65
<210> 3
<211> 74
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 3
Met Glu Ala Pro Ala Gln Leu Leu Phe Leu Leu Leu Leu Trp Leu Pro
1 5 10 15
Asp Thr Thr Gly Glu Ile Val Met Thr Gln Ser Pro Ala Thr Leu Ser
20 25 30
Val Ser Pro Gly Glu Arg Ala Thr Leu Ser Cys Arg Ala Ser Gln Ser
35 40 45
Val Ser Ser Asn Leu Ala Trp Tyr Gln Gln Lys Pro Gly Gln Ala Pro
50 55 60
Arg Leu Leu Ile Tyr Gly Ala Ser Thr Arg
65 70
<210> 4
<211> 69
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 4
Met Lys Leu Leu His Val Phe Leu Leu Phe Leu Cys Phe His Leu Arg
1 5 10 15
Phe Cys Lys Val Thr Tyr Thr Ser Gln Glu Asp Leu Val Glu Lys Lys
20 25 30
Cys Leu Ala Lys Lys Tyr Thr His Leu Ser Cys Asp Lys Val Phe Cys
35 40 45
Gln Pro Trp Gln Arg Cys Ile Glu Gly Thr Cys Val Cys Lys Leu Pro
50 55 60
Tyr Gln Cys Pro Lys
65
<210> 5
<211> 79
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 5
Met Thr Ser Arg Lys Lys Val Leu Leu Lys Val Ile Ile Leu Gly Asp
1 5 10 15
Ser Gly Val Gly Lys Thr Ser Leu Met Asn Gln Tyr Val Asn Lys Lys
20 25 30
Phe Ser Asn Gln Tyr Lys Ala Thr Ile Gly Ala Asp Phe Leu Thr Lys
35 40 45
Glu Val Met Val Asp Asp Arg Leu Val Thr Met Gln Ile Trp Asp Thr
50 55 60
Ala Gly Gln Glu Arg Phe Gln Ser Leu Gly Val Ala Phe Tyr Arg
65 70 75
<210> 6
<211> 91
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 6
Met Thr Leu Gly Arg Arg Leu Ala Cys Leu Phe Leu Ala Cys Val Leu
1 5 10 15
Pro Ala Leu Leu Leu Gly Gly Thr Ala Leu Ala Ser Glu Ile Val Gly
20 25 30
Gly Arg Arg Ala Arg Pro His Ala Trp Pro Phe Met Val Ser Leu Gln
35 40 45
Leu Arg Gly Gly His Phe Cys Gly Ala Thr Leu Ile Ala Pro Asn Phe
50 55 60
Val Met Ser Ala Ala His Cys Val Ala Asn Val Asn Val Arg Ala Val
65 70 75 80
Arg Val Val Leu Gly Ala His Asn Leu Ser Arg
85 90
<210> 7
<211> 119
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 7
Met Ser Arg Ser Val Ala Leu Ala Val Leu Ala Leu Leu Ser Leu Ser
1 5 10 15
Gly Leu Glu Ala Ile Gln Arg Thr Pro Lys Ile Gln Val Tyr Ser Arg
20 25 30
His Pro Ala Glu Asn Gly Lys Ser Asn Phe Leu Asn Cys Tyr Val Ser
35 40 45
Gly Phe His Pro Ser Asp Ile Glu Val Asp Leu Leu Lys Asn Gly Glu
50 55 60
Arg Ile Glu Lys Val Glu His Ser Asp Leu Ser Phe Ser Lys Asp Trp
65 70 75 80
Ser Phe Tyr Leu Leu Tyr Tyr Thr Glu Phe Thr Pro Thr Glu Lys Asp
85 90 95
Glu Tyr Ala Cys Arg Val Asn His Val Thr Leu Ser Gln Pro Lys Ile
100 105 110
Val Lys Trp Asp Arg Asp Met
115
<210> 8
<211> 127
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 8
Gly Pro Thr Gly Thr Gly Glu Ser Lys Cys Pro Leu Met Val Lys Val
1 5 10 15
Leu Asp Ala Val Arg Gly Ser Pro Ala Ile Asn Val Ala Val His Val
20 25 30
Phe Arg Lys Ala Ala Asp Asp Thr Trp Glu Pro Phe Ala Ser Gly Lys
35 40 45
Thr Ser Glu Ser Gly Glu Leu His Gly Leu Thr Thr Glu Glu Glu Phe
50 55 60
Val Glu Gly Ile Tyr Lys Val Glu Ile Asp Thr Lys Ser Tyr Trp Lys
65 70 75 80
Ala Leu Gly Ile Ser Pro Phe His Glu His Ala Glu Val Val Phe Thr
85 90 95
Ala Asn Asp Ser Gly Pro Arg Arg Tyr Thr Ile Ala Ala Leu Leu Ser
100 105 110
Pro Tyr Ser Tyr Ser Thr Thr Ala Val Val Thr Asn Pro Lys Glu
115 120 125
<210> 9
<211> 128
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 9
Met Ser Leu Arg Leu Asp Thr Thr Pro Ser Cys Asn Ser Ala Arg Pro
1 5 10 15
Leu His Ala Leu Gln Val Leu Leu Leu Leu Ser Leu Leu Leu Thr Ala
20 25 30
Leu Ala Ser Ser Thr Lys Gly Gln Thr Lys Arg Asn Leu Ala Lys Gly
35 40 45
Lys Glu Glu Ser Leu Asp Ser Asp Leu Tyr Ala Glu Leu Arg Cys Met
50 55 60
Cys Ile Lys Thr Thr Ser Gly Ile His Pro Lys Asn Ile Gln Ser Leu
65 70 75 80
Glu Val Ile Gly Lys Gly Thr His Cys Asn Gln Val Glu Val Ile Ala
85 90 95
Thr Leu Lys Asp Gly Arg Lys Ile Cys Leu Asp Pro Asp Ala Pro Arg
100 105 110
Ile Lys Lys Ile Val Gln Lys Lys Leu Ala Gly Asp Glu Ser Ala Asp
115 120 125
<210> 10
<211> 123
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 10
Val Pro Leu Ala Asp Met Pro His Ala Pro Ile Gly Leu Tyr Phe Asp
1 5 10 15
Thr Val Ala Asp Lys Ile His Ser Val Ser Arg Lys His Gly Ala Thr
20 25 30
Leu Val His Cys Ala Ala Gly Val Ser Arg Ser Ala Thr Leu Cys Ile
35 40 45
Ala Tyr Leu Met Lys Phe His Asn Val Cys Leu Leu Glu Ala Tyr Asn
50 55 60
Trp Val Lys Ala Arg Arg Pro Val Ile Arg Pro Asn Val Gly Phe Trp
65 70 75 80
Arg Gln Leu Ile Asp Tyr Glu Arg Gln Leu Phe Gly Lys Ser Thr Val
85 90 95
Lys Met Val Gln Thr Pro Tyr Gly Ile Val Pro Asp Val Tyr Glu Lys
100 105 110
Glu Ser Arg His Leu Met Pro Tyr Trp Gly Ile
115 120
<210> 11
<211> 130
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 11
Met Ser Gly Arg Gly Lys Gln Gly Gly Lys Ala Arg Ala Lys Ala Lys
1 5 10 15
Ser Arg Ser Ser Arg Ala Gly Leu Gln Phe Pro Val Gly Arg Val His
20 25 30
Arg Leu Leu Arg Lys Gly Asn Tyr Ala Glu Arg Val Gly Ala Gly Ala
35 40 45
Pro Val Tyr Met Ala Ala Val Leu Glu Tyr Leu Thr Ala Glu Ile Leu
50 55 60
Glu Leu Ala Gly Asn Ala Ala Arg Asp Asn Lys Lys Thr Arg Ile Ile
65 70 75 80
Pro Arg His Leu Gln Leu Ala Ile Arg Asn Asp Glu Glu Leu Asn Lys
85 90 95
Leu Leu Gly Lys Val Thr Ile Ala Gln Gly Gly Val Leu Pro Asn Ile
100 105 110
Gln Ala Val Leu Leu Pro Lys Lys Thr Glu Ser His His Lys Ala Lys
115 120 125
Gly Lys
130
<210> 12
<211> 130
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 12
Met Ser Gly Arg Gly Lys Gln Gly Gly Lys Ala Arg Ala Lys Ala Lys
1 5 10 15
Ser Arg Ser Ser Arg Ala Gly Leu Gln Phe Pro Val Gly Arg Val His
20 25 30
Arg Leu Leu Arg Lys Gly Asn Tyr Ala Glu Arg Val Gly Ala Gly Ala
35 40 45
Pro Val Tyr Leu Ala Ala Val Leu Glu Tyr Leu Thr Ala Glu Ile Leu
50 55 60
Glu Leu Ala Gly Asn Ala Ala Arg Asp Asn Lys Lys Thr Arg Ile Ile
65 70 75 80
Pro Arg His Leu Gln Leu Ala Ile Arg Asn Asp Glu Glu Leu Asn Lys
85 90 95
Leu Leu Gly Arg Val Thr Ile Ala Gln Gly Gly Val Leu Pro Asn Ile
100 105 110
Gln Ala Val Leu Leu Pro Lys Lys Thr Glu Ser His His Lys Ala Lys
115 120 125
Gly Lys
130
<210> 13
<211> 141
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 13
Val Leu Ser Pro Ala Asp Lys Thr Asn Val Lys Ala Ala Trp Gly Lys
1 5 10 15
Val Gly Ala His Ala Gly Glu Tyr Gly Ala Glu Ala Leu Glu Arg Met
20 25 30
Phe Leu Ser Phe Pro Thr Thr Lys Thr Tyr Phe Pro His Phe Asp Leu
35 40 45
Ser His Gly Ser Ala Gln Val Lys Gly His Gly Lys Lys Val Ala Asp
50 55 60
Ala Leu Thr Asn Ala Val Ala His Val Asp Asp Met Pro Asn Ala Leu
65 70 75 80
Ser Ala Leu Ser Asp Leu His Ala His Lys Leu Arg Val Asp Pro Val
85 90 95
Asn Phe Lys Leu Leu Ser His Cys Leu Leu Val Thr Leu Ala Ala His
100 105 110
Leu Pro Ala Glu Phe Thr Pro Ala Val His Ala Ser Leu Asp Lys Phe
115 120 125
Leu Ala Ser Val Ser Thr Val Leu Thr Ser Lys Tyr Arg
130 135 140
<210> 14
<211> 146
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 14
Val His Leu Thr Pro Glu Glu Lys Ser Ala Val Thr Ala Leu Trp Gly
1 5 10 15
Lys Val Asn Val Asp Glu Val Gly Gly Glu Ala Leu Gly Arg Leu Leu
20 25 30
Val Val Tyr Pro Trp Thr Gln Arg Phe Phe Glu Ser Phe Gly Asp Leu
35 40 45
Ser Thr Pro Asp Ala Val Met Gly Asn Pro Lys Val Lys Ala His Gly
50 55 60
Lys Lys Val Leu Gly Ala Phe Ser Asp Gly Leu Ala His Leu Asp Asn
65 70 75 80
Leu Lys Gly Thr Phe Ala Thr Leu Ser Glu Leu His Cys Asp Lys Leu
85 90 95
His Val Asp Pro Glu Asn Phe Arg Leu Leu Gly Asn Val Leu Val Cys
100 105 110
Val Leu Ala His His Phe Gly Lys Glu Phe Thr Pro Pro Val Gln Ala
115 120 125
Ala Tyr Gln Lys Val Val Ala Gly Val Ala Asn Ala Leu Ala His Lys
130 135 140
Tyr His
145
<210> 15
<211> 257
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 15
Ile Leu Leu Tyr Ser Leu Asp Gly Arg Leu Leu Ser Thr Tyr Ser Ala
1 5 10 15
Tyr Glu Trp Ser Leu Gly Ile Lys Ser Val Ala Trp Ser Pro Ser Ser
20 25 30
Gln Phe Leu Ala Val Gly Ser Tyr Asp Gly Lys Val Arg Ile Leu Asn
35 40 45
His Val Thr Trp Lys Met Ile Thr Glu Phe Gly His Pro Ala Ala Ile
50 55 60
Asn Asp Pro Lys Ile Val Val Tyr Lys Glu Ala Glu Lys Ser Pro Gln
65 70 75 80
Leu Gly Leu Gly Cys Leu Ser Phe Pro Pro Pro Arg Ala Gly Ala Gly
85 90 95
Pro Leu Pro Ser Ser Glu Ser Lys Tyr Glu Ile Ala Ser Val Pro Val
100 105 110
Ser Leu Gln Thr Leu Lys Pro Val Thr Asp Arg Ala Asn Pro Lys Ile
115 120 125
Gly Ile Gly Met Leu Ala Phe Ser Pro Asp Ser Tyr Phe Leu Ala Thr
130 135 140
Arg Asn Asp Asn Ile Pro Asn Ala Val Trp Val Trp Asp Ile Gln Lys
145 150 155 160
Leu Arg Leu Phe Ala Val Leu Glu Gln Leu Ser Pro Val Arg Ala Phe
165 170 175
Gln Trp Asp Pro Gln Gln Pro Arg Leu Ala Ile Cys Thr Gly Gly Ser
180 185 190
Arg Leu Tyr Leu Trp Ser Pro Ala Gly Cys Met Ser Val Gln Val Pro
195 200 205
Gly Glu Gly Asp Phe Ala Val Leu Ser Leu Cys Trp His Leu Ser Gly
210 215 220
Asp Ser Met Ala Leu Leu Ser Lys Asp His Phe Cys Leu Cys Phe Leu
225 230 235 240
Glu Thr Glu Ala Val Val Gly Thr Ala Cys Arg Gln Leu Gly Gly His
245 250 255
Thr
<210> 16
<211> 102
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 16
Phe Phe Ser Phe Leu Gly Glu Ala Phe Asp Gly Ala Arg Asp Met Trp
1 5 10 15
Arg Ala Tyr Ser Asp Met Arg Glu Ala Asn Tyr Ile Gly Ser Asp Lys
20 25 30
Tyr Phe His Ala Arg Gly Asn Tyr Asp Ala Ala Lys Arg Gly Pro Gly
35 40 45
Gly Val Trp Ala Ala Glu Ala Ile Ser Asp Ala Arg Glu Asn Ile Gln
50 55 60
Arg Phe Phe Gly His Gly Ala Glu Asp Ser Leu Ala Asp Gln Ala Ala
65 70 75 80
Asn Glu Trp Gly Arg Ser Gly Lys Asp Pro Asn His Phe Arg Pro Ala
85 90 95
Gly Leu Pro Glu Lys Tyr
100
<210> 17
<211> 103
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 17
Ser Phe Phe Ser Phe Leu Gly Glu Ala Phe Asp Gly Ala Arg Asp Met
1 5 10 15
Trp Arg Ala Tyr Ser Asp Met Arg Glu Ala Asn Tyr Ile Gly Ser Asp
20 25 30
Lys Tyr Phe His Ala Arg Gly Asn Tyr Asp Ala Ala Lys Arg Gly Pro
35 40 45
Gly Gly Ala Trp Ala Ala Glu Val Ile Ser Asn Ala Arg Glu Asn Ile
50 55 60
Gln Arg Leu Thr Gly Arg Gly Ala Glu Asp Ser Leu Ala Asp Gln Ala
65 70 75 80
Ala Asn Lys Trp Gly Arg Ser Gly Arg Asp Pro Asn His Phe Arg Pro
85 90 95
Ala Gly Leu Pro Glu Lys Tyr
100
<210> 18
<211> 103
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 18
Ser Phe Phe Ser Phe Leu Gly Glu Ala Phe Asp Gly Ala Arg Asp Met
1 5 10 15
Trp Arg Ala Tyr Ser Asp Met Arg Glu Ala Asn Tyr Ile Gly Ser Asp
20 25 30
Lys Tyr Phe His Ala Arg Gly Asn Tyr Asp Ala Ala Lys Arg Gly Pro
35 40 45
Gly Gly Val Trp Ala Ala Glu Ala Ile Ser Asp Ala Arg Glu Asn Ile
50 55 60
Gln Arg Phe Phe Gly His Gly Ala Glu Asp Ser Leu Ala Asp Gln Ala
65 70 75 80
Ala Asn Glu Trp Gly Arg Ser Gly Lys Asp Pro Asn His Phe Arg Pro
85 90 95
Ala Gly Leu Pro Glu Lys Tyr
100
<210> 19
<211> 104
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 19
Arg Ser Phe Phe Ser Phe Leu Gly Glu Ala Phe Asp Gly Ala Arg Asp
1 5 10 15
Met Trp Arg Ala Tyr Ser Asp Met Arg Glu Ala Asn Tyr Ile Gly Ser
20 25 30
Asp Lys Tyr Phe His Ala Arg Gly Asn Tyr Asp Ala Ala Lys Arg Gly
35 40 45
Pro Gly Gly Val Trp Ala Ala Glu Ala Ile Ser Asp Ala Arg Glu Asn
50 55 60
Ile Gln Arg Phe Phe Gly His Gly Ala Glu Asp Ser Leu Ala Asp Gln
65 70 75 80
Ala Ala Asn Glu Trp Gly Arg Ser Gly Lys Asp Pro Asn His Phe Arg
85 90 95
Pro Ala Gly Leu Pro Glu Lys Tyr
100

Claims (4)

1. A kit for detecting novel coronavirus infection, wherein the kit comprises software or a chip containing a standard database of characteristic polypeptides, which provides standard data or curves for comparison with the mass spectrometry results of a sample to be tested in order to determine the expression profile of the characteristic polypeptides in the sample to be tested, comprising a polypeptide composition consisting of:
a characteristic polypeptide with a mass-to-charge ratio of 6939m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 1;
a characteristic polypeptide with a mass-to-charge ratio of 7614m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 2;
a characteristic polypeptide with a mass-to-charge ratio of 8034m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 3;
a characteristic polypeptide with a mass-to-charge ratio of 8226m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 4;
a characteristic polypeptide with a mass-to-charge ratio of 8986m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 5;
a characteristic polypeptide with a mass-to-charge ratio of 9626m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 6;
a characteristic polypeptide with a mass-to-charge ratio of 13719m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 7;
a characteristic polypeptide with a mass-to-charge ratio of 13765m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 8;
a characteristic polypeptide with a mass-to-charge ratio of 13886m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 9;
a characteristic polypeptide with a mass-to-charge ratio of 14049m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 10;
a characteristic polypeptide with a mass-to-charge ratio of 14095m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 11;
a characteristic polypeptide with a mass-to-charge ratio of 14102m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 12;
a characteristic polypeptide with a mass-to-charge ratio of 15123m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 13;
a characteristic polypeptide with a mass-to-charge ratio of 15867m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 14;
a characteristic polypeptide with a mass-to-charge ratio of 28091m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 15;
a characteristic polypeptide with a mass-to-charge ratio of 11435m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 16;
a characteristic polypeptide with a mass-to-charge ratio of 11495m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 17;
a characteristic polypeptide with a mass-to-charge ratio of 11523m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 18;
a characteristic polypeptide with a mass-to-charge ratio of 11680m/z, the polypeptide sequence of which is the sequence shown in SEQ ID No. 19;
wherein, when mass spectrometry detection of a serum sample shows that the peaks of the characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the provider of the serum sample is determined to be a patient infected with the novel coronavirus, with a ten-fold cross-validation accuracy of about 93.31%.
2. The kit of claim 1, wherein the polypeptide composition consists only of the characteristic polypeptides having mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z and 14102m/z, and wherein, when the peaks of the characteristic polypeptides 8986m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13886m/z, 14049m/z and 14102m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the provider of the serum sample is determined to be a patient infected with the novel coronavirus, with a ten-fold cross-validation accuracy of about 91%.
3. The kit according to any one of claims 1-2, wherein the kit comprises a sample processing fluid.
4. The kit of claim 3, wherein the kit further comprises a standard sample tube for mass spectrometry, which is used to ensure that the molecular weights measured by the mass spectrometer are accurate.
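
The up/down peak pattern recited in claims 1 and 2 can be illustrated with a short Python sketch. It rests on assumptions: the claims do not define a threshold for "up-regulated" or "down-regulated", so the function below simply compares each peak with a reference intensity, and the function name, the reference values, and the sample intensities are all hypothetical. It is not the mass-spectrum model behind the accuracies reported in the claims.

```python
# Peak lists copied from claims 1 and 2 (values in m/z).
CLAIM1_UP = (7614, 8034, 8226, 8986, 9626, 15123, 15867, 28091)
CLAIM1_DOWN = (6939, 13719, 13765, 13886, 14049, 14095, 14102)
CLAIM2_UP = (8986, 28091)
CLAIM2_DOWN = (6939, 13886, 14049, 14102)


def is_positive(sample, reference, up, down):
    """Return True when every `up` peak exceeds, and every `down` peak
    falls below, its reference intensity.

    ASSUMPTION: "up-/down-regulated" is read as a plain comparison with a
    reference intensity; the claims do not specify a threshold.
    `sample` and `reference` map m/z values to peak intensities.
    """
    return (all(sample[mz] > reference[mz] for mz in up)
            and all(sample[mz] < reference[mz] for mz in down))


# Hypothetical intensities, invented purely to exercise the function.
reference = {mz: 1.0 for mz in CLAIM1_UP + CLAIM1_DOWN}
sample = {mz: 2.0 for mz in CLAIM1_UP}
sample.update({mz: 0.5 for mz in CLAIM1_DOWN})

claim1_call = is_positive(sample, reference, CLAIM1_UP, CLAIM1_DOWN)
claim2_call = is_positive(sample, reference, CLAIM2_UP, CLAIM2_DOWN)
print(claim1_call, claim2_call)
```

The six peaks of claim 2 are a subset of the fifteen peaks named in the pattern of claim 1, so any sample that satisfies the claim 1 pattern under this reading also satisfies the claim 2 pattern.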
CN202110157110.9A 2021-02-04 2021-02-04 Kit for diagnosing novel coronavirus infection Active CN114858906B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110157110.9A CN114858906B (en) 2021-02-04 2021-02-04 Kit for diagnosing novel coronavirus infection
PCT/CN2021/142779 WO2022166485A1 (en) 2021-02-04 2021-12-30 Kit for diagnosing covid-19

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110157110.9A CN114858906B (en) 2021-02-04 2021-02-04 Kit for diagnosing novel coronavirus infection

Publications (2)

Publication Number Publication Date
CN114858906A (en) 2022-08-05
CN114858906B (en) 2024-08-09

Family

ID=82623579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110157110.9A Active CN114858906B (en) 2021-02-04 2021-02-04 Kit for diagnosing novel coronavirus infection

Country Status (2)

Country Link
CN (1) CN114858906B (en)
WO (1) WO2022166485A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023123175A1 (en) * 2021-12-30 2023-07-06 北京毅新博创生物科技有限公司 Method for evaluating whether individual completes vaccination or individual immune changes
CN116087482B (en) * 2023-02-24 2023-07-11 广州国家实验室 Biomarkers for severity typing of course of patients with 2019 novel coronavirus infection

Citations (1)

Publication number Priority date Publication date Assignee Title
CN111830120A (en) * 2020-06-10 2020-10-27 北京东西分析仪器有限公司 Kit for identifying new coronavirus by using mass spectrometry system and use method thereof

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
WO2004092208A2 (en) * 2003-04-15 2004-10-28 Her Majesty The Queen In Right Of Canada As Represented By The Minister Of Health Sars-related proteins
US8057993B2 (en) * 2003-04-26 2011-11-15 Ibis Biosciences, Inc. Methods for identification of coronaviruses
WO2005043111A2 (en) * 2003-07-14 2005-05-12 Ciphergen Biosystems, Inc. Serum biomarkers for sars
CN101196526A (en) * 2006-12-06 2008-06-11 许洋 Mass spectrometry reagent kit and method for rapid tuberculosis diagnosis
CN111220686A (en) * 2018-11-23 2020-06-02 中国科学院大连化学物理研究所 Method for establishing mass spectrum database based on virus identification
US12360112B2 (en) * 2018-12-19 2025-07-15 Rudjer Boskovic Institute Method for identification of viruses and diagnostic kit using the same
CN110632326A (en) * 2019-10-01 2019-12-31 北京毅新博创生物科技有限公司 Characteristic protein marker composition for mass spectrometry diagnosis of thalassemia and its diagnostic product
CN111455062B (en) * 2020-04-01 2022-02-11 中国人民解放军总医院 A kit and platform for detection of novel coronavirus susceptibility genes
CN111499692B (en) * 2020-06-16 2020-12-04 国家纳米科学中心 Peptides targeting novel coronavirus COVID-19 and their applications
CN112798679B (en) * 2020-10-16 2023-06-20 北京毅新博创生物科技有限公司 Kit for diagnosing novel coronavirus infection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Rapid classification and prediction of COVID-19 severity by MALDI-TOF mass spectrometry analysis of serum peptidome";Rosa M.Gomila et al.;medRXiv;20201103;1-24 *

Also Published As

Publication number Publication date
WO2022166485A1 (en) 2022-08-11
CN114858906A (en) 2022-08-05

Similar Documents

Publication Publication Date Title
CN112798679B (en) Kit for diagnosing novel coronavirus infection
Chalupová et al. Identification of fungal microorganisms by MALDI-TOF mass spectrometry
CN114858907B (en) Method for constructing a mass spectrometry model for diagnosing novel coronavirus infection
CN103308696B (en) Brucella rapid detection kit based on mass-spectrometric technique
CN110057955A (en) The screening technique of hepatitis B specific serum marker
CN107024530B (en) Method for detecting microorganisms by internal standard mass spectrometry and products thereof
CN114858906B (en) Kit for diagnosing novel coronavirus infection
CN101403740B (en) Mass spectrum model used for detecting liver cancer characteristic protein and preparation method thereof
CN114858903B (en) Characteristic polypeptide composition for diagnosing novel coronavirus infection
CN111307926A (en) Serum-based rapid detection method for Brucella vaccine strain infection
CN114858905B (en) Application of characteristic polypeptide composition and mass spectrum model in preparation of novel coronavirus infection detection product
CN114858904B (en) Mass spectrometry models comprising characteristic polypeptides for diagnosing novel coronavirus infections
CN110057954A (en) Blood plasma metabolic markers are in the application for diagnosing or monitoring HBV
Velichko et al. Classification and identification tasks in microbiology: mass spectrometric methods coming to the aid
CN116337986B (en) Quick identification method of salmonella kentucky based on MALDI-TOF MS
CN101196526A (en) Mass spectrometry reagent kit and method for rapid tuberculosis diagnosis
CN116087482B (en) Biomarkers for severity typing of course of patients with 2019 novel coronavirus infection
CN117637160A (en) Amyotrophic lateral sclerosis prediction and prognosis evaluation system
TW202208843A (en) Method of identification of methicillin-resistant staphylococcus aureus
CN110850079A (en) Application of diagnosis marker APOA1 for effect evaluation after liver transplantation
CN111220594A (en) A method for screening Streptomyces strains with pesticide activity by ultra-high resolution mass spectrometry
CN116298280A (en) Application of liquid biopsy index in preparation of endometriosis screening and diagnosis products
CN108103171B (en) Preparation method of vibrio parahaemolyticus fingerprint atlas database
CN108048540B (en) Method for preparing fingerprint library for detection of Vibrio cholerae typing
CN106770618A (en) A kind of method of the mass spectra model for setting up acute ischemic cerebral apoplexy characteristic protein

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant