CN114858906B

CN114858906B - Kit for diagnosing novel coronavirus infection

Info

Publication number: CN114858906B
Application number: CN202110157110.9A
Authority: CN
Inventors: 廖璞; 孙巍; 乔亮; 吕倩; 马庆伟
Original assignee: Beijing Clin Bochuang Biotechnology Co Ltd
Current assignee: Beijing Clin Bochuang Biotechnology Co Ltd
Priority date: 2021-02-04
Filing date: 2021-02-04
Publication date: 2024-08-09
Anticipated expiration: 2041-02-04
Also published as: WO2022166485A1; CN114858906A

Abstract

The invention provides a characteristic polypeptide composition for detecting novel coronavirus infection, which comprises 29 characteristic polypeptides with specific mass-to-charge ratios; whether a sample comes from a patient with novel coronavirus infection can be judged by analyzing the expression of these characteristic polypeptides. The invention also provides a mass spectrometry model prepared from the characteristic polypeptide composition, its applications, and products for diagnosing novel coronavirus infection. For the first time, the invention provides multiple combinations of characteristic proteins that differ between patients with novel coronavirus infection and normal persons, pulmonary tuberculosis patients, and control patients with symptoms resembling novel coronavirus infection. It thereby breaks away from the traditional approach of searching for characteristic polypeptides only in normal persons and patients with novel coronavirus infection, and effectively avoids false-positive results caused by symptoms similar to those of novel coronavirus infection. The method is simple to operate, low in cost and highly accurate, and is expected to be used for large-scale screening of novel coronavirus infection.

Description

Kit for diagnosing novel coronavirus infection

Technical Field

The invention belongs to the field of detection, and relates to a technology for rapidly detecting novel coronavirus infection by using a time-of-flight mass spectrometry technology.

Background

Coronaviruses are a class of pathogens that primarily cause respiratory and intestinal diseases. The surface of the virus particle carries many regularly arranged protrusions, so that the whole particle resembles an imperial crown, hence the name "coronavirus". In addition to humans, coronaviruses can infect a variety of mammals, such as pigs, cattle, cats, dogs, minks, camels, bats, mice and hedgehogs, as well as a variety of birds.

Six types of human coronaviruses are known to date. Four of these are relatively common in the population and are weakly pathogenic, generally causing only mild respiratory symptoms like those of the common cold. The other two, severe acute respiratory syndrome coronavirus and Middle East respiratory syndrome coronavirus (SARS coronavirus and MERS coronavirus for short), can cause severe respiratory disease.

The novel coronavirus COVID-19 is a coronavirus strain that had never before been found in humans. Its transmission pattern, infection mechanism, and patterns of evolution and variation remain unclear, which makes it difficult to control.

To prevent the occurrence and spread of novel coronavirus (COVID-19) infection and to take prompt measures that effectively control the development and spread of the epidemic, rapid detection of novel coronavirus infection is particularly important. For a long time, coronaviruses have been identified by traditional microbiological methods, namely morphological, physiological and biochemical characterization and serological identification. These methods are highly accurate but too slow, taking tens of hours even at their fastest, so they can hardly meet the need for rapid detection. Nucleic acid detection based on multiplex PCR is of great significance for the early diagnosis of coronavirus infection and the discovery of sources of infection.

Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS for short) is a mass spectrometry technology that emerged in the late 1980s and has developed rapidly since. Its mass analyzer is an ion drift tube. Ions generated by the ion source are first collected, so that all ions in the collector have zero velocity; they are then accelerated by a pulsed electric field, enter the field-free drift tube, and fly to the ion receiver at constant velocity. The larger the ion mass, the longer the time needed to reach the receiver; the smaller the ion mass, the shorter the time. On this principle, ions of different masses can be separated according to their mass-to-charge ratio, and the molecular mass and purity of biological macromolecules such as polypeptides, proteins, nucleic acids and polysaccharides can be accurately determined. The method therefore offers high accuracy, high flexibility, high throughput, a short detection period and good cost performance.
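The relationship between ion mass and flight time described above can be illustrated with a short numerical sketch. It assumes the textbook idealization in which an ion of mass m and charge z starts at rest, gains kinetic energy zeU from the accelerating potential U, and then drifts a length L at constant velocity, so that t = L * sqrt(m / (2zeU)). The 20 kV accelerating voltage and 1 m drift length are illustrative assumptions, not instrument settings taken from this document.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge, coulombs
DALTON = 1.66053906660e-27  # unified atomic mass unit, kilograms


def flight_time(mass_da, charge=1, accel_voltage=20_000.0, drift_length=1.0):
    """Return the field-free drift time in seconds for an ideal ion.

    accel_voltage (volts) and drift_length (metres) are assumed example
    values, not parameters stated in the patent.
    """
    mass_kg = mass_da * DALTON
    # Energy balance: z * e * U = 0.5 * m * v**2
    velocity = math.sqrt(2 * charge * E_CHARGE * accel_voltage / mass_kg)
    return drift_length / velocity


# Singly charged ions at three of the mass-to-charge ratios named in the text.
for mz in (6939, 13886, 28091):
    print(f"{mz} m/z -> {flight_time(mz) * 1e6:.1f} microseconds")
```

Under this idealization the flight time scales with the square root of the mass-to-charge ratio, which is why heavier ions arrive later.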

In recent years, mass spectrometry techniques have emerged for detecting proteins or polypeptides characteristic of pathogenic microorganisms or viruses. For example, Chinese patent application CN102337223A, "Penicillium chrysogenum antifungal protein Pc-Arctin and its preparation method", discloses a MALDI-TOF identification method for the Penicillium chrysogenum antifungal protein Pc-Arctin. In that method, spores of Penicillium chrysogenum A096 are picked from a plate and inoculated into SGY liquid medium for culture; the crude protein solution obtained by pretreatment is separated and purified on a chromatographic column and then on a carboxymethyl cation-exchange column; the eluted fractions are collected and each is concentrated to the required volume by centrifugal ultrafiltration; the antifungal active fractions are tracked using Paecilomyces variotii as the sensitive indicator organism; the purity of the protein in the identified active fraction is assessed; and the single band on the SDS-PAGE gel is excised and identified by MALDI-TOF. The method is suitable only for a specific microorganism and requires multiple protein purification steps before the characteristic polypeptide Pc-Arctin is finally identified by MALDI-TOF, so the process is complex, the range of application is narrow, and the goal of detecting viruses by mass spectrometry cannot be achieved.

Chinese patent applications 201110154723, "Method for MALDI TOF MS assisted identification of Listeria monocytogenes", and 201110154469, "Method for MALDI TOF MS assisted identification of Vibrio cholerae", disclose methods for the assisted identification of bacteria using MALDI-TOF MS, comprising: pretreating bacterial cultures, collecting MALDI-TOF MS spectra of all strain samples, preparing a standard bacterial spectrum with software, acquiring the spectrum of the bacteria to be tested by the same method, comparing the spectra, and making a judgment according to the matching score. These methods use conventional sample treatment (treatment with absolute ethanol, formic acid and acetonitrile, assisted by centrifugation, with the supernatant finally drawn off for detection). Although they can characterize the spectrum of the bacteria to a certain extent, the detected material contains proteins, lipids, lipopolysaccharides and lipo-oligosaccharides, DNA, polypeptides and other ionizable molecules, so the resulting spectrum is essentially a superposition of the spectra of all these molecules. The amount of spectral information that must be processed and compared is therefore too large, and the specificity of the spectrum is low because too many kinds of molecules are detected. The methods are suitable only for specific bacteria and cannot be broadly extended to the detection of viruses.

Chinese patent application 200880121570, entitled "Methods and biomarkers for diagnosing and monitoring mental disorders", reports that nearly one hundred neuropeptides associated with mental disorders, including influenza virus, can be detected by MALDI-TOF mass spectrometry. However, that application only briefly summarizes various possible techniques; it reports neither a specific protocol nor a specific target for coronaviruses, and it therefore gives researchers little guidance on detecting influenza viruses by MALDI-TOF mass spectrometry.

Thus, there is a need for a novel characteristic polypeptide mass spectrometry model for detecting coronavirus infection by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS) and use thereof.

Disclosure of Invention

The first object of the present invention is to provide a characteristic polypeptide composition based on the serum peptidome that can detect novel coronavirus (COVID-19) infection by MALDI-TOF mass spectrometry, wherein the characteristic polypeptide composition comprises 25 characteristic polypeptides having the following mass-to-charge ratios: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z; or comprises 29 characteristic polypeptides having the following mass-to-charge ratios: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z.

In one embodiment, the characteristic polypeptide composition comprises 19 characteristic polypeptides having the following mass-to-charge ratios and polypeptide sequences:

a characteristic polypeptide with a mass-to-charge ratio of 6939m/z, the polypeptide sequence of which is shown in SEQ ID No. 1;

a characteristic polypeptide with a mass-to-charge ratio of 7614m/z, the polypeptide sequence of which is shown in SEQ ID No. 2;

a characteristic polypeptide with a mass-to-charge ratio of 8034m/z, the polypeptide sequence of which is shown in SEQ ID No. 3;

a characteristic polypeptide with a mass-to-charge ratio of 8226m/z, the polypeptide sequence of which is shown in SEQ ID No. 4;

a characteristic polypeptide with a mass-to-charge ratio of 8986m/z, the polypeptide sequence of which is shown in SEQ ID No. 5;

a characteristic polypeptide with a mass-to-charge ratio of 9626m/z, the polypeptide sequence of which is shown in SEQ ID No. 6;

a characteristic polypeptide with a mass-to-charge ratio of 13719m/z, the polypeptide sequence of which is shown in SEQ ID No. 7;

a characteristic polypeptide with a mass-to-charge ratio of 13765m/z, the polypeptide sequence of which is shown in SEQ ID No. 8;

a characteristic polypeptide with a mass-to-charge ratio of 13886m/z, the polypeptide sequence of which is shown in SEQ ID No. 9;

a characteristic polypeptide with a mass-to-charge ratio of 14049m/z, the polypeptide sequence of which is shown in SEQ ID No. 10;

a characteristic polypeptide with a mass-to-charge ratio of 14095m/z, the polypeptide sequence of which is shown in SEQ ID No. 11;

a characteristic polypeptide with a mass-to-charge ratio of 14102m/z, the polypeptide sequence of which is shown in SEQ ID No. 12;

a characteristic polypeptide with a mass-to-charge ratio of 15123m/z, the polypeptide sequence of which is shown in SEQ ID No. 13;

a characteristic polypeptide with a mass-to-charge ratio of 15867m/z, the polypeptide sequence of which is shown in SEQ ID No. 14;

a characteristic polypeptide with a mass-to-charge ratio of 28091m/z, the polypeptide sequence of which is shown in SEQ ID No. 15;

a characteristic polypeptide with a mass-to-charge ratio of 11435m/z, the polypeptide sequence of which is shown in SEQ ID No. 16;

a characteristic polypeptide with a mass-to-charge ratio of 11495m/z, the polypeptide sequence of which is shown in SEQ ID No. 17;

a characteristic polypeptide with a mass-to-charge ratio of 11523m/z, the polypeptide sequence of which is shown in SEQ ID No. 18;

a characteristic polypeptide with a mass-to-charge ratio of 11680m/z, the polypeptide sequence of which is shown in SEQ ID No. 19.

In any of the above embodiments, when the peaks of the characteristic polypeptides 8986m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13886m/z, 14049m/z and 14102m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is determined to be a novel coronavirus infected patient, and the ten-fold cross-validation accuracy is about 91%. In a preferred embodiment, the characteristic polypeptide composition comprises only the characteristic polypeptides having mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z and 14102m/z.

In another embodiment, when the peaks of the characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is a novel coronavirus infected patient, and the ten-fold cross-validation accuracy is about 93.31%. In a preferred embodiment, the characteristic polypeptide composition comprises only the characteristic polypeptides having mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z.

In other embodiments, when the peaks of the characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z and 28232m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is a novel coronavirus infected patient, and the ten-fold cross-validation accuracy is about 98.69%.
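The up-regulation/down-regulation pattern in the embodiments above can be sketched as a simple check. This is a deliberately simplified illustration using the six-peak panel: the reference intensities, the `min_fold` threshold and the all-or-nothing rule are assumptions made here for clarity, and the sketch is not the trained mass spectrometry model described in this document.

```python
# Six-peak panel from the text: peaks expected to rise or fall in positive samples.
UP_PEAKS = (8986, 28091)
DOWN_PEAKS = (6939, 13886, 14049, 14102)


def matches_positive_pattern(sample, reference, min_fold=1.0):
    """Return True if the sample shows the positive-sample peak pattern.

    sample and reference map m/z values to normalized peak intensities.
    reference stands in for control-group intensities (hypothetical input);
    min_fold is an assumed fold-change threshold, not a value from the patent.
    """
    up_ok = all(sample[mz] > reference[mz] * min_fold for mz in UP_PEAKS)
    down_ok = all(sample[mz] * min_fold < reference[mz] for mz in DOWN_PEAKS)
    return up_ok and down_ok


# Made-up intensities for illustration only.
reference = {mz: 100.0 for mz in UP_PEAKS + DOWN_PEAKS}
sample = {8986: 150.0, 28091: 180.0, 6939: 40.0, 13886: 55.0, 14049: 60.0, 14102: 30.0}
print(matches_positive_pattern(sample, reference))
```

A statistical model can weigh the peaks individually rather than requiring every peak to move in the expected direction, so a rule like this is best read as a description of the pattern, not as a classifier.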

It is a second object of the present invention to provide a mass spectrometry model for detecting novel coronavirus infections, which is prepared from a characteristic polypeptide composition having a mass-to-charge ratio peak of any of the above schemes.

In one embodiment, the mass spectrometry model is prepared from the characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z and 28232m/z, wherein when the peaks of the characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z and 28232m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is a novel coronavirus infected patient, and the ten-fold cross-validation accuracy is about 97.96%.

Alternatively, in another embodiment, the mass spectrometry model is prepared from the characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z and 28232m/z, wherein when the peaks of the characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z and 28232m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is a novel coronavirus infected patient, and the ten-fold cross-validation accuracy is about 98.69%.

In another embodiment, the mass spectrometry model is prepared from only the characteristic polypeptides having mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z, wherein when the peaks of the characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is a novel coronavirus infected patient, and the ten-fold cross-validation accuracy is about 93.31%.

In other embodiments, the mass spectrometry model is prepared from only the characteristic polypeptides having mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z and 14102m/z, wherein when the peaks of the characteristic polypeptides 8986m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13886m/z, 14049m/z and 14102m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is determined to be a novel coronavirus infected patient, and the ten-fold cross-validation accuracy is about 91%.

It is a third object of the present invention to provide a kit for detecting a novel coronavirus infection comprising the above-described characteristic polypeptide composition, or comprising the above-described mass spectrometry model.

In one embodiment, the polypeptide composition or mass spectrometry model is prepared from the characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z and 28232m/z, wherein when the peaks of the characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z and 28232m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is a novel coronavirus infected patient, and the ten-fold cross-validation accuracy is about 97.96%.

Or, in another embodiment, the polypeptide composition or mass spectrometry model is prepared from the characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z and 28232m/z, wherein when the peaks of the characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z and 28232m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is a novel coronavirus infected patient, and the ten-fold cross-validation accuracy is about 98.69%.

In another embodiment, the polypeptide composition or mass spectrometry model is prepared from only the characteristic polypeptides having mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z, wherein when the peaks of the characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is a novel coronavirus infected patient, and the ten-fold cross-validation accuracy is about 93.31%.

In other embodiments, the polypeptide composition or mass spectrometry model is prepared from only the characteristic polypeptides having mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z and 14102m/z, wherein when the peaks of the characteristic polypeptides 8986m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13886m/z, 14049m/z and 14102m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is determined to be a novel coronavirus infected patient, and the ten-fold cross-validation accuracy is about 91%.

In one embodiment, the kit comprises a sample treatment fluid developed by Beijing-based New Boc Biotechnology Inc.

In another embodiment, the kit further comprises standard mass spectrometry sample tubes for ensuring the accuracy of the molecular weights measured by the mass spectrometer. These may be several sample tubes each containing a single characteristic polypeptide, or one sample tube containing several characteristic polypeptides. The standards are run in parallel with the sample to be tested in order to judge whether the molecular weight information obtained for that sample is accurate and reliable.

In another embodiment, the kit can contain software or a chip of the standard database of the characteristic polypeptides, and can be used for providing standard data or curve comparison when a sample to be tested is subjected to mass spectrometry so as to judge the expression condition of the characteristic polypeptides in the sample to be tested.

It is a fourth object of the present invention to provide the use of said characteristic polypeptide composition, or said mass spectrometry model, for the preparation of a product for diagnosing a novel coronavirus infection.

In any of the above embodiments, the product for diagnosing a novel coronavirus infection refers to any conventional product for diagnosing a novel coronavirus infection, including: detection reagent, detection chip, detection carrier, detection kit, etc.

A fifth object of the present invention is to provide a method for constructing a mass spectrometry model, comprising:

1) Collecting serum samples from a plurality of clinically confirmed patients with novel coronavirus infection and from subjects without novel coronavirus infection (including tuberculosis patients, patients with similar symptoms such as fever and cough, and healthy people), and storing them frozen at low temperature until use;

2) Performing mass spectrometry pretreatment on the serum proteins;

3) Performing mass spectrometry detection and reading on the two groups of pretreated serum proteins to obtain fingerprint spectra of the two groups of serum polypeptides;

4) Standardizing the serum polypeptide fingerprint spectra of all patients and normal persons, and collecting the data;

5) Performing quality control on the obtained data; screening out characteristic polypeptides with the following mass-to-charge ratio peaks: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z; performing secondary mass spectrometry identification of the characteristic polypeptides; and establishing a mass spectrometry model for detecting novel coronavirus infection based on these mass-to-charge ratio peaks.

In one embodiment, step 5) comprises performing quality control on the obtained data; screening out characteristic polypeptides with the following mass-to-charge ratio peaks: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z; performing secondary mass spectrometry identification of the characteristic polypeptides; and establishing a mass spectrometry model for detecting novel coronavirus infection based on these mass-to-charge ratio peaks.

In a preferred embodiment, the mass spectrometry model of step 5) is prepared from only the characteristic polypeptides having mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z, wherein when the peaks of the characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is a novel coronavirus infected patient, and the ten-fold cross-validation accuracy is about 93.31%.

In another embodiment, the mass spectrometry model of step 5) is prepared from only the characteristic polypeptides having mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z and 14102m/z, wherein when the peaks of the characteristic polypeptides 8986m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13886m/z, 14049m/z and 14102m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is determined to be a novel coronavirus infected patient, and the ten-fold cross-validation accuracy is about 91%.

In any of the above embodiments, the characteristic polypeptides are respectively:

A characteristic polypeptide with a mass to charge ratio of 28091m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 15;

In any of the above embodiments, the pretreatment in step 2) comprises diluting the serum proteins or polypeptides in the stabilized sample with a sample treatment fluid.

In any of the above embodiments, in step 3), a universal polypeptide mass spectrometry pretreatment kit is used to dilute and read the two groups of serum proteins, so as to obtain fingerprint spectra of the two groups of serum polypeptides.

In any of the above embodiments, the quality control in step 5) includes measuring a blank matrix crystal spot with the same mass spectrometry parameters; if a distinct mass spectrum peak appears, the quality of the matrix solution is considered unacceptable.

In any of the above embodiments, the quality control in step 5) selects the following 8 characteristic peaks as quality control peaks: 6426m/z, 6623m/z, 8753m/z, 8785m/z, 8904m/z, 9118m/z, 9409m/z, 9700m/z.

In the detection of biological samples by time-of-flight mass spectrometry, the quality of a mass spectrum is influenced by many factors, such as individual differences, sample quality, changes in ambient temperature and humidity, and the crystallization state of the sample and matrix. To prevent abnormal spectra from affecting the analysis results, the above 8 characteristic peaks common to human serum were introduced as quality control peaks; the occurrence of these peaks is independent of whether the patient has a novel coronavirus infection. Of the 843 spectra collected, all 8 quality control peaks were detected in 683 (81.0% of the total) and 7 quality control peaks were detected in 156 (18.5% of the total). The following spectrum quality control criteria were set: the spectrum of a single sample passes quality control when it contains 6 to 8 quality control peaks and the deviation of the internal standard peak molecular weight is less than 0.002 (i.e., the deviation does not exceed 2‰). Spectra that fail must be re-acquired.
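The quality control criterion above can be sketched in a few lines. The sketch assumes that the 0.002 deviation is a relative deviation of the observed m/z from the nominal quality control peak (consistent with the 2‰ figure), and that peak picking has already produced a list of observed m/z values; the function names are hypothetical.

```python
# The 8 quality control peaks (m/z) listed in the text.
QC_PEAKS = (6426, 6623, 8753, 8785, 8904, 9118, 9409, 9700)
MAX_REL_DEVIATION = 0.002  # 2 per mille, read here as a relative deviation (assumption)
MIN_QC_PEAKS = 6           # a spectrum passes with 6 to 8 quality control peaks


def count_qc_peaks(observed_mz):
    """Count the quality control peaks matched by at least one observed peak."""
    count = 0
    for expected in QC_PEAKS:
        if any(abs(mz - expected) / expected < MAX_REL_DEVIATION for mz in observed_mz):
            count += 1
    return count


def passes_quality_control(observed_mz):
    """Return True if enough quality control peaks are found within tolerance."""
    return count_qc_peaks(observed_mz) >= MIN_QC_PEAKS


# Made-up peak list for illustration: six QC peaks present, two missing.
observed = [6427.0, 6622.1, 8754.0, 8786.0, 8903.5, 9119.0, 13886.4]
print(count_qc_peaks(observed), passes_quality_control(observed))
```

With a 2‰ window the neighbouring quality control peaks at 8753 and 8785 m/z do not overlap, so a single observed peak cannot be counted for both.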

The invention uses bioinformatics methods to screen out the corresponding markers of novel coronavirus infection and to establish a detection model for analysis and detection. The bioinformatics workflow comprises standardizing the fingerprint spectra, performing experimental quality control on the obtained data, screening the desired serum characteristic polypeptides, and establishing a mass spectrometry model, optionally establishing and validating the model with an LR algorithm. The experimental quality control retains mass spectrometry data containing no fewer than 6 internal standard peaks and uses the internal standard peaks for a secondary calibration of the spectrum.

Terminology and definitions

Ten-fold cross-validation (10-fold cross-validation) is a common method for testing the accuracy of an algorithm. The data set is divided into ten parts; in turn, 9 parts are used as training data and 1 part as test data. Each round gives a corresponding accuracy (or error rate), and the average of the accuracies (or error rates) of the 10 rounds is taken as an estimate of the accuracy of the algorithm. In general, the 10-fold cross-validation is itself repeated several times (e.g., 10 times 10-fold cross-validation) and the results are averaged to give the final estimate. It should be noted that the ten-fold cross-validation accuracy is related to, but not equivalent to, the accuracy (or sensitivity) of the actual test. When evaluating the test algorithm, if the ten-fold cross-validation accuracy falls within the confidence interval, varies in a correlated manner with the number of characteristic polypeptides, and reaches a value that is feasible for clinical diagnosis, the mass spectrometry model constructed from those polypeptides is considered to meet the requirements of clinical diagnosis.
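The procedure defined above can be sketched with the standard library alone. The splitting and averaging follow the definition; the midpoint-threshold classifier and the toy data are hypothetical stand-ins chosen to keep the example self-contained, and they are not the model or the data of this document.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, train_fn, predict_fn, k=10, seed=0):
    """Return the mean test accuracy over k train/test rounds."""
    accuracies = []
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, features[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Toy classifier (assumption): threshold at the midpoint of the two class means.
def train_midpoint(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_midpoint(threshold, x):
    return 1 if x > threshold else 0


# Toy data (assumption): one well-separated intensity feature per sample.
toy_x = [float(v) for v in range(20)] + [float(v) for v in range(100, 120)]
toy_y = [0] * 20 + [1] * 20
print(cross_validate(toy_x, toy_y, train_midpoint, predict_midpoint))
```

Repeating the call with different `seed` values and averaging the results corresponds to the repeated 10-fold cross-validation mentioned in the definition.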

SAA protein (serum amyloid A protein) refers to the serum amyloid A family, a group of acute-phase response proteins belonging to the heterogeneous proteins of the apolipoprotein family. Humans have 4 serum amyloid A genes, SAA1 to SAA4; the proteins encoded by SAA1 and SAA2 are acute-phase proteins and are called A-SAA.

Technical effects

Compared with the prior art, the invention has the following advantages:

1. The invention detects serum samples using multiple combinations of characteristic proteins that differ between patients with novel coronavirus infection and normal persons, pulmonary tuberculosis patients, and control patients with symptoms resembling novel coronavirus infection, and processes the data by combining traditional statistics with modern bioinformatics methods. This yields a polypeptide fingerprint detection model for novel coronavirus infected patients, healthy persons and other control patients, and the series of mass-to-charge ratio peaks discovered provides a basis and resource for finding new and better markers.

2. Compared with the prior detection method, the method has higher sensitivity and specificity, simple operation, low detection cost and high flux, and is expected to be used for large-scale screening of novel coronavirus infection.

3. The construction method of the model is reasonable and feasible in design, provides a new screening method for providing the clinical cure rate of the novel coronavirus infection, and provides a new thought for exploring the mechanism of occurrence and development of the novel coronavirus infection.

4. The invention provides a plurality of characteristic protein combinations with differences between 146 patients with definite diagnosis of novel coronavirus infection, 46 patients with normal infection, 33 patients with tuberculosis and 73 patients with novel coronavirus infection type symptoms for the first time, breaks through the traditional research thought of searching characteristic polypeptides only in normal patients and novel patients with coronavirus infection, and effectively avoids the infection of false positive results similar to the novel coronavirus infection symptoms.

5. The mass spectrum model of the invention has the detection accuracy reaching 99%, the sensitivity being 98% and the specificity being 100%, and the result shows that the serum peptide group characteristic polypeptide model of the invention can be rapidly used for screening novel coronavirus infected patients in the crowd.

6. Compared with the composition and mass spectrum model constructed from 25 characteristic polypeptides, the 4 newly introduced characteristic polypeptides (namely SEQ ID NO: 16-19) belong to the SAA protein marker family. In clinical practice, SAA can be used as a biomarker for diagnosing pathogenic and viral infection by methods such as ELISA, immunoturbidimetry, the colloidal gold method, and immunofluorescence chromatography. On the basis of the completed mass spectrum model of 25 characteristic polypeptides, however, the invention proposes for the first time that SAA protein markers be used for detecting viruses by laser time-of-flight mass spectrometry, and accurately identifies specific SAA protein sequences (namely SEQ ID NO: 16-19) for the first time, so that misdiagnosis of normal samples in clinical practice can be effectively avoided. The results showed that the ten-fold cross-validation accuracy of the 29 characteristic polypeptide mass spectrum model incorporating the 4 SAA polypeptide markers was about 97.96% compared to the ten-fold cross-validation accuracy of the 25 characteristic polypeptide mass spectrum model, which was about 98.69%.

Drawings

Fig. 1: comparison of serum polypeptide fingerprints of the different groups (healthy group, pulmonary tuberculosis group, similar-symptom group, and COVID-19 patient group); from top to bottom: negative healthy spectrum, negative pulmonary tuberculosis spectrum, negative similar-symptom spectrum, and positive COVID-19 patient spectrum.

Fig. 2-1: the 20 peaks with the highest repetition frequency in LASSO. Fig. 2-2: the 20 peaks with the highest variable importance in projection (VIP) in PLS-DA.

Fig. 2-3: the 10 peaks with the highest cross-validation accuracy in RFECV.

Fig. 3: intensity of each characteristic peak, wherein the left column is the negative control group and the right column is the positive group.

Fig. 4-1: comparison of training set ROC curves for the various machine learning methods. Fig. 4-2: comparison of test set ROC curves.

Fig. 5: confusion matrix of the true groupings versus the predicted results in the test set.

Fig. 6: procedure for establishing the characteristic polypeptide mass spectrometry model for rapidly screening patients with novel coronavirus infection (COVID-19).

Fig. 7: mass spectrum peak of the characteristic polypeptide at m/z 5157.6; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 8: mass spectrum peak of the characteristic polypeptide at m/z 5366.2; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 9: mass spectrum peak of the characteristic polypeptide at m/z 5892.9; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 10: mass spectrum peak of the characteristic polypeptide at m/z 6357.4; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 11: mass spectrum peak of the characteristic polypeptide at m/z 6654.0; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 12: mass spectrum peak of the characteristic polypeptide at m/z 6939.1; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 13: mass spectrum peak of the characteristic polypeptide at m/z 7364.2; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 14: mass spectrum peak of the characteristic polypeptide at m/z 7614.2; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 15: mass spectrum peak of the characteristic polypeptide at m/z 8034.3; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 16: mass spectrum peak of the characteristic polypeptide at m/z 8042.7; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 17: mass spectrum peak of the characteristic polypeptide at m/z 8226.4; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 18: mass spectrum peak of the characteristic polypeptide at m/z 8424.9; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 19: mass spectrum peak of the characteristic polypeptide at m/z 8559.8; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 20: mass spectrum peak of the characteristic polypeptide at m/z 8986.1; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 21: mass spectrum peak of the characteristic polypeptide at m/z 9626.4; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 22: mass spectrum peak of the characteristic polypeptide at m/z 13719.2; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 23: mass spectrum peak of the characteristic polypeptide at m/z 13765.2; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 24: mass spectrum peak of the characteristic polypeptide at m/z 13886.1; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 25: mass spectrum peak of the characteristic polypeptide at m/z 14049.4; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 26: mass spectrum peak of the characteristic polypeptide at m/z 14094.7; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 27: mass spectrum peak of the characteristic polypeptide at m/z 14101.8; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 28: mass spectrum peak of the characteristic polypeptide at m/z 15123.4; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 29: mass spectrum peak of the characteristic polypeptide at m/z 15866.5; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 30: mass spectrum peak of the characteristic polypeptide at m/z 28091.4; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 31: mass spectrum peak of the characteristic polypeptide at m/z 28231.5; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 32: mass spectrum peak of the characteristic polypeptide at m/z 11435.1; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 33: mass spectrum peak of the characteristic polypeptide at m/z 11495.3; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 34: mass spectrum peak of the characteristic polypeptide at m/z 11522.8; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Fig. 35: mass spectrum peak of the characteristic polypeptide at m/z 11680.3; the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 sample.

Detailed Description

The following examples are illustrative of the invention and are not intended to limit the scope of the invention.

Example 1 sample processing

Serum samples from 146 diagnosed patients were obtained from a hospital in Chongqing in February 2020. All patients were positive by nucleic acid detection and were strictly classified according to the guidelines.

Classification is based on the following criteria:

(1) Mild type: clinical symptoms are mild, and no signs of pneumonia are seen on imaging;

(2) Common type: fever and respiratory symptoms are present, and pneumonia is visible on imaging;

(3) Severe type: dyspnea, respiratory rate ≥ 30 breaths/min, oxygen saturation ≤ 93% at rest, arterial partial pressure of oxygen (PaO2)/fraction of inspired oxygen (FiO2) ≤ 300 mmHg;

(4) Critical type: respiratory failure requiring mechanical ventilation, shock, or failure of other organs requiring rescue in the ICU.

The 152 serum samples without novel coronavirus infection used as controls were obtained from a hospital in Chongqing in March 2020, and included 46 normal persons, 33 tuberculosis patient controls, and 73 controls with novel-coronavirus-infection-like symptoms.

All blood samples were drawn in the early morning under fasting conditions, collected in additive-free vacuum serum collection tubes, centrifuged at 2,264 × g for 10 min, and incubated at 56 °C for 30 min; the serum was then aliquoted and frozen at -80 °C.

Mass spectrometry pretreatment of serum samples: before the mass spectrometry experiments, one aliquot of each serum sample was taken from the low-temperature freezer and thawed on wet ice for 60-90 min. 5 µL of serum was mixed with 45 µL of sample treatment solution and vortexed at 1200 rpm for 30 s; 10 µL of the treated sample was then added to 10 µL of prepared matrix solution and vortexed at 1200 rpm for 30 s. 1 µL of the mixture was spotted on the target plate, with each sample spotted in triplicate, and allowed to air-dry before mass spectrometry detection.
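The two mixing steps above imply an overall dilution of the serum before spotting. The short calculation below works this out from the stated volumes only; the helper name `dilution_factor` is hypothetical and is introduced for illustration.

```python
def dilution_factor(sample_volume_ul, diluent_volume_ul):
    """Fold dilution obtained by adding diluent to a sample (volumes in microliters)."""
    return (sample_volume_ul + diluent_volume_ul) / sample_volume_ul

step1 = dilution_factor(5, 45)    # 5 uL serum + 45 uL treatment solution
step2 = dilution_factor(10, 10)   # 10 uL treated sample + 10 uL matrix solution
total_dilution = step1 * step2    # overall fold dilution of the serum

# Volume of original serum contained in a 1 uL spot of the final mixture.
serum_per_spot_ul = 1 / total_dilution
```

Under these volumes the serum is diluted 10-fold and then 2-fold, so each 1 µL spot carries the equivalent of 0.05 µL of serum.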

Example 2 creation of a Mass Spectrometry model for MALDI-TOF-MS

(I) Sample preparation

For each sample, 5 µL of serum was diluted in 45 µL of sample treatment solution (Bioyong Technologies Inc.). Then 10 µL of the diluted serum was taken and mixed with 10 µL of matrix solution (Bioyong Technologies Inc.).

2 µL of the mixture was taken and spotted onto the stainless steel target plate. After drying at room temperature, the target plate was loaded into a MALDI-TOF MS mass spectrometer (Clin-TOF-II; Bioyong Technologies Inc.). Each sample was tested in parallel 3 times.

The Clin-TOF matrix-assisted laser desorption/ionization time-of-flight mass spectrometer and the general pretreatment kit for polypeptide mass spectrometry used in the experiments were developed by Bioyong (China). The data were preprocessed with the MALDIquant program: the intensities were square-root transformed, smoothed by a filter fitting method, and baseline corrected. The mass spectrometer was calibrated with a mixture of polypeptides and proteins of known molecular weight; the mass drift of the calibrants should be within 500 ppm. 500 spectra were acquired for each sample spot. The acquisition range was m/z 3000-30000.

The mass spectra of the different groups of samples are shown in Fig. 1 (comparison of serum polypeptide fingerprints of the different groups), which shows, from top to bottom, the negative healthy spectrum, the negative tuberculosis spectrum, the negative similar-symptom spectrum, and the positive COVID-19 patient spectrum. In the negative healthy spectra, the peak intensities at m/z 5158, 5366, 5893, 7364, 7614, 8034, 8043, 8425, 8560, 8986, 9626, 11435, 11523, 15123, 15867 and 28091 were lower, while the peak intensities at m/z 6357, 6654, 6939, 13719, 13765, 13886, 14049, 14095, 14102 and 28232 were higher. In the negative tuberculosis spectra, the peak intensities at m/z 5158, 5366, 5893, 7364, 7614, 8034, 8043, 8226, 8425, 8560, 8986, 9626, 11435, 11495, 11523, 11680, 15123, 15867 and 28091 were lower, while the peak intensities at m/z 6654, 6939, 13719, 13765, 13886, 14049, 14095, 14102 and 28232 were higher. In the negative similar-symptom spectra, the peak intensities at m/z 5158, 5366, 7364, 7614, 8034, 8043, 8425, 8560, 8986, 9626, 15123, 15867 and 28091 were lower, while the peak intensities at m/z 6357, 6654, 6939, 13719, 13765, 13886, 14049, 14095, 14102 and 28232 were higher. In the positive COVID-19 patient spectra, the peak intensities at m/z 5158, 5366, 5893, 7364, 7614, 8034, 8043, 8226, 8425, 8560, 8986, 9626, 11435, 11495, 11523, 11680, 15123, 15867 and 28091 were higher, while the peak intensities at m/z 6357, 6654, 6939, 13719, 13765, 13886, 14049, 14095, 14102 and 28232 were lower.

(II) Mass Spectrometry data acquisition

A Clin-TOF mass spectrometer was used. An appropriate laser energy was set to acquire data from the sample crystal spot. For each sample spot, 50 laser bombardment positions were selected and each position was bombarded 10 times, that is, each sample crystal spot received 500 laser shots in total, and the spectrum was collected. Laser frequency: 30 Hz. Data collection range: 3-30 kDa. External calibration with standards was performed before each sample crystal spot was acquired, with an average molecular weight deviation of less than 500 ppm.

Experiment quality control:

(1) A blank matrix crystal spot is measured with the same mass spectrometry parameters. If obvious mass spectrum peaks appear, the quality of the matrix solution is considered unacceptable and the matrix is replaced.

(2) When the standards are used for external calibration, the mass deviation of the different calibrant spots must not exceed 500 ppm, and all 5 calibrant peaks must meet this requirement at the same time.

(3) Eight polypeptide peaks in serum are selected as internal standard quality control peaks. If 6-8 internal standard peaks can be detected and the molecular weight deviation of the internal standard peaks does not exceed 2‰, the spectrum is considered acceptable; otherwise, the spectrum must be acquired again. The internal standard peaks are m/z 6426, 6623, 8753, 8785, 8904, 9118, 9409 and 9700.
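Quality control rules (2) and (3) above can be expressed as simple checks. The sketch below is one possible reading, not the software used in the invention: it assumes that an internal standard counts as "detected" when some peak in the spectrum lies within 2‰ of its nominal m/z, and all function names are hypothetical.

```python
# Nominal m/z of the eight internal standard peaks listed in the text.
INTERNAL_STANDARDS = [6426, 6623, 8753, 8785, 8904, 9118, 9409, 9700]

def relative_deviation(measured, reference):
    """Absolute mass deviation as a fraction of the reference mass."""
    return abs(measured - reference) / reference

def calibration_ok(pairs, limit_ppm=500.0):
    """Rule (2): every (measured, reference) calibrant pair must be within the ppm limit."""
    return all(relative_deviation(m, r) * 1e6 <= limit_ppm for m, r in pairs)

def spectrum_ok(peaks_mz, standards=INTERNAL_STANDARDS, tol=0.002, min_hits=6):
    """Rule (3): at least min_hits internal standards are matched within tol (2 per mille)."""
    hits = 0
    for ref in standards:
        if any(relative_deviation(mz, ref) <= tol for mz in peaks_mz):
            hits += 1
    return hits >= min_hits

# Hypothetical peak list in which six of the eight internal standards are found.
example_peaks = [6426.5, 6623.2, 8753.0, 8785.1, 8904.0, 9118.3]
```

For instance, a calibrant measured at 5002.0 against a reference of 5000.0 deviates by about 400 ppm and passes, whereas one measured at 5003.0 deviates by about 600 ppm and fails.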

(III) raw data preprocessing

The MALDI-TOF raw data were recalibrated against the internal standards using internal standard calibration software and saved as txt files. The internal standard peaks were m/z 6426, 6623, 8753, 8785, 8904, 9118, 9409 and 9700. The spectra were then processed with the MALDIquant program; processing included smoothing, baseline correction, and molecular weight calibration. Peak detection was performed with a signal-to-noise ratio of 3. Peaks were binned using the binPeaks command with a tolerance of 0.002. Peaks with an occurrence frequency of at least 25% within a group were retained. Finally, the resulting matrix was used for the following analysis.
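To illustrate what binning with a relative tolerance of 0.002 and a 25% frequency filter do, the following Python sketch groups peaks from several spectra and keeps the frequent ones. It is a simplified greedy stand-in written for this explanation and does not reproduce the algorithm of MALDIquant's binPeaks; the function names are hypothetical, and the filter assumes that a peak is kept when it reaches the frequency threshold in at least one group.

```python
def bin_peaks(spectra, tolerance=0.002):
    """Greedy binning: walk the sorted m/z values and start a new bin whenever a
    value deviates from the running bin mean by more than the relative tolerance.
    spectra is a list of m/z lists, one list per spectrum."""
    tagged = sorted((mz, idx) for idx, peaks in enumerate(spectra) for mz in peaks)
    groups, current = [], []
    for mz, idx in tagged:
        if current:
            mean = sum(m for m, _ in current) / len(current)
            if abs(mz - mean) / mean > tolerance:
                groups.append(current)
                current = []
        current.append((mz, idx))
    if current:
        groups.append(current)
    # Each bin records its mean m/z and the set of spectra that contain it.
    return [
        {"mz": sum(m for m, _ in g) / len(g), "members": {i for _, i in g}}
        for g in groups
    ]

def keep_frequent(bins, group_indices, min_fraction=0.25):
    """Keep bins found in at least min_fraction of the spectra of some group."""
    kept = []
    for b in bins:
        for indices in group_indices.values():
            if len(b["members"] & set(indices)) / len(indices) >= min_fraction:
                kept.append(b)
                break
    return kept

# Hypothetical toy input: four spectra, two per group.
toy_spectra = [[5158.0, 6357.0], [5159.0, 6358.0], [5157.5], [7000.0]]
toy_groups = {"covid": [0, 1], "control": [2, 3]}
toy_bins = bin_peaks(toy_spectra)
```

In the toy input, the three values near m/z 5158 fall into one bin because they differ by far less than 0.2% of their mass, while m/z 7000 forms its own bin.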

After log2 transformation, the peak intensity matrix was quantified and normalized with the R package limma. Missing values were filled with the minimum value across all samples. The COVID-19 patient data and the control sample data were randomly divided into a training group and a test group at a ratio of 2:1.
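The transformation, imputation, and 2:1 split described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the original R workflow: the limma normalization step is omitted, "minimum value across all samples" is read as the global minimum of the log2 matrix, and the function names are hypothetical.

```python
import math
import random

def log2_and_impute(matrix):
    """Log2-transform an intensity matrix (rows = samples, columns = peaks) and
    fill missing entries (None) with the smallest transformed value observed."""
    logged = [[math.log2(v) if v is not None else None for v in row] for row in matrix]
    floor = min(v for row in logged for v in row if v is not None)
    return [[v if v is not None else floor for v in row] for row in logged]

def split_two_to_one(sample_ids, seed=0):
    """Randomly assign about two thirds of the samples to training, the rest to test."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * 2 / 3)
    return ids[:cut], ids[cut:]

# Splitting patients and controls separately keeps the 2:1 ratio in each class.
patient_train, patient_test = split_two_to_one(range(146))
control_train, control_test = split_two_to_one(range(152), seed=1)
```

Applied to the 146 patient samples and 152 control samples, this gives 97 + 101 = 198 training samples and 49 + 51 = 100 test samples, which matches the sample counts reported in the examples.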

(IV) selection of characteristic proteins

After intensity normalization and missing-value imputation, the peaks of the training set were analyzed by the following three machine learning methods: the LASSO algorithm (LASSO), partial least squares discriminant analysis (PLS-DA), and recursive feature elimination with cross-validation (RFECV). LASSO, whose full name is least absolute shrinkage and selection operator, is a shrinkage estimator. It obtains a more parsimonious model by constructing a penalty function that shrinks the regression coefficients, that is, the sum of the absolute values of the coefficients is forced to be smaller than a fixed value, while some regression coefficients are set to zero. It therefore retains the advantage of subset shrinkage and is a biased estimator for handling data with complex collinearity.
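The constraint described above can be written compactly. The expression below is the standard textbook form of the LASSO estimator, not a formula taken from the source; the symbols are notation introduced here for illustration, with y_i the class label or response of spectrum i, x_{ij} the normalized intensity of peak j in spectrum i, β_j the regression coefficient of peak j, and t the fixed bound on the sum of absolute coefficients.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
\quad \text{subject to} \quad \sum_{j=1}^{p} \lvert \beta_j \rvert \le t
```

The smaller the bound t, the more coefficients are driven exactly to zero, which is why peaks that are repeatedly retained across LASSO runs are candidates for characteristic peaks.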

FIG. 2-1 shows the 20 peaks with the highest repetition frequency in LASSO; the vertical axis is the mass-to-charge ratio of each preferred characteristic peak. Partial least squares discriminant analysis (PLS-DA) is a multivariate statistical method for discriminant analysis. Discriminant analysis is a common statistical method for deciding how a subject should be classified based on observed or measured variable values. Its principle is to train on the characteristics of differently treated samples (such as observation samples and control samples) to produce a training set, and to check the reliability of that training set.

FIG. 2-2 shows the 20 peaks with the highest variable importance in projection (VIP) in PLS-DA; the vertical axis is the mass-to-charge ratio of each preferred characteristic peak. RFECV finds the optimal number of features by cross-validation. RFE (recursive feature elimination) ranks the importance of the features, and CV (cross-validation) means that, after the features have been ranked, the best number of features is selected by cross-validation. FIG. 2-3 shows the 10 peaks with the highest cross-validation accuracy in RFECV; the vertical axis is the mass-to-charge ratio of each preferred characteristic peak.

By empirically examining the original spectra of the selected peaks, 29 peaks that passed quality control were screened out as features. The intensities of the characteristic peaks are shown in FIG. 3. Each row in the figure represents a characteristic peak, each column represents the data of one spectrum, and the shading represents the intensity of the peak. The left column is the negative control group and the right column is the positive group. It can be seen that the characteristic polypeptide peaks at m/z 6357, 6654, 6939, 13719, 13765, 13886, 14049, 14095, 14102 and 28232 are generally expressed more strongly in the negative group than in the positive group, while the characteristic polypeptide peaks at m/z 5158, 5366, 5893, 7364, 7614, 8034, 8043, 8226, 8425, 8560, 8986, 9626, 11435, 11495, 11523, 11680, 15123, 15867 and 28091 are generally expressed more strongly in the positive group than in the negative group. The intensities of these peaks differed significantly between COVID-19 samples and controls.

(V) model algorithm

Eight machine learning methods were used to build models from the 29 characteristic peaks of the training set data, and the model results were evaluated by cross-validation accuracy. The 8 machine learning methods were: logistic regression (LR), support vector machine (SVM), random forest (RF), naive Bayes (NB), gradient boosting decision tree (GBDT), K-nearest neighbors (KNN), decision tree (DT), and adaptive boosting (AdaBoost).
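As an illustration of the simplest of the eight methods, the sketch below fits a logistic regression by batch gradient descent in plain Python. It is a teaching example only: the source does not state which implementation, regularization, or hyperparameters were used, and the function names, learning rate, and epoch count are assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    """Predicted probability of the positive class for one feature vector."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy data: one feature, negatives below zero, positives above.
toy_model = fit_logistic([[-2.0], [-1.0], [1.0], [2.0]], [0, 0, 1, 1])
```

In the setting of the text, each feature vector would hold the 29 normalized peak intensities of one spectrum, and a probability above a chosen threshold would be read as a positive prediction.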

Figs. 4-1 and 4-2 show the model results for the training set and the test set, respectively, in the form of ROC curves. An ROC curve is plotted with the true positive rate (sensitivity) on the ordinate and the false positive rate (1 - specificity) on the abscissa, over a series of different cut-off values (decision thresholds). The area under the ROC curve (AUC) of each test is calculated for comparison; the test with the largest AUC has the best diagnostic value. In this study, the AUC of all models on the training set was greater than 0.99, and the AUC of LR, SVM, GBDT, DT and AdaBoost was 1 (Fig. 4-1). In the ROC curve analysis of the test set data, the AUC of all 8 models obtained by the 8 machine learning methods exceeded 0.94, and the AUC of the LR model was 1 (Fig. 4-2). After evaluating the accuracy, recall, precision, F1, sensitivity and specificity of the 8 models, the LR model was found to have the best classification performance (AUC = 1, sensitivity = 98%, specificity = 100%, accuracy = 99%, precision = 99%, recall = 99%, F1 = 99%) and could be further applied to COVID-19 detection.
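The construction of an ROC curve and its AUC from model scores can be sketched as follows. This is a generic illustration in plain Python with hypothetical function names and toy scores, not the evaluation code behind Figs. 4-1 and 4-2.

```python
def roc_points(labels, scores):
    """Sweep the decision threshold from high to low and return the
    (false positive rate, true positive rate) points of the ROC curve."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy scores: perfectly separated classes, then one misordered pair.
perfect = auc(roc_points([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))
imperfect = auc(roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

An AUC of 1 means that every positive sample scores higher than every negative sample; in the second toy case one of the four positive/negative pairs is ordered wrongly, so the AUC drops to 0.75.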

The confusion matrix of the LR model in the test set is shown in FIG. 5, wherein the vertical axis in the figure represents the real grouping situation of samples, the upper row represents the number of negative samples, and the lower row represents the number of positive samples; the horizontal axis represents the model prediction result, the left column represents the number of samples judged negative by the model, and the right column represents the number of samples judged positive by the model. Among the 51 negative samples, all the negative samples are judged to be negative, and the judgment accuracy (namely model specificity) of the negative samples is 100%; of the 49 positive samples, 1 was misjudged as negative, 48 were judged as positive, and the positive sample judgment accuracy (i.e., model sensitivity) was 98.0%.
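The figures quoted for the confusion matrix follow directly from its four cells. The short calculation below uses only the counts stated above (51 negatives all judged negative; 49 positives, of which 48 were judged positive); the function name is hypothetical.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),          # positives judged positive
        "specificity": tn / (tn + fp),          # negatives judged negative
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Counts of the LR model on the test set as described in the text.
lr_test = binary_metrics(tp=48, fn=1, tn=51, fp=0)
```

This gives a sensitivity of 48/49 (about 98.0%), a specificity of 51/51 (100%), and an accuracy of 99/100 (99%).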

TABLE 1 means and quartile range of 29 characteristic polypeptides in each group in training set

The specific procedure for establishing a mass spectrometry model for rapidly screening patients with novel coronavirus infection (COVID-19) is shown in FIG. 6. The procedure comprises the following steps: (1) recruiting patients with novel coronavirus infection and a negative control population, and collecting serum samples from each; (2) subjecting the serum samples to mass spectrometry pretreatment with the kit; (3) performing MALDI-TOF MS detection to obtain spectrum information; (4) processing the spectra and obtaining a peak list; (5) performing bioinformatic analysis; (6) determining the mass spectrometry model.

Example 3 construction of novel screening model for coronavirus infected patients

Of the 298 serum samples (146 from patients with a confirmed diagnosis of novel coronavirus infection, 46 from normal persons, 33 from tuberculosis patient controls, and 73 from controls with symptoms similar to novel coronavirus infection, such as fever and cough), 198 were selected as training samples for model establishment: 97 from patients with novel coronavirus infection, 34 from normal persons, 19 from tuberculosis patient controls, and 48 from controls with symptoms similar to novel coronavirus infection. All serum samples were drawn in the early morning under fasting conditions; the serum was separated, virus-inactivated, and stored in a -80 °C low-temperature freezer.

The remaining samples (49 patients with novel coronavirus infection, 12 normal persons, 14 tuberculosis patients, and 25 controls with novel-coronavirus-infection-like symptoms) were used as validation samples for the blind test. They were processed in the same way as above.

A mass spectrum model of novel coronavirus infection polypeptides was established using the serum characteristic polypeptide peaks of patients with novel coronavirus infection screened in Examples 1 and 2. The model was defined by the following 29 characteristic peaks: m/z 5158, 5366, 5893, 6357, 6654, 6939, 7364, 7614, 8034, 8043, 8226, 8425, 8560, 8986, 9626, 11435, 11495, 11523, 11680, 13719, 13765, 13886, 14049, 14095, 14102, 15123, 15867, 28091 and 28232.

The characteristic mass spectrum peak spectrogram of the characteristic polypeptide is shown in figures 7-35.

The training set and validation set AUC of the LR model were both greater than 0.99. The accuracy of the test set is 99%, the sensitivity is 98% and the specificity is 100%. The model has good prediction capability.

TABLE 2 model training results

From the above table it can be seen that the results for the training set samples are: 34 of the 34 normal persons were judged correctly, with a specificity of 100.00%; 97 of the 97 patients were judged correctly, with a sensitivity of 100.00%; 19 of the 19 tuberculosis patients were judged correctly, with a specificity of 100.00%; and 48 of the 48 patients with similar symptoms were judged correctly, with a specificity of 100.00%.

Example 4 identification of novel coronavirus infection characteristic Polypeptides

After the peaks to be identified had been determined in Examples 2 and 3, 7 serum samples with different intensities of the peaks to be identified were selected from the pretreated samples. After the samples were reduced with DTT, proteins with a molecular weight above 50 kDa were removed by ultrafiltration and centrifugation. The small proteins/polypeptides in the filtrate were separated by tricine-SDS-PAGE. Each band was digested in gel and then identified by tandem mass spectrometry.

Polypeptide sequence identification was performed on a nano-LC-MS/MS platform comprising a nanoflow HPLC (Thermo Fisher Scientific, USA) and a Q-Exactive mass spectrometer (Thermo Fisher Scientific, USA). The ion mode was positive ion mode, and the scanning range was 300-1400 m/z. The MS1 resolution was 70000 and the MS2 resolution was 17500.

Liquid chromatography analytical column: model: Exsil Pure 120C18 (Dr. Maisch GmbH, USA); specification: 360 µm × 12 cm; inner diameter: 150 µm; particle size: 1.9 µm. Elution mode: linear gradient from 7% B (80% acetonitrile, 0.1% formic acid) to 45% B. Flow rate: 600 nL/min; total time: 38 minutes. The identification results are shown in Tables 3 and 4.

TABLE 3 characterization of characteristic peak Polypeptides

m/z	Gene name	Protein name
5158	H2AJ	Histone H2A.J
6357	S100A7	Protein S100-A7
6654	IGLL5	Immunoglobulin lambda-like polypeptide 5
6939	UBB	Polyubiquitin-B
7364	IGKV3-7	Probable non-functional immunoglobulin kappa variable 3-7
7614	PF4V1	Platelet factor 4 variant
8034	IGKV3-15	Immunoglobulin kappa variable 3-15
8226	CFI	Complement factor I
8986	RAB7A	Ras-related protein Rab-7a
9626	ELANE	Neutrophil elastase
13719	B2M	Beta-2-microglobulin
13765	TTR	Transthyretin
13886	PPBP	Platelet basic protein
14049	DUSP14	Dual specificity protein phosphatase 14
14095	H2AC11	Histone H2A type 1
14102	H2AC6	Histone H2A type 1-C
15123	HBA1	Hemoglobin subunit alpha
15867	HBB	Hemoglobin subunit beta
28091	WRAP73	WD repeat-containing protein WRAP73
11435	SAA1	Serum amyloid A-1 protein
11495	SAA2	Serum amyloid A-2 protein
11523	SAA1	Serum amyloid A-1 protein
11680	SAA1	Serum amyloid A-1 protein

TABLE 4 polypeptide identification sequences

Example 5 Blind screening test of novel coronavirus infected patient screening model

After model training, three models were built: one with the 25 characteristic polypeptide fragments (SEQ ID NO: 1-15) as input variables, one with the 29 characteristic polypeptide fragments (SEQ ID NO: 1-19) as input variables, and one with only the 19 characteristic polypeptide fragments that had been sequenced (i.e., SEQ ID NO: 1-19) as input variables.

According to the method of Example 3, the samples from 49 patients, 12 normal persons, 14 tuberculosis patients, and 25 persons with similar symptoms were predicted blind with the above three models to determine the type of each sample; the method was the same as in the above example. The results are shown in Tables 5-1, 5-2 and 5-3, respectively.

TABLE 5-1 prediction of test samples by 25 variables

Sample group	Number of cases	Predicted novel coronavirus infection	Predicted non-novel coronavirus infection	Prediction accuracy (%)
Patient group	49	48	1	97.96
Normal group	12	0	12	100.00
Tuberculosis group	14	0	14	100.00
Symptom analogue group	25	0	25	100.00
Total	100			99.00

From Table 5-1, it can be seen that the results for the test group samples are: 12 of the 12 normal persons were judged correctly, with a specificity of 100.00%; 48 of the 49 patients were judged correctly, with a sensitivity of 97.96%; 14 of the 14 tuberculosis patients were judged correctly, with a specificity of 100.00%; and 25 of the 25 patients with similar symptoms were judged correctly, with a specificity of 100.00%.

TABLE 5-2 prediction of test sample results by 29 variables

From Table 5-2, it can be seen that the results for the test group samples are: 12 of the 12 normal persons were judged correctly, with a specificity of 100.00%; 48 of the 49 patients were judged correctly, with a sensitivity of 97.96%; 14 of the 14 tuberculosis patients were judged correctly, with a specificity of 100.00%; and 25 of the 25 patients with similar symptoms were judged correctly, with a specificity of 100.00%.

As can be seen from Tables 5-1 and 5-2, the prediction accuracy of both models on the same 100 samples met the criteria for clinical diagnosis. The accuracy is the same, possibly because the number of domestic patients tested is too small to show a difference. However, according to the ten-fold cross-validation accuracy, it can be predicted that as the number of patients tested increases, the mass spectrum diagnosis model using 29 characteristic polypeptides will exhibit higher accuracy.

TABLE 5-3 prediction of test samples by 19 variables

Sample group	Number of cases	Predicted novel coronavirus infection	Predicted non-novel coronavirus infection	Prediction accuracy (%)
Patient group	49	46	3	93.88
Normal group	12	0	12	100.00
Tuberculosis group	14	0	14	100.00
Symptom analogue group	25	4	21	84.00
Total	100			93.00

From Table 5-3, the results for the test group samples are as follows: 46 of the 49 novel coronavirus infection patients were judged correctly, a sensitivity of 93.88%; all 12 normal cases were judged correctly, a specificity of 100.00%; all 14 tuberculosis patients were judged correctly, a specificity of 100.00%; and 21 of the 25 patients with similar symptoms were judged correctly, a specificity of 84.00%. This demonstrates that the model built from the 19 characteristic polypeptides as input variables has the same specificity as the complete-variable model for healthy people and tuberculosis patients, with only a few erroneous decisions in the other two groups. The model meets the clinical requirement for rapid screening and diagnosis of patients.
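
The percentages quoted for Table 5-3 follow directly from the counts in the table. The sketch below recomputes them; the dictionary layout and the function name `summarize` are hypothetical, and the pooled specificity over all three non-infected groups is a derived figure that the original text does not report.

```python
# Counts copied from Table 5-3:
# group -> (predicted infected, predicted not infected)
TABLE_5_3 = {
    "patient": (46, 3),
    "normal": (0, 12),
    "tuberculosis": (0, 14),
    "symptom_analogue": (4, 21),
}

def summarize(table, positive_group="patient"):
    """Return sensitivity, pooled specificity and overall accuracy in percent."""
    tp, fn = table[positive_group]
    # Every other group is truly non-infected, so a positive call is a false positive.
    fp = sum(pos for group, (pos, _) in table.items() if group != positive_group)
    tn = sum(neg for group, (_, neg) in table.items() if group != positive_group)
    total = tp + fn + fp + tn
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "pooled_specificity": 100 * tn / (tn + fp),
        "accuracy": 100 * (tp + tn) / total,
    }

metrics = summarize(TABLE_5_3)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

Pooling the non-infected groups gives a single specificity figure for the 19-variable model, which makes the effect of the four false positives in the symptom analogue group easier to see than the per-group values alone.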

In addition, the above tables show that, for the complete set of 29 characteristic polypeptides, the blind-test detection accuracy for the novel coronavirus infection group is essentially the same as in model training, while the prediction accuracy for the non-novel coronavirus infection groups reaches 100%. Starting from the trained model, an experimenter can therefore completely exclude false positive results through fine optimization. This supports the reliability of a positive diagnosis and avoids missed diagnosis and/or misdiagnosis to the greatest extent, which is of positive significance.

The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the technical principle of the present invention, and these improvements and modifications should also be regarded as falling within the scope of protection of the present invention.

Sequence listing

<110> Beijing Clin Bochuang Biotechnology Co., Ltd.

<120> Kit for diagnosing novel coronavirus pneumonia

<140> 2021101571109

<141> 2021-02-04

<160> 19

<170> SIPOSequenceListing 1.0

<210> 1

<211> 61

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 1

Thr Ile Thr Leu Glu Val Glu Pro Ser Asp Thr Ile Glu Asn Val Lys

1 5 10 15

Ala Lys Ile Gln Asp Lys Glu Gly Ile Pro Pro Asp Gln Gln Arg Leu

20 25 30

Ile Phe Ala Gly Lys Gln Leu Glu Asp Gly Arg Thr Leu Ser Asp Tyr

35 40 45

Asn Ile Gln Lys Glu Ser Thr Leu His Leu Val Leu Arg

50 55 60

<210> 2

<211> 68

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 2

Glu Glu Asp Gly Asp Leu Gln Cys Leu Cys Val Lys Thr Thr Ser Gln

1 5 10 15

Val Arg Pro Arg His Ile Thr Ser Leu Glu Val Ile Lys Ala Gly Pro

20 25 30

His Cys Pro Thr Ala Gln Leu Ile Ala Thr Leu Lys Asn Gly Arg Lys

35 40 45

Ile Cys Leu Asp Leu Gln Ala Leu Leu Tyr Lys Lys Ile Ile Lys Glu

50 55 60

His Leu Glu Ser

65

<210> 3

<211> 74

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 3

Met Glu Ala Pro Ala Gln Leu Leu Phe Leu Leu Leu Leu Trp Leu Pro

1 5 10 15

Asp Thr Thr Gly Glu Ile Val Met Thr Gln Ser Pro Ala Thr Leu Ser

20 25 30

Val Ser Pro Gly Glu Arg Ala Thr Leu Ser Cys Arg Ala Ser Gln Ser

35 40 45

Val Ser Ser Asn Leu Ala Trp Tyr Gln Gln Lys Pro Gly Gln Ala Pro

50 55 60

Arg Leu Leu Ile Tyr Gly Ala Ser Thr Arg

65 70

<210> 4

<211> 69

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 4

Met Lys Leu Leu His Val Phe Leu Leu Phe Leu Cys Phe His Leu Arg

1 5 10 15

Phe Cys Lys Val Thr Tyr Thr Ser Gln Glu Asp Leu Val Glu Lys Lys

20 25 30

Cys Leu Ala Lys Lys Tyr Thr His Leu Ser Cys Asp Lys Val Phe Cys

35 40 45

Gln Pro Trp Gln Arg Cys Ile Glu Gly Thr Cys Val Cys Lys Leu Pro

50 55 60

Tyr Gln Cys Pro Lys

65

<210> 5

<211> 79

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 5

Met Thr Ser Arg Lys Lys Val Leu Leu Lys Val Ile Ile Leu Gly Asp

1 5 10 15

Ser Gly Val Gly Lys Thr Ser Leu Met Asn Gln Tyr Val Asn Lys Lys

20 25 30

Phe Ser Asn Gln Tyr Lys Ala Thr Ile Gly Ala Asp Phe Leu Thr Lys

35 40 45

Glu Val Met Val Asp Asp Arg Leu Val Thr Met Gln Ile Trp Asp Thr

50 55 60

Ala Gly Gln Glu Arg Phe Gln Ser Leu Gly Val Ala Phe Tyr Arg

65 70 75

<210> 6

<211> 91

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 6

Met Thr Leu Gly Arg Arg Leu Ala Cys Leu Phe Leu Ala Cys Val Leu

1 5 10 15

Pro Ala Leu Leu Leu Gly Gly Thr Ala Leu Ala Ser Glu Ile Val Gly

20 25 30

Gly Arg Arg Ala Arg Pro His Ala Trp Pro Phe Met Val Ser Leu Gln

35 40 45

Leu Arg Gly Gly His Phe Cys Gly Ala Thr Leu Ile Ala Pro Asn Phe

50 55 60

Val Met Ser Ala Ala His Cys Val Ala Asn Val Asn Val Arg Ala Val

65 70 75 80

Arg Val Val Leu Gly Ala His Asn Leu Ser Arg

85 90

<210> 7

<211> 119

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 7

Met Ser Arg Ser Val Ala Leu Ala Val Leu Ala Leu Leu Ser Leu Ser

1 5 10 15

Gly Leu Glu Ala Ile Gln Arg Thr Pro Lys Ile Gln Val Tyr Ser Arg

20 25 30

His Pro Ala Glu Asn Gly Lys Ser Asn Phe Leu Asn Cys Tyr Val Ser

35 40 45

Gly Phe His Pro Ser Asp Ile Glu Val Asp Leu Leu Lys Asn Gly Glu

50 55 60

Arg Ile Glu Lys Val Glu His Ser Asp Leu Ser Phe Ser Lys Asp Trp

65 70 75 80

Ser Phe Tyr Leu Leu Tyr Tyr Thr Glu Phe Thr Pro Thr Glu Lys Asp

85 90 95

Glu Tyr Ala Cys Arg Val Asn His Val Thr Leu Ser Gln Pro Lys Ile

100 105 110

Val Lys Trp Asp Arg Asp Met

115

<210> 8

<211> 127

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 8

Gly Pro Thr Gly Thr Gly Glu Ser Lys Cys Pro Leu Met Val Lys Val

1 5 10 15

Leu Asp Ala Val Arg Gly Ser Pro Ala Ile Asn Val Ala Val His Val

20 25 30

Phe Arg Lys Ala Ala Asp Asp Thr Trp Glu Pro Phe Ala Ser Gly Lys

35 40 45

Thr Ser Glu Ser Gly Glu Leu His Gly Leu Thr Thr Glu Glu Glu Phe

50 55 60

Val Glu Gly Ile Tyr Lys Val Glu Ile Asp Thr Lys Ser Tyr Trp Lys

65 70 75 80

Ala Leu Gly Ile Ser Pro Phe His Glu His Ala Glu Val Val Phe Thr

85 90 95

Ala Asn Asp Ser Gly Pro Arg Arg Tyr Thr Ile Ala Ala Leu Leu Ser

100 105 110

Pro Tyr Ser Tyr Ser Thr Thr Ala Val Val Thr Asn Pro Lys Glu

115 120 125

<210> 9

<211> 128

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 9

Met Ser Leu Arg Leu Asp Thr Thr Pro Ser Cys Asn Ser Ala Arg Pro

1 5 10 15

Leu His Ala Leu Gln Val Leu Leu Leu Leu Ser Leu Leu Leu Thr Ala

20 25 30

Leu Ala Ser Ser Thr Lys Gly Gln Thr Lys Arg Asn Leu Ala Lys Gly

35 40 45

Lys Glu Glu Ser Leu Asp Ser Asp Leu Tyr Ala Glu Leu Arg Cys Met

50 55 60

Cys Ile Lys Thr Thr Ser Gly Ile His Pro Lys Asn Ile Gln Ser Leu

65 70 75 80

Glu Val Ile Gly Lys Gly Thr His Cys Asn Gln Val Glu Val Ile Ala

85 90 95

Thr Leu Lys Asp Gly Arg Lys Ile Cys Leu Asp Pro Asp Ala Pro Arg

100 105 110

Ile Lys Lys Ile Val Gln Lys Lys Leu Ala Gly Asp Glu Ser Ala Asp

115 120 125

<210> 10

<211> 123

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 10

Val Pro Leu Ala Asp Met Pro His Ala Pro Ile Gly Leu Tyr Phe Asp

1 5 10 15

Thr Val Ala Asp Lys Ile His Ser Val Ser Arg Lys His Gly Ala Thr

20 25 30

Leu Val His Cys Ala Ala Gly Val Ser Arg Ser Ala Thr Leu Cys Ile

35 40 45

Ala Tyr Leu Met Lys Phe His Asn Val Cys Leu Leu Glu Ala Tyr Asn

50 55 60

Trp Val Lys Ala Arg Arg Pro Val Ile Arg Pro Asn Val Gly Phe Trp

65 70 75 80

Arg Gln Leu Ile Asp Tyr Glu Arg Gln Leu Phe Gly Lys Ser Thr Val

85 90 95

Lys Met Val Gln Thr Pro Tyr Gly Ile Val Pro Asp Val Tyr Glu Lys

100 105 110

Glu Ser Arg His Leu Met Pro Tyr Trp Gly Ile

115 120

<210> 11

<211> 130

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 11

Met Ser Gly Arg Gly Lys Gln Gly Gly Lys Ala Arg Ala Lys Ala Lys

1 5 10 15

Ser Arg Ser Ser Arg Ala Gly Leu Gln Phe Pro Val Gly Arg Val His

20 25 30

Arg Leu Leu Arg Lys Gly Asn Tyr Ala Glu Arg Val Gly Ala Gly Ala

35 40 45

Pro Val Tyr Met Ala Ala Val Leu Glu Tyr Leu Thr Ala Glu Ile Leu

50 55 60

Glu Leu Ala Gly Asn Ala Ala Arg Asp Asn Lys Lys Thr Arg Ile Ile

65 70 75 80

Pro Arg His Leu Gln Leu Ala Ile Arg Asn Asp Glu Glu Leu Asn Lys

85 90 95

Leu Leu Gly Lys Val Thr Ile Ala Gln Gly Gly Val Leu Pro Asn Ile

100 105 110

Gln Ala Val Leu Leu Pro Lys Lys Thr Glu Ser His His Lys Ala Lys

115 120 125

Gly Lys

130

<210> 12

<211> 130

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 12

Met Ser Gly Arg Gly Lys Gln Gly Gly Lys Ala Arg Ala Lys Ala Lys

1 5 10 15

Ser Arg Ser Ser Arg Ala Gly Leu Gln Phe Pro Val Gly Arg Val His

20 25 30

Arg Leu Leu Arg Lys Gly Asn Tyr Ala Glu Arg Val Gly Ala Gly Ala

35 40 45

Pro Val Tyr Leu Ala Ala Val Leu Glu Tyr Leu Thr Ala Glu Ile Leu

50 55 60

Glu Leu Ala Gly Asn Ala Ala Arg Asp Asn Lys Lys Thr Arg Ile Ile

65 70 75 80

Pro Arg His Leu Gln Leu Ala Ile Arg Asn Asp Glu Glu Leu Asn Lys

85 90 95

Leu Leu Gly Arg Val Thr Ile Ala Gln Gly Gly Val Leu Pro Asn Ile

100 105 110

Gln Ala Val Leu Leu Pro Lys Lys Thr Glu Ser His His Lys Ala Lys

115 120 125

Gly Lys

130

<210> 13

<211> 141

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 13

Val Leu Ser Pro Ala Asp Lys Thr Asn Val Lys Ala Ala Trp Gly Lys

1 5 10 15

Val Gly Ala His Ala Gly Glu Tyr Gly Ala Glu Ala Leu Glu Arg Met

20 25 30

Phe Leu Ser Phe Pro Thr Thr Lys Thr Tyr Phe Pro His Phe Asp Leu

35 40 45

Ser His Gly Ser Ala Gln Val Lys Gly His Gly Lys Lys Val Ala Asp

50 55 60

Ala Leu Thr Asn Ala Val Ala His Val Asp Asp Met Pro Asn Ala Leu

65 70 75 80

Ser Ala Leu Ser Asp Leu His Ala His Lys Leu Arg Val Asp Pro Val

85 90 95

Asn Phe Lys Leu Leu Ser His Cys Leu Leu Val Thr Leu Ala Ala His

100 105 110

Leu Pro Ala Glu Phe Thr Pro Ala Val His Ala Ser Leu Asp Lys Phe

115 120 125

Leu Ala Ser Val Ser Thr Val Leu Thr Ser Lys Tyr Arg

130 135 140

<210> 14

<211> 146

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 14

Val His Leu Thr Pro Glu Glu Lys Ser Ala Val Thr Ala Leu Trp Gly

1 5 10 15

Lys Val Asn Val Asp Glu Val Gly Gly Glu Ala Leu Gly Arg Leu Leu

20 25 30

Val Val Tyr Pro Trp Thr Gln Arg Phe Phe Glu Ser Phe Gly Asp Leu

35 40 45

Ser Thr Pro Asp Ala Val Met Gly Asn Pro Lys Val Lys Ala His Gly

50 55 60

Lys Lys Val Leu Gly Ala Phe Ser Asp Gly Leu Ala His Leu Asp Asn

65 70 75 80

Leu Lys Gly Thr Phe Ala Thr Leu Ser Glu Leu His Cys Asp Lys Leu

85 90 95

His Val Asp Pro Glu Asn Phe Arg Leu Leu Gly Asn Val Leu Val Cys

100 105 110

Val Leu Ala His His Phe Gly Lys Glu Phe Thr Pro Pro Val Gln Ala

115 120 125

Ala Tyr Gln Lys Val Val Ala Gly Val Ala Asn Ala Leu Ala His Lys

130 135 140

Tyr His

145

<210> 15

<211> 257

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 15

Ile Leu Leu Tyr Ser Leu Asp Gly Arg Leu Leu Ser Thr Tyr Ser Ala

1 5 10 15

Tyr Glu Trp Ser Leu Gly Ile Lys Ser Val Ala Trp Ser Pro Ser Ser

20 25 30

Gln Phe Leu Ala Val Gly Ser Tyr Asp Gly Lys Val Arg Ile Leu Asn

35 40 45

His Val Thr Trp Lys Met Ile Thr Glu Phe Gly His Pro Ala Ala Ile

50 55 60

Asn Asp Pro Lys Ile Val Val Tyr Lys Glu Ala Glu Lys Ser Pro Gln

65 70 75 80

Leu Gly Leu Gly Cys Leu Ser Phe Pro Pro Pro Arg Ala Gly Ala Gly

85 90 95

Pro Leu Pro Ser Ser Glu Ser Lys Tyr Glu Ile Ala Ser Val Pro Val

100 105 110

Ser Leu Gln Thr Leu Lys Pro Val Thr Asp Arg Ala Asn Pro Lys Ile

115 120 125

Gly Ile Gly Met Leu Ala Phe Ser Pro Asp Ser Tyr Phe Leu Ala Thr

130 135 140

Arg Asn Asp Asn Ile Pro Asn Ala Val Trp Val Trp Asp Ile Gln Lys

145 150 155 160

Leu Arg Leu Phe Ala Val Leu Glu Gln Leu Ser Pro Val Arg Ala Phe

165 170 175

Gln Trp Asp Pro Gln Gln Pro Arg Leu Ala Ile Cys Thr Gly Gly Ser

180 185 190

Arg Leu Tyr Leu Trp Ser Pro Ala Gly Cys Met Ser Val Gln Val Pro

195 200 205

Gly Glu Gly Asp Phe Ala Val Leu Ser Leu Cys Trp His Leu Ser Gly

210 215 220

Asp Ser Met Ala Leu Leu Ser Lys Asp His Phe Cys Leu Cys Phe Leu

225 230 235 240

Glu Thr Glu Ala Val Val Gly Thr Ala Cys Arg Gln Leu Gly Gly His

245 250 255

Thr

<210> 16

<211> 102

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 16

Phe Phe Ser Phe Leu Gly Glu Ala Phe Asp Gly Ala Arg Asp Met Trp

1 5 10 15

Arg Ala Tyr Ser Asp Met Arg Glu Ala Asn Tyr Ile Gly Ser Asp Lys

20 25 30

Tyr Phe His Ala Arg Gly Asn Tyr Asp Ala Ala Lys Arg Gly Pro Gly

35 40 45

Gly Val Trp Ala Ala Glu Ala Ile Ser Asp Ala Arg Glu Asn Ile Gln

50 55 60

Arg Phe Phe Gly His Gly Ala Glu Asp Ser Leu Ala Asp Gln Ala Ala

65 70 75 80

Asn Glu Trp Gly Arg Ser Gly Lys Asp Pro Asn His Phe Arg Pro Ala

85 90 95

Gly Leu Pro Glu Lys Tyr

100

<210> 17

<211> 103

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 17

Ser Phe Phe Ser Phe Leu Gly Glu Ala Phe Asp Gly Ala Arg Asp Met

1 5 10 15

Trp Arg Ala Tyr Ser Asp Met Arg Glu Ala Asn Tyr Ile Gly Ser Asp

20 25 30

Lys Tyr Phe His Ala Arg Gly Asn Tyr Asp Ala Ala Lys Arg Gly Pro

35 40 45

Gly Gly Ala Trp Ala Ala Glu Val Ile Ser Asn Ala Arg Glu Asn Ile

50 55 60

Gln Arg Leu Thr Gly Arg Gly Ala Glu Asp Ser Leu Ala Asp Gln Ala

65 70 75 80

Ala Asn Lys Trp Gly Arg Ser Gly Arg Asp Pro Asn His Phe Arg Pro

85 90 95

Ala Gly Leu Pro Glu Lys Tyr

100

<210> 18

<211> 103

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 18

Ser Phe Phe Ser Phe Leu Gly Glu Ala Phe Asp Gly Ala Arg Asp Met

1 5 10 15

Trp Arg Ala Tyr Ser Asp Met Arg Glu Ala Asn Tyr Ile Gly Ser Asp

20 25 30

Lys Tyr Phe His Ala Arg Gly Asn Tyr Asp Ala Ala Lys Arg Gly Pro

35 40 45

Gly Gly Val Trp Ala Ala Glu Ala Ile Ser Asp Ala Arg Glu Asn Ile

50 55 60

Gln Arg Phe Phe Gly His Gly Ala Glu Asp Ser Leu Ala Asp Gln Ala

65 70 75 80

Ala Asn Glu Trp Gly Arg Ser Gly Lys Asp Pro Asn His Phe Arg Pro

85 90 95

Ala Gly Leu Pro Glu Lys Tyr

100

<210> 19

<211> 104

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 19

Arg Ser Phe Phe Ser Phe Leu Gly Glu Ala Phe Asp Gly Ala Arg Asp

1 5 10 15

Met Trp Arg Ala Tyr Ser Asp Met Arg Glu Ala Asn Tyr Ile Gly Ser

20 25 30

Asp Lys Tyr Phe His Ala Arg Gly Asn Tyr Asp Ala Ala Lys Arg Gly

35 40 45

Pro Gly Gly Val Trp Ala Ala Glu Ala Ile Ser Asp Ala Arg Glu Asn

50 55 60

Ile Gln Arg Phe Phe Gly His Gly Ala Glu Asp Ser Leu Ala Asp Gln

65 70 75 80

Ala Ala Asn Glu Trp Gly Arg Ser Gly Lys Asp Pro Asn His Phe Arg

85 90 95

Pro Ala Gly Leu Pro Glu Lys Tyr

100
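
As a rough consistency check, the theoretical mass of SEQ ID No. 19 can be compared with the 11680 m/z value that claim 1 associates with this sequence. The sketch below is an illustration only. It assumes a singly protonated, unmodified polypeptide and uses standard average residue masses in Da; the names `AVG_RESIDUE_MASS` and `average_mz` are hypothetical and do not come from the patent.

```python
# Standard average residue masses in Da (an assumption of this sketch).
AVG_RESIDUE_MASS = {
    "Gly": 57.0519, "Ala": 71.0788, "Ser": 87.0782, "Pro": 97.1167,
    "Val": 99.1326, "Thr": 101.1051, "Cys": 103.1388, "Leu": 113.1594,
    "Ile": 113.1594, "Asn": 114.1038, "Asp": 115.0886, "Gln": 128.1307,
    "Lys": 128.1741, "Glu": 129.1155, "Met": 131.1926, "His": 137.1411,
    "Phe": 147.1766, "Arg": 156.1875, "Tyr": 163.1760, "Trp": 186.2132,
}
WATER = 18.01528   # one water for the free N- and C-termini
PROTON = 1.00728   # mass of the charge carrier

# Residues of SEQ ID No. 19, copied from the sequence listing.
SEQ_ID_19 = """
Arg Ser Phe Phe Ser Phe Leu Gly Glu Ala Phe Asp Gly Ala Arg Asp
Met Trp Arg Ala Tyr Ser Asp Met Arg Glu Ala Asn Tyr Ile Gly Ser
Asp Lys Tyr Phe His Ala Arg Gly Asn Tyr Asp Ala Ala Lys Arg Gly
Pro Gly Gly Val Trp Ala Ala Glu Ala Ile Ser Asp Ala Arg Glu Asn
Ile Gln Arg Phe Phe Gly His Gly Ala Glu Asp Ser Leu Ala Asp Gln
Ala Ala Asn Glu Trp Gly Arg Ser Gly Lys Asp Pro Asn His Phe Arg
Pro Ala Gly Leu Pro Glu Lys Tyr
"""

def average_mz(three_letter_sequence, charge=1):
    """Average m/z of an unmodified polypeptide carrying `charge` protons."""
    residues = three_letter_sequence.split()
    neutral = sum(AVG_RESIDUE_MASS[r] for r in residues) + WATER
    return (neutral + charge * PROTON) / charge

residues = SEQ_ID_19.split()
mz = average_mz(SEQ_ID_19)
print(f"{len(residues)} residues, theoretical [M+H]+ about {mz:.1f} m/z")
```

The residue count can be compared with the <211> field of the listing, and the computed value with the nominal 11680 m/z. Post-translational modifications, disulfide bonds and instrument calibration are ignored here, so a difference of a few Da from the nominal peak would not be surprising.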

Claims

1. A kit for detecting a novel coronavirus infection, wherein the kit comprises software or a chip containing a standard database of characteristic polypeptides, which provides standard data or curves for comparison with the mass spectrometry results of a sample to be tested so as to determine the expression profile of the characteristic polypeptides in the sample to be tested, the kit comprising a polypeptide composition consisting of:

a characteristic polypeptide having a mass to charge ratio of 11680m/z, the polypeptide sequence of which is selected from the group consisting of the sequences shown in SEQ ID No. 19;

wherein, in a serum sample subjected to mass spectrometry detection, a positive sample is indicated when the peaks of the characteristic polypeptides at 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides at 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z are down-regulated, i.e., the provider of the serum sample is determined to be a novel coronavirus infected patient, with a ten-fold cross-validation accuracy of about 93.31%.

2. The kit of claim 1, wherein the polypeptide composition consists only of the characteristic polypeptides having mass to charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z and 14102m/z, respectively, wherein, when the peaks of the characteristic polypeptides at 8986m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides at 6939m/z, 13886m/z, 14049m/z and 14102m/z are down-regulated, the serum sample is indicated as a positive sample, i.e., the provider of the serum sample is determined to be a novel coronavirus infected patient, with a ten-fold cross-validation accuracy of about 91%.

3. The kit according to any one of claims 1-2, wherein the kit comprises a sample processing fluid.

4. The kit of claim 3, wherein the kit further comprises a standard mass spectrometry sample tube that ensures that the molecular weight measured by the mass spectrometer is accurate.