CN103617375B - The method and system of PCR product sequencing and typing

Info

Publication number: CN103617375B
Application number: CN201310634809.5A
Authority: CN
Inventors: 陈永胜; 蒋帅; 龚小龙
Original assignee: SHENZHEN HUADA GENE HEALTH TECHNOLOGY Co Ltd
Current assignee: Bgi Diagnosis Co ltd; BGI Shenzhen Co Ltd
Priority date: 2013-12-02
Filing date: 2013-12-02
Publication date: 2017-08-25
Anticipated expiration: 2033-12-02
Also published as: CN103617375A

Abstract

The present invention proposes a method for sequencing and typing polymerase chain reaction products, comprising the following steps: obtaining a base sequence to be typed, file information, and primer file information; identifying the alignment positional relationship between the base sequence to be typed and a reference sequence according to the file information and the allele type information of the reference sequence, to obtain a first candidate type combination to be analyzed; performing typing identification on the base sequence to be typed according to the primer file information and the primer information of the reference sequence, to obtain a second candidate type combination to be analyzed; and obtaining the final type of the base sequence to be typed according to the first and second candidate type combinations. Embodiments of the invention offer high typing efficiency, accurate and reliable typing, short typing time, and low labor cost. The invention also provides a system for sequencing and typing polymerase chain reaction products.

Description

Method and system for sequencing and typing polymerase chain reaction products

Technical field

The invention relates to the technical field of allele typing, and in particular to a method and system for sequencing and typing polymerase chain reaction products.

Background

HLA (human leukocyte antigen) is a group of glycoproteins encoded by the genes of the major histocompatibility complex (MHC). For the human immune system, HLA and MHC are the same concept: HLA is the human major histocompatibility complex. HLA typing refers to detecting an individual's HLA antigen-specific genes by serological, cytological, or molecular biological methods. In clinical organ transplantation, HLA typing is the main basis for selecting a suitable donor.

The current international standard HLA typing techniques are PCR-SSP (polymerase chain reaction with sequence-specific primers), PCR-SSO (polymerase chain reaction with sequence-specific oligonucleotide probe hybridization), and PCR-SBT (polymerase chain reaction sequencing-based typing, i.e., direct sequencing and typing of PCR products).

PCR-SBT uses primers to amplify the polymorphic regions of the HLA genes by PCR, sequences the amplified products, and determines the HLA allele types with the assistance of computer software. For analyzing gene structure, SBT is a relatively intuitive and accurate method, and it can discover new alleles directly. New alleles identified by PCR-SSP or PCR-SSO are usually also confirmed by sequencing.

In SBT, after the amplified products have been sequenced, software must compare the sequencing results with the HLA standard sequences published in the IMGT database (http://www.ebi.ac.uk/imgt/). The comparison yields the matching rate between the sample sequence and each standard sequence, and the typing conclusion for the sample sequence is drawn from the matching rates.

Relatively successful typing software currently includes uType and SBTengine, both of which are based on PCR-SBT sequencing.

However, as the number of alleles in the IMGT database grows, these programs become slower and less efficient when screening and searching a large range of candidate types during typing. In addition, in the manually assisted typing stage, inspecting the electropherograms is cumbersome and time-consuming. Moreover, existing HLA typing software runs only on Windows and cannot run on Linux or Mac systems.

Summary of the invention

The purpose of the present invention is to solve at least one of the above technical drawbacks.

To this end, one object of the present invention is to propose a method for sequencing and typing polymerase chain reaction products. The method has the advantages of high typing efficiency, accurate and reliable typing, short typing time, and low labor cost.

Another object of the present invention is to propose a system for sequencing and typing polymerase chain reaction products.

To achieve these objects, an embodiment of the present invention provides a method for sequencing and typing polymerase chain reaction products, comprising the following steps: obtaining a base sequence to be typed, file information associated with the base sequence to be typed, and primer file information, wherein the file information includes the locus name (site name) of the base sequence to be typed; identifying the alignment positional relationship between the base sequence to be typed and a reference sequence according to the file information and the allele type information of the reference sequence, to obtain a first candidate type combination to be analyzed; performing further typing identification on the base sequence to be typed according to the primer file information and the primer information of the reference sequence, to obtain a second candidate type combination to be analyzed; and obtaining the final type of the base sequence to be typed according to the first candidate type combination and the second candidate type combination.

According to the method of the embodiment of the present invention, the alignment positional relationship between the base sequence to be typed and the reference sequence can be identified directly from the reference sequence and its related information together with the base sequence to be typed and its related information. This removes the step of aligning the read base sequence one by one to the corresponding positions of each reference sequence and shortens the alignment time, especially when screening and searching a large range of candidate types (for example, many reference sequences and many types). In addition, typing is identified a second time according to the primer information, which ensures the correctness and reliability of the typing. Furthermore, the method types the base sequence automatically and reduces tedious manual work such as importing GSSP information, which saves time and lowers labor cost.

In addition, the method for sequencing and typing polymerase chain reaction products according to the above embodiment of the present invention may also have the following additional technical features:

In some examples, identifying the alignment positional relationship between the base sequence to be typed and the reference sequence according to the file information and the allele type information of the reference sequence, to obtain the first candidate type combination to be analyzed, further includes: matching the locus name of the base sequence to be typed, as included in the file information, against the allele type information of the reference sequence; obtaining the alignment positional relationship between the base sequence to be typed and the reference sequence according to the matching result; and obtaining the first candidate type combination according to the alignment positional relationship.

Further, the allele type information of the reference sequence includes the IMGT HLA class I and class II locus gene information, and the locus name of the base sequence to be typed included in the file information is the locus gene information of the corresponding locus.

In some examples, performing further typing identification on the base sequence to be typed according to the primer file information and the primer information of the reference sequence, to obtain the second candidate type combination to be analyzed, further includes: matching the primer file information against the primer information of the reference sequence; and, if they match, adding the corresponding reference sequence to the second candidate type combination.

Further, the primer information of the reference sequence includes HLA class I and class II GSSP primer information, i.e., backup primer information.

In some examples, obtaining the final type of the base sequence to be typed according to the first candidate type combination and the second candidate type combination further includes: obtaining the intersection of the first candidate type combination and the second candidate type combination; and determining the final type of the base sequence to be typed according to the intersection.

An embodiment of the second aspect of the present invention provides a system for sequencing and typing polymerase chain reaction products, including: a storage module for storing the base sequence to be typed, the file information associated with the base sequence to be typed, and the primer file information, as well as the reference sequence and the allele type information and primer information of the reference sequence, wherein the file information includes the locus name of the base sequence to be typed; an analysis module for identifying the alignment positional relationship between the base sequence to be typed and the reference sequence according to the file information and the allele type information of the reference sequence, to obtain a first candidate type combination to be analyzed, for performing further typing identification on the base sequence to be typed according to the primer file information and the primer information of the reference sequence, to obtain a second candidate type combination to be analyzed, and for obtaining the final type of the base sequence to be typed according to the first and second candidate type combinations; and a display module for displaying the information stored in the storage module and the analysis results of the analysis module.

According to the system of the embodiment of the present invention, the alignment positional relationship between the base sequence to be typed and the reference sequence can be identified directly from the reference sequence and its related information together with the base sequence to be typed and its related information. This removes the step of aligning the read base sequence one by one to the corresponding positions of each reference sequence and shortens the alignment time, especially when screening and searching a large range of candidate types (for example, many reference sequences and many types). In addition, typing is identified a second time according to the primer information, which ensures the correctness and reliability of the typing. Furthermore, the system types the base sequence automatically and reduces tedious manual work such as importing GSSP information, which saves time and lowers labor cost.

In addition, the system for sequencing and typing polymerase chain reaction products according to the above embodiment of the present invention may also have the following additional technical features:

In some examples, the analysis module is used to: match the locus name of the base sequence to be typed, as included in the file information, against the allele type information of the reference sequence; obtain the alignment positional relationship between the base sequence to be typed and the reference sequence according to the matching result; and obtain the first candidate type combination to be analyzed according to the alignment positional relationship.

In some examples, the analysis module is also used to: match the primer file information against the primer information of the reference sequence; and, if they match, add the corresponding reference sequence to the second candidate type combination to be analyzed.

In some examples, the analysis module is also used to: obtain the intersection of the first candidate type combination and the second candidate type combination; and determine the final type of the base sequence to be typed according to the intersection.

Additional aspects and advantages of the invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.

Description of drawings

The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:

Fig. 1 is a flowchart of a method for sequencing and typing polymerase chain reaction products according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the resources consumed when typing different numbers of reference sequences (samples) with the method for sequencing and typing polymerase chain reaction products according to an embodiment of the present invention;

Fig. 3 is an operation flowchart of typing with the method for sequencing and typing polymerase chain reaction products according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of an operation interface for the method for sequencing and typing polymerase chain reaction products according to an embodiment of the present invention; and

Fig. 5 is a structural block diagram of a system for sequencing and typing polymerase chain reaction products according to an embodiment of the present invention.

Detailed description

Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and should not be construed as limiting it.

In the description of the present invention, it should be understood that the terms "longitudinal", "transverse", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like indicate orientations or positional relationships based on those shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation. They therefore should not be construed as limiting the invention.

In the description of the present invention, it should be noted that, unless otherwise specified and limited, the terms "installation", "connected", and "connection" should be understood broadly. For example, a connection may be mechanical or electrical, may be an internal communication between two elements, and may be direct or indirect through an intermediary. Those of ordinary skill in the art can understand the specific meaning of these terms according to the specific situation.

The method and system for sequencing and typing polymerase chain reaction products according to the embodiments of the present invention are described below with reference to the accompanying drawings.

Fig. 1 is a flowchart of a method for sequencing and typing polymerase chain reaction products according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:

Step S101: Obtain the base sequence to be typed, the file information associated with the base sequence to be typed, and the primer file information, wherein the file information includes the locus name (site name) of the base sequence to be typed.

Specifically, the base sequence to be typed, the file information associated with it, and the primer file information may be stored in a database in advance.

For example, the base sequence to be typed, the file information associated with it, and the primer file information are stored in a dynamic database, which mainly holds the intermediate result data produced while analyzing the samples (i.e., the base sequences to be typed). The dynamic database stores, for example, three tables: a file information table, a primer file information table, and a sample information table. The file information table stores the useful information of each sequence file (for example, the file information associated with the base sequence to be typed); the primer file information table stores the useful information of each primer file (for example, the primer file information associated with the base sequence to be typed); and the sample information table stores the useful information of each sample (for example, the base sequence to be typed).
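The three-table dynamic database described above could be sketched as follows. This is only an illustration under assumptions: the patent names the tables but not their columns or the storage engine, so the use of SQLite and every column name here (`file_name`, `sample_id`, `locus`, `primer_name`, `bases`) are hypothetical.

```python
import sqlite3


def create_dynamic_db(path=":memory:"):
    """Create the three illustrative tables of the 'dynamic' database.

    Table roles follow the description; all column names are assumed.
    """
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS file_info (
            file_name TEXT PRIMARY KEY,   -- one row per sequence file
            sample_id TEXT NOT NULL,
            locus     TEXT NOT NULL       -- locus name of the sequence
        );
        CREATE TABLE IF NOT EXISTS primer_file_info (
            file_name   TEXT PRIMARY KEY, -- one row per primer file
            sample_id   TEXT NOT NULL,
            primer_name TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS sample_info (
            sample_id TEXT PRIMARY KEY,   -- one row per sample
            bases     TEXT NOT NULL       -- base sequence to be typed
        );
        """
    )
    return conn


conn = create_dynamic_db()
table_names = {
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
}
```

An in-memory database is used here only so the sketch runs without touching disk; a real tool would presumably persist the intermediate results.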

Step S102: Identify the alignment positional relationship between the base sequence to be typed and the reference sequence according to the file information and the allele type information of the reference sequence, to obtain the first candidate type combination to be analyzed.

In one embodiment of the present invention, this can be realized through the following steps:

Step S1021: Match the locus name of the base sequence to be typed, as included in the file information, against the allele type information of the reference sequence.

Step S1022: Obtain the alignment positional relationship between the base sequence to be typed and the reference sequence according to the matching result.

Step S1023: Obtain the first candidate type combination to be analyzed according to the alignment positional relationship.

Here, the allele type information of the reference sequence includes the IMGT HLA class I and class II locus gene information, and the locus name of the base sequence to be typed included in the file information is the locus gene information of the corresponding locus.

Specifically, the reference sequence and its associated information may be stored in a preset static database, which also stores three tables: a gene information table (storing the gene information of the reference sequences), a type information table (storing the types of the reference sequences), and a primer information table (storing the primer information of the reference sequences). More specifically, the gene information table stores the HLA class I and class II locus gene information from IMGT, the type information table stores all allele type information, and the primer information table stores the information of all HLA class I and class II GSSP primers and backup primers.

In this way, the base sequence to be typed can be preliminarily typed using the reference sequence and its related information in the static database together with the base sequence to be typed and its related information in the dynamic database. As described above, this information allows the alignment positional relationship between the base sequence to be typed and the reference sequence to be identified directly, which yields the first candidate type combination to be analyzed. This removes the step of aligning the read base sequence one by one to the corresponding positions of each reference sequence and shortens the alignment time, especially when screening and searching a large range of candidate types (for example, many reference sequences and many types).
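A minimal sketch of the locus-name lookup in steps S1021 to S1023 is given below. It assumes that each reference allele record carries its locus name, so that candidates can be selected by a dictionary lookup instead of aligning the read against every reference. The `ReferenceAllele` class, its fields, and the function names are hypothetical, and the allele names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceAllele:
    name: str   # allele type name, e.g. "A*01:01" (illustrative)
    locus: str  # locus the allele belongs to, e.g. "A"


def build_locus_index(references):
    """Group reference alleles by locus once, so later lookups are O(1)."""
    index = {}
    for ref in references:
        index.setdefault(ref.locus, []).append(ref)
    return index


def first_candidates(index, locus_name):
    """Return the names of all reference alleles sharing the sample's locus."""
    return {ref.name for ref in index.get(locus_name, [])}


references = [
    ReferenceAllele("A*01:01", "A"),
    ReferenceAllele("A*02:01", "A"),
    ReferenceAllele("B*07:02", "B"),
]
index = build_locus_index(references)
candidates_a = first_candidates(index, "A")
```

The index is built once per reference database, which is one plausible reading of how a locus-name match avoids comparing the read against each reference in turn.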

Step S103: Perform further typing identification on the base sequence to be typed according to the primer file information and the primer information of the reference sequence, to obtain the second candidate type combination to be analyzed.

In one embodiment of the present invention, this may include the following steps:

Step S1031: Match the primer file information against the primer information of the reference sequence.

Step S1032: If they match, add the corresponding reference sequence to the second candidate type combination to be analyzed.

Here, the primer information of the reference sequence includes HLA class I and class II GSSP primer information, i.e., backup primer information.

That is, when identifying a base sequence sequenced with GSSP primers, the GSSP primer information can be retrieved directly from the static database and used to interpret the base sequence to be typed, which yields the second candidate type combination to be analyzed. In other words, the primer information associated with the base sequence to be typed is compared with the primer information of the reference sequences to obtain the second candidate type combination.
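Steps S1031 and S1032 can be sketched as a lookup of the sample's primer names in the reference primer table. The data layout is an assumption: the patent does not say how primer information is keyed, so the mapping from primer name to allele names, the primer names, and the function name are all hypothetical.

```python
def second_candidates(sample_primers, reference_primers):
    """Collect reference alleles whose primer information matches the sample.

    sample_primers: iterable of primer names taken from the primer file info.
    reference_primers: dict mapping a primer name to the set of reference
        allele names associated with that primer (assumed layout).
    """
    candidates = set()
    for primer in sample_primers:
        # A primer absent from the reference table contributes nothing.
        candidates |= set(reference_primers.get(primer, ()))
    return candidates


reference_primers = {
    "GSSP_1": {"A*01:01"},
    "GSSP_2": {"A*02:01", "A*03:01"},
}
second = second_candidates(["GSSP_2"], reference_primers)
```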

Step S104: Obtain the final type of the base sequence to be typed according to the first candidate type combination and the second candidate type combination.

Specifically, this can be realized through the following steps:

Step S1041: Obtain the intersection of the first candidate type combination and the second candidate type combination.

Step S1042: Determine the final type of the base sequence to be typed according to the intersection.

That is, the final type of the base sequence to be typed is confirmed from the intersection of the first candidate type combination and the second candidate type combination.
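The intersection in steps S1041 and S1042 amounts to a set operation. The sketch below assumes each candidate combination is represented as a set of allele names; the function name and the choice to return a sorted list are illustrative, and the patent does not say what happens when the intersection is empty or contains several types.

```python
def final_types(first, second):
    """Return the types present in both candidate combinations, sorted."""
    return sorted(set(first) & set(second))


first = {"A*01:01", "A*02:01"}
second = {"A*02:01", "A*03:01"}
result = final_types(first, second)
```

In this sketch an empty result simply means the two passes disagree; how such a case is resolved (for example, by manual review) is left open.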

Fig. 2 shows the memory, CPU time, and typing (i.e., analysis) time consumed when the method of the embodiment of the present invention is applied to different numbers of reference sequences. As can be seen from Fig. 2, even typing 88 samples (i.e., base sequences to be typed) is completed within a few minutes.

The method of the embodiment of the present invention can be implemented in software, which may take the form of a software product. An operator using the software for typing then only needs to import some necessary data. Specifically, Fig. 3 shows the steps the operator performs when typing with the method of the embodiment, and Fig. 4 is a schematic diagram of an operation interface, in which 1 denotes base navigation, 2 the sequence display, 3 the electropherogram display, 4 the sample list, and 5 the type list. With reference to Fig. 3 and Fig. 4, an operator typing with the method of the embodiment only needs to perform the following steps:

Step S301: Import files.

Step S302: Click a sample file in the file list module.

Step S303: Check the mismatched positions.

Step S304: Edit bases.

Step S305: Check the type list module.

Step S306: According to the sequence file information, edit the mismatched bases until a type or type combination with 0 mismatches appears.

Step S307: Determine whether GSSP primers are needed. If yes, go to step S308; otherwise go to step S311.

Step S308: Save.

Step S309: Run GSSP.

Step S310: Import the saved file and the GSSP file, and return to step S306.

Step S311: Save.

Step S312: Mark.

Step S313: Save.

Step S314: Export the report.
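The branch at step S307 and the return from S310 to S306 form a loop, which can be traced with a short sketch. It assumes that S307 follows S306 on every pass, as the step order suggests; the function name and the idea of passing the number of GSSP rounds as a parameter are hypothetical and serve only to make the control flow explicit.

```python
def operator_steps(gssp_rounds):
    """List the step labels visited when GSSP primers are needed
    `gssp_rounds` times before typing can be finalized."""
    steps = ["S301", "S302", "S303", "S304", "S305", "S306"]
    for _ in range(gssp_rounds):
        # S307 answers "yes": save, run GSSP, re-import, back to S306.
        steps += ["S307", "S308", "S309", "S310", "S306"]
    # S307 answers "no": save, mark, save, export.
    steps += ["S307", "S311", "S312", "S313", "S314"]
    return steps


no_gssp = operator_steps(0)
one_round = operator_steps(1)
```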

本发明的进一步实施例提供了一种聚合酶链式反应产物测序分型的系统。如图5所示，根据本发明一个实施例的聚合酶链式反应产物测序分型的系统500，包括：存储模块510、分析模块520和显示模块530。A further embodiment of the present invention provides a system for sequencing and typing polymerase chain reaction products. As shown in FIG. 5 , a system 500 for sequencing and typing polymerase chain reaction products according to an embodiment of the present invention includes: a storage module 510 , an analysis module 520 and a display module 530 .

其中，存储模块510用于存储待分型碱基序列、与待分型碱基序列关联的文件信息和引物文件信息，以及参考序列、参考序列的等位基因型别信息和引物信息，其中，文件信息包括待分型碱基序列的位点名称。Wherein, the storage module 510 is used to store the base sequence to be typed, the file information associated with the base sequence to be typed, and the primer file information, as well as the reference sequence, the allele type information of the reference sequence, and the primer information, wherein, The file information includes the site name of the base sequence to be typed.

存储模块510可分为静态数据库和动态数据库。例如：待分型碱基序列、与待分型碱基序列关联的文件信息和引物文件信息等存储在动态数据库中，该动态数据库主要用于存储分析样品（即待分型碱基序列）中的中间结果数据。该动态数据库例如存储了3个数据库表，分别为文件信息表、引物文件信息表和样品信息表。文件信息表存储每个序列文件的有用信息（例如：与待分型碱基序列关联的文件信息）；引物文件信息表存储每个引物文件的有用信息（例如：与待分型碱基序列关联的引物文件信息）；样品信息表存储每个样品的有用信息（例如：待分型碱基序列）。The storage module 510 can be divided into a static database and a dynamic database. For example: the base sequence to be typed, the file information associated with the base sequence to be typed, and the primer file information are stored in a dynamic database, which is mainly used to store and analyze samples (ie, the base sequence to be typed) intermediate result data. The dynamic database stores, for example, three database tables, namely a file information table, a primer file information table, and a sample information table. The file information table stores the useful information of each sequence file (for example: the file information associated with the base sequence to be typed); the primer file information table stores the useful information of each primer file (for example: the file information associated with the base sequence to be typed primer file information); the sample information table stores the useful information of each sample (for example: the base sequence to be typed).

The reference sequences and their associated information can be stored in the static database, which also holds three tables: a gene information table (storing the gene information of the reference sequences), a type information table (storing the types of the reference sequences) and a primer information table (storing the primer information of the reference sequences). More specifically, the gene information table stores the HLA class I and class II locus gene information from IMGT, the type information table stores all allele type information, and the primer information table stores all GSSP primers and spare primers for HLA class I and class II.
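
As an illustration only, the two databases and their three tables each could be laid out as sketched below. The sketch uses Python's built-in sqlite3 module; the table names, column names and helper functions are assumptions made for this example and are not specified by the embodiment.

```python
import sqlite3


def create_static_db(conn):
    """Create the three static tables (all names and columns are hypothetical)."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS gene_info (locus TEXT PRIMARY KEY, hla_class TEXT);
        CREATE TABLE IF NOT EXISTS type_info (allele TEXT PRIMARY KEY, locus TEXT, sequence TEXT);
        CREATE TABLE IF NOT EXISTS primer_info (primer_name TEXT, locus TEXT, allele TEXT);
    """)


def create_dynamic_db(conn):
    """Create the three dynamic tables (all names and columns are hypothetical)."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS file_info (file_name TEXT PRIMARY KEY, sample_id TEXT, locus TEXT);
        CREATE TABLE IF NOT EXISTS primer_file_info (file_name TEXT, primer_name TEXT);
        CREATE TABLE IF NOT EXISTS sample_info (sample_id TEXT PRIMARY KEY, sequence TEXT);
    """)


def table_names(conn):
    """Return the names of the user tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

Keeping the reference data and the per-sample data in separate databases, as the text describes, means the reference tables can be loaded once while the dynamic tables are rewritten for each batch of samples.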

The analysis module 520 identifies the alignment position relationship between the base sequence to be typed and the reference sequence according to the file information and the allele type information of the reference sequence, so as to obtain a first candidate type combination to be analyzed. It further types the base sequence to be typed according to the primer file information and the primer information of the reference sequence, so as to obtain a second candidate type combination to be analyzed. Finally, it obtains the final type of the base sequence to be typed from the first and second candidate type combinations.

Specifically, the analysis module 520 matches the locus name of the base sequence to be typed, which is included in the file information, against the allele type information of the reference sequence; obtains the alignment position relationship between the base sequence to be typed and the reference sequence from the matching result; and obtains the first candidate type combination to be analyzed from that alignment position relationship.
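
The locus-matching step can be pictured with the following minimal sketch. It assumes the reference alleles are held in a dictionary and, as a deliberate simplification, uses an exact substring search in place of the alignment procedure, which this passage does not specify. The function name, the data layout and the toy sequences are all hypothetical.

```python
def first_candidates(locus, query, references):
    """Return {allele: offset} for same-locus references that contain the query.

    references maps an allele name to a (locus, sequence) tuple.
    The exact substring search is a stand-in for the real alignment step.
    """
    hits = {}
    for allele, (ref_locus, ref_seq) in references.items():
        if ref_locus != locus:
            continue  # only references at the locus named in the file information
        offset = ref_seq.find(query)
        if offset >= 0:
            hits[allele] = offset  # alignment position of the query in the reference
    return hits
```

Restricting the search to references that share the locus name keeps the comparison set small before any position is computed.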

In one embodiment of the present invention, the analysis module 520 also matches the primer file information against the primer information of the reference sequence and, if they match, adds the corresponding reference sequence to the second candidate type combination to be analyzed. The primer information of the reference sequence includes the GSSP primer information for HLA class I and class II, that is, the spare primer information.
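
A minimal sketch of this primer-matching step follows. It assumes that primers are identified by name and that a "match" means the sample's primer file shares at least one primer with a reference allele; both the matching criterion and the names used are assumptions for illustration, not details given in the embodiment.

```python
def second_candidates(sample_primers, reference_primers):
    """Return the set of alleles whose primers overlap the sample's primers.

    sample_primers: iterable of primer names read from the primer file.
    reference_primers: dict mapping an allele name to a set of primer names.
    """
    sample_set = set(sample_primers)
    return {
        allele
        for allele, primers in reference_primers.items()
        if primers & sample_set  # non-empty overlap counts as a match (assumed)
    }
```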

Further, the analysis module 520 obtains the intersection of the first and second candidate type combinations to be analyzed and determines the final type of the base sequence to be typed from that intersection. Here, the allele type information of the reference sequence includes the HLA class I and class II locus gene information from IMGT, and the locus name of the base sequence to be typed, included in the file information, is the locus gene information of the corresponding locus.
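
The intersection step can be sketched as a plain set operation. The function name and the example allele names are hypothetical, and the embodiment does not say how an empty intersection is handled; this sketch simply returns an empty list in that case.

```python
def final_types(first, second):
    """Return the sorted alleles present in both candidate combinations.

    Each argument is any iterable of allele names; a dict contributes its keys.
    """
    return sorted(set(first) & set(second))
```

For example, `final_types({"A*01:01": 2, "A*02:01": 0}, {"A*02:01", "A*03:01"})` keeps only the allele supported by both the alignment step and the primer step.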

The display module 530 displays the information stored in the storage module and the analysis results of the analysis module.

In summary, the system of this embodiment comprises a storage module (the database module), an analysis module and a display module (the interface display module). All known HLA type sequences in the IMGT/HLA database are stored in the static database of the storage module in a predefined format, and the valid data of each sample to be analyzed (base sequence to be typed) are stored in the dynamic database of the storage module in a fixed format once the sample is imported into the software, so that the data can be retrieved easily during analysis.

The analysis module implements the typing function, that is, the core function of typing the HLA genotype of a sample. It can be divided into two submodules: a file information extraction module and a sample HLA typing module. The file information extraction module extracts the useful information from the sequencing trace (peak map) files and stores it in the storage module. The sample HLA typing module merges the file information and performs HLA typing sample by sample.
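
The hand-off between the two submodules can be sketched as below. The sketch assumes that the extraction submodule has already produced one record per trace file and that each record carries a sample identifier; the record fields and the function name are hypothetical.

```python
from collections import defaultdict


def merge_by_sample(file_records):
    """Group per-file records by sample so that typing can run once per sample.

    file_records: iterable of dicts, each with at least a "sample_id" key.
    Returns a dict mapping each sample_id to its list of records, in input order.
    """
    samples = defaultdict(list)
    for record in file_records:
        samples[record["sample_id"]].append(record)
    return dict(samples)
```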

The display module (the interface display module) generates the operation interface, that is, a visual interface. For example, as shown in FIG. 4, the interface provides the following panels. The sample list shows the opened files in a file-tree structure and provides the related right-click functions. The type list shows the typing results and provides the related right-click functions. The base navigation panel allows quick navigation between different exon regions. The sequence display panel shows the aligned sequences. The peak map display panel shows the sequencing peak maps.

In this specification, a description that refers to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Such expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principle and spirit of the present invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims

1. A method for sequencing and typing polymerase chain reaction products, characterized by comprising the following steps:

obtaining the base sequence to be typed, and the file information and primer file information associated with the base sequence to be typed, wherein the file information includes the locus name of the base sequence to be typed;

identifying the alignment position relationship between the base sequence to be typed and the reference sequence according to the file information and the allele type information of the reference sequence, so as to obtain a first candidate type combination to be analyzed;

further typing the base sequence to be typed according to the primer file information and the primer information of the reference sequence, so as to obtain a second candidate type combination to be analyzed; and

obtaining the final type of the base sequence to be typed according to the first candidate type combination to be analyzed and the second candidate type combination to be analyzed.

2. The method for sequencing and typing polymerase chain reaction products according to claim 1, characterized in that identifying the alignment position relationship between the base sequence to be typed and the reference sequence according to the file information and the allele type information of the reference sequence, so as to obtain the first candidate type combination to be analyzed, further comprises:

matching the locus name of the base sequence to be typed, which is included in the file information, against the allele type information of the reference sequence;

obtaining the alignment position relationship between the base sequence to be typed and the reference sequence according to the matching result; and

obtaining the first candidate type combination to be analyzed according to the alignment position relationship.

3. The method for sequencing and typing polymerase chain reaction products according to claim 2, characterized in that the allele type information of the reference sequence comprises the HLA class I and class II locus gene information of IMGT, and the locus name of the base sequence to be typed included in the file information is the locus gene information of the corresponding locus.

4. The method for sequencing and typing polymerase chain reaction products according to claim 1, characterized in that further typing the base sequence to be typed according to the primer file information and the primer information of the reference sequence, so as to obtain the second candidate type combination to be analyzed, further comprises:

matching the primer file information against the primer information of the reference sequence; and

if they match, adding the corresponding reference sequence to the second candidate type combination to be analyzed.

5. The method for sequencing and typing polymerase chain reaction products according to claim 4, wherein the primer information of the reference sequence includes GSSP primer information of HLA class I and class II, that is, spare primer information.

6. The method for sequencing and typing polymerase chain reaction products according to any one of claims 1-5, characterized in that obtaining the final type of the base sequence to be typed according to the first candidate type combination to be analyzed and the second candidate type combination to be analyzed further comprises:

obtaining the intersection of the first candidate type combination to be analyzed and the second candidate type combination to be analyzed; and

determining the final type of the base sequence to be typed according to the intersection.

7. A system for sequencing and typing polymerase chain reaction products, characterized in that it comprises:

a storage module, configured to store the base sequence to be typed, the file information and primer file information associated with the base sequence to be typed, and the reference sequence together with the allele type information and primer information of the reference sequence, wherein the file information includes the locus name of the base sequence to be typed;

an analysis module, configured to identify the alignment position relationship between the base sequence to be typed and the reference sequence according to the file information and the allele type information of the reference sequence, so as to obtain a first candidate type combination to be analyzed; to further type the base sequence to be typed according to the primer file information and the primer information of the reference sequence, so as to obtain a second candidate type combination to be analyzed; and to obtain the final type of the base sequence to be typed according to the first candidate type combination to be analyzed and the second candidate type combination to be analyzed; and

a display module, configured to display the information stored in the storage module and the analysis results of the analysis module.

8. The system for sequencing and typing polymerase chain reaction products according to claim 7, characterized in that the analysis module is used for:

9. The system for sequencing and typing polymerase chain reaction products according to claim 8, characterized in that the allele type information of the reference sequence comprises the HLA class I and class II locus gene information of IMGT, and the locus name of the base sequence to be typed included in the file information is the locus gene information of the corresponding locus.

10. The system for sequencing and typing polymerase chain reaction products according to claim 7, characterized in that the analysis module is also used for:

11. The system for sequencing and typing polymerase chain reaction products according to claim 10, wherein the primer information of the reference sequence includes GSSP primer information of HLA class I and class II, that is, spare primer information.

12. The system for sequencing and typing polymerase chain reaction products according to any one of claims 7-11, wherein the analysis module is also used for: