CN112735519B - Method, device and storage medium for positioning segregation character - Google Patents

Info

Publication number
CN112735519B
Authority
CN
China
Prior art keywords
information
segregation
partial separation
data
partial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110031263.9A
Other languages
Chinese (zh)
Other versions
CN112735519A (en)
Inventor
邓秀新
王楠
宋谢天
周银
胡健兵
谢源源
叶俊丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong Agricultural University
Original Assignee
Huazhong Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong Agricultural University
Priority to CN202110031263.9A
Publication of CN112735519A
Application granted
Publication of CN112735519B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection

Landscapes

  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Analytical Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method, a device and a storage medium for locating segregation distortion traits. The method comprises: importing the phenotype data to be located of a genetic population, the genotype variation information of the parents and progeny of the genetic population, and genome reference information; dividing the reference information into data windows; analyzing the degree of segregation distortion of the variation information within the data windows to obtain the to-be-compared segregation distortion degree information; extracting from the variation information a segregation-distortion-removed variant file and a segregation-distortion-increased variant file for the contrasting traits; analyzing the degree of segregation distortion of these two files to obtain first and second segregation distortion degree information; comparing the first and second segregation distortion degree information with the to-be-compared segregation distortion degree information; and obtaining the location segment of the segregation distortion trait from the comparison result. The invention obtains the location segment of a segregation distortion trait quickly and accurately, and solves the problem that traits showing segregation distortion cannot be located.

Figure 202110031263

Description

A method, device and storage medium for locating segregation distortion traits

Technical field

The invention relates to the technical field of genetic data processing, and in particular to a method, a device and a storage medium for locating segregation distortion traits.

Background

One of the main approaches of forward genetics is to locate the genomic segments that control a trait by using a hybrid population. For qualitative traits controlled by a single gene, a BC1 testcross segregating population or an F2 selfed segregating population is usually constructed. If a chi-square test shows that the dominant and recessive traits segregate 1:1 in the BC1 progeny and 1:2:1 in the F2 progeny, the usual methods for simple qualitative traits, such as QTL mapping and BSA mapping, perform well. These methods may perform poorly, however, for traits that can affect the survival rate of the progeny, because such traits affect the segregation of progeny phenotypes and produce segregation distortion of phenotypes in the segregating population. At present there is no solution to the problem that traits showing segregation distortion cannot be located.

Summary of the invention

The technical problem to be solved by the invention is to provide, in view of the deficiencies of the prior art, a method, a device and a storage medium for locating segregation distortion traits.

The technical solution of the invention to the above technical problem is as follows: a data processing method for locating segregation distortion traits, comprising the following steps:

importing the phenotype data to be located of a genetic population, the genotype variation information of the parents and progeny of the genetic population, and genome reference information;

dividing the genome reference information by a data window division method to obtain a plurality of data windows;

analyzing the degree of segregation distortion of the genotype variation information within the plurality of data windows to obtain the to-be-compared segregation distortion degree information;

dividing the progeny of the genetic population in the phenotype data to be located into subgroups with different traits, and, using the resulting subgroups as the criterion, extracting from the genotype variation information a segregation-distortion-removed variant file and a segregation-distortion-increased variant file for the contrasting traits;

analyzing the degree of segregation distortion of the segregation-distortion-removed variant file over the plurality of data windows to obtain first segregation distortion degree information, and analyzing the degree of segregation distortion of the segregation-distortion-increased variant file over the plurality of data windows to obtain second segregation distortion degree information;

comparing the first segregation distortion degree information and the second segregation distortion degree information with the to-be-compared segregation distortion degree information, and obtaining the location segment of the segregation distortion trait from the comparison result.

Another technical solution of the invention to the above technical problem is as follows: a device for locating segregation distortion traits, comprising:

an import module, configured to import the phenotype data to be located of a genetic population, the genotype variation information of the parents and progeny of the genetic population, and genome reference information;

a window division module, configured to divide the genome reference information by a data window division method to obtain a plurality of data windows;

a processing module, configured to analyze the degree of segregation distortion of the genotype variation information within the plurality of data windows to obtain segregation distortion degree information;

to divide the progeny of the genetic population in the phenotype data to be located into at least two subgroups with different traits, and, using the resulting subgroups as the criterion, to extract from the genotype variation information a segregation-distortion-removed variant file and a segregation-distortion-increased variant file for the contrasting traits;

and to analyze the degree of segregation distortion of the segregation-distortion-removed variant file over the plurality of data windows to obtain first to-be-compared segregation distortion degree information, and of the segregation-distortion-increased variant file over the plurality of data windows to obtain second to-be-compared segregation distortion degree information;

a comparison module, configured to compare the first to-be-compared segregation distortion degree information and the second to-be-compared segregation distortion degree information separately with the segregation distortion degree information to obtain a first comparison result and a second comparison result, and to intersect the first and second comparison results to obtain the location segment of the segregation distortion trait.

Another technical solution of the invention to the above technical problem is as follows: a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for locating segregation distortion traits described above.

The beneficial effects of the invention are as follows: the genome reference information is divided into windows, giving a plurality of data windows in which the degree of segregation distortion is analyzed both for the initial genotype variation information and for the genotype variation information after division by trait. This yields the to-be-compared segregation distortion degree information, the first segregation distortion degree information and the second segregation distortion degree information, which are compared and analyzed to determine the location segment of the segregation distortion trait.

Description of the drawings

FIG. 1 is a schematic flowchart of the data processing method for locating segregation distortion traits provided by an embodiment of the invention;

FIG. 2 is a schematic diagram of the functional modules of the device for locating segregation distortion traits provided by an embodiment of the invention.

Detailed description

The principles and features of the invention are described below with reference to the drawings. The examples given only explain the invention and do not limit its scope.

FIG. 1 is a schematic flowchart of the data processing method for locating segregation distortion traits provided by an embodiment of the invention.

Embodiment 1: as shown in FIG. 1, a data processing method for locating segregation distortion traits comprises the following steps:

importing the phenotype data to be located of a genetic population, the genotype variation information of the parents and progeny of the genetic population, and genome reference information;

dividing the genome reference information by a data window division method to obtain a plurality of data windows;

analyzing the degree of segregation distortion of the genotype variation information within the plurality of data windows to obtain the to-be-compared segregation distortion degree information;

dividing the progeny of the genetic population in the phenotype data to be located into subgroups with different traits, and, using the resulting subgroups as the criterion, extracting from the genotype variation information a segregation-distortion-removed variant file and a segregation-distortion-increased variant file for the contrasting traits;

analyzing the degree of segregation distortion of the segregation-distortion-removed variant file over the plurality of data windows to obtain first segregation distortion degree information, and analyzing the degree of segregation distortion of the segregation-distortion-increased variant file over the plurality of data windows to obtain second segregation distortion degree information;

comparing the first segregation distortion degree information and the second segregation distortion degree information with the to-be-compared segregation distortion degree information, and obtaining the location segment of the segregation distortion trait from the comparison result.

It should be understood that the "genotype variation information" in "the genotype variation information of the parents and progeny of the genetic population" refers to the information shared by the parents and progeny of the genetic population.

In the above embodiment, the genome reference information is divided into windows, giving a plurality of data windows in which the degree of segregation distortion is analyzed both for the initial genotype variation information and for the genotype variation information after division by trait. This yields the to-be-compared segregation distortion degree information, the first segregation distortion degree information and the second segregation distortion degree information, which are compared and analyzed to determine the location segment of the segregation distortion trait.

Embodiment 2, on the basis of Embodiment 1: the process of dividing the genome reference information into windows to obtain a plurality of data windows comprises:

dividing the genome reference information into windows according to a preset step value to obtain a plurality of data windows, where the preset step value is a length of 100 kb.

In the above embodiment, because the genome reference information is long, it is divided into segments of equal length. This makes it easier to index, and to analyze the degree of segregation distortion of, both the initial genotype variation information and the genotype variation information after division by trait.
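The window division of Embodiment 2 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the use of 0-based half-open coordinates and the choice of non-overlapping windows are assumptions; the text only specifies equal-length division with a 100 kb step.

```python
def make_windows(chrom_lengths, window_size=100_000):
    """Split each chromosome into consecutive windows of equal length.

    Returns (chromosome, start, end) tuples in 0-based, half-open
    coordinates; the last window of a chromosome may be shorter.
    Non-overlapping windows are an assumption of this sketch.
    """
    windows = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, window_size):
            windows.append((chrom, start, min(start + window_size, length)))
    return windows


def window_index(position, window_size=100_000):
    """Index of the window that contains a 0-based position."""
    return position // window_size
```

With a hypothetical 250 kb chromosome, `make_windows({"chr1": 250_000})` yields two full windows and one 50 kb remainder, and `window_index` gives the window to which a variant position belongs.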

Embodiment 3, on the basis of Embodiment 1: before the degree of segregation distortion of the genotype variation information is analyzed within the plurality of data windows, the method further comprises a step of optimizing the genotype variation information, which comprises:

filtering out false-positive sites of progeny genotypes from the genotype variation information;

screening the filtered genotype variation information by variation type according to a preset Mendelian genetic theory model to obtain the Mendelian segregation ratio.

Specifically, the possible marker types of the progeny are inferred from the marker types of the parents. A site at which the individuals that do not match the theoretical progeny genotypes make up more than 5% of the whole population can be regarded as a false-positive site and removed.
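The false-positive filter described above could look like the following sketch. It assumes diploid, biallelic genotypes written as two-character strings such as "AG"; the function names and the decision to discard sites without any progeny calls are illustrative assumptions, not details given in the text.

```python
from itertools import product


def expected_offspring(parent1, parent2):
    """All genotypes obtainable by taking one allele from each parent.

    Genotypes are unordered allele pairs, normalised to sorted strings.
    """
    return {"".join(sorted(a + b)) for a, b in product(parent1, parent2)}


def is_false_positive(parent1, parent2, offspring, max_unexpected=0.05):
    """Flag a site when more than 5% of progeny carry an impossible genotype."""
    if not offspring:
        return True  # assumption: a site without progeny calls is unusable
    allowed = expected_offspring(parent1, parent2)
    unexpected = sum("".join(sorted(g)) not in allowed for g in offspring)
    return unexpected / len(offspring) > max_unexpected
```

For an "AA" x "AG" cross the allowed progeny genotypes are "AA" and "AG", so a site where a noticeable share of progeny are called "GG" would be removed.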

Specifically, variation types are screened according to the model inferred for the trait to be located. For example, when a BC1 segregation model is established, sites at which exactly one parent is heterozygous are selected; when an F2 segregation model is established, sites at which both parents are heterozygous are selected. The Mendelian segregation ratio is determined from the segregation model, a chi-square test is performed against this ratio, and the sites whose chi-square p-value is less than 0.001 are retained.
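The site selection and chi-square step can be illustrated with a short sketch. The function names are hypothetical, and the closed-form p-values are only valid for one or two degrees of freedom, which covers the 1:1 and 1:2:1 ratios mentioned in the text; a statistics library would normally be used for the general case.

```python
import math


def is_het(genotype):
    """True when the two alleles of a diploid genotype differ."""
    return genotype[0] != genotype[1]


def site_fits_model(parent1, parent2, model):
    """BC1: exactly one parent heterozygous; F2: both parents heterozygous."""
    if model == "BC1":
        return is_het(parent1) != is_het(parent2)
    if model == "F2":
        return is_het(parent1) and is_het(parent2)
    raise ValueError("unknown model: " + model)


def chi_square_p(observed, expected_ratio):
    """Pearson goodness-of-fit test of genotype counts against a ratio."""
    total = sum(observed)
    if total == 0:
        raise ValueError("no observations")
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1
    if dof == 1:
        p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival, 1 d.o.f.
    elif dof == 2:
        p = math.exp(-chi2 / 2)  # chi-square survival, 2 d.o.f.
    else:
        raise ValueError("only 2 or 3 genotype classes are supported")
    return chi2, p


def is_distorted(observed, expected_ratio, alpha=0.001):
    """Retain a site as distorted when p falls below the 0.001 threshold."""
    return chi_square_p(observed, expected_ratio)[1] < alpha
```

Under a BC1 model, counts of 80 and 20 tested against 1:1 give a chi-square statistic of 36, far beyond the 0.001 threshold, whereas counts of 50 and 50 give a statistic of 0 and a p-value of 1.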

Embodiment 4, on the basis of Embodiment 3: the process of analyzing the degree of segregation distortion of the genotype variation information within each data window to obtain the segregation distortion degree information comprises:

counting, in each data window, the frequency of the genotype variation information at segregation distortion sites, and obtaining the number of segregation distortion sites from that frequency;

performing a chi-square test against the Mendelian segregation ratio, and obtaining the variation information of the segregation distortion sites using the p-value of the chi-square test as the criterion, where the p-value is less than 0.001;

taking the number of segregation distortion sites and the variation information of the segregation distortion sites as the segregation distortion degree information.

Specifically, the frequency of the genotype variation information at segregation distortion sites is counted in each data window and plotted to give a genome-wide distribution map of the segregation distortion sites of the hybrid population, and the degree of segregation distortion is calculated in each window.

It should be understood that the variation information of the segregation distortion sites includes their positions on the chromosome. The sites are counted within the 100 kb windows into which the genome has already been divided, and the count is recorded as the segregation distortion frequency of the window, which reflects the reliability of the segregation distortion at that location. Every segregation distortion site in a window carries the characteristics of segregation distortion, and the p-value reflects the degree of distortion; in this embodiment, log10 of p is taken to express that degree.

In the above embodiment, the variation information of the segregation distortion sites is obtained by performing a chi-square test against the Mendelian segregation ratio.
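A per-window summary along the lines of Embodiment 4 might be sketched as below. The text states only that log10 of p is taken; negating it so that larger values mean stronger distortion, averaging it within a window, and clamping p away from zero are assumptions of this sketch, as are the function and parameter names.

```python
import math
from collections import defaultdict


def window_distortion(sites, window_size=100_000):
    """Summarise segregation distortion sites per window.

    `sites` holds (chromosome, position, p_value) for sites that already
    passed the p < 0.001 filter. Returns a dict mapping
    (chromosome, window_index) to (site_count, mean_neg_log10_p).
    """
    scores = defaultdict(list)
    for chrom, pos, p in sites:
        p = max(p, 1e-300)  # guard against log10(0)
        scores[(chrom, pos // window_size)].append(-math.log10(p))
    return {key: (len(vals), sum(vals) / len(vals))
            for key, vals in scores.items()}
```

The site count corresponds to the segregation distortion frequency of the window, and the averaged score is one possible way to express its degree.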

Embodiment 5, on the basis of Embodiment 1: the process of dividing the progeny of the genetic population in the phenotype data to be located into at least two subgroups with different traits and, using the resulting subgroups as the criterion, extracting from the genotype variation information the segregation-distortion-removed variant file and the segregation-distortion-increased variant file for the contrasting traits comprises:

constructing a segregation-distortion-removed group and a segregation-distortion-increased group from the phenotype data to be located of the genetic population;

extracting from the genotype variation information, using the segregation-distortion-removed group and the segregation-distortion-increased group as the criterion, the segregation-distortion-removed variant file and the segregation-distortion-increased variant file for the contrasting traits.

In the above embodiment, the candidate location segment, that is, the phenotypic segregation distortion information of the phenotype data to be located, is obtained from the information on decreased and on increased degrees of segregation distortion.

Embodiment 6, on the basis of Embodiment 5: the process of constructing the segregation-distortion-removed group and the segregation-distortion-increased group from the phenotype data to be located of the genetic population comprises:

obtaining, from the phenotype data to be located of the genetic population, information on a plurality of progeny with phenotype A and information on a plurality of progeny with phenotype B;

selecting all of the phenotype-B progeny together with a random selection of the phenotype-A progeny to construct the segregation-distortion-removed group;

selecting all of the phenotype-A progeny to construct the segregation-distortion-increased group.

Specifically, the number of randomly selected phenotype-A progeny is m and the number of all phenotype-B progeny is n, with m > n, where the ratio m:n must pass the chi-square test against the Mendelian genetic model.

In the above embodiment, information on a plurality of phenotype-A progeny and a plurality of phenotype-B progeny is obtained from the phenotype data to be located of the genetic population, and the segregation-distortion-removed group and the segregation-distortion-increased group are constructed from it. These two groups are then processed further to obtain the location segment of the segregation distortion trait.
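The group construction of Embodiment 6 can be sketched as follows, assuming phenotype A is the over-represented class. The function name, the `n_draw_a` parameter (the number m of phenotype-A progeny to draw) and the fixed random seed are illustrative assumptions; the caller is expected to choose m so that m:n passes the chi-square test against the chosen Mendelian model.

```python
import random


def build_groups(phenotypes, n_draw_a, seed=0):
    """Build the removed and increased progeny groups.

    `phenotypes` maps a progeny identifier to "A" or "B".
    Returns (removed_group, increased_group) as lists of identifiers:
    removed   = all B progeny plus a random draw of n_draw_a A progeny,
    increased = all A progeny.
    """
    a_ids = sorted(i for i, ph in phenotypes.items() if ph == "A")
    b_ids = sorted(i for i, ph in phenotypes.items() if ph == "B")
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    removed = b_ids + rng.sample(a_ids, n_draw_a)
    increased = list(a_ids)
    return removed, increased
```

Repeating the call with different seeds gives several removed groups, which matches the later remark that multiple such groups are compared against the hybrid population.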

Embodiment 7, on the basis of Embodiment 1: the process of comparing the first segregation distortion degree information and the second segregation distortion degree information with the to-be-compared segregation distortion degree information and obtaining the location segment of the segregation distortion trait from the comparison result comprises:

comparing the first segregation distortion degree information and the second segregation distortion degree information with the to-be-compared segregation distortion degree information to obtain information on decreased segregation distortion and information on increased segregation distortion;

obtaining the data windows in which the information on decreased segregation distortion and the information on increased segregation distortion overlap, and obtaining the location segment of the segregation distortion trait from these overlapping data windows.

Specifically, a t-test with a 99% confidence interval is applied within each data window. A window in which the t-test shows a significant decrease in the degree of segregation distortion is a window of significantly decreased distortion; likewise, a window in which the t-test shows a significant increase is a window of significantly increased distortion.

It should be understood that the windows in which the segregation-distortion-removed groups show a significantly lower degree of segregation distortion than the hybrid population are collected, as are the windows in which the segregation-distortion-increased group shows a significantly higher degree than the hybrid population. The overlapping windows are the candidate location segments that affect the segregation distortion of the phenotype.
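The comparison and overlap step can be sketched as follows. The text names a t-test at the 99% level but does not say which variant; Welch's unequal-variance statistic is used here as an assumption, and the critical value `t_crit` is left to the caller because it depends on the degrees of freedom. All names are hypothetical, and each window is assumed to carry at least two scores with non-zero spread.

```python
import math
import statistics


def welch_t(sample, reference):
    """Welch's t statistic for mean(sample) - mean(reference)."""
    var_s = statistics.variance(sample) / len(sample)
    var_r = statistics.variance(reference) / len(reference)
    diff = statistics.mean(sample) - statistics.mean(reference)
    return diff / math.sqrt(var_s + var_r)


def candidate_windows(whole, removed, increased, t_crit):
    """Windows where distortion drops in `removed` and rises in `increased`.

    Each argument maps a window key to a list of per-site scores, for
    example -log10(p). `whole` is the full hybrid population.
    """
    down = {w for w in whole if w in removed
            and welch_t(removed[w], whole[w]) < -t_crit}
    up = {w for w in whole if w in increased
          and welch_t(increased[w], whole[w]) > t_crit}
    return sorted(down & up)  # overlap of both window sets
```

Only windows that appear in both sets are returned, which mirrors the intersection described in the text.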

FIG. 2 is a schematic diagram of the functional modules of the device for locating segregation distortion traits provided by an embodiment of the invention.

Embodiment 8: a device for locating segregation distortion traits, comprising:

an import module, configured to import the phenotype data to be located of a genetic population, the genotype variation information of the parents and progeny of the genetic population, and genome reference information;

a window division module, configured to divide the genome reference information by a data window division method to obtain a plurality of data windows;

a processing module, configured to analyze the degree of segregation distortion of the genotype variation information within the plurality of data windows to obtain segregation distortion degree information;

to divide the progeny of the genetic population in the phenotype data to be located into at least two subgroups with different traits, and, using the resulting subgroups as the criterion, to extract from the genotype variation information a segregation-distortion-removed variant file and a segregation-distortion-increased variant file for the contrasting traits;

and to analyze the degree of segregation distortion of the segregation-distortion-removed variant file over the plurality of data windows to obtain first to-be-compared segregation distortion degree information, and of the segregation-distortion-increased variant file over the plurality of data windows to obtain second to-be-compared segregation distortion degree information;

a comparison module, configured to compare the first to-be-compared segregation distortion degree information and the second to-be-compared segregation distortion degree information separately with the segregation distortion degree information to obtain a first comparison result and a second comparison result, and to intersect the first and second comparison results to obtain the location segment of the segregation distortion trait.

Embodiment 9: a device for locating segregation distortion traits, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the method for locating segregation distortion traits described above.

Embodiment 10: a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for locating segregation distortion traits described above.

It should be understood that Embodiments 1 to 8 describe the data processing method using two traits as an example. If there are more than two traits, the data processing method is the same as for two traits and is not described again.

下面以具体实例来验证本方法的有效性:The following is a concrete example to verify the effectiveness of this method:

以柑橘基因组和柑橘杂交群体进行定位,针对柑橘自交不亲和性状进行定位,柑橘自交不亲和性状位于柑橘一号染色体上1-1.3mb的区间,利用本群体重测序数据构建杂交群体父母本和子代的基因型变异文件,因为根据亲和:不亲和的分离比1.7:1假定偏离1:1模型,建立孟德尔遗传理论模型,即父母本BC1模型,已知柑橘群体中不亲和性状来源于母本,亲和来源于父本,偏分离偏向父本,其中亲和与不亲和相对性状的偏分离被成功检测,本实例基于已知定位区间的相对性状进行验证,表明偏分离检测有效,能够准确定位偏分离性状。The citrus genome and the citrus hybrid population were used to locate the citrus self-incompatibility traits. The citrus self-incompatibility traits were located in the range of 1-1.3mb on the citrus chromosome 1. The re-sequencing data of this population was used to construct the hybrid population. The genotype variation files of the parents and their offspring, because the segregation ratio of 1.7:1 for incompatibility and incompatibility is assumed to deviate from the 1:1 model. The compatibility traits are derived from the female parent, the compatibility is derived from the male parent, and the partial segregation is biased toward the male parent. The partial segregation of the relative traits of affinity and incompatibility is successfully detected. This example is based on the relative traits in the known positioning interval. It shows that the partial segregation detection is effective and can accurately locate the partial segregation traits.

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (9)

1. A data processing method for locating a segregation distortion trait, characterized by comprising the following steps:
importing phenotype data to be located of a genetic population, genotype variation information of the parents and offspring of the genetic population, and genome reference information;
dividing the genome reference information based on a data window division method to obtain a plurality of data windows;
performing segregation distortion degree analysis on the genotype variation information within the plurality of data windows to obtain segregation distortion degree information to be compared;
dividing the offspring of the genetic population in the phenotype data to be located into subgroups with different traits, and, taking the obtained subgroups as the standard, extracting from the genotype variation information a variation file of the contrasting traits with the segregation distortion effect removed and a variation file of the contrasting traits with the segregation distortion effect increased;
performing segregation distortion degree analysis on the effect-removed variation file through the plurality of data windows to obtain first segregation distortion degree information, and performing segregation distortion degree analysis on the effect-increased variation file through the plurality of data windows to obtain second segregation distortion degree information;
and comparing the first segregation distortion degree information and the second segregation distortion degree information with the segregation distortion degree information to be compared, and obtaining a segregation distortion trait locating interval according to the comparison result.
2. The data processing method for locating a segregation distortion trait according to claim 1, characterized in that dividing the genome reference information based on a data window division method to obtain a plurality of data windows comprises:
dividing the genome reference information into windows according to a preset step value to obtain the plurality of data windows, wherein the preset step value is a length of 100 kb.
3. The data processing method for locating a segregation distortion trait according to claim 1, characterized in that, before the segregation distortion degree analysis is performed on the genotype variation information within the plurality of data windows, the method further comprises a step of optimizing the genotype variation information, the step comprising:
filtering out false-positive sites of the offspring genotypes in the genotype variation information;
screening variation types in the filtered genotype variation information according to a preset Mendelian genetic theory model to obtain a Mendelian segregation ratio;
and the process of performing segregation distortion degree analysis on the genotype variation information within the plurality of data windows to obtain the segregation distortion degree information comprises:
counting the frequency of segregation distortion sites of the genotype variation information in each data window, and obtaining the number of segregation distortion sites from the frequency;
performing a chi-square test on the Mendelian segregation ratio, and obtaining the variation information of the segregation distortion sites using the p-value of the chi-square test result as the standard, wherein the p-value is less than 0.001;
and taking the number of segregation distortion sites and the variation information of the segregation distortion sites as the segregation distortion degree information.
4. The data processing method for locating a segregation distortion trait according to claim 1, characterized in that the process of dividing the offspring of the genetic population in the phenotype data to be located into at least two subgroups with different traits and, taking the obtained subgroups as the standard, extracting from the genotype variation information the effect-removed variation file and the effect-increased variation file of the contrasting traits comprises:
constructing a segregation-distortion-effect removal group and a segregation-distortion-effect increase group from the phenotype data to be located of the genetic population;
and, taking the removal group and the increase group as the standard, extracting from the genotype variation information the variation file of the contrasting traits with the segregation distortion effect removed and the variation file of the contrasting traits with the segregation distortion effect increased.
5. The data processing method for locating a segregation distortion trait according to claim 4, characterized in that the process of constructing the segregation-distortion-effect removal group and the segregation-distortion-effect increase group from the phenotype data to be located of the genetic population comprises:
acquiring information on a plurality of phenotype-A offspring and information on a plurality of phenotype-B offspring from the phenotype data to be located of the genetic population;
selecting all of the phenotype-B offspring information and randomly selecting phenotype-A offspring information to construct the segregation-distortion-effect removal group;
and selecting all of the phenotype-A offspring information to construct the segregation-distortion-effect increase group.
6. The data processing method for locating a segregation distortion trait according to claim 1, characterized in that the process of comparing the first segregation distortion degree information and the second segregation distortion degree information with the segregation distortion degree information to be compared and obtaining a segregation distortion trait locating interval according to the comparison result comprises:
comparing the first segregation distortion degree information and the second segregation distortion degree information with the segregation distortion degree information to be compared to obtain segregation distortion degree decrease information and segregation distortion degree increase information;
and obtaining overlapping data windows from the decrease information and the increase information, and obtaining the segregation distortion trait locating interval from the overlapping data windows.
7. A data processing apparatus for locating a segregation distortion trait, characterized by comprising:
an import module, configured to import phenotype data to be located of a genetic population, genotype variation information of the parents and offspring of the genetic population, and genome reference information;
a window division module, configured to divide the genome reference information based on a data window division method to obtain a plurality of data windows;
a processing module, configured to perform segregation distortion degree analysis on the genotype variation information within the plurality of data windows to obtain segregation distortion degree information;
to divide the offspring of the genetic population in the phenotype data to be located into at least two subgroups with different traits, and, taking the obtained subgroups as the standard, to extract from the genotype variation information a variation file of the contrasting traits with the segregation distortion effect removed and a variation file of the contrasting traits with the segregation distortion effect increased;
and to perform segregation distortion degree analysis on the effect-removed variation file through the plurality of data windows to obtain first segregation distortion degree information to be compared, and on the effect-increased variation file through the plurality of data windows to obtain second segregation distortion degree information to be compared;
and a comparison module, configured to compare the first segregation distortion degree information to be compared and the second segregation distortion degree information to be compared respectively with the segregation distortion degree information to obtain a first comparison result and a second comparison result, and to intersect the first comparison result and the second comparison result to obtain a segregation distortion trait locating interval.
8. A data processing apparatus for locating a segregation distortion trait, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the computer program is executed by the processor, the data processing method for locating a segregation distortion trait according to any one of claims 1 to 6 is implemented.
9. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the data processing method for locating a segregation distortion trait according to any one of claims 1 to 6 is implemented.
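The steps recited in claims 1 to 6 can be sketched roughly as follows. This is an illustrative reading, not the patented implementation: all function names, the `"0/1"`/`"0/0"` genotype coding, the in-memory site format, and the assumed 1:1 expectation per site are hypothetical, and the size of the random phenotype-A draw (matched to the number of phenotype-B offspring) is an assumption the claims do not state. Only the 100 kb window step and the p < 0.001 threshold come from the claims.

```python
import math
import random

WINDOW_SIZE = 100_000  # 100 kb step value from claim 2
P_THRESHOLD = 0.001    # chi-square p-value cut-off from claim 3


def is_distorted(n_het, n_hom, alpha=P_THRESHOLD):
    """Chi-square test (1 d.f.) of two genotype classes against an assumed 1:1 ratio."""
    total = n_het + n_hom
    if total == 0:
        return False
    chi2 = (n_het - n_hom) ** 2 / total
    # For 1 degree of freedom the survival function is erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2)) < alpha


def distorted_sites_per_window(sites, samples, window_size=WINDOW_SIZE):
    """Count distorted sites per window, using only the given samples.

    sites: iterable of (position, {sample_name: "0/1" or "0/0"}) on one chromosome.
    Returns {window_index: number_of_distorted_sites}; windows with none are omitted.
    """
    counts = {}
    for position, genotypes in sites:
        calls = [genotypes[s] for s in samples if s in genotypes]
        if is_distorted(calls.count("0/1"), calls.count("0/0")):
            window = position // window_size
            counts[window] = counts.get(window, 0) + 1
    return counts


def build_subgroups(phenotypes, rng=None):
    """Return (removal_group, increase_group) from {sample_name: "A" or "B"}.

    Removal group: all B offspring plus a random draw of A offspring
    (assumption: as many A as there are B). Increase group: all A offspring.
    """
    rng = rng or random.Random()
    group_a = sorted(s for s, p in phenotypes.items() if p == "A")
    group_b = sorted(s for s, p in phenotypes.items() if p == "B")
    removal = group_b + rng.sample(group_a, min(len(group_a), len(group_b)))
    return removal, group_a


def candidate_windows(baseline, removal, increase):
    """Windows whose count drops in the removal group and rises in the increase group."""
    windows = set(baseline) | set(removal) | set(increase)
    decreased = {w for w in windows if removal.get(w, 0) < baseline.get(w, 0)}
    increased = {w for w in windows if increase.get(w, 0) > baseline.get(w, 0)}
    return sorted(decreased & increased)
```

In this sketch the per-window count of distorted sites stands in for the segregation distortion degree information; a real pipeline would read genotypes from variant files and would also carry the variation information of each distorted site, which is omitted here for brevity.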
CN202110031263.9A 2021-01-11 2021-01-11 Method, device and storage medium for positioning segregation character Active CN112735519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110031263.9A CN112735519B (en) 2021-01-11 2021-01-11 Method, device and storage medium for positioning segregation character

Publications (2)

Publication Number Publication Date
CN112735519A CN112735519A (en) 2021-04-30
CN112735519B CN112735519B (en) 2022-08-30

Family

ID=75590219

Country Status (1)

Country Link
CN (1) CN112735519B (en)

Also Published As

Publication number Publication date
CN112735519A (en) 2021-04-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant