
CN101930502B - Method and system for detection of phenotype genes and analysis of biological information - Google Patents


Info

Publication number
CN101930502B
Authority
CN
China
Prior art keywords
gene
sequence
dna
homologous
dna sequence
Prior art date
Legal status
Active
Application number
CN201010273517XA
Other languages
Chinese (zh)
Other versions
CN101930502A (en)
Inventor
安娜
李波
Current Assignee
BGI Technology Solutions Co Ltd
Original Assignee
BGI Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by BGI Shenzhen Co Ltd
Priority to CN201010273517XA
Publication of CN101930502A
Priority to HK11102469.5A
Application granted
Publication of CN101930502B
Active (current legal status)
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a system for detecting phenotype genes and analyzing biological information. The method comprises the following steps: performing local alignment of the target gene sequence against the nucleotide sequences of the predicted genome coding regions of each closely related species, and obtaining homologous sequences from the local alignment results; screening the homologous sequences to obtain homologous genes; extracting the DNA sequences of the homologous genes, translating them into protein sequences, performing a global alignment, and converting the aligned protein sequences back into DNA sequences; constructing a gene tree from the resulting DNA sequences and calculating Dn/Ds for each branch; and performing a positive selection test to detect positively selected sites. The method and system can rapidly process large amounts of data, extract more accurate information, reduce the misleading effect of pseudogenes on biological information analysis, and help use the resulting biological information to explain more biological questions.

Description

Method and system for detection of phenotype genes and analysis of biological information
Technical field
The present invention relates to the field of biotechnology, and in particular to a method and system for detecting phenotype genes and analyzing biological information.
Background
Before large-scale genomic data became available, research on the genes underlying biological phenotypic traits, including genes associated with physiological, biochemical, and behavioral traits, relied mainly on validation on experimental platforms. As our understanding of organisms has progressed from the macroscopic to the microscopic level, scientists have extended the basis of species classification from macroscopic morphology to molecules, with breakthrough progress. As the genomes of genetically related species continue to become available, comparative sequence analysis between species that share evolutionary relationships, which extracts the rich information contained in their genomes, has become an effective analytical approach that complements high-throughput and conventional experiments. Organisms that share a recent common ancestor have genomes that evolved from the ancestral genome; the closer two organisms are in evolution, the more closely related their functions are. If two organisms are closely related, their genomes show collinearity, that is, gene order is partly or wholly conserved. Homology in coding sequence and structure between model genomes can therefore be exploited to locate genes in another genome using known functional information. Since Carl Woese pioneered molecular phylogenetic analysis in the 1970s, such work has been based mainly on the relationships among sequence, structure, and molecules, while classical methods serve mostly for confirmation.
In current research, comparing phenotype genes against whole-genome sequences can detect more genes, but because the parameters used in screening are inconsistent, many pseudogenes are retained at the same time. These pseudogenes arise partly from gene degeneration during evolutionary history, partly from sequencing errors, and partly from differences in the criteria set manually during screening. To accurately detect new functional genes and, at the same time, accurately infer their evolutionary patterns, a new method suited to screening functional genes is urgently needed, one that forms a relatively accurate analysis pipeline for new genes.
In summary, as sequencing output increases and more genomic data are generated, rapidly processing large-scale data and providing accurate analytical information has become a pressing technical problem in the field.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and system for detecting phenotype genes and analyzing biological information that can rapidly process large-scale data and provide accurate gene information, supporting further biological information analysis and the resolution of biological questions.
One aspect of the present invention provides a method for detecting phenotype genes and analyzing biological information, comprising: performing local alignment of the target gene sequence against the nucleotide sequences of the predicted genome coding regions of each closely related species, and obtaining homologous sequences from the local alignment results; screening the homologous sequences to obtain homologous genes; extracting the DNA sequences, translating them into protein sequences, performing a global alignment, and converting the result back into DNA sequences; constructing a gene tree from the converted DNA sequences and calculating Dn/Ds for each branch; and performing a positive selection test to detect positively selected sites.
In an embodiment of the method, the step of "screening the homologous sequences to obtain homologous genes" further comprises: screening the homologous sequences by identity and coverage; and screening by the homologous genes' Gene Ontology (GO) functional annotation and InterPro (IPR) structural annotation.
In an embodiment of the method, the step of "constructing a gene tree from the converted DNA sequences and calculating Dn/Ds for each branch" further comprises: constructing the gene tree from the DNA sequences with MrBayes using Bayesian statistical methods; and calculating Dn/Ds for each branch with Codeml from the PAML package.
In an embodiment of the method, the step of "constructing the gene tree from the DNA sequences with MrBayes" further comprises: selecting the best-fit substitution model with Modeltest, and then constructing the gene tree with MrBayes.
In an embodiment of the method, the step of "performing a positive selection test to detect positively selected sites" further comprises: testing for positively selected sites with the "branch-site A" model in the PAML package; and selecting genes under positive selection according to the false discovery rate (FDR) and the Bayes empirical Bayes (BEB) posterior probability.
In an embodiment of the method, the step of "selecting genes under positive selection according to FDR and BEB posterior probability" further comprises: presetting the thresholds as FDR < 0.05 and a BEB posterior probability > 0.95 for at least one site; and selecting genes under positive selection according to these two thresholds.
Another aspect of the present invention provides a system for detecting phenotype genes and analyzing biological information, comprising: a local alignment module, configured to perform local alignment of the target gene sequence against the nucleotide sequences of the predicted genome coding regions of each closely related species and to obtain homologous sequences from the local alignment results; a homologous gene screening module, configured to screen the homologous sequences to obtain homologous genes; a DNA sequence conversion module, configured to extract the DNA sequences, translate them into protein sequences, perform a global alignment, and convert the result back into DNA sequences; a gene tree construction module, configured to construct a gene tree from the converted DNA sequences and calculate Dn/Ds for each branch; and a positive selection test module, configured to test for positively selected sites.
In an embodiment of the system, the homologous gene screening module is further configured to: screen the homologous sequences by identity and coverage; and screen by the homologous genes' GO functional annotation and IPR structural annotation.
In an embodiment of the system, the gene tree construction module is further configured to: construct the gene tree from the DNA sequences with MrBayes using Bayesian statistical methods; and calculate Dn/Ds for each branch with Codeml from the PAML package.
In an embodiment of the system, the positive selection test module is further configured to: test for positively selected sites with the "branch-site A" model in the PAML package; and select genes under positive selection according to FDR and BEB posterior probability.
In an embodiment of the system, the thresholds are preset as FDR < 0.05 and a BEB posterior probability > 0.95 for at least one site, and genes under positive selection are selected according to these two thresholds.
The method and system provided by the invention use the genes predicted in the genomes of closely related species as references and select homologous sequences of the target gene or gene family by a predefined similarity; they build the phylogenetic tree by maximum likelihood, estimating the likelihood of the tree from the aligned nucleic acid or amino acid sequences, and thereby obtain a more accurate phylogenetic topology. The method and system can rapidly process large-scale data, extract more accurate information, reduce the misleading effect of pseudogenes on biological information analysis, and help explain more biological questions from the resulting information.
Brief description of the drawings
Fig. 1 is a flowchart of a method for detecting phenotype genes and analyzing biological information according to an embodiment of the invention;
Fig. 2 is a flowchart of another embodiment of the method for detecting phenotype genes and analyzing biological information provided by the invention;
Fig. 3 is a flowchart of yet another embodiment of the method for detecting phenotype genes and analyzing biological information provided by the invention;
Fig. 4 is a flowchart of an embodiment of the method for detecting phenotype genes and analyzing biological information provided by the invention;
Fig. 5 is a schematic structural diagram of a system for detecting phenotype genes and analyzing biological information according to an embodiment of the invention;
Fig. 6 is a schematic structural diagram of another embodiment of the system for detecting phenotype genes and analyzing biological information provided by the invention;
Fig. 7 is a schematic diagram of the gene tree topology for the salt cress (Thellungiella) phenotype gene (AtHKT1) selected in the present invention;
Fig. 8 is a schematic diagram of the Dn/Ds values calculated for each branch of the gene tree for the salt cress phenotype gene (AtHKT1) in the present invention.
Detailed description of embodiments
The present invention is described more fully below with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown.
Fig. 1 is a flowchart of a method for detecting phenotype genes and analyzing biological information according to an embodiment of the invention.
As shown in Fig. 1, the method 100 for detecting phenotype genes and analyzing biological information comprises: Step 102: perform local alignment of the target gene sequence against the nucleotide sequences of the predicted genome coding regions of each closely related species, and obtain homologous sequences from the local alignment results. For example, BLAST can be used to locally align the target gene sequence against the coding-region nucleotide sequences of each closely related species, where the reference (database) sequences are the coding-region nucleotide sequences of a given closely related species' genome and the query is the protein sequence of the target gene. The database-formatting parameters are "-p F -o T", and the default alignment parameters are "-p tblastn -e 1e-5". For alignment, the "-p" option selects one of five program types (blastn, blastp, blastx, tblastn, tblastx); "tblastn" is a single value meaning alignment of a protein query against a nucleotide database. For database formatting, "-p" specifies the database type and takes F or T, where "F" denotes a nucleotide database and "T" a protein database. The "-o" option determines whether sequence names are parsed and indexed, and also takes F or T: "T" builds a sequence-name index and "F" does not.
"-e" sets the expectation value (E-value): the number of alignments at least this good that would be expected to arise by chance alone. The closer this value is to zero, the less likely the hit is a chance event; from a search perspective, the smaller the E-value, the more significant the alignment. "1e-5" is the preset E-value threshold.
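The database-building and alignment options above can be sketched as command lines. This is a minimal sketch, assuming the legacy NCBI BLAST toolkit (formatdb/blastall) that the "-p F -o T" and "-p tblastn -e 1e-5" options suggest; the file names are hypothetical, and the "-i", "-d", and "-m 8" (tabular output) flags are standard legacy options that the original text does not state. The commands are only built and printed, not executed.

```python
import shlex

# Hypothetical file names; flags follow the legacy NCBI BLAST toolkit (formatdb/blastall).
def formatdb_cmd(cds_fasta: str) -> list[str]:
    """Index the predicted CDS nucleotide sequences of one related species."""
    return ["formatdb", "-i", cds_fasta,
            "-p", "F",   # F = nucleotide database (T would mean protein)
            "-o", "T"]   # T = parse sequence IDs and build a name index

def tblastn_cmd(query_faa: str, db: str, out: str, evalue: str = "1e-5") -> list[str]:
    """Protein query (target gene) against the nucleotide database, translated on the fly."""
    return ["blastall", "-p", "tblastn",
            "-i", query_faa, "-d", db,
            "-e", evalue,        # E-value cutoff from the text
            "-m", "8",           # tabular output (assumption, convenient for screening)
            "-o", out]

if __name__ == "__main__":
    for species in ["species_A", "species_B"]:   # hypothetical species names
        db = f"{species}_cds.fa"
        print(shlex.join(formatdb_cmd(db)))
        print(shlex.join(tblastn_cmd("target_protein.fa", db, f"{species}.m8")))
```

Building argument lists rather than shell strings keeps file names with spaces safe if the lists are later passed to `subprocess.run`.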
Step 104: screen the homologous sequences to obtain homologous genes. For example, the homologous sequences obtained in step 102 are screened according to the local alignment results. As is known to those skilled in the art, the homologous sequence data can be screened by parameters such as sequence identity (Identity), reference (subject) sequence coverage ((Subject_end - Subject_start + 1) / Subject_length), query sequence coverage ((Query_end - Query_start + 1) / Query_length), and the expectation value (E_value). Those skilled in the art will also appreciate that further criteria can be added on top of this screening; examples are described in later embodiments.
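The identity, coverage, and E-value screen above can be sketched as a filter over tabular BLAST hits. This is a minimal sketch, assuming hits in the 12-column legacy tabular format (query, subject, percent identity, length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score) and sequence lengths supplied separately; the threshold defaults are hypothetical placeholders, not values from the original.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    query: str
    subject: str
    identity: float   # percent identity
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float

def parse_m8(line: str) -> Hit:
    """Parse one tabular (-m 8 style) BLAST line; column order is an assumption."""
    f = line.rstrip("\n").split("\t")
    return Hit(f[0], f[1], float(f[2]), int(f[6]), int(f[7]),
               int(f[8]), int(f[9]), float(f[10]))

def coverage(start: int, end: int, length: int) -> float:
    # (end - start + 1) / length; abs() handles minus-strand hits where start > end
    return (abs(end - start) + 1) / length

def passes(hit: Hit, q_len: int, s_len: int, min_identity: float = 40.0,
           min_qcov: float = 0.5, min_scov: float = 0.5, max_evalue: float = 1e-5) -> bool:
    """Hypothetical thresholds; q_len/s_len must be in the same units as the hit coordinates."""
    return (hit.identity >= min_identity
            and coverage(hit.q_start, hit.q_end, q_len) >= min_qcov
            and coverage(hit.s_start, hit.s_end, s_len) >= min_scov
            and hit.evalue <= max_evalue)
```

For tblastn, query coordinates are in amino acids and subject coordinates in nucleotides, so the query length should be the protein length and the subject length the CDS length in bases.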
Step 106: extract the DNA sequences, translate them into protein sequences, perform a global alignment, and convert the result back into DNA sequences. For example, the extracted coding DNA sequences are translated into protein sequences, globally aligned with MUSCLE (MUltiple Sequence Comparison by Log-Expectation), and then converted back into DNA sequences. MUSCLE is open-source multiple sequence alignment software, released in 2004, that is used here for protein-level alignment; owing to its speed and accuracy, it has become a tool commonly used by those skilled in the art for global alignment. The core idea of the MUSCLE algorithm is to first obtain an initial multiple alignment by progressive alignment and then improve it by iterative refinement.
In the present invention, the protein sequences are converted back into DNA sequences after global alignment for the following reasons. Each amino acid is encoded by a three-base codon, but a single amino acid may correspond to several codons. Aligning at the amino acid level is therefore more accurate and ensures that no codon is split by a gap. Because the subsequent comparative analysis of different genes is based on differences in the coding bases, converting the aligned amino acid sequences back into nucleotide sequences captures these differences better.
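The back-conversion step can be sketched as threading each unaligned coding sequence onto its aligned protein. This is a minimal sketch, assuming the CDS starts in frame and encodes the aligned protein residue for residue; it is similar in spirit to dedicated tools such as PAL2NAL but is not that tool, and the function name is hypothetical.

```python
def protein_to_codon_alignment(aligned_protein: str, cds: str) -> str:
    """Replace each aligned residue by its codon from the CDS and each gap by '---'."""
    # Split the CDS into in-frame codons; a trailing partial codon is ignored.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    out, k = [], 0
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")          # a gap spans a whole codon, so triplets stay intact
        else:
            if k >= len(codons):
                raise ValueError("aligned protein has more residues than the CDS has codons")
            out.append(codons[k])
            k += 1
    # Codons left over (e.g. a terminal stop codon absent from the protein) are dropped.
    return "".join(out)

if __name__ == "__main__":
    print(protein_to_codon_alignment("M-K", "ATGAAATAA"))  # ATG---AAA
```

Applied to every row of the protein alignment, each output row has exactly three times the alignment width, which is the codon alignment that Codeml expects.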
Step 108: construct a gene tree from the converted DNA sequences and calculate Dn/Ds for each branch. Specifically, Modeltest selects the best-fit substitution model, the gene tree is constructed from the DNA sequences with MrBayes using Bayesian statistical methods, and Dn/Ds for each branch is calculated with Codeml from the PAML package. Modeltest automatically selects the best-fit nucleotide substitution model for the data. It evaluates 56 models and implements three model-selection frameworks: hierarchical likelihood ratio tests (hLRTs), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC); the best model is identified by these tests. The model used here is "GTR+gamma+I". PAML is a software package for phylogenetic analysis of DNA or protein sequences by maximum likelihood; it was developed by Ziheng Yang and is freely available for academic research. For example, the parameters used when calculating Dn/Ds for each branch with Codeml can be set as: noisy=3, verbose=1, runmode=0, seqtype=1, clock=0, model=1, NSsites=0, icode=0, CodonFreq=2, fix_kappa=0, kappa=4.54006, fix_omega=0, omega=1, fix_alpha=1, alpha=0, Malpha=0, ncatG=4, getSE=0, RateAncestor=0, fix_blength=1, method=0.
Step 110: perform a positive selection test to detect positively selected sites. For example, those skilled in the art can carry out the positive selection test by methods disclosed in the prior art to identify sites under positive selection. Specific ways of performing the test are described further with examples in later embodiments.
This embodiment uses the genes predicted in the genomes of closely related species as references and selects homologous sequences of the target gene or gene family by a predefined similarity; it builds the phylogenetic tree by maximum likelihood, estimating the likelihood of the tree from the aligned nucleic acid or amino acid sequences, and thereby obtains a more accurate phylogenetic topology. The method can extract more accurate information, reduces the misleading effect of pseudogenes on biological information analysis, and helps explain more biological questions from the resulting information.
Fig. 2 is a flowchart of another embodiment of the method for detecting phenotype genes and analyzing biological information provided by the invention.
As shown in Fig. 2, the method 200 for detecting phenotype genes and analyzing biological information comprises steps 202, 204, 205, 206, 208, and 210, where steps 202, 206, 208, and 210 may carry out the same or similar operations as steps 102, 106, 108, and 110 in Fig. 1, respectively; for brevity, they are not repeated here.
As shown in Fig. 2, after step 202, step 204 is performed: the homologous sequences are screened by identity and coverage. For example, using the database-formatting parameters "-p F -o T" and the default alignment parameters "-p tblastn -e 1e-5", an identity threshold and coverage thresholds (for example, a reference-sequence coverage threshold and/or a query-sequence coverage threshold) are set, optionally together with an E-value threshold, to perform a preliminary screening of the homologous sequences obtained by local alignment. Step 205 is then performed: the homologous genes are screened by their GO and IPR annotation information. The homologous genes retained by the preliminary screening are screened a second time using their Gene Ontology (GO) functional annotation and InterPro (IPR) structural annotation. GO is an ontology widely used in bioinformatics with three main branches: biological process, molecular function, and cellular component; it is a system for annotating genes by function. IPR is another system, which annotates genes by structure.
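The second-round annotation screen can be sketched as a set comparison between the target gene's annotations and each candidate's. This is a minimal sketch under a stated assumption: the original does not say how GO and IPR terms are combined, so requiring at least one shared GO term and at least one shared InterPro entry is a hypothetical rule, and all identifiers below are placeholders rather than real annotations.

```python
def screen_by_annotation(target_go: set[str], target_ipr: set[str],
                         candidates: dict[str, tuple[set[str], set[str]]],
                         min_go: int = 1, min_ipr: int = 1) -> list[str]:
    """Keep candidates sharing >= min_go GO terms and >= min_ipr InterPro entries with the target.
    candidates maps gene ID -> (GO term set, InterPro entry set)."""
    kept = []
    for gene_id, (go_terms, ipr_ids) in candidates.items():
        if len(target_go & go_terms) >= min_go and len(target_ipr & ipr_ids) >= min_ipr:
            kept.append(gene_id)
    return sorted(kept)

if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    target_go, target_ipr = {"GO:1111111", "GO:2222222"}, {"IPR999999"}
    candidates = {
        "geneA": ({"GO:1111111"}, {"IPR999999"}),   # shares both -> kept
        "geneB": ({"GO:3333333"}, {"IPR999999"}),   # no shared GO term -> dropped
    }
    print(screen_by_annotation(target_go, target_ipr, candidates))  # ['geneA']
```

Raising `min_go` or `min_ipr` tightens the screen toward candidates that are functionally closer to the target gene.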
This embodiment uses the genes predicted in the genomes of closely related species as references and selects homologous sequences of the target gene or gene family by a predefined similarity. It first keeps sequences with longer homologous regions and then uses the annotation information of these genes to retain functionally similar genes, so homologous genes are identified within a more precise range. Comparing the phenotype gene against the annotated gene set in this way detects homologous genes more precisely and further reduces the chance of selecting pseudogenes.
Fig. 3 is a flowchart of yet another embodiment of the method for detecting phenotype genes and analyzing biological information provided by the invention.
As shown in Fig. 3, the method 300 for detecting phenotype genes and analyzing biological information comprises steps 302, 304, 306, 308, 310, and 311, where steps 302, 304, 306, and 308 may carry out the same or similar operations as steps 102, 104, 106, and 108 in Fig. 1, respectively; for brevity, they are not repeated here.
As shown in Fig. 3, after step 308, step 310 is performed: positively selected sites are tested with the "branch-site A" model in the PAML package. Specifically, "branch-site A" is an existing model provided by the PAML package for analyzing adaptive evolution. In the branch-site test of positive selection, the branch that may have been subject to positive selection is designated the foreground branch and the remaining branches the background branches. The alternative model allows positive selection at sites on the foreground branch, whereas the null model does not. The null hypothesis parameters can thus be set as: noisy=3, verbose=1, runmode=0, seqtype=1, CodonFreq=2, clock=0, model=2, NSsites=2, icode=0, fix_kappa=0, kappa=4.54006, fix_omega=1, omega=1, fix_alpha=1, alpha=0, Malpha=0, ncatG=4, getSE=0, RateAncestor=0, fix_blength=1, method=0; and the alternative hypothesis parameters as: noisy=3, verbose=1, runmode=0, seqtype=1, CodonFreq=2, clock=0, model=2, NSsites=2, icode=0, fix_kappa=0, kappa=4.54006, fix_omega=0, omega=1.5, fix_alpha=1, alpha=0, Malpha=0, ncatG=4, getSE=0, RateAncestor=0, fix_blength=1, method=0.
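The null and alternative branch-site runs are compared with a likelihood ratio test. This is a minimal sketch, assuming the two log-likelihoods have already been read from the two Codeml runs; comparing 2ΔlnL against a chi-square distribution with one degree of freedom is a common convention for this test, not a detail stated in the original. The chi-square (1 df) tail probability is computed as erfc(sqrt(x/2)).

```python
import math

def branch_site_lrt(lnL_null: float, lnL_alt: float) -> tuple[float, float]:
    """Return (2*delta lnL, p-value) for the branch-site model A test.
    The alternative model nests the null, so the statistic is clipped at 0."""
    stat = max(0.0, 2.0 * (lnL_alt - lnL_null))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p

if __name__ == "__main__":
    stat, p = branch_site_lrt(-1000.0, -996.5)   # hypothetical log-likelihoods
    print(f"2dlnL = {stat:.2f}, p = {p:.4f}")
```

The p-values from all tested genes are then the input to the FDR screen in the next step.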
Step 311: genes under positive selection are selected according to the false discovery rate (FDR) and the Bayes empirical Bayes (BEB) posterior probability. Specifically, the thresholds can be preset as FDR < 0.05 and a BEB posterior probability > 0.95 for at least one site, and genes under positive selection are selected according to these two thresholds.
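The two thresholds can be sketched as a Benjamini-Hochberg FDR adjustment of the per-gene LRT p-values followed by a per-site BEB check. This is a minimal sketch, assuming Benjamini-Hochberg is the FDR procedure (the original names only "FDR") and that per-site BEB posterior probabilities have already been extracted for each gene; gene names and values are hypothetical.

```python
def benjamini_hochberg(pvals: dict[str, float]) -> dict[str, float]:
    """Return BH-adjusted p-values (q-values) keyed like the input."""
    items = sorted(pvals.items(), key=lambda kv: kv[1])   # ascending p
    m = len(items)
    adjusted, running_min = {}, 1.0
    # Walk from the largest p down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        gene, p = items[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[gene] = running_min
    return adjusted

def positively_selected(pvals: dict[str, float], beb: dict[str, list[float]],
                        fdr_max: float = 0.05, beb_min: float = 0.95) -> list[str]:
    """Genes with q < fdr_max and at least one site with BEB posterior > beb_min."""
    q = benjamini_hochberg(pvals)
    return sorted(g for g in pvals
                  if q[g] < fdr_max and any(pp > beb_min for pp in beb.get(g, [])))

if __name__ == "__main__":
    pvals = {"g1": 0.001, "g2": 0.02, "g3": 0.04, "g4": 0.5}        # hypothetical
    beb = {"g1": [0.50, 0.97], "g2": [0.90], "g3": [0.99]}          # hypothetical
    print(positively_selected(pvals, beb))  # ['g1']
```

In the example, g2 passes the FDR threshold but has no site above 0.95, and g3 has a strong site but its q-value (about 0.053) exceeds 0.05, so only g1 is reported.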
The ratio of the nonsynonymous to the synonymous substitution rate (Dn/Ds) is used as the measure of substitution, and the Dn and Ds values of each branch are estimated by maximum likelihood; however, comparing Dn and Ds alone cannot establish that a gene is under positive selection. This embodiment therefore also applies statistical methods to verify the authenticity and reliability of the results: the detected genes are screened and then tested for positive selection with branch-site model A; the two hypotheses of the branch-site test are compared with a likelihood ratio test, and positively selected genes are screened by the false discovery rate. This ensures the authenticity and reliability of phenotype gene detection and supports further biological information analysis and the resolution of biological questions.
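For reference, the standard interpretation of the ratio (written ω in PAML, Dn/Ds in this text) and of the likelihood ratio statistic is summarized below. A ratio averaged over a whole gene or branch is usually below 1 even when a few sites are positively selected, which is one reason a branch-site test is needed on top of the raw Dn/Ds values; the chi-square reference distribution with one degree of freedom is a common convention, assumed here.

$$
\omega = \frac{d_N}{d_S},
\qquad
\begin{cases}
\omega < 1 & \text{purifying (negative) selection}\\
\omega = 1 & \text{neutral evolution}\\
\omega > 1 & \text{positive selection}
\end{cases}
$$

$$
2\Delta\ell = 2\left(\ell_{\text{alt}} - \ell_{\text{null}}\right) \;\overset{\text{approx.}}{\sim}\; \chi^2_1
$$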
Fig. 4 is a flowchart of an embodiment of the method for detecting phenotype genes and analyzing biological information provided by the invention.
As shown in Fig. 4, the method 400 for detecting phenotype genes and analyzing biological information comprises:
Step 402: perform local alignment of the target gene sequence against the nucleotide sequences of the predicted genome coding regions of each closely related species, and obtain homologous sequences from the local alignment results. For example, BLAST can be used to locally align the target gene sequence against the coding-region nucleotide sequences of each closely related species, where the reference sequences are the coding-region nucleotide sequences of a given closely related species' genome and the query is the protein sequence of the target gene. The database-formatting parameters are "-p F -o T", and the default alignment parameters are "-p tblastn -e 1e-5".
Step 404: screen the homologous sequences by identity and coverage. For example, using the database-formatting parameters "-p F -o T" and the default alignment parameters "-p tblastn -e 1e-5", an identity threshold and coverage thresholds (for example, a reference-sequence coverage threshold and/or a query-sequence coverage threshold) are set, optionally together with an E-value threshold, to perform a preliminary screening of the homologous sequences obtained by local alignment.
Step 405: screen by the homologous genes' GO and IPR annotation information. The homologous genes retained by the preliminary screening are screened a second time using their GO and IPR annotations.
Step 406: extract the DNA sequences, translate them into protein sequences, perform a global alignment, and convert the result back into DNA sequences. For example, the extracted coding DNA sequences are translated into protein sequences, globally aligned with MUSCLE, and then converted back into DNA sequences.
Step 408: construct a gene tree from the converted DNA sequences and calculate Dn/Ds for each branch. Specifically, Modeltest selects the best-fit substitution model, the gene tree is constructed from the DNA sequences with MrBayes using Bayesian statistical methods, and Dn/Ds for each branch is calculated with Codeml from the PAML package. For example, the parameters used when calculating Dn/Ds for each branch with Codeml can be set as: noisy=3, verbose=1, runmode=0, seqtype=1, clock=0, model=1, NSsites=0, icode=0, CodonFreq=2, fix_kappa=0, kappa=4.54006, fix_omega=0, omega=1, fix_alpha=1, alpha=0, Malpha=0, ncatG=4, getSE=0, RateAncestor=0, fix_blength=1, method=0.
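The Codeml parameter sets listed in this description can be kept in one place and rendered as control files. This is a minimal sketch: the parameter values are copied from the text, while `seqfile`, `treefile`, and `outfile` are standard Codeml control-file keys whose file names here are hypothetical. With model=1 (the free-ratio model) Codeml estimates a separate ω for every branch; for the branch-site runs, the foreground branch is marked with "#1" in the tree file.

```python
# Parameter values as given in the text (free-ratio run for per-branch Dn/Ds).
FREE_RATIO = dict(noisy=3, verbose=1, runmode=0, seqtype=1, CodonFreq=2, clock=0,
                  model=1, NSsites=0, icode=0, fix_kappa=0, kappa=4.54006,
                  fix_omega=0, omega=1, fix_alpha=1, alpha=0, Malpha=0, ncatG=4,
                  getSE=0, RateAncestor=0, fix_blength=1, method=0)

# Branch-site model A: null fixes omega = 1 on the foreground, alternative estimates it.
BRANCH_SITE_NULL = {**FREE_RATIO, "model": 2, "NSsites": 2, "fix_omega": 1, "omega": 1}
BRANCH_SITE_ALT = {**FREE_RATIO, "model": 2, "NSsites": 2, "fix_omega": 0, "omega": 1.5}

def render_ctl(params: dict, seqfile: str, treefile: str, outfile: str) -> str:
    """Render a codeml.ctl-style text: one 'key = value' per line."""
    lines = [f"seqfile = {seqfile}", f"treefile = {treefile}", f"outfile = {outfile}"]
    lines += [f"{key} = {value}" for key, value in params.items()]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Hypothetical file names for one gene.
    print(render_ctl(BRANCH_SITE_ALT, "gene1.codon.phy", "gene1.tree", "gene1.alt.mlc"))
```

Deriving the null and alternative sets from the free-ratio dictionary makes it explicit that the two branch-site runs differ only in `fix_omega` and the starting `omega`.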
Step 410: test for positively selected sites with the "branch-site A" model in the PAML package. For example, the null hypothesis parameters are set as: noisy=3, verbose=1, runmode=0, seqtype=1, CodonFreq=2, clock=0, model=2, NSsites=2, icode=0, fix_kappa=0, kappa=4.54006, fix_omega=1, omega=1, fix_alpha=1, alpha=0, Malpha=0, ncatG=4, getSE=0, RateAncestor=0, fix_blength=1, method=0; and the alternative hypothesis parameters as: noisy=3, verbose=1, runmode=0, seqtype=1, CodonFreq=2, clock=0, model=2, NSsites=2, icode=0, fix_kappa=0, kappa=4.54006, fix_omega=0, omega=1.5, fix_alpha=1, alpha=0, Malpha=0, ncatG=4, getSE=0, RateAncestor=0, fix_blength=1, method=0.
Step 411: select the genes under positive selection according to the false discovery rate (FDR) and the Bayes empirical Bayes (BEB) posterior probability. Specifically, the FDR threshold can be preset to less than 0.05, and the posterior probability threshold to greater than 0.95 for at least one site. Genes that satisfy both thresholds are selected as genes under positive selection.
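The two-threshold filter can be sketched in Python. The sketch assumes the FDR is controlled with the Benjamini-Hochberg procedure applied to the branch-site LRT p-values; the source names FDR but not the procedure. It also assumes the per-site posterior probabilities have already been parsed from the Codeml output. The gene IDs and values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def select_positive(results: Dict[str, Tuple[float, List[float]]],
                    fdr: float = 0.05, min_posterior: float = 0.95) -> List[str]:
    """Keep genes with q < fdr and at least one site with posterior > min_posterior.

    `results` maps gene ID -> (LRT p-value, per-site posterior probabilities).
    """
    genes = list(results)
    qs = bh_qvalues([results[g][0] for g in genes])
    return [g for g, q in zip(genes, qs)
            if q < fdr and any(pp > min_posterior for pp in results[g][1])]


# Hypothetical per-gene results.
results = {
    "g1": (0.001, [0.99, 0.50]),
    "g2": (0.010, [0.80]),
    "g3": (0.040, [0.97]),
    "g4": (0.600, [0.99]),
}
print(select_positive(results))  # ['g1']
```

In this example, g2 passes the FDR threshold but has no site above 0.95. g3 has a high-probability site but its q-value (about 0.053) misses the FDR threshold. So only g1 is kept.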
Fig. 5 is a flowchart of an embodiment of the method for detecting phenotype genes and analyzing biological information provided by the invention.
As shown in Fig. 5, the method 500 for detecting phenotype genes and analyzing biological information comprises: Step 501: determine the protein sequence of the target gene.
Step 502: for each species closely related to the target gene sequence, select the predicted nucleotide sequences of the coding regions of its genome.
Step 504: perform a local alignment with the BLAST software and obtain homologous sequences from the local alignment results. For example, BLAST can align the sequence of the gene of interest locally against the coding-region nucleotide sequences of each closely related species. The reference sequences are the nucleotide sequences of the genome coding regions of a closely related species, and the query sequence is the protein sequence of the target gene. The database is built with the parameters "-p F -o T", and the alignment uses the default parameters "-p tblastn -e 1e-5".
Step 506: screen the homologous sequences according to the similarity of the homologous regions. For example, the database is built with "-p F -o T" and the alignment is run with "-p tblastn -e 1e-5". A similarity threshold and coverage thresholds (for example, a reference-sequence coverage threshold and/or an aligned-sequence coverage threshold) are then set, and the homologous sequences obtained from the local alignment are screened preliminarily.
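The preliminary screen can be sketched in Python with the thresholds of the Fig. 7 example: similarity > 0.3, reference and aligned coverage > 0.3, and E-value < 1e-5. The sketch assumes the tblastn hits are in the 12-column tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). It also assumes sequence lengths are looked up separately. Two further simplifying assumptions: percent identity stands in for similarity, and each hit line is treated independently rather than merging HSPs. The function names and sample hits are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Hit:
    query: str          # protein of the target gene
    subject: str        # predicted coding sequence of a related species
    identity: float     # pident / 100, used here as the similarity
    query_cov: float    # aligned fraction of the query (amino acids)
    subject_cov: float  # aligned fraction of the reference (nucleotides)
    evalue: float


def parse_tabular(lines: Iterable[str], query_len: Dict[str, int],
                  subject_len: Dict[str, int]) -> List[Hit]:
    """Parse 12-column tabular BLAST output into Hit records."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        qstart, qend, sstart, send = (int(x) for x in f[6:10])
        hits.append(Hit(
            query=f[0],
            subject=f[1],
            identity=float(f[2]) / 100.0,
            query_cov=(abs(qend - qstart) + 1) / query_len[f[0]],
            subject_cov=(abs(send - sstart) + 1) / subject_len[f[1]],
            evalue=float(f[10]),
        ))
    return hits


def preliminary_screen(hits: Iterable[Hit], min_similarity: float = 0.3,
                       min_ref_cov: float = 0.3, min_query_cov: float = 0.3,
                       max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits that pass every threshold (all comparisons are strict)."""
    return [h for h in hits
            if h.identity > min_similarity and h.subject_cov > min_ref_cov
            and h.query_cov > min_query_cov and h.evalue < max_evalue]


# Hypothetical tblastn hits for a 500-aa query against 1200-nt coding sequences.
rows = [
    "AtHKT1\tgeneA\t45.0\t300\t150\t2\t1\t300\t1\t900\t1e-50\t400",
    "AtHKT1\tgeneB\t25.0\t300\t200\t5\t1\t300\t1\t900\t1e-20\t150",
    "AtHKT1\tgeneC\t60.0\t50\t20\t0\t1\t50\t1\t150\t1e-8\t80",
]
hits = parse_tabular(rows, {"AtHKT1": 500}, {g: 1200 for g in ("geneA", "geneB", "geneC")})
print([h.subject for h in preliminary_screen(hits)])  # ['geneA']
```

Why each hit passes or fails:
- geneA passes every threshold.
- geneB fails on similarity (0.25).
- geneC fails on query coverage (0.1), despite its high identity.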
Step 508: screen according to the GO and IPR annotation information of the homologous genes. The homologous genes obtained in the preliminary screening are screened a second time on the basis of their GO and IPR annotations.
Step 510: extract the DNA sequences, translate them into protein sequences, perform a global alignment, and convert the aligned sequences back into DNA sequences. For example, the extracted coding DNA sequences are translated into protein sequences and aligned globally with the MUSCLE software, and the alignment is then converted back into DNA sequences.
Step 512: the Modeltest software selects the best substitution model, and MrBayes builds the gene tree from the DNA sequences.
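Modeltest compares candidate substitution models using statistical criteria such as the Akaike information criterion (AIC). Below is a minimal Python sketch of AIC-based selection. It assumes the log-likelihood and the number of free parameters of each model fit are already known. The models, likelihoods and parameter counts are hypothetical and are not Modeltest output.

```python
from typing import Dict, Tuple


def aic(lnl: float, k: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 lnL (lower is better)."""
    return 2 * k - 2 * lnl


def best_model(fits: Dict[str, Tuple[float, int]]) -> str:
    """Return the model name with the lowest AIC; fits maps name -> (lnL, k)."""
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fits: (log-likelihood, free substitution-model parameters).
fits = {
    "JC": (-5200.0, 0),
    "HKY+G": (-5050.0, 5),
    "GTR+G": (-5047.0, 9),
}
for name, (lnl, k) in fits.items():
    print(f"{name}: AIC = {aic(lnl, k):.1f}")
print("best:", best_model(fits))  # best: HKY+G
```

Here GTR+G gains 3 log-likelihood units over HKY+G. Under AIC that gain does not pay for its four extra parameters, so HKY+G is chosen.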
Step 514: compare the evolutionary rates of the genes by their branch lengths. The positions of child nodes relative to their parent nodes (that is, the order in which the branches arise) give the order of divergence. Each branch length is calculated from the base substitution rate, so it reflects how each gene has diverged, mainly the time and the rate of divergence.
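Branch lengths can be read directly from the gene tree. Below is a minimal Python sketch. It assumes the tree has been converted to a plain Newick string without annotations. It uses the root-to-tip distance (the sum of branch lengths from the root to each gene) as one simple way to compare how far each gene has diverged; the source does not specify the exact measure. The parser handles only unquoted labels and numeric branch lengths, and the tree is hypothetical.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, float, List["Node"]]  # (label, branch length, children)


def parse_newick(text: str) -> Node:
    """Parse a simple Newick string with optional labels and branch lengths."""
    s = "".join(text.split())
    if not s.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def clade() -> Node:
        nonlocal pos
        children: List[Node] = []
        if s[pos] == "(":
            pos += 1
            children.append(clade())
            while s[pos] == ",":
                pos += 1
                children.append(clade())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        start = pos
        while s[pos] not in ",():;":
            pos += 1
        label = s[start:pos]
        length = 0.0
        if s[pos] == ":":
            pos += 1
            start = pos
            while s[pos] not in ",();":
                pos += 1
            length = float(s[start:pos])
        return label, length, children

    root = clade()
    if s[pos] != ";":
        raise ValueError(f"unexpected character at position {pos}")
    return root


def root_to_tip(node: Node, depth: float = 0.0) -> Dict[str, float]:
    """Map each leaf label to the summed branch length from the root."""
    label, length, children = node
    depth += length
    if not children:
        return {label: depth}
    out: Dict[str, float] = {}
    for child in children:
        out.update(root_to_tip(child, depth))
    return out


tree = parse_newick("((A:0.1,B:0.2):0.05,C:0.3);")
print({leaf: round(d, 6) for leaf, d in root_to_tip(tree).items()})
# {'A': 0.15, 'B': 0.25, 'C': 0.3}
```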
Step 515: determine the orthology and paralogy relationships of the genes from the topology of the gene tree, and infer recent large-scale duplication events.
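One common way to read orthology and paralogy from a rooted gene tree is the species-overlap rule, sketched here in Python. This technique is an assumption; the source does not say which rule it applies. The tree is written as nested tuples, and both the gene IDs and the rule for extracting a species from an ID are hypothetical. Duplication nodes whose genes all come from one species are the kind of signal used to infer recent, lineage-specific duplication.

```python
from typing import List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # a gene ID, or a tuple of subtrees


def species_of(gene_id: str) -> str:
    """Hypothetical naming rule: the species code follows the last underscore."""
    return gene_id.rsplit("_", 1)[-1]


def leaves(tree: Tree) -> List[str]:
    """List the gene IDs under a subtree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def label_events(tree: Tree, events: List[Tuple[str, List[str]]]) -> Set[str]:
    """Label each internal node (post-order) and return the species below `tree`.

    Species-overlap rule: if two child clades share a species, the node is a
    duplication (genes split across it are paralogs); otherwise it is a
    speciation (genes split across it are orthologs).
    """
    if isinstance(tree, str):
        return {species_of(tree)}
    child_species = [label_events(child, events) for child in tree]
    overlap = any(child_species[i] & child_species[j]
                  for i in range(len(child_species))
                  for j in range(i + 1, len(child_species)))
    events.append(("duplication" if overlap else "speciation", leaves(tree)))
    return set().union(*child_species)


# Hypothetical gene tree: two grape paralogs, one rice gene, one poplar gene.
tree = ((("g1_grape", "g2_grape"), "g3_rice"), "g4_poplar")
events: List[Tuple[str, List[str]]] = []
label_events(tree, events)
for event, genes in events:
    print(event, genes)
```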
Step 516: calculate the Dn/Ds of each branch of the gene tree to determine the selection acting on each gene. Specifically, the Dn/Ds of each branch is calculated with Codeml in the PAML software package, using the parameters selected in the previous embodiment. The result is used to analyze the selection pressure on the homologous genes of each species. Dn/Ds is the ratio of the nonsynonymous substitution rate (Dn) to the synonymous substitution rate (Ds), and it indicates whether selection pressure acts on a protein-coding gene:
- Dn/Ds > 1 indicates positive selection.
- Dn/Ds = 1 indicates neutral evolution.
- Dn/Ds < 1 indicates purifying selection.

The selection acting on each gene is determined accordingly.
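The classification rule above can be sketched in Python. The sketch assumes per-branch Dn and Ds estimates have already been read from the Codeml output. The function name, the tolerance band and the example values are hypothetical. A branch with Ds = 0 is rejected rather than given an infinite ratio.

```python
def selection_type(dn: float, ds: float, tol: float = 0.0) -> str:
    """Classify a branch by omega = Dn/Ds.

    omega > 1: positive selection; omega = 1: neutral; omega < 1: purifying.
    `tol` is a hypothetical band around 1 treated as neutral, since an
    estimate is rarely exactly 1.
    """
    if ds <= 0:
        raise ValueError("Ds must be positive for Dn/Ds to be defined")
    omega = dn / ds
    if omega > 1 + tol:
        return "positive"
    if omega < 1 - tol:
        return "purifying"
    return "neutral"


# Hypothetical per-branch (Dn, Ds) estimates.
for branch, (dn, ds) in {"b1": (0.030, 0.010), "b2": (0.010, 0.050), "b3": (0.020, 0.020)}.items():
    print(branch, selection_type(dn, ds))
```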
Step 517: test for positively selected sites with the branch-site model A in the PAML software package, and apply a false discovery rate (FDR) test to the results. For example, parameters are set for the foreground and background hypotheses, and the branch-site model A is used to test for positively selected sites. Genes under positive selection are then selected according to the FDR and the Bayes empirical Bayes (BEB) posterior probability. Specifically, the FDR threshold can be preset to less than 0.05, and the posterior probability threshold to greater than 0.95 for at least one site. Genes that satisfy both thresholds are selected as genes under positive selection.
In this embodiment of the method for detecting phenotype genes and analyzing biological information, the genes predicted in the genomes of closely related species serve as the reference. Homologous sequences of the gene of interest or gene family are selected by a predefined similarity. Sequences with larger homologous regions are retained, and their annotation information is then used to select genes that are functionally closer, so that homologous genes are identified within a more precise range. Comparing the phenotype gene with the annotated gene sets in this way detects homologous genes more precisely and further reduces the chance of selecting pseudogenes.

In addition, the invention compares orthology and paralogy relationships using the topology of the gene tree, to infer whether large-scale duplication events have occurred recently. For Dn/Ds, software with multi-parameter models is used, so the analysis can be adjusted to the characteristics of different species. This approach:
- shows, relatively accurately, the type of natural selection pressure acting on each gene;
- can detect new genes under positive selection;
- helps infer which genes contribute to enhancing the phenotype.

The invention also uses statistical methods to check the authenticity and reliability of the results. After screening, the detected genes are tested for positive selection with the branch-site model A. The two hypotheses of the branch-site test are compared by a likelihood ratio test, and the positively selected genes are screened by the false discovery rate. This ensures the authenticity and reliability of phenotype gene detection and supports further biological information analysis and the solution of biological questions.
Fig. 6 is a schematic structural diagram of a system for detecting phenotype genes and analyzing biological information according to an embodiment of the invention.
As shown in Fig. 6, the system 600 for detecting phenotype genes and analyzing biological information comprises the following modules:
- a local alignment module 602;
- a homologous gene screening module 604;
- a DNA sequence conversion module 606;
- a gene tree construction module 608;
- a positive selection test module 610.

These modules are described below.
The local alignment module 602 locally aligns the sequence of the gene of interest against the predicted coding-region nucleotide sequences of the genome of each closely related species, and obtains homologous sequences from the local alignment results. For example, the local alignment module 602 can use the BLAST software for this alignment. The reference sequences are the nucleotide sequences of the genome coding regions of a closely related species, and the query sequence is the protein sequence of the target gene. The database is built with the parameters "-p F -o T", and the alignment uses the default parameters "-p tblastn -e 1e-5".
The homologous gene screening module 604 screens the homologous sequences to obtain homologous genes. For example, it screens the homologous sequences produced by the local alignment module 602. As those skilled in the art know, the homologous sequence data can be screened by parameters such as sequence similarity, reference-sequence coverage, aligned-sequence coverage and expectation value (E_value).
The DNA sequence conversion module 606 extracts DNA sequences, translates them into protein sequences, performs a global alignment, and converts the aligned sequences back into DNA sequences. For example, module 606 translates the extracted coding DNA sequences into protein sequences and aligns them globally with the MUSCLE software, then converts the result back into DNA sequences.
The gene tree construction module 608 builds a gene tree from the converted DNA sequences and calculates the Dn/Ds of each branch. Specifically, it builds the gene tree from the DNA sequences with MrBayes by Bayesian statistical inference. It then calculates the Dn/Ds of each branch with Codeml in the PAML software package.
The positive selection test module 610 tests for positively selected sites. Specifically, it uses the branch-site model A in the PAML software package to test for positively selected sites. It then selects the genes under positive selection according to the false discovery rate (FDR) and the Bayes empirical Bayes (BEB) posterior probability. For details of the process, refer to the method embodiments; they are not repeated here.
In an embodiment of the system for detecting phenotype genes and analyzing biological information provided by the invention, the homologous gene screening module performs two further screens. It screens the homologous sequences according to similarity and coverage, and it screens according to the GO and IPR annotation information of the homologous genes. For details of the process, refer to the method embodiments; they are not repeated here.
In an embodiment of the system provided by the invention, two thresholds are preset: the FDR threshold at less than 0.05, and the Bayes empirical Bayes (BEB) posterior probability threshold at greater than 0.95 for at least one site. Genes that satisfy both thresholds are selected as genes under positive selection.
In an embodiment of the system provided by the invention, the local alignment module uses the genes predicted in the genomes of closely related species as the reference. The homologous gene screening module selects homologous sequences of the gene of interest or gene family by a predefined similarity. Sequences with larger homologous regions are retained, and their annotation information is then used to select genes that are functionally closer, so that homologous genes are identified within a more precise range. Comparing the phenotype gene with the annotated gene sets in this way detects homologous genes more precisely and further reduces the chance of selecting pseudogenes.

In addition, the invention uses statistical methods to check the authenticity and reliability of the results. After screening, the positive selection test module tests the detected genes for positive selection. The two hypotheses of the branch-site test are compared by a likelihood ratio test, and the positively selected genes are screened by the false discovery rate. This ensures the authenticity and reliability of phenotype gene detection and supports further biological information analysis and the solution of biological questions.
Fig. 7 is a schematic diagram of the gene-tree topology for the Thellungiella parvula phenotype gene (AtHKT1) selected in the present invention.
The protein sequence of the Thellungiella parvula phenotype gene (AtHKT1) is chosen as the target gene sequence. The closely related species are grape, poplar, rice and Arabidopsis. BLAST aligns the AtHKT1 protein sequence locally against the predicted coding-region nucleotide sequences of the grape, poplar, rice and Arabidopsis genomes. The database is built with the parameters "-p F -o T", and the alignment uses the default parameters "-p tblastn -e 1e-5". A first screening keeps hits with homology similarity > 0.3, reference-sequence coverage > 0.3, aligned-sequence coverage > 0.3 and expectation value < 1e-5. The results are then screened again using the GO and IPR annotation information. Table 1 lists the homologous genes of AtHKT1 and their annotations. As shown in Table 1, the screening yields 18 new genes in total, most of which are related to metal transport.
The sequences of these genes are extracted, and MrBayes builds a gene tree from them by Bayesian statistical inference, giving the topology shown in Fig. 7. Fig. 7 shows the new genes similar to AtHKT1 that were screened from grape, poplar, rice and Arabidopsis. The figure is laid out as follows:
- the two regions at the upper left contain the predicted coding-region sequences of grape;
- the large region at the lower right contains those of rice;
- the region at the upper right contains those of Arabidopsis.

From Fig. 7, those skilled in the art can see that the grape, rice and Arabidopsis genes homologous to AtHKT1 have all undergone large-scale duplication. In addition, the paralogous relationships are close. This indicates that gene divergence within each species occurred relatively recently, whereas functionally similar genes diverged early between species. A comparison of branch lengths shows that divergence is fast in poplar and rice. In Fig. 7, the genes ending in POPTR are those screened from poplar, and their branches are relatively long. The color-marked genes in the figure are those with large-scale duplication, so the poplar genes are the only ones not marked with a color.
Each branch of the AtHKT1 gene tree shown in Fig. 7 is then analyzed, and the Dn/Ds values are calculated to produce Fig. 8. Fig. 8 is a schematic diagram of the Dn/Ds results for each branch of the gene tree of the Thellungiella parvula phenotype gene (AtHKT1). Positively selected genes are tested with the branch-site model A in the PAML software package. Genes with an FDR q_value < 0.05 are retained as positively selected, giving the positive selection results in Table 2. Table 2 lists the positive selection test results for the homologous genes of AtHKT1; 7 genes are detected as being under positive selection.
With reference to the foregoing exemplary description, those skilled in the art will clearly see that the invention has the following advantages:
1. Embodiments of the method and system for detecting phenotype genes and analyzing biological information provided by the invention use the genes predicted in the genomes of closely related species as the reference. They select homologous sequences of the gene of interest or gene family by a predefined similarity. A phylogenetic tree is built by the maximum likelihood method, which estimates the maximum likelihood of the tree by comparing nucleic acid or amino acid sequences, so that a more accurate phylogenetic topology is obtained. The method can therefore extract more accurate information, reduce the misleading influence of pseudogenes on biological information analysis, and help address more biological questions from the information obtained.
2. Embodiments of the method and system provided by the invention use the genes predicted in the genomes of closely related species as the reference. They select homologous sequences of the gene of interest or gene family by a predefined similarity. Sequences with larger homologous regions are retained, and their annotation information is then used to select genes that are functionally closer, so that homologous genes are identified within a more precise range. Comparing the phenotype gene with the annotated gene sets in this way detects homologous genes more precisely and further reduces the chance of selecting pseudogenes.
3. Embodiments of the method and system provided by the invention use statistical methods to check the authenticity and reliability of the results. After screening, the detected genes are tested for positive selection with the branch-site model A. The two hypotheses of the branch-site test are compared by a likelihood ratio test, and the positively selected genes are screened by the false discovery rate. This ensures the authenticity and reliability of phenotype gene detection and supports further biological information analysis and the solution of biological questions.
4. Embodiments of the method provided by the invention compare orthology and paralogy relationships using the topology of the gene tree, to infer whether large-scale duplication events have occurred recently. For Dn/Ds, software with multi-parameter models is used, so the analysis can be adjusted to the characteristics of different species. It shows relatively accurately the type of natural selection pressure acting on each gene, can detect new genes under positive selection, and helps infer which genes contribute to enhancing the phenotype.
5. Embodiments of the method provided by the invention are widely applicable to many kinds of organisms, and detection and analysis are fast and accurate. Specifically, they can be applied to phenotype-related information analysis of animal and plant genomes. Starting from a known phenotype gene, multiple new phenotype-related genes can be found in the same species and in closely related species. This strongly supports the mining and analysis of functional genes across species.
The description of the invention is given for purposes of illustration and description. It is not exhaustive and does not limit the invention to the disclosed form. Many modifications and variations will be obvious to those of ordinary skill in the art. The functional modules described in the invention, and the way they are divided, only illustrate its concept. Those skilled in the art may freely change the division and structure of the modules to achieve the same functions, according to the teaching of the invention and the needs of practical application. The embodiments were chosen and described to better explain the principles and practical application of the invention. They also enable those of ordinary skill in the art to understand the various embodiments, with various modifications, that suit particular uses of the inventive concept.
Table 1: Homologous genes and annotation results for the Thellungiella parvula phenotype gene (AtHKT1) [table image not reproduced]
Table 2: Positive selection test results for the homologous genes of the Thellungiella parvula phenotype gene (AtHKT1) [table image not reproduced]

Claims (11)

1. A method for detecting phenotype genes and analyzing biological information, characterized in that the method comprises:
locally aligning the sequence of a gene of interest against the predicted coding-region nucleotide sequences of the genome of each closely related species, and obtaining homologous sequences from the results of the local alignment;
screening the homologous sequences to obtain homologous genes;
extracting DNA sequences, translating the DNA sequences into protein sequences, performing a global alignment, and converting the aligned sequences back into DNA sequences;
building a gene tree from the converted DNA sequences, and calculating for each branch the ratio of the nonsynonymous substitution rate (Dn) to the synonymous substitution rate (Ds); and
performing a positive selection test to test for positively selected sites.
2. The method according to claim 1, characterized in that the step of "screening the homologous sequences to obtain homologous genes" further comprises:
screening the homologous sequences according to similarity and coverage; and
screening according to the Gene Ontology (GO) functional annotation and the IPR gene structure annotation information of the homologous genes.
3. The method according to claim 1, characterized in that the step of "building a gene tree from the converted DNA sequences and calculating the Dn/Ds of each branch" further comprises:
building the gene tree from the DNA sequences with the MrBayes software based on Bayesian statistical inference; and
calculating the Dn/Ds of each branch with the Codeml program in the PAML software package.
4. The method according to claim 3, characterized in that the step of "building the gene tree from the DNA sequences with the MrBayes software" further comprises:
selecting the best substitution model with the Modeltest software, and building the gene tree with the MrBayes software.
5. The method according to claim 1, characterized in that the step of "performing a positive selection test to test for positively selected sites" further comprises:
testing for positively selected sites with the branch-site model A in the PAML software package; and
selecting the genes under positive selection according to the false discovery rate (FDR) and the Bayes empirical Bayes (BEB) posterior probability.
6. The method according to claim 5, characterized in that the step of "selecting the genes under positive selection according to the FDR and the BEB posterior probability" further comprises:
presetting the FDR threshold to less than 0.05, and the BEB posterior probability threshold to greater than 0.95 for at least one site; and
selecting the genes under positive selection according to the FDR threshold and the BEB posterior probability threshold.
7. A system for detecting phenotype genes and analyzing biological information, characterized in that the system comprises:
a local alignment module, configured to locally align the sequence of a gene of interest against the predicted coding-region nucleotide sequences of the genome of each closely related species, and to obtain homologous sequences from the results of the local alignment;
a homologous gene screening module, configured to screen the homologous sequences to obtain homologous genes;
a DNA sequence conversion module, configured to extract DNA sequences, translate the DNA sequences into protein sequences, perform a global alignment, and convert the aligned sequences back into DNA sequences;
a gene tree construction module, configured to build a gene tree from the converted DNA sequences and to calculate for each branch the ratio of the nonsynonymous substitution rate (Dn) to the synonymous substitution rate (Ds); and
a positive selection test module, configured to test for positively selected sites.
8. The system according to claim 7, characterized in that the homologous gene screening module is further configured to: screen the homologous sequences according to similarity and coverage; and screen according to the Gene Ontology (GO) functional annotation and the IPR gene structure annotation information of the homologous genes.
9. The system according to claim 7, characterized in that the gene tree construction module is further configured to: build the gene tree from the DNA sequences with the MrBayes software by Bayesian statistical inference; and calculate the Dn/Ds of each branch with the Codeml program in the PAML software package.
10. The system according to claim 7, characterized in that the positive selection test module is further configured to: test for positively selected sites with the branch-site model A in the PAML software package; and select the genes under positive selection according to the false discovery rate (FDR) and the Bayes empirical Bayes (BEB) posterior probability.
11. The system according to claim 10, characterized in that the FDR threshold is preset to less than 0.05 and the BEB posterior probability threshold to greater than 0.95 for at least one site, and the genes under positive selection are selected according to the FDR threshold and the BEB posterior probability threshold.
CN201010273517XA 2010-09-03 2010-09-03 Method and system for detection of phenotype genes and analysis of biological information Active CN101930502B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201010273517XA CN101930502B (en) 2010-09-03 2010-09-03 Method and system for detection of phenotype genes and analysis of biological information
HK11102469.5A HK1148372A1 (en) 2010-09-03 2011-03-11 A method and a system for the detection and bioinformatic analysis of the phenotypic genes

Publications (2)

Publication Number Publication Date
CN101930502A CN101930502A (en) 2010-12-29
CN101930502B true CN101930502B (en) 2011-12-21

Family

ID=43369675

Also Published As

Publication number Publication date
CN101930502A (en) 2010-12-29
HK1148372A1 (en) 2011-09-02

Similar Documents

Publication Publication Date Title
CN101930502B (en) Method and system for detection of phenotype genes and analysis of biological information
Steenwyk et al. Incongruence in the phylogenomics era
CN104164479B (en) Heterozygous genome processing method
Ravinet et al. Interpreting the genomic landscape of speciation: a road map for finding barriers to gene flow
EP3207483B1 (en) Ancestral human genomes
EP3304383B1 (en) De novo diploid genome assembly and haplotype sequence reconstruction
CN101957892B (en) Whole-genome replication event detection method and system
WO2017143585A1 (en) Method and apparatus for assembling separated long fragment sequences
CN109448794B (en) Genetic taboo and Bayesian network-based epistatic site mining method
US12062417B2 (en) System, method and computer accessible-medium for multiplexing base calling and/or alignment
CN112349350A (en) Method for strain identification based on Dunaliella core genome sequence
CN111445954B (en) Method for identifying multiple gene families and carrying out evolutionary analysis
Ané Reconstructing concordance trees and testing the coalescent model from genome-wide data
CN108460248A (en) Method for detecting long tandem repeat sequences based on the Bionano platform
Izuno et al. Demography and selection analysis of the incipient adaptive radiation of a Hawaiian woody species
CN108376210A (en) Breeding parent selection method based on mining advantageous whole-genome SNP haplotypes: genomic-information-assisted breeding method II
CN112086128B (en) Third generation full-length transcriptome sequencing result analysis method suitable for sequence sequencing
Riaño-Pachón et al. Modern approaches for transcriptome analyses in plants
Armstrong Enabling comparative genomics at the scale of hundreds of species
Singleton et al. Leveraging genomic redundancy to improve inference and alignment of orthologous proteins
Takou et al. Predicting gene expression responses to environment in Arabidopsis thaliana using natural variation in DNA sequence
Roger et al. Phylogenomic analysis
Singh et al. Inferring interaction networks from transcriptomic data: methods and applications
Gaitán Gómez Development of a new structural variant detection software based on graph clustering machine learning algorithms from long reads
CN116705156A (en) A method for finding decisive loci of virus genome classification based on decision tree algorithm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1148372

Country of ref document: HK

C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: HUADA GENE RESEARCH & DEVELOPMENT CENTRE, HANGZHOU

Free format text: FORMER OWNER: BGI-SHENZHEN CO., LTD.

Effective date: 20120316

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 518083 SHENZHEN, GUANGDONG PROVINCE TO: 310008 HANGZHOU, ZHEJIANG PROVINCE

TR01 Transfer of patent right

Effective date of registration: 20120316

Address after: 310008 No. 51, Zhijiang Road, Hangzhou, Zhejiang, Xihu District

Patentee after: Huada Gene Research & Development Centre, Hangzhou

Address before: Beishan Industrial Zone Building, Yantian District, Shenzhen, Guangdong Province 518083

Patentee before: BGI-Shenzhen Co., Ltd.

REG Reference to a national code

Ref country code: HK

Ref legal event code: GR

Ref document number: 1148372

Country of ref document: HK

ASS Succession or assignment of patent right

Owner name: BGI TECHNOLOGY SOLUTIONS CO., LTD.

Free format text: FORMER OWNER: HUADA GENE RESEARCH & DEVELOPMENT CENTRE, HANGZHOU

Effective date: 20130715

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 310008 HANGZHOU, ZHEJIANG PROVINCE TO: 518083 SHENZHEN, GUANGDONG PROVINCE

TR01 Transfer of patent right

Effective date of registration: 20130715

Address after: 518083 Science and Technology Pioneer Park, Comprehensive Building, Beishan Industrial Zone, Yantian District, Shenzhen, Guangdong 201

Patentee after: BGI Technology Solutions Co., Ltd.

Address before: 310008 No. 51, Zhijiang Road, Hangzhou, Zhejiang, Xihu District

Patentee before: Huada Gene Research & Development Centre, Hangzhou