CN110050092B - Rice whole genome breeding chip and application thereof - Google Patents
- Publication number
- CN110050092B (CN201680091357.2A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B40/00—Libraries per se, e.g. arrays, mixtures
- C40B40/04—Libraries containing only organic compounds
- C40B40/06—Libraries containing nucleotides or polynucleotides, or derivatives thereof
Abstract
The application relates to an SNP marker combination for rice genotyping and a method for designing it, a chip designed for the SNP markers, and applications thereof.
Description
Technical Field
The application relates to the fields of genomics, molecular biology, bioinformatics and molecular plant breeding, in particular to a rice whole genome breeding chip and application thereof.
Background
Genome breeding refers to applying molecular biology techniques to breeding so that breeding is carried out at the genome level. Its main advantages are as follows. First, plant seeds or seedlings can be characterized at the molecular level to judge whether they carry the expected desirable traits before selection, which accelerates the breeding process and improves breeding accuracy. Second, molecular detection and analysis can be organized into a standard workflow, and different technicians who follow the workflow strictly can quickly obtain accurate results, which greatly reduces the influence of personal experience on plant breeding. Third, the marker technology used in genome breeding can perform detection at the whole-genome level, which avoids segregation in the offspring caused by material containing heterozygous sites and ensures the stability of the material. Marker technology is an important tool in genome breeding and has contributed greatly to functional genomics research and the genetic improvement of crops. Among marker types, the SNP (Single Nucleotide Polymorphism) is increasingly widely used as a third-generation marker because of its wide distribution and high density on the genome, its high stability and its high accuracy. High-throughput SNP detection technologies mainly comprise detection platforms based on sequencing technology and detection platforms based on chip technology. The SNP chip has become an important tool in genome breeding because its marker loci are controllable, its operation is convenient and its results are reliable. Currently, the most mature SNP chip detection technologies are two major platforms: the Illumina Infinium chip and the Affymetrix Axiom chip.
The Illumina Infinium chip technology is a high-density chip technology based on microbeads. It uses microbeads 3 μm in diameter that self-assemble in microwells on a substrate of optical fiber bundles or planar silicon wafers. Each bead is covered with hundreds of thousands of copies of a particular oligonucleotide, which serves as the capture sequence for genotyping a sample in the assay. According to the number of oligonucleotide types, the chips come in the following formats: 24-sample format (3,000-90,000 bead types), 12-sample format (90,001-250,000 bead types), or 4-sample format (250,001-1,000,000 bead types). The scanning system matched with the chip is equipped with advanced laser and optical elements, can process high-density multi-sample chips, generates high-quality data and runs at high speed. Owing to the advanced analysis technology, the average sample call rate is high and the repeatability reaches 99.9%. These high-quality data reduce the likelihood of false positives and false negatives, making genotyping results more accurate.
The Affymetrix Axiom chip adopts in situ photolithography. The photomask design and strict process flow of this technology ensure that the manufactured chips are of high quality, repeatability and consistency, and allow an extremely high density of probe synthesis on the chip: more than 4 million probes are synthesized per square centimeter of substrate. The Affymetrix GeneTitan system is a fully automated, highly integrated chip workstation that uses chip plates in a format similar to a 96-well plate, where each square chip occupies approximately the area of one well of the 96-well plate, and one chip plate may contain 16, 24, or 96 chips, thereby enabling multi-sample high-throughput assays. The system integrates the hybridization oven, the fluidics workstation and the CCD scanning and imaging equipment used in the whole process from hybridization to scanning into one instrument. After a chip plate is placed in the GeneTitan system, hybridization, washing and scanning of the chips require almost no manual intervention, and all operations are completed automatically by the machine.
The applicant discloses a Rice whole genome breeding chip Rice60K in PCT international application publication WO/2014/121419A1, and the chip is successfully applied to Rice genome breeding and functional genome research.
Disclosure of Invention
In one aspect, the present application provides a combination of SNP markers for rice genotyping, comprising the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781.
In some embodiments, the SNP marker combination of the present application further includes the SNP markers in the nucleotide sequences shown in SEQ ID NO: 27782-86071. In some embodiments, the SNP marker combination of the present application includes the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO: 1-86071.
In another aspect, the present application provides a rice chip comprising detection sites for the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781.
In some embodiments, the rice chip of the present application comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781.
In some embodiments, the rice chip of the present application further comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO: 27782-86071. In some embodiments, the rice chip of the present application comprises detection sites designed for the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO: 1-86071. In some embodiments, the detection sites in the rice chip of the present application are a combination of probes designed for the SNP markers.
In some embodiments, the rice chip of the present application is fabricated using an in situ on-chip synthesis method, an off-chip synthesis method, or a microbead method. In some embodiments, the rice chip of the present application is fabricated by in situ photolithography synthesis, photoresist layer parallel synthesis, microfluidic channel on-chip synthesis, light-directed in situ synthesis, soft lithography in situ synthesis, jet printing synthesis, molecular stamp on-chip synthesis, maskless chip synthesis, the BeadArray method, or the suspension chip method. In some embodiments, the rice chip of the present application is made by Illumina Infinium technology or Affymetrix Axiom technology.
In another aspect, the present application provides the use of the above SNP marker combination or chip in the detection of a biological sample. In certain embodiments, the detection is used for breeding, identity verification, gene mapping and cloning, germplasm resource identification, hybrid rice identification, wild rice identification, functional gene identification, or functional gene haplotype analysis.
In another aspect, the present application provides a method of detecting a biological sample, the method comprising detecting in the biological sample the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781. In some embodiments, the method further comprises detecting the SNP markers in the nucleotide sequences shown in SEQ ID NO: 27782-86071. In some embodiments, the method comprises detecting the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO: 1-86071. In some embodiments, the method uses a gene chip for the detection.
In another aspect, the present application provides a method for screening a germplasm resource representative SNP marker combination, comprising the steps of:
obtaining SNP sites from the sequencing results of a plurality of rice varieties;
selecting sites with a score greater than 0.6 in the Illumina scoring system;
calculating a composite score for each SNP site, the composite score being the simple sum of the following values:
an SNP whose alleles are A/T or C/G scores 0 points, and any other allele difference scores 20 points;
when the SNP site is located in an intergenic region, an intron, a promoter, a 5' untranslated region (5'-UTR) or a 3' untranslated region (3'-UTR), scores of 1, 1.5, 2 and 2.5 are assigned according to the location;
when the SNP causes a synonymous mutation, a non-synonymous mutation or a large-effect mutation in the coding region, it scores 2, 5 or 10, respectively;
(MAF of the SNP site in the whole population × 25) + (MAF of the SNP site in the indica rice population × 25) + (MAF of the SNP site in the japonica rice population × 25) + (MAF of the SNP site in mixed sequencing × 25);
uniformly selecting a plurality of SNP sites on the rice genome according to the composite score; and
dividing the whole rice genome into linkage disequilibrium blocks according to the LD value and, for each block, selecting the 2 sites with the highest composite score, with at most 25 sites per block and at least 10 sites per 100 kb.
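The composite score described in the steps above is a plain sum, so it can be sketched in a few lines. The following is a minimal illustration only: the function names, the tuple layout, and the idea of passing the location and coding-effect scores in as already-looked-up numbers are assumptions, and the LD block assignment is taken as precomputed.

```python
from collections import defaultdict

# A/T and C/G SNPs score 0; every other allele pair scores 20.
ZERO_SCORE_PAIRS = {frozenset("AT"), frozenset("CG")}


def allele_pair_score(allele_1: str, allele_2: str) -> int:
    """Score the allele pair of a biallelic SNP (0 or 20)."""
    pair = frozenset((allele_1.upper(), allele_2.upper()))
    return 0 if pair in ZERO_SCORE_PAIRS else 20


def composite_score(allele_1, allele_2, location_score, coding_score,
                    maf_all, maf_indica, maf_japonica, maf_mixed):
    """Simple sum of the four components listed in the text.

    location_score and coding_score are supplied by the caller
    (hypothetical inputs; the lookup itself is not shown here).
    """
    maf_term = 25 * (maf_all + maf_indica + maf_japonica + maf_mixed)
    return (allele_pair_score(allele_1, allele_2)
            + location_score + coding_score + maf_term)


def top_sites_per_block(scored_sites, per_block=2):
    """Keep the highest-scoring sites of each LD block.

    scored_sites: iterable of (block_id, site_id, score) tuples, where
    block_id comes from an LD block partition computed elsewhere.
    """
    by_block = defaultdict(list)
    for block_id, site_id, score in scored_sites:
        by_block[block_id].append((score, site_id))
    return {
        block: [site for _, site in sorted(items, reverse=True)[:per_block]]
        for block, items in by_block.items()
    }
```

For instance, an A/G SNP with a location score of 2.5, no coding effect and a MAF of 0.5 in all four data sets would score 20 + 2.5 + 0 + 50 = 72.5 under this sketch.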
In another aspect, the present application provides a method for screening a combination of SNP markers specific to promoted hybrid rice, comprising the steps of:
performing whole genome sequencing on a plurality of hybrid rice varieties to obtain a plurality of SNP sites;
selecting sites with a score greater than 0.6 in the Illumina scoring system;
calculating a composite score for each SNP site, the composite score being the simple sum of the following values:
an SNP whose alleles are A/T or C/G scores 0 points, and any other allele difference scores 20 points;
when the SNP site is located in an intergenic region, an intron, a promoter, a 5' untranslated region (5'-UTR) or a 3' untranslated region (3'-UTR), scores of 1, 1.5, 2 and 2.5 are assigned according to the location;
when the SNP causes a synonymous mutation, a non-synonymous mutation or a large-effect mutation in the coding region, it scores 2, 5 or 10, respectively;
MAF of the SNP site in mixed sequencing × 50;
and uniformly selecting a plurality of SNP sites on the rice genome according to the composite score.
In another aspect, the present application provides a method for screening a combination of SNP markers derived from wild rice, comprising the steps of:
obtaining SNP sites of wild rice varieties from a rice SNP database;
removing sites that have other SNPs or indels within 55 bp upstream or downstream of the SNP site;
selecting SNP sites that can be detected in at least 10% of the varieties;
aligning the 55 bp sequence upstream or downstream of the SNP site against the rice genome, and removing SNP sites whose sequence matches another position in the genome by more than 70%;
selecting sites with a score greater than 0.6 in the Illumina scoring system;
dividing the rice genome into 40 kb segments by position, and selecting from each segment the one SNP site with the highest score.
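The last two steps above (score threshold, then one site per 40 kb segment) can be sketched as follows. This is an illustration under stated assumptions: the `Snp` record and function name are hypothetical, positions are taken as 1-based, and the "highest score" is assumed to be the same design score used for the 0.6 threshold, which the text does not state explicitly.

```python
from typing import Iterable, List, NamedTuple


class Snp(NamedTuple):
    chrom: str
    pos: int      # 1-based physical position (assumption)
    score: float  # design score used for the > 0.6 filter


def best_snp_per_segment(snps: Iterable[Snp],
                         segment_size: int = 40_000,
                         min_score: float = 0.6) -> List[Snp]:
    """Keep, for every fixed-size segment, the single best-scoring SNP."""
    best = {}
    for snp in snps:
        if snp.score <= min_score:
            continue  # fails the score threshold
        key = (snp.chrom, (snp.pos - 1) // segment_size)
        if key not in best or snp.score > best[key].score:
            best[key] = snp
    # Return in chromosome / segment order for reproducibility.
    return [best[key] for key in sorted(best)]
```

Ties within a segment are resolved in favor of the first site encountered; a real pipeline might prefer a different tie-break.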
In another aspect, the present application provides a method for screening a combination of functional gene region markers, comprising the steps of:
obtaining a plurality of SNP sites from a rice SNP database, wherein the SNP sites are located within the nucleotide sequences of a plurality of functional genes of a plurality of rice varieties and are detectable in more than three varieties;
removing sites that have other SNPs or indels within 55 bp upstream or downstream of the SNP site;
aligning the 55 bp sequence upstream or downstream of the SNP site against the rice genome, and removing SNP sites whose sequence matches another position in the genome by more than 70%;
removing SNP markers located more than 5 kb upstream or downstream of the functional gene;
selecting sites with a score greater than 0.6 in the Illumina scoring system;
and selecting SNP sites in those functional gene regions in which the Rice60K chip disclosed in WO/2014/121419A1 has no more than 10 existing SNP sites.
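Two of the positional filters in the steps above, the 55 bp clean-flank requirement and the 5 kb gene window, reduce to simple coordinate checks. A minimal sketch follows; the function names and the use of plain integer coordinates on a single chromosome are assumptions for illustration.

```python
from typing import Iterable


def has_clean_flanks(snp_pos: int,
                     variant_positions: Iterable[int],
                     flank: int = 55) -> bool:
    """True if no other SNP or indel lies within `flank` bp of snp_pos.

    variant_positions: positions of all known variants on the same
    chromosome (may include snp_pos itself, which is ignored).
    """
    return not any(
        pos != snp_pos and abs(pos - snp_pos) <= flank
        for pos in variant_positions
    )


def within_gene_window(snp_pos: int, gene_start: int, gene_end: int,
                       window: int = 5_000) -> bool:
    """True if the SNP lies in the gene or within `window` bp of it."""
    return gene_start - window <= snp_pos <= gene_end + window
```

The alignment-based uniqueness filter (the 70% match criterion) needs a sequence aligner and is deliberately left out of this sketch.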
Drawings
FIG. 1 shows the distribution of SNP sites on the rice genome. The numbers on the ordinate represent the 12 rice chromosomes in order, and the abscissa is the physical position; the height of each vertical line indicates the number of SNP sites; the legend gives the correspondence between line height and number of SNP sites. 1a is the distribution of functional gene region SNP sites among the newly added 30K SNP sites; 1b is the distribution of wild rice-derived SNP sites among the newly added 30K SNP sites; 1c is the distribution of hybrid rice-specific SNP sites among the newly added 30K SNP sites; 1d is the distribution of germplasm resource representative SNP sites among the newly added 30K SNP sites; 1e is the distribution of the newly added 30K SNP sites; and 1f is the distribution of the newly added 30K and Rice60K SNP sites.
FIG. 2 shows the genetic background of the rice blast resistance improved material A08-1 as detected with the 90K chip. 2a is the detection result of Rice60KAddon 1; and 2b is the detection result of Os90Kv1. The boxes indicated by the abscissa numbers represent the 12 rice chromosomes in order, and the ordinate numbers are physical positions on the rice genome in megabases (Mb); in the figure, the white background indicates the genotype of the donor material K22, black lines indicate the genotype of the donor material K131, and the line marked with a black dot on chromosome 6 is the target fragment.
FIG. 3 shows the result of haplotype cluster analysis of the rice blast resistance gene Pi2/Pi9/Pigm region. 3a is the cluster analysis result obtained with the newly added 30K SNP marker combination; and 3b is the cluster analysis result obtained with the Rice60K chip. The ordinate represents the difference value between materials; the abscissa lists the tested materials, and materials joined by a horizontal line belong to the same haplotype.
Detailed Description
The term "single nucleotide polymorphism" or "SNP marker" or "SNP site" as used herein refers to a nucleotide sequence variation present in the genomic sequence of a chromosome that is based on a difference in a single nucleotide (A, T, C or G). Such variations give rise to diversity in the chromosomal genome, thereby allowing different alleles (e.g., alleles from two different individuals) or different individuals to be distinguished from each other. The change may occur in coding or non-coding regions of a gene (e.g., at or near the promoter region, or in introns) or in intergenic regions.
The term "allele" as used herein refers to a different form of the same gene present in a given locus on a homologous chromosome.
The term "linkage disequilibrium" as used herein refers to a non-random association between two or more sites, which may be on the same chromosome or on different chromosomes. Linkage disequilibrium is also referred to as gametic phase disequilibrium or gametic disequilibrium. Put another way, linkage disequilibrium is the occurrence of a combination of alleles or genetic markers in a population at a frequency above or below that predicted from the random association of the alleles. Linkage refers to limited recombination between two or more loci on a chromosome, and linkage disequilibrium is not equivalent to linkage. The amount of linkage disequilibrium depends on the difference between the observed and expected frequencies. A population in which the frequency of sites or genotypes after recombination equals the expected frequency is said to be in linkage equilibrium. The degree of linkage disequilibrium depends on a variety of factors, including genetic linkage, selection, recombination probability, genetic drift, assortative mating, and population structure.
The term "linkage disequilibrium block" as used herein refers to a haplotype block defined among genome-wide SNP markers according to differences in linkage disequilibrium, using the LD value D' as the criterion. A haplotype is a combination of single nucleotide polymorphisms that are associated within a particular region of a chromosome and tend to be inherited by progeny as a whole.
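For readers unfamiliar with the D' statistic mentioned above, the following sketch computes it for two biallelic sites using the standard textbook (Lewontin) normalization; this is general population-genetics background rather than a procedure taken from the application, and the function name and argument names are assumptions.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Return |D'| for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A at site 1 and
          allele B at site 2
    p_a:  frequency of allele A at site 1
    p_b:  frequency of allele B at site 2
    """
    # D is the deviation of the observed haplotype frequency from the
    # product expected under random association.
    d = p_ab - p_a * p_b
    if d == 0:
        return 0.0
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max
```

A value of 1 means that at least one of the four possible haplotypes is absent, and 0 means the two sites are in linkage equilibrium.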
MAF is the minor allele frequency, which refers to the frequency of the less common allele in a given population. A higher value indicates a greater likelihood of polymorphism between any two varieties.
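As a concrete illustration of the definition above, MAF can be computed from a list of allele calls at one site. This is a minimal sketch; the function name, the one-call-per-sample input layout, and the rule of ignoring calls other than A, C, G and T are assumptions.

```python
from collections import Counter
from typing import Iterable

VALID_ALLELES = {"A", "C", "G", "T"}


def minor_allele_frequency(alleles: Iterable[str]) -> float:
    """Frequency of the second most common allele at one site.

    Calls other than A/C/G/T (e.g. "N" for missing) are ignored.
    A monomorphic site has a MAF of 0.
    """
    counts = Counter(a.upper() for a in alleles if a.upper() in VALID_ALLELES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid allele calls")
    ranked = counts.most_common()
    if len(ranked) < 2:
        return 0.0
    return ranked[1][1] / total
```

For a biallelic site the result can never exceed 0.5, which bounds each MAF term in the composite scores described earlier in the document.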
The term "Indel" as used herein refers to an insertion or deletion, specifically a difference at the whole-genome level in which a certain number of nucleotides are inserted into or deleted from the genome of an individual relative to a standard control (Jander et al., 2002).
The term "SNP chip" as used herein refers to a biological microchip capable of analyzing the presence of SNPs contained in sample DNA by arranging and attaching several hundred to several hundred thousand biomolecules as probes, such as DNA, DNA fragments, cDNA, oligonucleotides, RNA or RNA fragments having known sequences, which are fixed at intervals on a small solid substrate formed of glass, silicon or nylon. Depending on the degree of complementarity, hybridization occurs between the nucleic acids contained in the sample and the probes immobilized on the surface. By detecting and judging the hybridization, information on the substance contained in the sample can be obtained at the same time.
The major types of DNA chips currently available include the following. In-situ (on-chip) synthesis uses modified oligonucleotide monomers to synthesize spatially defined probe sequences step by step in situ, thereby directly synthesizing an oligonucleotide probe array on a hard surface. Off-chip synthesis (spotting) deposits pre-synthesized probe sequences at specific sites by a spotting method, thereby forming a DNA probe array immobilized on a glass substrate. The microbead method involves directly synthesizing DNA probes on coded beads, or fixing pre-prepared probe sequences on the coded beads, and then randomly assembling the beads to form a bead chip.
In one aspect, the present application provides a combination of SNP markers for rice genotyping, comprising the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781. Each nucleotide sequence shown in SEQ ID NO: 1-27781 consists of the SNP site and the 70 bp upstream and downstream of it; in actual design, a probe may be designed from either the upstream or the downstream sequence.
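For illustration, a sequence consisting of a SNP site and its 70 bp flanks can be cut out of a reference sequence as sketched below. The function name, the 1-based coordinate convention, and the bracketed `[ref/alt]` notation are assumptions made for this sketch and are not specified in the present application.

```python
def snp_with_flanks(reference, pos, ref_allele, alt_allele, flank=70):
    """Return 'left[ref/alt]right' for a SNP at 1-based position pos.

    reference: chromosome sequence as a string (illustrative input)
    flank:     number of bases kept on each side of the SNP
    """
    if not 1 <= pos <= len(reference):
        raise ValueError("position outside the reference sequence")
    i = pos - 1  # convert to a 0-based index
    if reference[i] != ref_allele:
        raise ValueError("reference allele does not match the sequence")
    # Flanks are clipped at the ends of the sequence.
    left = reference[max(0, i - flank):i]
    right = reference[i + 1:i + 1 + flank]
    return f"{left}[{ref_allele}/{alt_allele}]{right}"
```

A probe can then be designed from either `left` or `right`, consistent with the statement that the probe may be designed from upstream or downstream.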
In certain embodiments, the SNP marker combination further includes the SNP markers in the nucleotide sequences shown in SEQ ID NO: 27782-86071. The SNP markers in the nucleotide sequences shown in SEQ ID NO: 27782-86071 are the combination of 58,290 SNP markers detected by the rice whole genome breeding chip Rice60K disclosed in PCT international application WO2014/121419A1; each sequence comprises the SNP marker and its one-sided flanking sequence and can be used for chip design.
In the present context, the SNP markers of SEQ ID NO: 1-86071 are collectively referred to as 90K, wherein the SNP markers disclosed for the first time in this application (i.e., the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781) are referred to as the newly added 30K, and the SNP markers in the nucleotide sequences shown in SEQ ID NO: 27782-86071 are referred to as 60K.
In another aspect, the present application provides a rice chip comprising detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781.
In certain embodiments, the chip further comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO: 27782-86071; that is, the chip comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-86071. In certain embodiments, the chip comprises detection sites designed for the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO: 1-86071. In certain embodiments, the detection site is a combination of probes designed for a SNP marker.
In certain embodiments, the chip is fabricated using an in-situ on-chip synthesis method, an off-chip synthesis method, or a microbead method. In certain embodiments, the chip is fabricated by in situ photolithography synthesis, photoresist layer parallel synthesis, microfluidic channel-on-chip synthesis, light guided in situ synthesis, soft lithography technique in situ synthesis, jet printing synthesis, molecular stamp-on-chip synthesis, maskless chip synthesis, BeadArray method, or suspension chip method. In certain embodiments, the chip is fabricated by Illumina Infinium technology or Affymetrix Axiom technology.
In another aspect, the present application provides the use of the above-mentioned SNP marker combination or chip in the detection of a biological sample. In certain embodiments, the detection is used for breeding, identity identification, gene mapping and cloning, germplasm resource identification, hybrid rice identification, wild rice identification, functional gene identification, or functional gene haplotype analysis.
In another aspect, the present application provides a method of detecting a biological sample, the method comprising detecting, in the biological sample, the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781.
In certain embodiments, the method further comprises detecting the SNP markers in the nucleotide sequences shown in SEQ ID NO: 27782-86071. In certain embodiments, the method comprises detecting the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO: 1-86071.
In certain embodiments, the detection is performed using a gene chip. In certain embodiments, the chip comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781.
In certain embodiments, the chip further comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO: 27782-86071. In certain embodiments, the chip comprises detection sites designed for the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO: 1-86071. In certain embodiments, the detection site is a combination of probes designed for a SNP marker.
In certain embodiments, the chip is fabricated using an in-situ on-chip synthesis method, an off-chip synthesis method, or a microbead method. In certain embodiments, the chip is fabricated by in situ photolithography synthesis, photoresist layer parallel synthesis, microfluidic channel-on-chip synthesis, light guided in situ synthesis, soft lithography technique in situ synthesis, jet printing synthesis, molecular stamp-on-chip synthesis, maskless chip synthesis, BeadArray method, or suspension chip method. In certain embodiments, the chip is fabricated by Illumina Infinium technology or Affymetrix Axiom technology.
In another aspect, the present application provides a method for screening a combination of germplasm resource representative SNP markers, comprising the steps of:
obtaining SNP sites from sequencing results of a plurality of rice varieties;
selecting sites having a score greater than 0.6 in the Illumina scoring system;
calculating a comprehensive score for each SNP site, wherein the comprehensive score is the simple sum of the following values:
a score of 0 if the alleles of the SNP site are A/T or C/G, and a score of 20 for all other allele combinations;
a score of 1, 1.5, 2 or 2.5 when the SNP site is located in an intergenic region, an intron, a promoter, a 5' untranslated region (5'-UTR) or a 3' untranslated region (3'-UTR);
a score of 2, 5 or 10 when the SNP causes a synonymous mutation, a non-synonymous mutation or a large-effect mutation in the coding region, respectively;
(MAF of the SNP site in the whole population × 25) + (MAF of the SNP site in the indica rice population × 25) + (MAF of the SNP site in the japonica rice population × 25) + (MAF of the SNP site in mixed sequencing × 25);
uniformly selecting a plurality of SNP sites on the rice genome according to the comprehensive score; and
dividing the whole rice genome into linkage disequilibrium blocks according to the LD value, selecting for each block the 2 sites with the highest comprehensive score and at most 25 sites, and selecting at least 10 sites per 100 kb.
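The comprehensive score described in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, and because the text lists five non-coding locations but four score values, the location score is passed in by the caller rather than looked up from a table.

```python
# Scores for coding-region effects, as listed in the scoring steps.
CODING_EFFECT_SCORE = {"synonymous": 2.0, "non_synonymous": 5.0, "large_effect": 10.0}


def composite_score(alleles, region_score=0.0, coding_effect=None,
                    mafs=(), maf_weight=25.0):
    """Simple sum of the allele-type, location, coding-effect and MAF terms.

    alleles:       the two alleles of the SNP, e.g. "AG" or ("A", "G")
    region_score:  1, 1.5, 2 or 2.5 for a non-coding location (caller's choice,
                   since the region-to-score mapping is an assumption here)
    coding_effect: None or a key of CODING_EFFECT_SCORE
    mafs:          MAF values (whole population, indica, japonica, mixed sequencing)
    maf_weight:    multiplier applied to each MAF value
    """
    pair = frozenset(alleles)
    # A/T and C/G SNPs score 0; every other allele combination scores 20.
    allele_score = 0.0 if pair in (frozenset("AT"), frozenset("CG")) else 20.0
    effect_score = CODING_EFFECT_SCORE[coding_effect] if coding_effect else 0.0
    return allele_score + region_score + effect_score + maf_weight * sum(mafs)
```

A different weight and a single MAF value can be passed for scoring schemes that use only one MAF term.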
In another aspect, the present application provides a method for screening a combination of SNP markers specific to promoted hybrid rice, comprising the steps of:
sequencing the whole genomes of a plurality of hybrid rice varieties to obtain a plurality of SNP sites;
selecting sites having a score greater than 0.6 in the Illumina scoring system;
calculating a comprehensive score for each SNP site, wherein the comprehensive score is the simple sum of the following values:
a score of 0 if the alleles of the SNP site are A/T or C/G, and a score of 20 for all other allele combinations;
a score of 1, 1.5, 2 or 2.5 when the SNP site is located in an intergenic region, an intron, a promoter, a 5' untranslated region (5'-UTR) or a 3' untranslated region (3'-UTR);
a score of 2, 5 or 10 when the SNP causes a synonymous mutation, a non-synonymous mutation or a large-effect mutation in the coding region, respectively;
MAF of the SNP site in mixed sequencing × 50; and
uniformly selecting a plurality of SNP sites on the rice genome according to the comprehensive score.
In another aspect, the present application provides a method of screening a combination of wild rice-derived SNP markers, comprising the steps of:
obtaining SNP sites derived from wild rice varieties from a rice SNP database;
removing sites that have other SNPs or indels within 55 bp upstream or downstream of the SNP site;
selecting SNP sites that can be detected in at least 10% of the varieties;
aligning the 55 bp sequence upstream or downstream of the SNP site against the rice genome, and removing SNP sites whose sequence matches other positions of the genome with a matching degree of more than 70%;
selecting sites having a score greater than 0.6 in the Illumina scoring system; and
dividing the rice genome into segments of 40 kb each, and selecting from each segment the one SNP site with the highest score.
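The final step above, selecting one highest-scoring site per 40 kb segment, can be sketched as follows. The tuple layout, the 1-based positions, and the function name are assumptions for illustration; the score is whatever per-site score the preceding steps provide.

```python
def best_per_segment(snps, segment_size=40_000):
    """Keep the highest-scoring SNP in each fixed-size genome segment.

    snps: iterable of (chromosome, position, score) tuples, with 1-based
          positions (an illustrative input format).
    """
    best = {}
    for chrom, pos, score in snps:
        # Positions 1..40000 fall in segment 0, 40001..80000 in segment 1, etc.
        key = (chrom, (pos - 1) // segment_size)
        if key not in best or score > best[key][2]:
            best[key] = (chrom, pos, score)
    return sorted(best.values())
```

Ties are resolved here in favour of the first site encountered, which is a design choice of this sketch rather than something the method specifies.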
In another aspect, the present application provides a method of screening for a combination of functional gene region markers comprising the steps of:
obtaining a plurality of SNP sites from a rice SNP database, wherein the plurality of SNP sites are located within nucleotide sequences of a plurality of functional genes of a plurality of rice varieties and are detectable in more than three varieties;
removing sites with other SNPs or indels existing in 55bp upstream and downstream of the SNP sites;
comparing the upstream or downstream 55bp sequence of the SNP locus with the rice genome, and removing the SNP locus with the matching degree of more than 70 percent with other positions of the genome;
removing SNP markers located outside 5kb upstream and downstream of the functional gene;
selecting a locus with a score greater than 0.6 in the Illumina scoring system;
and selecting SNP sites in those functional gene regions for which the Rice60K chip disclosed in WO2014/121419A1 has no more than 10 existing SNP sites.
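Two of the positional filters in the steps above, the 55 bp flank check and the 5 kb gene window, can be sketched as follows. The function names and input formats are hypothetical, and the gene window is interpreted here as extending 5 kb beyond both ends of the gene, which is an assumption.

```python
from bisect import bisect_left, bisect_right


def has_clean_flanks(pos, variant_positions, flank=55):
    """True if no other SNP or indel lies within `flank` bp of the SNP.

    variant_positions: sorted positions of all known SNPs and indels on the
                       same chromosome (may include the SNP itself).
    """
    lo = bisect_left(variant_positions, pos - flank)
    hi = bisect_right(variant_positions, pos + flank)
    # Any entry in the window other than the SNP itself disqualifies the site.
    return all(p == pos for p in variant_positions[lo:hi])


def near_gene(pos, gene_start, gene_end, window=5_000):
    """True if the SNP lies in the gene or within `window` bp of either end."""
    return gene_start - window <= pos <= gene_end + window
```

Sites failing either check would be dropped before the score threshold is applied.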
Examples
Example 1 SNP marker selection method
The SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781 consist of five types of markers, and the corresponding SNP sites were obtained by screening according to the following methods.
1. Germplasm resource representative SNP sites:
(1) 6,428,770 SNP sites were obtained from the sequencing of 1491 rice varieties (from the RiceVarMap database, see http://ricevarmap.ncpgr.cn/);
(2) Sites with a score greater than 0.6 in the Illumina scoring system were selected;
(3) A comprehensive score was calculated for each SNP site as the simple sum of the following values:
a score of 0 if the alleles of the SNP site are A/T or C/G, and a score of 20 for all other allele combinations;
because different regions of the gene structure affect gene function to different degrees, a score of 1, 1.5, 2 or 2.5 when the SNP site is located in an intergenic region, an intron, a promoter, a 5' untranslated region (5'-UTR) or a 3' untranslated region (3'-UTR);
because base mutations in the coding region are directly related to function, a score of 2, 5 or 10 when the SNP causes a synonymous mutation, a non-synonymous mutation or a large-effect mutation (e.g., a stop mutation) in the coding region, respectively;
(MAF of the SNP site in the whole population × 25) + (MAF of the SNP site in the indica rice population × 25) + (MAF of the SNP site in the japonica rice population × 25) + (MAF of the SNP site in mixed sequencing × 25);
(4) 4850 SNP sites were uniformly selected on the rice genome according to the comprehensive score;
(5) The whole rice genome was divided into linkage disequilibrium blocks according to the LD value. The general principle of site selection is that the SNP sites should be representative and uniformly distributed: the 2 sites with the highest comprehensive score are selected from each block, and at least 10 sites are selected per 100 kb. When there are fewer than 5 blocks within 100 kb, i.e., fewer than 10 sites would be selected per 100 kb, 3 or more SNP sites are selected from some of the blocks, up to a maximum of 25 sites per block.
Finally, based on the LD selection and combining the results for the whole rice population, the indica and japonica subspecies, and the hybrid rice mixed sequencing, 6108 SNP sites were selected (FIG. 1d shows the distribution of the germplasm resource representative SNP sites among the newly added 30K SNP sites; the ordinate numbers represent the 12 rice chromosomes in sequence, the abscissa is the physical position, the height of each vertical line represents the number of SNP sites, and the legend gives the correspondence between line height and number of SNP sites).
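The per-block selection rule in step (5) can be sketched as below. This is a simplification under stated assumptions: the quota is raised evenly for every block in a sparse 100 kb window, whereas the text raises it only for some of the blocks, and all names are hypothetical.

```python
from math import ceil


def per_block_quota(n_blocks, min_sites=10, base=2, cap=25):
    """Sites to take from each LD block in a 100 kb window.

    n_blocks:  number of LD blocks in the window
    min_sites: minimum number of sites wanted per window
    base, cap: default and maximum number of sites per block
    """
    if n_blocks <= 0:
        raise ValueError("a window must contain at least one block")
    # With 5 or more blocks, 2 per block already reaches 10 per window.
    return min(cap, max(base, ceil(min_sites / n_blocks)))


def pick_top(block_scores, quota):
    """Return the ids of the `quota` highest-scoring SNPs in one block.

    block_scores: dict mapping SNP id -> comprehensive score
    """
    ranked = sorted(block_scores, key=block_scores.get, reverse=True)
    return ranked[:quota]
```

For example, a window with three blocks yields a quota of 4 sites per block under this even-split assumption.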
2. SNP markers specific to promoted hybrid rice:
(1) Hybrid rice purchased from the market was pooled and subjected to whole genome sequencing, yielding 2,207,700 SNP sites, of which 13.8% were not detected in the sequencing data of the 1491 varieties (RiceVarMap database, see http://ricevarmap.ncpgr.cn/), indicating that it is necessary to add markers specific to promoted hybrid rice;
(2) Sites with a score greater than 0.6 in the Illumina scoring system were selected;
(3) A comprehensive score was calculated for each SNP site as the simple sum of the following values:
a score of 0 if the alleles of the SNP site are A/T or C/G, and a score of 20 for all other allele combinations;
because different regions of the gene structure affect gene function to different degrees, a score of 1, 1.5, 2 or 2.5 when the SNP site is located in an intergenic region, an intron, a promoter, a 5' untranslated region (5'-UTR) or a 3' untranslated region (3'-UTR);
because base mutations in the coding region are directly related to function, a score of 2, 5 or 10 when the SNP causes a synonymous mutation, a non-synonymous mutation or a large-effect mutation (e.g., a stop mutation) in the coding region, respectively;
MAF of the SNP site in mixed sequencing × 50;
(4) SNP sites were uniformly selected on the rice genome according to the comprehensive score.
Finally, 4850 SNP sites were selected from the genome sequencing data of 100 hybrid rice varieties used in production (FIG. 1c shows the distribution of the promoted hybrid rice specific SNP sites among the newly added 30K SNP sites; the ordinate numbers represent the 12 rice chromosomes in sequence, the abscissa is the physical position, the height of each vertical line represents the number of SNP sites, and the legend gives the correspondence between line height and number of SNP sites).
3. Wild rice-derived SNP markers:
(1) 2,472,942 SNP sites derived from 446 wild rice varieties were obtained from a rice SNP database (http://202.127.18.221/riceHap3/index.php);
(2) Sites with other SNPs or indels within 55 bp upstream or downstream were removed;
(3) SNP sites that can be detected in at least 10% of the varieties were selected;
(4) The 55 bp sequence upstream or downstream of each SNP site was aligned against the rice genome, and SNP sites whose sequence matched other positions of the genome with a matching degree of more than 70% were removed;
(5) Sites with a score greater than 0.6 in the Illumina scoring system were selected;
(6) The rice genome was divided into segments of 40 kb each, and the one SNP site with the highest score was selected from each segment.
Finally, 8316 SNP sites evenly distributed across the genome were selected from the 446 published wild rice varieties (FIG. 1b shows the distribution of the wild rice-derived SNP sites among the newly added 30K SNP sites; the ordinate numbers represent the 12 rice chromosomes in sequence, the abscissa is the physical position, the height of each vertical line represents the number of SNP sites, and the legend gives the correspondence between line height and number of SNP sites).
4. Functional gene region markers:
(1) 5,680,149 SNP sites in 879 functional gene regions (Xiao Jinghua et al., Progress and prospects of rice functional genomics research in China, Chinese Science Bulletin, 2015, 60, 1711-1722) of 590 rice varieties were obtained from a rice SNP database (http://ricevarmap.ncpgr.cn/), each of which can be detected in more than three varieties;
(2) Sites with other SNPs or indels within 55 bp upstream or downstream were removed;
(3) The 55 bp sequence upstream or downstream of each SNP site was aligned against the rice genome, and sites whose sequence matched other positions of the genome with a matching degree of more than 70% were removed;
(4) SNP sites within 5 kb upstream and downstream of the 879 cloned functional genes were selected;
(5) Sites with a score greater than 0.6 in the Illumina scoring system were selected;
(6) SNP sites were selected in those functional gene regions for which the Rice60K chip disclosed in WO2014/121419A1 has no more than 10 SNP sites.
Finally, 8316 large-effect SNP sites were selected from the 879 reported functional gene regions (FIG. 1a shows the distribution of the functional gene region SNP sites among the newly added 30K SNP sites; the ordinate numbers represent the 12 rice chromosomes in sequence, the abscissa is the physical position, the height of each vertical line represents the number of SNP sites, and the legend gives the correspondence between line height and number of SNP sites).
5. Functional gene region haplotype markers:
A total of 191 SNP markers in gene regions (Pi1, Pi2, Bph14, Bph15, Rf-1) associated with rice blast resistance genes, brown planthopper resistance genes, a fertility restorer gene and the like can distinguish different allelic types. The design method is as follows: rice materials containing a target gene and rice materials not containing the target gene are selected; based on the known position of the target gene in the genome and using the Nipponbare genome as the reference, primers are designed every 5-10 kb; the sequences within 250 kb upstream and downstream of the target gene are obtained by Sanger sequencing; and SNPs that differ between the two groups of materials are identified and used to design markers. In total, 191 SNP markers were obtained for the 5 gene regions (Pi1, Pi2, Bph14, Bph15 and Rf-1).
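As a consistency check, the counts reported for the five marker types in this example can be summed and compared with the SEQ ID ranges used throughout the application. The variable names below are illustrative; the numbers are those stated in the text.

```python
# Number of SNP markers reported for each of the five marker types.
newly_added = {
    "germplasm resource representative": 6108,
    "promoted hybrid rice specific": 4850,
    "wild rice derived": 8316,
    "functional gene region": 8316,
    "functional gene region haplotype": 191,
}
rice60k_markers = 58290  # markers detected by the earlier Rice60K chip

total_new = sum(newly_added.values())
total_90k = total_new + rice60k_markers
print(total_new, total_90k)  # 27781 86071
```

The two totals correspond to SEQ ID NO: 1-27781 (the newly added 30K) and SEQ ID NO: 1-86071 (90K).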
Example 2 construction of Rice60KAddon1 chips Using SNP marker combinations
The applicant combined all the SNP markers obtained in Example 1 with the 58,290 SNP markers detected by the rice whole genome breeding chip Rice60K disclosed in PCT international application WO2014/121419A1, and manufactured a Rice90K whole genome breeding chip using Illumina Infinium chip technology (FIG. 1f shows the distribution of the newly added 30K and Rice60K SNP sites; the ordinate numbers represent the 12 rice chromosomes in sequence, the abscissa is the physical position, the height of each vertical line represents the number of SNP sites, and the legend gives the correspondence between line height and number of SNP sites). The chip was named Rice60KAddon1. The markers detected by the chip comprise the 27781 SNP markers disclosed in this application and the 58,290 SNP markers detected by the rice whole genome breeding chip Rice60K disclosed in PCT international application WO2014/121419A1. The probe sequences of the chip were designed and selected within the 70 bp regions on both sides of each SNP marker according to the technical requirements of the Illumina Infinium chip. The SNP marker combination in the nucleotide sequences shown in SEQ ID NO: 1-27781 is abbreviated as the newly added 30K to distinguish it from the previously published SNP markers on the chip.
The rice whole genome breeding chips Rice6K and Rice60K (or RiceSNP50) developed by the applicant based on Illumina Infinium technology have been shown to be well suited to rice molecular breeding and functional genomics research (Yu et al., A whole-genome SNP array (RICE6K) for genomic breeding in rice. Plant Biotechnology Journal, 2014, 12).
Example 3 construction of Os90Kv1 chip Using SNP marker combination
The applicant submitted a total of 86,071 SNP markers, consisting of the 58,290 SNP markers detected by the Rice60K chip and the 27,781 newly added SNP markers, to Affymetrix (http://www.affymetrix.com/) for chip manufacture. To fit the Affymetrix Axiom chip platform, Affymetrix designed two probe sets based on the sequences on both sides of each marker; the final chip carried a total of 131,631 probe sets detecting a total of 86,014 SNP sites and was named Os90Kv1.
After the Os90Kv1 chip was produced, 192 rice samples, including 96 inbred line parents and 96 F1 hybrids, were tested on GeneTitan equipment (http://www.affymetrix.com/) according to the Affymetrix Axiom 2.0 chip detection process. After Affymetrix data analysis, 190 samples passed quality control (QC), with a call rate greater than 99%. The applicant further analyzed these data and screened high-quality SNP markers according to the following criteria: (1) of the two probe sets detecting the same SNP site, the one with the better genotyping performance is used; (2) among 89 inbred line parent varieties (of the 96 inbred line samples, only one was retained from each group of repeated tests of the same variety or of closely related samples), the total number of heterozygous genotype calls is less than or equal to 3; (3) the typing category is PolyHighResolution, MonoHighResolution or NoMinorHom (the typing category is provided by Affymetrix). Finally, a total of 60,938 high-quality probe sets were obtained, detecting 60,938 SNP sites.
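The three screening criteria above can be sketched as a filter over probe set records. This is a minimal illustration under stated assumptions: the record fields are hypothetical, and because the text does not define how "best genotyping effect" is measured, the call rate is used here as a stand-in.

```python
# Typing categories accepted by criterion (3).
ALLOWED_CATEGORIES = {"PolyHighResolution", "MonoHighResolution", "NoMinorHom"}


def high_quality_probesets(probesets, max_het=3):
    """Return {snp_id: best passing probe set record}.

    probesets: iterable of dicts with the illustrative keys
               'snp_id', 'probeset', 'category',
               'het_calls_in_inbreds' and 'call_rate'.
    """
    best = {}
    for ps in probesets:
        # Criteria (2) and (3): heterozygous-call limit and typing category.
        if ps["category"] not in ALLOWED_CATEGORIES:
            continue
        if ps["het_calls_in_inbreds"] > max_het:
            continue
        # Criterion (1): keep one probe set per SNP (highest call rate, an assumption).
        current = best.get(ps["snp_id"])
        if current is None or ps["call_rate"] > current["call_rate"]:
            best[ps["snp_id"]] = ps
    return best
```

Applying the pass/fail criteria before choosing between the two probe sets is a design choice of this sketch; it ensures that a SNP is kept whenever at least one of its probe sets passes.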
These high-quality SNP markers were used for background analysis of the stable line A08-1 (patent application No. CN201410532337.7, publication No. CN105567790A), in which a rice blast resistance gene was introduced into Kongyu 131. The analysis showed that, apart from the target fragment introduced on Chr6, the background had essentially reverted to that of Kongyu 131, as shown in FIG. 2b (the boxes numbered on the abscissa represent the 12 rice chromosomes in sequence, and the ordinate represents the physical position on the rice genome in megabases (Mb); the white background represents the genotype of the recipient material Kongyu 131, black lines represent the genotype of the donor material K22 or experimental error, and the line at the black dot on chromosome 6 is the target fragment). The same samples were tested on the 90K chip (Rice60KAddon1) based on the Illumina Infinium chip platform, and the background was completely clean, as shown in FIG. 2a (the boxes numbered on the abscissa represent the 12 rice chromosomes in sequence, and the ordinate represents the physical position on the rice genome in megabases (Mb); the white background represents genotypes consistent with the recipient material Kongyu 131, black lines represent genotypes consistent with the donor material K22, and the line at the black dot on chromosome 6 is the target fragment). In practice, the probability of crossovers occurring frequently within a small adjacent region is very low. Therefore, the black lines outside the target fragment in FIG. 2b are judged to be experimental error. That is, within the allowable error range (reliability > 99%), the Os90Kv1 chip based on the Affymetrix Axiom platform also gives a good typing result.
Example 4 functional Gene haplotype analysis and comparison of the newly added 30K and 60K
Many important agronomic trait related genes in rice are not single copy, for example, most of the rice blast resistance genes belong to NBS-LRR gene family. For such structurally complex genes, it is difficult to develop a single functional marker or design a linked marker on the gene, and the function of the gene can be detected by a haplotype marker in the gene region.
To verify the haplotype analysis performance of the breeding chip for such genes, the applicant analyzed the rice blast resistance gene cluster Pi2/Pi9/Pigm on rice chromosome 6. To identify whether the rice blast resistant materials R002, R005, R004 and R006 contain a rice blast resistance gene in this region, the following reference varieties were used: C101A51, reported to contain the Pi2 gene (Zhou et al., The eight amino-acid differences within three leucine-rich repeats between Pi2 and Piz-t resistance proteins determine the resistance specificity to Magnaporthe grisea. Mol Plant Microbe Interact. 2006, 19, 1216-1228); 75-1-127, containing the Pi9 gene (Qu et al., The broad-spectrum blast resistance gene Pi9 encodes a nucleotide-binding site-leucine-rich repeat protein and is a member of a multigene family in rice. Genetics. 2006, 172, 1901-1914); and Gumei 4 (GM4), containing the Pigm gene (Deng et al., Genetic characterization and fine mapping of the blast resistance locus Pigm(t) tightly linked to Pi2 and Pi9 in a broad-spectrum resistant Chinese variety. Theor Appl Genet. 2006, 113, 705-713). DNA was extracted from the 7 samples (the samples to be tested and the reference samples), and the whole genome genotypes of the 7 samples were obtained using the rice whole genome breeding chip Rice60KAddon1 according to the Illumina Infinium chip detection process.
The genotypes of the 60K SNP markers (the SNP markers detected by the rice whole genome breeding chip Rice60K disclosed in WO2014/121419A1) and of the newly added 30K SNP marker combination in the Pi2/Pi9/Pigm gene region (within 250 kb upstream and downstream) were extracted separately for cluster analysis; the results are shown in FIG. 3 (the ordinate represents the difference value between materials; the abscissa represents each tested material, and materials joined at the abscissa are classified as the same haplotype). The two clustering results for this region were identical: the haplotypes of R002, R005, R006 and C101A51 were the same, while the haplotype of R004 was the same as that of GM4. This result indicates that R002, R005 and R006 contain the Pi2 gene and R004 contains the Pigm gene. The target genes of these materials were verified by Sanger sequencing, and the sequencing results were consistent with the clustering results, showing that the SNP markers designed according to the haplotypes of functional gene regions perform their intended function. In addition, in the Rice60K clustering result the difference value between 75-1-127 and C101A51 was less than 0.2, whereas in the newly added 30K result it was greater than 0.2 and close to 0.3. The larger the value, the better the classification. Since these two materials have been shown to contain different resistance genes, the newly added 30K classifies materials in this functional gene region better than Rice60K.
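The text does not define how the difference value between two materials is computed. Purely as an assumption for illustration, one simple candidate is the fraction of markers in the region at which two materials carry different genotype calls, sketched below with hypothetical names.

```python
def genotype_difference(calls_a, calls_b):
    """Fraction of jointly called markers at which two materials differ.

    calls_a, calls_b: genotype calls for the same ordered markers,
                      e.g. "AA", "AB", "BB", with None for a missing call.
    """
    compared = 0
    differing = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue  # skip markers not called in both materials
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no markers called in both materials")
    return differing / compared
```

A matrix of such pairwise values could then be passed to a hierarchical clustering routine; whether this matches the measure plotted in FIG. 3 is not established by the text.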
Example 5 application of SNP marker combination and chip
1. Application in rice breeding
Chinese patent application CN201410532337.7 (publication No. CN105567790A) discloses a method for breeding plants containing a target genomic DNA fragment:
(1) Taking a recipient plant parent that lacks the target genomic DNA fragment as the recurrent parent, and carrying out hybridization, backcrossing and selfing with a donor plant parent that contains the target genomic DNA fragment;
(2) Performing foreground selection with a foreground selection marker during the breeding process;
(3) Performing whole genome background selection with a high-density marker detection method during the breeding process;
(4) Repeating the above steps until a target plant is obtained in which homologous recombination has occurred on both sides of the target genomic DNA fragment, the target genomic DNA fragment is homozygous, and the background is completely restored.
The "high-density marker detection method" in step (3) can be used for genotype detection using the SNP marker combinations described herein and the chips designed for these SNP markers.
2. Application in rice identity identification
The method for identifying the DNA identity of the rice disclosed in Chinese patent application CN201610009053.9 (publication No. CN 105550537A) obtains the standard gene fingerprint data of the rice by detecting the genotypes of a group of genetic diversity markers distributed in the whole genome of the rice, thereby identifying the DNA identity of the rice.
In the method, a group of genetic diversity markers distributed in the whole rice genome can be detected by utilizing the SNP marker combination and a chip designed aiming at the SNP markers.
3. Application in rice gene positioning and cloning
The rice whole genome breeding chip Rice6K developed by the applicant has been applied to the mapping of QTLs related to rice grain size and yield (Sun et al., Identification of quantitative trait loci for grain size and the contributions of major grain-size QTLs to grain weight in rice, Mol Breeding, DOI 10.1007/s11032-012-9802-z; Tan et al., QTL scanning for rice yield using a whole genome SNP array, Journal of Genetics and Genomics, 2013).
4. Applications in other directions
The SNP marker combination and the chip designed for these SNP markers contain the following five types of markers: germplasm resource representative markers, promoted hybrid rice specific markers, wild rice-derived markers, functional gene region markers and functional gene region haplotype markers. Accordingly, the SNP marker combination and the chip designed for these SNP markers can be applied to germplasm resource identification, hybrid rice identification, wild rice identification, functional gene identification and functional gene haplotype analysis.
Example 6 setting of minimum number of SNP markers for realizing detection function
As described in Example 3, Rice60KAddon1 could accurately identify the rice blast resistance fragment contained in A08-1. Rice60KAddon1 detected a total of 65071 high-quality sites in A08-1, among which 11 SNP markers within the target rice blast resistance fragment can distinguish A08-1 from the recipient parent Kongyu 131; they are shown in the following table, in which the genotype of the recipient parent Kongyu 131 is denoted A and the genotype of the donor parent K22 is denoted B.
TABLE 1 SNP markers in the target rice blast resistance fragment that differ between Kongyu 131 and A08-1
In practice, a call is generally considered reliable when AA or BB appears at 3 or more consecutive polymorphic sites; that is, the difference between the materials in the target segment can be determined by detecting 3 or more of the differential SNP markers in the above table. The 65071 high-quality sites were randomly sampled 100 times, and the number of the 11 differential SNP markers captured in each sample was counted. The results show that when the number of sampled sites is greater than 37582, the probability that fewer than 3 of the 11 differential SNP markers are captured is less than 0.05, which is regarded as a small-probability event in a normal distribution. That is, 37582 is the minimum number of SNP markers that realizes the detection function among the 86,014 SNP markers included in the Rice60KAddon1 chip.
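The sampling argument above can also be examined analytically. The sketch below computes the exact hypergeometric probability that fewer than 3 of the 11 differential markers fall in a random subset of the 65071 high-quality sites. It is offered as an alternative to the 100 random samplings described in the text, not as the applicant's procedure, and it is not expected to reproduce the 37582 threshold exactly; the function and parameter names are hypothetical.

```python
from math import comb


def prob_fewer_than(min_hits, n_sampled, n_total=65071, n_diff=11):
    """P(fewer than min_hits differential markers are in the sample).

    n_sampled sites are drawn without replacement from n_total sites,
    of which n_diff are differential markers (hypergeometric model).
    """
    if not 0 <= n_sampled <= n_total:
        raise ValueError("n_sampled must lie between 0 and n_total")
    denominator = comb(n_total, n_sampled)
    # k = number of differential markers captured; it cannot exceed
    # n_diff or n_sampled, and only k < min_hits contributes.
    upper = min(min_hits, n_diff + 1, n_sampled + 1)
    numerator = sum(
        comb(n_diff, k) * comb(n_total - n_diff, n_sampled - k)
        for k in range(upper)
    )
    return numerator / denominator
```

Under this model the probability decreases as more sites are sampled, so any sample size above a given threshold also satisfies the same probability bound.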
Although the present application has been described in detail with respect to the general description and specific embodiments, it will be apparent to those skilled in the art that certain modifications or improvements may be made based on the present application. Accordingly, such modifications and improvements are intended to be within the scope of this invention as claimed.
Claims (19)
1. The SNP marker combination for rice genotyping is characterized by comprising an SNP nucleotide fragment sequence shown as SEQ ID NO 1-27781.
2. The SNP marker set according to claim 1, further comprising a SNP nucleotide fragment sequence as set forth in SEQ ID No 27782-86071.
3. The rice chip is characterized by comprising detection sites designed aiming at SNP markers with nucleotide sequences respectively shown as SEQ ID NO 1-27781.
4. The chip of claim 3, wherein the chip further comprises detection sites designed for the SNP markers having the nucleotide sequences as set forth in SEQ ID NOS: 27782-86071, respectively.
5. The chip of claim 3 or 4, wherein the detection sites are a combination of probes designed for SNP markers.
6. The chip of claim 4, wherein said chip is fabricated using an in-situ synthesis method, an off-chip synthesis method, or a microbead method.
7. The chip of claim 4, wherein said chip is fabricated by in-situ photolithography synthesis, photoresist layer parallel synthesis, microfluidic channel-on-chip synthesis, light-guided in-situ synthesis, soft lithography in-situ synthesis, jet printing synthesis, molecular stamp-on-chip synthesis, maskless chip synthesis, the BeadArray method, or the suspension chip method.
8. The chip of claim 4, wherein the chip is fabricated by Illumina Infinium technology or Affymetrix Axiom technology.
9. Use of the SNP marker combination according to claim 1 or 2, or the chip according to any one of claims 3 to 8, for the detection of a biological sample.
10. The use of claim 9, wherein the detection is for breeding, identity identification, gene mapping and cloning, germplasm resources identification, hybrid rice identification, wild rice identification, functional gene identification, or functional gene haplotype analysis.
11. A method for detecting a biological sample, the method comprising detecting information on SNP markers having nucleotide sequences as set forth in SEQ ID NOS: 1-27781, respectively, in the biological sample.
12. The method of claim 11, further comprising detecting information on SNP markers having the nucleotide sequences respectively shown in SEQ ID NOS: 27782-86071 in the biological sample.
13. The method of claim 11 or 12, wherein the detection is performed using a gene chip.
14. The method according to claim 13, wherein the chip comprises detection sites designed for SNP markers having the nucleotide sequences as set forth in SEQ ID NOS: 1-27781, respectively.
15. The method according to claim 14, wherein the chip further comprises detection sites designed for SNP markers having the nucleotide sequences respectively shown in SEQ ID NOS: 27782-86071.
16. The method of claim 14 or 15, wherein the detection sites are a combination of probes designed for SNP markers.
17. The method of claim 13, wherein the chip is fabricated using an in-situ synthesis method, an off-chip synthesis method, or a microbead method.
18. The method of claim 13, wherein the chip is fabricated by in-situ photolithography synthesis, photoresist layer parallel synthesis, microfluidic channel-on-chip synthesis, light-guided in-situ synthesis, soft lithography in-situ synthesis, jet printing synthesis, molecular stamp-on-chip synthesis, maskless chip synthesis, the BeadArray method, or the suspension chip method.
19. The method of claim 13, wherein the chip is fabricated by Illumina Infinium technology or Affymetrix Axiom technology.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2016/109007 WO2018103037A1 (en) | 2016-12-08 | 2016-12-08 | Rice whole genome breeding chip and application thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110050092A (en) | 2019-07-23 |
CN110050092B (en) | 2023-01-03 |
Family
ID=62490633
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201680091357.2A Active CN110050092B (en) | 2016-12-08 | 2016-12-08 | Rice whole genome breeding chip and application thereof |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110050092B (en) |
WO (1) | WO2018103037A1 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020082314A1 (en) * | 2018-10-25 | 2020-04-30 | 武汉双绿源创芯科技研究院有限公司 | Oryza sativa green gene chip and application |
CN110257553B (en) * | 2019-08-05 | 2022-07-08 | 江苏省农业科学院 | A KASP molecular marker method for identifying rice blast resistance gene Pigm |
CN110408719B (en) * | 2019-08-05 | 2022-07-08 | 江苏省农业科学院 | Four-primer molecular marking method for identifying rice blast resistance gene Pigm |
CN111681709B (en) * | 2020-06-17 | 2023-04-28 | 深圳市早知道科技有限公司 | Method for designing gene locus on high-density gene chip |
CN112941216A (en) * | 2020-12-29 | 2021-06-11 | 武汉基诺赛克科技有限公司 | Development method and breeding application of 1K SNP-Panel of rice |
CN113308562B (en) * | 2021-05-24 | 2022-08-23 | 浙江大学 | Cotton whole genome 40K single nucleotide site and application thereof in cotton genotyping |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011008361A1 (en) * | 2009-06-30 | 2011-01-20 | Dow Agrosciences Llc | Application of machine learning methods for mining association rules in plant and animal data sets containing molecular genetic markers, followed by classification or prediction utilizing features created from these association rules |
CN102747138A (en) * | 2012-03-05 | 2012-10-24 | 中国种子集团有限公司 | Rice whole genome SNP chip and application thereof |
WO2014048062A1 (en) * | 2012-09-28 | 2014-04-03 | 未名兴旺系统作物设计前沿实验室(北京)有限公司 | Snp loci set and usage method and application thereof |
CN104789648A (en) * | 2014-12-25 | 2015-07-22 | 中国种子集团有限公司 | Molecular markers for haplotype identification of paddy rice CMS restoring gene Rf-1 segment and applications thereof |
CN105008599A (en) * | 2013-02-07 | 2015-10-28 | 中国种子集团有限公司 | Rice whole genome breeding chip and application thereof |
CN105550537A (en) * | 2016-01-07 | 2016-05-04 | 中国种子集团有限公司 | Method for identifying rice DNA identities and application thereof |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1962212A1 (en) * | 2007-01-17 | 2008-08-27 | Syngeta Participations AG | Process for selecting individuals and designing a breeding program |
CN105567790B (en) * | 2014-10-10 | 2018-12-21 | 中国种子集团有限公司 | The selection of the plant of DNA fragmentation containing target gene group |
CN104328507B (en) * | 2014-10-11 | 2016-03-30 | 中国水稻研究所 | A kind of SNP chip for rice variety identification, preparation method and application |
Also Published As
Publication number | Publication date |
---|---|
CN110050092A (en) | 2019-07-23 |
WO2018103037A1 (en) | 2018-06-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address | ||
Address after: Room 3610, 6th Floor, Building 3, Yabulun Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya City, Hainan Province, 572025 Patentee after: CHINA NATIONAL SEED GROUP Corp.,Ltd. Address before: 15 / F, Sinochem building, A2 Fuxingmenwai street, Xicheng District, Beijing 100045 Patentee before: CHINA NATIONAL SEED GROUP Corp.,Ltd. |