CN110050092B - Rice whole genome breeding chip and application thereof

Info

Publication number: CN110050092B
Application number: CN201680091357.2A
Authority: CN
Inventors: 周发松; 喻辉辉; 谢为博; 雷昉; 李菁; 张小波; 周莹; 程丹; 陆青; 邱树青; 韦懿; 陈�光; 张启发
Original assignee: China National Seed Group Co Ltd
Current assignee: China National Seed Group Co Ltd
Priority date: 2016-12-08
Filing date: 2016-12-08
Publication date: 2023-01-03
Anticipated expiration: 2036-12-08
Also published as: CN110050092A; WO2018103037A1

Abstract

The application relates to an SNP marker combination for rice genotyping and a method for designing it, a chip designed for these SNP markers, and applications thereof.

Description

Rice whole genome breeding chip and application thereof

Technical Field

The application relates to the fields of genomics, molecular biology, bioinformatics and molecular plant breeding, in particular to a rice whole genome breeding chip and application thereof.

Background

Genome breeding refers to the application of molecular biology techniques to breeding, so that breeding is carried out at the genome level. Its main advantages are as follows. First, plant seeds or seedlings can be characterized at the molecular level to judge whether they carry the expected superior traits, and selection can be made accordingly, which accelerates the breeding process and improves breeding accuracy. Second, molecular detection and analysis can be organized into a standard workflow, so that different technicians who strictly follow the workflow quickly obtain accurate results, which greatly reduces the influence of personal experience on plant breeding. Third, the marker technology used in genome breeding detects at the whole-genome level, which avoids segregation in progeny caused by material containing heterozygous sites and ensures the stability of the material. Marker technology is an important tool in genome breeding and has contributed greatly to functional genome research and genetic improvement of crops. Among markers, the SNP (Single Nucleotide Polymorphism) is increasingly widely used as a third-generation marker because of its wide distribution and high density on the genome and its high stability and accuracy. High-throughput SNP detection technologies mainly comprise sequencing-based platforms and chip-based platforms. Because of the controllability of marker sites, the convenience of operation and the reliability of results, the SNP chip has become an important tool in genome breeding. Currently, the most mature SNP chip detection technologies are two major platforms, the Illumina Infinium chip and the Affymetrix Axiom chip.

The Illumina Infinium chip technology is a high-density, microbead-based chip technology. Microbeads 3 μm in diameter self-assemble in microwells on a substrate of optical fiber bundles or planar silicon wafers. Each bead is covered with hundreds of thousands of copies of a particular oligonucleotide, which serves as the capture sequence for genotyping a sample in the assay. According to the number of oligonucleotide types, the chips are available in the following formats: 24-sample format (3,000-90,000 bead types), 12-sample format (90,001-250,000 bead types), or 4-sample format (250,001-1,000,000 bead types). The scanning system matched with the chip is equipped with advanced laser and optical elements, can process high-density multi-sample chips, generates high-quality data and runs at high speed. Owing to the advanced analysis technology, the average sample call rate is high and the reproducibility is as high as 99.9%. These high-quality data reduce the likelihood of false positives and false negatives, making genotyping results more accurate.

The Affymetrix Axiom chip adopts an in situ photolithography technology. The photomask design and strict process flow of this technology ensure that the manufactured chips have high quality, reproducibility and consistency, and allow an extremely high density of probe synthesis on the chip, with more than 4 million probes synthesized per square centimeter of substrate. The Affymetrix GeneTitan system is a fully automated, highly integrated chip workstation that uses chip plates in a format similar to a 96-well plate, where each square chip occupies approximately the area of one well of the 96-well plate, and one chip plate may contain 16, 24, or 96 chips, thereby enabling multi-sample high-throughput assays. The system integrates into one instrument the hybridization oven, fluidics station and CCD scanning imaging equipment used in the whole process from hybridization to scanning. After a chip plate is placed into the GeneTitan system, hybridization, washing and scanning of the chip require almost no manual intervention, and all operations are completed automatically by the machine.

The applicant disclosed a rice whole genome breeding chip, Rice60K, in PCT international application publication WO/2014/121419A1, and the chip has been successfully applied to rice genome breeding and functional genome research.

Disclosure of Invention

In one aspect, the present application provides a combination of SNP markers for rice genotyping, comprising the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781.

In some embodiments, the SNP marker combination of the present application further includes the SNP markers in the nucleotide sequences shown in SEQ ID NO: 27782-86071. In some embodiments, the SNP marker combination of the present application includes the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO: 1-86071.

In another aspect, the present application provides a rice chip comprising detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781.

In some embodiments, the rice chip of the present application further comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO: 27782-86071. In some embodiments, the rice chip of the present application comprises detection sites designed for the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO: 1-86071. In some embodiments, the detection sites in the rice chip of the present application are combinations of probes designed for the SNP markers.

In some embodiments, the rice chip of the present application is fabricated by on-chip in situ synthesis, off-chip synthesis, or a microbead method. In some embodiments, the rice chip of the present application is fabricated by in situ photolithographic synthesis, photoresist-layer parallel synthesis, microfluidic-channel on-chip synthesis, light-directed in situ synthesis, soft-lithography in situ synthesis, inkjet printing synthesis, molecular-stamp on-chip synthesis, maskless chip synthesis, the BeadArray method, or the suspension chip method. In some embodiments, the rice chip of the present application is made by Illumina Infinium technology or Affymetrix Axiom technology.

In another aspect, the present application provides the use of the above-mentioned SNP marker combination or chip in the detection of a biological sample. In certain embodiments, the assays are used for breeding, identity identification, gene mapping and cloning, germplasm resource identification, hybrid rice identification, wild rice identification, functional gene identification, or functional gene haplotype analysis.

In another aspect, the present application provides a method of detecting a biological sample, the method comprising detecting in the biological sample the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781. In some embodiments, the method further comprises detecting the SNP markers in the nucleotide sequences shown in SEQ ID NO: 27782-86071. In some embodiments, the method comprises detecting the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO: 1-86071. In some embodiments, the method uses a gene chip for the detection.

In another aspect, the present application provides a method for screening a combination of SNP markers representative of germplasm resources, comprising the steps of:

obtaining SNP sites from sequencing results of a plurality of rice varieties;

selecting sites having a score greater than 0.6 in the Illumina scoring system;

computing a composite score for each SNP site, the composite score being the simple sum of the following values:

0 points if the two alleles of the SNP site are A/T or C/G, and 20 points for any other allele difference;

when the SNP site is located in an intergenic region, an intron, a promoter, a 5' untranslated region (5'-UTR) or a 3' untranslated region (3'-UTR), scores of 1, 1.5, 2 and 2.5 are assigned according to position;

when the SNP causes a synonymous mutation, a non-synonymous mutation or a large-effect mutation in the coding region, scores of 2, 5 and 10 are assigned, respectively;

(MAF of the SNP site in the whole population × 25) + (MAF of the SNP site in the indica rice population × 25) + (MAF of the SNP site in the japonica rice population × 25) + (MAF of the SNP site in the mixed sequencing × 25);

uniformly selecting a plurality of SNP sites on the rice genome according to the composite score; and

dividing the whole rice genome into linkage disequilibrium blocks according to the LD value, selecting for each block the 2 sites with the highest composite score and at most 25 sites, and selecting at least 10 sites per 100 kb.
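The composite score described above is a plain sum, which can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the `Snp` record, its field names and the function names are hypothetical, and the positional score is passed in by the caller as a number instead of being looked up from the region name.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Coding-effect scores as stated in the text.
CODING_EFFECT_SCORE = {"synonymous": 2.0, "nonsynonymous": 5.0, "large_effect": 10.0}


@dataclass
class Snp:
    """Hypothetical record holding the inputs of the composite score."""
    alleles: Tuple[str, str]          # e.g. ("A", "G")
    region_score: float               # positional score, supplied by the caller
    coding_effect: Optional[str]      # key of CODING_EFFECT_SCORE, or None
    maf_all: float                    # MAF in the whole population
    maf_indica: float                 # MAF in the indica population
    maf_japonica: float               # MAF in the japonica population
    maf_pooled: float                 # MAF in the mixed sequencing


def allele_type_score(alleles):
    """Return 0 for A/T or C/G SNPs and 20 for every other allele pair."""
    pair = frozenset(alleles)
    return 0.0 if pair in (frozenset("AT"), frozenset("CG")) else 20.0


def composite_score(snp):
    """Simple sum of allele-type, positional, coding-effect and MAF terms."""
    maf_part = 25.0 * (snp.maf_all + snp.maf_indica + snp.maf_japonica + snp.maf_pooled)
    effect_part = CODING_EFFECT_SCORE.get(snp.coding_effect, 0.0)
    return allele_type_score(snp.alleles) + snp.region_score + effect_part + maf_part
```

For instance, an A/G SNP with a positional score of 1.5, a non-synonymous effect and MAF values of 0.5, 0.25, 0.25 and 0 would score 20 + 1.5 + 5 + 25 = 51.5 under this sketch.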

In another aspect, the present application provides a method for screening a combination of SNP markers specific to popularized hybrid rice, comprising the steps of:

carrying out whole genome sequencing on a plurality of hybrid rice varieties to obtain a plurality of SNP sites;

computing a composite score for each SNP site, the composite score being the simple sum of the following values:

0 points if the two alleles of the SNP site are A/T or C/G, and 20 points for any other allele difference;

when the SNP causes a synonymous mutation, a non-synonymous mutation or a large-effect mutation in the coding region, scores of 2, 5 and 10 are assigned, respectively;

MAF of the SNP site in the mixed sequencing × 50; and

uniformly selecting a plurality of SNP sites on the rice genome according to the composite score.

In another aspect, the present application provides a method for screening a combination of SNP markers derived from wild rice, comprising the steps of:

obtaining SNP sites of wild rice varieties from a rice SNP database;

removing sites that have other SNPs or indels within 55 bp upstream or downstream of the SNP site;

selecting SNP sites that can be detected in at least 10% of the varieties;

aligning the 55 bp sequence upstream or downstream of the SNP site against the rice genome, and removing SNP sites whose sequence matches other positions of the genome with a matching degree of more than 70%;

selecting sites with a score greater than 0.6 in the Illumina scoring system; and

dividing the rice genome into consecutive 40 kb segments and selecting, for each segment, the one SNP site with the highest score.
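The filtering and per-segment selection steps above can be sketched as follows. This is only an illustration under stated assumptions: the `Candidate` record and its field names are hypothetical, the flank check and off-target alignment are assumed to have been computed beforehand, and the "highest score" used to pick one site per 40 kb segment is assumed here to be the Illumina design score.

```python
from collections import namedtuple

# Hypothetical candidate record; all field names are illustrative.
Candidate = namedtuple(
    "Candidate",
    "chrom pos flank_clean detect_rate offtarget_identity design_score",
)

BIN_SIZE = 40_000  # 40 kb segments, as in the text


def passes_filters(c):
    """Apply the thresholds named in the text to one candidate SNP."""
    return (
        c.flank_clean                      # no other SNP/indel within 55 bp
        and c.detect_rate >= 0.10          # detected in at least 10% of varieties
        and c.offtarget_identity <= 0.70   # no >70% match elsewhere in the genome
        and c.design_score > 0.6           # Illumina score greater than 0.6
    )


def pick_one_per_bin(candidates, bin_size=BIN_SIZE):
    """Keep the highest-scoring passing SNP in each bin of each chromosome."""
    best = {}
    for c in candidates:
        if not passes_filters(c):
            continue
        key = (c.chrom, (c.pos - 1) // bin_size)  # positions are 1-based
        if key not in best or c.design_score > best[key].design_score:
            best[key] = c
    return sorted(best.values(), key=lambda c: (c.chrom, c.pos))
```

Keeping a single dictionary keyed by (chromosome, bin index) makes one pass over the candidates sufficient, which matters when the input holds millions of sites.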

In another aspect, the present application provides a method of screening for a combination of functional gene region markers comprising the steps of:

obtaining a plurality of SNP sites from a rice SNP database, wherein the plurality of SNP sites are located within nucleotide sequences of a plurality of functional genes of a plurality of rice varieties and are detectable in more than three varieties;

removing SNP markers located outside 5kb upstream and downstream of the functional gene;

and selecting SNP sites in those functional gene regions in which the Rice60K chip disclosed in WO/2014/121419A1 already contains no more than 10 SNP sites.

Drawings

FIG. 1 shows the distribution of SNP sites on the rice genome. The numbers on the ordinate represent the 12 rice chromosomes in order, and the abscissa is the physical position; the height of each vertical line indicates the number of SNP sites; the legend gives the correspondence between line height and number of SNP sites. 1a is the distribution of functional gene region SNP sites among the newly added 30K SNP sites; 1b is the distribution of wild rice-derived SNP sites among the newly added 30K SNP sites; 1c is the distribution of SNP sites specific to popularized hybrid rice among the newly added 30K SNP sites; 1d is the distribution of germplasm resource representative SNP sites among the newly added 30K SNP sites; 1e is the distribution of the newly added 30K SNP sites; and 1f is the distribution of the newly added 30K and Rice60K SNP sites.

FIG. 2 shows the genetic background of the rice blast resistance-improved material A08-1 as detected with the 90K chips. 2a is the detection result of Rice60KAddon1; and 2b is the detection result of Os90Kv1. The boxes labeled by the numbers on the abscissa represent the 12 rice chromosomes in order, and the numbers on the ordinate are physical positions on the rice genome [in megabases (Mb)]; a white background indicates the genotype of the recipient material Kongyu 131 (K131), black lines indicate the genotype of the donor material K22, and the line at the black dot on chromosome 6 is the target fragment.

FIG. 3 shows the results of haplotype clustering analysis of the region of the rice blast resistance genes Pi2/Pi9/Pigm. 3a is the clustering result obtained with the newly added 30K SNP marker combination; and 3b is the clustering result obtained with the Rice60K chip. The ordinate represents the dissimilarity between materials; the tested materials are arranged along the horizontal direction, and materials joined by a horizontal line belong to the same haplotype.

Detailed Description

The term "single nucleotide polymorphism", "SNP marker" or "SNP site" as used herein refers to a variation in a single nucleotide (A, T, C or G) at a position in the genomic sequence of a chromosome. Such variations give rise to diversity in the chromosomal genome and thereby allow different alleles (e.g., alleles from two different individuals) or different individuals to be distinguished from each other. The variation may occur in coding or non-coding regions of a gene (e.g., at or near the promoter region, or in introns) or in intergenic regions.

The term "allele" as used herein refers to a different form of the same gene present in a given locus on a homologous chromosome.

The term "linkage disequilibrium" as used herein refers to a non-random association of alleles at two or more sites, which may be on the same chromosome or on different chromosomes. Linkage disequilibrium is also referred to as gametic phase disequilibrium or gametic disequilibrium. In other words, linkage disequilibrium means that a combination of alleles or genetic markers occurs in a population more or less frequently than would be predicted from the frequencies of the individual alleles under random combination. Linkage refers to limited recombination between two or more loci on a chromosome, and linkage disequilibrium is not equivalent to linkage. The amount of linkage disequilibrium depends on the difference between the observed and expected frequencies. A population in which the frequencies of allele combinations or genotypes equal the expected frequencies is said to be in linkage equilibrium. The degree of linkage disequilibrium depends on a variety of factors, including genetic linkage, selection, recombination rate, genetic drift, assortative mating, and population structure.

The term "linkage disequilibrium block" as used herein refers to a haplotype block into which genome-wide SNP markers are partitioned according to differences in linkage disequilibrium, using the LD value D' as the criterion. A haplotype is a set of associated single nucleotide polymorphisms located in a particular region of a chromosome that tend to be inherited together by progeny.
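As an illustration of the LD value D' mentioned above, the standard textbook definition for two biallelic sites can be sketched as follows. The function name and argument names are illustrative, and the text does not state how D' was computed or which threshold was used to delimit blocks.

```python
def ld_d_prime(p_ab, p_a, p_b):
    """Return (D, |D'|) for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A at site 1 and B at site 2
    p_a:  frequency of allele A at site 1
    p_b:  frequency of allele B at site 2
    """
    # D is the gap between observed and expected haplotype frequency.
    d = p_ab - p_a * p_b
    # D_max is the largest |D| possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = 0.0 if d_max == 0 else abs(d) / d_max
    return d, d_prime
```

With allele frequencies of 0.5 at both sites, a haplotype frequency of 0.25 corresponds to linkage equilibrium (D = 0), whereas a haplotype frequency of 0.5 gives D = 0.25 and D' = 1, i.e., complete disequilibrium.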

MAF is the minor allele frequency, i.e., the frequency of the less common allele at a site in a given population. A higher value indicates a greater likelihood that the site is polymorphic between any two varieties.
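A minimal sketch of how MAF can be computed from genotype calls, and of why a higher MAF makes a site more likely to differ between two varieties, is given below. The genotype encoding (two-letter strings such as "AG", with "N" for a missing allele) and both function names are assumptions made for illustration; the second function further assumes two homozygous varieties drawn at random from the population.

```python
from collections import Counter


def minor_allele_frequency(genotypes, missing="N"):
    """MAF of a biallelic site from diploid calls such as 'AA', 'AG', 'GG'."""
    counts = Counter(a for g in genotypes for a in g if a != missing)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no data, or monomorphic site
    # The second most common allele is the minor allele of a biallelic site.
    return counts.most_common(2)[1][1] / total


def prob_two_lines_differ(maf):
    """Chance that two random homozygous lines carry different alleles (2pq)."""
    return 2.0 * maf * (1.0 - maf)
```

Under these assumptions the probability of a difference rises from 0 at MAF = 0 to its maximum of 0.5 at MAF = 0.5, which is why the MAF terms reward sites with balanced allele frequencies.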

The term "Indel" as used herein refers to insertions or deletions, which specifically refer to differences in the entire genome, with a certain number of nucleotide insertions or deletions in the genome of an individual relative to a standard control (Jander et al, 2002).

The term "SNP chip" as used herein refers to a biological microchip capable of analyzing the presence of SNPs contained in sample DNA by arranging and attaching several hundred to several hundred thousand biomolecules as probes, such as DNA, DNA fragments, cDNA, oligonucleotides, RNA or RNA fragments having known sequences, which are fixed at intervals on a small solid substrate formed of glass, silicon or nylon. Depending on the degree of complementarity, hybridization occurs between the nucleic acids contained in the sample and the probes immobilized on the surface. By detecting and judging the hybridization, information on the substance contained in the sample can be obtained at the same time.

The main types of DNA chips currently available are as follows. In situ (on-chip) synthesis uses modified oligonucleotide monomers to build up spatially addressed probe sequences step by step in situ, thereby directly synthesizing an oligonucleotide probe array on a hard surface. Off-chip synthesis deposits pre-synthesized probe sequences at specific sites by spotting, thereby forming a DNA probe array immobilized on a glass substrate. The microbead method either synthesizes DNA probes directly on coded beads or fixes pre-prepared probe sequences on the coded beads, which are then randomly assembled to form a bead chip.

In one aspect, the present application provides a combination of SNP markers for rice genotyping, comprising the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781. Each nucleotide sequence shown in SEQ ID NO: 1-27781 consists of an SNP site and the 70 bp upstream and downstream of it; in actual design, the probe can be designed from either the upstream or the downstream side.

In certain embodiments, the SNP marker combination further includes the SNP markers in the nucleotide sequences shown in SEQ ID NO: 27782-86071. These are the 58,290 SNP markers detected by the rice whole genome breeding chip Rice60K disclosed in PCT international application WO2014/121419A1; the sequences comprise the SNP markers and their single-side flanking sequences and can be used for chip design.

Herein, the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-86071 are collectively referred to as 90K; the SNP markers disclosed for the first time in this application (i.e., the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781) are referred to as the newly added 30K; and the SNP markers in the nucleotide sequences shown in SEQ ID NO: 27782-86071 are referred to as 60K.

In certain embodiments, the chip further comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO: 27782-86071, i.e., the chip comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-86071. In certain embodiments, the chip comprises detection sites designed for the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO: 1-86071. In certain embodiments, the detection sites are combinations of probes designed for the SNP markers.

In certain embodiments, the chip is fabricated by on-chip in situ synthesis, off-chip synthesis, or a microbead method. In certain embodiments, the chip is fabricated by in situ photolithographic synthesis, photoresist-layer parallel synthesis, microfluidic-channel on-chip synthesis, light-directed in situ synthesis, soft-lithography in situ synthesis, inkjet printing synthesis, molecular-stamp on-chip synthesis, maskless chip synthesis, the BeadArray method, or the suspension chip method. In certain embodiments, the chip is fabricated by Illumina Infinium technology or Affymetrix Axiom technology.

In another aspect, the present application provides a method of detecting a biological sample, the method comprising detecting in the biological sample the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781.

In certain embodiments, the method further comprises detecting the SNP markers in the nucleotide sequences shown in SEQ ID NO: 27782-86071. In certain embodiments, the method comprises detecting the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO: 1-86071.

In certain embodiments, the detection is performed using a gene chip. In certain embodiments, the chip comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781.

In certain embodiments, the chip further comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO: 27782-86071. In certain embodiments, the chip comprises detection sites designed for the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO: 1-86071. In certain embodiments, the detection sites are combinations of probes designed for the SNP markers.

In certain embodiments, the chip is fabricated by on-chip in situ synthesis, off-chip synthesis, or a microbead method. In certain embodiments, the chip is fabricated by in situ photolithographic synthesis, photoresist-layer parallel synthesis, microfluidic-channel on-chip synthesis, light-directed in situ synthesis, soft-lithography in situ synthesis, inkjet printing synthesis, molecular-stamp on-chip synthesis, maskless chip synthesis, the BeadArray method, or the suspension chip method. In certain embodiments, the chip is fabricated by Illumina Infinium technology or Affymetrix Axiom technology.

Examples

Example 1 SNP marker selection method

The SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781 consist of five types of markers, and the corresponding SNP sites were obtained by screening according to the following methods.

1. Representative SNP site of germplasm resources:

(1) 6,428,770 SNP sites were obtained from the sequencing of 1491 rice varieties (from the RiceVarMap database, see http://ricevarmap.ncpgr.cn/);

(2) Selecting a locus with a score greater than 0.6 in the Illumina scoring system;

(3) Computing a composite score for each SNP site, which is the simple sum of the following values:

because different regions of a gene structure affect gene function to different degrees, when the SNP site is located in an intergenic region, an intron, a promoter, a 5' untranslated region (5'-UTR) or a 3' untranslated region (3'-UTR), scores of 1, 1.5, 2 and 2.5 are assigned according to position;

because base mutations in the coding region are directly related to function, when the SNP causes a synonymous mutation, a non-synonymous mutation or a large-effect mutation (e.g., a termination mutation) in the coding region, scores of 2, 5 and 10 are assigned, respectively;

(4) Uniformly selecting 4850 SNP sites on the rice genome according to the composite score;

(5) Dividing the whole rice genome into linkage disequilibrium blocks according to the LD value. The general principle of site selection is that the SNP sites are representative and uniformly distributed: the 2 sites with the highest composite score are selected from each block, and at least 10 sites are selected per 100 kb. When there are fewer than 5 blocks within 100 kb, i.e., fewer than 10 sites would be selected per 100 kb, 3 or more SNP sites are selected from some blocks, with a maximum of 25 sites per block.
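One possible reading of the block-based selection rule in step (5) is sketched below for a single 100 kb window. It is an interpretation, not the applicant's procedure: the function name, the tuple layout and the way extra sites are drawn (highest remaining composite score first, regardless of block) are assumptions, and blocks that straddle window boundaries are not handled.

```python
from collections import defaultdict


def select_in_window(snps, per_block=2, min_per_window=10, max_per_block=25):
    """Select SNPs inside one 100 kb window.

    snps: list of (snp_id, block_id, composite_score) tuples.
    """
    by_block = defaultdict(list)
    for snp in snps:
        by_block[snp[1]].append(snp)

    chosen, reserve = [], []
    for members in by_block.values():
        members.sort(key=lambda s: s[2], reverse=True)
        chosen.extend(members[:per_block])                 # top 2 per block
        reserve.extend(members[per_block:max_per_block])   # cap of 25 per block

    # Top up from the best remaining sites if the window is still short.
    reserve.sort(key=lambda s: s[2], reverse=True)
    shortfall = max(0, min_per_window - len(chosen))
    chosen.extend(reserve[:shortfall])
    return chosen
```

With five or more blocks in the window the top-2 rule alone already yields at least 10 sites, so the top-up step only acts in windows with few blocks, matching the case the text singles out.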

Finally, based on the LD selection and combining the sequencing results of the whole rice population, the indica and japonica subspecies, and the mixed hybrid rice, 6108 SNP sites were selected (see FIG. 1d, the distribution of germplasm resource representative SNP sites among the newly added 30K SNP sites; the numbers on the ordinate represent the 12 rice chromosomes in order, the abscissa is the physical position, the height of each vertical line represents the number of SNP sites, and the legend gives the correspondence between line height and number of SNP sites).

2. SNP markers specific to popularized hybrid rice:

(1) Hybrid rice purchased on the market was pooled and subjected to whole genome sequencing, giving 2,207,700 SNP sites, of which 13.8% were not detected in the sequencing data of the 1491 varieties (RiceVarMap database, see http://ricevarmap.ncpgr.cn/), indicating that it is necessary to add markers specific to popularized hybrid rice;

(3) Computing a composite score for each SNP site, which is the simple sum of the following values:

because base mutations in the coding region are directly related to function, when the SNP causes a synonymous mutation, a non-synonymous mutation or a large-effect mutation (e.g., a termination mutation) in the coding region, scores of 2, 5 and 10 are assigned, respectively;

MAF of the SNP site in the mixed sequencing × 50;

(4) Uniformly selecting SNP sites on the rice genome according to the composite score.

Finally, 4850 SNP sites were selected from the genome sequencing data of 100 hybrid rice varieties used in production (see FIG. 1c, the distribution of SNP sites specific to popularized hybrid rice among the newly added 30K SNP sites; the numbers on the ordinate represent the 12 rice chromosomes in order, the abscissa is the physical position, the height of each vertical line represents the number of SNP sites, and the legend gives the correspondence between line height and number of SNP sites).

3. Wild rice-derived SNP markers:

(1) 2,472,942 SNP sites derived from 446 wild rice varieties were obtained from a rice SNP database (http://202.127.18.221/riceHap3/index.php);

(2) Removing sites with other SNPs or indels in the upstream and downstream 55 bp;

(3) Selecting SNP loci which can be detected in at least 10% of varieties;

(4) Comparing the upstream or downstream 55bp sequence of the SNP locus with the rice genome, and removing the SNP locus with the matching degree of more than 70 percent with other positions of the genome;

(5) Selecting a locus having a score greater than 0.6 in the Illumina scoring system;

(6) The rice genome is divided into segments according to the positions of every 40kb, and each segment selects one SNP site with the highest score.

Finally, 8316 SNP sites evenly distributed on the genome were selected from the 446 published wild rice varieties (see FIG. 1b, the distribution of wild rice-derived SNP sites among the newly added 30K SNP sites; the numbers on the ordinate represent the 12 rice chromosomes in order, the abscissa is the physical position, the height of each vertical line represents the number of SNP sites, and the legend gives the correspondence between line height and number of SNP sites).

4. Functional gene region markers:

(1) Obtaining, from a rice SNP database of 590 rice varieties (http://ricevarmap.ncpgr.cn/), 5,680,149 SNP sites in 879 functional gene regions (Xiao Jinghua et al., Progress and prospects of rice functional genome research in China, Chinese Science Bulletin 2015, 60, 1711-1722), the SNP sites being detectable in more than three varieties;

(3) Comparing the upstream or downstream 55bp sequence of the SNP locus with the rice genome, and removing the locus with the matching degree of more than 70 percent with other positions of the genome;

(4) Selecting SNP sites within 5kb upstream and downstream of 879 cloned functional genes;

(5) Selecting a locus with a score greater than 0.6 in the Illumina scoring system;

(6) Selecting SNP sites in those functional gene regions in which the Rice60K chip disclosed in WO/2014/121419A1 contains no more than 10 SNP sites.

Finally, 8316 large-effect SNP sites were selected from the 879 reported functional gene regions (see FIG. 1a, the distribution of functional gene region SNP sites among the newly added 30K SNP sites; the numbers on the ordinate represent the 12 rice chromosomes in order, the abscissa is the physical position, the height of each vertical line represents the number of SNP sites, and the legend gives the correspondence between line height and number of SNP sites).

5. Functional gene region haplotype markers:

These are 191 SNP markers in the regions of genes such as rice blast resistance genes, brown planthopper resistance genes and a fertility restorer gene (Pi1, Pi2, Bph14, Bph15, Rf-1), which can distinguish different allelic types. The design method is as follows: rice materials containing a target gene and rice materials not containing it are selected; based on the known position of the target gene in the genome and using the Nipponbare genome as the reference, primers are designed every 5-10 kb; the sequences in the region 250 kb upstream and downstream of the target gene are obtained by Sanger sequencing; SNPs that differ between the two groups of materials are identified and used to design markers. In total, 191 SNP markers in 5 gene regions (Pi1, Pi2, Bph14, Bph15 and Rf-1) were obtained.

Example 2 construction of Rice60KAddon1 chips Using SNP marker combinations

The applicant combines all the SNP markers obtained in example 1 with 58,290 SNP markers detected by Rice whole genome breeding chip Rice60K disclosed in PCT international application WO/2014/121419A1, and manufactures a Rice90K whole genome breeding chip by using the Illumina infinium chip technology (as shown in the distribution of newly added 30K and Rice60K SNP sites in FIG. 1f, the ordinate numbers sequentially show 12 chromosomes of Rice, the abscissa is a physical position, the vertical line heights show the number of SNP sites, and the legend shows the corresponding relationship between the vertical line heights and the number of SNP sites), which is named as Rice60KAddon1. The markers detected by the chip comprise 27781 SNP markers disclosed by the application and 58,290 SNP markers detected by Rice whole genome breeding chip Rice90K disclosed in PCT international application WO/2014/121419A 1. The sequence distribution of the chip probe is designed and selected in the 70bp area on both sides of the SNP marker according to the technical requirement of the Illumina infinium chip. SEQ ID NO: the SNP marker combination in the nucleotide sequence shown in 1-27781 is abbreviated as newly added 30K to be distinguished from the published SNP markers in the chip.

The rice whole genome breeding chips Rice6K and Rice60K (or RiceSNP50), developed by the applicant based on the Illumina Infinium technology, have been shown to be well suited to rice molecular breeding and functional genome research (Yu et al., A whole-genome SNP array (RICE6K) for genomic breeding in rice. Plant Biotechnol J. 2014, 12).

Example 3 Construction of the Os90Kv1 chip using the SNP marker combination

The applicant submitted a total of 86,071 SNP markers, namely the 58,290 SNP markers detected by the Rice60K chip and the 27,781 newly added SNP markers, to Affymetrix (http://www.affymetrix.com/) for chip manufacture. To fit the Affymetrix Axiom chip platform, Affymetrix designed two probe sets based on the sequences on both sides of each marker. The final chip carries 131,631 probe sets detecting a total of 86,014 SNP sites and is named Os90Kv1.

After the Os90Kv1 chip was produced, 192 rice samples, comprising 96 inbred line parents and 96 F₁ hybrids, were tested on GeneTitan equipment (http://www.affymetrix.com/) according to the Affymetrix Axiom 2.0 chip detection process. After analysis by the Affymetrix data analyst, 190 samples passed quality control (QC) with a detection rate of more than 99%. The applicant further analyzed these data and screened high-quality SNP markers according to the following criteria: (1) of the two probe sets detecting the same SNP site, the one with the better genotyping effect is used; (2) among 89 inbred line parent varieties (of the 96 inbred line samples, only one sample is kept from each group of repeated detections of the same variety or of closely related varieties), the total number of heterozygous genotypes is less than or equal to 3; (3) the typing category is PolyHighResolution, MonoHighResolution or NoMinorHom (typing categories are provided by Affymetrix). Finally, a total of 60,938 high-quality probe sets were obtained, detecting 60,938 SNP sites.
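The three screening criteria can be sketched as a simple filter. This is a hedged illustration only: the record layout, the probe set identifiers and the use of call rate as the measure of "best genotyping effect" are assumptions made for the example and are not taken from the application or from Affymetrix software.

```python
from collections import defaultdict

GOOD_CATEGORIES = ("PolyHighResolution", "MonoHighResolution", "NoMinorHom")


def select_high_quality(probe_sets, max_het=3):
    """Map each SNP to the single probe set that passes the screening.

    probe_sets: list of dicts with the hypothetical keys 'probe_set', 'snp',
    'category', 'call_rate' and 'het_calls' (number of heterozygous calls
    among the inbred line parents).
    """
    candidates = defaultdict(list)
    for ps in probe_sets:
        # Criteria (2) and (3): few heterozygous calls and an accepted category.
        if ps["het_calls"] <= max_het and ps["category"] in GOOD_CATEGORIES:
            candidates[ps["snp"]].append(ps)
    # Criterion (1): one probe set per SNP; "best" is ASSUMED to mean highest call rate.
    return {snp: max(group, key=lambda ps: ps["call_rate"])["probe_set"]
            for snp, group in candidates.items()}


example = [
    {"probe_set": "PS-1", "snp": "S1", "category": "PolyHighResolution", "call_rate": 0.995, "het_calls": 1},
    {"probe_set": "PS-2", "snp": "S1", "category": "PolyHighResolution", "call_rate": 0.999, "het_calls": 0},
    {"probe_set": "PS-3", "snp": "S2", "category": "Other", "call_rate": 0.999, "het_calls": 0},
    {"probe_set": "PS-4", "snp": "S3", "category": "NoMinorHom", "call_rate": 0.990, "het_calls": 5},
]
print(select_high_quality(example))
```

In this sketch the category and heterozygosity filters are applied before the best probe set is chosen, so that a SNP is not lost when only one of its two probe sets passes. The application does not state the order in which the criteria were applied.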

These high-quality SNP markers were used for background analysis of the stable line A08-1 (patent application No. CN201410532337.7, publication No. CN105567790A), in which a rice blast resistance gene was introduced into Kongyu 131. The analysis showed that, apart from the target fragment introduced on Chr6, the background had essentially returned to that of Kongyu 131, as shown in FIG. 2b (the boxes numbered on the abscissa indicate the 12 rice chromosomes in sequence, and the ordinate indicates the physical position on the rice genome in megabases (Mb); the white background indicates genotypes consistent with the recipient material Kongyu 131, black lines indicate genotypes consistent with the donor material K22 or experimental errors, and the line at the black dot on chromosome 6 is the target fragment). The same sample was tested on the 90K chip (Rice60KAddon1) based on the Illumina Infinium chip platform, and the background was completely clean, as shown in FIG. 2a (axes as in FIG. 2b; the white background indicates genotypes consistent with the recipient material Kongyu 131, black lines indicate genotypes consistent with the donor material K22, and the line at the black dot on chromosome 6 is the target fragment). In practice, the probability of crossovers occurring repeatedly within a small neighboring region is very low. Therefore, the black lines outside the target fragment in FIG. 2b are judged to be experimental errors. That is, within the allowable error range (reliability > 99%), the Os90Kv1 chip based on the Affymetrix Axiom platform also gives a good typing effect.
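The reasoning that isolated donor-type calls are more likely errors than true introgressions can be sketched as a run-length rule. This is a minimal illustration under stated assumptions: genotype calls are coded 'A' (recipient type) and 'B' (donor type) and ordered by physical position on one chromosome, and the threshold of 3 consecutive markers is borrowed from Example 6 rather than stated for this analysis. The function name is hypothetical.

```python
from itertools import groupby


def classify_donor_runs(calls, min_run=3):
    """Split donor-type runs into plausible segments and suspect calls.

    calls: genotype codes along one chromosome, 'A' = recipient, 'B' = donor.
    Returns (segments, suspect), each a list of inclusive (start, end) indices.
    """
    segments, suspect = [], []
    index = 0
    for code, group in groupby(calls):
        length = len(list(group))
        if code == "B":
            span = (index, index + length - 1)
            # Long donor runs look like real introgressed segments;
            # short ones would require crossovers very close together.
            (segments if length >= min_run else suspect).append(span)
        index += length
    return segments, suspect


calls = list("AAABAAAABBBBAAAB")
print(classify_donor_runs(calls))
```

For this toy input the run at indices 8 to 11 is kept as a donor segment, while the single donor calls at indices 3 and 15 are flagged as suspect.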

Example 4 Functional gene haplotype analysis and comparison of the newly added 30K and 60K

Many important agronomic trait related genes in rice are not single copy, for example, most of the rice blast resistance genes belong to NBS-LRR gene family. For such structurally complex genes, it is difficult to develop a single functional marker or design a linked marker on the gene, and the function of the gene can be detected by a haplotype marker in the gene region.

To verify the haplotype effect of the breeding chip on such genes, the applicant analyzed the rice blast resistance gene cluster Pi2/Pi9/Pigm on rice chromosome 6. To identify whether the rice blast resistant materials R002, R005, R004 and R006 contain a rice blast resistance gene in this region, three reference materials were included: the reported material containing the Pi2 gene is C101A51 (Zhou et al., The eight amino-acid differences within three leucine-rich repeats between Pi2 and Piz-t resistance proteins determine the resistance specificity to Magnaporthe grisea. Mol Plant Microbe Interact. 2006, 19: 1216-1228); the reference variety containing the Pi9 gene is 75-1-127 (Qu et al., The broad-spectrum blast resistance gene Pi9 encodes a nucleotide-binding site-leucine-rich repeat protein and is a member of a multigene family in rice. Genetics. 2006, 172: 1901-1914); and the reference variety containing the Pigm gene is Gumei 4 (GM4) (Deng et al., Genetic characterization and fine mapping of the blast resistance locus Pigm(t) tightly linked to Pi2 and Pi9 in a broad-spectrum resistant Chinese variety. Theor Appl Genet. 2006, 113: 705-713). DNA was extracted from the 7 samples (4 materials to be tested and 3 reference materials), and their whole-genome genotypes were obtained with the rice whole genome breeding chip Rice60KAddon1 according to the Illumina Infinium chip detection process.

The results of the 60K SNP markers (those detected by the rice whole genome breeding chip Rice60K disclosed in WO2014/121419A1) and of the newly added 30K SNP marker combination in the Pi2/Pi9/Pigm gene region (within 250 kb upstream and downstream) were extracted separately for cluster analysis; the results are shown in FIG. 3 (the ordinate represents the difference value between materials; the abscissa shows each tested material, and materials joined at the abscissa belong to the same haplotype). The two clustering results in this region were identical: the haplotypes of R002, R005, R006 and C101A51 were the same, and the haplotype of R004 was the same as that of GM4. This indicates that R002, R005 and R006 contain the Pi2 gene and R004 contains the Pigm gene. The target genes of these materials were verified by Sanger sequencing, and the sequencing results were consistent with the clustering results, showing that the SNP markers designed for haplotypes of functional gene regions perform their intended function. In addition, in the Rice60K clustering result the difference value between 75-1-127 and C101A51 is less than 0.2, whereas with the newly added 30K it is greater than 0.2 and close to 0.3. The larger the value, the better the classification. Since the two materials have been shown to contain different resistance genes, the newly added 30K classifies better than Rice60K in this functional gene region.
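The notion of a difference value between materials can be illustrated with a simple pairwise measure. The application does not define how the difference value in FIG. 3 was computed, so the definition below (fraction of differing genotype calls among markers called in both samples) is an assumption, and the material names and genotype strings are hypothetical toy data.

```python
def difference(g1, g2, missing="N"):
    """Fraction of markers called in both samples whose genotype codes differ."""
    pairs = [(a, b) for a, b in zip(g1, g2) if a != missing and b != missing]
    if not pairs:
        raise ValueError("no marker is called in both samples")
    return sum(a != b for a, b in pairs) / len(pairs)


def nearest_reference(sample, references):
    """Return (reference_name, difference) for the closest reference material."""
    scored = ((name, difference(sample, ref)) for name, ref in references.items())
    return min(scored, key=lambda item: item[1])


# One genotype code per marker in the gene region (toy data, not measured values).
references = {"ref_a": "AGTCAGTCAG", "ref_b": "AGTCAGTTGA", "ref_c": "TCAGTCAGTC"}
print(nearest_reference("AGTCAGTCAG", references))
print(difference(references["ref_a"], references["ref_b"]))
```

A test material whose difference from a reference is zero shares its haplotype in the region, while a larger difference between two references, as between ref_a and ref_b here, means that the marker set separates them more clearly.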

Example 5 Application of the SNP marker combination and chip

1. Application in rice breeding

Chinese patent application CN201410532337.7 (publication No. CN105567790A) discloses a method for breeding plants containing a target genomic DNA fragment:

(1) Taking a receptor plant parent without the target genome DNA segment as a recurrent parent, and carrying out hybridization, backcross and selfing with a donor plant parent containing the target genome DNA segment;

(2) Performing foreground selection by using a foreground selection marker in a breeding process;

(3) Carrying out whole genome background selection by using a high-density marker detection method in a breeding process;

(4) Carrying out the above steps until a target plant is obtained in which the target genomic DNA fragment is homozygous, homologous recombination has occurred on both sides of the fragment, and the background is completely restored.

The "high-density marker detection method" in step (3) can be used for genotype detection using the SNP marker combinations described herein and the chips designed for these SNP markers.

2. Application in rice identity identification

The method for identifying the DNA identity of the rice disclosed in Chinese patent application CN201610009053.9 (publication No. CN 105550537A) obtains the standard gene fingerprint data of the rice by detecting the genotypes of a group of genetic diversity markers distributed in the whole genome of the rice, thereby identifying the DNA identity of the rice.

In the method, a group of genetic diversity markers distributed in the whole rice genome can be detected by utilizing the SNP marker combination and a chip designed aiming at the SNP markers.

3. Application in rice gene positioning and cloning

The rice whole genome breeding chip Rice6K developed by the applicant has been applied to the mapping of QTLs related to rice grain size and yield (Sun et al., Identification of quantitative trait loci for grain size and the contributions of major grain-size QTLs to grain weight in rice. Mol Breeding, DOI 10.1007/s11032-012-9802-z; Tan et al., QTL scanning for rice yield using a whole genome SNP array. Journal of Genetics and Genomics, 2013).

4. Applications in other directions

The SNP marker combination and the chip designed for the SNP markers have the following five types of markers: a germplasm resource representative marker, a generalized hybrid rice specific marker, a wild rice source marker, a functional gene region marker and a functional gene region haplotype marker. It is apparent that the SNP marker combination and the chip designed for the SNP markers can be applied to germplasm resource identification, hybrid rice identification, wild rice identification, functional gene identification and functional gene haplotype analysis.

Example 6 Setting of the minimum number of SNP markers for realizing the detection function

As described in Example 3, Rice60KAddon1 can accurately identify the rice blast resistance fragment contained in A08-1. Rice60KAddon1 detected a total of 65,071 high-quality sites in A08-1, among which 11 SNP markers within the target rice blast resistance fragment distinguish A08-1 from the recipient parent Kongyu 131. They are shown in the table below, where the genotype of the recipient parent Kongyu 131 is set as A and that of the donor parent K22 as B.

TABLE 1 SNP markers differing between Kongyu 131 and A08-1 in the target rice blast resistance fragment

In practice, a call is generally considered more reliable when AA or BB occurs at 3 consecutive polymorphic sites; that is, a difference between materials in the target segment can be established by detecting 3 or more of the differential SNP markers in the table above. The 65,071 high-quality sites were randomly sampled 100 times, and the number of the 11 differential SNP markers captured in each sample was counted. The results show that when the number of sampled sites is greater than 37,582, the probability that fewer than 3 of the 11 differential SNP markers are sampled is less than 0.05, which is a small-probability event under a normal distribution. That is, 37,582 is the minimum number of SNP markers needed to realize the detection function among the 86,014 SNP markers included in the Rice60KAddon1 chip.
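The probability discussed above can also be computed exactly rather than by repeated random sampling. The sketch below is not the applicant's procedure: it assumes that sites are drawn uniformly without replacement, so that the number of captured differential markers follows a hypergeometric distribution, and it is not expected to reproduce the empirically derived threshold of 37582 exactly. The function names are hypothetical.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def prob_fewer_than(min_hits, total_sites, diff_sites, sampled):
    """P(X < min_hits), where X is the number of the diff_sites markers that
    fall among `sampled` sites drawn without replacement from total_sites."""
    log_denominator = log_comb(total_sites, sampled)
    prob = 0.0
    for k in range(min_hits):
        rest = sampled - k  # sampled sites that are not differential markers
        if k > diff_sites or rest < 0 or rest > total_sites - diff_sites:
            continue  # this outcome is impossible
        prob += exp(log_comb(diff_sites, k)
                    + log_comb(total_sites - diff_sites, rest)
                    - log_denominator)
    return prob


# Figures from this example: 65071 high-quality sites, 11 differential markers.
print(prob_fewer_than(3, 65071, 11, 37582))
```

Under this assumption the probability at 37582 sampled sites falls below 0.05, which agrees with the statement above, and evaluating the function over a range of sample sizes gives the smallest number of sites that meets a chosen significance level.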

Although the present application has been described in detail with respect to the general description and specific embodiments, it will be apparent to those skilled in the art that certain modifications or improvements may be made based on the present application. Accordingly, such modifications and improvements are intended to be within the scope of this invention as claimed.

Claims

1. The SNP marker combination for rice genotyping is characterized by comprising an SNP nucleotide fragment sequence shown as SEQ ID NO 1-27781.

2. The SNP marker combination according to claim 1, further comprising SNP nucleotide fragment sequences as set forth in SEQ ID NO 27782-86071.

3. The rice chip is characterized by comprising detection sites designed aiming at SNP markers with nucleotide sequences respectively shown as SEQ ID NO 1-27781.

4. The chip of claim 3, wherein the chip further comprises detection sites designed for the SNP markers having the nucleotide sequences as set forth in SEQ ID NOS: 27782-86071, respectively.

5. The chip of claim 3 or 4, wherein the detection sites are a combination of probes designed for SNP markers.

6. The chip of claim 4, wherein said chip is fabricated using an in-situ synthesis method, an off-chip synthesis method, or a microbead method.

7. The chip of claim 4, wherein said chip is fabricated by in-situ photolithography synthesis, photoresist layer parallel synthesis, microfluidic channel-on-chip synthesis, light-guided in-situ synthesis, soft lithography in-situ synthesis, jet printing synthesis, molecular stamp-on-chip synthesis, maskless chip synthesis, beadArray method, or suspension chip method.

8. The chip of claim 4, wherein the chip is fabricated by Illumina Infinium technology or Affymetrix Axiom technology.

9. Use of the SNP marker combination according to any one of claims 1 to 2, or the chip according to any one of claims 3 to 8, for the detection of a biological sample.

10. The use of claim 9, wherein the detection is for breeding, identity identification, gene mapping and cloning, germplasm resources identification, hybrid rice identification, wild rice identification, functional gene identification, or functional gene haplotype analysis.

11. A method for detecting a biological sample, the method comprising detecting information on SNP markers having nucleotide sequences as set forth in SEQ ID NOS: 1-27781, respectively, in the biological sample.

12. The method of claim 11, further comprising detecting information of SNP markers having nucleotide sequences respectively shown in SEQ ID Nos. 27782-86071 in the biological sample.

13. The method of claim 11 or 12, wherein the detection is performed using a gene chip.

14. The method according to claim 13, wherein the chip comprises detection sites designed for SNP markers having nucleotide sequences as set forth in SEQ ID NO 1-27781, respectively.

15. The method according to claim 14, wherein the chip further comprises detection sites designed for SNP markers having the nucleotide sequences respectively shown in SEQ ID NOS: 27782-86071.

16. The method of claim 14 or 15, wherein the detection site is a probe combination designed for a SNP marker.

17. The method of claim 13, wherein the chip is fabricated using an in-situ synthesis method, an off-chip synthesis method, or a microbead method.

18. The method of claim 13, wherein the chip is fabricated by in-situ photolithography synthesis, photoresist layer parallel synthesis, microfluidic channel-on-chip synthesis, light-guided in-situ synthesis, soft lithography in-situ synthesis, jet printing synthesis, molecular stamp-on-chip synthesis, maskless chip synthesis, beadArray method, or suspension chip method.

19. The method of claim 13, wherein the chip is fabricated by Illumina Infinium technology or Affymetrix Axiom technology.