KR20050008651A

KR20050008651A - Methods for detecting genome-wide sequence variations associated with a phenotype

Info

Publication number: KR20050008651A
Application number: KR10-2004-7013908A
Authority: KR
Inventors: Pascal Mayer; Ilya Leviev; Magne Osteras; Laurent Farinelli
Original assignee: Solexa Limited; Lynx Therapeutics, Inc.
Priority date: 2002-03-05
Filing date: 2003-03-05
Publication date: 2005-01-21
Also published as: WO2003074734A3; WO2003074734A2; AU2003208480A1; JP2005518811A; CA2478722A1; EP1483404A2

Abstract

The present invention provides a method for determining, in a hypothesis-free manner, genome-wide sequence variations associated with a phenotype of a species. In the method, a set of restriction fragments is generated for each individual in a subpopulation of individuals having the phenotype, by digesting the individual's nucleic acid with one or more different restriction enzymes. A set of restriction sequence tags is then determined for each individual from its set of restriction fragments. The restriction sequence tags of the subpopulation of organisms are compared and classified into one or more groups, each comprising restriction sequence tags with homologous sequences. The one or more groups of restriction sequence tags obtained in this way identify sequence variations associated with the phenotype. For example, the method can be used to analyze large numbers of sequence variations in samples from many patients in order to identify weak genetic risk factors.

Description

Methods for detecting genome-wide sequence variations associated with a phenotype

Molecular approaches to genetic analysis track the nucleotide sequence variations that occur naturally and randomly in the genomes of organisms. Knowledge of DNA polymorphism between individuals and between populations is important for understanding the complex links between genotype and phenotype. Without complete data on sequence variation, we depend on the ability to recognize 'nearby' markers that make it possible to infer the location of a relevant locus or sequence variant. The informativeness of a marker depends on the extent of linkage disequilibrium (LD). Markers can be used in linkage studies of candidate genes, and in association studies that identify functional allelic variants of candidate genes that influence inter-individual variation.
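The text does not define linkage disequilibrium quantitatively. For reference, one standard pair of measures for two biallelic loci (alleles A/a and B/b) uses the haplotype frequency p_AB and the allele frequencies p_A and p_B. These are textbook definitions supplied here for illustration and are not part of the original disclosure:

```latex
D = p_{AB} - p_A\,p_B, \qquad
r^2 = \frac{D^2}{p_A(1-p_A)\,p_B(1-p_B)}
```

The closer r² is to 1, the better the alleles at a marker predict the alleles at a nearby variant. This is one way to see why marker informativeness tracks the extent of LD.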

To correlate adverse effects of drug treatment and disease susceptibility with the genetic makeup of individuals, it is essential to monitor genomic differences between individuals. Current approaches involve monitoring large sets of genetic markers, for example thousands of single nucleotide polymorphisms (SNPs) evenly distributed across the genome. These SNPs are monitored in individuals of a control group and in individuals of an affected group. Linkage disequilibrium between the two groups at a given SNP is taken as an indication that the SNP lies physically close in the genome to a genomic region involved in drug response or disease susceptibility.
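The case/control comparison at a single SNP can be illustrated with an allelic association test. This is a hedged sketch only. The Pearson chi-square on a 2x2 table of allele counts is a standard technique used here for illustration and is not stated in this document. The function name and the counts are hypothetical.

```python
import math


def allelic_chi_square(case_alleles, control_alleles):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele-count table.

    Each argument is a pair (count of allele 1, count of allele 2) for one group.
    Returns (chi2, p_value).
    """
    table = [list(case_alleles), list(control_alleles)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: allele 1 seen 60/100 times in cases, 40/100 in controls.
chi2, p = allelic_chi_square((60, 40), (40, 60))
print(f"chi2={chi2:.2f}, p={p:.4f}")
```

A small p-value only flags a frequency difference between the groups. Whether the difference reflects physical proximity to a causal region still depends on the LD structure around the SNP.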

SNPs are the most common form of genetic polymorphism. This, combined with their potential as functional variants, has generated great interest in SNPs as pharmacogenetic indicators and as markers for mapping the genes of complex diseases (Risch et al., 1996, Science 273:1516-7; Kruglyak, 1997, Nat. Genet. 17:21-4; Masood, 1999, Nature 398:545-6). Many SNPs have already been identified: NCBI's SNP database alone contains more than 2,500,000 (http://www.ncbi.nlm.nih.gov/SNP/). Many recent studies have focused on identifying polymorphisms in the coding sequences of potential candidate genes associated with common diseases (Nickerson et al., 1998, Nat. Genet. 19:233-240; Cambien et al., 1999, Am. J. Hum. Genet. 65:183-91; Risch et al., 1996, Science 273:1516-7; Kruglyak, 1997, Nat. Genet. 17:21-4; Masood, 1999, Nature 398:545-6; Cargill et al., 1999, Nat. Genet. 22:231-238; Halushka et al., 1999, Nat. Genet. 22:239-247).

In the current state of the art, SNPs must first be discovered by intensively resequencing large portions of the genomes of about 100 well-chosen individuals from a control population. The most common differences found become candidate SNPs. This approach is very time-consuming and expensive, and its results depend on the choice of control population.

Once SNPs have been identified, a rapid and economical way of assessing large numbers of SNPs in each individual must be developed. In the current state of the art, most methods for assessing SNPs rely on amplifying a small region surrounding each SNP by PCR (or another DNA amplification method). This amplification step requires knowledge of the sequence surrounding each SNP and the use of specific, custom-made nucleic acid primers for each SNP. Simultaneous amplification of many different DNA sequences is tedious and costly, and it requires complex, expensive robotics and large amounts of expensive reagents.

The ability to genotype this large source of variation rapidly and accurately has become an increasingly important goal for the genetics community (Bonn, D., 1999, Lancet 353:1684). A variety of available techniques could potentially be used in high-throughput genotyping laboratories (Landegren et al., 1998, Genome Research 8:769-776). These techniques include:
- 5' exonuclease assays such as TaqMan (Livak et al., 1995, Nature Genet. 9:341-342);
- molecular beacons (Tyagi et al., 1998, Nat. Biotechnol. 16:49-53);
- oligonucleotide ligation assays (OLAs) (Tobe et al., 1996, Nucleic Acids Res. 24:3728-3732);
- dye-labeled oligonucleotide ligation (DOL) (Chen et al., 1998, Genome Res. 8:549-556);
- minisequencing (Chen et al., 1997, Nucleic Acids Res. 25:347-353; Pastinen et al., 1997, Genome Res. 7:606-614);
- microarray technology (Hacia et al., 1998, Genome Res. 8:1245-1258; Wang et al., 1998, Science 280:1077-1082);
- Scorpions assays (Whitcombe et al., 1999, Nat. Biotechnol. 17:804-807).

These existing methods have two major drawbacks. First, SNPs must be identified and arbitrarily selected before they can be assessed. Second, many different DNA products must be generated by specific amplification. The inventors therefore designed a method that has neither drawback.

In the current state of the art, existing methods do not meet the needs of the pharmaceutical industry. Beyond the pharmaceutical industry, many other fields are interested in applying the same approach in different contexts and/or to different organisms. These include medical research, health care, veterinary medicine, agriculture, food, cosmetics, and many other industries and sectors. There is therefore a need for new methods that give comprehensive access to the rich genetic variation of organisms at low cost, with very high throughput and high accuracy.

Current genome scans have failed to detect weak genetic risk factors because they use too few markers, limited sample sizes and/or pooled samples. More effective methods are therefore needed to analyze large numbers of sequence variations in samples from many patients, so that such risk factors can be identified. It is therefore an object of the present invention to provide a more efficient sequencing method.

No discussion or citation of a reference herein shall be construed as an admission that such reference is prior art to the present invention.

The present invention relates to a method for detecting genome-wide sequence variations associated with a phenotype in a population of organisms of a species. The invention also relates to a method for generating genome-wide restriction sequence tags for an organism.

FIG. 1 shows a method for identifying restriction sequence tags associated with a phenotype.

FIGS. 2A and 2B show embodiments of the invention for determining restriction sequence tags.

FIGS. 3A and 3B show an embodiment in which restriction sequence tags are determined by generating restriction fragments from an organism's genome using a restriction enzyme that cleaves on both sides of its recognition site.

FIGS. 4A and 4B show an embodiment in which restriction sequence tags are determined by generating restriction fragments from an organism's genome using a type IIS endonuclease.

FIGS. 5A and 5B show an embodiment using double digestion, first with a rare cutter and then with a frequent cutter, in which restriction sequence tags are determined by generating restriction fragments from an organism's genome.

FIGS. 6A and 6B show an embodiment using double digestion with a first restriction enzyme and a plurality of second restriction enzymes, in which restriction sequence tags are determined by generating restriction fragments from an organism's genome.

FIGS. 7A and 7B show another embodiment using double digestion with a first restriction enzyme and a plurality of second restriction enzymes, in which restriction sequence tags are determined by generating restriction fragments from an organism's genome.

FIGS. 8A and 8B show a further embodiment using double digestion with a first restriction enzyme and a plurality of second restriction enzymes, in which restriction sequence tags are determined by generating restriction fragments from an organism's genome.

FIG. 9A shows the generation of short DNA tags from cloned DNA fragments. Long DNA fragments are cloned into a circular vector between BsmFI sites. BsmFI digestion leaves only short DNA tags attached to the vector. After self-ligation, the circular vector contains an insert formed by a pair of tags, regardless of the length of the original DNA fragment insert. FIG. 9B shows the analysis of the product after the first ligation. Lambda phage DNA digested with Sau3AI was ligated into a BamHI-digested, dephosphorylated first-generation vector. For analysis, the product was amplified by PCR using primers flanking the insertion site. FIG. 9C shows the analysis of the product after the second ligation. The same samples as in FIG. 9B were size-normalized by BsmFI digestion and self-ligation, which generated circular vectors. For analysis, these products were amplified by PCR. The expected peak at 132 bp is observed. FIG. 9D shows the analysis of the second ligation product obtained in a simple reaction. A plasmid containing a single insert was treated with BsmFI, blunt-ended with Klenow enzyme, and self-ligated. For analysis, the product was amplified by PCR. No band corresponding to a fragment smaller than the correct size was observed.
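The size normalization in FIG. 9A can be modeled in an idealized way. Suppose a type IIS enzyme cuts at a fixed distance from sites flanking the insert, and self-ligation then joins the two retained ends. The function name and `tag_len` (standing in for the BsmFI cut offset and vector geometry) are hypothetical. Overhang and blunting details are ignored.

```python
def make_ditag(insert: str, tag_len: int) -> str:
    """Idealized model of type IIS tag-pair formation.

    Cutting at a fixed distance from sites flanking the insert leaves tag_len
    bases from each end of the insert attached to the vector; self-ligation then
    joins the two tags. tag_len stands in for the enzyme's cut offset (hypothetical).
    """
    if len(insert) < 2 * tag_len:
        raise ValueError("insert is shorter than two tags")
    return insert[:tag_len] + insert[-tag_len:]


# Inserts of very different lengths give ditags of one normalized length.
for length in (200, 1500, 20000):
    insert = "ACGT" * (length // 4)
    print(length, len(make_ditag(insert, tag_len=10)))
```

The output length depends only on `tag_len`, not on the insert. This is the property that yields the single expected peak after the second ligation.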

FIG. 10A shows several possibilities for cloning DNA produced by digestion with two different enzymes into a second-generation vector. FIG. 10B shows the analysis of in vitro cloning into the second-generation vector. Lambda DNA digested with MspI and SphI was inserted into a vector digested with SphI and AccI. After digestion with BsmFI, the second ligation produced inserts of normalized size, i.e. restriction sequence tags. The PCR products obtained by amplifying the final ligation reaction were analyzed, and only bands of the correct size were observed. FIG. 10C shows the analysis of the first ligation products when lambda DNA digested with AluI and SphI was inserted into a vector digested with HincII and SphI. After PCR amplification, fragments of different sizes were observed as expected, using an Agilent 2100 bioanalyzer DNA 1000 chip for analysis. The highest peaks are size markers. FIG. 10D shows the analysis of the same sample as in FIG. 10C after the second ligation. After PCR amplification, only a single fragment of the expected size was observed using the Agilent 2100 bioanalyzer DNA 1000 chip. The peaks corresponding to the size markers are indicated in the figure.

FIGS. 11A-11B show template preparation from HindIII- and RsaI-digested DNA using the single restriction sequence tag method shown in FIG. 4A. FIG. 11C shows the autoradiographic analysis of aliquots collected after various steps of the method:
- Lane 1: PCR product the size of a complete DNA colony vector, 350 bp.
- Lanes 2-6: lambda genomic DNA.
- Lanes 7-10: human genomic DNA.
- Lanes 3 and 7: after ligation to the short arm; several fragments are observed.
- Lanes 4 and 8: after digestion with MmeI; size leveling is observed.
- Lanes 5, 6, 9 and 10: after ligation to the long arm; DNA colony vectors of the desired size are produced.

FIG. 11D shows DNA colonies of lambda DNA. FIG. 11E shows DNA colonies of lambda DNA (left column) or human DNA (first three images of the right column). These DNA colonies are then sequenced in situ using the method of WO 98/44152 to identify the restriction sequence tags.
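The size leveling seen after MmeI digestion (lanes 4 and 8) follows from MmeI being a type IIS enzyme. It recognizes TCCRAC and cuts a fixed distance away, 20 nt 3' of the site on the recognition strand (20/18 in standard notation). The sketch below extracts those fixed-length tags from one strand of a sequence. It is an illustration under that assumption, not the protocol itself. The function name is hypothetical, and the reverse strand is not scanned.

```python
import re

# MmeI recognizes TCCRAC (R = A or G) and cuts 20 nt 3' of the site on this
# strand (20/18). Only the given strand is scanned in this sketch.
MMEI_SITE = re.compile(r"(?=TCC[AG]AC)")  # lookahead so overlapping sites are found
SITE_LEN = 6
CUT_OFFSET = 20


def mmei_tags(seq: str) -> list[str]:
    """Return the 20-nt sequence 3' of every MmeI site that lies fully within seq."""
    tags = []
    for match in MMEI_SITE.finditer(seq):
        start = match.start() + SITE_LEN
        end = start + CUT_OFFSET
        if end <= len(seq):
            tags.append(seq[start:end])
    return tags


short_fragment = "GG" + "TCCAAC" + "ACGT" * 5 + "TTT"
long_fragment = "TCCGAC" + "GATTACA" * 50
print([len(t) for t in mmei_tags(short_fragment) + mmei_tags(long_fragment)])
```

Fragments of 31 nt and 356 nt both yield 20-nt tags. This is the in silico counterpart of the single band observed after MmeI digestion.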

FIG. 12 shows the generation of blunt ends from 3' overhangs (illustrated for PstI digestion) and the partial fill-in of 5' overhangs by Klenow polymerase in the presence of dCTP (illustrated for MspI digestion).
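Partial fill-in works because a polymerase extending a recessed 3' end stops at the first template position whose complementary dNTP is missing. The sketch below models only that rule. The function name is hypothetical. The MspI example assumes the C^CGG cut, which leaves a 5'-CG overhang whose bases are copied in the order G, then C.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def fill_in(template_overhang: str, available_dntps: set[str]) -> str:
    """Bases a polymerase adds to a recessed 3' end opposite a 5' overhang.

    template_overhang lists the overhang bases in the order they are copied
    (i.e. 3'->5' on the overhanging strand). Extension stops at the first
    position whose complementary dNTP is not supplied.
    """
    added = []
    for base in template_overhang:
        needed = COMPLEMENT[base]
        if needed not in available_dntps:
            break
        added.append(needed)
    return "".join(added)


# MspI (C^CGG) leaves a 5'-CG overhang; the recessed strand copies G, then C.
print(fill_in("GC", {"C"}))                 # dCTP only -> C (partial fill-in)
print(fill_in("GC", {"A", "C", "G", "T"}))  # all four dNTPs -> CG (complete fill-in)
```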

Summary of the Invention

The present invention provides a method for determining genome-wide sequence variations associated with a phenotype of a species, preferably in a hypothesis-free manner. In one embodiment, genome-wide variation is determined from a subpopulation of individuals having a particular phenotype. In the method, a set of restriction fragments is generated for each individual in the subpopulation by digesting the individual's nucleic acid with one or more different restriction enzymes. Preferably, each set comprises enough different restriction fragments to make it possible to identify sequence variations in the organism's genome. More preferably, the set comprises at least 10, 100, 1000, 10⁴, 10⁵, 10⁶, 10⁷ or 10⁸ different restriction fragments.
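Below is a minimal in silico sketch of the digestion step. It assumes a toy genome written as a single strand and complete digestion. The enzyme table and the helper name `digest` are illustrative choices, not part of the described method. The table uses HindIII cutting A^AGCTT and RsaI cutting GT^AC, the pair named for FIG. 11. Real digests act on double-stranded DNA and may leave overhangs.

```python
# Recognition site and top-strand cut offset for two enzymes used in FIG. 11.
# Both sites are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "RsaI": ("GTAC", 2),       # GT^AC
}


def digest(seq: str, enzyme_names: list[str]) -> list[str]:
    """Return top-strand restriction fragments after complete digestion."""
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        pos = seq.find(site)
        while pos != -1:
            cuts.add(pos + offset)
            pos = seq.find(site, pos + 1)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


toy_genome = "CCAAGCTTGGGTACAA"
print(digest(toy_genome, ["HindIII"]))
print(digest(toy_genome, ["HindIII", "RsaI"]))
```

Adding a second enzyme splits the toy sequence into more fragments. Combining enzymes is a simple way to raise the number of fragments, and hence of potential tags, per genome.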

A set of restriction sequence tags is then determined for each individual from that individual's set of restriction fragments. The set of restriction sequence tags for an individual in a subpopulation with a particular phenotype is preferably determined in two steps. First, a set of restriction fragments is generated, for example from the individual's genomic DNA. Then a portion of each restriction fragment is sequenced using a method that involves generating DNA colonies (described below).
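Here is a hedged sketch of turning fragments into a per-individual tag set. It assumes a tag is the first `tag_len` bases read inward from each restriction end. In the method itself, tags are obtained by sequencing DNA colonies. The function names and `tag_len` are hypothetical, and this toy treats every fragment end as a restriction end.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def restriction_tags(fragments: list[str], tag_len: int) -> Counter:
    """Count tag_len-base tags read inward from both ends of each fragment.

    The right-hand end is reverse-complemented so every tag is written 5'->3'
    starting at a restriction end. Fragments shorter than tag_len are skipped.
    """
    tags = Counter()
    for frag in fragments:
        if len(frag) < tag_len:
            continue
        tags[frag[:tag_len]] += 1
        tags[reverse_complement(frag[-tag_len:])] += 1
    return tags


print(restriction_tags(["AGCTTGGGT", "ACAA"], tag_len=3))
```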

The sets of restriction sequence tags obtained from the different individuals of the subpopulation are then preferably compared and classified into one or more groups, each group comprising restriction sequence tags with homologous sequences. Such a comparison preferably makes it possible to determine the number or frequency of each group of restriction sequence tags. The collection of homologous restriction tag groups of the subpopulation can be used to identify sequence variations associated with the phenotype. In a preferred embodiment, the restriction sequence tags are compared with the genomic sequence of the organism to determine their genomic locations. In another preferred embodiment, restriction sequence tags flanking both recognition sites are also identified from the organism's genomic sequence.
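The sketch below compares tag groups between two subpopulations using exact (100% homologous) grouping. For each tag, it records the fraction of individuals carrying it in each group and ranks tags by the difference. It is an illustration only: the patent does not prescribe this statistic, and the function names and tag sets are hypothetical.

```python
from collections import Counter


def tag_frequencies(individuals: list[set[str]]) -> dict[str, float]:
    """Fraction of individuals in which each exact tag (100% homology group) occurs."""
    carriers = Counter()
    for tags in individuals:
        carriers.update(set(tags))  # count each individual once per tag
    n = len(individuals)
    return {tag: count / n for tag, count in carriers.items()}


def rank_differences(affected, controls):
    """Tags sorted by absolute carrier-frequency difference, largest first."""
    fa, fc = tag_frequencies(affected), tag_frequencies(controls)
    diffs = [(tag, fa.get(tag, 0.0) - fc.get(tag, 0.0)) for tag in set(fa) | set(fc)]
    # Secondary key on the tag keeps the order deterministic for ties.
    return sorted(diffs, key=lambda item: (-abs(item[1]), item[0]))


affected = [{"AAA", "CCC"}, {"AAA"}]
controls = [{"CCC"}, {"CCC"}]
print(rank_differences(affected, controls))
```

Counting carriers rather than raw tag occurrences keeps one individual with many copies of a tag from dominating the frequency.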

Detailed Description of the Invention

The present invention provides a method for determining genome-wide sequence variations associated with a phenotype of a species (see, e.g., FIG. 1). The invention is based on the discovery that such variations can be determined without a prior hypothesis, by obtaining and comparing a sufficiently large number of sequence tags from the genomic DNA or cDNA of individuals having that phenotype. For example, genome-wide variation can be determined from a subpopulation of individuals having a particular phenotype. Such individuals might belong to a particular race, variety, species, genus or family that shares the same phenotypic trait. Genome-wide variation can also be determined from subpopulations such as healthy individuals, individuals susceptible to a particular disease, or individuals at a particular stage of development.

In the method of the invention, a set of restriction fragments is produced for each member of a subpopulation of individuals having the phenotype, by digesting nucleic acid from the individual with one or more different restriction enzymes. As used herein, a set of restriction fragments may comprise one or more restriction fragments. A set of restriction sequence tags is then determined for the individual from the set of restriction fragments. The restriction sequence tags of the subpopulation of organisms are compared and classified into one or more groups, each comprising restriction sequence tags with homologous sequences. In one embodiment, a restriction tag group consists of restriction tags that are at least 60%, 70%, 80%, 90% or 99% homologous. In another embodiment, a restriction tag group consists of restriction tags that are 100% homologous. The one or more groups of restriction sequence tags obtained can be used to identify sequence variations associated with the phenotype. In a preferred embodiment, the phenotype under study is associated with a proportion or combination of sequence variations.
The present invention also provides a method for determining genome-wide sequence variation among multiple phenotypes by comparing the restriction sequence tags of the different phenotypes. The method is applicable to organisms of any species. It is particularly useful for higher eukaryotes with complex genomes, such as higher animals (including humans) and plants, without being limited to them. In particular, the method is useful for analyzing and identifying sequence variations associated with disease susceptibility or with response to treatment in humans.
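Grouping at a threshold such as 90% homology can be sketched with a greedy clustering of equal-length tags by per-position identity. Both the identity measure (mismatch counting, no gaps) and the greedy strategy are assumptions made for illustration; the text does not specify how homology is scored. The function names are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length tags."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_tags(tags: list[str], min_identity: float = 0.9) -> list[list[str]]:
    """Greedy grouping: a tag joins the first group whose founding tag it matches
    at >= min_identity; otherwise it founds a new group."""
    groups: list[list[str]] = []
    for tag in tags:
        for group in groups:
            if identity(tag, group[0]) >= min_identity:
                group.append(tag)
                break
        else:
            groups.append([tag])
    return groups


print(group_tags(["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"], min_identity=0.9))
```

Greedy grouping depends on input order. A more careful implementation might use all-pairs comparison or single-linkage clustering, at higher cost. Setting `min_identity=1.0` reproduces the 100% homology embodiment.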

The method of the present invention can be used to identify polymorphisms in the genome of a species from restriction sequence tags. Compared with conventional methods, it offers several advantages:
(i) there is no need to discover a large set of polymorphisms before starting a correlation study;
(ii) there is no need to select a limited set of polymorphisms before starting a correlation study;
(iii) no prior knowledge of any sequence is required;
(iv) there is no need to synthesize large sets of different oligonucleotides;
(v) there is no need to perform large numbers of specific amplification steps;
(vi) the number of polymorphisms used in a study can easily be increased by using several different restriction enzymes;
(vii) the whole process is carried out by manipulating a single physical sample, whereas other methods include at least one amplification step in which the number of physical samples is proportional to the number of polymorphisms analyzed;
(viii) samples from a population need not be pooled, because each individual can be analyzed;
(ix) sequence variations present at very low frequency in a population can be identified;
(x) the cost of analysis is at least ten times lower than that of current genotyping.

Many terms are used in the following description and examples. The following definitions are provided to give a clear and consistent understanding of the specification and claims.

The term "genomic region" refers to a portion of a genome containing one or more sequence variations, identified by comparing samples from a population of individuals using the method of the invention.

The term "nucleic acid" refers to two or more nucleotides covalently linked together. A nucleic acid of the invention may contain phosphodiester bonds. It may also be a nucleic acid analog whose backbone comprises, for example:
- phosphoramide (see, e.g., Beaucage et al., 1993, Tetrahedron 49:1925, incorporated herein by reference in its entirety);
- phosphorothioate (see, e.g., Mag et al., 1991, Nucleic Acids Res. 19:1437 and U.S. Pat. No. 5,644,048, incorporated herein by reference in their entirety);
- phosphorodithioate (see, e.g., Brill et al., 1989, J. Am. Chem. Soc. 111:2321);
- O-methylphosphoroamidite linkages (see, e.g., Eckstein, Oligonucleotides and Analogues: A Practical Approach, Oxford University Press);
- peptide nucleic acid backbones and linkages (see, e.g., Egholm, 1992, J. Am. Chem. Soc. 114:1895; Nielsen, 1993, Nature 365:566, incorporated herein by reference in their entirety).

Other analog nucleic acids include those with positive backbones (see, e.g., Dempcy et al., 1995, Proc. Natl. Acad. Sci. USA 92:6097, incorporated herein by reference in its entirety). They also include those with non-ionic backbones (see U.S. Pat. Nos. 5,386,023, 5,637,684, 5,602,240, 5,216,141 and 4,469,863, incorporated herein by reference in their entirety). Non-ribose backbones are included as well, such as those described in U.S. Pat. Nos. 5,235,033 and 5,034,506, incorporated herein by reference in their entirety. Nucleic acids containing one or more carbocyclic sugars also fall within the definition of nucleic acid (see, e.g., Jenkins et al., 1995, Chem. Soc. Rev., pp. 169-176, incorporated herein by reference in its entirety). Several nucleic acid analogs are also described in Rawls, C & E News, June 2, 1997, p. 3, incorporated herein by reference in its entirety. These modifications of the ribose-phosphate backbone may be made to facilitate the addition of further moieties such as labels, or to increase the stability and half-life of such molecules in physiological environments. Mixtures of nucleic acid analogs, and mixtures of naturally occurring nucleic acids and their analogs, can also be prepared.

A person of ordinary skill in the art will know how to select appropriate analogs for the various embodiments of the invention. For example, natural nucleic acids are preferred when the nucleic acid is to be digested with restriction enzymes. Nucleic acids may be specified as single-stranded or double-stranded, or may contain both double-stranded and single-stranded portions. The nucleic acid may be DNA (for example genomic DNA or cDNA), RNA, or a hybrid. It may contain any combination of deoxyribonucleotides and ribonucleotides, and any combination of bases, including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine and isoguanine.

As used herein, the term "oligonucleotide" includes linear oligomers of natural or modified monomers or linkages, including deoxyribonucleosides, ribonucleosides, and the like, that are capable of specifically binding to a target polynucleotide through a regular pattern of monomer-to-monomer interactions, such as Watson-Crick base pairing, base stacking, or Hoogsteen or reverse-Hoogsteen base pairing. Preferably, the monomers are linked by phosphodiester bonds or analogs thereof to form oligonucleotides ranging in size from a few monomeric units (e.g., 3-4) to several tens of monomeric units (e.g., 40-60). Whenever an oligonucleotide is represented by a sequence of letters, such as "ATGCCTG", the nucleotides are understood to be in 5' to 3' order from left to right, and, unless otherwise specified, "A" denotes adenine, "C" cytidine, "G" guanosine, "T" thymidine, and "U" uridine. The term "nucleotide" refers to a deoxyribonucleotide or a ribonucleotide, and "dATP", "dCTP", "dGTP", "dTTP", and "dUTP" refer to the triphosphate derivatives of the respective nucleotides. Oligonucleotides usually comprise natural nucleotides, but they may also comprise non-natural nucleotide analogs. It will be clear to those skilled in the art that, although oligonucleotides with natural or non-natural nucleotides may be used, oligonucleotides composed of natural nucleotides are preferred, for example where enzymatic processing is to be performed.
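To make the 5' to 3' convention concrete, the short Python sketch below (an illustration, not part of the original disclosure) derives the complementary strand of an oligonucleotide written left to right, 5' to 3'; the complementary strand is itself reported 5' to 3'. It handles only the four standard bases, and the helper name `reverse_complement` is illustrative.

```python
# Complement table for the four standard DNA bases (ambiguity codes such as N are not handled).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of a DNA sequence, also written 5' to 3'."""
    # Complement each base, then reverse so the result reads 5' to 3'.
    return seq.upper().translate(COMPLEMENT)[::-1]


top_strand = "ATGCCTG"  # written 5' -> 3', left to right
print(reverse_complement(top_strand))  # CAGGCAT
```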

The term "polymorphism" refers to the occurrence of two or more alleles in a population. The term "allele" refers to one of several alternative sequence variants at a particular locus. A polymorphism at a single chromosomal locus constitutes a genetic marker. The term "SNP" refers to a single nucleotide polymorphism. Preferably, genetic variants such as SNPs are common in the population of organisms and are inherited in a Mendelian fashion. Such alleles may or may not be associated with the phenotype.
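As an illustration of these definitions, the following hypothetical Python sketch computes allele frequencies at a single locus and flags the locus as polymorphic when two or more alleles are present. The input format (a flat list of observed allele copies) and the example data are assumptions, not data from the source.

```python
from collections import Counter
from typing import Dict, Iterable


def allele_frequencies(alleles: Iterable[str]) -> Dict[str, float]:
    """Frequency of each allele among all allele copies observed at one locus."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic(alleles: Iterable[str]) -> bool:
    """A locus is polymorphic if two or more alleles occur in the population."""
    return len(set(alleles)) >= 2


# Hypothetical SNP: allele copies from four diploid individuals (two copies each).
observed = ["A", "A", "A", "G", "G", "G", "A", "A"]
print(allele_frequencies(observed))  # {'A': 0.625, 'G': 0.375}
print(is_polymorphic(observed))      # True
```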

As used herein, the term "heterozygote" refers to an individual having different alleles at corresponding loci on homologous chromosomes. Accordingly, the term "heterozygous" as used herein refers to an individual or strain having different alleles at one or more paired loci on homologous chromosomes.

As used herein, the term "homozygote" refers to an individual having the same allele at corresponding loci on homologous chromosomes. Accordingly, the term "homozygous" as used herein refers to an individual or strain having the same allele at one or more paired loci on homologous chromosomes.
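A minimal sketch, assuming diploid genotypes stored as a pair of alleles per locus (the locus names and data are hypothetical), classifies each locus as homozygous or heterozygous according to the two definitions above.

```python
from typing import Dict, Tuple

Genotype = Tuple[str, str]  # alleles carried on the two homologous chromosomes


def zygosity(genotype: Genotype) -> str:
    """Homozygous if both homologues carry the same allele, otherwise heterozygous."""
    first, second = genotype
    return "homozygous" if first == second else "heterozygous"


def zygosity_by_locus(genotypes: Dict[str, Genotype]) -> Dict[str, str]:
    """Apply the per-locus classification to every genotyped locus of one individual."""
    return {locus: zygosity(gt) for locus, gt in genotypes.items()}


# Hypothetical individual genotyped at two SNP loci.
print(zygosity_by_locus({"snp1": ("A", "G"), "snp2": ("C", "C")}))
# {'snp1': 'heterozygous', 'snp2': 'homozygous'}
```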

The term "variation" refers to a heritable change in the DNA sequence of an organism.

The term "genotype" is commonly understood to mean (i) the genetic constitution of an individual, or (ii) the types of alleles found at a chromosomal locus of an individual.

The term "restriction endonuclease" or "restriction enzyme" refers to an enzyme that recognizes a specific base sequence (a target or recognition site) in a double-stranded DNA molecule and cleaves the DNA molecule, for example within or near the target or recognition site, or at a defined distance from it.

The term "restriction site" refers to a nucleotide region within a nucleic acid, preferably a double-stranded nucleic acid, that comprises the recognition site or cleavage site of a restriction endonuclease; such a region is usually 4 to 8 nucleotides long but may be 20 or more nucleotides long, and is not limited to these sizes. The recognition site is the sequence within the nucleic acid to which a restriction endonuclease, or a group of restriction endonucleases, binds. The cleavage site is the specific sequence at which cleavage by the restriction endonuclease occurs. Depending on the restriction endonuclease, the cleavage site may lie within the recognition site. However, some restriction endonucleases, such as Type-IIS endonucleases, have cleavage sites outside the recognition site.
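Whether a recognition site is palindromic in the restriction-enzyme sense can be checked mechanically: the site must equal its own reverse complement. The sketch below assumes plain ACGT sites and uses EcoRI (GAATTC, a palindromic Type-II site) and the Type-IIS enzyme FokI (GGATG, non-palindromic) as examples; the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic_site(site: str) -> bool:
    """True if the site reads the same 5' to 3' on both strands."""
    # Odd-length sites can never qualify: the middle base cannot equal its own complement.
    return site.upper() == reverse_complement(site)


print(is_palindromic_site("GAATTC"))  # True  (EcoRI)
print(is_palindromic_site("GGATG"))   # False (FokI, Type-IIS)
```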

The term "restriction fragment" refers to a DNA molecule produced by digesting a DNA molecule with a restriction endonuclease.
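The following Python sketch illustrates how restriction fragments arise from a digest. It rests on simplifying assumptions: only the top strand is modelled, the site is palindromic so scanning one strand finds every site, and staggered ends are not represented. The EcoRI site and its top-strand cut position (G^AATTC) serve as the example; the sequence is invented.

```python
from typing import List


def digest(seq: str, site: str, cut_offset: int) -> List[str]:
    """Split seq at every occurrence of site.

    cut_offset is the top-strand cut position within the site (1 for EcoRI, G^AATTC).
    Only top-strand fragments are returned; overhangs are not represented.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)  # +1 so overlapping sites are also found
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


print(digest("AAAGAATTCTTTGAATTCCC", "GAATTC", 1))
# ['AAAG', 'AATTCTTTG', 'AATTCCC']
```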

The term "engineered nucleic acid" or "adaptor" refers to a short double-stranded DNA molecule having a predetermined nucleotide sequence. Preferably, the engineered nucleic acid or adaptor is 10 to 500 base pairs long; more preferably, it is 10 to 150 base pairs long. Preferably, its sequence is designed so that it can be ligated to the end of a restriction fragment; once the sequence of the restriction fragment ends is known, a person skilled in the art can design such a nucleic acid. Preferably, the engineered nucleic acid comprises one or more amplification primer sequences, each preferably located near an end of the engineered nucleic acid and oriented so that primer extension proceeds toward the end of the molecule. The amplification primers may be the same or different. Preferably, the engineered nucleic acid also comprises one or more sequencing primer sequences, each preferably located near an end of the engineered nucleic acid and oriented so that primer extension proceeds toward the end of the molecule. The sequencing primers may be the same or different. In certain embodiments, the engineered nucleic acid may also comprise one or more restriction sites. Engineered nucleic acids are also referred to herein as DNA colony vectors.
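To illustrate adaptor attachment, here is a hypothetical sketch that flanks a fragment with two adaptors and enforces the preferred 10 to 150 bp adaptor length. The adaptor sequences are invented, and the model ignores end chemistry (overhangs, phosphorylation); it only shows that, after ligation, each strand begins with adaptor sequence in which primer sites can be placed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def add_adaptors(fragment: str, left_adaptor: str, right_adaptor: str,
                 min_len: int = 10, max_len: int = 150) -> str:
    """Return the top strand of a fragment flanked by two adaptors.

    Each adaptor is given 5' to 3' as read from its own outer end of the construct,
    so the right adaptor appears reverse-complemented on the top strand.
    """
    for name, adaptor in (("left", left_adaptor), ("right", right_adaptor)):
        if not min_len <= len(adaptor) <= max_len:
            raise ValueError(f"{name} adaptor is {len(adaptor)} bp; expected {min_len}-{max_len} bp")
    return left_adaptor.upper() + fragment.upper() + reverse_complement(right_adaptor)


# Hypothetical 10-bp adaptor sequences, for illustration only.
construct = add_adaptors("GATC", "ACGTACGTAC", "TTGGCCAATT")
print(construct)                      # ACGTACGTACGATCAATTGGCCAA
print(reverse_complement(construct))  # bottom strand, starting with TTGGCCAATT
```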

The term "ligation" refers to an enzymatic reaction, catalyzed by a ligase, in which two double-stranded DNA molecules are covalently joined to each other. One or both DNA strands may be covalently joined. It is also possible to prevent ligation of one of the two strands by chemically and/or enzymatically modifying one of the ends, so that only one of the two DNA strands is joined.
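Whether two sticky ends can be ligated depends on their overhangs being complementary. The sketch below, an illustration rather than a method from the source, takes overhangs written 5' to 3' and accepts a pair when both ends are of the same type and one overhang is the reverse complement of the other. Blocking ligation of one strand by end modification is not modelled. BamHI and BglII, which both leave GATC 5' overhangs, are used as the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def ends_compatible(overhang_a: str, kind_a: str, overhang_b: str, kind_b: str) -> bool:
    """kind is "blunt", "5'" or "3'"; overhangs are written 5' to 3'."""
    if kind_a == "blunt" or kind_b == "blunt":
        return kind_a == kind_b  # blunt ends ligate only to other blunt ends
    if kind_a != kind_b:
        return False  # a 5' overhang cannot pair with a 3' overhang
    return overhang_b.upper() == reverse_complement(overhang_a)


print(ends_compatible("GATC", "5'", "GATC", "5'"))  # True (BamHI end with BglII end)
print(ends_compatible("AATT", "5'", "GATC", "5'"))  # False (EcoRI end with BamHI end)
```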

The term "solid support" refers to any solid surface to which nucleic acids can be attached, including but not limited to latex beads, dextran beads, polystyrene or polypropylene surfaces, polyacrylamide gels, gold surfaces, glass surfaces, and silicon wafers. Preferably, the solid support is a glass surface.

The term "nucleic acid colony" or "colony" refers to a discrete area, for example on a solid surface, containing multiple copies of a nucleic acid strand. Multiple copies of the complementary strand may also be present in the same colony. The copies of the nucleic acid strands that make up a colony are generally immobilized on a solid support and may be in single- or double-stranded form.

As used herein, the term "colony primer" refers to a nucleic acid molecule comprising an oligonucleotide sequence that can hybridize to a complementary sequence and initiate a specific polymerase reaction. The sequence of a colony primer is chosen to have maximal hybridization activity with its complementary sequence and very low non-specific hybridization activity with any other sequence. Colony primers may be 5 to 100 bases long, preferably 15 to 25 bases long. Naturally or non-naturally occurring nucleotides may be present in the primer. One or more different colony primers may be used to generate nucleic acid colonies in the methods of the invention.
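Two requirements on a colony primer are easy to check in silico: its length, and its complementarity to the 3' end of the template strand it must capture. A minimal sketch, assuming exact (mismatch-free) complementarity and hypothetical template and primer sequences; real primer design would also weigh melting temperature and off-target hybridization, which this sketch does not model.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def anneals_to_3prime_end(primer: str, template: str) -> bool:
    """True if primer is exactly complementary (antiparallel) to the last len(primer) bases of template."""
    n = len(primer)
    return 0 < n <= len(template) and primer.upper() == reverse_complement(template[-n:])


def check_colony_primer(primer: str, template: str, min_len: int = 15, max_len: int = 25) -> List[str]:
    """Return a list of problems; an empty list means the primer passes both checks."""
    problems = []
    if not min_len <= len(primer) <= max_len:
        problems.append(f"length {len(primer)} outside preferred {min_len}-{max_len} bases")
    if not anneals_to_3prime_end(primer, template):
        problems.append("not complementary to the template's 3' end")
    return problems


# Hypothetical template: an insert followed by an 18-base adaptor-derived 3' end.
template = "GATCGGCTTAGC" + "AAAACCCCGGGGTTTTAC"
primer = reverse_complement(template[-18:])
print(primer)                                 # GTAAAACCCCGGGGTTTT
print(check_colony_primer(primer, template))  # []
```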

5.1 Collecting and recording samples from individuals with a particular phenotype

Genomic DNA or cDNA of individuals with a particular phenotype can be derived from samples collected from such individuals. Preferably, a subpopulation of individuals sharing the same phenotype is identified, for example individuals of a particular race, variety, species, genus, family, etc. with the same phenotypic characteristics, or individuals in a particular condition, such as a healthy state, a particular disease, or a particular developmental stage. Samples are collected from the individuals of such a subpopulation, and the phenotypic characteristics associated with the subpopulation are recorded in detail. Such careful records facilitate the assignment of sequence variations to one or more phenotypes.

5.2. Methods for generating restriction fragments by restriction digestion

The methods of the invention involve generating a set of restriction fragments from the genomic DNA or cDNA of an organism, for example genomic DNA extracted from cells of the organism, or cDNA prepared from mRNA extracted from cells of the organism. DNA, for example genomic DNA, can be obtained from an individual, for example from different cells, parts, tissues, or organs. In various embodiments, one or more different restriction enzymes are used, simultaneously or separately, to generate a set of restriction fragments, for example from genomic DNA. Preferably, the set contains enough restriction fragments to make it possible to identify sequence variations in the genome of the organism. More preferably, the set contains at least 10, 100, 1000, 10⁴, 10⁵, 10⁶, 10⁷, or 10⁸ different restriction fragments.
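As a rough guide to the fragment counts listed above, the expected number of sites for an enzyme with a fully specified n-base recognition sequence in a random sequence of length L is L/4^n. The sketch below assumes independent, equally frequent bases and a genome of about 3 × 10⁹ bp (roughly the human haploid genome). Real genomes deviate from this model; for example, sites containing CpG are markedly under-represented in vertebrate DNA, so actual counts for such enzymes can be far lower.

```python
def expected_site_count(genome_length: int, site_length: int) -> float:
    """Expected occurrences of one fixed, fully specified site in a random sequence.

    Assumes independent bases, each with frequency 0.25. For a palindromic site,
    scanning one strand already finds every site, so no strand correction is applied.
    """
    return genome_length / 4 ** site_length


GENOME = 3_000_000_000  # assumed genome size in bp, roughly human haploid
for n in (4, 6, 8):
    print(n, round(expected_site_count(GENOME, n)))
# 4 11718750
# 6 732422
# 8 45776
```

The number of fragments is roughly the number of sites, which places 4-, 6-, and 8-base cutters within the 10⁴ to 10⁸ range mentioned above.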

Nucleic acid molecules to be analyzed, such as genomic DNA, can be obtained from any source, such as tissue homogenates, blood, amniotic fluid, chorionic villus samples, and bacterial cultures, using conventional methods known in the art. Preferably, only a very small amount of DNA or RNA is required (for RNA, a reverse transcription step is required before PCR). Molecular biology procedures used in the methods of the invention are performed by standard methods (e.g., Ausubel et al., Current Protocols in Molecular Biology, John Wiley and Sons, New York, 1989; Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd Edition, Cold Spring Harbor, New York, 2001; Innis et al., PCR Protocols: A Guide to Methods and Applications, Academic Press, Cold Spring Harbor, New York, 1989).

Any restriction enzyme known in the art can be used in the invention. In certain embodiments, Type-IIS endonucleases are used; these are generally commercially available and well known in the art. A Type-IIS endonuclease recognizes a specific base-pair sequence within a double-stranded polynucleotide. Upon recognizing that sequence, the endonuclease cleaves the polynucleotide so that one strand is left longer than the other, producing a "sticky end". Unlike Type-II endonucleases, Type-IIS endonucleases do not require the recognition site to be a palindrome, i.e., the base sequence of the recognition site need not read the same on both strands in the 5' to 3' direction. In addition, Type-IIS endonucleases cleave outside the recognition site. Because cleavage occurs a fixed number of base pairs away from the recognition site, whatever the sequence at that position, Type-IIS enzymes make it possible, in some embodiments, to capture the intervening sequence between the recognition site and the cleavage site. Type-IIS endonucleases useful in the invention include, but are not limited to, EarI, MnlI, PleI, AlwI, BbsI, BceAI, BsaI, BsmAI, BspMI, Eco57I, Esp3I, HgaI, SapI, SfaNI, BbvI, BsmFI, FokI, BseRI, HphI, MmeI, and MboII. Currently known enzymes cleave at most 20 to 25 bases from their recognition site; enzymes that cleave, for example, 50, 100, or 200 or more bases away from the recognition site would also be useful in the invention.
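The following hypothetical sketch shows how a Type-IIS enzyme captures sequence adjacent to its recognition site. It assumes FokI's commonly cited cut positions, GGATG(9/13), scans only forward-orientation sites on the top strand, and returns the bases between the site and the top-strand cut together with the 4-nt 5' overhang left by the staggered cut. The example sequence is invented.

```python
from typing import Dict, List


def type_iis_cuts(seq: str, site: str = "GGATG", top_offset: int = 9,
                  bottom_offset: int = 13) -> List[Dict[str, object]]:
    """Locate forward-orientation sites and the cuts made downstream of them.

    Defaults model FokI, GGATG(9/13): the top strand is cut 9 nt and the bottom
    strand 13 nt 3' of the site. Sites on the opposite strand (CATCC on the top
    strand) are ignored in this sketch.
    """
    seq = seq.upper()
    hits = []
    start = seq.find(site)
    while start != -1:
        end = start + len(site)
        top_cut, bottom_cut = end + top_offset, end + bottom_offset
        if bottom_cut <= len(seq):  # skip sites too close to the end to be cut
            hits.append({
                "site_start": start,
                "intervening": seq[end:top_cut],      # captured sequence between site and cut
                "overhang": seq[top_cut:bottom_cut],  # single-stranded 5' overhang
            })
        start = seq.find(site, start + 1)
    return hits


example = "TT" + "GGATG" + "ACGTACGTA" + "CCAG" + "TTTT"
print(type_iis_cuts(example))
# [{'site_start': 2, 'intervening': 'ACGTACGTA', 'overhang': 'CCAG'}]
```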

In certain embodiments, restriction fragments are generated using a combination of a rare cutter and a frequent cutter. A rare cutter is a restriction endonuclease whose recognition site consists of four or more nucleotides, preferably six or eight. Examples of commercially available rare cutters include PstI, HpaII, MspI, ClaI, HhaI, EcoRII, BstBI, HinP1I, MaeII, BbvI, PvuII, XmaI, SmaI, NciI, AvaI, HaeII, SalI, and XhoI, of which PstI, HpaII, MspI, ClaI, HhaI, EcoRII, BstBI, HinP1I, and MaeII are preferred. A frequent cutter is a restriction endonuclease whose recognition site consists of four or fewer nucleotides. Examples of suitable frequent cutters include MseI and TaqI.
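The rare/frequent cutter combination can be sketched in Python as below. The example pairs PstI (CTGCA^G) with MseI (T^TAA), both palindromic; the sequence is invented and only the top strand is modelled. Keeping only fragments with one end from each enzyme is an assumed selection step in the style of AFLP, not a step stated in the source.

```python
from typing import Dict, List, Optional, Tuple

Enzyme = Tuple[str, str, int]  # (name, recognition site, top-strand cut offset within the site)
Fragment = Tuple[Optional[str], Optional[str], str]  # (left-end enzyme, right-end enzyme, sequence)


def cut_positions(seq: str, site: str, offset: int) -> List[int]:
    positions, start = [], seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def double_digest(seq: str, rare: Enzyme, frequent: Enzyme) -> List[Fragment]:
    """Top-strand fragments of a combined digest, labelled by the enzyme that made each end.

    None marks an end of the input molecule rather than a cut.
    """
    seq = seq.upper()
    cuts: Dict[int, str] = {}
    for name, site, offset in (rare, frequent):
        for pos in cut_positions(seq, site, offset):
            cuts[pos] = name  # if both enzymes cut at one position, the frequent cutter's label wins
    bounds = [(0, None)] + [(p, cuts[p]) for p in sorted(cuts)] + [(len(seq), None)]
    return [(left, right, seq[a:b]) for (a, left), (b, right) in zip(bounds, bounds[1:]) if b > a]


def rare_frequent_fragments(fragments: List[Fragment], rare_name: str, frequent_name: str) -> List[Fragment]:
    """Keep fragments bounded by one rare-cutter end and one frequent-cutter end."""
    return [f for f in fragments if {f[0], f[1]} == {rare_name, frequent_name}]


frags = double_digest("AATTAAGGGCTGCAGCCCTTAAGG", ("PstI", "CTGCAG", 5), ("MseI", "TTAA", 1))
for fragment in rare_frequent_fragments(frags, "PstI", "MseI"):
    print(fragment)
# ('MseI', 'PstI', 'TAAGGGCTGCA')
# ('PstI', 'MseI', 'GCCCT')
```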

In certain embodiments, the restriction fragments are joined, at their cleavage sites, to another nucleic acid or to themselves. Typically, a restriction enzyme produces either blunt ends, in which the terminal nucleotides of both strands are base-paired, or staggered (sticky) ends, in which one of the two strands protrudes to form a short single-stranded extension. In certain embodiments in which the restriction enzyme is a Type-IIS enzyme, it is preferable to add a step that modifies the ends by converting the protruding ends into blunt ends with a polymerase.
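For a palindromic site, the end type follows from where the top strand is cut: by symmetry the bottom strand is cut at the mirror position, so an early top-strand cut leaves a 5' overhang, a late one a 3' overhang, and a central one a blunt end. The sketch below encodes this under that assumption; it does not apply to Type-IIS enzymes, whose sites are not palindromic. For the blunting step described above, a polymerase such as T4 DNA polymerase can fill in 5' overhangs and remove 3' overhangs.

```python
from typing import Tuple


def end_type(site: str, top_cut: int) -> Tuple[str, str]:
    """Classify the ends left when the top strand of a palindromic site is cut after top_cut bases."""
    bottom_cut = len(site) - top_cut  # bottom-strand cut in top-strand coordinates (palindrome symmetry)
    if top_cut == bottom_cut:
        return ("blunt", "")
    if top_cut < bottom_cut:
        return ("5' overhang", site[top_cut:bottom_cut])
    return ("3' overhang", site[bottom_cut:top_cut])


print(end_type("GAATTC", 1))  # ("5' overhang", 'AATT')  EcoRI, G^AATTC
print(end_type("CTGCAG", 5))  # ("3' overhang", 'TGCA')  PstI, CTGCA^G
print(end_type("CCCGGG", 3))  # ('blunt', '')            SmaI, CCC^GGG
```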

5.3. Methods for determining restriction sequence tags

Any method known in the art can be used to determine the set of restriction sequence tags of the restriction fragments generated by the methods of Section 5.2. Preferably, the restriction fragments are amplified before sequencing; however, sequencing methods that do not require amplification, such as single-molecule sequencing, can be used without an additional amplification step. Preferably, the restriction sequence tags are at least 5 nucleotides long, more preferably 10 to 20 nucleotides long, and preferably no more than 50 nucleotides long. Preferably, the restriction sequence tags are determined by a method comprising the generation and sequencing of DNA colonies. Any such method known in the art can be used (e.g., PCT Publications WO 98/44151, WO 98/44152, WO 00/18957, and WO 02/46456, incorporated herein by reference in their entirety). A nucleic acid colony can be generated from a single immobilized nucleic acid template, for example a template derived from a restriction fragment. The methods of the invention also make it possible to generate many such nucleic acid colonies simultaneously, each containing a different immobilized nucleic acid.
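As a sketch of the tag step, the Python below reads a fixed-length tag from the cut-site end of each sequenced fragment and counts tags per individual; tags present in one individual but absent from another are candidate sites of sequence variation. The default tag length of 16 nucleotides (within the preferred 10 to 20 range), the shorter length used in the example, and the example reads are all assumptions.

```python
from collections import Counter
from typing import Iterable, Set


def restriction_sequence_tags(fragments: Iterable[str], tag_length: int = 16) -> Counter:
    """Count tags read from the left (cut-site) end of each fragment; shorter fragments are skipped."""
    return Counter(f.upper()[:tag_length] for f in fragments if len(f) >= tag_length)


def unshared_tags(tags_a: Counter, tags_b: Counter) -> Set[str]:
    """Tags seen in one individual but not the other: candidate sites of sequence variation."""
    return set(tags_a) ^ set(tags_b)


# Hypothetical reads from two individuals; a 9-nt tag keeps the example short.
individual_1 = restriction_sequence_tags(["AATTCGGCA", "AATTCGGCT", "AATTCGGCA", "AAT"], tag_length=9)
individual_2 = restriction_sequence_tags(["AATTCGGCA", "AATTCGGCA"], tag_length=9)
print(individual_1)                               # Counter({'AATTCGGCA': 2, 'AATTCGGCT': 1})
print(unshared_tags(individual_1, individual_2))  # {'AATTCGGCT'}
```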

DNA colonies can be generated by a method comprising capturing and amplifying DNA fragments, for example restriction fragments, using primers immobilized on a solid surface (see PCT Publications WO 98/44151 and WO 98/44152). In embodiments in which the DNA fragments are circular, a step of linearizing the circular DNA fragments with a restriction enzyme is preferably performed before colony generation. In one embodiment, DNA colonies are generated from a sample of DNA molecules, for example a pool of restriction fragments, by a method comprising:

i) providing a solid surface comprising a plurality of colony primers immobilized on the surface through their 5' ends, wherein each colony primer comprises a sequence capable of hybridizing to a sequence at the 3' end of the DNA molecules of the sample;

ii) denaturing the DNA molecules to produce single-stranded fragments;

iii) annealing the single-stranded fragments to the immobilized colony primers;

iv) performing a primer extension reaction using the annealed single-stranded fragments as templates to produce immobilized double-stranded nucleic acid fragments;

v) denaturing the immobilized double-stranded nucleic acid fragments to produce immobilized single-stranded fragments;

vi) annealing the immobilized single-stranded fragments to immobilized colony primers; and

vii) repeating steps iv) to vi) so that colonies are generated at discrete sites on the solid surface.
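Steps iv) to vi) form one amplification cycle, and each cycle can at most double the number of strands in a colony. The idealized model below (an assumption, not a kinetic model from the source) multiplies the copy number by (1 + efficiency) per cycle and optionally caps it at a saturation capacity, standing in for primer depletion and crowding on the surface.

```python
from typing import Optional


def colony_copies(cycles: int, efficiency: float = 1.0, start: int = 1,
                  capacity: Optional[float] = None) -> float:
    """Idealized strand count in one colony after a number of amplification cycles.

    efficiency = 1.0 means every strand is copied in every cycle (doubling);
    capacity, if given, is an assumed saturation limit for the colony.
    """
    copies = float(start)
    for _ in range(cycles):
        copies *= 1 + efficiency
        if capacity is not None:
            copies = min(copies, float(capacity))
    return copies


print(colony_copies(20))                 # 1048576.0
print(colony_copies(20, capacity=1000))  # 1000.0
```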

In a preferred embodiment, the immobilized primers comprise a sequence capable of hybridizing to a sequence of the DNA molecules. For example, the DNA molecules in the sample may be restriction fragments joined to a nucleic acid of predetermined sequence; in that case, the immobilized primers may have a sequence capable of hybridizing to a sequence within the predetermined sequence. In other embodiments, colony primers with different sequences may be used. Primers for use in the invention are preferably at least 5 bases long, and more preferably fewer than 100 or fewer than 50 bases long. The method repeats the steps of annealing the template to the immobilized primers, extending the primers, and separating the extended primers from the template. Those skilled in the art will appreciate that these steps can be performed using PCR reagents and conditions (or reverse transcription and PCR). PCR techniques are described, for example, in "PCR: Clinical Diagnostics and Research", Springer-Verlag, 1992, incorporated herein by reference in its entirety.

DNA colonies can also be generated by the methods described in PCT Publication WO 00/18957. In embodiments in which the DNA fragments to be amplified are circular, a step of linearizing the circular DNA fragments with a restriction enzyme is preferably performed before colony generation. In one embodiment, DNA colonies are generated from a sample of DNA molecules, for example a pool of restriction fragments, by a method comprising:

i) mixing the DNA molecules of the sample with colony primers, each colony primer comprising a sequence capable of hybridizing to a sequence at the 3' end of the DNA molecules;

ii) grafting the DNA molecules and the colony primers onto a solid surface through the 5' ends of both, to produce immobilized linear fragments and immobilized colony primers;

iii) denaturing the immobilized DNA molecules to produce immobilized single-stranded fragments;

iv) annealing the immobilized single-stranded fragments to immobilized colony primers to obtain annealed single-stranded fragments;

v) performing a primer extension reaction using the annealed single-stranded fragments as templates to produce immobilized double-stranded nucleic acid fragments;

vi) denaturing the immobilized double-stranded nucleic acid fragments to produce immobilized single-stranded fragments;

vii) annealing the immobilized single-stranded fragments to immobilized colony primers; and

viii) repeating steps iv) to vii) so that colonies are generated at discrete sites on the solid surface.

Preferably, colony primers are present in the mixture in a higher proportion than colony templates. Preferably, the ratio of colony primers to colony templates is such that, when the colony primers and nucleic acid templates are immobilized on the solid support, a "lawn" of colony primers is formed, consisting of many colony primers distributed at a suitably uniform density over the whole or a defined area of the solid support, with one or more colony templates immobilized individually at intervals within the lawn. Primers for use in the invention are preferably at least 5 bases long, and more preferably fewer than 100 or fewer than 50 bases long. The method repeats the steps of annealing the template to the immobilized primers, extending the primers, and separating the extended primers from the template. Those skilled in the art will appreciate that these steps can be performed using PCR reagents and conditions (or reverse transcription and PCR). PCR techniques are described, for example, in "PCR: Clinical Diagnostics and Research", Springer-Verlag, 1992, incorporated herein by reference in its entirety.

Isothermal amplification of nucleic acids on a solid support can also be used to generate DNA colonies (see, e.g., PCT Publication WO 02/46456). In embodiments in which the DNA fragments to be amplified are circular, a step of linearizing the circular DNA fragments with a restriction enzyme is preferably performed before colony generation. In one embodiment, DNA colonies are generated from a sample of DNA molecules, for example a pool of restriction fragments, by a method comprising:

i) mixing the DNA molecules of the sample with colony primers, each colony primer comprising a sequence capable of hybridizing to a sequence at the 3' end of the DNA molecules, the concentration of the colony primers being adjusted so that amplification of the grafted DNA molecules can occur;

ii) grafting the DNA molecules and the colony primers onto a solid surface through their 5' ends to produce immobilized DNA molecules and immobilized colony primers; and

iii) applying to the solid surface an amplification solution containing a polymerase and nucleotides, so that colonies are generated isothermally at discrete locations on the solid surface.

The amount of nucleic acid immobilized in step ii) determines the average number of DNA colonies that can be generated per unit surface area. The preferred concentrations for immobilization are 0.01 to 1 nanomolar for the colony templates and 50 to 1000 nanomolar for the colony primers. In a preferred embodiment, the reaction temperature is chosen to be the optimum temperature for polymerase activity. In a preferred embodiment, the DNA molecules in the sample range in size from about 50 to 5000 base pairs.

In the methods described in this section, colonies are generated at separate locations on the surface. The colony density on the surface can be controlled, for example, by adjusting the density of the primers immobilized on the surface. In a preferred embodiment, the colony density is 10^4 to 10^6 colonies/cm², and more preferably 10^7 to 10^8 colonies/cm² or higher. Colony size can also be controlled by adjusting the experimental conditions. Preferably, the colonies have a maximum diameter of 10 nm to 100 µm, and more preferably 100 nm to 10 µm.
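The density ranges above can be sanity-checked against the preferred diameters by estimating the mean centre-to-centre spacing of colonies. The sketch below assumes, purely as an idealisation, that colonies sit on a square grid. Under that assumption each colony at 10^8 colonies/cm² has about 1 µm² of surface, so colonies near the top of the preferred diameter range would touch. The function names are hypothetical and are not part of the described method.

```python
import math

UM2_PER_CM2 = 1e8  # 1 cm^2 = 10^4 um x 10^4 um

def mean_spacing_um(density_per_cm2: float) -> float:
    """Approximate centre-to-centre colony spacing in micrometres,
    assuming colonies sit on a square grid (an idealisation)."""
    area_per_colony_um2 = UM2_PER_CM2 / density_per_cm2
    return math.sqrt(area_per_colony_um2)

def colonies_separated(density_per_cm2: float, diameter_um: float) -> bool:
    """True if colonies of the given diameter fit within the grid spacing."""
    return diameter_um < mean_spacing_um(density_per_cm2)

for density in (1e4, 1e6, 1e8):
    print(f"{density:.0e} colonies/cm^2 -> ~{mean_spacing_um(density):g} um spacing")
```

Under this model, spacing falls from about 100 µm at 10^4 colonies/cm² to about 1 µm at 10^8 colonies/cm². Higher densities are therefore compatible only with the smaller colony sizes.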

DNA colonies can be sequenced to determine at least part of the DNA sequence. In one embodiment, sequencing is performed by hybridizing an appropriate primer, sometimes referred to herein as a "sequencing primer", to the nucleic acid molecules of the DNA colonies, extending the primer, and detecting the nucleotides used to extend it. Preferably, the nucleotide used to extend the primer in each colony is detected before the next nucleotide is added to the growing nucleic acid chain. This allows the nucleic acid to be sequenced in situ, one base at a time.

Detection of incorporated nucleotides is facilitated by including one or more labeled nucleotides in the primer extension reaction. Any suitable detectable label can be used, for example fluorophores or radioactive labels. Fluorescent labels are preferred, and any fluorescent label known in the art can be used. The same fluorescent label, or different ones, can be used for the different types of nucleotide. If the label is a fluorophore and the same label is used for every type of nucleotide, each incorporated nucleotide adds cumulatively to the signal detected at a particular wavelength. If different labels are used, the signals can be detected at the appropriate different wavelengths. In a preferred embodiment, each primer extension step uses a mixture of labeled and unlabeled nucleotides of the same type.

For a suitable sequencing primer to hybridize to the nucleic acid template to be sequenced, the template should normally be single-stranded. If the nucleic acid templates making up a nucleic acid colony are double-stranded, single-stranded templates can be provided by methods well known in the art, such as, but not limited to, denaturation or cleavage.

The sequencing primers that hybridize to the nucleic acid template and are used for primer extension are preferably short oligonucleotides, for example 15 to 25 nucleotides long. The primer sequence can be designed to hybridize under stringent conditions to part of the template to be sequenced. The sequencing primer may have a sequence identical or similar to that of the colony primers used to generate the nucleic acid colonies.

First, the nucleic acid template and sequencing primer are exposed to suitable conditions, determined by methods well known in the art, so that the primer anneals to the template to be sequenced. Primer extension is then carried out under conditions suitable for extension, for example by supplying a nucleic acid polymerase and nucleotides, at least some of which are labeled. The primer is extended whenever the appropriate nucleotide is supplied. DNA polymerases and nucleotides that can be used are well known in the art.

Preferably, each primer extension step is followed by a wash step to remove unincorporated nucleotides that could interfere with subsequent steps. After a primer extension step, the DNA colonies can be examined to determine whether a labeled nucleotide has been incorporated into the extended primer. The primer extension step can then be repeated to determine the next nucleotide incorporated into the extended primer.

Any device that can detect the presence or absence, and the amount, of the appropriate label incorporated into the extended primer (for example, a fluorescent or radioactive label) can be used to determine the sequence. In embodiments in which the label is fluorescent, a CCD camera attached to a magnifying device (e.g., a microscope) can be used.

The detection system is preferably used together with an analysis system that determines the number and identity of the nucleotides incorporated in each colony after each primer extension step. This analysis allows the sequence of the nucleic acid template in a given colony to be determined. It can be performed immediately after each primer extension step, or later using recorded data.

In another embodiment of the invention, the full or partial sequence of one or more nucleic acids is determined by determining the full or partial sequence of the nucleic acid templates present in one or more nucleic acid colonies. Preferably, many sequences are determined simultaneously. The nucleotides are applied to the nucleic acid colonies in a chosen order, for example dATP, dTTP, dCTP, dGTP, and this order is repeated throughout the analysis.
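Cyclic nucleotide addition in a fixed order, combined with a label whose signal accumulates with each incorporated nucleotide, can be illustrated with an idealised simulation. The sketch assumes no noise, no incomplete extension, and a signal exactly proportional to the number of nucleotides added in each cycle. `flowgram` and `decode` are hypothetical helper names. The sketch works on the newly synthesised strand, which is complementary to the template.

```python
from itertools import cycle, islice

FLOW_ORDER = "ATCG"  # dATP, dTTP, dCTP, dGTP, repeated (the example order above)

def flowgram(synthesized: str, n_cycles: int, order: str = FLOW_ORDER) -> list:
    """Idealised signal per cycle: how many copies of the supplied nucleotide are
    incorporated in a row (a homopolymer run gives a proportionally larger signal)."""
    counts, i = [], 0
    for base in islice(cycle(order), n_cycles):
        run = 0
        while i < len(synthesized) and synthesized[i] == base:
            run += 1
            i += 1
        counts.append(run)
    return counts

def decode(counts: list, order: str = FLOW_ORDER) -> str:
    """Rebuild the synthesised strand from the per-cycle incorporation counts."""
    return "".join(base * n for base, n in zip(cycle(order), counts))

signals = flowgram("AAGTC", n_cycles=8)
print(signals, decode(signals))
```

The homopolymer "AA" shows up as a single cycle with twice the signal. This is why the analysis system must estimate the number as well as the identity of incorporated nucleotides.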

In this way, the full or partial sequence of the nucleic acid template making up a given nucleic acid colony can be determined.

The primers and oligonucleotides used in the methods of the invention are preferably DNA. They can be synthesized using standard techniques and, where appropriate, detectably labeled using standard methods (Ausubel et al., supra). Detectable labels that can be used in the methods of the invention include, but are not limited to, fluorescent labels such as fluorescein and rhodamine. The labels are detected using standard methods.

The methods of the invention can be facilitated by a kit containing the reagents needed to carry out the assay. The kit may contain reagents for analyzing a single restriction fragment tag (e.g., for use in diagnostic methods) or many restriction fragment tags (e.g., for use in genome mapping). When several samples are to be analyzed, the kit provides several sets of appropriate primers and oligonucleotides. In addition to the primers and oligonucleotides required for the various methods, the kit may contain the enzymes used in the methods, reagents for detecting the labels, and the like. The kit may also contain a solid substrate for carrying out the methods, for example a glass plate or a silicon or glass microchip.

5.4. Methods for identifying restriction sequence tags associated with a phenotype

The restriction sequence tags obtained from each individual are then compared across the subpopulations of a given phenotype. This identifies all homologous tags and determines the number of homologous restriction sequence tags. In a preferred embodiment, the two restriction sequence tags obtained from a DNA colony represent the two ends of the corresponding restriction fragment in the restriction fragment set. Such a pair of tags therefore originates from positions that are physically close to each other in the genome. Each tag can be combined with the sequence of the recognition site of the restriction enzyme used to digest the genomic DNA, giving a longer sequence. Homologous tags are then grouped. In one embodiment, a restriction tag group consists of restriction tags that are at least 60%, 70%, 80%, 90% or 99% homologous. In another embodiment, a unique restriction tag group consists of restriction tags that are 100% homologous.

The collection of restriction tag groups of a subpopulation can be used to identify sequence variations associated with the phenotype. In a preferred embodiment, the phenotype under study is associated with the proportion of a sequence variation, or of a combination of sequence variations, in the population. In one embodiment, the proportion of one or more specific sequences in a population is identified as associated with a phenotypic difference between two populations if it differs between them by at least 10%, 20%, 50%, 70% or 90%. This proportion is represented, for example, by the relative number of restriction tags in each of one or more population-specific groups of restriction sequence tags. In another preferred embodiment, the phenotype is associated with a particular combination of sequence variations found in individuals of a population. In one embodiment, a combination of the proportions of several specific sequences in a population is identified as associated with a phenotypic difference between two populations if that combination differs between them by at least 10%, 20%, 50%, 70% or 90%. Such a combination is represented, for example, by the numbers of restriction tags in several specific groups of restriction sequence tags, i.e., the total number of restriction tags in those groups. In yet another embodiment, several such combinations are used to identify phenotypic differences. In embodiments in which several combinations are used, each combination may include one or more specific sequences that are also included in other combinations. This embodiment is illustrated in Example 6.3 below.
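The grouping and comparison steps can be sketched as follows. The sketch makes several assumptions that the text does not fix: tags are of equal length, homology is measured as position-by-position identity (Hamming), each tag is assigned greedily to the first matching group representative, and "differs by X%" is read relative to the larger proportion. All function names and the toy tag lists are hypothetical.

```python
from collections import Counter

def matches(rep: str, tag: str, min_identity: float) -> bool:
    """Equal-length tags whose fraction of identical positions is >= min_identity."""
    if len(rep) != len(tag):
        return False
    same = sum(a == b for a, b in zip(rep, tag))
    return same / len(tag) >= min_identity

def build_groups(tags, min_identity=1.0):
    """Greedy grouping: a tag that matches no existing representative starts a new group."""
    reps = []
    for tag in tags:
        if not any(matches(rep, tag, min_identity) for rep in reps):
            reps.append(tag)
    return reps

def count_by_group(tags, reps, min_identity=1.0):
    """Number of one population's tags assigned to each group (first match wins)."""
    counts = Counter()
    for tag in tags:
        for rep in reps:
            if matches(rep, tag, min_identity):
                counts[rep] += 1
                break
    return counts

def proportion(counts, group_reps):
    """Share of a population's tags in one group, or in a combination of groups."""
    total = sum(counts.values())
    return sum(counts[r] for r in group_reps) / total if total else 0.0

def differs(p_a, p_b, threshold):
    """Assumed reading of 'differs by X%': relative to the larger proportion."""
    hi, lo = max(p_a, p_b), min(p_a, p_b)
    return hi > 0 and (hi - lo) / hi >= threshold

cases = ["ACGTAC", "ACGTAC", "ACGTAA", "TTTTTT"]
controls = ["ACGTAC", "TTTTTT", "TTTTTT", "TTTTTT"]
reps = build_groups(cases + controls, min_identity=0.8)
p_case = proportion(count_by_group(cases, reps, 0.8), ["ACGTAC"])
p_ctrl = proportion(count_by_group(controls, reps, 0.8), ["ACGTAC"])
print(p_case, p_ctrl, differs(p_case, p_ctrl, threshold=0.5))
```

Groups are built from the pooled tags of both populations so that the two populations are counted against the same representatives. Passing several representatives to `proportion` gives the combined proportion used in the combination embodiments.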

In one embodiment, the restriction sequence tags can be compared with the genome sequence of the organism to determine their genomic locations. In another embodiment, the restriction sequence tags that flank the recognition sites on both sides are identified from the genome sequence of the organism.
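Both uses can be illustrated with an in silico scan of a reference sequence: the scan yields the tags expected to flank each recognition site, and observed tags can then be looked up against it. The sketch scans the top strand only, requires exact matches, and uses hypothetical helper names and a toy sequence. A real analysis would also handle the reverse strand and mismatches.

```python
from collections import defaultdict

def flanking_tag_index(genome: str, site: str, k: int) -> dict:
    """Map each k-base sequence immediately left or right of a site occurrence
    to the site positions it flanks (top strand, exact matches only)."""
    index = defaultdict(list)
    pos = genome.find(site)
    while pos != -1:
        left = genome[max(0, pos - k):pos]
        right = genome[pos + len(site):pos + len(site) + k]
        for tag in (left, right):
            if len(tag) == k:  # skip tags truncated by the end of the sequence
                index[tag].append(pos)
        pos = genome.find(site, pos + 1)
    return dict(index)

index = flanking_tag_index("TTGAATTCACGTACGTGAATTCGG", "GAATTC", k=2)
print(index.get("AC", []))  # genomic position(s) of the site flanked by an observed tag
```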

5.5. Certain preferred embodiments for obtaining restriction sequence tags

This section describes several preferred embodiments for obtaining restriction sequence tags. These methods can be used in combination with any of the methods described in Sections 5.1 to 5.4 to identify sequence variations associated with a phenotype. It will be apparent to those skilled in the art that any repetition and/or combination of one or more of the specific embodiments described in this section can also be used.

(I) First specific embodiment

In a preferred embodiment, the invention provides a method for generating restriction sequence tags from a biological sample (FIGS. 2A and 2B). In this method, nucleic acid extracted from the biological sample is digested with one or more first restriction enzymes to generate a set of restriction fragments. A set of restriction sequence tags is then determined from the set of restriction fragments by a method comprising:

1) ligating the restriction fragments of the set to a first engineered nucleic acid comprising a predetermined nucleotide sequence to obtain first circular nucleic acid fragments, wherein the predetermined nucleotide sequence comprises one or more recognition sites for a second restriction enzyme, positioned and oriented so that the second restriction enzyme cuts within the restriction fragment;

2) digesting the first circular nucleic acid fragments with the second restriction enzyme;

3) modifying the ends produced by the second restriction enzyme to allow ligation;

4) ligating the ends produced by the second restriction enzyme to generate a set of second circular nucleic acid fragments; and

5) sequencing at least part of each restriction fragment in the second circular nucleic acid fragments to determine the set of restriction sequence tags.

Preferably, each recognition site for the second restriction enzyme in the first engineered nucleic acid is located close to an end of the first engineered nucleic acid. In one preferred embodiment, each such site is located less than 20 nucleotides from an end of the first engineered nucleic acid, and more preferably 0 to 5 nucleotides from an end. Preferably, the second restriction enzyme is a type IIS endonuclease. In a preferred embodiment, the type IIS endonuclease cuts at least 5, 10, 20, 50, 100 or 200 bases away from its recognition site. In another embodiment, the second circular nucleic acid fragments can be linearized, for example with a third restriction enzyme different from the first and second restriction enzymes, to obtain a third set of restriction fragments. In a preferred embodiment, the method further comprises amplifying the third restriction fragments using primers corresponding to sequences in the first engineered nucleic acid. In another preferred embodiment, the digestion with the third restriction enzyme and subsequent amplification can be replaced by amplification of the second circular nucleic acid fragments.
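The net effect of steps 1) to 4) can be sketched at the sequence level. After circularization with the engineered nucleic acid (the adapter), type IIS cleavage and re-ligation, the circle carries the adapter followed by a short tag from each end of the original fragment, and the two tags are now adjacent. The sketch works on the top strand only and ignores overhangs and end-filling. The EcoRI-like site (GAATTC, cut after the G), the 4-base type IIS reach and the adapter string are hypothetical toy values.

```python
from typing import List, Optional

FIRST_SITE, FIRST_CUT = "GAATTC", 1  # EcoRI-like site, top-strand cut after G (toy choice)

def digest(genome: str, site: str, cut_offset: int) -> List[str]:
    """Internal fragments between consecutive top-strand cuts at `site`."""
    cuts, pos = [], genome.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = genome.find(site, pos + 1)
    return [genome[a:b] for a, b in zip(cuts, cuts[1:])]

def circular_ditag(fragment: str, adapter: str, reach: int) -> Optional[str]:
    """Read-out of the second circle, opened at the adapter: adapter, then the
    first `reach` bases and the last `reach` bases of the fragment, now adjacent."""
    if len(fragment) < 2 * reach:
        return None  # too short to give two non-overlapping tags
    return adapter + fragment[:reach] + fragment[-reach:]

genome = "TTGAATTCACGTACGTGAATTCGG"
for frag in digest(genome, FIRST_SITE, FIRST_CUT):
    print(frag, "->", circular_ditag(frag, adapter="TCGA", reach=4))
```

The single internal fragment AATTCACGTACGTG gives the read-out TCGAAATTCGTG: the adapter, then AATT from the left end and CGTG from the right end. Primers in the adapter can therefore amplify and read both tags of a fragment together.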

In a preferred embodiment, the second circular nucleic acid fragments are immobilized and amplified before step 5). In a more preferred embodiment, immobilization and amplification are performed by any of the DNA colony methods described in Section 5.3. In an even more preferred embodiment, sequencing is performed by one of the base-by-base primer extension methods described in Section 5.3.

In another, more preferred embodiment of the invention, the ends of the second restriction fragments are modified by filling them in with a DNA polymerase, or by removing the protruding nucleotides. This makes the ends blunt so that they can be ligated.

In another preferred embodiment, the method of the invention includes a purification step and/or a DNA separation step after each step.

In another preferred embodiment, the short genomic DNA sequences from the restriction fragment set are concatenated to some extent and inserted into a plasmid. The plasmids are cloned into bacteria, which are plated on agar plates, and the plasmids isolated from each individual bacterial colony are sequenced by Sanger sequencing on an automated capillary sequencer. Other approaches that do not rely on bacterial cloning are also known to those of ordinary skill in the art. For example, the first engineered nucleic acid may include a combinatorial sequence tag, so that the third nucleic acid fragments can be cloned molecularly on beads and sequenced one base at a time.

(II) Second specific embodiment

In another embodiment, the invention provides a method for generating restriction sequence tags from a biological sample (FIGS. 3A and 3B). In this method, nucleic acid extracted from the biological sample is digested with a first restriction enzyme to generate a set of restriction fragments. The first restriction enzyme cuts on both sides of its recognition site, so that the two cleavage sites, rather than the recognition site itself, bound a short stretch of sequence. Restriction enzymes that can be used for this purpose include, but are not limited to, BaeI, BcgI and BsaXI. A set of restriction sequence tags is then determined from the set of restriction fragments by a method comprising:

1) modifying the ends produced by the first restriction enzyme to allow ligation;

2) ligating the restriction fragments of the set to a first engineered nucleic acid comprising a predetermined nucleotide sequence to obtain a set of first circular nucleic acid fragments; and

3) sequencing at least part of each restriction fragment in the first circular nucleic acid fragments to determine the set of restriction sequence tags.
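Because the first enzyme in this embodiment cuts on both sides of its site, each excised tag is simply the site plus a fixed number of flanking bases on each side. The sketch below uses a BcgI-like degenerate site written as CGA(N6)TGC and placeholder flank lengths. Both the pattern and the flank lengths are assumptions for illustration, and the actual recognition sequences and cut distances of BaeI, BcgI and BsaXI should be taken from an enzyme reference. The sketch scans the top strand only and ignores overhangs.

```python
import re
from typing import List

def two_side_tags(genome: str, site_regex: str, left: int, right: int) -> List[str]:
    """Excise `left` bases + site + `right` bases around every (possibly
    overlapping) site match, keeping only matches with full flanks."""
    tags = []
    for m in re.finditer(f"(?=({site_regex}))", genome):  # lookahead finds overlapping sites
        start = m.start() - left
        end = m.start() + len(m.group(1)) + right
        if start >= 0 and end <= len(genome):
            tags.append(genome[start:end])
    return tags

BCGI_LIKE = "CGA[ACGT]{6}TGC"  # assumed pattern for illustration only
print(two_side_tags("AAAACGAACGTACTGCTTTT", BCGI_LIKE, left=2, right=3))
```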

In a preferred embodiment, the first circular nucleic acid fragments are immobilized and amplified before step 3). In a more preferred embodiment, immobilization and amplification are performed by any of the DNA colony methods described in Section 5.3. In an even more preferred embodiment, sequencing is performed by the base-by-base primer extension method described in Section 5.3.

In another preferred embodiment, the ends produced by the first restriction enzyme are modified by filling them in with a DNA polymerase, or by removing the protruding nucleotides. This makes the ends blunt so that they can be ligated.

In another preferred embodiment, the method of the invention includes a purification step and/or a DNA separation step after each step.

(III) Third specific embodiment

In another embodiment, the invention provides a method for generating restriction sequence tags from a biological sample (FIGS. 4A and 4B). In this method, nucleic acid extracted from the biological sample is digested with a first restriction enzyme to generate a set of restriction fragments. A set of restriction sequence tags is then determined from the set of restriction fragments by a method comprising:

1) ligating the restriction fragments of the set to a first engineered nucleic acid comprising a predetermined nucleotide sequence to obtain a set of first nucleic acid fragments, wherein the predetermined nucleotide sequence comprises one or more recognition sites for a second restriction enzyme, positioned and oriented so that the second restriction enzyme cuts within the restriction fragment;

2) digesting the first nucleic acid fragments with the second restriction enzyme;

3) modifying the ends produced by the second restriction enzyme to allow ligation;

4) ligating the ends produced by the second restriction enzyme to a second engineered nucleic acid comprising a predetermined nucleotide sequence to generate a set of second nucleic acid fragments; and

5) sequencing at least part of each restriction fragment in the second nucleic acid fragments to determine the set of restriction sequence tags.

Preferably, the recognition site for the second restriction enzyme in the first engineered nucleic acid is located close to an end of the first engineered nucleic acid. In one preferred embodiment, the site is located less than 20 nucleotides from that end. More preferably, each recognition site for the second restriction enzyme is located 0 to 5 nucleotides from the end of the first engineered nucleic acid. Preferably, the second restriction enzyme is a type IIS endonuclease. In a preferred embodiment, the type IIS endonuclease cuts at least 5, 10, 20, 50, 100 or 200 bases away from its recognition site.
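Unlike the circular construct of the first embodiment, this embodiment yields linear molecules in which each end of a fragment is carried separately between the two engineered nucleic acids. The sketch below shows the expected products under stated assumptions: a type IIS reach of `reach` bases and hypothetical adapter strings. Each tag is written 5' to 3' reading away from the first adapter, so the right-hand end of the fragment appears as a reverse complement. Overhangs and end-filling are ignored.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def end_tag_products(fragment: str, adapter1: str, adapter2: str, reach: int) -> List[str]:
    """Two linear products per fragment, adapter1 + end tag + adapter2,
    one for each end, each tag read from the fragment end inwards."""
    if len(fragment) < reach:
        return []
    left_tag = fragment[:reach]
    right_tag = revcomp(fragment)[:reach]  # the other end, read on the opposite strand
    return [adapter1 + t + adapter2 for t in (left_tag, right_tag)]

print(end_tag_products("AATTCACGTACGTG", adapter1="GG", adapter2="CC", reach=4))
```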

In a preferred embodiment, the second nucleic acid fragments are immobilized and amplified before step 5). In a more preferred embodiment, immobilization and amplification are performed by any of the DNA colony methods described in Section 5.3. In an even more preferred embodiment, sequencing is performed by one of the base-by-base primer extension methods described in Section 5.3.

In another preferred embodiment of the invention, the ends of the second restriction fragments are modified by filling them in with a DNA polymerase, or by removing the protruding nucleotides. This makes the ends blunt so that they can be ligated.

(IV) Fourth specific embodiment

In another embodiment, the invention provides a method for generating restriction sequence tags from a biological sample (FIGS. 5A and 5B). In this method, nucleic acid extracted from the biological sample is digested with one or more rare cutters to generate a set of restriction fragments. Preferably, the rare cutter recognizes a 6-base, 8-base or longer recognition sequence. A set of restriction sequence tags is then determined from the set of restriction fragments by a method comprising:

1) ligating the restriction fragments of the set to a first engineered nucleic acid comprising a predetermined nucleotide sequence to obtain a set of first nucleic acid fragments;

2) digesting the first nucleic acid fragments with a second restriction enzyme that differs from the first restriction enzyme and does not cut within the first engineered nucleic acid, to generate second restriction fragments;

3) ligating the ends of the second restriction fragments to a second engineered nucleic acid comprising a predetermined nucleotide sequence to generate a set of second nucleic acid fragments; and

4) sequencing at least part of each restriction fragment in the second nucleic acid fragments to determine the set of restriction sequence tags.
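The usual back-of-the-envelope model explains why a rare cutter keeps the number of tags manageable. In random sequence with equal base frequencies, a given n-base site occurs about once every 4^n bases, so a genome of length L contains roughly L/4^n sites. The sketch assumes a palindromic site, so that each site is counted once across both strands, and a roughly human-sized genome of 3 × 10^9 bases. Real genomes deviate from this model, notably for sites containing CpG.

```python
def expected_sites(genome_length: int, site_length: int) -> float:
    """Expected occurrences of one palindromic n-base site in random sequence
    with equal base frequencies: L / 4**n."""
    return genome_length / 4 ** site_length

GENOME_LENGTH = 3_000_000_000  # assumed, roughly human-sized

for n in (4, 6, 8):
    print(f"{n}-base site: ~{expected_sites(GENOME_LENGTH, n):,.0f} sites")
```

Under this model a 6-base cutter gives roughly 7 × 10^5 sites and an 8-base cutter roughly 5 × 10^4, compared with more than 10^7 for a 4-base cutter.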

In a preferred embodiment, digestion with the first and second restriction enzymes is carried out simultaneously, before ligation to the first and second engineered nucleic acids.

In a preferred embodiment, the second nucleic acid fragments are immobilized and amplified before step 4). In a more preferred embodiment, immobilization and amplification are performed by any of the DNA colony methods described in Section 5.3. In an even more preferred embodiment, sequencing is performed by one of the base-by-base primer extension methods described in Section 5.3.

(V) Other specific embodiments

The invention also provides a method for generating restriction sequence tags from a biological sample. In this method, nucleic acid extracted from the biological sample is digested with one or more first restriction enzymes to generate a set of restriction fragments, and the restriction fragments are then further digested with several different second restriction enzymes. This makes it possible to increase further the number of restriction sequence tags located close to the recognition sites of the first restriction enzyme.

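In a preferred embodiment (FIGS. 6A and 6B), after digestion with the first restriction enzyme, a set of restriction sequence tags is determined from the set of restriction fragments by a method comprising: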

2) digesting the first nucleic acid fragments with a second restriction enzyme that differs from the first restriction enzyme and does not cut within the first engineered nucleic acid, to obtain second restriction fragments;

4) 제 2 핵산 프래그먼트 중의 각각의 제한 프래그먼트의 적어도 일부를 시퀀싱하여 제한 서열 태그 세트를 결정하는 단계를 포함하는 방법에 의해, 제 1 제한효소로 분해한 후에 제한 서열 태그 세트를 제한 프래그먼트 세트로부터 결정한다.4) determining the restriction sequence tag set from the restriction fragment set after digestion with the first restriction enzyme, by sequencing at least a portion of each restriction fragment in the second nucleic acid fragment to determine the restriction sequence tag set. do.

In another preferred embodiment (FIGS. 7A and 7B), after digestion with a first restriction enzyme, the set of restriction sequence tags is determined from the set of restriction fragments by a method comprising:

1) ligating the restriction fragments in the set of restriction fragments to a first engineered nucleic acid to obtain a first set of circular nucleic acid fragments. The first engineered nucleic acid comprises a predetermined nucleotide sequence containing one recognition site for a second restriction enzyme and two recognition sites for a third restriction enzyme, where the second and third restriction enzymes differ from each other. The recognition site for the second restriction enzyme lies between the two recognition sites for the third restriction enzyme, and the recognition sites for the third restriction enzyme are positioned and oriented so that the third restriction enzyme cuts within the restriction fragment;

2) digesting the first nucleic acid fragments with the second restriction enzyme to obtain second nucleic acid fragments;

3) joining the ends of the second restriction fragments to generate a second set of circular nucleic acid fragments; and

4) sequencing a portion of each of the restriction fragments in the third circular nucleic acid fragments to determine the set of restriction sequence tags.

Preferably, the method further comprises the following steps after step 3):

3i) digesting the second circular nucleic acid fragments with the third restriction enzyme to generate a third set of nucleic acid fragments;

3ii) modifying the ends generated by the third restriction enzyme to allow ligation; and

3iii) joining the ends of the third nucleic acid fragments to generate a third set of circular nucleic acid fragments.

Preferably, the recognition sites of the third restriction enzyme in the first engineered nucleic acid are located close to the ends of the first engineered nucleic acid. In one preferred embodiment, each recognition site of the third restriction enzyme in the first engineered nucleic acid lies less than 20 nucleotides from the end of the first engineered nucleic acid. In a more preferred embodiment, each such recognition site lies 0 to 5 nucleotides from the end of the first engineered nucleic acid. Preferably, the third restriction enzyme is a type IIS endonuclease. In a preferred embodiment, the type IIS endonuclease cleaves at least 5, 10, 20, 50, 100, or 200 bases away from its recognition site.
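The third restriction enzyme is useful because a type IIS enzyme cleaves at a fixed distance outside its recognition site. A site placed at the end of an engineered nucleic acid therefore cuts inside the adjacent restriction fragment. That arithmetic is sketched below in Python as an illustration only. The sketch assumes a single recognition site read on the top strand and uses the usual N1/N2 convention, in which N1 and N2 are the top- and bottom-strand cut distances counted from the 3' end of the site. The function name and test sequences are hypothetical. BsmFI, GGGAC(10/14), and MmeI, TCCRAC(20/18), are the two type IIS enzymes used in the Examples.

```python
import re

# IUPAC codes needed for the recognition sites used in the Examples.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def type_iis_cut(top: str, site: str, n1: int, n2: int):
    """Find the first forward-orientation `site` in the top strand `top` and return
    (top_cut, bottom_cut, overhang), with cuts as 0-based top-strand coordinates.
    Returns None if the site is absent or a cut would fall past the sequence end."""
    pattern = "".join(IUPAC[base] for base in site)
    match = re.search(pattern, top)
    if match is None:
        return None
    top_cut = match.end() + n1      # top strand is cut n1 nt past the site
    bottom_cut = match.end() + n2   # bottom strand is cut n2 nt past the site
    if max(top_cut, bottom_cut) > len(top):
        return None
    if top_cut == bottom_cut:
        overhang = "blunt"
    elif top_cut < bottom_cut:
        overhang = f"5' overhang of {bottom_cut - top_cut} nt"
    else:
        overhang = f"3' overhang of {top_cut - bottom_cut} nt"
    return top_cut, bottom_cut, overhang


# Hypothetical adapter ending in an MmeI site, followed by hypothetical genomic sequence.
adapter = "GGAAGGTCCGAC"
fragment = "AGCTTACGGATTCCAGTTGACCGTA"
top = adapter + fragment
top_cut, bottom_cut, overhang = type_iis_cut(top, "TCCRAC", 20, 18)
tag = top[len(adapter):top_cut]  # genomic nucleotides that stay attached to the adapter
print(tag, overhang)
```

In this simplified model, every top-strand nucleotide between the site and the cut comes from the neighbouring fragment when the site sits at the very end of the engineered nucleic acid. This is consistent with the preference for placing the site 0 to 5 nucleotides from that end.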

In a more preferred embodiment (FIGS. 8A and 8B), after digestion with a first restriction enzyme, the set of restriction sequence tags is determined from the set of restriction fragments by a method comprising:

1) ligating the restriction fragments in the set of restriction fragments to a first engineered nucleic acid comprising a predetermined nucleotide sequence that contains a recognition site for a second restriction enzyme different from the first restriction enzyme, to obtain a first set of nucleic acid fragments;

2) digesting the first nucleic acid fragments with the second restriction enzyme to obtain second restriction fragments;

3) joining the ends of the second restriction fragments to generate a first set of circular nucleic acid fragments; and

4) sequencing at least a portion of each of the fourth nucleic acid fragments to determine the set of restriction sequence tags.

Preferably, the method further comprises the following steps after step 3):

3i) digesting the first circular nucleic acid fragments with a third restriction enzyme, different from the first and second restriction enzymes, to generate a third set of nucleic acid fragments;

3ii) modifying the ends generated by the third restriction enzyme to allow ligation; and

3iii) joining the ends of the third nucleic acid fragments to generate a third set of circular nucleic acid fragments.

In such embodiments, the set of restriction fragments generated by the first restriction enzyme is preferably further digested, separately, with each of a plurality of different second restriction enzymes. More preferably, the plurality of different second restriction enzymes comprises at least 3, 5, 10, or 20 different restriction enzymes.
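Separate digests with several second enzymes add tags near each first-enzyme site, and this can be illustrated in silico. The sketch below is a simplification and not the procedure of the invention. It looks only downstream of each first-enzyme site and treats each cut as falling at the start of a recognition site. It uses a hypothetical 50 bp sequence with HindIII (AAGCTT) as the first enzyme and Sau3AI, MspI and RsaI as second enzymes. Applied to its own aliquot, each second enzyme ends the same first-enzyme fragment at a different place, so each contributes a different second tag.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of `site` in `seq`."""
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


def second_tags(seq: str, first_site: str, second_sites: dict, k: int = 8) -> dict:
    """For every first-enzyme site and every second enzyme (each digested in a
    separate aliquot), return the k nt just upstream of the nearest downstream
    second-enzyme site within the same first-enzyme fragment."""
    firsts = find_sites(seq, first_site)
    tags = {}
    for i, p in enumerate(firsts):
        start = p + len(first_site)
        # The first-enzyme fragment ends at the next first-enzyme site (or the sequence end).
        end = firsts[i + 1] if i + 1 < len(firsts) else len(seq)
        for name, site in second_sites.items():
            hits = [q for q in find_sites(seq, site) if q >= start and q + len(site) <= end]
            if hits and hits[0] - start >= k:
                tags[(p, name)] = seq[hits[0] - k:hits[0]]
    return tags


# Hypothetical test sequence: one HindIII site followed by one site for each second enzyme.
seq = ("AAGCTT" + "TTTTTTTTAA" + "GATC" + "TTTTAAAATT" + "CCGG"
       + "AATTAATTAA" + "GTAC" + "TT")
enzymes = {"Sau3AI": "GATC", "MspI": "CCGG", "RsaI": "GTAC"}
tags = second_tags(seq, "AAGCTT", enzymes)
for (pos, enzyme), tag in tags.items():
    print(pos, enzyme, tag)
```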

In a preferred embodiment, the step of immobilizing and amplifying the first nucleic acid fragments is performed before the sequencing step. In a more preferred embodiment, immobilization and amplification are performed by any one of the DNA colony methods described in Section 5.3. In an even more preferred embodiment, sequencing is performed by the base-by-base primer extension method described in Section 5.3.

In another preferred embodiment, the ends of the second restriction fragments are modified to allow ligation, either by filling them in with a DNA polymerase or by removing the protruding nucleotides, so that the ends become blunt.
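The two blunting options can be modelled on strings. The sketch below is an illustration, not a description of enzyme behaviour. A duplex end is written as its top strand (5' to 3') with its bottom strand written 3' to 5' underneath, both starting at the same left position, so any length difference is the overhang at the right-hand end. A longer bottom strand is a 5' overhang, which a polymerase fills in by extending the top strand. A longer top strand is a 3' overhang, whose protruding nucleotides are removed. The function name and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def blunt_right_end(top: str, bottom: str) -> tuple[str, str]:
    """Blunt the right-hand end of a duplex.

    top: top strand written 5'->3'.
    bottom: bottom strand written 3'->5', aligned base by base under `top`
    from the left; the strings differ in length only at the right end."""
    if len(bottom) > len(top):
        # 5' overhang on the bottom strand: extend the top strand's 3' end.
        top = top + bottom[len(top):].translate(COMPLEMENT)
    elif len(top) > len(bottom):
        # 3' overhang on the top strand: remove the protruding nucleotides.
        top = top[:len(bottom)]
    return top, bottom


# Generic 5' overhang: four unpaired bottom-strand bases.
filled = blunt_right_end("GGGACGGATC", "CCCTGCCTAGATCG")
# Generic 3' overhang: four unpaired top-strand bases.
trimmed = blunt_right_end("ACGTCATG", "TGCA")
print(filled, trimmed)
```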

Such embodiments make it possible to identify two restriction sequence tags in each first restriction fragment. The first restriction tag lies next to the first restriction enzyme recognition site, and the second restriction tag lies next to the second restriction enzyme recognition site. This preserves the information that the first and second restriction sequence tags are paired tags derived from the same first restriction fragment.

Restriction sequence tags can be classified by sequence homology. Where possible, paired restriction sequence tags that contain the same first restriction sequence tag can be grouped further, which preserves the information that the second restriction tags from the grouped pairs lie physically close to, or on the same side of, a given first restriction enzyme recognition site. In a preferred method of the invention, if the genome sequence is available, an additional step is provided in which restriction sequence tags are clustered by mapping them to the genome. This identifies the flanking restriction sequence tags located on either side of a first restriction enzyme recognition site.
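The text does not specify the classification algorithm, so the sketch below uses a simple stand-in. It groups equal-length tags greedily by Hamming distance: tags differing at no more than one position fall in the same group, so a single-base difference between individuals shows up as two variants within one group. It also groups paired tags by their first tag. The function names, the one-mismatch threshold and the tag sequences are all hypothetical.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions (plus any length difference)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def cluster_tags(tags: list[str], max_mismatch: int = 1) -> list[list[str]]:
    """Greedy grouping: each tag joins the first group whose founding tag is
    within max_mismatch positions; otherwise it founds a new group."""
    groups: list[list[str]] = []
    for tag in tags:
        for group in groups:
            if hamming(group[0], tag) <= max_mismatch:
                group.append(tag)
                break
        else:
            groups.append([tag])
    return groups


def group_pairs(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Collect the second tags of all pairs sharing the same first tag; these
    lie on the same side of the same first-enzyme recognition site."""
    by_first: dict[str, set[str]] = defaultdict(set)
    for first, second in pairs:
        by_first[first].add(second)
    return dict(by_first)


# Hypothetical tags pooled from several individuals.
clusters = cluster_tags(["ACGTACGT", "ACGTACGA", "TTTTGGGG", "ACGTACGT"])
groups = group_pairs([("AAA", "CCC"), ("AAA", "GGG"), ("TTT", "CCC")])
print(clusters)
print(groups)
```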

6. Examples

The following examples are intended to illustrate the invention, not to limit it in any way.

6.1. Example 1: Preparation of DNA Colony Templates: Double Restriction Sequence Tags

This example describes the engineering of a vector for the in vitro generation of DNA tags. An embodiment of generating restriction sequence tags from genomic DNA is shown in FIG. 9A. The example uses a plasmid vector with a DNA cloning site located between two BsmFI sites. The vector is based on the pUC19 plasmid, chosen for its small size.

1) First-generation cloning vector

The first-generation cloning vector was designed for use with genomic DNA digested with a single restriction enzyme. In this example, bacteriophage lambda genomic DNA was used to demonstrate the generation of restriction sequence tags.

Two vector variants were made by cloning synthetic linkers into pUC19. In the first variant, the vector contains the following insert:

BsmFI BamHI BsmFI

GGGAC GGATCC GTCCC (SEQ ID NO: 1)

CCCTG CCTAGG CAGGG (SEQ ID NO: 2)

This makes it possible to clone Sau3AI-digested lambda DNA into the BamHI restriction site flanked by the two BsmFI sites. The native BamHI site of pUC19 had previously been removed from the vector.

In the second variant, the vector contains an insert with an AatII restriction site (GACGTC) formed by two adjacent BsmFI sites:

BsmFI BsmFI

GGGAC GTCCC (SEQ ID NO: 3)

CCCTG CAGGG (SEQ ID NO: 4)

AatII

This makes it possible to clone TaiI-digested lambda DNA into the vector. The native AatII site of pUC19 had previously been removed from the vector.
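Both first-generation variants rely on cohesive-end compatibility. Sau3AI (^GATC) leaves the same 5' GATC overhang as BamHI (G^GATCC), and TaiI (ACGT^) leaves the same 3' ACGT overhang as AatII (GACGT^C). The sketch below derives these overhangs from each site and its top-strand cut offset. It assumes palindromic sites, so the bottom strand is cut symmetrically. The enzyme table and function names are hypothetical and cover only the enzymes named here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


# Palindromic recognition site and top-strand cut offset from the site's 5' end.
ENZYMES = {
    "BamHI": ("GGATCC", 1),   # G^GATCC
    "Sau3AI": ("GATC", 0),    # ^GATC
    "AatII": ("GACGTC", 5),   # GACGT^C
    "TaiI": ("ACGT", 4),      # ACGT^
}


def sticky_end(enzyme: str) -> tuple[str, str]:
    """(overhang type, overhang sequence 5'->3') for a palindromic site cut at
    offset c on the top strand and, by symmetry, at len(site) - c on the bottom."""
    site, c = ENZYMES[enzyme]
    d = len(site) - c
    if c < d:
        return "5'", site[c:d]
    if c > d:
        return "3'", site[d:c]
    return "blunt", ""


def compatible(e1: str, e2: str) -> bool:
    """Ends ligate if they share the overhang type and the overhangs are complementary."""
    kind1, over1 = sticky_end(e1)
    kind2, over2 = sticky_end(e2)
    return kind1 == kind2 and over1 == revcomp(over2)


for insert_enzyme, vector_enzyme in [("Sau3AI", "BamHI"), ("TaiI", "AatII")]:
    print(insert_enzyme, sticky_end(insert_enzyme), vector_enzyme,
          sticky_end(vector_enzyme), compatible(insert_enzyme, vector_enzyme))
```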

Both first-generation vectors were dephosphorylated before use to prevent self-ligation of the empty vector. After ligation of the lambda DNA fragments, DNA polymerase I and ligase were used to restore the integrity of the DNA strands.

The steps are summarized below (the same steps are typically used with the other vector generation as well):

i) First ligation

ii) Heat inactivation of T4 DNA ligase

iii) Digestion with BsmFI

iv) Fill-in of the DNA ends with Klenow and dNTPs

v) Heat inactivation of BsmFI

vi) Second ligation reaction
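The effect of steps iii) to vi) on a cloned insert can be traced on the top-strand sequence. The sketch below makes several assumptions. The circular construct is represented as a linear string, with exactly one BsmFI site on each side of the insert: GGGAC on the top strand at the left, and GTCCC (GGGAC on the bottom strand) at the right. Both sites face the insert, and blunting is complete. Under these assumptions each blunted end lies 14 nt (the bottom-strand cut distance) beyond its site, and religation joins the two 14 bp tags. The construct sequence and function name are hypothetical.

```python
def double_tag(construct: str, n2: int = 14, left_site: str = "GGGAC",
               right_site: str = "GTCCC") -> tuple[str, str, str]:
    """Return (left_tag, right_tag, religated) for a construct in which an insert
    is flanked by inward-facing BsmFI sites. After cutting and fill-in, each blunt
    end lies n2 nt beyond its site, so religation keeps n2 bp on each side."""
    e = construct.index(left_site) + len(left_site)  # 3' end of the left site
    s = construct.rindex(right_site)                 # left edge of the inverted right site
    left_end, right_start = e + n2, s - n2
    if right_start < left_end:
        raise ValueError("insert too short for two non-overlapping tags")
    religated = construct[:left_end] + construct[right_start:]
    return construct[e:left_end], construct[right_start:s], religated


# Hypothetical construct: vector arm, BsmFI site, BamHI/Sau3AI junction, insert, mirror image.
insert = "ACATTGACC" + "T" * 30 + "TTGCAACGT"
construct = "AAAA" + "GGGAC" + "GGATC" + insert + "GATCC" + "GTCCC" + "TTTT"
left_tag, right_tag, religated = double_tag(construct)
print(left_tag, right_tag, len(religated))
```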

The following in vitro restriction sequence tag generation protocol was used in the examples:

First ligation

0.1 µg bacteriophage lambda genomic DNA digested with the appropriate enzyme

0.05 µg linear vector (agarose gel-purified)

1 mM ATP

1x buffer NEB4

1 µl T4 DNA ligase (New England Biolabs, 400 u/µl)

Total volume 10 µl; incubate for 2 hours at room temperature

Inactivate the T4 DNA ligase by heating at 65°C for 20 minutes

BsmFI digestion

Add 5 µl of a solution containing 1x buffer NEB4 and 0.5 µl BsmFI (New England Biolabs, 2 u/µl)

Incubate for 2 hours at 65°C

Klenow treatment

Add 5 µl of a solution containing 1x T4 DNA ligase buffer (New England Biolabs), 100 µM dNTPs and 0.5 µl Klenow fragment (New England Biolabs, 5 u/µl)

Incubate for 5 minutes at room temperature

Inactivate the enzymes by heating at 80°C for 20 minutes

Second ligation

Add an equal volume of a solution containing 1x MSL buffer, 2 mM ATP, 20% PEG 6000 and 10% (v/v) T4 DNA ligase (New England Biolabs, 400 u/µl)

Incubate overnight at 16°C

상기 프로토콜에서, 분자내 라이게이션이 낮은 염에서 보다 효율적이기 때문에 MSL(Minimal Salt Ligation) 완충액을 이용하였다. MSL 완충액의 조성은 다음과 같다:In this protocol, Minimal Salt Ligation (MSL) buffer was used because intramolecular ligation was more efficient in low salts. The composition of the MSL buffer is as follows:

5-x MSL 1-x MSL5-x MSL 1-x MSL

50 mM Tris-HCl pH 7.5 10 mM Tris-HCl pH 7.550 mM Tris-HCl pH 7.5 10 mM Tris-HCl pH 7.5

50 mM MgCl₂10 mM MgCl₂ 50 mM MgCl ₂ 10 mM MgCl ₂

10 mM DTT 1 mM DTT10 mM DTT 1 mM DTT

The in vitro ligation products were analyzed by PCR. When two lambda DNA restriction sequence tags of the correct size are present in the vector, a 134 bp amplification product is formed. Smaller amplification products can arise from three sources: insertion of only one tag, empty vector carrying no tag, or vector that self-ligated after BsmFI digestion of the empty vector.

The lengths of the PCR products were analyzed using Agilent DNA 500 or DNA 1000 chips. The in vitro ligation products can also be studied by transforming them into competent E. coli cells and then analyzing plasmids isolated from individual colonies. When the product of the first ligation of lambda DNA into the vector was analyzed (FIG. 9B), several peaks were observed. These reflect the insertion of DNA fragments of different lengths into the vector.

When the second ligation product was analyzed, fragments of the expected size were present together with smaller fragments (FIG. 9C).

Although the first-generation vector allows size normalization, with two restriction sequence tags of the expected size from lambda genomic DNA, some unexpected products were detected. The likely cause is self-ligation of the vector during the first ligation reaction. This may result from incomplete dephosphorylation, or it may be induced by the DNA polymerase I treatment, which can remove the dephosphorylated bases from the vector ends. Such problems can be overcome by partial fill-in of the genomic DNA fragments, as described in the example with a single restriction sequence tag. For example, the BamHI site can be partially filled in with dGTP.

Alternatively, the vector can be designed with a BglII site in place of the BamHI site. Ligating BamHI genomic fragments into the BglII-digested vector in the presence of BglII restriction enzyme will prevent self-ligation of the vector. Only the expected vector-insert ligation products destroy the BglII site, so only they will be resistant to digestion.
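This selection works because BamHI (G^GATCC) and BglII (A^GATCT) leave the same 5' GATC overhang, yet a junction between one BamHI half and one BglII half is recognized by neither enzyme. The sketch below checks this for the three junctions that can form. It examines only the 6 bp junction itself and ignores the flanking sequence. It is an illustration with hypothetical function names, not part of the described protocol.

```python
SITES = {"BamHI": "GGATCC", "BglII": "AGATCT"}  # both cut after the first base, leaving 5' GATC


def junction(left_site: str, right_site: str) -> str:
    """Sequence formed when the left half of one site (its first base plus the
    GATC overhang) is ligated to the right half of another (its last base)."""
    return left_site[0] + "GATC" + right_site[-1]


def cut_by(seq: str) -> list[str]:
    """Enzymes from SITES that would recut `seq`."""
    return sorted(name for name, site in SITES.items() if site in seq)


vector_insert = junction(SITES["BglII"], SITES["BamHI"])   # AGATCC
insert_vector = junction(SITES["BamHI"], SITES["BglII"])   # GGATCT
self_ligated = junction(SITES["BglII"], SITES["BglII"])    # AGATCT
print(cut_by(vector_insert), cut_by(insert_vector), cut_by(self_ligated))
```

Only the self-ligated vector regenerates a BglII site, so including BglII in the ligation keeps reopening empty vector while vector-insert products accumulate.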

The BsmFI enzyme was evaluated with a simple construct. A circular plasmid containing a 2000 bp DNA insert in the BamHI site of the first-generation vector was digested with BsmFI (the insert contains no BsmFI site). The 3000 bp band, corresponding to the vector carrying the attached DNA tags, was isolated from an agarose gel. This DNA was treated with Klenow enzyme plus dNTPs to generate blunt ends and then with T4 ligase for the second ligation. As shown in FIG. 9, no band smaller than the expected size of 133 bp appeared. The larger bands appear to be PCR artifacts, because they did not appear in subsequent experiments.

These experiments show that BsmFI cuts accurately at the correct distance and that the tag product can be formed successfully. Generating blunt ends is one way to allow the two restriction sequence tags to be ligated; another is to insert a linker between them.

Another approach is to invert the vector-insert-linker system. In the first ligation, the genomic DNA fragments are joined to a linker. The linker contains a specific cleavage site that is useful for linearizing DNA colonies, so that both strands of the DNA amplified in each colony can be sequenced. After digestion with the type IIS enzyme, a "vector" arm is ligated to the ends cut by the type IIS enzyme.

2) Second-generation vector

A second-generation vector was designed for two purposes. First, it allows two different enzymes to be used for cloning, which, for example, makes it possible to further reduce the average size of the genomic DNA fragments. Second, it prevents self-ligation of the empty vector. A 1000 bp DNA fragment (derived from the BlueScript plasmid pBSK) was included between the restriction sites of the raw vector, to make it easier to separate fully digested plasmid from partially digested plasmid on an agarose gel. The second-generation vector requires neither dephosphorylation nor DNA polymerase I treatment.

The raw vector contains the insert shown in FIG. 10A, which allows the SphI and AccI restriction sites to be used for cloning. The native SphI and AccI sites of the pUC19 plasmid were removed. Because SphI digestion produces 3'-protruding ends, the empty vector cannot self-ligate unless the Klenow enzyme completely removes the overhang. DNA digested with two different enzymes can be inserted into the second-generation vector. FIG. 10A shows several cloning possibilities.

Lambda DNA digested with MspI and SphI was ligated in vitro into the SphI-AccI-opened vector. Analysis of the second ligation product showed a single band of the correct size (FIG. 10B). The second ligation product was transformed into E. coli cells, and 30 colonies were inoculated into liquid culture. Twelve plasmids from the bacterial cultures with the highest densities were analyzed. No plasmid corresponding to the empty vector was observed, and no plasmid whose insert carried variations of two or more bases was observed.

A similar experiment was performed with AluI- and SphI-digested lambda DNA inserted into a HincII- and SphI-digested vector (AluI and HincII both produce blunt ends). FIG. 10C shows the analysis of the first ligation product. As expected, lambda DNA fragments of different sizes were observed with the Agilent 2100 Bioanalyzer DNA 1000 chip; the highest peak is the size marker. FIG. 10D shows the analysis of the second ligation product, in which the Agilent 2100 Bioanalyzer DNA 1000 chip showed only a single fragment of the expected size.

6.2. Example 2: Preparation of DNA Colony Templates: Single Restriction Sequence Tag

This example describes the preparation of DNA colony templates, each containing a single restriction sequence tag from the DNA sample to be genotyped, as shown in FIG. 4A. The restriction sequence tag is the variable part of the template and is expected to account for only about 6% of the size of the DNA colony template. The size normalization built into this protocol therefore ensures efficient and comparable amplification of all DNA colonies. Insertion into a DNA colony vector adds the universal sequences, generating the DNA colony templates.

The overall in vitro cloning strategy used in this example is shown in FIGS. 11A-B. Briefly, a short double-stranded adapter (the "short arm") consists of amplification primer Px followed by the hexanucleotide TCCGAC, which forms the recognition site of the type IIS restriction enzyme MmeI. The 5' end of this oligonucleotide carries a biotin moiety attached through a cleavable disulfide bond. The complementary strand is 5'-phosphorylated and carries protruding nucleotides compatible with the sticky ends of the DNA digested by the initial restriction enzyme. The short arm is ligated to DNA cleaved with the corresponding endonuclease, and the product is then treated with the type IIS enzyme MmeI, which leaves a 20 bp fragment of the DNA attached to the short arm. The conjugates are then purified away from the other DNA fragments with streptavidin beads and ligated to a "long arm" containing a second amplification primer, Py.

Although the cloning strategy relies on DNA cleavage by an endonuclease that recognizes a 6 bp sequence, it is preferable to also digest the DNA with a second, frequently cutting enzyme (RsaI, which has a 4 bp recognition site) to reduce the average DNA fragment size.
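The benefit of the added 4 bp cutter can be estimated with a back-of-envelope model that is not part of the protocol. The model assumes random sequence with equal base frequencies. Real genomes deviate from this: lambda DNA has only 7 HindIII sites, whereas the model predicts about 12 for 48.5 kb. The function name is hypothetical.

```python
def mean_fragment_size(site_lengths: list[int]) -> float:
    """Expected mean fragment length (bp) after digesting random DNA with equal
    base frequencies using enzymes whose (palindromic) sites have the given lengths.
    A site of length L starts at a given position with probability 4**-L, and the
    mean distance between cuts is the reciprocal of the summed per-position rates."""
    rate = sum(4.0 ** -length for length in site_lengths)
    return 1.0 / rate


print(round(mean_fragment_size([6])))        # 6 bp cutter alone (e.g. HindIII)
print(round(mean_fragment_size([6, 4]), 1))  # together with a 4 bp cutter (e.g. RsaI)
```

Under these assumptions, adding RsaI brings the expected mean fragment size down from about 4 kb to about 240 bp.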

The protocol for preparing templates from lambda phage DNA digested with HindIII and RsaI, and the individual steps involved, are summarized below:

i) Digestion of lambda genomic DNA

ii) Ligation to the short Px arm

iii) Digestion with MmeI

iv) Purification of the Px arm-tag conjugates

v) Attachment of the Py arm

vi) Purification of the final DNA colony templates

The protocol for each individual step used in this example is described in detail below.

i) Digestion of bacteriophage lambda genomic DNA

Combine 10 µl of lambda bacteriophage DNA (New England Biolabs, 0.5 µg/µl) with buffer Y+/Tango (Fermentas); 32.5 µl H₂O; 1.25 µl HindIII (New England Biolabs); and 1.25 µl RsaI (New England Biolabs).

Incubate at 37°C for 2 to 16 hours.

This yields a 100 ng/µl solution of lambda phage DNA containing 42 fmol/µl of HindIII ends.
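The 42 fmol/µl figure can be checked from the DNA concentration. The sketch below assumes an average mass of about 650 g/mol per base pair and uses the 7 HindIII sites of the 48,502 bp lambda genome. It gives roughly 44 fmol/µl, in line with the stated value given the uncertainty in the per-base-pair mass; with 660 g/mol, another common choice, it gives about 43.7. The function name is hypothetical.

```python
LAMBDA_BP = 48_502      # bacteriophage lambda genome length (bp)
HINDIII_SITES = 7       # HindIII cuts lambda DNA into 8 fragments
BP_MASS = 650.0         # approximate mass of one base pair (g/mol); an assumption


def fmol_ends_per_ul(ng_per_ul: float, genome_bp: int, n_sites: int,
                     bp_mass: float = BP_MASS) -> float:
    """Concentration (fmol/µl) of ends produced by an enzyme with n_sites sites
    in a linear genome; each site yields two ends (genome termini not counted)."""
    mol_genomes_per_ul = ng_per_ul * 1e-9 / (genome_bp * bp_mass)
    return mol_genomes_per_ul * 2 * n_sites * 1e15


print(round(fmol_ends_per_ul(100, LAMBDA_BP, HINDIII_SITES), 1))
```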

Partially fill in the HindIII overhangs with dATP.

Different protocols can be used to suppress self-ligation of the lambda genomic fragments and of the short arms while maximizing ligation of the HindIII ends of the lambda genomic DNA to the short arms.

Filling in a single base with dATP was found to be the best way to suppress self-ligation of the HindIII ends. The short-arm fragment must be designed to be compatible with the partially filled HindIII ends of the genomic DNA fragments.
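The logic behind this design can be checked on the overhang sequences. HindIII leaves the palindromic 5' overhang AGCT, so unfilled ends can ligate to one another. Adding a single A with dATP alone leaves AGC, which is not self-complementary. The overhang GCT that Short-B (SEQ ID NO: 10) carries beyond its duplex with Short-A (SEQ ID NO: 9) pairs with AGC but not with itself. The sketch below reproduces that reasoning. It models only base pairing, not ligation efficiency, and its function names are hypothetical. The same fill-in function also covers the BamHI/dGTP case mentioned in Example 1.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def partial_fill(overhang: str, dntps: set[str]) -> str:
    """Fill in a 5' overhang (written 5'->3') with only some dNTPs available.
    The recessed 3' end is extended by copying the overhang from its 3' end
    inwards, stopping at the first base whose complementary dNTP is missing."""
    while overhang and overhang[-1].translate(COMPLEMENT) in dntps:
        overhang = overhang[:-1]
    return overhang


def can_anneal(a: str, b: str) -> bool:
    """Two 5' overhangs pair if one is the reverse complement of the other."""
    return a == revcomp(b)


hind = partial_fill("AGCT", {"A"})  # HindIII (A^AGCTT) end, dATP only
bam = partial_fill("GATC", {"G"})   # BamHI (G^GATCC) end, dGTP only

# Short-arm overhang: the part of Short-B that is not paired with Short-A.
short_a = "GAGGAAAGGGAAGGGAAAGGAAGGTCCGAC"
short_b = "GCTGTCGGACCTTCCTTTCCCTTCCCTTTCCTC"
assert short_b.endswith(revcomp(short_a))
arm = short_b[:len(short_b) - len(short_a)]

print(hind, bam, arm)
print("fragment-arm:", can_anneal(hind, arm))
print("fragment-fragment:", can_anneal(hind, hind))
print("arm-arm:", can_anneal(arm, arm))
```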

Fill-in of the HindIII ends:

Mix 20 µl of HindIII-RsaI-digested lambda genomic DNA with 2 µl of 10 mM dATP and 1 µl of Klenow enzyme (New England Biolabs, 5 u/µl).

Incubate at 25°C for 30 minutes, then at 70°C for 20 minutes.

ii) Ligation to the short arm

Care must be taken to prevent the formation of short-arm dimers during the ligation reaction (or to remove them from solution). After partial MmeI digestion, such dimers can give rise to templates of the correct size that contain cloned short-arm fragments.

As noted above, the preferred approach is to use short arms carrying a non-palindromic overhang complementary to the partially filled DNA ends. Alternatively, short arms carrying a dideoxy base at the 3' end can be used, because MmeI can still cleave the DNA when a nick lies immediately after its recognition site. Using non-phosphorylated short arms is another option.

This cloning step is performed with a 10-fold molar excess of short arms over the dATP-filled HindIII ends.

Preparation of the short arms:

Mix 10 µl of a 10 µM solution of the biotinylated oligonucleotide Short-A, 5'-GAGGAAAGGGAAGGGAAAGGAAGGTCCGAC-3' (SEQ ID NO: 9), in Tris-HCl pH 8.0 with 10 µl of a 10 µM solution of oligonucleotide Short-B, 5'-GCTGTCGGACCTTCCTTTCCCTTCCCTTTCCTC-3' (SEQ ID NO: 10), in Tris-HCl pH 8.0. Oligonucleotide Short-A contains a cleavable disulfide bond between the biotin and its 5' end.

Heat to 80°C and cool slowly to room temperature over 30 minutes.

Ligation:


Add 3 µl of 10 mM riboATP, 4 µl of 5 µM short arms and 1 µl of T4 DNA ligase (New England Biolabs, 400 u/µl) to the partially filled HindIII ends of the genomic DNA mixture, and incubate at 16°C for 1 hour.

Purify the DNA according to the Qiagen MinElute Reaction Cleanup protocol. Elute with 12 µl of Buffer EB, then repeat the elution with 5 µl of fresh Buffer EB without changing tubes.

Under these ligation conditions, ligation of the blunt, RsaI-generated ends does not cause significant polymerization of the genomic DNA fragments.

To remove the large excess of unligated arms, it is preferable to purify the sample on a Qiagen MinElute column rather than heat-inactivate the T4 ligase. Double elution can increase recovery of the reaction product.

iii) MmeI digestion

Efficient digestion by MmeI is a critical step that determines the template yield. The enzyme should be used at no more than 1-2 units per μg of DNA. According to New England Biolabs, excess enzyme blocks endonuclease cleavage.

Add to the sample mixture 2 μl of Buffer Y+/Tango (Fermentas), 2 μl of 1 mM SAM and 1 μl (2 U) of MmeI (New England Biolabs).

Incubate at 37 °C for 1 hour.

iv) Binding to and release from streptavidin beads

According to the manufacturer, 30 minutes is sufficient for binding the DNA to the beads, but overnight incubation increases product yield. Cleavage of the disulfide bond with 200 mM DTT, and hence release of the DNA, is complete after 30 minutes. After this step it is useful to analyze the yield of the desired product and the efficiency of MmeI digestion. Undigested product appears as large DNA fragments.

Binding to and release from SA beads:

Add 10 μl of washed SA 280 beads (Dynal), resuspended in 20 μl of 2× B&W buffer (prepared according to the Dynal protocol).

Incubate overnight at room temperature with shaking. Wash the beads twice with 40 μl of 1× B&W buffer, then twice with 40 μl of 100 mM Tris-HCl pH 8.0. Add 11 μl of 200 mM DTT in 80 mM Tris-HCl pH 8.0 to the beads.

Incubate for 30 minutes at room temperature with shaking.

Separate the supernatant from the beads and discard the beads. If needed, analyze 1 μl of the supernatant on an Agilent 2100 Bioanalyzer DNA 1000 chip.

v) Ligation of the long arm moiety

This ligation depends on pairing with the two random bases present in the 3' overhang generated in the genomic DNA by MmeI digestion. Because these two bases differ from fragment to fragment, the ligation is slow and requires an increased enzyme concentration (New England Biolabs, MmeI information note).
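A sketch of MmeI cleavage shows why the long-arm ligation must accept any two bases. It assumes the commonly listed MmeI specificity TCCRAC(20/18): the top strand is cut 20 nt beyond the site and the bottom strand 18 nt beyond it. This leaves a 2-nt 3' overhang whose sequence is whatever the genome carries at positions 19-20 of the tag. The function name and the genomic sequence are hypothetical.

```python
import re

MMEI_SITE = re.compile("TCC[AG]AC")   # TCCRAC
TOP_CUT, BOTTOM_CUT = 20, 18           # assumed MmeI cut distances (N20/N18)


def mmei_digest(top: str) -> tuple[str, str]:
    """Cut at the first MmeI site in `top` (written 5'->3', site pointing into
    the insert). Returns the 20 nt of insert kept on the adaptor-side top strand
    and the 2-nt 3' overhang that end carries."""
    match = MMEI_SITE.search(top)
    if match is None:
        raise ValueError("no MmeI site found")
    start = match.end()
    if start + TOP_CUT > len(top):
        raise ValueError("too little sequence downstream of the site")
    tag = top[start:start + TOP_CUT]
    overhang = top[start + BOTTOM_CUT:start + TOP_CUT]
    return tag, overhang


adaptor = "GAGGAAAGGGAAGGGAAAGGAAGGTCCGAC"   # oligo short-A
insert = "AGCTTGCATGCAAGCTTGGCACTGGCCGTC"    # hypothetical genomic sequence
tag, overhang = mmei_digest(adaptor + insert)
print(tag, overhang)  # AGCTTGCATGCAAGCTTGGC GC
```

Since the overhang differs between fragments, the long arm carries a degenerate NN overhang so that it can pair with every one of them.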

Preparation of the long arm

To a tube containing pre-made PCR beads (Amersham), add 19 μl of H2O; 1 μl of pUC19 plasmid DNA at 1 ng/μl (positions 571-870 are to be amplified); 10 μM oligo long-A 5'-CTCACATTAATTGCGTTGCGNNCACTGCCCGCTTTCCAG-3' (SEQ ID NO: 11); and 10 μM oligo long-B 5'-CACCAACCCAAACCAACCCAAACCGAAAAACGCCAGCAACG-3' (SEQ ID NO: 12). Run the amplification on a PTC-200 thermal cycler (MJ Research) with the following program: 94 °C for 20 min 30 s; 25 cycles of 94 °C for 30 s, 55 °C for 30 s and 72 °C for 30 s; then 72 °C for 10 min.

The expected length of the amplification product is 323 bp. Purify the reaction product on a Qiagen column and assess its purity and concentration on an Agilent 2100 Bioanalyzer DNA 1000 chip.

The PCR product must then be digested with BtsI. The amount of enzyme and the incubation time depend on the amount of PCR product. Digestion efficiency should be assessed on an Agilent 2100 Bioanalyzer DNA 1000 chip, where a size change from 323 bp to 301 bp is expected. Once digestion is complete, purification on a Qiagen column (PCR product purification protocol) is sufficient to remove the small 22 bp product from the reaction. Alternatively, the 301 bp fragment can be purified on a 2% agarose gel.
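The expected shift from 323 bp to 301 bp follows from where BtsI cuts oligo long-A. The sketch makes two assumptions. First, BtsI has the specificity GCAGTG(2/0). Second, the top strand of the PCR product starts at the 5' end of long-A. In long-A the BtsI site appears as its reverse complement, CACTGC. The enzyme therefore removes the 22 nt upstream of that site and leaves a 2-nt 3' overhang complementary to the degenerate NN. The helper name is hypothetical.

```python
LONG_A = "CTCACATTAATTGCGTTGCGNNCACTGCCCGCTTTCCAG"  # SEQ ID NO: 11
PCR_PRODUCT_BP = 323                              # expected amplicon size (text)

# BtsI recognizes GCAGTG and cuts N2/N0 (assumed), leaving a 2-nt 3' overhang.
# On the long-A strand the site reads as its reverse complement.
BTSI_SITE_RC = "CACTGC"


def btsi_trim(primer: str, product_bp: int) -> tuple[int, int, str]:
    """Return (top-strand bases removed, remaining length, bases whose
    complement forms the 3' overhang of the trimmed long arm)."""
    i = primer.find(BTSI_SITE_RC)
    if i < 2:
        raise ValueError("site missing or too close to the 5' end")
    removed = i                      # top strand is cut right before CACTGC
    overhang_template = primer[i - 2:i]  # bottom strand extends 2 nt further
    return removed, product_bp - removed, overhang_template


print(btsi_trim(LONG_A, PCR_PRODUCT_BP))  # (22, 301, 'NN')
```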

Ligation of the long arm moiety:

To the supernatant released from the beads, add 2 μl of 100 mM MgCl2, 2 μl of 10 mM rATP, 5 μl of long arm and 1 μl of concentrated T4 DNA ligase (New England Biolabs, 2000 U/μl).

Incubate overnight at 16 °C.

If needed, analyze 1 μl of the reaction on an Agilent 2100 Bioanalyzer DNA 1000 chip.

vi) Final template purification

In the final purification step, the desired ligated template is separated from free long arm, occasional long-arm dimers and unreacted 50 bp products. Heating the template under denaturing conditions should be avoided to minimize separation of the template strands.

Final purification of the template:

Load the entire sample onto a 2% agarose gel. Run it until the free long arm (301 bp), the template (350 bp) and long-arm dimers (600 bp) are well separated. Excise the template band from the gel.

Purify the DNA with the Clontech Montage Agarose kit or the Qiagen MinElute Agarose Extraction kit. With the Qiagen kit, the gel dissolves within 15 minutes at room temperature, so the tube does not need to be warmed to 50 °C as the protocol recommends. If needed, analyze the final product on an Agilent 2100 Bioanalyzer DNA 1000 chip.

Size normalization was then verified. The same experiment was performed in parallel on bacteriophage lambda genomic DNA or human genomic DNA labeled with ³³P-dATP.

Figure 11C shows an autoradiographic analysis of aliquots collected after the various steps of the process. Lane 1: PCR product of the complete DNA colony vector (350 bp). Lanes 2-6: lambda genomic DNA. Lanes 7-10: human genomic DNA. Lanes 3 and 7: after ligation to the short arm. Lanes 4 and 8: after MmeI digestion, where size normalization is observed. Lanes 5, 6, 9 and 10: after ligation to the long arm, giving DNA colony vectors of the desired size.

DNA colonies were then generated as follows. The method of WO 00/18957 was applied to DNA colony vectors prepared as described in this example. These vectors contain lambda or human genomic DNA fragments that were digested with HindIII and size-normalized with MmeI. Figure 11D shows DNA colonies of lambda DNA. Figure 11E shows lambda DNA (left column) or human DNA (first three images of the right column). These DNA colonies are then sequenced in situ by the method of WO 98/44152 to identify restriction sequence tags.

The size of the DNA colony vector was also verified by PCR amplification. The PCR products were cloned into the pUC19 plasmid and transformed into competent E. coli cells (XL-2 Blue, Stratagene). Minipreps from individual clones were then sequenced. The restriction sequence tags had the expected size of 20 bp, although 21-base tags were recovered from some clones. No tag shorter than 20 bases was found.

Fingerprinting experiments showed that all 14 expected HindIII fragment ends of lambda DNA were present in the DNA colony vectors. After MmeI treatment and long-arm ligation, the fragments were purified from an agarose gel. Primer extension was then performed in the presence of three dNTPs and one dideoxynucleotide (for example dATP, dTTP, dCTP and ddGTP). The products were analyzed on an acrylamide gel, which allowed each expected fragment to be identified.

When a 6-base cutter that generates 4-base overhangs is used for cloning, information on 21 consecutive bases can be obtained from the prepared template. Of these 21 bases, 6 are known bases forming the endonuclease recognition site, and 15 can be used to detect genetic variation. For some enzymes this number can be increased if the sticky end of the short arm overlaps the MmeI site. For example, TCCGA ligated to the NcoI end CATGG forms an MmeI site.
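The NcoI example can be checked by scanning the ligation junction for TCCRAC. The hypothetical helper below returns the number of bases of the genomic end that belong to an MmeI site created at the junction. MmeI cuts a fixed distance beyond its site, so each such base moves the cut one base further into the genomic fragment.

```python
import re

MMEI_SITE = re.compile("TCC[AG]AC")  # TCCRAC


def site_overlap(adaptor_end: str, genomic_end: str) -> int:
    """Bases of `genomic_end` that are part of an MmeI site spanning the
    ligation junction (0 if the site lies wholly in the adaptor or is absent)."""
    junction = adaptor_end + genomic_end
    overlap = 0
    for match in MMEI_SITE.finditer(junction):
        overlap = max(overlap, match.end() - len(adaptor_end))
    return overlap


print(site_overlap("TCCGA", "CATGG"))      # 1: the C of the NcoI end completes TCCGAC
print(site_overlap("GGTCCGAC", "AGCTT"))   # 0: the site lies entirely in the adaptor
```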

Two variants of the standard protocol are also used to increase the power of the cloning. In the first variant, a blunt-end-generating enzyme was used. If the enzyme used to cleave the DNA has a 6-base recognition sequence and leaves blunt ends, information on 23 consecutive bases can be obtained (6 known and 17 for SNP detection). Blunt-end ligation is less efficient, so a longer ligation time is needed. Nevertheless, overnight ligation gives sufficient efficiency. The template yield obtained with MscI-digested lambda DNA was similar to that obtained with HindIII-digested lambda DNA.

Analysis of the plasmids obtained by inserting the amplified template into pUC19 showed the following:

(1) no "templates" containing short-arm dimers;

(2) only a small amount of undesired product (1-2 of 18 clones);

(3) good representation of different lambda genomic DNA fragments among the templates (only 3 fragments were found twice among 15 templates).

In another variant, blunt ends were generated artificially. Removing the 4-base overhang left by the initial DNA digestion increases the cloned information to 25 bases (6 known and 19 for SNP detection). In preliminary experiments, two enzymes capable of removing overhangs were examined. Mung bean nuclease (New England Biolabs) failed to produce blunt ends efficiently, whereas removal of 3' overhangs by the Klenow enzyme gave satisfactory results. In the presence of the appropriate deoxynucleotides, the Klenow enzyme can also prevent ends produced by a frequent cutter from taking part in ligation. For example, suppose the DNA is digested with PstI and MspI. In the presence of dCTP, the Klenow enzyme trims the PstI ends and converts the MspI ends into inactive single-base 5' overhangs (Figure 12).

6.3. Example 3: Detection of genome-wide sequence variations

This example describes an embodiment of the invention used to generate large numbers of restriction sequence tags from a complex genome in a reproducible way. These tags are useful in three ways. They reveal genetic variation between genomes without prior knowledge of the variants. They identify, comprehensively and without hypotheses based on prior knowledge, the variants associated with a phenotype specific to a population of individuals. Finally, because so many tags are obtained, they allow such variants to be assigned to genomic regions of minimal size.

The method disclosed in this example uses the same restriction endonuclease to generate identical restriction fragments from different genomic DNA samples. After amplification, the ends of these restriction fragments are sequenced. The sequences are then processed to identify restriction sequence tags: short nucleotide sequences located immediately next to the recognition sites of the restriction enzymes used to digest the genomic DNA.
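The notion of a restriction sequence tag can be illustrated by pulling out the bases next to each recognition site in a known sequence. The sketch below is a simplification. It takes the `tag_len` bases on each side of every occurrence of a palindromic site (HindIII's AAGCTT by default, an arbitrary choice here) and reads each flank 5'->3' away from the site. The real tags in this example come from sequencing DNA colonies and are counted relative to the half-site. The function names are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def restriction_tags(genome: str, site: str = "AAGCTT", tag_len: int = 15) -> list[str]:
    """Return the tag_len bases flanking each occurrence of a palindromic `site`,
    each read 5'->3' away from the site (upstream flanks are reverse-complemented).
    Flanks cut short by the ends of `genome` are skipped."""
    tags = []
    for m in re.finditer(re.escape(site), genome):
        upstream = genome[max(0, m.start() - tag_len):m.start()]
        downstream = genome[m.end():m.end() + tag_len]
        if len(upstream) == tag_len:
            tags.append(reverse_complement(upstream))
        if len(downstream) == tag_len:
            tags.append(downstream)
    return tags


print(restriction_tags("ACGTAAAGCTTGGATC", tag_len=5))  # ['TACGT', 'GGATC']
```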

For each individual in the population under study (for example, each patient in a clinical study), the method is carried out in the following steps:

1) Extraction of genomic DNA

Genomic DNA is extracted with standard protocols from biological samples of the different individuals, such as buccal swabs or blood samples. Typically, 0.5 to 3 μg of genomic DNA is extracted from a buccal swab, and 4 μg from 100 μl of whole blood. One diploid human genome contains about 6 pg of DNA. These amounts therefore correspond to more than 80,000 to more than 600,000 copies of the diploid genome, which is sufficient for this purpose.
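The copy numbers follow from the text's estimate of about 6 pg of DNA per diploid genome, using 1 μg = 10^6 pg. This short sketch reproduces that arithmetic.

```python
PG_PER_DIPLOID_GENOME = 6.0  # approximate DNA content of one diploid human genome (text)


def diploid_copies(dna_ug: float) -> float:
    """Diploid genome equivalents in `dna_ug` micrograms of DNA."""
    return dna_ug * 1e6 / PG_PER_DIPLOID_GENOME  # 1 ug = 1e6 pg


for label, ug in [("buccal swab, low", 0.5), ("buccal swab, high", 3.0), ("100 ul blood", 4.0)]:
    print(f"{label}: {diploid_copies(ug):,.0f} copies")
# buccal swab, low: 83,333 copies
# buccal swab, high: 500,000 copies
# 100 ul blood: 666,667 copies
```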

2) Restriction digestion

The restriction endonuclease is chosen according to the desired density of restriction sequence tags. This density depends directly on the average distance between two recognition sites, which equals the average size of the genomic restriction fragments obtained. The goal is at least one cut every 5000 bases on average. A restriction enzyme with a 6-base recognition site is therefore used, which is expected to produce fragments with an average size of 4096 bases. About 1,400,000 genomic restriction fragments are thus generated from each diploid human genome of about 6 billion bases. Each genomic restriction fragment yields two restriction sequence tags, so more than 2.8 million different restriction sequence tags are generated per diploid human genome. As discussed below, the restriction sequence tags in this example are 15 bases long, and a polymorphism is found about every 500 bases in the human genome. On this basis, the 2.8 million tags are estimated to reveal about 80,000 polymorphisms per patient, or one polymorphism for every 35,000 bases of human genome sequence.

The number of restriction sequence tags obtained per individual can be adjusted by using different restriction enzymes or combinations of restriction enzymes. To increase the number of tags, several restriction enzymes can be used in combination, or the method can be repeated sequentially with different enzymes. To reduce the number of tags, enzymes with longer recognition sites can be used alone or in combination.
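The estimates in the two paragraphs above can be reproduced under a simple model. It treats the genome as a random sequence with equal base frequencies. Under that model a k-base site occurs on average once every 4^k bases, and the cut rates of independent enzymes add. Real genomes deviate from this model (base composition, CpG depletion, repeats), so the results are rough expectations only. The haploid size of 3 × 10^9 bases is an assumption, taken as half the diploid figure in the text. The exact arithmetic gives about 1.46 million fragments and 88,000 polymorphisms, which the text rounds down to 1.4 million and 80,000.

```python
DIPLOID_BP = 6e9    # diploid genome size (text)
HAPLOID_BP = 3e9    # assumption: half the diploid figure
TAG_BP = 15         # informative bases per tag (text)
BP_PER_SNP = 500    # polymorphism density assumed in the text


def mean_fragment_size(site_lengths: list[int]) -> float:
    """Expected distance between cuts when enzymes with the given recognition-site
    lengths digest a random sequence with equal base frequencies."""
    return 1.0 / sum(4.0 ** -k for k in site_lengths)


size = mean_fragment_size([6])          # one 6-base cutter
fragments = DIPLOID_BP / size
tags = 2 * fragments                    # two tags per fragment
polymorphisms = tags * TAG_BP / BP_PER_SNP
print(f"{size:.0f} bp, {fragments:,.0f} fragments, {tags:,.0f} tags, "
      f"{polymorphisms:,.0f} polymorphisms, one per {HAPLOID_BP / polymorphisms:,.0f} bp")

# Two 6-base cutters together halve the mean fragment size; an 8-base cutter
# makes fragments 16 times longer.
print(mean_fragment_size([6, 6]), mean_fragment_size([8]))  # 2048.0 65536.0
```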

When the method is repeated on the same sample or on different samples, the same restriction fragments must be generated, apart from differences caused by genomic sequence variation between the samples. Only then are the same restriction sequence tags obtained. In principle, different restriction enzymes sharing the same recognition site, such as isoschizomers, could be used. In this example, however, the same enzyme from the same source and the same supplier is used.

Restriction digestion is performed on 10 to 20 copies of the diploid genome per patient. Each restriction sequence tag is thus obtained in replicate, which allows it to be confirmed.

3) Insertion of the genomic restriction fragments into amplification and sequencing vectors

In this example, DNA colonies are used to amplify and sequence the genomic restriction fragments.

The genomic restriction fragments are joined to a DNA colony vector, i.e. an engineered nucleic acid of predetermined sequence, in a ligation reaction that produces circular molecules. The DNA colony vector has the following features:

(1) two ends that are compatible with, and preferably cohesive with, the ends of the digested genomic DNA fragments; these ends are dephosphorylated to prevent self-ligation of the vector;

(2) two recognition sites for a type IIS restriction enzyme such as BsmFI, BceAI, Eco57I or MmeI, each located at an end of the vector and oriented so as to direct cleavage within the genomic restriction fragment joined to the vector;

(3) binding sites for two sequencing primers, each located close to an end of the vector and oriented so that primer extension proceeds into the joined genomic restriction fragment;

(4) binding sites for two amplification primers, oriented so that the inserted fragment and part of the vector can be amplified; these may overlap the sequencing primer sequences; and

(5) optionally, a recognition site for a rare-cutting restriction enzyme located outside the region amplified with the amplification primers.

Additional features of DNA colony vectors include, for example, further restriction sites or spacer sequences within the amplified region, such as sites for linearizing the DNA colonies.

To prevent concatemerization of the genomic restriction fragments, the DNA colony vector molecules are used in molar excess over the genomic restriction fragments.

4) Normalization of insert size

The circular DNA molecules, each containing a DNA colony vector joined to a genomic restriction fragment, are then digested with the type IIS restriction enzyme. BceAI, for example, cuts 14 bases into the inserted genomic fragment. A fill-in reaction is then carried out with a DNA polymerase such as the Klenow fragment of DNA polymerase I or T4 DNA polymerase. The resulting blunt ends are ligated to give circular molecules containing 28 bases of the joined genomic restriction fragment, i.e. 14 bases from each of its ends.

Longer inserts are obtained with enzymes such as MmeI, which cuts 20 bases outside its recognition site. However, MmeI leaves a 2-base 3' overhang, and treatment with a DNA polymerase such as the Klenow fragment of DNA polymerase I or T4 DNA polymerase removes these 2 bases. In this case, the joined genomic restriction fragment that remains is 36 bases long.

5) Generation of DNA colony templates

DNA colony templates are generated by one or more cycles of PCR amplification in the presence of the amplification primers. From 5' to 3', the DNA colony template molecule contains:

(1) the first amplification primer sequence, in the forward orientation;

(2) the first sequencing primer sequence, in the forward orientation (it may overlap the first amplification primer sequence);

(3) the first recognition site for the type IIS restriction enzyme;

(4) the 28 or 36 bases of the joined genomic restriction fragment left by the size-normalization step (including half of the recognition site of the restriction enzyme used to digest the genomic DNA);

(5) the second recognition site for the type IIS restriction enzyme;

(6) the second sequencing primer sequence, in the reverse orientation (it may overlap the second amplification primer sequence); and

(7) the second amplification primer sequence, in the reverse orientation.

Alternatively, the DNA colony template can be generated from the circular molecules obtained in the previous step by simple restriction digestion. This uses a rare-cutting enzyme that cleaves the DNA colony vector outside the region amplified by the amplification primers.

6) Generation of DNA colonies

The first step in generating DNA colonies is to attach the DNA colony template molecules and the amplification primers to a solid surface. Examples are functionalized glass or a plastic surface such as NucleoLink tubes (Nunc, Roskilde, DK). The concentrations are chosen so that, after attachment, the surface is densely covered with amplification primer molecules. The DNA colony template molecules are spaced so that each can be amplified locally into a DNA colony by the attached amplification primers, with appropriate redundancy. In the example in which 1.4 million restriction fragments are generated, about 30 million DNA colonies are generated per square centimeter of surface.

Amplification is performed by an isothermal method (as described in Section 5.3 and PCT publication WO 02/46456).

7) Sequencing of DNA colonies

After amplification, the DNA colonies are made single-stranded by restriction digestion followed by denaturation. The first sequencing primer is then hybridized to the DNA colony vector sequence. Next, the surface is incubated with a DNA polymerase, such as T7 DNA polymerase, and only one of the four possible nucleotides. This nucleotide is supplied as a mixture of its fluorescently labeled and unlabeled forms, so that about one in ten incorporated nucleotides is labeled. The nucleotides are incorporated at the 3' end of the primer if they are complementary to the sequence of the DNA colony molecule. After the primer extension step, images are taken with a fluorescence microscope (Axiovert 200, Zeiss, Germany, equipped with an ORCA-ER CCD camera, Hamamatsu, Japan) to measure the fluorescence intensity and position of each DNA colony. The process is repeated stepwise, cycling through the four nucleotides one at a time. In each step, a given base is offered for incorporation and the resulting signal is measured for every DNA colony on the surface. Colonies that incorporate one or more bases in a step become brighter in proportion to the number of bases incorporated. Colonies that incorporate no base keep the same intensity. The number of bases incorporated into each DNA colony is found by comparing its fluorescence intensity after each incorporation step with the intensity before that step. The DNA sequence of each colony is then determined by tracking the successive intensity changes and relating each change to the base offered in that extension step.
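The stepwise readout can be sketched as a simple base caller. For each colony, subtract the intensity before each extension step from the intensity after it, divide by the signal expected for one incorporated base, and round. The calibration value `unit`, the function name and the readings are hypothetical. A real pipeline would also correct for background, bleaching and incomplete extension, which this sketch ignores. The result is the read of the newly synthesized strand, i.e. the complement of the colony template.

```python
def call_bases(flow_order: str, cumulative: list[float], unit: float) -> str:
    """Convert per-step cumulative colony intensities into a read.

    flow_order: nucleotide offered in each extension step, e.g. "ACGTACGT".
    cumulative: colony intensity measured after each step (includes earlier signal).
    unit: intensity increase for one incorporated base (hypothetical calibration;
          the text only states that the increase is proportional)."""
    if len(flow_order) != len(cumulative):
        raise ValueError("one intensity reading per step is required")
    read, previous = [], 0.0
    for base, total in zip(flow_order, cumulative):
        n = max(0, round((total - previous) / unit))  # bases added in this step
        read.append(base * n)
        previous = total
    return "".join(read)


# Hypothetical, slightly noisy readings for one colony over two cycles of A, C, G, T.
readings = [1.05, 1.02, 3.10, 4.00, 3.95, 5.10, 5.00, 4.98]
print(call_bases("ACGTACGT", readings, unit=1.0))  # AGGTC
```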

The sequencing steps are repeated until 28 or 36 bases of the genomic fragment have been read. Fewer bases need to be sequenced if the sequencing primers extend over half of the recognition site of the restriction enzyme used to digest the genomic DNA.

If necessary, the extended first sequencing primer can be removed by denaturation and washing, and the complementary strand can then be sequenced with the second sequencing primer.

8) Restriction sequence tags

The sequences obtained from the DNA colonies are processed to identify the two restriction sequence tags derived from each original genomic restriction fragment. For example, when MmeI is used to normalize the size of the joined restriction fragments, each end contributes 18 bases. Three of these bases belong to half of the restriction site used to digest the genomic DNA; removing them leaves a 15-base restriction sequence tag. When BceAI is used, the restriction sequence tag is 11 bases long.
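The insert and tag lengths for BceAI and MmeI follow from where each enzyme cuts and from how the polymerase treats the overhang. The sketch assumes the NEB-listed cut distances, ACGGC(12/14) for BceAI and TCCRAC(20/18) for MmeI, and the 3 half-site bases stated in the text. The class and method names are hypothetical.

```python
from dataclasses import dataclass

HALF_SITE = 3  # bases of the 6-base digestion site carried by each fragment end (text)


@dataclass(frozen=True)
class TypeIISEnzyme:
    name: str
    top_cut: int     # nt from the end of the recognition site to the top-strand cut
    bottom_cut: int  # nt from the end of the recognition site to the bottom-strand cut

    def bases_per_end(self) -> int:
        # After polymerase treatment the blunt end lies at the bottom-strand cut
        # in both cases: a 5' overhang (e.g. 12/14) is filled in, and a 3'
        # overhang (e.g. 20/18) is trimmed back by the 3'->5' exonuclease.
        return self.bottom_cut

    def insert_length(self) -> int:
        return 2 * self.bases_per_end()   # one piece from each fragment end

    def tag_length(self) -> int:
        return self.bases_per_end() - HALF_SITE


for enzyme in (TypeIISEnzyme("BceAI", 12, 14), TypeIISEnzyme("MmeI", 20, 18)):
    print(enzyme.name, enzyme.insert_length(), enzyme.tag_length())
# BceAI 28 11
# MmeI 36 15
```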

These two restriction sequence tags represent the ends of the original genomic restriction fragment. The two tags obtained from each DNA colony lie physically close together in the genome (on average 4096 bases apart) and are stored for further use. A tag's position in the genome is determined from its 15 or 11 bases together with the 6 bases of the recognition site of the restriction enzyme used to digest the genomic DNA, i.e. from a 21- or 17-base sequence.

9) Aligning restriction sequence tags and identifying sequence variations associated with the phenotype

The restriction sequence tags are then compared with a computer program to identify the distinct tags and to count each restriction sequence tag for each individual. The tags are then compared between individuals to identify groups of homologous tags and sequence variations associated with a particular phenotype of the population. The comparison can use statistical analyses known in the art, such as hidden Markov models or clustering methods. Tags can also be compared with previously obtained tags or with sequences from databases.
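The grouping step can be sketched with exact matching. Identical tags from different individuals fall into the same group. Group sequences of equal length that differ at a single position are then flagged as candidate single-base variants. This is a deliberately simple stand-in, not the hidden Markov model or clustering analyses mentioned above. The function names and tags are hypothetical. The pairwise comparison is quadratic, so a real data set of millions of tags would need an index.

```python
from collections import Counter, defaultdict


def group_tags(tags_by_individual: dict[str, list[str]]) -> dict[str, Counter]:
    """Group identical tags across individuals: tag -> Counter{individual: count}."""
    groups = defaultdict(Counter)
    for individual, tags in tags_by_individual.items():
        for tag in tags:
            groups[tag][individual] += 1
    return dict(groups)


def one_base_neighbours(groups: dict[str, Counter]) -> list[tuple[str, str]]:
    """Pairs of equal-length group sequences differing at exactly one position."""
    keys = sorted(groups)
    pairs = []
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1:
                pairs.append((a, b))
    return pairs


# Hypothetical 15-base tags for three individuals.
tags = {
    "individual_1": ["ACGTACGTACGTACG", "TTTTTCCCCCGGGGG"],
    "individual_2": ["ACGTACGTACGTACG", "TTTTTCCCCCGGGGA"],
    "individual_3": ["ACGTACGTACGTACG"],
}
groups = group_tags(tags)
print(len(groups))                  # 3
print(one_base_neighbours(groups))  # [('TTTTTCCCCCGGGGA', 'TTTTTCCCCCGGGGG')]
```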

Comparing restriction sequence tags between two populations can yield different kinds of results. For a given sequence variation, the ratio of the two variant forms in population 1 may differ from the ratio in population 2.

In other cases, the proportions of the various types of sequence variation may be similar or identical in the two populations, but analyzing the specific combination of variants carried by each individual may reveal that certain combinations occur at different frequencies in the two populations.

Examples of tag groups that can be obtained with the method of the invention are as follows:

In individual 1, the following tags were determined:

In individual 2, the following tags were determined:

In individual 3, the following tags were determined:

From the above results:

Tags T1a, T2a and T3a are identical and form group g1, with group sequence Sg1 = T1a;

Tags T1b and T2b are identical and form group g2, with group sequence Sg2 = T2b;

Tags T1c and T3b are identical and form group g3, with group sequence Sg3 = T1c;

Tags T1e and T2c are identical and form group g4, with group sequence Sg4 = T1e;

Tags T1f and T3c are identical and form group g5, with group sequence Sg5 = T1f.

It can further be observed that:

Sg2 and Sg3 are identical except at a single base, but each of them differs substantially from Sg1, Sg4 and Sg5; and

Sg4 and Sg5 are identical except at a single base, but each of them differs substantially from Sg1, Sg2 and Sg3.

Higher-level groups can then be formed: group G1 from Sg2 and Sg3, group G2 from Sg4 and Sg5, and group G3 from Sg1 alone.
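The step from identical-sequence groups (g1 to g5) to higher-level groups (G1 to G3) can be sketched as single-linkage clustering: any two group sequences of equal length that differ at no more than one base are merged, using a union-find structure. Single linkage and the one-mismatch threshold are assumptions chosen to reproduce the example; the toy sequences in the test are invented stand-ins for Sg1 to Sg5.

```python
from collections import defaultdict


def cluster_by_single_base(seqs: dict, max_mismatches: int = 1) -> list:
    """Merge equal-length group sequences differing at <= max_mismatches bases.

    seqs maps a group name (e.g. "Sg2") to its sequence; returns a list of sets
    of group names, sorted for reproducible output.
    """
    names = list(seqs)
    parent = {n: n for n in names}

    def find(n):
        # Path halving keeps the union-find trees shallow.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sa, sb = seqs[a], seqs[b]
            if len(sa) == len(sb) and sum(x != y for x, y in zip(sa, sb)) <= max_mismatches:
                parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for n in names:
        clusters[find(n)].add(n)
    return sorted(clusters.values(), key=lambda s: sorted(s))
```

Because linkage is transitive, a chain of single-base differences would end up in one cluster; a stricter rule (for example, requiring every pair to differ by one base) would be a different design choice.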

Because each individual carries two sets of chromosomes, it can be inferred that:

(1) individual 1 carries two copies of Sg1 and one copy each of Sg2, Sg3, Sg4 and Sg5;

(2) individual 2 carries two copies each of Sg1, Sg2 and Sg4; and

(3) individual 3 carries two copies each of Sg1, Sg3 and Sg5.
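The copy-number inference above relies on each individual being diploid. A minimal sketch, assuming the number of reads observed for each member of a group is roughly proportional to the number of chromosomes carrying it, scales the within-group read fractions to two copies and rounds. The read counts in the test are hypothetical; uneven coverage or sequencing errors would call for a proper genotype-calling model.

```python
def diploid_copies(read_counts, group_members):
    """Estimate copies of each member of one group in a diploid individual.

    read_counts: mapping group sequence -> number of reads seen in this individual.
    Members whose estimate rounds to 0 copies are omitted from the result.
    """
    observed = {m: read_counts.get(m, 0) for m in group_members}
    total = sum(observed.values())
    if total == 0:
        return {}  # group not observed in this individual
    copies = {}
    for member, n in observed.items():
        estimate = round(2 * n / total)
        if estimate > 0:
            copies[member] = estimate
    return copies
```

With reads for both Sg2 and Sg3 this yields one copy each (the pattern of individual 1 in group G1); with reads for Sg2 only it yields two copies (individual 2).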

Typical result 1

In population 1, the following were found:

Sequence tag Sg1: 1000 copies

Sequence tag Sg2: 327 copies

Sequence tag Sg3: 673 copies

Sequence tag Sg4: 521 copies

Sequence tag Sg5: 479 copies

In population 2, the following were found:

Sequence tag Sg1: 1000 copies

Sequence tag Sg2: 345 copies

Sequence tag Sg3: 665 copies

Sequence tag Sg4: 502 copies

Sequence tag Sg5: 498 copies

Because the composition of groups G1, G2 and G3 does not differ significantly between population 1 and population 2, it can be concluded that these groups are not associated with the phenotypic difference between the populations.

Typical result 2

In population 1, the following were found:

Sequence tag Sg1: 1000 copies

Sequence tag Sg2: 993 copies

Sequence tag Sg3: 7 copies

Sequence tag Sg4: 521 copies

Sequence tag Sg5: 479 copies

In population 2, the following were found:

Sequence tag Sg1: 1000 copies

Sequence tag Sg2: 946 copies

Sequence tag Sg3: 54 copies

Sequence tag Sg4: 502 copies

Sequence tag Sg5: 498 copies

Because the composition of groups G2 and G3 does not differ significantly between population 1 and population 2, it can be concluded that these groups are not associated with the phenotypic difference between the populations. Because the composition of group G1 differs significantly between population 1 and population 2, it can be concluded that this group is associated with the phenotypic difference between the populations. Moreover, individuals carrying the Sg3 sequence are more likely to belong to population 2 than individuals carrying Sg2.
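The text does not say which test decides whether a difference is significant. As one conventional choice, the sketch below applies Pearson's chi-square test (without continuity correction) to the 2x2 table of copy counts of the two members of a group in the two populations, treating each copy as an independent observation, which is an assumption. For one degree of freedom the upper-tail p-value equals erfc(sqrt(chi2 / 2)), so only the standard library is needed. Applied to the counts above, Sg2 versus Sg3 (the members of group G1) gives a chi-square of about 37 in typical result 2 but only about 0.5 in typical result 1, while Sg4 versus Sg5 (group G2) gives about 0.7 in typical result 2.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows are populations, columns are the two members of a group (e.g. Sg2, Sg3).
    Returns (chi2, p) with p the upper-tail probability for 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square(1) upper tail
    return chi2, p
```

In a genome-wide screen thousands of groups would be tested, so the significance threshold would also need correcting for multiple comparisons.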

Typical result 3

In population 1, the following were found:

Sequence tag Sg1: 1000 copies

Sequence tag Sg2: 314 copies

Sequence tag Sg3: 686 copies

Sequence tag Sg4: 486 copies

Sequence tag Sg5: 514 copies

In population 2, the following were found:

Sequence tag Sg1: 1000 copies

Sequence tag Sg2: 289 copies

Sequence tag Sg3: 711 copies

Sequence tag Sg4: 511 copies

Sequence tag Sg5: 489 copies

The composition of each of groups G1, G2 and G3 does not differ significantly between population 1 and population 2. However, the data can be analyzed further by counting how many individuals carry each combination of sequence tags:

This analysis reveals a significant difference between population 1 and population 2 in the combinations of sequence tags. These combinations of sequence tags are therefore associated with the phenotypic difference between the populations.
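Counting tag combinations per individual can be sketched as follows. Each individual's copies per group (as inferred for individuals 1 to 3 above) are turned into a hashable multi-group genotype, the genotypes are tallied per population, and the two tallies are compared with a Pearson chi-square statistic over all observed combinations. The data structures and function names are assumptions; the per-combination counts behind typical result 3 are not reproduced in this text, so nothing is computed from them here. With small counts the chi-square approximation is unreliable and an exact test would be preferable.

```python
from collections import Counter


def genotype_key(copies_by_group):
    """Canonical, hashable genotype, e.g. {"G1": {"Sg2": 1, "Sg3": 1}, ...} -> nested tuple."""
    return tuple(sorted((g, tuple(sorted(c.items()))) for g, c in copies_by_group.items()))


def combination_table(population):
    """Tally how many individuals in a population carry each genotype combination."""
    return Counter(genotype_key(ind) for ind in population)


def chi_square_rxc(table_a: Counter, table_b: Counter):
    """Pearson chi-square comparing two populations' combination tallies.

    Returns (chi2, degrees_of_freedom); df = number of observed combinations - 1.
    """
    keys = sorted(set(table_a) | set(table_b))
    na, nb = sum(table_a.values()), sum(table_b.values())
    n = na + nb
    chi2 = 0.0
    for k in keys:
        column_total = table_a[k] + table_b[k]
        for observed, row_total in ((table_a[k], na), (table_b[k], nb)):
            expected = row_total * column_total / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, len(keys) - 1
```

The statistic would then be compared against the chi-square distribution with the returned degrees of freedom.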

7. References Cited

All references cited herein are incorporated herein by reference in their entirety and for all purposes, to the same extent as if each individual publication, patent or patent application were specifically and individually indicated to be incorporated by reference in its entirety for all purposes.

Many modifications and variations of the present invention can be made without departing from its spirit and scope, as will be apparent to those skilled in the art. The specific embodiments described herein are offered by way of example only, and the invention is to be limited only by the terms of the appended claims, together with the full scope of equivalents to which such claims are entitled.

Claims

1. A method of determining genome-wide sequence variation associated with a phenotype of one or more individual organisms, comprising:

I) A) digesting the nucleic acid of each individual organism with one or more first restriction enzymes to produce a set of restriction fragments; and

B) determining one or more restriction sequence tags for each of the restriction fragments, wherein the one or more restriction sequence tags comprise a sequence of the corresponding restriction fragment, thereby generating a set of restriction sequence tags for each of the one or more individual organisms; and

II) classifying the restriction sequence tags of the one or more individual organisms into one or more groups of restriction sequence tags comprising homologous restriction sequence tags, the groups identifying sequence variations associated with the phenotype.

2. The method of claim 1, wherein determining the set of restriction sequence tags comprises:

B1) ligating the restriction fragments in the set of restriction fragments to a first engineered nucleic acid comprising a predetermined nucleotide sequence to obtain a first set of circular nucleic acid fragments, wherein the predetermined nucleotide sequence comprises one or more recognition sites of a second restriction enzyme, each positioned and oriented such that the second restriction enzyme cleaves within the restriction fragment;

B2) digesting the first circular nucleic acid fragments with the second restriction enzyme;

B3) modifying the ends produced by the second restriction enzyme to enable ligation;

B4) ligating the ends produced by the second restriction enzyme to generate a second set of circular nucleic acid fragments; and

B5) sequencing at least a portion of each restriction fragment in the second set of circular nucleic acid fragments to determine the set of restriction sequence tags.

3. The method of claim 2, wherein each of the one or more recognition sites is located close to the terminus of the first engineered nucleic acid.

4. The method of claim 2 or 3, wherein each of the one or more recognition sites is located less than 25 nucleotides from the end of the first engineered nucleic acid.

5. The method of claim 2, wherein each of the one or more recognition sites is located less than 5 nucleotides from the end of the first engineered nucleic acid.

6. The method of any one of claims 2 to 5, wherein the second restriction enzyme is a type IIS endonuclease.

7. The method of any one of claims 2 to 6, further comprising immobilizing and amplifying the nucleic acid fragments in the second set of circular nucleic acid fragments on a solid surface before step B5).

8. The method of claim 7, wherein the immobilizing and amplifying step is performed by producing colonies of the nucleic acid fragments in the second set of circular nucleic acid fragments on the solid surface, each colony comprising a plurality of immobilized single-stranded DNA molecules of one of the nucleic acid fragments in the second set of circular nucleic acid fragments.

9. The method of claim 8, wherein the immobilizing and amplifying step comprises:

i) linearizing the second circular nucleic acid fragments to produce linear fragments;

ii) providing a solid surface comprising a plurality of colony primers immobilized by their 5' ends on the solid surface, each colony primer comprising a sequence hybridizable to the sequence at the 3' end of the linear fragments;

iii) denaturing the linear fragments to produce single-stranded fragments;

iv) annealing the single-stranded fragments to the immobilized colony primers;

v) performing a primer extension reaction using the annealed single-stranded fragments as templates to generate immobilized double-stranded nucleic acid fragments;

vi) denaturing the immobilized double-stranded nucleic acid fragments to produce immobilized single-stranded fragments;

vii) annealing the immobilized single-stranded fragments to the immobilized colony primers; and

viii) repeating steps v) to vii) such that the colonies are produced at specific locations on the solid surface.

10. The method of claim 8, wherein the immobilizing and amplifying step comprises:

ii) mixing the linear fragments with colony primers, each comprising a sequence hybridizable to the sequence at the 3' end of the linear fragments;

iii) grafting the linear fragments and the colony primers by their 5' ends onto a solid surface to produce immobilized linear fragments and immobilized colony primers;

iv) denaturing the immobilized linear fragments to produce immobilized single-stranded fragments;

v) annealing the immobilized single-stranded fragments to the immobilized colony primers to obtain annealed single-stranded fragments;

viii) annealing the immobilized single-stranded fragments to the immobilized colony primers;

11. The method of claim 8, wherein the immobilizing and amplifying step comprises:

ii) mixing the linear fragments with colony primers, each comprising a sequence hybridizable to the sequence at the 3' end of the linear fragments, wherein the concentration of the colony primers is adjusted such that amplification of the grafted linear fragments can occur;

iv) applying an amplification solution containing a polymerase and nucleotides to the solid surface such that the colonies are produced isothermally at specific locations on the solid surface.

12. The method of claim 9, wherein the sequencing comprises:

i) hybridizing a sequencing primer to the colonies;

ii) performing primer extension with one labeled nucleotide;

iii) detecting the amount of labeled nucleotide incorporated into the extended primers at each colony location; and

iv) repeating steps ii) and iii) to determine a portion of the nucleotide sequence of each colony.

13. The method of claim 12, wherein the labeled nucleotides are nucleotides labeled with fluorescence and the detection involves detecting the fluorescence intensity of the labeled nucleotides.

14. The method of claim 1, wherein the first restriction enzyme cleaves on both sides of its recognition site such that the cleavage sites enclose a portion of sequence that is not part of the recognition site, and wherein determining the set of restriction sequence tags comprises:

B1) modifying the ends produced by the first restriction enzyme to enable ligation;

B2) ligating the restriction fragments in the set of restriction fragments to a first engineered nucleic acid comprising a predetermined nucleotide sequence to obtain a first set of circular nucleic acid fragments; and

B3) sequencing at least a portion of each restriction fragment in the first set of circular nucleic acid fragments to determine the set of restriction sequence tags.

15. The method of claim 14, further comprising immobilizing and amplifying the nucleic acid fragments in the first set of circular nucleic acid fragments on a solid surface before step B3).

16. The method of claim 15, wherein the immobilizing and amplifying step is performed by generating colonies of the nucleic acid fragments in the first set of circular nucleic acid fragments on the solid surface, each colony comprising a plurality of immobilized single-stranded DNA molecules of one of those nucleic acid fragments.

17. The method of claim 16, wherein the immobilizing and amplifying step comprises:

i) linearizing the first circular nucleic acid fragments to produce linear fragments;

ii) providing a solid surface comprising a plurality of colony primers immobilized by their 5' ends on the solid surface, each colony primer comprising a sequence hybridizable to the sequence at the 3' end of the linear fragments;

iii) denaturing the linear fragments to produce single-stranded fragments;

iv) annealing the single-stranded fragments to the immobilized colony primers;

vii) annealing the immobilized single-stranded fragments to the immobilized colony primers;

18. The method of claim 16, wherein the immobilizing and amplifying step comprises:

viii) annealing the immobilized single-stranded fragments to the immobilized colony primers;

19. The method of claim 16, wherein the immobilizing and amplifying step comprises:

i) linearizing the first circular nucleic acid fragments to produce linear fragments;

20. The method of claim 17, wherein the sequencing comprises:

i) hybridizing a sequencing primer to the colonies;

ii) performing primer extension with one labeled nucleotide;

21. The method of claim 20, wherein the labeled nucleotides are nucleotides labeled with fluorescence and the detection involves detecting fluorescence intensity of the labeled nucleotides.

22. The method of claim 1, wherein determining the set of restriction sequence tags comprises:

B1) ligating the restriction fragments in the set of restriction fragments to a first engineered nucleic acid comprising a predetermined nucleotide sequence to obtain a first set of nucleic acid fragments, wherein the predetermined nucleotide sequence comprises a recognition site of a second restriction enzyme positioned and oriented such that the second restriction enzyme cleaves within the restriction fragment;

B2) digesting the first nucleic acid fragments with the second restriction enzyme;

B4) ligating the ends produced by the second restriction enzyme to a second engineered nucleic acid comprising a predetermined nucleotide sequence to generate second nucleic acid fragments; and

B5) sequencing at least a portion of each restriction fragment in the second nucleic acid fragments to determine the set of restriction sequence tags.

23. The method of claim 22, wherein said recognition site of said second restriction enzyme is located close to the terminus of said first engineered nucleic acid.

24. The method of claim 22 or 23, wherein each of the one or more recognition sites is located less than 25 nucleotides from the end of the first engineered nucleic acid.

25. The method of any one of claims 22 to 24, wherein the recognition site is located 0 to 5 nucleotides away from the end of the first engineered nucleic acid.

26. The method of any one of claims 22 to 25, wherein the second restriction enzyme is a type IIS endonuclease.

27. The method of any one of claims 22 to 26, further comprising immobilizing and amplifying the nucleic acid fragments in the second nucleic acid fragments on a solid surface before step B5).

28. The method of claim 27, wherein the immobilizing and amplifying step is performed by producing colonies of the nucleic acid fragments in the second nucleic acid fragments on the solid surface, each colony comprising a plurality of immobilized single-stranded DNA molecules of one of those nucleic acid fragments.

29. The method of claim 28, wherein the immobilizing and amplifying step comprises:

i) providing a solid surface comprising a plurality of colony primers immobilized by their 5' ends on the solid surface, each colony primer comprising a sequence hybridizable to the sequence at the 3' end of the second nucleic acid fragments;

ii) denaturing the second nucleic acid fragments to produce single-stranded fragments;

iii) annealing the single-stranded fragments to the immobilized colony primers;

iv) performing a primer extension reaction using the annealed single-stranded fragments as templates to generate immobilized double-stranded nucleic acid fragments;

vi) annealing the immobilized single-stranded fragments to the immobilized colony primers; and

vii) repeating steps iv) to vi) such that the colonies are formed at specific locations on the solid surface.

30. The method of claim 28, wherein the immobilizing and amplifying step comprises:

i) mixing the second nucleic acid fragments with colony primers, each comprising a sequence hybridizable to the sequence at the 3' end of the second nucleic acid fragments;

ii) grafting the second nucleic acid fragments and the colony primers by their 5' ends onto a solid surface to produce immobilized nucleic acid fragments and immobilized colony primers;

iii) denaturing the immobilized nucleic acid fragments to produce immobilized single-stranded fragments;

iv) annealing the immobilized single-stranded fragments to the immobilized colony primers to obtain annealed single-stranded fragments;

vii) annealing the immobilized single-stranded fragments to the immobilized colony primers;

31. The method of claim 28, wherein the immobilizing and amplifying step comprises:

i) mixing the second nucleic acid fragments with colony primers, each comprising a sequence hybridizable to the sequence at the 3' end of the second nucleic acid fragments, wherein the concentration of the colony primers is adjusted such that amplification of the grafted second nucleic acid fragments can occur;

ii) grafting the second nucleic acid fragments and the colony primers by their 5' ends onto a solid surface to produce immobilized linear fragments and immobilized colony primers;

32. The method of any one of claims 29 to 31, wherein the sequencing comprises:

i) hybridizing a sequencing primer to the colonies;

ii) performing primer extension with one labeled nucleotide;

iv) repeating steps ii) and iii) to determine a portion of the sequence of each colony.

33. The method of claim 32, wherein the labeled nucleotides are fluorescently labeled nucleotides, and wherein the detection involves detecting fluorescence intensity of the labeled nucleotides.

34. The method of claim 1, wherein the first restriction enzyme is a rare cutter and determining the set of restriction sequence tags comprises:

B1) ligating the restriction fragments in the set of restriction fragments to a first engineered nucleic acid comprising a predetermined nucleotide sequence to obtain a first set of nucleic acid fragments;

B2) digesting the first nucleic acid fragments with one or more second restriction enzymes that differ from the first restriction enzyme and do not cleave the first engineered nucleic acid, to obtain second restriction fragments;

B3) ligating the ends of the second restriction fragments to a second engineered nucleic acid comprising a predetermined nucleotide sequence to form a second set of nucleic acid fragments; and

B4) sequencing at least a portion of each restriction fragment in the second set of nucleic acid fragments to determine the set of restriction sequence tags.

35. The method of claim 34, wherein the rare cutter recognizes a six-base recognition sequence.

36. The method of claim 34, wherein the rare cutter recognizes a recognition sequence of 8 or more bases.

37. The method of any one of claims 34 to 36, further comprising the step of immobilizing and amplifying the nucleic acid fragment of the second nucleic acid fragment on a solid surface before step B4).

38. The method of claim 37, wherein the immobilizing and amplifying step is performed by producing, on the solid surface, colonies of the nucleic acid fragments in the second set of nucleic acid fragments, each colony comprising a plurality of immobilized single-stranded DNA molecules of one of those nucleic acid fragments.

39. The method of claim 38, wherein the immobilizing and amplifying step comprises:

i) providing a solid surface comprising a plurality of colony primers immobilized by their 5' ends on the solid surface, each colony primer comprising a sequence hybridizable to the sequence at the 3' end of the second nucleic acid fragments;

iii) annealing the single-stranded fragments to the immobilized colony primers;

vi) annealing the immobilized single-stranded fragments to the immobilized colony primers; and

vii) repeating steps iv) to vi) such that the colonies are produced at specific locations on the solid surface.

40. The method of claim 38, wherein the immobilizing and amplifying step comprises:

i) mixing the second nucleic acid fragments with colony primers, each comprising a sequence hybridizable to the sequence at the 3' end of the second nucleic acid fragments;

ii) grafting the second nucleic acid fragments and the colony primers by their 5' ends onto a solid surface to produce immobilized second nucleic acid fragments and immobilized colony primers;

vii) annealing the immobilized single-stranded fragments to the immobilized colony primers;

41. The method of claim 38, wherein the immobilizing and amplifying step comprises:

i) mixing the second nucleic acid fragments with colony primers, each comprising a sequence hybridizable to the sequence at the 3' end of the second nucleic acid fragments, wherein the concentration of the colony primers is adjusted such that amplification of the grafted second nucleic acid fragments can occur;

ii) grafting the second nucleic acid fragments and the colony primers by their 5' ends onto a solid surface to produce immobilized second nucleic acid fragments and immobilized colony primers;

42. The method of any one of claims 39 to 41, wherein the sequencing comprises:

i) hybridizing a sequencing primer to the colonies;

ii) performing primer extension with one labeled nucleotide;

43. The method of claim 42, wherein the labeled nucleotides are nucleotides labeled with fluorescence and the detection involves detecting the fluorescence intensity of the labeled nucleotides.

44. The method of claim 1, wherein determining the set of restriction sequence tags comprises:

B1) ligating the restriction fragments in the set of restriction fragments to a first engineered nucleic acid comprising a predetermined nucleotide sequence to obtain a first set of nucleic acid fragments;

B2) obtaining second restriction fragments by digesting the first nucleic acid fragments with a second restriction enzyme that differs from the first restriction enzyme and does not cleave the first engineered nucleic acid;

B3) ligating the ends of the second restriction fragments to a second engineered nucleic acid comprising a predetermined nucleotide sequence to generate a second set of nucleic acid fragments; and

B4) sequencing at least a portion of each nucleic acid fragment in the second set of nucleic acid fragments to determine the set of restriction sequence tags.

45. The method of claim 44, further comprising repeating steps B2) through B4) for a plurality of different second restriction enzymes.

46. The method of claim 45, further comprising the step of immobilizing and amplifying the nucleic acid fragments in said second nucleic acid fragments on a solid surface prior to said step B4).

47. The method of claim 46, wherein the immobilizing and amplifying step is performed by generating colonies of the nucleic acid fragments in the second set of nucleic acid fragments on the solid surface, each colony comprising a plurality of immobilized single-stranded DNA molecules of one of those nucleic acid fragments.

48. The method of claim 47, wherein the immobilizing and amplifying step comprises:

i) providing a solid surface comprising a plurality of colony primers, each colony primer comprising a sequence hybridizable to the sequence at the 3' end of the second nucleic acid fragments and being immobilized by its 5' end on the solid surface;

iii) annealing the single-stranded fragments to the immobilized colony primers;

vi) annealing the immobilized single-stranded fragments to the immobilized colony primers;

49. The method of claim 47, wherein the immobilizing and amplifying step comprises:

iii) denaturing the immobilized second nucleic acid fragments to produce immobilized single-stranded fragments;

iv) annealing the immobilized single-stranded fragments to the immobilized colony primers; and

50. The method of claim 47, wherein the immobilizing and amplifying step comprises:

i) mixing the second nucleic acid fragments with colony primers, each comprising a sequence hybridizable to the sequence at the 3' end of the second nucleic acid fragments, wherein the concentration of the colony primers is adjusted such that amplification of the grafted second nucleic acid fragments can occur;

ii) grafting the second nucleic acid fragments and the colony primers by their 5' ends onto a solid surface to produce immobilized linear fragments and immobilized colony primers;

51. The method of any one of claims 48 to 50, wherein the sequencing comprises:

i) hybridizing a sequencing primer to the colonies;

ii) performing primer extension with one labeled nucleotide;

52. The method of claim 51, wherein the labeled nucleotides are fluorescently labeled nucleotides and the detecting comprises detecting the fluorescence intensity of the labeled nucleotides.

53. The method of claim 1, wherein determining the set of restriction sequence tags comprises:

B1) ligating the restriction fragments in the set of restriction fragments to a first engineered nucleic acid comprising a predetermined nucleotide sequence to obtain a first set of circular nucleic acid fragments, wherein the predetermined nucleotide sequence comprises two recognition sites of a second restriction enzyme and a recognition site of a third restriction enzyme different from the second, the recognition site of the third restriction enzyme being located between the two recognition sites of the second restriction enzyme and being positioned and oriented such that the third restriction enzyme cleaves within the restriction fragment;

B2) digesting the first circular nucleic acid fragments with the second restriction enzyme to obtain second nucleic acid fragments;

B3) ligating the ends of the second restriction fragments to produce a second set of circular nucleic acid fragments; and

B4) sequencing a portion of each restriction fragment in the third set of circular nucleic acid fragments to determine the set of restriction sequence tags.

54. The method of claim 53, further comprising, after step B3):

B5) digesting the second circular nucleic acid fragments with the third restriction enzyme to generate a third set of nucleic acid fragments;

B6) modifying the ends produced by the third restriction enzyme to enable ligation; and

B7) ligating the ends of the third nucleic acid fragments to produce a third set of circular nucleic acid fragments.

55. The method of claim 53, further comprising repeating steps B1) to B4) for each of a plurality of different second restriction enzymes.

56. The method of any one of claims 53 to 55, wherein each recognition site is located close to the terminus of the first engineered nucleic acid.

57. The method of any one of claims 53 to 56, wherein each of the recognition sites is located less than 25 nucleotides from the end of the first engineered nucleic acid.

58. The method of any one of claims 53-57, wherein the recognition site is located 0 to 5 nucleotides away from the end of the first engineered nucleic acid.

59. The method of any of claims 53-58, wherein the second restriction enzyme is a type IIS endonuclease.

60. The method of claim 59, further comprising immobilizing and amplifying the nucleic acid fragments in the second set of circular nucleic acid fragments on a solid surface before step B4).

61. The method of claim 60, wherein the immobilizing and amplifying step is performed by producing colonies of the nucleic acid fragments in the second set of circular nucleic acid fragments on the solid surface, each colony comprising a plurality of immobilized single-stranded DNA molecules of one of those nucleic acid fragments.

62. The method of claim 61, wherein the immobilizing and amplifying step comprises:

ii) providing a solid surface comprising a plurality of colony primers, each colony primer comprising a sequence hybridizable to the sequence at the 3' end of the linear fragments and being immobilized by its 5' end on the solid surface;

iii) denaturing the linear fragments to produce single-stranded fragments;

iv) annealing the single-stranded fragments to the immobilized colony primers;

vii) annealing the immobilized single-stranded fragments to the immobilized colony primers;

63. The method of claim 61, wherein the immobilizing and amplifying step comprises:

ii) mixing the linear fragments with colony primers, each comprising a sequence hybridizable to the sequence at the 3' end of the linear fragments, wherein the concentration of the colony primers is adjusted such that amplification of the grafted linear fragments can occur;

iii) grafting the linear fragments and the colony primers by their 5' ends onto a solid surface to produce immobilized linear fragments and immobilized colony primers;

65. The method of any one of claims 62 to 64, wherein the sequencing comprises:

i) hybridizing a sequencing primer to the colonies;

ii) performing primer extension with one labeled nucleotide;

66. The method of claim 65, wherein the labeled nucleotides are nucleotides labeled with fluorescence, and wherein the detection involves detecting fluorescence intensity of the labeled nucleotides.

67. The method of claim 1, wherein determining the set of restriction sequence tags comprises:

B1) ligating the restriction fragments in the set of restriction fragments to a first engineered nucleic acid comprising a predetermined nucleotide sequence that comprises recognition sites of the first restriction enzyme and of a second restriction enzyme, to obtain a first set of nucleic acid fragments;

B2) digesting the first nucleic acid fragments with the second restriction enzyme to obtain a second set of nucleic acid fragments;

B3) ligating the ends of the second restriction fragments to generate a first set of circular nucleic acid fragments; and

B4) sequencing at least a portion of each of the fourth nucleic acid fragments to determine the set of restriction sequence tags.

The method of claim 53, further comprising, after step B3):

B5) digesting the first set of circular nucleic acid fragments with a third restriction enzyme, different from the first and second restriction enzymes, to generate a third set of nucleic acid fragments;

B6) modifying the ends produced by the third restriction enzyme to enable ligation; and

B7) joining the ends of the fragments in the third set of nucleic acid fragments to produce a second set of circular nucleic acid fragments.

68. The method of claim 67, further comprising repeating steps B1) to B4) for each of a plurality of different second restriction enzymes.

70. The method of claim 69, further comprising immobilizing and amplifying the nucleic acid fragments of the first set of circular nucleic acid fragments on a solid surface prior to step B4).

71. The method of claim 70, wherein the immobilizing and amplifying step is performed by generating colonies of the nucleic acid fragments of the first set of circular nucleic acid fragments on the solid surface, each colony comprising a plurality of immobilized single-stranded DNA molecules corresponding to a nucleic acid fragment of the first set of circular nucleic acid fragments.

The method of claim 71, wherein

iii) denaturing the linear fragments to produce single-stranded fragments;

iv) annealing the single-stranded fragments to the immobilized colony primers;

v) annealing the immobilized single-stranded fragments to the immobilized colony primers;

The method of claim 71, wherein

75. The method of any one of claims 72 to 74, wherein the sequencing comprises:

i) hybridizing a sequencing primer to the colonies;

ii) performing primer extension with one labeled nucleotide;

76. The method of claim 75, wherein the labeled nucleotides are fluorescently labeled nucleotides, and wherein the detection comprises detecting the fluorescence intensity of the labeled nucleotides.

77. The method of any one of claims 1 to 76, further comprising the step of digesting said restriction fragment set with a plurality of different first restriction enzymes in step A).
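Digestion with several different first restriction enzymes, as in claim 77, can be modeled by pooling the cut positions of all enzymes before splitting the sequence. The sketch below is a simplified in silico digest under stated assumptions: it considers only the top strand and ignores overhangs and methylation sensitivity. The EcoRI (G^AATTC) and BamHI (G^GATCC) cut positions are standard examples; the `ENZYMES` table and `digest` function are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

# Illustrative table: recognition site and top-strand cut offset from the
# start of the site. Only two example enzymes are included.
ENZYMES: Dict[str, Tuple[str, int]] = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "BamHI": ("GGATCC", 1),  # G^GATCC
}


def digest(seq: str, enzymes: List[str]) -> List[str]:
    """Cut seq at every site of every listed enzyme and return the fragments.

    Simplification: top strand only; sticky-end overhangs are not modeled.
    """
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        for m in re.finditer(f"(?={re.escape(site)})", seq):
            cuts.add(m.start() + offset)
    bounds = [0] + sorted(cuts) + [len(seq)]
    # Drop empty pieces that would arise from a cut at either end.
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
```

For instance, `digest("AAGAATTCTTGGATCCAA", ["EcoRI", "BamHI"])` returns three fragments, while digesting with EcoRI alone returns two.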

78. The method of any one of claims 1-77, wherein each said group consists of restriction sequence tags that are at least 60% homologous.

79. The method of claim 78, wherein each said group consists of restriction sequence tags that are at least 70% homologous.

80. The method of claim 79, wherein each said group consists of restriction sequence tags that are at least 80% homologous.

81. The method of claim 80, wherein each said group consists of restriction sequence tags that are at least 90% homologous.

82. The method of claim 81, wherein each said group consists of restriction sequence tags that are at least 99% homologous.
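Claims 78 to 82 group tags by a homology threshold between 60% and 99% but do not specify how the grouping is performed. A minimal sketch follows. It assumes equal-length tags, uses percent identity (matching positions divided by tag length) as the homology measure, and applies a greedy single pass that assigns each tag to the first group whose founding tag meets the threshold. The function names are hypothetical.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length tags agree."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_tags(tags: List[str], threshold: float = 0.9) -> List[List[str]]:
    """Greedy grouping (an assumed procedure, not the claimed one).

    Each tag joins the first group whose representative (its first member)
    it matches at >= threshold identity; otherwise it starts a new group.
    """
    groups: List[List[str]] = []
    for tag in tags:
        for group in groups:
            if identity(tag, group[0]) >= threshold:
                group.append(tag)
                break
        else:
            groups.append([tag])
    return groups
```

With the default 90% threshold, `"ACGTACGTAC"` and `"ACGTACGTAA"` (one mismatch in ten) fall into one group, whereas at 99% they separate. Greedy grouping depends on input order, which is acceptable for illustration but worth noting.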

83. A method of determining genome-wide sequence variation among a plurality of different phenotypes, comprising:

A) determining, by the method of any one of claims 1-82, a set of restriction sequence tags for each population of one or more organisms, for each of a plurality of different phenotypes; and

B) comparing the sets of restriction sequence tags of organisms of different phenotypes to determine one or more sequence variations associated with the different phenotypes.
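The comparison in step B) is not specified further. One simple reading, sketched here as an assumption, is to compute for each tag the fraction of individuals in each phenotype group whose tag set contains it, and to flag tags whose frequencies differ by at least `min_diff`. A real association analysis would apply a statistical test with multiple-testing correction; the fixed cutoff and the function names are placeholders.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def carrier_frequencies(individuals: List[Set[str]]) -> Dict[str, float]:
    """Fraction of individuals in one population whose tag set contains each tag."""
    counts = Counter(tag for tags in individuals for tag in tags)
    n = len(individuals)
    return {tag: c / n for tag, c in counts.items()}


def differential_tags(
    cases: List[Set[str]], controls: List[Set[str]], min_diff: float = 0.5
) -> List[Tuple[str, float, float]]:
    """Tags whose carrier frequency differs between two phenotype groups.

    Assumed criterion: absolute frequency difference >= min_diff.
    Returns (tag, case_frequency, control_frequency), sorted by tag.
    """
    f_case = carrier_frequencies(cases)
    f_ctrl = carrier_frequencies(controls)
    hits = []
    for tag in sorted(set(f_case) | set(f_ctrl)):
        a, b = f_case.get(tag, 0.0), f_ctrl.get(tag, 0.0)
        if abs(a - b) >= min_diff:
            hits.append((tag, a, b))
    return hits
```

Each individual is represented by a set, so a tag counts at most once per person regardless of how many times it was sequenced.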

84. The method of claim 83, further comprising mapping at least one restriction sequence tag to the genomic sequence of the organism after step B) to identify the genomic location of the at least one restriction sequence tag.
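For mapping tags to a genomic sequence, as in claim 84, a minimal sketch is an exact-match hash index of every genome substring of the tag length, searched with both the tag and its reverse complement. This is an illustrative assumption: it finds only perfect matches, whereas tags that carry sequence variants would need a mismatch-tolerant aligner. `build_index` and `map_tag` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_index(genome: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the genome (forward strand) to its start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def map_tag(tag: str, index: Dict[str, List[int]]) -> List[Tuple[int, str]]:
    """Return (position, strand) for each exact hit of the tag or its reverse complement."""
    hits = [(pos, "+") for pos in index.get(tag, [])]
    rc = tag.translate(COMPLEMENT)[::-1]
    if rc != tag:  # a palindromic tag would otherwise be reported twice
        hits += [(pos, "-") for pos in index.get(rc, [])]
    return sorted(hits)
```

Usage: `map_tag("TTGCA", build_index("ACGTTTGCAAACCC", 5))` reports a forward-strand hit at position 4 and a reverse-strand hit at position 5.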

85. The method of any one of claims 45-52, 55-66, and 69-76, wherein the plurality of different second restriction enzymes comprises at least three different restriction enzymes.

A) determining, by the method of claim 85, a set of restriction sequence tags for each population of one or more organisms, for each of a plurality of different phenotypes;

87. The method of claim 86, further comprising mapping at least one restriction sequence tag to the genomic sequence of the organism after step B) to identify the genomic location of the at least one restriction sequence tag.

88. The method of any one of claims 45-52, 55-66, and 69-76, wherein the plurality of different second restriction enzymes comprises at least 10 different restriction enzymes.

89. A method of determining genome-wide sequence variation among a plurality of different phenotypes, comprising:

A) determining, by the method of claim 88, a set of restriction sequence tags for each population of one or more organisms, for each of a plurality of different phenotypes; and

B) comparing the sets of restriction sequence tags of organisms of different phenotypes to determine one or more sequence variations associated with the different phenotypes.

90. The method of claim 89, further comprising mapping at least one restriction sequence tag to the genomic sequence of the organism after step B) to identify the genomic location of the at least one restriction sequence tag.

91. The method of any one of the preceding claims, wherein the one or more individual organisms are humans.

92. The method of any one of the preceding claims, wherein each set of restriction fragments comprises at least 10 different restriction fragments.

93. The method of any one of the preceding claims, wherein each set of restriction fragments comprises at least 100 different restriction fragments.

94. The method of any one of the preceding claims, wherein each set of restriction fragments comprises at least 1,000 different restriction fragments.

95. The method of any one of the preceding claims, wherein each set of restriction fragments comprises at least 10,000 different restriction fragments.

96. The method of any one of the preceding claims, wherein each set of restriction fragments comprises at least 100,000 different restriction fragments.

97. The method of any one of the preceding claims, wherein each set of restriction fragments comprises at least 10⁶ different restriction fragments.

98. The method of any one of the preceding claims, wherein each set of restriction fragments comprises at least 10⁷ different restriction fragments.

99. The method of any one of the preceding claims, wherein each set of restriction fragments comprises at least 10⁸ different restriction fragments.
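The fragment counts in these claims can be put in perspective with a back-of-the-envelope estimate. Assuming a random sequence with equal base frequencies, a specific recognition site of n bases occurs about once every 4^n positions, so a 6-base site appears roughly every 4,096 bp. For a genome of about 3.1 × 10⁹ bp (the approximate size of the haploid human genome), this gives roughly 7.6 × 10⁵ fragments for a 6-base cutter and 1.2 × 10⁷ for a 4-base cutter. Real genomes deviate from this model (CpG-containing sites, for example, are under-represented in mammalian DNA), so the figures are rough; `expected_fragments` is a hypothetical helper.

```python
def expected_fragments(genome_length: int, site_length: int) -> float:
    """Expected fragment count for a single enzyme on one linear molecule.

    Assumes a random sequence with equal base frequencies, so each of the
    (L - n + 1) positions matches an n-base site with probability 4**-n.
    """
    expected_sites = (genome_length - site_length + 1) / 4 ** site_length
    return expected_sites + 1  # k cuts in a linear molecule give k + 1 fragments
```

For example, `expected_fragments(3_100_000_000, 6)` is about 7.57 × 10⁵, and `expected_fragments(3_100_000_000, 4)` is about 1.21 × 10⁷.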

100. The method of any one of claims 1 to 99, wherein step I) is performed on one individual.

101. The method of any one of claims 1-100, wherein said step II) of classifying a restriction sequence tag further comprises comparing said restriction sequence tag with a reference sequence.

102. The method of claim 101, wherein said reference sequence comprises a genomic sequence of an organism.
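Claims 101 and 102 compare tags with a reference sequence, such as the organism's genomic sequence. As one hypothetical way to do this, the sketch below compares a tag with reference tags of the same length (for example, tags obtained by an in silico digest of the reference genome). It labels the tag as an exact match, as a variant whose mismatch positions are candidate single-nucleotide differences, or as novel. The `max_mismatches` cutoff and the function names are assumptions.

```python
from typing import List, Optional, Tuple


def hamming_positions(a: str, b: str) -> List[int]:
    """Positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def classify_against_reference(
    tag: str, reference_tags: List[str], max_mismatches: int = 2
) -> Tuple[str, Optional[str], List[int]]:
    """Label a tag as 'match', 'variant' or 'novel' relative to reference tags.

    'match': identical to a reference tag.
    'variant': within max_mismatches of the closest reference tag; the
    mismatch positions are returned as candidate variant sites.
    'novel': no same-length reference tag is that close.
    """
    best: Optional[str] = None
    best_diff: List[int] = []
    for ref in reference_tags:
        if len(ref) != len(tag):
            continue  # only equal-length tags are compared in this sketch
        diff = hamming_positions(tag, ref)
        if best is None or len(diff) < len(best_diff):
            best, best_diff = ref, diff
    if best is None or len(best_diff) > max_mismatches:
        return "novel", None, []
    return ("match" if not best_diff else "variant"), best, best_diff
```

For example, `classify_against_reference("ACGAACGT", ["ACGTACGT", "TTTTCCCC"])` returns `("variant", "ACGTACGT", [3])`.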