CN104685064A

CN104685064A - Highly multiplex PCR methods and compositions

Info

Publication number: CN104685064A
Application number: CN201280075224.8A
Authority: CN
Inventors: B·齐默曼; M·M·希尔; P·G·拉格劳特; M·多德
Original assignee: Gene Security Network Inc
Current assignee: Natera Inc
Priority date: 2012-07-24
Filing date: 2012-11-21
Publication date: 2015-06-03
Also published as: JP6997814B2; HK1211058A1; JP6997813B2; JP7348330B2; JP6916153B2; JP2020054402A; CA2877493A1; JP6392222B2; JP2022037145A; IL236435A0; CA2877493C; JP2022027971A; JP2015526073A; JP2020054401A; JP2022027975A; WO2014018080A1; JP7503043B2; AU2012385961A1; AU2012385961B2; JP7027468B2

Abstract

The invention provides methods for simultaneously amplifying multiple nucleic acid regions of interest in one reaction volume, as well as methods for selecting a library of primers for use in such amplification methods. The invention also provides libraries of primers with desirable characteristics, such as minimal formation of amplified primer dimers or other non-target amplicons.

Description

Highly multiplexed PCR methods and compositions

Cross-References to Related Applications

This application claims the benefit of and priority to U.S. Utility Application No. 13/683,604, filed November 21, 2012, and U.S. Provisional Application No. 61/675,020, filed July 24, 2012. U.S. Utility Application No. 13/683,604 is a continuation-in-part of U.S. Utility Application No. 13/300,235, filed November 18, 2011, which is a continuation-in-part of U.S. Utility Application No. 13/110,685, filed May 18, 2011, and claims the benefit of U.S. Provisional Application No. 61/675,020, filed July 24, 2012. U.S. Utility Application No. 13/110,685 claims the benefit of U.S. Provisional Application No. 61/395,850, filed May 18, 2010; U.S. Provisional Application No. 61/398,159, filed June 21, 2010; U.S. Provisional Application No. 61/462,972, filed February 9, 2011; U.S. Provisional Application No. 61/448,547, filed March 2, 2011; and U.S. Provisional Application No. 61/516,996, filed April 12, 2011. U.S. Utility Application No. 13/300,235 claims the benefit of U.S. Provisional Application No. 61/571,248, filed June 23, 2011. All of these applications are hereby incorporated by reference in their entirety for the teachings therein.

Statement Regarding Federally Sponsored Research or Development

This work was supported by grant number 5R44HD60423-3 awarded by the National Institutes of Health. The U.S. Government may have rights in any patents issued based on this application.

Technical Field

The present invention generally relates to methods and compositions for simultaneously amplifying multiple nucleic acid regions of interest in one reaction volume.

Background

To increase assay throughput and allow more efficient use of nucleic acid samples, multiple target nucleic acids in a sample of interest can be amplified simultaneously by combining multiple oligonucleotide primers with the sample and then subjecting the sample to polymerase chain reaction (PCR) conditions, a process known in the art as multiplex PCR. Multiplex PCR can significantly simplify experimental procedures and shorten the time required for nucleic acid analysis and detection. However, when multiple primer pairs are added to the same PCR reaction, non-target amplification products, such as amplified primer dimers, can be generated. The risk of generating such products increases with the number of primers. These non-target amplicons significantly limit the usefulness of the amplification products for further analysis and/or detection. Improved methods are therefore needed to reduce the formation of non-target amplicons during multiplex PCR.

Improved multiplex PCR methods would be useful for a variety of applications, such as non-invasive prenatal genetic diagnosis (NPD). Current prenatal diagnosis methods can alert physicians and parents to abnormalities in a growing fetus. Without prenatal diagnosis, one in 50 babies is born with a serious physical or mental disability, and as many as one in 30 is born with some form of congenital malformation. Unfortunately, standard methods either have poor accuracy or involve invasive procedures that carry a risk of miscarriage. Methods based on maternal blood hormone levels or ultrasound measurements are non-invasive, but they also have low accuracy. Methods such as amniocentesis, chorionic villus sampling, and fetal blood sampling are highly accurate, but they are invasive and carry significant risks. In the United States, amniocentesis is performed in about 3% of all pregnancies, although its frequency of use has declined over the past fifteen years.

Normal humans have two sets of 23 chromosomes in every healthy diploid cell, one set inherited from each parent. Aneuploidy, a condition in which a cell's nucleus contains too many and/or too few chromosomes, is believed to be responsible for a large proportion of failed implantations, miscarriages, and genetic diseases. Detecting chromosomal abnormalities can identify individuals or embryos with conditions such as Down syndrome, Klinefelter's syndrome, and Turner syndrome, among others, and can also increase the chances of a successful pregnancy. Testing for chromosomal abnormalities becomes especially important as maternal age increases: it is estimated that at least 40% of embryos are abnormal when the mother is between 35 and 40 years old, and that more than half are abnormal when she is over 40.

It has recently been discovered that cell-free fetal DNA and intact fetal cells can enter the maternal blood circulation. Analysis of this genetic material may therefore allow early NPD. Improved methods are needed to increase the sensitivity and specificity of NPD and to reduce the time and cost it requires.

Summary of the Invention

In one aspect, the invention features a method of amplifying target loci in a nucleic acid sample. In some embodiments, the method involves (i) contacting the nucleic acid sample with a library of test primers that simultaneously hybridize to at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci to produce a reaction mixture; and (ii) subjecting the reaction mixture to primer extension reaction conditions to produce amplification products that include target amplicons. In some embodiments, the method also includes determining the presence or absence of at least one target amplicon (e.g., at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% of the target amplicons). In some embodiments, the method also includes determining the sequence of at least one target amplicon (e.g., at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% of the target amplicons).

In various embodiments of any aspect of the invention, at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci are amplified. In some embodiments, at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% of the amplification products are target amplicons. In some embodiments, at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% of the targeted loci are amplified. In various embodiments, less than 60%, 50%, 40%, 30%, 20%, 10%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.25%, 0.1%, or 0.05% of the amplification products are primer dimers. In some embodiments, the test primer library includes at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 test primer pairs, where each primer pair includes a forward test primer and a reverse test primer that hybridize to the same target locus. In some embodiments, the test primer library includes at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 individual test primers that hybridize to different target loci, where the individual primers are not part of a primer pair.
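The percentages above describe how amplification products partition into target amplicons, primer dimers, and other products. A minimal sketch of how such fractions could be tallied, assuming each sequenced product has already been classified; the label format, the `summarize_products` helper, and the toy counts are hypothetical and not taken from the source:

```python
from collections import Counter

def summarize_products(read_labels, targeted_loci):
    """Summarize amplification products from per-product labels.

    read_labels: one label per sequenced product (hypothetical format): the
        name of a targeted locus for a target amplicon, "primer_dimer" for a
        primer dimer, or any other string for other non-target products.
    targeted_loci: set of loci the primer library was designed to amplify.
    """
    counts = Counter(read_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no amplification products to summarize")
    on_target = sum(n for label, n in counts.items() if label in targeted_loci)
    loci_seen = {label for label in counts if label in targeted_loci}
    return {
        "target_fraction": on_target / total,
        "dimer_fraction": counts["primer_dimer"] / total,  # 0 if absent
        "loci_amplified_fraction": len(loci_seen) / len(targeted_loci),
    }

# Toy data: 100 classified products across three targeted loci.
labels = ["locus1"] * 90 + ["locus2"] * 6 + ["primer_dimer"] * 3 + ["off_target"]
summary = summarize_products(labels, {"locus1", "locus2", "locus3"})
```

On this toy input, 96% of products are target amplicons, 3% are primer dimers, and two of the three targeted loci are amplified.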

In various embodiments of any aspect of the invention, the concentration of each test primer is less than 100, 75, 50, 25, 10, 5, 2, or 1 nM. In various embodiments, the GC content of the test primers is between 30% and 80%, such as between 40% and 70% or between 50% and 60%, inclusive. In some embodiments, the range of GC content of the test primers (e.g., the maximum GC content minus the minimum GC content, so that 80% - 60% gives a range of 20%) is less than 30%, 20%, 10%, or 5%. In some embodiments, the melting temperature (T_m) of the test primers is between 40°C and 80°C, such as between 50°C and 70°C, 55°C and 65°C, or 57°C and 60.5°C, inclusive. In some embodiments, the range of melting temperatures of the test primers is less than 20°C, 15°C, 10°C, 5°C, 3°C, or 1°C. In some embodiments, the test primers are between 15 and 100 nucleotides long, such as between 15 and 75, 15 and 40, 17 and 35, 18 and 30, or 20 and 65 nucleotides, inclusive. In some embodiments, the test primers include a non-target-specific tag, such as a tag that forms an internal loop structure. In some embodiments, the tag lies between two DNA-binding regions. In various embodiments, a test primer includes a 5′ region that is specific for the target locus, an internal region that is not specific for the target locus and forms a loop structure, and a 3′ region that is specific for the target locus. In various embodiments, the 3′ region is at least 7 nucleotides long. In some embodiments, the 3′ region is between 7 and 20 nucleotides long, such as between 7 and 15 or 7 and 10 nucleotides, inclusive. In various embodiments, a test primer includes a 5′ region that is not specific for the target locus (e.g., a tag or universal primer binding site), followed by a region that is specific for the target locus, an internal region that is not specific for the target locus and forms a loop structure, and a 3′ region that is specific for the target locus. In some embodiments, the range of test primer lengths is less than 50, 40, 30, 20, 10, or 5 nucleotides. In some embodiments, the target amplicons are between 50 and 100 nucleotides long, such as between 60 and 80 or 60 and 75 nucleotides, inclusive. In some embodiments, the range of target amplicon lengths is less than 50, 25, 15, 10, or 5 nucleotides.
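To make the primer-library constraints concrete, the sketch below checks GC content, melting temperature, and length against bounds picked from the ranges listed above (30% to 80% GC with a range under 20%, T_m of 55°C to 65°C with a range under 5°C, 15 to 100 nucleotides). It is an illustration only: the T_m estimate uses the simple GC-count formula Tm = 64.9 + 41(nGC - 16.4)/N, a swapped-in approximation rather than whatever method the inventors used, and the helper names and example sequences are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C bases in a primer sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def basic_tm(seq):
    """Rough Tm estimate in degrees C: Tm = 64.9 + 41 * (nGC - 16.4) / N.
    A simple approximation for illustration; a nearest-neighbor model
    would be more accurate."""
    seq = seq.upper()
    n_gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(seq)

def check_library(primers, gc_bounds=(0.30, 0.80), max_gc_range=0.20,
                  tm_bounds=(55.0, 65.0), max_tm_range=5.0, len_bounds=(15, 100)):
    """Check a primer library against per-primer bounds and library-wide ranges."""
    gcs = [gc_content(p) for p in primers]
    tms = [basic_tm(p) for p in primers]
    return {
        "gc_in_bounds": all(gc_bounds[0] <= g <= gc_bounds[1] for g in gcs),
        "gc_range_ok": max(gcs) - min(gcs) < max_gc_range,
        "tm_in_bounds": all(tm_bounds[0] <= t <= tm_bounds[1] for t in tms),
        "tm_range_ok": max(tms) - min(tms) < max_tm_range,
        "length_in_bounds": all(len_bounds[0] <= len(p) <= len_bounds[1]
                                for p in primers),
    }

# Two hypothetical 22-mers with 12 and 13 G/C bases.
library = ["AGCTGACCTGAAGTCGATCCAG", "CTGACCGAGCTAGGCATCAGTC"]
report = check_library(library)
```

Both example primers pass every check; adding an all-GC 22-mer would fail the GC and T_m bounds.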

In various embodiments of any aspect of the invention, the primer extension reaction conditions are polymerase chain reaction (PCR) conditions. In various embodiments, the annealing step is longer than 3, 5, 8, 10, or 15 minutes. In various embodiments, the extension step is longer than 3, 5, 8, 10, or 15 minutes.
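The long annealing and extension holds can be expressed as a simple thermocycling profile. A sketch, assuming a hypothetical program in which only the "longer than 3 minutes" bound comes from the text; the temperatures, the 15-cycle count, and the helper names are placeholders:

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    temp_c: float   # hypothetical hold temperature, not taken from the text
    minutes: float

def hold_minutes(steps):
    """Sum of hold times for a list of steps."""
    return sum(s.minutes for s in steps)

def program_minutes(initial, cycle, n_cycles, final):
    """Total hold time of a PCR program in minutes (ramp times ignored)."""
    return hold_minutes(initial) + n_cycles * hold_minutes(cycle) + hold_minutes(final)

def long_steps_ok(cycle, min_anneal=3.0, min_extend=3.0):
    """True if the per-cycle annealing and extension holds are longer than the
    given minimums (3 minutes is the shortest bound listed in the text)."""
    steps = {s.name: s for s in cycle}
    return steps["anneal"].minutes > min_anneal and steps["extend"].minutes > min_extend

# Placeholder program: temperatures and the 15-cycle count are hypothetical.
initial = [Step("hot_start", 95.0, 10.0)]
cycle = [Step("denature", 96.0, 0.5), Step("anneal", 60.0, 5.0), Step("extend", 72.0, 5.0)]
final = [Step("final_extend", 72.0, 2.0)]
total = program_minutes(initial, cycle, 15, final)
```

With these placeholder values the program holds for 169.5 minutes in total, 150 of them in the extended annealing and extension steps, which shows the run-time cost of long holds.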

In various embodiments of any aspect of the invention, the test primers are used to simultaneously amplify at least 1,000 different target loci in a sample that includes maternal DNA from the pregnant mother of a fetus and fetal DNA, to determine the presence or absence of a fetal chromosomal abnormality. In various embodiments, the method includes ligating universal primer binding sites to DNA molecules in the sample; amplifying the ligated DNA molecules using at least 1,000 specific primers and one universal primer to produce a first set of amplification products; and amplifying the first set of amplification products using at least 1,000 pairs of specific primers to produce a second set of amplification products. In various embodiments, at least 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different primer pairs are used.

In various embodiments of any aspect of the invention, the test primers are used to simultaneously amplify at least 1,000 different target loci in a sample that includes DNA from the putative father of a fetus, and to simultaneously amplify the target loci in a sample that includes maternal DNA from the pregnant mother of the fetus and fetal DNA, to determine whether the putative father is the biological father of the fetus.

In various embodiments of any aspect of the invention, the test primers are used to simultaneously amplify at least 1,000 different target loci in one or more cells from an embryo to determine the presence or absence of a chromosomal abnormality. In various embodiments, cells from a set of two or more embryos are analyzed, and one embryo is selected for in vitro fertilization.

In various embodiments of any aspect of the invention, the test primers are used to simultaneously amplify at least 1,000 different target loci in a forensic nucleic acid sample. In various embodiments, the annealing step is longer than 3, 5, 8, 10, or 15 minutes.

In various embodiments of any aspect of the invention, the method involves using the test primers to simultaneously amplify at least 1,000 different target loci in a control nucleic acid sample to produce a first set of target amplicons, and to simultaneously amplify the target loci in a test nucleic acid sample to produce a second set of target amplicons; and comparing the first and second sets of target amplicons to determine whether a target locus is present in one sample but absent from the other, or whether a target locus is present at different levels in the control and test samples. In various embodiments, the test sample is from an individual suspected of having a disease or phenotype of interest (e.g., cancer), or an increased risk of a disease or phenotype of interest, and one or more of the target loci include a sequence (e.g., a polymorphism or other mutation) associated with the disease or phenotype of interest or with an increased risk of it. In various embodiments, the method involves using the test primers to simultaneously amplify at least 1,000 different target loci in a control sample that includes RNA to produce a first set of target amplicons, and to simultaneously amplify the target loci in a test sample that includes RNA to produce a second set of target amplicons; and comparing the first and second sets of target amplicons to determine the presence or absence of a difference in RNA expression levels between the control sample and the test sample. In various embodiments, the RNA is mRNA. In various embodiments, the test sample is from an individual suspected of having a disease or phenotype of interest (e.g., cancer), or an increased risk of a disease or phenotype of interest (e.g., cancer), and one or more of the target loci include a sequence (e.g., a polymorphism or other mutation) associated with the disease or phenotype of interest or with an increased risk of it. In some embodiments, the test sample is from an individual diagnosed with a disease or phenotype of interest (e.g., cancer), and a difference in RNA expression levels between the control sample and the test sample indicates that the target locus includes a sequence (e.g., a polymorphism or other mutation) associated with an increased or decreased risk of the disease or phenotype of interest.
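One way to carry out the comparison of the first and second sets of target amplicons is to compare per-locus read counts. The sketch below is an assumption-laden illustration, not the method of the source: it normalizes each count by the total reads in its sample and flags at least a 2-fold difference, and the `compare_samples` helper, the count format, and the thresholds are hypothetical.

```python
import math

def compare_samples(control, test, min_count=1, fold_threshold=2.0):
    """Compare per-locus amplicon counts between a control and a test sample.

    control, test: dicts mapping target locus -> read count (hypothetical input).
    Returns (only_control, only_test, different): loci detected in only one
    sample, and loci detected in both whose depth-normalized level differs by
    at least `fold_threshold` in either direction.
    """
    c_total = sum(control.values())
    t_total = sum(test.values())
    only_control, only_test, different = [], [], []
    for locus in sorted(set(control) | set(test)):
        c = control.get(locus, 0)
        t = test.get(locus, 0)
        if c >= min_count and t < min_count:
            only_control.append(locus)
        elif t >= min_count and c < min_count:
            only_test.append(locus)
        elif c >= min_count and t >= min_count:
            # Ratio of per-sample fractions, so sequencing depth cancels out.
            ratio = (t / t_total) / (c / c_total)
            if abs(math.log2(ratio)) >= math.log2(fold_threshold):
                different.append(locus)
    return only_control, only_test, different

control = {"A": 100, "B": 100, "C": 0, "D": 100}
test = {"A": 100, "B": 400, "C": 50, "D": 0}
only_control, only_test, different = compare_samples(control, test)
```

Here locus D is detected only in the control, C only in the test sample, and B is about 2.2-fold higher in the test sample after normalization, while A's difference stays under the 2-fold cutoff.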

In some embodiments of any aspect of the invention, the test primers are selected from a library of candidate primers based on one or more parameters, for example using any of the primer selection methods of the invention. In some embodiments, the test primers are selected from the library of candidate primers based at least in part on the propensity of the candidate primers to form primer dimers.

In one aspect, the invention features a method of selecting test primers from a library of candidate primers. In various embodiments, the selection involves (i) computing in silico an undesirability score for most or all possible combinations of two candidate primers from the library, where each undesirability score is based at least in part on the likelihood that the two candidate primers form a dimer; (ii) removing from the library the candidate primer with the highest undesirability score; (iii) if the candidate primer removed in step (ii) is a member of a primer pair, also removing the other member of that primer pair from the library; and (iv) optionally repeating steps (ii) and (iii), thereby selecting a library of test primers. In some embodiments, the selection is performed until the undesirability scores of all remaining combinations of candidate primers in the library are at or below a minimum threshold. In some embodiments, the selection is performed until the number of candidate primers remaining in the library is reduced to a desired number. In various embodiments, undesirability scores are computed for at least 80%, 90%, 95%, 98%, 99%, or 99.5% of the possible combinations of candidate primers in the library. In various embodiments, the candidate primers remaining in the library can simultaneously amplify at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci. In various embodiments, the method also includes (v) contacting a nucleic acid sample that includes the target loci with the candidate primers remaining in the library to produce a reaction mixture; and (vi) subjecting the reaction mixture to primer extension reaction conditions to produce amplification products that include target amplicons.
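The greedy selection in steps (i) to (iv) can be sketched as follows. Two assumptions are made that the text does not fix: the undesirability score is a crude proxy (the longest stretch of one primer that can base-pair with the other), and, since scores belong to combinations, "the candidate primer with the highest undesirability score" is read as the member of the worst-scoring combination whose summed score over all remaining combinations is larger. All names and sequences are hypothetical.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def dimer_score(a, b):
    """Hypothetical score: length of the longest stretch of `a` that is perfectly
    complementary to a stretch of `b`, i.e. the longest common substring of `a`
    and revcomp(b), found by dynamic programming."""
    rb = revcomp(b)
    best = 0
    prev = [0] * (len(rb) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(rb) + 1)
        for j in range(1, len(rb) + 1):
            if a[i - 1] == rb[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def select_primers(primers, partner, threshold, target_size=None):
    """Greedy selection following steps (i)-(iv).

    primers: dict of primer name -> sequence.
    partner: dict of primer name -> other member of its primer pair.
    Stops once every remaining combination scores <= threshold, or once the
    library has been reduced to target_size primers.
    """
    pool = dict(primers)
    # Step (i): score every combination of two candidate primers.
    scores = {frozenset((x, y)): dimer_score(pool[x], pool[y])
              for x, y in combinations(pool, 2)}

    def total(name):
        return sum(s for pair, s in scores.items() if name in pair)

    # Step (iv): repeat steps (ii) and (iii) until a stopping rule is met.
    while len(pool) > 1:
        if target_size is not None and len(pool) <= target_size:
            break
        worst_pair, worst = max(scores.items(), key=lambda kv: kv[1])
        if worst <= threshold:
            break
        # Step (ii): drop the member of the worst combination with the larger total.
        victim = max(sorted(worst_pair), key=total)
        # Step (iii): also drop its pair partner, if still in the library.
        removed = {victim}
        if partner.get(victim) in pool:
            removed.add(partner[victim])
        for name in removed:
            del pool[name]
        scores = {p: s for p, s in scores.items() if not (p & removed)}
    return pool

primers = {"p1f": "ACCACCAC", "p1r": "CAACCAAC", "p2f": "GTGGTGGT", "p2r": "CCACACAA"}
partner = {"p1f": "p1r", "p1r": "p1f", "p2f": "p2r", "p2r": "p2f"}
selected = select_primers(primers, partner, threshold=3)
```

In the toy library, `p2f` is fully complementary to `p1f`, so it is removed along with its partner `p2r`, after which every remaining combination scores at or below the threshold.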

In one aspect, the invention features a method of selecting test primers from a library of candidate primers. In various embodiments, the selection of test primers from the library of candidate primers involves (i) computing in silico an undesirability score for most or all possible combinations of two candidate primers from the library, where each undesirability score is based at least in part on the likelihood that the two candidate primers form a dimer; (ii) removing from the library the candidate primer that is part of the largest number of combinations of two candidate primers with an undesirability score above a first minimum threshold; (iii) if the candidate primer removed in step (ii) is a member of a primer pair, also removing the other member of that primer pair from the library; and (iv) optionally repeating steps (ii) and (iii), thereby selecting a library of test primers. In some embodiments, the selection is performed until the undesirability scores of all remaining combinations of candidate primers in the library are at or below the first minimum threshold. In some embodiments, the selection is performed until the number of candidate primers remaining in the library is reduced to a desired number. In various embodiments, undesirability scores are computed for at least 80%, 90%, 95%, 98%, 99%, or 99.5% of the possible combinations of candidate primers in the library. In various embodiments, the candidate primers remaining in the library can simultaneously amplify at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci. In various embodiments, the method also includes (v) contacting a nucleic acid sample that includes the target loci with the candidate primers remaining in the library to produce a reaction mixture; and (vi) subjecting the reaction mixture to primer extension reaction conditions to produce amplification products that include target amplicons.

In various embodiments of any aspect of the invention, the selection method involves further reducing the number of candidate primers remaining in the library by lowering the first minimum threshold used in step (ii) to a lower second minimum threshold and optionally repeating steps (ii) and (iii). In some embodiments, the selection method involves raising the first minimum threshold used in step (ii) to a higher second minimum threshold and optionally repeating steps (ii) and (iii). In some embodiments, the selection method is performed until the undesirability scores of all remaining combinations of candidate primers in the library are at or below the second minimum threshold, or until the number of candidate primers remaining in the library is reduced to a desired number.
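The threshold-count variant of steps (ii) and (iii), together with the later lowering (or raising) of the threshold, might look like the sketch below. It assumes the pairwise undesirability scores have already been computed by some dimer model and are supplied as a dictionary; the helper names, alphabetical tie-breaking, and the toy scores are hypothetical.

```python
from collections import Counter
from itertools import combinations

def prune_by_count(pool, partner, scores, threshold, target_size=None):
    """Steps (ii)-(iv): repeatedly remove the candidate primer that takes part in
    the most combinations scoring above `threshold`, plus its pair partner.

    pool: iterable of candidate primer names.
    partner: dict of primer name -> other member of its primer pair.
    scores: dict of frozenset({a, b}) -> precomputed undesirability score.
    """
    pool = set(pool)
    while target_size is None or len(pool) > target_size:
        bad = Counter()
        for pair, score in scores.items():
            if score > threshold and pair <= pool:
                bad.update(pair)  # count each member of the offending pair
        if not bad:
            break  # every remaining combination is at or below the threshold
        # Most above-threshold combinations; ties broken alphabetically.
        victim = min(bad, key=lambda name: (-bad[name], name))
        pool.discard(victim)
        pool.discard(partner.get(victim))
    return pool

def prune_with_schedule(pool, partner, scores, thresholds, target_size=None):
    """Run the pruning at each threshold in turn, e.g. a first threshold
    followed by a lower (or higher) second threshold."""
    for threshold in thresholds:
        pool = prune_by_count(pool, partner, scores, threshold, target_size)
    return pool

# Toy library of three primer pairs with hypothetical scores.
names = ["a1", "a2", "b1", "b2", "c1", "c2"]
partner = {"a1": "a2", "a2": "a1", "b1": "b2", "b2": "b1", "c1": "c2", "c2": "c1"}
scores = {frozenset(p): 0.0 for p in combinations(names, 2)}
scores[frozenset(("a1", "b1"))] = 5.0
scores[frozenset(("a1", "c1"))] = 5.0
scores[frozenset(("b2", "c2"))] = 2.0

first_pass = prune_with_schedule(names, partner, scores, [3.0])
second_pass = prune_with_schedule(names, partner, scores, [3.0, 1.0])
```

Pruning at a threshold of 3 removes only the `a` pair; lowering the threshold to 1 then also removes the `b` pair, leaving `c1` and `c2`.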

In various embodiments of any aspect of the invention, the method involves identifying or selecting, before step (i), primers that hybridize to the target loci. In some embodiments, multiple primers (or primer pairs) hybridize to the same target locus, and the selection method is used to choose one primer (or one primer pair) for that target locus based on one or more parameters. In various embodiments, the method involves removing from the library, before step (ii), primer pairs that produce a target amplicon overlapping a target amplicon produced by another primer pair. In various embodiments, when two or more candidate primers have equal undesirability scores, the candidate primer to remove from the library is chosen from that group based on one or more other parameters. In some embodiments, the candidate primers remaining in the library are used as the test primer library in any of the methods of the invention. In some embodiments, the resulting test primer library includes any of the primer libraries of the invention.
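Removing primer pairs whose target amplicons overlap can be done with a simple interval test. This sketch assumes amplicons are given as half-open genomic intervals in priority order and that, of two overlapping amplicons, the earlier one is kept; the text does not say which pair to keep, and the names and coordinates are hypothetical. The pairwise scan is quadratic, so a real library of tens of thousands of pairs would more likely use a sorted sweep per chromosome.

```python
from typing import NamedTuple

class Amplicon(NamedTuple):
    pair_id: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

def drop_overlapping(amplicons):
    """Keep amplicons in the given priority order, dropping any primer pair
    whose amplicon overlaps one already kept on the same chromosome."""
    kept = []
    for amp in amplicons:
        clash = any(k.chrom == amp.chrom and amp.start < k.end and k.start < amp.end
                    for k in kept)
        if not clash:
            kept.append(amp)
    return kept

amplicons = [
    Amplicon("p1", "chr1", 100, 170),
    Amplicon("p2", "chr1", 150, 220),  # overlaps p1
    Amplicon("p3", "chr1", 170, 240),  # abuts p1 without overlapping
    Amplicon("p4", "chr2", 100, 170),  # different chromosome
]
kept_ids = [a.pair_id for a in drop_overlapping(amplicons)]
```

Only `p2` is dropped: with half-open intervals, `p3` starting exactly where `p1` ends does not count as an overlap.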

In various embodiments of any aspect of the invention, the undesirability score is based at least in part on one or more parameters selected from the group consisting of: the heterozygosity rate of the target locus, the prevalence of a disease associated with a sequence (e.g., a polymorphism) at the target locus, the penetrance of a disease associated with a sequence (e.g., a polymorphism) at the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon.
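The text leaves open how several parameters are combined into one undesirability score. One hypothetical option is a weighted sum of penalties, sketched below; the weights, the target amplicon T_m and GC values, and the field names are illustrative assumptions, and only the 70-nucleotide target length is drawn from the 60 to 75 nucleotide amplicon range mentioned earlier.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    dimer_score: float     # from some dimer model; higher is worse
    heterozygosity: float  # 0..1; higher is more informative
    specificity: float     # 0..1; fraction of genomic hits on the intended locus
    amplicon_tm: float     # degrees C
    amplicon_gc: float     # fraction 0..1
    amplicon_len: int      # nucleotides

# Hypothetical weights and targets, for illustration only.
WEIGHTS = {"dimer": 1.0, "het": 2.0, "spec": 3.0, "tm": 0.2, "gc": 5.0, "len": 0.05}
TARGET_TM = 78.0   # hypothetical
TARGET_GC = 0.5    # hypothetical
TARGET_LEN = 70    # within the 60-75 nt range given in the text

def undesirability(c: Candidate) -> float:
    """Weighted sum of penalties; a larger value means a worse candidate."""
    return (WEIGHTS["dimer"] * c.dimer_score
            + WEIGHTS["het"] * (1.0 - c.heterozygosity)
            + WEIGHTS["spec"] * (1.0 - c.specificity)
            + WEIGHTS["tm"] * abs(c.amplicon_tm - TARGET_TM)
            + WEIGHTS["gc"] * abs(c.amplicon_gc - TARGET_GC)
            + WEIGHTS["len"] * abs(c.amplicon_len - TARGET_LEN))

good = Candidate(4.0, 0.5, 1.0, 78.0, 0.5, 70)
longer = Candidate(4.0, 0.5, 1.0, 78.0, 0.5, 80)
```

The two candidates differ only in amplicon length, so the longer one scores 0.5 worse; in practice the weights would need to be tuned against experimental data.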

In various embodiments of any aspect of the invention, the undesirability score is based at least in part on one or more parameters selected from the group consisting of: the heterozygosity rate of the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon; and the test primers are used to simultaneously amplify at least 1,000 different target loci in a sample that includes maternal DNA from the pregnant mother of a fetus and fetal DNA, to determine the presence or absence of a fetal chromosomal abnormality. In various embodiments, the method includes ligating universal primer binding sites to DNA molecules in the sample; amplifying the ligated DNA molecules using at least 1,000 specific primers and one universal primer to produce a first set of amplification products; and amplifying the first set of amplification products using at least 1,000 pairs of specific primers to produce a second set of amplification products. In various embodiments, at least 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different primer pairs are used. In various embodiments, at least 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci are amplified.

In various embodiments of any aspect of the invention, the undesirability score is based at least in part on one or more parameters selected from the group consisting of: the heterozygosity rate of the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon; and the test primers are used to simultaneously amplify at least 1,000 different target loci in a sample that includes DNA from the putative father of a fetus, and to simultaneously amplify the target loci in a sample that includes maternal DNA from the pregnant mother of the fetus and fetal DNA, to determine whether the putative father is the biological father of the fetus. In various embodiments, at least 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci are amplified.

In various embodiments of any aspect of the invention, the undesirability score is based at least in part on one or more parameters selected from the group consisting of: the heterozygosity rate of the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon; and the test primers are used to simultaneously amplify at least 1,000 different target loci in one or more cells from an embryo to determine the presence or absence of a chromosomal abnormality. In various embodiments, cells from a set of two or more embryos are analyzed, and one embryo is selected for in vitro fertilization. In various embodiments, at least 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci are amplified.

In various embodiments of any aspect of the invention, the undesirability score is based at least in part on one or more parameters selected from the group consisting of: the heterozygosity rate of the target locus, the specificity of the candidate primers for the target locus, the size of the candidate primers, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon. The test primers are used to simultaneously amplify at least 1,000 different target loci in a forensic nucleic acid sample. In various embodiments, the annealing step lasts longer than 3, 5, 8, 10, or 15 minutes. In various embodiments, at least 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci are amplified.

In various embodiments of any aspect of the invention, the undesirability score is based at least in part on one or more parameters selected from the group consisting of:

- the heterozygosity rate of the target locus;
- the prevalence of a disease associated with a sequence (e.g., a polymorphism) at the target locus;
- the penetrance of a disease associated with a sequence (e.g., a polymorphism) at the target locus;
- the specificity of the candidate primers for the target locus;
- the size of the candidate primers;
- the melting temperature of the target amplicon;
- the GC content of the target amplicon;
- the amplification efficiency of the target amplicon;
- the size of the target amplicon.

The method involves three steps. First, the test primers are used to simultaneously amplify at least 1,000 different target loci in a control nucleic acid sample, producing a first set of target amplicons. Second, the same target loci are simultaneously amplified in a test nucleic acid sample, producing a second set of target amplicons. Third, the two sets are compared to determine whether a target locus is present in one sample but absent from the other, or whether it is present at different levels in the control and test samples. In various embodiments, the test sample is from an individual suspected of having a relevant disease or phenotype, or an increased risk of that disease or phenotype. In those embodiments, one or more of the target loci include a sequence (e.g., a polymorphism) that is associated with the disease or phenotype, or with an increased risk of it. In various embodiments, at least 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci are amplified.

In various embodiments of any aspect of the invention, the undesirability score is based at least in part on one or more parameters selected from the group consisting of:

- the heterozygosity rate of the target locus;
- the prevalence of a disease associated with a sequence (e.g., a polymorphism) at the target locus;
- the penetrance of a disease associated with a sequence (e.g., a polymorphism) at the target locus;
- the specificity of the candidate primers for the target locus;
- the size of the candidate primers;
- the melting temperature of the target amplicon;
- the GC content of the target amplicon;
- the amplification efficiency of the target amplicon;
- the size of the target amplicon.

The method involves three steps. First, the test primers are used to simultaneously amplify 1,000 different target loci in a control sample comprising RNA, producing a first set of target amplicons. Second, the same target loci are simultaneously amplified in a test sample comprising RNA, producing a second set of target amplicons. Third, the two sets are compared to determine the presence or absence of a difference in RNA expression levels between the control and test samples. In various embodiments, the RNA is mRNA.

In various embodiments, the test sample is from an individual suspected of having a relevant disease or phenotype (e.g., cancer), or an increased risk of it. In those embodiments, one or more of the target loci include a sequence (e.g., a polymorphism or other mutation) associated with the disease or phenotype, or with an increased risk of it. In some embodiments, the test sample is from an individual diagnosed with a relevant disease or phenotype (e.g., cancer). In those embodiments, a difference in RNA expression levels between the control and test samples indicates that the target locus includes a sequence (e.g., a polymorphism or other mutation) associated with an increased or decreased risk of that disease or phenotype. In various embodiments, at least 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci are amplified.

In one aspect, the invention features primer libraries. In some embodiments, the primers are selected from a library of candidate primers using any of the methods of the invention. The library may be characterized in the following ways, alone or in combination:

- In some embodiments, it comprises primers that simultaneously hybridize to at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci.
- In some embodiments, it comprises primers that simultaneously amplify at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci.
- In some embodiments, it comprises primers that simultaneously amplify at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci such that less than 60%, 40%, 30%, 20%, 10%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.25%, 0.1%, or 0.05% of the amplification products are primer dimers.
- In some embodiments, it comprises primers that simultaneously amplify 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci such that at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% of the amplification products are target amplicons.
- In some embodiments, it comprises primers that simultaneously amplify target loci such that at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% of the targeted loci are amplified, out of 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci.
- In some embodiments, it comprises at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 primer pairs. Each pair comprises a forward test primer and a reverse test primer, and each pair hybridizes to one target locus.
- In some embodiments, it comprises at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 individual primers that each hybridize to a different target locus, where the individual primers are not part of a primer pair.
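The library performance criteria above can be computed from classified sequencing reads:

- the fraction of amplification products that are primer dimers;
- the fraction that are target amplicons;
- the fraction of targeted loci that amplify.

This minimal Python sketch assumes each read has already been labelled upstream (for example, by alignment) as a target locus ID, `"primer_dimer"`, or `"off_target"`. That labelling scheme, the function name, and the `min_reads` threshold are hypothetical.

```python
from collections import Counter


def library_performance(read_labels, targeted_loci, min_reads=1):
    """Summarize a multiplex run from per-read classifications.

    read_labels: iterable of labels, each a target locus ID, "primer_dimer", or "off_target".
    targeted_loci: the locus IDs the library was designed to amplify.
    min_reads: reads a locus needs before it counts as amplified (assumed threshold).
    """
    counts = Counter(read_labels)
    total = sum(counts.values())
    targets = set(targeted_loci)
    on_target = sum(n for label, n in counts.items() if label in targets)
    amplified = sum(1 for locus in targets if counts[locus] >= min_reads)
    return {
        "primer_dimer_fraction": counts["primer_dimer"] / total,
        "target_amplicon_fraction": on_target / total,
        "loci_amplified_fraction": amplified / len(targets),
    }
```

The resulting fractions can then be compared with thresholds such as those listed above, for example a primer-dimer fraction below 5%.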

In various embodiments of any aspect of the invention, the concentration of each primer is less than 100, 75, 50, 25, 10, 5, 2, or 1 nM.

Composition and length:

- In various embodiments, the GC content of the primers is between 30% and 80%, such as between 40% and 70% or between 50% and 60%, inclusive.
- In some embodiments, the range of GC content across the primers is less than 30%, 20%, 10%, or 5%.
- In some embodiments, the melting temperature of the primers is between 40°C and 80°C, such as 50°C to 70°C, 55°C to 65°C, or 57°C to 60.5°C, inclusive.
- In some embodiments, the range of melting temperatures across the primers is less than 15°C, 10°C, 5°C, 3°C, or 1°C.
- In some embodiments, the primers are between 15 and 100 nucleotides long, such as between 15 and 75, 15 and 40, 17 and 35, 18 and 30, or 20 and 65 nucleotides, inclusive.
- In some embodiments, the range of primer lengths is less than 50, 40, 30, 20, 10, or 5 nucleotides.

Primer structure:

- In some embodiments, a primer includes a non-target-specific tag, such as a tag that forms an internal loop structure. In some embodiments, the tag lies between two DNA-binding regions.
- In various embodiments, a primer includes a 5′ region specific for the target locus, an internal region that is not specific for the target locus and forms a loop structure, and a 3′ region specific for the target locus. In various embodiments, the 3′ region is at least 7 nucleotides long. In some embodiments, it is between 7 and 20 nucleotides long, such as between 7 and 15 or between 7 and 10 nucleotides, inclusive.
- In various embodiments, a primer includes, in order: a 5′ region that is not specific for the target locus (e.g., another tag or a universal primer binding site); a region specific for the target locus; an internal region that is not specific for the target locus and forms a loop structure; and a 3′ region specific for the target locus.

Target amplicons:

- In some embodiments, the target amplicons are between 50 and 100 nucleotides long, such as between 60 and 80 or between 60 and 75 nucleotides, inclusive.
- In some embodiments, the range of target amplicon lengths is less than 50, 25, 15, 10, or 5 nucleotides.
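The following Python sketch checks a candidate primer set against several of the ranges listed above: per-primer GC content and length, and the spread (range) of GC content and melting temperature across the set. The default thresholds are picked from the listed options. For Tm it uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. This is only a crude approximation suited to short oligonucleotides, and a nearest-neighbour thermodynamic model would be more realistic. This application does not state which Tm model it uses, so that choice is an assumption here.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in the sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> float:
    """Wallace rule estimate: 2 degrees C per A/T plus 4 degrees C per G/C."""
    seq = seq.upper()
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2.0 * at + 4.0 * gc


def check_primer_set(primers, gc_bounds=(0.30, 0.80), max_gc_range=0.20,
                     max_tm_range=5.0, length_bounds=(15, 100)):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    gcs = [gc_fraction(p) for p in primers]
    tms = [wallace_tm(p) for p in primers]
    for p, gc in zip(primers, gcs):
        if not gc_bounds[0] <= gc <= gc_bounds[1]:
            problems.append(f"{p}: GC {gc:.2f} outside {gc_bounds}")
        if not length_bounds[0] <= len(p) <= length_bounds[1]:
            problems.append(f"{p}: length {len(p)} outside {length_bounds}")
    # Set-level uniformity checks: spread of GC content and of Tm across all primers.
    if max(gcs) - min(gcs) > max_gc_range:
        problems.append(f"GC range {max(gcs) - min(gcs):.2f} exceeds {max_gc_range}")
    if max(tms) - min(tms) > max_tm_range:
        problems.append(f"Tm range {max(tms) - min(tms):.1f} C exceeds {max_tm_range}")
    return problems
```

Keeping the set-level range checks separate from the per-primer checks mirrors the text, which constrains both individual values and their spread across the library.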

In one aspect, the invention provides kits comprising any of the primer libraries of the invention for amplifying target loci in a nucleic acid sample. In some embodiments, the kit includes instructions for using the library to amplify the target loci.

In one aspect, the invention features methods for determining the ploidy state of a chromosome in a gestating fetus. In some embodiments, the method involves contacting a nucleic acid sample with a library of primers that simultaneously hybridize to at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different polymorphic loci to produce a reaction mixture. The nucleic acid sample comprises maternal DNA from the mother of the fetus and fetal DNA from the fetus. In some embodiments, the method continues as follows:

1. The reaction mixture is subjected to primer extension reaction conditions to produce amplification products.
2. The amplification products are measured with a high-throughput sequencer to produce sequencing data.
3. Allele counts at the polymorphic loci are calculated on a computer from the sequencing data.
4. A plurality of ploidy hypotheses, each concerning a different possible ploidy state of the chromosome, are created on a computer.
5. For each ploidy hypothesis, a joint distribution model for the expected allele counts at the polymorphic loci on the chromosome is built on a computer.
6. The relative probability of each ploidy hypothesis is determined on a computer using the joint distribution model and the allele counts.
7. The ploidy state of the fetus is called by selecting the ploidy state corresponding to the hypothesis with the greatest probability.

In one aspect, the invention features methods for determining the ploidy state of a chromosome in a gestating fetus. In one embodiment, the method comprises:

1. obtaining a first DNA sample comprising maternal DNA from the mother of the fetus and fetal DNA from the fetus;
2. preparing the first sample by isolating the DNA, to obtain a prepared sample;
3. measuring the DNA in the prepared sample at a plurality of polymorphic loci on the chromosome;
4. calculating, on a computer, the allele counts at the plurality of polymorphic loci from the DNA measurements made on the prepared sample;
5. creating, on a computer, a plurality of ploidy hypotheses, each concerning a different possible ploidy state of the chromosome;
6. for each ploidy hypothesis, building on a computer a joint distribution model for the expected allele counts at the plurality of polymorphic loci on the chromosome;
7. determining, on a computer, the relative probability of each ploidy hypothesis using the joint distribution model and the allele counts measured on the prepared sample;
8. calling the ploidy state of the fetus by selecting the ploidy state corresponding to the hypothesis with the greatest probability.
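To make the hypothesis-testing steps concrete, here is a deliberately simplified Python sketch. It assumes the parental genotypes are known and uses only loci where the mother is AA and the father is BB, so every B read comes either from the fetus or from sequencing error. Suppose the fetal chromosome carries a copies of A and b copies of B, and the fetal fraction is f. Then the expected B-read fraction is f·b / (2(1 - f) + f(a + b)). Each hypothesis is scored with a binomial likelihood, and the hypothesis with the largest log-likelihood is called, i.e. maximum likelihood with a uniform prior. The hypothesis names, the error model, and the treatment of loci as independent are assumptions of this sketch. The joint model described in this application can also account for dependence between loci.

```python
import math

# Fetal (A copies, B copies) of the chromosome under each hypothesis, at loci where
# the mother is AA and the father is BB (maternal copies carry A, paternal copies carry B).
HYPOTHESES = {
    "monosomy_maternal": (1, 0),
    "monosomy_paternal": (0, 1),
    "disomy": (1, 1),
    "trisomy_maternal": (2, 1),
    "trisomy_paternal": (1, 2),
}


def expected_b_fraction(f, fetal_a, fetal_b):
    """Expected share of B reads: the mother contributes 2 A copies weighted by (1 - f),
    the fetus contributes fetal_a + fetal_b copies weighted by f."""
    return f * fetal_b / (2 * (1 - f) + f * (fetal_a + fetal_b))


def log_binom(k, n, p):
    """Binomial log-probability of k successes in n trials."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log1p(-p)


def call_ploidy(counts, f, error=0.001):
    """counts: list of (b_reads, depth) per locus; f: fetal fraction.
    Returns the maximum-likelihood hypothesis and all log-likelihoods."""
    loglik = {}
    for name, (a, b) in HYPOTHESES.items():
        p = expected_b_fraction(f, a, b)
        p_obs = p * (1 - error) + (1 - p) * error   # symmetric sequencing-error model
        # Loci treated as independent, so per-locus log-likelihoods add.
        loglik[name] = sum(log_binom(k, n, p_obs) for k, n in counts)
    best = max(loglik, key=loglik.get)
    return best, loglik
```

Take f = 0.2 as an example. Loci showing about 10% B reads support disomy, whose expected fraction is f/2 = 0.1. Loci showing about 18% B reads support a paternal trisomy, whose expected fraction is 2f/(2 + f) ≈ 0.18.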

In one aspect, the invention features a method of testing for an abnormal distribution of a chromosome in a sample comprising a mixture of maternal and fetal DNA. In some embodiments, the method involves:

(i) contacting the sample with a library of primers that simultaneously hybridize to at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci to produce a reaction mixture. The target loci are from a plurality of different chromosomes, which include at least one first chromosome suspected of being abnormally distributed in the sample and at least one second chromosome presumed to be normally distributed in the sample;

(ii) subjecting the reaction mixture to primer extension reaction conditions to produce amplification products;

(iii) sequencing the amplification products to obtain a plurality of sequence tags aligning to the target loci, where the sequence tags are long enough to be assigned to a specific target locus;

(iv) assigning, on a computer, the sequence tags to their corresponding target loci;

(v) determining, on a computer, the number of sequence tags aligning to the target loci of the first chromosome and the number aligning to the target loci of the second chromosome;

(vi) comparing, on a computer, the numbers from step (v) to determine the presence or absence of an abnormal distribution of the first chromosome.
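Steps (iv) to (vi) reduce to counting. The sketch below assumes that each sequence tag has already been assigned to a target locus (step iv) and that a locus-to-chromosome table is available. It tallies tags per chromosome (step v), then forms a per-locus-normalized ratio between the test and reference chromosomes as one possible comparison for step (vi). The function names and the normalization are hypothetical choices, not taken from this application.

```python
from collections import Counter


def chromosome_tag_counts(tag_loci, locus_to_chrom):
    """tag_loci: the target locus assigned to each sequence tag (output of step iv).
    locus_to_chrom: mapping from locus ID to chromosome name.
    Returns a Counter of tags per chromosome (step v)."""
    counts = Counter()
    for locus in tag_loci:
        chrom = locus_to_chrom.get(locus)
        if chrom is not None:          # ignore tags that match no targeted locus
            counts[chrom] += 1
    return counts


def normalized_ratio(counts, loci_per_chrom, test_chrom, ref_chrom):
    """Tags per targeted locus on test_chrom divided by tags per targeted locus on ref_chrom."""
    test = counts[test_chrom] / loci_per_chrom[test_chrom]
    ref = counts[ref_chrom] / loci_per_chrom[ref_chrom]
    return test / ref
```

If all loci amplify with similar efficiency, a ratio near 1 is consistent with normal representation. A fetal trisomy at fetal fraction f would raise the expected ratio to about 1 + f/2, because the chromosome's total copy contribution becomes 2(1 - f) + 3f = 2(1 + f/2). In practice, per-locus efficiencies differ, so the ratio would need calibration against euploid reference samples.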

In one aspect, the invention provides methods for detecting the presence or absence of fetal aneuploidy. In some embodiments, the method involves:

(i) contacting a sample comprising a mixture of maternal and fetal DNA with a library of primers that simultaneously hybridize to at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different non-polymorphic target loci to produce a reaction mixture, where the target loci are from a plurality of different chromosomes;

(ii) subjecting the reaction mixture to primer extension reaction conditions to produce amplification products that include target amplicons;

(iii) quantifying, on a computer, the relative frequencies of the target amplicons from a first and a second chromosome of interest;

(iv) comparing, on a computer, those relative frequencies;

(v) identifying the presence or absence of aneuploidy based on the compared relative frequencies.

In some embodiments, the first chromosome is a chromosome suspected of being euploid. In some embodiments, the second chromosome is a chromosome suspected of being aneuploid.

In one aspect, a method is disclosed for determining the presence or absence of fetal aneuploidy in a maternal tissue sample comprising fetal and maternal genomic DNA. The method comprises:

(a) obtaining a mixture of fetal and maternal genomic DNA from the maternal tissue sample;

(b) performing massively parallel DNA sequencing of DNA fragments randomly selected from the mixture of step (a) to determine the sequences of those fragments;

(c) identifying the chromosome to which each sequence obtained in step (b) belongs;

(d) using the data of step (c) to determine the amount of at least one first chromosome in the mixture, where the first chromosome is presumed to be euploid in the fetus;

(e) using the data of step (c) to determine the amount of a second chromosome in the mixture, where the second chromosome is suspected of being aneuploid in the fetus;

(f) calculating the fraction of fetal DNA in the mixture of fetal and maternal DNA;

(g) calculating the expected distribution of the amount of the second chromosome if it is euploid, using the amount determined in step (d);

(h) calculating the expected distribution of the amount of the second chromosome if it is aneuploid, using the amount determined in step (d) and the fetal DNA fraction calculated in step (f);

(i) using maximum likelihood or maximum a posteriori estimation to decide whether the amount of the second chromosome measured in step (e) more likely belongs to the distribution from step (g) or the distribution from step (h), thereby indicating the presence or absence of fetal aneuploidy.
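Steps (g) to (i) can be sketched as a two-hypothesis test. The Python below makes several simplifying assumptions that are not stated in the application:

- the aneuploidy considered is a trisomy;
- read counts follow a Poisson distribution;
- the reference count is treated as fixed;
- `euploid_ratio`, the expected test-to-reference count ratio for a euploid fetus, is known from prior calibration.

Under trisomy the expected count is scaled by 1 + f/2, which reflects one extra fetal copy. Passing `prior_trisomy` switches the decision from maximum likelihood to maximum a posteriori.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing k counts under a Poisson distribution with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def aneuploidy_call(n_ref, n_test, fetal_fraction, euploid_ratio, prior_trisomy=None):
    """n_ref: reads on the presumed-euploid chromosome(s) (step d).
    n_test: reads on the suspect chromosome (step e).
    euploid_ratio: expected n_test / n_ref for a euploid fetus (assumed pre-calibrated).
    Returns (call, euploid log-score, trisomy log-score)."""
    lam_euploid = n_ref * euploid_ratio                        # step (g)
    lam_trisomy = lam_euploid * (1 + fetal_fraction / 2)       # step (h): one extra fetal copy
    score_e = poisson_logpmf(n_test, lam_euploid)
    score_t = poisson_logpmf(n_test, lam_trisomy)
    if prior_trisomy is not None:                              # MAP instead of ML
        score_e += math.log(1 - prior_trisomy)
        score_t += math.log(prior_trisomy)
    call = "aneuploid" if score_t > score_e else "euploid"     # step (i)
    return call, score_e, score_t
```

Real read counts are usually overdispersed relative to a Poisson model. A production test would therefore more likely use a calibrated normal or negative-binomial model, but the decision rule keeps the same shape.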

In various embodiments of any aspect of the invention, the method further comprises obtaining genotype data from one or both parents of the fetus. In some embodiments, obtaining this genotype data comprises preparing DNA from the parents, where the preparation preferentially enriches the DNA at a plurality of polymorphic loci to give prepared parental DNA. The prepared parental DNA is optionally amplified and is then measured at the plurality of polymorphic loci.

In various embodiments of any aspect of the invention, genetic data obtained from one or both parents are used to build the joint distribution model for the expected allele count probabilities at the plurality of polymorphic loci on the chromosome. In some embodiments, the sample (e.g., the first sample) has been isolated from maternal plasma. In those embodiments, the maternal genotype data are obtained by estimating them from the DNA measurements made on the prepared sample.

In one aspect, a diagnostic kit is disclosed for helping to determine the ploidy state of a chromosome in a gestating fetus. The diagnostic kit is capable of carrying out the preparing and measuring steps of any of the methods of the invention.

In various embodiments of any aspect of the invention, the allele counts are probabilistic rather than binary. In some embodiments, the measurements of the DNA in the prepared sample at the plurality of polymorphic loci are also used to determine whether the fetus has inherited one or more disease-linked haplotypes.

In various embodiments of any aspect of the invention, the joint distribution model for the allele count probabilities is built using data on the probability of chromosomal crossovers at different locations in the chromosome. These data are used to model the dependence between polymorphic alleles on the chromosome. In some embodiments, building the joint distribution model for allele counts and determining the relative probability of each hypothesis are done using a method that does not require a reference chromosome.
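One common way to turn crossover probabilities into a dependence model is a hidden-state chain that tracks which parental homolog was inherited at each locus. The sketch below makes two assumptions. First, it uses the Haldane map function, r = (1 - e^(-2d))/2 with d in Morgans and no crossover interference, to convert genetic distance into switch probabilities. Second, it uses a two-state forward algorithm to sum over inheritance paths. The application specifies neither a map function nor an algorithm, so both are assumptions here. The per-locus emission likelihoods would come from an allele-count model such as the one used for the ploidy hypotheses.

```python
import math


def haldane_recombination(d_morgans):
    """Haldane map function: probability of an odd number of crossovers over d Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def same_homolog_probs(positions_cm):
    """For consecutive loci (sorted genetic positions in centimorgans), the probability
    that the inherited parental homolog does not switch between them."""
    return [1.0 - haldane_recombination((b - a) / 100.0)
            for a, b in zip(positions_cm, positions_cm[1:])]


def chain_likelihood(emissions, positions_cm):
    """Forward algorithm over two hidden states (which parental homolog was inherited).

    emissions[i] = (likelihood of locus i's data if homolog 0 was inherited,
                    likelihood of locus i's data if homolog 1 was inherited).
    Returns the joint likelihood of all loci, summed over inheritance paths."""
    alpha = [0.5 * emissions[0][0], 0.5 * emissions[0][1]]   # uniform starting state
    for (e0, e1), stay in zip(emissions[1:], same_homolog_probs(positions_cm)):
        switch = 1.0 - stay
        alpha = [(alpha[0] * stay + alpha[1] * switch) * e0,
                 (alpha[0] * switch + alpha[1] * stay) * e1]
    return alpha[0] + alpha[1]
```

For thousands of loci, the forward variables would need rescaling or log-space arithmetic to avoid floating-point underflow.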

In various embodiments of any aspect of the invention, determining the relative probability of each hypothesis makes use of an estimated fraction of fetal DNA in the prepared sample. In some embodiments, the DNA measurements from the prepared sample that are used to calculate the allele count probabilities and to determine the relative probability of each hypothesis comprise raw genetic data. In some embodiments, the ploidy state corresponding to the hypothesis with the greatest probability is selected using maximum likelihood estimation or maximum a posteriori estimation.

In various embodiments of any aspect of the invention, calling the ploidy state of the fetus further comprises combining two sets of relative probabilities for each ploidy hypothesis. The first set is determined using the joint distribution model and the allele count probabilities. The second set is calculated using statistical techniques selected from the group consisting of:

- a read count analysis;
- comparing heterozygosity rates;
- a statistic that is only available when parental genetic information is used;
- the probability of normalized genotype signals for certain parental contexts;
- a statistic calculated using the estimated fetal fraction of the sample (e.g., the first sample) or of the prepared sample;
- combinations thereof.
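A simple way to combine relative probabilities from several statistics is to add their log-likelihoods per hypothesis, optionally add a log prior, and then normalize. This treats the statistics as conditionally independent, which is an assumption, since they are often computed from the same reads. The posterior probability of the called hypothesis can also serve as one possible confidence estimate for the call. The function name and input layout below are hypothetical.

```python
import math


def combine_and_call(per_method_loglik, prior=None):
    """Combine per-hypothesis log-likelihoods from several statistics and call a hypothesis.

    per_method_loglik: list of dicts {hypothesis: log-likelihood}, one dict per statistic.
    prior: optional dict {hypothesis: prior probability}; None means a uniform prior.
    Returns (called hypothesis, its posterior probability, all posteriors)."""
    hypotheses = list(per_method_loglik[0])
    total = {h: sum(m[h] for m in per_method_loglik) for h in hypotheses}
    if prior is not None:
        total = {h: total[h] + math.log(prior[h]) for h in hypotheses}
    top = max(total.values())                                  # subtract max for numerical stability
    weights = {h: math.exp(v - top) for h, v in total.items()}
    z = sum(weights.values())
    posterior = {h: w / z for h, w in weights.items()}
    call = max(posterior, key=posterior.get)
    return call, posterior[call], posterior
```

For example, suppose one statistic favours disomy by 2 log units over trisomy and another is neutral. The combined posterior for disomy is then 1 / (1 + e^-2), or about 0.88.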

In various embodiments of any aspect of the invention, a confidence estimate is calculated for the called ploidy state. In some embodiments, the method further comprises taking a clinical action based on the called ploidy state of the fetus, where the clinical action is either terminating the pregnancy or maintaining the pregnancy.

In various embodiments of any aspect of the invention, the method may be performed on a fetus at any of the following gestational ages, or combinations thereof:

- between 4 and 5, 5 and 6, 6 and 7, 7 and 8, 8 and 9, or 9 and 10 weeks of gestation;
- between 10 and 12, 12 and 14, 14 and 20, or 20 and 40 weeks of gestation;
- in the first, second, or third trimester.

In various embodiments of any aspect of the invention, the method is used to generate a report showing the determined ploidy state of the chromosome in the gestating fetus. In some embodiments, a kit is disclosed for use with any of the methods of the invention to determine the ploidy state of a target chromosome in a gestating fetus. The kit comprises a plurality of inner forward primers and, optionally, a plurality of inner reverse primers. Each primer is designed to hybridize to the region of DNA immediately upstream and/or downstream of one of the polymorphic sites on the target chromosome, and optionally on additional chromosomes. The hybridization region is separated from the polymorphic site by a small number of bases, selected from the group consisting of 1, 2, 3, 4, 5, 6 to 10, 11 to 15, 16 to 20, 21 to 25, 26 to 30, 31 to 60, and combinations thereof.
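The kit's primer placement rule can be checked programmatically. This sketch assumes 0-based, inclusive coordinates on the reference plus strand. Forward primers bind upstream of the polymorphic site, with their 3′ end at `primer_end`. Reverse primers bind downstream, with their 3′ end at `primer_start`. The sketch computes the number of bases separating the primer from the site and reports which of the listed ranges that number falls in. The coordinate convention and function names are hypothetical.

```python
# Separation ranges listed for the kit: bases between the primer's 3' end and the polymorphic site.
SEPARATION_RANGES = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 10), (11, 15),
                     (16, 20), (21, 25), (26, 30), (31, 60)]


def bases_to_site(site_pos, primer_start, primer_end, strand):
    """Number of bases between the primer's 3' end and the polymorphic site.

    '+': forward primer upstream of the site; its 3' end is primer_end.
    '-': reverse primer downstream of the site; its 3' end is primer_start."""
    if strand == "+":
        return site_pos - primer_end - 1
    if strand == "-":
        return primer_start - site_pos - 1
    raise ValueError("strand must be '+' or '-'")


def separation_range(gap):
    """Return the listed range containing gap, or None if it lies outside all of them."""
    for low, high in SEPARATION_RANGES:
        if low <= gap <= high:
            return (low, high)
    return None
```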

In one aspect, the invention features methods for determining whether a putative father is the biological father of a fetus gestating in a pregnant mother. In some embodiments, the method involves:

(i) simultaneously amplifying a plurality of polymorphic loci on genetic material from the putative father, including at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different polymorphic loci, to produce a first set of amplification products;

(ii) simultaneously amplifying the corresponding polymorphic loci on a mixed DNA sample derived from a blood sample of the pregnant mother, to produce a second set of amplification products, where the mixed DNA sample comprises fetal DNA and maternal DNA;

(iii) determining, on a computer, the probability that the putative father is the biological father of the fetus, using genotype measurements based on the first and second sets of amplification products;

(iv) using that probability to determine whether the putative father is the biological father of the fetus.

In various embodiments, the method further comprises simultaneously amplifying the corresponding polymorphic loci on genetic material from the mother to produce a third set of amplification products. In those embodiments, the probability that the putative father is the biological father is determined using genotype measurements based on the first, second, and third sets of amplification products.
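The paternity computation can be sketched as a likelihood ratio between two hypotheses: "the putative father is the father" and "a random man is the father". This Python sketch assumes:

- only loci where the mother is homozygous AA are used (known, for example, from the third set of products);
- the fetal fraction f is known;
- the putative father's genotypes are known from the first set of products;
- population B-allele frequencies q are available;
- B-read counts are binomial, with a fixed sequencing error rate;
- loci are independent.

The log-likelihood ratio is converted to a posterior probability under a prior, which defaults to 0.5 and is itself an assumption. All names are hypothetical.

```python
import math

# Probability that the putative father transmits a B allele, by his genotype.
P_B_FROM_FATHER = {"AA": 0.0, "AB": 0.5, "BB": 1.0}


def log_binom(k, n, p):
    """Binomial log-probability of k B reads out of n at B-read probability p."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log1p(-p)


def log_mix(w, la, lb):
    """log(w * exp(la) + (1 - w) * exp(lb)), computed stably; handles w = 0 or 1."""
    terms = []
    if w > 0:
        terms.append(math.log(w) + la)
    if w < 1:
        terms.append(math.log1p(-w) + lb)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def paternity_log_lr(loci, f, error=0.001):
    """loci: (b_reads, depth, putative_father_genotype, population_b_freq) tuples at loci
    where the mother is AA. Returns the log LR: putative father vs. random man."""
    p_het = (f / 2) * (1 - error) + (1 - f / 2) * error   # fetus AB: fetal B reads plus errors
    llr = 0.0
    for b, n, genotype, q in loci:
        l_het = log_binom(b, n, p_het)
        l_hom = log_binom(b, n, error)                      # fetus AA: B reads are errors only
        llr += log_mix(P_B_FROM_FATHER[genotype], l_het, l_hom) - log_mix(q, l_het, l_hom)
    return llr


def paternity_probability(llr, prior=0.5):
    """Posterior probability of paternity from the log LR and a prior (assumed 0.5)."""
    x = llr + math.log(prior) - math.log1p(-prior)          # posterior log-odds
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)
```

The split form of `paternity_probability` avoids overflow when the log-odds are very large in magnitude, which is typical when thousands of loci are combined.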

In one aspect, the invention provides methods for estimating the relative likelihood that each embryo in a set of embryos will develop as desired. In some embodiments, the method involves contacting a sample from each embryo with a library of primers that simultaneously hybridize to at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci, producing a reaction mixture for each embryo. Each sample is derived from one or more cells from the corresponding embryo. In some embodiments, each reaction mixture is subjected to primer extension reaction conditions to produce amplification products. In some embodiments, the method includes determining on a computer, based on the amplification products, one or more characteristics of at least one cell from each embryo. The relative likelihood that each embryo will develop as desired is then estimated on a computer from those characteristics.

In one aspect, the invention features a method of measuring the amounts of two or more target loci in a nucleic acid sample. In some embodiments, the method involves (i) amplifying, using PCR, a nucleic acid sample comprising a first standard locus, a second standard locus, a first target locus, and a second target locus to form amplified products, wherein the first standard locus and the first target locus have the same number of nucleotides but differ in sequence at one or more nucleotides, and wherein the second standard locus and the second target locus have the same number of nucleotides but differ in sequence at one or more nucleotides; (ii) sequencing the amplified products to determine a standard ratio comparing the relative amount of the amplified first standard locus to that of the amplified second standard locus, wherein the standard ratio indicates the difference in PCR efficiency between the amplification of the first standard locus and the amplification of the second standard locus; (iii) determining a target ratio comparing the relative amount of the amplified first target locus to that of the amplified second target locus; and (iv) adjusting the target ratio of step (iii) based on the standard ratio of step (ii) to determine the relative amounts of the first target locus and the second target locus in the sample. In various embodiments, the method involves determining the absolute amounts of the first target locus and the second target locus in the sample. In various embodiments, the method further comprises determining the presence or absence of target loci in the sample (e.g., at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci). In various embodiments, the method involves using any of the primer libraries of the invention. In various embodiments, the method involves simultaneously amplifying 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci.
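A minimal Python sketch of steps (ii) to (iv), assuming the reads for each standard and target locus have already been separated using the nucleotide differences between them, and that the two standards were added in a known input ratio (equal amounts by default). Because each standard has the same length as its target and is amplified alongside it, the observed standard ratio divided by the known input ratio serves as the PCR efficiency bias. The function names and the `std_input_ratio` parameter are hypothetical.

```python
def corrected_target_ratio(std1_reads, std2_reads, tgt1_reads, tgt2_reads,
                           std_input_ratio=1.0):
    """Correct a target-locus read ratio for differential PCR efficiency.

    std*_reads: sequence reads assigned to the first/second standard loci.
    tgt*_reads: reads assigned to the first/second target loci.
    std_input_ratio: known pre-PCR ratio of standard 1 to standard 2
    (assumed 1.0, i.e. equal spike-in amounts).
    """
    # Step (ii): standard ratio observed after amplification.
    standard_ratio = std1_reads / std2_reads
    # How much locus 1 was over- or under-amplified relative to locus 2.
    efficiency_bias = standard_ratio / std_input_ratio
    # Step (iii): raw target ratio.
    target_ratio = tgt1_reads / tgt2_reads
    # Step (iv): remove the bias to estimate the pre-PCR ratio in the sample.
    return target_ratio / efficiency_bias


def absolute_target_copies(tgt_reads, std_reads, std_input_copies):
    """Absolute amount of one target, assuming a known number of standard copies."""
    # Target and standard share primers, so their read ratio approximates
    # their pre-PCR copy ratio.
    return tgt_reads / std_reads * std_input_copies
```

For example, standards read at 1,200 and 800 (bias 1.5) and targets read at 3,000 and 1,000 (raw ratio 3.0) give a corrected target ratio of 2.0.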

In one aspect, the invention features a method of quantitatively measuring a plurality of genetic targets in a sample for analysis. In some embodiments, the method comprises (i) mixing genetic material derived from the sample for analysis with a plurality of target-specific amplification reagents and a plurality of standard sequences corresponding to the targets of the target-specific amplification reagents; (ii) amplifying the target regions of the genetic material and of the standard sequences to produce target amplicons and standard sequence amplicons; and (iii) measuring the numbers of target amplicons and standard sequence amplicons produced. In some embodiments, the genetic material is in a library. In some embodiments, the genetic targets are polymorphic loci (e.g., SNPs). In some embodiments, the numbers are measured by counting sequences. In some embodiments, the method further comprises determining an estimated copy number of at least one chromosome in the sample from which the library was derived, wherein the determining involves comparing the number of sequence reads of the target amplicons with the number of sequence reads of the standard amplicons. In some embodiments, the standard sequences and the library include universal priming sites capable of being primed by the same primers. In some embodiments, the mixing step involves at least 10, 100, 500, 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target-specific amplification reagents and at least 10, 100, 500, 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 standard sequences. In various embodiments, the method involves using any of the primer libraries of the invention. In various embodiments, the method involves simultaneously amplifying 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target regions. In some embodiments, the relative amount of each of the standard sequences is known. In some embodiments, the relative amount of each of the sequences has been calibrated against a reference genome. In some embodiments, the sample for analysis comprises a mixture of fetal and maternal genomes. In some embodiments, the sample for analysis is derived from the blood of a pregnant woman or from plasma. In some embodiments, the reference genome has at least one aneuploidy, such as an aneuploidy of chromosome 13, 18, 21, X, or Y. In some embodiments, the reference genome is diploid.
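The comparison of target and standard read counts can be sketched as follows, under assumptions that are not from the source. Because a target and its standard are amplified by the same primers, the target-to-standard read ratio, scaled by the standard's calibrated relative amount, approximates the target's input abundance. Comparing the median abundance on a test chromosome with the median on a reference chromosome assumed to be disomic then gives a copy-number estimate. This models a single-genome sample; a fetal/maternal mixture would need an additional fetal-fraction model. The function names and the median summary are hypothetical choices.

```python
from statistics import median


def locus_abundance(target_reads, standard_reads, standard_amount):
    """Input abundance of one target relative to its calibrated standard."""
    return target_reads / standard_reads * standard_amount


def estimate_copy_number(test_loci, reference_loci, reference_copies=2):
    """Estimate the number of copies of a test chromosome.

    test_loci / reference_loci: lists of (target_reads, standard_reads,
    standard_amount) tuples for loci on the test chromosome and on a
    reference chromosome assumed to be present in `reference_copies` copies.
    """
    test = median([locus_abundance(*locus) for locus in test_loci])
    ref = median([locus_abundance(*locus) for locus in reference_loci])
    return reference_copies * test / ref
```

Using a median rather than a mean keeps a few poorly amplified loci from dominating the estimate when thousands of targets are measured.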

In one aspect, the invention features a mixture comprising a plurality of genetic standard sequences, wherein the relative amount of each genetic standard sequence in the mixture has been determined by calibration against a reference genome. In various embodiments, the mixture comprises at least 10, 100, 500, 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 genetic standard sequences. In various embodiments, each genetic standard sequence comprises a first universal priming site, a second universal priming site, a first target-specific priming site, a second target-specific priming site, and a marker sequence located between the first and second target-specific priming sites, wherein the first and second target-specific priming sites are located between the first and second universal priming sites. In various embodiments, the calibration involves using any of the primer libraries of the invention. In various embodiments, the calibration involves simultaneously amplifying 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target regions. In some embodiments, the reference genome has at least one aneuploidy, such as an aneuploidy of chromosome 13, 18, 21, X, or Y. In some embodiments, the reference genome is diploid.

In one aspect, the invention features a method of producing a set of calibrated genetic standard sequences. In some embodiments, the method comprises (i) forming an amplification reaction mixture comprising a library prepared from a reference genome, a plurality of target-specific amplification primer reagent sets, and a plurality of genetic standard sequences corresponding to the target-specific amplification reagent sets; (ii) amplifying the library and the genetic standard sequences to produce amplicons of the target sequences and amplicons of the genetic standard sequences; (iii) measuring the numbers of target sequence amplicons and genetic standard sequence amplicons; and (iv) determining the relative amount of each of the genetic standard sequences with respect to one another, thereby calibrating the plurality of genetic standard sequences. In various embodiments, at least 10, 100, 500, 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 genetic standard sequences are used. In various embodiments, the method involves using any of the primer libraries of the invention. In various embodiments, the method involves simultaneously amplifying 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different sequences. In some embodiments, the reference genome has at least one aneuploidy, such as an aneuploidy of chromosome 13, 18, 21, X, or Y. In some embodiments, the reference genome is diploid.
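Step (iv) can be illustrated with a short Python sketch, under the assumption (not stated in the source) that each standard is amplified by the same primers as its target, so that the standard-to-target read ratio cancels locus-specific amplification efficiency. Scaling that ratio by the target's known copy number in the reference genome and normalizing across loci yields the relative amount of each standard. The function name, the dictionary layout, and the default of two copies per locus (a diploid reference) are hypothetical.

```python
def calibrate_standards(counts, reference_copies=None):
    """Relative amount of each genetic standard, normalized to sum to 1.

    counts: {locus: (target_reads, standard_reads)} from one amplification of
    a reference-genome library spiked with the standards.
    reference_copies: {locus: copies per reference genome}; when omitted, every
    locus is assumed to be present in two copies (diploid reference).
    """
    raw = {}
    for locus, (target_reads, standard_reads) in counts.items():
        copies = 2.0 if reference_copies is None else reference_copies[locus]
        # Standard/target read ratio times the target's known copy number
        # estimates the standard's input amount on a common scale.
        raw[locus] = standard_reads / target_reads * copies
    total = sum(raw.values())
    return {locus: value / total for locus, value in raw.items()}
```

Passing `reference_copies` covers the embodiments in which the reference genome carries a known aneuploidy, for example three copies at loci on chromosome 21.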

In one aspect, the invention provides a set of genetic standard sequences that has been calibrated according to any of the methods of the invention. In one aspect, the invention provides a set of genetic standard sequences that can be calibrated before, during, or after the method is performed.

In one aspect, the invention features a method of measuring the copy number of a gene of interest that has at least one allele with a deletion. In some embodiments, the method comprises (i) mixing genetic material derived from a sample for analysis with amplification reagents that are specific for the gene of interest and are not capable of substantially amplifying the deletion-containing allele of the gene of interest, a standard sequence corresponding to the gene of interest, amplification reagents specific for a reference sequence, and a standard sequence corresponding to the reference sequence; (ii) amplifying the gene-of-interest sequence, the standard sequence corresponding to the gene of interest, the reference sequence, and the standard sequence corresponding to the reference sequence to produce gene-of-interest amplicons, reference sequence amplicons, and standard sequence amplicons; and (iii) measuring the numbers of target amplicons and standard sequence amplicons produced. In some embodiments, the numbers are measured by counting sequence reads. In some embodiments, the method further comprises determining an estimated copy number of at least one chromosome in the sample from which the library was derived, wherein the determining involves comparing the number of sequences of the target amplicons with the number of sequences of the standard amplicons. In some embodiments, the standard sequences and the library include universal priming sites capable of being primed by the same primers. In some embodiments, the relative amount of each of the sequences has been calibrated against a reference genome. In various embodiments, at least 10, 100, 500, 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 genetic standard sequences are used. In various embodiments, the method involves using any of the primer libraries of the invention. In various embodiments, the method involves simultaneously amplifying 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target regions. In some embodiments, the reference genome is diploid. In some embodiments, the sample for analysis is derived from blood.
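A minimal Python sketch of how the measured counts might be turned into a gene copy number, assuming (beyond what the source states) that the reference sequence is present in two copies and that the two standards were added in a known input ratio. Because the gene-specific reagents do not substantially amplify the deleted allele, the gene-to-standard read ratio reflects only intact copies. The function name and parameters are hypothetical.

```python
def gene_copy_number(gene_reads, gene_std_reads, ref_reads, ref_std_reads,
                     ref_copies=2, std_ratio=1.0):
    """Estimate the number of intact (non-deleted) copies of a gene of interest.

    gene_reads / ref_reads: reads from the gene-of-interest and reference amplicons.
    gene_std_reads / ref_std_reads: reads from their corresponding standards.
    std_ratio: known input ratio of gene standard to reference standard
    (assumed 1.0, i.e. equimolar spike-in).
    """
    # Each amplicon is normalized by the standard sharing its primers.
    gene_signal = gene_reads / gene_std_reads
    ref_signal = ref_reads / ref_std_reads
    return ref_copies * (gene_signal / ref_signal) * std_ratio
```

Under these assumptions, a result near 2 suggests no deletion, near 1 a single deleted allele, and near 0 deletions on both alleles.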

In some embodiments of any aspect of the invention, preferentially enriching the DNA in a sample (e.g., a first sample) at loci of interest (e.g., a plurality of polymorphic loci) comprises obtaining a plurality of pre-circularized probes, wherein each probe targets one of the loci (e.g., polymorphic loci), and wherein the 3' and 5' ends of the probes are preferably designed to hybridize to regions of DNA that are separated from the polymorphic site of the locus by a small number of bases, wherein the small number is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 to 25, 26 to 30, 31 to 60, or a combination thereof; hybridizing the pre-circularized probes to DNA from the sample (e.g., the first sample); filling the gap between the hybridized probe ends using DNA polymerase; circularizing the pre-circularized probes; and amplifying the circularized probes.

In some embodiments of any aspect of the invention, preferentially enriching the DNA at loci of interest (e.g., a plurality of polymorphic loci) comprises obtaining a plurality of ligation-mediated PCR probes, wherein each PCR probe targets one of the loci of interest (e.g., polymorphic loci), and wherein the upstream and downstream PCR probes are designed to hybridize to regions of DNA, on one strand of DNA, that are separated from the polymorphic site of the locus, preferably by a small number of bases, wherein the small number is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 to 25, 26 to 30, 31 to 60, or a combination thereof; hybridizing the ligation-mediated PCR probes to DNA from the sample (e.g., the first sample); filling the gap between the ends of the ligation-mediated PCR probes using DNA polymerase; ligating the ligation-mediated PCR probes; and amplifying the ligated ligation-mediated PCR probes.
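The probe geometry shared by the pre-circularized probes and the ligation-mediated PCR probes described above can be illustrated with a small coordinate calculation. This Python sketch is a hypothetical helper, not the source's design procedure: it places two arms on one template strand at chosen distances from the polymorphic site and reports how many bases the polymerase must fill before circularization or ligation. Coordinates are 0-based, half-open intervals, and the names `ProbeLayout` and `design_flanking_arms` are illustrative.

```python
from typing import NamedTuple, Tuple


class ProbeLayout(NamedTuple):
    upstream_arm: Tuple[int, int]    # template region bound by the upstream arm
    downstream_arm: Tuple[int, int]  # template region bound by the downstream arm
    gap_fill_length: int             # bases the polymerase adds before ligation


def design_flanking_arms(snp_pos, arm_len, gap_upstream, gap_downstream):
    """Place two probe arms on one template strand around a polymorphic site.

    gap_upstream / gap_downstream: number of bases left unbound between each
    arm and the polymorphic site (e.g. 1 to 60, per the ranges above).
    """
    up_end = snp_pos - gap_upstream
    up_start = up_end - arm_len
    down_start = snp_pos + 1 + gap_downstream
    down_end = down_start + arm_len
    if up_start < 0:
        raise ValueError("upstream arm would extend past the template start")
    # The filled gap covers both separating stretches plus the site itself.
    return ProbeLayout((up_start, up_end), (down_start, down_end), down_start - up_end)
```

Because the polymorphic base lies inside the filled gap, the allele is copied into the probe by the polymerase rather than being encoded in either arm, which is why the arms themselves need not be allele-specific.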

In some embodiments of the various aspects of the invention, preferentially enriching the DNA at loci of interest (e.g., a plurality of polymorphic loci) comprises obtaining a plurality of hybrid capture probes that target the loci (e.g., polymorphic loci); hybridizing the hybrid capture probes to the DNA in the sample (e.g., the first sample); and physically removing some or all of the unhybridized DNA from the DNA sample (e.g., the first sample).

In some embodiments of any aspect of the invention, the hybrid capture probes are designed to hybridize to a region that flanks but does not overlap the polymorphic site. In some embodiments, the hybrid capture probes are designed to hybridize to a region that flanks but does not overlap the polymorphic site, and the length of the flanking capture probes may be selected from the group consisting of less than about 120 bases, less than about 110 bases, less than about 100 bases, less than about 90 bases, less than about 80 bases, less than about 70 bases, less than about 60 bases, less than about 50 bases, less than about 40 bases, less than about 30 bases, and less than about 25 bases. In some embodiments, the hybrid capture probes are designed to hybridize to a region that overlaps the polymorphic site, wherein the plurality of hybrid capture probes comprises at least two hybrid capture probes for each polymorphic locus, and wherein each hybrid capture probe is designed to be complementary to a different allele at that polymorphic locus.
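For the overlapping design, one probe per allele can be generated by substituting each allele into a window centered on the polymorphic site and taking the reverse complement, so that each probe is complementary to the template carrying that allele. This Python sketch assumes plain A/C/G/T sequences and a centered window; the function names and the centering rule are hypothetical choices.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def allele_specific_capture_probes(template, snp_index, alleles, probe_len):
    """Return {allele: probe} for probes overlapping the polymorphic site.

    template: top-strand sequence around the locus; snp_index: 0-based
    position of the polymorphic site; probe_len: probe length in bases.
    """
    start = snp_index - probe_len // 2
    end = start + probe_len
    if start < 0 or end > len(template):
        raise ValueError("probe window extends past the template")
    probes = {}
    for allele in alleles:
        # Substitute the allele at the site, then take the complementary strand.
        window = template[start:snp_index] + allele + template[snp_index + 1:end]
        probes[allele] = reverse_complement(window)
    return probes


print(allele_specific_capture_probes("ACGTACGTAC", 4, ("A", "G"), 5))
```

For the toy template above, the two probes differ only at the base opposite the polymorphic site, which is what lets each one preferentially capture its own allele.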

In some embodiments of any aspect of the invention, preferentially enriching the DNA at a plurality of polymorphic loci comprises obtaining a plurality of inner forward primers, wherein each primer targets one of the polymorphic loci, and wherein the 3' end of each inner forward primer is designed to hybridize to a region of DNA upstream of the polymorphic site and separated from the polymorphic site by a small number of bases, wherein the small number is selected from the group consisting of 1, 2, 3, 4, 5, 6 to 10, 11 to 15, 16 to 20, 21 to 25, 26 to 30, or 31 to 60 base pairs; optionally obtaining a plurality of inner reverse primers, wherein each primer targets one of the polymorphic loci, and wherein the 3' end of each inner reverse primer is designed to hybridize to a region of DNA upstream of the polymorphic site and separated from the polymorphic site by a small number of bases, wherein the small number is selected from the group consisting of 1, 2, 3, 4, 5, 6 to 10, 11 to 15, 16 to 20, 21 to 25, 26 to 30, or 31 to 60 base pairs; hybridizing the inner primers to the DNA; and amplifying the DNA using the polymerase chain reaction to form amplicons.
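The relationship between primer placement and amplicon length can be sketched in Python. The coordinates are on the top strand; the reverse primer is placed on the far side of the site with its 3' end pointing back toward it, which is one reading (an assumption here) of "upstream" for the reverse primer, taken relative to the strand it extends along. The function name is hypothetical.

```python
def inner_primer_layout(snp_pos, primer_len, fwd_gap, rev_gap):
    """Return (forward_site, reverse_site, amplicon_length) on top-strand coordinates.

    Sites are 0-based, half-open intervals. The forward primer's 3' end is the
    last base of forward_site; the reverse primer anneals to the top strand and
    its 3' end pairs with the first base of reverse_site. fwd_gap / rev_gap are
    the numbers of bases between each 3' end and the polymorphic site.
    """
    fwd_site = (snp_pos - fwd_gap - primer_len, snp_pos - fwd_gap)
    rev_site = (snp_pos + 1 + rev_gap, snp_pos + 1 + rev_gap + primer_len)
    amplicon_length = rev_site[1] - fwd_site[0]
    return fwd_site, rev_site, amplicon_length
```

With 20-base primers and gaps of 1 to 10 bases on each side, the amplicon is 43 to 61 bp (two primer lengths, both gaps, and the site itself), consistent with the short amplicon lengths discussed in this section.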

In some embodiments of any aspect of the invention, the method further comprises obtaining a plurality of outer forward primers, wherein each primer targets one of the loci of interest (e.g., polymorphic loci), and wherein the outer forward primers are designed to hybridize to a region of DNA upstream of the inner forward primer; optionally obtaining a plurality of outer reverse primers, wherein each primer targets one of the loci of interest (e.g., polymorphic loci), and wherein the outer reverse primers are designed to hybridize to the region of DNA immediately downstream of the inner reverse primer; hybridizing the first primers to the DNA; and amplifying the DNA using the polymerase chain reaction.

In some embodiments of any aspect of the invention, the method further comprises obtaining a plurality of outer reverse primers, wherein each primer targets one of the polymorphic loci, and wherein the outer reverse primers are designed to hybridize to the region of DNA immediately downstream of the inner reverse primer; optionally obtaining a plurality of outer forward primers, wherein each primer targets one of the loci of interest (e.g., polymorphic loci), and wherein the outer forward primers are designed to hybridize to a region of DNA upstream of the inner forward primer; hybridizing the first primers to the DNA; and amplifying the DNA using the polymerase chain reaction.

In some embodiments of any aspect of the invention, preparing the sample (e.g., the first sample) further comprises appending universal adapters to the DNA in the sample (e.g., the first sample) and amplifying the DNA in the sample (e.g., the first sample) using the polymerase chain reaction. In some embodiments, at least a fraction of the amplified amplicons are less than 100 bp, less than 90 bp, less than 80 bp, less than 70 bp, less than 65 bp, less than 60 bp, less than 55 bp, less than 50 bp, or less than 45 bp, wherein the fraction is 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 99%.

In some embodiments of any aspect of the invention, amplifying the DNA is performed in one or more individual reaction volumes, wherein each individual reaction volume contains more than 100, more than 200, more than 500, more than 1,000, more than 2,000, more than 5,000, more than 10,000, more than 20,000, more than 50,000, or more than 100,000 different forward and reverse primer pairs.

In some embodiments of any aspect of the invention, preparing the sample (e.g., the first sample) further comprises dividing the sample (e.g., the first sample) into a plurality of portions, wherein the DNA in each portion is preferentially enriched at a subset of the loci of interest (e.g., a subset of the plurality of polymorphic loci). In some embodiments, the inner primers are selected by identifying primer pairs likely to form undesired primer duplexes and removing from the plurality of primers at least one primer of each pair identified as likely to form an undesired primer duplex. In some embodiments, the inner primers contain a region designed to hybridize upstream or downstream of the targeted locus (e.g., polymorphic locus), and optionally contain a universal priming sequence designed to allow PCR amplification. In some embodiments, at least some of the primers additionally contain a random region that differs for each individual primer molecule. In some embodiments, at least some of the primers additionally contain a molecular barcode.
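The primer-selection step can be illustrated with a deliberately simple Python sketch. It assumes a crude dimer criterion that is not from the source: a pair (including a primer with itself) is flagged when the 3'-terminal k bases of either primer are perfectly complementary to some stretch of the other, and the primer involved in the most flagged pairs is then removed greedily until no flagged pairs remain. Practical screens typically score duplex stability thermodynamically instead; all names and the value of k are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def may_dimerize(a, b, k=5):
    """True if the 3'-terminal k bases of either primer can pair perfectly with the other."""
    return reverse_complement(a[-k:]) in b or reverse_complement(b[-k:]) in a


def flag_dimer_pairs(primers, k=5):
    """All primer pairs (including self-pairs) flagged as likely dimer formers."""
    names = sorted(primers)
    return [(x, y) for x, y in combinations_with_replacement(names, 2)
            if may_dimerize(primers[x], primers[y], k)]


def remove_dimer_formers(primers, k=5):
    """Greedily drop the primer involved in the most flagged pairs until none remain."""
    kept = dict(primers)
    while True:
        pairs = flag_dimer_pairs(kept, k)
        if not pairs:
            return kept
        involvement = Counter(name for pair in pairs for name in set(pair))
        # Sorting first makes ties resolve deterministically (alphabetically).
        worst = max(sorted(involvement), key=involvement.__getitem__)
        del kept[worst]
```

The greedy rule removes only one primer from each interacting pair where possible, which preserves more targets than discarding both primers of every flagged pair; the pairwise check itself scales quadratically, so a real library of thousands of primers would likely need an indexed search.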

In some embodiments of any aspect of the invention, the average degree of allelic bias between the prepared sample and the sample (e.g., the first sample) caused by the preferential enrichment is a factor selected from the group consisting of: no more than 2-fold, no more than 1.5-fold, no more than 1.2-fold, no more than 1.1-fold, no more than 1.05-fold, no more than 1.02-fold, no more than 1.01-fold, no more than 1.005-fold, no more than 1.002-fold, no more than 1.001-fold, and no more than 1.0001-fold. In some embodiments, the plurality of polymorphic loci are SNPs. In some embodiments, measuring the DNA in the prepared sample is done by sequencing.
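One way to quantify the "average degree of allelic bias" is sketched below in Python. The metric is an assumption, not the source's definition: at each heterozygous locus, compare the allele ratio after enrichment with the allele ratio before it, treat a fold change in either direction alike, and average on the log scale (a geometric mean). The function name and input layout are hypothetical.

```python
import math


def mean_allelic_bias(loci):
    """Geometric-mean fold change in allele ratio caused by enrichment.

    loci: list of ((ref_enriched, alt_enriched), (ref_original, alt_original))
    read counts at heterozygous loci. A value of 1.0 means no bias.
    """
    log_folds = []
    for (ref_e, alt_e), (ref_o, alt_o) in loci:
        fold = (ref_e / alt_e) / (ref_o / alt_o)
        # A shift of 1.2x toward either allele counts as a bias of 1.2x.
        log_folds.append(abs(math.log(fold)))
    return math.exp(sum(log_folds) / len(log_folds))
```

Under this metric, a value of 1.2 or less would correspond to the "no more than 1.2-fold" embodiment above.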

In some embodiments of any aspect of the invention, the target loci are located on the same nucleic acid of interest (e.g., the same chromosome or the same region of a chromosome). In some embodiments, at least some of the target loci are located on different nucleic acids of interest (e.g., different chromosomes). In some embodiments, the nucleic acid sample comprises fragmented or digested nucleic acids. In some embodiments, the nucleic acid sample comprises genomic DNA, cDNA, or mRNA. In some embodiments, the nucleic acid sample comprises DNA from a single cell. In some embodiments, the nucleic acid sample is a substantially cell-free blood or plasma sample. In some embodiments, the nucleic acid sample comprises or is derived from blood, plasma, saliva, semen, sperm, cell culture supernatant, mucous secretions, dental plaque, gastrointestinal tissue, feces, urine, hair, bone, bodily fluids, tears, tissue, skin, fingernails, blastomeres, embryos, amniotic fluid, chorionic villus samples, bile, lymph, cervical mucus, or a forensic sample. In some embodiments, the target locus is a segment of human nucleic acid. In some embodiments, the target loci comprise or consist of single nucleotide polymorphisms (SNPs). In some embodiments, the primers are DNA molecules.

In some embodiments of any aspect of the invention, the DNA in the sample (e.g., the first sample) originates from maternal plasma. In some embodiments, preparing the sample (e.g., the first sample) further comprises amplifying the DNA. In some embodiments, preparing the sample (e.g., the first sample) further comprises preferentially enriching the DNA in the sample (e.g., the first sample) at loci of interest (e.g., a plurality of polymorphic loci).

In various embodiments, the primer extension reaction or the polymerase chain reaction includes the addition of one or more nucleotides by a polymerase. In various embodiments, the primer extension reaction or the polymerase chain reaction does not include ligation-mediated PCR. In various embodiments, the primer extension reaction or the polymerase chain reaction does not include the joining of two primers by a ligase. In various embodiments, the primers do not include linked inverted probes (LIPs), which may also be referred to as pre-circularized probes, pre-circularizing probes, circularizing probes, padlock probes, or molecular inversion probes (MIPs).

It is to be understood that the aspects and embodiments of the invention described herein include "comprising," "consisting of," and "consisting essentially of" aspects and embodiments.

Definitions

Single Nucleotide Polymorphism (SNP) refers to a single nucleotide that may differ between the genomes of two members of the same species. The use of the term should not imply any limit on the frequency with which each variant occurs.

Sequence refers to a DNA sequence or a genetic sequence. It may refer to the primary, physical structure of a DNA molecule or strand in an individual. It may refer to the sequence of nucleotides found in that DNA molecule, or in the strand complementary to the DNA molecule. It may refer to the information contained in the DNA molecule as its representation in silico.

Locus refers to a particular region of interest on the DNA of an individual, which may refer to a SNP, the site of a possible insertion or deletion, or the site of some other relevant genetic variation. Disease-linked SNPs may also refer to disease-linked loci.

Polymorphic Allele, also "Polymorphic Locus," refers to an allele or locus where the genotype varies between individuals within a given species. Some examples of polymorphic alleles include single nucleotide polymorphisms, short tandem repeats, deletions, duplications, and inversions.

Polymorphic Site refers to the specific nucleotide found in a polymorphic region that varies between individuals.

Allele refers to the genes that occupy a particular locus.

Genetic Data, also "Genotypic Data," refers to the data describing aspects of the genome of one or more individuals. It may refer to one or a set of loci, partial or entire sequences, partial or entire chromosomes, or the entire genome. It may refer to the identity of one or a plurality of nucleotides; it may refer to a set of sequential nucleotides, or nucleotides from different locations in the genome, or a combination thereof. Genotypic data are typically in silico; however, it is also possible to consider the physical nucleotides in a sequence as chemically encoded genetic data. Genotypic data may be said to be "on," "of," "at," or "from" the individual(s). Genotypic data may refer to output measurements from a genotyping platform, where those measurements are made on genetic material.

Genetic Material, also "Genetic Sample," refers to physical matter, such as tissue or blood, from one or more individuals that comprises DNA or RNA.

Noisy Genetic Data refers to genetic data with any of the following: allele dropouts, uncertain base pair measurements, incorrect base pair measurements, missing base pair measurements, uncertain measurements of insertions or deletions, uncertain measurements of chromosome segment copy numbers, spurious signals, missing measurements, other errors, or combinations thereof.

Confidence refers to the statistical likelihood that the called SNP, allele, set of alleles, ploidy call, or determined number of chromosome segment copies correctly represents the real genetic state of the individual.

Ploidy calling, also called "chromosome copy number calling" or "copy number calling" (CNC), may refer to the act of determining the quantity and/or chromosomal identity of one or more chromosomes present in a cell.

Aneuploidy refers to the state in which a cell contains the wrong number of chromosomes (for example, the wrong number of full chromosomes, or the wrong number of chromosome segments, as with a deletion or duplication of a chromosome segment). In the case of a human somatic cell, it may refer to a cell that does not contain 22 pairs of autosomes and one pair of sex chromosomes. In the case of a human gamete, it may refer to a cell that does not contain exactly one of each of the 23 chromosomes. In the case of a single chromosome type, it may refer to the case where there are more or fewer than two homologous but non-identical copies of the chromosome, or where there are two copies of the chromosome originating from the same parent. In some embodiments, the deletion of a chromosome segment is a microdeletion.

Ploidy state refers to the quantity and/or chromosomal identity of one or more chromosome types in a cell.

Chromosome may refer to a single chromosome copy, meaning a single molecule of DNA, of which there are 46 in a normal somatic cell; an example is "the maternally derived chromosome 18." Chromosome may also refer to a chromosome type, of which there are 23 in a normal human somatic cell; an example is "chromosome 18."

Chromosomal identity may refer to the referent chromosome number, i.e., the chromosome type. Normal humans have 22 types of numbered autosomes and two types of sex chromosomes. It may also refer to the parental origin of the chromosome. It may also refer to a specific chromosome inherited from a parent. It may also refer to other identifying features of a chromosome.

The state of the genetic material, or simply "genetic state," may refer to the identities of a set of SNPs on the DNA, to the phased haplotypes of the genetic material, and to the sequence of the DNA, including insertions, deletions, repeats, and mutations. It may also refer to the ploidy state of one or more chromosomes, chromosome segments, or sets of chromosome segments.

Allelic data refers to a set of genotypic data concerning a set of one or more alleles. It may refer to phased, haplotypic data. It may refer to SNP identities, and it may refer to the sequence data of the DNA, including insertions, deletions, repeats, and mutations. It may include the parental origin of each allele.

Allelic state refers to the actual state of the genes in a set of one or more alleles. It may refer to the actual state of the genes described by the allelic data.

Allele ratio, or allelic ratio, refers to the ratio between the amounts of each allele at a locus that are present in a sample or in an individual. When the sample is measured by sequencing, the allele ratio may refer to the ratio of sequence reads that map to each allele at the locus. When the sample is measured by an intensity-based measurement method, the allele ratio may refer to the ratio of the amounts of each allele present at the locus as estimated by that measurement method.
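For the sequencing case, a minimal sketch follows. It assumes each mapped read has already been assigned to exactly one allele at the locus; the function name `allele_ratio` and the toy reads are hypothetical and only illustrate that the ratio is a quotient of per-allele read counts.

```python
from collections import Counter

def allele_ratio(read_alleles, allele_a, allele_b):
    """Ratio of reads supporting allele_a to reads supporting allele_b at one locus.

    read_alleles: the allele carried by each mapped read at this locus
    (assumes every read has already been assigned to exactly one allele).
    """
    counts = Counter(read_alleles)
    if counts[allele_b] == 0:
        raise ValueError("no reads support allele_b; ratio is undefined")
    return counts[allele_a] / counts[allele_b]

# Hypothetical reads at one SNP: four carry A, two carry G.
reads_at_locus = ["A", "A", "G", "A", "G", "A"]
print(allele_ratio(reads_at_locus, "A", "G"))  # 2.0
```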

Allele count refers to the number of sequences that map to a particular locus and, if that locus is polymorphic, to the number of sequences that map to each of the alleles. If each allele is counted in a binary fashion, the allele count will be a whole number. If the alleles are counted probabilistically, the allele count may be a fractional number.

Allele count probability refers to the number of sequences that are likely to map to a particular locus, or to a set of alleles at a polymorphic locus, combined with the probability of the mapping. Note that allele counts are equivalent to allele count probabilities in which the mapping probability of each counted sequence is binary (zero or one). In some embodiments, the allele count probabilities may be binary. In some embodiments, the allele count probabilities may be set equal to the DNA measurements.
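The difference between binary and probabilistic counting can be sketched as below. This is an illustration under stated assumptions, not the disclosed implementation: each read is represented by a hypothetical dict mapping each allele to the probability that the read carries it, and `allele_counts` is an illustrative name. When every probability is 0 or 1, the two modes agree, which matches the equivalence noted in the definition.

```python
def allele_counts(read_probs, binary=False):
    """Accumulate allele counts at one polymorphic locus.

    read_probs: one dict per read, mapping allele -> probability that the
    read carries that allele (its mapping probability).
    binary=True assigns each read wholly to its most likely allele, so the
    counts are integers; otherwise each read contributes its probabilities
    and the counts may be fractional.
    """
    counts = {}
    for probs in read_probs:
        if binary:
            best = max(probs, key=probs.get)
            counts[best] = counts.get(best, 0) + 1
        else:
            for allele, p in probs.items():
                counts[allele] = counts.get(allele, 0.0) + p
    return counts

# Hypothetical reads: one certain A, one probably A, one probably G.
reads = [{"A": 1.0}, {"A": 0.9, "G": 0.1}, {"A": 0.2, "G": 0.8}]
print(allele_counts(reads, binary=True))  # {'A': 2, 'G': 1}
print(allele_counts(reads))               # A about 2.1, G about 0.9
```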

Allelic distribution, or "allele count distribution," refers to the relative amount of each allele present at each locus in a set of loci. An allelic distribution may refer to an individual, to a sample, or to a set of measurements made on a sample. In the context of sequencing, the allelic distribution refers to the number, or probable number, of reads that map to each allele at each locus in a set of polymorphic loci. The allele measurements may be treated probabilistically, that is, the likelihood that a given allele is present for a given sequence read is a fraction between 0 and 1, or they may be treated in a binary fashion, that is, any given read is considered to be exactly zero or one copy of a particular allele.

Allelic distribution pattern refers to a set of different allele distributions for different parental contexts. Certain allelic distribution patterns may be indicative of certain ploidy states.

Allelic bias refers to the degree to which the measured ratio of alleles at a heterozygous locus differs from the ratio that was present in the original DNA sample. The degree of allelic bias at a particular locus equals the allele ratio observed at that locus, as measured, divided by the ratio of the alleles at that locus in the original DNA sample. Allelic bias may be defined to be at least one, so that if the calculation returns a value x less than 1, the degree of allelic bias may be restated as 1/x. Allelic bias may be due to amplification bias, purification bias, or some other phenomenon that affects different alleles differently.
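The calculation in this definition, including the restatement of values below 1 as 1/x, can be sketched in a few lines. The function name `allelic_bias` and the 60/40 read split are hypothetical; the true ratio at a heterozygous locus in a diploid sample is taken as 1:1 for the example.

```python
def allelic_bias(observed_ratio, true_ratio):
    """Degree of allelic bias at one heterozygous locus.

    x = observed allele ratio / allele ratio in the original sample;
    values below 1 are restated as 1/x so the result is always >= 1.
    """
    x = observed_ratio / true_ratio
    return x if x >= 1 else 1 / x

# Hypothetical heterozygous locus (true ratio 1:1) with 60 vs 40 reads.
print(allelic_bias(60 / 40, 1.0))  # 1.5
print(allelic_bias(40 / 60, 1.0))  # 1.5 after restating x as 1/x
```

Restating x as 1/x makes the measure symmetric, so a locus skewed toward either allele by the same factor receives the same bias value.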

A primer, also called a "PCR probe," refers to a single DNA molecule (a DNA oligomer) or a collection of DNA molecules (DNA oligomers) in which the DNA molecules are identical, or nearly identical, and in which the primer contains a region designed to hybridize to a targeted locus (e.g., a targeted polymorphic locus or a non-polymorphic locus), and may contain a priming sequence designed to allow PCR amplification. A primer may also contain a molecular barcode. A primer may contain a random region that differs for each individual molecule. The terms "test primer" and "candidate primer" are not meant to be limiting and may refer to any of the primers disclosed herein.

A library of primers refers to a population of two or more primers. In various embodiments, the library includes at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different primers. In various embodiments, the library includes at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different primer pairs, wherein each primer pair includes a forward test primer and a reverse test primer, and each pair of test primers hybridizes to a target locus. In some embodiments, the library of primers includes at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different individual primers that each hybridize to a different target locus, wherein the individual primers are not part of primer pairs. In some embodiments, the library has both (i) primer pairs and (ii) individual primers (e.g., universal primers) that are not part of primer pairs.

A hybridization capture probe refers to any nucleic acid sequence, possibly modified, that is generated by various methods such as PCR or direct synthesis and is intended to be complementary to one strand of a specific target DNA sequence in a sample. Exogenous hybridization capture probes may be added to a prepared sample and hybridized through a denature-reannealing process to form duplexes of exogenous and endogenous fragments. These duplexes may then be physically separated from the sample by various means.

A sequence read refers to data representing a sequence of nucleotide bases that was measured using a clonal sequencing method. Clonal sequencing may produce sequence data representing single copies, or clones or clusters, of one original DNA molecule. A sequence read may also have an associated quality score at each base position of the sequence, indicating the probability that the nucleotide has been called correctly.
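The definition does not fix a scale for the per-base quality score. As an assumption for illustration, the sketch below uses the Phred scale common in FASTQ files, where Q = -10 log10(P_error), with scores stored as ASCII characters offset by 33. The helper names are hypothetical.

```python
def decode_fastq_quality(qual_string, offset=33):
    """Decode a FASTQ quality string into integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual_string]

def phred_to_prob_correct(q):
    """Probability that a base call is correct, assuming a Phred-scaled score."""
    return 1.0 - 10 ** (-q / 10)

scores = decode_fastq_quality("I5+")  # 'I' = 73, '5' = 53, '+' = 43
print(scores)                          # [40, 20, 10]
print([round(phred_to_prob_correct(q), 4) for q in scores])  # [0.9999, 0.99, 0.9]
```

Other platforms or older files may use a different offset (for example 64), which is why `offset` is a parameter rather than a constant.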

Mapping a sequence read is the process of determining the read's location of origin in the genome sequence of a particular organism. The location of origin of a sequence read is determined from the similarity of the read's nucleotide sequence to the genome sequence.
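To make "location of origin based on similarity" concrete, here is a deliberately naive sketch: it slides the read along a toy reference and keeps the ungapped placement with the fewest mismatches (Hamming distance). Production aligners use genome indexes and allow gaps; this brute-force search, the function name `map_read`, and the toy sequence are assumptions for illustration only.

```python
def map_read(read, genome):
    """Return (position, mismatches) of the best ungapped placement of read.

    Brute force over every offset; ties resolve to the leftmost position.
    Returns (None, None) if the read is longer than the genome.
    """
    best_pos, best_mm = None, None
    for pos in range(len(genome) - len(read) + 1):
        window = genome[pos:pos + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if best_mm is None or mismatches < best_mm:
            best_pos, best_mm = pos, mismatches
    return best_pos, best_mm

genome = "ACGTTGCAACGTAGGCT"  # hypothetical reference
print(map_read("CAACGT", genome))  # (6, 0): exact match
print(map_read("TAGGCA", genome))  # (11, 1): one mismatch at the last base
```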

A matched copy error, also called a "matching chromosome aneuploidy" (MCA), refers to a state of aneuploidy in which one cell contains two identical or nearly identical chromosomes. This type of aneuploidy may arise during the formation of the gametes in meiosis and may be referred to as a meiotic nondisjunction error. This type of error may also arise in mitosis. A matching trisomy may refer to the case where three copies of a given chromosome are present in an individual and two of the copies are identical.

An unmatched copy error, also called a "unique chromosome aneuploidy" (UCA), refers to a state of aneuploidy in which one cell contains two chromosomes that are from the same parent and that may be homologous but not identical. This type of aneuploidy may arise during meiosis and may be referred to as a meiotic error. An unmatched trisomy may refer to the case where three copies of a given chromosome are present in an individual and two of the copies are from the same parent and are homologous but not identical. Note that an unmatched trisomy may also refer to the case where two homologous chromosomes from one parent are present and where some segments of those chromosomes are identical while other segments are merely homologous.

Homologous chromosomes are chromosome copies that contain the same set of genes and that normally pair up during meiosis.

Identical chromosomes are chromosome copies that contain the same set of genes and that, for each gene, carry an identical or nearly identical set of alleles.

Allele dropout (ADO) refers to the situation where at least one of the base pairs in a set of base pairs from homologous chromosomes at a given allele is not detected.

Locus dropout (LDO) refers to the situation where both base pairs in a set of base pairs from homologous chromosomes at a given allele are not detected.
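Applying the two dropout definitions to a single diploid locus can be sketched as follows. The function name and string labels are hypothetical. Because a locus dropout also satisfies the "at least one base pair missing" condition of allele dropout, the sketch reports the more specific label, LDO, when nothing is detected.

```python
def dropout_status(true_alleles, detected_alleles):
    """Classify dropout at one diploid locus.

    true_alleles: the two alleles actually present, e.g. ("A", "G").
    detected_alleles: the alleles observed in the measurement.
    Returns "LDO" if no allele is detected, "ADO" if some but not all
    are missed, and "none" if every allele present was detected.
    """
    distinct = set(true_alleles)
    missing = distinct - set(detected_alleles)
    if missing == distinct:
        return "LDO"   # nothing detected at this locus
    if missing:
        return "ADO"   # at least one allele dropped out
    return "none"

print(dropout_status(("A", "G"), {"A", "G"}))  # none
print(dropout_status(("A", "G"), {"A"}))       # ADO
print(dropout_status(("A", "G"), set()))       # LDO
```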

Homozygous refers to having similar alleles at corresponding chromosomal loci.

Heterozygous refers to having dissimilar alleles at corresponding chromosomal loci.

Heterozygosity rate refers to the proportion of individuals in a population that are heterozygous at a given locus. The heterozygosity rate may also refer to the expected or measured ratio of alleles at a given locus in an individual or a DNA sample.
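The population sense of the heterozygosity rate is the fraction of sampled individuals whose two alleles at the locus differ. A minimal sketch, with a hypothetical function name and toy genotypes:

```python
def heterozygosity_rate(genotypes):
    """Fraction of individuals whose two alleles at a locus differ."""
    heterozygous = sum(1 for a, b in genotypes if a != b)
    return heterozygous / len(genotypes)

# Hypothetical genotypes of four individuals at one SNP.
population = [("A", "G"), ("A", "A"), ("G", "G"), ("G", "A")]
print(heterozygosity_rate(population))  # 0.5
```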

A highly informative single nucleotide polymorphism (HISNP) refers to a SNP at which the fetus has an allele that is not present in the mother's genotype.
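Selecting HISNPs from a panel reduces to a set comparison per SNP: a SNP qualifies when the fetal genotype contains an allele outside the maternal genotype. The sketch assumes both genotypes are already known for each SNP; the function name and the genotypes are hypothetical.

```python
def is_hisnp(fetal_genotype, maternal_genotype):
    """True if the fetus carries an allele absent from the mother's genotype."""
    return not set(fetal_genotype) <= set(maternal_genotype)

# Hypothetical (fetal, maternal) genotypes at three SNPs.
snps = {
    "snp1": (("A", "T"), ("A", "A")),  # fetal T is absent from the mother
    "snp2": (("A", "T"), ("A", "T")),  # every fetal allele is also maternal
    "snp3": (("C", "C"), ("C", "G")),  # fetal C is present in the mother
}
informative = [name for name, (fetal, mom) in snps.items() if is_hisnp(fetal, mom)]
print(informative)  # ['snp1']
```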

A chromosome region refers to a segment of a chromosome or to a full chromosome.

A segment of a chromosome refers to a section of a chromosome that can range in size from one base pair to the entire chromosome.

Chromosome refers to either a full chromosome or a segment or section of a chromosome.

Copies refers to the number of copies of a chromosome segment. It may refer to identical copies, or to non-identical, homologous copies of a chromosome segment, where the different copies of the chromosome segment contain a substantially similar set of loci and where one or more of the alleles differ. Note that in some cases of aneuploidy, such as the M2 copy error, some copies of a given chromosome segment may be identical while other copies of the same chromosome segment are not.

Haplotype refers to a combination of alleles at multiple loci that are typically inherited together on the same chromosome. A haplotype may refer to as few as two loci or to an entire chromosome, depending on the number of recombination events that have occurred between a given set of loci. A haplotype may also refer to a set of statistically associated single nucleotide polymorphisms (SNPs) on a single chromatid.

Haplotypic data, also called "phased data" or "ordered genetic data," refers to data from a single chromosome in a diploid or polyploid genome, i.e., either the segregated maternal or the segregated paternal copy of a chromosome in a diploid genome.

Phasing refers to the act of determining the haplotypic genetic data of an individual given unordered, diploid (or polyploid) genetic data. It may refer to the act of determining, for a set of alleles found on one chromosome, which of the two genes at each allele is associated with each of the two homologous chromosomes in the individual.

Phased data refers to genetic data for which one or more haplotypes have been determined.

Hypothesis refers to a set of possible ploidy states at a given set of chromosomes, or a set of possible allelic states at a given set of loci. The set of possibilities may contain one or more elements.

Copy number hypothesis, also called "ploidy state hypothesis," refers to a hypothesis concerning the number of copies of a chromosome in an individual. It may also refer to a hypothesis concerning the identity of each of the chromosomes, including the parental origin of each chromosome and which of the parent's two chromosomes are present in the individual. It may also refer to a hypothesis concerning which chromosomes or chromosome segments, if any, from a related individual correspond genetically to a given chromosome of the individual.

Target individual refers to the individual whose genetic state is being determined. In some embodiments, only a limited amount of DNA is available from the target individual. In some embodiments, the target individual is a fetus. In some embodiments, there may be more than one target individual. In some embodiments, each fetus originating from a pair of parents may be considered a target individual. In some embodiments, the genetic data being determined is one allele call or a set of allele calls. In some embodiments, the genetic data being determined is a ploidy call.

Related individual refers to any individual who is genetically related to, and thus shares haplotype blocks with, the target individual. In one context, the related individual may be a genetic parent of the target individual, or any genetic material derived from a parent, such as a sperm, a polar body, an embryo, a fetus, or a child. It may also refer to a sibling, parent, or grandparent.

Sibling refers to any individual whose genetic parents are the same as those of the individual in question. In some embodiments, it may refer to a born child, an embryo, or a fetus, or to one or more cells originating from a born child, an embryo, or a fetus. A sibling may also refer to a haploid individual originating from one of the parents, such as a sperm, a polar body, or any other set of haplotypic genetic matter. An individual may be considered to be its own sibling.

Fetal refers to "of the fetus" or "of the region of the placenta that is genetically similar to the fetus." In a pregnant woman, some portion of the placenta is genetically similar to the fetus, and the free-floating fetal DNA found in maternal blood may have originated from the portion of the placenta whose genotype matches the fetus. Note that the genetic information in half of the chromosomes of a fetus is inherited from the mother of the fetus. In some embodiments, DNA from these maternally inherited chromosomes that came from a fetal cell is considered to be "of fetal origin," not "of maternal origin."

DNA of fetal origin refers to DNA that was originally part of a cell whose genotype was essentially equivalent to that of the fetus.

DNA of maternal origin refers to DNA that was originally part of a cell whose genotype was essentially equivalent to that of the mother.

Child may refer to an embryo, a blastomere, or a fetus. Note that in the presently disclosed embodiments, the concepts described apply equally well to individuals who are a born child, a fetus, an embryo, or a set of cells therefrom. The term child may be meant simply to convey that the individual referred to as the child is the genetic offspring of the parents.

Parent refers to the genetic mother or father of an individual. An individual typically has two parents, a mother and a father, though this may not necessarily be the case, as in genetic or chromosomal chimerism. A parent may be considered to be an individual.

Parental context refers to the genetic state of a given SNP on each of the two relevant chromosomes of one or both of the two parents of the target.

Develop as desired, also called "develop normally," refers to a viable embryo implanting in a uterus and resulting in a pregnancy, and/or a pregnancy continuing and resulting in a live birth, and/or a born child being free of chromosomal abnormalities, and/or a born child being free of other undesired genetic conditions, such as disease-linked genes. The term "develop as desired" is meant to encompass anything that parents or healthcare facilitators may desire. In some cases, "develop as desired" may refer to an unviable or viable embryo that is useful for medical research or other purposes.

Insertion into a uterus refers to the process of transferring an embryo into the uterine cavity in the context of in vitro fertilization.

Maternal plasma refers to the plasma portion of the blood of a pregnant female.

Clinical decision refers to any decision to take or not take an action whose outcome affects the health or survival of an individual. In the context of prenatal diagnosis, a clinical decision may refer to a decision to abort or not abort a fetus. A clinical decision may also refer to a decision to conduct further testing, to take actions to mitigate an undesirable phenotype, or to take actions to prepare for the birth of a child with abnormalities.

Diagnostic box refers to one machine or a combination of machines designed to perform one or more aspects of the methods disclosed herein. In an embodiment, the diagnostic box may be placed at a point of patient care. In an embodiment, the diagnostic box may perform targeted amplification followed by sequencing. In an embodiment, the diagnostic box may function alone or with the help of a technician.

Informatics-based method refers to a method that relies heavily on statistics to make sense of a large amount of data. In the context of prenatal diagnosis, it refers to a method designed to determine the ploidy state of one or more chromosomes, or the allelic state of one or more alleles, by statistically inferring the most likely state rather than by directly physically measuring the state, given a large amount of genetic data, for example from a molecular array or from sequencing. In an embodiment of the present invention, the informatics-based technique may be one disclosed in this patent. In an embodiment of the present invention, it may be PARENTAL SUPPORT™.

Raw genetic data refers to the analog intensity signals output by a genotyping platform. In the context of SNP arrays, raw genetic data refers to the intensity signals before any genotype calling has been done. In the context of sequencing, raw genetic data refers to the analog measurements, analogous to a chromatogram, that come off the sequencer before the identity of any base pair has been determined and before the sequence has been mapped to the genome.

Secondary genetic data refers to processed genetic data output by a genotyping platform. In the context of SNP arrays, secondary genetic data refers to the allele calls made by software associated with the SNP array reader, where the software has made a call as to whether a given allele is present or absent in the sample. In the context of sequencing, secondary genetic data refers to the base-pair identities of the sequences once they have been determined, and possibly also to where in the genome the sequences have been mapped.

Non-invasive prenatal diagnosis (NPD), also called "non-invasive prenatal screening" (NPS), refers to a method of determining the genetic state of a fetus gestating in a mother using genetic material found in the mother's blood, where the genetic material is obtained by drawing the mother's intravenous blood.

Preferential enrichment of DNA that corresponds to a locus, or preferential enrichment of DNA at a locus, refers to any method that results in the percentage of DNA molecules corresponding to the locus being higher in the post-enrichment DNA mixture than in the pre-enrichment DNA mixture. The method may involve selective amplification of DNA molecules that correspond to the locus. The method may involve removing DNA molecules that do not correspond to the locus. The method may involve a combination of methods. The degree of enrichment is defined as the percentage of DNA molecules corresponding to the locus in the post-enrichment mixture divided by the percentage of DNA molecules corresponding to the locus in the pre-enrichment mixture. Preferential enrichment may be carried out at a plurality of loci. In some embodiments of the present invention, the degree of enrichment is greater than 20. In some embodiments of the present invention, the degree of enrichment is greater than 200. In some embodiments of the present invention, the degree of enrichment is greater than 2,000. When preferential enrichment is carried out at a plurality of loci, the degree of enrichment may refer to the average degree of enrichment over all of the loci in the set of loci.
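The degree of enrichment, and its average over several loci, can be written directly from the definition. The molecule counts and the two extra per-locus values below are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def degree_of_enrichment(target_before, total_before, target_after, total_after):
    """Fraction of locus-matching molecules after enrichment / fraction before."""
    return (target_after / total_after) / (target_before / total_before)

def mean_enrichment(per_locus_degrees):
    """Average degree of enrichment over a set of loci."""
    return sum(per_locus_degrees) / len(per_locus_degrees)

# Hypothetical counts: 100 of 100,000 molecules match the locus before
# enrichment (0.1%); 5,000 of 10,000 match after enrichment (50%).
d = degree_of_enrichment(100, 100_000, 5_000, 10_000)
print(d)                                   # 500.0, i.e. greater than 200
print(mean_enrichment([d, 250.0, 50.0]))   # about 266.7
```

Because the degree is a ratio of fractions, it is unaffected by how much total DNA is sequenced before or after enrichment; only the composition of each mixture matters.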

Amplification refers to a method that increases the number of copies of a DNA molecule.

Selective amplification may refer to a method that increases the number of copies of a particular DNA molecule, or of DNA molecules that correspond to a particular region of DNA. It may also refer to a method that increases the number of copies of a particular targeted DNA molecule, or targeted region of DNA, more than it increases the number of non-targeted molecules or regions of DNA. Selective amplification may be a method of preferential enrichment.

Universal priming sequence refers to a DNA sequence that may be appended to a population of target DNA molecules, for example by ligation, PCR, or ligation-mediated PCR. Once it has been added to the population of target molecules, primers specific to the universal priming sequence can be used to amplify the target population with a single pair of amplification primers. Universal priming sequences are typically unrelated to the target sequences.

Universal adapters, also called "ligation adapters" or "library tags," are DNA molecules containing a universal priming sequence that can be covalently linked to the 5′ and 3′ ends of a population of target double-stranded DNA molecules. Adding the adapters provides universal priming sequences at the 5′ and 3′ ends of the target population, from which PCR amplification can proceed, amplifying all molecules of the target population with a single pair of amplification primers.

Targeting refers to a method used to selectively amplify, or otherwise preferentially enrich, those DNA molecules in a DNA mixture that correspond to a set of loci.

Joint distribution model refers to a model that defines the probability of events defined in terms of multiple random variables, given a plurality of random variables defined on the same probability space, where the probabilities of the variables are linked. In some embodiments, the degenerate case in which the probabilities of the variables are not linked may be used.

Description of the Drawings

The presently disclosed embodiments will be further explained with reference to the attached drawings, wherein like structures are referred to by like numerals throughout the several views. The drawings shown are not necessarily to scale; emphasis is instead placed primarily on illustrating the principles of the presently disclosed embodiments.

Figure 1: Graphical representation of the direct multiplexed mini-PCR method.

Figure 2: Graphical representation of the semi-nested mini-PCR method.

Figure 3: Graphical representation of the fully nested mini-PCR method.

Figure 4: Graphical representation of the hemi-nested mini-PCR method.

Figure 5: Graphical representation of the triply hemi-nested mini-PCR method.

Figure 6: Graphical representation of the one-sided nested mini-PCR method.

Figure 7: Graphical representation of the one-sided mini-PCR method.

Figure 8: Graphical representation of the reverse semi-nested mini-PCR method.

Figure 9: Some possible workflows for semi-nested methods.

Figure 10: Graphical representation of looped ligation adapters.

Figure 11: Graphical representation of internally tagged primers.

Figure 12: Examples of some primers with internal tags.

Figure 13: Graphical representation of a method using primers that contain a ligation adapter binding region.

Figure 14: Simulated ploidy call accuracies of counting methods using two different analysis techniques.

Figure 15: Ratios of the two alleles for a number of SNPs in a cell line in Experiment 4.

Figure 16: Ratios of the two alleles for a number of SNPs in a cell line in Experiment 4, sorted by chromosome.

Figures 17A-D: Ratios of the two alleles for a number of SNPs in four maternal plasma samples, sorted by chromosome.

Figure 18: Fraction of the data that can be explained by binomial variance, before and after data correction.

Figure 19: Graph showing the relative enrichment of fetal DNA in samples following a short library preparation protocol.

Figure 20: Read depth graphs comparing the direct PCR and semi-nested approaches.

Figure 21: Comparison of read depths for direct PCR of three genomic samples.

Figure 22: Comparison of read depths for semi-nested mini-PCR of three samples.

Figure 23: Comparison of read depths for 1,200-plex and 9,600-plex reactions.

Figure 24: Read count ratios for six cells at three chromosomes.

Figure 25: Allele ratios at three chromosomes for two three-cell reactions and for a third reaction run on 1 ng of genomic DNA.

Figure 26: Allele ratios at three chromosomes for two single-cell reactions.

Figure 27: Comparison of two primer libraries, showing the number of loci with a given minor allele frequency targeted by each library.

Figure 28A: Electropherogram of PCR products. Figures 28B-28M are electropherograms of lanes 1-12 of Figure 28A, respectively.

Figures 29A-29E: Schematic of the method of the invention for determining fetal aneuploidy (Figure 29A). Maternal and paternal genotype data (from blood or cheek swabs) and crossover frequency data from the HapMap database are used to generate (Figure 29B) multiple independent hypotheses, simulated in silico, for each possible fetal ploidy state (Figure 29C). Each of these hypotheses is expanded to include sub-hypotheses that account for different possible crossover points. A data model predicts what the sequencing data should look like (the expected allele distribution) given each hypothetical fetal genotype and different fetal cfDNA fractions, and compares that prediction with the actual sequencing data; Bayesian statistics are used to determine the likelihood of each hypothesis. In this hypothetical example, the hypothesis with the highest likelihood (euploidy) is identified (Figure 29D). The individual likelihoods from Figure 29C are summed for each copy-number hypothesis family (monosomy, disomy, or trisomy). The hypothesis with the maximum likelihood is called as the ploidy state; the call also reveals the fetal fraction and represents the accuracy of the sample-specific calculation (Figure 29E).
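The hypothesis-comparison step of Figures 29C-29E can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: it assumes the data model has already produced a log-likelihood for every sub-hypothesis (for example, each fetal fraction and crossover configuration), treats all sub-hypotheses as equally likely a priori, sums their likelihoods within each copy-number family, and calls the family with the largest total. The function names and numbers are hypothetical.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def call_ploidy(family_logliks):
    """Sum sub-hypothesis likelihoods per copy-number family and call the best.

    family_logliks: {family name: [log-likelihood of each sub-hypothesis]}.
    Equal prior weight is assumed for every sub-hypothesis.
    Returns (called family, {family: normalized posterior}).
    """
    summed = {fam: logsumexp(lls) for fam, lls in family_logliks.items()}
    norm = logsumexp(list(summed.values()))
    posteriors = {fam: math.exp(ll - norm) for fam, ll in summed.items()}
    called = max(posteriors, key=posteriors.get)
    return called, posteriors

# Hypothetical log-likelihoods, one per sub-hypothesis
example = {
    "monosomy": [-1250.0, -1248.5],
    "disomy": [-1201.2, -1199.8, -1203.0],
    "trisomy": [-1230.4, -1228.9],
}
called, post = call_ploidy(example)
print(called, f"{post[called]:.4f}")
```

Working in log space matters here: raw likelihoods over thousands of SNPs underflow ordinary floating point, so the summation is done with `logsumexp`.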

Figures 30A-30H: Representative plots of euploidy (Figures 30A-30C), monosomy (Figure 30D), and trisomy (Figures 30E-30H). In all plots, the x-axis represents the linear position of the individual polymorphic loci along each chromosome (as indicated below the plot), and the y-axis represents the number of A-allele reads as a fraction of the total (A+B) allele reads. The maternal and fetal genotypes, and the y-axis positions at which the corresponding bands are centered, are indicated on the right side of each plot. Where needed to aid visualization, the plots may be color-coded by maternal genotype, so that red denotes maternal genotype AA, blue denotes maternal genotype BB, and green denotes maternal genotype AB. Where needed, the maternal allele contribution may be color-coded in the "Fetal Genotype" column. Allele contributions are written in the form maternal|fetal, so that a locus where the mother is AA and the fetus is AB is denoted AA|AB. Figure 30A: Plot produced when two chromosomes are present and the fetal cfDNA fraction is 0%. This plot comes from a non-pregnant woman and therefore shows the pattern obtained when the genotype is entirely maternal. The allele clusters are thus centered at 1 (AA alleles), 0.5 (AB alleles), and 0 (BB alleles). Figure 30B: Plot produced when two chromosomes are present and the fetal fraction is 12%. The fetal contribution to the A-allele read fraction shifts some allele points up or down along the y-axis, so that the bands are centered at 1 (AA|AA), 0.94 (AA|AB), 0.56 (AB|AA), 0.50 (AB|AB), 0.44 (AB|BB), 0.06 (BB|AB), and 0 (BB|BB). Figure 30C: Plot produced when two chromosomes are present and the fetal fraction is 26%. The pattern of two red and two blue peripheral bands and a set of three central green bands is clearly visible (colors not shown). The bands are centered at 1 (AA|AA), 0.87 (AA|AB), 0.63 (AB|AA), 0.50 (AB|AB), 0.37 (AB|BB), 0.13 (BB|AB), and 0 (BB|BB). Figure 30D: Plot produced when one chromosome is present and the fetal fraction is 26%. The signature pattern of one outer red and one outer blue peripheral band and two central green bands indicates maternally inherited monosomy (colors not shown). Because the fetus contributes only a single allele (A or B) to the allele reads, the inner red and blue peripheral bands are absent, and the central set of three bands collapses into two. The bands are centered at 1 (AA|A), 0.57 (AB|A), 0.43 (AB|B), and 0 (BB|B). Figure 30E: Plot produced when three chromosomes are present and the fetal fraction is 27%. This pattern, with two red and two blue peripheral bands and two central green bands, indicates a maternally inherited meiotic trisomy (colors not shown). The bands are centered at 1 (AA|AAA), 0.88 (AA|AAB), 0.56 (AB|AAB), 0.44 (AB|ABB), 0.12 (BB|ABB), and 0 (BB|BBB). Figure 30F: Plot produced when three chromosomes are present and the fetal fraction is 14%. This pattern, with three red and three blue peripheral bands and two central green bands, indicates a paternally inherited meiotic trisomy (colors not shown). The bands are centered at 1 (AA|AAA), 0.93 (AA|AAB), 0.87 (AA|ABB), 0.60 (AB|AAA), 0.53 (AB|AAB), 0.47 (AB|ABB), 0.40 (AB|BBB), 0.13 (BB|AAB), 0.07 (BB|ABB), and 0 (BB|BBB). Figure 30G: Plot produced when three chromosomes are present and the fetal fraction is 35%. This pattern, with two red and two blue peripheral bands and four central green bands, indicates a maternally inherited mitotic trisomy (colors not shown). The bands are centered at 1 (AA|AAA), 0.85 (AA|AAB), 0.72 (AB|AAA), 0.57 (AB|AAB), 0.43 (AB|ABB), 0.28 (AB|BBB), 0.15 (BB|ABB), and 0 (BB|BBB). Figure 30H: Plot produced when three chromosomes are present and the fetal fraction is 25%. This pattern, with two red and two blue peripheral bands and four central green bands, indicates a paternally inherited mitotic trisomy (colors not shown). It can be distinguished from a maternally inherited mitotic trisomy (as in Figure 30G) by the positions of the inner peripheral bands. Specifically, the bands are centered at 1 (AA|AAA), 0.78 (AA|ABB), 0.67 (AB|AAA), 0.56 (AB|AAB), 0.44 (AB|ABB), 0.33 (AB|BBB), 0.22 (BB|AAB), and 0 (BB|BBB).
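The band centers listed for Figures 30B-30H are consistent with a simple mixture model in which each maternal chromosome copy contributes with weight (1 - f) and each fetal copy with weight f, where f is the fetal fraction. The helper below is a hypothetical reconstruction of that arithmetic, not code from the source; it reproduces, for example, the Figure 30B values at f = 0.12.

```python
def expected_a_fraction(maternal, fetal, fetal_fraction):
    """Expected A-allele read fraction for a maternal|fetal genotype pair.

    maternal: two-copy genotype, e.g. "AB".
    fetal: one to three copies, e.g. "A", "AB", "AAB".
    Each maternal copy is weighted (1 - f) and each fetal copy f.
    """
    f = fetal_fraction
    a_signal = (1 - f) * maternal.count("A") + f * fetal.count("A")
    total_signal = (1 - f) * len(maternal) + f * len(fetal)
    return a_signal / total_signal

# Disomy, f = 0.12 (Figure 30B)
for m, fe in [("AA", "AB"), ("AB", "AA"), ("AB", "BB"), ("BB", "AB")]:
    print(f"{m}|{fe}: {expected_a_fraction(m, fe, 0.12):.2f}")
```

The same function covers monosomy and trisomy through the length of the fetal genotype string: ("AB", "A", 0.26) gives 1/1.74, about 0.57 (Figure 30D), and ("AA", "AAB", 0.27) gives 2/2.27, about 0.88 (Figure 30E).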

Figure 31: Plots of (Figure 31A) euploid, (Figure 31B) T13, (Figure 31C) T18, (Figure 31D) T21, (Figure 31E) 45,X, and (Figure 31F) 47,XXY test samples, as indicated. Each chromosome is identified at the top of the plot, the fetal and maternal genotypes are shown on the right, the x-axis represents the linear position of the SNPs along each chromosome, and the y-axis represents the number of A-allele reads as a fraction of total reads. Note that the shifted cluster positions depend on the fetal fraction, as described herein. Each dot represents a single SNP locus.

Figure 32: The combined birth prevalence of sex chromosome aneuploidies is greater than that of autosomal aneuploidies.

While the above figures set forth disclosed embodiments, other embodiments are also contemplated, as noted in the discussion. This disclosure presents illustrative embodiments by way of representation and not limitation. Numerous other modifications and embodiments that fall within the scope and spirit of the principles of the disclosed embodiments can be devised by those skilled in the art.

Detailed Description

The present invention is based in part on the surprising discovery that, typically, only a relatively small number of primers in a primer library are responsible for a large share of the amplified primer dimers formed during a multiplex PCR reaction. Methods have been developed to select the least desirable primers for removal from a candidate primer library. By reducing primer dimers to a negligible amount (about 0.1% of the PCR product), these methods allow the resulting primer library to amplify a large number of target loci simultaneously in a single multiplex PCR reaction. Because the primers hybridize to and amplify the target loci rather than hybridizing to other primers and forming amplified primer dimers, the number of different target loci that can be amplified increases. It was also found that using lower-than-normal primer concentrations and much longer annealing times increases the likelihood that primers hybridize to the target loci rather than to each other to form primer dimers.

During PCR amplification and sequencing of 19,488 target loci in a genomic sample, 99.4%-99.7% of the sequencing reads mapped to the genome, and 99.99% of those mapped to the targeted loci. For plasma samples with 10 million sequencing reads, typically at least 19,350 (99.3%) of the 19,488 targeted loci were amplified and sequenced. Amplifying such a large number of target loci simultaneously greatly reduces the time and the amount of DNA needed to analyze thousands of target loci. For example, the DNA from a single cell is sufficient to analyze thousands of target loci simultaneously, which matters for applications with low DNA amounts, such as genetic testing of single cells from embryos in the course of in vitro fertilization, or genetic testing of forensic samples containing very little DNA. In addition, analyzing the target loci in one reaction volume (for example, in one chamber or well), rather than splitting the sample into multiple separate reactions, reduces the variability that can arise between reactions. Methods have also been developed that use reference standards to correct for amplification bias between different target loci. For example, differences in amplification efficiency between target loci, caused by factors such as GC content, can cause target loci that are actually present in the same amount to yield different amounts of PCR product. Using reference standards that resemble the target loci allows such amplification bias to be detected so that it can be corrected when the target loci are quantified.
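The reference-standard correction described above can be sketched as follows, assuming (hypothetically) that each target locus is paired with a reference standard of similar sequence added at a known copy number. The reads observed for each reference estimate how efficiently that locus amplifies, and dividing the target reads by that efficiency removes the locus-specific bias. The function name and input layout are illustrative only.

```python
def correct_for_amplification_bias(target_reads, reference_reads, reference_copies):
    """Estimate relative starting amounts of target loci.

    target_reads, reference_reads: {locus: read count}
    reference_copies: known input copies of the reference standard paired
    with each locus (hypothetical spike-in design).
    """
    corrected = {}
    for locus, reads in target_reads.items():
        # Reads per input copy observed for the locus-matched reference standard
        efficiency = reference_reads[locus] / reference_copies[locus]
        corrected[locus] = reads / efficiency
    return corrected

# Two loci present in equal amounts; locus_2 (e.g. GC-rich) amplifies half as well.
target = {"locus_1": 2000, "locus_2": 1000}
ref_reads = {"locus_1": 400, "locus_2": 200}
ref_copies = {"locus_1": 100, "locus_2": 100}
print(correct_for_amplification_bias(target, ref_reads, ref_copies))
# {'locus_1': 500.0, 'locus_2': 500.0}
```

After correction the two loci are reported at the same level, even though their raw read counts differ twofold.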

When PCR products are sequenced, artifacts such as primer dimers are detected as well, which interferes with detection of the target amplicons. Because of this limitation, microarrays containing hybridization probes are commonly used for detection, since microarrays are less sensitive to interference from primer dimers. The high level of multiplexing now achieved with minimal non-target amplicons makes PCR followed by sequencing a viable alternative to microarrays.

The multiplex PCR methods of the invention can be used in a variety of applications, such as genotyping; detection of chromosomal abnormalities (e.g., fetal chromosomal aneuploidy); analysis of gene mutations and polymorphisms (e.g., single nucleotide polymorphisms, SNPs); gene deletion analysis; paternity determination; analysis of genetic differences within populations; forensic analysis; measurement of disease predisposition; quantitative analysis of mRNA; and detection and identification of infectious agents such as bacteria, parasites, and viruses. The multiplex PCR methods can also be used for non-invasive prenatal testing, such as paternity testing or detection of fetal chromosomal abnormalities.

Exemplary Primer Design Methods

Highly multiplexed PCR often produces a very high proportion of DNA products arising from unproductive side reactions such as primer dimer formation. In one embodiment, the specific primers most likely to cause unproductive side reactions can be removed from the primer library, yielding a primer library that produces a larger proportion of amplified DNA that maps to the genome. Removing problematic primers (that is, primers that are particularly likely to form dimers) has unexpectedly enabled extremely high levels of PCR multiplexing for subsequent analysis by sequencing. In systems whose performance is significantly degraded by primer dimers and/or other failure products, such as sequencing, multiplexing more than 10-fold, more than 50-fold, and more than 100-fold higher than that described elsewhere has been achieved. Note that this contrasts with probe-based detection methods such as microarrays, TaqMan, and PCR, in which excess primer dimers do not significantly affect the results. Note also that the general belief in the art is that multiplex PCR for sequencing is limited to about 100 assays in the same well. Fluidigm and RainDance offer platforms for performing 48 or 1,000 PCR assays on one sample in parallel reactions.

There are several ways to select primers from a library so as to minimize the amount of non-mapping primer dimers or other primer failure products. Empirical data show that a small number of "bad" primers are responsible for a large number of non-mapping primer dimer side reactions. Removing these "bad" primers can increase the percentage of sequence reads that map to the targeted loci. One way to identify "bad" primers is to examine sequencing data from DNA amplified by targeted amplification; the primers forming the most frequently observed primer dimers can be removed, yielding a primer library that is significantly less likely to produce by-product DNA that does not map to the genome. There are also publicly available programs that calculate the binding energies of various primer combinations; removing the combinations with the highest binding energies likewise yields a primer library that is significantly less likely to produce by-product DNA that does not map to the genome.
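The sequencing-based way of finding "bad" primers amounts to a tally. The sketch below assumes a hypothetical upstream step has already classified each non-mapping read as a primer dimer and identified the two primers it contains; it counts how many dimer reads each primer takes part in and returns the worst offenders as removal candidates. Primer IDs and counts are invented for illustration.

```python
from collections import Counter

def worst_primers(dimer_reads, k):
    """Return the k primers involved in the most primer-dimer reads.

    dimer_reads: list of (primer_id_1, primer_id_2) tuples, one per sequencing
    read classified as a primer dimer (the classification step is assumed).
    """
    counts = Counter()
    for p1, p2 in dimer_reads:
        counts[p1] += 1
        if p2 != p1:  # a self-dimer counts once for that primer
            counts[p2] += 1
    return [primer for primer, _ in counts.most_common(k)]

reads = ([("P7", "P12")] * 50 + [("P7", "P3")] * 30
         + [("P3", "P9")] * 5 + [("P12", "P12")] * 2)
print(worst_primers(reads, 2))  # ['P7', 'P12']
```

Here P7 takes part in 80 dimer reads and P12 in 52, so removing just those two primers eliminates every dimer read except the 35 involving P3.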

In some embodiments for selecting primers, an initial library of candidate primers is created by designing one or more primers or primer pairs for each candidate target locus. A set of candidate target loci (e.g., SNPs) can be selected on the basis of publicly available information about desired parameters of the target loci, such as the frequency of a SNP within a target population or its heterozygosity rate. In one embodiment, PCR primers can be designed using the Primer3 program (www.primer3.sourceforge.net; libprimer3 release 2.2.3, which is hereby incorporated by reference in its entirety). If desired, primers can be designed to anneal within a particular annealing temperature range, to have GC content within a particular range, to fall within a particular size range, to produce target amplicons within a particular size range, and/or to have other parameter characteristics. Starting with multiple primers or primer pairs per candidate target locus increases the likelihood that a primer or primer pair will remain in the library for most or all target loci. In one embodiment, the selection criteria may require that at least one primer pair per target locus remain in the library. In that way, most or all target loci will be amplified when the final primer library is used. This is desirable for applications such as screening for deletions or duplications at a large number of locations in the genome, or screening a large number of sequences (e.g., polymorphisms or other mutations) associated with disease or increased disease risk. If one primer pair from the library would produce a target amplicon that overlaps a target amplicon produced by another primer pair, one of the two primer pairs can be removed from the library to prevent interference.
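The last step, removing one of two primer pairs whose target amplicons overlap, can be illustrated with greedy interval scheduling. This sketch assumes amplicon coordinates are already known (for example from the primer design step) and, as one possible policy that the text does not specify, keeps the largest possible number of non-overlapping amplicons per chromosome; a real pipeline might instead keep whichever overlapping pair scores better.

```python
def drop_overlapping(pairs):
    """Keep a set of primer pairs whose amplicons do not overlap.

    pairs: list of (pair_id, chrom, start, end), 0-based half-open amplicon
    coordinates. Sorting by end and keeping each amplicon that starts at or
    after the previously kept end retains the maximum number of
    non-overlapping amplicons on each chromosome.
    """
    kept = []
    last_end = {}
    for pair_id, chrom, start, end in sorted(pairs, key=lambda p: (p[1], p[3])):
        if start >= last_end.get(chrom, float("-inf")):
            kept.append(pair_id)
            last_end[chrom] = end
    return kept

pairs = [("a", "chr1", 100, 180), ("b", "chr1", 150, 230),
         ("c", "chr1", 240, 300), ("d", "chr2", 100, 170)]
print(drop_overlapping(pairs))  # ['a', 'c', 'd']
```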

In some embodiments, an "undesirability score" (a higher score indicating lower desirability) is calculated (e.g., computed in silico) for most or all possible combinations of two primers from the candidate primer library. In various embodiments, undesirability scores are calculated for at least 80%, 90%, 95%, 98%, 99%, or 99.5% of the possible candidate primer combinations in the library. Each undesirability score is based at least in part on the likelihood of dimer formation between the two candidate primers. If desired, the undesirability score can also be based on one or more other parameters selected from the group consisting of: heterozygosity rate of the target locus, prevalence of a disease associated with a sequence (e.g., a polymorphism) at the target locus, penetrance of a disease associated with a sequence (e.g., a polymorphism) at the target locus, specificity of the candidate primer for the target locus, size of the candidate primer, melting temperature of the target amplicon, GC content of the target amplicon, amplification efficiency of the target amplicon, and size of the target amplicon. If multiple factors are considered, the undesirability score can be calculated as a weighted average of the individual parameters, with each parameter weighted according to its importance for the particular application in which the primers will be used. In some embodiments, the primers with the highest undesirability scores are removed from the library. If a removed primer is a member of a primer pair that hybridizes to one target locus, the other member of that primer pair can also be removed from the library. The process of removing primers can be repeated as needed. In some embodiments, the selection method is performed until the undesirability scores of the candidate primer combinations remaining in the library are all at or below a minimum threshold. In some embodiments, the selection method is performed until the number of candidate primers remaining in the library has been reduced to a desired number.
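A minimal sketch of the scoring and removal steps above, under hypothetical conventions: parameter values are pre-scaled so that larger means worse, each combination is keyed by a frozenset of primer IDs (a one-element set for a self-dimer), and each primer's partner is known. The combination score is the weighted average of its parameters. The loop repeatedly takes the worst-scoring combination, drops whichever of its primers carries the larger total score, drops that primer's partner too, and stops at the desired library size. The tie-breaking rule is an illustrative choice.

```python
def undesirability(params, weights):
    """Weighted average of parameter values for one primer combination.

    params: {parameter name: value scaled so that larger means worse}
    weights: {parameter name: importance for the intended application}
    """
    total = sum(weights[name] for name in params)
    return sum(weights[name] * value for name, value in params.items()) / total

def prune_highest(pair_scores, partner, max_primers):
    """Greedily remove primers from the worst-scoring combinations.

    pair_scores: {frozenset of one or two primer IDs: undesirability score}
    partner: {primer ID: ID of the other primer in its pair}
    Stops when at most max_primers remain or no scored combinations are left.
    """
    primers = set(partner)
    scores = dict(pair_scores)
    while len(primers) > max_primers and scores:
        worst = max(scores, key=scores.get)
        # Of the primers in the worst combination, drop the one with the larger total score
        victim = max(worst, key=lambda p: (sum(s for c, s in scores.items() if p in c), p))
        for p in (victim, partner[victim]):
            primers.discard(p)
            scores = {c: s for c, s in scores.items() if p not in c}
    return primers

partner = {"P1F": "P1R", "P1R": "P1F", "P2F": "P2R",
           "P2R": "P2F", "P3F": "P3R", "P3R": "P3F"}
pair_scores = {frozenset({"P1F", "P2R"}): 0.9,
               frozenset({"P1F", "P3F"}): 0.5,
               frozenset({"P2F", "P3R"}): 0.2}
print(sorted(prune_highest(pair_scores, partner, 4)))
```

In the toy example, P1F appears in both of the two worst combinations, so removing it (and its partner P1R) is enough to reach four primers.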

In various embodiments, after the undesirability scores are calculated, the candidate primer that is part of the greatest number of two-primer combinations with an undesirability score above a first minimum threshold is removed from the library. This step ignores interactions at or below the first minimum threshold, because those interactions are less significant. If a removed primer is a member of a primer pair that hybridizes to one target locus, the other member of that primer pair can also be removed from the library. The process of removing primers can be repeated as needed. In some embodiments, the selection method is performed until the undesirability scores of the candidate primer combinations remaining in the library are all at or below the first minimum threshold. If more candidate primers remain in the library than desired, the number of primers can be reduced by lowering the first minimum threshold to a lower second minimum threshold and repeating the removal process. If fewer candidate primers remain than desired, the method can be continued by raising the first minimum threshold to a higher second minimum threshold and repeating the removal process starting from the initial candidate primer library, thereby allowing more candidate primers to remain in the library. In some embodiments, the selection method is performed until the undesirability scores of the candidate primer combinations remaining in the library are all at or below the second minimum threshold, or until the number of candidate primers remaining in the library has been reduced to the desired number.
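The threshold variant can be sketched the same way. The code below uses hypothetical names and keys each combination by a frozenset of primer IDs. In each round it removes the primer involved in the most above-threshold combinations, together with its partner. An outer loop then lowers or raises the threshold, restarting from the initial candidate library each time, until the number of remaining primers falls within a desired range. Restarting on every adjustment is a simplification, since the text only requires a restart when the threshold is raised.

```python
from collections import Counter

def prune_above_threshold(pair_scores, partner, threshold):
    """Remove primers until no remaining combination scores above `threshold`.

    Each round removes the primer appearing in the most above-threshold
    combinations, plus its partner; ties break on primer ID.
    """
    primers = set(partner)
    bad = {combo for combo, score in pair_scores.items() if score > threshold}
    while bad:
        counts = Counter(p for combo in bad for p in combo)
        victim = max(counts, key=lambda p: (counts[p], p))
        removed = {victim, partner[victim]}
        primers -= removed
        bad = {combo for combo in bad if not (combo & removed)}
    return primers

def select_library(pair_scores, partner, threshold, size_range, step=0.05, max_rounds=100):
    """Adjust the threshold, restarting from the initial library each time,
    until the number of surviving primers lies within size_range (inclusive)."""
    low, high = size_range
    for _ in range(max_rounds):
        kept = prune_above_threshold(pair_scores, partner, threshold)
        if len(kept) > high:
            threshold -= step  # too many primers left: be stricter
        elif len(kept) < low:
            threshold += step  # too few primers left: be more lenient
        else:
            return kept, threshold
    raise RuntimeError("no suitable threshold found within max_rounds")

partner = {"P1F": "P1R", "P1R": "P1F", "P2F": "P2R",
           "P2R": "P2F", "P3F": "P3R", "P3R": "P3F"}
scores = {frozenset({"P1F", "P2F"}): 0.8,
          frozenset({"P1F", "P3R"}): 0.7,
          frozenset({"P2R", "P3F"}): 0.3}
print(sorted(prune_above_threshold(scores, partner, 0.5)))
kept, final_threshold = select_library(scores, partner, 0.5, (2, 3), step=0.1)
print(sorted(kept), round(final_threshold, 2))
```

The `max_rounds` guard matters because a fixed step can jump past the target range in both directions and oscillate.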

If desired, primer pairs that produce target amplicons overlapping those produced by another primer pair can be placed in separate amplification reactions. Applications that require analysis of all candidate target loci, rather than dropping candidate target loci from the analysis because their amplicons overlap, may therefore need multiple PCR amplification reactions.
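Assigning overlapping primer pairs to separate reactions is an interval-partitioning problem. As a sketch under assumed inputs (half-open amplicon coordinates per primer pair), the standard greedy algorithm below sorts amplicons by start position and reuses whichever reaction on that chromosome frees up earliest, which yields the minimum number of reactions. It is a textbook technique offered for illustration, not a procedure stated in the source.

```python
import heapq
from collections import defaultdict

def assign_reactions(amplicons):
    """Assign primer pairs to reactions so no two amplicons in a reaction overlap.

    amplicons: list of (pair_id, chrom, start, end), half-open coordinates.
    Amplicons on different chromosomes never overlap, so reaction indices
    are shared across chromosomes.
    """
    by_chrom = defaultdict(list)
    for pair_id, chrom, start, end in amplicons:
        by_chrom[chrom].append((start, end, pair_id))
    reactions = defaultdict(list)
    for items in by_chrom.values():
        busy = []  # heap of (last end, reaction index) for reactions used on this chromosome
        n_used = 0
        for start, end, pair_id in sorted(items):
            if busy and busy[0][0] <= start:
                _, idx = heapq.heappop(busy)  # reuse the reaction that freed up earliest
            else:
                idx = n_used  # every used reaction still overlaps: open a new one
                n_used += 1
            reactions[idx].append(pair_id)
            heapq.heappush(busy, (end, idx))
    return [reactions[i] for i in sorted(reactions)]

amplicons = [("a", "chr1", 100, 180), ("b", "chr1", 150, 230),
             ("c", "chr1", 240, 300), ("d", "chr2", 100, 170)]
print(assign_reactions(amplicons))  # [['a', 'c', 'd'], ['b']]
```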

These selection methods minimize the number of candidate primers that must be removed from the library to achieve the desired reduction in primer dimers. Because fewer candidate primers are removed, the resulting primer library can be used to amplify more (or all) of the target loci.

Multiplexing a large number of primers imposes considerable constraints on which assays can be included. Assays that interact unintentionally produce spurious amplification products. The size limits of mini-PCR impose further constraints. In one embodiment, it is possible to start with a very large number of potential SNP targets (from about 500 to more than 1 million) and attempt to design primers to amplify each SNP. Where primers can be designed, primer pairs likely to form spurious products can be identified by using published thermodynamic parameters for DNA duplex formation to estimate the likelihood of spurious primer duplexes forming between all possible primer pairs. Primer interactions can be ranked by a scoring function related to the interaction, and the primers with the worst interaction scores are eliminated until the desired number of primers is reached. Where SNPs likely to be heterozygous are most useful, it is also possible to rank the list of assays and select the most heterozygous of the compatible assays. Experiments have confirmed that primers with high interaction scores are the most likely to form primer dimers. At high levels of multiplexing it is not possible to eliminate all spurious interactions, but it is essential to remove the primers or primer pairs with the highest in silico interaction scores, because they can dominate the entire reaction and severely limit amplification of the intended targets. We have used this procedure to create multiplex primer sets of up to, and in some cases more than, 10,000 primers. The improvement from this procedure is dramatic: more than 80%, 90%, 95%, 98%, and even 99% of the amplification products are target products, as determined by sequencing all PCR products, compared with 10% for reactions in which the worst primers were not removed. When combined with the partial semi-nested approach described previously, more than 90%, and even more than 95%, of the amplicons can be mapped to the targeted sequences.
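The published nearest-neighbor thermodynamic parameters mentioned above are not reproduced here. As a deliberately crude stand-in, the sketch below scores each pair of primers by the longest stretch at one primer's 3' end that is perfectly complementary to the other primer (the configuration that lets a polymerase extend a dimer), includes self-dimers, and ranks combinations worst first. Treat it as an illustration of the ranking workflow only: it ignores mismatches, loops, and free energy, and the sequences are hypothetical.

```python
from itertools import combinations_with_replacement

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def three_prime_overlap(p, q):
    """Length of the longest 3' end of primer p that is perfectly complementary to part of q."""
    for k in range(len(p), 0, -1):
        if revcomp(p[-k:]) in q:
            return k
    return 0

def dimer_score(p, q):
    """Crude interaction score: the longer extendable 3' overlap in either direction."""
    return max(three_prime_overlap(p, q), three_prime_overlap(q, p))

def rank_interactions(primers):
    """Score all primer combinations, self-dimers included, worst first.

    primers: {primer ID: sequence}. Returns [(score, id1, id2), ...].
    """
    scored = [(dimer_score(primers[a], primers[b]), a, b)
              for a, b in combinations_with_replacement(sorted(primers), 2)]
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))

primers = {"F1": "ACGTACGTTTGGCC", "R1": "AAAAGGCCAA"}
for score, a, b in rank_interactions(primers):
    print(score, a, b)
```

In the example, the 3' end TTGGCC of F1 pairs with GGCCAA in R1 over 6 bases, so that cross-dimer ranks first; F1's palindromic GGCC end gives a 4-base self-dimer.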

Note that there are other ways to determine which PCR probes are likely to form dimers. In one embodiment, analyzing a pool of DNA amplified with a non-optimized set of primers may be sufficient to identify problematic primers. For example, the pool can be analyzed by sequencing; the primers forming the dimers present in the greatest numbers are the ones most likely to form dimers and can be removed.

This method has many potential applications, such as SNP genotyping, heterozygosity rate determination, copy number measurement, and other targeted sequencing applications. In one embodiment, the primer design method can be used in combination with the mini-PCR methods described elsewhere in this document. In some embodiments, the primer design method can be used as part of a massively multiplexed PCR method.

The use of tags on the primers can reduce amplification and sequencing of primer-dimer products. In some embodiments, the primer contains an internal region that forms a loop structure with the tag. In certain embodiments, the primer includes a 5′ region that is specific for the target locus, an internal region that is not specific for the target locus and forms a loop structure, and a 3′ region that is specific for the target locus. In some embodiments, the loop region may lie between two binding regions that are designed to bind to contiguous or neighboring regions of the template DNA. In various embodiments, the 3′ region is at least 7 nucleotides in length. In some embodiments, the 3′ region is between 7 and 20 nucleotides in length, such as between 7 and 15 nucleotides or between 7 and 10 nucleotides, inclusive. In various embodiments, the primer includes a 5′ region that is not specific for the target locus (e.g., a tag or a universal primer binding site), followed by a region that is specific for the target locus, an internal region that is not specific for the target locus and forms a loop structure, and a 3′ region that is specific for the target locus. Tag-primers can be used to shorten the necessary target-specific sequence to below 20, below 15, below 12, and even below 10 base pairs. With standard primer design, this can happen serendipitously when the target sequence is fragmented within the primer binding site, or it can be deliberately designed into the primer. Advantages of this approach include that it increases the number of assays that can be designed for a given maximum amplicon length, and that it shortens the "uninformative" sequencing of primer sequences. It can also be used in combination with internal tagging (see elsewhere in this document).
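As an illustration of the looped tag-primer layout described above, the following minimal sketch models a primer as four 5′-to-3′ segments and checks the 3′ region against the 7 to 20 nucleotide window given in one embodiment. The class name, field names, and example sequences are hypothetical placeholders, not sequences from this disclosure, and the sketch does not predict whether the loop actually forms.

```python
from dataclasses import dataclass

VALID_BASES = set("ACGT")


@dataclass
class LoopedTagPrimer:
    """Hypothetical 5'->3' segment layout for a looped tag-primer.

    tag:       5' region not specific to the target (tag / universal binding site)
    target_5p: target-specific region
    loop:      internal, non-target-specific region intended to form the loop
    target_3p: target-specific 3' region
    """
    tag: str
    target_5p: str
    loop: str
    target_3p: str

    def sequence(self) -> str:
        # Concatenate the segments in 5'->3' order.
        return self.tag + self.target_5p + self.loop + self.target_3p

    def target_specific_length(self) -> int:
        # Only the two target-binding regions count toward template specificity.
        return len(self.target_5p) + len(self.target_3p)

    def validate(self, min_3p: int = 7, max_3p: int = 20) -> None:
        # Reject non-ACGT characters in any segment.
        for name in ("tag", "target_5p", "loop", "target_3p"):
            if not set(getattr(self, name)) <= VALID_BASES:
                raise ValueError(f"{name} contains non-ACGT characters")
        # Enforce the 3' region length window (7-20 nt in one embodiment).
        if not min_3p <= len(self.target_3p) <= max_3p:
            raise ValueError(
                f"3' region is {len(self.target_3p)} nt; expected {min_3p}-{max_3p}"
            )


# Example with placeholder sequences: 15 nt of target-specific sequence in total.
primer = LoopedTagPrimer(tag="TTAGCCGATCGAAGCT", target_5p="GCTTGA",
                         loop="TTTTTT", target_3p="CAGGTACCA")
primer.validate()
```

Splitting the target-specific sequence into two short binding regions is what lets the total target-specific length stay small while the full oligonucleotide remains long enough to carry a tag.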

In one embodiment, the relative amount of non-productive products in multiplex targeted PCR amplification can be reduced by raising the annealing temperature. When amplifying a library carrying the same tag as the target-specific primers, the annealing temperature can be increased relative to that used for genomic DNA, because the tags contribute to primer binding. In some embodiments, we use significantly lower primer concentrations than previously reported, together with longer annealing times than reported elsewhere. In some embodiments, the annealing time may be longer than 3, 5, 8, 10, 15, 20, 30, 60, 120, 240, or 480 minutes, and even longer than 960 minutes. In one embodiment, annealing times longer than those in previous reports are used, which allows lower primer concentrations. In various embodiments, longer than normal extension times are used, for example more than 3, 5, 8, 10, or 15 minutes. In some embodiments, primer concentrations are as low as 50 nM, 20 nM, 10 nM, 5 nM, 1 nM, and lower than 1 μM. This unexpectedly yields robust performance for highly multiplexed reactions, such as 1,000-plex, 2,000-plex, 5,000-plex, 10,000-plex, 20,000-plex, 50,000-plex, and even 100,000-plex reactions. In one embodiment, the amplification uses one, two, three, four, or five cycles run with long annealing times, followed by PCR cycles with more usual annealing times using tagged primers.

To select target positions, one can start with a pool of candidate primer pair designs, create a thermodynamic model of potentially adverse interactions between primer pairs, and then use the model to eliminate designs that are incompatible with other designs in the pool.
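The elimination step can be sketched as a greedy loop over precomputed pairwise interaction scores. This is a minimal illustration, assuming that some thermodynamic model has already produced a free-energy score for each pair of designs (more negative meaning stronger unwanted binding); the cutoff of -6.0 kcal/mol, the dictionary input format, and the "remove the design with the most conflicts first" rule are all hypothetical choices, not the selection procedure of this disclosure.

```python
from itertools import combinations


def prune_incompatible(designs, interaction, threshold=-6.0):
    """Greedily drop candidate designs until no incompatible pair remains.

    designs:     list of design IDs (strings), e.g. one ID per primer-pair design
    interaction: dict mapping frozenset({id_a, id_b}) -> predicted interaction
                 free energy in kcal/mol (more negative = stronger unwanted
                 binding); pairs absent from the dict are treated as 0.0
    threshold:   hypothetical cutoff; a pair scoring below it is incompatible
    Returns the retained designs in their original order.
    """
    kept = set(designs)
    while True:
        # Count how many incompatible partners each remaining design has.
        bad = {d: 0 for d in kept}
        for a, b in combinations(sorted(kept), 2):
            if interaction.get(frozenset((a, b)), 0.0) < threshold:
                bad[a] += 1
                bad[b] += 1
        if not bad or max(bad.values()) == 0:
            break
        # Remove the design with the most conflicts (ties broken by ID).
        worst = max(bad, key=lambda d: (bad[d], d))
        kept.remove(worst)
    return [d for d in designs if d in kept]
```

Each pass rescans all remaining pairs, which is quadratic in pool size; that is acceptable for a sketch but a large candidate pool would call for incremental conflict counts.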

After the selection process, the primers remaining in the library can be used in any of the methods of the invention.

Exemplary Primer Libraries

In one aspect, the invention features a library of primers, e.g., primers selected from a library of candidate primers using any of the methods of the invention. In some embodiments, the library includes primers that simultaneously hybridize (or are capable of simultaneously hybridizing) to, or simultaneously amplify (or are capable of simultaneously amplifying), at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci in one reaction volume. In various embodiments, the library includes primers that simultaneously amplify (or are capable of simultaneously amplifying) between 1,000 and 2,000, 2,000 and 5,000, 5,000 and 7,500, 7,500 and 10,000, 10,000 and 20,000, 20,000 and 25,000, 25,000 and 30,000, 30,000 and 40,000, 40,000 and 50,000, 50,000 and 75,000, or 75,000 and 100,000 different target loci in one reaction volume, inclusive. In various embodiments, the library includes primers that simultaneously amplify (or are capable of simultaneously amplifying) between 1,000 and 100,000 different target loci in one reaction volume, such as between 1,000 and 50,000, 1,000 and 30,000, 1,000 and 20,000, 1,000 and 10,000, 2,000 and 30,000, 2,000 and 20,000, 2,000 and 10,000, 5,000 and 30,000, 5,000 and 20,000, or 5,000 and 10,000 different target loci, inclusive. In some embodiments, the library includes primers that simultaneously amplify (or are capable of simultaneously amplifying) the target loci in one reaction volume such that less than 60%, 40%, 30%, 20%, 10%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.25%, 0.1%, or 0.05% of the amplified products are primer dimers. In various embodiments, the amount of amplified products that are primer dimers is between 0.5% and 60%, such as between 0.1% and 40%, 0.1% and 20%, 0.25% and 20%, 0.25% and 10%, 0.5% and 20%, 0.5% and 10%, 1% and 20%, or 1% and 10%, inclusive. In some embodiments, the primers simultaneously amplify (or are capable of simultaneously amplifying) the target loci in one reaction volume such that at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% of the amplified products are target amplicons. In various embodiments, the amount of amplified products that are target amplicons is between 50% and 99.5%, such as between 60% and 99%, 70% and 98%, 80% and 98%, 90% and 99.5%, or 95% and 99.5%, inclusive. In some embodiments, the primers simultaneously amplify (or are capable of simultaneously amplifying) the target loci in one reaction volume such that at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% of the targeted loci are amplified. In various embodiments, the fraction of target loci that are amplified is between 50% and 99.5%, such as between 60% and 99%, 70% and 98%, 80% and 99%, 90% and 99.5%, 95% and 99.9%, or 98% and 99.99%, inclusive. In some embodiments, the library of primers includes at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 primer pairs, wherein each pair of primers includes a forward test primer and a reverse test primer, and each pair of test primers hybridizes to a target locus. In some embodiments, the library of primers includes at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 individual primers that each hybridize to a different target locus, wherein the individual primers are not part of primer pairs.
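The three performance measures above (fraction of products that are primer dimers, fraction that are target amplicons, and fraction of targeted loci amplified) can be computed from classified read counts. This minimal sketch assumes that sequencing reads are used as a proxy for amplified products and that reads have already been assigned to loci, primer dimers, or "other" by some upstream step; the function name, input layout, and `min_reads` cutoff for calling a locus amplified are hypothetical.

```python
def amplification_metrics(locus_reads, primer_dimer_reads, other_reads,
                          targeted_loci, min_reads=1):
    """Summarize a multiplex run from read counts (hypothetical inputs).

    locus_reads:        dict locus_id -> reads assigned to that locus's target amplicon
    primer_dimer_reads: reads classified as primer dimers
    other_reads:        all remaining non-target reads
    targeted_loci:      loci the primer library was designed to amplify
    min_reads:          hypothetical minimum reads to call a locus "amplified"
    """
    # Only reads at designed loci count as target amplicons.
    on_target = sum(locus_reads.get(locus, 0) for locus in targeted_loci)
    total = sum(locus_reads.values()) + primer_dimer_reads + other_reads
    if total == 0 or not targeted_loci:
        raise ValueError("need at least one read and one targeted locus")
    amplified = sum(1 for locus in targeted_loci
                    if locus_reads.get(locus, 0) >= min_reads)
    return {
        "primer_dimer_fraction": primer_dimer_reads / total,
        "target_amplicon_fraction": on_target / total,
        "loci_amplified_fraction": amplified / len(targeted_loci),
    }
```

For example, 90 and 5 reads at two of three targeted loci plus 3 primer-dimer reads and 2 other reads give a primer-dimer fraction of 3%, a target-amplicon fraction of 95%, and two of three loci amplified.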

In various embodiments, the concentration of each primer is less than 100, 75, 50, 25, 20, 10, 5, 2, or 1 nM, or less than 500, 100, 10, or 1 μM. In various embodiments, the concentration of each primer is between 1 μM and 100 nM, such as between 1 μM and 1 nM, 1 and 75 nM, 2 and 50 nM, or 5 and 50 nM, inclusive. In various embodiments, the GC content of the primers is between 30% and 80%, such as between 40% and 70% or between 50% and 60%, inclusive. In some embodiments, the range of GC content across the primers (the difference between the highest and lowest values) is less than 30%, 20%, 10%, or 5%. In some embodiments, the range of GC content across the primers is between 5% and 30%, such as between 5% and 20% or between 5% and 10%, inclusive. In some embodiments, the melting temperature (Tm) of the test primers is between 40°C and 80°C, such as between 50°C and 70°C, 55°C and 65°C, or 57°C and 60.5°C, inclusive. In some embodiments, Tm is calculated with the Primer3 program (libprimer3 release 2.2.3) using the built-in SantaLucia parameters (www.primer3.sourceforge.net). In some embodiments, the range of melting temperatures across the primers is less than 15°C, 10°C, 5°C, 3°C, or 1°C. In some embodiments, the range of melting temperatures across the primers is between 1°C and 15°C, such as between 1°C and 10°C, 1°C and 5°C, or 1°C and 3°C, inclusive. In some embodiments, the length of the primers is between 15 and 100 nucleotides, such as between 15 and 75, 15 and 40, 17 and 35, 18 and 30, or 20 and 65 nucleotides, inclusive. In some embodiments, the range of primer lengths is less than 50, 40, 30, 20, 10, or 5 nucleotides. In some embodiments, the range of primer lengths is between 5 and 50 nucleotides, such as between 5 and 40, 5 and 20, or 5 and 10 nucleotides, inclusive. In some embodiments, the length of the target amplicons is between 50 and 100 nucleotides, such as between 60 and 80 or between 60 and 75 nucleotides, inclusive. In some embodiments, the range of target amplicon lengths is less than 50, 25, 15, 10, or 5 nucleotides. In some embodiments, the range of target amplicon lengths is between 5 and 50 nucleotides, such as between 5 and 25, 5 and 15, or 5 and 10 nucleotides, inclusive.
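Per-primer windows and library-wide "ranges" can be checked with a simple filter. This is a rough sketch only: it substitutes the Wallace rule (Tm ≈ 2·(A+T) + 4·(G+C) °C) for the SantaLucia nearest-neighbor calculation that the text attributes to Primer3, so its Tm values are approximate and will diverge for longer primers. The default windows are taken from the broadest ranges above; the function names are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    # Fraction of G and C bases in the primer.
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> float:
    # Wallace rule: Tm ~= 2*(A+T) + 4*(G+C) degC. A rough stand-in only;
    # it is NOT the SantaLucia nearest-neighbor model used by Primer3.
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def passes_filters(seq, gc_range=(0.30, 0.80), len_range=(15, 100),
                   tm_range=(40.0, 80.0)) -> bool:
    # Inclusive bounds, matching the "inclusive" ranges in the text.
    return (len_range[0] <= len(seq) <= len_range[1]
            and gc_range[0] <= gc_fraction(seq) <= gc_range[1]
            and tm_range[0] <= wallace_tm(seq) <= tm_range[1])


def library_spread(seqs):
    # "Range" across the library = max - min of each property.
    gcs = [gc_fraction(s) for s in seqs]
    tms = [wallace_tm(s) for s in seqs]
    lens = [len(s) for s in seqs]
    return {
        "gc_range": max(gcs) - min(gcs),
        "tm_range": max(tms) - min(tms),
        "length_range": max(lens) - min(lens),
    }
```

In practice one would replace `wallace_tm` with a nearest-neighbor calculation and apply tighter windows, such as the 57°C to 60.5°C Tm window mentioned above.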

These primer libraries can be used in any of the methods of the invention.

Exemplary Primer Kits

In one aspect, the invention features a kit (e.g., a kit for amplifying target loci in a nucleic acid sample) that includes any of the primer libraries of the invention. In some embodiments, a kit can be formulated that comprises a plurality of primers designed to carry out the methods described herein. The primers may be outer forward and reverse primers or inner forward and reverse primers as disclosed herein; they may be primers designed to have low binding affinity to the other primers in the kit, as disclosed in the primer design section; they may be hybrid capture probes or pre-circularized probes as described in the relevant sections; or some combination thereof. In one embodiment, a kit can be formulated for use with the methods disclosed herein to determine the ploidy state of a target chromosome in a gestating fetus. Such a kit comprises a plurality of inner forward primers, optionally a plurality of inner reverse primers, and optionally outer forward and outer reverse primers, wherein each primer is designed to hybridize to the region of DNA immediately upstream and/or downstream of one of the target sites (e.g., polymorphic sites) on the target chromosome, and optionally on additional chromosomes. In one embodiment, the primer kit can be used in combination with the diagnostic box described elsewhere in this document. In some embodiments, the kit includes instructions for using the library to amplify the target loci.

Exemplary Multiplex PCR Methods

In one aspect, the invention features a method of amplifying target loci in a nucleic acid sample that involves (i) contacting the nucleic acid sample with a library of primers that simultaneously hybridize to at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci to produce a reaction mixture; and (ii) subjecting the reaction mixture to primer extension reaction conditions (e.g., PCR conditions) to produce amplified products that include target amplicons. In some embodiments, the method also includes determining the presence or absence of at least one target amplicon (e.g., at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% of the target amplicons). In some embodiments, the method also includes determining the sequence of at least one target amplicon (e.g., at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% of the target amplicons). In some embodiments, at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% of the targeted loci are amplified. In various embodiments, less than 60%, 50%, 40%, 30%, 20%, 10%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.25%, 0.1%, or 0.05% of the amplified products are primer dimers.

In one embodiment, the methods disclosed herein use highly efficient, highly multiplexed targeted PCR to amplify DNA, followed by high-throughput sequencing to determine the allele frequencies at each target locus. The ability to multiplex more than about 50 or 100 PCR primers in one reaction volume in such a way that most of the resulting sequence reads map to the targeted loci is novel and non-obvious. One technique that allows highly multiplexed targeted PCR to perform highly efficiently involves designing primers that are unlikely to hybridize with one another. The PCR probes, typically referred to as primers, are selected by creating a thermodynamic model of potentially adverse interactions between at least 500, at least 1,000, at least 2,000, at least 5,000, at least 7,500, at least 10,000, at least 20,000, at least 25,000, at least 30,000, at least 40,000, at least 50,000, at least 75,000, or at least 100,000 potential primer pairs, or of unintended interactions between the primers and the sample DNA, and then using the model to eliminate designs that are incompatible with other designs in the pool. Another technique that allows highly multiplexed targeted PCR to perform highly efficiently is a partially or fully nested approach to targeted PCR. Using one or a combination of these approaches allows at least 300, at least 800, at least 1,200, at least 4,000, or at least 10,000 primers to be multiplexed in a single pool, with most of the resulting amplified DNA molecules mapping to the targeted loci when sequenced. Using one or a combination of these approaches allows a large number of primers to be multiplexed in a single pool, with more than 50%, more than 60%, more than 67%, more than 80%, more than 90%, more than 95%, more than 96%, more than 97%, more than 98%, more than 99%, or more than 99.5% of the resulting amplified DNA molecules mapping to the targeted loci.

In some embodiments, detection of the target genetic material can be done in a multiplexed manner. The number of genetic target sequences that can be run in parallel can range from one to ten, ten to one hundred, one hundred to one thousand, one thousand to ten thousand, ten thousand to one hundred thousand, one hundred thousand to one million, or one million to ten million. Prior attempts to multiplex more than 100 primers per pool have resulted in significant problems with unwanted side reactions, such as primer-dimer formation.

Targeted PCR

In some embodiments, PCR can be used to target specific locations of the genome. In plasma samples, the original DNA is highly fragmented (typically less than 500 bp, with an average length of less than 200 bp). In PCR, the forward and reverse primers must anneal to the same fragment to enable amplification. Therefore, if the fragments are short, the PCR assays must amplify relatively short regions as well. As with MIPs, if the polymorphic position is too close to the polymerase binding site, it can bias the amplification of different alleles. Currently, PCR primers that target polymorphic regions, such as those containing SNPs, are typically designed so that the 3′ end of the primer hybridizes to the base immediately adjacent to the polymorphic base. In one embodiment of the invention, the 3′ ends of both the forward and reverse PCR primers are designed to hybridize to bases that are one or a few positions away from the variant position (the polymorphic site) of the targeted allele. The number of bases between the polymorphic site (SNP or otherwise) and the base to which the 3′ end of the primer is designed to hybridize may be one, two, three, four, five, or six bases, seven to ten bases, eleven to fifteen bases, or sixteen to twenty bases. The forward and reverse primers may be designed to hybridize a different number of bases away from the polymorphic site.
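The primer placement described above reduces to coordinate arithmetic. This minimal sketch assumes 0-based, inclusive reference coordinates and defines each "gap" as the number of bases between a primer's 3′-terminal base and the SNP (a gap of 0 corresponds to the conventional design with the 3′ end immediately adjacent). The function name and the optional `max_amplicon` check, which stands in for the short-fragment constraint of plasma DNA, are hypothetical.

```python
def place_primers(snp_pos, fwd_len, rev_len, fwd_gap, rev_gap, max_amplicon=None):
    """Return inclusive 0-based reference coordinates for a primer pair flanking a SNP.

    fwd_gap / rev_gap: number of bases between each primer's 3'-terminal base
    and the SNP (0 = the 3' end sits immediately next to the SNP).
    """
    if min(fwd_len, rev_len) <= 0 or min(fwd_gap, rev_gap) < 0:
        raise ValueError("lengths must be positive and gaps non-negative")
    # The forward primer lies upstream; its 3' end points toward the SNP.
    fwd_end = snp_pos - fwd_gap - 1
    fwd_start = fwd_end - fwd_len + 1
    # The reverse primer binds the opposite strand downstream; on reference
    # coordinates its 3' end is the leftmost base of its binding site.
    rev_start = snp_pos + rev_gap + 1
    rev_end = rev_start + rev_len - 1
    if fwd_start < 0:
        raise ValueError("forward primer would extend past the reference start")
    amplicon_len = rev_end - fwd_start + 1
    if max_amplicon is not None and amplicon_len > max_amplicon:
        raise ValueError(f"amplicon of {amplicon_len} bp exceeds {max_amplicon} bp")
    return {"fwd": (fwd_start, fwd_end), "rev": (rev_start, rev_end),
            "amplicon_len": amplicon_len}
```

The amplicon length works out to `fwd_gap + rev_gap + fwd_len + rev_len + 1`, so every base of gap added to protect the SNP from 3′-end bias costs one base of amplicon length, which matters when template fragments average under 200 bp.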

Large numbers of PCR assays can be generated; however, interactions between different PCR assays make it difficult to multiplex them beyond about one hundred assays. Various molecular approaches can be used to increase the level of multiplexing, but it may still be limited to fewer than 100, perhaps 200, or possibly 500 assays per reaction. Samples with large quantities of DNA can be split among multiple sub-reactions and then recombined before sequencing. For samples in which either the whole sample or some subpopulation of DNA molecules is limited, splitting the sample introduces statistical noise. In one embodiment, a small or limited quantity of DNA may refer to an amount below 10 pg, between 10 and 100 pg, between 100 pg and 1 ng, between 1 and 10 ng, or between 10 and 100 ng. Note that while this method is particularly useful for small amounts of DNA, where other methods that involve splitting into multiple pools can cause significant problems related to the introduced stochastic noise, it still provides the benefit of minimizing bias when run on samples with any quantity of DNA. In these situations, a universal pre-amplification step may be used to increase the overall sample quantity. Ideally, this pre-amplification step should not appreciably alter the allelic distributions.
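The statistical noise introduced by splitting a limited sample can be estimated with simple binomial arithmetic. This sketch assumes roughly 3.3 pg of DNA per haploid human genome (so 10 pg corresponds to roughly three haploid copies), that each haploid copy contributes one molecule at a given locus, and that after splitting each locus is amplified in only one of the sub-reactions. These are simplifying assumptions for illustration, and the function names are hypothetical.

```python
import math

# Approximate mass of one haploid human genome in picograms (assumed value).
PG_PER_HAPLOID_GENOME = 3.3


def haploid_copies(mass_pg: float) -> float:
    """Approximate number of haploid genome copies in a DNA mass."""
    return mass_pg / PG_PER_HAPLOID_GENOME


def allele_fraction_sd(copies: float, true_fraction: float = 0.5) -> float:
    """Binomial standard deviation of an observed allele fraction when the
    fraction is estimated from `copies` randomly sampled molecules."""
    return math.sqrt(true_fraction * (1.0 - true_fraction) / copies)


def split_penalty(mass_pg: float, n_subreactions: int, true_fraction: float = 0.5):
    """Compare sampling noise when a locus sees the whole sample versus only
    1/n_subreactions of it (each locus amplified in just one sub-reaction)."""
    copies = haploid_copies(mass_pg)
    return (allele_fraction_sd(copies, true_fraction),
            allele_fraction_sd(copies / n_subreactions, true_fraction))
```

Under these assumptions, splitting into `n` sub-reactions inflates the standard deviation of a measured allele fraction by a factor of about `sqrt(n)`: 330 pg (about 100 copies) split four ways moves a heterozygous locus from a standard deviation of 0.05 to 0.10.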

In one embodiment, the method of the invention can generate PCR products that are specific to a large number of targeted loci, specifically 1,000 to 5,000 loci, 5,000 to 10,000 loci, or more than 10,000 loci, from a limited sample such as a single cell or DNA from a body fluid, for genotyping by sequencing or some other genotyping method. Currently, performing multiplex PCR reactions of more than 5 to 10 targets presents a major challenge and is often hindered by primer side products, such as primer dimers, and other artifacts. When target sequences are detected with hybridization probes on microarrays, primer dimers and other artifacts can be ignored because they are not detected. However, when sequencing is used as the detection method, the vast majority of sequencing reads will sequence such artifacts rather than the desired target sequences in the sample. Methods described in the prior art for multiplexing more than 50 or 100 reactions in one reaction volume followed by sequencing typically result in more than 20%, often more than 50%, in many cases more than 80%, and in some cases more than 90% off-target sequence reads.

In general, to perform targeted sequencing of multiple targets of a sample (more than 50, more than 100, more than 500, or more than 1,000), one can split the sample into a number of parallel reactions that each amplify one individual target. This has been performed in PCR multiwell plates, or can be done on commercial platforms such as the FLUIDIGM ACCESS ARRAY (48 reactions per sample in microfluidic chips) or DROPLET PCR by RainDance Technologies (hundreds to a few thousand targets). Unfortunately, these split-and-pool methods are problematic for samples with a limited amount of DNA, because there are often not enough copies of the genome to ensure that one copy of each region of the genome is present in each well. This is an especially severe problem when polymorphic loci are targeted and the relative proportions of the alleles at those loci are needed, because the stochastic noise introduced by splitting and pooling makes the measured proportions of the alleles present in the original DNA sample very inaccurate. Described here is a method to effectively and efficiently amplify many PCR reactions that is applicable to cases where only a limited amount of DNA is available. In one embodiment, the method may be applied to the analysis of single cells, body fluids, mixtures of DNA (such as the free-floating DNA found in maternal plasma), biopsies, and environmental and/or forensic samples.
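The problem of wells that lack a copy of a region can be quantified with a Poisson model. This minimal sketch assumes that the molecules of a given region are distributed at random across wells, so the copy count per well is Poisson-distributed with mean λ = copies / wells and the chance that a well receives none is e^(-λ). For instance, 48 copies spread over 48 wells give λ = 1, so about 37% of wells would lack that region. The function names are hypothetical.

```python
import math


def prob_region_absent(total_copies: float, n_wells: int) -> float:
    """Poisson probability that a given genomic region has zero copies in one
    well when `total_copies` molecules of it are distributed at random
    across `n_wells` wells."""
    if n_wells <= 0:
        raise ValueError("n_wells must be positive")
    lam = total_copies / n_wells  # expected copies per well
    return math.exp(-lam)


def expected_empty_wells(total_copies: float, n_wells: int) -> float:
    """Expected number of wells that receive no copy of the region."""
    return n_wells * prob_region_absent(total_copies, n_wells)
```

Because every empty well yields no measurement for that region, and the wells that do receive copies carry only a few molecules each, allele proportions estimated after split-and-pool inherit large sampling error when copies are scarce.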

In one embodiment, targeted sequencing may involve one, a plurality, or all of the following steps. a) Generate and amplify a library with adapter sequences on both ends of the DNA fragments. b) Divide the library into multiple reactions after library amplification. c) Generate and optionally amplify a library with adapter sequences on both ends of the DNA fragments. d) Perform 1,000- to 10,000-plex amplification of selected targets using one target-specific "forward" primer per target and one tag-specific primer. e) Perform a second amplification from this product using "reverse" target-specific primers and one (or more) primers specific to a universal tag that was introduced as part of the target-specific forward primers in the first round. f) Perform a 1,000-plex pre-amplification of selected targets for a limited number of cycles. g) Divide the product into multiple aliquots and amplify subpools of targets in individual reactions (for example, 50- to 500-plex, though this can be used all the way down to singleplex). h) Pool the products of the parallel subpool reactions. i) During these amplifications, the primers may carry sequencing-compatible tags (partial or full length) so that the products can be sequenced.

Highly Multiplexed PCR

Disclosed herein are methods that permit the targeted amplification of over a hundred to tens of thousands of target sequences (e.g., SNP loci) from a nucleic acid sample, such as genomic DNA obtained from plasma. The amplified sample may be relatively free of primer-dimer products and have low allelic bias at the target loci. If the products are appended with sequencing-compatible adapters during or after amplification, they can be analyzed by sequencing.

Performing highly multiplexed PCR amplification using methods known in the art generates primer-dimer products that outnumber the desired amplification products and are not suitable for sequencing. These can be reduced empirically by eliminating the primers that form them, or by performing in silico selection of primers. However, the larger the number of assays, the more difficult this problem becomes.

One solution is to split a 5,000-plex reaction into several lower-plexed amplifications, such as one hundred 50-plex or fifty 100-plex reactions, to use microfluidics, or even to split the sample into individual PCR reactions. However, if the sample DNA is limited, as in non-invasive prenatal diagnosis from pregnancy plasma, dividing the sample among multiple reactions should be avoided because it creates a bottleneck.
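Splitting a 5,000-assay design into lower-plexed sub-pools, as described above, is at its simplest a partition of the assay list. This minimal sketch uses hypothetical assay IDs and assigns assays to pools in consecutive chunks; a real design might instead assign assays to pools using an interaction model. Note that if the sample itself is divided among the pools, each pool sees only a fraction of the template molecules, which is the bottleneck the text warns against.

```python
def split_into_subpools(assays, plex):
    """Partition an assay list into consecutive sub-pools of at most `plex` assays."""
    if plex <= 0:
        raise ValueError("plex must be positive")
    return [assays[i:i + plex] for i in range(0, len(assays), plex)]


# Hypothetical assay IDs for a 5,000-plex design.
assays = [f"assay_{i:04d}" for i in range(5000)]
pools_50 = split_into_subpools(assays, 50)    # one hundred 50-plex pools
pools_100 = split_into_subpools(assays, 100)  # fifty 100-plex pools
```

Each assay lands in exactly one pool, and the last pool is simply smaller when the assay count is not a multiple of `plex`.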

Described herein is a method to first bulk-amplify the plasma DNA of a sample and then divide the sample into multiple multiplexed target enrichment reactions, each with a more moderate number of target sequences. In one embodiment, the method of the invention may be used for preferentially enriching a DNA mixture at a plurality of loci, the method comprising one or more of the following steps: generating and amplifying a library from the DNA mixture, where the molecules in the library have adapter sequences ligated on both ends of the DNA fragments; dividing the amplified library into multiple reactions; and performing a first round of multiplex amplification of selected targets using one target-specific "forward" primer per target and one or a plurality of adapter-specific universal "reverse" primers. In one embodiment, the method further includes performing a second amplification using "reverse" target-specific primers and one or a plurality of primers specific to a universal tag that was introduced as part of the target-specific forward primers in the first round. In one embodiment, the method may involve a fully nested, hemi-nested, semi-nested, one-sided fully nested, one-sided hemi-nested, or one-sided semi-nested PCR approach. In one embodiment, the method is used for preferentially enriching a DNA mixture at a plurality of loci by performing a multiplex pre-amplification of selected targets for a limited number of cycles, dividing the product into multiple aliquots, amplifying subpools of targets in individual reactions, and pooling the products of the parallel subpool reactions. Note that this approach can be used to perform targeted amplification with low levels of allelic bias for 50 to 500 loci, 500 to 5,000 loci, 5,000 to 50,000 loci, or even 50,000 to 500,000 loci. In one embodiment, the primers carry partial or full-length sequencing-compatible tags.

The workflow may entail (1) extracting DNA, such as plasma DNA; (2) preparing a fragment library with universal adapters on both ends of the fragments; (3) amplifying the library using universal primers specific to the adapters; (4) dividing the amplified sample "library" into multiple aliquots; (5) performing multiplex amplification on the aliquots (e.g., about 100-plex, 1,000-plex, or 10,000-plex, with one target-specific primer per target and a tag-specific primer); (6) pooling the aliquots of one sample; (7) barcoding the sample; (8) mixing the samples and adjusting the concentration; and (9) sequencing the samples. The workflow may comprise multiple sub-steps that make up one of the listed steps (e.g., the library preparation of step (2) may entail three enzymatic steps (blunt ending, dA tailing, and adapter ligation) and three purification steps). Steps of the workflow may be combined, divided, or performed in a different order (e.g., barcoding and pooling of samples).

It is important to note that library amplification may be performed in a manner that is biased toward amplifying short fragments more efficiently. In this manner it is possible to preferentially amplify shorter sequences, e.g., mononucleosomal DNA fragments such as the cell-free fetal DNA (of placental origin) found in the circulation of pregnant women. Note that the PCR assays can have tags, for example sequencing tags (usually a truncated form of 15-25 bases). After multiplexing, the PCR multiplex products of a sample are pooled. The tags are then completed, including barcoding, by tag-specific PCR (this could also be done by ligation). Alternatively, the full sequencing tags can be added in the same reaction as the multiplexing. In the first cycles, targets may be amplified with the target-specific primers, after which the tag-specific primers take over to complete the SQ-adapter sequence. The PCR primers may also carry no tags, in which case the sequencing tags may be appended to the amplification products by ligation.

In one embodiment, for various applications such as the detection of fetal aneuploidy, highly multiplexed PCR may be followed by evaluation of the amplified material by clonal sequencing. Conventional multiplex PCR evaluates up to fifty loci simultaneously. By contrast, the methods described herein can be used to evaluate simultaneously more than 50, more than 100, more than 500, more than 1,000, more than 5,000, more than 10,000, more than 50,000, or more than 100,000 loci. Experiments have shown that up to, including, and more than 10,000 distinct loci can be evaluated simultaneously, in a single reaction. This can be done with sufficiently good efficiency and specificity to make non-invasive prenatal aneuploidy calls and/or copy number calls with high accuracy. Assays may be combined in a single reaction with the entirety of a sample, such as a cfDNA sample isolated from maternal plasma, a fraction thereof, or a further processed derivative of the cfDNA sample. The sample (e.g., cfDNA or a derivative) may also be split into multiple parallel multiplex reactions. The optimal sample splitting and multiplexing is determined by trading off various performance specifications. Because the amount of material is limited, splitting the sample into multiple fractions can introduce sampling noise, add handling time, and increase the possibility of error. Conversely, higher multiplexing can result in larger amounts of spurious amplification and greater inequalities in amplification, both of which can reduce test performance.

Two crucial related considerations in applying the methods described herein are:

- the limited amount of original sample (e.g., plasma); and
- the number of original molecules in the material from which allele frequencies or other measurements are obtained.

If the number of original molecules falls below a certain level, random sampling noise becomes significant and can affect test accuracy. Typically, data of sufficient quality for non-invasive prenatal aneuploidy calls can be obtained if measurements are made on a sample containing the equivalent of 500-1,000 original molecules per target locus. There are a number of ways of increasing the number of distinct measurements, for example increasing the sample volume. Each manipulation applied to the sample also potentially results in losses of material. It is essential to characterize the losses incurred by the various manipulations and avoid them. Where appropriate, the yield of certain manipulations should be improved to avoid losses that could degrade test performance.
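Why roughly 500-1,000 original molecules per locus matter, and why splitting a sample adds sampling noise, can be illustrated with a simple binomial model. The sketch below is an idealization: it treats each original template molecule at a heterozygous locus as an independent draw with allele probability `p` and ignores amplification and sequencing noise. It also assumes that, after splitting, each locus is assayed in only one aliquot. The function names are hypothetical, not part of the described method.

```python
import math


def allele_fraction_se(n_molecules: int, p: float = 0.5) -> float:
    """Binomial standard error of an observed allele fraction.

    Assumes each of n_molecules original template molecules at a locus
    independently carries allele A with probability p, and ignores
    amplification and sequencing noise (an idealisation).
    """
    if n_molecules <= 0:
        raise ValueError("n_molecules must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return math.sqrt(p * (1.0 - p) / n_molecules)


def se_after_split(n_molecules: int, n_aliquots: int, p: float = 0.5) -> float:
    """Standard error when the sample is split into n_aliquots and each locus is
    assayed in only one aliquot, which then holds ~n_molecules // n_aliquots copies."""
    if n_aliquots <= 0:
        raise ValueError("n_aliquots must be positive")
    return allele_fraction_se(n_molecules // n_aliquots, p)


# Illustrative comparison at a heterozygous locus (p = 0.5).
for n in (100, 500, 1000):
    print(f"{n:5d} molecules: SE = {allele_fraction_se(n):.4f}")
print(f"1000 molecules split 4 ways: SE = {se_after_split(1000, 4):.4f}")
```

Under this model the standard error scales as 1/sqrt(N), so quartering the molecules available to a locus roughly doubles the noise. This is one way to see the splitting trade-off described above.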

In one embodiment, potential losses in subsequent steps can be mitigated by amplifying all or a fraction of the original sample (e.g., a cfDNA sample). Various methods are available to amplify all of the genetic material in a sample, increasing the amount available for downstream procedures. In one embodiment, ligation-mediated PCR (LM-PCR) is used: DNA fragments are amplified by PCR after ligation of one distinct adapter, two distinct adapters, or many distinct adapters. In one embodiment, multiple displacement amplification (MDA) using phi-29 polymerase is used to amplify all DNA isothermally. In DOP-PCR and its variations, random priming is used to amplify the original DNA. Each method has certain characteristics, such as the uniformity of amplification across all represented regions of the genome, the efficiency of capture and amplification of the original DNA, and amplification performance as a function of fragment length.

In one embodiment, LM-PCR may be used with a single heteroduplexed adapter having a 3' thymidine overhang. The heteroduplexed adapter enables the use of a single adapter molecule that may be converted, during the first round of PCR, to two distinct sequences on the 5' and 3' ends of the original DNA fragment. In one embodiment, the amplified library may be fractionated by size separation, by products such as AMPURE or TASS, or by other similar methods. Prior to ligation, the sample DNA may be end-blunted, after which a single adenosine base is added to the 3' end. Prior to ligation, the DNA may be cleaved using a restriction enzyme or some other cleavage method. During ligation, the 3' adenosine of the sample fragments and the complementary 3' thymidine overhang of the adapter can enhance ligation efficiency. The extension step of the PCR amplification may be limited in time to reduce amplification from fragments longer than about 200 bp, about 300 bp, about 400 bp, about 500 bp, or about 1,000 bp. Longer DNA found in maternal plasma is nearly exclusively maternal. Limiting extension in this way may therefore enrich fetal DNA by 10-50% and improve test performance. A number of reactions were run using conditions as specified by commercially available kits; these resulted in successful ligation of fewer than 10% of the sample DNA molecules. A series of optimizations of the reaction conditions for this step improved ligation to approximately 70%.

Mini-PCR

The following mini-PCR method is desirable for samples containing short nucleic acids, digested nucleic acids, or fragmented nucleic acids such as cfDNA. Traditional PCR assay design results in significant losses of distinct fetal molecules. These losses can be greatly reduced by designing very short PCR assays, termed mini-PCR assays. Fetal cfDNA in maternal serum is highly fragmented. Its fragment sizes are distributed in an approximately Gaussian fashion, with a mean of 160 bp, a standard deviation of 15 bp, a minimum size of about 100 bp, and a maximum size of about 220 bp. The start and end positions of fragments relative to the targeted polymorphisms are not necessarily random, but they vary widely within individual targets and across all targets collectively. The polymorphic site of one particular target locus may occupy any position, from the start to the end, among the various fragments originating from that locus. Note that the term mini-PCR may equally well refer to normal PCR with no additional restrictions or limitations.

During PCR, amplification will only occur from template DNA fragments that contain both the forward and reverse primer sites. Fetal cfDNA fragments are short, so not every fragment covering a target contains both primer sites. For a fetal fragment of length L and an amplicon of length a, the likelihood that both sites are present is approximately (L − a)/L. Under ideal conditions, with L = 160 bp, assays with amplicons of 45, 50, 55, 60, 65, or 70 bp will successfully amplify from 72%, 69%, 66%, 63%, 59%, or 56% of the available template fragment molecules, respectively. The amplicon length is the distance between the 5' ends of the forward and reverse priming sites. Amplicons shorter than those typically used by those skilled in the art require only short sequence reads, and may therefore yield more efficient measurements of the desired polymorphic loci. In one embodiment, a substantial fraction of the amplicons should be less than 100 bp, less than 90 bp, less than 80 bp, less than 70 bp, less than 65 bp, less than 60 bp, less than 55 bp, less than 50 bp, or less than 45 bp.
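The percentages above follow from the (L − a)/L relationship at the mean fragment length of 160 bp. A minimal sketch reproduces them. It assumes a single fixed fragment length, not the full Gaussian size distribution, and treats the fragment start position as uniformly distributed relative to the target. The function name `capture_fraction` is hypothetical.

```python
def capture_fraction(amplicon_len: int, fragment_len: float = 160.0) -> float:
    """Approximate fraction of fragments of length fragment_len that contain both
    primer sites of an amplicon of length amplicon_len.

    Uses the continuous approximation (L - a) / L, assuming the fragment's start
    position is uniformly distributed relative to the target.
    """
    if amplicon_len <= 0 or fragment_len <= 0:
        raise ValueError("lengths must be positive")
    if amplicon_len >= fragment_len:
        return 0.0  # the amplicon cannot fit inside the fragment
    return (fragment_len - amplicon_len) / fragment_len


for a in (45, 50, 55, 60, 65, 70):
    print(f"{a} bp amplicon: {capture_fraction(a):.1%} of 160 bp fragments")
```

Each additional 5 bp of amplicon length costs about 3% of the usable template at this fragment size. This is why the assays are kept as short as primer design allows.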

Note that prior-art methods usually avoid short assays such as those described herein, for two reasons. They are not required for longer templates. They also impose considerable constraints on primer design, limiting primer length, annealing characteristics, and the distance between the forward and reverse primers.

Note also that there is potential for biased amplification if the 3' end of either primer is within roughly 1-6 bases of the polymorphic site. A single-base difference at the site of initial polymerase binding can cause preferential amplification of one allele. This can alter the observed allele frequencies and degrade performance. All of these constraints make it very challenging to identify primers that will successfully amplify a particular locus, and even more so to design large sets of primers that are compatible in the same multiplex reaction. In one embodiment, the 3' ends of the inner forward and reverse primers are designed to hybridize to a region of DNA upstream of the polymorphic site. They are separated from the polymorphic site by a small number of bases. Ideally, this number may be between 6 and 10 bases. It may equally well be between 4 and 15, between 3 and 20, between 2 and 30, or between 1 and 60 bases, and achieve substantially the same end.
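The 3'-end spacing rule can be expressed as a simple design-time check. The sketch below assumes 0-based, inclusive genomic coordinates. It assumes forward primers extend toward higher coordinates (3' end at their rightmost base) and reverse primers toward lower coordinates (3' end at their leftmost base). It uses the 6-10 base window named above as the default. `Primer`, `gap_to_snp`, and `placement_ok` are illustrative names, and the primer coordinates in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primer:
    start: int   # 0-based leftmost genomic coordinate of the binding site
    end: int     # inclusive rightmost coordinate of the binding site
    strand: str  # "+" = forward primer (extends rightward), "-" = reverse primer

    def __post_init__(self):
        if self.start > self.end or self.strand not in ("+", "-"):
            raise ValueError("invalid primer coordinates or strand")


def gap_to_snp(primer: Primer, snp_pos: int) -> int:
    """Number of bases strictly between the primer's 3' end and the SNP,
    counted in the primer's direction of extension."""
    if primer.strand == "+":
        gap = snp_pos - primer.end - 1      # 3' end is the rightmost base
    else:
        gap = primer.start - snp_pos - 1    # 3' end is the leftmost base
    if gap < 0:
        raise ValueError("SNP lies under the primer or behind its 3' end")
    return gap


def placement_ok(primer: Primer, snp_pos: int, lo: int = 6, hi: int = 10) -> bool:
    """True if the primer's 3' end is lo..hi bases from the SNP (6-10 by default)."""
    return lo <= gap_to_snp(primer, snp_pos) <= hi


fwd = Primer(100, 119, "+")   # hypothetical 20-nt forward primer
rev = Primer(135, 154, "-")   # hypothetical 20-nt reverse primer
print(gap_to_snp(fwd, 127), placement_ok(fwd, 127))
print(gap_to_snp(rev, 127), placement_ok(rev, 127))
print(gap_to_snp(fwd, 122), placement_ok(fwd, 122))
```

Passing different `lo`/`hi` values covers the wider windows listed above (4-15, 3-20, and so on).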

Multiplex PCR may involve a single round of PCR in which all targets are amplified. Alternatively, it may involve one round of PCR followed by one or more rounds of nested PCR or some variant of nested PCR. Nested PCR consists of one or more subsequent rounds of PCR amplification using one or more new primers that bind at least one base pair internal to the primers used in a previous round. Nested PCR reduces the number of spurious amplification targets, because subsequent reactions amplify only those products of the previous reaction that have the correct internal sequence. Reducing spurious amplification targets increases the number of useful measurements that can be obtained, especially in sequencing. Nested PCR typically requires designing primers completely internal to the previous primer binding sites, which necessarily increases the minimum DNA segment size required for amplification. In samples such as maternal plasma cfDNA, the DNA is highly fragmented. There, larger assay sizes reduce the number of distinct cfDNA molecules from which a measurement can be obtained. In one embodiment, this effect can be offset with a partially nested approach. In this approach, one or both of the second-round primers overlap the first binding sites and extend internally by some number of bases. This achieves additional specificity while only minimally increasing the total assay size.
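The cost of full nesting versus partial nesting can be estimated by comparing the first-round (outer) amplicon, which sets the minimum fragment size a template must have. The sketch below makes several assumptions. A fully nested inner primer sits just inside a 20-nt outer primer, so its 5' end is shifted inward by exactly the outer primer length, the minimum for non-overlapping primers. A one-sided partial nest shifts only the forward primer by 6 bases. The inner amplicon is 60 bp. It reuses the (L − a)/L approximation at L = 160 bp from the mini-PCR discussion. All names and numbers are illustrative.

```python
def outer_amplicon_length(inner_amplicon: int, fwd_shift: int, rev_shift: int) -> int:
    """First-round (outer) amplicon length: the inner amplicon plus the number of
    bases each inner primer's 5' end sits inward of the matching outer primer's
    5' end (0 means that side is not nested)."""
    if min(inner_amplicon, fwd_shift, rev_shift) < 0:
        raise ValueError("lengths and shifts must be non-negative")
    return inner_amplicon + fwd_shift + rev_shift


def capture_fraction(amplicon_len: int, fragment_len: int = 160) -> float:
    """Approximate fraction of fragments of length fragment_len containing the
    whole amplicon, using (L - a) / L; 0 if the amplicon does not fit."""
    if amplicon_len >= fragment_len:
        return 0.0
    return (fragment_len - amplicon_len) / fragment_len


PRIMER_LEN = 20  # assumed outer-primer length (illustrative)
INNER = 60       # assumed inner amplicon length (illustrative)

designs = {
    "single round": outer_amplicon_length(INNER, 0, 0),
    "fully nested": outer_amplicon_length(INNER, PRIMER_LEN, PRIMER_LEN),
    "one-sided 6-base partial nest": outer_amplicon_length(INNER, 6, 0),
}
for name, footprint in designs.items():
    print(f"{name}: footprint {footprint} bp, capture {capture_fraction(footprint):.1%}")
```

Under these assumptions, a one-sided 6-base partial nest enlarges the required footprint by only 6 bp. A fully nested design enlarges it by 40 bp, which noticeably reduces the usable fraction of 160 bp fragments.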

In one embodiment, a multiplex pool of PCR assays is designed to amplify potentially heterozygous SNPs or other polymorphic or non-polymorphic loci on one or more chromosomes. These assays are used in a single reaction to amplify DNA. The number of PCR assays may be between 50 and 200, between 200 and 1,000, between 1,000 and 5,000, between 5,000 and 20,000, or more than 20,000 (50- to 200-plex, 200- to 1,000-plex, 1,000- to 5,000-plex, 5,000- to 20,000-plex, or more than 20,000-plex, respectively). In one embodiment, a multiplex pool of about 10,000 PCR assays (10,000-plex) is designed to amplify potentially heterozygous SNP loci on chromosomes X, Y, 13, 18, and 21, and on chromosome 1 or 2. These assays are used in a single reaction to amplify cfDNA obtained from a plasma sample, a chorionic villus sample, an amniocentesis sample, a single cell or a small number of cells, other bodily fluids or tissues, cancers, or other genetic material. The SNP frequencies at each locus may be determined by clonal sequencing or some other method of sequencing the amplicons. Statistical analysis of the allele frequency distributions or ratios across all assays may be used to determine whether the sample contains a trisomy of one or more of the chromosomes included in the test. In another embodiment, the original cfDNA sample is split into two samples and parallel 5,000-plex assays are performed. In another embodiment, the original cfDNA sample is split into n samples and parallel (~10,000/n)-plex assays are performed, where n is between 2 and 12, between 12 and 24, between 24 and 48, or between 48 and 96. Data are collected and analyzed in a manner similar to that already described. Note that this method is equally applicable to detecting translocations, deletions, duplications, and other chromosomal abnormalities.
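Splitting a ~10,000-plex design into n parallel (~10,000/n)-plex reactions requires assigning every locus to exactly one subpool. The simplest balanced assignment is round-robin, sketched below under that assumption. A real design would more likely group primers by compatibility (e.g., to avoid primer dimers within a pool), which this sketch does not attempt. The locus identifiers and the function name are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_subpools(loci: Sequence[T], n: int) -> List[List[T]]:
    """Assign each locus to exactly one of n subpools, round-robin, so pool
    sizes differ by at most one (about len(loci) / n each)."""
    if not 1 <= n <= len(loci):
        raise ValueError("n must be between 1 and the number of loci")
    pools: List[List[T]] = [[] for _ in range(n)]
    for i, locus in enumerate(loci):
        pools[i % n].append(locus)
    return pools


loci = [f"locus_{i}" for i in range(10_000)]   # hypothetical locus IDs
for n in (2, 12, 96):
    sizes = [len(p) for p in split_into_subpools(loci, n)]
    print(n, min(sizes), max(sizes))
```

With n = 2 this reproduces the two parallel 5,000-plex reactions mentioned above. Larger n gives subpools of roughly 10,000/n loci each.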

In one embodiment, tails with no homology to the target genome may also be added to the 3' or 5' end of any of the primers. These tails facilitate subsequent manipulations, procedures, or measurements. In one embodiment, the tail sequence may be the same for the forward and reverse target-specific primers. In one embodiment, different tails may be used for the forward and reverse target-specific primers. In one embodiment, a plurality of different tails may be used for different loci or sets of loci. Certain tails may be shared among all loci or among subsets of loci. For example, forward and reverse tails that correspond to the forward and reverse sequences required by any of the current sequencing platforms can enable direct sequencing following amplification. In one embodiment, the tails can serve as common priming sites among all amplified targets, which can be used to add other useful sequences. In some embodiments, the inner primers may contain a region designed to hybridize either upstream or downstream of the targeted locus (e.g., a polymorphic locus). In some embodiments, the primers may contain a molecular barcode. In some embodiments, the primers may contain a universal priming sequence designed to allow PCR amplification.

In one embodiment, a 10,000-plex pool of PCR assays is created such that the forward and reverse primers have tails corresponding to the forward and reverse sequences required by a high-throughput sequencing instrument, such as the HISEQ, GAIIX, or MISEQ available from ILLUMINA. In addition, each sequencing tail carries, on its 5' side, an additional sequence that can serve as a priming site in a subsequent PCR. That PCR adds nucleotide barcode sequences to the amplicons, enabling multiplexed sequencing of multiple samples in a single lane of a high-throughput sequencing instrument.
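The tail layout described above can be sketched as string assembly: 5'-[barcode priming site][sequencing tail][target-specific sequence]-3'. All sequences in the usage lines are placeholders invented for illustration. They are not real ILLUMINA adapter or index sequences, and the function name is hypothetical.

```python
import re

_DNA = re.compile(r"^[ACGT]*$")


def tailed_primer(target_specific: str, sequencing_tail: str, barcode_site: str = "") -> str:
    """Assemble a primer 5'-[barcode priming site][sequencing tail][target-specific]-3'.

    The barcode priming site is optional; when present it lies 5' of the
    sequencing tail, where a later PCR can prime to add a sample barcode.
    """
    for part in (target_specific, sequencing_tail, barcode_site):
        if not _DNA.match(part):
            raise ValueError(f"not an uppercase A/C/G/T sequence: {part!r}")
    if not target_specific:
        raise ValueError("target-specific portion must be non-empty")
    return barcode_site + sequencing_tail + target_specific


# Placeholder sequences, for illustration only.
PRIMING_SITE = "AAAACCCC"
SEQ_TAIL = "TTTTCCCCGGGG"
TARGET = "CTGACCTAGTCAGGTACA"
print(tailed_primer(TARGET, SEQ_TAIL, PRIMING_SITE))
```

Keeping the target-specific portion at the 3' end matters because the polymerase extends from the 3' end. The non-homologous tails only need to be copied into the amplicon, not to anneal to the genome.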

In one embodiment, a 10,000-plex pool of PCR assays is created such that the reverse primers have tails corresponding to the reverse sequences required by a high-throughput sequencing instrument. After amplification with the first 10,000-plex assay, a subsequent PCR amplification may be performed using another 10,000-plex pool. This second pool has partially nested forward primers (e.g., nested by 6 bases) for all targets and a reverse primer corresponding to the reverse sequencing tail included in the first round. This subsequent round of partially nested amplification uses just one target-specific primer and a universal primer. It therefore limits the required assay size, which reduces sampling noise, and it also greatly reduces the number of spurious amplicons. Sequencing tags may be added to appended ligation adapters and/or as part of the PCR probes, such that the tags are part of the final amplicon.

Fetal fraction affects test performance. There are a number of ways to enrich the fetal fraction of the DNA found in maternal plasma. Fetal fraction can be increased by the LM-PCR method already discussed, as well as by targeted removal of long maternal fragments. In one embodiment, an additional multiplex PCR reaction may be carried out before the multiplex PCR amplification of the target loci. Its purpose is to selectively remove long, largely maternal fragments corresponding to the loci targeted in the subsequent multiplex PCR. Additional primers are designed to anneal at sites farther from the polymorphism than would be expected to be present on cell-free fetal DNA fragments. These primers may be used in a one-cycle multiplex PCR reaction before the multiplex PCR of the target polymorphic loci. These distal primers are tagged with a molecule or moiety that allows selective recognition of the tagged DNA fragments. In one embodiment, these DNA molecules may be covalently modified with biotin. The biotin allows newly formed double-stranded DNA comprising these primers to be removed after one cycle of PCR. Double-stranded DNA formed during that first round is likely of maternal origin. The hybrid material may be removed using streptavidin-coated magnetic beads. Other tagging methods may work equally well. In one embodiment, size selection methods may be used to enrich the sample for shorter DNA strands, for example those less than about 800 bp, less than about 500 bp, or less than about 300 bp. Amplification of the short fragments can then proceed as usual.
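The effect of size selection on fetal fraction can be shown by counting the fetal and maternal fragments that survive a length cut-off. The fragment lengths in the sketch are toy values invented for illustration, not measured data. The sketch assumes that each fragment's origin is known, which is possible only in simulation. The function name is hypothetical.

```python
from typing import Iterable, Optional


def fetal_fraction(fetal_lengths: Iterable[int],
                   maternal_lengths: Iterable[int],
                   max_len: Optional[int] = None) -> float:
    """Fraction of retained fragments that are fetal, keeping only fragments of
    length <= max_len (or all fragments when max_len is None)."""
    def kept(length: int) -> bool:
        return max_len is None or length <= max_len

    n_fetal = sum(1 for length in fetal_lengths if kept(length))
    n_maternal = sum(1 for length in maternal_lengths if kept(length))
    if n_fetal + n_maternal == 0:
        raise ValueError("no fragments pass the size cut-off")
    return n_fetal / (n_fetal + n_maternal)


# Toy fragment lengths in bp, invented for illustration only.
fetal = [150, 160, 170]
maternal = [160, 180, 400, 900, 1200]
print(fetal_fraction(fetal, maternal))               # no size selection
print(fetal_fraction(fetal, maternal, max_len=300))  # keep fragments <= 300 bp
```

Because the long fragments removed by the cut-off are predominantly maternal, the retained fetal fraction rises even though no fetal molecules are added.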

The mini-PCR method described in this disclosure enables highly multiplexed amplification and analysis of hundreds to thousands, or even millions, of loci in a single reaction from a single sample. Likewise, the detection of the amplified DNA can be multiplexed: tens to hundreds of samples can be multiplexed in one sequencing lane by using barcoding PCR. This multiplexing has been successfully tested up to 49-plex, and much higher degrees of multiplexing are possible. In effect, this allows hundreds of samples to be genotyped at thousands of SNPs in a single sequencing run. For these samples, the method allows determination of genotype and heterozygosity rate together with simultaneous determination of copy number. Both can be used for aneuploidy detection. This method is particularly useful for detecting aneuploidy of a gestating fetus from the free-floating DNA found in maternal plasma. It may be used as part of a method for determining the sex of a fetus and/or predicting the paternity of a fetus. It may also be used as part of a method for mutation dosage. It may be used with any amount of DNA or RNA, and the targeted regions may be SNPs, other polymorphic regions, non-polymorphic regions, or combinations thereof.
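Multiplexing samples in one lane relies on assigning each read back to its sample by barcode. The sketch below assumes the barcode occupies the first `barcode_len` bases of each read and requires an exact match. Real instruments often read indices separately and may tolerate mismatches, which this sketch does not model. Barcodes, reads, and sample names are hypothetical.

```python
from typing import Dict, List, Tuple


def demultiplex(reads: List[str],
                barcode_to_sample: Dict[str, str],
                barcode_len: int) -> Tuple[Dict[str, List[str]], List[str]]:
    """Bin reads by an exact-match barcode at the start of each read.

    Returns ({sample: [reads with barcode trimmed]}, [unassigned reads]).
    """
    if any(len(bc) != barcode_len for bc in barcode_to_sample):
        raise ValueError("all barcodes must have length barcode_len")
    bins: Dict[str, List[str]] = {s: [] for s in barcode_to_sample.values()}
    unassigned: List[str] = []
    for read in reads:
        sample = barcode_to_sample.get(read[:barcode_len])
        if sample is None:
            unassigned.append(read)       # unknown barcode: set aside
        else:
            bins[sample].append(read[barcode_len:])
    return bins, unassigned


barcodes = {"AAAA": "sample_1", "CCCC": "sample_2"}   # hypothetical 4-nt barcodes
reads = ["AAAAGTTT", "CCCCGA", "GGGGTT", "AAAAC"]
bins, unassigned = demultiplex(reads, barcodes, barcode_len=4)
print(bins, unassigned)
```

Once binned per sample, reads can be aligned to their target loci to count alleles for genotyping and copy number, as described above.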

In some embodiments, ligation-mediated universal PCR amplification of fragmented DNA may be used. It can be used to amplify plasma DNA, which can then be divided into multiple parallel reactions. It may also be used to preferentially amplify short fragments, thereby enriching the fetal fraction. In some embodiments, adding tags to the fragments by ligation can enable assays of shorter fragments. Such tags also permit shorter target-sequence-specific portions of the primers and/or annealing at higher temperatures, which reduces non-specific reactions.

The methods described herein may be used for a number of purposes in which a population of target DNA is mixed with some amount of contaminating DNA. In some embodiments, the target DNA and the contaminating DNA may come from genetically related individuals. For example, fetal (target) genetic abnormalities may be detected from maternal plasma that contains fetal (target) DNA as well as maternal (contaminating) DNA. Such abnormalities include the following:

- whole-chromosome abnormalities (e.g., aneuploidy);
- partial chromosome abnormalities (e.g., deletions, duplications, inversions, translocations);
- polynucleotide polymorphisms (e.g., STRs);
- single nucleotide polymorphisms; and/or
- other genetic abnormalities or differences.

In some embodiments, the target and contaminating DNA may come from the same individual but differ by one or more mutations, as in the case of cancer (see, e.g., H. Mamon et al., "Preferential Amplification of Apoptotic DNA from Plasma: Potential for Enhancing Detection of Minor DNA Alterations in Circulating DNA," Clinical Chemistry 54:9 (2008)). In some embodiments, the DNA may be found in cell culture (apoptotic) supernatant. In some embodiments, apoptosis may be induced in a biological sample (e.g., blood) for subsequent library preparation, amplification, and/or sequencing. A number of workflows and protocols for this purpose are presented elsewhere in this disclosure.

In some embodiments, the target DNA may originate from any of the following sources, or combinations thereof:

- single cells;
- DNA samples consisting of less than one copy of the target genome;
- low amounts of DNA;
- DNA of mixed origin (e.g., pregnancy plasma: placental and maternal DNA; cancer patient plasma and tumors: a mix of healthy and cancer DNA; transplantation; etc.);
- other bodily fluids;
- cell cultures;
- culture supernatants;
- forensic DNA samples;
- ancient DNA samples (e.g., insects trapped in amber); and
- other DNA samples.

In some embodiments, short amplicon sizes may be used. Short amplicon sizes are especially suited for fragmented DNA (see, e.g., A. Sikora et al., "Detection of increased amounts of cell-free fetal DNA with short PCR amplicons," Clin Chem. 2010 Jan;56(1):136-8).

The use of short amplicon sizes may yield several significant benefits:

- Short amplicons may result in optimized amplification efficiency.
- Short amplicons produce shorter products, so there is less chance of non-specific priming.
- Shorter products can be clustered more densely on the sequencing flow cell, because the clusters will be smaller.

Note that the methods described herein may work equally well for longer PCR amplicons. Amplicon length may be increased if necessary, for example when sequencing larger sequence stretches. Experiments with 146-plex targeted amplification gave positive results on single cells and on genomic DNA. These experiments used assays of 100 bp to 200 bp in length as the first step of a nested PCR protocol.

In some embodiments, the methods described herein may be used to amplify and/or detect SNPs, copy number, nucleotide methylation, mRNA levels, other types of RNA expression levels, and other genetic and/or epigenetic features. The mini-PCR methods described herein may be used together with next-generation sequencing. They may also be used with other downstream methods such as microarrays, counting by digital PCR, real-time PCR, and mass spectrometry.

In some embodiments, the mini-PCR amplification methods described herein may be used as part of a method for accurate quantification of minority populations. Specific uses include:

- absolute quantification using spiked-in calibrators;
- mutation or minor-allele quantification by very deep sequencing, which can be run in a highly multiplexed fashion;
- standard paternity and identity testing of relatives or ancestors in humans, animals, plants, or other organisms;
- forensic testing;
- rapid genotyping and copy number (CN) analysis of any kind of material, such as amniotic fluid, CVS, sperm, and products of conception (POC);
- single-cell analysis, such as genotyping of biopsy samples from embryos; and
- rapid embryo analysis (within less than one, one, or two days of biopsy) by targeted sequencing using mini-PCR.
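Absolute quantification with a spiked-in calibrator can be sketched as a simple ratio. A known number of calibrator copies is added, and the target's copy number is inferred from the ratio of target reads to calibrator reads. This sketch assumes that target and calibrator amplify and sequence with equal efficiency. In practice that assumption would need to be verified or corrected for. The function name and the numbers in the usage line are hypothetical.

```python
def absolute_copies(target_reads: int, spike_reads: int, spike_copies: float) -> float:
    """Estimate input copies of a target from read counts and a spike-in calibrator.

    Assumes the target and the spike-in are amplified and sequenced with equal
    efficiency, so reads are proportional to input copies.
    """
    if spike_reads <= 0:
        raise ValueError("no spike-in reads; cannot calibrate")
    if target_reads < 0 or spike_copies < 0:
        raise ValueError("counts must be non-negative")
    return spike_copies * target_reads / spike_reads


# Hypothetical run: 1,000 calibrator copies spiked in, 2,000 calibrator reads,
# 500 target reads.
print(absolute_copies(target_reads=500, spike_reads=2000, spike_copies=1000))
```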

In some embodiments, the method may be used for tumor analysis. Tumor biopsies are often a mixture of healthy and tumor cells, and targeted PCR allows deep sequencing of SNPs and loci with close to no background sequence. The method may be used for copy number and loss-of-heterozygosity analysis of tumor DNA, which may be present in many different bodily fluids or tissues of tumor patients. It may be used to detect tumor recurrence and/or for tumor screening. It may also be used for quality control testing of seeds, and for breeding or fishery purposes. Note that any of these methods could equally well be used to target non-polymorphic loci for the purpose of ploidy calling.

Some of the literature that describes basic methods underlying the methods disclosed herein includes:

(1) Wang HY, Luo M, Tereshchenko IV, Frikker DM, Cui X, Li JY, Hu G, Chu Y, Azaro MA, Lin Y, Shen L, Yang Q, Kambouris ME, Gao R, Shih W, Li H. Genome Res. 2005 Feb;15(2):276-83. Department of Molecular Genetics, Microbiology and Immunology/Cancer Institute of New Jersey, Robert Wood Johnson Medical School, New Brunswick, New Jersey 08903, USA.

(2) Li H, Wang HY, Cui X, Luo M, Hu G, Greenawalt DM, Tereshchenko IV, Li JY, Chu Y, Gao R. High-throughput genotyping of single nucleotide polymorphisms with high sensitivity. Methods Mol Biol. 2007;396-. PubMed PMID: 18025699.

(3) A method that multiplexes an average of 9 assays for sequencing is described in: Varley KE, Mitra RD. Nested Patch PCR enables highly multiplexed mutation discovery in candidate genes. Genome Res. 2008 Nov;18(11):1844-50. Epub 2008 Oct 10.

Note that the methods disclosed herein allow multiplexing that is orders of magnitude greater than in the above references.

Targeted PCR variants - nesting

Many workflows are possible when conducting PCR. Some workflows typical of the methods disclosed herein are described here. The steps outlined are not meant to exclude other possible steps, and they do not imply that any step described is required for the method to work properly. A large number of parameter variations and other modifications are known in the literature and may be made without affecting the essence of the invention.

One particular generalized workflow is given below, followed by a number of possible variants. The variants typically refer to possible secondary PCR reactions, for example the different types of nesting that may be done in step 3. Variants may be performed at different times, or in different orders, than explicitly described herein. Where appropriate, the illustrative examples that use polymorphic loci can be readily adapted to amplify non-polymorphic loci.

1. Ligation adapters may be appended to the DNA in the sample. These adapters are often referred to as library tags or ligation adapter tags (LT), and they contain a universal priming sequence that enables a subsequent universal amplification. In an embodiment, this may be done using a standard protocol designed to create sequencing libraries after fragmentation. In an embodiment, the DNA sample may be blunt-ended, and then an A may be added at the 3′ end. A Y-adapter with a T overhang may then be added and ligated. In some embodiments, sticky ends other than an A or T overhang may be used. In some embodiments, other adapters may be added, for example looped ligation adapters. In some embodiments, the adapters may have tags designed for PCR amplification.

2. Specific target amplification (STA): pre-amplification of hundreds, thousands, tens of thousands, and even hundreds of thousands of targets may be multiplexed in one reaction volume. STA is typically run for 10 to 30 cycles, though it may be run for 5 to 40 cycles, 2 to 50 cycles, and even 1 to 100 cycles. Primers may be tailed, for example to simplify the workflow or to avoid sequencing a large proportion of dimers. Note that dimers of two primers carrying the same tag will typically not be amplified or sequenced efficiently.

In some embodiments, the number of PCR cycles performed may be:
- between 1 and 10;
- between 10 and 20;
- between 20 and 30;
- between 30 and 40;
- more than 40.

The amplification may be a linear amplification. The number of PCR cycles may be optimized to give an optimal depth of read (DOR) profile, and different purposes may call for different DOR profiles. In some embodiments, a more even distribution of reads across all assays is desirable. If the DOR of some assays is too small, the stochastic noise may be too high for the data to be useful. If the DOR is too high, the marginal usefulness of each additional read is relatively small.
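The trade-off between cycle number and depth of read can be made concrete with two textbook approximations:
- ideal exponential amplification, N_n = N_0(1 + E)^n, where E is the per-cycle efficiency;
- Poisson counting noise, whose relative size at a depth of D reads is 1/sqrt(D).

The sketch below is illustrative only. The efficiency value and the depths are assumptions, and real multiplex reactions deviate from ideal exponential growth.

```python
import math


def copies_after_cycles(initial_copies: float, cycles: int, efficiency: float = 0.9) -> float:
    """Ideal exponential PCR yield: N_n = N_0 * (1 + E) ** n (efficiency E is assumed)."""
    return initial_copies * (1.0 + efficiency) ** cycles


def relative_noise(depth: int) -> float:
    """Relative Poisson counting noise (coefficient of variation) at a given read depth."""
    return 1.0 / math.sqrt(depth)


for depth in (10, 100, 1_000, 10_000):
    # Each 10x increase in depth shrinks the noise only by about 3.2x,
    # so the marginal usefulness of each additional read keeps falling.
    print(depth, round(relative_noise(depth), 4))

# With perfect doubling (E = 1.0), 10 and 20 cycles give 1,024x and 1,048,576x.
print(copies_after_cycles(1, 10, efficiency=1.0), copies_after_cycles(1, 20, efficiency=1.0))
```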

Primer tails may improve the detection of fragmented DNA from universally tagged libraries. If the library tag and the primer tails contain a homologous sequence, two benefits follow. Hybridization can be improved (for example, the melting temperature (T_M) is lowered), and primers can be extended even if only a portion of the primer target sequence is present in the sample DNA fragment. In various embodiments, the number of target-specific base pairs used may be 13 or more, 10 to 12, 8 to 9, or 6 to 7. In some embodiments, STA may be performed on pre-amplified DNA, for example DNA amplified by MDA, RCA, other whole-genome amplification, or adapter-mediated universal PCR. In some embodiments, STA may be performed on samples enriched or depleted for certain sequences and populations, for example by size selection, target capture, or directed degradation.

3. In some embodiments, a secondary multiplex PCR or primer extension reaction may be performed to increase specificity and reduce undesirable products. Techniques that may be used to increase specificity include full nesting, semi-nesting, hemi-nesting, and/or subdividing into parallel reactions of smaller assay pools. Experiments have shown that, with exactly the same primers, splitting a sample into three 400-plex reactions gave product DNA with greater specificity than one 1,200-plex reaction. Similarly, with exactly the same primers, splitting a sample into four 2,400-plex reactions gave product DNA with greater specificity than one 9,600-plex reaction. In an embodiment, target-specific and tag-specific primers of the same and of opposite directionality may be used.
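Counting the primer pairs that share a reaction is one way to see why subdividing a multiplex into smaller parallel pools can raise specificity. Any such pair could, in principle, form a primer dimer. In a pool of P primers, the number of distinct primer pairs, including self-pairs, is P(P + 1)/2, which grows roughly quadratically with pool size. The Python sketch below makes three assumptions: two primers per assay, assays dealt round-robin into pools, and every co-resident pair counted as a potential interaction. It is a counting illustration, not a dimer-prediction model.

```python
def candidate_primer_pairs(n_primers: int) -> int:
    """Distinct unordered primer pairs in one reaction, including self-pairs: P * (P + 1) / 2."""
    return n_primers * (n_primers + 1) // 2


def split_into_pools(assays: list[str], n_pools: int) -> list[list[str]]:
    """Deal assays round-robin into n_pools parallel reactions."""
    return [assays[i::n_pools] for i in range(n_pools)]


def pairs_per_reaction(n_assays: int, n_pools: int, primers_per_assay: int = 2) -> list[int]:
    """Candidate primer pairs in each reaction after splitting the assays into pools."""
    assays = [f"assay_{i}" for i in range(n_assays)]  # hypothetical assay names
    pools = split_into_pools(assays, n_pools)
    return [candidate_primer_pairs(len(pool) * primers_per_assay) for pool in pools]


single = pairs_per_reaction(1_200, 1)  # one 1,200-plex reaction
split = pairs_per_reaction(1_200, 3)   # three 400-plex reactions
print(single[0], split[0], sum(split))  # 2881200 320400 961200
```

Under this model, each 400-plex reaction holds about one ninth as many candidate pairs as the 1,200-plex reaction, and the three pools together hold about one third as many. The reduction comes from primers in different pools never meeting.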

4. In some embodiments, the DNA sample produced by an STA reaction (diluted, purified, or otherwise) may be amplified using tag-specific primers and "universal amplification", that is, amplification of many or all of the pre-amplified and tagged targets. Primers may contain additional functional sequences, such as barcodes, or the full adapter sequence needed for sequencing on a high-throughput sequencing platform.

These methods may be used to analyze any DNA sample. They are especially useful when the DNA sample is particularly small, or when the DNA in the sample originates from more than one individual, as in the case of maternal plasma. They may be used on DNA samples such as single cells or small numbers of cells, genomic DNA, plasma DNA, amplified plasma libraries, amplified apoptotic supernatant libraries, or other mixed DNA samples. In an embodiment, these methods may be used where cells of different genetic constitution may be present in a single individual, as with cancer or transplants.

Protocol variants (variants of and/or additions to the workflow above)

Direct multiplexed mini-PCR: Figure 1 shows specific target amplification (STA) of a plurality of target sequences with tagged primers. In Figure 1:
- 101 denotes double-stranded DNA with a polymorphic locus of interest at X.
- 102 denotes the double-stranded DNA with ligation adapters added for universal amplification.
- 103 denotes the single-stranded DNA that has been universally amplified, with PCR primers hybridized.
- 104 denotes the final PCR product.

In some embodiments, STA may be performed on more than 100, 200, 500, 1,000, 2,000, 5,000, 10,000, 20,000, 50,000, 100,000, or 200,000 targets. In a subsequent reaction, tag-specific primers amplify all target sequences and lengthen the tags to include all sequences needed for sequencing, including sample indexes. In an embodiment, the primers may be untagged, or only certain primers may be tagged. Sequencing adapters may be added by conventional adapter ligation. In an embodiment, the initial primers may carry the tags.

In an embodiment, primers are designed so that the amplified DNA is unexpectedly short. The prior art shows that those of ordinary skill in the art typically design amplicons of 100 bp or more. In various embodiments, the amplicons may be designed to be less than 80 bp, 70 bp, 60 bp, 50 bp, 45 bp, 40 bp, or 35 bp. In an embodiment, the amplicons may be designed to be between 40 and 65 bp.
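A simple geometric model quantifies why short amplicons help on fragmented templates. Suppose every template fragment has length L and fragment start positions are uniform. Among the fragments that contain a given locus, the fraction that also spans an entire amplicon of length A containing that locus is (L - A + 1)/L, for A ≤ L. The sketch below uses an assumed fragment length of 160 bp purely for illustration. Real fragment-length distributions, such as those of plasma DNA, are broader.

```python
def spanning_fraction(fragment_len: int, amplicon_len: int) -> float:
    """Fraction of fragments covering a locus that also span the whole amplicon.

    Model (an assumption): all fragments have length fragment_len and uniformly
    distributed start positions, giving (L - A + 1) / L for A <= L.
    """
    if amplicon_len > fragment_len:
        return 0.0
    return (fragment_len - amplicon_len + 1) / fragment_len


FRAGMENT_LEN = 160  # hypothetical fragment length, for illustration only
for amplicon_len in (120, 100, 80, 60, 50, 40):
    print(amplicon_len, round(spanning_fraction(FRAGMENT_LEN, amplicon_len), 3))
```

Under these assumptions, shortening the amplicon from 100 bp to 50 bp raises the usable fraction of locus-containing fragments from roughly 38% to roughly 69%.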

An experiment was performed using this protocol with 1,200-plex amplification. Both genomic DNA and pregnancy plasma were used, and about 70% of sequence reads mapped to the targeted sequences. Details are given elsewhere in this document. By contrast, sequencing of a 1,042-plex without assay design and selection resulted in more than 99% of sequences being primer-dimer products.

Sequential PCR: after STA1, multiple aliquots of the product may be amplified in parallel using pools of reduced complexity that contain the same primers. The first amplification can yield enough material to split. This method is especially suited to small samples, for example samples of about 6 to 100 pg, about 100 pg to 1 ng, about 1 ng to 10 ng, or about 10 ng to 100 ng. The protocol was performed with the 1,200-plex split into three 400-plexes. The fraction of mapped sequencing reads rose from about 60 to 70% with the 1,200-plex alone to over 95%.

Semi-nested mini-PCR (see Figure 2): after STA 1, a second STA is performed. It uses a multiplexed set of internal nested forward primers (103B, 105b) and one (or a few) tag-specific reverse primers (103A). In Figure 2:
- 101 denotes double-stranded DNA with a polymorphic locus of interest at X.
- 102 denotes the double-stranded DNA with ligation adapters added for universal amplification.
- 103 denotes the single-stranded DNA that has been universally amplified, with forward primer B and reverse primer A hybridized.
- 104 denotes the PCR product from 103.
- 105 denotes the product from 104 with nested forward primer b hybridized. Reverse tag A is already part of the molecule from the PCR that occurred between 103 and 104.
- 106 denotes the final PCR product.

With this workflow, usually more than 95% of sequences map to the intended targets. The nested primer may overlap the outer forward primer sequence but introduces additional 3′-end bases. In some embodiments, between one and 20 extra 3′ bases may be used. Experiments have shown that using 9 or more extra 3′ bases in a 1,200-plex design works well.
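A nested primer that overlaps the outer forward primer but adds extra 3′ bases can be expressed as a shift of the binding window along the template. The Python sketch below rests on two hypothetical assumptions:
- the template is given as the top (sense) strand, 5′ to 3′, so a forward primer has the same sequence as its binding window;
- the nested primer keeps the outer primer's length and is shifted downstream by k bases, using the 1 to 20 range from the embodiments above.

The template sequence and coordinates are invented for illustration.

```python
def nested_forward_primer(template: str, outer_start: int, primer_len: int, extra_3prime: int) -> str:
    """Return a nested forward primer shifted extra_3prime bases downstream.

    Hypothetical design rule: the outer primer is
    template[outer_start:outer_start + primer_len], and the nested primer keeps
    the same length, so it overlaps the outer primer by primer_len - extra_3prime
    bases and adds extra_3prime new bases at its 3' end.
    """
    if not 1 <= extra_3prime <= 20:
        raise ValueError("embodiments above use 1 to 20 extra 3' bases")
    if extra_3prime >= primer_len:
        raise ValueError("nested primer would no longer overlap the outer primer")
    start = outer_start + extra_3prime
    end = start + primer_len
    if end > len(template):
        raise ValueError("nested primer runs past the end of the template")
    return template[start:end]


template = "ACGTTGCAAGGCTTACCGATGCATCGGATCCTAGGCTA"  # hypothetical 38-base top strand
outer = template[0:20]
inner = nested_forward_primer(template, outer_start=0, primer_len=20, extra_3prime=9)
# The last 11 bases of the outer primer are the first 11 bases of the nested primer.
print(outer, inner, outer[9:] == inner[:11])
```

Keeping the primer length fixed is only one design choice. A nested primer could equally keep the outer primer's 5′ end and simply extend at its 3′ end.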

Fully nested mini-PCR (see Figure 3): after STA step 1, a second multiplex PCR (or parallel multiplex PCRs of reduced complexity) may be performed with two nested primers carrying tags (A, a, B, b). In Figure 3:
- 101 denotes double-stranded DNA with a polymorphic locus of interest at X.
- 102 denotes the double-stranded DNA with ligation adapters added for universal amplification.
- 103 denotes the single-stranded DNA that has been universally amplified, with forward primer B and reverse primer A hybridized.
- 104 denotes the PCR product from 103.
- 105 denotes the product from 104 with nested forward primer b and nested reverse primer a hybridized.
- 106 denotes the final PCR product.

In some embodiments, two full sets of primers may be used. In experiments, a fully nested mini-PCR protocol was used to perform 146-plex amplification on single cells and on three-cell samples. These experiments omitted step 102 of appending universal ligation adapters and amplifying.

Hemi-nested mini-PCR (see Figure 4): target DNA that has adapters at the fragment ends may be used. An STA is performed with a multiplexed set of forward primers (B) and one (or a few) tag-specific reverse primers (A). A second STA may then be performed using a universal tag-specific forward primer and a target-specific reverse primer. In Figure 4:
- 101 denotes double-stranded DNA with a polymorphic locus of interest at X.
- 102 denotes the double-stranded DNA with ligation adapters added for universal amplification.
- 103 denotes the single-stranded DNA that has been universally amplified, with reverse primer A hybridized.
- 104 denotes the PCR product from 103, amplified using reverse primer A and ligation adapter tag primer LT.
- 105 denotes the product from 104 with forward primer B hybridized.
- 106 denotes the final PCR product.

In this workflow, the target-specific forward and reverse primers are used in separate reactions. This reduces the complexity of each reaction and prevents dimers from forming between forward and reverse primers. Note that in this example, primers A and B may be considered first primers, and primers 'a' and 'b' may be considered inner primers. This method is a large improvement on direct PCR: it performs as well as direct PCR while avoiding primer dimers. After the first round of the hemi-nested protocol, about 99% non-targeted DNA is typically seen; after the second round, there is typically a large improvement.

Triple hemi-nested mini-PCR (see Figure 5): target DNA that has adapters at the fragment ends may be used. An STA is performed with a multiplexed set of forward primers (B) and one (or a few) tag-specific reverse primers (A) and (a). A second STA may be performed using a universal tag-specific forward primer and a target-specific reverse primer. In Figure 5:
- 101 denotes double-stranded DNA with a polymorphic locus of interest at X.
- 102 denotes the double-stranded DNA with ligation adapters added for universal amplification.
- 103 denotes the single-stranded DNA that has been universally amplified, with reverse primer A hybridized.
- 104 denotes the PCR product from 103, amplified using reverse primer A and ligation adapter tag primer LT.
- 105 denotes the product from 104 with forward primer B hybridized.
- 106 denotes the PCR product from 105, amplified using reverse primer A and forward primer B.
- 107 denotes the product from 106 with reverse primer 'a' hybridized.
- 108 denotes the final PCR product.

Note that in this example, primers 'a' and B may be considered inner primers, and A may be considered the first primer. Alternatively, both A and B may be considered first primers, and 'a' may be considered the inner primer. The designations of reverse and forward primers may be switched. In this workflow, the target-specific forward and reverse primers are used in separate reactions, which reduces reaction complexity and prevents dimers from forming between forward and reverse primers. This method is a large improvement on direct PCR: it performs as well as direct PCR while avoiding primer dimers. After the first round of the hemi-nested protocol, about 99% non-targeted DNA is typically seen; after the second round, there is typically a large improvement.

One-sided nested mini-PCR (see Figure 6): target DNA that has adapters at the fragment ends may be used. An STA may be performed with a multiplexed set of nested forward primers, using the ligation adapter tag as the reverse primer. A second STA may then be performed using a set of nested forward primers and a universal reverse primer. In Figure 6:
- 101 denotes double-stranded DNA with a polymorphic locus of interest at X.
- 102 denotes the double-stranded DNA with ligation adapters added for universal amplification.
- 103 denotes the single-stranded DNA that has been universally amplified, with forward primer A hybridized.
- 104 denotes the PCR product from 103, amplified using forward primer A and ligation adapter tag reverse primer LT.
- 105 denotes the product from 104 with nested forward primer a hybridized.
- 106 denotes the final PCR product.

By using overlapping primers in the first and second STAs, this method can detect shorter target sequences than standard PCR. The method is typically performed on a DNA sample that has already undergone STA step 1 above (appending universal tags and amplifying). Both nested primers lie on one side of the target, and the library tag serves on the other side. The method was performed on libraries of apoptotic supernatants and pregnancy plasma. With this workflow, about 60% of sequences mapped to the intended targets. Reads containing the reverse adapter sequence were not mapped, so this figure would be expected to be higher if those reads were mapped.

One-sided mini-PCR (see Figure 7): target DNA that has adapters at the fragment ends may be used. An STA may be performed with a multiplexed set of forward primers and one (or a few) tag-specific reverse primers. In Figure 7:
- 101 denotes double-stranded DNA with a polymorphic locus of interest at X.
- 102 denotes the double-stranded DNA with ligation adapters added for universal amplification.
- 103 denotes the single-stranded DNA with forward primer A hybridized.
- 104 denotes the PCR product from 103, amplified using forward primer A and ligation adapter tag reverse primer LT. This is the final PCR product.

This method can detect shorter target sequences than standard PCR. However, it may be relatively non-specific, because only one target-specific primer is used. This protocol is effectively one half of the one-sided nested mini-PCR protocol.

Reverse semi-nested mini-PCR (see Figure 8): target DNA that has adapters at the fragment ends may be used. An STA may be performed with a multiplexed set of forward primers and one (or a few) tag-specific reverse primers. In Figure 8:
- 101 denotes double-stranded DNA with a polymorphic locus of interest at X.
- 102 denotes the double-stranded DNA with ligation adapters added for universal amplification.
- 103 denotes the single-stranded DNA with reverse primer B hybridized.
- 104 denotes the PCR product from 103, amplified using reverse primer B and ligation adapter tag forward primer LT.
- 105 denotes the PCR product from 104 with forward primer A and inner reverse primer 'b' hybridized.
- 106 denotes the PCR product amplified from 105 using forward primer A and reverse primer 'b'. This is the final PCR product.

This method can detect shorter target sequences than standard PCR.

Further variants may be simple iterations or combinations of the above methods, such as doubly nested PCR, which uses three sets of primers. Another variant is one-and-a-half-sided nested mini-PCR, in which an STA may also be performed with a multiplexed set of nested forward primers and one (or a few) tag-specific reverse primers.

Note that in all of these variants, the roles of the forward and reverse primers may be interchanged. In some embodiments, the nested variants may equally well be run without the initial library preparation, which comprises appending the adapter tags and a universal amplification step. In some embodiments, additional rounds of PCR may be included, with additional forward and/or reverse primers and amplification steps. These additional steps may be particularly useful when it is desirable to further increase the percentage of DNA molecules that correspond to the targeted loci.

Nesting workflows

The amplification may be performed in many ways, with various degrees of nesting and various degrees of multiplexing. Figure 9 gives a flow chart of some possible workflows. The use of 10,000-plex PCR is meant only as an example; these flow charts apply equally to other degrees of multiplexing.

Looped ligation adapters

When adding universal tagged adapters, for example to make a library for sequencing, there are a number of ways to ligate the adapters. One way is to blunt-end the sample DNA, perform A-tailing, and ligate adapters that have a T overhang. Many other ligation methods exist, and many kinds of adapter can be ligated.

For example, a Y-adapter may be used, consisting of two strands of DNA:
- the first strand has a double-stranded region and a region specified by a forward primer region;
- the second strand has a double-stranded region complementary to the double-stranded region of the first strand, and a region that contains a reverse primer.

When the strands are annealed, the double-stranded region may carry a T overhang for ligation to double-stranded DNA with an A overhang.

In an embodiment, the adapter may be a loop of DNA whose terminal regions are complementary. The loop region contains a forward primer tag region (LFT), a reverse primer tag region (LRT), and a cleavage site between the two (see Figure 10). In Figure 10:
- 101 refers to the double-stranded, blunt-ended target DNA.
- 102 refers to the A-tailed target DNA.
- 103 refers to the looped ligation adapter with T overhang 'T' and cleavage site 'Z'.
- 104 refers to the target DNA with looped ligation adapters appended.
- 105 refers to the target DNA with ligation adapters appended, after cleavage at the cleavage site.
- LFT refers to the ligation adapter forward tag, and LRT refers to the ligation adapter reverse tag.

The complementary region may end in a T overhang or in another feature usable for ligation to the target DNA. The cleavage site may be any of the following:
- a series of uracils for cleavage by UNG;
- a sequence that can be recognized and cleaved by a restriction enzyme or by another cleavage method;
- a site that is simply opened by basic amplification.

These adapters may be used in any library preparation, for example for sequencing, and may be combined with any other method described herein, such as the mini-PCR amplification methods.
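Cleaving a looped adapter at a uracil-containing site can be pictured as a string operation: removing the run of uracils opens the loop into two arms. One arm ends in the forward tag (LFT) and the other begins with the reverse tag (LRT). The Python sketch below is schematic, and rests on three assumptions:
- the adapter is written 5′ to 3′ as stem, LFT, a uracil run, LRT, complementary stem, then the T overhang;
- UNG-based cleavage is modeled simply as excising the uracil run;
- all sequences are hypothetical placeholders, not real platform adapters.

```python
import re

# Hypothetical placeholder sequences; they are not real platform adapters.
STEM = "GACTCGA"            # terminal region; its reverse complement closes the loop
LFT = "GTCAGTCAGTCAGTCA"    # ligation adapter forward tag region (hypothetical)
LRT = "TGCATGCATGCATGCA"    # ligation adapter reverse tag region (hypothetical)
CLEAVAGE_SITE = "UUUU"      # cleavage site Z modeled as a run of uracils

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def looped_adapter() -> str:
    """Looped adapter 5'->3': stem, LFT, cleavage site, LRT, complementary stem, T overhang."""
    return STEM + LFT + CLEAVAGE_SITE + LRT + reverse_complement(STEM) + "T"


def cleave_at_uracils(adapter: str) -> list[str]:
    """Model UNG-based cleavage by excising each uracil run, opening the loop into two arms."""
    return [arm for arm in re.split(r"U+", adapter) if arm]


arms = cleave_at_uracils(looped_adapter())
print(arms)  # one arm ends with LFT, the other starts with LRT
```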

Internally tagged primers

When sequencing is used to determine the allele present at a given polymorphic locus, the sequence read typically begins upstream of the primer binding site (a) and then reaches the polymorphic site (X). Tags are typically configured as shown on the left of Figure 11, where 101 refers to the single-stranded target DNA with polymorphic locus of interest 'X' and primer 'a' with appended tag 'b'.

The elements of this configuration are sized as follows:
- Primer binding site 'a': to avoid non-specific hybridization, the region of the target DNA complementary to 'a' is typically 18 to 30 bp long.
- Sequence tag 'b': typically about 20 bp. In theory the tag can be any length longer than about 15 bp, though many people use primer sequences sold by the sequencing platform company.
- Distance 'd' between 'a' and 'X': this may be at least 2 bp, to avoid allele bias. In multiplexed PCR amplification, whether with the methods disclosed herein or with other methods, careful primer design is needed to avoid excessive primer-primer interaction. The allowable range for 'd' may therefore vary widely: from 2 bp to 10 bp, 20 bp, 30 bp, or even more than 30 bp.

With the primer configuration on the left of Figure 11, sequence reads must therefore be at least 40 bp long to reach the polymorphic locus. Depending on the lengths of 'a' and 'd', reads may need to be as long as 60 or 75 bp. In general, the longer the sequence reads, the higher the cost and time of sequencing a given number of reads, so minimizing the required read length saves both time and money. In addition, bases read earlier in a read are, on average, called more accurately than bases read later. Reducing the required read length can therefore also increase the accuracy of measurements of the polymorphic region.
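The read-length figures above come from adding up the bases that precede the polymorphic site in the standard configuration. This assumes, as the paragraph does, that the read covers tag b, binding site a, and gap d before reaching X. The example values are drawn from the ranges given above.

```latex
\begin{aligned}
L_{\text{before } X} &= |b| + |a| + d \\
20 + 18 + 2 &= 40 \text{ bp}, \qquad 20 + 30 + 10 = 60 \text{ bp}, \qquad 20 + 30 + 25 = 75 \text{ bp}
\end{aligned}
```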

In one embodiment, which uses internally tagged primers, the primer binding site (a) is split into multiple segments (a′, a″, a‴, ...). The sequence tag (b) lies on a DNA segment between two of these primer binding segments, as shown in Figure 11, 103. This configuration allows the sequencer to make shorter sequence reads. In one embodiment, a′+a″ should be at least about 18 bp, and can be as long as 30, 40, 50, 60, 80, 100 or more than 100 bp. In one embodiment, a″ should be at least about 6 bp, and in one embodiment it is between about 8 and 16 bp. All other factors being equal, internally tagged primers can shorten the required sequence read by at least 6 bp, by as much as 8, 10, 12 or 15 bp, and even by as much as 20 or 30 bp. This can yield significant advantages in cost, time and accuracy. An example of an internally tagged primer is given in Figure 12.
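The saving from an internal tag can be sketched as follows. Two assumptions are involved. First, the read begins at tag 'b'. Second, segment a′ lies upstream (5′) of the tag and a″ lies downstream, so a′ no longer has to be read. Under that model the saving equals the length of a′. The function names and example lengths are hypothetical, chosen within the ranges given in the text.

```python
def standard_read_length(tag_len: int, primer_len: int, gap_d: int) -> int:
    # Conventional primer: read runs tag b -> whole binding site a -> gap d -> X.
    return tag_len + primer_len + gap_d + 1


def internal_tag_read_length(tag_len: int, a_prime_len: int,
                             a_double_prime_len: int, gap_d: int) -> int:
    # Internally tagged primer: read runs tag b -> a'' -> gap d -> X.
    # a' sits 5' of the tag, so it is not part of the read (assumption).
    return tag_len + a_double_prime_len + gap_d + 1


# a' + a'' = 20 bp of total target binding, a'' = 10 bp (within 8 to 16 bp).
std = standard_read_length(tag_len=20, primer_len=20, gap_d=5)
internal = internal_tag_read_length(tag_len=20, a_prime_len=10,
                                    a_double_prime_len=10, gap_d=5)
print(std, internal, std - internal)  # 46 36 10
```

The `a_prime_len` argument is kept only to show the split. It does not affect the read length, which is the point of the design.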

Primers containing ligation adapter binding regions

One problem with fragmented DNA is that the fragments are short. A polymorphism is therefore more likely to lie near the end of a strand than it would be on a long strand (e.g., 101, Figure 10). PCR capture of a polymorphism requires primer binding sites of suitable length on both sides of it. As a result, many DNA strands carrying a targeted polymorphism will be missed because the primers do not overlap the targeted binding sites sufficiently. In one embodiment, the target DNA 101 may have a ligation adapter 102 attached. The target primer 103 may have, upstream of the designed binding region (a), a region (cr) complementary to the ligation adapter tag (lt) (see Figure 13). The binding region (the region of 101 complementary to a) may be shorter than the approximately 18 bp normally required for hybridization. In that case, the region (cr) of the primer complementary to the library tag can raise the binding energy to the point where PCR can proceed. Any specificity lost because of the shorter binding region can be compensated for by the other PCR primers, which contain target binding regions of appropriate length. This embodiment can also be combined with direct PCR or with any of the other methods described herein, such as nested PCR, semi-nested PCR, hemi-nested PCR, one-sided nested or semi- or hemi-nested PCR, or other PCR protocols.
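To see why appending the adapter-complementary region (cr) to a short binding region (a) helps, one can compare rough melting temperatures. The sketch below uses the Wallace rule of thumb, Tm ≈ 2·(A+T) + 4·(G+C) °C. This rule is only a crude guide intended for short oligos and is not the method described in the source. It also assumes that cr and a bind as one contiguous duplex across the adapter/fragment junction. Both sequences are hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Wallace rule-of-thumb melting temperature in deg C: 2*(A+T) + 4*(G+C).

    Only a crude estimate for short oligos; real primer design would use
    nearest-neighbor thermodynamics.
    """
    seq = seq.upper()
    at = sum(seq.count(base) for base in "AT")
    gc = sum(seq.count(base) for base in "GC")
    return 2 * at + 4 * gc


# Hypothetical sequences for illustration only.
target_binding = "ACGTTGCAGT"        # short region 'a' that fits on the fragment
adapter_complement = "GTCAGCTTAGCC"  # region 'cr' complementary to adapter tag 'lt'

# cr sits 5' (upstream) of a on the primer, so the full primer is cr + a.
primer = adapter_complement + target_binding
print(wallace_tm(target_binding), wallace_tm(primer))  # 30 68
```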

Ploidy may be determined using sequencing data and analytical methods that compare the observed allele data with the expected allele distributions for each hypothesis. In that setting, each additional read from an allele with a low depth of read (DOR) gives more information than an additional read from an allele with a high DOR. Ideally, therefore, the DOR would be uniform, with each locus represented by a similar number of sequence reads, and it is desirable to minimize the DOR variance. In one embodiment, the coefficient of variation of the DOR, which can be defined as the standard deviation of the DOR divided by the mean DOR, can be reduced by increasing the annealing time. In some embodiments, the annealing time may be longer than 2 minutes, longer than 4 minutes, longer than 10 minutes, longer than 30 minutes, longer than one hour, or even longer. Because annealing is an equilibrium process, there is no limit to the improvement in DOR variance that can be obtained by increasing the annealing time. In one embodiment, increasing the primer concentration can reduce the DOR variance.
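The coefficient of variation defined above (standard deviation of the DOR divided by the mean DOR) can be computed per run as a uniformity metric. The sketch below uses the population standard deviation. The per-locus depth values are hypothetical.

```python
from statistics import mean, pstdev


def dor_coefficient_of_variation(depths):
    """Coefficient of variation of depth of read: stdev(DOR) / mean(DOR).

    Uses the population standard deviation over all loci in the run.
    """
    if not depths:
        raise ValueError("need at least one locus")
    avg = mean(depths)
    if avg == 0:
        raise ValueError("mean depth of read is zero")
    return pstdev(depths) / avg


uniform = [100, 100, 100, 100]  # hypothetical, perfectly even coverage
uneven = [50, 150, 100, 100]    # hypothetical, same mean but more spread
print(dor_coefficient_of_variation(uniform))           # 0.0
print(round(dor_coefficient_of_variation(uneven), 4))  # 0.3536
```

A lower value means more even coverage. Longer annealing or higher primer concentration would be expected to push this number down.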

Exemplary Whole Genome Amplification Methods

In some embodiments, the methods of the invention may involve amplifying DNA, for example by using a whole-genome approach to amplify a nucleic acid sample before amplifying only the loci of interest. DNA amplification is the process of converting a small amount of genetic material into a larger amount of genetic material that contains a similar set of genetic data. It can be performed by various methods, including but not limited to the polymerase chain reaction (PCR). One method of amplifying DNA is whole genome amplification (WGA). Several methods are available for WGA: ligation-mediated PCR (LM-PCR), degenerate oligonucleotide primer PCR (DOP-PCR), and multiple displacement amplification (MDA). In LM-PCR, short DNA sequences called adapters are ligated to blunt ends of the DNA. These adapters contain universal amplification sequences, which are used to amplify the DNA by PCR. In DOP-PCR, random primers that also contain universal amplification sequences are used in a first round of annealing and PCR. A second round of PCR is then used to amplify the sequences further using the universal primer sequences. MDA uses phi-29 polymerase, a highly processive and non-specific enzyme that replicates DNA and has been used for single-cell analysis. The major limitations on amplifying material from a single cell are (1) the need to use very dilute DNA concentrations or very small reaction volumes, and (2) the difficulty of reliably dissociating DNA from proteins across the whole genome. Regardless, single-cell whole genome amplification has been used successfully for a variety of applications for a number of years. There are other methods of amplifying DNA from a DNA sample. DNA amplification transforms the initial DNA sample into a DNA sample that is similar in its set of sequences but much larger in quantity. In some cases, amplification may not be required.

In some embodiments, DNA can be amplified using universal amplification, such as WGA or MDA. In some embodiments, DNA can be amplified by targeted amplification, for example using targeted PCR or circularizing probes. In some embodiments, DNA may be preferentially enriched using a targeted amplification method. It may also be enriched using a method that fully or partially separates desired from undesired DNA, such as capture by hybridization. In some embodiments, DNA can be amplified using a combination of a universal amplification method and a preferential enrichment method. Fuller descriptions of some of these methods can be found elsewhere in this document.

Exemplary Enrichment and Sequencing Methods

In one embodiment, the methods disclosed herein use selective enrichment techniques that preserve the relative allele frequencies present in the original DNA sample. These frequencies are preserved at each locus of interest (e.g., each polymorphic locus) in a set of loci of interest (e.g., polymorphic loci). Although enrichment is particularly advantageous for methods that analyze polymorphic loci, these enrichment methods can easily be adapted for non-polymorphic loci if needed. In some embodiments, the amplification and/or selective enrichment technique may involve PCR (such as ligation-mediated PCR), fragment capture by hybridization, molecular inversion probes, or other circularizing probes. In some embodiments, the method of amplification or selective enrichment may use probes whose 3′ or 5′ end, once correctly hybridized to the target sequence, is separated from the polymorphic site of the allele by a small number of nucleotides. This separation reduces preferential amplification of one allele, termed allele bias. It is an improvement over methods that use probes whose correctly hybridized 3′ or 5′ end is directly adjacent to, or very near, the polymorphic site. In one embodiment, probes whose hybridization region may contain, or is known to contain, a polymorphic site are excluded. A polymorphic site within the hybridization site can cause unequal hybridization of some alleles or inhibit hybridization altogether, leading to preferential amplification of certain alleles. These embodiments improve on other methods involving targeted amplification and/or selective enrichment because they better preserve the original allele frequencies of the sample at each polymorphic locus. This holds whether the sample is a pure genomic sample from a single individual or a mixture of individuals.
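The two probe-design rules above can be sketched as a filter. Rule 1: exclude probes whose hybridization region covers a known polymorphic site. Rule 2: keep a small gap between the probe's 3′ end and the target SNP. The sketch assumes forward-strand probes that bind upstream of the SNP, 0-based half-open coordinates, and a minimum gap of 2 bp (taken from the "at least 2 bp" figure earlier in the text). The `Probe` type, the coordinates and the helper name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    name: str
    start: int  # first genomic position covered by the hybridizing region (0-based)
    end: int    # one past the last covered position; the 3' end sits at end - 1


def passes_filters(probe: Probe, target_snp: int, known_snps, min_gap: int = 2) -> bool:
    """Hypothetical filter for a forward-strand probe binding upstream of target_snp."""
    # Rule 1: reject probes whose hybridizing region overlaps any known polymorphic site.
    if any(probe.start <= site < probe.end for site in known_snps):
        return False
    # Rule 2: require at least min_gap bases strictly between the 3' end and the SNP.
    return target_snp - probe.end >= min_gap


known = {72, 100}  # hypothetical known polymorphic positions; 100 is the target
probes = [Probe("A", 75, 99), Probe("B", 73, 97), Probe("C", 70, 95)]
print([passes_filters(p, 100, known) for p in probes])  # [False, True, False]
```

Probe A fails because its 3′ end is only 1 base from the SNP. Probe C fails because its hybridizing region covers position 72.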

Enriching a DNA sample at a set of loci of interest and then sequencing it can confer a number of unexpected advantages when used as part of a method for non-invasive prenatal allele calling or ploidy calling. In some embodiments of the invention, the method involves measuring genetic data for use with an information-based method such as PARENTAL SUPPORT™ (PS). The end result of some of these embodiments is actionable genetic data for an embryo or fetus. There are many methods that may be used to measure the genetic data of an individual and/or related individuals as part of the method. In one embodiment, disclosed herein is a method for enriching the concentration of a set of targeted alleles. It comprises one or more of the following steps: targeted amplification of genetic material; addition of locus-specific oligonucleotide probes; ligation of specified DNA strands; isolation of sets of desired DNA; removal of unwanted components of a reaction; detection of certain DNA sequences by hybridization; and detection of the sequence of one or more strands of DNA by a DNA sequencing method. In some cases, the DNA strands may refer to target genetic material; in some cases they may refer to primers; in some cases they may refer to synthesized sequences, or combinations thereof. These steps may be carried out in a number of different orders.

For example, a universal amplification step before targeted amplification can confer several advantages, such as removing the risk of bottlenecking and reducing allelic bias. The DNA may be mixed with oligonucleotide probes that can hybridize to two neighboring regions, one on each side of the target sequence. After hybridization, the ends of the probe may be joined by adding a polymerase, a means for ligation, and any reagents needed to allow the probe to circularize. After circularization, an exonuclease may be added to digest the non-circularized genetic material, and the circularized probes are then detected. Alternatively, the DNA may be mixed with PCR primers that can hybridize to two neighboring regions flanking the target sequence. After hybridization, the ends of the probes may be joined by adding a polymerase, a means for ligation, and any reagents needed to complete PCR amplification. Amplified or unamplified DNA may also be targeted with hybrid capture probes directed at a set of loci. After hybridization, the probes may be localized and separated from the mixture to provide a DNA mixture enriched for the target sequences.
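The circularizing-probe workflow above (hybridize two arms, gap-fill, ligate, digest linear material) can be illustrated with a toy string model. This is a deliberately simplified sketch. The arms are written as they appear on the template strand, and the probe backbone, reverse complements and hybridization kinetics are ignored. The function name, `max_gap` and all sequences are hypothetical.

```python
def gap_fill_and_circularize(template: str, left_arm: str, right_arm: str,
                             max_gap: int = 50):
    """Toy model of a circularizing probe.

    Returns the ligated product (left arm + polymerase-filled gap + right arm),
    or None when the probe cannot close. Unclosed probes stay linear and would
    be digested by the exonuclease.
    """
    left = template.find(left_arm)
    if left == -1:
        return None
    gap_start = left + len(left_arm)
    right = template.find(right_arm, gap_start)
    if right == -1 or right - gap_start > max_gap:
        return None
    gap = template[gap_start:right]  # bases copied in by the polymerase
    return left_arm + gap + right_arm  # circle written out as a linear string


template = "TTTACGTACGAAGGCCTTTT"  # hypothetical target; the gap "GA" holds the SNP
probe_arms = [("ACGTAC", "AGGCCT"), ("ACGTAC", "CCCCCC")]
products = [gap_fill_and_circularize(template, l, r) for l, r in probe_arms]
circles = [p for p in products if p is not None]  # exonuclease removes the rest
print(circles)  # ['ACGTACGAAGGCCT']
```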

Targeting certain loci and then sequencing them can confer a number of unexpected advantages when used as part of a method for allele calling or ploidy calling. Some methods by which DNA can be targeted or preferentially enriched include circularizing probes, linked inversion probes (LIPs, MIPs), capture by hybridization methods such as SURESELECT, and targeted PCR or ligation-mediated PCR amplification strategies.

In some embodiments, the methods of the invention involve measuring genetic data for use with an information-based method such as PARENTAL SUPPORT™ (PS), as further described herein. PARENTAL SUPPORT™ is an information-based method for manipulating genetic data, aspects of which are described herein. The end result of some of these embodiments is actionable genetic data for an embryo or a fetus, followed by a clinical decision based on that data. The algorithm behind the PS method uses two inputs: the measured genetic data of the target individual, often an embryo or fetus, and the measured genetic data of related individuals. From these it can increase the accuracy with which the genetic state of the target individual is known. In one embodiment, the measured genetic data is used to make ploidy determinations during prenatal genetic diagnosis. In one embodiment, the measured genetic data is used to make ploidy determinations or allele calls on embryos during in vitro fertilization. There are many methods that may be used to measure the genetic data of an individual and/or related individuals in these contexts. The different methods comprise a number of steps. These often involve amplifying genetic material, adding oligonucleotide probes, ligating specified DNA strands, isolating sets of desired DNA, removing unwanted components of a reaction, detecting certain DNA sequences by hybridization, and determining the sequence of one or more strands of DNA by DNA sequencing methods. In some cases, the DNA strands may refer to target genetic material; in some cases they may refer to primers; in some cases they may refer to synthesized sequences, or combinations thereof. These steps may be carried out in a number of different orders.

Note that it is theoretically possible to target any number of loci in the genome, from one locus to well over one million loci. If a DNA sample is subjected to targeting and then sequenced, the percentage of alleles read by the sequencer will be enriched relative to their natural abundance in the sample. The degree of enrichment can be anywhere from one percent (or even less) to ten-fold, a hundred-fold, a thousand-fold or even many million-fold. The human genome contains approximately 3 billion base pairs, including approximately 75 million polymorphic loci. The more loci are targeted, the smaller the degree of enrichment is likely to be. The fewer loci are targeted, the greater the possible degree of enrichment, and the greater the depth of read that can be achieved at those loci for a given number of sequence reads.
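The trade-off between the number of targeted loci and depth of read can be made concrete with a back-of-the-envelope model. It assumes reads are spread uniformly. Under shotgun sequencing they are spread over the whole genome. Under targeting, a fraction of them (the on-target fraction) is spread evenly over the targeted loci, and each locus is treated as a single position. The on-target fraction, read counts and helper names are hypothetical. Only the 3-billion-bp genome size comes from the text.

```python
GENOME_SIZE_BP = 3_000_000_000  # approximate human genome size quoted in the text


def shotgun_depth(total_reads: int, read_len: int) -> float:
    # Expected reads covering any single base when reads land uniformly at random.
    return total_reads * read_len / GENOME_SIZE_BP


def targeted_depth(total_reads: int, n_loci: int, on_target_fraction: float = 1.0) -> float:
    # Expected reads per locus if on-target reads are shared evenly by all loci.
    return total_reads * on_target_fraction / n_loci


def fold_enrichment(total_reads: int, read_len: int, n_loci: int,
                    on_target_fraction: float = 1.0) -> float:
    return (targeted_depth(total_reads, n_loci, on_target_fraction)
            / shotgun_depth(total_reads, read_len))


reads, read_len, on_target = 10_000_000, 100, 0.8  # hypothetical run
for n_loci in (1_000, 10_000):
    print(n_loci, targeted_depth(reads, n_loci, on_target),
          round(fold_enrichment(reads, read_len, n_loci, on_target), 1))
# 1000 8000.0 24000.0
# 10000 800.0 2400.0
```

Under this model the fold enrichment does not depend on the total number of reads. It scales inversely with the number of loci, which matches the qualitative statement above.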

In one embodiment of the invention, the targeting or preferential enrichment may focus entirely on SNPs. In one embodiment, it may focus on any polymorphic site. A number of commercial targeting products are available for enriching exons. Surprisingly, targeting exclusively SNPs, or exclusively polymorphic loci, is particularly advantageous when using a method for NPD that relies on allele distributions. Methods for NPD using sequencing have also been disclosed, for example in U.S. Patent 7,888,017, that involve read count analysis. In those methods the analysis focuses on counting the number of reads that map to a given chromosome, and the sequence reads analyzed are not focused on polymorphic regions of the genome. Methods of this type, which do not focus on polymorphic alleles, would not benefit as much from targeting or preferential enrichment of a set of alleles.

In one embodiment of the invention, a targeting method that focuses on SNPs may be used to enrich a genetic sample for polymorphic regions of the genome. In one embodiment, the method may focus on a small number of SNPs, for example between 1 and 100 SNPs. It may also focus on a larger number, for example between 100 and 1,000, between 1,000 and 10,000, between 10,000 and 100,000, or more than 100,000 SNPs. In one embodiment, the method may focus on one chromosome or a small number of chromosomes associated with live trisomic births, for example chromosomes 13, 18, 21, X and Y, or some combination thereof. In one embodiment, the targeted SNPs may be enriched by a small factor, for example between 1.01-fold and 100-fold. They may also be enriched by a larger factor, for example between 100-fold and 1,000,000-fold, or even by more than 1,000,000-fold. In one embodiment of the invention, a targeting method may be used to create a DNA sample that is preferentially enriched in polymorphic regions of the genome. In one embodiment, this method may be used to create a DNA mixture with any of these characteristics, where the DNA mixture contains maternal DNA as well as free-floating fetal DNA. In one embodiment, this method may be used to create a DNA mixture with any combination of these characteristics. For example, the methods described herein may be used to create a DNA mixture that comprises maternal DNA and fetal DNA and is preferentially enriched for DNA corresponding to 200 SNPs. In this example, all 200 SNPs are located on chromosome 18 or 21, and the average enrichment is 1,000-fold. In another example, the method may be used to create a DNA mixture preferentially enriched at 10,000 SNPs, all or most of which are located on chromosomes 13, 18, 21, X and Y. In this example, the average enrichment per locus is more than 500-fold. Any of the targeting methods described herein can be used to create DNA mixtures that are preferentially enriched at certain loci.

In some embodiments, the methods of the invention further comprise measuring the DNA in the fraction with a high-throughput DNA sequencer. The DNA in the fraction contains a disproportionate number of sequences from one or more chromosomes, selected from the group comprising chromosome 13, chromosome 18, chromosome 21, chromosome X, chromosome Y, and combinations thereof.

Three methods are described herein: multiplex PCR, targeted capture by hybridization, and linked inversion probes (LIPs). These can be used to obtain and analyze measurements at a sufficient number of polymorphic loci from a maternal plasma sample in order to detect fetal aneuploidy. This is not meant to exclude other methods of selectively enriching the targeted loci, and other methods may be used without changing the essence of the method. In each case, the polymorphisms assayed may include single nucleotide polymorphisms (SNPs), small indels or STRs. A preferred method involves the use of SNPs. Each method produces allele frequency data. The allele frequency data for each targeted locus, and/or the joint allele frequency distributions across these loci, may be analyzed to determine the ploidy of the fetus. Each method has its own considerations, because source material is limited and maternal plasma consists of a mixture of maternal and fetal DNA. This method may be combined with other methods to provide a more accurate determination. In one embodiment, it may be combined with a sequence counting method such as that described in U.S. Patent 7,888,017. The methods may also be used to determine fetal paternity non-invasively from maternal plasma samples. In addition, each method may be applied to other DNA mixtures or to pure DNA samples for several purposes: to detect the presence or absence of aneuploid chromosomes; to genotype large numbers of SNPs from degraded DNA samples; to detect segmental copy number variations (CNVs); to detect other genotypic states of interest; or some combination of these.

Accurately measuring the distribution of alleles in a sample

Current sequencing methods can be used to estimate the distribution of alleles in a sample. One such method, termed shotgun sequencing, involves randomly sampling sequences from a pool of DNA. The proportion of reads covering a particular allele is typically very low and can be estimated with simple statistics. The human genome contains approximately 3 billion base pairs. If the sequencing method makes 100 bp reads, a particular allele will therefore be measured about once in every 30 million sequence reads.
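The "once per 30 million reads" figure follows from dividing the genome size by the read length. The sketch below also gives the chance of seeing a given position at least once. It assumes a Poisson approximation with uniformly random read start positions. Like the text, it treats "a particular allele" as a single genomic position. The helper names are hypothetical.

```python
import math

GENOME_SIZE_BP = 3_000_000_000  # approximate human genome size quoted in the text


def reads_per_observation(read_len: int) -> float:
    # On average, one read in (genome size / read length) covers a given base.
    return GENOME_SIZE_BP / read_len


def prob_observed_at_least_once(n_reads: int, read_len: int) -> float:
    # Poisson approximation: expected coverage lambda = n_reads * read_len / genome.
    expected = n_reads * read_len / GENOME_SIZE_BP
    return 1.0 - math.exp(-expected)


print(reads_per_observation(100))                            # 30000000.0
print(round(prob_observed_at_least_once(30_000_000, 100), 4))  # 0.6321
```

Even after 30 million reads, roughly a third of specific positions would still be unobserved under this model. This illustrates why untargeted sampling is inefficient for measuring allele ratios.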

In one embodiment, the method of the invention uses the allele distributions measured at loci on a chromosome to determine whether a DNA sample contains two or more different haplotypes that share the same set of loci. The different haplotypes could represent any of the following: two different homologous chromosomes from one individual; three different homologous chromosomes from a trisomic individual; three different homologous haplotypes from a mother and a fetus, where one haplotype is shared between mother and fetus; three or four haplotypes from a mother and a fetus, where one or two haplotypes are shared between mother and fetus; or other combinations. Alleles that are polymorphic between the haplotypes tend to be more informative. However, any locus at which the mother and father are not both homozygous for the same allele still yields useful information through its measured allele distribution. That information goes beyond what a simple read count analysis can provide.
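To show how a measured allele distribution can distinguish hypotheses, the sketch below computes the expected fraction of B-allele reads at one locus in a maternal/fetal mixture. It is a simplified model, not the PARENTAL SUPPORT™ algorithm. It assumes reads are drawn in proportion to allele copies, with no amplification bias or sequencing error. The fetal fraction and genotypes in the example are hypothetical.

```python
def expected_b_fraction(maternal_b: int, fetal_b: int, fetal_fraction: float,
                        fetal_copies: int = 2) -> float:
    """Expected share of B-allele reads at one locus in a maternal/fetal mixture.

    maternal_b: B-allele copies in the mother (0, 1 or 2)
    fetal_b: B-allele copies in the fetus (0 .. fetal_copies)
    fetal_fraction: share of genome equivalents contributed by the fetus
    fetal_copies: 2 for a disomic chromosome, 3 for a trisomic one
    """
    if not 0 <= maternal_b <= 2 or not 0 <= fetal_b <= fetal_copies:
        raise ValueError("allele counts out of range")
    maternal_share = 1.0 - fetal_fraction
    b_copies = maternal_share * maternal_b + fetal_fraction * fetal_b
    total_copies = maternal_share * 2 + fetal_fraction * fetal_copies
    return b_copies / total_copies


# Mother AA, 10% fetal fraction (hypothetical):
disomy = expected_b_fraction(0, 1, 0.1)                     # fetus AB
trisomy = expected_b_fraction(0, 2, 0.1, fetal_copies=3)    # fetus ABB
print(round(disomy, 4), round(trisomy, 4))  # 0.05 0.0952
```

Comparing the observed B fraction at many such loci with these expected values is the kind of hypothesis comparison described in the text. A read-count-only analysis cannot make this comparison.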

However, shotgun sequencing of such a sample is extremely inefficient. It yields many sequences from regions that are not polymorphic between the haplotypes in the sample, or from chromosomes that are not of interest, and these reveal no information about the proportions of the target haplotypes. Described herein are methods that specifically target and/or preferentially enrich segments of the sample's DNA that are more likely to be polymorphic, increasing the yield of allelic information obtained by sequencing. For the allele distribution measured in an enriched sample to truly represent the amounts present in the target individual, one condition is critical. At any given locus within the targeted segments, there must be little or no preferential enrichment of one allele over another. Current methods known in the art for targeting polymorphic alleles are designed to ensure that at least some of any alleles present are detected. However, they were not designed to measure the unbiased allele distribution of the polymorphic alleles present in the original mixture. It is not obvious that any particular method of target enrichment would produce an enriched sample whose measured allele distributions represent the original, unamplified sample more accurately than any other method. Many enrichment methods might in theory be expected to achieve such an aim. Nevertheless, a person of ordinary skill in the art is well aware that current amplification, targeting and other preferential enrichment methods are subject to substantial random or deterministic bias. One embodiment of the methods described herein allows the multiple alleles found in a DNA mixture at a given locus to be amplified or preferentially enriched so that each allele is enriched to almost the same degree. Put another way, the method increases the relative quantity of the alleles in the mixture as a whole. Meanwhile, the ratios between the alleles at each locus remain essentially the same as in the original DNA mixture. With some reported methods, preferential enrichment of loci can produce allele bias of more than 1%, more than 2%, more than 5% and even more than 10%. This preferential enrichment may be due to capture bias when using a capture-by-hybridization method. It may also be due to amplification bias, which may be small in each cycle but can become large when compounded over 20, 30 or 40 cycles. For the purposes of this invention, "the ratio remains essentially the same" means the following: the ratio of the alleles in the original mixture, divided by the ratio of the alleles in the resulting mixture, lies within one of these ranges. The ranges are 0.95 to 1.05, 0.98 to 1.02, 0.99 to 1.01, 0.995 to 1.005, 0.998 to 1.002, 0.999 to 1.001, or 0.9999 to 1.0001. Note that the allele ratios calculated here are not used to determine the ploidy state of the target individual. They serve only as a metric of allele bias.
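The bias metric defined above and the compounding of small per-cycle amplification differences can both be sketched briefly. Amplification uses the standard PCR model in which each copy of an allele becomes (1 + efficiency) copies per cycle. The allele counts and efficiencies in the example are hypothetical, and the helper names are illustrative.

```python
def bias_metric(orig_a: float, orig_b: float, enr_a: float, enr_b: float) -> float:
    # (ratio of alleles in original mixture) / (ratio in enriched mixture);
    # 1.0 means enrichment left the allele ratio unchanged.
    return (orig_a / orig_b) / (enr_a / enr_b)


def ratio_preserved(metric: float, tolerance: float = 0.05) -> bool:
    # tolerance 0.05 -> window 0.95 to 1.05; 0.02 -> 0.98 to 1.02, etc.
    return 1.0 - tolerance <= metric <= 1.0 + tolerance


def compounded_ratio_shift(eff_a: float, eff_b: float, cycles: int) -> float:
    # Each cycle multiplies allele X by (1 + eff_X), so the A:B ratio drifts
    # by a factor of (1 + eff_a) / (1 + eff_b) every cycle.
    return ((1.0 + eff_a) / (1.0 + eff_b)) ** cycles


m = bias_metric(500, 500, 5100, 4900)  # hypothetical molecule counts
print(round(m, 4), ratio_preserved(m), ratio_preserved(m, 0.02))
# 0.9608 True False
print(round(compounded_ratio_shift(0.95, 0.94, 30), 3))
```

The last line shows that a one-point difference in per-cycle efficiency (95% vs 94%) skews the allele ratio by roughly 17% after 30 cycles. A bias too small to notice in a single cycle can thus exceed the tolerances listed above.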

In one embodiment, after the mixture has been preferentially enriched at the set of target loci, it can be sequenced on any previous-, current-, or next-generation instrument that sequences clonal samples (samples generated from a single molecule; examples include the Illumina GAIIx, Illumina HISEQ, and LIFE TECHNOLOGIES SOLiD 5500XL). Allele ratios can be estimated by sequencing the specific alleles within the targeted regions: the reads are analyzed and counted by allele type, and the ratios of the different alleles are determined from those counts. For variants one to a few bases long, alleles are detected by sequencing, and a read must cover the allele in question to contribute to the estimate of the allelic composition of the captured molecules. Increasing the read length therefore increases the total number of captured molecules that can be genotyped. Complete sequencing of every molecule would collect the largest amount of data available in the enriched pool. However, sequencing is currently expensive, so a method that can measure allele distributions with fewer sequence reads would be of great value. In addition, there are technical limits on the maximum possible read length, and accuracy declines as read length increases.

The alleles of greatest utility will be one to a few bases long, but in theory any allele shorter than the sequencing read length could be used. Although allelic variation occurs in all forms, the examples provided herein focus on SNPs or variants spanning only a few adjacent base pairs. In many cases, larger variants such as segmental copy number variants can be detected through aggregates of these smaller variants, because every SNP within the segment shares the same change in copy number. Variants longer than a few bases (e.g., STRs) require special consideration: some targeting approaches will work for them and others will not.
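The counting step described above can be sketched in Python. This is a minimal illustration, assuming each read is already aligned and represented as a hypothetical `(start, sequence)` pair in reference coordinates with no indels; only reads that actually cover the polymorphic position contribute to the ratio.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical read model: (0-based start, sequence) in reference coordinates,
# with no indels or clipping. A simplifying assumption for illustration only.
Read = Tuple[int, str]


def base_at(read: Read, snp_pos: int) -> Optional[str]:
    """Return the base a read carries at snp_pos, or None if the read does not cover it."""
    start, seq = read
    offset = snp_pos - start
    return seq[offset] if 0 <= offset < len(seq) else None


def allele_fractions(reads: Iterable[Read], snp_pos: int, ref: str, alt: str) -> Dict[str, float]:
    """Estimate allele fractions at one biallelic site from the reads that cover it."""
    counts = Counter(base_at(r, snp_pos) for r in reads)
    covering = counts[ref] + counts[alt]  # non-covering reads (None) and other bases are ignored
    if covering == 0:
        return {}  # no informative read, so no ratio can be estimated
    return {ref: counts[ref] / covering, alt: counts[alt] / covering}


reads = [(0, "ACGTA"), (1, "CGCA"), (2, "GTAC"), (10, "TTTT")]
print(allele_fractions(reads, snp_pos=3, ref="T", alt="C"))
```

Here three reads cover position 3 (two carry T, one carries C), and the read starting at 10 is discarded because it does not reach the site, which is why longer reads raise the number of usable molecules.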

A variety of targeting methods can be used to specifically isolate and enrich one or more variant positions in the genome. Typically, these rely on constant sequences flanking the variant sequence. Others have reported targeting in the context of sequencing where the substrate is maternal plasma (see, e.g., Liao et al., Clin. Chem. 2011; 57(1): 92-101). However, those methods use targeting probes directed at exons rather than at the polymorphic regions of the genome. In one embodiment, the methods of the invention involve targeting probes that focus exclusively or almost exclusively on polymorphic regions. In one embodiment, the methods of the invention involve targeting probes that focus exclusively or almost exclusively on SNPs. In some embodiments of the invention, the targeted polymorphic sites consist of at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or at least 99.9% SNPs, or exclusively of SNPs.

In one embodiment, the methods of the invention can be used to determine the genotypes (the base composition of the DNA at specific loci) of a mixture of DNA molecules, and the relative proportions of those genotypes, where the DNA molecules may originate from one or more genetically distinct individuals. In one embodiment, the methods can be used to determine the genotypes at a set of polymorphic loci and the relative ratios of the amounts of the different alleles present at those loci. In one embodiment, the polymorphic loci may consist entirely of SNPs. In one embodiment, the polymorphic loci may comprise SNPs, single tandem repeats, and other polymorphisms. In one embodiment, the methods can be used to determine the relative distribution of alleles at a set of polymorphic loci in a DNA mixture comprising maternally derived and fetally derived DNA. In one embodiment, the joint allele distribution of a DNA mixture isolated from the blood of a pregnant woman can be determined. In one embodiment, the allele distribution at a set of loci can be used to determine the ploidy state of one or more chromosomes of a gestating fetus.
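To make the idea of a joint maternal/fetal allele distribution concrete, the sketch below computes the expected fraction of B-allele molecules at a single SNP. It is a simplified model, not the method of the invention: it assumes a disomic locus in both mother and fetus, known genotypes, a hypothetical `fetal_fraction` parameter giving the fetal share of molecules at the locus, and no amplification or capture bias.

```python
def expected_b_fraction(maternal_b: int, fetal_b: int, fetal_fraction: float) -> float:
    """
    Expected fraction of B-allele molecules at one disomic SNP in a maternal/fetal mixture.

    maternal_b, fetal_b: number of B alleles (0, 1 or 2) in each genotype.
    fetal_fraction: assumed share of molecules at this locus that are fetal.
    """
    for b in (maternal_b, fetal_b):
        if b not in (0, 1, 2):
            raise ValueError("a disomic genotype carries 0, 1 or 2 B alleles")
    if not 0.0 <= fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must lie in [0, 1]")
    # Each contributor adds its own B-allele fraction, weighted by its share of the DNA.
    return (1 - fetal_fraction) * maternal_b / 2 + fetal_fraction * fetal_b / 2


# Mother AA, fetus AB: only fetal molecules carry B.
print(expected_b_fraction(0, 1, 0.10))
# Mother AB, fetus AA or BB: the fetal genotype shifts the ratio away from 0.5.
print(expected_b_fraction(1, 0, 0.10), expected_b_fraction(1, 2, 0.10))
```

Under these assumptions the three calls give about 0.05, 0.45 and 0.55; deviations of measured ratios from such expectations across many loci are the kind of signal an allele distribution carries.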

In one embodiment, the mixture of DNA molecules can be derived from DNA extracted from multiple cells of one individual. In one embodiment, if the individual is mosaic (germline or somatic), the initial collection of cells from which the DNA is obtained may comprise a mixture of diploid or haploid cells of the same or different genotypes. In one embodiment, the mixture of DNA molecules can also be derived from DNA extracted from a single cell. In one embodiment, the mixture can also be derived from DNA extracted from a mixture of two or more cells from the same individual or from different individuals. In one embodiment, the mixture can be derived from DNA isolated from biological material, such as plasma, known to contain cell-free DNA released from cells. In one embodiment, this biological material may be a mixture of DNA from one or more individuals, as is the case during pregnancy, where fetal DNA has been shown to be present in the mixture. In one embodiment, the biological material may come from the mixture of cells found in maternal blood, some of which are of fetal origin. In one embodiment, the biological material may be cells from the blood of a pregnant woman that have been enriched for fetal cells.

Circularizing probes

Some embodiments of the invention involve using "linked inverted probes" (LIPs), which have been described previously in the literature, to amplify the loci of interest before or after amplification with non-LIP primers in the multiplex PCR methods of the invention. LIP is a general term intended to cover technologies that produce circular DNA molecules: the probes are designed to hybridize to the targeted DNA region on either side of a targeted allele, so that the addition of an appropriate polymerase and/or ligase, under appropriate conditions and with appropriate buffers and other reagents, completes the complementary, inverted DNA region across the targeted allele and creates a circular DNA loop that captures the information found in that allele. A LIP may also be referred to as a pre-circularized probe, a pre-circularizing probe, or a circularizing probe. A LIP probe can be a linear DNA molecule between 50 and 500 nucleotides long, and in one embodiment between 70 and 100 nucleotides long; in some embodiments, it may be longer or shorter than described herein. Other embodiments of the invention involve different implementations of LIP technology, such as padlock probes and molecular inversion probes (MIPs).

One approach to targeting specific locations for sequencing is to synthesize probes whose 3′ and 5′ ends anneal to the target DNA in an inverted orientation, at positions adjacent to and on either side of the targeted region. Adding DNA polymerase and DNA ligase then causes extension from the 3′ end, adding bases complementary to the target molecule to the single-stranded probe (gap filling), followed by ligation of the new 3′ end to the 5′ end of the original probe. This produces a circular DNA molecule that can then be separated from the background DNA. The probe ends are designed to flank the targeted region of interest. One version of this approach is commonly referred to as MIP and has been used together with array technology to determine the identity of the filled-in sequence. One drawback of using MIPs to measure allele ratios is that the hybridization, circularization, and amplification steps do not proceed at equal rates for the different alleles at the same locus. As a result, the measured allele ratios do not represent the actual allele ratios present in the original mixture.
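The gap-fill and ligation steps can be illustrated with a string model. This sketch assumes a linear probe written 5′-ligation arm, backbone, extension arm-3′; both arms are complementary to the template, so they appear in the reverse complement of the template, with the extension arm upstream of the ligation arm (the "inverted" orientation). The function and argument names are hypothetical, and the backbone is an arbitrary placeholder.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def gap_fill_and_ligate(template: str, ext_arm: str, lig_arm: str, backbone: str) -> Optional[str]:
    """
    Simulate circularization of a probe 5'-lig_arm-backbone-ext_arm-3' on one template strand.
    Returns the circular product linearized as ext_arm + gap + lig_arm + backbone,
    or None if the arms cannot anneal in the required order.
    """
    probe_strand = revcomp(template)  # sequence the probe arms (and the fill-in) match
    ext_pos = probe_strand.find(ext_arm)
    if ext_pos < 0:
        return None  # extension arm has no binding site
    gap_start = ext_pos + len(ext_arm)
    lig_pos = probe_strand.find(lig_arm, gap_start)
    if lig_pos < 0:
        return None  # ligation arm must bind downstream of the extension arm's 3' end
    gap = probe_strand[gap_start:lig_pos]  # bases added by the polymerase (gap filling)
    # Ligase joins the last filled base to the 5' end of the ligation arm, closing the circle.
    return ext_arm + gap + lig_arm + backbone


probe_strand_seq = "CCGATAGCTTAGGACT"  # hypothetical target region
template = revcomp(probe_strand_seq)
print(gap_fill_and_ligate(template, ext_arm="CCGAT", lig_arm="GGACT", backbone="NNNN"))
```

The filled-in segment (`AGCTTA` here) is the part that carries the targeted allele into the circular molecule; swapping the two arms returns `None`, because extension can never reach the ligation arm.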

In one embodiment, the circularizing probe is constructed such that the probe region designed to hybridize upstream of the targeted polymorphic locus and the probe region designed to hybridize downstream of it are covalently linked by a non-nucleic-acid backbone. This backbone can be any biocompatible molecule or combination of biocompatible molecules. Some examples of possible biocompatible molecules are poly(ethylene glycol), polycarbonates, polyurethanes, polyethylene, polypropylene, sulfone polymers, silicones, cellulose, fluoropolymers, acrylic compounds, styrene block copolymers, and other block copolymers.

In one embodiment of the invention, this method has been modified to make it readily amenable to sequencing as a means of interrogating the filled-in sequence. To preserve the allele ratios of the initial sample, at least one key consideration must be taken into account: the position at which the alleles differ within the gap-fill region must not be too close to a probe binding site, because initiation bias by the DNA polymerase would otherwise cause differences between variants. Another consideration is that the probe binding sites associated with a variant in the gap-fill region may themselves contain other variants, which would lead to unequal amplification of the different alleles. In one embodiment of the invention, the 3′ and 5′ ends of the pre-circularized probe are designed to hybridize to bases one or a few positions away from the variant position (polymorphic site) of the targeted allele. The number of bases between the polymorphic site (SNP or other) and the base to which the 3′ end and/or 5′ end of the pre-circularized probe is designed to hybridize can be one, two, three, four, five, or six bases, seven to ten bases, eleven to fifteen bases, sixteen to twenty bases, twenty to thirty bases, or thirty to sixty bases. The forward and reverse primers can be designed to hybridize different numbers of bases away from the polymorphic site. Current DNA synthesis technology can produce circularizing probes in large numbers, allowing very large numbers of probes to be generated and potentially pooled, so that many loci can be interrogated simultaneously. Operation with more than 300,000 probes has been reported. Two papers that discuss methods involving circularizing probes that can be used to measure genomic data from an individual of interest are Porreca et al., Nature Methods, 2007, 4(11), pp. 931-936; and Turner et al., Nature Methods, 2009, 6(5), pp. 315-316. The methods described in these papers can be used in combination with other methods described herein, and individual steps of these methods may be combined with steps of other methods described herein.
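The spacing rule above can be expressed as a small coordinate calculation. This is a sketch under stated assumptions: 0-based, half-open intervals on the strand the probe arms match; `ext_offset` and `lig_offset` are hypothetical parameter names for the number of bases left between the SNP and the extension-arm 3′ end and ligation-arm 5′ end respectively, which may differ, as the text allows for forward and reverse arms.

```python
from typing import NamedTuple, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based


class ArmLayout(NamedTuple):
    ext_arm: Interval  # binding site of the arm whose 3' end is extended
    gap: Interval      # region filled in by the polymerase; contains the SNP
    lig_arm: Interval  # binding site of the arm whose 5' end is ligated


def place_arms(snp_pos: int, arm_len: int, ext_offset: int, lig_offset: int) -> ArmLayout:
    """Place both probe arms so that the stated number of bases separates each from the SNP."""
    if arm_len < 1 or ext_offset < 0 or lig_offset < 0:
        raise ValueError("arm_len must be >= 1 and offsets must be >= 0")
    ext_end = snp_pos - ext_offset  # exclusive end: ext_offset bases lie between arm and SNP
    ext_start = ext_end - arm_len
    if ext_start < 0:
        raise ValueError("extension arm would start before the sequence")
    lig_start = snp_pos + 1 + lig_offset  # lig_offset bases lie between SNP and arm
    return ArmLayout((ext_start, ext_end), (ext_end, lig_start), (lig_start, lig_start + arm_len))


print(place_arms(snp_pos=100, arm_len=20, ext_offset=5, lig_offset=3))
# ArmLayout(ext_arm=(75, 95), gap=(95, 104), lig_arm=(104, 124))
```

Keeping the SNP several bases inside the gap, rather than at its edge, is the design choice the text motivates: the variable base is then neither part of a binding site nor the first base the polymerase adds.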

In some embodiments of the methods disclosed herein, the genetic material of the target individual is optionally amplified, followed by hybridization of the pre-circularized probes, gap filling to fill in the bases between the two ends of each hybridized probe, ligation of the two ends to form a circularized probe, and amplification of the circularized probe using, for example, rolling circle amplification. Once the desired target allele information has been captured by circularizing appropriately designed oligonucleotide probes, for example in a LIP system, the sequence of the circularized probes can be measured to obtain the desired sequence data. In one embodiment, appropriately designed oligonucleotide probes can be circularized directly on the unamplified genetic material of the target individual and amplified afterwards. Note that a number of amplification procedures can be used to amplify the original genetic material or the circularized LIPs, including rolling circle amplification, MDA, or other amplification protocols. Different methods can be used to measure the genetic information in the target genome, for example high-throughput sequencing, Sanger sequencing, other sequencing methods, capture by hybridization, capture by circularization, multiplex PCR, other hybridization methods, and combinations thereof.

After an individual's genetic material has been measured using one or more of the above methods, an informatics-based method (e.g., the PARENTAL SUPPORT™ method), together with appropriate genetic measurements, can be used to determine the ploidy state of one or more of the individual's chromosomes and/or the genetic state of one allele or a set of alleles, especially alleles associated with a disease or genetic condition of interest. Note that the use of LIPs for multiplexed capture of gene sequences followed by genotyping by sequencing has been reported. However, sequencing data generated by LIP-based strategies for amplifying the genetic material found in a single cell, a small number of cells, or extracellular DNA have not previously been used to determine the ploidy state of an individual of interest.

The application of informatics-based methods to determine an individual's ploidy state from genetic data measured on hybridization arrays (e.g., Illumina INFINIUM arrays or AFFYMETRIX GeneChips) has been described in references cited elsewhere in this document. However, the methods described herein improve on the methods previously described in the literature. For example, a LIP-based approach followed by high-throughput sequencing unexpectedly yields better genotype data, because the approach offers greater multiplexing capacity, better capture specificity, better uniformity, and lower allelic bias. Greater multiplexing allows more alleles to be targeted, giving more accurate results. Better uniformity allows more of the targeted alleles to be measured, giving more accurate results. Lower allelic bias produces a lower miscall rate, giving more accurate results. More accurate results lead to improved clinical outcomes and better medical care.

It is important to note that LIPs can be used to target specific loci in a DNA sample for genotyping by methods other than sequencing. For example, LIPs can be used to target DNA for genotyping on SNP arrays or other DNA- or RNA-based microarrays.

Ligation-mediated PCR

Ligation-mediated PCR can be used to amplify the loci of interest before or after PCR amplification with non-ligated primers. Ligation-mediated PCR is a PCR method for preferentially enriching a DNA sample by amplifying one or more loci in a DNA mixture, the method comprising: obtaining a set of primer pairs, where each primer in a pair contains a target-specific sequence and a non-target sequence, and where the target-specific sequences are preferably designed to anneal to target regions one upstream and one downstream of a polymorphic site, each of which may be separated from the polymorphic site by 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11-20, 21-30, 31-40, 41-50, 51-100, or more than 100 bases; polymerizing DNA from the 3′ end of the upstream primer to fill the single-stranded region between it and the 5′ end of the downstream primer with nucleotides complementary to the target molecule; ligating the last polymerized base of the upstream primer to the adjacent 5′ base of the downstream primer; and amplifying only the polymerized and ligated molecules, using the non-target sequences contained at the 5′ end of the upstream primer and the 3′ end of the downstream primer. Primer pairs for different targets can be mixed in the same reaction. The non-target sequences serve as universal sequences, so that all primer pairs that have been successfully polymerized and ligated can be amplified with a single pair of amplification primers.
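The selectivity of the final step can be shown with a string model. This minimal sketch writes every sequence on the strand synthesized by extension of the upstream primer (the complement of the template), with the upstream primer as 5′-tail-specific-3′ and the downstream primer as 5′-specific-tail-3′; the function names, tails, and target sequence are hypothetical placeholders.

```python
from typing import Iterable, List, Optional


def ligation_product(strand: str, up_specific: str, down_specific: str,
                     up_tail: str, down_tail: str) -> Optional[str]:
    """Extend the upstream primer across the gap and ligate it to the downstream primer."""
    up_pos = strand.find(up_specific)
    if up_pos < 0:
        return None
    gap_start = up_pos + len(up_specific)
    down_pos = strand.find(down_specific, gap_start)
    if down_pos < 0:
        return None
    gap = strand[gap_start:down_pos]  # nucleotides added by the polymerase
    return up_tail + up_specific + gap + down_specific + down_tail


def universal_amplifiable(molecules: Iterable[Optional[str]], up_tail: str, down_tail: str) -> List[str]:
    """Keep only molecules carrying both universal tails, i.e. templates for the single universal primer pair."""
    return [m for m in molecules if m is not None and m.startswith(up_tail) and m.endswith(down_tail)]


strand = "TTAGCCATGGCAAGTCCGTA"  # hypothetical target region
UP_TAIL, DOWN_TAIL = "ACACAC", "GTGTGT"
ligated = ligation_product(strand, "AGCCA", "GTCCG", UP_TAIL, DOWN_TAIL)
extended_only = UP_TAIL + "AGCCA" + "TGGCAA"  # polymerized but never ligated
down_primer_alone = "GTCCG" + DOWN_TAIL       # downstream primer that was not joined
print(universal_amplifiable([ligated, extended_only, down_primer_alone], UP_TAIL, DOWN_TAIL))
```

Only the ligated molecule carries both universal sequences, so in this model it is the only product the shared amplification primers can exponentially amplify, regardless of how many target-specific pairs are mixed in the reaction.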

Capture by hybridization

In some embodiments, the methods of the invention may involve using any of the following capture-by-hybridization methods, in addition to multiplex PCR, to amplify the loci of interest. Preferential enrichment of a specific set of sequences in a target genome can be achieved in a number of ways. The use of LIPs to target a specific set of sequences is described elsewhere in this document, but in all of those applications other targeting and/or preferential enrichment methods could be used for the same purpose. One example of another targeting method is capture by hybridization. Some examples of commercial capture-by-hybridization technologies are AGILENT's SURE SELECT and Illumina's TRUSEQ. In capture by hybridization, a set of oligonucleotides that are complementary or largely complementary to the desired target sequences is hybridized to a DNA mixture and then physically separated from the mixture. Because the desired sequences have hybridized to the targeting oligonucleotides, physically removing the oligonucleotides also removes the targeted sequences. After the hybridized oligonucleotides have been removed, they can be heated above their melting temperature to release the targeted sequences, which can then be amplified.
One way of physically removing the targeting oligonucleotides is to covalently bond them to a solid support, such as a magnetic bead or a chip. Another is to covalently bond them to a molecular moiety that has a strong affinity for another molecular moiety. One example of such a molecular pair is biotin and streptavidin, as used, for example, in SURE SELECT. Thus, the targeting oligonucleotides can be covalently attached to biotin molecules, and after hybridization a streptavidin-coated solid support can be used to pull down the biotin-labeled oligonucleotides together with the target sequences hybridized to them.

Hybrid capture involves hybridizing probes complementary to the targets of interest to the target molecules. Hybrid capture probes were originally developed to target and enrich large portions of the genome with relative uniformity across targets. In that application, it is important to amplify all targets with sufficient uniformity that every region can be detected by sequencing, but no attention is paid to the allele ratios in the original sample. After capture, the alleles present in the sample can be determined by direct sequencing of the captured molecules, and the reads can be analyzed and counted by allele type. However, with current techniques, the measured allele distribution of the captured sequences often does not represent the original allele distribution.

In one embodiment, the alleles are detected by sequencing. To determine the allelic identity of a polymorphic site, a sequencing read must cover the allele in question so that it can contribute to the estimate of the allelic composition of the captured molecules. Because captured molecules are typically of variable length when sequenced, overlap with the variant position cannot be guaranteed unless the entire molecule is sequenced. However, cost considerations, together with technical limits on the maximum possible read length and on read accuracy, make sequencing entire molecules infeasible. In one embodiment, increasing the read length from about 30 to about 50 or about 70 bases can greatly increase the number of reads that overlap the variant positions within the targeted sequences.
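The effect of read length on coverage of the variant can be estimated by enumeration. This sketch assumes, purely for illustration, captured fragments of one fixed length, a SNP equally likely to sit at any offset within a fragment, and reads that start exactly at the fragment end (or at both ends for paired reads); the function name is hypothetical.

```python
def covering_fraction(fragment_len: int, read_len: int, paired: bool = False) -> float:
    """Fraction of captured fragments whose read(s) cover a SNP at a uniformly random offset."""
    if fragment_len < 1 or read_len < 1:
        raise ValueError("lengths must be positive")
    covered = 0
    for offset in range(fragment_len):  # possible SNP positions within the fragment
        in_read1 = offset < read_len  # read 1 starts at the left end of the fragment
        in_read2 = paired and offset >= fragment_len - read_len  # read 2 starts at the right end
        if in_read1 or in_read2:
            covered += 1
    return covered / fragment_len


for read_len in (30, 50, 70):
    print(read_len, round(covering_fraction(150, read_len), 3))
```

Under these assumptions the fraction is simply `read_len / fragment_len` for single reads (0.2, 0.333 and 0.467 for 150-base fragments), which is why moving from about 30 to about 70 bases more than doubles the number of informative reads.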

Another way to increase the number of reads that interrogate the positions of interest is to reduce the length of the probes, as long as this does not bias the enrichment of the underlying alleles. The synthetic probes should be long enough that two probes designed to hybridize to the two different alleles present at a locus will hybridize to their respective alleles in the original sample with nearly equal affinity. Currently, methods known in the art describe probes that are generally longer than 120 bases. In one embodiment, if the allele is one or a few bases long, the capture probe can be less than about 110, less than about 100, less than about 90, less than about 80, less than about 70, less than about 60, less than about 50, less than about 40, less than about 30, or less than about 25 bases long, and this is sufficient to ensure equal enrichment of all alleles. When the DNA mixture to be enriched by hybrid capture comprises free-floating DNA isolated from blood, such as maternal blood, the average DNA length is very short, typically less than 200 bases. Using shorter probes increases the chance that a hybrid capture probe will capture the desired DNA fragments.
Larger variants may require longer probes. In one embodiment, the variant of interest is one (a SNP) to a few bases long. In one embodiment, targeted regions of the genome can be preferentially enriched using hybrid capture probes shorter than 90 bases, and these can be shorter than 80, 70, 60, 50, 40, 30, or 25 bases. In one embodiment, to increase the chance that the desired allele is sequenced, the length of probes designed to hybridize to the regions flanking the polymorphic allele position can be reduced from more than 90 bases to about 80, about 70, about 60, about 50, about 40, about 30, or about 25 bases.

A minimum overlap between the synthetic probe and the target molecule is required for capture. The synthetic probes can be made as short as possible while remaining longer than this minimum required overlap. The effect of using shorter probes to target polymorphic regions is that more of the captured molecules will overlap the target allele region. The fragmentation state of the original DNA molecules also affects the number of reads that will overlap the targeted allele. Some DNA samples, such as plasma samples, are already fragmented as a result of biological processes occurring in vivo. However, samples with longer fragments will benefit from fragmentation during library preparation and enrichment prior to sequencing. When both probes and fragments are short (about 60-80 bp), maximum specificity can be achieved, with relatively few sequence reads failing to overlap the critical region of interest.
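The claim that shorter probes yield more captured molecules overlapping the allele can be checked with a one-dimensional model. The sketch assumes a SNP at coordinate 0, a probe centred on it, fixed-length fragments captured whenever they overlap the probe by at least a hypothetical `min_overlap`, and a single read from each fragment's left end; every start position is weighted equally.

```python
def on_target_read_fraction(probe_len: int, frag_len: int, read_len: int, min_overlap: int) -> float:
    """Fraction of captured fragments whose read covers the SNP at coordinate 0."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1 base")
    probe_start = -(probe_len // 2)  # probe roughly centred on the SNP
    probe_end = probe_start + probe_len  # exclusive
    captured = covering = 0
    # Every fragment start position that could overlap the probe at all.
    for start in range(probe_start - frag_len, probe_end + 1):
        overlap = min(start + frag_len, probe_end) - max(start, probe_start)
        if overlap >= min_overlap:  # enough overlap for the probe to capture the fragment
            captured += 1
            if start <= 0 < start + read_len:  # read [start, start + read_len) covers the SNP
                covering += 1
    return covering / captured if captured else 0.0


for probe_len in (120, 90, 60):
    print(probe_len, round(on_target_read_fraction(probe_len, 160, 50, 30), 3))
# 120 0.226
# 90 0.262
# 60 0.311
```

With 160-base fragments, 50-base reads and a 30-base minimum overlap, halving the probe length raises the share of captured fragments whose read reaches the SNP by roughly a third, because a shorter probe can only pull down fragments that sit closer to the site.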

In one embodiment, the hybridization conditions can be adjusted to maximize the uniformity of capture of the different alleles present in the original sample. In one embodiment, the hybridization temperature is lowered to minimize hybridization bias between alleles. Methods known in the art avoid lower hybridization temperatures because lowering the temperature increases hybridization of probes to unintended targets. However, when the goal is to preserve allele ratios with maximum fidelity, using a lower hybridization temperature gives the most accurate allele ratios, even though the current art teaches away from this approach. The hybridization temperature can also be raised to require greater overlap between the target and the synthetic probe, so that only targets that substantially overlap the targeted region are captured. In some embodiments of the invention, the hybridization temperature is lowered from the normal hybridization temperature to about 40°C, about 45°C, about 50°C, about 55°C, about 60°C, about 65°C, or about 70°C.

In one embodiment, hybrid capture probes can be designed so that the region of the capture probe that is complementary to DNA flanking the polymorphic allele is not immediately adjacent to the polymorphic site. Instead, the capture probe can be designed so that the region intended to hybridize to the DNA flanking the target polymorphic site is separated, by a short distance of one or a few bases, from the part of the capture probe that would be in van der Waals contact with the polymorphic site. In one embodiment, the hybrid capture probes are designed to hybridize to regions that flank the polymorphic allele but do not cross it; these may be called flanking capture probes. The length of a flanking capture probe can be less than about 120, less than about 110, less than about 100, or less than about 90 bases, and can be less than about 80, 70, 60, 50, 40, 30, or 25 bases. The genomic regions targeted by flanking capture probes can be separated from the polymorphic locus by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11-20, or more than 20 base pairs.
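A flanking capture probe design can be sketched as a slice of the reference on each side of the SNP, with a chosen number of bases left uncovered between probe and site. The function and its `spacing` parameter are hypothetical; the sketch returns the reference-strand target sequence of each probe, and whether the synthesized probe is that sequence or its complement depends on which strand is captured.

```python
from typing import Tuple


def flanking_capture_probes(reference: str, snp_pos: int, probe_len: int, spacing: int) -> Tuple[str, str]:
    """
    Return (left, right) reference-strand target sequences for two probes that flank snp_pos
    without covering it, leaving `spacing` bases between each probe and the polymorphic site.
    """
    if probe_len < 1 or spacing < 0:
        raise ValueError("probe_len must be >= 1 and spacing must be >= 0")
    left_end = snp_pos - spacing  # exclusive end of the left probe target
    left_start = left_end - probe_len
    right_start = snp_pos + 1 + spacing
    right_end = right_start + probe_len
    if left_start < 0 or right_end > len(reference):
        raise ValueError("a probe would run off the reference")
    return reference[left_start:left_end], reference[right_start:right_end]


reference = "AAAAACCCCCGTTTTTGGGGG"  # hypothetical; SNP at index 10 ('G')
print(flanking_capture_probes(reference, snp_pos=10, probe_len=4, spacing=1))
```

Neither probe target includes index 10, so both alleles present a probe-binding sequence that is identical between them, which is the point of keeping the capture region off the polymorphic site.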

Disease screening tests based on targeted capture can be carried out using targeted sequence capture, such as the custom targeted sequence capture products currently offered by Agilent (SURE SELECT), ROCHE-NIMBLEGEN, or Illumina. Capture probes can be custom designed to ensure capture of different types of mutations. For point mutations, one or more probes overlapping the point mutation should be sufficient to capture and sequence the mutation.

For small insertions or deletions, one or more probes overlapping the mutation may be sufficient to capture and sequence the fragments containing it. However, because probes are usually designed against the reference genome sequence, hybridization between a probe and a mutated fragment may be less efficient, limiting capture efficiency. To ensure that fragments containing the mutation are captured, two probes can be designed, one matching the normal allele and one matching the mutant allele. Longer probes can enhance hybridization, and multiple overlapping probes can enhance capture. Finally, placing probes immediately adjacent to, but not overlapping, the mutation may give the normal and mutant alleles relatively similar capture efficiencies.
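The two-probe strategy for a small indel can be sketched by splicing each allele into the same reference flanks. This assumes a 0-based position, VCF-style allele strings, and a hypothetical `flank` parameter for the number of reference bases kept on each side; it returns probe target sequences, not a complete probe design.

```python
from typing import Tuple


def indel_probe_pair(reference: str, pos: int, ref_allele: str, alt_allele: str, flank: int) -> Tuple[str, str]:
    """Return (normal, mutant) probe target sequences centred on a small indel at 0-based pos."""
    if reference[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError("ref_allele does not match the reference at pos")
    left = reference[max(0, pos - flank):pos]
    after = pos + len(ref_allele)
    right = reference[after:after + flank]
    # Same flanks, different centre: one probe matches each allele exactly.
    return left + ref_allele + right, left + alt_allele + right


reference = "GATTACAGGCATTC"  # hypothetical sequence
normal, mutant = indel_probe_pair(reference, pos=6, ref_allele="AGG", alt_allele="A", flank=3)
print(normal, mutant)
```

Here the mutant probe encodes a two-base deletion (`AGG` to `A`), so a fragment carrying the deletion finds a perfectly matching probe instead of one with a two-base loop.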

For short tandem repeats (STRs), probes that overlap these highly variable sites are unlikely to capture fragments well. To enhance capture, probes can be placed adjacent to, but not overlapping, the variable site. The captured fragments can then be sequenced in the normal way, revealing the length and composition of the STR.

For large deletions, a series of overlapping probes can work; this is a common approach in current exon capture systems. With this approach, however, it may be difficult to determine whether an individual is heterozygous for the deletion. Targeting and genotyping SNPs within the captured region can potentially reveal loss of heterozygosity within that region, indicating that the individual is a carrier. In one embodiment, non-overlapping probes or single probes can be placed within the potentially deleted region, and the number of captured fragments used as a measure of heterozygosity. If an individual carries a large deletion, only half as many fragments are expected to be available for capture as at a non-deleted (diploid) reference locus. The number of reads obtained from the deleted region should therefore be roughly half the number obtained from a normal diploid locus. Aggregating and averaging the sequencing read depth of multiple single probes within the potentially deleted region can strengthen the signal and improve diagnostic confidence. The two approaches can also be combined: targeting SNPs to identify loss of heterozygosity, and using multiple single probes to obtain a quantitative measure of the number of underlying fragments from the locus. Either or both of these strategies can be combined with other strategies to better achieve the same goal.
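The read-depth approach to detecting a one-copy deletion can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names and the 0.75 cutoff are hypothetical, all depths come from the same library, and the reference probes lie in loci known to be diploid.

```python
# Sketch: read-depth test for a heterozygous (one-copy) deletion.
# Assumptions: depths come from the same library, each probe captures equally
# well, and the reference probes sit in loci known to be diploid. The 0.75
# cutoff is a hypothetical midpoint between the expected ratios 0.5 and 1.0.
from statistics import mean


def depth_ratio(region_depths, diploid_ref_depths):
    """Mean depth over probes in the candidate region divided by the mean depth
    over diploid reference probes; averaging several single probes damps
    probe-to-probe noise."""
    return mean(region_depths) / mean(diploid_ref_depths)


def call_heterozygous_deletion(region_depths, diploid_ref_depths, cutoff=0.75):
    """Return (estimated copy number, deletion_called)."""
    ratio = depth_ratio(region_depths, diploid_ref_depths)
    return 2 * ratio, ratio < cutoff


print(call_heterozygous_deletion([48, 52, 50], [100, 98, 102]))  # (1.0, True)
print(call_heterozygous_deletion([97, 105, 99], [100, 98, 102]))
```

The second call models a non-deleted region, whose ratio stays close to 1 and whose estimated copy number stays close to 2.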

If, during testing, the cfDNA reveals a male fetus (as indicated by the presence of Y-chromosome fragments captured and sequenced in the same test) together with either an X-linked dominant mutation for which neither the mother nor the father is affected, or a dominant mutation for which the mother is unaffected, this would indicate a high risk to the fetus. Detecting two mutant recessive alleles within the same gene when the mother is unaffected would imply that the fetus inherited a mutant allele from the father and possibly a second mutant allele from the mother. In all cases, follow-up testing by amniocentesis or chorionic villus sampling may be indicated.

Disease screening tests based on targeted capture can be combined with non-invasive prenatal diagnostic tests for aneuploidy that are also based on targeted capture.

There are several ways to reduce depth of read (DOR) variability: for example, primer concentrations can be increased, longer targeted amplification probes can be used, or more STA cycles can be run (e.g., more than 25, more than 30, more than 35, or even more than 40).

Exemplary Methods of Determining the Number of DNA Molecules in a Sample

Described herein is a method for determining the number of DNA molecules in a sample by generating, during the first round of DNA amplification, a uniquely identifiable molecule for each original DNA molecule in the sample. A procedure for accomplishing this, followed by single-molecule or clonal sequencing, is described here.

The method involves targeting one or more specific loci and producing tagged copies of the original molecules in such a way that most or all of the tagged molecules for each target locus carry a unique tag and can be distinguished from one another when these barcodes are sequenced by clonal or single-molecule sequencing. Each uniquely sequenced barcode represents a unique molecule in the original sample. The sequencing data are also used to determine which locus each molecule came from. Using this information, the number of unique molecules for each locus in the original sample can be determined.
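Counting unique molecules per locus amounts to collapsing reads that share both a locus and a molecular barcode. A minimal Python sketch, assuming each read has already been reduced to a `(locus, barcode)` pair; the function name is hypothetical, and barcodes carrying sequencing errors are not merged here, which a real pipeline would likely need to handle.

```python
# Sketch: count unique original molecules per locus from sequenced reads.
# Assumption: each read has already been reduced to (locus, molecular_barcode).
from collections import defaultdict


def unique_molecules_per_locus(reads):
    """reads: iterable of (locus, barcode) pairs, one per sequencing read."""
    barcodes_by_locus = defaultdict(set)
    for locus, barcode in reads:
        barcodes_by_locus[locus].add(barcode)   # PCR duplicates collapse here
    return {locus: len(bcs) for locus, bcs in barcodes_by_locus.items()}


reads = [("locus1", "ACGTA"), ("locus1", "ACGTA"), ("locus1", "TTGCA"),
         ("locus2", "GGATC"), ("locus2", "GGATC")]
print(unique_molecules_per_locus(reads))  # {'locus1': 2, 'locus2': 1}
```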

This method can be used in any application that requires a quantitative estimate of the number of molecules in an original sample. In addition, the number of unique molecules of one or more targets can be compared with the number of unique molecules of one or more other targets to determine relative copy number, allele distribution, or allele ratio. Alternatively, the copy numbers detected for the various targets can be modeled with distributions to identify the most likely copy number of each original target. Applications include, but are not limited to: detecting insertions and deletions, such as those found in carriers of Duchenne muscular dystrophy; quantifying deleted or duplicated chromosome segments, such as those observed in copy number variants; determining chromosome copy number in samples from born individuals; and determining chromosome copy number in samples from unborn individuals (e.g., embryos or fetuses).
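One way to model copy number from unique-molecule counts, offered here only as an illustrative assumption since the text does not specify the distribution, is a Poisson likelihood whose rate scales with copy number and is calibrated on loci of known copy number. The helper names and the candidate copy numbers are hypothetical.

```python
# Sketch: pick the most likely copy number of a target by modelling its
# unique-molecule count as Poisson. The Poisson model, the reference loci of
# known copy number, and the candidate set are illustrative assumptions.
import math
from statistics import mean


def poisson_logpmf(k: int, mu: float) -> float:
    """log P(K = k) for a Poisson distribution with mean mu > 0."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def most_likely_copy_number(target_count, reference_counts, reference_copy=2,
                            candidates=(1, 2, 3, 4)):
    """target_count: unique molecules seen at the target locus.
    reference_counts: unique molecules at loci assumed to carry `reference_copy`
    copies. Candidates must be >= 1 (a zero mean has no finite log-likelihood)."""
    per_copy = mean(reference_counts) / reference_copy   # molecules per chromosome copy
    return max(candidates,
               key=lambda c: poisson_logpmf(target_count, c * per_copy))


print(most_likely_copy_number(150, [100, 100]))  # 3
print(most_likely_copy_number(100, [95, 105]))   # 2
```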

The method can be combined with simultaneous estimation of the variation contained in the targeted sequences, which can be used to determine the number of molecules representing each allele in the original sample. This copy number approach can be combined with: estimation of SNPs or other sequence variation to determine chromosome copy number in born and unborn individuals; identification and quantification of copies from loci that carry short sequence variations but where PCR amplifies copies of multiple target regions, e.g., carrier detection for spinal muscular atrophy; and determination of the copy number of molecules from different sources in a sample consisting of a mixture from different individuals, e.g., detection of fetal aneuploidy from free-floating DNA obtained from maternal plasma.

In one embodiment, as applied to a single target locus, the method may comprise one or more of the following steps:
(1) Design a pair of standard oligomers for PCR amplification of the specific locus.
(2) During synthesis, add to the 5′ end of one of the target-specific oligomers a sequence of specified bases that has no or minimal complementarity to the target locus or the genome. This sequence, called the tail, is a known sequence used for subsequent amplification, and it is followed by a random nucleotide sequence. These random nucleotides make up the random region: a randomly generated nucleic acid sequence that is, with high probability, different in each probe molecule. Thus, after synthesis, the pool of tailed oligomers will consist of oligomers that begin with the known sequence, followed by an unknown sequence that varies from molecule to molecule, followed by the target-specific sequence.
(3) Perform one round of amplification (denaturation, annealing, extension) using only the tailed oligomers.
(4) Add an exonuclease to the reaction, effectively stopping the PCR, and incubate the reaction at an appropriate temperature to remove single-stranded forward oligonucleotides that did not anneal to the template and extend to form double-stranded product.
(5) Incubate the reaction at high temperature to denature the exonuclease and eliminate its activity.
(6) Add to the reaction new oligonucleotides complementary to the tail of the oligomer used in the first reaction, together with the other target-specific oligomer, to enable PCR amplification of the product generated in the first round of PCR.
(7) Continue amplification to generate enough product for downstream clonal sequencing.
(8) Measure the amplified PCR product by any of several methods, e.g., clonal sequencing of enough bases to cover the sequence.
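Step (2) produces a pool in which every oligomer shares the tail and the target-specific sequence but carries its own random region. The Python sketch below models that pool in silico only; the tail and target-specific sequences are made-up placeholders, the helper names are hypothetical, and uniform random base choice is an idealization of synthesis.

```python
# Sketch: in-silico model of the tailed-oligomer pool from step (2):
# known tail + random region + target-specific sequence.
import random

TAIL = "GTCAGCTACGATCGTACG"            # hypothetical tail (placeholder, not a real primer)
TARGET_SPECIFIC = "GATTCCAGGTAGCTAGT"  # hypothetical locus-specific sequence


def tailed_oligo_pool(tail, target_specific, random_len, n, seed=0):
    """Return n oligos, each with an independently drawn random region,
    mimicking synthesis with all four bases supplied at those positions."""
    rng = random.Random(seed)
    return [tail + "".join(rng.choice("ACGT") for _ in range(random_len)) + target_specific
            for _ in range(n)]


def random_region(oligo, tail, target_specific):
    """Recover the random region from a pool member."""
    return oligo[len(tail):len(oligo) - len(target_specific)]


pool = tailed_oligo_pool(TAIL, TARGET_SPECIFIC, random_len=10, n=5)
for oligo in pool:
    print(random_region(oligo, TAIL, TARGET_SPECIFIC))
```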

In one embodiment, the methods of the invention involve targeting multiple loci, in parallel or otherwise. Primers for different target loci can be generated independently and mixed to create a multiplex PCR pool. In one embodiment, the original sample can be divided into subpools, and different loci can be targeted in each subpool before the subpools are recombined and sequenced. In one embodiment, the tagging step and a number of amplification cycles can be performed before the pool is subdivided, to ensure that all targets are efficiently targeted before splitting; subsequent amplification is then improved by continuing it with the smaller primer sets in the subdivided pools.

One example of an application for which this technique is particularly well suited is non-invasive prenatal aneuploidy diagnosis, where the allele ratio at a given locus, or the distribution of alleles across multiple loci, can help determine the number of copies of a chromosome present in the fetus. In this case, it is desirable to amplify the DNA in the original sample while preserving the relative amounts of the individual alleles. In some cases, especially when very small amounts of DNA are present (e.g., fewer than 5,000, fewer than 1,000, fewer than 500, or fewer than 100 genome copies), a phenomenon known as the bottleneck effect may be encountered. This occurs when there are few copies of any given allele in the original sample, so that amplification bias can make the ratio of those alleles in the amplified DNA pool differ markedly from the ratio in the original DNA mixture. By applying a set of unique or nearly unique barcodes to each DNA strand before standard PCR amplification, it is possible to exclude n-1 DNA copies from any set of n identical sequenced DNA molecules derived from the same original molecule.

For example, consider a heterozygous SNP in an individual's genome and a DNA mixture from that individual in which ten molecules of each allele are present in the original DNA sample. After amplification, there may be 100,000 DNA molecules corresponding to that locus. Because amplification is stochastic, the allele ratio may be anywhere from 1:2 to 2:1. However, because each original molecule was labeled with a unique tag, it is possible to determine that the DNA in the amplified pool derives from exactly 10 molecules of each allele. This method therefore gives a more accurate measure of the relative amount of each allele than methods that do not use it. For methods in which allele bias must be kept to a minimum, this approach will provide more accurate data.
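The ten-molecule example can be simulated to show why collapsing reads by barcode recovers the true allele counts even when amplification is uneven. In this Python sketch the log-normal per-molecule amplification weights are an assumption chosen only to create bias, the helper names are hypothetical, and the barcodes are unique by construction (real random barcodes can occasionally collide).

```python
# Sketch: barcodes versus the bottleneck effect. Ten molecules of each allele
# are amplified with random per-molecule efficiency, then reads are counted
# with and without collapsing identical (allele, barcode) pairs.
import random
from collections import Counter


def simulate_reads(molecules, total_reads, seed=0):
    """molecules: list of (allele, barcode). Each molecule's share of reads
    follows its random amplification weight (log-normal: an assumption)."""
    rng = random.Random(seed)
    weights = [rng.lognormvariate(0.0, 1.0) for _ in molecules]
    return rng.choices(molecules, weights=weights, k=total_reads)


def allele_counts(reads, collapse_barcodes):
    if collapse_barcodes:
        reads = set(reads)            # one entry per original tagged molecule
    return Counter(allele for allele, _ in reads)


molecules = ([("ref", f"r{i}") for i in range(10)] +
             [("alt", f"a{i}") for i in range(10)])
reads = simulate_reads(molecules, total_reads=100_000)
print("raw reads:", allele_counts(reads, collapse_barcodes=False))
print("molecules:", allele_counts(reads, collapse_barcodes=True))
```

The raw read counts typically depart from 1:1, while the collapsed counts return exactly 10 molecules per allele.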

Sequenced fragments can be associated with their target loci in several ways. In one embodiment, a sequence is obtained from each targeted fragment that is long enough to cover the molecular barcode plus enough unique bases of the target sequence to identify the target locus unambiguously. In another embodiment, the molecular barcode primer containing the randomly generated molecular barcode can also contain a locus-specific barcode (locus barcode) that identifies the target to which it binds. This locus barcode is the same in all molecular barcode primers for a given target, and therefore in all resulting amplicons, but differs from the locus barcodes of all other targets. In one embodiment, the tagging methods described herein can be combined with a one-sided nested scheme.

In one embodiment, molecular barcode primers can be designed and produced as follows. A molecular barcode primer can consist of a sequence that is not complementary to the target sequence, followed by a random molecular barcode region, followed by a target-specific sequence. The sequence 5′ of the molecular barcode can be used for subsequent PCR amplification and can include sequences suitable for converting the amplicons into a library for sequencing. Random molecular barcode sequences can be generated in several ways. A preferred method synthesizes the molecular tagging primers such that all four bases are included in the reaction during synthesis of the barcode region. All bases, or particular combinations of bases, can be specified using the IUPAC DNA ambiguity codes. In this way, the synthesized collection of molecules will contain a random mixture of sequences in the molecular barcode region. The length of the barcode region determines how many primers will carry a unique barcode: the number of unique sequences is N^L, where N is the number of possible bases at each position (usually 4) and L is the barcode length. A five-base barcode can yield up to 1,024 unique sequences; an eight-base barcode can yield 65,536. In one embodiment, DNA can be measured by a sequencing method in which the sequence data represent the sequence of single molecules. This can include methods that sequence single molecules directly, or methods that amplify single molecules into clones that are detectable by the sequencing instrument but still represent a single molecule, referred to herein as clonal sequencing.
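The N^L relationship, and the effect of barcode length on how many tagged molecules actually receive distinct barcodes, can be checked numerically. This Python sketch uses the standard IUPAC nucleotide ambiguity codes; the helper names are hypothetical, and the expected-distinct formula assumes each molecule draws a barcode uniformly at random, which idealizes real synthesis.

```python
# Sketch: barcode diversity for a degenerate (IUPAC-coded) barcode region and
# the expected number of distinct barcodes when many molecules are tagged.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def barcode_diversity(pattern: str) -> int:
    """Number of sequences encoded by an IUPAC pattern (N^L when every position is N)."""
    total = 1
    for code in pattern.upper():
        total *= len(IUPAC[code])
    return total


def expected_distinct_barcodes(n_molecules: int, diversity: int) -> float:
    """Expected distinct barcodes when each molecule draws one uniformly at random."""
    return diversity * (1 - (1 - 1 / diversity) ** n_molecules)


print(barcode_diversity("NNNNN"), barcode_diversity("NNNNNNNN"))   # 1024 65536
print(round(expected_distinct_barcodes(1000, 1024)))
print(round(expected_distinct_barcodes(1000, 65536)))
```

Under that idealization, tagging 1,000 molecules with a five-base barcode would yield only about 639 distinct barcodes, whereas an eight-base barcode would yield about 992, which is why a longer barcode region makes most tags unique.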

Exemplary Methods and Reagents for Quantifying Amplification Products

Specific nucleic acid sequences of interest are usually quantified by quantitative real-time PCR techniques, such as TaqMan (Life Technologies) or INVADER probes (THIRD WAVE TECHNOLOGIES). Such techniques suffer from several drawbacks, including a limited ability to analyze multiple sequences simultaneously (multiplexing) and the ability to provide accurate quantitative data only over a narrow range of amplification cycles (e.g., where the logarithm of PCR product yield versus cycle number is linear). DNA sequencing technologies, especially high-throughput next-generation sequencing (commonly called massively parallel sequencing), such as those used in the MISEQ (Illumina), HISEQ (Illumina), ION TORRENT (Life Technologies), GENOME ANALYZER IIX (Illumina), and GS FLX+ (Roche 454), can be used to measure the copy number of sequences of interest in a sample quantitatively, thereby providing quantitative information about the starting material, such as copy number or transcript level. High-throughput sequencers can use barcodes (i.e., labeling samples with unique nucleic acid sequences) to identify specific samples from individuals, allowing multiple samples to be analyzed simultaneously in a single sequencer run. The number of times a given genomic region is sequenced (the number of reads) in a library preparation (or other nucleic acid preparation) will be proportional to the copy number of that sequence in the genome of interest (or, for preparations containing cDNA, to its expression level). However, the preparation and sequencing of libraries (and similar genome-derived preparations) can introduce many biases that interfere with obtaining accurate quantitative readouts of the sequences of interest. For example, different nucleic acid sequences can be amplified with different efficiencies during the nucleic acid amplification steps of library or sample preparation.

The problem of differing amplification efficiencies can be mitigated by certain embodiments of the invention. The invention includes methods and compositions in which standards are included in the amplification process to improve quantitative accuracy. The invention is particularly suited to detecting fetal aneuploidy by analyzing free-floating fetal DNA in maternal blood, as described herein and elsewhere: US Patent No. 8,008,018; US Patent No. 7,332,277; PCT Published Application WO 2012/078792A2; and PCT Published Application WO 2011/146632A1, each of which is incorporated herein by reference in its entirety. Embodiments of the invention are also applicable to detecting aneuploidy in embryos produced in vitro. Commercially important aneuploidies that can be detected include those of human chromosomes 13, 18, 21, X, and Y.

Embodiments of the invention can be used with human or non-human nucleic acids, including nucleic acids of animal and plant origin. Embodiments may also be used to detect and/or quantify alleles of other genetic diseases characterized by deletions or insertions. Alleles containing deletions can be detected in suspected carriers of the relevant alleles.

One embodiment of the invention includes standards present in known (relative or absolute) quantities. As an example, consider a library made from a genetic source in which chromosome 8 (containing locus A) is diploid and chromosome 21 (containing locus B) is triploid. A library generated from such a sample will contain numbers of sequences that are a function of the number of chromosome copies present in the sample, e.g., 200 copies of locus A and 300 copies of locus B. However, if locus A is amplified much more efficiently than locus B, there may be 60,000 copies of the A amplicon and 30,000 copies of the B amplicon after PCR, which would obscure the true chromosome copy number of the original genomic sample when analyzed by high-throughput DNA sequencing (or another quantitative nucleic acid detection technique). To mitigate this problem, a standard sequence is used for locus A whose amplification efficiency is substantially the same as that of locus A. Similarly, a standard sequence is created for locus B whose amplification efficiency is substantially the same as that of locus B. Before PCR (or another amplification technique), the standard sequences for loci A and B are added to the mixture in known (relative or absolute) quantities. Thus, if in the preceding example a 1:1 mixture of standard A and standard B were added before amplification, 3,000 copies of the standard A amplicon and 1,000 copies of the standard B amplicon would be produced, indicating that under the same set of conditions locus A is amplified three times as efficiently as locus B.
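The arithmetic of the locus A / locus B example can be written out explicitly. The sketch assumes 10 input copies of each standard (the text specifies only a 1:1 mix); with that choice the standards imply 300-fold amplification at locus A and 100-fold at locus B, consistent with the stated 200 to 60,000 and 300 to 30,000 copy numbers. The function name is hypothetical.

```python
# Sketch: correcting locus-specific amplification efficiency with spiked-in
# standards, using the numbers from the example. The 10-copy spike-in amount
# is an assumption; the source states only a 1:1 ratio of standards.


def corrected_input_copies(target_amplicons, standard_amplicons, standard_input):
    """Estimate pre-amplification copies of each target, taking the standard's
    fold amplification (amplicons / input copies) as the target's as well."""
    return {
        locus: target_amplicons[locus] / (standard_amplicons[locus] / standard_input[locus])
        for locus in target_amplicons
    }


targets = {"A": 60_000, "B": 30_000}     # target amplicons after PCR
standards = {"A": 3_000, "B": 1_000}     # standard amplicons after PCR
spiked = {"A": 10, "B": 10}              # assumed 1:1 spike-in, 10 copies each

est = corrected_input_copies(targets, standards, spiked)
print(est)                    # {'A': 200.0, 'B': 300.0}
print(est["B"] / est["A"])    # 1.5, i.e. three copies of chromosome 21 versus two of chromosome 8
```

Even without absolute spike-in amounts, the ratio alone suffices here: (60,000 / 3,000) : (30,000 / 1,000) = 20 : 30, which restores the true 2 : 3 ratio, whereas the uncorrected amplicon counts give 2 : 1.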

In various embodiments, one or more selected regions of the genome containing SNPs (or other polymorphisms) of interest can be specifically amplified and then sequenced. This target-specific amplification can take place while the library for sequencing is being formed. The library can contain multiple targeted regions for amplification; in some embodiments, at least 10, 100, 500, 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 regions of interest. Examples of such libraries are described herein and can be found in US Patent Application No. 2012/0270212, filed November 18, 2011, which is incorporated herein by reference in its entirety.

Many high-throughput DNA sequencing technologies require modification of the genetic starting material, e.g., ligation of universal priming sites and/or barcodes, to form a library that facilitates clonal amplification of small nucleic acid fragments before the sequencing reactions are performed. In some embodiments, one or more standard sequences are added during library formation, or added to a precursor component of the library before library amplification. Standard sequences can be chosen to mimic (while remaining distinguishable by nucleotide sequence) the target genomic fragments being prepared for high-throughput sequencing. In one embodiment, a standard sequence can be identical to the target genomic fragment except at one, two, three, four to ten, or eleven to twenty nucleotides. In some embodiments, when the target sequence contains a SNP, the standard sequence can be identical to it except at the polymorphic base, where the nucleotide can be chosen to be one of the four nucleotides that is not observed at that position in nature. Standard sequences can be used in highly multiplexed analysis of many target loci (e.g., polymorphic loci). Known (relative or absolute) quantities of standard sequences can be added during library formation (before amplification) to provide a more accurate standard for determining the amounts of the target sequences in the analyzed sample. Knowledge of the known quantities of the standard sequences, combined with knowledge of the ploidy of sequencing libraries formed from genomes of previously characterized ploidy (e.g., genomes in which all autosomes are known to be diploid), can be used to calibrate the amplification properties of each standard sequence relative to its corresponding target sequence and to account for batch-to-batch variation in mixtures containing multiple standard sequences.

Because many loci usually must be analyzed simultaneously, it is useful to produce a mixture containing a large set of standard sequences, and embodiments of the invention include mixtures comprising multiple standard sequences. Ideally, the amount of each standard sequence in the mixture would be known with high precision. This ideal is very difficult to achieve, however, because in practice the amount of each standard sequence in a mixture varies considerably, especially in mixtures containing many different synthetic oligonucleotides. This variation has many sources, such as batch-to-batch differences in the efficiency of in vitro oligonucleotide synthesis, inaccurate volume measurements, and variation in pipetting. Such variation can also arise between batches that in theory contain exactly the same set of standard sequences in exactly the same amounts. It is therefore useful to calibrate each batch of standard sequences independently. Each batch can be calibrated against a reference genome of known chromosomal composition, or by sequencing the batch with a protocol that includes minimal or no amplification. Embodiments of the invention include calibrated mixtures of different standard sequences, methods of calibrating such mixtures, and calibrated mixtures obtained by those methods.
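Calibrating a batch against a reference genome of known ploidy can be sketched in two steps: infer each standard's effective input amount from a reference run, then reuse those amounts on test samples. The Python below is illustrative only; the function names are hypothetical, and it assumes each standard co-amplifies exactly like its target and that the number of genome equivalents loaded is known for both runs.

```python
# Sketch: calibrating one batch of standards against a reference genome of
# known ploidy, then applying the calibrated amounts to a test sample.
# Assumptions (not from the source): read counts are proportional to input
# copies for both standard and target, and genome equivalents loaded are known.


def calibrate_standards(ref_target_reads, ref_standard_reads,
                        ref_genome_equivalents, ref_ploidy=2):
    """Effective input copies of each standard in this batch:
    S = ploidy * G_ref * (standard reads / target reads)."""
    return {
        locus: ref_ploidy * ref_genome_equivalents
        * ref_standard_reads[locus] / ref_target_reads[locus]
        for locus in ref_target_reads
    }


def estimate_ploidy(test_target_reads, test_standard_reads,
                    standard_copies, test_genome_equivalents):
    """Copies of each target per genome equivalent in the test sample."""
    return {
        locus: standard_copies[locus] * test_target_reads[locus]
        / (test_standard_reads[locus] * test_genome_equivalents)
        for locus in test_target_reads
    }


# Reference run: a diploid genome, 100 genome equivalents loaded.
batch = calibrate_standards({"A": 2000, "B": 1000}, {"A": 1500, "B": 500}, 100)
print(batch)  # {'A': 150.0, 'B': 100.0}
# Test run using the same batch of standards.
print(estimate_ploidy({"A": 3000, "B": 1000}, {"A": 1500, "B": 500}, batch, 100))
```

In this toy example the two standards turn out to be present at different effective amounts, which is exactly the batch-level variation the calibration step is meant to absorb; the test sample then reads as three copies at locus A and two at locus B.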

Various embodiments of the standard sequence mixtures of the invention, and of methods of using them, can comprise at least 10, 100, 500, 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 or more standard sequences, or any intermediate number. The number of standard sequences can equal the number of target sequences selected for analysis when generating a targeted library for DNA sequencing. In some embodiments, however, it may be advantageous to use fewer standard sequences than there are targeted regions in the constructed library, so as to avoid exceeding the sequencing capacity of the high-throughput DNA sequencer used. The number of standard sequences can be 50% or less, 40% or less, 30% or less, 20% or less, 10% or less, 5% or less, or 1% or less of the number of targeted regions, or any intermediate value. For example, if a library is created using 15,000 primer pairs targeting loci that contain specific SNPs, a suitable mixture of 1,500 standard sequences corresponding to 1,500 of the 15,000 target loci could be added before the amplification step of library construction.

The amount of standard sequence added during library construction can vary considerably between embodiments. In some embodiments, the amount of each standard sequence can be about the same as the expected amount of the corresponding target sequence in the sample of genomic material used for library preparation. In other embodiments, the amount of each standard sequence can be greater or less than that expected amount. Although the initial relative amounts of target and standard sequences are not critical to the function of the invention, the amount of standard is preferably in the range from 100 times more to 100 times less than the amount of target sequence in the sample of genomic material used for library preparation. Too much standard consumes an excessive share of the sequencer's capacity in a given run. Too little standard yields insufficient data to help analyze variation in amplification efficiency.
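The trade-off between too much and too little standard can be estimated from the expected share of reads the standards consume. A minimal Python sketch, assuming every target and standard amplifies equally and every target has the same expected input copies; the function name is hypothetical.

```python
# Sketch: share of sequencing reads spent on standards under equal
# amplification and equal expected input copies per targeted locus.


def standard_read_fraction(n_targets, n_standards, standard_to_target_ratio):
    """Expected fraction of reads coming from standards.

    standard_to_target_ratio: copies of each standard per copy of its target;
    the preferred range in the text is 100-fold less to 100-fold more.
    """
    if not 0.01 <= standard_to_target_ratio <= 100:
        raise ValueError("ratio outside the preferred 0.01-100 range")
    if n_standards > n_targets:
        raise ValueError("more standards than targeted regions")
    standard_units = n_standards * standard_to_target_ratio
    return standard_units / (n_targets + standard_units)


# 15,000 targeted loci with standards for 1,500 of them, as in the example.
for ratio in (0.1, 1, 10, 100):
    print(ratio, round(standard_read_fraction(15_000, 1_500, ratio), 3))
```

For the 15,000-locus example with 1,500 standards, equal standard and target amounts would consume about 9% of reads, while a 100-fold excess would consume about 91%.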

A standard sequence can be selected whose nucleotide base sequence is very similar to that of the amplified region of interest; preferably, the standard sequence has exactly the same primer binding sites as the genomic region being analyzed, i.e., the "target sequence". A standard sequence must be distinguishable from the corresponding target sequence at a given locus. For convenience, the distinguishing region of a standard sequence will be referred to as the "marker sequence". In some embodiments, the marker sequence region of the target sequence contains a polymorphic region, such as a SNP, and may be flanked on both sides by primer binding regions. A standard sequence can be selected that closely matches the GC content of the corresponding target sequence. In some embodiments, the primer binding regions of the standard sequence are flanked by universal priming sites. These universal priming sites are chosen to match those used in the genomic library being analyzed. In other embodiments, the standard sequence does not have universal priming sites, and these are added during library generation. Standard sequences are usually provided in single-stranded form. Standard sequences are defined relative to the corresponding target sequences and to the sequence-specific reagents used to amplify those target sequences. In some embodiments, the target sequence present in the nucleic acid sample to be analyzed contains a polymorphism of interest, such as a SNP, deletion, or insertion. A standard sequence is a synthetic polynucleotide whose nucleotide base sequence is similar to that of the target sequence but differs from it by at least one nucleotide base, thereby providing a means of distinguishing amplicon sequences derived from the standard sequence from those derived from the target sequence. A standard sequence is selected to have substantially the same amplification properties as the corresponding target sequence when amplified with the same set of amplification reagents (e.g., PCR primers). In some embodiments, a standard sequence can have the same primer binding sites as the corresponding target sequence. In other embodiments, a standard sequence may have primer binding sites that differ from those of the corresponding target sequence. In some embodiments, standard sequences can be selected to produce amplicons of the same length as the amplicons generated from the corresponding target sequences. In other embodiments, standard sequences may be selected to produce amplicons whose length differs slightly from that of the amplicons generated from the corresponding target sequences.
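The design rules above (identical primer-binding sites, at least one distinguishing base in the marker region, similar GC content) can be illustrated with a small Python sketch. The sequence, primer-site lengths, and substitution position are hypothetical, and a real design would also need to check the other properties mentioned, such as amplification behavior.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def make_standard(target: str, fwd_len: int, rev_len: int, pos: int, new_base: str) -> str:
    """Return a synthetic standard that keeps the target's primer-binding sites
    and differs from it by a single substitution in the marker region.

    fwd_len / rev_len: lengths of the primer-binding sites at the 5' and 3'
    ends of the target (hypothetical layout).
    """
    target = target.upper()
    if not fwd_len <= pos < len(target) - rev_len:
        raise ValueError("substitution must lie between the primer-binding sites")
    if len(new_base) != 1 or new_base not in "ACGT" or new_base == target[pos]:
        raise ValueError("new_base must be a single, different A, C, G or T")
    return target[:pos] + new_base + target[pos + 1:]


target = "ACGTTGCAGGATCCATGCAA"                   # hypothetical 20-nt target region
standard = make_standard(target, 5, 5, 10, "T")  # A->T keeps GC content unchanged
print(standard)  # ACGTTGCAGGTTCCATGCAA
print(sum(a != b for a, b in zip(target, standard)), gc_content(target) == gc_content(standard))
```

Swapping A and T (or G and C) at the marker position keeps the GC content identical, which is one simple way to keep the standard's behavior close to the target's.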

After the amplification reaction has been completed, the library is sequenced on a high-throughput DNA sequencer, in which individual molecules are clonally amplified and sequenced. The number of sequence reads for each allele of the target sequence is counted, as is the number of sequence reads for the standard sequence corresponding to that target sequence. The process is also performed for at least one other pair of a target sequence and its corresponding standard sequence. Consider, for example, locus A: X_A1 reads are generated for allele 1 of locus A, X_A2 reads for allele 2 of locus A, and X_AC reads for standard sequence A. The ratio (X_A1 + X_A2)/X_AC is determined for each locus of interest. As previously discussed, the process can be performed on a reference genome, such as a genome in which all chromosomes are known to be diploid. The process can be repeated multiple times to provide a large number of read counts, from which the mean number of reads and its standard deviation are determined. The process is performed on a mixture comprising a large number of different standard sequences corresponding to different loci. By assuming that (1) X_A1 + X_A2 corresponds to a known number of chromosome copies, e.g., 2 for a normal human female genome, and (2) each standard sequence has amplification (and detectability) properties similar to those of its corresponding natural locus, the relative amounts of the different standard sequences in a composite standard mixture can be determined. The calibrated composite standard sequence mixture can then be used to adjust for variability in amplification efficiency between different loci in the multiplex amplification reaction.
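Under the two assumptions stated above, each locus in a reference run gives an estimate of how many chromosome-copy equivalents of its standard were added: 2 × X_AC / (X_A1 + X_A2) for a diploid reference. The Python sketch below averages that estimate over replicate runs and then uses it to convert a test sample's read counts into a copy-number estimate. The function names, the data layout, and the specific conversion formula are illustrative assumptions consistent with the text, not a procedure the text spells out; the read counts are invented.

```python
from statistics import mean, stdev


def calibrate_standards(replicates, reference_copies=2.0):
    """Estimate the amount of each standard in chromosome-copy units.

    replicates: list of runs on a reference genome; each run maps
    locus -> (x1, x2, xc) = reads for allele 1, allele 2 and the standard.
    Returns locus -> (mean, standard deviation) over the runs.
    """
    per_locus = {}
    for run in replicates:
        for locus, (x1, x2, xc) in run.items():
            per_locus.setdefault(locus, []).append(reference_copies * xc / (x1 + x2))
    return {locus: (mean(v), stdev(v) if len(v) > 1 else 0.0)
            for locus, v in per_locus.items()}


def estimate_copy_number(x1, x2, xc, standard_copies):
    """Copy number at a locus in a test sample, scaled by its calibrated standard."""
    return standard_copies * (x1 + x2) / xc


reference_runs = [  # hypothetical read counts from a known-diploid genome
    {"A": (500, 480, 1000), "B": (300, 310, 150)},
    {"A": (510, 490, 1020), "B": (290, 300, 160)},
]
calibration = calibrate_standards(reference_runs)
print(calibration["A"])
print(estimate_copy_number(760, 740, 1000, calibration["A"][0]))  # about 3 copies
```

Because a target and its standard share primers, locus-specific amplification efficiency largely cancels in the ratio (X_A1 + X_A2)/X_AC, which is what lets one calibration be reused across samples.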

Other embodiments of the invention include methods and compositions for measuring the copy number of specific genes of interest, including duplicated genes and mutant genes characterized by large deletions that would interfere with quantification by sequencing. Alleles carrying such deletions can be difficult to detect by sequencing. Amplification procedures that include standard sequences can be used to reduce this problem.

In one embodiment of the invention, the target sequence to be analyzed is a gene that has a wild-type (i.e., functional) form and a mutant form characterized by a deletion. An example of such a gene is SMN1, which has alleles carrying deletions that cause the genetic disease spinal muscular atrophy (SMA). It is of interest to detect individuals carrying the mutant form of the gene by means of high-throughput sequencing. Applying such techniques to the detection of deletion mutations can be problematic, in particular because the deleted sequence is simply not observed in the sequencing data (in contrast to the detection of simple point mutations or SNPs). Such embodiments employ (1) a pair of amplification primers specific for the gene of interest, which amplify the gene of interest (or a portion thereof) but do not significantly amplify the mutant allele; (2) a standard sequence that corresponds to the wild-type allele of the gene of interest (i.e., the target sequence) but differs from it by at least one detectable nucleotide base; (3) a pair of amplification primers specific for a second target sequence that serves as a reference sequence; and (4) a standard sequence corresponding to the reference sequence.

In one embodiment of the invention, a method is provided for measuring the copy number of a gene of interest that has an allele of interest comprising a deletion. The method may employ amplification reagents specific for the gene of interest, for example PCR primers that are specific for the gene because they amplify at least a portion of the gene, the entire gene, or a region adjacent to it, without amplifying the deletion-containing allele of the gene. In addition, the method employs a standard sequence corresponding to the gene of interest, wherein the standard sequence differs from the gene of interest by at least one nucleotide base (so that the standard sequence can be readily distinguished from the naturally occurring gene). Typically, this standard sequence will contain the same primer binding sites as the gene of interest, in order to minimize any amplification bias between the gene and its standard sequence. The reaction also contains amplification reagents specific for a reference sequence, i.e., a sequence with a known (or at least assumed) copy number in the genome to be analyzed. The reaction further comprises a standard sequence corresponding to the reference sequence. Typically, this standard sequence will contain the same primer binding sites as the reference sequence, in order to minimize any amplification bias between the reference sequence and its standard sequence.
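One way to turn the four read counts (gene, gene standard, reference, reference standard) into a copy-number estimate is to normalize each sequence by its own standard and then scale by the reference's known copy number. The sketch assumes, hypothetically, that the two standards are added in known relative amounts (equal by default) and that each sequence amplifies like its standard; the function name and the read counts in the example are invented.

```python
def gene_copy_number(gene_reads, gene_std_reads, ref_reads, ref_std_reads,
                     ref_copy_number=2.0, gene_std_amount=1.0, ref_std_amount=1.0):
    """Estimate the copy number of a gene of interest (e.g. SMN1) from read counts.

    Each sequence is normalized by its own standard, which cancels locus-specific
    amplification efficiency; the reference's known copy number sets the scale.
    gene_std_amount / ref_std_amount: relative amounts of the two standards added.
    """
    gene_signal = gene_reads / gene_std_reads * gene_std_amount
    ref_signal = ref_reads / ref_std_reads * ref_std_amount
    return ref_copy_number * gene_signal / ref_signal


print(gene_copy_number(800, 800, 900, 900))  # 2.0
print(gene_copy_number(400, 800, 900, 900))  # 1.0
```

With equal standard amounts and a diploid reference, a gene whose reads are half those of its standard comes out at one copy; for SMN1 such a result would be consistent with a single functional copy.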

Exemplary Nucleic Acid Samples

In some embodiments, genetic samples can be prepared and/or purified. A variety of standard procedures for this purpose are known in the art. In some embodiments, the sample can be centrifuged to separate the layers. In some embodiments, DNA can be isolated by filtration. In some embodiments, preparation of the DNA may involve amplification, separation, purification by chromatography, liquid-liquid separation, isolation, preferential enrichment, preferential amplification, targeted amplification, or any of a variety of other techniques known in the art or described herein.

In some embodiments, the methods disclosed herein can be used where very little DNA is available, for example in in vitro fertilization, or in forensic cases where only one or a few cells (typically fewer than ten, fewer than twenty, or fewer than forty cells) are available. In these embodiments, the methods disclosed herein are used to make ploidy calls from small amounts of DNA that are not contaminated with other DNA, but for which ploidy calling is nevertheless very difficult. In some embodiments, the methods disclosed herein can be used where the DNA of interest is contaminated with DNA from another individual, for example maternal blood in the context of prenatal diagnosis, paternity testing, or products-of-conception testing. These methods would also be particularly advantageous in cancer testing, where only one or a small number of cells of interest are present among a much larger number of normal cells. Genetic measurements used as part of these methods can be performed on any sample comprising DNA or RNA, such as, but not limited to, blood, plasma, body fluids, urine, hair, tears, saliva, tissue, skin, nails, blastomeres, embryos, amniotic fluid, chorionic villus samples, stool, bile, lymph, cervical mucus, semen, or other cells or materials containing nucleic acids. In one embodiment, the methods disclosed herein can be performed using nucleic acid detection methods such as sequencing, microarrays, qPCR, digital PCR, or other methods for measuring nucleic acids. If desired, the ratio of allele count probabilities at a locus can be calculated, and these allele ratios can be used in combination with some of the methods described herein to determine ploidy state, provided the methods are compatible. In some embodiments, the methods disclosed herein involve calculating, on a computer, allele ratios at a plurality of polymorphic loci from DNA measurements made on processed samples. In some embodiments, the methods disclosed herein involve calculating, on a computer, allele ratios at a plurality of polymorphic loci from DNA measurements made on processed samples, in any combination with the other improvements described herein.

In some embodiments, this method can be used to genotype a single cell, a small number of cells, two to five cells, six to ten cells, ten to twenty cells, twenty to fifty cells, fifty to one hundred cells, one hundred to one thousand cells, or a small amount of extracellular DNA, for example one to ten picograms, ten to one hundred picograms, one hundred picograms to one nanogram, one to ten nanograms, ten to one hundred nanograms, or one hundred nanograms to one microgram.
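To relate the DNA masses above to the cell counts, a commonly used approximation is about 3.3 pg of DNA per haploid human genome (roughly 6.6 pg per diploid cell). The Python sketch below uses that figure as an assumed constant; it is not stated in the text and varies somewhat between sources.

```python
HAPLOID_GENOME_PG = 3.3  # assumed mass of one haploid human genome, in picograms


def haploid_genome_copies(mass_pg: float, haploid_pg: float = HAPLOID_GENOME_PG) -> float:
    """Approximate number of haploid genome copies in a given mass of human DNA."""
    return mass_pg / haploid_pg


def diploid_cell_equivalents(mass_pg: float, haploid_pg: float = HAPLOID_GENOME_PG) -> float:
    """Approximate number of diploid cells' worth of DNA in a given mass."""
    return mass_pg / (2 * haploid_pg)


for mass_pg in (10, 100, 1_000, 10_000):
    print(f"{mass_pg} pg ~ {haploid_genome_copies(mass_pg):.0f} haploid copies, "
          f"~{diploid_cell_equivalents(mass_pg):.1f} cells")
```

On this approximation one nanogram corresponds to roughly 150 diploid cells, so the nanogram-scale ranges above overlap the cell-count ranges.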

Exemplary RNA Expression Studies

The multiplex PCR methods of the invention can be used to increase the number of target loci that can be assessed in a gene expression profiling experiment. For example, the expression levels of thousands of genes can be monitored simultaneously to determine whether a person has sequences (e.g., polymorphisms or other mutations) associated with a disease (e.g., cancer) or with an increased risk of disease. These methods can be used to identify sequences (e.g., polymorphisms or other mutations) associated with an increased or decreased risk of a disease (e.g., cancer) by comparing gene expression (e.g., the expression of particular mRNA alleles) in samples from patients with and without the disease. In addition, the effect of a particular treatment, disease, or developmental stage on gene expression can be determined. Similarly, these methods can be used to identify genes whose expression changes in response to a pathogen or other organism, by comparing gene expression in infected and uninfected cells or tissues. In these methods, the number of sequencing reads can be adjusted according to the frequency of the polymorphism to be analyzed, so that if the polymorphism is present, enough reads are obtained to detect it.
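The read-count adjustment mentioned above can be framed as a sample-size calculation. The sketch assumes, as a simplification not stated in the text, that reads sample the variant independently (a binomial model) and that detection requires at least `min_supporting` reads carrying it; the thresholds and function names are illustrative.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def min_reads(freq: float, min_supporting: int = 3, confidence: float = 0.95,
              max_reads: int = 100_000) -> int:
    """Smallest read depth at which a variant present at frequency `freq`
    yields at least `min_supporting` reads with probability >= `confidence`."""
    for n in range(min_supporting, max_reads + 1):
        if prob_at_least(min_supporting, n, freq) >= confidence:
            return n
    raise ValueError("required depth exceeds max_reads")


for freq in (0.5, 0.1, 0.01):
    print(freq, min_reads(freq))
```

Rarer variants need proportionally deeper sequencing, which is the reason for tying the read budget to the expected frequency of each polymorphism.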

In some embodiments, a sample containing RNA (e.g., mRNA) is reverse transcribed using reverse transcriptase (RT), and the resulting DNA (e.g., cDNA) is then amplified using a DNA polymerase (PCR). The RT and PCR steps can be performed sequentially in the same reaction volume, or separately. Any of the primer libraries of the invention can be used in such reverse transcription polymerase chain reaction (RT-PCR) methods. In various embodiments, reverse transcription is performed using oligo(dT), random primers, a mixture of oligo(dT) and random primers, or primers specific for the target loci. To avoid amplifying contaminating genomic DNA, RT-PCR primers can be designed so that part of one primer hybridizes to the 3′ end of one exon and the other part of that primer hybridizes to the 5′ end of the adjacent exon. Such primers anneal to cDNA synthesized from spliced mRNA, but not to genomic DNA. To detect amplification of contaminating DNA, RT-PCR primer pairs can be designed to flank a region containing at least one intron. Products amplified from cDNA (which lacks introns) are smaller than those amplified from genomic DNA (which contains introns), and this size difference is used to detect the presence of contaminating DNA. In some embodiments, when only the mRNA sequence is known, primer annealing sites at least 300-400 base pairs apart are selected, because fragments of this size from eukaryotic DNA are likely to contain splice junctions. Alternatively, samples can be treated with DNase to degrade contaminating DNA.
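The size check for contaminating genomic DNA can be sketched by computing the expected product length from cDNA and from genomic DNA for a primer pair that spans one or more introns. The exon coordinates, primer positions, and function name below are hypothetical; coordinates are 0-based and half-open on the plus strand.

```python
def amplicon_sizes(exons, fwd_start, rev_end):
    """Expected product sizes (cDNA, genomic) for a primer pair.

    exons: sorted list of (start, end) genomic coordinates, 0-based, half-open.
    fwd_start: genomic position of the first base of the amplicon (inside an exon).
    rev_end: genomic position just past the last base of the amplicon
             (inside a later exon).
    """
    genomic = rev_end - fwd_start
    intronic = 0
    # Each gap between consecutive exons is an intron; subtract those fully spanned.
    for (_, intron_start), (intron_end, _) in zip(exons, exons[1:]):
        if fwd_start <= intron_start and intron_end <= rev_end:
            intronic += intron_end - intron_start
    return genomic - intronic, genomic


exons = [(0, 200), (1200, 1350), (3000, 3200)]  # hypothetical gene model
print(amplicon_sizes(exons, 150, 1300))  # (150, 1150): one 1,000-bp intron spanned
print(amplicon_sizes(exons, 150, 3100))  # (300, 2950): two introns spanned
```

A product near the cDNA size indicates amplification from cDNA, while a product near the genomic size signals contaminating DNA.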

Exemplary Methods of Paternity Testing

The multiplex PCR methods of the invention can be used to improve the accuracy of paternity testing, because so many target loci can be analyzed at once (see, e.g., U.S. Publication No. 2012/0122701, filed December 22, 2011, which is hereby incorporated by reference herein in its entirety). For example, the multiplex PCR methods can allow thousands of polymorphic loci (e.g., SNPs) to be analyzed for use in the PARENTAL SUPPORT algorithm described herein to determine whether an alleged father is the biological father of a fetus. In some embodiments, the method involves (i) simultaneously amplifying a plurality of polymorphic loci in genetic material from the alleged father, including at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different polymorphic loci, to produce a first set of amplification products; (ii) simultaneously amplifying the corresponding plurality of polymorphic loci in a mixed sample of DNA derived from a blood sample of the pregnant mother, to produce a second set of amplification products, wherein the mixed sample of DNA comprises fetal DNA and maternal DNA; (iii) determining on a computer, from genotype measurements based on the first and second sets of amplification products, the probability that the alleged father is the biological father of the fetus; and (iv) using the determined probability to establish whether the alleged father is the biological father of the fetus. In various embodiments, the method further comprises simultaneously amplifying the corresponding plurality of polymorphic loci in genetic material from the mother to produce a third set of amplification products, wherein the probability that the alleged father is the biological father of the fetus is determined from genotype measurements based on the first, second, and third sets of amplification products.
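The PARENTAL SUPPORT algorithm is not specified in this section, so the following is only a simplified, hypothetical stand-in showing how genotype measurements from the alleged father, the mother, and the mixed plasma sample can be combined into a paternity statistic. It assumes biallelic SNPs, known maternal and alleged-father genotypes (coded as the number of B alleles), a known fetal fraction, binomially distributed plasma read counts, independent loci, and, for an unrelated man, a transmitted allele drawn from a population B-allele frequency. It returns a summed log-likelihood ratio; positive values favor the alleged father. All names and data are illustrative.

```python
from math import exp, lgamma, log


def log_binom_pmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p), computed in log space to avoid underflow."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + log(sum(exp(v - m) for v in values))


def locus_log_likelihood(n_a, n_b, mother, paternal_b_prob, fetal_fraction, error=1e-3):
    """Log-likelihood of plasma read counts (n_a, n_b) at one biallelic SNP.

    mother: maternal genotype as the number of B alleles (0, 1 or 2).
    paternal_b_prob: probability that the father transmits allele B.
    fetal_fraction: assumed-known fraction of the plasma DNA that is fetal.
    """
    terms = []
    for mat_b, p_mat in ((0, 1 - mother / 2), (1, mother / 2)):
        for pat_b, p_pat in ((0, 1 - paternal_b_prob), (1, paternal_b_prob)):
            weight = p_mat * p_pat
            if weight == 0:
                continue
            fetal_b = mat_b + pat_b
            frac_b = (1 - fetal_fraction) * mother / 2 + fetal_fraction * fetal_b / 2
            frac_b = min(max(frac_b, error), 1 - error)  # crude floor for sequencing error
            terms.append(log(weight) + log_binom_pmf(n_b, n_a + n_b, frac_b))
    return logsumexp(terms)


def paternity_log_lr(loci, fetal_fraction):
    """Sum over loci of log L(alleged father) - log L(unrelated man).

    loci: iterable of (n_a, n_b, mother_gt, alleged_father_gt, population_b_freq).
    """
    llr = 0.0
    for n_a, n_b, mother, father, q in loci:
        llr += (locus_log_likelihood(n_a, n_b, mother, father / 2, fetal_fraction)
                - locus_log_likelihood(n_a, n_b, mother, q, fetal_fraction))
    return llr


loci_match = [(95, 5, 0, 2, 0.5)] * 20      # fetal B reads where the alleged father is BB
loci_mismatch = [(100, 0, 0, 2, 0.5)] * 20  # no fetal B reads despite a BB alleged father
print(paternity_log_lr(loci_match, 0.10), paternity_log_lr(loci_mismatch, 0.10))
```

Because every SNP contributes a term, the statistic sharpens as more loci are added, which is the motivation the text gives for amplifying thousands of loci at once.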

Exemplary Methods for Embryo Characterization and Selection

The multiplex PCR methods of the invention can be used to improve the selection of embryos for in vitro fertilization by allowing thousands of target loci to be analyzed at once (see, e.g., U.S. Publication No. 2011/0092763, filed May 27, 2008, December 22, 2011, which is hereby incorporated by reference herein in its entirety). For example, the multiplex PCR methods can allow thousands of polymorphic loci (e.g., SNPs) to be analyzed for use in the PARENTAL SUPPORT algorithm described herein to select embryos from a set of embryos for in vitro fertilization.

In some embodiments, the invention provides methods of estimating the relative likelihood that each embryo in a set of embryos will develop as desired. In some embodiments, the method involves contacting a sample from each embryo with a library of primers that simultaneously hybridize to at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci, to produce a reaction mixture for each embryo, wherein each sample is derived from one or more cells of the embryo. In some embodiments, each reaction mixture is subjected to primer extension reaction conditions to produce amplification products. In some embodiments, the method comprises determining on a computer, based on the amplification products, one or more characteristics of at least one cell from each embryo; and estimating on a computer, based on those characteristics, the relative likelihood that each embryo will develop as desired. In some embodiments, the method comprises determining at least one characteristic using an informatics-based method, such as the PARENTAL SUPPORT algorithm described herein. In some embodiments, the characteristics include ploidy state. In some embodiments, the characteristic is selected from the group consisting of aneuploidy, euploidy, mosaicism, nullisomy, monosomy, uniparental disomy, trisomy, tetrasomy, type of aneuploidy, unmatched copy error trisomy, matched copy error trisomy, maternal origin of aneuploidy, paternal origin of aneuploidy, presence or absence of a disease-linked gene, chromosomal identity of any aneuploid chromosome, an abnormal genetic condition, a deletion or duplication, the likelihood of a characteristic, and combinations thereof. The characteristic may relate to a chromosome selected from the group consisting of chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16, chromosome 17, chromosome 18, chromosome 19, chromosome 20, chromosome 21, chromosome 22, chromosome X, chromosome Y, and combinations thereof.

Exemplary Prenatal Diagnostic Methods

The multiplex PCR methods of the invention can be used to improve prenatal diagnostic methods, such as determining the ploidy state of fetal chromosomes. Because a large number of target loci can be amplified simultaneously, more accurate determinations can be made.

In one embodiment, the invention provides an ex vivo method for determining the ploidy state of a chromosome in a gestating fetus from genotype data measured on a mixed sample of DNA (i.e., DNA from the mother of the fetus and DNA from the fetus), and optionally from genotype data measured on samples of genetic material from the mother and possibly also from the father. The determination is made by using a joint distribution model to create, given the parental genotype data, a set of expected allele distributions for the different possible fetal ploidy states; comparing the expected allele distributions with the actual allele distributions measured in the mixed sample; and selecting the ploidy state whose expected allele distribution pattern most closely matches the observed allele distribution pattern. In one embodiment, the mixed sample is derived from maternal blood, or from maternal serum or plasma. In one embodiment, the mixed sample of DNA can be preferentially enriched at target loci (e.g., a plurality of polymorphic loci). In one embodiment, the preferential enrichment is performed in a manner that minimizes allelic bias. In one embodiment, the invention relates to a composition of DNA that has been preferentially enriched at a plurality of loci such that the allelic bias is low. In one embodiment, the allele distributions are measured by sequencing DNA from the mixed sample. In one embodiment, the joint distribution model assumes that the alleles are distributed binomially. In one embodiment, the set of expected joint allele distributions is created for genetically linked loci while taking into account known recombination frequencies from various sources, for example data from the International HapMap Consortium.
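A deliberately simplified version of this hypothesis comparison can be sketched in Python. It is an illustration under stated assumptions, not the joint distribution model of the invention: loci are treated as independent (no linkage or recombination modelling), parental genotypes and the fetal fraction are taken as known, crossovers are ignored, read counts are binomial, and only three hypotheses are compared (disomy, and maternal trisomy carrying either both maternal homologs or two copies of one maternal homolog). Homozygous and heterozygous loci are both used. The hypothesis names, data layout, and read counts are hypothetical.

```python
from math import exp, lgamma, log


def log_binom_pmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p), computed in log space."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def logsumexp(values):
    m = max(values)
    return m + log(sum(exp(v - m) for v in values))


def maternal_contribution(mother, hypothesis):
    """Return (distribution of B alleles inherited from the mother, fetal copy number).

    mother: maternal genotype as the number of B alleles (0, 1 or 2).
    """
    b = mother / 2  # chance that one randomly chosen maternal homolog carries B
    if hypothesis == "disomy":
        return {0: 1 - b, 1: b}, 2
    if hypothesis == "maternal_trisomy_both_homologs":
        return {mother: 1.0}, 3
    if hypothesis == "maternal_trisomy_same_homolog":
        return {0: 1 - b, 2: b}, 3
    raise ValueError(hypothesis)


def log_likelihood(loci, fetal_fraction, hypothesis, error=1e-3):
    """loci: iterable of (n_a, n_b, mother_gt, father_gt), genotypes as B-allele counts.

    fetal_fraction: fraction of genome equivalents in the sample that are fetal.
    """
    total = 0.0
    for n_a, n_b, mother, father in loci:
        mat_dist, fetal_copies = maternal_contribution(mother, hypothesis)
        terms = []
        for mat_b, p_mat in mat_dist.items():
            for pat_b, p_pat in ((0, 1 - father / 2), (1, father / 2)):
                if p_mat * p_pat == 0:
                    continue
                # Expected plasma B fraction: maternal and fetal allele copies,
                # each weighted by its share of the genome equivalents.
                num = (1 - fetal_fraction) * mother + fetal_fraction * (mat_b + pat_b)
                den = (1 - fetal_fraction) * 2 + fetal_fraction * fetal_copies
                frac_b = min(max(num / den, error), 1 - error)
                terms.append(log(p_mat * p_pat) + log_binom_pmf(n_b, n_a + n_b, frac_b))
        total += logsumexp(terms)
    return total


HYPOTHESES = ("disomy", "maternal_trisomy_both_homologs", "maternal_trisomy_same_homolog")


def call_ploidy(loci, fetal_fraction):
    """Pick the hypothesis with the highest log-likelihood; also return all scores."""
    scores = {h: log_likelihood(loci, fetal_fraction, h) for h in HYPOTHESES}
    return max(scores, key=scores.get), scores


# Hypothetical data at a fetal fraction of 0.2, mother AB and father AA at every locus.
trisomic = [(545, 455, 1, 0)] * 30                            # B fraction ~0.455
disomic = [(600, 400, 1, 0)] * 15 + [(500, 500, 1, 0)] * 15   # fetus AA or AB
print(call_ploidy(trisomic, 0.2)[0])
print(call_ploidy(disomic, 0.2)[0])
```

A realistic version would add the linkage and recombination modelling described in the text, so that loci along a chromosome are evaluated jointly rather than independently.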

In one embodiment, the invention provides a method for non-invasive prenatal diagnosis (NPD), specifically for determining the aneuploidy state of a fetus by observing allele measurements at a plurality of polymorphic loci in genotype data measured on a mixture of DNA, where certain allele measurements are indicative of an aneuploid fetus and others are indicative of a euploid fetus. In one embodiment, the genotype data are measured by sequencing a DNA mixture derived from maternal plasma. In one embodiment, the DNA sample may be preferentially enriched for DNA molecules corresponding to the plurality of loci at which allele distributions are to be calculated. In one embodiment, a DNA sample comprising only, or almost only, genetic material from the mother is measured, and possibly also a DNA sample comprising only, or almost only, genetic material from the father. In one embodiment, genetic measurements of one or both parents, together with the estimated fetal fraction, are used to create a plurality of expected allele distributions corresponding to the different possible underlying genetic states of the fetus; these expected allele distributions may be called hypotheses. In one embodiment, the maternal genetic data are not determined by measuring genetic material that is exclusively or almost exclusively of maternal origin; instead, they are estimated from genetic measurements made on maternal plasma, which contains a mixture of maternal and fetal DNA. In some embodiments, the hypotheses may comprise the fetal ploidy at one or more chromosomes, which segments of which chromosomes in the fetus were inherited from which parent, and combinations thereof. In some embodiments, the ploidy state of the fetus is determined by comparing the observed allele measurements with the different hypotheses, at least some of which correspond to different ploidy states, and selecting the ploidy state corresponding to the hypothesis that is most likely to be correct given the observed allele measurements. In one embodiment, the method uses allele measurement data from some or all of the measured SNPs, whether the loci are homozygous or heterozygous, and therefore does not use alleles only at heterozygous loci. This method may not be suitable where the genetic data cover only one polymorphic locus. It is particularly advantageous when the genetic data include data for more than ten, or more than twenty, polymorphic loci on the target chromosome, and especially advantageous when they include data for more than 50, more than 100, or more than 200 polymorphic loci on the target chromosome. In some embodiments, the genetic data may include data for more than 500 or more than 1,000 polymorphic loci on the target chromosome, or for more than 2,000 or more than 5,000 polymorphic loci.

In one embodiment, the methods disclosed herein yield a quantitative measure of the number of independent observations of each allele at a polymorphic locus. This differs from most methods, such as microarrays or qualitative PCR, which provide information about the ratio of two alleles but do not quantify the number of independent observations of either allele. Even in those methods that do provide quantitative information about the number of independent observations, only the ratios are used in ploidy calculations, and the quantitative information itself is not used. To illustrate why it matters to retain information about the number of independent observations, consider a sample locus with two alleles, A and B. In a first experiment, twenty A alleles and twenty B alleles are observed; in a second experiment, 200 A alleles and 200 B alleles are observed. In both experiments the ratio A/(A+B) equals 0.5. However, the second experiment conveys more information than the first about how certain the frequency of the A or B allele is. Some methods by others average or sum the allele ratios (channel ratios) of individual alleles (i.e., x_i/y_i) and analyze this ratio. They either compare it to a reference chromosome or apply rules about how the ratio is expected to behave in particular situations. Such methods involve no allele weighting. They assume that an equal amount of PCR product can be ensured for each allele and that all alleles behave in the same way. These methods have several drawbacks and, more importantly, preclude the use of the various improvements described elsewhere herein.
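As a minimal illustration of why the counts matter, the sketch below assumes a simple binomial model for the A-allele count. This model is an assumption made for the example; the passage itself does not state one. Under it, the two experiments above give the same ratio but very different standard errors.

```python
import math

def allele_ratio_with_se(a_count: int, b_count: int) -> tuple[float, float]:
    """Return the A/(A+B) ratio and its binomial standard error (assumed model)."""
    n = a_count + b_count
    p = a_count / n
    se = math.sqrt(p * (1 - p) / n)  # shrinks as 1/sqrt(n)
    return p, se

for a, b in [(20, 20), (200, 200)]:
    p, se = allele_ratio_with_se(a, b)
    print(f"A={a:>3} B={b:>3}  ratio={p:.2f}  SE={se:.3f}")
```

The ratio is 0.50 in both cases. The standard error, however, falls from about 0.079 with 40 observations to 0.025 with 400. A method that keeps only the ratio discards this difference.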

In one embodiment, the methods disclosed herein explicitly model the allele frequency distribution expected for disomy. They also model the multiple allele frequency distributions that may be expected for a trisomy arising from nondisjunction during meiosis I, nondisjunction during meiosis II, and/or nondisjunction during mitosis early in fetal development. To illustrate why this is important, imagine a case with no crossovers. Nondisjunction during meiosis I would produce a trisomy in which two different homologues are inherited from one parent. By contrast, nondisjunction during meiosis II, or during mitosis early in fetal development, would produce two copies of the same homologue from one parent. Each scenario yields different expected allele frequencies at each polymorphic locus. Because of genetic linkage, it also yields different expected frequencies across all loci considered jointly. Crossovers exchange genetic material between homologues and make the inheritance pattern more complex. In one embodiment, the methods accommodate this by using recombination rate information together with the physical distance between loci. In one embodiment, the methods incorporate into the model a crossover probability that increases with distance from the centromere, in order to better distinguish meiosis I nondisjunction from meiosis II or mitotic nondisjunction. Meiosis II and mitotic nondisjunction can be distinguished from each other as follows. Mitotic nondisjunction typically produces identical or nearly identical copies of one homologue. The two homologues present after a meiosis II nondisjunction event typically differ, because of one or more crossovers during gametogenesis.
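The passage does not specify how recombination rate and physical distance are combined. This sketch makes two assumptions that do not come from the text:

- It uses Haldane's map function, r = (1 - exp(-2d))/2, with the genetic distance d in Morgans.
- It converts physical distance at roughly 1 cM per megabase. Real recombination maps vary by region and by sex.

Under these assumptions, the probability of recombination between the centromere and a locus rises with distance. This is the qualitative behaviour the model relies on.

```python
import math

CM_PER_MB = 1.0  # assumed genome-wide average; actual rates vary by region and sex

def haldane_recombination_fraction(distance_bp: int, cm_per_mb: float = CM_PER_MB) -> float:
    """Haldane map function: probability of an odd number of crossovers
    (an observable recombination) between two points distance_bp apart."""
    morgans = distance_bp / 1e6 * cm_per_mb / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * morgans))

# Loci at increasing distance from the centromere
for dist_mb in (1, 10, 50, 100):
    r = haldane_recombination_fraction(dist_mb * 1_000_000)
    print(f"{dist_mb:>4} Mb from centromere: r = {r:.3f}")
```

Near the centromere r is small. The two copies left by a meiosis II error are therefore expected to stay largely identical there, while differences caused by crossovers become more likely farther along the arm.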

In some embodiments, the methods disclosed herein compare observed allele measurements to theoretical hypotheses corresponding to possible fetal genetic aneuploidies. They do not involve a step of quantifying the ratio of alleles at heterozygous loci. Two kinds of ploidy determination can be contrasted here. One uses a method that quantifies the ratio of alleles at heterozygous loci. The other compares observed allele measurements to hypothesized allele distributions corresponding to possible fetal genetic states. When the number of loci is below about 20, the two may give similar results. When the number of loci exceeds 50, they may give significantly different results. When it exceeds 400, 1,000, or 2,000, the results are very likely to diverge increasingly. These differences arise because some methods quantify the ratio of alleles at heterozygous loci without independently measuring the amount of each allele, and then aggregate or average those ratios. Such methods preclude joint distribution models, linkage analysis, binomial distribution models, and/or other advanced statistical techniques. Methods that compare observed allele measurements to theoretical allele distribution hypotheses corresponding to possible fetal genetic states can use these techniques, which can substantially improve the accuracy of the determination.

In one embodiment, the methods disclosed herein use a joint distribution model to determine whether the observed distribution of allele measurements indicates a euploid or an aneuploid fetus. Other methods determine heterozygosity rates by treating polymorphic loci independently. The joint distribution model differs from these methods and significantly improves on them, because the resulting determinations are significantly more accurate. Without being bound by any particular theory, one reason for the greater accuracy is believed to be that the joint distribution model takes into account two things. The first is linkage between SNPs. The second is the likelihood of crossovers having occurred during the meiosis that produced the gametes, which formed the embryo that grew into the fetus. Linkage is used when creating the expected distributions of allele measurements for one or more hypotheses. It yields expected distributions that match reality significantly better than distributions created without linkage.

For example, suppose there are two SNPs, 1 and 2, located next to each other. On one homologue the mother has A at SNP 1 and A at SNP 2; on her second homologue she has B at SNP 1 and B at SNP 2. Suppose also that the father is A at both SNPs on both homologues, and that B is measured for the fetus at SNP 1. This indicates that the fetus inherited the mother's second homologue, so B is much more likely to be present in the fetus at SNP 2. A model that accounts for linkage will predict this; a model that does not account for linkage will not.

Alternatively, suppose the mother is AB at SNP 1 and AB at nearby SNP 2. Two hypotheses corresponding to a maternal trisomy at that location can then be used:

- a matching copy error, from nondisjunction in meiosis II or in mitosis early in fetal development;
- an unmatched copy error, from nondisjunction in meiosis I.

In the case of a matching copy error trisomy, if the fetus inherited AA from the mother at SNP 1, it is more likely to have inherited AA or BB, but not AB, from the mother at SNP 2. In the case of an unmatched copy error, the fetus inherits AB from the mother at both SNPs. Ploidy calling methods that account for linkage produce allele distribution hypotheses that make these predictions. Those hypotheses therefore match the actual allele measurements significantly better than the hypotheses from ploidy calling methods that do not account for linkage. Note that linkage-based approaches are not possible with methods that rely on calculating allele ratios and aggregating those ratios.
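The two-SNP example can be made concrete with a small enumeration. This sketch rests on simplifying assumptions:

- The two SNPs are so tightly linked that no crossover occurs between them.
- The mother's phase is known and supplied as two haplotypes.
- The function name and hypothesis labels are hypothetical.

```python
def maternal_contribution(hap1, hap2, hypothesis):
    """Possible maternal allele contributions to the fetus at each SNP.

    hap1, hap2: the mother's two haplotypes, e.g. ("A", "A") and ("B", "B").
    Assumes no crossover between the SNPs (complete linkage).
    Returns a list of tuples holding one genotype string per SNP.
    """
    if hypothesis == "disomy":
        inherited = [[hap1], [hap2]]               # one homologue
    elif hypothesis == "matched_trisomy":          # meiosis II / early mitotic error
        inherited = [[hap1, hap1], [hap2, hap2]]
    elif hypothesis == "unmatched_trisomy":        # meiosis I error
        inherited = [[hap1, hap2]]
    else:
        raise ValueError(f"unknown hypothesis: {hypothesis}")
    n_snps = len(hap1)
    return [
        tuple("".join(sorted(h[i] for h in haps)) for i in range(n_snps))
        for haps in inherited
    ]

mother = (("A", "A"), ("B", "B"))  # mother is AB at both SNPs
for hyp in ("disomy", "matched_trisomy", "unmatched_trisomy"):
    print(hyp, maternal_contribution(*mother, hyp))
```

Under the matched-copy hypothesis, an AA contribution at SNP 1 always pairs with a homozygous contribution at SNP 2. The unmatched-copy hypothesis gives AB at both SNPs, as in the text. Allowing crossovers between the SNPs would turn these hard constraints into probabilities.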

Ploidy determination by comparing observed allele measurements to theoretical hypotheses corresponding to possible fetal genetic states is believed to be more accurate, for the following reason. When sequencing is used to measure alleles, such methods can extract more information from allele data where the total number of reads is low. By contrast, methods that rely on calculating and aggregating allele ratios give disproportionate weight to random noise.

For example, suppose sequencing is used to measure alleles and there is a set of loci at which only five sequence reads are detected per locus. In one embodiment, the data for each allele can be compared to a hypothesized allele distribution and weighted according to the number of sequence reads. The data from these measurements are thus appropriately weighted and incorporated into the overall determination. This contrasts with methods that quantify the ratio of alleles at a heterozygous locus. With five reads, such methods can only compute ratios of 0%, 20%, 40%, 60%, 80%, or 100%, and none of these can approximate the expected allele ratio. In that case, the calculated allele ratios must either be discarded because there are too few reads, or be given disproportionate weight. Giving them that weight introduces random noise into the determination and reduces its accuracy.

In one embodiment, individual allele measurements may be treated as independent measurements. The relationship between measurements of alleles at the same locus is then no different from the relationship between measurements of alleles at different loci.
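The five-read case can be sketched as follows. The text does not fix a read model, so this sketch assumes a plain binomial one. The expected allele fractions, 0.45 and 0.50, and the locus counts are hypothetical values. A 5-read locus can only produce ratios in steps of 0.2. Even so, its likelihood under each hypothesis is well defined and can be added to the log-likelihoods of the other loci.

```python
import math

def binom_loglik(k: int, n: int, p: float) -> float:
    """Log-probability of k allele reads out of n under expected fraction p."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log1p(-p)

reads_per_locus = 5
print("possible ratios:", [k / reads_per_locus for k in range(reads_per_locus + 1)])

# Hypothetical loci, each with 5 reads: (B-allele reads, total reads)
loci = [(2, 5), (3, 5), (2, 5), (1, 5), (2, 5)]
for p in (0.45, 0.50):
    total = sum(binom_loglik(k, n, p) for k, n in loci)
    print(f"expected fraction {p:.2f}: summed log-likelihood {total:.3f}")
```

No locus is discarded and no ratio is averaged. Each locus contributes evidence in proportion to its read count.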

In one embodiment, the methods disclosed herein determine whether the distribution of observed allele measurements indicates a euploid or an aneuploid fetus without comparing any metric to allele measurements observed on a reference chromosome expected to be disomic (referred to as the RC method). This is a significant improvement over methods, such as those using shotgun sequencing, that detect aneuploidy by estimating the proportion of randomly sequenced fragments from a suspect chromosome relative to one or more presumed disomic reference chromosomes. Such RC methods give incorrect results if the presumed disomic reference chromosome is not actually disomic. This may occur where the aneuploidy is more extensive than a trisomy of a single chromosome, or where the fetus is triploid and all autosomes are trisomic. A female triploid (69,XXX) fetus has no disomic chromosomes at all. The methods described herein do not require a reference chromosome and will correctly identify the trisomic chromosomes in a female triploid fetus. For each chromosome, the joint distribution model can be fit with respect to the hypotheses, the child fraction, and the noise level. The fit does not require reference chromosome data, an overall child fraction estimate, or a fixed reference hypothesis.
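The triploidy failure of RC methods can be checked with simple arithmetic. This is a toy model with these assumptions:

- Reads are proportional to DNA copies times chromosome length.
- Maternal DNA is diploid.
- Three rough, illustrative chromosome lengths stand in for the genome.

The sketch computes the expected share of reads from a target chromosome for a euploid fetus, a trisomy 21 fetus, and a triploid fetus.

```python
def target_read_share(chrom_mb, fetal_copies, fetal_fraction, target):
    """Expected fraction of reads from `target` in a maternal-fetal mixture.

    chrom_mb: chromosome lengths in Mb (toy values).
    fetal_copies: fetal copy number per chromosome (maternal copies fixed at 2).
    fetal_fraction: fetal share of a diploid genome equivalent.
    """
    def reads(ch):
        maternal = (1 - fetal_fraction) * 2 * chrom_mb[ch]
        fetal = fetal_fraction * fetal_copies[ch] * chrom_mb[ch]
        return maternal + fetal
    return reads(target) / sum(reads(ch) for ch in chrom_mb)

chroms = {"chr13": 114, "chr18": 80, "chr21": 47}  # rough, illustrative lengths
f = 0.10
euploid = target_read_share(chroms, {c: 2 for c in chroms}, f, "chr21")
trisomy21 = target_read_share(chroms, {"chr13": 2, "chr18": 2, "chr21": 3}, f, "chr21")
triploid = target_read_share(chroms, {c: 3 for c in chroms}, f, "chr21")
print(f"euploid {euploid:.5f}  trisomy 21 {trisomy21:.5f}  triploid {triploid:.5f}")
```

Every fetal chromosome is scaled by the same factor in a triploid, so its read share equals the euploid share. A ratio against "reference" chromosomes therefore sees nothing unusual, whereas a single-chromosome trisomy does shift the share.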

In one embodiment, the methods disclosed herein demonstrate how the distribution of alleles observed at polymorphic loci can be used to determine the ploidy state of a fetus more accurately than prior-art methods. In one embodiment, the method proceeds as follows:

- Targeted sequencing is used to obtain the mixed maternal-fetal genotypes at a plurality of SNPs, and optionally the maternal and/or paternal genotypes.
- The expected allele frequency distribution is determined under each of several hypotheses.
- The quantitative allele information obtained from the maternal-fetal mixture is examined to estimate which hypothesis best fits the data.
- The genetic state corresponding to the best-fitting hypothesis is called as the correct genetic state.

In one embodiment, the methods disclosed herein also use the degree of fit to generate a confidence that the called genetic state is correct. In one embodiment, an algorithm analyzes the distribution of alleles found at loci with different parental contexts (different parental genotype patterns). For each context, it compares the observed allele distributions to the allele distributions expected for different ploidy states. This differs from, and improves on, methods that do not use the ability to estimate the number of independent instances of each allele at each locus in a mixed maternal-fetal sample.

In one embodiment, the methods disclosed herein also use the allele distributions observed at loci where the mother is heterozygous to determine whether the observed allele measurements indicate a euploid or an aneuploid fetus. This differs from, and improves on, methods that do not use such loci. It applies where the DNA is not preferentially enriched, or is preferentially enriched for loci that are not known to be highly informative for the particular target individual. In those cases it allows roughly twice as much genetic measurement data from a set of sequence data to be used in the ploidy determination, which gives a more accurate determination.
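The hypothesis-fitting step can be sketched for a single parental context. This is a simplified illustration, not the disclosed algorithm. Its assumptions are:

- Loci where the mother is AA and the father is BB.
- A known fetal fraction f.
- A plain binomial read model.
- Equal prior probabilities for three hypotheses.

The function names are hypothetical. The confidence reported is the normalized likelihood of the best hypothesis.

```python
import math

def expected_b_fraction(f: float, hypothesis: str) -> float:
    """Expected B-read fraction where mother is AA and father is BB."""
    if hypothesis == "disomy":             # fetus AB: f molecules of B out of 2
        return f / 2
    if hypothesis == "maternal_trisomy":   # fetus AAB: total molecules 2 + f
        return f / (2 + f)
    if hypothesis == "paternal_trisomy":   # fetus ABB
        return 2 * f / (2 + f)
    raise ValueError(hypothesis)

def log_binom(k: int, n: int, p: float) -> float:
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log1p(-p)

def call_ploidy(counts, f, hypotheses=("disomy", "maternal_trisomy", "paternal_trisomy")):
    """counts: list of (B reads, total reads). Returns (best hypothesis, confidence)."""
    scores = {h: sum(log_binom(k, n, expected_b_fraction(f, h)) for k, n in counts)
              for h in hypotheses}
    best = max(scores, key=scores.get)
    # Normalize likelihoods (equal priors) to get a per-call confidence
    z = sum(math.exp(s - scores[best]) for s in scores.values())
    return best, 1.0 / z

print(call_ploidy([(10, 100)] * 50, f=0.10))
```

At these loci the disomy and maternal-trisomy expectations are close (f/2 versus f/(2+f)). This is one reason for combining several parental contexts, and each other context needs its own expected fractions.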

In one embodiment, the methods disclosed herein use a joint distribution model that assumes the allele frequencies at each locus are multinomial in nature, and therefore binomial when the SNP is biallelic. In some embodiments, the joint distribution model uses a beta-binomial distribution. A measurement technique such as sequencing provides a quantitative measure of each allele present at each locus. A binomial model can then be applied to each locus, and both the underlying allele frequency and the confidence in that frequency can be determined. By contrast, the certainty of the observed ratios cannot be determined with methods known in the art that generate ploidy calls from allele ratios, or with methods that discard quantitative allele information.

The present method differs from, and improves on, methods that calculate allele ratios and aggregate them to make a ploidy call. Any method that calculates allele ratios at particular loci and then aggregates them necessarily assumes a Gaussian distribution. Specifically, it assumes that the measured intensities or counts indicating the amount of DNA for any given allele or locus are Gaussian-distributed. The methods disclosed herein do not involve calculating allele ratios. In some embodiments, they may incorporate into the model the observed number of each allele at a plurality of loci. In some embodiments, they may calculate the expected distributions themselves. This allows the use of a joint binomial distribution model, which may be more accurate than any model that assumes allele measurements are Gaussian-distributed.

The likelihood that a binomial distribution model is significantly more accurate than a Gaussian distribution increases with the number of loci. For example, when fewer than 20 loci are interrogated, the likelihood that the binomial model is significantly better is low. With more than 100 loci, or especially more than 400, more than 1,000, or more than 2,000 loci, that likelihood is very high, and the binomial model then yields a more accurate ploidy determination. The likelihood also increases with the number of observations at each locus. For example, when fewer than 10 distinct sequences are observed at each locus, the likelihood that the binomial model is significantly better is low. With more than 50 sequence reads per locus, or especially more than 100, more than 200, or more than 300 sequence reads, that likelihood is very high, and the binomial model then yields a more accurate ploidy determination.
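The beta-binomial distribution mentioned above extends the binomial with extra variance (overdispersion), for example from uneven amplification. The sketch uses the standard beta-binomial probability mass function. It parameterizes the distribution by a mean p and a concentration s, with alpha = p*s and beta = (1 - p)*s. This is a common convention assumed here; the text does not give one. As s grows, the distribution approaches the binomial.

```python
import math

def log_beta(a: float, b: float) -> float:
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def betabinom_logpmf(k: int, n: int, p: float, s: float) -> float:
    """Beta-binomial log-pmf with mean p and concentration s (alpha=p*s, beta=(1-p)*s)."""
    alpha, beta = p * s, (1 - p) * s
    return (math.log(math.comb(n, k))
            + log_beta(k + alpha, n - k + beta)
            - log_beta(alpha, beta))

def binom_logpmf(k: int, n: int, p: float) -> float:
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log1p(-p)

n, p = 100, 0.5
for s in (10, 100, 1e6):
    # A tail count (k=30) is far more probable under strong overdispersion (small s)
    bb = math.exp(betabinom_logpmf(30, n, p, s))
    print(f"s={s:g}: beta-binomial {bb:.3g}  binomial {math.exp(binom_logpmf(30, n, p)):.3g}")
```

A small concentration widens the distribution, so tail counts are penalized less. A very large concentration reproduces the binomial.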

In one embodiment, the methods disclosed herein use sequencing to measure the number of instances of each allele at each locus in a DNA sample. Each sequencing read can be mapped to a specific locus and treated as a binary sequence read. Alternatively, the read fidelity and/or the mapping probability can be incorporated into the sequence read. This yields probabilistic sequence reads, that is, a possibly integer or fractional number of sequence reads mapped to a given locus. Using binary counts or count probabilities, a binomial distribution can be used for each set of measurements, which allows confidence intervals to be calculated for the number of counts. This ability to use the binomial distribution allows more accurate ploidy estimates and more precise confidence intervals. This differs from, and improves on, methods that use intensity to measure the amount of each allele present. Examples are methods using microarrays, or methods using a fluorescence reader to measure the intensity of fluorescently labeled DNA in electrophoretic bands.
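The probabilistic-count idea can be sketched as follows. Each read contributes its mapping probability instead of a hard 0 or 1, and a binomial confidence interval is computed on the resulting counts. This sketch makes three choices that do not come from the text:

- Fractional totals are treated as a binomial sample size, which is an approximation.
- The Wilson score interval is used, as one standard choice of binomial interval.
- The read probabilities are made up.

```python
import math

def wilson_interval(successes: float, total: float, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (fractional counts allowed)."""
    p_hat = successes / total
    denom = 1 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    half = z / denom * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2))
    return center - half, center + half

# Each read: (allele, probability that the read truly maps to this locus)
reads = [("A", 0.99), ("B", 0.95), ("A", 0.60), ("B", 0.99), ("A", 0.98), ("B", 0.90)]
a_count = sum(prob for allele, prob in reads if allele == "A")
total = sum(prob for _, prob in reads)
low, high = wilson_interval(a_count, total)
print(f"A fraction {a_count / total:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

The read with mapping probability 0.60 counts for less than the others. The interval narrows as the effective number of reads grows.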

In one embodiment, the methods disclosed herein use aspects of the current data set to determine the parameters of the estimated allele frequency distributions for that data set. This improves on methods that use training data sets or previous data sets to set the parameters of the current expected allele frequency distributions, or of possible expected allele ratios. The conditions involved in the collection and measurement of each genetic sample differ. Methods that use data from the current data set to determine the parameters of the joint distribution model for that sample's ploidy determination therefore tend to be more accurate.
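As one example of deriving a parameter from the current sample rather than from training data, the fetal fraction can be estimated from loci where the mother is AA and the father is BB. Under disomy the expected B-read fraction at these loci is f/2. Pooling the counts therefore gives the maximum-likelihood estimate f_hat = 2 * sum(B reads) / sum(total reads) under a binomial model. This parental context and estimator are assumptions made for the sketch; the text does not specify which parameters are estimated or how.

```python
def estimate_fetal_fraction(counts: list[tuple[int, int]]) -> float:
    """Pooled MLE of fetal fraction from (B reads, total reads) at loci where the
    mother is AA and the father is BB, assuming a disomic fetus (B fraction = f/2)."""
    b_total = sum(b for b, _ in counts)
    n_total = sum(n for _, n in counts)
    if n_total == 0:
        raise ValueError("no reads")
    return 2 * b_total / n_total

sample_counts = [(5, 100), (4, 100), (6, 100)]  # hypothetical data from one sample
print(estimate_fetal_fraction(sample_counts))
```

An estimate obtained this way reflects the collection and measurement conditions of this particular sample. It could then serve as the fetal fraction in a hypothesis-likelihood calculation.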

In one embodiment, the methods disclosed herein use maximum likelihood techniques to determine whether the distribution of observed allele measurements indicates a euploid or an aneuploid fetus. Maximum likelihood techniques differ from, and significantly improve on, single hypothesis rejection techniques, because the resulting determinations are significantly more accurate. There are several reasons for this:

- Single hypothesis rejection techniques set the cutoff threshold from only one distribution of measurements rather than two, so the threshold is usually not optimal.
- Maximum likelihood techniques allow the cutoff threshold to be optimized for each individual sample. A single hypothesis rejection technique instead applies one cutoff to all samples, regardless of each sample's particular characteristics.
- Maximum likelihood techniques allow a confidence to be calculated for each ploidy call. This tells the practitioner which calls are accurate and which are more likely to be wrong.

In some embodiments, various methods can be combined with maximum likelihood estimation techniques to improve the accuracy of ploidy calls. In one embodiment, maximum likelihood techniques may be combined with the methods described in U.S. Patent 7,888,017. In one embodiment, they may be combined with methods that amplify DNA in a mixed sample by targeted PCR, then sequence it and analyze it using a read counting method. One such read counting method is that used by TANDEM DIAGNOSTICS, as presented at the 2011 International Congress of Human Genetics in Montreal in October 2011. In one embodiment, the methods disclosed herein estimate the fetal fraction of DNA in a mixed sample and use that estimate to calculate both the ploidy call and the confidence of that call. This is distinct from methods that use the estimated fetal fraction only to screen for sufficient fetal fraction. Those methods then make the ploidy call with a single hypothesis rejection technique that neither takes the fetal fraction into account nor produces a confidence for the call.
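The sketch below is a toy contrast between the two approaches. It assumes a Gaussian-distributed normalized chromosome metric with a known standard deviation, a simplification chosen only for this illustration; the numbers are hypothetical.

- The single-hypothesis test compares a z-score against a fixed cutoff derived from the disomy distribution alone.
- The likelihood approach also models a trisomy distribution, whose expected shift depends on this sample's fetal fraction, and returns a confidence.

```python
import math

def normal_logpdf(x: float, mu: float, sigma: float) -> float:
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def single_hypothesis_call(x: float, mu0: float, sigma: float, z_cut: float = 3.0) -> str:
    """Reject disomy only if the z-score exceeds a fixed cutoff; no confidence is produced."""
    return "trisomy" if (x - mu0) / sigma > z_cut else "disomy"

def likelihood_call(x: float, mu0: float, sigma: float, fetal_fraction: float):
    """Compare disomy and trisomy using a sample-specific trisomy expectation."""
    mu1 = mu0 * (1 + fetal_fraction / 2)  # toy model of the trisomic shift
    l0 = normal_logpdf(x, mu0, sigma)
    l1 = normal_logpdf(x, mu1, sigma)
    m = max(l0, l1)
    p_tri = math.exp(l1 - m) / (math.exp(l0 - m) + math.exp(l1 - m))
    return ("trisomy", p_tri) if p_tri >= 0.5 else ("disomy", 1 - p_tri)

# A low fetal-fraction sample whose metric sits exactly at its trisomy expectation
x, mu0, sigma, f = 1.02, 1.0, 0.01, 0.04
print(single_hypothesis_call(x, mu0, sigma))
print(likelihood_call(x, mu0, sigma, f))
```

Here the fixed cutoff misses the trisomy because the z-score is only about 2. The likelihood comparison favours trisomy and reports a confidence of about 0.88.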

In one embodiment, the methods disclosed herein account for the tendency of data to be noisy and to contain errors by attaching a probability to each measurement. Maximum likelihood techniques are used to select the correct hypothesis from a set of hypotheses generated from measurement data with attached probabilistic estimates. This makes it less likely that incorrect measurements, and more likely that correct measurements, will be used in the calculations that produce the ploidy call. More precisely, this approach systematically reduces the influence of incorrectly measured data on the ploidy determination. This improves on methods that assume all data are equally correct, and on methods that arbitrarily exclude outlying data from the calculation used to obtain the ploidy call. Existing methods that use channel ratio measurements must be extended to multiple SNPs by averaging the individual SNP channel ratios. They do not weight individual SNPs by their expected measurement variance, based on SNP quality and observed read depth. This reduces the accuracy of the resulting statistic and leads to significantly less accurate ploidy calls, especially in borderline cases.
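One way to attach an error probability to each measurement is to model each locus as a mixture. This formulation is an assumption, not the disclosed one. With probability 1 - eps the counts follow the expected binomial; with probability eps the locus is a failed measurement whose count is uniform over 0..n. The mixture bounds how far a single outlying locus can pull down the total log-likelihood, without discarding that locus outright. The value eps = 0.01 is hypothetical.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def robust_loglik(k: int, n: int, p: float, eps: float = 0.01) -> float:
    """Log-likelihood of one locus with an explicit measurement-error component."""
    return math.log((1 - eps) * binom_pmf(k, n, p) + eps / (n + 1))

# A well-behaved locus and an outlier, both scored against expected fraction 0.05
for k in (5, 50):
    plain = math.log(binom_pmf(k, 100, 0.05))
    robust = robust_loglik(k, 100, 0.05)
    print(f"k={k:>2}: plain {plain:8.2f}  robust {robust:8.2f}")
```

For the typical locus the two scores are nearly equal. For the outlier, the robust score cannot fall below log(eps / (n + 1)).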

In one embodiment, the methods disclosed herein do not require prior knowledge of which SNPs or other polymorphic loci are heterozygous in the fetus. This allows ploidy calls to be made when paternal genotype information is not available. It improves on methods that need to know in advance which SNPs are heterozygous, either to select the loci to target or to interpret genetic measurements made on the mixed fetal/maternal DNA sample.
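When the father's genotype is unknown, the fetal genotype at a locus can be averaged over rather than assumed. This sketch is illustrative only, and rests on these assumptions:

- The locus is one where the mother is AA, and the fetus is disomic.
- The paternally transmitted allele is B with a hypothetical population frequency q, as under random mating.
- The fetal fraction f is known.
- A small sequencing error rate produces B reads when the fetus carries no B.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def marginal_locus_likelihood(k: int, n: int, f: float, q: float, err: float = 0.002) -> float:
    """P(k B-reads of n) at a locus where the mother is AA, summed over the
    paternally transmitted allele instead of requiring the father's genotype.
    q: assumed population frequency of B; f: fetal fraction; err: assumed error rate."""
    p_fetus_aa = 1 - q  # father transmitted A: B reads arise only from errors
    p_fetus_ab = q      # father transmitted B: expected B fraction f/2
    return p_fetus_aa * binom_pmf(k, n, err) + p_fetus_ab * binom_pmf(k, n, f / 2)

def posterior_fetus_het(k: int, n: int, f: float, q: float, err: float = 0.002) -> float:
    """Posterior probability that the fetus is AB at this locus."""
    het = q * binom_pmf(k, n, f / 2)
    return het / marginal_locus_likelihood(k, n, f, q, err)

for k in (0, 2, 6):
    print(k, round(posterior_fetus_het(k, 100, f=0.10, q=0.3), 4))
```

The data themselves indicate whether a locus is likely heterozygous in the fetus, so loci need not be pre-selected on that basis.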

The methods described herein are particularly advantageous for samples in which only a small amount of DNA is available, or in which the percentage of fetal DNA is low. The allele dropout rate is correspondingly higher when only small amounts of DNA are available. The fetal allele dropout rate is correspondingly higher when the percentage of fetal DNA in a mixed sample of fetal and maternal DNA is low. A high allele dropout rate means that a large percentage of the target individual's alleles are not measured. This leads to inaccurate fetal fraction calculations and inaccurate ploidy determinations. The methods disclosed herein can use a joint distribution model that takes into account linkage in inheritance patterns between SNPs, so significantly more accurate ploidy determinations can be made. The methods described herein allow accurate ploidy determinations when the percentage of fetal DNA molecules in the mixture is less than 40%, less than 30%, less than 20%, less than 10%, less than 8%, and even less than 6%.

In one embodiment, the ploidy state of an individual can be determined from measurements made when the individual's DNA is mixed with the DNA of a related individual. In one embodiment, the DNA mixture is the free-floating DNA found in maternal plasma. This can include DNA from the mother, of known karyotype and known genotype, mixed with fetal DNA of unknown karyotype and unknown genotype. Known genotype information from one or both parents can be used to predict a plurality of potential genetic states of the DNA in the mixed sample. These states cover different fetal ploidy states, different chromosomal contributions from each parent to the fetus, and, optionally, different fetal DNA fractions in the mixture. Each potential composition may be referred to as a hypothesis. The ploidy state of the fetus can then be determined by examining the actual measurements and determining which potential composition is most likely given the observed data.

Further discussion of the above points can be found elsewhere in this document.

Non-Invasive Prenatal Diagnosis (NPD)

The process of non-invasive prenatal diagnosis involves a number of steps. Some of these steps may include:

(1) obtaining genetic material from the fetus;
(2) enriching, ex vivo, the fetal genetic material that may be present in a mixed sample;
(3) amplifying the genetic material ex vivo;
(4) preferentially enriching, ex vivo, the genetic material at specific loci;
(5) measuring the genetic material ex vivo; and
(6) analyzing the genotype data in silico and ex vivo.

Methods for practicing these six steps and other related steps are described herein. At least some of the method steps are not applied directly to the body. In one embodiment, the present invention relates to therapeutic and diagnostic methods applied to tissues and other biological material that have been isolated and separated from the body. At least some of the method steps are performed on a computer.

Some embodiments of the present invention allow a clinician to determine, non-invasively, the genetic state of a fetus gestating in its mother. Collecting the fetal genetic material therefore does not put the baby's health at risk, and the mother does not need to undergo an invasive procedure. Moreover, in certain aspects, the present invention allows the fetal genetic state to be determined with high accuracy. That accuracy is significantly greater than that of, for example, non-invasive screening based on maternal serum analytes, such as the triple test widely used in prenatal care.

The high accuracy of the methods disclosed herein results from the informatics methods used to analyze genotype data, as described herein. Modern technological advances have made it possible to measure large amounts of genetic information from a genetic sample using methods such as high-throughput sequencing and genotyping arrays. The methods disclosed herein allow clinicians to make better use of the large amount of available data and to diagnose the fetal genetic state more accurately. Details of a number of embodiments are given below. Different embodiments may involve different combinations of the steps described above. Various combinations of different embodiments of the different steps may be used interchangeably.

In one embodiment, a blood sample is taken from a pregnant mother. The free-floating DNA in the plasma of the mother's blood, which contains a mixture of DNA of maternal origin and DNA of fetal origin, is isolated and used to determine the ploidy state of the fetus. In one embodiment, the methods disclosed herein involve preferentially enriching those DNA sequences in the mixture that correspond to polymorphic alleles. The enrichment is done in a way that keeps the allele ratios and/or allele distributions approximately the same after enrichment. In one embodiment, the methods disclosed herein involve highly efficient targeted PCR-based amplification, such that a very high percentage of the resulting molecules correspond to the targeted loci. In one embodiment, the methods disclosed herein involve sequencing a DNA mixture that contains DNA of maternal origin and DNA of fetal origin. In one embodiment, the methods disclosed herein involve using the measured allele distributions to determine the ploidy state of a fetus gestating in the mother. In one embodiment, the methods disclosed herein involve reporting the determined ploidy state to a clinician. In one embodiment, the methods disclosed herein involve taking clinical action. Examples include performing follow-up invasive testing such as chorionic villus sampling or amniocentesis, preparing for the birth of a trisomic individual, or electively terminating a trisomic fetus.

This application makes reference to U.S. Utility Application No. 11/603,406, filed November 28, 2006 (U.S. Publication No. 20070184467); U.S. Utility Application No. 12/076,348, filed March 17, 2008 (U.S. Publication No. 20080243398); PCT Application No. PCT/US09/52730, filed August 4, 2009 (PCT Publication No. WO/2010/017214); PCT Application No. PCT/US10/050824, filed September 30, 2010 (PCT Publication No. WO/2011/041485); U.S. Utility Application No. 13/110,685, filed May 18, 2011; and PCT Application No. PCT/12/58578, filed October 3, 2012, each of which is incorporated herein by reference in its entirety. Some of the vocabulary used in this filing may have its antecedent basis in these references, and some of the concepts described herein may be better understood in light of the concepts found in these references.

Screening maternal blood comprising free-floating fetal DNA

The methods described herein can be used to help determine the genotype of a child, fetus, or other target individual where the genetic material of the target is found in the presence of a quantity of other genetic material. In some embodiments, the genotype may refer to the ploidy state of one or more chromosomes, to one or more disease-linked alleles, or to some combination thereof. In this disclosure, the discussion focuses on determining the genetic state of a fetus whose DNA is found in maternal blood, but this example is not meant to limit the possible contexts to which the method may be applied. In addition, the method may be applicable where the target DNA is present in any proportion relative to the non-target DNA; for example, the target DNA could make up anywhere between 0.000001% and 99.999999% of the DNA present. Additionally, the non-target DNA need not come from a single individual, or even from related individuals, as long as genetic data from some or all of the relevant non-target individuals is known. In one embodiment, the methods disclosed herein can be used to determine genotypic data of a fetus from maternal blood that contains fetal DNA. They may also be used where there are multiple fetuses in the uterus of a pregnant woman, or where other contaminating DNA may be present in the sample, for example from other, already born siblings.

This technique may make use of the phenomenon of fetal blood cells gaining access to the maternal circulation through the placental villi. Ordinarily, only a very small number of fetal cells enter the maternal circulation in this fashion (not enough to produce a positive Kleihauer-Betke test for fetal-maternal hemorrhage). The fetal cells can be sorted out and analyzed by a variety of techniques to look for particular DNA sequences, without the risks inherent in invasive procedures. This technique may also make use of the phenomenon of free-floating fetal DNA gaining access to the maternal circulation through DNA release following apoptosis of placental tissue, where the placental tissue in question contains DNA of the same genotype as the fetus. The free-floating DNA found in maternal plasma has been shown to contain fetal DNA in proportions as high as 30-40%.

In one embodiment, blood may be drawn from a pregnant woman. Research has shown that maternal blood may contain a small amount of free-floating DNA from the fetus, in addition to free-floating DNA of maternal origin. In addition, there may also be nucleated fetal blood cells containing DNA of fetal origin, in addition to many blood cells of maternal origin, which typically do not contain nuclear DNA. There are many methods known in the art to isolate fetal DNA or to create fractions enriched in fetal DNA. For example, chromatography has been shown to create certain fractions that are enriched in fetal DNA.

Once a sample of maternal blood, plasma, or other bodily fluid, drawn in a relatively non-invasive manner and containing an amount of fetal DNA, is in hand, the DNA found in the sample may be genotyped; the fetal DNA may be cellular or free-floating, and may be either enriched in proportion to the maternal DNA or present at its original ratio. In some embodiments, the blood may be drawn using a needle to withdraw blood from a vein, for example the basilic vein. The methods described herein can be used to determine genotypic data of the fetus. For example, they can be used to determine the ploidy state at one or more chromosomes, and to determine the identity of one SNP or a set of SNPs, including insertions, deletions, and translocations. They can also be used to determine one or more haplotypes, including the parent of origin of one or more genotypic features.

It should be noted that this method will work with any nucleic acids that can be used with any genotyping and/or sequencing method, such as the Illumina Infinium array platform, the Affymetrix GeneChip, the Illumina Genome Analyzer, or the Life Technologies SOLiD System. This includes free-floating DNA extracted from plasma, or amplifications thereof (e.g., whole genome amplification or PCR), and genomic DNA from other cell types (e.g., human lymphocytes from whole blood), or amplifications thereof. For DNA preparation, any extraction or purification method that generates genomic DNA suitable for one of these platforms will work as well. This method could work equally well with samples of RNA. In one embodiment, the samples may be stored in a manner that minimizes degradation (e.g., below freezing, at about -20 °C, or at a lower temperature).

Parental Support

Some embodiments may be used in combination with the PARENTAL SUPPORT™ (PS) method, embodiments of which are described in U.S. Application No. 11/603,406 (U.S. Publication No. 20070184467), U.S. Application No. 12/076,348 (U.S. Publication No. 20080243398), U.S. Application No. 13/110,685, PCT Application No. PCT/US09/52730 (PCT Publication No. WO/2010/017214), and PCT Application No. PCT/US10/050824 (PCT Publication No. WO/2011/041485), each of which is incorporated herein by reference in its entirety. PARENTAL SUPPORT™ is an informatics-based method that may be used to analyze genetic data. In some embodiments, the methods disclosed herein may be considered part of the PARENTAL SUPPORT™ method. In some embodiments, the PARENTAL SUPPORT™ method is a collection of methods that may be used to determine, with high accuracy, the genetic data of a target individual, of one or a small number of cells from that individual, or of a mixture of DNA consisting of DNA from the target individual and DNA from one or more other individuals; specifically, it may be used to determine disease-related alleles, other alleles of interest, and/or the ploidy state of one or more chromosomes in the target individual. PARENTAL SUPPORT™ may refer to any of these methods, and is one example of an informatics-based method. An exemplary embodiment of the PARENTAL SUPPORT™ method is shown in FIGS. 29-31G and described in Experiment 19.

The PARENTAL SUPPORT™ method makes use of known parental genetic data, i.e., haplotypic and/or diploid genetic data of the mother and/or the father, together with knowledge of the mechanism of meiosis, the imperfect measurement of the target DNA (and possibly of one or more related individuals), and population-based crossover frequencies, to reconstruct in silico, with high confidence, the genotype at a plurality of alleles and/or the ploidy state of an embryo or of any target cell(s), as well as the target DNA at the location of key loci. The PARENTAL SUPPORT™ method can reconstruct not only single nucleotide polymorphisms (SNPs) that were measured poorly, but also insertions and deletions, and SNPs or whole regions of DNA that were not measured at all. Furthermore, the PARENTAL SUPPORT™ method can both measure multiple disease-linked loci and screen single cells for aneuploidy. In some embodiments, the PARENTAL SUPPORT™ method may be used to characterize one or more cells from an embryo biopsy during an IVF cycle to determine the genetic condition of those cells.

The PARENTAL SUPPORT™ method allows noisy genetic data to be cleaned. This may be done by inferring the correct genetic alleles in the target genome (embryo) using the genotypes of related individuals (the parents) as a reference. PARENTAL SUPPORT™ may be particularly relevant where only a small quantity of genetic material is available (e.g., in preimplantation genetic diagnosis, PGD) and where direct measurements of the genotype are inherently noisy because of the limited amount of genetic material. PARENTAL SUPPORT™ may also be particularly relevant where only a small fraction of the genetic material comes from the target individual (e.g., in non-invasive prenatal diagnosis, NPD) and where direct measurements of the genotype are inherently noisy because of the contaminating DNA signal from another individual. The PARENTAL SUPPORT™ method is able to reconstruct highly accurate, ordered diploid allele sequences on the embryo, together with the copy number of chromosome segments, even though conventional, unordered diploid measurements may be characterized by high rates of allele dropout, drop-ins, variable amplification biases, and other errors. The method may employ both an underlying genetic model and an underlying model of measurement error. The genetic model may determine both the allele probabilities at each SNP and the crossover probabilities between SNPs. Allele probabilities at each SNP may be modeled based on data obtained from the parents, and crossover probabilities between SNPs may be modeled based on data obtained from the HapMap database, as developed by the International HapMap Project. Given the proper underlying genetic model and measurement error model, maximum a posteriori (MAP) estimation, with modifications for computational efficiency, may be used to estimate the correct, ordered allele values at each SNP in the embryo.
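The MAP idea described above can be illustrated with a deliberately simplified sketch. It is not the PARENTAL SUPPORT™ algorithm. It assumes a toy two-state hidden Markov model in which the hidden state at each SNP is which of the mother's two phased haplotypes was transmitted, transitions between adjacent SNPs are governed by per-interval crossover probabilities, and each observed allele call is wrong with a fixed probability or is missing altogether. The Viterbi algorithm then returns the MAP haplotype path, from which the allele at every SNP, including unmeasured ones, can be read off. The function names, the 0/1 allele coding, and the `error_rate` parameter are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def map_inherited_haplotype(
    maternal_haps: Sequence[Sequence[int]],
    observed: Sequence[Optional[int]],
    crossover: Sequence[float],
    error_rate: float = 0.05,
) -> List[int]:
    """Viterbi (MAP) path of which maternal haplotype (0 or 1) was
    transmitted at each SNP, under a toy two-state hidden Markov model.

    maternal_haps: the mother's two phased haplotypes, alleles coded 0/1.
    observed: noisy allele calls for the transmitted haplotype; None = no call.
    crossover: crossover[i] is P(switch) between SNP i and SNP i + 1,
               assumed to lie strictly between 0 and 1.
    error_rate: P(call differs from the true allele), strictly in (0, 1).
    """
    n = len(observed)

    def emit(state: int, i: int) -> float:
        if observed[i] is None:  # a missing call carries no evidence
            return 0.0
        correct = maternal_haps[state][i] == observed[i]
        return math.log(1 - error_rate) if correct else math.log(error_rate)

    # score[s]: best log-probability of any state path ending in state s
    score = [math.log(0.5) + emit(s, 0) for s in (0, 1)]
    backptr: List[List[int]] = []
    for i in range(1, n):
        stay = math.log(1 - crossover[i - 1])
        switch = math.log(crossover[i - 1])
        new_score: List[float] = []
        ptr: List[int] = []
        for s in (0, 1):
            via_same = score[s] + stay
            via_other = score[1 - s] + switch
            ptr.append(s if via_same >= via_other else 1 - s)
            new_score.append(max(via_same, via_other) + emit(s, i))
        score = new_score
        backptr.append(ptr)

    # Trace back from the best final state to recover the MAP path
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def reconstruct_alleles(maternal_haps: Sequence[Sequence[int]],
                        path: Sequence[int]) -> List[int]:
    """Read the inferred allele at every SNP, including unmeasured ones."""
    return [maternal_haps[s][i] for i, s in enumerate(path)]
```

For example, with maternal haplotypes `[0]*6` and `[1]*6`, observed calls `[0, None, 0, 1, 1, 1]`, and crossover probabilities of 0.01 per interval, the sketch infers one crossover between the third and fourth SNPs and fills in the missing call at the second SNP from haplotype 0. A single discordant call flanked by concordant ones is instead attributed to measurement error, because two crossovers are far less likely than one bad call under these assumed parameters.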

In some cases, the techniques outlined above are capable of determining the genotype of an individual given only a very small amount of DNA originating from that individual. This could be DNA from one or a small number of cells, or it could be the small amount of fetal DNA found in maternal blood.

Hypotheses

In the context of this disclosure, a hypothesis refers to a possible genetic state. It may refer to a possible ploidy state or to a possible allelic state. A set of hypotheses may refer to a set of possible genetic states, a set of possible allelic states, a set of possible ploidy states, or a combination thereof. In some embodiments, a set of hypotheses may be designed such that one hypothesis from the set will correspond to the actual genetic state of any given individual. In some embodiments, a set of hypotheses may be designed such that every possible genetic state may be described by at least one hypothesis from the set. In some embodiments of the invention, one aspect of the method is to determine which hypothesis corresponds to the actual genetic state of the individual in question.

In another embodiment of the invention, one step involves creating a hypothesis. In some embodiments, it may be a copy number hypothesis. In some embodiments, it may involve a hypothesis about which segments of a chromosome from each of the related individuals correspond genetically to which segments, if any, of the other related individuals. Creating a hypothesis may refer to the act of setting the limits of the variables such that the entire set of possible genetic states under consideration is encompassed by those variables.

A "copy number hypothesis", also called a "ploidy hypothesis" or "ploidy state hypothesis", may refer to a hypothesis concerning a possible ploidy state for a given chromosome copy, chromosome type, or section of a chromosome in the target individual. It may also refer to the ploidy state at more than one chromosome type in the individual. A set of copy number hypotheses may refer to a set of hypotheses in which each hypothesis corresponds to a different possible ploidy state in the individual. A set of hypotheses may concern a set of possible ploidy states, a set of possible parental haplotypic contributions, a set of possible fetal DNA percentages in the mixed sample, or a combination thereof.

A normal individual contains one copy of each chromosome from each parent. However, because of errors in meiosis and mitosis, an individual may have 0, 1, 2, or more copies of a given chromosome type from each parent. In practice, it is rare to see more than two copies of a given chromosome from one parent. In this disclosure, some embodiments consider only the possible hypotheses in which 0, 1, or 2 copies of a given chromosome come from each parent; it is a minor extension to consider more or fewer possible copies from a parent. In some embodiments, there are nine possible hypotheses for a given chromosome: three possible hypotheses concerning 0, 1, or 2 chromosomes of maternal origin, multiplied by three possible hypotheses concerning 0, 1, or 2 chromosomes of paternal origin. A hypothesis is written (m, f), where m is the number of copies of the given chromosome inherited from the mother and f is the number inherited from the father. The nine hypotheses are therefore (0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), and (2, 2). These may also be written as H₀₀, H₀₁, H₀₂, H₁₀, H₁₁, H₁₂, H₂₀, H₂₁, and H₂₂. Different hypotheses correspond to different ploidy states. For example, (1, 1) refers to a normal disomic chromosome, (2, 1) refers to a maternal trisomy, and (0, 1) refers to a paternal monosomy. In some embodiments, each case in which two copies of a chromosome are inherited from the same parent may be further divided into two cases: one in which the two chromosomes are identical (matched copy error) and one in which the two chromosomes are homologous but not identical (unmatched copy error). In these embodiments, each parent may contribute 0, 1, 2 matched, or 2 unmatched copies, giving 4 × 4 = 16 possible hypotheses. It should be appreciated that other sets of hypotheses, and different numbers of hypotheses, may be used.
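The hypothesis counts described above can be checked by enumeration. The following is a minimal sketch, assuming hypotheses are represented as (maternal copies, paternal copies) tuples and, for the sixteen-hypothesis variant, that each two-copy contribution carries one of two hypothetical tags, "2M" (matched copy error) or "2U" (unmatched copy error). The function names are illustrative only.

```python
from itertools import product
from typing import List, Tuple

COPY_NUMBERS = (0, 1, 2)  # copies of a chromosome contributed by one parent


def basic_hypotheses() -> List[Tuple[int, int]]:
    """The nine (m, f) hypotheses: m maternal copies, f paternal copies."""
    return list(product(COPY_NUMBERS, COPY_NUMBERS))


def hypothesis_label(h: Tuple[int, int]) -> str:
    """Render (m, f) in the H_mf shorthand, e.g. (2, 1) -> 'H21'."""
    return f"H{h[0]}{h[1]}"


def split_hypotheses() -> List[Tuple[str, str]]:
    """Sixteen hypotheses: each parent's two-copy case is split into a
    matched ('2M', identical copies) and an unmatched ('2U', homologous
    but not identical) sub-case, giving four options per parent."""
    per_parent = ("0", "1", "2M", "2U")
    return list(product(per_parent, per_parent))


if __name__ == "__main__":
    print([hypothesis_label(h) for h in basic_hypotheses()])
    print(len(split_hypotheses()))
```

Running the module prints the nine labels from H00 through H22 followed by 16. Using `itertools.product` keeps the construction visibly a Cartesian product of per-parent options, which is exactly why the counts multiply (3 × 3 and 4 × 4).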

In some embodiments of the invention, a ploidy hypothesis refers to a hypothesis concerning which chromosomes from other related individuals correspond to the chromosomes found in the genome of the target individual. In some embodiments, a key to the method is the fact that related individuals can be expected to share haplotype blocks; by using measured genetic data from related individuals, together with knowledge of which haplotype blocks match between the target individual and the related individuals, it is possible to infer the correct genetic data of the target individual with higher confidence than by using the target individual's genetic measurements alone. Thus, in some embodiments, a ploidy hypothesis may concern not only the number of chromosomes, but also which chromosomes in the related individuals are identical, or nearly identical, to one or more chromosomes in the target individual.

Once the set of hypotheses has been defined, the algorithms, when operating on the input genetic data, may output a statistical probability for each of the hypotheses under consideration. The probability of each hypothesis may be determined by mathematically calculating, for each hypothesis, the value that the probability equals, as stated by one or more of the expert techniques, algorithms, and/or methods described elsewhere in this disclosure, using the relevant genetic data as input.

Once the probabilities of the different hypotheses have been estimated by a plurality of techniques, they may be combined. For each hypothesis, this may entail multiplying together the probabilities determined by each technique. The products of the hypothesis probabilities may then be normalized. Note that a ploidy hypothesis refers to one possible ploidy state of a chromosome.

"Combining probabilities", also called "combining hypotheses", or combining the results of expert techniques, is a concept that should be familiar to those skilled in the art of linear algebra. One possible way to combine probabilities is as follows. When an expert technique is used to evaluate a set of hypotheses given a set of genetic data, the output of the method is a set of probabilities associated, in a one-to-one fashion, with the hypotheses in the set. When a set of probabilities determined by a first expert technique, each associated with one of the hypotheses in the set, is combined with a set of probabilities determined by a second expert technique, each associated with the same set of hypotheses, the two sets of probabilities are multiplied. That is, for each hypothesis in the set, the two probabilities associated with that hypothesis, as determined by the two expert methods, are multiplied together, and the corresponding product is the output probability. This process may be extended to any number of expert techniques. If only one expert technique is used, the output probabilities are the same as the input probabilities. If more than two expert techniques are used, the relevant probabilities may be multiplied at the same time. The products may be normalized so that the probabilities of the hypotheses in the set sum to 100%.
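The combination rule described above (multiply per hypothesis, then normalize) can be sketched in a few lines. This is a minimal illustration, assuming each expert technique reports its output as a dictionary mapping hypothesis labels to probabilities; the function name and the example values are hypothetical and are not results from this disclosure.

```python
from math import prod
from typing import Dict, Sequence


def combine_hypothesis_probabilities(
    per_technique: Sequence[Dict[str, float]],
) -> Dict[str, float]:
    """Multiply, hypothesis by hypothesis, the probabilities reported by
    each expert technique, then normalize so the results sum to 1 (100%)."""
    if not per_technique:
        raise ValueError("need at least one technique")
    hypotheses = set(per_technique[0])
    if any(set(t) != hypotheses for t in per_technique):
        raise ValueError("all techniques must score the same hypothesis set")
    raw = {h: prod(t[h] for t in per_technique) for h in per_technique[0]}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("all combined probabilities are zero")
    return {h: p / total for h, p in raw.items()}
```

For instance, if one technique gives `{"H11": 0.6, "H21": 0.3, "H12": 0.1}` and another gives `{"H11": 0.5, "H21": 0.4, "H12": 0.1}`, the raw products are 0.30, 0.12, and 0.01, which normalize to roughly 0.698, 0.279, and 0.023. With a single, already normalized technique the output equals the input, matching the text. Note that plain multiplication treats the techniques as independent sources of evidence.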

In some embodiments, a given hypothesis may be determined to be the most likely if its combined probability is greater than the combined probability of any of the other hypotheses. In some embodiments, a hypothesis may be determined to be the most likely, and the ploidy state or other genetic state may be called, if its normalized probability is greater than a threshold. In one embodiment, this may mean that the number and identity of the chromosomes associated with that hypothesis may be called as the ploidy state. In one embodiment, this may mean that the identity of the alleles associated with that hypothesis may be called as the allelic state. In some embodiments, the threshold may be between about 50% and about 80%, between about 80% and about 90%, between about 90% and about 95%, between about 95% and about 99%, or between about 99% and about 99.9%. In some embodiments, the threshold may be above about 99.9%.
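The calling rule described above can be sketched as follows, assuming the input is a dictionary of already normalized hypothesis probabilities (for example, the output of a combination step). The function name and the default threshold of 0.99 are hypothetical; 0.99 is simply one value inside the ranges listed in the text.

```python
from typing import Dict, Optional


def call_hypothesis(
    posteriors: Dict[str, float], threshold: float = 0.99
) -> Optional[str]:
    """Return the most probable hypothesis if its normalized probability
    exceeds `threshold`; otherwise return None, meaning no call is made."""
    best = max(posteriors, key=posteriors.get)
    return best if posteriors[best] > threshold else None
```

With `{"H11": 0.995, "H21": 0.004, "H12": 0.001}` the sketch calls `"H11"`; with `{"H11": 0.7, "H21": 0.3}` it returns `None` at the default threshold but `"H11"` at a threshold of 0.5. Returning `None` rather than the top hypothesis keeps low-confidence cases visibly distinct from confident calls.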

Parental context

Parental context refers to the genetic state of a given allele on each of the two relevant chromosomes of one or both of the two parents of the target. Note that, in one embodiment, the parental context does not refer to the allelic state of the target, but rather to the allelic state of the parents. The parental context for a given SNP may consist of four base pairs, two paternal and two maternal, which may be the same as or different from one another. It is typically written as "m₁m₂|f₁f₂", where m₁ and m₂ are the genetic states of the given SNP on the two maternal chromosomes, and f₁ and f₂ are the genetic states of the given SNP on the two paternal chromosomes. In some embodiments, the parental context may be written as "f₁f₂|m₁m₂". Note that the subscripts "1" and "2" refer to the genotype at the given allele on the first and second chromosome, respectively, and that the choice of which chromosome is labeled "1" and which is labeled "2" is arbitrary.

Note that in this disclosure, A and B are often used to denote base pair identities generically; A or B could equally well represent C (cytosine), G (guanine), A (adenine), or T (thymine). For example, if, at a given SNP-based allele, the mother's genotype is T at that SNP on one chromosome and G at that SNP on the homologous chromosome, and the father's genotype at that allele is G on both homologous chromosomes, one may say that the target individual's allele has the parental context AB|BB; it could equally be said that the allele has the parental context AB|AA. Note that, in theory, any of the four possible nucleotides could occur at a given allele, so it is possible, for example, for the mother to have the genotype A/T and the father the genotype G/C at a given allele. However, empirical data indicate that in most cases only two of the four possible base pairs are observed at a given allele. It is possible, for example when using short tandem repeats, to have more than two, more than four, or even more than ten parental contexts. In this disclosure, the discussion assumes that only two possible base pairs will be observed at a given allele, although the embodiments disclosed herein could be modified to account for cases where this assumption does not hold.
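The A/B notation described above can be derived mechanically from nucleotide genotypes. The following is a minimal sketch, assuming a biallelic SNP and the "m₁m₂|f₁f₂" ordering. Because which base is called A is arbitrary, the sketch tries both labellings and returns the lexicographically smaller string; that tie-breaking rule is an assumption chosen here because it reproduces the symmetric names used in this document (for example, BB|AA is reported as AA|BB). The function name and the "?" placeholder are hypothetical.

```python
from typing import Sequence


def parental_context(mother: Sequence[str], father: Sequence[str]) -> str:
    """Return the 'm1m2|f1f2' parental context in generic A/B notation.

    mother, father: the two bases each parent carries at the SNP,
    e.g. ("T", "G"). Assumes at most two distinct bases overall.
    """
    bases = sorted(set(mother) | set(father))
    if len(bases) > 2:
        raise ValueError("more than two distinct bases; not a biallelic SNP")
    if len(bases) == 1:
        bases.append("?")  # placeholder second allele that is never observed
    candidates = []
    for a, b in (bases, bases[::-1]):  # both possible A/B labellings
        label = {a: "A", b: "B"}
        m = "".join(sorted(label[x] for x in mother))
        f = "".join(sorted(label[x] for x in father))
        candidates.append(f"{m}|{f}")
    return min(candidates)
```

Applied to the example in the text (mother T/G, father G/G), the two labellings give AB|AA and AB|BB, and the sketch reports AB|AA. A mother homozygous for C with a father homozygous for A is reported as AA|BB.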

Use of parental context in NPD

Non-invasive prenatal diagnosis (NPD) is an important technique that can be used to determine the genetic state of a fetus from genetic material obtained in a non-invasive manner, for example from blood drawn from a pregnant mother. The blood may be separated and the plasma isolated, followed by isolation of the plasma DNA. Size selection may be used to isolate DNA of the appropriate length. The DNA may be preferentially enriched at a set of loci. This DNA may then be measured by a number of means, such as by hybridizing it to a genotyping array and measuring the fluorescence, or by sequencing it on a high-throughput sequencer.

In the context of non-invasive prenatal diagnosis, when sequencing is used for ploidy calling of a fetus, there are several ways to use the sequence data. The most common is simply to count the number of reads that map to a given chromosome. For example, imagine that one is trying to determine the ploidy state of chromosome 21 of the fetus, and that the DNA in the sample consists of 10% DNA of fetal origin and 90% DNA of maternal origin. In this case, one could look at the average number of reads on a chromosome that can be expected to be disomic, for example chromosome 3, and compare it to the number of reads on chromosome 21, where the reads are adjusted for the number of base pairs on each chromosome that are part of unique sequence. If the fetus is euploid, the amount of DNA per unit of genome would be expected to be about equal at all locations (subject to random variation). On the other hand, if the fetus is trisomic at chromosome 21, one would expect slightly more DNA per unit of genome on chromosome 21 than elsewhere in the genome. Specifically, one would expect about 5% more chromosome 21 DNA in the mixture, because the fetal 10% of the DNA carries three copies of chromosome 21 rather than two (10% × 1/2 = 5%). When sequencing is used to measure the DNA, one would therefore expect about 5% more uniquely mappable reads per unique segment on chromosome 21 than on the other chromosomes. An aneuploidy diagnosis may be based on the observation that the amount of DNA from a particular chromosome, when adjusted for the number of sequences that can be uniquely mapped to that chromosome, is above a certain threshold. Another method that may be used to detect aneuploidy is similar to the one above, except that the parental context is taken into account.
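The counting approach described above can be sketched as follows. This is a minimal illustration, assuming the mother is disomic at the chromosome of interest and that read counts have already been tallied per chromosome together with the number of uniquely mappable base pairs. The function names, the read counts in the example, and the threshold are hypothetical, and the sketch ignores the counting noise that a real caller would need to model.

```python
def expected_trisomy_excess(fetal_fraction: float) -> float:
    """Expected relative excess of DNA from one chromosome in a maternal/fetal
    mixture when the mother is disomic and the fetus is trisomic there."""
    euploid_dosage = 2.0  # copies per genome if the fetus were euploid
    trisomic_dosage = (1 - fetal_fraction) * 2 + fetal_fraction * 3
    return trisomic_dosage / euploid_dosage - 1  # equals fetal_fraction / 2


def normalized_read_ratio(reads_test: int, mappable_bp_test: int,
                          reads_ref: int, mappable_bp_ref: int) -> float:
    """Reads per uniquely mappable base pair on the test chromosome, divided
    by the same quantity on a reference chromosome expected to be disomic."""
    return (reads_test / mappable_bp_test) / (reads_ref / mappable_bp_ref)


def count_based_call(ratio: float, threshold: float) -> str:
    """Flag the test chromosome when its normalized ratio exceeds threshold."""
    return "aneuploid" if ratio > threshold else "euploid"
```

With a 10% fetal fraction, `expected_trisomy_excess(0.10)` is about 0.05, the 5% figure in the text. If, hypothetically, 10,500 reads fall on 1,000,000 mappable base pairs of chromosome 21 and 10,000 reads on 1,000,000 mappable base pairs of chromosome 3, the normalized ratio is about 1.05, which a threshold placed halfway to the expected excess (1.025) would flag.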

When considering which alleles to target, one may take into account the likelihood that some parental contexts are more informative than others. For example, AA|BB and its symmetric context BB|AA are the most informative contexts, because the fetus is known to carry an allele that differs from the mother's. For reasons of symmetry, both the AA|BB and BB|AA contexts may be referred to as AA|BB. Another set of informative parental contexts is AA|AB and BB|AB, because in these cases the fetus has a 50% chance of carrying an allele that the mother does not have. For reasons of symmetry, both may be referred to as AA|AB. A third set of informative parental contexts is AB|AA and AB|BB, because in these cases the fetus carries a known paternal allele, and that allele is also present in the maternal genome. For reasons of symmetry, both may be referred to as AB|AA. A fourth parental context is AB|AB, in which the fetus has an unknown allelic state and, whatever that allelic state is, the mother carries the same alleles. A fifth parental context is AA|AA, in which the mother and the father are both homozygous for the same allele.
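One way to make the ranking above concrete is to compute, for each parental context, the probability that a euploid fetus carries an allele absent from the mother. This is a minimal sketch, assuming Mendelian segregation (each parent transmits either of its two alleles with probability 1/2) and contexts written in the "m₁m₂|f₁f₂" A/B notation. The function name is hypothetical, and the metric captures only one notion of informativeness: it scores AB|AA as 0 even though the text notes that this context is informative for a different reason (a known paternal allele).

```python
def p_fetus_has_nonmaternal_allele(context: str) -> float:
    """For a biallelic SNP with parental context 'm1m2|f1f2', return the
    probability that a euploid fetus inherits from the father an allele
    that the mother does not carry, assuming each paternal allele is
    transmitted with probability 1/2."""
    maternal, paternal = context.split("|")
    if len(maternal) != 2 or len(paternal) != 2:
        raise ValueError("expected a context like 'AA|AB'")
    # Only the paternally transmitted allele can be absent from the mother.
    absent = sum(1 for allele in paternal if allele not in maternal)
    return absent / 2
```

Evaluated on the five named contexts, the sketch gives 1.0 for AA|BB, 0.5 for AA|AB, and 0.0 for AB|AA, AB|AB, and AA|AA, matching the certainty and 50% chance stated in the text for the first two contexts.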

Different implementations of the embodiments disclosed herein

Disclosed herein are methods for determining the ploidy state of a target individual. The target individual may be a blastomere, an embryo, or a fetus. In some embodiments of the invention, a method for determining the ploidy state of one or more chromosomes in a target individual may include any of the steps described in this document, and combinations thereof.

In some embodiments, the source of the genetic material used to determine the genetic state of the fetus may be fetal cells, such as nucleated fetal red blood cells, isolated from maternal blood. The method may involve obtaining a blood sample from a pregnant mother. The method may involve isolating fetal red blood cells using visualization techniques, based on the idea that a certain color combination is uniquely associated with nucleated red blood cells, and that no similar color combination is associated with any other cell present in maternal blood. The color combination associated with nucleated red blood cells may include the red color of the hemoglobin surrounding the nucleus, which may be made more pronounced by staining, together with the color of the nuclear material, which may, for example, be stained blue. By isolating cells from maternal blood, spreading them on a glass slide, and then identifying the points at which both red (from hemoglobin) and blue (from nuclear material) are seen, it may be possible to locate the nucleated red blood cells. Those nucleated red blood cells can then be extracted using a micromanipulator, and aspects of the genotype of the genetic material in those cells can be measured using genotyping and/or sequencing techniques.

In one embodiment, nucleated red blood cells may be stained with a dye that fluoresces only in the presence of fetal hemoglobin and not maternal hemoglobin, which removes the uncertainty as to whether a nucleated red blood cell is of maternal or fetal origin. Some embodiments of the invention may involve staining or otherwise marking nuclear material. Some embodiments of the invention may specifically involve labeling fetal nuclear material using fetal cell-specific antibodies.

There are various other ways of isolating fetal cells from maternal blood, isolating fetal DNA from maternal blood, or enriching a sample for fetal genetic material in the presence of maternal genetic material. Some of these methods are listed here for convenience; the list is not intended to be exhaustive. Appropriate techniques include: fluorescently or otherwise labeled antibodies; size exclusion chromatography; magnetically or otherwise labeled affinity tags; epigenetic differences, such as differential methylation between maternal and fetal cells at specific alleles; density gradient centrifugation followed by CD45/14 depletion and CD71-positive selection from the CD45/14-negative cells; single or double Percoll gradients with different osmolalities; and galactose-specific lectin methods.

In one embodiment of the invention, the target individual is a fetus, and different genotype measurements are made on multiple DNA samples from the fetus. In some embodiments of the invention, the fetal DNA sample is from isolated fetal cells, which may be mixed with maternal cells. In some embodiments, the fetal DNA sample is from free-floating fetal DNA, which may be mixed with free-floating maternal DNA. In some embodiments, the fetal DNA sample may be derived from maternal plasma or maternal blood containing a mixture of maternal and fetal DNA. In some embodiments, fetal DNA may be mixed with maternal DNA at a maternal:fetal ratio in one of the following ranges: 99.9%:0.1% to 99%:1%; 99%:1% to 90%:10%; 90%:10% to 80%:20%; 80%:20% to 70%:30%; 70%:30% to 50%:50%; 50%:50% to 10%:90%; 10%:90% to 1%:99%; or 1%:99% to 0.1%:99.9%.
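For intuition about these mixing ratios, the expected fraction of B-allele reads at a SNP can be written as a weighted average of the maternal and fetal allele fractions. The sketch below is an assumption-laden illustration, not part of the disclosure: it assumes a disomic fetus and no allele-specific amplification or sequencing bias, takes the fetal fraction to be the fetal share of the DNA (for example, 0.10 for a 90%:10% maternal:fetal ratio), and uses a hypothetical function name.

```python
def mixture_b_fraction(maternal_genotype: str, fetal_genotype: str,
                       fetal_fraction: float) -> float:
    """Expected share of B-allele reads at one SNP in a maternal/fetal DNA mixture.
    Assumes two copies per genome (disomic fetus) and no allele-specific bias."""
    if not 0.0 <= fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must lie in [0, 1]")
    maternal_b = maternal_genotype.count("B") / 2  # B share of maternal DNA
    fetal_b = fetal_genotype.count("B") / 2        # B share of fetal DNA
    return (1 - fetal_fraction) * maternal_b + fetal_fraction * fetal_b


if __name__ == "__main__":
    # AA|BB context (fetus AB) at a few maternal:fetal ratios from the list above.
    for maternal_pct, fetal_pct in [(99.9, 0.1), (99, 1), (90, 10), (80, 20)]:
        f = fetal_pct / 100
        print(f"{maternal_pct}%:{fetal_pct}%", mixture_b_fraction("AA", "AB", f))
```

In an AA|BB context with a 10% fetal fraction, the B allele is expected in about 5% of reads, so at low fetal fractions the fetal signal at a single SNP is only a small shift in the allele ratio.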

Genetic data of the target individual and/or related individuals may be transformed from a molecular state to an electronic state by measuring the appropriate genetic material using tools and/or techniques taken from a group including, but not limited to, genotyping microarrays and high-throughput sequencing. Some high-throughput sequencing methods include Sanger DNA sequencing, pyrosequencing, the Illumina Solexa platform, Illumina's Genome Analyzer, the 454 sequencing platform, Helicos' true single-molecule sequencing platform, Halcyon Molecular's electron microscope sequencing method, or any other sequencing method. All of these methods physically transform the genetic data stored in a DNA sample into a set of genetic data that is typically stored in a memory device en route to being processed.

The genetic data of related individuals may be measured by analyzing material taken from a group including, but not limited to: bulk diploid tissue from the individual, one or more diploid cells from the individual, one or more haploid cells from the individual, one or more blastomeres from the target individual, extracellular genetic material found in the individual, extracellular genetic material from the individual found in maternal blood, cells from the individual found in maternal blood, one or more embryos created from gametes of the related individual, one or more blastomeres taken from such embryos, extracellular genetic material found in the related individual, genetic material known to have originated from the related individual, and combinations thereof.

In some embodiments, a set of at least one ploidy state hypothesis may be created for each of the chromosome types of interest in the target individual. Each ploidy state hypothesis may refer to one possible ploidy state of a chromosome or chromosome segment of the target individual. The set of hypotheses may include some or all of the ploidy states that the target individual's chromosomes might be expected to have. Possible ploidy states may include nullisomy, monosomy, disomy, uniparental disomy, euploidy, trisomy, matched trisomy, unmatched trisomy, maternal trisomy, paternal trisomy, tetrasomy, balanced (2:2) tetrasomy, unbalanced (3:1) tetrasomy, pentasomy, hexasomy, other aneuploidies, and combinations thereof. Any of these aneuploidy states may be a mixed or partial aneuploidy, such as an unbalanced translocation, balanced translocation, Robertsonian translocation, recombination, deletion, insertion, crossover, or a combination thereof.
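One way to enumerate a hypothesis set like this is to list which parental homologs the fetus carries. The sketch below is a hypothetical Python illustration, not the disclosed procedure: each parent's two homologs are labeled M1/M2 and P1/P2, a hypothesis is the multiset of homologs inherited from each parent, and "matched" is assumed to mean two copies of the same parental homolog while "unmatched" means both homologs of that parent. Crossovers, translocations, and tetrasomy or higher are deliberately left out to keep the example small.

```python
from itertools import combinations_with_replacement


def copy_options(parent: str, max_copies: int):
    """All multisets of a parent's two homologs (e.g. M1, M2) of size 0..max_copies."""
    homologs = (f"{parent}1", f"{parent}2")
    options = []
    for n in range(max_copies + 1):
        options.extend(combinations_with_replacement(homologs, n))
    return options


def ploidy_hypotheses(max_per_parent: int = 2, max_total: int = 3):
    """Every (maternal homologs, paternal homologs) pair within the copy limits."""
    return [(mat, pat)
            for mat in copy_options("M", max_per_parent)
            for pat in copy_options("P", max_per_parent)
            if len(mat) + len(pat) <= max_total]


def label(mat, pat):
    """Name a hypothesis with the terms used in the text (matched/unmatched as assumed)."""
    n = len(mat) + len(pat)
    if n == 0:
        return "nullisomy"
    if n == 1:
        return "monosomy"
    if n == 2:
        return "disomy" if len(mat) == 1 else "uniparental disomy"
    if n == 3 and max(len(mat), len(pat)) == 2:
        doubled, origin = (mat, "maternal") if len(mat) == 2 else (pat, "paternal")
        kind = "matched" if doubled[0] == doubled[1] else "unmatched"
        return f"{origin} {kind} trisomy"
    return "other aneuploidy"


if __name__ == "__main__":
    for mat, pat in ploidy_hypotheses():
        print(mat, pat, label(mat, pat))
```

Keeping homolog identities, rather than bare copy counts, is what lets matched and unmatched trisomies appear as separate hypotheses.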

In some embodiments, knowledge of the determined ploidy state may be used to make a clinical decision. This knowledge is typically stored in a memory device as a physical arrangement of data and may then be transformed into a report, which can in turn be acted upon. For example, the clinical decision may be to terminate the pregnancy, or it may be to continue the pregnancy. In some embodiments, the clinical decision may involve an intervention designed to reduce the severity of the phenotypic expression of a genetic disorder, or a decision to take steps to prepare for a child with special needs.

In one embodiment of the invention, any of the methods described herein may be modified to allow for multiple target samples from the same target individual, for example multiple blood draws from the same pregnant mother. This may improve the accuracy of the model, because multiple genetic measurements provide more data that can be used to determine the target genotype. In one embodiment, one set of target genetic data serves as the primary data that is reported, and the others serve as data for double-checking the primary target genetic data. In one embodiment, the multiple sets of genetic data, each measured from genetic material taken from the target individual, are considered in parallel, and all of the sets of target genetic data are used to help determine which parts of the parental genetic data, measured with high accuracy, make up the fetal genome.
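If the sets of target data are treated as parallel, independent measurements of the same fetus, their evidence can be pooled by adding per-hypothesis log-likelihoods. The following Python sketch assumes conditional independence between samples and a uniform prior over hypotheses; neither assumption comes from the text, and the function name is hypothetical.

```python
import math


def combine_samples(per_sample_loglik):
    """Pool evidence from several samples of the same target individual.

    per_sample_loglik: list of dicts mapping hypothesis -> log-likelihood.
    Returns posterior probabilities under a uniform prior, treating the samples
    as conditionally independent given the hypothesis."""
    hypotheses = per_sample_loglik[0].keys()
    total = {h: sum(sample[h] for sample in per_sample_loglik) for h in hypotheses}
    top = max(total.values())  # subtract the maximum before exponentiating
    weights = {h: math.exp(v - top) for h, v in total.items()}
    z = sum(weights.values())
    return {h: w / z for h, w in weights.items()}


if __name__ == "__main__":
    draw = {"disomy": -1.0, "trisomy": -2.0}  # illustrative log-likelihoods
    print(combine_samples([draw]))
    print(combine_samples([draw, draw]))
```

With two draws that agree, the posterior for the favored hypothesis rises, which is the sense in which additional blood draws can sharpen the call.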

In one embodiment, the method may be used for paternity testing. For example, given SNP-based genotype information from the mother and from a man who may or may not be the genetic father, together with genotype information measured from the mixed sample, it is possible to determine whether the man's genotype information actually represents that of the true genetic father of the gestating fetus. A simple way to do this is to look at the contexts in which the mother is AA and the possible father is AB or BB. In these cases, one would expect the paternal B allele to appear in the fetus half of the time (AA|AB) or all of the time (AA|BB), respectively. Taking the expected allele dropout (ADO) rate into account, it is straightforward to determine whether the observed fetal SNPs are consistent with those of the possible father.
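The homozygous-mother check can be sketched as follows. This hypothetical Python helper assumes genotype strings over A and B, observed fetal alleles given as a string such as "AB" or "A", and that allele dropout only removes alleles and never creates them. Under that assumption, a fetal allele found in neither the mother nor the alleged father counts against paternity regardless of ADO, while a missing paternal allele in an AA|BB context may be due to ADO, so its rate can be compared with the expected ADO rate.

```python
def paternity_summary(loci):
    """Summarize homozygous-mother loci for a simple paternity check.

    loci: iterable of (mother, alleged_father, observed_fetal_alleles), with
    genotypes such as 'AA' or 'AB' and observed fetal alleles such as 'AB' or 'A'."""
    summary = {"informative": 0, "excluding": 0, "aa_bb": 0, "missing_paternal": 0}
    for mother, father, fetal in loci:
        if mother[0] != mother[1]:
            continue  # heterozygous mother: skipped in this simple check
        summary["informative"] += 1
        non_maternal = set(fetal) - set(mother)
        if non_maternal - set(father):
            # Allele in neither parent; ADO only removes alleles, so it cannot explain this.
            summary["excluding"] += 1
        if set(father).isdisjoint(mother):
            # AA|BB-type context: a true father always transmits the non-maternal allele.
            summary["aa_bb"] += 1
            if not non_maternal:
                summary["missing_paternal"] += 1  # either ADO or non-paternity
    return summary


if __name__ == "__main__":
    loci = [("AA", "BB", "AB"), ("AA", "AB", "A"), ("AA", "AA", "AB"),
            ("AB", "AB", "AB"), ("AA", "BB", "A")]
    print(paternity_summary(loci))
```

The ratio `missing_paternal / aa_bb` can then be compared with the expected ADO rate; a rate far above it, or more excluding loci than genotyping error would explain, argues against the alleged father.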

An embodiment of the invention may be as follows: a pregnant woman wants to know whether her fetus has Down syndrome and/or will develop cystic fibrosis, and she does not wish to give birth to a child with either of these conditions. Her doctor draws her blood and stains the hemoglobin with one marker so that it appears clearly red, and the nuclear material with another marker so that it appears clearly blue. Knowing that maternal red blood cells are typically anuclear, while a high proportion of fetal cells contain nuclei, the doctor is able to visually isolate a number of nucleated red blood cells by identifying the cells that show both red and blue. The doctor picks up these cells from the slide with a micromanipulator and sends them to a laboratory, where ten individual cells are amplified and genotyped. Using the genetic measurements, the PARENTAL SUPPORT™ method is able to determine that six of the ten cells are maternal blood cells and four of the ten are fetal cells. If the pregnant mother has already given birth to a child, PARENTAL SUPPORT™ can also be used to determine that the fetal cells are distinct from the cells of the born child, by making reliable allele calls on the fetal cells and showing that they differ from those of the born child. Note that this approach is conceptually similar to the paternity testing embodiment of the invention. The genetic data measured from the fetal cells may be of very poor quality, containing many allele dropouts, because of the difficulty of genotyping single cells. Using PARENTAL SUPPORT™, the clinician is able to infer aspects of the fetal genome with high accuracy from the measured fetal DNA together with reliable DNA measurements of the parents, thereby transforming the genetic data contained in the fetal genetic material into a predicted fetal genetic state stored on a computer. The clinician is able to determine the ploidy state of the fetus and the presence or absence of a number of disease-linked genes of interest. The results indicate that the fetus is euploid and is not a carrier of cystic fibrosis, and the mother decides to continue the pregnancy.

In one embodiment of the invention, a pregnant mother would like to determine whether her fetus has any whole-chromosome abnormalities. She goes to her doctor and gives a blood sample, and she and her husband give samples of their own DNA from cheek swabs. A laboratory researcher amplifies the parental DNA using the MDA protocol and genotypes it with the Illumina Infinium array, measuring the parental genetic data at a large number of SNPs. The researcher then spins down the blood, removes the plasma, and isolates a sample of free-floating DNA using size exclusion chromatography. Alternatively, the researcher uses one or more fluorescent antibodies, such as an antibody specific to fetal hemoglobin, to isolate nucleated fetal red blood cells. The researcher then takes the isolated or enriched fetal genetic material and amplifies it using a library of 70-mer oligonucleotides, each appropriately designed so that its two ends correspond to the sequences flanking an allele of interest. After polymerase, ligase, and the appropriate reagents are added, the oligonucleotides undergo gap-filling circularization, capturing the desired alleles. Exonuclease is added and then heat-inactivated, and the products are used directly as templates for PCR amplification. The PCR products are sequenced on an Illumina Genome Analyzer. The sequence reads are used as input to the PARENTAL SUPPORT™ method, which then predicts the ploidy state of the fetus.

In another embodiment, a couple in which the mother is pregnant and of advanced maternal age would like to know whether the gestating fetus has Down syndrome, Turner syndrome, Prader-Willi syndrome, or some other whole-chromosome abnormality. The obstetrician draws blood from the mother and the father. The blood is sent to a laboratory, where a technician centrifuges the maternal sample to separate the plasma and the buffy coat. The DNA in the buffy coat and in the paternal blood sample is transformed by amplification, and the genetic data encoded in the amplified genetic material is further transformed from molecularly stored genetic data into electronically stored genetic data by running the genetic material on a high-throughput sequencer to measure the parental genotypes. The plasma sample is preferentially enriched at a set of loci using a 5,000-plex hemi-nested targeted PCR method. The mixture of DNA fragments is prepared into a DNA library suitable for sequencing, and the DNA is then sequenced using a high-throughput sequencing method, such as the Illumina GAIIx Genome Analyzer. Sequencing transforms the information molecularly encoded in the DNA into information electronically encoded in computer hardware.
Information-based techniques, including the embodiments disclosed herein such as PARENTAL SUPPORT™, can then be used to determine the ploidy state of the fetus. This may involve: calculating, on a computer, allele count probabilities at a plurality of polymorphic loci from the DNA measurements made on the prepared sample; creating, on a computer, a plurality of ploidy hypotheses, each concerning a different possible ploidy state of the chromosome; building, on a computer, a joint distribution model for the expected allele counts at the plurality of polymorphic loci on the chromosome for each ploidy hypothesis; determining, on a computer, the relative probability of each ploidy hypothesis using the joint distribution model and the allele counts measured on the prepared sample; and calling the ploidy state of the fetus by selecting the ploidy state corresponding to the hypothesis with the greatest probability. The fetus is determined to have Down syndrome. A report is printed out or sent electronically to the pregnant woman's obstetrician, who communicates the diagnosis to the woman. The woman, her husband, and the doctor sit down to discuss their options. Based on the knowledge that the fetus has a trisomy, the couple decides to terminate the pregnancy.
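The hypothesis-testing steps in this workflow can be illustrated with a deliberately simplified model. The Python sketch below is not the PARENTAL SUPPORT™ algorithm: it treats SNPs as independent (no linkage, so matched and unmatched trisomies are not distinguished), draws each inherited chromosome copy uniformly from the parent's two alleles, models read counts binomially around the expected B-allele fraction for a known fetal fraction, and uses a hypothetical error floor `EPS`. The hypothesis names, copy-number pairs, and function names are all illustrative assumptions.

```python
import math
from itertools import product

EPS = 1e-3  # hypothetical floor on allele fractions, standing in for read errors

# Hypothetical hypothesis set: (maternal copies, paternal copies) of the chromosome.
HYPOTHESES = {
    "nullisomy": (0, 0),
    "maternal monosomy": (1, 0),
    "paternal monosomy": (0, 1),
    "disomy": (1, 1),
    "maternal trisomy": (2, 1),
    "paternal trisomy": (1, 2),
}


def expected_b_fraction(mother, fetal_alleles, fetal_fraction):
    """B-allele fraction in the mixture; each fetal copy contributes fetal_fraction / 2."""
    f = fetal_fraction
    num = (1 - f) / 2 * mother.count("B") + f / 2 * fetal_alleles.count("B")
    den = (1 - f) + f / 2 * len(fetal_alleles)
    return min(max(num / den, EPS), 1 - EPS)


def snp_log_likelihood(mother, father, n_a, n_b, copies, fetal_fraction):
    """Log of the mean binomial likelihood over equally likely fetal genotypes.
    The binomial coefficient is omitted because it is identical for every hypothesis."""
    mat_copies, pat_copies = copies
    logs = []
    for mat in product(mother, repeat=mat_copies):
        for pat in product(father, repeat=pat_copies):
            p = expected_b_fraction(mother, mat + pat, fetal_fraction)
            logs.append(n_b * math.log(p) + n_a * math.log(1 - p))
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(x - top) for x in logs) / len(logs))


def call_ploidy(snps, fetal_fraction):
    """snps: list of (mother, father, n_a, n_b). Returns (best hypothesis, scores)."""
    scores = {
        name: sum(snp_log_likelihood(m, f, a, b, copies, fetal_fraction)
                  for m, f, a, b in snps)
        for name, copies in HYPOTHESES.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Synthetic AA|BB loci at a 20% fetal fraction; a disomic AB fetus implies 10% B reads.
    best, scores = call_ploidy([("AA", "BB", 900, 100)] * 5, 0.2)
    print(best)
```

Separating matched from unmatched trisomy would require a model that accounts for linkage between neighboring SNPs; this sketch omits it.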

In one embodiment, a company may decide to offer a diagnostic technology designed to detect aneuploidy in a gestating fetus from a maternal blood draw. Their product may involve a mother presenting to her obstetrician, who may draw her blood. The obstetrician may also collect a genetic sample from the father of the fetus. A clinician may isolate plasma from the maternal blood and purify DNA from the plasma. The clinician may also isolate the buffy coat from the maternal blood and prepare DNA from the buffy coat, and may prepare DNA from the paternal genetic sample. The clinician may use the molecular biology techniques described in this disclosure to append universal amplification tags to the DNA derived from the plasma sample, and may then amplify the universally tagged DNA. The clinician may preferentially enrich the DNA by any of a number of techniques, including capture by hybridization and targeted PCR.
Targeted PCR may involve nesting, hemi-nesting, semi-nesting, or any other approach that results in efficient enrichment of the plasma-derived DNA. Targeted PCR may be massively multiplexed, for example with 10,000 primers in one reaction volume, where the primers target SNPs on chromosomes 13, 18, 21, and X, at loci common to both X and Y, and optionally on other chromosomes. The selective enrichment and/or amplification may involve tagging each individual molecule with different tags, molecular barcodes, tags for amplification, and/or tags for sequencing. The clinician may then sequence the plasma sample, and possibly also the prepared maternal and/or paternal DNA. The molecular biology steps may be carried out, in whole or in part, by a diagnostic kit. The sequence data may be fed into a single computer or into another type of computing platform, such as one found in "the cloud". The computing platform may calculate allele counts at the targeted polymorphic loci from the measurements made by the sequencer. For each of chromosomes 13, 18, 21, X, and Y, the computing platform may create a plurality of ploidy hypotheses concerning nullisomy, monosomy, disomy, matched trisomy, and unmatched trisomy. For each ploidy hypothesis for each of the five chromosomes being interrogated, the computing platform may build a joint distribution model for the expected allele counts at the targeted loci on that chromosome. The computing platform may then determine the probability that each ploidy hypothesis is correct, using the joint distribution model and the allele counts measured on the preferentially enriched DNA derived from the plasma sample. For each of chromosomes 13, 18, 21, X, and Y, the computing platform may call the ploidy state of the fetus by selecting the ploidy state corresponding to the relevant hypothesis with the greatest probability.
A report containing the called ploidy states may be generated and sent electronically to the obstetrician, displayed on an output device, or delivered to the obstetrician as a printed hard copy. The obstetrician may inform the patient, and optionally the father of the fetus, and they may decide which clinical options are acceptable to them and which is most desirable.
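The step of calculating allele counts at the targeted polymorphic loci from sequencer output can be sketched as a simple tally. This Python illustration assumes reads have already been aligned and reduced to (locus, observed base) pairs; the panel format, locus identifiers such as `snp_001`, and the function name are hypothetical. Bases matching neither expected allele are ignored here, although a real pipeline might use them to estimate the sequencing error rate.

```python
from collections import Counter, defaultdict


def count_alleles(read_calls, snp_panel):
    """Tally A and B allele counts per targeted SNP.

    read_calls: iterable of (locus_id, base) pairs, one per read covering a SNP.
    snp_panel: dict mapping locus_id -> (allele_a_base, allele_b_base).
    Returns dict locus_id -> (n_a, n_b); loci without reads get (0, 0)."""
    tallies = defaultdict(Counter)
    for locus, base in read_calls:
        if locus in snp_panel:  # reads at off-panel loci are dropped
            tallies[locus][base] += 1
    return {locus: (tallies[locus][a], tallies[locus][b])
            for locus, (a, b) in snp_panel.items()}


if __name__ == "__main__":
    panel = {"snp_001": ("A", "G"), "snp_002": ("C", "T"), "snp_003": ("A", "C")}
    reads = [("snp_001", "A"), ("snp_001", "G"), ("snp_001", "G"),
             ("snp_002", "C"), ("snp_001", "T"), ("snp_999", "A")]
    print(count_alleles(reads, panel))
```

The resulting (n_a, n_b) pairs, grouped by chromosome, are the kind of input the per-chromosome hypothesis comparison described above would consume.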

In another embodiment, a pregnant woman (hereinafter "the mother") may decide that she wants to know whether her fetus carries any genetic abnormalities or other conditions. She may want to be sure there are no gross abnormalities before committing to continuing the pregnancy. She may go to her obstetrician, who may take a sample of her blood. He may also obtain a genetic sample from her, such as a buccal swab from her cheek. He may also obtain a genetic sample from the father of the fetus, such as a buccal swab, a sperm sample, or a blood sample. He may send the samples to a clinician. The clinician may enrich the fraction of free-floating fetal DNA in the maternal blood sample. The clinician may enrich the fraction of nucleated fetal blood cells in the maternal blood sample. The clinician may use various aspects of the methods described herein to determine genetic data for the fetus. This genetic data may include the ploidy state of the fetus and/or the identity of one or more disease-linked alleles in the fetus. A report summarizing the results of the prenatal diagnosis may be generated and transmitted or mailed to the doctor, who may inform the mother of the genetic state of the fetus. The mother may decide to terminate the pregnancy because the fetus has one or more chromosomal or genetic abnormalities or undesirable conditions. She may also decide to continue the pregnancy because the fetus does not have any gross chromosomal or genetic abnormalities or any genetic conditions of concern.

Another example may involve a woman who became pregnant through artificial insemination by a sperm donor. She wants to minimize the risk that the fetus she is carrying has a genetic disease. She has her blood drawn by a phlebotomist, and the techniques described in this disclosure are used to isolate three nucleated fetal red blood cells; tissue samples are also collected from the mother and the genetic father. The genetic material from the fetus and from the mother and father is amplified as appropriate and genotyped using the Illumina Infinium BeadArray, and the methods described herein clean and phase the parental and fetal genotypes with high accuracy and make ploidy calls for the fetus. The fetus is found to be euploid, phenotypic susceptibilities are predicted from the reconstructed fetal genotype, and a report is generated and sent to the mother's physician so that they can decide which clinical decisions may be best.

In one embodiment, the original genetic material of the mother and father is transformed by amplification into a larger quantity of DNA that is similar in sequence. Then, by a genotyping method, the genotype data encoded by the nucleic acids is transformed into genetic measurements, such as those described above, that may be stored physically and/or electronically on a memory device. The relevant algorithms that make up the PARENTAL SUPPORT™ algorithm, the relevant parts of which are discussed in detail herein, are translated into a computer program using a programming language. Then, by executing the computer program on computer hardware, the physically encoded bits and bytes, arranged in a pattern that represents raw measurement data, are transformed into a pattern that represents a high-confidence determination of the fetal ploidy state. The details of this transformation depend on the data itself and on the computer language and hardware system used to carry out the methods described herein.
The data, physically configured to represent a high-quality ploidy determination for the fetus, is then transformed into a report that may be sent to a healthcare practitioner. This transformation may be carried out using a printer or a computer display. The report may be a copy printed on paper or another suitable medium, or it may be electronic. An electronic report may be transmitted; it may be physically stored on a memory device at a location accessible to the healthcare practitioner's computer; and it may be displayed on a screen so that it can be read. In the case of a screen display, the data may be transformed into a readable format by causing a physical transformation of pixels on the display device. The transformation may be accomplished by physically firing electrons at a phosphorescent screen, or by altering an electric charge so as to physically change the transparency of a specific set of pixels on a screen that may lie in front of a substrate that emits or absorbs photons. The transformation may be accomplished by changing the nanoscale orientation of the molecules in a liquid crystal at a specific set of pixels, for example from a nematic to a cholesteric or smectic phase. The transformation may be accomplished by an electric current that causes photons to be emitted from a specific set of pixels made of a number of light-emitting diodes arranged in a meaningful pattern. The transformation may be accomplished by any other means of displaying information, such as a computer screen or some other output device or means of transmitting information. The healthcare practitioner may then act on the report, such that the data in the report is transformed into an action. The action may be to continue or to terminate the pregnancy; in the latter case, a gestating fetus with a genetic abnormality is transformed into a non-living fetus.
The transformations listed herein may be aggregated, such that, for example, the genetic material of a pregnant mother and the father may be transformed, through a number of the steps outlined in this disclosure, into a medical decision consisting of aborting a genetically abnormal fetus or of continuing the pregnancy. Alternatively, a set of genotype measurements may be transformed into a report that helps a physician treat his pregnant patient.

In one embodiment of the invention, the methods described herein may be used to determine the ploidy state of a fetus even when the host mother, that is, the pregnant woman, is not the biological mother of the fetus she is carrying. In one embodiment, the methods described herein may be used to determine the ploidy state of a fetus using only a maternal blood sample, without the need for a paternal genetic sample.

Some of the mathematical methods in the disclosed embodiments set up hypotheses concerning a limited number of aneuploidy states. In some cases, for example, only zero, one, or two chromosomes are expected to originate from each parent. In some embodiments of the invention, the mathematical derivation can be extended to account for other forms of aneuploidy, such as tetrasomy (where three chromosomes originate from one parent), pentasomy, hexasomy, and so on, without changing the basic concepts of the invention. It is also possible to focus on a smaller number of ploidy states, for example only trisomy and disomy. Note that a ploidy determination indicating a non-integer number of chromosomes may indicate mosaicism in the sample of genetic material.
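The link between a non-integer copy number and mosaicism can be made concrete with a two-population mixture. If a fraction m of cells carries a copies and the rest carry the baseline b, the mean copy number is (1 - m)b + ma, so m = (mean - b)/(a - b). The Python sketch below, with a hypothetical function name, inverts that relation; it assumes exactly two cell populations and an already-estimated mean copy number.

```python
def mosaic_fraction(mean_copies: float, baseline: int = 2, abnormal: int = 3) -> float:
    """Fraction of cells carrying `abnormal` copies implied by a mean copy number,
    assuming a mixture of exactly two populations (baseline and abnormal cells)."""
    if abnormal == baseline:
        raise ValueError("abnormal and baseline copy numbers must differ")
    m = (mean_copies - baseline) / (abnormal - baseline)
    if not 0.0 <= m <= 1.0:
        raise ValueError("mean_copies lies outside the range of this two-population mixture")
    return m


if __name__ == "__main__":
    print(mosaic_fraction(2.3))                           # trisomic mosaic
    print(mosaic_fraction(1.5, baseline=2, abnormal=1))   # monosomic mosaic
```

An estimate of 2.3 copies, for instance, would correspond to roughly 30% trisomic cells under this model.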

In some embodiments, the genetic abnormality is a type of aneuploidy, such as Down syndrome (trisomy 21), Edwards syndrome (trisomy 18), Patau syndrome (trisomy 13), Turner syndrome (45,X), Klinefelter syndrome (a male with two X chromosomes), Prader-Willi syndrome (UPD 15), or DiGeorge syndrome. Congenital disorders such as those listed in the previous sentence are typically undesirable, and knowledge that a fetus has one or more phenotypic abnormalities may inform a decision to terminate the pregnancy, to take the necessary precautions to prepare for the birth of a child with special needs, or to adopt a therapeutic approach meant to lessen the severity of the chromosomal abnormality.

In some embodiments, the methods described herein can be used at very early gestational ages, for example as early as four, five, six, seven, eight, nine, ten, eleven or twelve weeks.

In some embodiments, the methods disclosed herein are used for embryo selection during in vitro fertilization in the context of preimplantation genetic diagnosis (PGD), where the target individual is an embryo and parental genotype data can be used to make a ploidy determination for the embryo from sequencing data obtained from a single-cell or two-cell biopsy of a day 3 embryo, or from a trophoblast biopsy of a day 5 or day 6 embryo. In PGD, only the child's DNA is measured, and only a small number of cells are tested, typically one to five, but as many as ten, twenty or fifty. The total number of starting copies of the A and B alleles (at a SNP) is then determined simply by the child's genotype and the number of cells. In NIPD, the starting copy number is very high, so the allele ratio after PCR is expected to reflect the starting ratio accurately. In PGD, however, the small number of starting copies means that contamination and imperfect PCR efficiency have a significant effect on the allele ratio after PCR. This effect may matter more than read depth in predicting the deviation of the allele ratios measured after sequencing. A distribution of the measured allele ratio for a known child genotype can be created by Monte Carlo simulation of the PCR process, based on PCR probe efficiencies and contamination probabilities. Given the allele-ratio distribution for each possible child genotype, the likelihood of each hypothesis can be calculated as described for NIPD.
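The Monte Carlo step above can be sketched as follows. This is a minimal illustration, not the disclosed implementation. It assumes a simple branching model in which every molecule is copied in each cycle with a fixed probability (`efficiency`), and at most one contaminating molecule is added before amplification (`contam_prob`). The function names, the defaults and the contamination model are all hypothetical.

```python
import random


def simulate_pcr_allele_ratio(n_a, n_b, cycles=8, efficiency=0.9,
                              contam_prob=0.0, rng=None):
    """Return the A-allele fraction after one simulated PCR run.

    n_a, n_b: starting copies of alleles A and B.
    efficiency: assumed per-cycle chance that a given molecule is copied.
    contam_prob: assumed chance that one contaminating molecule (allele
    picked 50/50) is present before amplification.
    """
    rng = rng or random.Random()
    if rng.random() < contam_prob:
        if rng.random() < 0.5:
            n_a += 1
        else:
            n_b += 1
    for _ in range(cycles):
        # every molecule present at the start of the cycle is copied
        # independently with probability `efficiency`
        n_a += sum(rng.random() < efficiency for _ in range(n_a))
        n_b += sum(rng.random() < efficiency for _ in range(n_b))
    total = n_a + n_b
    return n_a / total if total else float("nan")


def allele_ratio_distribution(genotype, n_cells, trials=500, seed=0, **kwargs):
    """Monte Carlo sample of post-PCR A fractions for a known child genotype."""
    rng = random.Random(seed)
    n_a = genotype.count("A") * n_cells  # one copy per allele per cell
    n_b = genotype.count("B") * n_cells
    return [simulate_pcr_allele_ratio(n_a, n_b, rng=rng, **kwargs)
            for _ in range(trials)]
```

Binning the returned fractions for each candidate child genotype gives an empirical allele-ratio distribution, from which the likelihood of an observed ratio under each hypothesis can be read off.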

Maximum likelihood estimation

Most methods known in the art for detecting the presence or absence of a biological phenomenon or medical condition use a single-hypothesis rejection test: a metric related to the condition is measured, and if the metric falls on one side of a specified threshold the condition is called present, while if it falls on the other side the condition is called absent. When deciding between the null hypothesis and the alternative hypothesis, a single-hypothesis rejection test looks only at the null distribution. Without taking the alternative distribution into account, the likelihood of each hypothesis given the observed data cannot be estimated, and so the confidence in the call cannot be calculated. A single-hypothesis rejection test therefore yields a yes-or-no answer without any knowledge of the confidence associated with the particular case.

In some embodiments, the methods disclosed herein detect the presence or absence of a biological phenomenon or medical condition using maximum likelihood methods. This is a substantial improvement over methods that use single-hypothesis rejection, because the threshold for calling the condition absent or present can be adjusted for each case as appropriate. This is especially pertinent to diagnostic techniques that aim to determine the presence or absence of aneuploidy in a gestating fetus from genetic data obtained from the mixture of fetal and maternal DNA in the free-floating DNA found in maternal plasma, because the optimal threshold for calling aneuploidy versus euploidy changes as the fraction of fetal DNA in the plasma-derived sample changes. As the fetal fraction decreases, the distribution of data associated with aneuploidy becomes increasingly similar to the distribution of data associated with euploidy.

Maximum likelihood estimation uses the distribution associated with each hypothesis to estimate the likelihood of the data conditioned on that hypothesis. These conditional probabilities can then be converted into a hypothesis call and a confidence. Similarly, maximum a posteriori (MAP) estimation uses the same conditional probabilities as maximum likelihood estimation, but also incorporates population priors when selecting the best hypothesis and determining the confidence.

Using a maximum likelihood estimation (MLE) technique, or the closely related maximum a posteriori (MAP) technique, therefore gives two advantages: it increases the chance of a correct call, and it allows a confidence to be calculated for each call. In one embodiment, selecting the ploidy state corresponding to the hypothesis with the greatest probability is performed using maximum likelihood estimation or maximum a posteriori estimation. In one embodiment, a method for determining the ploidy state of a gestating fetus is disclosed that involves taking any method currently known in the art that uses a single-hypothesis rejection technique and reformulating it so that it uses an MLE or MAP technique. Some examples of methods that could be significantly improved by applying these techniques can be found in U.S. Patent 8,008,018, U.S. Patent 7,888,017 or U.S. Patent 7,332,277.
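To make the MLE/MAP distinction concrete, here is a minimal sketch that turns per-hypothesis log-likelihoods into a call and a confidence. It assumes the log-likelihoods have already been computed; the hypothesis labels and the `call_with_confidence` helper are illustrative only. Without priors the call is the MLE (equivalent to a flat prior); with priors it is the MAP call. In both cases the confidence is the normalized posterior probability of the winning hypothesis.

```python
import math


def call_with_confidence(log_liks, log_priors=None):
    """Pick the best hypothesis and return (call, confidence).

    log_liks: dict hypothesis -> log P(data | hypothesis).
    log_priors: optional dict hypothesis -> log prior probability. If given,
    the call is MAP; otherwise it is MLE (flat prior).
    """
    scores = {h: ll + (log_priors[h] if log_priors else 0.0)
              for h, ll in log_liks.items()}
    top = max(scores.values())
    # log-sum-exp normalization turns the scores into posterior probabilities
    norm = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    posteriors = {h: math.exp(s - norm) for h, s in scores.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors[best]
```

For example, log-likelihoods of -10 for H11 and -12 for H21 give an H11 call with confidence 1/(1 + e^-2), about 0.88; a strong enough prior for H21 can reverse the call.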

In one embodiment, a method is described for determining the presence or absence of fetal aneuploidy in a maternal plasma sample comprising fetal and maternal genomic DNA, the method comprising: obtaining a maternal plasma sample; measuring the DNA fragments found in the plasma sample with a high-throughput sequencer; mapping the sequences to chromosomes and determining the number of sequence reads that map to each chromosome; calculating the fraction of fetal DNA in the plasma sample; using the fetal fraction and the number of sequence reads that map to one or more reference chromosomes expected to be euploid, calculating an expected distribution of the amount of the target chromosome that would be present if the target chromosome were euploid, and one or more expected distributions that would be expected if that chromosome were aneuploid; and using MLE or MAP to determine which of these distributions is most likely to be correct, thereby indicating the presence or absence of fetal aneuploidy. In one embodiment, measuring the DNA from the plasma may involve performing massively parallel shotgun sequencing. In one embodiment, measuring the DNA from the plasma sample may involve sequencing DNA that has been preferentially enriched, for example by targeted amplification, at a plurality of polymorphic or non-polymorphic loci. The plurality of loci can be designed to target one or a small number of suspected aneuploid chromosomes and one or a small number of reference chromosomes. The purpose of preferential enrichment is to increase the number of sequence reads that are informative for the ploidy determination.
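The counting-based embodiment can be illustrated with a deliberately simplified model. This sketch assumes, hypothetically, that the number of reads mapping to the target chromosome is binomially distributed, that the chromosome's expected share of reads under euploidy (`ref_share`) is known from reference chromosomes, and that the fetal fraction has already been estimated. GC, mapping and other biases that a real pipeline would need to handle are not modeled.

```python
import math


def log_binom_pmf(k, n, p):
    """log Binomial(k; n, p) for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def expected_target_share(ref_share, fetal_fraction, fetal_copies):
    """Expected share of reads on the target chromosome.

    ref_share: share expected when mother and fetus are both disomic.
    fetal_copies: 1 for monosomy, 2 for euploidy, 3 for trisomy.
    The mother contributes 2 copies, the fetus `fetal_copies`, and the
    result is renormalized against the rest of the genome.
    """
    weight = ref_share * ((1 - fetal_fraction) * 2
                          + fetal_fraction * fetal_copies) / 2
    return weight / (weight + (1 - ref_share))


def call_ploidy(target_reads, total_reads, ref_share, fetal_fraction):
    """Return the maximum likelihood hypothesis and all log-likelihoods."""
    hyps = {"monosomy": 1, "disomy": 2, "trisomy": 3}
    log_liks = {name: log_binom_pmf(target_reads, total_reads,
                                    expected_target_share(ref_share,
                                                          fetal_fraction, k))
                for name, k in hyps.items()}
    return max(log_liks, key=log_liks.get), log_liks
```

With `ref_share = 0.015` and a 10% fetal fraction, 1,000,000 reads are expected to put about 15,000 reads on the target under disomy and about 15,738 under trisomy; the returned log-likelihoods can be passed to an MLE/MAP step to obtain a confidence.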

Ploidy calling informatics method

Described herein is a method for determining the ploidy state of a fetus given sequence data. In some embodiments, the sequence data can be measured on a high-throughput sequencer. In some embodiments, the sequence data can be measured on DNA derived from free-floating DNA isolated from maternal blood, where the free-floating DNA contains some DNA of maternal origin and some DNA of fetal/placental origin. This section describes an embodiment in which the ploidy state of the fetus is determined on the assumption that the fraction of fetal DNA in the analyzed mixture is not known and is estimated from the data. It also describes an embodiment in which the fraction of fetal DNA in the mixture (the "fetal fraction"), or the percentage of fetal DNA, is measured by another method and assumed to be known when determining the fetal ploidy state. In some embodiments, the fetal fraction can be calculated using only genotype measurements made on the maternal blood sample itself, which is a mixture of fetal and maternal DNA. In some embodiments, the fraction can also be calculated using the measured or otherwise known maternal genotype and/or the measured or otherwise known paternal genotype. In another embodiment, the ploidy state of the fetus can be determined solely by comparing the fetal fraction calculated for the chromosome in question with the fetal fraction calculated for a reference chromosome assumed to be disomic.

In a preferred embodiment, suppose that on a given chromosome we observe and analyze N SNPs, for which we have:

● a set of NR free-floating DNA sequence measurements $S = (s_1, \ldots, s_{NR})$. Because this method uses SNP measurements, all sequence data corresponding to non-polymorphic loci can be ignored. In a simplified version, where we have (A, B) counts at each SNP, A and B being the two alleles present at a given locus, S can be written as $S = ((a_1, b_1), \ldots, (a_N, b_N))$, where $a_i$ is the A count at SNP i and $b_i$ is the B count at SNP i, and

● parental data consisting of:

○ genotypes from a SNP microarray or other intensity-based genotyping platform: mother $M = (m_1, \ldots, m_N)$ and father $F = (f_1, \ldots, f_N)$, where $m_i, f_i \in \{AA, AB, BB\}$;

○ and/or sequence data measurements: NRM maternal measurements $SM = (sm_1, \ldots, sm_{NRM})$ and NRF paternal measurements $SF = (sf_1, \ldots, sf_{NRF})$. As in the simplification above, if we have (A, B) counts at each SNP, then $SM = ((am_1, bm_1), \ldots, (am_N, bm_N))$ and $SF = ((af_1, bf_1), \ldots, (af_N, bf_N))$.

Collectively, the mother, father and child data are denoted D = (M, F, SM, SF, S). Note that parental data are desirable and increase the accuracy of the algorithm, but are not required, especially the paternal data. This means that very accurate copy number results are still possible in the absence of maternal and/or paternal data.

The best copy number estimate can be inferred by maximizing the data log-likelihood LIK(D|H) over all hypotheses H considered. Specifically, the joint distribution model and the allele counts measured on the prepared sample can be used to determine the relative probability of each ploidy hypothesis, and those relative probabilities can be used to determine the hypothesis most likely to be correct, as follows:

$$H^{*} = \arg\max_{H} LIK(D \mid H)$$

Similarly, given the data, the posterior hypothesis likelihood (in log form) can be written as:

$$LIK(H \mid D) = LIK(D \mid H) + \log \mathrm{priorprob}(H)$$

where priorprob(H) is the prior probability assigned to each hypothesis H on the basis of the model design and prior knowledge.

The priors can also be used to find the maximum a posteriori estimate:

$$H^{*}_{MAP} = \arg\max_{H} LIK(H \mid D)$$

In one embodiment, the copy number hypotheses that may be considered are:

● Monosomy:

○ maternal, H10 (one copy from the mother)

○ paternal, H01 (one copy from the father)

● Disomy: H11 (one copy each from the mother and the father)

● Simple trisomy, without crossovers:

○ maternal: H21_matched (two identical copies from the mother, one copy from the father) and H21_unmatched (one copy of each of the mother's two homologs, one copy from the father)

○ paternal: H12_matched (one copy from the mother, two identical copies from the father) and H12_unmatched (one copy from the mother, one copy of each of the father's two homologs)

● Composite trisomy, allowing crossovers (using a joint distribution model):

○ maternal, H21 (two copies from the mother, one from the father)

○ paternal, H12 (one copy from the mother, two copies from the father)

In other embodiments, other ploidy states may be considered, such as nullisomy (H00), uniparental disomy (H20 and H02) and tetrasomy (H04, H13, H22, H31 and H40).

If there were no crossovers, every trisomy, whether of mitotic, meiosis I or meiosis II origin, would be either a matched or an unmatched trisomy. Because of crossovers, a real trisomy is usually a combination of the two. First, a method for deriving hypothesis likelihoods for the simple hypotheses is described. Then a method for deriving hypothesis likelihoods for the composite hypotheses is described, which combines the individual SNP likelihoods with crossovers.

LIK(D|H) for simple hypotheses

In one embodiment, LIK(D|H) for a simple hypothesis can be determined as follows. For a simple hypothesis H, the log-likelihood of H over the whole chromosome can be calculated as the sum of the log-likelihoods of the individual SNPs, given a known or inferred child fraction cf:

$$LIK(D \mid H) = \sum_{i=1}^{N} LIK(D \mid H, cf, i)$$

In one embodiment, cf can be inferred from the data.

This formulation assumes no linkage between SNPs and therefore does not use a joint distribution model.

In some embodiments, the log-likelihood can be determined on a per-SNP basis. At a particular SNP i, assuming fetal ploidy hypothesis H and fetal DNA fraction cf, the log-likelihood of the observed data D is defined as:

$$LIK(D \mid H, cf, i) = \log P(D \mid H, cf, i) = \log \sum_{m, f, c} P(D \mid m, f, c, H, cf, i)\; p(c \mid m, f, H)\; p(m \mid i)\; p(f \mid i)$$

where m is a possible true maternal genotype and f is a possible true paternal genotype, with $m, f \in \{AA, AB, BB\}$, and c is a possible child genotype given hypothesis H. Specifically, for monosomy $c \in \{A, B\}$; for disomy $c \in \{AA, AB, BB\}$; and for trisomy $c \in \{AAA, AAB, ABB, BBB\}$.

Genotype prior frequency: p(m|i) is the general prior probability of maternal genotype m at SNP i, based on the known population frequency at SNP i, denoted $pA_i$. Specifically:

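$$p(AA \mid i) = pA_i^{\,2}; \qquad p(AB \mid i) = 2\,pA_i\,(1 - pA_i); \qquad p(BB \mid i) = (1 - pA_i)^2$$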

The paternal genotype probability p(f|i) can be determined in the same way.

True child probability: p(c|m, f, H) is the probability of obtaining the true child genotype c given parental genotypes m and f and assuming hypothesis H; it is easily calculated. For example, under H11 the child receives one allele from each parent, each parental allele being equally likely; under H21 matched it receives two identical copies of one maternal allele plus one paternal allele; and under H21 unmatched it receives both maternal alleles plus one paternal allele.
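The transmission probabilities p(c|m, f, H) for H11, H21 matched and H21 unmatched can be enumerated directly. The sketch below assumes Mendelian segregation at a single SNP, with each parental homolog equally likely to be transmitted; the hypothesis labels and the `child_genotype_probs` name are hypothetical.

```python
from collections import Counter
from itertools import product


def child_genotype_probs(mother, father, hypothesis):
    """Return {child genotype: probability} at one SNP.

    mother, father: two-letter genotypes such as "AB".
    hypothesis: "H11", "H21_matched" or "H21_unmatched".
    Child genotypes are written with alleles sorted, e.g. "AAB".
    """
    if hypothesis == "H11":
        maternal = [(a,) for a in mother]       # one maternal allele
    elif hypothesis == "H21_matched":
        maternal = [(a, a) for a in mother]     # two copies of one homolog
    elif hypothesis == "H21_unmatched":
        maternal = [tuple(mother)]              # one copy of each homolog
    else:
        raise ValueError(f"unknown hypothesis: {hypothesis}")
    counts = Counter()
    # each (maternal contribution, paternal allele) pair is equally likely
    for m_part, f_allele in product(maternal, father):
        counts["".join(sorted(m_part + (f_allele,)))] += 1
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}
```

For example, `child_genotype_probs("AB", "BB", "H21_matched")` gives AAB and BBB with probability 0.5 each, whereas the unmatched case always yields ABB.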

Data likelihood: P(D|m, f, c, H, cf, i) is the probability of the given data D at SNP i, given the true maternal genotype m, the true paternal genotype f, the true child genotype c, hypothesis H and child fraction cf. It can be broken down into the probabilities of the maternal, paternal and child data as follows:

$$P(D \mid m, f, c, H, cf, i) = P(M \mid m, i)\; P(SM \mid m, i)\; P(F \mid f, i)\; P(SF \mid f, i)\; P(S \mid m, c, H, cf, i)$$

Maternal SNP array data likelihood: assuming that the SNP array genotype call is correct, the probability of the maternal SNP array genotype data at SNP i, compared with the true genotype m, is simply

$$P(M \mid m, i) = \begin{cases} 1 & \text{if } M_i = m \\ 0 & \text{otherwise} \end{cases}$$

Maternal sequence data likelihood: for counts $SM_i = (am_i, bm_i)$, with no extra noise or bias involved, the probability of the maternal sequence data at SNP i is a binomial probability, defined as $P(SM \mid m, i) = P_{X \mid m}(am_i)$, where $X \mid m \sim \mathrm{Binom}(p_m(A),\, am_i + bm_i)$ and $p_m(A)$ is defined as

m        AA    AB    BB    A    B    no call
p_m(A)   1     0.5   0     1    0    0.5
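The maternal sequence likelihood is a binomial probability whose success rate $p_m(A)$ comes from the table. A minimal sketch, assuming genotype calls are encoded as the strings shown, with the hypothetical key `nocall` standing for "no call":

```python
import math

# p_m(A) for each maternal genotype call, from the p_m(A) table
P_A_GIVEN_MATERNAL = {"AA": 1.0, "AB": 0.5, "BB": 0.0,
                      "A": 1.0, "B": 0.0, "nocall": 0.5}


def binom_logpmf(k, n, p):
    """log Binomial(k; n, p), handling the degenerate p = 0 and p = 1 cases."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def maternal_seq_loglik(m, am, bm):
    """log P(SM | m, i) for maternal counts (am, bm) at one SNP, no extra noise."""
    return binom_logpmf(am, am + bm, P_A_GIVEN_MATERNAL[m])
```

Because homozygous genotypes give $p_m(A) \in \{0, 1\}$, a single discordant read drives the likelihood to zero under this noise-free model; the bias terms introduced later in this section relax that.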

Paternal data likelihood: analogous equations apply to the paternal data likelihood.

Note that the child genotype can be determined without parental data, especially without paternal data. For example, if paternal genotype data F are not available, P(F|f, i) = 1 can be used. If paternal sequence data SF are not available, P(SF|f, i) = 1 can be used.

In some embodiments, the method involves building, for each ploidy hypothesis, a joint distribution model for the expected allele counts at a plurality of polymorphic loci on the chromosome; one way of doing so is described here. Free fetal DNA data likelihood: P(S|m, c, H, cf, i) is the probability of the free fetal DNA sequence data at SNP i, given the true maternal genotype m, the true child genotype c, the child copy number hypothesis H and an assumed child fraction cf. It is in effect the probability of the sequence data S at SNP i given the true probability of A content at SNP i:

$$P(S \mid m, c, H, cf, i) = P\big(S \mid p_{m,c,cf,H}(A),\, i\big)$$

For counts $S_i = (a_i, b_i)$, with no extra data noise or bias involved,

$$P\big(S \mid p_{m,c,cf,H}(A),\, i\big) = P_X(a_i)$$

where $X \sim \mathrm{Binom}(p(A),\, a_i + b_i)$ and $p(A) = p_{m,c,cf,H}(A)$. In a more complicated case, where the exact alignment and the (A, B) counts for each SNP are not known, $P(S \mid p_{m,c,cf,H}(A), i)$ is a combination of integrated binomials.

True probability of A content: assuming the true maternal genotype is m, the true child genotype is c and the overall child fraction is cf, the true probability of A content at SNP i in this mother/child mixture is defined as

$$p_{m,c,cf,H}(A) = \frac{\#A(m)\,(1 - cf) + \#A(c)\,cf}{n_m\,(1 - cf) + n_c\,cf}$$

where #A(g) is the number of A alleles in genotype g, $n_m = 2$ is the ploidy of the mother's chromosome and $n_c$ is the ploidy of the child under hypothesis H (1 for monosomy, 2 for disomy, 3 for trisomy).
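The mixture formula and the plasma data likelihood can be written as a short sketch. It assumes genotypes are strings whose length equals the ploidy (so $n_m$ and $n_c$ are read from the genotype strings), and it uses the plain binomial model without noise or bias. The function names are hypothetical.

```python
import math


def true_p_a(m, c, cf):
    """p_{m,c,cf,H}(A): share of A alleles in the mother/child mixture.

    m: maternal genotype ("AB"); c: child genotype whose length is the
    child's ploidy under H (e.g. "AAB" for trisomy); cf: child fraction.
    """
    n_m, n_c = len(m), len(c)  # n_m is 2 for the mother
    num = m.count("A") * (1 - cf) + c.count("A") * cf
    den = n_m * (1 - cf) + n_c * cf
    return num / den


def plasma_loglik(a, b, m, c, cf):
    """log P(S | m, c, H, cf, i) for plasma counts (a, b), binomial model."""
    p = true_p_a(m, c, cf)
    if p == 1.0:
        return 0.0 if b == 0 else -math.inf
    if p == 0.0:
        return 0.0 if a == 0 else -math.inf
    n = a + b
    return (math.lgamma(n + 1) - math.lgamma(a + 1) - math.lgamma(b + 1)
            + a * math.log(p) + b * math.log(1 - p))
```

For an AA mother carrying an AB child at cf = 0.1, the expected A share is 1.9/2 = 0.95 under disomy, and 2.0/2.1 (about 0.952) for an AAB child under trisomy, which illustrates how small the per-SNP signal is at low child fractions.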

Using a joint distribution model: LIK(D|H) for composite hypotheses

In some embodiments, the method involves building, for each ploidy hypothesis, a joint distribution model for the expected allele counts at a plurality of polymorphic loci on the chromosome; one way of doing so is described here. In many cases, because of crossovers, a trisomy is not simply matched or unmatched. This section therefore derives results for the composite hypotheses H21 (maternal trisomy) and H12 (paternal trisomy), which combine matched and unmatched trisomy to account for possible crossovers.

In the case of trisomy, if there were no crossovers, the trisomy would simply be a matched or an unmatched trisomy. A matched trisomy is one in which the child inherits two copies of an identical chromosome segment from one parent. An unmatched trisomy is one in which the child inherits one copy of each homologous chromosome segment from the parent. Because of crossovers, some segments of a chromosome may be matched trisomic while other parts are unmatched trisomic. This section describes how to build a joint distribution model for the heterozygosity rates of a set of alleles; that is, how to build, for one or more hypotheses, a joint distribution model for the expected allele counts at a plurality of loci.

Suppose that at SNP i, $LIK(D \mid H_m, i)$ is the fit of the matched hypothesis $H_m$, $LIK(D \mid H_u, i)$ is the fit of the unmatched hypothesis $H_u$, and pc(i) is the probability of a crossover between SNPs i-1 and i. The full likelihood can then be calculated as:

$$LIK(D \mid H) = \log \sum_{E \in \{H_m, H_u\}} \exp\big(LIK(D \mid E, 1{:}N)\big)$$

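where $LIK(D \mid E, 1{:}N)$ is the likelihood of ending in hypothesis E for SNPs 1:N, E being the hypothesis for the last SNP, $E \in \{H_m, H_u\}$. Recursively, one can calculate:

$$LIK(D \mid E, 1{:}i) = LIK(D \mid E, i) + \log\Big(\exp\big(LIK(D \mid E, 1{:}i{-}1)\big)\,\big(1 - pc(i)\big) + \exp\big(LIK(D \mid {\sim}E, 1{:}i{-}1)\big)\,pc(i)\Big)$$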

where ~E is the hypothesis other than E (not E), the hypotheses considered being $H_m$ and $H_u$. Specifically, the likelihood for SNPs 1:i is calculated from the likelihood for SNPs 1:(i-1), either under the same hypothesis with no crossover or under the opposite hypothesis with a crossover, multiplied by the likelihood for SNP i:

For SNP 1, i = 1: $LIK(D \mid E, 1{:}1) = LIK(D \mid E, 1)$.

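For SNP 2, i = 2: $LIK(D \mid E, 1{:}2) = LIK(D \mid E, 2) + \log\big(\exp(LIK(D \mid E, 1))\,(1 - pc(2)) + \exp(LIK(D \mid {\sim}E, 1))\,pc(2)\big)$,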

and so on for i = 3:N.
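The recursion over SNPs is a two-state forward pass in log space. A minimal sketch, assuming the per-SNP log-likelihoods for $H_m$ and $H_u$ are already available as Python lists (0-based indices); like the recursion above, it applies no prior weighting to the starting state. The helper names are hypothetical.

```python
import math


def safe_log(x):
    """log(x), returning -inf for x == 0 instead of raising."""
    return math.log(x) if x > 0 else -math.inf


def logaddexp(x, y):
    """log(exp(x) + exp(y)) computed stably, tolerating -inf."""
    if x == -math.inf:
        return y
    if y == -math.inf:
        return x
    hi = max(x, y)
    return hi + math.log(math.exp(x - hi) + math.exp(y - hi))


def composite_trisomy_loglik(lik_matched, lik_unmatched, pc):
    """LIK(D|H) for a composite trisomy hypothesis such as H21.

    lik_matched[i], lik_unmatched[i]: LIK(D|H_m, i) and LIK(D|H_u, i).
    pc[i]: crossover probability between SNP i-1 and SNP i (pc[0] unused).
    """
    end_m, end_u = lik_matched[0], lik_unmatched[0]  # base case: first SNP
    for i in range(1, len(lik_matched)):
        stay, switch = safe_log(1 - pc[i]), safe_log(pc[i])
        # same hypothesis without crossover, or opposite one with crossover
        new_m = lik_matched[i] + logaddexp(end_m + stay, end_u + switch)
        new_u = lik_unmatched[i] + logaddexp(end_u + stay, end_m + switch)
        end_m, end_u = new_m, new_u
    return logaddexp(end_m, end_u)  # sum over the hypothesis of the last SNP
```

With all crossover probabilities set to zero, the result reduces to combining the two pure (matched-only and unmatched-only) whole-chromosome likelihoods, which is a convenient sanity check.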

In some embodiments, the child fraction can be determined. The child fraction refers to the proportion of sequences in a DNA mixture that derive from the child. In the context of non-invasive prenatal diagnosis, the child fraction can refer to the proportion of sequences in maternal plasma that derive from the fetus or from the portion of the placenta with the fetal genotype. It can also refer to the child fraction in a DNA sample that has been prepared from maternal plasma and may be enriched for fetal DNA. One purpose of determining the child fraction of a DNA sample is its use in algorithms that make ploidy calls on the fetus; the child fraction can therefore refer to any DNA sample that is sequenced for non-invasive prenatal diagnosis.

Some of the algorithms presented herein as part of the non-invasive prenatal aneuploidy diagnosis method assume that the child fraction is known, which may not always be the case. In one embodiment, the most likely child fraction can be found, with or without parental data, by maximizing the likelihood of disomy on selected chromosomes.

Specifically, for the disomy hypothesis and a child fraction cf on chromosome chr, let LIK(D|H11, cf, chr) be the log-likelihood described above. For the selected chromosomes in Cset (typically chromosomes 1 to 16), assumed to be euploid, the full likelihood is:

$$LIK(cf) = \sum_{chr \in Cset} LIK(D \mid H11, cf, chr)$$

The most likely child fraction is derived as $cf^{*} = \arg\max_{cf} LIK(cf)$.
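A grid search over cf is one simple way to carry out this maximization. The sketch makes strong simplifying assumptions that are not in the text: only SNPs where the mother is known to be AA are used, the child is AB with an assumed prior `q_ab` (paternal B transmitted) and AA otherwise, and reads follow a plain binomial model. Under H11, the plasma A share is then 1 for an AA child and 1 - cf/2 for an AB child. All names are hypothetical.

```python
import math
import random


def binom_logpmf(k, n, p):
    """log Binomial(k; n, p) for 0 < p <= 1."""
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def disomy_loglik(cf, counts, q_ab=0.5):
    """Sum over maternal-AA SNPs of LIK(D | H11, cf, i).

    counts: list of (a, b) plasma read counts; q_ab: assumed prior that
    the child is AB at each SNP.
    """
    total = 0.0
    for a, b in counts:
        n = a + b
        terms = [math.log(1 - q_ab) + binom_logpmf(a, n, 1.0),      # child AA
                 math.log(q_ab) + binom_logpmf(a, n, 1 - cf / 2)]   # child AB
        hi = max(terms)
        total += hi + math.log(sum(math.exp(t - hi) for t in terms))
    return total


def estimate_child_fraction(counts, grid=None):
    """cf* = argmax over a grid of candidate child fractions."""
    grid = grid or [i / 200 for i in range(1, 101)]  # 0.005 .. 0.5
    return max(grid, key=lambda cf: disomy_loglik(cf, counts))
```

A grid keeps the sketch transparent; a one-dimensional optimizer over (0, 1) would serve equally well, and the same scheme extends to all maternal/paternal genotype combinations by summing over them as in the per-SNP likelihood.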

Any set of chromosomes can be used. It is also possible to derive the child fraction without assuming euploidy on reference chromosomes. Using this method, the child fraction can be determined in any of the following cases: (1) array data on the parents and shotgun sequencing data on the maternal plasma; (2) array data on the parents and targeted sequencing data on the maternal plasma; (3) targeted sequencing data on the parents and the maternal plasma; (4) targeted sequencing data on the mother and the maternal plasma fraction; (5) targeted sequencing data on the maternal plasma fraction; (6) other combinations of parental and child-fraction measurements.

In some embodiments, the informatics method can incorporate data dropouts, which can yield ploidy determinations of higher accuracy. Elsewhere herein it has been assumed that the probability of obtaining an A is a direct function of the true maternal genotype, the true child genotype, the child fraction of the mixture and the child copy number. Maternal or child alleles can also drop out: for example, instead of measuring the true child AB in the mixture, only sequences mapping to allele A may be measured. Denote the parental dropout rate for genomic Illumina data as $d_{pg}$, the parental dropout rate for sequence data as $d_{ps}$, and the child dropout rate for sequence data as $d_{cs}$. In some embodiments, the maternal dropout rate can be assumed to be zero and the child dropout rate is relatively low; in that case, the results are not severely affected by dropouts. In other embodiments, allele dropouts can be so likely that they have a significant effect on the predicted ploidy call. For such cases, allele dropouts are incorporated into the algorithm as follows:

Parental SNP array data dropouts: for maternal genomic data M, if the genotype after dropout is $m_d$, then

$$P(M \mid m, i) = \sum_{m_d} P(M \mid m_d, i)\; P(m_d \mid m, d_{pg})$$

where $P(M \mid m_d, i)$ is as defined previously, and $P(m_d \mid m, d)$ is the likelihood of genotype $m_d$ after possible dropouts, given the true genotype m and dropout rate d.

Analogous equations apply to paternal SNP array data.

Parental sequence data dropouts: for maternal sequence data SM,

$$P(SM \mid m, i) = \sum_{m_d} P(SM \mid m_d, i)\; P(m_d \mid m, d_{ps})$$

where $P(m_d \mid m, d_{ps})$ is as defined in the previous section, and $P(SM \mid m_d, i)$ is the binomial probability defined earlier in the parental data likelihood section. Analogous equations apply to paternal sequence data.

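Free-floating DNA sequence data dropouts:

$$P(S \mid m, c, H, cf, i) = \sum_{m_d} \sum_{c_d} P\big(S \mid p_{m_d,c_d,cf,H}(A),\, i\big)\; P(m_d \mid m, d_{ps})\; P(c_d \mid c, d_{cs})$$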

where $P(S \mid p_{m_d,c_d,cf,H}(A), i)$ is as defined in the free fetal DNA data likelihood section.

In one embodiment, $P(m_d \mid m, d_{ps})$ is the probability of the observed maternal genotype $m_d$ given the true maternal genotype m and dropout rate $d_{ps}$, and $P(c_d \mid c, d_{cs})$ is the probability of the observed child genotype $c_d$ given the true child genotype c and dropout rate $d_{cs}$. Let $nA_T$ be the number of A alleles in the true genotype and $nA_D$ the number of A alleles in the observed genotype, with $nA_T \ge nA_D$; similarly, let $nB_T$ be the number of B alleles in the true genotype and $nB_D$ the number of B alleles in the observed genotype, with $nB_T \ge nB_D$; and let d be the dropout rate. Then

$$P(g_D \mid g_T, d) = \binom{nA_T}{nA_D} d^{\,nA_T - nA_D} (1 - d)^{nA_D} \cdot \binom{nB_T}{nB_D} d^{\,nB_T - nB_D} (1 - d)^{nB_D}$$
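The dropout formula is a product of two binomial terms, one per allele. A minimal sketch, assuming genotypes are written as strings such as "AB" (true) and "A" (observed after a B dropout), and that each allele copy drops out independently at rate d; `dropout_prob` is a hypothetical name.

```python
from math import comb


def dropout_prob(true_g, observed_g, d):
    """P(observed genotype | true genotype, dropout rate d).

    Each allele copy in true_g survives with probability 1 - d; an observed
    genotype with more copies of an allele than the true one is impossible.
    """
    prob = 1.0
    for allele in "AB":
        n_true, n_obs = true_g.count(allele), observed_g.count(allele)
        if n_obs > n_true:
            return 0.0
        # choose which copies survive; the others drop out
        prob *= comb(n_true, n_obs) * d ** (n_true - n_obs) * (1 - d) ** n_obs
    return prob
```

For a true AB genotype and d = 0.1, the observed genotype is AB with probability 0.81, A or B with probability 0.09 each, and empty with probability 0.01; these four cases sum to one, as required for the marginalization over $m_d$ and $c_d$.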

In one embodiment, the informatics method can incorporate random and consistent bias. Ideally, there is no consistent per-SNP sampling bias or random noise in the number of sequence counts (other than binomial variation). Specifically, at SNP i, for maternal genotype m, true child genotype c and child fraction cf, with X the number of A reads in the set of (A + B) reads at SNP i, X behaves as $X \sim \mathrm{Binomial}(p, A + B)$, where $p = p_{m,c,cf,H}(A)$ is the true probability of A content.

In one embodiment, the informatics method can incorporate random bias. It is often the case that the measurements are biased, so that the probability of getting an A at this SNP equals some q that is slightly different from p as defined above. How much q differs from p depends on the accuracy of the measurement process and a number of other factors, and can be quantified by the standard deviation of q around p. In one embodiment, q can be modeled as having a beta distribution whose parameters are chosen so that the distribution is centered at p with some specified standard deviation s. Specifically, this gives $q \sim \mathrm{Beta}(\alpha, \beta)$, where $\alpha/(\alpha + \beta) = p$ and $\alpha\beta / \big((\alpha + \beta)^2 (\alpha + \beta + 1)\big) = s^2$. If we let $N = \alpha + \beta$, the parameters can be derived as $\alpha = pN$ and $\beta = (1 - p)N$, where $N = p(1 - p)/s^2 - 1$.
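The mean/standard-deviation parameterization of the beta-binomial can be checked numerically. The sketch assumes 0 < p < 1 and $s^2 < p(1 - p)$; degenerate p (homozygous contexts, where p is 0 or 1) would fall back to the plain binomial. Function names are hypothetical.

```python
import math


def beta_params(p, s):
    """(alpha, beta) of a beta distribution with mean p and standard deviation s."""
    n = p * (1 - p) / s ** 2 - 1  # N = alpha + beta
    if n <= 0:
        raise ValueError("need s**2 < p * (1 - p)")
    return p * n, (1 - p) * n


def log_beta(x, y):
    """log of the beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def betabinom_logpmf(k, n, p, s):
    """log BetaBinom(k; n, p, s): binomial draws whose rate q ~ Beta(mean p, sd s)."""
    alpha, beta = beta_params(p, s)
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
```

As s shrinks, N grows and the probabilities converge to the ordinary binomial, which matches the behavior of the bias parameter described in the text.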

This is the definition of the beta-binomial distribution, in which one samples from a binomial distribution with variable parameter q, where q follows a beta distribution with mean p. Thus, without bias, at SNP i, assuming the true maternal genotype m, and given the maternal sequence A count $am_i$ and the maternal sequence B count $bm_i$ at SNP i, the probability of the parental sequence data SM can be calculated as:

$$P(SM \mid m, i) = P_{X \mid m}(am_i), \qquad X \mid m \sim \mathrm{Binom}\big(p_m(A),\, am_i + bm_i\big)$$

Now, including random bias with standard deviation s, this becomes:

$$X \mid m \sim \mathrm{BetaBinom}\big(p_m(A),\, am_i + bm_i,\, s\big)$$

Without bias, assuming the true maternal genotype m, the true child genotype c, the child fraction cf and child hypothesis H, and given the free-floating DNA sequence A count $a_i$ and B count $b_i$ at SNP i, the probability of the maternal plasma DNA sequence data can be calculated as:

$$P(S \mid m, c, H, cf, i) = P_X(a_i)$$

where $X \sim \mathrm{Binom}(p(A),\ a_i + b_i)$ and $p(A) = cf \cdot p_c(A) + (1 - cf)\, p_m(A)$.
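The no-bias plasma likelihood can be sketched as follows. The sketch assumes that $p(A)$ is the child-fraction-weighted mixture of the child and maternal A fractions, $cf \cdot p_c(A) + (1 - cf)\, p_m(A)$, with genotypes coded as A-allele fractions (0, 0.5 or 1); all function names are hypothetical.

```python
from math import lgamma, log


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binom(n, p), handling p = 0 and p = 1 exactly."""
    if p <= 0.0:
        return 0.0 if k == 0 else float("-inf")
    if p >= 1.0:
        return 0.0 if k == n else float("-inf")
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1.0 - p))


def plasma_p_a(p_m: float, p_c: float, cf: float) -> float:
    """Expected A fraction in plasma: fraction cf is child DNA, the rest is maternal."""
    return cf * p_c + (1.0 - cf) * p_m


def plasma_log_lik(a_i: int, b_i: int, p_m: float, p_c: float, cf: float) -> float:
    """log P(X = a_i) with X ~ Binom(p(A), a_i + b_i), the no-bias plasma model."""
    return log_binom_pmf(a_i, a_i + b_i, plasma_p_a(p_m, p_c, cf))


if __name__ == "__main__":
    # Mother AB, cf = 0.1: compare child AA (p(A) = 0.55) with child BB (p(A) = 0.45).
    print(plasma_log_lik(55, 45, 0.5, 1.0, 0.1), plasma_log_lik(55, 45, 0.5, 0.0, 0.1))
```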

In one embodiment, including random bias with standard deviation s, this becomes $X \sim \mathrm{BetaBinom}(p(A),\ a_i + b_i,\ s)$, where the amount of additional variation is specified by the bias parameter s, or equivalently by N. The smaller s (or the larger N), the closer this distribution is to the ordinary binomial distribution. The amount of bias, that is, N above, can be estimated from unambiguous contexts (AA|AA, BB|BB, AA|BB, BB|AA), and the estimate then used in the probabilities above. Depending on the characteristics of the data, N can be made a constant independent of the read depth $a_i + b_i$, or a function of $a_i + b_i$ such that the bias decreases as read depth increases.
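One way to estimate N from an unambiguous context is the method of moments, sketched below. This is an illustrative technique, not necessarily the estimator used in the source. It assumes every SNP in the set has the same read depth and the same known expected A fraction p with $0 < p < 1$ (for example, mother AA and child BB, where $p = 1 - cf$), and uses the beta-binomial variance $n p (1 - p)(N + n)/(N + 1)$. The function name is hypothetical.

```python
def estimate_concentration(a_counts: list[int], depth: int, p: float) -> float:
    """Method-of-moments estimate of the beta-binomial concentration N.

    a_counts: A-allele counts at SNPs that all have read depth `depth` and the same
    known expected A fraction p.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    binom_var = depth * p * (1.0 - p)
    # Dispersion ratio: observed variance relative to the plain binomial variance.
    phi = sum((a - depth * p) ** 2 for a in a_counts) / (len(a_counts) * binom_var)
    if phi <= 1.0:
        return float("inf")          # no extra variation: the binomial is adequate
    if phi >= depth:
        raise ValueError("dispersion too large for a beta-binomial at this depth")
    # Beta-binomial variance = binomial variance * (N + depth) / (N + 1); solve for N.
    return (depth - phi) / (phi - 1.0)


if __name__ == "__main__":
    print(estimate_concentration([45, 55, 40, 60], 100, 0.5))
```

Here the observed variance is 2.5 times the binomial variance, giving N = 65. A depth-dependent N, as the text allows, could be obtained by running this per depth bin.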

In one embodiment, the informatics method can incorporate a consistent per-SNP bias. Because of artifacts of the sequencing process, some SNPs can have consistently lower or higher counts, independent of the true amount of A content. Suppose SNP i consistently adds a bias of $w_i$% to the A count. In some embodiments, this bias can be estimated from a set of training data derived under the same conditions, and added back into the parental sequence data estimate as:

$P(S_{M,i} \mid m) = P(X = am_i \mid m)$, where $X \mid m \sim \mathrm{BetaBinom}(p_m(A) + w_i,\ am_i + bm_i,\ s)$

and, for the free-floating DNA sequence data probability estimate, as:

$P(X = a_i \mid m, c, cf, H)$, where $X \sim \mathrm{BetaBinom}(p(A) + w_i,\ a_i + b_i,\ s)$.

In some embodiments, the method can be formulated to account in particular for additional noise, differential sample quality, differential SNP quality, and random sampling bias. An example is given here. This approach has been shown to be particularly applicable to data generated with a massively multiplexed mini-PCR protocol, and was used in Experiments 7 to 13. The method involves several steps, each of which introduces a different kind of noise and/or bias into the final model:

(1) Suppose the first sample, comprising a mixture of maternal and fetal DNA, contains an initial amount of $N_0$ DNA molecules, typically in the range of 1,000 to 40,000, where p is the true percentage of reference.

(2) In amplification using universal ligation adapters, assume that $N_1$ molecules are sampled; typically $N_1 \approx N_0/2$, and the sampling introduces random sampling bias. The amplified sample may contain $N_2$ molecules, where $N_2 \gg N_1$. Let $X_1$ denote the number of reference-locus molecules (per SNP) among the $N_1$ sampled molecules; variation in $p_1 = X_1/N_1$ introduces random sampling bias into the rest of the protocol. This sampling bias is included in the model by using a beta-binomial (BB) distribution instead of a simple binomial distribution. The parameter N of the beta-binomial distribution can later be estimated from training data on a per-sample basis, at SNPs with $0 < p < 1$, after adjusting for leakage and amplification bias. Leakage is the tendency for SNPs to be read incorrectly.

(3) The amplification step amplifies any allelic bias, compounding the bias introduced by potentially non-uniform amplification. Suppose one allele at a locus is amplified by a factor f and the other allele at that locus by a factor g, where $f = g e^{b}$ and $b = 0$ means no bias. The bias parameter b is centered on 0 and represents how much more or less the A allele is amplified than the B allele at a particular SNP. The parameter b can differ from SNP to SNP, and can be estimated on a per-SNP basis, for example from training data.

(4) The sequencing step involves sequencing the amplified sample of molecules. In this step there may be leakage, that is, instances in which a SNP is read incorrectly. Leakage can be caused by any number of problems, and can result in a SNP being read not as the true allele A but as another allele B found at that locus, or as an allele C or D not normally found at that locus. Suppose sequencing measures sequence data for a number of DNA molecules from the amplified sample of size $N_3$, where $N_3 < N_2$. In some embodiments, $N_3$ may range from 20,000 to 100,000, 100,000 to 500,000, 500,000 to 4,000,000, 4,000,000 to 20,000,000, or 20,000,000 to 100,000,000. Each sampled molecule is read correctly with probability $p_g$, in which case it correctly represents allele A. With probability $1 - p_g$ it is read incorrectly, as an allele unrelated to the original molecule; in that case it looks like allele A with probability $p_r$, allele B with probability $p_m$, or allele C or D with probability $p_o$, where $p_r + p_m + p_o = 1$. The parameters $p_g$, $p_r$, $p_m$, $p_o$ are estimated from training data on a per-SNP basis.

Different protocols may involve similar steps, with variations in the molecular biology steps resulting in different amounts of random sampling, different levels of amplification, and different leakage bias. The following model applies equally well to each of these cases. On a per-SNP basis, the model for the amount of DNA sampled is:

$X_3 \sim \mathrm{BetaBinomial}\big(L(F(p, b), p_r, p_g),\ N \cdot H(p, b)\big)$

where p is the true amount of reference DNA, b is the per-SNP bias, and, as described above, $p_g$ is the probability of a correct read and $p_r$ is the probability that an incorrect read happens to look like the correct allele in the case of a bad read. Then:

$F(p, b) = \dfrac{p e^{b}}{p e^{b} + (1 - p)}$, $H(p, b) = \dfrac{(e^{b} p + (1 - p))^2}{e^{b}}$, $L(p, p_r, p_g) = p\, p_g + p_r (1 - p_g)$.
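The three correction functions can be written directly in Python to show how they combine into the parameters of $X_3$. This is a sketch: it interprets the second argument $N \cdot H(p, b)$ as the beta-binomial concentration, and the values of b, $p_r$, $p_g$ and N would come from the training data described above. Function names are hypothetical.

```python
from math import exp


def amp_bias(p: float, b: float) -> float:
    """F(p, b): reference fraction after the A allele is amplified e**b times as much as B."""
    return p * exp(b) / (p * exp(b) + (1.0 - p))


def depth_factor(p: float, b: float) -> float:
    """H(p, b) = (e**b * p + (1 - p))**2 / e**b, which scales the parameter N."""
    return (exp(b) * p + (1.0 - p)) ** 2 / exp(b)


def leak(p: float, p_r: float, p_g: float) -> float:
    """L(p, p_r, p_g): correct reads keep fraction p; bad reads look like A w.p. p_r."""
    return p * p_g + p_r * (1.0 - p_g)


def x3_params(p: float, b: float, p_r: float, p_g: float, n_conc: float) -> tuple[float, float]:
    """Mean and concentration of X3 ~ BetaBinomial(L(F(p, b), p_r, p_g), N * H(p, b))."""
    return leak(amp_bias(p, b), p_r, p_g), n_conc * depth_factor(p, b)


if __name__ == "__main__":
    print(x3_params(0.5, 0.2, 0.3, 0.98, 500.0))
```

With no amplification bias (b = 0) and perfect reads ($p_g = 1$), the mean reduces to p and the concentration to N, so the model collapses to the plain beta-binomial.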

In some embodiments, the method uses a beta-binomial distribution rather than a simple binomial distribution; this accounts for random sampling bias. The parameter N of the beta-binomial distribution is estimated on a per-sample basis as needed. Using the bias corrections F(p, b) and H(p, b), rather than p alone, accounts for amplification bias. The bias parameter b is estimated in advance from training data on a per-SNP basis.

In some embodiments, the method uses the leakage correction $L(p, p_r, p_g)$ rather than p alone; this accounts for leakage bias, that is, differing SNP and sample quality. In some embodiments, the parameters $p_g$, $p_r$, $p_o$ are estimated in advance from training data on a per-SNP basis. In some embodiments, the parameters $p_g$, $p_r$, $p_o$ may be updated with the current sample to account for differing sample quality.

The model described herein is very general and can account for differential sample quality and differential SNP quality. Different samples and SNPs are treated differently, as illustrated by the fact that some embodiments use a beta-binomial distribution whose mean and variance are functions of the initial amount of DNA and of sample and SNP quality.

Platform modeling

Consider a single SNP at which the expected allele ratio present in plasma is r (based on the maternal and fetal genotypes). The expected allele ratio is defined as the expected fraction of the A allele in the combined maternal and fetal DNA. For maternal genotype $g_m$ and child genotype $g_c$, the expected allele ratio is given by Equation 1, where the genotypes are also expressed as allele ratios.

$r = f g_c + (1 - f) g_m \quad (1)$

The observations at a SNP consist of the numbers of mapped reads $n_a$ and $n_b$ for each allele present, which together sum to the read depth d. It is assumed that thresholds have been applied to the mapping probabilities and phred scores so that the mappings and allele observations can be considered correct. A phred score is a numerical measure related to the probability that a particular measurement of a particular base is wrong. In one embodiment, where bases have been measured by sequencing, the phred score can be calculated from the ratio of the dye intensity corresponding to the called base to the dye intensities of the other bases. The simplest model of the observation likelihood is the binomial distribution, which assumes that each of the d reads is drawn independently from a large pool with allele ratio r. Equation 2 describes this model.

$P(n_a, n_b \mid r) = p_{\mathrm{bino}}(n_a;\ n_a + n_b,\ r) = \binom{n_a + n_b}{n_a} r^{n_a} (1 - r)^{n_b} \quad (2)$

The binomial model can be extended in several ways. When the maternal and fetal genotypes are all A or all B, the expected allele ratio in plasma is 0 or 1, and the binomial probability is poorly behaved: any read of the unexpected allele receives probability zero. Yet in practice unexpected alleles are sometimes observed. In one embodiment, a corrected allele ratio can be used to allow for a small number of unexpected alleles. In one embodiment, training data can be used to model the rate of unexpected alleles at each SNP, and this model used to correct the expected allele ratio. When the expected allele ratio is not 0 or 1, the observed allele ratio may fail to converge to the expected allele ratio even at high read depth, because of amplification bias or other phenomena. The allele ratio can then be modeled as a beta distribution centered on the expected allele ratio, yielding a beta-binomial distribution for $P(n_a, n_b \mid r)$ with higher variance than the binomial.
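A simple way to realize a corrected allele ratio is to shrink r slightly toward 0.5 by a stray-allele rate, as sketched below. The symmetric form $\epsilon + (1 - 2\epsilon) r$ and the default $\epsilon = 0.005$ are assumptions for illustration; the text instead suggests modeling the unexpected-allele rate per SNP from training data. Function names are hypothetical.

```python
from math import lgamma, log


def corrected_ratio(r: float, eps: float = 0.005) -> float:
    """Pull an expected allele ratio away from 0 and 1 by an assumed stray-allele rate eps."""
    return eps + (1.0 - 2.0 * eps) * r


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binom(n, p); requires 0 < p < 1."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1.0 - p))


if __name__ == "__main__":
    # Mother and fetus both BB (expected A ratio 0), yet 2 of 500 reads show A.
    # With r = 0 the log-likelihood would be -inf; the corrected ratio keeps it finite.
    print(log_binom_pmf(2, 500, corrected_ratio(0.0)))
```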

The platform model for the response at a single SNP is defined as $F(a, b, g_c, g_m, f)$ (3): the probability of observing $n_a = a$ and $n_b = b$ given the maternal and fetal genotypes, which also depends on the fetal fraction through Equation 1. The functional form of F may be a binomial distribution, a beta-binomial distribution, or a similar function, as discussed above.

$F(a, b, g_c, g_m, f) = P(n_a = a, n_b = b \mid g_c, g_m, f) = P\big(n_a = a, n_b = b \mid r(g_c, g_m, f)\big) \quad (3)$

In one embodiment, the child fraction may be determined as follows. A maximum likelihood estimate of the fetal fraction f for prenatal testing can be derived without using paternal information. This may be appropriate where paternal genetic data are unavailable, for example where the recorded father is not in fact the genetic father of the fetus. The fetal fraction is estimated from the set of SNPs at which the maternal genotype is 0 or 1, each of which admits only two possible fetal genotypes. Define $S_0$ as the set of SNPs with maternal genotype 0 and $S_1$ as the set of SNPs with maternal genotype 1. On $S_0$ the possible fetal genotypes are 0 and 0.5, giving the set of possible allele ratios $R_0(f) = \{0, f/2\}$. Similarly, $R_1(f) = \{1 - f/2, 1\}$. This approach could be extended slightly to include SNPs at which the maternal genotype is 0.5, but those SNPs would be less informative because of their larger set of possible allele ratios.

Define $N_{a0}$ and $N_{b0}$ as the vectors formed by $n_{as}$ and $n_{bs}$ over the SNPs $s \in S_0$, and $N_{a1}$ and $N_{b1}$ similarly for $S_1$. The maximum likelihood estimate of f is defined by Equation 4.

$\hat{f} = \arg\max_f P(N_{a0}, N_{b0} \mid f)\, P(N_{a1}, N_{b1} \mid f) \quad (4)$

Assuming that the allele counts at each SNP are independent conditional on that SNP's plasma allele ratio, each probability can be expressed as a product over the SNPs in the corresponding set (5).

$P(N_{a0}, N_{b0} \mid f) = \prod_{s \in S_0} P(n_{as}, n_{bs} \mid f) \quad (5)$

$P(N_{a1}, N_{b1} \mid f) = \prod_{s \in S_1} P(n_{as}, n_{bs} \mid f)$

The dependence on f enters through the sets of possible allele ratios $R_0(f)$ and $R_1(f)$. The SNP probability $P(n_{as}, n_{bs} \mid f)$ can be approximated by assuming the maximum likelihood genotype conditional on f. At reasonably high fetal fraction and read depth, the maximum likelihood genotype will be chosen with high confidence. For example, at a fetal fraction of 10% and a read depth of 1000, consider a SNP at which the mother has genotype 0. The expected allele ratios are 0 and 5%, which are easily distinguishable at such a read depth. Substituting the estimated child genotype into Equation 5 yields the complete equation for fetal fraction estimation (6).

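$\hat{f} = \arg\max_f \prod_{s \in S_0} \max_{r \in R_0(f)} P(n_{as}, n_{bs} \mid r) \prod_{s \in S_1} \max_{r \in R_1(f)} P(n_{as}, n_{bs} \mid r) \quad (6)$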

The fetal fraction must lie in the range [0, 1], so the optimization can easily be implemented as a constrained one-dimensional search.
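A minimal sketch of the constrained one-dimensional search for Equation 6, assuming a binomial platform model and an evenly spaced grid over [0, 1] (a grid is one simple choice; a bounded scalar optimizer would also work). Counts are (n_a, n_b) pairs, with A the allele whose fraction defines the genotype code; all names are hypothetical.

```python
from math import lgamma, log

Counts = list[tuple[int, int]]   # (n_a, n_b) per SNP


def log_binom_pmf(k: int, n: int, p: float) -> float:
    p = min(max(p, 1e-12), 1.0 - 1e-12)        # keep log() finite at r = 0 or 1
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1.0 - p))


def log_lik(f: float, s0: Counts, s1: Counts) -> float:
    """Log of Equation 6's objective: best allele ratio per SNP, summed over S0 and S1."""
    total = 0.0
    for n_a, n_b in s0:                        # maternal genotype 0: R0(f) = {0, f/2}
        total += max(log_binom_pmf(n_a, n_a + n_b, r) for r in (0.0, f / 2))
    for n_a, n_b in s1:                        # maternal genotype 1: R1(f) = {1 - f/2, 1}
        total += max(log_binom_pmf(n_a, n_a + n_b, r) for r in (1.0 - f / 2, 1.0))
    return total


def estimate_fetal_fraction(s0: Counts, s1: Counts, steps: int = 1000) -> float:
    """Constrained 1-D search: evaluate f on an even grid over [0, 1] and keep the best."""
    grid = (i / steps for i in range(steps + 1))
    return max(grid, key=lambda f: log_lik(f, s0, s1))


if __name__ == "__main__":
    s0 = [(50, 950), (0, 1000)]    # fetus heterozygous, then homozygous, mother genotype 0
    s1 = [(950, 50), (1000, 0)]    # same pattern with maternal genotype 1
    print(estimate_fetal_fraction(s0, s1))
```

In this toy data the heterozygous-fetus SNPs show about 5% of the other allele, so the estimate lands at f = 0.1; the homozygous-fetus SNPs contribute almost nothing to the objective.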

In the presence of low read depth or high noise, it may be preferable not to assume the maximum likelihood genotype, since doing so can produce artificially high confidence. An alternative is to sum over the possible genotypes at each SNP, which yields the following expression (7) for $P(n_a, n_b \mid f)$ for SNPs in $S_0$. The prior probability P(r) may be assumed uniform over $R_0(f)$, or may be based on population frequencies. The extension to the set $S_1$ is straightforward.

$P(n_a, n_b \mid f) = \sum_{r \in R_0(f)} P(n_a, n_b \mid r)\, P(r) \quad (7)$
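Equation 7 can be sketched for one SNP in $S_0$ as below, assuming a binomial platform model and, by default, a uniform prior over $R_0(f)$; a population-frequency prior can be passed in instead. Function names are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binom(n, p), with exact handling of p = 0 and p = 1."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))


def marginal_lik_s0(n_a: int, n_b: int, f: float,
                    prior: tuple[float, float] = (0.5, 0.5)) -> float:
    """Equation 7 for an S0 SNP: sum over r in R0(f) = {0, f/2}, weighted by P(r)."""
    return sum(w * binom_pmf(n_a, n_a + n_b, r) for w, r in zip(prior, (0.0, f / 2)))


if __name__ == "__main__":
    print(marginal_lik_s0(0, 10, 0.1), marginal_lik_s0(5, 95, 0.1))
```

Unlike the maximum-likelihood-genotype shortcut, a low-depth SNP such as (0, 10) keeps substantial weight on both genotypes, so it cannot push the fetal fraction estimate far in either direction.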

In some embodiments, the probabilities can be derived as follows. The confidence can be calculated from the likelihood of the data under the two hypotheses $H_t$ and $H_f$. The likelihood of each hypothesis is derived from the response model, the estimated fetal fraction, the maternal genotype, the allele population frequencies, and the plasma allele counts.

Define the following notation:

Assuming that the observations at each SNP are independent conditional on the plasma allele ratio, the likelihood of a paternity hypothesis is the product of the likelihoods over the SNPs. The following equations derive the likelihood for a single SNP. Equation 8 is a general expression for the likelihood of any hypothesis h, which is then broken down into the specific cases $H_t$ and $H_f$.

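$P(n_a, n_b \mid h, G_m, G_{tf}, f) = \sum_{g_c} P(n_a, n_b, g_c \mid h, G_m, G_{tf}, f)$

$= \sum_{g_c} P(n_a, n_b \mid g_c, G_m, f)\, P(g_c \mid h, G_m, G_{tf})$

$= \sum_{g_c} F(n_a, n_b, g_c, G_m, f)\, P(g_c \mid h, G_m, G_{tf}) \quad (8)$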

In the case of $H_t$, the alleged father is the true father, and the fetal genotype is inherited from the maternal genotype and the alleged father's genotype, according to Equation 9.

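$P(n_a, n_b \mid H_t, G_m, G_{tf}, f) = \sum_{g_c} F(n_a, n_b, g_c, G_m, f)\, P(g_c \mid H_t, G_m, G_{tf})$

$= \sum_{g_c} F(n_a, n_b, g_c, G_m, f)\, P(g_c \mid G_m, G_{tf}) \quad (9)$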

In the case of $H_f$, the alleged father is not the true father. The best estimate of the true father's genotype is then given by the population frequency at each SNP. Thus, the probability of the child's genotype is determined by the known maternal genotype and the population frequency, as in Equation 10.

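$P(n_a, n_b \mid H_f, G_m, G_{tf}, f) = \sum_{g_c} F(n_a, n_b, g_c, G_m, f)\, P(g_c \mid H_f, G_m, G_{tf})$

$= \sum_{g_c} F(n_a, n_b, g_c, G_m, f)\, P(g_c \mid G_m, \text{population frequencies}) \quad (10)$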

The confidence $C_p$ of correct paternity is calculated with Bayes' rule (11), from the products over SNPs of the two likelihoods.

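$C_p = \dfrac{P(H_t) \prod_i P(n_{a,i}, n_{b,i} \mid H_t, G_m, G_{tf}, f)}{P(H_t) \prod_i P(n_{a,i}, n_{b,i} \mid H_t, G_m, G_{tf}, f) + P(H_f) \prod_i P(n_{a,i}, n_{b,i} \mid H_f, G_m, G_{tf}, f)} \quad (11)$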
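The paternity calculation can be sketched end to end as follows. Assumptions, none of which the text fixes: F is binomial; genotypes are coded as A-allele fractions (0, 0.5, 1), so a parent with genotype g transmits A with probability g; under $H_f$ the unknown father transmits A with the population A-allele frequency (Hardy-Weinberg); and the priors $P(H_t)$ and $P(H_f)$ default to equal. Function names are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))


def child_genotype_probs(g_m: float, q_father: float) -> dict[float, float]:
    """Child A fraction {1, 0.5, 0}: mother transmits A w.p. g_m, father w.p. q_father."""
    return {
        1.0: g_m * q_father,
        0.5: g_m * (1.0 - q_father) + (1.0 - g_m) * q_father,
        0.0: (1.0 - g_m) * (1.0 - q_father),
    }


def snp_lik(n_a: int, n_b: int, g_m: float, q_father: float, f: float) -> float:
    """Sum over g_c of F(n_a, n_b, g_c, g_m, f) * P(g_c), with r = f*g_c + (1 - f)*g_m."""
    return sum(w * binom_pmf(n_a, n_a + n_b, f * g_c + (1.0 - f) * g_m)
               for g_c, w in child_genotype_probs(g_m, q_father).items())


def paternity_confidence(snps, f: float, prior_true: float = 0.5) -> float:
    """C_p by Bayes' rule. snps: iterable of (n_a, n_b, g_m, g_tf, pop_freq_a)."""
    log_t, log_f = log(prior_true), log(1.0 - prior_true)
    for n_a, n_b, g_m, g_tf, freq_a in snps:
        log_t += log(max(snp_lik(n_a, n_b, g_m, g_tf, f), 1e-300))    # H_t: tested father
        log_f += log(max(snp_lik(n_a, n_b, g_m, freq_a, f), 1e-300))  # H_f: population
    top = max(log_t, log_f)                    # subtract the max before exp() for stability
    return exp(log_t - top) / (exp(log_t - top) + exp(log_f - top))


if __name__ == "__main__":
    # Mother BB, tested father AA, 5% A reads at fetal fraction 0.1.
    print(paternity_confidence([(50, 950, 0.0, 1.0, 0.5)], 0.1))
```

In this one-SNP example $H_t$ forces a heterozygous child while $H_f$ allows it only half the time, so $C_p = 1/(1 + 0.5) \approx 0.667$; many informative SNPs multiply such ratios together.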

Maximum Likelihood Model Using Percent Fetal Fraction

Determining fetal ploidy status by measuring the free-floating DNA contained in maternal serum, or by measuring genotypic material in any mixed sample, is a problem of considerable interest. Various approaches exist, such as read count analysis, which assumes that if the fetus is trisomic for a particular chromosome, the total amount of DNA from that chromosome found in maternal blood will be elevated relative to a reference chromosome. One way to detect such fetal trisomies is to normalize the amount of DNA expected for each chromosome, for example by the number of SNPs in the analysis set that correspond to a given chromosome, or by the number of uniquely mappable portions of the chromosome. After the measurements have been normalized, any chromosome for which the measured amount of DNA exceeds a certain threshold is called trisomic. This approach is described in Fan et al., PNAS 2008; 105(42): 16266-16271, and in Chiu et al., BMJ 2011; 342: c7401. In the paper by Chiu et al., normalization is achieved by computing a Z-score as follows:

Z-score for the percentage of chromosome 21 in the test case = ((percentage of chromosome 21 in the test case) - (mean percentage of chromosome 21 in the reference controls)) / (standard deviation of the percentage of chromosome 21 in the reference controls).
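The Z-score normalization can be written in a few lines of Python. The sketch uses the sample standard deviation (`statistics.stdev`), which is an assumption about how the reference spread is computed; the reference percentages in the usage are made-up illustrative values, not data from any study.

```python
from statistics import mean, stdev


def chr21_z_score(test_pct: float, reference_pcts: list[float]) -> float:
    """Z = (test % chr21 - mean reference % chr21) / SD of reference % chr21."""
    return (test_pct - mean(reference_pcts)) / stdev(reference_pcts)


if __name__ == "__main__":
    reference = [1.30, 1.32, 1.28, 1.31, 1.29]   # hypothetical euploid controls, % chr21 reads
    print(chr21_z_score(1.40, reference))
```

The resulting Z is then compared with one fixed cutoff regardless of fetal fraction, which is the single-hypothesis rejection step discussed next.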

These methods determine fetal ploidy status using single-hypothesis rejection. However, they have some significant drawbacks. Because these methods do not vary with the percentage of fetal DNA in the sample, they use a single cutoff; as a result, the accuracy of the determination is not optimal, and cases in which the percentage of fetal DNA in the mixture is relatively low suffer the worst accuracy.

In one embodiment, the method of the invention is used to determine fetal ploidy status and takes into account the fraction of fetal DNA in the sample. In another embodiment, the method involves maximum likelihood estimation. In one embodiment, the method involves calculating the percentage of DNA of fetal or placental origin in the sample. In one embodiment, the threshold for calling aneuploidy is adaptively adjusted based on the calculated fetal DNA percentage. In some embodiments, a method for estimating the percentage of fetal-derived DNA in a DNA mixture comprises: obtaining a mixed sample comprising genetic material from the mother and genetic material from the fetus; obtaining a genetic sample from the father of the fetus; measuring the DNA in the mixed sample; measuring the DNA in the paternal sample; and calculating the percentage of fetal-derived DNA in the mixed sample using the DNA measurements of the mixed sample and the paternal sample.

In one embodiment of the invention, the fraction or percentage of fetal DNA in the mixture can be measured. In some embodiments, the fraction may be calculated using only genotyping measurements made on the maternal plasma sample itself, which is a mixture of fetal and maternal DNA. In some embodiments, the fraction may also be calculated using the measured or otherwise known maternal genotype and/or the measured or otherwise known paternal genotype. In some embodiments, the fetal DNA percentage can be calculated using measurements made on the mixture of maternal and fetal DNA together with knowledge of the parental contexts. In one embodiment, the fraction of fetal DNA can be calculated using population frequencies to adjust the model based on the probability of particular allele measurements.

In one embodiment of the invention, a confidence can be calculated for the accuracy of the fetal ploidy determination. In one embodiment, the confidence of the hypothesis with maximum likelihood ($H_{\text{main}}$) can be calculated as (1 - H_main)/Σ(all H). The confidence of a hypothesis can be determined if the distributions of all hypotheses are known, and the distributions of all hypotheses can be determined if parental genotype information is known. The confidence of a ploidy determination can be calculated if the expected distribution of the data for a euploid fetus and the expected distribution of the data for an aneuploid fetus are known; these expected distributions can be calculated if the parental genotype data are known. In one embodiment, knowledge of the distributions of the test statistic under the normal hypothesis and under the abnormal hypothesis can be used both to determine the reliability of a call and to optimize the threshold, so as to make more reliable calls. This is especially useful when the amount and/or percentage of fetal DNA in the mixture is low. It helps to avoid cases in which a fetus that is actually aneuploid is found to be euploid because a test statistic, such as a Z statistic, does not exceed a threshold that was optimized for cases with a higher fetal DNA percentage.

In one embodiment, the methods disclosed herein can be used to determine fetal aneuploidy by determining the copy number of a maternal and fetal target chromosome in a mixture of maternal and fetal genetic material. Such a method may entail obtaining maternal tissue comprising both maternal and fetal genetic material; in some embodiments, this maternal tissue may be maternal plasma or tissue isolated from maternal blood. The method may also entail obtaining a mixture of maternal and fetal genetic material from the maternal tissue by processing it. The method may entail distributing the resulting genetic material among a plurality of reaction samples, so as to randomly provide individual reaction samples that contain a target sequence from the target chromosome and individual reaction samples that do not, for example by performing high-throughput sequencing on the samples. The method may entail analyzing the target sequences of genetic material present or absent in the individual reaction samples to provide a first number of binary results, representing the presence or absence of a presumably euploid fetal chromosome in the reaction samples, and a second number of binary results, representing the presence or absence of a possibly aneuploid fetal chromosome in the reaction samples. Either number of binary results may be calculated, for example, by informatics techniques that count sequence reads mapping to a particular chromosome, a particular region of a chromosome, a particular locus, or a set of loci. The method may involve normalizing the number of binary events based on the chromosome length, the length of the chromosome region, or the number of loci in the set. The method may entail using the first number to calculate an expected distribution of the number of binary results for the presumably euploid fetal chromosome in the reaction samples. The method may entail using the first number and an estimated fraction of fetal DNA found in the mixture to calculate an expected distribution of the number of binary results for the possibly aneuploid fetal chromosome in the reaction samples, for example by multiplying the expected read count distribution for the presumably euploid fetal chromosome by (1 + n/2), where n is the estimated fetal fraction. In some embodiments, sequence reads may be handled as probabilistic mappings rather than binary results; this approach yields higher accuracy but requires more computing power. The fetal fraction can be estimated by a variety of methods, some of which are described elsewhere in this disclosure. The method may involve using a maximum likelihood approach to determine whether the second number corresponds to a possibly aneuploid fetal chromosome that is in fact euploid or aneuploid. The method may involve calling the fetal ploidy state as the ploidy state of the hypothesis that is most likely to be correct given the measured data.
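The maximum likelihood call between the euploid and trisomy hypotheses can be sketched as below. The sketch assumes Poisson read counts purely for illustration (real counts are usually overdispersed, where a beta-binomial or negative binomial would fit better) and equal prior probabilities for the two hypotheses; the function names are hypothetical. The trisomy expectation uses the (1 + n/2) scaling from the text.

```python
from math import exp, lgamma, log


def poisson_logpmf(k: int, lam: float) -> float:
    return k * log(lam) - lam - lgamma(k + 1)


def call_ploidy(observed: int, expected_euploid: float,
                fetal_fraction: float) -> tuple[str, float]:
    """Return the more likely of euploid vs trisomy and its posterior confidence."""
    expected = {
        "euploid": expected_euploid,
        # A trisomic fetus raises the chromosome's expected reads by a factor (1 + n/2).
        "trisomy": expected_euploid * (1.0 + fetal_fraction / 2.0),
    }
    loglik = {h: poisson_logpmf(observed, lam) for h, lam in expected.items()}
    best = max(loglik, key=loglik.get)
    # Equal priors: confidence = L(best) / (L(euploid) + L(trisomy)).
    norm = sum(exp(v - loglik[best]) for v in loglik.values())
    return best, 1.0 / norm


if __name__ == "__main__":
    print(call_ploidy(10480, 10000.0, 0.1))
    print(call_ploidy(10020, 10000.0, 0.1))
```

Because the trisomy expectation depends on the fetal fraction, the decision boundary moves with it instead of being a fixed cutoff, and the posterior returned alongside the call serves as its confidence.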

Note that a maximum likelihood model can be used to improve the accuracy of any method for determining fetal ploidy status. Similarly, a confidence can be calculated for any method of determining fetal ploidy status. Using a maximum likelihood model will improve the accuracy of any method in which the ploidy determination is made with a single-hypothesis rejection technique. A maximum likelihood model can be used with any method for which the likelihood distributions of the normal and abnormal cases can be calculated. Using a maximum likelihood model also implies the ability to calculate a confidence for a ploidy call.

Further discussion of the method

In one embodiment, the methods disclosed herein use a quantitative measurement of the number of independent observations of each allele at a polymorphic locus, and this does not involve calculating allele ratios. This differs from methods such as some microarray-based methods, which provide information about the ratio of the two alleles at a locus but do not quantify the number of independent observations of either allele. Some methods known in the art can provide quantitative information about the number of independent observations, but their ploidy calculations use only allele ratios and not the quantitative information. To illustrate the importance of retaining information about the number of independent observations, consider a sample locus with two alleles, A and B. In a first experiment, twenty A alleles and twenty B alleles are observed; in a second experiment, 200 A alleles and 200 B alleles are observed. In both experiments the ratio A/(A+B) equals 0.5, yet the second experiment conveys more information than the first about how certain the frequency of the A or B allele is. Rather than using allele ratios, the methods of the present invention use quantitative data to model more accurately the most likely allele frequency at each polymorphic locus.
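The 20/20 versus 200/200 example can be made concrete with two standard binomial quantities, shown in the sketch below: the log-likelihood ratio between an allele frequency of 0.5 and an alternative of 0.6 (the 0.6 alternative is an arbitrary illustrative choice), and a normal-approximation 95% half-width for the A-allele frequency. Function names are hypothetical.

```python
from math import log


def log_lr(n_a: int, n_b: int, p0: float = 0.5, p1: float = 0.6) -> float:
    """log[L(p0) / L(p1)] for n_a A reads and n_b B reads under a binomial model."""
    return n_a * log(p0 / p1) + n_b * log((1.0 - p0) / (1.0 - p1))


def ci_half_width(n_a: int, n_b: int, z: float = 1.96) -> float:
    """Normal-approximation 95% half-width for the A-allele frequency estimate."""
    n = n_a + n_b
    p = n_a / n
    return z * (p * (1.0 - p) / n) ** 0.5


if __name__ == "__main__":
    for n_a, n_b in [(20, 20), (200, 200)]:
        print(n_a, n_b, n_a / (n_a + n_b), log_lr(n_a, n_b), ci_half_width(n_a, n_b))
```

Both experiments give a ratio of exactly 0.5, but the evidence against a frequency of 0.6 grows tenfold (a log-likelihood ratio of about 0.82 versus 8.2) and the interval shrinks from about ±0.155 to ±0.049, information that a ratio alone discards.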

In one embodiment, the present methods build a genetic model that combines the measurements from multiple polymorphic loci, both to better distinguish trisomy from disomy and to determine the type of trisomy. In addition, the methods incorporate genetic linkage information to improve accuracy. This contrasts with some methods known in the art that average the allele ratios over all polymorphic loci on a chromosome. The methods disclosed herein explicitly model the allele frequency distributions expected for disomy and for trisomies arising from nondisjunction during meiosis I, nondisjunction during meiosis II, and nondisjunction during mitosis early in fetal development. To see why this matters: in the absence of crossovers, nondisjunction during meiosis I produces a trisomy in which two different homologs are inherited from one parent, whereas nondisjunction during meiosis II or during mitosis early in fetal development produces two copies of the same homolog from one parent. Each scenario yields different expected allele frequencies at each polymorphic locus, and at all physically linked loci (that is, loci on the same chromosome) considered jointly. Crossovers exchange genetic material between homologs and make the inheritance pattern more complex; the present methods accommodate this by using genetic linkage information, namely recombination rates and the physical distances between loci. To better distinguish meiosis I nondisjunction from meiosis II or mitotic nondisjunction, the methods incorporate into the model a crossover probability that increases with distance from the centromere. Meiosis II and mitotic nondisjunction can be distinguished because mitotic nondisjunction usually produces identical or nearly identical copies of one homolog, whereas the two homologs present after a meiosis II nondisjunction event usually differ as a result of one or more crossovers during gametogenesis.
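The difference between meiosis I and meiosis II errors at a single locus can be sketched as below. This is a minimal illustration that assumes a maternally derived trisomy and no crossover. The function name and the "MI"/"MII" labels are hypothetical. At a single locus, a mitotic error that copies one homolog looks like the "MII" case, which is why the crossover pattern along the chromosome is needed to tell them apart.

```python
def fetal_genotypes(maternal_homologs: tuple[str, str], paternal_allele: str, error: str) -> set[str]:
    """Possible fetal genotypes at one SNP for a maternally derived trisomy,
    ignoring crossovers.

    maternal_homologs: the alleles carried by the mother's two homologs
    error: "MI"  -> both maternal homologs are transmitted
           "MII" -> two copies of the same maternal homolog are transmitted
    """
    h1, h2 = maternal_homologs
    if error == "MI":
        maternal_pairs = [(h1, h2)]
    elif error == "MII":
        maternal_pairs = [(h1, h1), (h2, h2)]
    else:
        raise ValueError(f"unknown error type: {error}")
    # Sort the alleles so each genotype is written canonically, e.g. "AAB"
    return {"".join(sorted(pair + (paternal_allele,))) for pair in maternal_pairs}


print(sorted(fetal_genotypes(("A", "B"), "A", "MI")))   # ['AAB']
print(sorted(fetal_genotypes(("A", "B"), "A", "MII")))  # ['AAA', 'ABB']
```

Only loci where the mother is heterozygous separate the two cases; if the mother is homozygous, both error types give the same fetal genotype.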

In one embodiment, if disomy is assumed, the present methods may not be able to determine the parental haplotypes. In one embodiment, in the case of trisomy, the methods can determine the haplotype of one or both parents by using the fact that the fetal DNA in the plasma carries two copies of the chromosome from one parent; the parental phase can then be determined by noting which two copies were inherited from that parent. Specifically, a child can inherit two identical copies of one parental homolog (matched trisomy) or both of that parent's homologs (unmatched trisomy). At each SNP, the likelihoods of a matched and of an unmatched trisomy can be calculated. Ploidy-calling methods that do not use a linkage model to account for crossovers compute the total likelihood of trisomy as a simple weighted average of the matched and unmatched likelihoods across the whole chromosome. However, because of the biological mechanisms that produce segregation errors and crossovers, a trisomy can change from matched to unmatched (or vice versa) along a chromosome only where a crossover occurs. The present methods account for the likelihood of crossovers probabilistically, which yields more accurate ploidy calls than methods that ignore crossovers.
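One way to let the trisomy switch between matched and unmatched only at crossovers is to treat the two states as a chain along the chromosome and sum over all switch patterns with the forward algorithm. The sketch below is an illustration of that technique, not necessarily the model used in the methods. It assumes that per-SNP likelihoods for each state are already available and that a crossover probability is known for each interval between adjacent SNPs (for example, derived from recombination rates). With every crossover probability at 0, the chromosome is wholly matched or wholly unmatched; at 0.5, each SNP is treated independently.

```python
import math


def trisomy_log_likelihood(lik_matched, lik_unmatched, crossover_probs, prior_matched=0.5):
    """log P(all SNP data | trisomy) with a two-state (matched/unmatched) chain.

    lik_matched[i], lik_unmatched[i]: P(data at SNP i | state)
    crossover_probs[i]: probability that the state switches between SNP i and SNP i+1
    """
    n = len(lik_matched)
    if n == 0 or len(lik_unmatched) != n or len(crossover_probs) != n - 1:
        raise ValueError("inconsistent input lengths")
    fm = prior_matched * lik_matched[0]
    fu = (1 - prior_matched) * lik_unmatched[0]
    log_scale = 0.0
    for i in range(1, n):
        s = fm + fu  # rescale each step to avoid numerical underflow
        log_scale += math.log(s)
        fm, fu = fm / s, fu / s
        r = crossover_probs[i - 1]
        fm, fu = ((fm * (1 - r) + fu * r) * lik_matched[i],
                  (fu * (1 - r) + fm * r) * lik_unmatched[i])
    return log_scale + math.log(fm + fu)


lm, lu = [0.9, 0.8, 0.7], [0.1, 0.3, 0.2]
print(trisomy_log_likelihood(lm, lu, [0.0, 0.0]))  # no crossovers allowed
print(trisomy_log_likelihood(lm, lu, [0.5, 0.5]))  # SNPs effectively unlinked
```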

In one embodiment, a reference chromosome is used to determine the child fraction and the noise level, either as point values or as probability distributions. In one embodiment, the child fraction, noise level and/or probability distributions are determined using only the genetic information available from the chromosome whose ploidy state is to be determined. The present methods work without a reference chromosome and without fixing a particular child fraction or noise level. This is a significant improvement over, and point of distinction from, methods known in the art in which genetic data from a reference chromosome must be used to calibrate the child fraction and chromosome behavior.

In one embodiment in which a reference chromosome is not required to determine the fetal fraction, the hypotheses are evaluated as follows:

P(H | D) ∝ P(D | H) × prior probability(H)

When an algorithm uses a reference chromosome, the reference chromosome is usually assumed to be disomic, and one can then either (a) use this assumption and the reference chromosome data to fix the most likely child fraction cfr and random noise level N:

并且然后简化为and then simplifies to

or (b) use this assumption and the reference chromosome data to estimate distributions of the child fraction and noise level. Specifically, instead of fixing a single value of cfr and of N, a probability p(cfr, N) is assigned over a wider range of possible cfr and N values:

where prior(cfr, N) is the prior probability of a particular child fraction and noise level, determined from prior knowledge and experiments; if necessary, it can simply be taken as uniform over the range of cfr and N. One can then write:

Both of these approaches give good results.
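Written out, options (a) and (b) can be sketched as follows. This is a hedged reconstruction, not the document's own formulas. The notation is introduced here for illustration: D_ref is the reference-chromosome data, D is the data for the chromosome under test, H_2 is the disomy hypothesis, and p(cfr, N) is normalized over the range considered.

```latex
\begin{aligned}
&\text{(a)}\quad (\widehat{cfr}, \widehat{N}) = \arg\max_{cfr,\,N} P(D_{\mathrm{ref}} \mid H_2, cfr, N),
  \qquad P(D \mid H) \approx P(D \mid H, \widehat{cfr}, \widehat{N}) \\
&\text{(b)}\quad p(cfr, N) \propto P(D_{\mathrm{ref}} \mid H_2, cfr, N)\,\mathrm{prior}(cfr, N),
  \qquad P(D \mid H) = \sum_{cfr,\,N} P(D \mid H, cfr, N)\, p(cfr, N)
\end{aligned}
```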

Note that in some cases it is undesirable, impossible or infeasible to use a reference chromosome. In such cases, the optimal ploidy call can be derived for each chromosome individually. Specifically:

As described above, the likelihood of the data under each hypothesis H can be determined separately for each chromosome, rather than assuming disomy only for a reference chromosome. With this approach, for each chromosome and each hypothesis, it is possible either to fix one or both of the two parameters, noise and child fraction, or to keep both in probabilistic form.
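The sketch below shows how a single chromosome's own data can be handled in the two ways just described, with no reference chromosome. Option (a) fixes (cfr, N) at the best grid point. Option (b) averages the likelihood over a uniform prior on the grid. The toy likelihood is hypothetical: it assumes SNPs where the mother is AA and the father is BB, a disomic fetus, and N treated as a symmetric per-read error rate. In practice the same two functions would be applied to each hypothesis on each chromosome.

```python
import math
from itertools import product


def plug_in_log_lik(log_lik, grid):
    """Option (a): fix (cfr, N) at the grid point that maximizes the likelihood."""
    best = max(grid, key=log_lik)
    return log_lik(best), best


def marginal_log_lik(log_lik, grid):
    """Option (b): average the likelihood over (cfr, N) under a uniform prior on the grid."""
    terms = [log_lik(theta) for theta in grid]
    top = max(terms)
    # log-sum-exp, then subtract log(len(grid)) for the uniform prior weight
    return top + math.log(sum(math.exp(t - top) for t in terms)) - math.log(len(terms))


def toy_disomy_log_lik(snps):
    """Hypothetical likelihood for (A reads, B reads) at mother-AA / father-BB SNPs.

    Under disomy the fetus is AB, so the expected B fraction is cfr/2 before read
    errors. Binomial coefficients are omitted because they do not depend on (cfr, N).
    """
    def log_lik(theta):
        cfr, noise = theta
        p = (1 - noise) * (cfr / 2) + noise * (1 - cfr / 2)
        return sum(b * math.log(p) + a * math.log(1 - p) for a, b in snps)
    return log_lik


grid = list(product([i / 100 for i in range(1, 31)], [0.0, 0.01, 0.02]))
chromosome_snps = [(950, 50)] * 20  # made-up read counts for one chromosome
ll = toy_disomy_log_lik(chromosome_snps)
best_ll, best_theta = plug_in_log_lik(ll, grid)
print(best_theta, best_ll, marginal_log_lik(ll, grid))
```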

Measurements of DNA are prone to noise and/or errors, especially when the amount of DNA is small or when the DNA is mixed with contaminating DNA. This noise reduces the accuracy of the genotype data and of the ploidy calls. In some embodiments, platform modeling or another noise-modeling method can be used to counteract the detrimental effect of noise on ploidy determination. The present methods use a joint model of the two channels that accounts for random noise due to the amount of input DNA, DNA quality and/or protocol quality.

This contrasts with some methods known in the art in which ploidy is determined from the ratio of allele intensities at each locus. That approach prevents accurate modeling of SNP noise. Specifically, the measurement error is generally not a function of the measured channel-intensity ratio alone, so reducing the data to a ratio reduces the model to one-dimensional information. Accurate modeling of noise, channel quality and channel interaction requires a two-dimensional joint model, which cannot be built from allele ratios.

Specifically, projecting the information from the two channels onto a ratio r = f(x, y) = x/y is not suitable for accurate modeling of channel noise and bias. The noise at a particular SNP cannot be written as a function of the ratio f(x, y) alone; it is a joint function of both channels. For example, in a binomial model the variance of the measured allele fraction p = x/(x+y) is p(1−p)/(x+y), which depends on the total count x+y and not only on the ratio. In a model that includes channel bias and noise, suppose that at SNP i the observed value of channel X is x = a_i X + b_i, where X is the true channel value and b_i is additional channel bias and random noise; similarly, suppose y = c_i Y + d_i. The observed ratio r = x/y can neither accurately predict the true ratio X/Y nor model the residual noise, because (a_i X + b_i)/(c_i Y + d_i) is not a function of X/Y.
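A small numerical illustration of both points follows. The gain and offset values are hypothetical. Two SNPs with the same true ratio X/Y = 0.5 and the same bias terms give different observed ratios. Two measurements with the same allele fraction have very different binomial variances.

```python
def observed_ratio(X, Y, a, b, c, d):
    """Observed ratio x/y when x = a*X + b and y = c*Y + d at one SNP."""
    return (a * X + b) / (c * Y + d)


def allele_fraction_variance(x, y):
    """Binomial variance of the measured allele fraction p = x / (x + y)."""
    p = x / (x + y)
    return p * (1 - p) / (x + y)


# Same true ratio X/Y = 0.5 and same (hypothetical) bias terms at two depths
low_depth = observed_ratio(10, 20, a=1.0, b=5.0, c=1.0, d=5.0)     # 15/25
high_depth = observed_ratio(100, 200, a=1.0, b=5.0, c=1.0, d=5.0)  # 105/205
print(low_depth, high_depth)

# Same allele fraction (0.5), tenfold different depth
print(allele_fraction_variance(10, 10), allele_fraction_variance(100, 100))
```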

The methods disclosed herein provide an efficient way to model noise and bias using a joint binomial distribution over all of the individually measured channels. The relevant equations are given elsewhere in this document, in the sections on per-SNP consistent bias, P(good), P(ref | bad) and P(mut | bad), which effectively adjust the behavior of each SNP. In one embodiment, the methods use a beta-binomial distribution, which avoids relying solely on allele ratios and instead models behavior based on the counts in both channels.
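A minimal beta-binomial likelihood over the two channel counts is sketched below. It uses a mean/concentration parameterization that is assumed here for readability; it does not reproduce the per-SNP bias and P(good)/P(bad) terms referred to above. Unlike a ratio, the counts themselves determine how strongly the data discriminate between candidate allele fractions.

```python
import math


def log_beta_fn(a: float, b: float) -> float:
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k: int, n: int, mean: float, concentration: float) -> float:
    """log P(k reads of allele B out of n total) under a beta-binomial model.

    mean: expected B fraction; concentration: larger values mean less overdispersion
    (the model approaches a binomial as concentration grows).
    """
    alpha = mean * concentration
    beta_param = (1 - mean) * concentration
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta_fn(k + alpha, n - k + beta_param) - log_beta_fn(alpha, beta_param)


def log_likelihood_ratio(k: int, n: int, concentration: float = 100.0) -> float:
    """Evidence for a B fraction of 0.5 over 0.4 (values chosen for illustration)."""
    return (beta_binomial_logpmf(k, n, 0.5, concentration)
            - beta_binomial_logpmf(k, n, 0.4, concentration))


print(log_likelihood_ratio(20, 40), log_likelihood_ratio(200, 400))
```

Both SNPs have a ratio of 0.5, but the deeper one yields the larger log-likelihood ratio, while the overdispersion term keeps that evidence from growing without bound.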

In one embodiment, the methods disclosed herein can call the ploidy of a gestating fetus from genetic data measured in maternal plasma, using all available measurements. In one embodiment, the methods can make this call using measurements from only a subset of parental contexts. Some methods known in the art use only measurements from the AA|BB context, that is, loci where both parents are homozygous but for different alleles. A problem with that approach is that the proportion of polymorphic loci in the AA|BB context is small, typically less than 10%. In one embodiment of the methods disclosed herein, genetic measurements of maternal plasma at loci where the parental context is AA|BB are not used. In one embodiment, the methods use plasma measurements only at polymorphic loci whose parental contexts are AA|AB, AB|AA and AB|AB.

Some methods known in the art average the allele ratios at SNPs in the AA|BB context, which requires the genotypes of both parents, and make ploidy calls from the average allele ratio at these SNPs. This approach suffers from significant inaccuracy because different SNPs behave differently. Note also that it assumes the genotypes of both parents are known. In contrast, in some embodiments the present methods use a joint channel distribution model that does not require genotype data from either parent and does not assume uniform SNP behavior. In some embodiments, the methods account for differences in SNP behavior and weighting. In some embodiments, the methods do not require knowledge of the genotype of either parent. The following is one example of how the methods can accomplish this:

In some embodiments, the log-likelihood of a hypothesis can be determined on a per-SNP basis. At a particular SNP i, given a fetal ploidy hypothesis H and a fetal DNA percentage cf, the log-likelihood of the observed data D is defined as:

where m is a possible true maternal genotype and f is a possible true paternal genotype, with m, f ∈ {AA, AB, BB}, and c is a possible child genotype given hypothesis H. Specifically, for monosomy c ∈ {A, B}; for disomy c ∈ {AA, AB, BB}; and for trisomy c ∈ {AAA, AAB, ABB, BBB}. Including parental genotype data generally yields more accurate ploidy determinations; however, parental genotype data are not required for the methods to work well.
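A sketch of such a per-SNP likelihood for the disomy hypothesis only is shown below. It sums over the unknown maternal, paternal and child genotypes. It rests on several assumptions that are not taken from the document: Hardy-Weinberg genotype priors from a population B-allele frequency, Mendelian transmission, maternal and fetal DNA mixed in the ratio (1 − cf):cf, binomial read sampling, and a small hypothetical error floor `eps`. If parental genotypes are measured, `genotype_prior` would be replaced by probabilities derived from those measurements. A trisomy hypothesis would need its own child-genotype probabilities and fetal copy weighting.

```python
import math
from itertools import product

GENOTYPES = ("AA", "AB", "BB")


def genotype_prior(genotype: str, freq_b: float) -> float:
    """Hardy-Weinberg probability of a parental genotype (assumed population model)."""
    n_b = genotype.count("B")
    return math.comb(2, n_b) * freq_b ** n_b * (1 - freq_b) ** (2 - n_b)


def disomy_child_probs(m: str, f: str) -> dict[str, float]:
    """P(c | m, f, disomy): one allele from each parent, each with probability 1/2."""
    probs: dict[str, float] = {}
    for am, af in product(m, f):
        c = "".join(sorted(am + af))
        probs[c] = probs.get(c, 0.0) + 0.25
    return probs


def log_binom_pmf(k: int, n: int, p: float) -> float:
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def snp_log_likelihood(n_a: int, n_b: int, cf: float, freq_b: float, eps: float = 1e-3) -> float:
    """log P(D_i | disomy, cf) at one SNP, summing over unknown m, f and c."""
    n = n_a + n_b
    log_terms = []
    for m, f in product(GENOTYPES, GENOTYPES):
        log_p_mf = math.log(genotype_prior(m, freq_b) * genotype_prior(f, freq_b))
        for c, p_c in disomy_child_probs(m, f).items():
            # Expected B fraction in plasma: maternal and fetal DNA mixed (1 - cf):cf
            frac_b = (1 - cf) * m.count("B") / 2 + cf * c.count("B") / 2
            frac_b = eps + (1 - 2 * eps) * frac_b  # keep away from exactly 0 and 1
            log_terms.append(log_p_mf + math.log(p_c) + log_binom_pmf(n_b, n, frac_b))
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


print(snp_log_likelihood(n_a=900, n_b=100, cf=0.1, freq_b=0.3))
```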

Some methods known in the art average the allele ratios at SNPs where the mother is homozygous but a different allele is measured in the plasma (AA|AB or AA|BB contexts), and make ploidy calls from the average allele ratio at these SNPs. These methods are intended for cases in which the paternal genotype is not available. It is questionable how accurately one can declare that the plasma is heterozygous at a particular SNP, and hence that the father carries the opposite allele: at a low child fraction, an apparent B allele may simply be noise, and an apparent absence of B may simply be allele dropout in the fetal measurement. Even where plasma heterozygosity can actually be determined, this approach cannot reliably detect paternal trisomies. Specifically, consider SNPs where the mother is AA and some B is measured in the plasma. If the father is BB, the child genotype is ABB, so the average A ratio is 33% (at a child fraction of 100%). But if the father is AB, the child genotype can be ABB for a matched trisomy, contributing a 33% A ratio, or AAB for an unmatched trisomy, pushing the average ratio toward 66% A. Because crossovers can switch a trisomy between matched and unmatched along the chromosome, the chromosome as a whole can carry anything from no unmatched segments to only unmatched segments, so this ratio can vary anywhere between 33% and 66%. For a normal disomy the ratio should be about 50%. Without a linkage model or an accurate error model for the mean, this approach will miss many cases of paternal trisomy. In contrast, the methods disclosed herein assign each candidate parental genotype a probability based on the available genotype information and population frequencies, and do not require parental genotypes. Moreover, the methods disclosed herein can detect trisomies whether or not parental genotype data are available, and can use a linkage model to compensate for crossovers by identifying likely crossover points between matched and unmatched trisomy.
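The arithmetic behind the 33% to 66% range can be checked exactly with fractions. The sketch assumes, as the text does, a child fraction of 100%, a mother who is AA, and a father who is AB. `unmatched_share` is a hypothetical parameter: it is the fraction of the B-showing SNPs that lie in unmatched segments. When half of those SNPs are unmatched, the averaged ratio equals the 50% expected for a normal disomy, which is how averaging can hide a paternal trisomy.

```python
from fractions import Fraction

# A-allele fraction of the child's own DNA (child fraction 100%), mother AA
A_FRACTION = {
    "AB": Fraction(1, 2),   # normal disomy
    "ABB": Fraction(1, 3),  # paternal matched trisomy (two copies of the father's B homolog)
    "AAB": Fraction(2, 3),  # paternal unmatched trisomy (both paternal homologs)
}


def averaged_a_ratio(unmatched_share: Fraction) -> Fraction:
    """Average A ratio over B-showing SNPs when a share of them is unmatched (AAB)
    and the rest is matched (ABB)."""
    return (1 - unmatched_share) * A_FRACTION["ABB"] + unmatched_share * A_FRACTION["AAB"]


for share in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(share, averaged_a_ratio(share))
```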

Some methods known in the art claim a method of averaging the allele ratios at SNPs where neither the maternal nor the paternal genotype is known, and of making ploidy calls from the average ratios at these SNPs. However, no method for achieving this is disclosed. The methods disclosed herein can make accurate ploidy calls in this situation; specific implementations are disclosed elsewhere in this document, using joint-probability maximum likelihood and, optionally, SNP noise and bias models and a linkage model.

Some methods known in the art average allele ratios and make ploidy calls from the average allele ratio at one or a few SNPs. However, such methods do not exploit linkage. The methods disclosed herein do not have these disadvantages.

Using sequence length as a prior to determine DNA origin

It has been reported that maternal and fetal DNA have different sequence length distributions, with fetal DNA generally being shorter. In one embodiment of the invention, prior knowledge in the form of empirical data can be used to construct prior distributions of the expected lengths of maternal DNA, P(x | maternal), and fetal DNA, P(x | fetal). Given a new, unidentified DNA sequence of length x, a probability that the sequence is maternal or fetal can be assigned based on the likelihood of length x under each distribution. Specifically, if P(x | maternal) > P(x | fetal), the sequence can be classified as maternal, with P(maternal | x) = P(x | maternal)/[P(x | maternal) + P(x | fetal)]; and if P(x | maternal) < P(x | fetal), the sequence can be classified as fetal, with P(fetal | x) = P(x | fetal)/[P(x | maternal) + P(x | fetal)]. These expressions treat the two origins as equally likely a priori. In one embodiment of the invention, sample-specific distributions of maternal and fetal sequence lengths can be determined from the sequences that can be assigned to the mother or the fetus with high probability, and these sample-specific distributions can then be used as the expected size distributions for that sample.
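The classification rule can be sketched with two length models. The normal distributions and their means and standard deviations below are hypothetical placeholders; in the method described, the distributions would come from empirical data, or from the sample itself. Equal prior probabilities for the two origins are assumed, matching the expressions above.

```python
from statistics import NormalDist

# Hypothetical fragment-length models in base pairs (placeholders, not measured values)
MATERNAL = NormalDist(mu=166, sigma=20)
FETAL = NormalDist(mu=143, sigma=20)


def p_fetal_given_length(x: float) -> float:
    """P(fetal | x) = P(x | fetal) / [P(x | maternal) + P(x | fetal)], equal priors."""
    lm, lf = MATERNAL.pdf(x), FETAL.pdf(x)
    return lf / (lm + lf)


def classify(x: float) -> str:
    return "fetal" if p_fetal_given_length(x) > 0.5 else "maternal"


for length in (120, 155, 200):
    print(length, classify(length), round(p_fetal_given_length(length), 3))
```

With equal standard deviations, the decision boundary falls halfway between the two means. Unequal spreads or non-normal empirical histograms would move it.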

Variable read depth to minimize sequencing costs

In many clinical trials of diagnostics, for example Chiu et al., BMJ 2011;342:c7401, a protocol with a number of parameters is defined, and the same protocol with the same parameters is then performed for every patient in the trial. When sequencing is used to measure genetic material in order to determine the ploidy state of a fetus gestating in its mother, one relevant parameter is the number of reads. The number of reads may refer to the number of actual reads, the number of intended reads, a fraction of a lane on a sequencer, a full lane, or a full flow cell. In these studies, the number of reads is typically set at a level that ensures the desired accuracy for all or nearly all samples. Sequencing is currently an expensive technology, costing roughly $200 per 5 million mappable reads, and although prices are falling, any method that allows a sequencing-based diagnostic to reach a similar level of accuracy with fewer reads will save a considerable amount of money.

The accuracy of a ploidy determination typically depends on several factors, including the number of reads and the fraction of fetal DNA in the mixture. Accuracy is generally higher when the fetal DNA fraction is higher, and also when the number of reads is larger. It is therefore possible for two cases to have their ploidy state determined with comparable accuracy even though the first case has a lower fetal DNA fraction than the second, provided that more reads are sequenced in the first case than in the second. The estimated fetal DNA fraction of a mixture can thus be used as a guide to the number of reads needed to reach a given level of accuracy.

In one embodiment of the invention, a set of samples can be run in which different samples are sequenced to different read depths, the number of reads for each sample being chosen to achieve a given level of accuracy at the calculated fetal DNA fraction of that mixture. In one embodiment, this requires measuring the mixed sample to determine its fetal DNA fraction; this estimate can be made using sequencing, TaqMan, qPCR, SNP arrays, or any method that can distinguish different alleles at a given locus. The need for a separate fetal fraction estimate can be eliminated by including, in the set of hypotheses compared with the actual measured data, hypotheses that cover all fetal fractions or a selected set of them. Once the fetal DNA fraction of each mixture has been determined, the number of sequence reads to be obtained for each sample can be determined.

In one embodiment of the invention, 100 pregnant women visit their respective obstetricians, and their blood is drawn into blood collection tubes containing an anti-lysis agent and/or an agent that inactivates DNase. Each woman takes home a kit so that the father of her fetus can provide a saliva sample. Both sets of genetic material from all 100 couples are sent back to the laboratory, where the maternal blood is centrifuged and the buffy coat and plasma are separated. The plasma contains a mixture of maternal DNA and DNA of placental origin. The maternal buffy coat and the paternal sample are genotyped using SNP arrays, and SureSelect hybridization probes are used to target DNA in the maternal plasma samples. The DNA captured by the probes is used to generate 100 tagged libraries, one for each maternal sample, each sample being tagged with a different tag. A portion of each library is taken, and these portions are pooled and loaded in multiplex onto two lanes of an Illumina HISEQ DNA sequencer. Each lane yields about 50 million mappable reads, giving about 100 million mappable reads across the 100 multiplexed samples, or about 1 million reads per sample. The sequence reads are used to determine the fetal DNA fraction of each mixture. Fifty of the samples have more than 15% fetal DNA, and 1 million reads are sufficient to determine the ploidy state of those fetuses with 99.9% confidence.

Of the remaining mixtures, 25 have between 10% and 15% fetal DNA. A portion of each of the corresponding libraries is multiplexed and run on one HISEQ lane, yielding an additional 2 million reads per sample. The two sets of sequence data for each of these mixtures are combined, and the resulting 3 million reads per sample are sufficient to determine the ploidy state of those fetuses with 99.9% confidence.

Of the remaining mixtures, 13 have between 6% and 10% fetal DNA. A portion of each of the corresponding libraries is multiplexed and run on one HISEQ lane, yielding an additional 4 million reads per sample. The two sets of sequence data for each of these mixtures are combined, and the resulting 5 million total reads per mixture are sufficient to determine the ploidy state of those fetuses with 99.9% confidence.

Of the remaining mixtures, 8 have between 4% and 6% fetal DNA. A portion of each of the corresponding libraries is multiplexed and run on one HISEQ lane, yielding an additional 6 million reads per sample. The two sets of sequence data for each of these mixtures are combined, and the resulting 7 million total reads per mixture are sufficient to determine the ploidy state of those fetuses with 99.9% confidence.

The remaining four mixtures all have between 2% and 4% fetal DNA. A portion of each of the corresponding libraries is multiplexed and run on one HISEQ lane, yielding an additional 12 million reads per sample. The two sets of sequence data for each of these mixtures are combined, and the resulting 13 million total reads per mixture are sufficient to determine the ploidy state of those fetuses with 99.9% confidence.

This approach requires six sequencing lanes on the HISEQ to reach 99.9% accuracy across the 100 samples. If every sample had to be sequenced to the same depth to ensure 99.9% accuracy for every ploidy determination, 25 sequencing lanes would be needed; and if a 4% no-call or error rate were tolerated, 14 sequencing lanes would suffice.
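The lane arithmetic of the adaptive scheme can be reproduced as below. The tier boundaries and read targets are taken from the example above. The 50-million-read lane is the example's approximate figure. How values exactly on a tier boundary are assigned, and the use of the 2% to 4% target for anything lower, are assumptions made here.

```python
LANE_READS = 50_000_000  # approximate mappable reads per HISEQ lane in the example

# (number of samples, total reads per sample) for each fetal-fraction tier in the example
TIERS = [
    (50, 1_000_000),   # more than 15% fetal DNA
    (25, 3_000_000),   # 10% to 15%
    (13, 5_000_000),   # 6% to 10%
    (8, 7_000_000),    # 4% to 6%
    (4, 13_000_000),   # 2% to 4%
]


def reads_for_fetal_fraction(ff: float) -> int:
    """Per-sample read target for a fetal fraction ff in [0, 1], using the example's tiers.
    Boundary handling, and applying the lowest tier below 2%, are assumptions."""
    if ff > 0.15:
        return 1_000_000
    if ff > 0.10:
        return 3_000_000
    if ff > 0.06:
        return 5_000_000
    if ff > 0.04:
        return 7_000_000
    return 13_000_000


def lane_equivalents(tiers, lane_reads=LANE_READS) -> float:
    """Total reads over all samples, expressed as a number of lanes."""
    return sum(n * reads for n, reads in tiers) / lane_reads


print(lane_equivalents(TIERS))               # adaptive depth per sample
print(lane_equivalents([(100, 7_000_000)]))  # uniform depth, 4% no-calls tolerated
```

The adaptive scheme totals 298 million reads, just under six lanes, compared with 14 lanes when every sample is sequenced to 7 million reads.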

Using raw genotyping data

There are several ways to accomplish NPD using fetal genetic information measured on fetal DNA found in maternal blood. Some of these involve measuring the fetal DNA with SNP arrays, some involve untargeted sequencing, and some involve targeted sequencing. Targeted sequencing can target SNPs, STRs, other polymorphic loci, non-polymorphic loci, or some combination thereof. Some of these methods may use a commercial or proprietary allele caller to call the identity of alleles from the intensity data produced by the sensors of the measuring instrument. For example, the Illumina Infinium system and the Affymetrix GeneChip microarray system use beads or microchips carrying attached DNA sequences that can hybridize to complementary segments of DNA; upon hybridization, the fluorescence of detectable sensor molecules changes. There are also sequencing methods, such as the Illumina Solexa Genome Sequencer or the ABI SOLiD Genome Sequencer, in which the nucleotide sequence of DNA fragments is determined; as a DNA strand complementary to the strand being sequenced is extended, the identity of each added nucleotide is typically detected via a fluorescent or radioactive label attached to the complementary nucleotide. In all of these methods, genotype or sequence data are typically determined on the basis of fluorescence or other signals, or their absence. These systems are often combined with low-level software packages that make specific allele calls (secondary genetic data) from the analog output of the fluorescence or other detection device (raw, or primary, genetic data). For example, for a given allele on a SNP array, the software will call a SNP present or absent depending on whether the measured fluorescence intensity is above or below a certain threshold. Similarly, the output of a sequencer is a chromatogram indicating the level of fluorescence detected for each dye, and the software calls each base as A, T, C or G. High-throughput sequencers typically make a series of such measurements and call reads representing the most likely structure of the sequenced DNA. Here, the direct analog output of the chromatogram is defined as raw (primary) genetic data, and the base or SNP calls made by software are considered secondary genetic data. In one embodiment, raw data refers to raw intensity data, the unprocessed output of a genotyping platform, where the genotyping platform may be a SNP array or a sequencing platform. Secondary genetic data refers to processed genetic data in which allele calls have been made, bases have been assigned to sequence data, and/or sequence reads have been mapped to a genome.

Many higher-level applications make use of these allele calls, SNP calls and sequence reads, that is, the secondary genetic data produced by genotyping software. For example, DNANEXUS, ELAND or MAQ take sequencing reads and map them to the genome. In non-invasive prenatal diagnosis, for example, complex informatics such as PARENTAL SUPPORT™ can use a large number of SNP calls to determine the genotype of an individual. In preimplantation genetic diagnosis, it is possible to take a set of sequence reads mapped to the genome and to determine the ploidy state of an individual from the normalized counts of reads mapped to each chromosome or chromosome segment. In non-invasive prenatal diagnosis, it is possible to take a set of sequence reads measured on the DNA present in maternal plasma, map them to the genome, take the normalized count of reads mapped to each chromosome or chromosome segment, and use those data to determine the ploidy state of an individual. For example, one may conclude that the chromosomes with a disproportionately large number of reads are trisomic in the fetus gestating in the mother whose blood was drawn.
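The counting approach described above can be sketched as a ratio of each chromosome's observed share of mapped reads to its expected share. The reference fractions and read counts below are made up. In practice, expected shares would come from euploid reference samples, which is an assumption here. For comparison, a fetal trisomy adds one fetal copy to the two maternal ones, so the chromosome's DNA rises by roughly a factor of 1 + f/2 at fetal fraction f.

```python
def normalized_ratios(counts: dict[str, int], reference_fraction: dict[str, float]) -> dict[str, float]:
    """Observed share of mapped reads on each chromosome divided by its expected share."""
    total = sum(counts.values())
    return {chrom: (n / total) / reference_fraction[chrom] for chrom, n in counts.items()}


def expected_trisomy_ratio(fetal_fraction: float) -> float:
    """Approximate ratio for a fetal trisomy: (2(1 - f) + 3f) / 2 = 1 + f/2."""
    return 1 + fetal_fraction / 2


# Made-up counts and expected shares, for illustration only
counts = {"chr1": 5000, "chr21": 1050, "rest": 4000}
reference = {"chr1": 0.5, "chr21": 0.1, "rest": 0.4}
print(normalized_ratios(counts, reference))
print(expected_trisomy_ratio(0.10))
```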

In practice, however, the initial output of a measuring instrument is an analog signal. When the software associated with a sequencer calls a base, for example as T, that call is in fact the call the software considers most likely. In some cases, however, the call may have low confidence; for example, the analog signal may indicate that a particular base is only 90% likely to be T and 10% likely to be A. In another example, the genotype-calling software associated with a SNP array reader may call an allele as G, while the underlying analog signal indicates that the allele is only 70% likely to be G and 30% likely to be T. In such cases, higher-level applications that use the genotype and sequence calls produced by lower-level software lose some information. That is, raw genetic data as measured directly by a genotyping platform may be messier than the secondary genetic data determined by the associated software package, but they contain more information. When secondary sequence data are mapped to the genome, many reads are discarded because some bases were not read clearly enough and/or the mapping is ambiguous. When raw sequence data are used, all or many of the reads that would have been discarded during conversion to secondary data can be used by treating the reads probabilistically.
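The information lost by hard calling can be illustrated with per-observation allele probabilities. The sketch compares a secondary-data style count, which calls each observation and discards low-confidence ones, with a probabilistic count in which every observation contributes its probability. The probabilities and the 0.8 confidence cutoff are hypothetical. Summing probabilities into expected counts is only one simple way to treat the data probabilistically.

```python
def hard_call_counts(p_b: list[float], min_confidence: float = 0.8) -> tuple[int, int, int]:
    """Secondary-data style: call each observation A or B, discarding low-confidence ones.
    Returns (A calls, B calls, discarded). min_confidence is a hypothetical cutoff."""
    a = b = dropped = 0
    for p in p_b:
        if max(p, 1 - p) < min_confidence:
            dropped += 1
        elif p >= 0.5:
            b += 1
        else:
            a += 1
    return a, b, dropped


def expected_counts(p_b: list[float]) -> tuple[float, float]:
    """Raw-data style: every observation contributes its probability to each allele."""
    return sum(1 - p for p in p_b), sum(p_b)


# P(allele B) for each observation, as might be derived from raw intensities (made-up values)
p_b = [0.9, 0.1, 0.7, 0.6, 0.3]
print(hard_call_counts(p_b))  # three of five observations are discarded
print(expected_counts(p_b))   # all five observations contribute
```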

In one embodiment of the invention, the higher-level software does not rely on allele calls, SNP calls, or sequence reads determined by lower-level software. Instead, the higher-level calculations are based on the analog signals measured directly by the genotyping platform. In one embodiment of the invention, an informatics-based method such as PARENTAL SUPPORT™ is modified so that its ability to reconstruct the genetic data of an embryo, fetus, or child is adapted to use raw genetic data directly, as measured by the genotyping platform. In one embodiment of the invention, an informatics-based method such as PARENTAL SUPPORT™ can make allele calls and/or chromosome copy number calls using raw genetic data and without using secondary genetic data. In one embodiment of the invention, all gene calls, SNP calls, sequence reads, and sequence mappings are handled probabilistically using raw intensity data as measured directly by the genotyping platform, rather than by first converting the raw genetic data into secondary calls. In one embodiment, the DNA measurements of the prepared sample that are used to calculate allele count probabilities and to determine the relative probability of each hypothesis comprise raw genetic data.

In some embodiments, the method may increase the accuracy of the genetic data of a target individual by incorporating the genetic data of at least one related individual. The method comprises: obtaining raw genetic data specific to the genome of the target individual and genetic data specific to the genome of the related individual; creating a set of one or more hypotheses about which segments of which chromosomes of the related individual may correspond to segments in the genome of the target individual; determining the probability of each hypothesis given the raw genetic data of the target individual and the genetic data of the related individual; and using the probabilities associated with each hypothesis to determine the most likely state of the actual genetic material of the target individual.
In some embodiments, the method may determine the copy number of a chromosome segment in the genome of the target individual. The method comprises: creating a set of copy number hypotheses about how many copies of the chromosome segment are present in the genome of the target individual; incorporating raw genetic data from the target individual and genetic information from one or more related individuals into a data set; estimating characteristics of the platform response associated with the data set, where the platform response may vary from one experiment to another; computing the conditional probability of each copy number hypothesis given the data set and the platform response characteristics; and determining the copy number of the chromosome segment based on the most likely copy number hypothesis. In one embodiment, the method of the present invention may determine the ploidy state of at least one chromosome in a target individual. The method comprises: obtaining raw genetic data from the target individual and from one or more related individuals; creating a set of at least one ploidy state hypothesis for each chromosome of the target individual; determining the statistical probability of each ploidy state hypothesis in the set using one or more expert techniques, each applied to the obtained genetic data; combining, for each ploidy state hypothesis, the statistical probabilities determined by the one or more expert techniques; and determining the ploidy state of each chromosome in the target individual based on the combined statistical probability of each ploidy state hypothesis.
In one embodiment, the method of the present invention may determine the allelic state at a set of alleles in a target individual, in one or both parents of the target individual, and optionally in one or more related individuals. The method comprises: obtaining raw genetic data from the target individual, from one or both parents, and from any related individuals; creating a set of at least one allele hypothesis for the target individual, one or both parents, and optionally one or more related individuals, where the hypotheses describe possible allelic states at the set of alleles; determining the statistical probability of each allele hypothesis in the set given the obtained genetic data; and determining, for the target individual, one or both parents, and optionally one or more related individuals, the allelic state of each allele in the set based on the statistical probability of each allele hypothesis.
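The copy-number embodiment can be sketched as a small Bayesian calculation. In this sketch, the platform response is reduced to a single hypothetical parameter, `reads_per_copy`, which is the expected read count contributed by one copy of the segment in this run. The reads are assumed to be Poisson distributed. Both assumptions are simplifications chosen for illustration, not the model used by the described method.

```python
import math
from typing import Dict, Optional, Sequence


def poisson_logpmf(k: int, lam: float) -> float:
    """log P(K = k) for a Poisson distribution with mean lam (lam must be > 0)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def copy_number_posterior(observed_reads: int,
                          reads_per_copy: float,
                          hypotheses: Sequence[int] = (1, 2, 3),
                          priors: Optional[Dict[int, float]] = None) -> Dict[int, float]:
    """Posterior probability of each copy-number hypothesis for one segment.

    Hypotheses must be >= 1 here because a zero-copy hypothesis would need a
    separate noise model (log(0) is undefined).
    """
    if priors is None:
        priors = {h: 1.0 / len(hypotheses) for h in hypotheses}
    log_post = {h: math.log(priors[h]) + poisson_logpmf(observed_reads, h * reads_per_copy)
                for h in hypotheses}
    top = max(log_post.values())  # subtract the max before exponentiating, for stability
    unnorm = {h: math.exp(v - top) for h, v in log_post.items()}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}


post = copy_number_posterior(observed_reads=300, reads_per_copy=100.0)
print(max(post, key=post.get))  # 3
```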

In some embodiments, the genetic data of the mixed sample may comprise sequence data that does not map uniquely to the human genome. In some embodiments, the genetic data of the mixed sample may comprise sequence data that maps to multiple locations in the genome, where each possible mapping is associated with a probability that the mapping is correct. In some embodiments, sequence reads are not assumed to be associated with a particular location in the genome. In some embodiments, sequence reads are associated with multiple locations in the genome and with the probability that they belong to each of those locations.
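One simple way to use such multi-location reads is fractional counting: each read adds its mapping probabilities, rescaled to sum to one, to every candidate location. This is a generic technique chosen for illustration. The location labels and probabilities below are hypothetical, and the document does not specify how the mapping probabilities are obtained.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def fractional_counts(read_mappings: List[List[Tuple[str, float]]]) -> Dict[str, float]:
    """Each read is a list of (location, probability) pairs. Every read contributes
    exactly one count in total, split across its candidate locations."""
    counts: Dict[str, float] = defaultdict(float)
    for mappings in read_mappings:
        total = sum(p for _, p in mappings)
        if total <= 0:
            continue  # read carries no usable mapping information
        for location, p in mappings:
            counts[location] += p / total
    return dict(counts)


reads = [
    [("chr21:100", 1.0)],                     # uniquely mapped
    [("chr21:100", 0.8), ("chr13:500", 0.2)],  # ambiguous mapping
    [("chr13:500", 0.5), ("chrX:42", 0.5)],    # fully ambiguous
]
print(fractional_counts(reads))  # about {'chr21:100': 1.8, 'chr13:500': 0.7, 'chrX:42': 0.5}
```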

Counting Methods for Determining Chromosomal Copy Number

In one aspect, the invention features a method of testing for an abnormal distribution of a fetal chromosome by comparing the number of sequence tags aligned to different chromosomes (see, e.g., U.S. Patent No. 8,296,076, filed April 20, 2012, which is hereby incorporated by reference in its entirety). As is known in the art, the term "sequence tag" refers to a relatively short (e.g., 15-100 nucleotide) nucleic acid sequence that can be used to identify a larger sequence, for example one that maps to a chromosome, a genomic region, or a gene.
In some embodiments, the method involves: (i) contacting a sample comprising a mixture of maternal and fetal DNA with a library of primers that simultaneously hybridize to at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different target loci to produce a reaction mixture, wherein the target loci are from a plurality of different chromosomes, and wherein the plurality of different chromosomes includes at least one first chromosome suspected of having an abnormal distribution in the sample and at least one second chromosome presumed to be normally distributed in the sample; (ii) subjecting the reaction mixture to primer extension reaction conditions to produce amplification products; (iii) sequencing the amplification products to obtain a plurality of sequence tags aligned to the target loci, wherein the sequence tags are long enough to be assigned to a specific target locus; (iv) assigning, on a computer, the plurality of sequence tags to their corresponding target loci; (v) determining, on a computer, the number of sequence tags aligned to the target loci of the first chromosome and the number of sequence tags aligned to the target loci of the second chromosome; and (vi) comparing, on a computer, the numbers from step (v) to determine the presence or absence of an abnormal distribution of the first chromosome.
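Steps (iv) to (vi) can be sketched as a tally and a ratio. To keep the example self-contained, tags are assigned to loci by exact string lookup. This is a stand-in assumption for a real aligner, and all tag sequences, locus names, and helper names are hypothetical. The ratio is normalized per target locus so that chromosomes targeted by different numbers of loci can be compared.

```python
from collections import Counter
from typing import Dict, Iterable


def count_tags_per_chromosome(tags: Iterable[str],
                              tag_to_locus: Dict[str, str],
                              locus_to_chrom: Dict[str, str]) -> Counter:
    """Steps (iv)-(v): assign each tag to its target locus, then tally per chromosome."""
    per_chrom: Counter = Counter()
    for tag in tags:
        locus = tag_to_locus.get(tag)
        if locus is None:
            continue  # tag cannot be assigned to a specific target locus
        per_chrom[locus_to_chrom[locus]] += 1
    return per_chrom


def per_locus_ratio(per_chrom: Counter, first: str, second: str,
                    n_loci_first: int, n_loci_second: int) -> float:
    """Step (vi): mean tags per locus on the first chromosome divided by that on the second."""
    return (per_chrom[first] / n_loci_first) / (per_chrom[second] / n_loci_second)


tag_to_locus = {"ACGT": "L1", "TTGA": "L2", "GGCC": "L3", "CATG": "L4"}
locus_to_chrom = {"L1": "chr21", "L2": "chr21", "L3": "chr1", "L4": "chr1"}
tags = ["ACGT"] * 6 + ["TTGA"] * 6 + ["GGCC"] * 5 + ["CATG"] * 5 + ["NNNN"]
per_chrom = count_tags_per_chromosome(tags, tag_to_locus, locus_to_chrom)
print(per_locus_ratio(per_chrom, "chr21", "chr1", 2, 2))  # 1.2
```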

In one aspect, the invention provides methods for detecting the presence or absence of fetal aneuploidy by comparing the relative frequencies of target amplicons between chromosomes (see, e.g., PCT Publication No. WO2012/103031, filed January 23, 2012, which is hereby incorporated by reference in its entirety). In some embodiments, the method involves: (i) contacting a sample comprising a mixture of maternal and fetal DNA with a library of primers that simultaneously hybridize to at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different non-polymorphic target loci to produce a reaction mixture, wherein the target loci are from a plurality of different chromosomes; (ii) subjecting the reaction mixture to primer extension reaction conditions to produce amplification products comprising target amplicons; (iii) quantifying, on a computer, the relative frequencies of target amplicons from a first and a second chromosome of interest; (iv) comparing, on a computer, the relative frequencies of target amplicons from the first and second chromosomes of interest; and (v) identifying the presence or absence of aneuploidy based on the compared relative frequencies of the first and second chromosomes of interest. In some embodiments, the first chromosome is a chromosome suspected of being euploid. In some embodiments, the second chromosome is a chromosome suspected of being aneuploid.

Combined Prenatal Diagnosis Methods

Various methods can be used for prenatal diagnosis of, or prenatal screening for, aneuploidy or other genetic defects. One such method is described elsewhere in this document and in U.S. Utility Application No. 11/603,406, filed November 28, 2006; U.S. Utility Application No. 12/076,348, filed March 17, 2008; and PCT Application No. PCT/S09/52730. It uses the genetic data of related individuals to increase the accuracy with which the genetic data of a target individual, such as a fetus, is known or estimated. Other methods used in prenatal diagnosis involve measuring the levels of certain hormones in maternal blood, where those hormones are associated with various genetic abnormalities. One example is the triple test, which measures the levels of several (usually two, three, four, or five) different hormones in maternal blood. Where multiple methods are used to determine the likelihood of a given outcome and none of them is conclusive on its own, it is possible to combine the information they provide to make a prediction that is more accurate than any single method alone. In the triple test, combining the information from three different hormones can yield a more accurate prediction of genetic abnormality than any one hormone level alone.

Disclosed herein is a method for making more accurate predictions about the genetic state of a fetus, in particular the likelihood of a genetic abnormality in the fetus. The method comprises combining predictions of fetal genetic abnormality made using multiple methods. A "more accurate" method may refer to a method for diagnosing an abnormality that has a lower false negative rate at a given false positive rate. In an advantageous embodiment of the invention, one or more of the predictions are made on the basis of genetic data about the fetus, where the genetic knowledge is determined using the PARENTAL SUPPORT™ method, that is, using the genetic data of individuals related to the fetus to determine the fetal genetic data with greater accuracy. In some embodiments, the genetic data may include the ploidy state of the fetus. In some embodiments, the genetic data may refer to a set of allele calls on the fetal genome. In some embodiments, some predictions may be made using the triple test. In some embodiments, some predictions may be made using measurements of other hormone levels in maternal blood.
In some embodiments, predictions made by methods considered diagnostic may be combined with predictions made by methods considered screening. In some embodiments, the method involves measuring maternal blood levels of alpha-fetoprotein (AFP). In some embodiments, the method involves measuring maternal blood levels of unconjugated estriol (uE₃). In some embodiments, the method involves measuring maternal blood levels of beta human chorionic gonadotropin (β-hCG). In some embodiments, the method involves measuring maternal blood levels of invasive trophoblast antigen (ITA). In some embodiments, the method involves measuring maternal blood levels of inhibin. In some embodiments, the method involves measuring maternal blood levels of pregnancy-associated plasma protein A (PAPP-A). In some embodiments, the method involves measuring maternal blood levels of other hormones or maternal serum markers. In some embodiments, some predictions may be made using other methods. In some embodiments, some predictions may be made using a fully integrated test, such as one that combines an ultrasound and a blood test at about week 12 of pregnancy with a second blood test at about week 16. In some embodiments, the method involves measuring fetal nuchal translucency (NT). In some embodiments, the method involves making predictions using the measured levels of the hormones described above. In some embodiments, the method involves a combination of the methods described above.

There are several ways to combine predictions. For example, hormone measurements can be converted to multiples of the median (MoM) and then to likelihood ratios (LR). Similarly, other measurements can be converted to LRs using a mixture model of NT distributions. The LRs for NT and the biochemical markers can be multiplied by the age- and gestation-related prior risk to derive the risk of various conditions, such as trisomy 21. The detection rate (DR) and false positive rate (FPR) can be calculated as the proportion of risks that exceed a given risk threshold.
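The MoM-to-LR-to-risk chain can be sketched as follows. This sketch assumes that log10(MoM) for each marker is Gaussian in affected and unaffected pregnancies, and that the LRs of different markers can be multiplied, which treats them as independent. Screening programs often model marker correlations instead. The (mean, sd) parameters and the prior risk below are hypothetical placeholders, not published values.

```python
import math
from typing import Iterable, Sequence, Tuple


def mom(value: float, median_for_gestation: float) -> float:
    """Multiple of the median: the measurement divided by the median for that gestational age."""
    return value / median_for_gestation


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def likelihood_ratio(marker_mom: float,
                     affected: Tuple[float, float],
                     unaffected: Tuple[float, float]) -> float:
    """LR = density of log10(MoM) if affected / density if unaffected.
    Each distribution is given as a (mean, sd) pair of log10(MoM); values are assumptions."""
    x = math.log10(marker_mom)
    return gaussian_pdf(x, *affected) / gaussian_pdf(x, *unaffected)


def combined_risk(prior_risk: float, lrs: Iterable[float]) -> float:
    """Convert the prior risk to odds, multiply by each LR, and convert back to a probability."""
    odds = prior_risk / (1.0 - prior_risk)
    for lr in lrs:
        odds *= lr
    return odds / (1.0 + odds)


def dr_fpr(affected_risks: Sequence[float], unaffected_risks: Sequence[float],
           threshold: float) -> Tuple[float, float]:
    """Share of affected (DR) and unaffected (FPR) pregnancies at or above the risk threshold."""
    dr = sum(r >= threshold for r in affected_risks) / len(affected_risks)
    fpr = sum(r >= threshold for r in unaffected_risks) / len(unaffected_risks)
    return dr, fpr


lr = likelihood_ratio(mom(2.0, 1.0), affected=(0.3, 0.2), unaffected=(0.0, 0.2))
risk = combined_risk(1 / 500, [lr])  # hypothetical age-related prior of 1 in 500
```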

In one embodiment, a method of calling the ploidy state involves combining two sets of relative probabilities for each ploidy hypothesis. The first set is determined using a joint distribution model and allele count probabilities. The second set is calculated using statistical techniques drawn from other methods of determining a risk score for fetal trisomy, including (but not limited to): read count analysis, comparison of heterozygosity rates, statistics that are available only when parental genetic information is used, probabilities of normalized genotype signals for certain parental contexts, statistics calculated using the estimated fetal fraction of the first sample or of the prepared sample, and combinations thereof.

Another approach may involve four measured hormone levels whose probability distributions are known: p(x₁, x₂, x₃, x₄ | e) for the euploid case and p(x₁, x₂, x₃, x₄ | a) for the aneuploid case. The probability distributions of the DNA measurements can then be measured: g(y | e) and g(y | a) for the euploid and aneuploid cases, respectively. Assuming the hormone and DNA measurements are conditionally independent given the euploid/aneuploid hypothesis, they can be combined as p(x₁, x₂, x₃, x₄ | a) g(y | a) and p(x₁, x₂, x₃, x₄ | e) g(y | e). Each product is then multiplied by the corresponding prior, p(a) or p(e), given the maternal age. The hypothesis with the highest result can then be chosen.
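The combination rule above amounts to choosing the hypothesis h in {a, e} that maximizes p(x₁, x₂, x₃, x₄ | h) g(y | h) p(h). A minimal sketch follows. The likelihood values and priors are hypothetical numbers. Normalizing the products does not change which hypothesis wins, but it yields posterior probabilities that can be compared with a calling threshold.

```python
from typing import Dict, Tuple


def combine_hypotheses(hormone_lik: Dict[str, float],
                       dna_lik: Dict[str, float],
                       prior: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Each dict maps a hypothesis ('a' = aneuploid, 'e' = euploid) to a number.
    Assumes hormone and DNA measurements are conditionally independent given the hypothesis."""
    unnorm = {h: hormone_lik[h] * dna_lik[h] * prior[h] for h in prior}
    z = sum(unnorm.values())
    posterior = {h: v / z for h, v in unnorm.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


best, posterior = combine_hypotheses(
    hormone_lik={"a": 0.2, "e": 0.1},   # p(x1..x4 | h), hypothetical
    dna_lik={"a": 0.5, "e": 0.05},      # g(y | h), hypothetical
    prior={"a": 0.01, "e": 0.99},       # p(h) given maternal age, hypothetical
)
print(best)  # e
```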

In one embodiment, one may invoke the central limit theorem to assume that g(y | a) and g(y | e) are Gaussian, and estimate their means and standard deviations by examining many samples. In another embodiment, one may assume that the measurements are not independent with respect to the outcome, and collect enough samples to estimate the joint distribution p(x₁, x₂, x₃, x₄ | a or e).
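Under the Gaussian assumption, g(y | e) and g(y | a) can be fitted directly from reference samples of known outcome. This sketch uses Python's standard-library `statistics.NormalDist.from_samples`. The reference scores are hypothetical. Estimating the dependent joint distribution of the second embodiment would need a multivariate model and more data, and is not shown here.

```python
from statistics import NormalDist
from typing import Dict, Sequence, Tuple


def fit_class_conditionals(euploid_scores: Sequence[float],
                           aneuploid_scores: Sequence[float]) -> Tuple[NormalDist, NormalDist]:
    """Approximate g(y|e) and g(y|a) by Gaussians fitted to reference samples
    of known outcome (the central-limit-theorem assumption)."""
    return NormalDist.from_samples(euploid_scores), NormalDist.from_samples(aneuploid_scores)


def dna_likelihoods(y: float, g_e: NormalDist, g_a: NormalDist) -> Dict[str, float]:
    """Evaluate both fitted densities at the observed DNA statistic y."""
    return {"e": g_e.pdf(y), "a": g_a.pdf(y)}


# Hypothetical DNA summary statistics from reference samples of known outcome.
g_e, g_a = fit_class_conditionals([0.0, 0.1, -0.1, 0.05, -0.05],
                                  [3.0, 3.2, 2.8, 3.1, 2.9])
lik = dna_likelihoods(2.9, g_e, g_a)
print(lik["a"] > lik["e"])  # True
```

The resulting values can serve as the g(y | h) terms in the combination rule of the preceding paragraph.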

In one embodiment, the ploidy state of the target individual is called as the ploidy state associated with the most probable hypothesis. In some cases, one hypothesis will have a normalized combined probability greater than 90%. Each hypothesis is associated with one ploidy state or a set of ploidy states. A hypothesis's associated ploidy state is called only if its normalized combined probability exceeds a threshold, which may be 90% or some other value, such as 50%, 80%, 95%, 98%, 99%, or 99.9%.
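The thresholded call can be sketched as follows. Returning `None` when no hypothesis clears the threshold is an assumption about how a "no call" might be represented. The hypothesis names and scores are hypothetical.

```python
from typing import Dict, Optional, Tuple


def call_ploidy(combined: Dict[str, float],
                threshold: float = 0.9) -> Tuple[Optional[str], float]:
    """combined maps each hypothesis to its (unnormalized) combined probability.
    Returns the winning hypothesis if its normalized probability exceeds the
    threshold, otherwise None (no call), along with that probability."""
    z = sum(combined.values())
    normalized = {h: p / z for h, p in combined.items()}
    best = max(normalized, key=normalized.get)
    if normalized[best] > threshold:
        return best, normalized[best]
    return None, normalized[best]


scores = {"disomy": 9.5, "trisomy": 0.5}
print(call_ploidy(scores))                  # ('disomy', 0.95)
print(call_ploidy(scores, threshold=0.99))  # (None, 0.95)
```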

DNA from a Child of a Previous Pregnancy in Maternal Blood

One difficulty with non-invasive prenatal diagnosis is distinguishing fetal cells from the current pregnancy from fetal cells from previous pregnancies. Some believe that genetic material from previous pregnancies disappears after a period of time, but conclusive evidence has not yet been shown. In one embodiment of the invention, it is possible to use the PARENTAL SUPPORT™ (PS) method and knowledge of the paternal genome to determine the fetal DNA of paternal origin (that is, the DNA the fetus inherited from the father) present in maternal blood. This approach may make use of phased parental genetic information. It is possible to phase the parental genotypes from unphased genotype information using grandparental genetic data (such as genetic data measured from the grandfather's sperm) or genetic data from other children already born or from miscarriage samples. Unphased genetic information can also be phased by HapMap-based phasing or by haplotyping of paternal cells. Successful haplotyping has been demonstrated by arresting cells in mitosis, when the chromosomes are in tight bundles, and using microfluidics to place the separated chromosomes in separate wells. In another embodiment, it is possible to use phased parental haplotype data to detect the presence of more than one homolog from the father, which would indicate that genetic material from more than one child is present in the blood.
By focusing on chromosomes that are expected to be euploid in the fetus, the possibility that a fetal trisomy accounts for the extra paternal homolog can be ruled out. In addition, it is possible to determine that the fetal DNA is not from the current father, in which case other methods (such as the triple test) can be used to predict genetic abnormalities.

There may be other sources of fetal genetic material available by methods other than a blood draw. Fetal genetic material available in maternal blood falls into two main categories: (1) whole fetal cells, such as nucleated fetal red blood cells or erythroblasts, and (2) free-floating fetal DNA. In the case of whole fetal cells, there is some evidence that fetal cells can persist in maternal blood for an extended period. It may therefore be possible to isolate, from a pregnant woman, cells containing the DNA of a child or fetus from a previous pregnancy. There is also evidence that free-floating fetal DNA is cleared from the system within a matter of weeks. One challenge is determining whose genetic material a cell contains, that is, ensuring that the genetic material being measured is not from a fetus of a previous pregnancy. In one embodiment of the invention, knowledge of the maternal genetic material can be used to ensure that the genetic material in question is not maternal. There are various ways to accomplish this, including informatics-based methods such as PARENTAL SUPPORT™, as described in this document or in any of the patents referenced in this document.

In one embodiment of the invention, blood drawn from a pregnant mother may be separated into a fraction containing free-floating fetal DNA and a fraction containing nucleated red blood cells. The free-floating DNA may optionally be enriched, and its genotypic information may be measured. Knowledge of the maternal genotype, together with the genotypic information measured on the free-floating DNA, can be used to determine aspects of the fetal genotype. These aspects may be the ploidy state and/or a set of allele identities. Individual nucleated red blood cells can then be genotyped using the methods described elsewhere in this document and in the other referenced patents, especially those mentioned in the first section of this document. Knowledge of the maternal genome allows determination of whether any given single blood cell is genetically maternal. The aspects of the fetal genotype determined as described above allow determination of whether a single blood cell is genetically derived from the fetus currently gestating. In essence, this aspect of the invention combines three sources of information: the genetic knowledge of the mother, possibly genetic information from other related individuals (such as the father), and the genetic information measured on free-floating DNA found in maternal blood. Together, these are used to determine whether an isolated nucleated cell found in maternal blood is (a) genetically maternal, (b) genetically from the fetus currently gestating, or (c) genetically from a fetus of a previous pregnancy.
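The three-way decision at the end of the paragraph can be illustrated with a simple genotype-concordance check. This is a hypothetical simplification, not the informatics method referenced in the document. The sketch compares the cell's genotype with the maternal genotype and with the fetal genotype inferred from the free-floating DNA, tolerating a small mismatch rate. The 5% tolerance and the genotype encoding (unordered strings such as "AB") are assumptions. Real single-cell data would need explicit allele-dropout and error models.

```python
from typing import Dict


def mismatch_rate(cell_gt: Dict[str, str], ref_gt: Dict[str, str]) -> float:
    """Fraction of shared SNPs at which the cell's unordered genotype differs from the reference."""
    shared = [snp for snp in cell_gt if snp in ref_gt]
    if not shared:
        return 1.0
    bad = sum(1 for snp in shared if sorted(cell_gt[snp]) != sorted(ref_gt[snp]))
    return bad / len(shared)


def classify_cell(cell_gt: Dict[str, str], maternal_gt: Dict[str, str],
                  fetal_gt: Dict[str, str], max_mismatch: float = 0.05) -> str:
    """Label a nucleated cell as (a) maternal, (b) current fetus, or (c) previous pregnancy."""
    if mismatch_rate(cell_gt, maternal_gt) <= max_mismatch:
        return "maternal"
    if mismatch_rate(cell_gt, fetal_gt) <= max_mismatch:
        return "current fetus"
    return "previous pregnancy"


maternal = {"s1": "AA", "s2": "AB", "s3": "BB", "s4": "AB"}
fetal = {"s1": "AB", "s2": "AB", "s3": "AB", "s4": "BB"}  # inferred from cell-free DNA
print(classify_cell({"s1": "BB", "s2": "AA", "s3": "AA", "s4": "AA"}, maternal, fetal))
```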

Prenatal Sex Chromosome Aneuploidy Determination

In methods known in the art, the presence of free-floating fetal DNA (fffDNA) in maternal plasma has been used to try to determine the sex of a gestating fetus from the mother's blood. If a Y-specific locus can be detected in maternal plasma, the gestating fetus is male. However, with methods known in the art, failure to detect a Y-specific locus in plasma does not always ensure that the gestating fetus is female. In some cases, the amount of fffDNA is too low to guarantee that a Y-specific locus would be detected if the fetus were male.

A novel method is presented here that does not require measuring Y-specific nucleic acids, that is, DNA from loci of exclusively paternal origin. The previously disclosed Parental Support method uses crossover frequency data, parental genotype data, and informatics techniques to determine the ploidy state of a gestating fetus. The sex of the fetus is simply the ploidy state of the fetus at the sex chromosomes: an XX child is female, and an XY child is male. The methods described herein can also determine the ploidy state of the fetus. Note that sex determination is essentially synonymous with ploidy determination of the sex chromosomes. In sex determination, the child is usually assumed to be euploid, so there are fewer possible hypotheses.

The method disclosed herein involves looking at loci shared by the X and Y chromosomes to establish a baseline for the expected amount of fetal DNA. Regions specific to the X chromosome alone can then be queried to determine whether the fetus is female or male. For a male fetus, we would expect to see less fetal DNA from loci specific to the X chromosome than from loci shared by X and Y. For a female fetus, by contrast, we would expect the amount of DNA in each of these groups to be the same. The DNA in question can be measured by any technique that can quantify the amount of DNA present in a sample, such as qPCR, SNP arrays, genotyping arrays, or sequencing. For DNA derived exclusively from one individual, we would expect to see the following:

Where fetal DNA is mixed with maternal DNA, let the fraction of fetal DNA in the mixture be F and the fraction of maternal DNA be M, so that F + M = 100%. In this case, we would expect to see the following:

Where F and M are known, the expected ratios can be calculated and the observed data compared with the expected data. Where M and F are not known, thresholds can be chosen based on historical data. In both cases, the amount of DNA measured at loci shared by X and Y can be used as a baseline, and the sex of the fetus can be tested based on the amount of DNA observed at loci specific to the X chromosome alone. If that amount is lower than the baseline by approximately F/2, or is low enough to fall below a predetermined threshold, the fetus is called male. If the amount is approximately equal to the baseline, or is not low enough to fall below the predetermined threshold, the fetus is called female.
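The F/2 figure follows from counting copies. This derivation is ours, and it assumes F and M are fractions of genome equivalents. Shared X/Y loci have two copies in the mother and two in a fetus of either sex. X-specific loci have two maternal copies and either two (female) or one (male) fetal copy. With the baseline normalized to 1, the expected ratio of X-specific signal to shared signal is (2M + 2F)/(2M + 2F) = 1 for a female fetus and (2M + F)/(2M + 2F) = 1 - F/2 for a male fetus. The sketch below uses the midpoint between these two expectations as the threshold when F is known; the midpoint rule and the function names are illustrative assumptions.

```python
from typing import Optional


def expected_x_ratio(fetal_fraction: float, male: bool) -> float:
    """Expected (X-specific)/(X-Y shared) signal ratio; 1.0 for a female fetus."""
    fetal_x_copies = 1 if male else 2
    maternal = 1.0 - fetal_fraction
    return (2 * maternal + fetal_x_copies * fetal_fraction) / (2 * maternal + 2 * fetal_fraction)


def call_sex(x_specific_amount: float, shared_amount: float,
             fetal_fraction: Optional[float] = None,
             threshold: Optional[float] = None) -> str:
    """Call male if the X-specific signal falls below the threshold relative to the baseline.
    Use a historical threshold when F is unknown; otherwise use the midpoint of the expectations."""
    observed = x_specific_amount / shared_amount
    if threshold is None:
        if fetal_fraction is None:
            raise ValueError("need either a fetal fraction or a historical threshold")
        threshold = (expected_x_ratio(fetal_fraction, male=False)
                     + expected_x_ratio(fetal_fraction, male=True)) / 2
    return "male" if observed < threshold else "female"


print(expected_x_ratio(0.10, male=True))          # about 0.95, i.e. 1 - F/2
print(call_sex(950, 1000, fetal_fraction=0.10))   # male
print(call_sex(995, 1000, fetal_fraction=0.10))   # female
```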

In another embodiment, one can look only at loci shared by the X and Y chromosomes, commonly referred to here as the Z chromosome. A subset of loci on the Z chromosome is typically always A on the X chromosome and always B on the Y chromosome. If SNPs from the Z chromosome are found to carry the B allele, the fetus is called male. If SNPs from the Z chromosome are found to carry only the A allele, the fetus is called female. In another embodiment, one can look at loci found only on the X chromosome. A context such as AA|B is particularly informative, because the presence of B indicates that the fetus has an X chromosome from the father. A context such as AB|B is also informative, because the expected amount of B differs between female and male fetuses: a female fetus always inherits the paternal B, whereas a male fetus does not. In another embodiment, one can look at SNPs on the Z chromosome where both the A and B alleles occur on the X and Y chromosomes, and where it is known which alleles lie on the paternal Y chromosome and which lie on the paternal X chromosome.

In one embodiment, it is possible to amplify single nucleotide positions known to differ between the X and Y copies of the homologous non-recombining (HNR) region shared by the Y and X chromosomes. Within this HNR region, the X and Y sequences are largely identical. Single nucleotide positions within this region are invariant among X chromosomes and among Y chromosomes in the population, but differ between the X and Y chromosomes. Each PCR assay can amplify sequence from a locus present on both the X and Y chromosomes. Each amplified sequence will contain a single base that can be detected by sequencing or some other method.

In one embodiment, the sex of the fetus can be determined from free-floating fetal DNA found in maternal plasma, and the method comprises some or all of the following steps: 1) designing PCR primers (conventional or mini-PCR, multiplexed if needed) that amplify X/Y-variant single nucleotide positions within the HNR region; 2) obtaining maternal plasma; 3) PCR-amplifying the targets from the maternal plasma using the HNR X/Y PCR assays; 4) sequencing the amplicons; and 5) examining the sequence data for the presence of the Y allele within one or more of the amplified sequences. The presence of the Y allele in one or more amplicons indicates a male fetus. The absence of all Y alleles from all amplicons indicates a female fetus.
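Step 5 can be sketched as a simple presence test over the sequenced amplicons. The input layout is an assumption: one observed base per read at the X/Y-variant position, grouped by amplicon. The names `call_fetal_sex` and `min_y_reads` are hypothetical. The default of one read matches the rule in the text; a higher threshold is an assumed guard against sequencing errors.

```python
from typing import Dict, List


def call_fetal_sex(reads_by_amplicon: Dict[str, List[str]],
                   y_allele: Dict[str, str],
                   min_y_reads: int = 1) -> str:
    """Call fetal sex from sequenced HNR X/Y-variant positions.

    reads_by_amplicon: base observed at the variant position in each read,
                       keyed by amplicon name
    y_allele:          the Y-specific base at that position for each amplicon
    min_y_reads:       reads required before an amplicon counts as Y-positive
    """
    y_positive = [
        name for name, bases in reads_by_amplicon.items()
        if bases.count(y_allele[name]) >= min_y_reads
    ]
    # Y allele in one or more amplicons -> male; absent from all -> female.
    return "male" if y_positive else "female"
```

Raising `min_y_reads` trades a little sensitivity at low fetal fractions for robustness against isolated miscalled bases.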

In one embodiment, targeted sequencing can be used to measure the DNA in maternal plasma and/or the parental genotypes. In one embodiment, all sequences that unambiguously derive from DNA of paternal origin can be ignored. For example, in the context AA|AB, one can count the number of A sequences and ignore all B sequences. To determine the heterozygosity rate for the above algorithm, the observed number of A sequences can be compared with the expected total number of sequences for a given probe. There are a number of ways to calculate the expected number of sequences for each probe on a per-sample basis. In one embodiment, historical data can be used to determine what fraction of all sequence reads belongs to each specific probe, and this empirical fraction, together with the total number of sequence reads, can then be used to estimate the number of sequences at each probe. Another approach is to target some known homozygous alleles and use historical data to relate the number of reads at each probe to the number of reads at the known homozygous alleles. For each sample, the number of reads at the homozygous alleles can then be measured, and this measurement, together with the empirically derived relationship, can be used to estimate the number of sequence reads at each probe.
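The two per-probe estimates described above reduce to simple scalings. The sketch below assumes that historical per-probe fractions and per-probe ratios to the homozygous reference alleles have already been estimated. All function names are hypothetical. In an AA|AB context, the final ratio should sit near 1 if the fetus is AA and below 1 if the fetus is AB.

```python
from typing import Dict


def expected_by_fraction(total_reads: int,
                         historical_fraction: Dict[str, float]) -> Dict[str, float]:
    """Approach 1: each probe's historical share of all reads x this sample's total."""
    return {probe: frac * total_reads for probe, frac in historical_fraction.items()}


def expected_by_homozygous_ref(hom_reads: float,
                               ratio_to_hom: Dict[str, float]) -> Dict[str, float]:
    """Approach 2: reads at known homozygous alleles in this sample, scaled by the
    historical ratio (reads at the probe / reads at the homozygous alleles)."""
    return {probe: ratio * hom_reads for probe, ratio in ratio_to_hom.items()}


def a_fraction(observed_a: int, expected_total: float) -> float:
    """AA|AB context: count only A reads and compare them with the expected total."""
    return observed_a / expected_total
```

Approach 2 needs a few extra targets, but in exchange it is less sensitive to sample-to-sample variation in the total read count.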

In some embodiments, the sex of the fetus may be determined by combining the predictions of multiple methods. In some embodiments, the multiple methods are all taken from the methods described in this disclosure. In some embodiments, at least one of the multiple methods is taken from the methods described herein.
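The text does not specify how the predictions are combined. One common option is sketched below, and it is an assumption rather than the disclosed method. It adds the per-method log-odds, which is valid only if the methods are conditionally independent and each computed its posterior with the same prior. Each input probability must lie strictly between 0 and 1.

```python
import math
from typing import Iterable


def combine_sex_predictions(p_male: Iterable[float], prior_male: float = 0.5) -> float:
    """Combine per-method posterior P(male) values into a single posterior.

    Naive-Bayes-style rule: assumes conditionally independent methods that
    share the same prior, so each contributes (its log-odds - prior log-odds).
    """
    prior_logit = math.log(prior_male / (1 - prior_male))
    logit = prior_logit
    for p in p_male:
        logit += math.log(p / (1 - p)) - prior_logit
    return 1 / (1 + math.exp(-logit))
```

For example, two independent methods that each report P(male) = 0.9 combine to 81/82, about 0.988. If one method reports 0.9 and the other 0.1, the result is 0.5.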

In some embodiments, the methods described herein can be used to determine the ploidy state of a gestating fetus. In one embodiment, the ploidy calling method uses loci that are specific to the X chromosome or shared by the X and Y chromosomes, but does not use any Y-specific loci. In one embodiment, the ploidy calling method uses one or more of the following: loci specific to the X chromosome, loci shared by the X and Y chromosomes, and loci specific to the Y chromosome. In one embodiment, where the sex chromosome ratios are similar, as for 45,X (Turner syndrome), 46,XX (normal female), and 47,XXX (trisomy X), the cases can be distinguished by comparing the observed allele distribution with the allele distributions expected under the various hypotheses. In another embodiment, they can be distinguished by comparing the number of sequence reads on the sex chromosomes with the number on one or more reference chromosomes that are assumed to be euploid. These methods can also be extended to include other aneuploid cases.
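The read-count comparison against a euploid reference chromosome can be sketched as a binomial likelihood over the fetal X copy number. The assumptions are a 46,XX mother, a fetal fraction `f` in genome equivalents, and a baseline share `p_euploid` of (X + reference) reads falling on X in 46,XX pregnancies, for example estimated from historical samples. All names are hypothetical.

```python
import math
from typing import Dict

# Hypotheses differ only in the number of fetal X copies (no Y chromosome present).
HYPOTHESES: Dict[str, int] = {"45,X": 1, "46,XX": 2, "47,XXX": 3}


def call_x_ploidy(x_reads: int, ref_reads: int, f: float, p_euploid: float) -> str:
    """Pick the X-copy hypothesis that best explains X vs. reference read counts."""
    n = x_reads + ref_reads
    best, best_ll = "", -math.inf
    for name, fetal_x in HYPOTHESES.items():
        # X dosage relative to a two-copy reference: maternal 2 copies, fetal fetal_x.
        dosage = ((1 - f) * 2 + f * fetal_x) / 2
        # Expected share of reads on X once the X dosage is rescaled.
        p = p_euploid * dosage / (p_euploid * dosage + (1 - p_euploid))
        # Binomial log-likelihood, constant term omitted.
        ll = x_reads * math.log(p) + (n - x_reads) * math.log(1 - p)
        if ll > best_ll:
            best, best_ll = name, ll
    return best
```

With `f = 0.2` and `p_euploid = 0.5`, the expected X shares are about 0.474, 0.5, and 0.524 for 45,X, 46,XX, and 47,XXX. This shows how small the differences are, and why deep counts or allele-distribution information help.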

Single Gene Disease Screening

In one embodiment, the method for determining the ploidy state of a fetus can be extended to allow simultaneous testing for single-gene (monogenic) disorders. Single-gene disease diagnosis uses the same targeted approach as aneuploidy testing and requires additional, disease-specific targets. In one embodiment, single-gene non-invasive prenatal diagnosis (NPD) is performed by linkage analysis. In many cases, direct testing of the cell-free DNA (cfDNA) sample is unreliable, because the presence of maternal DNA makes it nearly impossible to determine whether the fetus has inherited the mother's mutation. Detecting a uniquely paternal allele is less challenging, but it is fully informative only if the disease is dominant and carried by the father, which limits the utility of that approach. In one embodiment, the method involves PCR or a related amplification method.

In some embodiments, the method involves using information from first-degree relatives to phase the abnormal allele in each parent with respect to very tightly linked SNPs. Parental Support can then be run on the targeted sequencing data from these SNPs to determine which homolog (normal or abnormal) the fetus has inherited from each parent. As long as the SNPs are sufficiently tightly linked, the fetal genotype can be determined very reliably. In some embodiments, the method comprises: (a) adding to the multiplex pool used for aneuploidy testing a set of SNP loci that densely flank a specified set of common disease loci; (b) reliably phasing the alleles at these added SNPs with the normal and abnormal disease alleles, based on genetic data from various relatives; and (c) reconstructing the fetal haplotypes, i.e. the sets of phased SNP alleles on the inherited maternal and paternal homologs in the region around the disease locus, to determine the fetal genotype. In some embodiments, additional probes tightly linked to disease-linked loci are added to the set of polymorphic loci used for aneuploidy testing.
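A simplified illustration of the linkage idea follows; it is not Parental Support. It asks which maternal homolog the fetus inherited, using plasma allele counts at phased SNPs flanking the disease locus. The sketch assumes that the mother is heterozygous and phased at every SNP, that the father is homozygous, and that no recombination occurs within the block. It also assumes a known fetal fraction `f` and independent binomial read counts. If the phasing step placed the disease allele on `hap1`, a `hap1` result implies the fetus carries the maternal disease allele. Names are hypothetical.

```python
import math
from typing import List, Tuple

# (maternal hap1 allele, maternal hap2 allele, paternal allele, A reads, B reads)
Snp = Tuple[str, str, str, int, int]


def b_fraction(maternal: Tuple[str, str], fetal: Tuple[str, str], f: float) -> float:
    """Expected B-read fraction at a two-copy locus when a fraction f is fetal."""
    return ((1 - f) * maternal.count("B") + f * fetal.count("B")) / 2


def inherited_maternal_haplotype(snps: List[Snp], f: float) -> str:
    """Return 'hap1' or 'hap2', the maternal homolog the fetus most likely carries."""
    loglik = {"hap1": 0.0, "hap2": 0.0}
    for h1, h2, pat, a, b in snps:
        for hap, allele in (("hap1", h1), ("hap2", h2)):
            p = b_fraction((h1, h2), (allele, pat), f)
            p = min(max(p, 1e-9), 1 - 1e-9)  # guard against log(0)
            loglik[hap] += b * math.log(p) + a * math.log(1 - p)
    return max(loglik, key=loglik.get)
```

With `f = 0.1` and a father who is AA, inheriting the maternal B homolog predicts a B fraction of 0.5, while inheriting the A homolog predicts 0.45. Pooling many linked SNPs separates these two hypotheses even though no single SNP does.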

Reconstructing the fetal diplotype is challenging because the sample is a mixture of maternal and fetal DNA. In some embodiments, the method incorporates the phase relationships between the SNPs and the disease alleles. It then takes into account the physical distances between the SNPs, recombination data on location-specific recombination probabilities, and the data observed in the genetic measurements of maternal plasma, in order to obtain the most likely fetal genotype.
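One standard way to turn the physical distance between two SNPs into a recombination probability is Haldane's map function, r = 0.5(1 - exp(-2d)), with d in Morgans. The sketch below uses it as an illustrative stand-in for the location-specific recombination data mentioned above. The default rate of 1 cM per Mb is a rough genome-wide average and is an assumption. A position-specific rate from a recombination map would replace it in practice.

```python
import math


def recombination_probability(distance_bp: int, cm_per_mb: float = 1.0) -> float:
    """Haldane recombination fraction between two loci distance_bp apart.

    cm_per_mb is an assumed local recombination rate (default: a rough
    genome-wide average); the result approaches 0.5 for distant loci.
    """
    morgans = distance_bp / 1e6 * cm_per_mb / 100
    return 0.5 * (1 - math.exp(-2 * morgans))
```

At 1 cM per Mb, SNPs 1 Mb apart recombine about 1% of the time. This is why densely flanking SNPs make linkage-based genotype inference reliable.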

In one embodiment, multiple additional probes per disease-linked locus are included in the set of targeted polymorphic loci. The number of additional probes per disease-linked locus may be between 4 and 10, between 11 and 20, between 21 and 40, between 41 and 60, between 61 and 80, or a combination thereof.

Phasing diploid data from the parents can be challenging, and there are a number of ways to accomplish it. Some are discussed in this disclosure; others are described in more detail in other publications (see, e.g., PCT Publication No. WO2009105531, filed February 9, 2009, and PCT Publication No. WO2010017214, filed August 4, 2009, each of which is hereby incorporated by reference in its entirety). In one embodiment, a parent can be phased by inference from measurements of haploid tissue from that parent, e.g. one or more sperm or eggs. In one embodiment, a parent can be phased by inference using measured genotype data from first-degree relatives, e.g. the parent's own parents or siblings. In one embodiment, a parent can be phased by dilution. The DNA is diluted in one or more wells to the point where no more than about one copy of each haplotype is expected per well, and the DNA in one or more of the wells is then measured. In one embodiment, parental genotypes can be phased with a computer program that uses population-based haplotype frequencies to infer the most likely phase. In one embodiment, a parent can be phased if the phased haplotype data of the other parent and the unphased genetic data of one or more genetic offspring of the parents are known. In some embodiments, the genetic offspring may be one or more embryos, fetuses, and/or born children. Some of these methods, and other methods for phasing one or both parents, are disclosed in more detail in, for example, U.S. Publication No. 2011/0033862, filed August 19, 2010; U.S. Publication No. 2011/0178719, filed February 3, 2011; U.S. Publication No. 2007/0184467, filed November 22, 2006; and U.S. Publication No. 2008/0243398, filed March 17, 2008, each of which is hereby incorporated by reference in its entirety.
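Phasing a parent from the measured genotype of one of that parent's own parents follows directly from Mendelian transmission. At each heterozygous SNP, the allele the grandparent could have transmitted is assigned to the grandparent-derived haplotype. The sketch below handles a single SNP; the function name and two-letter genotype encoding are hypothetical. Repeated over many SNPs, it yields the grandparent-derived and other haplotypes wherever the grandparent is informative.

```python
from typing import Optional, Tuple


def phase_from_parent(child: str, parent: str) -> Optional[Tuple[str, str]]:
    """Phase one SNP of a person ("child") using one of that person's parents.

    child, parent: two-letter genotypes such as "AB" or "AA".
    Returns (allele from this parent, allele from the other parent), or None
    if the parent's genotype does not resolve the phase at this SNP.
    """
    a1, a2 = child
    if a1 == a2:
        return (a1, a2)        # homozygous: the phase is trivial
    if a1 in parent and a2 not in parent:
        return (a1, a2)
    if a2 in parent and a1 not in parent:
        return (a2, a1)
    return None                # e.g. the parent is also heterozygous "AB"
```

SNPs where the grandparent is also heterozygous return `None`. These gaps can be filled by measuring a second relative or by population-based inference, as listed above.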

Fetal Genome Reconstruction

In one aspect, the invention features methods for determining the haplotype of a fetus. In various embodiments, such a method determines which alleles at polymorphic loci (e.g., SNPs) the fetus has inherited, and reconstructs which homologs, including recombination events, are present in the fetus. From these, the sequence lying between the polymorphic loci can be inferred. If desired, essentially the entire genome of the fetus can be reconstructed. If some uncertainty remains in the fetal genome, for example in the exact interval in which a crossover occurred, it can be reduced if desired by analyzing additional polymorphic loci. In various embodiments, the polymorphic loci are selected to cover one or more chromosomes at a density that reduces any uncertainty to the desired level. This method is well suited to detecting relevant polymorphisms or other mutations in the fetus, because detection is based on linkage (e.g., the presence of linked polymorphic loci in the fetal genome) rather than on direct detection of the polymorphism or mutation in the fetal genome. For example, if a parent carries a mutation associated with cystic fibrosis (CF), a nucleic acid sample containing maternal DNA from the mother and fetal DNA from the fetus can be analyzed to determine whether the fetal DNA includes the haplotype containing the CF mutation. In particular, polymorphic loci can be analyzed to determine whether the fetal DNA includes the haplotype containing the CF mutation, without having to detect the CF mutation itself in the fetal DNA.

In some embodiments, the methods involve determining a parental haplotype (e.g., the haplotype of the mother or father of the fetus). In some embodiments, this determination is made without using data from relatives of the mother or father. In some embodiments, the parental haplotype is determined using a dilution method, as described herein and elsewhere (see, e.g., U.S. Publication No. 2011/0033862, filed August 19, 2010, which is hereby incorporated by reference in its entirety), followed by SNP genotyping or sequencing. Because the DNA is diluted, it is unlikely that more than one haplotype will be present in the same fraction (or tube). A tube may therefore effectively contain a single DNA molecule, which allows the haplotype on that single molecule to be determined. In some embodiments, the method includes dividing the DNA sample into multiple fractions such that at least one fraction contains one chromosome, or one chromosome segment, from a pair of chromosomes. The DNA in at least one of the fractions is then genotyped (e.g., the presence of two or more polymorphic loci is determined), thereby determining the parental haplotype. In some embodiments, genotyping involves sequencing (e.g., shotgun sequencing). In some embodiments, genotyping involves using a SNP array to detect polymorphic loci, e.g., at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different polymorphic loci. In some embodiments, genotyping involves multiplex PCR. In some embodiments, the method involves contacting a portion of the sample with a library of primers that simultaneously hybridize to at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different polymorphic loci (e.g., SNPs) to produce a reaction mixture. The reaction mixture is then subjected to primer extension reaction conditions to produce amplification products, which are measured with a high-throughput sequencer to produce sequencing data.
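How dilute "dilute enough" is can be estimated with a Poisson model. This model is an assumption, not something the text specifies. If each homolog's copy number per well is independently Poisson with mean λ, then among wells that contain the region, the fraction holding both homologs is (1 - e^-λ)² / (1 - e^-2λ). This expression simplifies to tanh(λ/2). The function name is hypothetical, and λ must be positive.

```python
import math


def co_occurrence_rate(copies_per_well: float) -> float:
    """Among wells containing a given region, the fraction that hold both homologs.

    Assumes each homolog's copy number per well is an independent Poisson
    variable with mean copies_per_well (> 0).
    """
    p_present = 1 - math.exp(-copies_per_well)    # P(at least one copy of a homolog)
    p_both = p_present ** 2                       # both homologs present
    p_any = 1 - math.exp(-2 * copies_per_well)    # at least one of the two present
    return p_both / p_any                         # equals tanh(copies_per_well / 2)
```

At λ = 0.1 copies per homolog per well, only about 5% of informative wells mix the two haplotypes. At λ = 1 the figure rises to about 46%, which illustrates why the dilution must be strong.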

In some embodiments, the mother's haplotype is determined using data from the mother's relatives, by any of the methods described herein. In some embodiments, the father's haplotype is determined using data from the father's relatives, by any of the methods described herein. In some embodiments, the haplotypes of both the father and the mother are determined. In some embodiments, a SNP array is used to determine the presence of at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different polymorphic loci in DNA samples from the mother (or father) and from relatives of the mother (or father). In some embodiments, the method involves contacting DNA samples from the mother (or father) and/or relatives of the mother (or father) with a library of primers that simultaneously hybridize to at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different polymorphic loci (e.g., SNPs) to produce a reaction mixture. The reaction mixture is then subjected to primer extension reaction conditions to produce amplification products, which are measured with a high-throughput sequencer to produce sequencing data. Parental haplotypes can be determined from the SNP array or sequencing data. In some embodiments, the parental data may be phased by methods described or referenced elsewhere in this document.

This parental haplotype data can be used to determine whether the fetus has inherited a given parental haplotype. In some embodiments, a nucleic acid sample that includes maternal DNA from the mother of the fetus and fetal DNA from the fetus is analyzed using a SNP array to detect at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different polymorphic loci. In some embodiments, such a sample is analyzed by contacting it with a library of primers that simultaneously hybridize to at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different polymorphic loci (e.g., SNPs) to produce a reaction mixture. In some embodiments, the reaction mixture is subjected to primer extension reaction conditions to produce amplification products. In some embodiments, the amplification products are measured with a high-throughput sequencer to produce sequencing data. In various embodiments, the SNP array or sequencing data is used to determine the parental haplotypes. This is done by modeling the correlation between polymorphic alleles on a chromosome using data on the probability of chromosomal crossovers at different locations along the chromosome (e.g., by using recombination data, such as can be found in the HapMap database, to create a recombination risk score for any interval). In some embodiments, allele counts at the polymorphic loci are calculated on a computer from the sequencing data. In some embodiments, the ploidy state of the fetus is called as follows, with each step performed on a computer:
1) a plurality of ploidy hypotheses is created, each concerning a different possible ploidy state of the chromosome;
2) for each ploidy hypothesis, a model (e.g., a joint distribution model) of the expected allele counts at the polymorphic loci on the chromosome is built;
3) the relative probability of each ploidy hypothesis is determined using the joint distribution model and the allele counts; and
4) the ploidy state of the fetus is called by selecting the ploidy state corresponding to the hypothesis with the greatest probability.
In some embodiments, the steps of building the joint distribution model for the allele counts and determining the relative probability of each hypothesis are performed using a method that does not require a reference chromosome.
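The hypothesis-testing steps can be sketched for one informative SNP class, SNPs where the mother is AA and the father is BB. Under disomy the fetus is AB; under maternal trisomy it is AAB; under paternal trisomy it is ABB. The sketch is a deliberate simplification of the joint distribution model described above. It treats SNPs as independent binomial draws, without the linkage and crossover modeling, and uses a uniform prior with a known fetal fraction `f` in genome equivalents. Like the text, it needs no reference chromosome. All names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def hypothesis_b_fractions(f: float) -> Dict[str, float]:
    """Expected B-read fraction at SNPs where the mother is AA and the father is BB."""
    return {
        "disomy": f / 2,                       # fetus AB: 2 fetal copies
        "maternal trisomy": f / (2 + f),       # fetus AAB: 3 fetal copies
        "paternal trisomy": 2 * f / (2 + f),   # fetus ABB: 3 fetal copies
    }


def ploidy_posteriors(counts: List[Tuple[int, int]], f: float) -> Dict[str, float]:
    """Relative probability of each hypothesis from per-SNP (A reads, B reads)."""
    loglik = {
        name: sum(b * math.log(p) + a * math.log(1 - p) for a, b in counts)
        for name, p in hypothesis_b_fractions(f).items()
    }
    top = max(loglik.values())
    weights = {k: math.exp(v - top) for k, v in loglik.items()}  # log-sum-exp trick
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def call_ploidy(counts: List[Tuple[int, int]], f: float) -> str:
    """Select the hypothesis with the greatest relative probability."""
    post = ploidy_posteriors(counts, f)
    return max(post, key=post.get)
```

At `f = 0.1`, disomy and maternal trisomy predict B fractions of 0.05 and about 0.048. That small gap is why many SNPs and deep counts are needed to separate them, whereas paternal trisomy (about 0.095) stands out readily.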

In some embodiments, the fetal haplotype is determined for one or more chromosomes selected from the group consisting of chromosomes 13, 18, 21, X, and Y. In some embodiments, fetal haplotypes are determined for all fetal chromosomes. In various embodiments, the method determines substantially the entire genome of the fetus. In some embodiments, the haplotype of at least 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 95% of the fetal genome is determined. In some embodiments, the fetal haplotype determination includes information about which allele is present at each of at least 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different polymorphic loci.

Compositions of DNA

When performing informatics analysis on sequencing data measured on DNA from a mixture of fetal and maternal blood in order to determine genomic information about the fetus, such as its ploidy state, it can be advantageous to measure the allele distribution at a set of alleles. Unfortunately, in many cases the amount of available DNA is insufficient to measure the allele distribution in the mixture directly with good fidelity. One such case is determining the ploidy state of a fetus from the DNA mixture found in the plasma of a maternal blood sample. In these cases, amplifying the DNA mixture provides enough DNA molecules for the desired allele distribution to be measured with good fidelity. However, the amplification methods commonly used to prepare DNA for sequencing are often highly biased, meaning that they do not amplify both alleles of a polymorphic locus by the same amount. Biased amplification can produce an allele distribution very different from that of the original mixture. For most purposes, highly accurate measurement of the relative amounts of the alleles present at polymorphic loci is not required. In contrast, in one embodiment of the invention, an amplification or enrichment method that specifically enriches polymorphic alleles while preserving allele ratios is advantageous.
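A short calculation shows why even modest amplification bias matters here. The sketch assumes an idealized exponential PCR model in which each cycle multiplies an allele's copies by (1 + efficiency). Any per-cycle difference between the two alleles therefore compounds geometrically. The efficiencies used below are illustrative, not measured values, and the function names are hypothetical.

```python
def allele_ratio_after_pcr(initial_ratio: float, eff_a: float, eff_b: float,
                           cycles: int) -> float:
    """A:B copy ratio after PCR, given per-cycle efficiencies in [0, 1].

    Idealized exponential model: each cycle multiplies copies by (1 + efficiency).
    """
    return initial_ratio * ((1 + eff_a) / (1 + eff_b)) ** cycles


def ratio_to_b_fraction(ratio_a_to_b: float) -> float:
    """Convert an A:B ratio into the fraction of B molecules."""
    return 1 / (1 + ratio_a_to_b)
```

Starting from a 1:1 mixture, efficiencies of 95% for A and 90% for B give a ratio of about 1.68:1 after 20 cycles. The apparent B fraction drops from 0.5 to roughly 0.37, a shift far larger than the few-percent signals a fetal ploidy call relies on.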

Described herein are a number of methods that can be used to preferentially enrich a DNA sample at multiple loci in a way that minimizes allelic bias. One example uses circularizing probes to target multiple loci. The 3' and 5' ends of the pre-circularized probe are designed to hybridize to bases one or a few positions away from the polymorphic site of the targeted allele. Another example uses PCR probes whose 3' end is designed to hybridize to a base one or a few positions away from the polymorphic site of the targeted allele. Another example uses a split-and-pool approach to create DNA mixtures in which the preferentially enriched loci are enriched with low allelic bias, without the drawbacks of direct multiplexing. Another example uses hybrid capture. The capture probes are designed so that the region of each probe that hybridizes to DNA flanking the target's polymorphic site is separated from that site by one or a small number of bases.
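The PCR-probe example can be sketched as a primer-placement rule. The 3'-terminal base is placed a chosen number of positions 5' of the SNP, so the polymorphic base is never under the primer and both alleles anneal identically. The function, its parameters, and the toy sequence are hypothetical. Real primer design would also check melting temperature, specificity, and dimer formation, none of which is modeled here.

```python
def forward_primer(reference: str, snp_index: int, length: int = 20,
                   offset: int = 1) -> str:
    """Return a primer whose 3'-terminal base lies `offset` positions 5' of the SNP.

    snp_index is 0-based on the forward strand. offset=1 puts the 3' end on the
    base immediately before the polymorphic site.
    """
    if offset < 1:
        raise ValueError("offset must be >= 1 so the primer does not cover the SNP")
    end = snp_index - offset + 1      # slice end (exclusive)
    start = end - length
    if start < 0:
        raise ValueError("not enough upstream sequence for a primer of this length")
    return reference[start:end]
```

For the toy sequence `"ACGTACGTACGTTTT"` with the SNP at index 10, `forward_primer(seq, 10, length=5)` returns `"CGTAC"`, ending immediately before the SNP. The polymerase then copies the polymorphic base during extension.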

Where the allele distribution measured at a set of polymorphic loci is used to determine the ploidy state of an individual, it is desirable to preserve the relative amounts of the alleles while the DNA sample is prepared for genetic measurement. This preparation may involve whole-genome amplification (WGA), targeted amplification, selective enrichment techniques, hybrid capture techniques, circularizing probes, or other methods intended to amplify the amount of DNA and/or to selectively enhance the presence of DNA molecules corresponding to certain alleles.

In some embodiments of the invention, there is a set of DNA probes designed to target loci that have the greatest minor allele frequencies. In some embodiments, there is a set of probes designed to target loci at which the fetus has the greatest likelihood of having highly informative SNPs. In some embodiments, there is a set of probes designed to target loci, where the probes are optimized for a given population subgroup. In some embodiments, the probes are optimized for a given mix of population subgroups. In some embodiments, the probes are optimized for a given pair of parents who come from different population subgroups with different minor allele frequency profiles. In some embodiments, there is a circularized strand of DNA comprising at least one base pair annealed to a piece of DNA of fetal origin. In some embodiments, there is a circularized strand of DNA comprising at least one base pair annealed to a piece of DNA of placental origin. In some embodiments, there is a circularized strand of DNA in which at least some of the nucleotides are annealed to DNA of fetal origin. In some embodiments, there is a circularized strand of DNA in which at least some of the nucleotides are annealed to DNA of placental origin. In some embodiments, there is a set of probes, some of which target single tandem repeats and some of which target single nucleotide polymorphisms. In some embodiments, the loci are selected for non-invasive prenatal diagnosis. In some embodiments, the probes are used for non-invasive prenatal diagnosis. In some embodiments, the loci are targeted using methods that may include circularizing probes, molecular inversion probes (MIPs), capture by hybridization probes, probes on SNP arrays, or combinations thereof. In some embodiments, the probes are used as circularizing probes, MIPs, capture by hybridization probes, probes on SNP arrays, or combinations thereof. In some embodiments, the loci are sequenced for non-invasive prenatal diagnosis.
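Selecting loci with the greatest minor allele frequency for a given mix of population subgroups can be sketched as follows. The sketch assumes that per-subgroup frequencies of one allele are available for each candidate SNP and that the mixture weights sum to 1. It ranks SNPs by the minor allele frequency of the pooled population. The SNP names, subgroup labels, and functions are hypothetical.

```python
from typing import Dict, List


def pooled_maf(freqs: Dict[str, float], mix: Dict[str, float]) -> float:
    """Minor allele frequency of one SNP in a weighted mix of population subgroups.

    freqs: frequency of one allele in each subgroup
    mix:   subgroup weights (assumed to sum to 1)
    """
    p = sum(weight * freqs[group] for group, weight in mix.items())
    return min(p, 1 - p)


def select_loci(snp_freqs: Dict[str, Dict[str, float]],
                mix: Dict[str, float], n: int) -> List[str]:
    """Rank candidate SNPs by pooled minor allele frequency and keep the top n."""
    ranked = sorted(snp_freqs, key=lambda s: pooled_maf(snp_freqs[s], mix), reverse=True)
    return ranked[:n]
```

Changing `mix` re-ranks the same candidates. This is one simple way a probe set could be tuned to a single subgroup, a mixture, or a couple whose parents come from different subgroups.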

Where a sequence is relatively more informative when combined with the relevant parental context, it follows that maximizing the number of sequence reads that contain SNPs with known parental context can maximize the informativeness of the set of sequencing reads from the mixed sample. In one embodiment, the number of sequence reads containing SNPs with known parental context can be increased by preferentially amplifying specific sequences using qPCR. In one embodiment, this number can be increased by preferentially amplifying specific sequences using circularizing probes (e.g., MIPs). In one embodiment, this number can be increased by preferentially amplifying specific sequences using capture-by-hybridization methods (e.g., SureSelect). Different methods can be used to increase the number of sequence reads containing SNPs with known parental context. In one embodiment, targeting can be achieved by extension ligation, ligation without extension, capture by hybridization, or PCR.

In a sample of fragmented genomic DNA, some DNA sequences map uniquely to a single chromosome, while other DNA sequences may be found on several different chromosomes. Note that DNA found in plasma, whether of maternal or fetal origin, is typically fragmented, usually to lengths of less than 500 bp. In a typical genomic sample, approximately 3.3% of mappable sequences map to chromosome 13, 2.2% to chromosome 18, and 1.35% to chromosome 21. In a female, 4.5% map to the X chromosome; in a male, 2.25% map to the X chromosome and 0.73% to the Y chromosome. These are the chromosomes most likely to be aneuploid in a fetus. In addition, about 1 in 20 short sequences will contain a SNP, counting the SNPs listed in dbSNP. Since many SNPs may not yet have been discovered, the true proportion is likely higher.
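The percentages above translate directly into expected read counts for untargeted sequencing. The sketch hard-codes the quoted shares and the approximate 1-in-20 SNP rate; the dictionary and function names are hypothetical. It shows how few reads land on, say, chromosome 21 SNPs without targeting.

```python
from typing import Tuple

# Approximate share of mappable reads per chromosome, as quoted in the text.
MAPPABLE_SHARE = {
    "chr13": 0.033,
    "chr18": 0.022,
    "chr21": 0.0135,
    "chrX (female)": 0.045,
    "chrX (male)": 0.0225,
    "chrY (male)": 0.0073,
}
SNP_READ_RATE = 1 / 20   # roughly 1 in 20 short reads overlaps a dbSNP SNP


def expected_reads(total_mappable: int, chromosome: str) -> Tuple[float, float]:
    """(reads on the chromosome, of which overlap a known SNP), untargeted sequencing."""
    on_chrom = total_mappable * MAPPABLE_SHARE[chromosome]
    return on_chrom, on_chrom * SNP_READ_RATE
```

For 10 million mappable reads, about 135,000 fall on chromosome 21, and only about 6,750 of those overlap a known SNP. Targeting aims to raise both fractions well above these baselines.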

In one embodiment of the invention, targeting methods can be used to increase the fraction of DNA in a DNA sample that maps to a given chromosome, so that the fraction significantly exceeds the typical percentages for genomic samples listed above. In one embodiment, targeting methods can be used to increase the fraction of DNA in a DNA sample such that the percentage of sequences containing a SNP is significantly greater than is normally found in a genomic sample. In one embodiment, for prenatal diagnosis, targeting methods can be used to target DNA from a chromosome, or from a set of SNPs, in a mixture of maternal and fetal DNA.

Note that a method for determining fetal aneuploidy has been reported (US Patent 7,888,017) that counts the number of reads mapping to a suspect chromosome, compares it with the number of reads mapping to a reference chromosome, and takes an excess of reads on the suspect chromosome as corresponding to the hypothesis that the fetus is trisomic for that chromosome. Those methods for prenatal diagnosis do not use any form of targeting, nor do they describe the use of targeting for prenatal diagnosis.

By using targeting methods when sequencing mixed samples, it may be possible to achieve a given level of accuracy with fewer sequence reads. Accuracy can refer to sensitivity, to specificity, or to some combination of the two. The desired level of accuracy can be between 90% and 95%; between 95% and 98%; between 98% and 99%; between 99% and 99.5%; between 99.5% and 99.9%; between 99.9% and 99.99%; between 99.99% and 99.999%; or between 99.999% and 100%. An accuracy level above 95% can be called high accuracy.
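Because "accuracy" may mean sensitivity, specificity, or a combination, it can help to compute each from the same set of calls. The sketch below uses overall fraction correct as one possible combination; that choice is an assumption, and the class and field names are illustrative. The 95% "high accuracy" threshold comes from the text.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    true_positive: int   # aneuploid samples called aneuploid
    false_negative: int  # aneuploid samples called euploid
    true_negative: int   # euploid samples called euploid
    false_positive: int  # euploid samples called aneuploid

    def sensitivity(self) -> float:
        return self.true_positive / (self.true_positive + self.false_negative)

    def specificity(self) -> float:
        return self.true_negative / (self.true_negative + self.false_positive)

    def accuracy(self) -> float:
        """Overall fraction of correct calls (one way to combine the two)."""
        correct = self.true_positive + self.true_negative
        total = correct + self.false_positive + self.false_negative
        return correct / total


def is_high_accuracy(value: float) -> bool:
    """The text calls an accuracy level above 95% 'high accuracy'."""
    return value > 0.95


counts = ConfusionCounts(true_positive=49, false_negative=1, true_negative=948, false_positive=2)
print(counts.sensitivity(), counts.specificity(), counts.accuracy())
```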

The prior art includes various published methods showing how the ploidy state of a fetus can be determined from a mixed sample of maternal and fetal DNA, for example G.J.W. Liao et al., Clin. Chem. 2011; 57(1): 92-101. These methods focus on thousands of positions along each chromosome. Unexpectedly, the number of positions along a chromosome that need to be targeted, for a given number of sequence reads, while still yielding highly accurate ploidy determinations of a fetus from a mixed DNA sample is low. In one embodiment of the invention, accurate ploidy determination can be made using targeted sequencing with any targeting method, such as qPCR, ligation-mediated PCR, other PCR methods, capture by hybridization, or circularizing probes, wherein the number of loci that need to be targeted along the chromosome can be between 5,000 and 2,000; between 2,000 and 1,000; between 1,000 and 500; between 500 and 300; between 300 and 200; between 200 and 150; between 150 and 100; between 100 and 50; between 50 and 20; or between 20 and 10 loci. Optimally, it may be between 100 and 500 loci. A high level of accuracy can be obtained by targeting a small number of loci and generating an unexpectedly small number of sequence reads. The number of reads can be between 100 million and 50 million; between 50 million and 20 million; between 20 million and 10 million; between 10 million and 5 million; between 5 million and 2 million; between 2 million and 1 million; between 1 million and 500,000; between 500,000 and 200,000; between 200,000 and 100,000; between 100,000 and 50,000; between 50,000 and 20,000; or between 20,000 and 10,000; or it can be fewer than 10,000. Fewer reads are required when larger amounts of input DNA are used.
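One intuition for why a few hundred loci and a modest read count can suffice: when reads are concentrated on a small number of targets, each locus is sequenced deeply, and the sampling error of an allele-fraction estimate falls with depth. The sketch below is only an illustration under simplifying assumptions: coverage is perfectly even, and binomial sampling is the only noise source. `on_target_rate` is a hypothetical parameter.

```python
import math


def mean_depth_per_locus(total_reads: int, n_loci: int, on_target_rate: float = 1.0) -> float:
    """Average reads per targeted locus, assuming reads are spread evenly.

    `on_target_rate` is a hypothetical share of reads that land on targets;
    real libraries are never perfectly uniform.
    """
    return total_reads * on_target_rate / n_loci


def allele_fraction_se(depth: float, allele_fraction: float) -> float:
    """Binomial standard error of an allele-fraction estimate at a given depth."""
    return math.sqrt(allele_fraction * (1 - allele_fraction) / depth)


for loci in (100, 500, 2000):
    depth = mean_depth_per_locus(1_000_000, loci)
    print(f"{loci:>5} loci: {depth:>8.0f} reads/locus, SE {allele_fraction_se(depth, 0.5):.4f}")
```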

In some embodiments, there is a composition comprising a mixture of DNA of fetal origin and DNA of maternal origin, wherein the percentage of sequences that uniquely map to chromosome 13 is greater than 4%, greater than 5%, greater than 6%, greater than 7%, greater than 8%, greater than 9%, greater than 10%, greater than 12%, greater than 15%, greater than 20%, greater than 25%, or greater than 30%. In some embodiments of the invention, there is a composition comprising a mixture of DNA of fetal origin and DNA of maternal origin, wherein the percentage of sequences that uniquely map to chromosome 18 is greater than 3%, greater than 4%, greater than 5%, greater than 6%, greater than 7%, greater than 8%, greater than 9%, greater than 10%, greater than 12%, greater than 15%, greater than 20%, greater than 25%, or greater than 30%. In some embodiments of the invention, there is a composition comprising a mixture of DNA of fetal origin and DNA of maternal origin, wherein the percentage of sequences that uniquely map to chromosome 21 is greater than 2%, greater than 3%, greater than 4%, greater than 5%, greater than 6%, greater than 7%, greater than 8%, greater than 9%, greater than 10%, greater than 12%, greater than 15%, greater than 20%, greater than 25%, or greater than 30%. In some embodiments of the invention, there is a composition comprising a mixture of DNA of fetal origin and DNA of maternal origin, wherein the percentage of sequences that uniquely map to the X chromosome is greater than 6%, greater than 7%, greater than 8%, greater than 9%, greater than 10%, greater than 12%, greater than 15%, greater than 20%, greater than 25%, or greater than 30%. In some embodiments of the invention, there is a composition comprising a mixture of DNA of fetal origin and DNA of maternal origin, wherein the percentage of sequences that uniquely map to the Y chromosome is greater than 1%, greater than 2%, greater than 3%, greater than 4%, greater than 5%, greater than 6%, greater than 7%, greater than 8%, greater than 9%, greater than 10%, greater than 12%, greater than 15%, greater than 20%, greater than 25%, or greater than 30%.

In some embodiments, a composition is described comprising a mixture of DNA of fetal origin and DNA of maternal origin, wherein the percentage of sequences that uniquely map to a chromosome and contain at least one single nucleotide polymorphism is greater than 0.2%, greater than 0.3%, greater than 0.4%, greater than 0.5%, greater than 0.6%, greater than 0.7%, greater than 0.8%, greater than 0.9%, greater than 1%, greater than 1.2%, greater than 1.4%, greater than 1.6%, greater than 1.8%, greater than 2%, greater than 2.5%, greater than 3%, greater than 4%, greater than 5%, greater than 6%, greater than 7%, greater than 8%, greater than 9%, greater than 10%, greater than 12%, greater than 15%, or greater than 20%; and wherein the chromosome is selected from the group consisting of chromosomes 13, 18, 21, X, and Y. In some embodiments of the invention, there is a composition comprising a mixture of DNA of fetal origin and DNA of maternal origin, wherein the percentage of sequences that uniquely map to a chromosome and contain at least one single nucleotide polymorphism from a set of single nucleotide polymorphisms is greater than 0.15%, greater than 0.2%, greater than 0.3%, greater than 0.4%, greater than 0.5%, greater than 0.6%, greater than 0.7%, greater than 0.8%, greater than 0.9%, greater than 1%, greater than 1.2%, greater than 1.4%, greater than 1.6%, greater than 1.8%, greater than 2%, greater than 2.5%, greater than 3%, greater than 4%, greater than 5%, greater than 6%, greater than 7%, greater than 8%, greater than 9%, greater than 10%, greater than 12%, greater than 15%, or greater than 20%; wherein the chromosome is selected from the group consisting of chromosomes 13, 18, 21, X, and Y; and wherein the number of single nucleotide polymorphisms in the set is between 1 and 10, between 10 and 20, between 20 and 50, between 50 and 100, between 100 and 200, between 200 and 500, between 500 and 1,000, between 1,000 and 2,000, between 2,000 and 5,000, between 5,000 and 10,000, between 10,000 and 20,000, between 20,000 and 50,000, or between 50,000 and 100,000.
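Measuring the percentages in these compositions requires deciding, for each uniquely mapped read, whether it overlaps a SNP from the chosen set. The sketch below makes two assumptions: reads are given as `(chromosome, start, end)` in 0-based half-open coordinates, and SNP positions are held in sorted per-chromosome lists. A binary search with the standard `bisect` module then answers each overlap query. All names are illustrative.

```python
import bisect
from typing import Dict, List, Sequence, Tuple

# A uniquely mapped read: (chromosome, start, end), 0-based and half-open.
Read = Tuple[str, int, int]


def covers_snp(read: Read, snp_positions: Dict[str, List[int]]) -> bool:
    """True if the read overlaps at least one SNP position on its chromosome.

    `snp_positions[chrom]` must be a sorted list of 0-based positions.
    """
    chrom, start, end = read
    positions = snp_positions.get(chrom, [])
    i = bisect.bisect_left(positions, start)  # first SNP at or after `start`
    return i < len(positions) and positions[i] < end


def snp_read_percentage(reads: Sequence[Read], snp_positions: Dict[str, List[int]],
                        chromosome: str) -> float:
    """Percentage of all uniquely mapped reads on `chromosome` that cover a SNP."""
    if not reads:
        raise ValueError("no reads")
    hits = sum(1 for r in reads if r[0] == chromosome and covers_snp(r, snp_positions))
    return 100.0 * hits / len(reads)


snps = {"21": [150, 900]}
reads = [("21", 100, 200), ("21", 300, 400), ("13", 100, 200), ("21", 850, 950)]
print(snp_read_percentage(reads, snps, "21"))
```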

In theory, each amplification cycle doubles the amount of DNA present; in practice, however, the amplification factor per cycle is slightly less than two. In theory, amplification, including targeted amplification, would amplify a DNA mixture without bias; in practice, however, some alleles are often amplified to a different extent than others. When DNA is amplified, the degree of allelic bias generally increases with the number of amplification cycles. In some embodiments, the methods described herein involve amplifying DNA with a low level of allelic bias. Because allelic bias increases with each cycle, the per-cycle allelic bias can be determined by taking the nth root of the overall bias, where n is the base-2 logarithm of the degree of enrichment. In some embodiments, there is a composition comprising a second DNA mixture, wherein the second DNA mixture has been preferentially enriched from a first DNA mixture at a plurality of polymorphic loci, wherein the degree of enrichment is at least 10, at least 100, at least 1,000, at least 10,000, at least 100,000, or at least 1,000,000, and wherein the ratio of alleles at each locus in the second DNA mixture differs from the ratio of alleles at that locus in the first DNA mixture by, on average, less than 1,000%, 500%, 200%, 100%, 50%, 20%, 10%, 5%, 2%, 1%, 0.5%, 0.2%, 0.1%, 0.05%, 0.02%, or 0.01%. In some embodiments, there is a composition comprising a second DNA mixture, wherein the second DNA mixture has been preferentially enriched from a first DNA mixture at a plurality of polymorphic loci, wherein the per-cycle allelic bias across the plurality of polymorphic loci averages less than 10%, 5%, 2%, 1%, 0.5%, 0.2%, 0.1%, 0.05%, or 0.02%. In some embodiments, the plurality of polymorphic loci comprises at least 10 loci, at least 20 loci, at least 50 loci, at least 100 loci, at least 200 loci, at least 500 loci, at least 1,000 loci, at least 2,000 loci, at least 5,000 loci, at least 10,000 loci, at least 20,000 loci, or at least 50,000 loci.
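The per-cycle calculation above can be written out as follows. The sketch interprets "the nth root of the overall bias" as the nth root of the overall bias *factor* (1 + bias), so that compounding the per-cycle factor over n = log2(enrichment) cycles reproduces the overall factor. That interpretation, and the function names, are assumptions for illustration.

```python
import math
from typing import Sequence


def overall_bias(ratio_before: float, ratio_after: float) -> float:
    """Relative change in an allele ratio caused by enrichment (0.10 means 10%)."""
    return abs(ratio_after / ratio_before - 1)


def mean_overall_bias(before: Sequence[float], after: Sequence[float]) -> float:
    """Average relative change in allele ratio across loci."""
    if len(before) != len(after) or not before:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(overall_bias(b, a) for b, a in zip(before, after)) / len(before)


def per_cycle_bias(total_bias: float, enrichment: float) -> float:
    """Per-cycle bias as the n-th root of the overall bias factor.

    n = log2(enrichment) is the number of perfect doublings that would give
    the observed enrichment; the factor (1 + total_bias) is spread evenly
    over those n cycles.
    """
    n = math.log2(enrichment)
    if n <= 0:
        raise ValueError("enrichment must exceed 1")
    return (1 + total_bias) ** (1 / n) - 1


# A 1,024-fold enrichment corresponds to n = 10 doublings; a 10% overall
# bias then implies just under 1% bias per cycle.
print(f"{per_cycle_bias(0.10, 1024):.4%}")
```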

Some Embodiments

In some embodiments, disclosed herein is a method for generating a report disclosing the determined ploidy state of a gestating fetus, the method comprising: obtaining a first sample containing DNA from the mother of the fetus and DNA from the fetus; obtaining genotype data from one or both parents of the fetus; preparing the first sample by isolating the DNA so as to obtain a prepared sample; measuring the DNA in the prepared sample at a plurality of polymorphic loci; calculating, on a computer, allele counts or allele count probabilities at the plurality of polymorphic loci from the DNA measurements made on the prepared sample; creating, on a computer, a plurality of ploidy hypotheses, each concerning the expected allele count probabilities at the plurality of polymorphic loci on a chromosome for a different possible ploidy state of that chromosome; building, on a computer, for each ploidy hypothesis, a joint distribution model for the allele count probabilities at each polymorphic locus on the chromosome, using the genotype data from one or both parents of the fetus; determining, on a computer, the relative probability of each of the ploidy hypotheses using the joint distribution model and the allele count probabilities calculated for the prepared sample; calling the ploidy state of the fetus by selecting the ploidy state corresponding to the hypothesis with the greatest probability; and generating a report disclosing the determined ploidy state.
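A heavily simplified sketch of the hypothesis-scoring step is shown below. It is not the disclosed joint distribution model, and it rests on several assumptions. SNPs are treated as independent, with no linkage between loci. The fetal fraction `ff` is known. Only three hypotheses are considered. Each inherited fetal copy is drawn independently from a parent's two alleles. Read counts are binomial with a small symmetric error rate. Parental genotypes are written as strings such as `"AB"`. `HYPOTHESES`, `hypothesis_posteriors` and `call_ploidy` are hypothetical names.

```python
import itertools
import math
from typing import Dict, Sequence, Tuple

# Hypothetical hypothesis set: (fetal copies from mother, fetal copies from father).
HYPOTHESES: Dict[str, Tuple[int, int]] = {
    "disomy": (1, 1),
    "maternal trisomy": (2, 1),
    "paternal trisomy": (1, 2),
}


def b_fraction(mother_b: int, fetal_b: int, fetal_copies: int, ff: float, err: float) -> float:
    """Expected B-allele read fraction at one SNP in the maternal/fetal mixture.

    Each of the mother's 2 copies has weight (1 - ff) / 2 and each fetal copy
    weight ff / 2, so ff is the fetal fraction in a disomic region. `err` is a
    symmetric per-read error rate that keeps probabilities away from 0 and 1.
    """
    p = (mother_b * (1 - ff) + fetal_b * ff) / (2 * (1 - ff) + fetal_copies * ff)
    return p * (1 - err) + (1 - p) * err


def fetal_genotype_probs(mother: str, father: str, n_mat: int, n_pat: int) -> Dict[int, float]:
    """P(number of fetal B alleles), drawing each inherited copy independently."""
    draws = [mother] * n_mat + [father] * n_pat
    probs: Dict[int, float] = {}
    for combo in itertools.product(*draws):
        b = combo.count("B")
        probs[b] = probs.get(b, 0.0) + 0.5 ** len(draws)
    return probs


def log_binom(k: int, n: int, p: float) -> float:
    """Log binomial probability of k B-allele reads out of n."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def hypothesis_posteriors(snps: Sequence[Tuple[str, str, int, int]], ff: float,
                          err: float = 0.001) -> Dict[str, float]:
    """Posterior of each hypothesis from (mother, father, B reads, depth) per SNP.

    Treats SNPs as independent and gives every hypothesis an equal prior.
    """
    loglik: Dict[str, float] = {}
    for name, (n_mat, n_pat) in HYPOTHESES.items():
        total = 0.0
        for mother, father, b_reads, depth in snps:
            terms = [
                math.log(prob) + log_binom(
                    b_reads, depth,
                    b_fraction(mother.count("B"), fetal_b, n_mat + n_pat, ff, err))
                for fetal_b, prob in fetal_genotype_probs(mother, father, n_mat, n_pat).items()
            ]
            top = max(terms)  # log-sum-exp over the possible fetal genotypes
            total += top + math.log(sum(math.exp(t - top) for t in terms))
        loglik[name] = total
    best = max(loglik.values())
    weights = {h: math.exp(v - best) for h, v in loglik.items()}
    norm = sum(weights.values())
    return {h: w / norm for h, w in weights.items()}


def call_ploidy(posteriors: Dict[str, float]) -> str:
    """Select the hypothesis with the greatest probability."""
    return max(posteriors, key=posteriors.get)
```

The posterior of the called hypothesis doubles as a simple confidence estimate. A real model would also need to account for linkage between neighbouring SNPs and for uncertainty in the fetal fraction.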

In some embodiments, the method is used to determine the ploidy states of a plurality of gestating fetuses in a corresponding plurality of mothers, the method further comprising determining the percentage of DNA of fetal origin in each of the prepared samples; and wherein the step of measuring the DNA in the prepared samples is performed by sequencing a plurality of DNA molecules in each of the prepared samples, wherein more DNA molecules are sequenced from prepared samples with a smaller fraction of fetal DNA than from prepared samples with a larger fraction of fetal DNA.
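The text does not specify how many more reads low-fetal-fraction samples should receive. One plausible rule, assumed here rather than taken from the disclosure, is to hold signal-to-noise roughly constant. If the aneuploidy signal scales with the fetal fraction f and sampling noise with 1/sqrt(reads), then reads should scale as 1/f². The function name is hypothetical.

```python
from typing import Dict


def allocate_reads(fetal_fractions: Dict[str, float], total_reads: int) -> Dict[str, int]:
    """Split a sequencing budget so samples with less fetal DNA get more reads.

    Assumes the aneuploidy signal scales with the fetal fraction f while the
    sampling noise scales with 1/sqrt(reads); equal signal-to-noise across
    samples then requires reads proportional to 1/f**2.
    """
    if any(not 0 < f <= 1 for f in fetal_fractions.values()):
        raise ValueError("fetal fractions must be in (0, 1]")
    weights = {sample: 1.0 / f ** 2 for sample, f in fetal_fractions.items()}
    norm = sum(weights.values())
    # Rounding means the allocations may not sum exactly to total_reads.
    return {sample: round(total_reads * w / norm) for sample, w in weights.items()}


budget = allocate_reads({"sample_1": 0.20, "sample_2": 0.10, "sample_3": 0.05}, 21_000_000)
print(budget)
```

With a 21-million-read budget, samples at 20%, 10% and 5% fetal fraction receive roughly 1, 4 and 16 million reads, respectively.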

In some embodiments, the method is used to determine the ploidy states of a plurality of gestating fetuses in a corresponding plurality of mothers, wherein, for each of the fetuses, the DNA in the prepared sample is measured by sequencing a first fraction of the prepared DNA sample to obtain a first set of measurements, the method further comprising: determining, given the first set of DNA measurements, a first relative probability for each of the ploidy hypotheses for each of the fetuses; resequencing a second fraction of the prepared samples from those fetuses for which the first relative probability determination indicates that a ploidy hypothesis corresponding to an aneuploid fetus has a significant but inconclusive probability, to obtain a second set of measurements; determining a second relative probability for the ploidy hypotheses of those fetuses using the second set of measurements, and optionally also the first set of measurements; and calling the ploidy state of each fetus whose sample was resequenced by selecting the ploidy state corresponding to the hypothesis with the greatest probability, as determined by the second relative probability determination.
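The two-pass workflow amounts to a triage on the first-pass probability of aneuploidy. The sketch below assumes illustrative cut-offs of 1% and 99%; the disclosure gives no values. It also assumes that the two sequencing rounds are independent, so their log-likelihoods can simply be added when both are used. All names are hypothetical.

```python
from typing import Dict, List

# Hypothetical thresholds on the first-pass probability of aneuploidy.
CALL_EUPLOID_BELOW = 0.01
CALL_ANEUPLOID_ABOVE = 0.99


def triage(first_pass: Dict[str, float]) -> Dict[str, List[str]]:
    """Sort samples by their first-pass probability of an aneuploid hypothesis.

    Samples whose aneuploidy probability is significant but inconclusive are
    queued for resequencing; the rest are called from the first pass.
    """
    result: Dict[str, List[str]] = {"euploid": [], "aneuploid": [], "resequence": []}
    for sample, p_aneuploid in first_pass.items():
        if p_aneuploid < CALL_EUPLOID_BELOW:
            result["euploid"].append(sample)
        elif p_aneuploid > CALL_ANEUPLOID_ABOVE:
            result["aneuploid"].append(sample)
        else:
            result["resequence"].append(sample)
    return result


def combine_log_likelihoods(first: Dict[str, float], second: Dict[str, float]) -> Dict[str, float]:
    """Optionally pool both rounds: independent reads add log-likelihoods."""
    return {hypothesis: first[hypothesis] + second[hypothesis] for hypothesis in first}


print(triage({"s1": 0.001, "s2": 0.5, "s3": 0.999}))
```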

In some embodiments, a composition is disclosed comprising a preferentially enriched DNA sample, wherein the preferentially enriched DNA sample has been preferentially enriched from a first DNA sample at a plurality of polymorphic loci, wherein the first DNA sample consists of a mixture of maternal DNA and fetal DNA derived from maternal plasma, wherein the degree of enrichment is at least 2-fold, and wherein the average allelic bias between the first sample and the preferentially enriched sample is selected from the group consisting of less than 2%, less than 1%, less than 0.5%, less than 0.2%, less than 0.1%, less than 0.05%, less than 0.02%, and less than 0.01%. In some embodiments, a method of creating such a preferentially enriched DNA sample is disclosed.

In some embodiments, a method is disclosed for determining the presence or absence of fetal aneuploidy in a maternal tissue sample comprising fetal and maternal genomic DNA, wherein the method comprises: (a) obtaining a mixture of fetal and maternal genomic DNA from the maternal tissue sample; (b) selectively enriching the mixture of fetal and maternal DNA at a plurality of polymorphic alleles; (c) distributing selectively enriched fragments from the mixture of fetal and maternal genomic DNA of step a) to provide reaction samples comprising a single genomic DNA molecule or amplification products of a single genomic DNA molecule; (d) conducting massively parallel DNA sequencing of the selectively enriched genomic DNA fragments in the reaction samples of step c) to determine the sequences of the selectively enriched fragments; (e) identifying the chromosomes to which the sequences obtained in step d) belong; (f) analyzing the data of step d) to determine i) the number of genomic DNA fragments from step d) that belong to at least one first target chromosome presumed to be diploid in both the mother and the fetus, and ii) the number of genomic DNA fragments from step d) that belong to a second target chromosome, wherein the second chromosome is suspected to be aneuploid in the fetus; (g) calculating, using the number determined in step f) part i), an expected distribution of the number of genomic DNA fragments from step d) for the second target chromosome if the second target chromosome is euploid; (h) calculating, using the first number from step f) part i) and an estimated fraction of fetal DNA found in the mixture of step b), an expected distribution of the number of genomic DNA fragments from step d) for the second target chromosome if the second target chromosome is aneuploid; and (i) using a maximum likelihood or maximum a posteriori approach to determine whether the number of genomic DNA fragments determined in step f) part ii) is more likely to be part of the distribution calculated in step g) or of the distribution calculated in step h); thereby indicating the presence or absence of fetal aneuploidy.
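Steps f) through i) can be illustrated with a simple counting model, which is an assumption and not necessarily the disclosed distributions. Conditioned on the total number of reads from the two chromosomes, the target-chromosome count is treated as binomial. Under euploidy the target/reference ratio equals an expected value r estimated from euploid data (step g). Under fetal trisomy the target dosage rises by roughly a factor (1 + f/2) (step h), a first-order approximation that ignores the small change in total genome size. The prior argument lets the same function act as a maximum a posteriori rule. All names are hypothetical.

```python
import math


def log_binom(k: int, n: int, p: float) -> float:
    """Log of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def target_share(ratio: float) -> float:
    """Convert a target/reference read ratio into the target's share of both."""
    return ratio / (1 + ratio)


def call_trisomy(target_reads: int, reference_reads: int, expected_ratio: float,
                 fetal_fraction: float, prior_trisomy: float = 0.5) -> str:
    """Choose between the euploid and trisomic count distributions.

    expected_ratio: target/reference ratio expected for a euploid sample.
    With prior_trisomy = 0.5 this is a maximum likelihood call; other priors
    make it a maximum a posteriori call.
    """
    n = target_reads + reference_reads
    trisomic_ratio = expected_ratio * (1 + fetal_fraction / 2)
    score_euploid = (log_binom(target_reads, n, target_share(expected_ratio))
                     + math.log(1 - prior_trisomy))
    score_trisomy = (log_binom(target_reads, n, target_share(trisomic_ratio))
                     + math.log(prior_trisomy))
    return "trisomy" if score_trisomy > score_euploid else "euploid"


print(call_trisomy(52_500, 100_000, expected_ratio=0.5, fetal_fraction=0.1))
```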

Exemplary Cancer Diagnostic Methods

It should be noted that DNA originating from a cancer within a subject has been shown to be present in the subject's blood. Just as a genetic diagnosis can be made from measurements of the mixed DNA found in maternal blood, a genetic diagnosis can equally well be made from measurements of the mixed DNA found in a subject's blood. The genetic diagnosis can include aneuploidy states or gene mutations. Any claim of the invention that reads on determining the ploidy state or genetic state of a fetus from measurements made on maternal blood can equally well read on determining the ploidy state or genetic state of a cancer from measurements made on the subject's blood.

In some embodiments, the methods of the invention allow the ploidy state of a cancer to be determined, the method comprising: obtaining a mixed sample containing genetic material from a subject and genetic material from a cancer; measuring the DNA in the mixed sample; calculating the fraction of DNA of cancer origin in the mixed sample; and determining the ploidy state of the cancer using the measurements made on the mixed sample and the calculated fraction. In some embodiments, the method can further comprise administering a cancer treatment based on the determined ploidy state of the cancer; in some such embodiments, the cancer treatment is selected from the group comprising pharmaceuticals, biological therapies, antibody-based therapies, and combinations thereof.
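The calculated cancer fraction matters because the same observed signal implies different tumour copy numbers at different fractions. The sketch below assumes a simple linear mixture of normal diploid DNA and tumour DNA, and ignores GC bias, subclonality and noise. It shows the expected read-depth ratio of a region and its inversion to a tumour copy number. The function names are illustrative.

```python
def expected_copy_ratio(tumor_copies: float, tumor_fraction: float,
                        normal_copies: float = 2.0) -> float:
    """Expected read-depth ratio of a region relative to a diploid baseline
    in a mixture of normal and tumour DNA."""
    mixed = (1 - tumor_fraction) * normal_copies + tumor_fraction * tumor_copies
    return mixed / normal_copies


def estimate_tumor_copies(observed_ratio: float, tumor_fraction: float,
                          normal_copies: float = 2.0) -> float:
    """Invert expected_copy_ratio: the tumour copy number implied by a ratio."""
    if not 0 < tumor_fraction <= 1:
        raise ValueError("tumor_fraction must be in (0, 1]")
    normal_part = (1 - tumor_fraction) * normal_copies
    return (observed_ratio * normal_copies - normal_part) / tumor_fraction


# The same 1.1x depth ratio implies 3 tumour copies at a 20% tumour fraction,
# but 4 copies at a 10% tumour fraction.
for fraction in (0.2, 0.1):
    print(fraction, round(estimate_tumor_copies(1.1, fraction), 2))
```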

Exemplary Implementations

Any of the embodiments disclosed herein may be implemented in digital electronic circuitry, integrated circuitry, specially designed ASICs (application-specific integrated circuits), computer hardware, firmware, software, or combinations thereof. Apparatus of the disclosed embodiments can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method steps of the disclosed embodiments can be performed by a programmable processor executing a program of instructions to perform functions of the disclosed embodiments by operating on input data and generating output. The disclosed embodiments can be implemented advantageously in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; in any case, the language can be a compiled or interpreted language. A computer program can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed or interpreted on one computer, or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.

As used herein, computer-readable storage media refers to physical or tangible storage (as opposed to signals) and includes, without limitation, volatile and non-volatile, removable and non-removable media implemented in any method or technology for the tangible storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media include, but are not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid-state memory technology, CD-ROM, DVD or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other physical or material medium that can be used to tangibly store the desired information, data, or instructions and that can be accessed by a computer or processor.

Any of the methods described herein may include outputting data in a tangible format, such as on a computer screen or on printed paper. In the description of any embodiment elsewhere in this document, it should be understood that the method may be combined with outputting actionable data in a format that a physician can act upon. In addition, the method may be combined with actually executing a clinical decision that results in a clinical treatment, or executing a clinical decision to take no action. Some of the embodiments described in this document for determining genetic data about a target individual may be combined with a decision to select one or more embryos for transfer in the context of IVF, optionally combined with the process of transferring the embryo into the prospective mother's uterus. Some of the embodiments described in this document for determining genetic data about a target individual may be combined with notifying a medical professional of the presence or absence of a potential chromosomal abnormality, optionally combined with a decision whether or not to terminate the pregnancy in the context of prenatal diagnosis. Some of the embodiments described herein may be combined with outputting actionable data and executing a clinical decision that results in a clinical treatment, or a clinical decision to take no action.

Exemplary Diagnostic Box

In one embodiment, the invention comprises a diagnostic box capable of partly or completely carrying out any of the methods described in this disclosure. In one embodiment, the diagnostic box may be located in a physician's office, a hospital laboratory, or any suitable location reasonably close to the point of patient care. The box may be able to run the entire method in a fully automated fashion, or it may require one or more steps to be completed manually by a technician. In one embodiment, the box may be able to analyze at least the genotype data measured on maternal plasma. In one embodiment, the box may be connected to means for transmitting the genotype data measured on the diagnostic box to an external computing facility, which may then analyze the genotype data and possibly also generate a report. The diagnostic box may include a robotic unit capable of transferring aqueous or liquid samples from one container to another. It may contain a number of solid and liquid reagents. It may contain a high-throughput sequencer. It may contain a computer.

Experimental Section

The disclosed embodiments are described in the following examples, which are set forth to aid understanding of the invention and should not be construed as limiting in any way the scope of the invention as defined in the claims. The following examples are presented to provide those of ordinary skill in the art with a complete disclosure and description of how to use the described embodiments. They are not intended to limit the scope of the invention, nor to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to the numbers used (e.g., amounts, temperature), but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by volume and temperature is in degrees Celsius. It should be understood that variations in the methods as described may be made without changing the fundamental aspects that the experiments are meant to illustrate.

Experiment 1

The aim was to show that a Bayesian maximum likelihood estimation (MLE) algorithm that uses parental genotypes to calculate fetal fraction improves the accuracy of non-invasive prenatal trisomy diagnosis compared with published methods.

Simulated sequencing data of maternal cfDNA were created by sampling reads obtained from a trisomy 21 cell line and the corresponding maternal cell line. For both a published method (Chiu et al., BMJ 2011; 342: c7401) and our MLE-based algorithm, the rate of correct disomy and trisomy calls was determined from 500 simulations at each of various fetal fractions. We validated the simulations by obtaining 5 million shotgun reads from four pregnant mothers, with samples from the mothers and the corresponding fathers collected under an IRB-approved protocol. Parental genotypes were obtained on a 290K SNP array (see Figure 14).

In simulations, the MLE-based method achieved 99.0% accuracy at fetal fractions as low as 9% and reported confidences that corresponded well to the overall accuracy. We validated these results using four real samples, in which all calls were correct with a calculated confidence of over 99%. In contrast, our implementation of the algorithm published by Chiu et al. required a fetal fraction of 18% to achieve 99.0% accuracy, and achieved only 87.8% accuracy at 9% fetal DNA.

At the fetal fractions expected in the first and early second trimesters, determining fetal fraction from parental genotypes with the MLE-based method achieved greater accuracy than the published algorithm. Furthermore, the confidence measure produced by the methods disclosed herein is critical for judging the reliability of a result, especially at low fetal fractions, where ploidy detection is more difficult. The published method calls ploidy with a less accurate thresholding approach based on a large training set of disomic samples, which pre-defines the false-positive rate. In addition, because it lacks a confidence measure, the published method risks reporting false-negative results when there is insufficient fetal cfDNA to make a call. In some embodiments, a confidence estimate is calculated for the called ploidy state.
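The fetal-fraction step can be illustrated with a deliberately simplified sketch. It is not the MLE algorithm described here, which also draws on HapMap data and all parental genotype combinations. It assumes only hypothetical "informative" SNPs where the mother is AA and the father is BB, so that a disomic fetus must be AB and the expected share of B reads is f/2. The function names and grid resolution are illustrative choices.

```python
import math


def binomial_loglik(k: int, n: int, p: float) -> float:
    """Log-likelihood of k B-allele reads out of n at B-fraction p (constant term dropped)."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def estimate_fetal_fraction(loci, grid_steps: int = 1000) -> float:
    """Grid-search maximum likelihood estimate of fetal fraction f.

    loci: list of (b_reads, depth) pairs at hypothetical informative SNPs where
    the mother is AA and the father is BB. A disomic fetus is then AB, so the
    expected fraction of B reads in the mixture is f / 2.
    """
    best_f, best_ll = None, -math.inf
    for i in range(1, grid_steps):  # open interval (0, 1) avoids log(0)
        f = i / grid_steps
        ll = sum(binomial_loglik(k, n, f / 2) for k, n in loci)
        if ll > best_ll:
            best_f, best_ll = f, ll
    return best_f
```

For example, informative loci that consistently show 5% B reads give an estimate of f = 0.1. The grid search is used only for transparency; with this single-parameter binomial model the maximum could also be found in closed form.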

Experiment 2

The aim was to improve the non-invasive detection of fetal trisomies 18, 21 and X, especially in samples with low fetal fractions, by combining a targeted sequencing approach with parental genotypes and HapMap data in a Bayesian maximum likelihood estimation (MLE) algorithm.

Maternal samples from four euploid and two trisomy-positive pregnancies, together with the corresponding paternal samples, were obtained under an IRB-approved protocol from patients with known fetal karyotypes. Maternal cfDNA was extracted from plasma, and after preferential enrichment of the targeted SNPs, approximately 10 million sequence reads were obtained. The parental samples were sequenced in the same way to obtain genotypes.

The algorithm correctly called disomy of chromosomes 18 and 21 in all euploid samples and for the normal chromosomes of the aneuploid samples. The trisomy 18 and trisomy 21 calls were correct, as were the X chromosome copy numbers in male and female fetuses. The confidences generated by the algorithm exceeded 98% in all cases.

The method accurately reported the ploidy of all tested chromosomes in all six samples, including samples with less than 12% fetal DNA, which account for approximately 30% of first- and early second-trimester samples. A key difference between the present MLE algorithm and published methods is that it uses parental genotypes and HapMap data to improve accuracy and to generate confidence measures. All methods become less accurate at low fetal fractions, so it is important to correctly identify samples in which there is insufficient fetal cfDNA for a reliable call. Others have used Y-chromosome-specific probes to estimate fetal fraction in male fetuses, whereas parallel parental genotyping allows fetal fraction to be estimated for fetuses of both sexes. Another inherent limitation of published methods that use non-targeted shotgun sequencing is that the accuracy of ploidy calls varies between chromosomes because of factors such as GC content. The targeted sequencing method of the present invention is largely independent of such chromosome-scale variation and gives more consistent performance across chromosomes.

Experiment 3

The aim was to determine whether novel informatics applied to SNP loci in free-floating fetal DNA in maternal plasma could detect trisomy in a triploid fetus with high confidence.

Following an abnormal ultrasound, 20 mL of blood was drawn from the pregnant patient. After centrifugation, maternal DNA was extracted from the buffy coat (DNeasy, QIAGEN) and cell-free DNA was extracted from the plasma (QIAamp, QIAGEN). Targeted sequencing of SNP loci on chromosomes 2, 21 and X was applied to both DNA samples. Bayesian maximum likelihood estimation selected the most likely hypothesis from the set of all possible ploidy states. The method determines the fetal DNA fraction, the ploidy state, and an explicit confidence in the ploidy determination. No assumptions were made about the ploidy of reference chromosomes. The diagnosis uses a test statistic that is independent of sequence read counts, in contrast to current state-of-the-art methods, which rely on read counts.
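As a rough illustration of how a Bayesian hypothesis comparison can yield both a call and an explicit confidence, the sketch below scores two toy hypotheses at informative SNPs (mother AA, father BB): a disomic fetus with one maternal and one paternal copy, and a trisomic fetus with one maternal and two paternal copies, as found in this experiment. The binomial model, the treatment of `fetal_fraction` as the fetal share of reads at these loci, and the hypothesis names are all simplifying assumptions, not the method of the disclosure.

```python
import math


def loglik(k: int, n: int, p: float) -> float:
    """Binomial log-likelihood of k B reads out of n (constant term dropped)."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def ploidy_posterior(loci, fetal_fraction, priors=None):
    """Posterior probabilities of two toy hypotheses at informative SNPs.

    loci: list of (b_reads, depth) at SNPs where mother is AA and father is BB.
    Assumption: fetal_fraction is the share of reads at these loci that come
    from the fetus, so expected B fraction = fetal_fraction * (fetal B copies /
    fetal copies).
    """
    expected_b = {
        "disomy": fetal_fraction * 1 / 2,            # 1 maternal A + 1 paternal B
        "trisomy_paternal": fetal_fraction * 2 / 3,  # 1 maternal A + 2 paternal B
    }
    if priors is None:
        priors = {h: 1 / len(expected_b) for h in expected_b}
    log_post = {
        h: math.log(priors[h]) + sum(loglik(k, n, p) for k, n in loci)
        for h, p in expected_b.items()
    }
    top = max(log_post.values())  # subtract the max before exp() to avoid underflow
    weights = {h: math.exp(v - top) for h, v in log_post.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}
```

The call is the hypothesis with the highest posterior, and that posterior serves as the confidence. Summing log-likelihoods over many loci is what drives the confidence toward 1 as read counts grow.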

The method correctly diagnosed trisomy of chromosomes 2 and 21. The estimated child (fetal) fraction was 11.9% [CI 11.7-12.1%]. The fetus was found to have one maternal and two paternal copies of chromosomes 2 and 21, with an effective confidence of 1 (probability of error <10⁻³⁰). This was achieved with 92,600 reads on chromosome 2 and 258,100 reads on chromosome 21.

This is the first demonstration of non-invasive prenatal diagnosis of trisomic chromosomes from maternal blood in a case where the fetus is triploid, as confirmed by metaphase karyotype. Existing methods of non-invasive diagnosis would not detect aneuploidy in this sample: they rely on an excess of sequence reads on the trisomic chromosome relative to a disomic reference chromosome, but a triploid fetus has no disomic reference. Furthermore, existing methods do not achieve similarly high-confidence ploidy determinations at this fetal DNA fraction and number of sequence reads. Extending the method to all 24 chromosomes is straightforward.

Experiment 4

The following protocol was used for 800-plex amplification, by standard PCR (that is, without nesting), of DNA isolated from the maternal plasma of a euploid pregnancy and of genomic DNA from a trisomy 21 cell line. Library preparation and amplification involved single-tube end repair to blunt the ends, followed by A-tailing. Adapter ligation was run using the ligation kit found in the Agilent SureSelect kit, and PCR was run for 7 cycles. Then 15 STA cycles (95°C, 30 s; 72°C, 1 min; 60°C, 4 min; 65°C, 1 min; 72°C, 30 s) were performed using 800 different primer pairs targeting SNPs on chromosomes 2, 21 and X. Reactions were run with a primer concentration of 12.5 nM. The DNA was then sequenced on an Illumina GAIIx sequencer. The sequencer output 1.9 million reads, 92% of which mapped to the genome; of the reads that mapped to the genome, more than 99% mapped to one of the regions targeted by the primers. The numbers were essentially the same for plasma DNA and genomic DNA.

Figure 15 shows the ratio of the two alleles for the approximately 780 SNPs detected by the sequencer in genomic DNA from a cell line known to be trisomic at chromosome 21. Note that allele ratios are plotted here for ease of visualization, because allele distributions are not straightforward to read visually. Circles indicate SNPs on disomic chromosomes, and stars indicate SNPs on the trisomic chromosome. Figure 16 is another representation of the same data as in Figure X, in which the Y-axis is the relative amount of A and B measured for each SNP and the X-axis is the SNP number, with SNPs grouped by chromosome. In Figure 16, SNPs 1 to 312 are on chromosome 2, SNPs 313 to 605 are on trisomic chromosome 21, and SNPs 606 to 800 are on chromosome X. The data for chromosomes 2 and X indicate disomic chromosomes, because the relative sequence counts fall into three clusters: AA at the top of the plot, BB at the bottom, and AB in the middle. The data for trisomic chromosome 21 show four clusters: AAA at the top of the plot, AAB around the line at 0.65 (2/3), ABB around the line at 0.35 (1/3), and BBB at the bottom.
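The visual distinction between three clusters (disomy) and four clusters (trisomy) in single-source DNA, such as this cell line, can be mimicked with a toy nearest-cluster fit. The sketch assumes each value is the A-allele fraction A/(A+B) for one SNP. The function names are hypothetical, and a real caller would use a likelihood model rather than squared distances, since a model with more clusters can also absorb noise.

```python
DISOMY_CENTRES = (0.0, 1 / 2, 1.0)          # BB, AB, AA
TRISOMY_CENTRES = (0.0, 1 / 3, 2 / 3, 1.0)  # BBB, ABB, AAB, AAA


def cluster_sse(ratios, centres):
    """Sum of squared distances from each A-allele ratio to its nearest centre."""
    return sum(min((r - c) ** 2 for c in centres) for r in ratios)


def better_copy_model(ratios):
    """Return 'disomy' or 'trisomy', whichever cluster pattern fits the ratios better."""
    if cluster_sse(ratios, DISOMY_CENTRES) <= cluster_sse(ratios, TRISOMY_CENTRES):
        return "disomy"
    return "trisomy"
```

Ratios clustered near 1/3 and 2/3 favor the trisomy pattern, while ratios near 1/2 favor disomy.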

Figures 17A-D show data for the same 800-plex protocol, measured on DNA amplified from four plasma samples from pregnant women. For these four samples, we expect to see seven clusters of points: (1) along the top of the plot, loci where both mother and fetus are AA; (2) slightly below the top of the plot, loci where the mother is AA and the fetus is AB; (3) slightly above the 0.5 line, loci where the mother is AB and the fetus is AA; (4) along the 0.5 line, loci where both mother and fetus are AB; (5) slightly below the 0.5 line, loci where the mother is AB and the fetus is BB; (6) slightly above the bottom of the plot, loci where the mother is BB and the fetus is AB; and (7) along the bottom of the plot, loci where both mother and fetus are BB. The smaller the fetal fraction, the smaller the separation between clusters (1) and (2), among clusters (3), (4) and (5), and between clusters (6) and (7). The separation is expected to be half the fraction of fetal-derived DNA. For example, if the DNA is 20% fetal and 80% maternal, we expect clusters (1) to (7) to be centered at 1.0, 0.9, 0.6, 0.5, 0.4, 0.1 and 0.0, respectively; see, e.g., Figure 17D, POOL1_BC5 reference ratio. If instead the DNA is 8% fetal and 92% maternal, we expect clusters (1) to (7) to be centered at 1.00, 0.96, 0.54, 0.50, 0.46, 0.04 and 0.00, respectively; see, e.g., Figure 17B, POOL1_BC2 reference ratio. If no fetal DNA is detected, we expect not to see clusters (2), (3), (5) or (6); equivalently, the separation is zero, so (1) and (2) overlap, as do (3), (4) and (5), and (6) and (7); see, e.g., Figure 17C, POOL1_BC7 reference ratio. Note that the fetal fraction for Figure 17A, POOL1_BC1 reference ratio, is about 25%.
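The cluster centres quoted above follow from treating each locus as a mixture of (1 - f) maternal and f fetal DNA, so the expected A-allele fraction is (1 - f)*m + f*c, where m and c are the maternal and fetal A-allele fractions (0, 1/2 or 1). A minimal sketch, assuming both genomes are disomic at these loci and f is the fetal fraction of the cfDNA; the function name is illustrative:

```python
def expected_cluster_centres(f: float) -> list[float]:
    """Expected A-allele fraction for the seven (mother, fetus) genotype clusters."""
    return [
        1.0,          # (1) mother AA, fetus AA
        1.0 - f / 2,  # (2) mother AA, fetus AB
        0.5 + f / 2,  # (3) mother AB, fetus AA
        0.5,          # (4) mother AB, fetus AB
        0.5 - f / 2,  # (5) mother AB, fetus BB
        f / 2,        # (6) mother BB, fetus AB
        0.0,          # (7) mother BB, fetus BB
    ]
```

For f = 0.2 and f = 0.08 this reproduces the centres listed for Figures 17D and 17B, and for f = 0 the clusters merge as described for Figure 17C.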

Experiment 5

Most methods of DNA amplification and measurement introduce some allelic bias, in which the detected intensities or counts of the two alleles commonly found at a locus do not reflect the actual amounts of those alleles in the DNA sample. For example, for a single individual at a heterozygous locus, we expect to see the two alleles in a 1:1 ratio, which is the theoretical ratio for a heterozygous locus; because of allelic bias, however, we may see 55:45 or even 60:40. Note also that in sequencing, if the read depth is low, simple random noise can cause significant apparent allelic bias. In one embodiment, the behavior of each SNP can be modeled so that, if a consistent bias is observed for a particular allele, that bias can be corrected. Figure 18 shows the fraction of the data that can be explained by binomial variance before and after bias correction. In Figure 18, stars represent the allelic bias observed in the raw sequence data of the 800-plex experiment, and circles represent the allelic bias after correction. If there were no allelic bias at all, we would expect the data to fall along the line x=y. A similar data set, generated by amplifying DNA with 150-plex targeted amplification, fell very close to the 1:1 line after bias correction.
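One simple way to model a consistent per-SNP bias, offered as an assumption rather than the embodiment's actual model, is an odds multiplier b: a true A fraction p is observed as p*b / (p*b + 1 - p). Estimating b from samples known to be heterozygous at that SNP, then inverting the formula, gives a corrected ratio. The function names are hypothetical.

```python
from statistics import mean


def estimate_bias(het_ratios):
    """Odds multiplier b for one SNP, from observed A fractions at known heterozygotes.

    Assumed model: p_obs = p*b / (p*b + (1 - p)), so a true 1:1 heterozygote
    is observed at b / (b + 1).
    """
    p_het = mean(het_ratios)
    return p_het / (1.0 - p_het)


def correct_ratio(p_obs, b):
    """Invert the assumed model to recover the bias-corrected A fraction."""
    return p_obs / (p_obs + b * (1.0 - p_obs))
```

With heterozygotes observed at 55:45 on average, a new 0.55 observation at that SNP is corrected back to 0.5, while homozygous ratios of 0 and 1 are unchanged.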

Experiment 6

Universal amplification of DNA, using ligated adapters and primers specific to the adapter tags, has the effect of enriching the proportion of shorter DNA strands when primer annealing and extension times are limited to a few minutes. Most library protocols designed to create DNA libraries suitable for sequencing contain such steps; exemplary protocols have been published and are well known to those skilled in the art. In some embodiments of the invention, adapters carrying a universal tag are ligated to plasma DNA and amplified using primers specific to the adapter tag. In some embodiments, the universal tag can be the same tag as is used for sequencing, a universal tag used only for PCR amplification, or a set of tags. Because fetal DNA is generally short, whereas maternal DNA can be both short and long, this method has the effect of enriching the proportion of fetal DNA in the mixture. Free-floating DNA is thought to come from apoptotic cells; it contains both fetal and maternal DNA and is short, mostly under 200 bp. Cellular DNA released by cell lysis (a common occurrence following phlebotomy) is usually almost exclusively maternal and is also very long, mostly over 500 bp. Thus, a blood sample that has been left standing for more than a few minutes will contain a mixture of short (fetal plus maternal) and longer (maternal) DNA.

Performing universal amplification on maternal plasma with a relatively short extension time, followed by targeted amplification, will tend to increase the relative proportion of fetal DNA compared with plasma amplified using targeted amplification alone. This can be seen in Figure 19, which plots the fetal percentage measured when the input is plasma DNA (vertical axis) against the fetal percentage measured when the input is plasma DNA that has first been made into a library using the Illumina GAIIx library preparation protocol. All points fall below the line, indicating that the library preparation step enriched the fraction of fetal-derived DNA. The two plasma samples shown in red, which indicates hemolysis and therefore an increased amount of long maternal DNA from lysed cells, show a particularly pronounced enrichment of fetal fraction when library preparation is performed before targeted amplification. The methods disclosed herein are particularly applicable where hemolysis has occurred, or in any other situation in which cells containing relatively long strands of contaminating DNA have lysed, contaminating a sample of short DNA with long DNA. The relatively short annealing and extension times are typically between 30 seconds and 2 minutes, but can be as short as 5 or 10 seconds or less, or as long as 5 or 10 minutes.
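The enrichment effect can be illustrated with a toy calculation in which amplification yield depends on fragment length. All numbers and the 200 bp cutoff below are hypothetical, chosen only to echo the fragment sizes mentioned above; they are not measurements from Figure 19, and the function names are illustrative.

```python
def fetal_fraction(pool, efficiency=lambda length: 1.0):
    """Fetal share of amplified molecules.

    pool: list of (fragment_length_bp, fetal_count, maternal_count) entries.
    efficiency: relative amplification yield as a function of fragment length
    (default: every fragment amplifies equally, i.e. no size bias).
    """
    fetal = sum(fc * efficiency(length) for length, fc, _ in pool)
    total = sum((fc + mc) * efficiency(length) for length, fc, mc in pool)
    return fetal / total


def short_extension(length):
    # Hypothetical: with a short extension time, fragments above 200 bp
    # amplify at 10% of the yield of short fragments.
    return 1.0 if length <= 200 else 0.1


# Short apoptotic cfDNA (fetal + maternal) plus long maternal DNA from lysis.
example_pool = [(160, 10, 90), (600, 0, 100)]
```

In this toy pool the fetal fraction is 5% without size bias and about 9.1% when long fragments amplify poorly, which mirrors the direction of the shift described for Figure 19.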

Experiment 7

The following protocol was used for 1,200-plex amplification of DNA isolated from the maternal plasma of a euploid pregnancy and of genomic DNA from a trisomy 21 cell line, using both a direct PCR protocol and a semi-nested protocol. Library preparation and amplification involved single-tube end repair to blunt the ends, followed by A-tailing. Adapter ligation was run using a modification of the ligation kit found in the Agilent SureSelect kit, and PCR was run for 7 cycles. In the targeted primer pool, 550 assays targeted SNPs on chromosome 21, and 325 assays targeted SNPs on each of chromosomes 1 and X. Both protocols involved 15 STA cycles (95°C, 30 s; 72°C, 1 min; 60°C, 4 min; 65°C, 30 s; 72°C, 30 s) with a primer concentration of 16 nM. The semi-nested PCR protocol added a second amplification of 15 STA cycles (95°C, 30 s; 72°C, 1 min; 60°C, 4 min; 65°C, 30 s; 72°C, 30 s), using inner forward tagged primers at 29 nM and the reverse tag primer at 1 μM or 0.1 μM. The DNA was then sequenced on an Illumina GAIIx sequencer. With the direct PCR protocol, 73% of the reads mapped to the genome; with the semi-nested protocol, 97.2% of the sequence reads mapped to the genome. The semi-nested protocol thus yielded about 30% more information, probably mainly because it eliminated the primers most likely to form primer dimers.
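Assuming that usable information scales with the fraction of reads that map to the genome, the "about 30% more" figure follows from the two mapping rates:

```latex
\frac{0.972}{0.73} \approx 1.33
```

That is, roughly one third more mapped reads are obtained for the same number of sequenced reads.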

Read depth variability tends to be higher with the semi-nested protocol than with the direct PCR protocol (see Figure 20), where diamonds show the read depth for loci run with the semi-nested protocol and squares show the read depth for loci run without nesting. The SNPs are sorted by the read depth of the diamonds, so the diamonds all fall on a curve, while the squares appear only loosely correlated with it. Apart from that sorting, the ordering of the SNPs is arbitrary, and only the height of a point (its read depth) carries information, not its left-to-right position.

In some embodiments, the methods described herein can achieve excellent depth-of-read (DOR) variance. For example, in one version of this experiment using 1,200-plex direct PCR amplification of genomic DNA (Figure 21), of the 1,200 assays, 1,186 had a DOR greater than 10; the average read depth was 400; and 1,063 assays (88.6%) had a read depth between 200 and 800. This is an ideal window, in which the number of reads per allele is high enough to give meaningful data but not so high that the marginal value of additional reads becomes small. Only 12 alleles had a higher read depth, the highest being 1,035 reads. The standard deviation of the DOR was 290, the mean DOR was 453, and the coefficient of variation of the DOR was 64%; there were 950,000 total reads, of which 63.1% mapped to the genome. In another experiment, using a 1,200-plex semi-nested protocol (Figure 22), the DOR was higher: the standard deviation of the DOR was 583, the mean DOR was 630, and the coefficient of variation was 93%; there were 870,000 total reads, of which 96.3% mapped to the genome. Note that in both cases the SNPs are sorted by the mother's read depth, so the curve represents the maternal read depth. The differences between the child and father read depths are not significant; only the overall trend matters for this explanation.
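The summary statistics quoted above can be computed from a list of per-assay read depths as in the sketch below. It assumes the population standard deviation and an inclusive 200-800 window; the source does not state which conventions were used, and the function name is hypothetical. As a consistency check, the coefficient of variation is SD/mean: 290/453 ≈ 0.64 and 583/630 ≈ 0.93, matching the reported 64% and 93%.

```python
from statistics import mean, pstdev


def dor_summary(depths, window=(200, 800)):
    """Summary of per-assay depth of read (DOR).

    Uses the population standard deviation (an assumption; the sample
    estimator statistics.stdev would give slightly larger values).
    """
    mu = mean(depths)
    sd = pstdev(depths)
    lo, hi = window
    return {
        "mean": mu,
        "sd": sd,
        "cv": sd / mu,  # coefficient of variation
        "frac_in_window": sum(lo <= d <= hi for d in depths) / len(depths),
        "n_above_10": sum(d > 10 for d in depths),
    }
```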

Experiment 8

In one experiment, semi-nested 1,200-plex PCR was used to amplify DNA from single cells and from groups of three cells. This experiment is relevant to prenatal aneuploidy testing using fetal cells isolated from maternal blood, and to preimplantation genetic diagnosis using biopsied blastomeres or trophectoderm samples. For each condition there were three replicates of 1 cell and of 3 cells from each of two individuals (46,XY and 47,XX,+21). The assays targeted chromosomes 1, 21 and X. Three different lysis methods were used: ARCTURUS, MPERv2 and alkaline lysis. The 48 samples were multiplexed and sequenced in a single sequencing lane. The algorithm returned the correct ploidy call for each of the three chromosomes in every replicate.

Experiment 9

In one experiment, four maternal plasma samples were prepared and amplified using a hemi-nested 9,600-plex protocol. Samples were prepared as follows. Up to 40 mL of maternal blood was centrifuged to separate the buffy coat from the plasma. Maternal genomic DNA was prepared from the buffy coat, and paternal DNA was prepared from a blood or saliva sample. Cell-free DNA in the maternal plasma was isolated using the QIAGEN Circulating Nucleic Acid Kit and eluted in 45 μL of TE buffer according to the manufacturer's instructions. Universal ligation adapters were attached to the ends of each molecule in 35 μL of the purified plasma DNA, and the library was amplified for 7 cycles using adapter-specific primers. The library was purified with Agencourt AMPure beads and eluted in 50 μL of water.

3 μL of the DNA was amplified for 15 STA cycles using 9,600 target-specific tagged reverse primers at a primer concentration of 14.5 nM and one library-adapter-specific forward primer at 500 nM (initial polymerase activation at 95°C for 10 min; then 15 cycles of 95°C, 30 s; 72°C, 10 s; 65°C, 1 min; 60°C, 8 min; 65°C, 3 min; and 72°C, 30 s; and a final extension at 72°C for 2 min).

The hemi-nested PCR protocol involved a second amplification of a dilution of the first STA product for 15 STA cycles, using the reverse tag primer at 1000 nM and each of the 9,600 target-specific forward primers at 16.6 nM (initial polymerase activation at 95°C for 10 min; then 15 cycles of 95°C, 30 s; 65°C, 1 min; 60°C, 5 min; 65°C, 5 min; and 72°C, 30 s; and a final extension at 72°C for 2 min).

Aliquots of the STA products were then amplified by standard PCR for 10 cycles with 1 μM tag-specific forward and barcoded reverse primers to generate barcoded sequencing libraries. An aliquot of each library was mixed with libraries carrying different barcodes, and the mixture was purified using spin columns.

In this way, 9,600 primers were used in a single-well reaction; the primers were designed to target SNPs on chromosomes 1, 2, 13, 18, 21, X and Y. The amplicons were then sequenced on an Illumina GAIIx sequencer. The sequencer generated approximately 3.9 million reads per sample, of which 3.7 million (94%) mapped to the genome and 2.9 million (74%) mapped to targeted SNPs, with a mean read depth of 344 and a median read depth of 255. The fetal fractions of the four samples were found to be 9.9%, 18.9%, 16.3% and 21.2%.

The corresponding maternal and paternal genomic DNA samples were amplified using a semi-nested 9,600-plex protocol and sequenced. The semi-nested protocol differs in that the first STA uses 9,600 outer forward primers and tagged reverse primers at 7.3 nM. The thermal cycling conditions and composition of the second STA and of the barcoding PCR were the same as for the hemi-nested protocol.

The sequencing data were analyzed using the informatics methods disclosed herein, and the ploidy states of six chromosomes were called for the fetuses whose DNA was present in the four maternal plasma samples. All 28 chromosome ploidy calls in the panel were correct; all but one were made with greater than 99.2% confidence, and the remaining call was correct but had 83% confidence.

Figure 23 shows the read depth of the 9,600-plex hemi-nested approach together with that of the 1,200-plex semi-nested approach described in Experiment 7; the number of SNPs with read depths greater than 100, greater than 200 and greater than 400 was significantly higher with the 9,600-plex protocol than with the 1,200-plex protocol. Dividing the number of reads at the 90th percentile by the number of reads at the 10th percentile gives a dimensionless measure of read-depth uniformity: the smaller the number, the more uniform (narrower) the read depth distribution. For the method run in Experiment 9, the average 90th/10th percentile ratio was 11.5; for the method run in Experiment 7, it was 5.6. For a given degree of multiplexing, the narrower the read depth distribution, the more efficient the sequencing, because fewer sequence reads are needed to ensure that a given percentage of SNPs exceeds a read-count threshold.
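The 90th/10th percentile ratio can be computed with Python's standard library as sketched below. Percentile conventions differ between tools; this sketch uses `statistics.quantiles` with its default ("exclusive") method, which may not match the convention behind the values reported here. The function name is illustrative.

```python
from statistics import quantiles


def p90_p10_ratio(depths):
    """90th-percentile read depth divided by 10th-percentile read depth.

    quantiles(n=10) returns the nine decile cut points: index 0 is the 10th
    percentile and index 8 is the 90th. A value of 1.0 means perfectly
    uniform depth; larger values mean a wider spread.
    """
    deciles = quantiles(depths, n=10)
    return deciles[8] / deciles[0]
```

Perfectly uniform depths give a ratio of 1.0, and depths spread evenly from 1 to 100 give a ratio of about 9.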

Experiment 10

In one experiment, four maternal plasma samples were prepared and amplified using a semi-nested 9,600-plex protocol. The details of Experiment 10 are very similar to those of Experiment 9, including the identity of the four samples, except for the nesting protocol. All 28 chromosome ploidy calls in the panel were correct, with greater than 99.7% confidence. 7.6 million reads (97%) mapped to the genome and 6.3 million (80%) mapped to targeted SNPs. The mean read depth was 751 and the median read depth was 396.

Experiment 11

In one experiment, three maternal plasma samples were each split into five equal fractions; four fractions were amplified with 2,400 multiplexed primers each and one fraction with 1,200 multiplexed primers, using a semi-nested protocol, for a total of 10,800 primers. After amplification, the fractions were pooled for sequencing. The details of Experiment 11 are very similar to those of Experiment 9, except for the nesting protocol and the split-and-pool approach. Of the 21 chromosome ploidy calls in the panel, all were correct with greater than 99.7% confidence, except for one missed call, which had 83% confidence. 3.4 million reads mapped to targeted SNPs, with a mean read depth of 404 and a median read depth of 258.
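Splitting one large primer library into sub-pools of fixed size can be sketched as follows. The consecutive assignment is a hypothetical simplification: the source does not say how primers were assigned to the five sub-pools, and in practice the assignment might aim to keep primers likely to form primer dimers in different pools. The function name and primer IDs are illustrative.

```python
def split_into_pools(primer_ids, pool_sizes):
    """Partition a primer list into consecutive sub-pools of the given sizes."""
    if sum(pool_sizes) != len(primer_ids):
        raise ValueError("pool sizes must add up to the number of primers")
    pools, start = [], 0
    for size in pool_sizes:
        pools.append(primer_ids[start:start + size])
        start += size
    return pools


# Four 2,400-plex pools plus one 1,200-plex pool cover all 10,800 primers.
pools = split_into_pools([f"p{i}" for i in range(10800)], [2400] * 4 + [1200])
```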

Experiment 12

In one experiment, four maternal plasma samples were each split into four equal fractions, and each fraction was amplified with 2,400 multiplexed primers using a semi-nested protocol, for a total of 9,600 primers. After amplification, the fractions were pooled for sequencing. The details of Experiment 12 are very similar to those of Experiment 9, except for the nesting protocol and the split-and-pool approach. Of the 28 chromosome ploidy calls in the panel, all were correct with greater than 97% confidence, except for one missed call, which had 78% confidence. 4.5 million reads mapped to targeted SNPs, with a mean read depth of 535 and a median read depth of 412.

Experiment 13

In one experiment, four maternal plasma samples were prepared and amplified using a 9,600-plex triply hemi-nested protocol, for a total of 9,600 primers. The details of Experiment 13 are very similar to those of Experiment 9, except for the nesting protocol, which involved three rounds of amplification of 15, 10 and 15 STA cycles, respectively. Ploidy was called correctly for 27 of the 28 chromosomes in the panel: all but one of these correct calls had greater than 99.9% confidence, the remaining one had 94.6% confidence, and there was one missed call, with 80.8% confidence. 3.5 million reads mapped to targeted SNPs, with a mean read depth of 414 and a median read depth of 249.

Experiment 14

In one experiment, 45 sets of cells were amplified using a 1,200-plex semi-nested protocol and sequenced, and ploidy was determined for three chromosomes. This experiment was intended to simulate the conditions of preimplantation genetic diagnosis on single-cell biopsies from day-3 embryos or on trophectoderm biopsies from day-5 embryos. Fifteen individual single cells and 30 groups of three cells were placed in 45 individual reaction tubes, for a total of 45 reactions. Each reaction contained cells from only one cell line, but different reactions contained cells from different cell lines. Cells were prepared in 5 μl of wash buffer and lysed by adding 5 μl of ARCTURUS PICOPURE lysis buffer (Applied Biosystems). They were then incubated at 56°C for 20 minutes and at 95°C for 10 minutes.

DNA from the single cells and three-cell groups was amplified for 25 STA cycles using 1,200 target-specific forward primers and tagged reverse primers at a primer concentration of 50 nM. The program was: initial polymerase activation at 95°C for 10 min; then 25 cycles of 95°C for 30 s, 72°C for 10 s, 65°C for 1 min, 60°C for 8 min, 65°C for 3 min and 72°C for 30 s; and a final extension at 72°C for 2 min.

The semi-nested PCR protocol involved three parallel second amplifications of a dilution of the first STA product for 20 STA cycles. Each used the tag-specific reverse primer at 1,000 nM and each of 400 target-specific nested forward primers at 60 nM. The program was: initial polymerase activation at 95°C for 10 min; then 15 cycles of 95°C for 30 s, 65°C for 1 min, 60°C for 5 min, 65°C for 5 min and 72°C for 30 s; and a final extension at 72°C for 2 min. In this way, all 1,200 targets amplified in the first STA were reamplified in three parallel 400-plex reactions.

Aliquots of the STA products were then amplified by standard PCR for 15 cycles with 1 μM tag-specific forward and barcoded reverse primers to generate barcoded sequencing libraries. An aliquot of each library was mixed with libraries bearing different barcodes and purified using a spin column.

In this way, 1,200 primers were used in the single-cell reactions. The primers were designed to target SNPs found on chromosomes 1, 21 and X. The amplicons were then sequenced using an Illumina GAIIx sequencer. The sequencer generated approximately 3.9 million reads per sample, of which 500,000 to 800,000 reads mapped to the genome (74% to 94% of all reads per sample).

Related maternal and paternal genomic DNA samples from the cell lines were analyzed with the same semi-nested 1,200-plex assay pool and a similar protocol, using fewer cycles and a 1,200-plex second STA, and were then sequenced.

The sequencing data were analyzed using the informatics methods disclosed herein, and the ploidy states of the three chromosomes were called for each sample.

Figure 24 shows the normalized read depth ratios (vertical axis) for three chromosomes (1 = chromosome 1; 2 = chromosome 21; 3 = chromosome X) in six samples. The ratio has two parts. The numerator is the normalized number of reads mapped to the chromosome in a sample. The denominator is the normalized number of reads mapped to that chromosome, averaged over three reference wells that each contain three 46XY cells. For the three sets of data points corresponding to 46XY reactions, the expected ratio is 1:1. For the three sets corresponding to 47XX+21 cells, the expected ratio is 1:1 for chromosome 1, 1.5:1 for chromosome 21 and 2:1 for chromosome X.
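The Figure 24 ratio can be sketched in Python as below. This is a minimal illustration that rests on an assumption the text does not spell out: "normalized" is taken to mean the chromosome's share of all mapped reads in the well. The function names and read counts are hypothetical.

```python
from statistics import mean


def normalized_depth(chrom_reads: int, total_reads: int) -> float:
    """Share of a well's mapped reads that fall on one chromosome (assumed normalization)."""
    return chrom_reads / total_reads


def depth_ratio(sample: tuple[int, int], references: list[tuple[int, int]]) -> float:
    """Normalized depth in a sample divided by the mean over reference wells.

    Each tuple is (reads mapped to the chromosome, total mapped reads); the
    references play the role of the three wells of three 46XY cells.
    """
    reference = mean(normalized_depth(c, t) for c, t in references)
    return normalized_depth(*sample) / reference


# Hypothetical chromosome X counts: three 46XY reference wells, one 47XX+21 well.
refs = [(5_000, 100_000), (5_200, 104_000), (4_900, 98_000)]
print(round(depth_ratio((10_000, 100_000), refs), 2))  # 2.0: two X copies vs one
```

With one X chromosome in the reference and two in the 47XX+21 cells, the ratio lands at the expected 2:1.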

Figure 25 shows the allele ratios for three chromosomes (1, 21 and X) in three reactions.

The lower-left panel shows a reaction on three 46XY cells. Within it, the left region shows the allele ratios for chromosome 1, the middle region for chromosome 21, and the right region for chromosome X. For 46XY cells, the expected ratios for chromosomes 1 and 21 are 1, 0.5 and 0, corresponding to the SNP genotypes AA, AB and BB. For the X chromosome, the expected ratios are 1 and 0, corresponding to the SNP genotypes A and B.

The lower-right panel shows a reaction on three 47XX+21 cells, with the allele ratios separated by chromosome as in the lower-left panel. For 47XX+21 cells, the expected ratios are:
- 1, 0.5 and 0 for chromosome 1 (genotypes AA, AB and BB);
- 1, 0.67, 0.33 and 0 for chromosome 21 (genotypes AAA, AAB, ABB and BBB);
- 1, 0.5 and 0 for the X chromosome (genotypes AA, AB and BB).

The upper-right panel was obtained from a reaction containing 1 ng of genomic DNA from the 47XX+21 cell line.
Figure 26 shows the same plots as Figure 25, but for reactions performed on only one cell. The left panel is a reaction containing a 47XX+21 cell, and the right panel is a reaction containing a 46XX cell.
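The expected allele ratios for Figures 25 and 26 follow directly from copy number. With n copies of a chromosome, a SNP genotype carrying k copies of allele A gives an A-allele fraction of k/n, so n copies produce n + 1 clusters. Below is a minimal sketch that assumes a pure sample with no allele dropout or amplification bias; the function name is hypothetical.

```python
def expected_allele_ratios(copies: int) -> list[float]:
    """Expected A-allele fraction for every genotype at a given copy number.

    k copies of allele A out of `copies` gives k / copies (e.g. AAB -> 2/3),
    listed from all-A (1.0) down to all-B (0.0).
    """
    return [k / copies for k in range(copies, -1, -1)]


print(expected_allele_ratios(1))  # [1.0, 0.0]  (X in 46XY)
print(expected_allele_ratios(2))  # [1.0, 0.5, 0.0]  (disomy)
print([round(r, 2) for r in expected_allele_ratios(3)])  # [1.0, 0.67, 0.33, 0.0]  (trisomy 21)
```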

The plots in Figures 25 and 26 show the expected cluster patterns:
- two clusters of points for chromosomes where ratios of 1 and 0 are expected;
- three clusters for chromosomes where ratios of 1, 0.5 and 0 are expected;
- four clusters for chromosomes where ratios of 1, 0.67, 0.33 and 0 are expected.

The Parental Support algorithm made correct calls for all three chromosomes in all 45 reactions.

Experiment 15

In one experiment, maternal plasma samples were prepared and amplified using a hemi-nested 19,488-plex protocol. Samples were prepared as follows:
- Up to 20 mL of maternal blood was centrifuged to separate the buffy coat from the plasma.
- Maternal genomic DNA was prepared from the buffy coat, and paternal DNA was prepared from blood or saliva samples.
- Cell-free DNA in the maternal plasma was isolated using the QIAamp Circulating Nucleic Acid Kit (Qiagen) and eluted in 50 μL of TE buffer according to the manufacturer's instructions.
- Universal ligation adapters were attached to the ends of each molecule in 40 μL of the purified plasma DNA, and the libraries were amplified for 9 cycles using adapter-specific primers.
- The libraries were purified with Agencourt AMPure beads and eluted in 50 μl of DNA suspension buffer.

6 μl of DNA was amplified for 15 cycles of round-1 STA using 19,488 target-specific tagged reverse primers at a primer concentration of 7.5 nM and one library-adapter-specific forward primer at 500 nM. The program was: initial polymerase activation at 95°C for 10 min; then 15 cycles of 96°C for 30 s, 65°C for 1 min, 58°C for 6 min, 60°C for 8 min, 65°C for 4 min and 72°C for 30 s; and a final extension at 72°C for 2 min.

The hemi-nested PCR protocol involved a second amplification (round-2 STA) of a dilution of the round-1 STA product for 15 cycles. It used the tag-specific reverse primer at 1,000 nM and each of the 19,488 target-specific forward primers at 20 nM. The program was: initial polymerase activation at 95°C for 10 min; then 15 cycles of 95°C for 30 s, 65°C for 1 min, 60°C for 5 min, 65°C for 5 min and 72°C for 30 s; and a final extension at 72°C for 2 min.

Aliquots of the round-2 STA products were then amplified by standard PCR for 12 cycles with 1 μM tag-specific forward and barcoded reverse primers to generate barcoded sequencing libraries. An aliquot of each library was mixed with libraries bearing different barcodes and purified using a spin column.

In this way, 19,488 primers were used in a single-well reaction. The primers were designed to target SNPs found on chromosomes 1, 2, 13, 18, 21, X and Y. The amplicons were then sequenced using an Illumina GAIIx sequencer.

For the plasma samples:
- The sequencer generated approximately 10 million reads per sample.
- Of these, 9.4 to 9.6 million (94% to 96%) mapped to the genome.
- Of the mapped reads, 99.95% mapped to the targeted SNPs, with a mean read depth of 460 and a median read depth of 350. For comparison, a perfectly uniform distribution would give 10 million reads / 19,488 targets ≈ 513 reads per target.
- Primer dimers accounted for 30,000 sequenced reads (0.3% of the reads generated by the sequencer).

For the genomic DNA samples, 99.4% to 99.7% of reads mapped to the genome, and 99.99% of those mapped to the targeted SNPs. Primer dimers made up 0.1% of the reads generated by the sequencer.
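The read statistics above reduce to a few ratios. The sketch below reproduces the uniform-coverage benchmark (total reads divided by the number of targets) and the primer-dimer fraction. The mapped-read count of 9.5 million is a hypothetical midpoint of the reported 9.4 to 9.6 million range, and the function name is illustrative.

```python
def read_summary(total_reads: int, mapped_reads: int, on_target_fraction: float,
                 dimer_reads: int, n_targets: int) -> dict[str, float]:
    """Summarize one sequencing run of a multiplex PCR library."""
    return {
        "mapped_fraction": mapped_reads / total_reads,
        "on_target_reads": mapped_reads * on_target_fraction,
        "dimer_fraction": dimer_reads / total_reads,
        # depth every target would receive if reads were spread perfectly evenly
        "uniform_depth": total_reads / n_targets,
    }


stats = read_summary(total_reads=10_000_000, mapped_reads=9_500_000,
                     on_target_fraction=0.9995, dimer_reads=30_000, n_targets=19_488)
for name, value in stats.items():
    print(f"{name}: {value:,.3f}")
```

Against the roughly 513-read uniform benchmark, the reported mean depth of 460 is about 90% of the ideal.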

For a plasma sample with 10 million sequencing reads, typically at least 19,350 (99.3%) of the 19,488 targeted SNPs were amplified and sequenced. For a DNA sample with 2 million sequencing reads, typically at least 19,000 (97.5%) of the targeted SNPs were amplified and sequenced. The lower number may be due to sampling noise: with fewer reads, the sequencer misses some amplification products. If necessary, the number of sequencing reads can be increased to raise the number of targeted SNPs that are amplified and sequenced.
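The sampling-noise explanation can be illustrated with a simple model. The model is our assumption, not the patent's:
- Each target receives reads in proportion to a fixed amplification efficiency.
- Reads are drawn independently, so a target with read probability p is missed with probability (1 - p)^N after N reads.

The efficiencies below are hypothetical log-normal draws, so the absolute numbers are illustrative only.

```python
import math
import random


def expected_missed_targets(weights: list[float], n_reads: int) -> float:
    """Expected number of targets that receive zero of n_reads reads.

    Each read is assumed to land on target i independently with probability
    p_i = weights[i] / sum(weights), so target i is missed with probability
    (1 - p_i) ** n_reads. Requires more than one target (p_i < 1).
    """
    total = sum(weights)
    # exp(n * log1p(-p)) equals (1 - p) ** n but stays accurate for tiny p
    return sum(math.exp(n_reads * math.log1p(-w / total)) for w in weights)


rng = random.Random(0)
# Hypothetical, skewed amplification efficiencies for 19,488 targets.
weights = [rng.lognormvariate(0.0, 1.5) for _ in range(19_488)]

for n_reads in (2_000_000, 10_000_000):
    missed = expected_missed_targets(weights, n_reads)
    print(f"{n_reads:>10,} reads: about {missed:.1f} targets expected to be missed")
```

Under this model, the missed targets are almost entirely the least efficiently amplified loci, which is why adding reads recovers them.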

Related maternal and paternal genomic DNA samples were amplified in round-1 STA using the 19,488 semi-nested outer forward primers and the tagged reverse primers, each at 7.5 nM. The thermal cycling conditions and reaction composition for round-2 STA and for barcoding PCR were the same as in the hemi-nested protocol.

The mean fetal fraction of the 407 samples was 14.8%. The sequencing data were analyzed using the informatics methods disclosed herein. Ploidy states were called for four fetal chromosomes (13, 18, 21 and Y) in 378 of the 407 maternal plasma samples, and for the fetal X chromosome in 375 of the 407 samples. Ploidy was correctly called for all 1,887 chromosomes in the panel with greater than 90% confidence. Of the 1,887 calls, 1,882 had greater than 95% confidence and 1,862 had greater than 99% confidence.
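The call counts are consistent with the sample counts: four chromosomes in 378 samples plus the X chromosome in 375 samples gives 1,887 calls. A quick check of the arithmetic and the implied percentages:

```python
samples_13_18_21_y = 378  # samples with calls for chromosomes 13, 18, 21 and Y
samples_x = 375           # samples with a call for chromosome X
total_calls = 4 * samples_13_18_21_y + samples_x
print(total_calls)  # 1887

for label, n_calls in (("> 95% confidence", 1_882), ("> 99% confidence", 1_862)):
    print(f"{label}: {n_calls}/{total_calls} = {n_calls / total_calls:.1%}")
```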

A control experiment similar to the plasma PCR protocol was performed using water instead of DNA extracted from plasma. Across six such trials, 5% to 6% of the sequencing reads were primer dimers, and the remaining reads were attributable to background noise. Without a nucleic acid sample, the primers have no target loci to hybridize to and can hybridize only to other primers and form amplified primer dimers. This experiment shows that even under these conditions, relatively few primer dimers are formed.

Experiment 16

The following experiment illustrates an exemplary method for designing and selecting a library of primers that can be used in any of the multiplex PCR methods of the invention. The goal is to select, from an initial pool of candidate primers, primers that can simultaneously amplify a large number of target loci (or a subset of the target loci) in a single reaction. It is not necessary to design or select primers for every locus in the initial set of candidate target loci. Preferably, primers are designed and selected for most of the most desirable target loci.

Step 1

A set of candidate target loci (e.g., SNPs) is selected based on publicly available information about desired parameters of the target loci. Examples include the frequency of the SNPs within a target population or their heterozygosity (www.ncbi.nlm.nih.gov/projects/SNP/; Sherry ST, Ward MH, Kholodov M, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11, each of which is incorporated herein by reference in its entirety). For each candidate locus, one or more PCR primer pairs are designed using the Primer3 program (www.primer3.sourceforge.net; libprimer3 version 2.2.3, which is hereby incorporated by reference in its entirety). If there is no feasible PCR primer design for a particular target locus, that locus is eliminated from further consideration.

Optionally, a "target locus score" (higher scores indicate greater desirability) can be calculated for most or all of the target loci, for example as a weighted average of various desired parameters of the target locus. The parameters can be assigned different weights based on their importance for the particular application in which the primers will be used. Exemplary parameters include:
- the heterozygosity of the target locus;
- the prevalence of a disease associated with a sequence (e.g., a polymorphism) at the target locus;
- the penetrance of a disease associated with a sequence (e.g., a polymorphism) at the target locus;
- the specificity of the candidate primers used to amplify the target locus;
- the size of those candidate primers;
- the size of the target amplicon.
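The target locus score described above can be sketched as a plain weighted average. The parameter names, values and weights below are hypothetical. The sketch assumes each parameter has already been rescaled so that higher values are more desirable; for example, a large amplicon would map to a low value.

```python
def target_locus_score(params: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of locus parameters.

    Assumes every parameter is already rescaled to [0, 1] with higher values
    meaning more desirable.
    """
    total_weight = sum(weights[name] for name in params)
    return sum(weights[name] * value for name, value in params.items()) / total_weight


# Hypothetical parameter values and weights for one SNP locus.
params = {"heterozygosity": 0.5, "primer_specificity": 0.9, "amplicon_size": 0.8}
weights = {"heterozygosity": 2.0, "primer_specificity": 1.0, "amplicon_size": 1.0}
print(round(target_locus_score(params, weights), 3))
```

Doubling the weight on heterozygosity here reflects an application, such as aneuploidy detection, where informative SNPs matter most. This is an illustrative choice, not one the text prescribes.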

Step 2

For each primer from step 1, thermodynamic interaction scores are calculated between that primer and all primers for all other target loci. Each of the following references is hereby incorporated by reference in its entirety:
- Allawi, H.T. and SantaLucia, J., Jr. (1998), "Thermodynamics of Internal C-T Mismatches in DNA", Nucleic Acids Res. 26, 2694-2701;
- Peyret, N., Seneviratne, P.A., Allawi, H.T. and SantaLucia, J., Jr. (1999), "Nearest-Neighbor Thermodynamics and NMR of DNA Sequences with Internal A-A, C-C, G-G, and T-T Mismatches", Biochemistry 38, 3468-3477;
- Allawi, H.T. and SantaLucia, J., Jr. (1998), "Nearest-Neighbor Thermodynamics of Internal A-C Mismatches in DNA: Sequence Dependence and pH Effects", Biochemistry 37, 9435-9444;
- Allawi, H.T. and SantaLucia, J., Jr. (1998), "Nearest Neighbor Thermodynamic Parameters for Internal G-A Mismatches in DNA", Biochemistry 37, 2170-2179;
- Allawi, H.T. and SantaLucia, J., Jr. (1997), "Thermodynamics and NMR of Internal G-T Mismatches in DNA", Biochemistry 36, 10581-10594;
- MultiPLX 2.1 (Kaplinski L, Andreson R, Puurand T, Remm M. MultiPLX: automatic grouping and evaluation of PCR primers. Bioinformatics. 2005 Apr 15;21(8):1701-2).

This step yields a 2D matrix of interaction scores. The interaction score predicts the likelihood that the two interacting primers will form a primer dimer. The score is calculated as follows:

Interaction score = max(-ΔG_2, 0.8*(-ΔG_1))

where

ΔG_2 = the Gibbs energy (the energy required to break the dimer) of a dimer that can be extended by PCR at both ends (i.e., the 3′ end of each primer anneals to the other primer); and

ΔG_1 = the Gibbs energy of a dimer that can be extended by PCR at at least one end.
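The interaction score formula can be written as a one-line function. In practice the two ΔG values would come from a nearest-neighbor thermodynamic model such as those cited in step 2; here they are plain inputs. The function name and the kcal/mol unit are assumptions. Because stable dimers have negative ΔG, -ΔG is positive, and the 0.8 factor discounts the contribution from ΔG_1.

```python
def interaction_score(dg_2: float, dg_1: float) -> float:
    """Interaction score = max(-dG_2, 0.8 * (-dG_1)); higher means a likelier dimer.

    dg_2: Gibbs energy (assumed kcal/mol) of a dimer extendable at both ends.
    dg_1: Gibbs energy of a dimer extendable at at least one end.
    """
    return max(-dg_2, 0.8 * -dg_1)


print(round(interaction_score(-10.0, -14.0), 2))  # 11.2: the dG_1 term dominates
print(interaction_score(-12.0, -12.0))            # 12.0: the dG_2 term dominates
```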

Step 3

For each target locus, if more than one primer pair design exists, one design is selected as follows:

1 For each primer pair design for the locus, find the worst-case (highest) interaction score between either primer in that design and all primers of all designs for all other target loci.

2 Pick the design with the best (lowest) worst-case interaction score.
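Step 3 is a min-max selection, sketched below. The score table and primer names are hypothetical stand-ins for the step 2 interaction matrix. The `other_primers` argument is assumed to hold every primer of every design for all other loci.

```python
from typing import Callable, Sequence

Design = tuple[str, str]  # (forward primer, reverse primer)


def pick_design(designs: Sequence[Design],
                other_primers: Sequence[str],
                score: Callable[[str, str], float]) -> Design:
    """Step 3: pick the design whose worst-case interaction score is lowest."""
    def worst_case(design: Design) -> float:
        # highest interaction score between either primer and any other-locus primer
        return max((score(p, q) for p in design for q in other_primers), default=0.0)

    return min(designs, key=worst_case)


# Hypothetical precomputed scores against a single primer "x" from another locus.
table = {("f1", "x"): 3.0, ("r1", "x"): 9.0, ("f2", "x"): 5.0, ("r2", "x"): 4.0}
best = pick_design([("f1", "r1"), ("f2", "r2")], ["x"], lambda p, q: table[(p, q)])
print(best)  # ('f2', 'r2'): its worst case is 5.0, versus 9.0 for ('f1', 'r1')
```

Note that the design with the single best primer (f1, score 3.0) loses. The criterion looks only at each design's worst interaction.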

Step 4

Construct a graph in which each node represents a target locus and its associated primer pair design (cf. the maximal clique problem). Create an edge between every pair of nodes. Assign each edge a weight equal to the worst-case (highest) interaction score between the primers associated with the two nodes that the edge connects.

Step 5

Optionally, consider every pair of designs for two different target loci in which one primer from one design and one primer from the other design would anneal to overlapping target regions. For each such pair, add an extra edge between the nodes of the two designs. Set the weight of these edges equal to the highest weight assigned in step 4. Step 5 thus prevents the library from containing primers that anneal to overlapping target regions and would therefore interfere with each other during the multiplex PCR reaction.
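Steps 4 and 5 can be sketched with a dictionary keyed by unordered locus pairs. This is a simplification of the text: it keeps one edge per pair. An overlapping pair gets the larger of its step-4 weight and the maximum step-4 weight, instead of a second parallel edge. A true multigraph would also count that extra edge toward node degree in step 8. The loci and scores are hypothetical.

```python
from itertools import combinations


def build_graph(loci: list[str],
                worst_score: dict[frozenset, float],
                overlapping: set[frozenset]) -> dict[frozenset, float]:
    """Steps 4-5: weighted complete graph whose keys are unordered locus pairs."""
    # Step 4: an edge between every pair of loci, weighted by the worst-case
    # (highest) interaction score between their primers.
    edges = {frozenset(pair): worst_score[frozenset(pair)]
             for pair in combinations(loci, 2)}
    # Step 5 (optional): pairs whose primers anneal to overlapping regions get
    # the highest step-4 weight, so they always count as conflicting.
    top = max(edges.values(), default=0.0)
    for pair in overlapping:
        edges[pair] = max(edges[pair], top)
    return edges


scores = {frozenset("AB"): 2.0, frozenset("AC"): 7.0, frozenset("BC"): 4.0}
graph = build_graph(["A", "B", "C"], scores, overlapping={frozenset("AB")})
print({"".join(sorted(pair)): w for pair, w in graph.items()})
```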

Step 6

The initial interaction score threshold is calculated as follows:

weight_threshold = max(edge_weight) - 0.05*(max(edge_weight) - min(edge_weight))

where

max(edge_weight) is the maximum edge weight in the graph; and

min(edge_weight) is the minimum edge weight in the graph.

The initial bounds of the threshold are set as follows:

max_weight_threshold = max(edge_weight)

min_weight_threshold = min(edge_weight)
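Step 6 in code form is shown below; the function name is hypothetical. Starting 5% of the weight range below the maximum means that, in the first pass, only the most strongly interacting pairs are treated as conflicts.

```python
def initial_threshold(edge_weights: list[float]) -> tuple[float, float, float]:
    """Step 6: return (weight_threshold, min_weight_threshold, max_weight_threshold)."""
    hi, lo = max(edge_weights), min(edge_weights)
    return hi - 0.05 * (hi - lo), lo, hi


print(initial_threshold([2.0, 7.0, 4.0]))
```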

Step 7

Construct a new graph consisting of the same set of nodes as the graph from step 5, but including only the edges whose weight exceeds the weight threshold. This step therefore ignores interactions whose scores are equal to or lower than the weight threshold.

Step 8

Remove nodes (and all edges connected to the removed nodes) from the graph of step 7 until no edges remain. Nodes are removed by repeatedly applying the following procedure:

1 Find the node of highest degree (largest number of edges). If there is more than one, pick one arbitrarily.

2 Define a node set consisting of the node picked above and all nodes connected to it, excluding any node of lower degree than the picked node.

3 From the set defined in item 2, select the node with the lowest target locus score (lower scores indicate lower desirability). Remove that node from the graph.
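The step 8 removal loop can be sketched with an adjacency-set representation of the step 7 graph. The graph and scores are hypothetical. Ties in step 1 are broken by taking the first highest-degree node found, which the text allows ("pick one arbitrarily").

```python
def prune(adj: dict[str, set[str]], locus_score: dict[str, float]) -> set[str]:
    """Step 8: remove nodes until no edges remain; return the surviving nodes.

    adj is the symmetric step-7 graph as node -> set of neighbours.
    """
    adj = {node: set(nbrs) for node, nbrs in adj.items()}  # leave caller's graph intact
    while any(adj.values()):
        # 1. Highest-degree node; max() keeps the first one found on ties.
        hub = max(adj, key=lambda node: len(adj[node]))
        degree = len(adj[hub])
        # 2. The hub plus its neighbours, minus neighbours of lower degree.
        candidates = [hub] + [n for n in adj[hub] if len(adj[n]) >= degree]
        # 3. Drop the least desirable candidate (lowest target locus score).
        victim = min(candidates, key=lambda node: locus_score[node])
        for neighbour in adj.pop(victim):
            adj[neighbour].discard(victim)
    return set(adj)


# Hypothetical path graph A - B - C: B conflicts with both neighbours.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(sorted(prune(path, {"A": 0.9, "B": 0.5, "C": 0.8})))  # ['A', 'C']
```

Removing the hub B clears both conflicts at once. Removing A and C instead would have cost two loci.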

Step 9

If the number of nodes remaining in the graph matches the number of target loci required for the multiplex PCR pool (within an acceptable tolerance), continue the method at step 10.

If too many or too few nodes remain in the graph, perform a binary search to find the threshold that leaves the desired number of nodes. Lowering the threshold keeps more edges in step 7, so step 8 removes more nodes; raising it has the opposite effect. If there are too many nodes in the graph, adjust the weight threshold bounds as follows:

max_weight_threshold = weight_threshold

Otherwise (if there are too few nodes in the graph), adjust the weight threshold bounds as follows:

min_weight_threshold = weight_threshold

Then adjust the weight threshold as follows:

weight_threshold = (max_weight_threshold + min_weight_threshold)/2

Repeat steps 7-9.
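Steps 6 to 9 together form a binary search over the weight threshold. The sketch below is self-contained, so it repeats the step 8 pruning routine. Edges are a dictionary keyed by unordered locus pairs. The `max_iter` cap is an added assumption: with zero tolerance, the bisection might otherwise never terminate. All names and data are hypothetical.

```python
def prune(adj: dict[str, set[str]], locus_score: dict[str, float]) -> set[str]:
    """Step 8: repeatedly drop the weakest node among the highest-degree group."""
    adj = {node: set(nbrs) for node, nbrs in adj.items()}
    while any(adj.values()):
        hub = max(adj, key=lambda node: len(adj[node]))
        degree = len(adj[hub])
        candidates = [hub] + [n for n in adj[hub] if len(adj[n]) >= degree]
        victim = min(candidates, key=lambda node: locus_score[node])
        for neighbour in adj.pop(victim):
            adj[neighbour].discard(victim)
    return set(adj)


def select_loci(edges: dict[frozenset, float], locus_score: dict[str, float],
                target: int, tolerance: int = 0, max_iter: int = 50):
    """Steps 6-9: bisect the weight threshold until about `target` loci survive."""
    lo, hi = min(edges.values()), max(edges.values())
    threshold = hi - 0.05 * (hi - lo)  # step 6
    result = (set(), threshold)
    for _ in range(max_iter):
        # Step 7: keep only edges whose weight exceeds the threshold.
        adj = {node: set() for node in locus_score}
        for pair, weight in edges.items():
            if weight > threshold:
                a, b = tuple(pair)
                adj[a].add(b)
                adj[b].add(a)
        kept = prune(adj, locus_score)  # step 8
        result = (kept, threshold)
        # Step 9: stop if close enough, otherwise bisect the bounds.
        if abs(len(kept) - target) <= tolerance:
            break
        if len(kept) > target:
            hi = threshold  # too many loci: lower the threshold
        else:
            lo = threshold  # too few loci: raise the threshold
        threshold = (hi + lo) / 2
    return result


edges = {frozenset("AB"): 9.0, frozenset("AC"): 1.0, frozenset("AD"): 1.0,
         frozenset("BC"): 1.0, frozenset("BD"): 1.0, frozenset("CD"): 5.0}
scores = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.6}
kept, threshold = select_loci(edges, scores, target=2)
print(sorted(kept), round(threshold, 2))
```

With `target=2`, the first threshold (8.6) leaves three loci. The search then lowers the threshold to 4.8, which brings the A-B and C-D conflicts into play and leaves A and C.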

Step 10

The primer pair designs associated with the nodes remaining in the graph are selected for the primer library. This primer library can be used in any of the methods of the invention.

If desired, this method of designing and selecting primers can be applied to a primer library in which only one primer (rather than a primer pair) is used to amplify each target locus. In this case, each node represents one primer (rather than a primer pair) for each target locus.

Experiment 17

Figure 27 is a graph comparing two primer libraries designed using the methods of the invention. It shows the number of loci targeted by each primer library at each minor allele frequency. More primers were retained during selection of the "new pool" library. This library amplifies more target loci, especially loci with relatively high minor allele frequencies. Such loci provide more informative alleles for some methods of the invention, such as methods for detecting fetal chromosomal abnormalities.

These primer libraries were used in the following multiplex PCR method:
- Blood (20-40 mL) was collected from each subject into two to four CELL-FREE™ DNA tubes (Streck).
- Plasma (minimum 7 mL) was isolated from each sample using a double centrifugation protocol (2,000 g for 20 min, followed by 3,220 g for 30 min), with the supernatant transferred after the first spin.
- cfDNA was isolated from 7-20 mL of plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen) and eluted in 45 μL of TE buffer.
- Pure maternal genomic DNA was isolated from the buffy coat obtained after the first centrifugation. Pure paternal genomic DNA was similarly prepared from blood, saliva or buccal samples.

Maternal cfDNA, maternal genomic DNA and paternal genomic DNA samples were preamplified for 15 cycles using 11,000 target-specific assays. Aliquots were then transferred to a second, 15-cycle PCR reaction using nested primers. Finally, samples were prepared for sequencing by adding barcoded tags in a third, 12-cycle round of PCR. In this way, 11,000 targets were amplified in a single reaction; the targets included SNPs found on chromosomes 13, 18, 21, X and Y. The amplicons were then sequenced using an Illumina GAIIx or HISEQ sequencer. The parental genotypes were sequenced at a lower read depth than the fetal genotypes (approximately 20% of the cfDNA read depth).

Experiment 18

If desired, the size and quantity of PCR products can be analyzed using standard methods, for example with an Agilent Technologies 2100 Bioanalyzer (Figures 28A-28M). For example, the direct PCR method without nesting described herein was used in a 2,400-plex experiment (Figures 28B-28G) and a 19,488-plex experiment (Figures 28H-28M).
- Primer amount: 10 nM for Figures 28B-28D and 28H-28J; 1 nM for Figures 28E-28G and 28K-28M.
- Input DNA: 24 ng for Figures 28B, 28E, 28H and 28K; 80 ng for Figures 28C, 28F, 28I and 28L; 250 ng for Figures 28D, 28G, 28J and 28M.

More input DNA yielded a greater proportion of the desired 180 bp product. The peak at 140 bp is the primer dimer product.

Experiment 19

A proof-of-principle study demonstrated that T13, T18, T21, 45,X and 47,XXY were detected with equally high accuracy across all of these chromosomes.

Patients

Pregnant couples were enrolled at designated prenatal care centers under a protocol approved by an institutional review board, in accordance with local law. Inclusion criteria were age of at least 18 years, gestational age of at least nine weeks, singleton pregnancy, and signed informed consent. A blood sample was drawn from the pregnant mother, and a blood or buccal sample was collected from the father.

Samples were selected from the following pregnancies: 2 T13 (Patau syndrome), 2 T18 (Edwards syndrome), 2 T21 (Down syndrome), 2 45,X, 2 47,XXY and 90 normal pregnancies. A population of about 500 women was then tested for the chromosomal abnormalities detectable by the method. Where postnatal tissue from the child was available, a normal fetal karyotype was confirmed by molecular karyotyping of the sample. Euploid samples were drawn from low-risk women before invasive testing. Aneuploid samples were drawn at least 7 days after invasive testing, and the aneuploidy was confirmed by cytogenetic karyotyping or fluorescence in situ hybridization at an independent laboratory.

Sample preparation and multiplex PCR

For the data in Figures 30A-30E, 30G, 30H and 31A-31G, sample preparation and 19,488-plex PCR were performed as described in Experiment 15. For the data in Figure 30F, sample preparation and 11,000-plex PCR were performed as described in Experiment 17.

Methods and data analysis

The algorithm uses parental genotypes and crossover frequency data (such as data from the HapMap database) to calculate expected allele distributions at the 19,488 polymorphic loci. These distributions are calculated for a very large number of possible fetal ploidy states and at a range of fetal cfDNA fractions (Figures 29A-29C). Unlike allele-ratio-based methods, the algorithm also takes linkage disequilibrium into account. It uses a non-Gaussian data model to describe the expected distribution of allele measurements at a SNP, given the observed platform characteristics and amplification bias.

The algorithm then compares the predicted allele distributions with the actual allele distribution measured in the cfDNA sample (Figure 29C). From the sequencing data, it calculates the likelihood of each hypothesis (monosomy, disomy or trisomy, with multiple hypotheses for each based on different potential crossovers). The algorithm sums the likelihoods of the individual monosomy, disomy and trisomy hypotheses (Figure 29D). It calls the ploidy state with the greatest overall likelihood as the copy number, together with the fetal fraction (Figure 29E).

Although the laboratory researchers were not blinded to the sample karyotypes, the algorithm called ploidy states without human intervention and blind to the true result.
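The final aggregation step (summing sub-hypothesis likelihoods per ploidy state, then taking the largest) can be sketched as follows. The likelihood values are placeholders. Computing real likelihoods requires the full model described above (parental genotypes, crossover frequencies, linkage disequilibrium, non-Gaussian noise), which is not reproduced here. The normalized shares also assume equal prior weight on every sub-hypothesis. The method above draws on crossover frequency data, which this sketch ignores.

```python
from collections import defaultdict


def call_ploidy(sub_likelihoods: dict[tuple[str, str], float]) -> tuple[str, dict[str, float]]:
    """Sum sub-hypothesis likelihoods per ploidy state and call the largest.

    Keys are (ploidy_state, sub_hypothesis); values are likelihoods.
    Returned shares assume equal prior weight on every sub-hypothesis.
    """
    totals: dict[str, float] = defaultdict(float)
    for (ploidy, _sub), likelihood in sub_likelihoods.items():
        totals[ploidy] += likelihood
    norm = sum(totals.values())
    shares = {ploidy: total / norm for ploidy, total in totals.items()}
    return max(shares, key=shares.get), shares


# Placeholder likelihoods; the crossover labels are hypothetical.
example = {
    ("monosomy", "-"): 0.01,
    ("disomy", "-"): 0.30,
    ("trisomy", "crossover-1"): 0.12,
    ("trisomy", "crossover-2"): 0.11,
    ("trisomy", "crossover-3"): 0.10,
}
call, shares = call_ploidy(example)
print(call, {k: round(v, 3) for k, v in shares.items()})
```

The example shows why summing matters. The single most likely sub-hypothesis is disomic, but the trisomy sub-hypotheses together carry more total likelihood.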

Data interpretation

Graphical representation of the resulting data

To determine the ploidy state of the chromosomes of interest, the algorithm considers the distribution of sequence counts for each of the two possible alleles at 3,000 to 4,000 SNPs per chromosome. It is important to note that the algorithm makes ploidy calls using methods that do not lend themselves to visualization. For illustration, the data are therefore presented here in simplified form, as the ratio of the two most likely alleles, labeled A and B, so that the relevant trends can be visualized more easily. This simplified illustration does not capture some features of the algorithm. For example, two important aspects of the algorithm that cannot be illustrated with a plot of allele ratios are: 1) its ability to exploit linkage disequilibrium, that is, the effect that the measurement at one SNP has on the likely identity of neighboring SNPs; and 2) its use of a non-Gaussian data model to describe the expected distribution of allele measurements at a SNP given platform characteristics and amplification bias. Note also that the algorithm considers only the two most common alleles at each SNP and ignores any other possible alleles.

The plots in Figures 30A-30H include samples in which two, one, or three copies of a fetal chromosome are present; they indicate euploidy (Figures 30A-30C), monosomy (Figure 30D), and trisomy (Figures 30E-30H), respectively. In all plots, each point represents a single SNP, and the targeted SNPs of one chromosome are plotted in order from left to right along the horizontal axis. The vertical axis gives the number of reads for the A allele as a fraction of the total number of reads for the A and B alleles at that SNP. Note that the measurements were made on total cfDNA isolated from maternal blood, which includes both maternal and fetal cfDNA; each point therefore represents the combined fetal and maternal DNA contribution at that SNP. Consequently, changing the proportion of maternal cfDNA from 0% to 100% gradually shifts some points up or down the plot, depending on the maternal and fetal genotypes. This is described in more detail below with reference to the corresponding figures.

To aid visualization, the points can be color-coded by maternal genotype, because the maternal genotype contributes more to the position of each point and most trisomies are maternally inherited. Specifically, SNPs at which the maternal genotype is AA can be shown in red, SNPs at which it is AB in green, and SNPs at which it is BB in blue.
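A minimal sketch of how the plotted coordinates and colors described above could be derived from per-SNP read counts. The input layout (a list of `(a_reads, b_reads, maternal_genotype)` tuples in chromosome order) and the function name `plot_points` are assumptions for illustration; as in the text, reads for any allele other than A and B are ignored.

```python
from typing import List, Sequence, Tuple

# Color scheme from the text: maternal AA -> red, AB -> green, BB -> blue.
MATERNAL_COLOR = {"AA": "red", "AB": "green", "BB": "blue"}


def plot_points(snps: Sequence[Tuple[int, int, str]]) -> List[Tuple[int, float, str]]:
    """snps: (a_reads, b_reads, maternal_genotype) for one chromosome, in order.
    Returns (x, y, color) with y = A reads / (A reads + B reads)."""
    points = []
    for x, (a_reads, b_reads, mom) in enumerate(snps):
        total = a_reads + b_reads
        if total == 0:
            continue  # no A or B reads at this SNP: nothing to plot
        points.append((x, a_reads / total, MATERNAL_COLOR[mom]))
    return points


print(plot_points([(95, 5, "AA"), (50, 50, "AB"), (0, 0, "AB"), (3, 97, "BB")]))
```

The x coordinate keeps the SNP's position along the chromosome even when an uncovered SNP is skipped, so gaps in coverage remain visible as gaps in the plot.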

In all cases, SNPs at which both the mother and the fetus are homozygous for the A allele (AA) lie close to the upper limit of the plot: the fraction of A-allele reads is high because no B allele should be present. Conversely, SNPs at which both the mother and the fetus are homozygous for the B allele lie close to the lower limit of the plot, because the fraction of A-allele reads is low when only the B allele should be present. Points that do not lie close to the upper or lower limit represent SNPs at which the mother, the fetus, or both are heterozygous; these points are useful for identifying fetal ploidy and can also be informative for distinguishing paternal from maternal inheritance. These points separate according to the maternal and fetal genotypes and the fetal fraction, so the precise position of each point along the y-axis depends on the stoichiometry and the fetal fraction. For example, loci at which the mother is AA and the fetus is AB are expected to have a different fraction of A-allele reads, and hence a different position along the y-axis, depending on the fetal fraction.
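To make the dependence on stoichiometry and fetal fraction concrete, a simple mixture model (an illustrative assumption, not taken from the source, that ignores amplification bias and read-sampling noise) gives the expected fraction of A-allele reads at a SNP as follows, where f is the fetal fraction, m_A ∈ {0, 1, 2} is the number of A alleles in the maternal genotype, c_A is the number of A alleles carried by the fetus, and n is the fetal copy number of the chromosome:

$$
E[\text{A-read fraction}] = \frac{(1-f)\,m_A + f\,c_A}{2(1-f) + n f},
\qquad
m_A = 2,\; c_A = 1,\; n = 2 \;\Rightarrow\; \frac{2(1-f) + f}{2} = 1 - \frac{f}{2}.
$$

For the example in the text (mother AA, fetus AB, disomic chromosome), the point therefore sits at 1 - f/2 and moves further below the top of the plot as the fetal fraction grows.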

Presence of two chromosomes

Figures 30A-30C depict data indicating the presence of two chromosomes when the sample is entirely maternal (no fetal cfDNA present, Figure 30A), contains a moderate fetal cfDNA fraction (Figure 30B), or contains a high fetal cfDNA fraction (Figure 30C).

Figure 30A shows data obtained from cfDNA isolated from the blood of a non-pregnant woman. When no fetal cfDNA is present and the sample contains only maternal cfDNA, the plot reflects only the euploid maternal genotypes. The characteristic pattern consists of clusters of points: a red cluster close to the top of the plot (SNPs at which the maternal genotype is AA), a blue cluster close to the bottom (SNPs at which the maternal genotype is BB), and a single central green cluster (SNPs at which the maternal genotype is AB) (colors not shown in the figure).

When fetal cfDNA is present, the points shift so that the clusters separate into discrete "bands". Note that groupings of points are called "clusters" for samples with a fetal fraction of 0% (as in Figure 30A), and "bands" for all samples with a fetal fraction greater than 0% (as in Figures 30B-30J). These discrete bands are easily seen when the fetal fraction is high enough. Specifically, Figures 30B and 30C show the characteristic pattern associated with two fetal chromosomes at moderate and high fetal fractions, respectively. This pattern consists of three central green bands, corresponding to SNPs at which the mother is heterozygous, and two "edge" bands each at the top (red) and bottom (blue) of the plot, corresponding to SNPs at which the mother is homozygous (colors not shown in the figure).

Figure 30B shows data obtained from cfDNA isolated from a plasma sample of a woman carrying a euploid fetus, with a fetal cfDNA fraction of 12%. Here, the clusters of points near the top and the bottom of the plot have each split into two discrete bands: an outer edge band (one red, one blue) that remains close to the upper or lower limit of the plot, and an inner edge band (one red, one blue) that has moved away from that limit (colors not shown in the figure). These inner edge bands, centered at 0.92 and 0.08, represent SNPs at which the maternal genotype is AA and the fetal genotype is AB (red), and SNPs at which the maternal genotype is BB and the fetal genotype is AB (blue), respectively. The central cluster of green points broadens, but its separation into distinct bands is not easily seen at this fetal fraction.

At high fetal cfDNA fractions, the typical pattern indicating the presence of two chromosomes (a set of three green bands plus two red and two blue edge bands) is easily seen (colors not shown in the figure). Figure 30C shows data obtained from a plasma sample of a woman carrying a euploid fetus, with a fetal cfDNA fraction of 26%. Here, the edge bands have separated so that the inner bands are shifted toward the center of the plot, because the level of the B allele changes as the fetal cfDNA fraction increases. Notably, at this higher fetal fraction the central green cluster is easily seen to have split into three distinct bands. In this case, the three central bands cluster around 0.63, 0.50, and 0.37, corresponding to SNPs at which the maternal genotype is AB and the fetal genotype is AA (top), AB (middle), and BB (bottom), respectively.
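Applying a simple mixture model (assumed for illustration; no amplification bias or sampling noise) to a disomic fetal chromosome reproduces the three central bands reported for Figure 30C. The sketch below, whose function names are hypothetical, enumerates the fetal genotypes compatible with each maternal genotype (one maternal allele plus either paternal allele) and computes each band's expected A-allele fraction.

```python
# Maternal alleles per genotype, encoded as 1 = A allele, 0 = B allele.
MATERNAL = {"AA": (1, 1), "AB": (1, 0), "BB": (0, 0)}


def expected_a_fraction(mom_a, fetal_a, fetal_copies, f):
    """Expected A-read fraction under an assumed mixture model: maternal DNA
    carries 2 copies with weight (1 - f), fetal DNA carries `fetal_copies`
    copies with weight f."""
    return ((1 - f) * mom_a + f * fetal_a) / (2 * (1 - f) + f * fetal_copies)


def disomy_bands(f):
    """Expected band centers for a disomic fetal chromosome, keyed by maternal genotype."""
    bands = {}
    for mom, alleles in MATERNAL.items():
        # The fetus carries one maternal allele plus one paternal allele (A or B).
        fetal_a_counts = {m + p for m in alleles for p in (1, 0)}
        bands[mom] = sorted(
            round(expected_a_fraction(sum(alleles), c, 2, f), 3) for c in fetal_a_counts
        )
    return bands


print(disomy_bands(0.26))
```

With f = 0.26 the maternal-AB bands fall at 0.37, 0.50, and 0.63, matching the values given above; the same model output also includes the outer edge bands at exactly 1 and 0 and the corresponding inner edge bands.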

These characteristic patterns, three green bands and four edge bands (two red and two blue), indicate the presence of two chromosomes, as for autosomes in a euploid fetus or for the X chromosome in a female (XX) fetus.

Presence of one chromosome

When the fetus has inherited only a single chromosome, and therefore only a single allele, fetal heterozygosity is impossible; the only possible fetal SNP identities are A or B. The characteristic pattern of a maternally inherited monosomy therefore has two central green bands, representing SNPs at which the mother is heterozygous, and only a single red and a single blue edge band, representing SNPs at which the mother is homozygous; these remain close to the upper and lower limits of the plot (1 and 0), respectively (Figure 30D; colors not shown in the figure). Note that there are no inner edge bands. This pattern indicates the presence of one chromosome, as in a maternally inherited autosomal monosomy or for the X chromosome in a male (XY) fetus.
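The same illustrative mixture model (repeated here so the sketch stands alone; function names are hypothetical and amplification bias is ignored) shows why a maternally inherited monosomy yields two central bands and no inner edge bands: with a single maternally derived copy, the fetus can only carry one of the mother's own alleles.

```python
# Maternal alleles per genotype, encoded as 1 = A allele, 0 = B allele.
MATERNAL = {"AA": (1, 1), "AB": (1, 0), "BB": (0, 0)}


def expected_a_fraction(mom_a, fetal_a, fetal_copies, f):
    """Expected A-read fraction under an assumed mixture model."""
    return ((1 - f) * mom_a + f * fetal_a) / (2 * (1 - f) + f * fetal_copies)


def maternal_monosomy_bands(f):
    """Band centers when the fetus carries one maternally derived chromosome."""
    bands = {}
    for mom, alleles in MATERNAL.items():
        # One copy only: the fetal A count is simply one of the mother's alleles.
        fetal_a_counts = set(alleles)
        bands[mom] = sorted(
            round(expected_a_fraction(sum(alleles), c, 1, f), 3) for c in fetal_a_counts
        )
    return bands


print(maternal_monosomy_bands(0.2))
```

At f = 0.2 the maternal-AB SNPs split into bands near 0.444 and 0.556, while the AA and BB SNPs stay at exactly 1 and 0, so no inner edge band appears.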

Presence of three chromosomes

Trisomic chromosomes show three characteristic patterns. The first represents a maternally inherited meiotic trisomy, in which, through a meiotic error, the fetus inherits two homologous, non-identical chromosomes from the mother (Figure 30E); this pattern has two central green bands and two edge bands each of red and blue (colors not shown in the figure). The second represents a paternally inherited meiotic trisomy, in which the fetus inherits two homologous, non-identical chromosomes from the father (Figure 30F); this pattern has four central green bands and three red and three blue edge bands (colors not shown in the figure). The third represents a mitotic trisomy inherited maternally (Figure 30G) or paternally (Figure 30H), in which, through a mitotic error, the fetus inherits two identical chromosomes from the mother or the father; this pattern has four central green bands and two edge bands each of red and blue. Maternally and paternally inherited mitotic trisomies can be distinguished by the position of the edge bands: the red and blue inner edge bands (those not close to the limits of the plot) lie closer to the center of the plot for a paternally inherited mitotic trisomy (colors not shown in the figure). This is due to the paternal contribution of the identical chromosomes. Note that our previous results showed that, at the blastomere stage, 66.7% of maternally inherited trisomies were meiotic, and only 10.2% of trisomies were paternally inherited.
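The band counts listed above for the four trisomy origins can be checked by enumeration. In this sketch (hypothetical names; a simple mixture model assumed for illustration, with no amplification bias) the fetus receives three copies whose composition depends on the origin of the error, and the bands for each maternal genotype are pooled over all possible paternal genotypes.

```python
# Alleles per genotype, encoded as 1 = A allele, 0 = B allele.
ALLELES = {"AA": (1, 1), "AB": (1, 0), "BB": (0, 0)}


def expected_a_fraction(mom_a, fetal_a, fetal_copies, f):
    """Expected A-read fraction under an assumed mixture model."""
    return ((1 - f) * mom_a + f * fetal_a) / (2 * (1 - f) + f * fetal_copies)


def fetal_a_counts(mom, dad, origin):
    """Possible numbers of A alleles among the fetus's three copies."""
    m, d = ALLELES[mom], ALLELES[dad]
    if origin == "maternal_meiotic":   # both maternal homologs + one paternal allele
        return {sum(m) + p for p in d}
    if origin == "paternal_meiotic":   # one maternal allele + both paternal homologs
        return {a + sum(d) for a in m}
    if origin == "maternal_mitotic":   # two copies of one maternal homolog + one paternal
        return {2 * a + p for a in m for p in d}
    if origin == "paternal_mitotic":   # one maternal allele + two copies of one paternal homolog
        return {a + 2 * p for a in m for p in d}
    raise ValueError(f"unknown origin: {origin}")


def trisomy_bands(origin, f):
    """Band centers per maternal genotype, pooled over every paternal genotype."""
    bands = {}
    for mom in ALLELES:
        counts = set()
        for dad in ALLELES:
            counts |= fetal_a_counts(mom, dad, origin)
        mom_a = sum(ALLELES[mom])
        bands[mom] = sorted(round(expected_a_fraction(mom_a, c, 3, f), 3) for c in counts)
    return bands


for origin in ("maternal_meiotic", "paternal_meiotic", "maternal_mitotic", "paternal_mitotic"):
    bands = trisomy_bands(origin, 0.2)
    print(origin, {mom: len(v) for mom, v in bands.items()})
```

The band counts per maternal genotype (AA/AB/BB) come out as 2/2/2 for a maternal meiotic error, 3/4/3 for a paternal meiotic error, and 2/4/2 for either mitotic origin, in line with the patterns described above. At f = 0.2 the inner red band sits near 0.818 for a paternal mitotic trisomy versus about 0.909 for a maternal one, i.e. closer to the center.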

For the Y chromosome, the PS method considers a different set of hypotheses: the presence of zero, one, or two copies. Because there is no maternal contribution to the sequence reads at any locus, and because heterozygous loci are impossible (two Y chromosomes must be two identical chromosomes), the bands remain close to the top (A allele) or bottom (B allele) of the plot (data not shown), and the analysis relies on quantitative allele count data, which greatly simplifies it. Note that because the method interrogates SNPs, it uses non-recombining SNPs on the Y chromosome that are homologous to the X chromosome, so data on both X and Y are obtained with a single probe pair.

Identifying aneuploidy

Given a sufficient fetal fraction, identifying autosomal aneuploidies with this plot-based visualization is straightforward and requires only recognizing the abnormal number of chromosomes indicated by the plot, as described above. Sex chromosome aneuploidies are identified by combining the copy numbers of the X and Y chromosomes. Specifically, the plot for a fetus with karyotype 47,XXX will show the typical "three chromosome" pattern, and the plot for a fetus with karyotype 47,XXY will show the typical "two chromosome" pattern for the X chromosome together with allele reads indicating the presence of a Y chromosome. The method can similarly call 47,XYY, where a "one chromosome" pattern indicates a single X chromosome and the allele reads indicate two Y chromosomes. A fetus with karyotype 45,X will show the typical "one chromosome" pattern for the X chromosome, along with data indicating zero Y chromosomes.
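Combining the per-chromosome copy-number calls into a reported karyotype is simple bookkeeping. The hypothetical helper below assumes, as the Results describe, that any autosome not called otherwise is disomic, so only the X and Y copy numbers and any called autosomal trisomies change the string.

```python
from typing import Sequence


def karyotype(x_copies: int, y_copies: int, trisomies: Sequence[int] = ()) -> str:
    """Build a karyotype string from called copy numbers (hypothetical helper).
    Autosomes not listed in `trisomies` are assumed disomic (44 autosomes)."""
    total = 44 + x_copies + y_copies + len(trisomies)
    text = f"{total}," + "X" * x_copies + "Y" * y_copies
    for chrom in trisomies:
        text += f",+{chrom}"
    return text


for x, y in [(2, 0), (1, 1), (1, 0), (3, 0), (2, 1), (1, 2)]:
    print(x, y, karyotype(x, y))
print(karyotype(1, 1, trisomies=[21]))
```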

The effect of fetal fraction

As discussed above, the number of sequence reads from the fetus contributes to the precise position of each point along the y-axis. Because the fetal fraction determines the proportions of reads derived from the fetus and from the mother, it also affects the position of each point. At high fetal cfDNA fractions (generally above about 20%), as in Figures 30C-30E, 30G, and 30H, it is clear that although the clusters of points are determined mainly by the maternal genotype, the presence of fetal DNA carrying alleles that differ from the maternal genotype shifts the clusters into multiple distinct bands. As the fetal fraction decreases (as in Figures 30B and 30F), however, the points converge toward the limits and the center of the plot, producing tighter clusters. Specifically, the red edge bands (maternal genotype AA) converge toward the top of the plot, the blue edge bands (maternal genotype BB) converge toward the bottom, and the central green bands (mother heterozygous) compress into a single cluster at the center of the plot (compare Figures 30B and 30C; colors not shown in the figure). Although aneuploidy is not easy to see with this visualization at low fetal fractions, the algorithm can identify ploidy states at very low fetal fractions, for example a fetal fraction of 3%. It can do this because its statistical techniques compare the observed data with highly accurate data models that predict the allele distribution for a given set of sample parameters, including, for example, copy number, parental genotypes, and fetal fraction. Data model accuracy is critical at low fetal fractions, because the difference between the allele distributions of different ploidy states is proportional to the fetal fraction. In addition, the algorithm can determine when a data set contains insufficient data for a confident fetal ploidy determination.
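The point that the separation between ploidy hypotheses shrinks with fetal fraction can be illustrated numerically. Under a simple mixture model assumed here for illustration (hypothetical names, no amplification bias), the sketch computes, at SNPs where the mother is AB, the smallest distance between any disomy band and any maternal-meiotic trisomy band.

```python
def expected_a_fraction(mom_a, fetal_a, fetal_copies, f):
    """Expected A-read fraction under an assumed mixture model."""
    return ((1 - f) * mom_a + f * fetal_a) / (2 * (1 - f) + f * fetal_copies)


def band_gap(f):
    """Smallest distance between a disomy band and a maternal-meiotic trisomy
    band at SNPs where the mother is AB (one A allele)."""
    disomy = [expected_a_fraction(1, c, 2, f) for c in (0, 1, 2)]   # fetus BB, AB, AA
    trisomy = [expected_a_fraction(1, c, 3, f) for c in (1, 2)]     # fetus ABB, AAB
    return min(abs(d - t) for d in disomy for t in trisomy)


for f in (0.03, 0.10, 0.20):
    gap = band_gap(f)
    print(f"f = {f:.2f}: gap = {gap:.4f}, gap / f = {gap / f:.3f}")
```

Under this model the gap grows roughly in proportion to f (gap / f stays between about 0.23 and 0.25 over this range), so at a 3% fetal fraction the competing band positions differ by well under 0.01 in A-allele fraction; telling them apart then depends on the accuracy of the data model rather than on visual inspection.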

Results

Sequencing reads that mapped to the targeted SNPs were considered informative and were used by the algorithm. More than 95% of the target loci were observed in the sequencing results. Plots visualizing key ploidy calls are shown in Figures 31A-31G. Figure 31A represents a euploid sample. Here, chromosomes 13, 18, and 21 show the typical "two chromosome" pattern described herein: a set of three central green bands plus two red and two blue edge bands. Together with the two central green bands for the X chromosome and the Y-chromosome bands along the edges of the plot, this indicates a euploid XY karyotype (colors not shown in the figure).

The plots in Figures 31B, 31C, and 31D show the most common autosomal trisomies, T13, T18, and T21, respectively. Figure 31B depicts a T13 sample. Here, chromosomes 18 and 21 show the typical "two chromosome" pattern, the X chromosome shows the typical "one chromosome" pattern, and reads from the Y chromosome are present. Together, these indicate disomy of chromosomes 18 and 21 and a fetal XY karyotype. Chromosome 13, however, shows the typical "three chromosome" pattern. Similarly, Figure 31C depicts a T18 sample and Figure 31D a T21 sample.

The method can also detect sex chromosome aneuploidies, including 45,X (Figure 31E), 47,XXY (Figure 31F), and 47,XYY (Figure 31G). Note that the method calls copy number for chromosomes 13, 18, 21, X, and Y; the total chromosome number is reported on the assumption that the remaining chromosomes are disomic. The X-chromosome region of the plot for the 45,X sample shows the presence of a single chromosome. Together with the absence of reads from the Y chromosome and the "two chromosome" pattern for chromosomes 13, 18, and 21, this indicates a 45,X karyotype. In contrast, the plot for the 47,XXY sample shows the presence of two X chromosomes, and the data also show allele reads from the Y chromosome. Together with the presence of two copies each of chromosomes 13, 18, and 21, this indicates a 47,XXY karyotype. A "one chromosome" pattern for the X chromosome, together with reads indicating two Y chromosomes, indicates a 47,XYY karyotype.

Discussion

This method detects T13, T18, T21, 45,X, 47,XXY, and 47,XYY non-invasively from maternal blood. It interrogates cfDNA from maternal plasma by targeted multiplex PCR amplification of 19,488 SNPs followed by high-throughput sequencing. Combined with a sophisticated informatics analysis that takes into account parental genotype information and many sample parameters, including fetal fraction and DNA quality, this detects the fetal signal more robustly and yields highly accurate ploidy calls for all five chromosomes involved in the seven most common aneuploidies at birth (T13, T18, T21, 45,X, 47,XXX, 47,XXY, and 47,XYY). The method offers several clinical advantages over previous methods, most notably greater clinical coverage and a sample-specific calculated accuracy (similar to a personalized risk score).

Increased clinical coverage

Because it can accurately detect both autosomal trisomies and sex chromosome aneuploidies, this method provides roughly twice the aneuploidy coverage of clinically available NIPT methods. The method presented here is the only non-invasive test that calls sex chromosome ploidy with high accuracy. Previous DNA mixing experiments and the individual plasma samples analyzed in our study suggest that the method will detect a broader range of sex chromosome abnormalities, including 47,XXX. The method also detects aneuploidies of chromosomes 13, 18, and 21 with high sensitivity and specificity, and with appropriate primer design it is expected to detect copy number on the remaining chromosomes as well.

Sample-specific calculated accuracy

Notably, this method calculates a sample-specific accuracy for the ploidy call on each chromosome of each sample. By identifying and flagging individual samples with poor-quality DNA or a low fetal fraction, which are likely to yield test results of poor accuracy, the calculated accuracy is expected to significantly reduce the rate of incorrect calls. In contrast, methods based on massively parallel shotgun sequencing (MPSS) use a single-hypothesis rejection test to produce a positive or negative call, and their accuracy estimates are based on the characteristics of a published study population rather than of the individual sample, on the assumption that each sample has the same accuracy as that population. However, the accuracy for an individual sample whose parameters lie in the tails of the population distribution can differ substantially, and this is exacerbated at low fetal fractions, such as early in gestation, or for samples with low DNA quality. These samples are generally not identified and flagged for follow-up testing, which can lead to missed calls. The present method, in contrast, takes multiple parameters, including fetal fraction and several DNA quality metrics, into account when making each chromosome copy number call, and calculates the sample-specific accuracy of that call. This allows the method to identify individual samples with low accuracy and flag them for follow-up testing, which is expected to virtually eliminate missed calls, especially in the first trimester, when fetal fractions are typically low. A no-call is presumably far preferable to a missed call, because a no-call requires only a redraw and reanalysis.
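One way to turn hypothesis likelihoods into a per-sample accuracy and a flag for low-confidence samples is to normalize them into posterior probabilities and withhold the call (a no-call) when the best posterior falls below a threshold. This is a hedged sketch: treating the calculated accuracy as a posterior probability, the flat default prior, the 0.99 threshold, and the function name `call_with_confidence` are all assumptions, not details taken from the source.

```python
import math
from typing import Dict, Optional, Tuple


def call_with_confidence(class_loglik: Dict[str, float],
                         priors: Optional[Dict[str, float]] = None,
                         threshold: float = 0.99) -> Tuple[str, float]:
    """Convert class log-likelihoods into posterior probabilities and return
    (call, posterior of the best class); return "no call" below `threshold`."""
    names = list(class_loglik)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}  # assumed flat prior
    log_post = {name: class_loglik[name] + math.log(priors[name]) for name in names}
    top = max(log_post.values())
    norm = sum(math.exp(v - top) for v in log_post.values())
    posterior = {name: math.exp(v - top) / norm for name, v in log_post.items()}
    best = max(posterior, key=posterior.get)
    if posterior[best] < threshold:
        return "no call", posterior[best]  # flag the sample for redraw and reanalysis
    return best, posterior[best]


print(call_with_confidence({"disomy": 0.0, "trisomy": -10.0}))
print(call_with_confidence({"disomy": 0.0, "trisomy": -1.0}))
```

The first sample yields a disomy call with a posterior above 0.9999, while the second, whose hypotheses are far less separated, is returned as a no-call with a posterior of about 0.73.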

Converting calculated accuracy to a traditional risk score

This method can provide an adjusted risk of aneuploidy for high-risk pregnancies, where the adjusted risk takes the prior risk into account (Benn P, Cuckle H, Pergament E. Non-invasive prenatal diagnosis for Down syndrome: the paradigm will shift, but slowly. Ultrasound Obstet Gynecol 2012;39:127-130, which is hereby incorporated herein by reference in its entirety). Although the present method provides a calculated accuracy tailored to each patient, for clinical use these accuracies can be converted into traditional risk scores, which also express the risk of an aneuploid pregnancy, but as a fraction. Traditional risk scores take into account various parameters, including the maternal age-related risk and the serum levels of biochemical markers, to give a risk score above which the mother is considered high risk and is advised to undergo a follow-up invasive diagnostic procedure. This method significantly refines that risk score, thereby reducing the false-positive and false-negative rates and providing a more accurate assessment of the individual mother's risk. The calculated accuracy, as used here, is the likelihood that the ploidy call is correct, expressed as a percentage; the calculated accuracy used in Experiment 19 does not include age-related risk. Because risk scores usually include age-related risk, calculated accuracies and traditional risk scores are not interchangeable; the two must be combined to obtain a traditional risk score. The formula for combining age-related risk and calculated accuracy is:

where R₁ is the risk score calculated by the present method and R₂ is the risk score calculated by first-trimester screening.

SNP-based methods avoid the problem of amplification variability

An inherent drawback of the counting approach used by some other methods is that they determine fetal ploidy status by measuring the ratio of the number of reads that map to the chromosome of interest (e.g., chromosome 21) to the number that map to a reference chromosome. Chromosomes with high or low GC content, including chromosomes 13, X, and Y, show high variability in amplification. This produces signal variation comparable in magnitude to the fetal cfDNA signal, which confounds copy number calls by altering the ratio of reads from the chromosome of interest to reads from the reference chromosome, and results in low accuracy for chromosomes 13, X, and Y. Notably, this problem is exacerbated at low fetal cfDNA fractions, as are often seen early in gestation.
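A small worked example of why amplification variability confounds read-count ratios. Under a simple dosage model assumed here (hypothetical helper name and read numbers), a trisomy at a 10% fetal fraction raises the share of reads from the chromosome of interest by 5%, exactly the change produced by a 5% amplification bias on a euploid sample.

```python
import math


def target_reads(base_reads, f, fetal_copies=2, amp_bias=1.0):
    """Reads from the chromosome of interest under an assumed dosage model:
    maternal dosage 2 (weight 1 - f) and fetal dosage `fetal_copies`
    (weight f), scaled by a chromosome-specific amplification bias."""
    dosage = ((1 - f) * 2 + f * fetal_copies) / 2
    return base_reads * dosage * amp_bias


BASE, REFERENCE = 10_000, 100_000  # hypothetical read counts
f = 0.10

trisomy = target_reads(BASE, f, fetal_copies=3) / REFERENCE
euploid_biased = target_reads(BASE, f, amp_bias=1.05) / REFERENCE
print(trisomy, euploid_biased, math.isclose(trisomy, euploid_biased))
```

Both scenarios give a ratio of about 0.105, so a counting method cannot tell them apart without an independent estimate of the chromosome's amplification efficiency.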

In contrast, SNP-based methods do not depend on uniform amplification levels across chromosomes and are therefore expected to give equally accurate results for all chromosomes. Because the present method looks, in part, at the relative counts of the different alleles at polymorphic loci, which by definition differ by only a single nucleotide, it does not require a reference chromosome, and this avoids the chromosome-to-chromosome amplification variability inherent in methods that rely on quantifying read counts. Unlike quantitative methods that require a euploid reference chromosome, the present method is expected to detect triploidy as well as copy-number-neutral abnormalities such as uniparental disomy.
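By contrast, the allele fraction at a single SNP is insensitive to how efficiently that locus amplifies. The sketch assumes, for illustration, that the A and B alleles at a SNP amplify with the same efficiency because they differ by one base; it scales both counts by a locus-specific factor and shows that the total depth changes while the A-allele fraction does not. Function names are hypothetical.

```python
import math


def a_fraction(a_reads, b_reads):
    """Fraction of A-allele reads at one SNP."""
    return a_reads / (a_reads + b_reads)


def amplify(a_reads, b_reads, locus_efficiency):
    """Scale both alleles by the same locus-specific efficiency (assumption:
    alleles differing by a single base amplify equally)."""
    return a_reads * locus_efficiency, b_reads * locus_efficiency


for eff in (0.4, 1.0, 2.5):
    a, b = amplify(63, 37, eff)
    print(f"efficiency {eff}: depth {a + b:.0f}, A fraction {a_fraction(a, b):.2f}")
```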

The importance of early detection

Notably, the combined birth prevalence of sex chromosome aneuploidies is higher than that of the most common autosomal aneuploidies (Figure 32). However, no routine non-invasive screening method that reliably detects sex chromosome abnormalities currently exists. Sex chromosome abnormalities are therefore generally detected prenatally only as a by-product of routine testing for Down syndrome or other autosomal aneuploidies, and a large proportion of cases are missed entirely. Early and accurate detection is critical for many of these disorders, in which early therapeutic intervention improves clinical outcomes. Turner syndrome, for example, is often not diagnosed until adolescence, although its overall birth prevalence is 1 in 2,500 females. Growth hormone therapy is known to prevent the short stature caused by the condition, but treatment is significantly more effective when started before the age of 4 years. Likewise, estrogen replacement therapy can induce secondary sexual characteristics in patients with Turner syndrome, but, again, treatment must begin before the age of ten, earlier than the syndrome is usually detected. Taken together, this underscores the importance of early, routine, and safe detection of sex chromosome aneuploidies. This method is the first with the potential to serve as a routine screen for sex chromosome abnormalities.

Additional applications

Because this method uses targeted amplification, it is uniquely positioned to detect submicroscopic abnormalities such as microdeletions and microduplications. Although non-targeted approaches such as MPSS have been shown to detect DiGeorge microdeletion syndrome, doing so requires such a high level of genome coverage that the approach is impractical. This is because non-targeted amplification is orders of magnitude less efficient for submicroscopic regions, as only a very small fraction of the sequencing reads will be informative. In addition, the fact that currently available methods have difficulty accurately identifying the ploidy state of the sex chromosomes suggests that they will also suffer from variable amplification over smaller chromosome segments.

Similarly, SNP-based methods can detect uniparental disomy (UPD) disorders. These are copy-number-neutral abnormalities that cannot be detected either by current non-invasive methods that rely on counting or by traditional invasive methods (such as amniocentesis and CVS) that rely on cytogenetic karyotyping and/or fluorescence in situ hybridization. This is because SNP-based methods can uniquely distinguish individual haplotypes, whereas the clinically available MPSS-based and targeted methods amplify non-polymorphic loci and therefore cannot determine, for example, whether both copies of a chromosome of interest are derived from the same parent. As a result, these microdeletion/microduplication and UPD syndromes, including Prader-Willi syndrome, Angelman syndrome, and Beckwith-Wiedemann syndrome, generally cannot be diagnosed prenatally and are often initially misdiagnosed after birth, which significantly delays therapeutic intervention. In addition, because this approach targets SNPs, it will also facilitate reconstruction of parental haplotypes, allowing detection of fetal inheritance at individual disease-linked loci (Kitzman JO, Snyder MW, Ventura M, et al. Noninvasive whole-genome sequencing of a human fetus. Sci Transl Med 2012;4:137ra76, hereby incorporated herein by reference in its entirety).

The results presented here demonstrate the extended scope of this approach for identifying prenatal aneuploidies. Specifically, by amplifying and sequencing 19,488 SNPs, this method can determine copy number for chromosomes 13, 18, 21, X, and Y, and is uniquely expected to detect other chromosomal abnormalities, such as triploidy and UPD, that cannot be detected by any other clinically available non-invasive method. The increased clinical coverage and the robust, sample-specific calculated accuracy suggest that this approach could provide a viable adjunct to invasive testing for the detection of fetal chromosomal aneuploidy.

All patents, published applications, and published references cited herein are hereby incorporated by reference in their entirety. Although the methods of the invention have been described in connection with specific embodiments thereof, it will be understood that they are capable of further modification. This application is intended to cover any variations, uses, or adaptations of the methods of the invention, including such departures from the present disclosure as come within known or customary practice in the art to which the methods pertain and as fall within the scope of the appended claims. For example, any of the methods disclosed herein for use with DNA can readily be adapted for use with RNA by including a reverse transcription step to convert the RNA to DNA. Where necessary, the illustrative examples that use polymorphic loci can readily be adapted to amplify non-polymorphic loci.

Claims

1. A method for amplifying target loci in a nucleic acid sample, the method comprising:

(a) contacting the nucleic acid sample with a library of test primers that simultaneously hybridize to at least 1,000 different target loci to produce a reaction mixture; and

(b) subjecting the reaction mixture to primer extension reaction conditions to produce amplified products comprising target amplicons.

2. The method of claim 1, wherein at least 5,000 different target loci are amplified.

3. The method of claim 1, wherein at least 10,000 different target loci are amplified.

4. The method of claim 1, wherein at least 20,000 different target loci are amplified.

5. The method of claim 1, wherein at least 30,000 different target loci are amplified.

6. The method of claim 1, wherein at least 90% of the amplified products are target amplicons.

7. The method of claim 1, wherein at least 95% of the amplified products are target amplicons.

8. The method of claim 1, wherein at least 99% of the amplified products are target amplicons.

9. The method of claim 1, wherein at least 90% of the target loci are amplified.

10. The method of claim 1, wherein at least 95% of the target loci are amplified.

11. The method of claim 1, wherein at least 99% of the target loci are amplified.

12. The method of claim 1, wherein less than 20% of the amplified products are test primer dimers.

13. The method of claim 1, wherein less than 10% of the amplified products are test primer dimers.

14. The method of claim 1, wherein less than 1% of the amplified products are test primer dimers.

15. The method of claim 1, wherein less than 0.1% of the amplified products are test primer dimers.

16. The method of claim 1, wherein the test primers are selected from a library of candidate primers based on one or more parameters.

17. The method of claim 16, wherein the test primers are selected from the library of candidate primers based at least in part on the ability of the candidate primers to form primer dimers.

18. The method of claim 17, wherein the selecting comprises:

(i) calculating, on a computer, an undesirability score for most or all possible combinations of two candidate primers from the library, wherein each undesirability score is based at least in part on the likelihood of dimer formation between the two candidate primers;

(ii) removing from the library of candidate primers the candidate primer with the highest undesirability score;

(iii) if the candidate primer removed in step (ii) is a member of a primer pair, removing the other member of the primer pair from the library of candidate primers; and

(iv) optionally repeating steps (ii) and (iii).
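The greedy removal loop of claim 18 can be sketched as follows. This is a minimal illustration under stated assumptions, not the inventors' implementation: the pairwise undesirability score is supplied by the caller as a function (how it is computed from dimer likelihood is left open here), self-dimers are ignored, "the candidate primer with the highest undesirability score" is read as a member of the worst-scoring pair, and `target_size` is a hypothetical stopping rule standing in for the optional repetition of step (iv). The tie-break between the two members of the worst pair is likewise a placeholder for the "one or more parameters" of claim 22.

```python
from itertools import combinations
from typing import Callable, Dict, Set


def greedy_primer_selection(
    primers: Set[str],
    partner: Dict[str, str],
    score: Callable[[str, str], float],
    target_size: int,
) -> Set[str]:
    """Greedy removal loosely following steps (i) to (iv) of claim 18.

    primers     -- names of candidate primers
    partner     -- maps each primer to the other member of its primer pair
    score       -- undesirability score for two primers (higher is worse)
    target_size -- hypothetical stopping rule for the optional repetition
    """
    library = set(primers)
    # Step (i): score every combination of two candidate primers once.
    pair_scores = {frozenset(p): score(*p)
                   for p in combinations(sorted(library), 2)}

    while len(library) > target_size:
        live = {pair: s for pair, s in pair_scores.items() if pair <= library}
        if not live:
            break
        worst_pair = max(live, key=lambda pair: (live[pair], sorted(pair)))

        # Hypothetical tie-break: of the two members of the worst pair, drop
        # the one with the larger summed score over all remaining pairs.
        def total(p: str) -> float:
            return sum(s for pair, s in live.items() if p in pair)

        victim = max(sorted(worst_pair), key=total)
        # Step (ii): remove the offending primer.
        library.discard(victim)
        # Step (iii): if it belongs to a primer pair, remove its partner too.
        library.discard(partner.get(victim, victim))
    return library
```

Because both members of a primer pair are removed together, the library always stays a set of complete pairs when every primer has a partner.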

19. The method of claim 17, wherein the selecting comprises:

(ii) removing from the library of candidate primers the candidate primer that is part of the greatest number of combinations of two candidate primers having an undesirability score above a first minimum threshold;

(iv) optionally repeating steps (ii) and (iii).

20. The method of claim 19, comprising further reducing the number of candidate primers remaining in the library by lowering the first minimum threshold used in step (ii) to a lower second minimum threshold and repeating steps (ii) and (iii) until the undesirability scores of all combinations of candidate primers remaining in the library are equal to or less than the second minimum threshold, or until the number of candidate primers remaining in the library has been reduced to a desired number.

21. The method of claim 19, comprising raising the first minimum threshold used in step (ii) to a higher second minimum threshold and repeating steps (ii) and (iii) until the undesirability scores of all combinations of candidate primers remaining in the library are equal to or less than the second minimum threshold, or until the number of candidate primers remaining in the library has been reduced to a desired number.

22. The method of claim 18 or 19, wherein the candidate primer to be removed from the library is selected, based on one or more parameters, from a group of two or more candidate primers having equal undesirability scores.
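The threshold-based variant of claims 19 and 20 can be sketched in the same style. As before, this is an illustrative sketch, not the inventors' code: the score function is caller-supplied, `thresholds` is a hypothetical list of successive minimum thresholds (decreasing, as in claim 20), `target_size` is a hypothetical desired library size, and alphabetical tie-breaking is a placeholder for the parameters of claim 22.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Iterable, Set


def threshold_primer_selection(
    primers: Set[str],
    partner: Dict[str, str],
    score: Callable[[str, str], float],
    thresholds: Iterable[float],
    target_size: int = 0,
) -> Set[str]:
    """Threshold-based removal loosely following claims 19 and 20.

    thresholds  -- successive minimum thresholds, e.g. decreasing (claim 20)
    target_size -- hypothetical desired library size at which to stop early
    """
    library = set(primers)
    pair_scores = {frozenset(p): score(*p)
                   for p in combinations(sorted(library), 2)}
    for threshold in thresholds:
        while len(library) > target_size:
            # Count, for each primer, how many remaining pairs containing it
            # score above the current threshold.
            offenders = Counter()
            for pair, s in pair_scores.items():
                if s > threshold and pair <= library:
                    offenders.update(pair)
            if not offenders:
                break  # every remaining pair is at or below this threshold
            # Step (ii): drop the primer involved in the most over-threshold
            # pairs (ties broken alphabetically as a placeholder).
            victim = max(sorted(offenders), key=offenders.__getitem__)
            library.discard(victim)
            # Step (iii): drop the other member of its primer pair, if any.
            library.discard(partner.get(victim, victim))
    return library
```

Counting over-threshold pairs, rather than looking only at the single worst pair, tends to remove "promiscuous" primers that interact weakly with many partners.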

23. The method of claim 1, wherein the concentration of each test primer is less than 100 nM.

24. The method of claim 1, wherein the concentration of each test primer is less than 10 nM.

25. The method of claim 1, wherein the concentration of each test primer is less than 2 nM.

26. The method of claim 1, wherein the library of test primers comprises at least 1,000 test primer pairs, each comprising a forward test primer and a reverse test primer, and wherein each pair of test primers hybridizes to a target locus.

27. The method of claim 1, wherein the library of test primers comprises at least 1,000 individual test primers that each hybridize to a different target locus.

28. The method of claim 1, wherein the GC content of the test primers is between 30% and 80%, inclusive.

29. The method of claim 1, wherein the range of GC content of the test primers is less than 20%.

30. The method of claim 1, wherein the melting temperature (Tm) of the test primers is between 40 °C and 80 °C, inclusive.

31. The method of claim 1, wherein the range of melting temperatures of the test primers is less than 5 °C.

32. The method of claim 1, wherein the length of the test primers is between 17 and 35 nucleotides, inclusive.
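The numeric ranges in claims 28 to 32 can be applied as simple filters. The sketch below is illustrative only: the claims do not say how Tm is calculated, so the Wallace rule (2 °C per A or T, 4 °C per G or C) is used here as a crude, hypothetical stand-in that is reasonable only for short oligonucleotides; a nearest-neighbor thermodynamic model would normally be preferred. Sequences are assumed to contain only A, C, G, and T.

```python
from typing import Iterable


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> float:
    """Rough Tm in degrees C by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def passes_primer_filters(seq: str) -> bool:
    """Per-primer ranges from claims 32, 28 and 30 (all inclusive).

    The length check comes first so that an empty string never reaches
    gc_fraction and divides by zero.
    """
    return (
        17 <= len(seq) <= 35
        and 0.30 <= gc_fraction(seq) <= 0.80
        and 40 <= wallace_tm(seq) <= 80
    )


def library_spread_ok(seqs: Iterable[str]) -> bool:
    """Library-wide spreads from claims 29 and 31:
    GC range below 20% and Tm range below 5 degrees C."""
    seqs = list(seqs)
    gcs = [gc_fraction(s) for s in seqs]
    tms = [wallace_tm(s) for s in seqs]
    return (max(gcs) - min(gcs)) < 0.20 and (max(tms) - min(tms)) < 5
```

For example, `ACGTACGTACGTACGTAC` (18 nt, 50% GC, Wallace Tm of 54 °C) passes every per-primer check.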

33. The method of claim 1, wherein the test primers comprise a non-target-specific tag.

34. The method of claim 33, wherein the tag forms an internal loop structure.

35. The method of claim 35, wherein the tag is located between two DNA-binding regions.

36. The method of claim 34, wherein the test primer comprises a 5' region that is specific for the target locus, an internal region that is not specific for the target locus and forms a loop structure, and a 3' region that is specific for the target locus.

37. The method of claim 36, wherein the 3' region is at least 7 nucleotides in length.

38. The method of claim 37, wherein the 3' region is between 7 and 20 nucleotides in length, inclusive.

39. The method of claim 33, wherein the test primer comprises a 5' region that is not specific for the target locus, followed by a region that is specific for the target locus, an internal region that is not specific for the target locus and forms a loop structure, and a 3' region that is specific for the target locus.

40. The method of claim 1, wherein the range of target amplicon lengths is less than 15 nucleotides.

41. The method of claim 1, wherein the primer extension reaction conditions are polymerase chain reaction (PCR) conditions.

42. The method of claim 41, wherein the length of the annealing step is greater than 10 minutes.

43. The method of claim 1, further comprising determining the presence or absence of at least one target amplicon.

44. The method of claim 1, further comprising determining the sequence of at least one target amplicon.

45. The method of claim 1, wherein the target loci are located on the same chromosome of interest.

46. The method of claim 1, wherein at least some of the target loci are located on different chromosomes of interest.

47. The method of claim 1, wherein the nucleic acid sample comprises fragmented or digested nucleic acid.

48. The method of claim 1, wherein the nucleic acid sample comprises genomic DNA, cDNA, or mRNA.

49. The method of claim 1, wherein the nucleic acid sample comprises DNA from a single cell.

50. The method of claim 1, wherein the nucleic acid sample comprises or is derived from blood, plasma, saliva, semen, cell culture supernatant, mucous secretions, dental plaque, gastrointestinal tissue, feces, urine, or a forensic sample.

51. The method of claim 1, wherein the target loci are segments of human nucleic acid.

52. The method of claim 1, wherein the target loci comprise single nucleotide polymorphisms.

53. A method for selecting test primers from a library of candidate primers, the method comprising:

(a) calculating, on a computer, an undesirability score for most or all possible combinations of two candidate primers from the library, wherein each undesirability score is based at least in part on the likelihood of dimer formation between the two candidate primers;

(b) removing from the library of candidate primers the candidate primer with the highest undesirability score;

(c) if the candidate primer removed in step (b) is a member of a primer pair, removing the other member of the primer pair from the library of candidate primers; and

(d) optionally repeating steps (b) and (c), thereby selecting a library of test primers.

54. A method for selecting test primers from a library of candidate primers, the method comprising:

(b) removing from the library of candidate primers the candidate primer that is part of the greatest number of combinations of two candidate primers having an undesirability score above a first minimum threshold;

55. The method of claim 54, comprising further reducing the number of candidate primers remaining in the library by lowering the first minimum threshold used in step (b) to a lower second minimum threshold and repeating steps (b) and (c) until the undesirability scores of all combinations of candidate primers remaining in the library are equal to or less than the second minimum threshold, or until the number of candidate primers remaining in the library has been reduced to a desired number.

56. The method of claim 54, comprising raising the first minimum threshold used in step (b) to a higher second minimum threshold and repeating steps (b) and (c) until the undesirability scores of all combinations of candidate primers remaining in the library are equal to or less than the second minimum threshold, or until the number of candidate primers remaining in the library has been reduced to a desired number.

57. The method of claim 53 or 54, wherein the candidate primer to be removed from the library of candidate primers is selected, based on one or more other parameters, from a group of two or more candidate primers having equal undesirability scores.

58. The method of claim 53 or 54, wherein the undesirability score is based at least in part on one or more parameters selected from the group consisting of: the heterozygosity rate of the target locus, the incidence of a disease associated with a polymorphism or mutation at the target locus, the penetrance of a disease associated with a polymorphism or mutation at the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon.

59. The method of claim 58, wherein the undesirability score is based at least in part on one or more parameters selected from the group consisting of: the heterozygosity rate of the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon; and wherein the test primers are used to simultaneously amplify at least 1,000 target loci in a sample comprising maternal DNA and fetal DNA from a mother pregnant with a fetus, to determine the presence or absence of a fetal chromosomal abnormality.

60. The method of claim 59, comprising ligating universal primer binding sites to the DNA molecules in the sample; amplifying the ligated DNA molecules using at least 1,000 target-specific primers and a universal primer to produce a first set of amplified products; and amplifying the first set of amplified products using at least 1,000 primer pairs to produce a second set of amplified products.

61. The method of claim 58, wherein the undesirability score is based at least in part on one or more parameters selected from the group consisting of: the heterozygosity rate of the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon; and wherein the test primers are used to simultaneously amplify at least 1,000 target loci in a sample comprising DNA from an alleged father of a fetus, and to simultaneously amplify the target loci in a sample comprising maternal DNA and fetal DNA from the mother pregnant with the fetus, thereby determining whether the alleged father is the biological father of the fetus.

62. The method of claim 58, wherein the undesirability score is based at least in part on one or more parameters selected from the group consisting of: the heterozygosity rate of the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, and the size of the target amplicon; and wherein the test primers are used to simultaneously amplify at least 1,000 target loci in a forensic nucleic acid sample using an annealing time of at least 5 minutes.

63. The method of claim 58, comprising using the test primers to simultaneously amplify at least 1,000 target loci in a control nucleic acid sample to produce a first set of target amplicons, and to simultaneously amplify the target loci in a test nucleic acid sample to produce a second set of target amplicons; and comparing the first and second sets of target amplicons to determine whether a target locus is present in one sample but not the other, or whether a target locus is present at different levels in the control sample and the test sample.

64. The method of claim 63, wherein the test sample is from an individual suspected of having a disease or phenotype of interest, or an increased risk of the disease or phenotype; and wherein one or more of the target loci comprise a polymorphism associated with the disease or phenotype, or with an increased risk thereof.

65. The method of claim 58, comprising using the test primers to simultaneously amplify at least 1,000 target loci in a control sample comprising RNA to produce a first set of target amplicons, and to simultaneously amplify the target loci in a test sample comprising RNA to produce a second set of target amplicons; and comparing the first and second sets of target amplicons to determine the presence or absence of a difference in RNA expression levels between the control sample and the test sample.

66. The method of claim 65, wherein the RNA is mRNA.

67. The method of claim 65, wherein the test sample is from an individual suspected of having cancer or an increased risk of cancer; and wherein one or more of the target loci comprise a polymorphism or other mutation associated with cancer or with an increased risk of cancer.

68. The method of claim 65, wherein the test sample is from an individual diagnosed with cancer; and wherein a difference in the RNA expression level of a target locus between the control sample and the test sample indicates that the target locus comprises a polymorphism or other mutation associated with an increased or reduced risk of cancer.

69. The method of claim 53 or 54, further comprising, before step (b), removing from the library any primer pair that produces a target amplicon overlapping a target amplicon produced by another primer pair.

70. The method of claim 53 or 54, wherein the candidate primers remaining in the library can simultaneously amplify at least 1,000 different target loci.

71. The method of claim 53 or 54, further comprising:

(e) contacting a nucleic acid sample comprising target loci with the candidate primers remaining in the library to produce a reaction mixture; and

(f) subjecting the reaction mixture to primer extension reaction conditions to produce amplified products comprising target amplicons.

72. A library of primers that simultaneously amplifies at least 1,000 different target loci such that less than 30% of the amplified products are primer dimers.

73. A library of primers that simultaneously amplifies at least 1,000 different target loci such that at least 80% of the amplified products are target amplicons.

74. A library of primers that simultaneously amplifies target loci such that at least 80% of at least 1,000 different target loci are amplified.

75. The library of any one of claims 72 to 74, wherein at least 5,000 different target loci are amplified.

76. The library of any one of claims 72 to 74, wherein at least 10,000 different target loci are amplified.

77. The library of any one of claims 72 to 74, wherein at least 20,000 different target loci are amplified.

78. The library of any one of claims 72 to 74, wherein at least 30,000 different target loci are amplified.

79. The library of any one of claims 72 to 74, wherein at least 90% of the amplified products are target amplicons.

80. The library of any one of claims 72 to 74, wherein at least 95% of the amplified products are target amplicons.

81. The library of any one of claims 72 to 74, wherein at least 99% of the amplified products are target amplicons.

82. The library of any one of claims 72 to 74, wherein at least 90% of the target loci are amplified.

83. The library of any one of claims 72 to 74, wherein at least 95% of the target loci are amplified.

84. The library of any one of claims 72 to 74, wherein at least 99% of the target loci are amplified.

85. The library of any one of claims 72 to 74, wherein less than 20% of the amplified products are primer dimers.

86. The library of any one of claims 72 to 74, wherein less than 10% of the amplified products are primer dimers.

87. The library of any one of claims 72 to 74, wherein less than 1% of the amplified products are primer dimers.

88. The library of any one of claims 72 to 74, wherein less than 0.1% of the amplified products are primer dimers.

89. The library of any one of claims 72 to 74, comprising at least 1,000 primer pairs, wherein each primer pair comprises a forward primer and a reverse primer that hybridize to a target locus.

90. The library of any one of claims 72 to 74, comprising at least 1,000 individual primers that each hybridize to a different target locus.

91. A kit for amplifying target loci in a nucleic acid sample, comprising (i) the library of any one of claims 72 to 74 and (iii) instructions for using the library to amplify the target loci.

92. A method for determining the ploidy state of a chromosome in a gestating fetus, the method comprising:

(a) contacting a nucleic acid sample with a library of primers that simultaneously hybridize to at least 1,000 different polymorphic loci to produce a reaction mixture, wherein the nucleic acid sample comprises maternal DNA from the mother of the fetus and fetal DNA from the fetus;

(b) subjecting the reaction mixture to primer extension reaction conditions to produce amplified products;

(c) sequencing the amplified products on a high-throughput sequencer to produce sequencing data;

(d) calculating, on a computer, allele counts at the polymorphic loci based on the sequencing data;

(e) creating, on a computer, a plurality of ploidy hypotheses, each concerning a different possible ploidy state of the chromosome;

(f) for each ploidy hypothesis, building, on a computer, a joint distribution model of the expected allele counts at the polymorphic loci on the chromosome;

(g) determining, on a computer, the relative probability of each of the ploidy hypotheses using the joint distribution model and the allele counts; and

(h) calling the ploidy state of the fetus by selecting the ploidy state corresponding to the hypothesis with the greatest probability.
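Steps (e) through (h) of claim 92 amount to scoring competing hypotheses by likelihood and picking the most probable. The sketch below uses a deliberately simplified model that is an assumption, not the model used by the inventors: it considers only loci at which the mother is homozygous (AA) and the fetus is assumed to carry a paternal B allele, treats the fetal fraction `f` as known, models the B-allele read count at each locus as an independent binomial (so the joint distribution is their product), and gives every hypothesis equal prior probability. The hypothesis names and their parameters are illustrative.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical ploidy hypotheses for one chromosome at loci where the mother
# is AA and the fetus carries the paternal B allele.
# name -> (fetal copies of the chromosome, fetal copies carrying B)
HYPOTHESES: Dict[str, Tuple[int, int]] = {
    "monosomy": (1, 1),          # only the paternal copy is present
    "disomy": (2, 1),
    "maternal trisomy": (3, 1),  # extra maternal copy carries A
    "paternal trisomy": (3, 2),  # extra paternal copy carries B
}


def expected_b_fraction(f: float, fetal_copies: int, fetal_b: int) -> float:
    """Expected fraction of B reads; f is the fetal fraction for a disomic
    chromosome."""
    maternal_alleles = 2 * (1 - f)      # mother contributes two A copies
    fetal_alleles = f * fetal_copies    # fetal share scales with copy number
    return f * fetal_b / (maternal_alleles + fetal_alleles)


def hypothesis_probabilities(counts: List[Tuple[int, int]],
                             f: float) -> Dict[str, float]:
    """counts holds (B-allele reads, total reads) per locus."""
    loglik = {}
    for name, (copies, b_copies) in HYPOTHESES.items():
        p = expected_b_fraction(f, copies, b_copies)
        # Product of independent binomials; the binomial coefficient is the
        # same under every hypothesis, so it cancels and is omitted.
        loglik[name] = sum(k * math.log(p) + (n - k) * math.log1p(-p)
                           for k, n in counts)
    best = max(loglik.values())
    weights = {h: math.exp(v - best) for h, v in loglik.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def call_ploidy(counts: List[Tuple[int, int]], f: float) -> str:
    """Select the hypothesis with the greatest relative probability."""
    probs = hypothesis_probabilities(counts, f)
    return max(probs, key=probs.get)
```

Subtracting the best log-likelihood before exponentiating keeps the normalization numerically stable even when thousands of loci make the raw likelihoods vanishingly small.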

93. A method for testing for an abnormal distribution of a chromosome in a sample comprising a mixture of maternal and fetal DNA, the method comprising:

(a) contacting the sample with a library of primers that simultaneously hybridize to at least 1,000 different target loci to produce a reaction mixture, wherein the target loci are from a plurality of different chromosomes, and wherein the plurality of different chromosomes comprises at least one first chromosome suspected of being abnormally distributed in the sample and at least one second chromosome presumed to be normally distributed in the sample;

(c) sequencing the amplified products to obtain a plurality of sequence tags aligning to the target loci, wherein the sequence tags are of sufficient length to be assigned to a specific target locus;

(d) assigning, on a computer, the plurality of sequence tags to their corresponding target loci;

(e) determining, on a computer, the number of sequence tags aligning to the target loci of the first chromosome and the number of sequence tags aligning to the target loci of the second chromosome; and

(f) comparing, on a computer, the numbers from step (e) to determine the presence or absence of an abnormal distribution of the first chromosome.
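Step (f) of claim 93 does not fix how the tag counts are compared. One common framing, shown here purely as an assumption and not as the patented statistic, is to average the tag counts per target locus on each chromosome, take the ratio of the first chromosome to the second, and express that ratio as a z-score against ratios from samples presumed to be normal. The reference ratios and the cutoff of 3 are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def mean_tags_per_locus(tag_counts: Dict[str, int],
                        locus_chrom: Dict[str, str], chrom: str) -> float:
    """Average number of sequence tags per target locus on one chromosome."""
    counts = [n for locus, n in tag_counts.items() if locus_chrom[locus] == chrom]
    return sum(counts) / len(counts)


def distribution_z_score(tag_counts: Dict[str, int],
                         locus_chrom: Dict[str, str],
                         first: str, second: str,
                         reference_ratios: List[float]) -> float:
    """z-score of the first/second chromosome tag ratio versus ratios from
    presumed-normal reference samples."""
    ratio = (mean_tags_per_locus(tag_counts, locus_chrom, first)
             / mean_tags_per_locus(tag_counts, locus_chrom, second))
    return (ratio - mean(reference_ratios)) / stdev(reference_ratios)


def is_abnormal(z: float, cutoff: float = 3.0) -> bool:
    """Flag an abnormal distribution when |z| exceeds a hypothetical cutoff."""
    return abs(z) > cutoff
```

Normalizing per locus, rather than using raw totals, keeps the ratio meaningful when the two chromosomes carry different numbers of target loci.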

94. A method for detecting the presence or absence of a fetal aneuploidy, the method comprising:

(a) contacting a sample comprising a mixture of maternal and fetal DNA with a library of primers that simultaneously hybridize to at least 1,000 different non-polymorphic target loci to produce a reaction mixture, wherein the target loci are from a plurality of different chromosomes;

(b) subjecting the reaction mixture to primer extension reaction conditions to produce amplified products comprising target amplicons;

(c) quantifying, on a computer, the relative frequencies of target amplicons from the first and second chromosomes of interest;

(d) comparing, on a computer, the relative frequencies of target amplicons from the first and second chromosomes of interest; and

(e) identifying the presence or absence of an aneuploidy based on the compared relative frequencies of the first and second chromosomes of interest.

95. A method for determining whether an alleged father is the biological father of a fetus gestating in a pregnant mother, the method comprising:

(a) simultaneously amplifying a plurality of polymorphic loci, comprising at least 1,000 different polymorphic loci, on genetic material from the alleged father to produce a first set of amplified products;

(b) simultaneously amplifying the corresponding plurality of polymorphic loci in a mixed DNA sample derived from a blood sample from the pregnant mother to produce a second set of amplified products, wherein the mixed DNA sample comprises fetal DNA and maternal DNA;

(c) determining, on a computer, the probability that the alleged father is the biological father of the fetus using genotypic measurements based on the first and second sets of amplified products; and

(d) using the determined probability that the alleged father is the biological father of the fetus to determine whether the alleged father is the biological father of the fetus.

96. The method of claim 95, further comprising simultaneously amplifying the corresponding plurality of polymorphic loci on genetic material from the mother to produce a third set of amplified products, wherein the probability that the alleged father is the biological father of the fetus is determined using genotypic measurements based on the first, second, and third sets of amplified products.

97. A method for determining the amounts of two or more target loci in a nucleic acid sample, the method comprising:

(a) amplifying, using PCR, a nucleic acid sample comprising a first standard locus, a second standard locus, a first target locus, and a second target locus to form amplified products, wherein the first standard locus and the first target locus have the same number of nucleotides but differ in sequence at one or more nucleotides, and wherein the second standard locus and the second target locus have the same number of nucleotides but differ in sequence at one or more nucleotides;

(b) sequencing the amplified products to determine a standard ratio comparing the relative amount of the amplified first standard locus to that of the amplified second standard locus, wherein the standard ratio indicates the difference in PCR amplification efficiency between the first standard locus and the second standard locus;

(c) determining a target ratio comparing the relative amount of the amplified first target locus to that of the amplified second target locus; and

(d) adjusting the target ratio from step (c) based on the standard ratio from step (b) to determine the relative amounts of the first target locus and the second target locus in the sample.
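The adjustment in steps (b) through (d) of claim 97 reduces to dividing the observed target ratio by the standard ratio. This sketch assumes that the two standards start at a known ratio (1:1 by default, via a hypothetical `std_input_ratio` parameter) and that each target locus amplifies with the same efficiency as its length-matched standard, which is the premise of the claim.

```python
def efficiency_corrected_ratio(
    std1_reads: float,
    std2_reads: float,
    target1_reads: float,
    target2_reads: float,
    std_input_ratio: float = 1.0,
) -> float:
    """Relative amount of target locus 1 to target locus 2 in the sample.

    std_input_ratio is the assumed known starting ratio of the two standards.
    """
    # Step (b): any departure of the standard ratio from its known input
    # ratio is attributed to a difference in PCR amplification efficiency.
    standard_ratio = (std1_reads / std2_reads) / std_input_ratio
    # Step (c): raw ratio of the two amplified target loci.
    target_ratio = target1_reads / target2_reads
    # Step (d): divide out the amplification bias.
    return target_ratio / standard_ratio
```

For example, if the standards yield 1,250 and 1,000 reads, the first locus is inferred to amplify 1.25 times as efficiently as the second, so an observed target ratio of 1,500:1,000 corrects to 1.2.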

98. The method of claim 97, further comprising determining the absolute amounts of the first target locus and the second target locus in the sample.

99. The method of claim 97, comprising determining the presence or absence of a target locus in the sample.

100. The method of claim 97, comprising simultaneously amplifying at least 1,000 different target loci.

101. The method of claim 97, wherein the target loci are located on the same chromosome of interest.

102. The method of claim 97, wherein at least some of the target loci are located on different chromosomes of interest.