
CN118647717A - Non-invasive prenatal sample preparation and related methods and uses - Google Patents


Info

Publication number
CN118647717A
Authority
CN
China
Prior art keywords
nucleotides
deficiency
cfdna
syndrome
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202380016677.1A
Other languages
Chinese (zh)
Inventor
D·穆齐
G·古尔德
王珏皛
C·J·巴蒂
R·帕特尔
S·加尼什
K·特雷廷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mailiad Women's Health Co
Original Assignee
Mailiad Women's Health Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mailiad Women's Health Co
Priority claimed from PCT/US2023/010496 (WO2023137021A2)
Publication of CN118647717A

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present disclosure relates to methods for preparing cell-free DNA samples from pregnant women, and to related methods for analyzing such samples.

Description

Non-invasive prenatal sample preparation and related methods and uses

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 63/298,593, filed on January 11, 2022, and U.S. Provisional Application No. 63/357,915, filed on July 1, 2022, the entire contents of each of which are incorporated herein by reference.

Technical Field

Described herein are methods for preparing samples from pregnant women, and related methods for analyzing such samples.

Background Art

The following description of the background of the present technology is provided only to aid understanding of the present technology, and is not admitted to describe or constitute prior art to the present technology.

Non-invasive prenatal screening (NIPS) has become a routine part of care for pregnant women. NIPS can involve screening for aneuploidy (such as Down syndrome) and for other genetic abnormalities in the mother or fetus. Many of these screens use cell-free DNA (cfDNA); however, the use of cfDNA faces many challenges because only a small fraction of the cfDNA in maternal plasma comes from the fetus.

In addition, prenatal screening for certain genetic conditions has traditionally required DNA samples from both the mother and the father. For example, traditional methods for detecting aneuploidy and various genetic conditions require genomic DNA (gDNA) samples from both the mother and the father of the fetus, as well as cfDNA from the mother. Such testing therefore requires at least three samples, each of which may be processed and evaluated in a different way.

The present disclosure addresses these challenges by providing methods that selectively enrich the fetal fraction of a maternal sample, so that NIPS for both aneuploidy and other genetic variants/mutations can be performed in parallel using only a single maternal sample.

Summary of the Invention

The present disclosure is generally directed to novel sample preparation and to parallel screening, from a single sample, for aneuploidy and other genetic variations (such as pathogenic SNPs, insertions or deletions (INDELs), and single-gene copy number variations). These compositions and methods improve non-invasive prenatal screening (NIPS) by streamlining and simplifying the required assays, using fewer samples, and reducing background noise, all with less complexity and in less time than conventional prenatal screening assays.

In one aspect, the present disclosure provides a method of preparing a biological sample having an enriched fetal fraction, comprising:

(a-1) obtaining a biological sample comprising cell-free DNA (cfDNA) from a pregnant woman;

(b-1) extracting cfDNA from the biological sample;

(c-1) preparing a library of cfDNA fragments to obtain a cfDNA library;

(d-1) separating the cfDNA fragments in the cfDNA library by size to retain only cfDNA fragments of less than about 150 nucleotides in length, about 155 nucleotides in length, about 160 nucleotides in length, about 165 nucleotides in length, about 170 nucleotides in length, about 175 nucleotides in length, about 180 nucleotides in length, about 185 nucleotides in length, about 190 nucleotides in length, about 195 nucleotides in length, or about 200 nucleotides in length;

(e-1) sequencing the retained cfDNA fragments to obtain a first sequence library;

(f-1) identifying, based on read length, (i) cell-free fetal DNA (cffDNA) sequences and (ii) cell-free maternal DNA (cfmDNA) sequences present in at least two windows of the first sequence library; and

(g-1) isolating the cffDNA sequences from each of the at least two windows of the sequence library, thereby obtaining at least two sequence libraries enriched in fetal fraction;

or

(a-2) obtaining a biological sample comprising cell-free DNA (cfDNA) from a pregnant woman;

(b-2) extracting cfDNA from the biological sample;

(c-2) separating the cfDNA fragments in the extracted sample from (b-2) to retain only cfDNA fragments of less than about 150 nucleotides in length, about 155 nucleotides in length, about 160 nucleotides in length, about 165 nucleotides in length, about 170 nucleotides in length, about 175 nucleotides in length, about 180 nucleotides in length, about 185 nucleotides in length, about 190 nucleotides in length, about 195 nucleotides in length, or about 200 nucleotides in length;

(d-2) preparing a cfDNA library from the separated cfDNA fragments from (c-2);

(e-2) sequencing the cfDNA library to obtain a first sequence library;

(f-2) identifying, based on read length, (i) cell-free fetal DNA (cffDNA) sequences and (ii) cell-free maternal DNA (cfmDNA) sequences present in at least two windows of the first sequence library; and

(g-2) isolating the cffDNA sequences from each of the at least two windows of the sequence library, thereby obtaining at least two sequence libraries enriched in fetal fraction.

In some embodiments, separating the cfDNA fragments enriches the fetal fraction of the biological sample by about 1.1-fold, about 1.2-fold, about 1.3-fold, about 1.4-fold, about 1.5-fold, about 1.6-fold, about 1.7-fold, about 1.8-fold, about 1.9-fold, or about 2.0-fold.

In some embodiments, isolating the cffDNA sequences from the at least two windows of the first sequence library enriches the fetal fraction of the biological sample by about 1.1-fold, about 1.2-fold, about 1.3-fold, about 1.4-fold, about 1.5-fold, about 1.6-fold, about 1.7-fold, about 1.8-fold, about 1.9-fold, about 2.0-fold, about 2.1-fold, about 2.2-fold, about 2.3-fold, about 2.4-fold, about 2.5-fold, about 2.6-fold, about 2.7-fold, about 2.8-fold, about 2.9-fold, about 3.0-fold, about 3.1-fold, about 3.2-fold, about 3.3-fold, about 3.4-fold, or about 3.5-fold.
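As a rough illustration of what a fold enrichment of the fetal fraction means, the ratio can be computed from the fetal fraction measured before and after enrichment. This is only a sketch: the function name `fold_enrichment` and the example values are hypothetical and are not taken from the disclosure.

```python
def fold_enrichment(baseline_ff: float, enriched_ff: float) -> float:
    """Return the fold change in fetal fraction after enrichment.

    Both arguments are fetal fractions expressed as proportions (0 to 1).
    """
    if not 0 < baseline_ff <= 1:
        raise ValueError("baseline fetal fraction must lie in (0, 1]")
    if not 0 <= enriched_ff <= 1:
        raise ValueError("enriched fetal fraction must lie in [0, 1]")
    return enriched_ff / baseline_ff


# Hypothetical values: 10% fetal fraction before, 15% after size selection.
print(round(fold_enrichment(0.10, 0.15), 2))  # 1.5
```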

In some embodiments, separating the cfDNA fragments comprises electrophoresis.

In some embodiments, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 windows of the first sequence library are evaluated to identify and isolate cffDNA sequences, thereby obtaining at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 sequence libraries enriched in fetal fraction, respectively.

In some embodiments, the method may further comprise identifying and separating cffDNA from cfmDNA by aligning sequence reads of the cffDNA and cfmDNA in the first sequence library to a reference genome, demultiplexing sequence reads from the first library, removing duplicate sequences from the first sequence library, or a combination thereof.

In some embodiments, the method may further comprise evaluating the at least two sequence libraries enriched in fetal fraction for the presence of one or more genetic mutations. In some embodiments, the one or more genetic mutations result in at least one condition selected from the following: 21-hydroxylase deficiency; ABCC8-related hyperinsulinism; ARSACS; achondroplasia; achromatopsia; adenosine monophosphate deaminase 1; agenesis of the corpus callosum with neuronopathy; alkaptonuria; alpha-1-antitrypsin deficiency; alpha-mannosidosis; alpha-sarcoglycanopathy; alpha-thalassemia; Alzheimer's disease; angiotensin II receptor type 1; apolipoprotein E genotyping; argininosuccinic aciduria; aspartylglycosaminuria; ataxia with vitamin E deficiency; ataxia-telangiectasia; autoimmune polyendocrinopathy syndrome type 1; BRCA1 hereditary breast/ovarian cancer; BRCA2 hereditary breast/ovarian cancer; Bardet-Biedl syndrome; Best vitelliform macular dystrophy; beta-sarcoglycanopathy; beta-thalassemia; biotinidase deficiency; Blau syndrome; Bloom syndrome; CFTR-related disorders; CLN3-related neuronal ceroid-lipofuscinosis; CLN5-related neuronal ceroid-lipofuscinosis; CLN8-related neuronal ceroid-lipofuscinosis; Canavan disease; carnitine palmitoyltransferase IA deficiency; carnitine palmitoyltransferase II deficiency; cartilage-hair hypoplasia; cerebral cavernous malformation; choroideremia; Cohen syndrome; congenital cataracts, facial dysmorphism, and neuropathy; congenital disorder of glycosylation Ia; congenital disorder of glycosylation Ib; congenital Finnish nephrosis; Crohn disease; cystinosis; DFNA 9 (COCH); diabetes and hearing loss; early-onset primary dystonia (DYT1); epidermolysis bullosa junctional, Herlitz-Pearson type; FANCC-related Fanconi anemia; FGFR1-related craniosynostosis; FGFR2-related craniosynostosis; FGFR3-related craniosynostosis; factor V Leiden thrombophilia; factor V R2 mutation thrombophilia; factor XI deficiency; factor XIII deficiency; familial adenomatous polyposis; familial dysautonomia; familial hypercholesterolemia type B; familial Mediterranean fever; free sialic acid storage disorders; frontotemporal dementia with parkinsonism-17; fumarase deficiency; GJB2-related DFNA 3 nonsyndromic hearing loss and deafness; GJB2-related DFNB 1 nonsyndromic hearing loss and deafness; GNE-related myopathies; galactosemia; Gaucher disease; glucose-6-phosphate dehydrogenase deficiency; glutaric acidemia type 1; glycogen storage disease type Ia; glycogen storage disease type Ib; glycogen storage disease type II; glycogen storage disease type III; glycogen storage disease type V; GRACILE syndrome; HFE-associated hereditary hemochromatosis; Halder AIMs; hemoglobin S beta-thalassemia; hereditary fructose intolerance; hereditary pancreatitis; hereditary thymine-uraciluria; hexosaminidase A deficiency; hidrotic ectodermal dysplasia 2; homocystinuria caused by cystathionine beta-synthase deficiency; hyperkalemic periodic paralysis type 1; hyperornithinemia-hyperammonemia-homocitrullinuria syndrome; primary hyperoxaluria type 1; primary hyperoxaluria type 2; hypochondroplasia; hypokalemic periodic paralysis type 1; hypokalemic periodic paralysis type 2; hypophosphatasia; infantile myopathy and lactic acidosis (fatal and non-fatal forms); isovaleric acidemia; Krabbe disease; LGMD2I; Leber hereditary optic neuropathy; Leigh syndrome, French-Canadian type; long-chain 3-hydroxyacyl-CoA dehydrogenase deficiency; MELAS; MERRF; MTHFR deficiency; MTHFR thermolabile variant; MTRNR1-related hearing loss and deafness; MTTS1-related hearing loss and deafness; MYH-associated polyposis; maple syrup urine disease type 1A; maple syrup urine disease type 1B; McCune-Albright syndrome; medium-chain acyl-CoA dehydrogenase deficiency; megalencephalic leukoencephalopathy with subcortical cysts; metachromatic leukodystrophy; mitochondrial cardiomyopathy; mitochondrial DNA-associated Leigh syndrome and NARP; mucolipidosis IV; mucopolysaccharidosis type I; mucopolysaccharidosis type IIIA; mucopolysaccharidosis type VII; multiple endocrine neoplasia type 2; muscle-eye-brain disease; nemaline myopathy; neurological phenotype; Niemann-Pick disease due to sphingomyelinase deficiency; Niemann-Pick disease type C1; Nijmegen breakage syndrome; PPT1-related neuronal ceroid-lipofuscinosis; PROP1-related pituitary hormone deficiency; Pallister-Hall syndrome; paramyotonia congenita; Pendred syndrome; peroxisomal bifunctional enzyme deficiency; pervasive developmental disorders; phenylalanine hydroxylase deficiency; plasminogen activator inhibitor I; autosomal recessive polycystic kidney disease; prothrombin G20210A thrombophilia; pseudovitamin D deficiency rickets; pycnodysostosis; autosomal recessive retinitis pigmentosa, Bothnia type; Rett syndrome; rhizomelic chondrodysplasia punctata type 1; short-chain acyl-CoA dehydrogenase deficiency; Shwachman-Diamond syndrome; Sjogren-Larsson syndrome; Smith-Lemli-Opitz syndrome; spastic paraplegia 13; sulfate transporter-related osteochondrodysplasia; TFR2-related hereditary hemochromatosis; TPP1-related neuronal ceroid-lipofuscinosis; thanatophoric dysplasia; transthyretin amyloidosis; trifunctional protein deficiency; tyrosine hydroxylase-deficient DRD; tyrosinemia type I; Wilson disease; X-linked juvenile retinoschisis; cystic fibrosis; spinal muscular atrophy (SMA); hemoglobinopathies; and Zellweger syndrome spectrum.

In some embodiments, the method may further comprise evaluating the biological sample comprising cfDNA for the presence of aneuploidy. In some embodiments, the aneuploidy is selected from monosomy, trisomy, tetrasomy, pentasomy, microdeletion, microduplication, and mosaic forms of monosomy, trisomy, tetrasomy, and pentasomy.

In another aspect, the present disclosure provides a method for detecting in parallel, in a single maternal sample, the presence or absence of aneuploidy and the presence or absence of at least one genetic variant, comprising:

(i) obtaining a biological sample from a pregnant woman, wherein the biological sample comprises cell-free DNA (cfDNA);

(ii) preparing a cfDNA library;

(iii) sequencing the cfDNA library to generate a sequence library; and

(iv) detecting the presence or absence of aneuploidy and the presence or absence of at least one genetic variant in the single maternal sample;

wherein (a) the cfDNA library is enriched to increase the fetal fraction, (b) the sequence library is enriched to increase the fetal fraction, or (c) a combination thereof, such that the fetal fraction of the single maternal sample is increased at least 1.1-fold, at least 1.2-fold, at least 1.3-fold, at least 1.4-fold, or at least 1.5-fold before the presence or absence of aneuploidy and the presence or absence of at least one genetic variant are detected in the single maternal sample.

In some embodiments, the biological sample is blood, serum, or plasma.

In some embodiments, the cfDNA library is enriched to increase the fetal fraction, and the sequence library is enriched to increase the fetal fraction.

In some embodiments, enriching the fetal fraction of the cfDNA library comprises removing from the cfDNA library any DNA fragments greater than about 150 nucleotides in length, about 155 nucleotides in length, about 160 nucleotides in length, about 165 nucleotides in length, about 170 nucleotides in length, about 175 nucleotides in length, or about 180 nucleotides in length. In some embodiments, removing the DNA fragments from the cfDNA library comprises electrophoresis.

In some embodiments, enriching the fetal fraction of the sequence library comprises performing read-length-based size exclusion on the sequences in at least two windows of the sequence library, thereby obtaining at least two sequence libraries enriched in fetal fraction. In some embodiments, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 windows of the first sequence library are evaluated to identify and isolate cffDNA sequences, thereby obtaining at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 sequence libraries enriched in fetal fraction, respectively. In some embodiments, the at least two windows of the sequence library are selected from sequences of (i) 0-145 nucleotides, (ii) 0-150 nucleotides, (iii) 0-155 nucleotides, (iv) 0-160 nucleotides, (v) 0-165 nucleotides, (vi) 0-168 nucleotides, (vii) 0-170 nucleotides, (viii) 0-175 nucleotides, (ix) 0-180 nucleotides, (x) 0-185 nucleotides, (xi) 0-190 nucleotides, (xii) 0-195 nucleotides, (xiii) 0-200 nucleotides, and (xiv) ungated.
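The read-length-based gating described above can be sketched in a few lines. The example below is a minimal illustration, not the disclosure's implementation: it assumes fragment lengths have already been inferred for each read, treats the upper bound of each window as inclusive, and uses hypothetical names (`WINDOWS`, `gate_by_length`) and only three of the listed windows.

```python
from typing import Dict, List, Optional

# Upper bound (in nucleotides) of each gating window; None means ungated.
# Only three of the windows listed in the text are shown here.
WINDOWS: Dict[str, Optional[int]] = {
    "0-150": 150,
    "0-168": 168,
    "ungated": None,
}


def gate_by_length(fragment_lengths: List[int]) -> Dict[str, List[int]]:
    """Build one length-gated sub-library per window.

    Assumption: a fragment belongs to a window when its length is less
    than or equal to the window's upper bound.
    """
    gated: Dict[str, List[int]] = {}
    for name, upper in WINDOWS.items():
        gated[name] = [
            n for n in fragment_lengths if upper is None or n <= upper
        ]
    return gated


# Hypothetical fragment lengths for illustration only.
libraries = gate_by_length([120, 145, 160, 167, 180, 310])
print({name: len(reads) for name, reads in libraries.items()})
# {'0-150': 2, '0-168': 4, 'ungated': 6}
```

Because every window starts at 0, the gated libraries are nested: a shorter window is a subset of each longer one and of the ungated library.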

In some embodiments, enriching the fetal fraction of the sequence library further comprises identifying and separating cffDNA from cfmDNA by aligning sequence reads of the cffDNA and cfmDNA in the first sequence library to a reference genome, demultiplexing sequence reads from the first library, removing duplicate sequences from the first sequence library, or a combination thereof.

In some embodiments, detecting the presence or absence of at least one genetic variant comprises determining, in each of the at least two sequence libraries enriched in fetal fraction, the allele balance of each allele in the sample encoding the at least one genetic variant, and (a) generating an allele balance trajectory for each allele based on the allele balance in each of the at least two sequence libraries enriched in fetal fraction, (b) generating a depth trajectory based on the depth of the at least two sequence libraries enriched in fetal fraction, or (c) generating a combination of an allele balance trajectory and a depth trajectory.
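One way to picture an allele balance trajectory is as the alternate-allele fraction at a site, computed separately in each length-gated library and then ordered by window. The sketch below assumes that reading; the definition of allele balance as alt / (ref + alt), the function name, and the read counts are illustrative assumptions rather than the disclosure's stated method.

```python
from typing import Dict, List, Tuple


def allele_balance_trajectory(
    counts: Dict[str, Tuple[int, int]], window_order: List[str]
) -> List[float]:
    """Return the alternate-allele fraction in each window, in order.

    counts maps a window name to (reference_reads, alternate_reads)
    observed at one variant site in that window's gated library.
    """
    trajectory: List[float] = []
    for window in window_order:
        ref, alt = counts[window]
        total = ref + alt
        # A window with no coverage has an undefined allele balance.
        trajectory.append(alt / total if total else float("nan"))
    return trajectory


# Hypothetical read counts at a single site.
site_counts = {"0-150": (40, 60), "ungated": (100, 100)}
print(allele_balance_trajectory(site_counts, ["0-150", "ungated"]))
# [0.6, 0.5]
```

In this made-up example the allele balance moves away from 0.5 as the window tightens, which is the kind of shift one would look for if the shorter, fetal-enriched reads carried a different genotype from the maternal background.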

In some embodiments, detecting the presence or absence of aneuploidy comprises analyzing the sequence depth of at least one sequence in the sequence library corresponding to a chromosome of interest. In some embodiments, the sequence depth of the at least one sequence corresponding to the chromosome of interest is fit to an expected depth model for the chromosome of interest. In some embodiments, the sequence depth is calculated by the following formula:

[Equation not reproduced in the source text]

where:

d_p is the pregnancy depth;

f is the fetal fraction;

c_m is the maternal copy number;

d_b is the background depth; and

c_f is the fetal copy number.
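The formula itself did not survive in this text. Purely as an illustration, a linear mixture model consistent with the variable definitions above could look like the sketch below. The functional form, the choice to define d_b as the depth expected for a two-copy region, and the function name `expected_depth` are all assumptions and should not be read as the disclosure's formula.

```python
def expected_depth(d_b: float, f: float, c_m: int, c_f: int) -> float:
    """Expected pregnancy depth d_p under an ASSUMED linear mixture model.

    d_b: background depth, taken here as the depth of a two-copy region
    f:   fetal fraction (0 to 1)
    c_m: maternal copy number
    c_f: fetal copy number

    Assumed form: d_p = d_b * ((1 - f) * c_m + f * c_f) / 2
    """
    if not 0 <= f <= 1:
        raise ValueError("fetal fraction must lie in [0, 1]")
    return d_b * ((1 - f) * c_m + f * c_f) / 2


# Hypothetical example: euploid mother, trisomic fetus, 10% fetal fraction.
# Under this assumed model the expected depth is about 5% above background.
print(expected_depth(100.0, 0.10, 2, 3))
```

The example also shows why enriching the fetal fraction matters under such a model: the depth shift caused by a fetal copy-number change scales with f, so a larger f produces a larger signal relative to background.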

In some embodiments, the sequence depth is normalized to control for GC bias, sample background, hybridization probe capture, or a combination thereof.

In some embodiments, the method comprises detecting the presence or absence of an aneuploidy selected from monosomy, trisomy, tetrasomy, polysomy X, polysomy Y, microdeletion, microduplication, pentasomy, and combinations thereof.

In some embodiments, the at least one genetic variant is associated with a disease selected from the following: 21-hydroxylase deficiency; ABCC8-related hyperinsulinism; ARSACS; achondroplasia; achromatopsia; adenosine monophosphate deaminase 1; agenesis of the corpus callosum with neuronopathy; alkaptonuria; alpha-1-antitrypsin deficiency; alpha-mannosidosis; alpha-sarcoglycanopathy; alpha-thalassemia; Alzheimer's disease; angiotensin II receptor type 1; apolipoprotein E genotyping; argininosuccinic aciduria; aspartylglycosaminuria; ataxia with vitamin E deficiency; ataxia-telangiectasia; autoimmune polyendocrinopathy syndrome type 1; BRCA1 hereditary breast/ovarian cancer; BRCA2 hereditary breast/ovarian cancer; Bardet-Biedl syndrome; Best vitelliform macular dystrophy; beta-sarcoglycanopathy; beta-thalassemia; biotinidase deficiency; Blau syndrome; Bloom syndrome; CFTR-related disorders; CLN3-related neuronal ceroid-lipofuscinosis; CLN5-related neuronal ceroid-lipofuscinosis; CLN8-related neuronal ceroid-lipofuscinosis; Canavan disease; carnitine palmitoyltransferase IA deficiency; carnitine palmitoyltransferase II deficiency; cartilage-hair hypoplasia; cerebral cavernous malformation; choroideremia; Cohen syndrome; congenital cataracts, facial dysmorphism, and neuropathy; congenital disorder of glycosylation Ia; congenital disorder of glycosylation Ib; congenital Finnish nephrosis; Crohn disease; cystinosis; DFNA 9 (COCH); diabetes and hearing loss; early-onset primary dystonia (DYT1); epidermolysis bullosa junctional, Herlitz-Pearson type; FANCC-related Fanconi anemia; FGFR1-related craniosynostosis; FGFR2-related craniosynostosis; FGFR3-related craniosynostosis; factor V Leiden thrombophilia; factor V R2 mutation thrombophilia; factor XI deficiency; factor XIII deficiency; familial adenomatous polyposis; familial dysautonomia; familial hypercholesterolemia type B; familial Mediterranean fever; free sialic acid storage disorders; frontotemporal dementia with parkinsonism-17; fumarase deficiency; GJB2-related DFNA 3 nonsyndromic hearing loss and deafness; GJB2-related DFNB 1 nonsyndromic hearing loss and deafness; GNE-related myopathies; galactosemia; Gaucher disease; glucose-6-phosphate dehydrogenase deficiency; glutaric acidemia type 1; glycogen storage disease type Ia; glycogen storage disease type Ib; glycogen storage disease type II; glycogen storage disease type III; glycogen storage disease type V; GRACILE syndrome; HFE-associated hereditary hemochromatosis; Halder AIMs; hemoglobin S beta-thalassemia; hereditary fructose intolerance; hereditary pancreatitis; hereditary thymine-uraciluria; hexosaminidase A deficiency; hidrotic ectodermal dysplasia 2; homocystinuria caused by cystathionine beta-synthase deficiency; hyperkalemic periodic paralysis type 1; hyperornithinemia-hyperammonemia-homocitrullinuria syndrome; primary hyperoxaluria type 1; primary hyperoxaluria type 2; hypochondroplasia; hypokalemic periodic paralysis type 1; hypokalemic periodic paralysis type 2; hypophosphatasia; infantile myopathy and lactic acidosis (fatal and non-fatal forms); isovaleric acidemia; Krabbe disease; LGMD2I; Leber hereditary optic neuropathy; Leigh syndrome, French-Canadian type; long-chain 3-hydroxyacyl-CoA dehydrogenase deficiency; MELAS; MERRF; MTHFR deficiency; MTHFR thermolabile variant; MTRNR1-related hearing loss and deafness; MTTS1-related hearing loss and deafness; MYH-associated polyposis; maple syrup urine disease type 1A; maple syrup urine disease type 1B; McCune-Albright syndrome; medium-chain acyl-CoA dehydrogenase deficiency; megalencephalic leukoencephalopathy with subcortical cysts; metachromatic leukodystrophy; mitochondrial cardiomyopathy; mitochondrial DNA-associated Leigh syndrome and NARP; mucolipidosis IV; mucopolysaccharidosis type I; mucopolysaccharidosis type IIIA; mucopolysaccharidosis type VII; multiple endocrine neoplasia type 2; muscle-eye-brain disease; nemaline myopathy; neurological phenotype; Niemann-Pick disease due to sphingomyelinase deficiency; Niemann-Pick disease type C1; Nijmegen breakage syndrome; PPT1-related neuronal ceroid-lipofuscinosis; PROP1-related pituitary hormone deficiency; Pallister-Hall syndrome; paramyotonia congenita; Pendred syndrome; peroxisomal bifunctional enzyme deficiency; pervasive developmental disorders; phenylalanine hydroxylase deficiency; plasminogen activator inhibitor I; autosomal recessive polycystic kidney disease; prothrombin G20210A thrombophilia; pseudovitamin D deficiency rickets; pycnodysostosis; autosomal recessive retinitis pigmentosa, Bothnia type; Rett syndrome; rhizomelic chondrodysplasia punctata type 1; short-chain acyl-CoA dehydrogenase deficiency; Shwachman-Diamond syndrome; Sjogren-Larsson syndrome; Smith-Lemli-Opitz syndrome; spastic paraplegia 13; sulfate transporter-related osteochondrodysplasia; TFR2-related hereditary hemochromatosis; TPP1-related neuronal ceroid-lipofuscinosis; thanatophoric dysplasia; transthyretin amyloidosis; trifunctional protein deficiency; tyrosine hydroxylase-deficient DRD; tyrosinemia type I; Wilson disease; X-linked juvenile retinoschisis; cystic fibrosis; spinal muscular atrophy (SMA); hemoglobinopathies; and Zellweger syndrome spectrum.

In another aspect, the present disclosure provides a method for enriching cell-free fetal DNA (cffDNA) in a biological sample, comprising: obtaining from a pregnant woman a biological sample comprising cell-free DNA (cfDNA), wherein the cfDNA comprises cffDNA and cell-free maternal DNA (cfmDNA); extracting the cfDNA from the biological sample; and subjecting the extracted cfDNA to a size exclusion process, wherein the size exclusion process has a cutoff size of about 150, about 155, about 160, about 165, about 170, about 175, or about 180 nucleotides in length, thereby producing a sample enriched in cffDNA.

In another aspect, the present disclosure provides a method for in silico processing of cell-free DNA (cfDNA), comprising: sequencing a cfDNA sample comprising cell-free fetal DNA (cffDNA) and cell-free maternal DNA (cfmDNA) to prepare a sequence library; performing a read-length-based analysis in which an allelic balance of a nucleic acid sequence of interest is established in at least two windows of the sequence library; and establishing a trajectory based on the allelic balance of the at least two windows.

In another aspect, the present disclosure provides a method for reducing background noise from extraneous genetic material in non-invasive prenatal screening (NIPS), comprising:

(i) obtaining a biological sample from a pregnant woman, wherein the biological sample comprises cell-free DNA (cfDNA); and

(ii) processing the cfDNA for NIPS, wherein the processing comprises enriching cell-free fetal DNA (cffDNA) in the biological sample, performing in silico processing of the cfDNA, or a combination thereof.

In some embodiments, the processing comprises both enriching cell-free fetal DNA (cffDNA) in the biological sample and performing in silico processing of the cfDNA.

In some embodiments, enriching the cffDNA in the biological sample comprises any of the methods disclosed herein for enriching cffDNA in a biological sample.

In some embodiments, performing in silico processing of the cfDNA comprises any of the methods disclosed herein for in silico processing of cfDNA.

In some embodiments, the method may further comprise normalization to control for GC bias, sample background, hybridization probe capture, or a combination thereof.

The following detailed description is exemplary and explanatory and is not intended to be limiting.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 provides a graph comparing conventional size exclusion techniques with the disclosed size exclusion method, which is more permissive and retains more cffDNA.

FIG. 2 provides a visualization of the disclosed in silico enrichment method, which relies on a moving window analysis to track how allelic balance changes as the relative amounts of fetal and maternal cfDNA change.

FIG. 3 shows two ways of visualizing the allelic balance observed with the disclosed moving window analysis.

FIG. 4 shows an overview of an exemplary computational workflow for one embodiment of the disclosed methods and systems.

FIG. 5 shows several visual representations of how depth calling is used to establish the presence of aneuploidy. The top panel compares a conventional karyotype with the chromosome 21 read depth in a normal pregnancy and in a pregnancy in which the fetus has trisomy 21. The middle panel shows the type of depth shift expected when a trisomy is observed. The bottom panel shows the expected fits of four known copy number (CN) curves representing various ploidies (e.g., CN=1, CN=2, CN=3, and CN=4), where the shaded region indicates how the read depth from a sample containing a trisomy would fit within the expected fit curves.

FIG. 6 shows an exemplary improvement in the data plots that can be achieved by applying a triple normalization control for variation caused by 1) GC bias, 2) sample background, and 3) hybridization probe capture.

FIG. 7 shows the fit of read depth to the expected fit curves for several chromosomes with different fit samples. The shaded region in each plot represents the depth of a given sample for the indicated chromosome. The fit curves, from left to right in each plot, are the expected fits for 1, 2, or 3 chromosomes in the fit model.

FIG. 8 shows a depth trajectory plot for a gene (SMN2) in which the mother has one copy of the gene and the fetus has zero.

DETAILED DESCRIPTION

The sample preparations and methods disclosed herein relate generally to new methods of collecting a biological sample (e.g., blood or another DNA-containing sample) from a biological mother and then screening it, such as detecting aneuploidies and genetic mutations (e.g., a recessive screening panel) in parallel through a single non-invasive prenatal screen. That is, the present disclosure provides a single test (e.g., a parallel test) that finds two sets of detectable genetic conditions (e.g., aneuploidy screening and gene variant screening) using a sample from only one individual, the biological mother. Combining these two screening tests into a single test that does not involve the biological father offers efficiency and convenience over conventional tests and methods, which typically require a paternal sample and perform aneuploidy screening and gene variant screening separately. In addition, the sample preparation can improve sensitivity and specificity and can minimize noise from extraneous genetic material that is not needed for detecting the various causal gene variants.

Embodiments according to the present disclosure are described more fully below. Aspects of the present disclosure may, however, be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those of ordinary skill in the art. The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting.

Unless expressly stated otherwise, all specified embodiments, features, and terms are intended to include both the recited embodiment, feature, or term and its equivalents.

I. Definitions

As used herein, the singular forms "a," "an," and "the" denote both the singular and the plural, unless expressly stated to denote the singular only.

As used herein, the term "about" is to be understood as a relative term that includes the stated value and a range of +/-10%. For example, the phrase "about 10" is to be understood to mean both "10" and "9 to 11".
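As an illustration only, the +/-10% reading of "about" can be written as a small helper. The function names `about_range` and `is_about` are hypothetical and are not part of the disclosure; treating the bounds as inclusive is an assumption.

```python
from typing import Tuple


def about_range(value: float, tolerance: float = 0.10) -> Tuple[float, float]:
    """Return the (low, high) bounds covered by 'about <value>'.

    Assumes the +/-10% reading given in the definition of "about".
    """
    delta = abs(value) * tolerance
    return (value - delta, value + delta)


def is_about(candidate: float, value: float, tolerance: float = 0.10) -> bool:
    """True if candidate lies within 'about <value>' (bounds inclusive, an assumption)."""
    low, high = about_range(value, tolerance)
    return low <= candidate <= high
```

Under this reading, a cutoff of "about 150 nucleotides" would cover 160 nucleotides but not 170.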

Furthermore, as used herein, "and/or" refers to and encompasses any and all possible combinations of one or more of the associated listed items, as well as the lack of combinations when interpreted in the alternative ("or").

As used herein, "optional" or "optionally" means that the subsequently described event or circumstance may or may not occur, and that the description includes instances in which the event or circumstance occurs and instances in which it does not.

As used herein, "DNA-binding particle" refers to any conventional solid phase material that interacts with DNA fragments, such as cfDNA fragments, or that has been modified to interact with such DNA fragments. For example, a solid phase material is any type of insoluble, usually rigid material, matrix, or stationary phase material that interacts, directly or indirectly, with DNA in a reaction solution. In certain exemplary embodiments, the DNA-binding particles are beads.

As used herein, "bead" refers to a solid phase particle of any convenient size, which may have an irregular or regular shape. In certain exemplary embodiments, the surface of the bead is modified to bind DNA directly and/or indirectly. For example, the bead may contain silanol groups, carboxyl groups, or other groups that promote direct and/or indirect interaction between the bead and DNA. In certain exemplary embodiments, silica beads (and gels) can be functionalized by adding primary amines, thiols, mercapto groups, propyl groups, octyl groups, and other derivatives to the hydroxyl groups (silanols) attached to the silica. Beads can be made of any number of known materials, including cellulose, cellulose derivatives, acrylic resins, glass, silica gel, polystyrene, gelatin, polyvinylpyrrolidone, copolymers of vinyl and acrylamide, polystyrene cross-linked with divinylbenzene or the like, polyacrylamide, latex gel, polystyrene, dextran, rubber, silicon, plastic, nitrocellulose, natural sponge, silica gel, controlled pore glass (CPG), metal, cross-linked dextran, agarose gel, and other solid phase bead supports known to those of ordinary skill in the art. In certain exemplary embodiments, the beads can be packed together to form a column that can be used with conventional column chromatography.

As used herein, the term "gene variant," when used in reference to the screening, calling, or processes described herein, refers to an alteration relative to what is considered a non-pathogenic or wild-type gene sequence. Thus, the term "gene variant" includes pathogenic single nucleotide polymorphisms (SNPs), insertions or deletions of bases (INDELs) in a subject's genome, substitution mutations, single-gene copy number variations, and the like. It should also be noted that the term "gene variant" as used herein is distinct from aneuploidy and does not refer to missing or extra chromosomes. Rather, the term "gene variant" is to be understood as relating to features or alterations (pathogenic or otherwise) in a subject's genomic sequence, not to chromosomal abnormalities.

As used herein, the terms "cfDNA library" and "nucleic acid library" are used interchangeably to refer to a collection of nucleic acids, for example a collection of cell-free nucleic acids derived from a biological sample. In some embodiments, the cfDNA library or nucleic acid library is produced by amplifying the nucleic acids in a sample or by otherwise preparing the library with a PCR-free method. In some embodiments, the cfDNA library or nucleic acid library is produced by amplifying specific target fragments within a sample, as detailed below. In some embodiments, some or all of the nucleic acids in the cfDNA library or nucleic acid library include an adapter sequence. The adapter sequence can be located at one end or at both ends. Adapter sequences can be used, for example, in sequencing methods (e.g., NGS methods), amplification, reverse transcription, or cloning into a vector.

A cfDNA library or nucleic acid library may comprise a collection of nucleic acid fragments, which may include target nucleic acid sequences (e.g., nucleic acid sequences in which a gene variant associated with a disease can be detected), reference nucleic acid sequences, or a combination thereof. In some embodiments, two or more cfDNA or nucleic acid libraries from the same subject may be combined.

As used herein, a "sequence library" is a collection of nucleic acid sequences prepared by sequencing a cfDNA library or nucleic acid library, for example using a massively parallel method such as next-generation sequencing (NGS). NGS generally refers to sequencing methods that allow massively parallel sequencing of clonally amplified and single nucleic acid molecules, in which many (for example, millions of) nucleic acid fragments from a single sample or from multiple different samples are sequenced in unison. Non-limiting examples of NGS include sequencing by synthesis, sequencing by ligation, real-time sequencing, and nanopore sequencing.

II. Sample Preparation

Cell-free DNA (cfDNA) is a mixture of DNA that differs in properties (e.g., size, sequence, abundance) and in tissue of origin (e.g., maternal versus fetal). For example, cfDNA obtained from a pregnant woman contains DNA of both maternal and fetal origin. When the cfDNA in a given maternal plasma sample is used, the primary driver of NIPS sensitivity is the fetal fraction (FF). The fetal fraction is the portion of the total cell-free DNA that comes from the fetus, i.e., that is cell-free fetal DNA (cffDNA). For most samples the FF is between 1% and 30%, but in many cases it may be even lower.
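The definition of fetal fraction can be sketched as a simple ratio. This is a minimal illustration, assuming that counts of fetal and maternal fragments are available; in a real sample the origin of each fragment is not directly observed, and the function name `fetal_fraction` is hypothetical.

```python
def fetal_fraction(cff_fragments: int, cfm_fragments: int) -> float:
    """Fraction of total cfDNA that is fetal in origin: cffDNA / (cffDNA + cfmDNA).

    The two counts are hypothetical inputs used only for illustration.
    """
    if cff_fragments < 0 or cfm_fragments < 0:
        raise ValueError("fragment counts must be non-negative")
    total = cff_fragments + cfm_fragments
    if total == 0:
        raise ValueError("sample contains no cfDNA fragments")
    return cff_fragments / total
```

For example, 100 fetal fragments among 1,000 total fragments correspond to an FF of 10%, which lies inside the 1% to 30% range stated above.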

The present disclosure provides sample preparations, and methods of preparing samples from a pregnant woman (i.e., a gestational or biological mother), that can be used to improve sensitivity and specificity and to minimize noise when performing NIPS. Specifically, the sample preparation may rely on physical processing of a cfDNA sample obtained from the pregnant woman, on in silico processing of the sequencing reads generated from that cfDNA sample, or on a combination thereof.

A. Physical enrichment of the fetal fraction

Physical processing of a cfDNA sample (e.g., blood) obtained from a pregnant woman by the methods of the present disclosure can enrich the fetal fraction of the cfDNA sample by up to 3-fold. Specifically, the fetal fraction can be enriched by performing size selection with a size cutoff that retains most of the cell-free fetal DNA fragments and removes some of the larger cell-free maternal DNA fragments. For example, the cutoff can be set to retain cfDNA fragments shorter than about 150, about 155, about 160, about 165, about 170, about 175, or about 180 nucleotides in length.

In some embodiments, the method can be used to select and isolate fragments of fewer than 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 195, 200, 205, 206, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 311, 315, 320, or 325 nucleotides. In some embodiments, the target size can be fewer than 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 195, or 200 nucleotides. Regardless of the precise cutoff or target size, the goal of the process is to retain cffDNA with little or no loss while minimizing or depleting cfmDNA.

This type of size-based exclusion can be performed using electrophoresis (e.g., gel electrophoresis or capillary electrophoresis) and other known methods, which may use, for example, DNA-binding particles such as beads (e.g., AMPURE™ beads). In one embodiment, the nucleic acids are separated by electrophoresis and the desired fragment lengths are then recovered. Various known electrophoretic processes can be used for this purpose, but in one embodiment a NIMBUS Select™ workstation with Ranger Technology™ for high-throughput nucleic acid size selection can be used. Other strategies for fragment size selection include electrophoresis on an agarose cassette (BluePippin, Sage Science) following the manufacturer's instructions for "range" mode. Short fragments are eluted from the gel until the desired target size of eluted DNA is obtained. Other methods include, but are not limited to, solid support capture (e.g., affinity columns), such as antibody-coated spin columns; synchronous (or non-synchronous) coefficient of drag alteration (SCODA) sizing; solid-phase reversible immobilization size selection (e.g., using carboxylated magnetic beads); affinity chromatography processes; or PCR amplification with amplicons of different lengths combined with microchip separation.

The disclosed size-based exclusion methods can enrich the fetal fraction in a cfDNA sample by at least 1.1X, 1.2X, 1.25X, 1.5X, 1.75X, 2X, 2.25X, 2.5X, 2.75X, 3X, 3.25X, 3.5X, 3.75X, 4X, 4.25X, 4.5X, 4.75X, 5X, 5.5X, 6X, 6.5X, 7X, 7.5X, 8X, 8.5X, 9X, 9.5X, 10X, 15X, 20X, 25X, or more.
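The relationship between a size cutoff and the resulting fold enrichment can be sketched on simulated fragments. This is a toy model under stated assumptions, not the disclosed laboratory process: each fragment carries a known fetal/maternal label (available only in a simulation), the cutoff is applied as a strict "shorter than" filter, and the names `Fragment`, `size_select`, and `fold_enrichment` are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class Fragment(NamedTuple):
    length: int     # fragment length in nucleotides
    is_fetal: bool  # origin label; known only in simulated data


def fetal_fraction(fragments: Sequence[Fragment]) -> float:
    """Share of fragments labeled fetal."""
    if not fragments:
        raise ValueError("no fragments")
    return sum(f.is_fetal for f in fragments) / len(fragments)


def size_select(fragments: Sequence[Fragment], cutoff_nt: int) -> List[Fragment]:
    """Retain fragments shorter than the cutoff (strict comparison is an assumption)."""
    return [f for f in fragments if f.length < cutoff_nt]


def fold_enrichment(fragments: Sequence[Fragment], cutoff_nt: int) -> float:
    """Fetal fraction after size selection divided by fetal fraction before."""
    before = fetal_fraction(fragments)
    if before == 0:
        raise ValueError("input contains no fetal fragments")
    return fetal_fraction(size_select(fragments, cutoff_nt)) / before
```

With 10 fetal fragments of 140 nt, 30 maternal fragments of 150 nt, and 60 maternal fragments of 170 nt, a 160 nt cutoff raises the fetal fraction from 10% to 25%, a 2.5X enrichment in this toy example.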

Accordingly, the present disclosure provides a method of size selection for cell-free fetal DNA (cffDNA), comprising subjecting a cell-free DNA (cfDNA) sample that includes cffDNA and cell-free maternal DNA (cfmDNA) to a size exclusion process in order to enrich the fetal fraction of a DNA sample obtained from a pregnant woman.

B. In silico enrichment of the fetal fraction

The present disclosure also provides in silico enrichment of a cfDNA sample (e.g., blood, plasma, serum) obtained from a pregnant woman, which can further enrich the fetal fraction of the cfDNA sample. Specifically, the disclosed in silico enrichment includes a read-length-based size analysis. For the purposes of the present disclosure, a "read-length-based size analysis" is an in silico process that establishes a trajectory from a series of windows applied to the sequencing read data. The established trajectory is based on the allelic balance (AB) observed at a set of FF levels. The FF level is thus varied by in silico size selection across different windows, which allows maternal and fetal DNA (cfmDNA and cffDNA, respectively) to be distinguished. For example, a trajectory might show that the AB is 55% at 10% FF, 60% at 15% FF, and 65% at 20% FF. This is an upward-sloping trajectory, because the AB increases as the FF increases. Both the slope and the offset (or intercept) of such a trajectory are informative. For example, if a given window selects predominantly cfmDNA, so that the FF is as low as possible, the resulting AB mainly reflects the maternal genotype. As windows with smaller fragments pick up more FF, the deflection of the AB indicates the fetal genotype. Thus, if the intercept is about 50% (meaning that the mother is heterozygous for the variant), a trajectory with a negative slope indicates that the fetus did not inherit the particular maternal variant.
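The slope and intercept of such a trajectory can be estimated with a straight-line fit. The sketch below uses ordinary least squares, which is an assumption made for illustration; the disclosure does not state which fitting method is used, and the function name `fit_trajectory` is hypothetical.

```python
from typing import Sequence, Tuple


def fit_trajectory(ff_percent: Sequence[float],
                   ab_percent: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through (FF, AB) points.

    Returns (slope, intercept), where slope is the change in AB percentage
    points per FF percentage point and intercept is the AB extrapolated
    to 0% FF.
    """
    n = len(ff_percent)
    if n < 2 or n != len(ab_percent):
        raise ValueError("need at least two paired (FF, AB) observations")
    mean_x = sum(ff_percent) / n
    mean_y = sum(ab_percent) / n
    sxx = sum((x - mean_x) ** 2 for x in ff_percent)
    if sxx == 0:
        raise ValueError("FF values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(ff_percent, ab_percent))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Applied to the example values in the text (AB of 55%, 60%, and 65% at FF of 10%, 15%, and 20%), the fit gives a slope of 1.0 and an extrapolated intercept of 45%, i.e., an upward-sloping trajectory.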

Understanding the allelic balance in a cfDNA sample improves the ability to focus on the desired portion of the sample (e.g., the FF for aneuploidy and gene variant analysis, or the maternal fraction for carrier analysis). In some embodiments, modest in vitro size selection (i.e., physical processing/size exclusion) followed by a size-based moving window analysis may provide the best results.

Once a sequence library has been prepared, the fetal fraction of the sequence library can be further processed or enriched using an in silico moving window analysis. For the purposes of the disclosed methods, a "window" is a selection or sub-portion of the sequence library that contains sequences of a particular size range. For example, a "window" can include all sequences in the sequence library whose length falls within a range having a lower bound of 0, 25, 50, 75, or 100 nucleotides and an upper bound of 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, or 225 nucleotides (e.g., 0-145, 25-150, 50-170, 75-200, or 100-225 nucleotides), or any range therebetween. If no specific maximum and minimum are set, the window is considered "ungated" and instead includes the entire sequence library. FIG. 2 shows an example in which the sequences in a sequence library are divided into four windows.
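Gating a sequence library into a window amounts to filtering reads by fragment length. The following is a minimal sketch, assuming that fragment lengths have already been extracted from the sequencing data and that window bounds are inclusive; the function name `select_window` is hypothetical.

```python
from typing import List, Optional, Sequence


def select_window(read_lengths: Sequence[int],
                  min_nt: Optional[int] = None,
                  max_nt: Optional[int] = None) -> List[int]:
    """Return the lengths that fall inside the window [min_nt, max_nt].

    A bound of None means that side is not gated; with both bounds None
    the window is "ungated" and the whole library is returned.
    Inclusive bounds are an assumption made for this illustration.
    """
    return [
        n for n in read_lengths
        if (min_nt is None or n >= min_nt) and (max_nt is None or n <= max_nt)
    ]
```

For example, `select_window(lengths, 0, 150)` would keep only fragments of at most 150 nucleotides, while `select_window(lengths)` returns the ungated library.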

Accordingly, the disclosed in silico enrichment method may include performing read-length-based size exclusion on the sequences in at least two windows of a sequence library, thereby obtaining at least two sequence libraries enriched for the fetal fraction. In some embodiments, 3, 4, 5, 6, 7, 8, 9, 10, or more windows may be evaluated. In some embodiments, at least 5, at least 6, at least 7, or at least 8 windows may be evaluated. In some embodiments, the windows are the same size (e.g., each window spans a nucleotide range of a set width, such as 0-100, 5-105, 10-110, etc.). In some embodiments, the windows have different sizes. For example, each additional window may increase in size while the minimum stays the same (e.g., a set of windows with size cutoffs of 0-145, 0-150, 0-155, 0-160, 0-165, 0-170, etc.). Comparing the allelic balance in each window allows an allelic balance trajectory to be calculated across the respective fetal-fraction-enriched sequence libraries. The trajectory is the change in the allelic balance percentage of any given gene sequence of interest across the observed windows. The allelic balance trajectory can be calculated as the slope of the allelic balance across the observed windows, and it can be visualized in a variety of ways, as shown in FIG. 3.
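A per-window allelic balance and a crude trajectory slope can be sketched as follows. This is an illustration under several assumptions that are not taken from the disclosure: each read at the site of interest is reduced to a (fragment length, carries alternate allele) pair, allelic balance is the share of alternate-allele reads in a window, window bounds are inclusive, and the slope is summarized as the average change per window step. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Read = Tuple[int, bool]                       # (fragment length in nt, carries alternate allele)
Window = Tuple[Optional[int], Optional[int]]  # (min_nt, max_nt); None means no bound


def in_window(length: int, window: Window) -> bool:
    """True if the fragment length lies inside the (inclusive) window."""
    low, high = window
    return (low is None or length >= low) and (high is None or length <= high)


def allelic_balance(reads: Sequence[Read], window: Window) -> Optional[float]:
    """Share of alternate-allele reads among reads in the window, or None if empty."""
    selected = [is_alt for length, is_alt in reads if in_window(length, window)]
    if not selected:
        return None
    return sum(selected) / len(selected)


def trajectory(reads: Sequence[Read],
               windows: Sequence[Window]) -> List[Optional[float]]:
    """Allelic balance in each window, in the order the windows are given."""
    return [allelic_balance(reads, w) for w in windows]


def mean_step(values: Sequence[float]) -> float:
    """Average change in allelic balance per window step (a crude slope).

    Expects at least two values, none of which is None.
    """
    if len(values) < 2:
        raise ValueError("need at least two windows")
    return (values[-1] - values[0]) / (len(values) - 1)
```

Windows of increasing size with a fixed minimum, such as `(0, 150)`, `(0, 200)`, and the ungated `(None, None)`, can be passed in order so that the sign of `mean_step` shows whether the allelic balance rises or falls as longer fragments are admitted.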

In addition, the cfmDNA sequence library can be enriched by focusing the analysis between two fragment sizes, such as 100-200 nucleotides, 105-200 nucleotides, 110-200 nucleotides, 115-200 nucleotides, 120-200 nucleotides, 125-200 nucleotides, 130-200 nucleotides, 135-200 nucleotides, 140-200 nucleotides, 145-200 nucleotides, 150-200 nucleotides, 155-200 nucleotides, 160-200 nucleotides, 165-200 nucleotides, 170-200 nucleotides, or 175-200 nucleotides, or any size range therebetween. In some embodiments, the size range selected for enrichment can be about 155 to about 200 nucleotides.

In some embodiments, the at least two windows of the sequence library are selected from (i) sequences of 0-145 nucleotides, (ii) sequences of 0-150 nucleotides, (iii) 0-155 nucleotides, (iv) 0-160 nucleotides, (v) 0-165 nucleotides, (vi) 0-168 nucleotides, (vii) 0-170 nucleotides, (viii) 0-175 nucleotides, (ix) 0-180 nucleotides, (x) 0-185 nucleotides, (xi) 0-190 nucleotides, (xii) 0-195 nucleotides, (xiii) 0-200 nucleotides, and (xiv) ungated. In some embodiments, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 windows are selected from (i) sequences of 0-145 nucleotides, (ii) sequences of 0-150 nucleotides, (iii) 0-155 nucleotides, (iv) 0-160 nucleotides, (v) 0-165 nucleotides, (vi) 0-168 nucleotides, (vii) 0-170 nucleotides, (viii) 0-175 nucleotides, (ix) 0-180 nucleotides, (x) 0-185 nucleotides, (xi) 0-190 nucleotides, (xii) 0-195 nucleotides, (xiii) 0-200 nucleotides, and (xiv) ungated.
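To make the window selection concrete, the sketch below gates a list of fragment lengths into cumulative 0-to-cutoff windows plus an ungated window. It is an assumption-laden illustration: `gate_windows` is a hypothetical helper, the input is simply a list of integer fragment lengths, and cutoffs are treated as inclusive upper bounds.

```python
def gate_windows(fragment_lengths, cutoffs):
    """Assign fragment lengths to cumulative size windows.

    fragment_lengths : list of fragment lengths in nucleotides
    cutoffs          : upper bounds (inclusive) of the 0-to-cutoff windows
    Returns a dict mapping a window label to the lengths it contains;
    the 'ungated' entry always holds the whole library.
    """
    windows = {
        f"0-{cutoff}": [n for n in fragment_lengths if n <= cutoff]
        for cutoff in cutoffs
    }
    windows["ungated"] = list(fragment_lengths)
    return windows
```

Because every window starts at 0, each larger window is a superset of the smaller ones, so per-window statistics can be compared across the same underlying reads.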

In silico enrichment of the fetal fraction of the sequence library can further include identifying and separating cffDNA from cfmDNA by comparing the sequence reads of cffDNA and cfmDNA in the first sequence library to a reference genome, demultiplexing the sequence reads from the first library, removing duplicate sequences from the first sequence library, or a combination thereof.

For example, sample preparation can include in silico binary alignment processing, in which the collected DNA sample can be computationally reconstructed using the overlap between short sequencing reads. Reconstruction of the genome is facilitated if a reference genome is available against which the sequencing reads can be aligned. A sequence alignment tool can be used to map the short reads stored in a file to the reference genome. Subsequently, depth and variant processing can be used to identify and isolate specific gene sequences to inform subsequent analysis, which can be directed to, for example, the identification of specific aneuploidies and/or gene variants. In this way, with only a limited amount of initially collected cfDNA, specific portions of the collected DNA can be mapped and pooled for specific assay detection.

The collected DNA sample can be computationally reconstructed using the overlap between short sequencing reads. Thus, a demultiplexer (e.g., demux) can be used to map the DNA sample in a first pass, which allows determination of the unique molecular identifiers that may be needed to evaluate a particular screen (e.g., carrier, prenatal, etc.). Unique molecular identifiers (UMIs), sometimes referred to as molecular barcodes (MBCs), are short sequences (e.g., tags) added to DNA fragments during sequencing library preparation protocols to identify the desired DNA molecules that a particular screen may target. These tags are added before any amplification and can be used to reduce errors and quantitative bias introduced by amplification.

Once tagged, the specifically tagged DNA sequences can initially be aligned using an alignment process to map the desired DNA sequences against one another. Duplicate reduction (e.g., "deduplication") can then clean up any erroneous identifications and/or misalignments, which can include retaining the consensus sequence of the overlapping portions of paired-end reads. Thereafter, a realignment process can be performed to produce a more robust mapping between the desired DNA sequences and the tagged DNA sequences.
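One common way to use UMIs for deduplication is to treat reads that share both a UMI and the same alignment coordinates as copies of a single original molecule. The sketch below illustrates that idea only; it is not taken from the disclosure, the read dictionaries and their keys (`umi`, `chrom`, `start`, `end`) are hypothetical, and it keeps the first read of each group rather than building a consensus.

```python
def deduplicate(reads):
    """Collapse reads that share a UMI and alignment coordinates.

    reads : iterable of dicts with hypothetical keys
            'umi', 'chrom', 'start', 'end'
    Returns one representative read (the first seen) per distinct key,
    in the order the keys were first encountered.
    """
    representatives = {}
    for read in reads:
        key = (read["umi"], read["chrom"], read["start"], read["end"])
        # setdefault leaves the stored read untouched if the key exists
        representatives.setdefault(key, read)
    return list(representatives.values())
```

A production pipeline would typically also tolerate sequencing errors within the UMI and derive a consensus sequence per group; both are omitted here for brevity.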

Amplification can be used to isolate specific nucleic acid sequences that are of interest or needed for subsequent screening. For example, in silico amplification can be performed using computational tools that calculate theoretical polymerase chain reaction (PCR) results by amplifying DNA sequences from the sequenced DNA sample with a given set of primers (probes). After amplification, the quality of specific read sequences can be improved by removing (e.g., trimming) partial (e.g., incomplete) sequences located at the beginning and end of a sequence. One exemplary but non-limiting way to accomplish this is known as paired-end (PE) trimming, which can involve two input files (for forward and reverse reads) and four output files (for forward paired, forward unpaired, reverse paired, and reverse unpaired reads) to identify and remove partial sequences. Reconstruction of the useful DNA sample can thereby be facilitated and stored in an alternate file. In addition, the files can be sorted into different bins according to fragment length (by number of nucleotides).
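The two-input, four-output layout of paired-end trimming can be sketched as below. This is a simplified illustration under stated assumptions: reads are plain strings that have already been trimmed, a mate "survives" if it is at least `min_len` bases long, and both `split_pairs` and the `min_len` default are hypothetical choices made for this example rather than values from the disclosure.

```python
def split_pairs(pairs, min_len=36):
    """Route trimmed read pairs into the four paired-end output groups.

    pairs   : iterable of (forward_read, reverse_read) strings, already trimmed
    min_len : minimum length for a read to survive (hypothetical default)
    Pairs in which neither mate survives are dropped entirely.
    """
    out = {
        "forward_paired": [],
        "forward_unpaired": [],
        "reverse_paired": [],
        "reverse_unpaired": [],
    }
    for fwd, rev in pairs:
        fwd_ok = len(fwd) >= min_len
        rev_ok = len(rev) >= min_len
        if fwd_ok and rev_ok:
            out["forward_paired"].append(fwd)
            out["reverse_paired"].append(rev)
        elif fwd_ok:
            out["forward_unpaired"].append(fwd)
        elif rev_ok:
            out["reverse_unpaired"].append(rev)
    return out
```

Keeping the paired outputs in matching order matters because fragment length can only be derived from pairs in which both mates remain.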

As part of depth and variant processing, specific gene sequences stored in the files can be identified and isolated to inform subsequent analysis directed to specific aneuploidies and/or causal gene variants. The files can be used in specific programs to mitigate bias in the initially collected sample. The foregoing in silico steps and computational preparation can optimize the DNA sample for specific DNA sequences according to the particular goal of a given test or screen.

The disclosed in silico processing can enrich the fetal fraction in a cfDNA sample by at least 1.1X, 1.2X, 1.25X, 1.5X, 1.75X, 2X, 2.25X, 2.5X, 2.75X, 3X, 3.25X, 3.5X, 3.75X, 4X, 4.25X, 4.5X, 5.75X, 5X, 5.5X, 6X, 6.5X, 7X, 7.5X, 8X, 8.5X, 9X, 9.5X, 10X, 15X, 20X, 25X, or more.

Alternatively, if desired, the disclosed in silico processing can also be used to enrich the maternal fraction of a sample by selecting larger fragments. In some embodiments, the disclosed in silico processing can enrich the maternal fraction in a cfDNA sample by at least 1.1X, 1.2X, 1.25X, 1.5X, 1.75X, 2X, 2.25X, 2.5X, 2.75X, 3X, 3.25X, 3.5X, 3.75X, 4X, 4.25X, 4.5X, 5.75X, 5X, 5.5X, 6X, 6.5X, 7X, 7.5X, 8X, 8.5X, 9X, 9.5X, 10X, 15X, 20X, 25X, or more.

Accordingly, the present disclosure provides methods for in silico sorting and enrichment of cffDNA, comprising sequencing a cell-free DNA (cfDNA) sample that includes cffDNA and cell-free maternal DNA (cfmDNA), and performing read-length-based size analysis, wherein size-based moving windows are used to establish a trajectory based on the allelic balance between cfmDNA and cffDNA, thereby elucidating the genotype of the cfmDNA or cffDNA in a given sample. In some embodiments, such methods can further include identifying and separating cffDNA from cfmDNA by comparing the sequence reads of cffDNA and cfmDNA to a reference genome, demultiplexing the sequence reads, and removing duplicate sequences.

C. Combination of physical enrichment and in silico enrichment

The foregoing sample preparation methods can be performed alone or in combination to enrich the fetal fraction of a given sample. Prior to physical or in silico enrichment, total cfDNA can be isolated from a maternal sample (e.g., blood, plasma, serum) by conventional means. For example, total cfDNA can be extracted from clarified plasma obtained from the sample using the APOSTLE™ cell-free DNA extraction kit. Other known methods and commercially available kits for cfDNA extraction can also be used, including but not limited to kits produced by Molzym GmbH & Co KG (Bremen, DE), Qiagen (Hilden, DE), Macherey-Nagel (Düren, DE), Roche (Basel, CH), and Sigma (Deisenhofen, DE).

After physical enrichment and in silico enrichment, the fetal fraction can be 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 85%, 90%, 95%, 99%, or 100% of the DNA sample used for further testing, screening, or analysis. Additionally or alternatively, after physical enrichment and in silico enrichment, the fetal fraction can be about 5% to 100%, about 5% to about 95%, about 5% to about 90%, about 5% to about 85%, about 5% to about 80%, about 5% to about 75%, about 10% to 100%, about 10% to about 95%, about 10% to about 90%, about 10% to about 85%, about 10% to about 80%, about 10% to about 75%, about 15% to 100%, about 15% to about 95%, about 15% to about 90%, about 15% to about 85%, about 15% to about 80%, about 15% to about 75%, about 20% to 100%, about 20% to about 95%, about 20% to about 90%, about 20% to about 85%, about 20% to about 80%, about 20% to about 75%, about 25% to 100%, about 25% to about 95%, about 25% to about 90%, about 25% to about 85%, about 25% to about 80%, about 25% to about 75%, about 30% to 100%, about 30% to about 95%, about 30% to about 90%, about 30% to about 85%, about 30% to about 80%, about 30% to about 75%, about 35% to 100%, about 35% to about 95%, about 35% to about 90%, about 35% to about 85%, about 35% to about 80%, about 35% to about 75%, about 40% to 100%, about 40% to about 95%, about 40% to about 90%, about 40% to about 85%, about 40% to about 80%, about 40% to about 75%, about 45% to 100%, about 45% to about 95%, about 45% to about 90%, about 45% to about 85%, about 45% to about 80%, about 45% to about 75%, about 50% to 100%, about 50% to about 95%, about 50% to about 90%, about 50% to about 85%, about 50% to about 80%, or about 50% to about 75%.

Accordingly, the present disclosure provides methods for preparing a cell-free DNA sample having an enriched fetal fraction, comprising processing a cfDNA sample using size exclusion to retain cell-free fetal DNA (cffDNA) and remove cell-free maternal DNA (cfmDNA), performing in silico processing to identify and separate cffDNA from cfmDNA, or a combination thereof.

III. Parallel Screening Methods

The present disclosure provides methods for assessing or screening for aneuploidy and gene variants in a fetus using only a single biological sample (e.g., blood, plasma, serum) from the fetus's biological mother. Conventionally, aneuploidy testing and gene variant testing are performed separately and require multiple samples. Indeed, screening for certain conditions even requires obtaining a biological sample from the biological father. The disclosed methods overcome these problems and provide new and useful methods that improve on conventional non-invasive prenatal screening (NIPS).

The disclosed methods can include two parallel screens that use the same single cfDNA sample from the biological mother: a first screen for detecting aneuploidy and a second screen for detecting gene variants.

In the first screen, a specific sub-portion of the collected sample (e.g., a sub-portion of smaller cfDNA fragments) can be used to optimize the fetal fraction in order to assess the presence or absence of an aneuploidy condition. The presence or absence of aneuploidy can be determined by establishing a trajectory that allows maternal and fetal DNA to be distinguished. In this way, the disclosed screen can simultaneously assess fetal aneuploidy and maternal aneuploidy, which was not previously possible. The first screen can additionally or alternatively rely on sequencing depth to determine whether aneuploidy is present or absent in a given sample.

In the second screen, a specific sub-portion of the collected sample (e.g., a sub-portion of smaller cfDNA fragments) can be used to optimize the fetal fraction and minimize noise from extraneous genetic material. The sub-portion can then be used to detect various gene variants by, for example, establishing a trajectory to map the relevant sample material apart from the extraneous sample material. In each screen, using an optimal swath of the genetic sample that contains an appropriate ratio of cell-free maternal DNA (cfmDNA) to cell-free fetal DNA (cffDNA) allows the presence or absence of known aneuploidies and gene variants to be detected with reasonable certainty, without having to resort to tailoring the individual focus of the parallel screens to one method or the other.

The methods can begin with collecting a sample from the biological mother, typically by blood draw, although other biological samples (e.g., plasma, serum, etc.) are also contemplated. Such a sample contains cell-free DNA (cfDNA). The cfDNA can include various freely circulating DNA, including circulating tumor DNA (ctDNA), cell-free mitochondrial DNA (cf mtDNA), cell-free maternal DNA (cfmDNA), and cell-free fetal DNA (cffDNA). Because the subject is pregnant, a certain level of fetal DNA will also be present in the cfDNA sample. In addition, targeted DNA capture suited to specific gene sequences can also be performed. Accordingly, both cfDNA and aspects of targeted capture can be used for the purposes of the disclosed methods.

In one aspect, the present disclosure provides a method for detecting in parallel the presence or absence of aneuploidy and at least one genetic mutation in a single maternal sample, comprising (i) obtaining a biological sample from a pregnant woman, wherein the biological sample comprises cell-free DNA (cfDNA); (ii) preparing a cfDNA library (e.g., by amplifying a target population of cfDNA fragments); (iii) sequencing the cfDNA library to prepare a sequence library; and (iv) detecting the presence or absence of aneuploidy and at least one gene variant in the single maternal sample;

wherein (a) the cfDNA library is enriched to increase the fetal fraction, (b) the sequence library is enriched to increase the fetal fraction, or (c) a combination thereof, such that the fetal fraction of the single maternal sample is increased at least 1.5-fold before the presence or absence of aneuploidy and at least one gene variant in the single maternal sample is detected. In some embodiments, the cfDNA library is enriched to increase the fetal fraction and the sequence library is enriched to increase the fetal fraction. The following sections provide further detail on the processes associated with each form of enrichment.
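The fold-increase condition can be expressed as a simple ratio of fetal fraction after enrichment to fetal fraction before enrichment. The sketch below is illustrative only: it assumes both fetal fractions have already been estimated by some upstream method and are given as proportions between 0 and 1, and the function names and the `min_fold` parameter are hypothetical.

```python
def fold_enrichment(ff_before, ff_after):
    """Ratio of fetal fraction after enrichment to fetal fraction before.

    Both arguments are proportions (e.g., 0.08 for 8%).
    """
    if not 0 < ff_before <= 1:
        raise ValueError("ff_before must be in (0, 1]")
    if not 0 <= ff_after <= 1:
        raise ValueError("ff_after must be in [0, 1]")
    return ff_after / ff_before


def meets_threshold(ff_before, ff_after, min_fold=1.5):
    """True when enrichment reaches the required fold increase."""
    return fold_enrichment(ff_before, ff_after) >= min_fold
```

For instance, a sample whose fetal fraction rises from 8% to 16% has a two-fold increase and would satisfy a 1.5-fold requirement, whereas a rise from 10% to 12% would not.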

(i) Biological samples

For the purposes of the disclosed methods, the biological sample needs to contain cfDNA, including cffDNA. Examples of samples that can be obtained from the biological mother for use in the disclosed methods include, but are not limited to, blood, serum, and plasma.

In some embodiments, nucleic acid extraction is performed before amplifying the cfDNA in the sample and preparing the one or more cfDNA libraries. Various protocols for nucleic acid extraction can be used in the methods of the present technology. Examples of commercially available nucleic acid purification kits include the Apostle MiniMax kit and kits from Molzym GmbH (Bremen, Germany), Qiagen (Hilden, Germany), Macherey-Nagel (Düren, Germany), Roche (Basel, Switzerland), or Sigma (Deisenhofen, Germany). Other nucleic acid purification systems based on the use of polystyrene beads or the like as a support material can also be used. Automated DNA extraction platforms, such as the EZ1™ automated system, can also be used.

(ii) cfDNA library preparation

cfDNA library preparation can be performed using known amplification methods (e.g., the xGen Prism library preparation kit (IDT™)) as well as PCR-free library preparation methods, such as the COLLIBRI™ and TRUSEQ™ kits produced by Illumina, the KAPA™ HyperPrep kit produced by Roche, and the MGIEasy kit produced by MG Tech. Optionally, preparation of the cfDNA library can include an end-repair step. cfDNA can include overhangs or other damage at the ends of a given nucleic acid sequence, and end repair can convert such damaged or sheared DNA into blunt-ended molecules that are more easily ligated to adapters, tags, or barcodes. One or more ligation reactions can be performed to ligate adapters to the nucleic acid sequences of the sample. Adapters are used to facilitate amplification by providing a consistent sequence to which primers can anneal, and to isolate sequences of interest. An adapter can have a unique length (to allow separation and isolation by electrophoresis), a unique sequence, or other features that aid isolation of the target nucleic acid sequence after amplification.

PCR-based methods are commonly used to generate amplified libraries prior to sequencing or analysis of a given nucleic acid sample; however, PCR is not required, and one of ordinary skill in the art will be aware of PCR-free library preparation methods. Various PCR methods that use commercially available reagents and polymerases can be used for the nucleic acid amplification portion of library preparation (e.g., KAPA™ HiFi HotStart ReadyMix).

A cfDNA library can be prepared from a maternal sample using any method described herein or otherwise known to one of ordinary skill in the art. Optionally, the cfDNA library can be cleaned up using known methods, such as isolating the amplified fragments in the library using AMPURE beads or other similar methods, in order to remove salts, unwanted macromolecules, and other debris from the sample.

Prior to sequencing the cfDNA library, the fetal fraction can be enriched as described herein. Additionally or alternatively, the fetal fraction can be enriched from the maternal sample before the cfDNA library is prepared. Briefly, enriching the fetal fraction of the cfDNA library or maternal sample is a physical treatment of the sample, which can include removing from the cfDNA library any DNA fragments greater than about 150 nucleotides in length, about 155 nucleotides in length, about 160 nucleotides in length, about 165 nucleotides in length, about 170 nucleotides in length, about 175 nucleotides in length, about 180 nucleotides in length, about 185 nucleotides in length, about 190 nucleotides in length, about 195 nucleotides in length, or about 200 nucleotides in length.

This type of size-based exclusion can be performed using electrophoresis (e.g., gel electrophoresis or capillary electrophoresis) and other known methods, which can use, for example, DNA-binding particles such as beads (e.g., AMPURE™ beads). In one embodiment, the nucleic acids are separated by electrophoresis and the desired fragment lengths are then recovered. Various known electrophoresis processes can be used for this purpose. For example, in one embodiment, the NIMBUS Select™ workstation with Ranger Technology™ for high-throughput nucleic acid size selection can be used. In another embodiment, the BluePippin electrophoresis system can be used.

Size-based exclusion methods have previously been used to enrich the fetal fraction of cfDNA libraries, but unlike those earlier methods, the present inventors found that, as described herein, using a higher cutoff improves noise reduction when combined with further in silico selection. Briefly, and without being bound by theory, noise can be reduced because a higher total number of cffDNA molecules is retained by the more permissive size selection. Conventionally, a lower cutoff has been considered superior because it excludes more maternal cfDNA. Figure 1 shows a comparison of the disclosed size exclusion process with conventional methods. As shown in Figure 1, these more restrictive conventional methods also discard a large amount of cffDNA. Accordingly, the disclosed approach of combining a more "permissive" size exclusion technique with further in silico enrichment is an improvement that specifically addresses a key problem in the field of prenatal screening: enriching the fetal fraction without inadvertently or unnecessarily discarding cffDNA, which is in extremely limited supply within a given sample.

The disclosed size-based exclusion methods can enrich the fetal fraction in a cfDNA sample by at least 1.1X, 1.2X, 1.25X, 1.5X, 1.75X, 2X, 2.25X, 2.5X, 2.75X, 3X, 3.25X, 3.5X, 3.75X, 4X, 4.25X, 4.5X, 5.75X, 5X, 5.5X, 6X, 6.5X, 7X, 7.5X, 8X, 8.5X, 9X, 9.5X, 10X, 15X, 20X, 25X, or more.

(iii) Sequencing the nucleic acid library

The nucleic acid library, which may be enriched for the fetal fraction, can be sequenced using known sequencing methods (e.g., NovaSeq sequencers and flow cells, Illumina sequencers, pyrosequencing, reversible dye-terminator sequencing, SOLiD sequencing, ion semiconductor sequencing, Helioscope single-molecule sequencing, the Ion Torrent™ (Life Technologies, Carlsbad, CA) amplicon sequencing system, the 454™ GS FLX™ sequencing system, SMRT™ sequencing, etc.). In some embodiments, the cfDNA fragments in the nucleic acid library are sequenced from both ends (i.e., paired-end mode). In some embodiments, the cfDNA fragments in the nucleic acid library are sequenced from one end (i.e., single-end mode). In some embodiments, the cfDNA fragments in the nucleic acid library can be isolated or bound using a targeted capture method such as hybridization capture. Sequencing from both ends of each fragment allows the length of the fragment to be determined. In some embodiments, the resulting sequences can be used to map the cfDNA fragments.
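The reason paired-end sequencing yields fragment length is that, once both mates are aligned to the reference, the fragment spans from the leftmost aligned base to the rightmost aligned base of the pair. The sketch below illustrates this under stated assumptions: both mates map to the same chromosome, coordinates are 0-based and half-open, and `fragment_length` is a hypothetical helper rather than part of any particular aligner's API.

```python
def fragment_length(fwd_start, fwd_end, rev_start, rev_end):
    """Infer fragment length from the aligned positions of two mates.

    Coordinates are assumed to be 0-based, half-open reference positions
    on the same chromosome. The fragment spans the outermost coordinates.
    """
    if fwd_end <= fwd_start or rev_end <= rev_start:
        raise ValueError("each mate must span at least one base")
    return max(fwd_end, rev_end) - min(fwd_start, rev_start)
```

In a single-end run only one mate is available, so the fragment's far end is unknown unless the read extends through the entire fragment.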

In some embodiments, the disclosed methods can use a target capture method to sequence only specific fragments of interest. The fragments of interest can, for example, correspond to cfDNA encoding a gene associated with a genetic disease, condition, or trait (i.e., a gene variant of interest) or to cfDNA corresponding to a specific chromosome.

Once the cfDNA fragments in the nucleic acid library have been sequenced, the fetal fraction of the sequence library can be further enriched using the in silico moving window analysis described herein. For the purposes of the disclosed methods, a "window" is a selection or sub-portion of the sequence library that contains sequences of a particular size range. For example, a "window" can include all sequences in the sequence library that are 0-145 nucleotides, 0-150 nucleotides, 0-155 nucleotides, 0-160 nucleotides, 0-165 nucleotides, 0-170 nucleotides, 0-175 nucleotides, 0-180 nucleotides, 0-185 nucleotides, 0-190 nucleotides, 0-195 nucleotides, 0-200 nucleotides, 25-145 nucleotides, 25-150 nucleotides, 25-155 nucleotides, 25-160 nucleotides, 25-165 nucleotides, 25-170 nucleotides, 25-175 nucleotides, 25-180 nucleotides, 25-185 nucleotides, 25-190 nucleotides, 25-195 nucleotides, 25-200 nucleotides, 50-145 nucleotides, 50-150 nucleotides, 50-155 nucleotides, 50-160 nucleotides, 50-165 nucleotides, 50-170 nucleotides, 50-175 nucleotides, 50-180 nucleotides, 50-185 nucleotides, 50-190 nucleotides, 50-195 nucleotides, 50-200 nucleotides, 75-145 nucleotides, 75-150 nucleotides, 75-155 nucleotides, 75-160 nucleotides, 75-165 nucleotides, 75-170 nucleotides, 75-175 nucleotides, 75-180 nucleotides, 75-185 nucleotides, 75-190 nucleotides, 75-195 nucleotides, 75-200 nucleotides, 100-145 nucleotides, 100-150 nucleotides, 100-155 nucleotides, 100-160 nucleotides, 100-165 nucleotides, 100-170 nucleotides, 100-175 nucleotides, 100-180 nucleotides, 100-185 nucleotides, 100-190 nucleotides, 100-195 nucleotides, 100-200 nucleotides, or any range therebetween. In some embodiments, the disclosed methods can use two or more (e.g., 2, 3, 4, 5, 6, 7, 8, 9, or 10, or more) windows comprising fragments in two or more (e.g., 2, 3, 4, 5, 6, 7, 8, 9, or 10, or more) size ranges selected from: 0-145 nucleotides, 0-146 nucleotides, 0-147 nucleotides, 0-148 nucleotides, 0-149 nucleotides, 0-150 nucleotides, 0-151 nucleotides, 0-152 nucleotides, 0-153 nucleotides, 0-154 nucleotides, 0-155 nucleotides, 0-156 nucleotides, 0-157 nucleotides, 0-158 nucleotides, 0-159 nucleotides, 0-160 nucleotides, 0-161 nucleotides, 0-162 nucleotides, 0-163 nucleotides, 0-164 nucleotides, 0-165 nucleotides, 0-166 nucleotides, 0-167 nucleotides, 0-168 nucleotides, 0-169 nucleotides, 0-170 nucleotides, 0-171 nucleotides, 0-172 nucleotides, 0-173 nucleotides, 0-174 nucleotides, 0-175 nucleotides, 0-176 nucleotides, 0-177 nucleotides, 0-178 nucleotides, 0-179 nucleotides, 0-180 nucleotides, 0-181 nucleotides, 0-182 nucleotides, 0-183 nucleotides, 0-184 nucleotides, 0-185 nucleotides, 0-186 nucleotides, 0-187 nucleotides, 0-188 nucleotides, 0-189 nucleotides, 0-190 nucleotides, 0-191 nucleotides, 0-192 nucleotides, 0-193 nucleotides, 0-194 nucleotides, 0-195 nucleotides, 0-196 nucleotides, 0-197 nucleotides, 0-198 nucleotides, 0-199 nucleotides, 0-200 nucleotides, 5-145 nucleotides, 5-146 nucleotides, 5-147 nucleotides, 5-148 nucleotides, 5-149 nucleotides, 5-150 nucleotides, 5-151 nucleotides, 5-152 nucleotides, 5-153 nucleotides, 5-154 nucleotides, 5-155 nucleotides, 5-156 nucleotides, 5-157 nucleotides, 5-158 nucleotides, 5-159 nucleotides, 5-160 nucleotides, 5-161 nucleotides, 5-162 nucleotides, 5-163 nucleotides, 5-164 nucleotides, 5-165 nucleotides, 5-166 nucleotides, 5-167 nucleotides, 5-168 nucleotides, 5-169 nucleotides, 5-170 nucleotides, 5-171 nucleotides, 5-172 nucleotides, 5-173 nucleotides, 5-174 nucleotides, 5-175 nucleotides, 5-176 nucleotides, 5-177 nucleotides, 5-178 nucleotides, 5-179 nucleotides, 5-180 nucleotides, 5-181 nucleotides, 5-182 nucleotides, 5-183 nucleotides, 5-184 nucleotides, 5-185 nucleotides, 5-186 nucleotides, 5-187 nucleotides, 5-188 nucleotides, 5-189 nucleotides, 5-190 nucleotides, 5-191 nucleotides, 5-192 nucleotides, 5-193 nucleotides, 5-194 nucleotides, 5-195 nucleotides, 5-196 nucleotides, 5-197 nucleotides, 5-198 nucleotides, 5-199 nucleotides, 5-200 nucleotides, 10-145 nucleotides, 10-146 nucleotides, 10-147 nucleotides, 10-148 nucleotides, 10-149 nucleotides, 10-150 nucleotides, 10-151 nucleotides, 10-152 nucleotides, 10-153 nucleotides, 10-154 nucleotides, 10-155 nucleotides, 10-156 nucleotides, 10-157 nucleotides, 10-158 nucleotides, 10-159 nucleotides, 10-160 nucleotides, 10-161 nucleotides, 10-162 nucleotides, 10-163 nucleotides, 10-164 nucleotides, 10-165 nucleotides, 10-166 nucleotides, 10-167 nucleotides, 10-168 nucleotides, 10-169 nucleotides, 10-170 nucleotides, 10-171 nucleotides, 10-172 nucleotides, 10-173 nucleotides, 10-174 nucleotides, 10-175 nucleotides, 10-176 nucleotides, 10-177 nucleotides, 10-178 nucleotides, 10-179 nucleotides, 10-180 nucleotides, 10-181 nucleotides, 10-182 nucleotides, 10-183 nucleotides, 10-184 nucleotides, 10-185 nucleotides, 10-186 nucleotides, 10-187 nucleotides, 10-188 nucleotides, 10-189 nucleotides, 10-190 nucleotides, 10-191 nucleotides, 10-192 nucleotides, 10-193 nucleotides, 10-194 nucleotides, 10-195 nucleotides, 10-196 nucleotides, 10-197 nucleotides, 10-198 nucleotides, 10-199 nucleotides, 10-200 nucleotides, 15-145 nucleotides, 15-146 nucleotides, 15-147 nucleotides, 15-148 nucleotides, 15-149 nucleotides, 15-150 nucleotides, 15-151 nucleotides, 15-152 nucleotides, 15-153 nucleotides, 15-154 nucleotides, 15-155 nucleotides, 15-156 nucleotides, 15-157 nucleotides, 15-158 nucleotides, 15-159 nucleotides, 15-160 nucleotides, 15-161 nucleotides, 15-162 nucleotides, 15-163 nucleotides, 15-164 nucleotides, 15-165 nucleotides, 15-166 nucleotides, 15-167 nucleotides, 15-168 nucleotides, 15-169 nucleotides, 15-170 nucleotides, 15-171 nucleotides, 15-172 nucleotides, 15-173 nucleotides, 15-174 nucleotides, 15-175 nucleotides, 15-176 nucleotides, 15-177 nucleotides, 15-178 nucleotides, 15-179 nucleotides, 15-180 nucleotides, 15-181 nucleotides, 15-182 nucleotides, 15-183 nucleotides, 15-184 nucleotides, 15-185 nucleotides, 15-186 nucleotides, 15-187 nucleotides, 15-188 nucleotides, 15-189 nucleotides, 15-190 nucleotides, 15-191 nucleotides, 15-192 nucleotides, 15-193 nucleotides, 15-194 nucleotides, 15-195 nucleotides, 15-196 nucleotides, 15-197 nucleotides, 15-198 nucleotides, 15-199 nucleotides, 15-200 nucleotides, or any range therebetween. In some embodiments, the disclosed methods can use at least eight windows comprising size ranges that include 0 to about 145 nucleotides, 0 to about 150 nucleotides, 0 to about 155 nucleotides, 0 to about 160 nucleotides, 0 to about 165 nucleotides, 0 to about 168 nucleotides, 0 to about 175 nucleotides, and 0 to about 190 nucleotides. In some embodiments, the disclosed methods can use eight windows comprising size ranges of 0-145 nucleotides, 0-150 nucleotides, 0-155 nucleotides, 0-160 nucleotides, 0-165 nucleotides, 0-168 nucleotides, 0-175 nucleotides, and 0-190 nucleotides.
In some embodiments, the disclosed methods can utilize two or more (e.g., 2, 3, 4, 5, 6, 7, 8, 9, or 10, or more) windows comprising fragments in two or more (e.g., 2, 3, 4, 5, 6, 7, 8, 9, or 10, or more) size ranges selected from 0-145 nucleotides, 0-146 nucleotides, 0-147 nucleotides, 0-148 nucleotides, 0-149 nucleotides, 0-150 nucleotides, 0-151 nucleotides, 0-152 nucleotides, 0-153 nucleotides, 0-154 nucleotides, 0-155 nucleotides, 0-156 nucleotides, -157 nucleotides, 0-158 nucleotides, 0-159 nucleotides, 0-160 nucleotides, 0-161 nucleotides, 0-162 nucleotides, 0-163 nucleotides, 0-164 nucleotides, 0-165 nucleotides, 0-166 nucleotides, 0-167 nucleotides, 0-168 nucleotides, 0-169 nucleotides, 0-170 nucleotides, 0-171 nucleotides, 0-172 nucleotides, 0-173 nucleotides, 0-174 nucleotides, 0-175 nucleotides, 0-176 nucleotides, 0-177 nucleotides, 0-178 nucleotides, 0-179 nucleotides, 0-18 0 nucleotides, 0-181 nucleotides, 0-182 nucleotides, 0-183 nucleotides, 0-184 nucleotides, 0-185 nucleotides, 0-186 nucleotides, 0-187 nucleotides, 0-188 nucleotides, 0-189 nucleotides, 0-190 nucleotides, 0-191 nucleotides, 0-192 nucleotides, 0-193 nucleotides, 0-194 nucleotides, 0-195 nucleotides Acid, 0-196 nucleotides, 0-197 nucleotides, 0-198 nucleotides, 0-199 nucleotides, 0-200 nucleotides, 5-145 nucleotides, 5-146 nucleotides, 5-147 nucleotides, 5-148 nucleotides, 5-149 nucleotides, 5-150 nucleotides, 5-151 nucleotides, 5-152 nucleotides, 5-153 nucleotides, 5-154 nucleotides, 5- 155 nucleotides, 5-156 nucleotides, -157 nucleotides, 5-158 nucleotides, 5-159 nucleotides, 5-160 nucleotides, 5-161 nucleotides, 5-162 nucleotides, 5-163 nucleotides, 5-164 nucleotides, 5-165 nucleotides, 5-166 nucleotides, 5-167 nucleotides, 5-168 nucleotides, 5-169 nucleotides, 5-170 nucleotides nucleotides, 5-171 nucleotides, 5-172 nucleotides, 5-173 nucleotides, 5-174 nucleotides, 5-175 nucleotides, 5-176 nucleotides, 5-177 nucleotides, 5-178 nucleotides, 5-179 nucleotides, 5-180 nucleotides, 
5-181 nucleotides, 5-182 nucleotides, 5-183 nucleotides, 5-184 nucleotides, 5-185 nucleotides, 5 -186 nucleotides, 5-187 nucleotides, 5-188 nucleotides, 5-189 nucleotides, 5-190 nucleotides, 5-191 nucleotides, 5-192 nucleotides, 5-193 nucleotides, 5-194 nucleotides, 5-195 nucleotides, 5-196 nucleotides, 5-197 nucleotides, 5-198 nucleotides, 5-199 nucleotides, 5-200 nucleotides, 10-14 5 nucleotides, 10-146 nucleotides, 10-147 nucleotides, 10-148 nucleotides, 10-149 nucleotides, 10-150 nucleotides, 10-151 nucleotides, 10-152 nucleotides, 10-153 nucleotides, 10-154 nucleotides, 10-155 nucleotides, 10-156 nucleotides, -157 nucleotides, 10-158 nucleotides, 10-159 nucleotides, 10-160 nucleotides, 10-161 nucleotides, 10-162 nucleotides, 10-163 nucleotides, 10-164 nucleotides, 10-165 nucleotides, 10-166 nucleotides, 10-167 nucleotides, 10-168 nucleotides, 10-169 nucleotides, 10-170 nucleotides, 10-171 nucleotides, 10-172 nucleotides, 10-173 nucleotides, 10-174 nucleotides, 10-175 nucleotides, 10-176 nucleotides, 10-177 nucleotides, 10-178 nucleotides, 10-179 nucleotides, 10-180 nucleotides, 10-181 nucleotides, 10-182 nucleotides, 10-183 nucleotides, 10-184 nucleotides, 10-185 nucleotides, 10-186 nucleotides, 10-187 nucleotides, 10-188 nucleotides, 10-189 nucleotides, 10-190 nucleotides, 10-191 nucleotides, 10-192 nucleotides, 10-193 nucleotides, 10-194 nucleotides, 10-195 nucleotides, 10-196 nucleotides, 10-197 nucleotides, 10-198 nucleotides, 10-199 nucleotides, 10-200 nucleotides, 15-145 nucleotides, 15-146 nucleotides, 15-147 nucleotides, 15-148 nucleotides, 15-149 nucleotides, 15-150 nucleotides, 15-151 nucleotides, 15-152 nucleotides, 15-153 nucleotides, 15-154 nucleotides, 15-155 nucleotides, 15-156 nucleotides, -157 nucleotides, 15-158 nucleotides, 15-159 nucleotides Acid, 15-160 nucleotides, 15-161 nucleotides, 15-162 nucleotides, 15-163 nucleotides, 15-164 nucleotides, 15-165 nucleotides, 15-166 nucleotides, 15-167 nucleotides, 15-168 nucleotides, 15-169 
nucleotides, 15-170 nucleotides, 15-171 nucleotides, 15-172 nucleotides, 15-173 nucleotides Acid, 15-174 nucleotides, 15-175 nucleotides, 15-176 nucleotides, 15-177 nucleotides, 15-178 nucleotides, 15-179 nucleotides, 15-180 nucleotides, 15-181 nucleotides, 15-182 nucleotides, 15-183 nucleotides, 15-184 nucleotides, 15-185 nucleotides, 15-186 nucleotides, 15-187 nucleotides In some embodiments, the disclosed method can utilize at least eight windows, and the size ranges included in the windows include 0 to about 145 nucleotides, 0 to about 150 nucleotides, 0 to about 155 nucleotides, 0 to about 160 nucleotides, 0 to about 165 nucleotides, 0 to about 168 nucleotides, 0 to about 175 nucleotides, and 0 to about 190 nucleotides. In some embodiments, the disclosed methods may utilize eight windows including size ranges of 0-145 nucleotides, 0-150 nucleotides, 0-155 nucleotides, 0-160 nucleotides, 0-165 nucleotides, 0-168 nucleotides, 0-175 nucleotides, and 0-190 nucleotides.

In some embodiments, the disclosed methods can utilize two or more (e.g., 2, 3, 4, 5, 6, 7, 8, 9, or 10, or more) windows comprising fragments in two or more (e.g., 2, 3, 4, 5, 6, 7, 8, 9, or 10, or more) size ranges, each with a lower bound of about 20, about 25, about 50, about 75, or about 100 nucleotides and an upper bound of about 145, about 150, about 155, about 160, about 165, about 170, about 175, about 180, about 185, about 190, about 195, or about 200 nucleotides (e.g., about 20 to about 145 nucleotides, about 25 to about 150 nucleotides, about 50 to about 175 nucleotides, about 75 to about 190 nucleotides, or about 100 to about 200 nucleotides), or any range therebetween.

For the purposes of the disclosed methods, the windows used for subsequent analysis and trajectory calculation can be of different sizes (i.e., each window contains a different range of fragment sizes, such as 0-145, 0-150, 0-155, etc.) or of the same size (i.e., each window contains different fragments but spans a range of fixed width, such as 0-145, 5-150, 10-155, etc.). A window can also be "ungated," meaning that no specific minimum or maximum is set and the window instead contains the entire sequence library. Figure 2 shows an example of how the sequences in a sequence library can be divided into six different windows.
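The window definitions can be sketched in a few lines of Python. This is only an illustration: the names `Window`, `in_window`, and `split_into_windows`, the example fragment lengths, and the choice of inclusive bounds are assumptions made here, not details taken from the disclosure.

```python
from typing import Dict, List, Optional, Tuple

# A window is (min_length, max_length); None means that bound is not set.
Window = Tuple[Optional[int], Optional[int]]

# Hypothetical window sets mirroring the examples in the text.
GROWING_WINDOWS: Dict[str, Window] = {
    "0-145": (0, 145),
    "0-150": (0, 150),
    "0-155": (0, 155),
    "ungated": (None, None),  # whole library, no size limits
}
SLIDING_WINDOWS: Dict[str, Window] = {
    "0-145": (0, 145),
    "5-150": (5, 150),
    "10-155": (10, 155),
}


def in_window(length: int, window: Window) -> bool:
    """Return True if a fragment length lies inside the window (bounds inclusive, by assumption)."""
    lo, hi = window
    return (lo is None or length >= lo) and (hi is None or length <= hi)


def split_into_windows(lengths: List[int], windows: Dict[str, Window]) -> Dict[str, List[int]]:
    """Collect, for each window, the fragment lengths that fall inside it."""
    return {name: [n for n in lengths if in_window(n, w)] for name, w in windows.items()}


# Made-up fragment lengths in nucleotides.
fragment_lengths = [120, 143, 148, 152, 166, 170, 210]
by_window = split_into_windows(fragment_lengths, GROWING_WINDOWS)
for name, members in by_window.items():
    print(name, len(members))
```

Because the windows overlap, one fragment can belong to several of them; the ungated window always contains every fragment.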

As described above, enriching the fetal fraction of the sequence library is a form of in silico enrichment, which can include performing read-length-based size exclusion on the sequences in at least two windows of the sequence library, thereby obtaining at least two fetal-fraction-enriched sequence libraries. Comparing the allele balance in each window allows an allele balance trajectory to be calculated across the fetal-fraction-enriched sequence libraries. The allele balance trajectory is the change in the allele balance percentage of any given gene sequence of interest across the observed windows. It can be calculated as the slope of the allele balance over the observed windows, and it can be visualized in a variety of ways, as shown in Figure 3. For example, the banding pattern in the top panel of Figure 3 shows the differences in allele balance across multiple observed windows; alternatively, the allele balance trajectory can be visualized as a Gaussian mixture model (GMM). It should be understood that each window (e.g., 0-145, 0-150, 0-155, etc.) will have an associated fetal fraction that differs from that of the other windows, and this fetal fraction value can be used as the X-axis of the trajectory plot, as shown in Figure 3 (top panel). In other words, a trajectory plot of the type shown in Figure 3 (top panel) provides a visualization of allele balance relative to fetal fraction, in which the points along the X-axis (i.e., the fetal fraction axis) are provided by the selection of different windows.
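As a minimal sketch of the trajectory calculation, the slope can be estimated by ordinary least squares of per-window allele balance against per-window fetal fraction. The least-squares choice, the function names, and the read counts below are illustrative assumptions; the disclosure only states that the trajectory can be calculated as a slope.

```python
from typing import List, Sequence, Tuple


def allele_balance(alt_reads: int, ref_reads: int) -> float:
    """Fraction of reads at a site that carry the alt allele."""
    total = alt_reads + ref_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


def slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("fetal fraction does not vary across windows")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical per-window data: (fetal fraction, alt reads, ref reads).
windows: List[Tuple[float, int, int]] = [
    (0.10, 450, 550),
    (0.15, 425, 575),
    (0.20, 400, 600),
]
fetal_fractions = [ff for ff, _, _ in windows]
balances = [allele_balance(alt, ref) for _, alt, ref in windows]
trajectory = slope(fetal_fractions, balances)
print(round(trajectory, 3))
```

In this made-up example the allele balance falls as the fetal fraction rises, giving a slope near -0.5, which is the pattern one would expect if the mother carries one alt allele and the fetus carries none.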

Regardless of how the allele balance data are visualized, they can be used to identify heterozygous and homozygous mutations or markers of interest in the cfDNA sequence library. For example, the allele balance can be converted into a banding plot in which the y-axis shows, for a particular gene or nucleic acid sequence of interest, the percentage of cfDNA in the sample that carries a given allele (e.g., 0%, 10%, 20%, 30%, 40%, 50%, 60%), and the x-axis shows the different alternatives for the gene or nucleic acid of interest, corresponding to the wild-type sequence or to mutations/variants associated with a particular disease, condition, or trait (e.g., different known mutations within the CFTR gene that are associated with cystic fibrosis).

For example, in a sample or window with a 20% fetal fraction, a band at 10% on the y-axis can be shown to correspond to a fetus that is a carrier of a variant inherited from the biological father's DNA (or, in some cases, it can represent a de novo mutation in the fetus). A band at 40% on the y-axis within the window corresponds to a fetus that is negative (i.e., homozygous reference) for the mutation/variant in the gene or sequence of interest. A band at 50% on the y-axis corresponds to a fetus that is a carrier of a variant inherited from the biological mother's DNA or, if the mother and father carry the same mutation/variant (i.e., alt allele), to a fetus that may have the father's alt allele and the mother's reference allele. A band at 50% can therefore indicate that the fetus and the mother each have one alt allele. A band at 60% on the y-axis corresponds to a fetus that is homozygous alt for the mutation/variant in the gene or sequence of interest (i.e., the fetus is positive). Analyzing the allele balance across multiple windows of the sequence library (i.e., multiple fetal-fraction-enriched sequence libraries) thus provides a new and useful way to determine the presence or absence of genetic variants/mutations from a maternal sample containing cfDNA, without requiring any additional sample. Moreover, because of the enrichment provided by the moving window analysis, noise and background are significantly reduced, which allows robust detection even in samples with very small amounts of cffDNA (e.g., <5% of total cfDNA). It should also be noted that if the window or sample does not have a 20% fetal fraction, the aforementioned bands may shift, and they may not fall exactly at 10%, 40%, 50%, and 60%, respectively.
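The band positions in this example are consistent with a simple two-component mixture, in which maternal and fetal cfDNA each contribute alt alleles in proportion to their share of the sample. The formula and the function name below are an illustrative assumption rather than a model stated in the disclosure; the 40%, 50%, and 60% cases additionally assume that the mother is a heterozygous carrier.

```python
def expected_allele_balance(fetal_fraction: float, maternal_alt: int, fetal_alt: int) -> float:
    """Expected alt-allele fraction in cfDNA at a diploid site.

    maternal_alt and fetal_alt are alt-allele counts (0, 1, or 2).
    Assumed model: (1 - ff) * maternal_alt / 2 + ff * fetal_alt / 2.
    """
    if not 0.0 <= fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must be between 0 and 1")
    if maternal_alt not in (0, 1, 2) or fetal_alt not in (0, 1, 2):
        raise ValueError("alt-allele counts must be 0, 1, or 2")
    return (1.0 - fetal_fraction) * maternal_alt / 2 + fetal_fraction * fetal_alt / 2


# (maternal alt count, fetal alt count, description) for the four bands in the text.
scenarios = [
    (0, 1, "paternally inherited or de novo alt allele"),
    (1, 0, "mother is a carrier, fetus homozygous reference"),
    (1, 1, "mother and fetus each have one alt allele"),
    (1, 2, "mother is a carrier, fetus homozygous alt"),
]
for maternal_alt, fetal_alt, label in scenarios:
    balance = expected_allele_balance(0.20, maternal_alt, fetal_alt)
    print(f"{100 * balance:.0f}%  {label}")
```

At a 20% fetal fraction this reproduces the bands at 10%, 40%, 50%, and 60%; passing a different fetal fraction shows how the bands shift, as the paragraph notes.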

Although at least two windows are required to determine an allele balance trajectory, the number of windows that can be evaluated for the purposes of the disclosed methods is not particularly limited and can include multiple additional windows. Thus, in some embodiments, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 windows of the first sequence library are evaluated, thereby obtaining at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 fetal-fraction-enriched sequence libraries, respectively, from which cffDNA sequences can be identified and isolated. In some embodiments, the at least two windows of the sequence library are selected from (i) sequences of 0-145 nucleotides, (ii) sequences of 0-150 nucleotides, (iii) 0-155 nucleotides, (iv) 0-160 nucleotides, (v) 0-165 nucleotides, (vi) 0-170 nucleotides, (vii) 0-175 nucleotides, (viii) 0-180 nucleotides, (ix) 0-190 nucleotides, (x) 0-195 nucleotides, (xi) 0-200 nucleotides, and (xii) ungated. In some embodiments, the at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 windows are selected from (i) sequences of 0-145 nucleotides, (ii) sequences of 0-150 nucleotides, (iii) 0-155 nucleotides, (iv) 0-160 nucleotides, (v) 0-165 nucleotides, (vi) 0-170 nucleotides, (vii) 0-175 nucleotides, (viii) 0-180 nucleotides, (ix) 0-190 nucleotides, (x) 0-195 nucleotides, (xi) 0-200 nucleotides, and (xii) ungated.

Figure 4 provides an overview of one exemplary embodiment of a process for in silico enrichment of the fetal fraction within a sequence library, followed by the use of bioinformatics algorithms (also referred to herein as "callers") and post-processing to identify aneuploidies and genetic variants in parallel from a single sample.

A. Sample Processing and Computational Pipeline

In general, the sample processing steps of the disclosed methods for parallel assessment of aneuploidies and genetic variants can be performed as described above in Section II ("Sample Preparation"). Additional features are elaborated here.

The disclosed methods can include a computational pipeline that converts sequencing data from the sequence library into useful outputs, including a determination of whether an aneuploidy or any genetic variant is present in the cffDNA. Additional useful outputs that can optionally be provided include, but are not limited to, a determination of fetal sex and other basic fetal statistics.

The computational pipeline can include binary alignment map (BAM) processing, in which the collected DNA sample is computationally reconstructed from short sequencing reads. Reconstruction of the genome is facilitated when a reference genome is available against which the sequencing reads can be aligned. A sequence alignment tool can be used to map the short reads stored in a file to the reference genome. This produces a BAM file in which specific gene sequences can be processed in the next step.

The computational pipeline can also include depth and variant processing, during which specific gene sequences can be identified and isolated to inform subsequent analysis for particular aneuploidies and/or genetic variants. Based on the amount of initial DNA collected, particular portions of the collected DNA can be plotted and, optionally, pooled for analysis and detection of specific sequences of interest. Once plotted in the depth and variant processing step, specific callers and post-processing can be used to identify output information about aneuploidies, genetic variants, and any other outputs and to compile it into a results report. The results are typically reported, delivered, or transmitted to the mother, the father, the physician overseeing the pregnancy (i.e., the mother's obstetrician-gynecologist), or a combination thereof.

Specific bioinformatics algorithms (i.e., "callers"; described below) can be used to evaluate the depth of the DNA sample in the BAM file. The callers used can determine the presence or absence of both aneuploidies and genetic variants of interest. That is, both goals can be accomplished together (e.g., in parallel) using the same prepared and processed BAM file. Thus, an aneuploidy caller can be used to detect aneuploidies while other genetic variants are called in parallel using dedicated callers. The specific scope of these computational steps is discussed in more detail below.

It should be understood that the present disclosure as described above can be implemented in the form of control logic using computer software in a modular or integrated manner. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods of implementing the present disclosure using hardware and a combination of hardware and software.

Any of the software components, processes, or functions described in this application can be implemented as software code to be executed by a processor using any suitable computer language (such as, for example, Python, R, assembly language, Java, JavaScript, C, C++, or Perl), using, for example, conventional or object-oriented techniques. The software code can be stored as a series of instructions or commands on a computer-readable medium, such as random access memory (RAM), read-only memory (ROM), a magnetic medium (such as a hard disk or a floppy disk), or an optical medium (such as a CD-ROM). Any such computer-readable medium can reside on or within a single computing device and can be present on or within different computing devices within a system or network.

B. Detection of Aneuploidy

For the purposes of this disclosure, aneuploidies that can be assessed or detected using the disclosed methods include, but are not limited to, monosomy (e.g., Turner syndrome), trisomy (e.g., Down syndrome, Edwards syndrome, Patau syndrome, trisomy 13, trisomy 18, trisomy 21), tetrasomy, polysomy X and/or Y, microdeletions and microduplications (e.g., chromosome 22q11.2 deletion syndrome), and pentasomy.

The present disclosure provides systems and methods for detecting aneuploidy, alone or in parallel with genetic variants/mutations of interest, that rely on sequencing depth to determine whether an aneuploidy is present or absent in a given sample. For the purposes of this disclosure, "depth" is defined as the ratio of the number of sequencing reads that overlap a site of interest to the size of the library or to the average number of times each base in the library is measured.
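A minimal sketch of the first form of this definition (reads overlapping a site divided by library size) might look as follows. The half-open coordinate convention, the use of total read count as the library size, and the function name are assumptions made for illustration only.

```python
from typing import List, Tuple

Read = Tuple[int, int]  # (start, end), 0-based, end exclusive (assumed convention)


def normalized_depth(reads: List[Read], site: int) -> float:
    """Reads overlapping `site`, divided by the total number of reads in the library."""
    if not reads:
        raise ValueError("library is empty")
    overlapping = sum(1 for start, end in reads if start <= site < end)
    return overlapping / len(reads)


# Made-up aligned read intervals on a single reference sequence.
library = [(0, 100), (50, 150), (200, 300), (90, 190)]
print(normalized_depth(library, site=95))
```

Dividing by library size makes depths comparable between libraries that were sequenced to different total read counts.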

The depth observed in any given library prepared from a maternal cfDNA sample is a function of the fetal fraction, the maternal copy number, and the fetal copy number. If an aneuploidy (e.g., a trisomy) is present, the depth of the target chromosome should differ, in a definite and predictable way, from that of a euploid sample with 23 chromosome pairs. For example, in a trisomy, the depth of the target chromosome (e.g., chromosome 21) will be increased relative to background. Figure 5 illustrates the principle underlying this approach.
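One way to make this dependence concrete is a linear mixture of maternal and fetal copy number, scaled to a two-copy background. This formula and the function name are an illustrative assumption; the disclosure states only that depth is a function of these three quantities.

```python
def expected_relative_depth(fetal_fraction: float, maternal_cn: int, fetal_cn: int) -> float:
    """Expected depth of a target region relative to a two-copy background.

    Assumed model: ((1 - ff) * maternal_cn + ff * fetal_cn) / 2.
    """
    if not 0.0 <= fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must be between 0 and 1")
    if maternal_cn < 0 or fetal_cn < 0:
        raise ValueError("copy numbers must be non-negative")
    return ((1.0 - fetal_fraction) * maternal_cn + fetal_fraction * fetal_cn) / 2


# Euploid mother (two copies) at a 10% fetal fraction, for three fetal copy numbers.
for fetal_cn in (1, 2, 3):
    print(fetal_cn, round(expected_relative_depth(0.10, 2, fetal_cn), 3))
```

Under this assumption a fetal trisomy at a 10% fetal fraction raises the target depth by only about 5%, which suggests why enriching the fetal fraction makes the shift easier to detect.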

In general, when detecting an aneuploidy (whether fetal or maternal) in a maternal sample that contains some fraction of fetal cfDNA (e.g., the fetal fraction), the presence of the aneuploidy can be identified from a detectable shift of the aneuploid region or aneuploid autosome relative to known non-aneuploid regions or chromosomes. That is, depending on the actual fetal fraction, the analysis of each fragment (e.g., Formula 1 below) will yield a plottable result of cfDNA pregnancy depth versus cfDNA density. As shown in the middle panel of Figure 5, this shift can be calculated statistically or visualized, where the background depth represents a comparator or set of samples without aneuploidy and the shifted target depth represents a sample that includes a trisomy, thus indicating that an aneuploidy is present in the fetus (assuming the pregnant woman does not exhibit the aneuploidy). This deviation is detectable using normalized distribution curves, and it becomes more pronounced as the fetal fraction of the sample is increased by the enrichment process described herein.

In some embodiments, the shift can be visualized and quantified using a depth call plot (shown in the bottom panel of Figure 5). As shown in the bottom panel of Figure 5, the depth of a given sample (i.e., the shaded area) can be determined to fit within one of four known copy number (CN) curves (e.g., CN=1, CN=2, CN=3, and CN=4). In this exemplary plot, the shaded distribution fits within the CN=3 curve, thus indicating the presence of an aneuploidy with three chromosome copies (i.e., a trisomy). Various processing steps can be employed to enhance the distribution plot results and to remove noise from the data during analysis.

In some embodiments, detecting the presence or absence of an aneuploidy can include calculating a depth trajectory. The depth trajectory is the change in read depth of any given gene sequence of interest across the observed windows. It can be calculated as the slope of depth versus fetal fraction across the observed windows, and it can be visualized in a variety of ways, as shown in Figure 8. A depth trajectory that decreases as the fetal fraction increases indicates that the fetus has fewer copies of the gene (or chromosome) than the mother. A depth trajectory that stays constant as the fetal fraction increases indicates that the fetus has the same number of copies of the gene (or chromosome) as the mother. A depth trajectory that increases as the fetal fraction increases indicates that the fetus has more copies of the gene (or chromosome) than the mother. Although the depth trajectory can be used to determine chromosome number for the purpose of detecting the presence or absence of an aneuploidy, it should be noted that the depth trajectory can also be used to detect the presence or absence of certain genetic variants (e.g., copy number abnormalities).
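The three cases can be sketched as a slope estimate followed by a sign check. The least-squares fit, the `tolerance` threshold, and the example depths are hypothetical choices for illustration; the disclosure does not specify how flat a trajectory must be to count as constant.

```python
from typing import Sequence


def depth_trajectory(fetal_fractions: Sequence[float], depths: Sequence[float]) -> float:
    """Least-squares slope of normalized depth against fetal fraction across windows."""
    n = len(fetal_fractions)
    if n != len(depths) or n < 2:
        raise ValueError("need at least two windows with paired values")
    mean_x = sum(fetal_fractions) / n
    mean_y = sum(depths) / n
    sxx = sum((x - mean_x) ** 2 for x in fetal_fractions)
    if sxx == 0:
        raise ValueError("fetal fraction does not vary across windows")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fetal_fractions, depths))
    return sxy / sxx


def interpret_trajectory(slope_value: float, tolerance: float = 0.05) -> str:
    """Map the slope to one of the three cases; `tolerance` is an assumed threshold."""
    if slope_value > tolerance:
        return "fetus has more copies than the mother"
    if slope_value < -tolerance:
        return "fetus has fewer copies than the mother"
    return "fetus has the same copy number as the mother"


# Hypothetical windows: depth rises with fetal fraction, as expected for a fetal trisomy.
ff_by_window = [0.05, 0.10, 0.20]
depth_by_window = [1.025, 1.05, 1.10]
trend = depth_trajectory(ff_by_window, depth_by_window)
print(round(trend, 3), interpret_trajectory(trend))
```

In practice the threshold would need to reflect the measurement noise of the assay, which this sketch does not model.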

When analyzing the chromosome depth of any given sample, it may be necessary to account for GC bias and normalize for it. GC content (or guanine-cytosine content) is the percentage of nitrogenous bases in a DNA molecule that are either guanine or cytosine (out of the four possible bases, which also include adenine and thymine). High GC content can distort results and lead to high levels of noise. For example, in the context of Figure 5C, increased noise would broaden the data bands and the corresponding copy number hypotheses (black lines), and as these distributions become wider, it becomes more difficult to interpret the true copy number level accurately. Proper normalization reduces the variance in depth in high-noise samples, thereby reducing the effect of GC bias and improving aneuploidy calls.
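The disclosure does not say how GC bias is normalized. One common, simple approach, used here purely as an assumed stand-in, is to bin regions by GC fraction and rescale each region's depth by the median depth of its bin. The function name, bin width, and data are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Sequence


def gc_normalize(depths: Sequence[float], gc_fractions: Sequence[float],
                 bin_width: float = 0.05) -> List[float]:
    """Rescale each region's depth so every GC bin shares the overall median depth."""
    if len(depths) != len(gc_fractions):
        raise ValueError("depths and gc_fractions must have the same length")
    if not depths:
        return []
    # Assign each region to a GC bin of the given width.
    keys = [int(gc / bin_width) for gc in gc_fractions]
    bins: Dict[int, List[float]] = defaultdict(list)
    for key, depth in zip(keys, depths):
        bins[key].append(depth)
    bin_medians = {key: median(values) for key, values in bins.items()}
    overall = median(depths)
    normalized = []
    for key, depth in zip(keys, depths):
        bin_median = bin_medians[key]
        # Guard against empty-coverage bins to avoid dividing by zero.
        normalized.append(depth * overall / bin_median if bin_median > 0 else 0.0)
    return normalized


# Made-up regions: two with low GC content and high depth, two with high GC and low depth.
raw_depths = [100, 110, 60, 66]
gc = [0.32, 0.33, 0.61, 0.62]
print([round(d, 2) for d in gc_normalize(raw_depths, gc)])
```

After rescaling, the low-GC and high-GC regions have the same median depth, so the remaining spread reflects region-to-region variation rather than GC content.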

In addition, triple normalization that controls for variation caused by (1) GC bias, (2) sample background, and (3) hybridization probe capture (where applicable, i.e., in embodiments that use hybridization probes) can be applied across the sampled data to improve its distribution profiles, as shown in Figure 6. In Figure 6, the top set of profiles shows raw depth data without any normalization, the middle set shows the improvement after GC bias normalization, and the bottom set shows the further improvement after the second (sample background) and third (hybridization probe capture) normalization steps. Triple normalization can therefore improve the distribution profiles of the sampled data and is useful in certain disclosed embodiments or for certain samples. Once normalized, these profiles can be compared with model expectations to draw conclusions about the presence or absence of aneuploidy, as illustrated in Figure 7.

Figure 7 shows normalized depth plotted against fitted model expectations for aneuploidy, which can be used to interpret the pooled and optionally normalized sample distributions. The depth fitting models can be assembled from conventionally known aneuploidy distributions and used in a comparison step to determine whether the actual pooled and optionally normalized distribution matches one or more of the known models. As shown in Figure 7, the normalized depth distribution, shown in gray, can be set against known distribution curves reflecting 1, 2, or 3 copies (from left to right) of chromosomes 13, 18, 21, and X. The specific curve fit can be determined by using maximum likelihood to select the most likely fetal copy number. Because the maximum likelihood fit yields a match to a specific call, a conclusion can be drawn about the presence or absence of aneuploidy in the analyzed sample.

Based on the predicted differences in depth that would be observed when aneuploidy is present in a sample, an aneuploidy caller can be designed to select the set of maternal and fetal copy numbers with the highest likelihood under a normal distribution. To this end, the following equation was developed to determine the depth for a given aneuploidy:

[Formula 1]

where:

$d_p$ is the plasma depth

$f$ is the fetal fraction

$c_m$ is the maternal copy number

$d_b$ is the background depth

$c_f$ is the fetal copy number
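To make the maximum likelihood selection concrete, the sketch below scores candidate maternal and fetal copy numbers against observed depths under a normal distribution. The depth model used here, a linear maternal/fetal mixture in which `d_b` is taken to be the depth expected for a two-copy region, is an assumption chosen to be consistent with the variable definitions above; it should not be read as Formula 1 itself. The function names, the candidate copy number sets, and `sigma` are likewise hypothetical.

```python
import math
from itertools import product

def expected_depth(f, c_m, c_f, d_b=1.0):
    """Assumed mixture model: maternal and fetal copies weighted by fetal fraction."""
    return d_b * ((1.0 - f) * c_m + f * c_f) / 2.0

def log_likelihood(observed, f, c_m, c_f, sigma, d_b=1.0):
    """Normal log-likelihood of the observed depths for one copy number hypothesis."""
    mu = expected_depth(f, c_m, c_f, d_b)
    norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(-0.5 * ((x - mu) / sigma) ** 2 - norm for x in observed)

def call_copy_numbers(observed, f, sigma, maternal=(2,), fetal=(1, 2, 3), d_b=1.0):
    """Return the (maternal, fetal) copy number pair with the highest likelihood."""
    return max(
        product(maternal, fetal),
        key=lambda cn: log_likelihood(observed, f, cn[0], cn[1], sigma, d_b),
    )

# Normalized depths for one chromosome in a sample with a 10% fetal fraction.
print(call_copy_numbers([1.05, 1.04, 1.06], f=0.10, sigma=0.01))
```

Under this assumed model a fetal trisomy at a 10% fetal fraction shifts normalized depth by only about 5%, which is why the noise reduction steps described elsewhere in the disclosure matter for the call.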

This caller shows high sensitivity and specificity for detecting autosomal and sex chromosome aneuploidies and for calling fetal sex. The examples below provide further details on the performance of the aneuploidy caller.

After completing the screening disclosed herein, a physician may choose to perform further evaluation, such as an expanded aneuploidy analysis (EAA), which analyzes additional chromosome pairs to provide further insight into the health of the pregnancy. Thus, in some embodiments, the disclosed methods of determining the presence or absence of aneuploidy may further include an EAA.

C. Detection of Genetic Variants

In general, the genetic variants (e.g., genetic mutations) detected as part of the disclosed methods are genetic variants, markers, or mutations associated with a particular hereditary or heritable disease, condition, or trait. Genetic variants may include single nucleotide variants (SNVs), pathogenic or non-pathogenic single nucleotide polymorphisms (SNPs), insertions and deletions (indels), substitution mutations, or single-gene copy number variants.

A genetic variant may be associated with more than one disease, condition, or trait. A genetic variant can manifest as a variation in a polynucleotide, such as at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, or more sequence differences relative to a wild-type (i.e., non-mutated or not associated with a disease or condition) gene or locus. Non-limiting examples of the types of genetic variants that can be detected using the disclosed methods include single nucleotide polymorphisms (SNPs), deletion/insertion polymorphisms (DIPs), micro copy number variants (CNVs), short tandem repeats (STRs), restriction fragment length polymorphisms (RFLPs), simple sequence repeats (SSRs), variable number tandem repeats (VNTRs), randomly amplified polymorphic DNA (RAPD), amplified fragment length polymorphisms, retrotransposon-based insertion polymorphisms, sequence-specific amplified polymorphisms, and heritable epigenetic modifications (e.g., DNA methylation).

For purposes of the disclosed methods, the presence or absence of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 525, 550, 575, 600, 625, 650, 675, 700, 725, 750, 775, 800, 825, 850, 875, 900, 925, 950, 975, or 1000, or more, different genetic variants can be detected in a single assay, in parallel with detection of the presence or absence of aneuploidy. In some embodiments, the method can detect in parallel the presence or absence of genetic variants associated with at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, or 200, or more, diseases, conditions, or traits.

In general, the presence of a type of genetic variant detected by the disclosed methods is associated with an increase in the risk of having or developing a disease, condition, or trait of about, less than about, or more than about 1%, 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 200%, 300%, 400%, 500%, or more. In some embodiments, the presence of the genetic variant increases the risk of having or developing the disease, condition, or trait by about, less than about, or more than about 1-fold, 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 25-fold, 50-fold, 100-fold, 500-fold, 1000-fold, 10000-fold, or more. In some embodiments, the presence of the genetic variant increases the risk of having or developing the disease, condition, or trait by any statistically significant amount, such as an increase with a p-value of about or less than about 0.1, 0.05, 10^-3, 10^-4, 10^-5, 10^-6, 10^-7, 10^-8, 10^-9, 10^-10, 10^-11, 10^-12, 10^-13, 10^-14, 10^-15, or smaller.

For the purposes of this disclosure, hereditary diseases that can be assessed or detected by determining the presence or absence of a genetic variant include, but are not limited to, 21-hydroxylase deficiency, ABCC8-related hyperinsulinism, ARSACS, achondroplasia, achromatopsia, adenosine monophosphate deaminase 1, agenesis of the corpus callosum with neuronopathy, alkaptonuria, alpha-1-antitrypsin deficiency, alpha-mannosidosis, alpha-sarcoglycanopathy, alpha-thalassemia, Alzheimer's disease, angiotensin II receptor type 1, apolipoprotein E genotyping, argininosuccinic aciduria, aspartylglucosaminuria, ataxia with vitamin E deficiency, ataxia-telangiectasia, autoimmune polyendocrinopathy syndrome type 1, BRCA1 hereditary breast/ovarian cancer, BRCA2 hereditary breast/ovarian cancer, Bardet-Biedl syndrome, Best vitelliform macular dystrophy, beta-sarcoglycanopathy, beta-thalassemia, biotinidase deficiency, Blau syndrome, Bloom syndrome, CFTR-related disorders, CLN3-related neuronal ceroid lipofuscinosis, CLN5-related neuronal ceroid lipofuscinosis, CLN8-related neuronal ceroid lipofuscinosis, Canavan disease, carnitine palmitoyltransferase IA deficiency, carnitine palmitoyltransferase II deficiency, cartilage-hair hypoplasia, cerebral cavernous malformation, choroideremia, Cohen syndrome, congenital cataracts, facial dysmorphism, and neuropathy, congenital disorder of glycosylation Ia, congenital disorder of glycosylation Ib, congenital nephrosis of the Finnish type, Crohn's disease, cystinosis, DFNA9 (COCH), diabetes and hearing loss, early-onset primary dystonia (DYT1), Herlitz-Pearson type junctional epidermolysis bullosa, FANCC-related Fanconi anemia, FGFR1-related craniosynostosis, FGFR2-related craniosynostosis, FGFR3-related craniosynostosis, factor V Leiden thrombophilia, factor V R2 mutation thrombophilia, factor XI deficiency, factor XIII deficiency, familial adenomatous polyposis, familial dysautonomia, familial hypercholesterolemia type B, familial Mediterranean fever, free sialic acid storage disorders, frontotemporal dementia with parkinsonism-17, fumarase deficiency, GJB2-related DFNA 3 nonsyndromic hearing loss and deafness, GJB2-related DFNB 1 nonsyndromic hearing loss and deafness, GNE-related myopathy, galactosemia, Gaucher disease, glucose-6-phosphate dehydrogenase deficiency, glutaric acidemia type 1, glycogen storage disease type 1a, glycogen storage disease type Ib, glycogen storage disease type II, glycogen storage disease type III, glycogen storage disease type V, GRACILE syndrome, HFE-associated hereditary hemochromatosis, Halder AIMs, hemoglobin S beta-thalassemia, hereditary fructose intolerance, hereditary pancreatitis, hereditary thymine-uraciluria, hexosaminidase A deficiency, hidrotic ectodermal dysplasia 2, homocystinuria caused by cystathionine beta-synthase deficiency, hyperkalemic periodic paralysis type 1, hyperornithinemia-hyperammonemia-homocitrullinuria syndrome, primary hyperoxaluria type 1, primary hyperoxaluria type 2, hypochondroplasia, hypokalemic periodic paralysis type 1, hypokalemic periodic paralysis type 2, hypophosphatasia, infantile myopathy and lactic acidosis (fatal and non-fatal forms), isovaleric acidemia, Krabbe disease, LGMD2I, Leber hereditary optic neuropathy, Leigh syndrome, French-Canadian type, long-chain 3-hydroxyacyl-CoA dehydrogenase deficiency, MELAS, MERRF, MTHFR deficiency, MTHFR thermolabile variant, MTRNR1-related hearing loss and deafness, MTTS1-related hearing loss and deafness, MYH-associated polyposis, maple syrup urine disease type 1A, maple syrup urine disease type 1B, McCune-Albright syndrome, medium-chain acyl-CoA dehydrogenase deficiency, megalencephalic leukoencephalopathy with subcortical cysts, metachromatic leukodystrophy, mitochondrial cardiomyopathy, mitochondrial DNA-associated Leigh syndrome and NARP, mucolipidosis IV, mucopolysaccharidosis type I, mucopolysaccharidosis type IIIA, mucopolysaccharidosis type VII, multiple endocrine neoplasia type 2, muscle-eye-brain disease, nemaline myopathy, neurological phenotype, Niemann-Pick disease due to sphingomyelinase deficiency, Niemann-Pick disease type C1, Nijmegen breakage syndrome, PPT1-related neuronal ceroid lipofuscinosis, PROP1-related pituitary hormone deficiency, Pallister-Hall syndrome, paramyotonia congenita, Pendred syndrome, peroxisomal bifunctional enzyme deficiency, pervasive developmental disorders, phenylalanine hydroxylase deficiency, plasminogen activator inhibitor I, autosomal recessive polycystic kidney disease, prothrombin G20210A thrombophilia, pseudovitamin D deficiency rickets, pycnodysostosis, autosomal recessive retinitis pigmentosa, Bothnia type, Rett syndrome, rhizomelic chondrodysplasia punctata type 1, short-chain acyl-CoA dehydrogenase deficiency, Shwachman-Diamond syndrome, Sjogren-Larsson syndrome, Smith-Lemli-Opitz syndrome, spastic paraplegia 13, sulfate transporter-related osteochondrodysplasia, TFR2-related hereditary hemochromatosis, TPP1-related neuronal ceroid lipofuscinosis, thanatophoric dysplasia, transthyretin amyloidosis, trifunctional protein deficiency, tyrosine hydroxylase-deficient DRD, tyrosinemia type I, Wilson disease, X-linked juvenile retinoschisis, cystic fibrosis, spinal muscular atrophy (SMA), hemoglobinopathies, and Zellweger syndrome spectrum.

For purposes of the disclosed methods, genetic variants can be identified or detected using an in silico moving-window analysis that establishes a trajectory across the analyzed windows, based either on allele balance (when assessing genetic variants involving SNPs, insertions and deletions, or other point mutations) or on depth (when assessing genetic variants involving copy number changes), as described herein. This analysis may be particularly useful for detecting recessive conditions, traits, or diseases. As noted above, the process may include performing read length-based size exclusion on the sequences in at least two windows of a sequence library, thereby obtaining at least two sequence libraries enriched for the fetal fraction. Comparing the allele balance in each window allows an allele balance trajectory to be calculated between the fetal fraction-enriched sequence libraries. The allele balance trajectory is the change in the allele balance percentage for any given genetic sequence of interest across the observed windows. It can be calculated as the slope of allele balance versus fetal fraction across the observed windows, and it can be visualized in a variety of ways, as shown in Figure 3.

Allele balance trajectories can be used to identify heterozygous and homozygous mutations within a cfDNA library. For example, a single point on the trajectory is based on the allele balance in a given window and can be converted into a band plot in which the y-axis shows, for a particular gene or nucleic acid sequence of interest, the percentage of cfDNA in the sample carrying a given allele (e.g., 0%, 10%, 20%, 30%, 40%, 50%, 60%), and the x-axis shows the different alleles of the gene or nucleic acid of interest (i.e., the reference allele or alt alleles), which correspond to the wild-type sequence or to mutations/variants associated with a particular disease, condition, or trait (e.g., different known mutations in the CFTR gene associated with cystic fibrosis). If the fetal fraction in the window or sample is, for example, 20%, a band at 10% on the y-axis corresponds to a fetus that is a carrier through DNA from the biological father, or to a de novo mutation in the fetus. A band at 40% corresponds to a fetus that is negative for the mutation/variant of the gene or sequence of interest (i.e., homozygous reference) while the mother is heterozygous (i.e., a carrier). A band at 50% corresponds to a fetus that is a carrier through DNA from the biological mother or, where the mother and the father are both carriers of the same alt allele, to a fetus that is a carrier through DNA from the biological father. A band at 60% corresponds to a fetus that is homozygous positive for the mutation/variant of the gene or sequence of interest. The bands discussed above (at 10%, 40%, 50%, and 60%) are not fixed; their positions vary with the fetal fraction. For example, if the fetal fraction were 10% rather than the 20% of the example above, the bands would shift from 10%, 40%, 50%, and 60% to 5%, 45%, 50%, and 55%, respectively.
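The band positions quoted above follow from a simple mixture of maternal and fetal alleles. The sketch below reproduces them; the function name and the encoding of genotypes as a count of alt alleles (0, 1, or 2) are conventions adopted here for illustration, and the calculation ignores sequencing noise and capture bias.

```python
def expected_alt_fraction(fetal_fraction, maternal_alt, fetal_alt):
    """Expected fraction of cfDNA carrying the alt allele at a diploid locus.

    maternal_alt and fetal_alt are the number of alt alleles (0, 1, or 2)
    in the maternal and fetal genotypes, respectively.
    """
    f = fetal_fraction
    return ((1.0 - f) * maternal_alt + f * fetal_alt) / 2.0

scenarios = {
    "mother hom-ref, fetus het (paternal or de novo)": (0, 1),
    "mother het, fetus hom-ref": (1, 0),
    "mother het, fetus het": (1, 1),
    "mother het, fetus hom-alt": (1, 2),
}
for ff in (0.20, 0.10):
    for label, (m, c) in scenarios.items():
        print(f"FF={ff:.0%}  {label}: {expected_alt_fraction(ff, m, c):.0%}")
```

At a 20% fetal fraction the four scenarios evaluate to 10%, 40%, 50%, and 60%, and at 10% they evaluate to 5%, 45%, 50%, and 55%, matching the values in the text.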

The allele balance trajectory combines this static information from each observed window, and the windows will necessarily have different fetal fractions. The trajectory may therefore rely on the window with the 20% fetal fraction described above, a second window with a 10% fetal fraction, and, optionally, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more additional windows with different fetal fractions.
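One way to see why combining windows is informative: if the alt-allele fraction is modeled as a linear maternal/fetal mixture (an assumption of this sketch, consistent with the band positions given in the text), then a line fitted through the windows has an intercept that reflects the maternal genotype and a slope that reflects the difference between fetal and maternal genotypes. The function names below are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("windows must have different fetal fractions")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return slope, y_bar - slope * x_bar

def infer_alt_counts(fetal_fractions, allele_balances):
    """Estimate (maternal, fetal) alt-allele counts from a trajectory.

    Assumes allele balance = ((1 - f) * maternal + f * fetal) / 2, so the
    fitted line equals maternal / 2 at f = 0 and fetal / 2 at f = 1.
    """
    slope, intercept = fit_line(fetal_fractions, allele_balances)
    clamp = lambda v: max(0, min(2, round(v)))
    return clamp(2 * intercept), clamp(2 * (intercept + slope))

# Two windows: 10% and 20% fetal fraction, allele balance 55% and 60%.
print(infer_alt_counts([0.10, 0.20], [0.55, 0.60]))
```

With only two windows the fit is exact, so additional windows mainly help by averaging out measurement noise.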

Additionally or alternatively, a particular caller for a genetic variant of interest may rely on an assessment of copy number, on depth analysis (as described above for aneuploidy), or on other forms of detection known in the art. For example, a depth trajectory can be used to detect the presence or absence of copy number variants, such as copy number variants of SMA1, RHD, HBA1, and HBA2, all of which are associated with particular hereditary diseases. In some embodiments, the depth trajectory may have a negative slope (indicating fewer copies in the fetus), an approximately flat slope (indicating the same number of copies in the fetus and the mother), or a positive slope (indicating more copies in the fetus), and this slope may be based on 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more windows with different fetal fractions.

In some embodiments, a caller for certain conditions may rely on detecting the presence or absence of "difference bases." In some embodiments, a caller for certain conditions may rely on detecting the presence or absence of a substitution in the wild-type sequence (e.g., an SNV). In some embodiments, a caller for certain conditions may rely on detecting the presence or absence of a single nucleotide polymorphism (SNP). In some embodiments, a caller for certain conditions may rely on detecting the presence or absence of one or more insertions or deletions (indels). Where multiple SNVs, difference bases, SNPs, or combinations thereof are associated with a given condition, pooling or combining the detection signals of even a small number of SNVs, difference bases, SNPs, insertions or deletions, or combinations thereof (e.g., <3, <4, <5, <6, <7, <8, <9, <10, <11, <12, <13, <14, <15) can provide improved separation between genotypes. Thus, in some embodiments, a caller for a given condition may rely on detecting the presence or absence of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15, or more, SNVs, difference bases, SNPs, insertions or deletions, or combinations thereof.

For example, when detecting alpha thalassemia, the disclosed methods can use a caller that detects the presence or absence of the double cis mutation of HBA1 and HBA2, which is the most common cause of the condition. The caller can thus detect, for example, a consensus copy number signal obtained from multiple probes in a region of interest, such as the double deletion region.

Using the disclosed methods therefore allows genetic variants of interest to be called from a single sample, in parallel with aneuploidy detection. Indeed, this approach can even determine whether the fetus is homozygous or heterozygous for a given genetic variant of interest. Moreover, in some embodiments in which the mother and the father carry different alt alleles, it can be determined whether the fetus inherited a particular variant from the mother, the father, or both. This is a new and useful way to determine the presence or absence of a genetic variant/mutation from a maternal sample.

D. Noise Reduction

As explained above and shown further in the examples, the disclosed methods and systems can significantly reduce noise in cfDNA data, which improves the performance of assays for detecting genetic variants and aneuploidy. Because cell-free fetal DNA (cffDNA) levels are low in most biological samples obtained from pregnant women, the high background noise produced by conventional processing and detection methods can render a sample unusable, uninterpretable, or both. The disclosed noise reduction methods therefore represent a new and useful way to improve conventional non-invasive prenatal screening (NIPS).

In addition, the present disclosure provides a method of reducing background noise from excess genetic material in non-invasive prenatal screening (NIPS), comprising (i) obtaining a biological sample from a pregnant woman, wherein the biological sample comprises cell-free DNA (cfDNA); and (ii) processing the cfDNA for NIPS, wherein the processing comprises enriching cell-free fetal DNA (cffDNA) in the biological sample, performing in silico processing on the cfDNA, or a combination thereof.

In some embodiments, the noise reduction method will include both enriching cell-free fetal DNA (cffDNA) in the biological sample and performing in silico processing on the cfDNA.

To reduce noise, enriching cffDNA in a biological sample may include the disclosed methods of physically separating or enriching the fetal fraction. For example, in some embodiments, enriching cffDNA in a biological sample may include obtaining from a pregnant woman a biological sample comprising cell-free DNA (cfDNA), wherein the cfDNA comprises cffDNA and cell-free maternal DNA (cfmDNA); extracting the cfDNA from the biological sample; and subjecting the extracted cfDNA to a size exclusion process having a cutoff size of about 150 nucleotides, about 155 nucleotides, about 160 nucleotides, about 165 nucleotides, about 170 nucleotides, about 175 nucleotides, or about 180 nucleotides in length, thereby producing nucleic acids enriched for cffDNA.

Similarly, to reduce noise, the in silico processing may include any of the disclosed methods of analyzing a sequence library or sequence library data so as to focus any analysis of genetic variants or aneuploidy on the fetal fraction of the sample. For example, in some embodiments, the in silico processing may include sequencing a cfDNA sample comprising cell-free fetal DNA (cffDNA) and cell-free maternal DNA (cfmDNA) to prepare a sequence library; performing a read length-based analysis in which the allele balance of a nucleic acid sequence of interest is established in at least two windows of the sequence library; and establishing a trajectory based on the allele balance of the at least two windows, wherein the trajectory indicates the percentage of alleles present in the sample that comprise the nucleic acid sequence of interest.
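The read length-based step can be pictured as follows. This is a minimal sketch under stated assumptions: each read is represented as a hypothetical `(fragment_length, carries_alt_allele)` pair, and a "window" is taken to mean all reads at or below a length cutoff, which is one possible reading of the text rather than a confirmed implementation detail.

```python
def allele_balance_by_window(reads, cutoffs):
    """Alt-allele fraction among reads no longer than each cutoff.

    reads   -- sequence of (fragment_length, carries_alt_allele) pairs
    cutoffs -- fragment-length cutoffs in nucleotides, one per window
    Returns {cutoff: alt fraction, or None if no reads pass the cutoff}.
    """
    balances = {}
    for cutoff in cutoffs:
        kept = [is_alt for length, is_alt in reads if length <= cutoff]
        balances[cutoff] = sum(kept) / len(kept) if kept else None
    return balances

reads = [(140, True), (145, True), (150, False),
         (170, False), (175, True), (200, False)]
print(allele_balance_by_window(reads, cutoffs=[150, 180]))
```

Each window's allele balance, paired with an estimate of that window's fetal fraction, supplies one point on the trajectory.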

Noise reduction may further include normalization to control for GC bias, sample background, hybridization probe capture, or a combination thereof. In general, the normalization may be a "median normalization." In other words, the probe read depth may be divided by the median across probes with similar GC content, and then divided by the interquartile-range mean across samples and probes with a putative copy number of 2 in both the mother and the fetus.
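The two divisions described above might look like the sketch below for a single sample. Several choices are assumptions made for illustration: "similar GC content" is implemented as fixed-width GC bins (`bin_width`), the "interquartile-range mean" is read as the mean of the values lying between the first and third quartiles, and the scaling is done across the probes of one sample rather than across samples and probes. All function names are hypothetical, and depths are assumed to be positive.

```python
from collections import defaultdict
from statistics import mean, median

def gc_median_normalize(depths, gc_percents, bin_width=5):
    """Divide each probe depth by the median depth of its GC-content bin."""
    bins = defaultdict(list)
    for depth, gc in zip(depths, gc_percents):
        bins[int(gc // bin_width)].append(depth)
    medians = {b: median(values) for b, values in bins.items()}
    return [depth / medians[int(gc // bin_width)]
            for depth, gc in zip(depths, gc_percents)]

def interquartile_mean(values):
    """Mean of the middle half of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    return mean(ordered[n // 4: n - n // 4])

def normalize(depths, gc_percents, disomic_mask, bin_width=5):
    """GC-normalize, then scale by the interquartile mean of two-copy probes."""
    gc_norm = gc_median_normalize(depths, gc_percents, bin_width)
    reference = [d for d, is_disomic in zip(gc_norm, disomic_mask) if is_disomic]
    scale = interquartile_mean(reference)
    return [d / scale for d in gc_norm]

depths = [10, 20, 30, 100, 200, 300]
gc = [40, 41, 42, 60, 61, 62]
print(normalize(depths, gc, disomic_mask=[True] * 6))
```

After normalization, probes in two-copy regions should center near 1, so a depth shift can be interpreted as a copy number change rather than a GC artifact.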

Hybridization probe capture can be problematic because DNA fragments that contain a variant and overlap a capture probe are captured less efficiently, which lowers the allele balance of the alternate allele. However, capture bias is generally reproducible, and in such cases it can be learned and corrected using the following formula:

[Formula 2]

Correction and normalization of hybridization probe capture is particularly useful for ensuring correct insertion or deletion calls, although it can help variant calling more generally.

The following examples are given to illustrate the disclosed sample preparation and methods. It should be understood, however, that the invention is not limited to the specific embodiments or details described in these examples.

Examples

Example 1 - Aneuploidy Caller Performance

The disclosed aneuploidy caller is based on sequence read depth, as described above. To establish the feasibility of this approach, 110 feasibility samples were analyzed and compared against an accepted standard aneuploidy detection system (Myriad PREQUEL™ Prenatal Screen). The aneuploidies detectable in the samples and the control call for each sample are shown in the table below:

The disclosed depth-based analysis method produced the following results:

● For autosomes + 22q

○ Sensitivity = 100% (CI: 89.95%-100%)

○ Specificity = 99.75% (CI: 98.59%-99.96%)

○ One false-positive mosaic monosomy 21 call

● For sex chromosome aneuploidies

○ Sensitivity = 100% (CI: 63.06%-100%)

○ Specificity = 100% (CI: 96.41%-100%)

● For fetal sex calls

○ 100% concordance with the control test

Only one of the 110 samples failed because of low depth (a 0.9% rerun rate).
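The text does not say how the confidence intervals were computed or how many samples fall in each category. As a hedged illustration only: when every one of n samples is called correctly, the exact (Clopper-Pearson) two-sided lower bound is (α/2)^(1/n). The sketch below assumes a 95% confidence level and treats the sample counts as inferences, not reported values.

```python
def exact_lower_bound_all_correct(n, confidence=0.95):
    """Clopper-Pearson lower bound on a proportion when all n trials succeed."""
    if n <= 0:
        raise ValueError("n must be positive")
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n)

# Hypothetical counts chosen to compare against the intervals in the text.
for n in (8, 101):
    print(n, f"{exact_lower_bound_all_correct(n):.2%}")

# The rerun rate follows directly from the counts given in the text.
print(f"{1 / 110:.1%}")
```

Counts of 8 and 101 give lower bounds close to the 63.06% and 96.41% reported for sex chromosome sensitivity and specificity, which suggests, but does not confirm, that exact binomial intervals were used.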

Example 2 - SNV/indel (i.e., gene variant) caller performance

Fifteen (15) designed mixtures from 5 prenatal pairs were used to validate the performance of the SNV/indel caller disclosed herein. The sensitivity and specificity for the genetic regions of interest (ROI), alone and in combination with a set of SNVs known to be highly variable within the population (i.e., dbSNP), are shown in the following table:

This initial performance was established without the physical enrichment process described herein. Enriching the fetal fraction and optimizing the various filter parameters are expected to further improve performance.

In addition, the disclosed monogenic SNV/indel caller met the performance requirements on 5 unique cfDNA samples with a fetal fraction (FF) of 5.8%-16%. The results are shown in the following table.

For this performance evaluation, the best performance was observed when enrichment by physical size-based exclusion and bias correction were both applied.

Example 3 - SMA caller performance analysis

Spinal muscular atrophy (SMA) is a genetic condition that is commonly included in prenatal screening. However, SMA calling is difficult because of the high homology between the SMN1 and SMN2 genes. These genes differ at only a few positions (most notably in exon 7), and SMA carrier/affected status depends only on the SMN1 copy number.

The disclosed system evaluates the presence or absence of multiple bases (up to 44 differential bases) to ensure correct calling. As shown in the table below, the SMA caller is highly accurate, sensitive, and specific.
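The idea of using bases that differ between two near-identical genes can be illustrated with a small sketch. It pools read counts over differential positions to estimate the share of reads that come from SMN1. The input layout, the function name, and the pooling step are assumptions made for illustration; the text does not describe how the disclosed caller combines its differential bases.

```python
def smn1_read_fraction(site_counts):
    """Pooled fraction of reads supporting SMN1 across differential bases.

    site_counts: list of (smn1_reads, smn2_reads) tuples, one per position
    at which the SMN1 and SMN2 reference bases differ (hypothetical input).
    """
    smn1_total = sum(smn1 for smn1, _ in site_counts)
    all_reads = sum(smn1 + smn2 for smn1, smn2 in site_counts)
    if all_reads == 0:
        raise ValueError("no reads cover any differential base")
    return smn1_total / all_reads


# Hypothetical counts at two differential positions.
print(smn1_read_fraction([(50, 50), (30, 70)]))  # 0.4
```

Turning such a fraction into a fetal SMN1 copy number call in a maternal/fetal mixture would also require the fetal fraction and a depth model, which this sketch does not attempt.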

For the purposes of this evaluation, carrier fetuses were considered healthy.

Example 4 - Alpha thalassemia caller performance analysis

Alpha thalassemia is a blood disorder that reduces hemoglobin production. It is an inherited genetic condition that is commonly included in prenatal screening. The disclosed system evaluates the presence or absence of the double cis mutation of HBA1 and HBA2, which is the most common cause of the condition. More specifically, a consensus copy number signal is obtained from multiple probes in the double-deletion region. As shown in the table below, the alpha thalassemia caller is highly accurate, sensitive, and specific.
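A consensus signal across several probes can be formed in many ways, and the text does not say which one is used. The sketch below assumes a median of per-probe depth ratios against a reference, chosen because a median tolerates a single misbehaving probe. The function and argument names are hypothetical.

```python
from statistics import median


def consensus_copy_ratio(sample_depths, reference_depths):
    """Median of per-probe depth ratios (sample / reference).

    Probes with a zero reference depth are skipped. Both inputs are lists
    of normalized depths in the same probe order (hypothetical layout).
    """
    ratios = [s / r for s, r in zip(sample_depths, reference_depths) if r > 0]
    if not ratios:
        raise ValueError("no usable probes")
    return median(ratios)


# Four hypothetical probes in the deleted region; the third is an outlier.
print(consensus_copy_ratio([50, 48, 90, 52], [100, 100, 100, 100]))
```

In this made-up example three probes sit near a ratio of 0.5 and one sits at 0.9; the median stays close to 0.5, whereas a mean would be pulled upward by the outlier.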

For the purposes of this evaluation, carrier fetuses were considered healthy.

Example 5 - RhD caller performance analysis

If the pregnant mother is D(-) and the fetus is D(+), hemolytic disease can occur when the mother's blood comes into contact with the fetus's blood. This condition is commonly included in prenatal screening. The most common cause of RhD(-) is a deletion of the entire RHD gene. Therefore, a caller based on 221 reliable differential bases was developed to estimate copy number. As shown in the table below, the RhD caller is highly accurate, sensitive, and specific.

**********

All patents and publications mentioned in the specification are indicative of the level of those of ordinary skill in the art to which the present disclosure pertains. All patents and publications are incorporated herein by reference to the same extent as if each individual publication were specifically and individually indicated to be incorporated by reference.

The present technology is not to be limited in terms of the particular embodiments described in this application, which are intended as single illustrations of individual aspects of the present technology. Many modifications and variations of the present technology can be made without departing from its spirit and scope, as will be apparent to those of ordinary skill in the art. Functionally equivalent methods and apparatuses within the scope of the present technology, in addition to those enumerated herein, will be apparent to those of ordinary skill in the art from the foregoing descriptions. Such modifications and variations are intended to fall within the scope of the present technology. It is to be understood that the present technology is not limited to particular methods, reagents, compounds, compositions, or systems, which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.

Claims (33)

1. A method of preparing a biological sample having an enriched fetal fraction, comprising:
(a-1) obtaining a biological sample comprising cell-free DNA (cfDNA) from a pregnant woman;
(b-1) extracting cfDNA from the biological sample;
(c-1) preparing a library of cfDNA fragments to obtain a cfDNA library;
(d-1) separating the cfDNA fragments in the cfDNA library by size to retain cfDNA fragments of less than about 150 nucleotides in length, about 155 nucleotides in length, about 160 nucleotides in length, about 165 nucleotides in length, about 170 nucleotides in length, about 175 nucleotides in length, about 180 nucleotides in length, about 185 nucleotides in length, about 190 nucleotides in length, about 195 nucleotides in length, or about 200 nucleotides in length;
(e-1) sequencing the retained cfDNA fragments to obtain a first sequence library;
(f-1) identifying, based on read length, (i) cell-free fetal DNA (cffDNA) sequences and (ii) cell-free maternal DNA (cfmDNA) sequences present in at least two windows of the first sequence library; and
(g-1) isolating the cffDNA sequences from each of the at least two windows of the first sequence library, thereby obtaining at least two fetal fraction-enriched sequence libraries;
or alternatively
(a-2) obtaining a biological sample comprising cell-free DNA (cfDNA) from a pregnant woman;
(b-2) extracting cfDNA from the biological sample;
(c-2) separating the cfDNA fragments in the extracted sample from (b-2) by size to retain only cfDNA fragments of less than about 150 nucleotides in length, about 155 nucleotides in length, about 160 nucleotides in length, about 165 nucleotides in length, about 170 nucleotides in length, about 175 nucleotides in length, about 180 nucleotides in length, about 185 nucleotides in length, about 190 nucleotides in length, about 195 nucleotides in length, or about 200 nucleotides in length;
(d-2) preparing a cfDNA library from the separated cfDNA fragments from (c-2);
(e-2) sequencing the cfDNA library to obtain a first sequence library;
(f-2) identifying, based on read length, (i) cell-free fetal DNA (cffDNA) sequences and (ii) cell-free maternal DNA (cfmDNA) sequences present in at least two windows of the first sequence library; and
(g-2) isolating the cffDNA sequences from each of the at least two windows of the first sequence library, thereby obtaining at least two fetal fraction-enriched sequence libraries.
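The read-length windows recited in steps (f) and (g) can be pictured with a short sketch that splits one set of fragment lengths into several size-restricted subsets, and that expresses enrichment as a fold change in fetal fraction. This is an illustration under stated assumptions only: the function names, the inclusive upper cutoffs, and the example numbers are hypothetical, and a real pipeline would operate on aligned reads rather than bare integers.

```python
def window_libraries(fragment_lengths, max_lengths=(150, 160, 170)):
    """Map each upper cutoff to the fragment lengths at or below it.

    Each entry stands in for one size-restricted sequence library
    (for example a 0-150 nucleotide window).
    """
    return {
        cutoff: [n for n in fragment_lengths if n <= cutoff]
        for cutoff in max_lengths
    }


def fold_enrichment(ff_before, ff_after):
    """Fold change in fetal fraction after enrichment."""
    if ff_before <= 0:
        raise ValueError("fetal fraction before enrichment must be positive")
    return ff_after / ff_before


# Hypothetical fragment lengths in nucleotides.
libraries = window_libraries([120, 155, 166, 180], max_lengths=(150, 170))
print(libraries)
print(round(fold_enrichment(0.10, 0.15), 2))
```

Narrower windows keep fewer fragments, so each window trades sequencing depth against fetal fraction; evaluating several windows side by side makes that trade-off visible.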
2. The method of claim 1, wherein separating the cfDNA fragments enriches the fetal fraction in the biological sample by about 1.1-fold, about 1.2-fold, about 1.3-fold, about 1.4-fold, about 1.5-fold, about 1.6-fold, about 1.7-fold, about 1.8-fold, about 1.9-fold, or about 2.0-fold.
3. The method of claim 1 or 2, wherein isolating the cffDNA sequences from the at least two windows of the first sequence library enriches the fetal fraction in the biological sample by about 1.1-fold, about 1.2-fold, about 1.3-fold, about 1.4-fold, about 1.5-fold, about 1.6-fold, about 1.7-fold, about 1.8-fold, about 1.9-fold, about 2.0-fold, about 2.1-fold, about 2.2-fold, about 2.3-fold, about 2.4-fold, about 2.5-fold, about 2.6-fold, about 2.7-fold, about 2.8-fold, about 2.9-fold, about 3.0-fold, about 3.1-fold, about 3.2-fold, about 3.3-fold, about 3.4-fold, or about 3.5-fold.
4. The method of any one of claims 1 to 3, wherein separating the cfDNA fragments comprises electrophoresis.
5. The method of any one of claims 1 to 4, wherein at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 windows of the first sequence library are evaluated to identify and isolate cffDNA sequences to obtain at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 fetal fraction-enriched sequence libraries, respectively.
6. The method of any one of claims 1 to 5, further comprising identifying and isolating cffDNA from cfmDNA by comparing sequence reads of cffDNA and cfmDNA in the first sequence library to a reference genome, demultiplexing sequence reads from the first sequence library, removing repetitive sequences from the first sequence library, or a combination thereof.
7. The method of any one of claims 1 to 6, further comprising assessing the presence of one or more genetic mutations in the at least two fetal fraction-enriched sequence libraries.
8. The method of claim 7, wherein the one or more genetic mutations result in at least one condition selected from the group consisting of: 21-hydroxylase deficiency, ABCC8-related hyperinsulinism, ARSACS, achondroplasia, achromatopsia, adenosine monophosphate deaminase 1, agenesis of corpus callosum with neuronopathy, alkaptonuria, alpha-1-antitrypsin deficiency, alpha-mannosidosis, alpha-sarcoglycanopathy, alpha-thalassemia, Alzheimer's disease, angiotensin II receptor type 1, apolipoprotein E genotyping, argininosuccinic aciduria, aspartylglycosaminuria, ataxia with vitamin E deficiency, ataxia-telangiectasia, autoimmune polyendocrinopathy syndrome type 1, BRCA1 hereditary breast/ovarian cancer, BRCA2 hereditary breast/ovarian cancer, Bardet-Biedl syndrome, Best vitelliform macular dystrophy, beta-sarcoglycanopathy, beta-thalassemia, biotinidase deficiency, Blau syndrome, Bloom syndrome, CFTR-related disorders, CLN3-related neuronal ceroid lipofuscinosis, CLN5-related neuronal ceroid lipofuscinosis, CLN8-related neuronal ceroid lipofuscinosis, Canavan disease, carnitine palmitoyltransferase IA deficiency, carnitine palmitoyltransferase II deficiency, cartilage-hair hypoplasia, cerebral cavernous malformation, choroideremia, Cohen syndrome, congenital cataracts, facial dysmorphism, and neuropathy, congenital disorder of glycosylation Ia, congenital disorder of glycosylation Ib, congenital Finnish nephrosis, Crohn's disease, cystinosis, DFNA9 (COCH), diabetes and hearing loss, early-onset primary dystonia (DYT1), epidermolysis bullosa junctional, Herlitz-Pearson type, FANCC-related Fanconi anemia, FGFR1-related craniosynostosis, FGFR2-related craniosynostosis, FGFR3-related craniosynostosis, factor V Leiden thrombophilia, factor V R2 mutation thrombophilia, factor XI deficiency, factor XIII deficiency, familial adenomatous polyposis, familial dysautonomia, familial hypercholesterolemia type B, familial Mediterranean fever, free sialic acid storage disorders, frontotemporal dementia with parkinsonism-17, fumarase deficiency, GJB2-related DFNA3 nonsyndromic hearing loss and deafness, GJB2-related DFNB1 nonsyndromic hearing loss and deafness, GNE-related myopathies, galactosemia, Gaucher disease, glucose-6-phosphate dehydrogenase deficiency, glutaric acidemia type 1, glycogen storage disease type 1a, glycogen storage disease type Ib, glycogen storage disease type II, glycogen storage disease type III, glycogen storage disease type V, GRACILE syndrome, HFE-associated hereditary hemochromatosis, Halder AIMs, hemoglobin S beta-thalassemia, hereditary fructose intolerance, hereditary pancreatitis, hereditary thymine-uraciluria, hexosaminidase A deficiency, hidrotic ectodermal dysplasia 2, homocystinuria caused by cystathionine beta-synthase deficiency, hyperkalemic periodic paralysis type 1, hyperornithinemia-hyperammonemia-homocitrullinuria syndrome, primary hyperoxaluria type 1, primary hyperoxaluria type 2, hypochondroplasia, hypokalemic periodic paralysis type 1, hypokalemic periodic paralysis type 2, hypophosphatasia, infantile myopathy and lactic acidosis (fatal and non-fatal forms), isovaleric acidemia, Krabbe disease, LGMD2I, Leber hereditary optic neuropathy, Leigh syndrome, French-Canadian type, long-chain 3-hydroxyacyl-CoA dehydrogenase deficiency, MELAS, MERRF, MTHFR deficiency, MTHFR thermolabile variant, MTRNR1-related hearing loss and deafness, MTTS1-related hearing loss and deafness, MYH-associated polyposis, maple syrup urine disease type 1A, maple syrup urine disease type 1B, McCune-Albright syndrome, medium-chain acyl-CoA dehydrogenase deficiency, megalencephalic leukoencephalopathy with subcortical cysts, metachromatic leukodystrophy, mitochondrial cardiomyopathy, mitochondrial DNA-associated Leigh syndrome and NARP, mucolipidosis IV, mucopolysaccharidosis type I, mucopolysaccharidosis type IIIA, mucopolysaccharidosis type VII, multiple endocrine neoplasia type 2, muscle-eye-brain disease, nemaline myopathy, neurological phenotype, Niemann-Pick disease due to sphingomyelinase deficiency, Niemann-Pick disease type C1, Nijmegen breakage syndrome, PPT1-related neuronal ceroid lipofuscinosis, PROP1-related pituitary hormone deficiency, Pallister-Hall syndrome, paramyotonia congenita, Pendred syndrome, peroxisomal bifunctional enzyme deficiency, pervasive developmental disorders, phenylalanine hydroxylase deficiency, plasminogen activator inhibitor I, autosomal recessive polycystic kidney disease, prothrombin G20210A thrombophilia, pseudovitamin D deficiency rickets, pycnodysostosis, autosomal recessive retinitis pigmentosa, Bothnia type, Rett syndrome, rhizomelic chondrodysplasia punctata type 1, short-chain acyl-CoA dehydrogenase deficiency, Shwachman-Diamond syndrome, Sjogren-Larsson syndrome, Smith-Lemli-Opitz syndrome, spastic paraplegia 13, sulfate transporter-related osteochondrodysplasia, TFR2-related hereditary hemochromatosis, TPP1-related neuronal ceroid lipofuscinosis, thanatophoric dysplasia, transthyretin amyloidosis, trifunctional protein deficiency, tyrosine hydroxylase-deficient DRD, tyrosinemia type I, Wilson's disease, X-linked juvenile retinoschisis, cystic fibrosis, spinal muscular atrophy (SMA), hemoglobinopathy, and Zellweger syndrome spectrum.
9. The method of any one of claims 1 to 8, further comprising assessing the presence of aneuploidy in the biological sample comprising cfDNA.
10. The method of claim 9, wherein the aneuploidy is selected from the group consisting of a monosomy, a trisomy, a tetrasomy, a pentasomy, a microdeletion, a microduplication, and mosaic forms of a monosomy, a trisomy, a tetrasomy, and a pentasomy.
11. A method of detecting in parallel the presence or absence of aneuploidy and the presence or absence of at least one gene variant in a single maternal sample, comprising:
(i) obtaining a biological sample from a pregnant woman, wherein the biological sample comprises cell-free DNA (cfDNA);
(ii) preparing a cfDNA library;
(iii) sequencing the cfDNA library to generate a sequence library; and
(iv) detecting the presence or absence of aneuploidy and the presence or absence of at least one gene variant in the single maternal sample;
wherein (a) the cfDNA library is enriched to increase the fetal fraction, (b) the sequence library is enriched to increase the fetal fraction, or (c) a combination thereof, such that the fetal fraction of the single maternal sample is increased by at least 1.1-fold, at least 1.2-fold, at least 1.3-fold, at least 1.4-fold, or at least 1.5-fold prior to detecting the presence or absence of aneuploidy and the presence or absence of at least one gene variant in the single maternal sample.
12. The method of claim 11, wherein the biological sample is blood or plasma.
13. The method of claim 11 or 12, wherein the cfDNA library is enriched to increase the fetal fraction and the sequence library is enriched to increase the fetal fraction.
14. The method of any one of claims 11 to 13, wherein enriching the cfDNA library for fetal fraction comprises removing from the cfDNA library any DNA fragment greater than about 150 nucleotides in length, about 155 nucleotides in length, about 160 nucleotides in length, about 165 nucleotides in length, about 170 nucleotides in length, about 175 nucleotides in length, or about 180 nucleotides in length.
15. The method of claim 14, wherein removing the DNA fragments from the cfDNA library comprises electrophoresis.
16. The method of any one of claims 11 to 15, wherein enriching the sequence library for fetal fraction comprises size exclusion based on the read length of sequences in at least two windows of the sequence library to obtain at least two fetal fraction-enriched sequence libraries.
17. The method of claim 16, wherein at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 windows of the sequence library are evaluated to identify and isolate cffDNA sequences to obtain at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 fetal fraction-enriched sequence libraries, respectively.
18. The method of claim 16 or 17, wherein the at least two windows of the sequence library are selected from (i) 0-145 nucleotides, (ii) 0-150 nucleotides, (iii) 0-155 nucleotides, (iv) 0-160 nucleotides, (v) 0-165 nucleotides, (vi) 0-168 nucleotides, (vii) 0-170 nucleotides, (viii) 0-175 nucleotides, (ix) 0-180 nucleotides, (x) 0-185 nucleotides, (xi) 0-190 nucleotides, (xii) 0-195 nucleotides, (xiii) 0-200 nucleotides, and (xiv) unfiltered.
19. The method of any one of claims 16 to 18, wherein enriching the sequence library for fetal fraction further comprises identifying and isolating cffDNA from cfmDNA by comparing sequence reads of cffDNA and cfmDNA in the sequence library to a reference genome, demultiplexing sequence reads from the sequence library, removing repetitive sequences from the sequence library, or a combination thereof.
20. The method of any one of claims 11 to 19, wherein detecting the presence or absence of at least one gene variant comprises determining, in each of the at least two fetal fraction-enriched sequence libraries, the allele balance of each allele encoding the at least one gene variant in the sample, and generating an allele balance trajectory for each allele based on the allele balance in each of the at least two fetal fraction-enriched sequence libraries, a depth trajectory based on the depth of the at least two fetal fraction-enriched sequence libraries, or a combination of an allele balance trajectory and a depth trajectory.
21. The method of any one of claims 11 to 20, wherein detecting the presence or absence of aneuploidy comprises analyzing the sequence library for sequence depth corresponding to at least one sequence of a chromosome of interest.
22. The method of claim 21, wherein the sequence depth corresponding to the at least one sequence of the chromosome of interest is fitted to an expected depth model for the chromosome of interest.
23. The method of claim 21 or 22, wherein the sequence depth is calculated by:
wherein:
d_p is the pregnancy depth
F is the fetal fraction
C_m is the maternal copy number
d_b is the background depth
C_f is the fetal copy number.
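The equation recited in claim 23 does not appear in this text. As a hedged illustration only, one common mixture model that uses the same symbols scales a diploid background depth by the maternal and fetal copy numbers, weighted by fetal fraction: d_p = d_b × [(1 − F) × C_m + F × C_f] / 2. This form is an assumption for illustration and should not be read as the formula of the claim.

```python
def expected_depth(d_b, F, C_m, C_f):
    """Expected depth in a maternal/fetal mixture (assumed model).

    d_b: background depth for a diploid (two-copy) region
    F:   fetal fraction, between 0 and 1
    C_m: maternal copy number
    C_f: fetal copy number
    """
    if not 0.0 <= F <= 1.0:
        raise ValueError("fetal fraction must be between 0 and 1")
    return d_b * ((1.0 - F) * C_m + F * C_f) / 2.0


# Hypothetical values: euploid mother, trisomic fetus, 10% fetal fraction.
print(expected_depth(d_b=100.0, F=0.10, C_m=2, C_f=3))
```

Under this assumed model a fetal trisomy at 10% fetal fraction raises expected depth by about 5% over background, which shows why a higher fetal fraction makes the signal easier to separate from noise.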
24. The method of any one of claims 21 to 23, wherein the sequence depth is normalized to control GC bias, sample background, hybridization probe capture, or a combination thereof.
25. The method of any one of claims 11 to 24, wherein the method comprises detecting the presence or absence of an aneuploidy selected from the group consisting of a monosomy, a trisomy, a tetrasomy, a polysomy, a microdeletion, a microduplication, a pentasomy, and combinations thereof.
26. The method of any one of claims 11 to 24, wherein the at least one gene variant is associated with a disease selected from the group consisting of: 21-hydroxylase deficiency, ABCC8-related hyperinsulinism, ARSACS, achondroplasia, achromatopsia, adenosine monophosphate deaminase 1, agenesis of corpus callosum with neuronopathy, alkaptonuria, alpha-1-antitrypsin deficiency, alpha-mannosidosis, alpha-sarcoglycanopathy, alpha-thalassemia, Alzheimer's disease, angiotensin II receptor type 1, apolipoprotein E genotyping, argininosuccinic aciduria, aspartylglycosaminuria, ataxia with vitamin E deficiency, ataxia-telangiectasia, autoimmune polyendocrinopathy syndrome type 1, BRCA1 hereditary breast/ovarian cancer, BRCA2 hereditary breast/ovarian cancer, Bardet-Biedl syndrome, Best vitelliform macular dystrophy, beta-sarcoglycanopathy, beta-thalassemia, biotinidase deficiency, Blau syndrome, Bloom syndrome, CFTR-related disorders, CLN3-related neuronal ceroid lipofuscinosis, CLN5-related neuronal ceroid lipofuscinosis, CLN8-related neuronal ceroid lipofuscinosis, Canavan disease, carnitine palmitoyltransferase IA deficiency, carnitine palmitoyltransferase II deficiency, cartilage-hair hypoplasia, cerebral cavernous malformation, choroideremia, Cohen syndrome, congenital cataracts, facial dysmorphism, and neuropathy, congenital disorder of glycosylation Ia, congenital disorder of glycosylation Ib, congenital Finnish nephrosis, Crohn's disease, cystinosis, DFNA9 (COCH), diabetes and hearing loss, early-onset primary dystonia (DYT1), epidermolysis bullosa junctional, Herlitz-Pearson type, FANCC-related Fanconi anemia, FGFR1-related craniosynostosis, FGFR2-related craniosynostosis, FGFR3-related craniosynostosis, factor V Leiden thrombophilia, factor V R2 mutation thrombophilia, factor XI deficiency, factor XIII deficiency, familial adenomatous polyposis, familial dysautonomia, familial hypercholesterolemia type B, familial Mediterranean fever, free sialic acid storage disorders, frontotemporal dementia with parkinsonism-17, fumarase deficiency, GJB2-related DFNA3 nonsyndromic hearing loss and deafness, GJB2-related DFNB1 nonsyndromic hearing loss and deafness, GNE-related myopathies, galactosemia, Gaucher disease, glucose-6-phosphate dehydrogenase deficiency, glutaric acidemia type 1, glycogen storage disease type 1a, glycogen storage disease type Ib, glycogen storage disease type II, glycogen storage disease type III, glycogen storage disease type V, GRACILE syndrome, HFE-associated hereditary hemochromatosis, Halder AIMs, hemoglobin S beta-thalassemia, hereditary fructose intolerance, hereditary pancreatitis, hereditary thymine-uraciluria, hexosaminidase A deficiency, hidrotic ectodermal dysplasia 2, homocystinuria caused by cystathionine beta-synthase deficiency, hyperkalemic periodic paralysis type 1, hyperornithinemia-hyperammonemia-homocitrullinuria syndrome, primary hyperoxaluria type 1, primary hyperoxaluria type 2, hypochondroplasia, hypokalemic periodic paralysis type 1, hypokalemic periodic paralysis type 2, hypophosphatasia, infantile myopathy and lactic acidosis (fatal and non-fatal forms), isovaleric acidemia, Krabbe disease, LGMD2I, Leber hereditary optic neuropathy, Leigh syndrome, French-Canadian type, long-chain 3-hydroxyacyl-CoA dehydrogenase deficiency, MELAS, MERRF, MTHFR deficiency, MTHFR thermolabile variant, MTRNR1-related hearing loss and deafness, MTTS1-related hearing loss and deafness, MYH-associated polyposis, maple syrup urine disease type 1A, maple syrup urine disease type 1B, McCune-Albright syndrome, medium-chain acyl-CoA dehydrogenase deficiency, megalencephalic leukoencephalopathy with subcortical cysts, metachromatic leukodystrophy, mitochondrial cardiomyopathy, mitochondrial DNA-associated Leigh syndrome and NARP, mucolipidosis IV, mucopolysaccharidosis type I, mucopolysaccharidosis type IIIA, mucopolysaccharidosis type VII, multiple endocrine neoplasia type 2, muscle-eye-brain disease, nemaline myopathy, neurological phenotype, Niemann-Pick disease due to sphingomyelinase deficiency, Niemann-Pick disease type C1, Nijmegen breakage syndrome, PPT1-related neuronal ceroid lipofuscinosis, PROP1-related pituitary hormone deficiency, Pallister-Hall syndrome, paramyotonia congenita, Pendred syndrome, peroxisomal bifunctional enzyme deficiency, pervasive developmental disorders, phenylalanine hydroxylase deficiency, plasminogen activator inhibitor I, autosomal recessive polycystic kidney disease, prothrombin G20210A thrombophilia, pseudovitamin D deficiency rickets, pycnodysostosis, autosomal recessive retinitis pigmentosa, Bothnia type, Rett syndrome, rhizomelic chondrodysplasia punctata type 1, short-chain acyl-CoA dehydrogenase deficiency, Shwachman-Diamond syndrome, Sjogren-Larsson syndrome, Smith-Lemli-Opitz syndrome, spastic paraplegia 13, sulfate transporter-related osteochondrodysplasia, TFR2-related hereditary hemochromatosis, TPP1-related neuronal ceroid lipofuscinosis, thanatophoric dysplasia, transthyretin amyloidosis, trifunctional protein deficiency, tyrosine hydroxylase-deficient DRD, tyrosinemia type I, Wilson's disease, X-linked juvenile retinoschisis, cystic fibrosis, spinal muscular atrophy (SMA), hemoglobinopathy, and Zellweger syndrome spectrum.
27. A method of enriching a biological sample for cell-free fetal DNA (cffDNA), comprising obtaining a biological sample comprising cell-free DNA (cfDNA) from a pregnant woman, wherein the cfDNA comprises cffDNA and cell-free maternal DNA (cfmDNA); extracting the cfDNA from the biological sample; and subjecting the extracted cfDNA to a size exclusion process, wherein the size exclusion process has a cutoff size of about 150 nucleotides in length, about 155 nucleotides in length, about 160 nucleotides in length, about 165 nucleotides in length, about 170 nucleotides in length, about 175 nucleotides in length, or about 180 nucleotides in length, thereby producing a cffDNA-enriched sample.
28. A method of in silico processing of cell-free DNA (cfDNA), comprising sequencing a cfDNA sample comprising cell-free fetal DNA (cffDNA) and cell-free maternal DNA (cfmDNA) to prepare a sequence library; performing a read length-based analysis, wherein an allele balance of a nucleic acid sequence of interest is established in at least two windows of the sequence library; and establishing a trajectory based on the allele balance in the at least two windows.
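The read length-based analysis of claim 28 can be sketched as computing an allele balance inside each size window and collecting the results in window order. The data layout (a list of fragment length and alternate-allele flag pairs at one site), the inclusive cutoffs, and the function name are assumptions made for illustration; the claim does not specify how the trajectory is represented.

```python
def allele_balance_trajectory(reads, cutoffs):
    """Allele balance per size window, ordered by cutoff.

    reads:   list of (fragment_length, is_alt) pairs covering one site
    cutoffs: upper fragment-length limits, one per window
    Returns a list of (cutoff, balance), where balance is the fraction of
    reads in the window carrying the alternate allele, or None if the
    window contains no reads.
    """
    trajectory = []
    for cutoff in sorted(cutoffs):
        in_window = [is_alt for length, is_alt in reads if length <= cutoff]
        balance = sum(in_window) / len(in_window) if in_window else None
        trajectory.append((cutoff, balance))
    return trajectory


# Hypothetical reads: the alternate allele appears only on a short fragment.
reads = [(140, True), (145, False), (165, False), (175, False)]
print(allele_balance_trajectory(reads, cutoffs=(150, 180)))
# [(150, 0.5), (180, 0.25)]
```

In this toy example the balance falls as the window widens, which is the kind of trend one might expect for an allele carried mainly by shorter, fetal-derived fragments.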
29. A method of reducing background noise from unwanted genetic material in non-invasive prenatal screening (NIPS), comprising:
(i) obtaining a biological sample from a pregnant woman, wherein the biological sample comprises cell-free DNA (cfDNA); and
(ii) processing the cfDNA for NIPS, wherein processing comprises enriching cell-free fetal DNA (cffDNA) in the biological sample, subjecting the cfDNA to in silico processing, or a combination thereof.
30. The method of claim 29, wherein processing comprises both enriching cell-free fetal DNA (cffDNA) in the biological sample and subjecting the cfDNA to in silico processing.
31. The method of claim 29 or 30, wherein enriching cell-free fetal DNA (cffDNA) in the biological sample comprises the method of claim 27.
32. The method of any one of claims 29 to 31, wherein in silico processing of the cfDNA comprises the method of claim 28.
33. The method of any one of claims 29 to 32, further comprising normalization to control for GC bias, sample background, hybridization probe capture, or a combination thereof.
CN202380016677.1A 2022-01-11 2023-01-10 Non-invasive prenatal sample preparation and related methods and uses Pending CN118647717A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/298,593 2022-01-11
US202263357915P 2022-07-01 2022-07-01
US63/357,915 2022-07-01
PCT/US2023/010496 WO2023137021A2 (en) 2022-01-11 2023-01-10 Non-invasive prenatal sample preparation and related methods and uses

Publications (1)

Publication Number Publication Date
CN118647717A true CN118647717A (en) 2024-09-13

Family

ID=92671773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202380016677.1A Pending CN118647717A (en) 2022-01-11 2023-01-10 Non-invasive prenatal sample preparation and related methods and uses

Country Status (1)

Country Link
CN (1) CN118647717A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination