WO2025034580A1

WO2025034580A1 - Phased methylation-based markers for tissue and cell-type-specific identification and monitoring

Info

Publication number: WO2025034580A1
Application number: PCT/US2024/040801
Authority: WO
Inventors: David Zhang; Andrea AMERUOSO; Alessandro Pinto; Hannah Roberts; Kellie HULL; Jana HAVEY
Original assignee: Pupil Bio, Inc.
Priority date: 2023-08-04
Filing date: 2024-08-02
Publication date: 2025-02-13

Abstract

Methods and compositions are disclosed for identifying one or more ultra-specific phased methylation patterns (uPMPs) for a tissue or cell-type of interest. Also disclosed and described are uPMPs for certain tissues or cell-types of interest and the use of uPMPs, e.g., in a panel, for diagnostic purposes.

Description

PHASED METHYLATION-BASED MARKERS FOR TISSUE AND CELL-TYPE-SPECIFIC IDENTIFICATION AND MONITORING

CROSS-REFERENCE TO RELATED APPLICATIONS

[001] This application claims the benefit of U.S. Provisional Application No. 63/517,813, filed August 4, 2023, which is hereby incorporated by reference in its entirety.

FIELD

[002] This application relates generally to compositions, assays, methods, kits and apparatuses for identifying and using biomarkers based on the methylation levels of certain target genomic regions.

BACKGROUND

[003] Liquid biopsies offer a way to identify the loss of specific cells or tissues by observing their unique methylation patterns. Methylation, a type of epigenetic modification, affects gene expression and activates cell-specific pathways. Therefore, different cells or tissues exhibit unique methylation profiles that mirror the activation of various genes or pathways. For example, in the beta cells of the pancreatic islets, whose loss causes Type 1 Diabetes Mellitus, the INS gene is active and thus its promoter is demethylated, in contrast to other tissues or cells.

[004] Many researchers concentrate on either individual methylation loci or patterns of 3-9 phased methylation sites on the same strand, which consist of CpG sites that are consistently either methylated or unmethylated in the cell/tissue of interest but not in other tissues or in the background (e.g., cfDNA). Methylation markers with better sensitivity and specificity are desired. A process is described here for identifying such markers by identifying phased methylation patterns with breadth and depth.

SUMMARY

[005] This disclosure presents methodologies for identifying ultra-specific phased methylation patterns (uPMPs), which are unique to certain tissues or cell types. Although not every uPMP described in this application needs to be present in each specific tissue or cell type, any detected uPMP can serve as an indicator of specific tissue or cellular distress. This is particularly relevant when abnormal cell death rates are observed, especially when these patterns are found in cell-free DNA (cfDNA).

[006] In another aspect, this disclosure provides a method for identifying one or more ultra-specific phased methylation patterns (uPMPs) for a tissue or cell-type of interest, comprising: (a) obtaining a set of cell-free DNA (cfDNA) samples from each subject in a first group of subjects; (b) obtaining a set of genomic DNA samples from the tissue or cell-type of interest from each subject in the first group of subjects and/or a second group of subjects; (c) providing conditions capable of converting unmethylated cytosines to uracils in nucleic acid molecules in the set of cfDNA samples and the set of genomic DNA samples to generate a set of converted cfDNA samples and a set of converted genomic DNA samples; (d) selecting a set of target genomic regions; (e) capturing the set of target genomic regions from the set of converted cfDNA samples and the set of converted genomic DNA samples to generate a set of captured cfDNA libraries and a set of captured genomic DNA libraries; (f) subjecting the set of captured cfDNA libraries to sequencing at a depth that is at least 0.1X, 0.5X, 1X, 10X, 100X, 1,000X, 10,000X or 100,000X; (g) subjecting the set of captured genomic DNA libraries to sequencing at a depth that is at least 0.1X, 0.5X, 1X, 10X, 100X, 1,000X, 10,000X or 100,000X; (h) determining phased methylation patterns (PMPs) in the set of captured cfDNA libraries and the set of captured genomic DNA libraries; and (i) identifying one or more uPMPs for the tissue or cell-type of interest, wherein the one or more uPMPs are detected in at least one library of the set of captured genomic DNA libraries, but not in any library of the set of captured cfDNA libraries.
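The final identification step of the method above can be read as a set difference. The following is a minimal Python sketch under that reading, assuming each captured library has already been reduced to the set of PMPs observed in its reads. The tuple encoding of a PMP, the function name `identify_upmps`, and the toy libraries are illustrative assumptions and are not taken from the disclosure.

```python
from typing import Iterable, Set, Tuple

# Hypothetical PMP encoding: (chromosome, strand, CpG coordinates, methylation states)
PMP = Tuple[str, str, Tuple[int, ...], Tuple[int, ...]]


def identify_upmps(genomic_libraries: Iterable[Set[PMP]],
                   cfdna_libraries: Iterable[Set[PMP]]) -> Set[PMP]:
    """Return PMPs seen in at least one genomic library and in no cfDNA library."""
    seen_in_tissue: Set[PMP] = set().union(*genomic_libraries)
    seen_in_cfdna: Set[PMP] = set().union(*cfdna_libraries)
    return seen_in_tissue - seen_in_cfdna


# Toy data only: two tissue libraries and two cfDNA libraries.
sites = (35861365, 35861379, 35861452)
hyper = ("chr19", "-", sites, (1, 1, 1))
hypo = ("chr19", "-", sites, (0, 0, 0))
islet_libs = [{hyper, hypo}, {hypo}]
cfdna_libs = [{hypo}, set()]
print(identify_upmps(islet_libs, cfdna_libs))  # only the hypermethylated pattern remains
```

Pooling all cfDNA libraries before subtracting reflects the "not in any library" condition: a single cfDNA observation is enough to disqualify a pattern.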

[007] In another aspect, this disclosure describes the application of multiple sets of uPMPs. Each set comprises markers that individually possess high specificity, though not necessarily high sensitivity. However, when used in combination, these sets of uPMPs maximize both sensitivity and specificity.

[008] In one embodiment, these sets of uPMPs can be utilized for early detection and subsequent monitoring of degenerative conditions. In a separate embodiment, these sets of uPMPs can also be used for the early detection and monitoring of autoimmune diseases.

BRIEF DESCRIPTION OF THE DRAWINGS

[009] The following drawings and provided descriptions illustrate the apparent features, advantages, and uses of the invention(s). The incorporated drawings and descriptions included herein serve to identify specifications that will further explain the concept of the invention(s) and allow the production of art that allows a trained professional to make and use the invention(s). The drawings are not illustrated to scale.

[0010] Figure 1 provides illustrative schematics for phased methylation, read phased methylation, and phased methylation patterns. Top shows a hypothetical DNA molecule that features five CpG methylation sites, with each site possessing a non-homogeneous methylation status. In the sequencing read, the methylation status at each site is represented as "C" for a methylated cytosine and "c" for an unmethylated cytosine (originally denoted in upper or lower case, respectively, on the molecule). Thus, the read phased methylation is a compilation of the methylation status of each CpG site, and is denoted as '10100', where '1' indicates a methylated site, and '0' represents an unmethylated site. This Read Phased Methylation can further be decomposed into ten subsets (combinations) of size three, each representing a Phased Methylation Pattern (PMP), as shown in the accompanying figure.
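The decomposition described for Figure 1 can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `decompose_read` and the use of 0-based site indices in place of genomic coordinates are assumptions. It enumerates every size-three combination of the five sites of a read such as 10100.

```python
from itertools import combinations
from typing import List, Tuple


def decompose_read(read: str, size: int = 3) -> List[Tuple[Tuple[int, ...], str]]:
    """List every PMP of the given size as (CpG site indices, methylation states)."""
    return [
        (idx, "".join(read[i] for i in idx))
        for idx in combinations(range(len(read)), size)
    ]


pmps = decompose_read("10100")
print(len(pmps))  # 10, i.e. C(5, 3)
print(pmps[0])    # ((0, 1, 2), '101')
```

Because combinations keep the site order but may skip sites, a PMP such as sites (0, 2, 4) is produced alongside the contiguous ones.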

[0011] Figure 2 shows the identification of common and unique PMPs across two individual molecules. Consider two molecules, each featuring the same CpG sites, sharing identical methylation status at four out of five sites. As a result, these two molecules generate sixteen PMPs of size three. Of these, six PMPs are unique to the first molecule (Molecule A), six are unique to the second molecule (Molecule B), and four PMPs are shared between the two molecules.
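The counts given for Figure 2 follow from simple combinatorics: the four shared PMPs are those that avoid the differing site, C(4,3) = 4, and each molecule contributes C(4,2) = 6 PMPs that include it. A minimal Python check is sketched below; the two read strings are hypothetical examples chosen to differ at a single site and are not taken from the figure.

```python
from itertools import combinations
from typing import Set, Tuple

PMP = Tuple[Tuple[int, ...], Tuple[str, ...]]


def pmps_of(read: str, size: int = 3) -> Set[PMP]:
    """Set of (site indices, methylation states) for every combination of sites."""
    return {
        (idx, tuple(read[i] for i in idx))
        for idx in combinations(range(len(read)), size)
    }


molecule_a = pmps_of("10100")  # hypothetical read phased methylation
molecule_b = pmps_of("10110")  # differs from molecule A at the fourth CpG site only

shared = molecule_a & molecule_b
only_a = molecule_a - molecule_b
only_b = molecule_b - molecule_a
print(len(shared), len(only_a), len(only_b), len(molecule_a | molecule_b))  # 4 6 6 16
```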

[0012] Figure 3 shows a workflow to identify ultra-specific Phased Methylation Patterns (uPMPs). Panel (A) shows a sample processing schematic. Panel (B) shows an exemplary uPMP identification process. Early onset T1DM is associated with unique phased methylation patterns. These patterns are absent in cfDNA samples even at high sequencing depths of 1,500x, but are found in pancreatic islet samples with relative frequency >20%. These distinct methylation patterns are determined by unique methylation statuses of the CpG sites they encompass.

[0013] Figure 4 depicts an islet uPMP found in KIRREL2 at CpG positions 35861365, 35861379, and 35861452 on the antisense strand of Chromosome 19. As shown in panel (A), of the 8 possible patterns of size 3, the PMP 1,1,1 (hypermethylated) is found in islets but not in cfDNA, at sequencing depths of 150x and 1,500x, respectively. Panel (B) shows a detailed breakdown of the methylation patterns. In 25 cfDNA samples, each independently processed and sequenced, a total of 5,572 reads mapped to Chr19:35861365-35861452. No reads contained the patterns 110 or 111. In 23 islet samples, a total of 537 reads mapped to the same coordinates. Of these, 322 carried the pattern 111 and only 3 carried the pattern 110. While the first pattern is found in all samples, the second is found in only one sample. Both patterns are thus identified as uPMPs, as both have a specificity of ~100% and are found in at least one islet sample.

[0014] Figure 5 showcases islet uPMPs within the Insulin gene and its flanking regions, each comprising 3 CpG sites. These were found in at least one islet sample, but not in cfDNA. The plot represents a 10 kb genomic region on chr11:2153660-2164221 (depicted on the x-axis, coordinates on assembly GRCh38), which harbors 599 total CpG sites (shown as light grey dots). In total, 69 uPMPs were identified across the two strands. Among these, 51 were located in the downstream region of the INS gene, three in the promoter region, and the remaining within the INS gene. At least two of these, located in the UTR region, overlap with the CpG sites (on relative positions chr11:2160805 and 2160808) previously used by Harold et al. (PNAS 108, no. 47 (2011): 19018-23) to detect beta-cell death in cfDNA using qPCR. While all 69 uPMPs exhibit 100% specificity, the sensitivity of each marker ranges from 4% (1/21 positive samples) to 66% (14/21).

[0015] Figure 6 depicts how ultra-specific phased methylation patterns can be used to interrogate samples and discriminate between tissue types. Panel (A) illustrates the selection of biomarkers based on non-overlapping uPMP present in the samples of an Islet Panel. Panel (B) depicts the Islet Panel main findings. Panel (C) depicts the distribution of biomarker homology across the genes interrogated as bimodal. Panel (D) depicts the cumulative function of 1,300 highly specific low sensitivity markers to achieve a combined sensitivity that significantly exceeds that of a single ultra-specific 100% sensitive marker.

[0016] Figure 7 presents examples of ultra-specific phased methylation patterns of size 3 (uPMPs) for genes INS, PDX1, and TECPR1. Grey shadows represent reads per PMP for each sample. The methylation pattern, shown on the y-axis, is represented in binary: 0 for an unmethylated CpG site and 1 for a methylated CpG site. Genomic coordinates are based on GRCh38. The figure summarizes findings for 25 cfDNA samples and 21 islet samples.

DETAILED DESCRIPTION

[0017] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

[0018] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference in their entirety.

[0019] As used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. By way of example, “an element” means at least one element and can include more than one element.

[0020] Where a range of values is provided, it is understood that each intervening value, between the upper and lower limit of that range and any other stated or intervening value in that stated range is encompassed within the disclosure. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the disclosure, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the disclosure.

[0021] When a grouping of alternatives is presented, any and all combinations of the members that make up that grouping of alternatives is specifically envisioned. For example, if an item is selected from a group consisting of A, B, C, and D, the inventors specifically envision each alternative individually (e.g., A alone, B alone, etc.), as well as combinations such as A, B, and D; A and C; B and C; etc.

[0022] The term “and/or” when used in a list of two or more items means any one of the listed items by itself or in combination with any one or more of the other listed items. For example, the expression “A and/or B” is intended to mean either or both of A and B - i.e., A alone, B alone, or A and B in combination. The expression “A, B and/or C” is intended to mean A alone, B alone, C alone, A and B in combination, A and C in combination, B and C in combination, or A, B, and C in combination.

General Definitions

[0023] As used herein, the term “substantially”, when used to modify a quality, generally allows a certain degree of variation without that quality being lost. For example, in certain aspects such degree of variation can be less than 0.1%, about 0.1%, about 0.2%, about 0.3%, about 0.4%, about 0.5%, about 0.6%, about 0.7%, about 0.8%, about 0.9%, about 1%, between 1-2%, between 2-3%, between 3-4%, between 4-5%, or greater than 5% or 10%.

[0024] The term “about”, “around” or “approximately”, when modifying the quantity (e.g., mg) of a substance or composition, or the value of a parameter characterizing a step in a method, or the like, refers to variation in the numerical quantity that can occur, for example, through typical measuring, handling, and sampling procedures involved in the preparation, characterization and/or use of the substance or composition; through an inadvertent error in these procedures; through differences in the manufacture, source, or purity of the ingredients employed to make or use the compositions or carry out the procedures; and the like. In certain aspects, “about” can mean a variation of ± 0.1%, ± 0.5%, ± 1%, ± 2%, ± 3%, ± 4%, ± 5%, ± 6%, ± 7%, ± 8%, ± 9% or ± 10%.

[0025] As used herein, the term “phased methylation” refers to the methylation status of different CpG sites that co-occur on the same single-stranded DNA molecule.

[0026] As used herein, the term “read phased methylation” refers to the methylation status of different CpG sites as found in the same NGS read.

[0027] As used herein, the term “phased methylation patterns” or “PMPs” refers to combinations of methylation status of at least 3 CpG sites identified within a given read phased methylation. PMPs are typically characterized by a list of CpG coordinates, and their methylation status as found in the read phased methylation. PMPs may skip certain methylation sites in a given read phased methylation. In an aspect, a PMP refers to methylation status of 3 CpG sites in a given read phased methylation, where some or all of the 3 CpG sites are intercalated by one or more CpG sites not considered as part of the PMP. PMPs are strand-dependent. This means that a phased methylation pattern observed on the sense strand of the original double-stranded DNA molecule may not necessarily have a corresponding pattern on the antisense strand of the same original double-stranded DNA molecule.

[0028] As used herein, the term “ultra-specific phased methylation patterns” or “uPMPs” refers to PMPs that are not detected in any cell-free DNA (cfDNA) samples from any donor, but are detected in at least one genomic DNA sample from a tissue or cell-type of interest sourced from at least one donor. In some aspects, uPMPs are detected only in some, but not all, genomic DNA samples from a tissue or cell-type of interest, such as below 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70% or 80% of the samples. In some aspects, a uPMP comprises selected methylation sites intercalated by at least 1, at least 2 or at least 3, but below 10 unselected methylation sites.

[0029] In one embodiment and as an example, uPMPs in Type 1 Diabetes Mellitus (T1DM) are PMPs that are not detected in any cell-free DNA (cfDNA) sample (from a donor), but are detected in at least one pancreatic islet sample (from the same donor). In this context, "not detected" refers to the lack of any read phased methylation carrying a specific PMP, provided that the sequencing depth is at least 1,500x for cfDNA samples (background). Conversely, "detected" refers to the presence of at least one read phased methylation carrying a specific PMP, provided that the sequencing depth is at least 150x.
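The depth-qualified definitions of "detected" and "not detected" in this embodiment can be expressed as two small predicates. The sketch below is a hedged illustration: the class and function names are hypothetical, and treating a cfDNA sample below the depth threshold as disqualifying (rather than ignoring it) is a conservative assumption not stated in the text.

```python
from dataclasses import dataclass
from typing import Iterable

# Depth thresholds from the T1DM example; the constant names are illustrative.
MIN_CFDNA_DEPTH = 1500
MIN_TISSUE_DEPTH = 150


@dataclass
class PmpObservation:
    reads_with_pmp: int  # reads carrying the PMP in one sample
    depth: int           # sequencing depth at the PMP's CpG sites in that sample


def not_detected_in_cfdna(obs: PmpObservation) -> bool:
    """True only when the depth is sufficient and no read carries the PMP."""
    return obs.depth >= MIN_CFDNA_DEPTH and obs.reads_with_pmp == 0


def detected_in_tissue(obs: PmpObservation) -> bool:
    """True when the depth is sufficient and at least one read carries the PMP."""
    return obs.depth >= MIN_TISSUE_DEPTH and obs.reads_with_pmp >= 1


def is_upmp(cfdna: Iterable[PmpObservation], tissue: Iterable[PmpObservation]) -> bool:
    """Absent from every cfDNA sample and present in at least one tissue sample."""
    return (all(not_detected_in_cfdna(o) for o in cfdna)
            and any(detected_in_tissue(o) for o in tissue))
```

For example, `is_upmp([PmpObservation(0, 1500)], [PmpObservation(3, 150)])` evaluates to true, while a single cfDNA read carrying the pattern makes the call false regardless of the tissue evidence.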

[0030] As used herein, the term “ultra-specific phased methylation pattern panel” or “uPMP panel” refers to a set of between 2 and 10,000 uPMPs. This panel, which is specific to at least one tissue or cell type, provides a level of sensitivity that exceeds that of an individual uPMP for at least one tissue or cell type. In a typical uPMP panel, each individual uPMP has a specificity close to 100%, and a sensitivity above 0% but below 99%. Collectively, the resulting uPMP panel has a specificity equal to that of the individual markers (i.e., close to 100%), but a sensitivity greater than 70%.
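How a panel of individually low-sensitivity markers reaches a higher combined sensitivity can be illustrated empirically: a tissue sample counts as positive for the panel if any one uPMP is detected in it. The Python sketch below assumes a boolean detection matrix (rows are tissue samples, columns are uPMPs). The function names and the toy matrix are made up for illustration and do not reproduce data from the disclosure.

```python
from typing import List, Sequence


def marker_sensitivities(calls: Sequence[Sequence[bool]]) -> List[float]:
    """Per-marker sensitivity: fraction of tissue samples in which each uPMP is detected."""
    n_samples = len(calls)
    return [sum(column) / n_samples for column in zip(*calls)]


def panel_sensitivity(calls: Sequence[Sequence[bool]]) -> float:
    """Panel sensitivity: fraction of tissue samples with at least one uPMP detected."""
    return sum(any(row) for row in calls) / len(calls)


# Toy detection matrix (made-up values): 4 tissue samples x 3 uPMPs.
calls = [
    [True, False, False],
    [False, True, False],
    [False, False, True],
    [False, False, False],
]
print(marker_sensitivities(calls))  # [0.25, 0.25, 0.25]
print(panel_sensitivity(calls))     # 0.75
```

In this toy case each marker alone detects a quarter of the samples, while the panel detects three quarters, because the markers are positive in different samples.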

[0031] As used herein, the term “subject” generally refers to an entity or a medium that has testable or detectable genetic information. A subject can be a person, individual, or patient. A subject can be a vertebrate, such as, for example, a mammal. Non-limiting examples of mammals include humans, simians, farm animals, sport animals, rodents, and pets. The subject can be a person that has a condition, disorder or disease or is suspected of having such condition, disorder or disease. The subject may be displaying a symptom(s) indicative of a health or physiological state or condition of the subject, such as a cancer, autoimmune condition, or other disease, disorder, or condition of the subject. As an alternative, the subject can be asymptomatic with respect to such health or physiological state or condition.

[0032] The term “normal” or “healthy”, as used herein, generally refers to a cell, tissue, plasma, blood, biological sample, or subject not having, or suspected to have, a condition, disorder or disease.

[0033] As used herein, a “tissue” corresponds to any cells. Different types of tissue may correspond to different types of cells (e.g., liver, lung, pancreas or blood), but also may correspond to tissue from different organisms (mother vs. fetus) or to healthy cells vs. tumor cells. A “biological sample” refers to any sample that is taken from a subject (e.g., a human, such as a pregnant woman, a person with cancer, or a person suspected of having cancer, an organ transplant recipient, or a subject suspected of having a disease process involving an organ (e.g., the pancreas in diabetes, the heart in myocardial infarction, or the brain in stroke)) and contains one or more nucleic acid molecule(s) of interest. The biological sample can be a bodily fluid, such as blood, plasma, serum, urine, vaginal fluid, uterine or vaginal flushing fluids, pleural fluid, ascitic fluid, cerebrospinal fluid, saliva, sweat, tears, sputum, bronchoalveolar lavage fluid, etc. Stool samples can also be used.

[0034] As used herein, the term “sample” generally refers to a biological sample obtained from or derived from one or more subjects. Biological samples may be cell-free biological samples or substantially cell-free biological samples, or may be processed or fractionated to produce cell-free biological samples. For example, cell-free biological samples may include cell-free ribonucleic acid (cfRNA), cell-free deoxyribonucleic acid (cfDNA), cell-free fetal DNA (cffDNA), plasma, serum, urine, saliva, amniotic fluid, and derivatives thereof. Cell-free biological samples may be obtained or derived from subjects using an ethylenediaminetetraacetic acid (EDTA) collection tube, a cell-free RNA collection tube, or a cell-free DNA collection tube. Cell-free biological samples may be derived from whole blood samples by fractionation. Biological samples or derivatives thereof may contain cells. For example, a biological sample may be a blood sample or a derivative thereof (e.g., blood collected by a collection tube or blood drops).

[0035] As used herein, the term “cell-free nucleic acid” refers to any extracellular nucleic acid that is not attached to a cell. A cell-free nucleic acid can be a nucleic acid circulating in blood. Alternatively, a cell-free nucleic acid can be a nucleic acid in other bodily fluid disclosed herein, e.g., urine. A cell-free nucleic acid can be a deoxyribonucleic acid (“DNA”), e.g., genomic DNA, mitochondrial DNA, or a fragment thereof. A cell-free nucleic acid can be a ribonucleic acid (“RNA”), e.g., mRNA, short-interfering RNA (siRNA), microRNA (miRNA), circulating RNA (cRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), small nucleolar RNA (snoRNA), Piwi-interacting RNA (piRNA), long non-coding RNA (long ncRNA), or a fragment thereof. In some cases, a cell-free nucleic acid is a DNA/RNA hybrid. A cell-free nucleic acid can be double-stranded, single-stranded, or a hybrid thereof. A cell-free nucleic acid can be released into bodily fluid through secretion or cell death processes, e.g., cellular necrosis and apoptosis.

[0036] A cell-free nucleic acid can comprise one or more epigenetic modifications. For example, a cell-free nucleic acid can be acetylated, methylated, ubiquitylated, phosphorylated, sumoylated, ribosylated, and/or citrullinated. For example, a cell-free nucleic acid can be methylated cell-free DNA.

[0037] As used herein, the term “bisulfite treatment” refers to the treatment of DNA with bisulfite or a salt thereof, such as sodium bisulfite (NaHSO3). Bisulfite reacts readily with the 5,6-double bond of cytosine, but poorly with methylated cytosine. Cytosine reacts with the bisulfite ion to form a sulfonated cytosine reaction intermediate which is susceptible to deamination, giving rise to a sulfonated uracil. The sulfonate group can be removed under alkaline conditions, resulting in the formation of uracil. Uracil is recognized as a thymine by polymerases and amplification will result in an adenine-thymine base pair instead of a cytosine-guanine base pair.
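The net effect of conversion followed by amplification on the sequence read-out can be modelled in silico: every unmethylated cytosine is read as thymine, while methylated cytosines are still read as cytosine. The sketch below is a simplified illustration that models one strand only and ignores incomplete conversion; the function name and the example sequence are hypothetical.

```python
from typing import Set


def bisulfite_convert(sequence: str, methylated_positions: Set[int]) -> str:
    """Read-out of one strand after conversion and amplification.

    Unmethylated C is read as T; C at a methylated position (0-based) stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(sequence)
    )


# Hypothetical 8-nt fragment with cytosines at positions 1, 4 and 7; only position 1 is methylated.
print(bisulfite_convert("ACGTCGAC", {1}))  # ACGTTGAT
```

Comparing the converted read with the reference then reveals the methylation status at each CpG: a retained C indicates methylation, a C-to-T change indicates its absence.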

[0038] The term “genomic region”, as used herein, generally refers to identified regions of nucleic acid that are identified by their location in the chromosome. In some examples, the genomic regions are referred to by a gene name and encompass coding and non-coding regions associated with that physical region of nucleic acid. As used herein, a gene comprises coding regions (exons), non-coding regions (introns), transcriptional control or other regulatory regions, and promoters. In another example, the genomic region may incorporate an intron or exon or an intron/exon boundary within a named gene.

[0039] As used herein, a “site” corresponds to a single site, which may be a single base position or a group of correlated base positions, e.g., a CpG site.

[0040] As used herein, the term “CpG sites” refers to regions of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases along its 5' → 3' direction. CpG sites occur with high frequency in genomic regions called CpG islands.

[0041] The term “CpG islands”, as used herein, generally refers to a contiguous region of genomic DNA that satisfies the criteria of (1) having a frequency of CpG dinucleotides corresponding to an “Observed/Expected Ratio” greater than about 0.6, and (2) having a “GC Content” greater than about 0.5. CpG islands may be between about 0.2 to about 3 kilobases (kb) in length having a high frequency of CpG sites. CpG islands may be found at or near promoters of about 40% of mammalian genes. CpG islands may also be found outside of mammalian genes. In some examples, CpG islands are found in exons, introns, promoters, enhancers, inhibitors, and transcriptional regulatory elements. CpG islands may tend to occur upstream of so-called “housekeeping genes”. CpG islands may have a CpG dinucleotide content of at least about 60% of what would be statistically expected. The occurrence of CpG islands at or upstream of the 5' end of genes may reflect a role in the regulation of transcription.
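The two numeric criteria above can be evaluated directly from a sequence. The sketch below assumes the commonly used definition of the observed/expected ratio, (number of CpG dinucleotides × sequence length) / (number of C × number of G); the disclosure itself does not spell out the formula, so that choice, the function names, and the omission of the length criterion are assumptions.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


def meets_island_criteria(seq: str, min_oe: float = 0.6, min_gc: float = 0.5) -> bool:
    """Check criteria (1) and (2); the length requirement is not checked here."""
    return cpg_observed_expected(seq) > min_oe and gc_content(seq) > min_gc


print(meets_island_criteria("CGCGCGAT"))  # True
print(meets_island_criteria("ATATATAT"))  # False
```

Real islands span hundreds of bases; the eight-base strings here only exercise the arithmetic.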

[0042] The term “hypermethylation”, as used herein, generally refers to the average methylation state corresponding to an increased presence of 5-mC at one or a plurality of CpG dinucleotides within a DNA sequence of a test DNA sample, relative to the amount of 5-mC found at corresponding CpG dinucleotides within a normal control DNA sample. In an aspect, a uPMP exhibits hypermethylation.

[0043] The term “hypomethylation”, as used herein, generally refers to the average methylation state corresponding to a decreased presence of 5-mC at one or a plurality of CpG dinucleotides within a DNA sequence of a test DNA sample, relative to the amount of 5-mC found at corresponding CpG dinucleotides within a normal control DNA sample. In an aspect, a uPMP exhibits hypomethylation.

[0044] The term “methylation state” or “methylation status”, as used herein, generally refers to the presence or absence of 5-methylcytosine (“5-mC”) at one or a plurality of CpG dinucleotides within a DNA sequence. Methylation states at one or more particular palindromic CpG methylation sites (each having two CpG dinucleotide sequences) within a DNA sequence include “unmethylated,” “fully-methylated”, and “hemi-methylated.”

[0045] The term “methylated cytosine”, as used herein, generally refers to any methylated forms of the nucleic acid base cytosine that contains a methyl or hydroxymethyl functional group at the 5 position. Methylated cytosines may be regulators of gene transcription in genomic DNA. This term may include 5-methylcytosine and 5-hydroxymethylcytosine.

[0046] The term “methylation assay” refers to any assay for determining the methylation state of one or more CpG dinucleotide sequences within a sequence of DNA.

[0047] The term “methylation converted” or “converted” nucleic acid, as used herein, generally refers to nucleic acid, such as for example DNA, that has undergone a process used to convert the DNA for methylation sequencing. Examples of conversion processes include reagent-based (such as bisulfite) conversion, enzymatic conversion, or combination conversion (such as TAPS conversion) where unmethylated cytosines are converted into uracil prior to PCR amplification or sequencing. The conversion process may be used in methyl sequencing methods to distinguish between methylated and unmethylated cytosine bases. Exemplary enzymes for enzymatically converting cytosine include DNA methyltransferase, Uracil-DNA glycosylase (UDG), and G/T mismatch-specific endonuclease (GTSCE).

[0048] As used herein, in next-generation sequencing (NGS), sequencing depth refers to the number of times a specific nucleotide in a target region of a genome is read or sequenced. It is a measure of how extensively a particular DNA or RNA molecule is sampled during the sequencing process. Sequencing depth is commonly expressed as "X coverage" or "X-fold coverage," where X represents the average number of times a given nucleotide is sequenced. For example, if a genome has a sequencing depth of 30X, it means that, on average, each nucleotide in the genome has been sequenced 30 times.
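As a worked illustration of average depth, the total number of sequenced bases falling in a region is divided by the length of that region. The sketch below makes the simplifying assumption that every read lies entirely within the region; the function name and the numbers are illustrative only.

```python
from typing import Iterable


def mean_depth(read_lengths: Iterable[int], region_length: int) -> float:
    """Average depth = total sequenced bases within the region / region length."""
    return sum(read_lengths) / region_length


# 200 reads of 150 nt over a 1,000 bp region give 30X average depth.
print(mean_depth([150] * 200, 1000))  # 30.0
```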

[0049] In an aspect, the present disclosure provides methods that use a panel of uPMPs useful for the analysis of methylation within a region or gene. Other aspects provide novel uses of the region, gene, and the gene product as well as methods, assays, and kits directed to detecting, differentiating, and distinguishing certain conditions or disorders, e.g., autoimmune diseases. One particular use is the early detection of autoimmune conditions.

[0050] In another aspect, the present disclosure describes processes and methods for the selection of individual ultra-specific phased methylation patterns (uPMPs). This selection process involves the analysis and comparison of PMPs across multiple samples. These samples represent tissues or cell types of interest from multiple subjects and heterogeneous mixtures such as cell-free DNA (cfDNA) samples from multiple healthy donors.

[0051] In a further aspect of the invention, the present disclosure describes methods comprising (a) obtaining a set of single-stranded target nucleic acids wherein unmethylated cytosines are converted into uracil moieties, (b) obtaining a uPMP panel comprising primers for at least 2, 5, 10, 15, 20, 30, 50, 100, 150, 200, 250, 300, 350, 400, 450 or 500 uPMPs, and (c) applying the uPMP panel to the set of single-stranded target nucleic acids to assess the presence or absence of nucleic acid molecules originating from a tissue or cell-type of interest.

[0052] In some embodiments of the invention of the present disclosure, the uPMP panel comprises one or more uPMPs associated with at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25 or 30 genes or loci selected from the group consisting of MX1, TECPR1, PDK1, NR5A2, ONCUT1, CCNI1, INS1, IGF2, NPHS1, PGGHG, OTAM, SLC30A2, IFTM1, LBX1, ALDH1L2, CC2D1B, KCNK6, SLC1A2, SEL1L, PDE1A, GAPL1, LC16M2, CTRL, SH3GL2B, PY, TIME1, RBPS, RPXL, CEL, ELA2B, PKD1, LEFTY1, ATSPERB, FFAR1, PRDX4, TLRB1, CYB2, SLC30A8, SCYN, GTR2, CELB, OTAM, PNLIP, REGIA, AMY2A, CTRC, CTRB2, PAK3, MGA, PNLPR1, PLA2G1B, AQP12A, IL23R, CUZD1, TEX11, CELX1, AQP12B, GPR119, SLC38A5, PRSS2, PRSS1, AMY2A, CTRC and SPINK1.

[0053] In other aspects of the invention, the present disclosure describes compositions which may comprise a set of single-stranded targets wherein unmethylated cytosines are converted into uracil moieties; a set of forward primers, each flanking the 5'-end of at least one uPMP; a set of reverse primers, each flanking the 3'-end of at least one uPMP; and enzymes and buffer for PCR amplification.

[0054] In some embodiments, the set of forward primers and the set of reverse primers flank one or more uPMPs associated with at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25 or 30 genes or loci selected from the group consisting of MX1, TECPR1, PDK1, NR5A2, ONCUT1, CCNI1, INS1, IGF2, NPHS1, PGGHG, OTAM, SLC30A2, IFTM1, LBX1, ALDH1L2, CC2D1B, KCNK6, SLC1A2, SEL1L, PDE1A, GAPL1, LC16M2, CTRL, SH3GL2B, PY, TIME1, RBPS, RPXL, CEL, ELA2B, PKD1, LEFTY1, ATSPERB, FFAR1, PRDX4, TLRB1, CYB2, SLC30A8, SCYN, GTR2, CELB, OTAM, PNLIP, REGIA, AMY2A, CTRC, CTRB2, PAK3, MGA, PNLPR1, PLA2G1B, AQP12A, IL23R, CUZD1, TEX11, CELX1, AQP12B, GPR119, SLC38A5, PRSS2, PRSS1, AMY2A, CTRC and SPINK1. In some such embodiments, the set of forward primers and/or the set of reverse primers have a set size of at least or at most 2, 5, 10, 15, 20, 30, 50, 100, 150, 250, 300, 450, or 500.

[0055] It will be readily apparent to those skilled in the art that other suitable modifications and adaptations of the devices, systems and methods described herein may be made using suitable equivalents without departing from the scope of the aspects disclosed herein. Having now described certain aspects in detail, the same will be more clearly understood by reference to the following example, which is included for purposes of illustration only and is not intended to be limiting. All patents, patent applications, and references described herein are incorporated by reference in their entirety for all purposes.

EXAMPLES

Example 1: Identification of a uPMP panel associated with early onset of Type 1 Diabetes Mellitus (T1DM)

[0056] A set of ultra-specific phased methylation patterns (uPMPs) associated with early onset T1DM was identified through deep sequencing of healthy donors' cfDNA (N=23) and DNA from islet and beta cells derived from different donors (N=19). Figure 1 explains the concept behind phased methylation, read phased methylation, and phased methylation patterns (PMP).

[0057] A targeted hybridization-capture methylation panel was devised. This targeted panel included a set of probes covering approximately 3Mb across 80 regions of interest. Each region was defined as a full gene, plus or minus 3kb, flanking the upstream/downstream coordinates of the gene's start and end points.
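The region definition above (a full gene extended by 3 kb on each side) can be sketched as a simple interval computation. The type and function names and the example coordinates are hypothetical; clamping at position 0 is an added safeguard, and no check against the chromosome end is made.

```python
from typing import NamedTuple

FLANK_BP = 3_000  # 3 kb flank on each side, as described in the text


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def target_region(chrom: str, gene_start: int, gene_end: int,
                  flank: int = FLANK_BP) -> Region:
    """Extend a gene interval by `flank` bases upstream and downstream."""
    return Region(chrom, max(0, gene_start - flank), gene_end + flank)


# Hypothetical gene spanning 10,000-12,500 on chr1.
print(target_region("chr1", 10_000, 12_500))  # Region(chrom='chr1', start=7000, end=15500)
```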

[0058] The list of genes to be targeted was selected based on publicly available methylation and proteomic data. Initially, genes expressed in a tissue of interest at a rate four times greater than in other tissues, or in a group of 2-5 tissues compared to any other tissue, were chosen. After initial gene selection, publicly available methylation data was used to identify and prioritize genes exhibiting higher differential methylation at CpG sites.
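By way of a non-limiting sketch, the expression-based selection criterion could be applied as follows. The function name `is_enriched`, the use of the minimum within the group and the maximum outside it, and the expression values in the usage note are assumptions chosen for illustration; the public data sets and the exact comparison used are not specified here.

```python
def is_enriched(expression: dict[str, float], group: set[str], fold: float = 4.0) -> bool:
    """Return True if every tissue in `group` expresses the gene at least
    `fold` times more strongly than any tissue outside the group.

    `expression` maps tissue name -> expression level for a single gene.
    `group` holds the tissue of interest alone, or a group of 2-5 tissues
    that includes it.
    """
    inside = [expression[t] for t in group]
    outside = [level for t, level in expression.items() if t not in group]
    if not inside or not outside:
        return False
    return min(inside) >= fold * max(outside)


def select_genes(table: dict[str, dict[str, float]], group: set[str], fold: float = 4.0) -> list[str]:
    """Keep the genes whose expression profile passes the enrichment test."""
    return sorted(g for g, expr in table.items() if is_enriched(expr, group, fold))
```

For example, with hypothetical levels `{"islet": 40.0, "liver": 5.0, "blood": 10.0}`, the gene passes for the group `{"islet"}` at a fold of 4, since 40 is at least 4 times the highest level elsewhere.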

[0059] Capture probes and reagents were ordered from Twist Bioscience (San Francisco, USA). Approximately 25,000 sets of probes were synthesized and delivered as a pool, each set comprising eight probes of 120nt in length: two probes for each parental strand, either fully or partially converted.

[0060] cfDNA was extracted from plasma using a SPRI-based kit (Beckman Coulter, USA), following the manufacturer's instructions, and stored at -20°C until use. Islet DNA was extracted using a column-based kit (NEB, USA), fragmented to a median length of approximately 220 bp using sonication (Covaris, USA), and stored at -20°C until use. Libraries were prepared using EM-seq (NEB, USA), following the manufacturer's instructions. Briefly, the DNA fragments were end-repaired and ligated with sequencing adapters. After purification, the samples underwent oxidation of 5-methylcytosines and 5-hydroxymethylcytosines, followed by a second purification. The samples were then treated with the APOBEC enzyme to convert unmethylated cytosine into uracil. After another round of purification, the fragments were amplified using NGS indexed primers. The product was purified, normalized, and used as input in the capture reaction. Finally, the eluate from the capture reaction was amplified, purified, and sequenced at 1,500x and 150x sequencing depths for cfDNA and islet samples, respectively.

[0061] NGS data from each sample were aligned to the reference genome using a commonly used methylation-aware aligner (Meth-BWA). Subsequently, a custom software tool was used to determine the phased methylation patterns. Finally, phased methylation patterns specific to the islets, as compared to the cfDNA, were identified in agreement with our definitions (Figure 1).
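The custom software tool is not described further. Purely as a hypothetical sketch, and not as a description of that tool, phased patterns could be enumerated per read and compared between sample classes as shown below. The sketch assumes that each aligned read has already been reduced to a mapping from CpG position to methylation state, and that a pattern is a window of consecutive CpG sites covered by the same read; both are assumptions of this example.

```python
from collections import Counter
from collections.abc import Iterable, Iterator

Read = dict[int, int]           # CpG position -> 1 (methylated) or 0 (unmethylated)
Pattern = tuple[tuple[int, ...], str]  # (CpG positions, binary code)


def phased_patterns(read: Read, size: int = 3) -> Iterator[Pattern]:
    """Yield every window of `size` consecutive CpG sites covered by one read."""
    sites = sorted(read)
    for i in range(len(sites) - size + 1):
        window = tuple(sites[i:i + size])
        yield window, "".join(str(read[pos]) for pos in window)


def count_patterns(reads: Iterable[Read], size: int = 3) -> Counter:
    """Count how many reads support each phased pattern."""
    return Counter(p for read in reads for p in phased_patterns(read, size))


def tissue_specific(tissue_reads: Iterable[Read], cfdna_reads: Iterable[Read], size: int = 3) -> dict:
    """Patterns seen in the tissue reads and in none of the cfDNA reads."""
    background = count_patterns(cfdna_reads, size)
    tissue = count_patterns(tissue_reads, size)
    return {p: n for p, n in tissue.items() if p not in background}
```

Because the comparison is by exact pattern at exact positions, a pattern is discarded as soon as a single background read carries it, which is why the background is sequenced more deeply than the tissue in the examples.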

[0062] A total of approximately 1,000,000 candidate markers, spanning 87 kb, were identified by comparing methylation patterns from islet and cfDNA samples (Figure 3). An exemplary islet uPMP is found around the KIRREL2 gene (Figure 4). Further exemplary islet uPMPs were identified around the INS gene (Figure 5).

Example 2: Validation of candidate PMPs

[0063] The uPMPs identified in Example 1 were validated at higher sequencing depths. For this, a subset of the initial hybridization capture panel was used, comprising a set of probes targeting approximately 300 kb across 80 regions of interest. Capture probes and reagents were ordered from Twist Bioscience (San Francisco, USA). cfDNA samples were processed as previously described and sequenced at 30,000x. Islets from 19 samples were combined, and alpha cells and beta cells were separated using FACS. Next, the DNA from the different cell types was extracted and processed as above and sequenced at 150x.

[0064] Validated beta-cell markers were identified as uPMPs that were not observed in any of the cfDNA samples at 30,000x sequencing depth or in alpha cells at 150x sequencing depth, but were observed at 150x sequencing depth in beta cells separated from the islet samples.

[0065] Validated alpha-cell markers were identified as uPMPs that were not observed in any of the cfDNA samples at 30,000x sequencing depth or in beta cells at 150x sequencing depth, but were observed at 150x sequencing depth in alpha cells separated from the islet samples.

[0066] Validated pancreas distress markers were identified as uPMPs that were not observed in any of the cfDNA samples at 30,000x sequencing depth or in beta cells at 150x sequencing depth, but were observed in islet samples or in alpha cells.
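The three validation criteria above amount to set differences over the uPMPs observed in each sample class. The following is a minimal sketch under that reading; the function name, the dictionary keys, and the marker identifiers in the usage note are hypothetical.

```python
def classify_validated(seen_in: dict[str, set[str]]) -> dict[str, set[str]]:
    """Apply the three validation criteria as set operations.

    `seen_in` maps a sample class ("cfdna", "alpha", "beta", "islet") to the
    set of uPMP identifiers observed in at least one sample of that class.
    """
    cfdna = seen_in["cfdna"]
    alpha = seen_in["alpha"]
    beta = seen_in["beta"]
    islet = seen_in["islet"]
    return {
        # observed in beta cells, absent from cfDNA and alpha cells
        "beta": beta - cfdna - alpha,
        # observed in alpha cells, absent from cfDNA and beta cells
        "alpha": alpha - cfdna - beta,
        # observed in islets or alpha cells, absent from cfDNA and beta cells
        "distress": (islet | alpha) - cfdna - beta,
    }
```

With hypothetical inputs `{"cfdna": {"m5"}, "alpha": {"m2", "m5"}, "beta": {"m1", "m4"}, "islet": {"m1", "m2", "m3", "m4"}}`, the beta-cell markers are `m1` and `m4`, the alpha-cell marker is `m2`, and the pancreas distress markers are `m2` and `m3`. Under these definitions every validated alpha-cell marker also satisfies the pancreas distress criterion.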

Example 3: Amplicon-based targeted uPMP panel.

[0067] A targeted amplicon-based panel for T1DM uPMPs was devised comprising at least 20 beta-cell uPMPs and 10 alpha-cell or pancreas distress uPMPs.

Example 4: Ultra-specific phased methylation patterns for interrogation of samples and discrimination between tissue types.

[0068] Without being bound by theory and for exemplary purposes, biomarkers were defined as regions of fixed length (e.g., 120 nt) that do not overlap, characterized by a uPMP of size 3 (comprising three CpG sites) that is 100% specific and highly sensitive. As shown in Figure 6A, this methylation pattern was present in most of the 21 samples of an islet panel. The islet panel targeted 80 genes across 3 Mb, covering 60 thousand CpG sites (30k on each strand), and queried 5,754 potential biomarkers. Of these biomarkers, 1,701 were not represented (0 reads) in 25 cfDNA samples, and 1,333 were found only in islets (see Figure 6B). The distribution of biomarker homology across the genes was also interrogated. Examining the uPMPs with the best sensitivity on a gene level indicated that most genes showed a bias towards heterogeneous uPMPs. The observed distribution appears bimodal, where genes either have both heterogeneous and homogeneous patterns or only heterogeneous patterns (see Figure 6C). Highly specific, low-sensitivity markers can be combined to create a highly sensitive panel. The cumulative function of 1,300 biomarkers demonstrates that despite low sensitivity, a large population of highly specific markers, when considered together, can achieve a combined sensitivity more than 200 times greater than that of a single ultra-specific 100% sensitive marker (see Figure 6D).

Example 5: Ultra-specific phased methylation patterns for genes INS, PDX1, and TECPR1.

[0069] Ultra-specific phased methylation patterns of size 3 (uPMPs) for genes INS, PDX1, and TECPR1 were identified and summarized for 25 cfDNA samples and 21 islet samples. The identified methylation patterns were represented using binary code: “0” for an unmethylated CpG site and “1” for a methylated CpG site.

[0070] The INS gene was shown to harbor a uPMP at coordinates chr11: 2157011, 2157037, 2157072, showing binary pattern “100”, absent in cfDNA samples but present in 66% of islet samples (see Figure 7, top left and top right panels).

[0071] Similarly, PDX1 at chr13: 27926732, 27926745, 27926759 showed the binary pattern “101”, absent in cfDNA samples, but found in 90% of islet samples. The binary pattern “010” for TECPR1 at chr7: 98217694, 98217700, 98217712 was found in all islet samples but none of the cfDNA samples.
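As an illustrative sketch of the binary code used in this example, a size-3 uPMP can be held as a small record and tested against a single read. The coordinates and patterns below are those reported for INS, PDX1, and TECPR1; the `UPMP` class, its method names, and the read representation (CpG position mapped to 1 or 0) are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UPMP:
    gene: str
    chrom: str
    positions: tuple[int, ...]
    pattern: str  # "1" = methylated CpG, "0" = unmethylated CpG

    def __post_init__(self) -> None:
        if len(self.positions) != len(self.pattern) or set(self.pattern) - {"0", "1"}:
            raise ValueError("pattern must be a 0/1 string with one digit per CpG site")

    def states(self) -> dict[int, bool]:
        """Map each CpG position to True (methylated) or False (unmethylated)."""
        return {pos: bit == "1" for pos, bit in zip(self.positions, self.pattern)}

    def matches(self, read: dict[int, int]) -> bool:
        """True only if the read covers every site and shows the exact pattern."""
        return all(read.get(pos) == int(bit) for pos, bit in zip(self.positions, self.pattern))


EXAMPLE_UPMPS = [
    UPMP("INS", "chr11", (2157011, 2157037, 2157072), "100"),
    UPMP("PDX1", "chr13", (27926732, 27926745, 27926759), "101"),
    UPMP("TECPR1", "chr7", (98217694, 98217700, 98217712), "010"),
]
```

A read that does not cover all three CpG sites is treated as a non-match, since phasing requires the sites to be observed on the same molecule.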

Claims

1. A method for identifying one or more ultra-specific phased methylation patterns (uPMPs) for a tissue or cell-type of interest, comprising: a. obtaining a set of cell-free DNA (cfDNA) samples from each subject in a first group of subjects; b. obtaining a set of genomic DNA samples from the tissue or cell-type of interest from each subject in the first group of subjects and/or a second group of subjects; c. providing conditions capable of converting unmethylated cytosines to uracils in nucleic acid molecules of the set of cfDNA samples and the set of genomic DNA samples to generate a set of converted cfDNA samples and a set of converted genomic DNA samples; d. selecting a set of target genomic regions; e. capturing the set of target genomic regions from the set of converted cfDNA samples and the set of converted genomic DNA samples to generate a set of captured cfDNA libraries and a set of captured genomic DNA libraries; f. subjecting the set of captured cfDNA libraries to sequencing at a depth that is at least 0.1X, 0.5X, 1X, 10X, 100X, 1,000X, 10,000X or 100,000X; g. subjecting the set of captured genomic DNA libraries to sequencing at a depth that is at least 0.1X, 0.5X, 1X, 10X, 100X, 1,000X, 10,000X or 100,000X; h. determining phased methylation patterns (PMPs) in the set of captured cfDNA libraries and the set of captured genomic DNA libraries; i. identifying one or more uPMPs for the tissue or cell-type of interest, wherein the one or more uPMPs are detected in at least one library of the set of captured genomic DNA libraries, but not in any library of the set of captured cfDNA libraries.

2. The method of claim 1, wherein the sequencing depth for the set of captured cfDNA libraries is at least 2, 3, 4, 5, 7, 9, 10, 15, 20, 25, 30, 40, 50, 100, 500, or 1000 times the sequencing depth for the set of captured genomic DNA libraries.

3. The method of claim 1 or 2, wherein the set of target genomic regions each comprises a genic region or portion thereof, of a gene differentially expressed in the tissue or cell-type of interest relative to (i) at least one other tissue or cell-type or (ii) all other tissues or cell-types, optionally the genic region is selected from the group consisting of an upstream regulatory element, a downstream regulatory element, a promoter and an enhancer.

4. The method of claim 3, wherein the gene is expressed in (i) the tissue or cell-type of interest or (ii) a group of 2 to 5 tissues or cell-types comprising the tissue or cell-type of interest, in each of cases (i) and (ii), at a rate 2, 3, 4, 5, 6, 7, 8, 9 or 10 times greater than in other tissues or cell-types.

5. The method of claim 3 or 4, wherein the genic region or portion thereof is enriched for CpG islands.

6. The method of claim 3 or 4, wherein the genic region or portion thereof exhibits higher differential methylation at one or more CpG sites in the tissue or cell-type of interest compared to (i) at least one other tissue or cell-type or (ii) all other tissues or cell-types.

7. The method of any of claims 1 to 4, wherein the first group of subjects and/or a second group of subjects are healthy individuals.

8. The method of any of claims 1 to 7, wherein the set of captured cfDNA libraries and the set of captured genomic DNA libraries are each sequenced in pooled format.

9. The method of any of claims 1 to 7, wherein the set of captured cfDNA libraries and the set of captured genomic DNA libraries are each barcoded, pooled and sequenced in a single sequencing run.

10. The method of any of claims 1 to 9, wherein the sequencing depth for the set of captured cfDNA libraries is at least 500x, 1000x, 1500x, 2000x, 3000x, 4000x, 5000x, 10000x, 20000x or 50000x coverage.

11. The method of any of claims 1 to 9, wherein the sequencing depth for the set of captured cfDNA libraries is at most 500x, 1000x, 1500x, 2000x, 3000x, 4000x, 5000x, 10000x, 20000x or 50000x coverage.

12. The method of any of claims 1 to 11, wherein sequencing depth for the set of captured genomic DNA libraries is at most 50x, 100x, 150x, 200x, 300x, 400x, 500x, 1000x, 2000x or 5000x coverage.

13. The method of any of claims 1 to 11, wherein sequencing depth for the set of captured genomic DNA libraries is at least 50x, 100x, 150x, 200x, 300x, 400x, 500x, 1000x, 2000x or 5000x coverage.

14. The method of any of the preceding claims, wherein the cell-free DNA (cfDNA) samples are obtained from a biological sample selected from the group consisting of a bodily fluid, blood, plasma, serum, urine, vaginal fluid, uterine or vaginal flushing fluids, pleural fluid, ascitic fluid, cerebrospinal fluid, saliva, sweat, tears, sputum, and bronchoalveolar lavage fluid.

15. The method of any of the preceding claims, wherein the tissue or cell-type of interest is selected from the group consisting of: pancreatic islet, alpha cells, beta cells, delta cells, gamma cells, epsilon cells, pancreatic polypeptide cells, ductal cells, and acinar cells.

16. The method of any of the preceding claims, wherein the conversion of unmethylated cytosines to uracils is via an enzymatic or chemical method.

17. The method of any of the preceding claims, further comprising: a. selecting a set of at least 2, 5, 10, 15, 20, 30, 50, 100, 150, 200, 250, 300, 350, 400, 450 or 500 uPMPs to construct a uPMP panel for assessing in a cfDNA sample the presence or absence of cfDNA originating from the tissue or cell-type of interest.

18. A composition comprising: a. a set of single-stranded targets wherein unmethylated cytosines are converted into uracil moieties; b. a set of forward primers, each flanking the 5'-end of at least one uPMP; c. a set of reverse primers, each flanking the 3'-end of at least one uPMP; and d. enzymes and buffer for PCR amplification.

19. The composition of claim 18, wherein the set of forward primers and/or the set of reverse primers have a set size of 2, 5, 10, 15, 20, 30, 50, 100, 150, 250, 300, 450, or 500.

20. A composition comprising: a. a set of single-stranded targets wherein unmethylated cytosines are converted into uracil; and b. a set of capture probes each being a reverse complement to at least one uPMP.

21. A method for identifying one or more ultra-specific phased methylation patterns (uPMPs) for a tissue or cell-type of interest, comprising: a. obtaining a set of cell-free nucleic acid (cfNA) samples from each subject in a first group of subjects; b. obtaining a set of nucleic acid (NA) samples from the tissue or cell-type of interest from each subject in the first group of subjects and/or a second group of subjects; c. selecting a set of target nucleic acid sequence regions; d. capturing the set of target nucleic acid sequence regions from the set of cfNA samples and the set of NA samples to generate a set of captured cfNA libraries and a set of captured NA libraries; e. subjecting the set of captured cfNA libraries and the set of captured NA libraries to methylation sequencing to different sequencing depths, wherein the sequencing depth for the set of captured cfNA libraries is at least 2, 3, 4, 5, 7, 9, 10, 15, 20, 25, 30, 40, 50, 100, 500, or 1000 times the sequencing depth for the set of captured NA libraries; f. determining phased methylation patterns (PMPs) in the set of captured cfNA libraries and the set of captured NA libraries; g. identifying one or more uPMPs for the tissue or cell-type of interest, wherein the one or more uPMPs are detected in at least one library of the set of captured NA libraries, but not in any library of the set of captured cfNA libraries.

22. A method comprising: (a) obtaining a set of single-stranded target nucleic acids wherein unmethylated cytosines are converted into uracil moieties, (b) obtaining a uPMP panel comprising primers for at least 2, 5, 10, 15, 20, 30, 50, 100, 150, 200, 250, 300, 350, 400, 450 or 500 uPMPs, and (c) applying the uPMP panel to the set of single-stranded target nucleic acids to assess the presence or absence of nucleic acid molecules originating from a tissue or cell-type of interest.

23. The method of claim 22, wherein the uPMP panel comprises one or more uPMPs associated with at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25 or 30 genes or loci selected from the group consisting of MX1, TECPR1, PDK1, NR5A2, ONCUT1, CCNI1, INS1, IGF2, NPHS1, PGGHG, OTAM, SLC30A2, IFTM1, LBX1, ALDH1L2, CC2D1B, KCNK6, SLC1A2, SEL1L, PDE1A, GAPL1, LC16M2, CTRL, SH3GL2B, PY, TIME1, RBPS, RPXL, CEL, ELA2B, PKD1, LEFTY1, ATSPERB, FFAR1, PRDX4, TLRB1, CYB2, SLC30A8, SCYN, GTR2, CELB, OTAM, PNLIP, REGIA, AMY2A, CTRC, CTRB2, PAK3, MGA, PNLPR1, PLA2G1B, AQP12A, IL23R, CUZD1, TEX11, CELX1, AQP12B, GPR119, SLC38A5, PRSS2, PRSS1, AMY2A, CTRC and SPINK1.

24. The composition of claim 18, wherein the set of forward primers and the set of reverse primers flank one or more uPMPs associated with at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25 or 30 genes or loci selected from the group consisting of MX1, TECPR1, PDK1, NR5A2, ONCUT1, CCNI1, INS1, IGF2, NPHS1, PGGHG, OTAM, SLC30A2, IFTM1, LBX1, ALDH1L2, CC2D1B, KCNK6, SLC1A2, SEL1L, PDE1A, GAPL1, LC16M2, CTRL, SH3GL2B, PY, TIME1, RBPS, RPXL, CEL, ELA2B, PKD1, LEFTY1, ATSPERB, FFAR1, PRDX4, TLRB1, CYB2, SLC30A8, SCYN, GTR2, CELB, OTAM, PNLIP, REGIA, AMY2A, CTRC, CTRB2, PAK3, MGA, PNLPR1, PLA2G1B, AQP12A, IL23R, CUZD1, TEX11, CELX1, AQP12B, GPR119, SLC38A5, PRSS2, PRSS1, AMY2A, CTRC and SPINK1.

25. The composition of claim 18, wherein the set of forward primers and/or the set of reverse primers have a set size of at least or at most 2, 5, 10, 15, 20, 30, 50, 100, 150, 250, 300, 450, or 500.