WO2024150227A1 - Embedded molecular inversion probe-based targeted sequencing methods and related compositions - Google Patents

Info

Publication number
WO2024150227A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
sample
acid sequence
target nucleic
mip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/IL2024/050039
Other languages
French (fr)
Inventor
Tamir BIEZUNER
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sequentify Ltd
Original Assignee
Sequentify Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sequentify Ltd
Priority to IL322010A
Priority to EP24701526.6A (EP4642925A1)
Publication of WO2024150227A1
Anticipated expiration
Ceased

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6827 Hybridisation assays for detection of mutation or polymorphism
    • C12Q1/6869 Methods for sequencing

Definitions

  • the present application relates to the field of targeted sequencing and provides an enhanced molecular inversion probe (MIP) technology that utilizes embedded barcodes from the first stages of the assay.
  • MIP molecular inversion probe
  • NGS library preparation is defined by a series of molecular biology steps that start with a DNA/RNA template and end with a sequencing ready DNA library. These steps modify the original template to contain the correct DNA context that enables sequencing.
  • A DNA index/barcode/tag may be inserted at the final steps of the library preparation procedure to allow later pooling of NGS libraries into a single NGS reaction. This index is sequenced as part of the DNA sequencing procedure and allows the sequencing data to be allocated to the sample from which it originated, in an in silico procedure termed “demultiplexing”.
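  • As an illustration of demultiplexing, the following minimal Python sketch assigns pooled reads to samples according to the index each read carries. The function name `demultiplex`, the index position (start of the read), the exact-match rule, and the example indices and reads are all assumptions made for illustration; none of them is taken from the disclosure.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample, index_start=0):
    """Group reads by the sample whose index they carry.

    Assumes (hypothetically) that every index has the same length and
    sits at a fixed offset `index_start` in each read.
    """
    index_len = len(next(iter(index_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        index = read[index_start:index_start + index_len]
        # Reads with an unknown index are kept apart rather than guessed.
        sample = index_to_sample.get(index, "undetermined")
        by_sample[sample].append(read)
    return dict(by_sample)


# Made-up indices and reads, for illustration only.
index_to_sample = {"ACGT": "sample_1", "TGCA": "sample_2"}
reads = ["ACGTTTGACCA", "TGCAGGATCCA", "ACGTCCATTGA", "GGGGACGTACG"]
result = demultiplex(reads, index_to_sample)
```

  • In this sketch a read whose leading bases match no known index is placed in an "undetermined" group; tolerating sequencing errors in the index would need a distance-based match instead of the exact lookup used here.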
  • a major drawback in any NGS library preparation procedure is the fact that until the indexing step, samples cannot be mixed and processed together [1]. This results in two main problems: A) costs.
  • Targeted sequencing is a series of library preparation techniques that enable creating an NGS library that focuses on specific genomic regions.
  • MIPs offer a cost-effective approach for targeted sequencing, as the MIP assay is a one-pot, internal-purification-free, and simple-to-handle reaction [4]. Unlike capture probes, MIP probes are partially sequenced (and hence enable embedding of a unique molecular identifier, UMI). There is a need for improved MIP methods and related probes/kits that bring the library preparation cost down to a negligible level for sequencing approaches that may pave the way to population-wide genomic screening.
  • the present disclosure provides a Molecular Inversion Probe-based method for targeted sequencing of at least one sample, comprising at least one of the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
  • at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample
  • said at least one hybridization product comprises said at least one MIP hybridized to the first and second target regions of at least one target nucleic acid sequence of interest
  • the reaction mixture may be the polymerization and/or ligation reaction mixture
  • c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
  • the present disclosure provides a method for diagnosing a pathological disorder in at least one subject by identifying at least one genetic and/or epigenetic variation/s associated with said pathologic disorder, and/or at least one nucleic acid sequence of at least one pathogenic entity, in at least one target nucleic acid sequence of at least one sample of said at least one subject, the method comprising the step of performing a molecular inversion probe-based method for targeted sequencing in at least one subject or in any nucleic acid molecule obtained therefrom, wherein the presence of one or more of said variation/s in said target nucleic acid sequence and/or of at least one nucleic acid sequence of at least one pathogenic entity in said at least one sample, indicates that the at least one subject is at risk of, is a carrier of, or is suffering from said pathologic disorder, and wherein the molecular inversion probe-based method for targeted sequencing comprises at least one of the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising
  • at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample
  • said at least one hybridization product comprises said at least one MIP hybridized to the first and second target regions of at least one target nucleic acid sequence of interest
  • the reaction mixture may be the polymerization and/or ligation reaction mixture
  • c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
  • An additional aspect of the present disclosure relates to a method of detecting the presence of one or more target microorganisms or infectious entities in at least one sample, the method comprising the step of performing a molecular inversion probe-based method for targeted sequencing in at least one nucleic acid molecule obtained from said at least one sample, wherein the presence of one or more target nucleic acid sequence associated with said microorganism or infectious entity in said at least one sample indicates the presence thereof in the sample, and wherein the molecular inversion probe-based method for targeted sequencing comprises at least one of the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
  • at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample
  • said at least one hybridization product comprises said at least one MIP hybridized to the first and second target regions of at least one target nucleic acid sequence of interest
  • the reaction mixture may be the polymerization and/or ligation reaction mixture
  • c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
  • the present disclosure provides a method of determining the genotype and/or genetic profile of at least one nucleic acid sequence of at least one organism, or at least one infectious entity from at least one test sample, the method comprising the step of performing a molecular inversion probe-based method for targeted sequencing in said at least one test sample comprising said at least one nucleic acid sequence, wherein the molecular inversion probe-based method for targeted sequencing comprises at least one of the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
  • at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample
  • said at least one hybridization product comprises said at least one MIP hybridized to the first and second target regions of at least one target nucleic acid sequence of interest
  • the reaction mixture may be the polymerization and/or ligation reaction mixture
  • c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
  • a yet additional aspect of the present disclosure relates to a Molecular Inversion Probe (MIP) comprising:
  • An additional aspect of the present disclosure relates to a plurality of Molecular Inversion Probes (MIPs) comprising unified at least one sample identifier index, wherein each of said MIPs targets a different target nucleic acid sequence of interest, and wherein each MIP comprises:
  • a further aspect of the invention relates to a kit comprising at least one set of plurality of MIPs, each of the at least one set of MIPs comprises unified at least one sample identifier index, wherein each of said MIPs targets a different target nucleic acid sequence of interest, and wherein each MIP comprises:
  • Figure 1A-1D Schematic representation of the MIP and eMIP panels and protocols.
  • Fig. 1A The basic standard MIP probe structure involves a backbone (black) and two flanking DNA targeting regions (grey).
  • Fig. 1B The reaction is composed of 4 sequential steps that process each DNA sample (marked in different shades) separately.
  • Fig. 1C The eMIP probe structure harbors an embedded index sequence upstream of the targeting arms. Every panel has a different index sequence that imprints the libraries from the first MIP step, allowing the pooling of the reactions earlier than the standard protocol (which allows pooling only after the barcoding PCR). In the eMIP protocol, every sample that is expected to be pooled is targeted by a different panel, and hence gets a different index.
  • Fig. 1D An embodiment of the eMIP protocol in which, following the first step of the protocol (hybridization), the processed libraries are pooled to be processed together in a single reaction.
  • Figure 2 Estimated experiment cost reduction by using the eMIP protocol.
  • Figure 3A-3B DNMT3A R882 mutation analysis using a single probe.
  • Figure 4A-4D VAF analysis of 3 expected homoduplex mutations within different DNA samples from MIP and eMIP experiment.
  • Seven identical myeloid panels were applied to 7 DNA templates: 1X NTC and duplicates of OCI-AML3, K562 and normal Human Genomic DNA (marked WT). Each panel was marked by a different DNA barcode to allow for demultiplexing at different eMIP pooling stages.
  • Fig. 4A Standard MIP reaction.
  • Fig. 4B eMIP reactions pooled after hybridization.
  • Fig. 4C eMIP reactions pooled after gap fill.
  • Fig. 4D eMIP reactions pooled after exonuclease.
  • Figure 5A-5D VAF analysis of 3 expected homoduplex mutations within different DNA samples from MIP and eMIP experiment (data of each duplicate and QC).
  • Seven identical myeloid panels were applied to 7 DNA templates: 1X NTC and duplicates of OCI-AML3, K562 and normal Human Genomic DNA (marked WT). Each panel was marked by a different DNA barcode to allow for demultiplexing at different eMIP pooling stages (Standard: standard MIP reaction as control).
  • Fig. 5A VAF of each duplicate after exonuclease, gap filling, hybridization and standard MIP reaction. Diagonal lines and white - NRAS and TP53 mutation, respectively, left Y-axis. Horizontal lines - Total reads per sample, right Y-axis.
  • Fig. 5B Performance data per AML3 dilution at different eMIP pooling stages - Total reads count per sample.
  • Fig. 5C Performance data per AML3 dilution at different eMIP pooling stages - Percent of on-target reads per sample.
  • Fig. 5D Performance data per AML3 dilution at different eMIP pooling stages - Percent of uniformity per sample.
  • Figure 6A-6C Schematic representation of different configurations of the probes for eMIP.
  • Fig. 6A Probe suitable for eMIP composed of two sample identifiers comprising 5 nucleotides, on both sides of the probe, flanking the arms.
  • Fig. 6B Probe suitable for eMIP composed of two sample identifiers comprising 6 nucleotides, on both sides of the probe, flanking the arms and two UMIs comprising 4 nucleotides.
  • Fig. 6C Probe suitable for eMIP composed of one sample identifier comprising 8 nucleotides, flanking the ligation arm, and one UMI comprising 8 nucleotides.
  • Lig Arm: ligation arm; Ex Arm: extension arm; BC: barcode; UMI: unique molecular identifier; BB: backbone; Ns: nucleotides.
  • Figure 7A-7B Measured VAF and read depth (DP) of NRAS variant per expected variant VAF in diluted samples using the standard MIP reaction or eMIP reaction pooled after hybridization step or gap-fill (extension) step.
  • Fig. 7A Measured VAF of diluted samples of a known mutation in the NRAS gene with OCI-AML3 mutated cell line DNA and normal DNA as the diluent.
  • Fig. 7B Read depth (DP) of diluted samples of a known mutation in the NRAS gene with OCI-AML3 mutated cell line DNA and normal DNA as the diluent.
  • Figure 8 Measured VAF using different configurations of eMIP probes using the standard MIP reaction or eMIP reaction pooled after hybridization step or gap-fill (extension) step.
  • Figure 9 Measured VAF using the standard MIP reaction or eMIP reaction pooled after hybridization step or gap-fill (extension) step with all different configurations of eMIP probes together.
  • Figure 10 QC data of different configurations of eMIP probes VAF using the standard MIP reaction or eMIP reaction pooled after hybridization step or gap-fill (extension) step.
  • Figure 11 QC data of different concentrations of DNA starting materials.
  • Figure 12 Total reads obtained with 4 different parameters of the eMIP reaction, i.e., DNA concentration, number of samples in pool, starting hybridization volume and volume at the gap-filling step.
  • Figure 13A-13B Reads ratio of NA23245 (NIGMS Human Genetic Cell Repository) DNA and OCI-AML3 DNA.
  • Fig. 13A Reads ratio of higher-volume sample and lower-volume sample out of total reads for NA23245 DNA.
  • Fig. 13B Reads ratio of higher-volume sample and lower-volume sample out of total reads for OCI-AML3 DNA.
  • Figure 14A-14D Correlation plots of variants from NRAS DNA template of expected VAF (X axis) and its measured VAF (Y axis).
  • Fig. 14A Standard MIP protocol, non-deduplicated data.
  • Fig. 14B eMIP protocol, non-deduplicated data.
  • Fig. 14C Standard MIP protocol, deduplicated data.
  • Fig. 14D eMIP protocol, deduplicated data.
  • Figure 15A-15D Correlation plots of variants from DNMT3A DNA template of expected VAF (X axis) and its measured VAF (Y axis).
  • Fig. 15A Standard MIP protocol, non-deduplicated data.
  • Fig. 15B eMIP protocol, non-deduplicated data.
  • Fig. 15C Standard MIP protocol, deduplicated data.
  • Fig. 15D eMIP protocol, deduplicated data.
  • Figure 16A-16D Correlation plots of variants from JAK2 DNA template of expected VAF (X axis) and its measured VAF (Y axis).
  • Fig. 16A Standard MIP protocol, non-deduplicated data.
  • Fig. 16B eMIP protocol, non-deduplicated data.
  • Fig. 16C Standard MIP protocol, deduplicated data.
  • Fig. 16D eMIP protocol, deduplicated data.
  • Figure 17A-17C Schematic representation of the controlled eMIP experiment.
  • Fig. 17A eMIP panels with embedded barcodes marked: EB8N#.
  • Fig. 17B Template oligos marked TEB8N#.
  • Fig. 17C Final formation of the expected library (after pooling, EB8N# and TEB8N# numbers should match).
  • Figure 18 Heatmap of matching read counts per combination of the controlled eMIP experiment, 16X multiplexity.
  • Fastq files (rows) were sorted by their eMIP index, in duplicates.
  • the search patterns (columns) are sorted right to left.
  • Figure 19 Heatmap of matching read counts per combination of the controlled eMIP experiment, 24X multiplexity.
  • Figure 20 Heatmap of matching read counts per combination of the controlled eMIP experiment, 48X multiplexity. Fastq files (rows) were sorted by their eMIP index, in duplicates. The search patterns (columns) are sorted right to left.
  • Figure 21 Heatmap of matching read counts per combination of the controlled eMIP experiment, 96X multiplexity.
  • NGS Next Generation sequencing
  • the present disclosure provides a Molecular Inversion Probe-based method for targeted sequencing of at least one sample, comprising at least one of the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
  • at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample
  • said at least one hybridization product comprises said at least one MIP hybridized to the first and second target regions of at least one target nucleic acid sequence of interest
  • the reaction mixture may be the polymerization and/or ligation reaction mixture
  • c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
  • the steps (a) and (b) may be performed together.
  • the present disclosure provides a Molecular Inversion Probe-based method for targeted sequencing of at least one sample, comprising the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
  • at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample
  • said at least one hybridization product comprises said at least one MIP hybridized to the first and second target regions of at least one target nucleic acid sequence of interest
  • the reaction mixture may be the polymerization and/or ligation reaction mixture
  • c. subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
  • the methods of the present disclosure are for simultaneous targeted sequencing of at least two samples, wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
  • the present disclosure provides a Molecular Inversion Probe-based method for simultaneous targeted sequencing of at least two samples, comprising the steps of: a. contacting at least one target nucleic acid sequence originating from at least two samples with at least two MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
  • (iii) at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least two samples; and b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture from said at least two samples; and
  • c. subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least two samples; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c); wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
  • the hybridization product/s obtained in step (a) for each of at least two samples are mixed/pooled.
  • the cyclized product/s obtained in step (b) for each of at least two samples are mixed/pooled.
  • the non-digested cyclized product/s obtained in step (c) for each of at least two samples are mixed/pooled.
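  • To make the effect of the pooling stage concrete, the following Python sketch counts how many separate reactions are handled for a batch of samples, depending on the step after which the samples are pooled. The step names, the function `reaction_count`, and the counting rule (one reaction per sample per step before pooling, one shared reaction per step after pooling) are simplifying assumptions for illustration; they are not figures from the disclosure and do not model cost.

```python
# Assumed four-step workflow; the step labels are illustrative.
STEPS = ["hybridization", "gap_fill", "exonuclease", "amplification"]


def reaction_count(n_samples, pool_after=None):
    """Count reactions handled for `n_samples` samples.

    pool_after=None models a workflow in which every step is run
    separately per sample. Otherwise samples are pooled right after the
    named step, and each later step is a single shared reaction.
    """
    if pool_after is None:
        return n_samples * len(STEPS)
    if pool_after not in STEPS[:-1]:
        raise ValueError("pooling must happen before amplification")
    separate_steps = STEPS.index(pool_after) + 1
    shared_steps = len(STEPS) - separate_steps
    return n_samples * separate_steps + shared_steps


counts = {stage: reaction_count(96, stage) for stage in [None] + STEPS[:-1]}
```

  • Under these assumptions, pooling 96 samples after hybridization leaves 96 separate hybridizations plus 3 shared downstream reactions, whereas pooling after the digestion step leaves only the amplification shared.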
  • MIPs are, e.g., nucleic acid hybridization probes that hybridize to a target nucleic acid in a loop with the 5' and 3' ends adjacent to or separated in the target with a small gap.
  • the MIPs are typically designed to interrogate a target nucleotide in the gap using the high specificity of the DNA polymerase reaction. If provided with the appropriate dNTP, the polymerase can fill the gap between the MIP 5' and 3' ends. For example, if the target nucleic acid has an adenine “A” in the gap, using the target as a template, the polymerase can fill the gap if provided with a complementary dTTP.
  • the polymerase will add a “T” and fill the gap in the gap-fill reaction. With the gap filled, a ligase can close the remaining nick and circularize the MIP.
  • the circularized MIPs are then enriched or isolated.
  • all other nucleic acids, including MIPs that did not hybridize and circularize (also referred to herein as linear MIPs), can be removed, e.g., digested with one or more nucleases.
  • MIP reaction products are typically detected after an amplification step, such as PCR using primer binding sites within the MIPs or rolling circle amplification, on a capture array.
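  • The gap-fill logic described above can be sketched in a few lines of Python. The function names, the coordinate convention (0-based positions on the target strand), and the example target sequence are hypothetical and serve only to illustrate that the polymerase can fill the gap only when the complementary dNTPs are supplied.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gap_fill(target, gap_start, gap_end, dntps="ACGT"):
    """Return the strand synthesized across target[gap_start:gap_end].

    The result is written 5'->3' and is the reverse complement of the
    gap template. Returns None if a required dNTP is not supplied, in
    which case the gap stays open and the probe cannot be ligated.
    """
    synthesized = reverse_complement(target[gap_start:gap_end])
    if any(base not in dntps for base in synthesized):
        return None
    return synthesized


# Made-up target with a single adenine between the two probe-bound regions.
target = "GCGCG" + "A" + "CTCTC"
filled_with_dttp = gap_fill(target, 5, 6, dntps="T")   # gap can be filled
filled_with_dctp = gap_fill(target, 5, 6, dntps="C")   # wrong dNTP only
```

  • With an adenine in the gap, supplying only dTTP yields a filled gap, while supplying only dCTP leaves it unfilled, which mirrors the single-base example in the text.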
  • MIPs useful in the disclosed methods comprise "first" and “second” regions that comprise sequences complementary to the first and second regions, respectively, of the target nucleic acid sequence. Such first and second complementary regions may also be named as Ligation arm and Extension arm.
  • the term “complementary” as used herein refers to the hybridization or base pairing between nucleotides or nucleic acids, such as, for instance, between the two strands of a double stranded DNA molecule or between an oligonucleotide primer and a primer binding site on a single stranded nucleic acid to be sequenced or amplified.
  • Complementary nucleotides are, generally, A and T (or A and U), or C and G.
  • Two single stranded RNA or DNA molecules are said to be complementary when the nucleotides of one strand, optimally aligned and compared and with appropriate nucleotide insertions or deletions, pair with at least about 70% to 100% of the nucleotides of the other strand, with at least about 80% of the nucleotides of the other strand, specifically, about 80% to 100%, more specifically at least about 90% to 95%, and more preferably from about 98% to 100%.
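  • A percentage of complementarity of this kind can be computed as in the following Python sketch. It is deliberately simplified: it assumes two strands of equal length, both written 5'->3', paired antiparallel without the insertions or deletions that the definition above permits. The function name and the example sequences are illustrative assumptions.

```python
# Watson-Crick pairs, including A-U for RNA.
PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C"), ("A", "U"), ("U", "A")}


def percent_complementarity(strand_a, strand_b):
    """Percent of positions in strand_a that pair with strand_b.

    Both strands are given 5'->3'; strand_b is read in reverse so that
    the two strands are compared in antiparallel orientation.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("this ungapped sketch needs equal-length strands")
    matches = sum((a, b) in PAIRS for a, b in zip(strand_a, reversed(strand_b)))
    return 100.0 * matches / len(strand_a)


perfect = percent_complementarity("ACGTACGTAC", "GTACGTACGT")
one_mismatch = percent_complementarity("ACGTACGTAC", "GTACGTACGA")
```

  • A gapped alignment would be needed to reproduce the "appropriate nucleotide insertions or deletions" part of the definition; that step is omitted here.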
  • complementarity exists when an RNA or DNA strand will hybridize under selective hybridization conditions to its complement.
  • selective hybridization will occur when there is at least about 65% complementarity over a stretch, preferably at least about 75%, more preferably at least about 90% complementarity.
  • homology regions of a MIP display at least about 75%, or about 80% to 100%, more specifically about 90% to 95%, and more preferably about 98% to 100%, or about 100% complementarity with the corresponding complementary sequence within the target nucleic acid of interest, e.g., unless there is a mismatch at the position of the interrogated nucleotide of interest.
  • the complementary regions of the MIPs provided and used in the disclosed methods may be also referred to herein as homology regions.
  • "Homology regions”, as used herein are those regions of a molecular inversion probe that are complementary to the target nucleic acid of interest.
  • MIPs typically have two homology regions (HRs), one at or near the 5' end of the probe and one at or near the 3' end.
  • the HRs are adapted to hybridize to a target nucleic acid of interest so that they abut each other or are separated by a gap of a single target nucleotide or a plurality of target nucleotides.
  • the first and second complementary region of the target nucleic acid sequence flank the sequence to be interrogated (e.g., SNP etc.).
  • a gap of a plurality of target nucleotides can include, e.g., from 1 to about 2000 nucleotides, for example, from 1 to 500 nucleotides, and more preferably 1 to 250 nucleotides. The size of the gap will depend on a variety of factors, including the sequence of the intended target, the size of the overall MIP, the quantity and size of non-HR portions of the MIP, the desired purpose of the assay and associated characteristics, and other factors.
  • a MIP designed to interrogate a SNP may have a gap of a single nucleotide while a MIP designed to interrogate a multi-base insertion may have a gap of multiple nucleotides.
  • the first and/or the second homology regions of the disclosed MIP may be about 10 to about 200 nucleotides long, specifically, about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 17
  • the MIP probe used in the present disclosure may comprise degenerative homology arms, or complementary regions.
  • the complementary regions of the disclosed MIPs may comprise one or more degenerate bases, specifically, between about 0.1% and about 90% degenerate bases, and are therefore referred to herein as degenerative homology regions or arms, or degenerative complementary regions or arms. More specifically, a degenerate base means more than one base possibility at a particular position. An oligonucleotide sequence can be synthesized with multiple bases at the same position; this is termed a degenerate base, sometimes also referred to as a "wobble" position or "mixed base". The IUB (International Union of Biochemistry) has established single-letter codes for all possible degenerate possibilities.
  • a degenerate base position may have any combination of two, three, or four bases. Chemical synthesis of oligos using IUB degenerate bases is programmed and automated to deliver the percentage of each base for reaction at that specific base position; for example, for the letter "N", 25% of each base will be delivered for coupling. The delivery and coupling may not be 100% accurate and efficient for each base, and thus approximately 10% deviation should be expected and considered in the final oligo sequence. For degenerate (mixed-base) positions, use the following IUB codes.
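  • As a sketch of how degenerate positions behave, the Python snippet below maps the standard single-letter IUB/IUPAC nucleotide codes to the bases they represent and expands a degenerate oligo into every concrete sequence it can yield. The helper names and the example oligo "ARN" are illustrative assumptions, and the equal split of bases in `delivery_fractions` is an idealization that extends the 25% figure given for "N" to the other codes.

```python
from itertools import product

# Standard IUB/IUPAC single-letter nucleotide codes.
IUB_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand(oligo):
    """List every concrete sequence a degenerate oligo can represent."""
    return ["".join(bases) for bases in product(*(IUB_CODES[c] for c in oligo))]


def degenerate_percent(oligo):
    """Percent of positions in the oligo that are degenerate."""
    degenerate = sum(len(IUB_CODES[c]) > 1 for c in oligo)
    return 100.0 * degenerate / len(oligo)


def delivery_fractions(code):
    """Idealized equal share of each base at one degenerate position."""
    bases = IUB_CODES[code]
    return {base: 1.0 / len(bases) for base in bases}


variants = expand("ARN")  # A, then A or G, then any base
```

  • The number of variants grows as the product of the possibilities at each position, so an oligo with many "N" positions represents a very large pool of sequences.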
  • sample identifier index relates to a tag or an index that may be for example a nucleic acid sequence, a Unique Molecular Identifier (UMI) or any suitable label known in the art, that is employed in order to identify a specific sample.
  • UMI Unique Molecular Identifier
  • a sample identifier index is distinct from an index employed for target recognition.
  • the at least one sample identifier index present in a MIP would be uniform for each sample.
  • the at least one sample identifier index would be uniform for each MIP employed for a particular sample.
  • each sample specific set would possess uniform at least one sample identifier index.
  • said sample identifier index is at a position that does not disturb the target recognition. In some embodiments, said sample index identifier is not used for target recognition.
  • the sample identifier index may comprise between about 4 nucleotides and about 50 nucleotides, specifically, between 4 and 40 nucleotides, specifically 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 nucleotides.
  • the sample identifier index in the disclosed MIPs comprises 4 nucleotides.
  • the sample identifier index in the disclosed MIPs comprises 5 nucleotides.
  • the sample identifier index in the disclosed MIPs comprises 6 nucleotides. In some embodiments, the sample identifier index in the disclosed MIPs comprises 8 nucleotides. In some embodiments, the sample identifier index in the disclosed MIPs comprises 9 nucleotides. In some embodiments, the sample identifier index in the disclosed MIPs comprises 10 nucleotides.
  • hybridization refers to the process in which two single-stranded polynucleotides bind non-covalently to form a stable double-stranded polynucleotide; triple-stranded hybridization is also theoretically possible.
  • the resulting (usually) double-stranded polynucleotide is a “hybrid.”
  • Hybridizations are usually performed under stringent conditions, for example, at a temperature of at least 25 °C. As other factors may affect the stringency of hybridization, including base composition and length of the complementary strands, presence of organic solvents and extent of base mismatching, the combination of parameters is more important than the absolute measure of any one alone.
  • hybridizing conditions include any condition (time, temperature, buffer) that results in specific hybridization between complementary sequences, e.g., a target nucleic acid sequence is said to specifically hybridize to the MIP probe nucleic acid complementary region when it hybridizes at least 50% as well (e.g., quantitatively under the same hybridization conditions) to the probe as to the perfectly matched complementary target, i.e., with a signal to noise ratio at least half as high as hybridization of the probe to the target under conditions in which the perfectly matched probe binds to the perfectly matched complementary target.
  • the hybridization step may be performed in a thermal cycler.
  • the hybridization program used may be either gradual (ramp temp) or constant.
  • a “gap-fill reaction” is a reaction, described herein, in which a gap is filled by the action of a polymerase between 5' and 3' ends of a molecular inversion probe hybridized to a complementary target nucleic acid.
  • the filled gap consists of a single nucleotide.
  • the gap can be more than one nucleotide, for example, between about 1 to about 500 nucleotides, specifically, between about 1 to about 450 nucleotides, between about 1 to about 400 nucleotides, between about 1 to about 350 nucleotides, between about 1 to about 300 nucleotides, between about 1 to about 250 nucleotides, e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200, 250 or more nucleotides, e.g., between first and second MIP homology regions specifically hybridized to a target nucleic acid.
  • the methods disclosed herein may further encompass gaps of hundreds of nucleotides, and/or gaps between different chromosomes, that may be used in methods that define genomic topological organization, as will be discussed in more detail hereinafter. It should be understood that the synthesized sequence is further ligated to obtain cyclized product/s in the polymerization and/or ligation reaction mixture.
  • the polymerization reaction is performed by a DNA polymerase.
  • a polymerase as used herein is a member of a group of enzymes required for DNA synthesis.
  • the main function of the DNA polymerase is to synthesize DNA during replication.
  • DNA polymerases work in pairs, replicating two strands of DNA in tandem. They add deoxyribonucleotides at the 3'-OH group of the growing DNA strand.
  • the DNA strand grows in the 5'→3' direction by their polymerization activity. Adenine pairs with thymine and guanine pairs with cytosine. DNA polymerases cannot initiate the replication process; they need a primer to which the nucleotides are added.
  • the polymerization reaction is therefore the synthesis of the DNA strand that corresponds to the appropriate template, as indicated above in connection with the gap-fill reaction.
  • There are five DNA polymerases identified in E. coli. All the DNA polymerases differ in structure, functions and rate of polymerization and processivity.
  • DNA Polymerase I is coded by polA gene. It is a single polypeptide and has a role in recombination and repair. It has both 5'→3' and 3'→5' exonuclease activity. DNA polymerase I removes the RNA primer from lagging strand by 5'→3' exonuclease activity and also fills the gap.
  • DNA Polymerase II is coded by polB gene.
  • DNA Polymerase III is the main enzyme for replication in E. coli. It is coded by polC gene. It also has proofreading 3'→5' exonuclease activity.
  • DNA Polymerase IV is coded by dinB gene. Its main role is in DNA repair during SOS response, when DNA replication is stalled at the replication fork.
  • the DNA polymerase may be any DNA polymerase known in the art.
  • the DNA polymerase is a high-fidelity DNA polymerase.
  • Q5 High-Fidelity DNA Polymerase sets a new standard for both fidelity and robust performance. With the highest fidelity amplification available (~280 times higher than Taq), Q5 DNA Polymerase results in ultra-low error rates. Q5 DNA Polymerase is composed of a novel polymerase that is fused to the processivity-enhancing Sso7d DNA binding domain, improving speed, fidelity and reliability of performance. According to some embodiments, the high-fidelity DNA polymerase in GC enriched DNA regions.
  • the DNA polymerase includes, but is not limited to, any one or more of the following: Q5 High-Fidelity (HF) DNA Polymerase, Advantage® GC Genomic LA Polymerase (Takara), PrimeSTAR® GXL DNA Polymerase (Takara), AccuPrime™ GC-Rich DNA Polymerase (Invitrogen), Platinum SuperFi II DNA Polymerase (Thermo Fisher Scientific) and KAPA2G Robust HotStart PCR Kit. Still further, in some specific embodiments, a Q5 high-fidelity DNA polymerase is used in the present polymerization reaction. In some embodiments, at least one DNA polymerase and dNTPs are added to the hybridized MIP for performing the polymerization reaction. More specifically, in some embodiments, the reaction mixture as referred to herein may comprise any suitable elements required for the polymerization reaction.
  • At least one ligase is added to the reaction.
  • the polymerization and/or ligation reaction is performed by incubating in a thermal cycler.
  • Ampligase DNA Ligase is an enzyme that catalyzes the NAD-dependent ligation of adjacent 3'-hydroxyl and 5'-phosphate termini in duplex DNA structures. Derived from a thermophilic bacterium, Ampligase DNA Ligase is stable and active at much higher temperatures than conventional DNA ligases. The half-life of Ampligase DNA Ligase is 48 hours at 65°C and more than 1 hour at 95°C. In most cases, the upper limit on reaction temperatures with Ampligase DNA Ligase is determined by the Tm of the DNA substrate.
  • Ampligase DNA Ligase has no detectable activity on blunt ends or RNA substrates.
  • the enzyme is active in a variety of DNA polymerase buffers within a pH range of 7-8. It should be understood that any ligase may be used for the disclosed method. Still further, in some embodiments, the polymerization and ligation may be performed at an appropriate temperature for a suitable period of time.
  • a thermocycler (also known as a thermal cycler, PCR machine or DNA amplifier) is a laboratory apparatus most commonly used to amplify segments of DNA via the polymerase chain reaction (PCR).
  • Thermal cyclers may also be used in laboratories to facilitate other temperature-sensitive reactions, including enzymatic reactions (polymerization, exonuclease digestion, restriction enzyme digestion, ligation).
  • the device has a thermal block with holes where tubes holding the reaction mixtures can be inserted.
  • the cycler then raises and lowers the temperature of the block in discrete, pre-programmed steps.
  • the ramp rate of a thermal cycler indicates the change in temperature from one PCR step to another over time and is usually expressed in degrees Celsius per second (°C/sec).
  • “up ramp” and “down ramp” refer to the heating and cooling of thermal blocks, respectively.
  • the target nucleic acid sequence is an RNA
  • the nucleic acid molecules are converted into DNA molecules, specifically, cDNA molecules, by reverse transcription, for example by using reverse transcriptase.
  • the disclosed methods may comprise an amplification step that may be performed by any suitable amplification methods. In some particular and non-limiting embodiments, the amplification is performed using a PCR reaction.
  • PCR Polymerase chain reaction
  • PCR is a reaction for making multiple copies or replicates of a target nucleic acid flanked by primer binding sites, such reaction comprising one or more repetitions of the following steps: (i) denaturing the target nucleic acid, (ii) annealing primers to the primer binding sites, and (iii) extending the primers by a nucleic acid polymerase in the presence of nucleoside triphosphates.
  • the reaction is cycled through different temperatures optimized for each step in a thermal cycler instrument.
  • PCR encompasses derivative forms of the reaction, including but not limited to, RT-PCR, real-time PCR, nested PCR, quantitative PCR, multiplexed PCR, and the like.
  • the methods of the present disclosure may comprise any one of real-time PCR, nested PCR, quantitative PCR, multiplexed PCR, and the like.
  • Reaction volumes range from a few hundred nanoliters, e.g., 200 nl, to a few hundred µl, e.g., 200 µl.
  • Reverse transcription PCR or “RT-PCR” means a PCR that is preceded by a reverse transcription reaction that converts a target RNA to a complementary single stranded DNA, which is then amplified.
  • Nested PCR means a two-stage PCR wherein the amplicon of a first PCR becomes the sample for a second PCR using a new set of primers, at least one of which binds to an interior location of the first amplicon.
  • initial primers in reference to a nested amplification reaction mean the primers used to generate a first amplicon
  • secondary primers mean the one or more primers used to generate a second, or nested, amplicon.
  • Multiplexed PCR means a PCR wherein multiple target sequences are simultaneously amplified in the same reaction mixture.
  • the methods of the present disclosure may comprise deduplicated PCR or deduplication of PCR (for example as shown in Example 8).
  • UMIs may be used for deduplicated PCR or deduplication of PCR.
  • in next-generation sequencing (NGS) or high-throughput sequencing, duplicate DNA fragments can be generated during the PCR amplification process. These duplicates may arise due to various factors, such as biased amplification, template overamplification, or other technical artifacts. The presence of duplicate fragments can affect the accuracy of downstream analyses, particularly variant calling and quantification.
  • Deduplicated PCR involves the identification and removal of these duplicate DNA fragments to ensure more accurate and reliable data analysis.
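One common way to implement UMI-based deduplication can be sketched as follows. This is a minimal illustration, not the procedure of Example 8: it assumes each read has already been annotated with a target identifier and a UMI, treats reads sharing both as PCR copies of one original molecule, and keeps the most frequent sequence in each group. The tuple layout and the function name `deduplicate` are hypothetical.

```python
from collections import Counter, defaultdict


def deduplicate(reads):
    """Collapse PCR duplicates to one sequence per (target, UMI) group.

    `reads` is an iterable of (target_id, umi, sequence) tuples; this
    layout is an assumption made for illustration.
    """
    groups = defaultdict(list)
    for target_id, umi, sequence in reads:
        groups[(target_id, umi)].append(sequence)

    deduped = {}
    for key, sequences in groups.items():
        # The most frequent sequence represents the group; on a tie the
        # sequence seen first is kept.
        deduped[key] = Counter(sequences).most_common(1)[0][0]
    return deduped


reads = [
    ("T1", "ACGT", "AAAC"),
    ("T1", "ACGT", "AAAC"),
    ("T1", "ACGT", "AAAT"),  # likely a PCR or sequencing error
    ("T1", "GGCA", "AAAC"),  # different UMI: a different molecule
    ("T2", "ACGT", "CCCG"),
]
unique = deduplicate(reads)
print(len(unique))
```

Production pipelines usually also tolerate sequencing errors within the UMI itself (for example by merging UMIs that differ by one base); that refinement is omitted here.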
  • the number of PCR cycles may be optimized (for example as shown in Example 6); specifically, the number of cycles may be 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50. In some specific embodiments, the number of PCR cycles may be 20, 22 or 24.
  • the disclosed method may optionally comprise an additional step of enzymatic digestion.
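To give a feel for why the cycle number matters, the sketch below applies the textbook exponential model of PCR, N = N0 × (1 + E)^n, where E is the per-cycle efficiency (1.0 means perfect doubling). This model is an assumption brought in for illustration; it ignores the plateau phase and is not taken from Example 6. The function name `pcr_copies` and the starting copy number are hypothetical.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized amplicon count after `cycles`: N0 * (1 + E) ** cycles."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Compare the cycle numbers named above for a hypothetical 1000 templates.
for n in (20, 22, 24):
    ideal = pcr_copies(1000, n)
    realistic = pcr_copies(1000, n, efficiency=0.9)
    print(n, f"{ideal:.3e}", f"{realistic:.3e}")
```

Under this model, each pair of additional cycles multiplies the ideal yield by four, so small changes in cycle number have a large effect on product quantity and on the number of duplicates generated.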
  • the digestion involves the use of at least one exonuclease.
  • Exonucleases refer to enzymes that catalyze the removal of nucleotides in either the 5-prime to 3-prime or the 3-prime to 5-prime direction from the ends of single-stranded and/or double-stranded DNA. Removal of nucleotides is achieved by cleavage of phosphodiester bonds via hydrolysis.
  • exonucleases digest at nicks in the DNA. Some exonucleases remove one base at a time. Lambda Exonuclease is an example of this and transforms double-stranded DNA into single-stranded DNA by chewing from the free end containing a 5-prime phosphate, degrading one strand preferentially but not the other. Other examples are Exo I and Exo III. Other exonucleases, such as T5, Exo V or Exo VII, remove short oligos. The products of T5 Exo also include individual bases.
  • Exonucleases such as Exo VII and V, digest in both the 5-prime to 3-prime and 3-prime to 5-prime direction, while others, such as Exo T and Exo I, only work in one direction. Some exonucleases, such as Exo I and Exo T only digest single-stranded DNA while leaving behind double-stranded DNA. Exonucleases such as T7 Exo digest only double-stranded DNA, while others, such as T5 Exo and Exo V, can digest both single and double-stranded DNA. In more specific embodiments, Exonuclease I and/or Exonuclease III are used.
  • any form of linear MIP probe and/or nucleic acid sequence is removed following the gap-fill reaction by digestion with a combination of exonucleases.
  • the exonuclease mixture contains exonuclease I and exonuclease III.
  • Exonuclease III is a 3'-exonuclease which catalyzes the removal of mononucleotides from the 3'-OH end of double stranded DNA. It also dephosphorylates DNA strands which possess a 3'-phosphate group and has RNase H activity.
  • Exonuclease VII digests DNA from free 3' or 5' ends. Exonuclease VII has been reported to have little activity on circularized DNA.
  • simultaneous targeted sequencing of at least two samples may be performed wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
  • the methods according to the present disclosure may enable the mixing/pooling of between 2 to about 100,000 samples.
  • the method for simultaneous targeted sequencing of at least two samples may further comprise a demultiplexing step or procedure. In some further embodiments, the method for simultaneous targeted sequencing of at least two samples may comprise an in silico demultiplexing step or procedure.
  • the sample identifier index is sequenced together as part of the sequencing procedure and allows the allocation of the sequencing data to the original sample from which it originated, in an in silico procedure termed “demultiplexing”.
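In silico demultiplexing can be sketched as a lookup from the sequenced index to the sample of origin. The following minimal Python example assumes, purely for illustration, that the index sits at a known fixed offset in every read and that only exact matches are accepted; the index sequences, sample names and the function name `demultiplex` are hypothetical.

```python
def demultiplex(reads, index_to_sample, index_start, index_len):
    """Assign each read to a sample by an exact match on its index."""
    by_sample = {sample: [] for sample in index_to_sample.values()}
    unassigned = []
    for read in reads:
        index = read[index_start:index_start + index_len]
        sample = index_to_sample.get(index)
        if sample is None:
            unassigned.append(read)  # unknown or error-containing index
        else:
            by_sample[sample].append(read)
    return by_sample, unassigned


index_to_sample = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
pooled_reads = ["ACGTACGGGTTT", "TGCATGCCCAAA", "ACGTACTTTGGG", "NNNNNNAAAACC"]
by_sample, unassigned = demultiplex(pooled_reads, index_to_sample, 0, 6)
print({name: len(reads) for name, reads in by_sample.items()}, len(unassigned))
```

Because every MIP used for a given sample carries the same index, one dictionary lookup per read is enough to recover the sample of origin after pooling.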
  • the mixing/pooling of the different product/s mentioned above may be performed in several ways, that will result in either conserving the initial volume of reaction or reducing the initial volume of reaction or increasing the initial volume of reaction. For example, in some embodiments, after every pooling step (that is carried out in the same volume, from every reaction), the following reaction may be processed at a single initial reaction volume.
  • the pooling of samples of higher expected VAF% may be performed at lower volume and the pooling of samples of lower expected VAF% may be performed at higher volume as shown in Example 7.
  • the ratios between the volume of samples of higher expected VAF% to sample with lower expected VAF% may be about 1:1.5, 1:2, 1:3, 1:3.5, 1:4.0, 1:4.5, 1:5.0, 1:5.5, 1:6.0, 1:6.5, 1:7.0, 1:7.5, 1:8.0, 1:8.5, 1:9.0, 1:9.5, 1:10.0, 1:10.5, 1:11.0, 1:11.5, 1:12.0, 1:12.5, 1:13.0, 1:13.5, 1:14.0, 1:14.5, 1:15.0, 1:15.5, 1:16.0, 1:16.5, 1:17.0,
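As a worked illustration of volume-weighted pooling, the sketch below assigns a base volume to samples with higher expected VAF% and a proportionally larger volume to samples with lower expected VAF%, following a chosen 1:ratio. The two-class labelling, the microliter unit and the function name `pooling_volumes` are assumptions made for this example and are not taken from Example 7.

```python
def pooling_volumes(expected_vaf_class, base_volume_ul, ratio):
    """Per-sample pooling volume in microliters.

    Samples labelled "high" (higher expected VAF%) receive `base_volume_ul`;
    samples labelled "low" receive `ratio` times that volume (a 1:ratio pool).
    """
    volumes = {}
    for sample, vaf_class in expected_vaf_class.items():
        if vaf_class == "high":
            volumes[sample] = base_volume_ul
        elif vaf_class == "low":
            volumes[sample] = base_volume_ul * ratio
        else:
            raise ValueError(f"unknown class for {sample}: {vaf_class}")
    return volumes


# A 1:5 pool: the low-VAF sample contributes five times the volume.
print(pooling_volumes({"s1": "high", "s2": "low"}, base_volume_ul=2.0, ratio=5.0))
```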
  • the concentration of DNA starting materials may be optimized as described in Example 5.
  • the concentration of target nucleic acid sequence originating from at least one sample may range from 1 ng/µl to 10000 ng/µl, or from 1 ng/µl to 100 ng/µl, 10 ng/µl to 100 ng/µl, 20 ng/µl to 100 ng/µl, 30 ng/µl to 100 ng/µl, 40 ng/µl to 100 ng/µl, 50 ng/µl to 100 ng/µl, 60 ng/µl to 100 ng/µl, 70 ng/µl to 100 ng/µl, 80 ng/µl to 100 ng/µl, 90 ng/µl to 100 ng/µl, or 100 ng/µl to 10000 ng/µl, 200 ng/µl to 10000 ng/µl, 300 ng/p
  • the concentration of target nucleic acid sequence originating from at least one sample may be about 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 ng/µl, or about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 ng/µl or about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000 ng/µl.
  • concentration of DNA material present in each step of the methods may also be optimized as shown in Example 6.
  • concentration of hybridization or cyclized or non-digested cyclized product/s may range from 1 ng/µl to 10000 ng/µl, or from 1 ng/µl to 100 ng/µl, 10 ng/µl to 100 ng/µl, 20 ng/µl to 100 ng/µl, 30 ng/µl to 100 ng/µl, 40 ng/µl to 100 ng/µl, 50 ng/µl to 100 ng/µl, 60 ng/µl to 100 ng/µl, 70 ng/µl to 100 ng/µl, 80 ng/µl to 100 ng/µl, 90 ng/µl to 100 ng/µl, or 100 ng/µl to 10000 ng/µl, 200 ng/µl to 10000
  • the concentration of hybridization or cyclized or non-digested cyclized product/s may be about 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 ng/µl, or about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 ng/µl or about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000 ng/µl.
  • the disclosed method further comprises a sequencing step.
  • the methods of the present disclosure further comprise sequencing the amplified nucleic acid sequences obtained in step (d).
  • DNA sequencing or “Sequencing” is the process of determining the nucleic acid sequence - the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine.
  • NGS second-generation sequencing
  • NGS technology is typically characterized by being highly scalable, allowing the entire genome to be sequenced at once. Usually, this is accomplished by fragmenting the genome into small pieces, randomly sampling for a fragment, and sequencing it using one of a variety of technologies. Sequencing an entire genome is possible because multiple fragments are sequenced at once (giving it the name “massively parallel” sequencing) in an automated process. More specifically, NGS generates large quantities of sequence data in a shorter time and at a massively reduced cost compared with the conventional Sanger sequencing method. This technique uses different chemistries, matrices and bioinformatics technologies, which can be used to sequence an entire genome in shorter time periods.
  • a DNA sequencing pipeline includes various steps: DNA fragmentation, NGS library preparation (these two can be combined by transposase-mediated library preparation), sequencing and data analysis.
  • DNA fragmentation: targeted DNA is broken into several small segments using different methods such as sonication and enzymatic digestion.
  • the next step involves the preparation of an NGS library, wherein each piece of the fragmented DNA is modified to be sequencing-ready, namely by adding DNA sequences (adapters) that are required for sequencing instrument compatibility. In some embodiments of DNA sequencing generally termed “targeted sequencing”, the desired target is captured after library preparation (“probe capture”) or amplified (“amplicon/MIP”) from the genomic template.
  • the library is sequenced using the various DNA sequencing methods.
  • Each DNA fragment has an adapter on one end that connects it to a solid substrate such as beads or flow cells, and another adapter on the other end that anneals to a primer that starts the polymerase chain reaction (PCR).
  • PCR produces several copies of the same fragment, which are sequenced at the same time.
  • DNA Sequencing may be performed in some embodiments, using an NGS sequencer.
  • the library is uploaded onto a sequencing matrix.
  • the platform on which the sequencing takes place is known as a sequencing matrix. Sequencing matrices differ depending on the sequencer. For example, the Illumina NGS sequencer uses flow cells, while the Ion Torrent NGS sequencer uses sequencing chips.
  • the disclosed method may further comprise applying a machine learning algorithm on the identified variants or a subgroup thereof, for calculating sensitivity, specificity and precision thereof.
  • step (d) further comprising identification of the amplified nucleic acid sequences obtained in step (d) via array-based hybridization approaches.
  • the disclosed methods may be used along with next-generation sequencing; they may also be used with other downstream methods such as microarrays, counting by digital PCR, real-time PCR, mass-spectrometry analysis, etc.
  • the at least one MIP is a plurality of MIPs comprising at least one unified sample identifier index, wherein each of the MIPs targets a different target nucleic acid sequence of interest.
  • the MIP suitable for the method of the present disclosure may be as further defined below in the following aspects of the present disclosure.
  • the plurality of MIPs suitable for the method of the present disclosure may be as further defined below in the following relevant aspect.
  • the method of the present disclosure may further comprise identifying variants of interest.
  • the method of the present disclosure may further comprise applying a machine learning algorithm on the identified variants or a subgroup thereof, for calculating sensitivity, specificity and precision thereof.
  • the subgroup of variants comprises variants having Variant Allele Frequency (VAF) below threshold.
  • the present disclosure thus provides a sensitive and improved method displaying noise reduction, allowing detection of variants with VAF as low as 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1%, more specifically, between 0.5% to 0.6%, specifically, 0.51%, 0.52%, 0.53%, 0.54%, 0.55%, 0.56%, 0.57%, 0.58%, 0.59%, 0.6%, or less, specifically, 0.5%, with sensitivity of about 100% to 75%, specifically, 100%, 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 79%, 78%, 77%, 76%, 75%, or less, and specifically, 80% sensitivity and significantly higher precision.
  • “target nucleic acid sequence” or “target nucleic acid of interest”, as used herein, refers to the sample nucleic acid putatively including a target sequence of interest.
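The quantities named above follow standard definitions, which the short sketch below makes explicit: VAF is the share of reads at a position that support the variant allele, and sensitivity, specificity and precision are computed from counts of true and false positive and negative calls. The function names and the counts in the example are hypothetical and are not results from the disclosure.

```python
def vaf_percent(alt_reads, total_reads):
    """Variant allele frequency as a percentage of reads at a position."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * alt_reads / total_reads


def call_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and precision from call counts.

    Assumes each denominator is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),  # true variants that were detected
        "specificity": tn / (tn + fp),  # non-variant sites correctly rejected
        "precision": tp / (tp + fp),    # reported variants that are real
    }


# 5 variant-supporting reads out of 1000 corresponds to a VAF of 0.5%.
print(vaf_percent(5, 1000))
# Illustrative counts only.
print(call_metrics(tp=80, fp=20, tn=900, fn=20))
```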
  • the target sequence of interest with regard to a MIP includes those sequences complementary to the MIP homology regions.
  • the sequence may include one or more interrogated nucleotides that may or may not match a corresponding nucleotide on a MIP homology region, or may or may not provide a substrate for a polymerase provided with the complementary dNTP/s.
  • target nucleic acid sequence of interest refers in some embodiments to a nucleic acid sequence that may comprise, or be comprised within, a gene or any fragment or derivative thereof.
  • the target nucleic acid sequence or gene of interest may comprise coding or non-coding DNA regions, or any combination thereof.
  • the nucleic acid sequence of interest may comprise coding sequences and thus may comprise exons or fragments thereof that encode any product.
  • the target nucleic acid sequence of interest may comprise non-coding sequences, as for example start codons, 5’ untranslated regions (5’ UTR), 3’ untranslated regions (3’ UTR), or other regulatory sequences.
  • the at least one target nucleic acid sequence of interest is at least one of a genomic nucleic acid sequence, a transcriptomic nucleic acid sequence, and a circulating free DNA (cfDNA).
  • the at least one target nucleic acid sequence of interest is a nucleic acid sequence associated with, or comprising, at least one of: genetic and/or epigenetic variation/s, pathologic disorder/s, infectious entity, microorganism/s and GC-rich regions.
  • the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising GC-rich regions. As indicated herein, the disclosed methods are particularly effective and applicable for target nucleic acid sequences that comprise GC-rich regions or display high GC-content.
  • GC-content is the percentage of nitrogenous bases in a DNA or RNA molecule that are either guanine (G) or cytosine (C). This measure indicates the proportion of G and C bases out of an implied four total bases, also including adenine and thymine in DNA and adenine and uracil in RNA.
  • GC-content may be given for a certain fragment of DNA or RNA or for an entire genome.
  • it may denote the GC-content of an individual gene or section of a gene (domain), a group of genes or gene clusters, a non-coding region, or a synthetic oligonucleotide such as a primer.
  • the GC content of a gene region can impact its coverage, with regions having 50-60% GC content receiving the highest coverage, while regions with high (70-80%) or low (30-40%) GC content have significantly decreased coverage.
  • the genetic variations comprise at least one of: single nucleotide variants (SNVs) and/or single-nucleotide polymorphisms (SNPs), insertions and/or deletions (indels), inversions, copy number variations (CNV), structural variations, alternative splicing, loss of heterozygosity (LOH), gene fusions, translocations, duplications and variable number of tandem repeats.
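GC-content as defined above can be computed directly from a sequence. The minimal sketch below counts G and C among the recognized bases (A, C, G, T, U) and ignores any other character, such as N; that handling of ambiguous bases and the function name `gc_content` are assumptions made for illustration.

```python
def gc_content(sequence):
    """Percentage of G and C among the A/C/G/T/U bases of a DNA or RNA sequence."""
    bases = [base for base in sequence.upper() if base in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no A, C, G, T or U bases")
    gc = sum(1 for base in bases if base in "GC")
    return 100.0 * gc / len(bases)


print(gc_content("GGCCAATT"))    # half of the bases are G or C
print(gc_content("GCGCGCGCAT"))  # a GC-rich fragment
```

Applying the same function to the gap region and to each homology arm of a MIP gives a quick way to flag targets that fall outside the 50-60% band described above.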
  • SNVs: single nucleotide variants
  • SNPs: single-nucleotide polymorphisms
  • indels: insertions and/or deletions
  • inversions
  • CNV: copy number variations
  • LOH: loss of heterozygosity
  • gene fusions
  • translocations
  • duplications
  • variable number of tandem repeats
  • the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising at least one of: single nucleotide variants (SNVs) and/or single-nucleotide polymorphisms (SNPs), insertions and/or deletions (indels), inversions, copy number variations (CNV), structural variations, alternative splicing, loss of heterozygosity (LOH), gene fusions, translocations, duplications and variable number of tandem repeats.
  • the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising SNP.
  • SNP single nucleotide polymorphism
  • the least frequent allele should have a frequency of 1 % or greater.
  • the most frequent allele is referred to as the “major allele”.
  • SNPs are usually bi-allelic, mainly due to the low frequency of single nucleotide substitutions in DNA.
  • the term “SNP” usually refers to the least frequent allele (i.e.
  • the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising CNV.
  • Copy-number variation means variation from one person to another in the number of copies of a particular gene or DNA sequence.
  • the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising deletion.
  • Deletion refers to any mutation that involves the loss of genetic material. It can be small, involving a single missing DNA base pair, or large, involving hundreds or thousands of nucleotides, and in some embodiments even a piece of a chromosome.
  • the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising indel.
  • Indel as referred to herein relates to an insertion or deletion of bases in the genome of an organism. It is classified among small genetic variations, measuring from 1 to 10,000 base pairs in length. A microindel is defined as an indel that results in a net change of 1 to 50 nucleotides.
  • the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising insertion mutation.
  • Insertion mutation as used herein is a mutation involving the addition of genetic material.
  • An insertion mutation can be small, involving a single extra DNA base pair, or large, involving a piece of a chromosome/s.
  • the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising inversion.
  • Inversion is a chromosomal segment that has been broken off and reinserted in the same locus, but with the reverse orientation.
  • the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising translocation.
  • Translocation, as referred to herein, is the positional change of one or more chromosome segments in cells or gametes.
  • the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising structural variations in nucleic acid molecules, for example, genomic organization or topological organization of nucleic acids. More specifically, although genomes are defined by their sequence, the linear arrangement of nucleotides is only their most basic feature. A fundamental property of genomes is their topological organization in three-dimensional space in the intact cell nucleus. The application of imaging methods and genome-wide biochemical approaches, combined with functional data, is revealing the precise nature of genome topology/organization and its regulatory functions in gene expression and genome maintenance. In the context of the subject disclosure, genomic organization refers to the linear order of DNA elements and their division into chromosomes.
  • Genome organization can also refer to the 3D structure of chromosomes and the positioning of DNA sequences within the nucleus.
  • One non-limiting example for high-throughput genomic and epigenomic technique to capture chromatin conformation is the Hi-C (or standard Hi-C) technique.
  • Hi-C is considered as a derivative of a series of chromosome conformation capture technologies, including but not limited to 3C (chromosome conformation capture), 4C (chromosome conformation capture-on-chip/circular chromosome conformation capture), and 5C (chromosome conformation capture carbon copy).
  • NGS next-generation sequencing
  • the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising epigenetic modifications.
  • Epigenetics as referred to herein, relates to heritable phenotype changes that do not involve alterations in the nucleic acid sequence. Epigenetics most often involves changes that affect gene activity and expression, and thereby the phenotype of the cell. Epigenetic modifications or variations, involve in some embodiments, covalent modification of the DNA sequence or of proteins associated with DNA organization and functioning.
  • epigenetic variations as disclosed herein comprise DNA methylation (e.g. cytosine methylation and hydroxymethylation) and histone modifications (e.g. lysine acetylation, lysine and arginine methylation, serine and threonine phosphorylation, and lysine ubiquitination and sumoylation).
  • the target gene or nucleic acid sequence of interest may be any nucleic acid sequence or gene or fragments thereof that display aberrant expression, stability, activity or function in a mammalian subject, as compared to normal and/or healthy subject.
  • Such target gene or any fragments thereof or any target nucleic acid sequence may be in some embodiments, associated, linked or connected, directly or indirectly with at least one pathologic condition.
  • the length of the nucleic acid sequence of interest may be about 100,000 nucleotides in length, or less than 75,000 nucleotides in length or less than 50,000 nucleotides in length, or less than 40,000 nucleotides in length, or less than 30,000 nucleotides in length, or less than 20,000 nucleotides in length, or less than 15,000 nucleotides in length, or less than 10,000 nucleotides in length, or less than 5000 nucleotides in length, or less than 1000 nucleotides in length, or less than 900 nucleotides in length, or less than 800 nucleotides in length, or less than 700 nucleotides in length, or less than 600 nucleotides in length, or less than 500 nucleotides in length, or less than 450 nucleotides in length, or less than 400 nucleotides in length, or less than 300 nucleotides in length, or less than 200 nucleotides in length, or less than 100 nucleo
  • genomic nucleic acid sequence may include nuclear DNA and non-nuclear DNA, and may be either linear or circular nucleic acids.
  • nuclear DNA specifically, chromosomal DNA and Microbiome DNA (e.g., Gut microbiome), as well as circular genomic DNA such as mitochondrial DNA and chloroplast DNA (cpDNA).
  • genomic nucleic acid sequence may further include genomic nucleic acid molecules of any organism or microorganism as disclosed in the present disclosure, or any nucleic acid sequence of any infectious entity, for example, viruses, specifically, any viruses disclosed by the present disclosure, or any bacteriophages and transducing particles.
  • the target nucleic acid sequences may be of chromosomal or non-chromosomal source.
  • Nucleic acid sequences of non-chromosomal source encompassed by the present disclosure include transposons, plasmids, mitochondrial DNA, and chloroplast DNA, as well as nucleic acid molecules of any other genetic element.
  • the target nucleic acid sequence applicable in the disclosed methods may be any circulating free DNA (cfDNA). More specifically, cell-free nucleic acids (cf-NAs) include several types of DNA (cf-DNA) and RNA molecules (cell-free non-coding RNAs, and protein-coding RNA, i.e. mRNA) that are present in extracellular fluids.
  • cf-DNA cell-free nuclear DNA
  • cf-mtDNA cell-free mitochondrial DNA
  • Circulating free DNA cfDNA
  • cfDNA can be used to describe various forms of DNA freely circulating in the bloodstream, including circulating tumor DNA (ctDNA), cell-free mitochondrial DNA (ccf mtDNA), and cell-free fetal DNA (cffDNA).
  • the target nucleic acid sequence applicable in the methods of the present disclosure may be in some embodiments, cell free non-coding RNA or long non-coding RNAs.
  • Cell-free non-coding RNAs include small non-coding RNAs, including but not limited to microRNAs (miRNA), siRNA, piRNA, snRNA, snoRNA, Y RNA, etc., and long non-coding RNAs (lncRNAs), including but not limited to pseudogene RNA, telomerase RNA, circular RNA (cirRNA), etc.
  • the target nucleic acid sequence applicable in the methods of the present disclosure may be Long non-coding RNAs.
  • Long non-coding RNAs are non-protein-coding transcripts with a length of more than 200 nt. They can be transcribed from intergenic regions (long intervening non-coding RNAs), from the introns of protein-coding genes (intronic lncRNAs) or as antisense transcripts of genes.
  • sncRNA small non-coding RNAs
  • post-transcriptional gene regulation, e.g., antisense lncRNAs binding to their corresponding sense transcripts and altering splice-site recognition or spliceosome recruitment in mRNA processing.
  • the target sequence may be transcriptomic nucleic acid sequence, thereby providing information with respect to the transcriptome and/or the exome of an organism.
  • the at least one target nucleic acid sequence of interest is associated with at least one hereditary, congenital, and/or somatic pathologic disorder or condition.
  • the neoplastic disorder is Acute Myeloid Leukemia (AML).
  • AML Acute Myeloid Leukemia
  • the at least one target nucleic acid sequence of interest is derived from a genomic DNA of a human subject prone to have AML.
  • the target nucleic acid sequences of interest may comprise the genomic locus associated with AML.
  • a genomic locus associated with AML may comprise the DNMT3A gene.
  • said genomic locus may be chr2:25457097-25457316 (hg19 build).
  • the target nucleic acid sequences of interest may comprise the R882 codon.
  • said different target nucleic acid sequences of interest may comprise the same genomic locus.
  • said genomic locus may be associated with AML, for example said genomic locus may be chr2:25457097-25457316 (hg19 build).
  • said different target nucleic acid sequences of interest may comprise the DNMT3A gene, e.g. said different target nucleic acid sequences of interest may comprise the R882 codon.
  • said genomic locus may be chr1:114713908.
  • the genomic locus may comprise the NRAS gene, e.g. may be associated with cancer.
  • said genomic locus may be chr17:7,675,206, e.g. may comprise the TP53 gene, e.g. may be associated with cancer.
  • the genomic locus may comprise the JAK2 gene, e.g. may be associated with cancer.
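The loci above are written as coordinate strings, either as a range (chr2:25457097-25457316) or as a single position with thousands separators (7,675,206). The following is a minimal sketch, not part of the disclosure, of how such strings might be parsed and the span of a range computed. The names `Locus` and `parse_locus` are hypothetical, and the length calculation assumes 1-based inclusive coordinates, which the text does not state.

```python
import re
from typing import NamedTuple


class Locus(NamedTuple):
    """Hypothetical container for a parsed genomic locus."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumption: coordinates are 1-based and inclusive at both ends.
        return self.end - self.start + 1


# Chromosome name, a start position, and an optional end position.
# Positions may contain thousands separators (e.g. 7,675,206).
_LOCUS_RE = re.compile(r"^(chr[0-9XYM]+):([\d,]+)(?:-([\d,]+))?$")


def parse_locus(text: str) -> Locus:
    """Parse 'chrN:start-end' or 'chrN:position' into a Locus."""
    match = _LOCUS_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized locus string: {text!r}")
    chrom, start_txt, end_txt = match.groups()
    start = int(start_txt.replace(",", ""))
    # A single position is treated as a one-base interval.
    end = int(end_txt.replace(",", "")) if end_txt else start
    if end < start:
        raise ValueError(f"end precedes start in locus: {text!r}")
    return Locus(chrom, start, end)


dnmt3a_region = parse_locus("chr2:25457097-25457316")
single_position = parse_locus("chr17:7,675,206")
```

Under the inclusive-coordinate assumption, the chr2 range quoted above spans 220 bases, and a single position is represented as an interval of length 1.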
  • the infectious entity may be at least one of a viral, a bacterial, a fungal, a parasitic and a protozoan pathogen.
  • the at least one sample is a biological or environmental sample.
  • the terms "sample", "test sample" and "specimen" are used interchangeably in the present specification and claims and are used in their broadest sense. They are meant to include both biological and environmental samples and may include an exemplar of synthetic origin. These terms refer to any media that may contain the at least one microorganism, e.g., a pathogen, and may include fluid, cell and/or tissue samples.
  • the biological sample is a fluid sample. Fluid samples include, but are not limited to, saliva, mucosa, feces, serum, urine, blood, plasma, cerebrospinal fluid (CSF), milk, bronchoalveolar lavage (BAL) fluid, rinse fluid obtained from wash of body cavities, phlegm, and pus.
  • biological samples include samples taken from various body regions (nose, throat, vagina, ear, eye, skin, sores), food products (both solids and fluids), swabs taken from medicinal instruments, apparatus and materials, samples from various surfaces [hospitals, elderly homes, food manufacturing facilities, slaughterhouses, pharmaceutical equipment (catheters, etc.), food preparation or packaging products, solutions and buffers], sewage, etc.
  • biological samples may be provided from animal, including human, fluid, solid (e.g., stool) or tissue, as well as liquid and solid food and feed products, food designed for human consumption, a sample including food designed for animal consumption, food matrices and ingredients such as dairy items, vegetables, meat and meat by-products, waste and sewage.
  • biological samples may include saliva, mucosa (nasal or oral swab samples), feces, serum, blood, urine, anterior nares specimens collected by a healthcare professional or by onsite or home self-collection, and throat swabs.
  • Biological samples and specimens may be obtained from humans as well as from all of the various families of domestic animals, as well as feral or wild animals, including, but not limited to, such animals as ungulates, bear, birds, fish, lagomorphs, rodents, etc.
  • environmental samples include environmental material such as surface matter, earth, soil, water, air and industrial samples, as well as samples obtained from food and dairy processing instruments, apparatus, equipment, utensils, disposable and non-disposable items. These examples are not to be construed as limiting the sample types applicable to the present disclosure.
  • the sample may be any media, specifically, a liquid media that may contain the target nucleic acid molecules or sequences. Typically, substances, surfaces and samples or specimens that are a priori not liquid may be contacted with a liquid media which is used and tested by the methods disclosed herein.
  • the at least one sample is a biological sample and originates from a subject.
  • the at least two samples are biological samples and originated from the same subject or different subjects.
  • the subject is at least one organism of the biological kingdom Animalia or at least one organism of the biological kingdom Plantae.
  • the methods of the present disclosure may be applicable for any subject of the biological kingdom Animalia. It should be understood that an organism of the Animalia kingdom in accordance with the present disclosure includes any invertebrate or vertebrate organism. In some embodiments, the methods of the present disclosure may be applicable for an invertebrate organism. More specifically, invertebrates are animals that neither possess nor develop a vertebral column (commonly known as a backbone or spine), derived from the notochord. This includes all animals apart from the subphylum Vertebrata.
  • invertebrates include the Phylum Porifera - Sponges, the Phylum Cnidaria - Jellyfish, hydras, sea anemones, corals, the Phylum Ctenophora - Comb jellies, the Phylum Platyhelminthes - Flatworms, the Phylum Mollusca - Molluscs, the Phylum Arthropoda - Arthropods, the Phylum Annelida - Segmented worms like earthworm and the Phylum Echinodermata - Echinoderms.
  • Familiar examples of invertebrates include insects; crabs, lobsters and their kin; snails, clams, octopuses and their kin; starfish, sea-urchins and their kin; jellyfish and worms.
  • Vertebrates comprise all species of animals within the subphylum Vertebrata (chordates with backbones).
  • the animals of the vertebrates group include Fish, Amphibians, Reptiles, Birds and Mammals (e.g., Marsupials, Primates, Rodents and Cetaceans).
  • Vertebrates represent the overwhelming majority of the phylum Chordata, with currently about 66,000 species described. Vertebrates include the jawless fish and the jawed vertebrates, which include the cartilaginous fish (sharks, rays, and ratfish) and the bony fish.
  • the subject of the present disclosure may be any one of a human or non-human mammal, an avian, an insect, a fish, an amphibian, a reptile, a crustacean, a crab, a lobster, a snail, a clam, an octopus, a starfish, a sea-urchin, jellyfish, and worms.
  • the subject referred to herein may be a mammal.
  • such mammalian organisms may include any member of the nineteen mammalian orders, specifically, Order Artiodactyla (even-toed hoofed animals), Order Carnivora (meat-eaters), Order Cetacea (whales and porpoises), Order Chiroptera (bats), Order Dermoptera (colugos or flying lemurs), Order Edentata (toothless mammals), Order Hyracoidea (hyraxes, dassies), Order Insectivora (insect-eaters), Order Lagomorpha (pikas, hares, and rabbits), Order Marsupialia (pouched animals), Order Monotremata (egg-laying mammals), Order Perissodactyla (odd-toed hoofed animals), Order Pholidota, Order Pinnipedia (seals and walruses), Order Primates (primates),
  • the present disclosure may be applicable for any organism of the order primates. More specifically, primates are divided into two distinct suborders, the first is the strepsirrhines that includes lemurs, galagos, and lorisids. The second is haplorhines - that includes tarsier, monkey, and ape clades, the last of these including humans.
  • the present disclosure may be applicable for any organism of the superfamily Hominoidea, which includes the Hylobatidae (gibbons) and the Hominidae, the latter including Ponginae (orangutans) and Homininae [Gorillini (gorillas) and Hominini (Panina (chimpanzees) and Hominina (humans))].
  • the methods of the present disclosure may be applicable for a mammal that may be any domestic mammal, for example, at least one of cattle, domestic pig (swine, hog), sheep, horse, goat, alpaca, llama and camel. Still further, in some embodiments, the mammalian subject is a human subject.
  • the present disclosure concerns any eukaryotic organism and as such, may be also applicable for members of the biological kingdom Plantae.
  • the disclosed methods may be applicable for any plant.
  • such plant may be a dioecious plant or monoecious plant.
  • the organism of the biological kingdom Plantae may be a dioecious plant, specifically, a plant presenting biparental reproduction.
  • the plant diagnosed by the disclosed methods may be of the family Cannabaceae, specifically, any one of Cannabis (hemp, marijuana) and Humulus (hops).
  • the plant of the family Cannabaceae may be Cannabis (hemp, marijuana).
  • the plant of the family Cannabaceae may be Humulus (hops).
  • any plants are applicable in the present disclosure, for example, any model plants such as Arabidopsis, Tobacco, Solanum lycopersicum, Solanum tuberosum.
  • Canola, cereals (corn, wheat, barley), rice, sugarcane, beet, cotton, banana, cassava, sweet potato, lentils, chickpea, peas, soy, nuts, peanuts, Lemna, and apple may be applicable in the present disclosure.
  • a non-comprehensive list of useful annual and perennial, domesticated or wild, monocotyledonous or dicotyledonous land plant or Algae - i.e unicellular or multicellular algae including diatoms, microalgae, ulva, nori, gracilaria
  • applicable in accordance with the present disclosure may include but are not limited to crops, ornamentals, herbs (i.e., labiacea such as sage, basil and mint, or lemon grass, chives), grasses (i.e., lawn and biofuel grasses and animal feed grasses), cereals (i.e., rice, wheat, rye, oats, corn), legumes (i.e.
  • Crucifera i.e., oilseed rape, mustard, brassicas, cauliflower, radish
  • Sesame, the monocot Asparagales (i.e. onion, garlic, leek, asparagus, vanilla, lilies, tulips, narcissus), Myrtaceae (i.e., Eucalyptus, pomegranate, guava), Subtropical fruit trees (i.e. Avocado, Mango, Litchi, papaya), Citrus (i.e. orange, lemon, grapefruit), Rosaceae (i.e. apple, cherry, plum, almond, roses), berry-plants (i.e.
  • grapes, mulberries, blueberries, raspberry, strawberry
  • nut trees i.e. macadamia, hazelnut, pecan, walnut, chestnuts, brazil nut, cashew
  • palms i.e., oil-palm, coconut and dates
  • evergreen coniferous or deciduous trees, woody species.
  • the subject is a human.
  • the present disclosure provides a method for diagnosing a pathological disorder in at least one subject by identifying at least one genetic and/or epigenetic variation/s associated with said pathologic disorder, and/or at least one nucleic acid sequence of at least one pathogenic entity, in at least one target nucleic acid sequence of at least one sample of said at least one subject, the method comprising the step of performing a molecular inversion probe-based method for targeted sequencing in at least one subject or in any nucleic acid molecule obtained therefrom, wherein the presence of one or more of said variation/s in said target nucleic acid sequence and/or of at least one nucleic acid sequence of at least one pathogenic entity in said at least one sample indicates that the at least one subject is at risk of, is a carrier of, or is suffering from said pathologic disorder, and wherein the molecular inversion probe-based method for targeted sequencing comprises at least one of the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising
  • at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample
  • said at least one hybridization product comprises said at least one MIP hybridized to the first and second target regions of at least one target nucleic acid sequence of interest
  • the reaction mixture may be the polymerization and/or ligation reaction mixture
  • c. optionally, subjecting the reaction mixture obtained in step (b) to enzymatic digestion, thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
  • the steps (a) and (b) may be performed together.
  • the methods may be suitable for screening of at least one carrier of at least one pathological disorder.
  • a carrier may carry a gene/nucleic acid sequence that may lead to genetic disorder.
  • the method of the present disclosure is for simultaneous targeted sequencing of at least two samples, wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
  • the present disclosure provides a method for diagnosing a pathological disorder in at least one subject by identifying at least one genetic and/or epigenetic variation/s associated with said pathologic disorder, and/or at least one nucleic acid sequence of at least one pathogenic entity, in at least one target nucleic acid sequence of at least two samples of said at least one subject, the method comprising the step of performing a molecular inversion probe-based method for simultaneous targeted sequencing in at least one subject or in any nucleic acid molecule obtained therefrom, wherein the presence of one or more of said variation/s in said target nucleic acid sequence and/or of at least one nucleic acid sequence of at least one pathogenic entity in said at least two samples indicates that the at least one subject is at risk of, is a carrier of, or is suffering from said pathologic disorder, and wherein the molecular inversion probe-based method for simultaneous targeted sequencing comprises the steps of: a.
  • MIP comprising: (i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest;
  • (iii) at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least two samples; and b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture from said at least two samples; and c.
  • subjecting the reaction mixture obtained in step (b) to enzymatic digestion, thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least two samples; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c); wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
  • the hybridization product/s obtained in step (a) for each of at least two samples are mixed/pooled.
  • the cyclized product/s obtained in step (b) for each of at least two samples are mixed/pooled.
  • the non-digested cyclized product/s obtained in step (c) for each of at least two samples are mixed/pooled.
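The three pooling options above differ only in how many of steps (a) to (d) are still carried out once per sample. The following is a rough sketch of that bookkeeping, not taken from the disclosure. The function name `reactions_per_step` is hypothetical, and the model assumes that every step is a separate reaction (the text notes that steps (a) and (b) may be performed together and that step (c) may be optional) and that everything after the pooling point runs as a single pooled reaction.

```python
STEPS = ("a", "b", "c", "d")  # hybridize, polymerize/ligate, digest, amplify


def reactions_per_step(n_samples: int, pool_after: str) -> dict:
    """Count separate reactions per step when products are pooled after a step.

    Assumption: steps up to and including `pool_after` run once per sample,
    and every later step runs once on the pooled material.
    """
    if n_samples < 2:
        raise ValueError("pooling needs at least two samples")
    if pool_after not in ("a", "b", "c"):
        raise ValueError("products must be pooled before amplification step (d)")
    first_pooled_step = STEPS.index(pool_after) + 1
    return {
        step: (n_samples if position < first_pooled_step else 1)
        for position, step in enumerate(STEPS)
    }


pooled_after_a = reactions_per_step(8, "a")
pooled_after_c = reactions_per_step(8, "c")
```

In this simplified model, pooling eight samples after step (a) leaves eight hybridization reactions followed by one reaction for each later step, while pooling after step (c) keeps steps (a) to (c) per sample. In every case amplification step (d) is performed once on the pool.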
  • the method for simultaneous targeted sequencing of at least two samples may further comprise a demultiplexing step or procedure. In some further embodiments, the method for simultaneous targeted sequencing of at least two samples may comprise an in silico demultiplexing step or procedure.
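As an illustration of what an in silico demultiplexing step could look like, the sketch below assigns pooled reads back to samples by their sample identifier index. It is a minimal sketch under stated assumptions, not the disclosed procedure: the index is assumed to sit at a fixed offset in each read and to match exactly, and the function name `demultiplex`, the index sequences and the sample names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def demultiplex(
    reads: Iterable[str],
    index_to_sample: Dict[str, str],
    index_start: int,
    index_len: int,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group reads by sample using an index found at a fixed read position.

    Assumption: exact matching only; reads whose index is unknown are
    returned separately instead of being assigned to a sample.
    """
    by_sample: Dict[str, List[str]] = defaultdict(list)
    unassigned: List[str] = []
    for read in reads:
        index = read[index_start:index_start + index_len]
        sample = index_to_sample.get(index)
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(read)
    return dict(by_sample), unassigned


# Hypothetical 4-nt indices and toy reads, for illustration only.
indices = {"ACGT": "sample_A", "TGCA": "sample_B"}
pooled_reads = ["ACGTGGGCCC", "TGCAGGGCCC", "ACGTTTTAAA", "NNNNGGGCCC"]
assigned, unassigned = demultiplex(pooled_reads, indices, index_start=0, index_len=4)
```

A practical implementation would usually tolerate sequencing errors in the index and read the index position from the probe design. Exact matching is used here only to keep the sketch short.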
  • the pathologic disorder is at least one of: a neoplastic disorder, a metabolic condition, an inflammatory disorder, an infectious disease caused by a pathogen, mental disorders, an autoimmune disease, a cardiovascular disease, a neurodegenerative disorder, fetal genetic condition and an age-related condition. Still further, in some embodiments, the pathologic disorder may be at least one somatic, spontaneous, or acquired pathologic disorder or condition.
  • pathologic disorders applicable in the present disclosure may be any spontaneous or acquired pathologic disorder, for example, any disorder caused by environmental exposure to a pathogenic agent or any environmental stress or condition.
  • the pathologic disorder is at least one of: a neoplastic disorder, a metabolic condition, an inflammatory disorder, an infectious disease caused by a pathogen, mental disorders, an autoimmune disease, a cardiovascular disease, a neurodegenerative disorder, fetal genetic condition and an age-related condition.
  • the pathologic disorder may be at least one hereditary disease.
  • Hereditary disease refers to a disease or disorder that is caused by defective genes which are inherited from the parents. A hereditary disease may result unexpectedly when two healthy carriers of a defective recessive gene reproduce but can also happen when the defective gene is dominant.
  • Non-limiting examples of hereditary diseases include Duchenne muscular dystrophy (DMD), Cystic Fibrosis, Tay-Sachs disease (also known as GM2 gangliosidosis or hexosaminidase A deficiency), Ataxia-Telangiectasia (A-T), Sickle-cell disease (SCD), or sickle-cell anemia (SCA or anemia), Lesch-Nyhan syndrome (LNS, also known as Nyhan's syndrome, Amyotrophic Lateral Sclerosis, Cystinosis, Kelley-Seegmiller syndrome and Juvenile gout), color blindness, Haemochromatosis (or haemosiderosis), Haemophilia, Phenylketonuria (PKU), Phenylalanine Hydroxylase Deficiency disease, Polycystic kidney disease (PKD or PCKD, also known as polycystic kidney syndrome), Alpha-galactosidase A deficiency, Fabry
  • the pathological disorder may be at least one congenital disorder. More specifically, a congenital disorder is a medical condition that is present at or before birth. These conditions, also referred to as birth defects, can be acquired during the fetal stage of development or from the genetic makeup of the parents. Congenital disorders are not necessarily hereditary, since they may be caused by infections during pregnancy or injury to the fetus at birth. Major anomalies are sometimes associated with minor anomalies, which might be objective (e.g., preauricular tags) or more subjective (e.g. low-set ears).
  • Non-limiting examples include external disorders and internal disorders such as Neural tube defects, Microcephaly, Microtia/Anotia, Orofacial clefts, Exomphalos (omphalocele), Gastroschisis, Hypospadias, Reduction defects of upper and lower limbs, Talipes equinovarus/club foot, Congenital heart defects, Esophageal atresia/tracheoesophageal fistula, Large intestinal atresia/stenosis, Anorectal atresia/stenosis and Renal agenesis/hypoplasia.
  • the pathological disorder may be at least one somatic disorder.
  • a somatic symptom disorder, formerly known as a somatoform disorder, is any mental disorder that manifests as physical symptoms that suggest illness or injury, but cannot be explained fully by a general medical condition or by the direct effect of a substance, and is not attributable to another mental disorder (e.g., panic disorder).
  • Somatic symptom disorders as a group are included in a number of diagnostic schemes of mental illness. Somatic disorders may be also referred to as somatization disorder and undifferentiated somatoform disorder.
  • the relevant pathologic disorder may be at least one of: a proliferative disorder, and/or a neoplastic disorder, a metabolic condition, an inflammatory disorder, an infectious disease caused by a pathogen, a mental disorder, an autoimmune disease, a cardiovascular disease, a neurodegenerative disorder, fetal genetic condition and an age-related condition.
  • pathologic disorders encompassed by the present disclosure further include infections and parasitic diseases, endocrine, nutritional diseases, immunity disorders, diseases of blood and blood forming organs, mental disorders, diseases of nervous system and sense organs, diseases of the circulatory system, diseases of the respiratory system, diseases of the digestive system, diseases of genitourinary system, complications of pregnancy, childbirth and the puerperium, diseases of the skin and subcutaneous tissue, diseases of musculoskeletal system and connective tissue and congenital anomalies.
  • relevant pathologic disorder may be any neoplastic disorder and/or any proliferative disorder. More specifically, as used herein to describe the present disclosure, "neoplastic disorder", "proliferative disorder", "cancer", "tumor" and "malignancy" all relate equivalently to a hyperplasia of a tissue or organ. If the tissue is a part of the lymphatic or immune systems, malignant cells may include non-solid tumors of circulating cells. Malignancies of other tissues or organs may produce solid tumors. In general, the methods of the present disclosure may be applicable for diagnosing a patient suffering from any one of non-solid and solid tumors. Malignancy, as contemplated in the present disclosure, may be any one of carcinomas, melanomas, lymphomas, leukemias, myeloma and sarcomas.
  • Carcinoma refers to an invasive malignant tumor consisting of transformed epithelial cells. Alternatively, it refers to a malignant tumor composed of transformed cells of unknown histogenesis, but which possess specific molecular or histological characteristics that are associated with epithelial cells, such as the production of cytokeratins or intercellular bridges.
  • Melanoma as used herein, is a malignant tumor of melanocytes.
  • Melanocytes are cells that produce the dark pigment, melanin, which is responsible for the color of skin. They predominantly occur in skin but are also found in other parts of the body, including the bowel and the eye. Melanoma can occur in any part of the body that contains melanocytes.
  • Leukemia refers to progressive, malignant diseases of the blood-forming organs and is generally characterized by a distorted proliferation and development of leukocytes and their precursors in the blood and bone marrow. Leukemia is generally clinically classified on the basis of (1) the duration and character of the disease: acute or chronic; (2) the type of cell involved: myeloid (myelogenous), lymphoid (lymphogenous), or monocytic; and (3) the increase or nonincrease in the number of abnormal cells in the blood: leukemic or aleukemic (subleukemic).
  • Sarcoma is a cancer that arises from transformed connective tissue cells. These cells originate from embryonic mesoderm, or middle layer, which forms the bone, cartilage, and fat tissues. This is in contrast to carcinomas, which originate in the epithelium. The epithelium lines the surface of structures throughout the body, and is the origin of cancers in the breast, colon, and pancreas.
  • Myeloma as mentioned herein is a cancer of plasma cells, a type of white blood cell normally responsible for the production of antibodies. Collections of abnormal cells accumulate in bones, where they cause bone lesions, and in the bone marrow where they interfere with the production of normal blood cells. Most cases of myeloma also feature the production of a paraprotein, an abnormal antibody that can cause kidney problems and interferes with the production of normal antibodies leading to immunodeficiency. Hypercalcemia (high calcium levels) is often encountered.
  • Lymphoma is a cancer in the lymphatic cells of the immune system.
  • lymphomas present as a solid tumor of lymphoid cells. These malignant cells often originate in lymph nodes, presenting as an enlargement of the node (a tumor). It can also affect other organs in which case it is referred to as extranodal lymphoma.
  • Non-limiting examples of lymphoma include Hodgkin's disease, non-Hodgkin's lymphomas and Burkitt's lymphoma.
  • malignancies that may find utility in the present disclosure can comprise but are not limited to hematological malignancies (including lymphoma, leukemia and myeloproliferative disorders, as described above), hypoplastic and aplastic anemia (both virally induced and idiopathic), myelodysplastic syndromes, all types of paraneoplastic syndromes (both immune mediated and idiopathic) and solid tumors (including GI tract, colon, lung, liver, breast, prostate, pancreas and Kaposi's sarcoma).
  • the disclosed methods may be applicable for solid tumors such as tumors in lip and oral cavity, pharynx, larynx, paranasal sinuses, major salivary glands, thyroid gland, esophagus, stomach, small intestine, colon, colorectum, anal canal, liver, gallbladder, extrahepatic bile ducts, ampulla of vater, exocrine pancreas, lung, pleural mesothelioma, bone, soft tissue sarcoma, carcinoma and malignant melanoma of the skin, breast, vulva, vagina, cervix uteri, corpus uteri, ovary, fallopian tube, gestational trophoblastic tumors, penis, prostate, testis, kidney, renal pelvis, ureter, urinary bladder, urethra, carcinoma of the eyelid, carcinoma of the conjunctiva, malignant melanoma of the conjunctiva, malignant melanoma of
  • the methods disclosed herein are applicable for any neoplastic disorder, specifically, any malignant or non-malignant proliferative disorder.
  • the method and uses of the present disclosure are applicable for any cancer.
  • the methods and uses of the present disclosure may be applicable for any one of: Acute lymphoblastic leukemia; Acute myeloid leukemia; Adrenocortical carcinoma; AIDS- related cancers; AIDS-related lymphoma; Anal cancer; Appendix cancer; Astrocytoma, childhood cerebellar or cerebral; Basal cell carcinoma; Bile duct cancer, extrahepatic; Bladder cancer; Bone cancer, Osteosarcoma/Malignant fibrous histiocytoma; Brainstem glioma; Brain tumor; Brain tumor, cerebellar astrocytoma; Brain tumor, cerebral astrocytoma/malignant glioma; Brain tumor, ependy
  • the disease, disorder or condition may be an ageing-related condition.
  • the pathological disorder may be AML.
  • AML Acute myeloid leukemia
  • AML a type of blood cancer
  • DNMT3A DNA (cytosine-5-)-methyltransferase 3 alpha
  • the target nucleic acid sequences of interest may comprise the genomic locus associated with AML.
  • a genomic locus associated with AML may comprise the DNMT3A gene.
  • said genomic locus may be chr2:25457097-25457316 (hg19 build).
  • the target nucleic acid sequences of interest may comprise the R882 codon.
  • said different target nucleic acid sequences of interest may comprise the same genomic locus.
  • said genomic locus may be associated with AML, for example said genomic locus may be chr2:25457097-25457316 (hg19 build).
  • said different target nucleic acid sequences of interest may comprise the DNMT3A gene, e.g. said different target nucleic acid sequences of interest may comprise the R882 codon.
  • said genomic locus may be chr1:114713908.
  • the genomic locus may comprise the NRAS gene, e.g. may be associated with cancer.
  • said genomic locus may be chr17:7,675,206, e.g. may comprise the TP53 gene, e.g. may be associated with cancer.
  • the genomic locus may comprise the JAK2 gene, e.g. may be associated with cancer.
  • the molecular inversion probe-based method for targeted sequencing is as defined as in the previous aspect of the present disclosure.
  • the methods of the present disclosure further comprises administering a suitable treatment to said subject following diagnostic of a pathological disorder in the subject.
  • the present disclosure provides a method of treating or preventing a pathological disorder in at least one subject by identifying at least one genetic and/or epigenetic variation/s associated with said pathological disorder, and/or at least one nucleic acid sequence of at least one pathogenic entity, in at least one target nucleic acid sequence of at least one sample of said at least one subject, by performing the above-described molecular inversion probe-based methods for targeted sequencing in at least one subject or in any nucleic acid molecule obtained therefrom, thereby diagnosing a pathological disorder in at least one subject, the method further comprising administering a suitable treatment to said subject.
  • the present disclosure provides a suitable treatment for use in a method of treating or preventing a pathological disorder in at least one subject by identifying at least one genetic and/or epigenetic variation/s associated with said pathological disorder, and/or at least one nucleic acid sequence of at least one pathogenic entity, in at least one target nucleic acid sequence of at least one sample of said at least one subject, by performing the above-described molecular inversion probe-based method for targeted sequencing in at least one subject or in any nucleic acid molecule obtained therefrom, thereby diagnosing a pathological disorder in at least one subject, the method further comprising administering a suitable treatment to said subject.
  • An additional aspect of the present disclosure relates to a method of detecting the presence of one or more target microorganisms or infectious entities in at least one sample, the method comprising the step of performing a molecular inversion probe-based method for targeted sequencing in at least one nucleic acid molecule obtained from said at least one sample, wherein the presence of one or more target nucleic acid sequences associated with said microorganism or infectious entity in said at least one sample indicates the presence thereof in the sample, and wherein the molecular inversion probe-based method for targeted sequencing comprises at least one of the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
  • at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample
  • said at least one hybridization product comprises said at least one MIP hybridized to the first and second target regions of at least one target nucleic acid sequence of interest
  • the reaction mixture may be the polymerization and/or ligation reaction mixture
  • c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
  • the steps (a) and (b) may be performed together.
  • the method of the present disclosure is for simultaneous targeted sequencing of at least two samples, wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
  • a further aspect of the present disclosure relates to a method of detecting the presence of one or more target microorganisms or infectious entities in at least two samples, the method comprising the step of performing a molecular inversion probe-based method for simultaneous targeted sequencing in at least one nucleic acid molecule obtained from said at least two samples, wherein the presence of one or more target nucleic acid sequences associated with said microorganism or infectious entity in said at least two samples indicates the presence thereof in the samples, and wherein the molecular inversion probe-based method for simultaneous targeted sequencing comprises the steps of: a. contacting at least one target nucleic acid sequence originating from at least two samples with at least two MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
  • (iii) at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least two samples; and b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture from said at least two samples; and c. subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least two samples; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c); wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
  • the hybridization product/s obtained in step (a) for each of at least two samples are mixed/pooled.
  • the cyclized product/s obtained in step (b) for each of at least two samples are mixed/pooled. In some embodiments, the non-digested cyclized product/s obtained in step (c) for each of at least two samples are mixed/pooled.
  • the method for simultaneous targeted sequencing of at least two samples may further comprise a demultiplexing step or procedure. In some further embodiments, the method for simultaneous targeted sequencing of at least two samples may comprise an in silico demultiplexing step or procedure.
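Because each MIP carries a sample identifier index, reads obtained from a pooled reaction can be assigned back to their sample of origin in silico. The following is a minimal sketch of such a demultiplexing step, not the procedure of the disclosure. It assumes that the index sits at a fixed offset in each read and must match a known index exactly; the function name, index sequences and read sequences are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample, index_start=0, index_len=8):
    """Group pooled reads by the sample identifier index they carry.

    The index is assumed to occupy read[index_start:index_start + index_len].
    Reads whose index is not in index_to_sample go to the "unassigned" bin.
    The index is trimmed from each read before it is stored.
    """
    bins = defaultdict(list)
    for read in reads:
        index = read[index_start:index_start + index_len]
        sample = index_to_sample.get(index, "unassigned")
        trimmed = read[:index_start] + read[index_start + index_len:]
        bins[sample].append(trimmed)
    return dict(bins)


# Hypothetical 8-nt sample identifier indexes and pooled reads.
index_to_sample = {"ACGTACGT": "sample_1", "TTGCAAGC": "sample_2"}
pooled_reads = [
    "ACGTACGTGGGCCCAAATTT",
    "TTGCAAGCGGGCCCAAATTA",
    "ACGTACGTGGGCCCAAATTT",
    "NNNNNNNNGGGCCCAAATTT",
]
demultiplexed = demultiplex(pooled_reads, index_to_sample)
```

Exact matching keeps the sketch short. A practical pipeline would typically tolerate sequencing errors in the index, for example by accepting a bounded number of mismatches.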
  • the microorganism is a prokaryotic microorganism, or a lower eukaryotic microorganism, and wherein said infectious entity is at least one of a viral, a bacterial, a fungal, a parasitic and a protozoan pathogen.
  • pathogen refers to an infectious agent that causes a disease in a subject host.
  • Pathogenic agents include prokaryotic microorganisms, lower eukaryotic microorganisms, complex eukaryotic organisms, viruses, fungi, mycoplasma, prions, parasites, for example, a parasitic protozoan, yeasts or a nematode.
  • the methods of the present disclosure may be applicable for detecting a pathogen that may be, in further specific embodiments, a viral pathogen or a virus.
  • the pathogen may be at least one viral pathogen.
  • the infectious entity may be a virus.
  • virus refers to obligate intracellular parasites of living but non-cellular nature, consisting of DNA or RNA and a protein coat. Viruses range in diameter from about 20 to about 300 nm.
  • Class I viruses (Baltimore classification) have a double-stranded DNA as their genome; Class II viruses have a single-stranded DNA as their genome; Class III viruses have a double-stranded RNA as their genome; Class IV viruses have a positive single-stranded RNA as their genome, the genome itself acting as mRNA; Class V viruses have a negative single-stranded RNA as their genome used as a template for mRNA synthesis; and Class VI viruses have a positive single-stranded RNA genome but with a DNA intermediate not only in replication but also in mRNA synthesis.
  • viruses is used in its broadest sense to include any virus, specifically, any enveloped virus.
  • the viral pathogen may be of any of the following orders, specifically, Herpesvirales (large eukaryotic dsDNA viruses), Ligamenvirales (linear, dsDNA (group I) archaean viruses), Mononegavirales (include nonsegmented (-) strand ssRNA (Group V) plant and animal viruses), Nidovirales (composed of (+) strand ssRNA (Group IV) viruses), Ortervirales (single-stranded RNA and DNA viruses that replicate through a DNA intermediate (Groups VI and VII)), Picornavirales (small (+) strand ssRNA viruses that infect a variety of plant, insect and animal hosts), Tymovirales (monopartite (+) ssRNA viruses), Bunyavirales contain tripartite (-) ssRNA viruses (Group V) and Caudovirales
  • the viral pathogens applicable in the disclosed methods may be DNA viruses, specifically, any virus of the following families: the Adenoviridae family, the Papovaviridae family, the Parvoviridae family, the Herpesviridae family, the Poxviridae family, the Hepadnaviridae family and the Anelloviridae family.
  • the viral pathogens applicable in the disclosed methods may be RNA viruses, specifically, any virus of the following families: the Reoviridae family, Picornaviridae family, Caliciviridae family, Togaviridae family, Arenaviridae family, Flaviviridae family, Orthomyxoviridae family, Paramyxoviridae family, Bunyaviridae family, Rhabdoviridae family, Filoviridae family, Coronaviridae family, Astroviridae family, Bornaviridae family, Arteriviridae family, Hepeviridae family and the Retroviridae family.
  • EBV: Epstein-Barr virus
  • CMV: cytomegalovirus
  • pox viruses (smallpox, vaccinia), hepatitis B virus (HBV)
  • the methods of the present disclosure may be suitable for detecting at least one coronavirus (CoV).
  • CoVs are common in humans and usually cause mild to moderate upper-respiratory tract illnesses. There are four main sub-groupings of coronaviruses, known as alpha, beta, gamma, and delta. The seven coronaviruses known to date as infecting humans are: alpha coronaviruses 229E and NL63, and beta coronaviruses OC43, HKU1, SARS-CoV and SARS-CoV2, and MERS-CoV (the coronavirus that causes Middle East Respiratory Syndrome, or MERS).
  • SARS-CoV and SARS-CoV2 are lineage B beta coronaviruses, and MERS-CoV is a lineage C beta coronavirus.
  • the methods of the present disclosure may be suitable for detecting SARS-CoV2.
  • the disclosed methods may be applicable for detecting bacteria, and in some embodiments, bacterial pathogens.
  • bacteria in this context refers to any type of a single celled microbe.
  • bacteria and microbe are interchangeable. This term encompasses herein bacteria belonging to general classes according to their basic shapes, namely spherical (cocci), rod (bacilli), spiral (spirilla), comma (vibrios) or corkscrew (spirochaetes), as well as bacteria that exist as single cells, in pairs, chains or clusters.
  • bacteria refers to any of the prokaryotic microorganisms that exist as a single cell or in a cluster or aggregate of single cells.
  • the term “bacteria” specifically refers to Gram-positive, Gram-negative or acid-fast organisms.
  • the Gram-positive bacteria can be recognized as retaining the crystal violet stain used in the Gram staining method of bacterial differentiation, and therefore appear to be purple-colored under a microscope.
  • the Gram-negative bacteria do not retain the crystal violet, making positive identification possible.
  • the term refers to bacteria with a thicker peptidoglycan layer in the cell wall outside the cell membrane (Gram-positive), and to bacteria with a thin peptidoglycan layer of their cell wall that is sandwiched between an inner cytoplasmic cell membrane and a bacterial outer membrane (Gram-negative).
  • This term further applies to some bacteria, such as Deinococcus, which stain Gram-positive due to the presence of a thick peptidoglycan layer, but also possess an outer cell membrane, and are thus suggested as intermediates in the transition between monoderm (Gram-positive) and diderm (Gram-negative) bacteria.
  • Acid-fast organisms such as Mycobacterium contain large amounts of lipid substances, called mycolic acids, within their cell walls, which resist staining by conventional methods such as a Gram stain.
  • a pathogen to be detected by the disclosed methods may be any bacteria involved in nosocomial infections or any mixture of such bacteria.
  • Nosocomial Infections refers to Hospital-acquired infections, namely, an infection whose development is favoured by a hospital environment, such as surfaces and/or medical personnel, and is acquired by a patient during hospitalization.
  • Nosocomial infections are infections that are potentially caused by organisms resistant to antibiotics. Nosocomial infections have an impact on morbidity and mortality and pose a significant economic burden. In view of the rising levels of antibiotic resistance and the increasing severity of illness of hospital in-patients, this problem needs an urgent solution.
  • Clostridium difficile, methicillin-resistant Staphylococcus aureus, coagulase-negative Staphylococci, vancomycin-resistant Enterococci, resistant Enterobacteriaceae, Pseudomonas aeruginosa, Acinetobacter and Stenotrophomonas maltophilia.
  • the nosocomial-infection pathogens could be subdivided into Gram-positive bacteria (Staphylococcus aureus, coagulase-negative staphylococci), Gram-positive cocci (Enterococcus faecalis and Enterococcus faecium), Gram-negative rod-shaped organisms (Klebsiella pneumoniae, Klebsiella oxytoca, Escherichia coli, Proteus aeruginosa, Serratia spp.), Gram-negative bacilli (Enterobacter aerogenes, Enterobacter cloacae), aerobic Gram-negative coccobacilli (Acinetobacter baumannii, Stenotrophomonas maltophilia) and Gram-negative aerobic bacillus (Stenotrophomonas maltophilia, previously known as Pseudomonas maltophilia). Among many others Pseudomonas aeruginosa is
  • the disclosed methods may be applicable in detecting “ESKAPE” pathogens.
  • these pathogens include but are not limited to Enterococcus faecium, Staphylococcus aureus, Clostridium difficile, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter.
  • the pathogen according to the present disclosure may be a bacterial cell of at least one of E. coli, Pseudomonas spp, specifically, Pseudomonas aeruginosa, Staphylococcus spp, specifically, Staphylococcus aureus, Streptococcus spp, specifically, Streptococcus pyogenes, Salmonella spp, Shigella spp, Clostridium spp, specifically, Clostridium difficile, Enterococcus spp, specifically, Enterococcus faecium, Klebsiella spp, specifically, Klebsiella pneumoniae, Acinetobacter spp, specifically, Acinetobacter baumannii, Yersinia spp, specifically, Yersinia pestis and Enterobacter species or any mutant, variant isolate or any combination thereof.
  • a lower eukaryotic organism applicable in the present disclosure may include, in some embodiments, a yeast or fungus such as but not limited to Pneumocystis carinii, Candida albicans, Aspergillus, Histoplasma capsulatum, Blastomyces dermatitidis, Cryptococcus neoformans, Trichophyton and Microsporum; these are also encompassed by the disclosed methods.
  • a complex eukaryotic organism includes worms, insects, arachnids, nematodes, amoebae, Entamoeba histolytica, Giardia lamblia, Trichomonas vaginalis, Trypanosoma brucei gambiense, Trypanosoma cruzi, Balantidium coli, Toxoplasma gondii, Cryptosporidium or Leishmania.
  • the methods of the present disclosure may be suitable for detecting fungal pathogens.
  • fungi refers to a division of eukaryotic organisms that grow in irregular masses, without roots, stems, or leaves, and are devoid of chlorophyll or other pigments capable of photosynthesis. Each organism (thallus) is unicellular to filamentous and possesses branched somatic structures (hyphae) surrounded by cell walls containing glucan or chitin or both and containing true nuclei.
  • fungi includes for example, fungi that cause diseases such as ringworm, histoplasmosis, blastomycosis, aspergillosis, cryptococcosis, sporotrichosis, coccidioidomycosis, paracoccidioidomycosis, and candidiasis.
  • the methods of the present disclosure may be applicable for detecting a parasitic pathogen.
  • parasitic protozoan which refers to organisms formerly classified in the Kingdom “protozoa”. They include organisms classified in Amoebozoa, Excavata and Chromalveolata. Examples include Entamoeba histolytica, Plasmodium (some of which cause malaria), and Giardia lamblia.
  • the term parasite includes, but is not limited to, infections caused by somatic tapeworms, blood flukes, tissue roundworms, ameba, and Plasmodium, Trypanosoma, Leishmania, and Toxoplasma species.
  • the methods of the present disclosure may be applicable for detecting a nematode.
  • nematode refers to roundworms. Roundworms have tubular digestive systems with openings at both ends.
  • Some examples of nematodes include, but are not limited to, basal order Monhysterida, the classes Dorylaimea, Enoplea and Secernentea and the “Chromadorea” assemblage.
  • the methods of the present disclosure may be applicable for detecting at least one microorganism, specifically, pathogen in food or food products and beverages. More specifically, by the term “food”, it is referred to any substance consumed, usually of plant or animal origin. Some non limiting examples of animals used for feeding are cows, pigs, poultry, etc.
  • the term food also comprises products derived from animals, such as, but not limited to, milk and food products derived from milk, eggs, meat, etc.
  • a drink or beverage is a liquid which is specifically prepared for human consumption. Non limiting examples of drinks include, but are not limited to water, milk, alcoholic and non-alcoholic beverages, soft drinks, fruit extracts, etc.
  • the molecular inversion probe-based method for targeted sequencing is as defined in the previous aspects of the present disclosure.
  • the method of the present disclosure further comprises administering to said subject a suitable treatment against said one or more target microorganism, infectious entity present in at least one sample.
  • the present disclosure provides a method of treating or preventing a pathological disorder caused by one or more target microorganism, infectious entity in at least one subject by detecting the presence of one or more target microorganism, infectious entity in at least one sample, by performing the above described molecular inversion probe-based methods for targeted sequencing in at least one nucleic acid molecule obtained from said at least one sample, thereby detecting the presence of one or more target microorganism, infectious entity, the method further comprising administering a suitable treatment to said subject.
  • the present disclosure provides a suitable treatment for use in a method of treating or preventing pathological disorder caused by one or more target microorganism, infectious entity in at least one subject by detecting the presence of one or more target microorganism, infectious entity in at least one sample, by performing the above described molecular inversion probe-based methods for targeted sequencing in at least one nucleic acid molecule obtained from said at least one sample, thereby detecting the presence of one or more target microorganism, infectious entity, the method further comprising administering a suitable treatment to said subject.
  • the present disclosure provides a method of determining the genotype and/or genetic profile of at least one nucleic acid sequence of at least one organism, or at least one infectious entity from at least one test sample, the method comprising the step of performing molecular inversion probe-based method for targeted sequencing in said at least one test sample comprising said at least one nucleic acid sequence, wherein the molecular inversion probe-based method for targeted sequencing comprising at least one of the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
  • at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample
  • said at least one hybridization product comprises said at least one MIP hybridized to the first and second target regions of at least one target nucleic acid sequence of interest
  • the reaction mixture may be the polymerization and/or ligation reaction mixture
  • c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
  • the steps (a) and (b) may be performed together.
  • the molecular inversion probe-based method for targeted sequencing is as defined in the previous aspects of the present disclosure above.
  • the present disclosure provides a method of determining the genotype and/or genetic profile of at least one nucleic acid sequence of at least one organism, or at least one infectious entity from at least two test samples, the method comprising the step of performing molecular inversion probe-based method for simultaneous targeted sequencing in said at least two test samples comprising said at least one nucleic acid sequence, wherein the molecular inversion probe-based method for simultaneous targeted sequencing comprising the steps of: a. contacting at least one target nucleic acid sequence originating from at least two samples with at least two MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
  • (iii) at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least two samples; and b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture from said at least two samples; and c. subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least two samples; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c); wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
  • the method for simultaneous targeted sequencing of at least two samples may further comprise a demultiplexing step or procedure. In some further embodiments, the method for simultaneous targeted sequencing of at least two samples may comprise an in silico demultiplexing step or procedure.
  • the disclosed methods thus concern genotyping of a nucleic acid sequence.
  • genotyping refers to the identification of the nucleic acid sequence at specific loci in the DNA of an individual.
  • As used herein, the terms “DNA profile,” “genetic fingerprint,” and “genotypic profile” are used interchangeably to refer to the allelic variations in a collection of polymorphic loci, such as a tandem repeat, a single nucleotide polymorphism (SNP), etc.
  • the methods disclosed herein may be useful for interrogating DNA methylation degree, and pattern.
  • DNA methylation is a stable, heritable, covalent modification to DNA, occurring mainly at CpG dinucleotides, but is also found at non-CpG sites. Methylation is associated with normal developmental processes, as well as the changes that are observable during oncogenesis and other pathological processes, such as gene silencing of tumor suppressor or DNA repair genes.
  • Bisulfite genomic sequencing is regarded as a gold-standard technology for detection of DNA methylation and provides a qualitative, quantitative and efficient approach to identify 5-methylcytosine at single base-pair resolution.
  • This method is based on the finding that the amination reactions of cytosine and 5-methylcytosine (5mC) proceed with very different consequences after the treatment of sodium bisulfite.
  • the MIP based sequencing methods of the present disclosure may be therefore applicable in identifying epigenetic modifications.
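The differing behaviour of cytosine and 5mC under bisulfite treatment is what makes methylation readable from sequence: after conversion and amplification, unmethylated cytosines are read as thymine, while methylated cytosines are still read as cytosine. The following is a minimal sketch of that logic on a single strand. It assumes complete conversion and error-free reads, and the function names and example sequence are hypothetical.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate the sequenced result of bisulfite treatment on one strand.

    Unmethylated C is read as T; C at a methylated position stays C.
    Positions are 0-based indexes into the sequence.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(sequence)
    )


def call_methylation(reference, converted_read):
    """Infer methylation status at every reference cytosine.

    Returns {position: True} where the read retains C (methylated) and
    {position: False} where the read shows T (unmethylated). Positions
    showing any other base are left uncalled.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, converted_read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True
        elif read_base == "T":
            calls[i] = False
    return calls


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions={1})
calls = call_methylation(reference, read)
```

In this example only the cytosine at position 1 (part of a CpG) is retained, so it is the only site called as methylated.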
  • genotyping and genetic profiling methods disclosed by the present disclosure can be useful in various applications, to name but few, such applications may include agriculture, health, parental testing, epidemiology, and forensic applications.
  • the disclosed genotyping and genetic profiling methods may be applied in Agricultural genomics, or Agri genomics (the application of genomics in agriculture).
  • the methods disclosed herein may be applied in seed selection, livestock improvements.
  • the methods disclosed herein identify genetic markers linked to desirable traits, informing cultivation and breeding decisions.
  • the methods disclosed herein may be useful to improve plant and animal selection, nutrition, health surveillance, traceability, and veterinary diagnostics systems.
  • the methods disclosed herein may be applied in developing varieties of plant crops with, for example, desirable traits such as drought tolerance, disease resistance, and higher yield.
  • the methods disclosed herein may be applied in agrigenomics for identifying and propagating genetic variants that confer beneficial agronomic traits, in complex environments, acquiring the ability to cope with elements in their environment such as predators, soil conditions, and climate.
  • phenotypic traits of agricultural value include, but are not limited to, yield and growth, disease resistance, abiotic stress adaptation, reproduction, nutrition/end-use quality, sustainability, etc.
  • the genotyping and genetic profiling methods disclosed herein may be applied in providing valuable information about the biological status of important resources like fisheries, crop and livestock health, and food safety and authenticity.
  • the methods may be used to identify organisms present within various environments in order to understand ecosystem diversity. Species contribute DNA to their environment, which can be easily recovered and is often referred to as environmental DNA (eDNA), that may serve as a means of differentiating species based on a unique genetic fingerprint. In this way, eDNA is used to determine the repertoire of organisms present in any setting from seawater to soil and food.
  • This and other emerging applications of genomics are shaping best practices for resource monitoring and management related to agriculture and may be used by the disclosed methods.
  • the disclosed genotyping and genetic profiling methods may be utilized by animal breeders.
  • the term “breeder animal” refers to a non-human animal (e.g., domestic animals such as mammals, specifically horses, sheep, cows, dogs, etc., fish, and avian animals) used for breeding. Accordingly, a breeder animal may be one that is used for breeding using conventional means, such as, e.g., mating a male breeder animal with a female breeder animal.
  • a breeder animal may be one that is used as a donor of genetic material (e.g., sperm, egg, or mitochondria of the breeder animal) for the purpose of producing an offspring animal having one or more predetermined traits in the absence of physical mating with another breeder animal.
  • the genetic source material may be obtained and used from a single breeder animal or in combination with genetic material from one or more additional breeder animals.
  • a breeder animal may be a living animal or a deceased animal. In the case of a deceased animal, genetic material is obtained from the animal antemortem and cryopreserved for later use in producing an offspring animal having one or more predetermined traits.
  • the disclosed genotyping and genetic profiling methods may be applicable in forensic applications. More specifically, the use of a subset of markers in a human genome has been utilized to determine an individual's personal identity, or DNA fingerprint or profile. These markers include locations or loci of short tandem repeated sequences (STRs) and intermediate tandem repeated sequences (ITRs) which in combination are useful in identifying one individual from another on a genetic level. Accordingly, STR markers are frequently used in the fields of forensic analysis, paternity determination and detection of genetic diseases and cancers.
  • the genotyping and genetic profiling methods disclosed herein may be applicable for DNA profiling which may use in some non-limiting examples, selected biological markers for determining the identity of a DNA sample.
  • the most common analysis for determining a DNA profile is to determine the profile for a number of short tandem repeated (STRs) sequences found in an organism's genome.
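An STR-based profile can be thought of as a table of repeat counts, one per locus. The following is a minimal sketch of deriving such a count from a sequenced target and comparing two profiles. It assumes perfect (uninterrupted) repeats and one allele per locus; the function names, locus names and sequences are hypothetical and are not taken from the disclosure.

```python
import re


def longest_tandem_repeat(sequence, motif):
    """Return the largest number of consecutive copies of motif in sequence."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


def profiles_match(profile_a, profile_b):
    """True when both profiles report the same repeat count at every shared locus."""
    shared = profile_a.keys() & profile_b.keys()
    return bool(shared) and all(profile_a[locus] == profile_b[locus] for locus in shared)


# Hypothetical sequenced targets: flanking bases around a GATA repeat tract.
evidence_read = "CC" + "GATA" * 5 + "TT"
reference_read = "CC" + "GATA" * 7 + "TT"

evidence_profile = {"locus_A": longest_tandem_repeat(evidence_read, "GATA")}
reference_profile = {"locus_A": longest_tandem_repeat(reference_read, "GATA")}
```

Real STR genotyping must also handle two alleles per locus, interrupted repeats and stutter artifacts, none of which this sketch attempts.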
  • Species identification is one of most important components of forensic practice. For example, in some cases of poaching and trading of endangered species, it has been used to provide important information and assist in police investigations. In the food industry, identification of the species present in meat products can be achieved, and in archeology, human remains can be distinguished from non-human remains.
  • DNA profile is useful in forensics for identifying an individual based on a nucleic acid sample.
  • DNA profile as used herein may also be used for other applications, such as diagnosis and prognosis of diseases including cancer, cancer biomarker identification, inheritance analysis, genetic diversity analysis, genetic anomaly identification, quantification of minority populations, databanking, forensics, criminal case work, paternity, personal identification, etc.
  • the methods disclosed herein may apply to any organism, for example humans, non-human primates, animals, plants, viruses, bacteria, fungi and the like.
  • the present methods are not only useful for DNA profiling (e.g., forensics, paternity, individual identification, etc.) and humans as a target genome, but could also be used for other targets such as cancer and disease markers, genetic anomaly markers and/or when the target genome is not human based.
  • Still further embodiments of the present disclosure concern genotyping and genetic profiling methods that may be applicable in microbiome analysis which allows one to identify and quantify (relatively) the microbial community in a given set of samples.
  • the genotyping and genetic profiling methods of the present disclosure may be used for tumor analysis. More specifically, tumor biopsies are often a mixture of healthy and tumor cells. Targeted PCR allows deep sequencing of SNPs and loci with close to no background sequences. It may be used for copy number and loss of heterozygosity analysis on tumor DNA. Said tumor DNA may be present in many different body fluids or tissues of tumor patients. It may be used for detection of tumor recurrence, and/or tumor screening.
  • the genotyping and genetic profiling methods of the present disclosure may be useful for diagnosis of fetal genetic abnormalities.
  • the starting sample may be obtained from maternal tissue (e.g., blood, plasma) or may contain fetal samples (present in amniotic fluid).
  • the methods described in the present disclosure apply techniques for allowing detection of small, but statistically significant, differences in polynucleotide copy number.
  • the targets for the assays and MIP probes described herein can be any genetic target associated with fetal genetic abnormalities, including aneuploidy as well as other genetic variations, such as mutations, insertions, additions, deletions, translocation, point mutation, trinucleotide repeat disorders and/or single nucleotide polymorphisms (SNPs), as well as control targets not associated with fetal genetic abnormalities. Still further, in some embodiments, the methods and compositions described herein can enable detection of extra or missing chromosomes, particularly those typically associated with birth defects or miscarriage. For example, the methods and compositions described herein may enable detection of autosomal trisomies (e.g., Trisomy 13, 15, 16, 18, 21, or 22).
  • the trisomy that is detected is a liveborn trisomy that may indicate that an infant will be born with birth defects (e.g., Trisomy 13 (Patau Syndrome), Trisomy 18 (Edwards Syndrome), and Trisomy 21 (Down Syndrome)).
  • the abnormality may also be of a sex chromosome (e.g., XXY (Klinefelter's Syndrome), XYY (Jacobs Syndrome), or XXX (Trisomy X)).
  • the genetic target may be in any chromosome for example, 13, 18, 21, X or Y.
  • additional fetal conditions that can be determined based on the methods and systems herein include monosomy of one or more chromosomes (X chromosome monosomy, also known as Turner's syndrome), trisomy of one or more chromosomes (13, 18, 21, and X), tetrasomy and pentasomy of one or more chromosomes (which in humans is most commonly observed in the sex chromosomes, e.g.
  • the genetic target may comprise more than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 75, 100, 125, 150, 175, 200, 225, 250, 300, 350, 400, 450, 500, 1,000, 5,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000 or 100,000 sites on a specific chromosome.
  • the genetic target comprises targets on more than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, or 22 different chromosomes. In some cases, the genetic target comprises targets on less than 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, or 23 chromosomes. In some cases, the genetic target comprises a gene that is known to be mutated in an inherited genetic disorder, including autosomal dominant and recessive disorders, and sex-linked dominant and recessive disorders. Non-limiting examples include genetic mutations that give rise to autoimmune diseases, neurodegenerative diseases, cancers, and metabolic disorders.
  • the method detects the presence of a genetic target associated with a genetic abnormality (such as trisomy), by comparing it in reference to a genetic target not associated with a genetic abnormality (such as a gene located on a normal diploid chromosome).
  • genotyping and genetic profiling methods disclosed herein may be used for standard paternity and identity testing of relatives or ancestors, in human, animals, plants or other creatures. It may be used for rapid genotyping and copy number analysis (CN), on any kind of material, e.g., amniotic fluid and CVS, sperm, product of conception (POC). It may be used for single cell analysis, such as genotyping on samples biopsied from embryos. It may be used for rapid embryo analysis (within less than one, one, or two days of biopsy).
  • a yet additional aspect of the present disclosure relates to a Molecular Inversion Probe (MIP) comprising:
  • Molecular inversion probes are nucleic acid hybridization probes that hybridize to a target nucleic acid in a loop with the 5' and 3' ends adjacent to or separated in the target with a small gap.
  • One of the specific features of the probe according to the present disclosure is that it comprises a “sample identifier index”.
  • sample identifier index relates to a tag or an index that may be for example a nucleic acid sequence, a Unique Molecular Identifier (UMI) or any suitable label known in the art, that is employed in order to identify a specific sample.
  • a sample identifier index is distinct from an index employed for target recognition. In some embodiments, said sample identifier index is at a position that does not disturb the target recognition. In some embodiments, said sample index identifier is not used for target recognition.
  • the probe may comprise at least one UMI.
  • Unique Molecular Identifiers are unique molecular identifiers composed of short sequences or molecular "tags", for the purpose of identifying the specific MIP used.
  • the MIP may comprise two UMIs.
  • the at least one UMI of the disclosed MIP probe may flank one of the first and second complementary region (or homology arms) or the at least one sample index identifier.
  • the UMI may comprise between about 4 nucleotides to about 50 nucleotides, specifically, between 4 to 40 nucleotides, specifically 4, 5, 6, 7, 8, 9, 10 nucleotides.
  • UMIs useful in the disclosed MIPs comprise 7 nucleotides.
  • UMIs useful in the disclosed MIPs comprise 4 nucleotides.
  • UMIs useful in the disclosed MIPs comprise 8 nucleotides.
  • UMIs useful in the disclosed MIPs comprise 9 nucleotides.
  • UMIs useful in the disclosed MIPs comprise 10 nucleotides.
  • the UMI may flank at least one of said sample index identifier.
  • the at least one sample identifier index flanks at least one of said first and/or second regions.
  • flank refers to a nucleic acid sequence positioned between two defined regions.
  • flank may refer to a position either downstream or upstream to said first and/or second regions and/or a position at either the 3’ end or 5’ end of said first and/or second regions.
  • flank refers to a position close to said first and/or second regions, that is for example a position of about 1 to 100 nucleotides, or between about 1 nucleotide to about 90 nucleotides, 1 nucleotide to about 80 nucleotides, 1 nucleotide to about 70 nucleotides, 1 nucleotide to about 60 nucleotides, 1 nucleotide to about 50 nucleotides, specifically, between 1 to 40 nucleotides, between 1 to 30 nucleotides, between 1 to 20 nucleotides, between 1 to 10 nucleotides, specifically 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 nucleotides from said at least one first and/or second regions.
  • the at least one sample identifier index flanks the 3' end of said first region and/or the 5' end of said second region.
  • the probe of the present disclosure further comprises at least one additional index.
  • said additional index is for target/probe recognition/UMI.
  • the probe suitable according to the present disclosure may have several different configurations as shown for example in Example 4.
  • the probe of the present disclosure may comprise more than one sample index identifier.
  • the probe may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 sample index identifiers.
  • the probe may comprise 2 sample index identifiers.
  • the probe may comprise at least one UMI.
  • the probe may comprise more than one UMI.
  • the probe may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 UMIs.
  • the probe may comprise 2 identical sample identifiers, wherein one sample identifier index flanks the 3' end of said first region and the second sample index identifier flanks the 5' end of said second region.
  • the probe may comprise 2 identical sample identifiers, wherein one sample identifier index flanks the 3' end of said first region and the second sample index identifier flanks the 5' end of said second region and the probe may comprise two identical UMI, wherein one UMI flanks the 3' end of one of the sample index identifier and the second UMI flanks the 5' end of the other sample index identifier.
  • the probe may comprise one sample index identifier flanking the 3' end of the first region and one UMI flanking the 5' end of the second region. In some further embodiments, the probe may comprise one UMI flanking the 3' end of the first region and one sample index identifier flanking the 5' end of the second region.
  • the probe of the present disclosure is double-stranded.
  • single strand MIPs may also be applicable according to the present disclosure.
  • said at least one target nucleic acid sequence of interest is at least one of a genomic nucleic acid sequence, a transcriptomic nucleic acid sequence, and a circulating free DNA (cfDNA).
  • said at least one target nucleic acid sequence of interest is a nucleic acid sequence associated with, or comprising, at least one of: genetic and/or epigenetic variation/s, pathologic disorder/s, infectious entity, microorganism/s and GC-rich regions.
  • said genetic variations comprise at least one of: single nucleotide variants (SNVs) and/or single-nucleotide polymorphisms (SNPs), insertions and/or deletions (indels), inversions, copy number variations (CNV), structural variations, alternative splicing, loss of heterozygosity (LOH), gene fusions, translocations, duplications and variable number of tandem repeats.
  • said at least one target nucleic acid sequence of interest is associated with at least one hereditary, congenital, and/or somatic pathologic disorder or condition.
  • said pathologic disorder is at least one of: a neoplastic disorder, a metabolic condition, an inflammatory disorder, an infectious disease caused by a pathogen, mental disorders, an autoimmune disease, a cardiovascular disease, a neurodegenerative disorder, fetal genetic condition and an age-related condition.
  • said neoplastic disorder is Acute Myeloid Leukemia (AML).
  • said infectious entity is at least one of a viral, a bacterial, a fungal, a parasitic and a protozoan pathogen.
  • An additional aspect of the present disclosure relates to a plurality of Molecular Inversion Probes (MIPs) comprising unified at least one sample identifier index, wherein each of said MIPs targets a different target nucleic acid sequence of interest, and wherein each MIP comprises: (i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest of said MIP,
  • a plurality of MIPs comprising unified at least one sample identifier index may enable to target different target nucleic acid sequences of interest.
  • the disclosed method may use 1 to 1,000,000 or more different MIPs directed either to the same or to a different target nucleic acid sequence.
  • the plurality of probes form at least one library/panel.
  • the at least one index flanks at least one of said first and/or second regions.
  • the at least one sample identifier index flanks the 3' end of said first region and/or the 5' end of said second region.
  • the plurality of probes of the present disclosure comprise at least one additional index.
  • said additional index is for target or for probe recognition or is a UMI.
  • said probe is double-stranded. Additional configurations of the probes are as further detailed above.
  • said different target nucleic acid sequences of interest are associated with, or comprising, at least one of genetic and/or epigenetic variation/s, pathologic disorder/s, infectious entity, microorganism/s and GC-rich regions.
  • said different target nucleic acid sequences of interest are all associated with, or comprising, at least one of the same genetic and/or epigenetic variation/s, pathologic disorder/s, infectious entity, microorganism/s and GC-rich regions.
  • the plurality of probes may refer to for example a COVID19 panel, a myeloid panel, a generic cancer panel, a cancer specific panel or a carrier screening panel.
  • said different target nucleic acid sequences of interest may comprise the same genomic locus.
  • said genomic locus may be associated with AML, for example said genomic locus may be chr2:25457097-25457316 (hg19 build).
  • said different target nucleic acid sequences of interest may comprise the DNMT3A gene, e.g. said different target nucleic acid sequences of interest may comprise the R882 codon.
  • said genomic locus is chr1:114713908.
  • the genomic locus may comprise the NRAS gene, e.g. may be associated with cancer.
  • said genomic locus may be chr17:7,675,206, e.g. may comprise the TP53 gene, e.g. may be associated with cancer.
  • the genomic locus may comprise the JAK2 gene, e.g. may be associated with cancer.
  • said at least one target nucleic acid sequence of interest is at least one of a genomic nucleic acid sequence, a transcriptomic nucleic acid sequence, and a circulating free DNA (cfDNA).
  • said genetic variations comprise at least one of: single nucleotide variants (SNVs) and/or single-nucleotide polymorphisms (SNPs), insertions and/or deletions (indels), inversions, copy number variations (CNV), structural variations, alternative splicing, loss of heterozygosity (LOH), gene fusions, translocations, duplications and variable number of tandem repeats.
  • said at least one target nucleic acid sequence of interest is associated with at least one hereditary, congenital, and/or somatic pathologic disorder or condition.
  • said pathologic disorder is at least one of: a neoplastic disorder, a metabolic condition, an inflammatory disorder, an infectious disease caused by a pathogen, mental disorders, an autoimmune disease, a cardiovascular disease, a neurodegenerative disorder, fetal genetic condition and an age-related condition.
  • said neoplastic disorder is Acute Myeloid Leukemia (AML).
  • said infectious entity is at least one of a viral, a bacterial, a fungal, a parasitic and a protozoan pathogen.
  • the plurality of MIPs may be for performing molecular inversion probe-based method for simultaneous targeted sequencing of at least two samples of different target nucleic acid sequences of interest.
  • a further aspect of the invention relates to a kit comprising at least one set of plurality of MIPs, each of the at least one set of MIPs comprises unified at least one sample identifier index, wherein each of said MIPs targets a different target nucleic acid sequence of interest, and wherein each MIP comprises:
  • the plurality of MIPs is as defined in the previous aspects of the invention above.
  • each of the at least one set of plurality of MIPs is adapted for performing molecular inversion probe-based method in at least one sample.
  • the kit comprises at least two sets of plurality of MIPs, and wherein said at least one sample identifier index is different in each set.
  • the kit of the present disclosure is for performing molecular inversion probe-based method for simultaneous targeted sequencing of at least two samples.
  • a further aspect of the invention relates to a kit comprising at least one MIPs, wherein each MIP comprises:
  • the kit may comprise at least two MIPs. In some embodiments, the kit may be for performing molecular inversion probe-based method for simultaneous targeted sequencing of at least two samples.
  • nucleic acid molecule or sequence is referred to often herein, and relates to DNA, RNA, single-stranded, partially single-stranded, partially double-stranded or double-stranded nucleic acid sequences; sequences comprising nucleotides, ribonucleotides, deoxyribonucleotides, nucleotide analogs, modified nucleotides and nucleotides comprising backbone modifications, branch points and non-nucleotide residues, groups or bridges; synthetic RNA, DNA and chimeric nucleotides, hybrids, duplexes, heteroduplexes; and any ribonucleotide, deoxyribonucleotide or chimeric counterpart thereof and/or corresponding complementary sequence and any chemical modifications thereof.
  • Modifications include, but are not limited to, those which provide other chemical groups that incorporate additional charge, polarizability, hydrogen bonding, electrostatic interaction, and functionality to the nucleic acid ligand bases or to the nucleic acid ligand as a whole. Such modifications include, but are not limited to, 2'-position sugar modifications, 5-position pyrimidine modifications, 8-position purine modifications, modifications at exocyclic amines, substitution of 4-thiouridine, substitution of 5-bromo or 5-iodo- uracil; backbone modifications, methylations, unusual base-pairing combinations such as the isobases, isocytidine, and isoguanidine and the like. Modifications can also include 3' and 5' modifications such as capping.
  • a pathological disorder refers to a condition, in which there is a disturbance of normal functioning, any abnormal condition of the body or mind that causes discomfort, dysfunction, or distress to the person affected or those in contact with that person. It should be noted that the terms “disease”, “disorder”, “condition” and “illness”, are equally used herein.
  • any of the methods and probes or plurality of probes described by the present disclosure may be applicable for any of the disorders disclosed herein or any condition associated therewith.
  • the interchangeably used terms "associated”, “linked” and “related”, when referring to pathologies herein, mean diseases, disorders, conditions, or any pathologies which at least one of: share causalities, co-exist at a higher than coincidental frequency, or where at least one disease, disorder condition or pathology causes the second disease, disorder, condition or pathology.
  • subject or “subject in need” or “patient”, it is meant any organism who may be affected by the above-mentioned conditions. Examples of relevant organisms according to the present disclosure are further detailed above.
  • treat means preventing, ameliorating or delaying the onset of one or more clinical indications of disease activity in a subject having a pathologic disorder.
  • Treatment refers to therapeutic treatment. Those in need of treatment are subjects suffering from a pathologic disorder. Specifically, providing a "preventive treatment” (to prevent) or a “prophylactic treatment” is acting in a protective manner, to defend against or prevent something, especially a condition or disease.
  • association means any genetic or epigenetic variations which at least one of: cause either directly or indirectly, responsible for, share causalities, co-exist at a higher than coincidental frequency, with at least one disease, disorder condition or pathology or any symptoms thereof.
  • disease As used herein, “disease”, “disorder”, “condition”, “pathology” and the like, as they relate to a subject's health, are used interchangeably and have meanings ascribed to each and all of such terms.
  • the term "about” as used herein indicates values that may deviate up to 1%, more specifically 5%, more specifically 10%, more specifically 15%, and in some cases up to 20% higher or lower than the value referred to, the deviation range including integer values, and, if applicable, non-integer values as well, constituting a continuous range. In some embodiments, the term “about” refers to ±10%.
  • a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
  • the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements.
  • This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified.
  • “at least one of A and B” can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
  • the terms “comprises”, “comprising”, “includes”, “including”, “having” and their conjugates mean “including but not limited to”.
  • consisting of means “including and limited to”.
  • consisting essentially of means that the composition, method or structure may include additional ingredients, steps and/or parts, but only if the additional ingredients, steps and/or parts do not materially alter the basic and novel characteristics of the claimed composition, method or structure.
  • range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub ranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed sub ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
  • method refers to manners, means, techniques and procedures for accomplishing a given task including, but not limited to, those manners, means, techniques and procedures either known to, or readily developed from known manners, means, techniques and procedures by practitioners of the chemical, pharmacological, biological, biochemical and medical arts.
  • the four MIP sequential steps are hybridization, gap-fill and ligation, exonuclease, and barcoding PCR (Figure 1A-1B).
  • eMIP (embedded Molecular Inversion Probes) is presented, a MIP-based targeted sequencing approach that utilizes indexed panels that harbor a sample-specific barcode within every probe sequence (Figure 1C-1D). This allows imprinting the index on every to-be-sequenced library at the very first step of the MIP reaction, the hybridization reaction, and adds a combinatorial index layer on top of the standard barcoding PCR index primers, which may not be required.
  • the probe sequence was structured such that the first 4 bases of sequencing read1 and the first 4 bases of sequencing read2 generate a panel/sample unique tag (Table 1).
  • a number of 5 conditions were measured: Standard, regular MIP reaction with separated barcoding PCR step i.e. after hybridization, after gap fill or after exonuclease (Figure 1B) and Undetermined, a standard MIP experiment in which data was deliberately delivered to the undetermined fastq files, following barcoding PCR using primers that do not harbor an index.
  • Table 1 DNMT3A single probe eMIP experiment indexes. Different panel tags marked P# were embedded within the probe sequence, such that they will be sequenced on both reads before the arm sequence. Every row details the expected first 4 bases of read1 and read2.
  • a Myeloid panel targeting 394 genomic regions was used, InfiniSeq Myeloid panel v5.0.3 (footprint of 64,120 bases), to target different DNA samples with a known mutation pattern. The panel was replicated 7 times (7X), each with a different index set, sequenced from both read1 and read2 (Table 2). Three DNA samples were tested: OCI-AML3, K562 and normal Human Genomic DNA.
  • Figure 5A-5D provide further data for each of the duplicates with respect to Variant Allele Frequency (VAF) and data coming from Quality Control (QC).
  • the on target percent represents the percentage of reads aligning to predefined panel targets out of the total reads per sample.
  • the percent of uniformity represents the percentage of bases within the specified targets of interest that are covered by at least 20% of the mean coverage of bases in those targets.
  • Table 2 Myeloid panel eMIP experiment indexes. Different panel tags marked P# were embedded within the probe sequence, such that they will be sequenced on both reads before the arm sequence. Every row details the expected first 4 bases of R1 and R2.
  • probes suitable for eMIP were designed with two sample identifiers comprising 5 nucleotides instead of 4 (Fig. 6A), or with two sample identifiers comprising 6 nucleotides and two UMIs comprising 4 nucleotides (Fig. 6B), or with one sample identifier comprising 8 nucleotides and one UMI comprising 8 nucleotides (Fig. 6C).
  • LOD limit of detection
  • a 7X panel multiplexing experiment was calibrated using the same short oncology panels as used in the previous experiment (Example 3) in order to test different DNA concentrations as starting template material.
  • Each 7 template eMIP experiment consists of 4 X identical WT DNA (modifying concentration between experiments), 1 X AML3 DNA (100 ng/µl constant) and 2 NTC (DDW).
  • Figure 11 presents an average of only WT data and similar quality (uniformity per base% and on-target%) from 100 ng/µl until 20 ng/µl with a reduction of quality starting at 5 ng/µl.
  • DDW data was omitted from analysis due to low number of reads (7000 total reads on average).
  • DNA concentration (0.1-50 ng/µl) all variations are in the same eMIP pool.
  • the percent of on-target is the same or improved over the standard reaction per concentration, with similar behavior over the entire experiment conditions.
  • the percent of uniformity is less affected by concentration differences throughout the experiment with a reduction demonstrated at 1 ng/µl concentration.
  • standard reaction does not show such reduction at low template concentrations.
  • a myeloid panel in a 24x MPX eMIP reaction was used - targeting different mixture ratios of NA23245 (NIGMS Human Genetic Cell Repository) DNA and normal DNA in either duplicates or triplicates, and NTC (all in the same 24x MPX eMIP reaction).
  • the 8BC-8UMI probe configuration was used.
  • the hybridization reactions were divided into two groups: higher expected VAF%, pooling 1 µl, and lower expected VAF%, pooling 1.5 µl. This ratio of 1:1.5 should be interpreted as 50% more reads to the lower expected VAF% samples compared to the higher expected VAF% samples.
  • the same experiment was repeated with an OCI-AML3 DNA and normal DNA. Analysis compared the total reads per sample for each group divided by the total reads for all samples.
  • DNA templates were first created by mixture of different ratios of normal DNA (that does not contain known variants) with cell line DNA templates of expected variant frequencies for mutations in the genes DNMT3A (Heterozygous in OCI-AML3), NRAS (Homozygous in OCI-AML3), and JAK2 (Homozygous in NA23245).
  • the 8BC-8UMI probe configuration was used. These mixtures allowed for expected low VAF% that reaches up to 2.59%.
  • every data point represents a variant called from a single DNA template of expected VAF (x-axis) and its measured VAF (y-axis). JAK2 mutation calling from any sample that contained OCI-AML3 was omitted from the analysis since there seems to be a deletion of this variant from the DNA (all samples that were diluted with AML3 DNA demonstrated the same VAF).
  • noisy VAF% data was observed from these samples, which was attributed to uneven karyotype/CNV in these cell lines. Consequently, this analysis was restricted to mixtures of cell line DNA and normal DNA.
  • VAF calling from 24x eMIP is comparable to the data obtained from individual standard MIP reactions, with a slightly lower R-squared value for the DNMT3A variant in the eMIP protocol. Additionally, similar or improved R-squared correlation of deduplicated data compared to non-deduplicated data was observed for the eMIP protocol. Notably, the No Template Control (NTC) data consistently shows 0 VAF throughout the experiment.
  • a synthetic sequence of length 56 bases was designed with flanking sequences that enable hybridization to one of the probes in the Myeloid panel. Included between these sequences is an 8 bases index sequence (50%GC). The same sequence was ordered 96 times with 8 bases indexes per individual sequence. It was then opted to process these synthetic oligos as templates for a set of eMIP experiments to test pooling 16X, 24X, 48X and 96X. Specifically, 16, 24, 48, or 96 barcode-embedded myeloid panels were mixed with individual appropriate number of synthetic oligos (e.g.
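The two QC metrics defined above, on-target percent and percent of uniformity, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline used in the disclosure: the function names, the per-base coverage list as input, and the handling of empty or zero-coverage input are all hypothetical choices made for the example. Only the definitions themselves (reads aligning to panel targets out of total reads; target bases covered by at least 20% of the mean target coverage) come from the text.

```python
def on_target_percent(on_target_reads, total_reads):
    """Percentage of a sample's reads that align to predefined panel targets."""
    if total_reads == 0:
        return 0.0  # assumption: report 0 for an empty sample
    return 100.0 * on_target_reads / total_reads


def uniformity_percent(per_base_coverage, fraction=0.20):
    """Percentage of target bases covered by at least `fraction` of the mean coverage.

    `per_base_coverage` is a hypothetical input: one read-depth value per base
    inside the targets of interest.
    """
    if not per_base_coverage:
        return 0.0
    mean_coverage = sum(per_base_coverage) / len(per_base_coverage)
    if mean_coverage == 0:
        return 0.0  # assumption: no coverage at all counts as 0% uniform
    threshold = fraction * mean_coverage
    covered = sum(1 for depth in per_base_coverage if depth >= threshold)
    return 100.0 * covered / len(per_base_coverage)
```

For instance, a coverage vector of `[100, 100, 100, 10]` has a mean of 77.5, so the 20% threshold is 15.5 and three of the four bases pass.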

Abstract

The present disclosure relates to specific probes, plurality of probes, kits and related methods for performing improved molecular inversion probe-based methods, e.g. for simultaneous targeted sequencing of several samples.

Description

EMBEDDED MOLECULAR INVERSION PROBE-BASED TARGETED SEQUENCING METHODS AND RELATED COMPOSITIONS
FIELD OF THE INVENTION
The present application relates to the field of targeted sequencing and provides an enhanced molecular inversion probe (MIP) technology that utilizes embedded barcodes from the first stages of the assay.
BACKGROUND ART
References considered to be relevant as background to the presently disclosed subject matter are listed below:
[1] J, K. et al. Good Laboratory Standards for Clinical Next-Generation Sequencing Cancer Panel Tests. Journal of pathology and translational medicine 51 (2017).
[2] L, M. et al. Target-enrichment strategies for next-generation sequencing. Nature methods 7 (2010). https://doi.org/10.1038/nmeth.1419
[3] I, K., J, A., AF, G., BE, S. & CL, H. Overview of Target Enrichment Strategies. Current protocols in molecular biology 112 (2015).
[4] T, B. et al. An improved molecular inversion probe based targeted sequencing approach for low variant allele frequency. NAR genomics and bioinformatics 4 (2022).
Acknowledgement of the above references herein is not to be inferred as meaning that these are in any way relevant to the patentability of the presently disclosed subject matter.
BACKGROUND OF THE INVENTION
Next Generation sequencing (NGS) library preparation is defined by a series of molecular biology steps that start with a DNA/RNA template and end with a sequencing-ready DNA library. These steps modify the original template to contain the correct DNA context that enables sequencing. A DNA index/barcode/tag may be inserted at the final steps of the library preparation to allow later pooling of the NGS libraries into a single NGS library reaction. This index is sequenced as part of the DNA sequencing procedure and allows the allocation of the sequencing data to the original sample from which it originated, in an in silico procedure termed “demultiplexing”. A major drawback in any NGS library preparation procedure is the fact that until the indexing step, samples cannot be mixed and processed together [1]. This results in two main problems: A) Costs. Since reactions cannot be mixed until after the barcoding step, they must be processed in parallel. This multiplies the reagents and consumables used (pipette tips and tubes) and therefore multiplies costs. B) Risk of untraceable cross-contamination between samples. Since sample barcoding occurs at a later stage, faulty processing of the library preparation reaction that results in crossing of e.g., DNA between reactions may result in erroneous data. The further down the library preparation process the barcoding step is, the higher the risk of untraceable cross-contamination.
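As an illustration of the in silico demultiplexing step described above, the following Python sketch assigns reads to samples by their index and routes reads with an unrecognized index to an "undetermined" bin. It is a simplified sketch, not a real demultiplexer: the function name, the representation of a read as an (index, sequence) pair, and the example indexes and sample names are all assumptions made for the example.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample):
    """Group reads by the sample their index belongs to.

    `reads` is an iterable of (index, sequence) pairs (hypothetical format).
    Reads whose index is not in `index_to_sample` go to "undetermined".
    """
    bins = defaultdict(list)
    for index, sequence in reads:
        sample = index_to_sample.get(index, "undetermined")
        bins[sample].append(sequence)
    return dict(bins)


# Hypothetical indexes and reads, for illustration only.
index_to_sample = {"ACGT": "sample_1", "TGCA": "sample_2"}
reads = [
    ("ACGT", "GATTACA"),
    ("TGCA", "CCGGTTAA"),
    ("NNNN", "TTTT"),
    ("ACGT", "GGGCCC"),
]
result = demultiplex(reads, index_to_sample)
```

Because the lookup happens only after sequencing, any mixing of material that occurs before the index is attached cannot be detected at this stage, which is the traceability problem the paragraph above describes.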
Targeted sequencing is a series of library preparation techniques that enable creating an NGS library that focus on specific genomic regions. There are three general categories: probe capture, molecular inversion probes, and amplicon (PCR) based [2-3]. MIPs propose a cost- effective approach for targeted sequencing as it is a one-pot, internal-purification free, and simple to handle reaction [4]. Unlike capture probes, MIP probes are partially sequenced (and hence enable unique molecular identifier, UMI embedding). There is a need for improved MIP methods and related probes/kits to enable a neglectable library preparation cost for sequencing approaches that may pave the way to population wide genomic screening.
SUMMARY OF THE INVENTION
In a first aspect, the present disclosure provides a Molecular Inversion Probe-based method for targeted sequencing of at least one sample, comprising at least one of the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest;
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample (In some embodiments, said at least one hybridization product comprises said at least one MIP hybridized to the first and second target regions of at least one target nucleic acid sequence of interest); and b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture (In some embodiments, the reaction mixture may be the polymerization and/or ligation reaction mixture) from said at least one sample; and c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
In a further aspect, the present disclosure provides a method for diagnosing a pathological disorder in at least one subject by identifying at least one genetic and/or epigenetic variation/s associated with said pathologic disorder, and/or at least one nucleic acid sequence of at least one pathogenic entity, in at least one target nucleic acid sequence of at least one sample of said at least one subject, the method comprising the step of performing a molecular inversion probe-based method for targeted sequencing in at least one subject or in any nucleic acid molecule obtained therefrom, wherein the presence of one or more of said variation/s in said target nucleic acid sequence and/or of at least one nucleic acid sequence of at least one pathogenic entity in said at least one sample, indicates that the at least one subject has a risk, is a carrier, or is suffering from said pathologic disorder, and wherein the molecular inversion probe-based method for targeted sequencing comprises at least one of the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest;
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample (In some embodiments, said at least one hybridization product comprises said at least one MIP hybridized to the first and second target regions of at least one target nucleic acid sequence of interest); and b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture (In some embodiments, the reaction mixture may be the polymerization and/or ligation reaction mixture) from said at least one sample; and c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
An additional aspect of the present disclosure relates to a method of detecting the presence of one or more target microorganism or infectious entity in at least one sample, the method comprising the step of performing a molecular inversion probe-based method for targeted sequencing in at least one nucleic acid molecule obtained from said at least one sample, wherein the presence of one or more target nucleic acid sequence associated with said microorganism or infectious entity in said at least one sample indicates the presence thereof in the sample, and wherein the molecular inversion probe-based method for targeted sequencing comprises at least one of the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest;
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample (In some embodiments, said at least one hybridization product comprises said at least one MIP hybridized to the first and second target regions of at least one target nucleic acid sequence of interest); and b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture (In some embodiments, the reaction mixture may be the polymerization and/or ligation reaction mixture) from said at least one sample; and c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
In a further aspect, the present disclosure provides a method of determining the genotype and/or genetic profile of at least one nucleic acid sequence of at least one organism, or at least one infectious entity from at least one test sample, the method comprising the step of performing a molecular inversion probe-based method for targeted sequencing in said at least one test sample comprising said at least one nucleic acid sequence, wherein the molecular inversion probe-based method for targeted sequencing comprises at least one of the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest;
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample (In some embodiments, said at least one hybridization product comprises said at least one MIP hybridized to the first and second target regions of at least one target nucleic acid sequence of interest); and b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture (In some embodiments, the reaction mixture may be the polymerization and/or ligation reaction mixture) from said at least one sample; and c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
A yet additional aspect of the present disclosure relates to a Molecular Inversion Probe (MIP) comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest,
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index.
An additional aspect of the present disclosure relates to a plurality of Molecular Inversion Probes (MIPs) comprising unified at least one sample identifier index, wherein each of said MIPs targets a different target nucleic acid sequence of interest, and wherein each MIP comprises:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest of said MIP,
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest of said MIP; and
(iii) at least one sample identifier index.
A further aspect of the invention relates to a kit comprising at least one set of plurality of MIPs, each of the at least one set of MIPs comprises unified at least one sample identifier index, wherein each of said MIPs targets a different target nucleic acid sequence of interest, and wherein each MIP comprises:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest of said MIP,
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest of said MIP; and
(iii) at least one sample identifier index; and optionally, at least one of:
(a) instructions for use;
(b) additional reagents.
BRIEF DESCRIPTION OF THE DRAWINGS
In order to better understand the subject matter that is disclosed herein and to exemplify how it may be carried out in practice, embodiments will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:
Figure 1A-1D: Schematic representation of the MIP and eMIP panels and protocols.
Fig. 1A: The basic standard MIP probe structure involves a backbone (black) and two flanking DNA targeting regions (grey).
Fig. 1B: The reaction is composed of 4 sequential steps that process each DNA sample (marked in different shades) separately.
Fig. 1C: The eMIP probe structure harbors an embedded index sequence upstream of the targeting arms. Every panel has a different index sequence that imprints the libraries from the first MIP step, allowing the pooling of the reactions earlier than the standard protocol (which allows pooling only after the barcoding PCR). In the eMIP protocol, every sample that is expected to be pooled is targeted by a different panel, and hence gets a different index.
Fig. 1D: An embodiment of the eMIP protocol in which, following the first step of the protocol (hybridization), the processed libraries are pooled to be processed together in a single reaction.
Figure 2: Estimated experiment cost reduction by using the eMIP protocol.
Estimated cost when pooling after the hybridization (circle), gap-fill (square) or exonuclease (star) step, as a function of the number of pooled samples (x-axis).
Figure 3A-3B: DNMT3A R882 mutation analysis using a single probe.
Fig. 3A: To control for correct P# calling, the mapped P sequence of each standard MIP reaction was first compared with its known embedded barcode (n=12). The median exact-match rate is 99.1%, and 99.9% when allowing one mismatch.
Fig. 3B: The same 12 probes were applied to different DNA templates with known VAF: 50% (DNMT3A cell line), 25% (DNMT3A cell line DNA diluted with an equimolar concentration of normal DNA) and 0% (normal DNA). Each box (n=4) represents the data measured for each condition. Standard: regular MIP reaction with a separate barcoding PCR step (Figure 1B). Undetermined: a standard MIP experiment in which data was deliberately delivered to the undetermined fastq files, following barcoding PCR using primers that do not harbor an index. Others: reactions from the eMIP protocol, demultiplexed by the P# (see Table 1, Figure 1D), pooled after different protocol steps: after the hybridization step, gap-fill step, or exonuclease step.
Figure 4A-4D: VAF analysis of 3 expected homoduplex mutations within different DNA samples from MIP and eMIP experiment.
Seven identical myeloid panels were applied to 7 DNA templates: 1X NTC, and duplicates of OCI-AML3, K562 and normal Human Genomic DNA (marked WT). Each panel was marked by a different DNA barcode to allow for demultiplexing at different eMIP pooling stages. Vertical lines and horizontal lines - NRAS and TP53 mutation, respectively, left Y-axis. White - Total reads per sample, right Y-axis.
Fig. 4A: Standard MIP reaction.
Fig. 4B: eMIP reactions pooled after hybridization.
Fig. 4C: eMIP reactions pooled after gap fill.
Fig. 4D: eMIP reactions pooled after exonuclease.
Figure 5A-5D: VAF analysis of 3 expected homoduplex mutations within different DNA samples from MIP and eMIP experiment (data of each duplicate and QC).
Seven identical myeloid panels were applied to 7 DNA templates: 1X NTC, and duplicates of OCI-AML3, K562 and normal Human Genomic DNA (marked WT). Each panel was marked by a different DNA barcode to allow for demultiplexing at different eMIP pooling stages (Standard: standard MIP reaction as control).
Fig. 5A: VAF of each duplicate after exonuclease, gap filling, hybridization and standard MIP reaction. Diagonal lines and white - NRAS and TP53 mutation, respectively, left Y-axis. Horizontal lines - Total reads per sample, right Y-axis.
Fig. 5B: Performance data per AML3 dilution at different eMIP pooling stages - Total reads count per sample.
Fig. 5C: Performance data per AML3 dilution at different eMIP pooling stages - Percent of on-target reads per sample.
Fig. 5D: Performance data per AML3 dilution at different eMIP pooling stages - Percent of uniformity per sample.
Figure 6A-6C: Schematic representation of different configurations of the probes for eMIP.
Fig. 6A: Probe suitable for eMIP composed of two sample identifiers comprising 5 nucleotides, on both sides of the probe, flanking the arms.
Fig. 6B: Probe suitable for eMIP composed of two sample identifiers comprising 6 nucleotides, on both sides of the probe, flanking the arms and two UMIs comprising 4 nucleotides.
Fig. 6C: Probe suitable for eMIP composed of one sample identifier comprising 8 nucleotides, flanking the ligation arm, and one UMI comprising 8 nucleotides.
Lig Arm: ligation arm; Ex Arm: extension arm; BC: barcode; UMI: unique molecular identifier; BB: backbone; Ns: nucleotides.
Figure 7A-7B: Measured VAF and read depth (DP) of NRAS variant per expected variant VAF in diluted samples using the standard MIP reaction or eMIP reaction pooled after hybridization step or gap-fill (extension) step.
Fig. 7A: Measured VAF of diluted samples of a known mutation in the NRAS gene with OCI-AML3 mutated cell line DNA and normal DNA as the diluent.
Fig. 7B: Read depth (DP) of diluted samples of a known mutation in the NRAS gene with OCI-AML3 mutated cell line DNA and normal DNA as the diluent.
Figure 8: Measured VAF using different configurations of eMIP probes using the standard MIP reaction or eMIP reaction pooled after hybridization step or gap-fill (extension) step.
Figure 9: Measured VAF using the standard MIP reaction or eMIP reaction pooled after hybridization step or gap-fill (extension) step with all different configurations of eMIP probes together.
Figure 10: QC data of different configurations of eMIP probes VAF using the standard MIP reaction or eMIP reaction pooled after hybridization step or gap-fill (extension) step.
Figure 11: QC data of different concentrations of DNA starting materials.
Figure 12: Total reads obtained with 4 different parameters of the eMIP reaction, i.e., DNA concentration, number of samples in pool, starting hybridization volume and volume at the gap-filling step.
Figure 13A-13B: Reads ratio of NA23245 (NIGMS Human Genetic Cell Repository) DNA and OCI-AML3 DNA.
Fig. 13A: Reads ratio of higher-volume sample and lower-volume sample out of total reads for NA23245 DNA.
Fig. 13B: Reads ratio of higher-volume sample and lower-volume sample out of total reads for OCI-AML3 DNA.
Figure 14A-14D: Correlation plots of variants from NRAS DNA template of expected VAF (X axis) and its measured VAF (Y axis).
Fig. 14A: Standard MIP protocol, non-deduplicated data.
Fig. 14B: eMIP protocol, non-deduplicated data.
Fig. 14C: Standard MIP protocol, deduplicated data.
Fig. 14D: eMIP protocol, deduplicated data.
Figure 15A-15D: Correlation plots of variants from DNMT3A DNA template of expected VAF (X axis) and its measured VAF (Y axis).
Fig. 15A: Standard MIP protocol, non-deduplicated data.
Fig. 15B: eMIP protocol, non-deduplicated data.
Fig. 15C: Standard MIP protocol, deduplicated data.
Fig. 15D: eMIP protocol, deduplicated data.
Figure 16A-16D: Correlation plots of variants from JAK2 DNA template of expected VAF (X axis) and its measured VAF (Y axis).
Fig. 16A: Standard MIP protocol, non-deduplicated data.
Fig. 16B: eMIP protocol, non-deduplicated data.
Fig. 16C: Standard MIP protocol, deduplicated data.
Fig. 16D: eMIP protocol, deduplicated data.
Figure 17A-17C: Schematic representation of the controlled eMIP experiment.
Fig. 17A: eMIP panels with embedded barcodes marked: EB8N#.
Fig. 17B: Template oligos marked TEB8N#.
Fig. 17C: Final formation of the expected library (after pooling, EB8N# and TEB8N# numbers should match).
Figure 18: Heatmap of matching read counts per combination of the controlled eMIP experiment, 16X multiplexity.
Fastq files (rows) were sorted by their eMIP index, in duplicates. The search patterns (columns) are sorted right to left.
Figure 19: Heatmap of matching read counts per combination of the controlled eMIP experiment, 24X multiplexity.
Fastq files (rows) were sorted by their eMIP index, in duplicates. The search patterns (columns) are sorted right to left.
Figure 20: Heatmap of matching read counts per combination of the controlled eMIP experiment, 48X multiplexity.
Fastq files (rows) were sorted by their eMIP index, in duplicates. The search patterns (columns) are sorted right to left.
Figure 21: Heatmap of matching read counts per combination of the controlled eMIP experiment, 96X multiplexity.
Fastq files (rows) were sorted by their eMIP index, in duplicates. The search patterns (columns) are sorted right to left.
DETAILED DESCRIPTION OF THE INVENTION
In recent years, sequencing costs have dramatically declined while the costs of upstream processing, also referred to as Next Generation sequencing (NGS) library preparation, have remained steady. Depending on the application, this series of modifying molecular biology processes, applied in parallel to DNA/RNA templates, is reagent-, cost-, and time-consuming and is prone to cross-contamination. Current methods to solve these issues involve lowering reaction volumes and pooling libraries after a minimum number of indexing steps (e.g., in capture-based targeted sequencing methods), but neither solves all issues. To address these issues, the present application provides an enhanced molecular inversion probe (MIP) technology that utilizes sample-specific probe sets which harbor embedded barcodes, with the premise of indexing samples from the first stage and allowing early sample pooling into a single reaction. To validate the method, embedded barcode panels, both a single probe and a full myeloid probe set, were applied to DNA samples with known mutations and allele frequencies; the reactions were pooled after different MIP steps. The analysis of the pooled samples recapitulated the results obtained by the standard, individually processed MIP reactions, even when pooling after the first MIP step. Notably, the results match both the expected mutation analysis and other run parameters such as on-target and uniformity rates. The results presented herein provide a 1- to 2-fold experiment cost reduction, demonstrating a negligible library preparation cost for targeted enrichment experiments.
In a first aspect, the present disclosure provides a Molecular Inversion Probe-based method for targeted sequencing of at least one sample, comprising at least one of the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest;
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample (In some embodiments, said at least one hybridization product comprises said at least one MIP hybridized to the first and second target regions of at least one target nucleic acid sequence of interest); and b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture (In some embodiments, the reaction mixture may be the polymerization and/or ligation reaction mixture) from said at least one sample; and c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
In some embodiments, the steps (a) and (b) may be performed together.
In some specific embodiments, the present disclosure provides a Molecular Inversion Probe-based method for targeted sequencing of at least one sample, comprising the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest;
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample (In some embodiments, said at least one hybridization product comprises said at least one MIP hybridized to the first and second target regions of at least one target nucleic acid sequence of interest); and b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture (In some embodiments, the reaction mixture may be the polymerization and/or ligation reaction mixture) from said at least one sample; and c. subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
In some embodiments, the method of the present disclosure is for simultaneous targeted sequencing of at least two samples, wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
Therefore, in a further aspect, the present disclosure provides a Molecular Inversion Probe-based method for simultaneous targeted sequencing of at least two samples, comprising the steps of: a. contacting at least one target nucleic acid sequence originating from at least two samples with at least two MIPs comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest;
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least two samples; and b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture from said at least two samples; and c. subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least two samples; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c); wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
In some embodiments, the hybridization product/s obtained in step (a) for each of at least two samples are mixed/pooled.
In some embodiments, the cyclized product/s obtained in step (b) for each of at least two samples are mixed/pooled.
In some embodiments, the non-digested cyclized product/s obtained in step (c) for each of at least two samples are mixed/pooled.
As used herein, molecular inversion probes (MIPs) are, e.g., nucleic acid hybridization probes that hybridize to a target nucleic acid in a loop, with the 5' and 3' ends adjacent to each other or separated in the target by a small gap. The MIPs are typically designed to interrogate a target nucleotide in the gap using the high specificity of the DNA polymerase reaction. If provided with the appropriate dNTP, the polymerase can fill the gap between the MIP 5' and 3' ends. For example, if the target nucleic acid has an adenine “A” in the gap, using the target as a template, the polymerase can fill the gap if provided with a complementary dTTP. The polymerase will add a “T” and fill the gap in the gap-fill reaction. With the gap filled, a ligase can close the remaining nick and circularize the MIP. The circularized MIPs are then enriched or isolated. In some embodiments, because circularized single strand DNA is not a substrate for many nucleases, all other nucleic acids, including MIPs that did not hybridize and circularize (also referred to herein as linear MIPs), can be removed, e.g., digested with one or more nucleases. MIP reaction products are typically detected after an amplification step, such as PCR using primer binding sites within the MIPs or rolling circle amplification, on a capture array.
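By way of non-limiting illustration only, the gap-fill step described above may be sketched in Python as follows. The function names (`reverse_complement`, `gap_fill`) and the example sequences are hypothetical and are not part of the disclosed protocol; the sketch assumes an idealized single-stranded template in which both target regions are found by exact string matching, and it ignores strand orientation details of a real probe design.

```python
# Illustrative sketch only: names and sequences are assumptions, not the disclosed method.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def gap_fill(target, first_region, second_region):
    """Return the sequence a polymerase would synthesize across the gap.

    `target` is the template strand; `first_region` and `second_region` are
    the two target regions (as they appear on the template) bound by the MIP
    arms. Returns None if either region is absent or they are out of order.
    """
    start = target.find(first_region)
    if start == -1:
        return None
    gap_start = start + len(first_region)
    gap_end = target.find(second_region, gap_start)
    if gap_end == -1:
        return None
    # The newly synthesized strand is complementary to the template gap.
    return reverse_complement(target[gap_start:gap_end])


# Hypothetical template with a single "A" between the two target regions.
template = "CCGGAATTCC" + "A" + "GGTTAACCGG"
filled = gap_fill(template, "CCGGAATTCC", "GGTTAACCGG")
```

In this hypothetical example the template carries an “A” in the gap, so the synthesized gap sequence is a single “T”, in line with the example given above; a probe whose arms bind directly adjacent regions would yield an empty gap.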
In some embodiments, MIPs useful in the disclosed methods comprise "first" and "second" regions that comprise sequences complementary to the first and second regions, respectively, of the target nucleic acid sequence. Such first and second complementary regions may also be named as Ligation arm and Extension arm. The term “complementary” as used herein refers to the hybridization or base pairing between nucleotides or nucleic acids, such as, for instance, between the two strands of a double stranded DNA molecule or between an oligonucleotide primer and a primer binding site on a single stranded nucleic acid to be sequenced or amplified. Complementary nucleotides are, generally, A and T (or A and U), or C and G. Two single stranded RNA or DNA molecules are said to be complementary when the nucleotides of one strand, optimally aligned and compared and with appropriate nucleotide insertions or deletions, pair with at least about 70% to 100% of the nucleotides of the other strand, with at least about 80% of the nucleotides of the other strand, specifically, about 80% to 100%, more specifically at least about 90% to 95%, and more preferably from about 98% to 100%. Alternatively, complementarity exists when an RNA or DNA strand will hybridize under selective hybridization conditions to its complement. Typically, selective hybridization will occur when there is at least about 65% complementarity over a stretch, preferably at least about 75%, more preferably at least about 90% complementarity. In some embodiments, homology regions of a MIP display at least about 75%, or about 80% to 100%, more specifically about 90% to 95%, and more preferably about 98% to 100%, or about 100% complementarity with the corresponding complementary sequence within the target nucleic acid of interest, e.g., unless there is a mismatch at the position of the interrogated nucleotide of interest.
Still further, the complementary regions of the MIPs provided and used in the disclosed methods may be also referred to herein as homology regions. “Homology regions”, as used herein, are those regions of a molecular inversion probe that are complementary to the target nucleic acid of interest. As indicated above, MIPs typically have two homology regions (HRs), one at or near the 5' end of the probe and one at or near the 3' end. In some embodiments, the HRs are adapted to hybridize to a target nucleic acid of interest so that they abut each other or are separated by a gap of a single target nucleotide or a plurality of target nucleotides. In some embodiments, the first and second complementary regions of the target nucleic acid sequence flank the sequence to be interrogated (e.g., SNP etc.). A gap of a plurality of target nucleotides can include, e.g., from 1 to about 2000 nucleotides, for example, from 1 to 500 nucleotides, and more preferably 1 to 250 nucleotides. The size of the gap will depend on a variety of factors, including the sequence of the intended target, the size of the overall MIP, the quantity and size of non-HR portions of the MIP, the desired purpose of the assay and associated characteristics, and other factors. For instance, a MIP designed to interrogate a SNP may have a gap of a single nucleotide while a MIP designed to interrogate a multi-base insertion may have a gap of multiple nucleotides. In some embodiments, the first and/or the second homology regions of the disclosed MIP may be about 10 to about 200 nucleotides long, specifically, about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200 or more nucleotides.
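As a non-limiting illustration of how a percent complementarity value of the kind recited above may be computed, the following Python sketch counts Watson-Crick pairs between a probe arm and a target region aligned in antiparallel orientation. The function name `percent_complementarity` and the example sequences are hypothetical. The sketch is a simplification: it assumes two sequences of equal length and does not perform the optimal alignment with insertions or deletions mentioned above.

```python
# Illustrative sketch only: a gap-free, equal-length comparison is an assumption.
PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C"), ("A", "U"), ("U", "A")}


def percent_complementarity(arm, target_region):
    """Percent of arm positions that base-pair with the antiparallel target."""
    if not arm or len(arm) != len(target_region):
        raise ValueError("sequences must be non-empty and of equal length")
    # Position i of the arm faces position (n - 1 - i) of the target region.
    paired = sum((a, t) in PAIRS for a, t in zip(arm, reversed(target_region)))
    return 100.0 * paired / len(arm)


# Hypothetical 10-nucleotide arm against a perfect and a one-mismatch target.
perfect = percent_complementarity("ACGTACGTAC", "GTACGTACGT")
one_mismatch = percent_complementarity("ACGTACGTAC", "GTACGTACGA")
```

Under these assumptions, the perfectly matched pair scores 100% and the pair with a single mismatch in ten positions scores 90%.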
It should be further noted that the first and second complementary regions of the disclosed MIPs may be either the same or different.
In some embodiments, the MIP probe used in the present disclosure may comprise degenerative homology arms, or complementary regions. In some embodiments, the complementary regions of the disclosed MIPs may comprise one or more degenerate bases, specifically, between about 0.1% to about 90% degenerate bases, and are therefore referred to herein as degenerative homology regions or arms, or complementary regions or arms. More specifically, a degenerate base means more than one base possibility at a particular position. An oligonucleotide sequence can be synthesized with multiple bases at the same position; such a position is termed a degenerate base, also sometimes referred to as a "wobble" position or "mixed base". The IUB (International Union of Biochemistry) has established single-letter codes for all possible degenerate possibilities. An example is "R", which is A+G at the same position: 50% of the oligo sequences will have an A at that position, and the other 50% will have a G. A degenerate base position may have any combination of two, three, or four bases. Chemical synthesis of oligos using IUB degenerate bases is programmed and automated to deliver the percentage of each base for reaction at that specific base position; for example, for the letter "N", 25% of each base will be delivered for coupling. The delivery and coupling may not be 100% accurate and efficient for each base, and thus approximately 10% deviation should be expected and considered in the final oligo sequence. For degenerate (mixed base) positions, the following IUB codes are used: R=A+G, Y=C+T, M=A+C, K=G+T, S=G+C, W=A+T, H=A+T+C, B=G+T+C, D=G+A+T, V=G+A+C, N=A+C+G+T.
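The IUB codes listed above can be made concrete with a short sketch that expands a degenerate arm into the explicit sequences it represents and reports the fraction of degenerate positions. This is only an illustration using the code table from the text; the function names `expand_degenerate` and `degenerate_fraction` are hypothetical, and the sketch ignores the approximately 10% synthesis deviation mentioned above.

```python
from itertools import product

# IUB single-letter codes as listed in the text, plus the four plain bases.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "GC", "W": "AT",
    "H": "ATC", "B": "GTC", "D": "GAT", "V": "GAC", "N": "ACGT",
}


def expand_degenerate(seq: str) -> list[str]:
    """Return every explicit sequence encoded by a degenerate oligo."""
    return ["".join(p) for p in product(*(IUB[b] for b in seq.upper()))]


def degenerate_fraction(seq: str) -> float:
    """Fraction of positions that allow more than one base."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    return sum(len(IUB[b]) > 1 for b in seq.upper()) / len(seq)


if __name__ == "__main__":
    print(expand_degenerate("AR"))        # the two oligos encoded by A followed by R
    print(len(expand_degenerate("NN")))   # number of oligos encoded by two N positions
    print(degenerate_fraction("ARNT"))    # share of degenerate positions
```

Note that the number of explicit sequences grows multiplicatively with each degenerate position, so full expansion is only practical for arms with few mixed bases.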
As used herein, the term “sample identifier index” relates to a tag or an index that may be, for example, a nucleic acid sequence, a Unique Molecular Identifier (UMI) or any suitable label known in the art, that is employed in order to identify a specific sample. A sample identifier index is distinct from an index employed for target recognition. When performing the molecular inversion probe based method as detailed in the present disclosure, the at least one sample identifier index present in a MIP would be uniform for each sample. Furthermore, when relating to a plurality of MIPs according to the present disclosure and as further detailed below, the at least one sample identifier index would be uniform for each MIP employed for a particular sample. When further relating to a set of a plurality of MIPs, each sample-specific set would possess at least one uniform sample identifier index.
In some embodiments, said sample identifier index is at a position that does not disturb the target recognition. In some embodiments, said sample identifier index is not used for target recognition.
Still further, in some embodiments, the sample identifier index may comprise between about 4 nucleotides to about 50 nucleotides, specifically, between 4 to 40 nucleotides, specifically 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 nucleotides. In some embodiments, the sample identifier index in the disclosed MIPs comprises 4 nucleotides. In some embodiments, the sample identifier index in the disclosed MIPs comprises 5 nucleotides. In some embodiments, the sample identifier index in the disclosed MIPs comprises 6 nucleotides. In some embodiments, the sample identifier index in the disclosed MIPs comprises 8 nucleotides. In some embodiments, the sample identifier index in the disclosed MIPs comprises 9 nucleotides. In some embodiments, the sample identifier index in the disclosed MIPs comprises 10 nucleotides.
The term “hybridization” as used herein refers to the process in which two single-stranded polynucleotides bind non-covalently to form a stable double-stranded polynucleotide; triple-stranded hybridization is also theoretically possible. The resulting (usually) double-stranded polynucleotide is a “hybrid.” Hybridizations are usually performed under stringent conditions, for example, at a temperature of at least 25 °C or more. As other factors may affect the stringency of hybridization, including base composition and length of the complementary strands, presence of organic solvents and extent of base mismatching, the combination of parameters is more important than the absolute measure of any one alone. The hybridization step of the disclosed methods is performed in conditions suitable to allow the successful hybridization of the at least one MIP to the target sequence, thereby forming the hybridized MIP. In some embodiments, “hybridizing conditions” include any conditions (time, temperature, buffer) that result in specific hybridization between complementary sequences, e.g., a target nucleic acid sequence is said to specifically hybridize to the MIP probe nucleic acid complementary region when it hybridizes at least 50% as well (e.g., quantitatively under the same hybridization conditions) to the probe as to the perfectly matched complementary target, i.e., with a signal to noise ratio at least half as high as hybridization of the probe to the target under conditions in which the perfectly matched probe binds to the perfectly matched complementary target. In some embodiments, the hybridization step may be performed in a thermal cycler. In yet some further embodiments, the hybridization program used may be either gradual (ramped temperature) or constant.
A “gap-fill reaction” is a reaction, described herein, in which a gap is filled by the action of a polymerase between 5' and 3' ends of a molecular inversion probe hybridized to a complementary target nucleic acid. In many embodiments, the filled gap consists of a single nucleotide. However, in some embodiments of the MIP gap-fill reactions, the gap can be more than one nucleotide, for example, between about 1 to about 500 nucleotides, specifically, between about 1 to about 450 nucleotides, between about 1 to about 400 nucleotides, between about 1 to about 350 nucleotides, between about 1 to about 300 nucleotides, between about 1 to about 250 nucleotides, e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200, 250 or more nucleotides, e.g., between first and second MIP homology regions specifically hybridized to a target nucleic acid. In some embodiments, the methods disclosed herein may further encompass gaps of hundreds of nucleotides, and/or gaps between different chromosomes, that may be used in methods that define genomic topological organization, as will be discussed in more detail hereinafter. It should be understood that the synthesized sequence is further ligated to obtain cyclized product/s in the polymerization and/or ligation reaction mixture.
In some embodiments, the polymerization reaction is performed by a DNA polymerase. A polymerase, as used herein, is a member of a group of enzymes required for DNA synthesis. The main function of the DNA polymerase is to synthesize DNA during replication. DNA polymerases work in pairs, replicating two strands of DNA in tandem. They add deoxyribonucleotides at the 3'-OH group of the growing DNA strand. The DNA strand grows in the 5'→3' direction by their polymerization activity. Adenine pairs with thymine and guanine pairs with cytosine. DNA polymerases cannot initiate the replication process and need a primer to which the nucleotides are added. The polymerization reaction is therefore the synthesis of the DNA strand that corresponds to the appropriate template, as indicated above in connection with the gap-fill reaction. There are five DNA polymerases identified in E. coli. All the DNA polymerases differ in structure, functions and rate of polymerization and processivity. DNA Polymerase I is coded by the polA gene. It is a single polypeptide and has a role in recombination and repair. It has both 5'→3' and 3'→5' exonuclease activity. DNA polymerase I removes the RNA primer from the lagging strand by 5'→3' exonuclease activity and also fills the gap. DNA Polymerase II is coded by the polB gene. It is made up of 7 subunits. Its main role is in repair and also as a backup of DNA polymerase III. It has 3'→5' exonuclease activity. DNA Polymerase III is the main enzyme for replication in E. coli. It is coded by the polC gene. It also has proofreading 3'→5' exonuclease activity. DNA Polymerase IV is coded by the dinB gene. Its main role is in DNA repair during the SOS response, when DNA replication is stalled at the replication fork. According to some embodiments, the DNA polymerase may be any DNA polymerase known in the art. According to some embodiments, the DNA polymerase is a high-fidelity DNA polymerase.
Q5 High-Fidelity DNA Polymerase sets a new standard for both fidelity and robust performance, with the highest fidelity amplification available (~280 times higher than Taq). Q5 DNA Polymerase results in ultra-low error rates. Q5 DNA Polymerase is composed of a novel polymerase that is fused to the processivity-enhancing Sso7d DNA binding domain, improving speed, fidelity and reliability of performance. According to some embodiments, the high-fidelity DNA polymerase is suitable for GC-enriched DNA regions. According to some embodiments, the DNA polymerase includes, but is not limited to, any one or more of the following: Q5 High-Fidelity (HF) DNA Polymerase, Advantage® GC Genomic LA Polymerase (Takara), PrimeSTAR® GXL DNA Polymerase (Takara), AccuPrime™ GC-Rich DNA Polymerase (Invitrogen), Platinum SuperFi II DNA Polymerase (Thermo Fisher Scientific), and KAPA2G Robust HotStart PCR Kit. Still further, in some specific embodiments, a Q5 high-fidelity DNA polymerase is used in the present polymerization reaction. In some embodiments, at least one DNA polymerase and dNTPs are added to the hybridized MIP for performing the polymerization reaction. More specifically, in some embodiments, the reaction mixture as referred to herein may comprise any suitable elements required for the polymerization reaction.
In some embodiments, at least one ligase is added to the reaction. In yet some further embodiments, the polymerization and/or ligation reaction is performed by incubating in a thermal cycler. More specifically, DNA ligase, as used herein, is an enzyme that catalyzes the NAD-dependent ligation of adjacent 3'-hydroxyl and 5'-phosphate termini in duplex DNA structures. Derived from a thermophilic bacterium, Ampligase DNA Ligase is stable and active at much higher temperatures than conventional DNA ligases. The half-life of Ampligase DNA Ligase is 48 hours at 65°C and more than 1 hour at 95°C. In most cases, the upper limit on reaction temperatures with Ampligase DNA Ligase is determined by the Tm of the DNA substrate. Under conditions of maximal hybridization stringency, nonspecific ligation is nearly eliminated. Ampligase DNA Ligase has no detectable activity on blunt ends or RNA substrates. The enzyme is active in a variety of DNA polymerase buffers within a pH range of 7-8. It should be understood that any ligase may be used for the disclosed method. Still further, in some embodiments, the polymerization and ligation may be performed at an appropriate temperature for a suitable period of time.
In some embodiments, the methods according to the present disclosure are performed using a thermocycler. A thermocycler (also known as a thermal cycler, PCR machine or DNA amplifier), as used herein, is a laboratory apparatus most commonly used to amplify segments of DNA via the polymerase chain reaction (PCR). Thermal cyclers may also be used in laboratories to facilitate other temperature-sensitive reactions, including enzymatic reactions (polymerization, exonuclease, restriction enzyme digestion, ligation). The device has a thermal block with holes where tubes holding the reaction mixtures can be inserted. The cycler then raises and lowers the temperature of the block in discrete, pre-programmed steps. The ramp rate of a thermal cycler indicates the change in temperature from one PCR step to another over time and is usually expressed in degrees Celsius per second (°C/sec). The terms “up ramp” and “down ramp” refer to the heating and cooling of thermal blocks, respectively.
It should be understood that in some embodiments where the target nucleic acid sequence is an RNA, prior to the hybridization reaction, the nucleic acid molecules are converted into DNA molecules, specifically, cDNA molecules, by reverse transcription, for example by using reverse transcriptase. As indicated above, in some embodiments, the disclosed methods may comprise an amplification step that may be performed by any suitable amplification method. In some particular and non-limiting embodiments, the amplification is performed using a PCR reaction.
"Polymerase chain reaction" or "PCR" means a reaction for the in vitro amplification of specific DNA sequences by the simultaneous primer extension of complementary strands of DNA, as is notoriously well known in the art. In other words, PCR is a reaction for making multiple copies or replicates of a target nucleic acid flanked by primer binding sites, such reaction comprising one or more repetitions of the following steps: (i) denaturing the target nucleic acid, (ii) annealing primers to the primer binding sites, and (iii) extending the primers by a nucleic acid polymerase in the presence of nucleoside triphosphates. Usually, the reaction is cycled through different temperatures optimized for each step in a thermal cycler instrument. Particular temperatures, durations at each step, and rates of change between steps depend on many factors well-known to those of ordinary skill in the art. For example, in a conventional PCR using Taq DNA polymerase, a double stranded target nucleic acid may be denatured at a temperature >90°C, primers annealed at a temperature in the range 50-75 °C, and primers extended at a temperature in the range 72-78°C. The term "PCR" encompasses derivative forms of the reaction, including but not limited to, RT-PCR, real-time PCR, nested PCR, quantitative PCR, multiplexed PCR, and the like.
Thus in some embodiments, the methods of the present disclosure may comprise any one of real-time PCR, nested PCR, quantitative PCR, multiplexed PCR, and the like.
Reaction volumes range from a few hundred nanoliters, e.g., 200 nl, to a few hundred µl, e.g., 200 µl. "Reverse transcription PCR" or "RT-PCR" means a PCR that is preceded by a reverse transcription reaction that converts a target RNA to a complementary single stranded DNA, which is then amplified. "Nested PCR" means a two-stage PCR wherein the amplicon of a first PCR becomes the sample for a second PCR using a new set of primers, at least one of which binds to an interior location of the first amplicon. As used herein, "initial primers" in reference to a nested amplification reaction mean the primers used to generate a first amplicon, and "secondary primers" mean the one or more primers used to generate a second, or nested, amplicon. "Multiplexed PCR" means a PCR wherein multiple target sequences are simultaneously amplified in the same reaction mixture.
In some embodiments, the methods of the present disclosure may comprise deduplicated PCR or deduplication of PCR (for example as shown in Example 8). In some embodiments, UMIs may be used for deduplicated PCR or deduplication of PCR. In next-generation sequencing (NGS) or high-throughput sequencing, duplicate DNA fragments can be generated during the PCR amplification process. These duplicates may arise due to various factors, such as biased amplification, template overamplification, or other technical artifacts. The presence of duplicate fragments can affect the accuracy of downstream analyses, particularly variant calling and quantification. Deduplicated PCR involves the identification and removal of these duplicate DNA fragments to ensure more accurate and reliable data analysis. For instance: 1) Removal of false positives: by detecting sequencing reads that originated from the same molecule, one can create a consensus sequence per original molecule and remove errors generated by, e.g., amplification and sequencing. 2) Detection of low allele frequencies: through the deduplication of reads, one can achieve a more precise depth count per base. This becomes particularly valuable when dealing with low-abundance bases, as in applications like somatic mutation calling in cancer research. UMIs may be used for deduplication of PCR noise based on the UMI sequence, which is part of the probe and hence may enable deduplication of the amplification noise (Kivioja, T., et al. Nat Methods 9, 72-74 (2012)).
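The two benefits described above (consensus building and a more precise depth count) can be sketched as follows. This is an illustrative simplification, not the procedure of Example 8: reads are assumed to arrive as hypothetical `(umi, sequence)` pairs, families are grouped by UMI alone (a real pipeline would typically also key on the probe or target), and a simple per-position majority vote stands in for whatever consensus rule is actually used.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Collapse reads sharing a UMI into one consensus sequence per UMI.

    reads: iterable of (umi, sequence) tuples.
    Returns a dict mapping each UMI to its majority-vote consensus.
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        # Compare only the positions covered by every read in the family.
        length = min(len(s) for s in seqs)
        consensus[umi] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus


if __name__ == "__main__":
    reads = [
        ("AACG", "ACGTAC"),
        ("AACG", "ACGTAC"),
        ("AACG", "ACTTAC"),  # single-base error in one PCR duplicate
        ("GGTA", "ACGTAC"),
    ]
    collapsed = umi_consensus(reads)
    print(collapsed)
    print(len(collapsed), "unique molecules from", len(reads), "reads")
```

The number of UMI families, rather than the raw read count, then serves as the deduplicated depth at a position.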
In some embodiments, the number of PCR cycles may be optimized (for example as shown in Example 6) specifically the number of cycles may be 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50. In some specific embodiments, the number of PCR cycles may be 20 or 22 or 24.
As indicated above, for separating the cyclized products obtained in the polymerization and ligation step from any linear MIPs or other linear nucleic acid molecules that may be present in the reaction mixture, the disclosed method may optionally comprise an additional step of enzymatic digestion. However, it should be appreciated that in some embodiments, the digestion involves the use of at least one exonuclease. The term "exonucleases" refers to enzymes that catalyze the removal of nucleotides in either the 5-prime to 3-prime or the 3-prime to 5-prime direction from the ends of single-stranded and/or double-stranded DNA. Removal of nucleotides is achieved by cleavage of phosphodiester bonds via hydrolysis. Most exonucleases digest at nicks in the DNA. Some exonucleases remove one base at a time. Lambda Exonuclease is an example of this and transforms double-stranded DNA into single-stranded DNA by chewing from the free end containing a 5-prime phosphate, degrading one strand preferentially but not the other. Other examples are Exo I and Exo III. Other exonucleases, such as T5, Exo V or Exo VII, remove short oligos. The products of T5 Exo also include individual bases. Exonucleases such as Exo VII and Exo V digest in both the 5-prime to 3-prime and 3-prime to 5-prime directions, while others, such as Exo T and Exo I, only work in one direction. Some exonucleases, such as Exo I and Exo T, only digest single-stranded DNA while leaving behind double-stranded DNA. Exonucleases such as T7 Exo digest only double-stranded DNA, while others, such as T5 Exo and Exo V, can digest both single and double-stranded DNA. In more specific embodiments, Exonuclease I and/or Exonuclease III are used. In some embodiments, any form of linear MIP probe and/or nucleic acid sequence is removed following the gap-fill reaction by digestion with a combination of exonucleases. The exonuclease mixture contains exonuclease I and exonuclease III. Exonuclease I may digest single-stranded DNA in a 3'→5' direction and requires a free 3'-hydroxyl terminus, but does not digest double-stranded DNA. Exonuclease III is a 3'-exonuclease which catalyzes the removal of mononucleotides from the 3'-OH end of double-stranded DNA. It also dephosphorylates DNA strands which possess a 3'-phosphate group and has RNase H activity. Exonuclease VII digests DNA from free 3' or 5' ends. Exonuclease VII has been reported to have little activity on circularized DNA.
In accordance with the present disclosure, simultaneous targeted sequencing of at least two samples may be performed wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
In some specific embodiments, the methods according to the present disclosure may enable the mixing/pooling of between 2 to about 100,000 samples. For example, 2 to 90,000, 2 to 85,000, 2 to 80,000, 2 to 75,000, 2 to 70,000, 2 to 65,000, 2 to 60,000, 2 to 55,000, 2 to 50,000, 2 to 45,000, 2 to 40,000, 2 to 35,000, 2 to 30,000, 2 to 25,000, 2 to 20,000, 2 to 15,000, 2 to 10,000, 2 to 9000, 2 to 8500, 2 to 8000, 2 to 7500, 2 to 7000, 2 to 6500, 2 to 6000, 2 to 5500, 2 to 5000, 2 to 4500, 2 to 4000, 2 to 3500, 2 to 3000, 2 to 2500, 2 to 2000, 2 to 1500, 2 to 1000, 2 to 950, 2 to 900, 2 to 850, 2 to 800, 2 to 750, 2 to 700, 2 to 650, 2 to 600, 2 to 550, 2 to 500, 2 to 450, 2 to 400, 2 to 350, 2 to 300, 2 to 250, 2 to 200, 2 to 150, 2 to 100, 2 to 95, 2 to 90, 2 to 85, 2 to 80, 2 to 75, 2 to 70, 2 to 65, 2 to 60, 2 to 55, 2 to 50, 2 to 45, 2 to 40, 2 to 35, 2 to 30, 2 to 25, 2 to 20, 2 to 15, 2 to 10, specifically 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200, 250, 500, 1000, 10,000, 100,000 or more samples. In some specific embodiments, the methods according to the present disclosure may enable the mixing/pooling of 16 or 24 or 48 or 96 samples.
In some embodiments, the method for simultaneous targeted sequencing of at least two samples may further comprise a demultiplexing step or procedure. In some further embodiments, the method for simultaneous targeted sequencing of at least two samples may comprise an in silico demultiplexing step or procedure. As mentioned above, the sample identifier index is sequenced as part of the sequencing procedure and allows the allocation of the sequencing data to the original sample from which it originated, in an in silico procedure termed “demultiplexing”. As defined herein, in some embodiments, the mixing/pooling of the different product/s mentioned above may be performed in several ways that will result in either conserving the initial volume of reaction, reducing the initial volume of reaction, or increasing the initial volume of reaction. For example, in some embodiments, after every pooling step (that is carried out in the same volume, from every reaction), the following reaction may be processed at a single initial reaction volume.
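The in silico demultiplexing described above can be sketched in a few lines. The read layout here is an assumption made for illustration only: the sample identifier index is taken to sit at a fixed, known offset within each read, and matching is exact (no tolerance for sequencing errors). The function name `demultiplex` and the index sequences are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample, index_start, index_len):
    """Assign each read to a sample based on its embedded sample index.

    reads: iterable of read sequences (strings).
    index_to_sample: dict mapping index sequence -> sample name.
    index_start, index_len: assumed fixed position of the index in the read.
    Reads whose index is not recognized go to "undetermined".
    """
    bins = defaultdict(list)
    for read in reads:
        index = read[index_start:index_start + index_len]
        bins[index_to_sample.get(index, "undetermined")].append(read)
    return dict(bins)


if __name__ == "__main__":
    index_to_sample = {"ACGT": "sample_1", "TTAG": "sample_2"}
    pooled_reads = ["ACGTGGGCCC", "TTAGGGGCCC", "ACGTAAATTT", "CCCCGGGCCC"]
    for sample, sample_reads in demultiplex(pooled_reads, index_to_sample, 0, 4).items():
        print(sample, sample_reads)
```

Because the index travels inside the probe, reads from all pooled samples can share one amplification and one sequencing run and still be separated afterwards.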
In some embodiments, the pooling of samples of higher expected VAF% may be performed at lower volume and the pooling of samples of lower expected VAF% may be performed at higher volume as shown in Example 7. In some embodiments, the ratios between the volume of samples of higher expected VAF% to sample with lower expected VAF% may be about 1:1.5, 1:2, 1:3, 1:3.5, 1:4.0, 1:4.5, 1:5.0, 1:5.5, 1:6.0, 1:6.5, 1:7.0, 1:7.5, 1:8.0, 1:8.5, 1:9.0, 1:9.5, 1:10.0, 1:10.5, 1:11.0, 1:11.5, 1:12.0, 1:12.5, 1:13.0, 1:13.5, 1:14.0, 1:14.5, 1:15.0, 1:15.5, 1:16.0, 1:16.5, 1:17.0, 1:17.5, 1:18.0, 1:18.5, 1:19.0, 1:19.5, 1:20.0, 1:20.5, 1:21.0, 1:21.5, 1:22.0, 1:22.5, 1:23.0, 1:23.5, 1:24.0, 1:24.5, 1:25.0, 1:25.5, 1:26.0, 1:26.5, 1:27.0, 1:27.5, 1:28.0, 1:28.5, 1:29.0, 1:29.5, 1:30.0, 1:30.5, 1:31.0, 1:31.5, 1:32.0, 1:32.5, 1:33.0, 1:33.5, 1:34.0, 1:34.5, 1:35.0, 1:35.5, 1:36.0, 1:36.5, 1:37.0, 1:37.5, 1:38.0, 1:38.5, 1:39.0, 1:39.5, 1:40.0, 1:40.5, 1:41.0, 1:41.5, 1:42.0, 1:42.5, 1:43.0, 1:43.5, 1:44.0, 1:44.5, 1:45.0, 1:45.5, 1:46.0, 1:46.5, 1:47.0, 1:47.5, 1:48.0, 1:48.5, 1:49.0, 1:49.5, 1:50.0, 1:50.5, 1:51.0, 1:51.5, 1:52.0, 1:52.5, 1:53.0, 1:53.5, 1:54.0, 1:54.5, 1:55.0, 1:55.5, 1:56.0, 1:56.5, 1:57.0, 1:57.5, 1:58.0, 1:58.5, 1:59.0, 1:59.5, 1:60.0, 1:60.5, 1:61.0, 1:61.5, 1:62.0, 1:62.5, 1:63.0, 1:63.5, 1:64.0, 1:64.5, 1:65.0, 1:65.5, 1:66.0, 1:66.5, 1:67.0, 1:67.5, 1:68.0, 1:68.5, 1:69.0, 1:69.5, 1:70.0, 1:70.5, 1:71.0, 1:71.5, 1:72.0, 1:72.5, 1:73.0, 1:73.5, 1:74.0, 1:74.5, 1:75.0, 1:75.5, 1:76.0, 1:76.5, 1:77.0, 1:77.5, 1:78.0, 1:78.5, 1:79.0, 1:79.5, 1:80.0, 1:80.5, 1:81.0, 1:81.5, 1:82.0, 1:82.5, 1:83.0, 1:83.5, 1:84.0, 1:84.5, 1:85.0, 1:85.5, 1:86.0, 1:86.5, 1:87.0, 1:87.5, 1:88.0, 1:88.5, 1:89.0, 1:89.5, 1:90.0, 1:90.5, 1:91.0, 1:91.5, 1:92.0, 1:92.5, 1:93.0, 1:93.5, 1:94.0, 1:94.5, 1:95.0, 1:95.5, 1:96.0, 1:96.5, 1:97.0, 1:97.5, 1:98.0, 1:98.5, 1:99.0, 1:99.5 or 1:100.0.
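As a worked illustration of the volume ratios listed above, the sketch below computes per-sample pooling volumes from a chosen ratio. The volumes, sample names and the function `pooling_volumes` are hypothetical and are not taken from Example 7; the only rule applied is the one stated in the text, namely that samples of higher expected VAF% are pooled at the lower volume.

```python
def pooling_volumes(expected_vaf_class, high_vaf_volume_ul, ratio):
    """Return the volume (in µl) to pool for each sample.

    expected_vaf_class: dict mapping sample name -> "high" or "low".
    high_vaf_volume_ul: volume used for samples of higher expected VAF%.
    ratio: the X in a 1:X (high-VAF : low-VAF) volume ratio.
    """
    volumes = {}
    for sample, vaf_class in expected_vaf_class.items():
        if vaf_class == "high":
            volumes[sample] = high_vaf_volume_ul
        elif vaf_class == "low":
            volumes[sample] = high_vaf_volume_ul * ratio
        else:
            raise ValueError(f"unknown class for {sample}: {vaf_class}")
    return volumes


if __name__ == "__main__":
    samples = {"s1": "high", "s2": "low", "s3": "low"}
    volumes = pooling_volumes(samples, high_vaf_volume_ul=2.0, ratio=4.0)
    print(volumes)                # per-sample volumes at a 1:4 ratio
    print(sum(volumes.values()))  # total pooled volume
```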
The concentration of DNA starting materials (i.e. concentration of target nucleic acid sequence originating from at least one sample mentioned in step (a) of the method of the present disclosure) may be optimized as described in Example 5. In some embodiments the concentration of target nucleic acid sequence originating from at least one sample may range from 1 ng/µl to 10000 ng/µl, or from 1 ng/µl to 100 ng/µl, 10 ng/µl to 100 ng/µl, 20 ng/µl to 100 ng/µl, 30 ng/µl to 100 ng/µl, 40 ng/µl to 100 ng/µl, 50 ng/µl to 100 ng/µl, 60 ng/µl to 100 ng/µl, 70 ng/µl to 100 ng/µl, 80 ng/µl to 100 ng/µl, 90 ng/µl to 100 ng/µl, or 100 ng/µl to 10000 ng/µl, 200 ng/µl to 10000 ng/µl, 300 ng/µl to 10000 ng/µl, 400 ng/µl to 10000 ng/µl, 500 ng/µl to 10000 ng/µl, 600 ng/µl to 10000 ng/µl, 700 ng/µl to 10000 ng/µl, 800 ng/µl to 10000 ng/µl, 900 ng/µl to 10000 ng/µl, 1000 ng/µl to 10000 ng/µl, or 100 ng/µl to 1000 ng/µl, 100 ng/µl to 900 ng/µl, 100 ng/µl to 800 ng/µl, 100 ng/µl to 700 ng/µl, 100 ng/µl to 600 ng/µl, 100 ng/µl to 500 ng/µl, 100 ng/µl to 400 ng/µl, 100 ng/µl to 300 ng/µl, 100 ng/µl to 200 ng/µl. In some other embodiments, the concentration of target nucleic acid sequence originating from at least one sample may be about 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 ng/µl, or about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 ng/µl or about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000 ng/µl.
The concentration of DNA material present in each step of the methods (i.e. concentration of hybridization or cyclized or non-digested cyclized product/s of the method of the present disclosure) may also be optimized as shown in Example 6. In some embodiments the concentration of hybridization or cyclized or non-digested cyclized product/s may range from 1 ng/µl to 10000 ng/µl, or from 1 ng/µl to 100 ng/µl, 10 ng/µl to 100 ng/µl, 20 ng/µl to 100 ng/µl, 30 ng/µl to 100 ng/µl, 40 ng/µl to 100 ng/µl, 50 ng/µl to 100 ng/µl, 60 ng/µl to 100 ng/µl, 70 ng/µl to 100 ng/µl, 80 ng/µl to 100 ng/µl, 90 ng/µl to 100 ng/µl, or 100 ng/µl to 10000 ng/µl, 200 ng/µl to 10000 ng/µl, 300 ng/µl to 10000 ng/µl, 400 ng/µl to 10000 ng/µl, 500 ng/µl to 10000 ng/µl, 600 ng/µl to 10000 ng/µl, 700 ng/µl to 10000 ng/µl, 800 ng/µl to 10000 ng/µl, 900 ng/µl to 10000 ng/µl, 1000 ng/µl to 10000 ng/µl, or 100 ng/µl to 1000 ng/µl, 100 ng/µl to 900 ng/µl, 100 ng/µl to 800 ng/µl, 100 ng/µl to 700 ng/µl, 100 ng/µl to 600 ng/µl, 100 ng/µl to 500 ng/µl, 100 ng/µl to 400 ng/µl, 100 ng/µl to 300 ng/µl, 100 ng/µl to 200 ng/µl. In some other embodiments, the concentration of hybridization or cyclized or non-digested cyclized product/s may be about 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 ng/µl, or about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 ng/µl or about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000 ng/µl.
In yet some further embodiments, the disclosed method further comprises a sequencing step.
In some more specific embodiments, the methods of the present disclosure further comprise sequencing the amplified nucleic acid sequences obtained in step (d).
More specifically, the synthesized sequences obtained by the disclosed methods are subjected in some optional embodiments to any suitable sequencing method. Sequencing of the target sequence thus allows various variants of the analyzed target sequence to be defined. As used herein, “DNA sequencing” or “sequencing” is the process of determining the nucleic acid sequence, i.e., the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. Several methods for DNA sequencing were developed and became commercially available in the past two decades. Together these were called the "next-generation" or "second-generation" sequencing (NGS) methods, in order to distinguish them from the earlier methods, including Sanger sequencing. NGS technology is typically characterized by being highly scalable, allowing the entire genome to be sequenced at once. Usually, this is accomplished by fragmenting the genome into small pieces, randomly sampling for a fragment, and sequencing it using one of a variety of technologies. Sequencing of an entire genome is possible because multiple fragments are sequenced at once (giving it the name "massively parallel" sequencing) in an automated process. More specifically, NGS generates large quantities of sequence data in a shorter time and at a massively reduced cost as compared to the conventional Sanger sequencing method. This technique uses different chemistries, matrices and bioinformatics technologies which can be used to sequence an entire genome in shorter time periods. A DNA sequencing pipeline includes various steps, namely DNA fragmentation and NGS library preparation (these two can be combined by transposase-mediated library preparation), sequencing, and data analysis. In DNA fragmentation, targeted DNA is broken into several small segments using different methods like sonication and enzymatic digestion.
The next step involves the preparation of an NGS library, wherein each piece of the fragmented DNA is modified to be sequencing-ready, namely by adding DNA sequences (adapters) that are required for sequencing instrument compatibility. In some embodiments of DNA sequencing, generally termed “targeted sequencing”, the desired target is captured after library preparation (“probe capture”) or amplified from the genomic template (“amplicon/MIP”). The library is sequenced using the various DNA sequencing methods. Each DNA fragment has an adapter on one end that connects it to a solid substrate such as beads or flow cells, and another adapter on the other end that anneals to a primer that starts the polymerase chain reaction (PCR). PCR produces several copies of the same fragment, which are sequenced at the same time. As a result, these techniques are sometimes referred to as massively parallel sequencing techniques. DNA sequencing may be performed in some embodiments using an NGS sequencer. In a specific sequencer, the library is loaded onto a sequencing matrix. The platform on which the sequencing takes place is known as a sequencing matrix. Sequencing matrices differ depending on the sequencer. For example, the Illumina NGS sequencer uses flow cells, while the Ion Torrent NGS sequencer uses sequencing chips.
Several generations of sequencing methods have been developed. The present disclosure encompasses the use of any known method. To name but a few, Pyrosequencing / 454 Sequencing, ABI SOLiD, Solexa/Illumina Sequencing, Pacific Biosciences Single Molecule Real Time Reads, Nanopore DNA Sequencing, Singular Genomics G4, Element Biosciences AVITI, Ultima Genomics.
The required short segments are isolated using different methods such as Hybridization Capture Assay, Amplicon Assay. Still further, in some embodiments, the disclosed method may further comprise applying machine learning algorithm on the identified variants or a subgroup thereof, for calculating sensitivity, specificity and precision thereof.
In yet another embodiment, the method further comprises identification of the amplified nucleic acid sequences obtained in step (d) via array-based hybridization approaches.
The methods described herein may be used along with next-generation sequencing; they may also be used with other downstream methods such as microarrays, counting by digital PCR, real-time PCR, mass-spectrometry analysis, etc.
In some embodiments, the at least one MIP is a plurality of MIPs comprising at least one unified sample identifier index, wherein each of the MIPs targets a different target nucleic acid sequence of interest.
In some embodiments, the MIP suitable for the method of the present disclosure may be as further defined below in the following aspects of the present disclosure.
Still further, the plurality of MIPs suitable for the method of the present disclosure may be as further defined below in the following relevant aspect.
In some embodiments, the method of the present disclosure may further comprise identifying variants of interest.
In another embodiment, the method of the present disclosure may further comprise applying a machine learning algorithm on the identified variants or a subgroup thereof, for calculating sensitivity, specificity and precision thereof. In some further embodiments, the subgroup of variants comprises variants having a Variant Allele Frequency (VAF) below a threshold. The present disclosure thus provides a sensitive and improved method displaying noise reduction allowing detection of variants with VAF as low as 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1%, more specifically, between 0.5% to 0.6%, specifically, 0.51%, 0.52%, 0.53%, 0.54%, 0.55%, 0.56%, 0.57%, 0.58%, 0.59%, 0.6%, or less, specifically, 0.5%, with sensitivity of about 100% to 75%, specifically, 100%, 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 79%, 78%, 77%, 76%, 75%, or less, and specifically, 80% sensitivity and significantly higher precision.
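As a rough illustration of the quantities named above, the following Python sketch computes VAF from read counts, selects the subgroup of variants below a VAF threshold, and derives sensitivity, specificity and precision from labelled calls. The `VariantCall` record, its field names, and the availability of ground-truth labels are assumptions made for illustration only; the disclosure does not specify the machine learning algorithm, and none is implemented here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class VariantCall:
    """Hypothetical record for one candidate variant (illustrative only)."""
    alt_reads: int         # reads supporting the variant allele
    total_reads: int       # all reads covering the position
    called: bool           # True if the pipeline reported the variant
    is_true_variant: bool  # ground-truth label, assumed to be available

    @property
    def vaf(self) -> float:
        """Variant allele frequency as a fraction between 0 and 1."""
        if self.total_reads <= 0:
            raise ValueError("total_reads must be positive")
        return self.alt_reads / self.total_reads


def low_vaf_subgroup(calls: Iterable[VariantCall], threshold: float) -> List[VariantCall]:
    """Return the calls whose VAF is strictly below the threshold."""
    return [c for c in calls if c.vaf < threshold]


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Divide, returning None when the metric is undefined."""
    return numerator / denominator if denominator else None


def performance(calls: Iterable[VariantCall]) -> Dict[str, Optional[float]]:
    """Compute sensitivity, specificity and precision from labelled calls."""
    tp = fp = tn = fn = 0
    for c in calls:
        if c.called and c.is_true_variant:
            tp += 1
        elif c.called:
            fp += 1
        elif c.is_true_variant:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),  # TP / (TP + FN)
        "specificity": _ratio(tn, tn + fp),  # TN / (TN + FP)
        "precision": _ratio(tp, tp + fp),    # TP / (TP + FP)
    }
```

In this sketch a threshold of 0.005 corresponds to a VAF of 0.5%; metrics are returned as `None` when their denominator is zero rather than being guessed.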
The term “target nucleic acid sequence” or "target nucleic acid of interest", as used herein, refers to the sample nucleic acid putatively including a target sequence of interest. The target sequence of interest, with regard to a MIP, includes those sequences complementary to the MIP homology regions. The sequence may include one or more interrogated nucleotides that may or may not match a corresponding nucleotide on a MIP homology region, or may or may not provide a substrate for a polymerase provided with the complementary dNTP/s.
Still further, the terms "target nucleic acid sequence of interest", “nucleic acid sequence of interest”, "a target gene of interest", “a target gene", are used interchangeably, and refer in some embodiments to a nucleic acid sequence that may comprise, or be comprised within, a gene or any fragment or derivative thereof. The target nucleic acid sequence or gene of interest may comprise coding or non-coding DNA regions, or any combination thereof. In some embodiments, the nucleic acid sequence of interest may comprise coding sequences and thus may comprise exons or fragments thereof that encode any product. In other embodiments, the target nucleic acid sequence of interest may comprise non-coding sequences, as for example start codons, 5’ untranslated regions (5’ UTR), 3’ untranslated regions (3’ UTR), or other non-coding sequences, in particular regulatory sequences.
In some embodiments, the at least one target nucleic acid sequence of interest is at least one of a genomic nucleic acid sequence, a transcriptomic nucleic acid sequence, and a circulating free DNA (cfDNA).
In some further embodiments, the at least one target nucleic acid sequence of interest is a nucleic acid sequence associated with, or comprising, at least one of: genetic and/or epigenetic variation/s, pathologic disorder/s, infectious entity, microorganism/s and GC-rich regions.
In some embodiments, the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising GC-rich regions. As indicated herein, the disclosed methods are particularly effective and applicable for target nucleic acid sequences that comprise GC-rich regions or display high GC-content. GC-content (or guanine-cytosine content) is the percentage of nitrogenous bases in a DNA or RNA molecule that are either guanine (G) or cytosine (C). This measure indicates the proportion of G and C bases out of an implied four total bases, also including adenine and thymine in DNA and adenine and uracil in RNA.
GC-content may be given for a certain fragment of DNA or RNA or for an entire genome. When it refers to a fragment, it may denote the GC-content of an individual gene or section of a gene (domain), a group of genes or gene clusters, a non-coding region, or a synthetic oligonucleotide such as a primer. The GC content of a gene region can impact its coverage, with regions having 50-60% GC content receiving the highest coverage, while regions with high (70-80%) or low (30-40%) GC content have significantly decreased coverage.
In some alternative embodiments, the genetic variations comprise at least one of: single nucleotide variants (SNVs) and/or single-nucleotide polymorphisms (SNPs), insertions and/or deletions (indels), inversions, copy number variations (CNV), structural variations, alternative splicing, loss of heterozygosity (LOH), gene fusions, translocations, duplications and variable number of tandem repeats.
In some embodiments, the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising at least one of: single nucleotide variants (SNVs) and/or single-nucleotide polymorphisms (SNPs), insertions and/or deletions (indels), inversions, copy number variations (CNV), structural variations, alternative splicing, loss of heterozygosity (LOH), gene fusions, translocations, duplications and variable number of tandem repeats.
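The GC-content definition above can be sketched in a few lines of Python. This is a minimal illustration, not part of the disclosed method: the function names and the decision to ignore ambiguous bases such as `N` are assumptions, and the windowed variant is simply one possible way to look at the GC-content of sections of a longer fragment.

```python
from typing import List, Tuple


def gc_content(sequence: str) -> float:
    """Percentage of G and C among the A, C, G, T and U bases of a sequence.

    Works for DNA or RNA; any other character (e.g. N) is ignored, which is
    an assumption of this sketch.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T") + seq.count("U")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, T or U bases")
    return 100.0 * gc / total


def windowed_gc(sequence: str, window: int, step: int = 1) -> List[Tuple[int, float]]:
    """GC-content of consecutive windows as (0-based start, percent) pairs."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, gc_content(sequence[start:start + window]))
        for start in range(0, len(sequence) - window + 1, step)
    ]
```

For example, `gc_content("ATGC")` evaluates to `50.0`, and a window-by-window profile can help flag stretches that fall into the high or low GC ranges mentioned above.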
In some embodiments, the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising SNP. The term “single nucleotide polymorphism” (SNP) as herein defined, refers to a single base change in the DNA sequence. For a base position with sequence alternatives in genomic DNA to be considered as a SNP, the least frequent allele (the “minor allele”) should have a frequency of 1% or greater. The most frequent allele is referred to as the “major allele”. SNPs are usually bi-allelic, mainly due to the low frequency of single nucleotide substitutions in DNA. As known to a person skilled in the art, the term “SNP” usually refers to the least frequent allele (i.e. the minor allele), when present in the genome either on both chromosomes (then an individual is said to be homozygous for a certain polymorphism) or on a single chromosome (then an individual is said to be heterozygous for a certain polymorphism). Known specific SNPs are assigned unique identifiers, usually referred to by accession numbers with a prefix such as “SNP”, "refSNP" or "rs", as known to one of skill in the art. The Single Nucleotide Polymorphism database (dbSNP) of nucleotide sequence variation is available on the NCBI website.
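The frequency criterion described above can be illustrated with a short Python sketch. It assumes that allele counts for a single base position across a population are already available as a dictionary; the function names and the input format are hypothetical and are not taken from the disclosure.

```python
from typing import Dict, Optional, Tuple


def minor_allele(counts: Dict[str, int]) -> Optional[Tuple[str, float]]:
    """Return (allele, frequency) for the least frequent observed allele.

    Returns None when fewer than two alleles are observed, i.e. when the
    position shows no sequence alternatives.
    """
    observed = {allele: n for allele, n in counts.items() if n > 0}
    if len(observed) < 2:
        return None
    total = sum(observed.values())
    allele = min(observed, key=observed.get)
    return allele, observed[allele] / total


def meets_snp_criterion(counts: Dict[str, int], min_percent: int = 1) -> bool:
    """True when the minor allele frequency is at least min_percent."""
    observed = [n for n in counts.values() if n > 0]
    if len(observed) < 2:
        return False
    # Integer arithmetic avoids floating-point issues at the exact threshold.
    return min(observed) * 100 >= min_percent * sum(observed)
```

With counts of 990 for one allele and 10 for the other, the minor allele frequency is exactly 1% and the criterion is met; with 995 and 5 it is not.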
In some embodiments, the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising CNV. Copy-number variation, as used herein, means variation from one person to another in the number of copies of a particular gene or DNA sequence.
In some embodiments, the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising deletion. Deletion refers to any mutation that involves the loss of genetic material. It can be small, involving a single missing DNA base pair, or large, involving hundreds or thousands of nucleotides, and in some embodiments even a piece of a chromosome.
In some embodiments, the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising indel. Indel as referred to herein relates to an insertion or deletion of bases in the genome of an organism. It is classified among small genetic variations, measuring from 1 to 10,000 base pairs in length. A microindel is defined as an indel that results in a net change of 1 to 50 nucleotides.
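The size ranges given above can be expressed as a small Python helper. This is only a sketch: it assumes a variant is supplied as a pair of reference and alternate allele strings (a convention borrowed from common variant formats, not from the disclosure), and the returned labels are illustrative.

```python
from typing import Tuple


def classify_indel(ref: str, alt: str) -> Tuple[str, int, str]:
    """Classify a variant by the net change in length between ref and alt.

    Returns (kind, net_size, size_class), where kind is "insertion" or
    "deletion" and size_class follows the ranges in the text: a net change
    of 1 to 50 nucleotides is a microindel, and up to 10,000 an indel.
    """
    net = len(alt) - len(ref)
    if net == 0:
        raise ValueError("no net insertion or deletion between ref and alt")
    kind = "insertion" if net > 0 else "deletion"
    size = abs(net)
    if size <= 50:
        size_class = "microindel"  # a microindel is also an indel
    elif size <= 10_000:
        size_class = "indel"
    else:
        size_class = "larger than indel range"
    return kind, size, size_class
```

For instance, `classify_indel("A", "ATG")` describes a two-nucleotide insertion that falls in the microindel range.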
In some embodiments, the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising insertion mutation. Insertion mutation, as used herein, is a mutation involving the addition of genetic material. An insertion mutation can be small, involving a single extra DNA base pair, or large, involving a piece of one or more chromosomes.
In some embodiments, the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising inversion. An inversion is a chromosomal segment that has been broken off and reinserted in the same locus, but with the reverse orientation.
In some embodiments, the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising translocation. Translocation, as referred to herein, is the positional change of one or more chromosome segments in cells or gametes.
Still further, in some embodiments, the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising structural variations in nucleic acid molecules, for example, genomic organization or topological organization of nucleic acids. More specifically, although genomes are defined by their sequence, the linear arrangement of nucleotides is only their most basic feature. A fundamental property of genomes is their topological organization in three-dimensional space in the intact cell nucleus. The application of imaging methods and genome-wide biochemical approaches, combined with functional data, is revealing the precise nature of genome topology/organization and its regulatory functions in gene expression and genome maintenance. In the context of the subject disclosure, genomic organization refers to the linear order of DNA elements and their division into chromosomes. Genome organization can also refer to the 3D structure of chromosomes and the positioning of DNA sequences within the nucleus. There are several techniques to capture chromosome/chromatin conformation. One non-limiting example of a high-throughput genomic and epigenomic technique to capture chromatin conformation is the Hi-C (or standard Hi-C) technique. In general, Hi-C is considered a derivative of a series of chromosome conformation capture technologies, including but not limited to 3C (chromosome conformation capture), 4C (chromosome conformation capture-on-chip/circular chromosome conformation capture), and 5C (chromosome conformation capture carbon copy). Hi-C comprehensively detects genome-wide chromatin interactions in the cell nucleus by combining 3C and next-generation sequencing (NGS) approaches and has been considered a qualitative leap in C-technology (chromosome conformation capture-based technologies) development and the beginning of 3D genomics.
Still further, in some embodiments, the disclosed MIP-based targeted sequencing methods are particularly useful for target nucleic acid sequences comprising epigenetic modifications. Epigenetics, as referred to herein, relates to heritable phenotype changes that do not involve alterations in the nucleic acid sequence. Epigenetics most often involves changes that affect gene activity and expression, and thereby the phenotype of the cell. Epigenetic modifications or variations involve, in some embodiments, covalent modification of the DNA sequence or of proteins associated with DNA organization and functioning. In some embodiments, epigenetic variations as disclosed herein comprise DNA methylation (e.g. cytosine methylation and hydroxymethylation) and histone modifications (e.g. lysine acetylation, lysine and arginine methylation, serine and threonine phosphorylation, and lysine ubiquitination and sumoylation).
In some embodiments, the target gene or nucleic acid sequence of interest may be any nucleic acid sequence or gene or fragments thereof that display aberrant expression, stability, activity or function in a mammalian subject, as compared to a normal and/or healthy subject. Such target gene or any fragments thereof or any target nucleic acid sequence may be, in some embodiments, associated, linked or connected, directly or indirectly, with at least one pathologic condition. More specifically, the length of the nucleic acid sequence of interest may be about 100,000 nucleotides in length, or less than 75,000 nucleotides in length or less than 50,000 nucleotides in length, or less than 40,000 nucleotides in length, or less than 30,000 nucleotides in length, or less than 20,000 nucleotides in length, or less than 15,000 nucleotides in length, or less than 10,000 nucleotides in length, or less than 5000 nucleotides in length, or less than 1000 nucleotides in length, or less than 900 nucleotides in length, or less than 800 nucleotides in length, or less than 700 nucleotides in length, or less than 600 nucleotides in length, or less than 500 nucleotides in length, or less than 450 nucleotides in length, or less than 400 nucleotides in length, or less than 300 nucleotides in length, or less than 200 nucleotides in length, or less than 100 nucleotides in length, or less than 50 nucleotides in length, or less than 40 nucleotides in length, or less than 30 nucleotides in length, or less than 20 nucleotides in length, or less than 10 nucleotides in length.
It should be appreciated that in some embodiments the target nucleic acid sequence may be any genomic nucleic acid sequence. In some embodiments, the genomic nucleic acid sequence may include nuclear DNA and non-nuclear DNA, and may be either a linear or a circular nucleic acid. Examples include nuclear DNA, specifically, chromosomal DNA and microbiome DNA (e.g., gut microbiome), as well as circular genomic DNA such as mitochondrial DNA and chloroplast DNA (cpDNA). Still further, the genomic nucleic acid sequence may further include genomic nucleic acid molecules of any organism or microorganism as disclosed in the present disclosure, or any nucleic acid sequence of any infectious entity, for example, viruses, specifically, any viruses disclosed by the present disclosure, or any bacteriophages and transducing particles. In some embodiments, the target nucleic acid sequences may be of chromosomal or non-chromosomal source. Nucleic acid sequences of non-chromosomal source encompassed by the present disclosure include transposons, plasmids, mitochondrial DNA, and chloroplast DNA, as well as nucleic acid molecules of any other genetic element. Still further, in some embodiments, the target nucleic acid sequence applicable in the disclosed methods may be any circulating free DNA (cfDNA). More specifically, cell-free nucleic acids (cf-NAs) include several types of DNA (cf-DNA) and RNA molecules (cell-free non-coding RNAs, and protein-coding RNA - mRNA) that are present in extracellular fluids. There are two main types of cf-DNA: cell-free nuclear DNA (cf-nDNA) and cell-free mitochondrial DNA (cf-mtDNA). More specifically, circulating free DNA (cfDNA) molecules are degraded DNA fragments of about 50 to 200 bp that are released to the blood plasma. The term cfDNA can be used to describe various forms of DNA freely circulating in the bloodstream, including circulating tumor DNA (ctDNA), cell-free mitochondrial DNA (ccf mtDNA), and cell-free fetal DNA (cffDNA).
Still further, the target nucleic acid sequence applicable in the methods of the present disclosure may be, in some embodiments, cell-free non-coding RNA or long non-coding RNA. More specifically, cell-free non-coding RNAs (cf-ncRNAs) relate to small non-coding RNAs, including but not limited to microRNAs (miRNA), siRNA, piRNA, snRNA, snoRNA, YRNA etc., or long non-coding RNAs (lncRNAs), including but not limited to pseudogene RNA, telomerase RNA, circular RNA (circRNA), etc.
Still further, the target nucleic acid sequence applicable in the methods of the present disclosure may be long non-coding RNAs. Long non-coding RNAs (lncRNAs), as used herein, are non-protein-coding transcripts with a length of more than 200 nt. They can be transcribed from intergenic regions (long intervening non-coding RNAs), from the introns of protein-coding genes (intronic lncRNAs) or as antisense transcripts of genes. They have broad molecular functions: they may be involved in the epigenetic regulation of allelic expression (e.g., in X chromosome dosage compensation in female mammals), they may act as scaffolds for protein complexes or as decoys for specific target molecules to limit their availability (e.g., lncRNAs possess binding sites for miRNAs, regulating their abundance). They may also serve as precursors for small non-coding RNAs (sncRNA) or be involved in post-transcriptional gene regulation (e.g., antisense lncRNAs binding to their corresponding sense transcripts and altering splice-site recognition or spliceosome recruitment in mRNA processing).
In yet some further embodiments, the target sequence may be transcriptomic nucleic acid sequence, thereby providing information with respect to the transcriptome and/or the exome of an organism.
In some specific embodiments, the at least one target nucleic acid sequence of interest is associated with at least one hereditary, congenital, and/or somatic pathologic disorder or condition.
In some specific embodiments, the neoplastic disorder is Acute Myeloid Leukemia (AML). In some further specific embodiments, the at least one target nucleic acid sequence of interest is derived from a genomic DNA of a human subject prone to have AML.
In some specific embodiments, the target nucleic acid sequences of interest may comprise the genomic locus associated with AML. Thus, in some embodiments a genomic locus associated with AML may comprise the DNMT3A gene. In some further embodiments, said genomic locus may be chr2:25457097-25457316 (hg19 build). In some more specific embodiments, the target nucleic acid sequences of interest may comprise the R882 codon.
In some more specific embodiments, said different target nucleic acid sequences of interest may comprise the same genomic locus. In another embodiment, said genomic locus may be associated with AML, for example said genomic locus may be chr2:25457097-25457316 (hg19 build). In some embodiments, said different target nucleic acid sequences of interest may comprise the DNMT3A gene, e.g. said different target nucleic acid sequences of interest may comprise the R882 codon.
In some further embodiments, said genomic locus may be chr1:114713908. In some specific embodiments, the genomic locus may comprise the NRAS gene, e.g. may be associated with cancer.
In yet some other embodiments, said genomic locus may be chr17:7,675,206, e.g. may comprise the TP53 gene, e.g. may be associated with cancer.
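Genomic loci written in the chromosome:position or chromosome:start-end form used above, such as chr2:25457097-25457316, can be handled programmatically. The sketch below is illustrative only: the `Locus` type and `parse_locus` function are hypothetical names, and the treatment of coordinates as 1-based and inclusive is an assumption, since the disclosure does not state the coordinate convention.

```python
import re
from typing import NamedTuple


class Locus(NamedTuple):
    """A genomic interval; coordinates assumed 1-based and inclusive."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        """Number of positions covered under the inclusive convention."""
        return self.end - self.start + 1


# chromosome name, then a position or a start-end range; digits may
# contain thousands separators (commas).
_LOCUS_RE = re.compile(r"^(chr[0-9A-Za-z]+):([0-9,]+)(?:-([0-9,]+))?$")


def parse_locus(text: str) -> Locus:
    """Parse 'chrN:pos' or 'chrN:start-end' into a Locus."""
    match = _LOCUS_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized locus: {text!r}")
    chrom, start_text, end_text = match.groups()
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", "")) if end_text else start
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in locus: {text!r}")
    return Locus(chrom, start, end)
```

Under the inclusive assumption, the interval chr2:25457097-25457316 spans 220 positions, and a single-position locus has a length of 1. The genome build (e.g. hg19) is not encoded in the string and would need to be tracked separately.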
In some specific embodiments, the genomic locus may comprise the JAK2 gene, e.g. may be associated with cancer.
In yet another embodiment, the infectious entity may be at least one of a viral, a bacterial, a fungal, a parasitic and a protozoan pathogen.
In some embodiments, the at least one sample is a biological or environmental sample.
In some embodiments, the terms "sample", "test sample" and "specimen" are used interchangeably in the present specification and claims and are used in their broadest sense. They are meant to include both biological and environmental samples and may include an exemplar of synthetic origin. These terms refer to any media that may contain the at least one microorganism, e.g., a pathogen, and may include fluid, cell and/or tissue samples. In some embodiments herein, the biological sample is a fluid sample. Fluid samples include, but are not limited to, saliva, mucosa, feces, serum, urine, blood, plasma, cerebral spinal fluid (CSF), milk, bronchoalveolar lavage (BAL) fluid, rinse fluid obtained from wash of body cavities, phlegm, and pus. Still further, biological samples include samples taken from various body regions (nose, throat, vagina, ear, eye, skin, sores), food products (both solids and fluids), swabs taken from medicinal instruments, apparatus and materials, samples from various surfaces [hospitals, elderly homes, food manufacturing facilities, slaughterhouses, pharmaceutical equipment (catheters etc.), food preparation or packaging products, solutions and buffers], sewage etc.
More specifically, biological samples may be provided from animal, including human, fluid, solid (e.g., stool) or tissue, as well as liquid and solid food and feed products, food designed for human consumption, a sample including food designed for animal consumption, food matrices and ingredients such as dairy items, vegetables, meat and meat by-products, waste and sewage. In some embodiments, biological samples may include saliva, mucosa (nasal or oral swab samples), feces, serum, blood, urine, anterior nares specimens collected by a healthcare professional or by onsite or home self-collection, and throat swabs. Biological samples and specimens may be obtained from humans as well as from all of the various families of domestic animals, as well as feral or wild animals, including, but not limited to, such animals as ungulates, bears, birds, fish, lagomorphs, rodents, etc.
Still further, environmental samples include environmental material such as surface matter, earth, soil, water, air and industrial samples, as well as samples obtained from food and dairy processing instruments, apparatus, equipment, utensils, disposable and non-disposable items. These examples are not to be construed as limiting the sample types applicable to the present disclosure. The sample may be any media, specifically, a liquid media that may contain the target nucleic acid molecules or sequences. Typically, substances, surfaces and samples or specimens that are a priori not liquid may be contacted with a liquid media which is used and tested by the methods disclosed herein.
In some embodiments, the at least one sample is a biological sample and originates from a subject.
In some embodiments, the at least two samples are biological samples and originated from the same subject or different subjects.
In some embodiments, the subject is at least one organism of the biological kingdom Animalia or at least one organism of the biological kingdom Plantae. The methods of the present disclosure may be applicable for any subject of the biological kingdom Animalia. It should be understood that an organism of the Animalia kingdom in accordance with the present disclosure includes any invertebrate or vertebrate organism. In some embodiments, the methods of the present disclosure may be applicable for an invertebrate organism. More specifically, invertebrates are animals that neither possess nor develop a vertebral column (commonly known as a backbone or spine), derived from the notochord. This includes all animals apart from the subphylum Vertebrata. More specifically, invertebrates include the Phylum Porifera - Sponges, the Phylum Cnidaria - Jellyfish, hydras, sea anemones, corals, the Phylum Ctenophora - Comb jellies, the Phylum Platyhelminthes - Flatworms, the Phylum Mollusca - Molluscs, the Phylum Arthropoda - Arthropods, the Phylum Annelida - Segmented worms like earthworms, and the Phylum Echinodermata - Echinoderms. Familiar examples of invertebrates include insects; crabs, lobsters and their kin; snails, clams, octopuses and their kin; starfish, sea-urchins and their kin; jellyfish and worms.
Still further, in some embodiments, the methods of the present disclosure may be applicable for a vertebrate organism. Vertebrates comprise all species of animals within the subphylum Vertebrata (chordates with backbones). The animals of the vertebrates group include Fish, Amphibians, Reptiles, Birds and Mammals (e.g., Marsupials, Primates, Rodents and Cetaceans).
Vertebrates represent the overwhelming majority of the phylum Chordata, with currently about 66,000 species described. Vertebrates include the jawless fish and the jawed vertebrates, which include the cartilaginous fish (sharks, rays, and ratfish) and the bony fish.
Still further, in some embodiments, the subject of the present disclosure may be any one of a human or non-human mammal, an avian, an insect, a fish, an amphibian, a reptile, a crustacean, a crab, a lobster, a snail, a clam, an octopus, a starfish, a sea-urchin, jellyfish, and worms.
In more specific embodiments, the subject referred to herein may be a mammal. In yet some further embodiments, such mammalian organisms may include any member of the nineteen mammalian orders, specifically, Order Artiodactyla (even-toed hoofed animals), Order Carnivora (meat-eaters), Order Cetacea (whales and porpoises), Order Chiroptera (bats), Order Dermoptera (colugos or flying lemurs), Order Edentata (toothless mammals), Order Hyracoidea (hyraxes, dassies), Order Insectivora (insect-eaters), Order Lagomorpha (pikas, hares, and rabbits), Order Marsupialia (pouched animals), Order Monotremata (egg-laying mammals), Order Perissodactyla (odd-toed hoofed animals), Order Pholidota, Order Pinnipedia (seals and walruses), Order Primates (primates), Order Proboscidea (elephants), Order Rodentia (gnawing mammals), Order Sirenia (dugongs and manatees), Order Tubulidentata (aardvarks). In yet some further embodiments, the present disclosure may be applicable for any organism of the order Primates. More specifically, primates are divided into two distinct suborders: the first is the strepsirrhines, which includes lemurs, galagos, and lorisids; the second is the haplorhines, which includes the tarsier, monkey, and ape clades, the last of these including humans. In yet some further embodiments, the present disclosure may be applicable for any organism of the superfamily Hominoidea, which includes the Hylobatidae (gibbons) and the Hominidae, the latter including the Ponginae (orangutans) and the Homininae [Gorillini (gorillas) and Hominini (Panina (chimpanzees) and Hominina (humans))].
In some specific embodiments, the methods of the present disclosure may be applicable for a mammal that may be any domestic mammal, for example, at least one of cattle, domestic pig (swine, hog), sheep, horse, goat, alpaca, llama and camel. Still further, in some embodiments, the mammalian subject is a human subject.
As mentioned above, the present disclosure concerns any eukaryotic organism and as such, may be also applicable for members of the biological kingdom Plantae.
In more specific embodiments, the disclosed methods may be applicable for any plant. In more specific embodiments, such plant may be a dioecious plant or monoecious plant.
More specifically, in some embodiments the organism of the biological kingdom Plantae may be a dioecious plant, specifically, a plant presenting biparental reproduction. In some specific embodiments, the plant diagnosed by the disclosed methods may be of the family Cannabaceae, specifically, any one of Cannabis (hemp, marijuana) and Humulus (hops). In more specific embodiments, the plant of the family Cannabaceae may be Cannabis (hemp, marijuana). In yet some further embodiments, the plant of the family Cannabaceae may be Humulus (hops).
In some embodiments, any plants are applicable in the present disclosure, for example, any model plants such as Arabidopsis, tobacco, Solanum lycopersicum, and Solanum tuberosum.
In yet some further embodiments, canola, cereals (corn, wheat, barley), rice, sugarcane, beet, cotton, banana, cassava, sweet potato, lentils, chickpea, peas, soy, nuts, peanuts, Lemna, and apple may be applicable in the present disclosure.
A non-comprehensive list of useful annual and perennial, domesticated or wild, monocotyledonous or dicotyledonous land plants or algae (e.g., unicellular or multicellular algae including diatoms, microalgae, ulva, nori, gracilaria) applicable in accordance with the present disclosure may include, but is not limited to, crops, ornamentals, herbs (e.g., Labiatae such as sage, basil and mint, or lemon grass, chives), grasses (e.g., lawn and biofuel grasses and animal feed grasses), cereals (e.g., rice, wheat, rye, oats, corn), legumes (e.g., soy, beans, lentils, chick peas, peas, peanuts), leafy vegetables (e.g., kale, bok-choi, cress, lettuce, spinach, cabbage), Amaranthaceae (e.g., sugar beet, beet, quinoa, spinach), Compositae (e.g., sunflower, lettuce, aster), Malvaceae (e.g., cotton, cacao, okra, hibiscus), cucurbits (e.g., cucumber, squash, melon, watermelon), Solanaceous species (e.g., tobacco, potato, tomato, petunia and pepper), Umbelliferae (e.g., carrot, celery, dill, parsley, cumin), Cruciferae (e.g., oilseed rape, mustard, brassicas, cauliflower, radish), sesame, the monocot Asparagales (e.g., onion, garlic, leek, asparagus, vanilla, lilies, tulips, narcissus), Myrtaceae (e.g., Eucalyptus, pomegranate, guava), subtropical fruit trees (e.g., avocado, mango, litchi, papaya), Citrus (e.g., orange, lemon, grapefruit), Rosaceae (e.g., apple, cherry, plum, almond, roses), berry-plants (e.g., grapes, mulberries, blueberries, raspberry, strawberry), nut trees (e.g., macadamia, hazelnut, pecan, walnut, chestnuts, brazil nut, cashew), banana and plantain, palms (e.g., oil-palm, coconut and dates), evergreen, coniferous or deciduous trees, and woody species.
In some embodiments, the subject is a human.
In a further aspect, the present disclosure provides a method for diagnosing a pathological disorder in at least one subject by identifying at least one genetic and/or epigenetic variation/s associated with said pathologic disorder, and/or at least one nucleic acid sequence of at least one pathogenic entity, in at least one target nucleic acid sequence of at least one sample of said at least one subject, the method comprising the step of performing a molecular inversion probe-based method for targeted sequencing in at least one subject or in any nucleic acid molecule obtained therefrom, wherein the presence of one or more of said variation/s in said target nucleic acid sequence and/or of at least one nucleic acid sequence of at least one pathogenic entity in said at least one sample, indicates that the at least one subject is at risk of, is a carrier of, or is suffering from said pathologic disorder, and wherein the molecular inversion probe-based method for targeted sequencing comprises at least one of the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest;
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample (In some embodiments, said at least one hybridization product comprises said at least one MIP hybridized to the first and second target regions of at least one target nucleic acid sequence of interest); and b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture (In some embodiments, the reaction mixture may be the polymerization and/or ligation reaction mixture) from said at least one sample; and c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
In some embodiments, the steps (a) and (b) may be performed together.
In some embodiments, the methods may be suitable for screening of at least one carrier of at least one pathological disorder.
In some embodiments, a carrier may carry a gene/nucleic acid sequence that may lead to genetic disorder.
In some embodiments, the method of the present disclosure is for simultaneous targeted sequencing of at least two samples, wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
Thus, in a further aspect, the present disclosure provides a method for diagnosing a pathological disorder in at least one subject by identifying at least one genetic and/or epigenetic variation/s associated with said pathologic disorder, and/or at least one nucleic acid sequence of at least one pathogenic entity, in at least one target nucleic acid sequence of at least two samples of said at least one subject, the method comprising the step of performing a molecular inversion probe-based method for simultaneous targeted sequencing in at least one subject or in any nucleic acid molecule obtained therefrom, wherein the presence of one or more of said variation/s in said target nucleic acid sequence and/or of at least one nucleic acid sequence of at least one pathogenic entity in said at least two samples indicates that the at least one subject has a risk of, is a carrier of, or is suffering from said pathologic disorder, and wherein the molecular inversion probe-based method for simultaneous targeted sequencing comprises the steps of:
a. contacting at least one target nucleic acid sequence originating from at least two samples with at least two MIPs comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest;
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index;
thereby obtaining at least one hybridization product/s from at least two samples; and
b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture from said at least two samples; and
c. subjecting the reaction mixture obtained in step (b) to enzymatic digestion, thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least two samples; and
d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c);
wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
In some embodiments, the hybridization product/s obtained in step (a) for each of at least two samples are mixed/pooled.
In some embodiments, the cyclized product/s obtained in step (b) for each of at least two samples are mixed/pooled.
In some embodiments, the non-digested cyclized product/s obtained in step (c) for each of at least two samples are mixed/pooled.
In some embodiments, the method for simultaneous targeted sequencing of at least two samples may further comprise a demultiplexing step or procedure. In some further embodiments, the method for simultaneous targeted sequencing of at least two samples may comprise an in silico demultiplexing step or procedure.
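By way of illustration only, an in silico demultiplexing step of the kind mentioned above may be sketched as follows. The sketch assumes, hypothetically, that each sequencing read begins with the sample identifier index at a fixed length; the function names, the read layout and the mismatch tolerance are assumptions made for this example and are not part of the disclosed method.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, index_to_sample, index_len, max_mismatches=0):
    """Group pooled reads by sample using a sample identifier index.

    Assumption (hypothetical layout): the index occupies the first
    `index_len` bases of every read. A read is assigned only when its
    index matches exactly one sample within `max_mismatches`; otherwise
    the whole read is placed in the 'unassigned' bin.
    """
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:index_len], read[index_len:]
        hits = [
            sample
            for idx, sample in index_to_sample.items()
            if len(idx) == len(tag) and hamming(idx, tag) <= max_mismatches
        ]
        if len(hits) == 1:
            bins[hits[0]].append(insert)
        else:
            bins["unassigned"].append(read)
    return dict(bins)


# Toy pooled reads from two samples (made-up sequences).
index_to_sample = {"ACGT": "sample_1", "TGCA": "sample_2"}
pooled_reads = ["ACGTGGGG", "TGCATTTT", "ACGACCCC", "NNNNAAAA"]
strict = demultiplex(pooled_reads, index_to_sample, index_len=4)
tolerant = demultiplex(pooled_reads, index_to_sample, index_len=4, max_mismatches=1)
```

With exact matching, the third and fourth toy reads remain unassigned; allowing one mismatch assigns the third read to sample_1. Whether any mismatch tolerance is appropriate depends on how distinct the chosen indices are from one another.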
In some embodiments, the pathologic disorder is at least one of: a neoplastic disorder, a metabolic condition, an inflammatory disorder, an infectious disease caused by a pathogen, mental disorders, an autoimmune disease, a cardiovascular disease, a neurodegenerative disorder, fetal genetic condition and an age-related condition. Still further, in some embodiments, the pathologic disorder may be at least one somatic, spontaneous, or acquired pathologic disorder or condition.
In yet some further embodiments, pathologic disorders applicable in the present disclosure may be any spontaneous or acquired pathologic disorder, for example, any disorder caused by environmental exposure to a pathogenic agent or by any environmental stress or condition.
In some embodiments, the pathologic disorder may be at least one hereditary disease. The term “Hereditary disease” as herein defined refers to a disease or disorder that is caused by defective genes which are inherited from the parents. A hereditary disease may result unexpectedly when two healthy carriers of a defective recessive gene reproduce but can also happen when the defective gene is dominant. Non-limiting examples of hereditary diseases include Duchenne muscular dystrophy (DMD), Cystic Fibrosis, Tay-Sachs disease (also known as GM2 gangliosidosis or hexosaminidase A deficiency), Ataxia-Telangiectasia (A-T), Sickle-cell disease (SCD), or sickle-cell anemia (SCA or anemia), Lesch-Nyhan syndrome (LNS, also known as Nyhan's syndrome, Amyotrophic Lateral Sclerosis, Cystinosis, Kelley-Seegmiller syndrome and Juvenile gout), color blindness, Haemochromatosis (or haemosiderosis), Haemophilia, Phenylketonuria (PKU), Phenylalanine Hydroxylase Deficiency disease, Polycystic kidney disease (PKD or PCKD, also known as polycystic kidney syndrome), Alpha-galactosidase A deficiency, Fabry disease, Anderson-Fabry disease, Angiokeratoma Corporis Diffusum, CADASIL (cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy), Cerebral arteriopathy with subcortical infarcts and leukoencephalopathy, Cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy, Carboxylase Deficiency, Multiple (Late-Onset), Cerebroside Lipidosis syndrome, Gaucher's disease, Choreoathetosis self-mutilation hyperuricemia syndrome, Classic Galactosemia, Galactosemia, Crohn's disease, also known as Crohn syndrome and regional enteritis, Incontinentia Pigmenti (also known as "Bloch-Siemens syndrome," "Bloch-Sulzberger disease," "Bloch-Sulzberger syndrome" "melanoblastosis cutis," and "naevus pigmentosus systematicus"), galactosemia Microcephaly, alpha-1 antitrypsin deficiency (Alpha-1), Adenosine deaminase (ADA) deficiency, Severe Combined Immunodeficiency (SCID), neurofibromatosis type 1 (NF1), Wiskott-Aldrich syndrome, Stargardt macular degeneration, Fanconi’s anemia, Spinal muscular atrophy (SMA) and Leber's congenital amaurosis (LCA).
In yet some further embodiments, the pathological disorder may be at least one congenital disorder. More specifically, a congenital disorder is a medical condition that is present at or before birth. These conditions, also referred to as birth defects, can be acquired during the fetal stage of development or from the genetic makeup of the parents. Congenital disorders are not necessarily hereditary, since they may be caused by infections during pregnancy or injury to the fetus at birth. Major anomalies are sometimes associated with minor anomalies, which might be objective (e.g., preauricular tags) or more subjective (e.g., low-set ears). Non-limiting embodiments include external disorders and internal disorders such as Neural tube defects, Microcephaly, Microtia/Anotia, Orofacial clefts, Exomphalos (omphalocele), Gastroschisis, Hypospadias, Reduction defects of upper and lower limbs, Talipes equinovarus/club foot, Congenital heart defects, Esophageal atresia/tracheoesophageal fistula, Large intestinal atresia/stenosis, Anorectal atresia/stenosis and Renal agenesis/hypoplasia.
In yet some further embodiments, the pathological disorder may be at least one somatic disorder. A somatic symptom disorder, formerly known as a somatoform disorder, is any mental disorder that manifests as physical symptoms that suggest illness or injury, but that cannot be explained fully by a general medical condition or by the direct effect of a substance, and that are not attributable to another mental disorder (e.g., panic disorder). Somatic symptom disorders, as a group, are included in a number of diagnostic schemes of mental illness. Somatic disorders may also be referred to as somatization disorder and undifferentiated somatoform disorder.
In yet some further embodiments, the relevant pathologic disorder may be at least one of: a proliferative disorder, and/or a neoplastic disorder, a metabolic condition, an inflammatory disorder, an infectious disease caused by a pathogen, a mental disorder, an autoimmune disease, a cardiovascular disease, a neurodegenerative disorder, fetal genetic condition and an age-related condition. Still further, pathologic disorders encompassed by the present disclosure further include infections and parasitic diseases, endocrine, nutritional diseases, immunity disorders, diseases of blood and blood forming organs, mental disorders, diseases of nervous system and sense organs, diseases of the circulatory system, diseases of the respiratory system, diseases of the digestive system, diseases of genitourinary system, complications of pregnancy, childbirth and the puerperium, diseases of the skin and subcutaneous tissue, diseases of musculoskeletal system and connective tissue and congenital anomalies.
In yet some further embodiments, the relevant pathologic disorder may be any neoplastic disorder and/or any proliferative disorder. More specifically, as used herein to describe the present disclosure, "neoplastic disorder", “proliferative disorder”, “cancer”, “tumor” and “malignancy” all relate equivalently to a hyperplasia of a tissue or organ. If the tissue is a part of the lymphatic or immune systems, malignant cells may include non-solid tumors of circulating cells. Malignancies of other tissues or organs may produce solid tumors. In general, the methods of the present disclosure may be applicable for diagnosing a patient suffering from any one of non-solid and solid tumors. Malignancy, as contemplated in the present disclosure, may be any one of carcinomas, melanomas, lymphomas, leukemias, myeloma and sarcomas.
Carcinoma as used herein, refers to an invasive malignant tumor consisting of transformed epithelial cells. Alternatively, it refers to a malignant tumor composed of transformed cells of unknown histogenesis, but which possess specific molecular or histological characteristics that are associated with epithelial cells, such as the production of cytokeratins or intercellular bridges.
Melanoma as used herein, is a malignant tumor of melanocytes. Melanocytes are cells that produce the dark pigment, melanin, which is responsible for the color of skin. They predominantly occur in skin but are also found in other parts of the body, including the bowel and the eye. Melanoma can occur in any part of the body that contains melanocytes.
Leukemia refers to progressive, malignant diseases of the blood-forming organs and is generally characterized by a distorted proliferation and development of leukocytes and their precursors in the blood and bone marrow. Leukemia is generally clinically classified on the basis of (1) the duration and character of the disease: acute or chronic; (2) the type of cell involved: myeloid (myelogenous), lymphoid (lymphogenous), or monocytic; and (3) the increase or non-increase in the number of abnormal cells in the blood: leukemic or aleukemic (subleukemic).
Sarcoma is a cancer that arises from transformed connective tissue cells. These cells originate from embryonic mesoderm, or middle layer, which forms the bone, cartilage, and fat tissues. This is in contrast to carcinomas, which originate in the epithelium. The epithelium lines the surface of structures throughout the body, and is the origin of cancers in the breast, colon, and pancreas.
Myeloma as mentioned herein is a cancer of plasma cells, a type of white blood cell normally responsible for the production of antibodies. Collections of abnormal cells accumulate in bones, where they cause bone lesions, and in the bone marrow where they interfere with the production of normal blood cells. Most cases of myeloma also feature the production of a paraprotein, an abnormal antibody that can cause kidney problems and interferes with the production of normal antibodies leading to immunodeficiency. Hypercalcemia (high calcium levels) is often encountered.
Lymphoma is a cancer in the lymphatic cells of the immune system. Typically, lymphomas present as a solid tumor of lymphoid cells. These malignant cells often originate in lymph nodes, presenting as an enlargement of the node (a tumor). It can also affect other organs in which case it is referred to as extranodal lymphoma. Non limiting examples for lymphoma include Hodgkin's disease, non-Hodgkin's lymphomas and Burkitt's lymphoma.
Further malignancies that may find utility in the present disclosure can comprise but are not limited to hematological malignancies (including lymphoma, leukemia and myeloproliferative disorders, as described above), hypoplastic and aplastic anemia (both virally induced and idiopathic), myelodysplastic syndromes, all types of paraneoplastic syndromes (both immune mediated and idiopathic) and solid tumors (including GI tract, colon, lung, liver, breast, prostate, pancreas and Kaposi's sarcoma). The disclosed methods may be applicable for solid tumors such as tumors in lip and oral cavity, pharynx, larynx, paranasal sinuses, major salivary glands, thyroid gland, esophagus, stomach, small intestine, colon, colorectum, anal canal, liver, gallbladder, extrahepatic bile ducts, ampulla of Vater, exocrine pancreas, lung, pleural mesothelioma, bone, soft tissue sarcoma, carcinoma and malignant melanoma of the skin, breast, vulva, vagina, cervix uteri, corpus uteri, ovary, fallopian tube, gestational trophoblastic tumors, penis, prostate, testis, kidney, renal pelvis, ureter, urinary bladder, urethra, carcinoma of the eyelid, carcinoma of the conjunctiva, malignant melanoma of the conjunctiva, malignant melanoma of the uvea, retinoblastoma, carcinoma of the lacrimal gland, sarcoma of the orbit, brain, spinal cord, vascular system, hemangiosarcoma and Kaposi's sarcoma. In yet some further embodiments, the methods of the present disclosure may be applicable for any of the proliferative disorders discussed herein.
Still further, it should be appreciated that the methods disclosed herein are applicable for any neoplastic disorder, specifically, any malignant or non-malignant proliferative disorder. In yet some further embodiments, the method and uses of the present disclosure are applicable for any cancer. Thus, in some illustrative and non-limiting embodiments, the methods and uses of the present disclosure may be applicable for any one of: Acute lymphoblastic leukemia; Acute myeloid leukemia; Adrenocortical carcinoma; AIDS-related cancers; AIDS-related lymphoma; Anal cancer; Appendix cancer; Astrocytoma, childhood cerebellar or cerebral; Basal cell carcinoma; Bile duct cancer, extrahepatic; Bladder cancer; Bone cancer, Osteosarcoma/Malignant fibrous histiocytoma; Brainstem glioma; Brain tumor; Brain tumor, cerebellar astrocytoma; Brain tumor, cerebral astrocytoma/malignant glioma; Brain tumor, ependymoma; Brain tumor, medulloblastoma; Brain tumor, supratentorial primitive neuroectodermal tumors; Brain tumor, visual pathway and hypothalamic glioma; Breast cancer; Bronchial adenomas/carcinoids; Burkitt lymphoma; Carcinoid tumor, childhood; Carcinoid tumor, gastrointestinal; Carcinoma of unknown primary; Central nervous system lymphoma, primary; Cerebellar astrocytoma, childhood; Cerebral astrocytoma/Malignant glioma, childhood; Cervical cancer; Childhood cancers; Chronic lymphocytic leukemia; Chronic myelogenous leukemia; Chronic myeloproliferative disorders; Colon Cancer; Cutaneous T-cell lymphoma; Desmoplastic small round cell tumor; Endometrial cancer; Ependymoma; Esophageal cancer; Ewing's sarcoma in the Ewing family of tumors; Extracranial germ cell tumor, Childhood; Extragonadal Germ cell tumor; Extrahepatic bile duct cancer; Eye Cancer, Intraocular melanoma; Eye Cancer, Retinoblastoma; Gallbladder cancer; Gastric (Stomach) cancer; Gastrointestinal Carcinoid Tumor; Gastrointestinal stromal tumor (GIST); Germ cell tumor: extracranial, extragonadal, or ovarian;
Gestational trophoblastic tumor; Glioma of the brain stem; Glioma, Childhood Cerebral Astrocytoma; Glioma, Childhood Visual Pathway and Hypothalamic; Gastric carcinoid; Hairy cell leukemia; Head and neck cancer; Heart cancer; Hepatocellular (liver) cancer; Hodgkin lymphoma; Hypopharyngeal cancer; Hypothalamic and visual pathway glioma, childhood; Intraocular Melanoma; Islet Cell Carcinoma (Endocrine Pancreas); Kaposi sarcoma; Kidney cancer (renal cell cancer); Laryngeal Cancer; Leukemias; Leukemia, acute lymphoblastic (also called acute lymphocytic leukemia); Leukemia, acute myeloid (also called acute myelogenous leukemia); Leukemia, chronic lymphocytic (also called chronic lymphocytic leukemia); Leukemia, chronic myelogenous (also called chronic myeloid leukemia); Leukemia, hairy cell; Lip and Oral Cavity Cancer; Liver Cancer (Primary); Lung Cancer, Non-Small Cell; Lung Cancer, Small Cell; Lymphomas; Lymphoma, AIDS-related; Lymphoma, Burkitt; Lymphoma, cutaneous T-Cell; Lymphoma, Hodgkin; Lymphomas, Non-Hodgkin (an old classification of all lymphomas except Hodgkin's); Lymphoma, Primary Central Nervous System; Macroglobulinemia, Waldenstrom; Malignant Fibrous Histiocytoma of Bone/Osteosarcoma; Medulloblastoma, Childhood; Melanoma; Melanoma, Intraocular (Eye); Merkel Cell Carcinoma; Mesothelioma, Adult Malignant; Mesothelioma, Childhood; Metastatic Squamous Neck Cancer with Occult Primary; Mouth Cancer; Multiple Endocrine Neoplasia Syndrome, Childhood; Multiple Myeloma/Plasma Cell Neoplasm; Mycosis Fungoides; Myelodysplastic Syndromes; Myelodysplastic/Myeloproliferative Diseases; Myelogenous Leukemia, Chronic; Myeloid Leukemia, Adult Acute; Myeloid Leukemia, Childhood Acute; Myeloma, Multiple (Cancer of the Bone-Marrow); Myeloproliferative Disorders, Chronic; Nasal cavity and paranasal sinus cancer; Nasopharyngeal carcinoma; Neuroblastoma; Non-Hodgkin lymphoma; Non-small cell lung cancer; Oral Cancer; Oropharyngeal cancer;
Osteosarcoma/malignant fibrous histiocytoma of bone; Ovarian cancer; Ovarian epithelial cancer (Surface epithelial-stromal tumor); Ovarian germ cell tumor; Ovarian low malignant potential tumor; Pancreatic cancer; Pancreatic cancer, islet cell; Paranasal sinus and nasal cavity cancer; Parathyroid cancer; Penile cancer; Pharyngeal cancer; Pheochromocytoma; Pineal astrocytoma; Pineal germinoma; Pineoblastoma and supratentorial primitive neuroectodermal tumors, childhood; Pituitary adenoma; Plasma cell neoplasia/Multiple myeloma; Pleuropulmonary blastoma; Primary central nervous system lymphoma; Prostate cancer; Rectal cancer; Renal cell carcinoma (kidney cancer); Renal pelvis and ureter, transitional cell cancer; Retinoblastoma; Rhabdomyosarcoma, childhood; Salivary gland cancer; Sarcoma, Ewing family of tumors; Sarcoma, Kaposi; Sarcoma, soft tissue; Sarcoma, uterine; Sezary syndrome; Skin cancer (nonmelanoma); Skin cancer (melanoma); Skin carcinoma, Merkel cell; Small cell lung cancer; Small intestine cancer; Soft tissue sarcoma; Squamous cell carcinoma - see Skin cancer (nonmelanoma); Squamous neck cancer with occult primary, metastatic; Stomach cancer; Supratentorial primitive neuroectodermal tumor, childhood; T-Cell lymphoma, cutaneous (Mycosis Fungoides and Sezary syndrome); Testicular cancer; Throat cancer; Thymoma, childhood; Thymoma and Thymic carcinoma; Thyroid cancer; Thyroid cancer, childhood; Transitional cell cancer of the renal pelvis and ureter; Trophoblastic tumor, gestational; Unknown primary site, carcinoma of, adult; Unknown primary site, cancer of, childhood; Ureter and renal pelvis, transitional cell cancer; Urethral cancer; Uterine cancer, endometrial; Uterine sarcoma; Vaginal cancer; Visual pathway and hypothalamic glioma, childhood; Vulvar cancer; Waldenstrom macroglobulinemia and Wilms tumor (kidney cancer).
In some specific and non-limiting embodiments, the disease, disorder or condition may be an ageing-related condition.
In some embodiments, the pathological disorder may be AML. Acute myeloid leukemia (AML), a type of blood cancer, is characterized by an increase in the number of abnormal white blood cells in the bone marrow, frequently causing hematopoietic insufficiency. It is a heterogeneous disease featuring cytogenetic aberrations, recurrent somatic mutations and alterations in gene expression. DNA (cytosine-5-)-methyltransferase 3 alpha (DNMT3A) is closely associated with epigenetic modifications in mammalian development and disease. More recent studies have identified recurrent somatic mutations in DNMT3A in AML, most of which are heterozygous. The DNMT3A R882 codon is a mutational hotspot.
In some specific embodiments, the target nucleic acid sequences of interest may comprise the genomic locus associated with AML. Thus, in some embodiments a genomic locus associated with AML may comprise the DNMT3A gene. In some further embodiments, said genomic locus may be chr2:25457097-25457316 (hg19 build). In some more specific embodiments, the target nucleic acid sequences of interest may comprise the R882 codon.
In some more specific embodiments, said different target nucleic acid sequences of interest may comprise the same genomic locus. In another embodiment, said genomic locus may be associated with AML, for example said genomic locus may be chr2:25457097-25457316 (hg19 build). In some embodiments, said different target nucleic acid sequences of interest may comprise the DNMT3A gene, e.g. said different target nucleic acid sequences of interest may comprise the R882 codon.
In some further embodiments, said genomic locus may be chr1:114713908. In some specific embodiments, the genomic locus may comprise the NRAS gene, e.g. may be associated with cancer.
In yet some other embodiments, said genomic locus may be chr17:7,675,206, e.g. may comprise the TP53 gene, e.g. may be associated with cancer.
In some specific embodiments, the genomic locus may comprise the JAK2 gene, e.g. may be associated with cancer.
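As a purely illustrative aid, the locus strings quoted above can be parsed into numeric intervals and used to check whether a given position falls within a target region. The sketch below assumes, hypothetically, that coordinates are 1-based and inclusive and that a string without a range denotes a single position; the names `Locus`, `parse_locus` and `covers` are introduced only for this example.

```python
import re
from typing import NamedTuple


class Locus(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_locus(text):
    """Parse 'chrN:start-end' or 'chrN:pos'; thousands separators are allowed.

    Assumption: coordinates are 1-based and inclusive.
    """
    match = re.fullmatch(r"(chr\w+):([\d,]+)(?:-([\d,]+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized locus string: {text!r}")
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", "")) if match.group(3) else start
    if end < start:
        raise ValueError(f"locus end precedes start: {text!r}")
    return Locus(match.group(1), start, end)


def covers(locus, chrom, pos):
    """Return True when (chrom, pos) lies inside the locus interval."""
    return locus.chrom == chrom and locus.start <= pos <= locus.end


dnmt3a_region = parse_locus("chr2:25457097-25457316")
tp53_position = parse_locus("chr17:7,675,206")
region_width = dnmt3a_region.end - dnmt3a_region.start + 1
```

Under the inclusive-coordinate assumption the DNMT3A region spans 220 positions. Note that coordinates are only comparable within a single genome build, so intervals from different builds should not be mixed.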
In some embodiments, the molecular inversion probe-based method for targeted sequencing is as defined in the previous aspect of the present disclosure.
In some further embodiments, the methods of the present disclosure further comprise administering a suitable treatment to said subject following diagnosis of a pathological disorder in the subject.
In a further aspect, the present disclosure provides a method of treating or preventing a pathological disorder in at least one subject by identifying at least one genetic and/or epigenetic variation/s associated with said pathologic disorder, and/or at least one nucleic acid sequence of at least one pathogenic entity, in at least one target nucleic acid sequence of at least one sample of said at least one subject, by performing the above-described molecular inversion probe-based methods for targeted sequencing in at least one subject or in any nucleic acid molecule obtained therefrom, thereby diagnosing a pathological disorder in at least one subject, the method further comprising administering a suitable treatment to said subject.
In a further aspect, the present disclosure provides a suitable treatment for use in a method of treating or preventing a pathological disorder in at least one subject by identifying at least one genetic and/or epigenetic variation/s associated with said pathologic disorder, and/or at least one nucleic acid sequence of at least one pathogenic entity, in at least one target nucleic acid sequence of at least one sample of said at least one subject, by performing the above-described molecular inversion probe-based method for targeted sequencing in at least one subject or in any nucleic acid molecule obtained therefrom, thereby diagnosing a pathological disorder in at least one subject, the method further comprising administering a suitable treatment to said subject.
An additional aspect of the present disclosure relates to a method of detecting the presence of one or more target microorganisms or infectious entities in at least one sample, the method comprising the step of performing a molecular inversion probe-based method for targeted sequencing in at least one nucleic acid molecule obtained from said at least one sample, wherein the presence of one or more target nucleic acid sequences associated with said microorganism or infectious entity in said at least one sample indicates the presence thereof in the sample, and wherein the molecular inversion probe-based method for targeted sequencing comprises at least one of the steps of:
a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest;
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index;
thereby obtaining at least one hybridization product/s from at least one sample (in some embodiments, said at least one hybridization product comprises said at least one MIP hybridized to the first and second target regions of at least one target nucleic acid sequence of interest); and
b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture (in some embodiments, the reaction mixture may be the polymerization and/or ligation reaction mixture) from said at least one sample; and
c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion, thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; and
d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
In some embodiments, the steps (a) and (b) may be performed together.
In some embodiments, the method of the present disclosure is for simultaneous targeted sequencing of at least two samples, wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
Thus, a further aspect of the present disclosure relates to a method of detecting the presence of one or more target microorganisms or infectious entities in at least two samples, the method comprising the step of performing a molecular inversion probe-based method for simultaneous targeted sequencing in at least one nucleic acid molecule obtained from said at least two samples, wherein the presence of one or more target nucleic acid sequences associated with said microorganism or infectious entity in said at least two samples indicates the presence thereof in the samples, and wherein the molecular inversion probe-based method for simultaneous targeted sequencing comprises the steps of:
a. contacting at least one target nucleic acid sequence originating from at least two samples with at least two MIPs comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest;
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index;
thereby obtaining at least one hybridization product/s from at least two samples; and
b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture from said at least two samples; and
c. subjecting the reaction mixture obtained in step (b) to enzymatic digestion, thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least two samples; and
d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c);
wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
In some embodiments, the hybridization product/s obtained in step (a) for each of at least two samples are mixed/pooled.
In some embodiments, the cyclized product/s obtained in step (b) for each of at least two samples are mixed/pooled. In some embodiments, the non-digested cyclized product/s obtained in step (c) for each of at least two samples are mixed/pooled.
In some embodiments, the method for simultaneous targeted sequencing of at least two samples may further comprise a demultiplexing step or procedure. In some further embodiments, the method for simultaneous targeted sequencing of at least two samples may comprise an in silico demultiplexing step or procedure.
In some embodiments, the microorganism is a prokaryotic microorganism, or a lower eukaryotic microorganism, and wherein said infectious entity is at least one of a viral, a bacterial, a fungal, a parasitic and a protozoan pathogen.
As used herein, the term “pathogen” refers to an infectious agent that causes a disease in a subject host. Pathogenic agents include prokaryotic microorganisms, lower eukaryotic microorganisms, complex eukaryotic organisms, viruses, fungi, mycoplasma, prions, parasites, for example, a parasitic protozoan, yeasts or a nematode.
In yet some further embodiments, the methods of the present disclosure may be applicable for detecting a pathogen that may be, in further specific embodiments, a viral pathogen or a virus. In some embodiments, the pathogen may be at least one viral pathogen.
In some embodiments, the infectious entity may be a virus. The term "virus" as used herein, refers to obligate intracellular parasites of living but non-cellular nature, consisting of DNA or RNA and a protein coat. Viruses range in diameter from about 20 to about 300 nm. Class I viruses (Baltimore classification) have a double-stranded DNA as their genome; Class II viruses have a single-stranded DNA as their genome; Class III viruses have a double-stranded RNA as their genome; Class IV viruses have a positive single-stranded RNA as their genome, the genome itself acting as mRNA; Class V viruses have a negative single-stranded RNA as their genome used as a template for mRNA synthesis; and Class VI viruses have a positive single- stranded RNA genome but with a DNA intermediate not only in replication but also in mRNA synthesis.
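For illustration only, the six classes described in the preceding paragraph can be expressed as a lookup keyed on genome properties. The key layout and the function name `baltimore_class` are assumptions made for this sketch; only Classes I to VI, as listed above, are encoded, and any other combination is rejected.

```python
# Keys: (nucleic acid, strandedness, sense, replicates via a DNA intermediate).
# Sense is None where the paragraph above does not specify one.
BALTIMORE_CLASSES = {
    ("DNA", "ds", None, False): "I",
    ("DNA", "ss", None, False): "II",
    ("RNA", "ds", None, False): "III",
    ("RNA", "ss", "+", False): "IV",
    ("RNA", "ss", "-", False): "V",
    ("RNA", "ss", "+", True): "VI",
}


def baltimore_class(nucleic_acid, strands, sense=None, dna_intermediate=False):
    """Return the class (I to VI) for the given genome properties.

    Raises ValueError for combinations outside the six classes encoded here.
    """
    key = (nucleic_acid, strands, sense, dna_intermediate)
    try:
        return BALTIMORE_CLASSES[key]
    except KeyError:
        raise ValueError(f"no class encoded for {key!r}") from None


positive_ssrna = baltimore_class("RNA", "ss", "+")
via_dna_intermediate = baltimore_class("RNA", "ss", "+", dna_intermediate=True)
```

The only difference between the Class IV and Class VI keys is the DNA-intermediate flag, which mirrors the distinction drawn in the text.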
It should be noted that the term “viruses” is used in its broadest sense to include any virus, specifically, any enveloped virus. In some specific embodiments, the viral pathogen may be of any of the following orders, specifically, Herpesvirales (large eukaryotic dsDNA viruses), Ligamenvirales (linear, dsDNA (group I) archaean viruses), Mononegavirales (include nonsegmented (-) strand ssRNA (Group V) plant and animal viruses), Nidovirales (composed of (+) strand ssRNA (Group IV) viruses), Ortervirales (single-stranded RNA and DNA viruses that replicate through a DNA intermediate (Groups VI and VII)), Picornavirales (small (+) strand ssRNA viruses that infect a variety of plant, insect and animal hosts), Tymovirales (monopartite (+) ssRNA viruses), Bunyavirales (tripartite (-) ssRNA viruses (Group V)) and Caudovirales (tailed dsDNA (group I) bacteriophages).
In some embodiments, the viral pathogens applicable in the disclosed methods may be DNA viruses, specifically, any virus of the following families: the Adenoviridae family, the Papovaviridae family, the Parvoviridae family, the Herpesviridae family, the Poxviridae family, the Hepadnaviridae family and the Anelloviridae family.
In yet some further specific embodiments, the viral pathogens applicable in the disclosed methods may be RNA viruses, specifically, any virus of the following families: the Reoviridae family, Picornaviridae family, Caliciviridae family, Togaviridae family, Arenaviridae family, Flaviviridae family, Orthomyxoviridae family, Paramyxoviridae family, Bunyaviridae family, Rhabdoviridae family, Filoviridae family, Coronaviridae family, Astroviridae family, Bornaviridae family, Arteriviridae family, Hepeviridae family and the Retroviridae family. Of particular interest are adenoviruses, papovaviruses, herpesviruses: simplex, varicella-zoster, Epstein-Barr (EBV), cytomegalovirus (CMV), pox viruses: smallpox, vaccinia, hepatitis B (HBV), rhinoviruses, hepatitis A (HAV), poliovirus, respiratory syncytial virus (RSV), Middle East Respiratory Syndrome coronavirus (MERS-CoV), Severe acute respiratory syndrome coronavirus (SARS-CoV), SARS-CoV2, corona virus, rubella virus, hepatitis C (HCV), arboviruses, rabies virus, influenza viruses A and B, measles virus, mumps virus, human immunodeficiency virus (HIV), HTLV I and II and Zika virus.
In some specific embodiments, the methods of the present disclosure may be suitable for detecting at least one coronavirus (CoV). CoVs are common in humans and usually cause mild to moderate upper-respiratory tract illnesses. There are four main sub-groupings of coronaviruses, known as alpha, beta, gamma, and delta. The seven coronaviruses known to date as infecting humans are: alpha coronaviruses 229E and NL63, and beta coronaviruses OC43, HKU1, SARS-CoV, SARS-CoV2 and MERS-CoV (the coronavirus that causes Middle East Respiratory Syndrome, or MERS). SARS-CoV and SARS-CoV2 are lineage B beta coronaviruses and MERS-CoV is a lineage C beta coronavirus. In some specific embodiments, the methods of the present disclosure may be suitable for detecting SARS-CoV2.
Still further, in some embodiments, the disclosed methods may be applicable for detecting bacteria, and in some embodiments, bacterial pathogens. The term "bacteria" (in singular a "bacterium") in this context refers to any type of a single celled microbe. Herein the terms "bacterium" and "microbe" are interchangeable. This term encompasses herein bacteria belonging to general classes according to their basic shapes, namely spherical (cocci), rod (bacilli), spiral (spirilla), comma (vibrios) or corkscrew (spirochaetes), as well as bacteria that exist as single cells, in pairs, chains or clusters. It should be noted that the term "bacteria" as used herein refers to any of the prokaryotic microorganisms that exist as a single cell or in a cluster or aggregate of single cells. In more specific embodiments, the term "bacteria" specifically refers to Gram-positive, Gram-negative or acid-fast organisms. The Gram-positive bacteria can be recognized as retaining the crystal violet stain used in the Gram staining method of bacterial differentiation, and therefore appear purple under a microscope. The Gram-negative bacteria do not retain the crystal violet, making positive identification possible. In other words, the term "bacteria" applies herein to bacteria with a thicker peptidoglycan layer in the cell wall outside the cell membrane (Gram-positive), and to bacteria with a thin peptidoglycan layer of their cell wall that is sandwiched between an inner cytoplasmic cell membrane and a bacterial outer membrane (Gram-negative).
This term further applies to some bacteria, such as Deinococcus, which stain Gram-positive due to the presence of a thick peptidoglycan layer, but also possess an outer cell membrane, and are thus suggested as intermediates in the transition between monoderm (Gram-positive) and diderm (Gram-negative) bacteria. Acid-fast organisms like Mycobacterium contain large amounts of lipid substances within their cell walls called mycolic acids that resist staining by conventional methods such as a Gram stain.
In some embodiments, a pathogen to be detected by the disclosed methods may be any bacteria involved in nosocomial infections or any mixture of such bacteria. The term "nosocomial infections" refers to hospital-acquired infections, namely, an infection whose development is favoured by a hospital environment, such as surfaces and/or medical personnel, and is acquired by a patient during hospitalization. Nosocomial infections are infections that are potentially caused by organisms resistant to antibiotics. Nosocomial infections have an impact on morbidity and mortality and pose a significant economic burden. In view of the rising levels of antibiotic resistance and the increasing severity of illness of hospital in-patients, this problem needs an urgent solution. Common nosocomial organisms include Clostridium difficile, methicillin-resistant Staphylococcus aureus, coagulase-negative Staphylococci, vancomycin-resistant Enterococci, resistant Enterobacteriaceae, Pseudomonas aeruginosa, Acinetobacter and Stenotrophomonas maltophilia.
The nosocomial-infection pathogens could be subdivided into Gram-positive bacteria (Staphylococcus aureus, coagulase-negative staphylococci), Gram-positive cocci (Enterococcus faecalis and Enterococcus faecium), Gram-negative rod-shaped organisms (Klebsiella pneumoniae, Klebsiella oxytoca, Escherichia coli, Proteus aeruginosa, Serratia spp.), Gram-negative bacilli (Enterobacter aerogenes, Enterobacter cloacae), aerobic Gram-negative coccobacilli (Acinetobacter baumannii, Stenotrophomonas maltophilia) and Gram-negative aerobic bacillus (Stenotrophomonas maltophilia, previously known as Pseudomonas maltophilia). Among many others, Pseudomonas aeruginosa is an extremely important nosocomial Gram-negative aerobic rod pathogen.
In some embodiments, the disclosed methods may be applicable in detecting “ESKAPE” pathogens. As indicated herein, these pathogens include but are not limited to Enterococcus faecium, Staphylococcus aureus, Clostridium difficile, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter.
In further embodiments the pathogen according to the present disclosure may be a bacterial cell of at least one of E. coli, Pseudomonas spp, specifically, Pseudomonas aeruginosa, Staphylococcus spp, specifically, Staphylococcus aureus, Streptococcus spp, specifically, Streptococcus pyogenes, Salmonella spp, Shigella spp, Clostridium spp, specifically, Clostridium difficile, Enterococcus spp, specifically, Enterococcus faecium, Klebsiella spp, specifically, Klebsiella pneumoniae, Acinetobacter spp, specifically, Acinetobacter baumannii, Yersinia spp, specifically, Yersinia pestis and Enterobacter species or any mutant, variant, isolate or any combination thereof.
A lower eukaryotic organism applicable in the present disclosure may include, in some embodiments, a yeast or fungus such as, but not limited to, Pneumocystis carinii, Candida albicans, Aspergillus, Histoplasma capsulatum, Blastomyces dermatitidis, Cryptococcus neoformans, Trichophyton and Microsporum, which are also encompassed by the disclosed methods. A complex eukaryotic organism includes worms, insects, arachnids, nematodes, amoebae, Entamoeba histolytica, Giardia lamblia, Trichomonas vaginalis, Trypanosoma brucei gambiense, Trypanosoma cruzi, Balantidium coli, Toxoplasma gondii, Cryptosporidium or Leishmania.
Still further, in certain embodiments the methods of the present disclosure may be suitable for detecting fungal pathogens. The term "fungi" (or a “fungus”), as used herein, refers to a division of eukaryotic organisms that grow in irregular masses, without roots, stems, or leaves, and are devoid of chlorophyll or other pigments capable of photosynthesis. Each organism (thallus) is unicellular to filamentous and possesses branched somatic structures (hyphae) surrounded by cell walls containing glucan or chitin or both and containing true nuclei. It should be noted that "fungi" includes, for example, fungi that cause diseases such as ringworm, histoplasmosis, blastomycosis, aspergillosis, cryptococcosis, sporotrichosis, coccidioidomycosis, paracoccidioidomycosis, and candidiasis.
In some embodiments, the methods of the present disclosure may be applicable for detecting a parasitic pathogen. More specifically, the term “parasitic protozoan” refers to organisms formerly classified in the Kingdom “protozoa”. They include organisms classified in Amoebozoa, Excavata and Chromalveolata. Examples include Entamoeba histolytica, Plasmodium (some of which cause malaria), and Giardia lamblia. The term parasite includes, but is not limited to, infections caused by somatic tapeworms, blood flukes, tissue roundworms, ameba, and Plasmodium, Trypanosoma, Leishmania, and Toxoplasma species.
In some embodiments, the methods of the present disclosure may be applicable for detecting a nematode. As used herein, the term “nematode” refers to roundworms. Roundworms have tubular digestive systems with openings at both ends. Some examples of nematodes include, but are not limited to, basal order Monhysterida, the classes Dorylaimea, Enoplea and Secernentea and the “Chromadorea” assemblage.
In some embodiments, the methods of the present disclosure may be applicable for detecting at least one microorganism, specifically, a pathogen, in food or food products and beverages. More specifically, the term “food” refers to any substance consumed, usually of plant or animal origin. Some non-limiting examples of animals used for feeding are cows, pigs, poultry, etc. The term food also comprises products derived from animals, such as, but not limited to, milk and food products derived from milk, eggs, meat, etc. A drink or beverage is a liquid which is specifically prepared for human consumption. Non-limiting examples of drinks include water, milk, alcoholic and non-alcoholic beverages, soft drinks, fruit extracts, etc.
In some embodiments, the molecular inversion probe-based method for targeted sequencing is as defined in the previous aspects of the present disclosure.
In some further embodiments, the method of the present disclosure further comprises administering to said subject a suitable treatment against said one or more target microorganism, infectious entity present in at least one sample.
In a further aspect, the present disclosure provides a method of treating or preventing a pathological disorder caused by one or more target microorganism, infectious entity in at least one subject by detecting the presence of one or more target microorganism, infectious entity in at least one sample, by performing the above described molecular inversion probe-based methods for targeted sequencing in at least one nucleic acid molecule obtained from said at least one sample, thereby detecting the presence of one or more target microorganism, infectious entity, the method further comprising administering a suitable treatment to said subject.
In a further aspect, the present disclosure provides a suitable treatment for use in a method of treating or preventing a pathological disorder caused by one or more target microorganism, infectious entity in at least one subject by detecting the presence of one or more target microorganism, infectious entity in at least one sample, by performing the above described molecular inversion probe-based methods for targeted sequencing in at least one nucleic acid molecule obtained from said at least one sample, thereby detecting the presence of one or more target microorganism, infectious entity, the method further comprising administering a suitable treatment to said subject.
In a further aspect, the present disclosure provides a method of determining the genotype and/or genetic profile of at least one nucleic acid sequence of at least one organism, or at least one infectious entity from at least one test sample, the method comprising the step of performing a molecular inversion probe-based method for targeted sequencing in said at least one test sample comprising said at least one nucleic acid sequence, wherein the molecular inversion probe-based method for targeted sequencing comprises at least one of the steps of:
a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest;
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample (in some embodiments, said at least one hybridization product comprises said at least one MIP hybridized to the first and second target regions of at least one target nucleic acid sequence of interest); and
b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture (in some embodiments, the reaction mixture may be the polymerization and/or ligation reaction mixture) from said at least one sample; and
c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample;
d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
In some embodiments, the steps (a) and (b) may be performed together.
In some embodiments, the molecular inversion probe-based method for targeted sequencing is as defined in the previous aspects of the present disclosure above.
In a further aspect, the present disclosure provides a method of determining the genotype and/or genetic profile of at least one nucleic acid sequence of at least one organism, or at least one infectious entity from at least two test samples, the method comprising the step of performing a molecular inversion probe-based method for simultaneous targeted sequencing in said at least two test samples comprising said at least one nucleic acid sequence, wherein the molecular inversion probe-based method for simultaneous targeted sequencing comprises the steps of:
a. contacting at least one target nucleic acid sequence originating from at least two samples with at least two MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest;
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least two samples; and
b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture from said at least two samples; and
c. subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least two samples;
d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c);
wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
In some embodiments, the method for simultaneous targeted sequencing of at least two samples may further comprise a demultiplexing step or procedure. In some further embodiments, the method for simultaneous targeted sequencing of at least two samples may comprise an in silico demultiplexing step or procedure.
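An in silico demultiplexing step of this kind can be pictured as assigning each pooled read back to its sample according to the sample identifier index it carries. The sketch below is illustrative only and rests on assumptions not stated in the disclosure: the index is assumed to sit at a fixed, known offset in each read, all indexes are assumed to have the same length, and only exact matches are assigned. The function name `demultiplex` and its parameters are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample, index_start=0):
    """Group pooled reads by the sample identifier index found at index_start.

    Assigned reads are returned with the index removed; reads whose index is
    not recognized are kept whole under the key "unassigned".
    """
    if not index_to_sample:
        raise ValueError("at least one sample identifier index is required")
    lengths = {len(index) for index in index_to_sample}
    if len(lengths) != 1:
        raise ValueError("all sample identifier indexes must have equal length")
    index_len = lengths.pop()

    bins = defaultdict(list)
    for read in reads:
        tag = read[index_start:index_start + index_len]
        sample = index_to_sample.get(tag)
        if sample is None:
            bins["unassigned"].append(read)
        else:
            # Drop the index so only the remaining sequence is reported.
            bins[sample].append(read[:index_start] + read[index_start + index_len:])
    return dict(bins)


pooled_reads = ["ACGTGGGCCC", "TGCAAAATTT", "ACGTTTTAAA", "NNNNCCCGGG"]
by_sample = demultiplex(pooled_reads, {"ACGT": "sample_1", "TGCA": "sample_2"})
```

A practical pipeline would typically also tolerate sequencing errors in the index; that is deliberately left out of this sketch.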
The disclosed methods thus concern genotyping of a nucleic acid sequence. The term “genotyping” as herein defined refers to the identification of the nucleic acid sequence at specific loci in the DNA of an individual. As used herein, the terms "DNA profile," "genetic fingerprint," and "genotypic profile" are used interchangeably to refer to the allelic variations in a collection of polymorphic loci, such as a tandem repeat, a single nucleotide polymorphism (SNP), etc. A DNA profile is useful in forensics for identifying an individual based on a nucleic acid sample.
In some embodiments, the methods disclosed herein may be useful for interrogating the degree and pattern of DNA methylation. DNA methylation is a stable, heritable, covalent modification to DNA, occurring mainly at CpG dinucleotides, but is also found at non-CpG sites. Methylation is associated with normal developmental processes, as well as the changes that are observable during oncogenesis and other pathological processes, such as gene silencing of tumor suppressor or DNA repair genes. Bisulfite genomic sequencing is regarded as a gold-standard technology for detection of DNA methylation and provides a qualitative, quantitative and efficient approach to identify 5-methylcytosine at single base-pair resolution. This method is based on the finding that the deamination reactions of cytosine and 5-methylcytosine (5mC) proceed with very different consequences after treatment with sodium bisulfite: unmethylated cytosine is converted to uracil, whereas 5mC remains largely unconverted. The MIP based sequencing methods of the present disclosure may therefore be applicable in identifying epigenetic modifications.
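The principle of bisulfite-based methylation calling can be illustrated as follows. After bisulfite treatment and amplification, an unmethylated cytosine is read as T, while a methylated cytosine is still read as C. The sketch below is a simplified illustration under stated assumptions: the read is assumed to be already aligned to the same strand of the reference, with equal length and no insertions or deletions. The function name `call_methylation` is hypothetical.

```python
def call_methylation(reference, bisulfite_read):
    """Classify every reference cytosine from an aligned bisulfite-treated read.

    Returns a dict mapping the 0-based position of each reference C to
    "methylated" (read C), "unmethylated" (read T) or "ambiguous" (other).
    """
    if len(reference) != len(bisulfite_read):
        raise ValueError("reference and read must be aligned and of equal length")
    calls = {}
    for pos, (ref_base, read_base) in enumerate(
            zip(reference.upper(), bisulfite_read.upper())):
        if ref_base != "C":
            continue  # only cytosines are informative
        if read_base == "C":
            calls[pos] = "methylated"
        elif read_base == "T":
            calls[pos] = "unmethylated"
        else:
            calls[pos] = "ambiguous"
    return calls


calls = call_methylation("ACGTCCGA", "ACGTTCGA")
```

In this toy input the cytosines at positions 1 and 5 are retained and the cytosine at position 4 is read as T.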
The genotyping and genetic profiling methods disclosed by the present disclosure can be useful in various applications; to name but a few, such applications may include agriculture, health, parental testing, epidemiology, and forensic applications.
More specifically, in some embodiments, the disclosed genotyping and genetic profiling methods may be applied in agricultural genomics, or agrigenomics (the application of genomics in agriculture). In some non-limiting embodiments, the methods disclosed herein may be applied in seed selection and livestock improvement. In some non-limiting examples, the methods disclosed herein identify genetic markers linked to desirable traits, informing cultivation and breeding decisions. In some other non-limiting examples, the methods disclosed herein may be useful to improve plant and animal selection, nutrition, health surveillance, traceability, and veterinary diagnostics systems. In some non-limiting examples, the methods disclosed herein may be applied in developing varieties of plant crops with desirable traits such as drought tolerance, disease resistance, and higher yield. The methods disclosed herein may be applied in agrigenomics for identifying and propagating genetic variants that confer beneficial agronomic traits in complex environments, for example the ability to cope with environmental elements such as predators, soil conditions, and climate. Examples of phenotypic traits of agricultural value include, but are not limited to, yield and growth, disease resistance, abiotic stress adaptation, reproduction, nutrition/end-use quality, sustainability, etc.
The genotyping and genetic profiling methods disclosed herein may be applied in providing valuable information about the biological status of important resources like fisheries, crop and livestock health, and food safety and authenticity. The methods may be used to identify organisms present within various environments in order to understand ecosystem diversity. Species shed DNA into their environment; this DNA, often referred to as environmental DNA (eDNA), can be easily recovered and may serve as a means of differentiating species based on a unique genetic fingerprint. In this way, eDNA is used to determine the repertoire of organisms present in any setting from seawater to soil and food. This and other emerging applications of genomics are shaping best practices for resource monitoring and management related to agriculture and may be used with the disclosed methods.
In some other embodiments, the disclosed genotyping and genetic profiling methods may be utilized by animal breeders. As used herein, the term “breeder animal” refers to a non-human animal (e.g., domestic animals such as mammals, specifically horses, sheep, cows and dogs, as well as fish and avian animals) used for breeding. Accordingly, a breeder animal may be one that is used for breeding using conventional means, such as, e.g., mating a male breeder animal with a female breeder animal. Alternatively, a breeder animal may be one that is used as a donor of genetic material (e.g., sperm, egg, or mitochondria of the breeder animal) for the purpose of producing an offspring animal having one or more predetermined traits in the absence of physical mating with another breeder animal. In cases where an offspring animal is produced without requiring mating between two breeder animals, the genetic source material may be obtained and used from a single breeder animal or in combination with genetic material from one or more additional breeder animals. Additionally, a breeder animal may be a living animal or a deceased animal. In the case of a deceased animal, genetic material is obtained from the animal antemortem and cryopreserved for later use in producing an offspring animal having one or more predetermined traits.
Still further, in some embodiments, the disclosed genotyping and genetic profiling methods may be applicable in forensic applications. More specifically, the use of a subset of markers in a human genome has been utilized to determine an individual's personal identity, or DNA fingerprint or profile. These markers include locations or loci of short tandem repeated sequences (STRs) and intermediate tandem repeated sequences (ITRs) which in combination are useful in identifying one individual from another on a genetic level. Accordingly, STR markers are frequently used in the fields of forensic analysis, paternity determination and detection of genetic diseases and cancers.
Thus, in some embodiments, the genotyping and genetic profiling methods disclosed herein may be applicable for DNA profiling, which may use, in some non-limiting examples, selected biological markers for determining the identity of a DNA sample. For example, the most common analysis for determining a DNA profile is to determine the profile for a number of short tandem repeated (STRs) sequences found in an organism's genome. Species identification is one of the most important components of forensic practice. For example, in some cases of poaching and trading of endangered species, it has been used to provide important information and assist in police investigations. In the food industry, identification of the species present in meat products can be achieved, and in archeology, human remains can be distinguished from non-human remains.
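As a simple illustration of how an STR allele might be read out from a sequenced locus, the sketch below counts the longest uninterrupted run of a repeat motif in a sequence. It is an assumption-laden toy example: real STR genotyping must also handle flanking sequence, interrupted repeats and stutter artifacts, none of which are modeled here. The function name `longest_tandem_repeat` is hypothetical.

```python
def longest_tandem_repeat(sequence, motif):
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    k = len(motif)
    best = 0
    # Try every start position so that overlapping candidate runs are not missed.
    for start in range(len(sequence) - k + 1):
        count = 0
        pos = start
        while sequence[pos:pos + k] == motif:
            count += 1
            pos += k
        best = max(best, count)
    return best


locus_read = "CC" + "GATA" * 5 + "TT" + "GATA" * 2
repeat_count = longest_tandem_repeat(locus_read, "GATA")
```

A DNA profile in this simplified picture would be the collection of such repeat counts across a chosen set of loci.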
Still further, a DNA profile is useful in forensics for identifying an individual based on a nucleic acid sample. DNA profile as used herein may also be used for other applications, such as diagnosis and prognosis of diseases including cancer, cancer biomarker identification, inheritance analysis, genetic diversity analysis, genetic anomaly identification, quantification of minority populations, databanking, forensics, criminal case work, paternity, personal identification, etc.
In some further embodiments, the methods disclosed herein may apply to any organism, for example humans, non-human primates, animals, plants, viruses, bacteria, fungi and the like. As such, the present methods are not only useful for DNA profiling (e.g., forensics, paternity, individual identification, etc.) and humans as a target genome, but could also be used for other targets such as cancer and disease markers, genetic anomaly markers and/or when the target genome is not human based.
Still further embodiments of the present disclosure concern genotyping and genetic profiling methods that may be applicable in microbiome analysis, which allows one to identify and quantify (relatively) the microbial community in a given set of samples.
Still further, in some embodiments, the genotyping and genetic profiling methods of the present disclosure may be used for tumor analysis. More specifically, tumor biopsies are often a mixture of healthy and tumor cells. Targeted PCR allows deep sequencing of SNPs and loci with close to no background sequences. It may be used for copy number and loss of heterozygosity analysis on tumor DNA. Said tumor DNA may be present in many different body fluids or tissues of tumor patients. It may be used for detection of tumor recurrence, and/or tumor screening.
In yet some further aspects thereof, the genotyping and genetic profiling methods of the present disclosure may be useful for diagnosis of fetal genetic abnormalities. In such case, the starting sample may be obtained from maternal tissue (e.g., blood, plasma) or may contain fetal samples (present in amniotic fluid). The methods described in the present disclosure apply techniques for allowing detection of small, but statistically significant, differences in polynucleotide copy number. The targets for the assays and MIP probes described herein can be any genetic target associated with fetal genetic abnormalities, including aneuploidy as well as other genetic variations, such as mutations, insertions, additions, deletions, translocation, point mutation, trinucleotide repeat disorders and/or single nucleotide polymorphisms (SNPs), as well as control targets not associated with fetal genetic abnormalities. Still further, in some embodiments, the methods and compositions described herein can enable detection of extra or missing chromosomes, particularly those typically associated with birth defects or miscarriage. For example, the methods and compositions described herein may enable detection of autosomal trisomies (e.g., Trisomy 13, 15, 16, 18, 21, or 22). In other cases, the trisomy that is detected is a liveborn trisomy that may indicate that an infant will be born with birth defects (e.g., Trisomy 13 (Patau Syndrome), Trisomy 18 (Edwards Syndrome), and Trisomy 21 (Down Syndrome)). The abnormality may also be of a sex chromosome (e.g., XXY (Klinefelter's Syndrome), XYY (Jacobs Syndrome), or XXX (Trisomy X)). In some embodiments, the genetic target may be in any chromosome, for example, 13, 18, 21, X or Y.
Still further, to name but a few, additional fetal conditions that can be determined based on the methods and systems herein include monosomy of one or more chromosomes (X chromosome monosomy, also known as Turner's syndrome), trisomy of one or more chromosomes (13, 18, 21, and X), tetrasomy and pentasomy of one or more chromosomes (which in humans is most commonly observed in the sex chromosomes, e.g. XXXX, XXYY, XXXY, XYYY, XXXXX, XXXXY, XXXYY, XYYYY and XXYYY), monoploidy, triploidy (three of every chromosome, e.g. 69 chromosomes in humans), tetraploidy (four of every chromosome, e.g. 92 chromosomes in humans), pentaploidy and multiploidy.
In some cases, the genetic target may comprise more than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 75, 100, 125, 150, 175, 200, 225, 250, 300, 350, 400, 450, 500, 1,000, 5,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000 or 100,000 sites on a specific chromosome. In some cases, the genetic target comprises targets on more than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, or 22 different chromosomes. In some cases, the genetic target comprises targets on less than 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, or 23 chromosomes. In some cases, the genetic target comprises a gene that is known to be mutated in an inherited genetic disorder, including autosomal dominant and recessive disorders, and sex-linked dominant and recessive disorders. Non-limiting examples include genetic mutations that give rise to autoimmune diseases, neurodegenerative diseases, cancers, and metabolic disorders. In some embodiments, the method detects the presence of a genetic target associated with a genetic abnormality (such as trisomy), by comparing it in reference to a genetic target not associated with a genetic abnormality (such as a gene located on a normal diploid chromosome).
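The comparison of a target associated with an abnormality against a reference target can be illustrated with a simple read-count ratio. The sketch below is only a back-of-the-envelope illustration and not the disclosed analysis: it assumes counts are proportional to copy number, normalizes by the number of probed sites, and ignores the statistical testing that would be needed to call a small difference significant. In a mixed maternal and fetal sample, a fetal trisomy is expected to raise the ratio to about 1 + f/2, where f is the fetal fraction. Both function names are hypothetical.

```python
def copy_ratio(target_count, reference_count, target_sites=1, reference_sites=1):
    """Ratio of per-site read counts for a target versus a reference region."""
    if min(target_sites, reference_sites) <= 0 or reference_count <= 0:
        raise ValueError("site numbers and the reference count must be positive")
    per_target_site = target_count / target_sites
    per_reference_site = reference_count / reference_sites
    return per_target_site / per_reference_site


def expected_trisomy_ratio(fetal_fraction):
    """Expected target/reference ratio when only the fetal genome is trisomic."""
    if not 0.0 <= fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must be between 0 and 1")
    # Fetal DNA contributes 3 copies instead of 2: (1 - f) + 1.5 * f = 1 + f / 2.
    return 1.0 + fetal_fraction / 2.0


observed = copy_ratio(target_count=1500, reference_count=1000)
```

With a fetal fraction of 10 percent the expected ratio is only about 1.05, which is why deep, low-background sequencing of many sites is needed to resolve such differences.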
Still further, the genotyping and genetic profiling methods disclosed herein may be used for standard paternity and identity testing of relatives or ancestors, in humans, animals, plants or other creatures. It may be used for rapid genotyping and copy number analysis (CN), on any kind of material, e.g., amniotic fluid and CVS, sperm, product of conception (POC). It may be used for single cell analysis, such as genotyping on samples biopsied from embryos. It may be used for rapid embryo analysis (within less than one, one, or two days of biopsy).
A yet additional aspect of the present disclosure relates to a Molecular Inversion Probe (MIP) comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest,
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index.
As mentioned above, Molecular inversion probes (MIPs) are nucleic acid hybridization probes that hybridize to a target nucleic acid in a loop, with the 5' and 3' ends either adjacent in the target or separated by a small gap. One of the specific features of the probe according to the present disclosure is that it comprises a “sample identifier index”.
As previously mentioned, the term “sample identifier index” relates to a tag or an index that may be for example a nucleic acid sequence, a Unique Molecular Identifier (UMI) or any suitable label known in the art, that is employed in order to identify a specific sample. A sample identifier index is distinct from an index employed for target recognition. In some embodiments, said sample identifier index is at a position that does not disturb the target recognition. In some embodiments, said sample index identifier is not used for target recognition.
In some embodiments, the probe may comprise at least one UMI. Unique Molecular Identifiers (UMIs) are short sequences or molecular "tags" used for the purpose of identifying the specific MIP used. In yet some further embodiments, the MIP may comprise two UMIs. Still further, the at least one UMI of the disclosed MIP probe may flank one of the first and second complementary regions (or homology arms) or the at least one sample index identifier. Still further, in some embodiments, the UMI may comprise between about 4 nucleotides to about 50 nucleotides, specifically, between 4 to 40 nucleotides, specifically 4, 5, 6, 7, 8, 9, 10 nucleotides. In some embodiments UMIs useful in the disclosed MIPs comprise 7 nucleotides. In yet some specific embodiments, UMIs useful in the disclosed MIPs comprise 4 nucleotides. In yet some further embodiments, UMIs useful in the disclosed MIPs comprise 8 nucleotides. In yet some further embodiments, UMIs useful in the disclosed MIPs comprise 9 nucleotides. In yet some further embodiments, UMIs useful in the disclosed MIPs comprise 10 nucleotides. In some embodiments, the UMI may flank at least one of said sample index identifier.
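The effect of UMI length can be made concrete with a small calculation: a fully random UMI of n nucleotides has 4^n possible sequences, so the lengths mentioned above range from 256 (n = 4) to 1,048,576 (n = 10) distinct tags. The sketch below also shows one common way UMIs are used downstream, namely counting distinct UMIs per target so that amplification duplicates are not counted twice. This usage, and the names `umi_space` and `count_unique_molecules`, are illustrative assumptions rather than steps recited in the disclosure.

```python
from collections import defaultdict


def umi_space(length):
    """Number of distinct sequences available to a random UMI of this length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return 4 ** length


def count_unique_molecules(tagged_reads):
    """Count distinct UMIs observed per target from (target, umi) pairs."""
    umis_by_target = defaultdict(set)
    for target, umi in tagged_reads:
        umis_by_target[target].add(umi)
    return {target: len(umis) for target, umis in umis_by_target.items()}


example_reads = [
    ("locus_A", "ACGTACG"),
    ("locus_A", "ACGTACG"),  # same UMI: treated as a duplicate
    ("locus_A", "TTGACCA"),
    ("locus_B", "ACGTACG"),
]
molecule_counts = count_unique_molecules(example_reads)
```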
In some embodiments, the at least one sample identifier index flanks at least one of said first and/or second regions.
As used herein, the term “flanked” refers to a nucleic acid sequence positioned between two defined regions. The term “flank” may refer to a position either downstream or upstream to said first and/or second regions and/or a position at either the 3’ end or 5’ end of said first and/or second regions. As used herein, the term “flank” refers to a position close to said first and/or second regions, that is for example a position of about 1 to 100 nucleotides, or between about 1 nucleotide to about 90 nucleotides, 1 nucleotide to about 80 nucleotides, 1 nucleotide to about 70 nucleotides, 1 nucleotide to about 60 nucleotides, 1 nucleotide to about 50 nucleotides, specifically, between 1 to 40 nucleotides, between 1 to 30 nucleotides, between 1 to 20 nucleotides, between 1 to 10 nucleotides, specifically 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 nucleotides from said at least one first and/or second regions.
In some embodiments, the at least one sample identifier index flanks the 3' end of said first region and/or the 5' end of said second region.
In some embodiments, the probe of the present disclosure further comprises at least one additional index.
In some specific embodiments, said additional index is for target/probe recognition/UMI.
The probe suitable according to the present disclosure may have several different configurations as shown for example in Example 4. In some embodiments, the probe of the present disclosure may comprise more than one sample index identifier. In some specific embodiments, the probe may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 sample index identifiers. In some embodiments, the probe may comprise 2 sample index identifiers. In some embodiments, the probe may comprise at least one UMI. In some further embodiments, the probe may comprise more than one UMI. In some specific embodiments, the probe may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 UMIs. In some specific embodiments the probe may comprise 2 identical sample identifiers, wherein one sample identifier index flanks the 3' end of said first region and the second sample index identifier flanks the 5' end of said second region. In some other embodiments, the probe may comprise 2 identical sample identifiers, wherein one sample identifier index flanks the 3' end of said first region and the second sample index identifier flanks the 5' end of said second region and the probe may comprise two identical UMIs, wherein one UMI flanks the 3' end of one of the sample index identifiers and the second UMI flanks the 5' end of the other sample index identifier. In some further embodiments, the probe may comprise one sample index identifier flanking the 3’ end of the first region and one UMI flanking the 5' end of the second region. In some further embodiments, the probe may comprise one UMI flanking the 3’ end of the first region and one sample index identifier flanking the 5' end of the second region.
In some further embodiments, the probe of the present disclosure is double-stranded. However, it should be appreciated that in some embodiments, single-stranded MIPs may also be applicable according to the present disclosure.
In some further embodiments, said at least one target nucleic acid sequence of interest is at least one of a genomic nucleic acid sequence, a transcriptomic nucleic acid sequence, and a circulating free DNA (cfDNA).
In yet other embodiments, said at least one target nucleic acid sequence of interest is a nucleic acid sequence associated with, or comprising, at least one of: genetic and/or epigenetic variation/s, pathologic disorder/s, infectious entity, microorganism/s and GC-rich regions.
In some more specific embodiments, said genetic variations comprise at least one of: single nucleotide variants (SNVs) and/or single-nucleotide polymorphisms (SNPs), insertions and/or deletions (indels), inversions, copy number variations (CNV), structural variations, alternative splicing, loss of heterozygosity (LOH), gene fusions, translocations, duplications and variable number of tandem repeats.
In some further embodiments, said at least one target nucleic acid sequence of interest is associated with at least one hereditary, congenital, and/or somatic pathologic disorder or condition.
In some embodiments, said pathologic disorder is at least one of: a neoplastic disorder, a metabolic condition, an inflammatory disorder, an infectious disease caused by a pathogen, mental disorders, an autoimmune disease, a cardiovascular disease, a neurodegenerative disorder, fetal genetic condition and an age-related condition.
In another embodiment, said neoplastic disorder is Acute Myeloid Leukemia (AML).
In some further embodiments, said infectious entity is at least one of a viral, a bacterial, a fungal, a parasitic and a protozoan pathogen.
An additional aspect of the present disclosure relates to a plurality of Molecular Inversion Probes (MIPs) comprising unified at least one sample identifier index, wherein each of said MIPs targets a different target nucleic acid sequence of interest, and wherein each MIP comprises: (i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest of said MIP,
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest of said MIP; and
(iii) at least one sample identifier index.
According to some embodiments, a plurality of MIPs comprising unified at least one sample identifier index may enable targeting of different target nucleic acid sequences of interest.
The term "plurality" as used herein refers to more than one. More specifically, the disclosed method may use 1 to 1,000,000 or more different MIPs directed either to the same or to a different target nucleic acid sequence. For example, 1 to 90,000, 1 to 85,000, 1 to 80,000, 1 to 75,000, 1 to 70,000, 1 to 65,000, 1 to 60,000, 1 to 55,000, 1 to 50,000, 1 to 45,000, 1 to 40,000, 1 to 35,000, 1 to 30,000, 1 to 25,000, 1 to 20,000, 1 to 15,000, 1 to 10,000, 1 to 900, 1 to 9000, 1 to 8500, 1 to 8000, 1 to 7500, 1 to 7000, 1 to 6500, 1 to 6000, 1 to 5500, 1 to 5000, 1 to 4500, 1 to 4000, 1 to 3500, 1 to 3000, 1 to 2500, 1 to 2000, 1 to 1500, 1 to 1000, 1 to 950, 1 to 900, 1 to 850, 1 to 800, 1 to 750, 1 to 700, 1 to 650, 1 to 600, 1 to 550, 1 to 500, 1 to 450, 1 to 400, 1 to 350, 1 to 300, 1 to 250, 1 to 200, 1 to 150, 1 to 100, 1 to 95, 1 to 90, 1 to 85, 1 to 80, 1 to 75, 1 to 70, 1 to 65, 1 to 60, 1 to 55, 1 to 50, 1 to 45, 1 to 40, 1 to 35, 1 to 30, 1 to 25, 1 to 20, 1 to 15, 1 to 10, specifically, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200, 250, 500, 1000, 10,000, 100,000 or more MIPs.
In some embodiments, the plurality of probes form at least one library/panel.
In some embodiments, the at least one index flanks at least one of said first and/or second regions.
In some embodiments, the at least one sample identifier index flanks the 3' end of said first region and/or the 5' end of said second region.
In some embodiments, the plurality of probes of the present disclosure comprise at least one additional index.
In some embodiments, said additional index is for target or for probe recognition or is a UMI. In some specific embodiments, said probe is double-stranded. Additional configurations of the probes are as further detailed above.
In some embodiments, said different target nucleic acid sequences of interest are associated with, or comprising, at least one of genetic and/or epigenetic variation/s, pathologic disorder/s, infectious entity, microorganism/s and GC-rich regions.
In yet other embodiments, said different target nucleic acid sequences of interest are all associated with, or comprising, at least one of the same genetic and/or epigenetic variation/s, pathologic disorder/s, infectious entity, microorganism/s and GC-rich regions. In some specific embodiments, the plurality of probes may refer to for example a COVID19 panel, a myeloid panel, a generic cancer panel, a cancer specific panel or a carrier screening panel.
In some more specific embodiments, said different target nucleic acid sequences of interest may comprise the same genomic locus. In another embodiment, said genomic locus may be associated with AML, for example said genomic locus may be chr2:25457097-25457316 (hg19 build). In some embodiments, said different target nucleic acid sequences of interest may comprise the DNMT3A gene, e.g. said different target nucleic acid sequences of interest may comprise the R882 codon. In some further embodiments, said genomic locus is chr1:114713908. In some specific embodiments, the genomic locus may comprise the NRAS gene, e.g. may be associated with cancer. In yet some other embodiments, said genomic locus may be chr17:7,675,206, e.g. may comprise the TP53 gene, e.g. may be associated with cancer. In some specific embodiments, the genomic locus may comprise the JAK2 gene, e.g. may be associated with cancer.
In some embodiments, said at least one target nucleic acid sequence of interest is at least one of a genomic nucleic acid sequence, a transcriptomic nucleic acid sequence, and a circulating free DNA (cfDNA).
In some further embodiments, said genetic variations comprise at least one of: single nucleotide variants (SNVs) and/or single-nucleotide polymorphisms (SNPs), insertions and/or deletions (indels), inversions, copy number variations (CNV), structural variations, alternative splicing, loss of heterozygosity (LOH), gene fusions, translocations, duplications and variable number of tandem repeats.
In yet another embodiment, said at least one target nucleic acid sequence of interest is associated with at least one hereditary, congenital, and/or somatic pathologic disorder or condition.
In some specific embodiments, said pathologic disorder is at least one of: a neoplastic disorder, a metabolic condition, an inflammatory disorder, an infectious disease caused by a pathogen, mental disorders, an autoimmune disease, a cardiovascular disease, a neurodegenerative disorder, fetal genetic condition and an age-related condition. In some more specific embodiments, said neoplastic disorder is Acute Myeloid Leukemia (AML). In yet another embodiment, said infectious entity is at least one of a viral, a bacterial, a fungal, a parasitic and a protozoan pathogen.
In some further embodiments, the plurality of MIPs may be for performing molecular inversion probe-based method for simultaneous targeted sequencing of at least two samples of different target nucleic acid sequences of interest.
A further aspect of the invention relates to a kit comprising at least one set of plurality of MIPs, each of the at least one set of MIPs comprises unified at least one sample identifier index, wherein each of said MIPs targets a different target nucleic acid sequence of interest, and wherein each MIP comprises:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest of said MIP,
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest of said MIP; and
(iii) at least one sample identifier index; and optionally, at least one of:
(a) instructions for use;
(b) additional reagents.
In some embodiments, the plurality of MIPs is as defined in the previous aspects of the invention above.
In some embodiments, each of the at least one set of plurality of MIPs is adapted for performing molecular inversion probe-based method in at least one sample.
In some embodiments, the kit comprises at least two sets of plurality of MIPs, and wherein said at least one sample identifier index is different in each set.
In some embodiments, the kit of the present disclosure is for performing molecular inversion probe-based method for simultaneous targeted sequencing of at least two samples.
A further aspect of the invention relates to a kit comprising at least one MIP, wherein each MIP comprises:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest of said MIP,
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest of said MIP; and
(iii) at least one sample identifier index; and optionally, at least one of:
(a) instructions for use;
(b) additional reagents.
In some embodiments, the kit may comprise at least two MIPs. In some embodiments, the kit may be for performing a molecular inversion probe-based method for simultaneous targeted sequencing of at least two samples.
The term "nucleic acid molecule or sequence" is referred to often herein, and relates to DNA, RNA, single-stranded, partially single-stranded, partially double-stranded or double-stranded nucleic acid sequences; sequences comprising nucleotides, ribonucleotides, deoxyribonucleotides, nucleotide analogs, modified nucleotides and nucleotides comprising backbone modifications, branch points and non-nucleotide residues, groups or bridges; synthetic RNA, DNA and chimeric nucleotides, hybrids, duplexes, heteroduplexes; and any ribonucleotide, deoxyribonucleotide or chimeric counterpart thereof and/or corresponding complementary sequence and any chemical modifications thereof. Modifications include, but are not limited to, those which provide other chemical groups that incorporate additional charge, polarizability, hydrogen bonding, electrostatic interaction, and functionality to the nucleic acid ligand bases or to the nucleic acid ligand as a whole. Such modifications include, but are not limited to, 2'-position sugar modifications, 5-position pyrimidine modifications, 8-position purine modifications, modifications at exocyclic amines, substitution of 4-thiouridine, substitution of 5-bromo or 5-iodo-uracil; backbone modifications, methylations, unusual base-pairing combinations such as the isobases, isocytidine, and isoguanidine and the like. Modifications can also include 3' and 5' modifications such as capping.
As indicated above, the methods and probes or plurality of probes provided by the present disclosure may be used for the diagnosis and/or further treatment of a “pathological disorder”, which refers to a condition, in which there is a disturbance of normal functioning, any abnormal condition of the body or mind that causes discomfort, dysfunction, or distress to the person affected or those in contact with that person. It should be noted that the terms "disease", "disorder", "condition" and "illness", are equally used herein.
It should be appreciated that any of the methods and probes or plurality of probes described by the present disclosure may be applicable for any of the disorders disclosed herein or any condition associated therewith. It is understood that the interchangeably used terms "associated", “linked” and "related", when referring to pathologies herein, mean diseases, disorders, conditions, or any pathologies which at least one of: share causalities, co-exist at a higher than coincidental frequency, or where at least one disease, disorder condition or pathology causes the second disease, disorder, condition or pathology.
By “subject” or “subject in need” or “patient”, it is meant any organism who may be affected by the above-mentioned conditions. Examples of relevant organisms according to the present disclosure are further detailed above.
It is to be understood that the terms "treat”, “treating”, “treatment" or forms thereof, as used herein, mean preventing, ameliorating or delaying the onset of one or more clinical indications of disease activity in a subject having a pathologic disorder. Treatment refers to therapeutic treatment. Those in need of treatment are subjects suffering from a pathologic disorder. Specifically, providing a "preventive treatment" (to prevent) or a "prophylactic treatment" is acting in a protective manner, to defend against or prevent something, especially a condition or disease.
Furthermore it is understood that the interchangeably used terms "associated", “linked” and "related", when referring to pathologies as disclosed herein after, mean any genetic or epigenetic variations which at least one of: cause either directly or indirectly, responsible for, share causalities, co-exist at a higher than coincidental frequency, with at least one disease, disorder condition or pathology or any symptoms thereof.
More specifically, as used herein, “disease”, “disorder”, “condition”, “pathology” and the like, as they relate to a subject's health, are used interchangeably and have meanings ascribed to each and all of such terms.
It should be understood that the terms specified in the claims and embodiments as defined by the herein definitions are applicable for each and every aspect and embodiment of the invention according to the present disclosure.
All definitions, as defined and used herein, should be understood to control over dictionary definitions, definitions in documents incorporated by reference, and/or ordinary meanings of the defined terms.
The term "about" as used herein indicates values that may deviate up to 1%, more specifically 5%, more specifically 10%, more specifically 15%, and in some cases up to 20% higher or lower than the value referred to, the deviation range including integer values, and, if applicable, non-integer values as well, constituting a continuous range. In some embodiments, the term "about" refers to ± 10 %.
The indefinite articles “a” and “an,” as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean “at least one.” It must be noted that, as used in this specification and the appended claims, the singular forms “a”, “an” and “the” include plural referents unless the content clearly dictates otherwise.
The phrase “and/or,” as used herein in the specification and in the claims, should be understood to mean “either or both” of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with “and/or” should be construed in the same fashion, i.e., “one or more” of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the “and/or” clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
As used herein in the specification and in the claims, “or” should be understood to have the same meaning as “and/or” as defined above. For example, when separating items in a list, “or” or “and/or” shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one, of a number or list of elements, and, optionally, additional unlisted items. Only terms clearly indicated to the contrary, such as “only one of” or “exactly one of,” or, when used in the claims, “consisting of,” will refer to the inclusion of exactly one element of a number or list of elements. In general, the term “or” as used herein shall only be interpreted as indicating exclusive alternatives (i.e., “one or the other but not both”) when preceded by terms of exclusivity, such as “either,” “one of,” “only one of,” or “exactly one of.” “Consisting essentially of,” when used in the claims, shall have its ordinary meaning as used in the field of patent law.
As used herein in the specification and in the claims, the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, “at least one of A and B” (or, equivalently, “at least one of A or B,” or, equivalently “at least one of A and/or B”) can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
It should also be understood that, unless clearly indicated to the contrary, in any methods claimed herein that include more than one step or act, the order of the steps or acts of the method is not necessarily limited to the order in which the steps or acts of the method are recited.
Throughout this specification and the Examples and claims which follow, all transitional phrases such as “comprising,” “including,” “carrying,” “having,” “containing,” “involving,” “holding,” “composed of,” and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Specifically, it should be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps. Only the transitional phrases “consisting of” and “consisting essentially of” shall be closed or semi-closed transitional phrases, respectively, as set forth in the United States Patent Office Manual of Patent Examining Procedures. More specifically, the terms "comprises", "comprising", "includes", "including", “having” and their conjugates mean "including but not limited to". The term “consisting of” means “including and limited to”. The term "consisting essentially of" means that the composition, method or structure may include additional ingredients, steps and/or parts, but only if the additional ingredients, steps and/or parts do not materially alter the basic and novel characteristics of the claimed composition, method or structure.
It should be noted that various embodiments of this invention may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub ranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed sub ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range. Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases "ranging/ranges between" a first indicate number and a second indicate number and "ranging/ranges from" a first indicate number "to" a second indicate number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals there between.
As used herein the term "method" refers to manners, means, techniques and procedures for accomplishing a given task including, but not limited to, those manners, means, techniques and procedures either known to, or readily developed from known manners, means, techniques and procedures by practitioners of the chemical, pharmacological, biological, biochemical and medical arts.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub combination or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements. Various embodiments and aspects of the present invention as delineated herein above and as claimed in the claims section below find experimental support in the following examples.
Disclosed and described, it is to be understood that this invention is not limited to the particular examples, methods steps, and compositions disclosed herein as such methods steps and compositions may vary somewhat. It is also to be understood that the terminology used herein is used for the purpose of describing particular embodiments only and not intended to be limiting since the scope of the present invention will be limited only by the appended claims and equivalents thereof.
The following examples are representative of techniques employed by the inventors in carrying out aspects of the present invention. It should be appreciated that while these techniques are exemplary of preferred embodiments for the practice of the invention, those of skill in the art, in light of the present disclosure, will recognize that numerous modifications can be made without departing from the spirit and intended scope of the invention.
EXAMPLES
Without further elaboration, it is believed that one skilled in the art can, using the preceding description, utilize the present invention to its fullest extent. The following preferred specific embodiments are, therefore, to be construed as merely illustrative, and not limitative of the claimed invention in any way.
Experimental procedures
All MIP and eMIP reactions were performed as described previously [4]. All reactions were performed separately and pooled as described. Pooled samples were barcoded at the final barcoding reaction with a different Illumina index in order to distinguish between different experiments. To maximize uniformity, after every pooling step (which was carried out using the same volume from every reaction), the pool was mixed, and the following reactions were processed at a single reaction volume; i.e., if pooling was performed after the exonuclease step, in which a single reaction volume is 15 µl, then after collecting and mixing the same volume from several tubes, 15 µl was taken for the barcoding PCR reaction. To distinguish between the pooled data sets, a different Illumina barcode was assigned at the barcoding PCR reaction, which allowed the same probes to be used in different experiments.
Following standard demultiplexing, pooled samples were demultiplexed by their respective panel indexes using an in-house developed script. Analysis, from arm trimming to mapping, was as described [4]. Variant calling for the single probe panel experiment was done using samtools mpileup (samtools mpileup -Q30 -q 45 --max-depth 99999999 --count-orphans --no-BAQ --excl-flags UNMAP,QCFAIL,SECONDARY -f $GENOME -l $TARGETS), and for the myeloid panel experiment using VarScan (samtools mpileup -f {input.reference} --max-depth 99999999 --count-orphans --no-BAQ --excl-flags UNMAP,QCFAIL,SECONDARY {input} | varscan mpileup2cns --min-var-freq 0.0000001 --output-vcf 1 --variants 1 --strand-filter 0).
EXAMPLE 1 eMIP-embedded Molecular Inversion Probes
The four sequential MIP steps are hybridization, gap-fill and ligation, exonuclease, and barcoding PCR (Figure 1A-1B). Here, eMIP (embedded Molecular Inversion Probes) is presented, a MIP-based targeted sequencing approach that utilizes indexed panels that harbor a sample-specific barcode within every probe sequence (Figure 1C-1D). This allows the index to be imprinted on every to-be-sequenced library at the very first step of the MIP reaction, the hybridization reaction, and adds a combinatorial index layer on top of the standard barcoding PCR index primers, which may then not be required.
Utilizing the eMIP approach, evidence was provided that samples can be pooled as early as the first hybridization step, and that, following demultiplexing by the embedded index, the expected data from the control samples (a standard MIP reaction) were recapitulated. The eMIP approach solves both above-mentioned problems: A) cost: two factors can reduce the reaction cost by minimizing the reagents being used: earlier pooling of samples into a single reaction (i.e. pooling immediately after hybridization), and pooling of several samples together (Figure 2). B) cross-contamination: imprinting the library with an index at the first step of the reaction reduces the cross-contamination pitfall. In the first experiment, a single-probe MIP reaction was used to target DNA samples with a known mutation pattern. In the next experiment, a Myeloid panel targeting 394 genomic regions at a footprint of 64,120 bases was used to target DNA samples with a known mutation pattern.
EXAMPLE 2 eMIP proof-of-concept on Acute Myeloid Leukemia (AML) mutations
To demonstrate the feasibility of the eMIP approach and to test how early in the protocol the samples can be pooled without negatively affecting results, a set of panels was created. Each panel consists of a single probe (n=1) that targets the same genomic locus: chr2:25457097-25457316 (hg19 build), targeting the DNMT3A gene, and specifically the R882 codon that is the most common mutation in acute myeloid leukemia (AML). The probe sequence was structured such that the first 4 bases of sequencing read1 and the first 4 bases of sequencing read2 generate a panel/sample unique tag (Table 1). Twelve probes were applied on different DNA templates with known variant allele frequency (VAF) in replicates (n=4): 50% (OCI-AML3 cell line), 25% (OCI-AML3 cell line DNA diluted with an equimolar concentration of normal Human Genomic DNA, Promega G1471) and 0% (Human Genomic DNA). Five conditions were measured: Standard, a regular MIP reaction with a separate barcoding PCR step, i.e. after hybridization, after gap fill or after exonuclease (Figure 1B), and Undetermined, a standard MIP experiment in which data were deliberately delivered to the undetermined fastq files, following barcoding PCR using primers that do not harbor an index. It was then sought to extend the eMIP demultiplexing by repeating the same experiment with sample-agnostic PCR primers that generate a full-length, index-less library. Other reactions from the eMIP protocol were demultiplexed by the P# (see Table 1) and pooled after different protocol steps: after the hybridization step, the gap fill step, or the exonuclease step (Figure 1D).
[Table 1: image not reproduced here]
Table 1: DNMT3A single probe eMIP experiment indexes. Different panel tags marked P# were embedded within the probe sequence, such that they will be sequenced on both reads before the arm sequence. Every row details the expected first 4 bases of read1 and read2.
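The tag-based demultiplexing described here can be sketched as follows. This is a minimal illustration only, not the in-house script referred to in the experimental procedures: the function names, the PANELS dictionary and its tag sequences are hypothetical placeholders (the actual tags are those of Table 1), and the choice to discard ambiguous read pairs is an assumption.

```python
from typing import Dict, Optional, Tuple

TAG_LEN = 4  # per the text, the first 4 bases of each read carry the panel tag


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_panel(
    read1: str,
    read2: str,
    panels: Dict[str, Tuple[str, str]],
    max_mismatches: int = 0,
) -> Optional[str]:
    """Return the panel whose (read1, read2) tags match the read pair.

    A pair is assigned only when exactly one panel is within the allowed
    number of mismatches; otherwise None is returned (an assumed policy).
    """
    if len(read1) < TAG_LEN or len(read2) < TAG_LEN:
        return None
    observed = read1[:TAG_LEN] + read2[:TAG_LEN]
    hits = [
        name
        for name, (tag1, tag2) in panels.items()
        if hamming(observed, tag1 + tag2) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical tags for illustration only; not taken from Table 1.
PANELS = {"P1": ("ACGT", "TGCA"), "P2": ("GATC", "CTAG")}

if __name__ == "__main__":
    print(assign_panel("ACGTAAAA", "TGCATTTT", PANELS))                    # exact match
    print(assign_panel("ACGAAAAA", "TGCATTTT", PANELS, max_mismatches=1))  # one mismatch
```

Allowing mismatches trades sensitivity for specificity; with short tags, a tolerant match is only safe when the tag set is designed so that no two panels fall within the tolerance of the same observed sequence.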
To control for correct P# calling, the mapped P sequence of each standard MIP reaction was first compared with its known embedded barcode (n=12). The exact match median was 99.1%, and 99.9% when allowing one mismatch (Figure 3A). By demultiplexing the data from the undetermined NGS run reads, the inventors were able to recapitulate the original experiment and correctly call the VAF of the expected sample barcodes, providing further evidence that demultiplexing can be done using the eMIP panel barcodes. Next, the VAF calls were validated against the expected VAF, as defined by the DNA template (Figure 3B). The data demonstrate that the correct VAF was called at each pooling step. Surprisingly, pooling after the hybridization step resulted in no leakage between libraries, although the pooling was done before any polymerization had taken place (polymerization was performed only after pooling, at the gap fill step).
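The comparison between called and expected VAF can be illustrated with a short sketch. It assumes the usual definition of VAF as variant-supporting reads divided by total reads at the position, and a simple weighted average for a template mixed from several DNA sources; both function names are hypothetical and the read counts in the usage lines are made up for illustration.

```python
from typing import Iterable, Tuple


def observed_vaf(alt_reads: int, total_reads: int) -> float:
    """Variant allele frequency as the fraction of reads supporting the variant."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def expected_vaf_after_mixing(components: Iterable[Tuple[float, float]]) -> float:
    """Expected VAF of a mixture given (relative_amount, vaf) per component."""
    components = list(components)
    total = sum(amount for amount, _ in components)
    if total <= 0:
        raise ValueError("total amount must be positive")
    return sum(amount * vaf for amount, vaf in components) / total


if __name__ == "__main__":
    # Equimolar mix of a 50% VAF template with a 0% VAF template.
    print(expected_vaf_after_mixing([(1.0, 0.5), (1.0, 0.0)]))
    # Illustrative counts only: 250 variant reads out of 1000.
    print(observed_vaf(250, 1000))
```

Under these assumptions, the equimolar dilution of a 50% VAF template with normal DNA gives an expected VAF of 25%, matching the template described in the text.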
EXAMPLE 3
eMIP approach on the InfiniSeq Myeloid panel
Following the above proof-of-concept, it was sought to extend the eMIP validation to a real-life panel. A Myeloid panel targeting 394 genomic regions was used, the InfiniSeq Myeloid panel v5.0.3 (footprint of 64,120 bases), to target different DNA samples with a known mutation pattern. The panel was replicated 7 times (7X), each copy with a different index set, sequenced from both read1 and read2 (Table 2). Three DNA samples were tested: OCI-AML3, K562 and normal Human Genomic DNA. Repeating the same conditions as before (standard MIP reaction, and pooling after the hybridization, gap fill, and exonuclease steps), the inventors then looked for the VAF at specific regions known to be 100% mutated: NRAS chr1:114713908 for OCI-AML3, and TP53 chr17:7,675,206 for K562. Normal DNA should yield 0% for both. The results demonstrate correct VAF calling under all conditions, even at the “after hybridization” step (Figures 4A-4D).
Figures 5A-5D provide further data for each of the duplicates with respect to Variant Allele Frequency (VAF) and Quality Control (QC) metrics. The on-target percent represents the percentage of reads aligning to predefined panel targets out of the total reads per sample. The percent of uniformity represents the percentage of bases within the specified targets of interest that are covered by at least 20% of the mean coverage of bases in those targets.
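The two QC metrics defined above can be written out as a short sketch. The function names and the example coverage values are illustrative assumptions; only the definitions (reads on target out of total reads, and target bases covered at 20% or more of the mean target coverage) come from the text.

```python
from typing import Sequence


def on_target_percent(on_target_reads: int, total_reads: int) -> float:
    """Percentage of a sample's reads that align to the predefined panel targets."""
    if total_reads == 0:
        return 0.0
    return 100.0 * on_target_reads / total_reads


def uniformity_percent(per_base_coverage: Sequence[int],
                       fraction_of_mean: float = 0.2) -> float:
    """Percentage of target bases covered by at least 20% of the mean coverage.

    per_base_coverage holds the read depth at every base of the targets.
    """
    if not per_base_coverage:
        return 0.0
    mean_cov = sum(per_base_coverage) / len(per_base_coverage)
    if mean_cov == 0:
        return 0.0
    threshold = fraction_of_mean * mean_cov
    covered = sum(1 for depth in per_base_coverage if depth >= threshold)
    return 100.0 * covered / len(per_base_coverage)


# Invented example: five target bases, one of them with no coverage.
print(on_target_percent(900, 1000))                  # 90.0
print(uniformity_percent([100, 100, 100, 100, 0]))   # 80.0
```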
Table 2: Myeloid panel eMIP experiment indexes. Different panel tags marked P# were embedded within the probe sequence, such that they will be sequenced on both reads before the arm sequence. Every row details the expected first 4 bases of R1 and R2.
EXAMPLE 4
Different configurations of suitable probes for eMIP
For both the above described experiments (Examples 2 and 3), four nucleotides (4N) of the UMI were switched with 4 nucleotides that represent the sample identifier, on both sides of the probe, flanking the arms. The sequences are detailed in the tables above (Tables 1 and 2).
Additional experiments were also performed showing similar results on panels constructed in different configurations, as illustrated in Figure 6. For example, probes suitable for eMIP were designed with two sample identifiers comprising 5 nucleotides instead of 4 (Fig. 6A), or with two sample identifiers comprising 6 nucleotides and two UMIs comprising 4 nucleotides (Fig. 6B), or with one sample identifier comprising 8 nucleotides and one UMI comprising 8 nucleotides (Fig. 6C).
In the present experiment, it was aimed to:
(A) validate if pooling after hybridization and/or gap filling yields comparable data to standard MIP reactions and
(B) test various eMIP formations (as shown in Figure 6).
Here, a limit of detection (LOD) experiment was employed, in which OCI-AML3 cell line DNA carrying a known mutation in the NRAS gene (chr1:115,256,529, hg19) was diluted with normal DNA as the diluent. The expected VAF of the undiluted mutation is 100% (homozygous), and the remaining samples have expected ratios of 50%, 25%, 10%, 5% and 0% (normal DNA). A water control (NTC) was an additional template in the pool.
(A) The diluted samples were analyzed using either the standard MIP reaction or a 7X multiplexing (MPX) eMIP reaction in which samples were pooled either after the hybridization step or after the gap-fill (extension) step. Each reaction had a technical duplicate, repeating the entire MIP reaction from the hybridization step. The panel used in the experiment was a short 31-probe oncology panel, and the analysis was performed using the VarScan variant caller. The NTC did not show any detected variant for any of the treatments (standard reaction, pooling after hybridization, and pooling after gap-fill).
Notably, serial dilution of DNA using normal DNA may not display the expected Variant Allele Frequency (VAF), due either to erroneous dilution or to miscalculation caused by uneven copy number variation (CNV)/karyotype between the DNA samples (as the calculation is done by concentration). Therefore, the success of such an experiment is assessed by comparing the VAF of the eMIP protocol to that of the standard reaction.
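The caveat about copy number can be made concrete with a simplified model. The sketch below assumes that each DNA source contributes alleles in proportion to its mass fraction multiplied by a relative locus copy number; the function name and the copy number parameters are illustrative assumptions, not values reported in the experiment.

```python
def expected_vaf(mutant_dna_fraction: float,
                 mutant_vaf: float = 1.0,
                 mutant_locus_copies: float = 2.0,
                 normal_locus_copies: float = 2.0) -> float:
    """Expected VAF of a mixture of mutated cell-line DNA and normal DNA.

    mutant_dna_fraction: share of cell-line DNA in the mixture (0 to 1),
        as calculated from concentration.
    mutant_vaf: VAF in the undiluted cell line (1.0 for a homozygous variant).
    *_locus_copies: relative copies of the locus per unit of DNA in each
        source; unequal values model an uneven CNV/karyotype.
    """
    if not 0.0 <= mutant_dna_fraction <= 1.0:
        raise ValueError("mutant_dna_fraction must be between 0 and 1")
    mutant_alleles = mutant_dna_fraction * mutant_locus_copies * mutant_vaf
    total_alleles = (mutant_dna_fraction * mutant_locus_copies
                     + (1.0 - mutant_dna_fraction) * normal_locus_copies)
    return mutant_alleles / total_alleles


# With equal copy number, a 1:1 mixture of a homozygous variant gives 50%.
print(expected_vaf(0.5))
# If the cell line carried 3 copies of the locus, the same mixture gives 60%.
print(expected_vaf(0.5, mutant_locus_copies=3.0))
```

Under this model a dilution calculated purely by concentration can deviate from its nominal VAF, which is why the comparison against the standard reaction is the more informative readout.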
The results detailed in Figures 7A-7B demonstrate that when collecting all data from the same treatment, there is no significant difference between the standard reaction and eMIP treatments, both by QC measurements and VAF inspection. Moreover, pooling after hybridization detects the correct VAF better than pooling after gap fill.
(B) Different embodiments of eMIP probes were used to test for the best-performing formation. The 5BC, 6BC-4UMI and 8BC-8UMI configuration names denote the type of backbone, namely the length and position of the sample index and of the UMI tag, if present (as seen in Figure 6). Comparing the data to the standard reaction per variant dilution (omitting samples with 0 reads), it was demonstrated that it is possible to capture the correct VAF for all embodiments, with even better data coming from pooling after hybridization (Figure 8). The 6BC-4UMI and 8BC-8UMI configurations present better VAF detection than the 5BC embodiment, and similar or improved data quality over the standard reaction using the same probes. When collecting all data per expected VAF, the same trend was observed (Figure 9). The 6BC-4UMI and 8BC-8UMI configurations present QC results similar to or better than those of the standard reaction (Figure 10).
EXAMPLE 5
Testing different concentrations of DNA starting material.
A 7X panel multiplexing experiment was calibrated using the same short oncology panels as used in the previous experiment (Example 3) in order to test different DNA concentrations as starting template material. Each 7-template eMIP experiment consists of 4 identical WT DNA templates (with the concentration modified between experiments), 1 AML3 DNA template (constant at 100 ng/µl) and 2 NTCs (DDW). Figure 11 presents the average of the WT data only, showing similar quality (uniformity per base % and on-target %) from 100 ng/µl down to 20 ng/µl, with a reduction in quality starting at 5 ng/µl. DDW data was omitted from the analysis due to the low number of reads (7000 total reads on average).
EXAMPLE 6
Further optimization of the eMIP reaction.
Having established the ability to pool samples after the hybridization step, different parameters were further optimized. The 8BC-8UMI probe configuration was used. The Myeloid panel was employed, and the DNA sample was normal DNA. Four different parameters were tested across three PCR cycle numbers (20, 22 or 24 cycles). The results were compared to the same samples processed without the eMIP protocol at 24 cycles (standard reaction), limited to 150K reads.
1. DNA concentration (0.1-50 ng/µL); all variations are in the same eMIP pool.
2. A pool of 16x/24x samples.
3. Starting hybridization from 6/9 µL DNA (a total of 0.6-300 ng or 0.9-450 ng, respectively, given the DNA concentrations in point 1).
4. Using 6/8 µL of Mix2 in the gap-filling step.
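The input amounts quoted in point 3 follow directly from concentration multiplied by volume. The short check below reproduces them; the helper name `input_mass_ng` is illustrative.

```python
def input_mass_ng(concentration_ng_per_ul: float, volume_ul: float) -> float:
    """Total DNA mass (ng) entering hybridization: concentration x volume."""
    return concentration_ng_per_ul * volume_ul


# Lowest and highest tested concentrations (ng/µL) at each hybridization volume.
for volume_ul in (6, 9):
    low = input_mass_ng(0.1, volume_ul)
    high = input_mass_ng(50, volume_ul)
    print(f"{volume_ul} µL: {low:.1f} ng to {high:.0f} ng")
```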
In Figure 12, a clear and consistent correlation is observed between the concentration of a given sample and the resulting number of reads. This suggests the importance of normalizing template concentrations before initiating the eMIP reaction to ensure an equal number of reads across samples. It is noteworthy that the standard reactions, pooled individually for technical reasons, received a higher number of reads compared to the eMIP samples.
Data shows that the best performing conditions (for all DNA concentrations) in terms of percent on-target were:
1. 20 cycles of PCR
2. A pool of 24 samples
3. 6 µL DNA in hybridization
4. 6 µL of Mix2 for gap-fill (std)
Data shows that the best performing conditions (for all DNA concentrations) in terms of percent of uniformity were:
1. 20 cycles of PCR
2. A pool of 16 samples
3. 9 µL DNA in hybridization
4. 6 µL of Mix2 for gap-fill (std)
Generally, even though the concentration of samples within the pool is reduced, the percent on-target is the same as or improved over the standard reaction per concentration, with similar behavior across all experimental conditions. On the other hand, the percent of uniformity is less affected by concentration differences throughout the experiment, with a reduction demonstrated at the 1 ng/µl concentration. However, the standard reaction does not show such a reduction at low template concentrations.
EXAMPLE 7
Testing the effect of differential pooling volumes on the final intra-library read representation.
Pooling early in the MIP protocol, as occurs in the eMIP protocol, presents a drawback: library concentrations within the same pool cannot be normalized after the eMIP reaction. This essentially fixes the read ratio per sample. To address this limitation, one approach, as illustrated above, involves normalizing template concentrations. It was aimed to investigate whether differential pooling at the pooling stage (i.e., after the hybridization step) allows control of the total number of reads per sample. One optional strategy is to pool samples of higher expected VAF% at a lower volume and samples of lower expected VAF% at a higher volume. Adjusting the pooled hybridization volumes in this way is intended to overcome the fixed read ratio that derives from equal-volume pooling.
To do so, a myeloid panel in a 24x MPX eMIP reaction was used, targeting different mixture ratios of NA23245 (NIGMS Human Genetic Cell Repository) DNA and normal DNA in either duplicates or triplicates, and NTC (all in the same 24x MPX eMIP reaction). The 8BC-8UMI probe configuration was used. In order to enhance the reads of lower-VAF samples, the hybridization reactions were divided into two groups: higher expected VAF%, pooled at 1 µl, and lower expected VAF%, pooled at 1.5 µl. This ratio of 1:1.5 should be interpreted as 50% more reads for the lower expected VAF% samples compared to the higher expected VAF% samples. The same experiment was repeated with OCI-AML3 DNA and normal DNA. The analysis compared the total reads per sample for each group divided by the total reads for all samples.
The optimal read ratio out of the total reads was expected to be 5% for each higher-volume sample and 3% for each lower-volume sample. The data for both NA23245 and OCI-AML3 revealed noteworthy differences between the two groups (T-test p = 3×10⁻³ and p = 9×10⁻⁵ for each experiment, respectively), as shown in Figure 13.
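The expected read shares can be reproduced with a simple proportional model. The sketch assumes that each sample's share of reads equals its share of the pooled volume, and that the 24 reactions split evenly into 12 pooled at 1 µl and 12 pooled at 1.5 µl. The even split is an assumption made for illustration; the text does not state the group sizes.

```python
from typing import List


def expected_read_fractions(pooled_volumes_ul: List[float]) -> List[float]:
    """Expected share of total reads per sample, proportional to pooled volume."""
    total = sum(pooled_volumes_ul)
    if total <= 0:
        raise ValueError("total pooled volume must be positive")
    return [volume / total for volume in pooled_volumes_ul]


# Assumed layout: 12 samples pooled at 1 µl and 12 samples pooled at 1.5 µl.
volumes = [1.0] * 12 + [1.5] * 12
fractions = expected_read_fractions(volumes)
print(f"1 µl sample:   {100 * fractions[0]:.2f}% of reads")
print(f"1.5 µl sample: {100 * fractions[-1]:.2f}% of reads")
```

Under these assumptions the model gives about 3.3% and 5% of total reads per sample, in line with the expected 3% and 5% stated above.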
EXAMPLE 8
Validation of 24x multiplexity data accuracy using a limit of detection (LOD) experiment.
To increase cost-effectiveness of the eMIP protocol, it was sought to increase the multiplexity of the pooled hybridization to 24x samples in one tube and compare it to the same reaction using standard MIP reaction. It was also intended to apply the unique molecular identifier (UMI) tag embedded in the probes to enhance accuracy by PCR deduplication.
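PCR deduplication with a UMI can be sketched as follows. The sketch assumes that reads sharing the same probe and the same UMI derive from one original captured molecule, and that each such family is collapsed to its most frequent base at the variant position. The record layout and function names are hypothetical and do not describe the analysis pipeline actually used.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# One record per read: (probe_id, umi, base observed at the variant position).
Read = Tuple[str, str, str]


def raw_vaf(reads: List[Read], alt_base: str) -> float:
    """VAF counted over all reads, PCR duplicates included."""
    if not reads:
        return 0.0
    return sum(1 for _, _, base in reads if base == alt_base) / len(reads)


def deduplicated_vaf(reads: List[Read], alt_base: str) -> float:
    """VAF counted over UMI families, one consensus base per family."""
    families: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for probe_id, umi, base in reads:
        families[(probe_id, umi)][base] += 1
    if not families:
        return 0.0
    # Majority base per family; ties resolve to the base seen first.
    consensus = [counts.most_common(1)[0][0] for counts in families.values()]
    return sum(1 for base in consensus if base == alt_base) / len(consensus)


# Invented example: one variant molecule amplified 3 times, two reference
# molecules read once each.
example = (
    [("probe1", "AAAAAAAA", "T")] * 3
    + [("probe1", "CCCCCCCC", "C"), ("probe1", "GGGGGGGG", "C")]
)
print(raw_vaf(example, "T"))            # 0.6
print(deduplicated_vaf(example, "T"))   # one of three families carries T
```

In the invented example, uneven amplification inflates the raw VAF to 0.6, while counting molecules rather than reads returns one third.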
To do so, a myeloid panel was used in a 24X MPX eMIP reaction. DNA templates were first created by mixing different ratios of normal DNA (which does not contain known variants) with cell line DNA templates, giving expected variant frequencies for mutations in the genes DNMT3A (heterozygous in OCI-AML3), NRAS (homozygous in OCI-AML3), and JAK2 (homozygous in NA23245). The 8BC-8UMI probe configuration was used. These mixtures allowed for low expected VAF% values, reaching 2.59%. In Figures 14, 15 and 16, every data point represents a variant called from a single DNA template, plotted as its expected VAF (x-axis) against its measured VAF (y-axis). JAK2 mutation calling from any sample that contained OCI-AML3 was omitted from the analysis, since there seems to be a deletion of this variant from the DNA (all samples that were diluted with AML3 DNA demonstrated the same VAF).
Noisy VAF% data was observed from these samples, which was attributed to uneven karyotype/CNV in these cell lines. Consequently, this analysis was restricted to mixtures of cell line DNA and normal DNA.
In summary, it was shown that VAF calling from 24x eMIP is comparable to the data obtained from individual standard MIP reactions, with a slightly lower R-squared value for the DNMT3A variant in the eMIP protocol. Additionally, similar or improved R-squared correlation of deduplicated data compared to non-deduplicated data was observed for the eMIP protocol. Notably, the No Template Control (NTC) data consistently shows 0 VAF throughout the experiment.
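The R-squared comparison of expected and measured VAF can be sketched as below. The text does not specify how R-squared was computed; the sketch assumes it is the squared Pearson correlation of a simple linear fit. The function name and the numbers in the usage lines are invented placeholders, not data from Figures 14-16.

```python
from typing import Sequence


def r_squared(expected: Sequence[float], measured: Sequence[float]) -> float:
    """Squared Pearson correlation between expected and measured VAF values."""
    n = len(expected)
    if n != len(measured) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(expected) / n
    mean_y = sum(measured) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, measured))
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in measured)
    if sxx == 0 or syy == 0:
        raise ValueError("R-squared is undefined for a constant series")
    return (sxy * sxy) / (sxx * syy)


# Placeholder values only: expected VAF% versus a hypothetical measurement.
expected_vaf_pct = [0.0, 5.0, 10.0, 25.0, 50.0]
measured_vaf_pct = [0.0, 4.6, 10.9, 23.8, 51.2]
print(round(r_squared(expected_vaf_pct, measured_vaf_pct), 4))
```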
EXAMPLE 9
Testing eMIP reaction in large scale
To test the ability to perform the eMIP reaction at large scale, a controlled system was selected. A synthetic sequence of 56 bases was designed with flanking sequences that enable hybridization to one of the probes in the Myeloid panel. Between these flanking sequences is an 8-base index sequence (50% GC). The same sequence was ordered 96 times, each with its own 8-base index. These synthetic oligos were then processed as templates for a set of eMIP experiments to test pooling at 16X, 24X, 48X and 96X. Specifically, 16, 24, 48, or 96 barcode-embedded myeloid panels were each mixed with an individual synthetic oligo (e.g. in the 16X experiment, the myeloid panel with index #1 was mixed with synthetic oligo #1, the myeloid panel with index #2 with synthetic oligo #2, and so on up to the myeloid panel with index #16, which was mixed with synthetic oligo #16). The 8BC-8UMI probe configuration was used. The 16, 24, 48 or 96 eMIP hybridization reactions were pooled together, followed by a single, one-tube eMIP reaction (gap filling, exonuclease, PCR). Figure 17 provides a schematic representation of the experiment.
Each multiplex reaction was performed in duplicate. The reactions were then sequenced, and individual fastq sequence files were generated using the panel-embedded barcode (EB8N sequence). The sequencing data in each fastq file was searched for all possible synthetic sequence patterns (eMIP arm to arm, where TEB8N is internal) to create a matrix of all possible false and true positives. The data demonstrates that all MPX options, including 48X and 96X, generate the expected data with minimal cross-contamination/leakage between samples. Figures 18-21 show heatmaps of read counts per combination.
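The matrix of true and false positives can be sketched as a count table indexed by panel barcode and synthetic oligo index. The sketch assumes that each read has already been reduced to a pair of indices (the panel barcode it was demultiplexed by, and the synthetic index found between the arms), and that a true positive is a read where the two agree. The function names and the toy observations are illustrative assumptions.

```python
from typing import Iterable, List, Tuple


def count_matrix(observations: Iterable[Tuple[int, int]], n: int) -> List[List[int]]:
    """counts[i][j] = reads demultiplexed to panel barcode i that carry
    synthetic oligo index j (both numbered from 0)."""
    counts = [[0] * n for _ in range(n)]
    for panel_idx, oligo_idx in observations:
        counts[panel_idx][oligo_idx] += 1
    return counts


def true_positive_percent(counts: List[List[int]]) -> float:
    """Share of reads on the diagonal, where panel barcode and oligo agree."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        return 0.0
    matched = sum(counts[i][i] for i in range(len(counts)))
    return 100.0 * matched / total


# Toy 3-plex: 97 correctly paired reads and 3 leaked reads.
obs = [(0, 0)] * 40 + [(1, 1)] * 30 + [(2, 2)] * 27 + [(0, 1), (1, 2), (2, 0)]
matrix = count_matrix(obs, 3)
print(matrix)
print(true_positive_percent(matrix))   # 97.0
```

The off-diagonal cells of such a table correspond to the cross-sample leakage that a read-count heatmap would display.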
Interestingly, when calculating the erroneous calls, there is a similar True Positive rate for all MPX sizes with no clear drop in quality: 96.9%, 97.1%, 96.6% and 96.2% for multiplex sizes 16X, 24X, 48X and 96X, respectively. Without wishing to be bound by theory, this suggests that the noise is intrinsic (e.g. index hopping/sample handling) and not necessarily due to the multiplexity size. Moreover, if the rate is similar, and assuming that the noise distributes equally among the other samples, it seems that higher multiplexity “dilutes” the noise contribution of each sample in the pool.
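The dilution argument can be checked with simple arithmetic. The sketch takes the True Positive rates quoted above and, under the stated assumption that erroneous reads spread equally over the other samples in the pool, divides the error rate by the number of other samples. The function name is illustrative.

```python
def leakage_per_other_sample(true_positive_percent: float, pool_size: int) -> float:
    """Average percent of a sample's reads landing in each other sample,
    assuming the erroneous reads distribute equally among them."""
    if pool_size < 2:
        raise ValueError("pool_size must be at least 2")
    return (100.0 - true_positive_percent) / (pool_size - 1)


# True Positive rates reported for each multiplex size.
reported = {16: 96.9, 24: 97.1, 48: 96.6, 96: 96.2}
for pool_size, tp in reported.items():
    leak = leakage_per_other_sample(tp, pool_size)
    print(f"{pool_size}X: {leak:.3f}% per other sample")
```

With a roughly constant total error rate, the per-sample contribution falls from about 0.21% at 16X to about 0.04% at 96X under this assumption.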

Claims

CLAIMS:
1. A Molecular Inversion Probe-based method for targeted sequencing of at least one sample, comprising at least one of the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest;
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample; and b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture from said at least one sample; and c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
2. The method of claim 1, for simultaneous targeted sequencing of at least two samples, wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
3. The method of claim 2, wherein the hybridization product/s obtained in step (a) for each of at least two samples are mixed/pooled.
4. The method of claim 2, wherein the cyclized product/s obtained in step (b) for each of at least two samples are mixed/pooled.
5. The method of claim 2, wherein the non-digested cyclized product/s obtained in step (c) for each of at least two samples are mixed/pooled.
6. The method of any one of claims 1 to 5, further comprising sequencing the amplified nucleic acid sequences obtained in step (d).
7. The method of any one of claims 1 to 6, wherein said at least one MIP is a plurality of MIPs comprising unified at least one sample identifier index, each of the MIPs targets different target nucleic acid sequences of interest.
8. The method of any one of claims 1 to 7, wherein said at least one target nucleic acid sequence of interest is at least one of a genomic nucleic acid sequence, a transcriptomic nucleic acid sequence, and a circulating free DNA (cfDNA).
9. The method of any one of claims 1 to 8, wherein said at least one target nucleic acid sequence of interest is a nucleic acid sequence associated with, or comprising, at least one of: genetic and/or epigenetic variation/s, pathologic disorder/s, infectious entity, microorganism/s and GC-rich regions.
10. The method of claim 9, wherein said genetic variations comprise at least one of: single nucleotide variants (SNVs) and/or single-nucleotide polymorphisms (SNPs), insertions and/or deletions (indels), inversions, copy number variations (CNV), structural variations, alternative splicing, loss of heterozygosity (LOH), gene fusions, translocations, duplications and variable number of tandem repeats.
11. The method of any one of claims 1 to 10, wherein said at least one target nucleic acid sequence of interest is associated with at least one hereditary, congenital, and/or somatic pathologic disorder or condition.
12. The method of claim 9, wherein said infectious entity is at least one of a viral, a bacterial, a fungal, a parasitic and a protozoan pathogen.
13. The method of any one of claims 1 to 12, wherein said at least one sample is a biological or environmental sample.
14. The method of any one of claims 1 to 12, wherein said at least one sample is a biological sample and originates from a subject.
15. The method of any one of claims 2 to 12, wherein said at least two samples are biological samples and originated from the same subject or different subjects.
16. The method of claim 14 or 15, wherein said subject is at least one organism of the biological kingdom Animalia or at least one organism of the biological kingdom Plantae.
17. The method of any one of claims 14 to 16, wherein said subject is a human.
18. A method for diagnosing a pathological disorder in at least one subject by identifying at least one genetic and/or epigenetic variation/s associated with said pathologic disorder, and/or at least one nucleic acid sequence of at least one pathogenic entity, in at least one target nucleic acid sequence of at least one sample of said at least one subject, the method comprising the step of performing molecular inversion probe-based method for targeted sequencing in at least one subject or in any nucleic acid molecule obtained therefrom, wherein the presence of one or more of said variation/s in said target nucleic acid sequence and/or of at least one nucleic acid sequence of at least one pathogenic entity in said at least one sample, indicates that the at least one subject has a risk, is a carrier, or is suffering from said pathologic disorder, and wherein the molecular inversion probe-based method for targeted sequencing comprising at least one of the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest;
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample; and b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture from said at least one sample; and c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
19. The method of claim 18, for simultaneous targeted sequencing of at least two samples, wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
20. The method of claim 19, wherein the hybridization product/s obtained in step (a) for each of at least two samples are mixed/pooled.
21. The method of claim 19, wherein the cyclized product/s obtained in step (b) for each of at least two samples are mixed/pooled.
22. The method of claim 19, wherein the non-digested cyclized product/s obtained in step (c) for each of at least two samples are mixed/pooled.
23. The method of any one of claims 18 to 22, wherein said pathologic disorder is at least one of: a neoplastic disorder, a metabolic condition, an inflammatory disorder, an infectious disease caused by a pathogen, mental disorders, an autoimmune disease, a cardiovascular disease, a neurodegenerative disorder, fetal genetic condition and an age-related condition.
24. The method of any one of claims 18 to 23, wherein said molecular inversion probe-based method for targeted sequencing is as defined by any one of claims 6 to 17.
25. A method of detecting the presence of one or more target microorganism, infectious entity in at least one sample, the method comprising the step of performing molecular inversion probe-based method for targeted sequencing in at least one nucleic acid molecule obtained from said at least one sample, wherein the presence of one or more target nucleic acid sequence associated with said microorganism or infectious entity in said at least one sample indicates the presence thereof in the sample, and wherein the molecular inversion probe-based method for targeted sequencing comprising at least one of the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest;
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample; and b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture from said at least one sample; and c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
26. The method of claim 25, for simultaneous targeted sequencing of at least two samples, wherein the hybridization or cyclized or non-digested cyclized product/s obtained in any one of step (a) or (b) or (c) and corresponding to each of said at least two samples are mixed/pooled together before the amplification step (d).
27. The method of claim 26, wherein the hybridization product/s obtained in step (a) for each of at least two samples are mixed/pooled.
28. The method of claim 27, wherein the cyclized product/s obtained in step (b) for each of at least two samples are mixed/pooled.
29. The method of claim 27, wherein the non-digested cyclized product/s obtained in step (c) for each of at least two samples are mixed/pooled.
30. The method of any one of claims 25 to 29, wherein said microorganism is a prokaryotic microorganism, or a lower eukaryotic microorganism, and wherein said infectious entity is at least one of a viral, a bacterial, a fungal, a parasitic and a protozoan pathogen.
31. The method of any one of claims 25 to 30, wherein said molecular inversion probe-based method for targeted sequencing is as defined by any one of claims 6 to 17.
32. A method of determining the genotype and/or genetic profile of at least one nucleic acid sequence of at least one organism, or at least one infectious entity from at least one test sample, the method comprising the step of performing molecular inversion probe-based method for targeted sequencing in said at least one test sample comprising said at least one nucleic acid sequence, wherein the molecular inversion probe-based method for targeted sequencing comprising at least one of the steps of: a. contacting at least one target nucleic acid sequence originating from at least one sample with at least one MIP comprising unified at least one sample identifier index, and incubating for hybridization, said MIP comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest;
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index; thereby obtaining at least one hybridization product/s from at least one sample; and b. subjecting the hybridization product/s obtained in step (a) to a polymerization reaction, thereby synthesizing a sequence corresponding to the at least one target nucleic acid sequence of interest nested between the first and second regions, wherein the synthesized sequence is further ligated to obtain at least one cyclized product/s in a reaction mixture from said at least one sample; and c. optionally subjecting the reaction mixture obtained in step (b) to enzymatic digestion thereby digesting linear MIP/s or nucleic acid molecule/s present in said reaction mixture, thereby obtaining at least one non-digested cyclized product/s from at least one sample; d. amplifying the nucleic acid sequence of said non-digested cyclized product/s of step (c).
33. The method of claim 32, wherein said molecular inversion probe-based method for targeted sequencing is as defined by any one of claims 2 to 17.
34. A Molecular Inversion Probe (MIP) comprising:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest,
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest; and
(iii) at least one sample identifier index.
35. The probe of claim 34, wherein said at least one sample identifier index flanks at least one of said first and/or second regions.
36. The probe of claim 35, wherein said at least one sample identifier index flanks the 3' end of said first region and/or the 5' end of said second region.
37. The probe of any one of claims 34 to 36, further comprises at least one additional index.
38. A plurality of Molecular Inversion Probes (MIPs) comprising unified at least one sample identifier index, wherein each of said MIPs targets a different target nucleic acid sequence of interest, and wherein each MIP comprises:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest of said MIP,
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest of said MIP; and
(iii) at least one sample identifier index.
39. The plurality of MIPs of claim 38, forming at least one library/panel.
40. The plurality of MIPs of claim 38 or 39, wherein said at least one index flanks at least one of said first and/or second regions.
41. The plurality of MIPs of claim 40, wherein said at least one sample identifier index flanks the 3' end of said first region and/or the 5' end of said second region.
42. The plurality of MIPs of any one of claims 38 to 41, further comprises at least one additional index.
43. A kit comprising at least one set of plurality of MIPs, each of the at least one set of MIPs comprises unified at least one sample identifier index, wherein each of said MIPs targets a different target nucleic acid sequence of interest, and wherein each MIP comprises:
(i) a first region comprising a first sequence complementary to a first target region in a target nucleic acid sequence of interest of said MIP,
(ii) a second region comprising a second sequence complementary to a second target region in said target nucleic acid sequence of interest of said MIP; and
(iii) at least one sample identifier index; and optionally, at least one of:
(a) instructions for use;
(b) additional reagents.
44. The kit of claim 43, wherein said plurality of MIPs is as defined in any one of claims 38 to 42.
45. The kit of claim 43 or 44, wherein each of the at least one set of plurality of MIPs is adapted for performing molecular inversion probe-based method in at least one sample.
46. The kit of any one of claims 43 to 45, wherein said kit comprises at least two sets of plurality of MIPs, and wherein said at least one sample identifier index is different in each set.
47. The kit of claim 46, for performing molecular inversion probe-based method for simultaneous targeted sequencing of at least two samples.
PCT/IL2024/050039 2023-01-10 2024-01-10 Embedded molecular inversion probe-based targeted sequencing methods and related compositions Ceased WO2024150227A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
IL322010A IL322010A (en) 2023-01-10 2024-01-10 Embedded molecular inversion probe-based targeted sequencing methods and related compositions
EP24701526.6A EP4642925A1 (en) 2023-01-10 2024-01-10 Embedded molecular inversion probe-based targeted sequencing methods and related compositions

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363479230P 2023-01-10 2023-01-10
US63/479,230 2023-01-10

Publications (1)

Publication Number Publication Date
WO2024150227A1 true WO2024150227A1 (en) 2024-07-18

Family

ID=89663337

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2024/050039 Ceased WO2024150227A1 (en) 2023-01-10 2024-01-10 Embedded molecular inversion probe-based targeted sequencing methods and related compositions

Country Status (3)

Country Link
EP (1) EP4642925A1 (en)
IL (1) IL322010A (en)
WO (1) WO2024150227A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130337447A1 (en) * 2009-04-30 2013-12-19 Good Start Genetics, Inc. Methods and compositions for evaluating genetic markers
WO2019068880A1 (en) * 2017-10-06 2019-04-11 Cartana Ab Rna templated ligation

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
"Bioinformatics for Cancer Immunotherapy : Methods and Protocols", vol. 1492, 8 November 2016, SPRINGER, New York, NY, ISBN: 978-1-0716-0326-0, article STUART CANTSILIERIS ET AL: "Targeted Capture and High-Throughput Sequencing Using Molecular Inversion Probes (MIPs)", pages: 95 - 106, XP055753394, DOI: 10.1007/978-1-4939-6442-0_6 *
"United States Patent Office Manual of Patent Examining Procedures"
CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, vol. 112, 2015
HIATT J. B. ET AL: "Single molecule molecular inversion probes for targeted, high-accuracy detection of low-frequency variation", GENOME RESEARCH, vol. 23, no. 5, 4 February 2013 (2013-02-04), US, pages 843 - 854, XP055774259, ISSN: 1088-9051, Retrieved from the Internet <URL:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3638140/pdf/843.pdf> DOI: 10.1101/gr.147686.112 *
J, K. ET AL.: "Good Laboratory Standards for Clinical Next-Generation Sequencing Cancer Panel Tests", JOURNAL OF PATHOLOGY AND TRANSLATIONAL MEDICINE, vol. 51, 2017
J.-K. YOON ET AL: "microDuMIP: target-enrichment technique for microarray-based duplex molecular inversion probes", NUCLEIC ACIDS RESEARCH, vol. 43, no. 5, 20 November 2014 (2014-11-20), GB, pages e28 - e28, XP055408152, ISSN: 0305-1048, DOI: 10.1093/nar/gku1188 *
JESSE J. SALK ET AL: "Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations", NATURE REVIEWS GENETICS, vol. 19, no. 5, 26 March 2018 (2018-03-26), GB, pages 269 - 285, XP055681812, ISSN: 1471-0056, DOI: 10.1038/nrg.2017.117 *
KIVIOJA, T ET AL., NAT METHODS, vol. 9, 2012, pages 72 - 74
L, M. ET AL.: "Target-enrichment strategies for next-generation sequencing", NATURE METHODS, vol. 7, 2010, XP002658413, DOI: 10.1038/nmeth.1419
T, B. ET AL.: "An improved molecular inversion probe based targeted sequencing approach for low variant allele frequency", NAR GENOMICS AND BIOINFORMATICS, vol. 4, 2022, XP055982617, DOI: 10.1093/nargab/lqab125
TAO LIMING ET AL: "Retrospective cell lineage reconstruction in Humans using short tandem repeats", BIORXIV, 19 October 2020 (2020-10-19), XP093150635, Retrieved from the Internet <URL:https://www.biorxiv.org/content/10.1101/191296v7.full.pdf> [retrieved on 20240411], DOI: 10.1101/191296 *
XIAOYIN CHEN ET AL: "Efficient in situ barcode sequencing using padlock probe-based BaristaSeq", NUCLEIC ACIDS RESEARCH, vol. 46, no. 4, 28 November 2017 (2017-11-28), GB, pages e22 - e22, XP055751607, ISSN: 0305-1048, DOI: 10.1093/nar/gkx1206 *

Also Published As

Publication number Publication date
IL322010A (en) 2025-09-01
EP4642925A1 (en) 2025-11-05

Legal Events

Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 24701526; Country of ref document: EP; Kind code of ref document: A1)
WWE WIPO information: entry into national phase (Ref document number: 322010; Country of ref document: IL)
WWE WIPO information: entry into national phase (Ref document number: 2024701526; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: DE)
WWP WIPO information: published in national office (Ref document number: 2024701526; Country of ref document: EP)