WO2025014563A1

WO2025014563A1 - Droplet-based single-cell joint profiling of chromatin targets and transcriptome

Info

Publication number: WO2025014563A1
Application number: PCT/US2024/028497
Authority: WO
Inventors: Yang XIE; Chenxu ZHU; Bing Ren; Christopher HARTL
Original assignee: Individual
Current assignee: Individual
Priority date: 2023-07-10
Filing date: 2024-05-09
Publication date: 2025-01-16
Anticipated expiration: 2026-01-10

Abstract

We disclose a droplet-based Paired-Tag protocol for concurrent transcriptome and chromatin target site library generation that is faster and more accessible than previous methods. Using cultured mammalian cells and primary brain tissues, we demonstrate its superior performance at identifying the candidate cis-regulatory elements and associating their dynamic chromatin state to target gene expression in each constituent cell type in a complex tissue.

Description

DROPLET-BASED SINGLE-CELL JOINT PROFILING OF CHROMATIN TARGETS AND TRANSCRIPTOME

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This document claims priority to US Prov. Ser. No. 63/525,751, filed July 10, 2023, the contents of which are hereby incorporated by reference in their entirety.

UNITED STATES GOVERNMENT SUPPORT

[0002] This invention was made with government support under grant nos. 1U19MH114831-02, R41MH128993 and R01 AG066018 from the Ludwig Institute for Cancer Research; and grant nos. 4R00HG011483 and 5RM1HG011014. The United States government therefore has certain rights in the invention.

BACKGROUND

[0003] The chemical modifications to the histone proteins and nucleic acids along the chromosomes, collectively referred to as the cell's epigenome, regulate the spatiotemporal gene expression patterns in multicellular eukaryotic organisms (Strahl, B. D. & Allis, C. D. The language of covalent histone modifications. Nature 403, 41-45 (2000); Preissl, S., Gaulton, K. J. & Ren, B. Characterizing cis-regulatory elements using single-cell epigenomics. Nat. Rev. Genet. 24, 25-27 (2022)). Epigenome analysis has been successfully used to reveal mechanisms of gene regulation and dysregulation in development, disease pathogenesis, and aging. However, considerable technical barriers exist in profiling the epigenome of complex tissues due to the heterogeneity of cell types and chromatin states within the biospecimens.

[0004] Single-cell epigenomic techniques can circumvent this barrier by revealing the landscape of DNA methylation (Smallwood, S. A. et al. Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity. Nat. Methods 11, 817-820 (2014); Luo, C. et al. Single-cell methylomes identify neuronal subtypes and regulatory elements in mammalian cortex. Science (80-. ) (2017) doi:10.1126/science.aan3351), chromosome conformation (Nagano, T. et al. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature 502, 59-64 (2013); Ramani, V. et al. Massively multiplex single-cell Hi-C. Nat. Methods 14, 263-266 (2017); Tan, L., Xing, D., Chang, C. H., Li, H. & Xie, X. S. Three-dimensional genome structures of single diploid human cells. Science (80-. ) 361, 924-928 (2018)), chromatin accessibility (Jin, W. et al. Genome-wide detection of DNase I hypersensitive sites in single cells and FFPE tissue samples. Nature 528, 142-146 (2015); Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486-490 (2015); Satpathy, A. T. et al. Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat. Biotechnol. 37, 925-936 (2019); Cusanovich, D. A. et al. Multiplex single-cell profiling of chromatin accessibility by combinatorial cellular indexing. Science (80-. ) 348, 910-914 (2015)), histone modifications (Rotem, A. et al. Single-cell ChIP-seq reveals cell subpopulations defined by chromatin state. Nat. Biotechnol. 33, 1165-1172 (2015); Ai, S. et al. Profiling chromatin states using single-cell itChIP-seq. Nat. Cell Biol. 21, 1164-1172 (2019); Grosselin, K. et al. High-throughput single-cell ChIP-seq identifies heterogeneity of chromatin states in breast cancer. Nat. Genet. 51, 1060-1066 (2019); Ku, W. L. et al. Single-cell chromatin immunocleavage sequencing (scChIC-seq) to profile histone modification. Nat. Methods 16, (2019); Bartosovic, M., Kabbe, M. & Castelo-Branco, G. Single-cell CUT&Tag profiles histone modifications and transcription factors in complex tissues. Nature Biotechnology 39, (Springer US, 2021); Wu, S. J. et al. Single-cell CUT&Tag analysis of chromatin modifications in differentiation and tumor progression. Nat. Biotechnol. 39, 819-824 (2021); Rooijers, K. et al. Simultaneous quantification of protein-DNA contacts and transcriptomes in single cells. Nat. Biotechnol. 37, 766-772 (2019)), and transcription factor binding (Patty, B. J. & Hainer, S. J. Transcription factor chromatin profiling genome-wide using uliCUT&RUN in single cells and individual blastocysts. Nat. Protoc. 16, 2633-2666 (2021); Bartlett, D. A. et al. High-throughput single-cell epigenomic profiling by targeted insertion of promoters (Tip-seq). J. Cell Biol. 220, (2021)) at single cell resolution. Single-cell multiomic assays that jointly survey multiple molecular modalities (Li, G. et al. Joint profiling of DNA methylation and chromatin architecture in single cells. Nat. Methods 16, 991-993 (2019); Lee, D. S. et al. Simultaneous profiling of 3D genome structure and DNA methylation in single human cells. Nat. Methods 16, 999-1006 (2019)) including gene expression (Cao, J. et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science (80-. ) 361, 1380-1385 (2018); Zhu, C. et al. An ultra high-throughput method for single-cell joint analysis of open chromatin and transcriptome. Nat. Struct. Mol. Biol. 26, 1063-1070 (2019); Ma, S. et al. Chromatin Potential Identified by Shared Single-Cell Profiling of RNA and Chromatin. Cell 183, 1103-1116.e20 (2020); Zhu, C. et al. Joint profiling of histone modifications and transcriptome in single cells from mouse brain. Nat. Methods 18, 283-292 (2021); Zhang, B. et al. Characterizing cellular heterogeneity in chromatin state with scCUT&Tag-pro. Nat. Biotechnol. 40, 1220-1230 (2022); Sun, Z. et al. Joint single-cell multiomic analysis in Wnt3a induced asymmetric stem cell division. Nat. Commun. 12, (2021)) or protein abundance (Xiong, H., Luo, Y., Wang, Q., Yu, X. & He, A. Single-cell joint detection of chromatin occupancy and transcriptome enables higher-dimensional epigenomic reconstructions. Nat. Methods 18, 652-660 (2021); Mimitou, E. P. et al. Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nature Biotechnology 39, (Springer US, 2021); Chen, A. F. et al. NEAT-seq: simultaneous profiling of intra-nuclear proteins, chromatin accessibility and gene expression in single cells. Nat. Methods 19, 547-553 (2022)) further help to decipher the complex interplay between the epigenome and transcriptional machinery. In particular, single-cell CUT&Tag has enabled the characterization of active or silenced chromatin states at candidate cis-regulatory elements (Bartosovic, M., Kabbe, M. & Castelo-Branco, G. Single-cell CUT&Tag profiles histone modifications and transcription factors in complex tissues. Nature Biotechnology 39, (Springer US, 2021); Wu, S. J. et al. Single-cell CUT&Tag analysis of chromatin modifications in differentiation and tumor progression. Nat. Biotechnol. 39, 819-824 (2021)) within different cell types of primary tissues. However, previously disclosed single-cell epigenomics assays have been slow to gain general adoption, owing to factors such as lengthy procedures and lack of general accessibility.

SUMMARY

[0005] Disclosed herein are compositions, systems and methods for generating nucleic acid libraries, such as commonly barcoded libraries informative of chromatin configuration and transcriptome profile of nucleic acids in a single cell. Library generation variously comprises one or more of the steps of permeabilizing a target cell nucleus; contacting contents of the target nucleus to a transposase that is coupled to a chromatin-targeting binding moiety, wherein the transposase is loaded using a barcoded transposase end nucleic acid; delivering the target nucleus to an emulsion droplet comprising i) a bead, said bead having attached thereto a first oligonucleotide comprising a poly-dT segment and a second oligonucleotide that targets the transposase end nucleic acid, wherein the first oligonucleotide and the second oligonucleotide share a common barcode; ii) a ligase; and iii) a reverse transcriptase, in an environment compatible with both ligation and reverse transcription; performing reverse transcription and ligation in the emulsion droplet; and preparing a nucleic acid library from nucleic acid contents of the emulsion droplet.

[0006] A number of transposases are consistent with the disclosure here, such as Tn5, Tn3, Tn7, sleeping beauty, mu or other transposase or other enzyme or enzyme complex capable of inserting oligo tags at one or both ends generated by an endonuclease cleavage, such as a cleavage targeted to a particular chromatin constituent.

[0007] To facilitate targeted cleavage, the endonuclease such as transposase is targeted to a particular chromatin site or category of site using a target binding moiety such as antibody or other moiety capable of specifically binding a particular target or class of target. Receptor ligand pairs or oligobinders, for example, may also be used to facilitate endonuclease targeting.

[0008] Exemplary targets or chromatin sites include, for example, chromatin modifications such as histone modifications, such as histone acetylation sites, deacetylation sites, methylation sites or demethylation sites. Some targets include unmodified histones. Alternately or in combination, some targets comprise other chromatin constituents such as transcription factors, RNA polymerase complexes or proteins, DNA replication complexes or proteins, DNA repair complexes or proteins, transcription repressors, or chromatin modification complexes such as polycomb complexes, or constituent thereof such as EZH2 proteins, which are in some cases involved in histone methylation. Similarly, histone acetylation enzymes or complexes may also be targeted, as may DNA polymerase complexes, telomere synthase complexes, centromere binding complexes or moieties or other chromatin associated proteins or activities.

[0009] Target binding moieties are covalently or noncovalently tethered to endonucleases such as transposases through a number of approaches consistent with the disclosure herein. In cases where the binding moiety comprises an antibody or harbors an antibody Fc region, endonucleases such as transposases are in some cases bound to or fused to antibody binding moieties such as protein G or protein A, such that the antibody may be attached to the endonuclease such as a transposase so as to direct the endonuclease such as a transposase to the target region or target moiety.

[0010] Exemplary endonucleases consistent with the disclosure herein comprise Tn5-protein A fusions that facilitate conjugation to histone-modification targeting antibodies.

[0011] Library generation is often performed at least in part in partitions such as emulsion droplets, for example so as to facilitate high throughput generation of multiple libraries from multiple cells without the cells intermingling pursuant to library generation.

[0012] Often, transposases in a common emulsion droplet share a common transposase end insertion sequence oligo. This sequence is in some cases unique or common or specific or identifying of one particular emulsion droplet. Alternately, the transposase end insertion sequence oligo is in some cases general to a plurality of emulsion droplets. In these cases one may tag the emulsion droplet contents using a population of oligo tags, such as those delivered in a bead to the droplet. The population of oligo tags is in some cases uniform, unique or identifying for a certain bead, or comprises a subset of sequence that is uniform or identifying for a certain bead relative to other beads. The population of oligo tags in some cases comprises a randomer segment that distinguishes at least some oligos of a bead from others of the population. Populations in some cases comprise a segment that is common to oligos of the population bound to the bead but that distinguishes the population from a second population bound to a second bead, or from many, most or all of the other bead bound populations.

[0013] The transposase end segments or barcoded transposase end nucleic acid comprises in some cases a region common to a plurality of barcoded transposase end nucleic acids in a droplet, and in some cases the barcoded transposase end nucleic acid comprises a region distinct from a plurality of barcoded transposase end nucleic acids in a droplet.

[0014] Some transposase end segments, such as uniform or barcoded transposase end nucleic acids, comprise a double-stranded region and an overhang, so as to facilitate ligation to oligos of the oligo bead, such as second oligos of the oligo bead. Barcodes in some cases identify one or more of a sample of origin, a cell type, a specific cell or a specific emulsion droplet. Some barcodes are unique in that they occur only once in a sample or a library-generation reaction. Alternately, some barcodes facilitate identification or distinguishment of one sample of origin, cell type, treatment, specific cell or specific emulsion droplet from a second sample of origin, cell type, treatment, specific cell or specific emulsion droplet, or from a plurality of other samples of origin, cell types, specific cells or specific emulsion droplets.

[0015] Accessing nuclear DNA often comprises permeabilizing cell nuclei. Cell and nucleus permeabilization and endonuclease such as transposase treatment in some cases occur prior to partitioning in emulsions. In these cases, nuclei are in some cases treated in bulk, and often using transposon ends that are not barcoded, as barcoding is effected subsequent to emulsion partitioning. Alternately, in some cases barcoding occurs prior to partitioning, while in other cases transposon or other endonuclease treatment occurs subsequent to partitioning in emulsions.

[0016] The permeabilizing in some cases does not release ribonucleic acids from the nucleus, and in some cases the permeabilizing does not release ribonucleic acids from a cell harboring the nucleus. Similarly, in some cases the permeabilizing allows retention of at least some ribonucleic acids in the nucleus, or wherein the permeabilizing allows retention of at least some ribonucleic acids in a cell harboring the nucleus.

[0017] Emulsion droplets into which nuclei such as permeabilized nuclei are partitioned, such as individually partitioned or partitioned under, say, a Poisson distribution such that a substantial portion of the nuclei are individually partitioned, in some cases further comprise a reverse transcriptase, a ligase, buffers suitable for reverse transcriptase activity, oligo primers suitable for reverse transcriptase activity, buffers suitable for ligase activity, and a bead or beads comprising a first oligo and a second oligo suitable for genomic fragment tagging and reverse transcription product capture. In some cases beads are delivered at no more than one per droplet, or at a distribution such as a Poisson distribution such that a substantial plurality of emulsion droplets comprise no more than one oligo bead.
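The Poisson loading described in paragraph [0017] can be illustrated with a short calculation. The following is a minimal sketch, not part of the disclosed method: it assumes that nuclei and beads load into droplets independently, each following a Poisson distribution, and the function names and mean occupancy values are hypothetical choices made for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k items in a droplet when the mean occupancy is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def loading_summary(nuclei_per_droplet: float, beads_per_droplet: float) -> dict:
    """Summarize droplet occupancy, assuming independent Poisson loading (an assumption)."""
    p_one_nucleus = poisson_pmf(1, nuclei_per_droplet)
    p_any_nucleus = 1.0 - poisson_pmf(0, nuclei_per_droplet)
    p_one_bead = poisson_pmf(1, beads_per_droplet)
    return {
        # Fraction of all droplets holding exactly one nucleus and exactly one bead.
        "usable_droplets": p_one_nucleus * p_one_bead,
        # Among nucleus-containing droplets, the fraction holding a single nucleus.
        "singlet_fraction": p_one_nucleus / p_any_nucleus,
    }


if __name__ == "__main__":
    # Hypothetical mean occupancies: 0.1 nuclei and 1.0 beads per droplet.
    print(loading_summary(0.1, 1.0))
```

Under these assumptions, lowering the mean number of nuclei per droplet raises the fraction of occupied droplets that hold a single nucleus, at the cost of more empty droplets.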

[0018] Oligos are in some cases released from the bead. Some beads are configured to release oligos via bead degradation or via cleavage of the oligos from the bead surface or interior. Release may occur one or more of prior to reverse transcription, concurrent with reverse transcription or subsequent to reverse transcription. Similarly, release may occur one or more of prior to ligation, concurrent with ligation or subsequent to ligation.

[0019] Oligos on the beads are in some cases barcoded, such as with a bead-identifying barcode, or in some cases also with a randomer region that varies among oligos of a bead.

[0020] Subsequent to or concurrent with reverse transcription, ligation or reverse transcription and ligation, the emulsion is broken. Concurrently, previous to or subsequent to breaking the emulsion, ligation products, reverse transcription capture products or ligation products and reverse transcription capture products are amplified, such as using primers consistent with sequencing library preparation, such as primers having P5 and P7 end sequences suitable for bridge amplification in a flow cell of a sequencing by synthesis device. Alternately or in combination, constituents are optionally concatemerized and packaged for long read sequencing, such as by addition of SMRTbell ends to library constituents.

[0021] Some methods facilitate library generation without using split and pool barcoding. Some such methods facilitate library generation without contacting nucleic acids to a sequence specific nuclease such as a restriction endonuclease. Some such methods facilitate library generation without performing TdT library preamplification or restriction endonuclease treatment. Furthermore, some such methods do not require barcoded reverse transcription primer oligos, as barcoding is effected through droplet specific beads harboring bead specific, bead unique or bead indicative oligos. Some such methods also do not require quantification or normalization of bulked results, in some cases because the cellular input is more closely regulated or controlled in the compositions, methods and systems disclosed herein.

[0022] Methods, systems and compositions herein facilitate analysis of a broad array of cell types, such as cultured cells, for example mouse embryonic stem cells (mESCs), liver cells, brain cells, frozen tissue, tumor cells, particularly tumor cells in heterogeneous populations comprising both quiescent and metastatic or stem tumor cells, and peripheral blood mononuclear cells (PBMCs). More generally, the methods, compositions and systems herein facilitate sorting and analysis of transcriptome and target associated genomic nucleic acid information from one or more cell populations in a homogeneous or heterogeneous cell sample. Libraries are generated on a cell by cell basis, such that a cell or cell subpopulation may be identified in a heterogeneous cell sample despite making up a small fraction of the cell sample, for example no more than a single cell in the sample, or no more than 0.001%, 0.002%, 0.005%, 0.01%, 0.02%, 0.05%, 0.1%, 0.2%, 0.5%, 1%, 2%, 5%, 10%, 20% or more than 20% of a sample.

[0023] Furthermore, through the methods, systems and compositions here, cells may be sorted without knowing a priori the criteria upon which sorting is to be effected. That is, cells can be subjected to library formation and transcriptome data analyzed so as to identify one or more distinguishing factors such as individual transcript presence, relative accumulation patterns, presence or relative or absolute abundance of one or more transcripts implicated in a process of interest (even if not targeted for analysis prior to the library generation), transcript aggregate accumulation patterns, covariance of subsets of transcripts or other criteria either predetermined or derived or identified a posteriori from the sample transcriptome data. Distinguishing criteria derived from transcriptome groupings may then be used to group cells, and their target adjacent genomic libraries may be assessed as to whether or to what degree the transcriptome patterns or categories correspond to distribution of the target features in the cell genome for each cell analyzed. Alternately, cells may be sorted based upon target distribution patterns among cell subpopulations in a heterogeneous population, and the subpopulations may then be assayed for transcriptome or transcript accumulation levels or patterns that correlate with the target distribution patterns. In either of these cases, cell data sorting and subpopulation formation is effected using either target distribution or transcriptome library data, without having to rely upon microfluidic physical sorting of cells into populations, and features of these cell data subpopulations identified through data analysis may further be assayed as to commonalities among the data set not used for sorting (that is, either transcriptome library data or target distribution, respectively). This sorting is enabled through the present disclosure in part because target distribution and transcriptome library data are independently mappable to an emulsion droplet, and to a cell, of origin, and this sorting dramatically simplifies the fluidics system and the reagents needed to effect this sorting.
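The data-driven sorting of paragraph [0023] can be sketched in a few lines. This is an illustrative example only, under the assumption that per-cell transcript counts and per-cell chromatin-target fragment counts are already available as dictionaries keyed by a shared cell barcode. The function names, the single-marker threshold rule and the toy data are hypothetical; a real analysis would typically use clustering rather than one marker.

```python
def group_cells_by_marker(rna_counts, marker, threshold=1):
    """Split cell barcodes into marker-positive and marker-negative groups."""
    groups = {"positive": [], "negative": []}
    for barcode, counts in rna_counts.items():
        label = "positive" if counts.get(marker, 0) >= threshold else "negative"
        groups[label].append(barcode)
    return groups


def mean_target_signal(groups, dna_counts, region):
    """Average chromatin-target fragment count at one region for each group."""
    result = {}
    for label, barcodes in groups.items():
        if not barcodes:
            result[label] = 0.0  # empty group: report zero rather than divide by zero
            continue
        total = sum(dna_counts.get(bc, {}).get(region, 0) for bc in barcodes)
        result[label] = total / len(barcodes)
    return result


if __name__ == "__main__":
    # Toy data: three cells, one hypothetical marker gene, one hypothetical region.
    rna = {"c1": {"GeneA": 5}, "c2": {"GeneA": 0}, "c3": {"GeneA": 2}}
    dna = {"c1": {"region1": 4}, "c2": {"region1": 1}, "c3": {"region1": 2}}
    groups = group_cells_by_marker(rna, "GeneA")
    print(mean_target_signal(groups, dna, "region1"))
```

Because both data sets are keyed by the same barcode, the grouping derived from one modality can be applied directly to the other without any physical sorting step.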

[0024] Also disclosed herein are kits and compositions for paired genomic and transcriptomic library generation, such as generation of libraries informative of the impact of chromosomal modifications on genomic DNA accessibility and transcript accumulation. Some such kits facilitate library generation in no more than 2 days of lab work.

[0025] Some such kits comprise one or more of an endonuclease such as a transposase, a target binding moiety, a bead comprising a reverse transcription capture primer and a ligation oligo, a ligase, a reverse transcriptase and in some cases buffer formulations.

[0026] Endonucleases consistent with the kits herein include, for example, a number of transposases, such as Tn5, Tn3, Tn7, sleeping beauty, mu or other transposase or other enzyme or enzyme complex capable of inserting oligo tags at one or both ends generated by an endonuclease cleavage, such as a cleavage targeted to a particular chromatin constituent.

[0027] A broad range of target binding moieties are consistent with the disclosure herein, such as antibody or other moiety capable of specifically binding a particular target or class of target. Receptor ligand pairs or oligobinders, for example, may also be used to facilitate endonuclease targeting.

[0028] Exemplary targets or chromatin sites include, for example, chromatin modifications such as histone modifications, such as histone acetylation sites, deacetylation sites, methylation sites or demethylation sites. Some targets include unmodified histones. Alternately or in combination, some targets comprise other chromatin constituents such as transcription factors, RNA polymerase complexes or proteins, DNA replication complexes or proteins, DNA repair complexes or proteins, transcription repressors, or chromatin modification complexes such as polycomb complexes, or constituent thereof such as EZH2 proteins, which are in some cases involved in histone methylation. Similarly, histone acetylation enzymes or complexes may also be targeted, as may DNA polymerase complexes, telomere synthase complexes, centromere binding complexes or moieties or other chromatin associated proteins or activities.

[0029] Consistent with the methods herein, target binding moieties are covalently or noncovalently tethered to endonucleases such as transposases through a number of approaches consistent with the disclosure herein. In cases where the binding moiety comprises an antibody or harbors an antibody Fc region, endonucleases such as transposases are in some cases bound to or fused to antibody binding moieties such as protein G or protein A, such that the antibody may be attached to the endonuclease such as a transposase so as to direct the endonuclease such as a transposase to the target region or target moiety.

[0030] Exemplary endonucleases consistent with the disclosure herein comprise Tn5-protein A fusions that facilitate conjugation to histone-modification targeting antibodies.

[0031] Also included in some kits are beads, such as beads harboring oligos for library generation or library barcoding. Some beads harbor two general types of oligos: a first set of oligos comprising a poly-dT segment and a barcode, configured for reverse transcription product capture pursuant to transcriptome library formation, and a second set of oligos comprising a barcode, often a barcode similar to, identical to or indicative of a common bead origin as that of the reverse transcription capture oligos on the bead, and configured for genomic library generation. By using correlatable barcodes on a common bead, molecules arising from the oligos on a common bead may be assigned to a common bead of origin, a common sample, or a common partition such as a common emulsion droplet, upon determination of their sequence. Oligos in some cases further comprise a region that varies among oligos of a bead, either between reverse transcription and genomic ligation oligos or among members of a particular oligo class. Such variation allows one to distinguish one oligo product from another arising from a single bead, or to identify duplicates arising from later amplification.

[0032] Ligation oligos are in some cases configured for compatibility with endonuclease delivered or transposase loaded insertion oligos. For example, some transposase-loaded insertion oligos are double stranded, having an overhang that is reverse complementary to ligation or genomic library oligos, such that the overhang may anneal to a ligation or genomic library oligo so as to facilitate ligation or nick repair so as to link the ligation or genomic library oligo to the endonuclease introduced oligo.
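The use of a variable oligo region to identify amplification duplicates, described in paragraph [0031], can be sketched as follows. This is a minimal illustration under stated assumptions: each read has already been parsed into a bead barcode, a randomer sequence and a mapped position, and reads sharing all three are treated as copies of one original molecule. The function name and tuple layout are hypothetical, and the sketch does not correct for sequencing errors in the randomer.

```python
from collections import defaultdict


def collapse_duplicates(reads):
    """Count unique molecules per bead barcode.

    `reads` is an iterable of (bead_barcode, randomer, position) tuples.
    Reads sharing all three fields are counted once.
    """
    seen = set()
    unique_per_bead = defaultdict(int)
    for bead_barcode, randomer, position in reads:
        key = (bead_barcode, randomer, position)
        if key in seen:
            continue  # same molecule seen again: treat as an amplification duplicate
        seen.add(key)
        unique_per_bead[bead_barcode] += 1
    return dict(unique_per_bead)


if __name__ == "__main__":
    example = [
        ("AAAC", "GGT", 100),
        ("AAAC", "GGT", 100),  # duplicate of the first read
        ("AAAC", "CCA", 100),  # same position, different randomer
        ("TTTG", "GGT", 100),  # same randomer, different bead
    ]
    print(collapse_duplicates(example))
```

The randomer distinguishes independent molecules that happen to map to the same position within one bead, while the bead barcode keeps molecules from different droplets separate.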

[0033] Alternately, in some cases bridging oligos are included separately from the transposase-loaded insertion oligos, such that they may anneal to transposase-loaded insertion oligos and ligation or genomic library oligos concurrently so as to facilitate ligation or nick repair.

[0034] A broad range of bead compositions are compatible with the disclosure herein. Some beads are configured to release oligos via bead degradation or via cleavage of the oligos from the bead surface or interior. Exemplary bead chemistries include agarose or hydrogel.

[0035] Some kits herein comprise or are configured to be compatible with exogenously provided enzymes such as those exhibiting ligase activity, nick repair activity, or reverse transcriptase activity, or combinations up to and including all of these activities.

[0036] Also disclosed herein are compositions, such as compositions comprising one or more of an oil carrier harboring an aqueous droplet, said aqueous droplet comprising a first nucleus having first genomic DNA cleaved at a first target site such as a first chromatin target site, and reverse transcribed cDNA generated from RNA harbored by the nucleus.

[0037] Some such compositions further comprise a bead, wherein the bead comprises a first oligo population configured for ligation to the first genomic DNA cleaved at the first target site, and a second oligo population configured to comprise a poly-dA segment for capture of the reverse transcribed cDNA. In some cases the first oligo population and the second oligo population share a sequence such that the first oligo population and the second oligo population may be assigned to a common bead of origin when mixed with first oligos and second oligos or products thereof arising from distinct beads, such as beads from distinct emulsion droplets. The sequence is in some cases identical at that portion of the first oligo and the second oligo. Alternately, in some cases the tagging sequences of the first oligo and the second oligo are distinct but nonetheless sufficient to assign first oligo products and second oligo products to a common bead of origin distinct from first oligos and second oligos arising from distinct beads in distinct emulsion droplets.

[0038] The compositions in some cases further comprise a second droplet comprising a second nucleus, a second bead comprising second ligation oligos reverse complementary to the second barcode and second reverse transcription capture primers identified by the second barcode.

[0039] A droplet or droplets of the composition in some cases comprise a ligase or ligase activity, and buffers consistent with ligase activity.

[0040] A droplet or droplets of the composition in some cases comprise a reverse transcriptase or reverse transcriptase activity, and buffers consistent with said activity.

[0041] The first genomic DNA cleaved at a first target site such as a first chromatin target site is in some cases tagged using an adapter conducive to or capable of facilitating ligation to an oligo of the first oligo population. The adapter in some cases comprises a double stranded region adjacent to an overhang having sequence reverse complementary to the first oligo sequence, such that the overhang serves to stabilize the first oligo for ligation.
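The annealing requirement of paragraph [0041] amounts to a reverse-complement check between the adapter overhang and the bead oligo. The sketch below is illustrative only: it assumes, hypothetically, that the overhang pairs with the terminal bases at the 3' end of the oligo, and the sequences and function names are invented for the example rather than taken from the disclosure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overhang_can_anneal(overhang: str, oligo: str) -> bool:
    """True if the overhang is reverse complementary to the oligo's 3' terminal bases.

    Which oligo end pairs with the overhang is an assumption made for this sketch.
    """
    overhang = overhang.upper()
    oligo = oligo.upper()
    if not overhang or len(overhang) > len(oligo):
        return False
    return oligo.endswith(reverse_complement(overhang))


if __name__ == "__main__":
    # Hypothetical sequences: the overhang TCACG pairs with the oligo's terminal CGTGA.
    print(overhang_can_anneal("TCACG", "TTTTACGTGA"))
    print(overhang_can_anneal("AAAAA", "TTTTACGTGA"))
```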

[0042] A number of target sites are consistent with the disclosure herein, such as histone acetylation sites or histone methylation sites, for example Lys27, Lys9 or Lys4; histone methyltransferase binding sites such as EZH2 binding sites or other polycomb complex component binding sites; transcription factor binding sites, transcription repressor binding sites, RNA polymerase binding sites, DNA polymerase binding sites, DNA repair complex constituent binding sites, centromere or telomere binding protein binding sites or other chromatin constituent or binding partners.

[0043] Also disclosed herein are compositions, such as those relating to library formation, relating to cell nucleic acids partitioned into droplets. Some such compositions comprise one or more of an oil carrier harboring a first aqueous droplet and a second aqueous droplet, said first aqueous droplet comprising a first nucleus library, said first nucleus library comprising commonly barcoded first nucleus genomic fragments and first nucleus reverse transcribed fragments, and said second aqueous droplet comprising a second nucleus library, said second nucleus library comprising commonly barcoded second nucleus genomic fragments and second nucleus reverse transcribed fragments. In some cases the first droplet does not harbor library components other than first nucleus library components. In some cases the second droplet does not harbor library components other than second nucleus library components. Often, the first library components share a tag or tags that allow them to be grouped to the exclusion of second library components.

[0044] In particular, in some embodiments the first nucleus library arises from a single cell, such as a single cell nucleus. The first nucleus genomic fragments are in some cases generated by binding-moiety guided endonuclease cleavage such as transposon end insertion. Endonuclease cleavage moieties consistent with the disclosure herein include, for example, transposons Tn5, Tn3, Tn7, sleeping beauty, mu transposons or other transposons or endonuclease cleavage moieties consistent with the disclosure herein.

[0045] Endonuclease cleavage such as transposon end insertion is in some cases guided by a binding moiety such as an antibody tethered to or otherwise directing the endonuclease activity. Tethering is in some cases facilitated by fusion of an antibody binding moiety such as protein A or protein G to the endonuclease or transposon. Some exemplary targeting enzymes include protein A-transposon fusions, such as pATn5 fusions.

[0046] Endonuclease cleavage such as transposon end insertion is used to cleave genomic nucleic acids at positions directed by binding moieties, so as to deliver insertion end oligos to the cleavage sites. The insertion end oligos in some cases harbor terminal overhangs so as to facilitate ligation of adapters, such as barcoded adapters indicative of the cell of origin of the genomic nucleic acids, to the genomic fragments.

[0047] A broad range of binding moieties are consistent with the disclosure herein, though in many embodiments the binding moieties comprise antibodies. The binding moieties may target any number of genome or nucleic acid related targets, such as in some cases targets indicative of chromatin configuration. Some such sites relate to histone acetylation, histone deacetylation, histone methylation, or histone demethylation, for example at histone sites Lys27, Lys9 or Lys4. Alternate targets consistent with the disclosure herein comprise, for example, polycomb complex constituents such as histone methyltransferase EZH2, transcription factors, transcription repressors, DNA repair complexes, RNA polymerase complexes, DNA polymerase complexes, centromere proteins, telomere proteins or other proteins or chromatin constituents mentioned herein or known in the art.

[0048] Endonucleases such as transposases, insertion end oligos, and targeting moieties such as antibodies are in some cases tailored for particular nuclei or particular cells, and may be differentially barcoded. Alternately, in some cases these reagents are general to both the first droplet and the second droplet, or even to a substantial portion or all of the droplets of an emulsion.

[0049] Some such compositions further comprise reverse transcription reagents, such as reverse transcription enzymes, primers such as poly-T primers or randomers such as random hexamer primers to prime reverse transcription, as well as buffer and nucleotide components.

[0050] In exemplary embodiments, the first aqueous droplet comprises a bead that contributes barcoded or otherwise tagged oligos to the first droplet genomic library and the first droplet reverse transcribed library. These barcoded or otherwise tagged oligos are added to the genomic fragments via ligation, in some cases guided by annealing to the overhang added to the genomic fragments by the endonuclease such as a transposase. These barcoded or otherwise tagged oligos are added to the reverse-transcribed nucleic acids via capture probes, such as capture probes comprising poly-A portions and barcode portions. The poly-A portions anneal to the poly-T regions of reverse-transcribed mRNA molecules to facilitate barcoding of these molecules.

[0051] Consequently, some compositions comprise a first droplet comprising genomic and reverse transcribed library templates sharing a first barcode, such as a first barcode delivered via a first bead having genomic ligation probes and reverse transcription capture probes sharing a first barcode. Some compositions further comprise a second droplet comprising genomic and reverse transcribed library templates sharing a second barcode, such as a second barcode delivered via a second bead having genomic ligation probes and reverse transcription capture probes sharing a second barcode. The genomic and reverse transcribed library templates sharing a first barcode in the first droplet in some cases arise from a first single cell, while, similarly, genomic and reverse transcribed library templates sharing a second barcode in the second droplet in some cases arise from a second single cell.

[0052] Genomic and reverse transcribed library templates from the first droplet and the second droplet are in some cases released from their emulsion droplets, bulked, and subjected to processing such as addition of end sequences for sequencing preparation, and then sequenced. Nucleic acid reads from these libraries are readily assigned a cell of origin using barcode sequence, such as first barcode sequences to group reads arising from the first cell and second barcode sequences to group reads arising from the second cell.
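
As an illustrative sketch only, and not the analysis pipeline of the disclosure, the assignment of bulked reads to a cell of origin can be pictured as grouping sequencing records by their barcode. The record layout (barcode, library type, sequence), the example barcodes and the function name `group_reads_by_barcode` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def group_reads_by_barcode(reads):
    """Group bulked reads by barcode, keeping the two library types apart.

    `reads` is an iterable of (barcode, library_type, sequence) tuples, where
    library_type is "genomic" or "rt" (reverse transcribed). This layout is a
    hypothetical one used only for illustration.
    """
    cells = defaultdict(lambda: {"genomic": [], "rt": []})
    for barcode, library_type, sequence in reads:
        cells[barcode][library_type].append(sequence)
    return {barcode: dict(libraries) for barcode, libraries in cells.items()}


# Reads from two droplets, bulked and sequenced together (made-up values).
bulked_reads = [
    ("AAAC", "genomic", "GATTACA"),
    ("TTTG", "rt", "CCGGA"),
    ("AAAC", "rt", "TTAGC"),
    ("TTTG", "genomic", "ACGTT"),
]
by_cell = group_reads_by_barcode(bulked_reads)
```

In this sketch, reads carrying the first barcode are recovered together as the first cell's genomic and reverse transcribed libraries, even though they were interleaved with reads from the second cell.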

[0053] Similarly, disclosed herein are nucleic acid libraries, such as those comprising one or more of first nucleus target site borders, first transcriptome fragments, second nucleus target site borders and second transcriptome fragments, wherein the first nucleus target site borders and the first transcriptome fragments share common first tags, and wherein the second nucleus target site borders and second transcriptome fragments share common second tags. Alternately, in some cases first nucleus target site borders and first transcriptome fragments do not share common first tags, but are nonetheless identifiable as first nucleus target site borders and first transcriptome fragments based at least in part on tag sequence.

[0054] First nucleus target site borders are often indicative of proximity of the borders to one or more chromatin or chromatin associated targets. A broad range of chromatin modifications, binding complexes, binding complex constituents or proteins acting on chromatin or chromatin constituents such as nucleic acids, for example genomic nucleic acids as are found in chromosomes, may serve as targets. In some cases a target comprises any molecule that may be bound by an antibody or nanobody or other binding moiety so as to localize an endonuclease or transposon to the target vicinity.

[0055] Any number of genome or nucleic acid related targets, such as in some cases targets indicative of chromatin configuration, are contemplated herein. Some such sites relate to histone acetylation, histone deacetylation, histone methylation, or histone demethylation, for example at histone sites Lys27, Lys9 or Lys4. Alternate targets consistent with the disclosure herein comprise, for example, polycomb complex constituents such as histone methyltransferase EZH2, transcription factors, transcription repressors, DNA repair complexes, RNA polymerase complexes, DNA polymerase complexes, centromere proteins, telomere proteins or other proteins or chromatin constituents mentioned herein or known in the art.

[0056] Generally, first nucleus target site borders correspond to nucleic acids tagged as being in the vicinity of a target. Thus, presence of a sequence in a nucleus target site border component of a library is indicative of presence of that sequence in the vicinity of the target in native chromatin of that cell nucleus.

[0057] Similarly, first transcriptome fragments present in a library herein correspond to transcripts that accumulate in a cell having a target distribution in chromatin consistent with the first nucleus target site borders of the portion of the first library, above.

[0058] Libraries herein provide information as to target location and the impact of that target location on transcript accumulation levels in a first individual cell. Libraries herein may further provide information as to target location and the impact of that target location on transcript accumulation levels in a second individual cell. The first and second cell may differ in, for example, presence of a surface protein, developmental stage, cell growth or division activity, malignancy, senescence or other distinguishing factor.

[0059] First cell library information is distinguished from second cell library information using, for example, first barcodes that are common to the first cell nucleic acids or that otherwise facilitate or allow grouping of first cell nucleic acids to the exclusion of second cell nucleic acids. First barcodes may comprise an identical barcode region, and in some cases also a randomer region that allows one to distinguish among individual first cell library constituents. Alternately or in combination, some first cell barcodes identify a subset of first cell library constituents, such as reverse transcribed library constituents, as distinct from first cell genomic constituents, but nonetheless allow first cell library constituents to be grouped to the exclusion of, for example, second cell library constituents.
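
The roles of the identical barcode region and the randomer region can be illustrated with a minimal sketch, offered under stated assumptions rather than as the tag design of the disclosure. It assumes a tag whose first `barcode_len` bases are the shared barcode region and whose remaining bases are the randomer; that layout, the example sequences and the function names are hypothetical.

```python
def split_tag(tag, barcode_len):
    """Split a tag into (shared barcode region, randomer region).

    Assumes, for illustration only, that the barcode region comes first.
    """
    return tag[:barcode_len], tag[barcode_len:]


def count_unique_molecules(tags, barcode_len):
    """Count distinct randomers observed under each shared barcode."""
    randomers_by_barcode = {}
    for tag in tags:
        barcode, randomer = split_tag(tag, barcode_len)
        randomers_by_barcode.setdefault(barcode, set()).add(randomer)
    return {bc: len(rs) for bc, rs in randomers_by_barcode.items()}


# Four reads: the first two share barcode and randomer, so they are taken
# to derive from the same library constituent (made-up sequences).
tags = ["AAACGGT", "AAACGGT", "AAACTCA", "TTTGGGT"]
molecule_counts = count_unique_molecules(tags, barcode_len=4)
```

Here the barcode region groups reads by cell, while the randomer distinguishes individual library constituents within that cell, so that repeated reads of one constituent are not counted twice.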

[0060] First cell libraries are often generated in a first compartment such as a first emulsion droplet distinct from a second compartment such as a second emulsion droplet in which second cell libraries are generated. Often, the first compartment comprises no more than one first cell nucleus, and the second compartment comprises no more than one second cell nucleus. Thus, in some cases the libraries herein represent target distribution and transcript accumulation for an individual first cell. In some cases the libraries also represent target distribution and transcript accumulation for an individual second cell. The first cell library and the second cell library are distinctly tagged or barcoded such that they may be distinguished from one another even when bulked, commonly packaged for sequencing such as through addition of P5 and P7 ends or circularization for long read sequencing, and sequenced on a common flow cell.

[0061] Consistent with the disclosure elsewhere herein, disclosed are methods relating to cell data bulking for a heterogeneous cell population. Some such methods comprise one or more of generating a nuclear genome library for each of a plurality of cells of the population and generating a transcriptome library for each of a plurality of cells of the population, wherein the nuclear genome library for a first cell of the plurality of cells of the population and the transcriptome library for a first cell of the plurality of cells of the population are barcoded such that each library may be correlated to the first cell, and wherein the nuclear genome library for a second cell of the plurality of cells of the population and the transcriptome library for a second cell of the plurality of cells of the population are barcoded such that each library may be correlated to the second cell; identifying a feature common to either the nuclear genome library for the first cell and the nuclear genome library for the second cell, or a feature common to the transcriptome library for the first cell and the transcriptome library for the second cell; and bulking data from the first cell and the second cell from the library from which the common feature was not identified.

[0062] In some embodiments, the common feature is identified from the nuclear genome library for the first cell and the nuclear genome library for the second cell. Alternately, the common feature is identified from the transcriptome library for the first cell and the transcriptome library for the second cell.
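
A minimal sketch of this grouping step follows, assuming the common feature is taken from the transcriptome library and the nuclear genome library is the one bulked. The data layout, the use of a single marker gene as the common feature, the count values and the function name `bulk_other_library` are hypothetical simplifications, not the method as claimed.

```python
def bulk_other_library(cells, marker_gene, min_count=1):
    """Bulk genomic-library counts for cells sharing a transcriptome feature.

    `cells` maps a barcode to {"rna": {gene: count}, "genomic": {site: count}}.
    The shared feature here is simply detection of `marker_gene`, a
    deliberately simple stand-in for any transcriptome-derived grouping.
    """
    members = []
    bulked = {}
    for barcode, data in cells.items():
        if data["rna"].get(marker_gene, 0) >= min_count:
            members.append(barcode)
            for site, count in data["genomic"].items():
                bulked[site] = bulked.get(site, 0) + count
    return sorted(members), bulked


# Three barcoded cells with made-up counts; two share the marker transcript.
cells = {
    "AAAC": {"rna": {"Snap25": 3}, "genomic": {"chr1:1000": 1, "chr2:500": 1}},
    "TTTG": {"rna": {"Snap25": 2}, "genomic": {"chr1:1000": 2}},
    "GGGA": {"rna": {"GeneX": 4}, "genomic": {"chr3:42": 5}},
}
members, bulked = bulk_other_library(cells, "Snap25")
```

The same function shape would apply in the alternate arrangement, with the grouping feature drawn from the genomic library and the transcriptome counts bulked instead.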

[0063] Often, the first cell and the second cell are not assayed for a common cell surface protein prior to library generating, and may in some cases not be assayed for any phenotypic feature.

[0064] Consistent with the data-based bulking of some of these methods, the first cell and the second cell are often not physically co-segregated to the exclusion of a third cell of the population.

[0065] Bulking data from the first cell and the second cell allows detection of a feature that is present in the data from the first cell and the second cell. This is true even in cases wherein the feature is present in the data from the first cell at a level that is below a statistical threshold for detection, or wherein the feature is present in the data from the second cell at a level that is below a statistical threshold for detection. In particular, bulking allows the feature to be present in the bulked data at a level that is above a statistical threshold for detection.
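
The effect can be illustrated with a toy calculation in which the statistical threshold is reduced, purely as an assumption for illustration, to a minimum read count at a site. The counts, the threshold value and the function name are hypothetical.

```python
def detectable(count, threshold):
    """Return True when a read count meets a simple count threshold.

    A plain count cutoff stands in here for whatever statistical test
    an actual analysis would apply.
    """
    return count >= threshold


# Reads supporting one feature in each of two cells (made-up values).
per_cell_counts = {"first_cell": 2, "second_cell": 3}
threshold = 4

# Neither cell reaches the threshold on its own.
individually = {
    cell: detectable(n, threshold) for cell, n in per_cell_counts.items()
}
# Bulking the two cells' data lifts the feature above the threshold.
bulked = detectable(sum(per_cell_counts.values()), threshold)
```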

[0066] Data may be obtained from particularly rare cell subsets of a population, such as wherein the first cell and the second cell comprise no more than 0.001% of the population, or wherein the first cell and the second cell comprise no more than 1% of the population, or no more than 0.001%, 0.002%, 0.005%, 0.01%, 0.02%, 0.05%, 0.1%, 0.2%, 0.5%, 1%, 2%, 5%, 10%, 20% or more than 20% of a sample.

[0067] Generating the nuclear genome library for each of the plurality of cells in some cases comprises contacting the cells to a target-directed endonuclease that tags target-proximal cleavage sites. The target-directed endonuclease is in some cases a transposase, such as a transposase directed by a conjugated antibody, for example an antibody that directs the transposase to a chromatin feature. Some exemplary features comprise a histone modification, or a feature selected from the list consisting of a chromatin modification, a histone modification, a histone acetylation site, a histone deacetylation site, a histone methylation site, a histone demethylation site, unmodified histones, a transcription factor, an RNA polymerase complex, an RNA polymerase protein, a DNA replication complex, a DNA replication protein, a DNA repair complex, a DNA repair complex protein, a transcription repressor, a chromatin modification complex, a polycomb complex, a polycomb complex constituent, an EZH2 protein, a histone acetylation enzyme, a histone acetylation complex, a DNA polymerase complex, a DNA polymerase complex protein, a telomere synthase complex, a telomere protein, a centromere binding complex, a centromere binding protein, and a chromatin associated viral particle.

[0068] Some such methods comprise delivering the first cell to a first emulsion droplet having a first bead tethered to commonly first barcoded oligos, and may also comprise delivering the second cell to a second emulsion droplet having a second bead tethered to commonly second barcoded oligos.

[0069] The disclosure as described above and elsewhere herein is consistent with a broad range of cell types. Some embodiments are consistent with eukaryotic or nucleus-containing cells generally. Various aspects of these embodiments relate to particular cell types, such as fungal cells, metazoan or animal cells, pathogen or parasite cells, virally infected cells, plant cells, or particular mammalian cells such as cells of specific organs or conditions, such as mESCs, mouse or human liver cells, brain cells, tumor cells, or any of a broad range of human or other animal cells.

INCORPORATION BY REFERENCE

[0070] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

[0071] Fig. 1 presents Joint profiling of transcriptome and histone modifications in single cells using Droplet Paired-Tag.

[0072] Fig. 2 demonstrates that Droplet Paired-Tag effectively resolves multiple cell types in the mouse frontal cortex and identifies the candidate cis-regulatory elements within each cell type.

[0073] Fig. 3 presents a Comparison of the workflows of Droplet Paired-Tag and combinatorial-indexing-based Paired-Tag.

[0074] Fig. 4 presents Experimental design and quality control metrics of Droplet Paired-Tag.

[0075] Fig. 5 presents Quality control metrics of mESC Droplet Paired-Tag data, including, at a, strategies of valid nuclei selection.

[0076] Fig. 6 presents Annotation of cell clusters in the mouse frontal cortex Droplet Paired-Tag based on transcriptomic profiles.

[0077] Fig. 7 presents Integrative analysis of Droplet Paired-Tag transcriptomic profiles with public datasets.

[0078] Fig. 8 presents Quality control of mouse frontal cortex Droplet Paired-Tag histone modification profiles.

[0079] Fig. 9 presents Integrative analysis of histone modifications at candidate cis-regulatory elements across cell types in mouse frontal cortex.

[0080] Fig. 10 presents Landscape of histone modifications across cell types in mouse frontal cortex.

[0081] Fig. 11 presents Nuclei gating strategy for mouse frontal cortex. After nuclei extraction, nuclei were stained with DRAQ7 and subjected to FACS.

[0082] Fig. 12 presents data from individual human brain cells relating to RNA levels, and impact of Histone 3 Lysine 27 acetylation and Histone 3 Lysine 27 methylation on genomic nucleic acid accessibility.

[0083] Fig. 13 presents data from individual cells relating to the impact of Histone 3 Lysine 27 acetylation, Histone 3 Lysine 27 methylation and EZH2 localization on genomic nucleic acid accessibility.

[0084] Fig. 14 presents pseudo-bulked data relating to genomic nucleic acid accessibility at sites identified by pA-Tn5 transposons targeted by anti-H3K27ac antibodies from three single cell samples (the top three read lines), followed by data derived using the same targeting moiety on bulk samples, and finally data obtained using ATAC-seq untargeted chromatin analysis.

[0085] Fig. 15 presents data from individual human brain cells relating to RNA levels, and pseudo-bulked data relating to genomic nucleic acid accessibility at sites identified by pA-Tn5 transposons targeted by anti-H3K4me1, anti-H3K27ac, anti-H3K27me3, and anti-H3K9me3 antibodies in cells identified as excitatory neurons based on RNA expression and segmented by donor status.

[0086] Fig. 16 presents data obtained from mouse liver cells relating to transcript accumulation and genomic nucleic acid accessibility probed by anti-H3K27ac antibody directed and anti-H3K27me3 antibody directed pA-Tn5 transposon activity.

DETAILED DESCRIPTION

[0087] Introduction.

[0088] Here, we report Droplet Paired-Tag, a fast and broadly accessible technique for producing high-quality single-cell joint profiles of histone modification and transcriptome in parallel, which has the potential to be quickly adopted by the research community for interrogating the dynamic and cell-type-specific epigenome landscapes in complex tissues. The key modifications compared to the initial combinatorial indexing-based Paired-Tag protocol are the adaptation of a commercially available microfluidics platform (e.g. 10x Chromium Single Cell Multiome) to introduce cellular barcodes, and a much-simplified protocol to prepare sequencing libraries.

[0089] The procedure herein, therefore, offers three key advantages. First, changing to a droplet-based platform greatly shortens the hands-on time in both the molecular barcoding and library preparation steps (Fig. 3). As a result, Droplet Paired-Tag can be performed in less than 1.5 days from nuclei preparation to sequencing library construction, considerably shorter than the conventional 3-day-long Paired-Tag procedure.

[0090] Second, this design can also be more easily adopted owing to the wide availability of the commercial 10x Chromium platform and reagent kits.

[0091] Third, the simplified procedure also brings improved performance in identifying cis-regulatory elements and in correlating chromatin states of distal elements to expression levels of putative target genes.

[0092] In Droplet Paired-Tag, nuclei are first permeabilized, followed by targeted tagmentation using primary antibodies against a histone modification that are pre-coupled with protein A-Tn5 transposase fusion proteins (pA-Tn5), in a procedure modified from the reported CUT&Tag method (Kaya-Okur, H. S. et al. CUT&Tag for efficient epigenomic profiling of small samples and single cells. (2019). See Method). The resulting nuclei and barcoded beads are then co-encapsulated into droplets with a microfluidic device (10x Genomics Chromium X controller). Two types of oligos with the same set of barcodes are embedded in the beads: a barcoded poly-dA oligo to label cDNA by annealing to the poly-dT of cDNA ends, and a capturing oligo to label DNA fragments derived from tagmentation. Reverse transcription is carried out in the droplets or prior to emulsion using poly-dT primers or random hexamer primers, and the products are captured using the barcoded poly-dA oligo of the bead. A ligation reaction is simultaneously performed to attach the barcoded capturing oligo to the tagmented chromatin fragments. To facilitate this ligation process with our in-house protein A-Tn5 fusion (pA-Tn5), we designed the pA-Tn5 adaptor with a 3'-extended bridge-pMENTs sequence that is reverse complementary to the capturing oligo sequence (Fig. 1 at a, Fig. 4 at a, Table 1). Upon completion of the reactions, the reverse transcription products as well as tagmented DNA fragments from the same cells are both labeled with the same unique cellular barcodes. The droplets are then dissolved, and the cDNA and chromatin fragments are purified, amplified, and then split for sequencing library construction by following manufacturer-recommended protocols (10x Chromium Single Cell Multiome) (Fig. 4 at b-d).

[0093] As a proof-of-concept, we first used Droplet Paired-Tag to analyze a histone modification (either H3K27ac or H3K27me3) jointly with transcriptome in individual mouse embryonic stem cells (mESCs). The complexity of the DNA-dedicated libraries corresponding to histone modifications (median of 1,448 and 3,224 fragments per nucleus for H3K27ac and H3K27me3, respectively) was comparable to that of the combinatorial indexing-based Paired-Tag. The library complexity was also similar to or higher than that of other published methods for analyzing histone modification in single cells such as scCUT&Tag, CoTECH and scCUT&Tag-pro (Fig. 1 at b). Compared to H3K27ac, a higher fraction of reads from H3K27me3 Droplet Paired-Tag experiments corresponds to mono-, di- and tri-nucleosome fragments (Fig. 5 at b), consistent with the more compact chromatin structures within the Polycomb complex repressed regions (Aranda, S., Mas, G. & Di Croce, L. Regulation of gene transcription by Polycomb proteins. Sci. Adv. 1, 1-16 (2015)). For transcriptome analysis, Droplet Paired-Tag yielded gene expression-dedicated libraries with complexity comparable to that of other commonly used scRNA-seq platforms (Fig. 1 at c-d). The gene expression profile detected by Droplet Paired-Tag was in excellent agreement with bulk RNA-seq data from mouse ES cells (Fig. 5 at c). As expected, the detected gene expression levels are in general positively correlated with the H3K27ac signal over the transcription start site (TSS), and inversely correlated with H3K27me3 deposition across gene bodies (Fig. 5 at d). These results indicate that Droplet Paired-Tag can reliably capture histone modification and gene expression simultaneously from the same cell in a high throughput fashion.

[0094] To evaluate the sensitivity and specificity of Droplet Paired-Tag in histone modification profiling, we compared the mESC single-cell data with bulk CUT&Tag and public ChIP-seq datasets generated from mouse ES cells (Huang, H. et al. CTCF mediates dosage- and sequence-context-dependent transcriptional insulation by forming local chromatin domains. Nat. Genet. 53, 1064-1074 (2021); Kinoshita, M. et al. Capture of Mouse and Human Stem Cells with Features of Formative Pluripotency. Cell Stem Cell 28, 453-471.e8 (2021)). For both H3K27ac and H3K27me3, the aggregated single-cell signals faithfully resembled those from bulk CUT&Tag and ChIP-seq experiments (Fig. 1 at e), and showed high enrichment over peaks identified from ChIP-seq datasets (Fig. 5 at f). As an example, single-cell H3K27ac reads from Droplet Paired-Tag (Fig. 1 at f) marked the promoter region of the pluripotency gene Sox2 and its downstream super-enhancer, while H3K27me3 reads are deposited on genes involved in cellular differentiation and expressed in specific cell lineages (e.g., Gata4) (Li, Y., Rivera, C. M., Ishii, H., Jin, F. & Selvaraj, S. CRISPR Reveals a Distal Super-Enhancer Required for Sox2 Expression in Mouse Embryonic Stem Cells 1-17 (2014). doi:10.1371/journal.pone.0114485; Fujikura, J. et al. Differentiation of embryonic stem cells is induced by GATA factors. Genes Dev. 16, 784-789 (2002)). 72% of the peaks identified from aggregated single-cell signals from Droplet Paired-Tag overlap with those from ChIP-seq experiments (Fig. 1 at g). By down-sampling the number of nuclei profiled in Droplet Paired-Tag experiments, we found that the number of H3K27ac peaks detected reaches saturation after about 2,500 nuclei, or around 6 million total reads. By comparison, the Paired-Tag dataset requires 60% more reads to reach saturation (Fig. 1 at h, Fig. 5 at e) (Kaya-Okur, H. S. et al. CUT&Tag for efficient epigenomic profiling of small samples and single cells. (2019); Boyd, J. R. et al. peaksat: an R package for ChIP-seq peak saturation analysis. BMC Genomics 24, 1-8 (2023)).

[0095] To demonstrate the utility of Droplet Paired-Tag for single-cell epigenomic analysis of primary tissues, we used it to analyze histone modifications and gene expression at single-cell resolution in the adult mouse frontal cortex. We performed Droplet Paired-Tag experiments to profile either H3K27ac or H3K27me3, together with RNA expression, each in three biological replicates. After filtering out low-quality cells and potential doublets, we recovered 22,054 nuclei in total, of which 11,874 nuclei were profiled for H3K27ac and 10,180 for H3K27me3. Clustering of these nuclei based on their transcriptome profiles identified 20 major cell clusters corresponding to 9 glutamatergic neuron types (Snap25+/Slc17a7+), 6 GABAergic neuron types (Snap25+/Gad1+) and 5 non-neuron cell types (Snap25-) (Fig. 2 at a, Fig. 6 at a, c). Most cell types were known to exist in the frontal cortex regions, except for three cell types found in one sample, namely D12MSN, OBGA and OBGL, that likely originate from anatomical regions proximal to the frontal cortex due to variations in surgical sectioning during the sample preparation (Fig. 6 at b). Interestingly, out of the 20 major cell clusters, 17 and 18 can also be recovered by independent clustering using histone modification H3K27ac or H3K27me3 signals (Fig. 2 at a), respectively, suggesting that cell type annotations are highly concordant between transcriptome- and epigenome-based clustering (Fig. 2 at b, Fig. 5 at a-d). Additionally, the cell clusters reported in this study were also consistent with those from the previous Paired-Tag dataset and the BICCN reference 10x scRNA-seq dataset generated from the mouse primary motor cortex (Fig. 7 at e-h) (Yao, Z. et al. A transcriptomic and epigenomic cell atlas of the mouse primary motor cortex. Nature 598, 103-110 (2021)).

[0096] To jointly analyze Droplet Paired-Tag data corresponding to different histone modifications, we used transcriptome-based clustering and cell type annotation in all subsequent analyses. For each histone modality, pseudo-bulk level signals showed high concordance with both bulk CUT&Tag and ChIP-seq experiments (Fig. 8 at a-b). Pseudo-bulk single-cell histone signals from cells within each cluster showed that H3K27ac signal is abundant at transcription start sites of cell-type specific genes, whereas these regions are generally silenced by H3K27me3 in other cell types (Fig. 2 at c, Fig. 8 at e, Fig. 10 at a-b). Compared with scCUT&Tag, Droplet Paired-Tag yielded a comparable fraction of reads in peaks (FRiP) but higher numbers of unique fragments per nucleus for both histone modifications. Compared with the combinatorial-indexing-based Paired-Tag, Droplet Paired-Tag recovered fewer unique reads per nucleus, but showed higher FRiP and thus captured a higher number of peak-associated reads. The improvements in both signal sensitivity and specificity likely contributed to the higher resolution in separating cell types (Fig. 2 at e, f). When compared against the list of open chromatin regions identified in the same brain cell types (CEMBA) (Li, Y. E. et al. An atlas of gene regulatory elements in adult mouse cerebrum. Nature 598, 129-136 (2021)), Droplet Paired-Tag yielded the lowest level of H3K27me3 signals at the open chromatin regions, indicating minimal off-target Tn5 transposase activity in our procedure (Fig. 2 at g). To evaluate the sensitivity of Droplet Paired-Tag, we identified the peaks of H3K27ac signals in each cell cluster and retained those that appeared in two or more replicates. The resulting union set of 63,799 peaks was more than twice the number detected in the previous combinatorial-indexing-based Paired-Tag (27,522) from a similar number of nuclei (11,874 versus 11,749) (Fig. 8 at d). A higher fraction of H3K27ac peaks detected in Droplet Paired-Tag overlapped with the open chromatin regions from the same brain cell types than in the previous Paired-Tag dataset (90.8% vs. 80.5%) (Fig. 2 at h). Taken together, these results suggest that Droplet Paired-Tag can generate high quality transcriptomic and epigenomic profiles at single-cell resolution from complex tissues.
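
For readers unfamiliar with the FRiP metric used above, the following is a minimal sketch of how a fraction of reads in peaks can be computed, and is not the pipeline used to produce the reported values. It assumes fragments and peaks given as half-open (chromosome, start, end) intervals and counts a fragment as in-peak if it overlaps any peak by at least one base; the coordinates and the function name are hypothetical.

```python
def fraction_of_reads_in_peaks(fragments, peaks):
    """Return the fraction of fragments overlapping at least one peak.

    Both inputs are lists of (chrom, start, end) with half-open coordinates.
    The quadratic scan is fine for a sketch; real data would call for an
    interval index.
    """
    def overlaps(fragment, peak):
        same_chrom = fragment[0] == peak[0]
        return same_chrom and fragment[1] < peak[2] and peak[1] < fragment[2]

    if not fragments:
        return 0.0
    in_peaks = sum(
        1 for fragment in fragments
        if any(overlaps(fragment, peak) for peak in peaks)
    )
    return in_peaks / len(fragments)


# Made-up peaks and fragments: two of the four fragments overlap a peak.
peaks = [("chr1", 100, 200), ("chr2", 50, 80)]
fragments = [
    ("chr1", 150, 180),  # inside the chr1 peak
    ("chr1", 190, 260),  # partially overlaps the chr1 peak
    ("chr1", 300, 350),  # no overlap
    ("chr2", 10, 40),    # ends before the chr2 peak starts
]
frip = fraction_of_reads_in_peaks(fragments, peaks)
```

A higher FRiP at a similar or lower number of unique reads, as described above for Droplet Paired-Tag relative to the combinatorial-indexing-based protocol, means that a larger share of the recovered reads falls within peaks.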

[0097] To further demonstrate the utility of Droplet Paired-Tag in characterizing the activity states of candidate cis-regulatory elements (cCREs), we examined variations of chromatin states (H3K27ac or H3K27me3) at identified cCREs in different brain cell types (Li, Y. E. et al. An atlas of gene regulatory elements in adult mouse cerebrum. Nature 598, 129-136 (2021)). We classified cCREs as distal or proximal based on their distance to promoter regions (see Method), then performed non-negative matrix factorization to group all the cCREs with sufficient levels of H3K27ac (RPKM > 1) and H3K27me3 (RPKM > 1) signals in at least one brain cell type into 20 cCRE modules, each showing a distinct pattern of cell-type specificity (Fig. 2 at j, Fig. 9 at a). For proximal cCREs, promoter region H3K27ac signal again showed a strong positive correlation with transcription, while H3K27me3 signal showed an overall inverse correlation (Fig. 2 at i). Transcription factor (TF) binding motif analysis of each cCRE module revealed known transcription factors involved. For example, the cCRE module corresponding to the ITL23GL cluster (excitatory neurons from cortex layer 2/3) was enriched for NEUROD1 and MEF2C motifs, and the OGC cCRE module was enriched for motifs of transcription factors critical to oligodendrocytes, such as SOX10 (Fig. 2 at k, Fig. 9 at b).

[0098] Droplet Paired-Tag data enables the prediction of putative target genes of distal cCREs due to the joint profiling of gene expression level and chromatin state from the same cells. We calculated the pairwise Spearman's correlation coefficients (SCC) between the histone modification signals at distal cCREs and promoter regions of potential target genes within a 500 kb window (Fig. 2 at l) (Zhu, C. et al. Joint profiling of histone modifications and transcriptome in single cells from mouse brain. Nat. Methods 18, 283-292 (2021); Li, Y. E. et al. An atlas of gene regulatory elements in adult mouse cerebrum. Nature 598, 129-136 (2021)). This analysis identified 20,241 significant cCRE-gene pairs with a positive correlation between H3K27ac signals and gene expression, and 4,738 pairs with a negative correlation between H3K27me3 and gene expression (FDR < 0.05, Fig. 2 at m). Interestingly, Droplet Paired-Tag data captured stronger linkages between cCREs and genes over background than the combinatorial indexing Paired-Tag (Fig. 9 at c). Gene ontology (GO) analysis showed that H3K27ac-associated genes in the OGC population were enriched for terms related to the myelination process, consistent with the likely enhancer function of the distal cCREs. On the other hand, the H3K27me3-associated genes in the same cell type are enriched for terms related to neuronal functions (Fig. 2 at n). Through integrative analysis with a recently published mouse brain single-cell chromosome contacts dataset (Liu, H. et al. DNA methylation atlas of the mouse brain at single-cell resolution. Nature 598, (Springer US, 2021)), we found that cell-type specific cCRE-gene pairs exhibiting long-range chromatin contacts were overall more positively (for H3K27ac) correlated than cCRE-gene pairs showing no detectable chromatin contacts (Fig. 9 at d-e).
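
The correlation step can be sketched as follows, as an illustration under stated assumptions rather than the analysis code behind the reported pair counts. The sketch assumes one signal value per cell cluster for a cCRE and one value per cluster for a candidate gene, treats the 500 kb window as a maximum distance on either side of the transcription start site, and omits the significance testing and FDR control entirely. The signal values and function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one vector is constant
    return cov / (var_x * var_y) ** 0.5


def within_window(ccre_position, tss_position, window=500_000):
    """Assumed reading of the 500 kb window: distance on either side."""
    return abs(ccre_position - tss_position) <= window


# Made-up per-cluster values for one cCRE and one candidate target gene.
h3k27ac_at_ccre = [0.2, 1.5, 3.1, 0.9]
gene_expression = [1.0, 4.0, 9.0, 2.0]
if within_window(1_000_000, 1_400_000):
    scc = spearman(h3k27ac_at_ccre, gene_expression)
```

In this toy case the two vectors rise and fall together across clusters, so the coefficient is at its maximum; a repressive mark such as H3K27me3 would be expected to give negative coefficients for the pairs described above.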

[0099] Droplet Paired-Tag, such as disclosed in the methods, compositions and systems herein, enables correlation of transcriptome and chromatin configuration patterns on an individual cell basis. Accordingly, a broad range of input samples may be used, and the uniformity or purity of the cell population of the samples does not impact the results obtained for any individual cell. Libraries are generated on a cell by cell basis, such that a cell or cell subpopulation may be identified in a heterogeneous cell sample despite making up a small fraction of the cell sample, for example no more than a single cell in the sample, or no more than 0.001%, 0.002%, 0.005%, 0.01%, 0.02%, 0.05%, 0.1%, 0.2%, 0.5%, 1%, 2%, 5%, 10%, 20% or more than 20% of a sample. Furthermore, individual cell transcriptome and target genomic location data are available for a posteriori analysis such as correlation analysis and grouping. That is, cell data may be subjected to analysis so as to identify common features, either of target location library data or of transcriptome data, and the data may be grouped or 'pseudo-bulked' to assay for aggregate commonalities or to gain higher density coverage of the data used for the grouping, or of the data not used for the group determination. Unlike methods in the art, this 'pseudo-bulking' does not require the use of pre-processing sorting technologies such as microfluidics sorting, and does not require particular detectable markers to be used to drive this sorting, such as cell surface markers as are relied upon for FACS cell sorting or bulking prior to analysis. Accordingly, the approaches herein allow detection of novel features, such as transcription patterns or chromatin target patterns, and allow one to assay for correlations between these novel features and elements of either or both of the libraries generated from the cells giving rise to the novel features.

[0100] In summary, Droplet Paired-Tag is a fast and robust method for joint profiling of chromatin targets, such as histone modifications, and gene expression in single cells. We demonstrated the utility and superior performance of this method for analyzing cell-type-specific gene regulatory programs in complex tissue. The shortened, more easily adaptable procedure, which relies on a widely available microfluidic device (i.e., 10x Genomics Chromium), would likely facilitate the rapid adoption of this method in the field of epigenetics. Droplet Paired-Tag adds a new tool for investigation of the gene regulatory mechanisms in disease and lifespan.

[0101] Turning to the Figures, one sees the following.

[0102] Fig. 1 presents joint profiling of transcriptome and histone modifications in single cells using Droplet Paired-Tag. At the topmost row of images, labeled “a,” one sees a schematic overview of the Droplet Paired-Tag design. At left, permeabilized nuclei retaining mRNA are contacted in bulk with antibody-conjugated pA-Tn5 transposase to tag accessible genomic nucleic acids in the vicinity of antibody-targeted moieties in chromatin of the nuclei. Nuclei are incubated at 37 °C in the presence of Mg2+ ions to facilitate tagmentation. Nuclei are then co-emulsified with barcode beads to a target amount of one nucleus and one bead per emulsion droplet. Barcode beads comprise ligation oligos to label nucleic acid ends exposed by tagmentation and capture oligos to label reverse-transcribed cDNA generated from the RNA templates. For a given bead, ligation oligos and capture oligos are barcoded such that their barcode sequences indicate a common bead of origin distinct from other beads in the emulsion. The barcodes are in some cases identical through at least a distinct portion of the barcode segment or of each oligo, though nonidentical barcodes that may nonetheless be mapped to a common bead of origin are also consistent with the disclosure herein. Tagmentation-generated genomic fragment ends are ligated to ligation oligos, while reverse-transcribed cDNA templates are captured and labeled using capture oligos, such that nucleic acids of the emulsion droplet are barcoded so that they may later be assigned to a common droplet, and common nucleus, of origin to the exclusion of nucleic acids from other droplets of the emulsion. The emulsion is then broken, and barcoded fragments are subjected to library preparation suitable for the sequencing approach to be used.

[0103] At the left end of the second row of images, labeled “b,” one sees a comparison of the number of unique fragments (y-axis, log scale running from 100, 1k, 10k to 100k) generated across different single-cell epigenome assays by each of Droplet Paired-Tag mESC, Paired-Tag (bulk) mESC, coTECH mESC, scCUT&Tag PBMC, and scCUT&Tag-pro PBMC. Numbers of fragments are determined for both anti-H3K27Ac (light blue, left) and anti-H3K27Me3 (brown, right) antibody guided transposons, except for coTECH mESC (Me only) and scCUT&Tag PBMC (Ac only). The results show that for each antibody-guided reaction, both genomic and cDNA fragments generated through practice of the methods herein were detected at an equal or higher number of unique fragments per cell for the H3K27Ac assay, and at a higher number of unique fragments per cell for the H3K27Me3 assay.

[0104] Next to the right in the second row of images, labeled “c,” one sees a comparison of the number of unique fragments (y-axis, log scale running from 100, 1k, 10k to 100k) generated across different single-cell multiomic assays by each of Droplet Paired-Tag mESC, Paired-Tag (bulk) mESC, coTECH mESC, and 10x Multiome PBMC. Numbers of fragments are determined for both anti-H3K27Ac (light blue, left) and anti-H3K27Me3 (brown, right) antibody guided transposons, except that numbers for 10x Multiome PBMC are determined using an untargeted ATAC approach (grey). The results show that for each antibody-guided reaction, both genomic and cDNA fragments generated through practice of the methods herein were detected at an equal or higher number of unique fragments than approaches in the art.

[0105] Next to the right in the second row of images, labeled “d,” one sees a comparison of the number of unique fragments (y-axis, log scale running from 100, 1k, to 10k) generated across different single-cell multiomic assays by each of Droplet Paired-Tag mESC, Paired-Tag (bulk) mESC, coTECH mESC, and 10x Multiome PBMC. Numbers of fragments are determined for both anti-H3K27Ac (light blue, left) and anti-H3K27Me3 (brown, right) antibody guided transposons, except that numbers for 10x Multiome PBMC are determined using an untargeted ATAC approach (grey). The results show that for each antibody-guided reaction, both genomic and cDNA fragments generated through practice of the methods herein were detected at an equal or higher number of unique fragments than approaches in the art.

[0106] In the third row of images, on the left, labeled “e,” one sees genome-wide Spearman’s correlation coefficients between mESC histone modification datasets from bulk CUT&Tag assayed using H3K27Ac, two replicates of Droplet Paired-Tag (indicated with ‘sc’) assayed using H3K27Ac, and ChIP-seq (indicated with ‘bulk’) assayed using H3K27Ac, followed by ChIP-seq (indicated with ‘bulk’) assayed using H3K27Me, bulk CUT&Tag assayed using H3K27Me, and a single replicate of Droplet Paired-Tag (indicated with ‘sc’) assayed using H3K27Me. All of the H3K27Ac assays show Spearman’s coefficients at or above 0.75, while the two replicates of the method herein show a correlation of 0.85. All of the H3K27Ac assays show Spearman’s coefficients at or above 0.80. Correlations between H3K27Ac and H3K27Me3 assays were no greater than 0.37. These results indicate that the added specificity gained through the single-cell analysis of the approaches herein did not correspond to a loss in repeatability or accuracy relative to approaches in the art.

[0107] Next one sees, at “f,” a genome browser view showing examples of pseudo-bulk and single-cell histone modification signal of Droplet Paired-Tag in mESC, along with bulk ChIP-seq and CUT&Tag for H3K27Ac (top) and H3K27Me3 (bottom). The pluripotency gene (Sox2) along with its super enhancer (SCR), adjacent to Gm38509, and the repressed gene (Gata4) are highlighted in light blue. Under each Droplet Paired-Tag line, one sees the individual cell reads that are pseudo-bulked to generate the aggregate signal. These results indicate that the added specificity gained through the single-cell analysis of the approaches herein did not correspond to a loss in repeatability or accuracy relative to approaches in the art.

[0108] At “g,” one sees the number and overlap of peaks called using H3K27Ac Droplet Paired-Tag compared to that from ChIP-seq datasets. Droplet Paired-Tag generated 39,933 peaks, having a 72.2% overlap with the 34,054 ChIP-seq peaks, which in turn showed a 72.4% reciprocal overlap.

[0109] At “h,” a scatter plot shows the relationship between the total number of H3K27ac peaks called and the number of nuclei after down-sampling. The dashed line indicates the cutoff of the number of nuclei that could recover 85% of peaks. This demonstrates the utility of single nuclei assaying, as the number of nuclei to be assayed may be more precisely controlled so as to make best use of limited sample availability.

[0110] At Fig. 2, one sees data demonstrating that Droplet Paired-Tag effectively resolves multiple cell types in the mouse frontal cortex and identifies the candidate cis-regulatory elements within each cell type. This resolution is effected through data analysis, and does not rely upon microfluidic sorting or particular markers such as cell surface protein or other molecules to be identified to dictate sorting outcomes. At “a,” one sees UMAP embedding visualization of frontal cortex Droplet Paired-Tag data, clustered and annotated based on transcriptome (gene expression), H3K27ac and H3K27me3, respectively. At “b,” one sees the overlap of shared annotations between transcriptomic and epigenomic clustering. Runs presented include, left to right on bottom, ASC, MGL, OGC, OPC, OBGL, ITL23GL, ro . mf d .. H'L56GL, PTGL, NPGL, C Kd .. D12MSN, OBGA, VIPGA, PVGA and SSTGA. For each gene, one sees significant overlap in gene expression between the H3K27ac and H3K27me3 assays. At “c,” one sees a representative genome browser view of, from top to bottom in each row, gene expression, H3K27ac and H3K27me3 distribution over cell-type specific marker genes. At “d,” one sees a comparison of the number of unique transcripts and genes detected in each cell between Droplet Paired-Tag and other methods measuring single-cell transcriptome on mouse brain samples. At “e-f,” one sees a comparison of the unique fragments and fraction of reads in peaks in each cell between Droplet Paired-Tag and other methods measuring single-cell histone modifications on mouse brain samples. At “g,” one sees signal enrichment over CEMBA cCREs in Droplet Paired-Tag and other methods measuring single cell histone modifications.
At “h,” one sees a comparison of the number of H3K27ac peaks called from the Droplet Paired-Tag dataset with the original Paired-Tag dataset, and the cCREs identified from previous snATAC-seq analysis of the mouse cortex generated by the Center for Epigenomics of the Mouse Brain Atlas (CEMBA) (Li, Y. E. et al. An atlas of gene regulatory elements in adult mouse cerebrum. Nature 598, 129-136 (2021)). At “i,” one sees a heatmap showing gene expression values from promoter-proximal cCREs, and at “j,” both histone modification signals over promoter-proximal cCREs across different cell types. At “k,” one sees a heatmap of known motif enrichment for each cCRE module of promoter-proximal cCREs. Examples of known motifs are shown along with the heatmap. At “l,” one sees schematics for identifying potential targets for cCREs. Putative cCRE-gene pairs were first determined by calculating the co-occupancy of H3K27ac or H3K27me3 reads over distal cCREs and promoter regions. The Spearman correlation coefficients (SCCs) between expression levels of genes and H3K27ac or H3K27me3 levels of cCREs were then used to identify cCRE-gene pairs. At “m,” one sees frequency density plots showing the distribution of Spearman’s correlation coefficients between histone modification signal over distal cCREs and their putative target gene expression level. Cutoffs (FDR = 0.05) used to identify cell type specific cCRE-gene pairs are also indicated. At “n,” one sees a heatmap showing histone modification signals at distal cCREs with potential activating or repressive roles and their putative target gene expression. Example enriched GO terms for genes in selected cell types are also shown.

[0111] At Fig. 3, one sees a comparison of the workflows of Droplet Paired-Tag and combinatorial-indexing-based Paired-Tag. Droplet Paired-Tag is completed in no more than part of two days, representing a substantial decrease over the four day prior art protocol, while also gaining individual cell specific data. The detailed workflow describing the end-to-end procedure of Droplet Paired-Tag (left) is compared to the conventional combinatorial-indexing-based version of Paired-Tag (right).

[0112] At Fig. 4, one sees the experimental design and quality control metrics of Droplet Paired-Tag. At the upper left, labeled “a,” one sees a schematic of the Droplet Paired-Tag barcoding and library preparation process. Modifications on oligos are not shown for simplicity. At “b-d,” one sees example capillary electrophoretic gel images from a TapeStation showing the final library size distribution for Droplet Paired-Tag DNA and RNA. At “e,” one sees the barcode mapping rate for Droplet Paired-Tag and a standard 10X Multiome experiment. At “f-g,” one sees the fraction of mitochondrial reads and the sequence mapping rate for Droplet Paired-Tag DNA libraries.

[0113] At Fig. 5, one sees quality control metrics of mESC Droplet Paired-Tag data. At “a,” one sees the strategy for valid nuclei selection. DNA barcodes were filtered based on total reads per nucleus and fraction of reads in peak regions. Valid nuclei were further selected based on the pairing of valid DNA and RNA nuclei in the scatterplot of total reads per nucleus (right). Cutoffs were set by manual inspection, are depicted as dashed lines, and are also annotated inside the scatterplot. At “b,” one sees the distribution of fragment lengths of the sequenced fragments from the H3K27ac and H3K27me3 Droplet Paired-Tag experiments. Distributions peak at 140bp intervals, consistent with chromatin structure. At “c,” one sees a heatmap showing pairwise Spearman’s correlation coefficients of expression profiles from the single-cell Droplet Paired-Tag experiment and the bulk mRNA-seq datasets. Presented from left to right, and top to bottom, are mESC mRNA-seq rep1 (bulk), mESC mRNA-seq rep2 (bulk), Droplet Paired-Tag rep1 (sc) for H3K27Ac, Droplet Paired-Tag rep2 (sc) for H3K27Ac, and Droplet Paired-Tag (sc) for H3K27Me. One sees a high degree of repeatability between Droplet Paired-Tag replicates (0.92), and between Ac and Me transcriptomes (0.92-0.93). Correlations between bulk and sc samples ranged from 0.79 to 0.80.
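The valid-nuclei selection described for Fig. 5 at “a” can be illustrated with the following Python sketch, which filters DNA barcodes on total reads and fraction of reads in peaks (FRiP) and then keeps only barcodes that also pass an RNA read cutoff. The function name, data layout and all cutoff values are hypothetical; in the text above the cutoffs were set by manual inspection.

```python
def select_valid_nuclei(dna_stats, rna_reads,
                        min_dna_reads, min_frip, min_rna_reads):
    """Return barcodes passing both the DNA and the RNA filters.

    dna_stats: {barcode: (total_dna_reads, dna_reads_in_peaks)}
    rna_reads: {barcode: total_rna_reads}
    All thresholds are caller-supplied (hypothetical, not from the source).
    """
    valid_dna = {
        bc for bc, (total, in_peaks) in dna_stats.items()
        if total > 0 and total >= min_dna_reads and in_peaks / total >= min_frip
    }
    valid_rna = {bc for bc, total in rna_reads.items() if total >= min_rna_reads}
    # A nucleus is kept only when the same barcode is valid in both modalities.
    return valid_dna & valid_rna
```

Requiring the intersection reflects the pairing of DNA and RNA barcodes, so a droplet with a strong signal in only one modality is discarded.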

[0114] At “d,” one sees boxplots showing the expression level of genes grouped by histone mark occupancy at their TSS regions (H3K27ac) or gene bodies (H3K27me3). At “e,” one sees a scatter plot showing the relationship between the total number of H3K27ac peaks called and the number of reads from the Droplet Paired-Tag or Paired-Tag dataset in a down-sampling test, similar to Fig. 1 at “h”. The dashed line indicates 85% of peaks recovered for the Droplet Paired-Tag dataset. At “f,” one sees enrichment of histone modification signals over ChIP-seq peaks, compared between Droplet Paired-Tag and ChIP-seq data (Kinoshita, M. et al. Capture of Mouse and Human Stem Cells with Features of Formative Pluripotency. Cell Stem Cell 28, 453-471.e8 (2021)).

[0115] At Fig. 6, one sees annotation of cell clusters in the mouse frontal cortex Droplet Paired-Tag data based on transcriptomic profiles. At “a,” is presented marker gene expression in 20 mouse brain cell types and the fraction of nuclei from each set of experiments. At “b,” one sees the fraction of nuclei in each cell type from each replicate. Cell types with biased distribution in any of the replicates were from anatomical regions (right) proximal to frontal cortex. At “c,” one sees the averaged log2 fold enrichment ratio of the normalized selected marker gene expression (RPKM) in the cluster of interest versus the rest of the dataset. At “d,” one sees UMAP embeddings and overlap scores based on Droplet Paired-Tag transcriptome profiles down-sampled to different nuclei numbers.

[0116] At Fig. 7, one sees an integrative analysis of Droplet Paired-Tag transcriptomic profiles with public datasets. At “a-b,” one sees UMAP embedding and cell type compositions of Droplet Paired-Tag transcriptome profiles based on the histone modification co-profiled. At “c,” one sees the overlap of all annotations between transcriptomic and epigenomic clustering. At “d,” one sees a UMAP showing co-embedding of single nuclei transcriptomic profiles from Droplet Paired-Tag, previously published Paired-Tag and reference BICCN 10X scRNA-seq datasets on mouse motor cortex regions. At “e,” one sees a boxplot showing Pearson correlation coefficients of variable genes for matched and unmatched cell types between Droplet Paired-Tag, published Paired-Tag and public 10X scRNA-seq datasets. At “f,” one sees the overlap of shared annotations between Droplet Paired-Tag and the previously published Paired-Tag transcriptomic clustering results. Cell types not from frontal cortex are excluded from the comparison. At “g,” one sees scatterplots showing expression levels of variable genes in Pvalb+ neurons (PVGA) in Droplet Paired-Tag, published Paired-Tag and public 10X scRNA-seq datasets.

[0117] At Fig. 8, one sees quality control of mouse frontal cortex Droplet Paired-Tag histone modification profiles. At “a,” one sees genome-wide Spearman’s correlation coefficients between mouse frontal cortex histone modification datasets from bulk CUT&Tag and three single-cell Droplet Paired-Tag replicates. The replicates exhibit Spearman’s correlations of at least 0.93 with one another, and at least 0.83 with the bulk sample for H3K27Ac assays, and similar values for the H3K27Me3 assays. At “b,” one sees genome-wide Spearman’s correlation coefficients between mouse frontal cortex glutamatergic neuron histone modification datasets from single-cell Droplet Paired-Tag and bulk ChIP-seq. Correlations between single cell and two bulk ChIP-seq replicates are 0.83-0.84 for H3K27Ac, and 0.71-0.75 for H3K27Me. At “c,” one sees a comparison of the number of unique fragments and FRiP in single cells between the monoclonal and polyclonal H3K27ac antibodies used. At “d,” one sees the overlap of the number of H3K27ac peaks called from Droplet Paired-Tag and the original Paired-Tag datasets. Both datasets are down-sampled to a similar number of cells. At “e,” one sees a heatmap showing gene expression level and promoter or gene body histone modification signal over marker genes in each cell type. Examples of known marker genes are shown. At “f,” one sees a UMAP showing co-embedding of single nuclei H3K27ac profiles from Droplet Paired-Tag, Paired-Tag and scCUT&Tag. At “g,” one sees an overlap of shared annotations between Droplet Paired-Tag and public single cell H3K27ac datasets. At “h,” one sees a UMAP showing co-embedding of single nuclei H3K27me3 profiles from Droplet Paired-Tag, Paired-Tag and scCUT&Tag. At “i,” one sees the overlap of shared annotations between Droplet Paired-Tag and public single cell H3K27ac datasets.

[0118] At Fig. 9, one sees an integrative analysis of histone modifications at candidate cis-regulatory elements across cell types in mouse frontal cortex. “a,” presents a heatmap showing H3K27ac and H3K27me3 signals at the distal cCREs across different cell types. “b,” presents a heatmap of known motif enrichment for each cCRE module of distal cCREs. Examples of known motifs are shown along with the heatmap. “c,” presents a cumulative distribution function plot of Spearman’s correlation coefficients between histone modification at distal cCREs and gene expression at their putative target genes. Distributions from Droplet Paired-Tag and the previously published Paired-Tag experiment are shown, and the greatest separation between real and random background for each dataset is annotated inside the plot. “d,” presents a boxplot showing Spearman’s correlation coefficients of H3K27ac- or H3K27me3-associated distal cCRE-gene pairs intersected at or not at loop anchors identified in the snm3C-seq dataset (Liu, H. et al. DNA methylation atlas of the mouse brain at single-cell resolution. Nature 598, (Springer US, 2021)). “e,” presents a representative snm3C-seq contact heatmap and genome browser view of cell-class (glutamatergic versus non-neurons) specific active and repressive putative cCREs regulating the gene Neurod6. Distal cCREs and proximal regions are highlighted in pink.

[0119] At Fig. 10, one sees a Landscape of histone modifications across cell types in mouse frontal cortex. At “a-b,” one sees Representative Genome browser view showing H3K27ac (a) or H3K27me3 (b) signal over marker genes in each mouse brain cell type. At “c,” one sees Top 10 H3K27me3 marker bins from each cell type with >100 nuclei profiled. At “d,” one sees Spearman’s correlation coefficients between H3K27me3 signal in cell-type specific marker bins (CPM) and bin-overlapped genes expression level (RPKM) from different cell types.

[0120] At Fig. 11, one sees a Nuclei gating strategy for mouse frontal cortex. After nuclei extraction, nuclei were stained with DRAQ7 and proceeded to FACS. First, potential nuclei were identified using forward scatter (FSC) area and backscatter (BSC) area (left dot plot). Next, potential doublets were removed based on BSC as well as FSC signal width (two middle dot plots). Finally, 200 - 500k diploid nuclei (2n) were collected for antibody incubation (right dot plot).

[0121] At Fig. 12, one sees data from individual human brain cells relating to RNA levels, and impact of Histone 3 Lysine 27 acetylation and Histone 3 Lysine 27 methylation on genomic nucleic acid accessibility. This was executed in post-mortem human brain. Shown here are data embeddings for 3 modalities: RNA (combined across experiments); RNA+H3K27ac, and RNA+H3K27me3.

[0122] At Fig. 13, one sees data from individual cells relating to the impact of H3K acetylation, H3K27 methylation and EZH2 localization on genomic nucleic acid accessibility. This was executed in post-mortem human brain. Shown here are UMI profiles for recovered barcodes (y-axis: RNA UMI per barcode, x-axis: DNA UMI per barcode). As is typical with all droplet assays, there is a large background of low-coverage barcodes (>90% of barcodes accounting for <5% of reads); and ~2,000 positive barcodes per modality. These results demonstrate the ability to apply the technology to targets outside of histones.

[0123] At Fig. 14, one sees pseudobulked data relating to genomic nucleic acid accessibility at sites identified by pA-Tn5 transposons targeted by anti-H3K27Ac antibodies from three single cell samples (the top three read lines) followed by data derived using the same targeting moiety on bulk samples, and finally data obtained using ATAC-seq untargeted chromatin analysis. This demonstrates reproducibility, concordance with bulk standard, and low open-chromatin off-target.

[0124] At Fig. 15, one sees data from individual human brain cells relating to RNA levels, alone and in integration with single-cell RNA produced by alternative methodologies, and pseudobulked data relating to genomic nucleic acid accessibility at sites identified by pA-Tn5 transposons targeted by anti-H3K4me1, anti-H3K27ac, anti-H3K27me3, and anti-H3K9me3 antibodies in cells identified as excitatory neurons based on RNA expression and segmented by donor status. This demonstrates the simultaneous use of both modalities: for cellular type annotation (“RNA”) and type-specific targeted nucleic acid accessibility (“DNA”). Fig. 15, like part “c” of Fig. 2, illustrates a functional implementation of the technology herein, in that unsorted cells may be individually assayed as to their transcriptomes and accessible genomic nucleic acids in the vicinity of a target; and the resulting data may be grouped based upon one or more criteria derived from the data, such as transcriptome data indicative of a particular cell type, such that chromatin configuration for that cell type may be evaluated for individual cells of that type, alone or in combination with ‘pseudo-bulked’ results aggregated from a plurality of cells sharing transcriptome data indicative of a common type. Through these approaches, as presented in Fig. 15, one may effect cell-specific data sorting and cell category specific bulking of the resulting data. The result of this data sorting is to generate data comparable to that pursued through cell specific sorting and bulking fluidic technologies such as FACS based sorting technologies.

[0125] At Fig. 16, one sees data obtained from mouse liver cells relating to transcript accumulation and genomic nucleic acid accessibility probed by anti-H3K27Ac antibody directed and anti-H3K27Me3 antibody directed pA-Tn5 transposon activity. These results demonstrate the application of (droplet) Paired-Tag in tissues beyond brain.

[0126] Representative Oligos consistent with the disclosure herein are presented below

Table 1 - Oligos Sequence

[0127] AdapterA /5Phos/TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG

[0128] AdapterB GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG

[0129] pMENTs-Bridge

[0130] /5Phos/CTGTCTCTTATACACATCTGACGCTGCCGACGACAGACGCG

[0131] pMENTs /5Phos/CTGTCTCTTATACACATCT/3ddC/

Barcoding pA-Tn5 for Droplet Paired-Tag

[0132] In a conventional Droplet Paired-Tag or 10X Genomics Single Cell Multiome ATAC + Gene Expression experiment, no barcodes are added prior to generating the droplets. Here we describe a method of providing barcodes to the pA-Tn5 so that distinct barcodes can be added at the transposition step.

How to add barcodes

[0133] The DNA loaded onto pA-Tn5 for the Droplet Paired-Tag assay is depicted below, where “Phos” indicates the 5’ phosphate group. In the droplet, the top strand becomes ligated to the oligo on the 10X Multiome gel bead with the help of the overhang on the bottom strand to acquire the single-cell barcode.

[0134] 5'-Phos-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3'
               |||||||||||||||||||||||||||||||||
    3'-GCGCAGACAGCAGCCGTCGCAGTCTACACATATTCTCTGTC-Phos-5'
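As a quick consistency check of the duplex drawn above, the following Python sketch confirms that the pMENTs-Bridge oligo of Table 1 is the reverse complement of AdapterA followed by a single-stranded tail, which is the overhang that guides ligation to the gel bead oligo. The helper names are illustrative; the sequences are copied from Table 1 with modifications such as /5Phos/ omitted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Sequences from Table 1, written 5'->3', chemical modifications omitted.
ADAPTER_A = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
PMENTS_BRIDGE = "CTGTCTCTTATACACATCTGACGCTGCCGACGACAGACGCG"


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_overhang(top, bottom):
    """Return the unpaired 3' tail of `bottom` (5'->3') when the rest of
    `bottom` pairs fully with `top`; return None if it does not pair."""
    paired = reverse_complement(top)
    if not bottom.startswith(paired):
        return None
    return bottom[len(paired):]


overhang = duplex_overhang(ADAPTER_A, PMENTS_BRIDGE)
```

Read 3'->5', the returned tail corresponds to the unpaired stretch drawn at the left end of the bottom strand.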

[0135] We can add extra barcodes as depicted as “NNNNNN” below.

[0136] 5'-Phos-NNNNNNTCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3'
               |||||||||||||||||||||||||||||||||||||||
    3'-GCGCAGACNNNNNNAGCAGCCGTCGCAGTCTACACATATTCTCTGTC-Phos-5'

[0137] The length of the barcodes can be varied depending on how many barcodes are needed. This barcode can be sequenced together with the single-cell barcode from the 10X Multiome gel bead.

Benefits of adding barcodes to pA-Tn5 - Loading more than one transposition reaction in one well

[0138] In the conventional protocol, one well of a 10X Genomics microfluidic chip is loaded with nuclei from one transposition reaction. If multiple transposition reactions are mixed, there is no way to identify which nuclei came from which transposition reactions.

[0139] By using pA-Tn5 with distinct barcodes, it becomes possible to distinguish nuclei from different transposition reactions even when they were mixed and loaded to the same well.
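A minimal sketch of how nuclei might be assigned back to their transposition reaction is shown below in Python. It assumes the pA-Tn5 barcode has already been extracted from each read and that a lookup table from barcode to reaction is available; the function name, the exact-match lookup and the 0.8 purity threshold are all illustrative assumptions rather than part of the described method.

```python
from collections import Counter, defaultdict


def assign_reactions(reads, barcode_to_reaction, min_fraction=0.8):
    """Assign each single-cell barcode to a transposition reaction.

    reads: iterable of (cell_barcode, tn5_barcode) pairs
    barcode_to_reaction: {tn5_barcode: reaction_name} (hypothetical table)
    A cell is labeled "mixed" when its most frequent reaction accounts for
    less than `min_fraction` of its recognized reads.
    """
    per_cell = defaultdict(Counter)
    for cell_bc, tn5_bc in reads:
        reaction = barcode_to_reaction.get(tn5_bc)
        if reaction is not None:  # ignore unrecognized barcodes
            per_cell[cell_bc][reaction] += 1
    calls = {}
    for cell_bc, counts in per_cell.items():
        reaction, top = counts.most_common(1)[0]
        purity = top / sum(counts.values())
        calls[cell_bc] = reaction if purity >= min_fraction else "mixed"
    return calls
```

Cells labeled "mixed" carry barcodes from more than one reaction and are candidates for the multiplet filtering discussed in the following paragraphs.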

[0140] Mixing transposition reactions for overloading

[0141] There is a trade-off between the number of cells to profile and the multiplet rate. Loading a high number of nuclei to increase the number of cells to profile leads to a high multiplet rate (the rate at which two or more nuclei acquire the same barcode), which hinders the ability to resolve single cells. The multiplet rate will be low if the number of loaded nuclei is kept low, but that will lead to many empty droplets and a low number of total cells to profile.

[0142] If additional barcodes can be added by pA-Tn5, they can be used to identify a portion, if not all, of the multiplets. Suppose distinct barcodes are added in two transposition reactions. If you mix an equal number of the nuclei from the two reactions before loading into the droplets, half of the doublets will have two different pA-Tn5-derived barcodes and can be identified. If you mix nuclei from three transposition reactions with distinct barcodes, two thirds of the doublets can be identified.
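The fractions quoted above follow from a simple calculation: if nuclei from each reaction make up a proportion p_i of the pool and the two nuclei of a doublet are drawn independently, the doublet is detectable whenever the two nuclei come from different reactions, which happens with probability 1 - sum(p_i^2). The Python sketch below evaluates this under those assumptions (independent pairing, doublets only); the function name is illustrative.

```python
from fractions import Fraction


def detectable_doublet_fraction(proportions):
    """Fraction of doublets carrying two different pA-Tn5 barcodes.

    proportions: relative numbers of nuclei from each barcoded reaction.
    Assumes the two nuclei in a doublet are drawn independently.
    """
    total = sum(proportions)
    same_reaction = sum((Fraction(p) / total) ** 2 for p in proportions)
    return 1 - same_reaction


two_reactions = detectable_doublet_fraction([1, 1])       # Fraction(1, 2)
three_reactions = detectable_doublet_fraction([1, 1, 1])  # Fraction(2, 3)
```

With k equally mixed reactions the expression reduces to 1 - 1/k, so each added barcode gives a smaller gain; unequal mixing always detects fewer doublets than equal mixing for the same k.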

[0143] The identified multiplets can be excluded from the analysis, and doing so reduces the proportion of multiplets among the cells to be profiled. Thus, the barcodes from the transposition reactions can be used to suppress the negative effect of the multiplets and enable loading of a higher number of nuclei.

[0144] Mixing transposition reactions to minimize the effect of a clog

[0145] A 10X Genomics microfluidic chip can get clogged, which results in loss of samples from one or more wells.

[0146] If a sample from one transposition reaction is loaded to only one well, the entire sample gets lost when there is a clog of the well. Suppose four different transposition reactions A, B, C, and D are performed in one experiment and suppose the well for sample D is clogged. Then the data for samples A, B, and C are obtained, but the data for sample D are completely lost.

[0147] Suppose different barcodes are added to reactions A, B, C, and D and the samples are mixed before loading into the droplets. If one well is clogged, the total number of cells profiled will be reduced, but it will still be possible to obtain the data from all the samples A, B, C, and D. The effect of a clog is not as catastrophic as without the extra barcodes.

EXAMPLES

Example 1. Cell culture.

[0148] Mouse embryonic stem cells (mESC) used in this study have been described in a previous study. mESC were maintained in feeder-free and serum-free 2i media at 37 °C with 5% CO2. To isolate nuclei, mESC were dissociated with Accutase (Innovative Cell Technologies, AT104), collected by centrifugation, washed twice with PBS (Gibco, 10010023) then resuspended in cold nuclei permeabilization buffer (10 mM Tris-HCl pH 7.4 (Sigma, T4661), 10 mM NaCl (Sigma, S7653), 3 mM MgCl2 (Sigma, 63069), 1x Protease Inhibitor (Roche, 05056489001), 0.5 U/μl RNaseOUT (Invitrogen, 10777-019), 0.5 U/μl SUPERase Inhibitor (Invitrogen, AM2694), 0.1% IGEPAL CA630 (Sigma, I8896) and 0.02% Digitonin (Sigma, Cat#D141)) for 1 minute. Nuclei were counted using a BioRad TC20 cell counter with 0.4% Trypan Blue (Gibco, 15250061) staining. For each Droplet Paired-Tag experiment, 0.5 million nuclei were used.

Example 2. Mouse brain dissection and nuclei extraction.

[0149] All animal work described in this manuscript has been approved and conducted under the oversight of the UC San Diego Institutional Animal Care and Use Committee. Male C57BL/6J mice were purchased from the Jackson Laboratory (#000664) at 12 weeks of age and housed in the barrier facility at UC San Diego in a 12-hour light/dark cycle in a temperature-controlled room, with ad libitum access to water and food, until euthanasia and tissue collection at 16 weeks of age. The frontal cortex was dissected from 16-week male mice, snap-frozen in liquid nitrogen, and stored at -80 °C before proceeding to nuclei extraction.

[0150] Single cell suspensions were prepared from frozen tissues by douncing in the douncing buffer (0.25 M sucrose (Sigma, S7903), 25 mM KCl (Sigma, P9333), 5 mM MgCl2, 10 mM Tris-HCl pH 7.4, 1 mM DTT (Sigma, D9779), 1x Protease Inhibitor, 0.5 U/μl RNaseOUT, 0.5 U/μl SUPERase Inhibitor). The cell suspension was then filtered through a 30-μm CellTrics filter (Sysmex) for debris removal, and spun down for 10 min at 300g, 4 °C. Cell pellets were washed once with douncing buffer, spun down again, and resuspended in cold nuclei permeabilization buffer for 10 min. Permeabilized nuclei were pelleted by centrifugation for 10 min at 1,000g, 4 °C, and washed once with sort buffer (1x PBS (Gibco, 10010023), 1x Protease Inhibitor (Roche, 05056489001), 0.5 U/μl RNaseOUT (Invitrogen, 10777-019), 0.5 U/μl SUPERase Inhibitor (Invitrogen, AM2694), 1 mM EDTA (Invitrogen, 15575020), 1% BSA (Sigma, A1595)). After resuspension in sort buffer, nuclei were stained with 2 μM 7-AAD (Invitrogen, A1310) for 10 min on ice, and proceeded to fluorescence-activated nuclei sorting (FANS) with an SH800 cell sorter (Sony) for isolation of single nuclei (Fig. 11). Nuclei were collected in collection buffer (1x PBS (Gibco, 10010023), 5x Protease Inhibitor (Roche, 05056489001), 2.5 U/μl RNaseOUT (Invitrogen, 10777-019), 2.5 U/μl SUPERase Inhibitor (Invitrogen, AM2694), 1 mM EDTA (Invitrogen, 15575020), 5% BSA (Sigma, A1595)) at 5 °C, then immediately centrifuged for 10 min at 750g, 4 °C. Nuclei were washed twice with sort buffer and counted on an RWD C100-pro fluorescence cell counter with DAPI staining to better estimate the nuclei number. For each histone modification, around 0.5 million sorted nuclei were aliquoted and proceeded to Paired-Tag experiments.

Example 3. Assembly of active transposon complex.

[0151] Sequences for all custom oligos used in this study are provided in Table 1. To assemble a Protein A-Tn5 (pA-Tn5) complex suitable for the 10X Single Cell Multiome ATAC + Gene Expression platform, we mixed transposome oligos (100 μM) in two separate PCR tubes (USA Scientific, 1402-2300): 25 μL AdapterA + 25 μL pMENTs-Bridge, or 25 μL AdapterB + 25 μL pMENTs. Oligos were annealed in a thermal cycler with the following program: 95 °C for 5 min, then slow cooling to 12 °C at a rate of 0.1 °C/s. Each 1 μL of the annealed transposome DNAs was combined separately with 6 μL of unloaded pA-Tn5 (0.5 mg/mL, MacroLab), mixed, and quickly spun down. The mixtures were incubated at room temperature for 30 min then at 4 °C for an additional 10 min, and equal volumes of assembled pA-Tn5-AdapterA and pA-Tn5-AdapterB were mixed to form the functional pA-Tn5 complex. Assembled transposon complexes can be stored at -20 °C for up to six months, and transposon activity is validated with a bulk CUT&Tag assay before use.

Example 4. Antibodies.

[0152] Antibodies used in this study include: H3K27ac (Abcam, ab177178, Lot GR3202987-20 (recombinant); Abcam, ab4729, Lot GR3442886-1 (polyclonal)) and H3K27me3 (Abcam, ab192985, Lot GR3399022-3 (recombinant)). We found that antibody specificity is critical for high-quality signals in single-cell histone data. For H3K27ac, although the recombinant antibody yielded a higher fragment number per cell than the polyclonal antibody, its enrichment at transcription start sites or ChIP-seq peaks was lower (Fig. 8 at c). Therefore, except for replicate 1 (rep1) of the mouse frontal cortex datasets, all experiments targeting H3K27ac were carried out with the polyclonal antibody.

Example 5. Droplet Paired Tag Experimental Protocol

Antibody-guided tagmentation

[0153] pA-Tn5 and primary antibody were pre-conjugated during nuclei extraction. 1 μg primary antibody and 1 μL assembled pA-Tn5 were premixed in 35 μL MED1 buffer (20 mM HEPES pH 7.5, 300 mM NaCl, 0.5 mM Spermidine, 1x Protease Inhibitor cocktail, 0.5 U/μl SUPERase IN, 0.5 U/μl RNase OUT, 0.01% IGEPAL CA630, 0.01% Digitonin, 2 mM EDTA and 1% BSA) and rotated at room temperature for 1 hour, as a previous study showed that a high-salt condition is critical for reducing undesired open chromatin background. Nuclei extracted with the above-described protocol were also resuspended in MED1 buffer, and 0.15 - 0.50 million nuclei were distributed into the pre-mixed antibody/pA-Tn5 to a final volume of 75 μL. The mixtures were rotated at 4 °C overnight for epitope targeting along with pA-Tn5 tethering.

[0154] After overnight incubation, nuclei were isolated by centrifugation for 10 min at 300g, 4 °C, and washed three times with MED2 buffer (20 mM HEPES pH 7.5, 300 mM NaCl, 0.5 mM Spermidine, 1x Protease Inhibitor cocktail, 0.5 U/μl SUPERase IN, 0.5 U/μl RNase OUT, 0.01% IGEPAL CA630, 0.01% Digitonin and 1% BSA) to remove excess antibody and pA-Tn5. Tagmentation was next carried out in 50 μL MED2 buffer supplemented with 10 mM MgCl2 (Sigma, M1028) at 550 r.p.m., 37 °C for 60 min in a ThermoMixer (Eppendorf). The tagmentation reaction was terminated by adding an equal volume of stop solution (10 mM Tris-HCl pH 8.0, 20 mM EDTA, 2% BSA, 2x Protease Inhibitor cocktail, 1 U/μl SUPERase IN and 1 U/μl RNase OUT). Nuclei were spun down for 10 min at 500g, 4 °C, and washed once with 1x Nuclei Buffer (10X Genomics PN-2000207, supplemented with 1 mM DTT, 0.5 U/μl SUPERase IN and 0.5 U/μl RNase OUT). Finally, nuclei were resuspended in 10 μl 1x Nuclei Buffer, and 10,000 - 16,000 nuclei were aliquoted into PCR tubes in a total volume of 8 μL. Normally, the nuclei recovery rate is around 30% - 60%, depending on the starting input material. To assess CUT&Tag performance, around 10,000 - 50,000 nuclei were used for bulk analysis: the fragmented DNA in these nuclei was purified with a MinElute column (QIAGEN, 28004) and amplified for quality control assessment. 7 μL ATAC Buffer B (10X Genomics PN-2000193) was added to the 8 μL nuclei mixture to reach a final reaction volume of 15 μL, as specified in the user manual of the Chromium Next GEM Single Cell Multiome kit (CG000338, RevF), except that we substituted the 3 μL ATAC Enzyme B (10X Genomics PN-2000265/2000272) with 1x Nuclei Buffer.

Reverse transcription, cell barcoding, library preparation

[0155] Barcoding reaction mixes were prepared as described in the manual of the Chromium Next GEM Single Cell Multiome kit. 60 μL prepared master mix was added to the 15 μL nuclei mixture before loading onto a Chromium Next GEM Chip J and proceeding to droplet generation with the Chromium X microfluidics system (10X Genomics). Reverse transcription and cell barcoding were carried out inside 10X GEMs. Final DNA and RNA library amplification was performed according to the Chromium Single Cell ATAC Library kit manual, except that we used an increased number of amplification cycles (12-13 in total) for the histone modality library.

Example 6. Preprocessing of Droplet Paired Tag Data

[0156] All sequencing was carried out with an Illumina NextSeq 2000 sequencer. Fastq files were demultiplexed using cellranger-arc v.2.0.0 (10x Genomics) with the command 'cellranger-arc mkfastq'. DNA and RNA data, however, were preprocessed separately using cellranger-atac v.2.0.0 and cellranger v.6.1.2, respectively, and barcodes were manually paired with a custom script using the matching relationship provided in cellranger-arc. To select high-quality nuclei, we first aggregated histone modification data from the same sample and performed peak calling to find narrow peaks (for H3K27ac) or broad peaks (for H3K27me3) using MACS2 with default parameters. Then, we filtered the histone modality based on the per-cell fragment number and the fraction of reads in peaks (FRiP). Next, we selected nuclei by pairing pass-filtered nuclei from both modalities (histone modification and transcriptome), as shown in Fig. 5 at a-f.
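The filtering and pairing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the custom script referred to in the text: the function names, the tuple layouts, the `dna_to_rna` barcode map and the cutoff values are all hypothetical, and FRiP is computed here per fragment rather than per read.

```python
from collections import Counter


def frip_per_cell(fragments, peaks):
    """Return {barcode: (n_fragments, frip)}.

    fragments: iterable of (barcode, chrom, start, end), half-open coordinates.
    peaks:     iterable of (chrom, start, end), half-open coordinates.
    """
    peaks_by_chrom = {}
    for chrom, start, end in peaks:
        peaks_by_chrom.setdefault(chrom, []).append((start, end))

    total = Counter()
    in_peaks = Counter()
    for barcode, chrom, start, end in fragments:
        total[barcode] += 1
        # A fragment counts once if it overlaps any peak on its chromosome.
        if any(start < p_end and p_start < end
               for p_start, p_end in peaks_by_chrom.get(chrom, [])):
            in_peaks[barcode] += 1
    return {bc: (n, in_peaks[bc] / n) for bc, n in total.items()}


def select_paired_nuclei(dna_stats, rna_pass, dna_to_rna,
                         min_fragments, min_frip):
    """Keep nuclei whose DNA barcode passes both cutoffs (hypothetical
    values supplied by the caller) and whose paired RNA barcode is in the
    RNA pass-filter set. Returns sorted (dna_barcode, rna_barcode) pairs."""
    paired = []
    for dna_bc, (n_frag, frip) in dna_stats.items():
        rna_bc = dna_to_rna.get(dna_bc)
        if n_frag >= min_fragments and frip >= min_frip and rna_bc in rna_pass:
            paired.append((dna_bc, rna_bc))
    return sorted(paired)
```

The linear scan over peaks is adequate for a toy example; a real dataset would call for an interval index or a tool such as bedtools.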

Before clustering, nuclei with a high fraction of mitochondrial and ribosomal RNA reads were filtered out. Nuclei with an extremely high number of reads were also filtered out, since most of them are doublets. Potential doublets were identified and removed using Scrublet for individual RNA datasets, and the corresponding DNA barcodes were also removed.

[0157] For genome browser track generation, sample- or cell-type-specific bigwig files were generated from bam files with deepTools (v.3.5.1) and visualized in IGV (v.2.15.4).

[0158] For FRiP calculation, duplicated reads were removed using Picard MarkDuplicates, taking barcode information into account. Peak calling was then performed using MACS2 with default parameters, except that for H3K27me3 we called broad peaks with '--broad' due to its broad domain enrichment. Pre-processing pipelines and scripts are shared at the website github.com/Xieeeee/Droplet-Paired-Tag.

Example 7. Analysis of Droplet Paired Tag Data

Signal enrichment calculation

[0159] Density plots and heatmaps of signal enrichment over ChIP-seq peaks or CEMBA cCREs were generated using deepTools. Peaks overlapping ENCODE blacklist (v2) or CUT&RUN blacklist regions were removed during the calculation of enrichment (Amemiya, H. M., Kundaje, A. & Boyle, A. P. The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Sci. Rep. 9, 1-5 (2019); Nordin, A., Zambanini, G., Pagella, P. & Cantù, C. The CUT&RUN Blacklist of Problematic Regions of the Genome. bioRxiv 2022.11.11.516118 (2022), doi:10.1101/2022.11.11.516118).

Clustering and annotation

[0160] Clustering of single-cell transcriptome data was carried out in R using Seurat (v.4.1.0) and Signac (v.1.6.0) (Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573-3587.e29 (2021); Stuart, T., Srivastava, A., Madad, S., Lareau, C. A. & Satija, R. Single-cell chromatin state analysis with Signac. Nat. Methods 18, 1333-1341 (2021)). In short, gene counts were normalized, and the top 2,500 variable genes were selected for dimension reduction by principal component analysis (PCA). For all datasets, the first 35 principal components were used to correct batch effects with harmony, followed by UMAP visualization and Louvain clustering. Marker genes for each cluster were identified using Seurat, with the log2 fold change threshold set to 0.25. Annotation of cluster identities was done with the marker genes characterized in previous studies. For epigenome data, 10X fragment files were converted to cell-by-bin matrices using 5 kb non-overlapping genomic bins, and clustering was carried out using Signac. Briefly, sequencing depth was normalized using the two-step term frequency-inverse document frequency (TF-IDF). The top 85% of genomic bins were selected for linear dimension reduction by singular value decomposition (SVD), and batch effects were corrected with harmony, again followed by UMAP visualization and Louvain clustering. Gene activity score was computed from signal density in promoter and gene body regions.

[0161] To compare clustering results from different modalities, we first annotated the epigenome clusters by nominating the most abundant cell type identified with transcriptome clustering. Overlap coefficients were calculated according to the proportion of cells sharing the same labels from both the transcriptome (A) and epigenome clusters (B), in the transcriptome clusters:

[0162] O_i = max(A_i ∩ B_i) / A_i

Integration with public snRNA-seq datasets

[0163] Integration of the Droplet Paired-Tag RNA modality with the original Paired-Tag dataset and the 10X scRNA-seq dataset was carried out using Seurat. Briefly, gene counts for all datasets were normalized, and the top 2,000 shared variable genes across datasets were identified as integration features. Dimensional reduction (CCA) was carried out to project all nuclei into the same embedding, and "anchors" (pairs of cells from different datasets) were identified by mutual nearest neighbor searching. Low-confidence anchors were filtered out, and finally the shared neighbor overlap between anchor and query cells in an overall neighbor graph was computed. Louvain clustering on the overall neighbor graph was used for co-embedded cluster identification. To compare clustering results from different datasets, overlap coefficients were calculated based on the number of cells sharing the same labels from both the query (A) and reference clusters (B), in the co-embedding clusters (C) (i = query cell type, j = reference cell type, k = co-embedding cluster):

[0164] O_ij = min((A_i ∩ C_k) / A_i, max((A_j ∩ C_k) / B_j))

Identification of peak set

[0165] To determine the peaks using H3K27ac data, we adopted a previously described method to perform peak calling and merge peaks across replicates. Properly paired reads from all pass-filtered nuclei in the same replicate were merged to generate a pseudo-bulk dataset for each biological replicate. After shift correction for the pA-Tn5 cleavage site, peak calling was performed using MACS2 with these parameters: '-q 0.01 --nomodel --shift -75 --extsize 150 --keep-dup all -B --SPMR'. Since we used two different antibodies, to reduce variation caused by antibody affinity and specificity, we retained peaks identified in at least two replicates as conserved peaks during merging. Finally, we extended peak summits to a fixed width of 500 bp for merging and downstream analysis.
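As an illustration of the last two steps, the sketch below extends summits to fixed-width 500 bp peaks and keeps peaks supported by at least two replicates. It is a simplified stand-in for the previously described merging method, whose details are not given here: the function names and data layout are hypothetical, support is judged by simple interval overlap, and overlapping peaks are not collapsed into a single representative.

```python
def fixed_width_peak(chrom, summit, width=500):
    """Center a half-open interval of `width` bp on the summit,
    shifting it right if it would extend past position 0."""
    start = max(0, summit - width // 2)
    return (chrom, start, start + width)


def conserved_peaks(replicate_peaks, min_replicates=2):
    """Keep peaks overlapped by a peak in at least `min_replicates`
    replicates (a peak's own replicate counts as one).

    replicate_peaks: {replicate_name: [(chrom, start, end), ...]}
    """
    kept = set()
    for peaks in replicate_peaks.values():
        for chrom, start, end in peaks:
            support = sum(
                any(c == chrom and s < end and start < e for c, s, e in other)
                for other in replicate_peaks.values()
            )
            if support >= min_replicates:
                kept.add((chrom, start, end))
    return sorted(kept)
```

For example, summits at chr1:1,000 in one replicate and chr1:1,050 in another yield two overlapping 500 bp peaks that are both retained, whereas a peak seen in only one replicate is dropped.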

Identification of cCRE modules

[0166] To ensure a fair comparison between different techniques, we filtered the CEMBA cCREs list (Li, Y. E. et al. An atlas of gene regulatory elements in adult mouse cerebrum. Nature 598, 129-136 (2021)) for elements with an arbitrary cutoff (histone modification signal RPKM > 1) in at least one transcriptome-based cluster from both Droplet Paired-Tag and Paired-Tag datasets, and retained cCREs with H3K27ac (289,437, 87.9% of cCREs) or H3K27me3 (127,005, 34.9% of cCREs) signal RPKM > 1 in at least one cluster for downstream analysis. For visualization, shared cCREs (108,319) from the H3K27ac and H3K27me3 groups were selected for plotting. cCREs were classified as distal or proximal based on their distance to ±2 kb of the TSS in GENCODE mm10 (vM25). Proximal or distal cCREs were grouped into 20 different modules by non-negative matrix factorization (NMF) (Li, Y. E. et al. Identification of high-confidence RNA regulatory elements by combinatorial classification of RNA-protein binding sites. Genome Biol. 18, 1-16 (2017)) based on their histone modification signal intensity. Downstream motif enrichment and gene ontology analysis are based on this classification of cCRE modules. For visualization, 95,799 distal cCREs and 12,520 proximal cCREs that pass both H3K27ac and H3K27me3 signal cutoffs are plotted. Cell types with <100 nuclei are excluded from further analysis (VLMC, vascular and leptomeningeal cells).
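The distal/proximal split can be illustrated with a short Python sketch. It assumes that a cCRE is proximal when it overlaps the ±2 kb window around any TSS on the same chromosome, which is one reading of the criterion above; the function name and input layout are hypothetical.

```python
import bisect


def classify_ccres(ccres, tss_positions, window=2000):
    """Label each cCRE 'proximal' if it overlaps [tss - window, tss + window]
    for any TSS on its chromosome, else 'distal'.

    ccres:         list of (chrom, start, end), half-open coordinates.
    tss_positions: {chrom: [tss_coordinate, ...]}
    """
    sorted_tss = {chrom: sorted(pos) for chrom, pos in tss_positions.items()}
    labels = []
    for chrom, start, end in ccres:
        tss = sorted_tss.get(chrom, [])
        # First TSS at or beyond the left edge of the extended cCRE.
        i = bisect.bisect_left(tss, start - window)
        proximal = i < len(tss) and tss[i] < end + window
        labels.append("proximal" if proximal else "distal")
    return labels
```

Sorting the TSS coordinates once per chromosome keeps each lookup logarithmic, which matters when several hundred thousand cCREs are classified.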

Linking cCREs with putative target genes

[0167] We used a previously described method to link cCREs with their putative regulatory genes for both histone modifications (Li, Y. E. et al. An atlas of gene regulatory elements in adult mouse cerebrum. Nature 598, 129-136 (2021)). First, cCREs with co-occupancy of H3K27ac or H3K27me3 within 500 kb genomic distance were identified using Cicero (Pliner, H. A. et al. Cicero Predicts cis-Regulatory DNA Interactions from Single-Cell Chromatin Accessibility Data. Mol. Cell 71, 858-871.e8 (2018)). cCREs with co-occupancy (Cicero score) >0.1 were retained for further analysis. Next, cCREs were classified as distal or proximal based on their distance to ±2 kb of the TSS in GENCODE mm10 (vM25). In our analysis, only distal-to-proximal pairs were selected for comparison. We calculated Spearman’s correlation coefficients (SCC) between gene expression and histone modification signal over cCREs across clusters to examine the relationship between co-accessibility pairs. To estimate random background levels, we shuffled the cell identities for each read and calculated the corresponding SCCs. Finally, we fit a normal distribution model and set a cutoff on the SCC score with an empirically defined significance threshold of FDR <0.05, to select significantly positively (H3K27ac) or negatively (H3K27me3) correlated cCRE-gene pairs. To compare with the original Paired-Tag dataset on the strength of putative cCRE-gene pairs, we used the Kolmogorov-Smirnov test to calculate the difference between putative cCRE-gene linkages and random backgrounds. The greatest distance (D) between the real and background distributions was calculated for each dataset separately.
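The correlation and cutoff steps can be sketched with the Python standard library alone. Several simplifications are assumed and should not be read as the method actually used: the background here shuffles the per-cluster signal vector rather than the cell identity of each read, the FDR control is Benjamini-Hochberg applied to one-sided p-values from the fitted normal (the text does not name the procedure), and all function names are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks.
    Returns 0.0 by convention when either vector is constant."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def background_sccs(expression, signal, n_shuffles=1000, seed=0):
    """Null SCCs from shuffling the signal across clusters (simplification)."""
    rng = random.Random(seed)
    shuffled = list(signal)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null.append(spearman(expression, shuffled))
    return null


def significant_pairs(observed, null_sccs, fdr=0.05, positive=True):
    """Fit a normal to the null SCCs and flag observed SCCs passing a
    Benjamini-Hochberg cutoff. positive=True tests the upper tail
    (H3K27ac-like); positive=False tests the lower tail (H3K27me3-like)."""
    null = NormalDist(mean(null_sccs), stdev(null_sccs))
    pvals = [1 - null.cdf(s) if positive else null.cdf(s) for s in observed]
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    n_keep = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= fdr * rank / len(pvals):
            n_keep = rank
    keep = set(order[:n_keep])
    return [i in keep for i in range(len(observed))]
```

Testing the upper tail for H3K27ac and the lower tail for H3K27me3 mirrors the expectation that an active mark correlates positively with expression and a repressive mark negatively.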

Motif enrichment and gene ontology analysis

[0168] Motif enrichment for each cCRE module was performed using HOMER 'findMotifsGenome.pl'. The displayed motif heatmap in Fig. 2 at k and Fig. 9 at b are from the results of known motif discovery. Gene ontology analysis for each enriched gene set is carried out using PANTHER database with default parameters, and biological process terms were used for annotation. To exclude ambiguous terms, we only selected the top enriched terms ranked by fold enrichment x -log10(adjusted p-value).
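The ranking score for gene ontology terms is simple enough to show directly. In this sketch the dictionary keys, the `top_n` default and the `p_floor` guard against a reported p-value of zero are illustrative assumptions, not part of the PANTHER output format.

```python
import math


def rank_go_terms(terms, top_n=10, p_floor=1e-300):
    """Sort terms by fold_enrichment * -log10(adjusted p-value), descending.

    terms: list of dicts with keys 'term', 'fold_enrichment', 'adj_p'.
    """
    def score(term):
        adj_p = max(term["adj_p"], p_floor)  # avoid log10(0)
        return term["fold_enrichment"] * -math.log10(adj_p)

    return sorted(terms, key=score, reverse=True)[:top_n]
```

Multiplying the two quantities favors terms that are both strongly enriched and statistically well supported, so a term with a large fold enrichment but a marginal p-value ranks below one with moderate values of both.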

Integrative analysis of Droplet Paired-Tag and snm3C-seq data

[0169] The mouse brain snm3C-seq dataset was downloaded from GEO with accession number GSE156683 (Liu, H. et al. DNA methylation atlas of the mouse brain at single-cell resolution. Nature 598 (2021)). Contact pairs from individual cells were merged together and visualized using pairtools (v.1.0.2) at 5 kb resolution. To summarize putative cCRE-gene pairs at loop anchors, we first performed loop calling using HiCCUPS (juicer tools v.1.22.01) at resolutions of 5 kb, 10 kb and 25 kb. Merged loop sets were then intersected with cCRE-gene pairs using bedtools pairtobed (v.2.27.1).

Data Availability

[0170] Raw data obtained in this study have been deposited at the NCBI Gene Expression Omnibus (GEO) (at the site www.ncbi.nlm.nih.gov/geo/) with accession number GSE224560 (reviewers' token: oxqjyieajcvtgh). The processed data can also be accessed as supplementary files in GEO. Datasets for mESC H3K27ac ChIP-seq were downloaded from the 4DN data portal with the accession number 4DNESTVGLCD9. Other external datasets were downloaded from GEO with the following accession numbers: mESC H3K27me3 ChIP-seq (GSE156589), Paired-Tag (GSE152020), CoTECH (GSE158435), scCUT&TAG on PBMC (GSE157910), scCUT&TAG on brain (GSE163532), scCUT&TAG-pro (GSE195725), snm3C-seq on brain (GSE156683) and ChIP-seq on mouse cortex excitatory neurons (GSE141587). The CEMBA snATAC-seq datasets and BICCN 10X snRNA-seq MOp data were downloaded via the NeMO archive (RRID: SCR_016152) (the site assets.nemoarchive.org/dat-ch1nqb7). 10x PBMC scRNA-seq and E18 embryonic mouse brain Multiome datasets were downloaded from the 10x Genomics website (www.10xgenomics.com/resources/datasets). Source data for statistical analysis are provided along with this paper.

Code Availability

[0171] Scripts and code are available at the site github.com/Xieeeee/Droplet-Paired-Tag.

Claims

CLAIMS

What is claimed is as follows:

1. A method of generating a nucleic acid library, the method comprising: permeabilizing a target cell nucleus; contacting contents of the target nucleus to a transposase that is coupled to a chromatin-targeting binding moiety, wherein the transposase is loaded using a barcoded transposase end nucleic acid; delivering the target nucleus to an emulsion droplet comprising i) a bead, said bead having attached thereto a first oligonucleotide comprising a poly-dA segment and a second oligonucleotide that targets the transposase end nucleic acid, wherein the first oligonucleotide and the second oligonucleotide share a common barcode; ii) a ligase and iii) a reverse transcriptase, in an environment compatible with both ligation and reverse transcription; performing reverse transcription and ligation in the emulsion droplet; and preparing a nucleic acid library from nucleic acid contents of the emulsion droplet.

2. The method of claim 1, wherein the transposase comprises a Tn5 transposase.

3. The method of claim 1, wherein the transposase comprises a Tn3 transposase.

4. The method of claim 1, wherein the transposase comprises a Tn7 transposase.

5. The method of claim 1, wherein the transposase comprises a sleeping beauty transposase.

6. The method of claim 1, wherein the transposase comprises a mu transposase.

7. The method of claim 1, wherein the transposase comprises an antibody binding moiety.

8. The method of any one of claims 2 - 6, wherein the transposase comprises an antibody binding moiety.

9. The method of claim 7, wherein the transposase comprises a protein A fusion protein.

10. The method of claim 1, wherein the transposase comprises a protein A-Tn5 fusion protein.

11. The method of claim 1, wherein the chromatin-targeting binding moiety targets a chromatin modification.

12. The method of claim 11, wherein the chromatin modification comprises a histone modification.

13. The method of claim 12, wherein the histone modification comprises histone acetylation.

14. The method of claim 12, wherein the histone modification comprises histone deacetylation.

15. The method of claim 12, wherein the histone modification comprises histone methylation.

16. The method of claim 12, wherein the histone modification comprises histone demethylation.

17. The method of claim 1, wherein the chromatin-targeting binding moiety targets an unmodified histone.

18. The method of claim 1, wherein the chromatin-targeting binding moiety targets a transcription factor.

19. The method of claim 1, wherein the chromatin-targeting binding moiety targets a transcription repressor.

20. The method of claim 1, wherein the chromatin-targeting binding moiety comprises an antibody.

21. The method of any one of claims 12 to 19, wherein the chromatin-targeting binding moiety comprises an antibody.

22. The method of claim 1, wherein the transposase is coupled to the chromatin-targeting binding moiety via a protein A-antibody Fc binding interaction.

23. The method of claim 1, wherein the barcoded transposase end nucleic acid comprises a region common to a plurality of barcoded transposase end nucleic acids in a droplet.

24. The method of claim 1, wherein the barcoded transposase end nucleic acid comprises a region distinct from a plurality of barcoded transposase end nucleic acids in a droplet.

25. The method of claim 1, claim 23, or claim 24, wherein the barcoded transposase end nucleic acid comprises a double-stranded region and an overhang.

26. The method of claim 1, claim 23, or claim 24, wherein the barcoded transposase end nucleic acid comprises a double-stranded region and an overhang, wherein the overhang is reverse complementary to a portion of the second oligonucleotide.

27. The method of claim 1, wherein the barcode identifies the cellular origin of the nucleus.

28. The method of claim 1, wherein the permeabilizing does not release ribonucleic acids from the nucleus.

29. The method of claim 1, wherein the permeabilizing does not release ribonucleic acids from a cell harboring the nucleus.

30. The method of claim 1, wherein the permeabilizing allows retention of at least some ribonucleic acids in the nucleus.

31. The method of claim 1, wherein the permeabilizing allows retention of at least some ribonucleic acids in a cell harboring the nucleus.

32. The method of claim 1, comprising releasing the first oligo and the second oligo from the bead prior to reverse transcribing.

33. The method of claim 1, comprising releasing the first oligo and the second oligo from the bead prior to ligating.

34. The method of claim 1, comprising releasing the first oligo and the second oligo from the bead subsequent to reverse transcribing.

35. The method of claim 1, comprising releasing the first oligo and the second oligo from the bead subsequent to ligating.

36. The method of claim 1, wherein preparing a nucleic acid library comprises breaking the emulsion droplet.

37. The method of claim 1, wherein preparing a nucleic acid library comprises amplifying reverse transcription products.

38. The method of claim 1, wherein preparing a nucleic acid library comprises amplifying ligation products.

39. The method of claim 1, wherein preparing a nucleic acid library comprises amplifying using primers having library adapter ends.

40. The method of claim 39, wherein the library adapter ends comprise P5 and P7 adapter ends.

41. The method of claim 1, comprising sequencing at least a portion of the nucleic acid library.

42. A kit, comprising: a transposase, a target binding moiety, a bead comprising a reverse transcription capture primer and a ligation oligo, a ligase, a reverse transcription primer and a reverse transcriptase.

43. The kit of claim 42, wherein the transposase comprises a Tn5 transposase.

44. The kit of claim 42, wherein the transposase comprises a Tn3 transposase.

45. The kit of claim 42, wherein the transposase comprises a Tn7 transposase.

46. The kit of claim 42, wherein the transposase comprises a sleeping beauty transposase.

47. The kit of claim 42, wherein the transposase comprises a mu transposase.

48. The kit of claim 42, wherein the transposase comprises an antibody binding moiety.

49. The kit of claim 48, wherein the antibody binding moiety comprises at least a functional portion of a protein A protein.

50. The kit of claim 48, wherein the antibody binding moiety comprises a protein A.

51. The kit of claim 42, wherein the target binding moiety comprises an antibody Fab region.

52. The kit of claim 42, wherein the target binding moiety comprises an antibody Fc region.

53. The kit of claim 42, wherein the target binding moiety comprises an antibody.

54. The kit of claim 53, wherein the antibody directs the transposase to a target region.

55. The kit of claim 53, wherein the antibody binds to the transposase via an antibody binding moiety.

56. The kit of claim 55, wherein the antibody binding moiety comprises at least a segment of protein A.

57. The kit of claim 55, wherein the antibody binding moiety comprises protein A.

58. The kit of claim 42, wherein the transposase comprises a transposase end nucleic acid.

59. The kit of claim 58, wherein the transposase end nucleic acid comprises a cell barcode.

60. The kit of claim 58, wherein the transposase end nucleic acid comprises a randomer sequence segment.

61. The kit of claim 58, wherein the transposase end nucleic acid comprises a double-stranded segment.

62. The kit of claim 61, wherein the transposase end nucleic acid comprises an overhang segment.

63. The kit of claim 62, wherein the overhang segment is reverse complementary to at least a portion of the ligation oligo.

64. The kit of claim 42, wherein the ligation oligo and the reverse transcription primer share a common barcode sequence.

65. The kit of claim 42, wherein the reverse transcription primer comprises a poly-T segment.

66. The kit of claim 42, wherein the ligation oligo and the reverse transcription primer are tethered to the bead.

67. The kit of claim 66, wherein the ligation oligo and the reverse transcription primer are tethered to the bead by cleavable tethers.

68. The kit of claim 42, wherein the ligation oligo and the reverse transcription primer are embedded in the bead.

69. The kit of claim 42, wherein the bead is enzymatically degradable.

70. The kit of claim 42, wherein the bead is degradable under physiological conditions.

71. The kit of claim 42, wherein the bead has a melting temperature below 100 C.

72. The kit of claim 42, further comprising library generating primers.

73. The kit of claim 72, wherein the library generating primers comprise a P5 primer and a P7 primer.

74. A composition comprising an oil carrier harboring an aqueous droplet, said aqueous droplet comprising a first nucleus, said first nucleus having first genomic DNA cleaved at first target sites tagged by a first barcode.

75. The composition of claim 74, comprising a first bead population comprising first ligation oligos reverse complementary to the first barcode and first reverse transcription capture primers identified by the first barcode.

76. The composition of claim 75, comprising a second aqueous droplet, comprising a bead comprising second ligation oligos reverse complementary to the second barcode and second reverse transcription capture primers identified by the second barcode.

77. The composition of claim 74, comprising a reverse transcriptase.

78. The composition of claim 74, comprising a ligase.

79. The composition of claim 74, wherein the first target sites comprise histone acetylation sites.

80. The composition of claim 74, wherein the first target sites comprise histone deacetylation sites.

81. The composition of claim 74, wherein the first target sites comprise histone methylation sites.

82. The composition of claim 74, wherein the first target sites comprise histone demethylation sites.

83. The composition of claim 74, wherein the first target sites comprise transcription factor binding sites.

84. The composition of claim 74, wherein the first target sites comprise transcription repressor binding sites.

85. A composition comprising an oil carrier harboring a first aqueous droplet and a second aqueous droplet, said first aqueous droplet comprising a first nucleus library, said first nucleus library comprising commonly barcoded first nucleus genomic fragments and first nucleus reverse transcribed fragments, and said second aqueous droplet comprising a second nucleus library, said second nucleus library comprising commonly barcoded second nucleus genomic fragments and second nucleus reverse transcribed fragments.

86. The composition of claim 85, wherein the first nucleus library arises from a single cell.

87. The composition of claim 85, wherein the first nucleus genomic fragments are generated by binding-moiety guided transposon insertion.

88. The composition of claim 87, wherein the binding moiety comprises an antibody.

89. The composition of claim 87, wherein the antibody binds histone acetylation sites.

90. The composition of claim 87, wherein the antibody binds histone deacetylation sites.

91. The composition of claim 87, wherein the antibody binds histone methylation sites.

92. The composition of claim 87, wherein the antibody binds histone demethylation sites.

93. The composition of claim 87, wherein the antibody binds transcription factor binding sites.

94. The composition of claim 87, wherein the antibody binds transcription repressor binding sites.

95. The composition of claim 87, wherein the transposon comprises a Tn5 transposase.

96. The composition of claim 87, wherein the transposon comprises a Tn3 transposase.

97. The composition of claim 87, wherein the transposon comprises a Tn7 transposase.

98. The composition of claim 87, wherein the transposon comprises a sleeping beauty transposase.

99. The composition of claim 87, wherein the transposon comprises a mu transposase.

100. The composition of claim 87, wherein the transposon is fused to an antibody binding moiety.

101. The composition of claim 100, wherein the antibody binding moiety comprises at least a functional portion of a protein A protein.

102. The composition of claim 100, wherein the antibody binding moiety comprises a protein A.

103. The composition of claim 87, wherein the first nucleus genomic fragments and first nucleus reverse transcribed fragments correspond to a first nucleus chromatin conformation.

104. The composition of claim 103, wherein the second nucleus genomic fragments and second nucleus reverse transcribed fragments correspond to a second nucleus chromatin conformation.

105. The composition of claim 104, wherein the first nucleus chromatin conformation and the second nucleus chromatin conformation are different.

106. The composition of claim 104, wherein the first nucleus chromatin conformation and the second nucleus chromatin conformation are similar.

107. The composition of claim 104, wherein the first nucleus chromatin conformation and the second nucleus chromatin conformation are the same.

108. A nucleic acid library comprising first nucleus target site borders, first transcriptome fragments, second nucleus target site borders and second transcriptome fragments, wherein the first nucleus target site borders and the first transcriptome fragments share common first tags, and wherein the second nucleus target site borders and second transcriptome fragments share common second tags.

109. The nucleic acid library of claim 108, wherein the first nucleic acid target site borders arise from first nucleic acid target sites corresponding to histone acetylation sites.

110. The nucleic acid library of claim 108, wherein the first nucleic acid target site borders arise from first nucleic acid target sites corresponding to histone deacetylation sites.

111. The nucleic acid library of claim 108, wherein the first nucleic acid target site borders arise from first nucleic acid target sites corresponding to histone methylation sites.

112. The nucleic acid library of claim 108, wherein the first nucleic acid target site borders arise from first nucleic acid target sites corresponding to histone demethylation sites.

113. The nucleic acid library of claim 108, wherein the first nucleic acid target site borders arise from first nucleic acid target sites corresponding to transcription factor binding sites.

114. The nucleic acid library of claim 108, wherein the first nucleic acid target site borders arise from first nucleic acid target sites corresponding to transcription repressor binding sites.

115. The nucleic acid library of claim 108, wherein the first nucleic acid transcription fragments arise from chromatin having a configuration indicated by histone positions at histone acetylation sites.

116. The nucleic acid library of claim 108, wherein the first nucleic acid transcription fragments arise from chromatin having a configuration indicated by histone positions at histone deacetylation sites.

117. The nucleic acid library of claim 108, wherein the first nucleic acid transcription fragments arise from chromatin having a configuration indicated by histone positions at histone methylation sites.

118. The nucleic acid library of claim 108, wherein the first nucleic acid transcription fragments arise from chromatin having a configuration indicated by histone positions at histone demethylation sites.

119. The nucleic acid library of claim 108, wherein the first nucleic acid transcription fragments arise from chromatin having a configuration indicated by histone positions at transcription factor binding sites.

120. The nucleic acid library of claim 108, wherein the first nucleic acid transcription fragments arise from chromatin having a configuration indicated by histone positions at transcription repressor binding sites.

121. The nucleic acid library of claim 108, wherein the library constituents are tagged in a common compartment.

122. The nucleic acid library of claim 121, wherein the compartment is an emulsion droplet.

123. The nucleic acid library of claim 108, wherein the library constituents share common P5 and P7 ends.

124. The nucleic acid library of claim 108, wherein the library constituents are concatemerized for long read sequencing.

125. The nucleic acid library of claim 108, wherein the library constituents share common smartbell ends.

126. A method of chromatin mapping and transcriptome correlation, comprising transposon tagging differentially accessible chromatin to generate first tagged genomic fragments in a first nucleus, depositing the first nucleus in an emulsion droplet, generating a first genomic library from the first tagged genomic fragments, generating a first transcriptome from RNA of the first cell harboring the first nucleus, wherein the first genomic library and the first transcriptome share a common tag delivered by genomic library tags and reverse transcription primers on a common bead in the emulsion droplet, wherein the emulsion droplet further comprises a second nucleus and second transcriptome, wherein the transposon tagging and library generation occur in less than 2 days.

127. A method of cell data bulking for a heterogeneous cell population, comprising generating a nuclear genome library for each of a plurality of cells of the population and generating a transcriptome library for each of a plurality of cells of the population, wherein the nuclear genome library for a first cell of the plurality of cells of the population and the transcriptome library for a first cell of the plurality of cells of the population are barcoded such that each library may be correlated to the first cell, and wherein the nuclear genome library for a second cell of the plurality of cells of the population and the transcriptome library for a second cell of the plurality of cells of the population are barcoded such that each library may be correlated to the second cell; identifying a feature common to either the nuclear genome library for the first cell and the nuclear genome library for the second cell, or a feature common to the transcriptome library for the first cell and the transcriptome library for the second cell; and bulking data from the first cell and the second cell from the library from which the common feature was not identified.

128. The method of claim 127, wherein the common feature is identified from the nuclear genome library for the first cell and the nuclear genome library for the second cell.

129. The method of claim 127, wherein the common feature is identified from the transcriptome library for the first cell and the transcriptome library for the second cell.

130. The method of claim 127, wherein the first cell and the second cell are not assayed for a common cell surface protein prior to library generating.

131. The method of claim 127, wherein the first cell and the second cell are not physically co-segregated to the exclusion of a third cell of the population.

132. The method of claim 127, wherein the bulking data from the first cell and the second cell allows detection of a feature that is present in the data from the first cell and the second cell.

133. The method of claim 132, wherein the feature is present in the data from the first cell at a level that is below a statistical threshold for detection.

134. The method of claim 133, wherein the feature is present in the data from the second cell at a level that is below a statistical threshold for detection.

135. The method of claim 134, wherein the feature is present in the bulking data at a level that is above a statistical threshold for detection.

136. The method of claim 127, wherein the first cell and the second cell comprise no more than 0.001% of the population.

137. The method of claim 127, wherein the first cell and the second cell comprise no more than 1% of the population.

138. The method of claim 127, wherein generating the nuclear genome library for each of the plurality of cells comprises contacting the cells to a target-directed endonuclease that tags target-proximal cleavage sites.

139. The method of claim 138, wherein the target-directed endonuclease is a transposase.

140. The method of claim 139, wherein the transposase is directed by a conjugated antibody.

141. The method of claim 140, wherein the antibody directs the transposase to a chromatin feature.

142. The method of claim 141, wherein the feature is a histone modification.

143. The method of claim 141, wherein the feature is selected from the list consisting of a chromatin modification, a histone modification, a histone acetylation site, a histone deacetylation site, a histone methylation site, a histone demethylation site, unmodified histones, a transcription factor, an RNA polymerase complex, an RNA polymerase protein, a DNA replication complex, a DNA replication protein, a DNA repair complex, a DNA repair complex protein, a transcription repressor, a chromatin modification complex, a polycomb complex, a polycomb complex constituent, an EZH2 protein, a histone acetylation enzyme, a histone acetylation complex, a DNA polymerase complex, a DNA polymerase complex protein, a telomere synthase complex, a telomere protein, a centromere binding complex, a centromere binding protein, and a chromatin associated viral particle.

144. The method of claim 144, comprising delivering the first cell to a first emulsion droplet having a first bead tethered to commonly first barcoded oligos.

145. The method of claim 144, comprising delivering the second cell to a second emulsion droplet having a second bead tethered to commonly second barcoded oligos.
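
By way of non-limiting illustration only, the data bulking recited in claims 127 and 129 and the threshold behavior of claims 133 to 135 can be sketched in Python as follows. The sketch assumes that per-cell read counts have already been assigned to cell barcodes. The barcodes, gene and peak names, counts, thresholds, and the choice of "most highly expressed gene" as the common feature are all hypothetical and are not taken from the application.

```python
from collections import defaultdict

# Hypothetical per-cell counts keyed by a shared cell barcode (illustrative values only).
rna_counts = {
    "BC01": {"GeneA": 9, "GeneB": 0},
    "BC02": {"GeneA": 7, "GeneB": 1},
    "BC03": {"GeneA": 0, "GeneB": 8},
}
dna_counts = {
    "BC01": {"peak1": 2, "peak2": 0},
    "BC02": {"peak1": 2, "peak2": 1},
    "BC03": {"peak1": 0, "peak2": 3},
}


def group_by_marker(rna, min_reads):
    """Group barcodes by their most highly expressed gene.

    The top gene stands in for the 'common feature' identified from the
    transcriptome library (an assumption made for this sketch).
    """
    groups = defaultdict(list)
    for barcode, genes in sorted(rna.items()):
        top_gene, reads = max(genes.items(), key=lambda kv: kv[1])
        if reads >= min_reads:
            groups[top_gene].append(barcode)
    return dict(groups)


def bulk(dna, barcodes):
    """Sum chromatin counts across the given barcodes (the library that was
    not used to identify the common feature)."""
    totals = defaultdict(int)
    for barcode in barcodes:
        for peak, n in dna[barcode].items():
            totals[peak] += n
    return dict(totals)


def detected(counts, threshold):
    """Return the peaks whose count meets a hypothetical detection threshold."""
    return {peak for peak, n in counts.items() if n >= threshold}


groups = group_by_marker(rna_counts, min_reads=5)
bulked = bulk(dna_counts, groups["GeneA"])
```

With a hypothetical detection threshold of 3 reads, `peak1` falls below the threshold in each of `BC01` and `BC02` individually but meets it in the bulked data, which is the situation described in claims 133 to 135.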