WO2025147587A1 - Protein sequencing using super-resolution imaging - Google Patents
- Publication number
- WO2025147587A1 (PCT/US2025/010206)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- amino acid
- molecule
- modified amino
- polymerizable
- modified
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/68—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
- G01N33/6803—General methods of protein analysis not limited to specific proteins or families of proteins
- G01N33/6818—Sequencing of polypeptides
- G01N33/6824—Sequencing of polypeptides involving N-terminal degradation, e.g. Edman degradation
Definitions
- the stacked plurality of modified amino acids comprises a first amino acid and a second amino acid, wherein the first amino acid and the second amino acid are spaced along the plurality of stacked polymerizable molecules at a distance greater than 50 nanometers.
- the polymerizable molecule comprises a nucleic acid molecule that is configured to hybridize to a nucleic acid anchor molecule of the plurality of nucleic acid anchor molecules.
- the loading comprises loading a plurality of modified amino acids.
- the binding agent comprises an antibody, an antibody fragment, a nanobody, an aptamer, a peptide, a polymer, an inorganic compound, or a small molecule.
- the method further comprises, prior to (a), generating the modified amino acid.
- the generating comprises (I) providing a linker and the polymerizable molecule, (II) coupling the linker to (i) an amino acid of a peptide and (ii) the polymerizable molecule to generate an amino acid-linker complex, and (III) cleaving the amino acid from the peptide, thereby generating the modified amino acid.
- the linker is unifunctional, bifunctional, trifunctional, quadrifunctional, or polyfunctional.
- the method further comprises (IV) coupling the amino acid-linker complex or the modified amino acid to a capture moiety.
- the capture moiety is coupled to a substrate. In some embodiments, the capture moiety is coupled to the peptide. In some embodiments, the capture moiety is coupled to a C-terminus of the peptide. In some embodiments, (IV) is performed prior to (III). In some embodiments, the method further comprises derivatizing the modified amino acid. In some embodiments, the method further comprises repeating (I)-(III) on the peptide. In some embodiments, the method further comprises repeating (IV) on the peptide. In some embodiments, the repeating yields a stacked plurality of modified amino acids comprising a plurality of stacked polymerizable molecules.
- the fluidic device comprises a surface, wherein the surface comprises a plurality of nucleic acid anchor molecules coupled thereto, wherein the plurality of nucleic acid anchor molecules is configured to couple to the plurality of modified amino acids.
- the polymerizable molecule comprises a nucleic acid molecule that is configured to couple to a nucleic acid anchor molecule of the plurality of nucleic acid anchor molecules via hybridization.
- a method for processing a DNA molecule comprising: (a) attaching a first sequence of the DNA molecule to a substrate; (b) linearizing the DNA molecule adjacent to the substrate; and (c) attaching a second sequence of the DNA molecule to the substrate.
- the linearizing is performed using shear stress or an electric field.
- the DNA molecule comprises at least 150 nucleotides.
- the DNA molecule has a length of at least 50 nanometers.
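The two lengths above are roughly consistent with each other. As a sketch, assuming a fully extended B-form duplex with a rise of about 0.34 nm per base pair (a textbook value, not stated in the application; single-stranded DNA behaves differently) and reading 150 nucleotides as 150 base pairs, the contour length comes out just above 50 nm. The helper names are hypothetical.

```python
import math

# Assumed rise per base pair for fully extended B-form double-stranded DNA.
B_DNA_RISE_NM = 0.34


def contour_length_nm(n_bp: int, rise_nm: float = B_DNA_RISE_NM) -> float:
    """Approximate contour length of a linear duplex of n_bp base pairs."""
    if n_bp < 0:
        raise ValueError("n_bp must be non-negative")
    return n_bp * rise_nm


def min_bp_for_length(length_nm: float, rise_nm: float = B_DNA_RISE_NM) -> int:
    """Smallest whole number of base pairs whose contour length reaches length_nm."""
    return math.ceil(length_nm / rise_nm)


print(round(contour_length_nm(150), 1))  # 51.0
print(min_bp_for_length(50))             # 148
```

Real molecules are rarely fully extended, which is one reason the text pairs the length with a linearization step (shear stress or an electric field).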
- the detectable label comprises a fluorophore or quantum dot.
- the detectable label comprises a fluorescent label, a FRET label, a chemiluminescence label, an electrochemiluminescence label, a bioluminescence label, a phosphorescence label, or a label that generates light through other types of reactions or stimulations.
- the method further comprises a substrate, wherein the substrate is bound to the modified amino acid.
- the polymerizable molecule is linearized adjacent to the substrate.
- a method for characterizing a modified amino acid comprising: (a) providing the modified amino acid, wherein the modified amino acid comprises a polymerizable molecule; and (b) using a Raman spectroscopy-associated high-resolution imaging technique to determine the identity of the modified amino acid.
- the imaging step (b) is performed using label-free detection of the modified amino acid.
- the modified amino acid generates intrinsic vibrational modes in the Raman spectrum.
- the intrinsic vibrational modes are enhanced or modulated by specialized methods.
- the imaging step (b) is performed using a binding agent associated with the modified amino acid.
- the binding agent comprises an antibody, an antibody fragment, a nanobody, an aptamer, a peptide, a polymer, an inorganic compound, or a small molecule.
- the binding agent comprises a detectable label for Raman spectroscopy.
- a stacked plurality of modified amino acids comprises the modified amino acid and an additional modified amino acid, wherein the modified amino acid and the additional modified amino acid are spaced along the plurality of stacked polymerizable molecules at a distance greater than 50 nanometers.
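Why a spacing of more than 50 nm matters can be sketched with the Abbe criterion d = λ/(2·NA). The values below (650 nm emission, NA 1.4, and an effective super-resolution of 20 nm) are illustrative assumptions, not figures from the application, and `resolvable` is a deliberately simplified, hypothetical test that ignores labeling density and localization statistics.

```python
def abbe_limit_nm(wavelength_nm: float, numerical_aperture: float) -> float:
    """Abbe lateral resolution limit, d = wavelength / (2 * NA)."""
    return wavelength_nm / (2.0 * numerical_aperture)


def resolvable(spacing_nm: float, resolution_nm: float) -> bool:
    """Simplified rule: two point emitters separate if spacing exceeds resolution."""
    return spacing_nm > resolution_nm


SPACING_NM = 50.0                          # minimum spacing named in the text
diffraction = abbe_limit_nm(650.0, 1.4)    # assumed red emission, oil objective
SUPER_RES_NM = 20.0                        # assumed effective super-resolution

print(round(diffraction, 1))                 # 232.1
print(resolvable(SPACING_NM, diffraction))   # False
print(resolvable(SPACING_NM, SUPER_RES_NM))  # True
```

Under these assumptions, stacked amino acids 50 nm apart would blur together in conventional widefield imaging but could be separated by super-resolution imaging.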
- FIG. 4 shows exemplary data for a substrate comprising one or more capture moieties.
- FIG. 5 shows exemplary data for detection of a modified amino acid, as described herein.
- FIG. 6 shows additional exemplary data for detection of a modified amino acid, as described herein.
- “about” can mean a range of up to ±20%, preferably up to ±10%, more preferably up to ±5%, and more preferably still up to ±1% of a given value.
- the term can mean within an order of magnitude, preferably within 2-fold, of a value.
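The two tolerance definitions of "about" can be expressed as simple numeric checks. This is a minimal sketch: `is_about` and `within_fold` are hypothetical names, and the defaults mirror the broadest stated ranges (±20% and one order of magnitude), with the narrower preferred ranges passed explicitly.

```python
def is_about(value: float, reference: float, tolerance: float = 0.20) -> bool:
    """True if value lies within +/- tolerance (a fraction) of reference."""
    return abs(value - reference) <= tolerance * abs(reference)


def within_fold(value: float, reference: float, fold: float = 10.0) -> bool:
    """True if value is within `fold`-fold of reference (both must be positive).

    fold=10.0 corresponds to "within an order of magnitude"; fold=2.0 to 2-fold.
    """
    if value <= 0 or reference <= 0:
        raise ValueError("fold comparison needs positive values")
    ratio = value / reference
    return 1.0 / fold <= ratio <= fold


print(is_about(55, 50))                   # True  (within 20%)
print(is_about(55, 50, tolerance=0.05))   # False (outside 5%)
print(within_fold(120, 50))               # True  (within 10-fold)
print(within_fold(120, 50, fold=2.0))     # False (2.4-fold apart)
```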
- protein generally refers to a molecule comprising two or more amino acids joined by a peptide bond.
- a protein may also be referred to as a “polypeptide”, “oligopeptide”, or “peptide”.
- a protein can be a naturally occurring molecule, or a synthetic molecule (e.g., an artificial protein, peptide, enzyme).
- a protein may include one or more nonnatural amino acids, modified amino acids, or non-amino acid linkers.
- a protein may contain D-amino acid enantiomers, L-amino acid enantiomers or both. Amino acids of a protein may be modified naturally or synthetically, such as by post-translational modifications or by chemical modification.
- proteins may be distinguished from each other based on different genes from which they are expressed in an organism, different primary sequence length or different primary sequence composition. Proteins expressed from the same gene may nonetheless be different proteoforms, for example, being distinguished based on nonidentical length, non-identical amino acid sequence or non-identical post-translational modifications. Different proteins can be distinguished based on one or both of gene of origin and proteoform state.
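The distinction between gene of origin and proteoform state can be modeled as a small data structure. This is an illustrative sketch only: the `Proteoform` class, its fields and the gene label are hypothetical, and PTMs are represented as (1-based position, modification) pairs by assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Proteoform:
    gene: str                      # gene of origin
    sequence: str                  # primary sequence, one-letter codes
    ptms: frozenset = field(default_factory=frozenset)  # {(position, modification)}

    def same_protein(self, other: "Proteoform") -> bool:
        """Same gene of origin, regardless of proteoform state."""
        return self.gene == other.gene

    def same_proteoform(self, other: "Proteoform") -> bool:
        """Same gene, identical sequence (hence identical length) and identical PTMs."""
        return self == other


unmodified = Proteoform("GENE1", "MKSPT")
phospho = Proteoform("GENE1", "MKSPT", frozenset({(3, "phospho")}))
print(unmodified.same_protein(phospho))     # True
print(unmodified.same_proteoform(phospho))  # False
```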
- single analyte may refer to an analyte that is individually manipulated or distinguished from other analytes.
- a single analyte may comprise a biomolecule or a synthetic molecule.
- a single analyte may comprise a small molecule.
- a single analyte can be a single molecule (e.g., a single biomolecule such as a single protein, nucleic acid molecule, affinity reagent, lipid, carbohydrate, etc.), a single complex of two or more molecules (e.g., a multimeric protein having two or more separable subunits, a single protein attached to a nucleic acid molecule or a single protein attached to an affinity reagent), a single particle, or the like.
- Reference herein to a “single analyte” in the context of a composition, system or method herein does not necessarily exclude application of the composition, system or method to multiple single analytes that are manipulated or distinguished individually, unless indicated contextually or explicitly to the contrary.
- affixed refers to a connection between a polypeptide and a substrate such that at least a portion of the polypeptide and the substrate are held in physical proximity.
- the term “affixed” encompasses both indirect and direct connections, which may be reversible or irreversible; for example, the connection is optionally a covalent bond or a non-covalent bond.
- sample refers to a collected substance or material that comprises or is suspected to comprise one or more analytes of interest (e.g., biomolecules, e.g., polypeptides).
- a sample may be modified for purposes such as storage or stability.
- a sample may be naturally occurring or synthetic.
- a sample may be obtained from an organism or part of an organism, such as from a fluid, tissue, or cell.
- a sample may include biological and/or non-biological components.
- biological sample or “biological source” refer to a sample that is derived from a predominantly biological system or organism, such as one or more viral particles, cells (e.g. individualized cells), organelles (e.g. individualized organelles), tissues, bodily fluids, bone, cartilage, and exoskeleton.
- a biological sample may comprise a prokaryotic cell (e.g., bacteria) or eukaryotic cell (e.g., fungus, protist, algae, plant, animal).
- a biological sample may be processed to purify and retain one or more biomolecules (e.g., proteins, nucleic acids, carbohydrates, lipids, glycoproteins, lipoproteins, metabolites, etc.) from the biological sample.
- a biological sample e.g., a protein sample
- a biological sample may be derived from cultured cells, which may be treated or untreated.
- tissue specimens such as biopsy samples, which may optionally be processed to liberate biomolecules (e.g., proteins) contained therein.
- Tissue samples may also be derived from in vivo specimens, including fresh, frozen, acute, and fixed tissues.
- hydrogel refers to a three-dimensional polymeric structure that is substantially insoluble in water, but which is capable of absorbing and retaining large quantities of water to form a substantially stable, often soft and pliable, structure.
- water can penetrate in between polymer chains of a polymer network, subsequently causing swelling and the formation of a hydrogel.
- hydrogels are superabsorbent (e.g., containing more than about 90% water) and can be composed of natural or synthetic polymers.
- hydrogels include, but are not limited to, hyaluronans, chitosans, agar, heparin, sulfate, cellulose, alginates (including alginate sulfate), collagen, dextrans (including dextran sulfate), pectin, carrageenan, polylysine, gelatins (including gelatin type A), agarose, (meth)acrylate-oligolactide-PEO-oligolactide-(meth)acrylate, PEO-PPO-PEO copolymers (Pluronics), poly(phosphazene), poly(methacrylates), poly(N-vinylpyrrolidone), PL(G)A-PEO-PL(G)A copolymers, poly(ethylene imine), polyethylene glycol (PEG)-thiol, PEG-acrylate, acrylamide, N,N'-bis(acryloyl)cystamine, PEG, polypropylene oxide (P
- hydrogel subunits or “hydrogel precursors” mean hydrophilic monomers, prepolymers, or polymers that can be crosslinked, or “polymerized”, to form a three-dimensional (3D) hydrogel network. It is believed that this fixation of the biological specimen in the presence of hydrogel subunits crosslinks the components of the specimen to the hydrogel subunits, thereby securing molecular components in place, preserving the tissue architecture and cell morphology.
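The "more than about 90% water" criterion for a superabsorbent hydrogel can be checked from swollen and dry masses. A minimal sketch, assuming a mass-based (rather than volume-based) water content; the function names and the 0.90 threshold default are illustrative.

```python
def water_content(swollen_mass_g: float, dry_mass_g: float) -> float:
    """Fraction of a swollen hydrogel's mass that is water."""
    if not 0 < dry_mass_g <= swollen_mass_g:
        raise ValueError("expected 0 < dry mass <= swollen mass")
    return (swollen_mass_g - dry_mass_g) / swollen_mass_g


def swelling_ratio(swollen_mass_g: float, dry_mass_g: float) -> float:
    """Mass of the swollen gel relative to its dry polymer network."""
    return swollen_mass_g / dry_mass_g


def is_superabsorbent(swollen_mass_g: float, dry_mass_g: float,
                      threshold: float = 0.90) -> bool:
    """True if water makes up more than `threshold` of the swollen mass."""
    return water_content(swollen_mass_g, dry_mass_g) > threshold


print(water_content(10.0, 0.5))      # 0.95
print(swelling_ratio(10.0, 0.5))     # 20.0
print(is_superabsorbent(10.0, 0.5))  # True
```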
- antibody and “immunoglobulin” may generally refer to proteins that can recognize and bind to a specific antigen.
- An antibody or immunoglobulin may refer to an antibody isotype, fragments of antibodies including, but not limited to, Fab, Fv, scFv, and Fd fragments, chimeric antibodies, humanized antibodies, single-chain antibodies, and fusion proteins including an antigen-binding portion of an antibody and a non-antibody protein.
- the antibodies may be detectably labeled, e.g., with a fluorophore, radioisotope, enzyme (e.g., a peroxidase) which generates a detectable product, fluorescent protein, nucleic acid barcode sequence, and the like.
- the antibodies may be further conjugated to other moieties, such as members of specific binding pairs, e.g., biotin (member of biotin-avidin specific binding pair), and the like.
- Also encompassed by the terms are nanobodies, Fab', Fv, F(ab')2, scFv, and other antibody fragments that retain specific binding to antigen.
- Antibodies may exist in a variety of other forms including, for example, Fv, Fab, and (Fab)2, diabodies, monobodies, single domain antibodies (sdAb), as well as bi-functional (i.e., bi-specific, e.g., bi-specific T-cell engager) hybrid antibodies (e.g., Lanzavecchia et al., Eur. J. Immunol. 17, 105 (1987)) and in single chains (e.g., Huston et al., Proc. Natl. Acad. Sci. U.S.A., 85, 5879-5883 (1988) and Bird et al., Science, 242, 423-426 (1988), which are incorporated herein by reference).
- Binding or “coupling” as used herein generally refers to a covalent or non-covalent interaction between two molecules (referred to herein as “binding partners”, e.g., a substrate and an enzyme or an antibody and an epitope). Binding between binding partners may be specific or non-specific.
- a specific binding interaction may entail a binding partner that binds to a cognate molecule.
- the specific binding interaction may entail the binding of the binding partner to its cognate molecule at a significantly or substantially higher level or with greater affinity as compared to the binding of the binding partner to a non-cognate molecule.
- a specific binding interaction may entail a first binding partner that has greater selectivity of binding to the cognate molecule as compared to a non-cognate molecule.
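Specific versus non-specific binding can be made concrete with equilibrium occupancy. This sketch assumes simple 1:1 binding with the binder in excess, θ = C/(C + Kd); the dissociation constants and concentration used are hypothetical.

```python
def fractional_occupancy(conc_nM: float, kd_nM: float) -> float:
    """Equilibrium fraction of target bound, theta = C / (C + Kd).

    Assumes 1:1 binding and binder in excess (no ligand depletion).
    """
    return conc_nM / (conc_nM + kd_nM)


def selectivity(kd_cognate_nM: float, kd_noncognate_nM: float) -> float:
    """How many times tighter the cognate interaction is than the non-cognate one."""
    return kd_noncognate_nM / kd_cognate_nM


# Hypothetical binder: Kd 1 nM for its cognate molecule, 1000 nM for a non-cognate one.
print(round(fractional_occupancy(10, 1), 3))     # 0.909
print(round(fractional_occupancy(10, 1000), 4))  # 0.0099
print(selectivity(1, 1000))                      # 1000.0
```

At 10 nM binder, the cognate molecule is mostly occupied while the non-cognate molecule is bound under 1% of the time, which is the kind of "substantially higher level" of binding the definition describes.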
- nucleic acid refers to a polymeric form of naturally occurring or synthetic nucleotides, or analogs thereof, of any length.
- a nucleic acid molecule may comprise one or more deoxyribonucleotides, deoxynucleotide triphosphates, dideoxynucleotide triphosphates, deoxynucleotide hexaphosphates, dideoxynucleotide hexaphosphates, ribonucleotides, hexitol nucleotides, cyclohexane nucleotides, or analogs or combinations thereof.
- a nucleic acid molecule may comprise, e.g., DNA, RNA, HNA, CeNA, and modified forms thereof.
- a nucleic acid molecule may comprise nucleotides that are linked by phosphodiester bonds.
- A nucleic acid molecule may have any two- or three-dimensional structure, and may perform any function, known or unknown.
- a nucleic acid molecule may be single stranded, double stranded, or partially double stranded.
- Non-limiting examples of polynucleotides include a gene, a gene fragment, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, noncoding RNA, small interfering RNA, short hairpin RNA, micro RNA, scaRNA, ribozymes, riboswitches, viral RNA, complementary DNA (cDNA), cosmid DNA, mitochondrial DNA, chromosomal or genomic DNA, viral DNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, control regions, isolated RNA of any sequence, nucleic acid probes, nucleic acid adapters, and primers.
- the nucleic acid molecule may be linear, circular, or any other geometry.
- polynucleotide analogs include but are not limited to xeno nucleic acid (XNA), bridged nucleic acid (BNA), glycol nucleic acid (GNA), hexitol nucleic acid (HNA), cyclohexane nucleic acid (CeNA), peptide nucleic acids (PNAs), γPNAs, morpholino polynucleotides, locked nucleic acids (LNAs), threose nucleic acid (TNA), 2'-O-methyl polynucleotides, 2'-O-alkyl ribosyl substituted polynucleotides, phosphorothioate polynucleotides, and boronophosphate polynucleotides.
- a polynucleotide analog may possess purine or pyrimidine analogs, including for example, 7-deaza purine analogs, 8-halopurine analogs, 5-halopyrimidine analogs, inverted base, or universal base analogs that can pair with any base, including hypoxanthine, nitroazoles, isocarbostyril analogues, azole carboxamides, and aromatic triazole analogues, or base analogs with additional functionality, such as a biotin moiety for affinity binding.
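Several embodiments couple a polymerizable nucleic acid molecule to a nucleic acid anchor or capture moiety by hybridization. A minimal sketch of the underlying base-pairing rule, assuming a perfect Watson-Crick match of equal-length DNA strands; it deliberately ignores mismatches, overhangs, melting temperature and salt conditions, and the function names are hypothetical.

```python
# Translation table for DNA base complements (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (5'->3' in, 5'->3' out)."""
    return seq.translate(COMPLEMENT)[::-1]


def can_hybridize(probe: str, anchor: str) -> bool:
    """Perfect-match check: the probe equals the reverse complement of the anchor."""
    return probe.upper() == reverse_complement(anchor.upper())


print(reverse_complement("GATTACA"))             # TGTAATC
print(can_hybridize("TGTAATC", "GATTACA"))       # True
print(can_hybridize("GATTACA", "GATTACA"))       # False
```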
- amino acid generally refers to an organic compound that combines to form a protein or peptide.
- An amino acid generally comprises an amine group, a carboxylic acid group, and a side chain specific to each amino acid, and serves as a monomeric subunit of a peptide.
- An amino acid may include the 20 standard, naturally occurring or canonical amino acids as well as non-standard or non-canonical amino acids.
- the standard, naturally-occurring or canonical amino acids include Alanine (A or Ala), Cysteine (C or Cys), Aspartic Acid (D or Asp), Glutamic Acid (E or Glu), Phenylalanine (F or Phe), Glycine (G or Gly), Histidine (H or His), Isoleucine (I or Ile), Lysine (K or Lys), Leucine (L or Leu), Methionine (M or Met), Asparagine (N or Asn), Proline (P or Pro), Glutamine (Q or Gln), Arginine (R or Arg), Serine (S or Ser), Threonine (T or Thr), Valine (V or Val), Tryptophan (W or Trp), and Tyrosine (Y or Tyr).
- amino acid may be an L-amino acid or a D-amino acid.
- Non-standard amino acids may be modified amino acids, amino acid analogs, amino acid mimetics, non-standard proteinogenic amino acids, or non-proteinogenic amino acids that occur naturally or are chemically synthesized.
- non-standard amino acids include, but are not limited to, selenocysteine, pyrrolysine, N-formylmethionine, β-amino acids, homo-amino acids, proline and pyruvic acid derivatives, 3-substituted alanine derivatives, glycine derivatives, ring-substituted phenylalanine and tyrosine derivatives, linear core amino acids, and N-methyl amino acids.
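The one-letter and three-letter codes listed above for the 20 canonical amino acids map directly to a lookup table. A small sketch (the helper name is hypothetical) that converts a one-letter sequence and rejects codes outside the canonical set, such as non-standard residues, which would need their own notation.

```python
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(seq: str) -> str:
    """Convert a one-letter canonical sequence to hyphen-joined three-letter codes."""
    try:
        return "-".join(THREE_LETTER[aa] for aa in seq.upper())
    except KeyError as err:
        raise ValueError(f"non-canonical residue code: {err.args[0]}") from None


print(to_three_letter("MKS"))  # Met-Lys-Ser
```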
- modified amino acid may refer to the amino acid-linker complex, the amino acid-linker-polymerizable molecule complex, or derivatives thereof.
- the modified amino acid may be used to refer to the amino acid-linker complex or the amino acid-linker-polymerizable molecule complex before or after cleavage.
- the modified amino acid may refer to a portion of the amino acid-linker complex or the amino acid-linker-polymerizable molecule complex (e.g., just the amino acid portion, just the amino acid-linker complex portion, etc.).
- amino acid type generally refers to one of the standard, naturally-occurring or canonical amino acids, e.g., one member of the group consisting of Alanine (A or Ala), Cysteine (C or Cys), Aspartic Acid (D or Asp), Glutamic Acid (E or Glu), Phenylalanine (F or Phe), Glycine (G or Gly), Histidine (H or His), Isoleucine (I or Ile), Lysine (K or Lys), Leucine (L or Leu), Methionine (M or Met), Asparagine (N or Asn), Proline (P or Pro), Glutamine (Q or Gln), Arginine (R or Arg), Serine (S or Ser), Threonine (T or Thr), Valine (V or Val), Tryptophan (W or Trp), Tyrosine (Y or Tyr), derivatives thereof, and modified forms of any of the aforementioned amino acids.
- amino acid type may be used herein to distinguish a plurality of amino acids that comprise different side chain groups, rather than a plurality of amino acids that are identical (e.g., different positional amino acids of a single peptide that have the same side chain).
- An amino acid type may comprise a modified version of one of the standard, naturally-occurring or canonical amino acids, e.g., a post-translational modification, an epigenetic modification, or a chemical or enzymatic modification.
- an amino acid type can include non-canonical amino acids.
- a post-translational modification includes modifications of the amino terminus and/or the carboxyl terminus of a peptide.
- Modifications of the terminal amino group include, but are not limited to, des-amino, N-lower alkyl, N-di-lower alkyl, and N-acyl modifications.
- Modifications of the terminal carboxy group include, but are not limited to, amide, lower alkyl amide, dialkyl amide, and lower alkyl ester modifications (e.g., wherein lower alkyl is C1-C4 alkyl).
- a post-translational modification also includes modifications, such as but not limited to those described above, of amino acids falling between the amino and carboxy termini.
- the term post-translational modification can also include peptide modifications that include one or more detectable labels.
- a post-translational modification may be naturally occurring or synthetic.
- a linker may comprise a first reactive group that is able to couple to a monomer of the polymeric analyte (e.g., an amino acid of a peptide) and optionally, cleave the amino acid from a peptide.
- the first reactive group may be an amino-acid reactive group, e.g., an isothiocyanate (ITC) such as phenyl isothiocyanate (PITC), 3-pyridyl isothiocyanate (PYITC), 2-piperidinoethyl isothiocyanate (PEITC), 3-(4-morpholino)propyl isothiocyanate (MPITC), 3-(diethylamino)propyl isothiocyanate (DEPTIC) or naphthylisothiocyanate (NITC), fluorescein isothiocyanate (FITC), ammonium thiocyanate, potassium thiocyanate, trimethylsilyl isothiocyanate (TMS-ITC), phenyl phosphoroisothiocyanatidate, acetyl isothiocyanate (AITC), or an aldehyde group, e.g., orthophthalaldehyde (OPA), 2, 3
- the linker may additionally comprise a second reactive group that is capable of coupling, either directly or indirectly, to the capture moiety.
- the capture moiety may comprise a click chemistry moiety (e.g., alkyne)
- the second reactive group of the linker may comprise an additional click chemistry moiety (e.g., azide) that can react with the click chemistry moiety of the capture moiety.
- the linker may be coupled indirectly to the capture moiety, e.g., via noncovalent interaction or via an intermediate linking molecule.
- the intermediate linking molecule may comprise a polymerizable molecule (e.g., a polymer or nucleic acid molecule) that can couple the linker to the capture moiety.
- the click chemistry moieties of the linker and capture moiety or intermediate linking molecule may comprise any suitable bioorthogonal moieties, as described elsewhere herein, e.g., alkenes, alkynes, azides, epoxides, amines, thiols, nitrones, isonitriles, isocyanides, aziridines, activated esters, and tetrazines, and combinations, variations, or derivatives thereof.
- the linker may be subjected to conditions sufficient to react the first click chemistry moiety to the second click chemistry moiety, e.g., provision of metal catalysts, appropriate solvents, pH, temperature, ionic concentration, or light/energy for any useful duration of time.
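Whether the linker's click moiety and the capture moiety's click moiety will react, and whether a metal catalyst is needed, depends on the pairing. The table below is an illustrative, non-exhaustive subset of well-known bioorthogonal pairs (copper(I)-catalyzed azide-alkyne cycloaddition, strain-promoted azide-alkyne cycloaddition, and tetrazine/trans-cyclooctene inverse-electron-demand Diels-Alder); the moiety labels and the `match` helper are hypothetical, not the application's own scheme.

```python
from typing import NamedTuple, Optional


class ClickPair(NamedTuple):
    reaction: str
    catalyst: Optional[str]  # None means catalyst-free under typical conditions


# Illustrative subset of well-known bioorthogonal pairings (not exhaustive).
CLICK_PAIRS = {
    frozenset({"azide", "terminal alkyne"}): ClickPair("CuAAC", "Cu(I)"),
    frozenset({"azide", "strained cyclooctyne"}): ClickPair("SPAAC", None),
    frozenset({"tetrazine", "trans-cyclooctene"}): ClickPair("IEDDA", None),
}


def match(moiety_a: str, moiety_b: str) -> Optional[ClickPair]:
    """Return the reaction for an (unordered) pair of moieties, or None if unpaired."""
    return CLICK_PAIRS.get(frozenset({moiety_a, moiety_b}))


print(match("terminal alkyne", "azide"))       # ClickPair(reaction='CuAAC', catalyst='Cu(I)')
print(match("tetrazine", "trans-cyclooctene"))  # ClickPair(reaction='IEDDA', catalyst=None)
print(match("azide", "azide"))                  # None
```

Using an unordered key means it does not matter whether the alkyne sits on the capture moiety and the azide on the linker, as in the example above, or the other way round.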
- the linker may comprise any additional useful moieties.
- the linker may comprise a releasable or cleavable moiety, which may facilitate removal of the monomer from the polymeric analyte, or portion thereof, or from the substrate.
- a releasable or cleavable moiety may comprise, for example, a disulfide bond, which may be releasable by contacting with a reducing agent (e.g., DTT, TCEP).
- the linker may couple to the third polymerizable molecule via the releasable or cleavable moiety, alternatively or in addition to the coupling via click chemistry moieties.
- the linker may additionally or alternatively comprise any number of spacing moieties, e.g., polymers (e.g., PEG, PVA, polyacrylamide), aminohexanoic acid, nucleic acids, alkyl chains, etc.
- spacing moieties may increase the distance between any other moieties of the linker, e.g., the amino acid-reactive group and the polymerizable molecule-reactive group.
- the linker may comprise or be coupled to a detectable moiety, e.g., a fluorophore, radioisotope, mass tag, nucleic acid molecule (which can also act as a releasable or cleavable moiety), or other detectable moiety.
- the linker comprises a fluorophore, which can enable localization and visualization of the linker using single-molecule imaging.
- the monomer may be labeled with a first fluorophore and the linker may comprise a second fluorophore to enable localization and visualization of the linker and the monomer (e.g., using two-channel imaging or FRET).
- the polymerizable molecule comprises a linker.
- the linker may be used, for instance, for coupling a reactive moiety to the polymerizable molecule, which reactive moiety can react with that of another linker.
- a polymerizable molecule, e.g., a nucleic acid molecule, may comprise a linker that comprises a click chemistry moiety.
- the linker comprising the click chemistry moiety may be coupled to the polymerizable molecule using any useful approach, e.g., by incorporation of a linker-conjugated nucleotide or nucleoside, and may be located at any useful position (e.g., at a 5' end, at a 3' end, or in the center of the polymerizable molecule).
- a click-functionalized nucleotide or nucleoside e.g., ethynyl deoxyuridine, octadiynyl deoxyuridine, can be incorporated into the backbone of a DNA or RNA molecule.
- the click chemistry moiety of the polymerizable molecule may then couple to another linker that comprises a complementary click chemistry moiety and also an amino acid reactive group (e.g., isothiocyanate, dansyl chloride, DNFB, etc.).
- the linker may comprise any number of spacing moieties, e.g., alkyl chains, polymer spacers (e.g., PEG), nucleic acid or oligo spacers, or other useful spacing moieties which may be useful in modulating the size or molecular weight of the linker.
- the linker may comprise at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least
- the linker may comprise at most about 100, at most about
- the modified amino acid or derivative thereof may be generated using an intramolecular expansion process, e.g., using one or more linkers and polymerizable molecules.
- in intramolecular expansion, individual amino acids, clusters of amino acids, or small peptides (e.g., a dipeptide, tripeptide, or quadripeptide) of a protein (e.g., a peptide, polypeptide, or protein analyte) may be sequentially removed and re-tethered together, such that the distance between the individual amino acids or clusters of amino acids is increased.
- performing an intramolecular expansion process of one or more amino acids of a peptide may enable advanced imaging techniques, e.g., super-resolution imaging, of the expanded amino acids, which would otherwise be impossible while the amino acids are attached to the peptide.
- the distance between amino acids may be sufficient to allow for single-amino-acid resolution using fluorescence super-resolution imaging that would not be possible for single amino acids within a peptide (due to the resolution limit, steric crowding of binding agents, etc.).
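The scale of the expansion can be estimated with simple arithmetic. This sketch assumes adjacent residues in an extended polypeptide chain sit about 0.35 nm apart (an approximate textbook figure; helical segments are more compact) and takes 50 nm as the post-expansion spacing, matching the distance mentioned in the embodiments; the helper names are hypothetical.

```python
RESIDUE_RISE_NM = 0.35  # assumed spacing of adjacent residues in an extended chain


def expansion_factor(target_spacing_nm: float,
                     native_spacing_nm: float = RESIDUE_RISE_NM) -> float:
    """How many times further apart adjacent residues sit after expansion."""
    return target_spacing_nm / native_spacing_nm


def span_nm(n_residues: int, spacing_nm: float) -> float:
    """End-to-end distance of n residues placed at a uniform spacing."""
    return max(n_residues - 1, 0) * spacing_nm


print(round(expansion_factor(50.0)))  # 143
print(span_nm(10, 50.0))              # 450.0
```

Under these assumptions a 10-residue peptide spans only about 3 nm natively, far below any optical resolution limit, whereas the expanded stack spans hundreds of nanometers.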
- a method for intramolecular expansion may comprise providing a peptide comprising a plurality of amino acids, a linker (e.g., as described elsewhere herein), a polymerizable molecule (e.g., as described elsewhere herein), and a capture moiety.
- the linker may be configured to couple to (i) an amino acid (e.g., NTAA or CTAA) of the peptide and (ii) the polymerizable molecule.
- the method may further comprise contacting the linker with the amino acid and the polymerizable molecule.
- the linker may be provided pre-tethered to the polymerizable molecule and subsequently reacted with the amino acid.
- the linker may couple to the amino acid of the peptide to generate an amino acid-linker complex, which may or may not comprise the polymerizable molecule.
- the amino acid-linker complex may then couple to the capture moiety via the polymerizable molecule.
- the polymerizable molecule and the capture moieties may both comprise nucleic acid molecules, which may be coupled via hybridization, ligation, an extension reaction, or combination thereof.
- the method may further comprise, cleaving the amino acid from the peptide to yield a modified amino acid that comprises the amino acid-linker complex, and optionally repeating the process.
- an additional linker may be provided which is configured to couple to (i) an additional amino acid of the peptide (e.g., the n-1 NTAA or n-1 CTAA) and (ii) an additional polymerizable molecule.
- the additional linker may be provided precoupled to the additional polymerizable molecule.
- the method may further comprise contacting the additional linker with the additional amino acid to generate an additional amino acid-linker complex.
- the additional polymerizable molecule may be coupled to the additional linker prior to, during, or subsequent to the coupling of the additional linker to the additional amino acid.
- the additional polymerizable molecule may be configured to couple to the (first) modified amino acid, e.g., via coupling of the polymerizable molecule and the additional polymerizable molecule.
- the additional linker-additional amino acid complex may couple to the modified amino acid, thereby generating a stacked plurality of modified amino acids (see, e.g., FIGs. 1A-1B).
- the additional amino acid may be cleaved from the peptide prior to, during, or subsequent to generation of the stacked plurality of modified amino acids.
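The repeated couple-stack-cleave cycle can be captured in a toy model. This is a hypothetical sketch, not the application's protocol: it assumes N-terminal processing, one amino acid per cycle, cleavage within the same cycle (one of the orderings the text allows), and a uniform, illustrative 50 nm spacing per stacked polymerizable molecule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModifiedAminoAcid:
    residue: str  # identity of the cleaved amino acid (one-letter code)
    cycle: int    # cycle in which it was coupled, stacked and cleaved


def intramolecular_expansion(peptide: str, cycles: int) -> List[ModifiedAminoAcid]:
    """Each cycle couples a linker/polymerizable molecule to the current NTAA,
    stacks it onto the previous modified amino acid, and cleaves it from the
    peptide, exposing the n-1 residue for the next cycle."""
    stack: List[ModifiedAminoAcid] = []
    remaining = peptide
    for cycle in range(1, cycles + 1):
        if not remaining:
            break  # peptide fully consumed
        ntaa, remaining = remaining[0], remaining[1:]
        stack.append(ModifiedAminoAcid(ntaa, cycle))
    return stack


def positions_nm(stack: List[ModifiedAminoAcid], spacing_nm: float = 50.0) -> List[float]:
    """Position of each modified amino acid along the stacked polymerizable molecules."""
    return [i * spacing_nm for i in range(len(stack))]


stack = intramolecular_expansion("MKSP", 3)
print("".join(m.residue for m in stack))  # MKS
print(positions_nm(stack))                # [0.0, 50.0, 100.0]
```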
- intramolecular expansion of the peptide or protein may occur across a plurality of capture moieties.
- a substrate may be provided that comprises a plurality of capture moieties, and, in some instances, the capture moieties are located adjacent to the peptide or protein.
- a first amino acid (e.g., n NTAA or n CTAA) of the peptide or protein may be coupled to a first capture moiety (e.g., via a first linker and a first polymerizable molecule), a second amino acid (e.g., n-1 NTAA or n-1 CTAA) may be coupled to a second capture moiety (e.g., via a second linker and a second polymerizable molecule), and a third amino acid (e.g., n-2 NTAA or n-2 CTAA) may be coupled to a third capture moiety (e.g., via a third linker and third polymerizable molecule).
- the polymerizable molecule may comprise temporal information, e.g., a barcode encoding the round or cycle number in which the polymerizable molecule is provided.
- any number of amino acids (or modified amino acids) may be coupled to any number of capture moieties.
- a capture moiety may couple to the amino acid, the linker, or the polymerizable molecule.
- the coupling of the amino acid, the linker, or the polymerizable molecule to the capture moiety may comprise a covalent interaction or a noncovalent interaction.
- the coupling may occur by interaction of binding pairs, e.g., biotin and avidin (or streptavidin), antigen or epitope and antibody or antibody fragment, cyclodextrins and small hydrophobic molecules (e.g., alkanes, benzene, polycyclics), cucurbiturils and adamantaneammonium or trimethylammoniomethyl ferrocene, cyclophane (e.g., calixarenes, cavitands, pillararenes, tetralactams), etc.
- the capture moiety comprises an additional polymerizable molecule (e.g., a nucleic acid molecule or peptide).
- both the polymerizable molecule of the modified amino acid and the capture moiety may comprise nucleic acid molecules.
- the nucleic acid molecules may be coupled to one another, e.g., via complementary base pairing, either directly or via a splint molecule, with optional ligation. Alternatively, or in addition, the nucleic acid molecules may be coupled via a nucleic acid extension or amplification reaction.
- the nucleic acid molecule of the capture moiety or the polymerizable molecule can comprise any naturally occurring, non-naturally occurring or engineered nucleotide base.
- the nucleic acid molecule may comprise a pseudo-complementary base, a bridged nucleic acid, a xenonucleic acid, a locked nucleic acid, a peptide nucleic acid (PNA), a gamma-PNA, a morpholino, etc., as is described elsewhere herein.
- a range of average distances between polymerizable molecules, or between polymerizable molecules and polymeric analytes, may be used, e.g., from about 1 nm to about 40 nm, from about 2 nm to about 10 nm, etc.
- the concentration or density of the molecules attached to the substrate may be modulated using one or more suitable approaches, including patterning or random deposition approaches. Examples of methods to control the concentration or density of the molecules attached to the substrate include limited dilution, addition of chaotropes (e.g., guanidine, formamide, urea), using metal organic compounds, etc.
- the molecules may be attached to the substrate in a patterned fashion, e.g., using self-assembling monolayers, photopatterning, lithography, etching, or a combination thereof, or the molecules may be randomly arranged.
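When molecules are deposited randomly, surface density and the average spacing between molecules (the 1 nm to 40 nm range noted above) are linked. The sketch below assumes an idealized homogeneous 2D Poisson point process. For such a process, the mean nearest-neighbour distance at intensity λ is 1/(2√λ). The assumption ignores excluded volume, clustering, and patterning, so the numbers are rough targets only. Function names are illustrative.

```python
import math

NM2_PER_UM2 = 1e6  # 1 um^2 = 10^6 nm^2


def mean_nn_distance_nm(density_per_um2: float) -> float:
    """Mean nearest-neighbour distance (nm) for random 2D deposition.

    For a homogeneous Poisson point process of intensity lam,
    E[nearest-neighbour distance] = 1 / (2 * sqrt(lam)).
    """
    lam_per_nm2 = density_per_um2 / NM2_PER_UM2
    return 1.0 / (2.0 * math.sqrt(lam_per_nm2))


def density_for_spacing(target_nm: float) -> float:
    """Inverse relation: surface density (molecules/um^2) giving a target mean spacing."""
    lam_per_nm2 = 1.0 / (4.0 * target_nm ** 2)
    return lam_per_nm2 * NM2_PER_UM2


for d in (2, 10, 40):
    print(d, "nm ->", round(density_for_spacing(d)), "molecules/um^2")
```

Limited dilution or chaotrope titration would then be tuned toward the computed density, and the spacing could be checked empirically by imaging.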
- a monomer of the polymeric analyte may be modified to facilitate recruitment of an enzyme to recognize or cleave a terminal monomer (e.g., an NTAA or CTAA of a peptide, the 5’ or 3’ nucleotide of a nucleic acid molecule, or the first or last monomer of a polymer) or set of monomers.
- a terminal amino acid of a peptide analyte may be modified with a saccharide in order to recruit a lectin or lectin-bound protease.
- one or more monomers of a polymeric analyte may comprise or be coupled to a nucleic acid molecule having a first sequence that is complementary to a second sequence comprised by an oligo-bound protease. Hybridization of the first sequence to the second sequence may facilitate local recruitment of the protease to the monomer to be cleaved.
- a peptide analyte may be modified with PITC, which may allow for recruitment and cleavage by an Edmanase.
- the polymeric analyte may comprise one or more modified monomers.
- the modification of the monomers may be naturally occurring, or synthetic. Synthetic modifications may be performed prior to, during, or subsequent to cleavage of a monomer from the polymeric analyte and may be advantageous in preserving the identity of the monomer. For instance, during standard Edman degradation reactions to cleave a terminal amino acid (monomer) from the peptide, some amino acid residues may be altered or rendered undetectable by the reaction conditions.
- the conditions of Edman degradation may oxidize cysteine residues, dehydrate or destroy the phenylthiohydantoin (PTH) forms of serine or threonine, react with and modify lysine residues, or render some post-translational modifications undetectable.
- the polymeric analyte or monomer may be modified with a protecting group or moiety, such as a methyl, formyl, ethyl, acetyl, t-butyl, anisyl, benzyl, trifluoroacetyl, N-hydroxysuccinimide, t-butyloxycarbonyl (Boc), benzoyl, 4-methylbenzyl, thioanisyl, thiocresyl, benzyloxymethyl, 4-nitrophenyl, benzyloxycarbonyl, 2-nitrobenzoyl, 2-nitrophenylsulphenyl, 4-toluenesulphonyl, pentafluorophenyl, diphenylmethyl, 2-chlorobenzyloxycarbonyl, 2,4,5-trichlorophenyl, 2-bromobenz
- the polymeric analyte or monomer may be treated with a protecting agent, e.g., carboxyethyl methanethiosulfonate (CEMTS), thiazolidine, mercaptophenyl acetic acid, cyanobenzothiazole (e.g., for lipidation of N-terminal cysteines), acetamidomethyl, 2-methylsulfonylethyloxycarbonyl, etc.
- the lysine residues may be blocked (e.g., the primary amines of lysine residues may be reacted) using an isothiocyanate (e.g., PITC), and a single round of Edman degradation may optionally be carried out to expose a new N-terminal end.
- a monomer of the polymeric analyte may be modified to facilitate cleavage of the monomer from the polymeric analyte.
- an amino acid monomer of a peptide polymeric analyte may be modified such that it is recognized by an enzyme, e.g., acetylation of an amino acid, which can facilitate acyl peptide hydrolase cleavage of the acetylated amino acid. Additional or alternative modifications to the monomers, such as those described herein, may also facilitate recognition by or interaction with an engineered cleaving enzyme.
- a monomer comprising a naturally-occurring modification may be treated to remove or alter the naturally-occurring modification to render the polymeric analyte or monomer more amenable to the processing operations disclosed herein.
- acetylation, formylation, methylation, and pyrrolidone carboxylic acid post-translational modifications may be removed prior to sequencing.
- Acetylation modifications may be removed with acyl peptide hydrolase or acid treatment (e.g., using 1 N HCl). Methylation may be removed using aminopeptidases.
- Formylation modifications may be removed, for example, using acid treatment (e.g., 0.6 M HCl treatment).
- Pyrrolidone carboxylic acid (PCA) may be removed with pyroglutamate aminopeptidase.
- Exemplary C-terminal modifications may include amidation and methylation, both of which may be removed using carboxypeptidases.
- the polymerizable molecules described herein may be coupled to one another using any useful approach. Such coupling may comprise a covalent interaction or a noncovalent interaction (e.g., ionic interaction, hydrophobic interaction, van der Waals forces, etc.).
- a first polymerizable molecule (e.g., a linking nucleic acid molecule) may be coupled to a second polymerizable molecule (e.g., a capture moiety).
- the first polymerizable molecule may comprise a first sequence that is complementary to a second sequence of the second polymerizable molecule, and the coupling may occur via hybridization of the first sequence to the second sequence.
- the first sequence and the second sequence may not be complementary to one another but may be complementary to a third sequence and a fourth sequence, respectively, of a splint or bridge oligonucleotide. Accordingly, coupling of the first polymerizable molecule to the second polymerizable molecule may be mediated by hybridization of the first and second sequences to the third and fourth sequences, respectively, of the splint or bridge oligonucleotide.
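The complementarity rules above (direct hybridization, or bridging through a splint) can be checked in silico. The following is a minimal sketch, assuming DNA sequences written 5'→3' and perfect Watson-Crick pairing. `design_splint` is a hypothetical helper. It returns the reverse complement of the junction formed by the 3'-terminal arm of the first molecule and the 5'-terminal arm of the second. A splint built this way holds the two ends side by side for optional ligation. Arm length and sequences are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_complementary(a: str, b: str) -> bool:
    """True if a and b (both 5'->3') can form a perfect antiparallel duplex."""
    return len(a) == len(b) and a.upper() == revcomp(b)


def design_splint(first: str, second: str, arm: int = 10) -> str:
    """Splint (5'->3') bridging the 3' end of `first` and the 5' end of `second`.

    The splint is the reverse complement of the junction sequence: its 3'
    segment pairs with the last `arm` nt of `first`, and its 5' segment pairs
    with the first `arm` nt of `second`.
    """
    junction = first[-arm:] + second[:arm]
    return revcomp(junction)


first = "AAAACCCCGG"   # 3'-terminal arm of the first polymerizable molecule
second = "TTTTGGGGAA"  # 5'-terminal arm of the second polymerizable molecule
splint = design_splint(first, second)
print(splint, is_complementary(splint[10:], first), is_complementary(splint[:10], second))
```

For direct coupling, `is_complementary(first_sequence, second_sequence)` covers the case in which the two sequences hybridize without a splint.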
- a nucleic acid reaction may be performed as part of or in addition to the coupling of the first polymerizable molecule to the second polymerizable molecule.
- the first sequence of the first polymerizable molecule may hybridize to the second sequence of the second polymerizable molecule, and a nucleic acid extension reaction (e.g., using a polymerase) may be performed.
- Such an extension reaction may allow for transfer of the encoded information of one of the polymerizable molecules (e.g., the first polymerizable molecule) to another polymerizable molecule (e.g., the second polymerizable molecule).
- the first sequence of the first polymerizable molecule may be ligated to the second sequence of the second polymerizable molecule to provide a first polymerizable molecule covalently coupled to the second polymerizable molecule.
- the polymerizable molecules may be coupled chemically, either covalently or noncovalently.
- the first polymerizable molecule may be chemically linked to the second polymerizable molecule.
- the first polymerizable molecule may comprise a first reactive moiety.
- the second polymerizable molecule may comprise a second reactive moiety that is capable of reacting with the first reactive moiety.
- the first reactive moiety may be contacted with the second reactive moiety and be subjected to conditions sufficient to link the first reactive moiety to the second reactive moiety, e.g., via click chemistry.
- the first polymerizable molecule may be coupled to the second polymerizable molecule via a noncovalent or indirect interaction, e.g., biotin-streptavidin.
- a modified monomer may be altered such that it is rendered undetectable by the binding agent, e.g., to prevent binding of the binding agent to the modified monomer in subsequent iterations or cycles of detection (e.g., via contacting with additional binding agents).
- the monomer may be contacted with a blocking agent or derivatized such that the binding agent no longer recognizes the derivatized form.
- blocking strategies may be useful in preventing re-detection of the monomer. Additional strategies for inhibiting binding of binding agents to cleaved monomers are described elsewhere herein.
- the binding agent may be removed from the modified monomer or stacked plurality of modified monomers at any useful or convenient operation, e.g., subsequent to detection. Removal of the binding agent may be performed using chemical or enzymatic approaches, e.g., using chemical denaturants, detergents, acidic or alkaline conditions, heat, or proteases.
- the polymerizable molecule and/or detectable label of the binding agent may be removed or rendered undetectable, e.g., via a cleavage or restriction site and use of a cleaving enzyme (e.g., UDG, restriction enzyme), chemical cleavage, photolysis, photobleaching, or other approach.
- the polymerizable molecule or detectable label may be coupled to the binding agent via a noncovalent interaction, e.g., desthiobiotin-avidin; accordingly, the polymerizable molecule or detectable label may be decoupled from the binding agent using a competition agent, e.g., biotin, which binds avidin with higher affinity than desthiobiotin and competitively displaces it.
- the polymerizable molecules may be subjected to sequencing or identification.
- the polymerizable molecules may comprise spatial or temporal information (e.g., barcodes encoding such information), such that identification or readout of the barcodes can yield information on the order in which the modified monomers originated from the polymeric analyte.
- for example, a first polymerizable molecule (e.g., a linking nucleic acid molecule) and a second polymerizable molecule (e.g., a second linking nucleic acid molecule) may each comprise such a barcode.
- the barcode information may be read out, such that the cycle or round that the polymerizable molecule was provided may be determined and may thus provide temporal or sequential information on the modified monomer (e.g., that an identified monomer was from a first cycle and that a second identified monomer was from a second cycle).
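Recovering temporal order from barcodes amounts to a lookup followed by a sort. The following is a minimal sketch, assuming each sequenced read has already been parsed into a (barcode, identified residue) pair. The barcode table is hypothetical, since real barcode sequences, lengths, and error-tolerant matching are design choices not specified here. Reads with unrecognized barcodes are simply discarded.

```python
# Hypothetical barcode-to-cycle lookup table.
CYCLE_BARCODES = {
    "ACGTAC": 1,
    "TGCATG": 2,
    "GATCGA": 3,
}


def order_by_barcode(reads: list[tuple[str, str]]) -> str:
    """Reconstruct monomer order from (barcode, identified_residue) pairs.

    Reads may arrive in any order. Each barcode is mapped to the cycle in
    which its polymerizable molecule was provided, and residues are then
    concatenated in cycle order.
    """
    by_cycle: dict[int, str] = {}
    for barcode, residue in reads:
        cycle = CYCLE_BARCODES.get(barcode)
        if cycle is None:
            continue  # unrecognized barcode, e.g., a sequencing error
        by_cycle[cycle] = residue  # a later read for the same cycle overwrites an earlier one
    return "".join(by_cycle[c] for c in sorted(by_cycle))


reads = [("GATCGA", "T"), ("ACGTAC", "M"), ("NNNNNN", "X"), ("TGCATG", "K")]
print(order_by_barcode(reads))
```

A production version would likely tolerate barcode mismatches, for example by using Hamming-distance matching, and would flag conflicting residue calls for the same cycle instead of overwriting them.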
- the polymerizable molecules comprise nucleic acid molecules.
- the nucleic acid molecules may be subjected to a nucleic acid reaction at any useful or convenient step.
- the nucleic acid molecules may be amplified (e.g., using nucleic acid amplification approaches such as polymerase chain reaction (PCR), isothermal amplification, ligation-mediated amplification, transcription-based amplification, etc.) to generate amplicons for sequencing.
- Amplification may be performed, for example, using the capture moieties or polymerizable molecules as primer binding sites.
- any number of useful preparation operations may be performed, such as purification or enrichment (e.g., using gel electrophoresis and extraction, column cleanups, SPRI, or other approach), cleanup, nucleic acid reactions (e.g., ligation, extension, amplification, tagmentation, restriction enzyme cleavage, phosphatase or kinase treatment), fragmenting, barcoding, addition of adapters, enzymatic treatment, etc.
- the polymerizable molecules, or the substrates comprising the polymerizable molecules, may be filtered based on any useful characteristic or property. Filtering based on a characteristic or property may achieve higher accuracy or less noise by removing poor-quality molecules or enriching for higher-quality polymerizable molecules prior to sequencing.
- polymerizable molecules or substrates (e.g., beads or particles) containing the polymerizable molecules may be filtered by size or length, quantity, presence of particular sequences (e.g., primer sequences, sequences of interest), GC content, polarity, polarization, birefringence, fluorescence (or other optical property), anisotropy, charge, secondary structure (e.g., hairpins), or other useful metric, characteristic, or property or combinations thereof.
- Such filtration or enrichment may be performed using any suitable approach, e.g., affinity or hybridization approaches (e.g., bead-based affinity sequences or hybridization assays, which can enrich particular sequences), chromatography, size-based filtration, electrophoresis, electrofocusing, optoelectronics, SPRI, digital fluidics, magnetic activated sorting, fluorescence activated sorting, flow cytometry, or other suitable technique.
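The filtering criteria above are applied physically (e.g., by SPRI or affinity capture), but the same criteria can be expressed computationally, for example as a post-sequencing read filter. The following is a minimal sketch covering three of the listed properties: length, GC content, and presence of a required sequence. All thresholds and the `GATTACA` motif are hypothetical placeholders, not values from the source.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def filter_molecules(
    seqs: list[str],
    min_len: int = 20,
    max_len: int = 300,
    gc_min: float = 0.35,
    gc_max: float = 0.65,
    required_motif: str = "GATTACA",  # hypothetical primer-binding sequence
) -> list[str]:
    """Keep sequences that pass length, GC-content, and motif-presence filters."""
    kept = []
    for seq in seqs:
        s = seq.upper()
        if not min_len <= len(s) <= max_len:
            continue  # outside the expected length window
        if not gc_min <= gc_content(s) <= gc_max:
            continue  # extreme GC content
        if required_motif not in s:
            continue  # lacks the sequence needed for downstream priming
        kept.append(s)
    return kept


candidates = [
    "GATTACA" + "GCGCATAT" * 2,  # passes all filters
    "GATTACA" + "ATATATAT" * 2,  # GC content too low
    "GCGCATAT" * 3,              # missing the required motif
    "GATTACA",                   # too short
]
print(filter_molecules(candidates))
```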
- Sequencing may be performed using a commercially available nanopore system, e.g., Oxford Nanopore Technologies, Genia Technologies, NobleGen, or Quantum Biosystems, or other sequencing and next generation sequencing systems, e.g., Illumina, BGI, Qiagen, ThermoFisher, PacBio, and Roche, including formats such as parallel bead arrays, sequencing by synthesis, sequencing by ligation (e.g., SOLiD), capillary electrophoresis, electronic microchips, “biochips,” microarrays, parallel microchips, single-molecule arrays, and Sanger sequencing, as is described elsewhere herein.
- Sequencing of the modified amino acids or stacked plurality of modified amino acids may be performed by detecting a detectable label coupled to a binding agent that recognizes a particular amino acid type.
- the binding agent may comprise a fluorophore.
- detection and identification of the amino acid type of a modified amino acid comprises super-resolution fluorescence imaging, e.g., using dSTORM.
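Single-molecule localization methods such as dSTORM can resolve labels a few nanometres apart because each emitter's position is estimated far more precisely than the diffraction-limited spot width. This sketch uses the common shot-noise-limited approximation σ ≈ s/√N, where s is the PSF standard deviation and N is the number of detected photons. It also uses the Gaussian approximation s ≈ 0.21 λ/NA for the PSF. Both are simplifications that ignore background, pixelation, and label or linker size. The wavelength, NA, and photon counts are illustrative assumptions.

```python
import math


def psf_sigma_nm(wavelength_nm: float, na: float) -> float:
    """Gaussian approximation of the diffraction-limited PSF width: s ~ 0.21 * lambda / NA."""
    return 0.21 * wavelength_nm / na


def localization_precision_nm(psf_sigma: float, photons: float) -> float:
    """Shot-noise-limited localization precision sigma ~ s / sqrt(N).

    Ignores background noise and pixelation, so it is a lower bound on
    the real precision.
    """
    return psf_sigma / math.sqrt(photons)


s = psf_sigma_nm(670, 1.4)  # illustrative far-red emission, NA 1.4 objective
for n in (500, 1000, 5000):
    print(n, "photons ->", round(localization_precision_nm(s, n), 1), "nm")
```

Under these assumptions, a PSF roughly 100 nm wide yields nanometre-scale precision at a few thousand photons per blink. That scale is comparable to the intermolecular spacings discussed for the stacked modified amino acids.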
- sequencing reads may be generated by the detection and identification of the individual amino acid type comprised by the modified amino acid or stacked plurality of modified amino acids as well as the location (e.g., an X-Y position along an imaging plane) or proximity of other modified amino acids of a stacked plurality of modified amino acids.
- an array comprising a plurality of individually addressable locations may be provided.
- at each individually addressable location, one or fewer modified amino acids or stacked pluralities of modified amino acids may be provided.
- all fluorescence signals arising from that individually addressable location may then be attributed to a single modified amino acid or stacked plurality of modified amino acids.
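If stacks are loaded onto addressable locations randomly, the requirement of one or fewer per location becomes a Poisson loading trade-off. With a mean of λ stacks per location, P(0) = e^(-λ) and P(1) = λe^(-λ). Single occupancy therefore peaks at 1/e (about 37%) when λ = 1, while lower λ reduces multiply-occupied sites at the cost of more empty ones. This sketch assumes independent random loading. The bisection helper is a hypothetical convenience for choosing λ.

```python
import math


def occupancy_fractions(mean_per_site: float) -> dict[str, float]:
    """Poisson fractions of empty, singly occupied, and multiply occupied locations."""
    lam = mean_per_site
    p0 = math.exp(-lam)
    p1 = lam * math.exp(-lam)
    return {"empty": p0, "single": p1, "multiple": 1.0 - p0 - p1}


def max_loading_for_multiple_rate(max_multiple: float) -> float:
    """Largest mean loading whose multiply-occupied fraction stays <= max_multiple.

    The multiple fraction 1 - exp(-lam) * (1 + lam) increases monotonically
    with lam, so bisection on [0, 10] converges.
    """
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if occupancy_fractions(mid)["multiple"] <= max_multiple:
            lo = mid
        else:
            hi = mid
    return lo


lam = max_loading_for_multiple_rate(0.05)
print(round(lam, 3), {k: round(v, 3) for k, v in occupancy_fractions(lam).items()})
```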
- the use of super-resolution imaging may allow for spatial distinction of each modified amino acid of the single stacked plurality of modified amino acids.
- both the identity of the amino acid type and the spatial position of each modified amino acid can be determined, thereby providing the sequence of the peptide from which the stacked plurality of modified amino acids was derived.
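Turning per-residue localizations into a sequence read requires ordering them along the stack. This sketch rests on several assumptions not stated in the source. First, the modified amino acids of one stack lie roughly along a line in the imaging plane. Second, each localization has already been assigned an amino acid type from its binding agent. Third, the position of the anchored end (e.g., the first cycle's capture site) is known well enough to fix the reading direction. Under those assumptions, the order is the sort along the principal axis, which here comes from an SVD of the centered coordinates. `anchor_xy` is a hypothetical input.

```python
import numpy as np


def read_from_localizations(xy: np.ndarray, labels: list[str], anchor_xy: np.ndarray) -> str:
    """Order per-residue localizations along the stack's principal axis.

    xy: (n, 2) centroid coordinates in nm, one per modified amino acid.
    labels: amino acid type identified for each centroid.
    anchor_xy: assumed position of the anchored end, used only to choose
        the reading direction.
    """
    center = xy.mean(axis=0)
    centered = xy - center
    # The first right-singular vector is the direction of greatest spread.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]
    proj = centered @ axis
    # Flip the axis if needed so that positions increase away from the anchor.
    if (anchor_xy - center) @ axis > 0:
        proj = -proj
    order = np.argsort(proj)
    return "".join(labels[i] for i in order)


xy = np.array([[20.0, 1.0], [0.0, 0.0], [40.0, -1.0], [10.0, 0.5]])
labels = ["K", "M", "A", "T"]
print(read_from_localizations(xy, labels, anchor_xy=np.array([-10.0, 0.0])))
```

Strongly curved or folded stacks would violate the straight-line assumption and would need a path-tracing approach instead.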
- Identification of the single modified amino acid or stacked plurality of modified amino acids may comprise image processing of the acquired image data.
- dSTORM may be used for imaging the single modified amino acid or stacked plurality of modified amino acids.
- STORM datasets may be processed using publicly available modules (e.g., Python scripts using NumPy, SciPy, OpenCV, or scikit-image).
- Image processing may be performed, e.g., to measure the intensity of a fluorescent signal or to deconvolve signals within proximity of one another, using any useful operations or combinations of operations, including, in non-limiting examples, filtering, blurring, noise reduction, thresholding, generating weighted centroids to identify coordinates of a fluorescent spot, estimation of variance of the identified centroid, measuring peak intensity, clustering, distance measurement, additional segmentation operations, among others.
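Two of the operations listed above, thresholding and generating weighted centroids to identify spot coordinates, can be combined into a basic spot localizer. This is a minimal NumPy-only sketch, not the pipeline used in the source. A pixel counts as a candidate if it exceeds a threshold and is the maximum of its 3×3 neighbourhood. Its sub-pixel position is the intensity-weighted centroid of a small window after subtracting the window minimum as a crude local background. Many localization packages fit a 2D Gaussian instead, which is usually more precise.

```python
import numpy as np


def localize_spots(img: np.ndarray, threshold: float, radius: int = 2) -> list[tuple[float, float, float]]:
    """Return (row, col, peak) per spot using local maxima plus weighted centroids.

    Plateaus of equal maximal pixels are not merged, so touching identical
    maxima can yield duplicate spots.
    """
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape
    padded = np.pad(img, 1, mode="constant", constant_values=-np.inf)
    spots = []
    for r in range(rows):
        for c in range(cols):
            v = img[r, c]
            if v <= threshold:
                continue  # below the intensity threshold
            if v < padded[r:r + 3, c:c + 3].max():
                continue  # not a local maximum in the 3x3 neighbourhood
            r0, r1 = max(r - radius, 0), min(r + radius + 1, rows)
            c0, c1 = max(c - radius, 0), min(c + radius + 1, cols)
            win = img[r0:r1, c0:c1] - img[r0:r1, c0:c1].min()  # crude background removal
            total = win.sum()
            if total == 0:
                continue
            rr, cc = np.mgrid[r0:r1, c0:c1]
            spots.append((float((rr * win).sum() / total), float((cc * win).sum() / total), float(v)))
    return spots


rr, cc = np.mgrid[0:25, 0:30]
sigma = 1.5
img = 100 * np.exp(-((rr - 10.3) ** 2 + (cc - 14.7) ** 2) / (2 * sigma ** 2)) \
    + 100 * np.exp(-((rr - 5) ** 2 + (cc - 5) ** 2) / (2 * sigma ** 2))
print(localize_spots(img, threshold=50))
```

In a dSTORM workflow, this would run per frame, and the localizations accumulated across frames would then be clustered and rendered.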
- the k-mer sequences from the pool of reads may be assembled into longer contig sequences.
- a De Bruijn graph may be generated, e.g., to represent splice variants, post-translational modifications, or other proteoforms.
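The k-mer-to-contig step can be illustrated with a generic De Bruijn construction over the amino acid alphabet. In this graph, nodes are (k-1)-mers, each k-mer is an edge, and a contig is spelled by an Eulerian path, found here with Hierholzer's algorithm. This is a textbook sketch, not the source's assembler. It assumes error-free k-mers with a single Eulerian path and returns only one contig. Real proteoform assembly must also handle sequencing errors, repeats, uneven coverage, and branching.

```python
from collections import Counter, defaultdict


def assemble(kmers: list[str]) -> str:
    """Spell one contig from an Eulerian path through the De Bruijn graph of `kmers`."""
    graph: dict[str, list[str]] = defaultdict(list)
    indeg: Counter = Counter()
    outdeg: Counter = Counter()
    for kmer in kmers:
        u, v = kmer[:-1], kmer[1:]  # prefix and suffix (k-1)-mers
        graph[u].append(v)
        outdeg[u] += 1
        indeg[v] += 1
    nodes = set(indeg) | set(outdeg)
    # A linear path starts at the node with one more outgoing than incoming edge.
    start = next((n for n in nodes if outdeg[n] - indeg[n] == 1), None)
    if start is None:
        start = next(iter(graph))  # balanced graph: Eulerian circuit, start anywhere
    adj = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:  # Hierholzer's algorithm
        u = stack[-1]
        if adj.get(u):
            stack.append(adj[u].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["TID", "PEP", "DEK", "EPT", "IDE", "PTI"]  # 3-mers in arbitrary order
print(assemble(reads))
```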
- the isoforms may be assembled, and the expression level may be determined using a Bayesian approach.
- the assembled isoforms of proteins may be subjected to evaluation and error correction, e.g., by comparison with standard proteins that are spiked in samples, and assessing for missing segments of sequences, incorrect or redundant assembly, uniform coverage, etc.
- the linker may comprise a monomer-coupling group and subsequently be reacted with a linking molecule (e.g., a linking nucleic acid molecule); alternatively, the linker may be provided with the linking nucleic acid molecule as part of the linker (e.g., pre-conjugated).
- the binding agent may be provided at any useful or convenient step. For instance, the binding agent may be contacted with the modified monomers (e.g., the monomer-linker complex subsequent to cleavage from the polymeric analyte) prior to, during, or subsequent to attachment or immobilization of the modified monomer to a substrate for analysis (e.g., via imaging).
- amine-functionalized substrates may be coupled to azide-functionalized DNA primers using a DBCO-NHS ester or DBCO-PEG-NHS ester linker.
- peptides may be coupled to a substrate by direct coupling or by using a linker.
- a peptide may be coupled to a substrate at a terminus of the peptide (e.g., C terminus or N terminus), at an internal residue or amino acid of the peptide, or at multiple locations along the peptide.
- the peptides may be coupled to a substrate using a linker, e.g., as described elsewhere herein.
- the linker may comprise at least two functional groups (e.g., a heterobifunctional linker) that can couple to both the substrate and the nucleic acid molecules.
- the substrate may comprise an amine group, and alkyne-functionalized peptides may be attached using a linker such as azidoacetic acid NHS ester.
- amine-functionalized substrates may be coupled to azide-functionalized peptides using a DBCO-NHS ester or DBCO-PEG-NHS ester linker.
- substrates comprising an amine group may be coupled to an azide-functionalized peptide using EDC and Sulfo-NHS.
- a peptide may be functionalized with a functional moiety to enable attachment or coupling of the peptide to the substrate.
- the functional moiety may comprise a silane, e.g., aminosilane (e.g., APTES), amino-PEG-silane, click chemistry moiety or other linking moiety and can be attached to the peptide at a peptide terminus (N-terminus or C-terminus), at an internal amino acid, or at multiple locations (e.g., multiple internal amino acids, one or both termini, etc.).
- functionalization of the terminal ends of peptides may be achieved enzymatically or using enzyme analogs such as ribozymes or DNAzymes.
- carboxypeptidases or amidases may be used for C-terminal functionalization (e.g., as described in Xu et al., ACS Chem Biol. 2011 Oct 21; 6(10): 1015-1020; Zhu et al., Chinese Chemical Letters. 2018, Vol 29 Issue 7, Pages 1116-1118; and Zhu et al., ACS Catal.
- the click chemistry -functionalized peptides may then be directly attached to the substrate via another clickable group (e.g., BCN-azide or DBCO-azide coupling), or, in other instances, may be reacted with another linker or polymerizable molecule (e.g., a bait nucleic acid molecule with a clickable group) that can then link to the substrate directly or indirectly (e.g., using a capture nucleic acid molecule and hybridizing the bait nucleic acid molecule).
- ubiquitin ligase can be used to attach ubiquitin proteins with linker moieties to substrates. These linker moieties can then be used to chemically attach proteins to ubiquitin-coupled substrates.
- Internal amino acid residues or post-translationally modified residues may be coupled to substrates using, for example, thiol labeling; amide coupling to glutamate or aspartate residues using EDC/NHS chemistry or DMT-MM; esterification of glutamate or aspartate residues; alkylation or disulfide bridge labeling of cysteines; or amide coupling to lysine residues.
- a peptide may be treated prior to, during, or subsequent to coupling of the peptide to a substrate.
- a peptide may be conjugated with a tag that enables attachment to the substrate, e.g., His tags, SNAP-tags, CLIP-tags, SpyCatcher, SpyTag, or nucleic acid tags (e.g., bait oligos which can attach to capture oligos of the substrate).
- it may be advantageous to block or protect primary amines or carboxyl groups and optionally, de-block or de-protect the N-terminus primary amine or C-terminus carboxy group in order to facilitate attachment of the N-terminus or C-terminus to a substrate.
- single-point (e.g., C-terminal) selective attachment of peptides can be achieved by reacting the peptide with a linker comprising an amine-reactive group (e.g., isothiocyanates such as PITC) and a reactive group (e.g., click chemistry group).
- the linker can be, for example, a PITC-conjugated click chemistry moiety such as PITC-azide or PITC-alkyne, optionally with a spacer moiety in between (e.g., PITC-alkyl-azide, PITC-PEG-azide, PITC-alkyl-alkyne, PITC-PEG-alkyne).
- the linker reacts with and “blocks” the primary amines (e.g., modifies lysines), including the N-terminus. Subsequent cleavage of the N-terminal amino acid (e.g., using an Edman reagent, such as acid) can be performed, and one of the remaining modified lysines may be attached to a substrate (e.g., using the click chemistry moiety coupled to the amine-reactive group).
- the peptide may be treated with a protease, e.g., LysC, which cleaves peptides such that a remaining peptide has a C-terminal lysine and comprises a primary amine only at the C-terminal lysine residue and the N-terminus; such a cleavage may be performed prior to reacting the amine-reactive group, e.g., as shown by Xie et al. Langmuir 2022, 38, 30, 9119-9128, which is incorporated by reference herein in its entirety.
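The rationale for LysC pre-digestion can be made concrete by counting the primary amines left on each fragment. This sketch approximates Lys-C as cleaving C-terminal to every lysine (K) and ignores missed cleavages and sequence-context effects. Each fragment ending in K then carries primary amines only at its N-terminus and that C-terminal lysine. The final fragment of the protein ends at the protein's own C-terminus and may lack a lysine. Helper names are hypothetical.

```python
def lysc_digest(seq: str) -> list[str]:
    """Cleave C-terminal to every lysine (K), a simplified model of Lys-C specificity."""
    fragments, current = [], ""
    for aa in seq:
        current += aa
        if aa == "K":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)  # protein C-terminal fragment without a lysine
    return fragments


def primary_amine_sites(fragment: str) -> list[str]:
    """Sites bearing a free primary amine: the N-terminus plus each lysine side chain."""
    sites = ["N-term"]
    sites += [f"K{i + 1}" for i, aa in enumerate(fragment) if aa == "K"]
    return sites


for frag in lysc_digest("MAKTGKLLEA"):
    print(frag, primary_amine_sites(frag))
```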
- carboxylic groups can be reacted in a way to enable C-terminal or internal residue attachment.
- carboxyl groups may be labeled with a C-terminal sequencing reagent, such as isothiocyanate, when treated with an activating reagent (e.g., acetic anhydride) to generate a peptide-thiohydantoin (at the C-terminus) and “blocked” carboxyl groups on the aspartic acid and glutamic acid residues.
- the thiohydantoin may then be reacted to couple to a substrate.
- cleavage of the C-terminal amino acid via a single round of C-terminal sequencing degradation, or via a protease exposes only a single reactive carboxylic group at the C-terminal amino acid.
- the single reactive C-terminal carboxylic group can then be used as a reactive moiety for a single attachment site.
- a peptide or protein can be attached via the N-terminus using the specific reactivities of the N-terminus amine group.
- Amine-based reactions, such as amide coupling, can be carried out at low pH, where only the N-terminal amine group is reactive.
- 2-pyridinecarboxaldehyde (2-PCA) and its variants can be used to react with the N-terminal amine group.
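The pH selectivity described above follows from the lower pKa of the N-terminal α-amine compared with the lysine ε-amine. Only the unprotonated (R-NH2) form is nucleophilic. Its fraction is given by the Henderson-Hasselbalch relation, 1/(1 + 10^(pKa - pH)). The pKa values below are illustrative textbook-range assumptions, and the real N-terminal pKa varies with the terminal residue. In practice, "low pH" here means mildly acidic to near-neutral: selectivity keeps rising as pH falls, but the absolute fraction of reactive N-terminal amine drops, slowing the reaction.

```python
def fraction_unprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of an amine in the reactive, unprotonated form."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


PKA_N_TERMINUS = 7.8  # illustrative; varies with the N-terminal residue
PKA_LYSINE = 10.5     # illustrative typical value for the lysine epsilon-amine

for ph in (6.0, 6.5, 7.0, 8.5):
    n_term = fraction_unprotonated(PKA_N_TERMINUS, ph)
    lys = fraction_unprotonated(PKA_LYSINE, ph)
    print(f"pH {ph}: N-term {n_term:.3f}, Lys {lys:.5f}, selectivity {n_term / lys:.0f}x")
```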
- a peptide may be conjugated to a substrate using a polymerization reaction, e.g., a free radical polymerization, such as using PEGylated peptides, methacrylamide-modified peptides, Michael-type addition of maleimide-terminated oligo-NIPAAM-conjugated peptides; photocrosslinking of azophenyl-conjugated peptides, or other polymerization reactions with monomer-conjugated peptides, e.g., as described by Krishna et al. Biopolymers. 2010; 94(1): 32-48, which is incorporated by reference herein in its entirety.
- the substrate may comprise, coupled thereto, any combination of molecules, including but not limited to peptides, proteins (e.g., enzymes, antibodies, nanobodies, antibody fragments), nucleic acid molecules, lipids, carbohydrates or sugars, metabolites, small molecules, polymers, metals, viral particles, biotin, avidin, streptavidin, neutravidin, etc.
- the multiple types of molecules may be attached simultaneously to the substrate or in a sequential manner.
- a substrate may be treated to conjugate nucleic acid molecules and subsequently treated to conjugate peptides, or alternatively, the substrate may be treated to conjugate peptides prior to the nucleic acid molecules.
- Any number of conjugation or attachment chemistries may be used.
- any number of conjugation chemistries may be used for each type of molecule.
- a substrate, or portion thereof, may be subjected to conditions sufficient to passivate the substrate or portion thereof. Passivation of a substrate may be useful for a variety of purposes, such as preventing nonspecific binding of binding agents, altering the surface density of a molecule (e.g., increasing the density of nucleic acid molecules or peptides), blocking reactive sites (e.g., blocking available click chemistry moieties subsequent to conjugation of the molecules on the substrate), etc.
- Passivation may be achieved using chemical approaches, e.g., deposition of blocking agents such as proteins (e.g., albumin), Tween-20, polymers (e.g., PEG), metals or metal oxides, halogens or derivatives thereof (e.g., fluorine or fluorine derivatives, chlorine or chlorine derivatives), or biochemical approaches, e.g., using metal microbes.
- Substrates comprising reactive moieties may also be passivated following molecule conjugation (e.g., coupling of nucleic acid molecules, peptides, etc.) by reacting any unreacted sites with an appropriate molecule.
- a substrate comprising click chemistry moieties may be coupled to molecules of interest (e.g., polymerizable molecules, such as nucleic acid molecules, peptides) at a useful density using click chemistry (e.g., azide-nucleic acid molecules, azide-peptides).
- Unreacted sites may be passivated by providing and reacting complementary click-chemistry molecules, e.g., azide-polymers (e.g., PEG-azide), which may reduce downstream nonspecific interactions.
- Substrate passivation may occur at any useful time or step. For instance, passivation to block unreacted DBCO sites may be performed prior to, during, or subsequent to conjugation of analytes or other molecules of interest (e.g., peptides and nucleic acid molecules). The passivation may be controlled by stoichiometry or densities of the passivating agent relative to the molecules of interest, or by physical approaches, e.g., photopatterning, self-assembling monolayers, etc.
- One or more methods for processing samples may comprise preparation of biological samples for analysis, which, in some instances, includes partitioning of cells for conducting single-cell analysis.
- A method for processing a biological sample may comprise extraction or isolation of one or more peptides or proteins from the biological sample for further processing and analysis, as described elsewhere herein.
- A biological sample may comprise a bacterial liquid culture, a mammalian liquid culture, or a blood, plasma, or serum sample. Processing of such liquid samples may include centrifugation (e.g., to isolate cells), resuspension of cells in a suitable medium, such as Dulbecco’s Phosphate Buffered Saline (DPBS), and optional culturing of the isolated cells.
- A biological sample may comprise cultured cells, e.g., cells cultured in suspension or cells adhered to a solid surface, such as petri dishes or tissue culture dishes. Samples of cultured adherent cells may be treated to generate a cell suspension, e.g., via a protease such as trypsin, to detach the cells from the surface.
- A biological sample may comprise a tissue or biopsy sample. A tissue or biopsy sample may be processed mechanically or enzymatically to generate a cell suspension.
- Such processing may include sonication (mechanical treatment) or enzymatic treatment, such as the use of pronase, collagenase, hyaluronidase, metalloproteinases, trypsin, or other enzymes that digest extracellular matrix components.
- The dissociated cells can then be stored in a suitable buffer, such as DPBS.
- Cell sorting: A biological sample or a cell suspension may be subjected to sorting to isolate a cell of interest. Sorting may be performed to select or isolate a cell based on a quality or characteristic of the cell, e.g., expression of a protein target, size, deformability, fluorescence or other optical property, or other physical property of the cell.
- Sorting may be accomplished using any number of approaches, e.g., using immunosorting (e.g., fluorescence-activated cell sorting (FACS) or magnetic-activated cell sorting (MACS)), electrophoretic approaches, chromatography, microfluidic approaches (e.g., using inertial focusing, cell traps, electrophoresis, isoelectric focusing), acoustic sorting, optical sorting (e.g., optoelectronic tweezers), mechanical cell picking (e.g., using manual or robotic pipettes), or passive approaches (e.g., gravitational settling).
- Cells of a biological sample or cell suspension may be partitioned into individual partitions such that at least a subset of the individual partitions comprises a single cell.
- The individual partitions may comprise a barcode molecule (e.g., a fluorophore or set of fluorophores, nucleic acid barcode molecules, etc.).
- Barcode molecules may be unique to the partition, such that each individual partition comprises a different barcode sequence than other partitions.
- The barcode molecules may be loaded into the individual partitions at any useful ratio of barcode molecules to sample species (e.g., cells, proteins, nucleic acid molecules).
- The barcode molecules may be loaded into partitions such that about 0.0001, 0.001, 0.1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, or 200000 barcodes are loaded per sample species. In some cases, the barcodes are loaded into partitions such that more than about 0.0001, 0.001, 0.1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, or 200000 barcodes are loaded per sample species. In some cases, the barcodes are loaded into partitions such that less than about 0.0001, 0.001, 0.1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, or 200000 barcodes are loaded per sample species.
- A partition may assume any useful geometry, such as a droplet, a microwell, a solid substrate, a gel (e.g., a cell encapsulated in a gel bead), a bead, a flask, a tube, a spot, a capsule, a channel, a chamber, or other compartment or vessel.
- A partition may be part of an array of partitions, e.g., a droplet in a microfluidic device, a microwell of a microwell plate, a spot on a multi-spot array, etc.
- A terminus of the peptide (e.g., the N-terminus or C-terminus) may be labeled with a barcode.
- An internal amino acid may be labeled with a barcode.
- The peptide may be fragmented prior to analysis or sequencing; accordingly, upstream attachment of multiple identical barcode molecules to the same peptide may allow the sequence analysis to be attributed back to a single peptide. Barcoding of peptides may occur prior to, during, or subsequent to fragmentation.
- An adjacency matrix of barcode sequences may be generated (e.g., by designating the barcode sequences on a single dual-primer linker as spatially adjacent). Accordingly, each barcode sequence may be associated with its adjacent barcode sequences, and peptide portions may thereby be aligned or attributed as adjacent. Such an approach may be useful where the peptide is fragmented, because individual fragments of a peptide may then be matched to their nearest neighbors using the barcode sequences.
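The partitioning of cells so that at least a subset of partitions holds a single cell is often reasoned about with Poisson statistics when items are distributed into partitions at random. The sketch below assumes such random (Poisson) loading, which the text above does not specify; `poisson_pmf` and `occupancy_summary` are hypothetical helper names. It estimates the fractions of partitions that are empty, singly occupied, or multiply occupied for a given mean load per partition.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a partition receives exactly k items at mean load lam (assumed Poisson)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def occupancy_summary(lam: float) -> dict:
    """Fractions of empty, singly occupied, and multiply occupied partitions."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return {"empty": empty, "single": single, "multiple": multiple}


if __name__ == "__main__":
    for lam in (0.1, 0.5, 1.0):
        s = occupancy_summary(lam)
        # Fraction of occupied partitions that contain more than one item.
        multiplet = s["multiple"] / (1.0 - s["empty"])
        print(f"lambda={lam}: single={s['single']:.3f}, "
              f"multiplet fraction of occupied={multiplet:.3f}")
```

Under this model, lowering the mean load reduces multiply occupied partitions at the cost of more empty ones, which is one way to trade throughput against single-cell purity.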
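The adjacency-matrix approach for fragmented peptides can be sketched as a small graph walk. This assumes, hypothetically, that each dual-barcode linker read yields one pair of barcodes marking two fragments as neighbors, and that the fragments of one peptide form a single unbranched chain; `build_adjacency` and `order_fragments` are illustrative names, not an interface from the source. Without terminal information the chain direction is arbitrary, so the sketch starts from the lexicographically smaller end.

```python
from collections import defaultdict


def build_adjacency(pairs):
    """Symmetric adjacency map built from (barcode_a, barcode_b) linker pairs."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def order_fragments(pairs):
    """Return barcodes in chain order, assuming the pairs describe one unbranched chain."""
    adj = build_adjacency(pairs)
    if any(len(nbrs) > 2 for nbrs in adj.values()):
        raise ValueError("branching detected; pairs do not form a simple chain")
    ends = [bc for bc, nbrs in adj.items() if len(nbrs) == 1]
    if len(ends) != 2:
        raise ValueError("expected exactly two chain ends")
    start = min(ends)  # direction is arbitrary without terminal information
    order, prev, cur = [start], None, start
    while True:
        nxt = [n for n in adj[cur] if n != prev]
        if not nxt:
            break
        prev, cur = cur, nxt[0]
        order.append(cur)
    if len(order) != len(adj):
        raise ValueError("pairs describe more than one disconnected component")
    return order
```

For example, the unordered pairs `[("BC3", "BC1"), ("BC1", "BC4"), ("BC4", "BC2")]` are ordered as `["BC2", "BC4", "BC1", "BC3"]`, recovering which fragment neighbors which.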
Abstract
Provided herein are systems and methods for processing polymeric analytes (e.g., peptides) using an intramolecular expansion process and super-resolution imaging. One or more methods provided herein may comprise detection of individual monomers of the polymeric analyte using binding agents comprising detectable labels and detecting the detectable labels (e.g., using super-resolution imaging).
Description
PROTEIN SEQUENCING USING SUPER-RESOLUTION IMAGING
CROSS REFERENCE
[0001] This application claims benefit of U.S. Provisional Patent Application No. 63/618,018, filed January 5, 2024, which is incorporated by reference herein in its entirety.
BACKGROUND
[0002] Technological improvements in the analysis and characterization of biological molecules have proven to be critical in understanding biological and pathological mechanisms, which has implications in disease diagnosis and modeling, development of therapeutics and treatment, and improving health outcomes. Among these technological improvements, nucleic acid sequencing has emerged as an important tool for genomic and transcriptomic analysis of biological samples.
[0003] Protein signaling underpins a variety of cellular processes and serves important functions in viruses, cells, and living organisms. However, current technologies for studying proteins are limited in selectivity, sensitivity, or throughput, or require a priori knowledge. As such, new approaches for characterizing and analyzing proteins are needed.
SUMMARY
[0004] Recognized herein is a need for technologies for studying proteins de novo, with low cost, high throughput, and the ability to multiplex. Provided herein are systems, compositions, kits, and methods for analyzing proteins that address the abovementioned needs. A method of the present disclosure may comprise providing a peptide and sequentially generating modified amino acids from the peptide. The modified amino acids may comprise polymerizable molecules and may be generated via an intramolecular expansion process, and, in some instances, individual modified amino acids from a single peptide may be coupled together to generate a stacked plurality of modified amino acids. The modified amino acids or the stacked plurality of modified amino acids may then be further analyzed. In some embodiments, the modified amino acids or stacked plurality of modified amino acids are contacted with a plurality of binding agents, each specific to a particular amino acid type; as such, detection of individual binding agents of the plurality of binding agents may yield information on the identity of one or more amino acids comprised by the modified amino acid or the stack of modified amino acids. One or more processes described herein may involve super-resolution imaging, allowing for identification of individual amino acids of the peptide in the order in which they occur in the peptide. In some instances, a portion of the modified amino acids or stack of modified amino acids is attached to a substrate and linearized, and then another portion of the modified amino acids or stack of modified amino acids is attached to the substrate, thereby allowing for streamlined sequential imaging and analysis of the individual amino acids comprised by the modified amino acids or stack of modified amino acids. Beneficially, the methods and systems provided herein provide a massively parallel, high-throughput, and accessible approach to de novo protein sequencing.
[0005] In an aspect, provided herein is a method for characterizing a modified amino acid, comprising: (a) providing the modified amino acid, wherein the modified amino acid comprises a polymerizable molecule coupled thereto; (b) contacting the modified amino acid with a binding agent, wherein the binding agent comprises a detectable label; and (c) detecting the detectable label, thereby identifying an amino acid type of the modified amino acid.
[0006] In some embodiments, the detecting comprises imaging the detectable label. In some embodiments, the detectable label comprises a fluorophore or quantum dot. In some embodiments, the detectable label comprises a fluorescent label, a fluorescence resonance energy transfer (FRET) label, a chemiluminescence label, an electrochemiluminescence label, a bioluminescence label, a phosphorescence label, or a label that generates light through other types of reactions or stimulations. In some embodiments, the imaging is performed using super-resolution microscopy. In some embodiments, the super-resolution microscopy is dSTORM.
[0007] In some embodiments, the method further comprises, prior to (a), generating the modified amino acid. In some embodiments, the generating comprises (I) providing a linker and the polymerizable molecule, (II) coupling the linker to (i) an amino acid of a peptide and (ii) the polymerizable molecule to generate an amino acid-linker complex, and (III) cleaving the amino acid, thereby generating the modified amino acid. In some embodiments, the linker is unifunctional, bifunctional, trifunctional, quadrifunctional, or polyfunctional. In some embodiments, the method further comprises (IV) coupling the amino acid-linker complex or the modified amino acid to a capture moiety. In some embodiments, the capture moiety is coupled to a substrate. In some embodiments, the capture moiety is coupled to the peptide. In some embodiments, the capture moiety is coupled to a C-terminus of the peptide. In some embodiments, (IV) is performed prior to (III). In some embodiments, the method further comprises derivatizing the modified amino acid. In some embodiments, the method further comprises repeating (I)-(III) on the peptide. In some embodiments, the method further comprises repeating (IV) on the peptide. In some embodiments, the repeating yields a stacked plurality of modified amino acids comprising a plurality of stacked polymerizable molecules. In some embodiments, the stacked plurality of modified amino acids comprises a first amino acid and a second amino acid, wherein the first amino acid and the second amino acid are spaced along the plurality of stacked polymerizable molecules at a distance greater than 50 nanometers.
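To illustrate why a spacing of more than 50 nanometers between stacked modified amino acids suits super-resolution readout, the sketch below takes localizations of detectable labels along a linearized stack and calls one label per resolved position, in order. It assumes the localizations are already projected onto the molecule's axis in nanometers and that each label name stands for a binding agent specific to one amino acid type; the function name `call_sequence` and the 25 nm gap threshold are hypothetical choices, not parameters from the source. Reported dSTORM localization precision is commonly on the order of ten to a few tens of nanometers, so neighbors more than 50 nm apart can plausibly be separated by a simple gap rule.

```python
from collections import Counter


def call_sequence(localizations, gap_nm=25.0):
    """
    Group 1D localizations into clusters and call one label per cluster.

    localizations: iterable of (position_nm, label) tuples; position_nm is the
        coordinate along the linearized molecule (assumed already projected to 1D).
    gap_nm: hypothetical threshold; a jump larger than this starts a new cluster.
        It should sit well below the >50 nm spacing between modified amino acids.
    """
    pts = sorted(localizations)
    if not pts:
        return []
    clusters, current = [], [pts[0]]
    for prev, cur in zip(pts, pts[1:]):
        if cur[0] - prev[0] > gap_nm:
            clusters.append(current)
            current = []
        current.append(cur)
    clusters.append(current)
    # Majority vote within each cluster tolerates occasional mislabeled blinks.
    return [Counter(lbl for _, lbl in c).most_common(1)[0][0] for c in clusters]
```

For localizations `[(12.0, "K"), (15.0, "K"), (9.0, "K"), (70.0, "A"), (74.0, "A"), (131.0, "G"), (128.0, "G"), (133.0, "K")]`, the sketch finds three clusters and returns `["K", "A", "G"]`; the stray `"K"` near 133 nm is outvoted by the two `"G"` localizations.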
[0008] In some embodiments, (a) comprises providing a fluidic device, and loading the modified amino acid in the fluidic device. In some embodiments, the fluidic device is a microfluidic device. In some embodiments, the method further comprises linearizing the polymerizable molecule in the fluidic device. In some embodiments, the linearizing is performed using an electric field. In some embodiments, the linearizing is performed using shear stress. In some embodiments, the method further comprises loading the modified amino acid in the fluidic device. In some embodiments, the fluidic device comprises a surface, wherein the surface comprises a plurality of nucleic acid anchor molecules coupled thereto, wherein the plurality of nucleic acid anchor molecules is configured to couple to the modified amino acid. In some embodiments, the polymerizable molecule comprises a nucleic acid molecule that is configured to hybridize to a nucleic acid anchor molecule of the plurality of nucleic acid anchor molecules. In some embodiments, the loading comprises loading a plurality of modified amino acids.
[0009] In some embodiments, the binding agent comprises an antibody, an antibody fragment, a nanobody, an aptamer, a peptide, a polymer, an inorganic compound, or a small molecule.
[0010] In some embodiments, the polymerizable molecule comprises a DNA molecule. In some embodiments, the DNA molecule comprises at least 150 nucleotides. In some embodiments, the DNA molecule has a length of at least 50 nanometers. In some embodiments, the DNA molecule comprises an adapter sequence. In some embodiments, the adapter sequence is configured to couple to an anchor sequence of a substrate.
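The two size figures above (at least 150 nucleotides and at least 50 nanometers) can be related with a short calculation. Assuming, as an illustration only, that the polymerizable DNA is double-stranded B-form DNA with a rise of roughly 0.34 nm per base pair, 150 bp corresponds to a fully extended contour length of about 51 nm. The helper names and the rise constant are assumptions, not values stated in the source.

```python
import math

# Approximate rise per base pair for B-form double-stranded DNA (assumption).
B_DNA_RISE_NM = 0.34


def dsdna_contour_length_nm(n_bp: int, rise_nm: float = B_DNA_RISE_NM) -> float:
    """Fully extended contour length of a double-stranded DNA segment."""
    return n_bp * rise_nm


def min_bp_for_length(target_nm: float, rise_nm: float = B_DNA_RISE_NM) -> int:
    """Smallest number of base pairs whose contour length reaches target_nm."""
    return math.ceil(target_nm / rise_nm)
```

Under this assumption, about 148 bp is the minimum that reaches 50 nm, and 150 bp gives roughly 51 nm, so the two claimed bounds are mutually consistent for a linearized duplex. Single-stranded DNA has different per-nucleotide dimensions and would need its own constant.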
[0011] In another aspect, provided herein is a method for characterizing a modified amino acid, comprising: (a) providing the modified amino acid, wherein the modified amino acid comprises a polymerizable molecule; and (b) using super-resolution imaging to determine the amino acid type of the modified amino acid.
[0012] In some embodiments, (b) is performed using amino acid-specific binding agents. In some embodiments, the amino acid-specific binding agents comprise a detectable label. In some embodiments, the detectable label comprises a fluorophore or quantum dot. In some embodiments, the detectable label comprises a fluorescent label, a FRET label, a chemiluminescence label, an electrochemiluminescence label, a bioluminescence label, a phosphorescence label, or a label that generates light through other types of reactions or stimulations. In some embodiments, the super-resolution imaging is dSTORM.
[0013] In some embodiments, the method further comprises, prior to (a), generating the modified amino acid. In some embodiments, the generating comprises (I) providing a linker and the polymerizable molecule, (II) coupling the linker to (i) an amino acid of a peptide and (ii) the polymerizable molecule to generate an amino acid-linker complex, and (III) cleaving the amino acid, thereby generating the modified amino acid. In some embodiments, the linker is unifunctional, bifunctional, trifunctional, quadrifunctional, or polyfunctional. In some embodiments, the method further comprises (IV) coupling the amino acid-linker complex or the modified amino acid to a capture moiety. In some embodiments, the capture moiety is coupled to a substrate. In some embodiments, the capture moiety is coupled to the peptide. In some embodiments, the capture moiety is coupled to a C-terminus of the peptide. In some embodiments, (IV) is performed prior to (III). In some embodiments, the method further comprises derivatizing the modified amino acid. In some embodiments, the method further comprises repeating (I)-(III) on the peptide. In some embodiments, the method further comprises repeating (IV) on the peptide. In some embodiments, the repeating yields a stacked plurality of modified amino acids comprising a plurality of stacked polymerizable molecules. In some embodiments, the stacked plurality of modified amino acids comprises a first amino acid and a second amino acid, wherein the first amino acid and the second amino acid are spaced along the plurality of stacked polymerizable molecules at a distance greater than 50 nanometers.
[0014] In some embodiments, (a) comprises providing a fluidic device, and loading the modified amino acid in the fluidic device. In some embodiments, the fluidic device is a microfluidic device. In some embodiments, the method further comprises linearizing the polymerizable molecule in the fluidic device. In some embodiments, the linearizing is performed using an electric field. In some embodiments, the linearizing is performed using shear stress. In some embodiments, the method further comprises providing a plurality of modified amino acids, including the modified amino acid, and loading the plurality of modified amino acids in the fluidic device. In some embodiments, the fluidic device comprises a surface, wherein the surface comprises a plurality of nucleic acid anchor molecules coupled thereto, wherein the plurality of nucleic acid anchor molecules is configured to couple to the plurality of modified amino acids. In some embodiments, the polymerizable molecule comprises a nucleic acid molecule that is configured to couple to a nucleic acid anchor molecule of the plurality of nucleic acid anchor molecules via hybridization.
[0015] In some embodiments, the binding agent comprises an antibody, an antibody fragment, a nanobody, an aptamer, a peptide, a polymer, an inorganic compound, or a small molecule.
[0016] In some embodiments, the polymerizable molecule comprises a DNA molecule. In some embodiments, the DNA molecule comprises at least 150 nucleotides. In some embodiments, the DNA molecule has a length of at least 50 nanometers. In some embodiments, the DNA molecule comprises an adapter sequence. In some embodiments, the adapter sequence is configured to couple to an anchor sequence of a substrate.
[0017] In another aspect, provided herein is a method for processing a DNA molecule, comprising: (a) attaching a first sequence of the DNA molecule to a substrate; (b) linearizing the DNA molecule adjacent to the substrate; and (c) attaching a second sequence of the DNA molecule to the substrate.
[0018] In some embodiments, the linearizing is performed using shear stress or an electric field.
[0019] In some embodiments, the DNA molecule is part of a modified amino acid, wherein the modified amino acid comprises an amino acid or derivative thereof. In some embodiments, the method further comprises detecting the modified amino acid. In some embodiments, the detecting comprises imaging. In some embodiments, the imaging is performed using super-resolution microscopy. In some embodiments, the imaging is performed using dSTORM. In some embodiments, the method further comprises coupling a binding agent to the modified amino acid, wherein the binding agent comprises a detectable label. In some embodiments, the binding agent comprises an antibody, an antibody fragment, a nanobody, an aptamer, a peptide, a polymer, an inorganic compound, or a small molecule. In some embodiments, the detectable label comprises a fluorophore or quantum dot. In some embodiments, the detectable label comprises a fluorescent label, a FRET label, a chemiluminescence label, an electrochemiluminescence label, a bioluminescence label, a phosphorescence label, or a label that generates light through other types of reactions or stimulations.
[0020] In some embodiments, the substrate comprises a surface of a fluidic device. In some embodiments, the fluidic device is a microfluidic device. In some embodiments, the method further comprises providing a plurality of DNA molecules, including the DNA molecule, and loading the plurality of DNA molecules in the fluidic device. In some embodiments, the DNA molecule is attached to the substrate using hybridization. In some embodiments, the plurality of DNA molecules is positioned adjacent to the surface such that a pitch between DNA molecules of the plurality of DNA molecules is about 1 micron.
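A pitch of about 1 micron between surface-positioned DNA molecules translates directly into an areal density and a count per imaging field. The sketch below assumes, for illustration, a square lattice and a hypothetical 100 µm × 100 µm field of view; neither the lattice geometry nor the field size comes from the source. A 1 µm pitch is also several times larger than the roughly 200-300 nm diffraction limit of visible-light microscopy, so neighboring molecules are well separated even before super-resolution imaging.

```python
import math


def lattice_density_per_um2(pitch_um: float) -> float:
    """Molecules per square micron for a square lattice with the given pitch (assumed geometry)."""
    return 1.0 / pitch_um**2


def molecules_in_field(width_um: float, height_um: float, pitch_um: float = 1.0) -> int:
    """Lattice sites in a width x height field, one site per pitch x pitch cell."""
    return math.floor(width_um / pitch_um) * math.floor(height_um / pitch_um)
```

With a 1 µm pitch, the density is 1 molecule per µm², and the hypothetical 100 µm × 100 µm field holds about 10,000 molecules; halving the pitch to 0.5 µm quadruples the density.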
[0021] In some embodiments, the DNA molecule comprises at least 150 nucleotides.
[0022] In some embodiments, the DNA molecule has a length of at least 50 nanometers.
[0023] In some embodiments, the DNA molecule comprises an adapter sequence. In some embodiments, the adapter sequence is configured to anchor the DNA molecule to a complementary sequence of a flow cell.
[0024] In some embodiments, the DNA molecule is a modified DNA molecule.
[0025] In yet another aspect, provided herein is a composition comprising: a modified amino acid, wherein the modified amino acid comprises a polymerizable molecule; and a binding agent bound to the modified amino acid, wherein the binding agent comprises a detectable label.
[0026] In some embodiments, the detectable label comprises a fluorophore or quantum dot. In some embodiments, the detectable label comprises a fluorescent label, a FRET label, a chemiluminescence label, an electrochemiluminescence label, a bioluminescence label, a phosphorescence label, or a label that generates light through other types of reactions or stimulations. In some embodiments, the composition further comprises a substrate, wherein the substrate is bound to the modified amino acid. In some embodiments, the polymerizable molecule is linearized adjacent to the substrate.
[0027] In yet another aspect, provided herein is a method for characterizing a modified amino acid, comprising: (a) providing the modified amino acid, wherein the modified amino acid comprises a polymerizable molecule; and (b) using a Raman spectroscopy-associated high-resolution imaging technique to determine the identity of the modified amino acid.
[0028] In some embodiments, the imaging step (b) is performed using label-free detection of the modified amino acid. In some embodiments, the modified amino acid generates intrinsic vibrational modes in the Raman spectrum. In some embodiments, the intrinsic vibrational modes are enhanced or modulated by specialized methods. In some embodiments, the imaging step (b) is performed using a binding agent associated with the modified amino acid. In some embodiments, the binding agent comprises an antibody, an antibody fragment, a nanobody, an aptamer, a peptide, a polymer, an inorganic compound, or a small molecule. In some embodiments, the binding agent comprises a detectable label for Raman spectroscopy. In some embodiments, the detectable label comprises 4-Mercaptobenzoic Acid (4-MBA), 4-Aminothiophenol (4-ATP), Crystal Violet, Malachite Green, Nile Blue, 2-Naphthalenethiol, Methylene Blue, or any combination thereof. In some embodiments, the high-resolution imaging technique is super-resolution imaging. In some embodiments, the super-resolution imaging detects a single molecule of the modified amino acid when (b) is performed using label-free detection of the modified amino acid. In some embodiments, the super-resolution imaging detects a single molecule of the detectable label when (b) is performed using a binding agent associated with the modified amino acid. In some embodiments, a stacked plurality of modified amino acids comprises the modified amino acid and an additional modified amino acid, wherein the modified amino acid and the additional modified amino acid are spaced along the plurality of stacked polymerizable molecules at a distance greater than 50 nanometers.
[0029] Another aspect of the present disclosure provides a non-transitory computer readable medium comprising machine executable code that, upon execution by one or more computer processors, implements any of the methods above or elsewhere herein.
[0030] Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto. The computer memory comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.
[0031] Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
INCORPORATION BY REFERENCE
[0032] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.
BRIEF DESCRIPTION OF THE DRAWINGS
[0033] The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also “Figure” and “FIG.” herein), of which:
[0034] FIG. 1A schematically shows an example workflow for processing polymeric analyte molecules (e.g., peptides) described herein. FIG. 1B schematically shows another example workflow for processing polymeric analytes in solution or on a substrate. FIG. 1C shows an example schematic of a workflow for processing a polymeric analyte using multiple capture moieties. FIG. 1D schematically shows a processed polymeric analyte in a three-dimensional substrate. FIG. 1E schematically shows provision of a modified monomer or stacked plurality of modified monomers in a three-dimensional substrate. FIG. 1F schematically shows detection and identification of modified monomers derived from a polymeric analyte. FIG. 1G schematically shows detection and identification of modified monomers derived from a polymeric analyte using trifunctional linkers.
[0035] FIG. 2A schematically shows an exemplary linker for connecting polymerizable molecules to polymeric analytes. FIG. 2B schematically shows multiple generic structural designs of trifunctional linkers for connecting polymerizable molecules to polymeric analytes. FIG. 2C schematically shows specific trifunctional linkers for connecting polymerizable molecules to polymeric analytes.
[0036] FIG. 3 schematically shows a computer system that is programmed or otherwise configured to implement methods provided herein.
[0037] FIG. 4 shows exemplary data for a substrate comprising one or more capture moieties.
[0038] FIG. 5 shows exemplary data for detection of a modified amino acid, as described herein.
[0039] FIG. 6 shows additional exemplary data for detection of a modified amino acid, as described herein.
[0040] FIG. 7 shows additional exemplary data for detection of a modified amino acid, as described herein.
[0041] FIG. 8 schematically shows an example detection method of a modified amino acid.
[0042] FIG. 9 schematically shows an example detection method of a stacked plurality of modified amino acids.
[0043] FIG. 10 schematically shows an example detection method of a circularized stacked plurality of modified amino acids.
DETAILED DESCRIPTION
[0044] While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.
Definitions
[0045] Whenever the term “at least,” “greater than,” or “greater than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “at least,” “greater than,” or “greater than or equal to” applies to each of the numerical values in that series of numerical values. For example, greater than or equal to 1, 2, or 3 is equivalent to greater than or equal to 1, greater than or equal to 2, or greater than or equal to 3.
[0046] Whenever the term “no more than,” “less than,” or “less than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “no more than,” “less than,” or “less than or equal to” applies to each of the numerical values in that series of numerical values. For example, less than or equal to 3, 2, or 1 is equivalent to less than or equal to 3, less than or equal to 2, or less than or equal to 1.
[0047] References to “one embodiment,” “an embodiment,” “example embodiment,” “some embodiments,” “certain embodiments,” “various embodiments,” etc., indicate that the embodiment(s) of the disclosed technology so described may include a particular feature, structure, or characteristic, but not every embodiment necessarily includes the particular feature, structure, or characteristic. Further, repeated use of the phrase “in one embodiment” does not necessarily refer to the same embodiment, although it may.
[0048] Ranges may be expressed herein as from “about” or “approximately” or “substantially” one particular value and/or to “about” or “approximately” or “substantially” another particular value. When such a range is expressed, other exemplary embodiments include from the one particular value and/or to the other particular value. Further, the term “about” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within an acceptable standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to ±20%, preferably up to ±10%, more preferably up to ±5%, and more preferably still up to ±1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated, the term “about” is implicit and in this context means within an acceptable error range for the particular value.
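The percentage and fold readings of “about” reduce to simple interval arithmetic. The sketch below is an illustrative reading only: it treats a fractional tolerance (e.g., 0.20 for ±20%) as a symmetric band around the value, and interprets “within 2-fold” as lying between half and twice a positive reference value. The function names are hypothetical.

```python
def about_range(value: float, tolerance: float = 0.20) -> tuple:
    """Symmetric interval for 'about value' under a fractional tolerance (e.g., 0.20 for ±20%)."""
    delta = abs(value) * tolerance
    return (value - delta, value + delta)


def within_fold(value: float, reference: float, fold: float = 2.0) -> bool:
    """True if value lies within the given fold of reference (both assumed positive)."""
    ratio = value / reference
    return 1.0 / fold <= ratio <= fold
```

For instance, “about 50 nanometers” under the ±20% reading spans 40 to 60 nm, while under the 2-fold reading 90 nm falls within range of 50 nm and 120 nm does not.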
[0049] By “comprising” or “containing” or “including” is meant that at least the named compound, element, particle, or method step is present in the composition or article or method, but does not exclude the presence of other compounds, materials, particles, or method steps, even if the other such compounds, materials, particles, or method steps have the same function as what is named.
[0050] Throughout this description, various components may be identified as having specific values or parameters; however, these items are provided as exemplary embodiments. Indeed, the exemplary embodiments do not limit the various aspects and concepts of the present disclosure, as many comparable parameters, sizes, ranges, and/or values may be implemented. The terms “first,” “second,” and the like, and “primary,” “secondary,” and the like, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another.
[0051] As used herein, the term “protein” generally refers to a molecule comprising two or more amino acids joined by a peptide bond. A protein may also be referred to as a “polypeptide”, “oligopeptide”, or “peptide”. A protein can be a naturally occurring molecule or a synthetic molecule (e.g., an artificial protein, peptide, or enzyme). A protein may include one or more non-natural amino acids, modified amino acids, or non-amino acid linkers. A protein may contain D-amino acid enantiomers, L-amino acid enantiomers, or both. Amino acids of a protein may be modified naturally or synthetically, such as by post-translational modifications or by chemical modification. In some circumstances, different proteins may be distinguished from each other based on the different genes from which they are expressed in an organism, different primary sequence length, or different primary sequence composition. Proteins expressed from the same gene may nonetheless be different proteoforms, for example, being distinguished based on non-identical length, non-identical amino acid sequence, or non-identical post-translational modifications. Different proteins can be distinguished based on one or both of gene of origin and proteoform state.
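The distinction in [0051] between gene of origin and proteoform state can be represented as a simple record. The following Python sketch is hypothetical: the class, its field names, the gene label, and the sequence are illustrative assumptions. Two records share a gene when their gene fields match, and are the same proteoform only if their sequences and post-translational modifications also match.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# A PTM is recorded here as (1-based residue position, modification name).
PTM = Tuple[int, str]


@dataclass(frozen=True)
class Proteoform:
    """Illustrative record: gene of origin plus proteoform state (sequence and PTMs)."""
    gene: str
    sequence: str
    ptms: FrozenSet[PTM] = frozenset()


def same_gene(a: Proteoform, b: Proteoform) -> bool:
    return a.gene == b.gene


def same_proteoform(a: Proteoform, b: Proteoform) -> bool:
    # Frozen dataclasses compare field by field: gene, sequence, and PTM set must all match.
    return a == b


# Hypothetical proteoforms of one gene: unmodified, acetylated at K2, and C-terminally truncated.
unmodified = Proteoform("GENE_X", "MKTAYIAK")
acetylated = Proteoform("GENE_X", "MKTAYIAK", frozenset({(2, "acetylation")}))
truncated = Proteoform("GENE_X", "MKTAYI")

print(same_gene(unmodified, acetylated), same_proteoform(unmodified, acetylated))  # True False
```

Because the records are frozen, they are hashable and can be collected in a set to count distinct proteoforms.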
[0052] As used herein, the term “peptide” may refer to any short, single peptide chain. A peptide may be no more than about 100, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, 15, 10, 5, or less than about 5 amino acids in length. A peptide may have a known or unknown biological function or activity. Peptides can include natural, synthetic, modified, or degraded proteins or peptides, or a combination thereof. Peptides can include proteinogenic, natural, synthetic, or modified amino acids or amino acid residues, or a combination thereof.
[0053] As used herein, the term “single analyte” may refer to an analyte that is individually manipulated or distinguished from other analytes. A single analyte may comprise a biomolecule or a synthetic molecule. A single analyte may comprise a small molecule. A single analyte can be a single molecule (e.g., a single biomolecule such as a single protein, nucleic acid molecule, affinity reagent, lipid, carbohydrate, etc.), a single complex of two or more molecules (e.g., a multimeric protein having two or more separable subunits, a single protein attached to a nucleic acid molecule, or a single protein attached to an affinity reagent), a single particle, or the like. Reference herein to a “single analyte” in the context of a composition, system, or method herein does not necessarily exclude application of the composition, system, or method to multiple single analytes that are manipulated or distinguished individually, unless indicated contextually or explicitly to the contrary.
[0054] As used herein, “polypeptide” refers to two or more amino acids linked together by a peptide bond. The term “polypeptide” includes proteins that have a C-terminal end and an N-terminal end as generally known in the art and may be synthetic in origin or naturally occurring. As used herein, “at least a portion of the polypeptide” refers to two or more amino acids of the polypeptide. A polypeptide may comprise one or more peptides. Optionally, a portion of the polypeptide includes at least 1, 5, 10, 20, 30, or 50 amino acids, either consecutive or with gaps, of the complete amino acid sequence of the polypeptide, or the full amino acid sequence of the polypeptide.
[0055] As used herein, “affixed” refers to a connection between a polypeptide and a substrate such that at least a portion of the polypeptide and the substrate are held in physical proximity. The term “affixed” encompasses both indirect and direct connections, which may be reversible or irreversible; for example, the connection is optionally a covalent bond or a non-covalent bond.
[0056] As used herein, the term “sample” refers to a collected substance or material that comprises or is suspected to comprise one or more analytes of interest (e.g., biomolecules such as polypeptides). A sample may be modified for purposes such as storage or stability. A sample may be naturally occurring or synthetic. A sample may be processed to separate or remove unwanted fractions or impurities from the analyte(s) of interest. A sample may be enriched or purified. For example, a sample may comprise a fraction of a separation process (e.g., chromatography, fractionation, electrophoresis, etc.). Alternatively, a sample may not be subjected to processing that separates or removes any unwanted fractions or impurities from the analyte(s) of interest. A sample may be obtained from any suitable source or location, including from organisms, cells, tissues, cell preparations, cell-free compositions, and the environment (e.g., air, water, dirt, soil, agriculture, dust, sewage). A sample may be obtained from an organism or part of an organism, such as from a fluid, tissue, or cell. A sample may include biological and/or non-biological components. As used herein, the terms “biological sample” or “biological source” refer to a sample that is derived from a predominantly biological system or organism, such as one or more viral particles, cells (e.g., individualized cells), organelles (e.g., individualized organelles), tissues, bodily fluids, bone, cartilage, and exoskeleton.
A biological sample may comprise a prokaryotic cell (e.g., bacteria) or a eukaryotic cell (e.g., fungus, protist, algae, plant, animal). A biological sample may comprise a majority of biological material on a mass basis, excluding the weight of fluid within the sample. Biological samples may comprise one or more proteins, referred to herein as protein samples. Biological samples can be acquired from various sources, e.g., from a clinical patient sample, such as blood, serum, plasma, cerebrospinal fluid (CSF), saliva, mucosal secretions, sputum, urine, lymph, perspiration, vaginal fluid, semen, fecal matter, amniotic fluid, synovial fluid, fine needle aspirates, a tissue biopsy, a tumor biopsy, etc. A biological sample may be processed to purify and retain one or more biomolecules (e.g., proteins, nucleic acids, carbohydrates, lipids, glycoproteins, lipoproteins, metabolites, etc.) from the biological sample. A biological sample (e.g., a protein sample) may be derived from cultured cells, which may be treated or untreated. A biological sample (e.g., a protein sample) can also result from tissue specimens, such as biopsy samples, which may optionally be processed to liberate biomolecules (e.g., proteins) contained therein. Tissue samples may also be derived from in vivo specimens, including fresh, frozen, acute, and fixed tissues.
[0057] As used herein, the term “hydrogel” refers to a three-dimensional polymeric structure that is substantially insoluble in water but is capable of absorbing and retaining large quantities of water to form a substantially stable, often soft and pliable, structure. In some embodiments, water can penetrate between the polymer chains of a polymer network, causing swelling and the formation of a hydrogel. In some embodiments, hydrogels are superabsorbent (e.g., containing more than about 90% water) and can be composed of natural or synthetic polymers. Examples of hydrogels include, but are not limited to, hyaluronans, chitosans, agar, heparin, sulfate, cellulose, alginates (including alginate sulfate), collagen, dextrans (including dextran sulfate), pectin, carrageenan, polylysine, gelatins (including gelatin type A), agarose, (meth)acrylate-oligolactide-PEO-oligolactide-(meth)acrylate, PEO-PPO-PEO copolymers (Pluronics), poly(phosphazene), poly(methacrylates), poly(N-vinylpyrrolidone), PL(G)A-PEO-PL(G)A copolymers, poly(ethylene imine), polyethylene glycol (PEG)-thiol, PEG-acrylate, acrylamide, N,N'-bis(acryloyl)cystamine, PEG, polypropylene oxide (PPO), polyacrylic acid, poly(hydroxyethyl methacrylate) (PHEMA), poly(methyl methacrylate) (PMMA), poly(N-isopropylacrylamide) (PNIPAAm), poly(lactic acid) (PLA), poly(lactic-co-glycolic acid) (PLGA), polycaprolactone (PCL), poly(vinylsulfonic acid) (PVSA), poly(L-aspartic acid), poly(L-glutamic acid), bisacrylamide, diacrylate, diallylamine, triallylamine, divinyl sulfone, diethyleneglycol diallyl ether, ethyleneglycol diacrylate, polymethyleneglycol diacrylate, polyethyleneglycol diacrylate, trimethylolpropane trimethacrylate, ethoxylated trimethylol triacrylate, ethoxylated pentaerythritol tetraacrylate, and combinations thereof. A detailed description of suitable hydrogels may be found in U.S. Patent Publication US 2010/0055733, herein specifically incorporated by reference.
As used herein, the terms “hydrogel subunits” or “hydrogel precursors” mean hydrophilic monomers, prepolymers, or polymers that can be crosslinked, or “polymerized”, to form a three-dimensional (3D) hydrogel network. It is believed that fixing a biological specimen in the presence of hydrogel subunits crosslinks the components of the specimen to the hydrogel subunits, thereby securing molecular components in place and preserving the tissue architecture and cell morphology.
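The “more than about 90% water” criterion for a superabsorbent hydrogel in [0057] is commonly evaluated on a mass basis. A minimal Python sketch, assuming water content is defined as (swollen mass minus dry mass) divided by swollen mass; the function names and the example masses are hypothetical.

```python
SUPERABSORBENT_THRESHOLD = 0.90  # "more than about 90% water" per [0057]


def water_fraction(swollen_mass: float, dry_mass: float) -> float:
    """Mass fraction of water in a swollen gel (wet basis)."""
    if dry_mass <= 0 or swollen_mass < dry_mass:
        raise ValueError("require 0 < dry_mass <= swollen_mass")
    return (swollen_mass - dry_mass) / swollen_mass


def is_superabsorbent(swollen_mass: float, dry_mass: float,
                      threshold: float = SUPERABSORBENT_THRESHOLD) -> bool:
    """True if the gel's water fraction exceeds the chosen threshold."""
    return water_fraction(swollen_mass, dry_mass) > threshold


# Hypothetical gel: 0.5 g of dry polymer swells to 10.0 g.
print(round(water_fraction(10.0, 0.5), 3))  # 0.95
print(is_superabsorbent(10.0, 0.5))         # True
```

The threshold is a parameter because “about” 90% is itself a tolerance-bearing value under [0048].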
[0058] As used herein, the terms “antibody” and “immunoglobulin” may generally refer to proteins that can recognize and bind to a specific antigen. An antibody or immunoglobulin may refer to an antibody isotype, fragments of antibodies including, but not limited to, Fab, Fv, scFv, and Fd fragments, chimeric antibodies, humanized antibodies, single-chain antibodies, and fusion proteins including an antigen-binding portion of an antibody and a non-antibody protein. The antibodies may be detectably labeled, e.g., with a fluorophore, radioisotope, enzyme (e.g., a peroxidase) that generates a detectable product, fluorescent protein, nucleic acid barcode sequence, and the like. The antibodies may be further conjugated to other moieties, such as members of specific binding pairs, e.g., biotin (a member of the biotin-avidin specific binding pair), and the like. Also encompassed by the terms are nanobodies, Fab', Fv, F(ab')2, scFv, and other antibody fragments that retain specific binding to antigen. Antibodies may exist in a variety of other forms including, for example, Fv, Fab, and (Fab)2, diabodies, monobodies, single domain antibodies (sdAb), as well as bi-functional (i.e., bi-specific, e.g., bi-specific T-cell engager) hybrid antibodies (e.g., Lanzavecchia et al., Eur. J. Immunol. 17, 105 (1987)) and in single chains (e.g., Huston et al., Proc. Natl. Acad. Sci. U.S.A., 85, 5879-5883 (1988) and Bird et al., Science, 242, 423-426 (1988), which are incorporated herein by reference). (See, generally, Hood et al., Immunology, Benjamin, N.Y., 2nd ed. (1984), and Hunkapiller and Hood, Nature, 323, 15-16 (1986), which are herein incorporated by reference.)
[0059] “Binding” or “coupling” as used herein generally refers to a covalent or non-covalent interaction between two molecules (referred to herein as “binding partners”, e.g., a substrate and an enzyme or an antibody and an epitope). Binding between binding partners may be specific or non-specific.
[0060] As used herein, “specifically binds” or “binds specifically” generally refers to an interaction between binding partners (e.g., a binding partner and a cognate molecule) such that the binding partners bind to one another, but do not bind to other molecules that may be present in the environment (e.g., in a biological sample, in tissue, in an in vitro assay) under a set of conditions. A specific binding interaction may entail a binding partner that binds to a cognate molecule. The specific binding interaction may entail the binding of the binding partner to its cognate molecule at a significantly or substantially higher level, or with greater affinity, as compared to the binding of the binding partner to a non-cognate molecule. A specific binding interaction may entail a first binding partner that has greater selectivity of binding to the cognate molecule as compared to a non-cognate molecule.
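One common way to quantify “greater affinity” as used in [0060] is the equilibrium dissociation constant K_d. Under a simple 1:1 binding model with the binding partner in large excess (an assumption not stated in the disclosure), the fraction of target occupied at equilibrium is [L] / ([L] + K_d). The Python sketch below compares a cognate and a non-cognate molecule; the function name, concentration, and K_d values are hypothetical.

```python
def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Equilibrium occupancy for 1:1 binding with ligand in excess: L / (L + Kd)."""
    if ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("ligand must be >= 0 and Kd must be > 0")
    return ligand_nM / (ligand_nM + kd_nM)


# Hypothetical binding agent at 10 nM: Kd of 1 nM for its cognate molecule,
# 1000 nM (1 uM) for a non-cognate molecule.
agent_nM = 10.0
cognate = fraction_bound(agent_nM, kd_nM=1.0)
non_cognate = fraction_bound(agent_nM, kd_nM=1000.0)
print(f"cognate {cognate:.3f}, non-cognate {non_cognate:.3f}")  # cognate 0.909, non-cognate 0.010
```

With a 1000-fold difference in K_d, the cognate molecule is about 91% occupied while the non-cognate molecule is about 1% occupied under these assumed conditions.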
[0061] The terms “nucleic acid”, “nucleic acid molecule”, “oligonucleotide” and “polynucleotide” may be used interchangeably herein and generally refer to a polymeric form of naturally occurring or synthetic nucleotides, or analogs thereof, of any length. A nucleic acid molecule may comprise one or more deoxyribonucleotides, deoxynucleotide triphosphates, dideoxynucleotide triphosphates, deoxynucleotide hexaphosphates, dideoxynucleotide hexaphosphates, ribonucleotides, hexitol nucleotides, cyclohexane nucleotides, or analogs or combinations thereof. A nucleic acid molecule may comprise, e.g., DNA, RNA, HNA, CeNA, and modified forms thereof. A nucleic acid molecule may comprise nucleotides that are linked by phosphodiester bonds. A nucleic acid molecule may have any two- or three-dimensional structure and may perform any function, known or unknown. A nucleic acid molecule may be single stranded, double stranded, or partially double stranded. Non-limiting examples of polynucleotides include a gene, a gene fragment, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, noncoding RNA, small interfering RNA, short hairpin RNA, micro RNA, scaRNA, ribozymes, riboswitches, viral RNA, complementary DNA (cDNA), cosmid DNA, mitochondrial DNA, chromosomal or genomic DNA, viral DNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, control regions, isolated RNA of any sequence, nucleic acid probes, nucleic acid adapters, and primers. The nucleic acid molecule may be linear, circular, or any other geometry.
Examples of polynucleotide analogs include, but are not limited to, xeno nucleic acid (XNA), bridged nucleic acid (BNA), glycol nucleic acid (GNA), hexitol nucleic acid (HNA), cyclohexane nucleic acid (CeNA), peptide nucleic acids (PNAs), γPNAs, morpholino polynucleotides, locked nucleic acids (LNAs), threose nucleic acid (TNA), 2'-O-methyl polynucleotides, 2'-O-alkyl ribosyl substituted polynucleotides, phosphorothioate polynucleotides, and boranophosphate polynucleotides. A polynucleotide analog may possess purine or pyrimidine analogs, including, for example, 7-deaza purine analogs, 8-halopurine analogs, 5-halopyrimidine analogs, inverted bases, or universal base analogs that can pair with any base, including hypoxanthine, nitroazoles, isocarbostyril analogues, azole carboxamides, and aromatic triazole analogues, or base analogs with additional functionality, such as a biotin moiety for affinity binding.
[0062] As used herein, the term “amino acid” generally refers to an organic compound that combines with others to form a protein or peptide. An amino acid generally comprises an amine group, a carboxylic acid group, and a side chain specific to each amino acid, and serves as a monomeric subunit of a peptide. An amino acid may include the 20 standard, naturally occurring, or canonical amino acids as well as non-standard or non-canonical amino acids. The standard, naturally occurring, or canonical amino acids include Alanine (A or Ala), Cysteine (C or Cys), Aspartic Acid (D or Asp), Glutamic Acid (E or Glu), Phenylalanine (F or Phe), Glycine (G or Gly), Histidine (H or His), Isoleucine (I or Ile), Lysine (K or Lys), Leucine (L or Leu), Methionine (M or Met), Asparagine (N or Asn), Proline (P or Pro), Glutamine (Q or Gln), Arginine (R or Arg), Serine (S or Ser), Threonine (T or Thr), Valine (V or Val), Tryptophan (W or Trp), and Tyrosine (Y or Tyr). An amino acid may be an L-amino acid or a D-amino acid. Non-standard amino acids may be modified amino acids, amino acid analogs, amino acid mimetics, non-standard proteinogenic amino acids, or non-proteinogenic amino acids that occur naturally or are chemically synthesized. Examples of non-standard amino acids include, but are not limited to, selenocysteine, pyrrolysine, N-formylmethionine, β-amino acids, homo-amino acids, proline and pyruvic acid derivatives, 3-substituted alanine derivatives, glycine derivatives, ring-substituted phenylalanine and tyrosine derivatives, linear core amino acids, and N-methyl amino acids.
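The one- and three-letter codes listed in [0062] are the standard IUPAC-IUBMB abbreviations. A small Python lookup table converts a one-letter sequence to three-letter form and rejects codes outside the 20 canonical amino acids; it is a convenience sketch, and the function name is hypothetical.

```python
# One-letter to three-letter codes for the 20 canonical amino acids listed in [0062].
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(sequence: str, sep: str = "-") -> str:
    """Convert e.g. 'MKT' to 'Met-Lys-Thr'; raise ValueError on non-canonical codes."""
    try:
        return sep.join(THREE_LETTER[code] for code in sequence.upper())
    except KeyError as err:
        raise ValueError(f"not a canonical one-letter code: {err.args[0]!r}") from None


print(to_three_letter("MKT"))  # Met-Lys-Thr
```

Non-standard residues such as selenocysteine (U) or pyrrolysine (O) are deliberately rejected here; extending the table would be needed to handle them.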
[0063] As used herein, the term “modified amino acid” may refer to the amino acid-linker complex, the amino acid-linker-polymerizable molecule complex, or derivatives thereof. The term “modified amino acid” may be used to refer to the amino acid-linker complex or the amino acid-linker-polymerizable molecule complex before or after cleavage. In some instances, the modified amino acid may refer to a portion of the amino acid-linker complex or the amino acid-linker-polymerizable molecule complex (e.g., just the amino acid portion, just the amino acid-linker complex portion, etc.).
[0064] As used herein, the term “amino acid type” generally refers to one of the standard, naturally occurring, or canonical amino acids, e.g., one member of the group consisting of Alanine (A or Ala), Cysteine (C or Cys), Aspartic Acid (D or Asp), Glutamic Acid (E or Glu), Phenylalanine (F or Phe), Glycine (G or Gly), Histidine (H or His), Isoleucine (I or Ile), Lysine (K or Lys), Leucine (L or Leu), Methionine (M or Met), Asparagine (N or Asn), Proline (P or Pro), Glutamine (Q or Gln), Arginine (R or Arg), Serine (S or Ser), Threonine (T or Thr), Valine (V or Val), Tryptophan (W or Trp), Tyrosine (Y or Tyr), derivatives thereof, and modified forms of any of the aforementioned amino acids. The term “amino acid type” may be used herein to distinguish a plurality of amino acids that comprise different side chain groups, rather than a plurality of amino acids that are identical (e.g., different positional amino acids of a single peptide that have the same side chain). An amino acid type may comprise a modified version of one of the standard, naturally occurring, or canonical amino acids, e.g., one bearing a post-translational modification, an epigenetic modification, or a chemical or enzymatic modification. In some instances, an amino acid type can include non-canonical amino acids.
[0065] As used herein, the term “post-translational modification” refers to modifications that occur on a peptide subsequent to translation. A post-translational modification may be a covalent modification or an enzymatic modification. Examples of post-translational modifications include, but are not limited to, acylation, acetylation, alkylation (including methylation), biotinylation, butyrylation, carbamylation, carbonylation, deamidation, deimination, diphthamide formation, disulfide bridge formation, eliminylation, flavin attachment, formylation, gamma-carboxylation, glutamylation, glycylation, glycosylation, glypiation, heme C attachment, hydroxylation, hypusine formation, iodination, isoprenylation, lipidation, lipoylation, malonylation, myristoylation, oxidation, transglutamination, palmitoylation, pegylation, phosphopantetheinylation, phosphorylation, prenylation, propionylation, retinylidene Schiff base formation, S-glutathionylation, S-nitrosylation, S-sulfenylation, S-adenosylation, selenation, succinylation, sulfination, ubiquitination, sumoylation, and C-terminal amidation. A post-translational modification includes modifications of the amino terminus and/or the carboxyl terminus of a peptide. Modifications of the terminal amino group include, but are not limited to, des-amino, N-lower alkyl, N-di-lower alkyl, and N-acyl modifications. Modifications of the terminal carboxy group include, but are not limited to, amide, lower alkyl amide, dialkyl amide, and lower alkyl ester modifications (e.g., wherein lower alkyl is C1-C4 alkyl). A post-translational modification also includes modifications, such as but not limited to those described above, of amino acids falling between the amino and carboxy termini. The term post-translational modification can also include peptide modifications that include one or more detectable labels. A post-translational modification may be naturally occurring or synthetic.
[0066] As used herein, the term “binding agent” refers to a molecule, e.g., a nucleic acid molecule, a peptide, a polypeptide, a protein, a carbohydrate, a synthetic molecule, or a small molecule, that binds to, associates with, unites with, recognizes, or combines with another molecule. The binding agent may bind to a macromolecule or a component or feature of a macromolecule. A binding agent may form a covalent or non-covalent association with a molecule, a macromolecule, or a component or feature of a macromolecule. A binding agent may also be a chimeric binding agent, composed of two or more types of molecules, such as a nucleic acid molecule-peptide chimeric binding agent, a carbohydrate-peptide chimeric binding agent, or a lipid-peptide chimeric binding agent. A binding agent may be a naturally occurring, synthetically produced, or recombinantly expressed molecule. A binding agent may bind to a single monomer or subunit of a polymeric analyte, such as a macromolecule (e.g., a single amino acid of a peptide), or bind to a plurality of linked subunits of a macromolecule (e.g., a di-peptide, tri-peptide, or higher order peptide of a longer peptide, polypeptide, or protein molecule). A binding agent may bind to a linear molecule or a molecule having a three-dimensional structure (also referred to as conformation). For example, an antibody binding agent may bind to a linear peptide, polypeptide, or protein, or bind to a conformational peptide, polypeptide, or protein. A binding agent may bind to an N-terminal peptide, a C-terminal peptide, or an intervening peptide of a peptide, polypeptide, or protein molecule. A binding agent may bind to an N-terminal amino acid, a C-terminal amino acid, or an intervening amino acid of a peptide molecule. A binding agent may preferentially bind to a chemically modified or labeled amino acid over a non-modified or unlabeled amino acid. For example, a binding agent may preferentially bind to an amino acid that has been modified with an acetyl moiety, guanyl moiety, dansyl moiety, PTC moiety, DNP moiety, SNP moiety, etc., over an amino acid that does not possess such a moiety. A binding agent may bind to a post-translational modification, either naturally occurring or synthetic, of a peptide molecule. A binding agent may exhibit selective binding to a component or feature of a macromolecule (e.g., a binding agent may selectively bind to one of the 20 possible natural amino acid residues and bind with very low affinity or not at all to the other 19 natural amino acid residues). A binding agent may exhibit less selective binding, where the binding agent is capable of binding a plurality of components or features of a macromolecule (e.g., a binding agent may bind with similar affinity to two or more different amino acid residues). A binding agent may comprise a tag, which may be coupled to the binding agent via a linker.
[0067] As used herein, the term “linker” generally refers to a molecule or moiety that is involved in joining two or more molecules. A linker may facilitate a covalent or non-covalent interaction of two or more molecules. A linker may be a crosslinker. The linker can be unifunctional, bifunctional, trifunctional, quadrifunctional, or polyfunctional. A linker can be or comprise a nucleotide, a nucleotide analog, an amino acid, a peptide, a polypeptide, or a non-nucleotide chemical moiety, such as an organic or inorganic compound. A linker may comprise a polymer, such as polyethylene glycol (PEG), polyethylene, polypropylene, polyvinyl chloride, polystyrene, or another organic or inorganic polymer. A linker may comprise one or more reactive ends, e.g., an amine-reactive group, a carboxyl-reactive group, a sulfhydryl-reactive group, a hydroxyl-reactive group, etc. Alternatively, a linker may not comprise a reactive end. In some examples, a linker may be used to join different molecule types, e.g., different biomolecule types such as a peptide with a nucleic acid molecule, a lipid with a peptide, a carbohydrate with a peptide, etc.; non-biomolecule types; or a biomolecule to a non-biomolecule. For example, a linker may be used to join a binding agent with a tag, a tag with a macromolecule (e.g., a peptide or nucleic acid molecule), a macromolecule with a solid support, a tag with a solid support, etc. A linker may join two molecules via an enzymatic reaction or a chemical reaction (e.g., click chemistry). A linker may join more than two molecules, e.g., via enzymatic or chemical reactions.
[0068] The term “conjugated” as used herein generally refers to a covalent or ionic interaction between two entities, e.g., molecules, compounds, or combinations thereof.
[0069] As used herein, the term “tag” generally refers to a molecule or moiety that is conjugated to a molecule. A tag may comprise a detectable label, e.g., a fluorophore or fluorescent protein, a radioactive isotope, an enzyme (e.g., a chromogenic or fluorescent protein, proteins that can catalyze chromogenic substrates), a mass tag, a hapten (e.g., biotin, digoxigenin, urushiol, fluorescein), or a vibrational or FTIR tag (e.g., an alkyne group). A tag may comprise a biomolecule, such as a nucleic acid molecule, a protein, a lipid, a carbohydrate, or a combination thereof. A tag may comprise one or more nucleic acid molecules, which may optionally encode information regarding the tag or the molecule onto which the tag is conjugated (e.g., a binding agent, such as an antibody). For example, a tag may comprise a nucleic acid barcode molecule. A tag may comprise an organic compound or an inorganic compound. As used herein, the term “tag” may also refer to a patterned sequence of signals, wherein signals appear, skip, or disappear at defined intervals, creating a recognizable marker or reference within the signal data. This patterned appearance may encompass periodic or non-periodic intervals between signals, selective omissions, or complex structured combinations of signal presence and absence that collectively form a unique, detectable signature. For example, a tag may comprise a nucleic acid barcode molecule that has a series of distinct vibrational signatures detectable by techniques such as Raman or FTIR spectroscopy.
[0070] As used herein, the term “barcode” generally refers to an identifying feature that may be used to distinguish similar items. A barcode may comprise a nucleic acid molecule of about 2 to about 30 bases.
A barcode may comprise a nucleic acid molecule of about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, or more bases, which may provide a unique identifier tag or origin information for a molecule (e.g., protein, polypeptide, peptide), a binding agent, a set of binding agents from a binding cycle, a sample molecule, a set of samples, molecules within a compartment (e.g., droplet, bead, partition, or separated location), macromolecules within a set of compartments, a fraction of macromolecules, a set of macromolecule fractions, a spatial region or set of spatial regions, a library of macromolecules, or a library of binding agents. A barcode can be an artificial sequence or a naturally occurring sequence including peptides, proteins, protein complexes, carbohydrates, and synthetic polymeric materials. In certain embodiments, each barcode within a population of barcodes is different. In other embodiments, a portion of barcodes in a population of barcodes is different, e.g., at least about 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, or 99% of the barcodes in a population of barcodes are different. A population of barcodes may be randomly generated or non-randomly generated. A population of barcodes may comprise error-correcting barcodes. Barcodes can be used to computationally deconvolute sequence reads derived from an individual molecule, sample, library, etc. Barcodes may comprise multiplexed information, e.g., arising from different samples, compartments, individual molecules, etc. A barcode can also be used for deconvolution of a collection of molecules that have been distributed into small compartments for enhanced mapping. For example, rather than mapping a peptide back to the proteome, the peptide can be mapped back to its originating protein molecule or protein complex, a sample or partition from which it originated, etc. A barcode may comprise any useful structural moiety or motif, e.g., hairpins, loop sequences, or spacers. Barcodes can comprise artificial or modified nucleic acids, e.g., locked nucleic acids (LNA), peptide nucleic acids (PNA), hexitol nucleic acids (HNA), cyclohexane nucleic acids (CeNA), or a combination thereof. Barcodes may comprise or be generated using a protein, e.g., TAL effector, Cas protein (e.g., Cas9), Argonaute, or coiled coils. A barcode may comprise any useful sequence, including repeat sequences (e.g., a poly-A, poly-T, poly-C, or poly-G region), or the barcode may comprise non-repeat sequences.
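Paragraph [0070] mentions error-correcting barcodes and computational deconvolution of reads. One widely used approach, assumed here rather than taken from the disclosure, is to design a barcode set with a large minimum Hamming distance and to assign each observed barcode to the unique whitelist entry within a small distance. The Python sketch below illustrates that idea; the whitelist sequences and all function names are hypothetical.

```python
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: List[str]) -> int:
    return min(hamming(a, b) for i, a in enumerate(barcodes) for b in barcodes[i + 1:])


def correctable_errors(barcodes: List[str]) -> int:
    # A code with minimum distance d can correct floor((d - 1) / 2) substitutions.
    return (min_pairwise_distance(barcodes) - 1) // 2


def assign(observed: str, whitelist: List[str], max_dist: int = 1) -> Optional[str]:
    """Return the unique closest whitelist barcode within max_dist, else None."""
    ranked = sorted((hamming(observed, bc), bc) for bc in whitelist)
    best_dist, best = ranked[0]
    if best_dist > max_dist:
        return None
    if len(ranked) > 1 and ranked[1][0] == best_dist:
        return None  # ambiguous: two barcodes are equally close
    return best


def sequence_space(length: int) -> int:
    """Number of distinct DNA sequences of a given length (4 bases per position)."""
    return 4 ** length


WHITELIST = ["ACGTAC", "TGCATG", "GATCGA"]  # hypothetical 6-nt sample barcodes
print(correctable_errors(WHITELIST))  # 2
print(assign("ACGTAA", WHITELIST))    # ACGTAC (one substitution corrected)
print(sequence_space(12))             # 16777216
```

The three example barcodes differ from each other at all six positions, so up to two substitutions could in principle be corrected; `max_dist` is kept at 1 by default as a conservative choice.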
[0071] As used herein, a “sample barcode”, also referred to as “sample tag” generally refers to a barcode molecule comprising identifying information of a sample from which a barcoded molecule derives.
[0072] As used herein, a “spatial barcode” generally refers to a barcode molecule comprising identifying information of a region of a 2-D or 3-D sample (e.g., a tissue section) from which a molecule originates or is derived. Spatial barcodes may be used for molecular pathology on tissue sections. A spatial barcode may allow for multiplex sequencing of a plurality of samples or libraries from tissue section(s).
[0073] As used herein, a “temporal barcode” generally refers to a barcode molecule comprising time-based information relating to the barcoded molecule. The types of time-based data encoded in a temporal barcode can include information such as a lifetime of a barcoded molecule, a time of collection of a sample, a time or duration since the beginning of an experiment or induction with a stimulus, information on the age of a cell or tissue, a sequence of interactions between molecules, a time or cycle or round (e.g., of an iterative process) in which the barcode
molecule is provided, among others. It is possible for different types of barcodes (e.g., spatial, temporal, cell-specific) to be combined in one multiplexed barcode.
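Paragraphs [0071] to [0073] describe sample, spatial, and temporal barcodes and note that several types can be combined in one multiplexed barcode. The Python sketch below shows one possible way to do that, assuming a fixed-length segment layout and a base-4 encoding of a cycle number for the temporal segment. The layout, segment lengths, sequences, and function names are hypothetical and are not taken from the disclosure.

```python
from typing import Dict

BASES = "ACGT"

# Hypothetical layout: (segment name, length in nucleotides), read 5' to 3'.
LAYOUT = (("spatial", 8), ("temporal", 4), ("sample", 6))


def encode_index(n: int, width: int) -> str:
    """Encode a non-negative integer (e.g., a cycle number) as a base-4 DNA string."""
    if not 0 <= n < 4 ** width:
        raise ValueError(f"{n} does not fit in {width} nt")
    digits = []
    for _ in range(width):
        n, r = divmod(n, 4)
        digits.append(BASES[r])
    return "".join(reversed(digits))


def decode_index(seq: str) -> int:
    n = 0
    for base in seq:
        n = n * 4 + BASES.index(base)  # str.index raises ValueError on unknown bases
    return n


def build_barcode(segments: Dict[str, str]) -> str:
    """Concatenate segments in LAYOUT order, checking each segment's length."""
    parts = []
    for name, width in LAYOUT:
        if len(segments[name]) != width:
            raise ValueError(f"segment {name!r} must be {width} nt")
        parts.append(segments[name])
    return "".join(parts)


def parse_barcode(barcode: str) -> Dict[str, str]:
    """Split a multiplexed barcode back into its named segments."""
    if len(barcode) != sum(width for _, width in LAYOUT):
        raise ValueError("unexpected barcode length")
    out, start = {}, 0
    for name, width in LAYOUT:
        out[name] = barcode[start:start + width]
        start += width
    return out


bc = build_barcode({"spatial": "ACGTACGT", "temporal": encode_index(5, 4), "sample": "GATTCA"})
print(bc)                                           # ACGTACGTAACCGATTCA
print(decode_index(parse_barcode(bc)["temporal"]))  # 5
```

A 4-nt temporal segment in this scheme can distinguish 256 cycles; in practice, segments would also need error tolerance of the kind sketched for [0070].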
[0074] As used herein, the terms “fluorescent label,” “fluorescent tag,” and “fluorophore” refer to a signaling moiety that conveys information through the fluorescence absorption and/or emission properties of one or more molecules. Exemplary fluorescent properties include fluorescence intensity, fluorescence lifetime, emission spectrum characteristics, and energy transfer. Fluorophores available for post-synthetic attachment include, but are not limited to, ALEXA FLUOR™ 350, ALEXA FLUOR™ 532, ALEXA FLUOR™ 546, ALEXA FLUOR™ 568, ALEXA FLUOR™ 594, ALEXA FLUOR™ 647, BODIPY 493/503, BODIPY FL, BODIPY R6G, BODIPY 530/550, BODIPY TMR, BODIPY 558/568, BODIPY 564/570, BODIPY 576/589, BODIPY 581/591, BODIPY 630/650, BODIPY 650/665, Cascade Blue, Cascade Yellow, Dansyl, lissamine rhodamine B, Marina Blue, Oregon Green 488, Oregon Green 514, Pacific Blue, rhodamine 6G, rhodamine green, rhodamine red, tetramethyl rhodamine, Texas Red (available from Molecular Probes, Inc., Eugene, Oreg.), Cy2, Cy3.5, Cy5.5, and Cy7 (Amersham Biosciences, Piscataway, N.J.). FRET tandem fluorophores may also be used, including, but not limited to, PerCP-Cy5.5, PE-Cy5, PE-Cy5.5, PE-Cy7, PE-Texas Red, APC-Cy7, PE-Alexa dyes (610, 647, 680), and APC-Alexa dyes.
Examples of fluorescent nucleotide analogues readily incorporated into nucleotide and/or polynucleotide sequences include, but are not limited to, Cy3-dCTP, Cy3-dUTP, Cy5-dCTP, Cy5-dUTP (Amersham Biosciences, Piscataway, N.J.), fluorescein-12-dUTP, tetramethylrhodamine-6-dUTP, TEXAS RED™-5-dUTP, CASCADE BLUE™-7-dUTP, BODIPY™ FL-14-dUTP, BODIPY TMR-14-dUTP, BODIPY™ TR-14-dUTP, RHODAMINE GREEN™-5-dUTP, OREGON GREEN™ 488-5-dUTP, TEXAS RED™-12-dUTP, BODIPY™ 630/650-14-dUTP, BODIPY™ 650/665-14-dUTP, ALEXA FLUOR™ 488-5-dUTP, ALEXA FLUOR™ 532-5-dUTP, ALEXA FLUOR™ 568-5-dUTP, ALEXA FLUOR™ 594-5-dUTP, ALEXA FLUOR™ 546-14-dUTP, fluorescein-12-UTP, tetramethylrhodamine-6-UTP, TEXAS RED™-5-UTP, mCherry, CASCADE BLUE™-7-UTP, BODIPY™ FL-14-UTP, BODIPY TMR-14-UTP, BODIPY™ TR-14-UTP, RHODAMINE GREEN™-5-UTP, ALEXA FLUOR™ 488-5-UTP, and ALEXA FLUOR™ 546-14-UTP (Molecular Probes, Inc., Eugene, Oreg.). For exemplary methods for custom synthesis of nucleotides having other fluorophores, see Henegariu et al. (2000) Nature Biotechnol. 18:345. Further examples of fluorescent labels, and of nucleotides and/or polynucleotides conjugated to such fluorescent labels, include those described in, for example, Haugland, Handbook of Fluorescent Probes and Research Chemicals, Ninth Edition (Molecular Probes, Inc., Eugene, 2002); Keller and Manak, DNA Probes, 2nd Edition (Stockton Press, New York, 1993); Eckstein, editor, Oligonucleotides and Analogues: A Practical Approach (IRL Press, Oxford, 1991); and Wetmur, Critical Reviews in Biochemistry and Molecular Biology, 26:221-259 (1991). In some embodiments, exemplary techniques and methodologies applicable to the provided embodiments include those described in, for example, U.S. Pat. Nos. 4,757,141, 5,151,507, and 5,091,519. In some embodiments, one or more fluorescent dyes are used as labels for labeled target sequences, for example, as described in U.S. Pat. No. 5,188,934 (4,7-dichlorofluorescein dyes); U.S. Pat. No. 5,366,860 (spectrally resolvable rhodamine dyes); U.S. Pat. No. 5,847,162 (4,7-dichlororhodamine dyes); U.S. Pat. No. 4,318,846 (ether-substituted fluorescein dyes); U.S. Pat. No. 5,800,996 (energy transfer dyes); U.S. Pat. No. 5,066,580 (xanthene dyes); and U.S. Pat. No. 5,688,648 (energy transfer dyes). Labeling can also be carried out with quantum dots, as described in U.S. Pat. Nos. 6,322,901, 6,576,291, 6,423,551, 6,251,303, 6,319,426, 6,426,513, 6,444,143, 5,990,479, 6,207,392, US 2002/0045045, and US 2003/0017264. All references are herein incorporated by reference in their entireties.
[0075] As used herein, the term “FRET” refers to fluorescence resonance energy transfer, a process in which chemical moieties (e.g., fluorophores) transfer energy among themselves, or from a fluorophore to a non-fluorophore (e.g., a quencher molecule). In some circumstances, FRET involves an excited donor fluorophore transferring energy to an acceptor fluorophore via a short-range (e.g., about 10 nm or less) dipole-dipole interaction. In other circumstances, FRET involves a loss of fluorescence energy from a donor and an increase in fluorescence in an acceptor fluorophore. In still other forms of FRET, energy can be exchanged from an excited donor fluorophore to a non-fluorescing molecule (e.g., a quenching molecule). FRET includes Time-Resolved FRET (TR-FRET), which combines the use of long-lived fluorophores and time-resolved detection (a delay between excitation and emission detection) to minimize fluorescent interference due to any inherent fluorescence of, e.g., target molecules or target-selective binding agents (see, e.g., Klostermeier et al. (2001-2002) Biopolymers 61(3):159-79). FRET is known to those of skill in the art and has been described (see Stryer et al., 1978, Ann. Rev. Biochem., 47:819; Selvin, 1995, Methods Enzymol., 246:300; Orpana, 2004, Biomol. Eng. 21, 45-50; Olivier, 2005, Mutat. Res. 573, 103-110, each of which is incorporated herein by reference in its entirety).
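As a non-limiting illustration of why FRET is described as short-range, the standard Förster relation gives the transfer efficiency for a single donor-acceptor pair as E = 1 / (1 + (r/R0)^6), where r is the donor-acceptor distance and R0 is the Förster radius at which E = 0.5. The sketch below assumes a hypothetical R0 of 5 nm; actual values depend on the specific dye pair and environment.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)**6) for one donor-acceptor pair."""
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("distance must be non-negative and R0 must be positive")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


# Hypothetical Förster radius; real values are dye-pair specific.
R0 = 5.0
for r in (2.5, 5.0, 10.0):
    print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r, R0):.3f}")
```

Because of the sixth-power dependence, efficiency drops from above 0.98 at half the Förster radius to below 0.02 at twice it, consistent with the roughly 10 nm working range noted above.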
[0076] As used herein, “chemiluminescence” means light resulting from a chemical reaction in which one or more reagents of the reaction undergo a chemical change. The term “chemiluminescence” is intended to encompass electrochemiluminescence (e.g., a chemiluminescent reaction that occurs subsequently to an electrochemical reaction), bioluminescence (e.g., light resulting from biological reactions), phosphorescence (e.g., a type of photoluminescence similar to fluorescence but involving a longer-lived excited state, resulting in delayed light emission after the excitation source is removed), as well as light resulting from other types of reactions. Non-limiting examples of chemiluminescent reagents include luminol, isoluminol, acridinium esters, lucigenin, peroxyoxalates, firefly luciferin, coelenterazine, dioxetanes, benzoyl peroxide, oxalyl chloride derivatives, 1,2-dioxetanone, and pyrogallol derivatives. Non-limiting examples of electrochemiluminescent reagents include luminol, acridan ester, ruthenium, ruthenium chelate, ruthenium tribipyridine, and NHS ester. Non-limiting examples of bioluminescence reagents include firefly luciferin, Renilla luciferin (coelenterazine), bacterial luciferin, vargulin, and Cypridina luciferin. Non-limiting examples of phosphorescence reagents include ruthenium(II) complexes, iridium(III) complexes, europium(III) complexes, terbium(III) complexes, erythrosin, eosin, halogenated aromatic compounds (e.g., bromo- or iodo-substituted benzenes), strontium aluminate, zinc sulfide, calcium tungstate, anthracene derivatives, and carbazole compounds.
[0077] A “nucleotide sequence” according to the present invention may include any polymer or oligomer of nucleotides such as pyrimidine and purine bases, such as cytosine, thymine, and uracil, and adenine and guanine, respectively, and combinations thereof. The nucleotide sequence may comprise any deoxyribonucleotide, ribonucleotide, hexitol-nucleotide, cyclohexanenucleotide, peptide nucleic acid component, and any chemical variants thereof, such as methylated, 7-deaza purine analogs, 8-halopurine analogs, hydroxymethylated or glycosylated forms of these bases, and the like. The polymers or oligomers may be heterogeneous or homogeneous in composition and may be isolated from naturally occurring sources or may be artificially or synthetically produced. In addition, a nucleotide sequence may be DNA, RNA, HNA, CeNA or a mixture thereof, and may exist permanently or transitionally in single-stranded or double-stranded form, including homoduplex, heteroduplex, and hybrid states.
[0078] A “nucleic acid molecule” according to the present invention may include any polymer or oligomer of nucleotides such as pyrimidine and purine bases, such as cytosine, thymine, and uracil, and adenine and guanine, respectively, and combinations thereof. The nucleic acid molecule may comprise any deoxyribonucleotide, ribonucleotide, hexitol-nucleotide, cyclohexanenucleotide, peptide nucleic acid component, and any chemical variants thereof, such as methylated, 7-deaza purine analogs, 8-halopurine analogs, hydroxymethylated or glycosylated forms of these bases, and the like. The polymers or oligomers may be heterogeneous or homogeneous in composition and may be isolated from naturally occurring sources or may be artificially or synthetically produced. A nucleic acid molecule may comprise DNA, RNA, HNA, CeNA or a mixture thereof, and may exist permanently or transitionally in single-stranded or double-stranded form, including homoduplex, heteroduplex, and hybrid states.
[0079] The terms “complementary” or “complementarity” refer to polynucleotides (i.e., sequences of nucleotides) related by Watson-Crick base-pairing rules. For example, the sequence “5'-AGT-3'” is complementary to the sequence “5'-ACT-3'”. Complementarity may be “partial,” in which only some of the nucleic acids' bases are matched according to the base-pairing rules, or there may be “complete” or “total” complementarity between the nucleic acids. The degree of complementarity between nucleic acid strands can significantly affect the efficiency and strength of hybridization between nucleic acid strands under defined conditions.
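By way of a non-limiting illustration, the “5'-AGT-3'” / “5'-ACT-3'” example can be checked by taking the reverse complement of one strand, and partial complementarity can be expressed as the fraction of matched positions. The helper names below (`reverse_complement`, `fraction_complementary`) are hypothetical, and the sketch only handles equal-length DNA strands aligned without gaps.

```python
# Watson-Crick complements for DNA (upper- and lower-case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the 5'->3' complementary strand of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def fraction_complementary(top: str, bottom: str) -> float:
    """Fraction of positions at which two equal-length strands, both written 5'->3',
    pair under Watson-Crick rules when aligned antiparallel without gaps."""
    if len(top) != len(bottom):
        raise ValueError("this sketch only compares equal-length strands")
    expected = reverse_complement(top).upper()
    matches = sum(a == b for a, b in zip(expected, bottom.upper()))
    return matches / len(top)


print(reverse_complement("AGT"))             # ACT
print(fraction_complementary("AGT", "ACT"))  # 1.0, complete complementarity
print(fraction_complementary("AGT", "ACA"))  # 2 of 3 positions, partial complementarity
```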
[0080] As used herein, the term “hybridization” is used in reference to the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (e.g., the strength of the association between the nucleic acids) are influenced by such factors as the degree of complementarity between the nucleic acids, the stringency of the conditions involved, and the melting temperature of the formed hybrid. Hybridization methods involve the annealing of one nucleic acid to another, complementary nucleic acid, e.g., based on Watson-Crick base pairing.
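As a rough, non-limiting illustration of how base composition affects the melting temperature mentioned above, the Wallace rule estimates Tm for short DNA oligonucleotides as 2 °C per A/T plus 4 °C per G/C. This is a rule of thumb assumed here for illustration only: it applies to short oligos under standard salt conditions and ignores stringency, mismatches, and nearest-neighbor effects. The function name `wallace_tm` is hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Rough Tm (deg C) of a short DNA oligo by the Wallace rule: 2*(A+T) + 4*(G+C).

    Only a rule of thumb for short oligos; it does not model salt, stringency,
    or mismatches."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


print(wallace_tm("ATGCATGC"))  # 24
print(wallace_tm("GGGCCCGG"))  # 32, higher because G-C pairs form three hydrogen bonds
```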
[0081] As used herein, the term “proteomics” generally refers to quantitative and/or qualitative analysis of the proteome within a sample, such as a biological sample, e.g., from cells, tissues, or bodily fluids. Proteomics may include the analysis of spatial distributions of proteins within a sample (e.g., cells and/or tissues). Proteomics may include studies of the dynamic state of the proteome, e.g., how one or more proteins change in time. A proteome may comprise multiple “-omes”, e.g., a kinome; a secretome; a receptome (e.g., GPCRome); an immunoproteome; a nutriproteome; a proteome subset defined by a post-translational modification (e.g., phosphorylation, ubiquitination, methylation, acetylation, glycosylation, oxidation, lipidation, and/or nitrosylation), such as a phosphoproteome (e.g., phosphotyrosine-proteome, tyrosine-kinome, and tyrosine-phosphatome), a glycoproteome, etc.; a proteome subset associated with a tissue or organ, a developmental stage, or a physiological or pathological condition; a proteome subset associated with a cellular process, such as cell cycle, differentiation (or de-differentiation), cell death, senescence, cell migration, transformation, or metastasis; or any combination thereof.
[0082] The terminal amino acid at one end of the peptide chain that has a free amino group may be referred to herein as the “N-terminal amino acid” (NTAA). The terminal amino acid at the other end of the chain that has a free carboxyl group may be referred to herein as the “C-terminal amino acid” (CTAA). The amino acids making up a peptide may be numbered in order, with the peptide being “n” amino acids in length. As used herein, in some instances, the NTAA may be considered the nth amino acid (also referred to herein as the “n NTAA”). In such cases, the next amino acid is the n-1 amino acid, then the n-2 amino acid, and so on down the length of the peptide from the N-terminal end to the C-terminal end. Alternatively, the CTAA may be considered the nth amino acid (also referred to herein as the “n CTAA”). In such cases, the next amino acid is the n-1 amino acid, then the n-2 amino acid, and so on down the length of the peptide from the C-terminal end to the N-terminal end. An NTAA, CTAA, or both may be modified or labeled with a chemical moiety.
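As a non-limiting illustration of the n, n-1, n-2 numbering convention, the sketch below labels each residue of a peptide written in one-letter code, counting from either terminus. It assumes the peptide string is written N-terminus to C-terminus; the function name `number_residues` is hypothetical.

```python
def number_residues(peptide: str, from_terminus: str = "N") -> list[tuple[str, str]]:
    """Label residues with the n, n-1, n-2, ... scheme, starting from the chosen terminus.

    Assumes the one-letter peptide string is written N-terminus -> C-terminus."""
    if from_terminus not in ("N", "C"):
        raise ValueError('from_terminus must be "N" or "C"')
    # Reverse the string when counting from the C-terminus.
    residues = peptide if from_terminus == "N" else peptide[::-1]
    return [(aa, "n" if i == 0 else f"n-{i}") for i, aa in enumerate(residues)]


print(number_residues("MKTA"))
# [('M', 'n'), ('K', 'n-1'), ('T', 'n-2'), ('A', 'n-3')]
print(number_residues("MKTA", from_terminus="C"))
# [('A', 'n'), ('T', 'n-1'), ('K', 'n-2'), ('M', 'n-3')]
```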
[0083] As used herein, the terms “determining,” “measuring,” “assessing,” and “assaying” are used interchangeably and include both quantitative and qualitative determinations.
[0084] As used herein, the term “unique molecular identifier” or “UMI” generally refers to a molecular barcode comprising indexing information. A UMI may comprise a nucleic acid molecule of about 3 to about 150 bases (3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, or 150 bases) in length. A UMI may provide a unique identifier tag for each molecule (e.g., peptide, binding agent, or nucleic acid molecule) that comprises or is coupled to a UMI. A UMI may comprise a random sequence (e.g., a random N-mer).
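As a non-limiting illustration of a random N-mer UMI, the sketch below draws a random sequence over A/C/G/T and reports the number of distinct sequences of that length (4^N), which bounds how many molecules can carry distinct random UMIs. The function names are hypothetical, and the fixed random seed is only for reproducibility.

```python
import random

BASES = "ACGT"


def random_umi(length: int, rng: random.Random) -> str:
    """Draw a random N-mer over A/C/G/T for use as a unique molecular identifier."""
    if not 3 <= length <= 150:
        raise ValueError("length outside the ~3-150 base range described above")
    return "".join(rng.choice(BASES) for _ in range(length))


def umi_space(length: int) -> int:
    """Number of distinct random N-mers of the given length (4**N)."""
    return 4 ** length


rng = random.Random(0)  # fixed seed so the draw is reproducible
umi = random_umi(10, rng)
print(umi, umi_space(10))  # a random 10-mer and 1048576 possible sequences
```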
[0085] As used herein, a “derivative” of a nucleic acid molecule generally refers to a nucleic acid molecule that is derived from an originating nucleic acid molecule. The derivative may have the same or substantially the same nucleotide sequence as the originating nucleic acid molecule, or the derivative may comprise a complement or partial complement of the originating nucleic acid molecule. A derivative may be the same type of nucleic acid (e.g., DNA or RNA) as the originating nucleic acid molecule, or the derivative may be a different type of nucleic acid (e.g., cDNA generated from an RNA molecule). A nucleic acid molecule derivative may display sequence identity with the originating nucleic acid molecule. The derivative nucleic acid molecule may also be subjected to additional processing relative to the originating nucleic acid molecule, e.g., chemical or enzymatic modification, splicing, ligation, polymerization, fragmentation, tagmentation (e.g., using a transposase), digestion, etc.
[0086] A derivative polypeptide or peptide may be derived from an originating polypeptide (or peptide). A derivative may comprise the same amino acid sequence as the originating polypeptide, or the sequence may be different. The derivative polypeptide may result from or be subjected to additional processing relative to the originating polypeptide, e.g., chemical or enzymatic modification. The derivative polypeptide may comprise one or more tags, nucleic acid molecules, barcode molecules, labels (e.g., detectable labels), fluorophores, probes, linkers, post-translational modifications, chemical protecting groups, or other chemical moieties.
[0087] As used herein, the term “compartment” or “partition” generally refers to a physical area or volume that separates or isolates a subset of molecules from a sample of molecules. For example, a compartment may separate an individual cell from other cells, or a subset of a sample’s proteome from the rest of the sample’s proteome. A compartment may be an aqueous compartment (e.g., microfluidic droplet), a solid compartment (e.g., picotiter well or microtiter well on a plate, tube, vial, gel bead), or a separated region on a surface. A compartment may comprise one or more beads to which macromolecules may be immobilized.
[0088] As used herein, the term “solid support”, “solid surface”, “solid substrate”, or “substrate” refers to any solid material, including porous and non-porous materials, to which a molecule can be associated directly or indirectly. The molecule may be associated with the substrate by covalent or non-covalent interactions, or a combination thereof. A substrate may be two-dimensional (e.g., planar surface) or three-dimensional (e.g., gel matrix or bead). A solid support may comprise, in non-limiting examples, a bead, a microbead, an array, a glass surface, a silicon surface, a plastic surface, a filter, a membrane, nylon or other polymer, a silicon wafer chip, a flow-through chip, a flow cell, a microfluidic device or chip or a surface thereof, a biochip including signal transducing electronics, a channel, a microtiter well, an ELISA plate, a spinning interferometry disc, a nitrocellulose membrane, a nitrocellulose-based polymer surface, a polymer matrix, a nanoparticle, or a microsphere. Materials for a solid support include but are not limited to acrylamide, agarose, cellulose, nitrocellulose, glass, gold, quartz, polystyrene, polyethylene vinyl acetate, polypropylene, polymethacrylate, polyethylene, polyethylene oxide, polysilicates, polycarbonates, Teflon, fluorocarbons, nylon, silicone rubber, polyanhydrides, polyglycolic acid, polylactic acid, polyorthoesters, functionalized silane, polypropylfumarate, collagen, glycosaminoglycans, poly amino acids, dextran, or any combination thereof. Solid supports further include thin films, membranes, bottles, dishes, fibers, woven fibers, shaped polymers such as tubes, particles, beads, microspheres, microparticles, or any combination thereof.
For example, when the solid surface is a bead, the bead can include, but is not limited to, a ceramic bead, a polystyrene bead, a polymer bead, a methylstyrene bead, an agarose bead, an acrylamide bead, a solid core bead, a porous bead, a paramagnetic bead, a glass bead, or a controlled pore bead. A bead may be spherical or irregularly shaped. A bead's size may range from nanometers, e.g., 1 nm, 10 nm, 100 nm, to millimeters, e.g., 1 mm. In certain embodiments, beads range in size from about 0.2 micron to about 200 microns, or from about 0.5 micron to about 5 microns. In some embodiments, beads can be about 1, 1.5, 2, 2.5, 2.8, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 μm in diameter. In certain embodiments, “a bead” solid support may refer to an individual bead or a plurality of beads. A solid support may assume any useful geometry, e.g., pyramid, cube, cylinder, helix, sphere, spheroid, rod, disc, arrow, spring, teardrop, prism, tetrapod, or any other useful geometry.
[0089] As used herein, “sequencing” generally refers to determining the order and identity of (A) nucleotides (base sequences) in a nucleic acid sample, e.g., DNA or RNA; or (B) amino acids in all or part of a polymer, such as a protein, peptide, or other multimeric molecule. Many techniques are available for nucleic acid sequencing, such as Sanger sequencing or high-throughput sequencing (HTS) technologies. Sanger sequencing may involve detection through (capillary) electrophoresis, in which up to 384 capillaries may be analyzed in one run. High-throughput sequencing involves the parallel sequencing of thousands, millions, or more sequences at once. HTS can be defined as next-generation sequencing (NGS), i.e., techniques based on solid-phase pyrosequencing, or as next-next-generation sequencing based on single-molecule real-time (SMRT) sequencing. HTS technologies are available, for example, from Roche, Illumina, and Applied Biosystems (Life Technologies). Further high-throughput sequencing technologies are described by and/or available from Helicos, Pacific Biosciences, Complete Genomics, Ion Torrent Systems, Oxford Nanopore Technologies, Nabsys, ZS Genetics, and GnuBio. Additional sequencing methods include Raman sequencing and infrared (IR) sequencing, which use Raman or IR spectroscopy to detect molecular vibrations associated with specific nucleotide or amino acid sequences, enabling label-free sequencing based on characteristic vibrational signatures. Tunneling current sequencing identifies base or amino acid sequences from electronic signal variations as nucleotides or amino acids pass through a nanoscale gap, detecting tunneling currents characteristic of each molecular component.
[0090] As used herein, “next generation sequencing” refers to high-throughput sequencing methods that allow the sequencing of millions to billions of molecules in parallel. Examples of next generation sequencing methods include sequencing by synthesis, sequencing by ligation, sequencing by hybridization, polony sequencing, ion semiconductor sequencing, nanopore sequencing, and pyrosequencing. By attaching primers to a solid substrate and a complementary sequence to a nucleic acid molecule, a nucleic acid molecule can be hybridized to the solid substrate via the primer, and multiple copies can then be generated in a discrete area on the solid substrate by amplification with a polymerase (these groupings are sometimes referred to as polymerase colonies or polonies). Consequently, during the sequencing process, a nucleotide at a particular position can be sequenced multiple times (e.g., hundreds or thousands of times); this depth of coverage is referred to as “deep sequencing.” Examples of high-throughput nucleic acid sequencing technology include platforms provided by Illumina, BGI, Qiagen, ThermoFisher, and Roche, including formats such as parallel bead arrays, sequencing by synthesis, sequencing by ligation, capillary electrophoresis, electronic microchips, “biochips,” microarrays, parallel microchips, single-molecule arrays, and zero-mode waveguide-based sequencing, some of which are reviewed by Service (Science 311:1544-1546, 2006).
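As a non-limiting illustration of why sequencing the same position many times is useful, the sketch below takes the repeated base calls at one position and reports the majority base, returning "N" when no base exceeds a threshold fraction of the calls. Simple majority voting and the `min_fraction` threshold are assumptions for illustration; practical pipelines typically also weight calls by base quality.

```python
from collections import Counter


def consensus_base(calls: list[str], min_fraction: float = 0.5) -> str:
    """Majority-vote base at one position from repeated reads.

    Returns "N" (ambiguous) when no base exceeds min_fraction of the calls."""
    if not calls:
        return "N"
    base, count = Counter(c.upper() for c in calls).most_common(1)[0]
    return base if count / len(calls) > min_fraction else "N"


reads_at_position = ["A"] * 97 + ["G"] * 2 + ["T"]  # 100x coverage with a few errors
print(consensus_base(reads_at_position))  # A
print(consensus_base(["A", "G"]))         # N (tie, no majority)
```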
[0091] As used herein, “analyzing” a macromolecule means to quantify, characterize, distinguish, or a combination thereof, all or a portion of the components of a molecule (e.g., a macromolecule, or a biological molecule such as a protein, amino acid, or nucleic acid molecule). For example, analyzing a peptide, polypeptide, or protein may comprise determining all or a portion of the amino acid sequence (contiguous or non-contiguous) of the peptide. Analyzing a macromolecule may include partial identification of a component of the macromolecule. For example, partial identification of amino acids in a protein sequence can identify an amino acid in the protein as belonging to a subset of possible amino acids. Analysis may be performed sequentially, e.g., beginning with analysis of the n NTAA and then proceeding to the next amino acid of the peptide (i.e., n-1, n-2, n-3, and so forth). In such instances, sequencing may be performed by cleavage of the n NTAA, thereby converting the n-1 amino acid of the peptide to an N-terminal amino acid (referred to herein as the “n-1 NTAA”). Similarly, analysis of a peptide may proceed from the C-terminus towards the N-terminus, with each round of cleavage from the C-terminus creating a new CTAA. Cleavage of the n CTAA converts the n-1 amino acid of the peptide to a C-terminal amino acid, referred to herein as the “n-1 CTAA”. Analyzing the peptide may also include determining the presence and frequency of post-translational modifications on the peptide, which may or may not include information regarding the sequential order of the post-translational modifications on the peptide. Analyzing the peptide may also include determining the presence and frequency of epitopes in the peptide, which may or may not include information regarding the sequential order or location of the epitopes within the peptide.
Analyzing the peptide may include combining different types of analysis, for example obtaining epitope information, amino acid sequence information, post-translational modification information, or any combination thereof.
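As a non-limiting illustration of sequential N-terminal analysis combined with partial identification, the sketch below repeatedly reads the current NTAA, reports only which subset of amino acids it belongs to, and then removes it so that the n-1 residue becomes the new NTAA. The three classes in `CLASSES` and the `readout` function are hypothetical stand-ins for a detection step; they are not the grouping or detection chemistry of this disclosure.

```python
# Hypothetical, mutually exclusive amino-acid classes; each readout reports only the class.
CLASSES = {
    "hydrophobic": set("AVILMFWC"),
    "polar": set("STNQGPY"),
    "charged": set("DEKRH"),
}


def readout(ntaa: str) -> str:
    """Return the name of the (hypothetical) class containing the amino acid."""
    for name, members in CLASSES.items():
        if ntaa in members:
            return name
    raise ValueError(f"unknown amino acid: {ntaa}")


def sequential_partial_read(peptide: str) -> list[str]:
    """Read the n NTAA, cleave it, and repeat so that the n-1 residue becomes the
    new NTAA, until the peptide (written N->C) is exhausted."""
    calls = []
    remaining = peptide
    while remaining:
        calls.append(readout(remaining[0]))  # partial identification of the current NTAA
        remaining = remaining[1:]            # cleavage exposes the next residue
    return calls


print(sequential_partial_read("MKSD"))
# ['hydrophobic', 'charged', 'polar', 'charged']
```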
[0092] As used herein, the term “analyte” generally refers to a substance that is of interest to be further identified, characterized, or measured. An analyte can be, in non-limiting examples, an ion, chemical, compound, small molecule, element, particle, metal, biomolecule, macromolecule, metabolite, lipid, carbohydrate, peptide or protein, nucleic acid molecule, organelle, or cell. An analyte may be naturally occurring or synthetic. The analyte may be a solid, semi-solid, liquid, semi-liquid, gas, or plasma. The analyte may be characterized qualitatively or quantitatively. A portion of an analyte may be analyzed. For example, an analyte may be a peptide and the constituent amino acids may be analyzed. The analyte may comprise a polymer, also referred to herein as a “polymeric analyte”, which generally refers to an analyte of interest that comprises one or more monomers. A polymeric analyte can be, in non-limiting examples, a group of ions, chemicals, compounds, small molecules, elements, particles, metals, or a biomolecule, macromolecule, metabolite, lipid, carbohydrate, peptide or protein, nucleic acid molecule, organelle, or cell.
[0093] As used herein, the term “array” generally refers to a population of molecules that is attached to one or more solid supports such that the molecules at one address can be distinguished from molecules at other addresses. An array can include different molecules that are each located at different addresses on a solid support. Alternatively, an array can include separate solid supports each functioning as an address that bears a different molecule, wherein the different molecules can be identified according to the locations of the solid supports on a surface to which the solid supports are attached, or according to the locations of the solid supports in a liquid such as a fluid stream. The molecules of the array can be, for example, nucleic acids such as SNAPs, polypeptides, proteins, peptides, oligopeptides, enzymes, ligands, or receptors such as antibodies, functional fragments of antibodies or aptamers. The addresses of an array can optionally be optically observable, and, in some configurations, adjacent addresses can be optically distinguishable when detected using a method or apparatus set forth herein.
[0094] As used herein, the term “functionalized” refers to any material or substance that has been modified to include a functional group. A functionalized material or substance may be naturally or synthetically functionalized. For example, a polypeptide can be naturally functionalized with a phosphate group, oligosaccharide (e.g., glycosyl, glycosylphosphatidylinositol, or phosphoglycosyl), nitrosyl, methyl, acetyl, lipid (e.g., glycosylphosphatidylinositol, myristoyl, or prenyl), ubiquitin, or other naturally occurring post-translational modification. A functionalized material or substance may be functionalized for any given purpose, including altering chemical properties (e.g., altering hydrophobicity or changing surface charge density) or altering reactivity (e.g., becoming capable of reacting with a moiety or reagent to form a covalent bond to the moiety or reagent).
[0095] As used herein, the term “anchoring group” refers to a molecule or particle that serves as an intermediary attaching a protein or peptide to a surface (e.g., a solid support or a microbead). An anchoring group may be covalently or non-covalently attached to a surface and/or a polypeptide. An anchoring group may be a biomolecule, polymer, particle, nanoparticle, or any other entity that can attach to a surface or polypeptide. In some cases, an anchoring group may be a structured nucleic acid particle.
[0096] As used herein, the term “click reaction,” “click chemistry,” or “bioorthogonal reaction” refers to a single-step, thermodynamically favorable conjugation reaction utilizing biocompatible reagents. A click reaction may use no toxic or biologically incompatible reagents (e.g., acids, bases, heavy metals) and/or generate no toxic or biologically incompatible byproducts. A click reaction may use an aqueous solvent or buffer (e.g., phosphate buffer solution, Tris buffer, saline buffer, MOPS, etc.). A click reaction may be thermodynamically favorable if it has a negative Gibbs free energy of reaction, for example a Gibbs free energy of reaction of less than about -5 kilojoules/mole (kJ/mol), -10 kJ/mol, -25 kJ/mol, -50 kJ/mol, -100 kJ/mol, -200 kJ/mol, -300 kJ/mol, -400 kJ/mol, or less than -500 kJ/mol. Exemplary bioorthogonal and click reactions are described in detail in WO 2019/195633 A1, which is herein incorporated by reference in its entirety. Exemplary click reactions may include metal-catalyzed azide-alkyne cycloaddition, strain-promoted azide-alkyne cycloaddition, strain-promoted azide-nitrone cycloaddition, strained alkene reactions, thiol-ene reaction, Diels-Alder reaction, inverse electron demand Diels-Alder reaction, [3+2] cycloaddition, [4+1] cycloaddition, nucleophilic substitution, dihydroxylation, thiol-yne reaction, photoclick, nitrone dipole cycloaddition, norbornene cycloaddition, oxanorbornadiene cycloaddition, tetrazine ligation, and tetrazole photoclick reactions.
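As a non-limiting illustration of what the free-energy thresholds above imply, the standard relation ΔG° = -RT ln K converts a Gibbs free energy into an equilibrium constant, K = exp(-ΔG°/RT). The sketch assumes, for illustration only, that the listed values are standard Gibbs free energies at 298.15 K; the disclosure does not specify the reference state or temperature.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol*K)


def equilibrium_constant(delta_g_kj_per_mol: float, temp_k: float = 298.15) -> float:
    """K = exp(-dG / (R*T)), from the standard relation dG = -R*T*ln(K)."""
    return math.exp(-delta_g_kj_per_mol / (R_KJ * temp_k))


for dg in (-5, -10, -25, -50):
    print(f"dG = {dg:4d} kJ/mol -> K ~ {equilibrium_constant(dg):.3g}")
```

Under these assumptions, a -5 kJ/mol reaction has K of roughly 7.5 at room temperature, and each additional -5.7 kJ/mol or so raises K by about a factor of ten.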
Exemplary functional groups or reactive handles utilized to perform click reactions (also referred to herein as “click chemistry moieties”) may include alkenes (e.g., linear alkenes or cyclic alkenes such as trans-cyclooctene (TCO)), alkynes (e.g., linear alkynes or cycloalkynes (e.g., cyclooctynes or derivatives thereof, e.g., aza-dimethoxycyclooctyne (DIMAC), symmetrical pyrrolocyclooctyne (SYPCO), pyrrolocyclooctyne (PYRROC), difluorocyclooctyne (DIFO), α,α-bis(trifluoromethyl)pyrrolocyclooctyne (TRIPCO), bicyclo[6.1.0]nonyne (BCN), dibenzocyclooctyne (DIBO), difluorinated cyclooctyne (DIFO), difluorobenzocyclooctyne (DIFBO), dibenzoazacyclooctyne (DBCO), difluoro-aza-dibenzocyclooctyne (F2-DIBAC), biaryl-azacyclooctynone (BARAC), difluorodimethoxydibenzocyclooctynol (FMDIBO), difluorodimethoxydibenzocyclooctynone (keto-FMDIBO), 3,3,6,6-tetramethylthiacycloheptyne (TMTH), and TMTH-sulfoximine (TMTHSI))), azides, epoxides, amines, thiols, nitrones, isonitriles, isocyanides, aziridines, activated esters, tetrazines, triazoles, and combinations, variations, or derivatives thereof. The click chemistry moieties may be subjected to conditions sufficient to react a first click chemistry moiety with a second click chemistry moiety, e.g., provision of metal catalysts, appropriate solvents, pH, temperature, ionic concentration, or light/energy, for any useful duration of time.
[0097] As used herein, the terms “group” and “moiety” are intended to be synonymous when used in reference to the structure of a molecule. The terms refer to a component or part of the molecule. Unless indicated otherwise, the terms do not necessarily denote the relative size of the component or part compared to the molecule, or compared to any other component or part of the molecule. A group or moiety can contain one or more atoms.
[0098] As used herein, “primers” generally refer to nucleic acid molecules that can prime the synthesis of a nucleic acid molecule (e.g., DNA or RNA). A primer may be single-stranded. A primer may comprise one or more recognition sites for a protein (e.g., a polymerizing enzyme, a restriction enzyme, a cleaving enzyme, a nuclease, etc.) to bind to the primer or to a primer hybridized to a template strand. A primer may comprise DNA, RNA, or other nucleic acid analogs or noncanonical bases (e.g., spacer moieties, uracils, abasic sites). A primer may optionally comprise any number of functional sequences such as sequencing primer sequences (e.g., P5 or P7 sequences), sequencing primer-binding sequences, read sequences (e.g., R1 or R2 sequences), restriction sites, nuclease-recognition sites, abasic sites, cleavage sites, transposition sites (e.g., mosaic end sequences), a barcode sequence, a unique molecular identifier (UMI), etc.
[0099] “Amplification” or “amplifying” generally refers to a polynucleotide amplification reaction, namely, a population of polynucleotides that are replicated from one or more starting sequences. Amplifying may refer to a variety of amplification reactions, including but not limited to polymerase chain reaction (PCR), linear polymerase reactions, nucleic acid sequence-based amplification, rolling circle amplification and similar reactions. An amplification reaction may generate an amplicon.
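As a non-limiting illustration of the difference between exponential amplification (e.g., PCR) and linear polymerase reactions, the sketch below uses the idealized models N = N0(1 + E)^n for PCR, where E is the per-cycle efficiency, and N = N0(1 + n) for single-primer linear amplification, in which only the original templates are copied each cycle. Both models ignore reagent depletion and the late-cycle plateau; the function names are hypothetical.

```python
def pcr_copies(start_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized exponential PCR yield N = N0 * (1 + E)**n.

    E is the per-cycle efficiency (1.0 = perfect doubling); reagent depletion
    and the late-cycle plateau are ignored."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


def linear_copies(start_copies: float, cycles: int) -> float:
    """Idealized single-primer linear amplification: each cycle copies only the
    original templates, giving N0 + N0 * n molecules in total."""
    return start_copies * (1 + cycles)


print(pcr_copies(1, 10))     # 1024.0
print(linear_copies(1, 10))  # 11
```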
[00100] An “adapter” as referred to herein generally refers to a short nucleic acid molecule (e.g., about 10 to about 100 base pairs in length). An adapter may comprise a short double-stranded DNA molecule. An adapter may be attached, e.g., via polymerization or ligation, to an end of DNA fragments or amplicons. Adapters may comprise synthetic oligonucleotides, e.g., oligonucleotides that have nucleotide sequences which are at least partially complementary to each other. An adapter may have blunt ends, staggered ends (also referred to herein as a 3’ or 5’ “overhang sequence” or “sticky end”), or a blunt end and a staggered end. Adapters may be attached (e.g., via ligation) to fragments to provide an adapter-ligated fragment; the adapter-ligated fragment may serve as a starting point for subsequent manipulation, e.g., for amplification or sequencing. An adapter may be functionalized, e.g., conjugated with a tag, probe, detectable label, or affinity capture reagent (e.g., biotin or streptavidin).
[00101] The term “capture moiety” as used herein generally refers to a molecule that is configured to be coupled to another moiety or molecule. A capture moiety can be a biomolecule, e.g., a lipid, carbohydrate, sugar, amino acid, peptide or protein, nucleotide, nucleic acid molecule, metabolite, or a combination thereof (e.g., glycoproteins, lipoproteins, glycosaminoglycans, etc.). A capture moiety can be a small molecule, organic compound, inorganic compound, metal, polymer, ion, or other molecule or molecular compound. A capture moiety may comprise a macromolecule. A capture moiety may comprise an enzyme, antibody, antibody fragment, nanobody, aptamer, biotin, streptavidin, avidin, neutravidin, or analogs or derivatives thereof. A capture moiety may comprise more than one molecule, e.g., a dimer, trimer, tetramer, pentamer, hexamer, heptamer, octamer, etc. A capture moiety can be a solid substrate or part of a solid substrate, or the capture moiety can be separate from a substrate, e.g., in a fluidic medium (e.g., air or a liquid solution). A capture moiety may have specificity to a binding partner or a plurality of binding partners. A capture moiety may be able to bind to one molecule or moiety (univalent), or a plurality of molecules or moieties (multivalent).
[00102] As used herein, the abbreviations for the natural L-enantiomeric amino acids are conventional and can be as follows: alanine (A, Ala); arginine (R, Arg); asparagine (N, Asn); aspartic acid (D, Asp); cysteine (C, Cys); glutamic acid (E, Glu); glutamine (Q, Gln); glycine (G, Gly); histidine (H, His); isoleucine (I, Ile); leucine (L, Leu); lysine (K, Lys); methionine (M, Met); phenylalanine (F, Phe); proline (P, Pro); serine (S, Ser); threonine (T, Thr); tryptophan (W, Trp); tyrosine (Y, Tyr); valine (V, Val). Unless otherwise specified, X can indicate any amino acid. In some aspects, X can be asparagine (N), glutamine (Q), histidine (H), lysine (K), or arginine (R). References to these amino acids may also take the form “[amino acid] [residue/residues]” (e.g., lysine residue, lysine residues, leucine residue, leucine residues, etc.).
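The abbreviation list in this paragraph maps directly onto a lookup table. The following is a minimal sketch assuming one-letter input. The helper name `to_three_letter` is hypothetical. Rendering the wildcard X as "Xaa" follows the common IUPAC placeholder for an unspecified residue and is an addition not stated in the text.

```python
# One-letter -> three-letter codes for the 20 proteinogenic L-amino acids,
# plus the wildcard "X" rendered with the IUPAC placeholder "Xaa".
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "X": "Xaa",
}


def to_three_letter(sequence: str) -> str:
    """Convert a one-letter peptide string such as 'MKT' to 'Met-Lys-Thr'."""
    codes = []
    for letter in sequence.upper():
        if letter not in ONE_TO_THREE:
            raise ValueError(f"unknown amino acid code: {letter!r}")
        codes.append(ONE_TO_THREE[letter])
    return "-".join(codes)


print(to_three_letter("MKT"))  # Met-Lys-Thr
```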
[00103] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure, suitable methods and materials are described below.
Protein Sequencing Using Super-Resolution Imaging
[00104] The present disclosure provides approaches for sequencing polymeric analytes, e.g., peptides, polymers, nucleic acid molecules, etc., in a highly parallelized and low-cost manner.
Systems and methods of the present disclosure may comprise generating modified monomers from a polymeric analyte and detecting the modified monomers, thereby sequencing the polymeric analyte or a portion thereof. In some instances, the modified monomers are generated from the polymeric analyte using an intramolecular expansion process, e.g., using linkers and polymerizable molecules, which can increase the molecular distance between monomers. In one such example, the modified monomer comprises a modified amino acid generated from a peptidic polymeric analyte. The modified amino acid may be generated by coupling a polymerizable molecule to an amino acid of a peptide using a linker and cleaving the amino acid from the peptide. In some instances, a plurality of modified amino acids may be coupled to one another via coupling of the polymerizable molecules to generate a stacked plurality of modified amino acids. One or more modified monomers (e.g., modified amino acids) or a stacked plurality of modified monomers may be contacted with a binding agent comprising a detectable label, and the detectable labels may be detected, thereby identifying a modified monomer of the one or more modified monomers. In some embodiments, the detectable label comprises a fluorescent label, a FRET label (e.g., a donor fluorophore paired with an acceptor fluorophore or a quencher molecule), a chemiluminescence label, an electrochemiluminescence label, a bioluminescence label, a phosphorescence label, or a label that generates light through other types of reactions or stimulations. In some embodiments, the detectable labels are detected using super-resolution imaging. Beneficially, the systems and methods provided herein enable high-throughput, low-cost sequencing of complex polymeric analytes, such as proteins.
[00105] In one aspect of the present disclosure, provided herein is a method for characterizing a modified amino acid, comprising: providing the modified amino acid and identifying the modified amino acid. In some instances, the identifying comprises using super-resolution imaging. In some instances, the identifying further comprises contacting the modified amino acid with a binding agent, wherein the binding agent comprises a detectable label; and detecting the detectable label (e.g., using super-resolution imaging), thereby identifying an amino acid type of the modified amino acid. One or more operations of the method may be repeated to characterize multiple modified amino acids arising from a single peptide or a plurality of peptides. In some instances, the modified amino acid comprises a polymerizable molecule coupled thereto and is generated using an intramolecular expansion process, which may be useful in increasing the molecular distance between amino acids of a peptide and in enabling image-based detection techniques.
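The repeated operations described in this paragraph can be sketched as a loop in which each cycle removes one N-terminal residue and records the cycle number alongside it, so the residue order can be recovered even if the modified amino acids are imaged out of order. This is a minimal illustration, not the disclosed chemistry. The `ModifiedAminoAcid` record, the `degrade` and `read_sequence` helpers, and the dye names used as labels are all hypothetical. A binding agent is modeled as a perfect lookup from residue type to label, and residues with no matching binder are reported as "X".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModifiedAminoAcid:
    cycle: int    # round in which the residue was removed from the peptide
    residue: str  # one-letter code of the cleaved amino acid


def degrade(peptide: str) -> list[ModifiedAminoAcid]:
    """Remove one N-terminal residue per cycle, recording the cycle number."""
    return [ModifiedAminoAcid(cycle, residue)
            for cycle, residue in enumerate(peptide, start=1)]


def read_sequence(stack, binder_labels: dict[str, str]) -> str:
    """Call residues from binder labels, ordered by cycle number.

    binder_labels maps a residue type to the (hypothetical) detectable label
    of the binding agent that recognizes it; unrecognized residues become 'X'.
    """
    label_to_residue = {label: res for res, label in binder_labels.items()}
    calls = []
    for mod in sorted(stack, key=lambda m: m.cycle):
        label = binder_labels.get(mod.residue)  # what the imager would report
        calls.append(label_to_residue[label] if label is not None else "X")
    return "".join(calls)


stack = degrade("MKT")
# Detection order need not match cycle order; sorting by cycle restores it.
print(read_sequence(reversed(stack), {"M": "Cy3", "K": "Cy5"}))  # MKX
```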
[00106] Modified amino acids: The modified amino acid or derivative thereof may originate from or be part of a protein or peptide; for example, the modified amino acid may comprise or be derived from an amino acid located at a terminus (N-terminus or C-terminus) of a peptide, or the modified amino acid may comprise or be derived from an amino acid located within the peptide (also referred to herein as an internal amino acid or internal residue). The modified amino acid may comprise a proteinogenic amino acid or derivative thereof with any number of modifications. Non-limiting examples of modifications include chemical modifications (e.g., protecting groups), biological modifications (e.g., post-translational modifications, modifications introduced by enzymatic treatment or digestion), physical modifications (e.g., mutations introduced by irradiation, heat, etc.), and the like. In some instances, the modified amino acid or derivative thereof comprises or is coupled to a binding agent, such as an antibody, antibody fragment, nanobody, aptamer, peptide, small molecule, inorganic compound, polymer, or any variations or combinations thereof. In some instances, the modified amino acid comprises a non-naturally occurring chemical modification. For example, the modified amino acid may comprise a protecting group, such as, in non-limiting examples, a methyl, formyl, ethyl, acetyl, t-butyl, anisyl, benzyl, trifluoroacetyl, N-hydroxysuccinimide, t-butyloxycarbonyl, benzoyl, 4-methylbenzyl, thioanisyl, thiocresyl, benzyloxymethyl, 4-nitrophenyl, benzyloxycarbonyl, 2-nitrobenzoyl, 2-nitrophenylsulphenyl, 4-toluenesulphonyl, pentafluorophenyl, diphenylmethyl, 2-chlorobenzyloxycarbonyl, 2,4,5-trichlorophenyl, 2-bromobenzyloxycarbonyl, 9-fluorenylmethyloxycarbonyl, triphenylmethyl, or 2,2,5,7,8-pentamethyl-chroman-6-sulphonyl group. In some instances, the modified amino acid comprises a proteinogenic amino acid or derivative thereof that is coupled to a polymerizable molecule. In some instances, the modified amino acid comprises a proteinogenic amino acid or derivative thereof, a linker, and a polymerizable molecule.
[00107] The modified amino acid may comprise any useful modification. Modifications may be naturally occurring (e.g., post-translational modifications) or non-naturally occurring, such as those introduced by labeling or tagging, e.g., with an amino acid- or amine-reactive agent or a linker comprising the amino acid- or amine-reactive agent. Examples of amino acid- or amine-reactive agents include isothiocyanates (e.g., PITC, NITC), 1-fluoro-2,4-dinitrobenzene (DNFB), dansyl chloride, 4-sulfonyl-2-nitrofluorobenzene (SNFB), an acetylating agent, an acylating agent, an alkylating agent, a guanidination agent, a thioacetylation agent, a dithioester, a thioacylation agent, a thiobenzoylation agent, a xanthate, or a derivative or combination thereof. Alternatively, or in addition, the one or more modified amino acids may comprise an adduct (e.g., a polymer such as PEG, a polymerizable molecule such as a nucleic acid molecule, a nanoparticle or nanotube, a peptide or protein), a lipid, a carbohydrate, a metabolite, a fluorophore, a hapten, a quencher, a tag (e.g., a fluorescent tag, a magnetic tag, a radioactive tag), a barcode, or other moiety. In some instances, a modified amino acid may comprise a modification that facilitates recruitment of an enzyme (or ribozyme or DNAzyme) to recognize or cleave a terminal amino acid, e.g., an NTAA or CTAA of a peptide. For example, a terminal amino acid of a peptide may be modified with a saccharide in order to recruit a lectin or lectin-bound protease. In another example, one or more modified amino acids may comprise or be coupled to a nucleic acid molecule having a first sequence that is complementary to a second sequence comprised by an oligo-bound protease. Hybridization of the first sequence to the second sequence may facilitate local recruitment of the protease to the amino acid to be cleaved. In yet another example, a peptide may be modified with phenylisothiocyanate (PITC), which may allow for recruitment and cleavage of the modified amino acid by an Edmanase. In some examples, modifications to amino acids may include epitope tags, which can facilitate binding of a binding agent to the modified amino acid. Examples of such epitope tags include fluorophores, nucleic acid molecules, peptides, haptens, polymers, chemical moieties, or other adduct molecules.
[00108] The methods described herein may further comprise generating the modified amino acid. In some instances, the modified amino acid comprises a proteinogenic amino acid or derivative thereof, a linker, and a polymerizable molecule. In one example, a peptide comprising a terminal amino acid may be contacted with a linker that comprises (i) a first reactive moiety capable of reacting with an amino acid and (ii) a second reactive moiety. Prior to, during, or subsequent to the reaction of the first reactive moiety with the terminal amino acid, a polymerizable molecule comprising a third reactive moiety that is capable of reacting with the second reactive moiety may be provided. The second and third reactive moieties may comprise click chemistry moieties that can react with one another (e.g., azide and DBCO, azide and BCN, alkyne and DBCO, TCO and tetrazine, etc.). The reaction of the linker with the terminal amino acid and with the polymerizable molecule may thus yield a peptide-linker-polymerizable molecule complex comprising the terminal amino acid, the linker, and the polymerizable molecule. In some instances, the terminal amino acid may then be cleaved from the peptide, and the modified amino acid may comprise the cleavage product: the cleaved, and optionally derivatized, amino acid, the linker, and the polymerizable molecule.
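The assembly in this paragraph (amino acid, linker with two reactive moieties, polymerizable molecule with a third) can be modeled as a compatibility check followed by cleavage of the terminal residue. This is a minimal sketch. It encodes only a subset of click pairs named here and in [00120]: azide with DBCO or BCN, azide with an alkyne under copper catalysis, and TCO with tetrazine. All class and function names are hypothetical, and the chemistry is reduced to string matching.

```python
from dataclasses import dataclass

# Click pairs treated as mutually reactive in this sketch (order-independent).
CLICK_PAIRS = {
    frozenset({"azide", "DBCO"}),     # strain-promoted, copper-free
    frozenset({"azide", "BCN"}),      # strain-promoted, copper-free
    frozenset({"azide", "alkyne"}),   # copper(I)-catalyzed
    frozenset({"TCO", "tetrazine"}),  # inverse-electron-demand Diels-Alder
}


def reacts(a: str, b: str) -> bool:
    return frozenset({a, b}) in CLICK_PAIRS


@dataclass(frozen=True)
class Linker:
    amino_reactive: str  # first reactive moiety, e.g. "isothiocyanate"
    click: str           # second reactive moiety


@dataclass(frozen=True)
class PolymerizableMolecule:
    name: str
    click: str           # third reactive moiety


@dataclass(frozen=True)
class ModifiedAminoAcid:
    residue: str
    linker: Linker
    polymer: PolymerizableMolecule


def couple_and_cleave(peptide: str, linker: Linker,
                      polymer: PolymerizableMolecule):
    """Tether the N-terminal residue to linker and polymer, then cleave it.

    Returns the modified amino acid and the shortened peptide.
    """
    if not peptide:
        raise ValueError("peptide is empty")
    if not reacts(linker.click, polymer.click):
        raise ValueError(f"{linker.click} does not click with {polymer.click}")
    return ModifiedAminoAcid(peptide[0], linker, polymer), peptide[1:]


mod, rest = couple_and_cleave("MKT", Linker("isothiocyanate", "azide"),
                              PolymerizableMolecule("oligo-1", "DBCO"))
print(mod.residue, rest)  # M KT
```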
[00109] Polymerizable Molecule: The polymerizable molecule may comprise any useful polymerizable moiety. The polymerizable moiety may comprise a naturally occurring or synthetic polymer (organic or inorganic) or biopolymer. The polymer may comprise one or more monomer types. In instances where more than one monomer type is used, the polymer may form an alternating copolymer structure, a periodic copolymer structure, a random copolymer structure, a block copolymer structure, a chained or grafted copolymer, or any other useful structure. The polymer may be linear or non-linear. The polymer may comprise polyethylene glycol (PEG), PEG-diacrylate, PEG-acrylate, PEG-thiol, PEG-azide, PEG-alkyne, polyacrylamide, agarose, collagen, fibrin, gelatin, chitosan, hyaluronic acid, alginate, polyvinyl alcohol, or another polymer.
[00110] In some examples, the polymerizable molecule comprises a biomolecule, e.g., a protein or peptide, a nucleic acid molecule (e.g., DNA or RNA), or a carbohydrate or lipid chain. In some instances, the polymerizable molecule comprises a nucleic acid molecule, which can comprise any useful number and type of nucleotides, e.g., including canonical and noncanonical bases, and the number of nucleotides may be modulated based on the intended purpose. For example, the length of the nucleic acid molecule may be modulated to alter a property (e.g., volume, aspect ratio, charge, etc.) of the amino acid to which the polymerizable molecule is tethered. The nucleic acid molecule may additionally or alternatively comprise any useful functional sequences, including but not limited to barcode sequences or other identifying sequences, UMI sequences, enzyme recognition sites (e.g., transposition sites, restriction sites), spacer sequences, sequencing primer sequences, read sequences, or primer sequences. The nucleic acid molecule may comprise canonical bases, noncanonical or modified bases, naturally occurring bases, synthetic bases, abasic sites, nucleotide analogs, or a combination thereof. In some instances, the modified amino acid or derivative thereof is generated using an iterative process, as described elsewhere herein; accordingly, the nucleic acid molecule may comprise information on the round or cycle number of the iterative process.
In some instances, the nucleic acid molecule comprises a nucleic acid barcode molecule and may comprise useful information on the identity of the amino acid, temporal information, spatial information, etc.
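The text states that the nucleic acid molecule may carry the round or cycle number of the iterative process, for example as a barcode. One hypothetical way to do that is to write the cycle number in base 4, one nucleotide per digit. The scheme, the helper names `encode_cycle` and `decode_cycle`, and the fixed word width are assumptions made for illustration. A practical barcode design would more likely use error-tolerant codewords, which this sketch does not attempt.

```python
BASES = "ACGT"  # base value = position in this string: A=0, C=1, G=2, T=3


def encode_cycle(cycle: int, width: int) -> str:
    """Write a cycle number as a fixed-width base-4 DNA word (hypothetical)."""
    if not 0 <= cycle < 4 ** width:
        raise ValueError(f"cycle {cycle} does not fit in {width} bases")
    digits = []
    remaining = cycle
    for _ in range(width):
        remaining, value = divmod(remaining, 4)
        digits.append(BASES[value])
    return "".join(reversed(digits))  # most significant base first


def decode_cycle(word: str) -> int:
    """Inverse of encode_cycle; raises ValueError on a non-ACGT character."""
    value = 0
    for base in word.upper():
        value = value * 4 + BASES.index(base)
    return value


print(encode_cycle(5, 3))  # ACC
```

A three-base word distinguishes 64 cycles; the width would be chosen to cover the maximum number of rounds expected.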
[00111] The polymerizable molecules described herein may be any useful type of polymerizable molecule. The polymerizable molecules may be naturally occurring, such as biological polymers (e.g., nucleic acid molecules, peptides, polysaccharides, fatty acids), or other naturally occurring polymers, e.g., rubber, cellulose, starches, polyhydroxyalkanoates, chitosan, dextran, structural biopolymers (e.g., collagen, hyaluronic acid, glycosaminoglycans), agarose, carrageenan, ispaghula, acacia, agar, gelatin, shellac, xanthan gum, guar gum, alginate, etc. The polymerizable molecules may be synthetic, e.g., acrylics, nylons, silicones, viscose, rayon, polyesters, polycarboxylic acids, polyvinyl acetate, polyacrylamide, polyacrylate, polyethylene glycol, polyurethane, polylactic acid, silica, polystyrene, polyacrylonitrile, polybutadiene, polycarbonate, polyethylene terephthalate, poly(chlorotrifluoroethylene), poly(ethylene oxide), polyethylene, polyisobutylene, poly(methyl methacrylate), poly(oxymethylene), polyformaldehyde, polypropylene, poly(tetrafluoroethylene), poly(vinyl alcohol), poly(vinyl chloride), poly(vinylidene dichloride), poly(vinylidene difluoride), poly(vinyl fluoride), and combinations thereof. The polymerizable molecules may comprise one or more reactive moieties (e.g., radical groups) to initiate polymerization or may be polymerized by contact with an initiating agent (e.g., ammonium persulfate, peroxide, or other radicalizing agent). The polymerizable molecules may be polymerized by contact with an enzyme (e.g., a polymerizing enzyme such as a polymerase), ribozyme, or DNAzyme. Alternatively, or in addition, the polymerizable molecules may be polymerized via self-assembly. The polymerizable molecules may comprise a single polymer type (e.g., a homopolymer) or more than one polymer type (e.g., a copolymer) and may comprise random or arranged monomers. The polymerizable molecules may be a block copolymer, alternating copolymer, periodic copolymer, statistical copolymer, stereoblock copolymer, gradient copolymer, branched copolymer, graft copolymer, etc.
[00112] The same or different types of polymerizable molecules may be used in the methods described herein. For example, the first polymerizable molecule comprised by or coupled to the binding agent may be a nucleic acid molecule, and the second polymerizable molecule may be a peptide. In another example, both the first polymerizable molecule and the second polymerizable molecule are nucleic acid molecules. In such an example, the first polymerizable molecule may be coupled to the second polymerizable molecule via ligation or hybridization. For instance, the first polymerizable molecule may comprise a first nucleic acid sequence and the second polymerizable molecule may comprise a second nucleic acid sequence. The first nucleic acid sequence may be complementary or partially complementary to the second nucleic acid sequence, and the coupling may comprise hybridizing the first nucleic acid sequence or portion thereof to the second nucleic acid sequence or portion thereof. Alternatively, the first and second nucleic acid sequences may be complementary to two sequences of a splint or bridge oligonucleotide, and coupling may be mediated via hybridization to the splint oligonucleotide. The first nucleic acid sequence may be ligated to the second nucleic acid sequence either chemically (e.g., via click chemistry approaches in which the first polymerizable molecule and the second polymerizable molecule each comprise one member of a click chemistry pair) or enzymatically (e.g., using a ligase).
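The two coupling routes in this paragraph, direct pairing of complementary sequences and bridging via a splint oligonucleotide, can be checked at the sequence level. The following is a minimal sketch assuming DNA strings written 5' to 3', exact Watson-Crick pairing, and no thermodynamics. The helper names and the `min_len` and `arm` parameters are hypothetical. For a splint, the two ends must pair with adjacent positions on the splint so that the resulting nick can be ligated.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def can_hybridize(a: str, b: str, min_len: int) -> bool:
    """True if a stretch of at least min_len bases of b is complementary to a."""
    if min_len < 1:
        raise ValueError("min_len must be positive")
    rc_b = reverse_complement(b)
    a = a.upper()
    # Any complementary stretch of length >= min_len contains a window of
    # exactly min_len, so checking fixed-size windows is sufficient.
    return any(rc_b[i:i + min_len] in a
               for i in range(len(rc_b) - min_len + 1))


def splint_bridges(first: str, second: str, splint: str, arm: int) -> bool:
    """True if the splint pairs with the 3' end of `first` and the 5' end of
    `second` at adjacent positions, leaving a ligatable nick."""
    junction = first[-arm:] + second[:arm]
    return reverse_complement(junction) in splint.upper()


print(can_hybridize("ATGCGT", "ACGCAT", 6))                       # True
print(splint_bridges("AAACCCGGG", "TTTGGGAAA", "GAAACCCG", 3))   # True
```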
[00113] The polymerizable molecules may comprise functional portions. For example, the polymerizable molecules may comprise a nucleic acid molecule comprising a functional sequence, such as a primer sequence (e.g., a universal priming site), a sequencing sequence, a read sequence, a unique molecular identifier (UMI), a barcode sequence, a cleavage sequence (e.g., a restriction site, a Cas-binding sequence), a transposition sequence (e.g., a mosaic end sequence), or a combination thereof.
[00114] The polymerizable molecules may be any useful size. The polymerizable molecules may be about 1 angstrom, about 2 angstrom, about 3 angstrom, about 4 angstrom, about 5 angstrom, about 6 angstrom, about 7 angstrom, about 8 angstrom, about 9 angstrom, about 10 angstrom, about 20 angstrom, about 30 angstrom, about 40 angstrom, about 50 angstrom, about 60 angstrom, about 70 angstrom, about 80 angstrom, about 90 angstrom, about 100 angstrom, about 200 angstrom, about 300 angstrom, about 400 angstrom, about 500 angstrom, about 600 angstrom, about 700 angstrom, about 800 angstrom, about 900 angstrom, about 1000 angstrom, about 10,000 angstrom, or about 100,000 angstrom or greater in size, length, or another dimension. In some instances, the polymerizable molecule (e.g., the first polymerizable molecule or the second polymerizable molecule) comprises a nucleic acid molecule comprising one or more nucleotide bases. The polymerizable molecule may comprise any useful number of nucleotide bases, e.g., about 1 base, about 2 bases, about 3 bases, about 4 bases, about 5 bases, about 6 bases, about 7 bases, about 8 bases, about 9 bases, about 10 bases, about 20 bases, about 30 bases, about 40 bases, about 50 bases, about 60 bases, about 70 bases, about 80 bases, about 90 bases, about 100 bases, about 200 bases, about 300 bases, about 400 bases, about 500 bases, about 600 bases, about 700 bases, about 800 bases, about 900 bases, about 1000 bases, or a greater number of bases.
[00115] The polymerizable molecules may comprise a nucleic acid molecule. The nucleic acid molecule can be single-stranded, double-stranded, or partially double-stranded. The nucleic acid molecule may comprise a modified nucleotide or non-canonical base.
For instance, the polymerizable molecules may comprise a pseudo-complementary base, a bridged nucleic acid (BNA), a xenonucleic acid (XNA), a locked nucleic acid (LNA), a peptide nucleic acid (PNA), a gamma-PNA molecule, a morpholino, or a combination thereof. In some instances, a polymerizable molecule may comprise a hexitol nucleic acid (HNA) or a cyclohexenyl nucleic acid (CeNA), which may be useful in rendering the polymerizable molecule more resistant to acid degradation (e.g., under the acidic conditions used in conventional Edman degradation). Alternatively, or in addition, a polymerizable molecule may comprise naturally occurring bases that are more resistant to acid degradation, e.g., it may be composed primarily of thymine or cytosine. For example, a nucleic acid molecule may comprise at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or 100% thymines or cytosines, which can render the nucleic acid molecule more acid resistant as compared to a nucleic acid molecule comprising adenines or guanines.
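The composition criterion in this paragraph (a minimum fraction of thymines or cytosines) is straightforward to check for a candidate sequence. This is consistent with the well-known susceptibility of the purines A and G to acid-catalyzed depurination. The following is a sketch assuming DNA written with A/C/G/T. The function names are hypothetical, and the default threshold is the 50% lower bound named in the text.

```python
def pyrimidine_fraction(seq: str) -> float:
    """Fraction of C and T bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "CT" for base in seq) / len(seq)


def meets_threshold(seq: str, threshold: float = 0.5) -> bool:
    """True if the sequence is at least `threshold` thymine or cytosine."""
    return pyrimidine_fraction(seq) >= threshold


print(meets_threshold("TTCCAG", 0.6))  # True (4 of 6 bases are T or C)
```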
[00116] The modified amino acid may comprise a polymerizable molecule coupled covalently or non-covalently thereto. The coupling may be performed using any suitable chemistries and reaction conditions and may comprise the use of a linker. In one such example, a nucleic acid molecule may comprise a first reactive group, e.g., a first click chemistry moiety, as described elsewhere herein, and may be contacted with a linker comprising a second reactive group, e.g., a second click chemistry moiety that is able to react with the first reactive group. The linker may also comprise an additional reactive group that is able to tether to an amino acid (e.g., a terminal amino acid) and, optionally, cleave the amino acid from a peptide. For example, the additional reactive group may be a thiocyanate conjugate, e.g., an isothiocyanate (ITC) such as phenyl isothiocyanate (PITC) or naphthylisothiocyanate (NITC); an aldehyde group, e.g., orthophthalaldehyde (OPA) or 2,3-naphthalenedicarboxyaldehyde (NDA); a guanidinylating agent; dinitrofluorobenzene (DNFB); dansyl chloride; a dithioester; a thiobenzoyl; a thioacetyl; a xanthate; or another amino acid-reactive group. The linker may be reacted with an amino acid of a peptide, e.g., the NTAA or CTAA. Use of such a linker comprising at least two reactive groups may allow for (i) tethering of the amino acid to the linker and (ii) tethering of the linker to the nucleic acid molecule (see, e.g., FIG. 2A). In some instances, the linker may be provided pre-tethered to the nucleic acid molecule prior to contacting with the amino acid. In some instances, the conjugation of the polymerizable molecule to the amino acid, either via a linker or without a linker, may change the chemical structure of the amino acid. For example, if a linker comprising an isothiocyanate moiety is used, the amino acid may be derivatized to a thiocarbamyl group (e.g., under alkaline conditions), a thiazolone group (e.g., under acidic conditions), a thiohydantoin group, or another chemical moiety, thereby generating a modified amino acid comprising a polymerizable molecule coupled thereto.
In some instances, the linker comprises a fluorophore that can serve as an amino acid sensor, such as a BODIPY or modified BODIPY dye, as shown in FIGs. 2B and 2C. In such instances, the fluorophore may produce a unique signal or have unique photophysical properties (e.g., intensity, brightness, dynamic range, emission spectrum, emission intensity, fluorescence lifetime, quantum yield) depending on the specific amino acid to which the linker binds. An example of such an amino acid-sensor dye is described in Hendrick et al. 2024. bioRxiv. DOI: https://doi.org/10.1101/2024.04.02.587799, which is incorporated by reference herein in its entirety. In such instances, high-resolution imaging (e.g., super-resolution or single-molecule imaging) may be used to determine the identity of the amino acid. As shown in FIG. 2B, the solid lines connecting the coupling moiety (e.g., amino acid-reactive group), click chemistry moiety, and/or fluorophore can be or comprise a nucleotide, a nucleotide analog, an amino acid, a peptide, a polypeptide, a non-nucleotide chemical moiety, a polymer, or any combination of these. The linker may further comprise additional functional groups to enhance its versatility or functionality, e.g., an additional hydrophilic group.
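If each residue type gives a reproducible signature with the sensor dye, calling the amino acid from a measurement reduces to matching its photophysical features against reference signatures. The following is a minimal nearest-neighbor sketch assuming two features (fluorescence lifetime and normalized intensity). It also assumes a reference table that would have to be measured for the actual dye. The reference values below are placeholders, not data from the cited work, and `classify_signal` is a hypothetical name.

```python
import math


def classify_signal(measured, references, max_distance=None):
    """Return the residue whose reference signature is nearest to `measured`.

    measured: tuple of features, e.g. (lifetime_ns, normalized_intensity)
    references: dict mapping residue -> tuple of the same features
    max_distance: optional cutoff; if the best match is farther, return None
    """
    best, best_d = None, math.inf
    for residue, signature in references.items():
        d = math.dist(measured, signature)  # Euclidean distance
        if d < best_d:
            best, best_d = residue, d
    if max_distance is not None and best_d > max_distance:
        return None
    return best


# Placeholder signatures for illustration only (not measured values).
example_refs = {"A": (2.0, 0.5), "W": (4.0, 0.9)}
print(classify_signal((3.9, 0.85), example_refs))  # W
```

Because the features carry different units, a real pipeline would likely rescale them before computing distances. The optional `max_distance` cutoff leaves ambiguous signals uncalled rather than forcing a match.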
[00117] Linkers: One or more linkers may be used to couple the polymerizable molecule to an amino acid or derivative thereof to thereby generate the modified amino acid. In some instances, the linker comprises a click chemistry moiety. The click chemistry moiety may comprise any suitable bioorthogonal moieties, as described elsewhere herein, e.g., alkenes, alkynes (e.g., linear alkynes or cycloalkynes such as DBCO and BCN), azides, epoxides, amines, thiols, nitrones, isonitriles, isocyanides, aziridines, activated esters, and tetrazines, and combinations, variations, or derivatives thereof. The linker may be subjected to conditions sufficient to react the first click chemistry moiety with the second click chemistry moiety, e.g., provision of metal catalysts, appropriate solvents, pH, temperature, ionic concentration, or light/energy for any useful duration of time.
[00118] The linker may comprise an amino acid-reactive moiety. The amino acid-reactive moiety of the linker may be any useful moiety that enables the linker to conjugate to and optionally cleave an amino acid. In some examples, the amino acid-reactive moiety can react with a terminal amino acid (e.g., NTAA or CTAA). In such examples, the amino acid-reactive moiety may comprise any primary amine- or carboxyl-reactive group, including but not limited to isocyanates, acyl azides, NHS esters, sulfonyl chlorides, aldehydes, glyoxals, epoxides, oxiranes, carbonates, aryl halides, imidoesters, carbodiimides, anhydrides, phenyl esters, isothiocyanates (e.g., phenyl isothiocyanate, sodium isothiocyanate, diphenylphosphoryl isothiocyanate, and ammonium isothiocyanates such as tetrabutylammonium isothiocyanate), dithioesters, thiocarbamoyls, thiobenzoyls, guanidinylating agents, xanthates, acetyl chloride, cyanogen bromide, carboxypeptidases, maleimide, succinimide, thiol-thiol disulfide bonds, vinyl, methylcyclopropene, acryloyl, and allyl groups, among others. Additional examples of amino acid-reactive groups are provided in U.S. Pat. Pub. No. 2020/0217853, which is incorporated by reference herein in its entirety.
[00119] The linker may comprise any additional useful moieties. For example, the linker may comprise a releasable or cleavable moiety, which may facilitate removal of the amino acid-linker complex from the polymerizable molecule, or portion thereof. Such a releasable or cleavable moiety may comprise, for example, a disulfide bond, which may be released by contact with a reducing agent (e.g., DTT, TCEP). In some examples, the linker may couple to the polymerizable molecule via the releasable or cleavable moiety, alternatively or in addition to the coupling via click chemistry moieties. As such, the coupling between the polymerizable molecule and the linker may be reversible. The linker may additionally comprise any number of spacing moieties, e.g., polymers (e.g., PEG, PVA, polyacrylamide), aminohexanoic acid, nucleic acids, alkyl chains, etc. Such spacing moieties may increase the distance between other moieties of the linker, e.g., between the amino acid-reactive group and the polymerizable molecule-reactive group.
[00120] The coupling of the linker to the monomer or capture moiety may be covalent or noncovalent. In an example, a linker may comprise a first reactive group that is able to couple to a monomer of the polymeric analyte (e.g., an amino acid of a peptide) and, optionally, cleave the amino acid from the peptide. For example, the first reactive group may be an amino acid-reactive group, e.g., an isothiocyanate (ITC) such as phenyl isothiocyanate (PITC), 3-pyridyl isothiocyanate (PYITC), 2-piperidinoethyl isothiocyanate (PEITC), 3-(4-morpholino)propyl isothiocyanate (MPITC), 3-(diethylamino)propyl isothiocyanate (DEPTIC), naphthylisothiocyanate (NITC), fluorescein isothiocyanate (FITC), ammonium thiocyanate, potassium thiocyanate, trimethylsilyl isothiocyanate (TMS-ITC), phenyl phosphoroisothiocyanatidate, or acetyl isothiocyanate (AITC), or an aldehyde group, e.g., orthophthalaldehyde (OPA), 2,3-naphthalenedicarboxyaldehyde (NDA), or 2-pyridinecarboxyaldehyde, which can react with an N-terminal amino acid (NTAA). The linker may additionally comprise a second reactive group that is capable of coupling, either directly or indirectly, to the capture moiety. In an example of direct coupling, the capture moiety may comprise a click chemistry moiety (e.g., alkyne), and the second reactive group of the linker may comprise an additional click chemistry moiety (e.g., azide) that can react with the click chemistry moiety of the capture moiety. Alternatively, the linker may be coupled indirectly to the capture moiety, e.g., via noncovalent interaction or via an intermediate linking molecule. In some instances, the intermediate linking molecule may comprise a polymerizable molecule (e.g., a polymer or nucleic acid molecule) that can couple the linker to the capture moiety.
In one such example, the polymerizable molecule may comprise (i) a third reactive group that is capable of coupling to the second reactive group of the linker (e.g., via alkyne-azide click chemistry) and (ii) a moiety that can couple to the capture moiety (e.g., via another orthogonal click chemistry reaction, an avidin-biotin interaction, or nucleic acid coupling or hybridization). In some instances, the linking polymerizable molecule comprises a nucleic acid molecule that comprises (i) a click chemistry moiety (e.g., alkyne) that can conjugate to the second reactive group (e.g., azide) of the linker and (ii) a nucleic acid sequence that can couple to the capture moiety, e.g., via ligation, splint ligation, or hybridization. In some instances, the linker comprises a linking nucleic acid molecule that comprises a self-splinting moiety.
[00121] When applicable, the click chemistry moieties of the linker and capture moiety or intermediate linking molecule may comprise any suitable bioorthogonal moieties, as described elsewhere herein, e.g., alkenes, alkynes, azides, epoxides, amines, thiols, nitrones, isonitriles, isocyanides, aziridines, activated esters, and tetrazines, and combinations, variations, or derivatives thereof. The linker may be subjected to conditions sufficient to react the first click chemistry moiety with the second click chemistry moiety, e.g., provision of metal catalysts, appropriate solvents, pH, temperature, ionic concentration, or light/energy for any useful duration of time.
[00122] The linker may comprise any additional useful moieties. For example, the linker may comprise a releasable or cleavable moiety, which may facilitate removal of the monomer from the polymeric analyte, or portion thereof, or from the substrate. Such a releasable or cleavable moiety may comprise, for example, a disulfide bond, which may be released by contact with a reducing agent (e.g., DTT, TCEP). In some examples, the linker may couple to the third polymerizable molecule via the releasable or cleavable moiety, alternatively or in addition to the coupling via click chemistry moieties. As such, the coupling between the polymerizable molecule and the linker may be reversible. The linker may additionally or alternatively comprise any number of spacing moieties, e.g., polymers (e.g., PEG, PVA, polyacrylamide), aminohexanoic acid, nucleic acids, alkyl chains, etc. Such spacing moieties may increase the distance between other moieties of the linker, e.g., between the amino acid-reactive group and the polymerizable molecule-reactive group. The linker may comprise or be coupled to a detectable moiety, e.g., a fluorophore, radioisotope, mass tag, nucleic acid molecule (which can also act as a releasable or cleavable moiety), or other detectable moiety. In some examples, the linker comprises a fluorophore, which can enable visualization and localization of the linker using single-molecule imaging. In another example, the monomer may be labeled with a first fluorophore and the linker may comprise a second fluorophore to enable visualization and localization of both the linker and the monomer (e.g., using two-channel imaging or FRET).
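Where the monomer and linker carry a donor and an acceptor fluorophore, FRET reports their proximity through the standard Förster relation E = 1 / (1 + (r/R0)^6). Here r is the donor-acceptor distance and R0 is the Förster radius of the dye pair. The following sketch evaluates and inverts that relation. R0 depends on the particular dyes and is treated as an input, and the function names are hypothetical.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Förster transfer efficiency at donor-acceptor distance r."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def distance_from_efficiency(efficiency: float, r0_nm: float) -> float:
    """Invert E = 1 / (1 + (r/R0)^6) for r; E must lie strictly in (0, 1)."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must be between 0 and 1 (exclusive)")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


print(fret_efficiency(5.0, 5.0))  # 0.5 at r = R0
```

Because of the sixth-power dependence, the efficiency falls to 1/65 at r = 2R0. As a result, FRET is informative only over distances comparable to R0, typically a few nanometers.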
[00123] In some instances, the polymerizable molecule comprises a linker. The linker may be used, for instance, to couple a reactive moiety to the polymerizable molecule, which reactive moiety can react with that of another linker. In one such example, a polymerizable molecule, e.g., a nucleic acid molecule, may comprise a linker that comprises a click chemistry moiety. The linker comprising the click chemistry moiety may be coupled to the polymerizable molecule using any useful approach, e.g., by incorporation of a linker-conjugated nucleotide or nucleoside, and may be located at any useful position (e.g., at a 5’ end, at a 3’ end, or in the center of the polymerizable molecule). For example, a click-functionalized nucleotide or nucleoside, e.g., ethynyl deoxyuridine or octadiynyl deoxyuridine, can be incorporated into the backbone of a DNA or RNA molecule. The click chemistry moiety of the polymerizable molecule may then couple to another linker that comprises a complementary click chemistry moiety and also an amino acid-reactive group (e.g., isothiocyanate, dansyl chloride, DNFB, etc.).
[00124] The linker may comprise any number of spacing moieties, e.g., alkyl chains, polymer spacers (e.g., PEG), nucleic acid or oligonucleotide spacers, or other spacing moieties useful in modulating the size or molecular weight of the linker. For example, the linker may comprise at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or a greater number of spacing moieties (e.g., hydrocarbon units, PEG units, nucleotides or spacer sequences, etc.). The linker may comprise at most about 100, at most about 10, at most about 9, at most about 8, at most about 7, at most about 6, at most about 5, at most about 4, at most about 3, at most about 2, or at most 1 spacing moiety. The linker may comprise any useful number of functional groups, e.g., for attachment to multiple molecules. The linker may comprise at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or a greater number of functional groups.
[00125] Intramolecular Expansion: The modified amino acid or derivative thereof may be generated using an intramolecular expansion process, e.g., using one or more linkers and polymerizable molecules. In intramolecular expansion, individual amino acids, clusters of amino acids, or small peptides (e.g., a dipeptide, tripeptide, or quadripeptide) of a protein (e.g., a peptide, polypeptide, or protein analyte) may be sequentially removed and re-tethered together, such that the distance between the individual amino acids or clusters of amino acids is increased. Beneficially, performing an intramolecular expansion process on one or more amino acids of a peptide may enable advanced imaging techniques, e.g., super-resolution imaging, of the expanded amino acids that would otherwise be impossible while the amino acids remain attached to the peptide. For instance, increasing the spacing between amino acids may allow single-amino-acid resolution using fluorescence super-resolution imaging, which would not be possible for amino acids within an intact peptide (due to the optical resolution limit, steric crowding of binding agents, etc.).
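The resolution argument above can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative and not part of the described method: it assumes the Abbe diffraction limit d = λ/(2·NA) for a conventional far-field microscope and an axial rise of roughly 0.35 nm per residue for a fully extended polypeptide chain; the wavelength, numerical aperture, and 20 nm localization precision are hypothetical example values.

```python
"""Rough estimate of why adjacent residues cannot be resolved optically.

Assumptions (illustrative, not from the source): Abbe limit d = lambda / (2 * NA),
and ~0.35 nm axial rise per residue for a fully extended polypeptide chain.
"""


def abbe_limit_nm(wavelength_nm: float, numerical_aperture: float) -> float:
    """Lateral diffraction limit of a conventional far-field microscope."""
    return wavelength_nm / (2.0 * numerical_aperture)


def residues_per_resolution_element(resolution_nm: float,
                                    rise_per_residue_nm: float = 0.35) -> float:
    """Number of extended-chain residues that fit inside one resolvable distance."""
    return resolution_nm / rise_per_residue_nm


if __name__ == "__main__":
    # Hypothetical values: red emission with a high-NA oil-immersion objective.
    d = abbe_limit_nm(600.0, 1.4)
    print(f"diffraction limit: {d:.0f} nm, "
          f"~{residues_per_resolution_element(d):.0f} residues")
    # Even a hypothetical 20 nm localization precision spans ~57 residues.
    print(f"20 nm precision: ~{residues_per_resolution_element(20.0):.0f} residues")
```

Under these assumptions, hundreds of residues fall within one diffraction-limited spot and dozens fall within even a 20 nm localization precision, which is why the residues are pulled apart before imaging.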
[00126] In an example, a method for intramolecular expansion may comprise providing a peptide comprising a plurality of amino acids, a linker (e.g., as described elsewhere herein), a polymerizable molecule (e.g., as described elsewhere herein), and a capture moiety. The linker may be configured to couple to (i) an amino acid (e.g., NTAA or CTAA) of the peptide and (ii) the polymerizable molecule. The method may further comprise contacting the linker with the amino acid and the polymerizable molecule. Alternatively, or in addition, the linker may be provided pre-tethered to the polymerizable molecule and subsequently reacted with the amino acid. The linker may couple to the amino acid of the peptide to generate an amino acid-linker complex, which may or may not comprise the polymerizable molecule. In the instances that the
amino acid-linker complex comprises the polymerizable molecule, the amino acid-linker complex may then couple to the capture moiety via the polymerizable molecule. For example, the polymerizable molecule and the capture moiety may both comprise nucleic acid molecules, which may be coupled via hybridization, ligation, an extension reaction, or a combination thereof. In some instances, the method may further comprise cleaving the amino acid from the peptide to yield a modified amino acid that comprises the amino acid-linker complex, and optionally repeating the process. In an example in which the process is repeated, an additional linker may be provided which is configured to couple to (i) an additional amino acid of the peptide (e.g., the n-1 NTAA or n-1 CTAA) and (ii) an additional polymerizable molecule. Alternatively, the additional linker may be provided precoupled to the additional polymerizable molecule. The method may further comprise contacting the additional linker with the additional amino acid to generate an additional amino acid-linker complex. The additional polymerizable molecule may be coupled to the additional linker prior to, during, or subsequent to the coupling of the additional linker to the additional amino acid. The additional polymerizable molecule may be configured to couple to the (first) modified amino acid, e.g., via coupling of the polymerizable molecule and the additional polymerizable molecule. As such, in some examples, subsequent to generation of the additional linker-additional amino acid complex, the additional linker-additional amino acid complex may couple to the modified amino acid, thereby generating a stacked plurality of modified amino acids (see, e.g., FIGs. 1A-1B). The additional amino acid may be cleaved from the peptide prior to, during, or subsequent to generation of the stacked plurality of modified amino acids.
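The repeated couple, cleave, and stack loop described above can be modeled as simple bookkeeping. This is a hypothetical model, not a chemical simulation: it assumes every coupling and cleavage goes to completion and that each new amino acid-linker complex stacks onto the previous modified amino acid. The class and function names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ModifiedAminoAcid:
    residue: str  # one-letter code of the cleaved terminal amino acid
    cycle: int    # round in which its linker/polymerizable molecule was added


@dataclass
class Stack:
    """Stacked plurality of modified amino acids on one capture moiety (hypothetical model)."""
    members: List[ModifiedAminoAcid] = field(default_factory=list)


def intramolecular_expansion(peptide: str, cycles: int) -> Tuple[Stack, str]:
    """Run idealized NTAA couple/cleave/stack cycles; return the stack and remaining peptide."""
    stack = Stack()
    remaining = peptide
    for cycle in range(cycles):
        if not remaining:
            break
        # Couple a linker carrying a polymerizable molecule to the current NTAA.
        complex_ = ModifiedAminoAcid(residue=remaining[0], cycle=cycle)
        # Couple the new complex to the previous modified amino acid
        # (or, in the first cycle, to the capture moiety).
        stack.members.append(complex_)
        # Cleave the NTAA; the n-1 residue becomes the new NTAA.
        remaining = remaining[1:]
    return stack, remaining


if __name__ == "__main__":
    stack, rest = intramolecular_expansion("MKTAY", cycles=3)
    print([m.residue for m in stack.members], rest)
```

Because each complex is appended in cycle order, reading the stack from the capture moiety outward recovers the N-to-C order of the removed residues in this idealized model.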
[00127] A “modified amino acid” as used herein may refer to the amino acid-linker complex, the amino acid-linker-polymerizable molecule complex, or derivatives thereof. The term may be used to refer to the amino acid-linker complex or the amino acid-linker-polymerizable molecule complex before or after cleavage. In some instances, the modified amino acid may refer to a portion of the amino acid-linker complex or the amino acid-linker-polymerizable molecule complex (e.g., just the amino acid portion, just the amino acid-linker complex portion, etc.).
[00128] In some instances, intramolecular expansion of the peptide or protein may occur across a plurality of capture moieties. For example, a substrate may be provided that comprises a plurality of capture moieties, and, in some instances, the capture moieties are located adjacent to the peptide or protein. A first amino acid (e.g., n NTAA or n CTAA) of the peptide or protein may be coupled to a first capture moiety (e.g., via a first linker and a first polymerizable molecule), a second amino acid (e.g., n-1 NTAA or n-1 CTAA) may be coupled to a second capture moiety (e.g., via a second linker and a second polymerizable molecule), and a third amino acid (e.g., n-2 NTAA or n-2 CTAA) may be coupled to a third capture moiety (e.g., via a third linker and a third polymerizable molecule). In another example, a first amino acid (e.g., n NTAA) may be coupled to a first capture moiety, a second amino acid (e.g., n-1 NTAA) may be coupled to the modified amino acid (e.g., from the n NTAA), thereby generating a stacked plurality of modified amino acids, and a third amino acid (e.g., n-2 NTAA) may be coupled to a second capture moiety. In some instances, the polymerizable molecule may comprise temporal information, e.g., a barcode encoding the round or cycle number in which the polymerizable molecule is provided. As will be appreciated, any number of amino acids (or modified amino acids) may be coupled to any number of capture moieties.
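When modified amino acids from successive rounds land on different capture moieties, a cycle barcode in the polymerizable molecule allows the original order to be recovered. The sketch assumes a hypothetical encoding that the source does not specify: the cycle number is written as a fixed-length base-4 string over A, C, G, and T. Practical barcode designs would typically add error tolerance, which is omitted here.

```python
from typing import Iterable, Tuple

BASES = "ACGT"  # hypothetical digit mapping: A=0, C=1, G=2, T=3


def encode_cycle(cycle: int, length: int = 4) -> str:
    """Write a cycle number as a fixed-length base-4 DNA barcode."""
    if not 0 <= cycle < 4 ** length:
        raise ValueError("cycle number does not fit in the barcode length")
    digits = []
    for _ in range(length):
        cycle, d = divmod(cycle, 4)
        digits.append(BASES[d])
    return "".join(reversed(digits))  # most significant digit first


def decode_cycle(barcode: str) -> int:
    """Invert encode_cycle."""
    value = 0
    for base in barcode:
        value = value * 4 + BASES.index(base)
    return value


def order_by_cycle(reads: Iterable[Tuple[str, str]]) -> str:
    """reads: (barcode, residue) pairs collected from any capture moiety."""
    return "".join(res for _, res in sorted(reads, key=lambda r: decode_cycle(r[0])))


if __name__ == "__main__":
    reads = [(encode_cycle(2), "T"), (encode_cycle(0), "M"), (encode_cycle(1), "K")]
    print(order_by_cycle(reads))
```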
[00129] Capture Moieties: A capture moiety may couple to the amino acid, the linker, or the polymerizable molecule. The coupling of the amino acid, the linker, or the polymerizable molecule to the capture moiety may comprise a covalent interaction or a noncovalent interaction. The coupling may occur by interaction of binding pairs, e.g., biotin and avidin (or streptavidin), antigen or epitope and antibody or antibody fragment, cyclodextrins and small hydrophobic molecules (e.g., alkanes, benzene, polycyclics), cucurbiturils and adamantaneammonium or trimethylammoniomethyl ferrocene, cyclophane (e.g., calixarenes, cavitands, pillararenes, tetralactams), etc. In some embodiments, the coupling of the amino acid, the linker, or the polymerizable molecule to the capture moiety occurs through coupling of nucleic acid molecules (e.g., hybridization to one another or to a splint molecule or a nucleic acid extension).
[00130] In some instances, the capture moiety comprises an additional polymerizable molecule (e.g., a nucleic acid molecule or peptide). In one such example, both the polymerizable molecule of the modified amino acid and the capture moiety may comprise nucleic acid molecules. The nucleic acid molecules may be coupled to one another, e.g., via complementary base pairing, either directly or via a splint molecule, with optional ligation. Alternatively, or in addition, the nucleic acid molecules may be coupled via a nucleic acid extension or amplification reaction.
[00131] The nucleic acid molecule of the capture moiety or the polymerizable molecule can comprise any naturally occurring, non-naturally occurring, or engineered nucleotide base. For example, the nucleic acid molecule may comprise a pseudo-complementary base, a bridged nucleic acid, a xenonucleic acid, a locked nucleic acid, a peptide nucleic acid (PNA), a gamma-PNA, a morpholino, etc., as is described elsewhere herein. The capture moiety may comprise one or more functional sequences, including, but not limited to, a priming sequence, a sequencing sequence (e.g., P5 or P7 sequence), a sequencing read sequence (e.g., R1 or R2 sequence), a mosaic end sequence, a transposase recognition sequence, a cleavage site (e.g., restriction site), a UMI, a blocking group, a spacer sequence, a barcode sequence, or other functional sequence. In some instances, the capture moiety comprises a cleavable or releasable moiety, e.g., a restriction enzyme recognition site, an abasic site, a uracil that can be cleaved using USER® or uracil DNA glycosylase, a disulfide bond that can be released upon addition of a reducing agent, etc. In some instances, the capture moiety comprises a partial restriction site; e.g., the capture moiety may comprise a first partial restriction site and the polymerizable molecule may comprise a second partial restriction site. Upon coupling or ligation of the polymerizable molecule to the capture moiety, the two partial restriction sites may form a complete restriction site, such that the individual molecules (capture moiety and polymerizable molecule) are not cleavable by restriction digest individually but the ligated or coupled product is. In some instances, the capture moiety comprises a barcode sequence that encodes any useful information, e.g., the identity of the peptide that is to be analyzed, temporal information, spatial information, etc.
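The partial-restriction-site idea can be checked in silico. The example assumes the EcoRI recognition sequence GAATTC and two hypothetical oligonucleotides, one ending in the half-site GAA and the other beginning with TTC. It only tests whether the sequence is present and ignores the fact that EcoRI requires double-stranded DNA.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence (5'->3')


def has_site(seq: str, site: str = ECORI_SITE) -> bool:
    """True if the recognition site occurs in the 5'->3' sequence."""
    return site in seq.upper()


def ligate(upstream: str, downstream: str) -> str:
    """Join the 3' end of `upstream` to the 5' end of `downstream`."""
    return upstream + downstream


# Hypothetical sequences: each carries only half of the EcoRI site.
capture_moiety = "ACGTACGTGAA"
polymerizable_molecule = "TTCAGCTAGC"

if __name__ == "__main__":
    product = ligate(capture_moiety, polymerizable_molecule)
    for name, seq in [("capture", capture_moiety),
                      ("polymerizable", polymerizable_molecule),
                      ("ligated", product)]:
        print(name, has_site(seq))
```

Neither starting oligonucleotide contains the full site, but the ligation product does, so only the coupled product would be a substrate for digestion.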
[00132] In some instances, the capture moiety is provided coupled to a substrate. In one example, the substrate comprises one or more identical capture nucleic acid molecules; these identical capture nucleic acid molecules may act as capture moieties for generating one or more AALC complexes, e.g., for a terminal amino acid, an n-1 amino acid, etc. In some instances, commercially available substrates, e.g., beads (e.g., DNA beads or barcoded beads), flow cells, or chips (e.g., Illumina® HiSeq, iSeq, MiniSeq, NextSeq, NovaSeq, etc.), may be used as the substrates described herein. In some instances, the capture moieties may comprise additional useful sequences, e.g., primer sequences (e.g., P5 or P7 sequences) or read sequences (e.g., R1 or R2).
[00133] The capture moiety may be coupled to a substrate using any useful approach. In some instances, the capture moiety comprises a substrate-tethering group, a linker, or an additional functional group. In some examples, the capture moiety comprises a nucleic acid molecule that comprises a substrate-tethering group, e.g., biotin or a click chemistry moiety such as an azide, that can couple to a substrate, e.g., a substrate comprising streptavidin or a complementary click chemistry moiety that can react with that of the substrate-tethering group. The capture moiety may additionally comprise a binding sequence to which another nucleic acid molecule (e.g., a polymerizable molecule that is part of or coupled to the modified amino acid) can bind. In some instances, the capture moiety comprises a single-stranded oligonucleotide or a single-stranded region to which a complementary oligonucleotide can hybridize. The complementary oligonucleotide may comprise a detectable label (e.g., a fluorophore) that allows for detection of the capture moiety.
[00134] Alternatively, the capture moiety may not be coupled to a substrate. For instance, the capture moiety may be directly coupled to the peptide that is to be analyzed or is undergoing
intramolecular expansion. In such examples, the capture moiety may additionally comprise a nucleic acid barcode molecule that encodes the identity of the peptide or the originating sample or partition from which the peptide originated. The capture moiety may be coupled to any useful segment of the peptide, e.g., at a terminus (e.g., C-terminus) or at an internal residue. Alternatively, or in addition to, the capture moiety may be provided in a solution and may not be coupled to the substrate or the peptide.
[00135] Cleaving: In some instances, generating the modified amino acid further comprises cleaving the amino acid or the amino acid-linker complex from the peptide. The cleaving of the amino acid or amino acid-linker complex may be achieved using any suitable mechanism, such as via application of a stimulus. The stimulus can be, for example, a chemical stimulus, a biological stimulus, a thermal stimulus (e.g., application of heat), a photo-stimulus, a physical or mechanical stimulus, or other type of stimulus or a combination of stimuli. In some instances, the stimulus comprises a chemical stimulus, e.g., a change in pH, application of an acid or base, or addition of a lytic agent, initiating agent, radical-generating agent, reducing agent, etc. In some instances, the chemical stimulus comprises application of a Lewis acid (e.g., boron triflate, boron trifluoride etherate, boron trichloride, boron tribromide, boron triiodide, or scandium triflate). In some instances, the stimulus comprises a biological stimulus, e.g., an enzyme (e.g., Edmanase, protease, endonuclease), ribozyme, or DNAzyme that can cleave or catalyze cleavage of the amino acid or amino acid-linker complex.
[00136] In some examples, the methods provided herein may comprise using a linker comprising an amino acid-reactive group (e.g., PITC, a xanthate, a guanidinylating agent, a dithioester, or a thiocarbamoyl), coupling the amino acid-reactive group of the linker with the amino acid, and cleaving the amino acid from the peptide using a stimulus (e.g., a change in pH or temperature). In an example, the linker may comprise a PITC moiety that couples to an NTAA under mildly alkaline conditions to generate a phenylthiocarbamoyl (PTC) derivative of the NTAA, and cleavage of the NTAA from the peptide may be achieved using an Edman degradation reaction (e.g., application of an acid such as trifluoroacetic acid or boron triflate, optionally with heat) to generate an anilinothiazolinone (ATZ) derivative or a phenylthiohydantoin (PTH) derivative. As described elsewhere herein, the linker may comprise a moiety or molecule (e.g., a polymerizable molecule such as a nucleic acid molecule) that can also couple to the capture moiety, such that the amino acid-linker complex may be coupled to the capture moiety, thereby generating an amino acid-linker-capture moiety complex.
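Because no cycle of PITC coupling and acid cleavage is perfectly efficient, the fraction of peptides still in register drops geometrically with cycle number. The sketch below is not from the source. It uses the standard repetitive-yield approximation, fraction = yⁿ, with a hypothetical per-cycle yield y, to estimate how many cycles stay above a chosen signal threshold.

```python
import math


def cumulative_yield(repetitive_yield: float, cycles: int) -> float:
    """Fraction of molecules still in phase after `cycles` rounds (y ** n)."""
    return repetitive_yield ** cycles


def max_cycles(repetitive_yield: float, min_fraction: float) -> int:
    """Largest n with y ** n >= min_fraction, for 0 < y < 1 and 0 < min_fraction <= 1."""
    if not (0.0 < repetitive_yield < 1.0 and 0.0 < min_fraction <= 1.0):
        raise ValueError("expected 0 < yield < 1 and 0 < min_fraction <= 1")
    return math.floor(math.log(min_fraction) / math.log(repetitive_yield))


if __name__ == "__main__":
    y = 0.95  # hypothetical per-cycle yield
    print(f"after 10 cycles: {cumulative_yield(y, 10):.3f}")
    print("cycles at or above 50%:", max_cycles(y, 0.5))
```

Under this simple model, even a 95% per-cycle yield leaves only about half of the starting material in phase after a dozen or so cycles.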
[00137] Given the harsh reaction conditions of standard Edman degradation, the polymerizable molecules described herein (e.g., nucleic acid molecules, peptides, lipids, etc.) may comprise alterations or modifications that render them more resistant to the reaction conditions. For example, nucleic acid molecules may comprise predominantly pyrimidines (e.g., thymines, cytosines, uracils), which are more resistant to acid degradation and heat than purines (e.g., adenine and guanine). For example, a nucleic acid molecule may comprise at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or 100% thymines or cytosines. Alternatively, or in addition, canonical nucleotides may be substituted with, or the molecule may comprise, acid-resistant nucleotide analogs, e.g., hexitol nucleic acids.
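Checking a candidate oligonucleotide against the pyrimidine-content thresholds above is simple arithmetic. This sketch counts C, T, and U as pyrimidines (uracil is included for RNA designs). The example sequence is hypothetical.

```python
PYRIMIDINES = frozenset("CTU")


def pyrimidine_fraction(seq: str) -> float:
    """Fraction of bases in `seq` that are pyrimidines (C, T, or U)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in PYRIMIDINES for base in seq) / len(seq)


def meets_threshold(seq: str, threshold: float) -> bool:
    """True if the pyrimidine fraction is at least `threshold` (e.g., 0.7 for 70%)."""
    return pyrimidine_fraction(seq) >= threshold


if __name__ == "__main__":
    candidate = "TTTCCCAG"  # hypothetical oligonucleotide
    print(pyrimidine_fraction(candidate), meets_threshold(candidate, 0.7))
```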
[00138] Alternative degradation chemistries may also be employed. Milder degradation under basic conditions for N-terminal amino acid removal can include the use of triethylamine acetate in acetonitrile or another solvent such as water, N,N-dimethylformamide (DMF), or a mixture of solvents. Alternatively, degradation may be achieved using a thioacylation approach, milder acid reagents, e.g., trichloroacetic acid (pKa of 0.66) or dichloroacetic acid (pKa of 1.35), or alternative basic reaction conditions, e.g., using acid-base pairs such as N,N-diisopropylethylamine (DIPEA), pyridine, acetic acid derivatives, etc.
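Since pKa = -log10(Ka), the pKa values quoted above translate directly into relative acid strengths. The sketch computes the Ka ratio of trichloroacetic acid to dichloroacetic acid from those values. It also uses the Henderson-Hasselbalch relation to compute the fraction of an acid that is dissociated at a given pH. The pH value in the usage example is hypothetical.

```python
def ka_ratio(pka_stronger: float, pka_weaker: float) -> float:
    """Ka(stronger) / Ka(weaker), from pKa = -log10(Ka)."""
    return 10 ** (pka_weaker - pka_stronger)


def fraction_dissociated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: [A-] / ([HA] + [A-]) = 1 / (1 + 10**(pKa - pH))."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


if __name__ == "__main__":
    # TCA (pKa 0.66) vs DCA (pKa 1.35): TCA's Ka is roughly 4.9 times larger.
    print(f"Ka(TCA)/Ka(DCA) = {ka_ratio(0.66, 1.35):.2f}")
    # At a hypothetical pH of 1.0, DCA is only partly dissociated.
    print(f"DCA dissociated at pH 1.0: {fraction_dissociated(1.35, 1.0):.2f}")
```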
[00139] C-terminal degradation strategies are also provided herein. C-terminal degradation may comprise Edman-like degradation approaches. C-terminal degradation may employ activating reagents that react with the C-terminal carboxyl group of a peptide, and a derivatizing agent (e.g., a thiocyanate to generate a peptide-thiocyanate or peptide-thiohydantoin). Non-limiting examples of activating reagents include acetyl chloride and acetic anhydride. Alternatively, or in addition, single-step C-terminal derivatization of a peptide to a peptidyl-thiohydantoin may be performed, e.g., using the Schlack-Kumpf approach, in which a peptide is reacted with thiocyanic acid (e.g., in acetone) to generate a peptidyl-thiohydantoin. The peptidyl-thiohydantoin may be cleaved, e.g., under basic conditions, to generate an amino acid thiohydantoin and the remaining peptide.
[00140] Cleavage of amino acids may also be achieved using enzymatic or enzyme-analog (e.g., ribozyme or DNAzyme) approaches. Example enzymatic cleavage may include the use of Edmanases (e.g., modified cruzain), aminopeptidases (e.g., Pfu aminopeptidase I, PhTET aminopeptidases, P. horikoshii aminopeptidases), metalloenzymatic aminopeptidases, acylpeptide hydrolases, tRNA synthetases, endopeptidases, carboxypeptidases, and the like. The enzymes, ribozymes, or DNAzymes may be modified or engineered to recognize a modified amino acid, e.g., an amino acid that has a chemical moiety attached thereto (e.g., PITC, NITC, dansyl chloride, SNFB, DNP, SNP, biotin, streptavidin, nucleic acid molecules, lipids, carbohydrates, acetyl groups, acyl groups, guanidinylation agents, etc.).
[00141] One or more reactions may be accelerated by application of energy or radiation, e.g., electromagnetic radiation. For example, degradation or cleavage of the terminal amino acid of a peptide may be facilitated by applying microwave energy to accelerate the reaction kinetics. Similarly, hydrolysis of proteins may be facilitated by application of microwave energy, e.g., as described in Margolis et al., 1991, Journal of Automatic Chemistry, Vol. 13, No. 3, pp. 93-95, which is incorporated by reference herein.
[00142] In some instances, more than one amino acid may be cleaved from the peptide per cleavage event. The cleaving may comprise cleaving 2 amino acids, 3 amino acids, 4 amino acids, 5 amino acids, 6 amino acids, 7 amino acids, 8 amino acids, 9 amino acids, 10 amino acids, or more. For example, the polymeric analyte may comprise a peptide comprising a plurality of amino acids, and single amino acids, dipeptides, tripeptides, quadripeptides, or larger fragments may be cleaved in the methods described herein. In some instances, at most about 10 amino acids, at most about 9 amino acids, at most about 8 amino acids, at most about 7 amino acids, at most about 6 amino acids, at most about 5 amino acids, at most about 4 amino acids, at most about 3 amino acids, or fewer amino acids may be cleaved in a given cleavage event. In some instances, cleavage of more than one amino acid may be mediated using an enzyme (e.g., Edmanase, protease), ribozyme, or DNAzyme that is capable of recognizing or cleaving more than a single amino acid.
[00143] Cleavage of the amino acid from the peptide may be conducted using a biological stimulus, such as an enzyme, ribozyme, or DNAzyme. 
The enzyme can be any useful cleaving enzyme, e.g., a protease, such as an Edmanase, cruzain, a cleaving protein (e.g., ClpS, ClpX), Proteinase K, exopeptidase, aminopeptidase, diaminopeptidase, serine protease, cysteine protease, threonine protease, aspartic protease, glutamic protease, metalloprotease, asparagine peptide lyase, pepsin, trypsin, pancreatin, Lys-C, Glu-C, Asp-N, chymotrypsin, carboxypeptidase (e.g., carboxypeptidase A, carboxypeptidase B, carboxypeptidase Y), SUMO protease, elastase, papain, endoproteinase, proteinase, TrypZean®, bromelain, collagenase, hyaluronidase, thermolysin, ficin, keratinase, tryptase, fibroblast activation protein, enterokinase, chymotrypsinogen, chymase, clostripain, calpain, alpha-lytic protease, proline-specific endopeptidase, furin, thrombin, subtilisin, genenase, PCSK9, cathepsin, prolidase, methionine aminopeptidase, cathepsin C, 1-cyclohexen-1-yl-boronic acid pinacol ester, pyroglutamate aminopeptidase, renin, kininogen, kallikrein, DPPIV/CD26, thimet oligopeptidase, prolyl oligopeptidase, leucine aminopeptidase, dipeptidyl peptidase, or other enzyme or protease, or a combination or variation (e.g., engineered mutant or variant) thereof. In some instances, the cleaving enzyme, ribozyme, or DNAzyme may be configured or engineered to cleave a terminal amino acid or plurality of amino acids; alternatively, the cleaving enzyme, ribozyme, or DNAzyme may be configured or engineered to cleave off-site at a non-terminal location of the peptide, e.g., at an internal amino acid at an n-1, n-2, n-3, n-4, n-5, n-6, n-7, n-8, n-9, n-10, etc. position, where n is the number of amino acids in the peptide.
[00144] In the instances of enzymatic cleavage, additional reagents may be provided to catalyze or induce the cleavage. For instance, metalloproteases, aminopeptidases, or exopeptidases may facilitate cleavage of an amino acid or plurality of amino acids in the presence of a catalyst, e.g., a metal or metal ion (e.g., cobalt). Accordingly, a catalyst may be provided in order to facilitate the binding of the enzyme to an amino acid or the subsequent cleavage of the amino acid from the peptide. In some examples, cleavage may be mediated by an apo-enzyme, which is inactive in the absence of a metal catalyst or cofactor, and cleavage may be controlled by addition of metal or metal ions.
[00145] Other examples of cleaving stimuli include a photo-stimulus (e.g., application of UV, X-rays, gamma rays, or other wavelength of light), a mechanical stimulus (e.g., sonication, high pressure, electromagnetic energy), a thermal stimulus (e.g., application of heat), or a chemical stimulus. In some instances, the peptide may comprise or be altered to comprise a cleavable or labile bond that can be cleaved upon application of the appropriate stimulus, e.g., disulfide bonds (e.g., cleavable upon application of a chemical stimulus such as a reducing agent), ester linkages (e.g., cleavable with a change of pH), a vicinal-diol linkage (e.g., cleavable with sodium periodate), a Diels-Alder linkage (e.g., cleavable upon application of heat), a sulfone linkage (e.g., cleavable via a base), a silyl ether linkage (e.g., cleavable via an acid), a glycosidic linkage (e.g., cleavable via an amylase), a peptide linkage (e.g., cleavable via a protease), or a phosphodiester linkage (e.g., cleavable via a nuclease such as DNase).
[00146] Similarly, in some instances, the capture moiety may be cleaved from the peptide or the substrate. The cleaving may occur at any useful or convenient step, e.g., after generation of the modified amino acid or stacked plurality of modified amino acids. In some instances, cleavage of the capture moiety may occur subsequent to the formation of a stacked plurality of modified amino acids, and the cleaved product may be sequenced, e.g., using imaging approaches described elsewhere herein.
[00147] Binding agents: One or more binding agents may be used herein to facilitate detection of the modified monomers (e.g., modified amino acids). A binding agent may be contacted with the modified amino acid or a stacked plurality of modified amino acids. The binding agent may
be any useful molecule that can couple to a modified amino acid. For example, a binding agent may be or comprise a protein or peptide (e.g., an antibody, antibody fragment, single-chain variable fragment (scFv), nanobody, anticalin, tRNA synthetase or tRNA-acyl synthetase, a fibronectin domain), a peptide mimetic, a peptidomimetic (e.g., a peptoid, a beta-peptide, a D-peptide peptidomimetic), a polysaccharide, a nucleic acid molecule (e.g., aptamer), a somamer, a polymer, an inorganic compound, an organic compound, a small molecule, or derivatives (e.g., engineered variants) or combinations thereof. The binding agent may comprise a recognition site that specifically recognizes an amino acid type, a modified amino acid (e.g., an amino acid type coupled to a linker comprising a PITC, xanthate, guanidinylating agent, dithioester, or thiocarbamoyl moiety), or a derivatized (and optionally modified) amino acid. For example, the binding agent may be configured to recognize or have binding specificity to a moiety of a modified amino acid, such as a specific amino acid residue, the residue-linker complex, or a derivatized amino acid (e.g., a thiocarbamoyl-derivatized residue, a thiazolone-derivatized residue, a thiohydantoin-derivatized residue, etc.), or a portion of a modified amino acid. In some instances, the binding agent may be derived or engineered from a naturally occurring enzyme or protein, e.g., an aminopeptidase, exopeptidase, metalloprotease, antibody, anticalin, N-recognin protein, Clp protease, endoprotease (e.g., trypsin), or tRNA synthetase. In some examples, a binding agent may be a cleaving enzyme (e.g., trypsin, endoprotease) that has been modified to remove the peptidase activity. The binding agent may also recognize a terminal amino acid that is attached to a substrate; for example, after all but the final amino acid of a peptide have been coupled to the capture moiety or capture moieties and cleaved, the final amino acid may remain coupled to a substrate. 
Accordingly, the binding agent may recognize and bind the surface-coupled amino acid.
[00148] The binding agents may be contacted with and specifically bind to modified amino acids, e.g., cleaved and/or derivatized amino acids, an amino acid-linker complex, or an amino acid-linker-capture moiety complex (altogether referred to herein as “monomeric analytes”). A monomeric analyte may fall in any size or range of sizes that is less than that of the entire polymeric analyte. A monomeric analyte may be about 0.1 nanometer (nm), about 0.5 nm, about 1 nm, about 5 nm, about 10 nm, about 20 nm, about 30 nm, about 40 nm, about 50 nm, about 60 nm, about 70 nm, about 80 nm, about 90 nm, about 100 nm, about 200 nm, about 300 nm, about 400 nm, about 500 nm, about 600 nm, about 700 nm, about 800 nm, about 900 nm, about 1 micrometer (μm), about 10 μm, about 100 μm, about 1 millimeter (mm) in size or greater. The monomeric analyte may have any molecular weight or range of molecular weights. The monomeric analyte may have a molecular weight of about 1 dalton (Da), 10 Da, 100 Da, 500 Da, 1 kilodalton (kDa), 10 kDa, 100 kDa, 1,000 kDa, 10,000 kDa, 100,000 kDa, or greater. The monomeric analyte may vary in molecular weight or length, e.g., depending on the amino acid residue.
[00149] The binding agent may comprise or be coupled, directly or indirectly, to a polymerizable molecule. The polymerizable molecule of the binding agent may be the same type of molecule as the binding agent itself (e.g., both peptides, both nucleic acid molecules, etc.), or they may be different. In some instances, the binding agent comprises a peptide (e.g., antibody or antibody fragment) and the polymerizable molecule of the binding agent comprises a nucleic acid molecule. The polymerizable molecule of the binding agent may be conjugated to the binding agent via a chemical conjugation approach, e.g., using linkers such as SMCC, (N-ε-maleimidocaproyloxy)succinimide ester (EMCS), succinimidyl-4-(p-maleimidophenyl)butyrate (SMPB), succinimidyl-(N-maleimidopropionamido-ethyleneglycol) ester (SMPEG), succinimidyl (NHS) esters, succinimidyl-4-formylbenzamide (S-4FB), succinimidyl-6-hydrazinonicotinamide (S-HyNic), 4-phenyl-3H-1,2,4-triazoline-3,5(4H)-diones (PTAD) or diazonium reagents, 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride (EDC), etc. Synthesis of the peptide-nucleic acid molecule conjugate may also be carried out using solid-phase synthesis, fragment conjugation (e.g., using heterobifunctional crosslinkers such as those comprising an aliphatic chain with a maleimide group on one end and an NHS ester on the other), click chemistry (e.g., strain-promoted azide-alkyne cycloaddition, inverse-electron-demand Diels-Alder reactions), or combinations of approaches or chemistries. In some instances, the polymerizable molecule of the binding agent may be conjugated to the binding agent using an enzymatic approach. For example, a DNA-protein conjugate may be generated using a truncated nuclease (e.g., a Cas protein such as Cas9), a relaxase (e.g., VirD2), or other enzyme, ribozyme, or DNAzyme. 
In some instances, the polymerizable molecule of the binding agent may be conjugated to the binding agent using a SpyTag and SpyCatcher interaction, a biotin-avidin interaction, a SNAP-tag, or other interaction. Optional purification may be performed, e.g., using ion-exchange chromatography, HPLC, affinity chromatography, or other purification technique.
[00150] The binding agent may be coupled to the polymerizable molecule of the binding agent via a noncovalent interaction. For instance, the binding agent may comprise an avidin or streptavidin tag, to which biotin-conjugated polymerizable molecules can bind. Alternatively, the binding agent may comprise a biotin tag to which an avidin- or streptavidin-conjugated polymerizable molecule can bind.
[00151] More than one binding agent may be used to identify the modified amino acid. In one such case, subsequent to the binding of the binding agent to the modified amino acid, an additional molecule (e.g., a secondary binding agent) comprising a detectable label, e.g., fluorophore,
radioisotope, mass tag, or an identifying polymerizable molecule (e.g., nucleic acid barcode molecule) may be contacted with and bind to the binding agent that is bound to the modified amino acid. In one non-limiting example, the binding agent comprises a primary antibody or antibody fragment that recognizes the modified amino acid, e.g., a terminal amino acid-linker or terminal amino acid-linker-capture moiety complex or portion thereof (e.g., the terminal amino acid, or the terminal amino acid-linker complex); subsequent to binding of the primary antibody or antibody fragment, a secondary antibody or antibody fragment comprising or coupled to a detectable label such as a fluorophore or a polymerizable molecule (e.g., nucleic acid barcode molecule) is coupled to the primary antibody. The polymerizable molecule of the secondary antibody or antibody fragment may comprise information on the secondary antibody or antibody fragment, the primary antibody or antibody fragment, or other information. Transfer or coupling of the polymerizable molecule of the secondary antibody or antibody fragment to the capture moiety or polymerizable molecule can be mediated by any suitable technique, e.g., hybridization of nucleic acid molecules optionally mediated by a splint molecule, click chemistry, or association of high-affinity molecules (e.g., streptavidin and biotin).
[00152] Detectable Labels: The binding agent may be coupled to one or more detectable labels. The detectable label may be a label that can be directly detected, e.g., a dye, fluorophore, quantum dot, mass tag, radioisotope, or other detectable label. Alternatively, the detectable label may be detected indirectly, e.g., the detectable label may comprise a nucleic acid molecule that can be detected using downstream nucleic acid processing and sequencing, as described elsewhere herein. In some embodiments, the binding agent is coupled to one or more fluorophores, which can be detected using fluorescence imaging approaches. In one such example, a plurality of binding agents, each having specificity for a different amino acid type, may each comprise a different fluorophore. Alternatively, or in addition, the plurality of binding agents may comprise combinations of fluorophores, such that each binding agent produces a uniquely identifiable fluorescence signal (e.g., a spectral set of fluorophores).
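Encoding each amino acid type as a combination of fluorophores scales better than using one fluorophore per type, because k distinguishable fluorophores give 2^k - 1 non-empty combinations. The sketch assumes, as a simplification not stated in the source, that every combination is read out perfectly, with no spectral crosstalk and no missing labels. The fluorophore names F1 to F5 are placeholders.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence


def min_fluorophores(n_types: int) -> int:
    """Smallest k with 2**k - 1 >= n_types (non-empty fluorophore subsets)."""
    k = 1
    while 2 ** k - 1 < n_types:
        k += 1
    return k


def assign_codes(types: Sequence[str],
                 fluorophores: Sequence[str]) -> Dict[str, FrozenSet[str]]:
    """Give each type a distinct non-empty fluorophore combination, smallest sets first."""
    codes: List[FrozenSet[str]] = [
        frozenset(combo)
        for r in range(1, len(fluorophores) + 1)
        for combo in combinations(fluorophores, r)
    ]
    if len(codes) < len(types):
        raise ValueError("not enough fluorophore combinations for all types")
    return dict(zip(types, codes))


def decode(observed: FrozenSet[str], codebook: Dict[str, FrozenSet[str]]) -> str:
    """Map an observed set of fluorophore signals back to its amino acid type."""
    lookup = {code: aa for aa, code in codebook.items()}
    return lookup[frozenset(observed)]


if __name__ == "__main__":
    amino_acids = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical one-letter codes
    k = min_fluorophores(len(amino_acids))  # 5, since 2**5 - 1 = 31 >= 20
    book = assign_codes(list(amino_acids), [f"F{i}" for i in range(1, k + 1)])
    print(k, sorted(book["W"]))
```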
[00153] A detectable label may comprise any useful dye, fluorophore, fluorescent protein, or combination thereof. In some instances, the detectable label comprises a fluorophore or fluorescent protein that is configured for super-resolution imaging. Non-limiting examples of useful fluorescent labels include Alexa Fluor, APEX Alexa Fluor, Oregon Green, ATTO, cyanine dyes (e.g., Cy3, Cy3B, Cy5), DyLight, SYTO13, YOYO-1, SYTOX, Tetramethyl Rhodamine (TMR), Janelia Fluor, CellMask, LysoTracker, MitoTracker, PS-CFP2, PA-GFP, mGeos-M, PATagRFP, PAmCherry, tdEos, mEos1, mEos2, mEos3.2, mEos4b, mMaple3, Dendra2, PSmOrange, PAmKate, Kaede, Dronpa, Dreiklang, mIrisFP, NijiFP, or variants or derivatives thereof. In some instances, the detectable label may comprise a photoswitchable, caged, or photoactivatable label, dye, fluorophore, or fluorescent protein. For example, a binding agent may be labeled with a pair of photoswitchable fluorophores (e.g., Cy3 and Cy5) for use in dSTORM imaging; a first excitation wavelength pulse may be used to excite fluorescence from a first fluorophore of the pair and to switch off (to a dark state) a second fluorophore of the pair. A second excitation wavelength may then be used to switch the first fluorophore off and the second fluorophore on. Photoswitchable dyes may be beneficial because only a fraction of the fluorophores is switched on at any given moment, so the active fluorophores are sparse enough to be resolved optically as individual emitters. Alternatively, or in addition to, a combination of reporter and activator dyes or fluorophores may be used, e.g., for STORM imaging, e.g., as described by M. Rust et al. 2006. Nature Methods 3, 793-796, which is incorporated by reference herein in its entirety. In such an example, a binding agent may be labeled with a reporter dye and an activator dye; a first excitation wavelength pulse may be used to excite the activator dye, which triggers the return of nearby reporter molecules to the on state.
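Why switching on only a fraction of fluorophores improves resolvability can be made quantitative with a simple model. The sketch assumes fluorophores are scattered as a two-dimensional Poisson process and each is independently "on" with probability `on_fraction`. Under that assumption, the chance that an active fluorophore has no other active fluorophore within a resolution radius r is exp(-ρ·f·π·r²). The density of 1000 per µm² and the 0.25 µm radius are illustrative values, not parameters from the disclosure.

```python
import math


def isolated_fraction(density_per_um2, on_fraction, radius_um):
    """Probability that an 'on' fluorophore has no other 'on' fluorophore within radius_um.

    Assumes fluorophores form a 2D Poisson process with the given surface density,
    and that each is independently 'on' with probability on_fraction.
    """
    active_density = density_per_um2 * on_fraction
    return math.exp(-active_density * math.pi * radius_um ** 2)


# Illustrative density and diffraction-limited radius; sweep the fraction switched on.
for f in (1.0, 0.1, 0.01, 0.001):
    print(f, round(isolated_fraction(1000, f, 0.25), 3))
```

At this density essentially no emitter is isolated when all are on. The isolated fraction rises to about 0.14 at 1% on and about 0.82 at 0.1% on. This is the trade-off that photoswitching exploits: more frames, but each frame is sparse.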
[00154] The super-resolution imaging may be performed using any suitable buffer or buffer components. Selection of appropriate buffer solutions can improve the photoswitching of fluorophores. Non-limiting examples of buffer components include reducing agents such as beta-mercaptoethanol (BME), dithiothreitol (DTT), L-glutathione, mercaptoethylamine (MEA), and tris(2-carboxyethyl)phosphine (TCEP); oxygen scavengers such as glucose oxidase, catalase, protocatechuic acid, and protocatechuate 3,4-dioxygenase; fluorescence enhancers; and glucose.
[00155] In some instances, the modified amino acid or stacked plurality of modified amino acids is contacted with a library of binding agents. The library of binding agents may comprise a plurality of binding agents that have specificity to different analytes. For example, the library of binding agents may comprise a plurality of binding agents that recognize different amino acids or derivatives thereof (e.g., derivatized amino acids such as the PTH, PTC, or ATZ forms), clusters of amino acids (e.g., dipeptides, tripeptides, etc.), or combinations of amino acids (e.g., amino acids with similar side chain groups). In one such example, a given binding agent may recognize and bind to more than one amino acid, optionally with different affinities or binding kinetics. A given binding agent may recognize and bind to a single amino acid, two different amino acids, three different amino acids, four different amino acids, etc. For instance, a given binding agent may bind to amino acids with similar side chains, e.g., amino acids with positively charged side chains (e.g., arginine, histidine, lysine), negatively charged side chains (e.g., aspartic acid, glutamic acid), polar uncharged side chains (e.g., serine, threonine, asparagine, glutamine), or hydrophobic side chains (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, tryptophan), or a combination of these groups. Altogether, the library of binding agents may specifically recognize or bind to any number of different amino acids; for example, the library of binding agents may be configured to specifically bind to at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 different proteinogenic amino acids or derivatives thereof.
[00156] The library of binding agents may comprise any useful number of binding agents, each of which can have a different binding specificity. For example, a first binding agent may recognize one amino acid, a second binding agent may recognize two amino acids, and a third binding agent may recognize three amino acids. In another example, a first binding agent may recognize one amino acid, a second binding agent may recognize a different amino acid, and a third binding agent may recognize a plurality of amino acids. It will be appreciated that any number of binding agents may be used, and that each binding agent may have specificity to one or more amino acids. Altogether, the library of binding agents may bind to all 20 proteinogenic amino acids or derivatives thereof, or to a subset (e.g., 10 or more, 15 or more) of the amino acids.
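When binding agents have overlapping specificities, as in the side-chain groupings above, the identity of a residue can be narrowed by combining positive and negative binding evidence. The sketch below is hypothetical in several respects. The library mixes the four side-chain classes from the text with two invented single-residue agents ("anti-Leu", "anti-Phe"). Residues are one-letter codes. Binding is assumed to be ideal, with no false positives or negatives; real affinities and kinetics would call for a probabilistic model instead.

```python
ALL_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 proteinogenic amino acids

# Hypothetical library: class-level agents following the groupings in the text,
# plus two single-residue agents.
LIBRARY = {
    "positive": set("RHK"),
    "negative": set("DE"),
    "polar": set("STNQ"),
    "hydrophobic": set("AVILMFYW"),
    "anti-Leu": {"L"},
    "anti-Phe": {"F"},
}


def candidate_residues(bound_agents, library=LIBRARY, universe=ALL_AMINO_ACIDS):
    """Residues consistent with an observed binding pattern, assuming ideal binding:
    every agent that bound recognizes the residue, and no agent that did not bind does."""
    candidates = set(universe)
    for agent, targets in library.items():
        if agent in bound_agents:
            candidates &= targets  # positive evidence keeps only this agent's targets
        else:
            candidates -= targets  # negative evidence removes this agent's targets
    return candidates


print(sorted(candidate_residues({"hydrophobic"})))  # ['A', 'I', 'M', 'V', 'W', 'Y']
print(candidate_residues({"hydrophobic", "anti-Leu"}))  # {'L'}
```

An empty binding pattern is still informative. It leaves only the residues no agent in this hypothetical library covers (C, G, P), which shows why library coverage matters.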
[00157] A binding agent may be passivated prior to or during contact with the cleaved monomer (e.g., cleaved modified amino acid). Passivation may be achieved using a blocking agent or solution, such as milk proteins (e.g., lactoglobulin, lactalbumin, lactoferrin, casein, whey, immunoglobulin, insulin, growth factors, osteopontin), albumin (e.g., bovine serum albumin), Tween 20, commercially available blocking solutions, or a combination thereof. Alternatively, or in addition to, passivation of the binding agent may be performed using a polymer (e.g., polyethylene glycol), organic compound (e.g., oil, lipids), sugar, nanoparticle, inorganic compound, ion, etc.
[00158] Super-Resolution Imaging/Microscopy: The modified amino acid or stacked plurality of modified amino acids may be detected using super-resolution imaging. In one such example, the modified amino acid or stacked plurality of modified amino acids is contacted with a plurality of binding agents, wherein each of the plurality of binding agents comprises a detectable label. The plurality of binding agents may comprise binding agents (e.g., antibodies, antibody fragments, nanobodies, etc.) that are specific to a particular amino acid type (e.g., leucine, valine, isoleucine, phenylalanine, etc.) or derivative thereof (e.g., a leucine derivative, a valine derivative, a leucine-linker complex, a valine-linker complex, etc.); each binding agent that binds to a particular amino acid type may comprise a fluorophore or quantum dot that is detectable using photoillumination and detection (e.g., fluorescence microscopy). Any suitable super-resolution imaging approach may be used, including microscopy techniques that have higher resolution than the Abbe diffraction limit, such as total internal reflection fluorescence microscopy (TIRF), photo-tunneling microscopy, near-field scanning optical microscopy (e.g., near-field optical random mapping microscopy), far-field scanning optical microscopy, confocal microscopy, 4Pi microscopy, structured illumination microscopy (SIM) approaches, optical sectioning SIM (OS-SIM), super-resolution SIM (SR-SIM), spatially modulated illumination (SMI), saturated pattern excitation microscopy (SPEM), etc. In some instances, deterministic super-resolution approaches such as stimulated emission depletion (STED), ground state depletion (GSD), reversible saturable optical fluorescence transitions (RESOLFT), and saturated structured illumination microscopy (SSIM), or stochastic approaches such as spectral precision distance microscopy (SPDM), SPDM with modifiable fluorophores (SPDMphymod), photoactivated localization microscopy (PALM), fluorescence PALM (FPALM), stochastic optical reconstruction microscopy (STORM), and direct STORM (dSTORM) may be used. Other non-limiting examples of super-resolution microscopy include cryogenic optical localization in 3D (COLD), binding-activated localization microscopy (BALM), points accumulation for imaging in nanoscale topography (PAINT), super-resolution optical fluctuation imaging (SOFI), omnipresent localization microscopy (OLM), resolution enhancement by sequential imaging (RESI), and combination techniques such as 3D light microscopical nanosizing (LIMON) and integrated correlative light and electron microscopy, among others.
In some instances, a combination of microscopy approaches may be employed. The imaging approach can be 2-dimensional or 3-dimensional. In some instances, the imaging approaches may include fluorescence resonance energy transfer (FRET). In some instances, three-dimensional imaging approaches may be used, e.g., confocal microscopy, OS-SIM, or another optical sectioning approach.
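Why diffraction-limited optics cannot separate closely spaced labels, while localization-based methods such as PALM or STORM can, follows from two standard estimates. The first is the Abbe limit, d = λ/(2·NA). The second is a rough single-emitter localization precision, σ_PSF/√N, using the common Gaussian approximation σ_PSF ≈ 0.21·λ/NA. The sketch ignores background, pixelation, and drift, so real precision will be worse. The 670 nm emission wavelength, NA of 1.4, 1000 photons, and 50 nm pitch are illustrative assumptions.

```python
import math


def abbe_limit_nm(wavelength_nm, numerical_aperture):
    """Abbe lateral diffraction limit, d = lambda / (2 * NA)."""
    return wavelength_nm / (2.0 * numerical_aperture)


def localization_precision_nm(wavelength_nm, numerical_aperture, photons):
    """Rough localization precision sigma_PSF / sqrt(N) for one isolated emitter.

    Uses the Gaussian approximation sigma_PSF ~ 0.21 * lambda / NA and ignores
    background, pixel size, and drift, so it is an optimistic lower bound.
    """
    sigma_psf = 0.21 * wavelength_nm / numerical_aperture
    return sigma_psf / math.sqrt(photons)


pitch_nm = 50.0  # illustrative spacing between adjacent modified amino acids
d = abbe_limit_nm(670, 1.4)
sigma = localization_precision_nm(670, 1.4, photons=1000)
print(f"Abbe limit {d:.0f} nm, localization precision {sigma:.1f} nm, pitch {pitch_nm:.0f} nm")
```

With these numbers the Abbe limit (about 239 nm) is several times the pitch, while the localization precision (about 3 nm) is far below it. That gap is why the stochastic approaches listed above suit stacked modified amino acids.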
[00159] In some instances, a stacked plurality of modified amino acids is contacted with binding agents comprising detectable labels and imaged using super-resolution microscopy. The modified amino acids of the stacked plurality of modified amino acids may be generated using an intramolecular expansion process, as described above, such that the distance between the modified amino acids is optimized for imaging. The modified amino acids may have a pitch (i.e., a spacing between adjacent amino acid moieties) of about 5 nm, about 10 nm, about 20 nm, about 30 nm, about 40 nm, about 50 nm, about 60 nm, about 70 nm, about 80 nm, about 90 nm, about 100 nm, about 200 nm, or greater. The modified amino acids may have a pitch of at least about 20 nm, at least about 30 nm, at least about 40 nm, at least about 50 nm, at least about 60 nm, at least about 70 nm, at least about 80 nm, at least about 90 nm, at least about 100 nm, at least about 200 nm, or greater. The modified amino acids may be spaced within a range of values, e.g., between about 50 nm and about 100 nm, between about 5 nm and about 50 nm, etc.
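Once labels on a linearized stack have been localized and decoded, a known pitch lets each localization be assigned to a position in the stack. The sketch below makes several assumptions. The stack is straight, with a known anchored origin and axis direction. The pitch is uniform. Each localization already carries a decoded residue. The function `read_stack` and the coordinates in the example are hypothetical. Gaps are reported as "?" and conflicting calls at the same position as "X".

```python
import math


def read_stack(localizations, origin, direction, pitch_nm):
    """Turn localizations along a straight, linearized stack into a residue string.

    localizations: iterable of (x_nm, y_nm, residue) from decoded binding-agent labels.
    origin: (x, y) of position 0, e.g., the anchored end; direction: the stack axis.
    """
    norm = math.hypot(*direction)
    ux, uy = direction[0] / norm, direction[1] / norm
    calls = {}
    for x, y, residue in localizations:
        along = (x - origin[0]) * ux + (y - origin[1]) * uy  # projection onto the axis
        calls.setdefault(round(along / pitch_nm), set()).add(residue)
    if not calls:
        return ""
    out = []
    for index in range(min(calls), max(calls) + 1):
        found = calls.get(index, set())
        if not found:
            out.append("?")  # no label localized at this position
        elif len(found) == 1:
            out.append(next(iter(found)))
        else:
            out.append("X")  # conflicting residue calls at one position
    return "".join(out)


spots = [(2, 3, "M"), (51, -4, "K"), (148, 1, "L")]
print(read_stack(spots, origin=(0, 0), direction=(1, 0), pitch_nm=50))  # MK?L
```

Rounding to the nearest multiple of the pitch tolerates localization errors up to about half the pitch. This is one reason a larger pitch relative to localization precision makes the readout more robust.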
[00160] Super-resolution imaging may also be useful in obtaining kinetic or time-lapse data. For instance, the super-resolution imaging of a modified amino acid or stacked plurality of modified amino acids may be performed using any useful pulse or acquisition duration, e.g., femtoseconds, picoseconds, microseconds, milliseconds, seconds, or minutes. Similarly, a modified amino acid or stacked plurality of modified amino acids may be imaged periodically over a duration of time, e.g., once per second, once per minute, once per hour, etc.
[00161] Any number of imaging color (fluorescence) channels may be utilized for detecting the one or more detectable labels. For instance, imaging may be performed using a single channel, two channels, three channels, four channels, or a greater number of channels. In instances where multi-channel imaging is employed, any useful combination of imaging equipment such as lasers or other light sources (e.g., mercury lamps, light-emitting diodes, optical waveguides), mirrors (e.g., dichroic mirrors), optical filters (e.g., long-pass or bandpass filters or other excitation or emission filters), apertures, and detection systems (e.g., cameras) may be used.
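Planning a multi-channel experiment partly comes down to checking which emission filter each label falls into, and whether any channel must carry more than one label. The sketch assigns labels by emission peak only and ignores spectral bleed-through. The peak values are approximate published emission maxima, used for illustration; check the vendor's spectra before relying on them. The filter windows and the helper `plan_channels` are hypothetical.

```python
def plan_channels(emission_peaks_nm, channels):
    """Assign each label to the first emission channel whose bandpass contains its peak.

    emission_peaks_nm: {label: approximate emission maximum in nm}
    channels: {channel name: (low_nm, high_nm)} bandpass windows
    Returns (plan, unassigned). A channel holding more than one label cannot
    tell those labels apart by color alone.
    """
    plan = {name: [] for name in channels}
    unassigned = []
    for label, peak in emission_peaks_nm.items():
        hits = [name for name, (low, high) in channels.items() if low <= peak <= high]
        if hits:
            plan[hits[0]].append(label)
        else:
            unassigned.append(label)
    return plan, unassigned


peaks = {"Alexa Fluor 488": 519, "Cy3": 570, "Cy5": 670}  # approximate maxima, illustrative
filters = {"green": (500, 550), "orange": (555, 625), "far-red": (650, 720)}  # hypothetical
plan, unassigned = plan_channels(peaks, filters)
crowded = [name for name, labels in plan.items() if len(labels) > 1]
print(plan, unassigned, crowded)
```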
[00162] Alternatively, the modified amino acid or stacked plurality of modified amino acids may be detected using Raman spectroscopy-based high-resolution imaging techniques. In some examples, the modified amino acid or stacked plurality of modified amino acids can be identified through label-free detection, by the unique intrinsic vibrational modes of the molecules (e.g., a "fingerprint" in the Raman spectrum). The unique "fingerprint" in the Raman spectrum may be further enhanced or modulated by specialized methods that improve the spatial, spectral, or temporal resolution of the vibrational signal in Raman spectroscopy. In other examples, the modified amino acid or stacked plurality of modified amino acids is contacted with a plurality of binding agents, wherein each of the plurality of binding agents comprises a detectable label for Raman spectroscopy. The plurality of binding agents may comprise binding agents (e.g., antibodies, antibody fragments, nanobodies, etc.) that are specific to a particular amino acid type (e.g., leucine, valine, isoleucine, phenylalanine, etc.) or derivative thereof (e.g., a leucine derivative, a valine derivative, a leucine-linker complex, a valine-linker complex, etc.); each binding agent that binds to a particular amino acid type may comprise a detectable Raman label (e.g., 4-Mercaptobenzoic Acid (4-MBA), 4-Aminothiophenol (4-ATP), Crystal Violet, Malachite Green, Nile Blue, 2-Naphthalenethiol, Methylene Blue, etc.). Any suitable Raman spectroscopy-based high-resolution imaging technique may be used, such as Confocal Raman Microscopy, Tip-Enhanced Raman Spectroscopy (TERS), Surface-Enhanced Raman Spectroscopy (SERS), Coherent Anti-Stokes Raman Spectroscopy (CARS), Stimulated Raman Scattering (SRS) Microscopy, Resonance Raman Spectroscopy (RRS), Spatially Offset Raman Spectroscopy (SORS), Multiplex Coherent Anti-Stokes Raman Scattering (M-CARS), Hyperspectral Raman Imaging, Electron Energy Loss Raman Spectroscopy (EELS-RS), Nonlinear Raman Spectroscopy, Low-Frequency Raman Spectroscopy (LFRS), Temperature-Dependent Raman Spectroscopy (TDRS), Polarized Raman Spectroscopy (PRS), Time-Resolved Raman Spectroscopy (TRRS), Raman Optical Activity (ROA), Magnetic Field-Enhanced Raman Spectroscopy, Deep Raman Spectroscopy (DRS), Angle-Resolved Raman Spectroscopy (ARRS), Nano-Raman Spectroscopy, Rotational Raman Spectroscopy, Micro-Raman Spectroscopy (μ-Raman), among others. In some instances, a combination of Raman spectroscopy-based high-resolution imaging techniques may be employed. In some instances, the high-resolution imaging techniques can be 2-dimensional or 3-dimensional. In some instances, the high-resolution imaging techniques can distinguish two modified amino acids that are spaced along a plurality of stacked polymerizable molecules. In some instances, the high-resolution imaging techniques can extend to the single-molecule detection level (e.g., distinguish a single stacked polymerizable molecule). In some instances, the high-resolution imaging techniques can distinguish two modified amino acids at a distance greater than 50 nanometers.
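Identifying a Raman "fingerprint," whether from an intrinsic signal or from a Raman label, can be framed as matching a measured spectrum against a set of reference spectra. The sketch uses cosine similarity, one simple choice among many; multivariate methods or peak fitting are common alternatives. The band positions, heights, widths, and noise level are hypothetical and generate synthetic spectra. They are not measured reference spectra for any specific label.

```python
import math
import random


def synthetic_spectrum(peaks, axis, width=8.0):
    """Sum of Gaussian bands; peaks is a list of (center in cm^-1, height)."""
    return [sum(h * math.exp(-0.5 * ((x - c) / width) ** 2) for c, h in peaks) for x in axis]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def best_match(measured, references):
    """Return (name, score) of the reference spectrum most similar to the measurement."""
    scores = {name: cosine_similarity(measured, ref) for name, ref in references.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


axis = range(400, 1801, 2)  # Raman shift grid in cm^-1
references = {  # hypothetical band positions, not measured reference spectra
    "label A": synthetic_spectrum([(1075, 1.0), (1585, 0.8)], axis),
    "label B": synthetic_spectrum([(1000, 1.0), (1620, 0.5)], axis),
}
rng = random.Random(0)  # fixed seed so the simulated noise is reproducible
measured = [v + rng.gauss(0, 0.05) for v in synthetic_spectrum([(1077, 0.9), (1583, 0.75)], axis)]
print(best_match(measured, references))
```

Cosine similarity ignores overall intensity, so a dimmer spot with the same band pattern still matches. Small band shifts and additive noise lower the score but do not change the best match here.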
[00163] Linearization of modified amino acids: In some embodiments, the modified amino acids or stacked plurality of modified amino acids may be linearized to facilitate imaging (e.g., using super-resolution microscopy). For example, the modified amino acids or stacked plurality of modified amino acids may comprise one or more polymerizable molecules (e.g., nucleic acid molecules), which can be linearized. In some instances, the linearization occurs on a substrate. In such instances, one portion of a modified amino acid or stacked plurality of modified amino acids may be immobilized or attached to the substrate. For instance, the substrate may comprise one or more anchor molecules to which a portion of the modified amino acids or stacked plurality of modified amino acids can couple. In one such example, the polymerizable molecule of the modified amino acid or stacked plurality of modified amino acids may comprise a DNA molecule, which can couple to an anchoring DNA molecule attached to the substrate, e.g., via hybridization, ligation, an extension reaction, splinted hybridization and ligation, etc. In another example, the modified amino acid or stacked plurality of modified amino acids may comprise a click chemistry moiety which can couple to a complementary click chemistry anchor moiety. In yet another example, the modified amino acid or stacked plurality of modified amino acids and the anchor moiety may comprise one
or more coupling molecules of a binding pair, e.g., biotin, streptavidin, antibody, antigen, etc. As such, the modified amino acid or stacked plurality of modified amino acids may couple to the anchor moiety via a binding interaction (e.g., biotin-streptavidin, antigen-antibody, etc. interaction). In some instances, the coupling may occur via chemical conjugation, e.g., conjugation of an end or portion of the modified amino acid or stacked plurality of modified amino acids to the anchor molecule. The modified amino acid or stacked plurality of modified amino acids comprising the one or more polymerizable molecules (e.g., DNA molecules) may then be linearized, e.g., using applied force such as shear stress (e.g., using fluid flow, flow rate, channel height or depth), an electric field (e.g., electrophoresis), centrifugal or centripetal force, pressure (e.g., pressure-driven flow), a magnetic field (e.g., using magnetic particles attached to one or more portions of the modified amino acids or stacked plurality of modified amino acids), optical tweezers, mechanical forces (e.g., application of tensile strain or stress), chemical approaches (e.g., controlling an ion or salt concentration), thermal approaches (e.g., application of heat), biological approaches (e.g., using a polymerizing enzyme or a DNA-binding enzyme such as a helicase), or a combination thereof. During or subsequent to linearization, another portion of the modified amino acid or stacked plurality of modified amino acids may be attached to the substrate, e.g., via hybridization of an additional sequence of the DNA molecule comprised by the modified amino acid or stacked plurality of modified amino acids to an additional DNA molecule attached to the substrate, or via chemical conjugation.
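When the polymerizable molecules are DNA and the stack is linearized, the spacing between adjacent modified amino acids depends on how much DNA separates them and how far it is stretched. The sketch relates these quantities using the B-form rise of about 0.34 nm per base pair. It assumes each stacked unit contributes one double-stranded segment of uniform length. It ignores linker length, single-stranded regions, and overstretching. The `extension` factor (stretched length as a fraction of contour length) is an assumed parameter, not a value from the disclosure.

```python
import math

RISE_NM_PER_BP = 0.34  # approximate helical rise of B-form double-stranded DNA


def contour_length_nm(base_pairs):
    """Contour length of a B-form double-stranded DNA segment."""
    return base_pairs * RISE_NM_PER_BP


def bp_for_pitch(pitch_nm, extension=1.0):
    """Smallest dsDNA segment (in bp) whose stretched length reaches pitch_nm.

    extension: stretched length as a fraction of contour length
    (1.0 = fully stretched but not overstretched).
    """
    return math.ceil(pitch_nm / (RISE_NM_PER_BP * extension))


for pitch in (20, 50, 100):
    print(pitch, "nm ->", bp_for_pitch(pitch), "bp fully stretched,",
          bp_for_pitch(pitch, 0.8), "bp at 80% extension")
```

For example, a 50 nm pitch needs roughly 148 bp per stacked unit if fully stretched, and more if linearization is incomplete. The exact numbers shift with the assumptions above.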
[00164] In some instances, linearization of the modified amino acid or stacked plurality of modified amino acids may be performed using micro- or nano-fabrication approaches, e.g., using a microfluidic or nanofluidic device. For instance, the modified amino acid or stacked plurality of modified amino acids comprising a polymerizable molecule may be placed on a microfluidic or nanofluidic device comprising a thin film comprising nanochannels. The nanochannels may be widened under applied strain, thereby applying a tensile strain on the polymerizable molecule and stretching and linearizing it. In another example, the polymerizable molecule comprised by the modified amino acid or stacked plurality of modified amino acids may be electrokinetically driven into a microfluidic and/or nanofluidic device comprising micro- or nano-channels; such electrokinetic forces (e.g., electrophoretic force) may also, in some instances, be used to linearize the molecule. Alternatively, or in addition to, the modified amino acids may be directed within a microfluidic device using miscible and immiscible compounds, e.g., an oil-water interface or emulsion (e.g., water-oil-water, oil-water-oil), which may drive the modified amino acid or stacked plurality of modified amino acids to a particular region or interface of the substrate. Additional examples of micro- and nano-fabrication approaches for linearizing polymerizable molecules are provided in B. C. Kim et al. 2014. Biomaterials Science 2(3): 288-296, and S. K. Das et al. 2010. Nucleic Acids Res. 38(18): e177, each of which is incorporated by reference herein in its entirety.
[00165] In some instances, the modified amino acid or stacked plurality of modified amino acids may be immobilized or attached to a substrate. For instance, one or more linkers or chemical conjugation reactions may be used. In some instances, immobilization or attachment may be mediated using a polymeric matrix. For example, the modified amino acid or stacked plurality of modified amino acids may be provided in a polymerizable precursor solution (e.g., agarose, polyacrylamide, PEGDA, etc.), which is then applied to a surface of a substrate (e.g., an amino-silanized glass slide, flow cell, microfluidic device, etc.). The precursor solution may then be subjected to conditions sufficient to polymerize the solution, e.g., using a stimulus. Any type of stimulus may be used, e.g., a chemical stimulus (such as addition of an initiator and optionally a catalyst), photo-stimulus (e.g., for photopolymerization), electrical stimulus, thermal stimulus (e.g., application or removal of heat), biological stimulus (e.g., a polymerizing enzyme), etc. In some instances, the polymer matrix may be generated prior to introduction or immobilization of the modified amino acid or stacked plurality of modified amino acids. For instance, a surface of a substrate may comprise a polymer matrix on the surface, and the modified amino acids or stacked plurality of modified amino acids may be introduced and attached to the polymer matrix (e.g., using a linker molecule such as Sulfo-SANPAH, EDC, or other linker, as described elsewhere herein). Prior to, during, or subsequent to polymerization, the modified amino acid or stacked plurality of modified amino acids may be provided and coupled or immobilized to the polymer matrix.
[00166] In some instances, the method may comprise fixing the modified amino acid or stacked plurality of modified amino acids, e.g., using a fixing agent such as glutaraldehyde, paraformaldehyde, etc., to a substrate, as described elsewhere herein. The fixative may be useful in further immobilizing the modified amino acid or stacked plurality of modified amino acids. Alternatively, or in addition to, the modified amino acid or stacked plurality of modified amino acids may be immobilized to a three-dimensional matrix using crosslinkers and polymerization, e.g., chemical polymerization (e.g., using an initiator), photo-polymerization, etc.
[00167] Alternatively, or in addition to, immobilization may be performed using a dehydration or drying process. For instance, the modified amino acid or stacked plurality of modified amino acids may be attached and linearized on or adjacent to a substrate (e.g., a flat surface, a gel matrix), and subsequent to attachment, linearization, and optional attachment again (e.g., at another portion of the modified amino acid), the substrate may be dried, e.g., using nitrogen gas, heating, etc.
Beneficially, dehydration or drying of the substrate comprising the modified amino acid or stacked plurality of modified amino acids may minimize movement of the modified amino acid or stacked plurality of modified amino acids, which may facilitate high-resolution imaging.
[00168] In some instances, the modified amino acid or stacked plurality of modified amino acids need not be linearized. For instance, the modified amino acid or stacked plurality of modified amino acids may comprise a circularized polymerizable molecule (e.g., a single- or double-stranded DNA backbone). The circularized polymerizable molecule may be attached to the substrate using any suitable approaches, e.g., as described elsewhere herein, which may include the use of linkers, click chemistry, or binding pairs (e.g., biotin and streptavidin). In one such example, the modified amino acid or stacked plurality of modified amino acids may comprise a circular, double-stranded DNA backbone comprising pendant cleaved amino acids derived from a peptide analyte. The circular, double-stranded DNA backbone may additionally comprise one or more biotin moieties, which can be used to immobilize the double-stranded DNA backbone to a streptavidin-coated substrate, e.g., as described by M. Rust et al. 2006. Nature Methods 3, 793-796, which is incorporated by reference herein in its entirety. Additional immobilization techniques, e.g., drying or dehydrating, addition of oil, fixation (e.g., using a chemical fixative), may be used to further prevent movement of the modified amino acid or stacked plurality of modified amino acids prior to analysis (e.g., using super-resolution imaging). In another example, individual modified amino acids may be provided on a substrate and detected, e.g., using binding agents with detectable labels, without the need for linearization.
[00169] In some instances, an immersion medium suitable for imaging may be applied to the modified amino acid or stacked plurality of modified amino acids. In one such example, a substrate-immobilized and optionally linearized modified amino acid may be provided. The substrate may be dried or dehydrated, thereby further immobilizing the modified amino acid or stacked plurality of modified amino acids. A suitable immersion medium, e.g., water or oil or another medium having a suitable refractive index, may be contacted with the substrate and applied to the modified amino acid or stacked plurality of modified amino acids. In some examples, oil may be selected as the immersion medium, which may be beneficial in further directing the modified amino acid or stacked plurality of modified amino acids to a surface (e.g., imaging surface) of the substrate, e.g., by hydrophobic sequestration of a hydrophilic polymer comprised by the modified amino acid or stacked plurality of modified amino acids. Further, the use of oil as the immersion medium may be useful in preserving dye molecules (e.g., preventing photobleaching or degradation of fluorophores), as well as restoring the refractive index of a
sample for imaging (e.g., for changing the refractive index of a dried substrate with air as an immersion medium to a refractive index similar to water).
[00170] In one example, a method provided herein may comprise providing a peptide and generating a modified amino acid from the peptide. Generating the modified amino acid may comprise providing a linker and a polymerizable molecule; coupling the linker to an amino acid of the peptide and to the polymerizable molecule, thereby generating an amino acid-linker complex; and cleaving the amino acid from the peptide, thereby generating the modified amino acid comprising the linker and the polymerizable molecule. In some instances, the amino acid-linker complex or the cleaved modified amino acid is coupled to a capture moiety. One or more operations of the process may be repeated, thereby generating a stacked plurality of modified amino acids comprising a plurality of stacked polymerizable molecules. The modified amino acids or stacked plurality of modified amino acids may then be analyzed. In some aspects of the present disclosure, provided herein is a method for characterizing a modified amino acid. In some instances, the modified amino acid is characterized or identified using a binding agent comprising a detectable label and detecting the detectable label, thereby identifying an amino acid type of the modified amino acid. In some embodiments, the modified amino acid is analyzed using super-resolution imaging. In some embodiments, the method may comprise attaching a portion of the modified amino acid to a substrate, linearizing the modified amino acid or portion thereof, and attaching an additional portion of the modified amino acid to the substrate. In some instances, the method comprises attaching a first sequence of a DNA molecule or modified DNA molecule (e.g., a DNA molecule comprised by a modified amino acid) to a substrate, linearizing the DNA molecule, and attaching a second sequence of the DNA molecule or the modified DNA molecule to the substrate. Additional methods and systems of processing polymeric analytes such as peptides are also described in U.S. Pat. No. 11,499,979, International Pat. App. Nos. PCT/US2023/017954 and PCT/US2023/071456, and U.S. Provisional Pat. App. No. 63/507,558, filed June 12, 2023, each of which is incorporated by reference herein in its entirety.
[00171] FIGs. 1A-1B schematically show example workflows for generating a modified monomer from a polymeric analyte (e.g., a modified amino acid from a peptide) either on a substrate (FIG. 1A) or with or without a substrate (FIG. 1B). In workflow 100a of FIG. 1A, a polymeric analyte 103 (e.g., a peptide) and a capture moiety 105 are provided, which optionally are coupled to a substrate 101. The capture moiety 105 may comprise a first nucleic acid molecule (e.g., DNA). In process 106, a linker 109 and a polymerizable molecule, e.g., a linking nucleic acid molecule 111, are provided. In some instances, the linker 109 is pre-tethered to the polymerizable molecule (depicted as a linking nucleic acid molecule 111); alternatively, the linker
109 and the polymerizable molecule may be provided separately. In process 106, the linker 109 may couple to a monomer, e.g., an amino acid (e.g., NTAA) of the polymeric analyte 103 (e.g., a peptide) to generate a monomer-linker complex. In process 112, the monomer-linker complex may couple to the capture moiety 105, thereby generating a monomer-capture moiety complex. Coupling of the monomer-linker complex to the capture moiety 105 may be mediated by the polymerizable molecule, e.g., the linking nucleic acid molecule 111. Optionally, the monomer-linker complex and the capture moiety 105 may be covalently linked together (e.g., the linking nucleic acid molecule 111 may be covalently linked to the capture moiety 105), using chemical (e.g., click chemistry) or enzymatic (e.g., a ligase) approaches. Alternatively, or in addition to, the polymerizable molecule may comprise a first sequence that is complementary and may hybridize to a second sequence of the capture moiety 105 (not shown), or the polymerizable molecule may be linked to the capture moiety 105 via a splint or bridge molecule, which may comprise sequences that are complementary to the first sequence of the polymerizable molecule and the second sequence of the capture moiety 105 (not shown). In process 113, the monomer may be cleaved from the polymeric analyte 103, thereby providing a modified monomer that comprises the cleaved monomer-capture moiety complex; the modified monomer may comprise the cleaved monomer coupled to the linker 109, the polymerizable molecule (shown as a linking nucleic acid molecule 111), the capture moiety 105, or a combination thereof (e.g., the modified monomer may comprise the cleaved monomer, the linker, and the polymerizable molecule).
[00172] Any of the processes, e.g., 106, 112, and 113, may be iterated and repeated any number of times (“rounds”) using additional linkers 109 and polymerizable molecules (e.g., linking nucleic acid molecules optionally comprising cycle/round information), and tethering the additional polymerizable molecules together (e.g., tethering an additional polymerizable molecule to the polymerizable molecule of the monomer-capture moiety complex). Multiple rounds may be performed until all or a subset of the monomers in the polymeric analyte 103 are cleaved and tethered together. In some instances, processes 106, 112, and 113 may be iterated to generate a stacked plurality of modified amino acids 123 comprising a set of cleaved monomers, e.g., a concatenated set of modified monomers that each comprise a polymerizable molecule coupled thereto. The stacked plurality of modified amino acids 123 may comprise a stacked set of polymerizable molecules from the individual modified monomers. In some instances, the stacked plurality of modified amino acids 123 may be cleaved from the substrate. For example, the capture moiety 105 or the polymerizable molecule (e.g., linking nucleic acid molecule 111) may comprise a restriction or cleavable site that can be cleaved upon addition of the proper cleaving reagent, e.g., a restriction enzyme, a reducing agent (for disulfide bonds), etc. Alternatively, or in addition to,
the stacked set of polymerizable molecules may be subject to amplification to generate amplicons of the polymerizable molecules coupled to or comprised by the stacked plurality of modified amino acids. The stacked plurality of modified amino acids 123 may be subjected to analysis or characterization, e.g., by contacting the stacked plurality of modified amino acids 123 with a library of binding agents, which can bind to their respective monomer targets (e.g., an amino acid type), and detecting the binding agents.
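The iterative rounds (coupling in process 106, tethering in 112, cleavage in 113) and the optional cycle/round information on each linking nucleic acid molecule can be modeled as a simple data structure. This is an idealized sketch, not the disclosed chemistry. The round-barcode format ("R01", "R02", ...), the capture barcode "BC001", and the rule that a failed round cleaves nothing are all hypothetical. The point is that round tags let residue order be recovered even if the physical order of the stacked molecules is lost, for example after amplification.

```python
from dataclasses import dataclass, field


@dataclass
class ModifiedMonomer:
    round_barcode: str  # hypothetical cycle/round tag on the linking nucleic acid molecule
    residue: str        # cleaved N-terminal amino acid (one-letter code)


@dataclass
class StackedPlurality:
    capture_barcode: str  # hypothetical analyte identifier on the capture moiety
    monomers: list = field(default_factory=list)


def run_rounds(peptide, capture_barcode, n_rounds, failed_rounds=()):
    """Idealized model of repeating couple (106) -> tether (112) -> cleave (113).

    In a failed round nothing is coupled, so no residue is cleaved and no
    polymerizable molecule is added; the same NTAA is retried in the next round.
    """
    remaining = list(peptide)
    stack = StackedPlurality(capture_barcode)
    for r in range(1, n_rounds + 1):
        if not remaining:
            break
        if r in failed_rounds:
            continue
        residue = remaining.pop(0)  # NTAA cleaved from the peptide with its linker
        stack.monomers.append(ModifiedMonomer(f"R{r:02d}", residue))
    return stack, "".join(remaining)


def read_sequence(stack):
    """Recover residue order from round barcodes (zero-padded, so string order = round order)."""
    ordered = sorted(stack.monomers, key=lambda m: m.round_barcode)
    return "".join(m.residue for m in ordered)


stack, leftover = run_rounds("MKLV", "BC001", n_rounds=5, failed_rounds={2})
print(read_sequence(stack), [m.round_barcode for m in stack.monomers], repr(leftover))
# MKLV ['R01', 'R03', 'R04', 'R05'] ''
```

The missing "R02" tag records the failed round explicitly. Without round information, a skipped cycle would be indistinguishable from a deletion in the peptide.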
[00173] Similarly, FIG. 1B schematically illustrates another example workflow of generating a modified monomer, e.g., modified amino acid, in the absence of a substrate. In such an example workflow 100b, a polymeric analyte 103 (e.g., a peptide) and a capture moiety 105 are provided. The capture moiety 105 may comprise a first nucleic acid molecule (e.g., DNA molecule) and may comprise identifying information of the polymeric analyte 103, e.g., an identifying barcode sequence. The capture moiety 105 may additionally comprise a releasable or cleavable moiety. The polymeric analyte and the capture moiety 105 may, in some instances, be coupled to a substrate (FIG. 1B inset). For instance, the capture moiety 105 may comprise a nucleic acid sequence that is complementary to a sequence on a substrate (e.g., bead, flat surface). In process 106, a linker 109 and polymerizable molecule, such as a linking nucleic acid molecule 111, are provided. In some instances, the linker 109 is pre-tethered to the polymerizable molecule (linking nucleic acid molecule 111); alternatively, the linker 109 and the polymerizable molecule (linking nucleic acid molecule 111) may be provided separately. The polymerizable molecule may comprise identifying temporal information, e.g., the cycle or round in which it is provided. In process 106, the linker 109 may couple to a monomer, e.g., an amino acid (e.g., NTAA) of the polymeric analyte 103 (e.g., peptide) to generate a monomer-linker complex. In process 112, the monomer-linker complex may couple to the capture moiety 105. Coupling of the monomer-linker complex to the capture moiety 105 may be mediated by the polymerizable molecule and optionally an additional polymerizable molecule 116. Alternatively, the capture moiety 105 may be directly hybridized or ligated to the polymerizable molecule (linking nucleic acid molecule 111). Optionally, the monomer-linker complex and the capture moiety may be covalently linked together (e.g., via ligation). 
Alternatively, or in addition to, the polymerizable molecule may comprise a first sequence that is complementary and may hybridize to a second sequence of the capture moiety 105 (not shown), or the polymerizable molecule may be linked to the capture moiety 105 via a splint or bridge molecule, which may comprise sequences that are complementary to the first sequence of the polymerizable molecule and the second sequence of the capture moiety 105 (not shown). In process 113, the monomer may be cleaved from the polymeric analyte 103 to generate the modified monomer that comprises the monomer-capture
moiety complex. The modified monomer may comprise the cleaved monomer, the linker 109, the polymerizable molecule (e.g., linking nucleic acid molecule 111), the capture moiety 105, or a combination thereof (e.g., the cleaved monomer, the linker, and the polymerizable molecule, just the cleaved monomer, or just the cleaved monomer-linker complex). Processes 106, 112, and 113 may be iterated and repeated any number of times (“rounds”) using additional linkers 109 and polymerizable molecules, and tethering the additional polymerizable molecules together (e.g., tethering an additional polymerizable molecule to the monomer-capture moiety complex). Multiple rounds may continue until all or a subset of the monomers of the polymeric analyte 103 are tethered together. For example, the process may be iterated to generate a stacked plurality of modified monomers 123 comprising a set of concatenated modified monomers, e.g., concatenated monomer-linker-polymerizable complexes. The polymerizable molecules (e.g., linking nucleic acid molecules 111) of the stacked plurality of modified monomers 123 may be identical molecules (e.g., same nucleic acid sequence), or they may be different. After any useful number of rounds, the stacked plurality of modified monomers 123 may be cleaved from or at the capture moiety 105, e.g., using the cleavable moiety. Alternatively, or in addition to, the stacked set of polymerizable molecules may be subjected to amplification to generate amplicons of the polymerizable molecules coupled to or comprised by the stacked plurality of modified amino acids. The stacked plurality of modified amino acids 123 may be subjected to analysis or characterization, e.g., by contacting the stacked plurality of modified amino acids 123 with a library of binding agents, which can bind to their respective monomer targets (e.g., an amino acid type), and detecting the binding agents.
[00174] In some instances, the polymerizable molecule, e.g., linking nucleic acid molecule 111, comprises temporal information on the cycle in which it is provided; as such, the temporal information may be used for conducting quality control. For example, if a cycle number is missing, then it can be inferred that an amino acid is missing or was not present in the peptide, that cleavage of the amino acid did not occur, or that another error occurred.
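The quality-control check described above can be sketched as a set difference between the cycles that were run and the cycle numbers recovered from one read. This is a minimal illustration, assuming the cycle numbers have already been decoded from the polymerizable molecules; the function name `missing_cycles` is hypothetical.

```python
from typing import Iterable, List


def missing_cycles(observed: Iterable[int], total_cycles: int) -> List[int]:
    """Return the 1-based cycle numbers that no polymerizable molecule reports.

    `observed` holds cycle numbers decoded from the temporal information of
    one read; `total_cycles` is the number of rounds that were run. Both are
    hypothetical inputs used only for illustration.
    """
    seen = set(observed)
    return [cycle for cycle in range(1, total_cycles + 1) if cycle not in seen]


# Five rounds were run, but no molecule carries cycle 3, flagging a possible
# missing amino acid, failed cleavage, or other error at that position.
print(missing_cycles([1, 2, 4, 5], total_cycles=5))  # [3]
```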
[00175] It will be appreciated that iteration of the workflow may be performed to generate a stacked plurality of modified monomers (e.g., modified amino acids), a set of non-concatenated individual modified monomers, or a combination thereof. In some instances, iteration of the workflow may be performed on separate capture moieties, such that individual modified monomers are generated from a single polymeric analyte. For example, FIG. 1C schematically shows an example workflow for intramolecular expansion to generate individual modified amino acids that are coupled to separate capture moieties. A polymeric analyte 103 (e.g., a peptide) and a plurality of capture moieties 105 are provided, which optionally are coupled to a substrate 101.
The capture moieties 105 may comprise an individual nucleic acid molecule (e.g., DNA). The polymeric analyte 103 may be contacted with a linker 109 and a polymerizable molecule, e.g., a linking nucleic acid molecule 111. In some instances, the linker 109 is pre-tethered to the polymerizable molecule (depicted as a linking nucleic acid molecule 111); alternatively, the linker 109 and the polymerizable molecule may be provided separately. The linker 109 may couple to a monomer (e.g., an NTAA) of the polymeric analyte 103 to generate a monomer-linker complex. The monomer-linker complex may couple to a first capture moiety 105, thereby generating a monomer-capture moiety complex. Coupling of the monomer-linker complex to the capture moiety 105 may be mediated by the polymerizable molecule, e.g., the linking nucleic acid molecule 111. Optionally, the monomer-linker complex and the capture moiety 105 may be covalently linked together (e.g., the linking nucleic acid molecule 111 may be covalently linked to the capture moiety 105), using chemical (e.g., click chemistry) or enzymatic (e.g., a ligase) approaches. Alternatively, or in addition to, the polymerizable molecule may comprise a first sequence that is complementary and may hybridize to a second sequence of the capture moiety 105 (not shown), or the polymerizable molecule may be linked to the capture moiety 105 via a splint or bridge molecule, which may comprise sequences that are complementary to the first sequence of the polymerizable molecule and the second sequence of the capture moiety 105 (not shown). Subsequent to coupling to the first capture moiety, the monomer may be cleaved from the polymeric analyte 103, thereby providing a modified monomer that comprises the cleaved monomer-capture moiety complex; the modified monomer may comprise the cleaved monomer coupled to the linker 109, the polymerizable molecule (shown as a linking nucleic acid molecule 111), the capture moiety 105, or a combination thereof (e.g., the modified monomer may comprise the cleaved monomer, the linker, and the polymerizable molecule).
[00176] Iteration of the process may be performed using additional linkers 109 and polymerizable molecules (e.g., linking nucleic acid molecules optionally comprising cycle/round information), and tethering the additional polymerizable molecules to additional capture moieties 105. For instance, the individual capture moieties 105 may be provided attached to a substrate at a defined distance to facilitate coupling of the additional monomers thereto. Multiple rounds may be performed until all or a subset of the monomers in the polymeric analyte 103 are cleaved and tethered to individual capture moieties 105, thereby generating a plurality of individual modified monomers. In some instances, the individual modified monomers may then be coupled together to generate a stacked plurality of modified monomers (e.g., modified amino acids) 123 comprising a concatenated set of individual modified monomers that each comprise a polymerizable molecule coupled thereto. Alternatively, or in addition to, the individual modified monomers may be subjected to amplification to generate amplicons of the polymerizable molecules comprised by the
individual modified monomers. Individual modified monomers or stacked plurality of modified amino acids may be subjected to analysis or characterization, e.g., by contacting with a library of binding agents, which can bind to their respective monomer targets (e.g., an amino acid type), and detecting the binding agents.
[00177] In some instances, the intramolecular expansion may occur within a 3-dimensional substrate. FIG. 1D schematically shows an example of intramolecular expansion, e.g., as shown in FIGs. 1A-1C, occurring within a 3-dimensional scaffold, e.g., a hydrogel. In such examples, the gel may comprise capture moieties across the 3-dimensional scaffold, and the intramolecular expansion process may be performed across multiple capture moieties.
[00178] The modified monomer or stacked plurality of modified monomers may be analyzed using an imaging approach, e.g., using super-resolution imaging. In some instances, the modified monomers or stacked plurality of modified monomers may be placed within a three-dimensional substrate prior to analysis. For instance, subsequent to generation of the modified monomer or stacked plurality of modified monomers, the modified monomer or stacked plurality of modified monomers may be removed from the solution or substrate and embedded in a gel matrix. In one such example, the modified monomer or stacked plurality of modified monomers may be mixed with a polymer precursor mix (e.g., polyacrylamide, agarose, or other hydrogel) and applied to a substrate (e.g., an additional substrate). The polymer precursor mix may then be polymerized using any useful crosslinking process, e.g., photocrosslinking, chemical crosslinking, heat crosslinking, or a combination thereof. An example of such an approach is depicted schematically in FIG. 1E. In one example, a modified amino acid or stacked plurality of modified amino acids may comprise an acrydite or acrylamide moiety which can enable covalent coupling of the modified amino acid or stacked plurality of modified amino acids to the gel matrix. Alternatively, the modified amino acid or stacked plurality of modified amino acids may not be covalently bound to the gel matrix; rather, the modified amino acid or stacked plurality of modified amino acids may be noncovalently introduced to a gel matrix, e.g., using electrophoresis, passive diffusion, capillary action, or other transport approaches.
[00179] Use of a three-dimensional matrix, e.g., a hydrogel matrix, may be useful in processing or analyzing the modified monomers or stacked plurality of modified monomers. For instance, the modified monomers or stacked plurality of modified monomers may be subjected to electrophoresis within the three-dimensional matrix in order to linearize the modified monomer or stacked plurality of modified monomers. In one such example, the modified amino acid or stacked plurality of modified amino acids may comprise an acrydite or acrylamide moiety which can enable covalent coupling of the modified amino acid or stacked plurality of modified amino
acids to the gel matrix. Subsequent to covalent incorporation into the gel matrix, the gel matrix comprising the modified amino acid or stacked plurality of modified amino acids may be subjected to an electric field to electrophorese and linearize the modified amino acid or stacked plurality of modified amino acids.
[00180] FIG. 1F shows a schematic of a stacked plurality of modified monomers that can be analyzed or characterized using imaging, e.g., super-resolution imaging. Each modified monomer of the stacked plurality of modified monomers 123 comprises a cleaved monomer (shown as a circle), a linker (shown as a hexagon), and a polymerizable molecule, as described above. The modified monomers of the stacked plurality of modified monomers 123 may be spaced at any useful pitch, e.g., to enable optical resolution of the amino acid moieties (e.g., greater than 50 nm). In some embodiments, the optical resolution of the amino acid moieties can be greater than 0.1 nm, greater than 0.2 nm, greater than 0.5 nm, greater than 1 nm, greater than 2 nm, greater than 5 nm, greater than 10 nm, greater than 15 nm, greater than 20 nm, greater than 25 nm, greater than 30 nm, greater than 35 nm, greater than 40 nm, greater than 45 nm, or greater than 50 nm. The pitch may be controlled, for example, by controlling the length of the polymerizable molecules (e.g., the number of nucleotides in a nucleic acid molecule, the number of monomers in a polymer, etc.) comprised by the modified amino acids. The stacked plurality of modified monomers 123 may be coupled to one or more anchor adapters 127, e.g., an anchor nucleic acid molecule, which may optionally be coupled to a substrate (e.g., flow cell, bead). In one example, a substrate may be provided that comprises a plurality of anchor adapters 127 (e.g., anchor nucleic acid molecules) coupled thereto. The anchor adapters 127 may comprise identical sequences or a plurality of different sequences. A first portion (e.g., a first sequence such as an adapter sequence) of the stacked plurality of modified monomers may couple to one of the anchor adapters 127, thereby attaching the first portion of the stacked plurality of modified monomers to the substrate. 
Subsequent to attachment, the stacked plurality of modified monomers may be linearized, e.g., using application of a force such as fluid shear stress or hydrodynamic flow, application of an electric field, etc. The linearized stacked plurality of modified monomers may then be attached at a second portion (e.g., a second sequence) to another anchor adapter 127. The coupling of the first portion and the second portion of the stacked plurality of modified monomers may be facilitated by any useful mechanism, e.g., chemical conjugation, click chemistry, nucleic acid hybridization, biotin-streptavidin interaction, etc.
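As a rough illustration of how the pitch scales with the length of the polymerizable molecule, the sketch below assumes the spacer is a fully extended B-form DNA duplex with a rise of about 0.34 nm per base pair, and ignores the size of the linker and cleaved monomer as well as any flexibility or slack. The function names and the default rise value are assumptions for illustration.

```python
import math

# Assumed rise per base pair of a fully extended B-form DNA duplex (nm).
RISE_NM_PER_BP = 0.34


def spacer_length_bp(target_pitch_nm: float, rise_nm: float = RISE_NM_PER_BP) -> int:
    """Smallest number of base pairs whose contour length reaches the target pitch."""
    if target_pitch_nm <= 0:
        raise ValueError("target pitch must be positive")
    return math.ceil(target_pitch_nm / rise_nm)


def pitch_nm(length_bp: int, rise_nm: float = RISE_NM_PER_BP) -> float:
    """Contour length (nm) of a duplex spacer with `length_bp` base pairs."""
    return length_bp * rise_nm


print(spacer_length_bp(50))  # 148
```

Under these assumptions, a pitch of at least 50 nm calls for a duplex of roughly 148 bp; a single-stranded or partially flexible spacer would need a different per-nucleotide value.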
[00181] The linearized, stacked plurality of modified monomers may then be subjected to characterization or detection. Detection may be facilitated by the use of binding agents 129 (e.g., antibodies, antibody fragments, nanobodies) that have specificity to a type of modified monomer,
e.g., a particular amino acid type. For instance, a library of binding agents comprising binding agents 129 that have specificity to different amino acid types may be provided and contacted with the stacked plurality of modified monomers. The binding agents 129 may be labeled with a detectable label 131 (e.g., fluorophore, quantum dot) that identifies the binding agent 129. For instance, an anti-phenylalanine antibody may be labeled with a first fluorophore, and an anti-valine antibody may be labeled with a second fluorophore that is different than the first fluorophore. Accordingly, detection of the fluorophores coupled to the binding agents, e.g., using fluorescence microscopy, may allow for identification of the amino acid type comprised by the modified monomers (e.g., detection of the first fluorophore would indicate presence of phenylalanine and detection of the second fluorophore would indicate presence of valine).
[00182] The detection of the detectable labels may be conducted in any useful sequence or approach. For example, a plurality of stacked pluralities of modified monomers arising from a plurality of peptides may be provided on a surface (e.g., attached at a first portion to anchor adapters, linearized, and then attached at a second portion to additional anchor adapters). A first plurality of binding agents that are specific to a single monomer type (e.g., an anti-phenylalanine antibody or antibody fragment) and that are labeled with a first fluorophore may be provided, contacted with the stacked pluralities of modified monomers, and allowed to bind. Excess binding agent may optionally be washed or removed. Imaging, e.g., using super-resolution microscopy (e.g., dSTORM), may be performed to image the location of the first fluorophores, thereby providing information on the identity (the monomer type) and the spatial location of the identified monomer. Subsequently, a second plurality of binding agents that are specific to another single monomer type (e.g., an anti-valine antibody or antibody fragment) and that are labeled with a second fluorophore may be provided, contacted with the stacked pluralities of modified monomers, and allowed to bind. The imaging of the second fluorophore may then add additional information, identifying the monomer type (e.g., valine) and location of the identified monomer. The process may be repeated for all different monomer types (e.g., all 20 different proteinogenic amino acid types), until all monomer types are identified. The combination of the imaging data from each cycle may yield the spatial location and identity of each modified monomer of the stacked plurality of modified monomers, thereby sequencing the stacked plurality of modified monomers and the polymeric analyte from which the stacked plurality of modified monomers was generated.
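Combining the per-cycle imaging data into a read can be sketched as projecting each localization onto the axis of the linearized stacked plurality and sorting. This sketch makes several assumptions: each cycle returns (x, y) localizations in nanometres for a single amino acid type; the molecule axis is known (e.g., from the first anchor adapter toward the second); and repeated blinks have already been clustered so that each modified monomer contributes one point. `reconstruct_read` and its inputs are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def reconstruct_read(cycles: Dict[str, List[Point]], origin: Point, direction: Point) -> str:
    """Merge per-cycle localizations into one ordered read.

    `cycles` maps a one-letter amino acid code to the (x, y) localizations
    imaged in the cycle that used the binding agent for that type.
    `origin` and `direction` define the axis of the linearized molecule.
    """
    dx, dy = direction
    norm = (dx * dx + dy * dy) ** 0.5
    ux, uy = dx / norm, dy / norm
    hits = []
    for residue, points in cycles.items():
        for x, y in points:
            # Position of this localization along the molecule axis.
            t = (x - origin[0]) * ux + (y - origin[1]) * uy
            hits.append((t, residue))
    hits.sort()
    return "".join(residue for _, residue in hits)


cycles = {
    "F": [(0.0, 1.0)],                 # anti-phenylalanine cycle
    "V": [(60.0, -2.0), (180.0, 0.5)],  # anti-valine cycle
    "L": [(121.0, 0.0)],               # anti-leucine cycle
}
print(reconstruct_read(cycles, origin=(0.0, 0.0), direction=(1.0, 0.0)))  # FVLV
```

Which end of the axis corresponds to the first cycle depends on how the stack was built and anchored, so the direction vector is itself an assumption of the sketch.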
[00183] In another example, multiple binding agents with different fluorophores and binding specificities may be provided (e.g., anti-phenylalanine, anti-valine, anti-arginine, anti-leucine), with each binding agent type comprising a different fluorophore. After contacting the binding
agents with the stacked plurality of modified monomers and imaging, the process may be iterated until all monomer types are identified (e.g., 5 rounds of 4 different binding agents that recognize different amino acid types, thereby identifying all 20 proteinogenic amino acids). Alternatively, each of the binding agent types may comprise a set of fluorophores (e.g., a combination of four different fluorophores), and the unique spectral combination of the set of fluorophores may be used to identify the binding agent and thus the monomer type.
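The arithmetic behind the two multiplexing options can be checked quickly. The sketch assumes that each round uses spectrally distinct binding agents and, for the combinatorial option, that each fluorophore is scored only as present or absent (no intensity ratios). Both helper names are hypothetical.

```python
import math
from itertools import combinations


def rounds_needed(n_types: int, agents_per_round: int) -> int:
    """Rounds required when each round uses `agents_per_round` distinguishable agents."""
    return math.ceil(n_types / agents_per_round)


def presence_absence_codes(n_fluorophores: int) -> list:
    """All non-empty fluorophore subsets, each usable as one spectral code."""
    dyes = range(n_fluorophores)
    return [
        subset
        for k in range(1, n_fluorophores + 1)
        for subset in combinations(dyes, k)
    ]


print(rounds_needed(20, 4))              # 5
print(len(presence_absence_codes(4)))    # 15
print(len(presence_absence_codes(5)))    # 31
```

Under a presence/absence scheme, four fluorophores give 15 non-empty combinations, fewer than the 20 proteinogenic amino acids, so such a scheme would need a fifth fluorophore, intensity ratios, or more than one round.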
[00184] In some instances, the modified monomer or stacked plurality of modified monomers are embedded or immobilized (covalently or noncovalently) to a three-dimensional matrix and subjected to imaging within the three-dimensional matrix. In some instances, expansion microscopy may be implemented, which can enable higher resolution analysis of the individual modified amino acids, as described by F. Chen et al., 2015, Science 347(6221), pp. 543-548, which is incorporated by reference herein in its entirety.
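Expansion microscopy improves effective resolution by physically enlarging the matrix, not by changing the optics. Assuming isotropic linear expansion by a factor s, features separated by a distance d before expansion end up about s times d apart. An optical resolution r then corresponds to roughly r/s in pre-expansion coordinates. The factor of 4 below is an illustrative assumption, not a value from this disclosure, and the helper names are hypothetical.

```python
def expanded_spacing_nm(pitch_nm: float, expansion_factor: float) -> float:
    """Physical spacing after isotropic linear expansion of the matrix."""
    return pitch_nm * expansion_factor


def effective_resolution_nm(optical_resolution_nm: float, expansion_factor: float) -> float:
    """Optical resolution referred back to pre-expansion coordinates."""
    return optical_resolution_nm / expansion_factor


# Illustrative numbers: 20 nm pitch, 250 nm diffraction-limited optics, 4x expansion.
print(expanded_spacing_nm(20.0, 4.0))        # 80.0
print(effective_resolution_nm(250.0, 4.0))   # 62.5
```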
[00185] Beneficially, the approaches outlined herein may allow for high-throughput, rapid and accurate sequencing of polymeric analytes. For instance, by generating the modified monomers using the methods described herein, attaching the modified monomers or stacked plurality of modified monomers to a substrate, and using binding agents comprising detectable labels and a super-resolution imaging detection method, many reads can be generated in parallel. For instance, assuming a 1 micrometer pitch between stacked pluralities of modified monomers (i.e., each stacked plurality of modified monomers occupies an area of approximately 1 µm x 1 µm) and that approximately 50% of the 1 µm x 1 µm squares are filled, a flow cell 20 mm x 50 mm in size may be able to generate about 500 million reads at single-amino acid resolution.
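The throughput estimate above can be reproduced with a few lines of arithmetic. A 20 mm x 50 mm flow cell covers 10^9 µm². A 1 µm pitch gives one site per µm², and 50% occupancy leaves 5 x 10^8 sites. The helper `estimated_reads` is hypothetical and assumes a square grid with one read per occupied site.

```python
from typing import Tuple


def estimated_reads(flow_cell_mm: Tuple[float, float], pitch_um: float, fill_fraction: float) -> int:
    """Number of occupied sites on a square grid covering the flow cell."""
    width_mm, length_mm = flow_cell_mm
    area_um2 = (width_mm * 1_000) * (length_mm * 1_000)  # mm -> um on each side
    sites = area_um2 / (pitch_um ** 2)
    return round(sites * fill_fraction)


print(estimated_reads((20, 50), pitch_um=1.0, fill_fraction=0.5))  # 500000000
```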
[00186] FIG. 1G depicts another detailed schematic of an iterative workflow for the characterization of polymeric analytes, such as peptides, immobilized on a substrate, using trifunctional linkers. In some embodiments, the process begins with the anchoring of the analyte to the substrate via a nucleic acid-based anchor molecule (labeled as 105). This anchor molecule may enable robust immobilization while maintaining molecular accessibility, a critical feature for subsequent biochemical processing. The polymeric analyte, labeled as 103, is shown as a peptide chain subjected to sequential cycles of modification, imaging, and analysis. In some embodiments, the polymeric analyte 103 is contacted with a trifunctional linker 107 comprising an amino acid reactive moiety (e.g., PITC), an alkyne click chemistry moiety, which may be reacted with a polymerizable molecule 111 (e.g., a linking nucleic acid molecule) comprising a complementary azide click chemistry moiety, and a fluorophore (e.g., a BODIPY dye). In some instances, the trifunctional linker 107 is pre-tethered to the polymerizable molecule 111 (depicted as a linking nucleic acid molecule 111); alternatively, the trifunctional linker 107 and the polymerizable
molecule 111 may be provided separately. In some embodiments, the trifunctional linker 107 then reacts with the N-terminal amino acid of polymeric analyte 103, generating a monomer-trifunctional linker complex. Next, the polymerizable molecule 111 is covalently linked to an anchoring molecule on the substrate. In some embodiments, the N-terminal amino acid of polymeric analyte 103 is then cleaved and forms a cleaved monomer-trifunctional linker complex 109, which is tethered onto the substrate via the linking nucleic acid molecule 111. In some embodiments, the cleaved monomer-trifunctional linker complex 109 is then detected via the fluorophore (e.g., a BODIPY dye) on the trifunctional linker, or is first contacted with an antibody that is specific to the cleaved monomer-trifunctional linker complex and that comprises a fluorophore.
[00187] Each modification step may be followed by imaging of the tagged or derivatized amino acids. In some embodiments, super-resolution imaging, such as dSTORM (direct Stochastic Optical Reconstruction Microscopy), is employed to detect the tagged residues with nanometer-scale resolution. This imaging method is particularly effective in distinguishing closely spaced residues or modifications, as it provides a high signal-to-noise ratio and the ability to resolve molecular features below the diffraction limit of conventional microscopy.
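dSTORM builds an image from many single-molecule localizations, each with a finite precision. Gaussian fitting of each blinking spot is common, but the sketch below swaps in a simpler intensity-weighted centroid. It pairs that with the photon-limited precision estimate sigma/sqrt(N), which ignores background noise and pixelation. The function names, units, and example numbers are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple


def centroid(pixels: Sequence[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Intensity-weighted centroid of (x_nm, y_nm, photons) pixel samples."""
    total = sum(w for _, _, w in pixels)
    if total <= 0:
        raise ValueError("no signal in spot")
    x = sum(px * w for px, _, w in pixels) / total
    y = sum(py * w for _, py, w in pixels) / total
    return x, y


def localization_precision_nm(psf_sigma_nm: float, photons: float) -> float:
    """Photon-limited precision sigma / sqrt(N); background and pixelation ignored."""
    return psf_sigma_nm / math.sqrt(photons)


# A symmetric three-pixel spot centred at x = 100 nm.
print(centroid([(0.0, 0.0, 100.0), (100.0, 0.0, 400.0), (200.0, 0.0, 100.0)]))  # (100.0, 0.0)
print(localization_precision_nm(100.0, 10000.0))  # 1.0
```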
[00188] In some embodiments, following imaging, the workflow progresses to the next cycle, where the subsequent amino acid residue is processed. This iterative cycle of cleavage, modification, and imaging may continue along the length of the peptide chain until the entire sequence has been analyzed. In some embodiments, the methodology accommodates the use of electric fields or shear stress to linearize the peptide chain during processing, ensuring uniform spatial alignment of residues for consistent imaging and analysis.
[00189] Alternatively, cleaved monomer-trifunctional linker complexes may form a stacked plurality of modified amino acids in an intramolecular expansion process described elsewhere herein. A plurality of antibodies may be contacted with the stacked plurality of modified amino acids to specifically recognize the cleaved monomer-trifunctional linker complexes. The identity of each amino acid residue can be determined through super-resolution imaging.
[00190] In some embodiments, the trifunctional linker may be illuminated in the process of contacting the polymeric analyte 103. The fluorophore of the trifunctional linker may release heat into a localized space where an amino acid reactive moiety (e.g., PITC) reacts with the N-terminal monomer of the polymeric analyte 103. This heat-driven localized reaction provides several advantages, including controlled and efficient activation of chemical interactions at the desired site on the analyte. The approach leverages the photothermal properties of the fluorophore to ensure spatial specificity and minimize unwanted side reactions, thereby improving the accuracy and reliability of downstream analysis, such as sequencing or structural characterization. It will
be appreciated that the methods described herein may also utilize other multifunctional linkers capable of modulating reaction speed or conditions.
[00191] Iteration: In some instances, one or more of the operations described herein may be iterated or repeated. Iteration of the operations may allow for sequential processing, analysis, or identification of the individual monomers of the polymeric analyte, which can allow for reconstruction of the entire polymeric analyte. For example, referring to FIGs. 1A-1B, the operations of the workflows 100a and 100b may be conducted to generate a modified amino acid sequentially for each terminal monomer (e.g., NTAA) of the polymeric analyte (e.g., peptide). In some instances, the individual modified amino acids may then be immobilized to a substrate (e.g., the same or separate substrate as shown in FIG. 1A), linearized, optionally immobilized (e.g., at another end), and detected. Alternatively, the individual modified amino acids may be combined, e.g., via hybridization or ligation of the polymerizable molecules, to generate the stacked plurality of modified amino acids. As such, the stacked plurality of modified amino acids may comprise a plurality of polymerizable molecules from multiple rounds or cycles. In such cases, the polymerizable molecule of the second (or third, fourth, fifth, ..., nth) cycle may be configured to couple only to the polymerizable molecule of the first (or second, third, fourth, ..., (n-1)th) cycle. For example, the first cycle polymerizable molecule may comprise a unique binding sequence that is absent on the capture moiety of the substrate, and to which the second cycle polymerizable molecule can bind. Accordingly, the second cycle polymerizable molecule may only bind to the first cycle polymerizable molecule and not to any of the capture moieties. 
In the event that an amino acid is missed during a cycle (a “null” event, e.g., not cleaved, not coupled to the linker or polymerizable molecule, etc.), a bridging polymerizable molecule may be provided that encodes for a null event but comprises the unique binding sequence, such that subsequent rounds may continue, even if a null event occurs.
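The role of the bridging polymerizable molecule can be illustrated with a toy model. The sketch assumes that a new cycle's polymerizable molecule can attach only if the previous unit exposes the unique binding sequence. It also assumes that a null event exposes no such sequence unless a bridging molecule is supplied. `build_stack` and the placeholder symbol `X` are hypothetical.

```python
from typing import List, Optional

NULL_CODE = "X"  # hypothetical symbol encoded by a null-event bridging molecule


def build_stack(cycle_results: List[Optional[str]], use_bridge: bool = True) -> str:
    """Simulate sequential stacking of one polymerizable molecule per cycle.

    `cycle_results[i]` is the residue captured in cycle i + 1, or None for a
    null event (e.g., not cleaved or not coupled).
    """
    read = []
    for residue in cycle_results:
        if residue is not None:
            read.append(residue)
        elif use_bridge:
            # The bridge records the null event and still exposes the
            # unique binding sequence for the next cycle's molecule.
            read.append(NULL_CODE)
        else:
            # No binding sequence is exposed, so later cycles cannot attach.
            break
    return "".join(read)


print(build_stack(["M", "K", None, "L"]))                    # MKXL
print(build_stack(["M", "K", None, "L"], use_bridge=False))  # MK
```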
[00192] In instances where one or multiple additional capture moieties are used to couple to the modified monomers, the polymerizable molecules (e.g., linking nucleic acid molecules 111) may additionally encode temporal information, e.g., the cycle or iteration number, such that the order of the individual monomers may be determined. For example, for a given peptide, the terminal amino acid may be coupled to a polymerizable molecule that comprises a barcode sequence that identifies the cycle number (e.g., cycle 1) (not shown). The information encoded by the barcode sequence may be coupled to an adjacent (additional) capture moiety (not shown). Following cleavage of the monomer from the polymeric analyte (e.g., as shown in process 113 of FIG. 1B), the workflow may be repeated for the n-1 terminal amino acid, which may again be
coupled to a capture moiety via a linker and barcoded polymerizable molecule and cleaved. The barcoded polymerizable molecule may comprise the cycle number (e.g., cycle 2).
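Ordering individual modified monomers by their temporal barcodes amounts to decoding each barcode to a cycle number and sorting. In the sketch below, the barcode sequences are invented placeholders and exact matching is assumed, with no error correction. Unrecognized barcodes are simply dropped, and the name `order_by_cycle` is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical cycle-barcode whitelist; real sequences would be assay-specific.
CYCLE_BARCODES: Dict[str, int] = {
    "ACGTAC": 1,
    "TGCATG": 2,
    "GATCGA": 3,
}


def order_by_cycle(observations: List[Tuple[str, str]]) -> str:
    """Order (barcode, residue) pairs by the cycle number each barcode encodes."""
    decoded = []
    for barcode, residue in observations:
        cycle = CYCLE_BARCODES.get(barcode)
        if cycle is not None:  # skip barcodes not in the whitelist
            decoded.append((cycle, residue))
    decoded.sort()
    return "".join(residue for _, residue in decoded)


obs = [("GATCGA", "W"), ("ACGTAC", "M"), ("TGCATG", "A"), ("NNNNNN", "Q")]
print(order_by_cycle(obs))  # MAW
```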
[00193] In some instances, temporal information may be provided separately. For example, prior to, during, or subsequent to coupling of a polymerizable molecule 111 to the capture moiety, a temporal barcode may be provided that can couple to the polymerizable molecule 111 or to the capture moiety 105, or a combination thereof. The temporal barcode may comprise any useful agent, including a nucleic acid molecule, a peptide, a lipid, a carbohydrate, an enzyme (e.g., a chromogenic or fluorogenic enzyme) or a ribozyme or DNAzyme, a fluorophore, a dye, an intercalating agent, a dideoxynucleotide, a fluorescent nucleic acid molecule or nucleotide, a radioisotope, a mass tag, or other detectable label that can indicate the time or cycle (or iteration) number in which it is provided. In some instances, the temporal barcode comprises a cycle-specific nucleic acid barcode molecule, which can couple to the polymerizable molecule 111 or to the capture moiety 105. The temporal barcode may comprise any additional useful functional sequences, e.g., primer sites, sequencing sites, restriction sites, abasic or cleavable sites, etc. In some instances, the temporal barcode may comprise an amplification site that allows for bridge amplification of the temporal barcode and optionally, the coupled polymerizable molecules, to other capture or polymerizable molecules.
[00194] Polymeric Analytes: The sequencing approaches provided herein may be used for analyzing and characterizing peptides or for other types of polymeric analytes. For example, the methods outlined herein may be useful in sequencing or analyzing a biomolecule, macromolecule, or synthetic molecule. The polymeric analyte may be a biomolecule or other biological molecule that comprises one or more monomers. Non-limiting examples of polymeric biomolecules include nucleic acid molecules (e.g., DNA molecule, RNA molecule, DNA:RNA hybrids, aptamers), peptides and proteins, polysaccharides, lipid polymers (e.g., diglycerides, triglycerides and other fatty acids). The polymeric analyte may be a synthetic molecule, e.g., a peptoid or synthetic polymer, or a peptidomimetic (e.g., a peptoid, a beta-peptide, a D-peptide peptidomimetic). Nonlimiting examples of synthetic polymers include acrylics, nylons, silicones, viscose, rayon, polyesters, polycarboxylic acids, polyvinyl acetate, polyacrylamide, polyacrylate, polyethylene glycol, polyurethane, polylactic acid, silica, polystyrene, polyacrylonitrile, polybutadiene, polycarbonate, polyethylene terephthalate, poly(chlorotrifluoroethylene), poly(ethylene oxide), poly(ethylene terephthalate), polyethylene, polyisobutylene, poly(methyl methacrylate), poly(oxymethylene), polyformaldehyde, polypropylene, polystyrene, poly(tetrafluoroethylene), poly(vinyl acetate), poly(vinyl alcohol), poly(vinyl chloride), poly(vinylidene dichloride),
poly(vinylidene difluoride), poly(vinyl fluoride), or a combination thereof. The polymeric analytes may comprise a single polymer type (e.g., a homopolymer) or more than one polymer type (e.g., a copolymer) and may comprise random or arranged monomers. The polymeric analytes may be a block polymer, alternating copolymer, periodic copolymer, statistical copolymer, stereoblock copolymer, gradient copolymer, branched copolymer, graft copolymer, etc.
[00195] The polymeric analytes may be any size or comprise a range of sizes. The polymeric analyte may be about 1 nanometer (nm), about 5 nm, about 10 nm, about 20 nm, about 30 nm, about 40 nm, about 50 nm, about 60 nm, about 70 nm, about 80 nm, about 90 nm, about 100 nm, about 200 nm, about 300 nm, about 400 nm, about 500 nm, about 600 nm, about 700 nm, about 800 nm, about 900 nm, about 1 micrometer (µm), about 10 µm, about 100 µm, about 1 millimeter (mm) in size or greater. A plurality of polymeric analytes may comprise polymeric analytes of similar size or within a range of sizes, e.g., between about 10 nm and about 100 nm, or between about 50 nm and about 1 µm. Similarly, the polymeric analytes may have any molecular weight or range of molecular weights. The polymeric analytes may be about 10 daltons (Da), 100 Da, 500 Da, 1 kilodalton (kDa), 10 kDa, 100 kDa, 1,000 kDa, 10,000 kDa, 100,000 kDa, or greater. The polymeric analytes may comprise polymeric analytes of similar molecular weight or within a range of molecular weights.
[00196] The monomers of the polymeric analytes may comprise any size or range of sizes that is less than that of the entire polymeric analyte. A monomer may be about 0.1 nanometer (nm), about 0.5 nm, about 1 nm, about 5 nm, about 10 nm, about 20 nm, about 30 nm, about 40 nm, about 50 nm, about 60 nm, about 70 nm, about 80 nm, about 90 nm, about 100 nm, about 200 nm, about 300 nm, about 400 nm, about 500 nm, about 600 nm, about 700 nm, about 800 nm, about 900 nm, about 1 micrometer (µm), about 10 µm, about 100 µm, about 1 millimeter (mm) in size or greater. The monomers may have any molecular weight or range of molecular weights. The monomers may be about 1 dalton (Da), 10 Da, 100 Da, 500 Da, 1 kilodalton (kDa), 10 kDa, 100 kDa, 1,000 kDa, 10,000 kDa, 100,000 kDa, or greater. The monomers or polymeric analytes may range in size or molecular weight; for example, a polymeric analyte may comprise a peptide comprising amino acid monomers, which may vary in molecular weight from 75 Da (glycine) to 204 Da (tryptophan).
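The molecular-weight range quoted above follows from the masses of the free amino acids. A peptide's approximate average mass can be estimated by summing free amino acid masses and subtracting one water (about 18.02 Da) per peptide bond. The sketch below uses standard average masses and ignores post-translational modifications and terminal chemistry. The name `peptide_mass_da` is hypothetical.

```python
# Standard average masses (Da) of the 20 free proteinogenic amino acids.
FREE_AA_MASS_DA = {
    "G": 75.07, "A": 89.09, "S": 105.09, "P": 115.13, "V": 117.15,
    "T": 119.12, "C": 121.16, "L": 131.17, "I": 131.17, "N": 132.12,
    "D": 133.10, "Q": 146.15, "K": 146.19, "E": 147.13, "M": 149.21,
    "H": 155.16, "F": 165.19, "R": 174.20, "Y": 181.19, "W": 204.23,
}
WATER_DA = 18.02  # lost once per peptide bond formed


def peptide_mass_da(sequence: str) -> float:
    """Approximate average mass of an unmodified linear peptide."""
    masses = [FREE_AA_MASS_DA[aa] for aa in sequence.upper()]
    if not masses:
        return 0.0
    return sum(masses) - WATER_DA * (len(masses) - 1)


print(round(peptide_mass_da("GG"), 2))  # 132.12
```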
[00197] The polymeric analytes may comprise any number of monomers. The polymeric analytes may comprise about 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 50,000, 100,000 or more monomers. The polymeric analytes may comprise at least about 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, at least about 10, at least about 20, at least about 30, at least about 40, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 500, at least about 1,000, at least about 5,000, at least about 10,000, at least about 50,000, at least about 100,000, or more monomers. Alternatively, the polymeric analytes may comprise at most about 100,000, at most about 50,000, at most about 10,000, at most about 5,000, at most about 1,000, at most about 500, at most about 100, at most about 50, at most about 10, at most about 5, or fewer monomers. The polymeric analytes may vary in the number of monomers they comprise; for example, one polymeric analyte may comprise about 5 monomers whereas another polymeric analyte may comprise about 500 monomers.
[00198] In some instances, the polymeric analyte comprises a peptide comprising amino acid monomeric units. The peptide may be naturally occurring or synthetic. The peptide may comprise any number of amino acids. The amino acids may be any of the 20 proteinogenic amino acids and may comprise any number of post-translational modifications. The peptides or any of the constituent amino acids may be processed, e.g., contacted with protecting groups, alkylated, or subjected to beta-elimination of phosphate groups, as is described elsewhere herein. In some instances, the peptides are derived from larger peptides or proteins by fragmentation.
[00199] Substrates: One or more operations described herein may be performed using a substrate. For example, one or more molecules described herein (e.g., polymeric analyte such as a peptide, capture moiety, polymerizable molecule) may be coupled to a substrate. In some instances, the polymeric analyte, capture moiety, and one or more polymerizable molecules (e.g., the first or second polymerizable molecule), or a combination thereof, may be provided coupled to one or more substrates. In one example, the polymeric analyte and a capture moiety are coupled to a substrate. In some instances, more than one substrate may be used. In such cases, the substrates may comprise the same material or different materials.
[00200] The substrate may be made from any suitable material, e.g., glass, silicon, gel, polymer, etc., as is described elsewhere herein. In some instances, the substrate may be a bead or a gel bead (e.g., polyacrylamide, agarose, or TentaGel® bead). The substrate may be functionalized. One or more molecules, e.g., a capture moiety and the polymeric analyte (e.g., a peptide), may be coupled to the substrate via a covalent or non-covalent interaction. The capture moiety and polymeric analyte (e.g., peptide) can be coupled to the substrate using any suitable chemistry, e.g., click chemistry moieties (e.g., alkyne-azide coupling), photoreactive groups (e.g., benzophenone), 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride (EDC) (e.g., to couple amino-oligos or peptides), N-hydroxysuccinimide (NHS), Sulfo-NHS, or NHS esters (e.g., to couple sulfhydryl oligos), maleimides, hydrazines, hydroxylamines, thiols, biotin-streptavidin interactions, cystamine, glutaraldehyde, formaldehyde, succinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate (SMCC), Sulfo-SMCC, 4-(4,6-dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium chloride (DMTMM), silanes (e.g., aminosilanes), combinations thereof, etc. In some instances, the substrate may be functionalized to comprise a coupling chemistry to couple the polymeric analyte or the capture moiety. In one non-limiting example, a substrate (e.g., bead or surface) may comprise an alkyne such as dibenzocyclooctyne (DBCO), which may be configured to react with an amine (e.g., DBCO-alcohol, DBCO-Boc, DBCO-NHS), a carboxyl or carbonyl (e.g., DBCO, DBCO-silane), a sulfhydryl, etc. An azide-functionalized nucleic acid or protein may react with DBCO to link the nucleic acid or protein to the DBCO substrate. In other examples, linkers such as bifunctional linkers may be used to attach a molecule to a substrate; such bifunctional linkers may comprise the same reactive moiety on both ends or a different moiety at each end (e.g., a heterobifunctional linker). Additional examples of linkers are described elsewhere herein.
[00201] In some instances, a molecule (e.g., polymeric analytes such as peptides, capture moieties, polymerizable molecules) may be coupled to the substrate using an enzymatic approach, e.g., as described elsewhere herein. For example, a chemical linker or moiety such as a click chemistry moiety may be attached to a polymeric analyte (e.g., peptide) using an enzyme. The chemical linker or moiety may be able to react with another chemical linker or moiety (e.g., click chemistry moiety) of a substrate, capture moiety, or polymerizable molecule.
[00202] The substrates may be coupled to any useful number of molecules (e.g., polymeric analytes, modified monomers, stacked plurality of modified monomers, capture moieties, polymerizable molecules). In some instances, a substrate may comprise a plurality of polymeric analytes (e.g., peptides) and a plurality of capture moieties, which may be provided at any useful ratio or density. For example, the ratio of polymeric analytes, modified monomers, or stacked plurality of modified monomers to capture moieties may be about 1:1, 1:5, 1:10, 1:20, 1:100, 1:1,000, 1:10,000, 1:100,000, 1:1,000,000, or lower. In some instances, the ratio of polymeric analytes to capture moieties may be at most about 1:1, at most about 1:5, at most about 1:10, at most about 1:20, at most about 1:100, at most about 1:1,000, at most about 1:10,000, at most about 1:100,000, at most about 1:1,000,000, or lower.
[00203] Similarly, the molecules (e.g., polymeric analytes, modified monomers, stacked plurality of modified monomers, capture moieties, or polymerizable molecules) may be coupled to the substrate at any useful density, for example about 1 molecule/square micrometer (µm²), about 10 molecules/µm², about 100 molecules/µm², about 1,000 molecules/µm², about 10,000 molecules/µm², about 100,000 molecules/µm², about 1,000,000 molecules/µm², about 10,000,000 molecules/µm², about 100,000,000 molecules/µm², about 1,000,000,000 molecules/µm², about 10,000,000,000 molecules/µm², about 100,000,000,000 molecules/µm², or greater. The polymeric analytes, capture moieties, and polymerizable molecules may be coupled to the substrate at a range of densities, e.g., from about 100 to about 10,000 molecules/µm², or from about 10 to about 1,000 molecules/µm². The density of the polymeric analytes, capture moieties, and polymerizable molecules may be the same or different. For example, the density of the polymerizable molecules may be 1-fold, 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 100-fold, 1,000-fold, 10,000-fold, 100,000-fold, 1,000,000-fold, or greater-fold lower than that of the polymeric analyte.
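The surface densities above translate directly into the molecular spacings discussed herein. The Python sketch below makes that conversion under two idealized assumptions that are not taken from the source: an ordered square grid (pitch = 1/√ρ) and a uniformly random (2-D Poisson) arrangement, whose mean nearest-neighbour distance is 1/(2√ρ). The function names are hypothetical.

```python
import math


def square_lattice_pitch_nm(density_per_um2: float) -> float:
    """Pitch (nm) of a square grid holding `density_per_um2` molecules per um^2."""
    return 1000.0 / math.sqrt(density_per_um2)  # 1 um = 1000 nm


def random_nn_distance_nm(density_per_um2: float) -> float:
    """Mean nearest-neighbour distance (nm) for molecules placed at random
    (2-D Poisson process): 1 / (2 * sqrt(density))."""
    return 1000.0 / (2.0 * math.sqrt(density_per_um2))


print(square_lattice_pitch_nm(100))  # 100.0
print(random_nn_distance_nm(100))    # 50.0
```

For example, about 100 molecules/µm² corresponds to a pitch of about 100 nm if patterned, or a mean nearest-neighbour distance of about 50 nm if deposited at random; real surfaces will deviate from both models.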
[00204] In some instances, the molecules coupled to the substrate may be spaced apart at a designated or controlled distance. For example, the average spacing between the capture moieties or between the processed polymeric analytes (e.g., the stacked plurality of modified monomers) may correspond to a pitch of about 1 nanometer (nm), about 2 nm, about 3 nm, about 4 nm, about 5 nm, about 6 nm, about 8 nm, about 9 nm, about 10 nm, about 20 nm, about 30 nm, about 40 nm, about 50 nm, about 60 nm, about 70 nm, about 80 nm, about 90 nm, about 100 nm, about 500 nm, about 1 µm, about 5 µm, about 10 µm, or greater. In some instances, the spacing between or pitch of the molecules (e.g., capture moieties, polymeric analyte, or processed polymeric analyte such as the stacked plurality of modified monomers) may be at most about 10 µm, at most about 5 µm, at most about 1 µm, at most about 500 nm, at most about 100 nm, at most about 90 nm, at most about 80 nm, at most about 70 nm, at most about 60 nm, at most about 50 nm, at most about 40 nm, at most about 30 nm, at most about 20 nm, at most about 10 nm, at most about 5 nm, or less. Similarly, the spacing or distance between a polymeric analyte and a polymerizable molecule or capture moiety may be about 1 nanometer (nm), about 2 nm, about 3 nm, about 4 nm, about 5 nm, about 6 nm, about 8 nm, about 9 nm, about 10 nm, about 20 nm, about 30 nm, about 40 nm, about 50 nm, about 60 nm, about 70 nm, about 80 nm, about 90 nm, about 100 nm, about 500 nm, about 1 µm, or greater. In some instances, the average spacing between the capture moiety and the polymeric analyte coupled to the substrate may be at most about 1 µm, at most about 500 nm, at most about 100 nm, at most about 90 nm, at most about 80 nm, at most about 70 nm, at most about 60 nm, at most about 50 nm, at most about 40 nm, at most about 30 nm, at most about 20 nm, at most about 10 nm, at most about 5 nm, or less. A range of average distances between the polymerizable molecules, or between the polymerizable molecules and the polymeric analytes, may be used, e.g., from about 1 nm to about 40 nm, from about 2 nm to about 10 nm, etc.
[00205] The concentration or density of the molecules attached to the substrate may be modulated using one or more suitable approaches, including patterning or random deposition approaches. Examples of methods to control the concentration or density of the molecules attached to the substrate include limited dilution, addition of chaotropes (e.g., guanidine, formamide, urea), use of metal organic compounds, etc. The molecules may be attached to the substrate in a patterned fashion, e.g., using self-assembling monolayers, photopatterning, lithography, etching, or a combination thereof, or the molecules may be randomly arranged.
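When density is controlled by limited dilution, random loading is commonly modeled as a Poisson process. The Python sketch below, which assumes that model (it is not stated in the source) and uses hypothetical function names, estimates how many sites end up empty, singly loaded, or multiply loaded for a given mean number of molecules per site.

```python
import math


def poisson_occupancy(mean_per_site: float, k: int) -> float:
    """Probability that a site receives exactly k molecules when loading is
    random with mean `mean_per_site` (Poisson model of limited dilution)."""
    return math.exp(-mean_per_site) * mean_per_site**k / math.factorial(k)


def loading_summary(mean_per_site: float) -> dict:
    """Fractions of empty, single-, and multiple-occupancy sites (mean > 0)."""
    if mean_per_site <= 0:
        raise ValueError("mean_per_site must be positive")
    p0 = poisson_occupancy(mean_per_site, 0)
    p1 = poisson_occupancy(mean_per_site, 1)
    return {
        "empty": p0,
        "single": p1,
        "multiple": 1.0 - p0 - p1,
        # fraction of occupied sites that hold exactly one molecule
        "single_among_occupied": p1 / (1.0 - p0),
    }


print(loading_summary(0.1))
```

Lower mean loading increases the purity of singly occupied sites (about 95% of occupied sites at a mean of 0.1) at the cost of leaving most sites empty; patterned or exclusion-based loading may outperform this random-loading model.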
[00206] The substrate may comprise any useful size or dimension (e.g., length, width, height, diameter, radius), surface area, volume, or ratio or combination thereof. The substrate may comprise a bead or particle that may comprise a diameter of about 1 nanometer (nm), about 2 nm, about 3 nm, about 4 nm, about 5 nm, about 6 nm, about 8 nm, about 9 nm, about 10 nm, about 20 nm, about 30 nm, about 40 nm, about 50 nm, about 60 nm, about 70 nm, about 80 nm, about 90 nm, about 100 nm, about 500 nm, about 1 µm, about 2 µm, about 3 µm, about 4 µm, about 5 µm, about 6 µm, about 7 µm, about 8 µm, about 9 µm, about 10 µm, about 20 µm, about 30 µm, about 40 µm, about 50 µm, about 60 µm, about 70 µm, about 80 µm, about 90 µm, about 100 µm, about 200 µm, about 300 µm, about 400 µm, about 500 µm, about 600 µm, about 700 µm, about 800 µm, about 900 µm, about 1 millimeter (mm), or greater. The substrate may comprise a surface area of about 1 square nanometer (nm²), about 10 nm², about 100 nm², about 1,000 nm², about 10,000 nm², about 100,000 nm², about 1 µm², about 10 µm², about 100 µm², about 1,000 µm², about 10,000 µm², about 100,000 µm², about 1 mm², about 10 mm², about 100 mm², about 1,000 mm², about 10,000 mm², about 100,000 mm², about 1,000,000 mm², or greater.
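Bead diameter and surface density together set how many molecules a bead can carry. The Python sketch below assumes a smooth sphere functionalized only on its outer surface (area πd²); porous gel beads such as polyacrylamide or agarose can carry molecules internally, so this is a lower-bound style estimate. Function names are hypothetical.

```python
import math


def sphere_surface_area_um2(diameter_um: float) -> float:
    """Surface area (um^2) of a sphere: pi * d^2."""
    return math.pi * diameter_um**2


def molecules_per_bead(diameter_um: float, density_per_um2: float) -> float:
    """Expected number of surface-coupled molecules on a bead, assuming only
    the outer surface is functionalized at a uniform density."""
    return sphere_surface_area_um2(diameter_um) * density_per_um2


print(round(sphere_surface_area_um2(1.0), 3))  # 3.142
print(round(molecules_per_bead(1.0, 1_000)))   # 3142
```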
[00207] The molecules may be coupled to the substrate in an ordered or random arrangement. In ordered arrangements, the molecules may be patterned using any conventional approach such as lithography (e.g., soft lithography, photolithography), etching (e.g., ion etching, photo etching), or other patterning approach. In some instances, a linker (e.g., bifunctional linker) may be used to facilitate the coupling of the molecules (e.g., polymeric analytes, polymerizable molecules, capture moieties) to the substrate; such linkers may be patterned using any useful technique such as self-assembling monolayers, photopatterning, lithography, or etching. In some instances, the molecules may be coupled to the substrate in a random arrangement. For example, the molecules may be provided at a stoichiometric ratio or controlled concentration to couple the molecules at any useful ratio or density. In some instances, the substrate may comprise topographical or patterned features, which may facilitate attachment of linkers to the patterned features.
[00208] In some instances, the methods provided herein may comprise using a plurality of substrates. For instance, the preparation of the modified monomers or stacked plurality of modified monomers may be performed using a substrate (e.g., as shown in FIG. 1A). The modified monomers or stacked plurality of modified monomers may then be removed from the substrate and contacted with an additional substrate for coupling and detection. In one such example, a modified monomer or stacked plurality of modified monomers may be contacted with a flow cell comprising one or more attachment or anchor molecules (e.g., anchor nucleic acid adapters). The modified monomer or stacked plurality of modified monomers may be coupled to the flow cell via one of the anchor molecules, linearized (e.g., using flow or electrophoretic force), and attached at another point to another anchor molecule. The linearized molecule may then be detected, as described elsewhere herein.
[00209] Similarly, one or more substrates may be used for purification or enrichment. For example, referring again to FIG. 1B, subsequent to process 106, 112, or 113, or subsequent to any iteration of the workflow, one or more purification or enrichment operations may be performed. In one such example, a bead comprising a nucleic acid sequence complementary to at least a portion of the capture moiety 105, the linking nucleic acid molecule, or the additional polymerizable molecule may be used to purify or enrich for the desired product. In one particular example, subsequent to coupling of the linking nucleic acid molecule 111 to the capture moiety 105, the coupled product may be purified from the sample using a magnetic bead comprising a sequence complementary to the capture moiety 105.
[00210] FIG. 2A schematically shows an example linker that may be used in sequencing polymeric analytes such as peptides. FIG. 2A Panel A shows a bifunctional linker 203 (e.g., 1-(but-3-yn-1-yl)-4-isothiocyanatobenzene) comprising an amino acid reactive moiety (e.g., PITC) and an alkyne click chemistry moiety, which may be reacted with a polymerizable molecule 201 (e.g., a linking nucleic acid molecule) comprising a complementary azide click chemistry moiety. The bifunctional linker may also comprise a spacer moiety, e.g., an alkyl chain of any length (an ethyl group is depicted), a polymer (e.g., PEG) of any length, etc. The spacer moiety may be located between the amino acid reactive moiety and the click chemistry moiety. FIG. 2A Panel B shows the product of a click chemistry cycloaddition reaction between the azide and alkyne groups, which generates a linker molecule comprising the polymerizable molecule and the amino acid reactive moiety. The conjugation of the polymerizable molecule 201 to the bifunctional linker 203 may occur at any useful or convenient step. In alternative examples (not shown), the bifunctional linker 203 may comprise an azide group, e.g., 1-(2-azidoethyl)-4-isothiocyanatobenzene, which can be reacted with a polymerizable molecule 201 comprising an alkyne moiety.
[00211] In some instances where the polymeric analyte comprises a peptide that comprises amino acid monomers, the coupling of the linker to an amino acid (e.g., NTAA or CTAA) changes the chemical structure of the amino acid. For example, if a linker comprising an isothiocyanate moiety is used, the amino acid may be derivatized to a thiocarbamyl group (e.g., under mildly alkaline conditions) during or subsequent to contact with the isothiocyanate moiety. One or more further derivatizations may be performed. For instance, the amino acid or amino acid derivative (e.g., thiocarbamyl-derivatized amino acid) may be further derivatized to a thiazolone group (e.g., under acidic conditions), a thiohydantoin group, or other chemical moiety. Similarly, a thiazolone group or thiohydantoin group may be further derivatized to a thiocarbamyl group. Additional examples of linkers that can be useful in processing or sequencing polymeric analytes such as peptides are provided in International Patent Application No. PCT/US2023/079684, U.S. Prov. Pat. App. Nos. 63/481,932, filed January 27, 2023, and 63/601,389, filed November 21, 2023, each of which applications is incorporated herein in its entirety.
[00212] Modifications of Monomers: In some instances, one or more monomers (e.g., amino acids) of the polymeric analyte (e.g., peptide) may be modified. Modifications may be naturally occurring (e.g., post-translational modifications) or non-naturally occurring, such as by labeling or tagging, e.g., with an amino acid- or amine-reactive agent such as an isothiocyanate (e.g., PITC, NITC), 1-fluoro-2,4-dinitrobenzene (DNFB), dansyl chloride, 4-sulfonyl-2-nitrofluorobenzene (SNFB), an acetylating agent, an acylating agent, an alkylating agent, a guanidination agent, a thioacetylation agent, a thioacylation agent, a thiobenzoylation agent, or a derivative or combination thereof. Alternatively, or in addition, the one or more monomers may be modified to comprise any useful moiety such as an adduct (e.g., a polymer such as PEG, a polymerizable molecule such as a nucleic acid molecule, a nanoparticle or nanotube, a peptide or protein), a d a, a carbohydrate, a metabolite, a fluorophore, a hapten, a quencher, a tag (e.g., a fluorescent tag, a magnetic tag, a radioactive tag), a barcode, or other moiety. In some instances, a monomer of the polymeric analyte may be modified to facilitate recruitment of an enzyme to recognize or cleave a terminal monomer (e.g., an NTAA or CTAA of a peptide, the 5′ or 3′ nucleotide of a nucleic acid molecule, or the first or last monomer of a polymer) or set of monomers. For example, a terminal amino acid of a peptide analyte may be modified with a saccharide in order to recruit a lectin or lectin-bound protease. In another example, one or more monomers of a polymeric analyte may comprise or be coupled to a nucleic acid molecule having a first sequence that is complementary to a second sequence comprised by an oligo-bound protease. Hybridization of the first sequence to the second sequence may facilitate local recruitment of the protease to the monomer to be cleaved. In yet another example, a peptide analyte may be modified with PITC, which may allow for recruitment and cleavage by an Edmanase. In some examples, modifications to monomers of a polymeric analyte may include epitope tags, which can facilitate binding of a binding agent (e.g., subsequent to cleavage of the monomer from the polymeric analyte). Examples of such epitope tags include fluorophores, nucleic acid molecules, peptides, haptens, polymers, chemical moieties, or other adduct molecules. Additional examples of modifications to polymeric analytes are described elsewhere herein.
[00213] The polymeric analyte may comprise one or more modified monomers. The modification of the monomers may be naturally occurring or synthetic. Synthetic modifications may be performed prior to, during, or subsequent to cleavage of a monomer from the polymeric analyte and may be advantageous in preserving the identity of the monomer. For instance, during standard Edman degradation reactions to cleave a terminal amino acid (monomer) from the peptide, some amino acid residues may be altered or rendered undetectable by the reaction conditions. In an example, the conditions of Edman degradation may oxidize cysteine residues, dehydrate or destroy the phenylthiohydantoin (PTH) forms of serine or threonine, react with and modify lysine residues, or render some post-translational modifications undetectable. As such, modifying a peptide prior to analysis (e.g., to protect some of the amino acid residues or post-translational modifications) may be useful in more accurately identifying each of the amino acid residues. In an example of a modification that can be performed prior to cleavage, a peptide or portion thereof may be alkylated, e.g., to alkylate the cysteine residues (e.g., using 4-vinylpyridine, iodoacetamide, iodoacetate, or chloroacetate); acetylated, e.g., to react serine or threonine residues to form an ester (e.g., using acetyl chloride or acetic anhydride); oxidized, e.g., to convert cysteine residues to cysteic acid; reduced (e.g., using a reducing agent such as dithiothreitol, β-mercaptoethanol, or TCEP); or contacted with a protecting group, e.g., phosphorylated residues may be protected (e.g., using β-elimination of a phosphate group, with an optional Michael addition of a thiol group, e.g., as described in Knight et al., Nat. Biotechnol. 21, 1047-1054 (2003), which is incorporated by reference herein in its entirety), etc.
The polymeric analyte or monomer may be modified with a protecting group or moiety, such as a methyl, formyl, ethyl, acetyl, t-butyl, anisyl, benzyl, trifluoroacetyl, N-hydroxysuccinimide, t-butyloxycarbonyl (Boc), benzoyl, 4-methylbenzyl, thioanisyl, thiocresyl, benzyloxymethyl, 4-nitrophenyl, benzyloxycarbonyl, 2-nitrobenzoyl, 2-nitrophenylsulphenyl, 4-toluenesulphonyl, pentafluorophenyl, diphenylmethyl, 2-chlorobenzyloxycarbonyl, 2,4,5-trichlorophenyl, 2-bromobenzyloxycarbonyl, 9-fluorenylmethyloxycarbonyl (FMOC), triphenylmethyl, or 2,2,5,7,8-pentamethylchroman-6-sulphonyl group. The polymeric analyte or monomer may be treated with a protecting agent, e.g., carboxyethyl methanethiosulfonate (CEMTS), thiazolidine, mercaptophenylacetic acid, cyanobenzothiazole (e.g., for lipidation of N-terminal cysteines), acetamidomethyl, 2-methylsulfonylethyloxycarbonyl, etc. In some instances, the lysine residues may be blocked (e.g., the primary amines of lysine residues may be reacted) using an isothiocyanate (e.g., PITC), optionally followed by a single round of Edman degradation to expose a new N-terminus.
[00214] In some instances, a monomer of the polymeric analyte may be modified to facilitate cleavage of the monomer from the polymeric analyte. For example, an amino acid monomer of a peptide polymeric analyte may be modified such that it is recognized by an enzyme, e.g., acetylation of an amino acid, which can facilitate acyl peptide hydrolase cleavage of the acetylated amino acid. Additional or alternative modifications to the monomers, such as those described herein, may also facilitate recognition by or interaction with an engineered cleaving enzyme.
[00215] In some instances, a monomer comprising a naturally occurring modification may be treated to remove or alter the naturally occurring modification to render the polymeric analyte or monomer more amenable to the processing operations disclosed herein. For example, acetylation, formylation, methylation, and pyrrolidone carboxylic acid post-translational modifications may be removed prior to sequencing. Acetylation modifications may be removed with acyl peptide hydrolase or acid treatment (e.g., using 1 N HCl). Methylation may be removed using aminopeptidases. Formylation modifications may be removed, for example, using acid treatment (e.g., 0.6 M HCl). Pyrrolidone carboxylic acid (PCA) may be removed with pyroglutamate aminopeptidase. Exemplary C-terminal modifications may include amidation and methylation, both of which may be removed using carboxypeptidases.
[00216] Coupling of Polymerizable Molecules: The polymerizable molecules described herein may be coupled to one another using any useful approach. Such coupling may comprise a covalent interaction or a noncovalent interaction (e.g., ionic interaction, hydrophobic interaction, van der Waals forces, etc.). In some instances, a first polymerizable molecule (e.g., a linking nucleic acid molecule) and a second polymerizable molecule (e.g., a capture moiety) comprise nucleic acid molecules and may be coupled via hybridization, ligation, or both. For instance, the first polymerizable molecule may comprise a first sequence that is complementary to a second sequence of the second polymerizable molecule, and the coupling may occur via hybridization of the first sequence to the second sequence. Alternatively, the first sequence and the second sequence may not be complementary to one another but may be complementary to a third sequence and a fourth sequence, respectively, of a splint or bridge oligonucleotide. Accordingly, coupling of the first polymerizable molecule to the second polymerizable molecule may be mediated by hybridization of the first and second sequences to the third and fourth sequences, respectively, of the splint or bridge oligonucleotide.
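The splint geometry above can be checked in silico. Because strands pair antiparallel, a splint that places the 3′ end of the first sequence directly next to the 5′ end of the second sequence must contain the reverse complement of the joined junction. The Python sketch below assumes DNA (A/C/G/T), perfect Watson-Crick matching, and no melting-temperature or mismatch considerations; the function names are hypothetical.

```python
# Complement table for DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (5'->3' in, 5'->3' out)."""
    return seq.translate(COMPLEMENT)[::-1]


def splint_bridges(first_seq: str, second_seq: str, splint: str) -> bool:
    """Check that `splint` can template end-to-end joining of `first_seq`
    (3' end of the first molecule) and `second_seq` (5' end of the second).

    Antiparallel pairing means the splint must contain the reverse
    complement of the joined junction, first_seq + second_seq.
    Exact string matching only; real hybridization tolerates mismatches.
    """
    return reverse_complement(first_seq + second_seq).upper() in splint.upper()


print(splint_bridges("ACGTAC", "GGTTCA", "AATGAACCGTACGTAA"))  # True
print(splint_bridges("ACGTAC", "GGTTCA", "GTACGTTGAACC"))      # False
```

In the second call the splint carries both complementary regions but in the wrong order, so the two ends would not be brought together for ligation.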
[00217] In some instances, a nucleic acid reaction may be performed as part of or in addition to the coupling of the first polymerizable molecule to the second polymerizable molecule. For example, the first sequence of the first polymerizable molecule may hybridize to the second sequence of the second polymerizable molecule, and a nucleic acid extension reaction (e.g., using a polymerase) may be performed. Such an extension reaction may allow for transfer of the encoded information of one of the polymerizable molecules (e.g., the first polymerizable molecule) to another polymerizable molecule (e.g., the second polymerizable molecule). In another example, the first sequence of the first polymerizable molecule may be ligated to the second sequence of the second polymerizable molecule to provide a first polymerizable molecule covalently coupled to the second polymerizable molecule.
[00218] The polymerizable molecules may be coupled chemically, either covalently or noncovalently. In some instances, the first polymerizable molecule may be chemically linked to the second polymerizable molecule. For example, the first polymerizable molecule may comprise a first reactive moiety, and the second polymerizable molecule may comprise a second reactive moiety that is capable of reacting with the first reactive moiety. The first reactive moiety may be contacted with the second reactive moiety and be subjected to conditions sufficient to link the first reactive moiety to the second reactive moiety, e.g., via click chemistry. In other instances, the first polymerizable molecule may be coupled to the second polymerizable molecule via a noncovalent or indirect interaction, e.g., biotin-streptavidin.
[00219] In some instances, subsequent to detection, a modified monomer may be altered such that it is rendered undetectable by the binding agent, e.g., to prevent binding of the binding agent to the modified monomer in subsequent iterations or cycles of detection (e.g., via contacting with additional binding agents). For example, the monomer may be contacted with a blocking agent or derivatized such that the binding agent no longer recognizes the derivatized form. Such blocking strategies may be useful in preventing re-detection of the monomer. Additional strategies for inhibiting binding of binding agents to cleaved monomers are described elsewhere herein.
[00220] Similarly, in some instances, the binding agent may be removed from the modified monomer or stacked plurality of modified monomers at any useful or convenient operation, e.g., subsequent to detection. Removal of the binding agent may be performed using chemical or enzymatic approaches, e.g., using chemical denaturants, detergents, acidic or alkaline conditions, heat, or proteases. Alternatively, or in addition, if a polymerizable molecule and/or detectable label is coupled to the binding agent, the polymerizable molecule and/or detectable label of the binding agent may be removed or rendered undetectable, e.g., via a cleavage or restriction site and use of a cleaving enzyme (e.g., UDG, restriction enzyme), chemical cleavage, photolysis, photobleaching, or other approach. In some instances, the polymerizable molecule or detectable label is coupled to the binding agent via a noncovalent interaction, e.g., desthiobiotin-avidin; accordingly, decoupling of the polymerizable molecule or detectable label from the binding agent may be achieved by use of a competition agent, e.g., biotin, which binds avidin with higher affinity than desthiobiotin and can competitively displace it.
[00221] Identification of polymerizable molecules: In some instances, the polymerizable molecules may be subjected to sequencing or identification. For instance, the polymerizable molecules may comprise spatial or temporal information (e.g., barcodes encoding such information), such that identification or readout of the barcodes can yield information on the order in which the modified monomers originated from the polymeric analyte. In one such example and referring again to FIGs. 1A-1B, a first polymerizable molecule (e.g., a linking nucleic acid molecule) may be provided which comprises barcode information identifying the cycle or round (e.g., cycle 1) in which it is provided; subsequently, workflow 100a or 100b may be repeated using a second polymerizable molecule (e.g., a second linking nucleic acid molecule) that comprises barcode information identifying its cycle or round (e.g., cycle 2). Accordingly, the barcode information may be read out to determine the cycle or round in which each polymerizable molecule was provided, thereby providing temporal or sequential information on the modified monomers (e.g., that a first identified monomer was from a first cycle and that a second identified monomer was from a second cycle).
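The cycle-barcode logic above amounts to sorting identified monomers by the cycle their barcode encodes. The Python sketch below assumes a hypothetical barcode-to-cycle table and exact barcode matching; a practical decoder would likely tolerate sequencing errors (e.g., by nearest-barcode matching), which is not shown. All names are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical cycle barcodes: the sequence each cycle's linking molecule carries.
CYCLE_BARCODES = {"AACC": 1, "GGTT": 2, "CTAG": 3}


@dataclass
class Observation:
    barcode: str     # barcode read out from the polymerizable molecule
    amino_acid: str  # amino acid type identified for the associated modified monomer


def order_by_cycle(observations: list[Observation]) -> str:
    """Assemble a sequence by sorting observations by the cycle their barcode
    encodes. Observations with unrecognized barcodes are skipped."""
    decoded = [
        (CYCLE_BARCODES[obs.barcode], obs.amino_acid)
        for obs in observations
        if obs.barcode in CYCLE_BARCODES
    ]
    return "".join(aa for _, aa in sorted(decoded))


reads = [
    Observation("CTAG", "K"),
    Observation("AACC", "M"),
    Observation("GGTT", "A"),
    Observation("TTTT", "X"),  # unrecognized barcode, ignored
]
print(order_by_cycle(reads))  # MAK
```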
[00222] Nucleic Acid Reactions: In some instances, the polymerizable molecules comprise nucleic acid molecules. The nucleic acid molecules may be subjected to a nucleic acid reaction at any useful or convenient step. For example, the nucleic acid molecules may be amplified (e.g., using nucleic acid amplification approaches such as polymerase chain reaction (PCR), isothermal amplification, ligation-mediated amplification, transcription-based amplification, etc.) to generate amplicons for sequencing. Amplification may be performed, for example, using the capture moieties or polymerizable molecules as primer binding sites. Any number of useful preparation operations may be performed, such as purification or enrichment (e.g., using gel electrophoresis and extraction, column cleanups, SPRI, or other approach), cleanup, nucleic acid reactions (e.g., ligation, extension, amplification, tagmentation, restriction enzyme cleavage, phosphatase or kinase treatment), fragmenting, barcoding, addition of adapters, enzymatic treatment, etc. In some instances, the polymerizable molecules, or the substrates comprising the polymerizable molecules, may be filtered based on any useful characteristic or property. Filtering based on a characteristic or property may achieve higher accuracy or less noise by removing poor-quality molecules or enriching for higher-quality polymerizable molecules prior to sequencing. For example, polymerizable molecules or substrates (e.g., beads or particles) containing the polymerizable molecules may be filtered by size or length, quantity, presence of particular sequences (e.g., primer sequences, sequences of interest), GC content, polarity, polarization, birefringence, fluorescence (or other optical property), anisotropy, charge, secondary structure (e.g., hairpins), or other useful metric, characteristic, or property, or combinations thereof. Such filtration or enrichment may be performed using any suitable approach, e.g., affinity or hybridization approaches (e.g., bead-based affinity sequences or hybridization assays, which can enrich particular sequences), chromatography, size-based filtration, electrophoresis, electrofocusing, optoelectronics, SPRI, digital fluidics, magnetic activated sorting, fluorescence activated sorting, flow cytometry, or other suitable technique.
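The filtering described above is performed physically, before sequencing. As a loose computational analogue, the same criteria (length, GC content, presence of a primer sequence) can be applied to sequence data. The Python sketch below is hypothetical throughout: the class name, default thresholds, and primer are illustrative and not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class ReadFilter:
    """Hypothetical quality filter; thresholds are illustrative only."""
    min_len: int = 50
    max_len: int = 500
    min_gc: float = 0.30
    max_gc: float = 0.70
    required_primer: str = ""  # e.g. a primer binding sequence expected in every molecule

    def passes(self, seq: str) -> bool:
        seq = seq.upper()
        if not seq:
            return False
        # Length (size) criterion
        if not self.min_len <= len(seq) <= self.max_len:
            return False
        # GC-content criterion
        gc = (seq.count("G") + seq.count("C")) / len(seq)
        if not self.min_gc <= gc <= self.max_gc:
            return False
        # Presence of a particular sequence; an empty primer always matches
        return self.required_primer.upper() in seq


f = ReadFilter(min_len=4, max_len=20, required_primer="ACG")
print(f.passes("ACGTGCAT"))  # True
print(f.passes("AAAAAAAA"))  # False (GC content 0)
```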
[00223] Sequencing may be performed using a commercially available nanopore system, e.g., Oxford Nanopore Technologies, Genia Technologies, NobleGen, or Quantum Biosystems, or other sequencing and next-generation sequencing systems, e.g., Illumina, BGI, Qiagen, ThermoFisher, PacBio, and Roche, including formats such as parallel bead arrays, sequencing by synthesis, sequencing by ligation (e.g., SOLiD), capillary electrophoresis, electronic microchips, “biochips,” microarrays, parallel microchips, single-molecule arrays, and Sanger sequencing, as is described elsewhere herein.
[00224] Image-based sequencing: Sequencing of the modified amino acids or stacked plurality of modified amino acids may be performed by detecting a detectable label coupled to a binding agent that recognizes a particular amino acid type. In some instances, the binding agent comprises a fluorophore, and detection and identification of the amino acid type of a modified amino acid comprises super-resolution fluorescence imaging, e.g., using dSTORM. In such instances, sequencing reads may be generated from the detection and identification of the individual amino acid type comprised by the modified amino acid or by each member of the stacked plurality of modified amino acids, together with its location (e.g., an X-Y position in the imaging plane) or its proximity to other modified amino acids of the stacked plurality of modified amino acids. In one example, an array comprising a plurality of individually addressable locations may be provided, with each individually addressable location holding no more than one modified amino acid or stacked plurality of modified amino acids. As such, all fluorescence signals arising from that individually addressable location may be attributed back to a single modified amino acid or stacked plurality of modified amino acids. Further, the use of super-resolution imaging may allow for spatial distinction of each modified amino acid within the single stacked plurality of modified amino acids. As such, within an individually addressable location, both the identity of the amino acid type and the spatial position of each modified amino acid can be determined, thereby providing the sequence of the peptide from which the stacked plurality of modified amino acids was derived.
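The read-out for a single individually addressable location can be illustrated with a short NumPy sketch. It assumes, hypothetically, that localization has already produced a list of `(x, y, label)` tuples for one stack, and that the stacked modified amino acids project onto the imaging plane as an approximately linear arrangement, so that position along the principal axis of the localizations reflects residue order. Neither the data layout nor the helper name `read_stack` comes from the source.

```python
import numpy as np


def read_stack(localizations):
    """Order labelled localizations along the stack's main axis.

    localizations: list of (x, y, label) tuples from ONE individually
    addressable location (hypothetical format, e.g. x/y in nm and a
    one-letter amino-acid label reported by the binding agent).
    Returns labels sorted by their projection onto the first principal axis.
    The sign of that axis is arbitrary, so the read may come out reversed.
    """
    if len(localizations) < 2:
        return [lab for _, _, lab in localizations]
    xy = np.array([(x, y) for x, y, _ in localizations], dtype=float)
    centered = xy - xy.mean(axis=0)
    # First right-singular vector = direction of greatest spread.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    positions = centered @ vt[0]
    order = np.argsort(positions)
    return [localizations[i][2] for i in order]


locs = [(20.0, 0.0, "L"), (0.0, 0.0, "M"), (30.0, 1.0, "W"), (10.0, 1.0, "K")]
print(read_stack(locs))
```

Because a principal axis has no intrinsic direction, the sketch cannot tell which end of the stack corresponds to the N-terminus; an orientation reference of some kind would be needed in practice.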
[00225] Identification of the single modified amino acid or stacked plurality of modified amino acids may comprise image processing of the acquired image data. In some instances, dSTORM is used for imaging the single modified amino acid or stacked plurality of modified amino acids. Accordingly, image-processing tools for STORM datasets, including publicly available modules, may be used (e.g., Python scripts using NumPy, SciPy, OpenCV, or scikit-image). Image processing may be performed, e.g., to measure the intensity of a fluorescent signal or to deconvolve signals in close proximity to one another, using any useful operation or combination of operations, including, in non-limiting examples, filtering, blurring, noise reduction, thresholding, generating weighted centroids to identify the coordinates of a fluorescent spot, estimating the variance of an identified centroid, measuring peak intensity, clustering, distance measurement, and additional segmentation operations, among others.
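Several of the operations listed above (blurring, thresholding, segmentation, weighted centroids, and peak intensity) can be chained with SciPy as a minimal sketch. The function name, the Gaussian `sigma`, and the fixed `threshold` are illustrative assumptions; dedicated STORM/dSTORM localization software typically fits a point-spread-function model to each spot rather than relying on a plain weighted centroid.

```python
import numpy as np
from scipy import ndimage


def localize_spots(frame, threshold, sigma=1.0):
    """Return intensity-weighted centroids (row, col) and peak intensities.

    Steps: Gaussian blur -> threshold -> connected-component labelling ->
    weighted centroid and maximum computed on the raw (unblurred) frame.
    """
    frame = np.asarray(frame, dtype=float)
    smoothed = ndimage.gaussian_filter(frame, sigma=sigma)  # noise reduction
    mask = smoothed > threshold                             # candidate spot pixels
    labels, n = ndimage.label(mask)                         # one label per spot
    if n == 0:
        return np.empty((0, 2)), np.empty(0)
    index = np.arange(1, n + 1)
    centroids = np.array(ndimage.center_of_mass(frame, labels, index))
    peaks = np.asarray(ndimage.maximum(frame, labels, index))
    return centroids, peaks


frame = np.zeros((20, 20))
frame[5, 5] = frame[5, 6] = 100.0  # one spot spread over two pixels
frame[14, 12] = 50.0               # a second, dimmer spot
centroids, peaks = localize_spots(frame, threshold=1.0)
print(centroids, peaks)
```

Because the centroid is weighted by the raw intensities, the two-pixel spot is localized between its pixels, at column 5.5.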
[00226] Peptide reads arising from a common parent protein may also be determined. For instance, fragmented peptides arising from a common parent protein may be labeled with a common barcode sequence, as described elsewhere herein. Putative peptide reads can thus be assembled based on the common barcode sequence, amino acid identity, and, if applicable, cycle number. Erroneous reads may be identified through probabilistic modeling of read accuracy, resulting in reconstructed, fragmentary peptide sequences (contigs) with possible gaps for missed or unidentified rounds or amino acids. An alternative option for de novo read reconstruction may employ end-to-end, unsupervised machine learning-based reconstruction of peptide reads. This option may employ a machine learning algorithm, i.e., a deep-learning-based model that takes as its input NGS sequencing reads associated with a parent protein or peptide barcode and outputs the likely reconstruction of peptide reads (contigs). Training of the model can be conducted with protein sequencing runs using known protein or peptide standards. The de novo reconstruction may output reconstructed, fragmentary peptide sequences (contigs) with a probability assigned to each amino acid as well as to the assembled peptide sequence. In some instances, a k-mer or De Bruijn approach may be used for peptide sequence reconstruction. For example, reads arising from each polymerizable molecule may be broken down into shorter k-mer sequences. The k-mer sequences from the pool of reads may be assembled into longer contig sequences. A De Bruijn graph may be generated, e.g., to represent splice variants, post-translational modifications, or other proteoforms. The isoforms may be assembled, and their expression levels may be determined using a Bayesian approach. The assembled protein isoforms may be subjected to evaluation and error correction, e.g., by comparison with standard proteins spiked into samples and by assessing missing sequence segments, incorrect or redundant assembly, uniformity of coverage, etc.
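The k-mer and De Bruijn steps can be sketched in plain Python for reads written as one-letter amino acid strings. This is a textbook De Bruijn construction (nodes are (k-1)-mers, edges are k-mers) followed by extraction of maximal non-branching paths as contigs. It is an illustrative assumption, not the disclosed assembler: it ignores k-mer multiplicity, read errors, isolated cycles, and the Bayesian isoform quantification mentioned above, and the function names are hypothetical.

```python
from collections import defaultdict


def kmers(read: str, k: int) -> list[str]:
    """All overlapping k-mers of a read (empty if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for km in kmers(read, k):
            graph[km[:-1]].add(km[1:])
    return graph


def contigs(graph):
    """Maximal non-branching paths, spelled as sequences (isolated cycles skipped)."""
    indeg = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1
    nodes = set(graph) | set(indeg)

    def one_in_one_out(n):
        return indeg[n] == 1 and len(graph.get(n, ())) == 1

    result = []
    for node in nodes:
        if one_in_one_out(node):
            continue  # interior of a path; it will be reached from a branch point
        for nxt in graph.get(node, ()):
            path = node + nxt[-1]
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            result.append(path)
    return sorted(result)


reads = ["MKWVTF", "WVTFISL", "TFISLLF"]
print(contigs(de_bruijn(reads, k=4)))
```

With overlapping reads from one peptide the graph is a single path, so one contig is returned; a branch (for example, a variant residue) splits the output into separate contigs at the branch point.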
[00227] In some instances in which the identity of the polymerizable molecule is useful (e.g., it comprises spatial or temporal information), the identity of the polymerizable molecule (e.g., its nucleic acid sequence) may be determined without use of a sequencing approach. For instance, probes may be used to couple to particular regions of a DNA molecule. The probes may comprise nucleic acid probes with probe sequences that can be used to specifically detect a nucleic acid sequence. In some instances, the probes may comprise detectable labels or moieties, e.g., a fluorophore, radioisotope, mass tag, etc. For example, hybridization-based assays such as SeqFISH or NanoString may be performed to probe or assay particular regions of a polymerizable molecule to determine its identity. In other examples, an amplification-based approach may be used to determine the presence and identity of a polymerizable molecule. For example, PCR or nested PCR approaches may be used to selectively probe for a particular sequence of a DNA molecule.
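As a minimal in-silico illustration of probe-based identification, the sketch below reports which probes would find a perfectly complementary target site on a DNA molecule. It is a simplification assumed for illustration: only the given strand is searched, and real hybridization depends on mismatches, melting temperature, and stringency, none of which are modelled. The probe names and sequences are hypothetical.

```python
# Watson-Crick complement for uppercase DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def identify(molecule: str, probes: dict[str, str]) -> list[str]:
    """Names of probes whose perfectly complementary site occurs in the molecule.

    In this simplified model a probe hybridizes wherever the (single, given)
    strand contains the probe's reverse complement.
    """
    target = molecule.upper()
    return sorted(name for name, probe in probes.items()
                  if reverse_complement(probe) in target)


probes = {"region_A": "TGTAATC", "region_B": "CCCCCC"}
print(identify("GATTACAGGCCTTAA", probes))
```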
[00228] In some instances, the substrates comprising the polymerizable molecules (e.g., following one or more iterations of workflow 100a of FIG. 1A) may be provided on an array for sequencing. For example, a plurality of beads comprising polymerizable molecules that encode the amino acids of a plurality of peptides may be provided on an array for sequencing. In one such example, the plurality of beads may be directly or indirectly coupled to an additional substrate (e.g., a planar substrate, such as a microscope slide or multi-well plate), and sequencing may be performed using image-based sequencing approaches (e.g., using sequencing by synthesis or in situ hybridization probes (e.g., fluorescence in situ hybridization) and a single-molecule resolution imaging system), amplification-based sequencing, or both. The plurality of beads may be coupled to the additional substrate using any suitable technique, such as nucleic acid attachment using the polymerizable molecules or capture molecules, magnetic attachment (using a magnetic field and magnetic beads), optoelectronics, digital microfluidics, application of an electric field, gravity settling, centrifugation, capillary force, hydrogen bonding, electrostatic interactions, or other suitable approach.
[00229] Fingerprinting: The methods described herein may be useful in complete de novo protein or peptide sequencing (e.g., the identification of each amino acid in a peptide), or for fingerprinting a protein (e.g., identifying only a subset of amino acid types in a peptide and inferring, using a reference database, the identity of the peptide). For fingerprinting, a subset of amino acids may be identified, e.g., using the approaches described herein, without the need for binding agents that are specific to all 20 proteinogenic amino acids. For example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or 19 different binding agents with single-amino acid or multi-amino acid specificity may be sufficient to determine the identity of a protein or peptide. For known proteome databases, reference-based reconstruction may be performed by simulating the NGS reads that would be generated from the set of possible peptides in the workflow. For each possible peptide, a simulation can produce NGS reads mimicking the output of this protein sequencing system. Next, the real (experimental) NGS reads from a run can be matched to simulated reads from candidate peptides in a database based on likelihood. This results in reconstructed, fragmentary peptide sequences (contigs) with a probability assigned to the assembled peptide sequence.
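Reference-based fingerprinting can be sketched as follows: each candidate peptide is converted into the noise-free read that a limited binding-agent panel would produce, and the observed read is scored against each simulated read by likelihood. The four-agent panel, the symmetric substitution-error model with rate `error_rate`, the uniform prior, and the equal-length assumption (no insertions or deletions) are all illustrative assumptions, not parameters from the disclosure.

```python
import math

# Hypothetical binding-agent panel: each agent reports one symbol, and D/E
# share a single agent. Residues no agent recognizes are reported as "0".
AGENT_CLASS = {"K": "1", "D": "2", "E": "2", "W": "3", "C": "4"}
N_SYMBOLS = len(set(AGENT_CLASS.values())) + 1  # agent symbols + unlabelled


def simulate_read(peptide: str) -> str:
    """Noise-free simulated read: one symbol per residue."""
    return "".join(AGENT_CLASS.get(aa, "0") for aa in peptide)


def log_likelihood(observed: str, expected: str, error_rate: float) -> float:
    """log P(observed | expected) with independent per-position substitutions.

    No insertions or deletions are modelled, so a length mismatch is impossible.
    """
    if len(observed) != len(expected):
        return -math.inf
    match = math.log(1.0 - error_rate)
    mismatch = math.log(error_rate / (N_SYMBOLS - 1))
    return sum(match if o == e else mismatch for o, e in zip(observed, expected))


def best_match(observed: str, database: list[str], error_rate: float = 0.05):
    """Most likely database peptide and its posterior under a uniform prior."""
    scores = {p: log_likelihood(observed, simulate_read(p), error_rate) for p in database}
    finite = {p: s for p, s in scores.items() if s != -math.inf}
    if not finite:
        return None, 0.0
    top = max(finite.values())
    weights = {p: math.exp(s - top) for p, s in finite.items()}  # stable softmax
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


database = ["MKDWC", "MKEWA", "AKDLC", "GGG"]
print(best_match("01234", database))
```

Peptides that share a fingerprint under the chosen panel receive identical scores, so a panel that recognizes more amino acid types (or more distinct classes) is what separates them.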
[00230] High Throughput Sequencing/Parallelization: The methods described herein may be conducted in a parallelized, high-throughput format. Such parallelization may be achieved by having substrates comprising multiple polymeric analytes coupled thereto and performing the operations (iteratively) across the substrate. Similarly, parallelization and high-throughput screening may be achieved by providing multiple stacked pluralities of modified amino acids on a substrate for detection and sequencing using super-resolution imaging. In some instances, a library of binding agents may be used to recognize different monomer types (e.g., different amino acids of a peptide analyte or modified amino acids), such that different polymeric analytes (e.g., different peptides) may be processed on a single substrate or on multiple substrates.
[00231] A library of binding agents may be used to recognize different monomer types to facilitate high-throughput readout. As described herein, the library of binding agents may comprise binding agents that can recognize a single monomer (e.g., a single cleaved amino acid or amino acid-linker complex) or multiple monomers (e.g., multiple cleaved amino acids). In some instances, binding agents with varying levels of specificity may be used in a sequence or order, which can effectively render a less-specific binding agent more specific simply because of the order in which it is provided. For example, a first binding agent may be capable of specifically binding to a first monomeric analyte, and a second binding agent may be capable of binding to both the first monomeric analyte and a second monomeric analyte. The first binding agent may be provided and contacted with the first monomeric analyte and the second monomeric analyte. Since the first binding agent is specific to the first monomeric analyte, it will bind exclusively to the first monomeric analyte. Subsequently, the second binding agent may be provided; however, since the first monomeric analyte is already bound to the first binding agent, it may be inaccessible (e.g., sterically blocked) to the second binding agent. As such, the second binding agent may bind only to the second monomeric analyte. Accordingly, identification of the first binding agent and the second binding agent (e.g., through detection of a label or tag, or through sequencing of polymerizable molecules coupled to the binding agents and optionally transferred to a substrate) may allow for identification of the first monomeric analyte and the second monomeric analyte.
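The order-dependent specificity argument can be modelled with a toy simulation in which each binding agent, delivered in turn, binds every still-unbound analyte it recognizes. Steric blocking is reduced to the assumption that an analyte bound once is unavailable to later agents; the analyte and agent names are hypothetical.

```python
def sequential_binding(analytes, agents):
    """Assign each analyte to the first agent, in delivery order, that recognizes it.

    analytes: list of analyte types, e.g. ["A", "B"].
    agents: ordered list of (agent_name, set_of_recognized_types).
    Returns agent names aligned with `analytes` (None = never bound).
    Model assumption: once bound, an analyte is blocked for later agents.
    """
    bound = [None] * len(analytes)
    for name, targets in agents:
        for i, analyte in enumerate(analytes):
            if bound[i] is None and analyte in targets:
                bound[i] = name
    return bound


analytes = ["A", "B", "A", "C"]
specific_first = [("agent_1", {"A"}), ("agent_2", {"A", "B"})]
promiscuous_first = [("agent_2", {"A", "B"}), ("agent_1", {"A"})]
print(sequential_binding(analytes, specific_first))
print(sequential_binding(analytes, promiscuous_first))
```

With the specific agent delivered first, `agent_2` ends up bound only to analyte B, so detecting it identifies B unambiguously; reversing the order lets `agent_2` claim both A and B, and the readout no longer distinguishes them.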
[00232] In some instances, it may be useful to barcode the polymeric analytes prior to processing. Barcode sequences may be attached to the polymeric analytes at a single location (e.g., at a terminus), at multiple locations, adjacent to the polymeric analyte (e.g., on a substrate), etc., as is described elsewhere herein. For example, a peptide may be labeled at the N-terminus, C-terminus, or an internal amino acid with a nucleic acid barcode molecule. The nucleic acid barcode sequence may comprise information about, or be unique to, a partition or compartment, sample, peptide, etc., such that each unique barcode sequence can be traced back (e.g., subsequent to nucleic acid sequencing or another detection method) to the originating partition or compartment, sample, peptide, etc.
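Tracing reads back to their origin by barcode can be sketched as grouping reads against a known barcode whitelist. Tolerating one mismatch (Hamming distance) and discarding ambiguous or unmatched barcodes are common conventions assumed here for illustration; the whitelist, the `(barcode, payload)` read format, and the function names are hypothetical.

```python
from collections import defaultdict


def assign_barcode(observed: str, whitelist: list[str], max_mismatches: int = 1):
    """Return the unique whitelist barcode within max_mismatches (Hamming), else None."""
    hits = [
        bc for bc in whitelist
        if len(bc) == len(observed)
        and sum(a != b for a, b in zip(bc, observed)) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None  # ambiguous or no match -> None


def group_reads(reads, whitelist, max_mismatches=1):
    """Group (barcode, payload) reads by corrected barcode; unassignable reads are dropped."""
    groups = defaultdict(list)
    for barcode, payload in reads:
        bc = assign_barcode(barcode, whitelist, max_mismatches)
        if bc is not None:
            groups[bc].append(payload)
    return dict(groups)


whitelist = ["AAAA", "CCCC", "GGTT"]
reads = [("AAAA", "r1"), ("AAAT", "r2"), ("CCCC", "r3"), ("TTTT", "r4")]
print(group_reads(reads, whitelist))
```

Here `AAAT` is corrected to `AAAA`, while `TTTT` is two or more mismatches from every whitelist entry and is dropped.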
[00233] Alternatively, or in addition, the capture moieties or polymerizable molecules may comprise a barcode sequence. The barcode sequence may be specific to a particular partition, sample, or spatial location. For example, a substrate may comprise a plurality of individually addressable units. Each individually addressable unit may comprise a unique barcode specific to that unit (e.g., a spatial barcode). The polymeric analytes may be coupled to the substrate such that each individually addressable unit comprises, on average, no more than one polymeric analyte or, for analysis, no more than one modified monomer or stacked plurality of modified monomers. Such a distribution of polymeric analytes (or modified monomers or stacked pluralities of modified monomers) may be obtained, for example, using a limiting dilution approach (e.g., diluting the polymeric analytes to reduce the number of polymeric analytes that may attach to a given individually addressable unit) or by introduction of chaotropic agents (e.g., guanidine, formamide, urea). The polymeric analytes, modified monomers, or stacked pluralities of modified monomers may be distributed across the individually addressable units according to a Poisson distribution. Thus, for a given substrate, about 6%, 10%, 18%, 20%, 30%, 36%, 40%, or 50% of the individually addressable units may comprise one or fewer polymeric analytes, modified monomers, or stacked pluralities of modified monomers.
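The Poisson loading statement can be made concrete with a short calculation. Writing λ for the mean number of analytes (or stacks) per individually addressable unit, a quantity set in practice by the dilution, the fraction of units holding exactly k analytes is exp(-λ) · λ^k / k!. The sketch below tabulates empty, single, and multiple occupancy for a few λ values chosen arbitrarily for illustration. Single occupancy peaks at λ = 1, where it equals exp(-1), about 36.8%; lowering λ suppresses multiple occupancy at the cost of more empty units.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k analytes in one addressable unit) for Poisson loading with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def occupancy(lam: float) -> dict[str, float]:
    """Fractions of units that are empty, singly occupied, or multiply occupied."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


for lam in (0.1, 0.5, 1.0, 2.0):
    f = occupancy(lam)
    print(f"lambda={lam}: empty={f['empty']:.3f} "
          f"single={f['single']:.3f} multiple={f['multiple']:.3f}")
```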
[00234] Modifications of Polymeric Analytes and Polymerizable Molecules: The present disclosure also provides methods of modifying polymeric analytes or monomers of the polymeric analytes (e.g., amino acids of a peptide), as well as the polymerizable molecules described herein. Such modifications may be useful, for example, for rendering a monomer more resistant to certain reaction conditions (e.g., Edman degradation), increasing or decreasing the binding affinity of a binding agent for the modified monomer, assisting in docking or interfacing of the modified monomer with an enzyme (e.g., a protease, a cleaving enzyme or enzyme analog such as a ribozyme or DNAzyme, or a binding agent), or other purposes.
[00235] A polymeric analyte such as a peptide may be modified in order to render the peptide or a constituent amino acid more resistant to the reaction conditions used to cleave the amino acid from the peptide. For example, the peptide may be subjected to alkylation, e.g., using 4-vinylpyridine or iodoacetamide, which may be useful in preventing oxidation of cysteine residues. The peptide may be subjected to acetylation, e.g., O-acetylation (such as with acetyl chloride) to form an ester, which may be useful in preventing dehydration, racemization, or destruction of the derivatized form (e.g., the PTH derivative) of serine or threonine. The peptide may be subjected to β-elimination of a phosphate group, followed by Michael addition of a thiol (e.g., as described in Knight et al. 2003, Nature Biotechnology 21, 1047-1054, which is incorporated by reference herein), to detect phosphorylation events. The peptide may be contacted with phenyl isothiocyanate, acetic anhydride, or another amine-reactive reagent to protect lysine residues. Additional examples of peptide processing for Edman degradation can be found in Tarr, Methods of Protein Microcharacterization, pp. 155-194, which is incorporated by reference herein.
[00236] A polymeric analyte or monomer may be modified to influence the interaction of a binding agent with the polymeric analyte or monomer, e.g., by derivatizing the cleaved monomer, adding chemical groups to the cleaved monomer, or other chemical processing (e.g., addition or removal of groups) of the cleaved monomer. Blocking of the binding agent may be achieved by appending a blocking agent, e.g., a chemical group or adduct, to the monomer-capture moiety complex, for example, by conjugation of a synthetic polymer (e.g., PEG), nucleic acid molecule, fluorophore, quencher, nanotube, nanoparticle, small molecule, polypeptide or protein, fatty acid chain, or other large, sterically hindering molecule. The blocking agents may be appended to the monomer chemically (e.g., by reacting with an amino acid, such as via a photoreaction) or enzymatically, e.g., using methyltransferases, tRNA synthetases, acetyltransferases, etc.
[00237] Order of Operations: It will be appreciated that the operations presented in the methods described herein may be performed in any useful or convenient order and that some operations, in some instances, may be optional. For example, in some instances, the coupling of the monomer (e.g., a terminal amino acid) to the capture moiety may occur prior to, during, or subsequent to the cleaving of the monomer from the polymeric analyte. Similarly, the coupling of the linker to the capture moiety (e.g., either with or without a linking nucleic acid molecule) may occur prior to, during, or subsequent to the coupling of the linker to the monomer. In another example, the substrate may be provided with the modified monomers coupled thereto, such that cleavage of the monomer from the polymeric analyte is obviated. In yet another example, in instances where a linker is used to couple the monomer (e.g., amino acid) to the capture moiety or substrate, the linker may comprise a monomer-coupling group and subsequently be reacted with a linking molecule (e.g., a linking nucleic acid molecule); alternatively, the linker may be provided with the linking nucleic acid molecule as part of the linker (e.g., pre-conjugated). In other examples, the binding agent may be provided at any useful or convenient step. For instance, the binding agent may be contacted with the modified monomers (e.g., monomer-linker complexes subsequent to cleavage from the polymeric analyte) prior to, during, or subsequent to attachment or immobilization of the modified monomer to a substrate for analysis (e.g., via imaging).
[00238] Additional operations may be performed at any useful or convenient step, e.g., prior to provision of the polymeric analyte (e.g., peptide) or subsequent to one or more of the processing operations (e.g., subsequent to coupling of the linker or polymerizable molecules, contacting with binding agents, etc.). For instance, it may be useful to purify or enrich a population of polymerizable molecules (e.g., subsequent to process 106, 112, or 113 in FIG. 1B). Such enrichment or purification can be performed using any useful technique, e.g., bead-based enrichment, immunoprecipitation, chromatography, electrophoresis, DNA purification, etc. In one such example, purification of a nucleic acid molecule may be performed using a bead comprising or coupled to a sequence complementary to the nucleic acid molecule and, optionally, eluting the nucleic acid molecule subsequent to capture. Similarly, a protein may be purified using a bead comprising an antibody that recognizes the protein or a portion of the protein.
Substrate Conjugation
[00239] The present disclosure provides methods for coupling molecules (e.g., biomolecules such as nucleic acid molecules, peptides, lipids, carbohydrates, etc.) to a substrate. The substrate may be functionalized to allow for covalent or noncovalent coupling of the molecules to a substrate. The substrate may comprise any useful functional moiety, e.g., a reactive moiety. In a
non-limiting example, a reactive moiety may comprise a click chemistry moiety, such as azide, alkyne, nitrone, alkene (e.g., a strained alkene), tetrazine, methyltetrazine, triazole, tetrazole, phosphite, phosphine, etc. A click chemistry moiety may be reactive in a copper-catalyzed Huisgen cycloaddition or the 1,3-dipolar cycloaddition between an azide and a terminal alkyne, a Diels-Alder reaction (e.g., a cycloaddition between a diene and a dienophile), or a nucleophilic substitution reaction in which one of the reactive species is an epoxide or aziridine. A molecule that is to be coupled to a substrate may comprise a click chemistry moiety complementary to that of the substrate; for example, the substrate may comprise an alkyne moiety and the molecule to be coupled may comprise an azide moiety, which can react with the alkyne moiety of the substrate to generate a covalent linkage. In one such example, the substrate may comprise dibenzocyclooctyne (DBCO) moieties with which azide-comprising molecules (e.g., azide-DNA, azide-polymers, azide-peptides) can react and conjugate.
[00240] The reactive moiety may comprise a photoreactive moiety that may be activated when exposed to a photostimulus (e.g., light such as UV or visible light). Examples of photoreactive moieties include aryl (phenyl) azides (e.g., phenyl azide, ortho-hydroxyphenyl azide, meta-hydroxyphenyl azide, tetrafluorophenyl azide, ortho-nitrophenyl azide, meta-nitrophenyl azide), diazirines, azido-methyl-coumarins, benzophenones, anthraquinones, diazo compounds, psoralen, and analogs or derivatives thereof.
[00241] The reactive moiety may comprise a carboxyl-reactive crosslinker group, such as diazomethane, diazoacetyl, carbonyldiimidazole, or a carbodiimide (e.g., 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride (EDC) or dicyclohexylcarbodiimide (DCC)), or an amine-reactive group (e.g., N-hydroxysuccinimide (NHS), Sulfo-NHS, or NHS esters). The reactive group may comprise a crosslinking agent, which may comprise an NHS group, an EDC group, a maleimide, a thiol, a cystamine, an aldehyde, a succinimidyl group, an epoxide, or an acrylate. Examples of crosslinking agents include NHS (N-hydroxysuccinimide); sulfo-NHS (N-hydroxysulfosuccinimide); EDC (1-ethyl-3-[3-dimethylaminopropyl]carbodiimide hydrochloride); SMCC (succinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate); DSS (disuccinimidyl suberate); DSG (disuccinimidyl glutarate); DFDNB (1,5-difluoro-2,4-dinitrobenzene); BS3 (bis(sulfosuccinimidyl)suberate); TSAT (tris-(succinimidyl)aminotriacetate); BS(PEG)5 (PEGylated bis(sulfosuccinimidyl)suberate); BS(PEG)9 (PEGylated bis(sulfosuccinimidyl)suberate); DSP (dithiobis(succinimidyl propionate)); DTSSP (3,3'-dithiobis(sulfosuccinimidyl propionate)); DST (disuccinimidyl tartrate); BSOCOES (bis(2-(succinimidooxycarbonyloxy)ethyl)sulfone); EGS (ethylene glycol bis(succinimidyl succinate)); DMA (dimethyl adipimidate); DMP (dimethyl pimelimidate); DMS (dimethyl suberimidate); DTBP (Wang and Richard's Reagent); BM(PEG)2 (1,8-bismaleimido-diethyleneglycol); BM(PEG)3 (1,11-bismaleimido-triethyleneglycol); BMB (1,4-bismaleimidobutane); DTME (dithiobismaleimidoethane); BMH (bismaleimidohexane); BMOE (bismaleimidoethane); TMEA (tris(2-maleimidoethyl)amine); SPDP (succinimidyl 3-(2-pyridyldithio)propionate); SMCC (succinimidyl trans-4-(maleimidylmethyl)cyclohexane-1-carboxylate); SIA (succinimidyl iodoacetate); SBAP (succinimidyl 3-(bromoacetamido)propionate); SIAB (succinimidyl (4-iodoacetyl)aminobenzoate); Sulfo-SIAB (sulfosuccinimidyl (4-iodoacetyl)aminobenzoate); AMAS (N-α-maleimidoacet-oxysuccinimide ester); BMPS (N-β-maleimidopropyl-oxysuccinimide ester); GMBS (N-γ-maleimidobutyryl-oxysuccinimide ester); Sulfo-GMBS (N-γ-maleimidobutyryl-oxysulfosuccinimide ester); MBS (m-maleimidobenzoyl-N-hydroxysuccinimide ester); Sulfo-MBS (m-maleimidobenzoyl-N-hydroxysulfosuccinimide ester); Sulfo-SMCC (sulfosuccinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate); EMCS (N-ε-maleimidocaproyl-oxysuccinimide ester); Sulfo-EMCS (N-ε-maleimidocaproyl-oxysulfosuccinimide ester); SMPB (succinimidyl 4-(p-maleimidophenyl)butyrate); Sulfo-SMPB (sulfosuccinimidyl 4-(p-maleimidophenyl)butyrate); SMPH (succinimidyl 6-((β-maleimidopropionamido)hexanoate)); LC-SMCC (succinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxy-(6-amidocaproate)); Sulfo-KMUS (N-κ-maleimidoundecanoyl-oxysulfosuccinimide ester); LC-SPDP (succinimidyl 6-(3-(2-pyridyldithio)propionamido)hexanoate); Sulfo-LC-SPDP (sulfosuccinimidyl 6-(3'-(2-pyridyldithio)propionamido)hexanoate); SMPT (4-succinimidyloxycarbonyl-α-methyl-α-(2-pyridyldithio)toluene); PEG4-SPDP (PEGylated, long-chain SPDP crosslinker); PEG12-SPDP (PEGylated, long-chain SPDP crosslinker);
SM(PEG)2 (PEGylated SMCC crosslinker); SM(PEG)4 (PEGylated SMCC crosslinker); SM(PEG)6 (PEGylated, long-chain SMCC crosslinker); SM(PEG)8 (PEGylated, long-chain SMCC crosslinker); SM(PEG)12 (PEGylated, long-chain SMCC crosslinker); SM(PEG)24 (PEGylated, long-chain SMCC crosslinker); BMPH (N-β-maleimidopropionic acid hydrazide); EMCH (N-ε-maleimidocaproic acid hydrazide); MPBH (4-(4-N-maleimidophenyl)butyric acid hydrazide); KMUH (N-κ-maleimidoundecanoic acid hydrazide); PDPH (3-(2-pyridyldithio)propionyl hydrazide); ATFB-SE (4-Azido-2,3,5,6-Tetrafluorobenzoic Acid, Succinimidyl Ester); ANB-NOS (N-5-azido-2-nitrobenzoyloxysuccinimide); SDA (NHS-Diazirine) (succinimidyl 4,4'-azipentanoate); LC-SDA (NHS-LC-Diazirine) (succinimidyl 6-(4,4'-azipentanamido)hexanoate); SDAD (NHS-SS-Diazirine) (succinimidyl 2-((4,4'-azipentanamido)ethyl)-1,3'-dithiopropionate); Sulfo-SDA (Sulfo-NHS-Diazirine) (sulfosuccinimidyl 4,4'-azipentanoate); Sulfo-LC-SDA (Sulfo-NHS-LC-Diazirine) (sulfosuccinimidyl 6-(4,4'-azipentanamido)hexanoate); Sulfo-SDAD (Sulfo-NHS-SS-Diazirine) (sulfosuccinimidyl 2-((4,4'-azipentanamido)ethyl)-1,3'-dithiopropionate); SPB (succinimidyl-[4-(psoralen-8-yloxy)]-butyrate); Sulfo-SANPAH (sulfosuccinimidyl 6-(4'-azido-2'-nitrophenylamino)hexanoate); DCC (dicyclohexylcarbodiimide); EDC (1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride); glutaraldehyde; formaldehyde; and combinations or derivatives thereof.
[00242] Molecules may also be attached to substrates using linkers. The linkers can have any useful number of functional groups or reactive groups and may be uni-functional (having one functional group), bi-functional, tri-functional, quadri-functional, or comprise a greater number of functional groups. In some instances, a molecule (e.g., nucleic acid molecule, peptide, or polymer) may be attached to a substrate using a heterobifunctional linker. The heterobifunctional linker may comprise any useful functional group, as described herein. Non-limiting examples of heterobifunctional linkers include: p-Azidobenzoyl hydrazide (ABH), N-5-Azido-2-nitrobenzoyloxysuccinimide (ANB-NOS), N-[4-(p-Azidosalicylamido)butyl]-3'-(2'-pyridyldithio)propionamide (APDP), p-Azidophenyl Glyoxal monohydrate (APG), Bis-[β-(4-azidosalicylamido)ethyl]disulfide (BASED), Bis[2-(Succinimidooxycarbonyloxy)ethyl] Sulfone (BSOCOES), BMPS, 1,4-Di[3'-(2'-pyridyldithio)propionamido] Butane (DPDPB), Dithiobis(succinimidyl Propionate) (DSP), Disuccinimidyl Suberate (DSS), Disuccinimidyl Tartrate (DST), 3,3'-Dithiobis(sulfosuccinimidyl Propionate) (DTSSP), EDC, Ethylene Glycol bis(succinimidyl succinate) (EGS), N-(ε-maleimidocaproic acid) hydrazide (EMCH), N-(ε-maleimidocaproyloxy)succinimide ester (EMCS), N-Maleimidobutyryloxysuccinimide ester (GMBS), Hydroxylamine-HCl, MAL-PEG-SCM, m-Maleimidobenzoyl-N-hydroxysuccinimide Ester (MBS), N-Hydroxysuccinimidyl-4-azidosalicylic acid (NHS-ASA), PDPH, N-Succinimidyl bromoacetate (SBA), SIA, Sulfo-SIA, Succinimidyl-4-(N-maleimidomethyl)cyclohexane-1-carboxylate (SMCC), Succinimidyl 4-(p-maleimidophenyl) Butyrate (SMPB), Succinimidyl-6-[β-maleimidopropionamido]hexanoate (SMPH), N-Succinimidyl 3-[2-pyridyldithio]-propionate (SPDP), Sulfo-LC-SPDP, N-(p-Maleimidophenyl) isocyanate (PMPI), N-Succinimidyl(4-iodoacetyl) Aminobenzoate (SIAB), Sulfo-MBS, Sulfo-SANPAH, Sulfo-SMCC, Sulfo-DST, Sulfo-EMCS, Sulfo-GMBS, N-Hydroxysulfosuccinimidyl-4-azidobenzoate (Sulfo-HSAB), Sulfosuccinimidyl (4-azidophenyl)-1,3'-dithiopropionate (Sulfo-SADP), Sulfosuccinimidyl 2-(m-azido-o-nitrobenzamido)ethyl-1,3'-dithiopropionate (Sulfo-SAND), Sulfosuccinimidyl-2-(p-azidosalicylamido)ethyl-1,3'-dithiopropionate (Sulfo-SASD), Sulfo-SIAB, Sulfo-SMPB, and the like.
[00243] Additional examples of conjugation reactions that may be used to attach molecules to substrates include the Ullmann reaction, Heck reaction, Negishi reaction, Stille reaction, Suzuki reaction, Buchwald-Hartwig coupling, Castro-Stephens coupling, Glaser coupling, Kumada coupling, Larock indole synthesis, Miyaura borylation, Sonogashira cross-coupling, and Grubbs reaction.
[00244] More than one type of molecule may be coupled to the substrate. For example, a substrate may be coupled to nucleic acid molecules and peptides. Alternatively, a substrate may be coupled to only one type of molecule (e.g., only nucleic acid molecules, only peptides, only lipids, only carbohydrates, etc.). A substrate may be coupled to any useful combination of molecules, linkers, reactive moieties, or functional groups, which may be coupled at any useful density, as described elsewhere herein. For example, a multifunctional linker may be used to attach both a nucleic acid barcode molecule and a peptide to the substrate. Alternatively, a substrate may comprise a linker and reactive sites; the linker may be used to attach one type of molecule (e.g., peptides or nucleic acid molecules), whereas the reactive sites may be used to attach another type of molecule (e.g., nucleic acid molecules or peptides).
[00245] The proximity of a molecule coupled to a substrate to its nearest neighbor (e.g., another molecule) may be controlled using a variety of approaches, e.g., self-assembling monolayers, patterning approaches, linking moieties, etc. In some instances, it may be advantageous to have two molecules in close proximity (e.g., two polymerizable molecules, such as a peptide and a nucleic acid molecule, or two nucleic acid molecules). For instance, with respect to the sequencing approaches described herein, capture moieties may be used to couple a monomer of a polymeric analyte, and, subsequent to monomer cleavage, additional polymerizable molecules may be required to be in proximity to the capture moiety to allow for transfer of information encoded by polymerizable molecules of binding agents. The proximity of the molecules (e.g., capture moiety and polymerizable molecules) may be mediated using tethering molecules, such as nucleic acid molecule “staples” or multi-functional linkers.
[00246] Nucleic acid molecules may be coupled to a substrate by direct coupling. In such instances, the substrate or the nucleic acid molecules may comprise functional moieties that can interact. For example, the substrate and nucleic acid molecules may comprise a complementary click chemistry pair, e.g., alkyne and azide. In one such example, a substrate may comprise alkyne moieties (e.g., DBCO), which can be reacted with azide-functionalized nucleic acid molecules. The nucleic acid molecules may be reacted with the alkyne moieties in a click chemistry reaction to covalently link the substrate to the nucleic acid molecules. In another example, the substrate may comprise avidin or streptavidin moieties, to which biotinylated nucleic acid molecules may bind non-covalently. Alternatively, or in addition, the substrate may comprise a nucleic acid molecule to which additional nucleic acid molecules (e.g., nucleic acid analytes, nucleic acid linkers) are conjugated using hybridization, ligation, click chemistry, or crosslinking (e.g., photocrosslinking such as with CNVK).
[00247] Alternatively, or in addition, the nucleic acid molecules may be coupled to a substrate using a linker, e.g., as described elsewhere herein. The linker may comprise at least two functional groups (e.g., a heterobifunctional linker) that can couple to both the substrate and the nucleic acid molecules. In an example, the substrate may comprise an amine group, and alkyne-functionalized DNA primers (e.g., DBCO-DNA primers) may be attached using a linker such as azidoacetic acid NHS ester. In another example, amine-functionalized substrates may be coupled to azide-functionalized DNA primers using a DBCO-NHS ester or DBCO-PEG-NHS ester linker.
[00248] Similarly, peptides may be coupled to a substrate by direct coupling or by using a linker. A peptide may be coupled to a substrate at a terminus of the peptide (e.g., C terminus or N terminus), at an internal residue or amino acid of the peptide, or at multiple locations along the peptide. In examples of direct coupling, a peptide may be functionalized with a moiety that can interact with a moiety of the substrate (e.g., a click chemistry pair or avidin-biotin). For example, the substrate and peptides may comprise a complementary click chemistry pair, e.g., alkyne and azide, or binding partners such as avidin and biotin. In one example of a click chemistry pair, a substrate may comprise alkyne moieties (e.g., DBCO), which can be reacted with azide-functionalized peptides. The peptides may be reacted with the alkyne moieties in a click chemistry reaction to covalently link the substrate to the peptides. In another example, the substrate may comprise avidin or streptavidin moieties, to which biotinylated peptides may bind non-covalently.
[00249] Alternatively, or in addition to, the peptides may be coupled to a substrate using a linker, e.g., as described elsewhere herein. The linker may comprise at least two functional groups (e.g., a heterobifunctional linker) that can couple to both the substrate and the peptides. In an example, the substrate may comprise an amine group, and alkyne-functionalized peptides may be attached using a linker such as azidoacetic acid NHS ester. In another example, amine-functionalized substrates may be coupled to azide-functionalized peptides using a DBCO-NHS ester or DBCO-PEG-NHS ester linker. In yet another example, substrates comprising an amine group may be coupled to an azide-functionalized peptide using EDC and Sulfo-NHS.
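The linker examples in paragraphs [00247] and [00249] amount to matching each end of a heterobifunctional linker to a complementary group on the substrate and on the molecule being attached. The sketch below encodes only the linkers named above; the reactivity table is a simplified assumption (NHS esters react with amines; azides react with alkynes, including strained alkynes such as DBCO), and the names `REACTS_WITH`, `LINKER_ENDS`, and `candidate_linkers` are hypothetical.

```python
# Simplified reactivity table (assumption): partner groups each reactive end couples to.
REACTS_WITH = {
    "NHS ester": {"amine"},
    "azide": {"alkyne", "DBCO"},  # DBCO is a strained alkyne (copper-free click)
    "DBCO": {"azide"},
    "alkyne": {"azide"},
}

# Reactive ends of the heterobifunctional linkers named in the text.
LINKER_ENDS = {
    "azidoacetic acid NHS ester": ("NHS ester", "azide"),
    "DBCO-NHS ester": ("NHS ester", "DBCO"),
    "DBCO-PEG-NHS ester": ("NHS ester", "DBCO"),
}


def _couples(end: str, group: str) -> bool:
    return group in REACTS_WITH.get(end, set())


def candidate_linkers(substrate_group: str, molecule_group: str) -> list[str]:
    """Return linkers whose two ends can bridge the substrate and the molecule."""
    hits = []
    for name, (end_a, end_b) in LINKER_ENDS.items():
        forward = _couples(end_a, substrate_group) and _couples(end_b, molecule_group)
        reverse = _couples(end_b, substrate_group) and _couples(end_a, molecule_group)
        if forward or reverse:
            hits.append(name)
    return sorted(hits)


# Amine substrate + DBCO-functionalized primer or peptide.
print(candidate_linkers("amine", "DBCO"))
# Amine substrate + azide-functionalized primer or peptide.
print(candidate_linkers("amine", "azide"))
```

Representing each linker by its two reactive ends makes it straightforward to extend the table with other pairings described elsewhere herein without changing the matching logic.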
[00250] A peptide may be functionalized with a functional moiety to enable attachment or coupling of the peptide to the substrate. The functional moiety may comprise a silane, e.g., aminosilane (e.g., APTES), amino-PEG-silane, a click chemistry moiety, or another linking moiety, and can be attached to the peptide at a peptide terminus (N-terminus or C-terminus), at an internal amino acid, or at multiple locations (e.g., multiple internal amino acids, one or both termini, etc.). Chemical approaches to functionalize peptides can include C-terminal-specific conjugation (e.g., via C-terminal decarboxylative alkylation) using photoredox catalysis, e.g., as described by Bloom et al., Nature Chemistry 10, 205-211 (2018), and Zhang et al., ACS Chem. Biol. 2021, 16(11), 2595-2603, each of which is incorporated by reference herein in its entirety, or amide coupling to an amine-functionalized surface. N-terminal attachment may comprise amide coupling of the N-terminal amine group to a surface functionalized with carboxylic groups, or the use of 2-pyridinecarboxaldehyde variants. Alternatively, or in addition to, functionalization of terminal ends of peptides may be achieved enzymatically or using enzyme analogs such as ribozymes or DNAzymes. In an example of enzymatic functionalization and attachment, carboxypeptidases or amidases are used for C-terminal functionalization (e.g., as described in Xu et al., ACS Chem Biol. 2011 Oct 21; 6(10): 1015-1020; Zhu et al., Chinese Chemical Letters 2018, Vol. 29, Issue 7, Pages 1116-1118; and Zhu et al., ACS Catal. 2022, 12(13), 8019-8026, each of which is incorporated by reference herein in its entirety), which can allow for the addition of a click chemistry moiety to the peptide.
The click chemistry-functionalized peptides may then be directly attached to the substrate via another clickable group (e.g., BCN-azide or DBCO-azide coupling), or, in other instances, may be reacted with another linker or polymerizable molecule (e.g., a bait nucleic acid molecule with a clickable group) that can then link to the substrate directly or indirectly (e.g., by hybridizing the bait nucleic acid molecule to a capture nucleic acid molecule). Additional examples of enzymes that can be used for functionalization or attachment include Sortase A, subtiligase, Butelase I, and trypsiligase. In some examples, ubiquitin ligase can be used to attach ubiquitin proteins bearing linker moieties to substrates. These linker moieties can then be used to chemically attach proteins to the ubiquitin-coupled substrates. In some examples, glycosylating enzymes may be used to conjugate functionalized sugar groups (e.g., click chemistry-functionalized sugars, polymer-conjugated sugars, biotinylated sugars) to amino acid residues, which can allow for attachment to a substrate (e.g., via click chemistry, polymer crosslinking, nucleic acid hybridization, or avidin-biotin interactions). Internal amino acid residues or post-translationally modified residues may be coupled to substrates using, for example, thiol labeling, amide coupling using EDC/NHS chemistry or DMT-MM to glutamate or aspartate residues,
esterifying glutamate or aspartate residues, alkylation or disulfide bridge labeling of cysteines, or amide coupling to lysine residues.
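As a rough aid for choosing among the internal-residue chemistries listed above, the following sketch reports which positions in a peptide each chemistry could target. The chemistry-to-residue table is a simplification assumed for illustration: it considers side chains only, ignores the termini and post-translational modifications, and does not model accessibility or relative reactivity. The function name `attachment_sites` is hypothetical.

```python
# Assumed mapping from the internal-residue chemistries listed above to the
# side chains they target (one-letter codes); termini are not considered.
CHEMISTRY_TARGETS = {
    "amide coupling (EDC/NHS or DMT-MM)": {"D", "E"},
    "esterification": {"D", "E"},
    "alkylation / disulfide labeling": {"C"},
    "amide coupling to lysine": {"K"},
}


def attachment_sites(sequence: str) -> dict[str, list[int]]:
    """Map each chemistry to the 1-based positions of residues it may target."""
    seq = sequence.upper()
    return {
        chemistry: [i + 1 for i, aa in enumerate(seq) if aa in targets]
        for chemistry, targets in CHEMISTRY_TARGETS.items()
    }


for chemistry, positions in attachment_sites("ACDKEGC").items():
    print(f"{chemistry}: {positions}")
```

A chemistry that returns a single position for a given peptide is a candidate for single-point attachment; one that returns many positions yields multi-point attachment.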
[00251] A peptide may be treated prior to, during, or subsequent to coupling of the peptide to a substrate. In some instances, a peptide is conjugated with a tag that enables attachment to the substrate, e.g., His tags, SNAP-tags, CLIP-tags, SpyCatcher, SpyTag, or nucleic acid tags (e.g., bait oligos which can attach to capture oligos of the substrate). In some examples, it may be advantageous to block or protect primary amines or carboxyl groups and, optionally, to de-block or de-protect the N-terminal primary amine or C-terminal carboxyl group in order to facilitate attachment of the N-terminus or C-terminus to a substrate. In an example, single-point (e.g., C-terminal) selective attachment of peptides can be achieved by reacting the peptide with a linker comprising an amine-reactive group (e.g., an isothiocyanate such as PITC) and a reactive group (e.g., a click chemistry group). The linker can be, for example, a PITC-conjugated click chemistry moiety such as PITC-azide or PITC-alkyne, optionally with a spacer moiety in between, e.g., PITC-alkyl-azide, PITC-PEG-azide, PITC-alkyl-alkyne, or PITC-PEG-alkyne. The linker reacts with and “blocks” the primary amines (e.g., modifies lysines), including the N-terminus. Subsequent cleavage of the N-terminal amino acid (e.g., under Edman degradation conditions, such as treatment with acid) can be performed, and one of the remaining modified lysines may be attached to a substrate (e.g., using the click chemistry moiety coupled to the amine-reactive group). Optionally, the peptide may be treated with a protease, e.g., LysC, which cleaves peptides such that a remaining peptide has a C-terminal lysine and comprises primary amines only at the C-terminal lysine residue and the N-terminus; such a cleavage may be performed prior to reacting the amine-reactive group, e.g., as shown by Xie et al., Langmuir 2022, 38(30), 9119-9128, which is incorporated by reference herein in its entirety.
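The LysC and PITC-linker scheme can be traced on a sequence to see why it yields a single attachment handle per fragment. This is a hypothetical bookkeeping sketch, not a reaction model. It assumes that Lys-C cuts after every lysine, that the PITC-click linker labels every primary amine, and that one Edman cycle removes exactly the N-terminal residue together with its label. The function names are illustrative.

```python
def lysc_digest(sequence: str) -> list[str]:
    """Split after every lysine (K); a simplified Lys-C rule."""
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        if aa == "K":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def handles_after_edman(fragment: str) -> list[int]:
    """Positions (1-based, in the shortened fragment) of labeled lysines left after
    blocking all primary amines and removing the N-terminal residue once.

    The N-terminal amine's label leaves with the first residue; lysine side-chain
    labels stay on, so each remaining K carries one click handle.
    """
    remaining = fragment[1:]
    return [i + 1 for i, aa in enumerate(remaining) if aa == "K"]


for frag in lysc_digest("MAGKLLSKEEP"):
    print(frag, handles_after_edman(frag))
```

Fragments ending in K keep exactly one handle at their C-terminal lysine. The final, lysine-free fragment of the protein has none, which is why it would not attach under this scheme.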
[00252] Similarly, carboxylic groups can be reacted in a way that enables C-terminal or internal residue attachment. In an example of C-terminal conjugation, carboxyl groups may be labeled with a C-terminal sequencing reagent, such as isothiocyanate, when treated with an activating reagent (e.g., acetic anhydride) to generate a peptide-thiohydantoin (at the C-terminus) and “blocked” carboxyl groups on the aspartic acid and glutamic acid residues. The thiohydantoin may then be reacted to couple to a substrate. Alternatively, cleavage of the C-terminal amino acid via a single round of C-terminal sequencing degradation, or via a protease, exposes only a single reactive carboxylic group at the C-terminal amino acid. The single reactive C-terminal carboxylic group can then be used as a reactive moiety for a single attachment site.
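[00253] In another approach, a peptide or protein can be attached via the N-terminus using the specific reactivity of the N-terminal amine group. Amine-based reactions, such as amide coupling, can be carried out at low pH, where the N-terminal amine group is preferentially active: the N-terminal α-amine typically has a lower pKa (about 8) than the lysine ε-amine (about 10.5), so under mildly acidic conditions a substantially larger fraction of N-terminal amines is unprotonated and nucleophilic. In addition, 2-pyridinecarboxaldehyde and its variants can be used to react with the N-terminal amine group.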
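The pH dependence can be made concrete with the Henderson-Hasselbalch relation, which gives the fraction of an amine in its unprotonated (reactive) form as 1/(1 + 10^(pKa - pH)). The pKa values below are typical textbook figures assumed for illustration; actual values vary with sequence and local environment.

```python
def fraction_unprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of an amine in its neutral, nucleophilic form."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


PKA_N_TERMINUS = 7.8  # assumed typical alpha-amine pKa
PKA_LYSINE = 10.5     # assumed typical lysine epsilon-amine pKa

for ph in (6.5, 7.0, 8.5):
    n_term = fraction_unprotonated(PKA_N_TERMINUS, ph)
    lys = fraction_unprotonated(PKA_LYSINE, ph)
    print(f"pH {ph}: N-terminus {n_term:.2%}, lysine {lys:.4%}, ratio ~{n_term / lys:.0f}x")
```

Under these assumed values, pH 6.5 gives roughly 480-fold more reactive N-terminal amines than lysine amines. The ratio approaches 10^(10.5 - 7.8), about 500, as the pH is lowered further, at the cost of a smaller absolute reactive fraction and therefore slower coupling.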
[00254] In some instances, a peptide may be conjugated to a substrate using a polymerization reaction, e.g., a free radical polymerization, such as using PEGylated peptides or methacrylamide-modified peptides; Michael-type addition of maleimide-terminated oligo-NIPAAM-conjugated peptides; photocrosslinking of azophenyl-conjugated peptides; or other polymerization reactions with monomer-conjugated peptides, e.g., as described by Krishna et al., Biopolymers 2010; 94(1): 32-48, which is incorporated by reference herein in its entirety.
[00255] Multiple types of molecules may be attached to a substrate. The substrate may comprise, coupled thereto, any combination of molecules, including but not limited to peptides, proteins (e.g., enzymes, antibodies, nanobodies, antibody fragments), nucleic acid molecules, lipids, carbohydrates or sugars, metabolites, small molecules, polymers, metals, viral particles, biotin, avidin, streptavidin, neutravidin, etc. The multiple types of molecules may be attached simultaneously to the substrate or in a sequential manner. For example, a substrate may be treated to conjugate nucleic acid molecules and subsequently treated to conjugate peptides, or alternatively, the substrate may be treated to conjugate peptides prior to the nucleic acid molecules. Any number of conjugation or attachment chemistries may be used. For example, in instances where multiple types of molecules are attached to the substrate, any number of conjugation chemistries may be used for each type of molecule.
[00256] A substrate, or portion thereof, may be subjected to conditions sufficient to passivate the substrate or portion thereof. Passivation of a substrate may be useful for a variety of purposes, such as preventing nonspecific binding of binding agents, altering the surface density of a molecule (e.g., increasing the density of nucleic acid molecules or peptides), blocking reactive sites (e.g., blocking available click chemistry moieties subsequent to conjugation of the molecules on the substrate), etc. Passivation may be achieved using chemical approaches, e.g., deposition of blocking agents such as proteins (e.g., albumin), Tween-20, polymers (e.g., PEG), metals or metal oxides, halogens or derivatives thereof (e.g., fluorine or fluorine derivatives, chlorine or chlorine derivatives), or biochemical approaches, e.g., using metal microbes. Substrates comprising reactive moieties may also be passivated following molecule conjugation (e.g., coupling of nucleic acid molecules, peptides, etc.) by reacting any unreacted sites with an appropriate molecule. For example, a substrate comprising click chemistry moieties, e.g., DBCO beads, may be coupled to molecules of interest (e.g., polymerizable molecules, such as nucleic acid molecules, peptides) at a useful density using click chemistry (e.g., azide-nucleic acid molecules, azide-peptides).
Unreacted sites may be passivated by providing and reacting complementary click-chemistry molecules, e.g., azide-polymers (e.g., PEG-azide), which may reduce downstream nonspecific interactions.
[00257] Substrate passivation may occur at any useful time or step. For instance, passivation to block unreacted DBCO sites may be performed prior to, during, or subsequent to conjugation of analytes or other molecules of interest (e.g., peptides and nucleic acid molecules). The passivation may be controlled by stoichiometry or densities of the passivating agent relative to the molecules of interest, or by physical approaches, e.g., photopatterning, self-assembling monolayers, etc.
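Controlling passivation by stoichiometry, as described above, reduces to bookkeeping of reactive sites. The sketch below is a hypothetical calculation, not a validated protocol. It assumes a known total number of click sites (e.g., DBCO) on the substrate, a given amount of azide-functionalized analyte, a coupling efficiency, and a chosen molar excess of the complementary passivating agent (e.g., PEG-azide). All names and numbers are illustrative.

```python
def passivation_plan(total_sites: float, analyte: float,
                     coupling_efficiency: float = 1.0,
                     molar_excess: float = 10.0) -> tuple[float, float]:
    """Estimate unreacted click sites after conjugation and the passivating agent to add.

    All quantities share one unit (e.g., pmol). Returns (unreacted_sites, passivant).
    """
    if not 0.0 < coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be in (0, 1]")
    # The analyte cannot consume more sites than the substrate provides.
    reacted = min(analyte * coupling_efficiency, total_sites)
    unreacted = total_sites - reacted
    return unreacted, unreacted * molar_excess


# Hypothetical numbers: 100 pmol DBCO sites, 20 pmol azide-peptide at 80% coupling.
print(passivation_plan(100.0, 20.0, coupling_efficiency=0.8))
```

Lowering the analyte-to-site ratio in this calculation is one way to set a sparse analyte density before the remaining sites are capped.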
Sample Processing
[00258] The present disclosure also provides systems, compositions, devices, and methods for processing samples. One or more methods for processing samples may comprise preparation of biological samples for analysis, which, in some instances, includes partitioning of cells for conducting single-cell analysis. A method for processing a biological sample may comprise extraction or isolation of one or more peptides or proteins from the biological sample for further processing and analysis, as is described elsewhere herein.
[00259] Preparation of Cell Suspensions for Single-Cell Analysis: The methods described herein may involve preparation of single cell suspensions from a biological sample. Single cell suspensions may be prepared from biological samples by dissociating cells and optionally, culturing them in a liquid medium. In some instances, biological samples comprise a liquid sample. For example, a biological sample may comprise a bacterial liquid culture, a mammalian liquid culture, a blood, plasma, or serum sample. Processing of such liquid samples may include centrifugation (e.g., to isolate cells), resuspension of cells in a suitable medium, such as Dulbecco’s Phosphate Buffered Saline (DPBS), and optional culturing of the isolated cells.
[00260] A biological sample may comprise cultured cells, e.g., cells cultured in suspension, or cells adhered to a solid surface, such as petri dishes or tissue culture dishes. Samples of cultured adherent cells may be treated to generate a cell suspension, e.g., with a protease such as trypsin, to detach the cells from the surface. A biological sample may comprise a tissue or biopsy sample. A tissue or biopsy sample may be processed mechanically or enzymatically to generate a cell suspension. Such processing may include sonication (mechanical treatment) or enzymatic treatment, such as the use of pronase, collagenase, hyaluronidase, metalloproteinases, trypsin, or other enzymes that digest extracellular matrix components. The dissociated cells can then be stored in a suitable buffer, such as DPBS.
[00261] Cell Sorting: A biological sample or a cell suspension may be subjected to sorting to isolate a cell of interest. Sorting may be performed to select or isolate a cell based on a quality or characteristic of the cell, e.g., expression of a protein target, size, deformability, fluorescence or other optical property, or other physical property of the cell. Sorting may be accomplished using any number of approaches, e.g., using immunosorting (e.g., fluorescence-activated cell sorting (FACS) or magnetic-activated cell sorting (MACS)), electrophoretic approaches, chromatography, microfluidic approaches (e.g., using inertial focusing, cell traps, electrophoresis, isoelectric focusing), acoustic sorting, optical sorting (e.g., optoelectronic tweezers), mechanical cell picking (e.g., using manual or robotic pipettes), or passive approaches (e.g., gravitational settling).
[00262] Partitioning: Cells of a biological sample or cell suspension may be partitioned into individual partitions such that at least a subset of the individual partitions comprises a single cell. The individual partitions may comprise a barcode molecule (e.g., a fluorophore or set of fluorophores, nucleic acid barcode molecules, etc.). Barcode molecules may be unique to the partition, such that each individual partition comprises a barcode sequence that differs from those of other partitions. The barcode molecules may be loaded into the individual partitions at any useful ratio of barcode molecules to sample species (e.g., cells, proteins, nucleic acid molecules). The barcode molecules may be loaded into partitions such that about 0.0001, 0.001, 0.1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, or 200000 barcodes are loaded per sample species. In some cases, the barcodes are loaded into partitions such that more than about 0.0001, 0.001, 0.1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, or 200000 barcodes are loaded per sample species. In some cases, the barcodes are loaded into the partitions so that fewer than about 0.0001, 0.001, 0.1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, or 200000 barcodes are loaded per sample species.
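Loading a single cell into at least a subset of partitions is commonly analyzed by assuming random loading, in which the number of cells per partition follows a Poisson distribution with mean λ. That assumption is not stated in the text above. The sketch below uses it to show the trade-off between empty partitions and partitions that contain more than one cell; the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a partition receives exactly k cells at mean loading lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def loading_summary(lam: float) -> dict[str, float]:
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return {
        "empty": empty,
        "single": single,
        "multiple": 1.0 - empty - single,
        # Of the partitions that hold any cell, the share holding exactly one.
        "single_among_occupied": single / (1.0 - empty),
    }


for lam in (0.05, 0.1, 0.3, 1.0):
    s = loading_summary(lam)
    print(f"lambda={lam}: single {s['single']:.3f}, "
          f"multiple {s['multiple']:.4f}, single/occupied {s['single_among_occupied']:.3f}")
```

With λ = 0.1, about 90% of partitions are empty, while about 95% of occupied partitions contain exactly one cell. Raising λ fills more partitions at the cost of more multi-cell partitions.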
[00263] A partition may assume any useful geometry such as a droplet, a microwell, a solid substrate, a gel (e.g., a cell encapsulated in a gel bead), a bead, a flask, a tube, a spot, a capsule, a channel, a chamber, or other compartment or vessel. A partition may be part of an array of partitions, e.g., a droplet in a microfluidic device, a microwell of a microwell plate, a spot on a multi-spot array, etc.
[00264] Lysis, Permeabilization, and Analyte Extraction: Single cells (e.g., in partitions) may be processed to obtain one or more analytes contained therein. A method for processing a single cell may comprise lysing the cell to release its contents into the individual compartment or partition. Lysis may be performed using a detergent (e.g., Triton X-100, sodium dodecyl sulfate, sodium deoxycholate, CHAPS), RIPA buffer, a change in temperature (e.g., elevated or lower temperature, freezing, freeze-thawing), enzymes, mechanical lysis (e.g., sonication, application of
mechanical force), electrical lysis, or a combination thereof. Lysis may be performed in the presence of protease inhibitors to prevent degradation or digestion of the proteins from the cell. The contents may optionally be further processed, e.g., subjected to purification or extraction, denaturation of proteins or peptides, enzyme or chemical digestion, etc. In some instances, the contents may be subjected to enzymatic digestion to remove nucleic acid molecules, e.g., using nucleases such as DNase or RNase. Alternatively, or in addition to, a cell may be fixed (e.g., using a fixative) and/or permeabilized. Examples of fixatives include aldehydes (e.g., glutaraldehyde, formaldehyde, paraformaldehyde), alcohols (e.g., methanol, ethanol), acetone, acids (e.g., acetic acid, Davidson’s AFA), oxidizing agents (e.g., osmium tetroxide, potassium dichromate, chromic acid, permanganate salts), Zenker’s fixative, picrates, Hepes-glutamic acid buffer-mediated organic solvent protection effect (HOPE), or Karnovsky’s fixative. Cell permeabilization may be achieved mechanically (e.g., using sonication, electroporation, shearing) or chemically (e.g., using an organic solvent such as methanol or acetone, or detergents such as saponin, Tween-20, Triton X-100).
[00265] Protein Processing: The biological sample (or single cell suspensions or partitioned cells) may be further processed to enable proteomic analysis. For example, proteins in the sample may be de-aggregated, e.g., using chemical or mechanical approaches. Chemical de-aggregation methods can include but are not limited to sodium dodecyl sulfate (SDS), Triton X-100, 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS), ethylene carbonate, or formamide. Mechanical de-aggregation methods can include but are not limited to sonication or high-temperature treatment. The biological sample (or single cell suspensions or partitioned cells) may be subjected to conditions sufficient to denature one or more proteins. Denaturation may be achieved using heat, chemicals (e.g., SDS, urea, guanidine), reducing agents (e.g., dithiothreitol (DTT), beta-mercaptoethanol, TCEP), enzymes (e.g., ClpX, ClpS, unfoldases), ribozymes, or DNAzymes. Similarly, the peptides or proteins may be subjected to conditions to solubilize them in a solution, e.g., using detergents, organic solvents, spermidine, or tagging the peptides or proteins with polyionic tags (e.g., DNA, PEG, or other polymers). Alternatively, or in addition, the peptides or proteins may be enriched or purified; in an example, the peptides or proteins of interest may be precipitated using trichloroacetic acid, chloroform, TRIzol, or another chemical reagent. Other biological or chemical agents may be included during the protein processing, e.g., lysozymes, papain, cruzain, trypsin, protease inhibitors, nucleases or nuclease-containing proteins (e.g., DNase, RNase, DNA glycosylases, restriction endonucleases, transposases, micrococcal nucleases, Cas proteins).
[00266] Peptides or proteins may be fragmented prior to analysis. Fragmenting proteins may be useful in reducing the size of the proteins and allowing for efficient processing of peptides, as is described elsewhere herein. Fragmentation may be performed using proteases, e.g., trypsin, chymotrypsin, pepsin, Lys-C, Glu-C, Proteinase K, furin, thrombin, endopeptidase, papain, subtilisin, elastase, enterokinase, genenase, endoproteinase, metalloproteases, or with chemical treatment, e.g., cyanogen bromide, hydrazine, hydroxylamine, formic acid, BNPS-skatole, iodosobenzoic acid, 2-nitro-5-thiocyanobenzoic acid, etc. Alternatively, or in addition to, fragmentation may be performed using mechanical methods, such as sonication, vortexing, mechanical stirring, temperature changes (e.g., freeze/thaw, heating), or other fragmentation approaches.
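Enzymatic fragmentation can be previewed in silico before choosing a protease. The cleavage rules below are common simplifications assumed for illustration. For example, the trypsin rule ignores known exceptions, and Glu-C specificity depends on buffer. The dictionary and function names are hypothetical, and the digest assumes no missed cleavages.

```python
import re

# Simplified cleavage rules (assumptions). Each pattern matches the zero-width
# position between residues where the enzyme is assumed to cut.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",        # after K or R, not before P
    "Lys-C": r"(?<=K)",                  # after K
    "Glu-C": r"(?<=E)",                  # after E (bicarbonate-buffer rule)
    "chymotrypsin": r"(?<=[FWY])(?!P)",  # after F, W, or Y, not before P
}


def digest(sequence: str, enzyme: str, min_length: int = 1) -> list[str]:
    """Return fragments from a complete (no missed cleavage) in silico digest."""
    pieces = re.split(CLEAVAGE_RULES[enzyme], sequence.upper())
    # Drop empty strings (e.g., after a C-terminal cut site) and short pieces.
    return [p for p in pieces if len(p) >= min_length]


print(digest("MKPLRAGKWPEFG", "trypsin"))
print(digest("MKPLRAGKWPEFG", "chymotrypsin"))
```

Expressing each rule as a zero-width regular expression keeps cleavage specificity in data rather than code, so additional proteases can be added as new entries. Splitting on zero-width matches requires Python 3.7 or later.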
[00267] Enrichment of proteins or peptides in a biological sample may be performed, e.g., for separating proteins and peptides from cellular debris or other types of analytes (e.g., nucleic acids, lipids, carbohydrates, metabolites). Such enrichment may include, for example, the use of affinity or ion exchange columns, size exclusion columns, affinity precipitation (e.g., immunoprecipitation), chemical precipitation (e.g., using trichloroacetic acid, chloroform, or TRIzol), chromatography (e.g., HPLC), or electrophoresis. In instances where cells are partitioned prior to enrichment, the enrichment may be performed using microbeads, affinity microcolumns, affinity beads, etc. In some instances, fractionation may be performed on the proteins or peptides, which may be used to separate the proteins by size, hydrophobicity, charge, affinity, mass, density, etc.
[00268] Peptides may be barcoded, in bulk or in partitions. Peptides may be barcoded with any useful type of barcode molecule, e.g., spectral or fluorescent barcodes, mass tags, nucleic acid barcode molecules, etc. The barcode molecules may allow for identification of an originating peptide, a partition, a sample, a cell, or a cell compartment. For example, a cell sample may be partitioned such that a partition comprises at most one cell; the partition may comprise a unique barcode molecule (e.g., nucleic acid barcode molecule) that identifies the partition and thus the cell. Subsequent labeling of the peptides within the partition (e.g., after permeabilizing or lysing the cell) with the barcode molecules may be useful in identifying the peptides as arising or originating from the same cell or partition. In other examples, a substrate may comprise nucleic acid molecules comprising a unique barcode sequence that differs from the barcode sequences of other substrates. As such, the barcode sequence may be used to identify the substrate. In some instances, barcoded substrates may be partitioned with cell samples, such that at least a subset of the partitions comprise a single cell and a single barcoded substrate. As such, the peptides arising from the single cell and transferred to the barcoded substrate may all be identifiable as originating
from the single cell. Barcode molecules may comprise additional useful functional sequences, e.g., UMIs, primer sites, restriction sites, cleavage sites, transposition sites, sequencing sites, read sites, etc.
[00269] Attachment of barcode molecules to peptides may be achieved using any suitable chemistry. For example, C-terminal conjugation of nucleic acid barcode molecules may be achieved by amide coupling of amine-conjugated DNA barcode molecules to peptides, or by thiol alkylation, e.g., reacting a thiolated peptide with an alkylating (e.g., iodoacetamide-functionalized) DNA barcode molecule. N-terminal conjugation can be achieved, for instance, by labeling a DNA barcode with 2-pyridinecarboxaldehyde and reacting it with the N-terminus of a peptide. Internal residues can also be labeled, e.g., glutamate with amine-conjugated DNA barcode molecules, or lysine with carboxylated DNA barcodes that react with its primary amine.
[00270] An individual peptide may be barcoded at multiple locations. A peptide may be labeled at multiple sites with the same or different barcode sequences. For example, a peptide may be partitioned into a partition comprising a plurality of identical barcode molecules that comprise a barcode sequence that is unique to the partition. The peptide may be labeled at a single site or at multiple sites with the unique partition barcode sequence, optionally each comprising a unique molecular identifier (UMI), such that subsequent downstream analysis (e.g., sequencing) may be attributed to the same peptide using the barcode sequence. In some instances, a terminus of the peptide (e.g., N-terminus or C-terminus) or an internal amino acid may be labeled with a barcode. In some instances, the peptide may be fragmented prior to analysis or sequencing; accordingly, upstream attachment of multiple identical barcode molecules to the same peptide may allow for attribution of the sequence analysis back to a single peptide. Barcoding of peptides may occur prior to, during, or subsequent to fragmentation. Peptides may be labeled with barcodes (e.g., nucleic acid barcode molecules) using any suitable chemistry, e.g., as described above, or using bifunctional or trifunctional linkers comprising multiple linking moieties, e.g., as described elsewhere herein, such as click chemistry moieties, NHS-esters, EDC, etc. For example, C-terminal attachment may comprise amide coupling to the C-terminal carboxylic group or photoredox tagging of the C-terminal carboxylic group (e.g., to add an electrophile tag). N-terminal attachment may comprise amide coupling to the N-terminal amine group, where specific attachment can occur at low pH, or the use of 2-pyridinecarboxaldehyde variants for specific attachment to the N-terminus.
Internal attachment may comprise, for example, amide coupling using EDC/NHS chemistry or DMT-MM to glutamate or aspartate; alkylation or disulfide bridge labeling of cysteines; or amide coupling to lysine residues.
[00271] In some examples, a peptide may be labeled with different barcode molecules, which can be indexed by proximity to one another, e.g., using primers that can anneal to adjacent barcode molecules. In one such approach, after a protein has been labeled with a plurality of barcodes with different barcode sequences, proximity-based polymerase extension may be used to copy and associate the sequences of adjacent barcodes. For example, each barcode molecule may comprise a primer binding site, to which a dual-primer linker sequence comprising two sequences is annealed. The dual-primer linker sequence can bind to the primer binding sites of two adjacent barcodes. An extension reaction, e.g., using a polymerase, may extend and copy the barcode sequences of the adjacent barcodes. Subsequently, the dual-primer linker sequence, which now has copies of the two adjacent barcodes, may be removed and sequenced. From the sequencing reads, an adjacency matrix of barcode sequences may be generated (e.g., by recording barcode sequences found on a single dual-primer linker as spatially adjacent). Accordingly, each of the barcode sequences may be associated with nearby adjacent barcode sequences, and as such, peptide portions may be aligned or attributed as being adjacent. Such an approach may be useful in instances where the peptide is fragmented, such that individual fragments of a peptide may be matched with their nearest neighbors using the barcode sequences.
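The adjacency-matrix step can be sketched as follows, assuming each sequencing read of a dual-primer linker has already been parsed into the pair of barcode sequences it copied. Ordering the barcodes along a single chain additionally assumes a linear peptide in which each barcode has at most two neighbors. The function names and example barcodes are hypothetical.

```python
from collections import defaultdict


def adjacency_from_reads(reads):
    """Build an undirected adjacency map from (barcode_a, barcode_b) read pairs."""
    adj = defaultdict(set)
    for a, b in reads:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def adjacency_matrix(adj):
    """Return (sorted barcode labels, symmetric 0/1 adjacency matrix)."""
    labels = sorted(adj)
    index = {bc: i for i, bc in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, neighbors in adj.items():
        for b in neighbors:
            matrix[index[a]][index[b]] = 1
    return labels, matrix


def order_chain(adj):
    """Walk the chain that starts at the alphabetically first end barcode.

    Raises ValueError if no end exists or a barcode has more than two neighbors.
    """
    ends = sorted(bc for bc, nb in adj.items() if len(nb) == 1)
    if not ends:
        raise ValueError("no chain end found")
    order, prev, cur = [ends[0]], None, ends[0]
    while True:
        nxt = [n for n in adj[cur] if n != prev]
        if len(nxt) > 1:
            raise ValueError(f"barcode {cur} has more than two neighbors")
        if not nxt:
            return order
        prev, cur = cur, nxt[0]
        order.append(cur)


reads = [("AAC", "GTT"), ("GTT", "CCA"), ("CCA", "TGA"), ("GTT", "AAC")]
adj = adjacency_from_reads(reads)
print(adjacency_matrix(adj))
print(order_chain(adj))
```

Duplicate reads of the same pair (here, `("GTT", "AAC")`) collapse into one edge because neighbors are stored as sets. The recovered order gives the relative placement of the corresponding peptide fragments.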
[00272] In another example, a peptide may be barcoded at multiple locations using bridge amplification. In such an approach, a peptide or protein may be labeled at multiple sites with a nucleic acid primer. A nucleic acid barcode molecule may be provided, which can anneal to the nucleic acid primer (not shown) or be ligated to the nucleic acid primer. Subsequent rounds of bridge amplification may be performed in order to copy the nucleic acid barcode molecule to the other primers located at other sites of the given peptide. In some examples, a peptide may be tagged with multiple copies of the nucleic acid primer, and barcode sequences may be provided sparsely, such that only one nucleic acid primer per peptide is extended by polymerase extension. Subsequent rounds of bridge amplification can result in a peptide having the same barcode sequence at each nucleic acid primer. The peptides may then be fragmented, such that peptide fragments comprise, on average, a single barcode. Accordingly, in some cases, the output of such an amplification approach may be peptides with individual barcodes, generated by fragmenting multi-labeled proteins, in which peptides from the same protein have the same barcode.
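Once fragments have been read out together with their barcodes, attributing them to parent proteins is a grouping step: fragments that share a barcode sequence are assigned to the same protein. The sketch below assumes the reads have already been parsed into (barcode, fragment) pairs. The input format, example sequences, and function name are hypothetical.

```python
from collections import defaultdict


def group_by_barcode(fragment_reads):
    """Group (barcode, fragment_sequence) pairs so each barcode maps to one parent protein."""
    groups = defaultdict(list)
    for barcode, fragment in fragment_reads:
        groups[barcode].append(fragment)
    return dict(groups)


reads = [
    ("ACGTAC", "PEPTIDEK"),
    ("TTGCAA", "LLSAGR"),
    ("ACGTAC", "WVQDK"),
]
for barcode, fragments in group_by_barcode(reads).items():
    print(barcode, fragments)
```

This grouping relies on the sparse barcode provision described above. If two primers on one protein were extended with different barcodes, that protein's fragments would be split across two groups.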
[00273] A sample of cells may be partitioned into individual partitions or compartments (e.g., droplets, microwells) such that at least a subset of the partitions comprise a single cell. The partitions may then be treated with a lysing agent to lyse the cells and release the proteins from the cells into the partition. The proteins may then be labeled with a partition-specific barcode (e.g., using a barcode bead), such that all peptides or proteins arising from a single compartment comprise the same barcode. In some examples, the barcodes comprise nucleic acid barcode molecules, and the barcode sequence can be used in downstream processing, e.g., read by sequencing to identify the partition or cell from which a peptide originated. The nucleic acid barcode molecule may comprise any additional useful sequences, e.g., UMIs, primer sequences, etc.
[00274] Bulk Processing: A biological sample may be processed in bulk. For example, a biological sample may be processed to obtain a suspension of cells, which may be directly lysed in the suspension, without partitioning of cells in individual compartments. Cells may be lysed in bulk using any useful approach, e.g., as described above, and optionally subjected to further processing, e.g., homogenization, protease inhibition, denaturation, protein processing (e.g., chemical treatment, fragmentation), or a combination thereof. A biological sample may be subjected to pre-processing prior to cell lysis or protein extraction. Such pre-processing may include removal of debris, purification, filtration, concentration, or sorting.
[00275] Spatial barcoding: A biological sample may comprise a tissue sample comprising multiple cells. Tissue samples may be processed using an approach that retains spatial information (e.g., to identify peptides from individual cells), e.g., using spatial barcodes. For instance, a 2-D or 3-D tissue sample may be provided, and individual cells or locations within the tissue sample may be contacted with a plurality of spatial barcodes (e.g., nucleic acid barcode molecules) comprising different barcode sequences. The different barcode sequences may be attributed to particular locations in the 2-D or 3-D tissue sample, which may correspond with the locations of cells. For example, spatial barcodes may be provided using deterministic methods such as two-photon patterning, or stochastic methods such as PCR, to assign unique spatial barcodes to different segments of the 2-D or 3-D tissue sample. Accordingly, peptides that are labeled with spatial barcodes may be attributed back to a single location within a tissue sample, or back to a single cell.
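For deterministic patterning, one simple way to make barcodes attributable to locations is to derive each barcode from its grid coordinates. The scheme below, which encodes a grid cell's linear index in base 4 over A, C, G, T, is a hypothetical illustration rather than the method of this disclosure. Practical barcode sets are usually also designed for error tolerance (e.g., a minimum Hamming distance between barcodes), which this sketch ignores.

```python
from itertools import product
from typing import Dict, Optional, Tuple

BASES = "ACGT"


def grid_barcodes(rows: int, cols: int, length: int) -> Dict[str, Tuple[int, int]]:
    """Hypothetical scheme: barcode = fixed-length base-4 encoding of the cell's linear index."""
    if 4 ** length < rows * cols:
        raise ValueError("barcode length too short for this grid")
    mapping = {}
    for r, c in product(range(rows), range(cols)):
        idx, digits = r * cols + c, []
        for _ in range(length):
            idx, d = divmod(idx, 4)
            digits.append(BASES[d])
        mapping["".join(reversed(digits))] = (r, c)
    return mapping


def locate(peptide_barcodes: Dict[str, str],
           mapping: Dict[str, Tuple[int, int]]) -> Dict[str, Optional[Tuple[int, int]]]:
    """Attribute each labeled peptide to a grid location (None if the barcode is unknown)."""
    return {pid: mapping.get(bc) for pid, bc in peptide_barcodes.items()}


barcode_map = grid_barcodes(rows=2, cols=3, length=2)
print(locate({"peptide_1": "AT", "peptide_2": "GG"}, barcode_map))
```

Unknown barcodes map to `None` rather than raising an error, so peptides carrying barcodes outside the patterned set can be counted separately.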
Computer systems
[00276] The present disclosure provides computer systems that are programmed to implement methods of the disclosure. FIG. 3 shows a computer system 301 that is programmed or otherwise configured to sequence a polymeric analyte, e.g., a peptide. The computer system 301 can regulate various aspects of imaging or image capture of the present disclosure, such as, for example, obtaining image data from a modified amino acid or a stacked plurality of modified amino acids, and/or processing the image data. The computer system 301 can be an electronic device of a user or
a computer system that is remotely located with respect to the electronic device. The electronic device can be a mobile electronic device.
[00277] The computer system 301 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 305, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system 301 also includes memory or memory location 310 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 315 (e.g., hard disk), communication interface 320 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 325, such as cache, other memory, data storage and/or electronic display adapters. The memory 310, storage unit 315, interface 320 and peripheral devices 325 are in communication with the CPU 305 through a communication bus (solid lines), such as a motherboard. The storage unit 315 can be a data storage unit (or data repository) for storing data. The computer system 301 can be operatively coupled to a computer network (“network”) 340 with the aid of the communication interface 320. The network 340 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network 340 in some cases is a telecommunication and/or data network. The network 340 can include one or more computer servers, which can enable distributed computing, such as cloud computing. The network 340, in some cases with the aid of the computer system 301, can implement a peer-to-peer network, which may enable devices coupled to the computer system 301 to behave as a client or a server.
[00278] The CPU 305 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 310. The instructions can be directed to the CPU 305, which can subsequently program or otherwise configure the CPU 305 to implement methods of the present disclosure. Examples of operations performed by the CPU 305 can include fetch, decode, execute, and writeback.
[00279] The CPU 305 can be part of a circuit, such as an integrated circuit. One or more other components of the system 301 can be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC).
[00280] The storage unit 315 can store files, such as drivers, libraries and saved programs. The storage unit 315 can store user data, e.g., user preferences and user programs. The computer system 301 in some cases can include one or more additional data storage units that are external to the computer system 301, such as located on a remote server that is in communication with the computer system 301 through an intranet or the Internet.
[00281] The computer system 301 can communicate with one or more remote computer systems through the network 340. For instance, the computer system 301 can communicate with a remote computer system of a user. Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, smartphones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user can access the computer system 301 via the network 340.
[00282] Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 301, such as, for example, on the memory 310 or electronic storage unit 315. The machine-executable or machine-readable code can be provided in the form of software. During use, the code can be executed by the processor 305. In some cases, the code can be retrieved from the storage unit 315 and stored on the memory 310 for ready access by the processor 305. In some situations, the electronic storage unit 315 can be omitted, and machine-executable instructions are stored on memory 310.
[00283] The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
[00284] Aspects of the systems and methods provided herein, such as the computer system 301, can be embodied in programming. Various aspects of the technology may be thought of as “products” or “articles of manufacture”, typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine-readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.
[00285] Hence, a machine-readable medium, such as computer-executable code, may take many forms, including but not limited to a tangible storage medium, a carrier wave medium, or a physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
[00286] The computer system 301 can include or be in communication with an electronic display 335 that comprises a user interface (UI) 340 for providing, for example, visual representation of the images or detection of the detectable labels described herein. Examples of UIs include, without limitation, a graphical user interface (GUI) and web-based user interface.
[00287] Methods and systems of the present disclosure can be implemented by way of one or more algorithms. An algorithm can be implemented by way of software upon execution by the central processing unit 305. The algorithm can, for example, optimize parameters for imaging, process image data, display various aspects of the image data, or provide a readout of the detected label.
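As an illustration of the kind of image-processing step referred to in [00287], the following Python sketch estimates the sub-pixel position of a single fluorescent spot by an intensity-weighted centroid. This is a minimal sketch, not the method of the disclosure: the helper name `localize_centroid`, the use of the minimum pixel value as the background estimate, and the pixel size are assumptions, and practical dSTORM pipelines typically fit a 2D Gaussian point-spread-function model instead.

```python
from typing import List, Tuple


def localize_centroid(roi: List[List[float]], pixel_size_nm: float) -> Tuple[float, float]:
    """Return the (x, y) position in nm of one spot in a small region of interest.

    Hypothetical helper: subtracts the minimum pixel value as a crude background
    estimate, then computes the intensity-weighted centroid of the remaining signal.
    Pixel centers are taken to lie at integer indices.
    """
    background = min(min(row) for row in roi)
    total = sx = sy = 0.0
    for y, row in enumerate(roi):
        for x, value in enumerate(row):
            weight = value - background  # background-subtracted weight, never negative
            total += weight
            sx += weight * x
            sy += weight * y
    if total == 0:
        raise ValueError("region of interest contains no signal above background")
    # Convert the centroid from pixel units to nanometers.
    return (sx / total * pixel_size_nm, sy / total * pixel_size_nm)
```

For example, `localize_centroid([[0, 0, 0], [0, 2, 2], [0, 0, 0]], 100.0)` returns `(150.0, 100.0)`: the signal is split evenly between two adjacent pixels, so the centroid falls halfway between them.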
Examples
Example 1 - Super-Resolution Imaging of a Modified Amino Acid
[00288] As described herein, a method of characterizing a modified amino acid may comprise providing the modified amino acid comprising a polymerizable molecule, contacting the modified amino acid with a binding agent that comprises a detectable label, and detecting the detectable label. In some instances, the detection is performed using super-resolution imaging. To demonstrate the feasibility of this approach, a modified amino acid (phenylalanine) is generated and coupled to a substrate comprising a nucleic acid capture moiety. The substrate-bound modified phenylalanine is contacted with an antibody comprising a fluorophore. The antibody is specific to the modified phenylalanine. Detection is performed using super-resolution imaging.
[00289] Preparation of substrates: The glass substrates were rigorously cleaned by detergent treatment to eliminate all organic residues from the surface, followed by etching with potassium hydroxide to increase surface area. The substrates were then thoroughly rinsed with deionized water and methanol before plasma cleaning. Oxygen plasma cleaning was used to activate the glass surface for silanization. (3-Aminopropyl)triethoxysilane (APTES, 0.5 mL) was uniformly coated onto the substrates by vapor deposition to facilitate polymerization. To construct a brush-structured polymer layer that minimizes non-specific binding of oligonucleotides and antibodies, polyethylene glycol with a molecular weight of 5000 (FMOC-NH-PEG-NHS ester and mPEG-NHS ester at a 1:40 ratio) was used. After polymerization, the substrate was treated with 10% ammonium hydroxide to remove FMOC and was subsequently functionalized with the desired functional groups, such as alkene, azide, DBCO, or BCN, in preparation for subsequent single-molecule imaging experiments.
[00290] Next, the azide-functionalized substrates were conjugated with DNA capture moieties using click chemistry. The DNA capture moieties were assembled by first attaching DBCO-functionalized, single-stranded DNA molecules at an 11 micromolar concentration to the azide-functionalized substrate and then annealing an additional single-stranded DNA molecule (10 micromolar concentration) at 85 degrees for 20 minutes, thereby generating a partially double-stranded DNA capture moiety that is covalently linked to the substrate (via the azide-DBCO reaction). The single-stranded portion of the partially double-stranded DNA capture moiety allows for coupling of the modified amino acid.
[00291] FIG. 4 shows example data of a substrate described herein comprising DNA capture moieties 405, generated using the above-described process. FIG. 4 Panel A shows a schematic of the substrate, which comprises a PEG brush layer and is functionalized with partially double-stranded DNA capture moieties 405. 30% of the conjugated, partially double-stranded DNA capture moieties are labeled with a fluorophore (TMR) and imaged using super-resolution microscopy (ONI Nanoimager). FIG. 4 Panel B shows a fluorescence micrograph of spatially resolvable DNA capture moieties (single dots). The scale bar indicates 100 nm. FIG. 4 Panel C shows another field of view of the substrate comprising the spatially resolved, partially double-stranded DNA capture moieties. The line histogram shows three selected individual partially double-stranded DNA capture moieties that are spaced at a pitch of approximately 60 nm. The scale bar represents 0.4 microns.
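The pitch of approximately 60 nm read from the line histogram in FIG. 4 Panel C is well below the roughly 200 to 250 nm diffraction limit of conventional fluorescence microscopy, which is why super-resolution imaging is needed to resolve neighboring capture moieties. The sketch below shows one simple way such a pitch could be estimated from a sampled line profile: locate local maxima above a threshold and average the gaps between consecutive maxima. The function names, sampling step, and threshold are hypothetical, and the profile in the usage note is synthetic, not data from FIG. 4.

```python
from typing import List


def peak_positions(profile: List[float], step_nm: float, threshold: float) -> List[float]:
    """Positions (nm) of local maxima in a 1D intensity profile at or above threshold.

    A sample counts as a peak if it is strictly greater than its left neighbor and
    at least as large as its right neighbor, so a flat-topped peak is counted once.
    """
    peaks = []
    for i in range(1, len(profile) - 1):
        if (profile[i] >= threshold
                and profile[i] > profile[i - 1]
                and profile[i] >= profile[i + 1]):
            peaks.append(i * step_nm)
    return peaks


def mean_pitch(peaks: List[float]) -> float:
    """Average center-to-center spacing between consecutive peaks."""
    if len(peaks) < 2:
        raise ValueError("need at least two peaks to estimate a pitch")
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return sum(gaps) / len(gaps)
```

With a synthetic profile sampled every 10 nm that has maxima at 20, 80, and 140 nm, `mean_pitch(peak_positions(profile, 10.0, 5.0))` returns `60.0`.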
[00292] Preparation of Modified Amino Acids: Modified amino acids may be prepared from a peptide, as described herein. In one example, a linker comprising an azide moiety and a phenylisothiocyanate (isothiocyanatobenzene) group is reacted with a peptide under mildly alkaline conditions, under which the phenylisothiocyanate reacts with the N-terminal amino acid (NTAA) of the peptide to generate an amino acid-linker complex. Prior to, during, or subsequent to conjugation of the linker to the NTAA, the linker is reacted with a DBCO-conjugated linking nucleic acid molecule (linking DNA molecule). The reaction of the linker with the NTAA and the linking DNA molecule forms an amino acid-linker complex, which may then be coupled to the capture moiety via direct hybridization or via a splint molecule. The NTAA is then cleaved from the peptide (e.g., using acidic cleavage), thereby providing a modified amino acid that comprises a derivatized amino acid, the linker, and the linking DNA molecule that is coupled to the capture moiety. The modified amino acid may then be detected using one or more binding agents.
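The cycle described above (tag the NTAA with the linker and a linking DNA molecule, then cleave it to release a modified amino acid) can be pictured as simple bookkeeping. The Python sketch below is a conceptual data model only and does not represent the chemistry; the class `ModifiedAminoAcid`, the linker label `"azide-PITC"`, and the `linkDNA-<n>` identifiers are hypothetical names. It makes one point explicit: repeating the cycle releases residues in N- to C-terminal order.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ModifiedAminoAcid:
    residue: str      # one-letter code of the cleaved N-terminal amino acid
    linker: str       # hypothetical label for the azide/phenylisothiocyanate linker
    linking_dna: str  # hypothetical identifier of the DBCO-conjugated linking DNA


def cleave_cycle(peptide: str, linking_dna: str) -> Tuple[ModifiedAminoAcid, str]:
    """One conceptual cycle: tag the NTAA, then cleave it from the peptide."""
    if not peptide:
        raise ValueError("no N-terminal amino acid left to cleave")
    ntaa, remainder = peptide[0], peptide[1:]
    return ModifiedAminoAcid(ntaa, "azide-PITC", linking_dna), remainder


def degrade(peptide: str, cycles: int) -> List[ModifiedAminoAcid]:
    """Repeat the cycle, collecting modified amino acids in N- to C-terminal order."""
    products = []
    for i in range(min(cycles, len(peptide))):
        product, peptide = cleave_cycle(peptide, f"linkDNA-{i}")
        products.append(product)
    return products
```

For example, `[m.residue for m in degrade("FVAG", 3)]` gives `["F", "V", "A"]`, leaving the C-terminal G unread.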
[00293] As proof of concept of the feasibility of the method, a synthetic modified amino acid is generated. The synthetic modified amino acid comprises a Cy5-labeled DNA molecule (an example of a polymerizable molecule that is a linking nucleic acid) that is conjugated to a derivatized amino acid, which is a phenylthiocarbamoyl derivative of phenylalanine (simulating the product of acidic cleavage of the amino acid-linker complex from the peptide). For simplicity, the cleaved phenylthiocarbamoyl phenylalanine derivative that comprises the linker is termed “ClickP-Phe” herein, and the synthetic modified amino acid comprises ClickP-Phe and the Cy5-labeled DNA molecule. The synthetic modified amino acid is then coupled to the substrate at a 10 nM concentration for 1 hour by hybridizing the Cy5-labeled DNA molecule to the DNA capture moiety. The substrate is then washed in a Tris buffer comprising 15x Tris-EDTA, 1.25 M sodium chloride, and water.
[00294] Antibody Binding Agents: A custom anti-phenylalanine (“Anti-Phe”) antibody is generated using mouse hybridoma technology, as described in U.S. Prov. Pat. App. No. 63/584,382, filed on September 21, 2023, which is incorporated by reference herein in its entirety. The Anti-Phe antibody specifically recognizes a portion of the synthetic modified amino acid, ClickP-Phe, and does not bind to native phenylalanine in a peptide. Anti-Phe (5 nM) is prepared, applied to the substrate, and allowed to bind to the antigen, the synthetic modified amino acid, for 5 minutes; the substrate is then washed with deionized water, followed by Tris buffer. All antibody binding reactions occur at ambient temperature.
[00295] FIG. 5 shows example data of detection of the synthetic modified amino acid. FIG. 5 Panel A shows a schematic of the substrate described above, which comprises a PEG brush layer and is functionalized with partially double-stranded DNA capture moieties 505. The synthetic modified amino acids 531, each comprising ClickP-Phe and a Cy5-labeled DNA molecule 533, are coupled to the capture moieties 505 by hybridizing the Cy5-labeled DNA molecule 533 to a partially double-stranded DNA capture moiety 505. An Anti-Phe antibody that is fluorescently labeled (with TagRFP) is provided and allowed to bind to the ClickP-Phe. The fluorescence is then detected using super-resolution microscopy. FIG. 5 Panel B shows a fluorescence micrograph of spatially resolvable synthetic modified amino acids 531 comprising ClickP-Phe. FIG. 5 Panel C shows a negative control sample in which no synthetic modified amino acid was provided. The absence of fluorescence indicates that the antibody does not bind to the substrate or capture moiety in the absence of the antigen (the synthetic modified amino acid). FIG. 5 Panel D shows a fluorescence micrograph acquired after the substrate shown in FIG. 5 Panel C was washed, the antigen (the synthetic modified amino acid) was added, and the Anti-Phe antibody was added again. The fluorescent dots indicate binding of the Anti-Phe antibody to the antigen.
[00296] FIG. 6 shows additional example data of detection of the synthetic modified amino acid using super-resolution imaging. The micrographs (left) show two fluorescence channels that indicate the modified amino acid (“ClickP-Phe”) and the antibody (“Anti-Phe”). The center image shows the overlay of the two fluorescence channels. The right-hand image shows two selected fluorescent dots in both channels, representing colocalization of the antibody (Anti-Phe) and the synthetic modified amino acid (comprising ClickP-Phe).
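Colocalization such as that shown in FIG. 6 is commonly quantified by pairing localizations from the two channels that fall within a distance threshold. The sketch below uses greedy one-to-one nearest-neighbor matching (closest pairs are accepted first, and each spot is used at most once). The function name `colocalize`, the threshold value, and the assumption that both channels are already registered to a common coordinate frame in nanometers are illustrative choices, not details from the disclosure.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def colocalize(aa_spots: List[Point],
               ab_spots: List[Point],
               max_dist_nm: float) -> List[Tuple[int, int, float]]:
    """Greedy one-to-one pairing of spots from two channels within max_dist_nm.

    Returns (aa_index, ab_index, distance_nm) triples, closest pairs first.
    """
    # Collect every cross-channel pair that is close enough to count.
    candidates = []
    for i, a in enumerate(aa_spots):
        for j, b in enumerate(ab_spots):
            d = math.dist(a, b)
            if d <= max_dist_nm:
                candidates.append((d, i, j))
    candidates.sort()
    used_aa, used_ab, pairs = set(), set(), []
    for d, i, j in candidates:
        if i not in used_aa and j not in used_ab:
            used_aa.add(i)
            used_ab.add(j)
            pairs.append((i, j, d))
    return pairs
```

For example, `colocalize([(0, 0), (1000, 0)], [(30, 40), (5000, 5000)], 100.0)` returns `[(0, 0, 50.0)]`: only the first amino acid spot has an antibody spot within 100 nm.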
[00297] FIG. 7 shows additional example data of detection of the modified amino acid using super-resolution imaging. The micrographs (left and center) show imaging of a substrate with the synthetic modified amino acid (comprising ClickP-Phe) and the TagRFP-labeled antibody (Anti-Phe), respectively. The right plot shows the fluorescence distribution of a selected modified amino acid-antibody complex as a function of distance in the X-Y, X-Z, and Y-Z dimensions, with the fluorescence channels for the antibody and the modified amino acid overlaid (merged). The offset distance between the antibody and the modified amino acid is representative of the ~200 nm distance between the fluorophore of the antibody and the binding site of the antibody. Altogether, these results suggest that the antibody recognizes and binds to its target, and that this binding can be visualized and detected using super-resolution imaging.
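The offset read from the merged plot in FIG. 7 can be summarized as the Euclidean distance between the centroids of the two channels' localizations for a single complex. A minimal sketch follows, assuming 3D localizations in nanometers and channels that have already been corrected for chromatic offset; the function names `centroid` and `channel_offset_nm` are hypothetical.

```python
import math
from statistics import fmean
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def centroid(points: List[Point3]) -> Point3:
    """Mean (x, y, z) of a cloud of localizations from one channel."""
    if not points:
        raise ValueError("channel has no localizations")
    return (fmean(p[0] for p in points),
            fmean(p[1] for p in points),
            fmean(p[2] for p in points))


def channel_offset_nm(aa_locs: List[Point3], ab_locs: List[Point3]) -> float:
    """3D distance between the modified-amino-acid and antibody channel centroids."""
    return math.dist(centroid(aa_locs), centroid(ab_locs))
```

For example, localizations at (0, 0, 0) and (2, 0, 0) in one channel and a single localization at (1, 3, 4) in the other give an offset of `5.0`.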
Example 2 - Immobilization of Modified Amino Acids
[00298] As described herein, a modified amino acid may be provided, coupled to a substrate, and detected using one or more binding agents.
[00299] FIG. 8 schematically shows an example detection method described herein. A modified amino acid derived from a peptide is provided, e.g., as described above. The modified amino acid (also referred to herein as “ClickP-amino acid” or “ClickP-AA”) comprises a linker and a linking DNA molecule.
[00300] Two sets of DNA oligos are ordered from Integrated DNA Technologies (IDT): the first set comprises a DNA oligo (“first DNA oligo”) that is used for immobilization onto a glass substrate, as described above, and the second set comprises the linking DNA molecule that is used for generating the modified amino acid. The first DNA oligos comprise a DBCO moiety and are immobilized onto a glass substrate comprising azide moieties, as described above, thereby generating a “carpet” of first DNA oligos that act as anchor oligos. The first DNA oligo may be single-stranded or partially double-stranded to enable sticky-end ligation of the linking DNA molecule, e.g., subsequent to attaching the linking DNA molecule to an NTAA and cleaving the NTAA from the peptide to generate the modified amino acid (ClickP-AA), as described above.
[00301] The linking DNA molecule of a modified amino acid is annealed to an anchor oligo (a first DNA oligo), either directly or via a splint molecule. Ligation of the anchor oligo to the linking DNA molecule of the modified amino acid can be performed using a ligase, e.g., New England Biolabs Quick Ligase. Subsequent to ligation, a fluorescently-labeled antibody is provided. The fluorescently-labeled antibody is specific to a particular amino acid type (e.g., an anti-phenylalanine antibody, an anti-valine antibody, etc.) comprised by the modified amino acid. The antibody is incubated with the substrate and allowed to bind to the modified amino acid, and the substrate is then washed to remove unbound antibody. The fluorescence is measured using fluorescence microscopy, from which the identity of the modified amino acid (e.g., phenylalanine, valine, etc.) can be determined.
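The readout step above can be pictured as recording which amino acid-specific antibody produced signal at a given spot. The sketch below assumes, hypothetically, that antibodies for different amino acid types carry spectrally distinct labels read in separate channels; the `PANEL` mapping, including the Cy3-labeled anti-valine entry, is invented for illustration (the example above uses a single TagRFP-labeled Anti-Phe antibody). A spot is called only when exactly one channel meets the threshold; otherwise it is left uncalled.

```python
from typing import Dict, Optional

# Hypothetical panel mapping each detection channel to the amino acid type
# recognized by the antibody carrying that label.
PANEL: Dict[str, str] = {"TagRFP": "Phe", "Cy3": "Val"}


def call_identity(channel_signal: Dict[str, float],
                  threshold: float,
                  panel: Dict[str, str] = PANEL) -> Optional[str]:
    """Return the amino acid type for one spot, or None if absent or ambiguous."""
    positive = [panel[ch] for ch, signal in channel_signal.items()
                if ch in panel and signal >= threshold]
    if len(positive) == 1:
        return positive[0]
    return None  # no channel above threshold, or more than one (ambiguous)
```

For example, `call_identity({"TagRFP": 120.0, "Cy3": 5.0}, 50.0)` returns `"Phe"`, while a spot bright in both channels returns `None`.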
[00302] FIG. 9 schematically illustrates another example of a detection method. A stacked plurality of modified amino acids derived from a peptide is provided, e.g., as described elsewhere herein. The stacked plurality of modified amino acids comprises individual modified amino acids (also referred to herein as “ClickP-amino acid” or “ClickP-AA”) that are linked together. The individual modified amino acids each comprise a linker and a linking DNA molecule; the linking DNA molecules are coupled together to generate the stacked plurality of modified amino acids.
[00303] Two sets of DNA oligos are ordered from Integrated DNA Technologies (IDT): the first set comprises a DNA oligo (“first DNA oligo”) that is used for immobilization onto a glass substrate, and the second set comprises the linking DNA molecule that is used for generating the modified amino acid. The first set of DNA oligos comprises a biotin moiety that is capable of being immobilized onto a glass substrate that is coated with streptavidin (anchoring molecule).
[00304] The stacked plurality of modified amino acids is generated, as described elsewhere herein, using a linker and the second set of DNA molecules (linking DNA molecules). The linking DNA molecules can then anneal to the first DNA oligo (comprising the biotin moiety) to generate a double-stranded DNA molecule. Alternatively, or in addition, multiple DNA molecules may be used to generate the double-stranded DNA molecule comprising the biotin moiety. The double-stranded DNA molecule is then immobilized to a streptavidin-coated glass substrate via biotin-streptavidin interaction. Subsequent to immobilization, the modified amino acids are detected using fluorescently-labeled antibodies. The fluorescently-labeled antibodies may be specific to a particular amino acid type (e.g., an anti-phenylalanine antibody, an anti-valine antibody, etc.) comprised by one or more modified amino acids of the stacked plurality of modified amino acids. The antibodies are incubated with the substrate and allowed to bind to the modified amino acids, and the substrate is then washed to remove unbound antibody. The fluorescence is measured using fluorescence microscopy, from which the identities of the modified amino acids (e.g., phenylalanine, valine, etc.) within the stacked plurality of modified amino acids (or plurality of stacked pluralities of modified amino acids) can be determined.
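Because the linking DNA molecules space the modified amino acids apart, their order along a linearized backbone can in principle be read from their positions. Using the approximate helical rise of B-form double-stranded DNA (about 0.34 nm per base pair), a 150 bp spacer spans roughly 51 nm, which is of the same order as the 50 nm spacing and 150-nucleotide length recited in the claims. The sketch below computes that spacing and orders called residues by projecting their positions onto a backbone axis. The axis origin and direction, and the assumption that the first-cleaved residue lies nearest the origin, are hypothetical; the disclosure does not specify how orientation is established.

```python
import math
from typing import List, Tuple

# Approximate rise per base pair of B-form double-stranded DNA (nm).
RISE_NM_PER_BP = 0.34


def spacer_length_nm(bp: int) -> float:
    """Contour length of a fully extended B-form dsDNA spacer of `bp` base pairs."""
    return bp * RISE_NM_PER_BP


def read_stack(spots: List[Tuple[float, float, str]],
               origin: Tuple[float, float],
               direction: Tuple[float, float]) -> str:
    """Order called residues (x_nm, y_nm, one-letter code) along the backbone axis.

    Hypothetical helper: assumes the backbone is linearized along `direction`
    starting at `origin`, with the first-cleaved residue nearest the origin.
    """
    dx, dy = direction
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("direction must be a non-zero vector")
    ux, uy = dx / norm, dy / norm
    ox, oy = origin
    # Sort by the scalar projection of each spot onto the unit axis vector.
    ordered = sorted(spots, key=lambda s: (s[0] - ox) * ux + (s[1] - oy) * uy)
    return "".join(code for _, _, code in ordered)
```

For example, `read_stack([(102.0, 0.0, "V"), (0.0, 0.0, "F"), (51.0, 0.0, "A")], (0.0, 0.0), (1.0, 0.0))` returns `"FAV"`.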
[00305] FIG. 10 schematically illustrates yet another example of a detection method. A stacked plurality of modified amino acids derived from a peptide is provided, e.g., as described elsewhere herein. The stacked plurality of modified amino acids comprises individual modified amino acids (also referred to herein as “ClickP-amino acid” or “ClickP-AA”) that are linked together along a circular DNA backbone. The individual modified amino acids each comprise a linker and a linking DNA molecule; the linking DNA molecules are coupled together to generate the circularized stacked plurality of modified amino acids.
[00306] Two sets of DNA oligos are ordered from Integrated DNA Technologies (IDT); the first set comprises a DNA oligo (“first DNA oligo”) that is used for immobilization onto a glass substrate, and the second set comprises the linking DNA molecule that is used for generating the modified amino acid. The first set of DNA oligos comprises a biotin moiety that is capable of being immobilized onto a glass substrate that is coated with streptavidin (anchoring molecule).
[00307] The stacked plurality of modified amino acids is generated, as described elsewhere herein, using a linker and the second set of DNA molecules (linking DNA molecules). The stacked plurality of modified amino acids may comprise concatenated copies of the linking DNA molecules (e.g., each modified amino acid of the stacked plurality of modified amino acids may comprise a linking DNA molecule). The linking DNA molecules of the modified amino acids can then anneal to the first DNA oligo (comprising the biotin moiety) to generate a double-stranded DNA molecule. Alternatively, or in addition, multiple DNA molecules may be used to generate the double-stranded DNA molecule. The double-stranded DNA molecule may then be circularized and ligated (e.g., using ligase). In some instances, two or more double-stranded DNA molecules may be joined together and circularized via ligation. The double-stranded, circularized stacked plurality of modified amino acids is then immobilized to a streptavidin-coated glass substrate via biotin-streptavidin interaction. Subsequent to immobilization, the modified amino acids are detected using fluorescently-labeled antibodies. The fluorescently-labeled antibodies may be specific to a particular amino acid type (e.g., an anti-phenylalanine antibody, an anti-valine antibody, etc.) comprised by one or more modified amino acids of the stacked plurality of modified amino acids. The antibodies are incubated with the substrate and allowed to bind to the modified amino acids, and the substrate is then washed to remove unbound antibody. The fluorescence is measured using fluorescence microscopy, from which the identities of the modified amino acids (e.g., phenylalanine, valine, etc.) within the stacked plurality of modified amino acids (or plurality of stacked pluralities of modified amino acids) can be determined.
[00308] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
Claims
1. A method for characterizing a modified amino acid, comprising:
(a) providing said modified amino acid, wherein said modified amino acid comprises a polymerizable molecule coupled thereto;
(b) contacting said modified amino acid with a binding agent, wherein said binding agent comprises a detectable label; and
(c) detecting said detectable label, thereby identifying an amino acid type of said modified amino acid.
2. The method of claim 1, wherein said detecting comprises imaging said detectable label.
3. The method of claim 2, wherein said detectable label comprises a fluorophore or quantum dot.
4. The method of claim 1, wherein said detectable label comprises a fluorescent label, a fluorescence resonance energy transfer (FRET) label, a chemiluminescence label, an electrochemiluminescence label, a bioluminescence label, a phosphorescence label, or a label that generates light through other types of reactions or stimulations.
5. The method of claim 2 or 3, wherein said imaging is performed using super-resolution microscopy.
6. The method of claim 5, wherein said super-resolution microscopy is direct stochastic optical reconstruction microscopy (dSTORM).
7. The method of any one of claims 1-6, further comprising, prior to (a), generating said modified amino acid.
8. The method of claim 7, wherein said generating comprises (I) providing a linker and said polymerizable molecule; (II) coupling said linker to (i) an amino acid of a peptide and (ii) said polymerizable molecule to generate an amino acid-linker complex; and (III) cleaving said amino acid, thereby generating said modified amino acid.
9. The method of claim 8, wherein said linker is unifunctional, bifunctional, trifunctional, quadrifunctional, or polyfunctional.
10. The method of claim 8 or 9, further comprising (IV) coupling said amino acid-linker complex or said modified amino acid to a capture moiety.
11. The method of claim 10, wherein said capture moiety is coupled to a substrate.
12. The method of claim 10, wherein said capture moiety is coupled to said peptide.
13. The method of claim 12, wherein said capture moiety is coupled to a C-terminus of said peptide.
14. The method of any one of claims 10-13, wherein (IV) is performed prior to (III).
15. The method of any one of claims 8-14, further comprising derivatizing said modified amino acid.
16. The method of any one of claims 8-15, further comprising repeating (I)-(III) on said peptide.
17. The method of any one of claims 10-16, further comprising repeating (IV) on said peptide.
18. The method of claim 16 or 17, wherein said repeating yields a stacked plurality of modified amino acids comprising a plurality of stacked polymerizable molecules.
19. The method of claim 18, wherein said stacked plurality of modified amino acids comprises a first amino acid and a second amino acid, wherein said first amino acid and said second amino acid are spaced along said plurality of stacked polymerizable molecules at a distance greater than 50 nanometers.
20. The method of any one of claims 1-19, wherein (a) comprises providing a fluidic device, and loading said modified amino acid in said fluidic device.
21. The method of claim 20, wherein said fluidic device is a microfluidic device.
22. The method of claim 20 or 21, further comprising, linearizing said polymerizable molecule in said fluidic device.
23. The method of claim 22, wherein said linearizing is performed using an electric field.
24. The method of claim 22 or 23, wherein said linearizing is performed using shear stress.
25. The method of any one of claims 20-24, further comprising loading said modified amino acid in said fluidic device.
26. The method of claim 25, wherein said fluidic device comprises a surface comprising a plurality of nucleic acid anchor molecules coupled thereto, wherein said plurality of nucleic acid anchor molecules is configured to couple to said modified amino acid.
27. The method of claim 26, wherein said polymerizable molecule comprises a nucleic acid molecule that is configured to hybridize to a nucleic acid anchor molecule of said plurality of nucleic acid anchor molecules.
28. The method of claim 25, wherein said loading comprises loading a plurality of modified amino acids.
29. The method of any one of claims 1-28, wherein said binding agent comprises an antibody, an antibody fragment, a nanobody, an aptamer, a peptide, a polymer, an inorganic compound, or a small molecule.
30. The method of any one of claims 1-29, wherein said polymerizable molecule comprises a DNA molecule.
31. The method of claim 30, wherein said DNA molecule comprises at least 150 nucleotides.
32. The method of claim 30, wherein said DNA molecule has a length of at least 50 nanometers.
33. The method of claim 30, wherein said DNA molecule comprises an adapter sequence.
34. The method of claim 33, wherein said adapter sequence is configured to couple to an anchor sequence of a substrate.
35. A method for characterizing a modified amino acid, comprising:
(a) providing said modified amino acid, wherein said modified amino acid comprises a polymerizable molecule; and
(b) using super-resolution imaging to determine the identity of said modified amino acid.
36. The method of claim 35, wherein (b) is performed using amino acid-specific binding agents.
37. The method of claim 36, wherein said amino acid-specific binding agents comprise a detectable label.
38. The method of claim 37, wherein said detectable label comprises a fluorophore or quantum dot.
39. The method of claim 37, wherein said detectable label comprises a fluorescent label, a FRET label, a chemiluminescence label, an electrochemiluminescence label, a bioluminescence label, a phosphorescence label, or a label that generates light through other types of reactions or stimulations.
40. The method of claim 38, wherein said super-resolution imaging is dSTORM.
41. The method of any one of claims 35-40, further comprising, prior to (a), generating said modified amino acid.
42. The method of claim 41, wherein said generating comprises (I) providing a linker and said polymerizable molecule; (II) coupling said linker to (i) an amino acid of a peptide and (ii) said polymerizable molecule to generate an amino acid-linker complex; and (III) cleaving said amino acid, thereby generating said modified amino acid.
43. The method of claim 42, wherein said linker is unifunctional, bifunctional, trifunctional, quadrifunctional, or polyfunctional.
44. The method of claim 42 or 43, further comprising (IV) coupling said amino acid-linker complex or said modified amino acid to a capture moiety.
45. The method of claim 44, wherein said capture moiety is coupled to a substrate.
46. The method of claim 44, wherein said capture moiety is coupled to said peptide.
47. The method of claim 46, wherein said capture moiety is coupled to a C-terminus of said peptide.
48. The method of any one of claims 44-47, wherein (IV) is performed prior to (III).
49. The method of any one of claims 41-48, further comprising derivatizing said modified amino acid.
50. The method of any one of claims 42-49, further comprising repeating (I)-(III) on said peptide.
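Claims 42, 50 and 51 describe a cycle that is repeated on the same peptide. Below is a minimal sketch of the bookkeeping only; no chemistry is modeled. It assumes, for illustration only, that each cycle cleaves the N-terminal residue while the peptide stays captured at its C-terminus (claim 47), and that each product joins the stack in cycle order. `ModifiedAminoAcid`, `run_cycles`, and the `polymer-N` identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModifiedAminoAcid:
    residue: str          # one-letter amino acid code
    cycle: int            # cycle in which the residue was cleaved
    polymer_segment: str  # hypothetical ID of the attached polymerizable molecule


def run_cycles(peptide: str, n_cycles: int) -> List[ModifiedAminoAcid]:
    """Simulate repeated steps (I)-(III), assuming N-terminal cleavage while the
    peptide remains captured at its C-terminus; products are kept in cycle order."""
    remaining = list(peptide)
    stack: List[ModifiedAminoAcid] = []
    for cycle in range(1, n_cycles + 1):
        if not remaining:
            break  # peptide fully consumed
        residue = remaining.pop(0)  # terminal residue released in step (III)
        stack.append(ModifiedAminoAcid(residue, cycle, f"polymer-{cycle}"))
    return stack


print([m.residue for m in run_cycles("MKTAY", 3)])  # ['M', 'K', 'T']
```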
51. The method of any one of claims 44-50, further comprising repeating (IV) on said peptide.
52. The method of claim 50, wherein said repeating yields a stacked plurality of modified amino acids comprising a plurality of stacked polymerizable molecules.
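Claims 52 and 53 suggest that the modified amino acids occupy distinct positions along the stacked polymerizable molecules. If so, one way to read them out is to project each localization onto the axis of the linearized molecule and sort by that coordinate. The sketch below works under those assumptions. It also assumes that the axis origin and direction are known and that each binding agent's label maps to exactly one residue. The `label_to_residue` table and the coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def read_stack(localizations: List[Tuple[float, float, str]],
               origin: Tuple[float, float],
               direction: Tuple[float, float],
               label_to_residue: Dict[str, str]) -> str:
    """Order labeled localizations along a linearized stack and decode them."""
    dx, dy = direction
    norm = math.hypot(dx, dy)
    ux, uy = dx / norm, dy / norm  # unit vector along the stretched molecule
    projected = []
    for x, y, label in localizations:
        t = (x - origin[0]) * ux + (y - origin[1]) * uy  # position along axis (nm)
        projected.append((t, label_to_residue[label]))
    projected.sort()  # closest to the origin first
    return "".join(res for _, res in projected)


locs = [(120.0, 5.0, "red"), (10.0, -3.0, "green"), (65.0, 2.0, "red")]
print(read_stack(locs, (0.0, 0.0), (1.0, 0.0), {"red": "K", "green": "M"}))  # MKK
```

Reversing `direction` reverses the read order. Which end corresponds to the first cycle therefore has to be fixed by the assay design.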
53. The method of claim 52, wherein said stacked plurality of modified amino acids comprises a first amino acid and a second amino acid, wherein said first amino acid and said second amino acid are spaced along said plurality of stacked polymerizable molecules at a distance greater than 50 nanometers.
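The 50 nm spacing in claim 53 is below the roughly 250 nm diffraction limit of visible-light microscopy but above typical single-molecule localization precision. Below is a rough check. It assumes the simplified precision estimate sigma_loc ~ s / sqrt(N), where s is the PSF standard deviation and N is the number of detected photons, and it ignores background and pixel size. Two emitters count as resolvable when their separation exceeds the FWHM of the localization distribution. The 125 nm PSF width and the photon counts are illustrative values only.

```python
import math

FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))  # ~2.355 for a Gaussian


def localization_precision_nm(psf_sigma_nm: float, photons: float) -> float:
    """Simplified precision estimate s / sqrt(N); background and pixelation ignored."""
    return psf_sigma_nm / math.sqrt(photons)


def resolvable(distance_nm: float, psf_sigma_nm: float, photons: float) -> bool:
    """True if the separation exceeds the FWHM of the localization distribution."""
    fwhm = FWHM_PER_SIGMA * localization_precision_nm(psf_sigma_nm, photons)
    return distance_nm > fwhm


print(resolvable(50.0, psf_sigma_nm=125.0, photons=1000))  # True
print(resolvable(50.0, psf_sigma_nm=125.0, photons=1))     # False
```

With a single photon, the estimate reduces to the PSF width itself (FWHM of about 294 nm here). This mimics the diffraction-limited case, in which 50 nm spacing cannot be resolved.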
54. The method of any one of claims 32-53, wherein (a) comprises providing a fluidic device, and loading said modified amino acid in said fluidic device.
55. The method of claim 54, wherein said fluidic device is a microfluidic device.
56. The method of claim 54 or 55, further comprising linearizing said polymerizable molecule in said fluidic device.
57. The method of claim 56, wherein said linearizing is performed using an electric field.
58. The method of claim 56, wherein said linearizing is performed using shear stress.
59. The method of any one of claims 54-58, further comprising providing a plurality of modified amino acids, including said modified amino acid, and loading said plurality of modified amino acids in said fluidic device.
60. The method of claim 59, wherein said fluidic device comprises a surface comprising a plurality of nucleic acid anchor molecules coupled thereto, wherein said plurality of nucleic acid anchor molecules is configured to couple to said plurality of modified amino acids.
61. The method of claim 60, wherein said polymerizable molecule comprises a nucleic acid molecule that is configured to couple to a nucleic acid anchor molecule of said plurality of nucleic acid anchor molecules via hybridization.
62. The method of any one of claims 35-61, wherein said binding agent comprises an antibody, an antibody fragment, a nanobody, an aptamer, a peptide, a polymer, an inorganic compound, or a small molecule.
63. The method of any one of claims 35-62, wherein said polymerizable molecule comprises a DNA molecule.
64. The method of claim 63, wherein said DNA molecule comprises at least 150 nucleotides.
65. The method of claim 63, wherein said DNA molecule has a length of at least 50 nanometers.
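Claims 64 and 65 (and claims 84 and 85) pair a 150-nucleotide minimum with a 50 nm minimum length. If the DNA is double-stranded B-form DNA, with a rise of about 0.34 nm per base pair, the two figures line up. Single-stranded DNA has a different per-nucleotide contour length, so the `rise_nm_per_bp` default is an assumption. Both function names are hypothetical.

```python
import math


def contour_length_nm(n_bp: int, rise_nm_per_bp: float = 0.34) -> float:
    """Contour length of B-form double-stranded DNA, about 0.34 nm per bp."""
    return n_bp * rise_nm_per_bp


def min_bp_for_length(length_nm: float, rise_nm_per_bp: float = 0.34) -> int:
    """Smallest whole number of base pairs whose contour length reaches length_nm."""
    return math.ceil(length_nm / rise_nm_per_bp)


print(round(contour_length_nm(150), 1))  # 51.0
print(min_bp_for_length(50.0))           # 148
```

Under this assumption, 150 bp span about 51 nm, just over the recited 50 nm.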
66. The method of claim 63, wherein said DNA molecule comprises an adapter sequence.
67. The method of claim 66, wherein said adapter sequence is configured to couple to an anchor sequence of a substrate.
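Claims 61, 66 and 67 couple the polymerizable molecule to a surface anchor by hybridization. Below is a minimal sketch of the sequence-level condition. It assumes perfect Watson-Crick complementarity between an adapter and an anchor of equal length. Real hybridization also depends on length, melting temperature, salt, and tolerated mismatches, none of which is modeled here. The sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def can_hybridize(adapter: str, anchor: str) -> bool:
    """True if the adapter is the exact reverse complement of the anchor."""
    return adapter.upper() == reverse_complement(anchor)


anchor = "AAACCCGT"                 # hypothetical surface anchor sequence
adapter = reverse_complement(anchor)
print(adapter, can_hybridize(adapter, anchor))  # ACGGGTTT True
```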
68. A method for processing a DNA molecule, comprising:
(a) attaching a first sequence of said DNA molecule to a substrate;
(b) linearizing said DNA molecule adjacent to said substrate; and
(c) attaching a second sequence of said DNA molecule to said substrate.
69. The method of claim 68, wherein said linearizing is performed using shear stress or an electric field.
70. The method of claim 68 or 69, wherein said DNA molecule is part of a modified amino acid, wherein said modified amino acid comprises an amino acid or derivative thereof.
71. The method of claim 70, further comprising detecting said modified amino acid.
72. The method of claim 71, wherein said detecting comprises imaging.
73. The method of claim 72, wherein said imaging is performed using super-resolution microscopy.
74. The method of claim 73, wherein said imaging is performed using dSTORM.
75. The method of any one of claims 70-74, further comprising coupling a binding agent to said modified amino acid, wherein said binding agent comprises a detectable label.
76. The method of claim 75, wherein said binding agent comprises an antibody, an antibody fragment, a nanobody, an aptamer, a peptide, a polymer, an inorganic compound, or a small molecule.
77. The method of claim 75 or 76, wherein said detectable label comprises a fluorophore or quantum dot.
78. The method of claim 75 or 76, wherein said detectable label comprises a fluorescent label, a FRET label, a chemiluminescence label, an electrochemiluminescence label, a bioluminescence label, a phosphorescence label, or a label that generates light through other types of reactions or stimulations.
79. The method of any one of claims 68-77, wherein said substrate comprises a surface of a fluidic device.
80. The method of claim 79, wherein said fluidic device is a microfluidic device.
81. The method of claim 79 or 80, further comprising providing a plurality of DNA molecules, including said DNA molecule, and loading said plurality of DNA molecules in said fluidic device.
82. The method of claim 81, wherein said DNA molecule is attached to said substrate using hybridization.
83. The method of claim 81, wherein said plurality of DNA molecules is positioned adjacent to said surface such that a pitch between DNA molecules of said plurality of DNA molecules is about 1 micron.
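Claim 83 recites a pitch of about 1 micron. At that pitch, neighboring molecules sit farther apart than the diffraction limit, which plausibly keeps each molecule separable before it is imaged at super-resolution. If anchor sites form a square grid with one site per pitch x pitch cell, the number of molecules in a field of view follows directly. The grid geometry and the 100 x 100 micron field are hypothetical.

```python
import math


def sites_in_field(width_um: float, height_um: float, pitch_um: float) -> int:
    """Anchor sites on a square grid, counting one site per pitch x pitch cell."""
    return math.floor(width_um / pitch_um) * math.floor(height_um / pitch_um)


print(sites_in_field(100.0, 100.0, 1.0))  # 10000
```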
84. The method of claim 68, wherein said DNA molecule comprises at least 150 nucleotides.
85. The method of claim 68, wherein said DNA molecule has a length of at least 50 nanometers.
86. The method of claim 68, wherein said DNA molecule comprises an adapter sequence.
87. The method of claim 86, wherein said adapter sequence is configured to anchor said DNA molecule to a complementary sequence of a flow cell.
88. The method of any one of claims 68-87, wherein said DNA molecule is a modified DNA molecule.
89. A composition comprising a modified amino acid, comprising: said modified amino acid, wherein said modified amino acid comprises a polymerizable molecule; a binding agent bound to said modified amino acid, wherein said binding agent comprises a detectable label.
90. The composition of claim 89, wherein said detectable label comprises a fluorophore or quantum dot.
91. The composition of claim 89, wherein said detectable label comprises a fluorescent label, a FRET label, a chemiluminescence label, an electrochemiluminescence label, a bioluminescence label, a phosphorescence label, or a label that generates light through other types of reactions or stimulations.
92. The composition of claim 90 or 91, further comprising a substrate, wherein said substrate is bound to said modified amino acid.
93. The composition of claim 92, wherein said polymerizable molecule is linearized adjacent to said substrate.
94. A method for characterizing a modified amino acid, comprising:
(a) providing said modified amino acid, wherein said modified amino acid comprises a polymerizable molecule; and
(b) using a high-resolution imaging technique associated with Raman spectroscopy to determine the identity of said modified amino acid.
95. The method of claim 94, wherein (b) is performed using label-free detection of said modified amino acid.
96. The method of claim 95, wherein said modified amino acid generates intrinsic vibrational modes in the Raman spectrum.
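For label-free detection (claims 95 and 96), identification depends on locating the vibrational peaks in each measured Raman spectrum. Below is a minimal local-maximum peak finder. It assumes a baseline-corrected spectrum sampled on a wavenumber grid. In practice, a library routine such as `scipy.signal.find_peaks`, combined with baseline correction and noise filtering, would be more robust. The spectrum is synthetic.

```python
from typing import List, Tuple


def find_peaks(wavenumbers: List[float], intensities: List[float],
               min_height: float) -> List[Tuple[float, float]]:
    """Return (wavenumber, intensity) for interior local maxima at or above min_height."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        if y >= min_height and y > intensities[i - 1] and y >= intensities[i + 1]:
            peaks.append((wavenumbers[i], y))
    return peaks


wn = [1000, 1010, 1020, 1030, 1040, 1050, 1060]   # cm^-1, synthetic grid
signal = [1.0, 3.0, 9.0, 4.0, 2.0, 6.0, 1.0]      # synthetic intensities
print(find_peaks(wn, signal, min_height=5.0))  # [(1020, 9.0), (1050, 6.0)]
```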
97. The method of claim 96, wherein said intrinsic vibrational modes are enhanced or modulated by specialized methods.
98. The method of claim 94, wherein (b) is performed using a binding agent associated with said modified amino acid.
99. The method of claim 98, wherein said binding agent comprises an antibody, an antibody fragment, a nanobody, an aptamer, a peptide, a polymer, an inorganic compound, or a small molecule.
100. The method of claim 98, wherein said binding agent comprises a detectable label for Raman spectroscopy.
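When a binding agent carries a Raman reporter (claims 98 to 101), the label could be assigned by checking which reference band set appears in the observed peak list. The sketch below assumes a fixed matching tolerance in cm^-1. The reporter names and band positions are placeholders, not measured spectra. Real reference bands for reporters such as 4-MBA would come from calibration measurements.

```python
from typing import Dict, List, Optional


def identify_label(observed: List[float],
                   references: Dict[str, List[float]],
                   tolerance: float = 10.0) -> Optional[str]:
    """Return the reference label whose bands all appear in `observed`
    (within `tolerance` cm^-1), preferring the label with the most bands."""
    best, best_count = None, 0
    for label, bands in references.items():
        if all(any(abs(b - o) <= tolerance for o in observed) for b in bands):
            if len(bands) > best_count:
                best, best_count = label, len(bands)
    return best


# Placeholder reference bands (cm^-1); not measured spectra.
refs = {"reporter_A": [1080.0, 1590.0], "reporter_B": [1180.0, 1620.0]}
print(identify_label([1078.0, 1588.0, 1300.0], refs))  # reporter_A
```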
101. The method of claim 100, wherein said detectable label comprises 4-Mercaptobenzoic Acid (4-MBA), 4-Aminothiophenol (4-ATP), Crystal Violet, Malachite Green, Nile Blue, 2-Naphthalenethiol, Methylene Blue, or any combination thereof.
102. The method of claim 94, wherein said high-resolution imaging technique is super-resolution imaging.
103. The method of claim 102, wherein said super-resolution imaging detects a single molecule of said modified amino acid when (b) is performed using label-free detection of said modified amino acid.
104. The method of claim 102, wherein said super-resolution imaging detects a single molecule of said detectable label when (b) is performed using a binding agent associated with said modified amino acid.
105. The method of claim 94, wherein a stacked plurality of modified amino acids comprises said modified amino acid and an additional modified amino acid, wherein said modified amino acid and said additional modified amino acid are spaced along said plurality of stacked polymerizable molecules at a distance greater than 50 nanometers.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463618018P | 2024-01-05 | 2024-01-05 | |
| US63/618,018 | 2024-01-05 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025147587A1 (en) | 2025-07-10 |
Family ID: 94533035
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2025/010206 (pending; published as WO2025147587A1 (en)) | Protein sequencing using super-resolution imaging | 2024-01-05 | 2025-01-03 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025147587A1 (en) |
- 2025-01-03: Application PCT/US2025/010206 filed in WO (WO2025147587A1), status: active, pending
Patent Citations (27)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4318846A (en) | 1979-09-07 | 1982-03-09 | Syva Company | Novel ether substituted fluorescein polyamino acid compounds as fluorescers and quenchers |
| US4757141A (en) | 1985-08-26 | 1988-07-12 | Applied Biosystems, Incorporated | Amino-derivatized phosphite and phosphate linking agents, phosphoramidite precursors, and useful conjugates thereof |
| US5091519A (en) | 1986-05-01 | 1992-02-25 | Amoco Corporation | Nucleotide compositions with linking groups |
| US5151507A (en) | 1986-07-02 | 1992-09-29 | E. I. Du Pont De Nemours And Company | Alkynylamino-nucleotides |
| US5066580A (en) | 1988-08-31 | 1991-11-19 | Becton Dickinson And Company | Xanthene dyes that emit to the red of fluorescein |
| US5366860A (en) | 1989-09-29 | 1994-11-22 | Applied Biosystems, Inc. | Spectrally resolvable rhodamine dyes for nucleic acid sequence determination |
| US5188934A (en) | 1989-11-14 | 1993-02-23 | Applied Biosystems, Inc. | 4,7-dichlorofluorescein dyes as molecular probes |
| US5688648A (en) | 1994-02-01 | 1997-11-18 | The Regents Of The University Of California | Probes labelled with energy transfer coupled dyes |
| US5800996A (en) | 1996-05-03 | 1998-09-01 | The Perkin Elmer Corporation | Energy transfer dyes with enchanced fluorescence |
| US5847162A (en) | 1996-06-27 | 1998-12-08 | The Perkin Elmer Corporation | 4, 7-Dichlororhodamine dyes |
| US6322901B1 (en) | 1997-11-13 | 2001-11-27 | Massachusetts Institute Of Technology | Highly luminescent color-selective nano-crystalline materials |
| US5990479A (en) | 1997-11-25 | 1999-11-23 | Regents Of The University Of California | Organo Luminescent semiconductor nanocrystal probes for biological applications and process for making and using such probes |
| US6207392B1 (en) | 1997-11-25 | 2001-03-27 | The Regents Of The University Of California | Semiconductor nanocrystal probes for biological applications and process for making and using such probes |
| US6423551B1 (en) | 1997-11-25 | 2002-07-23 | The Regents Of The University Of California | Organo luminescent semiconductor nanocrystal probes for biological applications and process for making and using such probes |
| US6251303B1 (en) | 1998-09-18 | 2001-06-26 | Massachusetts Institute Of Technology | Water-soluble fluorescent nanocrystals |
| US6319426B1 (en) | 1998-09-18 | 2001-11-20 | Massachusetts Institute Of Technology | Water-soluble fluorescent semiconductor nanocrystals |
| US6426513B1 (en) | 1998-09-18 | 2002-07-30 | Massachusetts Institute Of Technology | Water-soluble thiol-capped nanocrystals |
| US6444143B2 (en) | 1998-09-18 | 2002-09-03 | Massachusetts Institute Of Technology | Water-soluble fluorescent nanocrystals |
| US20020045045A1 (en) | 2000-10-13 | 2002-04-18 | Adams Edward William | Surface-modified semiconductive and metallic nanoparticles having enhanced dispersibility in aqueous media |
| US6576291B2 (en) | 2000-12-08 | 2003-06-10 | Massachusetts Institute Of Technology | Preparation of nanocrystallites |
| US20030017264A1 (en) | 2001-07-20 | 2003-01-23 | Treadway Joseph A. | Luminescent nanoparticles and methods for their preparation |
| US20100055733A1 (en) | 2008-09-04 | 2010-03-04 | Lutolf Matthias P | Manufacture and uses of reactive microcontact printing of biomolecules on soft hydrogels |
| WO2019195633A1 (en) | 2018-04-04 | 2019-10-10 | Ignite Biosciences, Inc. | Methods of generating nanoarrays and microarrays |
| US20200217853A1 (en) | 2019-01-08 | 2020-07-09 | Massachusetts Institute Of Technology | Single-Molecule Protein and Peptide Sequencing |
| US11499979B2 (en) | 2019-01-08 | 2022-11-15 | Massachusetts Institute Of Technology | Single-molecule protein and peptide sequencing |
| WO2023196642A1 (en) * | 2022-04-08 | 2023-10-12 | Glyphic Biotechnologies, Inc. | Methods and systems for processing polymeric analytes |
| WO2024030919A1 (en) * | 2022-08-02 | 2024-02-08 | Glyphic Biotechnologies, Inc. | Protein sequencing via coupling of polymerizable molecules |
Non-Patent Citations (33)
| Title |
|---|
| B.C. KIM ET AL., BIOMATERIALS SCIENCE, vol. 2, no. 3, 2014, pages 288 - 296 |
| BIRD ET AL., SCIENCE, vol. 242, 1988, pages 423 - 426 |
| BLOOM ET AL., NATURE CHEMISTRY, vol. 10, 2018, pages 205 - 211 |
| ESTANDIAN DANIEL MASAO: "Enabling tools for de-novo single molecule protein sequencing", 19 August 2021 (2021-08-19), pages 1 - 85, XP093259814, Retrieved from the Internet <URL:https://dspace.mit.edu/bitstream/handle/1721.1/139977/Estandian-danesta-bcs-phd-2021-thesis.pdf?sequence=1&isAllowed=y> * |
| F. CHEN ET AL., SCIENCE, vol. 347, no. 6621, 2015, pages 543 - 548 |
| HENDRICK ET AL., BIORXIV, 2024 |
| HENEGARIU ET AL., NATURE BIOTECHNOL., vol. 18, 2000, pages 345 |
| HAUGLAND: "Handbook of Fluorescent Probes and Research Chemicals", 2002, MOLECULAR PROBES, INC. |
| HOOD ET AL.: "Immunology", 1984, BENJAMIN |
| HUNKAPILLER, HOOD, NATURE, vol. 323, 1986, pages 15 - 16 |
| HUSTON ET AL., PROC. NATL. ACAD. SCI. U.S.A., vol. 85, 1988, pages 5879 - 5883 |
| KELLER, MANAK: "DNA Probes", 1993, STOCKTON PRESS |
| KLOSTERMEIER ET AL., BIOPOLYMERS, vol. 61, no. 3, 2001, pages 159 - 79 |
| KNIGHT ET AL., NATURE BIOTECHNOLOGY, vol. 21, 2003, pages 1047 - 1054 |
| KRISHNA ET AL., BIOPOLYMERS, vol. 94, no. 1, 2010, pages 32 - 48 |
| LANZAVECCHIA ET AL., EUR. J. IMMUNOL., vol. 17, 1987, pages 105 |
| M. RUST ET AL., NATURE METHODS, vol. 3, 2006, pages 793 - 796 |
| MARGOLIS ET AL., JOURNAL OF AUTOMATIC CHEMISTRY, vol. 13, no. 3, 1991, pages 93 - 95 |
| OLIVIER, MUTAT RES, vol. 573, 2005, pages 103 - 110 |
| ORPANA, BIOMOL ENG, vol. 21, 2004, pages 45 - 50 |
| S. K. DAS ET AL., NUCLEIC ACIDS RES., vol. 38, no. 18, 2010, pages 177 |
| SELVIN, METHODS ENZYMOL., vol. 246, 1995, pages 300 |
| SERVICE, SCIENCE, vol. 311, 2006, pages 1544 - 1546 |
| STRYER ET AL., ANN. REV. BIOCHEM., vol. 47, 1978, pages 819 |
| TARR, METHODS OF PROTEIN MICROCHARACTERIZATION, pages 155 - 194 |
| WETMUR, CRITICAL REVIEWS IN BIOCHEMISTRY AND MOLECULAR BIOLOGY, vol. 26, 1991, pages 227 - 259 |
| XIE ET AL., LANGMUIR, vol. 38, no. 30, 2022, pages 9119 - 9128 |
| XU ET AL., ACS CHEM BIOL., vol. 6, no. 10, 21 October 2011 (2011-10-21), pages 1015 - 1020 |
| ZHANG ET AL., ACS CHEM. BIOL., vol. 16, no. 11, 2021, pages 2595 - 2603 |
| ZHU ET AL., ACS CATAL., vol. 12, no. 13, 2022, pages 8019 - 8026 |
| ZHU ET AL., CHINESE CHEMICAL LETTERS, vol. 29, 2018, pages 1116 - 1118 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US10563257B2 (en) | | In situ nucleic acid sequencing of expanded biological samples |
| JP7057348B2 (en) | | A method of combining biomolecule detection with a single assay using fluorescent in situ sequencing |
| JP7253833B2 (en) | | Methods and Kits Using Nucleic Acid Encoding and/or Labeling |
| US20250155447A1 (en) | | Methods and systems for processing polymeric analytes |
| US20240409995A1 (en) | | Single-molecule peptide sequencing through molecular barcoding and ex-situ analysis |
| US12259394B2 (en) | | Protein sequencing via coupling of polymerizable molecules |
| US20250327811A1 (en) | | Single-molecule peptide sequencing using dithioester and thiocarbamoyl amino acid reactive groups |
| WO2024159162A1 (en) | | Single-molecule peptide sequencing using guanidinylating agents |
| WO2025064836A2 (en) | | Amino acid binding agents and their uses |
| WO2025147587A1 (en) | | Protein sequencing using super-resolution imaging |
| US20250188538A1 (en) | | Single-molecule peptide sequencing using xanthate amino acid reactive groups |
| HK40121862A (en) | | Protein sequencing via coupling of polymerizable molecules |
| HK40127161A (en) | | Single-molecule peptide sequencing using dithioester and thiocarbamoyl amino acid reactive groups |
| WO2025166050A1 (en) | | Nanopore-based sequencing of peptides |
| WO2026030423A1 (en) | | Sequencing reagents using guanidinylating agents |
| HK40116342A (en) | | Single-molecule peptide sequencing through molecular barcoding and ex-situ analysis |
| CN118679389A (en) | | Single molecule peptide sequencing by molecular barcoding and ex situ analysis |
| EP4646595A1 (en) | | Peptide sequencer |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: The EPO has been informed by WIPO that EP was designated in this application | Ref document number: 25704022; Country of ref document: EP; Kind code of ref document: A1 |