WO2025240918A1 - Systems and methods for generating codebooks - Google Patents
Systems and methods for generating codebooks
Info
- Publication number
- WO2025240918A1 (PCT/US2025/029852)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- code words
- observed
- codebook
- code word
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/62—Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
- G01N21/63—Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
- G01N21/64—Fluorescence; Phosphorescence
- G01N21/645—Specially adapted constructive features of fluorimeters
- G01N21/6456—Spatial resolved fluorescence measurements; Imaging
- G01N21/6458—Fluorescence microscopy
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/10—Signal processing, e.g. from mass spectrometry [MS] or from PCR
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/62—Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
- G01N21/63—Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
- G01N21/64—Fluorescence; Phosphorescence
- G01N21/6428—Measuring fluorescence of fluorescent products of reactions or of fluorochrome labelled reactive substances, e.g. measuring quenching effects, using measuring "optrodes"
- G01N2021/6439—Measuring fluorescence of fluorescent products of reactions or of fluorochrome labelled reactive substances, e.g. measuring quenching effects, using measuring "optrodes" with indicators, stains, dyes, tags, labels, marks
- G01N2021/6441—Measuring fluorescence of fluorescent products of reactions or of fluorochrome labelled reactive substances, e.g. measuring quenching effects, using measuring "optrodes" with indicators, stains, dyes, tags, labels, marks with two or more labels
Definitions
- the present disclosure generally relates to methods and systems for imaging-based in situ analysis of target analytes in biological samples and, more specifically, to methods for designing codebooks having a set of code words that are assigned to barcoded target analytes in a multiplexed assay.
- the codebook is designed to reduce (e.g., minimize) the impact of spatial crowding on accurate target analyte detection.
- each target molecule to be detected in a multiplexed assay is assigned a unique codeword from a codebook of valid code words.
- Some codebooks e.g., binary codebooks
- nucleic acid probes with target-specific barcodes corresponding to the designed code words are introduced to the tissue specimen, attached to the target molecules within the sample (the target molecules may have a generally stochastic distribution throughout the tissue specimen volume), and then typically amplified to create features (e.g., rolling circle amplification products (RCPs)) comprising multiple copies of the target-specific barcodes assigned to the target molecules.
- RCPs rolling circle amplification products
- Different target molecules may be close to one another within the three-dimensional volume of the tissue specimen.
- if the distance between two target molecules, or representative features thereof, approaches the "localization precision" (i.e., the accuracy with which the center of each representative feature can be measured in an optical image of the tissue specimen) of the optical imaging technique utilized for detection, then the observed optical signal in the image of that region of tissue will contain optical signals (e.g., "ON" signals in one or more optical detection channels) arising from the representative features of both target molecules, and the estimated center positions of the two representative target molecule features will partially or completely overlap.
- a decoding algorithm used to decode the target-specific barcodes corresponding to code words may not be able to determine which optical signal arose from each target molecule’s representative feature.
- Disclosed herein are methods comprising: receiving a codebook comprising a plurality of valid code words, wherein, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to a predetermined number; receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
- the method further comprises comparing the at least one intensity value representing an intensity of each observed optical signal to a predetermined intensity threshold to determine a binary value representing the intensity of each observed optical signal.
- each binary value comprises a 1 or a 0, wherein 1 represents an observed optical signal for which intensity is greater than or equal to the predetermined intensity threshold and 0 represents an observed optical signal for which intensity is less than the predetermined intensity threshold.
- decoding the plurality of observed optical signals in the plurality of images comprises obtaining the plurality of observed code words based on a series of binary values determined for each location.
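The thresholding and decoding steps above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the dictionary layout (location mapped to one intensity per bit, in imaging order), and the example intensities and threshold are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, int]


def binarize(intensity: float, threshold: float) -> int:
    """Return 1 if the intensity is >= the threshold, otherwise 0."""
    return 1 if intensity >= threshold else 0


def observed_code_words(
    intensities: Dict[Location, List[float]], threshold: float
) -> Dict[Location, str]:
    """Build one observed code word per location.

    `intensities` maps a location to its series of intensity values,
    one value per bit (hypothetical layout chosen for this sketch).
    """
    return {
        loc: "".join(str(binarize(v, threshold)) for v in values)
        for loc, values in intensities.items()
    }


# Hypothetical intensities at two locations, four bits each.
signals = {
    (10, 20): [0.9, 0.1, 0.05, 0.5],
    (30, 40): [0.2, 0.7, 0.8, 0.49],
}
words = observed_code_words(signals, threshold=0.5)
```

An intensity exactly equal to the threshold maps to 1, matching the "greater than or equal to" wording above.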
- each observed code word of the plurality of code words comprises a plurality of code word segments, and wherein each code word segment comprises a specified string of binary values that corresponds to one of a specified set of observed optical signal states.
- each code word segment comprises a four bit string of binary values such that: a code word segment of 1 0 0 0 corresponds to a first optical signal state, A, in which an optical signal is detected in a first detection channel of a four-channel optical imaging instrument, and no optical signal is detected in a second, third, or fourth detection channel of the four-channel optical imaging instrument; a code word segment of 0 1 0 0 corresponds to a second optical signal state, B, in which an optical signal is detected in the second detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, third, or fourth detection channel of the four-channel optical imaging instrument; a code word segment of 0 0 1 0 corresponds to a third optical signal state, C, in which an optical signal is detected in the third detection
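As a rough sketch of the segment-to-state mapping, the code below splits a code word into four-bit segments and looks each one up. Only the three states spelled out above (A, B, C) are mapped; any other segment returns `None`. The function and variable names are hypothetical.

```python
from typing import List, Optional

# Only the states explicitly described in the text are included here.
SEGMENT_TO_STATE = {
    "1000": "A",  # signal in the first detection channel only
    "0100": "B",  # signal in the second detection channel only
    "0010": "C",  # signal in the third detection channel only
}


def split_segments(code_word: str, segment_length: int = 4) -> List[str]:
    """Split a binary code word into fixed-length segments."""
    if len(code_word) % segment_length != 0:
        raise ValueError("code word length must be a multiple of the segment length")
    return [
        code_word[i : i + segment_length]
        for i in range(0, len(code_word), segment_length)
    ]


def segment_states(code_word: str) -> List[Optional[str]]:
    """Map each segment to its optical signal state, or None if unmapped."""
    return [SEGMENT_TO_STATE.get(seg) for seg in split_segments(code_word)]


states = segment_states("100000100100")
```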
- determining the assignment of the observed code word to one of the plurality of valid code words comprises determining a plurality of scores based on comparison of the observed code word to all or a portion of the plurality of valid code words. In some embodiments, the method further comprises selecting one of the plurality of valid code words having a highest score to assign as a replacement for the observed code word.
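One way to picture the scoring step is shown below. The text does not specify how scores are computed, so this sketch assumes the score is simply the number of matching bits (equivalently, code word length minus Hamming distance), and it assumes a hypothetical `min_score` cutoff below which the observed code word is treated as not valid.

```python
from typing import Dict, List, Optional


def match_score(observed: str, valid: str) -> int:
    """Assumed score: number of bit positions where the two words agree."""
    if len(observed) != len(valid):
        raise ValueError("code words must have equal length")
    return sum(o == v for o, v in zip(observed, valid))


def assign(observed: str, valid_words: List[str], min_score: int) -> Optional[str]:
    """Return the highest-scoring valid code word, or None if no score
    reaches `min_score` (a hypothetical rejection rule)."""
    scores: Dict[str, int] = {w: match_score(observed, w) for w in valid_words}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else None


# Hypothetical 8-bit codebook.
codebook = ["11000000", "00110000", "00001100"]
corrected = assign("11000100", codebook, min_score=7)  # one bit off the first word
rejected = assign("11110000", codebook, min_score=7)   # ambiguous, best score is 6
```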
- the method further comprises identifying a target analyte in the biological sample based on the determined assignment of the observed code word to a valid code word and the codebook.
- the identified target analyte comprises a messenger RNA (mRNA) molecule or protein molecule.
- mRNA messenger RNA
- each valid code word of the plurality of valid code words has a second Hamming distance of greater than or equal to 4 from every other valid code word.
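The pairwise condition above can be checked by brute force. The sketch below is illustrative only; the helper names and the three-word example codebook are assumptions.

```python
from itertools import combinations
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("code words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(codebook: List[str]) -> int:
    """Smallest Hamming distance over all pairs of valid code words."""
    return min(hamming(a, b) for a, b in combinations(codebook, 2))


codebook = ["11110000", "00001111", "11001100"]
satisfies = min_pairwise_distance(codebook) >= 4
```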
- the codebook comprises at least 50 valid code words.
- the codebook comprises up to 200,000 valid code words.
- systems comprising: a codebook database; a computing system comprising at least one computer-readable storage medium having program instructions stored thereon, the program instructions executable by at least one processor of the computing system to cause the at least one processor to perform a method comprising: receiving a codebook from the codebook database comprising a plurality of valid code words, wherein, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to a predetermined number; receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining
- a codebook comprising a plurality of valid code words, wherein, for at least a first portion of the valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to a predetermined number; receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
- each binary value comprises a 1 or a 0, wherein 1 represents an observed optical signal for which intensity is greater than or equal to the predetermined intensity threshold and 0 represents an observed optical signal for which intensity is less than the predetermined intensity threshold.
- decoding the plurality of observed optical signals in the plurality of images comprises obtaining the plurality of observed code words based on a series of binary values determined for each location.
- each observed code word of the plurality of code words comprises a plurality of code word segments, and wherein each code word segment comprises a specified string of binary values that corresponds to one of a specified set of observed optical signal states.
- the method further comprises selecting one of the plurality of valid code words having a highest score to assign as a replacement for the observed code word.
- the plurality of images comprises a plurality of images comprising different fields-of-view of the biological sample.
- the plurality of images comprises a plurality of z-stack images of the biological sample.
- the plurality of observed optical signals represents light emitted from a plurality of fluorophores.
- the method further comprises identifying a target analyte in the biological sample based on the determined assignment of the observed code word to a valid code word and the codebook.
- Disclosed herein are computer program products comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform any of the methods described herein.
- databases comprising: a codebook comprising a plurality of valid code words, wherein, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1.
- Disclosed herein are methods comprising: receiving a codebook comprising a plurality of valid code words, wherein, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1; receiving a plurality of locations for a plurality of observed optical signals, wherein the plurality of observed optical signals are obtained from a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles; decoding the plurality of observed optical signals to obtain a plurality of observed code words at the plurality of locations; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
- Disclosed herein are methods comprising: receiving a codebook having a plurality of code words, wherein, for all valid code words, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1; receiving an analyte-index assignment for a plurality of target analytes; and using the analyte-index assignment to assign each target analyte of the plurality of target analytes to at least one of the plurality of code words such that each code word has at most one target analyte assignment, thereby generating an analyte-codeword assignment matrix.
- each code word of the plurality of codewords has an associated index.
- receiving the analyte-index assignment comprises receiving an analyte-index matrix.
- assigning each target analyte of the plurality of target analytes to at least one of the plurality of codewords comprises linking the plurality of target analytes and plurality of codewords based on the same indices.
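A minimal sketch of index-based linking follows, assuming the analyte-index assignment is a mapping from analyte name to index and the codebook is a mapping from index to code word. These data structures, the function names, and the analyte names are hypothetical; the matrix here is a plain 0/1 list of lists with one row per analyte and one column per code word index.

```python
from typing import Dict, List


def link_by_index(
    analyte_index: Dict[str, int], codeword_index: Dict[int, str]
) -> Dict[str, str]:
    """Link each analyte to the code word that shares its index.

    Raises if an index is reused, so each code word has at most one analyte.
    """
    used = set()
    assignment: Dict[str, str] = {}
    for analyte, idx in analyte_index.items():
        if idx not in codeword_index:
            raise KeyError(f"no code word with index {idx}")
        if idx in used:
            raise ValueError(f"index {idx} assigned to more than one analyte")
        used.add(idx)
        assignment[analyte] = codeword_index[idx]
    return assignment


def assignment_matrix(
    analyte_index: Dict[str, int], codeword_index: Dict[int, str]
) -> List[List[int]]:
    """Rows follow analyte order; columns follow sorted code word indices."""
    columns = sorted(codeword_index)
    return [[1 if c == idx else 0 for c in columns] for idx in analyte_index.values()]


codeword_index = {0: "110000", 1: "001100", 2: "000011"}
analyte_index = {"GENE_A": 2, "GENE_B": 0}  # hypothetical analyte names
links = link_by_index(analyte_index, codeword_index)
matrix = assignment_matrix(analyte_index, codeword_index)
```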
- the plurality of target analytes comprises a plurality of nucleic acids.
- the plurality of nucleic acids comprises a plurality of genes.
- the plurality of nucleic acids comprises a plurality of RNA transcripts.
- the plurality of target analytes comprises a plurality of proteins.
- Disclosed herein are methods for performing in situ decoding comprising: receiving a plurality of images of a biological sample, wherein the plurality of images comprises images acquired in a plurality of sequencing or probing cycles; detecting, based on the plurality of images, a series of optical signals at one or more locations in the biological sample corresponding to one or more barcoded target analytes; determining, based on the series of optical signals detected in the plurality of images, a code word comprising a series of ON and OFF bits that corresponds to a barcode for one of the one or more barcoded target analytes; and identifying the barcoded target analyte based on a comparison of the determined code word to a codebook, wherein the code word corresponds to a member of a codebook comprising a plurality of code words for which: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K
- each series of optical signals detected in the plurality of images at the one or more locations comprises a series of ON signals and OFF signals.
- the plurality of code words in the codebook further satisfy a property that: Hamming Distance (Wi, Wj) > Q for any two pairwise combination of code words Wi and Wj, wherein Q is an integer value greater than or equal to 3.
- two or more code words are determined that correspond to two or more barcoded target analytes for which the corresponding series of optical signals partially overlap within the plurality of images, and wherein an error rate for correctly identifying the two or more barcoded target analytes is reduced compared to that when the plurality of code words in the codebook do not satisfy the relationship: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K.
- the value of K is selectable by a user during design of the codebook.
- a first portion of the plurality of code words in the codebook satisfies a relationship: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K1
- a second portion of the plurality of code words in the codebook satisfies a relationship: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K2
- the values of K1 and K2 are selectable by a user during design of the codebook.
- a code word from the code book is randomly assigned to each of the one or more barcoded target analytes. In some embodiments, a code word from the code book is assigned to each of the one or more barcoded target analytes based on a decision rule designed to ensure that a total number of ON signals detected in a given image of the plurality of images is within ±10% of a mean number of ON signals detected per image for the plurality of images. In some embodiments, a code word from the code book is assigned to each of the one or more barcoded target analytes based on a decision rule designed to minimize a maximum predicted density of ON signals detected in images of the plurality of images.
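The ±10% balance rule can be illustrated with a small check. This sketch assumes that each bit of a code word corresponds to one image (one cycle and channel), and that a hypothetical per-analyte abundance estimate stands in for the number of ON signals each analyte contributes; neither assumption comes from the text.

```python
from typing import Dict, List


def on_signals_per_image(
    assignment: Dict[str, str], abundance: Dict[str, float]
) -> List[float]:
    """Predicted ON-signal total for each bit position (assumed: one image per bit)."""
    n_bits = len(next(iter(assignment.values())))
    totals = [0.0] * n_bits
    for analyte, word in assignment.items():
        for bit, ch in enumerate(word):
            if ch == "1":
                totals[bit] += abundance[analyte]
    return totals


def is_balanced(totals: List[float], tolerance: float = 0.10) -> bool:
    """True if every per-image total is within +/- tolerance of the mean."""
    mean = sum(totals) / len(totals)
    return all(abs(t - mean) <= tolerance * mean for t in totals)


assignment = {"A": "1100", "B": "0011"}  # hypothetical analytes and code words
balanced = is_balanced(on_signals_per_image(assignment, {"A": 100, "B": 95}))
unbalanced = is_balanced(on_signals_per_image(assignment, {"A": 100, "B": 50}))
```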
- a code word from the code book is assigned to each of two or more barcoded target analytes based on expression data for the two or more barcoded target analytes in clustered cell types, and wherein the clustered cell types represent a distribution of cell types found in the biological sample.
- the expression data for the two or more barcoded target analytes comprises bulk gene expression data, bulk protein expression data, spatial gene expression data, spatial protein expression data, single cell gene expression data, single cell protein expression data, or any combination thereof.
- the two or more barcoded target analytes are rank-ordered according to a maximum expression level across all clustered cell types, and the two or more code words are assigned to the two or more rank-ordered barcoded target analytes using an iterative process repeated for each of the two or more barcoded target analytes in decreasing order of maximum expression level, the iterative process comprising: computing a predicted density of ON signals for every combination of remaining, unassigned code words and the barcoded target analyte across the plurality of images; selecting a code word from the remaining, unassigned code words that minimizes the predicted density of ON signals across the plurality of images; and assigning the selected code word to the barcoded target analyte.
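The iterative assignment described above can be sketched as a greedy loop. The text does not define how the predicted density is computed, so this example assumes it is the peak, over every cell type and bit position, of the summed expression of analytes that are ON at that bit. All names and the example expression values are hypothetical.

```python
from typing import Dict, List


def peak_density(density: List[List[float]], row: List[float], word: str) -> float:
    """Assumed objective: highest per-cell-type, per-bit ON density
    if `word` were assigned to an analyte with expression `row`."""
    return max(
        density[t][b] + (row[t] if word[b] == "1" else 0.0)
        for t in range(len(density))
        for b in range(len(word))
    )


def greedy_assign(
    expression: Dict[str, List[float]], code_words: List[str]
) -> Dict[str, str]:
    """Assign code words to analytes in decreasing order of maximum expression."""
    if len(code_words) < len(expression):
        raise ValueError("need at least one code word per analyte")
    n_types = len(next(iter(expression.values())))
    density = [[0.0] * len(code_words[0]) for _ in range(n_types)]
    remaining = list(code_words)
    assignment: Dict[str, str] = {}
    ranked = sorted(expression, key=lambda a: max(expression[a]), reverse=True)
    for analyte in ranked:
        row = expression[analyte]
        # Pick the unassigned code word that minimizes the predicted density.
        best = min(remaining, key=lambda w: peak_density(density, row, w))
        remaining.remove(best)
        assignment[analyte] = best
        for t in range(n_types):
            for b, ch in enumerate(best):
                if ch == "1":
                    density[t][b] += row[t]
    return assignment


# Expression per analyte across two clustered cell types (made-up values).
expression = {"G1": [10.0, 0.0], "G2": [8.0, 1.0], "G3": [0.0, 5.0]}
result = greedy_assign(expression, ["1100", "0011", "1010"])
```

Ties are broken by codebook order in this sketch, so the most highly expressed analyte takes the first code word and the next analyte is steered away from bits that are already crowded.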
- K is equal to 3, 4, or 5.
- Q is equal to 4, 5, 6, 7, or 8.
- the plurality of code words comprise code words of at least 60 bits, 80 bits, 100 bits, 120 bits, 140 bits, 160 bits, or 180 bits in length.
- the plurality of code words in the codebook comprises at least 1,000, 5,000, 10,000, 20,000, 40,000, 60,000, 80,000, 100,000, 120,000, 140,000, 160,000, 180,000, or 200,000 unique code words.
- the series of optical signals comprise fluorescence signals.
- each code word of the plurality comprises M x N bits, where M is a number of sequencing or probing cycles and N is a number of optical detection channels in an instrument configured to perform the in situ decoding.
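To make the M x N layout concrete, the sketch below builds a code word from a mapping of cycle to ON channel. The cycle-major bit ordering (all N channels of cycle 0, then cycle 1, and so on) and the function name are assumptions for illustration; the text only states the total length.

```python
from typing import Dict


def build_code_word(on_channels: Dict[int, int], n_cycles: int, n_channels: int) -> str:
    """Return an (n_cycles * n_channels)-bit code word.

    `on_channels` maps a zero-based cycle index to the zero-based channel
    that is ON in that cycle; cycles not listed are OFF in every channel.
    """
    bits = ["0"] * (n_cycles * n_channels)
    for cycle, channel in on_channels.items():
        if not (0 <= cycle < n_cycles and 0 <= channel < n_channels):
            raise ValueError("cycle or channel index out of range")
        bits[cycle * n_channels + channel] = "1"  # assumed cycle-major ordering
    return "".join(bits)


# M = 3 cycles, N = 4 channels: ON in channel 1 of cycle 0 and channel 3 of cycle 2.
word = build_code_word({0: 1, 2: 3}, n_cycles=3, n_channels=4)
```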
- the one or more barcoded target analytes comprise barcoded gene sequences, barcoded gene transcripts, barcoded proteins, or any combination thereof.
- databases comprising: one or more non-transitory computer-readable storage medium components, the one or more non-transitory computer-readable storage medium components individually or collectively storing a codebook comprising a plurality of code words for which: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K
- the plurality of code words in the codebook further satisfy a property that: Hamming Distance (Wi, Wj) > Q for any two pairwise combination of code words Wi and Wj, and wherein Q is an integer value greater than or equal to 3.
- the value of K is selectable by a user during design of the codebook.
- a first portion of the plurality of code words in the codebook satisfies a relationship: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K1
- a second portion of the plurality of code words in the codebook satisfies a relationship: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K2
- the values of K1 and K2 are selectable by a user during design of the codebook.
- K is equal to 3, 4, or 5.
- Q is equal to 4, 5, 6, 7, or 8.
- the plurality of code words comprise code words of at least 60 bits, 80 bits, 100 bits, 120 bits, 140 bits, 160 bits, or 180 bits in length.
- the plurality of code words in the codebook comprises at least 100, 500, 1,000, 5,000, 10,000, 20,000, 40,000, 60,000, 80,000, or 100,000 unique code words.
- each code word of the plurality comprises M x N bits, where M is a number of sequencing or probing cycles and N is a number of optical detection channels in an instrument configured to perform the in situ decoding.
- each code word in the codebook has at least 2 ON bits. In some embodiments, each code word in the codebook has no more than 4, 5, or 6 ON bits.
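The ON-bit limits above amount to a simple weight filter. A minimal sketch, with hypothetical function names and an upper limit of 4 chosen from the listed options:

```python
from typing import List


def on_bit_count(word: str) -> int:
    """Number of ON (1) bits in a binary code word."""
    return word.count("1")


def weight_violations(codebook: List[str], min_on: int = 2, max_on: int = 4) -> List[str]:
    """Return the code words whose ON-bit count falls outside [min_on, max_on]."""
    return [w for w in codebook if not (min_on <= on_bit_count(w) <= max_on)]


violations = weight_violations(["11000000", "11111000", "10000000", "11110000"])
```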
- systems comprising: one or more processors; and a memory communicatively coupled to the one or more processors and configured to store instructions that, when executed by the one or more processors, cause the system to: receive a plurality of images of a biological sample, wherein the plurality of images comprises images acquired in a plurality of sequencing or probing cycles; detect, based on the plurality of images, a series of optical signals at one or more locations in the biological sample corresponding to one or more barcoded target analytes; determine, based on the series of optical signals detected in the plurality of images, a code word comprising a series of ON and OFF bits that corresponds to a barcode for one of the one or more barcoded target analytes; and identify the barcoded target analyte based on a comparison of the determined code word to a codebook, wherein the code word corresponds to a member of a codebook comprising a plurality of code words for which: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K
- each series of optical signals detected in the plurality of images at the one or more locations comprises a series of ON signals and OFF signals.
- the plurality of code words in the codebook further satisfy a property that: Hamming Distance (Wi, Wj) > Q for any two pairwise combination of code words Wi and Wj, wherein Q is an integer value greater than or equal to 3.
- two or more code words are determined that correspond to two or more barcoded target analytes for which the corresponding series of optical signals partially overlap within the plurality of images, and wherein an error rate for correctly identifying the two or more barcoded target analytes is reduced compared to that when the plurality of code words in the codebook do not satisfy the relationship: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K.
- the value of K is selectable by a user during design of the codebook.
- a first portion of the plurality of code words in the codebook satisfies a relationship: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K1
- a second portion of the plurality of code words in the codebook satisfies a relationship: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K2
- the values of K1 and K2 are selectable by a user during design of the codebook.
- a code word from the code book is randomly assigned to each of the one or more barcoded target analytes.
- a code word from the code book is assigned to each of the one or more barcoded target analytes based on a decision rule designed to ensure that a total number of ON signals detected in a given image of the plurality of images is within ±10% of a mean number of ON signals detected per image for the plurality of images. In some embodiments, a code word from the code book is assigned to each of the one or more barcoded target analytes based on a decision rule designed to minimize a maximum predicted density of ON signals detected in images of the plurality of images.
- a code word from the code book is assigned to each of two or more barcoded target analytes based on expression data for the two or more barcoded target analytes in clustered cell types, and wherein the clustered cell types represent a distribution of cell types found in the biological sample.
- the expression data for the two or more barcoded target analytes comprises bulk gene expression data, bulk protein expression data, spatial gene expression data, spatial protein expression data, single cell gene expression data, single cell protein expression data, or any combination thereof.
- K is equal to 3, 4, or 5.
- Q is equal to 4, 5, 6, 7, or 8.
- the plurality of code words comprise code words of at least 60 bits, 80 bits, 100 bits, 120 bits, 140 bits, 160 bits, or 180 bits in length.
- the plurality of code words in the codebook comprises at least 1,000, 5,000, 10,000, 20,000, 40,000, 60,000, 80,000, or 100,000 unique code words.
- the series of optical signals comprise fluorescence signals.
- each code word of the plurality comprises M x N bits, where M is a number of sequencing or probing cycles and N is a number of optical detection channels in an instrument configured to perform the in situ decoding.
- the one or more barcoded target analytes comprise barcoded gene sequences, barcoded gene transcripts, barcoded proteins, or any combination thereof.
- FIGS. 1A-1B provide a non-limiting example of a process flowchart for generating an OR-robust codebook, in accordance with one implementation of the methods described herein.
- FIG. 2 provides a non-limiting example of a process flowchart for assigning the code words in an OR-robust codebook to a corresponding list of target analytes, in accordance with one implementation of the methods described herein.
- FIG. 3 provides a non-limiting example of a process flowchart for decoding optical signals derived from images of a biological sample to identify barcoded target analytes, in accordance with one implementation of the methods described herein.
- FIG. 4A depicts a non-limiting example of the structure of a binary code word for use with hybridization probe-based in situ detection of barcoded target analytes, in accordance with some implementations of the methods described herein.
- FIG. 6 provides a non-limiting schematic illustration of sequencing-based in situ detection of barcoded target analytes, in accordance with some implementations of the methods described herein.
- FIG. 7 depicts an overview of a volumetric sample imaging system and illustrates a Field of View (FOV) grid bounding the sample (e.g., hydrogel, tissue section, one or more cells, etc.) as projected onto the surface of a solid substrate supporting the sample.
- FOV Field of View
- FIG. 8 depicts the XZ cross-sectional view and illustrates tissue non-uniformity in the Z dimension, where the full (non-reduced) imaging volume is oversampled in the Z dimension.
- the objective lens focal point is positioned to acquire an image at every Z-slice in a Z-stack.
- An XZ image of signal distribution (bottom) demonstrates a non-uniform distribution of detected signal within the imaging volume.
- FIG. 9 depicts a system for performing an in situ detection or sequencing assay, in accordance with some implementations of the methods described herein.
- FIGS. 10A-10B illustrate cross-sectional views of an optics module in an imaging system, according to some embodiments.
- FIG. 11 depicts a computer system or computer network, in accordance with some instances of the systems described herein.
- the codebook designs described herein provide robust protections against codebook calling errors, for example, calling of a valid codeword in the codebook based on a detected codeword that has one or more detection errors.
- Detection errors may occur due to crosstalk in imaging channels (e.g., where two dyes have overlapping excitation spectra) or due to autofluorescence. Detection errors may also occur, for example, due to the close proximity of two or more target analytes having fluorescent oligonucleotides configured to emit fluorescence during the same imaging cycle.
- the codebook designs described herein reduce (e.g., minimize) the potential for errors (e.g., calling an incorrect transcript) during decoding.
- the codebook designs described herein reduce (e.g., minimize) the impact of spatial crowding of target molecules within a biological sample (e.g., a tissue specimen) when performing decoding of detected signals from a plurality of imaging rounds.
- the codebooks described herein are referred to as "OR-robust" or "spatial collision robust" codebooks.
- An OR-robust codebook has the property that all or a portion of the valid codewords in the codebook satisfy the property that the Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to an OR-robust radius (i.e., a specified integer value greater than zero).
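The OR-robust property can be verified by brute force on a small codebook. The sketch below computes the smallest Hamming distance between the bitwise ORs of any two distinct pairs of code words, including pairs that share a code word; a codebook is OR-robust for a given radius if this value is at least that radius. The function names and example codebooks are assumptions, and the exhaustive search grows roughly with the fourth power of the codebook size, so it is only practical for illustration.

```python
from itertools import combinations
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def bitwise_or(a: str, b: str) -> str:
    """Logical bitwise OR of two equal-length bit strings."""
    return "".join("1" if x == "1" or y == "1" else "0" for x, y in zip(a, b))


def or_robust_radius(codebook: List[str]) -> int:
    """Minimum Hamming distance between the ORs of any two distinct pairs.

    Requires at least three code words so that two distinct pairs exist.
    """
    ors = [bitwise_or(a, b) for a, b in combinations(codebook, 2)]
    return min(hamming(u, v) for u, v in combinations(ors, 2))


robust = or_robust_radius(["110000", "001100", "000011", "101010"])
collides = or_robust_radius(["1100", "0110", "1010"])
```

In the second codebook every pair ORs to 1110, so overlapping signals from any two analytes would be indistinguishable and the radius is 0.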
- an OR-robust codebook reduces the chance that light signals from any two different analytes in close proximity to one another combine into the same observed codeword (after all imaging cycles are completed) that is ultimately decoded to a valid codeword (with one or more errors in the observed codeword and/or a low quality score) or discarded entirely.
- methods for decoding optical signals can leverage an OR-robust codebook to enable accurate decoding of barcoded target molecules in multiplexed in situ assays even under conditions where the spatial densities of target molecules are high.
- the disclosed methods may comprise: receiving a codebook comprising a plurality of valid code words, where, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words (including the case where the first logical bitwise OR combination and the second logical bitwise OR combination include a common code word) is greater than or equal to a predetermined number (e.g., where the predetermined number is 1, 2, 3, 4, 5, 6, 7, or 8); receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, each image of the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is
- the disclosed methods may comprise: receiving a codebook comprising a plurality of valid code words, where, for at least a first portion of the valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words (including the case where the first logical bitwise OR combination and the second logical bitwise OR combination include a common code word) is greater than or equal to 1; receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, each image of the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
- the disclosed methods may comprise: receiving a codebook comprising a plurality of valid code words, where, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1; receiving a plurality of locations for a plurality of observed optical signals, wherein the plurality of observed optical signals are obtained from a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles; decoding the plurality of observed optical signals to obtain a plurality of observed code words at the plurality of locations; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
- methods of generating a codebook for in situ decoding comprising: receiving a plurality of code words, where, for all valid code words, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words (including the case where the first logical bitwise OR combination and the second logical bitwise OR combination include a common code word) is greater than or equal to 1; receiving a list of a plurality of target analytes; and for each target analyte on the list of the plurality of target analytes: assigning the target analyte to at least one of the plurality of code words such that each code word has at most one target analyte assignment, thereby generating the codebook.
- methods for performing in situ decoding comprising: receiving a plurality of images of a biological sample, where the plurality of images comprises images acquired in a plurality of sequencing or probing cycles; detecting, based on the plurality of images, a series of optical signals at one or more locations in the biological sample corresponding to one or more barcoded target analytes; determining, based on the series of optical signals detected in the plurality of images, a code word comprising a series of ON and OFF bits that corresponds to a barcode for one of the one or more barcoded target analytes; and identifying the barcoded target analyte based on a comparison of the determined code word to a codebook, wherein the code word corresponds to a member of a codebook comprising a plurality of code words for which:
- each OR combination is based on two different codewords.
- a pair of OR-combinations can share at most one codeword in the comparison.
- i cannot be equal to m and j cannot be equal to n.
- the codebooks described herein comprise binary codebooks, i.e., codebooks comprising binary code words having a plurality of binary segments (e.g., 4-bit binary segments of the form “bit 1, bit 2, bit 3, bit 4, etc.”) where each ON bit (“1”) indicates that a signal was detected in one of the plurality of optical detection channels of an imaging instrument used to perform the decoding in a given decoding cycle, and each OFF bit (“0”) indicates that no signal was detected in the particular optical detection channel in the given decoding cycle.
- each subsegment (and ultimately, each observed full codeword) is associated with a specific X, Y, Z set of coordinates within an imaged 3D volume.
- Each binary segment may represent an individual imaging cycle of a plurality of imaging cycles. For example, where the color channels associated with the binary segments are “red channel, yellow channel, green channel, blue channel”, a binary segment of “1 0 0 0” indicates that a signal was detected in the red channel and no signal was detected in the yellow, green, or blue channels. When all binary segments are appended together, the resulting string of 1’s and 0’s represents a full binary code word.
- the disclosed OR-robust codebooks satisfy the property that Hamming Distance (CWA | CWB, CWC | CWD) ≥ K for all possible pairwise combinations (or a portion of all possible pairwise combinations) of a list of valid code words (e.g., CWA, CWB, CWC, CWD, etc.), where the notation CWX | CWY denotes a code word derived from the logical bitwise OR combination of code words CWX and CWY, and where K is an integer value greater than zero.
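- As a non-limiting sketch, the OR-robust radius of a small codebook can be computed by brute force as the minimum Hamming distance between the bitwise ORs of every two distinct pairs of code words (pairs sharing one code word included). The function names `hamming` and `or_robust_radius` and the representation of code words as Python integers are assumptions made for illustration only.

```python
from itertools import combinations

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def or_robust_radius(codebook):
    """Minimum Hamming distance between the ORs of any two distinct pairs.

    Requires at least three code words so that two distinct pairs exist.
    """
    pair_ors = [a | b for a, b in combinations(codebook, 2)]
    return min(hamming(x, y) for x, y in combinations(pair_ors, 2))

# Four 8-bit code words with disjoint ON bits: every pair of ORs differs in >= 4 bits.
disjoint = [0b00000011, 0b00001100, 0b00110000, 0b11000000]
print(or_robust_radius(disjoint))   # 4

# Here 0011 | 0101 == 0011 | 0110 == 0111, so the two mixtures cannot be told apart.
colliding = [0b0011, 0b0101, 0b0110]
print(or_robust_radius(colliding))  # 0
```

A codebook satisfies the property for a given K when the returned radius is at least K.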
- code words within the code book are represented in illumination state space. For example, each state may be represented as a letter in the alphabet (e.g., red is state A, yellow is state B, green is state C, blue is state D, and empty is state E).
- each state corresponds to a binary string.
- the state A may be represented as 1000
- the state B may be represented as 0100
- the state C may be represented as 0010
- the state D may be represented as 0001
- the empty (no emission) state may be represented as 0000 (where each bit in the binary string corresponds to a color channel, similar to the binary segments described above).
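- As an illustrative sketch, the state-to-binary mapping listed above can be expressed as a lookup table that converts between the illumination state space and the binary representation. The helper names are hypothetical; only the five state encodings come from the description.

```python
# State encodings as listed above; the dictionary and function names are illustrative.
STATE_TO_BITS = {"A": "1000", "B": "0100", "C": "0010", "D": "0001", "E": "0000"}
BITS_TO_STATE = {bits: state for state, bits in STATE_TO_BITS.items()}

def states_to_bits(states: str) -> str:
    """Expand one state letter per cycle into one 4-bit segment per cycle."""
    return "".join(STATE_TO_BITS[s] for s in states)

def bits_to_states(bits: str) -> str:
    """Collapse 4-bit segments back into state letters.

    Raises KeyError for a segment with more than one ON bit, since such a
    segment does not correspond to a single state.
    """
    return "".join(BITS_TO_STATE[bits[i:i + 4]] for i in range(0, len(bits), 4))

word = states_to_bits("AEC")   # red in cycle 1, empty in cycle 2, green in cycle 3
print(word)                    # 100000000010
print(bits_to_states(word))    # AEC
```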
- pairs of codewords in the codebook may satisfy Hamming Distance (CWA | CWB, CWC | CWD) > 0 for many combinations of two pairs of the code words, which means that if one observes a signal corresponding to CWA | CWB (e.g., if a first target molecule, such as a first rolling circle product (RCP), labeled with a barcode corresponding to code word CWA in a given decoding cycle is very close to a second target molecule, such as a second RCP, labeled with a barcode corresponding to code word CWB in that decoding cycle), then the signal from the first and second target molecules may be indistinguishable from a signal corresponding to CWD, and therefore may not be accurately decoded with suitable confidence.
- an OR-robust codebook can be designed for use with high-plexy analysis (e.g., a 2k gene panel, a 5k gene panel, or a whole transcriptome panel), gene panels where at least one gene in the panel is a highly expressed gene (signals from highly expressed genes can cause spatial crowding or overpower signals from lesser expressed genes), and/or protein panels (protein can appear diffuse thus causing spatial crowding).
- code words CWA and CWB are two different code words in the codebook (e.g., code words CWA and CWB are not the same string of letters or bits) and code words CWC and CWD are two different code words in the codebook (e.g., code words CWC and CWD are not the same string of letters or bits).
- code words CWA and CWB are a first pair of code words and code words CWC and CWD are a second pair of codewords that is different from the first pair of codewords.
- code words CWA and CWC may be the same code word, but code words CWB and CWD are different code words such that the first pair of codewords is different from the second pair of codewords.
- code word CWA is not equal to code words CWB, CWC, and CWD;
- code word CWB is not equal to code words CWA, CWC, and CWD; and
- code word CWC is not equal to code words CWA, CWB, and CWD (thus, code word CWD is not equal to code words CWA, CWB, and CWC).
- an OR-robust codebook can be generated by starting with at least one arbitrary code word (e.g., one, two, or three code words).
- the at least one arbitrary starting code words pass at least one validation check.
- a validation check may include a check that the code word has at least a specific number of ON bits or exactly a specific number of ON bits.
- the at least two starting code words may be separated by at least a predetermined Hamming distance (e.g., a HD of at least 6).
- two code words are arbitrarily selected having at least a predetermined edit distance (e.g., Hamming distance) from each other.
- for two code words having 60 total bits (15 segments of 4 bits) and a maximum of five ON bits in any given code word, the maximum possible Hamming distance between these two code words is 10 (i.e., none of the five ON bits in the first code word overlaps with any of the five ON bits in the second code word, meaning that 10 edits are required to transform one code word into the other).
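- The arithmetic above can be checked with a short sketch. The helper `word` and the particular ON-bit positions are chosen purely for illustration; for two code words with five ON bits each, the Hamming distance equals 2 × (5 − number of shared ON bits).

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))

def word(on_positions, length=60):
    """Build a bit string of the given length with ON bits at on_positions."""
    return "".join("1" if i in on_positions else "0" for i in range(length))

w1 = word({0, 4, 8, 12, 16})      # five ON bits
w2 = word({21, 25, 29, 33, 37})   # five ON bits, none shared with w1
w3 = word({0, 4, 8, 12, 21})      # shares four ON bits with w1

print(hamming(w1, w2))  # 10, the maximum for two weight-5 code words
print(hamming(w1, w3))  # 2
```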
- two code words are selected having at least a predetermined edit distance (e.g., Hamming distance) from each other.
- any codewords beyond the second are selected such that the codeword has a predetermined edit distance (e.g., Hamming distance) from all other codewords, and the logical bitwise OR between any two pairs of codewords has a predetermined edit distance (e.g., Hamming distance) from one another.
- two arbitrary codewords are selected (with sufficiently high edit distance from each other) to start, but the third codeword is no longer arbitrary and is selected so that the OR-robust property is not violated.
- pairs of pairs can be created: HD(CWA | CWB, CWA | CWC) > K; HD(CWA | CWB, CWB | CWC) > K; HD(CWA | CWC, CWB | CWC) > K.
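- As an illustrative sketch, the pairs of pairs that must be compared can be enumerated with `itertools.combinations`. The function name `pairs_of_pairs` is hypothetical. For three code words there are three pairs and therefore three pairs of pairs; for n code words the count grows as C(C(n, 2), 2).

```python
from itertools import combinations

def pairs_of_pairs(names):
    """All unordered pairs of distinct unordered code word pairs."""
    pairs = list(combinations(names, 2))
    return list(combinations(pairs, 2))

for first, second in pairs_of_pairs(["CWA", "CWB", "CWC"]):
    print(f"HD({first[0]} | {first[1]}, {second[0]} | {second[1]})")
# HD(CWA | CWB, CWA | CWC)
# HD(CWA | CWB, CWB | CWC)
# HD(CWA | CWC, CWB | CWC)

print(len(pairs_of_pairs(["CWA", "CWB", "CWC", "CWD"])))  # 15
```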
- new codewords are generated using a random generator.
- new codewords are generated using a deterministic generator having one or more properties (i.e., generating codewords in a specific, useful order).
- new codewords are generated to have a predetermined number of overlapping bits with codewords already present in the codebook (e.g., new codewords have a same number of overlapping bits or states with codewords already present in the codebook, new codewords have a maximum number of overlapping bits or states with codewords already present in the codebook).
- some methods for generating new codewords (e.g., as described above) will allow for larger OR-robust codebooks to be generated.
- a new candidate code word is generated (e.g., randomly generated) and tested to determine whether adding the new candidate code word to the codebook satisfies the Hamming Distance property and the OR-robust property. If adding the new candidate code word violates the Hamming Distance property (i.e., the candidate code word is less than a specific HD away from at least one other valid code word in the codebook), then the candidate code word can be discarded. If adding the new code word to the codebook satisfies the OR-robust property, the new code word is added to the codebook as a valid code word; otherwise, the new code word is discarded, and another new code word is randomly generated and tested.
- This process may be repeated until a codebook having a predetermined number of valid code words is generated or no new valid codewords can be found after a predetermined number of attempts (e.g., 1 million trials) (such that the codebook satisfies the OR-robust property for all pairs of valid codewords).
- the process is repeated until no more codewords can be added to the codebook without violating the OR-robust property (or any other suitable constraint, such as a predetermined number of OR-robust codewords has been achieved).
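- The generate-and-test loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: code words are Python integers with a fixed number of ON bits, and the names `random_word`, `generate_codebook`, `min_hd`, `or_radius`, `target_size`, and `max_attempts` are hypothetical.

```python
import random
from itertools import combinations

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def random_word(n_bits: int, weight: int, rng: random.Random) -> int:
    """Random code word with exactly `weight` ON bits."""
    word = 0
    for pos in rng.sample(range(n_bits), weight):
        word |= 1 << pos
    return word

def generate_codebook(n_bits, weight, min_hd, or_radius, target_size,
                      max_attempts, seed=0):
    rng = random.Random(seed)
    book, pair_ors = [], []   # pair_ors caches the ORs of all accepted pairs
    for _ in range(max_attempts):
        if len(book) >= target_size:
            break
        cand = random_word(n_bits, weight, rng)
        # Hamming Distance property against every valid code word.
        if any(hamming(cand, w) < min_hd for w in book):
            continue
        # OR-robust property: new ORs vs. cached ORs, and new ORs vs. each other.
        new_ors = [cand | w for w in book]
        if any(hamming(x, y) < or_radius for x in new_ors for y in pair_ors):
            continue
        if any(hamming(x, y) < or_radius for x, y in combinations(new_ors, 2)):
            continue
        book.append(cand)
        pair_ors.extend(new_ors)
    return book

book = generate_codebook(n_bits=20, weight=3, min_hd=4, or_radius=2,
                         target_size=6, max_attempts=2000, seed=1)
print(len(book), [format(w, "020b") for w in book])
```

The loop stops either when the target size is reached or when the attempt budget is exhausted, mirroring the two stopping conditions described above.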
- the process for generating new code words is generalized as a search (e.g., a depth-first search, breadth-first search, or informed search).
- the search algorithm implements back-tracking (e.g., where a codeword, such as a newly added codeword, is removed and another new codeword is generated that allows for a larger final OR-robust codebook).
- the new codeword generation algorithm finds a codeword CW_a, that can be added to the codebook without violating any constraints, but then the algorithm determines that no more codewords can be added beyond that codeword CW_a, giving a final OR-robust codebook size of 100.
- the algorithm may perform backtracking by removing at least codeword CW_a from the codebook.
- the algorithm determines a new codeword CW_b can be added to the codebook (after removing at least codeword CW_a), and that codeword CW_c can also be added with codeword CW_b resulting in a codebook with at least 101 codewords, as opposed to 100 codewords if CW_a was chosen.
- backtracking allows for the new codeword generation algorithm to go back arbitrarily far (i.e., remove arbitrarily up to a predetermined number of codewords from the codebook) to explore if a denser packing of OR-robust codewords is possible, thereby generating a larger OR-robust codebook.
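- A depth-first search with backtracking can be sketched as below for a small, explicit list of candidate code words. This is an assumption-laden illustration (exhaustive search is only practical for tiny candidate sets), and the names `is_valid` and `largest_codebook` are hypothetical.

```python
from itertools import combinations

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def is_valid(book, min_hd, or_radius):
    """Check the Hamming distance and OR-robust properties for a whole book."""
    if any(hamming(a, b) < min_hd for a, b in combinations(book, 2)):
        return False
    ors = [a | b for a, b in combinations(book, 2)]
    return all(hamming(x, y) >= or_radius for x, y in combinations(ors, 2))

def largest_codebook(candidates, min_hd, or_radius):
    best = []

    def search(start, book):
        nonlocal best
        if len(book) > len(best):
            best = list(book)
        for i in range(start, len(candidates)):
            # Prune: even adding every remaining candidate cannot beat `best`.
            if len(book) + (len(candidates) - i) <= len(best):
                break
            book.append(candidates[i])
            if is_valid(book, min_hd, or_radius):
                search(i + 1, book)
            book.pop()   # backtrack: remove the code word and try another

    search(0, [])
    return best

# All 6-bit code words with exactly two ON bits.
candidates = [(1 << i) | (1 << j) for i, j in combinations(range(6), 2)]
best = largest_codebook(candidates, min_hd=2, or_radius=1)
print(len(best), [format(w, "06b") for w in best])
```

The first descent of the search reproduces the greedy result; later branches revisit earlier choices, which is what allows a larger codebook to be found when one exists.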
- the OR-robust property may be imposed in addition to a conventional edit distance criterion, e.g., that the Hamming Distance for any pairwise combination of the base code words CWA and CWB is at least a predetermined distance parameter Q.
- the predetermined distance parameter Q is two times the number of single errors that can be detected and corrected plus 1.
- This edit distance criterion is defined as follows: Hamming Distance (CWA, CWB) ≥ Q (where Q is an integer value of greater than or equal to 1, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc.).
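- As a hedged illustration of the relationship between the distance parameter Q and error correction, the sketch below computes the minimum pairwise Hamming distance of a toy codebook, derives the number of correctable single-bit errors k from Q ≥ 2k + 1, and decodes a corrupted word to its nearest valid code word. The function names and the toy code words are assumptions.

```python
from itertools import combinations

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def min_distance(codebook) -> int:
    """Smallest Hamming distance over all pairs of code words."""
    return min(hamming(a, b) for a, b in combinations(codebook, 2))

def correctable_errors(q: int) -> int:
    """Largest k such that 2k + 1 <= q."""
    return (q - 1) // 2

def nearest(observed: int, codebook) -> int:
    """Valid code word closest to the observed word in Hamming distance."""
    return min(codebook, key=lambda w: hamming(observed, w))

book = [0b00000111, 0b00111000, 0b11000001]
q = min_distance(book)
print(q, correctable_errors(q))             # 4 1

observed = 0b00000101                       # first code word with one bit dropped
print(format(nearest(observed, book), "08b"))  # 00000111
```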
- a Hamming weight (HW) of a codeword is equal to the number of ON bits in that codeword.
- all codewords in the codebook have the same Hamming weight.
- at least some codewords in the codebook have a different Hamming weight (i.e., not all codewords in the codebook have the same Hamming weight).
- a codebook may have 100 codewords having a HW of 5 and 50 codewords having a HW of 6.
- Additional constraint criteria include, but are not limited to, the conventional Hamming distance criterion (Hamming Distance (CWA, CWB) ≥ Q), a maximum number of ON bits allowed per decoding cycle or code word segment (e.g., 1 ON bit for a 4-bit segment), a maximum number of ON bits allowed per code word (e.g., 4 ON bits per a 60-bit code word corresponding to a 15 cycle 4 color imaging decoding process, 5 ON bits per a 60-bit code word corresponding to a 15 cycle 4 color imaging decoding process, 6 ON bits per a 60-bit code word corresponding to a 15 cycle 4 color imaging decoding process, or 7 ON bits per a 60-bit code word corresponding to a 15 cycle 4 color imaging decoding process), exclusion of code words from a predetermined list of selected code words, etc.
- a constraint imposed is that all codewords have at least one bit on in each of the optical channels (e.g., one of the four optical channels).
- a constraint imposed is that ON-bits from adjacent cycles cannot occupy the same color channel (e.g., if codeword CWA has an ON-bit in the red color channel in cycle 1 or has a state space assigned to the “red” state, then codeword CWA will not have an ON-bit in cycle 2 in the red color channel or have a state space assigned to the “red” color channel in cycle 2).
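- The single-code-word constraints described above can be sketched as a validation function. The function names, default limits, and the bit-string representation are illustrative assumptions; the three checks correspond to the maximum ON bits per code word, the maximum ON bits per segment, and the rule that ON bits in adjacent cycles cannot occupy the same color channel.

```python
def segments(word: str, n_channels: int = 4):
    """Split a bit string into one segment per imaging cycle."""
    return [word[i:i + n_channels] for i in range(0, len(word), n_channels)]

def passes_single_word_checks(word: str, n_channels: int = 4,
                              max_on_per_segment: int = 1,
                              max_on_per_word: int = 5) -> bool:
    segs = segments(word, n_channels)
    if word.count("1") > max_on_per_word:
        return False
    if any(s.count("1") > max_on_per_segment for s in segs):
        return False
    # ON bits in adjacent cycles must not fall in the same color channel.
    for prev, cur in zip(segs, segs[1:]):
        if any(p == "1" and c == "1" for p, c in zip(prev, cur)):
            return False
    return True

print(passes_single_word_checks("1000" "0100" "0000"))  # True: red, then yellow
print(passes_single_word_checks("1000" "1000"))         # False: red in both cycles
print(passes_single_word_checks("1100" "0000"))         # False: two ON bits in one cycle
```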
- only valid candidate codewords are generated.
- a first candidate code word is selected and a second candidate code word is selected according to the methods described herein.
- the candidate code words can be checked to determine if the specified set of constraint criteria (e.g., the OR-robust criterion, etc.) are met and, if so, an additional (e.g., third, fourth, etc.) candidate code word can be selected and checked against the first two or more code words to see if the specified set of constraint criteria are still met for each pair, etc.
- the first layer includes constraints imposed on any single codeword, e.g., a minimum hamming weight, minimum number of channels, maximum number of channels, etc.
- the third codeword (CW_3) is chosen such that CW_3 satisfies all single codeword constraints, in addition to being sufficiently far from CW_1 and CW_2 in Hamming space, and it has to satisfy the third layer of constraints, i.e., the “pairs of pairs of codewords constraints.”
- a first random starting codeword is selected having exactly a specific number of ON bits (e.g., five ON bits).
- testing of candidate codewords against all valid codewords is a computationally expensive process.
- testing candidate codewords for inclusion in the list of valid codewords can be sped up so that a candidate codeword does not need to be tested against all valid codewords to ensure that adding the candidate codeword does not cause the codebook to violate the OR-robust property.
- a test is performed to determine if adding the candidate codeword to the codebook will violate the OR-robust constraint.
- the test is to bitwise OR the candidate codeword with all valid codewords in the codebook and compare the OR-ed candidates against bitwise ORs of all pairs of valid codewords.
- the time complexity of this test is quadratic time and, thus, computationally expensive.
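- A brute-force version of the test described above can be sketched as follows. The function name `violates_or_robust` is hypothetical, and the additional comparison of the new ORs against one another (pairs that share the candidate) is included here as an assumption so that the check covers every pair of pairs involving the candidate.

```python
from itertools import combinations

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def violates_or_robust(candidate: int, codebook, radius: int) -> bool:
    """True if adding `candidate` would break the OR-robust radius."""
    existing_ors = [a | b for a, b in combinations(codebook, 2)]
    new_ors = [candidate | w for w in codebook]
    # Every OR-ed candidate against every OR of a pair of valid code words.
    if any(hamming(x, y) < radius for x in new_ors for y in existing_ors):
        return True
    # OR-ed candidates against each other (pairs sharing the candidate).
    return any(hamming(x, y) < radius for x, y in combinations(new_ors, 2))

book = [0b00000011, 0b00001100, 0b00110000]
print(violates_or_robust(0b11000000, book, radius=4))  # False
print(violates_or_robust(0b11000000, book, radius=5))  # True
```

For n valid code words this compares n new ORs against n(n − 1)/2 existing ORs, which is why the cost grows quickly with codebook size.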
- the maximum OR robust radius for a codebook of codewords having five ON bits per codeword is 20.
- an OR-robust radius is selected for a codebook that is 2, 3, 4, 5, 6, 7, 8, 9, or 10.
- a constraint of a higher OR-robust radius will generate a codebook that does not have enough codewords for a given purpose, for example, for a whole transcriptome analysis.
- an OR-robust radius of about 2 to about 10 will generate codebooks having a suitable number of valid codewords for in situ analysis.
- a process is provided to efficiently check if adding a candidate codeword to the codebook would violate the or_robust_radius constraint. To solve this problem in less than quadratic time, we can take advantage of the special structure of our codewords and our codebook.
- a property of the codebook is that all codewords in the codebook are at least Hamming distance of 6 apart. In various embodiments, all codewords have exactly a specific number of ON bits set (e.g., five ON bits).
- a faster algorithm using knowledge of the codebook properties described above is as follows:
- this algorithm reduces the time complexity of OR-robust validation of candidate codewords because the minimum Hamming distance between a first pair of ORed codeword pairs is a function of the number of set bits that each codeword in an ORed second pair has with the ORed first pair.
- FIGS. 1A-1B provide a non-limiting example of a flowchart for a process 100 for generating an OR-robust codebook.
- Process 100 can be performed, for example, using one or more electronic devices implementing software configured to perform the process 100.
- process 100 is performed using a client-server system, and the blocks of process 100 are divided up between the server and multiple client devices.
- although portions of process 100 are described herein as being performed by particular devices of a client-server system, it will be appreciated that process 100 is not so limited.
- process 100 is performed using only a client device or only multiple client devices.
- some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally omitted.
- additional steps may be performed in combination with the process 100. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.
- a plurality of candidate code words are generated randomly (e.g., by one or more processors of a system configured to perform the process illustrated in FIG. 1A).
- the randomly generated set of candidate code words may comprise at least 1,000, 5,000, 10,000, 20,000, 40,000, 60,000, 80,000, 100,000, or more than 100,000 unique candidate code words.
- the candidate code words (and the selected set of filtered code words) may comprise binary code words, e.g., code words comprising a series (or string) of binary values (i.e., “1” or “0”).
- the candidate binary code words may comprise code words of at least 20 bits, 40 bits, 60 bits, 80 bits, 100 bits, 120 bits, 140 bits, 160 bits, 180 bits, or more than 180 bits in length.
- a codebook comprising binary code words of length 20 bits may include up to 1,048,576 unique code words
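- The count above follows from 2^20 = 1,048,576. As an illustrative aside, the sketch below also counts how many code words remain under one example combination of constraints (exactly five ON bits in a 60-bit code word of 15 four-bit segments, with at most one ON bit per segment); that particular combination and the function name `count_words` are assumptions chosen for illustration.

```python
from math import comb

# Unconstrained binary code words of length 20 bits.
print(2 ** 20)  # 1048576

def count_words(n_segments: int, n_channels: int, n_on: int) -> int:
    """Code words with exactly n_on ON bits and at most one ON bit per segment:
    choose which segments are ON, then one channel within each chosen segment."""
    return comb(n_segments, n_on) * n_channels ** n_on

print(count_words(15, 4, 5))  # 3075072
```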
- the candidate code words (and the selected set of filtered code words) may comprise a series of code word segments, e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more than 20 code word segments, where each code word segment comprises, e.g., 2, 3, 4, or more than 4 bits.
- Each code word segment may represent a unique imaging cycle of a plurality of imaging cycles where a sample is imaged in a plurality of color channels (e.g., red, yellow, green, and blue color channels).
- a binary code word of total length 60 bits includes 15 code word segments of 4 bits each.
- a binary code word of total length 80 bits includes 20 code word segments of 4 bits each.
- more than 100 imaging cycles are supported (e.g., enough imaging cycles to analyze a full transcriptome).
- the candidate code words may optionally be filtered to remove code words that don’t conform to, e.g., a constraint on a maximum number of ON bits per code word segment.
- a maximum number of ON bits per code word segment may be 1 bit, 2 bits, 3 bits, 4 bits, 5 bits, 6 bits, 7 bits, 8 bits, or more than 8 bits depending on the length of the code word segment.
- the candidate code words may optionally be filtered to remove code words that don’t conform to one or more additional constraints, e.g., a maximum number of ON bits allowed per code word (e.g., 5, 6, 7, 8, 9, 10, or more than 10 ON bits depending on the length of the code word), exclusion of code words from a predetermined list of selected code words, etc.
- the plurality of candidate code words may optionally be filtered to remove candidate code words that don’t conform to a specified edit distance criterion, e.g., a criterion that Hamming Distance (CWA, CWB) ≥ Q for all pairwise combinations with other candidate code words of the plurality, where CWA and CWB are candidate code words, and predetermined distance parameter Q is an integer having a value of greater than or equal to 1.
- Q is 2k+1, where k is an integer having a value greater than or equal to 1.
- Hamming distance is a special case of an edit distance, a class of metrics used to compare and evaluate distances between two character strings, which allow for three kinds of edit operations to be performed on the characters of one string to transform it into the other string (e.g., substitution, insertion, or deletion of a single character).
- Other examples of edit distances include the Longest Common Subsequence Distance (LCSD) and the Levenshtein distance (LevD).
- the Levenshtein distance allows for deletion, insertion and substitution.
- the Longest Common Subsequence Distance allows for insertion and deletion, but not substitution.
- the Hamming distance allows only substitution, and hence only applies to strings of the same length.
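- The three edit distances compared above can be sketched with standard dynamic programming. These are textbook formulations written for illustration; the function names are not taken from the disclosure.

```python
def hamming(a: str, b: str) -> int:
    """Substitutions only; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def lcs_distance(a: str, b: str) -> int:
    """Insertions and deletions only: len(a) + len(b) - 2 * LCS length."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return len(a) + len(b) - 2 * prev[-1]

print(hamming("ab", "ac"), levenshtein("ab", "ac"), lcs_distance("ab", "ac"))  # 1 1 2
print(levenshtein("kitten", "sitting"), lcs_distance("kitten", "sitting"))     # 3 5
```

A single substitution costs 1 under the Hamming and Levenshtein distances but 2 under the LCS distance, because it must be expressed as one deletion plus one insertion.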
- the use of a higher value of k provides greater error detection and correction capability for overcoming noisy signal detection during image-based decoding.
- the minimum acceptable value of k is determined based on the observed signal detection error rate for a given instrument used to perform decoding.
- the remaining candidate code words may be filtered to remove candidate code words that don’t conform to another specified edit distance criterion, e.g., a criterion that Hamming Distance (CWA | CWB, CWC | CWD) ≥ K for all logical bitwise OR combinations of any two candidate code words in a plurality of remaining candidate code words, where CWA | CWB indicates the logical bitwise OR combination of code words CWA and CWB, CWC | CWD indicates the logical bitwise OR combination of code words CWC and CWD, and K is an integer having a value greater than or equal to 1. In some instances, K may be equal to 1, 2, 3, 4, 5, 6, 7, 8, etc.
- the value of K is selectable by a user during design of the codebook.
- the Hamming distance between two pairs of pairs of codewords is between 0 and the sum of their Hamming weights (inclusive).
- the maximum OR-robust Hamming distance between pairs of pairs of codewords is 20 (i.e., none of the four codewords share any bit with any other codeword).
- the maximum OR-robust Hamming distance between pairs of pairs of codewords is 24.
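- The bounds above can be illustrated with four code words of five ON bits each whose ON bits are mutually disjoint. The helper `word` and the chosen bit positions are illustrative assumptions.

```python
def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def word(on_positions) -> int:
    """Integer code word with ON bits at the given positions."""
    w = 0
    for p in on_positions:
        w |= 1 << p
    return w

# Four code words, five ON bits each, no ON bit shared between any two.
A, B, C, D = (word(range(start, start + 5)) for start in (0, 5, 10, 15))

print(hamming(A | B, C | D))  # 20: upper bound, the sum of the four Hamming weights
print(hamming(A | B, A | C))  # 10: the shared code word A cancels out
print(hamming(A | B, A | B))  # 0: lower bound
```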
- a first portion of the plurality of code words in the codebook satisfies a constraint that the Hamming Distance (CWA | CWB, CWC | CWD) ≥ K1 for all logical bitwise OR combinations of any two candidate code words in the first portion
- a second portion of the plurality of code words in the codebook satisfies a constraint that the Hamming Distance (CWA | CWB, CWC | CWD) ≥ K2 for all logical bitwise OR combinations of any two candidate code words in the second portion, where K1 ≠ K2.
- the values of K1 and K2 are selectable by a user during design of the codebook.
- Such codebooks comprising a first portion of code words and a second portion of code words that satisfy different OR-robust constraints may be useful, for example, in situations where it is desirable to decode a first set of genes/transcripts with higher accuracy than a second set; such codebooks may use OR-robust code words that have a higher value of K (i.e., a stronger OR-robust criterion) for the first portion of code words than for the remaining code words.
- the remaining candidate code words may optionally be filtered to remove candidate code words that don’t conform to, e.g., a constraint on a maximum number of ON bits per code word segment.
- a maximum number of ON bits per code word segment may be 1 bit, 2 bits, 3 bits, 4 bits, 5 bits, 6 bits, 7 bits, 8 bits, or more than 8 bits depending on the length of the code word segment.
- the remaining candidate code words may optionally be filtered to remove candidate code words that don’t conform to one or more additional constraints, e.g., a maximum number of ON bits allowed per code word (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 ON bits depending on the length of the code word), exclusion of code words from a predetermined list of selected code words, etc.
- an OR-robust codebook is output that comprises a plurality of selected code words that meet the specified list of constraints.
- each code word of the plurality of selected code words comprises M x N bits, where M is a number of sequencing or probing cycles and N is a number of optical detection channels in an instrument configured to perform the in situ decoding.
- the OR-robust codebook may be tailored for a specific in situ detection or sequencing application by assigning one or more code words contained therein to each of a plurality of barcoded target molecules (or target analytes) of interest. In some instances, more than one code word may be assigned to a single barcoded target molecule.
- the code words thus correspond to and represent the physical barcodes (e.g., oligonucleotide barcode sequences) attached to the target molecules in a multiplexed in situ assay, where the relationship between the structure of the code words and the structure of the physical barcodes depends on the read-out method (e.g., hybridization probe-based detection or nucleic acid sequencing) used in the in situ assay.
- the Hamming distance between the logical bitwise OR combinations of any two pairs of codewords will be equal to 1 or 2. In various embodiments, the Hamming distance between the logical bitwise OR combinations of at least one pair of pairs of codewords is 0.
- FIG. 2 provides a non-limiting example of a flowchart for a process 200 for assigning the code words in an OR-robust codebook to a corresponding list of target analytes.
- Process 200 can be performed, for example, using one or more electronic devices implementing a software platform.
- process 200 is performed using a client-server system, and the blocks of process 200 are divided up between the server and multiple client devices.
- although portions of process 200 are described herein as being performed by particular devices of a client-server system, it will be appreciated that process 200 is not so limited. In other examples, process 200 is performed using only a client device or only multiple client devices.
- process 200 some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally omitted. In some examples, additional steps may be performed in combination with the process 200. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.
- a list of code words from an OR-robust codebook is received (e.g., by one or more processors of a system configured to perform the process illustrated in FIG. 2).
- the OR-robust codebook may be, for example, a codebook generated using the process illustrated in FIG. 1 for which valid code words comply with the “OR-robust” constraint that Hamming Distance (CWA | CWB, CWC | CWD) ≥ K for all logical bitwise OR combinations of any two candidate code words in a plurality of remaining candidate code words, where K is an integer having a value of greater than or equal to 1.
- a list of a plurality of target molecules (or target analytes) that are of interest in a particular experiment is received.
- the list may comprise a plurality of nucleic acids.
- the plurality of nucleic acids may comprise a plurality of genes.
- the plurality of nucleic acids may comprise a plurality of RNA transcripts.
- the plurality of target analytes may comprise a plurality of proteins.
- the plurality of target analytes may comprise a combination of nucleic acids (e.g., genes or transcripts) and proteins.
- a target analyte from the list is assigned to at least one code word from the plurality of code words. Step 206 is repeated until all target analytes on the list have been assigned at least one code word from the OR-robust codebook.
- a code word from the codebook may be randomly assigned to each of the one or more barcoded target analytes.
- specific code words from the codebook (e.g., those with the largest OR-robust distances)
- a code word from the codebook may be assigned to each of the one or more barcoded target analytes based on a decision rule designed to ensure that a total number of ON signals detected in a given image of the plurality of images is within ±5%, ±10%, ±15%, or ±20% of a mean number of ON signals detected per image for the plurality of images.
- a code word from the codebook may be assigned to each of the one or more barcoded target analytes based on a decision rule designed to minimize a maximum predicted density of ON signals detected in images of the plurality of images.
- a code word from the codebook may be assigned to each of two or more barcoded target analytes based on expression data for the two or more barcoded target analytes in clustered cell types (e.g., where code words with the largest OR-robust distances are assigned to genes/transcripts with the highest expression levels), and where the clustered cell types represent a distribution of cell types found in the biological sample.
- the expression data for the two or more barcoded target analytes may comprise bulk gene expression data, bulk protein expression data, spatial gene expression data, spatial protein expression data, single cell gene expression data, single cell protein expression data, or any combination thereof.
- the two or more assigned code words are rank-ordered according to code word weight
- the two or more barcoded target analytes are rank-ordered according to a maximum expression level across all clustered cell types
- the two or more rank-ordered code words are assigned to the two or more rank-ordered barcoded target analytes using an iterative process repeated for each of the two or more barcoded target analytes in decreasing order of maximum expression level, the iterative process comprising: computing a predicted density of ON signals for every combination of remaining, unassigned code words and the barcoded target analyte across the plurality of images; selecting a code word from the remaining, unassigned code words that minimizes the predicted density of ON signals across the plurality of images; and assigning the selected code word to the barcoded target analyte.
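The iterative assignment described above can be sketched as a greedy loop. This is a minimal illustration under stated assumptions, not the disclosed implementation: each bit of a code word is assumed to correspond to one image (one channel in one cycle), the "predicted density" is modeled as the summed expression of all analytes that are ON in that image, and the code word chosen is the one that keeps the busiest image as sparse as possible. The function name `assign_code_words` and the gene labels are hypothetical.

```python
def assign_code_words(code_words, max_expression):
    """Greedily assign code words (bit strings) to analytes.

    code_words: list of equal-length strings of '0'/'1' (assumed format).
    max_expression: dict mapping analyte name -> maximum expression level
        across clustered cell types (assumed to be precomputed).
    """
    n_bits = len(code_words[0])
    load = [0.0] * n_bits          # predicted ON-signal load per image (bit)
    remaining = list(code_words)   # unassigned code words
    assignment = {}

    # Visit analytes in decreasing order of maximum expression level.
    for analyte in sorted(max_expression, key=max_expression.get, reverse=True):
        level = max_expression[analyte]

        def peak(cw):
            # Assumed objective: the most crowded image if cw were chosen.
            return max(load[i] + (level if cw[i] == "1" else 0.0)
                       for i in range(n_bits))

        best = min(remaining, key=peak)   # ties resolve to the earliest code word
        remaining.remove(best)
        assignment[analyte] = best
        for i in range(n_bits):
            if best[i] == "1":
                load[i] += level
    return assignment


# Hypothetical three-gene panel with four-bit code words.
example = assign_code_words(
    ["1100", "0011", "1010"],
    {"GENE_A": 100, "GENE_B": 50, "GENE_C": 10},
)
print(example)
```

In this toy panel the second most expressed gene receives the code word whose ON bits do not overlap those of the most expressed gene, which is the behavior the decision rule is meant to encourage.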
- the updated OR-robust codebook that includes code word-to-target analyte assignments is output.
- FIG. 3 provides a non-limiting example of a flowchart for a process 300 for decoding optical signals derived from images of a biological sample to identify barcoded target analytes.
- Process 300 can be performed, for example, using one or more electronic devices implementing a software platform.
- process 300 is performed using a client-server system, and the blocks of process 300 are divided up between the server and multiple client devices.
- although portions of process 300 are described herein as being performed by particular devices of a client-server system, it will be appreciated that process 300 is not so limited. In other examples, process 300 is performed using only a client device or only multiple client devices.
- in process 300, some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally, omitted. In some examples, additional steps may be performed in combination with the process 300. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.
- an OR-robust codebook comprising a plurality of valid code words and their corresponding target analytes is received (e.g., by one or more processors of a system configured to perform the process illustrated in FIG. 3).
- the OR-robust codebook may be, for example, a codebook generated using the process illustrated in FIG. 1 for which valid code words comply with the “OR-robust” constraint that Hamming Distance $(CW_A \mid CW_B,\ CW_C \mid CW_D) \geq K$ for all logical bitwise OR combinations of any two candidate code words in a plurality of remaining candidate code words, where K is an integer having a value of greater than or equal to 1.
- all valid code words in the OR-robust codebook may comply with the “OR-robust” constraint that Hamming Distance $(CW_A \mid CW_B,\ CW_C \mid CW_D) \geq K$ for all logical bitwise OR combinations of any two candidate code words in a plurality of remaining candidate code words, where K is an integer having a value of greater than or equal to 1.
- At least a first portion of the valid code words in the OR-robust codebook may comply with the “OR-robust” constraint that Hamming Distance $(CW_A \mid CW_B,\ CW_C \mid CW_D) \geq K$ for all logical bitwise OR combinations of any two candidate code words in a plurality of remaining candidate code words, where K is an integer having a value of greater than or equal to 1.
- the first portion of the valid code words may comprise, for example, at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the valid code words in the OR-robust codebook, and the remaining portion of the valid code words may not comply with the OR-robust property.
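As an illustration of the constraint, the following sketch checks whether a small set of code words is OR-robust for a given K. It is a brute-force check written under assumptions: code words are represented as Python integers used as bit masks, and the constraint is read as applying to every two distinct pairs of code words (pairs that share a code word included). The helper names `hamming`, `min_or_distance`, and `is_or_robust` are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of bit positions in which integers a and b differ."""
    return bin(a ^ b).count("1")


def min_or_distance(code_words):
    """Smallest Hamming distance between the bitwise ORs of two distinct pairs.

    Requires at least three code words so that two distinct pairs exist.
    """
    ors = [a | b for a, b in combinations(code_words, 2)]
    return min(hamming(x, y) for x, y in combinations(ors, 2))


def is_or_robust(code_words, k=1):
    """True if every two distinct pairwise ORs differ in at least k bits."""
    return min_or_distance(code_words) >= k


# One-hot words: the three pairwise ORs are 0011, 0101, 0110 (distance 2 apart).
print(is_or_robust([0b0001, 0b0010, 0b0100], k=2))
# Here every pairwise OR collapses to 0011, so overlapping signals are ambiguous.
print(is_or_robust([0b0011, 0b0001, 0b0010], k=1))
```

The check is quadratic in the number of pairs, so it is only practical for small candidate sets; it is meant to make the definition concrete rather than to suggest how a large codebook would be generated.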
- the codebook may comprise at least 50 valid code words (e.g., at least 50, 75, 100, 200, 300, 400, 500, 600, 700, 800, or 1,000 code words). In some instances, the codebook may comprise up to 300,000 valid code words (e.g., up to 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 120,000, 140,000, 160,000, 180,000, 200,000, 220,000, 240,000, 260,000, 280,000, or 300,000 valid code words).
- a plurality of images of a biological sample is received, where the plurality of images was acquired over a plurality of decoding (sequencing or probing) cycles, and where each image comprises a plurality of observed optical signals.
- the biological sample may comprise a tissue sample.
- the biological sample may comprise cells, e.g., cells derived from a cell culture, a tissue sample, or cells deposited on a surface.
- the biological sample may comprise, e.g., a tissue specimen that has been fixed, embedded, and/or cleared as described elsewhere herein.
- the plurality of images may comprise a plurality of images comprising different fields-of-view of the biological sample.
- one or more images (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 images)
- each decoding (probing or sequencing) cycle as necessary to image the entire cross-sectional area of the biological sample.
- the plurality of images may comprise a plurality of z-stack images of the biological sample.
- a z-stack of images (i.e., a series of images acquired at each of two or more focal planes within the thickness of the biological sample)
- each z-stack of images may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 images acquired at different focal planes within the thickness of the biological sample.
- the plurality of observed optical signals may comprise signal intensity measurements based on the plurality of images. In some instances, the plurality of optical signals may represent light emitted from a plurality of fluorophores.
- the plurality of observed optical signals is decoded to obtain a plurality of observed code words.
- decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words may comprise: determining a location of each observed optical signal in a first image of the plurality of images; aligning the location of each observed optical signal (or corresponding feature, e.g., an RCP derived from a target analyte) in the first image to a corresponding location of each observed optical signal (or corresponding feature, e.g., an RCP derived from a target analyte) in the remaining images of the plurality of images to obtain a series of observed optical signals at each location; and obtaining the plurality of observed code words based on the series of observed optical signals at each location.
- aligning the locations of each optical signal (or corresponding feature, e.g., an RCP derived from a target analyte) in the first image to corresponding locations in the remaining images of the plurality of images may comprise registering the plurality of images acquired over the plurality of decoding (sequencing or probing) cycles.
- the alignment may comprise determining that optical signals derived from features (e.g., RCPs derived from target analytes) in different images arise from the same feature if the features in different images are within about 5, 10, 15, 20, 40, or 50 nm of each other.
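The distance test used during alignment can be sketched as a nearest-neighbor search. This is a minimal illustration that assumes the images have already been registered, that each feature is reduced to an (x, y) centroid in common units, and that a feature in a later image is linked to a reference feature when it is the closest one within a tolerance. The function `link_spots` is hypothetical, and this simple version does not enforce one-to-one matching.

```python
import math


def link_spots(reference, other, max_dist):
    """For each reference (x, y), return the index of the nearest spot in
    `other` that lies within max_dist, or None if there is no such spot."""
    links = []
    for rx, ry in reference:
        best, best_d = None, max_dist
        for j, (ox, oy) in enumerate(other):
            d = math.hypot(ox - rx, oy - ry)
            if d <= best_d:
                best, best_d = j, d
        links.append(best)
    return links


# Two reference features and three detections from a later cycle (toy coordinates).
reference = [(0.0, 0.0), (100.0, 100.0)]
later_cycle = [(100.0, 103.0), (3.0, 4.0), (500.0, 500.0)]
print(link_spots(reference, later_cycle, max_dist=10.0))  # [1, 0]
```

Repeating the search for each cycle yields, for every reference location, the series of observations from which an observed code word is read out.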
- each observed optical signal of the plurality of observed optical signals in each image has associated therewith at least one value.
- the at least one value includes an intensity value of the observed optical signal for each color channel.
- the at least one value includes one or more statistical parameters, such as, for example, mean brightness, median brightness, variance, or standard deviation.
- the at least one intensity value may comprise an analog intensity value.
- an analog intensity value is determined for each light signal (or lack of light signal) in each color channel for each cycle of the plurality of cycles. For example, an analog intensity value may have a range of 0 (no intensity observed) to 12,000 (e.g., a full well capacity of each pixel in the sensor array).
- Full well capacity is defined as the amount of charge that can be stored within an individual pixel without the pixel becoming saturated (when an individual pixel can no longer accept any more photoelectrons). Full well capacity is dependent on the pixel size of the sensor and the camera operating voltages.
- the analog intensity value is a sum of intensities from multiple pixels (e.g., two or more adjacent pixels). In some instances, the analog intensity value is an area under the curve. In some instances, an amplitude of an observed optical signal is a value of the peak of the spot, constrained by the well depth. In some instances, an analog intensity value is determined at a specific position for a detected RCP in each color channel. In some instances, presence of signal is detected separately in each color channel.
- the separately detected signals are combined into a single vector or array.
- an intensity measurement in that channel will be missing or assigned a value of zero or null.
- An exemplary set of analog intensity values for a single RCP detected during a single imaging cycle may be {11000, 100, 0, 0} indicating that a high intensity was detected in the first color channel (e.g., red color channel), a small amount of intensity was detected in a second color channel (e.g., yellow color channel), and no intensity was detected in the third (e.g., green color channel) and fourth color channels (e.g., blue color channel).
- the intensity values are binned into a single bin of a plurality of bins, where each bin represents a range of intensity values.
- the plurality of bins includes more than 2 bins.
- the plurality of bins includes 3 bins, 4 bins, 5 bins, 6 bins, 7 bins, 8 bins, 9 bins, 10 bins, 11 bins, 12 bins, 13 bins, 14 bins, 15 bins, 16 bins, 17 bins, 18 bins, 19 bins, 20 bins, 21 bins, 22 bins, 23 bins, 24 bins, 25 bins, etc.
- the plurality of bins includes more than 25 bins.
- the plurality of bins includes up to 100 bins.
- the plurality of bins may include 4 bins as follows: Bin 0 is 0 to 2999; Bin 1 is 3000 to 5999; Bin 2 is 6000 to 8999; Bin 3 is 9000 to 12000.
- intensity values from 0 to 2999 are binned into Bin 0, intensity values from 3000 to 5999 are binned into Bin 1, intensity values from 6000 to 8999 are binned into Bin 2, and intensity values from 9000 to 12000 are binned into Bin 3.
- the bins of the plurality of bins have approximately equal sizes (as per the example above).
- the bins of the plurality of bins have different sizes, for example, where Bin 0 is 0 to 1999; Bin 1 is 2000 to 4999; Bin 2 is 5000 to 8999; Bin 3 is 9000 to 12000.
- each intensity value in a set of intensity values is represented by the number of the bin into which that intensity value is placed.
- An exemplary set of binned intensity values for a single RCP detected during a single imaging cycle may be {3, 1, 0, 0} indicating that a high intensity was detected in the first color channel (e.g., red color channel), a small amount of intensity was detected in a second color channel (e.g., yellow color channel), and no intensity was detected in the third (e.g., green color channel) and fourth color channels (e.g., blue color channel).
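The binning described above can be sketched with a sorted list of bin boundaries. The two boundary lists below encode the equal-size and unequal-size four-bin examples from the text; the function name `bin_intensities` and the sample intensities are hypothetical.

```python
from bisect import bisect_right

# Lower edges of Bins 1, 2, 3; Bin 0 starts at 0 and Bin 3 ends at 12000.
EQUAL_EDGES = [3000, 6000, 9000]     # 0-2999, 3000-5999, 6000-8999, 9000-12000
UNEQUAL_EDGES = [2000, 5000, 9000]   # 0-1999, 2000-4999, 5000-8999, 9000-12000


def bin_intensities(values, edges=EQUAL_EDGES):
    """Replace each analog intensity with the number of the bin it falls into."""
    return [bisect_right(edges, v) for v in values]


# One RCP, one cycle, four color channels (hypothetical intensities).
print(bin_intensities([11000, 3500, 0, 0]))         # [3, 1, 0, 0]
print(bin_intensities([2500], edges=UNEQUAL_EDGES))  # [1]
```

Using `bisect_right` places a value that sits exactly on a boundary (for example 3000) into the higher bin, which matches the ranges listed in the text.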
- the at least one intensity value may comprise a raw intensity value, a normalized intensity value, or a calculated intensity value calculated based on at least one of: a size of a feature corresponding to the observed optical signal (e.g., the radius of an imaged RCP), a circularity of a feature corresponding to the observed optical signal (e.g., the circularity of an imaged RCP), or one or more Gaussian statistical parameters (e.g., mean, standard deviation, variance, etc.) characterizing a feature corresponding to the observed optical signal (e.g., an imaged RCP).
- pixel intensity values in an image are normalized based on pixel intensity of background signals and pixel intensity of puncta detected within the image.
- pixel values of an image are scaled using a background measurement (e.g., a mean or median of background intensities) as a floor and a predetermined intensity percentile (e.g., 99th intensity percentile) of the detected puncta as a ceiling.
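The floor-and-ceiling scaling can be sketched as follows. This is an illustration under assumptions: the median is used as the background measurement, the percentile is computed with the nearest-rank method, and scaled values are clipped to the range 0 to 1 (clipping is not stated in the text). The helper names `percentile` and `normalize` are hypothetical.

```python
import math
from statistics import median


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def normalize(pixels, background, puncta, pct=99):
    """Scale pixel values using the background as floor and a puncta
    intensity percentile as ceiling; results are clipped to [0, 1]."""
    floor = median(background)
    ceiling = percentile(puncta, pct)
    span = ceiling - floor
    if span <= 0:
        raise ValueError("ceiling must exceed the background floor")
    return [min(1.0, max(0.0, (p - floor) / span)) for p in pixels]


# Toy values: background near 100, brightest punctum at 1100.
print(normalize([100, 600, 1100, 50, 2000],
                background=[90, 100, 110],
                puncta=[500, 1100, 900]))
```

With these toy inputs the floor is 100 and the ceiling is 1100, so a pixel value of 600 maps to 0.5 and values outside the floor-to-ceiling range saturate at 0 or 1.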
- intensity values are normalized.
- the intensity values of puncta (e.g., observed optical signals)
- a high percentile value (e.g., 99th percentile)
- the values are scaled by the median raw intensity over all images to bring the values back into an intensity range similar to the original observed values.
- in a third step, for every decoding neighborhood of puncta (e.g., a predetermined radius around a specific punctum), the intensity values of all puncta are divided by the intensity value of the central punctum of the neighborhood, so that systematically dimmer puncta may still decode while variance in the brightness values is penalized.
- the first and/or second steps may be omitted.
- the third step reduces FOV-to-FOV decoding variability ("global decoding").
- the process may further comprise comparing the at least one intensity value representing an intensity of each observed optical signal to a predetermined intensity threshold to determine a binary value representing the intensity of each observed optical signal.
- each binary value comprises a 1 or a 0, wherein 1 represents an observed optical signal for which intensity is greater than or equal to the predetermined intensity threshold (e.g., an ON signal), and 0 represents an observed optical signal for which intensity is less than the predetermined intensity threshold (e.g., an OFF signal).
- decoding the plurality of observed optical signals in the plurality of images may comprise obtaining the plurality of observed code words based on a series of binary values determined for each location.
- each observed code word of the plurality of code words may comprise a plurality of code word segments, and each code word segment may comprise a specified string of binary values that corresponds to one of a specified set of observed optical signal states.
- each code word segment comprises, for example, a four bit string of binary values such that:
- a code word segment of 1000 corresponds to a first optical signal state, A, in which an optical signal is detected in a first detection channel of a four-channel optical imaging instrument, and no optical signal is detected in a second, third, or fourth detection channel of the four-channel optical imaging instrument;
- a code word segment of 0100 corresponds to a second optical signal state, B, in which an optical signal is detected in the second detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, third, or fourth detection channel of the four-channel optical imaging instrument;
- a code word segment of 0010 corresponds to a third optical signal state, C, in which an optical signal is detected in the third detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, second, or fourth detection channel of the four-channel optical imaging instrument;
- a code word segment of 0001 corresponds to a fourth optical signal state, D, in which an optical signal is detected in the fourth detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, second, or third detection channel of the four-channel optical imaging instrument; and
- a code word segment of 0000 corresponds to a fifth optical signal state, E, in which no optical signal is detected in any of the first, second, third, or fourth detection channels of the four-channel optical imaging instrument.
- the valid code words in the codebook are stored in a database using the optical signal state format.
- a valid code word for a 15-cycle run may be AEEEDEEBEEEAECE.
- the observed optical signal is converted into the optical signal state format before assigning the observed optical signal to a valid code word.
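The thresholding step and the conversion between the binary format and the optical signal state format can be sketched as below. The five segment-to-state pairs come from the text; the threshold of 6000, the function names, and the use of "?" for a segment that matches none of the five states are assumptions made for illustration.

```python
SEGMENT_TO_STATE = {"1000": "A", "0100": "B", "0010": "C", "0001": "D", "0000": "E"}
STATE_TO_SEGMENT = {state: seg for seg, state in SEGMENT_TO_STATE.items()}


def threshold_segment(intensities, threshold):
    """Binary code word segment: 1 (ON) if intensity >= threshold, else 0 (OFF)."""
    return "".join("1" if v >= threshold else "0" for v in intensities)


def states_to_bits(states):
    """Expand a state-format code word (e.g., 'AEE...') into its bit string."""
    return "".join(STATE_TO_SEGMENT[s] for s in states)


def bits_to_states(bits, n_channels=4):
    """Collapse a bit string into state letters, one per n_channels-bit segment."""
    segments = [bits[i:i + n_channels] for i in range(0, len(bits), n_channels)]
    return "".join(SEGMENT_TO_STATE.get(seg, "?") for seg in segments)


segment = threshold_segment([11000, 100, 0, 0], threshold=6000)
print(segment, bits_to_states(segment))          # 1000 A
print(len(states_to_bits("AEEEDEEBEEEAECE")))    # 60
```

The 15-cycle example code word expands to 60 bits (15 segments of 4 bits each), and converting back recovers the original state string.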
- a probabilistic method of decoding is used to map a set of intensity values (e.g., binary intensity values, analog intensity values, binned intensity values, etc.) from each cycle of the plurality of cycles to an observed optical signal state.
- a frequency table is generated using all possible combinations of intensity values for the color channels (e.g., four color channels) such that the frequency table maps each unique set of four intensity values to a most-likely optical signal state (e.g., A, B, C, D, or E).
- the frequency table is generated from previous runs of an opto-fluidic instrument.
- the frequency table is generated using a control sample.
- the frequency table is updated during a run (e.g., after each cycle) or after each run is complete.
- an observed set of binned intensity values {3, 1, 0, 0} may be most likely to map to state A based on the frequency table.
- an observed set of binned intensity values {3, 0, 0, 0} may be most likely to map to state A based on the frequency table.
- an observed set of binned intensity values {0, 1, 3, 1} may be most likely to map to state C based on the frequency table (e.g., the low intensities may be caused by autofluorescence or spectral crosstalk).
- an observed set of binned intensity values {0, 0, 2, 0} may be most likely to map to state C based on the frequency table.
- an observed set of binned intensity values {1, 1, 1, 1} may be most likely to map to state E based on the frequency table.
- an observed set of binned intensity values {0, 0, 0, 0} may be most likely to map to state E based on the frequency table.
- an observed set of binned intensity values {0, 0, 1, 1} may be most likely to map to state E based on the frequency table.
- the optical signal state can be converted into a binary format as described above with respect to the code word segments.
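A frequency table of this kind can be sketched as a lookup built from labelled observations. The text does not specify how the table is populated, so the sketch assumes a list of (binned intensities, known state) pairs, for instance from a control sample, and maps each distinct set of binned values to the state seen most often with it. The function names and the toy training data are hypothetical, and unseen combinations return `None` rather than a guessed state.

```python
from collections import Counter, defaultdict


def build_frequency_table(labelled_observations):
    """Map each tuple of binned intensities to its most frequent state."""
    counts = defaultdict(Counter)
    for binned, state in labelled_observations:
        counts[tuple(binned)][state] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def call_state(table, binned, default=None):
    """Look up the most likely state for one set of binned intensities."""
    return table.get(tuple(binned), default)


# Toy labelled data (hypothetical counts).
training = [
    ((3, 1, 0, 0), "A"), ((3, 1, 0, 0), "A"), ((3, 1, 0, 0), "B"),
    ((0, 1, 3, 1), "C"),
    ((1, 1, 1, 1), "E"),
]
table = build_frequency_table(training)
print(call_state(table, [3, 1, 0, 0]))  # A
print(call_state(table, [0, 1, 3, 1]))  # C
```

Because the table stores raw counts before taking the most frequent state, it can be rebuilt or extended after each cycle or run as further labelled observations are added.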
- each observed code word is analyzed to determine if an assignment of the observed code word to one of the plurality of valid code words from the OR-robust codebook can be made, or alternatively, to determine that the observed code word is not a valid code word.
- Step 308 is repeated until all of the observed code words have been processed and either assigned to valid code words or classified as artifacts resulting from, e.g., non-specific hybridization of a labeled detection probe used for decoding or a sequencing error.
- determining the assignment of the observed code word to one of the plurality of valid code words may comprise identifying a valid code word of the plurality of valid code words that is identical to the observed code word.
- determining the assignment of the observed code word to one of the plurality of valid code words may comprise changing at least one of the binary values in the series of binary values corresponding to the observed code word to thereby assign the observed code word to a valid code word of the plurality of valid code words.
- determining the assignment of the observed code word to one of the plurality of valid code words may comprise determining a plurality of scores based on comparison of the observed code word to all or a portion of the plurality of valid code words.
- determining the assignment of the observed code word to one of the plurality of valid code words may further comprise selecting one of the plurality of valid code words having a highest score to assign as a replacement for the observed code word.
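The assignment options above (exact match, correction by changing bits, and score-based selection) can be combined in one small routine. This sketch makes several assumptions that the text leaves open: the score is the negative Hamming distance, a correction is accepted only within a maximum distance, and an observed code word that is equally close to two valid code words is rejected as ambiguous. The function names and the two-entry codebook are hypothetical.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_observed(observed, valid_code_words, max_distance=1):
    """Return the valid code word assigned to `observed`, or None if the
    observed code word is treated as invalid (too far away or ambiguous)."""
    if observed in valid_code_words:
        return observed                      # exact match
    scored = sorted((hamming(observed, cw), cw) for cw in valid_code_words)
    best_d, best_cw = scored[0]
    if best_d > max_distance:
        return None                          # no valid code word close enough
    if len(scored) > 1 and scored[1][0] == best_d:
        return None                          # tie between candidates
    return best_cw


codebook = {"11000011": "GENE_A", "00111100": "GENE_B"}
for obs in ("11000011", "11000001", "11110000"):
    match = assign_observed(obs, codebook)
    print(obs, "->", codebook.get(match, "not a valid code word"))
```

The second observation differs from the GENE_A code word in a single bit and is corrected to it, while the third is four bits away from either code word and is classified as invalid.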
- the presence (and location) of a barcoded target analyte within the biological sample is identified for each valid code word detected in the plurality of images.
- the identified target analyte may comprise a messenger RNA (mRNA) molecule or protein molecule.
- the process illustrated in FIG. 3 may comprise post-processing of a plurality of stored optical images to obtain the plurality of optical signals (and their respective locations) for subsequent use in determining observed code words and determining their assignment to valid code words in a codebook.
- all or a portion of the process illustrated in FIG. 3 may be performed in the cloud, e.g., using a received plurality of images or received optical signal data (and corresponding location data) previously derived from the plurality of images.
- the process may comprise: receiving a codebook comprising a plurality of valid code words, where, for all or a first portion of valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1; receiving a plurality of locations for a plurality of observed optical signals, where the plurality of observed optical signals are obtained from a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles; decoding the plurality of observed optical signals to obtain a plurality of observed code words at the plurality of locations; and for each observed code word: (i) determining an assignment of the observed code word to one of the plurality of valid code words, or (ii) determining that the observed code word is not a valid code word of the plurality of valid code words.
- FIGS. 4A-4B illustrate the structure of binary code words, in accordance with some implementations of the methods described herein.
- the code words correspond to and represent the physical barcodes (e.g., oligonucleotide barcode sequences) attached to the target molecules in a multiplexed in situ assay, where the relationship between the structure of the code words and the structure of the physical barcodes depends on the read-out method (e.g., hybridization probe-based detection or nucleic acid sequencing) used in the in situ assay.
- FIG. 4A depicts a non-limiting example of the structure of a binary code word for use with hybridization probe-based in situ detection of barcoded target analytes (described in more detail below).
- Each code word comprises a series of code word segments (e.g., 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more than 15 segments), where each code word segment comprises a series of bits (e.g., 2, 3, 4, or more than 4 bits), and where each bit in a given code word segment corresponds to the detection of an ON signal (“1”) or an OFF signal (“0”) in a given optical detection channel in a given decoding cycle.
- the number of bits in each code word segment corresponds to the number of optical detection channels (e.g., different fluorescence emission detection channels or different color detection channels) in an imaging instrument used to perform a cyclical decoding process comprising, e.g., 4, 5,
- each code word segment corresponds to optical signals detected in images acquired in a given decoding cycle after contacting the biological sample (e.g., a tissue specimen) with a set of detectably-labeled hybridization probes designed to hybridize to a segment of the physical barcode (e.g., a segment of the oligonucleotide barcode sequence).
- FIG. 4B depicts a non-limiting example of the structure of a binary code word for use with sequencing-based in situ detection of barcoded target analytes (described in more detail below).
- each code word comprises a series of code word segments (e.g., 4, 5, 6,
- each code word segment comprises a series of bits (e.g., 2, 3, 4, or more than 4 bits), and where each bit in a given code word segment corresponds to the detection of an ON signal (“1”) or an OFF signal (“0”) in a given optical detection channel in a given sequencing cycle.
- the number of bits in each code word segment corresponds to the number of optical detection channels (e.g., different fluorescence emission detection channels or different color detection channels) in an imaging instrument used to perform a cyclical sequencing process comprising, e.g., 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more than 15 sequencing cycles.
- the total number of bits in the binary code word is given by:
- each code word segment corresponds to optical signals detected in images acquired in a given sequencing cycle to determine the identity of a single nucleotide in the physical barcode (e.g., the oligonucleotide barcode sequence).
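The expression for the total number of bits is not reproduced in this text. From the structure described for FIGS. 4A-4B (one code word segment per decoding or sequencing cycle, and one bit per optical detection channel within each segment), the total would presumably be the product of the two counts. The symbols below are notation assumed for illustration only:

```latex
N_{\text{bits}}
  = N_{\text{segments}} \times N_{\text{bits per segment}}
  = N_{\text{cycles}} \times N_{\text{channels}},
\qquad \text{e.g., } 15 \times 4 = 60 .
```

The worked value corresponds to the 15-cycle, four-channel example used elsewhere in this description, in which each cycle contributes one four-bit segment.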
- FIG. 5 provides a non-limiting schematic illustration of hybridization probe-based in situ detection of barcoded target analytes (or amplified representations, e.g., RCPs, thereof), where the barcodes comprise, e.g., oligonucleotide barcode sequences that have been assigned to a corresponding code word from an OR-robust codebook.
- the physical barcode sequences each comprise a series of short barcode (BC) segments (e.g., BC segment 1, BC segment 2, ..., BC segment M) with one barcode segment for each cycle in a cyclical decoding (probing) process (comprising M cycles in total) that is used to decode a set of optical signals associated with each barcode as detected in a plurality of images acquired of a biological sample during the cyclical decoding (probing) process.
- in each decoding (probing) cycle, a set of detectably-labeled hybridization probes (e.g., fluorescently-labeled hybridization probes) that are designed to hybridize to specific barcode segments is introduced into a biological sample (e.g., a tissue specimen that has been fixed, embedded, and/or cleared as described elsewhere herein) and allowed to hybridize to a corresponding barcode segment.
- the number of unique hybridization probes in the set is typically the same as the number of unique barcode segments to be probed in a given decoding (probing) cycle.
- all of the unique hybridization probes in the set may be labeled with a detectable label, e.g., a fluorescent label, where different hybridization probes in the set are labeled with a different fluorophore.
- only a subset of the unique hybridization probes in the set may be labeled with a detectable label, e.g., a fluorophore, where different hybridization probes in the subset are labeled with a different fluorophore.
- the number of different detectable labels, e.g., fluorophores, used in each set of detectably-labeled hybridization probes may be the same for sets used in different cycles of the hybridization probe-based decoding process. In some instances, the number of different detectable labels, e.g., fluorophores, used in each set of detectably-labeled hybridization probes may be different for sets used in different cycles of the hybridization probe-based decoding process.
- the number of different detectable labels, e.g., fluorophores, used in each set of detectably-labeled hybridization probes will depend on factors such as the number of different optical detection channels (e.g., one color, two color, three color, or four color detection) in the instrument used to perform decoding, and the design of the code words used in the multiplexed in situ assay (e.g., in some cases, an absence of signal in a given decoding cycle (i.e., an OFF signal) may be used as part of code word design).
- the total number of unique barcode segments to be probed in a given decoding (probing) cycle may be, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 unique barcode segments (or unique hybridization probes).
- the biological sample is then imaged, and the image(s) (e.g., fluorescence image(s)) are processed using any of a variety of image processing techniques known to those of skill in the art to measure signal intensities at the locations of a plurality of barcoded target molecules (or amplified representations, e.g., RCPs, thereof).
- the plurality of barcoded target molecules may comprise, e.g., at least 10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 5,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, or 100,000 barcoded target molecules (or amplified representations, e.g., RCPs, thereof).
- One or more images comprising different fields-of-view of the biological sample may be acquired in each cycle as necessary to image the entire cross-sectional area of the biological sample.
- a z-stack of images (i.e., a series of images acquired at each of two or more focal planes within the thickness of the biological sample)
- each z-stack of images may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 25, 30, 35, 40, 45, 50, or more than 50 images acquired at different focal planes within the thickness of the biological sample.
- the hybridized probes are stripped from the biological sample and the process is repeated for a specified number of cycles, M.
- FIG. 6 provides a non-limiting schematic illustration of sequencing-based in situ detection of barcoded target analytes (or amplified representations, e.g., RCPs, thereof), where the barcodes comprise, e.g., oligonucleotide barcode sequences that have been assigned to a corresponding code word from an OR-robust codebook.
- the physical barcode sequences each comprise an oligonucleotide sequence of M nucleotides in length, where one nucleotide is to be identified in each cycle in a cyclical decoding (base-by-base nucleic acid sequencing) process (comprising M cycles in total) that is used to decode a set of optical signals associated with each barcode as detected in a plurality of images acquired of a biological sample during the cyclical decoding (sequencing) process.
- base-by- base nucleic acid sequencing comprising M cycles in total
- Any of a variety of base-by-base sequencing techniques known to those of skill in the art may be used to determine barcode sequences multiplexed in situ assays that utilize the disclosed codebook design methods.
- a set of detectably-labeled nucleotides e.g., fluorescently-labeled, 3’ reversibly terminated nucleotides
- a biological sample e.g., a tissue specimen that has been fixed, embedded, and/or cleared as described elsewhere herein
- a polymerase e.g., a reversed barcode sequence
- the number of unique 3’ reversibly terminated nucleotides in the set is typically the same as the number of unique nucleotide residues (typically four) that are potentially present at a given position in the barcode sequence to be probed in a given decoding (sequencing) cycle.
- all of the unique 3’ reversibly terminated nucleotides in the set may be labeled with a detectable label, e.g., a fluorescent label, where different nucleotides in the set are labeled with a different fluorophore.
- a detectable label e.g., a fluorescent label
- only a subset of the unique 3’ reversibly terminated nucleotides in the set may be labeled with a detectable label, e.g., a fluorophore, where different nucleotides in the subset are labeled with a different fluorophore.
- the number of different detectable labels, e.g., fluorophores, used in each set of detectably-labeled 3’ reversibly terminated nucleotides may be the same for sets used in different cycles of the sequencing-based decoding process. In some instances, the number of different detectable labels, e.g., fluorophores, used in each set of detectably-labeled 3’ reversibly terminated nucleotides may be different for sets used in different cycles of the sequencing-based decoding process.
- the number of different detectable labels, e.g., fluorophores, used in each set of detectably-labeled 3’ reversibly terminated nucleotides will depend on factors such as the number of different optical detection channels (e.g., one color, two color, three color, or four color detection) in the instrument used to perform decoding, and the design of the code words used in the multiplexed in situ assay (e.g., in some cases, an absence of signal in a given decoding cycle (i.e., an OFF signal) may be used as part of code word design)
- the total number of unique nucleotide residues to be probed in a given decoding (sequencing) cycle may be, e.g., 2, 3, or 4 (or more than 4 if non-natural nucleotides that obey similar base-pairing rules are included).
- the biological sample is then imaged, and the image(s) (e.g., fluorescence image(s)) are processed using any of a variety of image processing techniques known to those of skill in the art to measure signal intensities at the locations of a plurality of barcoded target molecules (or amplified representations, e.g., RCPs, thereof).
- image(s) e.g., fluorescence image(s)
- RCPs amplified representations
- the plurality of barcoded target molecules may comprise, e.g., at least 10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 5,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, or 100,000 barcoded target molecules (or amplified representations, e.g., RCPs, thereof).
- One or more images comprising different fields-of-view of the biological sample may be acquired in each cycle as necessary to image the entire cross-sectional area of the biological sample.
- a z-stack of images, i.e., a series of images acquired at each of two or more focal planes within the thickness of the biological sample
- each z-stack of images may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 images acquired at different focal planes within the thickness of the biological sample.
- the 3’ reversibly terminated nucleotide that has been incorporated into the priming strand is deprotected and the process is repeated for a specified number of cycles, M.
- the processing of images acquired during the in situ decoding schemes illustrated in the flowcharts of FIG. 5 and FIG. 6 is performed in similar fashion.
- the images may be processed in real-time immediately following acquisition.
- the images may be post-processed, i.e., they may be stored in computer memory and processed at a later time.
- Processing of the image(s) acquired in each decoding (probing or sequencing) cycle results in the generation of a fluorescence data set for each decoding (probing or sequencing) cycle (e.g., fluorescence data set 1, fluorescence data set 2, ..., fluorescence data set M), each comprising measured fluorescence signal intensities (in the case that the detectable labels comprise fluorophores) for each of the plurality of locations at which target molecules (or amplified representations, e.g., RCPs, thereof) are detected.
- a fluorescence data set for each decoding (probing or sequencing) cycle e.g., fluorescence data set 1, fluorescence data set 2, ..., fluorescence data set M
- each comprise measured fluorescence signal intensities in the case that the detectable labels comprise fluorophores
- the fluorescence data sets comprise measured fluorescence signal intensities for each of a plurality of target molecule locations (or the locations of amplified representations, e.g., RCPs, thereof) in two dimensions.
- the fluorescence data sets comprise measured fluorescence signal intensities for each of a plurality of target molecule locations (or the locations of amplified representations, e.g., RCPs, thereof) in three dimensions.
- the compiled set of fluorescence data sets e.g., fluorescence data set 1, fluorescence data set 2, ..., fluorescence data set M
- fluorescence data set 1 may then be processed to identify a series of fluorescence signals at each of a plurality of target molecule locations (or the locations of amplified representations, e.g., RCPs, thereof) detected in the images acquired over the course of performing the M decoding (probing or sequencing) cycles.
- the fluorescence signals may comprise analog signals (i.e., continuous, real-valued fluorescence intensity signals, such as those obtained when using photomultipliers or photomultiplier arrays).
- the fluorescence signals may comprise digital signals (i.e., digitized renditions of continuous, real-valued fluorescence intensity signals, such as those obtained when using CMOS or CCD image sensors).
- the fluorescence signals may be processed, e.g., to perform one or more of background subtraction, normalization, fitting to a Gaussian or other line shape function, determination of a centroid position, etc. Any of a variety of image processing methods known to those of skill in the art may be used for image processing / pre-processing.
- Examples include, but are not limited to, Canny edge detection methods, Canny-Deriche edge detection methods, first-order gradient edge detection methods (e.g., the Sobel operator), second order differential edge detection methods, phase congruency (phase coherence) edge detection methods, other image segmentation algorithms (e.g., intensity thresholding, intensity clustering methods, intensity histogram-based methods, etc.), feature and pattern recognition algorithms (e.g., the generalized Hough transform for detecting arbitrary shapes, the circular Hough transform, etc.), and mathematical analysis algorithms (e.g., Fourier transform, fast Fourier transform, wavelet analysis, auto-correlation, etc.), or any combination thereof.
- Canny edge detection methods Canny-Deriche edge detection methods
- first-order gradient edge detection methods e.g., the Sobel operator
- second order differential edge detection methods e.g., phase congruency (phase coherence) edge detection methods
- other image segmentation algorithms e.g., intensity thresholding, intensity clustering methods, intensity histogram-
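As a minimal sketch of the pre-processing steps named above (background subtraction, normalization, and centroid determination), the following pure-Python example operates on a small 2D intensity patch around one detected spot. The function names, the use of the patch median as the background estimate, and normalization to the peak value are illustrative assumptions and are not taken from the disclosure.

```python
from statistics import median

def subtract_background(patch):
    """Subtract the patch median (an assumed background estimate), clipping at zero."""
    bg = median(v for row in patch for v in row)
    return [[max(v - bg, 0.0) for v in row] for row in patch]

def normalize(patch):
    """Scale intensities so that the brightest pixel equals 1.0."""
    peak = max(v for row in patch for v in row)
    if peak == 0:
        return [[0.0 for _ in row] for row in patch]
    return [[v / peak for v in row] for row in patch]

def centroid(patch):
    """Return the intensity-weighted centroid as (row, column)."""
    total = sum(v for row in patch for v in row)
    if total == 0:
        raise ValueError("patch contains no signal")
    r = sum(i * v for i, row in enumerate(patch) for v in row) / total
    c = sum(j * v for row in patch for j, v in enumerate(row)) / total
    return r, c

# Hypothetical 5 x 5 patch: a bright spot on a flat background of 10 counts.
patch = [
    [10, 10, 10, 10, 10],
    [10, 10, 30, 10, 10],
    [10, 30, 90, 30, 10],
    [10, 10, 30, 10, 10],
    [10, 10, 10, 10, 10],
]
spot = normalize(subtract_background(patch))
spot_centroid = centroid(spot)
```

A production pipeline would more likely fit a Gaussian or other line shape function to each spot; the weighted centroid is used here only because it needs no external library.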
- the fluorescence signals may be processed and/or compared to a predetermined fluorescence intensity threshold to generate corresponding binary signal values (e.g., ON signals (“1”) or OFF signals (“0”)) that indicate whether or not a fluorescence signal of intensity greater than or equal to the predetermined fluorescence intensity threshold was detected in a given optical detection channel (e.g., a given fluorescence emission detection channel or a given color detection channel) for a given decoding (probing or sequencing) cycle.
- a predetermined fluorescence intensity threshold e.g., ON signals (“1”) or OFF signals (“0”) that indicate whether or not a fluorescence signal of intensity greater than or equal to the predetermined fluorescence intensity threshold was detected in a given optical detection channel (e.g., a given fluorescence emission detection channel or a given color detection channel) for a given decoding (probing or sequencing) cycle.
- a given optical detection channel e.g., a given fluorescence emission detection channel or a given color detection channel
- the series of binary signal values determined for each target molecule location (or the location of the amplified representation, e.g., RCP, thereof) in the series of M decoding (probing or sequencing) cycles may then be used, in combination with prior knowledge of the optical detection channels for which signals were detected in each decoding (probing or sequencing) cycle, to identify a plurality of observed code words corresponding to the plurality of barcoded target molecules.
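The thresholding step can be sketched as follows, assuming that each location yields one intensity per optical detection channel per cycle and that a single global threshold is applied. The function names, the example intensities, and the threshold value are hypothetical.

```python
def binarize(intensities, threshold):
    """Map per-cycle, per-channel intensities to ON (1) or OFF (0).

    A value is ON when it is greater than or equal to the threshold.
    """
    return [[1 if v >= threshold else 0 for v in cycle] for cycle in intensities]

def observed_code_word(binary_cycles):
    """Concatenate the per-cycle channel bits into one observed code word."""
    return tuple(bit for cycle in binary_cycles for bit in cycle)

# Hypothetical measurements at one location: M = 3 cycles, 2 channels per cycle.
intensities = [
    [820.0, 40.0],   # cycle 1
    [35.0, 50.0],    # cycle 2
    [60.0, 500.0],   # cycle 3 (exactly at threshold, so ON)
]
bits = binarize(intensities, threshold=500.0)
word = observed_code_word(bits)
```

Here the bit order (cycle-major, then channel) stands in for the "prior knowledge of the optical detection channels" mentioned above; any fixed ordering works as long as the codebook uses the same one.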
- an observed code word may be identical to one of the valid code words from the OR-robust codebook and the identity of the corresponding target molecule can be determined directly from the OR-robust codebook assignments.
- an observed code word may correspond closely to one of the valid code words from the OR-robust codebook, but may not be an identical series of binary values.
- the properties of the OR-robust codebook may be used to detect and/or correct errors arising from, e.g., non-specific hybridization of detectably-labeled probes, or sequencing errors, and thereby assign the observed code word to a valid code word from the OR-robust codebook.
- an observed code word may be assigned to a valid code word if changing one or more of the binary values (e.g., bits) in the series of binary values corresponding to the observed code word results in the observed code word being identical to a valid code word of the plurality of valid code words in the OR-robust codebook.
- an observed code word may be assigned to (e.g., replaced by) a valid code word based on determining a plurality of scores (e.g., pairwise edit distances, Hamming distances, and/or Hamming distances between logical bitwise OR code word combinations) based on comparison of the observed code word to all or a portion of the plurality of valid code words in the OR-robust codebook.
- the observed code word may be assigned to (e.g., replaced by) a valid code word that exhibits the highest score (e.g., the minimum edit distance, Hamming distance, and/or Hamming distance between logical bitwise OR code word combinations).
- each score in the plurality of scores is a probability (e.g., 0 to 1). In some instances, the highest score is the highest probability. In some instances, each score in the plurality of scores is a log-likelihood. In some instances, the highest score is the highest log-likelihood.
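A minimal sketch of distance-based assignment, assuming the score is the plain Hamming distance and that an observed code word is assigned only when a single valid code word is nearest. The toy codebook, the target names, and the `max_distance` parameter are hypothetical; the toy codebook is not claimed to be OR-robust.

```python
def hamming(a, b):
    """Number of positions at which two equal-length code words differ."""
    if len(a) != len(b):
        raise ValueError("code words must have equal length")
    return sum(x != y for x, y in zip(a, b))

def assign_by_hamming(observed, codebook, max_distance=1):
    """Return (target, distance) for the unique nearest valid code word.

    Returns None when no valid code word lies within max_distance or when
    two valid code words are equally close (ambiguous assignment).
    """
    scored = sorted((hamming(observed, word), name) for name, word in codebook.items())
    best_distance, best_name = scored[0]
    if best_distance > max_distance:
        return None
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None
    return best_name, best_distance

# Hypothetical 6-bit codebook with three targets.
codebook = {
    "GENE_A": (1, 1, 0, 0, 0, 0),
    "GENE_B": (0, 0, 1, 1, 0, 0),
    "GENE_C": (0, 0, 0, 0, 1, 1),
}
exact = assign_by_hamming((0, 0, 1, 1, 0, 0), codebook)
corrected = assign_by_hamming((1, 1, 0, 0, 1, 0), codebook)
ambiguous = assign_by_hamming((1, 0, 0, 0, 1, 0), codebook, max_distance=2)
```

Rejecting ties rather than picking one arbitrarily is a design choice of this sketch; a probability-based or log-likelihood-based score could be used to break them instead.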
- one or more observed code words may be assigned to (e.g., replaced by) valid code words based on replacement with a corresponding valid code word in the OR-robust codebook that has a maximum likelihood as computed from the log likelihood (or negative log likelihood) of a probability distribution generated by a probabilistic model that provides probabilities for detecting a given code word, or code word segment, at a given location in a given decoding (probing or sequencing) cycle based on a set of detected optical signals (e.g., fluorescence signals) associated with a set of hybridization probes or nucleotides used to detect the barcode sequences.
- optical signals e.g., fluorescence signals
- one or more observed code words may be assigned to (e.g., replaced by) valid code words based on replacement with a corresponding valid code word in the OR-robust codebook that: (i) is within a predetermined pairwise edit distance (e.g., a predetermined Hamming distance and/or a predetermined Hamming distance between logical bitwise OR combinations of valid code words) from the observed code word, and (ii) has a maximum likelihood as computed from the log likelihood (or negative log likelihood) for a probability distribution generated by a probabilistic model that provides probabilities for detecting a given code word, or code word segment, at a given location in a given decoding (probing or sequencing) cycle based on a set of detected optical signals associated with a set of hybridization probes or nucleotides used to detect the barcode sequences.
- a predetermined pairwise edit distance e.g., a predetermined Hamming distance and/or a predetermined Hamming distance between logical bitwise OR combinations of valid code words
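The two-part criterion above (within a predetermined Hamming distance, then maximum likelihood) can be sketched under a deliberately simple probabilistic model. The model assumed here treats each bit independently, with one probability of a spurious ON signal and one probability of a missed signal. That model, the parameter values, and all names are assumptions made for illustration; the disclosure does not specify them.

```python
import math

def hamming(a, b):
    """Number of differing positions between two equal-length code words."""
    return sum(x != y for x, y in zip(a, b))

def log_likelihood(observed, valid, p_false_on, p_dropout):
    """Log-likelihood of the observed bits given a valid code word.

    Assumed model: independent bits; a true 0 reads as 1 with probability
    p_false_on, and a true 1 reads as 0 with probability p_dropout.
    """
    ll = 0.0
    for o, v in zip(observed, valid):
        if v == 1:
            p = 1.0 - p_dropout if o == 1 else p_dropout
        else:
            p = p_false_on if o == 1 else 1.0 - p_false_on
        ll += math.log(p)
    return ll

def decode_ml(observed, codebook, radius, p_false_on=0.02, p_dropout=0.10):
    """Pick the maximum-likelihood valid code word within the Hamming radius."""
    candidates = {n: w for n, w in codebook.items() if hamming(observed, w) <= radius}
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda n: log_likelihood(observed, candidates[n], p_false_on, p_dropout),
    )

# Hypothetical codebook in which both code words are one bit from the observation.
codebook = {"GENE_A": (1, 1, 1, 0), "GENE_B": (1, 0, 0, 0)}
observed = (1, 0, 1, 0)
best = decode_ml(observed, codebook, radius=1)
```

Both candidates are at Hamming distance 1, so distance alone cannot separate them. Because a missed signal is assumed to be more likely than a spurious one, the likelihood favors `GENE_A`.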
- one or more observed code words may be assigned to (e.g., replaced by) valid code words based on an iterative process comprising correcting the one or more observed code words by replacement with one of the valid code words that: (i) is within a predetermined pairwise edit distance (e.g., a predetermined Hamming distance and/or a predetermined Hamming distance between logical bitwise OR combinations of valid code words) from the observed code word (determined, for example, by rank-ordering the set of valid code words according to their pairwise edit distance from the observed code word), and (ii) has a maximum likelihood as computed from a log likelihood (or negative log likelihood) for a probability distribution generated by a probabilistic model that provides probabilities for detecting a given code word, or code word segment thereof, at a given location in a given decoding (probing or sequencing) cycle based on a set of detected optical signals, and updating the probabilistic model using the corrected code words, where the process is repeated until a fully corrected set of validate
- each previously corrected code word is replaced with one of the valid code words that: (iii) is within a predetermined pairwise edit distance (e.g., a predetermined Hamming distance and/or a predetermined Hamming distance between logical bitwise OR combinations of valid code words) of the previously corrected code word, and (iv) has a maximum likelihood as computed from the log likelihood (or negative log likelihood) for a probability distribution generated by the updated probabilistic model.
- rates of various error modes, such as optical cross-talk or stripping errors, are estimated by comparing the observed code word to the best-matching valid code word, and a probabilistic decoding model can be updated based on the estimated error rates.
- parameters of a maximum likelihood model are updated according to the empirical rates of those errors.
- each previously corrected code word is replaced with one of the valid code words that: (iii) is within a predetermined pairwise edit distance (e.g., a predetermined Hamming distance and/or a predetermined Hamming distance between logical bitwise OR combinations of valid code words) of the previously corrected code word, and (iv) has a maximum likelihood as computed from the truncated log likelihood (or negative truncated log likelihood) for a probability distribution generated by the updated probabilistic model.
- a predetermined pairwise edit distance e.g., a predetermined Hamming distance and/or a predetermined Hamming distance between logical bitwise OR combinations of valid code words
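The iterative scheme (decode, estimate empirical error rates from the assignments, update the model, repeat) can be sketched as below. The independent-bit error model with two rates, the Laplace smoothing, the stopping rule (assignments unchanged), and all names and data are assumptions for illustration, not the disclosed implementation.

```python
import math

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def log_likelihood(observed, valid, p_false_on, p_dropout):
    """Independent-bit log-likelihood (assumed model)."""
    ll = 0.0
    for o, v in zip(observed, valid):
        if v == 1:
            ll += math.log(1.0 - p_dropout if o == 1 else p_dropout)
        else:
            ll += math.log(p_false_on if o == 1 else 1.0 - p_false_on)
    return ll

def decode_all(words, codebook, radius, p_false_on, p_dropout):
    """Assign each observed word to its most likely valid word within the radius."""
    out = []
    for obs in words:
        cands = [(log_likelihood(obs, w, p_false_on, p_dropout), n)
                 for n, w in codebook.items() if hamming(obs, w) <= radius]
        out.append(max(cands)[1] if cands else None)
    return out

def estimate_rates(words, assignments, codebook, pseudo=1.0):
    """Empirical error rates from observed vs. assigned words (Laplace-smoothed)."""
    false_on = dropout = zeros = ones = 0
    for obs, name in zip(words, assignments):
        if name is None:
            continue
        for o, v in zip(obs, codebook[name]):
            if v == 1:
                ones += 1
                dropout += (o == 0)
            else:
                zeros += 1
                false_on += (o == 1)
    return ((false_on + pseudo) / (zeros + 2 * pseudo),
            (dropout + pseudo) / (ones + 2 * pseudo))

def iterative_decode(words, codebook, radius=1, p_false_on=0.05, p_dropout=0.05, max_iter=10):
    """Alternate decoding and model updates until the assignments stop changing."""
    previous = None
    for _ in range(max_iter):
        assignments = decode_all(words, codebook, radius, p_false_on, p_dropout)
        if assignments == previous:
            break
        p_false_on, p_dropout = estimate_rates(words, assignments, codebook)
        previous = assignments
    return assignments, p_false_on, p_dropout

codebook = {"GENE_A": (1, 1, 0, 0, 0, 0), "GENE_B": (0, 0, 1, 1, 0, 0), "GENE_C": (0, 0, 0, 0, 1, 1)}
words = [(1, 1, 0, 0, 0, 0), (1, 0, 0, 0, 0, 0), (0, 0, 1, 1, 0, 0), (0, 0, 0, 1, 0, 0),
         (0, 0, 0, 0, 1, 1), (0, 0, 1, 0, 1, 1), (1, 0, 1, 0, 0, 0)]
assignments, p_false_on, p_dropout = iterative_decode(words, codebook)
```

In this toy data set the last word is two bits from both `GENE_A` and `GENE_B`, so it falls outside the radius and stays unassigned. The estimated missed-signal rate ends up higher than the spurious-signal rate, which is the kind of asymmetry the updated model would carry into the next round.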
- the provided methods involve analyzing, e.g., detecting or determining, one or more sequences present in the probes or probe sets or products thereof (e.g., rolling circle amplification products thereof).
- the detecting is performed at one or more locations in the biological sample.
- the locations are the locations of RNA transcripts in the biological sample.
- the locations are the locations at which the probes or probe sets hybridize to the RNA transcripts in the biological sample, and are optionally ligated and amplified by rolling circle amplification.
- detecting the one or more sequences present in the probes or probe sets in the biological sample is performed, and the detected sequences are compared to an expected set of detected sequences.
- the expected set of sequences is based on the barcode sequences of the panels of probes or probe sets in the probe mixture and the known expression levels of the RNA transcripts of the first, second, and/or third sets of genes in the first and second cell populations.
- the one or more sequences are one or more barcode sequences or complements thereof.
- the expected set of detected sequences include sequences expected to be detected at a high expression level (e.g., more than 20 counts of the detected sequence per cell) in one or both of the first and second cell populations.
- the expected set of detected sequences include sequences expected to be detected at a medium expression level (e.g., 5-20 counts of the detected sequence per cell) in one or both of the first and second cell populations. In some embodiments, the expected set of detected sequences include sequences expected to be detected at a low expression level (e.g., 1-5 counts of the detected sequence per cell) in one or both of the first and second cell populations.
- a medium expression level e.g., 5-20 counts of the detected sequence per cell
- the expected set of detected sequences include sequences expected to be detected at a low expression level (e.g., 1-5 counts of the detected sequence per cell) in one or both of the first and second cell populations.
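A small sketch of comparing detected counts against expected expression levels, using the example ranges given above (more than 20, 5-20, and 1-5 counts per cell). Because the stated ranges share the boundary value 5, this sketch assigns exactly 5 counts to the medium level; that choice, the "none" label, and the gene names and counts are assumptions.

```python
def expression_level(counts_per_cell):
    """Bin a mean per-cell count into the example expression levels."""
    if counts_per_cell > 20:
        return "high"
    if counts_per_cell >= 5:
        return "medium"
    if counts_per_cell >= 1:
        return "low"
    return "none"

def compare_to_expected(detected_counts, expected_levels):
    """Return the targets whose detected level differs from the expected level."""
    mismatches = {}
    for target, expected in expected_levels.items():
        observed = expression_level(detected_counts.get(target, 0))
        if observed != expected:
            mismatches[target] = (expected, observed)
    return mismatches

# Hypothetical per-cell counts for one cell population.
detected = {"GENE_A": 34.0, "GENE_B": 7.5, "GENE_C": 0.2}
expected = {"GENE_A": "high", "GENE_B": "medium", "GENE_C": "low"}
mismatches = compare_to_expected(detected, expected)
```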
- the detecting comprises a plurality of repeated cycles of hybridization and removal of probes (e.g., detectably labeled probes, or intermediate probes that bind to detectably labeled probes) to the primary probe or probe set hybridized to the target nucleic acid, or to a rolling circle amplification product generated from the probe or probe set hybridized to the target nucleic acid.
- probes e.g., detectably labeled probes, or intermediate probes that bind to detectably labeled probes
- Detectably-labeled probes can be useful for detecting multiple target nucleic acids and be detected in one or more hybridization cycles (e.g., sequential hybridization assays, or sequencing by hybridization).
- the detecting can comprise binding an intermediate probe directly or indirectly to the primary probe or probe set, binding a detectably labeled probe directly or indirectly to a detection region of the intermediate probe, and detecting a signal associated with the detectably labeled probe.
- the method comprises detecting a rolling circle amplification product (RCP) generated using a circular or circularized primary probe or probe set as a template.
- the method comprises detecting a rolling circle amplification product (RCP) generated using a circular or circularized probe or probe set that binds to a primary probe or probe set as a template.
- detecting the RCP comprises binding an intermediate probe directly or indirectly to the RCP, binding a detectably labeled probe directly or indirectly to a detection region of the intermediate probe, and detecting a signal associated with the detectably labeled probe.
- the method can comprise performing one or more wash steps to remove unbound and/or nonspecifically bound intermediate probe molecules from the primary probes or the products of the primary probes.
- the detecting can comprise: detecting signals associated with detectably labeled probes that are hybridized to barcode regions or complements thereof in the primary probe or probe set or a product thereof (e.g., an RCP); and/or detecting signals associated with detectably labeled probes that are hybridized to intermediate probes which are in turn hybridized to the barcode regions or complements thereof.
- the detectably labeled probes can be fluorescently labeled.
- the methods comprise detecting the sequence in all or a portion of a primary probe or probe set or an RCP, or detecting a sequence of the primary probe or probe set or RCP, such as one or more barcode sequences present in the primary probe or probe set or RCP.
- the sequence of the RCP, or barcode thereof, is indicative of a sequence of the target nucleic acid to which the RCP is hybridized.
- the analysis and/or sequence determination comprises detecting a sequence in all or a portion of the nucleic acid concatemer and/or in situ hybridization to the RCP.
- the detection step is by sequential fluorescent in situ hybridization (e.g., for combinatorial decoding of the barcode sequence or complement thereof).
- the detection or determination comprises hybridizing to a probe directly or indirectly a detection oligonucleotide labeled with a fluorophore, an isotope, a mass tag, or a combination thereof.
- the detection or determination comprises imaging the probe hybridized to the target nucleic acid (e.g., imaging one or more detectably labeled probes hybridized thereto).
- the target nucleic acid is an mRNA in a tissue sample, and the detection or determination is performed when the target nucleic acid and/or the amplification product is in situ in the tissue sample.
- the target nucleic acid is an amplification product (e.g., a rolling circle amplification product).
- sequencing can be performed by sequencing-by-synthesis (SBS).
- a sequencing primer is complementary to primer binding sequences located at or near the one or more barcode sequence(s).
- sequencing-by-synthesis can comprise reverse transcription and/or amplification in order to generate a template sequence to which a primer sequence can bind.
- Exemplary SBS methods comprise those described for example, but not limited to, US 2007/0166705, US 2006/0188901, US 7,057,026, US 2006/0240439, US 2006/0281109, US 2011/0059865, US 2005/0100900, US 9,217,178, US 2009/0118128, US 2012/0270305, US 2013/0260372, and US 2013/0079232, all of which are herein incorporated by reference in their entireties.
- Accurate decoding of single-stranded template (barcode) sequences relies on successfully classifying signals that arise from the stepwise addition of A, G, C, and T nucleotides by a polymerase to a complementary primer extension strand.
- these methods typically include modifying the template sequences with a known adapter sequence used to tether the template sequences to a solid support (e.g., the interior surface(s) of a flow cell) in a random or patterned array by hybridization to complementary adapter sequences attached to the support surface, where the adapter sequences typically also include primer binding sites used for clonal amplification and/or sequencing.
- the template sequences may be designed to include both the barcode sequences and amplification and/or sequencing primer binding sites, where the template sequences may be attached to target analytes (for nucleic acid analytes) using, e.g., a padlock or other circularizable probe, and amplified using, e.g., rolling circle amplification.
- the amplified template sequences (comprising barcode sequences) are then probed through a cyclic series of single-base addition primer extension reactions that use detectably-labeled, e.g., fluorescently-labeled, nucleotides to identify the sequence of bases in the template sequences, where the fluorescently-labeled nucleotides are typically blocked at the 3’-OH group with a reversible terminator moiety.
- detectably-labeled e.g., fluorescently-labeled
- the cyclical sequencing process thus comprises repeating the steps of (i) contacting a primed template sequence (i.e., a template sequence comprising a bound primer strand having a free 3’-OH group) with a mixture of fluorescently-labeled, 3’-OH reversibly-terminated nucleotides and a polymerase to enable incorporation of a nucleotide that is complementary to a nucleotide in the template sequence into an extended primer strand, (ii) washing away any unbound nucleotides and polymerase molecules, (iii) imaging the sample (e.g., the surface of a flow cell to which the amplified template sequences are attached, or a tissue sample within which the amplified template sequences are distributed), and (iv) deprotecting the 3’ end of the extended primer strand to remove the reversible terminator moiety and cleaving off the fluorophore, thereby enabling initiation of the next cycle.
- a primed template sequence
- the mixture of nucleotides (e.g., fluorescently-labeled, 3’-OH reversibly-terminated nucleotides) used in each cycle may be the same. In some instances, the mixture of nucleotides (e.g., fluorescently-labeled, 3’-OH reversibly-terminated nucleotides) used in one or more cycles may be different from that used in one or more different cycles.
- all of the nucleotides (e.g., detectably-labeled, 3’-OH reversibly-terminated nucleotides) in the mixture of nucleotides may be labeled with a detectable label (e.g., a fluorophore), where different nucleotides in the mixture are labeled with different detectable labels.
- a detectable label e.g., a fluorophore
- only a subset of the nucleotides (e.g., detectably-labeled, 3’-OH reversibly-terminated nucleotides) in the mixture of nucleotides may be labeled with a detectable label (e.g., a fluorophore), where different nucleotides in the subset are labeled with different detectable labels.
- the subset of nucleotides (e.g., detectably-labeled, 3’-OH reversibly-terminated nucleotides) may comprise, e.g., one, two, or three of A, T/U, G, and C.
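The per-cycle base identification can be sketched as a simple brightest-channel call, including the case where only a subset of the nucleotides is labeled and the unlabeled nucleotide is inferred from an absence of signal. The channel-to-base mapping, the threshold, the "N" no-call symbol, and the example intensities are hypothetical.

```python
def call_bases(cycles, channel_to_base, threshold, dark_base=None):
    """Call one incorporated nucleotide per cycle from channel intensities.

    cycles: list of per-cycle intensity lists, one value per detection channel.
    channel_to_base: base reported by each channel, by channel index.
    dark_base: base assumed when no channel reaches the threshold (used when
        one nucleotide in the mixture is unlabeled); None yields a no-call "N".
    """
    calls = []
    for intensities in cycles:
        best = max(range(len(intensities)), key=lambda k: intensities[k])
        if intensities[best] >= threshold:
            calls.append(channel_to_base[best])
        elif dark_base is not None:
            calls.append(dark_base)
        else:
            calls.append("N")
    return "".join(calls)

# Hypothetical three-channel run over M = 4 cycles: A, C, and G are labeled; T is not.
cycles = [
    [900.0, 30.0, 20.0],
    [25.0, 40.0, 35.0],
    [10.0, 700.0, 50.0],
    [15.0, 20.0, 800.0],
]
read = call_bases(cycles, channel_to_base=["A", "C", "G"], threshold=200.0, dark_base="T")
```

The called read is the sequence of incorporated nucleotides, i.e., the complement of the template (barcode) positions being probed. Treating an all-dark cycle as a base is only safe when signal loss is rare, which is one reason a code word design may need to account for OFF signals explicitly.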
- the “sequencing-by-ligation” (SBL) approach uses a DNA ligase to identify the nucleotide present at a given position in a template sequence. Unlike sequencing-by-synthesis approaches, this method does not use a DNA polymerase to perform primer extension. Instead, the mismatch sensitivity of a DNA ligase enzyme is used to determine the underlying sequence of the template nucleic acid molecule (see, e.g., EP0703991).
- the "sequencing-by-binding" (SBB) approach is based on performing repetitive cycles of detecting a stabilized complex that forms at each position along the template sequence (e.g., a ternary complex that includes the primed template, a polymerase, and a cognate nucleotide for the position), under conditions that prevent covalent incorporation of the cognate nucleotide into the primer, and then extending the primer to allow detection of the next position along the template (see, e.g., U.S. Pat. Nos. 9,951,385 and 10,655,176).
- detection of the nucleotide at each position of the template occurs prior to extension of the primer to the next position.
- the methodology is used to distinguish the four different nucleotide types that can be present at positions along a nucleic acid template by uniquely labelling each type of ternary complex (i.e., different types of ternary complexes differing in the type of nucleotide it contains) or by separately delivering the reagents needed to form each type of ternary complex.
- the labeling may comprise fluorescence labeling of, e.g., the cognate nucleotide or the polymerase that participates in the ternary complex.
- the "sequencing-by-avidity" (or SBA) approach relies on the increased avidity (or "functional affinity") derived from forming a complex comprising a plurality of individual non-covalent binding interactions (see, e.g., U.S. Pat. Nos. 10,768,173 and 10,982,280).
- the sequencing-by-avidity approach is based on the detection of a multivalent binding complex formed between a fluorescently-labeled polymer-nucleotide conjugate, a polymerase, and a plurality of primed target nucleic acid molecules, which allows the detection/base calling step to be separated from the nucleotide incorporation step. Fluorescence imaging is used to detect the bound complex and thereby determine the identity of the N + 1 nucleotide in the target nucleic acid sequence (where the primer extension strand is N nucleotides in length).
- the disclosed methods may comprise using one or more nucleotides or analogs thereof, including a native nucleotide or a nucleotide analog or modified nucleotide (e.g., labeled with one or more detectable labels).
- a nucleotide analog comprises a nitrogenous base, five-carbon sugar, and phosphate group, wherein any component of the nucleotide may be modified and/or replaced.
- a method disclosed herein may comprise using one or more non-incorporable nucleotides. Non-incorporable nucleotides may be modified to become incorporable at any point during the sequencing method.
- Nucleotide analogs include, but are not limited to, alpha-phosphate modified nucleotides, alpha-beta nucleotide analogs, beta-phosphate modified nucleotides, beta-gamma nucleotide analogs, gamma-phosphate modified nucleotides, caged nucleotides, or ddNTPs. Examples of nucleotide analogs are described in U.S. Patent No. 8,071,755, which is incorporated by reference herein in its entirety.
- a method disclosed herein may comprise using terminators that reversibly prevent nucleotide incorporation at the 3'-end of the primer.
- One type of reversible terminator is a 3'-O-blocked reversible terminator.
- the terminator moiety is linked to the oxygen atom of the 3'-OH end of the 5-carbon sugar of a nucleotide.
- U.S. Patent Nos. 7,544,794 and 8,034,923 (the disclosures of these patents are incorporated by reference) describe reversible terminator dNTPs having the 3'-OH group replaced by a 3'-ONH2 group.
- Another type of reversible terminator is a 3'-unblocked reversible terminator, wherein the terminator moiety is linked to the nitrogenous base of a nucleotide.
- U.S. Patent No. 8,808,989 discloses particular examples of base-modified reversible terminator nucleotides that may be used in connection with the methods described herein.
- Other reversible terminators that similarly can be used in connection with the methods described herein include those described in U.S. Patent Nos. 7,956,171, 8,071,755, and 9,399,798, herein incorporated by reference.
- a method disclosed herein may comprise using nucleotide analogs having terminator moieties that irreversibly prevent nucleotide incorporation at the 3'-end of the primer.
- Irreversible nucleotide analogs include 2',3'-dideoxynucleotides, ddNTPs (ddGTP, ddATP, ddTTP, ddCTP). Dideoxynucleotides lack the 3'-OH group of dNTPs that is essential for polymerase-mediated synthesis.
- a method disclosed herein may comprise using non-incorporable nucleotides comprising a blocking moiety that inhibits or prevents the nucleotide from forming a covalent linkage to a second nucleotide (3'-OH of a primer) during the incorporation step of a nucleic acid polymerization reaction.
- the blocking moiety can be removed from the nucleotide, allowing for nucleotide incorporation.
- a method disclosed herein may comprise using 1, 2, 3, 4 or more nucleotide analogs.
- a nucleotide analog is replaced, diluted, or sequestered during an incorporation step.
- a nucleotide analog is replaced with a native nucleotide.
- a nucleotide analog is modified during an incorporation step. The modified nucleotide analog can be similar to or the same as a native nucleotide.
- a method disclosed herein may comprise using a nucleotide analog having a different binding affinity for a polymerase than a native nucleotide.
- a nucleotide analog has a different interaction with a next base than a native nucleotide.
- Nucleotide analogs and/or non-incorporable nucleotides may base-pair with a complementary base of a template nucleic acid.
- Any suitable enzyme having a polymerase activity can be used in the sequencing reactions described herein, and exemplary polymerases include, but are not limited to, bacterial DNA polymerases, eukaryotic DNA polymerases, archaeal DNA polymerases, viral DNA polymerases and phage DNA polymerases.
- Bacterial DNA polymerases include E. coli DNA polymerases I, II, III, IV and V, the Klenow fragment of E. coli DNA polymerase, Clostridium stercorarium (Cst) DNA polymerase, Clostridium thermocellum (Cth) DNA polymerase and Sulfolobus solfataricus (Sso) DNA polymerase.
- Eukaryotic DNA polymerases include DNA polymerases α, β, γ, δ, ε, η, ζ, λ, σ, μ, and κ, as well as the Rev1 polymerase (terminal deoxycytidyl transferase) and terminal deoxynucleotidyl transferase (TdT).
- Viral DNA polymerases include T4 DNA polymerase, phi-29 DNA polymerase, GA-1, phi-29-like DNA polymerases, PZA DNA polymerase, phi-15 DNA polymerase, Cp1 DNA polymerase, Cp7 DNA polymerase, T7 DNA polymerase, and T4 polymerase.
- DNA polymerases include thermostable and/or thermophilic DNA polymerases such as Thermus aquaticus (Taq) DNA polymerase, Thermus filiformis (Tfi) DNA polymerase, Thermococcus zilligi (Tzi) DNA polymerase, Thermus thermophilus (Tth) DNA polymerase, Thermus flavus (Tfl) DNA polymerase, Pyrococcus woesei (Pwo) DNA polymerase, Pyrococcus furiosus (Pfu) DNA polymerase and Turbo Pfu DNA polymerase, Thermococcus litoralis (Tli) DNA polymerase, Pyrococcus sp. GB-D polymerase, Thermotoga maritima (Tma) DNA polymerase, Bacillus stearothermophilus (Bst) DNA polymerase, Pyrococcus kodakaraensis (KOD) DNA polymerase, Pfx DNA polymerase, Thermococcus sp. JDF-3 (JDF-3) DNA polymerase, Thermococcus gorgonarius (Tgo) DNA polymerase, Thermococcus acidophilium DNA polymerase; Sulfolobus acidocaldarius DNA polymerase; Thermococcus sp.
- modified versions of the extremely thermophilic marine archaea Thermococcus species 9° N can be used.
- Still other useful DNA polymerases, including the 3PDX polymerase, are disclosed in U.S. Patent No. 8,703,461, the disclosure of which is incorporated by reference in its entirety.
- RNA polymerases can also be used, such as T7 RNA polymerase, T3 polymerase, SP6 polymerase, and K11 polymerase.
- Eukaryotic RNA polymerases, such as RNA polymerase I, RNA polymerase II, RNA polymerase III, RNA polymerase IV, and RNA polymerase V, can be used, as can archaeal RNA polymerases.
- Reverse transcriptases can also be used, such as HIV-1 reverse transcriptase from human immunodeficiency virus type 1 (PDB 1HMV), HIV-2 reverse transcriptase from human immunodeficiency virus type 2, M-MLV reverse transcriptase from the Moloney murine leukemia virus, AMV reverse transcriptase from the avian myeloblastosis virus, and telomerase reverse transcriptase, which maintains the telomeres of eukaryotic chromosomes.
- one or more nucleotides can be labeled with distinguishing and/or detectable tags or labels.
- the tags may be distinguishable by means of their differences in fluorescence, Raman spectrum, charge, mass, refractive index, luminescence, length, or any other measurable property.
- the tag may be attached to one or more different positions on the nucleotide, so long as the fidelity of binding to the polymerase-nucleic acid complex is sufficiently maintained to enable identification of the complementary base on the template nucleic acid correctly.
- the tag is attached to the nucleobase of the nucleotide.
- a tag is attached to the gamma phosphate position of the nucleotide.
- Detectable labels can be suitable for small scale detection and/or suitable for high-throughput screening.
- suitable detectable labels include, but are not limited to, radioisotopes, fluorophores, chemiluminescent compounds, bioluminescent compounds, and dyes.
- the detectable label can be qualitatively detected (e.g., optically or spectrally), or it can be quantified.
- Qualitative detection generally includes a detection method in which the existence or presence of the detectable label is confirmed, whereas quantifiable detection generally includes a detection method having a quantifiable (e.g., numerically reportable) value such as an intensity, duration, polarization, and/or other properties.
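- The distinction between qualitative and quantifiable detection can be illustrated with a minimal Python sketch. The names `Detection` and `detect_label`, the background-subtraction step, and the threshold are illustrative assumptions rather than part of the disclosed method.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    present: bool     # qualitative result: is the label detected at all?
    intensity: float  # quantifiable result: background-corrected intensity


def detect_label(raw_intensity: float, background: float, threshold: float) -> Detection:
    """Return both a presence call and a numeric value for one measurement.

    The background subtraction and the fixed threshold are assumptions made
    for illustration only.
    """
    corrected = max(raw_intensity - background, 0.0)
    return Detection(present=corrected >= threshold, intensity=corrected)
```

A qualitative readout would use only `present`, whereas a quantifiable readout would report `intensity` (or another property such as duration or polarization).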
- the detectable label is bound to another moiety, for example, a nucleotide or nucleotide analog, and can include a fluorescent, a colorimetric, or a chemiluminescent label.
- a detectable label can be attached to another moiety, for example, a nucleotide or nucleotide analog.
- one or more nucleotides can be labeled with a cleavable detectable tag or label.
- the non-terminating fluorescently labeled nucleotides can include a DBCO-nucleotide conjugated to a fluorescent compound with a disulfide linker.
- a non-terminating fluorescently labeled nucleotide is incorporated into the strand without termination, and after imaging, the linker can be cleaved to remove the fluorescent label.
- a DBCO-nucleotide (e.g., 5-DBCO-PEG4-UTP) can undergo a click reaction with the cleavable linker conjugated to a fluorescent label (e.g., cleavable linker-ATTO647N), and a disulfide group can be cleaved by tris(2-carboxyethyl)phosphine (TCEP) reduction together with 3'-O-azidomethyl-dNTP.
- the detectable label is a fluorophore.
- the fluorophore can be from a group that includes: 7-AAD (7-Aminoactinomycin D), Acridine Orange (+DNA), Acridine Orange (+RNA), Alexa Fluor® 350, Alexa Fluor® 430, Alexa Fluor® 488, Alexa Fluor® 532, Alexa Fluor® 546, Alexa Fluor® 555, Alexa Fluor® 568, Alexa Fluor® 594, Alexa Fluor® 633, Alexa Fluor® 647, Alexa Fluor® 660, Alexa Fluor® 680, Alexa Fluor® 700, Alexa Fluor® 750, Allophycocyanin (APC), AMCA / AMCA-X, 7-Aminoactinomycin D (7-AAD), 7-Amino-4-methylcoumarin, 6-Aminoquinoline, Aniline Blue, ANS, APC-Cy7, ATTO-TAG™ CBQC
- the detectable label can be directly detectable by itself (e.g., radioisotope labels or fluorescent labels) or, in the case of an enzymatic label, can be indirectly detectable, e.g., by catalyzing chemical alterations of a substrate compound or composition, which substrate compound or composition is directly detectable.
- the label can emit a signal or alter a signal delivered to the label so that the presence or absence of the label can be detected.
- coupling may be via a linker, which may be cleavable, such as photo-cleavable (e.g., cleavable under ultra-violet light), chemically-cleavable (e.g., via a reducing agent, such as dithiothreitol (DTT), tris(2-carboxyethyl)phosphine (TCEP)) or enzymatically cleavable (e.g., via an esterase, lipase, peptidase, or protease).
- Fluorescence detection in tissue samples can often be hindered by the presence of strong background fluorescence.
- “Autofluorescence” is the general term used to distinguish background fluorescence (that can arise from a variety of sources, including aldehyde fixation, extracellular matrix components, red blood cells, lipofuscin, and the like) from the desired immunofluorescence from the fluorescently labeled antibodies or probes. Tissue autofluorescence can lead to difficulties in distinguishing the signals due to fluorescent antibodies or probes from the general background.
- a method disclosed herein utilizes one or more agents to reduce tissue autofluorescence, for example, Autofluorescence Eliminator (Sigma/EMD Millipore), TrueBlack Lipofuscin Autofluorescence Quencher (Biotium), MaxBlock Autofluorescence Reducing Reagent Kit (MaxVision Biosciences), and/or a very intense black dye (e.g., Sudan Black, or comparable dark chromophore).
- fluorescent labels and nucleotides and/or polynucleotides conjugated to such fluorescent labels comprise those described in, for example, Haugland, Handbook of Fluorescent Probes and Research Chemicals, Ninth Edition (Molecular Probes, Inc., Eugene, 2002); Keller and Manak, DNA Probes, 2nd Edition (Stockton Press, New York, 1993); Eckstein, editor, Oligonucleotides and Analogues: A Practical Approach (IRL Press, Oxford, 1991); and Wetmur, Critical Reviews in Biochemistry and Molecular Biology, 26:227-259 (1991).
- exemplary techniques and methodologies applicable to the provided embodiments comprise those described in, for example, US 4,757,141, US 5,151,507 and US 5,091,519.
- one or more fluorescent dyes are used as labels for labeled target sequences, for example, as described in US 5,188,934 (4,7-dichlorofluorescein dyes); US 5,366,860 (spectrally resolvable rhodamine dyes); US 5,847,162 (4,7-dichlororhodamine dyes); US 4,318,846 (ether-substituted fluorescein dyes); US 5,800,996 (energy transfer dyes); US 5,066,580 (xanthene dyes); and US 5,688,648 (energy transfer dyes).
- a fluorescent label comprises a signaling moiety that conveys information through the fluorescent absorption and/or emission properties of one or more molecules.
- Exemplary fluorescent properties comprise fluorescence intensity, fluorescence lifetime, emission spectrum characteristics and energy transfer.
- the detection is carried out using any of a number of different types of microscopy, e.g., confocal microscopy, two-photon microscopy, light-field microscopy, intact tissue expansion microscopy, and/or CLARITY™-optimized light sheet microscopy (COLM).
- fluorescence microscopy is used for detection and imaging of the sample.
- a fluorescence microscope is an optical microscope that uses fluorescence and phosphorescence instead of, or in addition to, reflection and absorption to study properties of organic or inorganic substances.
- In fluorescence microscopy, a sample is illuminated with light of a wavelength which excites fluorescence in the sample. The fluoresced light, which is usually at a longer wavelength than the illumination, is then imaged through a microscope objective.
- Two filters may be used in this technique: an illumination (or excitation) filter, which ensures the illumination is near monochromatic and at the correct wavelength, and a second emission (or barrier) filter, which ensures none of the excitation light reaches the detector.
- the "fluorescence microscope" comprises any microscope that uses fluorescence to generate an image, whether it is a simpler setup like an epifluorescence microscope, or a more complicated design such as a confocal microscope, which uses optical sectioning to get better resolution of the fluorescent image.
- confocal microscopy is used for detection and imaging of the sample.
- Confocal microscopy uses point illumination and a pinhole in an optically conjugate plane in front of the detector to eliminate out-of-focus signal.
- the image's optical resolution is much better than that of wide-field microscopes.
- this increased resolution comes at the cost of decreased signal intensity, so long exposures are often required.
- CLARITY™-optimized light sheet microscopy provides an alternative microscopy for fast 3D imaging of large clarified samples. COLM interrogates large immunostained tissues, permits increased speed of acquisition and results in a higher quality of generated data.
- Other types of microscopy that can be employed comprise bright field microscopy, oblique illumination microscopy, dark field microscopy, phase contrast, differential interference contrast (DIC) microscopy, interference reflection microscopy (also known as reflected interference contrast, or RIC), single plane illumination microscopy (SPIM), super-resolution microscopy, laser microscopy, electron microscopy (EM), Transmission electron microscopy (TEM), Scanning electron microscopy (SEM), reflection electron microscopy (REM), Scanning transmission electron microscopy (STEM) and low-voltage electron microscopy (LVEM), scanning probe microscopy (SPM), atomic force microscopy (AFM), ballistic electron emission microscopy (BEEM), chemical force microscopy (CFM), conductive atomic force microscopy (C-AFM), electrochemical scanning tunneling microscope (ECSTM), electrostatic force microscopy (EFM), fluidic force microscope (FluidFM), force modulation microscopy (FMM), feature-oriented scanning probe microscopy (FOSPM),
- a method herein comprises subjecting the sample to expansion microscopy methods and techniques. Expansion allows individual targets (e.g., mRNA or RNA transcripts), which are densely packed within a cell, to be resolved spatially in a high-throughput manner. Expansion microscopy techniques are known in the art and can be performed as described in US 2016/0116384 and Chen et al., Science, 347, 543 (2015), each of which is incorporated herein by reference in its entirety.
- the method does not comprise subjecting the sample to expansion microscopy. In some embodiments, the method does not comprise dissociating a cell from the sample such as a tissue or the cellular microenvironment. In some embodiments, the method does not comprise lysing the sample or cells therein. In some embodiments, the method does not comprise embedding the sample or molecules from the sample in an exogenous matrix.
- analysis is performed on one or more images captured, and may comprise processing the image(s) and/or quantifying signals observed.
- images of signals from different fluorescent channels and/or nucleotide incorporation cycles can be compared and analyzed.
- images of signals (or absence thereof) at a particular location in a sample from different fluorescent channels and/or sequential incorporation cycles can be aligned to analyze an analyte at the location. For instance, a particular location in a sample can be tracked and signal spots from sequential incorporation cycles can be analyzed to detect a target polynucleotide sequence (e.g., a barcode sequence or subsequence thereof) in an analyte at the location.
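- As an illustration of tracking one location across sequential incorporation cycles, the following Python sketch assembles a per-cycle base call into a barcode read. The channel names, the channel-to-base mapping `CHANNEL_TO_BASE`, the brightest-channel rule, and the `min_signal` cutoff are all hypothetical choices made for this example and are not taken from the disclosure.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from fluorescent channel to nucleotide (an assumption).
CHANNEL_TO_BASE = {"ch1": "A", "ch2": "C", "ch3": "G", "ch4": "T"}


def call_cycle(intensities: Dict[str, float], min_signal: float) -> Optional[str]:
    """Call one base for one cycle from per-channel intensities at a location.

    Returns None when even the brightest channel is below min_signal,
    i.e. when a signal is treated as absent in that cycle.
    """
    channel, value = max(intensities.items(), key=lambda kv: kv[1])
    if value < min_signal:
        return None
    return CHANNEL_TO_BASE[channel]


def read_barcode(cycles: List[Dict[str, float]], min_signal: float = 100.0) -> str:
    """Concatenate per-cycle calls for one aligned location; 'N' marks no call."""
    return "".join(call_cycle(c, min_signal) or "N" for c in cycles)


# Three cycles of (already aligned) measurements at the same location.
cycles = [
    {"ch1": 500.0, "ch2": 20.0, "ch3": 10.0, "ch4": 5.0},
    {"ch1": 10.0, "ch2": 30.0, "ch3": 20.0, "ch4": 15.0},
    {"ch1": 3.0, "ch2": 8.0, "ch3": 900.0, "ch4": 40.0},
]
barcode = read_barcode(cycles)
```

The sketch assumes the images have already been registered so that each dictionary describes the same physical location; the resulting string could then be compared against expected barcode sequences.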
- the analysis may comprise processing information of one or more cell types, one or more types of analytes, a number or level of analyte, and/or a number or level of cells detected in a particular region of the sample.
- the analysis comprises detecting a sequence (e.g., a barcode sequence) present in an amplification product at a location in the sample.
- the number of signals detected in a unit area in the biological sample is quantified.
- the signals detected at a corresponding position in the biological sample in a plurality of images taken at different z positions are quantified and analyzed.
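- The two quantifications above (signals per unit area, and signals at a corresponding position across z positions) can be sketched in Python as follows. The `Spot` tuple layout, the 1 µm binning used to decide that spots in different z planes share a position, and the summing of intensities are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (x in µm, y in µm, z-plane index, intensity): a hypothetical spot record.
Spot = Tuple[float, float, int, float]


def spot_density(spots: List[Spot], region_area_um2: float) -> float:
    """Number of detected signals per unit area (spots per square micrometre)."""
    if region_area_um2 <= 0:
        raise ValueError("region area must be positive")
    return len(spots) / region_area_um2


def aggregate_over_z(spots: List[Spot], bin_um: float = 1.0) -> Dict[Tuple[int, int], float]:
    """Sum intensities of spots that fall in the same (x, y) bin across z planes."""
    totals: Dict[Tuple[int, int], float] = defaultdict(float)
    for x, y, _z, intensity in spots:
        key = (int(x // bin_um), int(y // bin_um))
        totals[key] += intensity
    return dict(totals)


spots = [(0.2, 0.3, 0, 10.0), (0.4, 0.1, 1, 15.0), (5.5, 2.2, 0, 7.0)]
density = spot_density(spots, region_area_um2=100.0)
per_position = aggregate_over_z(spots)
```

Note that `spot_density` counts every detected spot, so a signal seen in several z planes is counted once per plane; aggregating first and counting bins is an alternative when that is not desired.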
- Methods and compositions disclosed herein may be used for analyzing a biological sample, which may be obtained from a subject using any of a variety of techniques including, but not limited to, biopsy, surgery, and laser capture microscopy (LCM), and generally includes cells and/or other biological material from the subject.
- a biological sample can also be obtained from a eukaryote, such as a tissue sample, a patient derived organoid (PDO) or patient derived xenograft (PDX).
- a biological sample from an organism may comprise one or more other organisms or components therefrom.
- a mammalian tissue section may comprise a prion, a viroid, a virus, a bacterium, a fungus, or components from other organisms, in addition to mammalian cells and non-cellular tissue components.
- Subjects from which biological samples can be obtained can be healthy or asymptomatic individuals, individuals that have or are suspected of having a disease (e.g., a patient with a disease such as cancer) or a pre-disposition to a disease, and/or individuals in need of therapy or suspected of needing therapy.
- the biological sample corresponds to cells (e.g., derived from a cell culture, a tissue sample, or cells deposited on a surface).
- individual cells can be naturally unaggregated.
- the cells can be derived from a suspension of cells (e.g., a body fluid such as blood) and/or disassociated or disaggregated cells from a tissue or tissue section.
- the number of cells in the biological sample can vary.
- Some biological samples comprise large numbers of cells, e.g., blood samples, while other biological samples comprise smaller or only a small number of cells or may only be suspected of containing cells, e.g., plasma, serum, urine, saliva, synovial fluids, amniotic fluid, lachrymal fluid, lymphatic fluid, liquor, cerebrospinal fluid and the like.
- a cell-containing biological sample can comprise a body fluid or a cell-containing sample derived from the body fluid, e.g., whole blood, samples derived from blood such as plasma or serum, buffy coat, urine, sputum, lachrymal fluid, lymphatic fluid, sweat, liquor, cerebrospinal fluid, ascites, milk, stool, bronchial lavage, saliva, amniotic fluid, nasal secretions, vaginal secretions, semen/seminal fluid, wound secretions, cell culture and swab samples, or any cell-containing sample derived from the aforementioned samples.
- a cell-containing biological sample can be a body fluid, a body secretion or body excretion, e.g., lymphatic fluid, blood, buffy coat, plasma or serum.
- a cell-containing biological sample can be a circulating body fluid such as blood or lymphatic fluid, e.g., peripheral blood obtained from a mammal such as human.
- the biological sample can include any number of macromolecules, for example, cellular macromolecules and organelles (e.g., mitochondria and nuclei).
- the biological sample can be obtained as a tissue sample, such as a tissue section, a cell pellet, a cell block, a biopsy, a core biopsy, needle aspirate, or fine needle aspirate.
- the sample can be a fluid sample, such as a blood sample, urine sample, or saliva sample.
- the sample can be a skin sample, a colon sample, a cheek swab, a histology sample, a histopathology sample, a plasma or serum sample, a tumor sample, living cells, cultured cells, a clinical sample such as, for example, whole blood or blood-derived products, blood cells, or cultured tissues or cells, including cell suspensions.
- the biological sample may comprise cells which are deposited on a surface.
- the biological sample may comprise transcripts of antigen receptor molecules.
- the biological sample comprises analytes from any of the sources described herein deposited on a surface.
- Biological samples can be derived from a homogeneous culture or population of the subjects or organisms mentioned herein or alternatively from a collection of several different organisms, for example, in a community or ecosystem.
- Biological samples can include one or more diseased cells.
- a diseased cell can have altered metabolic properties, gene expression, protein expression, and/or morphologic features. Examples of diseases include inflammatory disorders, metabolic disorders, nervous system disorders, and cancer. Cancer cells can be derived from solid tumors, hematological malignancies, cell lines, or obtained as circulating tumor cells. Biological samples can also include fetal cells and immune cells.
- Biological samples can include analytes (e.g., protein, RNA, and/or DNA) embedded in a 3D matrix.
- amplicons (e.g., rolling circle amplification products)
- a 3D matrix may comprise a network of natural molecules and/or synthetic molecules that are chemically and/or enzymatically linked, e.g., by crosslinking.
- a 3D matrix may comprise a synthetic polymer.
- a 3D matrix comprises a hydrogel.
- a substrate herein can be any support that is insoluble in aqueous liquid and which allows for positioning of biological samples, analytes, features, and/or reagents on the support.
- a biological sample can be attached to a substrate. Attachment of the biological sample can be irreversible or reversible, depending upon the nature of the sample and subsequent steps in the analytical method.
- the sample can be attached to the substrate reversibly by applying a suitable polymer coating to the substrate, and contacting the sample to the polymer coating. The sample can then be detached from the substrate, e.g., using an organic solvent that at least partially dissolves the polymer coating. Hydrogels are examples of polymers that are suitable for this purpose.
- the substrate can be coated or functionalized with one or more substances to facilitate attachment of the sample to the substrate.
- Suitable substances that can be used to coat or functionalize the substrate include, but are not limited to, lectins, poly-lysine, antibodies, and polysaccharides.
- a biological sample can be harvested from a subject (e.g., via surgical biopsy, whole subject sectioning) or grown in vitro on a growth substrate or culture dish as a population of cells, and prepared for analysis as a tissue slice or tissue section. Grown samples may be sufficiently thin for analysis without further processing steps. Alternatively, grown samples, and samples obtained via biopsy or sectioning, can be prepared as thin tissue sections using a mechanical cutting apparatus such as a vibrating blade microtome. As another alternative, in some embodiments, a thin tissue section can be prepared by applying a touch imprint of a biological sample to a suitable substrate material.
- the thickness of the tissue section can be a fraction of (e.g., less than 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, or 0.1) the maximum cross-sectional dimension of a cell.
- tissue sections having a thickness that is larger than the maximum cross-section cell dimension can also be used.
- cryostat sections can be used, which can be, e.g., 10-20 µm thick.
- the thickness of a tissue section typically depends on the method used to prepare the section and the physical characteristics of the tissue, and therefore sections having a wide variety of different thicknesses can be prepared and used.
- the thickness of the tissue section can be at least 0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 1.0, 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20, 30, 40, or 50 µm.
- Thicker sections can also be used if desired or convenient, e.g., at least 70, 80, 90, or 100 µm or more.
- the thickness of a tissue section is between 1-100 µm, 1-50 µm, 1-30 µm, 1-25 µm, 1-20 µm, 1-15 µm, 1-10 µm, 2-8 µm, 3-7 µm, or 4-6 µm, but as mentioned above, sections with thicknesses larger or smaller than these ranges can also be analyzed.
- Multiple sections can also be obtained from a single biological sample.
- multiple tissue sections can be obtained from a surgical biopsy sample by performing serial sectioning of the biopsy sample using a sectioning blade. Spatial information among the serial sections can be preserved in this manner, and the sections can be analyzed successively to obtain three-dimensional information about the biological sample.
- the biological sample (e.g., a tissue section as described above) can be prepared by deep freezing at a temperature suitable to maintain or preserve the integrity (e.g., the physical characteristics) of the tissue structure.
- the frozen tissue sample can be sectioned, e.g., thinly sliced, onto a substrate surface using any number of suitable methods.
- a tissue sample can be prepared using a chilled microtome (e.g., a cryostat) set at a temperature suitable to maintain both the structural integrity of the tissue sample and the chemical properties of the nucleic acids in the sample.
- a temperature can be, e.g., less than -15°C, less than -20°C, or less than -25°C.
- the biological sample can be prepared using formalin-fixation and paraffin-embedding (FFPE), which are established methods.
- cell suspensions and other non-tissue samples can be prepared using formalin-fixation and paraffin-embedding.
- the sample can be sectioned as described above.
- the paraffin-embedding material can be removed from the tissue section (e.g., deparaffinization) by incubating the tissue section in an appropriate solvent (e.g., xylene) followed by a rinse (e.g., 99.5% ethanol for 2 minutes, 96% ethanol for 2 minutes, and 70% ethanol for 2 minutes).
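- The example rinse series above can be written down as a small protocol table, which makes the order and total duration explicit. In this Python sketch the names `ETHANOL_RINSE`, `total_minutes`, and `describe` are illustrative; the solvent incubation (e.g., xylene) is left out because no duration is given for it.

```python
from typing import List, Tuple

# (reagent, minutes) for the example ethanol rinse series in the text.
ETHANOL_RINSE: List[Tuple[str, int]] = [
    ("99.5% ethanol", 2),
    ("96% ethanol", 2),
    ("70% ethanol", 2),
]


def total_minutes(steps: List[Tuple[str, int]]) -> int:
    """Total time of a rinse series in minutes."""
    return sum(minutes for _reagent, minutes in steps)


def describe(steps: List[Tuple[str, int]]) -> List[str]:
    """Human-readable, numbered list of the steps in order."""
    return [
        f"{i}. {reagent}: {minutes} min"
        for i, (reagent, minutes) in enumerate(steps, start=1)
    ]


rinse_time = total_minutes(ETHANOL_RINSE)
rinse_plan = describe(ETHANOL_RINSE)
```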
- a biological sample can be fixed in any of a variety of other fixatives to preserve the biological structure of the sample prior to analysis.
- a sample can be fixed via immersion in ethanol, methanol, acetone, paraformaldehyde (PFA)-Triton, and combinations thereof.
- acetone fixation is used with fresh frozen samples, which can include, but are not limited to, cortex tissue, mouse olfactory bulb, human brain tumor, human post-mortem brain, and breast cancer samples.
- pre-permeabilization steps may not be performed.
- acetone fixation can be performed in conjunction with permeabilization steps.
- the methods provided herein comprise one or more post-fixing (also referred to as postfixation) steps.
- one or more post-fixing steps are performed after contacting a sample with a polynucleotide disclosed herein, e.g., one or more probes such as a circular or padlock probe.
- one or more post-fixing steps are performed after a hybridization complex comprising a probe and a target is formed in a sample.
- one or more post-fixing steps are performed prior to a ligation reaction disclosed herein, such as the ligation to circularize a padlock probe.
- one or more post-fixing steps are performed after contacting a sample with a binding or labelling agent (e.g., an antibody or antigen binding fragment thereof) for a non-nucleic acid analyte such as a protein analyte.
- the labelling agent can comprise a nucleic acid molecule (e.g., reporter oligonucleotide) comprising a sequence corresponding to the labelling agent and therefore corresponds to (e.g., uniquely identifies) the analyte.
- the labelling agent can comprise a reporter oligonucleotide comprising one or more barcode sequences.
- a post-fixing step may be performed using any suitable fixation reagent disclosed herein, for example, 3% (w/v) paraformaldehyde in DEPC-PBS.
(iv) Embedding
- a biological sample can be embedded in any of a variety of other embedding materials to provide structural substrate to the sample prior to sectioning and other handling steps.
- the embedding material can be removed, e.g., prior to analysis of tissue sections obtained from the sample.
- suitable embedding materials include, but are not limited to, waxes, resins (e.g., methacrylate resins), epoxies, and agar.
- the biological sample can be embedded in a matrix (e.g., a hydrogel matrix). Embedding the sample in this manner typically involves contacting the biological sample with a hydrogel such that the biological sample becomes surrounded by the hydrogel.
- the sample can be embedded by contacting the sample with a suitable polymer material, and activating the polymer material to form a hydrogel.
- the hydrogel is formed such that the hydrogel is internalized within the biological sample.
- the biological sample is immobilized in the hydrogel via cross-linking of the polymer material that forms the hydrogel.
- Cross-linking can be performed chemically and/or photochemically, or alternatively by any other hydrogel-formation method.
- the composition and application of the hydrogel-matrix to a biological sample typically depend on the nature and preparation of the biological sample (e.g., sectioned, non-sectioned, type of fixation).
- the hydrogel-matrix can include a monomer solution and an ammonium persulfate (APS) initiator/tetramethylethylenediamine (TEMED) accelerator solution.
- where the biological sample consists of cells (e.g., cultured cells or cells disassociated from a tissue sample), the cells can be incubated with the monomer solution and APS/TEMED solutions.
- hydrogel-matrix gels are formed in compartments, including but not limited to devices used to culture, maintain, or transport the cells.
- hydrogel-matrices can be formed with monomer solution plus APS/TEMED added to the compartment to a depth ranging from about 0.1 µm to about 2 mm.
- biological samples can be stained using a wide variety of stains and staining techniques.
- a sample can be stained using any number of stains and/or immunohistochemical reagents.
- One or more staining steps may be performed to prepare or process a biological sample for an assay described herein or may be performed during and/or after an assay.
- the sample can be contacted with one or more nucleic acid stains, membrane stains (e.g., cellular or nuclear membrane), cytological stains, or combinations thereof.
- the stain may be specific to proteins, phospholipids, DNA (e.g., dsDNA, ssDNA), RNA, an organelle or compartment of the cell.
- the sample may be contacted with one or more labeled antibodies (e.g., a primary antibody specific for the analyte of interest and a labeled secondary antibody specific for the primary antibody).
- cells in the sample can be segmented using one or more images taken of the stained sample.
- the stain is performed using a lipophilic dye.
- the staining is performed with a lipophilic carbocyanine or aminostyryl dye, or analogs thereof (e.g., DiI, DiO, DiR, DiD).
- Other cell membrane stains may include FM and RH dyes or immunohistochemical reagents specific for cell membrane proteins.
- the stain may include but is not limited to, acridine orange, acid fuchsin, Bismarck brown, carmine, coomassie blue, cresyl violet, DAPI, eosin, ethidium bromide, acid fuchsine, haematoxylin, Hoechst stains, iodine, methyl green, methylene blue, neutral red, Nile blue, Nile red, osmium tetroxide, ruthenium red, propidium iodide, rhodamine (e.g., rhodamine B), or safranine, or derivatives thereof.
- the sample may be stained with haematoxylin and eosin (H&E).
- the sample can be stained using hematoxylin and eosin (H&E) staining techniques, using Papanicolaou staining techniques, Masson’s trichrome staining techniques, silver staining techniques, Sudan staining techniques, and/or using Periodic Acid Schiff (PAS) staining techniques.
- PAS staining is typically performed after formalin or acetone fixation.
- the sample can be stained using Romanowsky stain, including Wright’s stain, Jenner’s stain, May-Grünwald stain, Leishman stain, and Giemsa stain.
- biological samples can be destained. Methods of destaining or discoloring a biological sample generally depend on the nature of the stain(s) applied to the sample. For example, in some embodiments, one or more immunofluorescent stains are applied to the sample via antibody coupling. Such stains can be removed using techniques such as cleavage of disulfide linkages via treatment with a reducing agent and detergent washing, chaotropic salt treatment, treatment with antigen retrieval solution, and treatment with an acidic glycine buffer. Methods for multiplexed staining and destaining are described, for example, in Bolognesi et al., J. Histochem. Cytochem.
- a biological sample embedded in a matrix can be isometrically expanded.
- Isometric expansion methods that can be used include hydration, a preparative step in expansion microscopy, as described in Chen et al., Science 347(6221):543-548, 2015.
- Isometric expansion can be performed by anchoring one or more components of a biological sample to a gel, followed by gel formation, proteolysis, and swelling.
- analytes in the sample, products of the analytes, and/or probes associated with analytes in the sample can be anchored to the matrix (e.g., hydrogel).
- Isometric expansion of the biological sample can occur prior to immobilization of the biological sample on a substrate, or after the biological sample is immobilized to a substrate.
- the isometrically expanded biological sample can be removed from the substrate prior to contacting the substrate with probes disclosed herein.
- the steps used to perform isometric expansion of the biological sample can depend on the characteristics of the sample (e.g., thickness of tissue section, fixation, cross-linking), and/or the analyte of interest (e.g., different conditions to anchor RNA, DNA, and protein to a gel).
- proteins in the biological sample are anchored to a swellable gel such as a polyelectrolyte gel.
- An antibody can be directed to the protein before, after, or in conjunction with being anchored to the swellable gel.
- DNA and/or RNA in a biological sample can also be anchored to the swellable gel via a suitable linker.
- linkers include, but are not limited to, 6-((Acryloyl)amino) hexanoic acid (Acryloyl-X SE) (available from ThermoFisher, Waltham, MA), Label-IT Amine (available from MirusBio, Madison, WI) and Label X (described for example in Chen et al., Nat. Methods 13:679-684, 2016, the entire contents of which are incorporated herein by reference).
- Isometric expansion of the sample can increase the spatial resolution of the subsequent analysis of the sample.
- the increased resolution in spatial profiling can be determined by comparison of an isometrically expanded sample with a sample that has not been isometrically expanded.
- a biological sample is isometrically expanded to a size at least 2x, 2.1x, 2.2x, 2.3x, 2.4x, 2.5x, 2.6x, 2.7x, 2.8x, 2.9x, 3x, 3.1x, 3.2x, 3.3x, 3.4x, 3.5x, 3.6x, 3.7x, 3.8x, 3.9x, 4x, 4.1x, 4.2x, 4.3x, 4.4x, 4.5x, 4.6x, 4.7x, 4.8x, or 4.9x its non-expanded size.
- the sample is isometrically expanded to at least 2x and less than 20x of its non-expanded size.
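The resolution gain from isometric expansion can be illustrated with simple arithmetic. The sketch below assumes the commonly used approximation that effective resolution scales as the optical resolution divided by the linear expansion factor, and it assumes a 250 nm optical resolution purely for illustration; neither the function name nor the numeric value comes from the disclosure, and real samples may deviate because of expansion non-uniformity.

```python
def effective_resolution_nm(optical_resolution_nm: float, expansion_factor: float) -> float:
    """Approximate effective resolution after uniform (isometric) expansion.

    Assumption: the sample expands by the same linear factor in every
    direction, so features separated by d before expansion are separated
    by d * expansion_factor afterwards.
    """
    if expansion_factor <= 0:
        raise ValueError("expansion_factor must be positive")
    return optical_resolution_nm / expansion_factor


# Illustrative (assumed) optical resolution of 250 nm at a few expansion factors.
for factor in (2.0, 4.0, 4.9):
    print(f"{factor}x expansion -> ~{effective_resolution_nm(250.0, factor):.1f} nm")
```

Under these assumptions, a larger expansion factor gives a proportionally finer effective resolution, which is why an expanded sample is compared against a non-expanded one when assessing the gain.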
- the biological sample is reversibly cross-linked prior to or during an in situ assay.
- the analytes, polynucleotides and/or amplification product (e.g., amplicon) of an analyte or a probe bound thereto can be anchored to a polymer matrix.
- the polymer matrix can be a hydrogel.
- one or more of the polynucleotide probe(s) and/or amplification product (e.g., amplicon) thereof can be modified to contain functional groups that can be used as an anchoring site to attach the polynucleotide probes and/or amplification product to a polymer matrix.
- a modified probe comprising oligo dT may be used to bind to mRNA molecules of interest, followed by reversible crosslinking of the mRNA molecules.
- a hydrogel may include a macromolecular polymer gel including a network. Within the network, some polymer chains can optionally be cross-linked, although crosslinking does not always occur.
- a hydrogel can include hydrogel subunits, such as, but not limited to, acrylamide, bis-acrylamide, polyacrylamide and derivatives thereof, poly(ethylene glycol) and derivatives thereof (e.g., PEG-acrylate (PEG-DA), PEG-RGD), gelatin-methacryloyl (GelMA), methacrylated hyaluronic acid (MeHA), polyaliphatic polyurethanes, polyether polyurethanes, polyester polyurethanes, polyethylene copolymers, polyamides, polyvinyl alcohols, polypropylene glycol, polytetramethylene oxide, polyvinyl pyrrolidone, polyacrylamide, poly(hydroxyethyl acrylate), and poly (hydroxy ethyl meth
- a hydrogel includes a hybrid material, e.g., the hydrogel material includes elements of both synthetic and natural polymers.
- Examples of suitable hydrogels are described, for example, in U.S. Patent Nos. 6,391,937, 9,512,422, and 9,889,422, and in U.S. Patent Application Publication Nos. 2017/0253918, 2018/0052081 and 2010/0055733, the entire contents of each of which are incorporated herein by reference.
- the hydrogel can form the substrate.
- the substrate includes a hydrogel and one or more second materials.
- the hydrogel is placed on top of one or more second materials.
- the hydrogel can be pre-formed and then placed on top of, underneath, or in any other configuration with one or more second materials.
- hydrogel formation occurs after contacting one or more second materials during formation of the substrate. Hydrogel formation can also occur within a structure (e.g., wells, ridges, projections, and/or markings) located on a substrate.
- hydrogel formation on a substrate occurs before, contemporaneously with, or after probes are provided to the sample.
- hydrogel formation can be performed on the substrate already containing the probes.
- hydrogel formation occurs within a biological sample.
- hydrogel subunits are infused into the biological sample, and polymerization of the hydrogel is initiated by an external or internal stimulus.
- in embodiments in which a hydrogel is formed within a biological sample, functionalization chemistry can be used.
- functionalization chemistry includes hydrogel-tissue chemistry (HTC).
- Any hydrogel-tissue backbone (e.g., synthetic or native) suitable for HTC can be used for anchoring biological macromolecules and modulating functionalization.
- Non-limiting examples of methods using HTC backbone variants include CLARITY, PACT, ExM, SWITCH and ePACT.
- hydrogel formation within a biological sample is permanent.
- biological macromolecules can permanently adhere to the hydrogel allowing multiple rounds of interrogation.
- hydrogel formation within a biological sample is reversible.
- additional reagents are added to the hydrogel subunits before, contemporaneously with, and/or after polymerization.
- additional reagents can include but are not limited to oligonucleotides (e.g., probes), endonucleases to fragment DNA, fragmentation buffer for DNA, DNA polymerase enzymes, dNTPs used to amplify the nucleic acid and to attach the barcode to the amplified fragments.
- Other enzymes can be used, including without limitation, RNA polymerase, ligase, proteinase K, and DNAse.
- Additional reagents can also include reverse transcriptase enzymes, including enzymes with terminal transferase activity, primers, and switch oligonucleotides.
- optical labels are added to the hydrogel subunits before, contemporaneously with, and/or after polymerization.
- HTC reagents are added to the hydrogel before, contemporaneously with, and/or after polymerization.
- a cell labelling agent is added to the hydrogel before, contemporaneously with, and/or after polymerization.
- a cell-penetrating agent is added to the hydrogel before, contemporaneously with, and/or after polymerization.
- Hydrogels embedded within biological samples can be cleared using any suitable method.
- electrophoretic tissue clearing methods can be used to remove biological macromolecules from the hydrogel-embedded sample.
- a hydrogel-embedded sample is stored before or after clearing of hydrogel, in a medium (e.g., a mounting medium, methylcellulose, or other semi-solid mediums).
- a method disclosed herein comprises de-crosslinking the reversibly cross-linked biological sample.
- the de-crosslinking does not need to be complete.
- only a portion of crosslinked molecules in the reversibly cross-linked biological sample are de-crosslinked and allowed to migrate.
- a biological sample can be permeabilized to facilitate transfer of species (such as probes) into the sample. If a sample is not permeabilized sufficiently, the amount of species (such as probes) in the sample may be too low to enable adequate analysis. Conversely, if the tissue sample is too permeable, the relative spatial relationship of the analytes within the tissue sample can be lost. Hence, a balance between permeabilizing the tissue sample enough to obtain good signal intensity while still maintaining the spatial resolution of the analyte distribution in the sample is desirable.
- a biological sample can be permeabilized by exposing the sample to one or more permeabilizing agents.
- Suitable agents for this purpose include, but are not limited to, organic solvents (e.g., acetone, ethanol, and methanol), cross-linking agents (e.g., paraformaldehyde), detergents (e.g., saponin, Triton X-100™ or Tween-20™), and enzymes (e.g., trypsin, proteases).
- the biological sample can be incubated with a cellular permeabilizing agent to facilitate permeabilization of the sample. Additional methods for sample permeabilization are described, for example, in Jamur et al., Methods Mol. Biol. 588:63-66, 2010, the entire contents of which are incorporated herein by reference. Any suitable method for sample permeabilization can generally be used in connection with the samples described herein.
- the biological sample can be permeabilized by adding one or more lysis reagents to the sample.
- suitable lysis agents include, but are not limited to, bioactive reagents such as lysis enzymes that are used for lysis of different cell types, e.g., gram positive or negative bacteria, plants, yeast, mammalian, such as lysozymes, achromopeptidase, lysostaphin, labiase, kitalase, lyticase, and a variety of other commercially available lysis enzymes.
- lysis agents can additionally or alternatively be added to the biological sample to facilitate permeabilization.
- surfactant-based lysis solutions can be used to lyse sample cells. Lysis solutions can include ionic surfactants such as, for example, sarcosyl and sodium dodecyl sulfate (SDS). More generally, chemical lysis agents can include, without limitation, organic solvents, chelating agents, detergents, surfactants, and chaotropic agents.
- the biological sample can be permeabilized by non-chemical permeabilization methods.
- Non-chemical permeabilization methods that can be used include, but are not limited to, physical lysis techniques such as electroporation, mechanical permeabilization methods (e.g., bead beating using a homogenizer and grinding balls to mechanically disrupt sample tissue structures), acoustic permeabilization (e.g., sonication), and thermal lysis techniques such as heating to induce thermal permeabilization of the sample.
- Additional reagents can be added to a biological sample to perform various functions prior to analysis of the sample.
- DNase and RNase inactivating agents or inhibitors such as proteinase K, and/or chelating agents such as EDTA, can be added to the sample.
- a method disclosed herein may comprise a step for increasing accessibility of a nucleic acid for binding, e.g., a denaturation step to open up DNA in a cell for hybridization by a probe.
- proteinase K treatment may be used to free up DNA with proteins bound thereto.
- where RNA or cDNA is the analyte, one or more RNA or cDNA analyte species of interest can be selectively enriched.
- one or more species of RNA or cDNA of interest can be selected by addition of one or more oligonucleotides to the sample.
- the additional oligonucleotide is a sequence used for priming a reaction by an enzyme (e.g., a polymerase).
- one or more primer sequences with sequence complementarity to one or more RNAs or cDNAs of interest can be used to amplify the one or more RNAs or cDNAs of interest, thereby selectively enriching these RNAs or cDNAs.
- a first and second probe that is specific for (e.g., specifically hybridizes to) each RNA or cDNA analyte are used.
- templated ligation is used to detect gene expression in a biological sample.
- An analyte of interest (such as a protein), bound by a labelling agent or binding agent (e.g., an antibody or epitope binding fragment thereof), wherein the binding agent is conjugated or otherwise associated with a reporter oligonucleotide comprising a reporter sequence that identifies the binding agent, can be targeted for analysis.
- Probes may be hybridized to the reporter oligonucleotide and ligated in a templated ligation reaction to generate a product for analysis.
- gaps between the probe oligonucleotides may first be filled prior to ligation, using, for example, Mu polymerase, DNA polymerase, RNA polymerase, reverse transcriptase, VENT polymerase, Taq polymerase, and/or any combinations, derivatives, and variants (e.g., engineered mutants) thereof.
- the assay can further include amplification of templated ligation products (e.g., by multiplex PCR).
- the analytes may be further enriched for in situ readout by immobilization at a location in the biological sample.
- the analytes may comprise one or more fragments that are specific to a location in the biological sample.
- RNA can be down-selected (e.g., removed) using any of a variety of methods.
- probes can be administered to a sample that selectively hybridize to ribosomal RNA (rRNA), thereby reducing the pool and concentration of rRNA in the sample.
- duplex-specific nuclease (DSN) treatment can remove rRNA (see, e.g., Archer et al., Selective and flexible depletion of problematic sequences from RNA-seq libraries at the cDNA stage, BMC Genomics, 15:401 (2014), the entire contents of which are incorporated herein by reference).
- hydroxyapatite chromatography can remove abundant species (e.g., rRNA) (see, e.g., Vandernoot, V.A., cDNA normalization by hydroxyapatite chromatography to enrich transcriptome diversity in RNA-seq applications, Biotechniques, 53(6):373-80 (2012), the entire contents of which are incorporated herein by reference).
- a biological sample may comprise one or a plurality of analytes of interest. Methods for performing multiplexed assays to analyze two or more different analytes in a single biological sample are provided.
- compositions and kits comprising any of the reagents for sequencing nucleic acids according to any of the embodiments described herein.
- Such compositions can comprise, but are not limited to, nucleic acid molecules, nucleotides conjugated to reversible labels such as fluorophores, nucleotides comprising reversible terminators, polymerases, chelators (e.g. EDTA), and salts and buffer solutions.
- kits for analyzing an analyte in a biological sample according to any of the methods described herein.
- kits may comprise, e.g., one or more reagents for detecting one or more target analytes, and instructions for performing one or more steps of the methods provided herein.
- the one or more reagents for performing the methods provided herein may include, e.g., nucleotides, modified nucleotides, polymerases and/or other enzymes, hybridization probes for detection, circularizable probes for amplification, nucleic acid primers, buffers, etc.
- kits may comprise one or more nucleotide mixtures comprising any combination of reversibly-terminated (e.g., 3’-OH reversibly terminated) and/or non-terminated nucleotides selected from A, T/U, G, and C.
- each terminated or non-terminated (e.g., 3’-OH reversibly terminated) nucleotide of a different base can be labeled with a different detectable label (e.g., a different fluorophore).
- kits may further comprise one or more reagents required for one or more steps comprising hybridization, ligation, extension, amplification, detection, and/or sample preparation as described herein, including, for example, wash buffers and/or ligation buffers.
- the kit further comprises an enzyme such as a ligase and/or a polymerase described herein.
- the kit comprises a polymerase, for instance for performing extension of the primers and to incorporate nucleotides.
- kits contain reagents for fixing, embedding, and/or permeabilizing the biological sample.
- kits may contain reagents for forming a functionalized matrix (e.g., a hydrogel) and/or for functionalizing a matrix (e.g., a hydrogel) with any suitable functional moieties.
- buffers and reagents for tethering the probes and products e.g., RCA products
- the various components of the kit may be present in separate containers or certain compatible components may be pre-combined into a single container.
- the kits further contain instructions for using the components of the kit to practice the provided methods.
- instrument systems configured to perform any of the methods or processes described herein, and databases storing codebooks generated using the disclosed methods or processes.
- the disclosed systems may comprise, for example, one or more processors; and a memory communicatively coupled to the one or more processors and configured to store instructions that, when executed by the one or more processors, cause the system to: receive a plurality of images of a biological sample, wherein the plurality of images comprises images acquired in a plurality of sequencing or probing cycles; detect, based on the plurality of images, a series of optical signals at one or more locations in the biological sample corresponding to one or more barcoded target analytes; determine, based on the series of optical signals detected in the plurality of images, a code word comprising a series of ON and OFF bits that corresponds to a barcode for one of the one or more barcoded target analytes; and identify the barcoded target analyte based on a comparison of the determined code word to a codebook, wherein the code word corresponds to a member of a codebook comprising a plurality of code words for which:
- […]($W_i \lor W_j$, $W_m \lor W_n$) > K for all possible combinations of code words $W_i$, $W_j$, $W_m$, $W_n$, wherein W […]
- the disclosed databases for storing codebooks may comprise, for example, one or more non-transitory computer-readable storage medium components, the one or more non-transitory computer-readable storage medium components individually or collectively storing a codebook comprising a plurality of code words for which: for all possible combinations of code words $W_i$, $W_j$, $W_m$, $W_n$, wherein $W_i \lor W_j$ is a logical bitwise OR combination of any two code words $W_i$ and $W_j$, wherein $W_m \lor W_n$ is a logical bitwise OR combination of any two code words $W_m$ and $W_n$, wherein $K$ is an integer value greater than or equal to 1, wherein the codebook comprises $L$ code words, and wherein $i$, $j$, $m$, and $n$ are integers ranging in value from 0 to $L - 1$ and represent indices of the code words in the codebook.
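The codebook condition compares bitwise OR combinations of code word pairs against the integer K. The measure used in that comparison is not legible in this text, so the following sketch assumes, for illustration only, that it is a Hamming distance. It further assumes that each code word is stored as a non-negative integer whose bits are the ON/OFF bits, that a "pair" may repeat a code word (i = j, which reproduces the single code word), and that only distinct index pairs are compared. The function names are hypothetical.

```python
from itertools import combinations, combinations_with_replacement


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def min_or_pair_distance(codebook: list[int]) -> int:
    """Smallest Hamming distance between the bitwise ORs of distinct index pairs.

    Assumptions (not confirmed by the source): the measure is a Hamming
    distance, pairs are unordered with i <= j, and a pair is never
    compared with itself.
    """
    if len(codebook) < 2:
        raise ValueError("need at least two code words")
    index_pairs = combinations_with_replacement(range(len(codebook)), 2)
    ors = [codebook[i] | codebook[j] for i, j in index_pairs]
    return min(hamming(x, y) for x, y in combinations(ors, 2))


# Two disjoint code words: every OR combination stays distinguishable.
print(min_or_pair_distance([0b0011, 0b1100]))
# Here code words 0 and 1 OR together into code word 2, so the minimum is 0.
print(min_or_pair_distance([0b001, 0b010, 0b011]))
```

A caller would compare the returned minimum against K. A minimum of 0 indicates that two overlapping signals could be mistaken for a different code word or pair, which is the kind of ambiguity a constraint of this form is presumably meant to exclude.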
- the disclosed instrument systems may comprise instruments having integrated optics and fluidics modules (e.g., “opto-fluidic instruments” or “opto-fluidic systems”) for detecting target molecules (e.g., nucleic acids, proteins, antibodies, etc.) in biological samples (e.g., one or more cells or a tissue sample) as described herein.
- the fluidics module is configured to deliver one or more reagents (e.g., detectably labeled nucleotides, polymerases, or conjugates) to the biological sample and/or remove spent reagents therefrom.
- the optics module is configured to illuminate the biological sample with light having one or more spectral emission curves (over a range of wavelengths) and subsequently capture one or more images of emitted light signals from the biological sample during one or more sequencing cycles (e.g., as described in Section III).
- an in situ assay e.g., sequencing a template nucleic acid
- the captured images may be processed in real time and/or at a later time to determine the presence of the one or more target molecules in the biological sample, as well as three-dimensional position information associated with each detected target molecule.
- the opto-fluidics instrument includes a sample module configured to receive (and, optionally, secure) one or more biological samples.
- the sample module includes an X-Y stage configured to move the biological sample along an X-Y plane (e.g., perpendicular to an objective lens of the optics module).
- the opto-fluidic instrument is configured to analyze one or more target molecules in their naturally occurring place (i.e., in situ) within the biological sample.
- an opto-fluidic instrument may be an in-situ analysis system used to analyze a biological sample and detect target molecules (e.g., analytes) including but not limited to DNA, RNA, proteins, antibodies, and/or the like.
- an opto-fluidic instrument that can be used for in situ target molecule detection via base-by-base sequencing (e.g., sequencing of an identifier sequence such as a barcode sequence) and/or other imaging or target molecule detection technique.
- an opto-fluidic instrument may include a fluidics module that includes fluids needed for establishing the experimental conditions required for the probing of target molecules in the sample.
- an opto-fluidic instrument may also include a sample module configured to receive the sample, and an optics module including an imaging system for illuminating (e.g., exciting one or more fluorescently labeled nucleotides within the sample) and/or imaging light signals received from the sample.
- the in situ analysis system may also include other ancillary modules configured to facilitate the operation of the opto-fluidic instrument, such as, but not limited to, cooling systems, motion calibration systems, etc.
- volumetric sample imaging systems e.g., an optofluidic instrument
- a z-stack of images is obtained for each Field of View (FOV) of the objective (FIG. 7).
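As a rough illustration of acquiring a z-stack for each FOV, the sketch below tiles a rectangular region with FOV centers and pairs each center with a list of focal planes. All names, the fractional-overlap parameter, and the micrometer units are assumptions made for this example; the disclosure does not specify how FOVs are laid out or how many planes a z-stack contains.

```python
import math


def fov_grid(region_w_um, region_h_um, fov_w_um, fov_h_um, overlap=0.1):
    """Return FOV center coordinates (x, y) covering a rectangular region.

    overlap is the assumed fractional overlap between neighboring FOVs.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    step_x = fov_w_um * (1 - overlap)
    step_y = fov_h_um * (1 - overlap)
    # At least one FOV per axis, even if the region is smaller than one FOV.
    nx = max(1, math.ceil((region_w_um - fov_w_um) / step_x) + 1)
    ny = max(1, math.ceil((region_h_um - fov_h_um) / step_y) + 1)
    return [
        (ix * step_x + fov_w_um / 2, iy * step_y + fov_h_um / 2)
        for iy in range(ny)
        for ix in range(nx)
    ]


def z_positions(z_start_um, z_step_um, n_planes):
    """Focal plane positions for one z-stack."""
    return [z_start_um + k * z_step_um for k in range(n_planes)]


def acquisition_plan(centers, planes):
    """One (x, y, z) imaging position per FOV center and focal plane."""
    return [(x, y, z) for (x, y) in centers for z in planes]


centers = fov_grid(1000, 500, 500, 500, overlap=0.0)
plan = acquisition_plan(centers, z_positions(0.0, 1.5, 3))
print(len(centers), "FOVs,", len(plan), "images")
```

Iterating over planes inside the loop over FOV centers keeps stage moves in X-Y to a minimum; an instrument could equally order the positions differently.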
- in tissue imaging applications, automatically identifying relevant regions (those regions that contain target molecules such as nucleic acids or proteins) can be challenging, as the distribution of tissue is non-uniform in many biological samples (FIG. 8).
- the data extracted from the detection and analysis methods disclosed herein include the relative coordinates within a field of view (FOV) and provide intricate information regarding tissue organization.
- the systems and methods described herein use any suitable method to generate contrast of a sample against a background (e.g., illumination of a sample via bright field imaging, illumination of a sample via fluorescent imaging, inducing autofluorescence within the sample, adding contrast to the sample with one or more stains, etc.)
- FIG. 9 shows an example workflow of analysis of a biological sample 910 (e.g., cell or tissue sample) using an opto-fluidic instrument 900, according to various embodiments.
- the sample 910 can be a biological sample (e.g., a tissue) that includes molecules such as DNA, RNA, proteins, antibodies, etc.
- the sample 910 can be a sectioned tissue that is treated to access the RNA thereof for labeling with circularizable DNA probes. Ligation of the probes may generate a circular DNA probe which can be enzymatically amplified and bound with fluorescent oligonucleotides, which can create bright signal that is convenient to image and has a high signal-to-noise ratio.
- the sample 910 may be placed in the opto-fluidic instrument 900 for analysis and detection of the molecules in the sample 910.
- the opto-fluidic instrument 900 can be a system configured to facilitate the experimental conditions conducive for the detection of the target molecules.
- the opto-fluidic instrument 900 can include a fluidics module 930, an optics module 940, a sample module 950, and an ancillary module 960, and these modules may be operated by a system controller 920 to create the experimental conditions for the probing of the molecules in the sample 910 by selected probes (e.g., circularizable DNA probes), as well as to facilitate the imaging of the probed sample (e.g., by an imaging system of the optics module 940).
- the various modules of the opto-fluidic instrument 900 may be separate components in communication with each other, or at least some of them may be integrated together.
- the sample module 950 may be configured to receive the sample 910 into the opto-fluidic instrument 900.
- the sample module 950 may include a sample interface module (SIM) that is configured to receive a sample device (e.g., cassette) onto which the sample 910 can be deposited. That is, the sample 910 may be placed in the opto-fluidic instrument 900 by depositing the sample 910 (e.g., the sectioned tissue) on a sample device that is then inserted into the SIM of the sample module 950.
- the sample module 950 may also include an X-Y stage onto which the SIM is mounted.
- the X-Y stage may be configured to move the SIM mounted thereon (e.g., and as such the sample device containing the sample 910 inserted therein) in perpendicular directions along the two-dimensional (2D) plane of the opto-fluidic instrument 900.
- the experimental conditions that are conducive for the detection of the molecules in the sample 910 may depend on the target molecule detection technique that is employed by the opto-fluidic instrument 900.
- the opto-fluidic instrument 900 can be a system that is configured to detect molecules in the sample 910 via hybridization of probes.
- the experimental conditions can include molecule hybridization conditions that result in the intensity of hybridization of the target molecule (e.g., nucleic acid) to a probe (e.g., oligonucleotide) being significantly higher when the probe sequence is complementary to the target molecule than when there is a single-base mismatch.
- the hybridization conditions include the preparation of the sample 910 using reagents such as washing/stripping reagents, hybridizing reagents, etc., and such reagents may be provided by the fluidics module 930.
- the fluidics module 930 may include one or more components that may be used for storing the reagents, as well as for transporting said reagents to and from the sample device containing the sample 910.
- the fluidics module 930 may include reservoirs configured to store the reagents, as well as a waste container configured for collecting the reagents (e.g., and other waste) after use by the opto-fluidic instrument 900 to analyze and detect the molecules of the sample 910.
- the fluidics module 930 may also include pumps, tubes, pipettes, etc., that are configured to facilitate the transport of the reagent to the sample device (e.g., and as such the sample 910).
- the fluidics module 930 may include pumps (“reagent pumps”) that are configured to pump washing/stripping reagents to the sample device for use in washing/stripping the sample 910 (e.g., as well as other washing functions such as washing an objective lens of the imaging system of the optics module 940).
- the ancillary module 960 can be a cooling system of the opto-fluidic instrument 900, and the cooling system may include a network of coolant-carrying tubes that are configured to transport coolants to various modules of the opto-fluidic instrument 900 for regulating the temperatures thereof.
- the fluidics module 930 may include coolant reservoirs for storing the coolants and pumps (e.g., “coolant pumps”) for generating a pressure differential, thereby forcing the coolants to flow from the reservoirs to the various modules of the opto-fluidic instrument 900 via the coolant-carrying tubes.
- the fluidics module 930 may include returning coolant reservoirs that may be configured to receive and store returning coolants, i.e., heated coolants flowing back into the returning coolant reservoirs after absorbing heat discharged by the various modules of the opto-fluidic instrument 900.
- the fluidics module 930 may also include cooling fans that are configured to force air (e.g., cool and/or ambient air) into the returning coolant reservoirs to cool the heated coolants stored therein.
- the fluidics module 930 may also include cooling fans that are configured to force air directly into a component of the opto-fluidic instrument 900 so as to cool said component.
- the fluidics module 930 may include cooling fans that are configured to direct cool or ambient air into the system controller 920 to cool the same.
- the opto-fluidic instrument 900 may include an optics module 940 that includes the various optical components of the opto-fluidic instrument 900, such as but not limited to a camera, an illumination module (e.g., light source such as LEDs), an objective lens, and/or the like.
- the optics module 940 may include a fluorescence imaging system that is configured to image the fluorescence emitted by the probes (e.g., oligonucleotides) in the sample 910 after the probes are excited by light from the illumination module of the optics module 940.
- the optics module 940 may also include an optical frame onto which the camera, the illumination module, and/or the X-Y stage of the sample module 950 may be mounted.
- the system controller 920 may be configured to control the operations of the opto-fluidic instrument 900 (e.g., and the operations of one or more modules thereof).
- the system controller 920 may take various forms, including a processor, a single computer (or computer system), or multiple computers in communication with each other.
- the system controller 920 may be communicatively coupled with data storage, a set of input devices, a display system, or a combination thereof. In some cases, some or all of these components may be considered to be part of or otherwise integrated with the system controller 920, may be separate components in communication with each other, or may be integrated together.
- the system controller 920 can be, or may be in communication with, a cloud computing platform.
- the opto-fluidic instrument 900 may analyze the sample 910 and may generate the output 970 that includes indications of the presence of the target molecules in the sample 910. For instance, with respect to the example embodiment discussed above where the opto-fluidic instrument 900 employs a hybridization technique for detecting molecules, the opto-fluidic instrument 900 may cause the sample 910 to undergo successive rounds of fluorescent probe hybridization (using two or more sets of fluorescent probes, where each set of fluorescent probes is excited by a different color channel) and be imaged to detect target molecules in the probed sample 910. In such cases, the output 970 may include optical signatures (e.g., a code word) specific to each gene, which allow the identification of the target molecules.
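The step from imaged signals to an identified target can be sketched as follows. This is a minimal illustration under stated assumptions: the intensities are already normalized to the range 0 to 1 with one value per bit position (for example, one per imaging round and color channel), a fixed threshold separates ON from OFF, and only exact matches to the codebook are accepted. The gene names, the threshold value, and the function names are hypothetical and are not taken from the disclosure.

```python
def call_bits(intensities, threshold=0.5):
    """Convert per-bit signal intensities into ON (1) / OFF (0) bits."""
    return tuple(1 if value >= threshold else 0 for value in intensities)


def decode(intensities, codebook, threshold=0.5):
    """Return the target whose code word matches the observed bits, else None."""
    observed = call_bits(intensities, threshold)
    return codebook.get(observed)


# Hypothetical codebook: each code word (a tuple of bits) maps to one target.
codebook = {
    (1, 0, 0, 1): "GENE_A",
    (0, 1, 1, 0): "GENE_B",
}

print(decode([0.9, 0.1, 0.2, 0.8], codebook))  # matches GENE_A's code word
print(decode([0.9, 0.9, 0.9, 0.9], codebook))  # no matching code word
```

Exact matching is the simplest possible policy; a practical decoder might instead tolerate a limited number of bit errors, which is where the minimum distance properties of the codebook become relevant.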
- an assembly for transilluminating a substrate can include a sample carrier device (e.g., a microfluidic chip or glass slide), a thermal control module configured to control the temperature of the sample carrier device (e.g., a thermoelectric module), and a light source configured to illuminate the sample carrier device.
- the assembly includes a heat exchanger (e.g., a fluid block having a cooling fluid flowing therethrough).
- an assembly for transilluminating can include a sample carrier device (e.g., a sample substrate), an optically transparent substrate, a light source configured to illuminate the optically transparent substrate, a light scattering layer configured to scatter light from the light source, and/or a thermal control module configured to control the temperature of the sample carrier device and/or optically transparent substrate.
- the sample carrier device (e.g., a cassette) can be configured to receive a sample.
- the sample carrier device can include one or more microfluidic channels, e.g., sample chambers or microfluidic channels etched into a planar substrate or chambers within a flow cell or microfluidic device.
- a sample carrier device for the systems disclosed herein can include, but is not limited to, a substrate configured to receive a sample, a microscope slide and/or an adapter configured to mount microscope slides (with or without coverslips) on a microscope stage or automated stage (e.g., an automated translation or rotational stage), a substrate, and/or an adapter configured to mount slides on a microscope stage or automated stage, a substrate comprising etched sample containment chambers (e.g., chambers open to the environment) and/or an adapter configured to mount such substrates on a microscope stage or automated stage, a flow cell and/or an adapter configured to mount flow cells on a microscope stage or automated stage, or a microfluidic device and/or an adapter configured to mount microfluidic devices on a microscope stage or automated stage.
- the sample carrier device further includes a cassette configured to secure a substrate (e.g., a glass slide).
- the cassette includes two or more components (e.g., a top half and a bottom half) into which the substrate is secured.
- the one or more sample carrier devices can be designed for performing a variety of chemical analysis, biochemical analysis, nucleic acid analysis, cell analysis, or tissue analysis applications.
- the sample carrier device e.g., flow cells and microfluidic devices
- the sample carrier device may comprise a sample, e.g., a tissue sample.
- sample carrier devices for the disclosed systems can be fabricated from any of a variety of materials known to those of skill in the art including, but not limited to, glass (e.g., borosilicate glass, soda lime glass, etc.), fused silica (quartz), silicon, polymer (e.g., polystyrene (PS), macroporous polystyrene (MPPS), polymethylmethacrylate (PMMA), polycarbonate (PC), polypropylene (PP), polyethylene (PE), high density polyethylene (HDPE), cyclic olefin polymers (COP), cyclic olefin copolymers (COC), polyethylene terephthalate (PET), polydimethylsiloxane (PDMS), etc.), polyetherimide (PEI) and perfluoroelasto
- the one or more materials used to fabricate sample carrier devices for the disclosed systems can be optically transparent to facilitate use with spectroscopic or imaging-based detection techniques.
- the entire sample carrier device can be optically transparent.
- only a portion of the sample carrier device (e.g., an optically transparent “window”) can be optically transparent.
- sample carrier devices for the disclosed systems can be fabricated using any of a variety of techniques known to those of skill in the art, where the choice of fabrication technique is often dependent on the choice of material used, and vice versa.
- sample carrier device fabrication techniques include, but are not limited to, extrusion, drawing, precision computer numerical control (CNC) machining and boring, laser photoablation, photolithography in combination with wet chemical etching, deep reactive ion etching (DRIE), micro-molding, embossing, 3D-printing, thermal bonding, adhesive bonding, anodic bonding, and the like (see, e.g., Gale, et al. (2016), “A Review of Current Methods in Microfluidic Device Fabrication and Future Commercialization Prospects”, Inventions 3, 60, 1 - 25, which is hereby incorporated by reference in its entirety).
- FIG. 10A illustrates a cross-sectional view of an optics module 1000 in an imaging system.
- One or more illumination sources 1010 e.g., one or more light emitting diodes (LEDs)
- the optical components include a collimator 1011.
- the optical components include a field stop 1012.
- the optical components include one or more excitation filters 1013.
- the one or more excitation filters 1013 are configured to filter light from the illumination source(s) 1010 for a predetermined range of wavelengths (e.g., each filter has one or more blocking band(s) and/or transmission band(s) that may be different or may overlap at least in part) and each excitation filter 1013 is aligned with appropriate illumination sources (e.g., blue LEDs, green LEDs, yellow LEDs, red LEDs, ultraviolet LEDs, etc.).
- the optical components include a condenser 1014.
- the optical components include a beam splitter 1015.
- An optical axis 1051 is illustrated extending through the center of the optical surfaces in the objective lens 1020 and its path includes an image plane, a focal plane, and input/output pupils (illustrated in FIG. 10B).
- a sensor array 1060 receives light signals from the sample 1050.
- the optical components include one or more emission filters 1065.
- the one or more emission filters 1065 are configured to filter light from the sample (e.g., emitted from one or more fluorophores, autofluorescence, etc.) for a predetermined range of wavelengths (e.g., each filter has one or more blocking band(s) and/or transmission band(s) that may be different or may overlap at least in part).
- the emission filters 1065 align (e.g., via motorized translation) with optics and/or the sensor array.
- the sample 1050 is probed with fluorescent probes configured to bind to a target (e.g., DNA or RNA) that, when illuminated with a particular wavelength (or range of wavelengths) of light, emit light signals that can be detected by the sensor array 1060.
- the sample 1050 is repeatedly probed with two or more (e.g., two, three, four, five, six, etc.) different sets of probes.
- each set of probes corresponds to a specific color (e.g., blue, green, yellow, or red) such that, when illuminated by that color, probes bound to a target emit light signals.
- the sensor array 1060 is aligned with the optical axis 1051 of the objective lens 1020 (i.e., the optical axis of the camera is coincident with and parallel to the optical axis of the objective lens 1020). In various embodiments, the sensor array 1060 is positioned perpendicularly to the objective lens 1020 (i.e., the optical axis of the camera is perpendicular to and intersects the optical axis of the objective lens 1020). In various embodiments, a tube lens 1061 is mounted in the optical path to focus light on the sensor array 1060 thereby allowing for image formation with infinity-corrected objectives. Descriptions of optical modules and illumination assemblies for use in opto-fluidic instruments can be found in U.S.
- the sample is illuminated with one or more wavelengths configured to induce fluorescence in the sample.
- the sample is probed during one or more probing cycles with one or more fluorescent probes configured to bind to one or more target analytes.
- the one or more wavelengths are selected to induce fluorescence in a subset of the one or more fluorescent probes.
- each probing cycle includes illumination with two or more (e.g., four) colors of light.
- the sample is treated with a fluorescent stain configured to illuminate one or more structures within the sample.
- the sample is contacted with a nuclear stain.
- the sample is contacted with 4',6-diamidino-2-phenylindole (“DAPI”) configured to bind to adenine-thymine-rich regions in DNA.
- illumination of the sample causes autofluorescence of the sample.
- autofluorescence is the natural emission of light by biological structures when they have absorbed light, and may be used to distinguish the light originating from artificially added fluorescent markers.
- fluorescence of the sample through fluorescent probes, autofluorescence, and/or a fluorescent stain can be used with the methods described herein to determine one or more focus metrics of a tissue sample.
- the sample is illuminated via edge lighting or transillumination along one or more edges of the sample and/or sample substrate.
- the edge lighting provides dark-field illumination of the sample.
- edge lighting is provided by one or more light sources positioned to provide light substantially perpendicular to a normal of the substrate surface on which the sample is disposed.
- the substrate is a glass slide.
- the substrate is configured as a wave guide to thereby guide light emitted from the edge lighting towards the sample.
- illumination of the sample via edge lighting can be used with the methods described herein to determine one or more focus metrics of a tissue sample.
- Example: A mouse brain tissue sample is provided (fresh frozen or FFPE).
- the tissue sample can optionally be permeabilized (FFPE is already permeabilized).
- the tissue sample is contacted with a plurality of barcoded probes.
- the tissue sample is positioned in an optofluidic instrument having an OR-robust codebook stored thereon and, in each probing cycle of a plurality of probing cycles, the tissue sample is contacted with fluorescent tags. Fluorescent blobs from the tissue sample are detected by the optofluidic instrument in each probing cycle and the blobs are registered and/or aligned across all cycles.
- the optical signals are converted into an observed codeword, for example, using a probabilistic-based decoder. The resulting observed codewords from the observed optical signals are decoded against an OR-robust codebook stored on the instrument.
- FIG. 11 illustrates an example of a computing device or system in accordance with one or more examples of the disclosure.
- Device 1100 can be a host computer connected to a network.
- Device 1100 can be a client computer or a server.
- device 1100 can be any suitable type of microprocessor-based device, such as a personal computer, workstation, server, or handheld computing device (portable electronic device), such as a phone or tablet.
- the device can include, for example, one or more of processor 1110, input device 1120, output device 1130, memory / storage 1140, and communication device 1160.
- Input device 1120 and output device 1130 can generally correspond to those described above, and they can either be connectable or integrated with the computer.
- Input device 1120 can be any suitable device that provides input, such as a touch screen, keyboard or keypad, mouse, or voice-recognition device.
- Output device 1130 can be any suitable device that provides output, such as a touch screen, haptics device, or speaker.
- Storage 1140 can be any suitable device that provides storage, such as an electrical, magnetic, or optical memory including a RAM, cache, hard drive, or removable storage disk.
- Communication device 1160 can include any suitable device capable of transmitting and receiving signals over a network, such as a network interface chip or device.
- the components of the computer can be connected in any suitable manner, such as via a physical bus 1170 or wirelessly.
- Software 1150 which can be stored in memory / storage 1140 and executed by processor 1110, can include, for example, the programming that embodies the functionality of the present disclosure (e.g., as embodied in the methods and systems described above).
- Software 1150 can also be stored and/or transported within any non-transitory computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch instructions associated with the software from the instruction execution system, apparatus, or device and execute the instructions.
- a computer-readable storage medium can be any medium, such as storage 1140, that can contain or store programming for use by or in connection with an instruction execution system, apparatus, or device.
- Software 1150 can also be propagated within any transport medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch instructions associated with the software from the instruction execution system, apparatus, or device and execute the instructions.
- a transport medium can be any medium that can communicate, propagate, or transport programming for use by or in connection with an instruction execution system, apparatus, or device.
- the transport readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, or infrared wired or wireless propagation medium.
- Device 1100 may be connected to a network, which can be any suitable type of interconnected communication system.
- the network can implement any suitable communications protocol and can be secured by any suitable security protocol.
- the network can comprise network links of any suitable arrangement that can implement the transmission and reception of network signals, such as wireless network connections, T1 or T3 lines, cable networks, DSL, or telephone lines.
- Device 1100 can implement any operating system suitable for operating on the network.
- Software 1150 can be written in any suitable programming language, such as C, C++, Java, or Python.
- application software embodying the functionality of the present disclosure can be deployed in different configurations, such as in a client/server arrangement or through a web browser as a web-based application or web service, for example.
- polynucleotide refers to polymeric forms of nucleotides of any length, either ribonucleotides or deoxyribonucleotides.
- this term comprises, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases.
- “Ligation” may refer to the formation of a covalent bond or linkage between the termini of two or more nucleic acids, e.g., oligonucleotides and/or polynucleotides, in a template-driven reaction.
- the nature of the bond or linkage may vary widely and the ligation may be carried out enzymatically or chemically.
- ligations are usually carried out enzymatically to form a phosphodiester linkage between a 5' carbon terminal nucleotide of one oligonucleotide with a 3' carbon of another nucleotide.
Landscapes
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Medical Informatics (AREA)
- Evolutionary Biology (AREA)
- Biotechnology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Analytical Chemistry (AREA)
- Chemical & Material Sciences (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Bioethics (AREA)
- Public Health (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Pathology (AREA)
- Immunology (AREA)
- General Physics & Mathematics (AREA)
- Biochemistry (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Methods for the design of a codebook comprising a set of codewords that are assigned to barcoded target analytes in a multiplexed in situ assay are described, where the codebook is designed to minimize the impact of target analyte spatial crowding on accurate decoding and detection of target analytes. Methods for performing in situ decoding using the disclosed codebooks, where, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1, are also described.
Description
SYSTEMS AND METHODS FOR GENERATING CODEBOOKS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the priority benefit of United States Provisional Patent Application Serial No. 63/649,266, filed May 17, 2024, the contents of which are incorporated herein by reference in their entirety.
FIELD
[0002] The present disclosure generally relates to methods and systems for imaging-based in situ analysis of target analytes in biological samples and, more specifically, to methods for designing codebooks having a set of code words that are assigned to barcoded target analytes in a multiplexed assay. In particular, the codebook is designed to reduce (e.g., minimize) the impact of spatial crowding on accurate target analyte detection.
BACKGROUND
[0003] In some in-situ detection and sequencing methods, each target molecule to be detected in a multiplexed assay is assigned a unique codeword from a codebook of valid code words. Some codebooks (e.g., binary codebooks) may be designed to include only code words that have a minimum edit distance (e.g., a minimum Hamming distance) between each pairwise combination of code words to allow for error correction and/or accurate decoding despite the presence of errors in, e.g., imaging-based signal intensity data collected for each target molecule.
[0004] In order to apply multiplexed in situ detection and sequencing techniques to a tissue specimen, nucleic acid probes with target-specific barcodes corresponding to the designed code words are introduced to the tissue specimen, attached to the target molecules within the sample (the target molecules may have a generally stochastic distribution throughout the tissue specimen volume), and then typically amplified to create features (e.g., rolling circle amplification products (RCPs)) comprising multiple copies of the target-specific barcodes assigned to the target molecules. Different target molecules may be close to one another within the three-dimensional volume of the tissue specimen. If the distance between two target molecules, or representative features thereof (e.g., RCPs), approaches the "localization precision" (i.e., the accuracy with which the center of each representative feature can be measured in an optical image of the tissue specimen) of the optical imaging technique utilized for detection, then the observed optical signal in the image of that region of tissue will contain optical signals (e.g., "ON" signals in one or more optical detection channels) arising from the representative features of both target molecules, and the estimated center positions of the two representative target molecule features will partially or completely overlap. In this case, a decoding algorithm used to decode the target-specific barcodes corresponding to code words may not be able to determine which optical signal arose from each target molecule’s representative feature. In a conventional codebook, there may be many different pairs of code words that correspond to the same observed set of ON signals in a given decoding cycle, and the decoding algorithm will not be able to distinguish between the various possible combinations to yield an accurate, high confidence determination of the code words corresponding to the two target molecules. Thus, there is a need for improved codebook designs that mitigate the interference of target molecule spatial crowding in tissue specimens, and thereby enable more accurate decoding in multiplexed in situ detection and sequencing assays.
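The ambiguity described above can be illustrated with a small sketch. It assumes, purely for illustration, that each code word is a Python integer whose bits are the ON/OFF signals and that two overlapping features produce the bitwise OR of their code words. The function name `colliding_or_pairs` and the four-bit toy codebook are hypothetical and do not come from the disclosure.

```python
from collections import defaultdict
from itertools import combinations


def colliding_or_pairs(codebook):
    """Group code word pairs by their bitwise OR and keep only the OR
    patterns that are produced by more than one pair."""
    pairs_by_union = defaultdict(list)
    for a, b in combinations(codebook, 2):
        pairs_by_union[a | b].append((a, b))
    return {
        union: pairs
        for union, pairs in pairs_by_union.items()
        if len(pairs) > 1
    }


# Hypothetical conventional codebook: four 4-bit code words.
toy_codebook = [0b0011, 0b1100, 0b0101, 0b1010]
collisions = colliding_or_pairs(toy_codebook)

for union, pairs in collisions.items():
    readable = [(format(a, "04b"), format(b, "04b")) for a, b in pairs]
    print(format(union, "04b"), "<-", readable)
```

In this toy codebook, two different pairs of code words yield the same combined pattern, so a decoder that observes that pattern cannot tell which pair of targets is present.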
SUMMARY
[0005] Disclosed herein are methods comprising: receiving a codebook comprising a plurality of valid code words, wherein, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to a predetermined number; receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
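The codebook condition recited above can be checked by brute force, as in the following minimal sketch. It assumes code words are stored as Python integers and reads "any other pair" as any different unordered pair of valid code words, including pairs that share one code word; that reading, the function names, and the six-bit example codebook are illustrative assumptions rather than the disclosed implementation.

```python
from itertools import combinations


def hamming(a, b):
    """Number of bit positions in which integers a and b differ."""
    return bin(a ^ b).count("1")


def is_or_robust(codebook, min_distance=1):
    """Return True if the bitwise OR of every pair of code words is at
    least min_distance (Hamming) away from the OR of every other pair."""
    unions = [a | b for a, b in combinations(codebook, 2)]
    return all(
        hamming(u, v) >= min_distance
        for u, v in combinations(unions, 2)
    )


# Hypothetical 6-bit codebook used only to exercise the check.
example_codebook = [0b000011, 0b001100, 0b110000, 0b010101]
print(is_or_robust(example_codebook, min_distance=1))
```

The check compares every pair of pairs, so its cost grows roughly with the fourth power of the codebook size. That is acceptable for a small illustration, but a large codebook would likely need a more efficient construction or search.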
[0006] In some embodiments, the predetermined number is 1. In some embodiments, decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words comprises: determining a location of each observed optical signal in a first image of the plurality of images; aligning the locations of the plurality of observed optical signals in the plurality of images to obtain a series of observed optical signals at each location; and obtaining the plurality of observed code words based on the series of observed optical signals at each location. In some embodiments, aligning the locations of the plurality of observed optical signals in the plurality of images comprises registering the plurality of images acquired over the plurality of sequencing or probing cycles.
[0007] In some embodiments, each observed optical signal of the plurality of observed optical signals comprises at least one intensity value representing an intensity of the observed optical signal. In some embodiments, the at least one intensity value comprises an analog intensity value. In some embodiments, the at least one intensity value comprises a raw intensity value, a normalized intensity value, or a calculated intensity value calculated based on at least one of: a size of a feature corresponding to the observed optical signal, a circularity of a feature corresponding to the observed optical signal, or one or more Gaussian statistical parameters characterizing a feature corresponding to the observed optical signal. In some embodiments, the method further comprises comparing the at least one intensity value representing an intensity of each observed optical signal to a predetermined intensity threshold to determine a binary value representing the intensity of each observed optical signal. In some embodiments, each binary value comprises a 1 or a 0, wherein 1 represents an observed optical signal for which intensity is greater than or equal to the predetermined intensity threshold and 0 represents an observed optical signal for which intensity is less than the predetermined intensity threshold. In some embodiments, decoding the plurality of observed optical signals in the plurality of images comprises obtaining the plurality of observed code words based on a series of binary values determined for each location. In some embodiments, each observed code word of the plurality of code words comprises a plurality of code word segments, and wherein each code word segment comprises a specified string of binary values that corresponds to one of a specified set of observed optical signal states. 
In some embodiments, each code word segment comprises a four-bit string of binary values such that: a code word segment of 1000 corresponds to a first optical signal state, A, in which an optical signal is detected in a first detection channel of a four-channel optical imaging instrument, and no optical signal is detected in a second, third, or fourth detection channel of the four-channel optical imaging instrument; a code word segment of 0100 corresponds to a second optical signal state, B, in which an optical signal is detected in the second detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, third, or fourth detection channel of the four-channel optical imaging instrument; a code word segment of 0010 corresponds to a third optical signal state, C, in which an optical signal is detected in the third detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, second, or fourth detection channel of the four-channel optical imaging instrument; a code word segment of 0001 corresponds to a fourth optical signal state, D, in which an optical signal is detected in the fourth detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, second, or third detection channel of the four-channel optical imaging instrument; and a code word segment of 0000 corresponds to a fifth optical signal state, E, in which no optical signal is detected in any of the first, second, third, or fourth detection channels of the four-channel optical imaging instrument. In some embodiments, determining the assignment of the observed code word to one of the plurality of valid code words comprises identifying a valid code word of the plurality of valid code words that is identical to the observed code word. In some embodiments, determining the assignment of the observed code word to one of the plurality of valid code words comprises changing at least one of the binary values in the series of binary values corresponding to the observed code word to thereby assign the observed code word to a valid code word of the plurality of valid code words.
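The thresholding step and the segment-to-state mapping described in this paragraph can be sketched as follows. The intensity values, the threshold of 0.5, the function names, and the "?" marker for a segment that matches none of the five listed states (for example, a segment with two ON channels) are illustrative assumptions.

```python
# Segment-to-state table taken from the paragraph above.
SEGMENT_STATES = {
    (1, 0, 0, 0): "A",
    (0, 1, 0, 0): "B",
    (0, 0, 1, 0): "C",
    (0, 0, 0, 1): "D",
    (0, 0, 0, 0): "E",
}


def binarize(intensities, threshold):
    """Map each intensity to 1 if it is >= threshold, otherwise to 0."""
    return [1 if value >= threshold else 0 for value in intensities]


def segments_to_states(bits, segment_length=4):
    """Split a bit series into fixed-length segments and name each one.

    A segment not found in SEGMENT_STATES is reported as "?"
    (a hypothetical marker, not part of the disclosure)."""
    if len(bits) % segment_length != 0:
        raise ValueError("bit series length must be a multiple of the segment length")
    states = []
    for start in range(0, len(bits), segment_length):
        segment = tuple(bits[start:start + segment_length])
        states.append(SEGMENT_STATES.get(segment, "?"))
    return "".join(states)


# Hypothetical normalized intensities: three cycles, four channels each.
intensities = [0.9, 0.1, 0.0, 0.2,
               0.1, 0.05, 0.7, 0.0,
               0.0, 0.1, 0.2, 0.1]
observed_bits = binarize(intensities, threshold=0.5)
observed_states = segments_to_states(observed_bits)
print(observed_bits, observed_states)
```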
[0008] In some embodiments, determining the assignment of the observed code word to one of the plurality of valid code words comprises determining a plurality of scores based on comparison of the observed code word to all or a portion of the plurality of valid code words. In some embodiments, the method further comprises selecting one of the plurality of valid code words having a highest score to assign as a replacement for the observed code word.
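A score-based assignment of this kind can be sketched as follows. The disclosure does not define the score here, so the sketch assumes the score is the negative Hamming distance between the observed code word and each valid code word. It further assumes that an observed code word is rejected when the best match is more than `max_distance` bit changes away or when two valid code words tie. These choices, and all of the names used, are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which integers a and b differ."""
    return bin(a ^ b).count("1")


def assign_by_score(observed, codebook, max_distance=1):
    """Return the valid code word with the highest score, or None when
    the observed code word is treated as not valid.

    Score (an assumption): negative Hamming distance to the observed
    code word, so an exact match scores 0 and is always the best."""
    scores = {code_word: -hamming(observed, code_word) for code_word in codebook}
    best_score = max(scores.values())
    winners = [cw for cw, score in scores.items() if score == best_score]
    if -best_score > max_distance or len(winners) != 1:
        return None
    return winners[0]


# Hypothetical 8-bit codebook with pairwise Hamming distance >= 4.
valid_code_words = [0b11110000, 0b00001111, 0b11001100]

exact = assign_by_score(0b00001111, valid_code_words)
corrected = assign_by_score(0b11110001, valid_code_words)
rejected = assign_by_score(0b10101010, valid_code_words)
```

With `max_distance=1`, an exact match is assigned directly, a single-bit error is corrected to the nearest valid code word, and an observed code word that is far from every valid code word is reported as not valid.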
[0009] In some embodiments, the plurality of images comprises a plurality of images comprising different fields-of-view of the biological sample. In some embodiments, the plurality of images comprises a plurality of z-stack images of the biological sample. In some embodiments, the plurality of observed optical signals represents light emitted from a plurality of fluorophores.
[0010] In some embodiments, the method further comprises identifying a target analyte in the biological sample based on the determined assignment of the observed code word to a valid code word and the codebook. In some embodiments, the identified target analyte comprises a messenger RNA (mRNA) molecule or protein molecule.
[0011] In some embodiments, each valid code word of the plurality of valid code words has a second Hamming distance of greater than or equal to 4 from every other valid code word. In some embodiments, the codebook comprises at least 50 valid code words. In some embodiments, the codebook comprises up to 200,000 valid code words.
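The pairwise distance condition can be verified with a short check such as the one below, which assumes code words are stored as Python integers. The function names and the eight-bit example codebook are illustrative only.

```python
from itertools import combinations


def hamming(a, b):
    """Number of bit positions in which integers a and b differ."""
    return bin(a ^ b).count("1")


def min_pairwise_distance(codebook):
    """Smallest Hamming distance between any two distinct code words."""
    return min(hamming(a, b) for a, b in combinations(codebook, 2))


# Hypothetical 8-bit codebook used only to exercise the check.
example_codebook = [0b11110000, 0b00001111, 0b11001100, 0b00110011]
meets_requirement = min_pairwise_distance(example_codebook) >= 4
print(min_pairwise_distance(example_codebook), meets_requirement)
```

A minimum distance of 4 means that any single-bit error leaves the observed code word closer to its original code word than to any other, while a two-bit error can be detected but not always corrected.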
[0012] Disclosed herein are systems comprising: a codebook database; a computing system comprising at least one computer-readable storage medium having program instructions stored thereon, the program instructions executable by at least one processor of the computing system to cause the at least one processor to perform a method comprising: receiving a codebook from the codebook database comprising a plurality of valid code words, wherein, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to a predetermined number; receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words. In some embodiments, the predetermined number is 1.
[0013] Disclosed herein are computer program products comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform any of the methods described herein.
[0014] Disclosed herein are methods comprising: receiving a codebook comprising a plurality of valid code words, wherein, for at least a first portion of the valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to a predetermined number; receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
[0015] In some embodiments, the predetermined number is 1. In some embodiments, decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words comprises: determining a location of each observed optical signal in a first image of the plurality of images; aligning the locations of the plurality of observed optical signals in the plurality of images to obtain a series of observed optical signals at each location; and obtaining the plurality of observed code words based on the series of observed optical signals at each location. In some embodiments, aligning the locations of the plurality of observed optical signals in the plurality of images comprises registering the plurality of images acquired over the plurality of sequencing or probing cycles. In some embodiments, each observed optical signal of the plurality of observed optical signals comprises at least one intensity value representing an intensity of the observed optical signal. In some embodiments, the at least one intensity value comprises an analog intensity value. In some embodiments, the at least one intensity value comprises a raw intensity value, a normalized intensity value, or a calculated intensity value calculated based on at least one of: a size of a feature corresponding to the observed optical signal, a circularity of a feature corresponding to the observed optical signal, or one or more Gaussian statistical parameters characterizing a feature corresponding to the observed optical signal. In some embodiments, the method further comprises comparing the at least one intensity value representing an intensity of each observed optical signal to a predetermined intensity threshold to determine a binary value representing the intensity of each observed optical signal. 
In some embodiments, each binary value comprises a 1 or a 0, wherein 1 represents an observed optical signal for which intensity is greater than or equal to the predetermined intensity threshold and 0 represents an observed optical signal for which intensity is less than the predetermined intensity threshold. In some embodiments, decoding the plurality of observed optical signals in the plurality of images comprises obtaining the plurality of observed code words based on a series of binary values determined for each location. In some embodiments, each observed code word of the plurality of code words comprises a plurality of code word segments, and wherein each code word segment comprises a specified string of binary values that corresponds to one of a specified set of observed optical signal states. In some embodiments, each code word segment comprises a four-bit string of binary values such that: a code word segment of 1000 corresponds to a first optical signal state, A, in which an optical signal is detected in a first detection channel of a four-channel optical imaging instrument, and no optical signal is detected in a second, third, or fourth detection channel of the four-channel optical imaging instrument; a code word segment of 0100 corresponds to a second optical signal state, B, in which an optical signal is detected in the second detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, third, or fourth detection channel of the four-channel optical imaging instrument; a code word segment of 0010 corresponds to a third optical signal state, C, in which an optical signal is detected in the third detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, second, or fourth detection channel of the four-channel optical imaging instrument; a code word segment of 0001 corresponds to a fourth optical signal state, D, in which an optical signal is detected in the fourth detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, second, or third detection channel of the four-channel optical imaging instrument; and a code word segment of 0000 corresponds to a fifth optical signal state, E, in which no optical signal is detected in any of the first, second, third, or fourth detection channels of the four-channel optical imaging instrument. In some embodiments, determining the assignment of the observed code word to one of the plurality of valid code words comprises identifying a valid code word of the plurality of valid code words that is identical to the observed code word. 
In some embodiments, determining the assignment of the observed code word to one of the plurality of valid code words comprises changing at least one of the binary values in the series of binary values corresponding to the observed code word to thereby assign the observed code word to a valid code word of the plurality of valid code words. In some embodiments, determining the assignment of the observed code word to one of the plurality of valid code words comprises determining a plurality of scores based on comparison of the observed code word to all or a portion of the plurality of valid code words. In some embodiments, the method further comprises selecting one of the plurality of valid code words having a highest score to assign as a replacement for the observed code word. In some embodiments, the plurality of images comprises a plurality of images comprising different fields-of-view of the biological sample. In some embodiments, the plurality of images comprises a plurality of z-stack images of the biological sample. In some embodiments, the plurality of observed optical signals represents light emitted from a plurality of fluorophores. In some embodiments, the method further comprises identifying a target analyte in the biological sample based on the determined assignment of the observed code word to a valid code word and the codebook. In some embodiments, the identified target analyte comprises a
messenger RNA (mRNA) molecule or protein molecule. In some embodiments, each valid code word of the plurality of valid code words has a second Hamming distance of greater than or equal to 2 from every other valid code word. In some embodiments, the codebook comprises at least 50 valid code words. In some embodiments, the codebook comprises up to 100,000 code words.
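The thresholding, segment mapping, and assignment steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names, the example threshold, the `max_distance` parameter, and the use of Hamming distance as the comparison score are all hypothetical choices made for the example.

```python
from typing import List, Optional, Sequence

# Segment-to-state mapping for a four-channel instrument, as listed above.
SEGMENT_STATES = {"1000": "A", "0100": "B", "0010": "C", "0001": "D", "0000": "E"}


def binarize(intensities: Sequence[float], threshold: float) -> str:
    """Map each intensity to '1' if it is >= threshold, otherwise '0'."""
    return "".join("1" if value >= threshold else "0" for value in intensities)


def segment_states(code_word: str, channels: int = 4) -> List[Optional[str]]:
    """Split a code word into per-cycle segments and look up each state.

    Segments that are not in SEGMENT_STATES (e.g., two channels ON) map to None.
    """
    segments = [code_word[k:k + channels] for k in range(0, len(code_word), channels)]
    return [SEGMENT_STATES.get(segment) for segment in segments]


def hamming(a: str, b: str) -> int:
    """Number of bit positions at which two equal-length code words differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(observed: str, valid: Sequence[str], max_distance: int = 1) -> Optional[str]:
    """Return the unique nearest valid code word, or None if there is none.

    Assumption: the 'score' is the Hamming distance (lower is better), and
    an observed code word is rejected when the best match is farther than
    max_distance or when two valid code words tie for the best match.
    """
    ranked = sorted((hamming(observed, word), word) for word in valid)
    best_distance, best_word = ranked[0]
    if best_distance > max_distance:
        return None
    if len(ranked) > 1 and ranked[1][0] == best_distance:
        return None
    return best_word


# Two cycles x four channels of hypothetical normalized intensities.
observed = binarize([0.9, 0.1, 0.2, 0.0, 0.1, 0.1, 0.7, 0.2], threshold=0.5)
valid_words = ["10000010", "01000001", "00101000"]
```

With these inputs, `observed` is identical to the first valid code word, so the assignment is an exact match; an observed code word with a single dropped bit is corrected to the same valid code word, and one that is too far from every valid code word is reported as not valid.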
[0016] Disclosed herein are systems comprising: a codebook database; a computing system comprising at least one computer-readable storage medium having program instructions stored thereon, the program instructions executable by at least one processor of the computing system to cause the at least one processor to perform a method comprising: receiving a codebook comprising a plurality of valid code words, wherein, for at least a first portion of the valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1; receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
[0017] Disclosed herein are computer program products comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform any of the methods described herein.
[0018] Disclosed herein are databases comprising: a codebook comprising a plurality of valid code words, wherein, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1.
[0019] Disclosed herein are methods comprising: receiving a codebook comprising a plurality of valid code words, wherein, for all valid code words in the codebook, a first
Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1; receiving a plurality of locations for a plurality of observed optical signals, wherein the plurality of observed optical signals are obtained from a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles; decoding the plurality of observed optical signals to obtain a plurality of observed code words at the plurality of locations; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
[0020] Disclosed herein are methods comprising: receiving a codebook having a plurality of code words, wherein, for all valid code words, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1; receiving an analyte-index assignment for a plurality of target analytes; and using the analyte-index assignment to assign each target analyte of the plurality of target analytes to at least one of the plurality of code words such that each code word has at most one target analyte assignment, thereby generating an analyte-codeword assignment matrix.
[0021] In some embodiments, each code word of the plurality of codewords has an associated index. In some embodiments, receiving the analyte-index assignment comprises receiving an analyte-index matrix. In some embodiments, assigning each target analyte of the plurality of target analytes to at least one of the plurality of codewords comprises linking the plurality of target analytes and plurality of codewords based on the same indices. In some embodiments, the plurality of target analytes comprises a plurality of nucleic acids. In some embodiments, the plurality of nucleic acids comprises a plurality of genes. In some embodiments, the plurality of nucleic acids comprises a plurality of RNA transcripts. In some embodiments, the plurality of target analytes comprises a plurality of proteins.
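The index-based linking described above can be sketched in Python as follows. This is a minimal illustration only: the function name, the dictionary-based inputs, the analyte labels, and the row/column layout of the matrix are assumptions made for the example and are not specified in this disclosure.

```python
from typing import Dict, List, Sequence, Tuple


def build_assignment_matrix(
    code_words: Dict[int, str],
    analyte_index: Dict[str, Sequence[int]],
) -> Tuple[List[str], List[int], List[List[int]]]:
    """Link analytes to code words that carry the same index.

    code_words maps an index to a code word; analyte_index maps an analyte
    name to one or more code word indices. Returns the row labels
    (analytes), column labels (code word indices), and a 0/1 matrix in
    which entry [r][c] is 1 when analyte r is assigned to code word c.
    """
    owner: Dict[int, str] = {}
    for analyte, indices in analyte_index.items():
        for index in indices:
            if index not in code_words:
                raise KeyError(f"no code word with index {index}")
            if index in owner:
                # Each code word may have at most one target analyte.
                raise ValueError(f"code word {index} is already assigned")
            owner[index] = analyte

    analytes = list(analyte_index)
    columns = sorted(code_words)
    matrix = [[1 if owner.get(c) == a else 0 for c in columns] for a in analytes]
    return analytes, columns, matrix


# Hypothetical inputs: three indexed code words and two analytes.
example_code_words = {0: "1100", 1: "0011", 2: "1010"}
example_assignment = {"analyte_a": [2], "analyte_b": [0, 1]}
rows, cols, matrix = build_assignment_matrix(example_code_words, example_assignment)
```

Each column of the resulting matrix contains at most a single 1, which reflects the constraint that each code word has at most one target analyte assignment.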
[0022] Disclosed herein are methods for performing in situ decoding comprising: receiving a plurality of images of a biological sample, wherein the plurality of images comprises images acquired in a plurality of sequencing or probing cycles; detecting, based on the plurality of images, a series of optical signals at one or more locations in the biological sample corresponding to one or more barcoded target analytes; determining, based on the series of optical signals detected in the plurality of images, a code word comprising a series
of ON and OFF bits that corresponds to a barcode for one of the one or more barcoded target analytes; and identifying the barcoded target analyte based on a comparison of the determined code word to a codebook, wherein the code word corresponds to a member of a codebook comprising a plurality of code words for which: Hamming Distance (Wi | Wj, Wm | Wn) ≥ K for all possible combinations of code words Wi, Wj, Wm, Wn, wherein Wi | Wj is a logical bitwise OR combination of any two code words Wi and Wj, wherein Wm | Wn is a logical bitwise OR combination of any two code words Wm and Wn, wherein K is an integer value greater than or equal to 1; wherein the codebook comprises L code words, and wherein i, j, m, and n are integers ranging in value from 0 to L - 1 and represent indices of the code words in the codebook.
[0023] In some embodiments, each series of optical signals detected in the plurality of images at the one or more locations comprises a series of ON signals and OFF signals.
[0024] In some embodiments, the plurality of code words in the codebook further satisfy a property that: Hamming Distance (Wi, Wj) > Q for any two pairwise combination of code words Wi and Wj, wherein Q is an integer value greater than or equal to 3.
[0025] In some embodiments, two or more code words are determined that correspond to two or more barcoded target analytes for which the corresponding series of optical signals partially overlap within the plurality of images, and wherein an error rate for correctly identifying the two or more barcoded target analytes is reduced compared to that when the plurality of code words in the codebook do not satisfy the relationship: Hamming Distance (Wi | Wj, Wm | Wn) ≥ K.
[0026] In some embodiments, the value of K is selectable by a user during design of the codebook. In some embodiments, a first portion of the plurality of code words in the codebook satisfies a relationship: Hamming Distance (Wi | Wj, Wm | Wn) ≥ K1, and a second portion of the plurality of code words in the codebook satisfies a relationship: Hamming Distance (Wi | Wj, Wm | Wn) ≥ K2, wherein K1 ≠ K2. In some embodiments, the values of K1 and K2 are selectable by a user during design of the codebook.
[0027] In some embodiments, a code word from the code book is randomly assigned to each of the one or more barcoded target analytes. In some embodiments, a code word from the code book is assigned to each of the one or more barcoded target analytes based on a decision rule designed to ensure that a total number of ON signals detected in a given image
of the plurality of images is within ± 10% of a mean number of ON signals detected per image for the plurality of images. In some embodiments, a code word from the code book is assigned to each of the one or more barcoded target analytes based on a decision rule designed to minimize a maximum predicted density of ON signals detected in images of the plurality of images. In some embodiments, a code word from the code book is assigned to each of two or more barcoded target analytes based on expression data for the two or more barcoded target analytes in clustered cell types, and wherein the clustered cell types represent a distribution of cell types found in the biological sample. In some embodiments, the expression data for the two or more barcoded target analytes comprises bulk gene expression data, bulk protein expression data, spatial gene expression data, spatial protein expression data, single cell gene expression data, single cell protein expression data, or any combination thereof. In some embodiments, the two or more barcoded target analytes are rank-ordered according to a maximum expression level across all clustered cell types, and the two or more code words are assigned to the two or more rank-ordered barcoded target analytes using an iterative process repeated for each of the two or more barcoded target analytes in decreasing order of maximum expression level, the iterative process comprising: computing a predicted density of ON signals for every combination of remaining, unassigned code words and the barcoded target analyte across the plurality of images; selecting a code word from the remaining, unassigned code words that minimizes the predicted density of ON signals across the plurality of images; and assigning the selected code word to the barcoded target analyte. In some embodiments, K is equal to 3, 4, or 5. In some embodiments, Q is equal to 4, 5, 6, 7, or 8. 
In some embodiments, the plurality of code words comprise code words of at least 60 bits, 80 bits, 100 bits, 120 bits, 140 bits, 160 bits, or 180 bits in length. In some embodiments, the plurality of code words in the codebook comprises at least 1,000, 5,000, 10,000, 20,000, 40,000, 60,000, 80,000, 100,000, 120,000, 140,000, 160,000, 180,000, or 200,000 unique code words. In some embodiments, the series of optical signals comprise fluorescence signals. In some embodiments, each code word of the plurality comprises M x N bits, where M is a number of sequencing or probing cycles and N is a number of optical detection channels in an instrument configured to perform the in situ decoding. In some embodiments, the one or more barcoded target analytes comprise barcoded gene sequences, barcoded gene transcripts, barcoded proteins, or any combination thereof.
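The iterative, expression-ranked assignment described above can be sketched in Python as follows. This is a hedged illustration, not the disclosed method: the density model is an assumption (each bit of a code word is treated as one image, and the predicted density for a cell type in an image is the summed expression of the analytes whose code words are ON in that image), and the function name and input layout are hypothetical.

```python
from typing import Dict, List, Sequence


def assign_code_words(
    expression: Dict[str, Sequence[float]],
    code_words: Sequence[str],
) -> Dict[str, str]:
    """Greedily assign code words to analytes in decreasing order of
    maximum expression across clustered cell types.

    expression maps an analyte to its expression level per cell type.
    At each step, the remaining code word that gives the smallest peak
    predicted ON-signal density (over all cell types and images) is chosen.
    """
    n_bits = len(code_words[0])
    n_types = len(next(iter(expression.values())))
    # load[t][b]: predicted ON-signal density for cell type t in image b.
    load: List[List[float]] = [[0.0] * n_bits for _ in range(n_types)]
    remaining = list(code_words)
    assignment: Dict[str, str] = {}

    ranked = sorted(expression, key=lambda a: max(expression[a]), reverse=True)
    for analyte in ranked:
        levels = expression[analyte]

        def peak(word: str) -> float:
            # Peak density if this analyte were given this code word.
            return max(
                load[t][b] + levels[t] * (word[b] == "1")
                for t in range(n_types)
                for b in range(n_bits)
            )

        best = min(remaining, key=peak)  # ties resolve to the earliest word
        remaining.remove(best)
        assignment[analyte] = best
        for t in range(n_types):
            for b in range(n_bits):
                if best[b] == "1":
                    load[t][b] += levels[t]
    return assignment


# Hypothetical expression levels for three analytes in two cell types.
example_expression = {"low": [1.0, 3.0], "high": [10.0, 1.0], "mid": [5.0, 0.0]}
example_words = ["1100", "1010", "0011"]
result = assign_code_words(example_expression, example_words)
```

In this example the most highly expressed analyte is assigned first, and the second analyte receives the code word whose ON bits do not overlap with it, which keeps the peak predicted density at 10 rather than 15.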
[0028] Disclosed herein are databases comprising: one or more non-transitory computer-readable storage medium components, the one or more non-transitory computer-readable storage medium components individually or collectively storing a codebook comprising a plurality of code words for which: Hamming Distance (Wi | Wj, Wm | Wn) ≥ K for all possible combinations of code words Wi, Wj, Wm, Wn, wherein Wi | Wj is a logical bitwise OR combination of any two code words Wi and Wj, wherein Wm | Wn is a logical bitwise OR combination of any two code words Wm and Wn, wherein K is an integer value greater than or equal to 1, wherein the codebook comprises L code words, and wherein i, j, m, and n are integers ranging in value from 0 to L - 1 and represent indices of the code words in the codebook. In some embodiments, the plurality of code words in the codebook further satisfy a property that: Hamming Distance (Wi, Wj) > Q for any two pairwise combination of code words Wi and Wj, and wherein Q is an integer value greater than or equal to 3. In some embodiments, the value of K is selectable by a user during design of the codebook. In some embodiments, a first portion of the plurality of code words in the codebook satisfies a relationship: Hamming Distance (Wi | Wj, Wm | Wn) ≥ K1, and a second portion of the plurality of code words in the codebook satisfies a relationship: Hamming Distance (Wi | Wj, Wm | Wn) ≥ K2, wherein K1 ≠ K2. In some embodiments, the values of K1 and K2 are selectable by a user during design of the codebook. In some embodiments, K is equal to 3, 4, or 5. In some embodiments, Q is equal to 4, 5, 6, 7, or 8. In some embodiments, the plurality of code words comprise code words of at least 60 bits, 80 bits, 100 bits, 120 bits, 140 bits, 160 bits, or 180 bits in length. In some embodiments, the plurality of code words in the codebook comprises at least 100, 500, 1,000, 5,000, 10,000, 20,000, 40,000, 60,000, 80,000, or 100,000 unique code words.
In some embodiments, each code word of the plurality comprises M x N bits, where M is a number of sequencing or probing cycles and N is a number of optical detection channels in an instrument configured to perform the in situ decoding. In some embodiments, each code word in the codebook has at least 2 ON bits. In some embodiments, each code word in the codebook has no more than 4, 5, or 6 ON bits.
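The structural properties listed above (an M x N bit length, bounds on the number of ON bits, and a minimum pairwise Hamming distance) can be checked with a short Python sketch. The function name and the returned dictionary keys are hypothetical, and the sketch only reports the measured values so that a caller can compare them against whatever bounds a given embodiment uses.

```python
from itertools import combinations
from typing import Dict, Sequence


def summarize_codebook(code_words: Sequence[str], cycles: int, channels: int) -> Dict[str, int]:
    """Report ON-bit counts and the minimum pairwise Hamming distance.

    Raises ValueError if any code word is not exactly cycles * channels
    bits long (M x N bits, with M cycles and N detection channels).
    """
    expected_length = cycles * channels
    for word in code_words:
        if len(word) != expected_length:
            raise ValueError(f"expected {expected_length} bits, got {len(word)}")

    on_bits = [word.count("1") for word in code_words]
    min_distance = min(
        sum(x != y for x, y in zip(u, v)) for u, v in combinations(code_words, 2)
    )
    return {
        "min_on_bits": min(on_bits),
        "max_on_bits": max(on_bits),
        "min_pairwise_hamming": min_distance,
    }


# Hypothetical toy codebook: 2 cycles x 4 channels = 8 bits per code word.
toy_codebook = ["11000000", "00110000", "00001100", "00000011"]
summary = summarize_codebook(toy_codebook, cycles=2, channels=4)
```

For this toy codebook every code word has exactly 2 ON bits and any two code words differ in 4 bit positions.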
[0029] Disclosed herein are systems comprising: one or more processors; and a memory communicatively coupled to the one or more processors and configured to store instructions that, when executed by the one or more processors, cause the system to: receive a plurality of images of a biological sample, wherein the plurality of images comprises images acquired in a plurality of sequencing or probing cycles; detect, based on the plurality of images, a series of optical signals at one or more locations in the biological sample corresponding to one or more barcoded target analytes; determine, based on the series of optical signals detected in the plurality of images, a code word comprising a series of ON and OFF bits that corresponds
to a barcode for one of the one or more barcoded target analytes; and identify the barcoded target analyte based on a comparison of the determined code word to a codebook, wherein the code word corresponds to a member of a codebook comprising a plurality of code words for which: Hamming Distance (Wi | Wj, Wm | Wn) ≥ K for all possible combinations of code words Wi, Wj, Wm, Wn, wherein Wi | Wj is a logical bitwise OR combination of any two code words Wi and Wj, wherein Wm | Wn is a logical bitwise OR combination of any two code words Wm and Wn, wherein K is an integer value greater than or equal to 1; wherein the codebook comprises L code words, and wherein i, j, m, and n are integers ranging in value from 0 to L - 1 and represent indices of the code words in the codebook. In some embodiments, each series of optical signals detected in the plurality of images at the one or more locations comprises a series of ON signals and OFF signals. In some embodiments, the plurality of code words in the codebook further satisfy a property that: Hamming Distance (Wi, Wj) > Q for any two pairwise combination of code words Wi and Wj, wherein Q is an integer value greater than or equal to 3. In some embodiments, two or more code words are determined that correspond to two or more barcoded target analytes for which the corresponding series of optical signals partially overlap within the plurality of images, and wherein an error rate for correctly identifying the two or more barcoded target analytes is reduced compared to that when the plurality of code words in the codebook do not satisfy the relationship: Hamming Distance (Wi | Wj, Wm | Wn) ≥ K. In some embodiments, the value of K is selectable by a user during design of the codebook.
In some embodiments, a first portion of the plurality of code words in the codebook satisfies a relationship: Hamming Distance (Wi | Wj, Wm | Wn) ≥ K1, and a second portion of the plurality of code words in the codebook satisfies a relationship: Hamming Distance (Wi | Wj, Wm | Wn) ≥ K2, wherein K1 ≠ K2. In some embodiments, the values of K1 and K2 are selectable by a user during design of the codebook. In some embodiments, a code word from the code book is randomly assigned to each of the one or more barcoded target analytes. In some embodiments, a code word from the code book is assigned to each of the one or more barcoded target analytes based on a decision rule designed to ensure that a total number of ON signals detected in a given image of the plurality of images is within ± 10% of a mean number of ON signals detected per image for the plurality of images. In some embodiments, a code word from the code book is assigned to each of the one or more barcoded target analytes based on a decision rule designed to minimize a maximum predicted density of ON signals detected in images of the plurality of images. In some embodiments, a code word from the code book is assigned to each of two or more barcoded target analytes based on expression data for the two or more barcoded target analytes in clustered cell types,
and wherein the clustered cell types represent a distribution of cell types found in the biological sample. In some embodiments, the expression data for the two or more barcoded target analytes comprises bulk gene expression data, bulk protein expression data, spatial gene expression data, spatial protein expression data, single cell gene expression data, single cell protein expression data, or any combination thereof. In some embodiments, the two or more assigned code words are rank-ordered according to code word weight, the two or more barcoded target analytes are rank-ordered according to a maximum expression level across all clustered cell types, and the two or more rank-ordered code words are assigned to the two or more rank-ordered barcoded target analytes using an iterative process repeated for each of the two or more barcoded target analytes in decreasing order of maximum expression level, the iterative process comprising: computing a predicted density of ON signals for every combination of remaining, unassigned code words and the barcoded target analyte across the plurality of images; selecting a code word from the remaining, unassigned code words that minimizes the predicted density of ON signals across the plurality of images; and assigning the selected code word to the barcoded target analyte. In some embodiments, K is equal to 3, 4, or 5. In some embodiments, Q is equal to 4, 5, 6, 7, or 8. In some embodiments, the plurality of code words comprise code words of at least 60 bits, 80 bits, 100 bits, 120 bits, 140 bits, 160 bits, or 180 bits in length. In some embodiments, the plurality of code words in the codebook comprises at least 1,000, 5,000, 10,000, 20,000, 40,000, 60,000, 80,000, or 100,000 unique code words. In some embodiments, the series of optical signals comprise fluorescence signals. 
In some embodiments, each code word of the plurality comprises M x N bits, where M is a number of sequencing or probing cycles and N is a number of optical detection channels in an instrument configured to perform the in situ decoding. In some embodiments, the one or more barcoded target analytes comprise barcoded gene sequences, barcoded gene transcripts, barcoded proteins, or any combination thereof.
[0030] It should be appreciated that all combinations of the foregoing concepts and additional concepts discussed in greater detail below (provided such concepts are not mutually inconsistent) are contemplated as being part of the inventive subject matter disclosed herein. In particular, all combinations of claimed subject matter appearing at the end of this disclosure are contemplated as being part of the inventive subject matter disclosed herein.
INCORPORATION BY REFERENCE
[0031] All publications, comprising patent documents, scientific articles and databases, referred to in this application are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication were individually incorporated by reference. If a definition set forth herein is contrary to or otherwise inconsistent with a definition set forth in the patents, applications, published applications and other publications that are herein incorporated by reference, the definition set forth herein prevails over the definition that is incorporated herein by reference.
BRIEF DESCRIPTION OF THE DRAWINGS
[0032] Various aspects of the disclosed methods, devices, and systems are set forth with particularity in the appended claims. A better understanding of the features and advantages of the disclosed methods, devices, and systems will be obtained by reference to the following detailed description of illustrative embodiments and the accompanying drawings, of which:
[0033] FIGS. 1A-1B provide a non-limiting example of a process flowchart for generating an OR-robust codebook, in accordance with one implementation of the methods described herein.
[0034] FIG. 2 provides a non-limiting example of a process flowchart for assigning the code words in an OR-robust codebook to a corresponding list of target analytes, in accordance with one implementation of the methods described herein.
[0035] FIG. 3 provides a non-limiting example of a process flowchart for decoding optical signals derived from images of a biological sample to identify barcoded target analytes, in accordance with one implementation of the methods described herein.
[0036] FIG. 4A depicts a non-limiting example of the structure of a binary code word for use with hybridization probe-based in situ detection of barcoded target analytes, in accordance with some implementations of the methods described herein.
[0037] FIG. 4B depicts a non-limiting example of the structure of a binary code word for use with sequencing-based in situ detection of barcoded target analytes, in accordance with some implementations of the methods described herein.
[0038] FIG. 5 provides a non-limiting schematic illustration of hybridization probe-based in situ detection of barcoded target analytes, in accordance with some implementations of the methods described herein.
[0039] FIG. 6 provides a non-limiting schematic illustration of sequencing-based in situ detection of barcoded target analytes, in accordance with some implementations of the methods described herein.
[0040] FIG. 7 depicts an overview of a volumetric sample imaging system and illustrates a Field of View (FOV) grid bounding the sample (e.g., hydrogel, tissue section, one or more cells, etc.) as projected onto the surface of a solid substrate supporting the sample.
[0041] FIG. 8 depicts the XZ cross-sectional view and illustrates tissue non-uniformity in the Z dimension, where the full (non-reduced) imaging volume is oversampled in the Z dimension. The objective lens focal point is positioned to acquire an image at every Z-slice in a Z-stack. An XZ image of signal distribution (bottom) demonstrates a non-uniform distribution of detected signal within the imaging volume.
[0042] FIG. 9 depicts a system for performing an in situ detection or sequencing assay, in accordance with some implementations of the methods described herein.
[0043] FIGS. 10A-10B illustrate cross-sectional views of an optics module in an imaging system, according to some embodiments.
[0044] FIG. 11 depicts a computer system or computer network, in accordance with some instances of the systems described herein.
DETAILED DESCRIPTION
[0045] Methods for constructing codebooks for use with multiplexed in situ analysis techniques are described. In some embodiments, the codebook designs described herein provide robust protections against codebook calling errors, for example, calling of a valid codeword in the codebook based on a detected codeword that has one or more detection errors. Detection errors may occur due to crosstalk in imaging channels (e.g., where two dyes have overlapping excitation spectra) or due to autofluorescence. Detection errors may also occur, for example, due to the close proximity of two or more target analytes having fluorescent oligonucleotides configured to emit fluorescence during the same imaging cycle.
That is, the codebook designs described herein reduce (e.g., minimize) the potential for errors (e.g., calling an incorrect transcript) during decoding. In some embodiments, the codebook designs described herein reduce (e.g., minimize) the impact of spatial crowding of target molecules within a biological sample (e.g., a tissue specimen) when performing decoding of detected signals from a plurality of imaging rounds. The codebooks described herein are referred to as "OR-robust" or "spatial collision robust" codebooks. An OR-robust codebook has the property that, for all or a portion of the valid codewords in the codebook, the Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to an OR-robust radius (i.e., a specified integer value greater than zero). Thus, an OR-robust codebook reduces the chance that light signals from any two different analytes in close proximity to one another combine into the same observed codeword (after all imaging cycles are completed) that is ultimately decoded to a valid codeword (with one or more errors in the observed codeword and/or a low quality score) or discarded entirely. Further, methods for decoding optical signals can leverage an OR-robust codebook to enable accurate decoding of barcoded target molecules in multiplexed in situ assays even under conditions where the spatial densities of target molecules are high.
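As a small worked illustration of the spatial collision problem, the following Python sketch lists pairs of code words whose bitwise OR combinations are identical, i.e., pairs whose overlapping signals would be indistinguishable. The function names and the four-bit toy code words are hypothetical and chosen only to make the collision easy to see.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def bitwise_or(a: str, b: str) -> str:
    """Bitwise OR of two equal-length code words given as bit strings."""
    return "".join("1" if x == "1" or y == "1" else "0" for x, y in zip(a, b))


def or_collisions(code_words: Sequence[str]) -> List[Tuple[Tuple[int, int], Tuple[int, int], str]]:
    """Find pairs of code words whose OR combination repeats an earlier one.

    Each result holds the first pair of indices that produced the combined
    word, the later pair that produced it again, and the combined word.
    """
    first_seen: Dict[str, Tuple[int, int]] = {}
    collisions = []
    for i, j in combinations(range(len(code_words)), 2):
        combined = bitwise_or(code_words[i], code_words[j])
        if combined in first_seen:
            collisions.append((first_seen[combined], (i, j), combined))
        else:
            first_seen[combined] = (i, j)
    return collisions


# Toy codebook that is NOT OR-robust: words 0|1 and 2|3 both give 1111.
toy_words = ["1100", "0011", "1010", "0101"]
found = or_collisions(toy_words)
```

Here the two analytes encoded by code words 0 and 1, when co-located, would produce the same combined signal as the two analytes encoded by code words 2 and 3, so the Hamming distance between the two OR combinations is 0 and the codebook fails the OR-robust property for any radius of 1 or more.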
[0046] In some instances, for example, the disclosed methods may comprise: receiving a codebook comprising a plurality of valid code words, where, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words (including the case where the first logical bitwise OR combination and the second logical bitwise OR combination include a common code word) is greater than or equal to a predetermined number (e.g., where the predetermined number is 1, 2, 3, 4, 5, 6, 7, or 8); receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, each image of the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
[0047] In some instances, the disclosed methods may comprise: receiving a codebook comprising a plurality of valid code words, where, for at least a first portion of the valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words (including the case where the first logical bitwise OR combination and the second logical bitwise OR combination include a common code word) is greater than or equal to 1; receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, each image of the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
[0048] In some instances, the disclosed methods may comprise: receiving a codebook comprising a plurality of valid code words, where, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1; receiving a plurality of locations for a plurality of observed optical signals, wherein the plurality of observed optical signals are obtained from a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles; decoding the plurality of observed optical signals to obtain a plurality of observed code words at the plurality of locations; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
[0049] In some instances, disclosed herein are methods of generating a codebook for in situ decoding comprising: receiving a plurality of code words, where, for all valid code words, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words (including the case where the first logical bitwise OR combination and the second logical bitwise OR combination include a common code word) is greater than or equal to 1; receiving a list of a plurality of target analytes; and for each target analyte on the list of the plurality of target analytes: assigning the target analyte to at least one of the plurality of code words such that each code word has at most one target analyte assignment, thereby generating the codebook.
[0050] In some instances, disclosed herein are methods for performing in situ decoding comprising: receiving a plurality of images of a biological sample, where the plurality of images comprises images acquired in a plurality of sequencing or probing cycles; detecting, based on the plurality of images, a series of optical signals at one or more locations in the biological sample corresponding to one or more barcoded target analytes; determining, based on the series of optical signals detected in the plurality of images, a code word comprising a series of ON and OFF bits that corresponds to a barcode for one of the one or more barcoded target analytes; and identifying the barcoded target analyte based on a comparison of the determined code word to a codebook, wherein the code word corresponds to a member of a codebook comprising a plurality of code words for which:
Hamming Distance (Wi | Wj, Wm | Wn) >= K for all possible combinations of code words Wi, Wj, Wm, Wn, where Wi | Wj is a logical bitwise OR combination of any two code words Wi and Wj, where Wm | Wn is a logical bitwise OR combination of any two code words Wm and Wn, where K is an integer value greater than or equal to 1; wherein the codebook comprises L words, and where i, j, m, and n are integers ranging in value from 0 to L-1 and represent indices of the code words in the codebook.
[0051] In various embodiments, each OR combination is based on two different codewords. In the above equation, i and j are not equal to one another and m and n are not equal to one another. That is, the following expression must be true: (i != j) AND (m != n). In various embodiments, a pair of OR-combinations can share at most one codeword in the comparison. In the above equation, i cannot be equal to m when j is equal to n. Likewise, i cannot be equal to n when j is equal to m. That is, the following expression must be true: NOT (((i == m) AND (j == n)) OR ((i == n) AND (j == m))).
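By way of non-limiting illustration, the index constraints above can be sketched as a brute-force check in Python. The function names `hamming_distance` and `is_or_robust`, and the representation of code words as integers used as bit masks, are assumptions made for this sketch and are not part of the disclosure.

```python
from itertools import combinations


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions at which two equal-length code words differ."""
    return bin(a ^ b).count("1")


def is_or_robust(codebook: list[int], k: int) -> bool:
    """Brute-force test of HD(Wi | Wj, Wm | Wn) >= k over all pairs of pairs."""
    # combinations() yields index pairs with i != j (and likewise m != n).
    pair_ors = [codebook[i] | codebook[j]
                for i, j in combinations(range(len(codebook)), 2)]
    # Combinations of those pairs never compare a pair with itself, but
    # they do include pairs of pairs that share exactly one code word.
    return all(hamming_distance(x, y) >= k
               for x, y in combinations(pair_ors, 2))
```

For a toy codebook of four single-bit words, `is_or_robust([0b0001, 0b0010, 0b0100, 0b1000], 2)` holds, whereas the same codebook fails for K = 3. The check visits every pair of pairs, so it is only practical for small codebooks.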
[0052] The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described.
I. Overview
[0053] Methods for constructing codebooks for use with multiplexed in situ analysis techniques are described, where the specific codebook design provides robust protection against code book calling errors and can reduce (e.g., minimize) the impact of spatial crowding of target molecules within a biological sample (e.g., a tissue specimen) when performing decoding of detected signals from a plurality of imaging rounds/cycles.
[0054] In some instances, the codebooks described herein comprise binary codebooks, i.e., codebooks comprising binary code words having a plurality of binary segments (e.g., 4-bit binary segments of the form “bit 1, bit 2, bit 3, bit 4, etc.”) where each ON bit (“1”) indicates that a signal was detected in one of the plurality of optical detection channels of an imaging instrument used to perform the decoding in a given decoding cycle, and each OFF bit (“0”) indicates that no signal was detected in the particular optical detection channel in the given decoding cycle. In various embodiments, each subsegment (and ultimately, each observed full codeword) is associated with a specific X, Y, Z set of coordinates within an imaged 3D volume. Each binary segment may represent an individual imaging cycle of a plurality of imaging cycles. For example, where the color channels associated with the binary segments are “red channel, yellow channel, green channel, blue channel”, a binary segment of “1 0 0 0” indicates that a signal was detected in the red channel and no signal was detected in the yellow, green, or blue channels. When all binary segments are appended together, the resulting string of 1’s and 0’s represents a full binary code word.
[0055] The disclosed OR-robust codebooks satisfy the property that Hamming Distance (CWA | CWB, CWC | CWD) >= K for all possible pairwise combinations (or a portion of all possible pairwise combinations) of a list of valid code words (e.g., CWA, CWB, CWC, CWD, etc.), where the notation CWX | CWY denotes a code word derived from the logical bitwise OR combination of code words CWX and CWY, and where K is an integer value greater than zero. In various embodiments, code words within the codebook are represented in illumination state space. For example, each state may be represented as a letter in the alphabet (e.g., red is state A, yellow is state B, green is state C, blue is state D, and empty is state E). In various embodiments, each state corresponds to a binary string. For example, the state A may be represented as 1000, the state B may be represented as 0100, the state C may be represented as 0010, the state D may be represented as 0001, and the empty (no emission) state may be represented as 0000 (where each bit in the binary string corresponds to a color channel, similar to the binary segments described above).
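As a minimal sketch of the state-space representation, the following Python converts between a string of state letters (one per imaging cycle) and the corresponding binary code word. The helper names are hypothetical, and the mapping assumes the example alphabet given above (A through D for red, yellow, green, and blue, and E for the empty state).

```python
# Assumed state alphabet from the example: channel order is
# red, yellow, green, blue; "E" is the empty (no emission) state.
STATE_TO_BITS = {"A": "1000", "B": "0100", "C": "0010", "D": "0001", "E": "0000"}
BITS_TO_STATE = {bits: state for state, bits in STATE_TO_BITS.items()}


def states_to_bits(states: str) -> str:
    """Concatenate one 4-bit segment per imaging cycle into a full code word."""
    return "".join(STATE_TO_BITS[s] for s in states)


def bits_to_states(bits: str) -> str:
    """Split a binary code word into 4-bit segments and map each to a state."""
    if len(bits) % 4:
        raise ValueError("code word length must be a multiple of 4 bits")
    return "".join(BITS_TO_STATE[bits[i:i + 4]] for i in range(0, len(bits), 4))
```

For instance, `states_to_bits("AEB")` gives `"100000000100"`. A segment with more than one ON bit has no letter in this alphabet, so `bits_to_states` raises a `KeyError` for it.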
[0056] If this constraint is not enforced, pairs of codewords in the codebook may satisfy Hamming Distance (CWA | CWB, CWC | CWD) = 0 for many combinations of two pairs of the code words, which means that if one observes a signal corresponding to CWA | CWB (e.g., if a first target molecule, such as a first rolling circle product (RCP), labeled with a barcode corresponding to code word CWA in a given decoding cycle is very close to a second target molecule, such as a second RCP, labeled with a barcode corresponding to code word CWB in that decoding cycle), then the signal from the first and second target molecules may be indistinguishable from a signal corresponding to CWC | CWD, and therefore may not be accurately decoded with suitable confidence.
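The ambiguity described above can be illustrated with a small, hypothetical example. The four 12-bit code words below (3 cycles of 4 channels) are invented for this sketch. They are all distinct, yet the OR of the first pair equals the OR of the second pair.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions at which two equal-length code words differ."""
    return bin(a ^ b).count("1")


# Hypothetical code words, written as one 4-bit segment per cycle.
CW_A = 0b1000_0100_0000
CW_B = 0b0000_0000_0010
CW_C = 0b1000_0000_0010
CW_D = 0b0000_0100_0000

# Signals from two co-located molecules merge as a bitwise OR.
merged_ab = CW_A | CW_B
merged_cd = CW_C | CW_D

# A distance of zero means the two merged signals cannot be told apart.
collision_distance = hamming_distance(merged_ab, merged_cd)
```

Here both merged signals equal `0b1000_0100_0010`, so `collision_distance` is 0 and a decoder observing that pattern cannot tell whether it arose from CW_A and CW_B or from CW_C and CW_D.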
[0057] When a decoding method is implemented with the disclosed OR-robust codebooks (and methods for design thereof), decoding of barcoded target molecules in multiplexed in situ assays has higher accuracy and the decoding process is capable of tolerating higher spatial densities of target molecules. For example, an OR-robust codebook can be designed for use with high-plexy analysis (e.g., a 2k gene panel, a 5k gene panel, or a whole transcriptome panel), gene panels where at least one gene in the panel is a highly expressed gene (signals from highly expressed genes can cause spatial crowding or overpower signals from less highly expressed genes), and/or protein panels (proteins can appear diffuse, thus causing spatial crowding).
II. Methods for Generating and Using OR-Robust Codebooks
[0058] Methods used for generating codebooks having an OR-robust property involve imposing a constraint that a logical bitwise OR of any first pair of code words, CWA, CWB, in the codebook has a minimum Hamming Distance from a logical bitwise OR of any other pair of code words CWC, CWD; specifically, Hamming Distance (CWA | CWB, CWC | CWD) >= K, where K is an integer having a value greater than or equal to 1. In some instances, for example, K = 1, 2, 3, 4, 5, 6, 7, 8, or more. In various embodiments, when testing a codebook for the OR-robust property, code words CWA and CWB are two different code words in the codebook (e.g., code words CWA and CWB are not the same string of letters or bits) and code words CWC and CWD are two different code words in the codebook (e.g., code words CWC and CWD are not the same string of letters or bits). In various embodiments, when testing a codebook for the OR-robust property, code words CWA and CWB are a first pair of code words and code words CWC and CWD are a second pair of codewords that is different from the first pair of codewords. For example, code words CWA and CWC may be the same code word, but code words CWB and CWD are different code words such that the first pair of codewords is different from the second pair of codewords. In another example, code word CWA is not equal to code words CWB, CWC, and CWD; code word CWB is not equal to code words CWA, CWC, and CWD; and code word CWC is not equal to code words CWA, CWB, and CWD (thus, code word CWD is not equal to code words CWA, CWB, and CWC).
[0059] In various embodiments, an OR-robust codebook can be generated by starting with at least one arbitrary code word (e.g., one, two, or three code words). In various embodiments, the at least one arbitrary starting code word passes at least one validation check. For example, a validation check may include a check that the code word has at least a specific number of ON bits or exactly a specific number of ON bits. Where at least two starting code words are selected, the at least two starting code words may be separated by at least a predetermined Hamming distance (e.g., a HD of at least 6). In various embodiments, two code words are arbitrarily selected having at least a predetermined edit distance (e.g., Hamming distance) from each other. Using an example with two code words having 60 total bits (15 segments of 4 bits) and a maximum number of ON bits in any given code word of five, the maximum possible Hamming distance between these two code words is 10 (i.e., none of the five ON bits in the first code word overlaps with any of the five ON bits in the second code word, meaning that 10 edits are needed to convert one code word into the other). In various embodiments, two code words are selected having at least a predetermined edit distance (e.g., Hamming distance) from each other. In various embodiments, any codewords beyond the second (e.g., third, fourth, fifth, etc.) are selected such that the codeword has a predetermined edit distance (e.g., Hamming distance) from all other codewords, and the logical bitwise ORs of any two pairs of codewords have a predetermined edit distance (e.g., Hamming distance) from one another. In various embodiments, two arbitrary codewords are selected (with sufficiently high edit distance from each other) to start, but the third codeword is no longer arbitrary and is selected so that the OR-robust property is not violated.
By way of example, with three codewords (e.g., CWA, CWB, CWC), the following pairs of pairs (or, pairs of bitwise OR combinations) can be created: {CWA|CWB, CWA|CWC} ; {CWA|CWB, CWB|CWC} ; {CWA|CWC, CWB|CWC} . All three of these pairs of pairs need to satisfy the OR-robust property:
HD(CWA|CWB, CWA|CWC) >= K; HD(CWA|CWB, CWB|CWC) >= K; HD(CWA|CWC, CWB|CWC) >= K.
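The enumeration of pairs of pairs for three codewords can be reproduced with a short Python sketch. The helper names are hypothetical. The closing formula, which counts pairs of pairs for a codebook of n codewords, is a standard combinatorial identity added here for illustration and is not taken from the disclosure.

```python
from itertools import combinations
from math import comb


def pairs_of_pairs(names: list[str]) -> list[tuple]:
    """List every unordered pair of distinct unordered codeword pairs."""
    pairs = list(combinations(names, 2))
    return list(combinations(pairs, 2))


def count_pairs_of_pairs(n: int) -> int:
    """Number of pairs of pairs for n codewords: C(C(n, 2), 2)."""
    return comb(comb(n, 2), 2)


three = pairs_of_pairs(["CWA", "CWB", "CWC"])
```

`three` contains exactly the three pairs of pairs listed above. Because the count grows roughly with the fourth power of the codebook size (for example, `count_pairs_of_pairs(100)` is 12,248,775), exhaustive checking becomes costly for large codebooks.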
[0060] In various embodiments, one skilled in the art will recognize that other schemes of generating a new code word can be utilized. In various embodiments, new codewords are generated using a random generator. In various embodiments, new codewords are generated using a deterministic generator having one or more properties (i.e., generating codewords in a specific, useful order). In various embodiments, new codewords are generated to have a predetermined number of overlapping bits with codewords already present in the codebook (e.g., new codewords have a same number of overlapping bits or states with codewords already present in the codebook, new codewords have a maximum number of overlapping bits or states with codewords already present in the codebook). In various embodiments, some methods for generating new codewords (e.g., as described above) will allow for larger OR-robust codebooks to be generated.
[0061] In various embodiments, a new candidate code word is generated (e.g., randomly generated) and tested to determine whether adding the new candidate code word to the codebook satisfies the Hamming Distance property and the OR-robust property. If adding the new candidate code word violates the Hamming Distance property (i.e., the candidate code word is less than a specific HD away from at least one other valid code word in the codebook), then the candidate code word can be discarded. If adding the new code word to the codebook satisfies the OR-robust property, the new code word is added to the codebook as a valid code word; otherwise the new codeword is discarded, and another new code word is randomly generated and tested. This process may be repeated until a codebook having a predetermined number of valid code words is generated or no new valid codewords can be found after a predetermined number of attempts (e.g., 1 million trials) (such that the codebook satisfies the OR-robust property for all pairs of valid codewords).
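The generate-and-test loop described above might be sketched in Python as follows. All names (`random_code_word`, `build_codebook`) and default parameters are assumptions for illustration. The candidate generator assumes exactly five ON bits with at most one ON bit per cycle, and the stopping rule is simplified to a cap on the total number of attempts rather than on consecutive failures.

```python
import random
from itertools import combinations


def hamming_distance(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def random_code_word(rng, n_cycles=15, n_channels=4, n_on=5):
    """Assumed generator: n_on ON bits, at most one ON bit per cycle."""
    word = 0
    for cycle in rng.sample(range(n_cycles), n_on):
        word |= 1 << (cycle * n_channels + rng.randrange(n_channels))
    return word


def build_codebook(target_size, min_hd, k, max_attempts, seed=0):
    rng = random.Random(seed)
    codebook, pair_ors = [], []  # pair_ors caches ORs of all accepted pairs
    for _ in range(max_attempts):
        if len(codebook) >= target_size:
            break
        cand = random_code_word(rng)
        # Hamming Distance property: discard candidates too close to any word.
        if any(hamming_distance(cand, w) < min_hd for w in codebook):
            continue
        # OR-robust property: the new ORs must be far from every cached OR
        # and from one another (cached ORs are already mutually far apart).
        new_ors = [cand | w for w in codebook]
        ok = (all(hamming_distance(x, y) >= k for x in new_ors for y in pair_ors)
              and all(hamming_distance(x, y) >= k
                      for x, y in combinations(new_ors, 2)))
        if ok:
            codebook.append(cand)
            pair_ors.extend(new_ors)
    return codebook
```

A call such as `build_codebook(target_size=8, min_hd=6, k=2, max_attempts=20000)` returns a small codebook satisfying both properties. Caching the pairwise ORs avoids recomputing them for each candidate, at the cost of memory that grows quadratically with the codebook size.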
[0062] In various embodiments, the process is repeated until no more codewords can be added to the codebook without violating the OR-robust property (or any other suitable constraint, such as a predetermined number of OR-robust codewords has been achieved). In various embodiments, the process for generating new code words is generalized as a search (e.g., a depth-first search, breadth-first search, or informed search). In various embodiments, the search algorithm implements back-tracking (e.g., where a codeword, such as a newly added codeword, is removed and another new codeword is generated that allows for a larger final OR-robust codebook). For example, with a codebook having 99 codewords, the new codeword generation algorithm finds a codeword CW_a that can be added to the codebook without violating any constraints, but then the algorithm determines that no more codewords can be added beyond that codeword CW_a, giving a final OR-robust codebook size of 100. In this example, the algorithm may perform backtracking by removing at least codeword CW_a from the codebook. Subsequently, the algorithm determines a new codeword CW_b can be added to the codebook (after removing at least codeword CW_a), and that codeword CW_c can also be added with codeword CW_b resulting in a codebook with at least 101 codewords, as opposed to 100 codewords if CW_a was chosen. Thus, backtracking allows for the new codeword generation algorithm to go back arbitrarily far (i.e., remove arbitrarily up to a predetermined number of codewords from the codebook) to explore if a denser packing of OR-robust codewords is possible, thereby generating a larger OR-robust codebook.
[0063] In some instances, the OR-robust property may be imposed in addition to a conventional edit distance criterion, e.g., that the Hamming Distance for any pairwise combination of the base code words CWA and CWB is at least a predetermined distance parameter Q. In various embodiments, the predetermined distance parameter Q is two times the number of single errors that can be detected and corrected plus 1. This edit distance criterion is defined as follows: Hamming Distance (CWA, CWB) >= Q (where Q is an integer value of greater than or equal to 1, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc.). In various embodiments, Q is equal to 2k+1 where k is the number of single errors (e.g., letter errors or bit errors) that can be reliably detected and corrected. In some instances, for example, k = 1, 2, 3, 4, 5, 6, 7, or 8. In various embodiments, a hamming weight (HW) of a codeword is equal to the number of ON bits in that codeword. In various embodiments, a minimum Hamming distance between two codewords is 0 and a maximum Hamming distance between codewords having a predetermined HW is 2 * HW. For example, for codewords with HW = 5, the minimum hamming distance between two codewords is 0, and the maximum hamming distance between two codewords is 2*HW = 10. In various embodiments, all codewords in the codebook have the same Hamming Weight. In various embodiments, at least some codewords in the codebook have a different hamming weight (i.e., not all codewords in the codebook have the same Hamming Weight). For example, a codebook may have 100 codewords having a HW of 5 and 50 codewords having a HW of 6. In various embodiments, the hamming distance between two codewords CW_a and CW_b satisfies: 0 <= HD(CW_a, CW_b) <= (HW(CW_a) + HW(CW_b)). Since 2k + 1 <= HD(CW_a, CW_b), it follows from transitivity that 2k + 1 <= (HW(CW_a) + HW(CW_b)), so k <= ((HW(CW_a) + HW(CW_b)) - 1) / 2. Finally, some codeword in the codebook will have some maximum hamming weight HW_max, resulting in: k <= (2*HW_max - 1) / 2.
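The bounds above can be checked numerically with a short Python sketch. The function names are hypothetical, and taking the floor of (2*HW_max - 1) / 2 is an added step that follows only from k being an integer.

```python
def hamming_weight(word: int) -> int:
    """Number of ON bits in a codeword."""
    return bin(word).count("1")


def hamming_distance(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def hd_upper_bound(a: int, b: int) -> int:
    """HD(a, b) can never exceed HW(a) + HW(b)."""
    return hamming_weight(a) + hamming_weight(b)


def max_correctable_errors(hw_max: int) -> int:
    """Largest integer k satisfying 2k + 1 <= 2 * hw_max."""
    return (2 * hw_max - 1) // 2
```

For HW_max = 5 this gives k <= 4, since 2*4 + 1 = 9 does not exceed the maximum distance of 10 while 2*5 + 1 = 11 does. The upper bound on the distance is reached only when the two codewords share no ON bits.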
[0064] In one approach, for example, a set of candidate code words (e.g., code words comprising a series of 4-bit segments, where each ON bit or OFF bit in a given 4-bit segment indicates the expected detection of a corresponding ON signal or OFF signal, respectively, in one of four optical detection channels (e.g., color channels) in a corresponding decoding cycle) may be generated (e.g., randomly) and filtered to remove those that don’t conform to a specified set of criteria that includes the Hamming Distance (CWA | CWB, CWC | CWD) >= K criterion. Additional constraint criteria that may be imposed include, but are not limited to, the conventional Hamming distance criterion (Hamming Distance (CWA, CWB) >= Q), a maximum number of ON bits allowed per decoding cycle or code word segment (e.g., 1 ON bit for a 4-bit segment), a maximum number of ON bits allowed per code word (e.g., 4 ON bits per a 60-bit code word corresponding to a 15 cycle 4 color imaging decoding process, 5 ON bits per a 60-bit code word corresponding to a 15 cycle 4 color imaging decoding process, 6 ON bits per a 60-bit code word corresponding to a 15 cycle 4 color imaging decoding process, or 7 ON bits per a 60-bit code word corresponding to a 15 cycle 4 color imaging decoding process), exclusion of code words from a predetermined list of selected code words, etc. In various embodiments, a constraint imposed is that all codewords have at least one ON bit in each of the optical channels (e.g., one of the four optical channels). In various embodiments, a constraint imposed is that ON-bits from adjacent cycles cannot occupy the same color channel (e.g., if codeword CWA has an ON-bit in the red color channel in cycle 1 or has a state space assigned to the “red” state, then codeword CWA will not have an ON-bit in cycle 2 in the red color channel or have a state space assigned to the “red” color channel in cycle 2).
In an alternative approach, rather than generating and filtering codewords, only valid candidate codewords are generated. For example, only candidate codewords having five ON bits and being at least a specific Hamming distance (e.g., HD of 6) away from all other candidate codewords are generated for the list of possible candidate codewords. This alternative approach reduces the number of filtering steps that need to be performed.
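The single-codeword filters described above might be expressed as in the Python sketch below. The function and parameter names are hypothetical, codewords are held as integers, and segment 0 is taken to be the least-significant four bits, which is an arbitrary choice because these particular constraints do not depend on which end is cycle 1.

```python
def segments(word: int, n_cycles: int, n_channels: int = 4) -> list[int]:
    """Split a codeword into per-cycle segments (segment 0 = lowest bits)."""
    mask = (1 << n_channels) - 1
    return [(word >> (c * n_channels)) & mask for c in range(n_cycles)]


def passes_single_word_constraints(word, n_cycles=15, n_channels=4,
                                   max_on_per_segment=1, max_on_bits=5,
                                   forbid_repeat_channel=True,
                                   require_all_channels=False) -> bool:
    segs = segments(word, n_cycles, n_channels)
    # Maximum number of ON bits per decoding cycle (code word segment).
    if any(bin(s).count("1") > max_on_per_segment for s in segs):
        return False
    # Maximum number of ON bits per code word.
    if bin(word).count("1") > max_on_bits:
        return False
    # ON bits in adjacent cycles may not occupy the same color channel.
    if forbid_repeat_channel and any(a & b for a, b in zip(segs, segs[1:])):
        return False
    # Optionally require at least one ON bit in every channel.
    if require_all_channels:
        used = 0
        for s in segs:
            used |= s
        if used != (1 << n_channels) - 1:
            return False
    return True
```

Filtering with such a predicate corresponds to the generate-and-filter approach. Under the alternative approach, the generator itself would be written so that it can only emit codewords for which this predicate already holds.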
[0065] In some embodiments, a first candidate code word is selected and a second candidate code word is selected according to the methods described herein. In various embodiments, the candidate code words can be checked to determine if the specified set of constraint criteria (e.g., the OR-robust criterion, etc.) are met and, if so, an additional (e.g., third, fourth, etc.) candidate code word can be selected and checked against the first two or more code words to see if the specified set of constraint criteria are still met for each pair, etc. In various embodiments, there are 3 layers of constraints imposed on codewords within an OR-robust codebook. In various embodiments, the first layer includes constraints imposed on any single codeword, e.g., a minimum hamming weight, minimum number of channels, maximum number of channels, etc. In various embodiments, a second layer includes constraints imposed on any pair of codewords, e.g., the edit distance (e.g., Hamming distance) constraint. In various embodiments, the third layer includes constraints imposed on any pair of pairs of codewords, e.g., the OR-robust constraint. For example, a first codeword (CW_1) can be chosen at random from all codewords that satisfy the first layer of constraints, i.e., single codeword constraints. The second codeword (CW_2) can be chosen at random from all codewords that satisfy the first and second layer of constraints, e.g., that HD(CW_1, CW_2) >= min_hamming_distance. The third codeword (CW_3) is chosen such that CW_3 satisfies all single codeword constraints, in addition to being sufficiently far from CW_1 and CW_2 in Hamming space, and it has to satisfy the third layer of constraints, i.e., the "pairs of pairs of codewords constraints."
[0066] In various embodiments, to obtain the first three valid codewords in a codebook, a first random starting codeword is selected having exactly a specific number of ON bits (e.g., five ON bits). In various embodiments, to obtain a second valid starting codeword, a candidate codeword is obtained having the specific number of ON bits (e.g., five ON bits) and also passes the Hamming distance requirement (e.g., HD >= 6). In various embodiments, to obtain a third valid starting codeword, a candidate codeword is obtained having the specific number of ON bits (e.g., five ON bits), passes the Hamming distance requirement (e.g., HD >= 6), and also is OR-robust with the two starting valid codewords.
[0067] In various embodiments, testing of candidate codewords against all valid codewords is a computationally expensive process. In various embodiments, testing candidate codewords for inclusion in the list of valid codewords can be sped up so that a candidate codeword does not need to be tested against all valid codewords to ensure that adding the candidate codeword does not cause the codebook to violate the OR-robust property. In various embodiments, given a candidate codeword, a test is performed to determine if adding the candidate codeword to the codebook will violate the OR-robust constraint. Given an assumption that all codewords in a codebook have a HD >= 6 from one another, other methods can be implemented instead of bitwise OR-ing a candidate codeword with all valid (incumbent) codewords and computing the hamming distance between all the resulting codewords. In various embodiments, the following inequality is effectively checked for a given candidate codeword:
((candidate_CW | incumbent_CWA) XOR (incumbent_CWB | incumbent_CWC)).count_ones() > or_robust_radius
In various embodiments, the above inequality can be checked more efficiently (e.g., faster) by calculating the following two values:
num_bits_shared_with_CWB = ((candidate_CW | incumbent_CWA) & incumbent_CWB).count_ones()
num_bits_shared_with_CWC = ((candidate_CW | incumbent_CWA) & incumbent_CWC).count_ones()
If both num_bits_shared_with_CWB and num_bits_shared_with_CWC are sufficiently small, the inequality above holds. In various embodiments, these two values can be used to determine whether a slow method or a faster method is used to test for the ability to add the candidate codeword to the codebook. When there are sufficiently small numbers of bits shared with codewords CWB and CWC, it is likely (e.g., certain) that adding the candidate codeword would not violate the OR-robust property of the codebook and so a faster test can be implemented. In other situations, for example, where bits are shared with codewords CWB and CWC, a more computationally expensive test is required due to the higher risk of the addition of the candidate codeword causing the OR-robust property of the codebook to be violated. In some embodiments, the test is to bitwise OR the candidate codeword with all valid codewords in the codebook and compare the OR-ed candidates against bitwise ORs of all pairs of valid codewords. In various embodiments, the time complexity of this test is quadratic and, thus, the test is computationally expensive. In some embodiments, the maximum OR-robust radius for a codebook of codewords having five ON bits per codeword is 20. In some embodiments, an OR-robust radius is selected for a codebook that is 2, 3, 4, 5, 6, 7, 8, 9, or 10. In some embodiments, a constraint of a higher OR-robust radius will generate a codebook that does not have enough codewords for a given purpose, for example, for a whole transcriptome analysis. In some embodiments, an OR-robust radius of about 2 to about 10 will generate codebooks having a suitable number of valid codewords for in situ analysis.
[0068] In various embodiments, a process is provided to efficiently check whether adding a candidate codeword to the codebook would violate the or_robust_radius constraint. To solve this problem in less than quadratic time, the process can take advantage of the special structure of the codewords and the codebook. In various embodiments, a property of the codebook is that all codewords in the codebook are at least Hamming distance of 6 apart. In various embodiments, all codewords have exactly a specific number of ON bits set (e.g., five ON bits). In various embodiments, a faster algorithm using knowledge of the codebook properties described above is as follows:
1. Bitwise OR candidate codeword with all valid codebook words to produce ored_candidates.
2. Check if ored_candidates are all sufficiently far apart.
3. Group ored_candidates by number of set ON bits (8, 9, or 10).
4. For each group:
4.1. For each ored_candidate in group, compute how many bits it shares with each incumbent codeword.
4.2. Grab only the codeword pairs where each codeword shares sufficiently many ones (precomputed) with the candidate pair. OR them together and call the result high_risk_pairs.
4.3. Compute Hamming distance between high_risk_pairs and ored_candidate, reject if any are below a desired threshold.
In various embodiments, this algorithm reduces the time complexity of OR-robust validation of candidate codewords because the minimum Hamming distance between a first pair of ORed codeword pairs is a function of the number of set bits that each codeword in an ORed second pair shares with the ORed first pair.
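The idea behind the faster test can be sketched in Python using a simple lower bound. For X = candidate | CWA and Y = CWB | CWC, the identity HD(X, Y) = |X| + |Y| - 2|X & Y|, together with |X & Y| <= |X & CWB| + |X & CWC|, shows that few shared bits guarantee a large distance. The code below is a simplified illustration with hypothetical names, not the grouped algorithm listed above: it still visits every incumbent pair, but it computes the exact distance only for pairs that the bound cannot clear.

```python
from itertools import combinations


def popcount(x: int) -> int:
    return bin(x).count("1")


def hd_lower_bound(ored_candidate: int, cw_b: int, cw_c: int) -> int:
    """Lower bound on HD(ored_candidate, cw_b | cw_c) from shared-bit counts."""
    shared_b = popcount(ored_candidate & cw_b)
    shared_c = popcount(ored_candidate & cw_c)
    return (popcount(ored_candidate) + popcount(cw_b | cw_c)
            - 2 * (shared_b + shared_c))


def candidate_is_or_robust(candidate: int, codebook: list[int], radius: int) -> bool:
    ored_candidates = [candidate | w for w in codebook]
    # The new ORs must be sufficiently far from one another.
    if any(popcount(x ^ y) < radius for x, y in combinations(ored_candidates, 2)):
        return False
    for x in ored_candidates:
        for b, c in combinations(codebook, 2):
            if hd_lower_bound(x, b, c) >= radius:
                continue  # low-risk pair: the bound already proves it is safe
            if popcount(x ^ (b | c)) < radius:  # high-risk pair: exact test
                return False
    return True
```

Because the bound never overestimates the true distance, skipping low-risk pairs cannot hide a violation. A production implementation would presumably precompute the shared-bit counts once per incumbent codeword and enumerate only the high-risk pairs.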
[0069] FIGS. 1A-1B provide a non-limiting example of a flowchart for a process 100 for generating an OR-robust codebook. Process 100 can be performed, for example, using one or more electronic devices implementing software configured to perform the process 100. In some examples, process 100 is performed using a client-server system, and the blocks of process 100 are divided up between the server and multiple client devices. Thus, while portions of process 100 are described herein as being performed by particular devices of a client-server system, it will be appreciated that process 100 is not so limited. In other examples, process 100 is performed using only a client device or only multiple client devices. In process 100, some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally, omitted. In some examples, additional steps may be performed in combination with the process 100. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.
[0070] At step 102 in FIG. 1A, a plurality of candidate code words are generated randomly (e.g., by one or more processors of a system configured to perform the process illustrated in FIG. 1A).
[0071] In some instances, the randomly generated set of candidate code words may comprise at least 1,000, 5,000, 10,000, 20,000, 40,000, 60,000, 80,000, 100,000, or more than 100,000 unique candidate code words.
[0072] In some instances, the candidate code words (and the selected set of filtered code words) may comprise binary code words, e.g., code words comprising a series (or string) of binary values (i.e., “1” or “0”). In some instances, the candidate binary code words may comprise code words of at least 20 bits, 40 bits, 60 bits, 80 bits, 100 bits, 120 bits, 140 bits, 160 bits, 180 bits, or more than 180 bits in length. For binary code words, the total diversity (i.e., number of unique code words) of possible codewords of a specified length of L bits, prior to any filtering, is given by 2^L; thus a codebook comprising binary code words of length = 20 bits may include up to 1,048,576 unique code words, a codebook comprising binary code words of length = 40 bits may include up to 1.1 x 10^12 unique code words, and so forth.
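The figures above follow directly from 2^L, as the short Python sketch below confirms. The second function is an added illustration rather than part of the disclosure: it counts the code words that remain under the assumed constraints of exactly a given number of ON bits with at most one ON bit per cycle.

```python
from math import comb


def total_code_words(length_bits: int) -> int:
    """Unfiltered diversity of binary code words of the given length."""
    return 2 ** length_bits


def constrained_code_words(n_cycles: int, n_channels: int, n_on: int) -> int:
    """Code words with exactly n_on ON bits and at most one ON bit per cycle.

    Choose which cycles are ON, then one channel for each ON cycle.
    """
    return comb(n_cycles, n_on) * n_channels ** n_on
```

`total_code_words(20)` is 1,048,576 and `total_code_words(40)` is 1,099,511,627,776 (about 1.1 x 10^12). Under the assumed constraints, a 15-cycle, 4-channel design with five ON bits leaves `constrained_code_words(15, 4, 5)` = 3,075,072 candidates, which illustrates how sharply per-codeword constraints shrink the search space.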
[0073] In some instances, the candidate code words (and the selected set of filtered code words) may comprise a series of code word segments, e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more than 20 code word segments, where each code word segment comprises, e.g., 2, 3, 4, or more than 4 bits. Each code word segment may represent a unique imaging cycle of a plurality of imaging cycles where a sample is imaged in a plurality of color channels (e.g., red, yellow, green, and blue color channels). In one example, a binary code word of total length = 60 bits includes 15 code word segments of 4 bits each. In another example, a binary code word of total length = 80 bits includes 20 code word segments of 4 bits each. In yet another example, a binary code word of total length = 144 bits includes 36 code word segments of 4 bits each. In some instances, up to 100 imaging cycles are supported, thus providing 100 code word segments (e.g., 100 code word segments = 100 optical states or, alternatively, 100 code word segments * 4 bits per segment = 400 bits). In some instances, more than 100 imaging cycles are supported (e.g., enough imaging cycles to analyze a full transcriptome).
[0074] At step 102a in FIG. 1A, the candidate code words may optionally be filtered to remove code words that don’t conform to, e.g., a constraint on a maximum number of ON bits per code word segment. In some instances, for example, a maximum number of ON bits per code word segment may be 1 bit, 2 bits, 3 bits, 4 bits, 5 bits, 6 bits, 7 bits, 8 bits, or more than 8 bits depending on the length of the code word segment.
[0075] At step 102b in FIG. 1A, the candidate code words may optionally be filtered to remove code words that don’t conform to one or more additional constraints, e.g., a maximum number of ON bits allowed per code word (e.g., 5, 6, 7, 8, 9, 10, or more than 10 ON bits depending on the length of the code word), exclusion of code words from a predetermined list of selected code words, etc.
[0076] At step 104 in FIG. 1A, the plurality of candidate code words may optionally be filtered to remove candidate code words that don’t conform to a specified edit distance criterion, e.g., a criterion that Hamming Distance (CWA, CWB) >= Q for all pairwise combinations with other candidate code words of the plurality, where CWA and CWB are candidate code words, and predetermined distance parameter Q is an integer having a value of greater than or equal to 1. In various embodiments, Q is 2k+1, where k is an integer having a value greater than or equal to 1.
[0077] Hamming distance is a special case of an edit distance, a class of metrics used to compare and evaluate distances between two character strings, which allow for three kinds of edit operations to be performed on the characters of one string to transform it into the other string (e.g., substitution, insertion, or deletion of a single character). Other examples of edit distances include the Longest Common Subsequence Distance (LCSD) and the Levenshtein distance (LevD). The Levenshtein distance allows for deletion, insertion and substitution. The Longest Common Subsequence Distance allows for insertion and deletion, but not substitution. The Hamming distance allows only substitution, and hence only applies to strings of the same length.
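The three edit distances can be compared side by side with the textbook dynamic-programming formulations below. This Python sketch is illustrative only: the function names are hypothetical, and the inputs are strings over any alphabet (for example, the state letters A through E).

```python
def hamming(a: str, b: str) -> int:
    """Substitutions only; defined for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def lcs_distance(a: str, b: str) -> int:
    """Insertions and deletions only: len(a) + len(b) - 2 * LCS length."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return len(a) + len(b) - 2 * prev[-1]
```

For "ABCD" and "ABED", the Hamming and Levenshtein distances are both 1, while the LCS distance is 2 because the substitution must be expressed as a deletion plus an insertion.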
[0078] In various embodiments, a codebook generated as described herein is compatible with the detection and correction of up to k errors as long as the Hamming distance for any two code words in the codebook is greater than or equal to 2k + 1 (where k = 1, 2, 3, 4, 5, 6, 7, 8, etc.). The use of a higher value of k provides greater error detection and correction capability for overcoming noisy signal detection during image-based decoding. The minimum acceptable value of k is determined based on the observed signal detection error rate for a given instrument used to perform decoding.
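To illustrate how a minimum distance of 2k + 1 supports correction of up to k errors, the following Python sketch assigns an observed code word to the valid code word within k bit errors, or reports that no assignment is possible. The function name and the toy codebook are hypothetical, and the sketch assumes the codebook already satisfies the minimum-distance criterion, so that at most one valid code word can lie within k errors.

```python
from typing import Optional


def hamming_distance(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def assign_code_word(observed: int, codebook: list[int], k: int) -> Optional[int]:
    """Return the index of the valid code word within k bit errors, else None.

    Assumes every pair of valid code words is at least 2k + 1 apart, so the
    radius-k neighborhoods of the valid code words do not overlap.
    """
    for idx, word in enumerate(codebook):
        if hamming_distance(observed, word) <= k:
            return idx
    return None
```

With the toy codebook `[0b1110000, 0b0001110, 0b1001001]` (minimum distance 4, so k = 1), the observed word `0b1100000` is assigned to index 0, whereas `0b0000000` lies three errors from every valid code word and is reported as invalid.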
[0079] At step 106 in FIG. 1A, the remaining candidate code words may be filtered to remove candidate code words that don’t conform to another specified edit distance criterion, e.g., a criterion that Hamming Distance (CW_A | CW_B, CW_C | CW_D) ≥ K for all logical bitwise OR combinations of any two candidate code words in a plurality of remaining candidate code words, where CW_A | CW_B indicates the logical bitwise OR combination of code words CW_A and CW_B, CW_C | CW_D indicates the logical bitwise OR combination of code words CW_C and CW_D, and K is an integer having a value greater than or equal to 1. In some instances, K may be equal to 1, 2, 3, 4, 5, 6, 7, 8, etc. In some instances, the value of K is selectable by a user during design of the codebook. In various embodiments, the Hamming distance between the OR combinations of two pairs of codewords is between 0 and the sum of the Hamming weights of the four codewords (inclusive). In one example, using a codebook where all codewords have a Hamming weight of 5, the maximum OR-robust Hamming distance between pairs of pairs of codewords is 20 (i.e., none of the four codewords share any bit with any other codeword). In another example, if all individual codewords have a Hamming weight of 6, the maximum OR-robust Hamming distance between pairs of pairs of codewords is 24.
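The OR-robust criterion of step 106 can be checked by brute force, as in the following Python sketch. It assumes code words are stored as integers and that "all logical bitwise OR combinations" means every distinct unordered pair of code words, including pairs that share one code word; the function names are illustrative only.

```python
from itertools import combinations
from typing import List


def hamming_distance(x: int, y: int) -> int:
    """Hamming distance between two code words stored as integers."""
    return bin(x ^ y).count("1")


def or_pair_distances(codebook: List[int]) -> List[int]:
    """Hamming distances between the ORs of every two distinct code word pairs."""
    ors = [a | b for a, b in combinations(codebook, 2)]
    return [hamming_distance(p, q) for p, q in combinations(ors, 2)]


def is_or_robust(codebook: List[int], k: int) -> bool:
    """True if every pair of OR combinations is at least k apart."""
    return all(d >= k for d in or_pair_distances(codebook))


# Four 20-bit code words of Hamming weight 5 that share no ON bits.
disjoint = [0b11111 << (5 * i) for i in range(4)]
distances = or_pair_distances(disjoint)

# Three code words whose pairwise ORs all collapse to 0b0111.
colliding = [0b0011, 0b0101, 0b0110]
```

For the four disjoint weight-5 code words, the largest distance is 20 (two pairs with no code word in common), consistent with the example above, while pairs that share one code word are 10 apart. The brute-force check grows roughly with the fourth power of the codebook size, so it is only practical for small candidate sets.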
[0080] In some instances, a first portion of the plurality of code words in the codebook satisfies a constraint that the Hamming Distance (CW_A | CW_B, CW_C | CW_D) ≥ K_1 for all logical bitwise OR combinations of any two candidate code words in the first portion, and a second portion of the plurality of code words in the codebook satisfies a constraint that the Hamming Distance (CW_A | CW_B, CW_C | CW_D) ≥ K_2 for all logical bitwise OR combinations of any two candidate code words in the second portion, where K_1 ≠ K_2. In some instances, the values of K_1 and K_2 are selectable by a user during design of the codebook. Such codebooks comprising a first portion of code words and a second portion of code words that satisfy different OR-robust constraints may be useful, for example, in situations where it is desirable to decode a first set of genes/transcripts with higher accuracy than a second set; in such cases, OR-robust code words that have a higher value of K (i.e., a stronger OR-robust criterion) may be used for the first portion of code words than for the remaining code words.
[0081] At step 108 in FIG. 1B, the remaining candidate code words (e.g., single candidate codewords) may optionally be filtered to remove candidate code words that don’t conform to, e.g., a constraint on a maximum number of ON bits per code word segment. In some instances, for example, a maximum number of ON bits per code word segment may be 1 bit, 2 bits, 3 bits, 4 bits, 5 bits, 6 bits, 7 bits, 8 bits, or more than 8 bits depending on the length of the code word segment.
[0082] At step 110 in FIG. 1B, the remaining candidate code words may optionally be filtered to remove candidate code words that don’t conform to one or more additional constraints, e.g., a maximum number of ON bits allowed per code word (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 ON bits depending on the length of the code word), exclusion of code words from a predetermined list of selected code words, etc.
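The ON-bit constraints of steps 108 and 110 can be sketched in Python as follows. The sketch assumes code words are strings of "0" and "1" characters divided into fixed-length segments; the function names, the parameter names, and the example exclusion list are hypothetical.

```python
from typing import FrozenSet, List


def split_segments(code_word: str, bits_per_segment: int) -> List[str]:
    """Split a code word into consecutive fixed-length segments."""
    return [code_word[i:i + bits_per_segment]
            for i in range(0, len(code_word), bits_per_segment)]


def passes_on_bit_limits(code_word: str,
                         bits_per_segment: int,
                         max_on_per_segment: int,
                         max_on_total: int,
                         excluded: FrozenSet[str] = frozenset()) -> bool:
    """Apply the per-segment limit, the per-code-word limit, and the exclusion list."""
    if code_word in excluded:
        return False
    if code_word.count("1") > max_on_total:
        return False
    return all(seg.count("1") <= max_on_per_segment
               for seg in split_segments(code_word, bits_per_segment))


# Two cycles of four channels each; at most 1 ON bit per segment, 2 per code word.
candidates = ["10000100", "11000000", "10000001", "10100101", "01000010"]
kept = [cw for cw in candidates
        if passes_on_bit_limits(cw, 4, 1, 2, excluded=frozenset({"01000010"}))]
```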
[0083] At step 112 in FIG. 1B, an OR-robust codebook is output that comprises a plurality of selected code words that meet the specified list of constraints. In some instances, as will be discussed in more detail elsewhere herein, each code word of the plurality of selected code words comprises M × N bits, where M is a number of sequencing or probing cycles and N is a number of optical detection channels in an instrument configured to perform the in situ decoding.
[0084] Once the OR-robust codebook has been generated, it may be tailored for a specific in situ detection or sequencing application by assigning one or more code words contained therein to each of a plurality of barcoded target molecules (or target analytes) of interest. In some instances, more than one code word may be assigned to a single barcoded target molecule. The code words thus correspond to and represent the physical barcodes (e.g., oligonucleotide barcode sequences) attached to the target molecules in a multiplexed in situ assay, where the relationship between the structure of the code words and the structure of the physical barcodes depends on the read-out method (e.g., hybridization probe-based detection or nucleic acid sequencing) used in the in situ assay.
[0085] In various embodiments, the Hamming distance between the logical bitwise OR combinations of any two pairs of codewords, for example, HD(CW_A | CW_B, CW_C | CW_D), will be equal to 1 or 2. In various embodiments, the Hamming distance between the logical bitwise OR combinations of at least one pair of pairs of codewords is 0.
[0086] FIG. 2 provides a non-limiting example of a flowchart for a process 200 for assigning the code words in an OR-robust codebook to a corresponding list of target analytes. Process 200 can be performed, for example, using one or more electronic devices implementing a software platform. In some examples, process 200 is performed using a client-server system, and the blocks of process 200 are divided up between the server and multiple client devices. Thus, while portions of process 200 are described herein as being performed by particular devices of a client-server system, it will be appreciated that process 200 is not so limited. In other examples, process 200 is performed using only a client device or only multiple client devices. In process 200, some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally omitted. In some examples, additional steps may be performed in combination with the process 200. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.
[0087] At step 202 in FIG. 2, a list of code words from an OR-robust codebook is received (e.g., by one or more processors of a system configured to perform the process illustrated in FIG. 2). The OR-robust codebook may be, for example, a codebook generated using the process illustrated in FIG. 1 for which valid code words comply with the “OR-robust” constraint that Hamming Distance (CW_A | CW_B, CW_C | CW_D) ≥ K for all logical bitwise OR combinations of any two candidate code words in a plurality of remaining candidate code words, where K is an integer having a value of greater than or equal to 1.
[0088] At step 204 in FIG. 2, a list of a plurality of target molecules (or target analytes) that are of interest in a particular experiment is received. In some instances, the list may comprise a plurality of nucleic acids. In some instances, the plurality of nucleic acids may comprise a plurality of genes. In some instances, the plurality of nucleic acids may comprise a plurality of RNA transcripts. In some instances, the plurality of target analytes may comprise a plurality of proteins. In some instances, the plurality of target analytes may comprise a combination of nucleic acids (e.g., genes or transcripts) and proteins.
[0089] At step 206 in FIG. 2, a target analyte from the list is assigned to at least one code word from the plurality of code words. Step 206 is repeated until all target analytes on the list have been assigned at least one code word from the OR-robust codebook.
[0090] In some instances, a code word from the codebook may be randomly assigned to each of the one or more barcoded target analytes. In some instances, specific code words from the codebook (e.g., those with the largest OR-robust distances) may be assigned to a specified list of genes/transcripts to denoise the decoding process for the specified list of genes/transcripts.
[0091] In some instances, a code word from the codebook may be assigned to each of the one or more barcoded target analytes based on a decision rule designed to ensure that a total number of ON signals detected in a given image of the plurality of images is within ± 5%, ± 10%, ± 15%, or ± 20% of a mean number of ON signals detected per image for the plurality of images.
[0092] In some instances, a code word from the codebook may be assigned to each of the one or more barcoded target analytes based on a decision rule designed to minimize a maximum predicted density of ON signals detected in images of the plurality of images.
[0093] In some instances, a code word from the codebook may be assigned to each of two or more barcoded target analytes based on expression data for the two or more barcoded target analytes in clustered cell types (e.g., where code words with the largest OR-robust distances are assigned to genes/transcripts with the highest expression levels), and where the clustered cell types represent a distribution of cell types found in the biological sample. In some instances, the expression data for the two or more barcoded target analytes may comprise bulk gene expression data, bulk protein expression data, spatial gene expression data, spatial protein expression data, single cell gene expression data, single cell protein expression data, or any combination thereof. In some instances, the two or more assigned code words are rank-ordered according to code word weight, the two or more barcoded target analytes are rank-ordered according to a maximum expression level across all clustered cell types, and the two or more rank-ordered code words are assigned to the two or more rank-ordered barcoded target analytes using an iterative process repeated for each of the two or more barcoded target analytes in decreasing order of maximum expression level, the iterative process comprising: computing a predicted density of ON signals for every combination of
remaining, unassigned code words and the barcoded target analyte across the plurality of images; selecting a code word from the remaining, unassigned code words that minimizes the predicted density of ON signals across the plurality of images; and assigning the selected code word to the barcoded target analyte.
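The iterative assignment described above can be sketched in Python under several simplifying assumptions that are not stated in the disclosure: each bit of a code word corresponds to one image (one cycle and channel), the predicted density of an image is the sum of the maximum expression levels of the analytes whose assigned code words are ON in that image, and "minimizing the predicted density" is taken to mean minimizing the peak per-image density. The function name `assign_code_words` and the example data are hypothetical.

```python
from typing import Dict, List


def assign_code_words(code_words: List[str],
                      max_expression: Dict[str, float]) -> Dict[str, str]:
    """Greedy assignment: most highly expressed analytes choose first."""
    if len(code_words) < len(max_expression):
        raise ValueError("not enough code words for the analytes")
    n_images = len(code_words[0])
    density = [0.0] * n_images      # running predicted ON-signal load per image
    unassigned = list(code_words)
    assignment: Dict[str, str] = {}

    # Process analytes in decreasing order of maximum expression level.
    for analyte in sorted(max_expression, key=max_expression.get, reverse=True):
        level = max_expression[analyte]

        def peak_density(cw: str) -> float:
            # Peak per-image density if this code word were assigned (assumed criterion).
            return max(d + level * (bit == "1") for d, bit in zip(density, cw))

        best = min(unassigned, key=peak_density)
        unassigned.remove(best)
        assignment[analyte] = best
        density = [d + level * (bit == "1") for d, bit in zip(density, best)]
    return assignment


code_words = ["1100", "0011", "1010"]
expression = {"geneA": 100.0, "geneB": 80.0, "geneC": 5.0}
result = assign_code_words(code_words, expression)
```

In this toy example the two highly expressed analytes receive code words with no ON bits in common, so their signals never appear in the same image. Ties are broken by code word order, which is an arbitrary choice of this sketch.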
[0094] At step 208 in FIG. 2, the updated OR-robust codebook that includes the code word-to-target analyte assignments is output.
[0095] FIG. 3 provides a non-limiting example of a flowchart for a process 300 for decoding optical signals derived from images of a biological sample to identify barcoded target analytes. Process 300 can be performed, for example, using one or more electronic devices implementing a software platform. In some examples, process 300 is performed using a client-server system, and the blocks of process 300 are divided up between the server and multiple client devices. Thus, while portions of process 300 are described herein as being performed by particular devices of a client-server system, it will be appreciated that process 300 is not so limited. In other examples, process 300 is performed using only a client device or only multiple client devices. In process 300, some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally omitted. In some examples, additional steps may be performed in combination with the process 300. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.
[0096] At step 302 in FIG. 3, an OR-robust codebook comprising a plurality of valid code words and their corresponding target analytes is received (e.g., by one or more processors of a system configured to perform the process illustrated in FIG. 3). The OR-robust codebook may be, for example, a codebook generated using the process illustrated in FIG. 1 for which valid code words comply with the “OR-robust” constraint that Hamming Distance (CW_A | CW_B, CW_C | CW_D) ≥ K for all logical bitwise OR combinations of any two candidate code words in a plurality of remaining candidate code words, where K is an integer having a value of greater than or equal to 1.
[0097] In some instances, all valid code words in the OR-robust codebook may comply with the “OR-robust” constraint that Hamming Distance (CW_A | CW_B, CW_C | CW_D) ≥ K for all logical bitwise OR combinations of any two candidate code words in a plurality of remaining candidate code words, where K is an integer having a value of greater than or equal to 1.
[0098] In some instances, at least a first portion of the valid code words in the OR-robust codebook may comply with the “OR-robust” constraint that Hamming Distance (CW_A | CW_B, CW_C | CW_D) ≥ K for all logical bitwise OR combinations of any two candidate code words in a plurality of remaining candidate code words, where K is an integer having a value of greater than or equal to 1. In some instances, the first portion of the valid code words may comprise, for example, at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the valid code words in the OR-robust codebook, and the remaining portion of the valid code words may not comply with the OR-robust property.
[0099] In some instances, each valid code word of the plurality of valid code words further complies with a second Hamming distance constraint, i.e., that the Hamming distance for any two valid code words is greater than or equal to Q, where Q is an integer having a value of greater than or equal to 1 (for example, Q = 1, 2, 3, 4, 5, 6, 7, or 8).
[0100] In some instances, the codebook may comprise at least 50 valid code words (e.g., at least 50, 75, 100, 200, 300, 400, 500, 600, 700, 800, or 1,000 code words). In some instances, the codebook may comprise up to 300,000 valid code words (e.g., up to 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 120,000, 140,000, 160,000, 180,000, 200,000, 220,000, 240,000, 260,000, 280,000, or 300,000 valid code words).
[0101] At step 304 in FIG. 3, a plurality of images of a biological sample is received, where the plurality of images was acquired over a plurality of decoding (sequencing or probing) cycles, and where each image comprises a plurality of observed optical signals.
[0102] In some instances, the biological sample may comprise a tissue sample. In some instances, the biological sample may comprise cells, e.g., cells derived from a cell culture, a tissue sample, or cells deposited on a surface. In some instances, the biological sample may comprise, e.g., a tissue specimen that has been fixed, embedded, and/or cleared as described elsewhere herein.
[0103] In some instances, the plurality of images may comprise a plurality of images comprising different fields-of-view of the biological sample. In some instances, one or more images (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 images) comprising different fields-of-view of the biological sample may be acquired in each decoding (probing or sequencing) cycle as necessary to image the entire cross-sectional area of the biological sample.
[0104] In some instances, the plurality of images may comprise a plurality of z-stack images of the biological sample. In some instances, a z-stack of images (i.e., a series of images acquired at each of two or more focal planes within the thickness of the biological sample) may be acquired for each of one or more fields-of-view of the sample. For example, each z-stack of images may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 images acquired at different focal planes within the thickness of the biological sample.
[0105] In some instances, the plurality of observed optical signals may comprise signal intensity measurements based on the plurality of images. In some instances, the plurality of optical signals may represent light emitted from a plurality of fluorophores.
[0106] At step 306 in FIG. 3, the plurality of observed optical signals is decoded to obtain a plurality of observed code words.
[0107] In some instances, decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words may comprise: determining a location of each observed optical signal in a first image of the plurality of images; aligning the location of each observed optical signal (or corresponding feature, e.g., an RCP derived from a target analyte) in the first image to a corresponding location of each observed optical signal (or corresponding feature, e.g., an RCP derived from a target analyte) in the remaining images of the plurality of images to obtain a series of observed optical signals at each location; and obtaining the plurality of observed code words based on the series of observed optical signals at each location.
[0108] In some instances, aligning the locations of each optical signal (or corresponding feature, e.g., an RCP derived from a target analyte) in the first image to corresponding locations in the remaining images of the plurality of images may comprise registering the plurality of images acquired over the plurality of decoding (sequencing or probing) cycles. In some instances, the alignment may comprise determining that optical signals derived from features (e.g., RCPs derived from target analytes) in different images arise from the same feature if the features in different images are within about 5, 10, 15, 20, 40, or 50 nm of each other.
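The location-matching step can be illustrated with a minimal Python sketch. It assumes the images have already been registered to a common coordinate frame, uses a brute-force nearest-neighbor search, and does not enforce one-to-one matching; the function name `match_spots` and the tolerance parameter `max_dist` (expressed in the same units as the coordinates) are illustrative.

```python
from math import hypot
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def match_spots(reference: List[Point],
                other: List[Point],
                max_dist: float) -> List[Optional[int]]:
    """For each reference spot, return the index of the nearest spot in
    `other` that lies within max_dist, or None if there is no such spot."""
    matches: List[Optional[int]] = []
    for rx, ry in reference:
        best_idx, best_d = None, max_dist
        for j, (ox, oy) in enumerate(other):
            d = hypot(ox - rx, oy - ry)
            if d <= best_d:
                best_idx, best_d = j, d
        matches.append(best_idx)
    return matches


# Spot centers detected in the first image and in a later-cycle image.
first_image = [(10.0, 10.0), (50.0, 50.0)]
later_image = [(50.5, 49.8), (10.2, 9.9), (90.0, 90.0)]
loose = match_spots(first_image, later_image, max_dist=1.0)
tight = match_spots(first_image, later_image, max_dist=0.3)
```

A reference spot with no match in a given cycle would contribute an OFF signal for that cycle when the series of observed optical signals at that location is assembled.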
[0109] In some instances, each observed optical signal of the plurality of observed optical signals in each image has associated therewith at least one value. In some instances, the at least one value includes an intensity value of the observed optical signal for each color channel. In some instances, the at least one value includes one or more statistical parameters, such as, for example, mean brightness, median brightness, variance, or standard deviation. In some instances, the at least one intensity value may comprise an analog intensity value. In some instances, an analog intensity value is determined for each light signal (or lack of light signal) in each color channel for each cycle of the plurality of cycles. For example, an analog intensity value may have a range of 0 (no intensity observed) to 12,000 (e.g., a full well capacity of each pixel in the sensor array). Full well capacity is defined as the amount of charge that can be stored within an individual pixel without the pixel becoming saturated (when an individual pixel can no longer accept any more photoelectrons). Full well capacity is dependent on the pixel size of the sensor and the camera operating voltages. In some instances, the analog intensity value is a sum of intensities from multiple pixels (e.g., two or more adjacent pixels). In some instances, the analog intensity value is an area under the curve. In some instances, an amplitude of an observed optical signal is a value of the peak of the spot, constrained by the well depth. In some instances, an analog intensity value is determined at a specific position for a detected RCP in each color channel. In some instances, presence of signal is detected separately in each color channel. In some instances, the separately detected signals are combined into a single vector or array. In an example, if a peak was not detected in a given channel, an intensity measurement in that channel will be missing or assigned a value of zero or null. An exemplary set of analog intensity values for a single RCP detected during a single imaging cycle may be {11000, 100, 0, 0} indicating that a high intensity was detected in the first color channel (e.g., red color channel), a small amount of intensity was detected in a second color channel (e.g., yellow color channel), and no intensity was detected in the third (e.g., green color channel) and fourth color channels (e.g., blue color channel).
[0110] In some instances, the intensity values are binned into a single bin of a plurality of bins, where each bin represents a range of intensity values. In some instances, the plurality of bins includes more than 2 bins. In some instances, the plurality of bins includes 3 bins, 4 bins, 5 bins, 6 bins, 7 bins, 8 bins, 9 bins, 10 bins, 11 bins, 12 bins, 13 bins, 14 bins, 15 bins, 16 bins, 17 bins, 18 bins, 19 bins, 20 bins, 21 bins, 22 bins, 23 bins, 24 bins, 25 bins, etc. In some instances, the plurality of bins includes more than 25 bins. In some embodiments, the
plurality of bins includes up to 100 bins. For example, the plurality of bins may include 4 bins as follows: Bin 0 is 0 to 2999; Bin 1 is 3000 to 5999; Bin 2 is 6000 to 8999; Bin 3 is 9000 to 12000. In this example, intensity values from 0 to 2999 are binned into Bin 0, intensity values from 3000 to 5999 are binned into Bin 1, intensity values from 6000 to 8999 are binned into Bin 2, and intensity values from 9000 to 12000 are binned into Bin 3. In some instances, each of the plurality of bins have approximately equal sizes (as per the example above). In some instances, the plurality of bins has different sizes, for example, where Bin 0 is 0 to 1999; Bin 1 is 2000 to 4999; Bin 2 is 5000 to 8999; Bin 3 is 9000 to 12000.
[0111] In some instances, where binning is used, a set of intensity values are represented by the bin number into which the intensity value is placed. An exemplary set of binned intensity values for a single RCP detected during a single imaging cycle may be {3, 1, 0, 0} indicating that a high intensity was detected in the first color channel (e.g., red color channel), a small amount of intensity was detected in a second color channel (e.g., yellow color channel), and no intensity was detected in the third (e.g., green color channel) and fourth color channels (e.g., blue color channel).
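The binning described above can be sketched in Python with the standard-library `bisect` module. The sketch encodes each binning scheme by the lower edges of Bins 1 through 3; the names `bin_intensities`, `EQUAL_EDGES`, and `UNEQUAL_EDGES`, and the example intensity values, are illustrative.

```python
from bisect import bisect_right
from typing import List, Sequence

# Lower edges of Bins 1, 2, and 3; Bin 0 covers everything below the first edge.
EQUAL_EDGES = [3000, 6000, 9000]     # 0-2999, 3000-5999, 6000-8999, 9000-12000
UNEQUAL_EDGES = [2000, 5000, 9000]   # 0-1999, 2000-4999, 5000-8999, 9000-12000


def bin_intensities(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Replace each intensity with the number of the bin it falls into."""
    return [bisect_right(edges, v) for v in values]


# Hypothetical analog intensities for one RCP in one cycle (four channels).
analog = [11000, 3500, 0, 0]
binned_equal = bin_intensities(analog, EQUAL_EDGES)
binned_unequal = bin_intensities(analog, UNEQUAL_EDGES)
```

With either set of edges, the hypothetical intensities above map to the binned set {3, 1, 0, 0}.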
[0112] In some instances, the at least one intensity value may comprise a raw intensity value, a normalized intensity value, or a calculated intensity value calculated based on at least one of: a size of a feature corresponding to the observed optical signal (e.g., the radius of an imaged RCP), a circularity of a feature corresponding to the observed optical signal (e.g., the circularity of an imaged RCP), or one or more Gaussian statistical parameters (e.g., mean, standard deviation, variance, etc.) characterizing a feature corresponding to the observed optical signal (e.g., an imaged RCP). In some instances, pixel intensity values in an image are normalized based on pixel intensity of background signals and pixel intensity of puncta detected within the image. For example, pixel values of an image are scaled using a background measurement (e.g., a mean or median of background intensities) as a floor and a predetermined intensity percentile (e.g., 99th intensity percentile) of the detected puncta as a ceiling.
[0113] In various embodiments, intensity values are normalized. In various embodiments, as a first step, the intensity values of puncta (e.g., observed optical signals) from every image are capped to a high percentile value (e.g., 99th percentile) and divided by that high percentile value to scale the intensity values between 0 and 1.0. In various
embodiments, as a second step, the values are scaled by the median raw intensity over all images to bring the values back into an intensity range similar to the original observed values. In various embodiments, as a third step, for every decoding neighborhood of puncta (e.g., a predetermined radius around a specific punctum), the intensity values of all puncta are divided by the intensity value of the central punctum of the neighborhood, so that systematically dimmer puncta may decode while variance in the brightness values is penalized. In various embodiments, the first and/or second steps may be omitted. In various embodiments, the third step reduces FOV-to-FOV decoding variability ("global decoding").
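The three normalization steps can be sketched in Python as follows. The sketch makes assumptions the text leaves open: the high percentile is computed per image with a nearest-rank definition, the median raw intensity is supplied by the caller, and the cap value and central intensity are nonzero. All function names are illustrative.

```python
import math
from typing import List, Sequence


def nearest_rank_percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (an assumed definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def cap_and_scale(values: Sequence[float], pct: float = 99.0) -> List[float]:
    """Step 1: cap at a high percentile and divide by it, giving values in [0, 1]."""
    cap = nearest_rank_percentile(values, pct)
    return [min(v, cap) / cap for v in values]


def rescale_to_raw_range(scaled: Sequence[float], raw_median: float) -> List[float]:
    """Step 2: multiply by the median raw intensity over all images."""
    return [v * raw_median for v in scaled]


def normalize_neighborhood(values: Sequence[float], center_index: int) -> List[float]:
    """Step 3: divide every punctum in a neighborhood by the central punctum."""
    center = values[center_index]
    return [v / center for v in values]


# Toy data: one bright outlier; the 80th percentile is used only to keep it small.
raw = [100.0, 200.0, 300.0, 400.0, 10000.0]
step1 = cap_and_scale(raw, pct=80.0)
step2 = rescale_to_raw_range(step1, raw_median=300.0)
step3 = normalize_neighborhood([150.0, 300.0, 225.0], center_index=1)
```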
[0114] In some instances, the process may further comprise comparing the at least one intensity value representing an intensity of each observed optical signal to a predetermined intensity threshold to determine a binary value representing the intensity of each observed optical signal. In some instances, each binary value comprises a 1 or a 0, wherein 1 represents an observed optical signal for which intensity is greater than or equal to the predetermined intensity threshold (e.g., an ON signal), and 0 represents an observed optical signal for which intensity is less than the predetermined intensity threshold (e.g., an OFF signal).
[0115] In some instances, decoding the plurality of observed optical signals in the plurality of images may comprise obtaining the plurality of observed code words based on a series of binary values determined for each location. In some instances, each observed code word of the plurality of code words may comprise a plurality of code word segments, and each code word segment may comprise a specified string of binary values that corresponds to one of a specified set of observed optical signal states. In some instances, each code word segment comprises, for example, a four bit string of binary values such that:
• a code word segment of 1 0 0 0 corresponds to a first optical signal state, A, in which an optical signal is detected in a first detection channel of a four-channel optical imaging instrument, and no optical signal is detected in a second, third, or fourth detection channel of the four-channel optical imaging instrument;
• a code word segment of 0 1 0 0 corresponds to a second optical signal state, B, in which an optical signal is detected in the second detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, third, or fourth detection channel of the four-channel optical imaging instrument;
• a code word segment of 0 0 1 0 corresponds to a third optical signal state, C, in which an optical signal is detected in the third detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, second, or fourth detection channel of the four-channel optical imaging instrument;
• a code word segment of 0 0 0 1 corresponds to a fourth optical signal state, D, in which an optical signal is detected in the fourth detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, second, or third detection channel of the four-channel optical imaging instrument; and
• a code word segment of 0 0 0 0 corresponds to a fifth optical signal state, E, in which no optical signal is detected in any of the first, second, third, or fourth detection channels of the four-channel optical imaging instrument.
[0116] In some instances, the valid code words in the codebook are stored in a database using the optical signal state format. For example, a valid codeword for a 15 cycle run may be AEEEDEEBEEEAECE. In some instances, the observed optical signal is converted into the optical signal state format before assigning the observed optical signal to a valid code word.
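The thresholding of intensities and the conversion between binary code word segments and the optical signal state format (A through E) can be sketched in Python as follows. The threshold value, the function names, and the use of "?" to flag a segment outside the five defined states are assumptions of this sketch.

```python
from typing import Sequence

SEGMENT_TO_STATE = {"1000": "A", "0100": "B", "0010": "C", "0001": "D", "0000": "E"}
STATE_TO_SEGMENT = {state: seg for seg, state in SEGMENT_TO_STATE.items()}


def threshold_segment(intensities: Sequence[float], threshold: float) -> str:
    """1 (ON) if intensity >= threshold, otherwise 0 (OFF), per channel."""
    return "".join("1" if v >= threshold else "0" for v in intensities)


def to_state_string(cycles: Sequence[Sequence[float]], threshold: float) -> str:
    """One state letter per cycle; '?' marks a segment that is not one of
    the five defined states (hypothetical marker)."""
    return "".join(SEGMENT_TO_STATE.get(threshold_segment(c, threshold), "?")
                   for c in cycles)


def to_bits(states: str) -> str:
    """Expand a state string such as 'AEC' into its binary code word."""
    return "".join(STATE_TO_SEGMENT[s] for s in states)


# Three cycles of four-channel intensities with an assumed threshold of 6000.
cycles = [[11000, 100, 0, 0], [0, 0, 0, 0], [50, 200, 9500, 0]]
states = to_state_string(cycles, threshold=6000)
bits = to_bits("AEEEDEEBEEEAECE")   # the 15-cycle example code word
```

The 15-cycle example code word expands to 15 × 4 = 60 bits, of which five are ON.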
[0117] In some instances, a probabilistic method of decoding is used to map a set of intensity values (e.g., binary intensity values, analog intensity values, binned intensity values, etc.) from each cycle of the plurality of cycles to an observed optical signal state. In some instances, a frequency table is generated using all possible combinations of intensity values for the color channels (e.g., four color channels) such that the frequency table maps each unique set of four intensity values to a most-likely optical signal state (e.g., A, B, C, D, or E). In some instances, the frequency table is generated from previous runs of an opto-fluidic instrument. In some instances, the frequency table is generated using a control sample. In some instances, the frequency table is updated during a run (e.g., after each cycle) or after each run is complete. Using the example from above, an observed set of binned intensity values {3, 1, 0, 0} may be most likely to map to state A based on the frequency table. In another example, an observed set of binned intensity values {3, 0, 0, 0} may be most likely to map to state A based on the frequency table. In yet another example, an observed set of binned intensity values {0, 1, 3, 1} may be most likely to map to state C based on the frequency table (e.g., the low intensities may be caused by autofluorescence or spectral crosstalk). In yet another example, an observed set of binned intensity values {0, 0, 2, 0} may be most likely to map to state C based on the frequency table. In yet another example, an observed set of binned intensity values {1, 1, 1, 1} may be most likely to map to state E based on the frequency table. In yet another example, an observed set of binned intensity values {0, 0, 0, 0} may be most likely to map to state E based on the frequency table. In yet another example, an observed set of binned intensity values {0, 0, 1, 1} may be most likely to map to state E based on the frequency table. In some instances, once a most-likely optical signal state (e.g., A, B, C, D, or E) has been determined using a probabilistic decoding method, the optical signal state can be converted into a binary format as described above with respect to the code word segments.
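A frequency table of this kind can be sketched in Python as a lookup built from labeled observations. The training observations below are invented for illustration, and the function name `build_frequency_table` is hypothetical; in practice the labeled observations would come from previous runs or a control sample, as described above.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Binned = Tuple[int, int, int, int]


def build_frequency_table(observations: Iterable[Tuple[Binned, str]]) -> Dict[Binned, str]:
    """Map each binned intensity set to the state most often seen with it."""
    counts: Dict[Binned, Counter] = defaultdict(Counter)
    for binned, state in observations:
        counts[binned][state] += 1
    return {binned: c.most_common(1)[0][0] for binned, c in counts.items()}


# Hypothetical labeled observations: (binned intensities, known state).
training = [
    ((3, 1, 0, 0), "A"), ((3, 1, 0, 0), "A"), ((3, 1, 0, 0), "B"),
    ((0, 1, 3, 1), "C"),
    ((1, 1, 1, 1), "E"), ((1, 1, 1, 1), "E"),
]
table = build_frequency_table(training)
state = table.get((3, 1, 0, 0))      # most-likely state for an observed set
unseen = table.get((2, 2, 2, 2))     # None: combination absent from the table
```

How combinations that never occurred in the training data are handled is left open here; the sketch simply returns `None` for them.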
[0118] As described in more detail below, those of skill in the art will recognize that the number of decoding cycles used in the multiplexed in situ assay for which the OR-robust codebook is designed, and the number of independent detectable states (e.g., one color, two color, three color, four color, or no color) corresponding to the signals detected on one or more optical detection channels of the instrument used to decode the multiplexed in situ assay will impact the number and design of the valid code words in the codebook.
[0119] At step 308 in FIG. 3, each observed code word is analyzed to determine if an assignment of the observed code word to one of the plurality of valid code words from the OR-robust codebook can be made, or alternatively, to determine that the observed code word is not a valid code word. Step 308 is repeated until all of the observed code words have been processed and either assigned to valid code words or classified as artifacts resulting from, e.g., non-specific hybridization of a labeled detection probe used for decoding or a sequencing error.
[0120] In some instances, determining the assignment of the observed code word to one of the plurality of valid code words may comprise identifying a valid code word of the plurality of valid code words that is identical to the observed code word.
[0121] In some instances, determining the assignment of the observed code word to one of the plurality of valid code words may comprise changing at least one of the binary values in the series of binary values corresponding to the observed code word to thereby assign the observed code word to a valid code word of the plurality of valid code words.
[0122] In some instances, as described in more detail elsewhere herein, determining the assignment of the observed code word to one of the plurality of valid code words may comprise determining a plurality of scores based on comparison of the observed code word to all or a portion of the plurality of valid code words. In some instances, determining the assignment of the observed code word to one of the plurality of valid code words may further comprise selecting one of the plurality of valid code words having a highest score to assign as a replacement for the observed code word.
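The assignment of an observed code word to a valid code word can be sketched in Python using Hamming distance as the comparison, which is one possible scoring choice and not necessarily the scoring used in the disclosure (a smaller distance corresponds to a higher score). The sketch rejects an observed code word when the closest valid code word is more than `max_errors` substitutions away or when two valid code words are equally close; both rules, and the function names, are assumptions.

```python
from typing import List, Optional


def hamming_distance(a: str, b: str) -> int:
    """Number of differing positions between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_observed(observed: str, valid: List[str], max_errors: int) -> Optional[str]:
    """Return the closest valid code word, or None if the observed code word
    is too far from every valid code word or the best match is ambiguous."""
    scored = sorted((hamming_distance(observed, cw), cw) for cw in valid)
    best_d, best_cw = scored[0]
    if best_d > max_errors:
        return None                       # treated as an invalid code word
    if len(scored) > 1 and scored[1][0] == best_d:
        return None                       # tie between two valid code words
    return best_cw


# Codebook with minimum pairwise distance 3, so k = 1 error can be corrected.
valid = ["000000", "000111", "111000", "111111"]
exact = assign_observed("000111", valid, max_errors=1)
corrected = assign_observed("000101", valid, max_errors=1)
rejected = assign_observed("010101", valid, max_errors=1)
```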
[0123] At step 310 in FIG. 3, the presence (and location) of a barcoded target analyte within the biological sample is identified for each valid code word detected in the plurality of images. In some instances, the identified target analyte may comprise a messenger RNA (mRNA) molecule or protein molecule.
[0124] In some instances, the process illustrated in FIG. 3 may comprise post-processing of a plurality of stored optical images to obtain the plurality of optical signals (and their respective locations) for subsequent use in determining observed code words and determining their assignment to valid code words in a codebook. In some instances, all or a portion of the process illustrated in FIG. 3 may be performed in the cloud, e.g., using a received plurality of images or received optical signal data (and corresponding location data) previously derived from the plurality of images. For example, in some instances, the process may comprise: receiving a codebook comprising a plurality of valid code words, where, for all or a first portion of valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1; receiving a plurality of locations for a plurality of observed optical signals, where the plurality of observed optical signals are obtained from a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles; decoding the plurality of observed optical signals to obtain a plurality of observed code words at the plurality of locations; and for each observed code word: (i) determining an assignment of the observed code word to one of the plurality of valid code words, or (ii) determining that the observed code word is not a valid code word of the plurality of valid code words.
[0125] FIGS. 4A-4B illustrate the structure of binary code words, in accordance with some implementations of the methods described herein. As noted above, the code words correspond to and represent the physical barcodes (e.g., oligonucleotide barcode sequences) attached to the target molecules in a multiplexed in situ assay, where the relationship between the structure of the code words and the structure of the physical barcodes depends on the read-out method (e.g., hybridization probe-based detection or nucleic acid sequencing) used in the in situ assay.
[0126] FIG. 4A depicts a non-limiting example of the structure of a binary code word for use with hybridization probe-based in situ detection of barcoded target analytes (described in more detail below). Each code word comprises a series of code word segments (e.g., 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more than 15 segments), where each code word segment comprises a series of bits (e.g., 2, 3, 4, or more than 4 bits), and where each bit in a given code word segment corresponds to the detection of an ON signal (“1”) or an OFF signal (“0”) in a given optical detection channel in a given decoding cycle. In some instances, the number of bits in each code word segment corresponds to the number of optical detection channels (e.g., different fluorescence emission detection channels or different color detection channels) in an imaging instrument used to perform a cyclical decoding process comprising, e.g., 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more than 15 decoding cycles. The total number of bits in the binary code word is then given by:
Total # bits = M × N
where M is the number of decoding (probing) cycles, and N is the number of bits determined in each decoding (probing) cycle (e.g., the number of bits per code word segment, where the code word segment comprises, e.g., 1 bit per detection channel). In the case of hybridization probe-based read-out, each code word segment corresponds to optical signals detected in images acquired in a given decoding cycle after contacting the biological sample (e.g., a tissue specimen) with a set of detectably-labeled hybridization probes designed to hybridize to a segment of the physical barcode (e.g., a segment of the oligonucleotide barcode sequence).
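As a rough illustration of the segment structure and the M × N bit count described above, the sketch below concatenates per-cycle segments into a flat code word. It assumes one bit per detection channel and the same number of channels in every cycle; the helper names `total_bits` and `assemble_code_word` and the example values are hypothetical.

```python
def total_bits(num_cycles: int, bits_per_cycle: int) -> int:
    """Total # bits = M x N for M cycles and N bits per cycle."""
    return num_cycles * bits_per_cycle


def assemble_code_word(segments):
    """Concatenate per-cycle segments into one flat binary code word.

    Each segment is a sequence of 0/1 values, one per optical detection
    channel, for a single decoding cycle.
    """
    if not segments:
        raise ValueError("at least one segment is required")
    n = len(segments[0])
    if any(len(seg) != n for seg in segments):
        raise ValueError("all segments must have the same number of bits")
    if any(bit not in (0, 1) for seg in segments for bit in seg):
        raise ValueError("bits must be 0 (OFF) or 1 (ON)")
    return tuple(bit for seg in segments for bit in seg)


# Illustrative values: M = 4 cycles, N = 2 channels per cycle.
segments = [(1, 0), (0, 1), (0, 0), (1, 0)]
code_word = assemble_code_word(segments)
```

Here `code_word` has `total_bits(4, 2)` = 8 entries, and the all-OFF segment in the third cycle shows how an absence of signal can itself carry information.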
[0127] FIG. 4B depicts a non-limiting example of the structure of a binary code word for use with sequencing-based in situ detection of barcoded target analytes (described in more detail below). Again, each code word comprises a series of code word segments (e.g., 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more than 15 segments), where each code word segment comprises a series of bits (e.g., 2, 3, 4, or more than 4 bits), and where each bit in a given code word segment corresponds to the detection of an ON signal (“1”) or an OFF signal (“0”) in a given optical detection channel in a given sequencing cycle. In some instances, the number of bits in each code word segment corresponds to the number of optical detection channels (e.g., different fluorescence emission detection channels or different color detection channels) in an imaging instrument used to perform a cyclical sequencing process comprising, e.g., 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more than 15 sequencing cycles. The total number of bits in the binary code word is given by:
Total # bits = M × N
where M is now the number of sequencing cycles, and N is the number of bits determined in each sequencing cycle (e.g., the number of bits per code word segment, where the code word segment comprises, e.g., 1 bit per detection channel). In the case of sequencing-based read-out, each code word segment corresponds to optical signals detected in images acquired in a given sequencing cycle to determine the identity of a single nucleotide in the physical barcode (e.g., the oligonucleotide barcode sequence).
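For the sequencing-based case, one possible mapping from a barcode sequence to a binary code word is sketched below. It assumes four-color detection with one dedicated channel per base and the channel order A, C, G, T; both are illustrative assumptions, since the disclosure also contemplates fewer channels and the use of OFF signals. The names `CHANNEL_ORDER` and `barcode_to_code_word` are hypothetical.

```python
# Hypothetical channel layout: one detection channel per base, in this order.
CHANNEL_ORDER = "ACGT"


def barcode_to_code_word(barcode: str):
    """Map an M-nucleotide barcode to an (M x 4)-bit code word in which
    each sequencing cycle contributes one 4-bit, one-hot segment."""
    bits = []
    for base in barcode.upper():
        if len(base) != 1 or base not in CHANNEL_ORDER:
            raise ValueError(f"unsupported base: {base!r}")
        # ON (1) in the channel assigned to this base, OFF (0) elsewhere.
        bits.extend(1 if base == channel else 0 for channel in CHANNEL_ORDER)
    return tuple(bits)


example = barcode_to_code_word("ACGT")
```

Under these assumptions a 4-nucleotide barcode yields M × N = 4 × 4 = 16 bits, with exactly one ON bit per segment.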
[0128] FIG. 5 provides a non-limiting schematic illustration of hybridization probe-based in situ detection of barcoded target analytes (or amplified representations, e.g., RCPs, thereof), where the barcodes comprise, e.g., oligonucleotide barcode sequences that have been assigned to a corresponding code word from an OR-robust codebook. In this scenario, the physical barcode sequences each comprise a series of short barcode (BC) segments (e.g., BC segment 1, BC segment 2, ..., BC segment M) with one barcode segment for each cycle in a cyclical decoding (probing) process (comprising M cycles in total) that is used to decode a set of optical signals associated with each barcode as detected in a plurality of images acquired of a biological sample during the cyclical decoding (probing) process.
[0129] In each decoding (probing) cycle, a set of detectably-labeled hybridization probes (e.g., fluorescently-labeled hybridization probes) that are designed to hybridize to specific barcode segments are introduced into a biological sample (e.g., a tissue specimen that has been fixed, embedded, and/or cleared as described elsewhere herein) and allowed to hybridize to a corresponding barcode segment. The number of unique hybridization probes in the set is typically the same as the number of unique barcode segments to be probed in a given decoding (probing) cycle.
[0130] In some instances, all of the unique hybridization probes in the set may be labeled with a detectable label, e.g., a fluorescent label, where different hybridization probes in the set are labeled with a different fluorophore. In some instances, only a subset of the unique hybridization probes in the set may be labeled with a detectable label, e.g., a fluorophore, where different hybridization probes in the subset are labeled with a different fluorophore.
[0131] In some instances, the number of different detectable labels, e.g., fluorophores, used in each set of detectably-labeled hybridization probes may be the same for sets used in different cycles of the hybridization probe-based decoding process. In some instances, the number of different detectable labels, e.g., fluorophores, used in each set of detectably-labeled hybridization probes may be different for sets used in different cycles of the hybridization probe-based decoding process. In general, the number of different detectable labels, e.g., fluorophores, used in each set of detectably-labeled hybridization probes will depend on factors such as the number of different optical detection channels (e.g., one color, two color, three color, or four color detection) in the instrument used to perform decoding, and the design of the code words used in the multiplexed in situ assay (e.g., in some cases, an absence of signal in a given decoding cycle (i.e., an OFF signal) may be used as part of code word design).
[0132] In some instances, the total number of unique barcode segments to be probed in a given decoding (probing) cycle (and the total number of unique hybridization probes in the corresponding set of hybridization probes) may be, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 unique barcode segments (or unique hybridization probes).
[0133] The biological sample is then imaged, and the image(s) (e.g., fluorescence image(s)) are processed using any of a variety of image processing techniques known to those of skill in the art to measure signal intensities at the locations of a plurality of barcoded target molecules (or amplified representations, e.g., RCPs, thereof). In some instances, the plurality of barcoded target molecules (or amplified representations, e.g., RCPs, thereof) may comprise, e.g., at least 10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 5,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, or 100,000 barcoded target molecules (or amplified representations, e.g., RCPs, thereof).
[0134] One or more images (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, or more than 1000) comprising different fields-of-view of the biological sample may be acquired in each cycle as necessary to image the entire cross-sectional area of the biological sample.
[0135] In some instances, a z-stack of images (i.e., a series of images acquired at each of two or more focal planes within the thickness of the biological sample) may be acquired for each of one or more fields-of-view of the sample. For example, each z-stack of images may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 25, 30, 35, 40, 45, 50, or more than 50 images acquired at different focal planes within the thickness of the biological sample.
[0136] Following completion of the imaging step, the hybridized probes are stripped from the biological sample and the process is repeated for a specified number of cycles, M.
[0137] FIG. 6 provides a non-limiting schematic illustration of sequencing-based in situ detection of barcoded target analytes (or amplified representations, e.g., RCPs, thereof), where the barcodes comprise, e.g., oligonucleotide barcode sequences that have been assigned to a corresponding code word from an OR-robust codebook. In this scenario, the physical barcode sequences each comprise an oligonucleotide sequence of M nucleotides in length, where one nucleotide is to be identified in each cycle in a cyclical decoding (base-by-base nucleic acid sequencing) process (comprising M cycles in total) that is used to decode a set of optical signals associated with each barcode as detected in a plurality of images acquired of a biological sample during the cyclical decoding (sequencing) process. Any of a variety of base-by-base sequencing techniques known to those of skill in the art (as described elsewhere herein) may be used to determine barcode sequences in multiplexed in situ assays that utilize the disclosed codebook design methods.
[0138] In the case of sequencing-by-synthesis (SBS), for example, in each decoding (sequencing) cycle, a set of detectably-labeled nucleotides (e.g., fluorescently-labeled, 3’ reversibly terminated nucleotides) are introduced into a biological sample (e.g., a tissue specimen that has been fixed, embedded, and/or cleared as described elsewhere herein) along with a polymerase, and contacted with a primed barcode sequence under conditions that allow a 3’ reversibly terminated nucleotide that is complementary to a corresponding nucleotide in the barcode sequence to be incorporated into the priming strand. The number of unique 3’ reversibly terminated nucleotides in the set is typically the same as the number of unique nucleotide residues (typically four) that are potentially present at a given position in the barcode sequence to be probed in a given decoding (sequencing) cycle.
[0139] In some instances, all of the unique 3’ reversibly terminated nucleotides in the set may be labeled with a detectable label, e.g., a fluorescent label, where different nucleotides in the set are labeled with a different fluorophore. In some instances, only a subset of the unique 3’ reversibly terminated nucleotides in the set may be labeled with a detectable label, e.g., a fluorophore, where different nucleotides in the subset are labeled with a different fluorophore.
[0140] In some instances, the number of different detectable labels, e.g., fluorophores, used in each set of detectably-labeled 3’ reversibly terminated nucleotides may be the same for sets used in different cycles of the sequencing-based decoding process. In some instances, the number of different detectable labels, e.g., fluorophores, used in each set of detectably-labeled 3’ reversibly terminated nucleotides may be different for sets used in different cycles of the sequencing-based decoding process. In general, the number of different detectable labels, e.g., fluorophores, used in each set of detectably-labeled 3’ reversibly terminated nucleotides will depend on factors such as the number of different optical detection channels (e.g., one color, two color, three color, or four color detection) in the instrument used to perform decoding, and the design of the code words used in the multiplexed in situ assay (e.g., in some cases, an absence of signal in a given decoding cycle (i.e., an OFF signal) may be used as part of code word design).
[0141] In some instances, the total number of unique nucleotide residues to be probed in a given decoding (sequencing) cycle (and the total number of unique 3’ reversibly terminated nucleotides in the corresponding set of nucleotides) may be, e.g., 2, 3, or 4 (or more than 4 if non-natural nucleotides that obey similar base-pairing rules are included).
[0142] The biological sample is then imaged, and the image(s) (e.g., fluorescence image(s)) are processed using any of a variety of image processing techniques known to those of skill in the art to measure signal intensities at the locations of a plurality of barcoded target molecules (or amplified representations, e.g., RCPs, thereof). In some instances, the plurality of barcoded target molecules (or amplified representations, e.g., RCPs, thereof) may comprise, e.g., at least 10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 5,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, or 100,000 barcoded target molecules (or amplified representations, e.g., RCPs, thereof).
[0143] One or more images (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 images) comprising different fields-of-view of the biological sample may be acquired in each cycle as necessary to image the entire cross-sectional area of the biological sample.
[0144] In some instances, a z-stack of images (i.e., a series of images acquired at each of two or more focal planes within the thickness of the biological sample) may be acquired for each of one or more fields-of-view of the sample. For example, each z-stack of images may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 images acquired at different focal planes within the thickness of the biological sample.
[0145] Following completion of the imaging step, the 3’ reversibly terminated nucleotide that has been incorporated into the priming strand is deprotected and the process is repeated for a specified number of cycles, M.
[0146] The processing of images acquired during the in situ decoding schemes illustrated in the flowcharts of FIG. 5 and FIG. 6 is performed in similar fashion. In some instances, the images may be processed in real-time immediately following acquisition. In some instances, the images may be post-processed, i.e., they may be stored in computer memory and processed at a later time.
[0147] Processing of the image(s) acquired in each decoding (probing or sequencing) cycle results in the generation of a fluorescence data set for each decoding (probing or sequencing) cycle (e.g., fluorescence data set 1, fluorescence data set 2, ..., fluorescence data set M) that each comprise measured fluorescence signal intensities (in the case that the detectable labels comprise fluorophores) for each of the plurality of locations at which target molecules (or amplified representations, e.g., RCPs, thereof) are detected. In some instances, e.g., where one or more images having different fields-of-view were acquired at a single focal plane, the fluorescence data sets comprise measured fluorescence signal intensities for each of a plurality of target molecule locations (or the locations of amplified representations, e.g., RCPs, thereof) in two dimensions. In some instances, e.g., where a z-stack of images is acquired for each of the one or more different fields-of-view, the fluorescence data sets comprise measured fluorescence signal intensities for each of a plurality of target molecule locations (or the locations of amplified representations, e.g., RCPs, thereof) in three dimensions.
[0148] The compiled set of fluorescence data sets (e.g., fluorescence data set 1, fluorescence data set 2, ..., fluorescence data set M) may then be processed to identify a series of fluorescence signals at each of a plurality of target molecule locations (or the locations of amplified representations, e.g., RCPs, thereof) detected in the images acquired over the course of performing the M decoding (probing or sequencing) cycles. In some cases, the fluorescence signals may comprise analog signals (i.e., continuous, real-valued fluorescence intensity signals, such as those obtained when using photomultipliers or photomultiplier arrays). In some cases, the fluorescence signals may comprise digital signals (i.e., digitized renditions of continuous, real-valued fluorescence intensity signals, such as those obtained when using CMOS or CCD image sensors).
[0149] In some cases, the fluorescence signals may be processed, e.g., to perform one or more of background subtraction, normalization, fitting to a Gaussian or other line shape function, determination of a centroid position, etc. Any of a variety of image processing methods known to those of skill in the art may be used for image processing / pre-processing. Examples include, but are not limited to, Canny edge detection methods, Canny-Deriche edge detection methods, first-order gradient edge detection methods (e.g., the Sobel operator), second order differential edge detection methods, phase congruency (phase coherence) edge detection methods, other image segmentation algorithms (e.g., intensity thresholding, intensity clustering methods, intensity histogram-based methods, etc.), feature and pattern recognition algorithms (e.g., the generalized Hough transform for detecting arbitrary shapes, the circular Hough transform, etc.), and mathematical analysis algorithms (e.g., Fourier transform, fast Fourier transform, wavelet analysis, auto-correlation, etc.), or any combination thereof.
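Two of the operations named above, background subtraction and determination of a centroid position, can be sketched for a single small image patch as follows. This is a simplified illustration that assumes a constant, known background level and a patch given as nested lists of pixel intensities; the function name `weighted_centroid` is hypothetical and the approach is only one of many that could be used.

```python
def weighted_centroid(patch, background: float = 0.0):
    """Intensity-weighted centroid (row, column) of a 2-D patch after
    subtracting a constant background and clipping negatives to zero.

    Returns None if no pixel exceeds the background.
    """
    total = 0.0
    sum_row = 0.0
    sum_col = 0.0
    for r, row in enumerate(patch):
        for c, value in enumerate(row):
            weight = max(value - background, 0.0)
            total += weight
            sum_row += weight * r
            sum_col += weight * c
    if total == 0.0:
        return None
    return (sum_row / total, sum_col / total)


# Illustrative 3 x 3 patch with a single bright pixel on a flat background.
patch = [
    [10, 10, 10],
    [10, 50, 10],
    [10, 10, 10],
]
center = weighted_centroid(patch, background=10)
```

For this patch only the central pixel remains after background subtraction, so the centroid falls at row 1, column 1.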
[0150] In some cases, the fluorescence signals may be processed and/or compared to a predetermined fluorescence intensity threshold to generate corresponding binary signal values (e.g., ON signals (“1”) or OFF signals (“0”)) that indicate whether or not a fluorescence signal of intensity greater than or equal to the predetermined fluorescence intensity threshold was detected in a given optical detection channel (e.g., a given fluorescence emission detection channel or a given color detection channel) for a given decoding (probing or sequencing) cycle. The series of binary signal values determined for each target molecule location (or the location of the amplified representation, e.g., RCP, thereof) in the series of M decoding (probing or sequencing) cycles may then be used, in combination with prior knowledge of the optical detection channels for which signals were detected in each decoding (probing or sequencing) cycle, to identify a plurality of observed code words corresponding to the plurality of barcoded target molecules.
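The thresholding step can be sketched as follows for a single location. The example assumes intensities are arranged as one row per decoding cycle and one column per detection channel, and that a single threshold applies to every channel; the function name `binarize` and all numeric values are illustrative assumptions.

```python
def binarize(intensities, threshold: float):
    """Convert per-cycle, per-channel intensities at one location into a
    flat observed code word.

    A bit is ON (1) when the intensity is greater than or equal to the
    threshold, and OFF (0) otherwise.
    """
    return tuple(
        1 if value >= threshold else 0
        for cycle in intensities
        for value in cycle
    )


# Illustrative intensities: 3 cycles x 2 channels (arbitrary units).
intensities = [
    [820.0, 35.0],
    [12.0, 640.0],
    [100.0, 99.9],
]
observed = binarize(intensities, threshold=100.0)
```

In practice a per-channel or per-cycle threshold may be more appropriate; a single global value is used here only to keep the sketch short.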
[0151] In some cases, an observed code word may be identical to one of the valid code words from the OR-robust codebook and the identity of the corresponding target molecule can be determined directly from the OR-robust codebook assignments.
[0152] In some cases, an observed code word may correspond closely to one of the valid code words from the OR-robust codebook, but may not be an identical series of binary values. In such cases, the properties of the OR-robust codebook may be used to detect and/or correct errors arising from, e.g., non-specific hybridization of detectably-labeled probes, or sequencing errors, and thereby assign the observed code word to a valid code word from the OR-robust codebook.
[0153] In some instances, for example, an observed code word may be assigned to a valid code word if changing one or more of the binary values (e.g., bits) in the series of binary values corresponding to the observed code word results in the observed code word being identical to a valid code word of the plurality of valid code words in the OR-robust codebook.
[0154] In some instances, an observed code word may be assigned to (e.g., replaced by) a valid code word based on determining a plurality of scores (e.g., pairwise edit distances, Hamming distances, and/or Hamming distances between logical bitwise OR code word combinations) based on comparison of the observed code word to all or a portion of the plurality of valid code words in the OR-robust codebook. In some instances, the observed code word may be assigned to (e.g., replaced by) a valid code word that exhibits the highest score (e.g., the minimum edit distance, Hamming distance, and/or Hamming distance between logical bitwise OR code word combinations). In some instances, each score in the plurality of scores is a probability (e.g., 0 to 1). In some instances, the highest score is the highest probability. In some instances, each score in the plurality of scores is a log-likelihood. In some instances, the highest score is the highest log-likelihood.
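A distance-based version of this scoring can be sketched as follows. The example assumes the score is the negative Hamming distance (so that the highest score corresponds to the minimum distance), rejects ties, and rejects matches beyond a caller-supplied distance limit; these choices, the function names, and the gene labels are illustrative assumptions rather than requirements of the disclosure.

```python
def hamming(a, b) -> int:
    """Number of positions at which two equal-length bit tuples differ."""
    if len(a) != len(b):
        raise ValueError("code words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def score_code_words(observed, codebook):
    """Score every valid code word; higher is better (negative distance)."""
    return {name: -hamming(observed, word) for name, word in codebook.items()}


def assign(observed, codebook, max_distance: int = 1):
    """Return the name of the uniquely best-scoring valid code word, or
    None if the best match is too far away or is tied."""
    scores = score_code_words(observed, codebook)
    best = max(scores.values())
    winners = [name for name, s in scores.items() if s == best]
    if -best > max_distance or len(winners) != 1:
        return None
    return winners[0]


# Hypothetical two-entry codebook keyed by target name.
codebook = {
    "GENE_A": (1, 0, 0, 1, 0, 1),
    "GENE_B": (0, 1, 1, 0, 1, 0),
}
corrected = assign((1, 0, 0, 1, 0, 0), codebook)   # one bit from GENE_A
rejected = assign((1, 1, 0, 0, 0, 0), codebook)    # three bits from both
```

Here `corrected` resolves to "GENE_A", while `rejected` is None because the observed word is equally far from both entries and beyond the distance limit.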
[0155] In some instances, as described in U.S. Patent Application Publication No. 2022-0084628, which is incorporated by reference herein in its entirety, one or more observed code words may be assigned to (e.g., replaced by) valid code words based on replacement with a corresponding valid code word in the OR-robust codebook that has a maximum likelihood as computed from the log likelihood (or negative log likelihood) of a probability distribution generated by a probabilistic model that provides probabilities for detecting a given code word, or code word segment, at a given location in a given decoding (probing or sequencing) cycle based on a set of detected optical signals (e.g., fluorescence signals) associated with a set of hybridization probes or nucleotides used to detect the barcode sequences.
[0156] In some instances, one or more observed code words may be assigned to (e.g., replaced by) valid code words based on replacement with a corresponding valid code word in the OR-robust codebook that: (i) is within a predetermined pairwise edit distance (e.g., a predetermined Hamming distance and/or a predetermined Hamming distance between logical bitwise OR combinations of valid code words) from the observed code word, and (ii) has a maximum likelihood as computed from the log likelihood (or negative log likelihood) for a probability distribution generated by a probabilistic model that provides probabilities for detecting a given code word, or code word segment, at a given location in a given decoding (probing or sequencing) cycle based on a set of detected optical signals associated with a set of hybridization probes or nucleotides used to detect the barcode sequences.
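The two-part criterion, a distance filter followed by a maximum-likelihood choice, can be sketched with a deliberately simple stand-in model. The example assumes independent bit errors with a false-ON rate `p01` and a false-OFF rate `p10`; this is not the probabilistic model of the incorporated reference, and the rates, names, and code words are hypothetical.

```python
import math


def hamming(a, b) -> int:
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def log_likelihood(observed, valid, p01: float = 0.02, p10: float = 0.10) -> float:
    """Log-likelihood of `observed` given `valid` under independent bit errors.

    p01: probability that a true 0 is read as 1 (spurious ON signal).
    p10: probability that a true 1 is read as 0 (missed ON signal).
    """
    ll = 0.0
    for o, v in zip(observed, valid):
        if v == 1:
            ll += math.log(p10 if o == 0 else 1.0 - p10)
        else:
            ll += math.log(p01 if o == 1 else 1.0 - p01)
    return ll


def decode(observed, codebook, max_distance: int = 2, **rates):
    """Among valid code words within `max_distance` of `observed`, return
    the name with the highest log-likelihood, or None if none qualify."""
    candidates = {
        name: word
        for name, word in codebook.items()
        if hamming(observed, word) <= max_distance
    }
    if not candidates:
        return None
    return max(candidates, key=lambda n: log_likelihood(observed, candidates[n], **rates))


# Hypothetical codebook: both entries are one bit away from `observed`.
codebook = {
    "GENE_A": (1, 1, 0, 0, 1, 0),   # differs by one missed ON bit
    "GENE_B": (1, 0, 0, 0, 0, 0),   # differs by one spurious ON bit
}
observed = (1, 0, 0, 0, 1, 0)
choice = decode(observed, codebook)
```

Both candidates pass the distance filter, but with the assumed rates a missed ON bit is more probable than a spurious one, so the likelihood term breaks the tie in favor of "GENE_A".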
[0157] In some instances, one or more observed code words may be assigned to (e.g., replaced by) valid code words based on an iterative process comprising correcting the one or more observed code words by replacement with one of the valid code words that: (i) is within a predetermined pairwise edit distance (e.g., a predetermined Hamming distance and/or a predetermined Hamming distance between logical bitwise OR combinations of valid code words) from the observed code word (determined, for example, by rank-ordering the set of valid code words according to their pairwise edit distance from the observed code word), and (ii) has a maximum likelihood as computed from a log likelihood (or negative log likelihood) for a probability distribution generated by a probabilistic model that provides probabilities for detecting a given code word, or code word segment thereof, at a given location in a given decoding (probing or sequencing) cycle based on a set of detected optical signals, and updating the probabilistic model using the corrected code words, where the process is repeated until a fully corrected set of validated code words is obtained. In some instances, following convergence, each previously corrected code word is replaced with one of the valid code words that: (iii) is within a predetermined pairwise edit distance (e.g., a predetermined Hamming distance and/or a predetermined Hamming distance between logical bitwise OR combinations of valid code words) of the previously corrected code word, and (iv) has a maximum likelihood as computed from the log likelihood (or negative log likelihood) for a probability distribution generated by the updated probabilistic model.
[0158] In various embodiments, rates of various error modes, such as optical cross-talk or stripping errors, are estimated by comparing the observed code word to the best-matching valid code word, and a probabilistic decoding model can be updated based on the estimated error rates. In various embodiments, parameters of a maximum likelihood model are updated according to the empirical rates of those errors.
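One way to picture the error-rate estimate is to count bit-level disagreements between each observed code word and its best-matching valid code word. The sketch below reduces the error modes to two generic rates, spurious ON bits and missed ON bits, which is a simplifying assumption and not a model of cross-talk or stripping specifically; the function name `estimate_error_rates` and the data are hypothetical.

```python
def estimate_error_rates(pairs):
    """Estimate empirical bit-error rates from (observed, valid) pairs.

    Returns (p01, p10), where p01 is the fraction of valid 0 bits read
    as 1 and p10 is the fraction of valid 1 bits read as 0.
    """
    zeros = ones = false_on = false_off = 0
    for observed, valid in pairs:
        for o, v in zip(observed, valid):
            if v == 1:
                ones += 1
                false_off += int(o == 0)
            else:
                zeros += 1
                false_on += int(o == 1)
    p01 = false_on / zeros if zeros else 0.0
    p10 = false_off / ones if ones else 0.0
    return p01, p10


# Illustrative pairs: (observed code word, best-matching valid code word).
pairs = [
    ((1, 0, 0, 1), (1, 0, 0, 1)),   # exact match
    ((1, 1, 0, 0), (1, 0, 0, 1)),   # one spurious ON, one missed ON
]
p01, p10 = estimate_error_rates(pairs)
```

The resulting rates could then be fed back as parameters of a likelihood model and the assignment repeated, in the spirit of the iterative update described above.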
[0159] In some instances, an observed code word may be assigned to (e.g., replaced by) a valid code word based on an iterative process comprising correcting one or more observed code words by replacement with one of the valid code words that: (i) is within a predetermined pairwise edit distance (e.g., a predetermined Hamming distance and/or a predetermined pairwise Hamming distance between logical bitwise OR combinations of valid code words) from the observed code word (determined, for example, by rank-ordering the set of valid code words according to their pairwise edit distance from the observed code word), and (ii) has a maximum likelihood as computed from a truncated log likelihood (or negative truncated log likelihood) for a probability distribution generated by a probabilistic model that provides probabilities for detecting a given code word, or code word segment thereof, at a given location in a given decoding (probing or sequencing) cycle based on a set of detected optical signals, and updating the probabilistic model using the corrected code words, where the process is repeated until a fully corrected set of validated code words is obtained. In some instances, following convergence, each previously corrected code word is replaced with one of the valid code words that: (iii) is within a predetermined pairwise edit distance (e.g., a predetermined Hamming distance and/or a predetermined Hamming distance between logical bitwise OR combinations of valid code words) of the previously corrected code word, and (iv) has a maximum likelihood as computed from the truncated log likelihood (or negative truncated log likelihood) for a probability distribution generated by the updated probabilistic model.
III. Detection and Analysis
(i) Hybridization probe-based detection and decoding
[0160] In some aspects, the provided methods involve analyzing, e.g., detecting or determining, one or more sequences present in the probes or probe sets or products thereof (e.g., rolling circle amplification products thereof). In some embodiments, the detecting is performed at one or more locations in the biological sample. In some embodiments, the locations are the locations of RNA transcripts in the biological sample. In some embodiments, the locations are the locations at which the probes or probe sets hybridize to the RNA transcripts in the biological sample, and are optionally ligated and amplified by rolling circle amplification.
[0161] In some embodiments, detecting the one or more sequences present in the probes or probe sets in the biological sample is performed, and the detected sequences are compared to an expected set of detected sequences. In some embodiments, the expected set of sequences is based on the barcode sequences of the panels of probes or probe sets in the probe mixture and the known expression levels of the RNA transcripts of the first, second, and/or third sets of genes in the first and second cell populations. In some embodiments, the one or more sequences are one or more barcode sequences or complements thereof. In some embodiments, the expected set of detected sequences includes sequences expected to be detected at a high expression level (e.g., more than 20 counts of the detected sequence per cell) in one or both of the first and second cell populations. In some embodiments, the expected set of detected sequences includes sequences expected to be detected at a medium expression level (e.g., 5-20 counts of the detected sequence per cell) in one or both of the first and second cell populations. In some embodiments, the expected set of detected sequences includes sequences expected to be detected at a low expression level (e.g., 1-5 counts of the detected sequence per cell) in one or both of the first and second cell populations.
[0162] In some embodiments, the detecting comprises a plurality of repeated cycles of hybridization and removal of probes (e.g., detectably labeled probes, or intermediate probes that bind to detectably labeled probes) to the primary probe or probe set hybridized to the target nucleic acid, or to a rolling circle amplification product generated from the probe or probe set hybridized to the target nucleic acid.
[0163] Methods for binding and identifying a target nucleic acid that use various probes or oligonucleotides have been described in, e.g., US2003/0013091, US2007/0166708, US2010/0015607, US2010/0261026, US2010/0262374, US2010/0112710, US2010/0047924, and US2014/0371088, each of which is incorporated herein by reference in its entirety. Detectably-labeled probes can be useful for detecting multiple target nucleic acids and can be detected in one or more hybridization cycles (e.g., sequential hybridization assays, or sequencing by hybridization).
[0164] In some embodiments, the detecting can comprise binding an intermediate probe directly or indirectly to the primary probe or probe set, binding a detectably labeled probe directly or indirectly to a detection region of the intermediate probe, and detecting a signal associated with the detectably labeled probe. In some embodiments, the method comprises detecting a rolling circle amplification product (RCP) generated using a circular or circularized primary probe or probe set as a template. In some embodiments, the method comprises detecting a rolling circle amplification product (RCP) generated using a circular or circularized probe or probe set that binds to a primary probe or probe set as a template. In some embodiments, detecting the RCP comprises binding an intermediate probe directly or indirectly to the RCP, binding a detectably labeled probe directly or indirectly to a detection region of the intermediate probe, and detecting a signal associated with the detectably labeled probe. In some embodiments, the method can comprise performing one or more wash steps to remove unbound and/or nonspecifically bound intermediate probe molecules from the primary probes or the products of the primary probes.
[0165] In some embodiments, the detecting can comprise: detecting signals associated with detectably labeled probes that are hybridized to barcode regions or complements thereof in the primary probe or probe set or a product thereof (e.g., an RCP); and/or detecting signals associated with detectably labeled probes that are hybridized to intermediate probes which are in turn hybridized to the barcode regions or complements thereof. In some embodiments, the detectably labeled probes can be fluorescently labeled.
[0166] In some embodiments, the methods comprise detecting the sequence in all or a portion of a primary probe or probe set or an RCP, or detecting a sequence of the primary probe or probe set or RCP, such as one or more barcode sequences present in the primary probe or probe set or RCP. In some embodiments, the sequence of the RCP, or barcode thereof, is indicative of a sequence of the target nucleic acid to which the RCP is hybridized. In some embodiments, the analysis and/or sequence determination comprises detecting a sequence in all or a portion of the nucleic acid concatemer and/or in situ hybridization to the RCP. In some embodiments, the detection step is by sequential fluorescent in situ hybridization (e.g., for combinatorial decoding of the barcode sequence or complement thereof).
[0167] In some embodiments, the detection or determination comprises hybridizing to a probe directly or indirectly a detection oligonucleotide labeled with a fluorophore, an isotope, a mass tag, or a combination thereof. In some embodiments, the detection or determination comprises imaging the probe hybridized to the target nucleic acid (e.g., imaging one or more detectably labeled probes hybridized thereto). In some embodiments, the target nucleic acid is an mRNA in a tissue sample, and the detection or determination is performed when the target nucleic acid and/or the amplification product is in situ in the tissue sample. In some embodiments, the target nucleic acid is an amplification product (e.g., a rolling circle amplification product).
(ii) Sequencing-based detection and decoding
[0168] Any of a variety of base-by-base sequencing techniques known to those of skill in the art may be used to determine barcode sequences (or more generally, template sequences) in multiplexed in situ assays that utilize the disclosed codebook design methods, including, but not limited to, “sequencing-by-synthesis” (SBS) (see, e.g., U.S. Pat. No. 7,883,869 which is incorporated herein by reference in its entirety), “sequencing-by-ligation” (SBL) (see, e.g., U.S. Pat. No. 5,552,278 which is incorporated herein by reference in its entirety), “sequencing-by-binding” (SBB) (see, e.g., U.S. Pat. Nos. 9,951,385 and 10,655,176 which are incorporated herein by reference in their entireties), and “sequencing-by-avidity” (SBA) (see, e.g., U.S. Pat. Nos. 10,768,173 and 10,982,280 which are incorporated herein by reference in their entireties).
[0169] In some embodiments, sequencing can be performed by sequencing-by-synthesis (SBS). In some embodiments, a sequencing primer is complementary to primer binding sequences located at or near the one or more barcode sequence(s). In such embodiments, sequencing-by-synthesis can comprise reverse transcription and/or amplification in order to generate a template sequence to which a primer sequence can bind. Exemplary SBS methods include, but are not limited to, those described in US 2007/0166705, US 2006/0188901, US 7,057,026, US 2006/0240439, US 2006/0281109, US 2011/0059865, US 2005/0100900, US 9,217,178, US 2009/0118128, US 2012/0270305, US 2013/0260372, and US 2013/0079232, all of which are herein incorporated by reference in their entireties.
[0170] Accurate decoding of single-stranded template (barcode) sequences relies on successfully classifying signals that arise from the stepwise addition of A, G, C, and T nucleotides by a polymerase to a complementary primer extension strand. In conventional sequencing approaches, these methods typically include modifying the template sequences with a known adapter sequence used to tether the template sequences to a solid support (e.g., the interior surface(s) of a flow cell) in a random or patterned array by hybridization to a complementary adapter sequence attached to the support surface, where the adapter sequences typically also include primer binding sites used for clonal amplification and/or sequencing. For in situ sequencing, the template sequences may be designed to include both the barcode sequences and amplification and/or sequencing primer binding sites, where the template sequences may be attached to target analytes (for nucleic acid analytes) using, e.g., a padlock or other circularizable probe, and amplified using, e.g., rolling circle amplification.
[0171] The amplified template sequences (comprising barcode sequences) are then probed through a cyclic series of single-base addition primer extension reactions that use detectably-labeled, e.g., fluorescently-labeled, nucleotides to identify the sequence of bases in the template sequences, where the fluorescently-labeled nucleotides are typically blocked at the 3’-OH group with a reversible terminator moiety. The cyclical sequencing process thus comprises repeating the steps of (i) contacting a primed template sequence (i.e., a template sequence comprising a bound primer strand having a free 3’-OH group) with a mixture of fluorescently-labeled, 3’-OH reversibly-terminated nucleotides and a polymerase to enable incorporation of a nucleotide that is complementary to a nucleotide in the template sequence into an extended primer strand, (ii) washing away any unbound nucleotides and polymerase molecules, (iii) imaging the sample (e.g., the surface of a flow cell to which the amplified template sequences are attached, or a tissue sample within which the amplified template sequences are distributed), and (iv) deprotecting the 3’ end of the extended primer strand to remove the reversible terminator moiety and cleaving off the fluorophore, thereby enabling initiation of the next cycle.
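As a non-limiting illustration of how the images acquired in step (iii) of each cycle might be converted into a base sequence, the following sketch calls one base per cycle from four channel intensities measured at a single location. It assumes, hypothetically, one fluorescence channel per nucleotide in the order A, C, G, T and a simple intensity-ratio threshold (`min_ratio`); neither the channel assignment nor the threshold is specified by the disclosure.

```python
from typing import Sequence, Tuple

# Hypothetical channel order: one fluorescence channel per labeled nucleotide.
CHANNELS = ("A", "C", "G", "T")
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def call_base(intensities: Sequence[float], min_ratio: float = 1.5) -> str:
    """Call the nucleotide incorporated in one cycle, or 'N' if ambiguous.

    The brightest channel wins only if it is positive and at least
    `min_ratio` times brighter than the second-brightest channel.
    """
    ranked = sorted(zip(intensities, CHANNELS), reverse=True)
    (top, base), (second, _) = ranked[0], ranked[1]
    if top <= 0 or top < min_ratio * second:
        return "N"
    return base


def read_cycles(cycles: Sequence[Sequence[float]]) -> Tuple[str, str]:
    """Return (incorporated bases, complementary template bases) in cycle order."""
    incorporated = "".join(call_base(c) for c in cycles)
    template = "".join(COMPLEMENT[b] for b in incorporated)
    return incorporated, template


# Three simulated cycles at one location; the third has no dominant channel.
cycles = [
    (0.90, 0.10, 0.05, 0.10),
    (0.10, 0.10, 0.80, 0.10),
    (0.30, 0.30, 0.30, 0.30),
]
print(read_cycles(cycles))  # ('AGN', 'TCN')
```

Because each incorporated nucleotide is complementary to the template position being interrogated, the template bases are reported as the complement of the called bases, in the order in which the positions were read.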
[0172] In some instances, the mixture of nucleotides (e.g., fluorescently-labeled, 3’-OH reversibly-terminated nucleotides) used in each cycle may comprise any combination of A, T/U, G, and C. In some instances, the mixture of nucleotides may comprise one, two, three, or four of A, T/U, G, and C. In some instances, the mixture of nucleotides may comprise one or more non-natural nucleotides or nucleotide analogs.
[0173] In some instances, the mixture of nucleotides (e.g., fluorescently-labeled, 3’-OH reversibly-terminated nucleotides) used in each cycle may be the same. In some instances, the mixture of nucleotides (e.g., fluorescently-labeled, 3’-OH reversibly-terminated nucleotides) used in one or more cycles may be different from that used in one or more different cycles.
[0174] In some instances, all of the nucleotides (e.g., detectably-labeled, 3’-OH reversibly-terminated nucleotides) in the mixture of nucleotides may be labeled with a detectable label (e.g., a fluorophore), where different nucleotides in the mixture are labeled with different detectable labels. In some instances, only a subset of the nucleotides (e.g., detectably-labeled, 3’-OH reversibly-terminated nucleotides) in the mixture of nucleotides may be labeled with a detectable label (e.g., a fluorophore), where different nucleotides in the subset are labeled with different detectable labels. In some instances, the subset of nucleotides (e.g., detectably-labeled, 3’-OH reversibly-terminated nucleotides) may comprise, e.g., one, two, or three of A, T/U, G, and C.
[0175] The “sequencing-by-ligation” (SBL) approach uses a DNA ligase to identify the nucleotide present at a given position in a template sequence. Unlike sequencing-by-synthesis approaches, this method does not use a DNA polymerase to perform primer extension. Instead, the mismatch sensitivity of a DNA ligase enzyme is used to determine the underlying sequence of the template nucleic acid molecule (see, e.g., EP0703991).
[0176] The "sequencing-by-binding" (SBB) approach is based on performing repetitive cycles of detecting a stabilized complex that forms at each position along the template sequence (e.g., a ternary complex that includes the primed template, a polymerase, and a cognate nucleotide for the position), under conditions that prevent covalent incorporation of the cognate nucleotide into the primer, and then extending the primer to allow detection of the next position along the template (see, e.g., U.S. Pat. Nos. 9,951,385 and 10,655,176). In the sequencing-by-binding approach, detection of the nucleotide at each position of the template occurs prior to extension of the primer to the next position. Generally, the methodology is used to distinguish the four different nucleotide types that can be present at positions along a nucleic acid template by uniquely labelling each type of ternary complex (i.e., different types of ternary complexes differing in the type of nucleotide it contains) or by separately delivering the reagents needed to form each type of ternary complex. In some instances, the labeling may comprise fluorescence labeling of, e.g., the cognate nucleotide or the polymerase that participates in the ternary complex.
[0177] The "sequencing-by-avidity" (or SBA) approach relies on the increased avidity (or "functional affinity") derived from forming a complex comprising a plurality of individual non-covalent binding interactions (see, e.g., U.S. Pat. Nos. 10,768,173 and 10,982,280). The sequencing-by-avidity approach is based on the detection of a multivalent binding complex formed between a fluorescently-labeled polymer-nucleotide conjugate, a polymerase, and a plurality of primed target nucleic acid molecules, which allows the detection/base calling step to be separated from the nucleotide incorporation step. Fluorescence imaging is used to detect the bound complex and thereby determine the identity of the N + 1 nucleotide in the target nucleic acid sequence (where the primer extension strand is N nucleotides in length).
[0178] In some instances, the disclosed methods may comprise using one or more nucleotides or analogs thereof, including a native nucleotide or a nucleotide analog or modified nucleotide (e.g., labeled with one or more detectable labels). In some embodiments, a nucleotide analog comprises a nitrogenous base, five-carbon sugar, and phosphate group, wherein any component of the nucleotide may be modified and/or replaced. In some embodiments, a method disclosed herein may comprise using one or more non-incorporable nucleotides. Non-incorporable nucleotides may be modified to become incorporable at any point during the sequencing method.
[0179] Nucleotide analogs include, but are not limited to, alpha-phosphate modified nucleotides, alpha-beta nucleotide analogs, beta-phosphate modified nucleotides, beta-gamma nucleotide analogs, gamma-phosphate modified nucleotides, caged nucleotides, or ddNTPs. Examples of nucleotide analogs are described in U.S. Patent No. 8,071,755, which is incorporated by reference herein in its entirety.
[0180] In some embodiments, a method disclosed herein may comprise using terminators that reversibly prevent nucleotide incorporation at the 3'-end of the primer. One type of reversible terminator is a 3'-O-blocked reversible terminator. Here the terminator moiety is linked to the oxygen atom of the 3'-OH end of the 5-carbon sugar of a nucleotide. For example, U.S. Patent Nos. 7,544,794 and 8,034,923 (the disclosures of these patents are incorporated by reference) describe reversible terminator dNTPs having the 3'-OH group replaced by a 3'-ONH2 group. Another type of reversible terminator is a 3'-unblocked reversible terminator, wherein the terminator moiety is linked to the nitrogenous base of a nucleotide. For example, U.S. Patent No. 8,808,989 (the disclosure of which is incorporated by reference) discloses particular examples of base-modified reversible terminator nucleotides that may be used in connection with the methods described herein. Other reversible terminators that similarly can be used in connection with the methods described herein include those described in U.S. Patent Nos. 7,956,171, 8,071,755, and 9,399,798, herein incorporated by reference.
[0181] In some embodiments, a method disclosed herein may comprise using nucleotide analogs having terminator moieties that irreversibly prevent nucleotide incorporation at the 3'-end of the primer. Irreversible nucleotide analogs include 2',3'-dideoxynucleotides (ddNTPs; e.g., ddGTP, ddATP, ddTTP, and ddCTP). Dideoxynucleotides lack the 3'-OH group of dNTPs that is essential for polymerase-mediated synthesis.
[0182] In some embodiments, a method disclosed herein may comprise using non-incorporable nucleotides comprising a blocking moiety that inhibits or prevents the nucleotide from forming a covalent linkage to a second nucleotide (3'-OH of a primer) during the incorporation step of a nucleic acid polymerization reaction. The blocking moiety can be removed from the nucleotide, allowing for nucleotide incorporation.
[0183] In some embodiments, a method disclosed herein may comprise using 1, 2, 3, 4 or more nucleotide analogs. In some embodiments, a nucleotide analog is replaced, diluted, or sequestered during an incorporation step. In some embodiments, a nucleotide analog is replaced with a native nucleotide. In some embodiments, a nucleotide analog is modified during an incorporation step. The modified nucleotide analog can be similar to or the same as a native nucleotide.
[0184] In some embodiments, a method disclosed herein may comprise using a nucleotide analog having a different binding affinity for a polymerase than a native nucleotide. In some embodiments, a nucleotide analog has a different interaction with a next base than a native nucleotide. Nucleotide analogs and/or non-incorporable nucleotides may base-pair with a complementary base of a template nucleic acid.
[0185] Any suitable enzyme having a polymerase activity can be used in the sequencing reactions described herein, and exemplary polymerases include, but are not limited to, bacterial DNA polymerases, eukaryotic DNA polymerases, archaeal DNA polymerases, viral DNA polymerases and phage DNA polymerases. Bacterial DNA polymerases include E. coli DNA polymerases I, II, III, IV and V, the Klenow fragment of E. coli DNA polymerase, Clostridium stercorarium (Cst) DNA polymerase, Clostridium thermocellum (Cth) DNA polymerase and Sulfolobus solfataricus (Sso) DNA polymerase. Eukaryotic DNA polymerases include DNA polymerases α, β, γ, δ, ε, η, ζ, λ, σ, μ, and κ, as well as the Rev1 polymerase (terminal deoxycytidyl transferase) and terminal deoxynucleotidyl transferase (TdT). Viral DNA polymerases include T4 DNA polymerase, phi-29 DNA polymerase, GA-1, phi-29-like DNA polymerases, PZA DNA polymerase, phi-15 DNA polymerase, Cp1 DNA polymerase, Cp7 DNA polymerase, T7 DNA polymerase, and T4 polymerase. Other DNA polymerases include thermostable and/or thermophilic DNA polymerases such as Thermus aquaticus (Taq) DNA polymerase, Thermus filiformis (Tfi) DNA polymerase, Thermococcus zilligi (Tzi) DNA polymerase, Thermus thermophilus (Tth) DNA polymerase, Thermus flavus (Tfl) DNA polymerase, Pyrococcus woesei (Pwo) DNA polymerase, Pyrococcus furiosus (Pfu) DNA polymerase and Turbo Pfu DNA polymerase, Thermococcus litoralis (Tli) DNA polymerase, Pyrococcus sp. GB-D polymerase, Thermotoga maritima (Tma) DNA polymerase, Bacillus stearothermophilus (Bst) DNA polymerase, Pyrococcus kodakaraensis (KOD) DNA polymerase, Pfx DNA polymerase, Thermococcus sp. JDF-3 (JDF-3) DNA polymerase, Thermococcus gorgonarius (Tgo) DNA polymerase, Thermococcus acidophilium DNA polymerase; Sulfolobus acidocaldarius DNA polymerase; Thermococcus sp. 9° N-7 DNA polymerase; Pyrodictium occultum DNA polymerase; Methanococcus voltae DNA polymerase; Methanococcus thermoautotrophicum DNA polymerase; Methanococcus jannaschii DNA polymerase; Desulfurococcus strain TOK DNA polymerase (D. Tok Pol); Pyrococcus abyssi DNA polymerase; Pyrococcus horikoshii DNA polymerase; Pyrococcus islandicum DNA polymerase; Thermococcus fumicolans DNA polymerase; Aeropyrum pernix DNA polymerase; and the heterodimeric DNA polymerase DP1/DP2. Engineered and modified polymerases also are useful in connection with the disclosed techniques. For example, modified versions of the extremely thermophilic marine archaea Thermococcus species 9° N (e.g., Therminator DNA polymerase from New England BioLabs Inc.; Ipswich, Mass.) can be used. Still other useful DNA polymerases, including the 3PDX polymerase, are disclosed in U.S. Patent No. 8,703,461, the disclosure of which is incorporated by reference in its entirety. Additional examples include viral RNA polymerases such as T7 RNA polymerase, T3 polymerase, SP6 polymerase, and K11 polymerase; eukaryotic RNA polymerases such as RNA polymerase I, RNA polymerase II, RNA polymerase III, RNA polymerase IV, and RNA polymerase V; archaeal RNA polymerase; HIV-1 reverse transcriptase from human immunodeficiency virus type 1 (PDB 1HMV); HIV-2 reverse transcriptase from human immunodeficiency virus type 2; M-MLV reverse transcriptase from the Moloney murine leukemia virus; AMV reverse transcriptase from the avian myeloblastosis virus; and telomerase reverse transcriptase, which maintains the telomeres of eukaryotic chromosomes.
(iii) Detectable labels
[0186] In some embodiments, one or more nucleotides can be labeled with distinguishing and/or detectable tags or labels. The tags may be distinguishable by means of their differences in fluorescence, Raman spectrum, charge, mass, refractive index, luminescence, length, or any other measurable property. The tag may be attached to one or more different positions on the nucleotide, so long as the fidelity of binding to the polymerase-nucleic acid complex is sufficiently maintained to enable identification of the complementary base on the template nucleic acid correctly. In some embodiments, the tag is attached to the nucleobase of the nucleotide. Alternatively, a tag is attached to the gamma phosphate position of the nucleotide.
[0187] Detectable labels can be suitable for small scale detection and/or suitable for high-throughput screening. As such, suitable detectable labels include, but are not limited to, radioisotopes, fluorophores, chemiluminescent compounds, bioluminescent compounds, and dyes. The detectable label can be qualitatively detected (e.g., optically or spectrally), or it can be quantified. Qualitative detection generally includes a detection method in which the existence or presence of the detectable label is confirmed, whereas quantifiable detection generally includes a detection method having a quantifiable (e.g., numerically reportable) value such as an intensity, duration, polarization, and/or other properties. In some embodiments, the detectable label is bound to another moiety, for example, a nucleotide or nucleotide analog, and can include a fluorescent, a colorimetric, or a chemiluminescent label.
[0188] In some embodiments, a detectable label can be attached to another moiety, for example, a nucleotide or nucleotide analog. In some embodiments, one or more nucleotides can be labeled with a cleavable detectable tag or label. For example, the non-terminating fluorescently labeled nucleotides can include a DBCO-nucleotide conjugated to a fluorescent compound via a disulfide linker. In some embodiments, a non-terminating fluorescently labeled nucleotide is incorporated into the strand without termination, and after imaging, the linker can be cleaved to remove the fluorescent label. In some embodiments, a DBCO-nucleotide (e.g., 5-DBCO-PEG4-UTP) can undergo a click reaction with the cleavable linker conjugated to a fluorescent label (e.g., cleavable linker-ATTO647N), and a disulfide group can be cleaved by tris(2-carboxyethyl)phosphine (TCEP) reduction together with 3’-O-azidomethyl-dNTP.
[0189] In some embodiments, the detectable label is a fluorophore. For example, the fluorophore can be from a group that includes: 7-AAD (7-Aminoactinomycin D), Acridine Orange (+DNA), Acridine Orange (+RNA), Alexa Fluor® 350, Alexa Fluor® 430, Alexa Fluor® 488, Alexa Fluor® 532, Alexa Fluor® 546, Alexa Fluor® 555, Alexa Fluor® 568, Alexa Fluor® 594, Alexa Fluor® 633, Alexa Fluor® 647, Alexa Fluor® 660, Alexa Fluor® 680, Alexa Fluor® 700, Alexa Fluor® 750, Allophycocyanin (APC), AMCA / AMCA-X, 7-Aminoactinomycin D (7-AAD), 7-Amino-4-methylcoumarin, 6-Aminoquinoline, Aniline Blue, ANS, APC-Cy7, ATTO-TAG™ CBQCA, ATTO-TAG™ FQ, Auramine O-Feulgen, BCECF (high pH), BFP (Blue Fluorescent Protein), BFP / GFP FRET, BOBO™-1 / BO-PRO™-1, BOBO™-3 / BO-PRO™-3, BODIPY® FL, BODIPY® TMR, BODIPY® TR-X, BODIPY® 530/550, BODIPY® 558/568, BODIPY® 564/570, BODIPY® 581/591, BODIPY® 630/650-X, BODIPY® 650/665-X, BTC, Calcein, Calcein Blue, Calcium Crimson™, Calcium Green-1™, Calcium Orange™, Calcofluor® White, 5-Carboxyfluorescein (5-FAM), 5-Carboxynaphthofluorescein, 6-Carboxyrhodamine 6G, 5-Carboxytetramethylrhodamine (5-TAMRA), Carboxy-X-rhodamine (5-ROX), Cascade Blue®, Cascade Yellow™, CCF2 (GeneBLAzer™), CFP (Cyan Fluorescent Protein), CFP / YFP FRET, Chromomycin A3, Cl-NERF (low pH), CPM, 6-CR 6G, CTC Formazan, Cy2®, Cy3®, Cy3.5®, Cy5®, Cy5.5®, Cy7®, Cychrome (PE-Cy5), Dansylamine, Dansyl cadaverine, Dansylchloride, DAPI, Dapoxyl, DCFH, DHR, DiA (4-Di-16-ASP), DiD (DiIC18(5)), DIDS, DiI (DiIC18(3)), DiO (DiOC18(3)), DiR (DiIC18(7)), Di-4 ANEPPS, Di-8 ANEPPS, DM-NERF (4.5-6.5 pH), DsRed (Red Fluorescent Protein), EBFP, ECFP, EGFP, ELF®-97 alcohol, Eosin, Erythrosin, Ethidium bromide, Ethidium homodimer-1 (EthD-1), Europium (III) Chloride, 5-FAM (5-Carboxyfluorescein), Fast Blue, Fluorescein-dT phosphoramidite, FITC, Fluo-3, Fluo-4, FluorX®, Fluoro-Gold™ (high pH), Fluoro-Gold™ (low pH), Fluoro-Jade, FM® 1-43, Fura-2 (high calcium), Fura-2 / BCECF, Fura Red™ (high calcium), Fura Red™ / Fluo-3, GeneBLAzer™ (CCF2), GFP Red Shifted (rsGFP), GFP Wild Type, GFP / BFP FRET, GFP / DsRed FRET, Hoechst 33342 & 33258, 7-Hydroxy-4-methylcoumarin (pH 9), 1,5 IAEDANS, Indo-1 (high calcium), Indo-1 (low calcium), Indodicarbocyanine, Indotricarbocyanine, JC-1, 6-JOE, JOJO™-1 / JO-PRO™-1, LDS 751 (+DNA), LDS 751 (+RNA), LOLO™-1 / LO-PRO™-1, Lucifer Yellow, LysoSensor™ Blue (pH 5), LysoSensor™ Green (pH 5), LysoSensor™ Yellow/Blue (pH 4.2), LysoTracker® Green, LysoTracker® Red, LysoTracker® Yellow, Mag-Fura-2, Mag-Indo-1, Magnesium Green™, Marina Blue®, 4-Methylumbelliferone, Mithramycin, MitoTracker® Green, MitoTracker® Orange, MitoTracker® Red, NBD (amine), Nile Red, Oregon Green® 488, Oregon Green® 500, Oregon Green® 514, Pacific Blue, PBFI, PE (R-phycoerythrin), PE-Cy5, PE-Cy7, PE-Texas Red, PerCP (Peridinin chlorophyll protein), PerCP-Cy5.5 (TruRed), PharRed (APC-Cy7), C-phycocyanin, R-phycocyanin, R-phycoerythrin (PE), PI (Propidium Iodide), PKH26, PKH67, POPO™-1 / PO-PRO™-1, POPO™-3 / PO-PRO™-3, Propidium Iodide (PI), PyMPO, Pyrene, Pyronin Y, Quantum Red (PE-Cy5), Quinacrine Mustard, R670 (PE-Cy5), Red 613 (PE-Texas Red), Red Fluorescent Protein (DsRed), Resorufin, RH 414, Rhod-2, Rhodamine B, Rhodamine Green™, Rhodamine Red™, Rhodamine Phalloidin, Rhodamine 110, Rhodamine 123, 5-ROX (carboxy-X-rhodamine), S65A, S65C, S65L, S65T, SBFI, SITS, SNAFL®-1 (high pH), SNAFL®-2, SNARF®-1 (high pH), SNARF®-1 (low pH), Sodium Green™, SpectrumAqua®, SpectrumGreen® #1, SpectrumGreen® #2, SpectrumOrange®, SpectrumRed®, SYTO® 11, SYTO® 13, SYTO® 17, SYTO® 45, SYTOX® Blue, SYTOX® Green, SYTOX® Orange, 5-TAMRA (5-Carboxytetramethylrhodamine), Tetramethylrhodamine (TRITC), Texas Red® / Texas Red®-X, Texas Red®-X (NHS Ester), Thiadicarbocyanine, Thiazole Orange, TOTO®-1 / TO-PRO®-1, TOTO®-3 / TO-PRO®-3, TO-PRO®-5, Tri-color (PE-Cy5), TRITC (Tetramethylrhodamine), TruRed (PerCP-Cy5.5), WW 781, X-Rhodamine (XRITC), Y66F, Y66H, Y66W, YFP (Yellow Fluorescent Protein), YOYO®-1 / YO-PRO®-1, YOYO®-3 / YO-PRO®-3, 6-FAM (Fluorescein), 6-FAM (NHS Ester), 6-FAM (Azide), HEX, TAMRA (NHS Ester), Yakima Yellow, MAX, TET, TEX615, ATTO 488, ATTO 532, ATTO 542, ATTO 550, ATTO 565, ATTO Rho101, ATTO 590, ATTO 633, ATTO 647N, TYE 563, TYE 665, TYE 705, 5’ IRDye® 700, 5’ IRDye® 800, 5’ IRDye® 800CW (NHS Ester), WellRED D4 Dye, WellRED D3 Dye, WellRED D2 Dye, Lightcycler® 640 (NHS Ester), and Dy 750 (NHS Ester).
[0190] The detectable label can be directly detectable by itself (e.g., radioisotope labels or fluorescent labels) or, in the case of an enzymatic label, can be indirectly detectable, e.g., by catalyzing chemical alterations of a substrate compound or composition, which substrate compound or composition is directly detectable. The label can emit a signal or alter a signal delivered to the label so that the presence or absence of the label can be detected. In some cases, coupling may be via a linker, which may be cleavable, such as photo-cleavable (e.g., cleavable under ultra-violet light), chemically-cleavable (e.g., via a reducing agent, such as dithiothreitol (DTT) or tris(2-carboxyethyl)phosphine (TCEP)), or enzymatically cleavable (e.g., via an esterase, lipase, peptidase, or protease).
(iv) Fluorescence detection
[0191] Fluorescence detection in tissue samples can often be hindered by the presence of strong background fluorescence. “Autofluorescence” is the general term used to distinguish background fluorescence (that can arise from a variety of sources, including aldehyde fixation, extracellular matrix components, red blood cells, lipofuscin, and the like) from the desired immunofluorescence from the fluorescently labeled antibodies or probes. Tissue autofluorescence can lead to difficulties in distinguishing the signals due to fluorescent antibodies or probes from the general background. In some embodiments, a method disclosed herein utilizes one or more agents to reduce tissue autofluorescence, for example, Autofluorescence Eliminator (Sigma/EMD Millipore), TrueBlack Lipofuscin Autofluorescence Quencher (Biotium), MaxBlock Autofluorescence Reducing Reagent Kit (MaxVision Biosciences), and/or a very intense black dye (e.g., Sudan Black, or comparable dark chromophore).
[0192] Examples of fluorescent labels and nucleotides and/or polynucleotides conjugated to such fluorescent labels comprise those described in, for example, Haugland, Handbook of Fluorescent Probes and Research Chemicals, Ninth Edition (Molecular Probes, Inc., Eugene, 2002); Keller and Manak, DNA Probes, 2nd Edition (Stockton Press, New York, 1993); Eckstein, editor, Oligonucleotides and Analogues: A Practical Approach (IRL Press, Oxford, 1991); and Wetmur, Critical Reviews in Biochemistry and Molecular Biology, 26:227-259 (1991). In some embodiments, exemplary techniques and methodologies applicable to the provided embodiments comprise those described in, for example, US 4,757,141, US 5,151,507 and US 5,091,519. In some embodiments, one or more fluorescent dyes are used as labels for labeled target sequences, for example, as described in US 5,188,934 (4,7-dichlorofluorescein dyes); US 5,366,860 (spectrally resolvable rhodamine dyes); US 5,847,162 (4,7-dichlororhodamine dyes); US 4,318,846 (ether-substituted fluorescein dyes); US 5,800,996 (energy transfer dyes); US 5,066,580 (xanthene dyes); and US 5,688,648 (energy transfer dyes). Labelling can also be carried out with quantum dots, as described in US 6,322,901, US 6,576,291, US 6,423,551, US 6,251,303, US 6,319,426, US 6,426,513, US 6,444,143, US 5,990,479, US 6,207,392, US 2002/0045045 and US 2003/0017264. As used herein, the term "fluorescent label" comprises a signaling moiety that conveys information through the fluorescent absorption and/or emission properties of one or more molecules. Exemplary fluorescent properties comprise fluorescence intensity, fluorescence lifetime, emission spectrum characteristics and energy transfer.
(v) Imaging
[0193] In some aspects, the detection (comprising imaging) is carried out using any of a number of different types of microscopy, e.g., confocal microscopy, two-photon microscopy, light-field microscopy, intact tissue expansion microscopy, and/or CLARITY™-optimized light sheet microscopy (COLM).
[0194] In some embodiments, fluorescence microscopy is used for detection and imaging of the sample. In some aspects, a fluorescence microscope is an optical microscope that uses fluorescence and phosphorescence instead of, or in addition to, reflection and absorption to study properties of organic or inorganic substances. In fluorescence microscopy, a sample is illuminated with light of a wavelength which excites fluorescence in the sample. The fluoresced light, which is usually at a longer wavelength than the illumination, is then imaged through a microscope objective. Two filters may be used in this technique: an illumination (or excitation) filter, which ensures the illumination is near monochromatic and at the correct wavelength, and a second emission (or barrier) filter, which ensures none of the excitation light reaches the detector. Alternatively, these functions may both be accomplished by a single dichroic filter. The "fluorescence microscope" comprises any microscope that uses fluorescence to generate an image, whether it is a simpler setup such as an epifluorescence microscope, or a more complicated design such as a confocal microscope, which uses optical sectioning to obtain better resolution of the fluorescent image.
[0195] In some embodiments, confocal microscopy is used for detection and imaging of the sample. Confocal microscopy uses point illumination and a pinhole in an optically conjugate plane in front of the detector to eliminate out-of-focus signal. As only light produced by fluorescence very close to the focal plane can be detected, the image's optical resolution, particularly in the sample depth direction, is much better than that of wide-field microscopes. However, as much of the light from sample fluorescence is blocked at the pinhole, this increased resolution is at the cost of decreased signal intensity, so long exposures are often required. As only one point in the sample is illuminated at a time, 2D or 3D imaging requires scanning over a regular raster (i.e., a rectangular pattern of parallel scanning lines) in the specimen. The achievable thickness of the focal plane is defined mostly by the wavelength of the used light divided by the numerical aperture of the objective lens, but also by the optical properties of the specimen. The thin optical sectioning possible makes these types of microscopes particularly good at 3D imaging and surface profiling of samples. CLARITY™-optimized light sheet microscopy (COLM) provides an alternative microscopy for fast 3D imaging of large clarified samples. COLM interrogates large immunostained tissues, permits increased speed of acquisition and results in a higher quality of generated data.
[0196] Other types of microscopy that can be employed comprise bright field microscopy, oblique illumination microscopy, dark field microscopy, phase contrast microscopy, differential interference contrast (DIC) microscopy, interference reflection microscopy (also known as reflected interference contrast, or RIC), single plane illumination microscopy (SPIM), super-resolution microscopy, laser microscopy, electron microscopy (EM), transmission electron microscopy (TEM), scanning electron microscopy (SEM), reflection electron microscopy (REM), scanning transmission electron microscopy (STEM), low-voltage electron microscopy (LVEM), scanning probe microscopy (SPM), atomic force microscopy (AFM), ballistic electron emission microscopy (BEEM), chemical force microscopy (CFM), conductive atomic force microscopy (C-AFM), electrochemical scanning tunneling microscopy (ECSTM), electrostatic force microscopy (EFM), fluidic force microscopy (FluidFM), force modulation microscopy (FMM), feature-oriented scanning probe microscopy (FOSPM), kelvin probe force microscopy (KPFM), magnetic force microscopy (MFM), magnetic resonance force microscopy (MRFM), near-field scanning optical microscopy (NSOM; also known as scanning near-field optical microscopy, SNOM), piezoresponse force microscopy (PFM), photon scanning tunneling microscopy (PSTM), photothermal microspectroscopy/microscopy (PTMS), scanning capacitance microscopy (SCM), scanning electrochemical microscopy (SECM), scanning gate microscopy (SGM), scanning Hall probe microscopy (SHPM), scanning ion-conductance microscopy (SICM), spin polarized scanning tunneling microscopy (SPSM), scanning spreading resistance microscopy (SSRM), scanning thermal microscopy (SThM), scanning tunneling microscopy (STM), scanning tunneling potentiometry (STP), scanning voltage microscopy (SVM), synchrotron x-ray scanning tunneling microscopy (SXSTM), and intact tissue expansion microscopy (ExM).
[0197] In some embodiments, a method herein comprises subjecting the sample to expansion microscopy methods and techniques. Expansion allows individual targets (e.g., mRNA or RNA transcripts) that are densely packed within a cell to be resolved spatially in a high-throughput manner. Expansion microscopy techniques are known in the art and can be performed as described in US 2016/0116384 and Chen et al., Science, 347, 543 (2015), each of which is incorporated herein by reference in its entirety. In some embodiments, the method does not comprise subjecting the sample to expansion microscopy. In some embodiments, the method does not comprise dissociating a cell from the sample such as a tissue or the cellular microenvironment. In some embodiments, the method does not comprise lysing the sample or cells therein. In some embodiments, the method does not comprise embedding the sample or molecules from the sample in an exogenous matrix.
[0198] In some cases, analysis is performed on one or more images captured, and may comprise processing the image(s) and/or quantifying signals observed. In some embodiments, images of signals from different fluorescent channels and/or nucleotide incorporation cycles can be compared and analyzed. In some embodiments, images of signals (or absence thereof) at a particular location in a sample from different fluorescent channels and/or sequential incorporation cycles can be aligned to analyze an analyte at the location. For instance, a particular location in a sample can be tracked and signal spots from sequential incorporation cycles can be analyzed to detect a target polynucleotide sequence (e.g., a barcode sequence or subsequence thereof) in an analyte at the location. The analysis may comprise processing information of one or more cell types, one or more types of analytes, a number or level of analyte, and/or a number or level of cells detected in a particular region of the sample. In some embodiments, the analysis comprises detecting a sequence, e.g., a barcode sequence, present in an amplification product at a location in the sample. In some embodiments, the number of signals detected in a unit area in the biological sample is quantified. In some embodiments, the signals detected at a corresponding position in the biological sample in a plurality of images taken at different z positions (e.g., in the depth direction) are quantified and analyzed.
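As a non-limiting illustration of the analysis described above, the following minimal sketch shows one way that per-cycle signal calls at a tracked location could be matched against a codebook, and how signals could be counted per unit area. The codebook contents, the channel labels, and the function names `decode_spot` and `signals_per_unit_area` are hypothetical and are not taken from this disclosure.

```python
from collections import Counter

# Hypothetical codebook: each code word is one channel label per cycle,
# mapped to the target it identifies. Entries are illustrative only.
CODEBOOK = {
    ("A", "C", "G"): "gene_1",
    ("T", "T", "A"): "gene_2",
}


def decode_spot(calls_per_cycle, codebook=CODEBOOK):
    """Return the target for an observed code word, or None if it is not in the codebook."""
    return codebook.get(tuple(calls_per_cycle))


def signals_per_unit_area(spot_xy, area_size):
    """Count spots falling in each square tile of side `area_size`.

    Coordinates and `area_size` are assumed to share the same units.
    Returns a dict mapping (tile_column, tile_row) to the spot count.
    """
    counts = Counter()
    for x, y in spot_xy:
        tile = (int(x // area_size), int(y // area_size))
        counts[tile] += 1
    return dict(counts)
```

In this sketch an observed code word that is absent from the codebook is simply reported as undecoded; error-tolerant matching is outside the scope of the example.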
IV. Samples and Sample Processing
[0199] Methods and compositions disclosed herein may be used for analyzing a biological sample, which may be obtained from a subject using any of a variety of techniques including, but not limited to, biopsy, surgery, and laser capture microscopy (LCM), and generally includes cells and/or other biological material from the subject. A biological sample can also be obtained from a eukaryote, such as a tissue sample, a patient derived organoid (PDO) or patient derived xenograft (PDX). A biological sample from an organism may comprise one or more other organisms or components therefrom. For example, a mammalian tissue section may comprise a prion, a viroid, a virus, a bacterium, a fungus, or components from other organisms, in addition to mammalian cells and non-cellular tissue components.
[0200] Subjects from which biological samples can be obtained can be healthy or asymptomatic individuals, individuals that have or are suspected of having a disease (e.g., a patient with a disease such as cancer) or a pre-disposition to a disease, and/or individuals in need of therapy or suspected of needing therapy.
[0201] In some embodiments, the biological sample corresponds to cells (e.g., derived from a cell culture, a tissue sample, or cells deposited on a surface). In a cell sample with a plurality of cells, individual cells can be naturally unaggregated. For example, the cells can be derived from a suspension of cells (e.g., a body fluid such as blood) and/or disassociated or disaggregated cells from a tissue or tissue section. The number of cells in the biological sample can vary. Some biological samples comprise large numbers of cells, e.g., blood samples, while other biological samples comprise smaller or only a small number of cells or may only be suspected of containing cells, e.g., plasma, serum, urine, saliva, synovial fluids, amniotic fluid, lachrymal fluid, lymphatic fluid, liquor, cerebrospinal fluid and the like.
[0202] In some embodiments, a cell-containing biological sample can comprise a body fluid or a cell-containing sample derived from the body fluid, e.g., whole blood, samples derived from blood such as plasma or serum, buffy coat, urine, sputum, lachrymal fluid, lymphatic fluid, sweat, liquor, cerebrospinal fluid, ascites, milk, stool, bronchial lavage, saliva, amniotic fluid, nasal secretions, vaginal secretions, semen/seminal fluid, wound secretions, cell culture and swab samples, or any cell-containing sample derived from the aforementioned samples. In some embodiments, a cell-containing biological sample can be a body fluid, a body secretion or body excretion, e.g., lymphatic fluid, blood, buffy coat, plasma or serum. In some embodiments, a cell-containing biological sample can be a circulating body fluid such as blood or lymphatic fluid, e.g., peripheral blood obtained from a mammal such as a human.
[0203] The biological sample can include any number of macromolecules, for example, cellular macromolecules and organelles (e.g., mitochondria and nuclei). The biological sample can be obtained as a tissue sample, such as a tissue section, a cell pellet, a cell block, a biopsy, a core biopsy, needle aspirate, or fine needle aspirate. The sample can be a fluid sample, such as a blood sample, urine sample, or saliva sample. The sample can be a skin sample, a colon sample, a cheek swab, a histology sample, a histopathology sample, a plasma or serum sample, a tumor sample, living cells, cultured cells, a clinical sample such as, for example, whole blood or blood-derived products, blood cells, or cultured tissues or cells, including cell suspensions. In some embodiments, the biological sample may comprise cells which are deposited on a surface. In some embodiments, the biological sample may comprise transcripts of antigen receptor molecules. In some embodiments, the biological sample comprises analytes from any of the sources described herein deposited on a surface.
[0204] Biological samples can be derived from a homogeneous culture or population of the subjects or organisms mentioned herein or alternatively from a collection of several different organisms, for example, in a community or ecosystem.
[0205] Biological samples can include one or more diseased cells. A diseased cell can have altered metabolic properties, gene expression, protein expression, and/or morphologic features. Examples of diseases include inflammatory disorders, metabolic disorders, nervous system disorders, and cancer. Cancer cells can be derived from solid tumors, hematological malignancies, cell lines, or obtained as circulating tumor cells. Biological samples can also include fetal cells and immune cells.
[0206] Biological samples can include analytes (e.g., protein, RNA, and/or DNA) embedded in a 3D matrix. In some embodiments, amplicons (e.g., rolling circle amplification products) derived from or associated with analytes (e.g., protein, RNA, and/or DNA) can be embedded in a 3D matrix. In some embodiments, a 3D matrix may comprise a network of natural molecules and/or synthetic molecules that are chemically and/or enzymatically linked, e.g., by crosslinking. In some embodiments, a 3D matrix may comprise a synthetic polymer. In some embodiments, a 3D matrix comprises a hydrogel.
[0207] In some embodiments, a substrate herein can be any support that is insoluble in aqueous liquid and which allows for positioning of biological samples, analytes, features, and/or reagents on the support. In some embodiments, a biological sample can be attached to a substrate. Attachment of the biological sample can be irreversible or reversible, depending upon the nature of the sample and subsequent steps in the analytical method. In certain embodiments, the sample can be attached to the substrate reversibly by applying a suitable polymer coating to the substrate, and contacting the sample to the polymer coating. The sample can then be detached from the substrate, e.g., using an organic solvent that at least partially dissolves the polymer coating. Hydrogels are examples of polymers that are suitable for this purpose.
[0208] In some embodiments, the substrate can be coated or functionalized with one or more substances to facilitate attachment of the sample to the substrate. Suitable substances that can be used to coat or functionalize the substrate include, but are not limited to, lectins, poly-lysine, antibodies, and polysaccharides.
[0209] A variety of steps can be performed to prepare or process a biological sample for and/or during an assay. Except where indicated otherwise, the preparative or processing steps described below can generally be combined in any manner and in any order to appropriately prepare or process a particular sample for and/or during analysis.
(i) Tissue Sectioning
[0210] A biological sample can be harvested from a subject (e.g., via surgical biopsy, whole subject sectioning) or grown in vitro on a growth substrate or culture dish as a population of cells, and prepared for analysis as a tissue slice or tissue section. Grown samples may be sufficiently thin for analysis without further processing steps. Alternatively, grown samples, and samples obtained via biopsy or sectioning, can be prepared as thin tissue sections using a mechanical cutting apparatus such as a vibrating blade microtome. As another alternative, in some embodiments, a thin tissue section can be prepared by applying a touch imprint of a biological sample to a suitable substrate material.
[0211] The thickness of the tissue section can be a fraction of (e.g., less than 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, or 0.1) the maximum cross-sectional dimension of a cell. However, tissue sections having a thickness that is larger than the maximum cross-sectional cell dimension can also be used. For example, cryostat sections can be used, which can be, e.g., 10-20 µm thick.
[0212] More generally, the thickness of a tissue section typically depends on the method used to prepare the section and the physical characteristics of the tissue, and therefore sections having a wide variety of different thicknesses can be prepared and used. For example, the thickness of the tissue section can be at least 0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 1.0, 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20, 30, 40, or 50 µm. Thicker sections can also be used if desired or convenient, e.g., at least 70, 80, 90, or 100 µm or more. Typically, the thickness of a tissue section is between 1-100 µm, 1-50 µm, 1-30 µm, 1-25 µm, 1-20 µm, 1-15 µm, 1-10 µm, 2-8 µm, 3-7 µm, or 4-6 µm, but as mentioned above, sections with thicknesses larger or smaller than these ranges can also be analyzed.
[0213] Multiple sections can also be obtained from a single biological sample. For example, multiple tissue sections can be obtained from a surgical biopsy sample by performing serial sectioning of the biopsy sample using a sectioning blade. Spatial information among the serial sections can be preserved in this manner, and the sections can be analyzed successively to obtain three-dimensional information about the biological sample.
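As a non-limiting illustration of how spatial information among serial sections could be carried into a three-dimensional analysis, the following minimal sketch assigns a depth coordinate to each section. It assumes, purely for illustration, that the sections are contiguous, of equal thickness, and that no tissue is lost between cuts; the function name `section_z_centers` is hypothetical.

```python
def section_z_centers(num_sections, thickness_um):
    """Return the depth (in µm) of the mid-plane of each serial section.

    Assumes contiguous sections of equal thickness with no tissue loss
    between cuts, so section i spans [i * t, (i + 1) * t].
    """
    if num_sections < 0 or thickness_um <= 0:
        raise ValueError("num_sections must be >= 0 and thickness_um must be > 0")
    return [(i + 0.5) * thickness_um for i in range(num_sections)]
```

Signals detected in section i could then be given the depth `section_z_centers(n, t)[i]` alongside their in-plane coordinates.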
(ii) Freezing
[0214] In some embodiments, the biological sample (e.g., a tissue section as described above) can be prepared by deep freezing at a temperature suitable to maintain or preserve the integrity (e.g., the physical characteristics) of the tissue structure. The frozen tissue sample can be sectioned, e.g., thinly sliced, onto a substrate surface using any number of suitable methods. For example, a tissue sample can be prepared using a chilled microtome (e.g., a cryostat) set at a temperature suitable to maintain both the structural integrity of the tissue sample and the chemical properties of the nucleic acids in the sample. Such a temperature can be, e.g., less than -15°C, less than -20°C, or less than -25°C.
(iii) Fixation and Post-fixation
[0215] In some embodiments, the biological sample can be prepared using formalin-fixation and paraffin-embedding (FFPE), which are established methods. In some embodiments, cell suspensions and other non-tissue samples can be prepared using formalin-fixation and paraffin-embedding. Following fixation of the sample and embedding in a paraffin or resin block, the sample can be sectioned as described above. Prior to analysis, the paraffin-embedding material can be removed from the tissue section (e.g., deparaffinization) by incubating the tissue section in an appropriate solvent (e.g., xylene) followed by a rinse (e.g., 99.5% ethanol for 2 minutes, 96% ethanol for 2 minutes, and 70% ethanol for 2 minutes).
[0216] As an alternative to formalin fixation described above, a biological sample can be fixed in any of a variety of other fixatives to preserve the biological structure of the sample prior to analysis. For example, a sample can be fixed via immersion in ethanol, methanol, acetone, paraformaldehyde (PFA)-Triton, and combinations thereof.
[0217] In some embodiments, acetone fixation is used with fresh frozen samples, which can include, but are not limited to, cortex tissue, mouse olfactory bulb, human brain tumor, human post-mortem brain, and breast cancer samples. When acetone fixation is performed, pre-permeabilization steps (described below) may not be performed. Alternatively, acetone fixation can be performed in conjunction with permeabilization steps.
[0218] In some embodiments, the methods provided herein comprise one or more post-fixing (also referred to as post-fixation) steps. In some embodiments, one or more post-fixing steps are performed after contacting a sample with a polynucleotide disclosed herein, e.g., one or more probes such as a circular or padlock probe. In some embodiments, one or more post-fixing steps are performed after a hybridization complex comprising a probe and a target is formed in a sample. In some embodiments, one or more post-fixing steps are performed prior to a ligation reaction disclosed herein, such as the ligation to circularize a padlock probe.
[0219] In some embodiments, one or more post-fixing steps are performed after contacting a sample with a binding or labelling agent (e.g., an antibody or antigen binding fragment thereof) for a non-nucleic acid analyte such as a protein analyte. The labelling agent can comprise a nucleic acid molecule (e.g., reporter oligonucleotide) comprising a sequence that corresponds to the labelling agent and therefore corresponds to (e.g., uniquely identifies) the analyte. In some embodiments, the labelling agent can comprise a reporter oligonucleotide comprising one or more barcode sequences.
[0220] A post-fixing step may be performed using any suitable fixation reagent disclosed herein, for example, 3% (w/v) paraformaldehyde in DEPC-PBS.
(iv) Embedding
[0221] As an alternative to paraffin embedding described above, a biological sample can be embedded in any of a variety of other embedding materials to provide structural substrate to the sample prior to sectioning and other handling steps. In some cases, the embedding material can be removed, e.g., prior to analysis of tissue sections obtained from the sample. Suitable embedding materials include, but are not limited to, waxes, resins (e.g., methacrylate resins), epoxies, and agar.
[0222] In some embodiments, the biological sample can be embedded in a matrix (e.g., a hydrogel matrix). Embedding the sample in this manner typically involves contacting the biological sample with a hydrogel such that the biological sample becomes surrounded by the hydrogel. For example, the sample can be embedded by contacting the sample with a suitable polymer material, and activating the polymer material to form a hydrogel. In some embodiments, the hydrogel is formed such that the hydrogel is internalized within the biological sample.
[0223] In some embodiments, the biological sample is immobilized in the hydrogel via cross-linking of the polymer material that forms the hydrogel. Cross-linking can be performed chemically and/or photochemically, or alternatively by any other hydrogel-formation method.
[0224] The composition and application of the hydrogel-matrix to a biological sample typically depends on the nature and preparation of the biological sample (e.g., sectioned, non-sectioned, type of fixation). As one example, where the biological sample is a tissue section, the hydrogel-matrix can include a monomer solution and an ammonium persulfate (APS) initiator/tetramethylethylenediamine (TEMED) accelerator solution. As another example, where the biological sample consists of cells (e.g., cultured cells or cells disassociated from a tissue sample), the cells can be incubated with the monomer solution and APS/TEMED solutions. For cells, hydrogel-matrix gels are formed in compartments, including but not limited to devices used to culture, maintain, or transport the cells. For example, hydrogel-matrices can be formed with monomer solution plus APS/TEMED added to the compartment to a depth ranging from about 0.1 µm to about 2 mm.
[0225] Additional methods and aspects of hydrogel embedding of biological samples are described for example in Chen et al., Science 347(6221):543-548, 2015, the entire contents of which are incorporated herein by reference.
(v) Staining and Immunohistochemistry (IHC)
[0226] To facilitate visualization, biological samples can be stained using a wide variety of stains and staining techniques. In some embodiments, for example, a sample can be stained using any number of stains and/or immunohistochemical reagents. One or more staining steps may be performed to prepare or process a biological sample for an assay described herein or may be performed during and/or after an assay. In some embodiments, the sample can be contacted with one or more nucleic acid stains, membrane stains (e.g., cellular or nuclear membrane), cytological stains, or combinations thereof. In some examples, the stain may be specific to proteins, phospholipids, DNA (e.g., dsDNA, ssDNA), RNA, an organelle or compartment of the cell. The sample may be contacted with one or more labeled antibodies (e.g., a primary antibody specific for the analyte of interest and a labeled secondary antibody specific for the primary antibody). In some embodiments, cells in the sample can be segmented using one or more images taken of the stained sample.
[0227] In some embodiments, the staining is performed using a lipophilic dye. In some examples, the staining is performed with a lipophilic carbocyanine or aminostyryl dye, or analogs thereof (e.g., DiI, DiO, DiR, DiD). Other cell membrane stains may include FM and RH dyes or immunohistochemical reagents specific for cell membrane proteins. In some examples, the stain may include but is not limited to, acridine orange, acid fuchsin, Bismarck brown, carmine, coomassie blue, cresyl violet, DAPI, eosin, ethidium bromide, acid fuchsine, haematoxylin, Hoechst stains, iodine, methyl green, methylene blue, neutral red, Nile blue, Nile red, osmium tetroxide, ruthenium red, propidium iodide, rhodamine (e.g., rhodamine B), or safranine, or derivatives thereof. In some embodiments, the sample may be stained with haematoxylin and eosin (H&E).
[0228] The sample can be stained using hematoxylin and eosin (H&E) staining techniques, using Papanicolaou staining techniques, Masson’s trichrome staining techniques, silver staining techniques, Sudan staining techniques, and/or using Periodic Acid Schiff (PAS) staining techniques. PAS staining is typically performed after formalin or acetone fixation. In some embodiments, the sample can be stained using Romanowsky stain, including Wright’s stain, Jenner’s stain, May-Grünwald stain, Leishman stain, and Giemsa stain.
[0229] In some embodiments, biological samples can be destained. Methods of destaining or discoloring a biological sample generally depend on the nature of the stain(s) applied to the sample. For example, in some embodiments, one or more immunofluorescent stains are applied to the sample via antibody coupling. Such stains can be removed using techniques such as cleavage of disulfide linkages via treatment with a reducing agent and detergent washing, chaotropic salt treatment, treatment with antigen retrieval solution, and treatment with an acidic glycine buffer. Methods for multiplexed staining and destaining are described, for example, in Bolognesi et al., J. Histochem. Cytochem. 2017; 65(8): 431-444, Lin et al., Nat Commun. 2015; 6:8390, Pirici et al., J. Histochem. Cytochem. 2009; 57:567-75, and Glass et al., J. Histochem. Cytochem. 2009; 57:899-905, the entire contents of each of which are incorporated herein by reference.
(vi) Isometric Expansion
[0230] In some embodiments, a biological sample embedded in a matrix (e.g., a hydrogel) can be isometrically expanded. Isometric expansion methods that can be used include hydration, a preparative step in expansion microscopy, as described in Chen et al., Science 347(6221):543-548, 2015.
[0231] Isometric expansion can be performed by anchoring one or more components of a biological sample to a gel, followed by gel formation, proteolysis, and swelling. In some embodiments, analytes in the sample, products of the analytes, and/or probes associated with analytes in the sample can be anchored to the matrix (e.g., hydrogel). Isometric expansion of the biological sample can occur prior to immobilization of the biological sample on a substrate, or after the biological sample is immobilized to a substrate. In some embodiments, the isometrically expanded biological sample can be removed from the substrate prior to contacting the substrate with probes disclosed herein.
[0232] In general, the steps used to perform isometric expansion of the biological sample can depend on the characteristics of the sample (e.g., thickness of tissue section, fixation, cross-linking), and/or the analyte of interest (e.g., different conditions to anchor RNA, DNA, and protein to a gel).
[0233] In some embodiments, proteins in the biological sample are anchored to a swellable gel such as a polyelectrolyte gel. An antibody can be directed to the protein before, after, or in conjunction with being anchored to the swellable gel. DNA and/or RNA in a biological sample can also be anchored to the swellable gel via a suitable linker. Examples of such linkers include, but are not limited to, 6-((Acryloyl)amino) hexanoic acid (Acryloyl-X SE) (available from ThermoFisher, Waltham, MA), Label-IT Amine (available from MirusBio, Madison, WI) and Label X (described for example in Chen et al., Nat. Methods 13:679-684, 2016, the entire contents of which are incorporated herein by reference).
[0234] Isometric expansion of the sample can increase the spatial resolution of the subsequent analysis of the sample. The increased resolution in spatial profiling can be determined by comparison of an isometrically expanded sample with a sample that has not been isometrically expanded.
[0235] In some embodiments, a biological sample is isometrically expanded to a size at least 2x, 2.1x, 2.2x, 2.3x, 2.4x, 2.5x, 2.6x, 2.7x, 2.8x, 2.9x, 3x, 3.1x, 3.2x, 3.3x, 3.4x, 3.5x, 3.6x, 3.7x, 3.8x, 3.9x, 4x, 4.1x, 4.2x, 4.3x, 4.4x, 4.5x, 4.6x, 4.7x, 4.8x, or 4.9x its non-expanded size. In some embodiments, the sample is isometrically expanded to at least 2x and less than 20x of its non-expanded size.
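As a non-limiting illustration of the relationship between expansion and spatial resolution, the following minimal sketch estimates the effective resolution after isometric expansion. It assumes, for illustration only, that the expansion factor is a linear (per-axis) factor and that the effective resolution scales as the resolution of the imaging system divided by that factor; the function names and the example values are hypothetical.

```python
def effective_resolution_nm(optical_resolution_nm, expansion_factor):
    """Estimate resolution in pre-expansion sample coordinates.

    Assumes uniform (isometric) linear expansion, so distances in the
    expanded sample are `expansion_factor` times the original distances.
    """
    if expansion_factor <= 0:
        raise ValueError("expansion_factor must be positive")
    return optical_resolution_nm / expansion_factor


def in_example_range(expansion_factor):
    """Check the 'at least 2x and less than 20x' range given in the text."""
    return 2.0 <= expansion_factor < 20.0
```

Under these assumptions, an imaging system that resolves 300 nm in an unexpanded sample would resolve features about 75 nm apart (in original sample coordinates) after a 4x expansion.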
(vii) Crosslinking and De-crosslinking
[0236] In some embodiments, the biological sample is reversibly cross-linked prior to or during an in situ assay. In some aspects, the analytes, polynucleotides and/or amplification product (e.g., amplicon) of an analyte or a probe bound thereto can be anchored to a polymer matrix. For example, the polymer matrix can be a hydrogel. In some embodiments, one or more of the polynucleotide probe(s) and/or amplification product (e.g., amplicon) thereof can be modified to contain functional groups that can be used as an anchoring site to attach the polynucleotide probes and/or amplification product to a polymer matrix. In some embodiments, a modified probe comprising oligo dT may be used to bind to mRNA molecules of interest, followed by reversible crosslinking of the mRNA molecules.
[0237] A hydrogel may include a macromolecular polymer gel including a network. Within the network, some polymer chains can optionally be cross-linked, although crosslinking does not always occur.
[0238] In some embodiments, a hydrogel can include hydrogel subunits, such as, but not limited to, acrylamide, bis-acrylamide, polyacrylamide and derivatives thereof, poly(ethylene glycol) and derivatives thereof (e.g., PEG-acrylate (PEG-DA), PEG-RGD), gelatin-methacryloyl (GelMA), methacrylated hyaluronic acid (MeHA), polyaliphatic polyurethanes, polyether polyurethanes, polyester polyurethanes, polyethylene copolymers, polyamides, polyvinyl alcohols, polypropylene glycol, poly tetramethylene oxide, polyvinyl pyrrolidone, polyacrylamide, poly (hydroxy ethyl acrylate), and poly (hydroxy ethyl methacrylate), collagen, hyaluronic acid, chitosan, dextran, agarose, gelatin, alginate, protein polymers, methylcellulose, and the like, and combinations thereof.
[0239] In some embodiments, a hydrogel includes a hybrid material, e.g., the hydrogel material includes elements of both synthetic and natural polymers. Examples of suitable hydrogels are described, for example, in U.S. Patent Nos. 6,391,937, 9,512,422, and 9,889,422, and in U.S. Patent Application Publication Nos. 2017/0253918, 2018/0052081 and 2010/0055733, the entire contents of each of which are incorporated herein by reference.
[0240] In some embodiments, the hydrogel can form the substrate. In some embodiments, the substrate includes a hydrogel and one or more second materials. In some embodiments, the hydrogel is placed on top of one or more second materials. For example, the hydrogel can be pre-formed and then placed on top of, underneath, or in any other configuration with one or more second materials. In some embodiments, hydrogel formation occurs after contacting one or more second materials during formation of the substrate. Hydrogel formation can also occur within a structure (e.g., wells, ridges, projections, and/or markings) located on a substrate.
[0241] In some embodiments, hydrogel formation on a substrate occurs before, contemporaneously with, or after probes are provided to the sample. For example, hydrogel formation can be performed on the substrate already containing the probes.
[0242] In some embodiments, hydrogel formation occurs within a biological sample. In some embodiments, a biological sample (e.g., tissue section) is embedded in a hydrogel. In some embodiments, hydrogel subunits are infused into the biological sample, and polymerization of the hydrogel is initiated by an external or internal stimulus.
[0243] In embodiments in which a hydrogel is formed within a biological sample, functionalization chemistry can be used. In some embodiments, functionalization chemistry includes hydrogel-tissue chemistry (HTC). Any hydrogel-tissue backbone (e.g., synthetic or native) suitable for HTC can be used for anchoring biological macromolecules and modulating functionalization. Non-limiting examples of methods using HTC backbone variants include CLARITY, PACT, ExM, SWITCH and ePACT. In some embodiments, hydrogel formation within a biological sample is permanent. For example, biological macromolecules can permanently adhere to the hydrogel allowing multiple rounds of interrogation. In some embodiments, hydrogel formation within a biological sample is reversible.
[0244] In some embodiments, additional reagents are added to the hydrogel subunits before, contemporaneously with, and/or after polymerization. For example, additional reagents can include but are not limited to oligonucleotides (e.g., probes), endonucleases to fragment DNA, fragmentation buffer for DNA, DNA polymerase enzymes, dNTPs used to amplify the nucleic acid and to attach the barcode to the amplified fragments. Other enzymes can be used, including without limitation, RNA polymerase, ligase, proteinase K, and DNAse. Additional reagents can also include reverse transcriptase enzymes, including enzymes with terminal transferase activity, primers, and switch oligonucleotides. In some embodiments, optical labels are added to the hydrogel subunits before, contemporaneously with, and/or after polymerization.
[0245] In some embodiments, HTC reagents are added to the hydrogel before, contemporaneously with, and/or after polymerization. In some embodiments, a cell labelling agent is added to the hydrogel before, contemporaneously with, and/or after polymerization. In some embodiments, a cell-penetrating agent is added to the hydrogel before, contemporaneously with, and/or after polymerization.
[0246] Hydrogels embedded within biological samples can be cleared using any suitable method. For example, electrophoretic tissue clearing methods can be used to remove biological macromolecules from the hydrogel-embedded sample. In some embodiments, a hydrogel-embedded sample is stored before or after clearing of hydrogel, in a medium (e.g., a mounting medium, methylcellulose, or other semi-solid mediums).
[0247] In some embodiments, a method disclosed herein comprises de-crosslinking the reversibly cross-linked biological sample. The de-crosslinking does not need to be complete. In some embodiments, only a portion of crosslinked molecules in the reversibly cross-linked biological sample are de-crosslinked and allowed to migrate.
(viii) Tissue Permeabilization and Treatment
[0248] In some embodiments, a biological sample can be permeabilized to facilitate transfer of species (such as probes) into the sample. If a sample is not permeabilized sufficiently, the amount of species (such as probes) in the sample may be too low to enable adequate analysis. Conversely, if the tissue sample is too permeable, the relative spatial relationship of the analytes within the tissue sample can be lost. Hence, a balance between permeabilizing the tissue sample enough to obtain good signal intensity while still maintaining the spatial resolution of the analyte distribution in the sample is desirable.
[0249] In general, a biological sample can be permeabilized by exposing the sample to one or more permeabilizing agents. Suitable agents for this purpose include, but are not limited to, organic solvents (e.g., acetone, ethanol, and methanol), cross-linking agents (e.g., paraformaldehyde), detergents (e.g., saponin, Triton X-100™ or Tween-20™), and enzymes (e.g., trypsin, proteases). In some embodiments, the biological sample can be incubated with a cellular permeabilizing agent to facilitate permeabilization of the sample. Additional methods for sample permeabilization are described, for example, in Jamur et al., Methods Mol. Biol. 588:63-66, 2010, the entire contents of which are incorporated herein by reference. Any suitable method for sample permeabilization can generally be used in connection with the samples described herein.
[0250] In some embodiments, the biological sample can be permeabilized by adding one or more lysis reagents to the sample. Examples of suitable lysis agents include, but are not limited to, bioactive reagents such as lysis enzymes that are used for lysis of different cell types, e.g., gram positive or negative bacteria, plants, yeast, mammalian, such as lysozymes, achromopeptidase, lysostaphin, labiase, kitalase, lyticase, and a variety of other commercially available lysis enzymes.
[0251] Other lysis agents can additionally or alternatively be added to the biological sample to facilitate permeabilization. For example, surfactant-based lysis solutions can be used to lyse sample cells. Lysis solutions can include ionic surfactants such as, for example, sarcosyl and sodium dodecyl sulfate (SDS). More generally, chemical lysis agents can include, without limitation, organic solvents, chelating agents, detergents, surfactants, and chaotropic agents.
[0252] In some embodiments, the biological sample can be permeabilized by non-chemical permeabilization methods. Non-chemical permeabilization methods that can be used include, but are not limited to, physical lysis techniques such as electroporation, mechanical permeabilization methods (e.g., bead beating using a homogenizer and grinding balls to mechanically disrupt sample tissue structures), acoustic permeabilization (e.g., sonication), and thermal lysis techniques such as heating to induce thermal permeabilization of the sample.
[0253] Additional reagents can be added to a biological sample to perform various functions prior to analysis of the sample. In some embodiments, DNase and RNase inactivating agents or inhibitors such as proteinase K, and/or chelating agents such as EDTA, can be added to the sample. For example, a method disclosed herein may comprise a step for increasing accessibility of a nucleic acid for binding, e.g., a denaturation step to open up DNA in a cell for hybridization by a probe. For example, proteinase K treatment may be used to free up DNA with proteins bound thereto.
(ix) Selective Enrichment of RNA or cDNA Species
[0254] In some embodiments, where RNA or cDNA is the analyte, one or more RNA or cDNA analyte species of interest can be selectively enriched. For example, one or more species of RNA or cDNA of interest can be selected by addition of one or more oligonucleotides to the sample. In some embodiments, the additional oligonucleotide is a sequence used for priming a reaction by an enzyme (e.g., a polymerase). For example, one or more primer sequences with sequence complementarity to one or more RNAs or cDNAs of interest can be used to amplify the one or more RNAs or cDNAs of interest, thereby selectively enriching these RNAs or cDNAs.
[0255] In some aspects, when two or more analytes are analyzed, a first probe and a second probe that are specific for (e.g., specifically hybridize to) each RNA or cDNA analyte are used. For example, in some embodiments of the methods provided herein, templated ligation is used to detect gene expression in a biological sample. An analyte of interest (such as a protein), bound by a labelling agent or binding agent (e.g., an antibody or epitope binding fragment thereof), wherein the binding agent is conjugated or otherwise associated with a reporter
oligonucleotide comprising a reporter sequence that identifies the binding agent, can be targeted for analysis. Probes may be hybridized to the reporter oligonucleotide and ligated in a templated ligation reaction to generate a product for analysis. In some embodiments, gaps between the probe oligonucleotides may first be filled prior to ligation, using, for example, Mu polymerase, DNA polymerase, RNA polymerase, reverse transcriptase, VENT polymerase, Taq polymerase, and/or any combinations, derivatives, and variants (e.g., engineered mutants) thereof. In some embodiments, the assay can further include amplification of templated ligation products (e.g., by multiplex PCR).
[0256] In some embodiments, the analytes may be further enriched for in situ readout by immobilization at a location in the biological sample. In a non-limiting example, the analytes may comprise one or more fragments that are specific to a location in the biological sample.
[0257] Alternatively, one or more species of RNA can be down-selected (e.g., removed) using any of a variety of methods. For example, probes that selectively hybridize to ribosomal RNA (rRNA) can be administered to a sample, thereby reducing the pool and concentration of rRNA in the sample. Additionally or alternatively, duplex-specific nuclease (DSN) treatment can remove rRNA (see, e.g., Archer, et al., Selective and flexible depletion of problematic sequences from RNA-seq libraries at the cDNA stage, BMC Genomics, 15 401, (2014), the entire contents of which are incorporated herein by reference). Furthermore, hydroxyapatite chromatography can remove abundant species (e.g., rRNA) (see, e.g., Vandernoot, V.A., cDNA normalization by hydroxyapatite chromatography to enrich transcriptome diversity in RNA-seq applications, Biotechniques, 53(6) 373-80, (2012), the entire contents of which are incorporated herein by reference).
[0258] A biological sample may comprise one or a plurality of analytes of interest. Methods for performing multiplexed assays to analyze two or more different analytes in a single biological sample are provided.
V. Compositions and Kits
[0259] In some aspects, provided herein are compositions and kits comprising any of the reagents for sequencing nucleic acids according to any of the embodiments described herein. Such compositions can comprise, but are not limited to, nucleic acid molecules, nucleotides conjugated to reversible labels such as fluorophores, nucleotides comprising reversible terminators, polymerases, chelators (e.g., EDTA), and salts and buffer solutions. Also provided herein are kits for analyzing an analyte in a biological sample according to any of the methods described herein.
[0260] In some embodiments, the kits may comprise, e.g., one or more reagents for detecting one or more target analytes, and instructions for performing one or more steps of the methods provided herein. In some embodiments, the one or more reagents for performing the methods provided herein may include, e.g., nucleotides, modified nucleotides, polymerases and/or other enzymes, hybridization probes for detection, circularizable probes for amplification, nucleic acid primers, buffers, etc.
[0261] In some embodiments, the kits may comprise one or more nucleotide mixtures comprising any combination of reversibly-terminated (e.g., 3’-OH reversibly terminated) and/or non-terminated nucleotides selected from A, T/U, G, and C.
[0262] In any of the embodiments herein, each terminated or non-terminated (e.g., 3’-OH reversibly terminated) nucleotide of a different base can be labeled with a different detectable label (e.g., a different fluorophore).
[0263] In some embodiments, the kits may further comprise one or more reagents required for one or more steps comprising hybridization, ligation, extension, amplification, detection, and/or sample preparation as described herein, including, for example, wash buffers and/or ligation buffers.
[0264] In some embodiments, the kit further comprises an enzyme such as a ligase and/or a polymerase described herein.
[0265] In some embodiments, the kit comprises a polymerase, for instance for performing extension of the primers and to incorporate nucleotides.
[0266] In some embodiments, the kits contain reagents for fixing, embedding, and/or permeabilizing the biological sample.
[0267] In some embodiments, the kits may contain reagents for forming a functionalized matrix (e.g., a hydrogel) and/or for functionalizing a matrix (e.g., a hydrogel) with any suitable functional moieties. In some examples, also provided are buffers and reagents for tethering the probes and products (e.g., RCA products) to the functionalized matrix.
[0268] The various components of the kit may be present in separate containers or certain compatible components may be pre-combined into a single container. In some embodiments, the kits further contain instructions for using the components of the kit to practice the provided methods.
VI. Opto-Fluidic Instruments, Codebook Databases, and Computer Systems for Analysis of Biological Samples
[0269] Also provided herein are instrument systems configured to perform any of the methods or processes described herein, and databases storing codebooks generated using the disclosed methods or processes.
[0270] The disclosed systems may comprise, for example, one or more processors; and a memory communicatively coupled to the one or more processors and configured to store instructions that, when executed by the one or more processors, cause the system to: receive a plurality of images of a biological sample, wherein the plurality of images comprises images acquired in a plurality of sequencing or probing cycles; detect, based on the plurality of images, a series of optical signals at one or more locations in the biological sample corresponding to one or more barcoded target analytes; determine, based on the series of optical signals detected in the plurality of images, a code word comprising a series of ON and OFF bits that corresponds to a barcode for one of the one or more barcoded target analytes; and identify the barcoded target analyte based on a comparison of the determined code word to a codebook, wherein the code word corresponds to a member of a codebook comprising a plurality of code words for which:
Hamming Distance (Wi | Wj, Wm | Wn) > K
for all possible combinations of code words Wi, Wj, Wm, Wn, wherein Wi | Wj is a logical bitwise OR combination of any two code words Wi and Wj, wherein Wm | Wn is a logical bitwise OR combination of any two code words Wm and Wn, wherein K is an integer value greater than or equal to 1; wherein the codebook comprises L code words, and wherein i, j, m, and n are integers ranging in value from 0 to L - 1 and represent indices of the code words in the codebook.
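The codebook condition above can be checked by brute force. The following is a minimal sketch, not the applicants' implementation: it assumes code words are stored as Python integers (one bit per probing cycle and channel), that "any two code words" means an unordered pair of distinct code words, and that the comparison is made between two different pairs. The function names `min_or_distance` and `is_or_robust` are hypothetical.

```python
from itertools import combinations


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions at which two code words differ."""
    return bin(a ^ b).count("1")


def min_or_distance(codebook: list[int]) -> int:
    """Smallest Hamming distance between the bitwise ORs of two distinct pairs.

    Assumption: each pair (i, j) uses two different code words, and a pair
    is never compared with itself.
    """
    # Bitwise OR of every unordered pair of distinct code words.
    or_words = [codebook[i] | codebook[j]
                for i, j in combinations(range(len(codebook)), 2)]
    if len(or_words) < 2:
        raise ValueError("need at least three code words to compare two pairs")
    return min(hamming_distance(a, b) for a, b in combinations(or_words, 2))


def is_or_robust(codebook: list[int], k: int) -> bool:
    """True if every two OR combinations differ in more than k bits."""
    return min_or_distance(codebook) > k


# Three 6-bit code words with non-overlapping ON bits (illustrative values).
example = [0b000011, 0b001100, 0b110000]
print(min_or_distance(example))   # 4
print(is_or_robust(example, 3))   # True
print(is_or_robust(example, 4))   # False
```

This exhaustive check compares every pair of pairs, so its cost grows roughly with the fourth power of L; it is intended only to make the condition concrete for small codebooks.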
[0271] The disclosed databases for storing codebooks may comprise, for example, one or more non-transitory computer-readable storage medium components, the one or more non-transitory computer-readable storage medium components individually or collectively storing a codebook comprising a plurality of code words for which:
Hamming Distance (Wi | Wj, Wm | Wn) > K
for all possible combinations of code words Wi, Wj, Wm, Wn, wherein Wi | Wj is a logical bitwise OR combination of any two code words Wi and Wj, wherein Wm | Wn is a logical bitwise OR combination of any two code words Wm and Wn, wherein K is an integer value greater than or equal to 1, wherein the codebook comprises L code words, and wherein i, j, m, and n are integers ranging in value from 0 to L - 1 and represent indices of the code words in the codebook.
[0272] The disclosed instrument systems may comprise instruments having integrated optics and fluidics modules (e.g., “opto-fluidic instruments” or “opto-fluidic systems”) for detecting target molecules (e.g., nucleic acids, proteins, antibodies, etc.) in biological samples (e.g., one or more cells or a tissue sample) as described herein. In an opto-fluidic instrument, the fluidics module is configured to deliver one or more reagents (e.g., detectably labeled nucleotides, polymerases, or conjugates) to the biological sample and/or remove spent reagents therefrom. Additionally, the optics module is configured to illuminate the biological sample with light having one or more spectral emission curves (over a range of wavelengths) and subsequently capture one or more images of emitted light signals from the biological sample during one or more sequencing cycles (e.g., as described in Section III). In some embodiments, an in situ assay (e.g., sequencing a template nucleic acid) disclosed herein can be performed using an automated instrument or system, e.g., an opto-fluidic instrument or system disclosed herein.
[0273] In various embodiments, the captured images may be processed in real time and/or at a later time to determine the presence of the one or more target molecules in the biological sample, as well as three-dimensional position information associated with each detected target molecule. Additionally, the opto-fluidic instrument includes a sample module configured to receive (and, optionally, secure) one or more biological samples. In some instances, the sample module includes an X-Y stage configured to move the biological sample along an X-Y plane (e.g., perpendicular to an objective lens of the optics module).
[0274] In various embodiments, the opto-fluidic instrument is configured to analyze one or more target molecules in their naturally occurring place (i.e., in situ) within the biological sample. For example, an opto-fluidic instrument may be an in situ analysis system used to analyze a biological sample and detect target molecules (e.g., analytes) including but not limited to DNA, RNA, proteins, antibodies, and/or the like.
[0275] An opto-fluidic instrument can be used for in situ target molecule detection via base-by-base sequencing (e.g., sequencing of an identifier sequence such as a barcode sequence) and/or another imaging or target molecule detection technique. For example, an opto-fluidic instrument may include a fluidics module that includes fluids needed for establishing the experimental conditions required for the probing of target molecules in the sample. Further, such an opto-fluidic instrument may also include a sample module configured to receive the sample, and an optics module including an imaging system for illuminating (e.g., exciting one or more fluorescently labeled nucleotides within the sample) and/or imaging light signals received from the sample. The in situ analysis system may also include other ancillary modules configured to facilitate the operation of the opto-fluidic instrument, such as, but not limited to, cooling systems, motion calibration systems, etc.
[0276] In volumetric sample imaging systems (e.g., an opto-fluidic instrument), a z-stack of images is obtained for each Field of View (FOV) of the objective (FIG. 7). For such automated, high-throughput tissue imaging applications, automatically identifying relevant regions (those regions that contain target molecules such as nucleic acids or proteins) can be challenging because the distribution of tissue is non-uniform in many biological samples (FIG. 8).
[0277] The data extracted from the detection and analysis methods disclosed herein (e.g., in situ detection and analysis of target analytes, such as SBS, SBL, SBH; and in situ hybridization techniques, such as smFISH and MERFISH) include the relative coordinates within a field of view (FOV) and provide intricate information regarding tissue organization. In general, the systems and methods described herein use any suitable method to generate contrast of a sample against a background (e.g., illumination of a sample via bright field imaging, illumination of a sample via fluorescent imaging, inducing autofluorescence within the sample, adding contrast to the sample with one or more stains, etc.).
[0278] FIG. 9 shows an example workflow of analysis of a biological sample 910 (e.g., cell or tissue sample) using an opto-fluidic instrument 900, according to various embodiments. In various embodiments, the sample 910 can be a biological sample (e.g., a tissue) that includes molecules such as DNA, RNA, proteins, antibodies, etc. For example, the sample 910 can be a sectioned tissue that is treated to access the RNA thereof for labeling with circularizable DNA probes. Ligation of the probes may generate a circular DNA probe which can be enzymatically amplified and bound with fluorescent oligonucleotides, which can create bright signal that is convenient to image and has a high signal-to-noise ratio.
[0279] In various embodiments, the sample 910 may be placed in the opto-fluidic instrument 900 for analysis and detection of the molecules in the sample 910. In various embodiments, the opto-fluidic instrument 900 can be a system configured to facilitate the experimental conditions conducive for the detection of the target molecules. For example, the opto-fluidic instrument 900 can include a fluidics module 930, an optics module 940, a sample module 950, and an ancillary module 960, and these modules may be operated by a system controller 920 to create the experimental conditions for the probing of the molecules in the sample 910 by selected probes (e.g., circularizable DNA probes), as well as to facilitate the imaging of the probed sample (e.g., by an imaging system of the optics module 940). In various embodiments, the various modules of the opto-fluidic instrument 900 may be separate components in communication with each other, or at least some of them may be integrated together.
[0280] In various embodiments, the sample module 950 may be configured to receive the sample 910 into the opto-fluidic instrument 900. For instance, the sample module 950 may include a sample interface module (SIM) that is configured to receive a sample device (e.g., cassette) onto which the sample 910 can be deposited. That is, the sample 910 may be placed in the opto-fluidic instrument 900 by depositing the sample 910 (e.g., the sectioned tissue) on a sample device that is then inserted into the SIM of the sample module 950. In some instances, the sample module 950 may also include an X-Y stage onto which the SIM is
mounted. The X-Y stage may be configured to move the SIM mounted thereon (e.g., and as such the sample device containing the sample 910 inserted therein) in perpendicular directions along the two-dimensional (2D) plane of the opto-fluidic instrument 900.
[0281] The experimental conditions that are conducive for the detection of the molecules in the sample 910 may depend on the target molecule detection technique that is employed by the opto-fluidic instrument 900. For example, in various embodiments, the opto-fluidic instrument 900 can be a system that is configured to detect molecules in the sample 910 via hybridization of probes. In such cases, the experimental conditions can include molecule hybridization conditions that result in the intensity of hybridization of the target molecule (e.g., nucleic acid) to a probe (e.g., oligonucleotide) being significantly higher when the probe sequence is complementary to the target molecule than when there is a single-base mismatch. The hybridization conditions include the preparation of the sample 910 using reagents such as washing/stripping reagents, hybridizing reagents, etc., and such reagents may be provided by the fluidics module 930.
[0282] In various embodiments, the fluidics module 930 may include one or more components that may be used for storing the reagents, as well as for transporting said reagents to and from the sample device containing the sample 910. For example, the fluidics module 930 may include reservoirs configured to store the reagents, as well as a waste container configured for collecting the reagents (e.g., and other waste) after use by the opto-fluidic instrument 900 to analyze and detect the molecules of the sample 910. Further, the fluidics module 930 may also include pumps, tubes, pipettes, etc., that are configured to facilitate the transport of the reagent to the sample device (e.g., and as such the sample 910). For instance, the fluidics module 930 may include pumps (“reagent pumps”) that are configured to pump washing/stripping reagents to the sample device for use in washing/stripping the sample 910 (e.g., as well as other washing functions such as washing an objective lens of the imaging system of the optics module 940).
[0283] In various embodiments, the ancillary module 960 can be a cooling system of the opto-fluidic instrument 900, and the cooling system may include a network of coolant-carrying tubes that are configured to transport coolants to various modules of the opto-fluidic instrument 900 for regulating the temperatures thereof. In such cases, the fluidics module 930 may include coolant reservoirs for storing the coolants and pumps (e.g., “coolant pumps”) for generating a pressure differential, thereby forcing the coolants to flow from the reservoirs to
the various modules of the opto-fluidic instrument 900 via the coolant-carrying tubes. In some instances, the fluidics module 930 may include returning coolant reservoirs that may be configured to receive and store returning coolants, i.e., heated coolants flowing back into the returning coolant reservoirs after absorbing heat discharged by the various modules of the opto-fluidic instrument 900. In such cases, the fluidics module 930 may also include cooling fans that are configured to force air (e.g., cool and/or ambient air) into the returning coolant reservoirs to cool the heated coolants stored therein. In some instances, the fluidics module 930 may also include cooling fans that are configured to force air directly into a component of the opto-fluidic instrument 900 so as to cool said component. For example, the fluidics module 930 may include cooling fans that are configured to direct cool or ambient air into the system controller 920 to cool the same.
[0284] As discussed above, the opto-fluidic instrument 900 may include an optics module 940 which includes the various optical components of the opto-fluidic instrument 900, such as but not limited to a camera, an illumination module (e.g., light source such as LEDs), an objective lens, and/or the like. The optics module 940 may include a fluorescence imaging system that is configured to image the fluorescence emitted by the probes (e.g., oligonucleotides) in the sample 910 after the probes are excited by light from the illumination module of the optics module 940.
[0285] In some instances, the optics module 940 may also include an optical frame onto which the camera, the illumination module, and/or the X-Y stage of the sample module 950 may be mounted.
[0286] In various embodiments, the system controller 920 may be configured to control the operations of the opto-fluidic instrument 900 (e.g., and the operations of one or more modules thereof). In some instances, the system controller 920 may take various forms, including a processor, a single computer (or computer system), or multiple computers in communication with each other. In various embodiments, the system controller 920 may be communicatively coupled with data storage, a set of input devices, a display system, or a combination thereof. In some cases, some or all of these components may be considered to be part of or otherwise integrated with the system controller 920, may be separate components in communication with each other, or may be integrated together. In other examples, the system controller 920 can be, or may be in communication with, a cloud computing platform.
[0287] In various embodiments, the opto-fluidic instrument 900 may analyze the sample 910 and may generate the output 970 that includes indications of the presence of the target molecules in the sample 910. For instance, with respect to the example embodiment discussed above where the opto-fluidic instrument 900 employs a hybridization technique for detecting molecules, the opto-fluidic instrument 900 may cause the sample 910 to undergo successive rounds of fluorescent probe hybridization (using two or more sets of fluorescent probes, where each set of fluorescent probes is excited by a different color channel) and be imaged to detect target molecules in the probed sample 910. In such cases, the output 970 may include optical signatures (e.g., a code word) specific to each gene, which allow the identification of the target molecules.
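As an illustration of how successive imaging rounds can yield a code word, the sketch below thresholds the intensity measured at one location in each round and color channel and concatenates the resulting ON/OFF bits. The fixed threshold, the round-major bit order, the intensity values, and the function name `signals_to_code_word` are assumptions made for this example; the disclosure does not prescribe them.

```python
def signals_to_code_word(intensities: list[list[float]], threshold: float) -> str:
    """Convert per-round, per-channel intensities at one location into bits.

    intensities[r][c] is the signal in round r and color channel c.
    A bit is ON ("1") when the signal exceeds the threshold (an assumed,
    deliberately simple rule), otherwise OFF ("0").
    """
    bits = []
    for round_signals in intensities:
        for signal in round_signals:
            bits.append("1" if signal > threshold else "0")
    return "".join(bits)


# Three rounds, two color channels each (made-up intensity values).
spot = [
    [0.9, 0.1],  # round 1: channel 1 ON, channel 2 OFF
    [0.2, 0.8],  # round 2: channel 1 OFF, channel 2 ON
    [0.1, 0.0],  # round 3: both OFF
]
print(signals_to_code_word(spot, threshold=0.5))  # 100100
```

With R rounds and C channels per round, this layout produces a code word of R x C bits.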
[0288] In some instances, an assembly for transilluminating a substrate can include a sample carrier device (e.g., a microfluidic chip or glass slide), a thermal control module configured to control the temperature of the sample carrier device (e.g., a thermoelectric module), and a light source configured to illuminate the sample carrier device. In some instances, the assembly includes a heat exchanger (e.g., a fluid block having a cooling fluid flowing therethrough). In some instances, an assembly for transilluminating can include a sample carrier device (e.g., a sample substrate), an optically transparent substrate, a light source configured to illuminate the optically transparent substrate, a light scattering layer configured to scatter light from the light source, and/or a thermal control module configured to control the temperature of the sample carrier device and/or optically transparent substrate.
[0289] In some embodiments, the sample carrier device (e.g., a cassette) can be configured to receive a sample. In some embodiments, the sample carrier device can include one or more microfluidic channels, e.g., sample chambers or microfluidic channels etched into a planar substrate or chambers within a flow cell or microfluidic device.
[0290] A sample carrier device for the systems disclosed herein can include, but is not limited to, a substrate configured to receive a sample, a microscope slide and/or an adapter configured to mount microscope slides (with or without coverslips) on a microscope stage or automated stage (e.g., an automated translation or rotational stage), a substrate, and/or an adapter configured to mount slides on a microscope stage or automated stage, a substrate comprising etched sample containment chambers (e.g., chambers open to the environment) and/or an adapter configured to mount such substrates on a microscope stage or automated stage, a flow cell and/or an adapter configured to mount flow cells on a microscope stage or
automated stage, or a microfluidic device and/or an adapter configured to mount microfluidic devices on a microscope stage or automated stage. In some embodiments, the sample carrier device further includes a cassette configured to secure a substrate (e.g., a glass slide). In some embodiments, the cassette includes two or more components (e.g., a top half and a bottom half) into which the substrate is secured.
[0291] In some instances, the one or more sample carrier devices can be designed for performing a variety of chemical analysis, biochemical analysis, nucleic acid analysis, cell analysis, or tissue analysis applications. In some instances, for example, the sample carrier device (e.g., flow cells and microfluidic devices) may comprise a sample, e.g., a tissue sample. In some instances, the sample carrier device (e.g., flow cells and microfluidic devices) may comprise a sample, e.g., a tissue sample, placed in contact with, e.g., a substrate (e.g., a surface of the flow cell or microfluidic device).
[0292] The sample carrier devices for the disclosed systems (e.g., microscope slides, substrates comprising one or more etched microfluidic channels, flow cells or microfluidic devices comprising one or more microfluidic channels, etc.) can be fabricated from any of a variety of materials known to those of skill in the art including, but not limited to, glass (e.g., borosilicate glass, soda lime glass, etc.), fused silica (quartz), silicon, polymer (e.g., polystyrene (PS), macroporous polystyrene (MPPS), polymethylmethacrylate (PMMA), polycarbonate (PC), polypropylene (PP), polyethylene (PE), high density polyethylene (HDPE), cyclic olefin polymers (COP), cyclic olefin copolymers (COC), polyethylene terephthalate (PET), polydimethylsiloxane (PDMS), etc.), polyetherimide (PEI) and perfluoroelastomer (FFKM) as more chemically inert alternatives, or any combination thereof. FFKM is also known as Kalrez.
[0293] The one or more materials used to fabricate sample carrier devices for the disclosed systems (e.g., substrates configured to receive a sample, microscope slides, substrates comprising one or more etched microfluidic channels, flow cells or microfluidic devices comprising one or more microfluidic channels or sample chambers, etc.) can be optically transparent to facilitate use with spectroscopic or imaging-based detection techniques. In some instances, the entire sample carrier device can be optically transparent. Alternatively, in some instances, only a portion of the sample carrier device (e.g., an optically transparent “window”) can be optically transparent.
[0294] The sample carrier devices for the disclosed systems (e.g., substrates configured to receive a sample, microscope slides, substrates comprising one or more etched microfluidic channels, flow cells or microfluidic devices comprising one or more microfluidic channels or sample chambers, etc.) can be fabricated using any of a variety of techniques known to those of skill in the art, where the choice of fabrication technique is often dependent on the choice of material used, and vice versa. Examples of suitable sample carrier device fabrication techniques include, but are not limited to, extrusion, drawing, precision computer numerical control (CNC) machining and boring, laser photoablation, photolithography in combination with wet chemical etching, deep reactive ion etching (DRIE), micro-molding, embossing, 3D-printing, thermal bonding, adhesive bonding, anodic bonding, and the like (see, e.g., Gale, et al. (2018), “A Review of Current Methods in Microfluidic Device Fabrication and Future Commercialization Prospects”, Inventions 3, 60, 1 - 25, which is hereby incorporated by reference in its entirety).
[0295] FIG. 10A illustrates a cross-sectional view of an optics module 1000 in an imaging system. One or more illumination sources 1010, e.g., one or more light emitting diodes (LEDs), provides light through one or more optical components and an objective lens 1020 to thereby illuminate a sample 1050. In various embodiments, the optical components include a collimator 1011. In various embodiments, the optical components include a field stop 1012. In various embodiments, the optical components include one or more excitation filters 1013. In various embodiments, the one or more excitation filters 1013 are configured to filter light from the illumination source(s) 1010 for a predetermined range of wavelengths (e.g., each filter has one or more blocking band(s) and/or transmission band(s) that may be different or may overlap at least in part) and each excitation filter 1013 is aligned with appropriate illumination sources (e.g., blue LEDs, green LEDs, yellow LEDs, red LEDs, ultraviolet LEDs, etc.). In various embodiments, the optical components include a condenser 1014. In various embodiments, the optical components include a beam splitter 1015. An optical axis 1051 is illustrated extending through the center of the optical surfaces in the objective lens 1020 and its path includes an image plane, a focal plane, and input/output pupils (illustrated in FIG. 10B).
[0296] A sensor array 1060 (e.g., CMOS sensor) receives light signals from the sample 1050. In various embodiments, the optical components include one or more emission filters 1065. In various embodiments, the one or more emission filters 1065 are configured to filter
light from the sample (e.g., emitted from one or more fluorophores, autofluorescence, etc.) for a predetermined range of wavelengths (e.g., each filter has one or more blocking band(s) and/or transmission band(s) that may be different or may overlap at least in part). In various embodiments, the emission filters 1065 align (e.g., via motorized translation) with optics and/or the sensor array. In various embodiments, the sample 1050 is probed with fluorescent probes configured to bind to a target (e.g., DNA or RNA) that, when illuminated with a particular wavelength (or range of wavelengths) of light, emit light signals that can be detected by the sensor array 1060. In various embodiments, the sample 1050 is repeatedly probed with two or more (e.g., two, three, four, five, six, etc.) different sets of probes. In various embodiments, each set of probes corresponds to a specific color (e.g., blue, green, yellow, or red) such that, when illuminated by that color, probes bound to a target emit light signals. In some embodiments, the sensor array 1060 is aligned with the optical axis 1051 of the objective lens 1020 (i.e., the optical axis of the camera is coincident with and parallel to the optical axis of the objective lens 1020). In various embodiments, the sensor array 1060 is positioned perpendicularly to the objective lens 1020 (i.e., the optical axis of the camera is perpendicular to and intersects the optical axis of the objective lens 1020). In various embodiments, a tube lens 1061 is mounted in the optical path to focus light on the sensor array 1060, thereby allowing for image formation with infinity-corrected objectives. Descriptions of optical modules and illumination assemblies for use in opto-fluidic instruments can be found in U.S. provisional patent application no. 63/427,282, filed on November 22, 2022, titled “Systems and Methods for Illuminating a Sample” and U.S. provisional patent application no. 63/427,360, filed on November 22, 2022, titled “Systems and Methods for Imaging Samples,” each of which is incorporated by reference in its entirety.
[0297] In various embodiments, the sample is illuminated with one or more wavelengths configured to induce fluorescence in the sample. In various embodiments, the sample is probed during one or more probing cycles with one or more fluorescent probes configured to bind to one or more target analytes. In various embodiments, the one or more wavelengths are selected to induce fluorescence in a subset of the one or more fluorescent probes. In various embodiments, each probing cycle includes illumination with two or more (e.g., four) colors of light. In various embodiments, the sample is treated with a fluorescent stain configured to illuminate one or more structures within the sample. In various embodiments, the sample is contacted with a nuclear stain. In various embodiments, the sample is contacted with 4',6-diamidino-2-phenylindole (“DAPI”) configured to bind to adenine-thymine-rich
regions in DNA. In various embodiments, illumination of the sample causes autofluorescence of the sample. In various embodiments, autofluorescence is the natural emission of light by biological structures when they have absorbed light, and may be used to distinguish the light originating from artificially added fluorescent markers. In various embodiments, fluorescence of the sample through fluorescent probes, autofluorescence, and/or a fluorescent stain can be used with the methods described herein to determine one or more focus metrics of a tissue sample.
[0298] In various embodiments, the sample is illuminated via edge lighting or transillumination along one or more edges of the sample and/or sample substrate. In various embodiments, the edge lighting provides dark-field illumination of the sample. In various embodiments, edge lighting is provided by one or more light sources positioned to provide light substantially perpendicular to a normal of the substrate surface on which the sample is disposed. In various embodiments, the substrate is a glass slide. In various embodiments, the substrate is configured as a wave guide to thereby guide light emitted from the edge lighting towards the sample. In various embodiments, illumination of the sample via edge lighting can be used with the methods described herein to determine one or more focus metrics of a tissue sample.
[0299] Example: A mouse brain tissue sample is provided (fresh frozen or FFPE). The tissue sample can optionally be permeabilized (FFPE is already permeabilized). The tissue sample is contacted with a plurality of barcoded probes. The tissue sample is positioned in an optofluidic instrument having an OR-robust codebook stored thereon and, in each probing cycle of a plurality of probing cycles, the tissue sample is contacted with fluorescent tags. Fluorescent blobs from the tissue sample are detected by the optofluidic instrument in each probing cycle, and the blobs are registered and/or aligned across all cycles. The optical signals are converted into an observed codeword, for example, using a probabilistic decoder. The resulting observed codewords are decoded against the OR-robust codebook stored on the instrument.
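For illustration only, the conversion and decoding steps of this example can be sketched in Python. The sketch is a simplified stand-in rather than the instrument's decoder: it replaces the probabilistic decoder with a fixed intensity threshold and assigns an observed code word to the unique valid code word within a chosen Hamming distance. The function names (binarize, hamming, assign), the threshold of 0.5, the max_distance parameter, and the three 8-bit code words are assumptions introduced here and do not appear in the disclosure.

```python
from typing import Optional, Sequence


def binarize(intensities: Sequence[float], threshold: float) -> int:
    """Pack a series of intensities into an integer bit string.

    A bit is 1 when the intensity is >= threshold, otherwise 0.
    The first intensity becomes the most significant bit.
    """
    word = 0
    for value in intensities:
        word = (word << 1) | (1 if value >= threshold else 0)
    return word


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def assign(observed: int, codebook: Sequence[int],
           max_distance: int = 0) -> Optional[int]:
    """Return the index of the unique closest valid code word.

    Returns None when the codebook is empty, when the closest code word
    is farther than max_distance, or when the closest match is tied.
    """
    if not codebook:
        return None
    distances = [hamming(observed, word) for word in codebook]
    best = min(distances)
    if best > max_distance or distances.count(best) > 1:
        return None
    return distances.index(best)


# Hypothetical codebook: two 4-bit segments per code word.
codebook = [0b10000100, 0b01000010, 0b00100001]

# Hypothetical intensities for one location (two cycles x four channels).
signal = [0.9, 0.1, 0.0, 0.2, 0.1, 0.8, 0.1, 0.0]
observed = binarize(signal, threshold=0.5)

exact = assign(observed, codebook)                       # exact match only
corrected = assign(0b10000000, codebook, max_distance=1)  # one dropped bit
rejected = assign(0b11111111, codebook, max_distance=1)   # no valid match
```

With max_distance left at 0 the sketch only accepts exact matches; a larger value allows bits to be changed to reach a valid code word, and tied candidates are rejected rather than guessed.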
[0300] FIG. 11 illustrates an example of a computing device or system in accordance with one or more examples of the disclosure. Device 1100 can be a host computer connected to a network. Device 1100 can be a client computer or a server. As shown in FIG. 11, device 1100 can be any suitable type of microprocessor-based device, such as a personal computer, workstation, server, or handheld computing device (portable electronic device), such as a
phone or tablet. The device can include, for example, one or more of processor 1110, input device 1120, output device 1130, memory / storage 1140, and communication device 1160. Input device 1120 and output device 1130 can generally correspond to those described above, and they can either be connectable or integrated with the computer.
[0301] Input device 1120 can be any suitable device that provides input, such as a touch screen, keyboard or keypad, mouse, or voice-recognition device. Output device 1130 can be any suitable device that provides output, such as a touch screen, haptics device, or speaker.
[0302] Storage 1140 can be any suitable device that provides storage, such as an electrical, magnetic, or optical memory including a RAM, cache, hard drive, or removable storage disk. Communication device 1160 can include any suitable device capable of transmitting and receiving signals over a network, such as a network interface chip or device. The components of the computer can be connected in any suitable manner, such as via a physical bus 1170 or wirelessly.
[0303] Software 1150, which can be stored in memory / storage 1140 and executed by processor 1110, can include, for example, the programming that embodies the functionality of the present disclosure (e.g., as embodied in the methods and systems described above). Software 1150 can also be stored and/or transported within any non-transitory computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch instructions associated with the software from the instruction execution system, apparatus, or device and execute the instructions. In the context of this disclosure, a computer-readable storage medium can be any medium, such as storage 1140, that can contain or store programming for use by or in connection with an instruction execution system, apparatus, or device.
[0304] Software 1150 can also be propagated within any transport medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch instructions associated with the software from the instruction execution system, apparatus, or device and execute the instructions. In the context of this disclosure, a transport medium can be any medium that can communicate, propagate, or transport programming for use by or in connection with an instruction execution system, apparatus, or device. The transport readable medium can include, but is not limited to, an
electronic, magnetic, optical, electromagnetic, or infrared wired or wireless propagation medium.
[0305] Device 1100 may be connected to a network, which can be any suitable type of interconnected communication system. The network can implement any suitable communications protocol and can be secured by any suitable security protocol. The network can comprise network links of any suitable arrangement that can implement the transmission and reception of network signals, such as wireless network connections, T1 or T3 lines, cable networks, DSL, or telephone lines.
[0306] Device 1100 can implement any operating system suitable for operating on the network. Software 1150 can be written in any suitable programming language, such as C, C++, Java, or Python. In various implementations, application software embodying the functionality of the present disclosure can be deployed in different configurations, such as in a client/server arrangement or through a web browser as a web-based application or web service, for example.
VIII. TERMINOLOGY
[0307] Unless defined otherwise, all terms of art, notations and other technical and scientific terms or terminology used herein are intended to have the same meaning as is commonly understood by one of ordinary skill in the art to which the claimed subject matter pertains. In some cases, terms with commonly understood meanings are defined herein for clarity and/or for ready reference, and the inclusion of such definitions herein should not necessarily be construed to represent a substantial difference over what is generally understood in the art.
[0308] The terms "polynucleotide" and "nucleic acid molecule," used interchangeably herein, refer to polymeric forms of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. Thus, these terms comprise, but are not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases. The backbone of the polynucleotide can comprise sugars and phosphate groups (as may typically be found in RNA or DNA), or modified or substituted sugar or phosphate groups.
[0309] A “primer” used herein can be an oligonucleotide, either natural or synthetic, that is capable, upon forming a duplex with a polynucleotide template, of acting as a point of initiation of nucleic acid synthesis and being extended from its 3' end along the template so that an extended duplex is formed. The sequence of nucleotides added during the extension process is determined by the sequence of the template nucleic acid. Primers usually are extended by a DNA polymerase.
[0310] “Ligation” may refer to the formation of a covalent bond or linkage between the termini of two or more nucleic acids, e.g., oligonucleotides and/or polynucleotides, in a template-driven reaction. The nature of the bond or linkage may vary widely and the ligation may be carried out enzymatically or chemically. As used herein, ligations are usually carried out enzymatically to form a phosphodiester linkage between a 5' carbon terminal nucleotide of one oligonucleotide with a 3' carbon of another nucleotide.
[0311] The term "about" as used herein refers to the usual error range for the respective value readily known to the skilled person in this technical field. Reference to "about" a value or parameter herein comprises (and describes) embodiments that are directed to that value or parameter per se.
[0312] As used herein, the singular forms "a," "an," and "the" comprise plural referents unless the context clearly dictates otherwise. For example, "a" or "an" means "at least one" or "one or more."
[0313] Throughout this disclosure, various aspects of the claimed subject matter are presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the claimed subject matter. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub-ranges as well as individual numerical values within that range. For example, where a range of values is provided, it is understood that each intervening value, between the upper and lower limit of that range and any other stated or intervening value in that stated range is encompassed within the claimed subject matter. The upper and lower limits of these smaller ranges may independently be comprised in the smaller ranges, and are also encompassed within the claimed subject matter, subject to any specifically excluded limit in the stated range. Where the stated range comprises one or both of the limits, ranges excluding either or both of those comprised limits
are also comprised in the claimed subject matter. This applies regardless of the breadth of the range.
[0314] Use of ordinal terms such as “first”, “second”, “third”, etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term) to distinguish the claim elements. Similarly, use of a), b), etc., or i), ii), etc. does not by itself connote any priority, precedence, or order of steps in the claims. Similarly, the use of these terms in the specification does not by itself connote any required priority, precedence, or order.
[0315] The present disclosure is not intended to be limited in scope to the particular disclosed embodiments, which are provided, for example, to illustrate various aspects of the present disclosure. Various modifications to the compositions and methods described will become apparent from the description and teachings herein. Such variations may be practiced without departing from the true scope and spirit of the disclosure and are intended to fall within the scope of the present disclosure.
Claims
1. A method comprising: receiving a codebook comprising a plurality of valid code words, wherein, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to a predetermined number; receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
2. The method of claim 1, wherein the predetermined number is 1.
3. The method of claim 1 or claim 2, wherein decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words comprises: determining a location of each observed optical signal in a first image of the plurality of images; aligning the locations of the plurality of observed optical signals in the plurality of images to obtain a series of observed optical signals at each location; and obtaining the plurality of observed code words based on the series of observed optical signals at each location.
4. The method of claim 3, wherein aligning the locations of the plurality of observed optical signals in the plurality of images comprises registering the plurality of images acquired over the plurality of sequencing or probing cycles.
5. The method of any one of claims 1 to 4, wherein each observed optical signal of the plurality of observed optical signals comprises at least one intensity value representing an intensity of the observed optical signal.
6. The method of claim 5, wherein the at least one intensity value comprises an analog intensity value.
7. The method of claim 5 or claim 6, wherein the at least one intensity value comprises a raw intensity value, a normalized intensity value, or a calculated intensity value calculated based on at least one of: a size of a feature corresponding to the observed optical signal, a circularity of a feature corresponding to the observed optical signal, or one or more Gaussian statistical parameters characterizing a feature corresponding to the observed optical signal.
8. The method of any one of claims 5 to 7, further comprising comparing the at least one intensity value representing an intensity of each observed optical signal to a predetermined intensity threshold to determine a binary value representing the intensity of each observed optical signal.
9. The method of claim 8, wherein each binary value comprises a 1 or a 0, wherein 1 represents an observed optical signal for which intensity is greater than or equal to the predetermined intensity threshold and 0 represents an observed optical signal for which intensity is less than the predetermined intensity threshold.
10. The method of claim 8 or claim 9, wherein decoding the plurality of observed optical signals in the plurality of images comprises obtaining the plurality of observed code words based on a series of binary values determined for each location.
11. The method of claim 10, wherein each observed code word of the plurality of code words comprises a plurality of code word segments, and wherein each code word segment comprises a specified string of binary values that corresponds to one of a specified set of observed optical signal states.
12. The method of claim 11, wherein each code word segment comprises a four bit string of binary values such that: a code word segment of 1000 corresponds to a first optical signal state, A, in which an optical signal is detected in a first detection channel of a four-channel optical imaging instrument, and no optical signal is detected in a second, third, or fourth detection channel of the four-channel optical imaging instrument; a code word segment of 0100 corresponds to a second optical signal state, B, in which an optical signal is detected in the second detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, third, or fourth detection channel of the four-channel optical imaging instrument; a code word segment of 0010 corresponds to a third optical signal state, C, in which an optical signal is detected in the third detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, second, or fourth detection channel of the four-channel optical imaging instrument; a code word segment of 0001 corresponds to a fourth optical signal state, D, in which an optical signal is detected in the fourth detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, second, or third detection channel of the four-channel optical imaging instrument; and a code word segment of 0000 corresponds to a fifth optical signal state, E, in which no optical signal is detected in any of the first, second, third, or fourth detection channels of the four-channel optical imaging instrument.
13. The method of any one of claims 10 to 12, wherein determining the assignment of the observed code word to one of the plurality of valid code words comprises identifying a valid code word of the plurality of valid code words that is identical to the observed code word.
14. The method of any one of claims 10 to 13, wherein determining the assignment of the observed code word to one of the plurality of valid code words comprises changing at least one of the binary values in the series of binary values corresponding to the observed code word to thereby assign the observed code word to a valid code word of the plurality of valid code words.
15. The method of any one of claims 1 to 14, wherein determining the assignment of the observed code word to one of the plurality of valid code words comprises determining a plurality of scores based on comparison of the observed code word to all or a portion of the plurality of valid code words.
16. The method of claim 15, further comprising selecting one of the plurality of valid code words having a highest score to assign as a replacement for the observed code word.
17. The method of any one of claims 1 to 16, wherein the plurality of images comprises a plurality of images comprising different fields-of-view of the biological sample.
18. The method of any one of claims 1 to 17, wherein the plurality of images comprises a plurality of z-stack images of the biological sample.
19. The method of any one of claims 1 to 18, wherein the plurality of observed optical signals represents light emitted from a plurality of fluorophores.
20. The method of any one of claims 1 to 19, further comprising identifying a target analyte in the biological sample based on the determined assignment of the observed code word to a valid code word and the codebook.
21. The method of claim 20, wherein the identified target analyte comprises a messenger RNA (mRNA) molecule or protein molecule.
22. The method of any one of claims 1 to 21, wherein each valid code word of the plurality of valid code words has a second Hamming distance of greater than or equal to 4 from every other valid code word.
23. The method of any one of claims 1 to 22, wherein the codebook comprises at least 50 valid code words.
24. The method of any one of claims 1 to 23, wherein the codebook comprises up to 200,000 valid code words.
25. A system comprising: a codebook database; a computing system comprising at least one computer-readable storage medium having program instructions stored thereon, the program instructions executable by at least
one processor of the computing system to cause the at least one processor to perform a method comprising: receiving a codebook from the codebook database comprising a plurality of valid code words, wherein, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to a predetermined number; receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
26. The system of claim 25, wherein the predetermined number is 1.
27. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform the method of any one of claims 1 to 24.
28. A method comprising: receiving a codebook comprising a plurality of valid code words, wherein, for at least a first portion of the valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to a predetermined number; receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, the plurality of images comprising a plurality of observed optical signals;
decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
29. The method of claim 28, wherein the predetermined number is 1.
30. The method of claim 28 or claim 29, wherein decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words comprises: determining a location of each observed optical signal in a first image of the plurality of images; aligning the locations of the plurality of observed optical signals in the plurality of images to obtain a series of observed optical signals at each location; and obtaining the plurality of observed code words based on the series of observed optical signals at each location.
31. The method of claim 30, wherein aligning the locations of the plurality of observed optical signals in the plurality of images comprises registering the plurality of images acquired over the plurality of sequencing or probing cycles.
32. The method of any one of claims 28 to 31, wherein each observed optical signal of the plurality of observed optical signals comprises at least one intensity value representing an intensity of the observed optical signal.
33. The method of claim 32, wherein the at least one intensity value comprises an analog intensity value.
34. The method of claim 32 or claim 33, wherein the at least one intensity value comprises a raw intensity value, a normalized intensity value, or a calculated intensity value calculated based on at least one of: a size of a feature corresponding to the observed optical signal, a circularity of a feature corresponding to the observed optical signal, or one or more Gaussian statistical parameters characterizing a feature corresponding to the observed optical signal.
35. The method of any one of claims 32 to 34, further comprising comparing the at least one intensity value representing an intensity of each observed optical signal to a predetermined intensity threshold to determine a binary value representing the intensity of each observed optical signal.
36. The method of claim 35, wherein each binary value comprises a 1 or a 0, wherein 1 represents an observed optical signal for which intensity is greater than or equal to the predetermined intensity threshold and 0 represents an observed optical signal for which intensity is less than the predetermined intensity threshold.
37. The method of claim 35 or claim 36, wherein decoding the plurality of observed optical signals in the plurality of images comprises obtaining the plurality of observed code words based on a series of binary values determined for each location.
38. The method of claim 37, wherein each observed code word of the plurality of code words comprises a plurality of code word segments, and wherein each code word segment comprises a specified string of binary values that corresponds to one of a specified set of observed optical signal states.
39. The method of claim 38, wherein each code word segment comprises a four bit string of binary values such that: a code word segment of 1000 corresponds to a first optical signal state, A, in which an optical signal is detected in a first detection channel of a four-channel optical imaging instrument, and no optical signal is detected in a second, third, or fourth detection channel of the four-channel optical imaging instrument; a code word segment of 0100 corresponds to a second optical signal state, B, in which an optical signal is detected in the second detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, third, or fourth detection channel of the four-channel optical imaging instrument; a code word segment of 0010 corresponds to a third optical signal state, C, in which an optical signal is detected in the third detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, second, or fourth detection channel of the four-channel optical imaging instrument;
a code word segment of 0001 corresponds to a fourth optical signal state, D, in which an optical signal is detected in the fourth detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, second, or third detection channel of the four-channel optical imaging instrument; and a code word segment of 0000 corresponds to a fifth optical signal state, E, in which no optical signal is detected in any of the first, second, third, or fourth detection channels of the four-channel optical imaging instrument.
40. The method of any one of claims 35 to 39, wherein determining the assignment of the observed code word to one of the plurality of valid code words comprises identifying a valid code word of the plurality of valid code words that is identical to the observed code word.
41. The method of any one of claims 35 to 40, wherein determining the assignment of the observed code word to one of the plurality of valid code words comprises changing at least one of the binary values in the series of binary values corresponding to the observed code word to thereby assign the observed code word to a valid code word of the plurality of valid code words.
42. The method of any one of claims 28 to 41, wherein determining the assignment of the observed code word to one of the plurality of valid code words comprises determining a plurality of scores based on comparison of the observed code word to all or a portion of the plurality of valid code words.
43. The method of claim 42, further comprising selecting one of the plurality of valid code words having a highest score to assign as a replacement for the observed code word.
44. The method of any one of claims 28 to 43, wherein the plurality of images comprises a plurality of images comprising different fields-of-view of the biological sample.
45. The method of any one of claims 28 to 44, wherein the plurality of images comprises a plurality of z-stack images of the biological sample.
46. The method of any one of claims 28 to 45, wherein the plurality of observed optical signals represents light emitted from a plurality of fluorophores.
47. The method of any one of claims 28 to 46, further comprising identifying a target analyte in the biological sample based on the determined assignment of the observed code word to a valid code word and the codebook.
48. The method of claim 47, wherein the identified target analyte comprises a messenger RNA (mRNA) molecule or protein molecule.
49. The method of any one of claims 28 to 48, wherein each valid code word of the plurality of valid code words has a second Hamming distance of greater than or equal to 2 from every other valid code word.
50. The method of any one of claims 28 to 49, wherein the codebook comprises at least 50 valid code words.
51. The method of any one of claims 28 to 50, wherein the codebook comprises up to 100,000 code words.
52. A system comprising: a codebook database; a computing system comprising at least one computer-readable storage medium having program instructions stored thereon, the program instructions executable by at least one processor of the computing system to cause the at least one processor to perform a method comprising: receiving a codebook comprising a plurality of valid code words, wherein, for at least a first portion of the valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1; receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or
determining that the observed code word is not a valid code word of the plurality of valid code words.
53. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform the method of any one of claims 28 to 51.
54. A database comprising: a codebook comprising a plurality of valid code words, wherein, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1.
55. A method comprising: receiving a codebook comprising a plurality of valid code words, wherein, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1; receiving a plurality of locations for a plurality of observed optical signals, wherein the plurality of observed optical signals are obtained from a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles; decoding the plurality of observed optical signals to obtain a plurality of observed code words at the plurality of locations; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
56. A method comprising: receiving a codebook having a plurality of code words, wherein, for all valid code words, a first Hamming distance between a first logical bitwise OR combination of any pair
of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1; receiving an analyte-index assignment for a plurality of target analytes; and using the analyte-index assignment to assign each target analyte of the plurality of target analytes to at least one of the plurality of code words such that each code word has at most one target analyte assignment, thereby generating an analyte-codeword assignment matrix.
57. The method of claim 56, wherein each code word of the plurality of codewords has an associated index.
58. The method of claim 56 or claim 57, wherein receiving the analyte-index assignment comprises receiving an analyte-index matrix.
59. The method of any one of claims 56 to 58, wherein assigning each target analyte of the plurality of target analytes to at least one of the plurality of codewords comprises linking the plurality of target analytes and plurality of codewords based on the same indices.
60. The method of any one of claims 56 to 59, wherein the plurality of target analytes comprises a plurality of nucleic acids.
61. The method of claim 60, wherein the plurality of nucleic acids comprises a plurality of genes.
62. The method of claim 60, wherein the plurality of nucleic acids comprises a plurality of RNA transcripts.
63. The method of any one of claims 56 to 59, wherein the plurality of target analytes comprises a plurality of proteins.
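The index-based linking of claims 56 to 59 might be sketched as follows. This is a minimal illustration, assuming that code words are indexed 0 to L - 1 and that the analyte-index assignment is supplied as a mapping from analyte name to a list of code word indices; the function name `assignment_matrix` and the analyte names are hypothetical.

```python
def assignment_matrix(analyte_index: dict[str, list[int]], num_codewords: int):
    """Build a binary analyte-by-codeword matrix from an analyte-index
    assignment. Each analyte needs at least one code word index, and each
    code word index may be used by at most one analyte."""
    analytes = list(analyte_index)
    matrix = [[0] * num_codewords for _ in analytes]
    owner = {}  # code word index -> analyte already holding it
    for row, analyte in enumerate(analytes):
        indices = analyte_index[analyte]
        if not indices:
            raise ValueError(f"{analyte} has no code word index")
        for idx in indices:
            if not 0 <= idx < num_codewords:
                raise IndexError(f"index {idx} is outside the codebook")
            if idx in owner:
                raise ValueError(f"code word {idx} already assigned to {owner[idx]}")
            owner[idx] = analyte
            matrix[row][idx] = 1
    return analytes, matrix


names, matrix = assignment_matrix({"GeneA": [0], "GeneB": [2, 3]}, num_codewords=4)
print(names)    # ['GeneA', 'GeneB']
print(matrix)   # [[1, 0, 0, 0], [0, 0, 1, 1]]
```

Each column of the resulting matrix contains at most one 1, which corresponds to the condition that each code word has at most one target analyte assignment.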
64. A method for performing in situ decoding comprising: receiving a plurality of images of a biological sample, wherein the plurality of images comprises images acquired in a plurality of sequencing or probing cycles; detecting, based on the plurality of images, a series of optical signals at one or more locations in the biological sample corresponding to one or more barcoded target analytes; determining, based on the series of optical signals detected in the plurality of images, a code word comprising a series of ON and OFF bits that corresponds to a barcode for one of the one or more barcoded target analytes; and identifying the barcoded target analyte based on a comparison of the determined code word to a codebook, wherein the code word corresponds to a member of a codebook comprising a plurality of code words for which:
Hamming Distance (Wi | Wj, Wm | Wn) > K for all possible combinations of code words Wi, Wj, Wm, Wn, wherein Wi | Wj is a logical bitwise OR combination of any two code words Wi and Wj, wherein Wm | Wn is a logical bitwise OR combination of any two code words Wm and Wn, wherein K is an integer value greater than or equal to 1; wherein the codebook comprises L code words, and wherein i, j, m, and n are integers ranging in value from 0 to L - 1 and represent indices of the code words in the codebook.
65. The method of claim 64, wherein each series of optical signals detected in the plurality of images at the one or more locations comprises a series of ON signals and OFF signals.
66. The method of claim 64 or claim 65, wherein the plurality of code words in the codebook further satisfy a property that:
Hamming Distance (Wi, Wj) > Q for any pairwise combination of code words Wi and Wj, wherein Q is an integer value greater than or equal to 3.
67. The method of any one of claims 64 to 66, wherein two or more code words are determined that correspond to two or more barcoded target analytes for which the corresponding series of optical signals partially overlap within the plurality of images, and wherein an error rate for correctly identifying the two or more barcoded target analytes is reduced compared to that when the plurality of code words in the codebook do not satisfy the relationship:
Hamming Distance (Wi | Wj, Wm | Wn) > K.
68. The method of any one of claims 64 to 67, wherein the value of K is selectable by a user during design of the codebook.
69. The method of any one of claims 64 to 68, wherein a first portion of the plurality of code words in the codebook satisfies a relationship:
Hamming Distance (Wi | Wj, Wm | Wn) > K1, and a second portion of the plurality of code words in the codebook satisfies a relationship:
Hamming Distance (Wi | Wj, Wm | Wn) > K2, wherein K1 K2.
70. The method of claim 69, wherein the values of K1 and K2 are selectable by a user during design of the codebook.
71. The method of any one of claims 64 to 70, wherein a code word from the code book is randomly assigned to each of the one or more barcoded target analytes.
72. The method of any one of claims 64 to 70, wherein a code word from the code book is assigned to each of the one or more barcoded target analytes based on a decision rule designed to ensure that a total number of ON signals detected in a given image of the plurality of images is within ± 10% of a mean number of ON signals detected per image for the plurality of images.
73. The method of any one of claims 64 to 70, wherein a code word from the code book is assigned to each of the one or more barcoded target analytes based on a decision rule designed to minimize a maximum predicted density of ON signals detected in images of the plurality of images.
74. The method of any one of claims 64 to 70, wherein a code word from the code book is assigned to each of two or more barcoded target analytes based on expression data for the two or more barcoded target analytes in clustered cell types, and wherein the clustered cell types represent a distribution of cell types found in the biological sample.
75. The method of claim 74, wherein the expression data for the two or more barcoded target analytes comprises bulk gene expression data, bulk protein expression data, spatial gene expression data, spatial protein expression data, single cell gene expression data, single cell protein expression data, or any combination thereof.
76. The method of claim 74 or claim 75, wherein the two or more barcoded target analytes are rank-ordered according to a maximum expression level across all clustered cell types, and the two or more code words are assigned to the two or more rank-ordered barcoded target analytes using an iterative process repeated for each of the two or more barcoded target analytes in decreasing order of maximum expression level, the iterative process comprising: computing a predicted density of ON signals for every combination of remaining, unassigned code words and the barcoded target analyte across the plurality of images; selecting a code word from the remaining, unassigned code words that minimizes the predicted density of ON signals across the plurality of images; and assigning the selected code word to the barcoded target analyte.
77. The method of any one of claims 64 to 76, wherein K is equal to 3, 4, or 5.
78. The method of any one of claims 64 to 77, wherein Q is equal to 4, 5, 6, 7, or 8.
79. The method of any one of claims 64 to 78, wherein the plurality of code words comprise code words of at least 60 bits, 80 bits, 100 bits, 120 bits, 140 bits, 160 bits, or 180 bits in length.
80. The method of any one of claims 64 to 79, wherein the plurality of code words in the codebook comprises at least 1,000, 5,000, 10,000, 20,000, 40,000, 60,000, 80,000, 100,000, 120,000, 140,000, 160,000, 180,000, or 200,000 unique code words.
81. The method of any one of claims 64 to 80, wherein the series of optical signals comprise fluorescence signals.
82. The method of any one of claims 64 to 81, wherein each code word of the plurality comprises M x N bits, where M is a number of sequencing or probing cycles and N is a number of optical detection channels in an instrument configured to perform the in situ decoding.
83. The method of any one of claims 64 to 82, wherein the one or more barcoded target analytes comprise barcoded gene sequences, barcoded gene transcripts, barcoded proteins, or any combination thereof.
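The iterative assignment of claim 76 can be sketched as a greedy loop. The example below is a simplified illustration, not the applicant's implementation: it assumes that each bit position of a code word corresponds to one image, that the predicted density of ON signals is the per-cell-type sum of expression levels of analytes whose code words are ON in that image, and that "minimizes the predicted density" means minimizing the peak of that sum. The function name `greedy_assign` and the data are hypothetical.

```python
def greedy_assign(expression: dict[str, list[float]], codewords: list[tuple[int, ...]]):
    """Assign one code word per analyte, visiting analytes in decreasing
    order of maximum expression across clustered cell types."""
    if len(codewords) < len(expression):
        raise ValueError("need at least one code word per analyte")
    n_types = len(next(iter(expression.values())))
    n_bits = len(codewords[0])
    # load[c][b]: predicted ON-signal density in image b for cell type c
    load = [[0.0] * n_bits for _ in range(n_types)]
    remaining = list(codewords)
    assignment = {}
    ranked = sorted(expression, key=lambda a: max(expression[a]), reverse=True)
    for analyte in ranked:
        levels = expression[analyte]

        def peak(word):
            # Peak predicted density if `word` were given to this analyte.
            return max(load[c][b] + levels[c] * word[b]
                       for c in range(n_types) for b in range(n_bits))

        best = min(remaining, key=peak)  # ties resolve to the earliest word
        remaining.remove(best)
        assignment[analyte] = best
        for c in range(n_types):
            for b in range(n_bits):
                load[c][b] += levels[c] * best[b]
    return assignment


expression = {"A": [10, 0], "B": [8, 1], "C": [1, 1]}  # levels per cell type
codewords = [(1, 1, 0, 0), (1, 0, 1, 0), (0, 0, 1, 1)]
print(greedy_assign(expression, codewords))
# {'A': (1, 1, 0, 0), 'B': (0, 0, 1, 1), 'C': (1, 0, 1, 0)}
```

In this toy input the two most highly expressed analytes end up with code words that share no ON bits, so their signals never fall in the same image.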
84. A database comprising: one or more non-transitory computer-readable storage medium components, the one or more non-transitory computer-readable storage medium components individually or collectively storing a codebook comprising a plurality of code words for which:
Hamming Distance (Wi | Wj, Wm | Wn) > K for all possible combinations of code words Wi, Wj, Wm, Wn, wherein Wi | Wj is a logical bitwise OR combination of any two code words Wi and Wj, wherein Wm | Wn is a logical bitwise OR combination of any two code words Wm and Wn, wherein K is an integer value greater than or equal to 1, wherein the codebook comprises L code words, and wherein i, j, m, and n are integers ranging in value from 0 to L - 1 and represent indices of the code words in the codebook.
85. The database of claim 84, wherein the plurality of code words in the codebook further satisfy a property that:
Hamming Distance (Wi, Wj) > Q for any pairwise combination of code words Wi and Wj, and wherein Q is an integer value greater than or equal to 3.
86. The database of claim 84 or claim 85, wherein the value of K is selectable by a user during design of the codebook.
87. The database of any one of claims 84 to 86, wherein a first portion of the plurality of code words in the codebook satisfies a relationship:
Hamming Distance (Wi | Wj, Wm | Wn) > K1, and a second portion of the plurality of code words in the codebook satisfies a relationship:
Hamming Distance (Wi | Wj, Wm | Wn) > K2, wherein K1 K2.
88. The database of claim 87, wherein the values of K1 and K2 are selectable by a user during design of the codebook.
89. The database of any one of claims 84 to 88, wherein K is equal to 3, 4, or 5.
90. The database of any one of claims 84 to 89, wherein Q is equal to 4, 5, 6, 7, or 8.
91. The database of any one of claims 84 to 90, wherein the plurality of code words comprise code words of at least 60 bits, 80 bits, 100 bits, 120 bits, 140 bits, 160 bits, or 180 bits in length.
92. The database of any one of claims 84 to 91, wherein the plurality of code words in the codebook comprises at least 100, 500, 1,000, 5,000, 10,000, 20,000, 40,000, 60,000, 80,000, or 100,000 unique code words.
93. The database of any one of claims 84 to 92, wherein each code word of the plurality comprises M x N bits, where M is a number of sequencing or probing cycles and N is a number of optical detection channels in an instrument configured to perform the in situ decoding.
94. The database of any one of claims 84 to 93, wherein each code word in the codebook has at least 2 ON bits.
95. The database of any one of claims 84 to 94, wherein each code word in the codebook has no more than 4, 5, or 6 ON bits.
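The code word layout and ON-bit limits of claims 93 to 95 can be shown with a short example. This is a sketch under assumptions: the bits are flattened in cycle-major order (all channels of cycle 0, then cycle 1, and so on), which the claims do not specify, and the names `to_code_word` and `weight_ok` are hypothetical.

```python
def to_code_word(signals: list[list[int]]) -> tuple[int, ...]:
    """Flatten an M x N table of ON/OFF detections (M cycles, N channels)
    into a code word of M * N bits, cycle by cycle (assumed ordering)."""
    n_channels = len(signals[0])
    if any(len(row) != n_channels for row in signals):
        raise ValueError("every cycle must report the same number of channels")
    return tuple(int(bool(v)) for row in signals for v in row)


def weight_ok(word: tuple[int, ...], min_on: int = 2, max_on: int = 6) -> bool:
    """Check that the number of ON bits lies within the stated limits."""
    return min_on <= sum(word) <= max_on


# 3 cycles x 4 channels: ON in cycle 0 / channel 1 and in cycle 2 / channel 0
signals = [[0, 1, 0, 0],
           [0, 0, 0, 0],
           [1, 0, 0, 0]]
word = to_code_word(signals)
print(word)             # (0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0)
print(len(word))        # 12
print(weight_ok(word))  # True
```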
96. A system comprising: one or more processors; and a memory communicatively coupled to the one or more processors and configured to store instructions that, when executed by the one or more processors, cause the system to: receive a plurality of images of a biological sample, wherein the plurality of images comprises images acquired in a plurality of sequencing or probing cycles; detect, based on the plurality of images, a series of optical signals at one or more locations in the biological sample corresponding to one or more barcoded target analytes; determine, based on the series of optical signals detected in the plurality of images, a code word comprising a series of ON and OFF bits that corresponds to a barcode for one of the one or more barcoded target analytes; and identify the barcoded target analyte based on a comparison of the determined code word to a codebook, wherein the code word corresponds to a member of a codebook comprising a plurality of code words for which:
Hamming Distance (Wi | Wj, Wm | Wn) > K for all possible combinations of code words Wi, Wj, Wm, Wn, wherein Wi | Wj is a logical bitwise OR combination of any two code words Wi and Wj, wherein Wm | Wn is a logical bitwise OR combination of any two code words Wm and Wn, wherein K is an integer value greater than or equal to 1; wherein the codebook comprises L code words, and wherein i, j, m, and n are integers ranging in value from 0 to L - 1 and represent indices of the code words in the codebook.
97. The system of claim 96, wherein each series of optical signals detected in the plurality of images at the one or more locations comprises a series of ON signals and OFF signals.
98. The system of claim 96 or claim 97, wherein the plurality of code words in the codebook further satisfy a property that:
Hamming Distance (Wi, Wj) > Q for any pairwise combination of code words Wi and Wj, wherein Q is an integer value greater than or equal to 3.
99. The system of any one of claims 96 to 98, wherein two or more code words are determined that correspond to two or more barcoded target analytes for which the corresponding series of optical signals partially overlap within the plurality of images, and wherein an error rate for correctly identifying the two or more barcoded target analytes is reduced compared to that when the plurality of code words in the codebook do not satisfy the relationship:
Hamming Distance (Wi | Wj, Wm | Wn) > K.
100. The system of any one of claims 96 to 99, wherein the value of K is selectable by a user during design of the codebook.
101. The system of any one of claims 96 to 100, wherein a first portion of the plurality of code words in the codebook satisfies a relationship:
Hamming Distance (Wi | Wj, Wm | Wn) > K1, and a second portion of the plurality of code words in the codebook satisfies a relationship:
Hamming Distance (Wi | Wj, Wm | Wn) > K2, wherein K1 K2.
102. The system of claim 101, wherein the values of K1 and K2 are selectable by a user during design of the codebook.
103. The system of any one of claims 96 to 102, wherein a code word from the code book is randomly assigned to each of the one or more barcoded target analytes.
104. The system of any one of claims 96 to 102, wherein a code word from the code book is assigned to each of the one or more barcoded target analytes based on a decision rule designed to ensure that a total number of ON signals detected in a given image of the plurality of images is within ± 10% of a mean number of ON signals detected per image for the plurality of images.
105. The system of any one of claims 96 to 102, wherein a code word from the code book is assigned to each of the one or more barcoded target analytes based on a decision rule designed to minimize a maximum predicted density of ON signals detected in images of the plurality of images.
106. The system of any one of claims 96 to 102, wherein a code word from the code book is assigned to each of two or more barcoded target analytes based on expression data for the two or more barcoded target analytes in clustered cell types, and wherein the clustered cell types represent a distribution of cell types found in the biological sample.
107. The system of claim 106, wherein the expression data for the two or more barcoded target analytes comprises bulk gene expression data, bulk protein expression data, spatial gene expression data, spatial protein expression data, single cell gene expression data, single cell protein expression data, or any combination thereof.
108. The system of claim 106 or claim 107, wherein the two or more assigned code words are rank-ordered according to code word weight, the two or more barcoded target analytes are rank-ordered according to a maximum expression level across all clustered cell types, and the two or more rank-ordered code words are assigned to the two or more rank-ordered barcoded target analytes using an iterative process repeated for each of the two or more barcoded target analytes in decreasing order of maximum expression level, the iterative process comprising: computing a predicted density of ON signals for every combination of remaining, unassigned code words and the barcoded target analyte across the plurality of images; selecting a code word from the remaining, unassigned code words that minimizes the predicted density of ON signals across the plurality of images; and assigning the selected code word to the barcoded target analyte.
109. The system of any one of claims 96 to 108, wherein K is equal to 3, 4, or 5.
110. The system of any one of claims 96 to 109, wherein Q is equal to 4, 5, 6, 7, or 8.
111. The system of any one of claims 96 to 110, wherein the plurality of code words comprise code words of at least 60 bits, 80 bits, 100 bits, 120 bits, 140 bits, 160 bits, or 180 bits in length.
112. The system of any one of claims 96 to 111, wherein the plurality of code words in the codebook comprises at least 1,000, 5,000, 10,000, 20,000, 40,000, 60,000, 80,000, or 100,000 unique code words.
113. The system of any one of claims 96 to 111, wherein the series of optical signals comprise fluorescence signals.
114. The system of any one of claims 96 to 113, wherein each code word of the plurality comprises M x N bits, where M is a number of sequencing or probing cycles and N is a number of optical detection channels in an instrument configured to perform the in situ decoding.
115. The system of any one of claims 96 to 114, wherein the one or more barcoded target analytes comprise barcoded gene sequences, barcoded gene transcripts, barcoded proteins, or any combination thereof.
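The balance criterion of claim 104 (ON signals per image within ± 10% of the mean) can be checked with a few lines. The sketch below assumes, for simplicity, that each bit position corresponds to one image and that every assigned analyte contributes exactly one ON signal per ON bit; real signal counts would also depend on analyte abundance. The names `on_counts_per_image` and `is_balanced` are hypothetical.

```python
def on_counts_per_image(words: list[tuple[int, ...]]) -> list[int]:
    """Count ON bits at each bit position across all assigned code words."""
    return [sum(bits) for bits in zip(*words)]


def is_balanced(counts: list[int], tolerance: float = 0.10) -> bool:
    """True if every per-image count lies within tolerance * mean of the mean."""
    mean = sum(counts) / len(counts)
    return all(abs(c - mean) <= tolerance * mean for c in counts)


balanced = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 1, 0), (0, 1, 0, 1)]
skewed = [(1, 1, 0, 0), (1, 0, 1, 0)]
print(on_counts_per_image(balanced), is_balanced(on_counts_per_image(balanced)))
# [2, 2, 2, 2] True
print(on_counts_per_image(skewed), is_balanced(on_counts_per_image(skewed)))
# [2, 1, 1, 0] False
```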
116. A method of generating a codebook, the method comprising: generating a starting codeword and adding the starting codeword to a codebook, wherein the starting codeword has a predetermined number of ON bits; generating a plurality of candidate codewords having a plurality of binary segments, wherein: each candidate codeword of the plurality of candidate codewords has the predetermined number of ON bits, and each candidate codeword of the plurality of candidate codewords is at least a predetermined Hamming distance away from the starting codeword; selecting a first candidate codeword from the plurality of candidate codewords and adding the first candidate codeword to the codebook; selecting a second candidate codeword from the plurality of candidate codewords; and adding the second candidate codeword to the codebook if addition of the second candidate codeword to the codebook does not violate an OR-robust property of the codebook, or discarding the second candidate codeword if addition of the second candidate codeword to the codebook would violate an OR-robust property of the codebook.
117. The method of claim 116, wherein the predetermined number of ON bits is 3 to 10.
118. The method of claim 117, wherein the predetermined number of ON bits is 5.
119. The method of any one of claims 116 to 118, wherein a total number of the plurality of binary segments is equal to a total number of a plurality of imaging cycles of an optofluidic instrument.
120. The method of claim 119, wherein the total number of the plurality of imaging cycles is about 15 to about 200.
121. The method of claim 119, wherein the total number of the plurality of imaging cycles is about 15 to about 36.
122. The method of any one of claims 116 to 121, wherein each binary segment of the plurality of binary segments has a binary segment length representing a number of color channels imaged during each cycle of the plurality of imaging cycles.
123. The method of claim 122, wherein the binary segment length is 4.
124. The method of any one of claims 116 to 123, wherein the predetermined Hamming distance is 6.
125. The method of any one of claims 1 to 123, further comprising: iterating over all candidate codewords to add the candidate codeword to the codebook if addition of the candidate codeword to the codebook does not violate an OR-robust property of the codebook, or discarding the candidate codeword if addition of the candidate codeword to the codebook would violate an OR-robust property of the codebook.
126. The method of any one of claims 116 to 125, further comprising: for each candidate codeword: determining a first number of shared ON bits between (i) a bitwise OR combination of the candidate codeword and a first incumbent codeword in the codebook, and (ii) a second incumbent codeword in the codebook; determining a second number of shared ON bits between (i) a bitwise OR combination of the candidate codeword and the first incumbent codeword in the codebook, and (ii) a third incumbent codeword in the codebook; and based on the first number of shared ON bits and the second number of shared ON bits, determining that adding the candidate codeword to the codebook does not violate the OR-robust property of the codebook.
127. The method of any one of claims 116 to 126, wherein the OR-robust property is defined as: a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words in the codebook and a second logical bitwise OR combination of any other pair of valid code words in the codebook is greater than or equal to a predetermined OR-robust radius.
128. The method of claim 127, wherein the predetermined OR-robust radius is 2 to 10.
129. The method of any one of claims 1 to 128, further comprising storing the codebook in a database.
130. The method of claim 129, further comprising transmitting the codebook over a network from the database to a remote computing node.
131. A system comprising: a codebook database; a computing system comprising at least one computer-readable storage medium having program instructions stored thereon, the program instructions executable by at least one processor of the computing system to cause the at least one processor to perform the method of any one of claims 116 to 130.
132. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform the method of any one of claims 116 to 130.
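The generation procedure of claims 116, 125, and 127 can be sketched as a greedy filter over constant-weight candidates. This is an illustrative sketch only: it enumerates every candidate exhaustively, which is feasible only for short code words, it ignores the segment structure of claims 119 to 123, and the function name `generate_codebook` and the small parameter values are assumptions chosen for the example.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of differing bit positions between integers a and b."""
    return bin(a ^ b).count("1")


def generate_codebook(n_bits: int, weight: int, min_dist: int, or_radius: int) -> list[int]:
    """Greedily grow a codebook of fixed-weight code words while keeping
    every pair of OR-combinations at least `or_radius` bits apart."""
    words = [sum(1 << p for p in pos) for pos in combinations(range(n_bits), weight)]
    start = words[0]                       # starting codeword
    codebook = [start]
    pair_ors = []                          # OR of every pair already accepted
    candidates = [w for w in words[1:] if hamming(w, start) >= min_dist]
    for cand in candidates:
        new_ors = [cand | w for w in codebook]
        ok = (all(hamming(x, y) >= or_radius for x in new_ors for y in pair_ors)
              and all(hamming(x, y) >= or_radius for x, y in combinations(new_ors, 2)))
        if ok:                             # keep the candidate
            codebook.append(cand)
            pair_ors.extend(new_ors)
        # otherwise the candidate is discarded
    return codebook


book = generate_codebook(n_bits=8, weight=2, min_dist=4, or_radius=1)
print([format(w, "08b") for w in book])
```

The first candidate is always accepted, because a two-word codebook has only one pair and therefore nothing to compare against. The result depends on the order in which candidates are visited, so a different enumeration order may yield a different, possibly larger, codebook.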
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463649266P | 2024-05-17 | 2024-05-17 | |
| US63/649,266 | 2024-05-17 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025240918A1 (en) | 2025-11-20 |
Family
ID=95981260
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2025/029852 Pending WO2025240918A1 (en) | 2024-05-17 | 2025-05-16 | Systems and methods for generating codebooks |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025240918A1 (en) |
Citations (58)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4318846A (en) | 1979-09-07 | 1982-03-09 | Syva Company | Novel ether substituted fluorescein polyamino acid compounds as fluorescers and quenchers |
| US4757141A (en) | 1985-08-26 | 1988-07-12 | Applied Biosystems, Incorporated | Amino-derivatized phosphite and phosphate linking agents, phosphoramidite precursors, and useful conjugates thereof |
| US5066580A (en) | 1988-08-31 | 1991-11-19 | Becton Dickinson And Company | Xanthene dyes that emit to the red of fluorescein |
| US5091519A (en) | 1986-05-01 | 1992-02-25 | Amoco Corporation | Nucleotide compositions with linking groups |
| US5151507A (en) | 1986-07-02 | 1992-09-29 | E. I. Du Pont De Nemours And Company | Alkynylamino-nucleotides |
| US5188934A (en) | 1989-11-14 | 1993-02-23 | Applied Biosystems, Inc. | 4,7-dichlorofluorescein dyes as molecular probes |
| US5366860A (en) | 1989-09-29 | 1994-11-22 | Applied Biosystems, Inc. | Spectrally resolvable rhodamine dyes for nucleic acid sequence determination |
| EP0703991A1 (en) | 1994-04-04 | 1996-04-03 | Spectragen, Inc. | Dna sequencing by stepwise ligation and cleavage |
| US5688648A (en) | 1994-02-01 | 1997-11-18 | The Regents Of The University Of California | Probes labelled with energy transfer coupled dyes |
| US5800996A (en) | 1996-05-03 | 1998-09-01 | The Perkin Elmer Corporation | Energy transfer dyes with enchanced fluorescence |
| US5847162A (en) | 1996-06-27 | 1998-12-08 | The Perkin Elmer Corporation | 4, 7-Dichlororhodamine dyes |
| US5990479A (en) | 1997-11-25 | 1999-11-23 | Regents Of The University Of California | Organo Luminescent semiconductor nanocrystal probes for biological applications and process for making and using such probes |
| US6207392B1 (en) | 1997-11-25 | 2001-03-27 | The Regents Of The University Of California | Semiconductor nanocrystal probes for biological applications and process for making and using such probes |
| US6251303B1 (en) | 1998-09-18 | 2001-06-26 | Massachusetts Institute Of Technology | Water-soluble fluorescent nanocrystals |
| US6322901B1 (en) | 1997-11-13 | 2001-11-27 | Massachusetts Institute Of Technology | Highly luminescent color-selective nano-crystalline materials |
| US20020045045A1 (en) | 2000-10-13 | 2002-04-18 | Adams Edward William | Surface-modified semiconductive and metallic nanoparticles having enhanced dispersibility in aqueous media |
| US6391937B1 (en) | 1998-11-25 | 2002-05-21 | Motorola, Inc. | Polyacrylamide hydrogels and hydrogel arrays made from polyacrylamide reactive prepolymers |
| US6426513B1 (en) | 1998-09-18 | 2002-07-30 | Massachusetts Institute Of Technology | Water-soluble thiol-capped nanocrystals |
| US20030013091A1 (en) | 2001-07-03 | 2003-01-16 | Krassen Dimitrov | Methods for detection and quantification of analytes in complex mixtures |
| US20030017264A1 (en) | 2001-07-20 | 2003-01-23 | Treadway Joseph A. | Luminescent nanoparticles and methods for their preparation |
| US6576291B2 (en) | 2000-12-08 | 2003-06-10 | Massachusetts Institute Of Technology | Preparation of nanocrystallites |
| US20050100900A1 (en) | 1997-04-01 | 2005-05-12 | Manteia Sa | Method of nucleic acid amplification |
| US7057026B2 (en) | 2001-12-04 | 2006-06-06 | Solexa Limited | Labelled nucleotides |
| US20060240439A1 (en) | 2003-09-11 | 2006-10-26 | Smith Geoffrey P | Modified polymerases for improved incorporation of nucleotide analogues |
| US20060281109A1 (en) | 2005-05-10 | 2006-12-14 | Barr Ost Tobias W | Polymerases |
| US20070166705A1 (en) | 2002-08-23 | 2007-07-19 | John Milton | Modified nucleotides |
| US20090118128A1 (en) | 2005-07-20 | 2009-05-07 | Xiaohai Liu | Preparation of templates for nucleic acid sequencing |
| US7544794B1 (en) | 2005-03-11 | 2009-06-09 | Steven Albert Benner | Method for sequencing DNA and RNA by synthesis |
| US20100015607A1 (en) | 2005-12-23 | 2010-01-21 | Nanostring Technologies, Inc. | Nanoreporters and methods of manufacturing and use thereof |
| US20100047924A1 (en) | 2008-08-14 | 2010-02-25 | Nanostring Technologies, Inc. | Stable nanoreporters |
| US20100055733A1 (en) | 2008-09-04 | 2010-03-04 | Lutolf Matthias P | Manufacture and uses of reactive microcontact printing of biomolecules on soft hydrogels |
| US20100112710A1 (en) | 2007-04-10 | 2010-05-06 | Nanostring Technologies, Inc. | Methods and computer systems for identifying target-specific sequences for use in nanoreporters |
| US20100262374A1 (en) | 2006-05-22 | 2010-10-14 | Jenq-Neng Hwang | Systems and methods for analyzing nanoreporters |
| US20100261026A1 (en) | 2005-12-23 | 2010-10-14 | Nanostring Technologies, Inc. | Compositions comprising oriented, immobilized macromolecules and methods for their preparation |
| US7883869B2 (en) | 2006-12-01 | 2011-02-08 | The Trustees Of Columbia University In The City Of New York | Four-color DNA sequencing by synthesis using cleavable fluorescent nucleotide reversible terminators |
| US20110059865A1 (en) | 2004-01-07 | 2011-03-10 | Mark Edward Brennan Smith | Modified Molecular Arrays |
| US7956171B2 (en) | 2007-05-18 | 2011-06-07 | Helicos Biosciences Corp. | Nucleotide analogs |
| US8034923B1 (en) | 2009-03-27 | 2011-10-11 | Steven Albert Benner | Reagents for reversibly terminating primer extension |
| US8071755B2 (en) | 2004-05-25 | 2011-12-06 | Helicos Biosciences Corporation | Nucleotide analogs |
| US20120270305A1 (en) | 2011-01-10 | 2012-10-25 | Illumina Inc. | Systems, methods, and apparatuses to image a sample for biological or chemical analysis |
| US20130079232A1 (en) | 2011-09-23 | 2013-03-28 | Illumina, Inc. | Methods and compositions for nucleic acid sequencing |
| US20130260372A1 (en) | 2012-04-03 | 2013-10-03 | Illumina, Inc. | Integrated optoelectronic read head and fluidic cartridge useful for nucleic acid sequencing |
| US8703461B2 (en) | 2009-06-05 | 2014-04-22 | Life Technologies Corporation | Mutant RB69 DNA polymerase |
| US8808989B1 (en) | 2013-04-02 | 2014-08-19 | Molecular Assemblies, Inc. | Methods and apparatus for synthesizing nucleic acids |
| US20140371088A1 (en) | 2013-06-14 | 2014-12-18 | Nanostring Technologies, Inc. | Multiplexable tag-based reporter system |
| US9217178B2 (en) | 2004-12-13 | 2015-12-22 | Illumina Cambridge Limited | Method of nucleotide detection |
| US20160116384A1 (en) | 2014-02-21 | 2016-04-28 | Massachusetts Institute Of Technology | Expansion microscopy |
| US9399798B2 (en) | 2011-09-13 | 2016-07-26 | Lasergen, Inc. | 3′-OH unblocked, fast photocleavable terminating nucleotides and methods for nucleic acid sequencing |
| US9512422B2 (en) | 2013-02-26 | 2016-12-06 | Illumina, Inc. | Gel patterned surfaces |
| US20170253918A1 (en) | 2016-03-01 | 2017-09-07 | Expansion Technologies | Combining protein barcoding with expansion microscopy for in-situ, spatially-resolved proteomics |
| US20180052081A1 (en) | 2016-05-11 | 2018-02-22 | Expansion Technologies | Combining modified antibodies with expansion microscopy for in-situ, spatially-resolved proteomics |
| US9951385B1 (en) | 2017-04-25 | 2018-04-24 | Omniome, Inc. | Methods and apparatus that increase sequencing-by-binding efficiency |
| US10655176B2 (en) | 2017-04-25 | 2020-05-19 | Omniome, Inc. | Methods and apparatus that increase sequencing-by-binding efficiency |
| US10768173B1 (en) | 2019-09-06 | 2020-09-08 | Element Biosciences, Inc. | Multivalent binding composition for nucleic acid analysis |
| US10982280B2 (en) | 2018-11-14 | 2021-04-20 | Element Biosciences, Inc. | Multipart reagents having increased avidity for polymerase binding |
| US20220084628A1 (en) | 2020-09-16 | 2022-03-17 | 10X Genomics, Inc. | Methods and systems for barcode error correction |
| WO2023172915A1 (en) * | 2022-03-08 | 2023-09-14 | 10X Genomics, Inc. | In situ code design methods for minimizing optical crowding |
| EP4273263A2 (en) * | 2014-07-30 | 2023-11-08 | President and Fellows of Harvard College | Systems and methods for determining nucleic acids |
-
2025
- 2025-05-16 WO PCT/US2025/029852 patent/WO2025240918A1/en active Pending
Patent Citations (65)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4318846A (en) | 1979-09-07 | 1982-03-09 | Syva Company | Novel ether substituted fluorescein polyamino acid compounds as fluorescers and quenchers |
| US4757141A (en) | 1985-08-26 | 1988-07-12 | Applied Biosystems, Incorporated | Amino-derivatized phosphite and phosphate linking agents, phosphoramidite precursors, and useful conjugates thereof |
| US5091519A (en) | 1986-05-01 | 1992-02-25 | Amoco Corporation | Nucleotide compositions with linking groups |
| US5151507A (en) | 1986-07-02 | 1992-09-29 | E. I. Du Pont De Nemours And Company | Alkynylamino-nucleotides |
| US5066580A (en) | 1988-08-31 | 1991-11-19 | Becton Dickinson And Company | Xanthene dyes that emit to the red of fluorescein |
| US5366860A (en) | 1989-09-29 | 1994-11-22 | Applied Biosystems, Inc. | Spectrally resolvable rhodamine dyes for nucleic acid sequence determination |
| US5188934A (en) | 1989-11-14 | 1993-02-23 | Applied Biosystems, Inc. | 4,7-dichlorofluorescein dyes as molecular probes |
| US5688648A (en) | 1994-02-01 | 1997-11-18 | The Regents Of The University Of California | Probes labelled with energy transfer coupled dyes |
| EP0703991A1 (en) | 1994-04-04 | 1996-04-03 | Spectragen, Inc. | Dna sequencing by stepwise ligation and cleavage |
| US5552278A (en) | 1994-04-04 | 1996-09-03 | Spectragen, Inc. | DNA sequencing by stepwise ligation and cleavage |
| US5800996A (en) | 1996-05-03 | 1998-09-01 | The Perkin Elmer Corporation | Energy transfer dyes with enchanced fluorescence |
| US5847162A (en) | 1996-06-27 | 1998-12-08 | The Perkin Elmer Corporation | 4, 7-Dichlororhodamine dyes |
| US20050100900A1 (en) | 1997-04-01 | 2005-05-12 | Manteia Sa | Method of nucleic acid amplification |
| US6322901B1 (en) | 1997-11-13 | 2001-11-27 | Massachusetts Institute Of Technology | Highly luminescent color-selective nano-crystalline materials |
| US5990479A (en) | 1997-11-25 | 1999-11-23 | Regents Of The University Of California | Organo Luminescent semiconductor nanocrystal probes for biological applications and process for making and using such probes |
| US6423551B1 (en) | 1997-11-25 | 2002-07-23 | The Regents Of The University Of California | Organo luminescent semiconductor nanocrystal probes for biological applications and process for making and using such probes |
| US6207392B1 (en) | 1997-11-25 | 2001-03-27 | The Regents Of The University Of California | Semiconductor nanocrystal probes for biological applications and process for making and using such probes |
| US6251303B1 (en) | 1998-09-18 | 2001-06-26 | Massachusetts Institute Of Technology | Water-soluble fluorescent nanocrystals |
| US6319426B1 (en) | 1998-09-18 | 2001-11-20 | Massachusetts Institute Of Technology | Water-soluble fluorescent semiconductor nanocrystals |
| US6426513B1 (en) | 1998-09-18 | 2002-07-30 | Massachusetts Institute Of Technology | Water-soluble thiol-capped nanocrystals |
| US6444143B2 (en) | 1998-09-18 | 2002-09-03 | Massachusetts Institute Of Technology | Water-soluble fluorescent nanocrystals |
| US6391937B1 (en) | 1998-11-25 | 2002-05-21 | Motorola, Inc. | Polyacrylamide hydrogels and hydrogel arrays made from polyacrylamide reactive prepolymers |
| US20020045045A1 (en) | 2000-10-13 | 2002-04-18 | Adams Edward William | Surface-modified semiconductive and metallic nanoparticles having enhanced dispersibility in aqueous media |
| US6576291B2 (en) | 2000-12-08 | 2003-06-10 | Massachusetts Institute Of Technology | Preparation of nanocrystallites |
| US20030013091A1 (en) | 2001-07-03 | 2003-01-16 | Krassen Dimitrov | Methods for detection and quantification of analytes in complex mixtures |
| US20070166708A1 (en) | 2001-07-03 | 2007-07-19 | Krassen Dimitrov | Methods for detection and quantification of analytes in complex mixtures |
| US20030017264A1 (en) | 2001-07-20 | 2003-01-23 | Treadway Joseph A. | Luminescent nanoparticles and methods for their preparation |
| US7057026B2 (en) | 2001-12-04 | 2006-06-06 | Solexa Limited | Labelled nucleotides |
| US20060188901A1 (en) | 2001-12-04 | 2006-08-24 | Solexa Limited | Labelled nucleotides |
| US20070166705A1 (en) | 2002-08-23 | 2007-07-19 | John Milton | Modified nucleotides |
| US20060240439A1 (en) | 2003-09-11 | 2006-10-26 | Smith Geoffrey P | Modified polymerases for improved incorporation of nucleotide analogues |
| US9889422B2 (en) | 2004-01-07 | 2018-02-13 | Illumina Cambridge Limited | Methods of localizing nucleic acids to arrays |
| US20110059865A1 (en) | 2004-01-07 | 2011-03-10 | Mark Edward Brennan Smith | Modified Molecular Arrays |
| US8071755B2 (en) | 2004-05-25 | 2011-12-06 | Helicos Biosciences Corporation | Nucleotide analogs |
| US9217178B2 (en) | 2004-12-13 | 2015-12-22 | Illumina Cambridge Limited | Method of nucleotide detection |
| US7544794B1 (en) | 2005-03-11 | 2009-06-09 | Steven Albert Benner | Method for sequencing DNA and RNA by synthesis |
| US20060281109A1 (en) | 2005-05-10 | 2006-12-14 | Barr Ost Tobias W | Polymerases |
| US20090118128A1 (en) | 2005-07-20 | 2009-05-07 | Xiaohai Liu | Preparation of templates for nucleic acid sequencing |
| US20100015607A1 (en) | 2005-12-23 | 2010-01-21 | Nanostring Technologies, Inc. | Nanoreporters and methods of manufacturing and use thereof |
| US20100261026A1 (en) | 2005-12-23 | 2010-10-14 | Nanostring Technologies, Inc. | Compositions comprising oriented, immobilized macromolecules and methods for their preparation |
| US20100262374A1 (en) | 2006-05-22 | 2010-10-14 | Jenq-Neng Hwang | Systems and methods for analyzing nanoreporters |
| US7883869B2 (en) | 2006-12-01 | 2011-02-08 | The Trustees Of Columbia University In The City Of New York | Four-color DNA sequencing by synthesis using cleavable fluorescent nucleotide reversible terminators |
| US20100112710A1 (en) | 2007-04-10 | 2010-05-06 | Nanostring Technologies, Inc. | Methods and computer systems for identifying target-specific sequences for use in nanoreporters |
| US7956171B2 (en) | 2007-05-18 | 2011-06-07 | Helicos Biosciences Corp. | Nucleotide analogs |
| US20100047924A1 (en) | 2008-08-14 | 2010-02-25 | Nanostring Technologies, Inc. | Stable nanoreporters |
| US20100055733A1 (en) | 2008-09-04 | 2010-03-04 | Lutolf Matthias P | Manufacture and uses of reactive microcontact printing of biomolecules on soft hydrogels |
| US8034923B1 (en) | 2009-03-27 | 2011-10-11 | Steven Albert Benner | Reagents for reversibly terminating primer extension |
| US8703461B2 (en) | 2009-06-05 | 2014-04-22 | Life Technologies Corporation | Mutant RB69 DNA polymerase |
| US20120270305A1 (en) | 2011-01-10 | 2012-10-25 | Illumina Inc. | Systems, methods, and apparatuses to image a sample for biological or chemical analysis |
| US9399798B2 (en) | 2011-09-13 | 2016-07-26 | Lasergen, Inc. | 3′-OH unblocked, fast photocleavable terminating nucleotides and methods for nucleic acid sequencing |
| US20130079232A1 (en) | 2011-09-23 | 2013-03-28 | Illumina, Inc. | Methods and compositions for nucleic acid sequencing |
| US20130260372A1 (en) | 2012-04-03 | 2013-10-03 | Illumina, Inc. | Integrated optoelectronic read head and fluidic cartridge useful for nucleic acid sequencing |
| US9512422B2 (en) | 2013-02-26 | 2016-12-06 | Illumina, Inc. | Gel patterned surfaces |
| US8808989B1 (en) | 2013-04-02 | 2014-08-19 | Molecular Assemblies, Inc. | Methods and apparatus for synthesizing nucleic acids |
| US20140371088A1 (en) | 2013-06-14 | 2014-12-18 | Nanostring Technologies, Inc. | Multiplexable tag-based reporter system |
| US20160116384A1 (en) | 2014-02-21 | 2016-04-28 | Massachusetts Institute Of Technology | Expansion microscopy |
| EP4273263A2 (en) * | 2014-07-30 | 2023-11-08 | President and Fellows of Harvard College | Systems and methods for determining nucleic acids |
| US20170253918A1 (en) | 2016-03-01 | 2017-09-07 | Expansion Technologies | Combining protein barcoding with expansion microscopy for in-situ, spatially-resolved proteomics |
| US20180052081A1 (en) | 2016-05-11 | 2018-02-22 | Expansion Technologies | Combining modified antibodies with expansion microscopy for in-situ, spatially-resolved proteomics |
| US9951385B1 (en) | 2017-04-25 | 2018-04-24 | Omniome, Inc. | Methods and apparatus that increase sequencing-by-binding efficiency |
| US10655176B2 (en) | 2017-04-25 | 2020-05-19 | Omniome, Inc. | Methods and apparatus that increase sequencing-by-binding efficiency |
| US10982280B2 (en) | 2018-11-14 | 2021-04-20 | Element Biosciences, Inc. | Multipart reagents having increased avidity for polymerase binding |
| US10768173B1 (en) | 2019-09-06 | 2020-09-08 | Element Biosciences, Inc. | Multivalent binding composition for nucleic acid analysis |
| US20220084628A1 (en) | 2020-09-16 | 2022-03-17 | 10X Genomics, Inc. | Methods and systems for barcode error correction |
| WO2023172915A1 (en) * | 2022-03-08 | 2023-09-14 | 10X Genomics, Inc. | In situ code design methods for minimizing optical crowding |
Non-Patent Citations (14)
| Title |
|---|
| "Methods in Enzymology", vol. 572, 1 January 2016, ELSEVIER, ACADEMIC PRESS, NL, ISBN: 978-0-12-805382-9, article J.R. MOFFITT ET AL: "RNA Imaging with Multiplexed Error-Robust Fluorescence In Situ Hybridization (MERFISH)", pages: 1 - 49, XP055693313, DOI: 10.1016/bs.mie.2016.03.020 * |
| ARCHER ET AL.: "Selective and flexible depletion of problematic sequences from RNA-seq libraries at the cDNA stage", BMC GENOMICS, vol. 15, 2014, pages 401, XP021187323, DOI: 10.1186/1471-2164-15-401 |
| BOLOGNESI ET AL., J. HISTOCHEM. CYTOCHEM., vol. 65, no. 8, 2017, pages 431 - 444 |
| CHEN ET AL., NAT. METHODS, vol. 13, 2016, pages 679 - 684 |
| CHEN ET AL., SCIENCE, vol. 347, no. 6221, 2015, pages 543 - 548 |
| CHEN KOK HAO ET AL: "Spatially resolved, highly multiplexed RNA profiling in single cells", SCIENCE - AUTHOR MANUSCRIPT, vol. 348, no. 6233, 24 April 2015 (2015-04-24), US, XP055879252, ISSN: 0036-8075, DOI: 10.1126/science.aaa6090 * |
| GALE ET AL.: "A Review of Current Methods in Microfluidic Device Fabrication and Future Commercialization Prospects", INVENTIONS, vol. 3, no. 60, 2018, pages 1 - 25 |
| HAUGLAND: "Handbook of Fluorescent Probes and Research Chemicals", 2002, MOLECULAR PROBES, INC. |
| JAMUR ET AL., METHODS MOL. BIOL., vol. 588, 2010, pages 63 - 66 |
| KELLER, MANAK: "DNA Probes", 1993, STOCKTON PRESS |
| LIN ET AL., NAT COMMUN., vol. 6, 2015, pages 8390 |
| PIRICI ET AL., J. HISTOCHEM. CYTOCHEM., vol. 57, 2009, pages 899 - 905 |
| VANDERNOOT, V.A.: "cDNA normalization by hydroxyapatite chromatography to enrich transcriptome diversity in RNA-seq applications", BIOTECHNIQUES, vol. 53, no. 6, 2012, pages 373 - 80 |
| WETMUR: "Critical Reviews in Biochemistry and Molecular Biology", vol. 26, 1991, IRL PRESS, pages: 227 - 259 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12365944B1 (en) | | Compositions and methods for amplification and sequencing |
| USRE48913E1 (en) | | Spatially addressable molecular barcoding |
| US20240026426A1 (en) | | Decoy oligonucleotides and related methods |
| JP2024056705A (en) | | Methods and compositions for combinatorial barcoding |
| US12417646B2 (en) | | Systems and methods for image segmentation |
| US20250334785A1 (en) | | Systems and methods for actively mitigating vibrations |
| US20250012786A1 (en) | | Systems and methods for tissue bounds detection |
| WO2025240918A1 (en) | | Systems and methods for generating codebooks |
| US20250270636A1 (en) | | Multi-fluorophore single nucleotide complexes for sequencing |
| US20250285229A1 (en) | | Multi-focus image fusion with background removal |
| US20250277262A1 (en) | | Click-chemistry retention of fluorescent nucleotides |
| US20250257394A1 (en) | | Polymerase-conjugate binding stabilization |
| US12406371B2 (en) | | Systems and methods for image segmentation using multiple stain indicators |
| US20250207189A1 (en) | | Dinucleotide stochastic sequencing |
| US20250188524A1 (en) | | Graphical user interface and method of estimating an instrument run completion time |
| US20250389658A1 (en) | | Systems and methods for imaging a sample |
| US20250117932A1 (en) | | Feature pyramiding for in situ data visualizations (aka dynamic display of molecular information dependent on zoom level) |
| US20250092443A1 (en) | | Rolling circle amplification methods and probes for improved spatial analysis |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 25729977; Country of ref document: EP; Kind code of ref document: A1 |