WO2024250245A1 - Method for displaying expression level of spatial group - Google Patents
- Publication number
- WO2024250245A1 (PCT/CN2023/099225)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- function
- umi
- expression
- value
- gene
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
Definitions
- the present invention relates to the field of bioinformatics, and in particular to a method for displaying expression levels based on a spatial group.
- Gene chip technology is used to retain the sample location information on the chip, and then the second-generation sequencing technology is used to sequence the RNA in the sample.
- the read content is superimposed back on the tissue image, thereby generating a complete gene expression image on the tissue section.
- This technology is called spatial transcriptome technology and is extremely valuable in clinical research and cancer diagnosis.
- current spatial transcriptome platforms reach nanometer-scale precision.
- the present disclosure aims to solve one of the technical problems in the related art at least to a certain extent.
- in one aspect, the present disclosure provides a method for displaying spatial group expression levels. According to an embodiment of the present disclosure, the method comprises:
- the method of uniformly adjusting the UMI value includes directly adjusting the UMI value or adjusting the gene weight.
- the method of adjusting directly according to the UMI value includes the following steps:
- step (C) calculating the curvature of the distribution function according to the distribution characteristics of the UMI values in step (B) to obtain an inflection point G;
- the gene weight adjustment method comprises the following steps:
- in the prior art, expression data with spatial information is displayed using linear grayscale. As a result, the expression map at bin level shows no contrast: the expression levels at different positions differ little, so the map offers no obvious visual differences and no effective visual spatial information.
- the inventors of the present disclosure provide a method for displaying spatial group expression, firstly obtaining the expression matrix of the sample based on the spatial transcriptome sequencing platform, and drawing the expression image, then introducing a nonlinear function, according to the distribution characteristics of the UMI value, uniformly adjusting the UMI value corresponding to each coordinate in the expression image, and drawing a new expression image, thereby providing effective visual spatial information.
- the expression level image is a bin expression level image
- the bin can be bin1, bin2, bin70, etc. without restriction, and can be adjusted according to the required resolution, wherein the smaller the value after the bin, the higher the resolution.
- bin1 is preferred.
- the nonlinear function includes at least one selected from a power function, a logarithmic function, a Sigmoid function, and a Tanh function. It should be noted that any nonlinear function is applicable to the present invention.
- the nonlinear function 1 is the same as or different from the nonlinear function 2.
- the nonlinear function 1 includes at least one selected from a power function, a Sigmoid function, a Tanh function, and a logarithmic function.
- the nonlinear function 2 includes at least one selected from a logarithmic function and a Tanh function.
- the method for uniformly adjusting the UMI value includes a method of directly adjusting the UMI value or a method of redistributing the gene weight.
- the method of directly adjusting the UMI value includes the following steps:
- the gene weight adjustment method includes the following steps:
- the spatial transcriptome sequencing platform includes at least one selected from the Biotron S1000 spatial transcriptome sequencing platform, the STOmics spatial transcriptome sequencing platform, and the 10x Genomics spatial transcriptome sequencing platform.
- the sample comprises a tissue sample selected from an animal or a plant.
- Another aspect of the present disclosure provides application of the above-mentioned method for displaying spatial group expression levels in spatial transcriptome visual imaging.
- Another aspect of the present disclosure provides a system for displaying spatial group expression, the system comprising:
- a data acquisition module wherein the data acquisition module is used to obtain an expression matrix of a sample based on a spatial transcriptome sequencing platform
- An image drawing module wherein the image drawing module is used to introduce a nonlinear function, and draw an expression image after uniformly adjusting the UMI value corresponding to each coordinate in the expression matrix according to the distribution characteristics of the UMI value;
- the method of uniformly adjusting the UMI value includes directly adjusting the UMI value or adjusting the gene weight.
- the method of adjusting directly according to the UMI value includes the following steps:
- the gene weight adjustment method comprises the following steps:
- the expression level image is a bin expression level image
- the bin can be bin1, bin2, bin70, etc. without limitation, and can be adjusted according to the required resolution, wherein the smaller the value after the bin, the higher the resolution.
- bin1 is preferred.
- the nonlinear function includes at least one selected from a power function, a logarithmic function, a Sigmoid function, and a Tanh function. It should be noted that any nonlinear function is applicable to the present invention.
- the nonlinear function 1 is the same as or different from the nonlinear function 2.
- the nonlinear function 1 includes at least one selected from a power function, a Sigmoid function, a Tanh function, and a logarithmic function.
- the nonlinear function 2 includes at least one selected from a logarithmic function and a Tanh function.
- the method for uniformly adjusting the UMI value includes a method of directly adjusting the UMI value or a method of redistributing the gene weight;
- the method of adjusting directly according to the UMI value includes the following steps:
- the gene weight adjustment method includes the following steps:
- the spatial transcriptome sequencing platform includes at least one selected from the group consisting of the Biotron S1000 spatial transcriptome sequencing platform, the STOmics spatial transcriptome sequencing platform, and the 10x Genomics spatial transcriptome sequencing platform.
- the nonlinear function includes at least one selected from a power function, a logarithmic function, a Sigmoid function, and a Tanh function.
- the sample comprises a tissue sample selected from an animal or a plant.
- Another aspect of the present disclosure provides application of the aforementioned system for displaying spatial group expression levels in spatial transcriptome visual imaging.
- the electronic device includes a memory, a processor;
- the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to implement the method for displaying the spatial group expression quantity described above.
- the computer-readable storage medium stores a computer program, and when the program is executed by a processor, the method for displaying the spatial group expression quantity described above is implemented.
- FIG. 1 shows a flow chart of the spatial expression level display method;
- FIG. 2 shows the FB staining image of the heart-shaped embryo of the model plant Arabidopsis thaliana according to an embodiment of the present invention;
- FIG. 3 shows an original bin1 expression level diagram of the heart-shaped embryo based on FIG. 2 according to an embodiment of the present invention (100% zoom);
- FIG. 4 shows a distribution diagram of all different UMI points of the heart-shaped embryo based on FIG. 3 according to an embodiment of the present invention;
- FIG. 5 shows sigmoid transformation functions with different intercepts according to an embodiment of the present invention;
- FIG. 6 shows Tanh transformation functions with different intercepts according to an embodiment of the present invention;
- FIG. 9 shows the distribution of log2(total UMI)+1 of the total UMI value of each gene in an embodiment of the present invention;
- FIG. 10 shows a bin1 expression graph drawn by weighting and reorganizing UMIs for each gene in an embodiment of the present invention.
- first and second are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features.
- a feature defined as “first” or “second” may explicitly or implicitly include at least one of the features.
- “plurality” means at least two, such as two, three, etc., unless otherwise clearly and specifically defined.
- the terms “optionally”, “optional” or “optionally” generally mean that the subsequently described event or circumstance may but need not occur, and that the description includes instances where the event or circumstance occurs and instances where it does not.
- bin refers to a data set at the resolution of a unit in the Stereo-seq spatial transcriptome sequencing results.
- bin1 represents a sequencing data set in a single capture unit
- bin70 represents a sequencing data set formed by 70*70 capture units.
- bin1 expression map refers to a grayscale image in which the gene expression counts in the data are plotted according to their position coordinates.
- the term "sigmoid transformation function" refers to an S-shaped function commonly seen in biology, also known as an S-shaped growth curve.
- the Sigmoid function is often used as an activation function of a neural network to map variables between 0 and 1.
- Tanh transformation function refers to one of the hyperbolic functions, tanh being the hyperbolic tangent.
- the hyperbolic tangent "tanh" is derived from two basic hyperbolic functions, the hyperbolic sine and the hyperbolic cosine.
- FB staining refers to calcofluor white staining; the dye binds specifically to plant cell walls and thereby reveals plant tissue morphology.
- UMI (Unique Molecular Identifier) refers to molecular barcode technology, which adds a unique label sequence to each fragment after the original sample genome is fragmented, so that thousands of different fragments in the same sample can be distinguished.
- these label sequences can be used to eliminate errors introduced by DNA polymerase, amplification and sequencing processes.
- Molecular barcodes usually consist of random sequences of about 10 nt (such as NNNNNNN) or degenerate bases (NNNRNYN). Unlike sample labels (sample index or sample barcode), which are added to distinguish different samples, molecular barcodes are label sequences added to different fragments within the same sample. Each sample therefore carries a single sample label but may carry thousands of molecular barcodes.
- Spatial transcriptome sequencing technology is a high-throughput spatiotemporal omics sequencing technology. It uses two rounds of sequencing to determine the spatial position and the corresponding expression level of mRNA sequences, respectively. The technology resolves down to the cellular level or finer.
- This technology proceeds as follows:
- (1) DNBs containing random barcode sequences are first deposited onto a modified chip that has been etched by photolithography;
- (2) compared with the bead-based method, the random barcodes carried by DNBs produced by rolling circle amplification give a larger spatial barcode pool while maintaining sequence fidelity; the array is then micrographed, incubated with primers, and sequenced to obtain a data matrix containing the coordinate code (CID) of each etched DNB;
- (3) by hybridization with the CID, the molecular code (MID) and oligonucleotides containing a polyT sequence are attached at each point;
- (4) fresh, nitrogen-frozen tissue sections are loaded onto the chip surface to capture tissue RNA with polyA tails, followed by fixation, permeabilization, and finally reverse transcription and amplification;
- (5) the amplified cDNA is collected as a template for library preparation and sequenced together with the CID;
- (6) computational analysis of the sequencing data enables spatially resolved transcriptomic research at a resolution of 500 or 715 nm.
- the present disclosure proposes a method for displaying spatial group expression levels, as shown in FIG. 1, comprising:
- the method of uniformly adjusting the UMI value includes directly adjusting the UMI value or adjusting the gene weight.
- the method of adjusting directly according to the UMI value includes the following steps:
- the gene weight adjustment method comprises the following steps:
- S170 recalculate the UMI value of each gene at the coordinate point according to the gene weight, and draw an expression level image.
- the spatial group expression quantity display method provided by the present disclosure is based on mathematical statistics and introduces a nonlinear model when converting the expression quantity into a grayscale image, which can accurately and effectively provide spatial visual information.
- the bin1 expression map is only related to the UMI value of each position.
- Those skilled in the art can adjust the UMI value by any known nonlinear function. Two methods are preferred here, including a method of adjusting directly according to the UMI value and a method of adjusting by redistributing the gene weight.
- the direct adjustment method based on the UMI value includes the following steps:
- the gene weight adjustment method includes the following steps:
- the spatial transcriptome sequencing platform includes any spatial transcriptome technology platform known in the art.
- the nonlinear function includes but is not limited to nonlinear functions such as power function, logarithmic function, Sigmoid function, Tanh function, etc.
- the sample comprises a tissue sample selected from animals and plants.
- Example 1 Method for displaying the expression level of Arabidopsis heart-shaped embryo spatial group
- Fresh Arabidopsis heart-shaped embryos were frozen and embedded in OCT embedding solution, and the embedded heart-shaped embryos were sliced and stained for FB imaging ( Figure 2).
- the tissue slices were placed on a chip containing RNA-binding capture probes, fixed and permeabilized (to release the mRNA in the cells and bind to the corresponding capture probes to obtain gene expression information), and cDNA synthesis and sequencing library preparation were performed using the captured RNA as a template.
- the prepared library was sequenced (Stereo-seq spatial transcriptome sequencing platform) to obtain the expression data of the sample, and the expression matrix was drawn based on the data.
- the expression information was restored to the corresponding position in combination with the spatial position information carried in the data to obtain the original unprocessed bin1 expression image (Figure 3), and the expression level displayed by the original bin1 was viewed in the Fiji software. It was found that due to the excessive number of pixels in the bin1 image, the expression UMI value was relatively concentrated, and the clear tissue boundary and internal contour could not be observed. The original bin1 image was gray and it was difficult to identify effective information.
- the inventors found that most of the UMI values were low, only a few points had high UMI values, and the noise was large ( Figure 4). According to the distribution of the UMI and the spatial expression and display method proposed by the present invention, the transformation function can be adjusted.
- FIG5 is a sigmoid transformation function with different intercepts
- FIG6 is a Tanh transformation function with different intercepts.
- the Tanh transformation function can effectively reduce the UMI points and increase the points with high UMI values.
- the total UMI value of each gene is counted, and the distribution of log2(total UMI)+1 is computed.
- log2(total UMI)+1 is used as the weight of each gene.
- the weight of each gene is added to recalculate the UMI value of each coordinate point of each gene ( Figure 9).
- the bin1 expression map generated by the new UMI is redrawn ( Figure 10). It can be seen that although the image becomes darker as a whole, the background area is reduced, and highly expressed genes can be observed.
Description
The present invention relates to the field of bioinformatics, and in particular to a method for displaying spatial group expression levels.
Gene chip technology retains sample location information on the chip, and second-generation sequencing technology then sequences the RNA in the sample. The reads are superimposed back onto the tissue image, generating a complete gene expression image of the tissue section. This technology is called spatial transcriptome technology and is highly valuable in clinical research and cancer diagnosis.
Current spatial transcriptome platforms reach nanometer-scale precision.
Currently, expression data carrying spatial information is displayed using linear grayscale, so the expression map at bin level shows no contrast and fails to provide effective visual spatial information.
Therefore, a method that accurately and effectively provides spatial visual information for spatial transcriptome data is urgently needed.
Summary of the invention
The present disclosure aims to solve, at least to some extent, one of the technical problems in the related art. To this end, one aspect of the present disclosure provides a method for displaying spatial group expression levels. According to an embodiment of the present disclosure, the method comprises:
(A) obtaining an expression matrix of a sample using a spatial transcriptome sequencing platform;
(B) introducing a nonlinear function and, according to the distribution characteristics of the UMI values, uniformly adjusting the UMI value corresponding to each coordinate in the expression matrix, then drawing an expression level image;
wherein the method of uniformly adjusting the UMI values includes a direct UMI value adjustment method or a gene weight adjustment method.
The direct UMI value adjustment method comprises the following steps:
(C) calculating the curvature of the distribution function according to the distribution characteristics of the UMI values in step (B) to obtain an inflection point G;
(D) shifting nonlinear function 1 to the left or right by G coordinates to obtain a transformation function, recalculating the UMI values in the expression matrix with the transformation function, and drawing an expression level image.
The gene weight adjustment method comprises the following steps:
(E) counting the total UMI value of each gene in the expression matrix;
(F) substituting the total UMI value of each gene into nonlinear function 2 to obtain the gene weight of each gene;
(G) recalculating the UMI value of each gene at each coordinate point according to the gene weights, and drawing an expression level image.
In the prior art, expression data with spatial information is displayed using linear grayscale. As a result, the expression map at bin level shows no contrast: the expression levels at different positions differ little, so the map offers no obvious visual differences and no effective visual spatial information. Based on this, the inventors of the present disclosure provide a method for displaying spatial group expression levels. The method first obtains the expression matrix of the sample from a spatial transcriptome sequencing platform and draws the expression level image. It then introduces a nonlinear function, uniformly adjusts the UMI value corresponding to each coordinate in the expression level image according to the distribution characteristics of the UMI values, and draws a new expression level image that provides effective visual spatial information.
According to an embodiment of the present disclosure, the expression level image is a bin expression level image. The bin may be bin1, bin2, bin70, and so on without limitation, and is chosen according to the required resolution: the smaller the number after "bin", the higher the resolution. In the present invention, bin1 is preferred.
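As an illustration of how the bin size relates to resolution, the following minimal Python sketch aggregates bin1 records into larger bins. The (x, y, umi) record layout, the function name aggregate_bins, and the toy coordinates are assumptions made for this example and do not come from the disclosure.

```python
from collections import defaultdict

def aggregate_bins(records, bin_size):
    """Sum the UMI counts of bin1 capture units into bin_size x bin_size squares.

    records: iterable of (x, y, umi) tuples at bin1 resolution (hypothetical layout).
    Returns a dict mapping (bin_x, bin_y) to the summed UMI count.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be a positive integer")
    binned = defaultdict(int)
    for x, y, umi in records:
        # Integer division assigns each capture unit to its enclosing bin.
        binned[(x // bin_size, y // bin_size)] += umi
    return dict(binned)

# Toy data: four capture units, three of them inside the first 70 x 70 square.
records = [(0, 0, 1), (1, 0, 2), (69, 69, 3), (70, 0, 4)]
bin1 = aggregate_bins(records, 1)    # one entry per capture unit
bin70 = aggregate_bins(records, 70)  # capture units merged into 70 x 70 squares
```

With a bin size of 1 every capture unit keeps its own value, while a bin size of 70 merges neighbouring units and lowers the spatial resolution accordingly.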
According to an embodiment of the present disclosure, the nonlinear function includes at least one selected from a power function, a logarithmic function, a Sigmoid function, and a Tanh function. It should be noted that any nonlinear function is applicable to the present invention.
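The four function families named above can be sketched in Python as follows. This is an illustrative sketch only: the normalization by a maximum UMI value, the gamma, shift, and scale parameters, and the rescaling of tanh to the range 0 to 1 are assumptions chosen so that each output can be mapped to a gray level. The disclosure does not fix these details.

```python
import math

def power_map(umi, umi_max, gamma=0.5):
    """Power function on the UMI value normalized by umi_max (gamma is hypothetical)."""
    return (umi / umi_max) ** gamma

def log_map(umi, umi_max):
    """Logarithmic function; log1p keeps a UMI value of 0 at output 0."""
    return math.log1p(umi) / math.log1p(umi_max)

def sigmoid_map(umi, shift=0.0, scale=1.0):
    """Sigmoid function centered at `shift`, written to avoid overflow."""
    z = (umi - shift) / scale
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def tanh_map(umi, shift=0.0, scale=1.0):
    """Tanh function centered at `shift`, rescaled from (-1, 1) to (0, 1)."""
    return 0.5 * (math.tanh((umi - shift) / scale) + 1.0)

def to_gray(value):
    """Map a value in [0, 1] to an 8-bit gray level."""
    return round(255 * min(1.0, max(0.0, value)))

# Toy UMI values; each function yields a different gray-level spread.
umis = [0, 1, 2, 4, 16]
umi_max = max(umis)
gray_by_function = {
    "power": [to_gray(power_map(u, umi_max)) for u in umis],
    "log": [to_gray(log_map(u, umi_max)) for u in umis],
    "sigmoid": [to_gray(sigmoid_map(u, shift=2.0)) for u in umis],
    "tanh": [to_gray(tanh_map(u, shift=2.0)) for u in umis],
}
```

All four mappings are monotonic, so the ordering of expression levels is preserved; they differ only in how much of the gray range is spent on low versus high UMI values.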
According to an embodiment of the present disclosure, nonlinear function 1 is the same as or different from nonlinear function 2.
According to an embodiment of the present disclosure, nonlinear function 1 includes at least one selected from a power function, a Sigmoid function, a Tanh function, and a logarithmic function.
According to an embodiment of the present disclosure, nonlinear function 2 includes at least one selected from a logarithmic function and a Tanh function.
According to an embodiment of the present disclosure, the method of uniformly adjusting the UMI values includes a direct UMI value adjustment method or a gene weight redistribution method.
According to an embodiment of the present disclosure, the direct UMI value adjustment method includes the following steps:
(a) obtaining the expression matrix of the sample using a spatial transcriptome platform and drawing a bin1 expression level image;
(b) counting all distinct UMI values in the bin1 expression data;
(c) plotting the inverted "L"-shaped distribution of the UMI values and calculating the curvature of the distribution function to obtain an inflection point G;
(d) shifting the nonlinear function to the left or right by G coordinates to obtain a transformation function, recalculating the UMI values in the bin1 expression level image with the transformation function, and drawing a new bin1 expression level image.
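Steps (b) to (d) can be sketched in Python as follows. The disclosure does not specify how the curvature is computed, so this sketch assumes a finite-difference curvature on the normalized frequency curve and takes the distinct UMI value of maximum curvature as G. The function names, the scale parameter, and the toy data are likewise hypothetical, and a Sigmoid function stands in for the nonlinear function.

```python
import math
from collections import Counter

def umi_distribution(umis):
    """Step (b): distinct UMI values and the number of coordinates carrying each."""
    counts = Counter(umis)
    values = sorted(counts)
    return values, [counts[v] for v in values]

def inflection_point(values, freqs):
    """Step (c), assumed form: the interior UMI value of maximum discrete curvature."""
    if len(values) < 3:
        raise ValueError("need at least three distinct UMI values")
    span = values[-1] - values[0]
    peak = max(freqs)
    # Normalize both axes so the curvature does not depend on the units.
    xs = [(v - values[0]) / span for v in values]
    ys = [f / peak for f in freqs]
    best_curvature, best_value = -1.0, values[1]
    for i in range(1, len(xs) - 1):
        h1, h2 = xs[i] - xs[i - 1], xs[i + 1] - xs[i]
        slope = (ys[i + 1] - ys[i - 1]) / (h1 + h2)
        second = 2.0 * ((ys[i + 1] - ys[i]) / h2 - (ys[i] - ys[i - 1]) / h1) / (h1 + h2)
        curvature = abs(second) / (1.0 + slope * slope) ** 1.5
        if curvature > best_curvature:
            best_curvature, best_value = curvature, values[i]
    return best_value

def shifted_sigmoid(umi, g, scale=1.0):
    """Step (d): a Sigmoid shifted right by G, so that UMI == G maps to 0.5."""
    z = (umi - g) / scale
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Toy bin1 data: many low UMI values and a few high ones (inverted "L" shape).
umis = [1] * 50 + [2] * 20 + [3] * 8 + [4] * 3 + [5] * 2 + [8, 20]
values, freqs = umi_distribution(umis)
g = inflection_point(values, freqs)
gray = {v: round(255 * shifted_sigmoid(v, g)) for v in values}
```

Centering the transformation at G spends most of the gray range on the UMI values around the bend of the distribution, which is where a linear scale gives the least contrast.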
According to an embodiment of the present disclosure, the gene weight adjustment method includes the following steps:
(e) obtaining the expression matrix of the sample using a spatial transcriptome platform and drawing a bin1 expression level image;
(f) counting the total UMI value of each gene in the original expression level image;
(g) plotting the inverted "L"-shaped distribution of log2(total UMI) + N and setting each gene's log2(total UMI) + N value as the weight of that gene, where N is any natural number, such as 0, 1, 2, or 3, preferably N = 1;
(h) recalculating the UMI value of each gene at each coordinate point according to the weight of each gene, and drawing a new bin1 expression level image.
According to an embodiment of the present disclosure, the spatial transcriptome sequencing platform includes at least one selected from the Biotron (百创) S1000 spatial transcriptome sequencing platform, the STOmics spatial transcriptome sequencing platform, and the 10x Genomics spatial transcriptome sequencing platform.
According to an embodiment of the present disclosure, the sample includes a tissue sample selected from animals and plants.
Another aspect of the present disclosure provides use of the above method for displaying spatial group expression levels in spatial transcriptome visual imaging.
Another aspect of the present disclosure provides a system for displaying spatial group expression levels, the system comprising:
a data acquisition module, configured to obtain an expression matrix of a sample from a spatial transcriptome sequencing platform;
an image drawing module, configured to introduce a nonlinear function and, according to the distribution characteristics of the UMI values, uniformly adjust the UMI value corresponding to each coordinate in the expression matrix and then draw an expression image;
wherein the method for uniformly adjusting the UMI values includes the direct UMI-value adjustment method or the gene-weight adjustment method;
the direct UMI-value adjustment method includes the following steps:
(Ⅰ) calculate the curvature of the distribution function from the distribution characteristics of the UMI values in the image drawing module, and obtain the inflection point G;
(Ⅱ) shift nonlinear function 1 left or right by G coordinate units to obtain a transformation function, recalculate the UMI values in the expression matrix using the transformation function, and draw the expression image;
the gene-weight adjustment method includes the following steps:
(ⅰ) sum the UMI values of each gene in the expression matrix;
(ⅱ) substitute each gene's total UMI value into nonlinear function 2 to obtain that gene's weight;
(ⅲ) recalculate the UMI value of each gene at each coordinate point according to the weights, and draw the expression image.
According to an embodiment of the present disclosure, the expression image is a bin expression image. The bin may be bin1, bin2, bin70, and so on, without limitation, and is chosen according to the required resolution: the smaller the number after "bin", the higher the resolution. In the present invention, bin1 is preferred.
According to an embodiment of the present disclosure, the nonlinear function includes at least one selected from a power function, a logarithmic function, a Sigmoid function, and a Tanh function. It should be noted that any nonlinear function is applicable to the present invention.
According to an embodiment of the present disclosure, nonlinear function 1 and nonlinear function 2 are the same or different.
According to an embodiment of the present disclosure, nonlinear function 1 includes at least one selected from a power function, a Sigmoid function, a Tanh function, and a logarithmic function.
According to an embodiment of the present disclosure, nonlinear function 2 includes at least one selected from a logarithmic function and a Tanh function.
According to an embodiment of the present disclosure, the method for uniformly adjusting the UMI values includes a direct UMI-value adjustment method or a gene-weight redistribution method;
wherein the direct UMI-value adjustment method includes the following steps:
(i) obtain the expression matrix of the sample using a spatial transcriptome platform and draw the bin1 expression image;
(j) tally all distinct UMI values in the bin1 expression data;
(k) plot the inverted-"L"-shaped distribution of the UMI values, calculate the curvature of the distribution function, and obtain the inflection point G;
(l) shift the nonlinear function left or right by G coordinate units to obtain a transformation function, recalculate the UMI values in the bin1 expression image using the transformation function, and draw a new bin1 expression image;
the gene-weight adjustment method includes the following steps:
(m) obtain the expression matrix of the sample using a spatial transcriptome platform and draw the bin1 expression image;
(n) sum the UMI values of each gene in the original expression image;
(o) plot the inverted-"L"-shaped distribution of log2(total UMI)+N and set each gene's log2(total UMI)+N value as that gene's weight, where N is any natural number, such as 0, 1, 2, or 3, preferably N = 1;
(p) recalculate the UMI value of each gene at each coordinate point according to the gene's weight, and draw a new bin1 expression image.
According to an embodiment of the present disclosure, the spatial transcriptome sequencing platform includes at least one selected from the Biotron (百创) S1000 spatial transcriptome sequencing platform, the STOmics spatial transcriptome sequencing platform, and the 10x Genomics spatial transcriptome sequencing platform.
According to an embodiment of the present disclosure, the nonlinear function includes at least one selected from a power function, a logarithmic function, a Sigmoid function, and a Tanh function.
According to an embodiment of the present disclosure, the sample includes a tissue sample selected from animals and plants.
Another aspect of the present disclosure provides use of the aforementioned system for displaying spatial group expression levels in spatial transcriptome visual imaging.
Another aspect of the present disclosure provides an electronic device. According to an embodiment of the present disclosure, the electronic device includes a memory and a processor;
wherein the processor reads executable program code stored in the memory and runs the program corresponding to that code, so as to implement the aforementioned method for displaying spatial group expression levels.
Another aspect of the present disclosure provides a computer-readable storage medium. According to an embodiment of the present disclosure, the computer-readable storage medium stores a computer program which, when executed by a processor, implements the aforementioned method for displaying spatial group expression levels.
Additional aspects and advantages of the present disclosure will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the present disclosure.
The above and/or additional aspects and advantages of the present disclosure will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flow chart of a method for displaying spatial expression levels;
FIG. 2 shows the FB staining image of a heart-shaped embryo of the model plant Arabidopsis thaliana in an embodiment of the present invention;
FIG. 3 shows the original bin1 expression image (100% zoom) of the heart-shaped embryo of FIG. 2 in an embodiment of the present invention;
FIG. 4 shows the distribution of the number of points for each distinct UMI value in the heart-shaped embryo of FIG. 3 in an embodiment of the present invention;
FIG. 5 shows sigmoid transformation functions with different intercepts in an embodiment of the present invention;
FIG. 6 shows Tanh transformation functions with different intercepts in an embodiment of the present invention;
FIG. 7 shows the bin1 expression image after the UMI values were readjusted with the function y=Tanh(x-6) in an embodiment of the present invention;
FIG. 8 shows the distribution of the number of points for each UMI value after adjustment with the function y=Tanh(x-6) in an embodiment of the present invention;
FIG. 9 shows the distribution of log2(total UMI)+1 computed from the total UMI value of each gene in an embodiment of the present invention;
FIG. 10 shows the bin1 expression image drawn from UMI values readjusted with per-gene weights in an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
The embodiments of the present disclosure are described in detail below. The embodiments described below are exemplary, serve only to explain the present disclosure, and should not be construed as limiting it.
In addition, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present disclosure, "a plurality of" means at least two, for example two or three, unless otherwise clearly and specifically defined.
The endpoints of the ranges and any values disclosed herein are not limited to the exact range or value; these ranges or values should be understood to include values close to them. For numerical ranges, the endpoint values of the ranges, the endpoint values and individual point values, and the individual point values themselves may be combined with one another to give one or more new numerical ranges, which should be regarded as specifically disclosed herein.
To make the present invention easier to understand, certain technical and scientific terms are specifically defined below. Unless clearly and explicitly defined otherwise elsewhere in this document, all other technical and scientific terms used herein have the meanings commonly understood by a person of ordinary skill in the art to which the present invention belongs.
Herein, the terms "comprise" and "include" are open-ended, that is, they cover the content specified in the present invention without excluding other content.
Herein, the terms "optionally" and "optional" generally mean that the subsequently described event or circumstance may but need not occur, and that the description covers both the cases in which it occurs and the cases in which it does not.
In the present invention, the term "bin" refers to a data set at a given unit resolution in Stereo-seq spatial transcriptome sequencing results. For example, bin1 is the sequencing data set of a single capture unit, and bin70 is the sequencing data set obtained by merging 70 × 70 capture units.
In the present invention, the term "bin1 expression image" refers to a grayscale image in which the gene expression counts in the data are plotted at their position coordinates.
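As an illustration of the two definitions above, the following Python sketch merges per-capture-unit UMI counts into larger bins and renders the counts as an 8-bit grayscale image. The function names, the dictionary layout of `spots`, and the linear scaling to 0..255 are assumptions made for this example; the disclosure does not prescribe a data structure or a scaling rule.

```python
from collections import defaultdict


def bin_counts(spots, bin_size):
    """Merge per-spot UMI counts into bin_size x bin_size squares.

    spots maps (x, y) capture-unit coordinates to a UMI count (assumed layout).
    Returns a dict mapping (x // bin_size, y // bin_size) to the summed count.
    """
    binned = defaultdict(int)
    for (x, y), umi in spots.items():
        binned[(x // bin_size, y // bin_size)] += umi
    return dict(binned)


def to_grayscale(counts, width, height):
    """Render counts as a row-major 8-bit image using linear scaling (an assumption)."""
    peak = max(counts.values(), default=0)
    image = [[0] * width for _ in range(height)]
    for (x, y), umi in counts.items():
        image[y][x] = round(255 * umi / peak) if peak else 0
    return image


# Synthetic example data, not taken from the disclosure.
spots = {(0, 0): 1, (1, 0): 2, (0, 1): 3, (3, 3): 10}
bin1 = bin_counts(spots, 1)  # bin1 keeps every capture unit separate
bin2 = bin_counts(spots, 2)  # bin2 merges 2 x 2 capture units
image = to_grayscale(bin1, width=4, height=4)
```

A larger bin size trades spatial resolution for higher counts per pixel, which is why bin1 gives the sharpest but sparsest image.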
In the present invention, the term "sigmoid transformation function" refers to an S-shaped function common in biology, also known as the S-shaped growth curve. In information science, because both the function and its inverse are monotonically increasing, the Sigmoid function is often used as the activation function of neural networks, mapping a variable into the interval (0, 1).
In the present invention, the term "Tanh transformation function" refers to one of the hyperbolic functions; tanh is the hyperbolic tangent. In mathematics, the hyperbolic tangent tanh is derived from the two basic hyperbolic functions, the hyperbolic sine and the hyperbolic cosine.
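For reference, the two functions can be written out explicitly. The first two lines are the standard definitions; the last line uses G for the horizontal shift and n for the vertical offset, following the y=Tanh(x-G)+n notation used in Example 1, and writing the sigmoid with the same shift is an assumption made here for symmetry.

```latex
\begin{align}
\sigma(x) &= \frac{1}{1 + e^{-x}}, & 0 < \sigma(x) &< 1 \\
\tanh(x) &= \frac{\sinh x}{\cosh x} = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = 2\,\sigma(2x) - 1, & -1 < \tanh(x) &< 1 \\
y &= \tanh(x - G) + n, & y &= \sigma(x - G)
\end{align}
```

Both curves are steepest at x = G, so shifting by G places the most sensitive part of the transformation at the chosen UMI value.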
In the present invention, the term "FB staining" refers to calcofluor white (Fluorescent Brightener) staining, which binds specifically to plant cell walls and thereby reveals the morphology of plant tissue.
In the present invention, "UMI" stands for Unique Molecular Identifier, also known as a molecular barcode. A unique tag sequence is added to every fragment produced when the original sample's genome is fragmented, so that the thousands of different fragments in the same sample can be told apart; in downstream data analysis these tags make it possible to exclude errors introduced by the DNA polymerase, amplification, and sequencing. A molecular barcode usually consists of a random sequence of about 10 nt (such as NNNNNNN) or of degenerate bases (such as NNNRNYN). It differs from a sample tag (sample index or sample barcode): a molecular barcode tags different fragments within the same sample, whereas a sample tag distinguishes different samples. Each sample therefore has only one sample tag but can have thousands of molecular barcodes.
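To make the role of the UMI concrete, here is a minimal sketch of how duplicate reads can be collapsed before counting. It assumes each read has already been reduced to a hypothetical `(x, y, gene, umi)` tuple and treats only exact UMI matches as duplicates; production pipelines typically also merge UMIs that differ by sequencing errors, which is not shown here.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (x, y, gene).

    reads is an iterable of (x, y, gene, umi) tuples (assumed layout).
    Reads sharing all four fields are treated as amplification duplicates
    and counted once.
    """
    counts = defaultdict(int)
    for x, y, gene, _umi in set(reads):
        counts[(x, y, gene)] += 1
    return dict(counts)


# Synthetic reads; gene names and UMI sequences are made up for illustration.
reads = [
    (5, 7, "geneA", "ACGTACGTAC"),
    (5, 7, "geneA", "ACGTACGTAC"),  # duplicate of the first read
    (5, 7, "geneA", "TTGCATGCAA"),
    (5, 8, "geneB", "ACGTACGTAC"),
]
umi_counts = count_umis(reads)
```

The resulting per-coordinate, per-gene counts are the "UMI values" that the adjustment methods in this document operate on.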
A method for displaying spatial group expression levels
Spatial transcriptome sequencing is a high-throughput spatiotemporal omics sequencing technology that uses two rounds of sequencing to determine, respectively, the spatial position of each mRNA sequence and its expression level, at cellular or even finer resolution. The workflow is as follows: ① DNBs (DNA nanoballs) containing random barcode sequences are deposited onto a modified chip patterned by photolithographic etching; ② compared with bead-based methods, generating the randomly barcoded DNBs by rolling circle amplification yields a larger pool of spatial barcodes while maintaining sequence fidelity; the array is then imaged under a microscope, incubated with primers, and sequenced to obtain a data matrix containing the coordinate code (CID) of each etched DNB; ③ oligonucleotides containing a molecular code (MID) and a polyT sequence are attached at each spot by hybridization with the CID; ④ polyA-tailed RNA from the tissue is captured by loading fresh nitrogen-frozen tissue sections onto the chip surface, followed by fixation, permeabilization, and finally reverse transcription and amplification; ⑤ the amplified cDNA is collected as the template for library preparation and sequenced together with the CID; ⑥ computational analysis of the sequencing data enables spatially resolved transcriptomics at a resolution of 500 or 715 nm. The technology helps researchers locate transcription within tissues at cellular or finer resolution, which advances the understanding and interpretation of single cells and of cell populations across whole tissues. It has been applied in many biological and medical fields and has become one of the mainstream methods for spatial transcriptome sequencing.
According to some specific embodiments of the present invention, the present disclosure proposes a method for displaying spatial group expression levels which, as shown in FIG. 1, comprises:
S100: obtaining an expression matrix of a sample using a spatial transcriptome sequencing platform;
S120: introducing a nonlinear function and, according to the distribution characteristics of the UMI values, uniformly adjusting the UMI value corresponding to each coordinate in the expression matrix and then drawing an expression image;
wherein the method for uniformly adjusting the UMI values includes the direct UMI-value adjustment method or the gene-weight adjustment method;
the direct UMI-value adjustment method includes the following steps:
S130: calculating the curvature of the distribution function from the distribution characteristics of the UMI values in step S120, and obtaining the inflection point G;
S140: shifting nonlinear function 1 left or right by G coordinate units to obtain a transformation function, recalculating the UMI values in the expression matrix using the transformation function, and drawing the expression image;
the gene-weight adjustment method includes the following steps:
S150: summing the UMI values of each gene in the expression matrix;
S160: substituting each gene's total UMI value into nonlinear function 2 to obtain that gene's weight;
S170: recalculating the UMI value of each gene at each coordinate point according to the gene weights, and drawing the expression image.
The method for displaying spatial group expression levels provided by the present disclosure is based on mathematical statistics and introduces a nonlinear model when converting expression levels into a grayscale image, and can thereby provide spatial visual information accurately and effectively.
According to some specific embodiments of the present invention, the bin1 expression image depends only on the UMI value at each position. Those skilled in the art may adjust the UMI values with any known nonlinear function; two methods are preferred here, namely the direct UMI-value adjustment method and the gene-weight redistribution method, wherein:
The direct UMI-value adjustment method includes the following steps:
(a) obtain the expression matrix of the sample using a spatial transcriptome platform and draw the bin1 expression image;
(b) tally all distinct UMI values in the bin1 expression data;
(c) plot the inverted-"L"-shaped distribution of the UMI values, calculate the curvature of the distribution function, and obtain the inflection point G;
(d) shift the nonlinear function left or right by G coordinate units to obtain a transformation function, recalculate the UMI values in the bin1 expression image using the transformation function, and draw a new bin1 expression image;
The gene-weight adjustment method includes the following steps:
(e) obtain the expression matrix of the sample using a spatial transcriptome platform and draw the bin1 expression image;
(f) sum the UMI values of each gene in the original expression image;
(g) plot the inverted-"L"-shaped distribution of log2(total UMI)+1 and set each gene's log2(total UMI)+1 value as that gene's weight;
(h) recalculate the UMI value of each gene at each coordinate point according to the gene's weight, and draw a new bin1 expression image.
According to some specific embodiments of the present invention, the spatial transcriptome sequencing platform includes any spatial transcriptome technology platform known in the art.
According to some specific embodiments of the present invention, the nonlinear function includes, but is not limited to, a power function, a logarithmic function, a Sigmoid function, a Tanh function, and other nonlinear functions.
According to some specific embodiments of the present invention, the sample includes a tissue sample selected from animals and plants.
Where specific techniques or conditions are not indicated in the examples, the techniques or conditions described in the literature of the field or in the product instructions were followed. Reagents and instruments for which no manufacturer is indicated are conventional, commercially available products.
Example 1: Method for displaying spatial group expression levels in Arabidopsis heart-shaped embryos
Fresh Arabidopsis heart-shaped embryos were cryo-embedded in OCT embedding medium. The embedded embryos were sectioned and imaged after FB staining (FIG. 2). The tissue sections were placed on a chip carrying RNA-binding capture probes, then fixed and permeabilized so that mRNA released from the cells binds to the corresponding capture probes, thereby capturing gene expression information. cDNA synthesis and sequencing library preparation were performed with the captured RNA as template, and the library was sequenced on the Stereo-seq spatial transcriptome sequencing platform to obtain the expression data of the sample. An expression matrix was built from the data, and the expression information was mapped back to the corresponding positions using the spatial coordinates carried in the data, giving the original, unprocessed bin1 expression image (FIG. 3), which was inspected in the Fiji software. Because the bin1 image contains very many pixels and the UMI values are concentrated in a narrow range, clear tissue boundaries and internal contours could not be seen; the original bin1 image is dark, and useful information is hard to make out.
By tallying the distribution of all distinct UMI values in the heart-shaped embryo of FIG. 3, the inventors found that most UMI values are low, that only a few points have high UMI values, and that the noise is considerable (FIG. 4). Based on this UMI distribution and the spatial expression display method proposed by the present invention, the transformation function can be adjusted as follows.
Method 1: Direct UMI-value adjustment
(1) tally all distinct UMI values in the bin1 expression data;
(2) plot the inverted-"L"-shaped distribution of the UMI values, calculate the curvature of the distribution function, and obtain the inflection point G;
(3) shift the nonlinear function left or right by G coordinate units to obtain a transformation function, recalculate the UMI values in the bin1 expression image using the transformation function, and draw a new bin1 expression image.
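Steps (1) to (3) can be sketched in Python as follows. The disclosure does not say how the curvature is estimated, so this example assumes a finite-difference curvature on the histogram after both axes are scaled to [0, 1], and takes G as the UMI value of maximum curvature. The names `umi_histogram`, `knee_by_curvature`, and `shifted_tanh` are hypothetical, and the input data are synthetic.

```python
import math
from collections import Counter


def umi_histogram(umi_values):
    """Return the distinct UMI values (ascending) and how many points carry each."""
    hist = Counter(umi_values)
    xs = sorted(hist)
    return xs, [hist[x] for x in xs]


def knee_by_curvature(xs, ys):
    """Return the x with the largest discrete curvature after scaling both axes to [0, 1]."""
    if len(xs) < 3:
        raise ValueError("need at least three distinct UMI values")
    y_min = min(ys)
    x_span = xs[-1] - xs[0]
    y_span = (max(ys) - y_min) or 1
    u = [(x - xs[0]) / x_span for x in xs]
    v = [(y - y_min) / y_span for y in ys]
    best_x, best_k = xs[1], -1.0
    for i in range(1, len(xs) - 1):
        h1, h2 = u[i] - u[i - 1], u[i + 1] - u[i]
        d1 = (v[i + 1] - v[i - 1]) / (h1 + h2)  # central first difference
        d2 = 2 * ((v[i + 1] - v[i]) / h2 - (v[i] - v[i - 1]) / h1) / (h1 + h2)
        k = abs(d2) / (1 + d1 * d1) ** 1.5      # curvature of the curve v(u)
        if k > best_k:
            best_x, best_k = xs[i], k
    return best_x


def shifted_tanh(umi, g, n=0.0):
    """Transformation function y = tanh(umi - g) + n."""
    return math.tanh(umi - g) + n


# Synthetic data: counts fall linearly from UMI 1 to UMI 6, then stay flat.
counts = [101, 81, 61, 41, 21, 1, 1, 1, 1, 1, 1]
umi_values = [umi for umi, c in enumerate(counts, start=1) for _ in range(c)]

xs, ys = umi_histogram(umi_values)               # step (1)
G = knee_by_curvature(xs, ys)                    # step (2)
adjusted = {umi: shifted_tanh(umi, G) for umi in xs}  # step (3)
```

On real data the histogram is noisy, so some smoothing before the curvature estimate would likely be needed; that choice is left open here.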
FIG. 5 shows sigmoid transformation functions with different intercepts, and FIG. 6 shows Tanh transformation functions with different intercepts. The Tanh transformation function effectively suppresses points with low UMI values and enhances points with high UMI values.
The inventors also applied different nonlinear functions to the original bin1 image; the results are shown in Table 1. Different functions emphasize different information in the bin1 image, for example highlighting regions of high expression, highlighting regions of low expression, or reducing background noise to bring out contours. The function y=Tanh(x-G)+n (n being an optional hyperparameter) was finally selected as the best. From the UMI statistics of the original image, the inflection point G was calculated to be 6. FIG. 7 is the new bin1 expression image generated after the UMI values were readjusted with y=Tanh(x-6): background points are reduced and the tissue region is brightened. FIG. 8 shows the distribution of the number of points for each UMI value after adjustment with y=Tanh(x-6).
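For a sense of scale, the sketch below evaluates y=Tanh(x-6)+n at a few UMI values and converts the result to an 8-bit gray level. Choosing n = 1 (so that y lies between 0 and 2) and scaling linearly to 0..255 are assumptions for illustration; the text only states that n is an optional hyperparameter.

```python
import math


def tanh_gray_level(umi, g=6, n=1.0):
    """Map a UMI value to an 8-bit gray level via y = tanh(umi - g) + n.

    With n = 1 (an assumption) y lies in (0, 2); it is scaled linearly to
    0..255 and clamped so that other choices of n stay in range.
    """
    y = math.tanh(umi - g) + n
    return max(0, min(255, round(255 * y / 2)))


for umi in (1, 4, 6, 8, 12):
    print(umi, tanh_gray_level(umi))
```

UMI values well below G collapse toward black, values well above G saturate toward white, and the transition is concentrated within a few counts of G.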
Table 1
Method 2: Gene-weight transformation
(1) sum the UMI values of each gene in the original expression image;
(2) plot the inverted-"L"-shaped distribution of log2(total UMI)+1 and set each gene's log2(total UMI)+1 value as that gene's weight;
(3) recalculate the UMI value of each gene at each coordinate point according to the gene's weight, and draw a new bin1 expression image.
The distribution of log2(total UMI)+1 was computed from each gene's total UMI value (FIG. 9), and log2(total UMI)+1 was taken as each gene's weight. The UMI value of each gene at each coordinate point was then recalculated with these weights, and the bin1 expression image was redrawn from the new UMI values (FIG. 10). Although the image as a whole becomes darker, the background area is reduced and highly expressed genes can be observed.
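Steps (1) to (3) of Method 2 can be sketched as follows. The text does not state how a gene's weight enters the recalculation, so this example assumes the simplest reading: each gene's UMI count at a coordinate is multiplied by that gene's weight, and the products are summed per coordinate. The record layout and the function names are hypothetical.

```python
import math
from collections import defaultdict


def gene_weights(records, n=1):
    """Return weight = log2(total UMI) + n for every gene.

    records is an iterable of (x, y, gene, umi_count) with positive counts
    (assumed layout).
    """
    totals = defaultdict(int)
    for _x, _y, gene, umi in records:
        totals[gene] += umi
    return {gene: math.log2(total) + n for gene, total in totals.items()}


def reweighted_pixels(records, weights):
    """Sum weight * UMI count over the genes at each coordinate (assumed rule)."""
    pixels = defaultdict(float)
    for x, y, gene, umi in records:
        pixels[(x, y)] += weights[gene] * umi
    return dict(pixels)


# Synthetic records; gene names are made up for illustration.
records = [
    (0, 0, "geneA", 3),
    (1, 0, "geneA", 5),
    (0, 0, "geneB", 1),
]
weights = gene_weights(records)        # geneA: log2(8) + 1, geneB: log2(1) + 1
pixels = reweighted_pixels(records, weights)
```

Because the weight grows only logarithmically with a gene's total UMI count, abundant genes are emphasized without letting a handful of them dominate the image.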
In this specification, references to "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" mean that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present disclosure. Such references do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Although embodiments of the present disclosure have been shown and described above, it is to be understood that they are exemplary and are not to be construed as limiting the present disclosure. A person of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present disclosure.
Claims (18)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202380082879.6A CN120303737A (en) | 2023-06-08 | 2023-06-08 | A method for displaying spatial group expression |
PCT/CN2023/099225 WO2024250245A1 (en) | 2023-06-08 | 2023-06-08 | Method for displaying expression level of spatial group |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024250245A1 (en) | 2024-12-12 |
Family ID: 93794845
Country Status (2)
Country | Link |
---|---|
CN (1) | CN120303737A (en) |
WO (1) | WO2024250245A1 (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110806546A (en) * | 2019-10-28 | 2020-02-18 | 腾讯科技(深圳)有限公司 | Battery health assessment method and device, storage medium and electronic equipment |
CN112522371A (en) * | 2020-12-21 | 2021-03-19 | 广州基迪奥生物科技有限公司 | Analysis method of spatial transcriptome sequencing data |
US20210155982A1 (en) * | 2019-11-21 | 2021-05-27 | 10X Genomics, Inc. | Pipeline for spatial analysis of analytes |
CN114262372A (en) * | 2021-12-24 | 2022-04-01 | 同济大学 | A transcription factor regulating osteogenic differentiation of cells and osteogenic differentiated cells |
CN114882955A (en) * | 2022-04-08 | 2022-08-09 | 广州国家实验室 | Transcriptome image generation device, method and application |
2023
- 2023-06-08: WO application PCT/CN2023/099225, published as WO2024250245A1 (status: unknown)
- 2023-06-08: CN application CN202380082879.6A, published as CN120303737A (status: active, pending)
Also Published As
Publication number | Publication date |
---|---|
CN120303737A (en) | 2025-07-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23940131; Country of ref document: EP; Kind code of ref document: A1 |